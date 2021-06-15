Cumulus contamination in SECM. We performed scBS-seq on 194 SECM cfDNA samples using a protocol that does not require extracting cfDNA (see Methods). For each sample, we sequenced an average of 5 Gb, generating 3.6 Gb of clean data, which covered an average of 5.3 million CpG sites (≥1×) (Supplemental Table 1; supplemental material available online with this article; https://doi.org/10.1172/JCI146051DS1). Using the quality criterion of the number of unique mapping reads being greater than 1 million, 191 (98.5%) good-quality samples were obtained for subsequent analysis. We also performed scBS-seq on cumulus cells (n = 12) obtained from 4 individuals and sequenced an average of 8 Gb for each sample.

We retrieved the scBS-seq data of the preimplantation embryonic cells and the germ cells published in our previous study (23). The whole-genome DNA methylation levels of the SECM cfDNAs ranged from 13% to 74%, with a median value of 36%, and these levels were significantly higher than the reported levels in the inner cell mass (ICM) and TE (24% and 24% for ICM and TE, respectively; P < 0.01, two-tailed Mann-Whitney-Wilcoxon [MWW] test). Clustering analysis showed that a portion (50 of 191) of the SECM was clustered with cumulus cells (cluster III, Figure 2A). These samples displayed high DNA methylation levels (average 60%), which were close to those of the cumulus cells (average 71%).

Figure 2 Assessment of cumulus contamination in SECM cfDNA. (A) Unsupervised hierarchical clustering analysis of DNA methylation levels in the SECM cfDNA samples, human preimplantation embryos, germ cells, and cumulus cells. GV, germinal vesicle oocytes; MII, metaphase II oocytes; PN, pronuclei. (B) Heatmap of 769 CpG islands (C-DMRs) that are specifically hypermethylated in cumulus cells. (C) Scatter plot showing a positive correlation between the whole-genome DNA methylation levels and the C-DMR methylation levels in the SECM cfDNAs. The 2-tailed Mann-Whitney-Wilcoxon test was used to assess significance. (D) Box-and-whisker plot showing the whole-genome DNA methylation levels of the ICM, TE, cumulus cells, and 3 SECM cfDNA groups with no, moderate, and severe cumulus contamination degrees as estimated by the C-DMR methylation levels. (E) Bar plots showing the general concordance rate, false negative rate, and false positive rate of the 3 groups of SECM compared with TE biopsy.

To accurately assess the fraction of cumulus cell–derived DNA in SECM, we identified 769 CpG islands as cumulus differentially methylated regions (C-DMRs) that were highly methylated in cumulus cells and nearly unmethylated in preimplantation embryonic cells, including the ICM, TE, and oocytes (average methylation levels of 92%, 4%, and 3% for cumulus cells, ICM/TE, and MII oocytes, respectively; Supplemental Table 2 and Figure 2B). Notably, the average methylation levels of these C-DMRs were positively correlated with the whole-genome DNA methylation levels of SECM (R = 0.93, P < 2.2 × 10^−16, Pearson’s correlation; Figure 2C), indicating that the high whole-genome methylation levels of the SECM could largely be attributed to cumulus cell contamination.

We determined that approximately half of the SECM samples (95 of 191) were contaminated with cumulus cells (C-DMR methylation levels higher than 8% [mean 4% + 3 SD (3 × 1.3%) of the C-DMR methylation level in ICM/TE]). Among them, approximately half (50 of 95) displayed moderate contamination (C-DMR methylation levels 8% to 40%), and the other half (45 of 95) displayed severe contamination (C-DMR methylation levels >40%). As expected, the whole-genome methylation levels increased from the no- to severe-contamination groups (Figure 2D).
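The thresholding rule above can be illustrated with a minimal Python sketch. Only the cutoffs (mean 4% + 3 × 1.3% SD, reported as 8%, and 40%) come from the text; the function names and the handling of values that fall exactly on a boundary are assumptions made for illustration.

```python
def contamination_threshold(mean_pct, sd_pct, n_sd=3):
    """Cutoff defined as mean + n_sd * SD of the reference (ICM/TE) methylation level."""
    return mean_pct + n_sd * sd_pct


def classify_cumulus(cdmr_pct, low=8.0, high=40.0):
    """Assign a contamination group from a C-DMR methylation level (percent).

    Boundary handling is an assumption: values <= low are 'none',
    values > high are 'severe', everything in between is 'moderate'.
    """
    if cdmr_pct <= low:
        return "none"
    if cdmr_pct <= high:
        return "moderate"
    return "severe"


# 4% + 3 * 1.3% = 7.9%, which the text reports as a cutoff of 8%.
cutoff = contamination_threshold(4.0, 1.3)

# Hypothetical C-DMR methylation levels for three samples.
groups = [classify_cumulus(x) for x in (3.5, 22.0, 65.0)]
```

The same `contamination_threshold` helper would apply to any other marker set for which a reference mean and SD are available.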

Together, these results show that our DNA methylation analysis confirmed the assumption of cumulus contamination in SECM.

Detection of chromosome aneuploidy by scBS-seq. We have previously shown that scBS-seq is capable of assessing copy number (CN) variations (CNVs) (28, 29). We analyzed HCT116 cells, and the results showed that scBS-seq and multiple annealing and looping-based amplification cycles (MALBAC; ref. 30) gave the same expected CN profiles (Supplemental Figure 1A). To estimate the lower limit of the sequencing depth for accurate CNV calling, we downsampled the data, and the results showed that the coefficient of variation remained low with as little as 2 Mb of sequencing data (Supplemental Figure 1B). The majority of the SECM samples (182 of 191) gave informative CN profiles; the remaining 9 showed more than 6 aneuploidies, were defined as “aneuploid-chaotic,” and were excluded from further analysis. According to the consistency between SECM and TE biopsy, the embryos were divided into 4 categories: (a) euploid in SECM versus euploid in TE biopsy (Euploid-Euploid), (b) euploid in SECM versus aneuploid in TE biopsy (Euploid-Aneuploid), (c) aneuploid in SECM versus euploid in TE biopsy (Aneuploid-Euploid), and (d) aneuploid in SECM versus aneuploid in TE biopsy (Aneuploid-Aneuploid). The Aneuploid-Aneuploid samples were further grouped into “Full ploidy concordance,” “Partial ploidy concordance (overlapping),” “Partial ploidy concordance (complementary),” and “Partial ploidy concordance (nonoverlapping)” (Supplemental Figure 2A). Figure 3 and Supplemental Figure 2B show the representative samples in each category.

Figure 3 scBS-seq detects chromosome aneuploidy in SECM. Representative CN profiles of SECM in different categories. The results of SECM versus TE biopsy are presented.

Notably, SECM with no cumulus cell contamination showed the highest general concordance rate (GCR) (73.9%, 68 of 92) and the lowest false negative rate (FNR) (13.7%, 7 of 51), while SECM with severe cumulus cell contamination showed the lowest GCR (46.5%, 20 of 43) and the highest FNR (90.0%, 18 of 20) (Figure 2E). The false positive rates (FPRs) were 41.5%, 35.0%, and 21.7% in the no-, moderate-, and severe-contamination groups, respectively (Figure 2E). These “false positive” cases should mainly reflect CNV mosaicism of the embryo that was detected in the SECM cfDNA but not by the TE biopsy. Since cumulus cells are mostly euploid, an increase in the cumulus DNA fraction (i.e., contamination) increases the euploid DNA fraction and thus reduces the “false positive” aneuploidy rate. The “false positive” signal is therefore not a technical artifact but reflects genuine embryonic mosaicism.
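As a worked check of these rates, the following Python sketch recomputes GCR, FNR, and FPR from a 2 × 2 table with TE biopsy as the reference. The definitions used (GCR as concordant calls over all samples, FNR over TE-aneuploid embryos, FPR over TE-euploid embryos) are assumptions that happen to reproduce the reported percentages. The individual cell counts are back-calculated from the fractions quoted in the text, not taken from the supplemental tables, and every Aneuploid-Aneuploid call is treated as concordant.

```python
def concordance_rates(tp, fn, fp, tn):
    """Rates with TE biopsy as reference.

    tp: aneuploid in both SECM and TE    fn: euploid in SECM, aneuploid in TE
    fp: aneuploid in SECM, euploid in TE tn: euploid in both
    """
    total = tp + fn + fp + tn
    return {
        "GCR": (tp + tn) / total,   # general concordance rate
        "FNR": fn / (tp + fn),      # false negatives among TE-aneuploid embryos
        "FPR": fp / (fp + tn),      # false positives among TE-euploid embryos
    }


# Counts inferred from "68 of 92", "7 of 51", and an FPR of 41.5% (17 of 41).
no_contamination = concordance_rates(tp=44, fn=7, fp=17, tn=24)

# Counts inferred from "20 of 43", "18 of 20", and an FPR of 21.7% (5 of 23).
severe_contamination = concordance_rates(tp=2, fn=18, fp=5, tn=18)

for name, rates in (("none", no_contamination), ("severe", severe_contamination)):
    print(name, {k: f"{100 * v:.1f}%" for k, v in rates.items()})
```

Under these assumed definitions the inferred counts are internally consistent: in each group the concordant calls, false negatives, and false positives sum to the group size.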

Together, our results demonstrated that scBS-seq is sensitive for detecting chromosome aneuploidy in SECM. The cumulus contamination led to an increased FNR, a decreased FPR, and a decreased GCR.

Polar body contamination in SECM. To further explore the cellular origins of SECM, we performed clustering analysis for the samples with no cumulus contamination (n = 96), as well as for the preimplantation embryonic cells and germ cells. The results showed that most SECM samples (92 of 96) were clustered with the ICM and TE, while 1 (S167) and 2 (S176 and S193) samples were notably clustered with the MII oocytes and female pronuclei, respectively (Figure 4A). Since the genomic DNA of oocytes and pronuclei should not be released, these SECM most likely contained components of polar bodies, which are produced by the oocyte during meiosis.

Figure 4 Polar body contamination in SECM. (A) Unsupervised hierarchical clustering of whole-genome DNA methylation for the SECM samples with no cumulus cell contamination, human preimplantation embryos of different stages, germ cells, and cumulus cells. GV, germinal vesicle oocytes; MII, metaphase II oocytes; PN, pronuclei. (B) A total of 548 regions (O-DMRs) were specifically hypermethylated in the MII oocytes. (C) Chromosome CN profiles of 2 SECM samples clustered with the female pronuclei (upper, S167) or the MII oocytes (lower, S176). The chromosome aneuploidy results of TE biopsy and SECM are indicated, along with the methylation levels of the C-DMRs and the O-DMRs. (D) Correlations between non-CpG (left, CHG; right, CHH) DNA methylation levels and the O-DMR DNA methylation levels. CHG and CHH denote methylation in non-CpG contexts; H represents A (adenine), C (cytosine), or T (thymine). The 2-tailed Mann-Whitney-Wilcoxon test was used to assess significance.

To further assess polar body contamination, we identified 548 oocyte/polar body–specific DMRs (O-DMRs) with high methylation in MII oocytes but low methylation in preimplantation embryonic cells and cumulus cells (average methylation levels of 19%, 22%, and 82% for cumulus cells, ICM/TE, and oocytes, respectively; Supplemental Table 2 and Figure 4B), assuming that polar bodies have similar DNA methylation profiles to those of oocytes. The 3 SECM samples indeed displayed significantly higher methylation levels for the O-DMRs than the other SECM samples (median methylation levels 100%, 56%, and 79% for S167, S176, and S193, respectively, versus a median of 14% for the other SECM samples, P < 0.01; Supplemental Figure 3A).

Remarkably, the chromosome CN profiles showed that all 3 SECM samples were false negative or gender discordant; the TE biopsy results were “46, XY” for S176 and S193 and “–21, XX” for S167, but all 3 SECM samples were “46, XX” (Figure 4C and Supplemental Figure 3B). They were clearly not contaminated by cumulus cells, as shown by the C-DMR methylation levels.

We determined that approximately one-third (27%, 53 of 191) of the SECM samples were contaminated with polar bodies (O-DMR methylation levels higher than 31% [mean 22% + 3 SD (3 × 3%) of the O-DMR methylation level in ICM/TE]). We also examined the non-CpG methylation level, which is higher in oocytes than in embryonic cells of other preimplantation stages (31). The results showed that the methylation levels in both the CHG and CHH (non-CpG) contexts were positively correlated with the O-DMR methylation levels (CHG: R = 0.52, P = 4.6 × 10^−8; CHH: R = 0.55, P = 6.8 × 10^−9; Pearson’s correlation, 2-tailed MWW test; Figure 4D).

We also explored whether the SECM cfDNA was derived from the ICM or TE. Our recent study profiled the DNA methylation patterns of epiblast (EPI) and TE samples using single-cell triple-omics sequencing (32). Principal component analysis (PCA) showed that the EPI and TE can be roughly separated based on DNA methylation profiles. We focused on the day 6 SECM samples with no cumulus cell or polar body contamination (n = 61). The results showed that approximately one-third (18 of 61) of SECM samples were positioned with TE and that approximately two-thirds (43 of 61) were positioned with EPI (P < 0.01, χ2 test) (Supplemental Figure 3C). The promoter methylation levels of EPI differentially expressed genes (DEGs) divided by those of TE DEGs can distinguish between EPI and TE. The distribution suggested that SECM can be derived from both the TE and ICM (P < 0.01, two-tailed MWW test; Supplemental Figure 3D).

Together, the DNA methylation clustering, DMR, non-CpG, and chromosome CN analyses demonstrated the presence of polar body contamination in SECM.

Deducing the maternal DNA contamination ratio and integrated chromosome aneuploidy analysis. We next sought to deduce the maternal DNA contamination ratio. The methylation levels of the C-DMRs and O-DMRs were used to set up an algorithm for deducing the cumulus and polar body DNA fractions in SECM, respectively; the 2 fractions were then added to obtain the net maternal DNA contamination ratio (see Methods). To test the accuracy of the approach, we performed simulation analysis by generating a series of synthetic data sets with different cumulus and polar body percentages mixed with the ICM/TE (Figure 5A). As shown in Figure 5B, the estimated percentages correlated well with the input percentages of the DNA mixtures, which gave linear regression lines (R = 0.99, Pearson’s correlation, 2-tailed MWW test).
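The algorithm itself is described in Methods, which is not reproduced here. As a rough illustration of how two marker sets can resolve three DNA sources, the Python sketch below assumes a simple linear mixing model: the observed C-DMR and O-DMR methylation levels are treated as fraction-weighted averages of the pure-component levels quoted earlier (92%, 4%, 3% for C-DMRs and 19%, 22%, 82% for O-DMRs in cumulus cells, ICM/TE, and MII oocytes). The model, the function names, and the clipping step are all assumptions and may differ from the published procedure.

```python
# Average DMR methylation (as fractions) of each pure component, from the text.
REF = {
    "cumulus":    {"C": 0.92, "O": 0.19},
    "embryo":     {"C": 0.04, "O": 0.22},  # ICM/TE
    "polar_body": {"C": 0.03, "O": 0.82},  # MII oocyte used as a proxy
}


def mix(f_cumulus, f_polar):
    """Simulate C-DMR and O-DMR methylation of a mixture (embryo fills the rest)."""
    fractions = {
        "cumulus": f_cumulus,
        "polar_body": f_polar,
        "embryo": 1.0 - f_cumulus - f_polar,
    }
    m_c = sum(fractions[k] * REF[k]["C"] for k in REF)
    m_o = sum(fractions[k] * REF[k]["O"] for k in REF)
    return m_c, m_o


def deconvolve(m_c, m_o):
    """Solve the 2 x 2 linear system for cumulus and polar body fractions."""
    e, c, p = REF["embryo"], REF["cumulus"], REF["polar_body"]
    # Substituting f_embryo = 1 - f_cumulus - f_polar leaves two unknowns.
    a11, a12 = c["C"] - e["C"], p["C"] - e["C"]
    a21, a22 = c["O"] - e["O"], p["O"] - e["O"]
    b1, b2 = m_c - e["C"], m_o - e["O"]
    det = a11 * a22 - a12 * a21
    f_c = (b1 * a22 - a12 * b2) / det   # Cramer's rule
    f_p = (a11 * b2 - b1 * a21) / det
    # Clip to the valid range, since noisy measurements can overshoot.
    f_c = min(max(f_c, 0.0), 1.0)
    f_p = min(max(f_p, 0.0), 1.0)
    return f_c, f_p, min(f_c + f_p, 1.0)  # last value: net maternal ratio


# "50% + 2 input" style mixture: 50% cumulus, 25% polar body, 25% ICM/TE.
m_c, m_o = mix(0.50, 0.25)
f_cumulus, f_polar, net_maternal = deconvolve(m_c, m_o)
```

On noise-free simulated mixtures this model recovers the input fractions exactly, which is only a sanity check of the arithmetic and says nothing about performance on real SECM data.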

Figure 5 DNA mixing analysis. (A) Pie charts depicting the results of the simulated DNA mixing experiment. Different percentages of DNA methylation data of the polar body (the MII oocyte), ICM/TE, and cumulus cells were mixed, including 100% input from 1 of the 3 components (100% input), 50% input from each of 2 components (50% + 1 input), 75% input of 1 component plus 25% input of 1 other component (75% + 1 input), 50% input of 1 component plus 25% each of the other 2 components (50% + 2 input), and 75% input of 1 component plus 12.5% each of the other 2 components (75% + 2 input). The input percentages and the predicted percentages are shown for comparison. (B) Correlations between the predicted and input component fractions of the simulated DNA mixing experiment. The 2-tailed Mann-Whitney-Wilcoxon test was used to assess significance.

We then assessed the SECM samples. Cumulus cells caused severe contamination more often (cumulus cell ratio > 60%: 39 of 182, 22%) than the polar body did (polar body ratio > 60%: 7 of 182, 4%; Supplemental Figure 4A). The 2 fractions were weakly correlated (R = –0.19, Pearson’s correlation, 2-tailed MWW test), possibly reflecting that situations such as lower embryonic fractions lead to higher maternal fractions from both origins (Figure 6A). It was clear that high polar body ratios occurred in SECM samples with no or mild cumulus cell proportions and vice versa (Figure 6A). For net maternal DNA contamination, approximately one-third (31.3%, 57 of 182) of the samples showed a ratio greater than 60%, and one-third (34.1%, 62 of 182) showed a ratio less than 20% (Figure 6B).

Figure 6 Maternal DNA ratio in SECM and integrated chromosome aneuploidy analysis. (A) Scatter plot showing the correlation between the cumulus cell and polar body contamination fractions in SECM. The percentage distribution of each fraction is shown. The 2-tailed Mann-Whitney-Wilcoxon test was used to assess significance. (B) Pie chart showing the numbers and percentages of the SECM samples with different net maternal DNA contamination ratios. (C) Histograms showing GDRs (left) and FNRs (right) for different ratios of cumulus cell, polar body, and net maternal contamination. (D) Representative CN profiles for false negative SECM with nearly no maternal DNA contamination.

To investigate the effect of maternal contamination, we calculated the sensitivity, specificity, and positive and negative predictive values, as well as the gender discordance rate (GDR) and FNR, using the TE biopsy as the reference (Supplemental Figure 4B). Notably, the GDR reached zero (0%, 0 of 24) when the net maternal ratio was less than 20%, indicating that this SECM group indeed had minimal maternal contamination (Figure 6C). In contrast, the GDR remained at 18% (9 of 49) when only the cumulus cell ratio was less than 20% and remained at 42% (24 of 57) when only the polar body ratio was less than 20%. Examination of the chromosome CN profiles confirmed that these samples were affected by contamination from the corresponding maternal components, as shown in Figure 4C. This further confirmed that both the cumulus cells and polar body contributed to maternal contamination.

Interestingly, the FNR was still high (16%, 6 of 37) when the net maternal ratio was less than 20%. Close examination of the chromosome CN profiles suggested that these false negative SECM samples carried mosaic aneuploidy, with signs of CN gain or loss matching or complementing the TE biopsy results in most (5 of 6) cases (Figure 6, C and D, and Supplemental Figure 4C). This suggested that these embryos contained both aneuploid and euploid cells, with the euploid cells not sampled by TE biopsy.

Both the GDR and FNR increased with increasing cumulus cell, polar body, and net maternal contamination ratios. Remarkably, when the net maternal ratio was higher than 60%, the GDR and FNR increased to 100% (31 of 31) and 75% (6 of 8), respectively (Figure 6C).

We also examined the sampling time and found that the cumulus ratios, GDR, and FNR were significantly lower in the day 6 samples than in the day 5 samples (Supplemental Figure 4, D and G). The amplified DNA amounts were significantly higher in the day 6 samples than in the day 5 samples, indicating that the day 6 samples had more embryonic DNA (Supplemental Figure 4F). Interestingly, the polar body ratios were not different between these 2 groups, suggesting that the polar body DNA continued to be released between day 5 and day 6 (Supplemental Figure 4E).

Next, we wanted to determine the impact of maternal contamination and chromosome CN on DNA concentration in the culture medium. Our results showed that the amplified DNA amount decreased with increasing maternal contamination ratios, suggesting that the main variable was the amount of embryonic DNA (2-tailed MWW test; Supplemental Figure 5A). The amplified DNA amount was not different between embryos with and without CNVs (2-tailed MWW test; Supplemental Figure 5B).

In summary, we established an algorithm for deducing the maternal contamination ratio using scBS-seq, which allowed recognition of the SECM samples with a low GDR and FNR in the chromosome aneuploidy analysis.