Production of F 1 animals by ICSI. To examine whether SSC manipulation affects ART outcome, we used GS cells from C57BL/6 Tg14(act-EGFP)OsbY01(Green) mice that ubiquitously express the Egfp gene (B6-GS cells) (Supplemental Figure 1A; supplemental material available online with this article; https://doi.org/10.1172/JCI170140DS1). B6-GS cells appeared very similar to GS cells in a DBA/2 background (DBA-GS cells), which produce offspring by natural mating even after long-term culture (22). Bisulfite sequencing analysis showed typical androgenetic DNA methylation patterns with hypermethylation of H19 and Meg3 IG differentially methylated regions (DMRs) and hypomethylation in Igf2r and Snrpn DMRs in both cell types (Supplemental Figure 1B). Real-time PCR analysis was consistent with the DNA methylation patterns (Supplemental Figure 1C). B6-GS cells were transplanted into the seminiferous tubules of congenitally infertile WBB6F1-W/Wv mice (W) to produce sperm (23). Within 3 months, B6-GS cells generated SYCP3+ spermatocytes and peanut agglutinin+ (PNA+) haploid cells (Supplemental Figure 1, D and E). To produce offspring, sperm or elongated spermatids were used for ICSI (24). We also used sperm freshly prepared from Green mouse testes as a control (Figure 1A).

Figure 1 Congenital abnormalities in ICSI-derived offspring. (A) Experimental outline. (B) Body and placental weight at the time of birth (n = 31 for ICSI-F 1 ; n = 37 for GS-F 1 ; n = 63 for young control-F 2 ; n = 50 for young ICSI-F 2 ; n = 109 for young GS-F 2 ; n = 25 for aged control-F 2 ; n = 103 for aged ICSI-F 2 ; n = 34 for control-F 3 ; n = 45 for ICSI-F 3 ; n = 49 for GS-F 3 ). (C) F 1 offspring produced by ICSI and SSC transplantation. (D) Congenital deformities found in F 2 offspring produced by IVF using sperm from young (15 months) or aged (25 months) F 1 mice. (E) Congenital deformities found in F 3 offspring produced by IVF using sperm from F 2 mice. (F) Congenital deformities found in F 3 offspring produced by natural mating between F 2 mice. *P < 0.05, 2-tailed Student’s t test.

After Caesarean section, we found that significantly fewer mice were born from W mice compared with mice born after ICSI using fresh sperm (Supplemental Table 1). The most striking finding was the production of placenta-only offspring (5.2% vs. 0.4%). Bodies and placentas from GS cell–derived mice were larger than those of ICSI mice (Figure 1, B and C). Litter size and body/placental ratio, which is a measure of placental efficiency, were comparable between the two groups (Figure 1B and Supplemental Table 1). Because ICSI produces offspring with abnormal imprinting and may influence body weight (25), we performed combined bisulfite restriction enzyme analysis (COBRA). We collected tail DNA and determined the DNA methylation levels of DMRs in H19, Meg3 IG, Igf2r, and Snrpn. None of the mice showed abnormalities (Supplemental Figure 2A). Bisulfite sequencing confirmed these results (Supplemental Figure 3A).
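As an illustration of the body/placental ratio described above and of the 2-tailed Student's t test named in the Figure 1 legend, the following minimal sketch computes both from hypothetical weights. The function names and all numbers are assumptions made for illustration and are not data from this study; the sketch stops at the t statistic and degrees of freedom and does not compute a P value.

```python
from math import sqrt
from statistics import mean, variance


def body_placental_ratio(body_weight_g, placental_weight_g):
    """Body weight divided by placental weight (a placental efficiency index)."""
    if placental_weight_g <= 0:
        raise ValueError("placental weight must be positive")
    return body_weight_g / placental_weight_g


def student_t(sample_a, sample_b):
    """Return (t, df) for an equal-variance, two-sample Student's t test."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical (body weight in g, placental weight in g) pairs, not study data.
icsi_pups = [(1.40, 0.12), (1.35, 0.11), (1.45, 0.13)]
gs_pups = [(1.60, 0.14), (1.55, 0.13), (1.65, 0.15)]

icsi_ratios = [body_placental_ratio(b, p) for b, p in icsi_pups]
gs_ratios = [body_placental_ratio(b, p) for b, p in gs_pups]

t_stat, dof = student_t(icsi_ratios, gs_ratios)
print(f"t = {t_stat:.3f}, df = {dof}")
```

A 2-tailed P value would then be read from the t distribution with the returned degrees of freedom, for example with a statistics package.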

Behavior analysis of F 1 animals. To examine the functional effect on offspring, we conducted a battery of behavioral tests (26). In this experiment, we used only male mice because no obvious sex differences in behavior were found in a previous study using offspring born after spermatogonial transplantation (27). We compared 3 groups of male mice: F 1 offspring produced by ICSI using wild-type sperm (ICSI-F 1 ) or sperm from GS cells (GS-F 1 ) and control offspring sired by natural mating (control-F 1 ) (Figure 1A). Although GS-F 1 mice were heavier, no difference was found in the grip strength and wire hang tests, and they did not exhibit abnormal sensitivity to a thermal stimulus in the hot plate test (Supplemental Figure 4, A–D).

Several tests showed reduced locomotor activity of GS-F 1 mice. GS-F 1 mice showed reduced distance traveled in the light/dark transition test (Supplemental Figure 4E). Activity level was significantly lower in 24-hour cage monitoring (Supplemental Figure 4F). An open-field test, which is used to assay general locomotor activity levels, anxiety, and exploration activity, showed a tendency toward less activity in GS-F 1 mice (Supplemental Figure 4G). GS-F 1 mice showed reduced vertical behavior, spent less time in the center area compared with other types of mice, and had lower stereotypic counts.

The most notable characteristic of GS-F 1 mice was their startle response (Figure 2A). Prepulse inhibition of the acoustic startle response is an index of sensorimotor gating. The startle responses to acoustic stimulation at 110 and 120 dB in GS-F 1 mice were significantly impaired compared with control mice, which by itself could suggest a hearing deficit in GS-F 1 offspring. However, weak auditory prepulses at 74 and 78 dB inhibited the startle response more significantly in GS-F 1 mice, indicating that they can detect these sounds and do not have a hearing deficit.
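The text above does not state how prepulse inhibition was quantified. A commonly used definition expresses it as the percentage reduction in startle amplitude when a weak prepulse precedes the startle stimulus. The sketch below assumes that definition; the function name and the amplitudes are hypothetical and are not values from this study.

```python
def percent_ppi(startle_alone, startle_with_prepulse):
    """Percent prepulse inhibition under the commonly used definition:

    100 * (startle alone - startle with prepulse) / startle alone
    """
    if startle_alone <= 0:
        raise ValueError("startle-alone amplitude must be positive")
    return 100.0 * (startle_alone - startle_with_prepulse) / startle_alone


# Hypothetical mean startle amplitudes (arbitrary units), not study data.
startle_alone = 200.0
startle_after_prepulse = 50.0

ppi = percent_ppi(startle_alone, startle_after_prepulse)
print(f"prepulse inhibition: {ppi:.1f}%")
```

Under this definition, a larger value means the prepulse suppressed the startle response more strongly, which requires that the animal detected the prepulse.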

Figure 2 Abnormal behavior of F 1 offspring. (A) Acoustic response and prepulse inhibition test. (B) Three-chamber social approach test (Crawley version). In the sociability test, time spent in or around the chamber with an empty cage, the center cage, and the chamber with a stranger mouse (stranger 1) were recorded. In the social novelty preference test, time spent in or around the chamber with a stranger mouse (stranger 1), the center cage, and the chamber with a novel stranger mouse (stranger 2) were recorded. (C) Elevated plus maze test. (D) Cued and contextual fear conditioning test. The number of mice analyzed is as follows: (A and B) n = 13 for control, n = 14 for ICSI-F 1 , and n = 14 for GS-F 1 ; (C) n = 13 for control, n = 15 for ICSI-F 1 , and n = 14 for GS-F 1 ; and (D) n = 13 for control, n = 14 for ICSI-F 1 , and n = 13 for GS-F 1 . *P < 0.05, 1-way ANOVA (mouse type) or 2-way repeated measures ANOVA (mouse type, 2-way interaction [e.g., mouse type × time interaction]). CS, conditioned stimulus; UCS, unconditioned stimulus. See Supplemental Methods and Supplemental Tables 4 and 5 for details.
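The legend above names a 1-way ANOVA with mouse type as the factor. As a minimal sketch of that technique, the code below computes the F statistic for three groups from the between-group and within-group sums of squares. The function name and the scores are hypothetical and are not data from this study; the repeated measures design and the P value are not covered.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a 1-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for control, ICSI-F1, and GS-F1 groups, not study data.
control = [12.0, 14.0, 13.0, 15.0]
icsi_f1 = [11.0, 12.0, 10.0, 13.0]
gs_f1 = [8.0, 9.0, 7.0, 10.0]

f_stat, df_b, df_w = one_way_anova_f([control, icsi_f1, gs_f1])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The P value would then be obtained from the F distribution with the two returned degrees of freedom.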

Although the tail suspension test showed reduced mobility of GS-F 1 mice (Supplemental Figure 4H), the Porsolt forced swim test, another test for depressive behavior, showed enhanced immobility and reduced distance traveled (Supplemental Figure 4I). However, because this test also depends on locomotor activity, the result may simply reflect their low locomotor activity. ICSI-F 1 mice did not show differences in immobility (i.e., distance traveled); however, several abnormalities were common between ICSI-F 1 and GS-F 1 mice. The 3-chamber social approach test (assessing sociability) revealed decreased social behavior in both types of mice (Figure 2B). The sociability test, which compares the behavior around an empty cage and a cage with a stranger mouse (stranger 1), showed that ICSI-F 1 mice spent less time around the stranger side. Moreover, ICSI-F 1 and GS-F 1 mice traveled a shorter distance, and the average speed of GS-F 1 mice was reduced. Although abnormalities in social behavior in GS-F 1 mice were evident in the social interaction test in a new environment (Supplemental Figure 4J), this test did not show abnormalities in ICSI-F 1 mice. However, in the elevated plus maze test, which reflects anxiety-like behavior, GS-F 1 and ICSI-F 1 mice entered the open arms significantly less frequently (Figure 2C). Therefore, ICSI-F 1 and GS-F 1 mice exhibited abnormalities in social behavior and an increased anxiety response.

ICSI-derived F 1 offspring have impaired memory function (28). To confirm this, we performed several tests. First, the T-maze test, which examines working memory, did not show a defect in ICSI-F 1 mice (Supplemental Figure 4K). Second, the Barnes maze test, which assesses spatial learning and memory, showed that ICSI-F 1 mice spent significantly less time around the target hole in probe tests performed 1 month after the last training session and the rate of omission error was significantly increased, suggesting impaired memory retention (Supplemental Figure 4L). A cued and contextual fear conditioning test showed an increase in freezing response and decrease in distance traveled in GS-F 1 mice in the training session (Figure 2D). GS-F 1 mice no longer showed abnormalities in the retention test. These results confirmed that ICSI-F 1 mice have impaired memory.

Implantation failure and congenital malformation in F 2 offspring. To examine whether abnormalities are transmitted to the F 2 generation, we performed IVF using F 1 sperm and wild-type oocytes (Supplemental Table 1). After Caesarean section, we found that body and placenta weights of GS-F 2 mice were significantly increased (Figure 1B). Moreover, the implantation rate was significantly reduced in ICSI-F 2 mice (Supplemental Table 1). The frequency of placenta-only offspring increased by approximately 16.8-fold when compared with that of ICSI-F 1 mice. The combined number of dead and placenta-only offspring was higher for ICSI-F 2 offspring, accounting for approximately 29.6% of newborn offspring. Notably, 8.5% and 1.7% of ICSI-F 2 offspring exhibited hydrocephalus and anophthalmia, respectively (Figure 1D). Hydrocephalus was also found in 1 GS-F 2 offspring. Litter size and body/placental ratio were comparable among the 3 groups (Figure 1B and Supplemental Table 1).

We performed another set of IVF using approximately 25-month-old control-F 1 and ICSI-F 1 mice to confirm whether ICSI per se causes abnormalities. After Caesarean section, we found that 17.6% of ICSI-F 2 mice were either dead or placenta only, compared with 3.8% for control-F 2 mice. Overall, 11.2% of ICSI-F 2 offspring exhibited congenital malformation. Along with hydrocephalus (1.7%) and anophthalmia (2.6%), offspring were born with small or open eyes (3.4%), skull defect (0.9%), and umbilical hernia (0.9%) (Figure 1D).

Unexpectedly, 15.4% of control-F 2 offspring, which were produced by IVF using sperm from control-F 1 mice and wild-type oocytes, showed similar congenital deformities (Figure 1D). Offspring with anophthalmia (3.8%), hydrocephalus (3.8%), and small eyes (3.8%) were born. However, the phenotype was not exactly the same because we found 2 offspring with tanned skin (7.7%). Because such abnormalities were not found in control-F 2 offspring from young control-F 1 mice (Supplemental Table 1), these results suggested that IVF using aged sperm increases the frequency of congenital malformation.

To determine whether F 1 female mice can produce abnormal offspring, we performed IVF using ICSI-F 1 oocytes and wild-type sperm and found an F 2 offspring with hydrocephalus (Supplemental Table 1). These results showed that congenital abnormalities can occur through the female germline. Based on the increased body weight of GS-F 2 offspring, we carried out COBRA for all F 2 offspring (Supplemental Figure 2B); however, no significant abnormalities were found. Bisulfite sequencing confirmed these results (Supplemental Figure 3B).

Behavioral abnormalities in F 2 offspring. To determine whether behavioral abnormalities persist in the F 2 generation, male F 2 offspring were subjected to a battery of behavioral tests. Overall, the phenotype of GS-F 2 mice was stronger than that of ICSI-F 2 mice. All 3 types of mice had comparable body weights, and no differences were found in a grip strength test and a wire hang test (Supplemental Figure 5, A–C). However, GS-F 2 mice were more sensitive to heat than control-F 2 mice (Supplemental Figure 5D).

GS-F 2 mice exhibited many of the defects of GS-F 1 animals. They showed low activity in the light/dark transition test (Supplemental Figure 5E). Although no abnormalities were found in 24-hour cage monitoring (Supplemental Figure 5F), an open-field test showed lower activity (Supplemental Figure 5G). Abnormalities in acoustic startle response and prepulse inhibition clearly persisted in GS-F 2 mice (Figure 3A). Despite the lack of a significant difference in the tail suspension test (Supplemental Figure 5H), we found abnormalities in the Porsolt forced swim test (Supplemental Figure 5I). The 3-chamber social approach test and social interaction test in a new environment indicated defective social behavior in GS-F 2 mice (Figure 3B and Supplemental Figure 5J). Abnormalities in the elevated plus maze test also suggested anxiety-like behavior (Figure 3C).

Figure 3 Abnormal behavior of F 2 offspring. (A) Acoustic response and prepulse inhibition test. (B) Three-chamber social approach test (Crawley version). (C) Elevated plus maze test. (D) Cued and contextual fear conditioning test. The number of mice analyzed is as follows: (A, B, and D) n = 18 for control-F 2 , n = 14 for ICSI-F 2 , and n = 16 for GS-F 2 and (C) n = 17 for control-F 2 , n = 14 for ICSI-F 2 , and n = 17 for GS-F 2 . *P < 0.05, 1-way ANOVA (mouse type) or 2-way repeated measures ANOVA (mouse type, 2-way interaction [e.g., mouse type × time interaction]). CS, conditioned stimulus; UCS, unconditioned stimulus. See Supplemental Methods and Supplemental Tables 4 and 5 for details.

We observed new phenotypes in GS-F 2 mice. In addition to thermal sensitivity, GS-F 2 mice showed abnormalities in the social novelty preference test (Figure 3B). They also showed a superior response in the T-maze test (Supplemental Figure 5K). Moreover, GS-F 2 mice showed significant reductions in distance traveled and in number of errors in the Barnes maze test (Supplemental Figure 5L). Therefore, although GS-F 2 mice exhibited many of the same abnormalities as GS-F 1 mice, their memory was significantly improved in the next generation.

The phenotype of ICSI-F 2 mice was mild. However, they showed abnormalities in the 3-chamber social approach test of social novelty preference (Figure 3B). Control-F 2 and GS-F 2 mice spent more time in and around the cage with a new stranger mouse (stranger 2) than in the cage with the familiar mouse (stranger 1), while ICSI-F 2 mice did not show such a preference. Like GS-F 2 mice, ICSI-F 2 mice also exhibited phenotypes not found in ICSI-F 1 mice. ICSI-F 2 mice showed a reduction in distance traveled in the dark (Supplemental Figure 5E), indicating low locomotor activity. They also showed reduced travel speed in the 3-chamber social approach test (Figure 3B). Neither the T-maze test nor the Barnes maze test showed abnormalities (Supplemental Figure 5, K and L). However, ICSI-F 2 mice exhibited a longer freezing time and shorter distance traveled in the conditioning session (Figure 3D). Although the effect of reduced activity needs to be considered, abnormalities were also found in context testing and cued testing with altered context. When fear memory was assessed after 1 month, ICSI-F 2 mice still showed defects in context testing, suggesting poor learning ability and memory retention (Figure 3D). Therefore, behavioral abnormalities are propagated by germline transmission.

Congenital deformity in F 3 offspring. We produced F 3 offspring using sperm from F 2 mice and wild-type oocytes. After Caesarean section, we found that GS-F 3 offspring were heavier than control-F 3 mice (Figure 1B). Litter size and body/placental ratio were comparable among the 3 groups (Figure 1B and Supplemental Table 1). Anophthalmia and hydrocephalus were similarly observed in ICSI-F 3 mice (1.8%; Figure 1E). Moreover, ICSI-F 3 and GS-F 3 offspring showed severe defects, with missing head and limbs (Figure 1E). Anophthalmia was found in control-F 3 offspring (Figure 1E). To examine whether natural mating can erase abnormalities, we crossed ICSI-F 2 male and female mice with normal appearance. However, natural mating produced 1 mouse with microphthalmia and 1 with hydrocephalus (Figure 1F). COBRA of tail DNA did not show apparent abnormalities in DNA methylation levels (Supplemental Figure 2C). These results suggested that congenital abnormalities occur in the subsequent generations.

Analysis of spermatogenesis and SSCs in F 1 mice. To understand the mechanism of transmission of the abnormal phenotype, we performed immunostaining of ICSI-F 1 and control-F 1 mouse testes (Figure 4A). We used antibodies against the regions of histone H3 containing the dimethylated lysine 4 (H3K4me2), dimethylated lysine 9 (H3K9me2), dimethylated lysine 27 (H3K27me2), trimethylated lysine 27 (H3K27me3), dimethylated lysine 36 (H3K36me2), and dimethylated lysine 79 (H3K79me2). Immunostaining patterns were similar to results reported in previous studies (29–31). However, there were no obvious differences in staining patterns between the 2 groups.

Figure 4 Analysis of spermatogenesis and GS cells derived from F 1 mice. (A) Immunostaining of F 1 testes using antibodies against H3K4me2, H3K9me2, H3K27me2, H3K27me3, H3K36me2, and H3K79me2. One hundred cells in 5 tubules of 3 mice were analyzed per group. Each antigen was assessed using a single antibody. Signal intensity in PNA+ cells was measured. H3K9me2 was omitted for quantification because PNA+ cells did not show H3K9me2 signals. Scale bar: 30 μm. (B) A scatter plot with a list of genes, showing correlation of the DNA methylation data at individual CpG sites in gene promoters (n = 4). Methylation statuses at 237,680 CpG sites were covered. The numbers of identified hypermethylated sites and hypomethylated sites in ICSI-F 1 compared with control-F 1 GS cells are shown in red and blue, respectively, along with the percentage of commonly covered sites. Red or blue lines indicate 20% increased methylation levels or 20% decreased methylation levels in ICSI-F 1 GS cells, respectively. The dashed line indicates the linear regression line. Up, upregulation; Down, downregulation. (C) A scatter plot of gene expression by RNA-Seq (n = 4). (D) Real-time PCR analysis of F 1 GS cells (n = 3). See Supplemental Tables 6 and 7 for details.

To study gene expression in the germline directly, we derived GS cells from ICSI-F 1 and control-F 1 mice. GS cells were derived by collecting CD9-expressing spermatogonia from mature testes by magnetic cell sorting. These cells are enriched for SSCs (32). The morphology and growth characteristics of ICSI-F 1 and control-F 1 GS cells did not show apparent differences. To study the genomic imprinting in both types of GS cells, we performed COBRA. However, all of them showed the same androgenetic DNA methylation patterns (Supplemental Figure 2D).

We then used the reduced representation bisulfite sequencing method to verify the overall genomic methylation (Figure 4B). Of the 237,680 covered CpG sites, our analysis identified 143 (0.06% of commonly covered sites) hypermethylated sites and 19 (0.008% of commonly covered sites) hypomethylated sites in the ICSI-F 1 versus the control-F 1 GS cells (>20% change, R² = 0.9581) (Supplemental Table 2). Gene ontology analysis failed to detect significant association with specific biological functions. Moreover, we were not able to find significant differences in DNA methylation patterns for imprinted genes (Supplemental Figure 6).
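The percentages quoted above follow directly from the site counts, and the >20% criterion can be expressed as a simple per-site rule. The sketch below reproduces the arithmetic and shows one way such a rule might look. It assumes the 20% threshold refers to an absolute difference in methylation level (percentage points) between ICSI-F 1 and control-F 1 GS cells; the function names and the example methylation levels are hypothetical, and this is not the authors' analysis pipeline.

```python
COVERED_SITES = 237_680   # commonly covered CpG sites (from the text)
HYPER_SITES = 143         # hypermethylated in ICSI-F1 vs. control-F1 (from the text)
HYPO_SITES = 19           # hypomethylated in ICSI-F1 vs. control-F1 (from the text)


def percent_of_covered(n_sites, n_covered=COVERED_SITES):
    """Share of covered CpG sites, in percent."""
    return 100.0 * n_sites / n_covered


def classify_site(control_pct, icsi_pct, threshold=20.0):
    """Label a CpG site by its methylation difference (ICSI minus control).

    Assumption: the threshold is an absolute difference in percentage points.
    """
    diff = icsi_pct - control_pct
    if diff > threshold:
        return "hypermethylated"
    if diff < -threshold:
        return "hypomethylated"
    return "unchanged"


print(f"hyper: {percent_of_covered(HYPER_SITES):.2f}% of covered sites")
print(f"hypo:  {percent_of_covered(HYPO_SITES):.3f}% of covered sites")

# Hypothetical methylation levels (percent) for three sites, not study data.
for control_pct, icsi_pct in [(10.0, 45.0), (80.0, 50.0), (60.0, 65.0)]:
    print(control_pct, icsi_pct, classify_site(control_pct, icsi_pct))
```

Together the two classes cover 162 of the 237,680 sites, which is consistent with the high correlation between the two groups reported in Figure 4B.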