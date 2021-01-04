Characteristics of patients. From 2014 to 2020, 115 probands enrolled in the BCM UDN clinical site and 67 family members (182 total) underwent RNA-seq from whole blood and skin fibroblasts (Figure 1 and Table 1). Among all probands with RNA-seq, 72 (63%) had ES, 29 (25%) had GS, and another 14 (12%) had both ES and GS. In terms of tissue source, 49 (~42%) of probands had RNA-seq from fibroblasts, 18 (16%) from blood, and 48 (42%) from both tissues. The majority (70%, n = 81) of probands were in the pediatric age group (<18 years of age), and nearly half (46%, n = 53) had a primary neurologic phenotype, consistent with the overall UDN historical proportions (Table 1). Musculoskeletal and immune phenotypes followed at 22% (n = 25) and 8% (n = 9), respectively, with many probands having multiple system involvement.

Figure 1 Flow diagram outlining BCM UDN RNA-seq diagnostic research process. *Cases diagnosed on initial review of ES/GS without needing RNA-seq. #Undiagnosed but with expression/splicing outliers prompting follow-up studies for potentially novel disease gene discovery. §Five cases diagnosed with ES/GS candidate variant approach were validated using RNA-seq–directed approach. ES, exome sequencing; GS, genome sequencing.

Table 1 Demographics, primary phenotypes, RNA-seq tissue source, and ES/GS counts for proband participants

Comparison of skin fibroblasts and whole blood. Principal component analysis (PCA) demonstrated notably better consistency of gene expression in skin fibroblasts than whole blood (Figure 2). Although 2 distinct clusters were present, the fibroblast data showed less variability. This finding suggests that fibroblast RNA-seq is preferable to whole blood for detecting differences in gene expression that have a biological basis and may be clinically relevant. In addition, fibroblast-derived RNA had a higher number of well-expressed genes with transcripts per million (TPM) values greater than 10 across multiple disease gene sets compared with whole blood. In 10 of 16 gene classes, at least half of the genes were well expressed in fibroblasts compared with only 1 of 16 for whole blood (Supplemental Table 1; supplemental material available online with this article; https://doi.org/10.1172/JCI141500DS1). The most significant difference was noted for aortopathy-associated genes where 80% had a TPM greater than 10 in fibroblasts compared with only 24% in whole blood (P ≤ 1 × 10⁻¹⁰). This pattern was consistent for genes associated with common UDN patient phenotypes including skeletal dysplasias (60% vs. 15%, P ≤ 1 × 10⁻¹⁰), autism/intellectual disability (ID) (58% vs. 25%, P ≤ 1 × 10⁻¹⁰), and epilepsy (42% vs. 15%, P ≤ 1 × 10⁻¹⁰) (Supplemental Table 1). Consistent with the sample type, only immunodeficiency-related genes had a higher percentage of well-expressed genes in whole blood than fibroblasts (58% vs. 45%, P ≤ 1 × 10⁻¹⁰). Notably, fibroblast expression was higher in 92% (12 of 13) of genes identified in the solved cases described here (Supplemental Table 2).
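As an illustration of the "well-expressed" comparison above, the following minimal Python sketch computes the share of a gene set with TPM greater than 10 in each tissue. The gene names and TPM values are hypothetical placeholders, not study data, and the function name is an assumption introduced for this example; the statistical test behind the reported P values is not specified here and is not reproduced.

```python
TPM_THRESHOLD = 10.0  # "well expressed" cutoff stated in the text (TPM > 10)


def fraction_well_expressed(gene_set, tpm_by_gene, threshold=TPM_THRESHOLD):
    """Return the share of genes in gene_set whose TPM exceeds threshold.

    Genes absent from tpm_by_gene are treated as not expressed (TPM 0).
    """
    if not gene_set:
        raise ValueError("gene_set must not be empty")
    n_expressed = sum(1 for gene in gene_set if tpm_by_gene.get(gene, 0.0) > threshold)
    return n_expressed / len(gene_set)


# Hypothetical gene set and per-tissue TPM values, for illustration only.
gene_set = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
fibroblast_tpm = {"GENE_A": 55.0, "GENE_B": 12.5, "GENE_C": 3.0, "GENE_D": 40.2}
blood_tpm = {"GENE_A": 8.0, "GENE_B": 0.4, "GENE_C": 15.1}  # GENE_D not detected

fibroblast_share = fraction_well_expressed(gene_set, fibroblast_tpm)
blood_share = fraction_well_expressed(gene_set, blood_tpm)
print(f"fibroblasts: {fibroblast_share:.0%}, whole blood: {blood_share:.0%}")
```

Applying the same function to each of the 16 gene classes would give the per-class percentages that are compared between tissues.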

Figure 2 Principal component analysis (PCA) plot of gene expression (TPM) in whole blood (blue) and skin fibroblasts (red). Two distinct tissue clusters are visible; however, less variability is present in skin fibroblasts. This suggests that fibroblasts may be better for detecting clinically relevant differences in gene expression by RNA-seq. TPM, transcripts per million; FB, fibroblast; WB, whole blood.

Transcriptome outlier detection. Overall, each proband had an average of 3–4 genes with significantly increased or decreased expression (FDR < 0.05) relative to the entire cohort for both tissues (Supplemental Table 3). We further refined this list by prioritizing known Online Mendelian Inheritance in Man (OMIM) disease genes. For novel disease gene discovery, about 1 in 3 (fibroblasts) and 1 in 6 (whole blood) probands had a gene with low expression that was predicted to be intolerant of loss of function (pLI ≥ 0.9) or to cause a dominant disorder (DOMINO score ≥ 0.8) (Supplemental Table 3). For splicing, we focused on rare events in which a particular splicing junction had not been seen more than twice in the cohort, yielding an average of 60.7 and 22.5 abnormal splicing events per proband in fibroblasts and whole blood, respectively (Supplemental Table 3). We prioritized this list by focusing on known disease genes. Splicing and expression abnormalities were validated by visual inspection of the RNA-seq alignment in the Integrative Genomics Viewer (IGV) (12). Verified results were used for targeted analysis of DNA sequencing data to identify the underlying cause for the transcriptome difference and confirm the diagnosis. For unsolved cases with expression/splicing outliers, additional workup, including GeneMatcher (13) submissions, animal models, and long-read sequencing, was initiated as part of UDN standard practice for novel gene discovery.
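The two filters described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: junctions are represented as hypothetical (chromosome, start, end) tuples, a junction is counted once per sample, the proband's own sample is assumed to count toward the cohort total, and all function and variable names are introduced here for the example.

```python
from collections import Counter

MAX_COHORT_OCCURRENCES = 2  # junction "not seen more than twice in the cohort"
PLI_CUTOFF = 0.9            # loss-of-function intolerance threshold from the text
DOMINO_CUTOFF = 0.8         # dominant-disorder prediction threshold from the text


def rare_junctions(junctions_by_sample, sample_id, max_occurrences=MAX_COHORT_OCCURRENCES):
    """Return junctions in sample_id seen in at most max_occurrences samples.

    Assumption: the count includes the sample itself, and each sample
    contributes at most one observation per junction.
    """
    cohort_counts = Counter()
    for junctions in junctions_by_sample.values():
        cohort_counts.update(set(junctions))
    own = set(junctions_by_sample[sample_id])
    return sorted(j for j in own if cohort_counts[j] <= max_occurrences)


def is_discovery_candidate(pli=None, domino=None):
    """True if a low-expression gene meets either constraint cutoff."""
    meets_pli = pli is not None and pli >= PLI_CUTOFF
    meets_domino = domino is not None and domino >= DOMINO_CUTOFF
    return meets_pli or meets_domino


# Hypothetical toy cohort: one common junction and one rare junction.
toy_cohort = {
    "P1": [("chrX", 100, 200), ("chrX", 100, 350)],
    "P2": [("chrX", 100, 200)],
    "P3": [("chrX", 100, 200)],
    "P4": [("chrX", 100, 200), ("chrX", 100, 350)],
}
print(rare_junctions(toy_cohort, "P1"))
print(is_discovery_candidate(pli=1.0), is_discovery_candidate(pli=0.2, domino=0.5))
```

In practice the surviving junctions and genes would still be restricted to known disease genes and inspected in IGV, as described in the text.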

Diagnoses made with transcriptome-guided genomic analysis. Of the 115 probands who underwent RNA-seq, 32 (28%) were diagnosed via other methods such as research ES/GS analysis or clinical evaluation without the need for RNA-seq (Figure 1). We first validated the transcriptome-guided method in the 5 cases previously diagnosed with RNA-seq via a traditional candidate approach (Table 2). These 5 validation cases had all undergone ES, and 1 also had GS. Of the remaining 78 undiagnosed probands (Figure 1), 41 (53%) had ES, 25 (32%) had GS, and 12 (15%) had both ES and GS. A diagnosis was made in an additional 9 of these cases using the new technique (Table 2). All 9 had undergone ES, and 7 of them also had GS, which was needed to identify the genomic event responsible for the RNA-seq finding. Across the entire cohort, RNA-seq led to an overall diagnostic rate of 12% (14 of 115; 95% CI, 7%–19%). Excluding the 32 cases solved on ES/GS alone without the need for RNA-seq, the diagnostic rate was 17% (14 of 83; 95% CI, 10%–26%). The causative genomic variants identified through RNA-seq included synonymous (n = 1), near intronic (3–50 bp from canonical exon boundary, n = 2), deep intronic (>50 bp away from canonical exon boundary, n = 4), promoter (n = 1), and canonical splice site single-nucleotide variants (SNVs) (n = 1) as well as both coding (n = 3) and noncoding (n = 2) deletion copy number variants (CNVs) (Figure 3). Among solved cases, 7 (50%) had RNA-seq from fibroblasts only, 1 (7%) from whole blood only, and 6 (43%) from both (Table 2). Notably, in those 6 cases, the RNA-seq from whole blood failed to identify the causative defect in half (n = 3), while none were missed from fibroblasts. This strategy also streamlined our analysis workflow; after one-time processing, the abnormally expressed genes and splicing events guided targeted analysis of existing sequencing data, and any additional confirmatory testing was done to make the final diagnosis. This contrasts with the 1–4 hours we typically require for an ES/GS research analysis to identify candidate variants and manually inspect the transcriptome for abnormalities. In one recent report, the time required was up to 6–8 hours for genome analysis (14). The following case examples illustrate how this approach led to diagnoses and also show the limitations of commonly used diagnostic tests.
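The diagnostic rates and their 95% CIs can be checked with a short calculation. The method used to compute the intervals is not stated in this section; the sketch below assumes a Wilson score interval, which reproduces the reported 7%–19% for 14 of 115. The second denominator is taken as 115 − 32 = 83 probands not solved by ES/GS alone, for which the same assumption gives 10%–26%. The function name is introduced here for illustration.

```python
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * ((p * (1 - p) / n + z * z / (4 * n * n)) ** 0.5) / denom
    return center - half_width, center + half_width


DIAGNOSED_BY_RNASEQ = 14
ALL_PROBANDS = 115
SOLVED_WITHOUT_RNASEQ = 32

for label, n in (
    ("entire cohort", ALL_PROBANDS),
    ("excluding ES/GS-only diagnoses", ALL_PROBANDS - SOLVED_WITHOUT_RNASEQ),
):
    low, high = wilson_interval(DIAGNOSED_BY_RNASEQ, n)
    rate = DIAGNOSED_BY_RNASEQ / n
    print(f"{label}: {rate:.0%} ({DIAGNOSED_BY_RNASEQ} of {n}; "
          f"95% CI, {low:.0%}-{high:.0%})")
```

Other interval methods, such as the exact Clopper-Pearson interval, give slightly different bounds, so agreement with the Wilson interval does not confirm which method was used.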

Figure 3 Causative genomic variants identified through RNA-seq–directed genomic analysis. Variant types included synonymous (n = 1), near intronic (3–50 bp from canonical exon boundary, n = 2), deep intronic (>50 bp away from canonical exon boundary, n = 4), promoter (n = 1), and canonical splice site SNVs (n = 1) as well as both coding (n = 3) and noncoding (n = 2) deletion CNVs. SNV, single-nucleotide variant; CNV, copy number variant.

Table 2 Causative variants identified with the transcriptome-directed approach

Case 1 — PQBP1: This case involved a 3-year-old male referred to the UDN with multiple congenital anomalies. Medical history was significant for congenital heart defects (ventricular septal defect and patent ductus arteriosus), vertebral anomalies (butterfly vertebrae), ectopic pelvic left kidney, hypospadias, sensorineural hearing loss, and failure to thrive. Developmentally he was delayed, rolling over at 12 months and not yet sitting independently or speaking at age 3 years. On exam, he was small (weight –2.45 SD, height –3.1 SD) and dysmorphic with microbrachycephaly (occipitofrontal circumference –4.88 SD), deep-set eyes, midface hypoplasia, broad nose, low-set ears, high palate, and 4-5 toe syndactyly among other findings (Figure 4A). Family history was significant for a maternal half-brother (not enrolled in the study) with VACTERL association (vertebral defects, anal atresia, cardiac defects, tracheo-esophageal fistula, renal anomalies, and limb abnormalities) and a half-sibling who died in utero with cardiac defect and gastroschisis versus limb-body wall defect. The differential diagnosis included Coffin-Siris syndrome or chromosomal abnormality; however, trio ES and CMA were negative, as was GS sent through the UDN.

Figure 4 Case 1 — Renpenning syndrome. (A) Dysmorphic features, including microbrachycephaly, deep-set eyes, midface hypoplasia, broad nose, and low-set ears. (B) GS with hemizygous deep intronic PQBP1 variant (green) inherited from heterozygous mother. (C) RNA-seq sashimi plot from whole blood showing out-of-frame pseudoexon and distal intron retention in the proband (red) but absent from controls (blue/green). Black arrow indicates the location of PQBP1 intronic variant. GS, genome sequencing.

RNA-seq analysis detected a nearly 50% reduction in the expression of PQBP1 in the proband compared with controls in whole blood. Reanalysis of GS data revealed a hemizygous deep intronic variant in PQBP1 (c.180-306G>A) inherited from the heterozygous mother that activated a cryptic splice donor near the variant site (Figure 4B). The RNA-seq pipeline also detected an abnormal splicing pattern that resulted in an out-of-frame pseudoexon between exons 3 and 4, as well as more distal intron retention (Figure 4C).

Defects in PQBP1 cause Renpenning syndrome (MIM 309500), an X-linked ID syndrome characterized by males with microcephaly, short stature, cardiac and renal anomalies, small testes, and dysmorphic features (15). The encoded polyglutamine-binding protein 1 has been shown to have an essential role in neurodevelopment (16). Most of the causative PQBP1 variants are exonic frameshift deletions leading to markedly reduced gene expression via nonsense-mediated mRNA decay (NMD) and impaired protein function (17, 18). The RNA-seq findings in this proband were consistent with NMD due to the out-of-frame pseudoexon creation and other splicing abnormalities. With the RNA-seq results and substantial phenotypic overlap, we diagnosed Renpenning syndrome in the proband. Notably, no sequencing reads covered this variant on the previous ES, nor did it appear on the GS report. In addition, the SpliceAI prediction tool (19) considered this to be a benign change (score 0.33) unlikely to affect splicing. Therefore, the RNA-seq–directed analysis was indispensable in making the diagnosis. As the proband’s mother was heterozygous for this change, there were also important recurrence risk issues discussed with the family.

Case 2 — CLTC: This case involved a 14-year-old male enrolled in the UDN with a history of ID and dysmorphic features. Delays in development were global, with walking occurring at 2.5 years and first words at 4–5 years. At age 14 years, his IQ was measured at 60–70 with academic skills at a second-grade level. He had maladaptive behaviors, including aggressive features, self-harm, violent outbursts, and refusal to eat, necessitating a G-tube placement. Other issues included chronic constipation and seizures. Physical exam was significant for marked hypertelorism, broad forehead, low posterior hairline, and hypotonia (Figure 5A). An extensive previous genetic workup, including karyotype, CMA, trio ES, and fragile X testing, was negative.

Figure 5 Case 2 — CLTC-associated ID syndrome. (A) Dysmorphic features, including hypertelorism, broad forehead, and low posterior hairline. (B) CLTC deletion (red bar) on GS encompassing exons 18–32. (C) PCR confirmation of deletion in proband and father but absent from mother and control (NA12878). Expected size with deletion = 391 bp. GS, genome sequencing; ID, intellectual disability.

RNA-seq analysis of both whole blood and fibroblast data demonstrated approximately half the normal expression of 2 genes, CLTC and PTRH2. These 2 genes are adjacent to each other on chromosome 17q23.1, suggesting a possible contiguous deletion. Defects in CLTC are associated with an autosomal dominant disorder (MIM 617854) with a variable phenotype that includes ID, developmental delay (DD), and epilepsy (20, 21). In contrast, infantile-onset multisystem neurologic, endocrine, and pancreatic disease (IMNEPD) is caused by biallelic pathogenic variants in PTRH2 (22).

Genome sequencing revealed a heterozygous 22.7 kb deletion (chr17: 57756685–57779426) that removed the segment from exon 18 to the transcription end of CLTC and part of the adjacent PTRH2 (Figure 5B), consistent with a diagnosis of CLTC-associated ID. Polymerase chain reaction (PCR) analysis (Figure 5C) confirmed the deletion in the proband and his father. Notably, the father reported a history of special-education classes due to learning difficulties, a finding consistent with the variable expressivity of the CLTC-related syndrome (21). The inherited nature of the deletion also raised important genetic counseling issues for the family. Of note, this deletion was not called on trio ES, and there was no coverage of CLTC on the previous CMA.

Case 3 — KANSL1: The third case involved a 7-year-old female referred to the UDN with ID, DD, dysmorphic features, and epilepsy. Developmentally, she sat at 7 months and walked at 23 months. At age 7 years, her IQ was in the 50s, and she was only able to combine 2–3 words. Dysmorphic features included blepharophimosis, epicanthal folds, protruding ears, and a tubular nose with a broad tip (Figure 6A). Other significant findings in her history included scoliosis, hyperopia, strabismus, and mild joint hypermobility. Her parents and sister were in good health, and other family history was noncontributory. Trio ES, including subsequent reanalysis, was negative. CMA was negative in 2012 and 2018.

Figure 6 Case 3 — Koolen–de Vries syndrome. (A) Dysmorphic features, including blepharophimosis, epicanthal folds, protruding ears, and a tubular nose with a broad tip. (B) Exon 14 SNP (red box) in ES but absent in RNA-seq consistent with loss of that allele. (C) PCR confirmation of de novo deletion in proband absent from parents and control (NA12878). Expected size with deletion = 926 bp. ES, exome sequencing; SNP, single-nucleotide polymorphism.

RNA-seq analysis of fibroblast data identified a nearly 50% decrease in expression of KANSL1. Defects in KANSL1 cause Koolen–de Vries (KdV) syndrome (MIM 610443), an autosomal dominant ID syndrome with distinctive facial features, epilepsy, congenital heart defects, and renal and urologic anomalies (23, 24). The KdV syndrome is caused by either a heterozygous microdeletion at chromosome 17q21.31 that includes KANSL1 or heterozygous loss-of-function variants in KANSL1 (23, 24). On manual review of this patient’s sequencing, a heterozygous KANSL1 single-nucleotide polymorphism (SNP) in exon 14 inherited from the father was present in the ES data but absent from the RNA-seq, suggesting that allele was not expressed (Figure 6B). Manual inspection of the ES data revealed approximately half the expected coverage of the initiation codon–containing exon 2 in the proband compared with the parents (Supplemental Figure 1A). A subsequent review of CMA data showed evidence of a deletion at that locus (Supplemental Figure 1B). However, the deletion was considered benign, as it lay in a known complex region with common background CNV variation where other similar benign losses have been reported in ClinVar and Decipher (25–27). GS failed to call any variants here due to problems in read-mapping in the region. Nevertheless, given the RNA-seq results and phenotypic fit, PCR analysis was done and identified a 307 kb heterozygous deletion (chr17: 44174219–44481307) at 17q21.31 removing the first 2 exons of KANSL1 (Figure 6C), consistent with a diagnosis of KdV syndrome. The deletion was not present in either parent, indicating a low risk of recurrence. Additional KdV-specific management, including screening for cardiac and urogenital defects, was initiated.

Case 4 — NSD2: The fourth case involved a 26-year-old male with DD, failure to thrive, unilateral hearing loss, microcephaly, and myopathy. He walked at 2 years and had delays in fine motor control and language. He completed 12th grade with special education. Physical exam was significant for microcephaly (occipitofrontal circumference –2.93 SD), brachycephaly, microstomia, and decreased muscle bulk and tone. An extensive genetic workup, including karyotype, CMA, myotonic dystrophy type 1 testing, mitochondrial testing, and duo ES, was nondiagnostic.

RNA-seq analysis of both whole blood and fibroblast data demonstrated approximately half-normal expression of NSD2. Also known as WHSC1, NSD2 is one of 2 genes within the Wolf-Hirschhorn syndrome (WHS) critical region (WHSCR) on chromosome 4p16.3. NSD2 is predicted to be intolerant of loss of function (pLI = 1), and reports have described truncating NSD2 variants in association with a phenotype resembling a mild form of WHS (28–30).

Because the negative prior ES and CMA suggested a noncoding causative variant, GS was requested and revealed a heterozygous 3.9 kb deletion (chr4: 1870996–1874851) containing part of NSD2 (Figure 7A). The deletion encompassed all of NSD2 exon 1 (representing >80% of the 5′UTR), including the transcription start site and the upstream region containing the promoter and enhancer elements (31). PCR analysis confirmed that the deletion was not inherited from the mother (Figure 7B); however, a paternal sample was not available for segregation. Notably, the small noncoding deletion was not called on ES or CMA due to the lack of coverage in this region.

Figure 7 Case 4 — NSD2-associated ID syndrome. (A) A 3.9 kb deletion (red bar) including the NSD2 transcription start site and upstream promoter/enhancer elements detected by GS. (B) PCR confirmation of deletion in proband absent from mother and control (NA12878). Expected size with deletion = 635 bp. GS, genome sequencing; ID, intellectual disability.

With the deletion finding as well as both whole-blood and fibroblast transcriptome data demonstrating half-normal NSD2 expression, the diagnosis of NSD2-associated neurodevelopmental disorder was made. The proband’s phenotype was consistent with a mild form of WHS in that he had learning disabilities, decreased muscle bulk/hypotonia, hearing loss, microcephaly, and postnatal growth retardation.
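The deletion sizes quoted for Cases 2–4 follow directly from the reported breakpoint coordinates, as the short Python check below shows. It is a sketch only: the span is taken as end minus start, and the coordinate convention (inclusive or half-open) is an assumption that changes the result by at most 1 bp. The dictionary and function names are introduced here for illustration.

```python
# Breakpoint coordinates as reported in the case descriptions.
DELETIONS = {
    "Case 2, CLTC": ("chr17", 57756685, 57779426),
    "Case 3, KANSL1": ("chr17", 44174219, 44481307),
    "Case 4, NSD2": ("chr4", 1870996, 1874851),
}


def deletion_length_bp(start, end):
    """Span between breakpoints in base pairs (end - start; convention assumed)."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start


for case, (chrom, start, end) in DELETIONS.items():
    length = deletion_length_bp(start, end)
    print(f"{case}: {chrom}:{start}-{end} spans {length} bp ({length / 1000:.1f} kb)")
```

The three spans round to 22.7 kb, 307 kb, and 3.9 kb, matching the sizes given in the text.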