Common genetic variation associated with telomere length contributes to TBD variant penetrance and expressivity in inherited bone marrow failure syndrome cohorts.

To assess whether common genetic variation impacts penetrance and expressivity in TBDs, we developed polygenic scores (PGS) using genome-wide common SNPs associated with telomere length (73, 74, 77). For a given individual, these scores provide an estimate of the combined effect of common genetic variants across the genome that increase or decrease telomere length (78, 79). While these common polymorphisms underlie subtle variation in telomere length in the healthy population, we reasoned that the PGS might have a more profound impact in the context of high-impact monogenic alleles that are causal for the TBDs. We therefore applied the most predictive PGS, as determined using PRSice-2 (Supplemental Table 1; supplemental material available online with this article; https://doi.org/10.1172/JCI191107DS1), to the National Cancer Institute (NCI) longitudinal cohort of individuals with inherited bone marrow failure syndromes, including those with TBDs (ClinicalTrials.gov Identifier: NCT00027274). The NCI cohort included 92 patients with dyskeratosis congenita and related TBDs and is significantly enriched for individuals presenting with bone marrow failure or other severe phenotypes (80).
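As a minimal illustration of how such a score can be computed, the sketch below sums effect-allele dosages weighted by per-SNP effect sizes and expresses the result in SD units of a reference population. The function names, weights, and genotypes are hypothetical and chosen only for illustration; this is not the PRSice-2 procedure, which also handles clumping and P value thresholding.

```python
from statistics import mean, stdev


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2 copies per SNP)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(score, reference_scores):
    """Express a raw score in SD units of a reference population."""
    return (score - mean(reference_scores)) / stdev(reference_scores)


# Hypothetical per-SNP effects on telomere length (illustrative values only).
weights = [0.05, -0.03, 0.02]

# Hypothetical reference genotypes standing in for a population biobank.
reference_genotypes = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [2, 0, 1], [0, 2, 1]]
reference = [polygenic_score(g, weights) for g in reference_genotypes]

# A hypothetical individual carrying two copies of the telomere-shortening allele.
patient = polygenic_score([0, 2, 0], weights)
z = standardize(patient, reference)  # negative: shorter predicted telomere length
```

Standardizing against the reference distribution is what allows shifts to be reported in SD units, as in the cohort comparisons that follow.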

We reasoned that if both monogenic germline mutations and polygenic predisposition to short telomeres contribute to the clinical severity of TBDs and the likelihood of early-onset manifestations, the distribution of genetically predicted telomere length in this clinically ascertained cohort would be shifted towards shorter telomeres compared with the population average (Figure 2B). Consistent with this, individuals with TBDs enriched for early-onset bone marrow failure phenotypes (n = 92) had a median polygenic score 0.44 SDs shorter than the UK Biobank (P = 1.04 × 10–4), and 0.37 SDs shorter than the external All of Us cohort (P = 5.82 × 10–4) (Figure 2C and Supplemental Figure 1A). To estimate the effect size of the polygenic contribution, we binned the PGS distribution into quintiles based on the UK Biobank population distribution and calculated odds ratios with the number of TBD patients in each quintile representing cases and with UK Biobank participants as controls. Individuals in the lowest quintile of genetically predicted telomere length had approximately three-fold odds of being a TBD case compared with those in the highest quintile (Figure 2D). We validated these findings in a separate cohort with 190 TBD patients from the Dyskeratosis Congenita Registry (DCR) at Queen Mary University of London (81). The DCR cohort has a broader referral base and is less enriched for reported severe phenotypes compared with the NCI cohort (80, 81); consistent with this, we observed a slightly attenuated but consistent effect compared with our original NCI discovery cohort (median difference –0.20 SDs, P = 0.009) (Supplemental Figure 1B). A combined analysis across both TBD cohorts further demonstrated a consistent and strongly significant association between an individual’s PGS and their odds of having a TBD (median difference –0.28 SDs, P = 1.18 × 10–5) (Figure 2E and Supplemental Figure 1C).
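The quintile analysis can be sketched as follows, assuming a simple count-based odds ratio with the highest quintile as the reference group. The helper names and the toy scores are hypothetical, and the published estimates may have been obtained differently (for example, by regression with covariates).

```python
from statistics import quantiles


def quintile_index(score, cutpoints):
    """Return 0 (lowest quintile) to 4 (highest) given four ascending cutpoints."""
    return sum(score > c for c in cutpoints)


def quintile_odds_ratios(case_scores, control_scores, reference_quintile=4):
    """Odds of being a case in each quintile relative to the reference quintile."""
    # Quintile boundaries are defined on the control (population) distribution.
    cutpoints = quantiles(control_scores, n=5)
    cases, controls = [0] * 5, [0] * 5
    for s in case_scores:
        cases[quintile_index(s, cutpoints)] += 1
    for s in control_scores:
        controls[quintile_index(s, cutpoints)] += 1
    if 0 in controls or cases[reference_quintile] == 0:
        raise ValueError("empty cell: odds ratio is undefined")
    ref_odds = cases[reference_quintile] / controls[reference_quintile]
    return [(cases[q] / controls[q]) / ref_odds for q in range(5)]


# Toy data: 100 evenly spread controls and 13 cases skewed toward low scores.
control_scores = list(range(100))
case_scores = [1, 2, 3, 4, 5, 6, 25, 30, 45, 50, 65, 85, 90]
ors = quintile_odds_ratios(case_scores, control_scores)
```

In this toy example, six cases fall in the lowest quintile and two in the highest, so the lowest-quintile odds ratio is 3 by construction.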

Figure 2 Polygenic modification of TBD expressivity in disease cohorts. (A) Illustration. TBD-associated germline variants affect genes involved in telomere length and integrity. Variable expressivity of TBD variants results in diverse phenotypic presentations and age of onset. (B) Schematic of distribution of TBD-case telomere length polygenic scores compared with biobanks, under different hypotheses. If common variation affecting telomere length contributes to TBD expressivity and disease cohort ascertainment, left-shifted (towards shorter TL) PGS distribution would be expected (top panel). The null hypothesis is that TBD high-impact variants overpower any effects of common variation (central panel). An alternative hypothesis is that common variation predisposing to long telomere length protects TBD variant carriers from severe phenotypes and mortality; under this model, a right-shifted PGS could be observed (bottom panel). (C) Distribution of telomere length PGS in NCI TBD cases compared to the UK Biobank (Welch’s 2-tailed t test, P = 1.037 × 10–4). (D) Odds ratio of case-control status versus telomere length PGS quintile, NCI TBD cases. (E) Comparison of meta-analysis of telomere length PGS distribution in NCI and DCR cases versus UK Biobank (Welch’s 2-tailed t test, P = 1.18 × 10–5). (F) Comparison of NCI non-TBD IBMFS case telomere length PGS versus UK Biobank (Welch’s 2-tailed t test, P = 0.5124).
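For readers unfamiliar with the test named in the legend, a minimal sketch of the Welch statistic and its Welch-Satterthwaite degrees of freedom is given below. The sample values are hypothetical, and the sketch stops short of the P value, which requires the t distribution (available, for example, in SciPy).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and degrees of freedom for two samples with unequal variances."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of each sample mean.
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical standardized PGS values for a small case group and a control group.
cases = [-1.0, -0.5, 0.0, -0.8, -0.2]
controls = [0.1, -0.1, 0.3, 0.0, 0.2, -0.3]
t_stat, dof = welch_t(cases, controls)  # t_stat is negative: cases are left-shifted
```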

Interestingly, in both the NCI and DCR cohorts, a subset of patients harbored no known causal high-impact TBD mutation. These patients may have lower effect-size mutations that past TBD gene discovery efforts have been unable to detect. Under a simple liability-threshold model in which rare large-effect variants, common small-effect variants, and environmental effects combine to drive disease risk, patients with no identified large-effect variant are expected to have, on average, a larger polygenic contribution to their disease risk (82, 83). To test this, we separated patients with and without a known TBD variant. Patients both with and without a known variant had a significantly shifted PGS predictive of short telomeres compared with the population average. Those with no known variant had a greater PGS burden for shorter telomeres than individuals with a known variant (Supplemental Figure 1D), though the difference was not statistically significant. While underpowered, this analysis hints at a model in which polygenic variation might contribute to TBD risk and variant penetrance and expressivity.
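The expectation under the liability-threshold model can be illustrated with a small simulation. All parameter values below (carrier frequency, rare-variant effect, threshold, variance split) are hypothetical choices for illustration and are not estimates from the cohorts. Here, a higher polygenic term denotes greater predisposition to disease, that is, to shorter telomeres.

```python
import random
from statistics import mean


def simulate_case_polygenic_means(n=50_000, carrier_freq=0.02, rare_effect=2.0,
                                  threshold=2.5, seed=0):
    """Mean polygenic liability among affected carriers and affected noncarriers."""
    rng = random.Random(seed)
    sd = 0.5 ** 0.5  # polygenic and environmental terms each have variance 0.5
    carrier_cases, noncarrier_cases = [], []
    for _ in range(n):
        carrier = rng.random() < carrier_freq
        polygenic = rng.gauss(0.0, sd)
        environment = rng.gauss(0.0, sd)
        # Liability is the sum of rare, polygenic, and environmental contributions.
        liability = rare_effect * carrier + polygenic + environment
        if liability > threshold:  # individuals above the threshold are affected
            (carrier_cases if carrier else noncarrier_cases).append(polygenic)
    return mean(carrier_cases), mean(noncarrier_cases)


carrier_mean, noncarrier_mean = simulate_case_polygenic_means()
```

Affected individuals without the large-effect variant need more polygenic (and environmental) liability to cross the threshold, so their mean polygenic term is higher than that of affected carriers, while both groups are shifted relative to the population mean of zero.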

We considered the possibility that population stratification and other demographic factors contributing to differences in the PGS across populations could underlie our observations (84–87). We used kinship-based inference for GWAS (KING) and 1000 Genomes reference data to infer ancestry in our patient cohorts and the UK Biobank (88, 89). The vast majority of individuals in each of the NCI, DCR, and UK Biobank cohorts were of predominantly European ancestry, followed by Admixed American ancestry (Supplemental Figure 1E, Supplemental Table 18). In conducting the GWAS and constructing the polygenic scores, we restricted our analyses to the European ancestry subset in the UK Biobank to minimize the effects of population stratification and maximize predictive accuracy in the TBD patients (see Methods). We closely examined our results to determine if ancestry had a significant effect. We found that the PGS distribution did not differ significantly between European and non-European individuals in the patient cohorts (P = 0.87) (Supplemental Figure 1F). Furthermore, the PGS in the cohorts was not associated with the ancestry principal components inferred by KING (Supplemental Figure 1G) (84–87). As a negative control, we also assessed non-TBD inherited bone marrow failure syndrome cases, which included individuals diagnosed with Diamond-Blackfan Anemia, Fanconi Anemia, and Shwachman-Diamond Syndrome and came from the same NCI bone marrow failure syndrome cohort as the patients with TBDs. For these conditions, telomere length does not drive the disease process, and mutations in non-telomere-related genes are implicated (71, 72, 90–92). As expected, the polygenically predicted telomere length for these patients was indistinguishable from the UK Biobank population average (P = 0.51) (Figure 2F), and the telomere length PGS for the TBD patients was shifted 0.49 SDs towards shorter predicted telomeres compared with non-TBD inherited bone marrow failure syndrome patients (P = 3.99 × 10–4), indicating that the observed phenomenon is specific to telomere disease and is unlikely to be due to population stratification (Supplemental Figure 1H).

Polygenic variation associated with telomere length impacts TBD penetrance and expressivity in population biobanks.

Taken together, our analyses of the NCI and DCR cohorts indicate that common genetic variants associated with short telomere length contribute to ascertainment as a TBD case in disease cohorts enriched for individuals with childhood-onset bone marrow failure. We reasoned that the reverse should also be true: TBD causal variants should be present in adult population biobanks, and adults with a pathogenic variant who avoided the severe childhood-onset manifestations of TBDs should not have polygenically predicted short telomere length (Figure 2A). To test this, we examined the UK Biobank for carriers of variants in the genes known to cause TBDs (see Methods and Supplemental Table 3).

To be maximally comprehensive while maintaining stringency, we defined multiple variant sets, given that each variant annotation approach has limitations for predicting true pathogenicity (93). First, we included variants annotated in ClinVar as causing dyskeratosis congenita or a related TBD with high confidence, hereafter referred to as “ClinVar Pathogenic.” Applying quality-control, dominance, and ancestry filters, we identified 213 variant carriers (Figure 1B). Next, we defined a more restrictive subset of ClinVar variants including only carriers of variants specifically annotated to cause childhood-onset TBDs in a dominant manner and male carriers of DKC1, resulting in 22 variant carriers, referred to as “ClinVar Dominant-Acting.” Finally, to be maximally inclusive of potentially pathogenic variants that may not have been annotated in ClinVar, we defined a set of predicted pathogenic rare coding variants in TBD genes using a consensus of Ensembl Variant Effect Predictor, LOFTEE, and AlphaMissense annotations, resulting in 1,666 carriers in the UK Biobank (“Consensus Predicted Pathogenic”) (see Methods and Supplemental Tables 4–12) (94–96). Genes that harbor TBD-causal mutations are often classified based on mode of inheritance (16, 17, 97). For all of these analyses, we excluded genes that cause TBDs in an exclusively autosomal recessive manner, given the challenges present in determining the phase of mutations (see Methods).

Supporting the pathogenicity of the variants in the associated sets, individuals in each group in the UK Biobank had shorter measured telomere length (TL) compared with noncarriers. Furthermore, the magnitude of effect was concordant with the expected order of pathogenicity, with the most inclusive Consensus Predicted Pathogenic cohort associated with the smallest average decrease in TL (0.31 SDs), the more restrictive ClinVar Pathogenic cohort associated with a 0.85 SD decrease in mean TL, and the most restrictive ClinVar Dominant-Acting cohort showing the largest average decrease in TL (1.15 SDs) (Figure 3A).

Figure 3 Polygenic modification of TBD expressivity in the UK Biobank. (A) Measured telomere length in UK Biobank noncarriers and carriers of pathogenic TBD variants (pairwise 1-sided t tests with Bonferroni multiple testing correction; noncarrier versus Predicted Pathogenic: P = 3.80 × 10–20; noncarrier versus ClinVar Pathogenic: P = 3.46 × 10–23; noncarrier versus ClinVar Dominant-Acting: P = 1.39 × 10–3). (B) TL PGS in UK Biobank noncarriers and carriers of pathogenic TBD variants (pairwise 2-sided t test with Bonferroni multiple testing correction; noncarrier versus Predicted Pathogenic: P = 1; noncarrier versus ClinVar Pathogenic: P = 0.20; noncarrier versus ClinVar Dominant-Acting: P = 1). (C) TL PGS in NCI cases compared to UKB pathogenic variant carriers (pairwise 1-sided t test with Bonferroni multiple testing correction; TBD case versus Predicted Pathogenic: P = 0.00087; TBD case versus ClinVar Pathogenic: P = 0.041; TBD case versus ClinVar Dominant-Acting: P = 0.0896). (D) Odds ratios of aplastic anemia and idiopathic pulmonary fibrosis in carriers of pathogenic TBD variants compared with noncarriers (logistic regression adjusting for age and sex). (E) Blood cell counts in UK Biobank noncarriers and carriers of pathogenic TBD variants (pairwise t test with Bonferroni multiple testing correction). (F) Odds ratios of idiopathic pulmonary fibrosis in UK Biobank stratified by PGS tertile (top third, middle third, and lowest third) and ClinVar Path or Predicted Path variant-carrier status, using noncarrier intermediate group as the control group (logistic regression adjusting for age, sex, and first 4 ancestry PCs).

Despite having short telomeres, all three variant carrier cohorts had a population-normal polygenic contribution to telomere length on average compared with noncarriers of TBD variants in the UK Biobank (Figure 3B), and significantly longer predicted telomere length than the TBD cohorts (Figure 3C). Interestingly, the 22 carriers of the most severe mutations (“ClinVar Dominant-Acting”) appeared to have a right-shifted polygenic score distribution relative to noncarriers, suggestive of a protective effect that could help explain why carriers of these large-effect variants escaped early-life manifestations of disease, but this difference was not statistically significant given the small sample size (Figure 3B). These findings complement our results in patient cohorts, demonstrating that individuals with TBD mutations who do not have early clinical presentations have a relatively decreased common variant burden for short telomeres compared with childhood-onset disease cohorts. Together, these findings support the idea that the risk of childhood-onset severe TBD manifestations due to large-effect causal TBD gene variants can be modified by common variants that impact telomere length.

We then assessed whether these pathogenic variant carriers were enriched for childhood or adult-onset TBD phenotypes relative to noncarriers in the UK Biobank. Importantly, examining childhood-onset TBD phenotypes, we found no evidence for increased risk of bone marrow failure or altered blood counts in these TBD variant carriers (Figure 3, D and E). We then examined idiopathic pulmonary fibrosis, an adult TBD manifestation (97, 98). We found that being a carrier of a pathogenic variant was associated with greatly increased odds of presenting with idiopathic pulmonary fibrosis (Figure 3D). We wondered whether common variation also affects the penetrance of these adult-onset manifestations of TBDs. Strikingly, we found that within both variant carriers and noncarriers, telomere length PGS stratified risk of idiopathic pulmonary fibrosis (Figure 3F and Supplemental Figure 2D, combined VEP and ClinVar carrier analysis and separated, respectively).

We sought to quantify the effects of PGS in both rare TBD variant carriers and noncarriers and also asked whether there was evidence for a nonadditive interaction between polygenic effects on telomere length and pathogenic variants. We regressed idiopathic pulmonary fibrosis disease status on PGS, variant carrier status, and an interaction term between the two, while controlling for age, sex, and the first 4 ancestry principal components (Supplemental Table 13). Including both carriers and noncarriers, a 1 SD decrease in PGS (predicting shorter telomere length) was associated with an odds ratio of 1.22 for idiopathic pulmonary fibrosis (P = 9.04 × 10–27). The included interaction term between rare variant status and PGS was not statistically significant and did not affect the coefficient estimates (Supplemental Table 14). Restricting to only rare variant carriers, this association was consistent, but not statistically significant (though possibly limited in statistical power due to small sample size), with a 1-unit decrease in PGS associated with an odds ratio of 1.31 for having idiopathic pulmonary fibrosis (P = 0.108) (Supplemental Table 15). We confirmed through a mediation analysis that the effect of telomere length PGS on pulmonary fibrosis risk is mediated through telomere length (Supplemental Table 16 and Supplemental Note) (99). These results support a model in which small-effect polygenic variants and pathogenic large-effect variants independently contribute to adult TBD manifestations by affecting telomere length and maintenance, although this analysis may be limited by power (see Methods).
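The regression described above can be written out as follows. The notation is ours and is introduced only for illustration: C_i denotes rare variant carrier status, x_i collects age, sex, and the first 4 ancestry principal components, and the PGS is assumed to be in SD units. The second line converts the reported odds ratios to the log-odds scale, reading the main-effect coefficient as the per-SD effect, which is reasonable here because the interaction term was not significant and did not change the estimates.

```latex
\begin{align*}
\operatorname{logit}\,\Pr(\mathrm{IPF}_i = 1)
  &= \beta_0 + \beta_{\mathrm{PGS}}\,\mathrm{PGS}_i + \beta_{\mathrm{C}}\,C_i
     + \beta_{\mathrm{int}}\,(\mathrm{PGS}_i \times C_i)
     + \boldsymbol{\gamma}^{\top}\mathbf{x}_i \\
\mathrm{OR}_{\text{per 1 SD decrease}}
  &= e^{-\beta_{\mathrm{PGS}}} = 1.22
  \;\Longrightarrow\; \beta_{\mathrm{PGS}} = -\ln(1.22) \approx -0.20
\end{align*}
```

By the same conversion, the carrier-only odds ratio of 1.31 corresponds to a coefficient of approximately -0.27 on the log-odds scale.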

In summary, we found that carriers of TBD-causing variants in the UK Biobank, in contrast to TBD cohorts, have a population-normal PGS and no enrichment for bone marrow failure, but do have increased risk of idiopathic pulmonary fibrosis, a common adult manifestation of TBDs. While the UK Biobank variant carriers were not enriched for PGS associated with short telomere length overall, common genetic influences captured by the PGS do impact the penetrance of TBD mutations associated with idiopathic pulmonary fibrosis. Collectively, these findings demonstrate that both pathogenic mutations and common genetic variation associated with telomere length combine to impact penetrance and expressivity.

Within-family polygenic effects on disease risk.

Having observed a significant effect of common polygenic variation associated with telomere length on clinical manifestations of TBDs in large disease cohorts and population biobanks, we wondered whether polygenic variation also affects penetrance and expressivity within a single family with a shared causal variant. To explore this question, we analyzed a large kindred with multiple dyskeratosis congenita cases and a heterozygous pathogenic variant in the TERT gene (ClinicalTrials.gov Identifier: NCT00027274). Of the 22 family members for whom genotype data were available, 12 were carriers of the pathogenic TERT variant, 3 of whom had clinically diagnosed TBDs (Figure 4A). We restricted all analyses to the 12 TERT mutation carriers, comparing the 3 clinically affected family members with the 9 unaffected family members.

Figure 4 Polygenic modification of expressivity within a family. (A) Pedigree depicting family with TERT variant. Black, case; gray, TERT noncase carrier; transparent, noncarrier; square, male; circle, female; ?, unknown status; diamond, unknown sex. (B) Telomere length PGS comparing cases with noncase TERT variant carriers using the same parameters as best genome-wide SNP score (linear mixed model with kinship matrix as random effect; see Methods). (C) Telomere length PGS comparing cases with noncase TERT variant carriers for Variant-inclusive PGS Score 1 (linear mixed model with kinship as random effect).

We reasoned that the strict clumping and pruning approach that was optimal to construct polygenic scores in the UK Biobank may not be best suited to detect relatively subtle within-family common genetic variation, given significant shared variation among the family members. Therefore, alongside the PGS used throughout the study, we constructed multiple other polygenic predictors that included more variants, reasoning that these would be more likely to pick up any subtle differences that exist within a family. As an orthogonal approach, we also constructed a polygenic score including all conditionally genome-wide significant SNP signals, with the idea that this would enable detection of multiple independent effects on different haplotypes that could be segregating within this family (100). We accounted for family structure using a linear mixed model with kinship as a random effect. Remarkably, across all tested approaches, the clinically affected family members had a more negative PGS on average than the unaffected variant-carrying family members, indicating a greater burden of polygenic variation associated with shorter telomeres (Figures 4, B and C, and Supplemental Figure 3, B and C). We speculate that the affected family members have disease not because they have a shifted PGS causing a somewhat shorter mean telomere length, but rather that the PGS hints that these patients present with disease at least in part because of a decreased general ability to protect and repair telomeres, on the background of a major perturbation caused by the rare disease-driving mutation.

Thus, even in the relatively controlled setting of a family with a shared causal variant and overall similar genetic background, our results suggest that random segregation of common variants that alter disease biology might contribute to variable expressivity and penetrance. These results provide a framework for larger studies of trios and families to further elucidate how common genetic variation may help explain why some family members with a disease-causing variant have severe symptoms, while others remain clinically unaffected.

Convergence of common and rare genetic variation in telomere biology disorders.

A key question regarding polygenic modifiers of disease is whether common and rare variation converges on the same genes and biological pathways (101). In autism spectrum disorder, examples of convergence of common and rare variation at the same loci have been observed (102). Similarly, in Hirschsprung’s disease and craniosynostosis, shared signaling and regulatory pathways are affected by rare and common risk variants (6, 9, 12). In contrast, in sickle cell disease and β-thalassemia, common genetic variation largely impacts disease expressivity through modulation of fetal hemoglobin gene transcriptional regulation, a mechanism distinct from the primary disease-causing mutations that alter adult hemoglobin (10, 103). Having shown that common polygenic variation affects penetrance and expressivity in TBD, we asked whether these variants act upon the same or different genes as the causal high-impact variants underlying TBDs.

In the TBDs, causal variants affect genes regulating telomere length, maintenance, and function (17, 97). Using multiple gene prioritization approaches (see Methods), we found that common variation associated with telomere length and TBD expressivity implicates genes that strongly overlap the set of genes implicated as high-impact monogenic variants in TBDs (P = 5.41 × 10–17) (Figure 5 and Supplemental Figure 4A). The polygenic variants are primarily noncoding, with more than 97% of credible set variants mapping to introns or intergenic regions (Supplemental Figure 4B). These variants show a striking enrichment for enhancers in CD34+ hematopoietic stem and progenitor cells, possibly explaining the association we observe with bone marrow failure and pointing to mechanisms acting in these progenitors of all blood and immune cells (Supplemental Figure 4C). Collectively, these findings suggest a model in which common, noncoding variation converges upon the same genes implicated by high-effect Mendelian coding mutations.
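One common way to quantify such gene set overlap is a one-sided hypergeometric test, sketched below. The test used to obtain the reported P value is described in the Methods and may differ; the function name and the gene counts here are hypothetical and serve only to show the calculation.

```python
from math import comb


def hypergeom_overlap_pvalue(universe, set_a, set_b, overlap):
    """P(X >= overlap) when set_b genes are drawn at random from the universe.

    universe: total number of genes considered
    set_a:    size of the first gene set (e.g., known monogenic disease genes)
    set_b:    size of the second gene set (e.g., genes prioritized from GWAS loci)
    overlap:  number of genes shared by the two sets
    """
    upper = min(set_a, set_b)
    total = comb(universe, set_b)
    favorable = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / total


# Hypothetical counts: 20,000 genes, 15 monogenic genes, 100 prioritized genes, 5 shared.
p_overlap = hypergeom_overlap_pvalue(20_000, 15, 100, 5)
```

With these illustrative counts, fewer than one shared gene would be expected by chance, so an overlap of five yields a very small P value.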