Study population. In the current analysis, we included the first 424,651 unrelated participants enrolled in the UK Biobank study who underwent exome sequencing of blood DNA and were free of hematologic cancer and CVD at baseline (61, 62). Between 2006 and 2010, approximately 500,000 residents of the United Kingdom (UK) aged 40–69 years were recruited at one of 22 assessment centers across the UK. Samples, including blood-derived DNA, were collected at baseline, along with baseline clinical characteristics and biomarkers; incident clinical events were subsequently ascertained through medical history and linkage to data on hospital admissions and mortality. This cohort has been described in detail elsewhere (60). To obtain an unrelated sample, we retained one individual from each pair within the third degree of relatedness, as determined by kinship coefficients centrally calculated by UK Biobank (60).

Whole-exome sequencing and CHIP detection. Exomes of approximately 450,000 UK Biobank participants were sequenced from blood-derived DNA at the Regeneron Genetics Center, as reported previously (62). Briefly, exomes were captured with Integrated DNA Technologies’ (IDT’s) xGen probe library and sequenced on the Illumina NovaSeq platform. Sample-specific FASTQ files were aligned to the GRCh38 reference. The resultant binary alignment (BAM) file containing the genomic information was evaluated for duplicate reads using the Picard3 MarkDuplicates tool and then converted by SAMtools to CRAM files that, after passing quality control, were submitted to the UK Biobank data repository for distribution. CHIP detection was conducted using GATK Mutect2 software (https://software.broadinstitute.org/gatk), as previously performed (7, 63, 64). Participants were annotated as having putative CHIP if the output contained at least 1 of a prespecified list of putative CHIP variants in 74 genes anticipated to cause myeloid malignancy at a VAF greater than 2% (Supplemental Table 7) (3, 6, 65). Common sequencing artifacts and germline variants were excluded, as described elsewhere (7).
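The annotation rule above (at least 1 prespecified variant at a VAF greater than 2%) can be illustrated with a minimal sketch. The record fields, the toy allowlist, and the function name are hypothetical and stand in for the Mutect2 output and the prespecified variant list in Supplemental Table 7; artifact and germline filtering are not shown.

```python
from dataclasses import dataclass

VAF_THRESHOLD = 0.02  # "VAF greater than 2%" from the text


@dataclass(frozen=True)
class VariantCall:
    """One somatic call for a participant (field names are hypothetical)."""
    participant_id: str
    gene: str
    protein_change: str
    vaf: float  # variant allele fraction on a 0..1 scale


def annotate_chip(calls, allowlist, vaf_threshold=VAF_THRESHOLD):
    """Return IDs of participants with >= 1 allowlisted variant whose VAF
    is strictly greater than the threshold."""
    carriers = set()
    for call in calls:
        key = (call.gene, call.protein_change)
        if key in allowlist and call.vaf > vaf_threshold:
            carriers.add(call.participant_id)
    return carriers


# Toy allowlist standing in for the prespecified list (illustration only).
toy_allowlist = {("DNMT3A", "p.R882H"), ("JAK2", "p.V617F")}
toy_calls = [
    VariantCall("P1", "DNMT3A", "p.R882H", 0.08),
    VariantCall("P2", "JAK2", "p.V617F", 0.02),   # equal to 2%, not greater
    VariantCall("P3", "DNMT3A", "p.R882C", 0.30),  # absent from the toy allowlist
]
chip_carriers = annotate_chip(toy_calls, toy_allowlist)
```

In this toy input only P1 is flagged, because the comparison is strict and the allowlist is matched on the exact gene and protein change.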

RNA-Seq data. RNA-Seq data were obtained from 2 Trans-Omics for Precision Medicine (TOPMed) cohorts: MESA and FHS.

MESA is a multiancestry prospective cohort of 6,814 self-identified White, Black, Hispanic, or Asian men and women free of clinical CVD at recruitment in 2000–2002 (66). Included in this study were 889 participants who were randomly selected from the MESA cohort and had RNA-Seq data in PBMCs measured at baseline following the standard protocol. For technical details for sample acquisition and RNA-Seq, see Liu et al. (67).

FHS is a multigenerational cohort initiated in 1948 (68). The Framingham Offspring cohort (generation 2 [Gen 2]) was recruited in 1971 (n = 5,124), and the Gen 3 cohort was recruited in 2002–2005 (n = 4,095) (69, 70). The participants were predominantly self-identified White. Included in this study were 2,622 individuals from the Offspring and Gen 3 cohorts who had peripheral whole-blood samples collected and blood RNA sequenced at exams 9 and 2, respectively. For technical details for the blood draw and RNA-Seq, see Liu et al. (71).

Gene selection and predicted expression score generation. We examined pairs of common CHIP mutations that are associated with CVD risk (6), including DNMT3A, TET2, ASXL1, and JAK2, and genetically predicted expression levels of inflammatory genes that are biologically closely related to the NLRP3 or AIM2 inflammasomes; these genes were selected based on established biological pathways (72, 73) and protein-protein interactions (74). Specifically, activation of the AIM2 and NLRP3 inflammasomes, both regulated by IFN-γ (72, 75), leads to cleavage of IL-1β and IL-18 to produce their mature forms (76, 77). IL-1β and IL-18 in their active forms then exert diverse biological functions related to inflammation (78), including inducing the production of IL-6, a strong independent predictor of cardiovascular outcomes (79, 80). We therefore included genes encoding these key proteins, namely IFNG, AIM2, NLRP3, IL1B, IL18, and IL6R. Based on the protein-protein interaction networks provided by STRING (https://string-db.org/), we further extended our study to genes that encode proteins with the top 10 highest interaction scores with each of the key proteins (since AIM2 and NLRP3 highly interact, we only kept one of them, NLRP3, as a key protein for selecting genes in the extended list). This resulted in a total of 29 additional genes, namely CARD8, CASP1, CASP5, DHX33, IFNGR1, IFNGR2, IL10, IL18BP, IL18R1, IL18RAP, IL1R1, IL1R2, IL1RAP, IL6, IL6ST, IRF1, JAK1, JAK2, JAK3, NEK7, NLRC4, SOCS, STAT1, STAT3, STAT4, STAT5A, STAT6, TNF, and TYK2.

For all selected genes, we used genotyping array data from the UK Biobank participants to generate predicted expression scores. Quality control and imputation of genotypic data in UK Biobank have been described in detail elsewhere (60). Briefly, genotypic data were obtained using either UK BiLEVE Axiom arrays (Affymetrix Research Service Laboratory) or UK Biobank Axiom arrays and then imputed using either the Haplotype Reference Consortium (HRC) or the merged UK10K+1000 Genomes data as the reference panel. Principal component analysis (PCA) was performed using fastPCA (81) based on a pruned set of 147,604 single nucleotide variations (SNVs) among unrelated individuals (82).

We calculated the predicted expression score as the weighted sum of expression-increasing allele counts among selected SNPs, weighted by their raw or posterior effect sizes on the expression levels of the corresponding genes (β coefficient) (22, 83). Raw β coefficient estimates were based on summary statistics of the whole-blood (85% of the Consortium) and PBMC (15% of the Consortium) cis-eQTL results from the eQTLGen Consortium (N = 31,684; https://www.eqtlgen.org/) (20), with cis being defined as within ±500,000 bp of the transcriptional start site (TSS) of the gene encoding the target protein. The majority of participants included in the eQTLGen Consortium are of European descent, which is similar to our study population (20). We used 2 methods to calculate the scores, applied separately among European-ancestry (EA) and non-EA participants. (i) The first was the pruning + thresholding (P+T) approach, in which we used the raw effect sizes as SNP weights and selected SNPs according to the following formula:

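\[ \mathrm{Score}_i = \sum_{j \in S_{\mathrm{clumping}}(r_c^2,\, w_c)} I(p_j < p_r)\, \beta_j\, G_{ij} \qquad \text{(Equation 1)} \]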

where, for an individual i, β_j and p_j are the effect size and P value of variant j estimated from the summary statistics, respectively; G_ij is the genotype dosage of variant j for individual i; S_clumping(r_c^2, w_c) is the set of variants remaining after clumping at the squared correlation threshold r_c^2 and clumping window size w_c; and I(p_j < p_r) is a binary indicator function that equals 1 when the P value of variant j is less than the specified cutoff p_r and 0 otherwise (21). For each gene, we created 30 candidate P+T-based predicted expression scores based on 3 r^2 levels (0.1, 0.01, and 0.001), 5 P value thresholds (5 × 10^−8, 1 × 10^−5, 0.001, 0.01, and 0.1), and 2 clumping window sizes (within 250 kb and 5 Mb to both ends of the index SNP). (ii) The second method was the PRS-CS approach, which uses a continuous shrinkage Bayesian framework to calculate the posterior mean of effect sizes, used as weights, across all SNPs (22). For each gene, we also created 4 candidate PRS-CS–based predicted expression scores using 4 candidate global shrinkage parameters (1 × 10^−6, 1 × 10^−4, 0.01, and 1). For both approaches, we used a set of unrelated individuals from phase 3 of the 1000 Genomes Project as the linkage disequilibrium (LD) reference panel (84). Since eQTLGen summary statistics were from both whole-blood and PBMC samples, we used genotypes and measured transcriptome levels from both FHS (whole blood) and MESA (PBMCs) for score tuning (67). For each gene, we selected the optimal method and parameters for generating the score based on the largest r^2 with the measured transcriptome levels in either FHS or MESA, since the eQTL source data were from either whole blood or PBMCs. The best predicted expression scores were all standardized to zero mean and unit variance and were approximately normally distributed in the population. In the current study, we continued studying genes whose selected best-performing predicted expression scores had r^2 > 1% among EA participants, resulting in suitable scores for 26 genes (Figure 2 and Supplemental Table 2).
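As an illustration of the P+T score described above, the following minimal sketch computes the thresholded weighted sum for one individual and enumerates the 30 candidate parameter combinations. The variant IDs, effect sizes, and dosages are hypothetical toy values, and the clumped variant set is assumed to come from an external LD-clumping step that is not shown; this is not the authors' implementation.

```python
from itertools import product


def pt_score(dosages, betas, pvalues, clumped, p_cutoff):
    """P+T score for one individual: sum over clumped variants j of
    I(p_j < p_cutoff) * beta_j * G_ij."""
    return sum(
        betas[j] * dosages[j]
        for j in clumped
        if pvalues[j] < p_cutoff
    )


# The candidate grid stated in the text: 3 r^2 levels x 5 P cutoffs x 2 windows.
R2_LEVELS = (0.1, 0.01, 0.001)
P_CUTOFFS = (5e-8, 1e-5, 0.001, 0.01, 0.1)
WINDOWS_KB = (250, 5000)
CANDIDATES = list(product(R2_LEVELS, P_CUTOFFS, WINDOWS_KB))

# Toy inputs (hypothetical); effects are aligned to the expression-increasing allele.
betas = {"rs1": 0.30, "rs2": 0.10, "rs3": 0.05}
pvalues = {"rs1": 1e-12, "rs2": 2e-4, "rs3": 0.04}
dosages = {"rs1": 2.0, "rs2": 1.0, "rs3": 0.0}  # G_ij for one individual
clumped = ["rs1", "rs2"]  # assumed output of LD clumping at one (r^2, window)

strict = pt_score(dosages, betas, pvalues, clumped, p_cutoff=5e-8)    # rs1 only
lenient = pt_score(dosages, betas, pvalues, clumped, p_cutoff=0.001)  # rs1 and rs2
```

Each of the 30 entries in `CANDIDATES` would yield one candidate score per gene; the score retained is the one with the largest r^2 against measured expression, as described in the text.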

Study outcomes. The primary outcome, CVD event, was a composite of myocardial infarction, coronary artery revascularization, stroke, or death, as defined previously (7). We also used CAD as a secondary outcome for sensitivity analysis, which was defined as myocardial infarction, percutaneous transluminal coronary angioplasty or coronary artery bypass grafting, chronic ischemic heart disease, and angina. Both disease outcomes were defined by a combination of inpatient hospital billing International Classification of Diseases (ICD) codes and UK death registries, listed in Supplemental Table 8 (7). The exploratory outcomes included 31 hematopoietic cell count indexes and 5 cardiometabolic biomarkers (C-reactive protein [CRP], total cholesterol, HDL cholesterol, LDL cholesterol, and triglycerides). These conventionally measured biomarkers were analyzed as quantitative traits and were log2-transformed (with 1 added across all measurements to avoid 0 values for CRP), standardized to zero mean and unit variance, and normalized in the population. Blood samples of UK Biobank participants were collected into 4 mL EDTA Vacutainers by vacuum draw, stored at 4°C, and then transported to the UK Biocentre in temperature-controlled shipping boxes (85). Full blood counts were measured among all participants using clinical hematology analyzers at the centralized processing laboratory. Serum CRP level was measured by immunoturbidimetric high-sensitivity analysis on a Beckman Coulter AU5800. Lipid measurements were performed on the Beckman Coulter AU5800 platform and run using an immunoturbidimetric approach.
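The biomarker transformation described above can be sketched as follows, assuming a plain log2 transform with an optional offset followed by z-scoring. The toy CRP values and the function name are hypothetical, the choice of population rather than sample SD is an assumption, and any further normalization step mentioned in the text is not shown.

```python
import math
import statistics


def log2_standardize(values, offset=0.0):
    """Apply log2(x + offset), then rescale to zero mean and unit variance."""
    logged = [math.log2(v + offset) for v in values]
    mean = statistics.fmean(logged)
    sd = statistics.pstdev(logged)  # population SD (assumption; sample SD is also plausible)
    return [(x - mean) / sd for x in logged]


# Toy CRP values; the offset of 1 keeps a measurement of 0 defined after log2.
crp_toy = [0.0, 1.0, 3.0, 7.0]
crp_z = log2_standardize(crp_toy, offset=1.0)
```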

Asxl1-chimeric mice. Bone marrow from CD45.2+ Cas9 transgenic mice (The Jackson Laboratory, 026179) was harvested and enriched for c-Kit+ cells using magnetic beads (Miltenyi Biotec, 130-091-224). LT-HSCs (Lin–c-Kit+Sca1+CD48–CD150+) (86) were then harvested by flow cytometric sorting. LT-HSCs were then spinfected with 6 μg/mL Polybrene (MilliporeSigma, TR-1003-G) and lentiviruses containing nontargeting guides (Nmt4) or guides targeted to Asxl1 in exon 12 (Asxl1-G623*). LT-HSCs were washed and then incubated for 3 days. LT-HSCs were then mixed with 1 × 10^6 supporting cells from CD45.1+ WT mice and transplanted into irradiated Ldlr–/– recipient mice.

Asxl1-CRISPR validation. CRISPR guides targeted to exon 12 of Asxl1 were designed by CHOPCHOP (87) and screened in skin-derived fibroblasts from Cas9 transgenic mice. Guide sequence AGTGGTAACCTCTCGCCCCTCGG was evaluated by Sanger sequencing of PCR amplification of flanking regions using forward GCAGCATAAAATGGCTCTTGAT and reverse GCTGAGTCTTCTCTTCTGGCTC primers.

Inflammasome activation studies. Five weeks after transplantation, bone marrow was harvested and cultured in L cell medium for 5 days to generate BMDMs. 20,000 BMDMs/well were seeded into 96-well plates and allowed to recover overnight. BMDMs were then primed with 20 ng/mL LPS (Cell Signaling Technology, 14011) for 3 hours and stimulated with the indicated concentrations of ATP (MilliporeSigma) for 1 hour. For AIM2 inflammasome activation, BMDMs were primed for 1 hour with 20 ng/mL LPS (Cell Signaling Technology, 14011) then incubated with Lipofectamine 2000 (Thermo Fisher Scientific, 11668019) and poly(deoxyadenylic-deoxythymidylic) acid sodium salt (pdAdT) (Invivogen, tlrl-patn) for 6 hours. Following incubations, supernatants were collected, spun down at 3,000 g for 10 minutes, then assessed for IL-1β protein by ELISA (R&D Systems, DY401) and LDH activity (Thermo Fisher Scientific, C20301).

BMDM cultures. For protein secretion assays, bone marrow was harvested as indicated above, and after 5 days of differentiation in L cell medium, BMDMs were seeded at 20,000/well in 96-well plates and allowed to recover overnight. Cells were treated with vehicle (PBS) or LPS at a final concentration of 20 ng/mL for 6 hours. Medium was collected and frozen, and ELISA was conducted to determine concentrations of IL-6 (R&D Systems, DY406), TNF-α (R&D Systems, DY410), and IL-10 (R&D Systems, DY417).

For mRNA analysis, BMDMs were differentiated for 5 days, then seeded into 12-well plates and allowed to recover overnight. Cells were treated with vehicle (PBS) or LPS at a final concentration of 20 ng/mL for 6 hours. BMDMs were then rinsed 3 times with PBS and suspended in TRIzol Reagent (Thermo Fisher Scientific, 15596026), and RNA was isolated using an RNeasy Micro Kit (QIAGEN, 74004) with DNase digestion. cDNA was synthesized (Thermo Fisher Scientific, 4368814), quantitative PCR (qPCR) analysis was conducted, and values were normalized to β-actin expression. Relative gene expression and percent knockdown were quantified using the ΔΔ quantification cycle (Cq) method, derived from Cq values obtained through qPCR analysis. The ΔΔCq was computed in a 3-step process. First, the Cq values of the gene of interest were normalized to the reference gene, β-actin, using the formula ΔCq = Cq(gene of interest) – Cq(β-actin). This was followed by an exponential transformation of the expression, denoted as ΔCq expression = 2^(–ΔCq). Finally, the ΔΔCq was calculated by dividing the ΔCq expression by the average ΔCq expression of the control group. p–γ-H2AX Western blot analysis was conducted on BMDMs differentiated for 5 days, plated into 6-well dishes, and allowed to recover overnight. BMDMs were treated with the indicated stimulus, including 20 ng/mL LPS, for 6 hours. Cells were then washed 3 times with PBS, and protein was isolated in RIPA buffer (Boston BioProducts, BP-115) with protease and phosphatase inhibitors (Thermo Fisher Scientific, 78439). Protein was quantified with BCA analysis and subjected to Western blotting using antibodies to p–γ-H2AX (Cell Signaling Technology, 9718) and β-actin (Cell Signaling Technology, 12262).
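The 3-step ΔΔCq calculation described in this paragraph can be sketched as follows. The Cq values are hypothetical toy numbers, the function name is illustrative, and expressing percent knockdown as 100 × (1 − relative expression) is an assumption that the text does not state explicitly.

```python
import statistics


def relative_expression(cq_gene, cq_actin, is_control):
    """Return expression of each sample relative to the control-group mean."""
    # Step 1: normalize to the reference gene (delta Cq).
    delta_cq = [g - a for g, a in zip(cq_gene, cq_actin)]
    # Step 2: exponential transformation, 2^(-delta Cq).
    expression = [2.0 ** (-d) for d in delta_cq]
    # Step 3: divide by the average expression of the control group.
    control_values = [e for e, c in zip(expression, is_control) if c]
    control_mean = statistics.fmean(control_values)
    return [e / control_mean for e in expression]


# Toy data: 2 control samples followed by 2 knockdown samples.
cq_gene = [24.0, 24.0, 26.0, 26.0]
cq_actin = [18.0, 18.0, 18.0, 18.0]
is_control = [True, True, False, False]

relative = relative_expression(cq_gene, cq_actin, is_control)
percent_knockdown = [100.0 * (1.0 - r) for r in relative]  # assumed definition
```

In this toy input, a 2-cycle increase in ΔCq corresponds to a 4-fold lower relative expression (0.25), that is, 75% knockdown under the assumed definition.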

Atherosclerosis studies. Bone marrow transplantations were conducted as described above into lethally irradiated Ldlr–/– mice. After 4 weeks of recovery, mice were subjected to WTD feeding for 12 weeks. Blood cell counts were quantified from cheek bleeding using a VetScan HM5 Hematology system (Abaxis). For Asxl1 burden analysis, red blood cells were lysed using RBC lysis buffer (BioLegend, 420301), washed in PBS with 1% BSA and 2 mM EDTA, stained with the indicated antibodies (CD3, CD115, Ly6G, CD45.1, and CD45.2), and then analyzed using an LSR-Fortessa. After 12 weeks of WTD feeding, mice were euthanized and perfused with PBS, and aortic roots were fixed in 4% paraformaldehyde for 48 hours. Aortic roots were embedded in paraffin and sectioned at a thickness of 6 μm. H&E staining was conducted on 6 slides 60 μm apart, which were imaged on a Nikon Labophot 2 using Image Pro Plus software (Media Cybernetics, version 7.0.0.591). Researchers blinded to the experimental protocol quantified lesion area and necrotic core area in Fiji software (88) and reported the average for the 6 slides.

Statistics. We evaluated the association between CHIP mutations and incident CVD, as well as modification of this association by genetically predicted expression levels of inflammatory genes, measured as predicted expression scores. Using Cox proportional hazards models, we first estimated the HRs and associated 95% CIs of (i) the presence of CHIP mutations and (ii) the presence of large clones, defined as having a VAF > 10%, of CHIP mutations for incident CVD events. We then conducted stratified analyses evaluating the associations of the predicted expression scores of selected inflammatory genes with the incidence of the primary outcome (i.e., CVD) with or without the presence of CHIP variables. We carried forward predicted expression scores that were associated with incident CVD risk (defined as P < 0.05) only in the presence of CHIP variable(s) to evaluate the effect of the interactions between those scores and the corresponding CHIP variables on the primary outcome. We considered time at risk as starting at enrollment in the study and continuing until the event of interest, death, loss to follow-up, or the end of follow-up. Models were adjusted for age at the time of enrollment, sex, self-reported White British ancestry, BMI, diagnosis of type 2 diabetes mellitus at the time of enrollment, ever-smoker status, and the first 10 principal components of genetic ancestry (60). Because fewer than 2% of the study population had missing values for any of the adjusted covariates, we removed those individuals from our regression models.
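The time-at-risk definition above can be illustrated with a minimal sketch that returns follow-up time and an event indicator in the form a Cox model expects. The function name, argument names, dates, and the 365.25-day year are illustrative assumptions. For a composite outcome that includes death, the death date would be supplied as the event date.

```python
from datetime import date


def time_at_risk(enrolled, end_of_follow_up, event=None, death=None, lost=None):
    """Return (years at risk, event indicator) for one participant.

    Follow-up starts at enrollment and stops at the earliest of the event of
    interest, death, loss to follow-up, or the end of follow-up.
    """
    stops = [d for d in (event, death, lost, end_of_follow_up) if d is not None]
    stop = min(stops)
    # The event counts only if it is what ended follow-up.
    had_event = event is not None and event == stop
    years = (stop - enrolled).days / 365.25
    return years, had_event


# Toy participants (hypothetical dates).
case = time_at_risk(date(2008, 1, 1), date(2020, 1, 1), event=date(2012, 1, 1))
censored = time_at_risk(date(2008, 1, 1), date(2020, 1, 1), lost=date(2010, 1, 1))
```

The first participant contributes 4 years and an event; the second is censored at loss to follow-up.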

For significant interactions (FDR < 0.05) discovered in the above analysis, we evaluated their associations across 31 hematological and 5 cardiometabolic traits using the same Cox proportional hazards models with adjustment for the same sets of covariates. All hematological and lipid traits were log2-transformed, standardized to zero mean and unit variance, and were approximately normally distributed in the population. Analyses used R version 4.0.0 software (The R Foundation) and 2-tailed P values, with a statistical significance level of 0.05 for other analyses.
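The text reports an FDR threshold of 0.05 but does not name the correction procedure. As a sketch under the assumption that the Benjamini-Hochberg procedure was used, adjusted P values can be computed as follows; the input P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy interaction P values (hypothetical).
interaction_p = [0.001, 0.04, 0.03, 0.2]
fdr = benjamini_hochberg(interaction_p)
significant = [q < 0.05 for q in fdr]
```

With these toy values only the first interaction passes FDR < 0.05, even though three raw P values are below 0.05.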

Study approval. The secondary use of data for the present analysis was approved by the Massachusetts General Hospital Institutional Review Board (protocol 2021P002228) and facilitated through UK Biobank Application 7089. All animal experiments were conducted with approval from the Institutional Animal Care and Use Committee of Columbia University (New York, New York, USA).

Data availability. TOPMed individual-level DNA and proteomics data used in this analysis are available with restricted access via the Database of Genotypes and Phenotypes (dbGaP; https://www.ncbi.nlm.nih.gov/gap/). UK Biobank individual-level data are available on request by application (https://www.ukbiobank.ac.uk). Raw data for mouse experiments are reported in the Supporting Data Values file. All code used for the described analyses is available at https://github.com/zhiyu7/chipmodifier (commit ID: 8e634e2).