Date: 2021-07-01

Population ancestry of REDS-III RBC-Omics cohort. The RBC-Omics cohort included a diverse group of US blood donors born in many (n = 71) countries. Donors were initially assigned to continental ancestry groups. However, following recent recommendations (27, 28), we divided the Hispanic and Asian ancestry groups into subgroups based on country of birth. Donors of Hispanic ancestry were divided into 2 groups: Mexican and Central American Hispanics (MCAH) (Supplemental Figures 1 and 2; supplemental material available online with this article; https://doi.org/10.1172/JCI146077DS1) and Caribbean Island Hispanics (CIH) (Supplemental Figures 1 and 3). Donors of Asian ancestry were divided into East Asians and South Asians to reflect the diversity of these RBC-Omics subpopulations (27, 28). In total, the REDS-III RBC-Omics population (Figure 1A) was divided into 7 ancestry groups: non-Hispanic Whites (n = 7,586), East Asians (n = 1,049), South Asians (n = 257), MCAH (n = 456), CIH (n = 489), African Americans (n = 1,046), and "Other" participants (n = 1,336), for a total of 12,219 donors. The "Other" group is heterogeneous. It includes all individuals who did not cluster within the other groups, such as people who self-identified as Native American, Native Hawaiian, Native Alaskan, or of multiple races, or who were born in countries such as Iran and the Philippines. We also analyzed the entire RBC-Omics cohort as a single group, referred to as ALL Ancestries.

Figure 1 Ancestry of RBC-Omics population and Manhattan plots. (A) Plot of the first 2 principal components (PCs) of the extended RBC-Omics population overlaid on the 1000 Genomes phase 3 samples. Individuals are labeled by genetic ancestry (AFR, African American; EAS, East Asian; SAS, South Asian; EUR, non-Hispanic White; AMR, admixed American; CIH, Caribbean Island Hispanics; MCAH, Mexican and Central American Hispanics; OTH, other/multiple ancestry) and overlaid on the 1000 Genomes ancestry groups. (B–D) Manhattan plots summarizing the mega-analysis results for osmotic hemolysis (n = 12,215, λ = 1.003; B), oxidative hemolysis (n = 10,007, λ = 1.048; C), and storage hemolysis (n = 12,177, λ = 1.002; D). Each data point is the −log₁₀(P) for one SNP from a multivariable linear regression model. The black horizontal line marks the conventional genome-wide significance threshold (P = 5 × 10⁻⁸). Circles represent noncoding variants, and triangles represent coding variants.

GWA studies of osmotic, oxidative, and storage hemolysis in mega-analysis. LD score regression gave the following SNP-based heritability estimates:

- Osmotic hemolysis: 0.348 (SEM = 0.062).
- Oxidative hemolysis: 0.156 (SEM = 0.073).
- Storage hemolysis: not different from zero.

Genome-wide analysis of 12,353 subjects from the REDS-III RBC-Omics cohort was conducted with 14.1 million genotyped and imputed SNPs for osmotic (Figure 1B), oxidative (Figure 1C), and cold-storage hemolysis (Figure 1D). GWA analyses of the ALL Ancestries samples identified 14, 4, and 2 genome-wide significant regions associated with osmotic, oxidative, and spontaneous cold-storage hemolysis, respectively (Table 1). Q-Q plots (Supplemental Figure 4) showed no appreciable P-value inflation (λ = 1.002–1.048; Figure 1, B–D).
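The inflation factor λ reported for the Manhattan plots is assumed here to be the usual genomic control factor. It is defined as the median observed 1-df association χ² statistic divided by the median expected under the null (about 0.455). The following standard-library sketch computes it from two-sided P values. The function names are hypothetical and are not taken from the authors' pipeline.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549):
# the square of the 75th percentile of the standard normal distribution.
CHISQ1_NULL_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def chisq_from_p(p: float) -> float:
    """Convert a two-sided P value from a 1-df test into its chi-square statistic (z squared).

    Working in the lower tail, inv_cdf(p / 2), keeps precision for very small P values.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("P values must lie in (0, 1]")
    z = _STD_NORMAL.inv_cdf(p / 2.0)
    return z * z


def genomic_inflation(p_values) -> float:
    """Genomic control factor: median observed chi-square divided by its null median."""
    return median(chisq_from_p(p) for p in p_values) / CHISQ1_NULL_MEDIAN


# P values spread evenly over (0, 1), as expected with no association, give lambda = 1.
null_p = [(i + 0.5) / 1001 for i in range(1001)]
print(round(genomic_inflation(null_p), 3))  # prints 1.0
```

Values close to 1, like those in Figure 1, indicate that population structure or other confounding has not systematically inflated the test statistics.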

Table 1 Genome-wide significant results for hemolysis in all samples or within individual ancestry groups

Genome-wide analysis of osmotic hemolysis in the entire data set (ALL Ancestries) showed that the genome-wide significant variants lay in or near several plausible candidate genes known to modulate RBC structure and function. These include spectrin α chain, erythrocytic 1 (SPTA1/band 1; P < 1.01 × 10⁻²²), ankyrin 1 (ANK1/band 2.1; P < 5.85 × 10⁻²⁸), aquaporin 1 (AQP1; P < 4.23 × 10⁻¹⁰), and solute carrier family 4 member 1 (SLC4A1/band 3; P < 3.62 × 10⁻⁸) (Table 1). Several potentially novel GWA-significant variants were also found in genes encoding:

- metabolic enzymes (hexokinase 1 [HK1]; P < 4.90 × 10⁻¹¹);
- stress kinases (MAPKAPK5; P < 2.24 × 10⁻¹³);
- ion channels (piezo-type mechanosensitive ion channel component 1 [PIEZO1]; P < 4.04 × 10⁻¹⁴);
- other proteins, such as myosin IXB (MYO9B; P < 9.88 × 10⁻¹⁵).

Supporting the internal validity of these findings, many of these SNPs lie in genes whose variants are known to cause RBC disorders. These disorders include spherocytosis (23), elliptocytosis (29), xerocytosis (30), and α-thalassemia (31).

GWA analysis of oxidative hemolysis identified genome-wide significant SNPs in G6PD (P < 2.66 × 10⁻¹⁷), SEC14-like 4 (SEC14L4; P < 9.85 × 10⁻¹⁰), glutaredoxin (GLRX; P < 1.15 × 10⁻¹²), and glutathione peroxidase 4 (GPX4; P < 3.80 × 10⁻¹⁴). G6PD, GLRX, and GPX4 all have known roles in protecting cells from oxidative damage. Analysis of storage hemolysis (Figure 1D) identified only 2 genome-wide significant loci. One is on chromosome 8, more than 500 kb from the nearest gene, and the other is on chromosome 17 (TMC8; P < 1.34 × 10⁻⁸).

Ancestry-specific GWA results. GWA analyses within individual ancestry groups, defined by principal component analysis (PCA), overlapped substantially with the ALL Ancestries analysis. However, 7 additional genome-wide significant loci were observed within specific ancestry groups, in genes such as EYS (P < 3.20 × 10⁻⁹), HBB (P < 3.66 × 10⁻¹⁰), HBA2 (P < 2.90 × 10⁻¹⁴), and G6PD (P < 2.66 × 10⁻¹⁷) (Table 1). Only in some cases (G6PD and HBA2) were these results also significant in the ALL Ancestries analysis. Conversely, several loci, such as GPX4 and SEC14L4, were significant only when all ancestry groups were analyzed together. Thus, only the combination of ancestry-specific and combined analyses revealed all 27 loci: the 20 regions from the ALL Ancestries analysis plus the 7 additional ancestry-specific loci.

Identification and bioinformatics analysis of variation. We identified 12 directly genotyped, genome-wide significant (P < 5 × 10⁻⁸) nonsynonymous variants (NSVs) for hemolysis measures in the entire population or in the ancestry-specific groups. Their functional effects were predicted using SIFT (https://sift.bii.a-star.edu.sg/) or PolyPhen2 (http://genetics.bwh.harvard.edu/pph2/).

- SPTA1 contains the NSV rs857725 (Lys1693Gln, P < 8.75 × 10⁻²¹; Figure 2A).
- The marker for the α-thalassemia deletion (chr16: 223678; Figure 2B) and the HbS variant modulated osmotic, oxidative, and spontaneous storage hemolysis (7).
- In HBB, the HbS variant (rs334, Glu7Val) was significantly associated (P < 3.66 × 10⁻¹⁰) with osmotic hemolysis in the African American ancestry group (Figure 2C).
- For oxidative hemolysis, the significant NSVs were SEC14L4 AX-83171224/rs9606739 (Arg124Gly, P < 3.07 × 10⁻⁹; Figure 2D) and G6PD rs1050828 (Val68Met, P < 2.66 × 10⁻¹⁷; Figure 2E).
- For spontaneous storage hemolysis, TMC8 rs7208422 (Asn306Ile, P < 1.23 × 10⁻⁸; Figure 2F) was GWA significant.
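One way to read the "SIFT or PolyPhen2" criterion is as a rule that flags an NSV when either tool predicts a damaging effect. The sketch below encodes that reading. The cutoffs are assumptions based on the tools' commonly used defaults: SIFT score ≤ 0.05 is "deleterious", and a PolyPhen-2 HumDiv score ≥ 0.453 is "possibly damaging" (≥ 0.957 "probably damaging"). They should be checked against the documentation for the tool versions actually used. The function names are hypothetical.

```python
from typing import Optional

# Assumed cutoffs (commonly used defaults; verify for the tool versions used).
SIFT_DELETERIOUS_MAX = 0.05     # SIFT: low scores are predicted deleterious
POLYPHEN2_POSSIBLY_MIN = 0.453  # PolyPhen-2 HumDiv: possibly damaging
POLYPHEN2_PROBABLY_MIN = 0.957  # PolyPhen-2 HumDiv: probably damaging


def polyphen2_label(score: float) -> str:
    """Map a PolyPhen-2 HumDiv probability to its qualitative label."""
    if score >= POLYPHEN2_PROBABLY_MIN:
        return "probably damaging"
    if score >= POLYPHEN2_POSSIBLY_MIN:
        return "possibly damaging"
    return "benign"


def predicted_damaging(sift: Optional[float], polyphen2: Optional[float]) -> bool:
    """True if either tool predicts a damaging effect; a missing score counts as no prediction."""
    sift_hit = sift is not None and sift <= SIFT_DELETERIOUS_MAX
    pph_hit = polyphen2 is not None and polyphen2_label(polyphen2) != "benign"
    return sift_hit or pph_hit
```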

Figure 2 Box-and-whisker plots of hemolysis levels by genotype and ancestry group for GWA-significant nonsynonymous variants. Osmotic hemolysis (n = 12,219 for all osmotic analyses): (A) SPTA1 (rs857725/Lys1693Gln); (B) HBA2 (chr16: 223678); (C) HBB (rs334/Glu7Val) (HbS). Oxidative hemolysis (n = 10,007 for all oxidative analyses): (D) SEC14L4 (AX-83171224/rs9606739) Arg112Gly; (E) G6PD (rs1050828) Val68Met. Because this variant is on the X chromosome, male and female participants are displayed separately. Spontaneous storage hemolysis (n = 12,219 for all storage analyses): (F) TMC8 (rs7208422) Asn306Ile. Minor allele homozygotes are shown in shades of red, heterozygotes in green, and reference allele homozygotes in shades of blue. For the box-and-whisker plots, the bounds of the box are the 25th and 75th percentiles, and the line in the box is the 50th percentile (median). The whiskers extend to 1.5 times the interquartile range (25th–75th percentiles), and black dots are values outside the whiskers. Ancestry groups: AFR, African Americans; EUR, non-Hispanic Whites; EAS, East Asians; SAS, South Asians; CIH, Caribbean Island Hispanics; MCAH, Mexican/Central American Hispanics; OTH, other.

Chromosome 8 had 2 nonoverlapping genome-wide significant loci for osmotic hemolysis within ANK1 (Figure 3, A–D). The first locus is centered on rs4737010 (Figure 3A). The second, 87 kb away, is centered on the NSV rs34664882 (Ala114Val; Figure 3B).

PolyPhen2 and SIFT predicted rs34664882 to be deleterious. This SNP appears to have a large quantitative effect on osmotic hemolysis across multiple ancestry groups and accounts for 3.2% of the variation in osmotic hemolysis in the combined data set.

The other GWA-significant locus in ANK1 is centered on rs4737009, which lies in the canonical binding motif for the MAZ and STAT5A transcription factors (Supplemental Figure 5). Both rs34664882 and rs4737009 are therefore likely to be independent, functionally consequential variants for osmotic hemolysis. Consistent with this, conditional GWA showed that these loci are fully independent: each remains genome-wide significant after conditioning on the other. Additional conditional GWA suggested that there may be 2 or more independent loci at SEC14L4 and PIEZO1 (data not shown).
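Conditional GWA asks whether a SNP stays associated once the lead SNP of a neighboring signal is included as a covariate. The NumPy sketch below fits ordinary least squares and returns the test SNP's effect, standard error, and a normal-approximation P value.

Several parts are assumptions for illustration, not the study's pipeline:

- the function name;
- the omission of other covariates (the study's models would also include terms such as ancestry PCs);
- the simulated data.

```python
import math

import numpy as np


def conditional_association(y, test_snp, condition_snps):
    """OLS of y on an intercept, the conditioning SNP dosage(s), and the test SNP dosage.

    Returns (beta, se, p) for the test SNP. The two-sided P value uses a normal
    approximation to the t distribution, which is reasonable at large n.
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    X = np.column_stack([
        np.ones(n),
        np.asarray(condition_snps, dtype=float),  # 1-D (one SNP) or 2-D (n x k)
        np.asarray(test_snp, dtype=float),        # last column = SNP under test
    ])
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    if rank < X.shape[1]:
        raise ValueError("design matrix is rank deficient")
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    b, se = float(beta[-1]), float(math.sqrt(cov[-1, -1]))
    p = math.erfc(abs(b / se) / math.sqrt(2.0))  # two-sided normal tail probability
    return b, se, p


# Simulated example (not study data): SNPs A and B both affect y; "proxy" is in strong LD with A.
rng = np.random.default_rng(1)
n = 5000
snp_a = rng.binomial(2, 0.3, n).astype(float)
snp_b = rng.binomial(2, 0.4, n).astype(float)
proxy = np.where(rng.random(n) < 0.1, rng.binomial(2, 0.3, n), snp_a)
y = 0.5 * snp_a + 0.3 * snp_b + rng.normal(size=n)

print(conditional_association(y, snp_b, snp_a))  # B remains strongly associated given A
print(conditional_association(y, proxy, snp_a))  # the proxy's signal is explained by A
```

An independent signal, like B here, keeps a small P value after conditioning. A signal that merely tags the lead SNP, like the proxy, does not.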

Figure 3 LocusZoom and box-and-whisker plots for 2 nonoverlapping genome-wide significant loci in ANK1. (A) LocusZoom plot centered on rs4737010 in ANK1. (B) LocusZoom plot of rs34664882 in ANK1. In these plots, each data point represents an SNP that passed quality control in the linear regression analysis of imputed dosage, plotted with its P value as a function of genomic position (GRCh38 Assembly). The lead SNP is shown as the purple symbol. The color of every other SNP indicates its LD with the lead SNP (estimated from Phase II HapMap CEU r² values): red, r² ≥ 0.8; gold, 0.6 ≤ r² < 0.8; green, 0.4 ≤ r² < 0.6; cyan, 0.2 ≤ r² < 0.4; blue, r² < 0.2; gray, r² unknown. Recombination rates are estimated from 1000 Genomes phase 3 data. (C) Box-and-whisker plot of osmotic hemolysis by genotype and genetic ancestry group for rs4737010. (D) Box-and-whisker plot of osmotic hemolysis by genotype and genetic ancestry group for rs34664882. These panels illustrate 2 nonoverlapping genome-wide significant loci within the ANK1 gene. For the box-and-whisker plots, the bounds of the box are the 25th and 75th percentiles, and the line in the box is the 50th percentile (median). The whiskers extend to 1.5 times the interquartile range (25th–75th percentiles), and black dots are values outside the whiskers. Ancestry groups: AFR, African Americans; EUR, non-Hispanic Whites; EAS, East Asians; SAS, South Asians; CIH, Caribbean Island Hispanics; MCAH, Mexican/Central American Hispanics; OTH, other.

Within G6PD, the rs1050828 Val68Met variant associated with oxidative hemolysis in this study is a common class III variant, also referred to as G6PD A-. Individuals with class III G6PD variants are susceptible to acute hemolytic anemia when their RBCs are exposed to oxidative stress (32). G6PD deficiency is an X-linked disorder. Figure 2E shows the following pattern of oxidant-induced hemolysis among carriers:

- Female A- heterozygotes are intermediate between female major allele homozygotes and female A- homozygotes.
- The few female A- homozygotes (n = 4) resemble the male A- hemizygotes.

This is consistent with the observation that heterozygotes for many disorders can have altered or intermediate phenotypes (33).
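Because G6PD is on the X chromosome, genotypes must be interpreted by sex before phenotypes are compared. The sketch below assigns each donor to one of the five sex-by-genotype classes displayed in Figure 2E and summarizes hemolysis by median. The encoding is an assumption: alternate-allele counts of 0/1/2 in females, and 0/1 in males (2 is also accepted, since some tools code hemizygous males that way). The class labels and function names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def g6pd_class(sex: str, alt_copies: int) -> str:
    """Map sex and the count of A- (alternate) alleles to a class for an X-linked variant."""
    if sex == "M":
        # Males carry one X chromosome (hemizygous). Some tools code a hemizygous
        # alternate genotype as 2, so 1 and 2 are treated the same here.
        if alt_copies == 0:
            return "male reference"
        if alt_copies in (1, 2):
            return "male A- hemizygote"
    elif sex == "F":
        labels = {
            0: "female reference homozygote",
            1: "female A- heterozygote",
            2: "female A- homozygote",
        }
        if alt_copies in labels:
            return labels[alt_copies]
    raise ValueError(f"unexpected sex/genotype: {sex!r}, {alt_copies!r}")


def median_by_class(records):
    """records: iterable of (sex, alt_copies, hemolysis) tuples -> {class: median hemolysis}."""
    groups = defaultdict(list)
    for sex, alt_copies, hemolysis in records:
        groups[g6pd_class(sex, alt_copies)].append(hemolysis)
    return {cls: median(values) for cls, values in groups.items()}
```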

Pathway and gene-set enrichment analysis (GSEA) identified 3 gene sets that were significantly enriched for osmotic hemolysis after Bonferroni correction: spectrin-associated cytoskeleton (Bonferroni-corrected P = 6.77 × 10⁻⁴), Steiner erythrocyte membrane genes (Bonferroni-corrected P = 2.58 × 10⁻³), and Nikolsky breast cancer 19p13 amplicon (Bonferroni-corrected P = 0.028). For oxidative hemolysis, no gene sets were significantly enriched after Bonferroni correction.
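Bonferroni correction multiplies each nominal P value by the number of tests m and caps the result at 1. A gene set counts as significant if its corrected value falls below α (here 0.05). This is equivalent to comparing the nominal P value with α/m. The number of gene sets tested is not stated here, so m and the example values in this sketch are placeholders.

```python
def bonferroni(p_values, m=None):
    """Bonferroni-adjusted P values, min(1, p * m); m defaults to the number of P values given."""
    p_values = list(p_values)
    m = len(p_values) if m is None else m
    if m < 1:
        raise ValueError("the number of tests must be at least 1")
    return [min(1.0, p * m) for p in p_values]


def significant_after_bonferroni(p_values, alpha=0.05, m=None):
    """Flag each test whose Bonferroni-adjusted P value is below alpha."""
    return [p_adj < alpha for p_adj in bonferroni(p_values, m)]
```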

Inference of differential expression. MetaXcan (https://github.com/hakyimlab/MetaXcan) was used to infer the expression of all genes from genotypes at variants that GTEx (https://www.gtexportal.org/home/) has identified as expression quantitative trait loci (eQTLs). The inferred gene expression was then correlated with spontaneous storage, osmotic, and oxidative hemolysis in the RBC-Omics cohort.

Thirteen genes were predicted to be significantly (P < 0.05) differentially expressed and significantly (P < 0.05) associated with hemolysis: 11 with osmotic and 2 with oxidative hemolysis. None were associated with spontaneous storage hemolysis (Table 2). Of these 13 genes, 10 were situated within one of the genome-wide significant regions, and 2 others were nearby (<700 kb).

Most of the genes identified by MetaXcan (e.g., SLC4A1, SWAP70, and MFSD2B) encode kinases, channels, and metabolic proteins whose function could be affected by changes in gene expression (34–36). MetaXcan did not identify RBC membrane structural genes, such as ANK1 and SPTA1. This is consistent with previous observations that disease-causing variants in genes encoding structural proteins tend to be gain- or loss-of-function mutations rather than changes in expression level (37–39).

The most significant SNP in GLRX (rs72785409; P = 6.14 × 10⁻⁴⁸) is an eQTL for GLRX in whole blood, based on 15 cohorts in the eQTLGen database (40).
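Under this kind of expression inference (PrediXcan, part of the MetaXcan framework), a gene's genetically predicted expression is a weighted sum of allele dosages at its eQTL SNPs. The weights are trained on a reference such as GTEx, and the predicted values are then related to the trait. The NumPy sketch below shows the individual-level version. The weights, dosages, and function names are hypothetical. MetaXcan can also work from GWA summary statistics instead of individual genotypes.

```python
import numpy as np


def predicted_expression(dosages, weights):
    """Predicted expression per individual: dosages (individuals x SNPs) @ eQTL weights (SNPs,)."""
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.shape[1] != weights.shape[0]:
        raise ValueError("one weight per SNP column is required")
    return dosages @ weights


def expression_trait_correlation(dosages, weights, trait):
    """Pearson correlation between genetically predicted expression and the observed trait."""
    expr = predicted_expression(dosages, weights)
    return float(np.corrcoef(expr, np.asarray(trait, dtype=float))[0, 1])
```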

Table 2 MetaXcan analysis of genes whose expression is modeled to be associated with osmotic and oxidative hemolysis

Polygenic scores. We developed polygenic scores (PGSs) using data from two-thirds of the population and used the remaining third for validation. For osmotic hemolysis, the pruning and thresholding model (P < 10⁻⁷ and r² < 0.4) validated better than the best LDPred score. The correlation was 0.173 for the best LD-pruning model versus 0.0904 for the best LDPred model (Supplemental Figures 6–9). In these data for osmotic and oxidative hemolysis, pruning and thresholding therefore produced more precise PGSs than LDPred.
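Pruning and thresholding works in two steps. It first keeps only SNPs whose training-set P value passes a threshold and that are not in strong LD with a more significant SNP already kept. It then sums effect size × dosage over the kept SNPs.

The NumPy sketch below uses the thresholds reported above (P < 10⁻⁷, r² < 0.4). Several details are assumptions: the greedy most-significant-first order, the precomputed pairwise LD (r²) matrix as input, and the function names. Production tools such as PLINK's clumping operate within genomic windows rather than on a full LD matrix.

```python
import numpy as np


def prune_and_threshold(p_values, ld_r2, p_threshold=1e-7, r2_threshold=0.4):
    """Greedy LD pruning plus P-value thresholding; returns kept SNP column indices.

    SNPs are visited from most to least significant. A SNP is kept if its
    training-set P value is below p_threshold and its r^2 with every SNP
    already kept is below r2_threshold.
    """
    p = np.asarray(p_values, dtype=float)
    r2 = np.asarray(ld_r2, dtype=float)
    kept = []
    for idx in np.argsort(p, kind="stable"):
        if p[idx] >= p_threshold:
            break  # all remaining SNPs are less significant
        if all(r2[idx, k] < r2_threshold for k in kept):
            kept.append(int(idx))
    return kept


def polygenic_score(dosages, betas, kept):
    """PGS per individual: sum of training-set effect size x allele dosage over kept SNPs."""
    d = np.asarray(dosages, dtype=float)
    b = np.asarray(betas, dtype=float)
    return d[:, kept] @ b[kept]


def validation_correlation(scores, phenotype):
    """Pearson correlation between the PGS and the observed trait in the held-out data."""
    return float(np.corrcoef(scores, phenotype)[0, 1])
```

In this scheme, P values, effect sizes, and the kept SNP set all come from the training two-thirds. Only `polygenic_score` and `validation_correlation` are applied to the held-out third.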

Table 3 shows the correlation of each of the 3 hemolysis PGSs with the observed hemolysis measures within each ancestry group.

- Within non-Hispanic White samples, the PGS correlation with osmotic hemolysis was 0.221. This PGS explained more of the variability in osmotic hemolysis than any single marker.
- For oxidative hemolysis, the best-performing models were in African American and MCAH samples, where the PGS correlation was approximately 0.260.

Some ancestry groups did not yield PGSs, either because of small sample sizes or because no markers reached P < 1 × 10⁻⁷ after the data were split for cross-validation. To obtain predictors for these groups, hemolysis measures in each ancestry group were correlated with the non-Hispanic White PGS. This analysis showed that ancestry-specific PGSs were more precise than PGSs developed in other ancestry groups, even when the latter were based on larger samples. Therefore, PGSs should be developed in ancestry-appropriate groups when possible. When this is not feasible, scores from other ancestry groups can be used, but with reduced precision.
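The squared correlation between a linear predictor and a trait is the fraction of trait variance the predictor explains, so the figures above can be compared directly. In the worked check below, one caveat applies: the PGS correlation is computed within non-Hispanic Whites, whereas the 3.2% reported for the ANK1 SNP rs34664882 comes from the combined data set.

```latex
r_{\mathrm{PGS,\,osmotic}}^{2} = 0.221^{2} \approx 0.049 \;(\approx 4.9\%) \;>\; R^{2}_{\mathrm{rs34664882}} = 0.032 \;(3.2\%)
\qquad
r_{\mathrm{PGS,\,oxidative}}^{2} \approx 0.260^{2} \approx 0.068 \;(\approx 6.8\%)
```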

Table 3 Ancestry and cross-ancestry polygenic risk scores

Single-gene disorders involve causal variants carried by only a few people. Polygenic traits such as hemolysis differ: everyone carries a combination of alleles that increase or decrease hemolysis across all identified loci. For example, consider the top 50 loci in the non-Hispanic White PGS for osmotic hemolysis. Every RBC-Omics donor was heterozygous at between 7 and 34 of these loci (mean ± SD = 18.3 ± 4.6). Thus, genetic factors modulate osmotic and oxidative hemolysis in all individuals.
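The per-donor heterozygosity summary can be computed from a donors × loci genotype matrix. The NumPy sketch below assumes that imputed dosages are first rounded to hard calls coded 0, 1, and 2 (alternate-allele counts). The rounding step and the function names are assumptions; `np.rint` rounds halves to the nearest even value.

```python
import numpy as np


def heterozygous_counts(dosages):
    """Number of heterozygous loci per donor (rows = donors, columns = loci)."""
    calls = np.rint(np.asarray(dosages, dtype=float))  # hard-call imputed dosages to 0/1/2
    return (calls == 1).sum(axis=1)


def summarize_counts(counts):
    """Range, mean, and sample SD of per-donor heterozygous-locus counts."""
    counts = np.asarray(counts)
    return {
        "min": int(counts.min()),
        "max": int(counts.max()),
        "mean": float(counts.mean()),
        "sd": float(counts.std(ddof=1)),
    }
```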

Genetic analysis of in vivo hemolysis in the Walk-PHaSST and PUSH SCD cohorts. We tested whether genetic findings from in vitro stress hemolysis of cold-stored RBCs from healthy blood donors are also relevant to the in vivo severity of steady-state hemolytic anemia in human disease. To do so, we tested the genome-wide significant SNPs in the 27 loci identified across the hemolysis GWA analyses in 2 cohorts of patients with SCD (Walk-PHaSST and PUSH). These 27 loci contained 232 genome-wide significant SNPs. Each SNP was tested for association with an in vivo measure of the intensity of steady-state hemolytic anemia, analyzed as a quantitative trait in the SCD cohorts. Results were considered consistent between in vitro and in vivo hemolysis if both of the following held:

- the initial GWA P value reached genome-wide significance (P < 5 × 10⁻⁸);
- the P value for the association in the 2 SCD cohorts was also significant (P < 0.05).
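The consistency criterion can be written as a filter over SNP-level results. This Python sketch makes two assumptions:

- each SNP has one combined SCD P value;
- a region counts as consistent if at least one of its SNPs meets both thresholds.

How SNP-level results were rolled up to regions is an assumption. The field and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

GWA_THRESHOLD = 5e-8  # genome-wide significance in the in vitro GWA
SCD_THRESHOLD = 0.05  # nominal significance in the SCD cohorts


@dataclass
class SnpResult:
    region: str   # locus label
    snp: str      # SNP identifier
    p_gwa: float  # P value in the in vitro hemolysis GWA
    p_scd: float  # P value for the in vivo hemolysis association in SCD


def is_consistent(result: SnpResult) -> bool:
    """Both the in vitro and the in vivo association pass their thresholds."""
    return result.p_gwa < GWA_THRESHOLD and result.p_scd < SCD_THRESHOLD


def consistent_regions(results):
    """Map each region with at least one consistent SNP to the list of those SNPs."""
    hits = defaultdict(list)
    for result in results:
        if is_consistent(result):
            hits[result.region].append(result.snp)
    return dict(hits)
```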

Consistent results were found in 7 regions: 4 regions from the osmotic hemolysis GWA and 3 of the 4 regions from the oxidative hemolysis GWA (P < 0.05; Table 4).

- Osmotic hemolysis: significant results on chromosomes 7 (AQP1), 12 (several genes), and 16 (HBA2, PIEZO1).
- Oxidative hemolysis: 3 of the 4 genome-wide significant loci were concordant, on chromosomes 5 (GLRX), 22 (SEC14L4), and X (G6PD).

Even under a more conservative Bonferroni correction for multiple testing, the HBA2 and G6PD loci remained significant in the SCD cohorts.