Multiscale genetic architecture of donor-recipient differences reveals intronic LIMS1 mismatches associated with kidney transplant survival

Donor-recipient (D-R) mismatches outside of human leukocyte antigens (HLAs) contribute to kidney allograft loss, but the mechanisms remain unclear, specifically for intronic mismatches. We quantified non-HLA mismatches at variant-, gene-, and genome-wide scales from single nucleotide polymorphism (SNP) data of D-Rs from 2 well-phenotyped transplant cohorts: Genomics of Chronic Allograft Rejection (GoCAR; n = 385) and Clinical Trials in Organ Transplantation-01/17 (CTOT-01/17; n = 146). Unbiased gene-level screening in GoCAR uncovered the LIMS1 locus as the top-ranked gene where D-R mismatches associated with death-censored graft loss (DCGL). A previously unreported, intronic, LIMS1 haplotype of 30 SNPs independently associated with DCGL in both cohorts. Haplotype mismatches showed a dosage effect, and minor-allele introduction to major-allele-carrying recipients showed greater hazard of DCGL. The LIMS1 haplotype and the previously reported LIMS1 SNP rs893403 are expression quantitative trait loci (eQTL) in immune cells for GCC2 (not LIMS1), which encodes a protein involved in mannose-6-phosphase receptor (M6PR) recycling. Peripheral blood and T cell transcriptome analyses associated the GCC2 gene and LIMS1 SNPs with the TGF-β1/SMAD pathway, suggesting a regulatory effect. In vitro GCC2 modulation impacted M6PR-dependent regulation of active TGF-β1 and downstream signaling in T cells. Together, our data link LIMS1 locus D-R mismatches to DCGL via GCC2 eQTLs that modulate TGF-β1–dependent effects on T cells.


Figure S2. Association of different genome-wide D-R mismatch scores with DCGL in European-
to-European transplants in GoCAR and CTOT.Genome-wide mismatch scores were calculated for all SNPs, exonic SNPs (SNPs located in exonic region), non-exonic SNPs (all SNPs minus exonic SNPs).Univariate and multivariable Cox regression (adjusting HLA mismatch score, induction therapy, and donor status) analyses were performed for both GoCAR and CTOT cohorts.

Figure S3
. Association of different genome-wide D-R mismatch scores with DCGL in all transplants.Genome-wide mismatch scores were calculated for all SNPs, exonic SNPs (SNPs located in exonic region), non-exonic SNPs (all SNPs excluding exonic SNPs).Univariate and multivariable Cox regression (adjusting HLA mismatch score, induction therapy, and donor status) analyses were performed for both GoCAR and CTOT cohorts.HLA 4-antigen (A, B, DR, and DQ) mismatch score was used in GoCAR and 6-antigen (A, B, C, DP, DR, and DQ) mismatch score in CTOT.

Figure S4. Correlation among D-R mismatch scores defined within different genomic scopes in
GoCAR.SNP-level any mismatches were summed for all SNPs, exonic SNPs, non-exonic SNPs (all SNPs excluding exonic SNPs), all-nsSNPs (all SNPs excluding non-synonymous SNPs), nsSNPs-trSNPs (non-synonymous SNPs excluding transmembrane SNPs), and trSNPs (transmembrane SNPs) as raw counts, and normalized by their scope-specific IQR, respectively.1 and Methods) for the whole GoCAR cohort or the subset of European-to-European (E-to-E) D-R pairs.In GoCAR E-to-E D-R pairs, Kaplan-Meier plots show the graft survival curves for equally dichotomized groups of mismatch scores at LIMS1 locus, where mismatch scores were defined as "any mismatch" in (B) and "double mismatch" in (C).P-values were derived from log-rank tests in comparison of upper quantile versus lower quantile.The LD structure of the region surrounding the 30 candidate SNPs and rs893403 generated by LDlink, displayed as the R 2 value with a representative SNP rs200106875 of the haplotype of 30 candidate SNPs.The SNP rs893403 is highlighted by a red dot, clearly located in a distinct haplotype other than the 30-SNP haplotype.Gene models are shown below.A-C), E-to-E (D-F), and non-E-to-E (G-I) D-R pairs.Major: mismatch derived from major haplotype allele introduced by donor to the recipient with homozygous minor allele and no rs893403 risk mismatch; Minor: mismatch derived from minor haplotype allele introduced by donor to the recipient with homozygous major allele and no rs893403 risk mismatch; rs893403: mismatch at rs893403 defined as risk allele (A allele) introduced by donor to the recipient carrying G/G genotype; Other: no mismatch at the haplotype and rs893403.P-values were derived from log-rank tests.c : P-value was calculated from unpaired for continuous variables and from Fisher's exact test for categorical variables unless otherwise specified.Bold p-value < 0.05.

Figure S12. eQTL data of rs893403 (A) and the haplotype (represented as one of the candidate SNP rs4012003) (B) in tubulointerstitial tissue from NephQTL
d : Genetic ancestry was inferred from genome-wide genotype data and considered more accurate than self-reported race 3 .
f : In order to calculate the p-value, the raw HLA mismatch score used in CTOT was hereby categorized in the same way as GOCAR so that the HLA mismatch scores originally defined on different scales in the two cohorts are comparable.The p-value was calculated by Fisher's exact test.Table S13.GCC2 expression was associated with SNP rs893403 and the haplotype genotype in multiple blood cell types from healthy individuals in the DICE data 6 .
Table S14.List of SNPs in high LD with SNP rs893403 and located at peak regions of the LIMS1 locus in the kidney scATAC-seq data 7 .
Table S15.List of SNPs in high LD with the haplotype and located at peak regions of the LIMS1 locus the kidney scATAC-seq data 7 .

Figure S5 .
Figure S5.Manhattan plots for genome-wide association tests of gene-level mismatch scores with DCGL in GoCAR using different models: (A) double mismatch in all cohort, (B) double mismatch in the sub-cohort of E-to-E D-R pairs, (C) any mismatch in all cohort, and (D) any mismatch in the sub-cohort of E-to-E D-R pairs.The 23 genes consistently showing significant association signals (nominal P ≤ 0.05) are highlighted in red diamond.

Figure S6 .
Figure S6.Gene-level mismatches associated with graft loss.(A) Venn diagram shows the number of genes identified with mismatch score significantly associated with DCGL (nominal p <= 0.05) from four different analyses: double mismatch or any mismatch (definition in Figure1and Methods) for the whole GoCAR cohort or the subset of European-to-European (E-to-E) D-R pairs.In GoCAR E-to-E D-R pairs, Kaplan-Meier plots show the graft survival curves for equally dichotomized groups of mismatch scores at LIMS1 locus, where mismatch scores were defined as "any mismatch" in (B) and "double mismatch" in (C).P-values were derived from log-rank tests in comparison of upper quantile versus lower quantile.

Figure S7 .
Figure S7.Summary of mismatch score of the 23 candidate genes was significantly related to DCGL.(A) The Correlation of the summary score of the 23 genes and other gene regions with genome-wide mismatch score.(B) Survival curve of the patients stratified by the 23 genes' mismatch score grouped by mean value.(C) Forest plot of the hazard ratio of 23 genes' mismatch score to DCGL adjusted by mismatch score of the other genome regions, Induction therapy, donor status and HLA mismatch.

Figure S8 .
Figure S8.Linkage disequilibrium between 30 top candidate SNPs and rs893403 within the LIMS1 locus.(A)The LD metrics R 2 (red) and D' (blue) were calculated based on the genotype data from all the continental populations of the 1000 Genomes Project (hg19) using LDlink 1 .(B) The LD structure of the region surrounding the 30 candidate SNPs and rs893403 generated by LDlink, displayed as the R 2 value with a representative SNP rs200106875 of the haplotype of 30 candidate SNPs.The SNP rs893403 is highlighted by a red dot, clearly located in a distinct haplotype other than the 30-SNP haplotype.Gene models are shown below.

Figure S9 .
Figure S9.Kaplan-Meier curves of DCGL for the CTOT patients grouped by presence and absence of any D-R mismatch of the identified LIMS1 haplotype.P-value was derived from log-rank test.

Figure S10 .
Figure S10.Allele frequency of rs893403 (A) and haplotype (B) in different major ancestral populations.The allele frequencies were retrieved from 1000 Genomes project deep wholegenome sequencing data.The MAF of the haplotype was calculated as the mean value of the MAF of the 30 SNPs.

Figure S11 .
Figure S11.Directionality of the haplotype mismatch associated with DCGL in GoCAR.Kaplan-Meier plots show the graft survival curves grouped by the directionality of the mismatches at the haplotype and the rs893403 risk mismatch in all (A-C), E-to-E (D-F), and non-E-to-E (G-I) D-R pairs.Major: mismatch derived from major haplotype allele introduced by donor to the recipient with homozygous minor allele

2 .Figure
Figure S12.eQTL data of rs893403 (A) and the haplotype (represented as one of the candidate SNP rs4012003) (B) in tubulointerstitial tissue from NephQTL 2 .

Figure S15 .
Figure S15.Expression levels of LIMS1 and GCC2 in different immune cell types from published bulk and single-cell RNA-seq datasets.(A) Distribution of LIMS1 expression in sorted PBMC subtypes from the DICE cohort.(B) Expression levels of LIMS1 and GCC2 in the PBMC single cell data.

Figure S16 .
Figure S16.Enriched functions and transcription factors of GCC2 co-expressed genes in multiple T cell subtypes from the DICE cohort.

Figure S17 .
Figure S17.Survival curves of DCGL in all patients(A) and E-to-E (B) of the GoCAR cohort grouped by rs2460944 risk allele (A allele) introduced by donor.

Figure S18 .
Figure S18.GCC2 modulate generation of active TGFB1 and downstream signaling in lymphocytes and epithelial cell lines.(A-C) We overexpressed either a GFP-GCC2 (GCC2-OE) or a GFP Control construct and confirmed overexpression in IMCD cells, followed by extraction of protein lysates, subcellular fractionation, and immunoblotting (n=2 sets).(A) Representative immunoblots of cellular fractions probed for GCC2, CI-M6PR, Calreticulin, and GAPDH are displayed for GCC2-OE-vs GFP-control-IMCD cells.(B) Dot plots show corresponding densitometric quantifications of CI-M6PR in the MFs (normalized to Calreticulin) from two experiments.(C) Dot plot shows corresponding ratios of active (LAP cleaved) to total TGFB1 levels (both in pg/ml normalized to Control in each experiment and analyzed by paired ttest) in GCC2-OE and Control IMCD supernatants assayed by ELISA after 24 hours serum starvation.(D-G) To further investigate the cellular role of GCC2, we used Lentiviral-ShRNA mediated knockdown of GCC2 (3 sequences named GCC-Kd1, -Kd2, -Kd3 respectively), and compared this with a scramble shRNA (RFP Scramble) infected control lentivirus in Jurkat Tcells, HEK-293Tcells, or IMCD cell lines.(D) Representative Immunoblots probed for GCC2 (normalized to GAPDH) confirmed GCC2 knockdown in HEK 293 cells, and total lysates were immunoblotted for Phosphorylated and total SMAD3.Dot plots show ratios of active (LAP cleaved) to total TGFB1 levels (both in pg/ml normalized to scramble in each experiment and analyzed by paired T-test) in GCC2-knockdown vs. Scramble-infected supernatants from Jurkat T-cell (E) and IMCD cells (F) assayed by ELISA after 24 hours serum starvation (n=2 sets of experiments each).(G) A representative standard curve is generated during TGFB1-ELISA experiments [Ratios paired t-test unless specified; *: P<0.05; CI-M6PR: Cation independent mannose-6-Phosphate receptor; MF and CF: membrane and cytoplasmic fraction of lysate; IMCD: rat inner medullary collecting duct cells; TGF-β: TGFB1].

Supplementary tables Table S1. Demographic and clinicopathologic characteristics of donors and recipients with genome-wide genotype data in the GoCAR and CTOT1/17 cohorts.
Genome-wide genotype data is available for 385 donor-recipient (D-R) pairs from the parent GOCAR study after data processing and quality control detailed elsewhere 3 .Genome-wide genotype data is available for 146 donor-recipient (D-R) pairs from the parent CTOT study after data processing and quality control detailed elsewhere [ref KI].
a :b :

Table S2 . Causes of DCGL in the GoCAR cohort.
*Includes BK virus disease

Table S7 . Association of LIMS1 gene-level mismatch, rs893403, and haplotype mismatch with DCGL in 385 GoCAR patients using muti-variable Cox regression analysis adjusted by the presence of anti-HLA DSA.
Cox model adjusted by genome-wide mismatch, donor specific antibody, induction therapy, donor status and HLA mismatch.
a :

Table S8 . Top candidate variants at the LIMS1 locus with D-R mismatches associated with graft loss. Table S9. Association of identified LIMS1 haplotype with DCGL in CTOT using multivariable Cox model.
Induction therapy was excluded from adjusted covariates because the model would have not converged when including the variable.
a :

Table S10 . The directionality of the haplotype mismatch in association with DCGL in the non-E-to-E GoCAR cohort
a : Definition of the mismatch group.Major: mismatch derived from major haplotype allele introduced by donor to the recipient with homozygous minor allele and no rs893403 risk mismatch; Minor: mismatch derived from minor haplotype allele introduced by donor to the recipient with homozygous minor allele and no rs893403 risk mismatch; rs893403: mismatch at rs893403 defined as risk allele (A allele) introduced by donor to the recipient carrying G/G genotype; Reference: no mismatch at the haplotype and rs893403.

Table S11 . Correlation of LIMS1 and haplotype mismatch score with subclinical and clinical TCMR episodes
a Adjusted by genome-wide mismatch, induction therapy donor status and donor age.

Table S12 . Correlation of number of haplotype minor alleles in the donor and number of haplotype mismatches with 12-month CADI or Ci+Ct score.
a Ordinal Logistic regression adjusted by genome-wide mismatch, induction therapy donor status and donor age.