Reduced methylation correlates with diabetic nephropathy risk in type 1 diabetes

Diabetic nephropathy (DN) is a polygenic disorder with few risk variants showing robust replication in large-scale genome-wide association studies. To understand the role of DNA methylation, it is important to have the prevailing genomic view to distinguish key sequence elements that influence gene expression. This is particularly challenging for DN because genome-wide methylation patterns are poorly defined. While methylation is known to alter gene expression, the importance of this causal relationship is obscured by array-based technologies since coverage outside promoter regions is low. To overcome these challenges, we performed methylation sequencing using leukocytes derived from participants of the Finnish Diabetic Nephropathy (FinnDiane) type 1 diabetes (T1D) study (n = 39) that was subsequently replicated in a larger validation cohort (n = 296). Gene body–related regions made up more than 60% of the methylation differences and emphasized the importance of methylation sequencing. We observed differentially methylated genes associated with DN in 3 independent T1D registries originating from Denmark (n = 445), Hong Kong (n = 107), and Thailand (n = 130). Reduced DNA methylation at CTCF and Pol2B sites was tightly connected with DN pathways that include insulin signaling, lipid metabolism, and fibrosis. To define the pathophysiological significance of these population findings, methylation indices were assessed in human renal cells such as podocytes and proximal convoluted tubule cells. The expression of core genes was associated with reduced methylation, elevated CTCF and Pol2B binding, and the activation of insulin-signaling phosphoproteins in hyperglycemic cells. These experimental observations also closely parallel methylation-mediated regulation in human macrophages and vascular endothelial cells.


Replication cohorts
All individuals with diabetes from Hong Kong Diabetes Register attended after an overnight 8-hour fast and underwent structured comprehensive assessments, including eye, feet, urine, and blood examinations. Eye examination included visual acuity and fundoscopy through dilated pupils or retinal photography. A sterile, spot urine sample was used to measure the albumin-to-creatinine ratio (UACR). Microalbuminuria was defined as UACR of 25 to 300mg/g in females and 30 to 300 mg/g in males. Macroalbuminuria was defined as UACR > 300mg/g. eGFR was calculated according to the CKD-EPI equation. Study participants were classified with type 1 diabetes if they presented with diabetic ketoacidosis or required continuous use of insulin within 1 year of diagnosis. Healthy participants were recruited from the community in a community-based health screening program (9). DNA was extracted from peripheral blood leukocytes using standard procedures. Ethical approval was obtained from the Chinese University of Hong Kong Clinical Research Ethics Committee. All study participants gave written informed consent for data analysis and research purpose at the time of recruitment.
For Thai individuals with T1D, demographic data, random plasma C-peptide levels, and blood samples were collected. Individuals with T1D who developed diabetic nephropathy (n=9, micro -UACR ³30<300mg/g and macro -UACR >300mg/g) were compared with individuals without renal complications (n=56) and age-matched healthy controls (n=65). C-peptide was measured by a chemiluminescent immunometric assay (IMMULITE ®, Siemens) which had inter-assay coefficient of variation 3.3 % at plasma C-peptide 0.6 ng/mL. Diabetic nephropathy staging was classified and aligned using albuminuria range (mg/g) measured by urinary albumin-to-creatinine ratio (UACR) in all samples in the discovery and replication cohorts.

Methylation sequencing (methyl-seq)
Purified gDNA was fragmented by sonication using the BioRuptor (Diagenode); fragmentation was confirmed by capillary electrophoresis on the MultiNA (Shimadzu). 500 ng of fragmented genomic DNA was used for methyl-CpG enrichment using MethylMiner (Life Technologies) according to the manufacturer's instructions (positive control methylated DNA spike-in option used). DNA was eluted from the Methyl-CpG Binding Domain-coupled magnetic beads with 0.6 M NaCl, protocol described in more detail (10). Eluted DNA was quantified, and 5 ng of this methylated DNA was used to generate Illumina sequencing libraries using the NEB-Next DNA Library Preparation Kit (New England Biolabs) following the manufacturer's protocol. Cluster generation was performed on an Illumina cluster station at a concentration of 10pM per library, using version 2 Cluster generation kits (single end sequencing) and the flow cell was processed on a Genome Analyzer IIx (Illumina) with 36 cycles using version 2 SBS kits (Illumina). Base calling was performed by the Illumina RTA software version 1.6. Methyl-seq was also performed for human podocytes maintained in culture using the following conditions; NG (normal glucose control), and HG (15 days high glucose) and the demethylating agent 5-Aza-2′-deoxycytidine NG+5adC (normal glucose including 3-day treatment with 5-aza-2′-deoxycytidine, and HG+5adC (15 days glucose including 3-day treatment with 5-aza-2′-deoxycytidine) (n=3 per group). Enriched methylated DNA libraries were prepared using the NEB-Next DNA Library Preparation Kit (New England Biolabs) following the manufacturer's protocol.
Sequencing was conducted on the Illumina HiSeq2500 platform, paired-end 150 bp. The data in this article are deposited in NCBI Gene Expression Omnibus and accessible as GSE77011.

Bioinformatic analysis for methyl-seq
DNA methylation were compared between all pairwise combinations of samples using the MACS peak calling software (version 1.4.0) with a fixed shift size of 75 bp and a significance cut-off of 10E-05 (11,12). Genomic regions showing different methylation patterns between pairs of samples were merged using Bedtools (13,14). Duplicate reads which aligned to the same location in each sample were removed from further analysis. The number of reads aligning to each region (peaks) were counted using Bedtools multicov, producing a count matrix (reads per region per sample). For the genomic annotation of regions, a custom python script was used. The human (h19) gene annotation file was obtained from Genecode (release 14) (15) and was used to assign regions to genes based on the nearest genebody overlap.
Differentially Methylated Regions (DMRs) were identified by using edgeR (16) with a Generalized Linear Model (17). Biological variation caused by differences in age and gender were included as model factors. The batch differences were adjusted by including library cluster concentrations as part of the modelling. To account for differences in cell heterogeneity, we utilized reference-based celltype deconvolution algorithm (in R) (18). Reference-based cell-type deconvolution uses cell-type specific differentially methylated regions (DMRs) to infer cell type proportions, we performed this type of analysis for our data and show the methylation variance associated with cell-type markers for three major white blood cells (B-cell; CD19, T-cell; CD3D, Monocyte; CD14). B cell marker showed the greatest variance among the sequenced samples (Fig. S1C), therefore the methylation status of the B cell marker, CD19, was used to adjust heterogeneity. To validate the modelling, the ranked beta values of variables were tested against related gene sets. Statistical tests were based on a 6-way comparison between controls and cases with different renal status. The following contrasts were included; Healthy vs Normo, Healthy vs Macro, Healthy vs ESRD, Normo vs Macro, Normo vs ESRD and Macro vs ESRD. DMR clusters were identified by using supervised heatmap analysis was performed using ComplexHeatmap (19) in R split by k-means clustering (minimum row_km=4).
The data imputed was DMRs identified at P value <0.01 for the following comparisons; Healthy vs All cases, Healthy vs Normo, Healthy vs. Macro, and Healthy vs. ESRD.
Next, differentially methylated regions identified by edgeR were integrated with the publicly available databases EpiExplorer (20) and ENCODE genome project (21). EpiExplorer is a web tool for exploring genome and epigenome data on a genomic scale and was used to expand the association of DMRs with regulatory elements such as TSS, gene promoters, exons, introns, and Transcription Factor Binding Sites (TFBSs). All EpiExplorer analyses compared identified DMRs with a randomly generated background set of regions to give insights into enrichment. Regions of reduced methylation localise in or around CpG islands (CGIs) (22). To examine this, DMRs identified were integrated with coordinates (Bedfile) of CpG islands from the human genomes (hg19), downloaded from UCSC Genome Browser. Then, DMRs were integrated with regions around CGIs defined as CGI shores and shelves, which are the regions immediately flanking and up to 5kb away from CpG islands (CpG shores; ± 1kb from CGIs, CpG shelves; ± 1-5kb from CGIs). These regions are interesting because they are variably methylated in cancer and development (23). To generate a list of coordinates for CGI shores and shelves, we used Bedtools to extend genomic regions by 1kb for shores and 1-5kb in either direction from CGIs. Then subtractBed was used to remove CGIs coordinates, and therefore coordinates for only shores and shelves remained. Further analysis on TFBS association with differentially methylated genes was performed by intersecting Encode TFBS data selected from experiments conducted in human tissues including heart, kidney, retina, immune cells and the vasculature.
Pathway analysis for differentially methylated regions annotated to genes (DMGs) was performed using Reactome (24,25) and Gene set enrichment analysis (GSEA software) utilizing predefined gene sets from the Molecular Signatures Database (MSigDB v5.0) (26). A false discovery rate (FDR) threshold of 0.05 was used for selection of gene sets derived from the Reactome pathway database.
Gene network analysis and visualization was performed using enrichment map (27) within Cytoscape software (28).
Human podocytes Methyl-seq data analysis was performed using a similar workflow to the FinnDiane Leukocyte data. Briefly, sequenced reads were aligned to a human reference genome (hg19) using BWA-MEM with default alignment parameters and then the sequence alignments for each read were counted. Profiles of DNA methylation were compared between each sample and its respective input, using the MACS peak calling software. The numbers of reads aligning to each region were extracted, producing a matrix of counts (reads per region per sample). Differentially Methylated Regions (DMRs) were identified by using edgeR with a Generalized Linear Model with FDR threshold of 0.05.

Prospective analysis for progression of diabetic nephropathy
A subset of the participants from the FinnDiane and PROFIL validation cohorts were evaluated prospectively for disease progression by change in albuminuria and also follow-up eGFR (estimated glomerular filtration rate). Samples from the diabetes cohorts were classified as non-progressors and progressors based on change in albuminuria stages (Normo to Micro; Micro to Macro; Macro to ESRD). Decline in eGFR was calculated as follows: (last eGFR -first eGFR)/ (years between measurements). No decline was defined as an eGFR slope of > -1ml/min/1.73 m 2 , a slow decline as an eGFR slope of > -3 and < -1 ml/min/1.73 m 2 and steep decline was defined by eGFR slope of < -3ml/min/1.73 m 2 . Some samples were excluded from the analysis based on age at diabetes diagnosis (>30 yrs. old; n=68), no follow-up eGFR data (n=6), time between baseline and last eGFR <2 years (n=5), more than 50% eGFR reduction in 2 years (n=1). In total, we collected prospective data from At confluence all cells models were serum starved for 24h and FBS supplementation was reduced to 2% before glucose treatment. The glucose treatment response was performed for 15 days to a medium containing either 30 mM high glucose (HG) or 5.5 mM normal glucose (NG), medium was changed every three days. For PCT glucose treatment was 40 mM high glucose (HG) or 17.5 mM normal glucose (NG). At day 12, cells were cultured for 3 days in HG or NG glucose medium treated with 8µM of the DNA methylation inhibitor 5-aza-2′-deoxycytidine (Sigma, Cat# A3656; HG-5adC/NG-5adC) and DMSO (vehicle control).

Chromatin immunoprecipitation (ChIP)
Human podocytes, HMEC-1 and HAECs post glucose exposure were washed and cross-linked with 1% formaldehyde (in phosphate buffered saline without calcium and magnesium) for 10 min at room temperature. Quenching for excess formaldehyde was performed with 0.125 M glycine for 10 min. MeCP2 (Sigma, Cat#9317). An IgG ChIP-grade antibody was used as a negative control for nonspecific background enrichment. Antibody-bound chromatin complex was captured with Dynabeads® Protein A magnetic beads (Invitrogen) and 5 washes of each beginning with low salt, high salt, lithium chloride, TE buffer (pH 8.0) followed with TE + 0.01% SDS were performed.
Bound DNA was eluted with ChIP elution buffer (20 mM Tris-HCl pH7.5, 5 mM EDTA, 50 mM NaCl, 1% SDS) and reverse cross-linked for 2 h at 62°C on a thermomixer (Eppendorf) set to 1,400 rpm. DNA was recovered and purified with NucleoSpin® PCR Clean-up Columns (Macharey-Nagel) following the manufacturer's instructions. Quantitative PCR was performed and percentage input (% input) was calculated for each ChIP experiment, and results expressed as relative fold enrichment/ratio for the target sequences compared between the case and control groups. Chromatin immunoprecipitation assays were performed five times per sample for CTCF/Pol2B and three times per sample for MeCP2. 10ng of ChIP eluted DNA was retained for ChIP-seq workflow.

Chromatin immunoprecipitation analysis by quantitative real-time (ChIP-qPCR)
Gene-specific chromatin enrichment was assessed using Chromatin immunoprecipitation followed by quantitative PCR (ChIP-qPCR). Target  Sequenced tags were aligned to the human reference genome hg19 using BWA-MEM with default settings. Uniquely mapped reads with no more than one mismatch were used for binding peak detection. Profiles of ChIP-seq peaks were compared between each sample and its respective input, using the MACS peak calling software. The numbers of read tags aligning to each region were extracted, producing a matrix of counts (tags per region per sample). Differentially Methylated Regions (DMRs) were identified by using edgeR with a Generalized Linear Model with FDR threshold of 0.05.

Human podocyte mRNA-seq
One million human podocyte cells (n=3, per group) were used for total RNA extraction using Trizol reagent (Thermo Fisher Scientific, MA), followed by Direct-zol column purification (Zymo Research). RNA quality was verified on the Shimadzu MultiNA capillary electrophoresis system (Shimadzu, Japan). Following Dynabead Oligo(dT) enrichment (Invitrogen), mRNA was prepared into sequence ready libraries with the NEBNext mRNA Library Prep Reagent Set for Illumina (New England Biolabs). Libraries were sequenced as described above with a 150 bp single read length.
Sequence tags were aligned to the human reference genome hg19 and Ensembl transcript reference (Homo Sapiens 75) using STAR aligner (30). The numbers of read tags aligning to each region were extracted producing a matrix of counts (tags per gene per sample). Genes with less than 100 sequence reads across all samples were removed from further analysis.

Data integration
We explored the relationship between DNA methylation and transcription factor binding using rankrank density analysis, comparing clinical samples with human podocytes. We plotted the signed rank of the -log10(p-value) of the differentially methylated peaks detected in FinnDiane vs signed rank of the -log10(p-value) of the same peaks measured for DNA methylation and transcription factor enrichment in human podocytes. This was transformed into a density distribution using a 2dimensional kernel. This density distribution was then coloured as a heatmap for visualization. The rank-rank analysis was repeated for podocyte data only, comparing DNA methylation and Transcription factor enrichment in human podocytes exposed to chronic HG and 5adC.

Phosphorylation profiling of insulin signalling proteins
Cell lysates isolated from human podocytes exposed to chronic HG and 5adC (n = 2) were lysed using Protein Extraction Buffer (Full Moon BioSystems, Sunnyvale, CA, USA). A total of 2μg of total protein per sample was used to hybridize the PIG219 Ab Array (Full Moon Biosystems) covering major signalling molecules of the insulin signalling and mammalian target of rapamycin (MTOR) pathways. Labelling, detection, and array scanning were performed by Crux biolabs (Melbourne, Australia), according to standard protocols. Data were normalized to the average signal intensity on each array and are depicted as fold change between samples.

Infrared protein detection and quantification
Human podocytes cultured in high and normal glucose medium for 15 days (HG/NG), were treated with 8µM of 5-Aza-2′-deoxycytidine (Sigma) and DMSO (vehicle control) at day 12 for 3 days.

Core DDN gene detection by methyl-qPCR assay
Methyl-binding-domain capture quantitative PCR (denoted as methyl-qPCR) was used to calculate percentage methylation by comparing amplification of the target sequences (DMRs identified from FinnDiane discovery cohort) in unbound (unmethylated) and bound (methylated) fractions enriched after Methylminer assay. Methylated-control primer was used to amplify the methylated DNA spikein added to each sample, assisting in normalizing any technical variability between qPCR experiments (housekeeping control). DDN core gene primer sequences are detailed in Supplementary Table 13.

Bisulfite sequencing
In total, 100 ng of genomic DNA (five samples of each group) was Bisulfite converted using the EpiTect FAST Bisulfite kit (QIAGEN) following the manufacturer's standard protocol. Bisulfite specific primers were designed so that they contained a high number CpGs within a 500bp amplicon limit, for DMRs detected on genes of interest using MethPrimer (31). PCR was performed in a total