Type 2 diabetes candidate genes, including PAX5, cause impaired insulin secretion in human pancreatic islets

Type 2 diabetes (T2D) is caused by insufficient insulin secretion from pancreatic β cells. To identify candidate genes contributing to T2D pathophysiology, we studied human pancreatic islets from approximately 300 individuals. We found 395 differentially expressed genes (DEGs) in islets from individuals with T2D, including, to our knowledge, novel (OPRD1, PAX5, TET1) and previously identified (CHL1, GLRA1, IAPP) candidates. A third of the identified expression changes in islets may predispose to diabetes, as expression of these genes associated with HbA1c in individuals not previously diagnosed with T2D. Most DEGs were expressed in human β cells, based on single-cell RNA-Seq data. Additionally, DEGs displayed alterations in open chromatin and associated with T2D SNPs. Mouse KO strains demonstrated that the identified T2D-associated candidate genes regulate glucose homeostasis and body composition in vivo. Functional validation showed that mimicking T2D-associated changes for OPRD1, PAX5, and SLC2A2 impaired insulin secretion. Impairments in Pax5-overexpressing β cells were due to severe mitochondrial dysfunction. Finally, we discovered PAX5 as a potential transcriptional regulator of many T2D-associated DEGs in human islets. Overall, we have identified molecular alterations in human pancreatic islets that contribute to β cell dysfunction in T2D pathophysiology.


RNA-sequencing
RNA was extracted from human pancreatic islets using AllPrep DNA/RNA kit or miRNeasy Mini Kit (Qiagen, Hilden, Germany). Sample preparation of 1 µg high quality RNA was done using TruSeq RNA Library Preparation Kit or TruSeq Stranded Total RNA Library Prep, followed by RNA-sequencing on HiSeq 2000 or NextSeq 500 (Illumina, San Diego, CA, USA), respectively. For preparation of data and analysis, quality and adapter trimming was done using Trim Galore (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/), and then Salmon (1) was used for quantification of transcript expression. Quality control (QC) was done using fastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and multiQC (2) tools. Downstream analysis was done using R (http://www.R-project.org/). R package tximport (3) was used to import transcript-level abundance, estimated counts and transcript lengths into R using gencode annotation v. 32. Genes with ≥3 reads in ≥10% of samples were considered expressed. The criteria for inclusion of islet preparations in further analysis were islet purity >50% and days in culture ≤7 days (all preparations), as well as >40 years of age and no previous gestational diabetes (preparations from ND controls). Differential expression analysis and identification of DEGs was based on a generalized linear model using DESeq2 (4), with correction for age, sex, islet purity, and days in culture. We performed an additional analysis where we also corrected the generalized linear model for BMI. Deseq2 was also used to generate normalized gene expression counts. The analysis of association between HbA1c and gene expression was performed as above on data from all 176 islet preparations from individuals without a clinical T2D diagnosis for which we had HbA1c values, with the exclusion of an individual with previous gestational diabetes. The analysis was corrected for age, sex, BMI, islet purity, and days in culture.

Overlap between our DEGs and differentially expressed genes in published studies
Public data on differentially expressed genes in islets from ND controls versus individuals with T2D were integrated into lists in the following way; Ensembl IDs were preferred over gene symbols/names due to their more consistent annotation. Ensembl IDs were used directly when available, otherwise, gene symbols were matched to their Ensembl IDs using the R package biomaRt (GRCh38 -Ensembl v.100). Genes that were not assigned to an ID were scanned for synonyms/aliases with biomaRt and a second round of Ensembl ID matching was performed. Unmatched genes without Ensembl IDs were removed from subsequent analyses. Genes with multiple Ensembl IDs were represented by their first matched Ensembl ID.

Pathway analysis
The R package WebGestaltR (v.0.4.4) was used to discover enriched functional terms and pathways in our datasets. Over-representation analysis using the human genome as reference was performed against Gene Ontology (Release 2021-11-16) with an enrichment threshold of FDR below 5% (q<0.05). Pancreatic islets from 18 donors (13 ND, 3 pre-T2D based on HbA1c 42-48 mmol/mol, and 2  T2D, see Table 1 in the main text) were dispersed, fixed and stained as previously described (5), with the exception that MAB1249 (R&D Systems, Minneapolis, MN, USA) was used for glucagon staining. The sorting was performed on an Aria Fusion (BD Biosciences, San Jose, CA, USA). RNA was isolated from sorted cells with the RecoverAll Total Nucleic Acid Isolation Kit (ThermoFisher Scientific, Waltham, MA, USA). Libraries for sequencing were prepared with the TruSeq Stranded Total RNA Library Prep kit (Illumina) and sequencing was done on a NextSeq 500 (Illumina). The data were analyzed as described above, with adjustment for individual id as covariate in DESeq2 (4).

eQTL analysis
Genome-wide genotyping for islet samples in the LUDC islet case-control cohort (134 ND and 33 donors with T2D) was run on the OmniExpress genotype arrays on the Illumina platform. Quality of genotyping was assessed before imputation separately and samples were excluded if they had an overlap genotype success rate <95%. Quality control was performed using the PLINK1.9 software (6). Briefly, samples with missingness rate of genotype data larger than 5% were removed and we assessed concordance between reported sex and the sex based on genotype data. Loci with minor allele count (MAC) smaller than five or a p-value smaller than 1x10 -4 were removed. Heterozygosity of the samples was checked and none of the samples were outside three standard deviations from the mean. Finally, no duplicate or related samples were found, and monomorphic sites were removed. Michigan imputation server (https://imputationserver.sph.umich.edu/) was used for imputation utilizing HRC version r1.1 as a reference panel. Imputation was conducted using the Sanger imputation service (https://imputation.sanger.ac.uk/) using MINIMAC3 and Haplotype Reference Consortium version r1.1 as a reference panel with array build hg38. SNPs with an INFO score >0.4 and Hardy-Weinberg Equilibrium p>1x10^− 6 (for chrX this was calculated from female individuals only) from each panel were kept. A vcf for each chromosome was generated. Dosages were calculated from the imputation probabilities and converted into best-guess genotypes using PLINK. Files from all chromosomes were merged and transposed.
The RNA-sequencing data of our 395 DEGs from the LUDC islet case-control cohort were normalized for sequencing depth for the selected genes and further processed by quantile normalization to obtain normal distribution while preserving the ranks. A cis-eQTL analysis was implemented on the Matrix eQTL software using a linear model with age, sex, BMI, days in culture, islet purity and T2D status as covariates. Association of genotype with gene expression of our 395 DEGs was computed within a window of SNPs located 500kb upstream and 500kb downstream of respective gene, and effects are presented as beta coefficients and tstatistics. Significance levels are indicated by raw p-values and multiple testing correction by FDR adjustment. A subset of the donors included in the LUDC eQTL analysis were also included in the INSPIRE eQTL analysis.

Identification of SNPs associated with T2D and/or glycemic traits
We downloaded SNPs associated with T2D and/or glycemic traits (corrected insulin response (CIR), Disposition index (DI), HOMA-B, fasting glucose (FG), fasting insulin (FI), HbA1c) based on genome-wide significance (p-value ≤ 5x10 -8 ) from the Common Metabolic Diseases (CMD) Knowledge Portal (https://hugeamp.org). In this analysis, we investigated SNPs that reside in regions stretching from 500kb upstream of the transcription start site to 500kb downstream of the transcription end site of the 395 DEGs.

Mouse models
The International Mouse Phenotyping Consortium (IMPC) is a joint initiative between 19 research organizations across Europe, North America and Asia, whose goal is to generate a comprehensive, robust, reproducible, and freely accessible catalogue of functions for all mammalian genes. The IMPC uses standardized operating procedures, to create and phenotype global knock out strains for individual genes, which ensures the quality of its data and makes it an excellent resource for the global scientific community (7,8). Where possible, we retrieved and collated IMPC data for three relevant metabolic based phenotypic measurements (data extracted on April 15 th , 2020). These were: 1. insulin blood level (IB), 2. intraperitoneal glucose tolerance test (IPGTT) with readings for three parameters; area under glucose response curve (AUCG), initial response to glucose challenge (IGR) and fasted blood glucose concentration (FG) and, 3. DEXA (dual-energy X-ray absorptiometry) analysis with readings for two parameters; fat and lean mass adjusted by total body weight (Fat/BW and Lean/BW). All IMPC data retrieved for our analysis were generated from experiments performed in single-gene adult KO (7M/7F) and wild type (large variable n) mice on a C57BL/6N background (11-16 weeks), and for the purposes of our study we considered a p-value <0.05 significant.

siRNA transfection of human islets
100-200 islets were seeded in culture dishes containing RPMI media with 5mM glucose, 10% FBS, 200mM L-glutamine. A final transfection volume of 2.5mL per dish contained 50nM of Silencer® Select Pre-Designed siRNA against OPRD1 (siRNA ID s9862), CHL1 (s21124), HHATL (s33086) or SLC2A2 (s12928) in Opti-MEM reduced serum media and 6.25μL of Lipofectamine RNAiMAX (all from ThermoFisher Scientific). A second transfection was performed 24h after the first transfection and all functional experiments were performed 72h after the first transfection. In total, islets from eight donors were transfected.

RNA extraction and quantification by Real-time Quantitative PCR
Total RNA was extracted from siRNA-transfected human islets with the miRNeasy isolation kit (QIAGEN) and converted to cDNA with the RevertAid First Strand cDNA synthesis kit (ThermoFisher Scientific). Quantitative PCR (qPCR) was performed in triplicates on a 384well plate using Applied Biosystems QuantStudio 7 Flex Real Time PCR system (ThermoFisher Scientific) under default cycling parameters. TaqMan assays were used to measure mRNA expression of OPRD1 (assay ID Hs00538331_m1), CHL1 (Hs00544069_m1), HHATL (Hs00375325_m1), and SLC2A2 (Hs01096908_m1). We used PPIA (Hs04194521_s1) and HPRT1 (Hs02800695_m1) to normalize the mRNA expression. All TaqMan assays and qPCR reagents were purchased from ThermoFisher Scientific. Threshold levels of all Ct values were automatically set, and mRNA expression was normalized using the geometric means of the two endogenous controls. Relative expression was calculated with the ΔΔCt method.

Cell culture, transfection and transduction
The clonal rat β-cell line 832/13 INS1 (9), a kind gift from Professor Christopher Newgard (Duke University, Durham, NC, USA), was used for overexpression experiments. C-terminally HA-tagged (a three amino acid glycine linker followed by the tag) cDNA sequences for Barx1, Elfn1, Faim2, Nefl, Pax5, Pcocle2, and Sfrp1 were cloned into the pcDNA3.1 expression vector by GenScript (Piscataway, NJ, USA). The plasmids were transfected into INS1 β-cells as previously described (10). A plasmid containing the HA-tagged cDNA of GFP was used as control. Lentiviruses containing HA-tagged (as above) cDNA of Barx1, Nefl, Pax5, Pcolce2, or Sfrp1, upstream of an internal ribosomal entry site and the cDNA for GFP, were produced by Lund University Cell and Gene Therapy core facility. A lentivirus containing only the HAtagged cDNA of GFP was used as control. Expression of the respective cDNA was driven by the human phosphoglycerate kinase 1 promoter. Cells were transduced with 1.25 viral particles per cell for 24h and all experiments performed 72h after initiation of transduction.

Insulin secretion in INS1 β-cells
Transduced INS1 β-cells were washed in secretion assay buffer (SAB; 114mM NaCl, 4.7mM KCl, 1.2mM KH2PO4, 1.16mM MgSO4, 20mM HEPES, 2.5mM CaCl2, 25.5mM NaHCO3, 0.2% BSA, pH 7.2) supplemented with 2.8mM glucose, followed by a 2h pre-incubation at 37 °C in the same buffer. Insulin secretion was then determined by stimulating the cells with fresh SAB with 2.8 or 16.7mM glucose for 1h. For secretion experiments with high K + , NaCl was proportionally decreased to maintain osmolarity in the buffer. After stimulation, cells were dissolved in RIPA buffer. Insulin was measured by ELISA (#10-1145-01, Mercodia) and secreted insulin normalized to the total insulin content. Total protein in each well was measured with the BCA Protein Assay Kit (ThermoFisher Scientific).

Mitochondrial oxygen consumption measurements
Mitochondrial oxygen consumption rate (OCR) was evaluated with an XFe24 extracellular flux analyzer (Agilent). Transduced INS1 β-cells were starved in SAB containing 2.8mM glucose for 2h and OCR was measured in unbuffered SAB (i.e., without bicarbonate and HEPES) every 3 minutes for 90 minutes. OCR was measured at basal glucose (2.8mM glucose) and after addition of 16.7mM glucose, 5μM oligomycin, 4μM Carbonyl cyanide ptrifluoromethoxyphenylhydrazone (FCCP), and 1μM Rotenone/Antimycin A. Wave Seahorse Software and Seahorse Analytics online tool (seahorseanalytics.agilent.com) were used to analyze the data. Non-mitochondrial respiration was subtracted, and OCR data normalized to total protein content as measured by BCA assay.

MTT assay
The viability of transduced INS1 β-cells was investigated with the Cell Proliferation Kit I (Roche, Basel, Switzerland), a MTT reagent kit, according to the manufacturer's instructions.

EdU incorporation assay
Proliferation was quantified using Click-iT™ Plus EdU Cell Proliferation Kit for Imaging (ThermoFisher Scientific). Cells were plated and transduced in 8-well chamber slides coated with poly-D-lysine. On the day of analysis, cells were exposed to 10μM of 5-ethynyl-2′deoxyuridine (EdU) for 4h at 37°C and then fixed with 4% paraformaldehyde in PBS. EdU labelling was done according to the manufacturer's instructions. Images, 12 per well, were acquired in random locations of the wells with a confocal microscope and quantification was done manually with the Cell Counter Plugin in ImageJ to manually tag and count stained cells.

Transcription factor binding motif analysis
Analysis of transcription factor binding motif enrichment within promoter regions (-450bp -+50bp relative to the transcription start site) was performed by using Pscan (12) and the JASPAR database.

Transcriptomic profiling in Pax5-overexpressing INS1 β-cells
Total RNA quality was assessed by Agilent Technologies 2200 Tapestation and concentrations were measured by NanoDrop ND-1000 Spectrophotometer. 100ng of total RNA from Pax5overexpressing (n=8) and control (GFP-overexpressing, n=8) INS1 β-cells was used in downstream analyses. GeneChip® WT Plus Reagent Kit (ThermoFisher Scientific) was used to generate amplified sense strand cDNA targets, fragmentation and labelling. 2.3µg of cDNA target was hybridized to Clariom™ S Rat Arrays for 16h at 45°C under rotation in Affymetrix Gene Chip Hybridization Oven 645 (ThermoFisher Scientific). Washing and staining was carried out on Affymetrix GeneChip® Fluidics Station 450 (ThermoFisher Scientific), according to the manufacturer's protocol. The fluorescent intensities were determined with Affymetrix GeneChip Scanner 3000 7G (ThermoFisher Scientific). Processing of raw data was performed in Transcriptome Analysis Console (TAC, v4.0) using quantile normalization and gene-level SST-RMA summarization. Sample group comparisons were performed using eBayes ANOVA tests, and p-values were corrected for multiple testing using the Benjamini-Hochberg method. Data were further processed to remove probes that corresponded to multiple genes and genes with outdated names/IDs. The R package biomaRt (Rnor_6.0 -Ensembl v. 104) was used to retrieve updated Ensembl gene names/IDs from the provided gene names, NCBI Reference Sequence IDs, Entrez IDs or Ensembl transcript IDs. After this, probes that corresponded to more than 1 or no genes were discarded. Genes that corresponded to multiple probes, and therefore measured more than once, were represented by the probe with the highest absolute fold-change value between the sample groups.

Co-expression analysis
Co-expression networks among the DEGs were constructed using the R WGCNA package (v. 1.70-3) (13). The analysis was performed using the blockwiseModules function with parameters: power = 10, networkType = "signed", minModuleSize = 20, mergeCutHeight = 0.25, minKMEtoStay = 0, pamRespectsDendro = F. Default settings were selected for the rest of the parameters. First, a similarity matrix was constructed by calculating the correlation coefficient of each gene using Pearson's correlation. Here, a variance stabilizing transformation was applied to the gene expression values using Deseq2 and used as input. Considering only positively correlated genes as connected (signed network type and default by WGCNA), a power function was used to transform the similarity network into an adjacency matrix. After testing a range of power values and evaluating the corresponding scale independence and mean connectivity plots, a soft-thresholding power of 10 was selected to achieve a scale-free network with degree of independence above 0.80. Next, a topological overlap matrix was obtained from the adjacency matrix and hierarchical clustering using a dynamic tree cut algorithm was applied to identify co-expression gene clusters (modules).  Figure S1. Donor characteristics and expression in human islets. (A-B) Graphs showing the age (A) and BMI (B) distribution in the LUDC islet case-control cohort. **p<0.01 based on a two-tailed t-test. The line shows the median. (C-D) Expression in the LUDC islet case-control cohort of the eight and nine genes that have previously been shown to be differentially expressed in α-(C) and β-cells (D), respectively, based on single cell RNA-sequencing (14)(15)(16). *q<0.05, **q<0.01, ***q<0.001, based on a generalized linear model as implemented in DESeq2 (4), with correction for age, sex, purity, and days in culture. (E) Gene ontology enrichment among the 90 islet DEGs that exhibit higher expression in α-cells compared to βcells in the LUDC sorted α/β-cell cohort. Box-and-whisker plots show the median, 25 th and 75 th percentile, and minimum and maximum values.