Global chromatin landscapes identify candidate noncoding modifiers of cardiac rhythm

Comprehensive cis-regulatory landscapes are essential for accurate enhancer prediction and disease variant mapping. Although cis-regulatory element (CRE) resources exist for most tissues and organs, many rare — yet functionally important — cell types remain overlooked. Despite representing only a small fraction of the heart’s cellular biomass, the cardiac conduction system (CCS) unfailingly coordinates every life-sustaining heartbeat. To globally profile the mouse CCS cis-regulatory landscape, we genetically tagged CCS component–specific nuclei for comprehensive assay for transposase-accessible chromatin–sequencing (ATAC-Seq) analysis. Thus, we established a global CCS-enriched CRE database, referred to as CCS-ATAC, as a key resource for studying CCS-wide and component-specific regulatory functions. Using transcription factor (TF) motifs to construct CCS component–specific gene regulatory networks (GRNs), we identified and independently confirmed several specific TF sub-networks. Highlighting the functional importance of CCS-ATAC, we also validated numerous CCS-enriched enhancer elements and suggested gene targets based on CCS single–cell RNA-Seq data. Furthermore, we leveraged CCS-ATAC to improve annotation of existing human variants related to cardiac rhythm and nominated a potential enhancer-target pair that was dysregulated by a specific SNP. Collectively, our results established a CCS-regulatory compendium, identified novel CCS enhancer elements, and illuminated potential functional associations between human genomic variants and CCS component–specific CREs.


48
For enhancer cloning, PCR primers were designed to amplify the mm10 coordinates corresponding 49 to 4 candidate CCS 5enhancers: mm1326, hs2384, hs1932, and hs1751. For enhancers 50 3 containing GWAS SNPs, the corresponding mm10 genomic coordinates were extended by 80 51 basepairs in both directions from the presumptive SNP location to design PCR primers. For Figure   52 5, we chose to only examine minor and reference alleles for the PR SNP (rs3807989) because 53 there was near-perfect conservation between the rodent and human genomic sequences. Given 54 that mouse regulatory elements were lifted over to the human genome and that we tested 55 elements in primary mouse cells, we did not want to use candidate enhancers without high 56 sequence conservation. For rs3807989, the minor allele version was made using PCR primers 57 containing the appropriate mutation. PCR was performed with WT mouse genomic DNA. Next, 58 each enhancer sequence was cloned into a destination reporter vector containing a Hsp68 59 promoter coupled to the luciferase reporter gene (pGL3-hsp68-luciferase) using Gibson Assembly ®

60
Cloning Kit (NEB, # E5510S). PCR and/or restriction enzyme analysis were used to confirm 61 successful enhancer insertion. were pooled from (n = 5) Cre + hearts. In order to obtain single cell suspension of micro-dissected 84 SAN, we used the Pierce Cardiomyocyte Isolation Kit (Thermoscientific, #88281). The cell yield 85 and viability were determined using Trypan Blue stain on a hemocytometer counting chamber. 86 Since P28 CMs are too large to fit within the sequencing droplets of the 10X Chromium platform 87 and our prior AVCS scRNA-seq dataset was obtained from cells rather than nuclei, we performed 88 SAN scRNA-seq on hearts derived from P4 mice. for these samples, we called the regions using genomic background and Poisson p-value over local 120 region cutoff of 1e-04.

121
After calling significantly enriched ATAC-seq regions, for QC purposes we compared these 122 regions and called regions shared between two samples if the regions from the two samples 123 overlapped by at least 1bp. Regions were called unique if they did not overlap with ATAC-seq regions 124 from any other sample. We also performed correlation analysis to measure the reproducibility 125 6 between the replicates for each sample group. Since biological replicates within each sample group 126 showed high correlation (R > 0.98), we combined the sequencing files (fastq) for each biological 127 replicate in a given sample to obtain more sequencing depth. Combined fastq files were processed 128 again using the pipeline described above, and significantly enriched ATAC-seq regions were 129 identified using HOMER with 8 week adult whole heart input serving as a control. To determine 130 differential accessibility between samples, we used an occupancy-based analytical method(79).

131
Specifically, the Fisher's exact test was used to identify ATAC-seq peaks that were found 132 significantly more in a CCS component compared to CMs, or vice versa.

136
HOMER uses internal background within the sample to call the ATAC-seq peak). To do the initial 137 quality control and check replicates' reproducibility we performed the MergePeaks option. Since 138 the reproducibility is very high, for the reminder of the paper we combined the sample replicates' 139 (e.g. AVN-1 and AVN-2, VCS-1 and VCS-2, etc.,) raw data. The annotation of peaks was done 140 using ChIPseeker(81). The genome-wide distribution of ATAC-seq regions on promoters, exons, 141 introns, and intergenic regions in each of the samples was also done using ChIPseeker. We used 142 HOMER to normalize ATAC-seq read count per basepair to 10 million total mapped tags. We 143 generated Pearson correlation graphs in R and Venn plots using venerable R package for the  The shared regions between replicates in SAN, AVN, VCS, and CM samples were obtained as  To compare with mouse ENCODE DHS(82) datasets (https://www.encodeproject.org/), we 158 used the above-mentioned combined peaks and called the peaks with adult mouse heart (H) ChIP 159 input (https://www.encodeproject.org/, see Table S3 for accession ID) using HOMER. Peaks were 160 compared between the samples using bedtools intersectBed tool(83). We separated the peaks into 161 distal and promoter proximal peaks based on a ±2 kb cutoff relative to the annotated TSS. De 162 novo motif analysis was performed using findMotifsGenome module available in HOMER package.

163
To identify normalized ATAC counts that were significantly enriched in a specific CCS component   direction. We initially attempted to use multiple-testing corrected p-values in the form of FDR to 191 identify significant GO terms, but we did not retrieve a large list of terms that met the arbitrary FDR 192 cutoff of <0.05. Therefore, we used the GREAT analysis as a way to confirm the specificity of our 193 ATAC-seq datasets by focusing on binomial enrichment, since this approach tends to highlight 194 terms with strong biological significance(91). For Gene Regulatory Network (GRN) inference, we used a previously described method(92).

198
Briefly, vertebrate TF motifs from the JASPAR database(93) were used as the basis for inferring 9 GRNs. Only TFs with high-purity (>0.7) binding sites overlapping an ATAC-seq peak were 200 retained(92). Binding sites located in the gene body or in the 2,500-bp region upstream of its TSS 201 were assigned to the overlapping gene(s), and intergenic binding sites were assigned to the gene 202 whose TSS was closest to the peak. To infer GRNs, an interaction score (defined as the sum over 203 all TF binding sites of a TF that can be assigned to a gene) between a TF and a gene was 204 calculated as previously described (92). This score provided a unidirectional (TFs to genes) and 205 weighted (based on the interaction score) relationship, establishing the edges of the GRN. Upon normalizing for overall TF connectivity by calculating an average TF-to-gene interaction 219 score for each TF motif, we compared each TF score for a specific CCS component with the 220 corresponding TF score in CMs. By computing a differential TF score, we rank-ordered TF sub-221 networks based on their degree of motif enrichment in each component relative to CMs. TFs that 222 showed differential score of > +0.5 ( Figure 3A, D) were used to construct the representative sub-223 networks and visualized using "Directed Diffuse Layout" on Cytoscape v3.7.2. TF motifs identified 224 10 in our CCS-ATAC datasets and their respective enrichment in each compartment were used for 225 calculating TF differential scoring data.

227
Identification and validation of TF motifs associated with predicted target genes 228 Using FIMO (Find Individual Motif Occurrences) (http://meme-suite.org/doc/fimo.html), we scanned 229 for individual matches to the EWSR1-FLI1 (GGAA)n, Etv1 (ACCGGAAGT), and Onecut1 230 (AAAAATCGATA) motifs. We filtered the sequence menu to only contain databases that have 231 additional information that is specific to UCSC Mammal genome-->Mouse--> Heart tissue DHS-seq 232 data (ENCODE) such that MEME Suite uses tissue/cell-specific information. Using the FIMO 233 output, we scanned the chromosome coordinates harboring the desired motif/sequence for 234 matches with our list of target genes that showed interaction with EWSR1-FLI1 or Onecut1 in the 235 SAN and AVN subnetworks. TF binding sites located in the gene body or in the 15 kb region 236 upstream of its TSS were assigned to the overlapping gene(s), and intergenic binding sites were 237 assigned to the gene whose TSS was closest to the peak. To validate candidate binding sites, 238 ChIP-qPCR was performed using primers that were designed to amplify ~200bp surrounding the 239 TF motif for each predicted target.   consensus TFs predicted to bind a similar DNA motif (curated from public databases). We chose to 285 display the logo of the TF motif that had the highest % motif similarity score.    Table   339 3. Bigwig files from ENCODE portal (https://www.encodeproject.org/) were downloaded, imported,