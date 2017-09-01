Primary HCC and paired nonmalignant liver tissue. Fresh primary HCC tumors were surgically removed (therapeutic segmentectomy or hemihepatectomy) at the National Cancer Centre Singapore between 2008 and 2011. Resected liver tissue was analyzed by frozen section to identify neoplastic and nonneoplastic areas. These were macrodissected for analyses as HCC and paired nonmalignant liver tissue.

Liver-conditional knockout of Gata4. Gata4 double-floxed mice were purchased from the Jackson Laboratory (stock 008194). These mice were crossed to albumin-cre mice, also from the Jackson Laboratory (stock 003574). All F1 progeny were Gata4 haploinsufficient only in hepatocytes. Both double-floxed and liver-conditional Gata4-haploinsufficient mice were monitored daily; animals with signs of distress were euthanized by an IACUC-approved protocol. PCR genotyping primer sequences (Supplemental Table 6) were provided by the Jackson Laboratory. DNA was isolated from mouse tails using the DNA Purification Kit (Promega catalog A1020).

RNA and DNA extraction. Snap-frozen specimens were equilibrated with a buffer (RNAlater-ICE, Ambion) that preserves RNA integrity. Frozen specimens, no larger than 0.5 cm, were added to RNAlater-ICE that was first cooled to –80°C in polypropylene tubes and equilibrated with the RNAlater-ICE at –20°C overnight. The specimens were separated for (a) tissue homogenization and RNA extraction and (b) proteinase K digestion and DNA extraction. The fragment for RNA extraction was homogenized using a tissue homogenizer. The mirVana Kit (Ambion) was used per the manufacturer’s protocol for extraction of total RNA. The fragment for DNA extraction was minced with a blade into fragments approximately 2 mm in dimension. These fragments were placed in Eppendorf tubes with cold PBS and left on ice for 5 to 10 minutes to leach out the RNAlater-ICE before digestion in proteinase K for DNA extraction. After removing PBS, lysis buffer was added followed by proteinase K digestion as per the manufacturer’s instructions for the DNeasy Blood and Tissue Kit (QIAGEN).

Cell culture and transfection. Human HCC cell lines HepG2 (ATCC, stock HB-8065) and PLC (ATCC, stock CRL-8024) were cultured in RPMI media with 10% FBS, 100 U/ml penicillin, and 100 μg/ml streptomycin (Mediatech). Cells were incubated at 37°C in a 5% CO2 atmosphere. DNA was isolated from both cell lines for SNP array analysis. PLC cells were transfected with GATA4 WT and GATA4 V267M vector, and cell pellets of approximately 60 × 10⁶ cells were harvested at 72 hours. Harvested cells were resuspended in 1× PBS plus PI plus PMSF (PBSW buffer, pH 7–9). The suspension was centrifuged at 835 g for 5 minutes. Cells were washed 3 times in PBSW buffer.

GATA4 and ARID1A expression vectors. WT GATA4 cDNA was cloned into pFlag-CMV4 (OriGene Technologies) using In-Fusion cloning (Clontech Laboratories Inc.). The primers used for In-Fusion cloning (Supplemental Table 6) were designed to insert WT GATA4 cDNA at the C-terminal end of the Flag tag in the pFlag-CMV4 vector (OriGene Technologies). Transient transfection of GATA4 WT or empty vector into HCC cell lines was performed using Xfect transfection (Clontech) following the manufacturer’s guidelines. Cell pellets were isolated at 0, 48, and 96 hours after transfection for downstream analysis. pLenti-puro-ARID1A was a gift from Ie-Ming Shih (Johns Hopkins, Baltimore, MD, USA) (Addgene plasmid 39478) (76).

In vitro site-directed mutagenesis. In vitro site-directed mutagenesis was performed using the Stratagene QuikChange Site-Directed Mutagenesis Kit (catalog 200519). Mutagenic oligonucleotides specific for point mutation 11607635G>A in GATA4 exon 4 (Supplemental Table 6) were designed and ordered from Integrated DNA Technologies (IDT). Both forward and reverse primers annealed to the same sequence on opposite strands of the plasmid expressing GATA4 cDNA. The mutant strand synthesis reaction was performed with the PCR conditions recommended in the Stratagene kit and using PfuTurbo DNA polymerase (Stratagene 200519). After the thermocycler reaction, the PCR product was treated with Dpn I endonuclease to digest the WT parental DNA template. The Dpn I–treated PCR product was purified and used to transform XL1 Blue Supercompetent Cells. The cells containing the mutant cDNA were used to transform E. coli at 37°C overnight. Colonies were selected, and plasmid mini-preparation was carried out using the QIAGEN Mini-Prep Kit (catalog 27106). The DNA isolated from the bacteria was sequenced to confirm the presence of the mutant GATA4 cDNA by both Sanger sequencing and targeted deep sequencing using primers designed to amplify WT GATA4 cDNA (data not shown) (Supplemental Table 2). Colonies with mutant GATA4 were maxi-prepared using the Promega Pure Yield Plasmid Kit (catalog A2492) and used for transfection experiments.

QRT-PCR using SYBR green. RNA was isolated using the RNeasy method (QIAGEN), and cDNA was prepared using the iScript cDNA Synthesis Kit (Bio-Rad). QRT-PCR was performed using an ABI Prism 7500 Sequence Detection System (Applied Biosystems) and SYBR Premix Ex Taq II (TaKaRa). Real-time PCR primers (Supplemental Table 6) were designed with PrimerQuest (https://www.idtdna.com/primerquest/Home/Index). The relative number of copies of mRNA (RQ) was calculated from the average Ct values, using the housekeeping gene GAPDH as the internal control and baseline controls for relative expression. Results are shown as mean ± SD of 3 independent experiments.
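The relative quantification described above can be illustrated with the comparative Ct calculation. This is a minimal sketch rather than the exact calculation used in this study: it assumes the 2^(-ΔΔCt) method with approximately 100% amplification efficiency, and the function name and all Ct values are hypothetical.

```python
from statistics import mean

def relative_quantity(ct_target, ct_gapdh, ct_target_base, ct_gapdh_base):
    """Return RQ = 2 ** -(ddCt) from lists of replicate Ct values.

    Assumption: comparative Ct method with ~100% PCR efficiency.
    The text only states that average Ct values, GAPDH, and
    baseline controls were used.
    """
    d_ct_sample = mean(ct_target) - mean(ct_gapdh)          # normalize sample to GAPDH
    d_ct_base = mean(ct_target_base) - mean(ct_gapdh_base)  # normalize baseline to GAPDH
    dd_ct = d_ct_sample - d_ct_base                         # sample relative to baseline
    return 2 ** (-dd_ct)

# Hypothetical triplicate Ct values, for illustration only.
rq = relative_quantity(
    ct_target=[24.0, 24.5, 23.5],
    ct_gapdh=[18.0, 18.5, 17.5],
    ct_target_base=[26.0, 26.5, 25.5],
    ct_gapdh_base=[18.0, 18.0, 18.0],
)
print(f"RQ = {rq:.2f}")
```

A lower ΔCt in the sample than in the baseline gives an RQ above 1, i.e., higher relative expression.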

Sanger and targeted next-generation sequencing. All coding region exons of GATA4 in genomic DNA from paired primary HCC and nonmalignant liver tissue were Sanger sequenced on an ABI 3730xl DNA Analyzer (Applied Biosystems). Primers for bidirectional sequencing were designed using PrimerQuest (www.idtdna.com) (Supplemental Table 6). DNA sequences were analyzed using FinchTV DNA analysis software. In patients in whom mutations were identified, DNA isolated from peripheral blood mononuclear cells was similarly sequenced to determine whether the mutation was germline.

In addition, targeted next-generation sequencing was applied to exon 4 of GATA4 (to confirm the mutation identified by Sanger sequencing) and to all coding exons of HNF1A, ARID1A, SMARCA4, ARID2, CTNNB1, and TP53. Primers (Supplemental Table 6) were designed using PrimerQuest (www.idtdna.com). Each target exon was amplified by PCR using 100 ng/μl or more of primary DNA. The PCR amplification protocol included an initial denaturation step (94°C for 5 minutes), followed by 19 cycles of denaturation (94°C for 30 seconds, at 25% ramp cooling temperature), annealing, and extension (60°C for 3 minutes at 40% ramp cooling). After PCR amplification, purification and sequencing library preparation followed the Illumina paired-end library protocol. Briefly, DNA was purified using magnetic beads. The purified PCR products were end repaired to introduce sticky ends using end-repair enzyme (NEB catalog E6050S). Paired-end adapters were then ligated using T4 DNA Ligase (NEB catalog M0202S) to the amplified PCR fragments of about 250 bp in length. The nick fill reaction was performed using Bst DNA polymerase (NEB catalog M0374S). Amplification of library fragments and all other PCR amplifications were performed using HotStart Applied Biological Materials (ABM) Taq DNA polymerase (ABM catalog G011). The library was subjected to deep sequencing on an Illumina MiSeq using 2 × 300 paired-end sequencing following standard loading procedures. Coverage information for each sequenced gene for each sample is reported in Supplemental Table 4. Raw sequencing reads were trimmed of Illumina adapters, and low-quality reads were discarded. Paired-end alignment was done using the BWA 0.6 aligner with hg19 as the reference sequence. Variant extraction was done using the GATK pipeline (GATK v3.3).
We prioritized variants after eliminating sequencing/mapping errors (removal of low-quality reads, requirement for a sequencing depth of at least 30 reads and at least 10 mutation reads, removal of variants with directional bias), removing potential common, benign polymorphisms (using publicly available databases of common and rare germline variants such as ExAC [http://exac.broadinstitute.org/], ESP6500 [http://evs.gs.washington.edu/EVS/], and 1000 Genomes [http://www.internationalgenome.org/]), and removing variants with a minor allele frequency of more than 0.0001. Sequences were aligned to the reference genome using NovoAlign and were analyzed using Integrative Genomics Viewer (IGV) software.
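The variant prioritization thresholds stated above (depth of at least 30 reads, at least 10 mutation reads, minor allele frequency no greater than 0.0001) can be expressed as a simple filter. The following is an illustrative sketch only, not the pipeline used in this study: the `Variant` record and its field names are hypothetical, and the strand-bias rule (requiring variant reads on both strands) is an assumed stand-in, because the exact directional-bias criterion is not specified.

```python
from dataclasses import dataclass

MIN_DEPTH = 30        # minimum total reads at the site (from the text)
MIN_ALT_READS = 10    # minimum reads supporting the variant (from the text)
MAX_POP_MAF = 0.0001  # maximum minor allele frequency in public databases (from the text)

@dataclass
class Variant:
    """Hypothetical record; field names are illustrative."""
    depth: int        # total reads covering the position
    alt_reads: int    # reads carrying the variant allele
    alt_forward: int  # variant reads on the forward strand
    alt_reverse: int  # variant reads on the reverse strand
    pop_maf: float    # highest MAF across ExAC, ESP6500, 1000 Genomes (0.0 if absent)

def passes_filters(v: Variant) -> bool:
    if v.depth < MIN_DEPTH or v.alt_reads < MIN_ALT_READS:
        return False
    # Assumed directional-bias rule: require variant support on both strands.
    if v.alt_forward == 0 or v.alt_reverse == 0:
        return False
    return v.pop_maf <= MAX_POP_MAF

# Hypothetical candidates, for illustration only.
candidates = [
    Variant(120, 40, 22, 18, 0.0),   # passes all filters
    Variant(25, 12, 6, 6, 0.0),      # fails depth
    Variant(200, 60, 60, 0, 0.0),    # fails strand-bias rule
    Variant(150, 50, 25, 25, 0.01),  # fails population frequency
]
kept = [v for v in candidates if passes_filters(v)]
print(len(kept), "variant(s) retained")
```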

RNA sequencing. RNA was extracted from murine livers by the methods discussed above; however, the liver samples were also treated with DNaseI to remove potential DNA contamination. RNA integrity was measured by 1% 2D gel electrophoresis. Samples with 28S and 18S RNA (n = 3 mice per group) were sent for sequencing by ABM. RNA quality was confirmed using an Agilent 2100 Bioanalyzer, with all samples passing quality control. The samples were subjected to poly A enrichment, followed by fragmentation, first- and second-strand synthesis, adenylation of 3′ ends, adapter ligation, DNA fragment enrichment, and real-time PCR quantification. Cluster generation and sequencing were carried out in one run on a NextSeq 500 (Illumina, cluster generation and 2-channel sequencing), and Bcl files were converted to FastQ data immediately after the run. More than 40 million paired-end reads were recovered from sequencing of all 6 samples. Sequence reads were aligned to murine reference genome mm10 by Bowtie 2, and expression data were generated for analysis. Heat maps were generated using ArrayStar v3 (DNASTAR).

Gene expression analysis by microarray. The HumanHT-12 v3 gene expression microarray (Illumina) was used to analyze RNA from paired HCC and nonmalignant liver. The array evaluated over 25,000 annotated genes with over 48,000 probes designed using RefSeq (Build 36.2, Rel 22) and UniGene (Build 199). Microarray probe intensity values were subject to average normalization by GenomeStudio software to minimize the effects of variation from nonbiological factors and to calculate expression measures from the raw data. Expression measures of probe sets covering specific genes of interest were exported as a spreadsheet to the SAS System V9.2 (SAS Institute Inc.) for further statistical analysis. Only probe intensity data with detection P values of less than 0.05 (a statistical calculation that provides the probability that the signal from a given probe is greater than the average signal from the negative controls) were used in analyses of differences between groups of samples. Heat maps were generated by ArrayStar v3 (DNASTAR). All original microarray data were deposited in the NCBI’s Gene Expression Omnibus (GEO GSE57958).

High-resolution molecular karyotyping by SNP array. The Human660W-Quad v1.0 DNA BeadChip Kit was used for high-resolution molecular karyotyping of DNA isolated from primary HCC specimens and a control nonmalignant DNA sample. The BeadChip analyzed more than 660,000 individual loci. GenomeStudio (Illumina), KaryoStudio (Illumina), and IGV (77) software were used to document large chromosome aberrations (e.g., >75 kb), to score these aberrations as loss, gain, or uniparental disomy (UPD), and to crossmatch these aberrations with information from public databases. Affymetrix SNP6 CEL files of 27 liver-derived CCLE samples were converted into A- and B-allele frequencies using crlmm of the Bioconductor R package oligo. Log R ratios (LRR) were then computed as log2 of A+B divided by its median. Curves fitted to LRR were obtained using the ggplot2 function stat_smooth with its span parameter set to 0.3. The smooth curve displayed without its data was obtained similarly, but using data from all 27 CCLE liver cell lines combined.
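The LRR calculation described above (log2 of A+B divided by its median) can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in this study, which was run in R: the per-probe A and B values are taken to be numeric lists for a single sample, the median is assumed to be taken across the probes of that sample, and the function name and input values are hypothetical.

```python
import math
from statistics import median

def log_r_ratios(a_values, b_values):
    """Compute LRR = log2((A + B) / median(A + B)) for each probe.

    Assumption: the median is taken over all probes of one sample.
    """
    totals = [a + b for a, b in zip(a_values, b_values)]
    med = median(totals)
    return [math.log2(total / med) for total in totals]

# Hypothetical A and B values for three probes, for illustration only.
a = [1.0, 2.0, 0.5]
b = [1.0, 2.0, 0.5]
lrr = log_r_ratios(a, b)
print(lrr)
```

Under this definition, a probe with total signal equal to the sample median has an LRR of 0, while positive and negative values suggest relative gain and loss, respectively.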

Cell fractionation and nuclear protein extraction. Cells were resuspended in 500 μl PBSW followed by addition of 10% NP-40 (1:20 μl). This was incubated on ice for 1 minute followed by centrifugation at 835 g for 10 minutes. Supernatant containing the cytoplasmic fraction was transferred to a separate tube. Nuclear pellets were washed in PBSW buffer and centrifuged at 835 g for 10 minutes. Benzonase was added (1 μl), and this was incubated on ice for 90 minutes and vortexed every 30 minutes. 250 μl of nuclear extraction buffer E1 (250 μl PBSW + 2% NP40 + 500 mM [5 μl] + 5 M NaCl [25 μl]) was added, followed by a 10-minute incubation on ice; mixture was vortexed every 5 minutes. Sample was centrifuged at full speed for 10 minutes. The supernatant, which contained the cytoplasmic fraction, was transferred to a separate tube. 5 μl of 10% SDS was added to the remaining nuclear pellet. Nuclear protein was extracted again by adding 250 μl of nuclear extraction buffer E2 (250 μl PBSW + 1% NP40 + 500 mM [2.5 μl] NaCl) to the remaining pellet. Sample was incubated on ice for 10 minutes and vortexed every 5 minutes as above. This was followed by centrifugation at full speed for 15 minutes. Supernatant was added to the nuclear protein extraction tube. Nuclear extraction buffer (200 μl) was added to the tube containing the remaining pellet. This was incubated on ice for 10 minutes with vortexing every 5 minutes. The sample was centrifuged at full speed for 15 minutes, and supernatant was added to the nuclear pellet tube. Concentration of the total protein extracted was measured using BCA.

Covalent binding of antibody to protein G beads. 25 mg (200 μl) of protein G–sepharose was washed twice with 1× PBS, followed by incubation with 200 μl of Flag antibody for 1 hour at room temperature and pressure. Antibody-bound protein G was incubated in 1% chicken egg albumin for 1 hour. This was washed twice with 1× PBS. 25 mg of dimethylpimelimidate was added to 1 ml of 300 mM NEM, followed by swirling for 30 minutes at room temperature and pressure. This was repeated twice. Glycine-HCl (pH 3) was added, followed by centrifugation. This was washed 3 times using 1× PBS. Samples were then washed twice using nuclear extraction buffer.

Immunoprecipitation. 200 μl of nuclear protein lysate was precleared using protein G–sepharose (50% slurry). This was incubated at 4°C for approximately 60 minutes and spun for 10 minutes at 4°C. Supernatant was transferred to fresh tubes. 30 mg of nuclear protein extract (precleared lysate) was transferred to tubes with antibody-bound protein G beads and rocked gently at 4°C overnight. This mixture was washed 5 times with 1× PBS containing 1% NP-40. Samples were dried using Spin-Dry vacuum centrifugation at –100 on a SpeedVac vapor trap. Immunoprecipitation products were extracted from the protein G beads using Laemmli sample buffer.

Western blot analysis. Western blotting was performed by standard methods. Antibodies used were GATA4 (Abcam, catalog Ab124265), anti-Flag (Sigma-Aldrich, catalog F7425-2MG), c-MYC (Cell Signaling Technology, catalog 5605), p27/CDKN1B (Cell Signaling Technology, catalog 3833), β-actin (Sigma-Aldrich, catalog A3854), histone 3 (Cell Signaling Technology, catalog 9715S), MED12 (Cell Signaling Technology, catalog 4529S), SMARCA5/SNF2 (Cell Signaling Technology, catalog 13543S), and anti-V5 (Abcam, catalog Ab27671).

Protein identification by LC-MS/MS. Anti-Flag and isotype antibody immunoprecipitation products were separated by molecular weight using SDS-polyacrylamide gel electrophoresis and stained with colloidal Coomassie blue (GelCode Blue, Pierce Chemical). All separated proteins on the gel slices were excised (Supplemental Figure 9); proteins were reduced with dithiothreitol (Sigma-Aldrich, D0632, 10 mM), alkylated with iodoacetamide (Sigma-Aldrich, I1149, 55 mM), and digested in situ with trypsin. Peptides were extracted from gel pieces 3 times using 60% acetonitrile and 0.1% formic acid/water. The dried tryptic peptide mixture was redissolved in 20 μl of 1% formic acid for MS analysis. Tryptic peptide mixtures were analyzed by online LC-MS/MS on an Orbitrap Mass Spectrometer (Thermo Fisher Scientific).

Database search and data validation. Mascot Daemon software (version 2.3.2; Matrix Science) was used to perform database searches using the Extract_msn.exe macro provided with Xcalibur (version 2.0 SR2; Thermo Fisher Scientific) to generate peaklists. The following parameters were set for creation of the peaklists: parent ions in the mass range of 400–4500, no grouping of MS/MS scans, and threshold at 1000. A peaklist was created for each analyzed fraction (i.e., gel slice), and individual Mascot (version 2.3.01) searches were performed for each fraction. The data were searched against Homo sapiens entries in the UniProt protein database (May 2015 release, 151,569 total sequences; http://www.uniprot.org/). Carbamidomethylation of cysteines was set as a fixed modification, and oxidation of methionine was set as a variable modification. Specificity of trypsin digestion was set for cleavage after Lys or Arg, and 2 missed trypsin cleavage sites were allowed. The mass tolerances in MS and MS/MS were set to 10 ppm and 0.6 Da, respectively, and the instrument setting was specified as ESI-Trap. To calculate the FDR, the search was performed using the decoy option in Mascot. The spectral FDR and protein FDR were 0.35% ± 0.17% and 4.28% ± 1.99%, respectively. A minimum Mascot ion score of 25 and peptide rank 1 were used for automatically accepting all peptide MS/MS spectra.
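The decoy-based FDR estimate and the ion score cutoff described above can be illustrated as follows. This is a hedged sketch, not a reproduction of Mascot's internal calculation: it assumes the FDR is estimated as the number of decoy matches divided by the number of target matches at or above the score cutoff, and the function name and all scores are hypothetical.

```python
MIN_ION_SCORE = 25  # minimum Mascot ion score for acceptance (from the text)

def decoy_fdr(target_scores, decoy_scores, min_score=MIN_ION_SCORE):
    """Estimate FDR as (decoy matches) / (target matches) at the cutoff.

    Assumption: separate target and decoy searches, with FDR taken as
    the simple ratio of accepted matches.
    """
    targets = sum(1 for s in target_scores if s >= min_score)
    decoys = sum(1 for s in decoy_scores if s >= min_score)
    return decoys / targets if targets else 0.0

# Hypothetical ion scores of rank 1 peptide matches, for illustration only.
target_scores = [30, 40, 26, 10, 55, 28, 33, 47, 29, 61]
decoy_scores = [5, 12, 27, 8]
fdr = decoy_fdr(target_scores, decoy_scores)
print(f"Estimated FDR: {fdr:.1%}")
```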

LFQ. Relative protein quantification was performed using spectral count-based label-free relative protein quantitation (LFQ). For each biological sample, data from the individual gel slices were combined. Statistical analysis was performed on all proteins identified, with average spectral counts of 2 or more for at least 1 of the 3 experiments. The spectral count data were normalized by total spectral counts of the bait protein (GATA4) in each sample to adjust for differences in overall protein levels among samples. Proteins were considered to have a significant difference in abundance if there was a difference of 2-fold or greater in normalized spectral counts between experiments and a P value of 0.01 or less using a 2-tailed t test. Spectral counts for all proteins and peptides identified are provided in Supplemental Table 5.
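The bait normalization and fold-change criterion described above can be sketched as follows. This is a minimal illustration, not the analysis code used in this study: the function names, the protein label `PROT_X`, and all spectral counts are hypothetical, and the 2-tailed t test (P value of 0.01 or less) is deliberately omitted, so only the normalization and the 2-fold criterion are shown.

```python
from statistics import mean

FOLD_CUTOFF = 2.0  # minimum fold difference in normalized counts (from the text)

def normalize_to_bait(counts, bait="GATA4"):
    """Divide each protein's spectral count by the bait count in the same sample."""
    bait_count = counts[bait]
    return {protein: count / bait_count for protein, count in counts.items()}

def fold_change(group_a, group_b, protein):
    """Ratio of mean normalized counts for one protein between two groups."""
    mean_a = mean([sample.get(protein, 0.0) for sample in group_a])
    mean_b = mean([sample.get(protein, 0.0) for sample in group_b])
    return mean_a / mean_b if mean_b else float("inf")

# Hypothetical spectral counts from 3 experiments per condition.
wt_raw = [{"GATA4": 100, "PROT_X": 40}, {"GATA4": 80, "PROT_X": 32}, {"GATA4": 120, "PROT_X": 48}]
mut_raw = [{"GATA4": 100, "PROT_X": 10}, {"GATA4": 50, "PROT_X": 5}, {"GATA4": 200, "PROT_X": 20}]

wt = [normalize_to_bait(sample) for sample in wt_raw]
mut = [normalize_to_bait(sample) for sample in mut_raw]
fc = fold_change(wt, mut, "PROT_X")
meets_fold_criterion = fc >= FOLD_CUTOFF or fc <= 1 / FOLD_CUTOFF
print(f"fold change = {fc:.2f}, meets 2-fold criterion: {meets_fold_criterion}")
```

Normalizing to the bait means that differences in the amount of GATA4 pulled down in each sample do not by themselves appear as differences in interactor abundance.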

GATA4 DNA-binding analysis. A GATA probe (5′-ATTACT GATA ATGGTG-3′ ×3) and a negative control probe (5′-ATTACT CCCC ATGGTG-3′ ×3) (underlines indicate the GATA4 response element versus the scrambled sequence ‘CCCC’) were designed and ordered from IDT. The probes were biotinylated with a biotin-labeling kit following the manufacturer’s guidelines (Thermo Fisher Scientific, catalog K0651). GATA4 WT and GATA4 V267M vectors were transfected into the HCC cell line PLC, cell pellets from 6.0 × 10⁶ cells were isolated at 48 hours, and nuclear fractions were extracted using nuclear extraction buffer (250 μl PBSW buffer, pH 7–9, + 2% NP-40 + 500 mM [5 μl] + 5 M NaCl [25 μl]). Isolated nuclear fractions were incubated with the GATA probe and the scrambled control probe at 4°C overnight. Incubated fractions were pulled down using streptavidin beads for 1 hour at room temperature.

Bioinformatic and statistical analysis. Proteins identified by label-free LC-MS/MS were analyzed by the Ingenuity Pathway Analysis Tool (IPA, Ingenuity Systems). The core analysis function included in IPA was used to interpret the data in the context of biological processes, pathways, and networks. A right-tailed Fisher’s exact test was used to calculate a P value for the probability that the association of biological functions, canonical pathways, and diseases with the networks was due to chance alone. Each protein identifier was mapped to its corresponding gene object in the Ingenuity Pathways Knowledge Base.
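A right-tailed Fisher’s exact test of the kind referred to above can be computed directly from the hypergeometric distribution. The following is a generic sketch, not IPA’s implementation: the function name, parameter names, and counts are hypothetical, and the reference set that IPA uses is not reproduced.

```python
from math import comb

def fisher_right_tailed(overlap, list_size, pathway_size, universe_size):
    """Return P(X >= overlap) for X ~ Hypergeometric.

    overlap:       proteins in the input list that belong to the pathway
    list_size:     number of proteins in the input list
    pathway_size:  number of pathway members in the reference set
    universe_size: total number of proteins in the reference set
    """
    upper = min(list_size, pathway_size)
    favorable = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / comb(universe_size, list_size)

# Hypothetical counts: 3 of 5 listed proteins fall in a 5-member pathway,
# drawn from a reference set of 20 proteins.
p_value = fisher_right_tailed(overlap=3, list_size=5, pathway_size=5, universe_size=20)
print(f"P = {p_value:.4f}")
```

A small P value indicates that an overlap at least this large would rarely arise by chance if list membership and pathway membership were independent.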

For classification of hepatocyte precursor and hepatocyte genes, a public gene expression database (GEO GSE13149) of genes expressed sequentially from early to late stages of liver development was analyzed to identify genes enriched at different stages of development (23). Comparative Marker Selection (V10) in GenePattern (Broad Institute, Massachusetts Institute of Technology, Boston, Massachusetts, USA) (78) was used to find genes whose expression correlated with phenotype. Statistical significance was determined by the 1000 permutations test and an FDR (Benjamini-Hochberg) cutoff of less than 0.01. In a separate approach to identifying genes that were differentially expressed between HCC and paired nonmalignant liver, the average expression of each gene across 46 HCC samples and, separately, across 46 adjacent nonmalignant livers was calculated. Genes with average expression values in HCC (n = 46) that were less than 66% of the average expression value in nonmalignant liver (n = 46) were identified. To determine tissue expression (UNIGENE EST QUARTILE) associations, gene lists were uploaded into DAVID (79) to provide a ranked representation of tissue expression associations that were most saturated or “enriched” with the input gene lists. For analysis of public ChIP sequencing (ChIP-Seq) data, aligned H1 ESC H3K4me3 ChIP-Seq data (ENCFF775QSF) were imported, analyzed, and visualized using EaSeq and its suite of integrated tools; all values were normalized to reads per million per 1 kbp (80).
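The separate approach described above, in which genes were selected when their average expression in HCC was less than 66% of the average in nonmalignant liver, can be sketched as follows. This is an illustration only: the function name, gene labels, and expression values are hypothetical, and three samples per group are used here in place of the 46 paired samples.

```python
from statistics import mean

RATIO_CUTOFF = 0.66  # HCC mean must be below 66% of the nonmalignant mean (from the text)

def downregulated_genes(hcc, nonmalignant, ratio=RATIO_CUTOFF):
    """Return genes whose mean HCC expression is < ratio * mean nonmalignant expression.

    Both arguments map a gene name to a list of per-sample expression values.
    """
    selected = []
    for gene, tumor_values in hcc.items():
        tumor_mean = mean(tumor_values)
        normal_mean = mean(nonmalignant[gene])
        if tumor_mean < ratio * normal_mean:
            selected.append(gene)
    return selected

# Hypothetical expression values, for illustration only.
hcc = {"GENE_A": [50.0, 60.0, 40.0], "GENE_B": [90.0, 110.0, 100.0]}
nonmalignant = {"GENE_A": [100.0, 100.0, 100.0], "GENE_B": [100.0, 100.0, 100.0]}
down = downregulated_genes(hcc, nonmalignant)
print(down)
```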

Wilcoxon rank sum and Student’s t tests were 2-sided and performed at the 0.05 significance level unless stated otherwise. SDs for each set of measurements were calculated and represented as y axis error bars on each graph. Statistical analyses, including correlation analyses, were performed using JMP Pro 10.0 (SAS Institute Inc., http://www.jmp.com) or SAS statistical software (SAS Institute Inc.).

Study approval. Human samples were obtained in the context of clinically indicated surgery and with written informed consent from patients in accordance with the Declaration of Helsinki and using protocols approved by the SingHealth Institutional Review Board at the National Cancer Centre Singapore; human HCC tumors and paired normal samples were surgically removed at the National Cancer Centre Singapore between 2008 and 2011. Animal studies were conducted at the Cleveland Clinic using protocols approved by the Cleveland Clinic IACUC.