To define an epigenomic signature associated with the TFE3 fusion, we first profiled or reanalyzed 27 epigenomic libraries from 4 tRCC and 6 clear-cell RCC (ccRCC) lines (Figure 1A; see Methods). We performed ChIP-seq for 2 posttranslational histone modifications (H3K4me3 and H3K27ac) as well as methylated CpG dinucleotide immunoprecipitation and sequencing (MeDIP-seq). H3K4me3 is enriched at active gene promoters (32), and H3K27ac is enriched at active gene promoters and enhancers (33), while DNA methylation is associated with promoter silencing (34). Across all RCC cell lines, a median of 29,588 peaks (range 25,314–30,076) were captured by H3K4me3 ChIP-seq, a median of 57,226 peaks (range 44,470–73,400) by H3K27ac ChIP-seq, and a median of 229,624 peaks (range 111,904–297,344) by MeDIP-seq (Supplemental Figure 1; supplemental material available online with this article; https://doi.org/10.1172/JCI195725DS1).

Figure 1 Cell line–informed epigenomic signature of tRCC. (A) Epigenomic datasets generated from 4 tRCC (s-TFE, FU-UR-1, UOK109, and UOK146) and 6 ccRCC (Caki-1, A-498, RFX393, 786-O, 769-P, and KMRC-1) cell lines, either in-house or in a previously published study (69, 81). (B) Unsupervised hierarchical clustering of the H3K4me3 ChIP-seq, H3K27ac ChIP-seq, and MeDIP-seq consensus peaks across tRCC and ccRCC cell lines analyzed in this study. (C) Volcano plots showing differentially marked peaks between tRCC and ccRCC cell lines for H3K4me3 ChIP-seq, H3K27ac ChIP-seq, and MeDIP-seq. Thresholds for significance were set at FDR-q < 0.01 and log2 FC > 1 for H3K27ac and MeDIP and > 2 for H3K4me3.

Unsupervised hierarchical clustering and principal component analyses of consensus H3K4me3 and H3K27ac peaks revealed clear segregation of tRCC and ccRCC cell lines, while DNA methylation profiles were minimally discriminatory between tRCC and ccRCC (Figure 1B and Supplemental Figure 1; see Methods), even when restricted to CpG islands (Supplemental Figure 1). This may be in part because methylation patterns tend to be more stable and often highlight lineage identity, which may be similar between ccRCC and tRCC (13, 31, 35–37), whereas histone marks, particularly active modifications, can more dynamically reflect transcriptional changes driven by the TFE3 fusion (9, 33).

We next sought to identify regulatory elements with differential epigenetic modifications in tRCC versus ccRCC cells. Overlapping peaks across samples were merged for each mark, creating a consensus set of 26,529, 63,322, and 342,285 peaks for the H3K4me3, H3K27ac, and MeDIP profiles, respectively. Via differential peak analysis of ChIP-seq data using the DiffBind R package (38), we identified 2,860 differential H3K4me3 peaks (of which 2,450 were enriched in tRCC [tRCC-up]; FDR q value [FDR-q] < 0.01 and log2 fold-change [log2 FC] > 2) and 21,325 differential H3K27ac peaks (of which 11,443 were tRCC-up; FDR-q < 0.01 and log2 FC > 1) (Figure 1C and Figure 2A; see Methods). By contrast, among MeDIP-seq peaks, we identified only 627 differentially methylated regions (DMRs) between tRCC and ccRCC (FDR-q < 0.01 and log2 FC > 1; Figure 1C and Supplemental Figure 1; see Methods), consistent with the weak segregation observed in unsupervised hierarchical clustering. Motif analysis of the 11,443 H3K27ac tRCC-up peaks identified significant enrichment for sequences bound by TFE3 and its paralog, MITF, which share consensus binding sites (39). This suggests that H3K27ac tRCC-up sites include regulatory elements activated by direct binding of TFE3 fusions (Supplemental Figure 2) and is consistent with recent reports that TFE3 fusions may facilitate the organization of enhancer loops and transcriptional activation (31, 40–43).
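The two operations described in this paragraph, merging overlapping peaks into a consensus set and applying the reported significance cutoffs, can be sketched as follows. This is a minimal illustration, not the DiffBind procedure itself: the function names, the half-open coordinate convention, and the use of the absolute log2 FC (so that both tRCC-up and ccRCC-up peaks pass) are assumptions.

```python
from typing import Iterable, List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def merge_peaks(peak_sets: Iterable[Iterable[Peak]]) -> List[Peak]:
    """Pool peaks from all samples and merge any that overlap."""
    pooled = sorted(peak for peaks in peak_sets for peak in peaks)
    merged: List[Peak] = []
    for chrom, start, end in pooled:
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            # Overlaps the previous consensus peak: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def is_differential(fdr_q: float, log2_fc: float, min_abs_log2_fc: float) -> bool:
    """Apply the reported cutoffs (FDR-q < 0.01 plus a log2 FC threshold).

    Testing |log2 FC| rather than the signed value is an assumption.
    """
    return fdr_q < 0.01 and abs(log2_fc) > min_abs_log2_fc
```

Under these assumptions, `min_abs_log2_fc` would be 2 for H3K4me3 and 1 for H3K27ac and MeDIP, matching the thresholds stated in the text.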

Figure 2 Definition of TFE3 fusion–occupied sites in tRCC. (A) Heatmaps of normalized H3K27ac and H3K4me3 tag densities at differentially marked regions between tRCC and ccRCC cell lines shown over a window ±2 kb from peak center. (B) Schema for identifying 6,540 TFE3 fusion–occupied TFBSs through the intersection of known TFE3 TFBSs (GTRD) and TFE3 fusion peaks via TFE3 ChIP-seq in tRCC cell lines. (C) Aggregated H3K27ac density at 6,540 TFE3 fusion–occupied TFBSs across RCC cell lines profiled in this study, showing increased signal in tRCC cell lines (red) compared with ccRCC cell lines (gray).

Next, we sought to refine our epigenetic signature by incorporating transcriptionally active sites directly bound by the TFE3 fusion in tRCC. First, to obtain a robust consensus set of TFE3 fusion binding sites, we intersected 24,050 WT TFE3 transcription factor binding sites (TFBSs) derived from 2 non-RCC cell lines (LoVo, a colorectal cancer cell line, and HepG2, a hepatocellular carcinoma cell line) in the Gene Transcription Regulation Database (GTRD) with 29,785 TFE3 TFBSs identified via ChIP-seq in 3 tRCC cell lines representing 2 distinct TFE3 fusions (44). This resulted in a final set of 6,540 sites that we deemed fusion-occupied TFBSs (Figure 2B; see Methods). Assessing aggregated H3K27ac signal across these 6,540 fusion-occupied TFBSs revealed a higher signal in all tRCC cell lines compared with ccRCC cell lines (Figure 2, A and C). Importantly, this difference in signal intensity was less pronounced when considering all TFBSs from GTRD (n = 24,050) or the fusion nonoccupied binding sites, which did not overlap with TFE3 fusion ChIP-seq peaks (n = 17,510) (Supplemental Figure 2), underscoring the importance of building a robust consensus set of TFE3 fusion–occupied TFBSs. Furthermore, only 853 of the 6,540 fusion-occupied TFBSs overlapped with the H3K27ac tRCC-up sites (n = 11,443) identified using DiffBind (Supplemental Figure 2), suggesting that these 2 methods identify partly nonoverlapping regulatory sites associated with tRCC.

We then sought to evaluate the specificity of our 3 sets of sites for tRCC compared with other RCC subtypes. At the fusion-occupied TFBSs and H3K27ac and H3K4me3 tRCC-up sites, we computed the aggregated H3K27ac and H3K4me3 ChIP-seq signal in tRCC, ccRCC, and papillary RCC (pRCC) cell lines (Supplemental Figure 3). The signal for pRCC cell lines was similar to that observed in ccRCC and trended lower than that in tRCC. We extended this analysis using publicly available H3K27ac data from 6 pRCC, 12 chromophobe RCC (chRCC), and 12 ccRCC tumor samples (45). Here again, the signals in pRCC and chRCC were comparable with those observed in ccRCC.
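The intersection used above to separate fusion-occupied from fusion-nonoccupied TFBSs can be illustrated with a simple interval-overlap test. This is a sketch under stated assumptions: the function name, the half-open coordinates, and the "any overlap" criterion are hypothetical, since the exact overlap rule is described only in Methods. The quadratic scan is for clarity; a genome-scale analysis would use a sorted sweep or an interval tree.

```python
from typing import List, Sequence, Tuple

Site = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def split_by_fusion_occupancy(
    reference_sites: Sequence[Site], fusion_peaks: Sequence[Site]
) -> Tuple[List[Site], List[Site]]:
    """Partition reference TFBSs by whether any fusion ChIP-seq peak overlaps them."""
    occupied: List[Site] = []
    nonoccupied: List[Site] = []
    for chrom, start, end in reference_sites:
        overlaps = any(
            p_chrom == chrom and p_start < end and start < p_end
            for p_chrom, p_start, p_end in fusion_peaks
        )
        (occupied if overlaps else nonoccupied).append((chrom, start, end))
    return occupied, nonoccupied
```

Because every reference site falls into exactly one of the two lists, the counts always sum to the input size, which matches the reported figures (6,540 occupied plus 17,510 nonoccupied equals 24,050 GTRD sites).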

Having identified a tRCC-specific epigenomic signature in cell line models, we next evaluated its ability to discriminate plasma from patients with tRCC, ccRCC, and those serving as healthy controls using cf-ChIP. We profiled 141 epigenomic libraries from 51 plasma samples from patients with tRCC (N = 30 samples), ccRCC (N = 12), and healthy individuals (N = 9) (Figure 3A). cf-ChIP revealed increased signals at tRCC-up or ccRCC-up peaks in RCC plasma that were absent in plasma from healthy volunteers (Figure 3B). For example, in tRCC plasma, H3K4me3 and H3K27ac signals were elevated at the GPR143 gene locus, a tRCC-specific peak and TFE3-fusion target gene (46). Conversely, we observed an increased H3K4me3 and H3K27ac signal at the C1QL1 gene locus in ccRCC plasma samples. These findings were concordant with the published RNA-seq data in ccRCC and tRCC cell lines, as well as in tRCC and ccRCC tumor samples from a cohort of patients with metastatic ccRCC (47) (Supplemental Figure 4).

Figure 3 Detection of a tRCC-specific epigenomic signature via plasma cf-ChIP. (A) Epigenomic datasets generated from 51 plasma samples collected from patients with metastatic tRCC (N = 30; 10 patients), metastatic ccRCC (N = 12; 12 patients), or those acting as healthy controls (N = 9; 9 individuals). (B) Integrative Genomics Viewer tracks from ChIP-seq profiles for H3K4me3, H3K27ac, and TFE3 in cell lines (tRCC: UOK109, s-TFE; ccRCC: 786-O, A-498) and plasma samples at representative tRCC-selective (GPR143) or ccRCC-selective (C1QL1) loci. (C) Aggregated cf-ChIP signal compared among tRCC, ccRCC, and healthy plasma samples for the following marks; the box plots quantify the AUCs for each histone mark (left to right): H3K4me3 signal at cell line–informed H3K4me3 tRCC-up sites was significantly higher in 28 tRCC samples from 10 patients compared with 11 ccRCC samples from 11 patients (P = 0.0027) and showed a trend when compared with 9 healthy control samples (P = 0.079); H3K27ac signal at cell line–informed H3K27ac tRCC-up sites was significantly higher in 27 tRCC samples from 10 patients compared with 12 ccRCC samples from 12 patients (P = 0.0029) and with 9 healthy control samples from 9 individuals (P = 0.00017); H3K27ac signal at TFE3 fusion–occupied TFBSs was significantly higher in 27 tRCC samples from 10 patients compared with 12 ccRCC samples from 12 patients (P = 0.0003) and with 9 healthy control samples from 9 individuals (P = 0.00042). P values were determined by Wilcoxon’s test. (D) Comparison of TIESs of cf-ChIP H3K4me3 and H3K27ac signals at cell line–informed sites (H3K4me3 tRCC-up peaks, H3K27ac tRCC-up peaks, and TFE3 fusion–occupied binding sites) among tRCC, ccRCC, and healthy plasma, with samples color-scaled according to tumor fraction. TIES was significantly higher in 27 tRCC samples from 10 patients compared with 11 ccRCC samples from 11 patients (P = 0.00035) and with 9 healthy control samples from 9 individuals (P = 0.000049). HP, healthy plasma. P values were determined by Wilcoxon’s test. The box plots depict the minimum and maximum values (whiskers), the upper and lower quartiles, and the median.

Given that GPR143 and C1QL1 appeared to be highly selective marker genes for tRCC and ccRCC, respectively, we also attempted to measure their expression directly in the blood by profiling circulating tumor cells (CTCs). A cohort of 10 patients with metastatic ccRCC, 7 with metastatic tRCC, and 1 serving as a healthy control was sampled for CTCs using isolation on the TellDx CTC system (see Methods). We extracted RNA from pelleted samples and aimed to detect tRCC-specific (GPR143 and TRIM63) or ccRCC-specific (C1QL1) transcripts by digital droplet PCR (ddPCR) after whole transcriptome amplification. While these transcripts could be detected in healthy blood spiked with RNA derived from tRCC or ccRCC cell lines, no signal was detectable in the tRCC or ccRCC patient samples (Supplemental Figure 4). Of note, 4 patients were sampled for both cfDNA and CTC isolation, with 2 of them being sampled from the same blood draw. These results suggest that cf-ChIP can infer tRCC gene expression programs even when they are not detectable by standard methods for profiling CTCs.

Next, to develop an epigenome-wide cf-ChIP signature for detecting tRCC, we compared H3K4me3 and H3K27ac coverage in patient plasma at the 2,450 H3K4me3 tRCC-up and 11,443 H3K27ac tRCC-up peaks, informed by cell line profiling. Aggregated coverage at both H3K4me3 and H3K27ac tRCC-up sites was increased in plasma from patients with tRCC compared with patients with ccRCC (P = 0.0027 and 0.0029, respectively) or those serving as healthy controls (P = 0.079 and 0.00017, respectively) (Figure 3C). Conversely, the aggregated methylation signal in plasma measured using cf-MeDIP at tRCC DMRs derived from cell lines or The Cancer Genome Atlas (TCGA) methylation data did not distinguish tRCC and ccRCC plasma samples (Supplemental Figure 5). Finally, H3K27ac coverage at the 6,540 TFE3 fusion–occupied binding sites showed higher discriminating power versus ccRCC (P = 0.0003) and versus healthy individuals (P = 0.00042), perhaps due to WT TFE3 activity in white blood cells, particularly macrophages, where MiT/TFE genes can be active (48) (Figure 3C).

Finally, we sought to integrate the 3 sets of epigenomic data described above to build a robust cf-ChIP classifier for tRCC (Supplemental Table 1). Our tRCC classifier utilized 3 distinct cell line–informed signatures: H3K4me3 tRCC-up sites, H3K27ac tRCC-up sites, and TFE3 fusion–occupied binding sites (Supplemental Tables 2 and 3). Aggregating plasma H3K4me3 signal across H3K4me3 tRCC-up sites (n = 2,450) distinguished tRCC from ccRCC plasma samples (n = 27 and 11, respectively) with an AUC of 0.78. Aggregating plasma H3K27ac signal across H3K27ac tRCC-up sites (n = 11,443) and TFE3 fusion–occupied binding sites (n = 6,540) achieved AUCs of 0.8 and 0.84, respectively. To enhance the performance of the classifier, we combined the 3 scores for each sample (see Methods) to create a tRCC integrated epigenomic score (TIES) (Figure 3D). In this process, we ensured that signals at overlapping sites between H3K27ac tRCC-up sites and TFE3 fusion–occupied binding sites were not double counted (Supplemental Figure 2). This approach achieved an AUC of 0.86 for the discrimination of tRCC from ccRCC (Figure 4A). For discriminating tRCC from healthy plasma samples (n = 27 and 9, respectively), aggregated H3K4me3 signal at H3K4me3 tRCC-up sites, aggregated H3K27ac signal at H3K27ac tRCC-up sites, and TFE3 fusion–occupied binding sites achieved AUCs of 0.7, 0.89, and 0.87, respectively, with an integrated AUC of 0.92 (Figure 4A).
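For readers less familiar with the AUC values reported above, the following sketch shows one standard way to compute an AUC from per-sample scores: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The function name and the example scores are hypothetical, and this is not the formula used to build the TIES, which is described in Methods.

```python
from typing import Sequence


def auc(positive_scores: Sequence[float], negative_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical aggregated signals for illustration only.
trcc_like = [3.0, 4.0, 5.0]
ccrcc_like = [1.0, 2.0, 3.0]
print(round(auc(trcc_like, ccrcc_like), 3))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which gives a scale for reading the reported values of 0.78 to 0.92.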

Figure 4 Detection and monitoring of tRCC using cf-ChIP. (A) Classifier assessing individual cf-ChIP H3K4me3 and H3K27ac signals at cell line–informed sites (H3K4me3 tRCC-up peaks, H3K27ac tRCC-up peaks, and TFE3 fusion–occupied binding sites) and evaluating their combined performance in distinguishing tRCC from ccRCC (left) and tRCC versus healthy plasma (right). (B) Estimation of TIES limit of detection. Ten tRCC samples (all with tumor fraction > 3% by ichorCNA) were diluted in silico by adding reads from 9 healthy plasma samples at 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, and 0.01 ratios. An estimated tumor fraction (TF) was assigned for each dilution, equal to the TF calculated by ichorCNA for the tRCC sample multiplied by the portion of tRCC reads for that dilution. For a given bin of estimated TF, TIESs for all tRCC samples were pooled and compared with the healthy plasma value (Wilcoxon’s rank sum test). Red dashed line indicates the threshold of significance (P = 0.05). (C) Longitudinal tracking of the TIES (orange) and ctDNA fraction (blue) in a patient with tRCC. Radiographic changes in an index lesion (pleural metastasis) are provided, as are timings and doses of administered systemic therapies. SD, stable disease; PR, partial response; PD, progressive disease. (D) Percent change in TIES between consecutive plasma draws was significantly higher in patients with disease progression compared with patients with no change or response (P = 0.027). The box plots depict the minimum and maximum values (whiskers), the upper and lower quartiles, and the median.

Leave-one-out cross-validation (LOO-CV) performed on tRCC and healthy plasma samples (see Methods) achieved a precision of 100%, a recall of 77.8%, and a specificity of 100%. The optimal actionable cutoff maximizing the difference between true positive and false positive rates, averaged across all LOO iterations, was 7.75. In addition, at the single-patient level, and among plasma samples with detectable ctDNA, 5 out of 6 patients with tRCC exhibited elevated (>7.75) TIESs. Interestingly, 2 tRCC plasma samples had markedly elevated TIESs but < 3% tumor fraction estimated by ichorCNA, possibly due to the paucity of copy number alterations in some tRCC tumors, which may limit quantification of tumor-derived cfDNA by tumor fraction alone (Supplemental Figure 5).
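The cutoff selection described here (maximizing the difference between true positive and false positive rates, averaged over LOO iterations) can be sketched as below. This is an illustration under assumptions, not the authors' implementation: the function names, the choice of candidate cutoffs at midpoints between adjacent scores, and the rule that a sample is called positive when its score exceeds the cutoff are all hypothetical.

```python
from typing import List, Sequence, Tuple


def youden_cutoff(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Cutoff maximizing TPR - FPR; a sample is positive if score > cutoff.

    Assumes both classes are present and at least two distinct scores exist.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    values = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_cutoff, best_j = candidates[0], float("-inf")
    for cutoff in candidates:
        tpr = sum(s > cutoff for s in pos) / len(pos)
        fpr = sum(s > cutoff for s in neg) / len(neg)
        if tpr - fpr > best_j:
            best_cutoff, best_j = cutoff, tpr - fpr
    return best_cutoff


def leave_one_out(
    scores: Sequence[float], labels: Sequence[bool]
) -> Tuple[float, float, float, float]:
    """Return (precision, recall, specificity, mean cutoff) from LOO-CV."""
    tp = fp = tn = fn = 0
    cutoffs: List[float] = []
    for i in range(len(scores)):
        # Fit the cutoff on all samples except the held-out one.
        train_scores = list(scores[:i]) + list(scores[i + 1:])
        train_labels = list(labels[:i]) + list(labels[i + 1:])
        cutoff = youden_cutoff(train_scores, train_labels)
        cutoffs.append(cutoff)
        predicted = scores[i] > cutoff
        if predicted and labels[i]:
            tp += 1
        elif predicted:
            fp += 1
        elif labels[i]:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return precision, recall, specificity, sum(cutoffs) / len(cutoffs)
```

Holding out each sample before choosing the cutoff keeps the reported precision, recall, and specificity from being evaluated on data that also determined the threshold.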

Given this observation, we next estimated the limit of detection of the TIES via in silico dilution. We combined each of the 10 tRCC samples with tumor fraction > 3% (as estimated by ichorCNA) with each of the 9 healthy plasma samples (90 combinations) at each of 10 dilution ratios ranging from 0.9 to 0.01 of tumor/healthy plasma by number of reads. Diluted samples were binned into intervals of 0.4% expected tumor fraction (ranging from <0.4% to >3.2%). In each bin, the TIES for the tumor dilutions was compared with the TIES for the 9 healthy plasma samples via Wilcoxon’s test. A significant difference between healthy samples and tRCC samples (P < 0.05) was observed down to a tumor fraction of 0.8%–1.2% (Figure 4B and Supplemental Figure 5). While previous studies have directly identified gene fusions in ctDNA using high-depth sequencing (49), the low coverage of our cfDNA low-pass whole-genome sequencing (cf-lpWGS) data (~0.1×) was insufficient to detect fusion-supporting reads; nonetheless, transcriptional evidence of the translocation could be detected via cf-ChIP with a tumor fraction in the 1% range. Finally, we also compared this approach with nucleosome depletion signatures at TFBSs in cfDNA using low-pass whole-genome sequencing (49). We used Griffin (50) to calculate the aggregated normalized coverage at the TFE3 fusion–occupied TFBSs. No difference was observed among the median profiles of ccRCC, tRCC, and healthy control samples. However, the central coverage (defined as the bottom peak value) between subgroups showed a trend toward lower values for tRCC samples with a tumor fraction > 3% (Supplemental Figure 5).
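The bookkeeping behind the in silico dilution can be sketched as follows: the expected tumor fraction of a diluted sample is the ichorCNA estimate multiplied by the proportion of tumor-derived reads, and each dilution is then assigned to a 0.4% bin. The function names and the handling of bin boundaries (left-closed bins, with values at or above 3.2% placed in the top bin) are assumptions, as the text does not specify them.

```python
import math
from itertools import product

BIN_WIDTH = 0.4  # percent expected tumor fraction per bin, as stated in the text
TOP_EDGE = 3.2   # values at or above this go in the top bin (boundary rule assumed)


def estimated_tumor_fraction(sample_tf_percent: float, tumor_read_ratio: float) -> float:
    """Expected TF (%) after dilution: original TF times proportion of tumor reads."""
    return sample_tf_percent * tumor_read_ratio


def tf_bin(tf_percent: float) -> str:
    """Label the 0.4%-wide bin that contains an expected tumor fraction."""
    if tf_percent >= TOP_EDGE:
        return ">3.2%"
    # The small epsilon guards against floating-point error at bin edges.
    index = math.floor((tf_percent + 1e-9) / BIN_WIDTH)
    if index == 0:
        return "<0.4%"
    lower, upper = index * BIN_WIDTH, (index + 1) * BIN_WIDTH
    return f"{lower:.1f}%-{upper:.1f}%"


# 10 tRCC samples paired with 9 healthy samples gives 90 pairs per dilution ratio.
pairs_per_ratio = len(list(product(range(10), range(9))))
print(pairs_per_ratio, tf_bin(estimated_tumor_fraction(10.0, 0.1)))
```

For example, a hypothetical sample with a 10% tumor fraction diluted to a 0.1 tumor read ratio has an expected tumor fraction of 1.0% and lands in the 0.8%-1.2% bin, the lowest bin in which a significant difference was reported.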

Having established a tRCC-specific epigenetic signature that can be detected in plasma cfDNA, we also aimed to monitor tRCC disease burden via TIES measured by cf-ChIP in 3 patients with metastatic tRCC whose plasma samples were collected at multiple time points during treatment (Figure 4C and Supplemental Figure 6). In all 3 patients, we observed that variations in TIES were concordant with the clinical course of response and progression to systemic therapy. For instance, in the patient TRCCP4, we observed an increase of TIES at 3 time points (13, 26, and 52 months after diagnosis) corresponding to radiographic progression but a decrease at time points corresponding to disease control or response (18 months and 39 months after diagnosis) (Figure 4C). Interestingly, we also observed an increase in TIES within 3 months prior to the scan showing progression, although the ctDNA fraction remained undetectable at this time by cf-lpWGS. In patient TRCCP5 (initially misdiagnosed as ccRCC on pathology), we observed an increase in TIES at disease recurrence, followed by a subsequent decrease after a change in systemic therapy that resulted in radiographic disease control (Supplemental Figure 6). Similarly, in patient TRCCP3, we observed an initial decrease in TIES following curative nephrectomy, followed by an increase of the signal 16 months later, aligned with disease recurrence and metastasis to the liver (Supplemental Figure 6). We note that, across multiple patients, TIES was detectable and dynamic even when the tumor fraction was in the undetectable range (<3%) by a method that estimates ctDNA using copy number alterations (17).

To evaluate cf-ChIP for monitoring tRCC treatment response, we calculated changes in TIES and tumor fraction for each pair of consecutive plasma draws. When we compared these changes during intervals of disease progression, stability, or response, we found that TIES between consecutive draws increased at times of disease progression and decreased during response or disease stability (P = 0.027; Figure 4D). Importantly, changes in tumor fraction were far less pronounced (P = 0.63; Supplemental Figure 6). This suggests that our liquid biopsy epigenomic assay may be more effective at tracking disease evolution and response compared with other methods that rely solely on copy number alterations, likely due to the low frequency of such alterations in tRCC.
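The per-interval comparison above rests on the change in TIES between each pair of consecutive draws. A minimal sketch of that calculation is shown below; expressing the change as a percentage of the earlier draw is an assumption, and the function name and example values are hypothetical.

```python
from typing import List, Sequence


def percent_changes(values: Sequence[float]) -> List[float]:
    """Percent change between each pair of consecutive measurements.

    Each change is expressed relative to the earlier draw (an assumption).
    """
    changes: List[float] = []
    for earlier, later in zip(values, values[1:]):
        if earlier == 0:
            raise ValueError("percent change is undefined for a zero baseline")
        changes.append(100.0 * (later - earlier) / earlier)
    return changes


# Hypothetical TIES values from three consecutive plasma draws.
print(percent_changes([10.0, 15.0, 12.0]))
```

Each resulting value can then be grouped by the clinical status of its interval (progression, stability, or response) before the between-group comparison.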

Finally, to evaluate the extensibility of our approach, we assessed its potential to detect other fusion-specific epigenomic signatures in plasma samples. We compared plasma samples from patients with prostate cancer with (n = 5) or without (n = 8) the TMPRSS2-ERG fusion, previously profiled with cf-ChIP (22). The TMPRSS2-ERG fusion, which places the ETS family transcription factor ERG under the control of the androgen-responsive gene TMPRSS2, is found in approximately 50% of prostate cancer cases and is associated with a distinctive transcriptional signature (51, 52). We observed that plasma from patients with the TMPRSS2-ERG fusion exhibited significantly higher H3K27ac signal at fusion-specific H3K27ac sites (n = 7,531) (53) compared with samples from patients with fusion-negative cancers (P = 0.006); this corresponded to an AUC of 0.95 for discriminating between samples with and without the TMPRSS2-ERG fusion (Supplemental Figure 7).

We also reasoned that cf-ChIP might be more broadly applicable in cancers with distinctive transcriptional profiles, such as those harboring driver fusions involving a transcription factor. To nominate additional cancer types that may be amenable to profiling via cf-ChIP, we first performed a pan-cancer survey of the fraction of genome altered (FGA), a metric of the proportion of the genome affected by copy number alteration (54). We observed substantial variation in FGA both between and within cancer lineages (Supplemental Figure 7); we note that tumors with low FGA may have insufficient CNAs to enable accurate estimation of tumor fraction in cfDNA (17). We then assessed how FGA tracked with fusion status across cancer types. When limiting to tumors harboring driver fusions involving transcription factors (55, 56) (analogous to the TFE3 fusions in tRCC), we observed that fusion-positive cancers had significantly lower FGA than fusion-negative cancers (median 0.06 vs. 0.20, respectively, P < 2.2 × 10⁻¹⁶; Supplemental Figure 7), consistent with a prior pan-cancer fusion analysis, suggesting that >1% of cancers may harbor a fusion oncogene as the sole driver (56). For example, FGA in TFE3 fusion-positive RCCs was significantly lower than in other RCCs (median 0.08 vs. 0.15, respectively, P = 0.038). Similarly, FGA in SSX2 fusion-positive synovial sarcoma was significantly lower than in other sarcomas (median 0.08 vs. 0.34, respectively; P = 0.011) (Supplemental Figure 7). Together, these findings may suggest a potential applicability of cf-ChIP to an array of mutationally quiet cancers, particularly those harboring driver fusions involving a transcription factor.
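As a concrete reading of the FGA metric used above, the following sketch computes the fraction of profiled genome length that lies in copy number–altered segments. The segment representation, the function name, and the log2 ratio threshold of 0.2 for calling a segment altered are illustrative assumptions and are not taken from the source.

```python
from typing import Sequence, Tuple

Segment = Tuple[int, int, float]  # (start, end, log2 copy ratio)


def fraction_genome_altered(segments: Sequence[Segment], threshold: float = 0.2) -> float:
    """Length of altered segments divided by total profiled length.

    A segment counts as altered when |log2 ratio| exceeds the threshold;
    the default of 0.2 is an illustrative assumption.
    """
    total = sum(end - start for start, end, _ in segments)
    if total == 0:
        raise ValueError("no profiled sequence")
    altered = sum(
        end - start for start, end, log2_ratio in segments if abs(log2_ratio) > threshold
    )
    return altered / total


# Hypothetical segments: one neutral, one gained, one lost.
example = [(0, 100, 0.0), (100, 150, 0.5), (150, 200, -0.6)]
print(fraction_genome_altered(example))
```

A tumor with few altered segments yields a low FGA, which is the situation in which copy number–based tumor fraction estimates are expected to be least informative.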