Defining genomic late progression events in ERG-fusion and SPOP-mutant subclasses. To understand molecular progression in specific subtypes of PCa, we initially took an unbiased approach to define genomic alterations (including point mutations, amplifications, and homozygous deletions) associated with specific subclasses (Figure 1A, Supplemental Figure 1, and Supplemental Table 1; supplemental material available online with this article; https://doi.org/10.1172/JCI147878DS1). In the ERG-fusion subclass, PTEN deletion was the most enriched alteration, while CHD1 deletion was the most enriched alteration in the SPOP-mutant subclass (Figure 1A), consistent with prior results (4, 12, 30).
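The subclass enrichment in Figure 1A rests on a 2-sided Fisher’s exact test applied gene by gene to a 2 × 2 table of subclass (ERG fusion vs. SPOP mutant) by alteration status. A minimal Python sketch of that test follows. The function name, table layout, and toy counts are illustrative assumptions rather than the study’s code. It uses one common 2-sided convention, summing the probabilities of all tables with the same margins that are no more likely than the observed table. In practice a library routine such as `scipy.stats.fisher_exact` would normally be used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Rows are subclasses (e.g., ERG-fusion, SPOP-mutant); columns are
    altered / unaltered counts for one gene. Returns (odds_ratio, p_value).
    """
    row1, row2 = a + b, c + d
    col1 = a + c  # total altered samples across both subclasses
    n = row1 + row2
    denom = comb(n, col1)

    def table_prob(x):
        # Hypergeometric probability of x altered samples in row 1, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables with the same margins that are no more likely than
    # the observed one (a small relative tolerance guards against rounding)
    p_value = sum(
        table_prob(x) for x in range(lo, hi + 1) if table_prob(x) <= p_obs * (1 + 1e-7)
    )
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return odds_ratio, min(1.0, p_value)


# Hypothetical counts, for illustration only (not study data):
# 30 of 100 ERG-fusion vs 5 of 40 SPOP-mutant tumors altered
print(fisher_exact_two_sided(30, 70, 5, 35))
```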

Figure 1 Identification of subclass-specific late progression events in localized prostate cancer. (A) Enrichment of recurrent genomic alterations in ERG-fusion and SPOP-mutant subclasses from the TCGA localized prostate cancer (PCa) cohort (n = 333). Enrichment of each alteration between the 2 subclasses was calculated by 2-sided Fisher’s exact test. Orange denotes enrichment in the SPOP-mutant subclass; pink denotes enrichment in the ERG-fusion subclass. Amp, amplification; homdel, homozygous deletion. (B) Clonality of ERG fusion, PTEN deletion, SPOP mutation, and CHD1 deletion in the TCGA cohort. The y axis shows the alteration frequency of each event; colors distinguish clonal from subclonal changes. (C) Enrichment of genomic alterations from localized PCa to metCRPC nominates progression events. Alteration percentages in metCRPC samples (n = 150) are shown on the x axis, and those in localized TCGA samples (n = 333) on the y axis. Dot size indicates the significance of enrichment (2-sided Fisher’s exact test P value): small, P < 0.05; medium, P < 0.01; and large, P < 0.001. Genes in bold show significant enrichment of genomic alterations by Fisher’s exact test for alteration burden.

We next attempted to distinguish early alterations from those more likely to represent late progression events. By investigating the clonal architecture of these genomic events in The Cancer Genome Atlas (TCGA) primary PCa cohort (4), we found that all ERG fusions and SPOP mutations were clonal, consistent with early alterations. In contrast, a significant fraction of PTEN and CHD1 deletions were subclonal (Figure 1B and Supplemental Table 2), more suggestive of late progression events and consistent with previous findings (15). Furthermore, when we compared the fraction of samples harboring these alterations in advanced metastatic castration-resistant prostate cancer (metCRPC) (16) with that in primary PCa (4), PTEN and CHD1 deletions were enriched in metCRPC (Figure 1C and Supplemental Table 3), again consistent with late progression events (16, 31). Overall, these results confirmed that specific subtypes of PCa are associated with subsequent molecular changes: tumors with ERG fusions may later acquire PTEN deletions, while SPOP-mutant tumors may progress with CHD1 deletion.
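Clonality calls of the kind summarized in Figure 1B are usually based on the cancer cell fraction (CCF) of each event. The source does not state the estimation procedure, so the Python sketch below is an assumption. It computes a point estimate of CCF for a somatic point mutation (such as a SPOP mutation) from its variant allele fraction, tumor purity, and local copy number, and applies a hypothetical 0.85 cutoff for a clonal call. Deletions such as PTEN or CHD1 loss require copy-number-based estimates instead, and dedicated tools (for example, ABSOLUTE) model the uncertainty that this point estimate ignores.

```python
def cancer_cell_fraction(vaf, purity, tumor_cn, multiplicity=1):
    """Point estimate of the cancer cell fraction (CCF) of a somatic point mutation.

    vaf: variant allele fraction; purity: tumor fraction of the sample;
    tumor_cn: total copy number at the locus in tumor cells;
    multiplicity: mutated copies per mutant cell. Normal cells are assumed diploid.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    # Expected VAF = purity * multiplicity * CCF / (purity * tumor_cn + 2 * (1 - purity))
    mean_copies = purity * tumor_cn + 2 * (1 - purity)
    ccf = vaf * mean_copies / (purity * multiplicity)
    return max(0.0, min(1.0, ccf))  # clip sampling noise into [0, 1]


def call_clonality(ccf, threshold=0.85):
    """Hypothetical cutoff: call an event clonal if its CCF reaches the threshold."""
    return "clonal" if ccf >= threshold else "subclonal"


# Hypothetical SPOP mutation at a diploid locus in a 60%-pure sample
print(call_clonality(cancer_cell_fraction(vaf=0.29, purity=0.6, tumor_cn=2)))
```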

Identification of 2 tumor lineage models: ERG/PTEN and SPOP/CHD1. To understand the transcriptional landscape of molecular progression within subtypes, we established a tumor lineage model with 3 PCa states from the TCGA cohort (4): (a) normal (benign prostate samples), (b) “early” (ERG-overexpressing or SPOP-mutant) cancer, and (c) “late” (PTEN- or CHD1-deleted) cancer. We investigated transcriptional alterations via unbiased differential expression analyses across these states within each genomically defined subtype (32). We hypothesized that transcriptional changes associated with disease progression follow a specific pattern, increasing or decreasing steadily from the normal to the early to the late state (Figure 2A). Using the 2 models, (a) normal to ERG+ to PTEN-deleted (ERG/PTEN) and (b) normal to SPOP-mutated to CHD1-deleted (SPOP/CHD1), we found 3,160 ERG/PTEN and 1,654 SPOP/CHD1 progressively upregulated and downregulated genes (Figure 2A and Supplemental Tables 4 and 5). In contrast, testing the reverse order of events (normal to PTEN-deleted to ERG+, or normal to CHD1-deleted to SPOP-mutated) returned very few altered genes (Supplemental Figure 2 and Supplemental Tables 6 and 7), supporting the temporal sequence of our original models. To define convergent signaling between the 2 lineage models, we compared affected genes and nominated pathways: upregulated genes shared by both subtype progression models were enriched in cell cycle function, while shared downregulated genes were enriched in focal adhesion function (Figure 2B, Supplemental Figure 3, and Supplemental Tables 8 and 9), consistent with tumorigenic processes common to both lineages (1, 4). In contrast, uniquely altered genes displayed distinct functional annotations (Supplemental Figure 3).
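The progressive pattern hypothesized above (a steady increase or decrease from normal to early to late) can be illustrated with a deliberately simplified Python sketch. It is not ImpulseDE2, which the study used (Figure 2A) and which fits impulse models with a negative binomial likelihood-ratio test. Instead, it thresholds the changes in mean log-expression between consecutive states, using a hypothetical `min_step` cutoff. Swapping the order of the early and late states shows why a reversed model yields few progressive genes: a gene that rises steadily in the true order appears transient in the reversed one.

```python
from statistics import mean


def classify_trajectory(normal, early, late, min_step=0.5):
    """Label one gene's mean log-expression path across 3 ordered states.

    min_step is a hypothetical minimum change (log scale) required per transition.
    """
    step1 = mean(early) - mean(normal)
    step2 = mean(late) - mean(early)
    if step1 >= min_step and step2 >= min_step:
        return "progressive_up"
    if step1 <= -min_step and step2 <= -min_step:
        return "progressive_down"
    if abs(step1) >= min_step and abs(step2) >= min_step:
        return "transient"  # direction reverses between the 2 transitions
    return "unclassified"


# Toy log-expression values for one gene (not study data)
normal, erg, pten_del = [1.0, 1.2], [2.0, 2.1], [3.1, 2.9]
print(classify_trajectory(normal, erg, pten_del))  # -> progressive_up
print(classify_trajectory(normal, pten_del, erg))  # -> transient (reversed order)
```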

Figure 2 Transcriptional alterations of 2 distinct tumor lineage models: ERG/PTEN and SPOP/CHD1. (A) Two distinct tumor lineage models of PCa progression, ERG/PTEN and SPOP/CHD1, identified from the TCGA cohort via ImpulseDE2. The total number of genes in each category (transiently and progressively upregulated and downregulated) is shown in the bar plot with corresponding heatmaps. (B) Venn diagrams of shared and uniquely upregulated and downregulated genes between the 2 tumor lineage models. Numbers of shared and uniquely altered genes are indicated. (C) Normalized enrichment scores (NES) from the “early” to “late” states, compared between the 2 tumor lineage models in the TCGA and Taylor cohorts. R² values of the linear regression model are shown. (D) Pathways with distinct NES from early to late events in the TCGA cohort, ERG/PTEN mouse tissue, and Chd1 mouse organoid samples. (E) Divergent predicted upstream regulators from early to late events between the 2 tumor lineage models in the TCGA cohort. Colors represent upstream regulator groups.

By comparing transcriptional pathways between these 2 tumor lineages, we identified similar enriched functions from the normal to early states (Supplemental Figure 4 and Supplemental Tables 10 and 14) but divergent signatures from the early to late states in multiple localized PCa cohorts: TCGA (4), Taylor (1), and International Cancer Genome Consortium Prostate Adenocarcinoma - Canada (ICGC PRAD-CA) (33) (Figure 2C, Supplemental Figure 5, and Supplemental Tables 11–13 and 15–17). To further validate these transcriptional differences and determine whether the underlying genomic alterations were causative, we examined the transcriptomes of prostate organoids and tissue samples from genetically engineered mouse models with conditional deletion of Pten or Chd1, corresponding to the late state of each subtype (34, 35). Gene set enrichment analysis (GSEA) demonstrated distinct enriched functions between these 2 tumor lineages in both human samples and genetically engineered mouse models (Figure 2D, Supplemental Figure 6, and Supplemental Table 18). For instance, the SPOP/CHD1 lineage showed positive enrichment of the androgen response signature (Figure 2D), consistent with the higher androgen receptor transcriptional activity (4, 11) and higher prostate-specific antigen (PSA) levels (7) reported in SPOP-mutant tumors.
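The GSEA results summarized as normalized enrichment scores (NES) start from a running-sum enrichment score (ES). Genes are ranked by their change from the early to the late state, and the score walks down the list, stepping up at gene-set members (weighted by their ranking metric) and down at all other genes. The Python sketch below computes this ES for one gene set with the classic weighting exponent of 1; gene names and metric values in the example are hypothetical. Computing the NES would additionally divide the ES by the mean of same-signed ES values from permuted data, which is omitted here.

```python
def enrichment_score(ranked_genes, metric, gene_set, weight=1.0):
    """GSEA running-sum enrichment score for one gene set.

    ranked_genes: genes ordered from most up- to most down-regulated
    (e.g., early -> late change); metric: ranking values in the same order.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must partially overlap the ranked list")
    hit_norm = sum(abs(m) ** weight for m, h in zip(metric, hits) if h)
    if hit_norm == 0:
        raise ValueError("gene-set members all have a zero ranking metric")
    running, es = 0.0, 0.0
    for m, is_hit in zip(metric, hits):
        # Step up by the gene's weighted metric on a hit, down uniformly on a miss
        running += abs(m) ** weight / hit_norm if is_hit else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es


# Hypothetical ranked list and pathway membership
genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
metric = [3.0, 2.2, 1.1, -0.4, -1.5, -2.6]
print(enrichment_score(genes, metric, {"G1", "G3"}))
```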

To further explore the transcriptional divergence between these tumor lineages, we predicted upstream transcriptional regulators that could explain the observed gene expression changes (36). We detected similar predicted upstream regulators from the normal to early states (Supplemental Figure 7 and Supplemental Tables 19 and 20). However, distinct upstream regulators were identified from the early to late states in multiple cohorts: TCGA (4), Taylor (1), and ICGC PRAD-CA (33) (Figure 2E, Supplemental Figures 8 and 9, and Supplemental Tables 21–26). Specifically, growth- and survival-related kinases such as MEK, PI3K, and Erb-B2 receptor tyrosine kinase 2 (ERBB2) were predicted to be active in the ERG/PTEN lineage but inhibited in the SPOP/CHD1 lineage, while kinase inhibitors showed the opposite trend, suggesting distinct signaling activities between the tumor lineages (Figure 2E). Consistent with its status as a known oncogenic SPOP substrate, TRIM24 was predicted to be activated only in the SPOP/CHD1 lineage, whereas putative GATA2 activity was restricted to the ERG/PTEN lineage (37–39). Broadly, these analyses support 2 distinct transcription-based tumor lineage progression models, ERG/PTEN and SPOP/CHD1, with shared early tumorigenesis but distinct pathways toward progression.
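Upstream regulator prediction of this kind asks whether the observed direction of change in a regulator’s known target genes agrees with the regulator being activated. The source cites ref. 36 without giving the statistic, so the Python sketch below assumes an unweighted activation z-score of the type used in Ingenuity Pathway Analysis, z = (N_agree - N_disagree) / sqrt(N), with |z| >= 2 as the call threshold. Target genes and effect signs in the example are hypothetical. Scoring the same regulator separately on the ERG/PTEN and SPOP/CHD1 early-to-late changes and obtaining opposite signs would correspond to the divergent predictions in Figure 2E.

```python
from math import sqrt


def activation_z_score(log_fold_changes, regulator_effects):
    """Unweighted activation z-score for one putative upstream regulator.

    log_fold_changes: gene -> observed change (e.g., early -> late state).
    regulator_effects: gene -> +1 if the regulator is known to induce the gene,
    -1 if it represses it (hypothetical prior-knowledge edges).
    """
    agreements = []
    for gene, effect in regulator_effects.items():
        change = log_fold_changes.get(gene, 0.0)
        if change != 0:
            # +1 when the observed direction matches an activated regulator
            agreements.append(effect * (1 if change > 0 else -1))
    if not agreements:
        return 0.0
    return sum(agreements) / sqrt(len(agreements))


def call_state(z, cutoff=2.0):
    if z >= cutoff:
        return "activated"
    if z <= -cutoff:
        return "inhibited"
    return "no call"


# Hypothetical regulator with 4 known targets, scored in 2 lineages
effects = {"t1": 1, "t2": 1, "t3": -1, "t4": -1}
lineage_a = {"t1": 1.5, "t2": 0.7, "t3": -2.0, "t4": -0.4}
lineage_b = {"t1": -0.9, "t2": -1.2, "t3": 0.8, "t4": 1.1}
for name, changes in (("lineage A", lineage_a), ("lineage B", lineage_b)):
    print(name, call_state(activation_z_score(changes, effects)))
```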

Development of SCaPT models to classify PTEN and CHD1 deletions from transcriptional data. We next sought to understand the impact of subtype-specific progression on clinical outcomes (7, 18) using RNA-based machine-learning classifiers similar to those we previously reported (7). We developed SCaPT (subclass predictor based on transcriptional data) models to categorize prostate tumors according to subtype-specific molecular events (ERG/PTEN and SPOP/CHD1). To define signatures of PTEN and CHD1 deletions, we selected transcriptional features specific to these genomic events in the TCGA cohort (ref. 4 and Figure 3, A and B). We then trained support vector machine (SVM) models (40–42) and used 10-fold cross-validation to select the features and models with the highest sensitivity and specificity (Figure 3A and Supplemental Figure 10), thereby establishing 2 RNA-based classifiers, one for PTEN deletion and one for CHD1 deletion (Supplemental Tables 27 and 28). Unsupervised hierarchical clustering of the TCGA training data using the PTEN- and CHD1-deleted signatures showed the expected enrichment of cases with PTEN and CHD1 genomic deletions (Figure 3C and Supplemental Figures 11 and 12). To further validate these models, we applied our PTEN and CHD1 transcriptional classifiers to an independent cohort (1) and found approximately 80% sensitivity and 90% specificity relative to genomic annotations (Figure 3D). These results demonstrate that our SCaPT models classify PTEN- and CHD1-deleted subclasses from transcriptional data with high accuracy and confidence.
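The classifier-selection loop described above (SVM models chosen by 10-fold cross-validation for sensitivity and specificity) can be sketched in dependency-free Python. The source does not specify the SVM software, kernel, or feature-selection step, so this sketch assumes a linear SVM trained with the Pegasos stochastic sub-gradient algorithm, stratified folds, and pooled held-out predictions. All function names, hyperparameter values, and the toy two-gene data are hypothetical, and a production analysis would more likely use an established SVM library.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM fit with Pegasos (stochastic sub-gradient on the hinge loss).

    X: list of feature vectors (e.g., signature-gene expression); y: +1 (deleted)
    or -1 (wild type). A constant 1.0 feature is appended to learn a bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink (L2 regularization)
            if margin < 1.0:  # hinge loss active: move toward the sample
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def cross_validate(X, y, k=10, seed=0, **svm_args):
    """Stratified k-fold CV; returns pooled (sensitivity, specificity)."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    pos = [i for i in order if y[i] == 1]
    neg = [i for i in order if y[i] == -1]
    folds = [pos[f::k] + neg[f::k] for f in range(k)]
    tp = fn = tn = fp = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train], **svm_args)
        for i in fold:
            pred = predict(w, X[i])
            if y[i] == 1:
                tp, fn = tp + (pred == 1), fn + (pred != 1)
            else:
                tn, fp = tn + (pred == -1), fp + (pred != -1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (not study data): 2 well-separated clusters of 2 "signature genes"
rng = random.Random(1)
X = [[rng.gauss(2, 0.5), rng.gauss(2, 0.5)] for _ in range(50)] + \
    [[rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)] for _ in range(50)]
y = [1] * 50 + [-1] * 50
for lam in (0.1, 0.01):  # compare candidate models by held-out performance
    print(lam, cross_validate(X, y, k=10, lam=lam))
```

Selecting among feature subsets would follow the same pattern: run `cross_validate` on each candidate feature set and keep the one with the best held-out sensitivity and specificity.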

Figure 3 Development of SCaPT models to classify PTEN and CHD1 deletions from transcriptional data. (A) Overview of SCaPT models to predict PTEN and CHD1 deletions from transcriptional data, including feature selection, model selection, 10-fold cross-validation, and validation testing on an independent cohort. (B) PTEN signature of genes differentially expressed between PTEN-deleted and WT samples among TCGA ETS-fusion samples, and CHD1 signature of genes differentially expressed between CHD1-deleted and WT samples among TCGA non–ETS-fusion samples. Colors represent molecular subclasses. Homdel, homozygous deletion; hetloss, heterozygous loss. (C) Significant enrichment of PTEN- and CHD1-deleted samples with PTEN and CHD1 features based on unsupervised hierarchical clustering of TCGA samples. Colors represent genomic alterations. (D) Accuracy and confidence of PTEN- and CHD1-deleted subtype classifications by SCaPT models, determined by testing on an independent data set (n = 106).

Classification of tumor lineage in 8,158 patients using SCaPT models and a decision tree. We applied the RNA-based SCaPT classifiers and a decision tree to define tumor lineage in 8,158 patients from the retrospective and prospective Genomics Resource Information Database (GRID) cohorts (refs. 7, 18; Figure 4A, and Supplemental Figure 13). In the retrospective cohort of 1,626 radical prostatectomy specimens, we classified 8% (range, 4% to 10%) of samples as CHD1del (CHD1 deleted), 8% (2% to 13%) as SPOPmut (SPOP mutant), and 2% (1% to 4%) as SPOPmut+CHD1del (SPOP mutant with CHD1 deletion) (Figure 4B and Supplemental Figure 14). Previously defined expression thresholds (7, 18) classified 42% (35% to 68%) as ERG fusion (ERG overexpression, ERG+), comprising 14% (8% to 29%) ERG+PTENdel (ERG fusion with PTEN deletion) and 28% (21% to 39%) ERG+PTENwt (ERG fusion without PTEN deletion) (Figure 4B and Supplemental Figure 14). Expression thresholds defined 9% (7% to 12%) as non-ERG ETS fusion and 35% (12% to 38%) as lacking outlier expression, which we defined as an “other” subclass (Figure 4B). In the prospective cohort of 6,532 radical prostatectomy specimens, we classified 7% of cases as CHD1del, 4% as SPOPmut, 15% as ERG+PTENdel, 24% as ERG+PTENwt, 9% as ETS, and 36% as other (Supplemental Figure 15). Overall, the percentage of each molecular subclass is consistent with previous PCa studies (1, 4–6), supporting the validity of our SCaPT models and decision tree.
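The decision tree in Figure 4A combines outlier-expression thresholds with the SCaPT calls. The Python sketch below shows one way such a tree could be written. The branch order is an assumption (the figure defines the actual tree), chosen so that SPOP and CHD1 calls apply only to non-ETS tumors, in line with the CHD1 signature having been derived from non–ETS-fusion samples (Figure 3B). Because ERG+PTENdel and ERG+PTENwt partition the ERG+ group, their percentages add up to the ERG+ fraction (14% + 28% = 42% in the retrospective cohort).

```python
from collections import Counter


def assign_subclass(erg_outlier, ets_outlier, pten_del, spop_mut, chd1_del):
    """Hypothetical decision tree: per-sample calls -> molecular subclass label.

    erg_outlier / ets_outlier: outlier-expression calls from expression thresholds;
    pten_del, spop_mut, chd1_del: RNA-based classifier calls (SCaPT-like).
    The order of the branches is an assumption for illustration.
    """
    if erg_outlier:
        return "ERG+PTENdel" if pten_del else "ERG+PTENwt"
    if ets_outlier:
        return "ETS"
    if spop_mut and chd1_del:
        return "SPOPmut+CHD1del"
    if spop_mut:
        return "SPOPmut"
    if chd1_del:
        return "CHD1del"
    return "other"


def subclass_percentages(labels):
    """Percentage of samples assigned to each subclass label."""
    counts = Counter(labels)
    return {label: 100.0 * n / len(labels) for label, n in counts.items()}


# Hypothetical per-sample calls: (ERG, ETS, PTENdel, SPOPmut, CHD1del)
calls = [
    (True, False, True, False, False),
    (True, False, False, False, False),
    (False, False, False, True, False),
    (False, False, False, False, False),
]
print(subclass_percentages([assign_subclass(*c) for c in calls]))
```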

Figure 4 Molecular subclass prediction via SCaPT models and associated prognostic outcomes in the Decipher retrospective cohort. (A) Overview of molecular subclass classification in Decipher cohorts via SCaPT models and gene expression thresholds. (B) Subclass classifications in the Decipher retrospective cohort of 1,626 samples, based on SCaPT models and the decision tree. Colors represent molecular subclasses. (C) Significant difference in prognostic outcome between PTEN-deleted and WT subclasses by Kaplan-Meier analysis of metastasis-free (MET-free) survival. (D) Significant difference in prognostic outcome between CHD1-deleted and SPOP-mutant subclasses by Kaplan-Meier analysis of MET-free survival.

Late progression events are associated with worse clinical prognosis. To investigate the association of molecular progression with clinical progression and patient prognosis, we examined the clinical outcomes associated with early and late progression events within each molecular subclass (Supplemental Figure 16). Metastasis-free survival was worse in both CHD1del and PTENdel tumors than in the early state of each subtype (SPOPmut and PTENwt, respectively) (Figure 4, C and D). Of note, the early states of the 2 subtypes had similarly favorable prognoses, while both late states had similarly unfavorable prognoses (Supplemental Figure 16). Biochemical recurrence–free survival and PCa-specific mortality–free survival followed similar patterns (Supplemental Figure 17), consistent with previous findings (7, 43). These results indicate that genomic alterations defined as late progression events at the molecular level are also clearly associated with more aggressive disease, consistent with clinical progression. Furthermore, these data suggest that the degree of progression within each subtype, rather than the initial lineage, is more strongly associated with clinical prognosis.
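Metastasis-free survival curves such as those in Figure 4, C and D, are Kaplan-Meier estimates: at each time where d events occur among n patients still at risk, the survival estimate is multiplied by (1 − d/n), and censored patients leave the risk set without causing a step. A minimal Python sketch follows. The toy follow-up data are hypothetical, and the between-group test behind the reported significance (commonly a log-rank test) is not stated here and is not implemented.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient; events: 1 if the endpoint (e.g., metastasis)
    occurred at that time, 0 if the patient was censored.
    Returns a list of (time, survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk, survival, curve = len(data), 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_censored = 0
        # Group all patients sharing this time; censored patients still count as at risk
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_events + n_censored  # both leave the risk set after time t
    return curve


# Hypothetical follow-up (years) and metastasis indicators for a small group
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```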

Distinct clinical and pathologic characteristics among late progression events. Finally, having established that molecular progression within each subtype was associated with similar metastasis-free survival, we examined clinical and pathologic characteristics of the 2 late-progressed states in the retrospective and prospective cohorts (8,158 radical prostatectomy specimens in total), relative to different reference groups. Consistent with its known association with aggressive disease, predicted PTEN deletion was associated with adverse pathological features at radical prostatectomy, including lymph node invasion, extracapsular extension, seminal vesicle invasion, and higher Gleason score, in both cohorts (Figure 5, A and C, Supplemental Figures 18 and 19, and Supplemental Tables 29 and 30), as expected for a late progression event. Strikingly, however, predicted CHD1 deletion was associated only with higher Gleason score and with no other adverse clinical features (Figure 5, A and C, and Supplemental Tables 29 and 30). When compared with the early event of SPOP mutation, CHD1 deletion was associated with higher Gleason score in the retrospective cohort only (Supplemental Figures 18 and 19). Similarly, higher tumor stage (T3/T4) was associated with predicted PTEN deletion but not with CHD1 deletion (Figure 5, B and D). We further validated these associations in the TCGA cohort (4), using genomic events rather than transcriptional signatures to annotate subclasses (Supplemental Figure 20).
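Figure 5 reports univariate associations of each predicted deletion with a pathologic feature, using the other samples as the reference. For a single binary predictor, a univariate logistic regression gives the same odds ratio as the cross-product of the 2 × 2 table, and its Wald standard error matches the usual log-odds-ratio formula. The Python sketch below therefore computes that odds ratio with a Wald 95% confidence interval on the log scale. The source does not state that odds ratios or this interval method were used, and the counts in the example are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald 95% confidence interval for a 2 x 2 table.

    a: predicted deletion, feature present   b: predicted deletion, feature absent
    c: reference, feature present            d: reference, feature absent
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return odds_ratio, exp(log(odds_ratio) - z * se_log), exp(log(odds_ratio) + z * se_log)


# Hypothetical counts, e.g., lymph node invasion in predicted PTEN-deleted vs other tumors
print(odds_ratio_ci(20, 80, 10, 90))
```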

Figure 5 Distinct pathological characteristics in CHD1- and PTEN-deleted subclasses from the Decipher retrospective and prospective cohorts. (A) Clinical and pathological differences by PTEN- and CHD1-deleted status in the Decipher retrospective cohort (n = 1,626) via univariate analyses, with other samples as the reference. GS, Gleason score. (B) Alluvial diagrams of Gleason scores, lymph node invasion (LNI) status, and tumor stages across molecular subclasses in the retrospective cohort. Colors represent molecular subclasses. (C) Clinical and pathological differences by PTEN- and CHD1-deleted status in the Decipher prospective cohort (n = 6,532) via univariate analyses, with other samples as the reference. In A and C, box size indicates significance and red indicates P < 0.05. *P < 0.05, **P < 0.01, ***P < 0.001. (D) Alluvial diagrams of Gleason scores, lymph node invasion status, and tumor stages across molecular subclasses in the prospective cohort.

By comparing the signaling pathways enriched in lymph node invasion with those enriched from the early to late states of the 2 tumor lineages (ERG/PTEN and SPOP/CHD1), we found that lymph node invasion and the ERG/PTEN lineage shared similar enriched functions. Strikingly, lymph node invasion and the SPOP/CHD1 lineage showed divergent signatures, supporting the clinical findings that PTEN-deleted tumors were more likely to harbor adverse pathological features including lymph node invasion, whereas CHD1 deletion was not associated with locoregional adverse features (Supplemental Figure 21). Interestingly, compared with lymph node invasion and the ERG/PTEN lineage, the SPOP/CHD1 lineage showed dysregulation of metabolism-related pathways (Supplemental Figure 22); metabolic dysregulation has been shown to be a hallmark of cancer progression and metastasis (44–46).
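One simple way to quantify whether 2 sets of pathway enrichment results are similar or divergent, as in the comparison above and in Figure 2C (which reports R² from a linear regression of NES values), is to correlate NES values across the pathways that both analyses share. The Python sketch below does this. Pathway names and values are hypothetical, and the source does not state which statistic underlies Supplemental Figure 21. For a simple linear regression, R² equals the squared Pearson correlation, so the sketch also returns r: R² alone cannot distinguish concordant from opposite enrichment.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("NES values must vary across shared pathways")
    return sxy / sqrt(sxx * syy)


def compare_nes(nes_a, nes_b):
    """Correlate NES over pathways present in both results; returns (r, R^2, n)."""
    shared = sorted(set(nes_a) & set(nes_b))
    if len(shared) < 3:
        raise ValueError("need at least 3 shared pathways")
    r = pearson_r([nes_a[p] for p in shared], [nes_b[p] for p in shared])
    return r, r * r, len(shared)


# Hypothetical NES values keyed by pathway name
lni = {"pathway_1": 2.1, "pathway_2": 1.4, "pathway_3": -1.8, "pathway_4": -0.9}
erg_pten = {"pathway_1": 1.9, "pathway_2": 1.1, "pathway_3": -1.5, "pathway_4": -1.2}
print(compare_nes(lni, erg_pten))
```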

Broadly, these results demonstrate that despite similar metastatic potential, PTEN-deleted tumors show evidence of locoregional progression at radical prostatectomy, whereas CHD1-deleted tumors are associated only with higher Gleason score, suggesting distinct pathways to metastatic disease.