Assembly and characterization of a large multiinstitutional ERCC2-mutant bladder cancer cohort. The ERCC2 mutation frequency in several reported bladder cancer cohorts ranges from 8% to 20% (3–5). To interrogate the nature of ERCC2 mutations more deeply in bladder cancer, we assembled a multiinstitutional cohort of bladder cancer cases (n = 2,012; Supplemental Figure 1A; supplemental material available online with this article; https://doi.org/10.1172/JCI186688DS1) that represents the largest clinically and/or genomically annotated database of ERCC2-mutant cases analyzed to date. The complete cohort consists of 675 patient-derived tumor samples analyzed by whole-exome sequencing (WES) and 1,337 samples analyzed by targeted panel sequencing. Cases with available sequencing data and clinical information were divided into 2 clinically distinct groups: a neoadjuvant cohort and a metastatic cohort. The neoadjuvant cohort consisted of 284 tumors collected from 5 cohorts of nonmetastatic MIBC patients who received cisplatin-based neoadjuvant chemotherapy (NAC): Dana-Farber Cancer Institute and Memorial Sloan Kettering Cancer Center (DFCI-MSKCC, n = 50) (5), Philadelphia (n = 48) (9, 16), Aarhus (n = 60) (12), MSK IMPACT (n = 38), and Indiana (n = 88) (Supplemental Figure 1A). The metastatic cohort comprised 429 tumors collected from 3 cohorts of patients: Aarhus (n = 105) (12), DFCI Oncopanel (n = 132) (17), and Urothelial Cancer – Genomics Analysis to Improve Patient Outcomes and Research (UC-GENOME, n = 192) (18) (Supplemental Figure 1A). Of the 429 patients in the metastatic cohort, 322 received platinum-based chemotherapy.
In the DFCI Oncopanel and UC-GENOME cohorts, the primary tumor was sequenced in 77% and 87% of the patients, respectively, whereas tumor from a metastatic site was sequenced in 19% and 13%, respectively (the remaining 4% of samples in the DFCI Oncopanel cohort were derived from locally recurrent sites or the information was not available). In the Aarhus cohort, primary tumor specimens were sequenced in all 165 cases. In addition to the assembled neoadjuvant and metastatic cohorts, bladder cancer cases from The Cancer Genome Atlas (TCGA) cohort (4) were analyzed separately and consisted of 412 muscle-invasive, high-grade urothelial tumors analyzed by WES.

We performed comprehensive mutational analyses for all tumors across the 3 cohorts (neoadjuvant, metastatic, and TCGA). Somatic mutations, including single-nucleotide variants (SNVs) as well as short insertions and deletions (indels) identified by WES or targeted panel sequencing, were annotated. We focused our analyses on nonbenign exonic and splice site mutations affecting a gene identified to be significantly mutated in MIBC (4). The 20 most frequently mutated genes are shown in Figure 1A and Supplemental Figure 1, B and C.

Figure 1 Extensive analysis of MIBC cohorts. (A) Mutation landscape of the bladder cancer cases analyzed in the neoadjuvant, metastatic, and TCGA patient cohorts. (B) Mutually exclusive gene pairs identified in the neoadjuvant, metastatic, and TCGA cohorts using the DISCOVER test. No cooccurring gene pairs were detected using the DISCOVER test. Targeted and whole-exome sequencing (WES) cohorts were analyzed separately. (C) 87% of somatic small-scale mutations in ERCC2 occur in the helicase domains of the gene, although the helicase domains only constitute 56% of the gene. The observed ratio of helicase-domain variants was compared with an expected ratio of variants occurring randomly along the gene (χ² test: P = 6.12 × 10⁻³⁰). (D) The most frequent ERCC2 variants that were detected in the collected cohorts.

In the neoadjuvant cohort, the median nonsynonymous mutation rate was 5 mutations per megabase (Mb) for WES cases, TP53 was the most frequently mutated gene (57%), and ERCC2 was mutated in 19% of the cases (Supplemental Figure 1B). The Indiana and MSK-IMPACT targeted sequencing cohorts were excluded from Figure 1A and Supplemental Figure 1B because the Indiana cohort only had mutational data for ERCC2 and TP53, and the MSK-IMPACT cohort consisted exclusively of ERCC2-mutant cases (Supplemental Figure 1C). However, even with these 2 cohorts excluded, the frequency of ERCC2 mutations in the neoadjuvant cases summarized in Supplemental Figure 1B may still be higher than in a nonselected MIBC population because patients in the DFCI-MSKCC and Philadelphia cohorts were specifically included in the cohorts based on tumor response to cisplatin-based therapy. We performed mutual exclusivity and cooccurrence analyses (Methods) for mutations in genes significantly mutated in BLCA using Discrete Independence Statistic Controlling for Observations with Varying Event Rates (DISCOVER) (19). There were no genes with mutations that significantly cooccurred or were mutually exclusive with ERCC2 mutations; however, we did identify a mutually exclusive relationship between RB1 and KDM6A in the subset of the neoadjuvant cohort with available WES data (Figure 1B) in agreement with previous reports (20).

In the metastatic cohort, the median nonsynonymous mutation rate was 4 and 11 mutations per Mb for WES and panel sequencing samples, respectively. TP53 was mutated in 50% of cases, and ERCC2 was mutated in 11% of cases (Supplemental Figure 1B). We identified several mutually exclusive gene pairs including, but not limited to, RB1 and KDM6A, RB1 and FGFR3, TP53 and FGFR3, TP53 and STAG2, TP53 and HRAS, and HRAS and FGFR3 (Figure 1B). Mutually exclusive and cooccurring gene pairs were also tested using Fisher’s exact test (Supplemental Figure 1, D and E; Methods), which identified cooccurrence between ERCC2 and ERBB2 and between ERCC2 and SF3B1 (Supplemental Figure 1E), although no genes significantly cooccurred with ERCC2 using the DISCOVER test.

In the TCGA cohort, the median nonsynonymous mutation rate was 4 mutations per Mb, TP53 was mutated in 46% of cases, and ERCC2 was mutated in 9% of cases (Supplemental Figure 1B). We identified mutually exclusive relationships between RB1 and FGFR3, TP53 and FGFR3, FGFR3 and ARID1A, and KMT2D and KDM6A, some of which have been previously described (4) (Figure 1B).

Of the 2,012 patient-derived samples, we identified 506 ERCC2 mutations in 477 individuals, the vast majority of which were missense variants (93%). ERCC2 variants were highly enriched (87%) in the helicase domains (HDs) of the protein compared with the expected ratio of mutations occurring randomly along the gene (Figure 1C, χ² test: P = 6.12 × 10⁻³⁰). The most frequent ERCC2 variant was N238S (Figure 1D; 14% of ERCC2-mutant cases); however, several other recurrent mutations were also identified (e.g., S44L, T484M, and Y24C; Figure 1D and Supplemental Figure 1F). Comprehensive copy number information and/or loss of heterozygosity (LOH) estimates were available for the WES and Indiana samples (Methods), and we found that ERCC2 missense mutations were nearly always present without loss of the second allele (Supplemental Figure 1G): 82% of ERCC2-mutant cases lacked LOH, whereas an LOH event was detected in only 5% of cases (LOH estimates were not available for the remaining 13%). Tumor mutation burden (TMB) was calculated and harmonized across different sequencing platforms by assigning a TMB z-score to each tumor (Supplemental Figure 1H, Methods). We found that ERCC2-mutant cases, defined as missense or truncating (stopgain, frameshift, or nonstop) variants in the HDs of ERCC2, demonstrated significantly higher nonsynonymous TMB compared with WT ERCC2 cases (defined as no mutations or mutations outside of the HDs) in all 3 cohorts (Supplemental Figure 1I; pairwise Wilcoxon rank-sum tests with Holm’s correction for multiple testing, neoadjuvant: P = 6.3 × 10⁻⁵, metastatic: P = 2.4 × 10⁻²⁰, TCGA: P = 2.3 × 10⁻⁹). Finally, we performed region-specific mutational signature analysis (Methods) and found that many of the missense mutations in ERCC2 are consistent with the mutational signatures associated with APOBEC activity (Supplemental Figure 1J).
However, we also observed a number of ERCC2 mutations due to T→C changes, including the most common variant, N238S, which is not attributable to APOBEC mutagenesis.
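The helicase-domain enrichment test can be illustrated with a χ² goodness-of-fit calculation. The following is a minimal sketch in Python, assuming hypothetical counts (87 of 100 variants in the HDs) compared against the 56% expected from the HD share of the gene. The counts and the function name are assumptions made for illustration only, so the resulting P value is not expected to match the reported one.

```python
import math


def chi2_gof_two_bins(observed_in, observed_out, expected_fraction_in):
    """Chi-square goodness-of-fit test with two bins (1 degree of freedom)."""
    if not 0.0 < expected_fraction_in < 1.0:
        raise ValueError("expected_fraction_in must lie strictly between 0 and 1")
    total = observed_in + observed_out
    expected_in = total * expected_fraction_in
    expected_out = total - expected_in
    stat = ((observed_in - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    # For 1 degree of freedom, the chi-square upper tail is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: 87 variants inside the HDs, 13 outside,
# with 56% of the gene covered by the HDs.
stat, p_value = chi2_gof_two_bins(87, 13, 0.56)
```

Because the test statistic scales with the number of variants counted, the same 87% versus 56% split yields a much smaller P value with a larger sample.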

CRISPR-Select identifies functionally deleterious ERCC2 helicase-domain mutations. ERCC2 mutations have been associated with increased sensitivity to cisplatin-based chemotherapy in some bladder cancer cohorts, and functional analyses of selected ERCC2 mutations have demonstrated impaired NER activity. However, the functional impact of most clinically observed ERCC2 mutant alleles on cisplatin sensitivity has not been characterized. To quantitatively define the impact of specific ERCC2 missense mutations on cisplatin sensitivity, we leveraged the newly developed CRISPR-Select assay (14) (Figure 2A). In this approach, CRISPR-based genome editing is used to introduce the mutation of interest (Mut) as well as a synonymous (silent) mutation (WT*) as an internal control in a single MCF10A cell population. Cell aliquots are harvested at different time intervals after editing and deep NGS is performed to monitor relative changes in mutation frequencies over time. Drug treatment was included to evaluate if the introduced ERCC2 mutations confer increased cisplatin sensitivity. Given that TP53 is frequently comutated with ERCC2, we tested the impact of ERCC2 mutations on cisplatin sensitivity with and without cooccurring loss of TP53.

Figure 2 CRISPR-Select analysis establishes that helicase-domain ERCC2 mutations confer platinum sensitivity. (A) CRISPR-Select workflow. iCas9-MCF10A cells are transfected with equal amounts of repair templates harboring the mutation of interest (Mut) or a synonymous mutation (WT*). The WT* is used as an internal normalization control. Following CRISPR editing, most cells with a mutation of interest knocked-in on one allele will have a disruptive frameshift (fs) InDel on the other allele. Cells are harvested at day 2 (D2; initial timepoint) and the remaining cells are split into untreated or cisplatin-treated conditions and collected at D12. The region containing the Mut or WT* is deep sequenced and the Mut:WT* ratio is calculated. (B) Schematic representation of ERCC2 gene structure and position of the mutations investigated by CRISPR-Select. The mutations correspond to germline mutations selected from ClinVar and somatic missense mutations identified in bladder cancer cohorts. The conserved helicase domains of ERCC2 are depicted. (C) Impact of a known pathogenic (Y639*) and a likely benign (D312N) variant on cell fitness in TP53 WT and KO iCas9-MCF10A cell lines. The normalized Mut:WT* shown corresponds to the ratio of the Mut:WT* normalized to D2. (D and E) Impact on cisplatin sensitivity of ERCC2 variants. The normalized Mut:WT* frequencies of somatic missense mutations in (D) TP53 KO iCas9-MCF10A and (E) bladder cancer cell lines. The error bars represent the standard deviation of 3 independent experiments. The statistical significance was determined using a paired 2-tailed t test. *P ≤ 0.05, **P ≤ 0.01, and ***P ≤ 0.001.

First, we monitored basic cell proliferation rates and did not observe any difference between TP53 KO and TP53 WT cell lines as measured by live microscopy (Supplemental Figure 2A). Next, we selected a known ERCC2 pathogenic germline variant (Y639*; https://www.ncbi.nlm.nih.gov/clinvar/variation/1358482/) and a likely benign variant (D312N; https://www.ncbi.nlm.nih.gov/clinvar/variation/134117/) as controls. We introduced these alterations in the TP53 KO and TP53 WT cell lines and assessed the ERCC2 Mut:WT* frequencies over time in the absence of cisplatin (Figure 2, B and C). Cells were collected on day 2 (D2, initial timepoint) and day 12 (D12) following guide RNA transfection. The Mut and WT* frequencies were calculated and then the Mut was normalized to the WT* (Mut:WT*). To compensate for experimental variability, the Mut:WT* ratio at D12 was normalized to that of D2 (Supplemental Figure 2B). The Mut:WT* frequency of Y639* decreased by approximately 80% on D12 (Figure 2C), consistent with the known impact of Y639* on ERCC2 stability and the essentiality of ERCC2’s structural role as part of the TFIIH complex (21). Conversely, ERCC2-D312N did not affect cell fitness, supporting that this variant is benign (Figure 2C). The guide RNAs used in the CRISPR-Select experiment introduce frameshift mutations if the repair template is not used. We observed a decrease in frameshift frequency over time for guide RNAs used to edit at both Y639* and D312N positions, indicating a selection against disruptive ERCC2 frameshift mutations for both guide RNAs, which is in line with ERCC2’s essential function (Supplemental Figure 2C). Together, these results support the utility of CRISPR-Select to assess the functional impact of ERCC2 mutations.
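The normalization described above can be written out as a short calculation. The following is a minimal sketch in Python; the function name and the allele frequencies are hypothetical and chosen for illustration, not taken from the study's data or analysis code.

```python
def normalized_mut_wt_ratio(mut_d2, wt_d2, mut_d12, wt_d12):
    """Return the D12 Mut:WT* ratio normalized to the D2 Mut:WT* ratio.

    Inputs are allele frequencies (or read counts) for the mutation of
    interest (Mut) and the synonymous control (WT*) at each timepoint.
    """
    if min(mut_d2, wt_d2, wt_d12) <= 0:
        raise ValueError("D2 frequencies and the D12 WT* frequency must be positive")
    ratio_d2 = mut_d2 / wt_d2      # Mut normalized to WT* at the initial timepoint
    ratio_d12 = mut_d12 / wt_d12   # Mut normalized to WT* at the final timepoint
    return ratio_d12 / ratio_d2


# Hypothetical frequencies (percent of reads), for illustration only.
neutral = normalized_mut_wt_ratio(mut_d2=10.0, wt_d2=10.0, mut_d12=9.0, wt_d12=9.0)
depleted = normalized_mut_wt_ratio(mut_d2=10.0, wt_d2=10.0, mut_d12=2.0, wt_d12=10.0)
```

In this sketch, a value near 1 indicates no selection relative to WT*, whereas a value of 0.2 corresponds to an 80% decrease, similar in magnitude to the decrease reported for Y639*.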

We next investigated the impact of somatic ERCC2 mutations identified in bladder cancer cohorts (Figure 2B) on cell fitness in the TP53 KO (Figure 2D) and WT cell lines (Supplemental Figure 2D). In the absence of cisplatin, the variant frequency was unchanged over time for both helicase and nonhelicase mutations, suggesting that these somatic ERCC2 mutations did not impact baseline cell fitness (Figure 2D and Supplemental Figure 2D). A decrease in guide RNA–mediated frameshift frequency over time was observed for all ERCC2 mutations except ERCC2-Q758E (Supplemental Figure 2E). Q758E is in the last exon, and the guide RNA–mediated frameshift events may therefore not be as deleterious due to nonsense-mediated mRNA decay escape (22).

We next used CRISPR-Select to evaluate the impact of ERCC2 variants on cisplatin sensitivity. We first determined the half-maximal inhibitory concentration (IC50) of cisplatin for TP53-WT and TP53-KO cells. Though TP53-WT cells were slightly more sensitive to cisplatin (IC50, 0.5 μM) than TP53-KO cells (IC50, 0.9 μM) (Supplemental Figure 2F), the difference was small, and we selected 1 μM cisplatin as the dose to be used for both cell lines. Two days following guide RNA transfection, an aliquot of cells was collected (D2) and the remaining cells were treated or not treated with 1 μM cisplatin and then harvested ten days later (D12). All tested helicase domain ERCC2 mutations sensitized cells to cisplatin, as demonstrated by the statistically significant decrease in Mut:WT* frequencies in both TP53 KO (Figure 2D) and TP53 WT (Supplemental Figure 2D) backgrounds, whereas the nonhelicase domain variants did not impact cisplatin sensitivity (Figure 2D and Supplemental Figure 2D). In a separate set of experiments in which cells were harvested on D7 and D12, no significant difference in cisplatin sensitivity was observed between TP53-KO and WT cells (Supplemental Figure 2G). This indicates that TP53 loss does not influence cisplatin sensitivity induced by ERCC2 helicase-domain mutations in vitro.

To explore the impact of ERCC2 mutations on cisplatin sensitivity in a bladder cancer model, Cas9 and equal amounts of repair templates harboring Mut or WT* were nucleofected in J82, a malignant human urothelial cell line (23). In agreement with our prior findings, the 2 helicase domain variants, N238S and D609G, displayed increased sensitivity to cisplatin treatment (0.25 μM and 0.5 μM) but had no impact on cell fitness in the absence of cisplatin (Figure 2E). Taken together, these data demonstrate the utility of CRISPR-Select to define the functional impact of clinically observed ERCC2 mutations on bladder cancer cell fitness and cisplatin sensitivity. Our findings show that ERCC2 helicase-domain mutations substantially increase cisplatin sensitivity.

Single allele editing CRISPR-Select can quantify functional impacts of heterozygous ERCC2 missense mutations. The version of CRISPR-Select that was previously reported (14), and that we used to test the functional impact of ERCC2 mutations in Figure 2, relies on editing of one allele to introduce the desired missense mutation coupled with highly efficient loss of heterozygosity on the second allele via InDel formation (Figure 3A). However, this genetic context differs from most bladder tumors, in which the heterozygous missense ERCC2 mutations are present without loss of heterozygosity (LOH) of the WT ERCC2 allele(s) (Supplemental Figure 1G). To more accurately model the clinically relevant setting, we adapted the CRISPR-Select assay by using guide RNAs that target the nearest intron to the ERCC2 mutation of interest. In this setting, the primary genome editing outcomes within a cell are as follows: (a) on one allele, the donor repair templates (ssODNs) yield the desired missense (Mut) or synonymous (WT*) mutation; (b) the second allele is predominantly repaired without use of the donor repair templates, leading to intronic InDel formation that does not disrupt production of a full-length WT ERCC2 protein (Figure 3B). We term this assay “single allele editing CRISPR-Select”, as it allows introduction of heterozygous missense mutations without accompanying LOH. To validate this approach, we first compared editing outcomes using guide RNAs that targeted either the exons of ERCC2 D609 and N238 (common sites of clinically observed mutations) or their adjacent intronic regions (in the absence of an ssODN template). As expected, the exon-targeting guide RNAs resulted in a majority of InDel events in the coding regions, whereas the intron-targeting guide RNAs resulted in intronic InDel events (Figure 3, C and D). We also considered whether the intron guide RNAs might impact regions important for RNA splicing of ERCC2.
However, analysis of NGS data following editing indicated that the intron guide RNAs had a smaller effect on splicing than the exon guide RNAs (Supplemental Table 1).

Figure 3 Single allele editing CRISPR-Select to quantify functional impacts of heterozygous ERCC2 missense mutations. (A and B) Principle of exon guide RNA editing compared with intron guide RNA editing. Following Cas9 cleavage, the ssODN repair templates are employed to introduce the mutation of interest (Mut) or a synonymous mutation (WT*) that are tracked by NGS. On the other allele, Cas9 introduces a cut, but, due to inefficiency of editing, this predominantly results in InDel events. Two cellular editing events resulting in Mut and WT* ERCC2 are depicted separated by dashed lines. (A) In exon guide RNA editing, the second allele events are frameshifts that generally are degraded by nonsense-mediated mRNA decay (NMD). (B) Using an intron guide RNA system, the second allele InDels are now in the noncoding region. This system can hence circumvent the formation of a high proportion of frameshifts and be used to mimic a heterozygous condition. (C and D) Quantification of exon and intron InDels at D2 in an exon guide RNA editing system compared with an intron guide RNA system. (E) Impact on cisplatin sensitivity of exon guide RNA and intron guide RNA for N238S and D609G variants. The normalized Mut:WT* shown corresponds to the ratio of the Mut:WT* normalized to the initial D2 timepoint. The error bars represent the standard deviation of 3 independent experiments. The statistical significance was determined using an unpaired 2-tailed t test.

We next assessed the impact of intron InDels and exon InDels on ERCC2 protein levels by transfecting cells with nontargeting, intron-targeting, or exon-targeting guide RNA only, without the addition of ssODNs, thereby inducing InDel events around the Cas9 cut site. The genomic DNA and protein were collected 3 days after guide RNA transfection. We observed an equivalent guide RNA-Cas9 efficiency (greater than 80% of modified alleles) with the intron and exon guide RNAs (Supplemental Table 2). As expected, a larger proportion of frameshift events were observed with the exon guide RNA compared to the intron guide RNA (Supplemental Table 2). Consistently, we observed a significant decrease in ERCC2 full-length protein expression with the exon guide RNA compared with the intron guide RNA (Supplemental Figure 3E). We then compared cellular fitness and cisplatin sensitivity following Mut or WT* editing with either the exon- or intron-targeting guide RNAs. Intriguingly, we observed similar cisplatin sensitivity with exon- and intron-targeting guide RNAs (Figure 3E), suggesting that helicase domain ERCC2 mutations were sufficient to confer cisplatin sensitivity in the presence or absence of accompanying WT ERCC2 protein. More broadly, these results indicate that single allele editing of ERCC2 mutations is feasible and support our findings obtained using the original CRISPR-Select assay (Figure 2).

MIBC cases with ERCC2 helicase domain mutations benefit from cisplatin-based neoadjuvant chemotherapy. To explore if ERCC2 helicase-domain mutations may predict cisplatin response in bladder cancer, we investigated the relationship between ERCC2 helicase-domain mutation status (Figure 4A) and patient outcomes in the assembled bladder cancer cohorts. The neoadjuvant cohort comprises MIBC patients who received cisplatin-based chemotherapy followed by radical cystectomy. Cisplatin responders were defined as those patients with pathologic downstaging of tumors to nonmuscle invasive, node-negative disease (i.e., pT0, pTa, pTis, or pT1; and N0) at the time of cystectomy, whereas nonresponders were patients with residual muscle-invasive (pT2) or node-positive (N1) disease. Among patients with ERCC2 helicase-domain mutations, there was a significant enrichment of responders compared with nonresponders (Figure 4B, Fisher’s exact test: P = 3 × 10⁻⁴). This enrichment persisted if a stricter definition of response (pT0, pTa, or pTis; and N0) was applied (Supplemental Figure 4A, Fisher’s exact test: P = 5.1 × 10⁻⁵). The number of cases with nonhelicase domain ERCC2 mutations was too low to assess the association with response. Patients with helicase-domain ERCC2-mutant tumors had significantly longer OS compared with patients with WT ERCC2 or a nonhelicase domain ERCC2 mutation in our neoadjuvant cohort (Figure 4C, Log-rank test: P = 5 × 10⁻⁴).
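The responder enrichment analysis rests on Fisher’s exact test applied to a 2 × 2 table of mutation status by response. The following is a minimal sketch of the two-sided test in Python using only the standard library; the function name and the counts are hypothetical and chosen for illustration, not taken from the cohort data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # A small relative tolerance guards against floating-point ties.
    total = sum(prob(k) for k in range(lowest, highest + 1)
                if prob(k) <= p_observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts, for illustration only:
# rows = HD-mutant / other cases, columns = responder / nonresponder.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

With these small illustrative counts the test is far from significant (P = 34/70), which shows how strongly the result depends on the number of cases in each cell.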

Figure 4 Clinical outcomes among patients in the neoadjuvant, metastatic, and TCGA cohorts. (A) Percentage of ERCC2-mutant and WT cases in the 3 cohorts (neoadjuvant, metastatic, TCGA). (B) Patients with helicase domain ERCC2 mutations were more likely to respond to cisplatin-based neoadjuvant chemotherapy (NAC). Response was defined as pT0, pTa, pTis, and pT1 (Fisher’s exact test: P = 3 × 10⁻⁴). (C) Kaplan-Meier plot for OS of patients in the neoadjuvant cohort stratified by ERCC2 helicase-domain mutation status (Log-rank test: P = 5 × 10⁻⁴). (D) Kaplan-Meier plot for OS of patients in the metastatic cohort stratified by ERCC2 helicase-domain mutation status (Log-rank test: P = 0.35). (E) Response to chemotherapy in the metastatic cohort (Fisher’s exact test: P = 0.36). (F) Kaplan-Meier plots for OS of platinum-treated patients in the TCGA cohort stratified by ERCC2 helicase-domain mutation status (Log-rank test: P = 0.017). (G) Percentage of cases grouped by ERCC2 and TP53 mutation status in the 3 cohorts (neoadjuvant, metastatic, TCGA). (H) The number of responders and nonresponders to neoadjuvant cisplatin-based chemotherapy, when response was defined as pT0, pTa, pTis, and pT1 (Fisher’s exact test: overall P = 0.003), among cases grouped by ERCC2 helicase-domain mutation and TP53 mutation status. (I) Kaplan-Meier plot for OS of patients in the neoadjuvant cohort stratified by ERCC2 helicase-domain mutation and TP53 mutation status.

In the metastatic cohort, there was no significant difference in OS between patients harboring a helicase-domain ERCC2 mutation compared to patients with WT or nonhelicase domain ERCC2 mutations (Figure 4D, Log-rank test: P = 0.35). For a subset of cases in the Aarhus and UC-GENOME cohorts, response to first-line chemotherapy and response to chemotherapy, respectively, were available. In the metastatic subset of the Aarhus cohort, response to first-line cisplatin-based chemotherapy was measured posttreatment by cross-sectional imaging based on the Response Evaluation Criteria in Solid Tumors (RECIST) guidelines (12). In the UC-GENOME cohort, response was reported based on investigator assessment (18). Clinical benefit was defined as any patient who had a complete response (CR), partial response (PR), or stable disease (SD). In the combined subset of Aarhus and UC-GENOME metastatic cases with available chemotherapy response data, we found no significant associations between ERCC2 mutation status and response (Figure 4E, Fisher’s exact test: P = 0.36) or clinical benefit (Supplemental Figure 4B, Fisher’s exact test: P = 0.18), although the number of cases was limited.

In the TCGA cohort, OS of ERCC2-mutant versus WT cases separated clearly among platinum-treated patients (Figure 4F, Log-rank test: P = 0.017), whereas no separation was observed in the comparison shown in Supplemental Figure 4C (Log-rank test: P = 0.91). Similar relationships were observed for other clinical endpoints, including progression-free interval, disease-free interval, and disease-specific survival (Supplemental Figure 4, D–I).

TP53 is mutated in approximately 50% of all bladder cancer cases, including approximately 50% of ERCC2-mutant cases (Figure 4G). Notably, our CRISPR-Select analysis indicated that TP53 status does not significantly influence the cisplatin sensitivity of ERCC2-mutant cells. Therefore, we investigated the impact of TP53 mutation status on clinical outcomes following platinum-based chemotherapy in patients with versus without an ERCC2 helicase-domain mutation. In the neoadjuvant cohort, patients with helicase-domain ERCC2 mutations were enriched in responders regardless of TP53 mutation status (Figure 4H, Fisher’s exact test: P = 3.2 × 10⁻³ and Supplemental Figure 4J, Fisher’s exact test: P = 8.2 × 10⁻⁴). We also investigated the associations of ERCC2 and TP53 mutation status with OS (Figure 4I, Kaplan-Meier curves) and found that helicase-domain ERCC2 mutation status was associated with a trend toward longer OS (Supplemental Table 3, HR = 0.43, P = 0.055), but neither TP53 status (Supplemental Table 3, HR = 1.14, P = 0.6) nor the interaction between ERCC2 and TP53 (Supplemental Table 3, HR = 0.55, P = 0.4) was associated with OS.

Comparison of CRISPR-Select and computational predictions of ERCC2. Computational models have emerged that allow fast prediction of the impact of specific mutations on certain protein functions. CRISPR-Select provides an opportunity to functionally quantify the impact of specific mutations based on endogenous locus editing. Therefore, we wished to compare the functional experimental results obtained with CRISPR-Select to various computational predictions of ERCC2 mutation pathogenicity. To identify functionally important sites in ERCC2, we employed a machine learning model (24) (hereafter referred to as the Cagiada model) and a threshold-based approach called FunC-ESMs (25), or Functional Characterization via Evolutionary Scale Models. The Cagiada model classifies each variant into one of 4 categories: WT-like, stable-but-inactive (SBI), total-loss (TL), and variants with WT-like function but decreased stability. The FunC-ESMs approach is conceptually similar to the Cagiada model, although it relies on recently developed protein language models (Methods). The computational predictions of functionally important sites in ERCC2 by the Cagiada and FunC-ESMs models are shown in Figure 5A and Supplemental Figure 5A, respectively. Both heatmaps show that the majority of variants (55% by the Cagiada model and 85% by FunC-ESMs) in ERCC2 were predicted to be either SBI or TL variants. According to the Cagiada model, the number of variants predicted to impair protein function (SBI variants) is enriched in the HDs (45%) compared with the nonhelicase domains (28%) of ERCC2 (Figure 5B, Fisher’s exact test: P = 5 × 10⁻³), which is in agreement with the expected association between functionally damaging missense variants and the HDs. However, the FunC-ESMs model did not show an enrichment of SBI variants in the HDs and appeared to overestimate the number of SBI variants in ERCC2 (Supplemental Figure 5B, Fisher’s exact test: P = 0.47).
In comparison with CRISPR-Select, the Cagiada model accurately predicted the effect in 10 out of 12 variants (Figure 5C). On the other hand, the FunC-ESMs method was less accurate and misclassified the control benign variant, D312N, and 3 out of 4 nonhelicase domain variants (D179H, F193V and Q758E) (Figure 5C).

Figure 5 Comparison of CRISPR-Select and computational predictions of ERCC2. (A) Computational prediction of functionally important sites in ERCC2 using the Cagiada model. The heatmap shows that 2/3 (66%) of ERCC2 variants in the helicase domains are predicted to have a functionally or structurally detrimental effect, i.e., stable-but-inactive (SBI) (45%) or total-loss (TL) (21%) variants. (B) The bar plot shows the ratio of variants in each class predicted by the Cagiada model within and outside of the helicase domains (HDs) of ERCC2. The ratio of variants within and outside of the HDs was compared by Fisher’s exact test: P = 5 × 10⁻³. (C) Comparison of CRISPR-Select functional experimental results using MCF10A TP53-KO cells and computational predictions by multiple functional and variant prediction tools. Values in the “D12” and “D12+Cis” columns show the mean values of 3 independent experiments conducted by CRISPR-Select.

In addition to the Cagiada and FunC-ESMs models, we also employed other prediction tools to assess the pathogenicity of ERCC2 mutations, including AlphaMissense (26), EVE (27), REVEL (28), SIFT (29), PolyPhen2 (30), and CancerVar (31). The predictions of ERCC2 pathogenicity by AlphaMissense are shown in Supplemental Figure 5C. Although 68% of the total variants in ERCC2 were predicted to be pathogenic, we observed an enrichment of pathogenic variants in the HDs compared with the nonhelicase domains of ERCC2 (79% versus 54%, Supplemental Figure 5D, Fisher’s exact test: P = 8 × 10⁻⁴). A similarly high percentage of predicted pathogenic variants was obtained by EVE and REVEL, with an enrichment of pathogenic variants in the HDs (Supplemental Figure 5, E–G).

Next, we compared CRISPR-Select findings with computational predictions of pathogenicity with a fitness-centered view (i.e., analogous to cell viability on D12 without cisplatin treatment in the CRISPR-Select assay). The majority of prediction tools characterized the benign variant (D312N) as benign, except PolyPhen2 and CancerVar, which predicted D312N as “Possibly damaging” and a variant of “Uncertain significance”, respectively (Figure 5C and Supplemental Figure 5H). For the cancer-associated helicase domain variants (Figure 5C and Supplemental Figure 5H), the computational tools predicted the variants to be pathogenic. However, CRISPR-Select did not identify a fitness impact of these variants at baseline. Rather, only in the presence of cisplatin did CRISPR-Select identify functional impacts of these helicase-domain missense variants. Finally, we also interrogated several nonhelicase domain mutations. The benign impacts of N250T and Q758E predicted by almost all tested computational methods were in agreement with the CRISPR-Select assessment (Figure 5C and Supplemental Figure 5H). However, several of these tools labeled the D179H and F193V mutations as pathogenic (Figure 5C and Supplemental Figure 5H), which contrasts with the CRISPR-Select result that these variants affected neither baseline fitness nor cisplatin sensitivity. Thus, while computational analysis provides complementary insights to precision functional assays, caution should be taken as these methods do not necessarily account for the complex nature of the systems they address.