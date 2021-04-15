Given the improved treatment response of patients with p16+ OPSCC, concerted efforts have been directed toward developing precision oncology approaches that include targeted de-intensification of radiation and chemotherapy doses and regimens (21, 22). However, the unpredictable clinical behavior of p16+ OPSCC results in a significant risk that some patients will be over- or undertreated. The majority of patients with p16+ OPSCC are cured with current treatments, which include primary radiation, primary chemoradiation, or primary surgery with or without adjuvant radiation or chemoradiation. However, these patients experience substantial toxicity and morbidity from these therapies. Consequently, there is a clear need to develop a quantifiable and reproducible biomarker to stratify high- and low-risk patients with p16+ OPSCC (23). Low-risk patients might then potentially benefit from therapy de-intensification, whereas high-risk patients would continue standard or intensified management.

Lewis et al. previously identified anaplasia and MN as novel prognostic features in patients with p16+ OPSCC (19). These were strongly and independently associated with disease recurrence and death from the disease and also correlated with DFS in a cohort of surgically treated patients with OPSCC (n = 149). However, identification of these morphologic features is pathologist dependent, which, although no specific study in the literature documents it, implies subjectivity and potential bias (24). We found specific examples difficult to discern and quantify, such as overlapping nuclei from separate cells that were not truly multinucleated and large, anaplastic, irregular nuclei that were also not truly multinucleated. Additionally, that study was performed at a single institution in a cohort of only surgically treated patients, for whom all slides of resected tumor, including lymph node metastases, were reviewed. Interestingly, in the present study, the pathologist’s quantification of MNs on the single H&E-stained slides for 478 patients with p16+ OPSCC was not found to be prognostic for the other institutions’ cohorts (Supplemental Method 3), probably because of undersampling of the phenomenon on just a representative tumor slide.

The computerized MuNI presented in this work focused on addressing the issues related to tumor sampling and to subjectivity and inter-reader variability in MN interpretation. More critically, though, the MuNI is an independent prognostic marker of 3 major clinical outcomes: OS, DFS, and DMFS. We validated the MuNI in a set of 1094 patients from 6 different institutions and found it to be strongly associated with all 3 outcomes. We also identified a strong association between the predictions of the MuNI and OS, DFS, and DMFS among patients with stage I and stage III disease, as defined by the AJCC 8th edition, in both univariate and multivariable analyses. If confirmed in a prospective clinical trial setting, we believe this finding could have major implications for clinical practice. Patients with stage I disease, currently the target of de-escalation treatment strategies, could be further stratified using the MuNI to exclude those who might have a high chance of treatment failure resulting in recurrence (25). Although multiple clinical trials are currently exploring therapeutic de-intensification strategies, they are limited by a dependence on clinical parameters to identify appropriate patients at low risk of disease recurrence (12). The identification of biologically meaningful markers of a good prognosis is therefore of critical importance. Similarly, patients with stage III disease who are further categorized as high risk by the MuNI may merit the maintenance of treatment intensity by incorporating surgical resection, consistently utilizing concurrent chemotherapy, or intensifying chemoradiotherapy. Taken together, this would represent a novel, viable precision oncology approach to treating patients with p16+ OPSCC in the modern era.

Despite the differences in clinical and pathological data between the sites (Supplemental Table 1), the MuNI was prognostic across the different sites using a single threshold cutoff learned from a single site, albeit with modest HRs for death. In Figure 6, the t-SNE of low-level image features such as color and texture extracted via HistoQC shows that each site clustered separately, indicating a large batch effect. On the other hand, the t-SNE using MuNI-specific statistics showed that slides from different sites were interspersed with one another, reflecting the resilience of the MuNI against site-specific preanalytic variations and batch effects. Additionally, as illustrated in Figure 6C, the MuNI was also able to enrich for patients who would develop tumor progression (progressors) versus those who would not (nonprogressors). MuNIs for the progressor group were also found to be statistically significantly larger than those for the nonprogressor group (Supplemental Figure 8).
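The idea of learning one cutoff at a single site and applying it unchanged elsewhere can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the median is used as a stand-in rule for choosing the cutoff, the numbers in the usage example are invented, and the crude per-group event rate is not the survival analysis (HRs) reported in this work.

```python
from statistics import median


def learn_cutoff(train_indices):
    """Pick one cutoff from the training site (median is an assumed rule)."""
    return median(train_indices)


def stratify(indices, cutoff):
    """Label each patient high or low risk using the fixed cutoff."""
    return ["high" if value > cutoff else "low" for value in indices]


def event_rate_by_group(indices, events, cutoff):
    """Return the fraction of patients with an event in each risk group.

    `events` holds 1 for a patient who progressed and 0 otherwise.
    A group with no patients gets None instead of a rate.
    """
    groups = stratify(indices, cutoff)
    rates = {}
    for label in ("low", "high"):
        member_events = [e for g, e in zip(groups, events) if g == label]
        rates[label] = (
            sum(member_events) / len(member_events) if member_events else None
        )
    return rates


# Invented numbers: the cutoff is learned once, then reused at another site.
cutoff = learn_cutoff([0.1, 0.2, 0.3, 0.4, 0.5])
other_site = event_rate_by_group([0.1, 0.5, 0.6, 0.2], [0, 1, 1, 0], cutoff)
```

The point of the design is that `cutoff` is never re-estimated on the external sites, so any separation between the groups there reflects generalization rather than refitting.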

Quantitative histomorphometric (QH) approaches for the prognostication of disease outcomes have been previously proposed for many cancers. These approaches fall into 2 major categories: hand-crafted (or domain-inspired) and deep-learning– or neural network–based approaches. We have previously introduced 2 hand-crafted approaches, OHbIC (26) and QuHbIC (27), to stratify the risk of patients with head and neck carcinomas using H&E-stained tumor microarrays (TMAs). The first study showed the independent prognostic value of OHbIC, which utilizes nuclear shape and texture features for predicting disease-specific survival, in a cohort (n = 115) of patients with oral cavity squamous cell carcinoma (SCC). The latter showed that QuHbIC could predict the risk of recurrence in a cohort (n = 160) of patients with p16+ OPSCC by quantizing the spatial distribution of cell clusters. The second class of approaches, neural network–based deep-learning classifiers, has become popular for cancer detection (28), diagnosis (29, 30), and prognosis (31). Bulten et al. presented a grading system for prostate biopsies using deep-learning models and evaluated its performance in a set of 550 biopsies (29). Skrede et al. trained multiple deep-learning models at different magnifications and fused their output to predict the prognosis for colorectal cancer (n = 2042) (31). These approaches utilize deep networks to learn the best representations for predicting prognosis categories of interest without requiring a pathologist’s input. However, because of the multilayered, nonlinear structure of deep-learning models, they are considered black boxes, and their output is not interpretable by pathologists or translatable into any directly visual form. Interestingly, unlike these models, the method presented here combines the power of deep learning with the interpretability of hand-crafted (i.e., visually identified) features. In other words, deep learning was used not to make a direct prognostic prediction but rather to quantify MNs in WSIs and derive a prognostic metric based on the number of MNs identified on the WSIs. As demonstrated in experiment 4, the hybrid approach to identify a computational pathology–based biomarker was found to be resilient against batch effects.
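The hybrid design described above, in which a network only counts MNs and a simple, inspectable formula turns the counts into a metric, can be sketched as follows. This is an illustrative sketch and not the authors' implementation: the `SlideCounts` container, the function names, and in particular the normalization by the number of epithelial nuclei are assumptions, since this section states only that the metric is based on the number of MNs identified on the WSIs.

```python
from dataclasses import dataclass


@dataclass
class SlideCounts:
    """Per-slide counts from a hypothetical deep-learning detector."""

    multinucleated: int  # detected MN tumor cells (assumed detector output)
    epithelial: int      # all detected epithelial nuclei (assumed normalizer)


def multinucleation_index(counts: SlideCounts) -> float:
    """Return the fraction of epithelial nuclei flagged as multinucleated.

    Dividing by the epithelial count is an assumption of this sketch,
    chosen so that slides with different tissue areas stay comparable.
    """
    if counts.epithelial <= 0:
        raise ValueError("slide has no detected epithelial nuclei")
    return counts.multinucleated / counts.epithelial


def risk_group(index: float, cutoff: float) -> str:
    """Dichotomize the index with a single, pre-specified cutoff."""
    return "high" if index > cutoff else "low"
```

Because the final metric is a plain ratio of counts, a pathologist can in principle inspect the detected cells that drive a given score, which is the interpretability argument made in the text.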

Our study has some limitations. The cohorts from different institutions were found to have significant differences in their MuNIs, as well as in certain clinical and pathologic parameters (Supplemental Table 1 and Supplemental Figure 9). Nonetheless, we found that the MuNI was prognostic for the entire set of validation cohorts in both univariate and multivariable analyses, although it was not consistently prognostic for each of the separate cohorts. A possible explanation is that MN segmentation performance differed between cohorts, resulting in variations in the generated MuNIs. Further evaluation of the sensitivity of the MN segmentation across sites is necessary. The MuNI was most strongly prognostic of outcomes in patients with stage III disease. This could be related to the number of patients and the event rate within each stage. Since stage I and II patients had much better survival outcomes, irrespective of MN, it was more challenging to identify a difference between the high- and low-risk patients in these groups. A modest difference between the high- and low-risk groups could be observed among patients with stage I disease, since there were 471 patients, whereas in the group with stage II disease, which included 245 patients, that difference was not apparent. In the group of patients with stage III disease, whose overall prognosis was much worse, we could detect the difference, even though there were only 169 patients. Finally, this study was based on retrospectively collected data. Analyses of slides from completed multi-institutional, prospective clinical trials, or better yet, a prospective clinical trial with the MuNI embedded within it, are required to validate the findings, minimize the potential for bias, and determine whether the MuNI can specifically predict a patient’s response to treatment.

In conclusion, the MuNI is a tissue-nondestructive, reproducible, rapid, and cost-efficient artificial intelligence–enabled (AI-enabled) biomarker with the potential to risk-stratify patients with p16+ OPSCC. The MuNI relies only on the quantitative measurement of MN tumor cells in digitized H&E-stained tissue from primary tumors, without the need for visual or manual segmentation of tumor versus nontumor regions. These specimens are already consistently obtained from patients with OPSCC in routine practice. The MuNI could therefore potentially be introduced widely into clinical practice at US institutions and be useful in low- and middle-income countries, where the costs associated with genomics-based tests make them difficult to adopt and implement. Given that the MuNI is tissue nondestructive, further validation through retrospective analyses of completed clinical trials or through prospective validation could make it a useful biomarker to guide the treatment of p16+ OPSCC.