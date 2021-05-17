Clinical cohort. A total of 585 LDCT-positive patients were enrolled from thoracic surgery departments of 14 clinical sites across 8 different provinces in China. The percentage of malignancy based on pathological diagnosis from each province ranged from 75% to 88% (Supplemental Figure 1; supplemental material available online with this article; https://doi.org/10.1172/JCI145973DS1). Fifty-six samples were excluded from analysis because they failed experimental quality control (QC), e.g., an inadequate amount of circulating free DNA (cfDNA) extracted from plasma. The remaining 529 patients’ plasma samples (116 benign and 413 malignant) were used for DNA methylation profiling, model development, and validation. An overview of the study design is shown in Figure 1, and the demographic characteristics for the 529 patients are shown in Table 1.

Figure 1 Study flow of participants in the study. A total of 585 patients were enrolled; 30 were excluded due to limited cfDNA extracted (<5 ng) and 26 were excluded due to failing sequencing QC. The model was developed and tested on 389 samples and validated independently on 140 samples. The model was further validated on 100 indeterminate nodules (6–20 mm) in the validation set.

Table 1 Demographic and clinical characteristics of study participants

The 529 plasma samples were first split into a model development set and an independent validation set at a 3:1 ratio. The model development set was then divided into a training set (56 benign + 253 malignant) and a test set (20 benign + 60 malignant), so that the distribution of malignancy, age, and sex in the test set matched that of the training set, as shown in Figure 1. The percentages of malignancy were 82% and 75% in the training and test sets, respectively. The samples used for model development were primarily from early stage NSCLC. Specifically, stage I and II cancers comprised 94% and 98% of the total cancer patients in the training set and the test set, respectively. Benign and malignant samples were matched with respect to sex and smoking status (P > 0.05). The average size of the nodules in the benign group was 15.8 mm (9.6–22.0 mm), which was significantly smaller (P < 0.05) than that of the malignant group, which was 16.4 mm (9.9–22.9 mm). A summary of nodule types and American Joint Committee on Cancer (AJCC) stage information is shown in Supplemental Tables 2 and 3.

Development and validation of the diagnosis model PulmoSeek for pulmonary nodule diagnosis. Methylation profiles of 309 plasma samples (Supplemental Table 1, training set) were analyzed using AnchorDx’s proprietary targeted methylation sequencing platform with a panel of 12,899 preselected lung cancer–specific methylation regions, corresponding to 105,844 CpG sites (9). A specific methylation signature was selected based on its performance of differentiating malignant from benign nodules.

The derived classification model, comprising 500 methylation target regions (features), achieved a receiver operating characteristic curve–AUC (ROC-AUC) of 0.823 (0.771–0.884) in the test set. By comparison, the top 10 individual features within the 500-feature model showed AUC values between 0.561 and 0.754 in the training set and between 0.525 and 0.720 in the test set, demonstrating the need for a multiple feature–based model (Supplemental Figure 2). For further downstream analysis, we annotated the selected 500 CpG features and performed a gene enrichment analysis. A total of 89 Gene Ontology (GO) categories were significantly enriched (Supplemental Table 4). The enriched categories include those related to tissue proliferation and differentiation, such as embryonic morphogenesis (q value = 10^−9.3), cell-fate commitment (q value = 10^−4.7), stem cell proliferation (q value = 10^−4.6), and epithelial tube morphogenesis (q value = 10^−3.0). In addition, transcription factor activities, such as RNA polymerase II–specific DNA-binding transcription activator activity, were also significantly enriched (q value = 10^−7.5). This result suggested that specific epigenetic signaling responsible for cell differentiation/reprogramming might be essential for pulmonary nodule development.

The performance of the model remained stable during a recursive feature elimination process: the smallest number of features that maintained an AUC within 1% of the 0.829 was 20, with an AUC of 0.810 (0.783–0.850) in the test set (Supplemental Figure 3). This indicates that a robust signature is maintained across different numbers of features selected.

We then chose the 100-feature model, PulmoSeek, for the follow-up analysis. PulmoSeek achieved an overall AUC of 0.829 (0.719–0.942), with a high sensitivity of 0.933 (0.533–0.983) at a specificity of 0.600 (0.500–1.000) in the test set, corresponding to an accuracy of 0.850 (0.625–0.912) (Figure 2, A and C, and Table 2). The detailed information for each methylation feature of PulmoSeek is listed in Supplemental Table 5. Given the excessive false positives and overdiagnosis in LDCT screening, unnecessary invasive procedures in patients with benign nodules should be avoided while high sensitivity is maintained; that is, one should not sacrifice sensitivity (i.e., miss true positives) to reduce unnecessary invasive procedures. This argues for a test with high sensitivity and high negative predictive value (NPV), rather than a test with high specificity and high positive predictive value (PPV). We therefore assessed PulmoSeek’s NPV and PPV. In the current study cohort of 78% prevalence, the NPV was 0.750 (0.396–0.929) and the PPV was 0.875 (0.852–1.000) in the test set (Table 2). The sensitivities of the top 20–, 50–, and 500-feature models were 0.800 (0.675–0.912), 0.800 (0.713–0.912), and 0.900 (0.517–0.967), respectively, as shown in Supplemental Table 6.
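The test-set metrics reported above follow directly from a 2 × 2 confusion matrix. The sketch below is illustrative only: the counts (56 true positives, 4 false negatives, 12 true negatives, 8 false positives) are not stated in the text but are back-calculated from the reported rates and the 60 malignant and 20 benign test samples, and the function name is a hypothetical helper.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # malignant nodules called malignant
        "specificity": tn / (tn + fp),            # benign nodules called benign
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "ppv": tp / (tp + fp),                    # positive calls that are malignant
        "npv": tn / (tn + fn),                    # negative calls that are benign
    }

# Assumed counts, back-calculated from the reported test-set rates
# (60 malignant, 20 benign); they are not taken from the source tables.
test_set = confusion_metrics(tp=56, fn=4, tn=12, fp=8)

for name, value in test_set.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts, the five point estimates agree with the values quoted for the test set; the confidence intervals are not reproduced here because the resampling procedure is not described in this section.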

Figure 2 PulmoSeek performance compared with Mayo Clinic/VA model in all nodule sizes. A representative ROC displays the classification performance of PulmoSeek. (A) In the test set, the AUC was 0.83 (0.72–0.94). In the validation set, the AUC was 0.84 (0.77–0.92). (B) In the validation set, the AUC of the Mayo Clinic classifier was 0.59 (0.48–0.69), and the AUC of the VA classifier was 0.54 (0.44–0.64). (C) Confusion matrices for PulmoSeek comparing the true class with the predicted class for benign (n = 20) and malignant (n = 60) nodule samples and distribution of PulmoSeek scores (range, 0 to 1) in the test set. (D) Confusion matrices for PulmoSeek comparing the true class with the predicted class for benign (n = 40) and malignant (n = 100) nodule samples and distribution of PulmoSeek scores (range, 0 to 1) in the validation set.

Table 2 PulmoSeek performance metrics

We then used an independent cohort of 140 patient plasma samples (40 benign and 100 malignant; Supplemental Table 2, validation set) to further evaluate the performance of PulmoSeek. PulmoSeek achieved an AUC of 0.843 (0.769–0.918; Figure 2, A and D) with sensitivity of 0.990 (0.610–1.000) at specificity of 0.325 (0.200–0.875) and an overall accuracy of 0.800 (0.657–0.871). The NPV was 0.929 (0.444–1.000), and the PPV was 0.786 (0.758–0.938). In an intended-use population with a malignant nodule prevalence of 10% (10), the NPV was calculated as 0.997 (0.947–1.000; Table 2). We further split the validation cohort into 3 subcohorts ordered from high to low prevalence. We found that the NPV increased from 0.790 (0.370–1.000) to 1.000 (1.000–1.000) when the subcohort prevalence decreased from 79% to 23% (Supplemental Table 7).
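The dependence of NPV on prevalence can be illustrated with Bayes' rule. The sketch below assumes that the prevalence-adjusted NPV was obtained by holding the validation-set sensitivity (0.990) and specificity (0.325) fixed and varying only the prevalence; the exact procedure is not described in this section, and the function name is a hypothetical helper.

```python
def npv_at_prevalence(sensitivity, specificity, prevalence):
    """NPV for a given prevalence, with sensitivity and specificity held fixed."""
    true_negative_share = specificity * (1.0 - prevalence)
    false_negative_share = (1.0 - sensitivity) * prevalence
    return true_negative_share / (true_negative_share + false_negative_share)

SENSITIVITY = 0.990  # validation-set sensitivity reported in the text
SPECIFICITY = 0.325  # validation-set specificity reported in the text

# Prevalence in the validation cohort itself (100 malignant of 140 samples).
npv_cohort = npv_at_prevalence(SENSITIVITY, SPECIFICITY, 100 / 140)

# Prevalence assumed for the intended-use population (10%).
npv_intended_use = npv_at_prevalence(SENSITIVITY, SPECIFICITY, 0.10)

print(f"NPV at cohort prevalence: {npv_cohort:.3f}")
print(f"NPV at 10% prevalence:    {npv_intended_use:.3f}")
```

With these inputs, the calculation gives approximately 0.929 at the cohort prevalence and 0.997 at 10% prevalence, in line with the point estimates quoted above.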

The performance of PulmoSeek in patients with nodules of different histological types was further explored. Robust sensitivity was observed for different subtypes, including minimally invasive adenocarcinoma (MIA) (95.2%), invasive adenocarcinoma (IA) (98.2%), and squamous cell carcinoma (SCC) (90.0%) (Supplemental Table 8).

We also compared the performance of PulmoSeek with that of 2 clinical assessment models, the Mayo Clinic and VA models, which are based on clinical information and radiological characteristics, including nodule size and location, among others. In the validation set, PulmoSeek outperformed both clinical models, with an AUC of 0.843 (0.769–0.918) versus an AUC of 0.591 (0.482–0.688) for the Mayo Clinic model and 0.544 (0.442–0.640) for the VA model (Figure 2B).

Classification accuracy of the model in very early stage lung cancers. Very early stage cancer (tumor, node, metastasis [TNM] stage I) poses the greatest challenge for cancer diagnosis using a liquid biopsy (11). We tested PulmoSeek in different stage I substages in the validation cohort: it achieved sensitivities of 0.941 and 1.000 in stage IA (n = 85) and stage IB (n = 5), respectively, and, within stage IA, sensitivities of 0.864, 0.950, and 1.000 in stage IA1 (n = 22), IA2 (n = 40), and IA3 (n = 23), respectively (Figure 3, A and B). In the combined test and validation set, PulmoSeek detected malignancies with sensitivity of 0.971 (0.942–0.993) for stage 0–I and 0.875 (0.625–1.000) for later stage cancers (Supplemental Figure 4A). The lower sensitivity in late-stage cancers was not statistically significant (P = 0.248) and could be due to the limited number of late-stage samples (n = 8). We also calculated the differences in PulmoSeek performance between different groups and observed no significant differences (Supplemental Figure 4, B and C). Taken together, these results validated the accuracy of PulmoSeek, especially in detecting very early stage lung cancers.
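As a consistency check, the stage IA sensitivity can be recovered by pooling the three substages. In the sketch below, the numbers of detected cancers per substage (19, 38, and 23) are not given in the text; they are inferred from the reported sensitivities and sample sizes and should be read as assumptions.

```python
# (detected, total) per stage IA substage in the validation cohort.
# Detected counts are inferred from the reported sensitivities, not quoted.
substages = {
    "IA1": (19, 22),
    "IA2": (38, 40),
    "IA3": (23, 23),
}

# Per-substage sensitivity.
per_substage = {name: hit / n for name, (hit, n) in substages.items()}

# Pooled sensitivity across all stage IA samples.
detected = sum(hit for hit, _ in substages.values())
total = sum(n for _, n in substages.values())
pooled_sensitivity = detected / total

for name, value in per_substage.items():
    print(f"{name}: {value:.3f}")
print(f"Stage IA pooled: {detected}/{total} = {pooled_sensitivity:.3f}")
```

Under these inferred counts, the pooled value over 85 samples rounds to 0.941, matching the stage IA sensitivity quoted above.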

Figure 3 PulmoSeek performance in early stage lung cancer. (A) PulmoSeek performance in early stage cancer in the independent validation set: sensitivity was 100% in stage 0 (n = 2), 94.1% in stage IA (n = 85), and 100% in stage IB (n = 5). (B) PulmoSeek performance in stage IA substages: sensitivity was 86.4% in stage IA1 (n = 22), 95.0% in stage IA2 (n = 40), and 100% in stage IA3 (n = 23).

PulmoSeek outperformed clinical prediction models and conventional cancer biomarker tests in indeterminate nodules. Diagnosis of indeterminate pulmonary nodules (IPNs) (nodules ranging between 6 and 20 mm in size) is challenging for clinicians due to the lack of well-specified optimal action strategies (12). Nodules of 6 to 20 mm made up about 70% of the test set (56 of 80) and of the independent validation set (100 of 140) in this study (Supplemental Table 9). For these nodules, PulmoSeek achieved an AUC of 0.762 (0.610–0.913), sensitivity of 0.905 (0.429–0.976), and specificity of 0.500 (0.286–1.000) in the test set (Figure 4, A and C, and Table 2). In the independent validation set, PulmoSeek achieved an AUC of 0.844 (0.759–0.932), sensitivity of 1.000 (0.577–1.000), and specificity of 0.300 (0.172–0.931; Figure 4, A and D, Table 2, and Supplemental Table 10). For nodules above 20 mm (n = 59), PulmoSeek had an AUC of 0.860 (0.740–0.964) with sensitivity of 0.977 (0.628–1.000) and specificity of 0.562 (0.375–0.938; Supplemental Figure 5).

Figure 4 PulmoSeek performance compared with Mayo Clinic/VA model in 6–20 mm nodule sizes. A representative ROC displays the classification performance of PulmoSeek. (A) In the test set, the AUC was 0.76 (0.61–0.91). In the validation set, the AUC was 0.84 (0.76–0.93). (B) In the validation set, the AUC of the Mayo Clinic classifier was 0.60 (0.48–0.72) and the AUC of the VA classifier was 0.51 (0.40–0.63). (C) Confusion matrices for PulmoSeek comparing the true class with the predicted class for benign (n = 14) and malignant (n = 42) nodule samples and distribution of PulmoSeek scores (range, 0 to 1) in the test set. (D) Confusion matrices for PulmoSeek comparing the true class with the predicted class for benign (n = 30) and malignant (n = 70) nodule samples and distribution of PulmoSeek scores (range, 0 to 1) in the validation set.

PulmoSeek also outperformed the Mayo Clinic and VA models in the validation set for these nodules: the Mayo Clinic model achieved an AUC of 0.602 (0.482–0.719) and the VA model an AUC of 0.512 (0.402–0.633) (Figure 4B).

Consistent with previous studies, conventional cancer biomarkers such as carcinoembryonic antigen (CEA), cancer antigen 125 (CA-125), and cancer antigen 135 (CA-135) alone failed to effectively identify malignant nodules in our cohort (13). The corresponding sensitivities of CEA, CA-125, and CA-135 were only 0.010, 0.030, and 0.030, respectively, compared with a sensitivity of 0.950 for PulmoSeek (Supplemental Figure 6).

PulmoSeek outperformed PET-CT in different nodule types, including ground-glass nodules. PET-CT is known to be more accurate than CT alone for characterizing solid-type pulmonary nodules, resulting in fewer equivocal findings (14). Thus, further evaluation by PET-CT is usually recommended for low- to intermediate-risk nodules. However, PET-CT performance drops substantially for subsolid nodules (part-solid nodules and ground-glass nodules [GGNs]). We compared the performance of PulmoSeek with that of PET-CT in the participants with established PET-CT records in our independent validation set. The accuracy of PulmoSeek was significantly higher than that of PET-CT: PulmoSeek correctly classified 8 out of 10 patients in the solid nodule (SN) subgroup, 9 out of 11 in the part-solid nodule subgroup, and 5 out of 5 in the GGN subgroup, while PET-CT correctly classified 6 out of 10 patients in the SN subgroup, 7 out of 11 in the part-solid nodule subgroup, and 0 out of 5 in the GGN subgroup (Figure 5). This performance was maintained across all nodule types in the combined test and independent validation sets: the model demonstrated a sensitivity of 1.000 (0.702–1.000) in the solid subgroup (n = 78), 0.947 (0.509–1.000) in the part-solid subgroup (n = 75), and 0.964 (0.518–1.000) in the GGN subgroup (n = 67; Supplemental Figure 7).

Figure 5 PulmoSeek performance in different nodule types and comparison with PET-CT. In the independent validation set samples with PET-CT records, the diagnosis result for each patient using PulmoSeek (squares) and PET-CT (diamonds) is shown. Green indicates that the sample was diagnosed correctly, and red indicates that it was diagnosed incorrectly. PulmoSeek correctly identified 8 out of 10 patients in the SN subgroup, 9 out of 11 in the part-solid nodule subgroup, and 5 out of 5 in the GGN subgroup. PET-CT correctly identified 6 out of 10 patients in the SN subgroup, 7 out of 11 in the part-solid nodule subgroup, and 0 out of 5 in the GGN subgroup.

A strategy of integrating liquid biopsy–based ctDNA and protein marker analysis followed by PET-CT imaging for cancer screening has been proposed (15). We assessed this strategy in our cohort by testing the performance of PET-CT on the nodules identified as malignant by our methylation model. In both solid and part-solid nodule groups, integration of PET-CT did not reduce false-positive rates. Rather, it introduced a significant number of false negatives. In SNs, PulmoSeek had a false-positive rate of 14.3% (2 out of 14 misclassified), while integration of PET-CT resulted in a false-positive rate of 16.7% (2 out of 12) and a false-negative rate of 100% (2 out of 2). In all nodules, PulmoSeek had a false-positive rate of 14.8% (4 out of 27 misclassified), while integration of PET-CT had a false-positive rate of 11.8% (2 out of 17) and a false-negative rate of 80% (8 out of 10) (Supplemental Table 11).
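The sequential strategy can be traced through the counts given for all nodules. The sketch below rests on an interpretation rather than on the source tables: it assumes that the 27 PulmoSeek-positive nodules were split by PET-CT into 17 positive and 10 negative calls, that each false-positive rate is the share of positive calls that were benign, and that the false-negative rate is the share of PET-CT-negative calls that were malignant. All variable names are hypothetical.

```python
# Counts for all nodules, as quoted in the text.
pulmoseek_positive = 27   # nodules called malignant by PulmoSeek
pulmoseek_false_pos = 4   # of those, benign on pathology
petct_positive = 17       # PulmoSeek-positive nodules also positive on PET-CT
petct_false_pos = 2       # of those, benign on pathology
petct_negative = 10       # PulmoSeek-positive nodules negative on PET-CT
petct_false_neg = 8       # of those, malignant on pathology

# Assumed interpretation: PET-CT partitions the PulmoSeek-positive nodules.
assert petct_positive + petct_negative == pulmoseek_positive

rates = {
    "PulmoSeek false-positive share": pulmoseek_false_pos / pulmoseek_positive,
    "PulmoSeek + PET-CT false-positive share": petct_false_pos / petct_positive,
    "PET-CT false-negative share": petct_false_neg / petct_negative,
}

# Malignant nodules kept versus lost when PET-CT is added as a second step.
malignant_after_pulmoseek = pulmoseek_positive - pulmoseek_false_pos
malignant_after_petct = petct_positive - petct_false_pos

for label, value in rates.items():
    print(f"{label}: {value:.1%}")
print(f"Malignant nodules retained: {malignant_after_petct} of {malignant_after_pulmoseek}")
```

Read this way, adding PET-CT removes 2 of the 4 false positives but also discards 8 malignant nodules that PulmoSeek had identified, which is the trade-off described above.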