Date: 2022-06-01

Definitions of glaucoma, its incidence, and progression. The diagnostic criteria for possible glaucoma based on CFPs were created following published population-based studies; glaucomatous optic neuropathy was defined by the presence of a vertical cup-to-disc ratio of 0.7 or greater, retinal nerve fiber layer (RNFL) defect, optic disc rim width of 0.1 disc diameter or smaller, and/or disc hemorrhage (20–22). Glaucoma incidence was defined as eyes having nonglaucomatous baseline CFPs but becoming possibly glaucomatous during a follow-up period.
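
The diagnostic criteria above can be read as a simple rule over graded disc findings. The following is a minimal sketch, assuming that "and/or" means any single criterion is sufficient; the class and field names are hypothetical and are not taken from the study's code.

```python
from dataclasses import dataclass


@dataclass
class DiscFindings:
    """Per-eye gradings read from one CFP (hypothetical field names)."""
    vertical_cdr: float       # vertical cup-to-disc ratio
    rnfl_defect: bool         # retinal nerve fiber layer defect present
    min_rim_width_dd: float   # narrowest rim width, in disc diameters
    disc_hemorrhage: bool     # disc hemorrhage present


def is_possible_glaucoma(f: DiscFindings) -> bool:
    """True if at least one of the four criteria is met (assumed reading of "and/or")."""
    return (
        f.vertical_cdr >= 0.7
        or f.rnfl_defect
        or f.min_rim_width_dd <= 0.1
        or f.disc_hemorrhage
    )


def is_incident_case(baseline: DiscFindings, follow_ups: list) -> bool:
    """Incidence: nonglaucomatous at baseline, possibly glaucomatous at any follow-up."""
    if is_possible_glaucoma(baseline):
        return False
    return any(is_possible_glaucoma(f) for f in follow_ups)
```

For example, an eye with a baseline vertical cup-to-disc ratio of 0.4 and no other findings that later reaches 0.75 would count as an incident case under this reading.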

Humphrey visual fields performed in a standard 24-2 pattern mode were used for analysis when glaucoma progression was suspected (23). Glaucomatous progression was defined by at least 3 visual field test points worse than the baseline at the 5% level in 2 consecutive reliable visual field tests or at least 3 visual field locations worse than the baseline at the 5% level in 2 subsequent consecutive reliable visual field tests (23). Time to progression was defined as the time from baseline to the first visual field test report that confirmed glaucoma progression according to the aforementioned criteria. Progression under this gold standard definition was confirmed by unanimous agreement of 3 ophthalmologists who independently assessed each visual field report.

Image data sets and patient characteristics. We established a large data set composed of CFPs and visual fields collected in Guangzhou, Beijing, and Kashi, China. The demographic and clinical information of the study participants is summarized in Table 1. The data were split randomly into mutually exclusive sets for training, validation, and external testing of the AI algorithms.

Table 1 Baseline characteristics of the study participants in the different data sets

In the first task, we developed a model to diagnose possible glaucoma based on CFPs. A total of 31,040 images (split into 20,872 for training, 3182 for validation, 6162 for external test 1, and 824 for external test 2) from 14,905 individuals were collected from glaucoma and anterior segment disease eye clinics. Among these images, 10,175 (32.8%) were diagnosed with possible glaucoma. The training and validation data sets were obtained from individuals from the glaucoma and anterior segment disease sections of the Zhongshan Ophthalmic Center in Guangzhou, China. External test set 1 was collected from patients in the glaucoma and anterior segment disease clinic in Jidong Hospital near Beijing. To further test the generalizability of the AI model, we validated its performance on external test set 2, which consisted of CFPs obtained by smartphones in Kashi.

In the second task, we developed a model to predict future glaucoma incidence based on the data from 3 longitudinal cohorts. We included a total of 13,222 eyes (10,357 training, 1191 validation, 955 external test 1, 719 external test 2) of 7127 participants, all of which were diagnosed as nonglaucomatous at the baseline. The training and validation data sets were obtained from individuals who underwent an annual health check in Guangzhou, while external test set 1 was from individuals who underwent an annual health check in Beijing and external test set 2 was from a community cohort in Guangzhou. The mean follow-up duration was 47.8–56.6 months across the data sets. The incidence rate of glaucoma was 1.1%–2.0% across the data sets.

In the third task, we developed a model to predict glaucoma progression based on the CFPs from cohorts with existing glaucoma. In this task, 4275 eyes (3003 training, 422 validation, 337 external test 1, 513 external test 2) from 2219 glaucoma patients were included, all of which were already diagnosed with glaucomatous optic neuropathy at the baseline. The training and validation data sets were obtained from 1 primary open-angle glaucoma (POAG) cohort in the Zhongshan Ophthalmic Center. To further test the generalizability of the AI model on different subtypes of glaucoma, external test set 1 was collected from another POAG cohort and external test set 2 was collected from a chronic primary angle-closure glaucoma (PACG) cohort in the Zhongshan Ophthalmic Center. The mean follow-up duration was 34.8–41.7 months across the data sets, and the proportion of glaucoma progression was 6%–13.5% across the data sets (Table 1).

Design of the diagnostic (DiagnoseNet) and predictive (PredictNet) algorithms. First, we developed a diagnostic algorithm for possible glaucoma, DiagnoseNet (Figure 1B). In brief, DiagnoseNet is composed of 2 main modules, a segmentation module and a diagnostic module. The segmentation module semantically segmented the CFPs into 4 anatomical structures: retinal vessels, macula, optic cup, and optic disc. The diagnostic module generated the glaucomatous probability score.

We then designed a pipeline, PredictNet, for incidence and progression prediction of glaucoma. In brief, PredictNet is also composed of 2 main modules, the segmentation module and the prediction module. The segmentation module is the same as that in DiagnoseNet. The prediction module produces the risk score of glaucoma incidence or progression in the future (Figure 1D and Supplemental Figure 1).

The diagnostic and predictive algorithms share the same segmentation module. The segmentation module was trained independently on manual annotations of the optic disc (1853 images), optic cup (1860 images), macula (1695 images), and blood vessels (160 images). On these structures, the segmentation module achieved an intersection over union (IOU) of 0.847, 0.669, 0.570, and 0.538 for optic disc, optic cup, macula, and blood vessel segmentation, respectively (Supplemental Table 1). Representative samples of segmentation are shown in Supplemental Figure 2.
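
For readers unfamiliar with the IOU metric reported above, the sketch below computes it for one pair of binary masks. It is illustrative only: the function name and the nested-list mask format are assumptions, and returning 1.0 when both masks are empty is a convention chosen here, not something stated in the study.

```python
def iou(pred, target):
    """Intersection over union of two same-shaped binary masks (nested lists of 0/1)."""
    intersection = 0
    union = 0
    for row_pred, row_target in zip(pred, target):
        for p, t in zip(row_pred, row_target):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


predicted = [[1, 1, 0],
             [0, 1, 0]]
annotated = [[1, 0, 0],
             [0, 1, 1]]
print(iou(predicted, annotated))  # 2 shared pixels / 4 pixels in either mask = 0.5
```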

Diagnostic performance of the AI model based on CFPs captured by smartphones. To demonstrate the potential of deploying our AI model in routine healthcare, we developed and tested the AI model to diagnose possible glaucoma based on CFPs not only from fundus cameras but also from smartphones. As shown in Table 2, in the validation data set, the AI model achieved an area under the receiver operating characteristic curve (AUROC) of 0.97 (0.96–0.97), a sensitivity of 0.98 (0.97–0.99), and a specificity of 0.82 (0.80–0.83) for differentiating glaucomatous and nonglaucomatous eyes. To evaluate the generalizability of the algorithms, the AI model was tested on 2 external data sets. In external test set 1, the AI model achieved an AUROC of 0.94 (0.93–0.94), a sensitivity of 0.89 (0.87–0.90), and a specificity of 0.83 (0.81–0.84). In external test set 2, which was obtained using smartphones, the AI model achieved an AUROC of 0.91 (0.89–0.93), a sensitivity of 0.92 (0.88–0.96), and a specificity of 0.71 (0.67–0.74).
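
The three metrics reported above can be computed from labels and model scores as sketched below. This is a generic illustration, not the study's evaluation code: the function names, the toy data, and the use of `>=` at the operating threshold are assumptions, and the AUROC is computed through its rank (Mann-Whitney) interpretation.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = glaucomatous, 0 = nonglaucomatous (made-up scores).
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
print(sens, spec, auroc(labels, scores))
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUROC summarizes ranking quality across all thresholds, which is why the three values are reported together.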

Table 2 Performance of the deep-learning models in the validation and external test sets

Prediction of glaucoma incidence using longitudinal cohorts. We investigated the predictive performance of the AI model for the development of glaucoma in nonglaucomatous individuals over a 4- to 5-year period. A total of 158 eyes developed glaucoma within the 4- to 5-year period. The AI model achieved an AUROC of 0.90 (0.81–0.99), a sensitivity of 0.84 (0.82–0.87), and a specificity of 0.82 (0.57–0.96) for predicting glaucoma incidence in the validation set (Table 2 and Figure 2). The AI model demonstrated good generalizability in the external test sets, where it achieved an AUROC of 0.89 (0.83–0.95), a sensitivity of 0.84 (0.81–0.86), and a specificity of 0.68 (0.43–0.87) in external test set 1, and an AUROC of 0.88 (0.79–0.97), a sensitivity of 0.84 (0.81–0.86), and a specificity of 0.80 (0.44–0.97) in external test set 2 (Table 2, Figure 2, and Supplemental Figure 3).

Figure 2 Area under the receiver operating characteristic (AUROC) curves of the AI model for prediction of glaucoma onset. (A–C) Predictive performance of the AI model in the validation set (n = 1191), external test set 1 (n = 955), and external test set 2 (n = 719).

Supplemental Table 2 shows the incidence of glaucoma stratified by the AI model. As shown in Supplemental Table 2, there was a significant difference in the incidence rate of glaucoma between the low-risk and high-risk groups. The incidence rates were 0.2% and 5.0%, 0.6% and 5.6%, and 0.4% and 4.1% in the low- and high-risk groups of the validation set, external test set 1, and external test set 2, respectively. We employed the Kaplan-Meier approach to stratify healthy individuals into 2 risk categories (low or high risk) for developing glaucoma, based on 4- to 5-year longitudinal data on glaucoma development. The upper quartile of the predicted risk scores from the model in the validation set was used to create the threshold for the high-risk and low-risk groups in the Kaplan-Meier curves and log-rank tests. In the external test sets, significant separations of the low- and high-risk groups were achieved (both P < 0.001, Supplemental Figure 4).

The distribution of the risk scores and the threshold (upper quartile) of low- and high-risk groups across the validation and external test sets are presented in Supplemental Figure 5. As shown in the figure, the threshold (risk score of 0.3561, black dotted line) well defines a boundary to separate individuals who are likely and unlikely to develop glaucoma in a 4- to 5-year period.
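
The risk stratification described above (a threshold at the upper quartile of validation-set risk scores, followed by Kaplan-Meier curves per group) can be sketched as follows. All function names and data are hypothetical; treating scores strictly above the threshold as high risk and using the "inclusive" quantile method are assumptions, and the log-rank test is not shown.

```python
import statistics


def upper_quartile_threshold(validation_scores):
    """75th percentile of the validation-set risk scores."""
    return statistics.quantiles(validation_scores, n=4, method="inclusive")[2]


def stratify(scores, threshold):
    """Assumed rule: scores strictly above the threshold are high risk."""
    return ["high" if s > threshold else "low" for s in scores]


def event_rate_by_group(groups, events):
    """Fraction of eyes with the event (1) in each risk group."""
    rates = {}
    for name in ("low", "high"):
        group_events = [e for g, e in zip(groups, events) if g == name]
        rates[name] = sum(group_events) / len(group_events) if group_events else float("nan")
    return rates


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct event time.

    times: follow-up in months; events: 1 = event observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


threshold = upper_quartile_threshold([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
groups = stratify([0.2, 0.75, 0.9, 0.5], threshold)
print(groups, event_rate_by_group(groups, [0, 1, 0, 0]))
print(kaplan_meier([12, 24, 24, 36, 48], [1, 0, 1, 0, 1]))
```

Fixing the threshold on the validation set and reusing it unchanged on the external test sets, as the text describes, keeps the external evaluation free of any tuning on test data.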

Supplemental Table 3 presents the results of subgroup analyses within the validation and external test sets. The AI model demonstrated no statistically significant difference in performance among the subgroups as stratified by age (≥60 vs. <60 years), sex (male vs. female), and severity of glaucoma (mean deviation > –6 dB vs. < –6 dB).

Prediction of glaucoma progression using longitudinal cohorts. We investigated the predictive performance of the AI model for glaucoma progression in glaucomatous eyes over a 3- to 4-year period. A total of 444 POAG eyes had progression within the 3- to 4-year period. The AI model achieved an AUROC of 0.91 (0.88–0.94), a sensitivity of 0.83 (0.79–0.87), and a specificity of 0.79 (0.66–0.89) for predicting glaucoma progression in the validation set (Table 2 and Figure 3). To validate the generalizability of the AI model in predicting progression in multiple-mechanism glaucoma, we further tested its predictive performance in 2 independent cohorts of PACG (external test set 1) and POAG (external test set 2). The AI model achieved excellent predictive performance, with an AUROC of 0.87 (0.81–0.92), a sensitivity of 0.82 (0.78–0.87), and a specificity of 0.59 (0.39–0.76) in external test set 1, and an AUROC of 0.88 (0.83–0.94), a sensitivity of 0.81 (0.77–0.84), and a specificity of 0.74 (0.55–0.88) in external test set 2 (Table 2, Figure 3, and Supplemental Figure 6).

Figure 3 Area under the receiver operating characteristic (AUROC) curves of the AI model for prediction of glaucoma progression. (A–C) Predictive performance of the AI model in the validation set (n = 422), external test set 1 (n = 337), and external test set 2 (n = 513).

We also trained a predictive model using baseline clinical metadata (age, sex, intraocular pressure, mean deviation, pattern standard deviation, and hypertension or diabetes status) alone to predict progression, which led to an AUROC of 0.76 (0.70–0.82), 0.73 (0.66–0.79), and 0.44 (0.33–0.54) in the validation set, external test set 1, and external test set 2, respectively (Supplemental Figure 7). The performance of the AI model was significantly better than that of the predictive model based on baseline metadata in the above data sets (all P < 0.001).

Supplemental Table 4 shows the risk of glaucoma progression stratified by the AI model. As shown in Supplemental Table 4, there was a significant difference in the proportion of eyes with glaucoma progression between the low-risk and high-risk groups. The progression rates were 3.8% and 42.4%, 4.5% and 23.9%, and 2.0% and 19.8% in the low- and high-risk groups of the validation set, external test set 1, and external test set 2, respectively. We then performed Kaplan-Meier analysis to stratify glaucomatous eyes into 2 risk categories (low or high risk) for glaucoma progression, based on 3- to 4-year longitudinal data on glaucoma progression. The upper quartile of the predicted risk scores from the model in the validation set was used to create the threshold for the high-risk and low-risk groups in the Kaplan-Meier curves and log-rank tests. In the external test sets, significant separations of the low- and high-risk groups were achieved (both P < 0.001, Supplemental Figure 8).

The distribution of the risk scores and the threshold (upper quartile) of low- and high-risk groups across the validation and external test sets are presented in Supplemental Figure 9. As shown in the figure, the threshold (risk score of 2.6352, black dotted line) well defines a boundary to separate glaucomatous eyes that are likely and unlikely to progress in a 3- to 4-year period.

Supplemental Table 3 presents the results of the subgroup analysis in the validation and external test sets. The AI model showed no statistically significant difference in performance across the subgroups stratified by age (≥60 vs. <60 years), sex (male vs. female), and severity of glaucoma (mean deviation > –6 dB vs. < –6 dB), except between the AUROCs of the severe and less severe subgroups in the validation set and external test set 1.

Visualization of the evidence for prediction of glaucoma incidence and progression. To improve the interpretability of the AI models and illustrate the key regions for AI-based predictions, we used gradient-weighted class activation mapping (Grad-CAM) to generate the key regions in the CFPs for diagnosing glaucoma and predicting glaucoma incidence and progression. Representative cases and their corresponding saliency maps are presented in Supplemental Figure 10 (DiagnoseNet) and Figure 4 (PredictNet). The saliency maps suggest that the AI model focused on the optic disc rim and areas along the superior and inferior vascular arcades, which is consistent with the clinical approach whereby nerve fiber loss at the superior or inferior disc rim provides key diagnostic or predictive clues. This suggests that the AI models are learning clinically relevant knowledge in evaluating glaucoma diagnosis and progression. AI-based predictions also appear to involve the retinal arterioles and venules, thus implicating vascular health as potentially relevant to the etiology of chronic open-angle glaucoma.
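
As an illustration of how a Grad-CAM saliency map is formed, the sketch below combines the feature maps of a convolutional layer with the gradients of the output score with respect to those maps. It is a generic, framework-free version of the technique, not the study's implementation: the function name and nested-list inputs are assumptions, the gradients are taken as given (in practice they come from backpropagation), and rescaling to [0, 1] is a common display step assumed here.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heat map from K feature maps and their gradients.

    activations, gradients: K maps of size H x W (nested lists), where
    gradients[k][i][j] is d(score) / d(activations[k][i][j]).
    Returns an H x W map rescaled to [0, 1].
    """
    k = len(activations)
    h = len(activations[0])
    w = len(activations[0][0])

    # Channel weights: global average pooling of the gradients.
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]

    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    cam = [
        [max(0.0, sum(weights[c] * activations[c][i][j] for c in range(k)))
         for j in range(w)]
        for i in range(h)
    ]

    # Rescale for display (assumed step); leave an all-zero map unchanged.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy example: channel 0 supports the prediction, channel 1 opposes it.
acts = [[[1, 0], [0, 0]], [[0, 0], [0, 2]]]
grads = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
print(grad_cam(acts, grads))  # only the top-left location remains highlighted
```

The resulting low-resolution map is typically upsampled to the image size and overlaid on the CFP to show which regions contributed most to the score.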