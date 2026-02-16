In this study, we evaluated whether machine learning methods can enhance the sensitivity and specificity of established cGVHD risk biomarkers. To answer this question, we leveraged samples collected at day 90 or 100 after HCT and 7 plasma biomarkers previously validated with Cox regression analysis (8) in 4 cohorts totaling 1,310 HCT recipients, in what is, to our knowledge, the largest biomarker study for cGVHD prediction to date. Among the clinical variables, graft source was the one most often selected for predicting overall and moderate/severe cGVHD by the ML models whose selection process we examined (SCAD, Adaptive Group Lasso, and BART). This finding was expected because recipients of BM grafts developed less cGVHD than PB graft recipients in a randomized trial (18) that was part of the current study. Related donor, primary disease, and age were selected by these models, probably because most patients with nonmalignant diseases are children and receive BM grafts. GVHD prophylaxis is also correlated with graft source and was selected as one of the top clinical variables. Other clinical variables were prioritized differently between overall cGVHD and moderate/severe cGVHD, with prior aGVHD, F-M sex mismatch, conditioning intensity, and ATG being prioritized for moderate/severe cGVHD risk.

For predicting overall cGVHD, CXCL9 and MMP3 were the biomarkers most often selected. CXCL9 has been validated as a strong risk biomarker for cGVHD in several cohorts, particularly of adult recipients (8, 19, 20). It has been validated as a diagnostic and prognostic marker of severity in even more cohorts (19, 21–23). CXCL9 and CXCL10 are 2 chemokines that attract type 1 T cells expressing the receptor CXCR3 (24). Furthermore, the presence of polymorphisms in CXCR3 ligands is a susceptibility marker for severe cGVHD (25). CXCL9 expression is upregulated in target tissues of cGVHD (e.g., skin, liver, lungs) in response to IFN-γ produced by donor-derived T cells. CXCL9 therefore acts as a chemoattractant for CXCR3+ effector T cells, promoting their migration into tissues, which contributes to the perpetuation of inflammation and tissue injury (26). High CXCL9 expression has also been associated with increased collagen deposition and fibrosis in models of skin fibrosis, through a CXCR3-dependent upregulation of col1a1 in fibroblasts (27), and in pulmonary fibrosis (28). Targeting CXCR3/CXCL9 signaling is a potential therapeutic strategy to reduce T-cell infiltration and tissue damage. Several monoclonal antibodies and small molecules (MDX-1100, T487, SCH-546738, ACT-777991 (8a)) have been developed to block this pathway but, unfortunately, have shown limited clinical efficacy, with the exception of the emerging agent 8a in combination with an anti-CD3 antibody (29). In contrast, JAK inhibitors such as ruxolitinib have been FDA-approved for cGVHD treatment and have been shown to decrease CXCL9/CXCL10 levels in inflammatory diseases (30).

MMP3, also known as stromelysin-1, is a proteolytic enzyme that degrades the extracellular matrix during fibrosis. It has been identified as a diagnostic biomarker in bronchiolitis obliterans syndrome (BOS), with higher concentrations than in cGVHD without BOS (31). It was subsequently part of a panel for the diagnosis, prognosis, and risk of cGVHD (8, 19). MMP3 expression is upregulated by inflammatory cytokines such as TNFα and IL-1β (32). MMP3 then degrades extracellular matrix components, such as collagen, laminin, fibronectin, and proteoglycans, which contributes to fibrotic tissue remodeling. It has been specifically implicated as a mediator of pulmonary fibrosis (33). MMP3 can also activate other pro-MMPs, including MMP1, MMP7, and MMP9, amplifying the matrix-degrading cascade (34). Since measurements in the current study were made 3 months after HCT, MMP3 elevation may indicate that the fibrosing process of cGVHD starts earlier than commonly considered. Therapeutic targeting of the fibrosing process of cGVHD is an unmet need. In autoimmune diseases, inhibition of the RIPK1 pathway with GSK2982772 decreased MMP3 production compared with placebo (35).

Along with CXCL9 and MMP3, IL-17 was selected as a biomarker for moderate/severe cGVHD risk. IL-17 is produced by Th17 cells and is associated with macrophage-driven inflammation and fibrosis in cGVHD (36). Although IL-17 is typically challenging to measure in cGVHD plasma, the use of an ultrasensitive immunoassay allowed its detection in patients who subsequently developed moderate/severe cGVHD. Selective ROCK2 inhibition with belumosudil downregulates proinflammatory cytokines such as IL-17 and IL-21 and has been shown to reduce the number and activity of Th17 cells while promoting the expansion and function of regulatory T cells (37).

For NRM prediction, the biomarkers selected differed from those in the cGVHD models, though most clinical features were similar. Age and primary disease were selected first, followed by GVHD prophylaxis, HLA-matching, prior aGVHD, and conditioning intensity. Importantly, among all clinical and biomarker variables, the protein IL1RL1 was selected first by all models. It has been associated with risk of aGVHD in several cohorts (38) and in some cGVHD studies (19, 20). It has also consistently been associated with NRM (19, 38). Soluble IL1RL1 is the form measured by ELISA and is a decoy receptor of IL-33 (39). It acts at several levels in the pathogenesis of cGVHD: (a) it interferes with the protective effects of IL-33 on regulatory T cells, type 2 innate lymphoid cells, and epithelial repair; (b) it is released when the endothelium is damaged, which has been observed in cGVHD (40, 41); and (c) it is secreted by alloreactive T cells in preclinical GVHD models (42, 43). A full-length murine anti-IL1RL1 antibody has been shown to reduce alloreactivity in preclinical models by targeting the circulating excess of soluble IL1RL1 while maintaining IL1RL1+ regulatory cells in the tissues (42). Anti-human neutralizing IL1RL1 antibodies such as astegolimab are in phase II and III trials. The safety and efficacy of astegolimab (a neutralizing anti-human ST2/IL1RL1 antibody) have been established in patients with COPD and adults with severe asthma (44, 45).

sCD163 was the second most frequently selected biomarker. It has been associated with cGVHD as early as 80 days after HCT (46) and recently with NRM (8). sCD163 is shed from the surface of inflammatory macrophages by the tumor necrosis factor-α–converting enzyme (TACE), also known as ADAM17 (47). Targeting monocyte/macrophage activation pathways represents an emerging therapeutic strategy. A recent phase II trial showed efficacy of colony-stimulating factor 1 receptor inhibition with axatilimab in recurrent/refractory cGVHD (48). Of note, unexpected neuroinflammation was observed in some preclinical models of GVHD (49), underscoring the need for careful consideration of the context of monocyte/macrophage inhibition. As CD163 is preferentially expressed on activated macrophages, its inhibition might be more specific.

The clinical and biomarker variables identified as the most important were generally consistent with prior work using standard approaches (8). The ML models nevertheless showed improved prediction performance, because ML offers greater flexibility: it can model nonlinear relationships between biomarkers and outcome as well as potential interactions between biomarkers and/or patient characteristics. The prediction models for cGVHD with both biomarkers and clinical variables showed an average 6-point increase in AUC compared with the model with clinical variables only. The models for NRM with both biomarkers and clinical variables showed an average 12-point increase in AUC compared with the clinical factors–only prediction model. The BART-derived models consistently provided the highest AUCt for prediction of future cGVHD and NRM. Our ML models containing biomarkers achieved a higher AUROC (0.76–0.91) than the ML models trained on 33,927 patients using only clinical characteristics in the EBMT-TCWP study (0.62) (50), suggesting that the addition of proteomic biomarkers improves predictive accuracy. Our study presents the distribution of predictive scores and statistical performances for ML models in cGVHD. The risk scores from the BART-derived ML models may help stratify patients into low and high risk for cGVHD/NRM and identify those who may benefit from additional monitoring and future preemptive intervention.

In our study, we also explored the use of deep learning for GVHD evaluation. Unfortunately, deep learning models performed, at best, similarly to the other ML methods considered. This may be because their complex neural network structure cannot effectively learn the relationships between biomarkers and cGVHD risk without a much larger sample size, in the tens of thousands, as has been used for diagnosing lung cancer with imaging (51).

Generalizability of statistical algorithms for GVHD care has been insufficient (52). Most ML algorithms developed for HCT have been evaluated only on the data they were trained on; studying their performance metrics on a validation set permits a proper assessment of how well the algorithms generalize to unseen data (53). Applications have been made available for HLA matching and immune suppression discontinuation (54, 55), but a tool using ML in HCT has yet to be published. We offer the BIOPREVENT cGVHD application, which calculates individualized scores based on clinical and biomarker data and was trained on the largest-scale multicenter dataset with cGVHD biomarkers to date. Indeed, we have trained and validated the biomarker-based predictive tool in a diverse population across multiple trials, transplant centers, and data sources, with a wide range of GVHD prophylaxis regimens, conditioning intensities, and patient-specific risk factors. This supports its real-world utility for patient-specific risk prediction in a heterogeneous setting. The real-world utility of the predictive tool comes from the potential to tailor treatment strategies to patients based on their cGVHD risk. To test the value of intervention based on our biomarker-based predictive tool, a randomized study will need to be developed in which patients are randomized to an intervention according to their predicted risk. For high-risk patients, one could consider randomizing to the addition of preemptive immunosuppressive therapy, such as ruxolitinib, versus placebo. For low-risk patients, rather than adding immunosuppression, which can increase the risk of relapse or infection, the biomarker-based predictive tool can be useful in guiding immunosuppression discontinuation decisions. Here, clinicians may be more interested in randomizing between a rapid immunosuppression taper and a standard immunosuppression taper.
Note that studying preemptive therapy in a high-risk population can lead to important reductions in required sample sizes, since the required sample size for detecting a difference in incidences of events is inversely proportional to the event rate in the study population. Tailoring and studying treatment strategies based on biomarker-based predictive tools are extremely useful real-world applications. To illustrate this, we performed 2 power calculations for a hypothetical trial using either an enriched high-risk group based on the BIOPREVENT algorithm or an all-comers group. We assumed that the day 540 cumulative incidence of chronic GVHD in the control group, starting at a day 90 landmark time, is 49.5% for the full HCT population and 63.3% for the high-risk population, based on the observed rates in our current cohort. We also assumed follow-up of 450 days for all patients in the trial, corresponding to follow-up between the day 90 landmark and day 540 (approximately 1.5 years) after HCT. The sample size for 80% power to detect a subdistribution hazard ratio of 0.60 between treatment and control, using Gray’s test, would be 290 for the all-comers population or 222 for the high-risk group.
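As a rough illustration of why enrichment reduces the required sample size, the calculation above can be approximated with a Schoenfeld-type events formula. This is a minimal sketch and not necessarily the method used for the reported calculation. It assumes a two-sided type I error of 0.05, 1:1 allocation, complete 450-day follow-up with no dropout, and a proportional subdistribution hazard, so that the treated-arm cumulative incidence is 1 − (1 − control incidence)^HR. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Approximate number of events needed (Schoenfeld-type formula).

    Assumes a two-sided test and 1:1 allocation (both are assumptions,
    not stated in the text).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2


def treated_incidence(control_incidence, hazard_ratio):
    """Cumulative incidence in the treated arm at end of follow-up,
    assuming a proportional subdistribution hazard."""
    return 1 - (1 - control_incidence) ** hazard_ratio


def total_sample_size(control_incidence, hazard_ratio, alpha=0.05, power=0.80):
    """Total patients (both arms) so that the expected number of events
    reaches the required number, assuming complete follow-up."""
    events = required_events(hazard_ratio, alpha, power)
    mean_incidence = (
        control_incidence + treated_incidence(control_incidence, hazard_ratio)
    ) / 2
    return math.ceil(events / mean_incidence)


if __name__ == "__main__":
    # Control-arm incidences taken from the text; HR 0.60, 80% power.
    print(total_sample_size(0.495, 0.60))  # all-comers: 290
    print(total_sample_size(0.633, 0.60))  # enriched high-risk group: 222
```

Under these assumptions, the required number of events (about 120) is the same in both scenarios. Only the expected event probability changes, which is why the enriched population needs fewer patients.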

This study has limitations. While ML-based approaches can offer unbiased feature selection, 2 major challenges are understanding how they function and determining how influential variables impact the outcome of interest. That said, the BIOPREVENT cGVHD application will provide risk scores for cGVHD and NRM for each patient based on their clinical characteristics and day 90/100 biomarkers. Another limitation of the study is that only 1 sample per patient was tested, owing to the lack of additional longitudinal samples collected after day 100 in these cohorts. Increasing the number of biomarker timepoints could increase AUROC, as was shown for aGVHD prediction with a dynamic probabilistic algorithm, daGOAT (56). We chose to include only 7 previously validated risk biomarkers of cGVHD. Adding more markers, including cellular subsets such as regulatory T cells, T follicular helper cells, T follicular regulatory cells, and Th17-prone CD146+ T cells, all of which have been positively or negatively correlated with cGVHD (57–59), could increase the accuracy of the model. However, paired plasma and PBMCs are not readily available in most biobanks, and the low throughput of cellular measurements precludes timely generation of the large number of cellular data points necessary for ML models. Furthermore, adding many predictors without increasing the sample size can adversely impact ML models by introducing collinearity or noise (60). Prior acute GVHD was included in the ML models and was a variable picked by the BART model. However, some other potentially important variables (late acute GVHD, immunosuppression status including modulation for increased risk of relapse at the time of sampling) could not be included, because only retrospective, deidentified, defined datasets were available to develop the prediction models and it was therefore not possible to go back to the centers to verify these variables. The models for NRM prediction treat relapse as a competing risk.
Finally, PTCY has shown lower cGVHD incidence than CNI-based prophylaxis (61, 62). Only a small number of patients in our study received PTCY (n = 27), although PTCY was included among the GVHD prophylaxis categories compared and was picked as a variable in the Group Lasso, where hazard ratios are calculated for each variable (see Supplemental Table 5). Thus, the interpretation of the current algorithm in the setting of PTCY GVHD prophylaxis is still uncertain, and the model will need to be validated on a larger number of samples from patients receiving PTCY. We speculate that the BIOPREVENT model will work with PTCY prophylaxis since, once present, the biology of cGVHD should be relatively similar in patients receiving PTCY GVHD prophylaxis and patients receiving CNI-based GVHD prophylaxis. For example, tree-based machine learning using clinical, cellular, and proteomic variables in samples at day 28 after HCT from patients receiving PTCY showed that CXCL9 predicted acute GVHD (63), and we suspect that several of the 7 proteins measured at day 90 after HCT will also be selected in our machine learning models.

In summary, early identification of patients who will develop cGVHD may permit more stringent monitoring and preemptive interventions. These data support future research to further validate these ML algorithms and implement them in the clinic.