BACKGROUND
Chronic graft-versus-host disease (cGVHD) is a major contributor to nonrelapse mortality (NRM) following hematopoietic cell transplantation (HCT). It has not been established whether machine-learning (ML) models that incorporate biomarkers improve the accuracy of predicting future cGVHD and NRM.

METHODS
We developed BIOPREVENT (BIOmarkers PREVENTion), an ML algorithm using data from 1,310 HCT recipients. It incorporates 7 plasma proteins measured at Day 90/100 post-HCT and 9 clinical variables. Patients were divided into training and validation datasets. We evaluated several ML models, including CoxXGBoost, Group SCAD, Adaptive Group Lasso, Random Survival Forests, and Bayesian Additive Regression Trees (BART), using the time-varying area under the ROC curve (AUCt) at Days 180, 270, 360, and 540. Deep learning models were also evaluated.

RESULTS
ML models with biomarkers outperformed clinical-only models for predicting cGVHD, with BART and CoxXGBoost achieving AUCt greater than 0.65 at 1 year. For NRM, models with biomarkers achieved AUCt ranging from 0.75 to 0.91. Deep learning did not outperform the other ML approaches. BART consistently demonstrated high predictive accuracy and was selected for the final BIOPREVENT model. Calibration curves showed agreement between predicted and observed values. Variable importance analysis identified MMP3 and CXCL9 as key predictors of cGVHD, and IL1RL1 and sCD163 as key predictors of NRM. The cumulative incidences of cGVHD and NRM differed significantly between risk groups defined by BIOPREVENT cutpoints.

CONCLUSION
BIOPREVENT accurately predicts an individual's risk of future cGVHD and NRM using biomarkers measured at 3 months post-HCT. A publicly available R Shiny web application supports its clinical use. Further studies are needed to explore its role in guiding preemptive therapy.

TRIAL REGISTRATION
BMTCTN 0201, BMTCTN 1202, and NCT02194439.

FUNDING
R01CA264921, U10HL069294, U24HL138660, R01HD074587, and P01HL158505.
Michael J. Martens, Debjani Dutta, Yongzi Yu, Lisa E. Rein, Jerome Ritz, Brent R. Logan, Sophie Paczesny
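The abstract reports discrimination as a time-varying AUC (AUCt) at fixed horizons such as Day 360. The sketch below is a rough, hedged illustration of what an AUC at horizon t measures, written as a naive cumulative/dynamic AUC in plain Python. It is not the estimator used in the study. It relies on several assumptions:

- There is a single event type, and competing risks are ignored. This matters here, because relapse or death can preclude cGVHD.
- Subjects censored at or before t are simply dropped rather than reweighted by inverse probability of censoring weights.
- A higher score means a higher predicted risk.

The function name `naive_auc_t` and the toy data are hypothetical.

```python
from typing import Sequence


def naive_auc_t(times: Sequence[float], events: Sequence[int],
                scores: Sequence[float], t: float) -> float:
    """Naive cumulative/dynamic AUC at horizon t (illustrative only).

    Assumptions (not taken from the study):
      * a single event type; competing events are not modeled
      * subjects censored at or before t are dropped (no IPCW correction)
      * a higher score means a higher predicted risk of the event
    """
    # Cases: event observed at or before the horizon t.
    cases = [s for ti, e, s in zip(times, events, scores) if ti <= t and e == 1]
    # Controls: still under observation and event-free beyond t.
    controls = [s for ti, s in zip(times, scores) if ti > t]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at horizon t")

    # Fraction of case-control pairs ranked correctly; ties count as 1/2
    # (the Mann-Whitney form of the AUC).
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


if __name__ == "__main__":
    # Hypothetical toy data: follow-up in days, event indicator, model risk score.
    times = [120, 200, 250, 300, 400, 500, 150, 600]
    events = [1, 1, 0, 1, 0, 1, 0, 0]
    scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.8, 0.2, 0.1]
    for horizon in (180, 360):
        print(horizon, round(naive_auc_t(times, events, scores, horizon), 3))
```

At the 360-day horizon, the toy data yield 3 cases and 3 controls, with 7 of the 9 case-control pairs correctly ordered (about 0.778). A real analysis would more plausibly use an IPCW-based estimator, such as those in the R `timeROC` package or scikit-survival's `cumulative_dynamic_auc`, together with a competing-risks formulation for cGVHD.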