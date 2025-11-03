Longitudinal multiomics profiling of LC. The IMPACC study included 1,164 participants admitted to 20 US hospitals for SARS-CoV-2 infection between May 2020 and March 2021 (37). Clinical data collection and immunophenotyping were performed longitudinally during the acute disease phase within 72 hours of hospital admission and 4, 7, 14, 21, and 28 days after hospital admission (visits 1–6, respectively). Surviving participants were contacted 3, 6, 9, and 12 months after hospital discharge (visits 7–10, respectively) to complete patient-reported outcome (PRO) and symptom surveys during the convalescent phase and to provide biosamples for immunophenotyping assays. Of the 702 participants who could be reached by the study team after discharge, 513 were included in the IMPACC convalescent cohort (Supplemental Figure 1; supplemental material available online with this article; https://doi.org/10.1172/JCI193698DS1). These participants were selected because they survived at least 28 days of hospitalization, completed at least 1 PRO survey, and provided at least 1 biosample during the convalescent period (Figure 1A and Supplemental Table 1) (18). IMPACC core laboratories performed immunophenotyping in both the acute and convalescent phases, including measurements of inflammatory mediators in blood serum via Olink (SO), global blood plasma metabolomics (PMG), global and targeted blood plasma proteomics (PPG and PPT), PBMC transcriptomics (PGX), whole blood cell frequencies measured with mass cytometry by time of flight (CyTOF), and CyTOF mean marker signal intensity measurements (BCT).

Figure 1 Multiomics data overview and generation of a predictive LC factor. (A) Number of samples used in the multiomics data integration strategy by assay (rows) and scheduled time of collection (columns). Shading indicates the frequency of samples with data availability at the indicated visit. (B) Patient classification in PRO clusters according to the PRO survey scores (18). (C) Individual assay data were preprocessed and split into train and test cohorts by participant in an 80/20 split, maintaining the proportion of PRO cluster participants in each partition. (D) Preprocessed assay data and LC response outcomes for the train cohort were used to identify multiomics predictive factors with SPEAR. Factor scores were then calculated for the test cohort. (E) The performance of the multiomics predictive factors to classify patients according to the presence or absence of LC was assessed on the train cohort via cross-validation and then validated on the test cohort. The predictive factor scores were confirmed to be associated with LC after correcting for possible confounding variables. In-depth analysis of enriched biological pathways and significant analytes relevant for the prediction was performed. Factor scores were computed for the acute infection immune profiles, and association analysis with LC at these early time points was performed. See also Supplemental Figures 1 and 2.

LC status was defined in this cohort according to the participant’s response to post-discharge surveys that captured symptoms and PRO measures evaluating general health and deficits in specific domains. Participants who responded to at least 1 set of post-discharge surveys were assigned to PRO clusters according to latent class modeling and clustering using standardized scores of the PRO survey measures (18) (PRO survey score details can be found in the Supplemental Methods). PRO clusters were classified as participant clusters with no or minimal deficits (MIN), or with deficits attributed to LC in several domains: physical predominant (PHY), mental/cognitive predominant (COG), and multi/pan domain (MLT) (18) (Figure 1B).

In this study, we utilized multiomics immunophenotyping profiles from participant biosamples obtained during the convalescent disease phase to develop interpretable models for predicting LC and exploring the underlying molecular mechanisms. To assess model performance, we split the convalescent cohort into an 80% train cohort and a 20% test cohort, maintaining the proportions of participants in each PRO cluster (Figure 1C), with no noticeable imbalance in other clinical characteristics or biosample availability between the cohorts (Supplemental Figure 2). We then used Signature-based multiPle-omics intEgration via lAtent factoRs (SPEAR) (41), a supervised Bayesian factor model for the identification of multiomics features, to integrate the high-dimensional data and construct multiomics predictive factors from immune profiles obtained during the convalescent phase in the train cohort. We assessed their predictive performance by repeated cross-validation on the train cohort and validated the performance of the selected model on the test cohort (Figure 1, D and E). To identify immune programs captured in the predictive factors, we conducted in-depth analyses of enriched biological pathways and analytes identified as highly relevant for the model’s predictive performance and performed associations with assay data not included in model training, such as blood CyTOF cell frequencies.
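The participant-level 80/20 split that preserves PRO cluster proportions can be sketched as follows. This is a minimal illustration and not the study's actual code: the function name, the toy participant IDs, and the per-cluster rounding rule are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_participant_split(cluster_by_participant, test_fraction=0.2, seed=0):
    """Split participant IDs into train and test sets, stratified by PRO cluster.

    Splitting is done per participant (not per sample), so all visits of one
    participant end up in the same partition.
    """
    rng = random.Random(seed)

    # Group participant IDs by their PRO cluster label.
    by_cluster = defaultdict(list)
    for pid, cluster in cluster_by_participant.items():
        by_cluster[cluster].append(pid)

    train, test = set(), set()
    for cluster in sorted(by_cluster):
        pids = sorted(by_cluster[cluster])
        rng.shuffle(pids)
        # Assumed rounding rule: nearest integer to the requested fraction.
        n_test = round(len(pids) * test_fraction)
        test.update(pids[:n_test])
        train.update(pids[n_test:])
    return train, test


# Hypothetical toy cohort: 50 participants spread over the 4 PRO clusters.
labels = ["MIN"] * 20 + ["PHY"] * 10 + ["COG"] * 10 + ["MLT"] * 10
toy = {f"P{i:03d}": cluster for i, cluster in enumerate(labels)}
train_ids, test_ids = stratified_participant_split(toy)
```

Splitting by participant rather than by sample keeps repeated visits from the same person out of both partitions at once, which would otherwise inflate test performance.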

Multiomics factors are predictive of LC. We focused on predicting LC in the convalescent cohort using multiomics immune profiling data collected during the convalescent phase. We constructed several SPEAR models to generate supervised factors. These models used either PRO survey scores from PRO Measurement Information System (PROMIS) surveys (SPEAR Physical, SPEAR Cognitive, SPEAR Mental, SPEAR Impact, SPEAR Dyspnea) or the binary LC labels assigned to each participant (SPEAR LC) as response variables (Supplemental Figure 3A). We trained models on these different response variables, since binary LC labels (presence or absence of LC) per participant could omit valuable information captured by numeric PRO survey scores at each visit. Note that PRO survey scores were available for participants in all LC clinical PRO clusters, so each individual SPEAR model (e.g., SPEAR Physical) was trained using data from participants in all 4 PRO clusters (MIN, PHY, COG, MLT) (18). The SPEAR Physical model performed best among all models trained on PRO survey scores (Supplemental Figure 3B) and outperformed the model trained on binary LC labels (Figure 2A). Additionally, all SPEAR models outperformed equivalent models trained on unsupervised multiomics factors obtained with the Multiomics Factor Analysis (MOFA) framework (42), which does not consider a response variable during the factor construction step (Figure 2A and Supplemental Figure 3B). The SPEAR Physical model achieved an area under the receiver operating characteristic curve (AUROC) of 0.69 for predicting LC presence or absence in the test cohort (Figure 2B). The SPEAR Physical Factor, learned by the SPEAR Physical model, was significantly associated with LC in the test cohort after correcting for sex and age (P = 0.00098, effect size 0.44), two variables previously associated with LC in our cohort (18) (Figure 2C).
Sparse lasso regression models to reconstruct SPEAR Physical factor scores utilizing all analytes included in the model or analytes from individual assays showed that the model that included all analytes best reconstructed the factor scores, indicating that factor scores captured contributions from multiple omics (Supplemental Figure 3C). The SPEAR Physical Factor scores were significantly higher for participants in the MIN group compared with scores for the LC group, so we termed this factor the “recovery factor” (Figure 2C, Supplemental Figure 4, and Supplemental Figure 5). Recovery factor scores were significantly associated with PRO clusters (P = 0.0009); however, they showed a differential ability to identify individual LC deficit domains, with significant differences between MIN and COG (P = 0.0042, effect size 0.61) and MIN and MLT (P = 0.0018, effect size 0.60) PRO clusters, but not MIN and PHY clusters (P = 0.28, effect size 0.47) (Figure 2D and Supplemental Figure 5B). Taken together, the recovery factor is a multiomics model composed of biologic analyte levels during the convalescent phase of COVID-19 that distinguished MIN from LC over 12 months after hospital discharge in the IMPACC cohort.
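For readers less familiar with the AUROC metric used to evaluate the factor, the sketch below computes it directly from its rank-based definition: the probability that a randomly chosen LC event receives a higher predicted score than a randomly chosen MIN event, with ties counted as one half. It is an illustrative stand-in, not the pipeline used in the study. The toy scores and labels are invented, and because higher recovery factor scores indicate MIN, the example assumes the factor score is negated before being used as an LC predictor.

```python
def auroc(scores, labels):
    """Rank-based AUROC: P(score_pos > score_neg) + 0.5 * P(score_pos == score_neg).

    `labels` uses 1 for the positive class (here LC) and 0 for the negative
    class (here MIN).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented toy data: recovery factor scores and LC labels (1 = LC, 0 = MIN).
factor_scores = [1.2, 0.8, 0.5, -0.3, -0.9, 0.1]
lc_labels = [0, 0, 1, 0, 1, 1]

# Assumption: lower factor scores indicate LC, so negate to obtain an LC score.
lc_scores = [-s for s in factor_scores]
toy_auroc = auroc(lc_scores, lc_labels)
```

The pairwise loop is quadratic in the number of events, which is fine for illustration; a sort-based implementation would be preferable for large cohorts.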

Figure 2 Identification of a convalescent multiomics recovery factor that discriminates LC. (A) Predictive performance of a lasso model trained on the MOFA and SPEAR factors to discriminate LC versus MIN at the event level. The mean AUROC of a 10-fold cross-validation on the train cohort for 100 bootstrapped model training repetitions is shown. Significance was calculated by standard normal approximation of bootstrapped differences between models (t test, adj. ****P ≤ 0.0001). CV, cross validation. (B) Predictive performance of the SPEAR Physical model to discriminate LC versus MIN on the test cohort. The ROC curve of the model (solid line), the ROC curve of a random classifier (dashed line), and the AUROC value are shown. TPR, true-positive rate; FPR, false-positive rate. (C) Recovery factor scores for the test cohort of the MIN and LC groups at 3 months (visit 7), 6 months (visit 8), 9 months (visit 9), and 12 months (visit 10) after hospital discharge. (D) Recovery factor scores of the individual PRO clusters by visit for the test cohort. P values in C and D show the significance of the recovery factor score association with MIN versus LC and pairwise PRO cluster combinations using a goodness-of-fit χ2 test. See also Supplemental Figures 3–5.

Functional characterization of the recovery factor. To characterize the biologic processes underlying the recovery factor, we performed gene set enrichment analysis (GSEA) for each of the multiomics assays based on the SPEAR model’s internal ranking of the relative importance of each feature for predicting the PRO PROMIS Physical score. The hallmark heme metabolism transcriptomic pathway was negatively associated with the recovery factor, indicating upregulation in participants with LC, whereas the androgenic steroids metabolite set was positively associated with the recovery factor, indicating downregulation in participants with LC (Figure 3A). Evaluated individually, several leading-edge analytes in the hallmark heme metabolism gene set and androgenic steroids subpathway metabolite set showed significant associations with LC status (Supplemental Figure 6, A and B).

Figure 3 Heme metabolism and androgenic steroid pathways, inflammation-associated serum factors, and altered immune cell composition are associated with the recovery factor during convalescence. (A) GSEA identified heme metabolism and androgenic steroid pathways as being significantly associated with the recovery factor, with significance shown per assay, as well as across assays (joint adj. P < 0.05). (B) Twenty-six significant analytes (SPEAR Bayesian posterior selection probability ≥0.95) in the recovery factor across different assays (left) were identified using SPEAR factor loadings (middle; coefficient in the factor), and each was tested for association in the test cohort with MIN versus LC groups (right; adj. intercept P value). (C) Geometric means of analytes from the significantly enriched gene and metabolite sets and/or significant SPEAR analytes are shown per sample at each convalescent visit in the test cohort. The combined geometric mean score includes leading edge analytes from the hallmark heme metabolism and androgenic steroids pathways, and the significant SPEAR analytes. The P values indicate significance of the association with MIN versus LC. (D) Association in the full cohort of whole blood cell counts determined by CyTOF with the recovery factor for parent and child immune cell types. Mono, monocytes; B, B lymphocytes; CD4, CD4+ T lymphocytes; CD8, CD8+ T lymphocytes; CD27+ non-sM, CD27+ nonswitched memory; CD27+ sM, CD27+ switched memory. For a full list of the child populations, see Supplemental Table 3. (adj. *P < 0.05, adj. **P < 0.01, adj. ***P < 0.001). See also Supplemental Figure 6.

SPEAR performs internal significance testing to determine the importance of each analyte in predicting the response variable. The SPEAR Physical model identified 26 analytes across 4 assays as significant in the recovery factor (SPEAR Bayesian posterior selection probability ≥0.95), and we assessed the associations of these features with LC status in the test cohort, adjusting for age and sex (Figure 3B and Supplemental Figure 6E). Nine of these 26 analytes were from the serum Olink assay. Of these, DNER, a noncanonical Notch ligand implicated in promoting tumor growth, metastasis, and wound healing (43, 44), was significantly reduced in LC participants, consistent with a prior study of plasma proteins in LC (28). The remaining serum Olink analytes were negatively associated with the recovery factor. These included proteins and cytokines associated with chronic inflammatory conditions (45–50), particularly endothelial/vascular inflammation (FGF23, FGF21, CXCL9, TNFRSF11B, and TNFRSF9 [CD137]), as well as inflammation-associated myeloid regulators (51–53) (MMP10 and CSF1). Elevated IL10RB levels have been associated with worse outcomes in acute SARS-CoV-2 infection (54), consistent with elevation under inflammatory conditions. Leucine-rich alpha-2 glycoprotein 1 (LRG1), a protein elevated in participants with LC, is induced by IL-6 and other inflammatory cytokines and has been implicated in angiopathic activity (55–57). Phenylacetylglutamate and phenylacetylglutamine are gut microbiota–derived metabolites associated with vascular inflammation and thrombosis (58). Finally, the OSBP2 transcript, which encodes an oxysterol binding protein (59), was also identified as a leading edge gene in the Hallmark Heme Metabolism gene set elevated in LC participants.

Several metabolites from the androgenic steroids pathway were represented in the 26 significant analytes and were positively associated with the recovery factor, indicating that higher levels correlate with better physical function. When these metabolites were tested individually for association with LC status, five [DHEA-S, epiandrosterone sulfate, androsterone sulfate, 5α-androstan-3β,17β-diol monosulfate (2), 5α-androstan-3β,17α-diol disulfate] were significantly lower in LC participants, adjusting for age and sex (Figure 3B). Androgens can suppress inflammation (60), suggesting that higher levels of androgenic steroids in MIN participants could reflect better control of chronic inflammation. These findings are consistent with reports showing lower sex hormone levels in individuals with LC (31). Five metabolites related to pregnenolone were also represented in the significant SPEAR analytes (Figure 3B). Pregnenolone is synthesized from cholesterol as the first step of the steroid hormone biosynthesis pathway and is known to have potent effects as an inhibitor of inflammation (61) and as a neurosteroid (62). Altogether, these findings are consistent with a prominent role for persistent inflammation in LC with dysregulation of key analytes that may contribute to LC symptoms, including those that drive angiopathy, reduce wound healing, and alter heme metabolism.

The feature sets from heme metabolism and androgenic steroids identified by GSEA, combined with the significant SPEAR analytes, represent 73 unique features that potentially condense the predictive power of the recovery factor into a smaller feature set. To test this hypothesis, we calculated the geometric mean of the 43 leading-edge heme metabolism and 12 androgenic steroid features, as well as the 26 significant SPEAR analytes. All 3 geometric mean scores were significantly associated with LC in the test cohort (Figure 3C). Furthermore, the combined score, including analytes from all 3 feature sets, discriminates MIN and LC participants with even greater significance (Figure 3C). Thus, while the recovery factor consists of weighted contributions from 6,807 features, we have identified a smaller set of 73 unique features that discriminates participants according to LC status in the convalescent period.
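The geometric mean scoring described above can be illustrated with a short sketch. It assumes that each sample is a mapping from feature name to a strictly positive abundance value. The feature names are hypothetical placeholders, and the sketch does not address how features with opposite directions of association are combined, which the text does not specify.

```python
import math


def geometric_mean_score(sample, feature_set):
    """Geometric mean of the values of `feature_set` measured in one sample.

    Features missing from the sample are skipped. All used values must be
    strictly positive, since the score is computed in log space.
    """
    values = [sample[f] for f in feature_set if f in sample]
    if not values:
        raise ValueError("none of the requested features were measured in this sample")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires strictly positive values")
    # Averaging logs avoids overflow when multiplying many values together.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical sample and feature sets (names are placeholders, not study analytes).
sample = {"gene_a": 2.0, "gene_b": 8.0, "metabolite_c": 4.0}
heme_like_set = ["gene_a", "gene_b"]
combined_set = ["gene_a", "gene_b", "metabolite_c"]

heme_like_score = geometric_mean_score(sample, heme_like_set)
combined_score = geometric_mean_score(sample, combined_set)
```

A per-sample score of this kind can then be tested for association with LC status in the same way as the full factor score.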

Consistent with our finding, studies involving 2 separate cohorts have reported upregulation of the hallmark heme metabolism pathway in individuals with LC. In Hanson et al. (29), hospitalized and nonhospitalized participants with persisting symptoms 1–3 months after acute SARS-CoV-2 infection had higher hallmark heme metabolism signatures than did participants without persisting symptoms. In the study by Karisola et al. (63), which included only nonhospitalized patients with COVID-19, men with persisting symptoms 3 months after acute SARS-CoV-2 infection had higher hallmark heme metabolism signatures than did men without persisting symptoms. To determine whether the same heme metabolism–related genes were dysregulated in participants with LC in the IMPACC and external cohorts, we used the leading-edge genes from the significant hallmark heme metabolism pathway in our GSEA results (Supplemental Figure 6A) and calculated the geometric mean scores in whole blood transcriptomics profiles from Hanson et al. (29) and PBMC transcriptomics profiles from Karisola et al. (63). These heme metabolism leading-edge gene scores significantly differentiated participants with persistent symptoms from those with resolved symptoms in both cohorts, including both sexes (Supplemental Figure 6, C and D). The generalization of elevated expression of this heme metabolism gene set in nonhospitalized and hospitalized patients with COVID-19 who experienced LC in 3 independent and varied cohorts underscores its centrality for LC pathology.

Prior studies have identified altered leukocyte frequencies as a feature of LC (18, 26, 29, 31, 33). To determine whether similar cellular changes were associated with the recovery factor, we analyzed whole blood CyTOF cell frequencies for 15 parent and 46 child immune cell types in our cohort during convalescence (Figure 3D). Several cell subsets were significantly associated with the recovery factor. B cells and CD161+ mucosal-associated invariant T (MAIT) cells were positively associated with the recovery factor. In contrast, polymorphonuclear neutrophils (PMNs) and monocytes, specifically the CD14+CD16– classical monocyte subset, were negatively associated with the recovery factor. Together, these findings suggest that elevated monocytes and neutrophils, along with decreased B cells, are associated with prolonged inflammation during LC. These findings are consistent with a previous report that monocytes are elevated in men with LC (31). The decrease in MAIT cells could be another effect of sustained inflammation, as reduced circulation of MAIT cells has been associated with chronic HIV (64) and hepatitis C (65) infections.

The recovery factor is associated with clinical characteristics and multiple PROs in the convalescent period. We next evaluated whether the recovery factor was associated with clinical features and additional clinical outcomes. We tested the association of recovery factor scores with clinical features at hospital admission (visit 1), including demographics, comorbidities, complications, and baseline laboratory measurements (Figure 4A). Several demographic and clinical measurements were significantly associated with recovery factor scores, including age, sex, length of hospital stay, and the Sequential Organ Failure Assessment (SOFA) score. Notably, anemia at hospital discharge was negatively associated with the recovery factor, whereas hemoglobin (Hgb) and hematocrit baseline measurements showed a significant positive association (Figure 4A). We additionally conducted association testing with PRO scores from surveys conducted during the same visit at which the recovery score was assessed in participants across the convalescent period, correcting for age and sex. In the test cohort, the recovery factor score was significantly associated with the PROMIS Physical score, on which the model was trained, and with the EQ-5D-5L score, both of which contained questions assessing physical function (Figure 4B). The recovery factor also correlated with PROMIS Mental and Psychosocial Impact scores, although these associations were not significant after P-value correction (Figure 4B). We also tested whether recovery factor scores were associated with clinical symptoms reported in the 7 days prior to each visit but found no significant association (Figure 4C).

Figure 4 Associations of clinical measurements with recovery factor scores. (A) Association of recovery factor scores with clinical features (demographics, comorbidities, complications, and baseline laboratory measurements). Dot plot shows the signed adjusted P values indicating the clinical feature term significance from a linear mixed-effects model, with enrollment site and participant as random effects to explain the convalescent phase recovery factor scores. Sex and discretized age were further adjusted as fixed effects for clinical features other than sex and age. Only significant associations (adj. P < 0.05) are shown. (B) Associations of recovery factor scores with individual PRO survey scores (PROMIS scale scores, EQ-5D-5L and health score) in the test cohort. Raw and adjusted P values indicate the PRO score term significance in linear mixed-effects models. (C) Associations of recovery factor scores with each indicated symptom group in the test cohort: neurological, cardiopulmonary, upper respiratory, systemic, gastrointestinal. Numbers are the uncorrected significance (P values) of the symptom group term in linear mixed-effects models. (D) Recovery factor scores per participant in the test cohort, separated into MIN and LC groups by acute phase trajectory groups, stratified by visit. P values for B–D show the endpoint term of a linear mixed-effects model with sex, discretized admission age, and trajectory group as fixed effects and enrollment site as a random effect. No individual MIN versus LC comparisons were significant after P value correction.

There is a lack of consensus about whether LC is associated with the severity of acute disease. A previous analysis of clinical features from the IMPACC cohort showed no association between acute infection severity, as assessed by clinical trajectory groups, and LC development (18). However, other studies have found an association (6). Thus, we sought to determine whether acute disease severity contributed to the association between recovery factor scores and LC status in our cohort. Clinical severity in the IMPACC cohort during the acute phase was defined by unsupervised clustering of the respiratory ordinal score over time, taking discharge status and limitations into account, with trajectory group 1 (TG1) representing the mildest and TG4 the most severe disease course among participants who survived for at least 28 days after hospitalization (66). After correcting for acute phase trajectory group assignment, recovery factor scores remained significantly associated with LC at the first 3 convalescent time points (Figure 4D), indicating that acute clinical severity does not contribute to the association between participant recovery factor scores in the convalescent disease phase and LC status.

Sex affects recovery factor scores. LC occurs more frequently in women than men, despite a higher percentage of men with severe acute COVID-19 disease (21, 67). In the IMPACC convalescent cohort, nearly half of the female participants presented with long-term deficits compared with only approximately 30% of male participants (Supplemental Figure 7A). Assignment to clinical LC subtypes was not influenced by sex, with similar proportions and numbers of male and female LC participants assigned to the COG, PHY, and MLT PRO clusters (Supplemental Figure 7A). However, consistent with the known influence of sex on LC status, sex was a statistically significant covariate in the recovery factor association with LC status from Figure 2C (P = 3.6 × 10⁻⁷) and with PRO clusters from Figure 2D (adjusted [adj.] P < 0.001 in all pairwise comparisons). Thus, we tested whether the recovery factor discriminates LC in both men and women by repeating our associations with LC status in the test cohort separated by sex. Recovery factor scores were significantly associated with the binary assignment to LC versus MIN groups in women but not in men after P value adjustment (Supplemental Figure 7B), although the trend of lower scores in participants with LC persisted in men. When considering individual PRO groups, recovery factor scores discriminated between MIN versus COG and MIN versus MLT groups for women and MIN versus MLT PRO groups for men (Supplemental Figure 7C). Given that the LC incidence is lower in men, it is notable that recovery factor scores were generally higher for men than for women, regardless of LC status. Geometric mean scores of the leading-edge analytes in the recovery factor from the heme metabolism and androgenic steroid pathways and the significant SPEAR analytes lost significance in 1 or both sexes when the cohort was divided into men and women (Supplemental Figure 7D). Notably, though, their combined score remained significantly associated with LC in both sexes (Supplemental Figure 7D).

Vaccination is not associated with altered recovery factor scores. Our cohort was enrolled prior to the national SARS-CoV-2 vaccine rollout for the general population. During the longitudinal post-hospitalization follow-up period, as vaccines became broadly available, close to 75% of the participants in the IMPACC convalescent cohort received a SARS-CoV-2 vaccine (Supplemental Figure 8, A and B). To assess the potential influence of the vaccine response on the immune profiling data and thus the recovery factor, we compared recovery factor scores per visit for events occurring before and after the first vaccine dose, as well as events occurring within a 3-week period after any vaccine dose, when vaccine responses have been shown to affect immune profiles (68, 69). No significant difference was found in recovery factor scores across these comparisons, indicating a negligible effect of vaccination on immune profiles related to LC in our patient cohort (Supplemental Figure 8, C and D).
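The grouping of events relative to vaccination can be sketched as below. This is an assumption-laden illustration rather than the study's procedure: the function name is hypothetical, the 3-week window is taken as 21 days, and a visit on the same day as a dose is counted as occurring after that dose.

```python
from datetime import date, timedelta


def classify_visit(visit_date, dose_dates, window_days=21):
    """Flag a visit relative to a participant's vaccine doses.

    Returns two booleans:
      after_first_dose: the visit is on or after the earliest dose date
      within_window:    the visit falls 0..window_days days after any dose
    """
    doses = sorted(dose_dates)
    after_first_dose = bool(doses) and visit_date >= doses[0]
    within_window = any(
        timedelta(0) <= visit_date - dose <= timedelta(days=window_days)
        for dose in doses
    )
    return {"after_first_dose": after_first_dose, "within_window": within_window}


# Invented example: two doses, three visits.
doses = [date(2021, 3, 1), date(2021, 3, 29)]
before = classify_visit(date(2021, 2, 1), doses)
soon_after = classify_visit(date(2021, 4, 10), doses)
long_after = classify_visit(date(2021, 9, 1), doses)
```

Recovery factor scores could then be compared between the resulting groups at each visit.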

Recovery factor scores during the acute disease phase associate with LC status during convalescence. We next investigated whether the immune elements identified in the recovery factor were predictive of LC status when measured at the acute infection phase, prior to LC development. We computed recovery factor scores using immune profiling data from all participants in the convalescent cohort during their acute phase visits (visits 1–6, spanning the hospital admission through days 26–35 after admission). Remarkably, the recovery factor scores were significantly higher for MIN participants than for LC participants as early as at hospital admission (visit 1) and consistently higher during the acute period (Figure 5A and Supplemental Figure 9A). Recovery factor scores were also significantly higher for MIN groups than for COG groups and for MIN versus PHY groups in the acute phase when assessed across the 28-day time course (Figure 5B and Supplemental Figure 9B). Geometric mean scores of heme metabolism and androgenic steroid pathway analytes from the recovery factor, as well as the 26 significant SPEAR recovery factor analytes, were also significantly associated with LC status during the acute phase. The combined geometric mean score of analytes from these 3 feature groups in the acute phase data associated most significantly with MIN versus LC status (Figure 5C), as it did previously in the convalescent phase (Figure 3C).

Figure 5 Recovery factor scores in acute phase data associate with eventual LC status. (A) Recovery factor scores during the acute disease phase for participants in the LC and MIN groups within 72 hours of hospital admission (visit 1) and at day 4 (visit 2), day 7 (visit 3), day 14 (visit 4), day 21 (visit 5), and day 28 (visit 6) after admission. (B) Recovery factor scores during the acute disease phase for participants in individual PRO clusters. (C) Geometric mean scores of analytes in enriched gene and metabolic sets and/or significant SPEAR analytes during the acute phase. No individual per-visit comparisons were significant after P value correction. P values in the top-right box in A–C show the significance of the recovery factor score or the geometric mean signature association with MIN versus LC or pairwise PRO cluster combinations. Bars above the box plots show the pairwise significance across groups in a per-visit comparison (*P < 0.05 and **P < 0.01). (D) Recovery factor score association with whole blood CyTOF immune cell populations during the acute phase (adj. *P < 0.05, adj. **P < 0.01, adj. ***P < 0.001). See also Supplemental Figure 9.

We further assessed whether altered circulating immune cell composition in the acute phase could contribute to acute phase recovery factor scores. Association testing of recovery factor scores with whole blood CyTOF measurements during the acute phase showed that CD4+ and CD8+ T cells, conventional DCs (cDCs), plasmacytoid DCs (pDCs), eosinophils, basophils, and CD56hi CD16lo NK cells were significantly positively associated with the recovery factor scores (Figure 5D). Within the CD4+ T cell compartment, naive, central memory T (Tcm), and effector memory T (Tem) subsets, as well as non-naive Tregs, were significantly associated with recovery factor scores, while activated CD4+ T cells were inversely correlated. Within the CD8+ T cell compartment, naive, Tcm, and Tem subsets were positively associated with recovery factor scores, as were NKT cells and CD161+ MAIT cells. In contrast, monocytes, neutrophils, B cells, and plasmablasts in the acute phase were negatively associated with recovery factor scores (Figure 5D). These findings are consistent with a previous study that found higher plasmablast counts and lower total CD4+ T, total CD8+ T, CD4+ Tem, CD8+ Tem, Treg, NK, and DC counts in immune cell populations sampled on post-infection days 0–14 in patients with COVID-19 who experienced persisting symptoms at days 91–180 after infection (29). The similarities across both studies indicate an acute blood immune cell type signature of LC that is robust to variance in patient cohorts and LC definition.

In summary, our findings indicate that the major biologic signatures of the recovery factor that stratify LC from recovered participants in the convalescent phase — elevated heme metabolism gene signatures, reduced androgenic steroids, increased circulating inflammatory mediators, and increased monocytes and neutrophils — are evident early in the acute phase.

Acute phase recovery factor scores distinguish acute disease severities and predict LC risk irrespective of acute severity. We investigated the full IMPACC study cohort (n = 1,148 participants with at least 1 sample measurement at visit 1 for the omics modalities included in our model) to assess whether recovery factor scores determined from acute phase data would associate with patient severity trajectory group assignments. For this analysis, we included participants who did not survive beyond 28 days after hospital admission and participants without biospecimens and/or surveys during the convalescent phase. We found that the recovery factor scores were significantly associated longitudinally with acute disease trajectory groups: they were highest in participants with milder disease courses (TG1–TG3) and lowest in participants with the most severe acute disease trajectories (TG4 and TG5) (Figure 6A). Acute phase recovery factor scores increased over time for participants in all trajectory groups except TG5, the group with the most severe disease, in which participants died by day 28 after hospital admission (Figure 6A). To assess whether the association between acute recovery factor scores and convalescent LC status was simply due to acute recovery factor scores being an indicator of acute disease severity, we repeated the association test, including the trajectory group as a covariate at each visit (Figure 6B) and longitudinally (Supplemental Figure 9C). LC status was still significantly associated with acute recovery factor scores, even after taking the trajectory group into account. These findings suggest that recovery factor scores in the acute phase contain valuable information for predicting convalescent LC status beyond their correlation with acute disease severity.

Figure 6 Recovery factor scores associate with acute disease phase trajectory groups, but identify LC irrespective of acute severity. (A) Longitudinal analysis of acute recovery factor scores for the full IMPACC cohort stratified by trajectory group (n = 1,148 participants). The P value shows the significance of the trajectory group term in a longitudinal model, correcting for age and sex as fixed effects and enrollment site and participant ID as random effects. (B) Recovery factor scores in the acute phase by convalescent MIN/LC label, stratified by acute trajectory group and visit number. P values show significance in distinguishing MIN versus LC labels in linear mixed models with sex, discretized admission age, and trajectory group as fixed effects and enrollment site and participant ID as random effects, performed separately for each acute visit and corrected across all visits.

Machine learning models based on the recovery factor scores predict LC status during the convalescent phase. We next assessed whether a combination of recovery factor scores and clinical characteristics (Supplemental Figure 10) could improve predictive performance. Machine learning models trained on acute phase recovery factor scores performed better than those trained on clinical features, and a model trained on the combination of both performed best (Supplemental Figure 11A). Similarly, machine learning models trained on both convalescent phase recovery factor scores and clinical features performed better than models trained on either alone (Supplemental Figure 11B). A model trained on the 26 SPEAR significant analytes was also predictive for LC, although at lower performance than models trained on the full recovery factor (Supplemental Figure 11B). This sparse model could be advantageous in a clinical setting as a diagnostic tool to identify individuals with LC. Notably, recovery factor scores were predictive as early as visit 1 (Supplemental Figure 11C), indicating that the recovery factor captures early predictive features of LC during the acute phase, although the signal at this early time point is not as strongly predictive as it is later in the convalescent phase.