Clinical cohort. Diarrheal specimens meeting inclusion and exclusion criteria were cultured for C. difficile. All C. difficile isolates recovered in culture were characterized for the presence of the toxin genes tcdA, tcdB, cdtA, and cdtB by multiplex PCR, and underwent PCR ribotyping as previously described (11–14). Of the 8931 available stool specimens (Supplemental Figure 1; supplemental material available online with this article; https://doi.org/10.1172/JCI126905DS1), 2829 were eligible for chart review, through which an additional 2206 were excluded, yielding 622 stool specimens meeting inclusion and exclusion criteria. From these specimens, we assembled a 186-person cohort split into 3 groups of 62 patients matched by age and hospital location. These groups were defined by laboratory results: toxigenic culture–positive and toxin enzyme immunoassay–positive (Cx+/EIA+; EIA status was determined with the Wampole/TechLab Tox A/B II assay during routine clinical testing), toxigenic culture–positive and toxin enzyme immunoassay–negative (Cx+/EIA–), and toxigenic culture–negative and toxin enzyme immunoassay–negative (Cx–/EIA–) controls. Cohort demographics and clinical characteristics are shown in Table 1.

Table 1 Demographics of patient cohorts, including a summary of C. difficile ribotypes

Fecal metabolome characteristics. To characterize fecal metabolomic variations in the study cohort, we detected and quantified trimethylsilyl-derivatized fecal extracts using GC-MS. GC-MS is sensitive to low-molecular-weight analytes and does not detect proteins, peptides, complex lipids, or other macromolecules. We detected ions produced by electron ionization (EI), which often provides sufficient structural information to chemically identify metabolites of interest. Fecal metabolites may originate from human cells, microbiome, and/or diet. To compare metabolomes between specimens in the study population, GC-MS profiles were aligned so that each analyte (hereafter called a feature) is defined by its characteristic EI mass spectrum and GC retention time. Within the 186 patient specimens, we detected 2540 distinct features, 77 of which were removed as contaminants because they were present at comparable levels in multiple blank controls, leaving 2463 features for metabolomic analyses. These features were sparsely distributed with a heavy tail (Figure 1B), with only 593 features appearing in at least 8 (5%) specimens. The number of molecular features per sample was approximately normally distributed (Figure 1A; mean 164 features, standard deviation 54 features). Principal component analysis (PCA) of log-transformed feature intensities revealed no dominant modes of variation, with the first principal component explaining less than 10% of the overall variance in the data (Figure 1, C and D). Fecal metabolomes defined by GC-MS thus exhibit a high degree of individual variation, with only a small minority of metabolites common to all subjects.
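
The contaminant removal and prevalence counts described above can be illustrated with a minimal sketch. It assumes a hypothetical representation in which each specimen is a dictionary mapping feature identifiers to intensities; the function names and toy data are illustrative only and do not reproduce the study's processing pipeline.

```python
from collections import Counter

def feature_prevalence(specimens):
    """Count, for each feature, the number of specimens in which it appears."""
    counts = Counter()
    for features in specimens:
        counts.update(set(features))  # each specimen contributes at most once
    return counts

def prevalent_features(specimens, min_specimens):
    """Return the features present in at least `min_specimens` specimens."""
    counts = feature_prevalence(specimens)
    return {f for f, n in counts.items() if n >= min_specimens}

def richness(specimens):
    """Number of distinct features detected in each specimen."""
    return [len(set(features)) for features in specimens]

# Toy data (hypothetical): three specimens, one blank-derived contaminant.
specimens = [
    {"f1": 10.0, "f2": 3.0},
    {"f1": 8.0, "f3": 1.0},
    {"f1": 5.0, "f2": 2.0, "f4": 7.0},
]
contaminants = {"f4"}  # features also seen at comparable levels in blanks

# Remove contaminants before any downstream analysis.
cleaned = [
    {f: v for f, v in s.items() if f not in contaminants} for s in specimens
]
common = prevalent_features(cleaned, min_specimens=2)
per_sample = richness(cleaned)
```

Applied to the real feature table, the same two summaries would correspond to the per-feature prevalence histogram and the per-sample richness histogram in Figure 1.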

Figure 1 Metabolomic characteristics of the patient cohort. (A) Histogram showing the distribution of feature richness (number of features present per sample) across all patient specimens. (B) Histogram showing the number of samples within which each unique feature is present. Fecal metabolomes were highly individualistic: among the more than 2000 features detected, most were infrequent. While the resulting data are very sparse overall, the distribution has a relatively heavy tail with a few features present in many samples. (C) Principal component analysis (PCA) score plot across the first 2 components created using log-transformed feature intensities across all metabolomic features. (D) PCA does not appear to reveal dominant modes of variation, with no single component explaining more than 9% of the variance and a long tail of modes each explaining approximately 1% each.

Metabolomic differences between C. difficile–infected and uninfected controls. To identify CDI-associated fecal metabolites, we conducted a supervised multivariate comparison of Cx+/EIA+ and Cx–/EIA– specimens. We used Cx+/EIA+ specimens to represent CDI because they harbor viable, toxigenic C. difficile alongside evidence of concurrent toxin production. Given the chemical complexity of fecal metabolomes (the >2000 resolved features greatly exceed the 124 samples), we employed multiple complementary measures to avoid overfitting the data, including repeated cross-validation (see Methods). Sparse partial least squares-discriminatory analysis (sPLS-DA) (Figure 2A) demonstrates good separation between metabolite profiles from the Cx+/EIA+ and Cx–/EIA– groups, despite this model’s use of an explicit penalty to prevent overfitting. To further assess this relationship, we conducted a separate logistic regression analysis on the Cx+/EIA+ and Cx–/EIA– groups with a similar penalization parameter to avoid overfitting. Using repeated 5-fold cross-validation with random subsets to select an appropriate penalization level, we found that relatively few molecular features yielded a large jump in average accuracy of the regression model (Figure 2B). We fixed the penalty parameter to the value yielding the maximum percent predicted, indicated by the star in Figure 2B, and again performed penalized logistic regression fit to the Cx+/EIA+ and Cx–/EIA– groups with repeated randomized 5-fold cross-validation. The observed distributions of log-odds for the test folds (that is, excluding the training sets) for Cx+/EIA+ and Cx–/EIA– again demonstrate good separation (Figure 2C). For comparison, Figure 2C also includes the distributions of the log-odds values for the Cx+/EIA– cases. The 9 metabolite features most consistently associated with Cx+/EIA+ specimens (Table 2 and Supplemental Table 1) include both positive and negative associations. 
The features consist of 2 SCFAs, 1 amino acid, 1 bile acid, 1 lipid, 3 carbohydrates, and 1 aromatic alcohol. These results implicate biochemically diverse metabolites in human CDI pathogenesis. We then fit a logistic model using only the 6 features that were most frequently selected across the cross-validation runs. This model achieves a ROC AUC (area under the receiver-operator characteristic curve) of 96.7%, with a 95% confidence interval of 85.6%–100% obtained under repeated randomized 5-fold cross-validation (Figure 2D). These results are consistent with a strong, characteristic signal that distinguishes Cx+/EIA+ specimens from Cx–/EIA– controls.
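
The repeated, randomized 5-fold cross-validation used above, and the idea of ranking features by how often they are selected, can be sketched as follows. This is a minimal illustration under stated assumptions: the fold assignment scheme, the function names, and the `select_features` callback (a stand-in for fitting a penalized model on the training indices and returning its nonzero features) are all hypothetical and are not taken from the Methods.

```python
import random
from collections import Counter

def stratified_folds(labels, k, rng):
    """Assign sample indices to k folds, keeping classes balanced per fold."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for members in by_class.values():
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds

def repeated_cv_splits(labels, k=5, repeats=10, seed=0):
    """Yield (train, test) index lists for repeated randomized k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        for test in stratified_folds(labels, k, rng):
            held_out = set(test)
            train = [i for i in range(len(labels)) if i not in held_out]
            yield train, sorted(test)

def selection_frequency(labels, select_features, k=5, repeats=10, seed=0):
    """Fraction of CV fits in which each feature was selected.

    `select_features(train_indices)` is a hypothetical callback that fits a
    sparse model on the training samples and returns the selected features.
    """
    counts = Counter()
    n_fits = 0
    for train, _test in repeated_cv_splits(labels, k, repeats, seed):
        counts.update(set(select_features(train)))
        n_fits += 1
    return {f: c / n_fits for f, c in counts.items()}

# 62 Cx+/EIA+ (label 1) and 62 Cx-/EIA- (label 0) specimens, as in the cohort.
labels = [1] * 62 + [0] * 62
splits = list(repeated_cv_splits(labels, k=5, repeats=2, seed=1))
```

Only the held-out test indices of each split would be used to score the model, which is what keeps the log-odds distributions in Figure 2C free of training-set optimism.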

Figure 2 Supervised metabolomic analyses comparing Cx+/EIA+ with Cx–/EIA– samples. (A) Observed separation of Cx+/EIA+ and Cx–/EIA– samples under sparse partial least squares–discriminatory analysis (sPLS-DA). The data ellipses are drawn around each group of samples (at the 95% level). (B) Penalized logistic regression under repeated 5-fold cross-validation shows how the number of features used relates to the obtained accuracy, yielding high accuracy with a relatively small number of features. The maximum percent predicted is indicated by a star. (C) Using the penalty parameter associated with the maximum percent predicted, penalized logistic regression demonstrates good separation in the distribution of log-odds to be classified Cx+/EIA+ versus Cx–/EIA–. In the log-odds distribution shown here, only the test folds of Cx+/EIA+ and Cx–/EIA– for each randomized cross-validated run are shown (that is, the corresponding distribution of the training set is not shown). For comparison, the corresponding log-odds of the Cx+/EIA– samples are also shown. (D) Logistic regression (without penalty) to classify Cx+/EIA+ versus Cx–/EIA– was performed using only the 6 features most frequently used in the penalized logistic regressions. Fitting to all samples gives 96.7% ROC AUC. The 95% CI of 85.6%–100% AUC was obtained under repeated randomized 5-fold cross-validation using the same 6 features.

Table 2 Top CDI-associated metabolites selected during cross-validation of logistic regression model

Stickland amino acid fermentation in CDI. Among the most highly CDI-associated metabolites (Table 2) is the SCFA 4-methylpentanoic acid (4-MPA/4-methylvaleric acid/isocaproic acid). Unlike the SCFAs formate, acetate, and butyrate, which are produced during microbial carbohydrate fermentation, 4-MPA is produced from leucine through the Stickland reactions, amino acid fermentation pathways associated with C. difficile and other anaerobic bacteria (15–24). Ten established Stickland products were detected in the study cohort, representing both oxidative and reductive fermentation of 8 different amino acid precursors (Figure 3A and Supplemental Figure 2). These products exhibit varying degrees of association with CDI, with 8 of 10 products (80%) detected more frequently in CDI specimens than controls (Figure 3B and Supplemental Figure 5). Many Stickland products were present in Cx–/EIA– specimens, consistent with production by bacteria other than toxigenic C. difficile. Bootstrapped logistic regression (fit on 2000 bootstrap samples, stratified on Cx/EIA status) of Stickland metabolites consistently assigns the highest odds ratios for CDI to 4-MPA, the end product of leucine reduction (Figure 3C). Although other canonical Stickland products like 5-aminopentanoic acid (5-aminovaleric acid) are frequently present in CDI, they offer negligible discriminatory power beyond that of 4-MPA in the adjusted analysis.
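
The stratified bootstrap used for these odds ratios can be sketched in simplified form. The sketch below swaps in a crude 2-by-2 odds ratio on metabolite detection (present/absent) rather than a logistic regression on abundances, and adds a 0.5 continuity correction so that resamples with an empty cell remain finite; both choices, the function names, and the toy data are assumptions made for illustration. For a single binary predictor, the 2-by-2 odds ratio without the correction coincides with the logistic regression odds ratio.

```python
import random

def odds_ratio(cases, controls):
    """Crude odds ratio of detection (1) vs. non-detection (0).

    A 0.5 continuity correction is added to every cell (an assumption made
    here so that bootstrap resamples with empty cells stay finite).
    """
    a = sum(cases) + 0.5                    # cases with metabolite detected
    b = len(cases) - sum(cases) + 0.5       # cases without
    c = sum(controls) + 0.5                 # controls with metabolite detected
    d = len(controls) - sum(controls) + 0.5 # controls without
    return (a * d) / (b * c)

def stratified_bootstrap_or(cases, controls, n_boot=2000, seed=0):
    """Median and 95% percentile interval, resampling within each group."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        boot_cases = rng.choices(cases, k=len(cases))          # with replacement
        boot_controls = rng.choices(controls, k=len(controls)) # with replacement
        estimates.append(odds_ratio(boot_cases, boot_controls))
    estimates.sort()
    lower = estimates[n_boot * 25 // 1000]
    median = estimates[n_boot // 2]
    upper = estimates[n_boot * 975 // 1000 - 1]
    return lower, median, upper

# Toy detection indicators (hypothetical, not study data).
cases = [1] * 50 + [0] * 12
controls = [1] * 20 + [0] * 42
lower, median, upper = stratified_bootstrap_or(cases, controls)
```

Resampling within each diagnostic group keeps the group sizes fixed in every bootstrap sample, which is the sense in which the bootstrap is stratified on Cx/EIA status.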

Figure 3 Amino acid metabolism in C. difficile. (A) Stickland metabolism consists of anaerobic amino acid fermentation through coupled oxidation and reduction pathways. In the reductive pathway, amino acids are first deaminated to form 2-hydroxy acids and then reduced to carboxylic acids. In the oxidative pathway, amino acids are deaminated and oxidized with loss of CO2 to yield a distinct set of carboxylic acids. Depicted here are established Stickland substrates and products identified within patient fecal metabolomes. Stickland substrates include the nonproteinogenic amino acid ornithine. ND, not determined. (B) Heatmap of Stickland precursor and product abundances corresponding to patient fecal metabolomes from the 3 diagnostic groups. Metabolites were organized using unsupervised hierarchical clustering. Metabolites differing significantly (Mann-Whitney U test; *P < 0.05, ***P < 0.001) between Cx–/EIA– and Cx+/EIA+ groups are labeled, along with the direction of the difference relative to the Cx–/EIA– control group. Stickland products are labeled according to the color scheme in A. (C) Adjusted and unadjusted (crude) CDI odds ratios and confidence intervals (95%) for Stickland precursors and products. Odds ratios were estimated by fitting logistic regression models to each of 2000 bootstrap samples stratified on Cx/EIA status (Cx–/EIA– vs. Cx+/EIA+). Logistic models containing a single metabolite were fit to obtain crude odds ratios (red). A single logistic model including all metabolites was fit to obtain the adjusted odds ratios (green). Bars represent 95% bootstrap percentile confidence intervals and black dots represent median odds ratios across all bootstrap samples. Stickland products are labeled according to the color scheme in A.

To more precisely quantify the relationship between 4-MPA production and CDI, we devised a targeted GC-MS assay to quantify Stickland fermentation activity through product/precursor ratios. In addition to increasing assay sensitivity and precision, this targeted biomarker ratio is intrinsically insensitive to the variations in fecal dilution that characterize diarrheal specimens. In an arbitrary subset of matched specimens, the 4-MPA/leucine ratio varied significantly between groups (P = 1.3 × 10⁻⁸, Kruskal-Wallis test). This variation distinguishes Cx+/EIA+ specimens from Cx–/EIA– specimens with an ROC AUC of 92.8% (95% CI: 86.8%–98.7%; Figure 4, A and B) that rivals the 6-feature regression model described above and in the Methods (Figure 2D; AUC = 96.7%; 95% CI: 85.6%–100%).
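
Two properties of this ratio-based biomarker can be made concrete with a short sketch: a product/precursor ratio is unchanged when both analytes are diluted by the same factor, and the ROC AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control (ties counted as one half). The function names and numeric values below are hypothetical and serve only as an illustration.

```python
import math

def product_precursor_ratio(product, precursor):
    """Ratio of product to precursor abundance (e.g., 4-MPA to leucine)."""
    return product / precursor

def roc_auc(case_scores, control_scores):
    """AUC as the fraction of case-control pairs ranked correctly.

    Equivalent to the Mann-Whitney U statistic divided by the number of
    pairs; tied pairs count as one half.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Dilution scales both analytes equally, so the ratio is unchanged.
ratio_neat = product_precursor_ratio(4.0, 50.0)
ratio_diluted = product_precursor_ratio(4.0 * 0.1, 50.0 * 0.1)

# Toy ratios (hypothetical): perfect separation and no separation.
auc_separated = roc_auc([0.3, 0.4], [0.01, 0.02])
auc_overlapping = roc_auc([0.01, 0.02], [0.01, 0.02])
```

The pairwise loop is quadratic in the number of specimens, which is negligible for groups of 32 but would be replaced by a rank-based computation for large cohorts.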

Figure 4 4-MPA/leucine ratio elevated in CDI. (A) Dot plots of 4-MPA/leucine product/precursor ratios measured by targeted (SIM) reanalysis of fecal specimens (n = 32 for each group). Patient groups were compared using the Kruskal-Wallis test (P = 1.3 × 10⁻⁸). To further characterize pair-wise differences between groups, Bonferroni-corrected Mann-Whitney U test P values are indicated (3 comparisons; NS: P ≥ 0.05, ***P < 0.001). Ratio thresholds giving perfect specificity (0.0825, black star) or sensitivity (0.00132, white star) for Cx+/EIA+ are marked as gray dashed lines. (B) Receiver-operator characteristic (ROC) plot distinguishing Cx+/EIA+ patients from Cx–/EIA– patients. The gray region represents the bootstrapped 95% confidence interval for the true-positive rate at each false-positive rate. Thresholds with perfect specificity or sensitivity are marked by stars, as in A.

Together, these results are consistent with a pathophysiologic role for Stickland fermentation in CDI. While the presence of these metabolites in Cx–/EIA– specimens suggests that intestinal Stickland metabolism in patients is not generally unique to CDI, the selective increase in 4-MPA in CDI specimens raises the possibility that leucine reduction is a selectively emphasized pathway in C. difficile during clinical infections.

The isomeric amino acid allo-isoleucine is associated with CDI. Among the metabolites that are positively associated with CDI is allo-isoleucine, an isoleucine diastereomer in which the beta carbon stereocenter is inverted from an S to an R configuration (Figure 5A). This noncanonical, nonproteinogenic amino acid has been identified as a biomarker of branched chain ketoaciduria (maple syrup urine disease, an inborn error of metabolism) but has not previously been associated with C. difficile or CDI. Its origins in feces are unclear, although a previously reported bacterial metabolic pathway producing it from L-isoleucine raises the possibility that it derives from the intestinal microbiome (25). To more carefully assess the relationship between allo-isoleucine and CDI, we devised a targeted GC-MS assay to quantify allo-isoleucine as a ratio to isoleucine, its putative precursor. The allo-isoleucine-to-isoleucine ratio varied significantly between groups (P = 6.5 × 10⁻⁵, Kruskal-Wallis test; Figure 5B and Supplemental Figures 3 and 4). ROC analysis (Figure 5C) (AUC = 79.7%; 95% CI: 68.2%–91.3%) suggested favorable diagnostic potential for distinguishing Cx+/EIA+ specimens from Cx–/EIA– specimens. These observations identify allo-isoleucine as a new and biochemically distinctive CDI correlate of unclear origin.
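
The Kruskal-Wallis comparison across the 3 diagnostic groups can be sketched as below. This is a minimal illustration, not the software used in the study: it handles exactly 3 groups, applies the standard tie correction, and relies on the chi-square approximation, whose survival function with 2 degrees of freedom has the closed form exp(−H/2). The function names and toy values are hypothetical.

```python
import math

def midranks(values):
    """Return 1-based ranks (ties share their average rank) and tie sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = average_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes

def kruskal_wallis_3(group1, group2, group3):
    """Kruskal-Wallis H and chi-square P value for exactly 3 groups."""
    groups = [group1, group2, group3]
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks, tie_sizes = midranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum * rank_sum / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:          # every value identical: no evidence at all
        return 0.0, 1.0
    h /= correction
    p = math.exp(-h / 2)         # chi-square survival function, df = 2
    return h, p

# Toy ratios (hypothetical): three fully separated groups of three.
h_stat, p_value = kruskal_wallis_3([1, 2, 3], [4, 5, 6], [7, 8, 9])
```

With groups of 32 specimens the chi-square approximation is generally considered adequate; for very small groups an exact permutation P value would be preferable.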

Figure 5 Isoleucine isomer correlated with C. difficile. (A) Chemical structures of isoleucine and its diastereomer, allo-isoleucine. (B) Dot plot of allo-isoleucine/isoleucine ratios as measured by SIM (n = 32 for each group). Patient groups were compared using the Kruskal-Wallis test (P = 6.5 × 10⁻⁵). To further characterize pair-wise differences between groups, Bonferroni-corrected Mann-Whitney U test P values are indicated (3 comparisons; NS: P ≥ 0.05, ***P < 0.001). (C) ROC plot showing ability to distinguish Cx+/EIA+ patients from Cx–/EIA– patients. The gray region represents the bootstrapped 95% confidence interval for the true-positive rate at each false-positive rate.

Bile acid metabolic pathways active in patients without CDI. Three negatively loaded bile acid features are among the most frequently detected Cx+/EIA+ correlates in our cross-validated analysis (Table 2 and Supplemental Table 1). This is consistent with previous work, which has associated bile acid dehydroxylation by the intestinal microbiota with CDI susceptibility (6, 7, 26, 27). Canonical bile acid processing by the microbiome involves successive dehydroxylation of cholic acid (CA; a tri-hydroxylated primary bile acid) to deoxycholic acid (DCA; a di-hydroxylated secondary bile acid) and of chenodeoxycholic acid (CDCA; a di-hydroxylated primary bile acid) to lithocholic acid (LCA; a mono-hydroxylated secondary bile acid). Unexpectedly, the 2 most highly CDI-associated bile acids in our cohort were identified as cholenoic acid and monohydroxycholenoic acid (CE and MHCE, respectively, Supplemental Figures 6–11), noncanonical unsaturated, dehydroxylated bile acids. As with DCA and LCA, these bile acids were more abundant in the non-CDI group, consistent with an alternative bile acid dehydroxylation pathway based on dehydration reactions (net loss of H2O to yield a double bond).

Unsaturated, nonhydroxylated bile acids are seldom considered in the bile acid literature. Their absence from our metabolite database compelled us to identify them through manual interpretation of spectra and comparison to chemically related reference compounds (Supplemental Figures 6–11). CE, a nonhydroxylated, unsaturated bile acid, was previously identified by Robben et al. as a lithocholic acid sulfate (LCA-S) desulfation product generated by an intestinal isolate of the Bacteroidaceae family (28). Robben et al. noted 2 isomeric CE products of these bacteria that differ in double bond location. We similarly observed 2 closely eluting CE products, consistent with a similar product distribution in our patient cohort (Supplemental Figure 9). Human tissues are known to generate sulfated bile acids, including LCA-S, which may provide substrates for fecal CE production through enzymatic desulfation (29). These observations are consistent with diminished microbial bile acid desulfation activity in patients with CDI.

Identification of a CDI-associated human bile acid network. Based on the presence of CE and MHCE in patient specimens, we hypothesized that sulfated bile acids (the precursors of unsaturated bile acids) (28) are also present. We further hypothesized that the desulfation mechanism of unsaturated bile acid production is generalizable such that an extended series of bile acid sulfates and unsaturated bile acids are present in the human fecal metabolome (Figure 6B). Using the calculated molecular weights, MS/MS fragmentation patterns, and chromatographic elution ranges for these hypothesized bile acids, we constructed a liquid chromatography–tandem mass spectrometry (LC-MS/MS) assay (Supplemental Figures 12–14 and Supplemental Table 4) because sulfated bile acids are undetectable by GC-MS. This assay resulted in tentative detection of 14 sulfated bile acids, 6 of which were dehydrogenated (possessing either an alkene or ketone; Table 3 and ref. 30). Many of these bile acids are distinguishable only by retention time, consistent with isomers that differ in the position(s) of double bonds, hydroxyl groups, and/or sulfate.

Figure 6 Bile acid transformations in the clinical cohort. (A) A force-directed network layout illustrates associations between bile acids in the study cohort. Each node represents a bile acid and each connecting line (edge) represents an association between 2 bile acids as 1 of the 5 highest correlations for at least 1 of the corresponding nodes. Edge lengths are determined by the level of correlation between connected bile acids. Nodes are colored by community assignment. (B) Scheme showing metabolic transformations producing bile acids in the network analysis. The central structure highlighted in gray represents a tri-hydroxylated primary bile acid (e.g., cholic acid). Taurine or glycine conjugation forms peptide bonds to the carboxylic acid group (right). Alcohol groups are removed from the bile acid nucleus (dehydroxylation, bottom right) or oxidized to a ketone (top left). Bile acid sulfation involves substitution of an alcohol group with a sulfate (R = SO₄⁻) group (bottom left). Desulfation of bile acid sulfates yields unsaturated bile acids (left).

Table 3 Fecal bile acids monitored by LC-MS/MS

Although fecal bile acids largely originate from 2 primary bile acids (CA and CDCA), subsequent host conjugation, divergent microbiome cometabolism, and enterohepatic circulation create a complex, nonlinear bile acid physiology. To characterize bile acid interrelationships, we therefore performed community detection (31) on the weighted network of positive correlations among the 14 noncanonical bile acids described above and 17 canonical conjugated and nonconjugated primary and secondary bile acids. Seven bile acid communities emerged from this unbiased network community detection analysis, many of which could be rationalized by shared chemical features (Table 3 and Figure 6A). Where unavailability of authentic internal standards prevents identification of hydroxylation sites (e.g., the 3, 7, and 12 carbon positions) or epimers, bile acids are designated with general names. Communities 1 to 3 are composed exclusively of canonical primary and secondary bile acids. Community 1 consists of classic primary bile acids while community 2 consists of their glycine or taurine conjugates. Community 3 consists of conjugated secondary (dehydroxylated) bile acids. Community 4 includes secondary bile acids, secondary bile acid sulfates, and 1 candidate di-hydroxylated cholenic acid sulfate. Communities 5 and 6 consist entirely of sulfated bile acids, with a single sulfated cholenic acid candidate. The 5 bile acids in community 7 are all sulfated, with 4 cholenic acid sulfate candidates. The 5 candidate dehydroxylated cholenic acid sulfates may plausibly include sulfated keto bile acids, secondary bile acids of identical mass. In a force-directed layout depicting this network (Figure 6A), the primary bile acids (CA, CDCA) are located centrally, consistent with their recognized roles as precursors to conjugated and secondary bile acids. 
Clockwise progression moves from bile acid communities defined by host glycine and taurine conjugation, to classical microbial dehydroxylation, to sulfation, to desaturation or ketone formation (Figure 6B). The community organization emerging from this analysis reflects the distinctive metabolic transformations identified in the present study and in previous work.

Bile acid metabolomic associations with CDI. Disruption of microbiome-mediated bile acid metabolism has long been thought to increase CDI risk. In our inpatient cohort, we hypothesized that the Cx–/EIA– group includes a subset of patients with disrupted, CDI-susceptible microbiomes. To test this hypothesis, we used PCA to graphically summarize bile acid metabolomic variation in culture-negative specimens (Figure 7, A and B). Next, we projected Cx+/EIA+ bile acid profiles onto these principal components. Consistent with the hypothesis, Cx+/EIA+ specimens preferentially occupied a restricted portion of the Cx–/EIA– patient bile acid profile distribution. Specifically, Cx+/EIA+ specimens preferentially exhibit elevated values along the first PCA-derived principal component (PC1). High PC1 scores correspond to higher primary (cholic and chenodeoxycholic) and lower secondary (deoxycholic and lithocholic) bile acids (Figure 7D), similar to previous studies (26, 27). Low PC1 scores correspond to higher levels of sulfated and dehydroxylated cholenic and cholanic acids (DHCA-S3, DHCE-S3, LCA from community 4). ROC analysis using PC1 as the discriminator revealed an AUC of 61.3% (Figure 7C). These results are consistent with a negative association between CDI and bile acid sulfation, dehydroxylation, and unsaturation. While we cannot conclude a causative role from these correlative data, these metabolic processes may indicate the presence of a CDI-resistant intestinal microbiome.

Figure 7 The bile acid distribution in patients with CDI resembles that of a characteristic subgroup of uninfected, hospitalized patients. (A) Depicted here is a PCA plot of uninfected patients’ bile acid profiles (green, n = 62). Onto this space, we projected the bile acid metabolome of patients with CDI (red, n = 62). Data ellipses are drawn around each group of samples (95% level). Clustering of CDI specimens at high PC1 values is consistent with a favored bile acid distribution among patients with CDI. (B) Dot plot of PC1 scores for each patient sample (n = 62 in each group). Gray dashed line represents optimal PC1 threshold for distinguishing Cx–/EIA– from Cx+/EIA+ samples. This threshold was chosen by maximizing the sum of percent sensitivity and specificity. (C) ROC plot evaluating the ability of PC1 to distinguish CDI patients from controls. The gray region represents the bootstrapped 95% confidence interval for the true-positive rate at each false-positive rate. An asterisk marks the point corresponding to the optimal PC1 threshold depicted in B. (D) PCA loading plot depicting the relative contributions of each bile acid to the distribution of Cx–/EIA– samples in A. Abbreviations are indicated in Table 3.
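
The threshold rule described in panel B, choosing the cutoff that maximizes the sum of sensitivity and specificity (equivalent to maximizing Youden's J), can be sketched as follows. The function name, the convention that scores at or above the threshold are called positive, and the toy scores are assumptions made for illustration.

```python
def best_sum_threshold(case_scores, control_scores):
    """Return (threshold, sensitivity, specificity) maximizing sens + spec.

    A specimen is called positive when its score is >= threshold. Candidate
    thresholds are the observed scores; ties keep the lowest threshold.
    """
    best = None
    for threshold in sorted(set(case_scores) | set(control_scores)):
        sensitivity = sum(s >= threshold for s in case_scores) / len(case_scores)
        specificity = sum(s < threshold for s in control_scores) / len(control_scores)
        if best is None or sensitivity + specificity > best[1] + best[2]:
            best = (threshold, sensitivity, specificity)
    return best

# Toy PC1 scores (hypothetical): cases shifted upward relative to controls.
cases = [2, 3, 4, 5]
controls = [0, 1, 2, 3]
threshold, sensitivity, specificity = best_sum_threshold(cases, controls)
```

Because the threshold is tuned on the same specimens it is evaluated on, the resulting sensitivity and specificity are optimistic and would need confirmation in independent samples.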

Fecal carbohydrate associations with CDI. We next hypothesized that the Cx–/EIA– group includes patients with CDI-susceptible intestinal metabolites other than bile acids. To test this hypothesis, we used PCA to graphically summarize total GC-MS detectable metabolomic variation in culture-negative specimens. Next, we projected CDI patient metabolomes onto these principal components. Consistent with the hypothesis, CDI patient fecal metabolomes occupy a restricted portion of the uncolonized patient distribution, characterized by a high PC1 score (Figure 8, A and B). ROC analyses of PC1 scores yielded a modest AUC of 61.1% when distinguishing Cx+/EIA+ from Cx–/EIA– specimens (Figure 8C). These metabolites are not clearly related to bile acid composition, since the total metabolome PC1 exhibits a low degree of association with the bile acid PC1 determined above (r² < 0.007; Supplemental Figure 17). Instead, high PC1 scores are primarily characterized by diminished monosaccharides, disaccharides, and sugar alcohols with uncertain relationships to CDI (Figure 8D and Supplemental Figure 16). While these metabolite classes can be reasonably identified by GC-MS, identifying specific isomers is often unreliable (e.g., sorbitol and mannitol are both C₆H₁₄O₆, differ only in the orientation of 1 hydroxyl group, and yield comparable spectra). The monosaccharide fructose, a favored C. difficile carbon substrate (32), emerged as a negative CDI correlate in the logistic regression analysis above (Table 2), raising the possibility that some carbohydrates may be consumed by metabolically active C. difficile. Trehalose, a disaccharide recently reported to be a favored substrate of epidemic C. difficile ribotypes 027 and 078, was not identified in our differential analysis (33). 
To more carefully assess the relationship between trehalose and CDI, we quantified fecal trehalose using a targeted GC-MS analysis based on stable isotope dilution with a ¹³C₆-labeled internal standard (Supplemental Figure 15). Trehalose was detectable in 61% (115/189) of specimens but did not distinguish Cx+/EIA+ from Cx–/EIA– specimens (35/63 vs. 41/63, P = 0.36, 2-tailed Fisher’s exact test). In 027-positive specimens, trehalose also did not distinguish toxin-positive from toxin-negative specimens (6/8 vs. 12/23, P = 0.41, 2-tailed Fisher’s exact test). A subset of fecal carbohydrates thus has some potential to distinguish CDI and possibly CDI-susceptible patients, though the basis for this remains unclear.
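
The 2-tailed Fisher’s exact tests on trehalose detection can be reproduced from the reported counts with a short sketch. The implementation below uses the common convention of summing the probabilities of all tables that are no more likely than the observed one; whether the original analysis used this exact convention is an assumption, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Sums hypergeometric probabilities of every table with the same margins
    whose probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    smallest = max(0, col1 - row2)
    largest = min(row1, col1)
    observed = probability(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        p
        for p in (probability(x) for x in range(smallest, largest + 1))
        if p <= observed * tolerance
    )

# Trehalose detected / not detected, using the counts reported in the text.
p_cdi_vs_control = fisher_exact_two_sided(35, 28, 41, 22)  # 35/63 vs. 41/63
p_ribotype_027 = fisher_exact_two_sided(6, 2, 12, 11)      # 6/8 vs. 12/23
```

Both values can be compared directly with the P values quoted above (0.36 and 0.41, respectively).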

Figure 8 Principal component analysis of GC-MS–defined metabolome in the clinical cohort. (A) Depicted here is a PCA plot of uninfected patients’ GC-MS metabolomes (green, n = 62), onto which is projected the GC-MS metabolomes of patients with CDI (red, n = 62). Data ellipses are drawn around each group of samples (95% level). The clustering of CDI specimens at high PC1 values is consistent with a favored metabolomic profile among patients with CDI. (B) Dot plot of PC1 scores for each patient (n = 62 in each group). Gray dashed line depicts the PC1 threshold that maximizes the sum of percent sensitivity and specificity for distinguishing Cx–/EIA– from Cx+/EIA+ samples. (C) ROC plot evaluating the ability of PC1 to distinguish between CDI patients and controls. The gray region represents 95% confidence intervals bootstrapped for the true-positive rate at each possible false-positive rate. An asterisk marks the point corresponding to the optimal PC1 threshold depicted in panel B. (D) Plot of PC1 and PC2 loadings for all 2463 GC-MS features. It depicts the relative contributions of each GC-MS feature to the distribution of Cx–/EIA– samples in the PCA projection in A. Features in the top or bottom 1% of PC1 loadings tentatively identified as sugars or sugar alcohols are highlighted in blue.

A metabolomic model of CDI. To determine whether fecal Stickland metabolites and bile acids can be used to construct a metabolomic definition of CDI, we conducted logistic regression using the 4-MPA/leucine ratio (log₁₀-transformed) and the bile acid PC1 (Table 4 and Figure 9A). Each parameter alone exhibited significant (P < 0.05) independent associations with Cx+/EIA+ status when compared with Cx–/EIA– specimens. When the logistic model criterion is applied (corresponding to >50% probability), Cx+/EIA+ specimens clustered in the high 4-MPA/leucine and high bile acid PC1 quadrant (Figure 9, A and B). ROC analysis of this model yields an AUC of 98.2%, out-performing the original 6-feature model described above (Figure 9C). Each parameter contributed independently: adding a term for interaction between 4-MPA/leucine ratio and bile acid PC1 did not significantly improve the logistic model (P = 0.53, analysis of deviance). These results are consistent with distinctive host and microbial metabolic processes in human CDI.

Figure 9 Interrelationships between host- and C. difficile–associated metabolites. (A) Plotting bile acid PC1 (Figure 7) versus 4-methylpentanoic acid index (Figure 4) reveals that high PC1 score and high 4-methylpentanoic acid index values coincide in patients with CDI compared with control patients (n = 32 for each group). The dashed line marks the dividing line assigned 50% probability of being Cx+/EIA+ by a logistic regression model incorporating both PC1 and 4-methylpentanoic acid index. (B) Probabilities assigned to each patient by the logistic regression model (n = 32 per group). Higher values indicate higher certainty of Cx+/EIA+ status. The gray line marks the 50% probability cutoff above which samples are considered Cx+/EIA+. (C) ROC curve showing the performance of the logistic regression model in discriminating Cx–/EIA– patients from Cx+/EIA+ patients. The gray region represents 95% confidence intervals bootstrapped for the true-positive rate at each possible false-positive rate. The AUC and its 95% confidence interval are also reported. (D) Euler diagram showing the overlap between culture, EIA, and metabolome status. Samples were considered metabolome-positive if assigned a probability above 50% by the logistic regression model.

Table 4 Logistic regression model of CDI metabolome

Metabolomic differences in colonized patients with and without detectable fecal toxin. To determine whether Cx+/EIA– specimens possess distinctive metabolomes, we compared 4-MPA/leucine and bile acid composition profiles from Cx+/EIA– specimens to those of Cx+/EIA+ or Cx–/EIA– specimens. In the logistic regression model, only 38% (12/32) resembled Cx+/EIA+ specimens, with the remainder exhibiting low 4-MPA/leucine ratios in specimens with or without susceptible bile acid profiles (Figure 9, A and B). These observations are consistent with low C. difficile metabolic activity and a protective bile acid profile in many patients with undetectable fecal toxin. Compared with toxigenic culture or toxin EIA results alone, the logistic regression criterion defines a positive test group that is smaller than (but almost entirely encompassed by) the toxigenic culture–positive group and larger than the toxin EIA–positive group (Figure 9D). If the metabolic criterion is highly accurate, it may reduce false-positive results arising from toxigenic C. difficile detection alone and also reduce false-negative results from the toxin EIA test. Further study is necessary to determine whether this possibility can be realized.