Shanlin Ke, Nira R Pollock, Xu-Wen Wang, Xinhua Chen, Kaitlyn Daugherty, Qianyun Lin, Hua Xu, Kevin W Garey, Anne J Gonzales-Luna, Ciarán P Kelly, Yang-Yu Liu.
Abstract
Clostridioides difficile (C. difficile) infection is the most common cause of healthcare-associated infection and an important cause of morbidity and mortality among hospitalized patients. A comprehensive understanding of C. difficile infection (CDI) pathogenesis is crucial for disease diagnosis, treatment, and prevention. Here, we characterized gut microbial compositions and a broad panel of innate and adaptive immunological markers in 243 well-characterized human subjects (including 187 subjects with both microbiota and immune marker data), who were divided into four phenotype groups: CDI, Asymptomatic Carriage, Non-CDI Diarrhea, and Control. We found that the interactions between gut microbiota and host immune markers are very sensitive to the status of C. difficile colonization and infection. We demonstrated that classification models incorporating both gut microbiome and host immune marker data distinguish CDI from other groups better than models using either type of data alone. Our classification models display robust diagnostic performance in differentiating CDI from Asymptomatic Carriage (AUC~0.916), Non-CDI Diarrhea (AUC~0.917), or Non-CDI, which combines the other three groups (AUC~0.929). Finally, we performed symbolic classification using selected features to derive simple mathematical formulas that explicitly quantify the interactions between the gut microbiome and host immune markers. These findings support the potential roles of gut microbiota and host immune markers in the pathogenesis of CDI. Our study provides new insights for a microbiome-immune marker-derived signature to diagnose CDI and design therapeutic strategies for CDI.
Keywords: C. difficile infection; gut microbiome; host immune markers; machine learning
Year: 2021 PMID: 34132169 PMCID: PMC8210874 DOI: 10.1080/19490976.2021.1935186
Source DB: PubMed Journal: Gut Microbes ISSN: 1949-0976
Demographic characteristics of the enrolled subjects
| Characteristics | Control (NAAT negative, n = 47) | Non-CDI Diarrhea (NAAT negative, n = 44) | Asymptomatic Carriage (NAAT positive, n = 40) | CDI (NAAT positive, n = 112) |
|---|---|---|---|---|
| Sex: Female | 14 (29.79%) | 22 (50.00%) | 20 (50.00%) | 61 (54.46%) |
| Sex: Male | 33 (70.21%) | 22 (50.00%) | 20 (50.00%) | 51 (45.54%) |
| Age (years), mean ± SD | 62.40 ± 12.33 | 63.07 ± 13.15 | 62.15 ± 17.25 | 64.99 ± 15.62 |
| Ethnicity: Hispanic | 1 (2.13%) | 3 (6.82%) | 1 (2.50%) | 6 (5.36%) |
| Ethnicity: Non-Hispanic | 38 (80.85%) | 37 (84.09%) | 31 (77.50%) | 96 (85.71%) |
| Ethnicity: Unknown | 8 (17.02%) | 4 (9.09%) | 8 (20.00%) | 10 (8.93%) |
| Race: White | 33 (70.21%) | 28 (63.64%) | 28 (70.00%) | 89 (79.46%) |
| Race: Other | 4 (8.51%) | 10 (22.73%) | 3 (7.50%) | 23 (20.54%) |
| Race: Unknown | 10 (21.28%) | 6 (13.64%) | 9 (22.50%) | 0 (0.00%) |
Figure 1. Comparing the diversity of the gut microbiota (and host immune markers) of subjects in the different groups. (a) Taxa richness. (b) Chao1. (c) Evenness. (d) Shannon index. (e) Principal Coordinates Analysis (PCoA) plot based on Bray–Curtis dissimilarities of microbial compositions. (f) Boxplot of the gut microbiome Bray–Curtis dissimilarity between subjects within each group. (g) Principal component analysis (PCA) plot of host immune marker concentrations. (h) Boxplot of the Euclidean distance for the host immune markers of subjects within each group. Statistical significance was determined by the Mann–Whitney test: *P < .05, **P < .01, ***P < .001.
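The indices named in panels (a) to (f) can be illustrated with a minimal Python sketch. The record does not state which software or exact definitions the authors used, so the choices here are assumptions: evenness is taken as Pielou's index, Chao1 uses the common bias-corrected form, and the count vectors in the usage note are made up.

```python
import math

def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over observed taxa."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def pielou_evenness(counts):
    """Pielou's evenness J = H / ln(S); assumed definition of 'evenness'."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0

def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1*(F1-1) / (2*(F2+1)),
    where F1 and F2 are the numbers of singletons and doubletons."""
    s_obs = richness(counts)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0
```

For example, `chao1([1, 1, 2, 5])` adds an estimate of unseen taxa to the four observed ones, and `bray_curtis([1, 0], [0, 1])` is 1.0 because the two samples share no taxa.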
Figure 2. Relative abundances of differentially abundant genera identified by ANCOM in comparisons between groups. (a) CDI vs. Asymptomatic Carriage. (b) CDI vs. Non-CDI Diarrhea. (c) CDI vs. Non-CDI. The top differentially abundant taxa are ranked from left to right by their W statistic. W counts the number of times a taxon is significantly different between groups, so a higher W indicates a greater likelihood that the null hypothesis can be rejected. Relative abundances (%) are plotted on a log10 scale. The notches in the boxplots show the 95% confidence interval around the median.
Figure 3. Microbial correlation networks of different groups. (a) Control. (b) Non-CDI Diarrhea. (c) Asymptomatic Carriage. (d) CDI. Nodes represent genera and are colored by phylum. Edges represent microbial correlations: green indicates positive and red indicates negative correlations. Edge thickness indicates correlation strength, and only high-confidence interactions (p-value < 0.05) with high absolute correlation coefficients (> 0.3) are shown. For each group, we further identified the three most connected genera (nodes): Ruminococcus_1, Roseburia and Lachnospiraceae_UCG_008 for the Control group; [Ruminococcus]_torques_group, [Eubacterium]_hallii_group and Blautia for the Non-CDI Diarrhea group; Ruminiclostridium_5, Enterococcus and Lachnospiraceae_UCG_008 for the Asymptomatic Carriage group; and Alistipes, Ruminiclostridium_5 and Lachnoclostridium for the CDI group.
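The edge filter and the "most connected genera" ranking described in this caption can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the correlation method used to build the networks is not given in this record, and the correlation and p-values in the toy list are made up.

```python
from collections import Counter

def filter_edges(correlations, p_max=0.05, r_min=0.3):
    """Keep pairs with p < p_max and |r| > r_min.
    `correlations` is an iterable of (genus_a, genus_b, r, p) tuples."""
    return [(a, b, r) for a, b, r, p in correlations
            if p < p_max and abs(r) > r_min]

def top_connected(edges, k=3):
    """Return the k genera with the highest degree (number of retained edges)."""
    degree = Counter()
    for a, b, _ in edges:
        degree[a] += 1
        degree[b] += 1
    return degree.most_common(k)

# Made-up values for illustration only; not taken from the study.
toy = [
    ("Blautia", "Roseburia", 0.62, 0.001),
    ("Blautia", "Alistipes", -0.45, 0.01),
    ("Blautia", "Enterococcus", -0.35, 0.03),
    ("Roseburia", "Alistipes", 0.25, 0.02),    # dropped: |r| <= 0.3
    ("Enterococcus", "Alistipes", 0.50, 0.20), # dropped: p >= 0.05
]
edges = filter_edges(toy)
```

Calling `top_connected(edges, 1)` on the toy data returns the single genus that keeps the most edges after filtering.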
Figure 4. Correlations between gut microbial abundances and host immune markers in different groups, quantified by Spearman correlation with Benjamini-Hochberg correction. (a) Control. (b) Non-CDI Diarrhea. (c) Asymptomatic Carriage. (d) CDI. Rows represent genera; columns represent immune markers. The layout of the heatmap follows the hierarchical clustering results of the Control cohort (see Supplementary Figure 4). Red represents positive and blue represents negative correlation. The intensity of the colors denotes the strength of the correlation. *α < 0.05, **α < 0.01, ***α < 0.001.
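The two ingredients named in this caption, Spearman correlation and Benjamini-Hochberg adjustment, can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, p-values for the correlations are assumed to come from a statistics library, and `spearman_rho` is undefined for a constant input vector.

```python
def average_ranks(values):
    """1-based ranks, with ties assigned the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

def benjamini_hochberg(p_values):
    """BH-adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In a genus-by-marker heatmap, the adjustment would be applied to the full set of p-values for one group before assigning significance stars.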
Figure 5. The performance of RF-based classification models built on various types of features in differentiating CDI from other groups. (a) CDI vs. Asymptomatic Carriage. (b) CDI vs. Non-CDI Diarrhea. (c) CDI vs. Non-CDI. For each classification task, we used different types of features: (1) the top-1 immune marker feature (based on mean decrease in accuracy); (2) the top-1 genus feature; (3) all immune markers; (4) all genera; (5) integration of all immune markers and genera; (6) selected features from the set of all immune markers and genera. Error bars represent the standard errors of the means (SEM).
Diagnostic scores derived from symbolic classification (SC) and logistic regression (LR). For each subject , we calculate his/her diagnostic score (or ) based on one of the following formulas derived from SC (or LR), respectively. For SC, the class of subject is CDI if ; or Asymptomatic Carriage (or Non-CDI Diarrhea, Non-CDI) if . For LR, the class of subject is CDI if ; or Asymptomatic Carriage (or Non-CDI Diarrhea, Non-CDI) if . Here, both and were learned from the entire dataset. Features used here include: GCSF; IgA_toxA; IgA_toxB; IL6; TNFα; Anaerobacillus; Curvibacter; Enterobacter; Enterococcus; Epulopiscium; [Eubacterium]_hallii_group; Fusobacterium; Moryella; Stenotrophomonas; Veillonella. In particular, for each classification task (regardless of using SC or LR), the selected features were: (1) CDI vs. Asymptomatic Carriage: ,, and ; (2) CDI vs. Non-CDI Diarrhea: , , , , and ; (3) CDI vs. Non-CDI: , , , , , , , , and . Note that in the calculation of precision, recall and F1-score, we can treat either CDI or the comparison group (Asymptomatic Carriage, Non-CDI Diarrhea, or Non-CDI) as the positive class. Results shown in parentheses represent the latter case.
| Model | Diagnostic | Formula | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|---|---|
| SC | CDI vs. Asymptomatic Carriage | | 0.896 | 0.914 (0.840) | 0.949 (0.75) | 0.931 (0.792) |
| SC | CDI vs. Non-CDI Diarrhea | | 0.900 | 0.946 (0.826) | 0.897 (0.905) | 0.921 (0.864) |
| SC | CDI vs. Non-CDI | | 0.882 | 0.889 (0.878) | 0.821 (0.927) | 0.853 (0.902) |
| LR | CDI vs. Asymptomatic Carriage | | 0.830 | 0.895 (0.667) | 0.872 (0.714) | 0.883 (0.690) |
| LR | CDI vs. Non-CDI Diarrhea | | 0.800 | 0.814 (0.765) | 0.897 (0.619) | 0.854 (0.684) |
| LR | CDI vs. Non-CDI | | 0.813 | 0.841 (0.798) | 0.679 (0.908) | 0.752 (0.850) |
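
The paired values in the table (with and without parentheses) come from computing precision, recall and F1-score twice, once with each class treated as positive. The sketch below shows that calculation in plain Python. The function names and the toy labels are hypothetical and not from the study. Only the final consistency check uses numbers reported in the table.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def binary_metrics(y_true, y_pred, positive):
    """Precision, recall and F1 with `positive` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)

# Made-up labels for illustration ("AC" = Asymptomatic Carriage).
y_true = ["CDI", "CDI", "CDI", "AC", "AC"]
y_pred = ["CDI", "CDI", "AC", "AC", "CDI"]
as_cdi = binary_metrics(y_true, y_pred, positive="CDI")
as_ac = binary_metrics(y_true, y_pred, positive="AC")

# Consistency check against the first table row: the reported precision and
# recall reproduce the reported F1-scores after rounding to three decimals.
f1_cdi = round(f1_score(0.914, 0.949), 3)
f1_ac = round(f1_score(0.840, 0.75), 3)
```

Swapping the positive class changes precision and recall but not accuracy, which is why the table lists a single accuracy per row.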