Cedric Chih Shen Tan, Mislav Acman, Lucy van Dorp, Francois Balloux.
Abstract
Our understanding of the host component of sepsis has made significant progress. However, detailed study of the microorganisms causing sepsis, either as single pathogens or microbial assemblages, has received far less attention. Metagenomic data offer opportunities to characterize the microbial communities found in septic and healthy individuals. In this study we apply gradient-boosted tree classifiers and a novel computational decontamination technique built upon SHapley Additive exPlanations (SHAP) to identify microbial hallmarks which discriminate blood metagenomic samples of septic patients from those of healthy individuals. Classifiers had high performance when using the read assignments to microbial genera [area under the receiver operating characteristic curve (AUROC)=0.995], including after removal of species 'culture-confirmed' as the cause of sepsis through clinical testing (AUROC=0.915). Models trained on single genera were inferior to those employing a polymicrobial model, and we identified multiple co-occurring bacterial genera absent from healthy controls. While prevailing diagnostic paradigms seek to identify single pathogens, our results point to the involvement of a polymicrobial community in sepsis. We demonstrate the importance of the microbial component in characterising sepsis, which may offer new biological insights into the aetiology of sepsis, and ultimately support the development of clinical diagnostic or even prognostic tools.
Keywords: SHAP; bacteraemia; blood metagenomics; contamination; kitome; machine learning; metagenomics; sepsis
Year: 2021 PMID: 34477543 PMCID: PMC8715444 DOI: 10.1099/mgen.0.000642
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Summary of metagenomic datasets.
Sample sizes indicated here are those after all quality control steps have been applied. Grumaz-16/19 is a combined dataset comprising Grumaz-16 and Grumaz-19.
| Study | Dataset alias | Accession | Sepsis definition | Sequencing technique | Sample size (septic) | Sample size (healthy) |
|---|---|---|---|---|---|---|
| Grumaz | Grumaz-19 | PRJEB21872, PRJEB30958 | Sepsis-2 | Shotgun | 50 | – |
| Grumaz | Grumaz-16 | PRJEB13247 | Sepsis-2 | Shotgun | 7 | 15 |
| Gosiewski | Gosiewski-17 | Requested from authors | Sepsis-1 | 16S (paired-end) | 56 | 23 |
| Blauwkamp | Karius | PRJNA507824 | Sepsis-1 | Shotgun | 117 | 170 |
| All single datasets | Pooled | All accessions | Sepsis-1 and Sepsis-2 | Shotgun and 16S (paired-end) | 230 | 208 |
| Grumaz, Grumaz | Grumaz-16/19 | PRJEB13247, PRJEB21872, PRJEB30958 | Sepsis-2 | Shotgun | 57 | 15 |
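The combined rows of this table can be cross-checked against the single datasets. The short sketch below sums the per-dataset sample sizes; it assumes the dash for Grumaz-19 healthy controls means zero samples, and the `combine` helper is a name introduced here for illustration only.

```python
# Sample sizes (septic, healthy) copied from the table above.
# Assumption: the dash for Grumaz-19 healthy controls is treated as 0.
single_datasets = {
    "Grumaz-19": (50, 0),
    "Grumaz-16": (7, 15),
    "Gosiewski-17": (56, 23),
    "Karius": (117, 170),
}


def combine(aliases):
    """Sum septic and healthy sample counts over the given dataset aliases."""
    septic = sum(single_datasets[alias][0] for alias in aliases)
    healthy = sum(single_datasets[alias][1] for alias in aliases)
    return septic, healthy


pooled = combine(single_datasets)                      # all four single datasets
grumaz_16_19 = combine(["Grumaz-16", "Grumaz-19"])     # combined Grumaz dataset

print("Pooled (septic, healthy):", pooled)
print("Grumaz-16/19 (septic, healthy):", grumaz_16_19)
```

Under that assumption the sums agree with the Pooled and Grumaz-16/19 rows as reported.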
Summary of models trained
Models were optimized and evaluated via a nested cross-validation protocol. The prefix and suffix of each model name correspond to the dataset and contamination reduction technique applied, respectively. Neat, SD and CR refer to the feature spaces with no decontamination, Simple Decontamination, and SHAP Decontamination applied, respectively (see Methods). Karius-Without corresponds to the SHAP-decontaminated feature space after claimed ‘culture-confirmed’ pathogens are excluded. Karius-Only refers to the feature space containing only genera with ‘culture-confirmed’ pathogens as features.
| No. of features | Feature space | Precision | Recall | AUROC |
|---|---|---|---|---|
| 1564 | | 0.976 | 0.983 | 0.995 |
| 1564 | | 0.956 | 0.932 | 0.943 |
| 111 | | 0.896 | 0.787 | 0.942 |
| 25 | | 0.883 | 0.810 | 0.942 |
| 22 | | 0.803 | 0.727 | 0.915 |
| 22 | | 0.929 | 0.862 | 0.950 |
| 685 | | 0.950 | 0.939 | 0.982 |
| 21 | | 0.870 | 0.796 | 0.904 |
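For readers unfamiliar with the three performance metrics reported above, the following is a minimal sketch of how precision, recall and AUROC can be computed from labels and classifier scores. The labels, scores and the 0.5 decision threshold are made-up toy values, not data from the study, and the function names are illustrative.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = septic, 0 = healthy)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as one half (the rank-based formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} "
      f"AUROC={auroc(y_true, scores):.3f}")
```

Precision and recall depend on the chosen decision threshold, whereas AUROC summarizes ranking quality across all thresholds, which is one reason the two kinds of metric can diverge between rows of the table.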
Fig. 1. Model interpretation and performance. (a) Plot summarizing the SHAP values across all samples for the most important features ranked by the mean absolute SHAP value (highest at the top) for Karius-Neat, (b) Karius-SD, (c) Karius-CR and (d) Karius-Without models. Each point represents a single sample. Points with similar SHAP values were stacked vertically for visualization of point density and were coloured according to the magnitude of the feature values (i.e. read counts). Genera that contained ‘culture-confirmed’ pathogens are highlighted in yellow.
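The feature ordering described in this caption (mean absolute SHAP value, highest first) can be sketched as follows. The SHAP matrix and the genus names are hypothetical placeholders; in practice the values would come from whichever SHAP implementation was used to explain the classifier.

```python
def rank_by_mean_abs_shap(shap_values, feature_names):
    """Rank features by mean |SHAP| across samples, highest first.

    shap_values: list of rows, one per sample, one column per feature.
    """
    n_samples = len(shap_values)
    ranking = []
    for j, name in enumerate(feature_names):
        mean_abs = sum(abs(row[j]) for row in shap_values) / n_samples
        ranking.append((name, mean_abs))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values for three samples and three placeholder genera.
feature_names = ["genus_A", "genus_B", "genus_C"]
shap_values = [
    [0.2, -0.5, 0.0],
    [-0.1, 0.4, 0.1],
    [0.3, -0.6, 0.0],
]

for name, mean_abs in rank_by_mean_abs_shap(shap_values, feature_names):
    print(f"{name}: {mean_abs:.3f}")
```

Taking the absolute value before averaging matters: a feature that pushes predictions strongly in both directions would otherwise average out to near zero and appear unimportant.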
Fig. 2. Comparison of performance (AUROC) for the multi-feature models (Karius-Neat, Karius-Only, Karius-Without feature space) and single-feature models (x-axis). Models were optimised and evaluated using the nested cross-validation protocol.
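As a rough illustration of what a nested cross-validation protocol involves, the sketch below uses an inner loop to pick a hyperparameter and an outer loop to estimate performance on data never seen during selection. It deliberately swaps the study's gradient-boosted trees for a one-dimensional k-nearest-neighbour classifier so that it runs with the standard library only; the fold counts, data and all function names are assumptions made for this example.

```python
import random


def k_fold(indices, k, seed):
    """Shuffle indices and split them into k roughly equal folds."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x."""
    order = sorted(range(len(train_X)), key=lambda i: abs(train_X[i] - x))
    nearest = order[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(X, y, fit_idx, eval_idx, k):
    """Fit on fit_idx (kNN just stores the data) and score on eval_idx."""
    train_X = [X[i] for i in fit_idx]
    train_y = [y[i] for i in fit_idx]
    hits = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in eval_idx)
    return hits / len(eval_idx)


def nested_cv(X, y, k_grid, outer_k=3, inner_k=2, seed=0):
    """Return (chosen k, outer-test accuracy) for each outer fold."""
    outer_folds = k_fold(range(len(X)), outer_k, seed)
    results = []
    for o, test_idx in enumerate(outer_folds):
        train_idx = [i for f, fold in enumerate(outer_folds) if f != o for i in fold]
        inner_folds = k_fold(train_idx, inner_k, seed + 1)

        def inner_score(k):
            # Mean validation accuracy using only the outer-training data.
            scores = []
            for v, val_idx in enumerate(inner_folds):
                fit_idx = [i for f, fold in enumerate(inner_folds) if f != v for i in fold]
                scores.append(accuracy(X, y, fit_idx, val_idx, k))
            return sum(scores) / len(scores)

        best_k = max(k_grid, key=inner_score)
        results.append((best_k, accuracy(X, y, train_idx, test_idx, best_k)))
    return results


# Toy read counts for one feature: low in healthy (0), high in septic (1).
X = [1, 2, 3, 2, 1, 3, 8, 9, 10, 9, 8, 10]
y = [0] * 6 + [1] * 6
for best_k, score in nested_cv(X, y, k_grid=[1, 3]):
    print(f"chosen k={best_k}, outer-fold accuracy={score:.2f}")
```

The key property is that the outer test fold plays no part in choosing the hyperparameter, so the outer scores are not inflated by the tuning step.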
Fig. 3. Generalisability of models across sepsis cohorts. Model performance before and after SHAP Decontamination determined via holdout cross-validation (see Methods). The table appended describes the sepsis definition used, sequencing type and test size for each holdout dataset, and the corresponding size of the training data.
Fig. 4. Corrected microbial co-occurrence network for genera assigned in sepsis metagenomes. Input data correspond to the Karius-SD feature space. The edges in this network represent those in the septic network that were not present in the healthy network. The widths of edges are weighted by the strength of the SparCC correlations. Nodes are coloured as per the legend at the top, with ‘culture-confirmed’ pathogens being those experimentally shown to be implicated in sepsis. The layout of the graph was generated using the Fruchterman–Reingold algorithm.
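The "corrected" network described here, septic edges that are absent from the healthy network, amounts to a set difference over undirected edges. A minimal sketch follows, assuming pairwise correlations have already been computed elsewhere (SparCC itself is not implemented here). The 0.3 cut-off, the genus names and the correlation values are hypothetical.

```python
def edges_above(correlations, threshold):
    """Keep undirected edges whose absolute correlation meets the threshold.

    correlations maps (genus_1, genus_2) tuples to a correlation value;
    frozenset keys make (a, b) and (b, a) the same edge.
    """
    return {frozenset(pair): r
            for pair, r in correlations.items()
            if abs(r) >= threshold}


def septic_only_edges(septic_corr, healthy_corr, threshold=0.3):
    """Edges present in the septic network but not in the healthy one."""
    septic = edges_above(septic_corr, threshold)
    healthy = edges_above(healthy_corr, threshold)
    return {edge: r for edge, r in septic.items() if edge not in healthy}


# Hypothetical correlations between placeholder genera.
septic_corr = {
    ("genus_A", "genus_B"): 0.62,
    ("genus_B", "genus_C"): 0.45,
    ("genus_A", "genus_C"): 0.10,
}
healthy_corr = {
    ("genus_B", "genus_A"): 0.55,
    ("genus_B", "genus_C"): 0.05,
}

corrected = septic_only_edges(septic_corr, healthy_corr)
for edge, r in corrected.items():
    # The retained correlation can serve as the edge width when plotting.
    print(sorted(edge), r)
```

For drawing such a graph, networkx's `spring_layout` is one available implementation of the Fruchterman–Reingold force-directed layout; whether the authors used it is not stated in the caption.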