Xiaochen Yin, Tomer Altman, Erica Rutherford, Kiana A West, Yonggan Wu, Jinlyung Choi, Paul L Beck, Gilaad G Kaplan, Karim Dabbagh, Todd Z DeSantis, Shoko Iwai.
Abstract
Metabolomic analyses of human gut microbiome samples can simultaneously unveil the metabolic potential of host tissues and of the numerous microorganisms they support. As such, metabolomic information bears immense potential to improve disease diagnosis and therapeutic drug discovery. Unfortunately, as cohort sizes increase, comprehensive metabolomic profiling becomes costly and logistically difficult to perform at a large scale. To address these difficulties, we tested the feasibility of predicting the metabolites of a microbial community based solely on microbiome sequencing data. Paired microbiome sequencing (16S rRNA gene amplicons, shotgun metagenomics, and metatranscriptomics) and metabolome (mass spectrometry and nuclear magnetic resonance spectroscopy) datasets were collected from six independent studies spanning multiple diseases. We used these datasets to evaluate two reference-based gene-to-metabolite prediction pipelines and a machine-learning (ML) based metabolic profile prediction approach. With a model pre-trained on over 900 paired microbiome-metabolome samples, the ML approach yielded the most accurate predictions (i.e., highest F1 scores) of metabolite occurrences in the human gut and outperformed the reference-based pipelines in predicting differential metabolites between case and control subjects. Our findings demonstrate the possibility of predicting metabolites from microbiome sequencing data, while highlighting certain limitations in detecting differential metabolites, and provide a framework to evaluate metabolite prediction pipelines, which will ultimately facilitate future investigations on microbial metabolites and human health.
Keywords: Next Generation Sequence; computational prediction; human microbiome; metabolic potential; metabolome
Year: 2020 PMID: 33343536 PMCID: PMC7746778 DOI: 10.3389/fmicb.2020.595910
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
FIGURE 1. (A) Metabolite prediction workflow for Mangosteen, MIMOSA, and MelonnPan. (B) Evaluation metrics used to appraise prediction performance regarding occurrence and differential metabolite identification.
Characteristics of datasets included for prediction and evaluation.
| Disease area – sample type | Study | Treatment group (counts of biospecimens in each group) | Metabolome dataset: profiling technology | Microbiome dataset: target region | Microbiome dataset: profiling technology | Microbiome dataset: data availability | Contrast for differential analysis |
|---|---|---|---|---|---|---|---|
| Healthy – Stool samples | | High resistant starch intervention (14); Low resistant starch intervention (14); Baseline control (13) | SolariX Fourier transform ion cyclotron resonance mass spectrometer (FT-ICR-MS; Bruker Daltonik GmbH) | 16S rRNA gene: V4-V6 | HiSeq 2000 (Illumina) | EBI-ENA accession: | High resistant starch intervention vs. Baseline control |
| Colorectal cancer (CRC) – Biopsy samples | | Diseased tissue (28); Healthy tissue distal to diseased tissue (35); Healthy tissue proximal to diseased tissue (20) | Quadrupole time-of-flight mass spectrometer (Agilent Technologies 6550 Q-TOF) | 16S rRNA gene: V3-V5 | MiSeq (Illumina) | NCBI SRA BioProject: | Diseased tissue vs. Healthy tissue distal to diseased tissue |
| Autism spectrum disorder (ASD) – Stool samples | | ASD (23); Healthy control (21) | Varian Direct Drive (VNMRS) 600 MHz spectrometer (Agilent Technologies) | 16S rRNA gene: V2-V3 | Genome Sequencer FLX-Titanium System (Roche) | Qiita: study ID 11169 | ASD vs. Healthy control |
| Inflammatory bowel disease (IBD) – Stool samples | | Crohn’s disease (88); Ulcerative colitis (76); Non-IBD Control (56) | Q Exactive Hybrid Quadrupole-Orbitrap mass spectrometer; Exactive Plus Orbitrap mass spectrometer (Thermo Fisher Scientific) | Whole genome | HiSeq 2500 (Illumina) | NCBI SRA BioProject: | Crohn’s disease vs. Non-IBD Control |
| | | Crohn’s disease (139); Non-IBD Control (86) | Q Exactive/Exactive Plus Orbitrap mass spectrometers (Thermo Fisher Scientific) | Whole genome | HiSeq 2000; HiSeq 2500 (Illumina) | NCBI SRA BioProject: | Crohn’s disease vs. Non-IBD Control |
| | | | | Whole transcriptome | HiSeq 2500 (Illumina) | | |
| | SG_IBD (generated in this study) | Active ulcerative colitis (10); Inactive ulcerative colitis (19); Healthy control (15) | Q Exactive Orbitrap mass spectrometers (Thermo Fisher Scientific) | 16S rRNA gene: V4 | MiSeq (Illumina) | NCBI SRA BioProject: | Active ulcerative colitis vs. Healthy control |
| | | | | Whole transcriptome | NextSeq 550 (Illumina) | | |
FIGURE 2. Results of metabolite prediction as performed by different pipelines. Upset plots (Lex and Gehlenborg, 2014) depict the measured and predicted metabolite numbers resulting from each pipeline and their intersections based on (A) KEGG and (B) BioCyc databases. Pie charts display predicted metabolite classification according to (C) KEGG BRITE classes, and specifically (D) metabolites belonging to the “Compounds with biological roles” BRITE class.
FIGURE 3. Evaluation of predicted occurrence (presence/absence) as appraised by precision, recall, and F1 score. Each point indicates a dataset used for evaluation. Significance was assessed with pairwise Wilcoxon signed-rank tests (∗∗∗p < 0.01, ∗∗p < 0.05).
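The occurrence metrics named in this caption can be illustrated with a minimal sketch. It assumes the standard set-based definitions (a true positive is a metabolite that is both measured and predicted); the record does not spell out the authors' exact counting rules, so this is not their implementation. The function name `occurrence_scores` and the KEGG compound IDs below are illustrative placeholders, not data from the study.

```python
def occurrence_scores(measured, predicted):
    """Precision, recall, and F1 for presence/absence predictions.

    `measured` and `predicted` are collections of metabolite identifiers.
    Assumption: standard set-based definitions of TP, FP, and FN.
    """
    measured, predicted = set(measured), set(predicted)
    tp = len(measured & predicted)   # predicted and actually measured
    fp = len(predicted - measured)   # predicted but not measured
    fn = len(measured - predicted)   # measured but not predicted

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: four measured metabolites, three predicted ones.
measured_ids = {"C00022", "C00031", "C00042", "C00064"}
predicted_ids = {"C00022", "C00031", "C00183"}
precision, recall, f1 = occurrence_scores(measured_ids, predicted_ids)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Here two of the three predictions are correct and two of the four measured metabolites are recovered, so precision is 2/3, recall is 1/2, and F1 is 4/7.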
FIGURE 4. Evaluation of predicted differential metabolite identification as appraised by precision, recall, and F1 score. Each point indicates a dataset used for evaluation. Significance was assessed with pairwise Wilcoxon signed-rank tests (∗∗p < 0.05).
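The paired comparison of pipelines across datasets can be sketched as an exact two-sided Wilcoxon signed-rank test. This is a hedged illustration only: the function `wilcoxon_signed_rank_exact` is a hypothetical helper written for this example, it enumerates all sign assignments (so it suits only a small number of datasets), and the F1 values are made up rather than taken from the study. The record does not state which software the authors used.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W+, p-value). Zero differences are dropped and tied absolute
    differences receive average ranks. Enumerates 2**n sign patterns, so it
    is intended for small n only.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    if n > 20:
        raise ValueError("exact enumeration is impractical for n > 20")

    # Rank absolute differences, averaging ranks within tie groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = abs(w_plus - total / 2)

    # Under the null, every difference is equally likely to be + or -.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - total / 2) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Made-up F1 scores for two pipelines on six datasets (illustration only).
f1_ml = [0.62, 0.58, 0.71, 0.66, 0.54, 0.69]
f1_ref = [0.48, 0.51, 0.60, 0.49, 0.50, 0.55]
w_plus, p_value = wilcoxon_signed_rank_exact(f1_ml, f1_ref)
print(f"W+={w_plus} p={p_value}")
```

With six pairs that all favor one pipeline, W+ equals 21 and the exact two-sided p-value is 2/64 = 0.03125, the smallest value attainable with six pairs.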