Himel Mallick, Eric A Franzosa, Lauren J McIver, Soumya Banerjee, Alexandra Sirota-Madi, Aleksandar D Kostic, Clary B Clish, Hera Vlamakis, Ramnik J Xavier, Curtis Huttenhower.
Abstract
Microbial community metabolomics, particularly in the human gut, are beginning to provide a new route to identify functions and ecology disrupted in disease. However, these data can be costly and difficult to obtain at scale, while amplicon or shotgun metagenomic sequencing data are readily available for populations of many thousands. Here, we describe a computational approach to predict potentially unobserved metabolites in new microbial communities, given a model trained on paired metabolomes and metagenomes from the environment of interest. Focusing on two independent human gut microbiome datasets, we demonstrate that our framework successfully recovers community metabolic trends for more than 50% of associated metabolites. Similar accuracy is maintained using amplicon profiles of coral-associated, murine gut, and human vaginal microbiomes. We also provide an expected performance score to guide application of the model in new samples. Our results thus demonstrate that this 'predictive metabolomic' approach can aid in experimental design and provide useful insights into the thousands of community profiles for which only metagenomes are currently available.
Year: 2019 PMID: 31316056 PMCID: PMC6637180 DOI: 10.1038/s41467-019-10927-1
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 MelonnPan is a predictive model inferring microbial community metabolite features from amplicons or metagenomes. a The MelonnPan model can be trained to infer metabolite profiles for a particular microbial community type given training data consisting of paired metagenomes (X) and metabolomes (Y) from the environment of interest. The model is fit beginning with microbial sequence features derived from the training metagenomes. It uses an elastic net regularized regression, per metabolite, to identify a minimal set of microbial features whose abundances predict that metabolite. These individual learners are first checked using cross-validation, and poorly fit metabolites (Spearman correlation coefficient between measured and predicted metabolite abundances across samples <0.3) are flagged. b The sequence features' coefficients (W) for the remaining, well-predicted metabolites are saved and can be applied to new metagenomes to predict the associated metabolite features (in units of relative abundance), which can be utilized for downstream epidemiological analysis.
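The filtering and prediction steps described in Fig. 1 can be illustrated with a minimal sketch. This is not MelonnPan's own code: the function names (`average_ranks`, `spearman`, `predict`, `split_by_fit`) are hypothetical, the elastic net fitting itself is omitted, and the coefficients passed to `predict` are assumed to come from an already fitted per-metabolite model. Only the 0.3 Spearman threshold is taken from the caption.

```python
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant profile counts as uncorrelated
    return cov / (var_a * var_b) ** 0.5


def predict(X: Sequence[Sequence[float]], w: Sequence[float],
            intercept: float = 0.0) -> List[float]:
    """Apply saved coefficients w to new feature profiles (rows = samples)."""
    return [intercept + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def split_by_fit(measured: Dict[str, Sequence[float]],
                 predicted: Dict[str, Sequence[float]],
                 threshold: float = 0.3) -> Tuple[List[str], List[str]]:
    """Separate well-predicted metabolites from those flagged as poorly fit."""
    kept, flagged = [], []
    for name in measured:
        r = spearman(measured[name], predicted[name])
        (flagged if r < threshold else kept).append(name)
    return kept, flagged
```

In this sketch, `predicted` would hold cross-validated predictions for each metabolite, and only the coefficients of metabolites in `kept` would be saved for use on new samples.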
Fig. 2 MelonnPan accurately predicts metabolite features based on metagenomic sequence profiles. a From a panel of 466 metabolites whose identities were confirmed against laboratory standards, these 50 were the best predicted by MelonnPan with unique Human Metabolome Database (HMDB) identifiers (as measured by Spearman correlation (r) of predicted versus measured profiles in the Netherlands inflammatory bowel disease (NLIBD) independent validation cohort). All metabolites shown have r > 0.3 over a total n = 65 NLIBD samples. Two representative metabolites are shown in red. b Measured and predicted metabolite profiles across a single representative sample were strongly and significantly associated across 107 labelled metabolites (including both unique and non-unique compounds that were well predicted by MelonnPan). c Representative significant prediction for cholestenone in the test set. d Representative statistically significant prediction for pantothenate in the NLIBD validation data (values are relative abundances). See Supplementary Figs. 1–2 and Supplementary Data 1–6 for full results. For each scatter plot, the best fitting regression line is also shown (in red).
Fig. 3 MelonnPan reveals biologically meaningful functional relationships. a Statistically significant gene sets (genera) (Q < 0.25) enriched in the MelonnPan predictive gene list, as identified by the permutation-based Kolmogorov–Smirnov (KS) test (based on 100,000 null permutations). The bars in the x axis indicate the logarithm of P values calculated as the fraction of permutation values that are at least as extreme as the original KS statistic derived from the non-permuted data. Numbers in the parentheses indicate the size of the gene sets. b Statistically significant over-representation of uncharacterized gene families in MelonnPan gene set. Contingency table describing the relationship between class membership in Pfam database and metabolite predictiveness reveals enrichment of uncharacterized proteins in the metabolite prediction process.
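The permutation P value defined in Fig. 3a (the fraction of permuted statistics at least as extreme as the observed one) can be sketched as follows. This is an illustration under stated assumptions, not the authors' enrichment procedure: it uses a generic two-sample KS statistic with shuffled group labels, and the names `ks_statistic` and `permutation_p_value`, the default of 1,000 permutations, and the seed are all hypothetical choices.

```python
import random
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    best = 0.0
    for x in sorted(set(a_sorted) | set(b_sorted)):
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        best = max(best, abs(cdf_a - cdf_b))
    return best


def permutation_p_value(a: Sequence[float], b: Sequence[float],
                        n_permutations: int = 1000, seed: int = 0) -> float:
    """Fraction of label-shuffled KS statistics >= the observed statistic."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        # Re-split the shuffled pool into groups of the original sizes.
        stat = ks_statistic(pooled[:len(a)], pooled[len(a):])
        if stat >= observed:
            hits += 1
    return hits / n_permutations
```

With this definition the smallest nonzero P value is 1 divided by the number of permutations, which is why a large number of null permutations (100,000 in the caption) is needed to resolve small P values.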
Fig. 4 Predicted and measured metabolite profiles reveal similar global structure in the inflammatory bowel disease (IBD) microbiome. Principal coordinate analysis (PCoA) of top 50 well-predicted unique metabolite clusters whose identities were confirmed against laboratory standards, using a Spearman distance matrix. Metabolites in the PCoA plot are coloured by their labels and shaped by whether they were measured or predicted, and for each metabolite, the measured and predicted abundances are connected by edges, demonstrating closeness of measured and predicted metabolites across 65 Netherlands IBD (NLIBD) samples in the two-dimensional multidimensional scaling (MDS) space.