James T Morton, Alexander A Aksenov, Louis Felix Nothias, James R Foulds, Robert A Quinn, Michelle H Badri, Tami L Swenson, Marc W Van Goethem, Trent R Northen, Yoshiki Vazquez-Baeza, Mingxun Wang, Nicholas A Bokulich, Aaron Watters, Se Jin Song, Richard Bonneau, Pieter C Dorrestein, Rob Knight.
Abstract
Integrating multiomics datasets is critical for microbiome research; however, inferring interactions across omics datasets poses multiple statistical challenges. We solve this problem by using neural networks (https://github.com/biocore/mmvec) to estimate the conditional probability that each molecule is present given the presence of a specific microorganism. Using known environmental (desert soil biocrust wetting) and clinical (cystic fibrosis lung) examples, we show that the method recovers microbe-metabolite relationships, and we demonstrate how it can discover relationships between microbially produced metabolites and inflammatory bowel disease.
Year: 2019 PMID: 31686038 PMCID: PMC6884698 DOI: 10.1038/s41592-019-0616-3
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
Figure 1: Input data types and mmvec neural network architecture. (a) The neural network architecture, where the input layer represents one-hot encodings of N microbes and the output layer represents the proportions of M metabolites. U corresponds to microbial vectors and V corresponds to metabolite vectors. (b) The pipeline for training mmvec. The objective behind mmvec is to predict metabolite abundances (y) given a single input microbe sequence (x), represented as a one-hot encoding. This training procedure estimates the conditional probabilities of observing a metabolite given the input microbe sequence. Cross-validation can be performed on hold-out samples to assess overfitting.
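The architecture in panel (a) can be illustrated with a minimal NumPy sketch. It assumes the conditional probabilities are a softmax over the inner products of the microbial vectors U and the metabolite vectors V; bias terms, priors, and the training loop of the actual mmvec package are omitted, and the function name, matrix sizes, and random values are purely illustrative.

```python
import numpy as np

def conditional_probabilities(U, V):
    """Softmax over metabolites of the microbe-metabolite inner products."""
    logits = U @ V.T                                     # shape (N microbes, M metabolites)
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilise the exponentials
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
N, M, latent_dim = 4, 6, 3            # illustrative sizes, not from the paper
U = rng.normal(size=(N, latent_dim))  # hypothetical microbial vectors
V = rng.normal(size=(M, latent_dim))  # hypothetical metabolite vectors

# A one-hot input x selects a single row of U.
x = np.zeros(N)
x[2] = 1.0

probs = conditional_probabilities(U, V)                     # all microbes at once
y_hat = conditional_probabilities((x @ U)[None, :], V)[0]   # the one microbe picked by x
```

Each row of `probs` is a distribution over the M metabolites for one microbe, and `y_hat` equals the row selected by the one-hot input.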
Figure 2: Simulation benchmarks. (a) Absolute abundances of microbes and metabolites simulated from differential equations derived in [27] for a specific spatial point. (b) Proportions of the abundances shown in (a). (c) F1 score, precision and recall curves comparing mmvec to Pearson, Spearman, SparCC, SPIEC-EASI, and proportionality metrics phi and rho across the top 100 metabolites for each microbe. (d) Comparison of coefficients learned from absolute abundances and relative abundances for all of the benchmarked methods.
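The kind of top-k scoring summarised in panel (c) can be sketched as follows. This is an assumed evaluation scheme, not the paper's benchmarking code: for each microbe the k highest-scoring metabolites are treated as predicted interactions and compared with a binary ground-truth matrix. The function name and toy data are hypothetical.

```python
import numpy as np

def top_k_scores(scores, truth, k):
    """Precision, recall and F1 when each row's k highest scores are predictions."""
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(truth, dtype=bool)
    top = np.argsort(-scores, axis=1)[:, :k]         # k best metabolites per microbe
    predicted = np.zeros_like(truth)
    np.put_along_axis(predicted, top, True, axis=1)  # mark the predicted interactions
    tp = np.sum(predicted & truth)                   # true positives
    precision = tp / predicted.sum()
    recall = tp / truth.sum()
    f1 = 0.0 if tp == 0 else 2 * precision * recall / (precision + recall)
    return float(precision), float(recall), float(f1)

# Toy example: one microbe, four metabolites, two true interactions.
scores = [[0.9, 0.1, 0.8, 0.3]]
truth = [[1, 0, 0, 1]]
precision, recall, f1 = top_k_scores(scores, truth, k=2)
```

Sweeping k over a range of values would trace out curves of the kind shown in the panel.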
Figure 3: M. vaginatus released metabolites after the biocrust wetting event. (a) Comparison of M. vaginatus metabolite interactions estimated by Spearman correlation and mmvec (n=19 samples). All of the experimentally validated M. vaginatus released metabolites are labeled. All metabolites with contradicting findings between the wetting experiment and the in vitro experimental results are highlighted in red. Points are resized according to the −10 log(p-value) obtained from Spearman correlation. Dashed lines mark a Spearman correlation of zero and a conditional log probability of zero. Here a zero log conditional probability represents the conditional probability of the average metabolite, because all log probabilities are mean centered. (b) Benchmarks comparing the detection rate of the experimentally validated molecules across different statistical methodologies. (c) M. vaginatus proportions and (d) 4-guanidinobutanoate proportions following a wetting event.
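The mean centering mentioned in panel (a) can be made concrete with a short sketch. It assumes centering is applied per microbe across metabolites on the log scale, so that zero corresponds to the average metabolite; the function name and probability values are made up for illustration.

```python
import numpy as np

def center_log_probabilities(probs):
    """Log conditional probabilities, mean centered across metabolites per microbe."""
    log_probs = np.log(np.asarray(probs, dtype=float))
    return log_probs - log_probs.mean(axis=1, keepdims=True)

# Two hypothetical microbes (rows) and three metabolites (columns).
probs = np.array([[0.5, 0.25, 0.25],
                  [0.1, 0.6, 0.3]])
centered = center_log_probabilities(probs)
```

Positive entries of `centered` mark metabolites that are more probable than the average metabolite for that microbe, and negative entries mark less probable ones.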
Figure 4: Investigation of P. aeruginosa-associated molecules. (a) Biplot drawn from the mmvec conditional probabilities estimated for the cystic fibrosis dataset [27]. Arrows represent microbes and dots represent metabolites. The x and y axes represent principal components from the SVD of the microbe-metabolite conditional probabilities estimated from mmvec (n=138 samples). Distances between points quantify co-occurrence strength between metabolites, with small distances indicating metabolites that have a high probability of co-occurring. Distances between arrow tips quantify co-occurrence strength between microbes. The directionality of the arrows can be used to pinpoint which microbes can explain the metabolite co-occurrence patterns. Arrows highlighted in green correspond to putative cystic fibrosis pathogens and yellow arrows highlight known anaerobes. Only known molecules produced by P. aeruginosa are labeled. (b) Scatter plot of molecules with respect to the oxygen gradient differential and the first principal component learned from mmvec (n=442 molecules), with a linear regression model and 95% confidence interval for the regression estimate. (c) The first principal component vs the number of samples in which the taxon was the most abundant taxon. (d) Heatmap of P. aeruginosa and Streptococcus abundances in samples where they are the most abundant species. (e) Heatmap of the top 100 molecules that co-occur with P. aeruginosa and Streptococcus.
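The biplot construction in panel (a) can be sketched with a plain SVD. This assumes the input is a microbes-by-metabolites matrix of centered log conditional probabilities and that singular values are folded into the microbe coordinates; that scaling convention, the function name, and the random input are assumptions for illustration rather than details taken from mmvec.

```python
import numpy as np

def biplot_coordinates(centered_log_probs, n_components=2):
    """Microbe (arrow) and metabolite (point) coordinates from a truncated SVD."""
    X = np.asarray(centered_log_probs, dtype=float)
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    microbe_coords = u[:, :n_components] * s[:n_components]  # one row per microbe
    metabolite_coords = vt[:n_components].T                  # one row per metabolite
    return microbe_coords, metabolite_coords

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 8))            # stand-in for 5 microbes x 8 metabolites
X = X - X.mean(axis=1, keepdims=True)  # mean center across metabolites
arrows, points = biplot_coordinates(X, n_components=2)
```

The first two columns of `arrows` and `points` would supply the x and y axes of such a plot; keeping all components reproduces the input matrix exactly.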
Figure 5: Microbe/metabolite co-occurrences across a study of HCC progression in the context of innate immunity in a mouse model [28]. (a) Visualization of microbial co-occurrence patterns, where distances between points approximate the Aitchison distance between microbes, which quantifies microbial co-occurrences. Small distances are indicative of microbes with a high probability of co-occurring. Microbes are colored according to their association with HFD, which was estimated using differential abundance analysis via multinomial regression. (b) Emperor [59] biplot of microbe-metabolite interactions, with metabolites colored according to their association with HFD. HFD association was estimated through differential abundance analysis via multinomial regression. Distances between points approximate Aitchison distances between metabolites and distances between arrow tips approximate Aitchison distances between microbes. Several Clostridium spp. appear to co-occur with the new bile acid molecule cholate phenylalanine amidate, also referred to as Phe conjugated cholic acid.
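For reference, the Aitchison distance named in this caption is the Euclidean distance between centered log-ratio (clr) transformed compositions. The sketch below shows that standard definition on toy vectors; it assumes strictly positive entries (zero handling is not addressed) and is not taken from the mmvec code.

```python
import numpy as np

def clr(x):
    """Centered log-ratio transform of a strictly positive vector."""
    log_x = np.log(np.asarray(x, dtype=float))
    return log_x - log_x.mean()

def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of x and y."""
    return float(np.linalg.norm(clr(x) - clr(y)))

x = np.array([1.0, 2.0, 4.0])  # toy composition
y = np.array([1.0, 1.0, 1.0])  # uniform composition
d = aitchison_distance(x, y)
```

Because the clr transform removes any common scale factor, the distance is unchanged when either vector is multiplied by a constant, which is why it suits proportions.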
Figure 6: Microbe-metabolite interactions of the human microbiome in association with IBD samples [29]. (a) Heatmap visualization of the inferred conditional probabilities for various bile acids given the presence of Klebsiella, Roseburia and Clostridium bolteae. (b) Heatmap visualization of the inferred conditional probabilities for the carnitines given the presence of Klebsiella, Roseburia, and Clostridium bolteae. (c) Multiomics biplot of the microbe-metabolite interactions learned from metagenomics profiles and C18 negative ion mode LC-MS. Microbes (arrows) and metabolites (spheres) are colored according to their differentials estimated from multinomial regression. Klebsiella spp. appear to be strongly associated with IBD, while Propionibacterium spp. have a strong negative association. (d) Network of the top 300 edges, where only the edges that contain Klebsiella and Propionibacteriaceae are visualized.