H Robert Frost, Anne G Hoen, Quang P Nguyen, Margaret R Karagas, Juliette C Madan, Erika Dade, Thomas J Palys, Hilary G Morrison, Wimal W Pathmasiri, Susan McRitche, Susan J Sumner.
Abstract
BACKGROUND: The infant intestinal microbiome plays an important role in metabolism and immune development with impacts on lifelong health. The linkage between the taxonomic composition of the microbiome and its metabolic phenotype is undefined and complicated by redundancies in the taxon-function relationship within microbial communities. To inform a more mechanistic understanding of the relationship between the microbiome and health, we performed an integrative statistical and machine learning-based analysis of microbe taxonomic structure and metabolic function in order to characterize the taxa-function relationship in early life.
Keywords: Functional redundancy; Infant gut microbiome; Metabolism; Prediction models; Stool metabolome
Year: 2021 PMID: 34454437 PMCID: PMC8400760 DOI: 10.1186/s12866-021-02282-3
Source DB: PubMed Journal: BMC Microbiol ISSN: 1471-2180 Impact factor: 3.605
Fig. 1 Overview of the analysis. Panel A describes the subject selection workflow and panel B describes the analytic pipeline
Selected characteristics of subjects contributing samples at 6 weeks (n = 158) and at 12 months of age (n = 282)
| | 6 weeks | 12 months |
|---|---|---|
| Birthweight (grams) | | |
| Mean (Standard Deviation) | 3370 (480) | 3430 (528) |
| Median [Minimum, Maximum] | 3430 [1910, 4710] | 3450 [1320, 4660] |
| Missing | 2 (1.3%) | 4 (1.4%) |
| Sex | | |
| Male | 85 (53.8%) | 159 (56.4%) |
| Female | 73 (46.2%) | 123 (43.6%) |
| Feeding Mode Until Time of Sample Collection | | |
| Unknown | 6 (3.8%) | 7 (2.5%) |
| Exclusively breastfed | 98 (62%) | 99 (35.1%) |
| Exclusively formula fed | 13 (8.2%) | 9 (3.2%) |
| Mixed | 41 (25.9%) | 167 (59.2%) |
| Delivery Mode | | |
| Vaginal | 114 (72.2%) | 200 (70.9%) |
| Cesarean | 44 (27.8%) | 82 (29.1%) |
| Gestational Age (Weeks) | | |
| Mean (Standard Deviation) | 39.1 (1.59) | 39.0 (1.70) |
| Median [Minimum, Maximum] | 39.1 [33.4, 43.0] | 39.1 [29.1, 42.0] |
| Post-delivery infant systemic antibiotic exposure | | |
| No | 151 (95.6%) | 274 (97.2%) |
| Yes | 7 (4.4%) | 8 (2.8%) |
| Maternal smoking during pregnancy | | |
| No | 143 (90.5%) | 262 (92.9%) |
| Yes | 11 (7.0%) | 14 (5.0%) |
| Missing | 4 (2.5%) | 6 (2.1%) |
| Infant Race | | |
| Other | 4 (2.5%) | 13 (4.6%) |
| White | 154 (97.5%) | 269 (95.4%) |
Fig. 2 Inter-omics Procrustes biplots comparing PCoA ordinations of targeted metabolite profiles and taxonomic relative abundances at 6 weeks (left panels; n = 158) and 12 months (right panels; n = 262). Top panels present analyses based on ordinations from Euclidean distances of genus-level abundances after centered log-ratio transformation and Euclidean distances of log-transformed metabolite profiles. Bottom panels present analyses based on gUniFrac distances of amplicon sequence variant (ASV) relative abundances and Euclidean distances of log-transformed metabolite profiles. The microbiome and the metabolome (both targeted and untargeted) were significantly associated when Euclidean distances were used; however, this association was no longer observed for the targeted metabolites when the gUniFrac distance was used
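The centered log-ratio (CLR) transformation named in the Fig. 2 caption can be illustrated with a minimal sketch. It subtracts the mean log abundance from each log abundance within a sample, so that Euclidean distances depend only on the ratios between taxa. The function names, the `pseudocount` parameter used to avoid `log(0)`, and the toy abundances are assumptions for illustration and are not taken from the paper.

```python
import math


def clr(abundances, pseudocount=1e-6):
    """Centered log-ratio transform of one sample (illustrative sketch).

    pseudocount is an assumed guard against zero abundances; the caption
    does not say how zeros were handled.
    """
    logs = [math.log(a + pseudocount) for a in abundances]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log is equivalent to dividing by the geometric mean.
    return [v - mean_log for v in logs]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy genus-level abundances (made up): sample_b has the same proportions
# as sample_a at half the total count.
sample_a = [0.50, 0.30, 0.20]
sample_b = [0.25, 0.15, 0.10]

clr_a = clr(sample_a, pseudocount=0.0)
clr_b = clr(sample_b, pseudocount=0.0)
distance = euclidean(clr_a, clr_b)
```

Because the two toy samples differ only in total abundance, their CLR vectors coincide and the distance is zero up to floating-point error, which is the scale invariance that makes CLR suitable for compositional data.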
Fig. 3 Pairwise Spearman correlations between concentration-fitted metabolites and genus-level taxonomic abundances for 6-week (panel A, N = 158) and 12-month (panel B, N = 282) infants. The left panel displays the overall correlation pattern; only significant correlations (false discovery rate (FDR)-controlled q-value < 0.05) are colored. The right panel displays the same heatmap restricted to taxa and metabolites selected by the sparse CCA (sCCA) procedure. The correlation coefficient of the first sCCA variate pair, its bootstrapped 95% confidence interval, and the permutation p-value are also reported. Significant microbiome-metabolome correlation was observed at both time points; however, no significant difference was found between the time points
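The FDR-controlled q-values used to decide which correlations are colored in Fig. 3 can be sketched as follows. This assumes the Benjamini-Hochberg procedure, which the caption does not name, so it is one plausible choice and not necessarily the authors' method. The function name and the example p-values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order.

    Assumption: the FDR control in the caption is Benjamini-Hochberg.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Made-up p-values for four taxon-metabolite pairs.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = bh_qvalues(example_p)
significant = [q < 0.05 for q in example_q]
```

A cell of the heatmap would then be colored only where the corresponding entry of `significant` is true.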
Fig. 4 Forest plots of each prediction performance metric (R-squared, panel A; Spearman correlation, panel B) for each time point (6 weeks (n = 158), 12 months (n = 282)) across all 36 metabolites and 4 machine learning models. The 95% credible intervals and predictive posterior means were generated using Bayesian modelling of the evaluation statistic (Methods) after 100 repeats of 5-fold nested cross-validation. Red vertical lines indicate a value of 0 for the evaluation metric (equivalent to the null model). Metabolites were classified as predictable if the null value did not lie within the estimated 95% credible interval. For most metabolites, predictive performance was not significantly better than that of the null models
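The nested cross-validation mentioned in the Fig. 4 caption can be sketched as an index-splitting routine: an outer loop holds out test folds for performance estimation, and an inner loop re-splits each outer training set for model tuning. The function names, the inner fold count of 5, and the seed are assumptions; the caption gives only the 5-fold nested design and the 100 repeats, and model fitting is omitted here.

```python
import random


def kfold(indices, k, rng):
    """Shuffle indices and deal them into k roughly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for one repeat.

    inner_k=5 is an assumption; inner_splits is a list of
    (inner_train, inner_validation) pairs drawn only from outer_train.
    """
    rng = random.Random(seed)
    outer_folds = kfold(range(n_samples), outer_k, rng)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        inner_folds = kfold(outer_train, inner_k, rng)
        inner_splits = []
        for a, inner_val in enumerate(inner_folds):
            inner_train = [idx for b, fold in enumerate(inner_folds)
                           if b != a for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


# One repeat for the 6-week cohort size reported in the caption (n = 158).
splits = list(nested_cv_splits(158))
```

Repeating the procedure with different seeds, for example `nested_cv_splits(158, seed=r)` for `r` in `range(100)`, would mirror the 100 repeats; the held-out outer folds never enter the inner tuning loop, which is what keeps the performance estimate unbiased by hyperparameter selection.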
Fig. 5 Comparative analysis of predictive model performance across all metabolites in the targeted dataset for both the 6-week (n = 158) and 12-month (n = 282) time points. Top panel shows superimposed boxplots and violin plots of the distribution of the predictive posterior mean for each evaluation metric across all 36 metabolites. Bottom panels show aggregated model rankings for all metabolites using R-squared (left) and Spearman correlation (right) using Borda scores (Methods). Higher scores indicate that a model was consistently ranked as better performing. Relatively similar Borda scores and cross-metabolite average predictive performances indicate that no model was clearly the most performant. However, the support vector machine (with radial basis function kernel) was the highest-scoring model
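The Borda-score aggregation in the bottom panels of Fig. 5 can be illustrated with a basic Borda count: within each metabolite, models are ranked by performance and receive points equal to the number of models they outperform, and points are summed across metabolites. The exact variant used by the authors is described in their Methods and may differ. The function name, the model labels other than the SVM, and all performance values below are hypothetical.

```python
def borda_scores(performance):
    """Sum Borda points per model across metabolites.

    performance maps metabolite -> {model: metric}, higher is better.
    Ties are broken by dictionary insertion order (a simplification).
    """
    totals = {}
    for scores in performance.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        n_models = len(ranked)
        for position, model in enumerate(ranked):
            # Best model earns n_models - 1 points, worst earns 0.
            totals[model] = totals.get(model, 0) + (n_models - 1 - position)
    return totals


# Made-up metric values for two metabolites and three models;
# "model_b" and "model_c" are placeholder names.
toy_performance = {
    "metabolite_1": {"svm_rbf": 0.30, "model_b": 0.20, "model_c": 0.10},
    "metabolite_2": {"svm_rbf": 0.05, "model_b": 0.15, "model_c": 0.10},
}
toy_totals = borda_scores(toy_performance)
```

In this toy input, `model_b` is ranked second and first, so it accumulates more points than `svm_rbf`, which is ranked first and last. Totals that sit close together, as the caption reports, indicate that no model dominates across metabolites.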