Yutao Chen1,2, Hongchao Wang1,2, Wenwei Lu1,2, Tong Wu1,2, Weiwei Yuan1,2, Jinlin Zhu1,2, Yuan Kun Lee3,4, Jianxin Zhao1,2, Hao Zhang1,2,5,6, Wei Chen1,2,3.
Abstract
The human gut microbiome is a complex ecosystem that is closely related to the aging process. However, there is currently no reliable method that makes full use of gut microbiome metagenomics data to determine the age of the host. In this study, we considered the influence of geographical factors on the gut microbiome and used a total of 2604 filtered gut microbiome metagenomic samples to construct an age prediction model. We then developed an ensemble model with multiple heterogeneous algorithms and combined species and pathway profiles for multi-view learning. By integrating gut microbiome metagenomics data and adjusting for host confounding factors, the model showed high accuracy (R2 = 0.599, mean absolute error = 8.33 years). We further interpreted the model and identified potential biomarkers for the aging process. Among these biomarkers, Finegoldia magna, Bifidobacterium dentium, and Clostridium clostridioforme had increased abundance in the elderly. Moreover, the utilization of amino acids by the gut microbiome changes substantially with increasing age, and such changes have been reported as risk factors for age-associated malnutrition and inflammation. This model will be helpful for the comprehensive utilization of multiple omics data and will allow a greater understanding of the interaction between microorganisms and age, supporting targeted intervention in aging.
Keywords: Gut microbiome; age; ensemble; machine learning; metagenomic; multi-view; regression
Year: 2022 PMID: 35040752 PMCID: PMC8773134 DOI: 10.1080/19490976.2021.2025016
Source DB: PubMed Journal: Gut Microbes ISSN: 1949-0976
Figure 1. Illustration of the stacking ensemble structure: (a) the first stage of the model ensemble; (b) the second stage of the model ensemble.
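The two-stage structure named in the Figure 1 caption can be illustrated with a toy sketch. This is not the authors' model: the base learner (a one-feature least-squares line per view), the second-stage blending rule (a single weight between two views), and all data values below are assumptions chosen only to show how out-of-fold predictions from a first stage feed a second stage.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def out_of_fold(xs, ys, k=5):
    """Stage 1: predict every sample with a model that never saw it."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, n, k):  # held-out samples of this fold
            preds[i] = a + b * xs[i]
    return preds

def blend_weight(p1, p2, ys):
    """Stage 2: weight w in [0, 1] minimising sum((w*p1 + (1-w)*p2 - y)^2)."""
    num = sum((y - b) * (a - b) for a, b, y in zip(p1, p2, ys))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    return min(1.0, max(0.0, num / den)) if den else 0.5

# Hypothetical data: one feature per "view" (species, pathway) for 10 hosts.
ages = [22, 35, 41, 58, 63, 70, 29, 47, 52, 66]
species_view = [0.1 * a for a in ages]          # perfectly informative view
pathway_view = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]   # uninformative view

p_species = out_of_fold(species_view, ages)
p_pathway = out_of_fold(pathway_view, ages)
w = blend_weight(p_species, p_pathway, ages)
stacked = [w * a + (1 - w) * b for a, b in zip(p_species, p_pathway)]
```

Fitting the second stage on out-of-fold predictions rather than in-sample ones keeps it from rewarding a base learner that merely memorised its training data.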
Figure 2. Overview of sample data and the association between host factors and age. (a) The sampling area (regrouped to subregion level) and sample size of the gut microbiome metagenomic data used in this study. (b) The effect of host factors on microbiome species and pathway composition, computed using Adonis. (c) The association between the country and age factors, computed using linear regression and random forest models. (d) The extent of influence of each subregion on the age distribution, sorted by feature importance score. (e) The effect between the sampling subregion and the age factor of the sample subset during each screening epoch. (f) The effect between the country/subregion and age factors of the selected sample subsets, computed by different algorithms. The performance of all the above models was evaluated by 10 repeats of 5-fold cross-validation.
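The evaluation scheme mentioned at the end of the Figure 2 caption, read here as 5-fold cross-validation repeated 10 times with a fresh shuffle each time, can be sketched as follows. The function name, the seed, and the sample count are illustrative assumptions, not details from the study.

```python
import random

def repeated_kfold(n_samples, n_splits=5, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) for every fold of every repeat."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)                   # new random partition per repeat
        for fold in range(n_splits):
            test = order[fold::n_splits]     # every n_splits-th shuffled index
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test

# Hypothetical sample count, chosen to show uneven fold sizes.
splits = list(repeated_kfold(n_samples=23))
```

Each repeat tests every sample exactly once, so the 50 resulting scores can be averaged to give a more stable estimate than a single 5-fold run.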
Figure 3. Performance of the age prediction model based on the species composition of the gut microbiome. (a) The ability of different machine learning algorithms to predict age from species composition (evaluation metrics are R2 and MAE). (b) The impact of different feature selection algorithms on the performance of different models (filtered by age prediction ability; evaluation metric is R2). (c) The influence of the extra subregion feature on the performance of age prediction (based on the species composition after feature selection).
Figure 4. Performance of the age prediction model based on the pathway composition of the gut microbiome. (a) The ability of different machine learning algorithms to predict age from pathway composition (evaluation metrics are R2 and MAE). (b) The impact of different feature selection algorithms on the performance of different models (filtered by age prediction ability; evaluation metric is R2). (c) The influence of the extra subregion feature on the performance of age prediction (based on the pathway composition after feature selection).
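The two evaluation metrics named in the Figure 3 and Figure 4 captions, R2 (coefficient of determination) and MAE (mean absolute error), follow their standard definitions. A minimal sketch, using made-up ages rather than any data from the study:

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical true and predicted ages in years.
true_age = [20, 40, 60, 80]
predicted_age = [30, 40, 50, 80]

mae = mean_absolute_error(true_age, predicted_age)  # 5.0 years
r2 = r2_score(true_age, predicted_age)              # 0.9
```

MAE is expressed in the same unit as the target (years), while R2 is unitless and compares the model against always predicting the mean age.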
Figure 5. The predictive accuracy of the modeling methods. (a) The accuracy of the individual and ensemble models on the different datasets. (b) The impact of different data fusion methods on model prediction performance. (c) The predictive accuracy of different weighting methods on the extended dataset. There are significant differences in the prediction performance between the unlabeled groups. (d) Scatter plot of the true age and the age predicted by the ensemble model (based on feature-selected species and pathway data with extra subregion information). Origin, dataset after feature selection only; Extended, dataset after feature selection with extra subregion label. Paired Wilcoxon rank-sum test was used to analyze the difference between each group of data.
Figure 6. Aging-associated biomarkers have a significant effect on model prediction performance. (a) Top 20 biomarkers with the highest effect on prediction performance in the ensemble model for age prediction. (b) Top 8 most affected microbial species. (c) Top 8 most affected microbial pathways. Correlation between species/pathways and age is shown in terms of Spearman's rho (ρ). All p-values are adjusted for multiple comparisons using the Bonferroni correction, and a spline fit to the data is also shown (blue curve).
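The two statistics named in the Figure 6 caption can be sketched in a few lines: Spearman's rho as the Pearson correlation of average ranks, and the Bonferroni correction as multiplying each p-value by the number of tests (capped at 1). The example values are invented for illustration, and the raw p-values are taken as given rather than computed.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1   # mean of ranks start+1 .. end+1
        for j in range(start, end + 1):
            ranks[order[j]] = shared
        start = end + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical host ages and relative abundances of one species.
age = [25, 32, 47, 51, 68, 74]
abundance = [0.1, 0.1, 0.3, 0.2, 0.6, 0.9]
rho = spearman_rho(age, abundance)

# Hypothetical raw p-values for three tested features.
adjusted = bonferroni([0.01, 0.04, 0.5])
```

Because rho depends only on ranks, it captures monotonic trends with age even when the relationship is not linear, which fits a setting where a spline rather than a straight line is drawn through the data.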