Yang Liu, Guillaume Méric, Aki S Havulinna, Shu Mei Teo, Fredrik Åberg, Matti Ruuskanen, Jon Sanders, Qiyun Zhu, Anupriya Tripathi, Karin Verspoor, Susan Cheng, Mohit Jain, Pekka Jousilahti, Yoshiki Vázquez-Baeza, Rohit Loomba, Leo Lahti, Teemu Niiranen, Veikko Salomaa, Rob Knight, Michael Inouye.
Abstract
The gut microbiome has shown promise as a predictive biomarker for various diseases. However, the potential of gut microbiota for prospective risk prediction of liver disease has not been assessed. Here, we utilized shallow shotgun metagenomic sequencing of a large population-based cohort (N > 7,000) with ∼15 years of follow-up in combination with machine learning to investigate the predictive capacity of gut microbial predictors individually and in conjunction with conventional risk factors for incident liver disease. Separately, conventional and microbial factors showed comparable predictive capacity. However, microbiome augmentation of conventional risk factors using machine learning significantly improved the performance. Similarly, disease-free survival analysis showed significantly improved stratification using microbiome-augmented models. Investigation of predictive microbial signatures revealed previously unknown taxa for liver disease, as well as those previously associated with hepatic function and disease. This study supports the potential clinical validity of gut metagenomic sequencing to complement conventional risk factors for prediction of liver diseases.
Keywords: disease; gut; liver disease; metagenomics; microbiome; microbiota; prediction
Year: 2022 PMID: 35354069 PMCID: PMC9097589 DOI: 10.1016/j.cmet.2022.03.002
Source DB: PubMed Journal: Cell Metab ISSN: 1550-4131 Impact factor: 31.373
Baseline characteristics of study population
| | Female | Male |
|---|---|---|
| n = 7,115 | 55% | 45% |
| Age | 49.69 [38.05, 58.78] | 51.92 [40.54, 60.70] |
| Body mass index (kg/m²) | 25.90 [23.09, 29.47] | 26.9 [24.55, 29.58] |
| Waist-hip ratio | 0.84 [0.80, 0.88] | 0.97 [0.92, 1.01] |
| Smoking | 19% | 28% |
| Pure alcohol consumption (g/week) | 18.9 [2.7, 55.8] | 75.9 [20.7, 168.3] |
| HDL cholesterol (mmol/L) | 1.59 [1.35, 1.89] | 1.30 [1.10, 1.53] |
| LDL cholesterol (mmol/L) | 3.19 [2.65, 3.76] | 3.46 [2.89, 4.09] |
| Triglycerides (mmol/L) | 1.07 [0.80, 1.45] | 1.36 [0.97, 1.97] |
| Gamma-glutamyl transferase (U/L) | 19 [15, 27] | 30 [21, 46] |
| Alanine aminotransferase (U/L) | 18 [14, 24] | 27 [20, 37] |
| Aspartate aminotransferase (U/L) | 23.5 [20, 28] | 28.0 [24, 33] |
Median [IQR] for continuous variables; n% for categorical variables.
Available for 6,211 persons.
Figure 1. Machine learning framework for predicting incident liver disease
Figure 2. Comparison of approaches for the prediction of incident liver disease using gut microbial features
(A) For the prediction of any liver disease, the gradient boosting classifier outperformed logistic regression and ridge regression across different taxonomic levels.
(B) For the prediction of alcoholic liver disease, similar trends were observed. For comparison, a conventional prediction model is shown in red. Error bars represent mean and IQR (STAR Methods). Horizontal dashed lines mark the mean performance of conventional models.
Figure 3. Models of conventional risk factors and gut microbiome data improved the prediction of incident liver disease over conventional prediction models
(A and B) Area under the ROC curve (AUROC) for gradient boosting models using species-level gut microbiome data together with conventional risk factors (blue) or a conventional risk factor model (red), predicting (A) incident any liver disease or (B) alcoholic liver disease.
(C and D) Area under the precision-recall curve (AUPRC) for (C) any liver disease and (D) alcoholic liver disease. Error bars represent mean and IQR (STAR Methods). The difference in performance between the microbiome-augmented model and the conventional risk factor model was tested using a one-sided Wilcoxon signed-rank test. ∗p < 0.05; ∗∗p < 0.01. Horizontal dashed lines mark the mean performance of the conventional model as a reference. The bolded ROC and precision-recall curves correspond to models with AUROC and AUPRC that are closest to the mean performance reference.
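As an illustration of the paired comparison described in this caption, the following is a minimal sketch of an exact one-sided Wilcoxon signed-rank test on per-run AUROC values. It is not the authors' code: the function name `wilcoxon_signed_rank_greater` is hypothetical, the sketch assumes no zero differences and no tied absolute differences, and the AUROC numbers are made up for demonstration only and are not results from the study.

```python
def wilcoxon_signed_rank_greater(x, y):
    """Exact one-sided Wilcoxon signed-rank test (H1: x tends to exceed y).

    Assumption of this sketch: paired samples with no zero differences and
    no tied absolute differences, so the ranks are the integers 1..n.
    Returns (W_plus, p_value).
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    diffs = [a - b for a, b in zip(x, y)]
    if any(d == 0 for d in diffs):
        raise ValueError("zero differences are not handled in this sketch")
    abs_sorted = sorted(abs(d) for d in diffs)
    if len(set(abs_sorted)) != len(abs_sorted):
        raise ValueError("tied absolute differences are not handled")

    # Rank the absolute differences, then sum the ranks of positive ones.
    rank = {value: i + 1 for i, value in enumerate(abs_sorted)}
    w_plus = sum(rank[abs(d)] for d in diffs if d > 0)

    # Null distribution: each rank 1..n is positive with probability 1/2.
    # counts[s] = number of subsets of {1..n} whose elements sum to s.
    n = len(diffs)
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]

    # One-sided p-value: P(W+ >= observed) under the null.
    p_value = sum(counts[w_plus:]) / 2 ** n
    return w_plus, p_value


# Illustrative, made-up AUROC values for five paired runs.
augmented = [0.83, 0.85, 0.80, 0.86, 0.84]
conventional = [0.80, 0.81, 0.79, 0.80, 0.82]
w_plus, p_value = wilcoxon_signed_rank_greater(augmented, conventional)
print(w_plus, p_value)
```

With only five pairs, the smallest attainable one-sided p-value is 1/32, so a comparison of this kind needs enough paired runs to be able to reach the reported significance thresholds.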
Figure 4. Survival curves of predicted risk groups for incident liver disease
(A and B) Performance in the withheld validation set (30% of samples) of Cox models using conventional risk factors alone and in combination with species-level microbiome-only scores for (A) any liver disease (number of cases = 30) and (B) alcoholic liver disease (number of cases = 12). Predicted risk groups are the top 5% (risk group 1) versus the bottom 95% (risk group 2).
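The top 5% versus bottom 95% split described in this caption can be sketched as follows. This is only an assumed illustration of how predicted risk scores might be divided into the two groups: the function name `assign_risk_groups`, the rounding of the group size, and the handling of tied scores are assumptions, not details taken from the study.

```python
def assign_risk_groups(scores, top_fraction=0.05):
    """Label the highest-scoring fraction as group 1 and the rest as group 2.

    Assumptions of this sketch: a higher score means higher predicted risk,
    the group size is rounded to the nearest integer (at least 1), and ties
    at the cut-off are broken by input order.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if not 0 < top_fraction < 1:
        raise ValueError("top_fraction must be strictly between 0 and 1")

    n_top = max(1, round(top_fraction * len(scores)))
    # Indices of samples sorted from highest to lowest predicted risk.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    groups = [2] * len(scores)
    for i in order[:n_top]:
        groups[i] = 1
    return groups


# Illustrative usage with made-up scores for 100 individuals.
example_scores = [i / 100 for i in range(100)]
example_groups = assign_risk_groups(example_scores)
print(example_groups.count(1), example_groups.count(2))
```

The resulting group labels could then be passed to a survival analysis routine to compare disease-free survival between the two groups.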
Figure 5. Predictive microbial taxa for liver disease
A bacterial taxonomy tree (phylum to family level) whose members at lower ranks showed predictive signal for incident liver disease. For full taxonomy, see Figure S4.
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Human stool samples | FINRISK 2002 Cohort | |
| Human stool samples | UCSD NAFLD Cohort (PI: Rohit Loomba) | UCSD IRB #140084 UCSD IRB #111282 |
| Raw metagenomic data | FINRISK 2002 Cohort | EGAS00001005020 |
| Raw metagenomic data | UCSD NAFLD Cohort | EGAS00001004600 |
| FINRISK 2002 individual level data | FINRISK 2002 Cohort | |
| GTDB release 89 | Genome Taxonomy Database | |
| Centrifuge v.1.0.4 | | |
| XGBoost | | |
| mlrMBO | | |
| Caret | | |
| Glmnet | | |
| Survival | | |
| Illumina HiSeq4000 | Illumina | |