Jose Liñares-Blanco, Carlos Fernandez-Lozano, Jose A Seoane, Guillermo López-Campos.
Abstract
Inflammatory bowel disease (IBD) is a chronic disease with unknown pathophysiological mechanisms. There is evidence that microorganisms play a role in its development. Thanks to open access to multiple omics datasets, it is possible to develop predictive models that can prognosticate the course and development of the disease. The interpretability of these models, and the study of the variables they use, allows the identification of biological aspects of great importance in the development of the disease. In this work we generated a metagenomic signature with predictive capacity to identify IBD from fecal samples. Different machine learning models were trained, obtaining high performance measures. The predictive capacity of the identified signature was validated in two external cohorts: one containing samples from patients with ulcerative colitis and another from patients with Crohn's disease, the two major subtypes of IBD. The results obtained in this validation (AUC = 0.74 and AUC = 0.76, respectively) show that our signature generalizes to both subtypes. The study of the variables within the model, and a correlation study based on text mining, identified different genera that play an important and common role in the development of these two subtypes.
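The validation results above are reported as AUC. As a reminder of what that number measures, here is a minimal sketch of the AUC as the fraction of (positive, negative) sample pairs that a classifier ranks correctly, with ties counted as half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive sample scores
    higher than a random negative sample (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = IBD, 0 = control; scores are predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values such as 0.74 and 0.76 indicate that the signature ranks most, but not all, case-control pairs correctly.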
Keywords: Crohn's disease; feature selection; inflammatory bowel disease; machine learning; microbiome; ulcerative colitis
Year: 2022 PMID: 35663898 PMCID: PMC9157387 DOI: 10.3389/fmicb.2022.872671
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 6.064
Summary descriptives table by groups of “cohort.”
| | | | p-value |
|---|---|---|---|
| Age | 46.7 (17.8) | 45.1 (17.0) | 0.413 |
| Sex | | | 0.267 |
| Female | 50 (51.5%) | 286 (52.5%) | |
| Male | 45 (46.4%) | 251 (46.1%) | |
| Unknown | 1 (1.03%) | 8 (1.47%) | |
| Unspecified | 1 (1.03%) | 0 (0.00%) | |
| IBD | | | 0.270 |
| Control | 54 (55.7%) | 267 (49.0%) | |
| IBD | 43 (44.3%) | 278 (51.0%) | |
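The source does not say which test produced the values in the last column of the table. As one plausible reading, the sketch below applies a Pearson chi-squared test with Yates continuity correction to the Control/IBD counts of the two cohort groups. The choice of test and the function name `chi2_yates_2x2` are assumptions; only the four counts come from the table.

```python
import math


def chi2_yates_2x2(a, b, c, d):
    """Chi-squared test with Yates continuity correction for the 2x2 table
    [[a, b], [c, d]]. Returns (statistic, p_value) for 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    stat = sum(max(abs(o - e) - 0.5, 0.0) ** 2 / e
               for o, e in zip(observed, expected))
    # Survival function of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Counts from the table: Control (54, 267) and IBD (43, 278).
stat, p_value = chi2_yates_2x2(54, 267, 43, 278)
print(f"chi2 = {stat:.3f}, p = {p_value:.3f}")
```

Under this assumption the p-value comes out at about 0.27, in line with the 0.270 shown in the IBD row, which suggests the case/control balance does not differ significantly between the two groups.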
Figure 1. Signature identification. (A) Training and test data available from the American Gut Project for IBD samples. (B) UpSet plot showing the features selected in common by each feature selection (FS) method. (C) Variable importance of the winning model, measured after cross-validation (CV); the 40 genera selected by the Kruskal-Wallis method are shown. Bars represent the importance of each variable within the glmnet model, computed by summing the betas of each variable over all CV iterations. (D) Comparison of model performance in train and test. The train value is the arithmetic mean over the five CV iterations, while the test result is a single measurement. Results for the glmnet and RF models are reported as AUC.
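The importance measure described for panel (C), summing each variable's beta over the CV iterations, can be sketched as follows. This is an illustration under stated assumptions: the genus names and coefficient values are hypothetical placeholders, the helper `summed_importance` is not from the paper, and the caption does not say whether signed or absolute betas were summed (signed sums are used here, ranked by absolute value).

```python
from collections import defaultdict


def summed_importance(fold_coefs):
    """Sum each variable's coefficient across CV folds and rank the
    variables by the absolute value of the summed coefficient."""
    totals = defaultdict(float)
    for coefs in fold_coefs:
        for genus, beta in coefs.items():
            totals[genus] += beta
    return sorted(totals.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical coefficients from three CV folds. A genus missing from a
# fold is treated as having a coefficient of zero in that fold.
fold_coefs = [
    {"genus_A": 0.5, "genus_B": -0.25},
    {"genus_A": 0.25, "genus_B": -0.5, "genus_C": 0.125},
    {"genus_A": 0.5},
]
ranking = summed_importance(fold_coefs)
for genus, total in ranking:
    print(f"{genus}: {total:+.3f}")
```

Because a penalized model such as glmnet can shrink a coefficient to exactly zero in some folds, a variable selected consistently across folds accumulates a larger sum than one selected only occasionally.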
Figure 2. External validation results. (A) Results obtained in each of the external validation datasets, measured in AUC. (B) ROC curve of the RF model in the two external validation datasets. (C) Genes available in each external validation cohort, with which the models were re-trained.
Figure 3. Heatmaps of the signature in external cohorts. Abundance of genera used for retraining in the cohort of (A) Gevers et al. and (B) Morgan et al. In both figures, genera are ordered according to model importance.