Gavin M Douglas, Richard Hansen, Casey M A Jones, Katherine A Dunn, André M Comeau, Joseph P Bielawski, Rachel Tayler, Emad M El-Omar, Richard K Russell, Georgina L Hold, Morgan G I Langille, Johan Van Limbergen.
Abstract
BACKGROUND: Crohn's disease (CD) has an unclear etiology, but there is growing evidence of a direct link with a dysbiotic microbiome. Many gut microbes have previously been associated with CD, but these associations have mainly been confounded with patients' ongoing treatments. Additionally, most analyses of CD patients' microbiomes have focused on microbes in stool samples, which yield different insights than profiling biopsy samples.
Keywords: Crohn’s disease; Machine learning; Microbiome; Pediatric; Treatment response; Treatment-naïve
Year: 2018 PMID: 29335008 PMCID: PMC5769311 DOI: 10.1186/s40168-018-0398-3
Source DB: PubMed Journal: Microbiome ISSN: 2049-2618 Impact factor: 14.650
Fig. 1 Diagram of the different datasets used for classification in this study. Datasets in orange were derived from the shotgun metagenomic sequencing (MGS) data (n = 40), and datasets in blue were derived from the 16S rRNA gene (16S) sequencing data (n = 38*). These datasets were used as input to random forest machine learning models to classify both disease state and treatment response. *Note: two Crohn’s disease samples were removed from both the 16S sequencing and MGS datasets due to low sequencing coverage, but their genetic profile was inferred from the MGS
Fig. 2 Classification accuracies for all datasets when classifying (a) disease state and (b) treatment response. Each bar corresponds to a different model. Accuracies are based on random forest (RF) leave-one-out cross-validation (LOOCV) in all cases, except for the number of observed OTUs (# OTUs) and genetic risk scores (GRS), which are based on LOOCV of simple linear cut-off models. The symbols *, **, and *** indicate significance at P < 0.05, P < 0.01, and P < 0.001, respectively. RF model significance was assessed with a permutation test. P values for # OTUs and GRS are based on one-tailed Mann-Whitney-Wilcoxon tests.
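As a rough illustration of the evaluation summarized in Fig. 2, the sketch below computes leave-one-out cross-validated accuracy for a single-feature cut-off classifier (in the spirit of the # OTUs and GRS models) and then estimates significance by rerunning LOOCV on shuffled labels. This is a hypothetical, standard-library-only sketch: the midpoint-threshold rule, the function names, and the toy data are assumptions, and pairing a permutation test with the cut-off model is a substitution (the caption reports permutation tests for the RF models and Mann-Whitney-Wilcoxon tests for # OTUs and GRS). The threshold is refit inside every fold so the held-out sample never influences its own prediction.

```python
# Hypothetical sketch (not the study's code): LOOCV accuracy of a one-feature
# cut-off classifier, plus a label-permutation test of that accuracy.
import random
from typing import List, Tuple


def fit_cutoff(x: List[float], y: List[int]) -> Tuple[float, int]:
    """Return (threshold, sign) maximising training accuracy.

    Candidate thresholds are midpoints between consecutive distinct values;
    sign=+1 predicts class 1 above the threshold, sign=-1 predicts it below.
    """
    vals = sorted(set(x))
    cands = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or [vals[0]]
    best_acc, best_t, best_sign = -1.0, cands[0], 1
    for t in cands:
        for sign in (1, -1):
            preds = [1 if sign * (xi - t) > 0 else 0 for xi in x]
            acc = sum(p == yi for p, yi in zip(preds, y)) / len(y)
            if acc > best_acc:
                best_acc, best_t, best_sign = acc, t, sign
    return best_t, best_sign


def loocv_accuracy(x: List[float], y: List[int]) -> float:
    """Hold out each sample once, refit the cut-off on the rest, score it."""
    correct = 0
    for i in range(len(x)):
        t, sign = fit_cutoff(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        pred = 1 if sign * (x[i] - t) > 0 else 0
        correct += int(pred == y[i])
    return correct / len(x)


def permutation_test(x: List[float], y: List[int],
                     n_perm: int = 199, seed: int = 0) -> Tuple[float, float]:
    """Observed LOOCV accuracy and a one-sided permutation P value.

    P = (1 + #{permuted accuracy >= observed}) / (1 + n_perm).
    """
    rng = random.Random(seed)
    observed = loocv_accuracy(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if loocv_accuracy(x, y_perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Invented toy data: the feature tends to be higher in class 1.
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
    y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    acc, p = permutation_test(x, y, n_perm=99)
    print(f"LOOCV accuracy = {acc:.2f}, permutation P = {p:.3f}")
```

A random forest (or any other classifier) could be substituted for `fit_cutoff` inside the same LOOCV and permutation loops; the cut-off model is used here only because it needs no third-party libraries.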
Fig. 3 Variable importance of features in combined random forest models for (a) disease state classification and (b) treatment response classification. Red and blue indicate which class has the higher mean standardized relative abundance. Features that did not differ significantly (P ≥ 0.05) between classes based on a two-tailed Mann-Whitney-Wilcoxon test are shown in gray. Features in black and green font are of 16S rRNA gene and shotgun metagenomic sequencing origin, respectively. “Un” stands for “Unclassified” when used in taxa names.
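The color coding in Fig. 3 combines two per-feature checks: a two-tailed Mann-Whitney-Wilcoxon test (gray when P ≥ 0.05) and a comparison of mean standardized relative abundance between the two classes. A minimal standard-library sketch of that logic follows. It assumes that "standardized" means z-scoring each feature across all samples, and it uses the normal approximation to U with a tie correction rather than an exact test; the function names and example values are hypothetical. It returns a class name instead of a color because the caption does not say which class is red and which is blue.

```python
# Hypothetical sketch (not the study's code): per-feature two-tailed
# Mann-Whitney-Wilcoxon test plus a comparison of mean standardized abundance.
import math
from statistics import NormalDist, mean, pstdev
from typing import Dict, List, Optional, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mww_two_tailed(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-tailed P value from the normal approximation to U, with a tie
    correction and no continuity correction (reasonable for moderate n)."""
    n1, n2 = len(a), len(b)
    pooled = list(a) + list(b)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var <= 0:  # every value tied: no evidence of a difference
        return 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def feature_direction(values: Sequence[float], labels: Sequence[str],
                      class_a: str, class_b: str,
                      alpha: float = 0.05) -> Tuple[Optional[str], float]:
    """Return (class with higher mean z-score, or None if P >= alpha; P)."""
    mu, sd = mean(values), pstdev(values)
    z = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    a = [zi for zi, lab in zip(z, labels) if lab == class_a]
    b = [zi for zi, lab in zip(z, labels) if lab == class_b]
    p = mww_two_tailed(a, b)  # rank-based, so z-scoring does not change P
    if p >= alpha:
        return None, p
    return (class_a if mean(a) > mean(b) else class_b), p


if __name__ == "__main__":
    # Invented relative abundances for one taxon in 5 controls and 5 CD samples.
    abund = [0.01, 0.02, 0.015, 0.03, 0.025, 0.09, 0.08, 0.07, 0.095, 0.085]
    groups = ["control"] * 5 + ["CD"] * 5
    print(feature_direction(abund, groups, "control", "CD"))
```

Because the test is rank-based, standardizing first changes only the class means used to pick a direction, not the P value. For small groups, an exact test would be more appropriate than this normal approximation.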