Ezzat Dadkhah, Masoumeh Sikaroodi, Louis Korman, Robert Hardi, Jeffrey Baybick, David Hanzel, Gregory Kuehn, Thomas Kuehn, Patrick M Gillevet.
Abstract
OBJECTIVE: To characterise the gut microbiome in subjects with and without polyps and evaluate the potential of the microbiome as a non-invasive biomarker to screen for risk of colorectal cancer (CRC).
Keywords: biopsy; classification; colorectal cancer; machine learning; microbiome; polyp; risk assessment; sequencing; stool
Year: 2019 PMID: 31275588 PMCID: PMC6577315 DOI: 10.1136/bmjgast-2019-000297
Source DB: PubMed Journal: BMJ Open Gastroenterol ISSN: 2054-4774
Figure 1. Receiver operating characteristic (ROC) curves for five classifiers using informative operational taxonomic units (OTU) for biopsy, home stool, and rectal swab data sets. The evaluated classifiers included Naïve Bayes (red), random forest (orange), K-nearest neighbour (yellow), logistic regression (green), and Neural Network (blue). The straight line represents the null model. (A) For the biopsy data set, the best performing classifier was Naïve Bayes with area under the curve (AUC) equal to 0.85. (B) For the home stool data set, the best performing classifiers were Naïve Bayes and random forest with AUC=0.83. (C) For the rectal swab data set, the best performing classifiers were random forest with AUC=0.81 and Naïve Bayes with AUC=0.80.
Classification accuracy of the Naïve Bayes and Neural Network models for the home collected stool samples using naïve test sets
| Model | Classification accuracy | Sensitivity | Specificity | AUC |
| Naïve Bayes | 79% | 83% | 72% | 86% |
| Neural Network | 82% | 86% | 75% | 87% |
AUC, area under the curve.
Confusion matrix for the Naïve Bayes and Neural Network models for the home collected stool samples using naïve test sets
| Model | Count | Score HS_Polyp_Y | Score HS_Polyp_N | Classification |
| Naïve Bayes | 19 | 82% | 18% | FP |
| Naïve Bayes | 50 | 10% | 90% | TN |
| Naïve Bayes | 82 | 90% | 10% | TP |
| Neural Network | 17 | 83% | 17% | FP |
| Neural Network | 52 | 11% | 89% | TN |
| Neural Network | 85 | 90% | 10% | TP |
*False negatives using the Naïve Bayes model.
†False negatives using the Neural Network model.
AUC, area under the curve; FN, false negative; FP, false positive; HS, home collected stool sample; HS_Polyp_N, home stool from subjects without polyps; HS_Polyp_Y, home stool from subjects with polyps; Polyp_N, polyp-negative group; Polyp_Y, polyp-positive group; TN, true negative; TP, true positive.
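The accuracy, sensitivity, and specificity reported for the two models can be cross-checked against the confusion-matrix counts. The sketch below is illustrative only: the false negative rows do not appear in the table as shown, so the FN count for each model is inferred as the remainder of the 168 subjects named in the composite summary table. That inference, and the names `reported` and `summarise`, are assumptions of this example rather than part of the study.

```python
# Total number of home stool subjects, taken from the composite summary table.
TOTAL_SUBJECTS = 168

# Counts as they appear in the confusion matrix (FN rows are not shown there).
reported = {
    "Naive Bayes": {"FP": 19, "TN": 50, "TP": 82},
    "Neural Network": {"FP": 17, "TN": 52, "TP": 85},
}


def summarise(counts, total=TOTAL_SUBJECTS):
    """Derive FN and the standard rates from the TP/TN/FP counts."""
    tp, tn, fp = counts["TP"], counts["TN"], counts["FP"]
    # Assumption: every subject not listed as TP, TN, or FP is a false negative.
    fn = total - (tp + tn + fp)
    return {
        "FN": fn,
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


metrics = {model: summarise(counts) for model, counts in reported.items()}

for model, m in metrics.items():
    print(
        f"{model}: FN={m['FN']}, "
        f"accuracy={m['accuracy']:.0%}, "
        f"sensitivity={m['sensitivity']:.0%}, "
        f"specificity={m['specificity']:.0%}"
    )
```

Under this assumption the rounded rates agree with the classification accuracy table (79%, 83%, 72% for Naïve Bayes and 82%, 86%, 75% for the Neural Network).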
Figure 2. Naïve Bayes and Neural Network classification scores for the naïve predictions of the home stool samples.
Classification scores and predictions for the 22 subjects with polyps whose home stool samples were called false negative on Naïve Bayes and/or Neural Network models
| Sample ID | Class | Classifier | Score HS_Polyp_Y | Score HS_Polyp_N | Predicted | Call |
| A. Subjects where both models call FN | | | | | | |
| HS_23 | HS_Polyp_Y | Naïve Bayes | 0.007 | 0.993 | HS_Polyp_N | FN |
| HS_23 | HS_Polyp_Y | Neural Network | 0 | 1 | HS_Polyp_N | FN |
| HS_341 | HS_Polyp_Y | Naïve Bayes | 0.159 | 0.841 | HS_Polyp_N | FN |
| HS_341 | HS_Polyp_Y | Neural Network | 0.177 | 0.823 | HS_Polyp_N | FN |
| HS_372 | HS_Polyp_Y | Naïve Bayes | 0.213 | 0.787 | HS_Polyp_N | FN |
| HS_372 | HS_Polyp_Y | Neural Network | 0.208 | 0.792 | HS_Polyp_N | FN |
| HS_381 | HS_Polyp_Y | Naïve Bayes | 0.005 | 0.995 | HS_Polyp_N | FN |
| HS_381 | HS_Polyp_Y | Neural Network | 0.373 | 0.627 | HS_Polyp_N | FN |
| HS_384 | HS_Polyp_Y | Naïve Bayes | 0.026 | 0.974 | HS_Polyp_N | FN |
| HS_384 | HS_Polyp_Y | Neural Network | 0.256 | 0.744 | HS_Polyp_N | FN |
| HS_386 | HS_Polyp_Y | Naïve Bayes | 0.35 | 0.65 | HS_Polyp_N | FN |
| HS_386 | HS_Polyp_Y | Neural Network | 0.418 | 0.582 | HS_Polyp_N | FN |
| HS_413 | HS_Polyp_Y | Naïve Bayes | 0.328 | 0.672 | HS_Polyp_N | FN |
| HS_413 | HS_Polyp_Y | Neural Network | 0.427 | 0.573 | HS_Polyp_N | FN |
| HS_423 | HS_Polyp_Y | Naïve Bayes | 0.01 | 0.99 | HS_Polyp_N | FN |
| HS_423 | HS_Polyp_Y | Neural Network | 0.002 | 0.998 | HS_Polyp_N | FN |
| HS_461 | HS_Polyp_Y | Naïve Bayes | 0.578 | 0.422 | HS_Polyp_N | FN |
| HS_461 | HS_Polyp_Y | Neural Network | 0.303 | 0.697 | HS_Polyp_N | FN |
| B. Subjects where one model calls FN and the other calls TP | | | | | | |
| HS_363 | HS_Polyp_Y | Naïve Bayes | 0.779 | 0.221 | HS_Polyp_Y | TP |
| HS_363 | HS_Polyp_Y | Neural Network | 0.034 | 0.966 | HS_Polyp_N | FN |
| HS_367 | HS_Polyp_Y | Naïve Bayes | 0.905 | 0.095 | HS_Polyp_Y | TP |
| HS_367 | HS_Polyp_Y | Neural Network | 0.365 | 0.635 | HS_Polyp_N | FN |
| HS_373 | HS_Polyp_Y | Naïve Bayes | 0.578 | 0.422 | HS_Polyp_N | FN |
| HS_373 | HS_Polyp_Y | Neural Network | 0.992 | 0.008 | HS_Polyp_Y | TP |
| HS_403 | HS_Polyp_Y | Naïve Bayes | 0.627 | 0.373 | HS_Polyp_N | FN |
| HS_403 | HS_Polyp_Y | Neural Network | 0.982 | 0.018 | HS_Polyp_Y | TP |
| HS_407 | HS_Polyp_Y | Naïve Bayes | 0.076 | 0.924 | HS_Polyp_N | FN |
| HS_407 | HS_Polyp_Y | Neural Network | 0.991 | 0.009 | HS_Polyp_Y | TP |
| HS_412 | HS_Polyp_Y | Naïve Bayes | 0.317 | 0.683 | HS_Polyp_N | FN |
| HS_412 | HS_Polyp_Y | Neural Network | 0.528 | 0.472 | HS_Polyp_Y | TP |
| HS_417 | HS_Polyp_Y | Naïve Bayes | 0.894 | 0.106 | HS_Polyp_Y | TP |
| HS_417 | HS_Polyp_Y | Neural Network | 0.03 | 0.97 | HS_Polyp_N | FN |
| HS_420 | HS_Polyp_Y | Naïve Bayes | 0.524 | 0.476 | HS_Polyp_N | FN |
| HS_420 | HS_Polyp_Y | Neural Network | 0.654 | 0.346 | HS_Polyp_Y | TP |
| HS_427 | HS_Polyp_Y | Naïve Bayes | 0.084 | 0.916 | HS_Polyp_N | FN |
| HS_427 | HS_Polyp_Y | Neural Network | 0.536 | 0.464 | HS_Polyp_Y | TP |
| HS_45 | HS_Polyp_Y | Naïve Bayes | 0.651 | 0.349 | HS_Polyp_Y | TP |
| HS_45 | HS_Polyp_Y | Neural Network | 0.079 | 0.921 | HS_Polyp_N | FN |
| HS_507 | HS_Polyp_Y | Naïve Bayes | 0.711 | 0.289 | HS_Polyp_N | FN |
| HS_507 | HS_Polyp_Y | Neural Network | 0.682 | 0.318 | HS_Polyp_Y | TP |
| HS_6 | HS_Polyp_Y | Naïve Bayes | 0.031 | 0.969 | HS_Polyp_N | FN |
| HS_6 | HS_Polyp_Y | Neural Network | 0.731 | 0.269 | HS_Polyp_Y | TP |
| HS_62 | HS_Polyp_Y | Naïve Bayes | 0.686 | 0.314 | HS_Polyp_Y | TP |
| HS_62 | HS_Polyp_Y | Neural Network | 0.453 | 0.547 | HS_Polyp_N | FN |
For each subject and model, the true class (ie, Polyp_Y or Polyp_N) is tabulated along with the model scores and predictions.
FN, false negative; HS, home collected stool sample; Polyp_N, polyp-negative group; Polyp_Y, polyp-positive group; TP, true positive.
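The per-model false negative totals implied by this table can be tallied directly from the Call column. The following is a minimal sketch: the calls are transcribed from the table above, and the variable names are illustrative choices of this example.

```python
# (sample ID, Naive Bayes call, Neural Network call), transcribed from the table.
calls = [
    # A. Subjects where both models call FN
    ("HS_23", "FN", "FN"), ("HS_341", "FN", "FN"), ("HS_372", "FN", "FN"),
    ("HS_381", "FN", "FN"), ("HS_384", "FN", "FN"), ("HS_386", "FN", "FN"),
    ("HS_413", "FN", "FN"), ("HS_423", "FN", "FN"), ("HS_461", "FN", "FN"),
    # B. Subjects where one model calls FN and the other calls TP
    ("HS_363", "TP", "FN"), ("HS_367", "TP", "FN"), ("HS_373", "FN", "TP"),
    ("HS_403", "FN", "TP"), ("HS_407", "FN", "TP"), ("HS_412", "FN", "TP"),
    ("HS_417", "TP", "FN"), ("HS_420", "FN", "TP"), ("HS_427", "FN", "TP"),
    ("HS_45", "TP", "FN"), ("HS_507", "FN", "TP"), ("HS_6", "FN", "TP"),
    ("HS_62", "TP", "FN"),
]

# Count false negatives per model and the overlap between the two models.
fn_naive_bayes = sum(nb == "FN" for _, nb, _ in calls)
fn_neural_network = sum(nn == "FN" for _, _, nn in calls)
fn_both = sum(nb == "FN" and nn == "FN" for _, nb, nn in calls)

print(f"subjects listed: {len(calls)}")
print(f"Naive Bayes FN: {fn_naive_bayes}")
print(f"Neural Network FN: {fn_neural_network}")
print(f"FN on both models: {fn_both}")
```

The tally gives 17 false negatives for Naïve Bayes and 14 for the Neural Network, with 9 subjects missed by both models, across the 22 subjects listed.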
Summary for the adjusted composite confusion matrix for home stool samples
| Call | False negative | False positive | True positive | True negative |
| n | 9 | 22 | 90 | 47 |
| Rate (of 168 subjects) | 5% | 13% | 54% | 28% |
Subjects that had mixed false negative and true positive calls by the two models were binned as true positive.
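The binning rule and the rates in the summary table can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: `composite_call` is a hypothetical helper that encodes only the rule stated for polyp-positive subjects (a mixed FN/TP pair counts as TP), and the counts are copied from the summary table rather than recomputed from raw data.

```python
def composite_call(naive_bayes_call, neural_network_call):
    """Composite call for a polyp-positive subject.

    A subject stays a false negative only when both models miss it;
    a mixed FN/TP pair is binned as a true positive.
    """
    if naive_bayes_call == "FN" and neural_network_call == "FN":
        return "FN"
    return "TP"


# Counts copied from the composite summary table.
composite_counts = {"FN": 9, "FP": 22, "TP": 90, "TN": 47}

total = sum(composite_counts.values())
# Rates as whole percentages of all subjects, as in the summary table.
rates = {call: round(100 * n / total) for call, n in composite_counts.items()}

print(composite_call("FN", "TP"))  # mixed calls are binned as TP
print(total, rates)
```

The counts sum to 168 subjects, and the rounded rates match the 5%, 13%, 54%, and 28% shown in the table.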