Bangzhou Zhang1,2, Shuangbin Xu3, Wei Xu4, Qiongyun Chen1,2, Zhangran Chen2, Changsheng Yan2, Yanyun Fan1, Huangkai Zhang3, Qi Liu4, Jie Yang4, Jinfeng Yang4, Chuanxing Xiao1,2, Hongzhi Xu1,2, Jianlin Ren1,2.
Abstract
Colorectal cancer (CRC) ranks second in cancer-associated mortality and third in incidence worldwide. Most CRCs follow the adenoma-carcinoma sequence, and patients have a greater than 90% chance of survival if diagnosed at an early stage. However, the recommended screening by colonoscopy is invasive, expensive, and poorly adhered to. Recently, several studies reported that fecal bacteria might provide non-invasive biomarkers for CRC and precancerous tumors. We therefore collected and uniformly re-analyzed these published fecal 16S rDNA sequencing datasets to verify the association and to identify biomarkers that classify and predict colorectal tumors using the random forest method. A total of 1674 samples (330 CRC, 357 advanced adenoma, 141 adenoma, and 846 control) from 7 studies were analyzed. Using random effects and fixed effects models, we observed significant differences in alpha-diversity and beta-diversity between individuals with CRC and those with a normal colon, but not between individuals with adenoma and those with a normal colon. We identified various bacterial genera with significant odds ratios for colorectal tumors at different stages. By building random forest models with 10-fold cross-validation and with new test datasets, we classified individuals with CRC, advanced adenoma, adenoma, and normal colon. All approaches obtained comparable performance, as measured by AUC, at the entire OTU level, the entire genus level, and the common genus level. When all samples were combined, the AUC of the random forest model based on 12 common genera reached 0.846 for CRC, although prediction performed poorly for advanced adenoma and adenoma.
Keywords: colorectal adenoma; colorectal cancer; fecal bacteria; random effects model; random forest
Year: 2019 PMID: 31191599 PMCID: PMC6547015 DOI: 10.3389/fgene.2019.00447
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
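The abstract pools per-study effects with fixed effects and random effects models. The sketch below illustrates inverse-variance pooling, using the DerSimonian-Laird estimator for the between-study variance. This text does not say which estimator the authors used, so that choice is an assumption, and the function name `pool_effects` and the input values are hypothetical.

```python
import math

def pool_effects(effects, variances):
    """Pool per-study effects (e.g. log odds ratios) by inverse variance.

    Returns (fixed, random, tau2, se_random). The between-study variance
    tau2 uses the DerSimonian-Laird estimator (an assumption; the source
    text does not name the estimator).
    """
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    # Fixed-effect estimate: weighted mean with weights 1 / variance.
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q measures the dispersion of studies around the fixed estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects estimate: tau2 is added to every within-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    random_effect = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    se_random = math.sqrt(1.0 / sum_re)
    return fixed, random_effect, tau2, se_random

# Hypothetical log odds ratios and variances for three studies.
fixed, random_effect, tau2, se = pool_effects([0.2, 0.6, 0.9], [0.04, 0.09, 0.05])
low, high = random_effect - 1.96 * se, random_effect + 1.96 * se
print(f"fixed={fixed:.3f} random={random_effect:.3f} tau2={tau2:.3f} "
      f"95% CI=({low:.3f}, {high:.3f})")
```

When the studies agree closely, `tau2` shrinks to zero and the two estimates coincide; heterogeneous studies widen the random-effects interval.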
Characteristics of the fecal 16S rDNA sequencing studies included in the meta-analysis.
| No. | Country | Source∗ | Health | Polyps | Adenoma (<1 cm) | Adv_adenoma (>1 cm) | CRC | DNA extraction | Region | Seq platform |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | China | SRA | 33 | 0 | 0 | 0 | 17 | GenElute Stool DNA isolation Kit | V3-V4 | HiSeq |
| 2 | Ireland | Author | 62 | 0 | 22 | 0 | 69 | Allprep DNA/RNA kit-Qiagen | V3-V4 | MiSeq |
| 3 | Italy | Author | 18 | 14 | 18 | 21 | 8 | QIAamp DNA stool kit | V4 | MiSeq |
| 4 | Ireland | Author | 36 | 0 | 0 | 0 | 42 | Allprep DNA/RNA kit-Qiagen | V3-V4 | MiSeq |
| 5 | United States | Author | 475 | 0 | 0 | 203 | 34 | Chemagic DNA Blood Special Kit | V3-V5 | MiSeq |
| 6 | United States + Canada | SRA | 172 | 0 | 88 | 108 | 119 | PowerSoil | V4 | MiSeq |
| 7 | France | SRA | 50 | 0 | 13 | 25 | 41 | GNOME DNA Isolation Kit(MP) | V4 | MiSeq |
| 8 | China | NA | 130 | 30 | 32 | 88 | 130 | OMEGA-soil DNA kit | V3-V4 | MiSeq |
| 9 | China | NA | 52 | 0 | 47 | 42 | E.Z.N.A. Stool DNA Kit | V1-V3 | 454 | |
| 10 | China | NA | 24 | 9 | 0 | 20 | 2 | – | V3-V4 | MiSeq |
| 11 | Spain | NA | 10 | 0 | 11 | 7 | Macherey–Nagel | V1-V3 | 454 | |
| 12 | United States | NA | 94 | 0 | 0 | 0 | 47 | PowerSoil | V3-V4 | 454 |
| 13 | United States | SRA | 6 | 6 | 0 | 0 | 0 | QIAamp Stool DNA | V1-V3 | 454 |
| 14 | China | NA | 47 | 0 | 0 | 47 | 0 | Bead beating methods and phenol-chloroform | V1-V3 | 454 |
| 15 | United States | NA | 8 | 0 | 0 | 0 | 7 | MoBio Powersoil | V4 | 454 |
| 16 | China | NA | 20 | 0 | 0 | 0 | 19 | QIAamp Stool DNA | V3 | 454 |
| 17 | China | NA | 21 | 0 | 0 | 0 | 22 | QIAamp DNA Mini Kit | V1-V3 | 454 |
| 18 | China | NA | 56 | 0 | 0 | 0 | 46 | Bead-beating extraction and phenol–chloroform | V3 | 454 |
| 19 | France | NA | 6 | 0 | 0 | 0 | 6 | GNOME DNA Isolation Kit(MP) | V3-V4 | 454 |
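As a consistency check, the per-study counts in the table can be summed and compared with the totals quoted in the abstract (1674 samples: 330 CRC, 357 advanced adenoma, 141 adenoma, 846 control). The sketch below assumes that rows 1 to 7 are the seven studies that were analyzed, which this text does not state explicitly, and that the polyp counts are not part of the reported total. The helper name `column_totals` is hypothetical.

```python
# Counts copied from rows 1-7 of the table above, in the order
# (healthy, polyps, adenoma, advanced adenoma, CRC).
# Assumption: rows 1-7 are the seven studies included in the analysis.
studies = {
    1: (33, 0, 0, 0, 17),
    2: (62, 0, 22, 0, 69),
    3: (18, 14, 18, 21, 8),
    4: (36, 0, 0, 0, 42),
    5: (475, 0, 0, 203, 34),
    6: (172, 0, 88, 108, 119),
    7: (50, 0, 13, 25, 41),
}

def column_totals(rows):
    """Sum each column across all rows."""
    return tuple(sum(column) for column in zip(*rows))

healthy, polyps, adenoma, adv_adenoma, crc = column_totals(studies.values())
# Polyps are left out of the total, matching the four groups named in the abstract.
total = healthy + adenoma + adv_adenoma + crc

print(f"control={healthy} adenoma={adenoma} adv_adenoma={adv_adenoma} "
      f"CRC={crc} total={total}")
# prints: control=846 adenoma=141 adv_adenoma=357 CRC=330 total=1674
```

The sums match the abstract, which supports the reading that the first seven rows are the analyzed studies.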
FIGURE 1. Principal coordinates analysis showing the large microbial variation among studies, which differ in DNA extraction method, PCR amplification conditions, sequencing platform, etc. Points represent samples, shapes represent the diagnosis, and colors represent the study.
FIGURE 2. Forest plot of the alpha diversity metrics for (A) adenoma, (B) advanced adenoma, and (C) colorectal cancer. The length of each error bar represents the 95% confidence interval. Values to the left of the dashed line indicate that the metric is higher in cases than in controls; values to the right indicate that it is lower in cases than in controls. The difference between cases and controls is significant if the error bar does not overlap the dashed line.
FIGURE 3. Forest plot of the Bray-Curtis distances between individuals with colorectal tumors and those with normal colons. (A) Adenomas vs. normal colons; (B) Adv_adenomas vs. normal colons; (C) CRC vs. normal colons. The error bars depict the 95% confidence interval. Values to the left of the dashed line (negative values) indicate that distances between cases and normal subjects are higher than distances among control subjects; values to the right indicate that they are lower. The difference between cases and controls is significant if the error bar does not cross the dashed line.
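For readers unfamiliar with the distance named in this caption, here is a minimal sketch of the Bray-Curtis dissimilarity between two samples. The function name `bray_curtis` and the count vectors are hypothetical; the source does not describe how the authors computed or normalized their abundance tables.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors.

    0.0 means identical composition; 1.0 means no shared taxa.
    """
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    # Two empty samples are treated as identical (a convention chosen here).
    return numerator / denominator if denominator else 0.0

# Hypothetical genus-level counts for two fecal samples.
sample_case = [10, 0, 5]
sample_control = [5, 5, 5]
print(round(bray_curtis(sample_case, sample_control), 4))  # prints 0.3333
```

In a comparison like the one in the forest plot, such pairwise values would be computed between case and control samples and among control samples, then summarized per study.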
FIGURE 4. ROC curves for each study based on the matrix of all OTUs (A–C) and the matrix of all genera (D–F). The gray lines represent random predictors. The other lines depict the ROC curves of each study using cross-validation with ten repeats.
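The area under a ROC curve equals the probability that a randomly chosen case receives a higher score than a randomly chosen control, so a random predictor scores 0.5. The sketch below computes the AUC from that pairwise definition. The function name `roc_auc` and the example scores are hypothetical and unrelated to the study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    labels: 1 for a case, 0 for a control. Tied scores count as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Hypothetical predicted probabilities for two controls and two cases.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

This quadratic-time version is meant for clarity; rank-based formulas give the same value more efficiently on large data sets.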
FIGURE 5. ROC curves based on the matrix of the common genera. The gray lines represent random predictors. In (A–C), the other lines depict the ROC curve of each study using cross-validation with ten repeats. In (D–F), the black lines represent the models built on the data sets of all studies combined, with cross-validation, and the colored lines represent the models built on the combined data sets minus one specific study.
FIGURE 6. Performance of the models in classifying cases and normal subjects. (A) CRC vs. normal colons; (B) Adenoma vs. normal colons; (C) Adv_adenoma vs. normal colons. The horizontal axis shows the study used as the training data set. The vertical axis shows the AUC for each test study. The black lines represent the median of all test AUCs for a specific model. The dashed gray lines mark an AUC of 0.5, corresponding to a random predictor.
FIGURE 7. ROC curves of the models built using the matrix of the common genera and n-1 studies (leave-one-study-out) and validated on the held-out study. (A) Adenoma vs. normal colons; (B) Adv_adenoma vs. normal colons; (C) CRC vs. normal colons.
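The leave-one-study-out scheme named in this caption can be sketched as a splitting routine: each study is held out once, a model is trained on the samples from the remaining studies, and it is evaluated on the held-out study. The code below shows only the splitting step. The function name `leave_one_study_out` and the study labels are hypothetical, and the classifier itself (a random forest in the source) is omitted.

```python
def leave_one_study_out(study_ids):
    """Yield (held_out_study, train_indices, test_indices) once per study.

    study_ids[i] is the study that sample i belongs to.
    """
    for held_out in sorted(set(study_ids)):
        train = [i for i, s in enumerate(study_ids) if s != held_out]
        test = [i for i, s in enumerate(study_ids) if s == held_out]
        yield held_out, train, test

# Hypothetical study label for each of five samples.
study_of_sample = ["A", "A", "B", "C", "C"]
for held_out, train_idx, test_idx in leave_one_study_out(study_of_sample):
    # A model would be fitted on train_idx and scored on test_idx here.
    print(held_out, train_idx, test_idx)
```

Because no sample from the held-out study is seen during training, the resulting score reflects how well a model transfers to a cohort with different extraction, amplification, and sequencing conditions.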
Importance, odds ratio, heterogeneity, and relative abundance of the 9 common genera selected for the RF model for CRC based on all samples.
| Genera | Mean decrease Gini | Odds ratio | CI_lb | CI_ub | p-value | Heterogeneity | Abundance (%) in CRC | Abundance (%) in the normal |
|---|---|---|---|---|---|---|---|---|
|  | 15.72 | 1.34 | 0.85 | 2.12 | 0.2 | 36.78 | 1.087 ± 1.019 | 1.23 ± 0.87 |
|  | 13.43 | 1.76 | 1.17 | 2.65 | 0.01 | 27.3 | 0.979 ± 0.846 | 1.417 ± 1.024 |
|  | 12.9 | 1.17 | 0.87 | 1.58 | 0.31 | 0 | 1.365 ± 0.726 | 1.136 ± 1.239 |
|  | 10.75 | 0.34 | 0.24 | 0.48 | 0 | 0 | 0.791 ± 1.444 | 0.106 ± 0.125 |
|  | 10.13 | 0.63 | 0.37 | 1.09 | 0.1 | 61.3 | 2.565 ± 1.431 | 1.368 ± 0.898 |
|  | 8.7 | 1.01 | 0.69 | 1.47 | 0.97 | 21.65 | 1.915 ± 1.681 | 2.055 ± 1.769 |
|  | 8.11 | 1.65 | 1.14 | 2.4 | 0.01 | 31.98 | 0.379 ± 0.382 | 0.509 ± 0.376 |
|  | 8.09 | 1.01 | 0.63 | 1.63 | 0.95 | 55.87 | 6.69 ± 3.042 | 6.624 ± 2.294 |
|  | 7.54 | 1.48 | 1.03 | 2.11 | 0.03 | 27.84 | 1.406 ± 0.968 | 1.703 ± 1.307 |
|  | 6.86 | 0.54 | 0.38 | 0.75 | 0 | 0 | 0.522 ± 0.421 | 0.169 ± 0.129 |
|  | 6.78 | 1.59 | 1.2 | 2.12 | 0 | 0 | 1.122 ± 0.329 | 1.418 ± 0.564 |
|  | 6.62 | 0.65 | 0.49 | 0.87 | 0 | 0 | 0.256 ± 0.209 | 0.093 ± 0.046 |
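An odds ratio is conventionally read as significant when its confidence interval excludes 1. The sketch below applies that rule to the odds ratio, CI_lb, and CI_ub values copied from the table above. Rows are identified by position because the genus names are missing from this text, and the helper name `ci_excludes_one` is hypothetical. The confidence level of the interval is not stated here, so the result is only as strong as that interval.

```python
# (odds ratio, CI lower bound, CI upper bound), copied row by row
# from the table above; genus names are not available in this text.
rows = [
    (1.34, 0.85, 2.12),
    (1.76, 1.17, 2.65),
    (1.17, 0.87, 1.58),
    (0.34, 0.24, 0.48),
    (0.63, 0.37, 1.09),
    (1.01, 0.69, 1.47),
    (1.65, 1.14, 2.40),
    (1.01, 0.63, 1.63),
    (1.48, 1.03, 2.11),
    (0.54, 0.38, 0.75),
    (1.59, 1.20, 2.12),
    (0.65, 0.49, 0.87),
]

def ci_excludes_one(lower, upper):
    """True when the interval lies entirely above or entirely below 1."""
    return lower > 1.0 or upper < 1.0

significant_rows = [
    position
    for position, (_, lower, upper) in enumerate(rows, start=1)
    if ci_excludes_one(lower, upper)
]
print(significant_rows)  # prints [2, 4, 7, 9, 10, 11, 12]
```

Seven of the twelve rows have an interval that excludes 1.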