Mireia Obón-Santacana1,2,3, Joan Mas-Lloret1,2,3, David Bars-Cortina1,2, Lourdes Criado-Mesas1,2, Robert Carreras-Torres1,2,3,4, Anna Díez-Villanueva1,2,3, Ferran Moratalla-Navarro1,2,3,5, Elisabet Guinó1,2,3, Gemma Ibáñez-Sanz1,2,6, Lorena Rodríguez-Alonso6, Núria Mulet-Margalef7,8, Alfredo Mata9, Ana García-Rodríguez10, Eric J Duell1,2,3, Ville Nikolai Pimenoff11, Victor Moreno1,2,3,5.
Abstract
The gut microbiome is a potential modifiable risk factor for colorectal cancer (CRC). We re-analyzed all eight previously published stool sequencing datasets and conducted an MWAS meta-analysis. We used cross-validated LASSO predictive models to identify a microbiome signature for predicting the risk of CRC and precancerous lesions. These models were validated in a new study, Colorectal Cancer Screening (COLSCREEN), which included 156 participants recruited in a CRC screening context. The MWAS meta-analysis identified 95 bacterial species that were statistically significantly associated with CRC (FDR < 0.05). The LASSO CRC predictive model obtained an area under the receiver operating characteristic curve (aROC) of 0.81 (95% CI: 0.78-0.83) in training and 0.75 (95% CI: 0.66-0.84) when validated in the COLSCREEN dataset. This model selected a total of 32 species. The aROC of this CRC-trained model for predicting precancerous lesions was 0.52 (95% CI: 0.41-0.63). We have identified a signature of 32 bacterial species that has good predictive accuracy for identifying CRC but not precancerous lesions, suggesting that the identified microbes that were enriched or depleted in CRC are merely a consequence of the tumor. Further studies should focus on CRC as well as precancerous lesions with the intent to implement a microbiome signature in CRC screening programs.
Keywords: MWAS; colorectal cancer; meta-analysis; metagenomics; microbiome; predictive model; shotgun
Year: 2022 PMID: 36077748 PMCID: PMC9454621 DOI: 10.3390/cancers14174214
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.575
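The abstract reports species as significant at FDR < 0.05 but does not name the correction procedure. As a minimal sketch, assuming the widely used Benjamini-Hochberg adjustment, the thresholding step could look like the following. The p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-species p-values (illustrative only).
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)

# Species whose adjusted p-value passes the FDR < 0.05 threshold.
significant = [i for i, qi in enumerate(q) if qi < 0.05]
print(significant)
```

Note that two of the hypothetical species have raw p-values below 0.05 yet fail the adjusted threshold, which is the effect the correction is meant to have.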
Summary of sample sizes and epidemiological data of all the included studies.
| Study | Ref | Total (n) | Healthy (n) | Precancerous lesions (n) | CRC (n) | Women (%) | Age, mean (SD) | BMI, mean (SD) |
|---|---|---|---|---|---|---|---|---|
| Zeller et al. | [ | 199 | 93 | 17 | 89 | 41 | 62.3 (12.1) | 25.6 (4.0) |
| Feng et al. | [ | 156 | 63 | 47 | 46 | 44 | 66.9 (8.3) | 27.4 (4.0) |
| Vogtmann et al. | [ | 104 | 52 | - | 52 | 29 | 61.5 (12.3) | 25.1 (4.2) |
| Yu et al. | [ | 128 | 54 | - | 74 | 37 | 64.2 (9.1) | 23.8 (3.1) |
| Yachida et al. | [ | 576 | 251 | 140 | 185 | 40 | 61.9 (11.0) | 22.9 (3.4) |
| Wirbel et al. | [ | 82 | 60 | - | 22 | 48 | 60.0 (11.6) | 25.0 (3.7) |
| Thomas et al. | [ | 140 | 52 | 27 | 61 | 35 | 63.5 (9.7) | 25.6 (4.0) |
| Gupta et al. | [ | 59 | 30 | - | 29 | 51 | 50.8 (16.1) | 21.5 (3.1) |
| Obón-Santacana et al. | - | 156 | 51 | 54 | 51 | 36 | 61.0 (7.9) | 27.6 (4.2) |
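As a quick consistency check on the table above, the sketch below verifies that each study's total equals the sum of its three groups and pools the counts of the eight previously published studies. The numbers are copied from the table; treating "-" as 0 (no such group in that study) is an assumption.

```python
# (study, total, healthy, precancerous lesions, CRC); "-" entered as 0 (assumption).
published = [
    ("Zeller", 199, 93, 17, 89),
    ("Feng", 156, 63, 47, 46),
    ("Vogtmann", 104, 52, 0, 52),
    ("Yu", 128, 54, 0, 74),
    ("Yachida", 576, 251, 140, 185),
    ("Wirbel", 82, 60, 0, 22),
    ("Thomas", 140, 52, 27, 61),
    ("Gupta", 59, 30, 0, 29),
]
validation = ("Obón-Santacana", 156, 51, 54, 51)

# Every row should satisfy total = healthy + precancerous + CRC.
for study, total, healthy, precancerous, crc in published + [validation]:
    assert total == healthy + precancerous + crc, study

# Pooled counts across the eight published studies only.
pooled_total = sum(row[1] for row in published)
pooled_healthy = sum(row[2] for row in published)
pooled_precancerous = sum(row[3] for row in published)
pooled_crc = sum(row[4] for row in published)
print(pooled_total, pooled_healthy, pooled_precancerous, pooled_crc)
```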
Figure 1. Microbiome diversity statistics of the included metagenomic datasets. (a) Alpha diversity metrics (Faith’s index). (b) Beta diversity metrics (based on Euclidean distances of ILR-transformed relative abundance counts). The right ellipse represents the Asian studies (Gupta, Yachida and Yu), while the left ellipse represents the USA and EU studies (Feng, Obón-Santacana, Thomas, Vogtmann, Wirbel and Zeller). Both ellipses represent a 95% confidence region.
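The beta diversity in panel (b) is described as a Euclidean distance between ILR-transformed abundances. The sketch below illustrates that idea using pivot coordinates as the ILR basis; the choice of basis, the pseudocount used to handle zeros, and the counts themselves are assumptions for illustration and are not taken from the study. Because any ILR basis is orthonormal, the resulting distance matches the Euclidean distance between CLR-transformed samples, which the code also computes as a cross-check.

```python
import math


def closure(counts, pseudocount=0.5):
    """Add a pseudocount (assumed zero handling) and rescale to proportions."""
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    return [s / total for s in shifted]


def ilr(parts):
    """Pivot-coordinate ILR: D positive parts -> D-1 orthonormal coordinates."""
    logs = [math.log(p) for p in parts]
    coords = []
    for i in range(1, len(parts)):
        mean_log_first_i = sum(logs[:i]) / i
        coords.append(math.sqrt(i / (i + 1)) * (mean_log_first_i - logs[i]))
    return coords


def clr(parts):
    """Centered log-ratio transform, used here only as a cross-check."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical species counts for two stool samples (illustrative only).
sample_a = closure([120, 30, 0, 50])
sample_b = closure([10, 80, 25, 60])

ilr_distance = euclidean(ilr(sample_a), ilr(sample_b))
clr_distance = euclidean(clr(sample_a), clr(sample_b))
print(ilr_distance, clr_distance)
```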
Figure 2. Species that were statistically associated with CRC. Black squares with lines represent the estimates of the effect sizes and their 95% confidence intervals. The colored dots represent the estimates of the effect sizes for each dataset. (a) Species that were found to be decreased in cancer (columns 1 and 2). (b) Species that were found to be increased in cancer (column 3).
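The caption distinguishes per-dataset effect sizes (colored dots) from a combined estimate with a 95% confidence interval (black squares). The text here does not state which meta-analysis model produced the combined estimate, so the following is only a sketch of one standard option, fixed-effect inverse-variance pooling. The function name and the per-study numbers are hypothetical.

```python
import math


def pool_fixed_effect(estimates, std_errors, z=1.96):
    """Inverse-variance weighted mean with a normal-approximation 95% CI."""
    # More precise studies (smaller standard error) receive larger weights.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se


# Hypothetical effect sizes for one species in three datasets (illustrative only).
study_estimates = [0.8, 1.1, 0.5]
study_std_errors = [0.3, 0.4, 0.2]

pooled, ci_low, ci_high = pool_fixed_effect(study_estimates, study_std_errors)
print(pooled, ci_low, ci_high)
```

A random-effects model would widen the interval when the datasets disagree more than sampling error alone explains, which may matter given the geographic separation visible in Figure 1.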
Figure 3. Summary of the LASSO predictive model in our dataset. (a) Training (blue) and validation (red) estimate values for each control-enriched (green) and cancer-enriched (purple) species selected by the model. (b) Density plot of model prediction, colored by the status of the samples. (c) Receiver operating characteristic curve representing the performance of the model. *: statistically significant (p-value < 0.05) based on Wilcoxon rank sum test in aldex.ttest.
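The area under the ROC curve in panel (c) can be read as the probability that a randomly chosen CRC sample receives a higher model score than a randomly chosen control, with ties counted as one half. This is the same quantity as the normalized Mann-Whitney U statistic. A minimal sketch of that pairwise definition follows; the scores are hypothetical and do not come from the LASSO model described here.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties = 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted probabilities (illustrative only).
crc_scores = [0.9, 0.8, 0.55, 0.4]
control_scores = [0.7, 0.3, 0.2, 0.1]

print(roc_auc(crc_scores, control_scores))
```

Under this reading, the reported value of 0.52 for precancerous lesions means the CRC-trained model ranks a lesion sample above a control only slightly more often than chance.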
Figure 4. Analysis of eggNOG orthologous groups. (a) Training (blue) and validation (red) estimate values for each control-enriched (green) and cancer-enriched (purple) species that was selected by the model. (b) Number of significantly associated orthologous groups, clustered by general category. Blue represents the control-associated groups, while red represents the cancer-associated groups. Category “S” (function unknown) was excluded. Orthologous groups belonging to more than one category were counted once for each category. (c) Receiver operating characteristic curve representing the performance of the predictive model. *: statistically significant (p-value < 0.05) based on Wilcoxon rank sum test in aldex.ttest.