Courtney R Armour, Begüm D Topçuoğlu, Andrea Garretto, Patrick D Schloss.
Abstract
Colorectal cancer is a common and deadly disease in the United States, accounting for over 50,000 deaths in 2020. This progressive disease is highly preventable with early detection and treatment, but many people do not comply with the recommended screening guidelines. The gut microbiome has emerged as a promising target for noninvasive detection of colorectal cancer. Most microbiome-based classification efforts use taxonomic abundance data from operational taxonomic units (OTUs) or amplicon sequence variants (ASVs) with the goal of increasing taxonomic resolution. However, it is unknown which taxonomic resolution is optimal for microbiome-based classification of colorectal cancer. To address this question, we used a reproducible machine learning framework to quantify classification performance of models based on data annotated to phylum, class, order, family, genus, OTU, and ASV levels. We found that model performance increased with increasing taxonomic resolution up to the family level, where performance was not significantly different (P > 0.05) among family (mean area under the receiver operating characteristic curve [AUROC], 0.689), genus (mean AUROC, 0.690), and OTU (mean AUROC, 0.693) levels, before decreasing at the ASV level (P < 0.05; mean AUROC, 0.676). These results demonstrate a trade-off between taxonomic resolution and prediction performance, where coarse taxonomic resolution (e.g., phylum) is not distinct enough, but fine resolution (e.g., ASV) is too individualized to accurately classify samples. Similar to the story of Goldilocks and the three bears (L. B. Cauley, Goldilocks and the Three Bears, 1981), mid-range resolution (i.e., family, genus, and OTU) is "just right" for optimal prediction of colorectal cancer from microbiome data.

IMPORTANCE Despite being highly preventable, colorectal cancer remains a leading cause of cancer-related death in the United States. Low-cost, noninvasive detection methods could greatly improve our ability to identify and treat early stages of disease. The microbiome has shown promise as a resource for detection of colorectal cancer. Research on the gut microbiome tends to focus on improving our ability to profile species- and strain-level taxonomic resolution. However, we found that finer resolution impedes the ability to predict colorectal cancer based on the gut microbiome. These results highlight the need to consider the appropriate taxonomic resolution for microbiome analyses and show that finer resolution is not always more informative.
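To make the idea of "taxonomic resolution" concrete, the sketch below collapses per-OTU read counts into coarser features by summing all OTUs that share an annotation at a given rank. It is an illustration only, not the authors' pipeline: the OTU names, taxonomy strings, counts, and the `collapse` helper are all hypothetical.

```python
from collections import defaultdict

RANKS = ["phylum", "class", "order", "family", "genus"]

# Hypothetical taxonomy annotations for three OTUs (illustrative only).
taxonomy = {
    "Otu1": ("Firmicutes", "Clostridia", "Clostridiales", "Lachnospiraceae", "Blautia"),
    "Otu2": ("Firmicutes", "Clostridia", "Clostridiales", "Lachnospiraceae", "Roseburia"),
    "Otu3": ("Bacteroidetes", "Bacteroidia", "Bacteroidales", "Bacteroidaceae", "Bacteroides"),
}

# Hypothetical read counts for a single sample.
sample_counts = {"Otu1": 40, "Otu2": 10, "Otu3": 50}


def collapse(counts, taxonomy, rank):
    """Sum per-OTU counts into one feature per taxon at the given rank."""
    idx = RANKS.index(rank)
    collapsed = defaultdict(int)
    for otu, n in counts.items():
        collapsed[taxonomy[otu][idx]] += n
    return dict(collapsed)


family_counts = collapse(sample_counts, taxonomy, "family")
genus_counts = collapse(sample_counts, taxonomy, "genus")
print(family_counts)
print(genus_counts)
```

In this toy sample, the two Lachnospiraceae OTUs merge into one family-level feature but remain separate at the genus level, which is the sense in which coarser ranks yield fewer, less individualized features.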
Keywords: 16S rRNA gene sequencing; colon cancer; machine learning; microbiome; taxonomic level
Year: 2022; PMID: 35012354; PMCID: PMC8749421; DOI: 10.1128/mbio.03161-21
Source DB: PubMed; Journal: mBio; Impact factor: 7.867
FIG 1 Random forest model performance. (A) Strip plot of the area under the receiver operating characteristic curve (AUROC) values on the test data set for 100 seeds predicting screen-relevant neoplasias (SRNs) using a random forest model. Black circles denote the means, and black lines denote the standard deviations. The gray dashed line denotes an AUROC of 0.5, which is equivalent to random classification. Significance between taxonomic levels was quantified by comparing the difference in mean AUROC and is denoted by letters A through E on the right side of the plot; taxonomic levels with the same letter are in the same significance group and are not significantly different from one another. (B) Strip plot of the sensitivity at a specificity of 90% across the 100 model iterations for each taxonomic level. Black circles denote the means, and black lines denote the standard deviations. The letters W through Z on the right side of the plot denote the significance groups.
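The two metrics plotted in the figure can be sketched in a few lines. The example below is a minimal pure-Python illustration, assuming binary labels (1 = case, 0 = control) and a model score per sample; the function names and the toy labels and scores are hypothetical and are not taken from the study. AUROC is computed as the probability that a randomly chosen case outscores a randomly chosen control, and sensitivity is reported at the most permissive threshold that still keeps specificity at or above 90%.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(labels, scores, target_spec=0.90):
    """Highest sensitivity over thresholds whose specificity is >= target_spec."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    best = 0.0  # a threshold above every score gives specificity 1 and sensitivity 0
    for t in sorted(set(scores)):
        # A sample is called positive when its score is >= t.
        specificity = sum(s < t for s in neg) / len(neg)
        sensitivity = sum(s >= t for s in pos) / len(pos)
        if specificity >= target_spec:
            best = max(best, sensitivity)
    return best


# Hypothetical test-set predictions: 4 cases followed by 10 controls.
labels = [1, 1, 1, 1] + [0] * 10
scores = [0.9, 0.8, 0.6, 0.3,
          0.7, 0.5, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05]

print(auroc(labels, scores))
print(sensitivity_at_specificity(labels, scores))
```

Repeating such a calculation for each of the 100 train/test splits at each taxonomic level would give the distributions of points that the strip plots summarize.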
Overview of the number of features at each taxonomic level before and after preprocessing as described in Materials and Methods
| Taxonomic level | No. of features before preprocessing | No. of features after preprocessing | % of features kept after preprocessing |
|---|---|---|---|
| Phylum | 19 | 9 | 47.4 |
| Class | 36 | 19 | 52.8 |
| Order | 65 | 28 | 43.1 |
| Family | 124 | 54 | 43.5 |
| Genus | 316 | 115 | 36.4 |
| OTU | 20,079 | 705 | 3.5 |
| ASV | 104,106 | 478 | 0.5 |
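The percentage column can be reproduced directly from the two count columns. The short check below recomputes it from the values in the table; the variable names are illustrative.

```python
# (features before preprocessing, features after preprocessing), copied from the table.
features = {
    "Phylum": (19, 9),
    "Class": (36, 19),
    "Order": (65, 28),
    "Family": (124, 54),
    "Genus": (316, 115),
    "OTU": (20079, 705),
    "ASV": (104106, 478),
}

# Percentage of features retained, rounded to one decimal place as in the table.
pct_kept = {
    level: round(100 * after / before, 1)
    for level, (before, after) in features.items()
}

for level, pct in pct_kept.items():
    print(f"{level}: {pct}%")
```

The recomputed values match the table. They also show that although the ASV level starts with roughly five times as many features as the OTU level, fewer ASV features (478 versus 705) survive preprocessing.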