Hiroshi Kuwabara, Kenji Katsumata, Atsuhiro Iwabuchi, Ryutaro Udo, Tomoya Tago, Kenta Kasahara, Junichi Mazaki, Masanobu Enomoto, Tetsuo Ishizaki, Ryoko Soya, Miku Kaneko, Sana Ota, Ayame Enomoto, Tomoyoshi Soga, Masaru Tomita, Makoto Sunamura, Akihiko Tsuchida, Masahiro Sugimoto, Yuichi Nagakawa.
Abstract
As the worldwide prevalence of colorectal cancer (CRC) increases, it is vital to reduce its morbidity and mortality through early detection. Saliva-based tests are an ideal noninvasive tool for CRC detection. Here, we explored and validated salivary biomarkers to distinguish patients with CRC from those with adenoma (AD) and healthy controls (HC). Saliva samples were collected from patients with CRC, AD, and HC. Untargeted salivary hydrophilic metabolite profiling was conducted using capillary electrophoresis-mass spectrometry and liquid chromatography-mass spectrometry. An alternative decision tree (ADTree)-based machine learning (ML) method was used to assess the discrimination abilities of the quantified metabolites. A total of 2602 unstimulated saliva samples were collected from subjects with CRC (n = 235), AD (n = 50), and HC (n = 2317). Data were randomly divided into training (n = 1301) and validation datasets (n = 1301). The clustering analysis showed a clear consistency of aberrant metabolites between the two groups. The ADTree model was optimized through cross-validation (CV) using the training dataset, and the developed model was validated using the validation dataset. The model discriminating CRC + AD from HC showed area under the receiver-operating characteristic curves (AUC) of 0.860 (95% confidence interval [CI]: 0.828-0.891) for CV and 0.870 (95% CI: 0.837-0.903) for the validation dataset. The other model discriminating CRC from AD + HC showed an AUC of 0.879 (95% CI: 0.851-0.907) and 0.870 (95% CI: 0.838-0.902), respectively. Salivary metabolomics combined with ML demonstrated high accuracy and versatility in detecting CRC.
Keywords: biomarker; colorectal cancer; metabolomics; polyamine; saliva
Year: 2022 PMID: 35754317 PMCID: PMC9459332 DOI: 10.1111/cas.15472
Source DB: PubMed Journal: Cancer Sci ISSN: 1347-9032 Impact factor: 6.518
FIGURE 1 Data analysis design. (A) Data used in this study. Data were randomly split into training and validation datasets. Machine learning (ML) models were developed using the cross-validation (CV) of the training dataset and validated using the validation dataset. (B) The ensemble alternative decision tree (ADTree) models. Each model has several nodes. The averaged predictions of multiple ADTree models are used as the final prediction. (C) Depiction of the comparisons drawn in the study. The gray and white boxes indicate positive and negative groups, respectively. HC and CRC are negative and positive groups, respectively. AD is considered a positive group in comparison (1) and a negative group in comparison (2). ADTree1 and MLR1 were developed for comparison (1), and ADTree2 and MLR2 were developed for comparison (2). AD, adenoma; CRC, colorectal cancer; HC, healthy controls.
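The label assignment in panel (C) and the prediction averaging in panel (B) can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the names `POSITIVE_GROUPS`, `binary_label`, and `ensemble_probability` are hypothetical, and the per-model probabilities are assumed to have been computed already by the individual ADTree models.

```python
from statistics import fmean

# Hypothetical mapping of each comparison to its positive groups.
# Comparison 1: CRC + AD (positive) vs HC (negative).
# Comparison 2: CRC (positive) vs AD + HC (negative).
POSITIVE_GROUPS = {1: {"CRC", "AD"}, 2: {"CRC"}}
KNOWN_GROUPS = {"HC", "AD", "CRC"}


def binary_label(group: str, comparison: int) -> int:
    """Return 1 if the group is positive in the given comparison, else 0."""
    if group not in KNOWN_GROUPS:
        raise ValueError(f"unknown group: {group!r}")
    return 1 if group in POSITIVE_GROUPS[comparison] else 0


def ensemble_probability(model_probabilities) -> float:
    """Average the probabilities that several models predict for one sample."""
    return fmean(model_probabilities)
```

With these helpers, an AD sample is labeled 1 in comparison (1) and 0 in comparison (2), while HC and CRC samples keep the same label in both.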
Subject information
| | Training data (n = 1301) | | | Validation data (n = 1301) | | |
|---|---|---|---|---|---|---|
| | HC | AD | CRC | HC | AD | CRC |
| n | 1159 | 25 | 117 | 1158 | 25 | 118 |
| Age (years) | | | | | | |
| Mean | 45.65 | 66.30 | 67.42 | 45.19 | 61.81 | 69.63 |
| ±SD | 10.15 | 11.07 | 11.24 | 10.10 | 10.40 | 12.14 |
| Gender | | | | | | |
| Male | 318 | 21 | 64 | 338 | 20 | 66 |
| Female | 841 | 4 | 53 | 820 | 5 | 52 |
| Stage | | | | | | |
| 0/I/II(N1)/II(N2)/Iva | | | 2/30/36/25/14/10 | | | 2/31/36/25/14/10 |
Abbreviations: AD, adenoma; CRC, colorectal cancer; HC, healthy controls.
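The near-equal group sizes in the two halves of the table are what a split performed within each group would produce. The source states only that the data were randomly divided, so the stratification below is an assumption, and the function name `stratified_half_split`, the seed, and the rule for odd-sized groups are all hypothetical. The sketch does not attempt to reproduce the exact per-group counts in the table.

```python
import random
from collections import defaultdict


def stratified_half_split(groups, seed=0):
    """Split sample indices into two halves, shuffling within each group.

    groups: list of group names, one per sample (e.g. "HC", "AD", "CRC").
    Returns (train_indices, validation_indices), both sorted.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)

    train, validation = [], []
    for group in sorted(by_group):
        members = by_group[group]
        rng.shuffle(members)
        # Assumption: when a group has an odd size, the extra sample goes to training.
        half = (len(members) + 1) // 2
        train.extend(members[:half])
        validation.extend(members[half:])
    return sorted(train), sorted(validation)
```

Shuffling inside each group keeps the small AD group evenly represented in both halves, which a single shuffle of all 2602 samples would not guarantee.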
FIGURE 2 Heatmap illustrating salivary metabolite concentrations. Each metabolite concentration was divided by its average for the training and validation dataset. These data were averaged again for each group.
FIGURE 3 The difference in salivary metabolites between healthy controls (HC) and colorectal cancer (CRC). (A) Score plots of partial least squares discriminant analysis (PLS-DA). x- and y-axes indicate the first and the second PLS component. Each plot corresponds to one sample. The plots with shorter distances indicate high similarity of the metabolomics profile of these samples. (B) Variable importance projection (VIP) score of PLS-DA. (C) Pathway analysis. The metabolite concentration of each sample was divided by its median value. Subsequently, the data were log2-transformed and translated into Z-scores. For PLS-DA, the 10-fold cross-validation with five components showed the highest generalization value (R² = 0.552 and Q² = 0.524).
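The preprocessing described in this caption (division by the median, log2 transformation, conversion to Z-scores) can be sketched as follows. Several details are assumptions rather than statements from the source: the median is taken per metabolite across samples, all concentrations are strictly positive, and the Z-score uses the sample standard deviation. The function name `median_log2_zscore` is hypothetical.

```python
import math
from statistics import mean, median, stdev


def median_log2_zscore(concentrations):
    """Transform one metabolite's concentrations across samples.

    Steps: divide by the median, take log2, then standardize to Z-scores.
    """
    if any(c <= 0 for c in concentrations):
        raise ValueError("log2 requires strictly positive concentrations")
    med = median(concentrations)
    logged = [math.log2(c / med) for c in concentrations]
    mu = mean(logged)
    sd = stdev(logged)  # assumption: sample (n - 1) standard deviation
    if sd == 0:
        raise ValueError("all values are identical; Z-scores are undefined")
    return [(value - mu) / sd for value in logged]
```

For example, concentrations of 1, 2, and 4 have a median of 2, become -1, 0, and 1 after the log2 step, and are unchanged by the Z-score step because their mean is 0 and their sample standard deviation is 1.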
FIGURE 4 Discriminability of machine learning (ML) models. Receiver-operating characteristic (ROC) curves of all data (A) as well as the cross-validation (CV) of the training (B) and validation (C) datasets by alternative decision tree (ADTree)1. The ADTree1 prediction probability for adenoma (AD) + colorectal cancer (CRC) using the training (D) and validation (E) datasets. ADTree2 ROC curves for all data (F) as well as the CV of the training (G) and validation (H) datasets. ADTree2 prediction probability for CRC using the training (I) and validation (J) datasets. A, B, E, F, All area under ROC curves (AUC) values are presented with a 95% confidence interval (CI) between parentheses. The values were statistically significant (p < 0.0001). D, E, I, J, Asterisks indicate the p value of Dunn's post-test after the Kruskal-Wallis test. ***p < 0.01 and ****p < 0.0001. The y-axis indicates the prediction probability.
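The AUC values reported for these ROC curves summarize how well the prediction probabilities rank positive samples above negative ones. A minimal sketch of that quantity, using the pairwise (Mann-Whitney) definition, is given below. It is illustrative only: the function name `roc_auc` is hypothetical, the source does not state which software or confidence-interval method was used, and the sketch does not compute a confidence interval.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    labels: 1 for positive samples, 0 for negative samples.
    scores: prediction probabilities, one per sample.
    Tied pairs count as half a correct ranking.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))
```

The double loop is quadratic in the number of samples, which is acceptable for a dataset of this size; a rank-based formulation gives the same value more efficiently.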