Salivary metabolomics with machine learning for colorectal cancer detection.

Hiroshi Kuwabara¹, Kenji Katsumata¹, Atsuhiro Iwabuchi², Ryutaro Udo¹, Tomoya Tago¹, Kenta Kasahara¹, Junichi Mazaki¹, Masanobu Enomoto¹, Tetsuo Ishizaki¹, Ryoko Soya¹, Miku Kaneko³, Sana Ota³, Ayame Enomoto³, Tomoyoshi Soga³, Masaru Tomita³, Makoto Sunamura⁴, Akihiko Tsuchida¹, Masahiro Sugimoto³,⁵, Yuichi Nagakawa¹.

Abstract

As the worldwide prevalence of colorectal cancer (CRC) increases, it is vital to reduce its morbidity and mortality through early detection. Saliva-based tests are an ideal noninvasive tool for CRC detection. Here, we explored and validated salivary biomarkers to distinguish patients with CRC from those with adenoma (AD) and healthy controls (HC). Saliva samples were collected from patients with CRC, AD, and HC. Untargeted salivary hydrophilic metabolite profiling was conducted using capillary electrophoresis-mass spectrometry and liquid chromatography-mass spectrometry. An alternative decision tree (ADTree)-based machine learning (ML) method was used to assess the discrimination abilities of the quantified metabolites. A total of 2602 unstimulated saliva samples were collected from subjects with CRC (n = 235), AD (n = 50), and HC (n = 2317). Data were randomly divided into training (n = 1301) and validation datasets (n = 1301). The clustering analysis showed a clear consistency of aberrant metabolites between the two groups. The ADTree model was optimized through cross-validation (CV) using the training dataset, and the developed model was validated using the validation dataset. The model discriminating CRC + AD from HC showed area under the receiver-operating characteristic curves (AUC) of 0.860 (95% confidence interval [CI]: 0.828-0.891) for CV and 0.870 (95% CI: 0.837-0.903) for the validation dataset. The other model discriminating CRC from AD + HC showed an AUC of 0.879 (95% CI: 0.851-0.907) and 0.870 (95% CI: 0.838-0.902), respectively. Salivary metabolomics combined with ML demonstrated high accuracy and versatility in detecting CRC.

Keywords: biomarker; colorectal cancer; metabolomics; polyamine; saliva

Substances: Biomarkers, Tumor

Year: 2022; PMID: 35754317; PMCID: PMC9459332; DOI: 10.1111/cas.15472

Source DB: PubMed; Journal: Cancer Sci; ISSN: 1347-9032; Impact factor: 6.518

Abbreviations: AD, adenoma; ADTree, alternative decision tree; APC, adenomatous polyposis coli; AUC, area under ROC curves; CA19‐9, cancer antigen 19‐9; CE, capillary electrophoresis; CEA, carcinoembryonic antigen; CI, confidence interval; CRC, colorectal cancer; CV, cross‐validation; HC, healthy controls; LC, liquid chromatography; ML, machine learning; MLR, multiple logistic regression; MS, mass spectrometry; QQQ, triple‐quadrupole; ROC, receiver‐operating characteristic; TCA, tricarboxylic acid cycle; TOF, time‐of‐flight; VIP, variable importance projection

INTRODUCTION

Despite advances in cancer diagnosis and management over the last decade, CRC still represents a significant global health burden. Overall, CRC ranks third in cancer morbidity and second in mortality among all cancers worldwide. The prevalence of CRC is strongly associated with the westernization of eating and health habits. Furthermore, it is expected to increase further in developed countries with remarkable economic growth. Therefore, cancer detection is an important issue in CRC diagnosis and treatment. Fecal occult blood tests are the most commonly used CRC screening tests in Japan. Although these tests have contributed to the reduction in the mortality rate associated with CRC, their limited sensitivity for precancerous lesions, such as AD, and for early‐stage CRC indicates the need for improvement. In addition, CRC in a large proportion of the at‐risk population is still detected at advanced stages. Currently used blood‐based biomarkers for CRC, such as carcinoembryonic antigen (CEA) and cancer antigen 19‐9 (CA19‐9), are suitable as surveillance or prognostic indicators in CRC treatment but are unsuitable for screening or diagnosing CRC because of their low sensitivity and specificity, as well as their association with other cancers, including gastrointestinal cancers such as gastric and pancreatic cancer and gynecological cancers such as ovarian cancer. Therefore, developing a convenient novel screening method with higher sensitivity and specificity is paramount. Frequently mutated genes have been identified in association with CRC, including APC, CTNNB1, KRAS, BRAF, and SMAD4. Epigenetic variation in CRC involves hyper‐ and hypomethylation, which inactivates tumor suppressor genes and activates oncogenes, driving epithelial cell growth toward cancerous tumor formation. In addition to these genetic changes, malignant cancers, including CRC, show drastic metabolic shifts.
For instance, regardless of oxygen availability, tumor cells activate the glycolysis pathway to produce adenosine triphosphate (Warburg effect). In addition, upregulation of oxidative phosphorylation has been reported in several cancers. Glutamine is used as a carbon source alternative to glucose via the TCA cycle in proliferating tumor cells to synthesize purines and pyrimidines. Holistic changes in the metabolic pathways have also been reported, including the amino acid, pentose phosphate, urea cycle, polyamine, and nucleotide pathways. Therefore, metabolites in biofluids, including blood and saliva, that reflect these CRC‐associated metabolic aberrations have been analyzed to establish a novel set of biomarkers. Saliva is an ideal biofluid that enables the detection of various diseases. However, salivary components are expected to be fragile compared with those from other biofluids. Therefore, strict protocols must be established for processing saliva samples for reproducible quantification. For example, unstimulated saliva collection, overnight fasting duration, restriction of any oral treatments before sample collection, and frozen sample storage should follow standard protocols. Nevertheless, saliva tests allow noninvasive sampling, which is beneficial for cancer screening. Metabolomic biomarkers in saliva samples have been shown to represent a potential medium for cancer detection. Biomarkers for oral cancer and for cancers in organs far from the oral cavity, such as breast and pancreatic cancers, have been reported. ML is a cornerstone for enhancing the discriminability of multiple biomarkers. Using urinary polyamine profiles, we previously used an alternative decision tree (ADTree)‐based prediction method to detect CRC. Salivary polyamines with this ML method showed high discriminability for breast cancers.
In this study, we performed metabolomic profiling of saliva collected from patients with CRC, patients with AD, and HC. We developed ML models to determine the combination of metabolite concentrations that could discriminate among these groups. Specifically, we drew two types of comparison: CRC + AD versus HC, and CRC versus AD + HC. More than 2000 samples were examined, and the data were divided into two datasets. One dataset was used for ML model development, and the other was used to validate the ML model. Our approach has shown the screening potential of salivary metabolomic profiles to detect CRC.

MATERIALS AND METHODS

Subjects

This study was approved by the Ethics Committee of Tokyo Medical University (Nos. 2346 and 3405) and conducted in accordance with the Declaration of Helsinki. Written informed consent was obtained from all participants who agreed to serve as saliva donors. Patients with CRC who underwent chemotherapy and patients with chronic metabolic diseases, such as diabetes, were excluded. Patients histopathologically diagnosed with colorectal adenocarcinoma were included, and patients with all other types of cancer (adenosquamous cell carcinoma, endocrine carcinoma, lymphoma, etc.) were excluded. The resected specimens were pathologically classified according to the 7th edition of the Union for International Cancer Control TNM Classification of Malignant Tumors. All patients with AD were histologically diagnosed as having AD after polypectomy. In addition, samples of HC were collected at the Center for Health Surveillance and Preventive Medicine, Tokyo Medical University.

Saliva collection

Subjects were allowed only water intake after 9:00 p.m. on the day before collection. Salivary samples were collected between 9:00 and 11:00 a.m. Subjects were required to brush their teeth without toothpaste on the day of collection and could not use lipstick, drink water, smoke, brush their teeth, or exercise intensively during the 1 hour before saliva collection. A polypropylene straw 1.1 cm in diameter was used to assist in the saliva collection. Approximately 400 μl of unstimulated saliva was collected in 50 ml polypropylene tubes kept on ice to prevent the degradation of salivary metabolites. After collection, saliva samples were immediately stored at −80°C. Visibly cloudy or highly bubbly saliva was discarded after visual inspection, and another saliva sample was collected 5 min later.

Saliva preparation and metabolomics analyses

Saliva samples were analyzed via two methods. CE‐time‐of‐flight MS (CE‐TOF‐MS) was used for nontargeted analyses of hydrophilic metabolites, and LC‐triple quadrupole MS (LC‐QQQ‐MS) was used for accurate quantification of polyamines, as described previously with slight modifications. Frozen saliva was thawed at 4°C for approximately 1.5 hours and subsequently mixed using a vortex mixer at 25°C. Ten microliters of each sample was used in the LC‐MS analysis, and the rest in the CE‐MS analysis. For LC‐MS analysis, saliva was mixed with methanol (90 μl) containing 149.6 mM ammonium hydroxide [1% (v/v) ammonia solution] and 0.9 μM internal standards (d8‐spermine, d8‐spermidine, d6‐N¹‐acetylspermidine, d3‐N¹‐acetylspermine, d6‐N¹,N⁸‐diacetylspermidine, d6‐N¹,N¹²‐diacetylspermine, hypoxanthine‐13C,15N, 1,6‐diaminohexane, 13C,15N‐Arg, 13C,15N‐Lys, 13C,15N‐Met, 13C,15N‐Pro, 13C,15N‐Trp, d3‐Leu, and d5‐Phe). After centrifugation at 15,780 × g for 10 minutes at 4°C, the supernatant was transferred to a fresh tube and vacuum dried. The sample was reconstituted with 90% methanol (10 μl) and water (30 μl) and then vortexed and centrifuged at 15,780 × g for 10 minutes at 4°C. One microliter of supernatant was then injected into the LC‐MS. For CE‐MS, saliva was centrifuged through a 5‐kDa‐cutoff filter (EMD Millipore) at 9100 × g for at least 2.5 hours at 4°C. The filtrate (45 μl) was transferred to a 1.5‐ml Eppendorf tube with 2 mM of internal standards (methionine sulfone, 2‐[N‐morpholino]‐ethanesulfonic acid [MES], D‐camphor‐10‐sulfonic acid, sodium salt, 3‐aminopyrrolidine, and trimesate). The instrumentation and measurement conditions used for LC‐QQQ‐MS and CE‐TOF‐MS were as described previously. Raw data processing followed the typical data processing flow.
LC‐MS data were processed using Agilent MassHunter Qualitative Analysis and Quantitative Analysis software, including the MassHunter Optimizer and the Dynamic Multiple Reaction Monitoring Mode (DMRM) software (version B.08.00; Agilent Technologies). Polyamine concentrations were calculated based on the peak area of corresponding internal standards. CE‐MS data were analyzed using MasterHands (Keio University), with noise filtering, subtraction of baselines, peak integration for each sliced electropherogram, estimation of accurate m/z in MS, alignment of multiple datasets to generate peak matrices, and identification of each peak by matching m/z values and corrected migration times to corresponding entries in a standard library. Metabolite concentrations in CE‐MS were calculated based on the ratio of peak area divided by the area of the internal standards in the samples and standard compound mixtures. Because both methods detected the polyamine peaks redundantly, the LC‐MS polyamine data were used for subsequent analyses.
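The internal-standard quantification described above can be illustrated with a minimal sketch. It assumes a single-point calibration against one standard mixture of known concentration, which the text does not state explicitly; the function names and the numbers are hypothetical.

```python
def relative_area(peak_area: float, internal_standard_area: float) -> float:
    """Peak area normalized by the internal standard area from the same run."""
    if internal_standard_area <= 0:
        raise ValueError("internal standard area must be positive")
    return peak_area / internal_standard_area


def estimate_concentration(
    sample_area: float,
    sample_is_area: float,
    standard_area: float,
    standard_is_area: float,
    standard_concentration: float,
) -> float:
    """Assumed single-point calibration: scale the known concentration of the
    standard mixture by the ratio of relative areas (sample / standard)."""
    sample_ratio = relative_area(sample_area, sample_is_area)
    standard_ratio = relative_area(standard_area, standard_is_area)
    if standard_ratio <= 0:
        raise ValueError("standard peak area must be positive")
    return standard_concentration * sample_ratio / standard_ratio


# Hypothetical peak areas; the standard mixture is assumed to be 50 (e.g. in uM).
conc = estimate_concentration(
    sample_area=1200.0,
    sample_is_area=4000.0,
    standard_area=3000.0,
    standard_is_area=5000.0,
    standard_concentration=50.0,
)
print(f"estimated concentration: {conc:.2f}")
```

Normalizing both the sample and the standard by their own internal standard cancels run-to-run differences in injection volume and detector response, which is the usual motivation for this kind of ratio.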

Data analysis

The collected data were randomly split into training and validation datasets (Figure 1A). The metabolites detected in more than 95% of samples with P values < 0.05 (Mann‐Whitney test) between CRC and HC were selected. Fold changes (FC) of averaged concentrations between HC and CRC were calculated. Only the metabolites showing an FC higher than the average FC were used for clustering analyses. To evaluate the overall difference in the metabolite profile between HC and CRC, we conducted partial least squares discriminant analysis (PLS‐DA) and pathway analyses. PLS‐DA is a classification method frequently used in the metabolomics field; it aims to maximize the covariance between the independent variables (metabolites) and the corresponding dependent variables (groups) by finding a subspace of the explanatory variables (the components). PLS‐DA can generate score plots and VIP score plots.
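The metabolite pre-selection described above can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' code: the P values are taken as precomputed inputs, FC is assumed to be mean CRC divided by mean HC, and the average FC is assumed to be taken over the metabolites that pass the detection and significance filters. All names and numbers are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class MetaboliteSummary:
    name: str
    detection_rate: float  # fraction of samples in which the peak was detected
    p_value: float         # Mann-Whitney P value, CRC vs. HC (computed elsewhere)
    mean_crc: float        # average concentration in CRC
    mean_hc: float         # average concentration in HC

    @property
    def fold_change(self) -> float:
        # Assumption: FC is defined as CRC / HC.
        return self.mean_crc / self.mean_hc


def select_for_clustering(summaries, min_detection=0.95, alpha=0.05):
    """Keep well-detected, significant metabolites whose FC exceeds the average FC."""
    significant = [
        s for s in summaries
        if s.detection_rate > min_detection and s.p_value < alpha
    ]
    if not significant:
        return []
    average_fc = mean(s.fold_change for s in significant)
    return [s.name for s in significant if s.fold_change > average_fc]


# Hypothetical toy summaries.
toy = [
    MetaboliteSummary("metabolite_a", 0.99, 0.001, 3.0, 1.0),  # FC 3.0, kept
    MetaboliteSummary("metabolite_b", 0.98, 0.010, 1.2, 1.0),  # FC below average
    MetaboliteSummary("metabolite_c", 0.97, 0.200, 5.0, 1.0),  # not significant
    MetaboliteSummary("metabolite_d", 0.80, 0.001, 4.0, 1.0),  # detected too rarely
]
selected = select_for_clustering(toy)
print(selected)
```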

FIGURE 1

Data analysis design. (A), Data used in this study. Data were randomly split into training and validation datasets. Machine learning (ML) models were developed using the cross‐validation (CV) of the training dataset and validated using the validation dataset. (B), The ensemble alternative decision tree (ADTree) models. Each model has several nodes. The averaged predictions of multiple ADTrees are used as the final prediction. (C), Depiction of the comparisons drawn in the study. The gray and white boxes indicate positive and negative groups, respectively. HC and CRC are negative and positive groups, respectively. AD is considered as a positive group in comparison (1) and as a negative group in comparison (2). ADTree1 and MLR1 are developed for comparison (1), and ADTree2 and MLR2 are developed for comparison (2). AD, adenoma; CRC, colorectal cancer; HC, healthy controls

We evaluated the discrimination ability of multiple combinations of metabolites. As one of the multivariate analyses, MLR was used. Stepwise forward feature selection was used to eliminate collinearity among the independent variables. This method limits the number of variables and deals only with linear relationships between independent (metabolites) and dependent variables (groups). Therefore, we also used the ADTree algorithm, an ML method, which is an improved form of the conventional “if‐then” decision tree‐based method. In addition, multiple datasets were generated by random sampling allowing redundant selection, and ADTree models corresponding to each of the generated datasets were developed. The predictions of the individual ADTree models were averaged to enhance prediction accuracy (ensemble method; Figure 1B). The numbers of nodes and ADTrees were optimized via k‐fold CV using the training dataset. The developed model was validated using the validation dataset (Figure 1A).
In this study, although data were obtained from three groups (HC, AD, and CRC), we used MLR and ADTree as two‐group classification methods. Therefore, we drew two types of comparison: discriminating CRC + AD from HC (1) and CRC from AD + HC (2). ADTree models were developed for (1) and (2), named ADTree1 and ADTree2, respectively. In addition, MLR models were developed for (1) and (2), named MLR1 and MLR2, respectively (Figure 1C). The discrimination ability was evaluated using the area under the ROC curves (AUC). The quantified values, such as metabolite concentration and predicted probabilities, were evaluated using the Mann‐Whitney test for two‐group comparisons and the Kruskal‐Wallis test with Dunn's post‐tests for ≥ 3 group comparisons. JMP Pro (ver. 14.1.0; SAS Institute Inc.), GraphPad Prism (ver. 7.0.3; GraphPad Software, Inc.), MeV TM4 (ver. 4.9.0; http://mev.tm4.org), Weka (ver. 3.6.15; the University of Waikato), and MetaboAnalyst (v.5.0, https://www.metaboanalyst.ca) were used for the analyses.
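The ensemble step described here (resampling with replacement, one model per resample, averaged predictions) can be sketched in a few lines. This is a sketch under loud assumptions: the base learner below is a one‐feature threshold rule used as a stand‐in, not an ADTree, and it does not reproduce Weka's implementation; the function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(rows, labels):
    """Fit a one-feature threshold rule (a stand-in for one ADTree model).

    Returns (feature_index, threshold, p_below, p_above), where the p values
    are the observed positive fractions on each side of the threshold.
    """
    best = None
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows}):
            below = [y for r, y in zip(rows, labels) if r[j] <= threshold]
            above = [y for r, y in zip(rows, labels) if r[j] > threshold]
            if not below or not above:
                continue
            p_below, p_above = mean(below), mean(above)
            # Squared error when predicting the side-specific positive fraction.
            error = sum(
                (y - (p_below if r[j] <= threshold else p_above)) ** 2
                for r, y in zip(rows, labels)
            )
            if best is None or error < best[0]:
                best = (error, j, threshold, p_below, p_above)
    if best is None:  # no valid split in this resample: fall back to the base rate
        base = mean(labels)
        return (0, float("inf"), base, base)
    return best[1:]


def predict_stump(stump, row):
    j, threshold, p_below, p_above = stump
    return p_below if row[j] <= threshold else p_above


def fit_bagged_ensemble(rows, labels, n_models=16, seed=0):
    """Train one base model per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def predict_ensemble(models, row):
    """Final prediction: the average of the individual model predictions."""
    return mean(predict_stump(m, row) for m in models)


# Hypothetical single-metabolite toy data: low values are HC (0), high are CRC (1).
rows = [[0.1], [0.2], [0.3], [0.4], [1.1], [1.2], [1.3], [1.4]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
models = fit_bagged_ensemble(rows, labels)
low = predict_ensemble(models, [0.0])
high = predict_ensemble(models, [2.0])
print(f"low-concentration sample: {low:.2f}, high-concentration sample: {high:.2f}")
```

Averaging over resampled models reduces the variance of any single tree. In the study, the number of trees and nodes was tuned by CV, whereas `n_models=16` here is only an illustrative default.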
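The two binary comparisons and the AUC metric described above can be made concrete with a short sketch. The AUC is computed here from its rank interpretation (the probability that a randomly chosen positive scores higher than a randomly chosen negative, counting ties as one half); the helper names and toy scores are hypothetical.

```python
def binary_label(group: str, comparison: int) -> int:
    """Map HC/AD/CRC to a 0/1 label for comparison (1) or (2)."""
    if group not in {"HC", "AD", "CRC"}:
        raise ValueError(f"unknown group: {group}")
    if comparison == 1:   # CRC + AD (positive) vs. HC (negative)
        return int(group in {"CRC", "AD"})
    if comparison == 2:   # CRC (positive) vs. AD + HC (negative)
        return int(group == "CRC")
    raise ValueError("comparison must be 1 or 2")


def auc(scores, labels) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities for five subjects.
groups = ["HC", "HC", "AD", "CRC", "CRC"]
scores = [0.1, 0.7, 0.5, 0.6, 0.9]

auc1 = auc(scores, [binary_label(g, 1) for g in groups])
auc2 = auc(scores, [binary_label(g, 2) for g in groups])
print(f"comparison (1): {auc1:.3f}, comparison (2): {auc2:.3f}")
```

The only difference between the two comparisons is which side AD falls on, so the same scores can yield different AUC values.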

RESULTS

Overview of profiled metabolites

Table 1 summarizes the training and validation data of the subjects enrolled in this study. A total of 2602 unstimulated saliva samples were collected from 235 subjects with CRC, 50 subjects with AD, and 2317 HC. All data were randomly assigned to training (n = 1301) and validation datasets (n = 1301; Figure 1A). Among the 122 quantified metabolites, 63 showed p < 0.05 (Mann‐Whitney test) between HC and CRC. Among them, 23 metabolites showing an FC > 1.71 (average FC) were selected for clustering analyses (Figure 2). These metabolites showed higher concentrations in CRC than in HC. Several metabolites also showed higher concentrations in AD than in HC. The FC between HC and CRC is visualized in Figure S1. The acetylated polyamines N‐acetylputrescine, N¹‐acetylspermine, N¹,N⁸‐diacetylspermidine, N⁸‐acetylspermidine, N¹‐acetylspermidine, and N¹,N¹²‐diacetylspermine were included, and the first two showed relatively high FCs in both datasets. Two glycolysis metabolites (pyruvate and lactate), two citrate cycle metabolites (succinate and malate), and four amino acids were also included.

TABLE 1

Subject information

	Training data (n = 1301)			Validation data (n = 1301)
	HC	AD	CRC	HC	AD	CRC
n	1159	25	117	1158	25	118
Age
Mean	45.65	66.30	67.42	45.19	61.81	69.63
±SD	10.15	11.07	11.24	10.10	10.40	12.14
Gender
Male	318	21	64	338	20	66
Female	841	4	53	820	5	52
Stage
0/I/II(N1)/II(N2)/IVa			2/30/36/25/14/10			2/31/36/25/14/10

Abbreviations: AD, adenoma; CRC, colorectal cancer; HC, healthy controls.

FIGURE 2

Heatmap illustrating salivary metabolite concentrations. Each metabolite concentration was divided by its average for the training and validation dataset. These data were averaged again for each group

Partial least squares discriminant analysis and pathway analysis

The overall differences in metabolomic profiles between HC and CRC were evaluated using PLS‐DA (Figure 3A,B) and pathway analysis (Figure 3C). The score plots showed separation between HC and CRC (Figure 3A). N‐acetylputrescine and N¹‐acetylspermine showed high VIP scores and thus contributed strongly to this discrimination (Figure 3B). Pathway analysis detected three significantly enriched pathways: (1) amino sugar and nucleotide sugar metabolism, (2) alanine, aspartate, and glutamate metabolism, and (3) arginine and proline metabolism. However, the pathway impact of pathway (1) was small, while those of (2) and (3) were relatively high.

FIGURE 3

The difference in salivary metabolites between healthy controls (HC) and colorectal cancer (CRC). (A), Score plots of partial least squares discriminant analysis (PLS‐DA). x‐ and y‐axes indicate the first and the second PLS component. Each point corresponds to one sample. Points separated by shorter distances indicate high similarity of the metabolomic profiles of these samples. (B), Variable importance projection (VIP) score of PLS‐DA. (C), Pathway analysis. The metabolite concentration of each sample was divided by its median value. Subsequently, the data were log2‐transformed and translated into Z‐scores. For PLS‐DA, the 10‐fold cross‐validation with five components showed the highest generalization value (R² = 0.552 and Q² = 0.524)
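The preprocessing mentioned in this caption (median division, log2 transformation, Z‐scoring) could look like the sketch below. The caption is ambiguous about the details, so this assumes that each sample is divided by its own median, that all concentrations are positive, and that Z‐scores are computed per metabolite across samples using the sample standard deviation; the function name and toy values are hypothetical.

```python
import math
from statistics import mean, median, stdev


def normalize_profiles(samples):
    """samples: one list of positive concentrations per sample (same metabolite order)."""
    # Steps 1 and 2: divide each sample by its own median, then log2-transform.
    logged = [[math.log2(v / median(row)) for v in row] for row in samples]
    # Step 3: Z-score each metabolite (column) across samples.
    columns = list(zip(*logged))
    col_mean = [mean(c) for c in columns]
    col_sd = [stdev(c) for c in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, col_mean, col_sd)]
        for row in logged
    ]


# Hypothetical concentrations: three samples, three metabolites.
samples = [
    [1.0, 2.0, 4.0],
    [2.0, 4.0, 8.0],
    [1.0, 4.0, 16.0],
]
normalized = normalize_profiles(samples)
for row in normalized:
    print([round(v, 3) for v in row])
```

A metabolite that is constant after the first two steps has zero variance, so its Z‐score is set to 0.0 here rather than dividing by zero.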

Alternative decision tree models

The discrimination abilities of multiple combinations of metabolites were analyzed. Two ADTree models were developed: one for comparison (1) to discriminate AD + CRC from HC (ADTree1) and one for comparison (2) to discriminate CRC from AD + HC (ADTree2; Figure 1C). CV using training data resulted in 16 trees and 7 nodes for ADTree1 and 12 trees and 8 nodes for ADTree2. The AUC of ADTree1 was 0.933 (95% CI: 0.916‐0.951) for all data (Figure 4A) and 0.860 (95% CI: 0.828‐0.891) for CV (Figure 4B) in the training dataset. The generalization ability of the developed ADTree1 was 0.870 (95% CI: 0.837‐0.903) in the validation dataset (Figure 4C). This value was similar to that observed for CV in the training dataset. The stage‐specific differences in probabilities of AD + CRC predicted by ADTree1 for the training and validation data are depicted in Figure 4D,E. Significant differences from HC were observed at all stages except stage 0 (n = 2) (Dunn's post‐tests of Kruskal‐Wallis test). ADTree2 showed similar discrimination abilities of 0.951 (95% CI: 0.936‐0.965) for all data (Figure 4F) and 0.879 (95% CI: 0.851‐0.907) for CV (Figure 4G) in the training dataset, and 0.870 (95% CI: 0.838‐0.902) in the validation dataset (Figure 4H). The probabilities of CRC predicted by ADTree2 also showed similar patterns (Figure 4I,J). The metabolites used and their usage numbers in ADTree1 and ADTree2 are depicted in Figure S2. The top three metabolites (N‐acetylputrescine, 4‐methyl‐2‐oxopentanoate, and 5‐oxoproline) were used in both models.

FIGURE 4

Discriminability of machine learning (ML) models. Receiver‐operating characteristic (ROC) curves of all data (A) as well as the cross‐validation (CV) of the training (B) and validation (C) datasets by alternative decision tree (ADTree)1. The ADTree1 prediction probability for adenoma (AD) + colorectal cancer (CRC) using the training (D) and validation (E) datasets. ADTree2 ROC curves for all data (F) as well as the CV of the training (G) and validation (H) datasets. ADTree2 prediction probability for CRC using the training (I) and validation (J) datasets. A‐C, F‐H, All area under ROC curves (AUC) values are presented with a 95% confidence interval (CI) in parentheses. The values were statistically significant (p < 0.0001). D, E, I, J, Asterisks indicate the P value of Dunn's post‐test after the Kruskal‐Wallis test. ***p < 0.01 and ****p < 0.0001. The y‐axis indicates the prediction probability

Multiple logistic regression models

To compare the discrimination ability of the ADTree and MLR models, we developed two MLR models for comparisons (1) and (2) (Figure 1C). We developed MLR1 to discriminate AD + CRC from HC, and MLR2 to discriminate CRC from AD + HC. The ROC curves using training and validation datasets and stage‐specific prediction probabilities are shown in Figure S3. MLR1 and MLR2 each included three metabolites, and N‐acetylputrescine was selected in both models (Table S1). All AUC values yielded by both MLR models showed p < 0.0001 (Table S2). However, in the validation results, the AUC values of the ML (ADTree) models were higher than those of the MLR models.

Comparisons with tumor markers

Carcinoembryonic antigen and CA19‐9 of AD and CRC subjects were measured, and the sensitivities of the tumor markers (TMs), ML, and MLR models were compared (Table S3). For TMs, subjects showing CEA > 5.0 ng/ml or CA19‐9 > 37 U/ml were counted as positive. For MLR and ML, the optimal cutoff value was defined to maximize the sensitivity and specificity using the training dataset and was validated using the validation dataset. Predicted probabilities higher than these cutoff values were counted as positive. In the validation dataset, the sensitivities of CEA and CA19‐9 for CRC subjects were 30.5% and 16.9%, respectively. Those of ADTree1, ADTree2, MLR1, and MLR2 were 77.1%, 70.4%, 77.1%, and 76.3%, respectively. All TM values for AD were negative. For AD, ADTree1 and MLR1 showed sensitivities of 68%, while ADTree2 and MLR2 showed sensitivities of 60%. The correlations between predicted probabilities and TM are listed in Table S4. In the validation dataset, the correlations (R, Spearman correlation) with ADTree1 were 0.174 (CEA) and 0.133 (CA19‐9). Those with ADTree2 were 0.216 (CEA) and 0.169 (CA19‐9). Only the comparison of MLR1 and CEA showed no significance. The correlation values with MLR1 were 0.114 (CEA) and 0.0900 (CA19‐9), and those with MLR2 were 0.117 (CEA) and 0.0375 (CA19‐9). All correlations produced by the MLR showed no significance.
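The cutoff procedure in this section (choose the cutoff on the training data, then apply it unchanged to the validation data) can be sketched as follows. The text only says the cutoff maximizes sensitivity and specificity; this sketch assumes that means maximizing Youden's index (sensitivity + specificity − 1). Function names and toy scores are hypothetical.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Scores strictly above the cutoff are called positive.

    Requires at least one positive (1) and one negative (0) label.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, labels):
    """Assumed criterion: maximize Youden's index (sensitivity + specificity - 1)."""
    best_j, best_c = -1.0, None
    for c in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, c)
        j = sens + spec - 1
        if j > best_j:  # on ties, the lowest cutoff is kept
            best_j, best_c = j, c
    return best_c


# Hypothetical predicted probabilities (1 = patient, 0 = healthy control).
train_scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
val_scores = [0.2, 0.5, 0.35, 0.25, 0.9]
val_labels = [0, 0, 1, 1, 1]

cutoff = best_cutoff(train_scores, train_labels)
val_sens, val_spec = sensitivity_specificity(val_scores, val_labels, cutoff)
print(f"cutoff={cutoff}, validation sensitivity={val_sens:.3f}, specificity={val_spec:.3f}")
```

Fixing the cutoff before looking at the validation data keeps the reported validation sensitivity from being optimistically biased.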

Effect of age on the metabolomic profile

Age showed a significant effect on the metabolic profile. Therefore, age‐matched data were generated by eliminating HC subjects of lower ages. First, the coefficients of MLR1 and MLR2 were refitted on the age‐matched training dataset without feature selection, that is, using the already selected metabolites (Table S2), and evaluated using the validation dataset. We also conducted feature selection using the age‐matched data and developed new MLR1 and MLR2 models (Table S5). The models with and without feature selection all showed high AUC values (p < 0.0001) in both training and validation datasets (Table S6).

DISCUSSION

In this study, we investigated the use of metabolomics to discover salivary‐based biomarkers and discriminate among CRC, AD, and HC. As shown in the heatmap (Figure 2), the profiles of the training and validation datasets were highly similar. All metabolites in the map were elevated in CRC (Figure S1). Among them, end products of glycolysis, such as lactate and pyruvate, were elevated only in CRC. Lactate has also been reported in various studies, including our oral cancer saliva data. It can be inferred that the Warburg effect and glutaminolysis, which are characteristic of cancer metabolism, might underlie the observed characteristics. In addition, several amino acids, such as isoleucine, valine, lysine, and alanine, were elevated in both AD and CRC. The intermediate metabolites associated with these energy and amino acid pathways have been frequently reported. In both ML models, acetylated polyamines were selected as predictive features. We previously reported elevated polyamines in urine collected from CRC patients as valuable biomarkers. The synthesis of polyamines is attributed to various pathways; in cancer cells, the activation of ornithine decarboxylase (ODC), which converts ornithine to putrescine, largely affects the polyamine contents. Activated acetylation of spermine and spermidine results in high levels of their acetylated forms, which spread to the surrounding biofluid (Figure S4). As frequently reported, we also previously confirmed that N¹,N¹²‐diacetylspermine showed the most clearly elevated levels in CRC urine. However, other forms of acetylated polyamines, such as N¹‐acetylspermine and N¹,N⁸‐diacetylspermidine, were also elevated, consistent with other studies. In the current saliva data, N¹‐acetylspermine and N¹,N⁸‐diacetylspermidine showed higher discriminability than N¹,N¹²‐diacetylspermine (Figures S5 and S6).
The differences in AUC values between all stages and early stages (0, I, and II) of CRC were small. These data are beneficial in enhancing the capacity to detect early‐stage CRC subjects. The prediction probabilities calculated by the two ML models were also successfully utilized. We observed significant differences between each CRC stage and HC, even in the early stages, although stage 0 showed no significance because of the small sample number (Figure 4). These metabolites also showed similar discriminability for AD (Figures S5 and S6). We previously confirmed the similar aberrant metabolomic profiles of AD and CRC, which were shifted by the activation of MYC genes observed in AD. MYC induced the activation of ODC, resulting in the activation of polyamine synthesis (Figure S4). The prediction probabilities of AD by the developed models were significantly higher than those of HC, even though AD was grouped as negative data in ADTree2 (Figure 4I,J) and MLR2 (Figure S3G,H). These data indicate the usefulness of the models for detecting AD and CRC, whereas the differentiation between AD and CRC is not satisfactory. We compared the sensitivity of the ADTree models with those of CEA and CA19‐9 in AD and CRC data. Both ADTree models showed better sensitivity than these two TM. TM did not detect AD subjects; however, the ADTree and MLR models showed sensitivities of 60% or more (Table S3). Furthermore, R = 0.216 between CEA and ADTree2 was the highest positive correlation between the ML model and TM in the validation data (Table S4). Because the predicted probabilities correlated only weakly with the TM, the complementary use of ADTree models and TM would benefit the screening of AD and CRC. Right‐sided CRC has a higher mortality rate and worse prognosis than left‐sided CRC, and both genetic and metabolomic differences between these two sides have been reported. Therefore, we evaluated the effect of tumor location (left versus right colon) on prediction accuracy.
There was no significant difference between tumor locations in either the training or the validation data (Figure S7). This trend was observed for both MLR1 and MLR2. Therefore, the prediction accuracies of the developed models were not affected by tumor location. Recently developed liquid biopsies for CRCs include methylation and abnormal levels of circulating tumor DNA and noncoding RNA, mainly micro‐RNA, as markers; however, most of these markers are detected in plasma and stool. In saliva samples, miR‐21 has shown the ability to discriminate CRC from HC. A single marker that shows high specificity for a disease is beneficial for developing simple and reasonable assays as compared with simultaneous analyses of multiple markers for the detection of diseases. The analysis of volatile compounds in saliva has also shown potential to detect CRC. The current study measured only hydrophilic metabolites, and more comprehensive analyses could explore more accurate biomarkers of CRC. Several limitations need to be acknowledged. First, although the sample size is relatively large, this is a case‐control study; in short, the proportion of the three groups does not reflect the actual prevalence of these diseases. Second, the current data has an age bias between HC and the other groups. Therefore, age‐matched subsets were randomly generated, and the models showing discrimination at a significant level were confirmed (Tables S5 and S6). However, evaluating the developed models using age‐matched data, including larger samples, is necessary for rigorous validation. Third, the comparison with other diseases, especially other cancer types, was not performed. For example, salivary polyamines were elevated even in breast and pancreatic cancers. The elevation of salivary amino acids was also reported in breast cancer. Therefore, a single marker may not be enough for a disease‐specific index, and an ML model capturing multiple metabolite patterns would enhance the specificity.
To use the developed biomarkers for diagnostic purposes, rigorous validation is necessary, for example, comparison of clinical‐pathological features between HC and patients with CRC. The approach in the current study showed CRC detection abilities; however, room to improve the sensitivity and specificity of CRC detection still exists. In general, a lower threshold for determining positive cases enhances the sensitivity and reduces the false‐negative cases for screening purposes. Meanwhile, a higher threshold is used to enhance the specificity and reduce the false‐positive cases for diagnostic purposes. Saliva metabolomics demonstrated in this study showed a high sensitivity, which is suitable for a screening test; however, the specificity is not enough for diagnostic purposes. The current result can encourage patients who show a higher risk of AD or CRC to undergo other diagnostic tests. In conclusion, we analyzed the salivary metabolic profiles of CRC, AD, and HC. The data showed profile patterns, including polyamines, consistent with previous studies. The ensemble ADTree models successfully discriminated among these groups with high sensitivity and specificity. We also showed high generality using validation datasets. In addition, the models showed higher sensitivity than CEA and CA19‐9. The models could contribute to clinical screening for AD and CRC.

ACKNOWLEDGMENTS

The authors thank the members of Center for Health Surveillance and Preventive Medicine, Tokyo Medical University Hospital for collecting saliva samples.

DISCLOSURE

Tomoyoshi Soga is an editorial board member of Cancer Science. The authors have no conflict of interest.

ETHICAL APPROVAL

Approval of the research protocol by an Institutional Review Board: This study was approved by the Ethics Committee of Tokyo Medical University (Nos. 2346 and 3405).

INFORMED CONSENT

Informed consent was obtained from all subjects.

Appendix S1: additional data file.
