Ying Xie1, Wei-Yu Meng1, Run-Ze Li1, Yu-Wei Wang1, Xin Qian2, Chang Chan2, Zhi-Fang Yu2, Xing-Xing Fan1, Hu-Dan Pan1, Chun Xie1, Qi-Biao Wu1, Pei-Yu Yan1, Liang Liu1, Yi-Jun Tang2, Xiao-Jun Yao3, Mei-Fang Wang4, Elaine Lai-Han Leung5.
Abstract
Early diagnosis has been shown to improve the survival rate of lung cancer patients. The availability of blood-based screening could increase uptake of early lung cancer screening. The present study sought to identify plasma metabolites in Chinese patients as diagnostic biomarkers for lung cancer. We use an interdisciplinary approach, applied here to lung cancer for the first time, that combines metabolomics and machine learning to detect diagnostic biomarkers for early lung cancer. We enrolled a total of 110 lung cancer patients and 43 healthy individuals. Levels of 61 plasma metabolites were obtained from a targeted metabolomic study using LC-MS/MS. Notably, a specific combination of six metabolic biomarkers discriminated between stage I lung cancer patients and healthy individuals (AUC = 0.989, sensitivity = 98.1%, specificity = 100.0%). The top five metabolic biomarkers by relative importance, as ranked by the FCBF algorithm, could also serve as screening biomarkers for early detection of lung cancer. Naïve Bayes is recommended as a practical tool for early lung tumor prediction. This research supports the feasibility of blood-based screening and offers a more accurate, rapid and integrated tool for early lung cancer diagnosis. The proposed interdisciplinary method could be adapted to cancers other than lung cancer.
Keywords: Biomarker; Early diagnosis; Lung cancer; Machine learning; Metabolites
Year: 2020 PMID: 33217646 PMCID: PMC7683339 DOI: 10.1016/j.tranon.2020.100907
Source DB: PubMed Journal: Transl Oncol ISSN: 1936-5233 Impact factor: 4.243
Fig. 1. Heatmap depicting the metabolomic biomarker levels of Stage I lung tumor patients (n = 54) and healthy people (n = 43). Stage I lung tumor patients and healthy people were grouped by hierarchical clustering of metabolomic biomarker levels.
Fig. 2. Metabolic biomarkers for detection of early lung tumor and evaluation of different histological types. (A) The 46 metabolomic biomarkers that differed significantly in Stage I lung tumor patients (mean with SD). Of the 61 metabolites measured, 46 showed a statistically significant difference by Mann–Whitney U test (p-value < 0.05). (B) PCA of 10 metabolomic biomarkers in early lung tumor detection, showing a clear separation between stage I lung tumor patients and healthy individuals. (C) ROC curves of metabolomic biomarkers and combined variates in early lung tumor detection. The combination of six variates included proline, l-kynurenine, spermidine, amino-hippuric acid, palmitoyl-l-carnitine and taurine. (D) ROC analysis of metabolomic biomarkers and combined variates for adenocarcinoma (n = 63) and squamous carcinoma (n = 41) patients. The combination of four variates included hypoxanthine, l-kynurenine, proline and carnitine. SD, standard deviation. PCA, principal component analysis. ROC, receiver operating characteristic.
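The per-metabolite screening described in panel (A) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it computes the Mann–Whitney U statistic by pairwise comparison and a two-sided p-value from the large-sample normal approximation, without the tie or continuity corrections that statistical packages normally apply. The function name `mann_whitney_u` and the concentration values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U, p) for a two-sided Mann-Whitney U test.

    U counts the pairs (xi, yj) with xi > yj; ties count 0.5.
    p uses the normal approximation with no tie or continuity
    correction (a simplification for illustration only).
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0
            for xi in x for yj in y)
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


# Hypothetical plasma levels of one metabolite (arbitrary units).
patients = [31.2, 28.9, 35.4, 30.1, 33.8, 29.5, 36.0, 32.7]
healthy = [22.4, 24.1, 20.9, 25.3, 23.0, 21.7, 24.8, 26.2]

u, p = mann_whitney_u(patients, healthy)
print(f"U = {u}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

Dividing U by the product of the two group sizes gives the AUC of that metabolite used alone as a classifier, which is how a rank test of this kind relates to the ROC analysis reported for the same biomarkers.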
ROC analysis of metabolomic biomarkers and combined variates.
| Biomarker | AUC | Std. error | Asymptotic 95% CI, lower bound | Asymptotic 95% CI, upper bound | Optimal cut-off | Sensitivity | Specificity | Youden index |
|---|---|---|---|---|---|---|---|---|
| ROC analysis of metabolomic biomarkers and combined variates in early lung tumor detection. | ||||||||
| L-Kynurenine | 0.825 | 0.043 | 0.740 | 0.909 | 0.975 | 85.2% | 72.1% | 0.573 |
| Proline | 0.923 | 0.026 | 0.871 | 0.975 | 24.350 | 79.6% | 93.0% | 0.727 |
| Spermidine | 0.890 | 0.035 | 0.821 | 0.958 | 7.195 | 81.5% | 90.7% | 0.722 |
| Amino-hippuric acid | 0.811 | 0.045 | 0.722 | 0.900 | 4.035 | 68.5% | 93.0% | 0.615 |
| Palmitoyl-L-carnitine | 0.906 | 0.032 | 0.843 | 0.969 | 3.655 | 74.1% | 100.0% | 0.741 |
| Taurine | 0.920 | 0.032 | 0.856 | 0.983 | 71.300 | 88.9% | 95.3% | 0.842 |
| Phenylalanine | 0.848 | 0.038 | 0.774 | 0.922 | 125.500 | 79.6% | 76.7% | 0.564 |
| L-Valine | 0.876 | 0.036 | 0.806 | 0.946 | 167.000 | 68.5% | 95.3% | 0.639 |
| o-Tyr | 0.822 | 0.043 | 0.738 | 0.906 | 24.650 | 83.3% | 72.1% | 0.554 |
| Carnitine | 0.848 | 0.040 | 0.769 | 0.926 | 4.680 | 72.2% | 93.0% | 0.652 |
| Combination of two | 0.933 | 0.028 | 0.878 | 0.978 | 0.337 | 85.2% | 93.0% | 0.782 |
| Combination of three | 0.968 | 0.019 | 0.931 | 1.000 | −0.147 | 94.4% | 97.7% | 0.921 |
| Combination of six | 0.989 | 0.011 | 0.967 | 1.000 | −0.102 | 98.1% | 100.0% | 0.981 |
| ROC analysis of metabolomic biomarkers and combined variates of adenocarcinoma and squamous carcinoma patients. | ||||||||
| L-Kynurenine | 0.423 | 0.060 | 0.306 | 0.540 | 1.050 | 77.8% | 24.4% | 0.022 |
| Proline | 0.580 | 0.057 | 0.469 | 0.692 | 35.150 | 54.0% | 65.9% | 0.198 |
| Carnitine | 0.536 | 0.058 | 0.422 | 0.650 | 6.835 | 38.1% | 75.6% | 0.137 |
| Hypoxanthine | 0.639 | 0.055 | 0.531 | 0.746 | 0.092 | 69.8% | 56.1% | 0.259 |
| Hippuric acid | 0.628 | 0.056 | 0.519 | 0.737 | 2.620 | 49.2% | 77.5% | 0.267 |
| Combination of four | 0.740 | 0.049 | 0.644 | 0.837 | 0.556 | 58.7% | 78.0% | 0.368 |
Abbreviations: ROC, receiver operating characteristic; AUC, area under the curve.
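The Youden index column in the table above follows from J = sensitivity + specificity − 1. The "Optimal cut off" column is presumably the threshold that maximizes J, which is the usual convention, although the source does not say so. The sketch below recomputes J for three rows of the table and then picks a cut-off on a small set of made-up values. The function names, the sample values and the rule "positive when value >= cut-off" are assumptions for illustration.

```python
def youden_index(sensitivity, specificity):
    """Youden's J for sensitivity and specificity given as fractions."""
    return sensitivity + specificity - 1.0


def best_cutoff(values, labels):
    """Return (cutoff, sensitivity, specificity, J) with the largest J.

    Assumes label 1 = patient, label 0 = healthy, and that a sample is
    called positive when its value is >= the cut-off.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    best = None
    for c in sorted(set(values)):
        sens = sum(v >= c for v in pos) / len(pos)
        spec = sum(v < c for v in neg) / len(neg)
        j = youden_index(sens, spec)
        if best is None or j > best[3]:
            best = (c, sens, spec, j)
    return best


# (sensitivity, specificity, reported J) copied from the table above.
rows = {
    "Taurine": (0.889, 0.953, 0.842),
    "Combination of three": (0.944, 0.977, 0.921),
    "Combination of six": (0.981, 1.000, 0.981),
}
for name, (sens, spec, reported) in rows.items():
    print(name, round(youden_index(sens, spec), 3), "reported:", reported)

# Hypothetical biomarker values and class labels.
values = [1.2, 1.9, 2.4, 2.8, 3.1, 3.5, 4.0, 4.4]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(best_cutoff(values, labels))
```

When two thresholds tie on J, this sketch keeps the lower one; a real analysis would state its tie-breaking rule and would also handle biomarkers that are lower in patients than in healthy individuals.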
Fig. 3. Changes in metabolomic biomarkers with tumor stage progression. (A) Levels of the 10 metabolites that differed significantly among stage I (n = 54), stage II (n = 31) and stage III (n = 25) lung tumor patients and healthy individuals (n = 43). (B) ROC curve of metabolomic biomarkers of stage I (n = 54) lung tumor patients. (C) ROC curve of metabolomic biomarkers of stage II (n = 31) lung tumor patients. (D) ROC curve of metabolomic biomarkers of stage III (n = 25) lung tumor patients. ROC, receiver operating characteristic.
Fig. 4. Machine learning was applied to develop the diagnostic model for early-stage lung cancer. (A) Early lung tumor prediction models built with machine learning. (B) To validate the diagnostic performance of the machine learning models and to demonstrate the specificity of the metabolic biomarker features found in early lung cancer patients in our study, we created a scrambled set, which showed no predictive value. AUC, area under the curve. AdaBoost, Adaptive Boosting. SVM, support vector machines. KNN, k-nearest neighbor.
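The idea behind the scrambled-set control in panel (B) can be sketched with a small Gaussian naïve Bayes classifier. Everything here is an assumption made for illustration: the source does not say how the scrambled set was built (this sketch shuffles the training labels), which software was used, or how the models were configured, and the data are synthetic. Because a single shuffle can land far from chance in either direction, the sketch averages accuracy over many shuffles.

```python
import math
import random


def fit_gaussian_nb(X, y):
    """Estimate class prior, per-feature mean and variance for each class."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(col) / n for col in cols]
        # A small floor keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(cols, means)]
        model[c] = (n / len(y), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log posterior."""
    def log_posterior(c):
        prior, means, variances = model[c]
        total = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return total
    return max(model, key=log_posterior)


def accuracy(model, X, y):
    return sum(predict(model, x) == t for x, t in zip(X, y)) / len(y)


def make_data(n_per_class, rng):
    """Synthetic two-feature data; class 1 is shifted upward by 3 units."""
    X, y = [], []
    for label, shift in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)])
            y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(40, rng)
X_test, y_test = make_data(20, rng)

real_model = fit_gaussian_nb(X_train, y_train)
real_acc = accuracy(real_model, X_test, y_test)

# Control: retrain on shuffled labels many times and average test accuracy.
n_shuffles = 200
scrambled_accs = []
for _ in range(n_shuffles):
    scrambled = y_train[:]
    rng.shuffle(scrambled)  # breaks the link between features and labels
    scrambled_accs.append(
        accuracy(fit_gaussian_nb(X_train, scrambled), X_test, y_test))
mean_scrambled = sum(scrambled_accs) / n_shuffles

print(f"true labels: {real_acc:.3f}, scrambled (mean): {mean_scrambled:.3f}")
```

A model that still scored well after scrambling would suggest leakage or an evaluation artifact rather than a real metabolic signal, which is the point of the control.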
Machine learning models used for early lung tumor detection based on the metabolomic biomarker features.
| Data set | Model | TP | FP | TN | FN | Classification accuracy | Sensitivity | Specificity | AUC | Precision |
|---|---|---|---|---|---|---|---|---|---|---|
| Training set | KNN | 38 | 0 | 35 | 5 | 0.936 | 0.884 | | | 0.944 |
| Training set | SVM | 43 | 2 | 33 | 0 | 0.974 | | 0.943 | | 0.975 |
| Training set | Random Forest | 41 | 0 | 35 | 2 | 0.974 | 0.953 | | | 0.976 |
| Training set | Neural Network | 43 | 0 | 35 | 0 | | | | | |
| Training set | Naïve Bayes | 43 | 0 | 35 | 0 | | | | | |
| Training set | AdaBoost | 38 | 4 | 31 | 5 | 0.885 | 0.884 | 0.886 | 0.885 | 0.885 |
| Test set | KNN | 9 | 0 | 8 | 2 | 0.895 | 0.818 | | | 0.916 |
| Test set | SVM | 10 | 0 | 8 | 1 | 0.947 | 0.909 | | | 0.953 |
| Test set | Random Forest | 11 | 2 | 6 | 0 | 0.895 | | 0.750 | | 0.911 |
| Test set | Neural Network | 10 | 0 | 8 | 1 | 0.947 | 0.909 | | | 0.953 |
| Test set | Naïve Bayes | 11 | 0 | 8 | 0 | | | | | |
| Test set | AdaBoost | 4 | 0 | 8 | 7 | 0.632 | 0.364 | | 0.682 | 0.804 |
Abbreviations: AdaBoost, Adaptive Boosting; SVM, support vector machines; KNN, k-nearest neighbor; TN, true negative; FN, false negative; TP, true positive; FP, false positive; AUC, area under the curve.
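The summary columns of the table can be recomputed from the TP, FP, TN and FN counts. The sketch below does this for the AdaBoost training row. One point is an inference rather than something the source states: the reported Precision values appear to match a precision averaged over both classes and weighted by class size, not the precision of the positive class alone. The function name and dictionary keys are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive summary metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    n_pos, n_neg = tp + fn, tn + fp
    # Per-class precision; guard against a class that is never predicted.
    precision_pos = tp / (tp + fp) if tp + fp else 0.0
    precision_neg = tn / (tn + fn) if tn + fn else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / n_pos,
        "specificity": tn / n_neg,
        # Assumption: class-size-weighted average of per-class precision.
        "weighted_precision":
            (n_pos * precision_pos + n_neg * precision_neg) / total,
    }


# AdaBoost, training set: counts copied from the table above.
metrics = classification_metrics(tp=38, fp=4, tn=31, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For this row the function gives 0.885, 0.884, 0.886 and 0.885, in line with the reported accuracy, sensitivity, specificity and precision. Applied to the AdaBoost test-set counts (TP = 4, FP = 0, TN = 8, FN = 7), it gives accuracy 0.632 and sensitivity 0.364, matching that row as well.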