Chunyan Wang1, Yijing Long1, Wenwen Li2, Wei Dai3, Shaohua Xie3,4, Yuanling Liu1, Yinchenxi Zhang1, Mingxin Liu3, Yonghui Tian5, Qiang Li6, Yixiang Duan7.
Abstract
Accurate classification of adenocarcinoma (AC) and squamous cell carcinoma (SCC) in lung cancer is critical to physicians' clinical decision-making. Exhaled breath analysis has great potential for the non-invasive diagnosis of lung cancer but has rarely been reported for classifying lung cancer subtypes. In this paper, we propose for the first time a combined method that integrates a K-nearest neighbor (KNN) classifier, the borderline2-synthetic minority over-sampling technique (borderline2-SMOTE), and feature reduction methods to investigate the ability of exhaled breath to distinguish AC from SCC patients. The classification performance of the proposed method was compared with that of four other classification algorithms under different combinations of borderline2-SMOTE and feature reduction methods. The results indicated that the KNN classifier combined with borderline2-SMOTE and feature reduction was the most promising method for discriminating AC from SCC patients, obtaining the highest mean area under the receiver operating characteristic curve (0.63) and the highest mean geometric mean (58.50) among the classifiers compared. The results show that the combined algorithm can improve the classification performance for lung cancer subtypes in breathomics and suggest that combining non-invasive exhaled breath analysis with multivariate analysis is a promising screening method for informing treatment options and facilitating individualized treatment of patients with lung cancer subtypes.
Year: 2020 PMID: 32246031 PMCID: PMC7125212 DOI: 10.1038/s41598-020-62803-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Study design flow. The input data are first processed using one of four approaches: approach 1, no processing; approach 2, borderline resampling only; approach 3, dimensionality reduction only; approach 4, dimensionality reduction and borderline resampling. Five classifiers are then applied to build classification models in the training phase; finally, classification performance is evaluated on the testing set.
Demographics of adenocarcinoma patients and squamous cell carcinoma patients. Data are expressed as mean ± standard deviation for age and BMI.
| Clinical parameters | Adenocarcinoma | Squamous cell carcinoma |
|---|---|---|
| Sex | ||
| Male | 125 (53.42%) | 88 (96.70%) |
| Female | 109 (46.58%) | 3 (3.30%) |
| Age | ||
| Mean±SD | 61 ± 7.09 | 62 ± 6.72 |
| BMI | ||
| Mean±SD | 23.67 ± 3.03 | 22.48 ± 3.05 |
| Smoking status | ||
| Smoker | 67 | 51 |
| Non-smoker | 136 | 13 |
| Stopping smoker | 31 | 27 |
| Drinking status | ||
| Drinker | 48 | 24 |
| Non-drinker | 167 | 24 |
| Stopping drinker | 19 | 43 |
| Education status | ||
| Primary school | 92 | 39 |
| High school or above | 116 | 45 |
| None | 26 | 7 |
| TNM stage | ||
| I | 110 (47.01%) | 16 (17.58%) |
| II | 24 (10.26%) | 25 (27.47%) |
| III | 45 (19.23%) | 35 (38.46%) |
| IV | 55 (23.50%) | 15 (16.48%) |
Figure 2. Hotelling's T² range plot for outlier detection, with the sample number on the horizontal axis and the T² range on the vertical axis. The green and red dotted lines represent the 95% and 99% confidence intervals, respectively.
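As a rough illustration of the statistic plotted in this figure, the sketch below computes a Hotelling's T² value per sample from principal component scores, as the sum of squared scores each divided by the variance of its component. The function names, the demo scores, and the limit of 3.0 are hypothetical; the record does not give the authors' implementation or how their 95% and 99% limits were derived, so the limit is left as a caller-supplied number here.

```python
import statistics


def hotelling_t2(scores):
    """Return one T^2 value per sample.

    scores: list of rows, one per sample, each holding the sample's
    scores on A principal components (assumed mutually uncorrelated).
    """
    n_components = len(scores[0])
    columns = list(zip(*scores))
    means = [statistics.fmean(col) for col in columns]
    variances = [statistics.variance(col) for col in columns]  # sample variance
    return [
        sum((row[a] - means[a]) ** 2 / variances[a] for a in range(n_components))
        for row in scores
    ]


def flag_outliers(t2_values, limit):
    """Indices of samples whose T^2 exceeds a caller-supplied limit."""
    return [i for i, t2 in enumerate(t2_values) if t2 > limit]


# Hypothetical two-component scores for six samples (centered, uncorrelated).
# The last sample sits far from the others along the second component.
demo_scores = [(2, -1), (-2, -1), (1, -1), (-1, -1), (0, -1), (0, 5)]
demo_t2 = hotelling_t2(demo_scores)
demo_outliers = flag_outliers(demo_t2, limit=3.0)  # limit chosen for illustration only
```

In practice the limit would usually come from an F-distribution quantile at the chosen confidence level rather than a hand-picked constant.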
Figure 3. 3D scatter plots in PCA space (a) before and (b) after borderline2-SMOTE. The three axes represent the first three principal components. Abbreviations: PC, principal component; AC (red circles), adenocarcinoma; SCC (green circles), squamous cell carcinoma.
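To make the resampling step behind this figure more concrete, here is a minimal, simplified sketch of the idea behind borderline2-SMOTE: only minority samples near the class border (at least half, but not all, of their nearest neighbours are majority samples) are used as seeds, and each synthetic point is interpolated toward either a minority neighbour (gap in [0, 1)) or a majority neighbour (gap in [0, 0.5), so it stays closer to the seed). The function name, parameters (`m`, `k`, `n_new`, `seed`), the 50/50 choice between neighbour types, and the toy data are assumptions for illustration, not the authors' settings.

```python
import math
import random


def borderline2_smote(minority, majority, m=5, k=3, n_new=1, seed=0):
    """Generate synthetic minority samples from borderline ("danger") seeds."""
    rng = random.Random(seed)
    # Pool all samples; the flag marks majority-class membership.
    pool = [(p, False) for p in minority] + [(p, True) for p in majority]
    synthetic = []
    for i, x in enumerate(minority):
        # m nearest neighbours of x in the whole pool, excluding x itself
        # (minority samples occupy the first len(minority) pool positions).
        order = sorted(
            (j for j in range(len(pool)) if j != i),
            key=lambda j: math.dist(x, pool[j][0]),
        )
        n_major = sum(pool[j][1] for j in order[:m])
        # Keep only "danger" seeds: mostly, but not entirely, majority neighbours.
        if not (m / 2 <= n_major < m):
            continue
        min_nn = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(x, p),
        )[:k]
        maj_nn = sorted(majority, key=lambda p: math.dist(x, p))[:k]
        if not min_nn or not maj_nn:
            continue
        for _ in range(n_new):
            if rng.random() < 0.5:
                neighbour, gap = rng.choice(min_nn), rng.random()
            else:
                # Half-length step keeps the new point nearer the minority seed.
                neighbour, gap = rng.choice(maj_nn), 0.5 * rng.random()
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


# Toy 2D data: only the minority point (1.0, 0.0) lies on the class border.
toy_minority = [(0.0, 0.0), (0.2, 0.0), (1.0, 0.0)]
toy_majority = [(1.2, 0.0), (1.4, 0.0), (3.0, 0.0), (3.2, 0.0), (3.4, 0.0)]
toy_new = borderline2_smote(toy_minority, toy_majority, m=3, k=2, n_new=4, seed=1)
```

Established implementations differ in details such as how many synthetic points each seed produces, so this sketch should be read as a description of the mechanism rather than a drop-in replacement.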
The parameter settings, sensitivity, and specificity of the five classifiers in approach 1 and approach 2. Sensitivity and specificity in approach 2 are expressed as mean ± standard deviation.
| Classifier | Approach 1: Parameters | Approach 1: Sensitivity (%) | Approach 1: Specificity (%) | Approach 2: Parameters | Approach 2: Sensitivity (%) (mean ± SD) | Approach 2: Specificity (%) (mean ± SD) |
|---|---|---|---|---|---|---|
| KNN | K = 28 | 100 | 0 | K = 2; 1; 2 | 76.09 ± 9.96 | 41.67 ± 9.55 |
| RF | N = 215 | 91.30 | 18.75 | N = 295; 155; 265 | 81.88 ± 3.32 | 12.50 ± 6.25 |
| SVM | C = 1, gamma = 0.01 | 100 | 0 | C = 18; 29; 24, gamma = 0.3; 0.3; 0.3 | 60.87 ± 40.73 | 47.92 ± 29.54 |
| MLP | M = 5 | 60.87 | 37.5 | M = 17; 66; 70 | 60.87 ± 2.17 | 33.33 ± 3.61 |
| PLS-DA | n = 2 | 91.30 | 12.5 | n = 6; 8; 5 | 63.05 ± 3.76 | 43.75 ± 6.25 |
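The sensitivity and specificity pairs in the table above can be combined into a single G-mean per classifier. The sketch below assumes the usual definition, the square root of sensitivity times specificity, kept on the percentage scale used in the abstract; the record does not spell out the authors' exact formula, so values computed this way may not match their reported figures exactly. The helper name `g_mean` is hypothetical.

```python
import math


def g_mean(sensitivity_pct, specificity_pct):
    """Geometric mean of sensitivity and specificity, both given in percent."""
    return math.sqrt(sensitivity_pct * specificity_pct)


# Approach 1 (sensitivity %, specificity %) as listed in the table above.
approach1 = {
    "KNN": (100, 0),
    "RF": (91.30, 18.75),
    "SVM": (100, 0),
    "MLP": (60.87, 37.5),
    "PLS-DA": (91.30, 12.5),
}

for name, (sens, spec) in approach1.items():
    print(f"{name}: G-mean = {g_mean(sens, spec):.2f}")
```

A classifier that assigns every sample to one class, such as KNN and SVM in approach 1 (sensitivity 100%, specificity 0%), gets a G-mean of zero, which is why this metric suits imbalanced data better than raw accuracy.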
Figure 4. G-mean and AUC results for the five classifiers. (a) AUC; (b) G-mean. Error bars are shown for approach 2, whose results are averaged after borderline2-SMOTE.
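For readers less familiar with the AUC values shown in this figure, the sketch below computes AUC through its rank interpretation: the fraction of (positive, negative) pairs in which the positive sample receives the higher classifier score, with ties counted as half. The function name and the example scores are hypothetical and are not data from the study.

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores for three positive and two negative samples.
example_auc = auc_from_scores([0.9, 0.8, 0.3], [0.4, 0.2])
```

An AUC of 0.5 corresponds to chance-level ranking, which gives some context for the best mean AUC of 0.63 reported in the abstract.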
Figure 5. Heat maps of predictive performance in approach 3 for adenocarcinoma and squamous cell carcinoma patients. Results are shown for five classifiers across two feature dimensionality reduction methods (rows) and selected ranges (columns). (a) AUC and (b) G-mean values of the five classifiers without borderline2-SMOTE.
Figure 6. Heat maps of predictive performance in approach 4 for adenocarcinoma and squamous cell carcinoma patients. Results are shown for five classifiers across two feature dimensionality reduction methods (rows) and selected ranges (columns). (a) AUC and (b) G-mean values of the five classifiers with borderline2-SMOTE.