Shuyin Duan, Huimin Cao, Hong Liu, Lijun Miao, Jing Wang, Xiaolei Zhou, Wei Wang, Pingzhao Hu, Lingbo Qu, Yongjun Wu.
Abstract
As an emerging technology, artificial intelligence has been applied to the identification of various physical disorders. Here, we developed a three-layer diagnosis system for lung cancer using three machine learning approaches: decision tree C5.0, artificial neural network (ANN) and support vector machine (SVM). The area under the curve (AUC) was used to evaluate their discriminative power. In the first layer, the AUCs of C5.0, ANN and SVM were 0.676, 0.736 and 0.640, respectively, so ANN outperformed both C5.0 and SVM. In the second layer, the AUCs of C5.0, ANN and SVM were 0.804, 0.889 and 0.825, respectively; ANN was similar to SVM but superior to C5.0. The third layer yielded much higher AUCs of 0.908, 0.910 and 0.849 (C5.0, ANN and SVM, respectively), and C5.0 showed the highest sensitivity, 94.12%. On the basis of these data, we propose a three-layer diagnosis system for lung cancer. First, ANN serves as a broad-spectrum screening subsystem that uses 14 epidemiological and clinical-symptom variables to screen high-risk groups. Second, ANN combined with 5 additional tumor biomarkers serves as an auxiliary diagnosis subsystem to identify suspected lung cancer patients. Finally, C5.0 confirms lung cancer patients on the basis of 22 CT nodule-based radiomic features.
Keywords: lung cancer; machine learning; multidimensional variables; multimode diagnosis
Year: 2020 PMID: 32445550 PMCID: PMC7288961 DOI: 10.18632/aging.103249
Source DB: PubMed Journal: Aging (Albany NY) ISSN: 1945-4589 Impact factor: 5.682
Demographic characteristics of lung cancer and lung benign disease patients in the first-layer subsystem.
| Variable | Lung benign disease (n = 470) | Lung cancer (n = 372) | Test statistic | P |
| --- | --- | --- | --- | --- |
| Age by groups | | | | |
| ≤45 | 134 | 26 | 62.487 | <0.001* |
| >45 | 336 | 346 | | |
| Age (year) | 57 (44-67) | 60 (52-67) | -3.882 | <0.001* |
| Gender | | | | |
| Female | 213 | 123 | 13.004 | <0.001* |
| Male | 257 | 249 | | |
| Smoking status | | | | |
| No | 359 | 210 | 37.649 | <0.001* |
| Yes | 111 | 162 | | |
| Drinking status | | | | |
| No | 405 | 290 | 9.720 | 0.002* |
| Yes | 65 | 82 | | |
| History of lung infection | | | | |
| No | 167 | 108 | 3.989 | 0.046* |
| Yes | 303 | 264 | | |
| Chest tightness or chest pain | | | | |
| No | 230 | 176 | 0.219 | 0.639 |
| Yes | 240 | 196 | | |
| Expectoration | | | | |
| No | 209 | 132 | 6.955 | 0.008* |
| Yes | 261 | 240 | | |
| Bloody sputum | | | | |
| No | 428 | 290 | 28.406 | <0.001* |
| Yes | 42 | 82 | | |
| Cough | | | | |
| No | 144 | 88 | 8.180 | 0.004* |
| Yes | 326 | 284 | | |
| Hemoptysis | | | | |
| No | 432 | 319 | 5.072 | 0.024* |
| Yes | 38 | 53 | | |
| Fever or sweating | | | | |
| No | 280 | 289 | 31.095 | <0.001* |
| Yes | 190 | 83 | | |
| Family history of tumor | | | | |
| No | 446 | 342 | 3.027 | 0.082 |
| Yes | 24 | 30 | | |
| Family history of lung cancer | | | | |
| No | 445 | 346 | 1.018 | 0.313 |
| Yes | 25 | 26 | | |
*: Statistically significant at P=0.05 level.
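The test behind the categorical rows is not named in this record. As a minimal sketch, assuming a Pearson chi-square test without continuity correction, the smoking-status counts from the table (359 and 210 for "No", 111 and 162 for "Yes") give the listed statistic of 37.649. The function name `pearson_chi2_2x2` is hypothetical, chosen only for illustration.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Smoking status: "No" row (359, 210) and "Yes" row (111, 162), as listed in the table.
chi2, p_value = pearson_chi2_2x2(359, 210, 111, 162)
print(f"chi2 = {chi2:.3f}")  # chi2 = 37.649
print(f"p < 0.001: {p_value < 0.001}")
```

The agreement with the tabulated value suggests, but does not prove, that the uncorrected Pearson statistic was the one reported.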
Demographic characteristics of subjects in the second-layer subsystem.
| Variable | Lung benign disease (n = 157) | Lung cancer (n = 129) | Test statistic | P |
| --- | --- | --- | --- | --- |
| Age by groups | | | | |
| ≤45 | 41 | 8 | 19.778 | <0.001* |
| >45 | 116 | 121 | | |
| Age (year) | 58 (45-67) | 59 (52.5-66) | -1.834 | 0.067 |
| Gender | | | | |
| Female | 65 | 51 | 0.102 | 0.749 |
| Male | 92 | 78 | | |
| Smoking status | | | | |
| No | 114 | 70 | 10.390 | 0.001* |
| Yes | 43 | 59 | | |
| Drinking status | | | | |
| No | 133 | 98 | 3.486 | 0.062 |
| Yes | 24 | 31 | | |
| History of lung infection | | | | |
| No | 103 | 68 | 4.895 | 0.027* |
| Yes | 54 | 61 | | |
| Chest tightness or chest pain | | | | |
| No | 71 | 63 | 0.371 | 0.542 |
| Yes | 86 | 66 | | |
| Expectoration | | | | |
| No | 78 | 43 | 7.754 | 0.005* |
| Yes | 79 | 86 | | |
| Bloody sputum | | | | |
| No | 140 | 93 | 13.682 | <0.001* |
| Yes | 17 | 36 | | |
| Cough | | | | |
| No | 51 | 29 | 3.517 | 0.061 |
| Yes | 106 | 100 | | |
| Hemoptysis | | | | |
| No | 145 | 105 | 7.733 | 0.005* |
| Yes | 12 | 24 | | |
| Fever or sweating | | | | |
| No | 84 | 95 | 12.267 | <0.001* |
| Yes | 73 | 34 | | |
| Family history of tumor | | | | |
| No | 141 | 110 | 1.358 | 0.244 |
| Yes | 16 | 19 | | |
| Family history of lung cancer | | | | |
| No | 152 | 117 | 4.740 | 0.029* |
| Yes | 5 | 12 | | |
*: Statistically significant at P=0.05 level.
Comparison of the 5 tumor markers between lung cancer and lung benign diseases.
| Tumor marker | Lung benign disease | Lung cancer | Test statistic | P |
| --- | --- | --- | --- | --- |
| ProGRP (pg/mL) | 18.59 (11.61-30.39) | 27.50 (15.76-44.40) | -4.298 | <0.001* |
| VEGF (ng/mL) | 2.25 (1.38-3.42) | 3.00 (1.95-4.06) | -4.318 | <0.001* |
| CEA (ng/mL) | 2.27 (1.39-4.39) | 2.95 (1.87-5.55) | -2.705 | 0.007* |
| CYFRA21-1 (ng/mL) | 1.50 (0.77-2.15) | 1.57 (0.96-1.80) | -2.009 | 0.044* |
| NSE (ng/mL) | 9.30 (5.83-15.19) | 8.88 (5.36-15.04) | -0.727 | 0.467 |
*: Statistically significant at P=0.05 level.
Comparison of radiomic features extracted from lung CT benign and malignant nodules.
| Feature | Benign nodules | Malignant nodules | Test statistic | P |
| --- | --- | --- | --- | --- |
| f1 | 0.043 (0.023 to 0.648) | 0.198 (0.137 to 0.347) | -8.839 | <0.001* |
| f2 | 0.025 (0.014 to 0.045) | 0.121 (0.092 to 0.154) | -8.890 | <0.001* |
| f3 | 0.591 (0.352 to 0.830) | 1.722 (1.237 to 2.367) | -8.490 | <0.001* |
| f4 | 9.0E-4 (1.0E-3 to 1.1E-3) | 8.0E-4 (7.0E-4 to 8.0E-4) | -7.163 | <0.001* |
| f5 | 3.1E-8 (1.3E-8 to 9.4E-8) | 1.9E-8 (7.8E-9 to 6.4E-8) | -1.311 | 0.190 |
| f6 | 2.9E-12 (1.5E-12 to 5.4E-12) | 2.8E-12 (1.1E-12 to 7.4E-12) | -0.420 | 0.674 |
| f7 | 2.7E-12 (7.8E-13 to 5.6E-12) | 1.6E-12 (2.6E-13 to 4.0E-12) | -1.741 | 0.082 |
| f8 | 1.5E-26 (-2.8E-24 to 2.5E-24) | 4.0E-26 (-8.7E-26 to 3.8E-24) | -1.306 | 0.192 |
| f9 | -3.4E-16 (-1.7E-15 to 5.0E-16) | -6.2E-19 (-8.7E-16 to 6.1E-16) | -1.802 | 0.072 |
| f10 | 9.3E-26 (-4.7E-24 to 2.1E-24) | -9.7E-27 (-1.3E-24 to 2.8E-24) | -0.197 | 0.843 |
| f11 | 36.50 (6.25 to 106.50) | 814 (453 to 1722) | -8.714 | <0.001* |
| f12 | 0.16 (0.05 to 0.30) | 0.54 (0.36 to 0.68) | -7.423 | <0.001* |
| f15 | 0.00 (0.00 to 0.00) | 0.00 (0.00 to 1.00) | -0.819 | <0.001* |
| f16 | 0.00 (0.00 to 6.75) | 1.00 (-2.00 to 17.00) | -0.583 | 0.560 |
| f17 | 0 (0 to 3.4E-2) | 7.0E-5 (-1.7E-4 to 1.4E-3) | -1.298 | 0.194 |
| f18 | 132.63 (90.59 to 220.19) | 450.39 (343.46.76 to 617.20) | -8.368 | <0.001* |
| f19 | 0.956 (0.945 to 0.963) | 0.971 (0.963 to 0.976) | -6.202 | <0.001* |
| f20 | 0.849 (0.784 to 0.913) | 0.484 (0.322 to 0.645) | -8.657 | <0.001* |
| f21 | 0.944 (0.919 to 0.966) | 0.834 (0.759 to 0.890) | -8.115 | <0.001* |
| f22 | 3.088 (2.633 to 3.576) | 5.316 (4.342 to 6.693) | -8.409 | <0.001* |
*: Statistically significant at P=0.05 level.
Results of machine learning models to distinguish lung cancer from lung benign diseases.
| Model | Predicted class | Training set: actual benign | Training set: actual cancer | Testing set: actual benign | Testing set: actual cancer |
| --- | --- | --- | --- | --- | --- |
| C5.0-1 | Lung benign | 280 | 67 | 94 | 29 |
| | Lung cancer | 62 | 229 | 34 | 47 |
| | Total | 342 | 296 | 128 | 76 |
| | Accuracy | 79.78% | | 69.12% | |
| ANN-1 | Lung benign | 238 | 68 | 84 | 14 |
| | Lung cancer | 104 | 228 | 44 | 62 |
| | Total | 342 | 296 | 128 | 76 |
| | Accuracy | 73.04% | | 71.57% | |
| SVM-1 | Lung benign | 270 | 73 | 88 | 31 |
| | Lung cancer | 72 | 223 | 40 | 45 |
| | Total | 342 | 296 | 128 | 76 |
| | Accuracy | 77.27% | | 65.20% | |
| C5.0-2 | Lung benign | 107 | 1 | 38 | 7 |
| | Lung cancer | 4 | 96 | 8 | 25 |
| | Total | 111 | 97 | 46 | 32 |
| | Accuracy | 97.60% | | 80.77% | |
| ANN-2 | Lung benign | 99 | 18 | 43 | 5 |
| | Lung cancer | 12 | 79 | 3 | 27 |
| | Total | 111 | 97 | 46 | 32 |
| | Accuracy | 85.58% | | 89.74% | |
| SVM-2 | Lung benign | 109 | 2 | 40 | 7 |
| | Lung cancer | 2 | 95 | 6 | 25 |
| | Total | 111 | 97 | 46 | 32 |
| | Accuracy | 98.08% | | 83.33% | |
| C5.0-3 | Lung benign | 48 | 0 | 14 | 1 |
| | Lung cancer | 0 | 42 | 2 | 16 |
| | Total | 48 | 42 | 16 | 17 |
| | Accuracy | 100% | | 90.91% | |
| ANN-3 | Lung benign | 46 | 4 | 15 | 2 |
| | Lung cancer | 2 | 38 | 1 | 15 |
| | Total | 48 | 42 | 16 | 17 |
| | Accuracy | 93.33% | | 90.91% | |
| SVM-3 | Lung benign | 48 | 0 | 14 | 3 |
| | Lung cancer | 0 | 42 | 2 | 14 |
| | Total | 48 | 42 | 16 | 17 |
| | Accuracy | 100% | | 84.85% | |
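The accuracies in this table can be recomputed from the counts. The sketch below assumes that each row is a predicted class and that each pair of count columns holds the actual benign and actual cancer cases, first for the training set and then for the testing set. That layout is an interpretation, but it reproduces the listed accuracies. The function name `accuracy` is illustrative only.

```python
def accuracy(pred_benign, pred_cancer):
    """Percentage accuracy from the two rows of a confusion table.

    Each row is (actual_benign, actual_cancer) counts for one predicted class.
    """
    correct = pred_benign[0] + pred_cancer[1]  # diagonal: prediction matches actual class
    total = sum(pred_benign) + sum(pred_cancer)
    return 100.0 * correct / total


# C5.0-1 counts as listed: training set, then testing set.
train_acc = accuracy((280, 67), (62, 229))
test_acc = accuracy((94, 29), (34, 47))
print(f"training {train_acc:.2f}%, testing {test_acc:.2f}%")  # training 79.78%, testing 69.12%
```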
Effect evaluation of machine learning models in the testing set.
| Model | Accuracy (%) | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | AUC (CI) |
| --- | --- | --- | --- | --- | --- | --- |
| C5.0-1 | 69.12 | 61.84 | 73.44 | 58.02 | 76.42 | 0.676 (0.608-0.740) |
| ANN-1 | 71.57 | 81.58 | 65.63 | 58.49 | 85.71 | 0.736 (0.670-0.795) |
| SVM-1 | 65.20 | 59.21 | 68.75 | 52.94 | 73.95 | 0.640 (0.570-0.706) |
| C5.0-2 | 80.77 | 78.13 | 82.61 | 75.76 | 84.44 | 0.804 (0.698-0.885) |
| ANN-2 | 89.74 | 84.38 | 93.48 | 90.00 | 89.58 | 0.889 (0.798-0.949) |
| SVM-2 | 83.33 | 78.13 | 86.96 | 80.65 | 85.11 | 0.825 (0.732-0.902) |
| C5.0-3 | 90.91 | 94.12 | 87.50 | 88.89 | 93.33 | 0.908 (0.755-0.980) |
| ANN-3 | 90.91 | 88.24 | 93.75 | 93.75 | 88.24 | 0.910 (0.758-0.981) |
| SVM-3 | 84.85 | 82.35 | 87.50 | 87.50 | 82.35 | 0.849 (0.682-0.949) |
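The columns after accuracy are consistent with sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV), with lung cancer as the positive class. As a minimal sketch under that assumption, the C5.0-3 testing-set counts (16 cancers detected, 1 missed, 14 benign cases correctly cleared, 2 benign cases flagged) reproduce the C5.0-3 row. The function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic metrics, in percent, with cancer as the positive class."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),  # cancers correctly flagged
        "specificity": 100.0 * tn / (tn + fp),  # benign cases correctly cleared
        "ppv": 100.0 * tp / (tp + fp),          # flagged cases that are cancer
        "npv": 100.0 * tn / (tn + fn),          # cleared cases that are benign
    }


# C5.0-3 on the testing set.
metrics = diagnostic_metrics(tp=16, fn=1, tn=14, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

The AUC column cannot be recomputed this way, because it requires per-patient prediction scores rather than counts.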
Figure 1. A three-layer diagnosis system for lung cancer.
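The layered flow described in the abstract can be sketched as a simple cascade. Everything in the sketch is an assumption for illustration: the function `run_cascade`, the score fields and the 0.5 thresholds are hypothetical placeholders, not the trained ANN or C5.0 models. It also assumes that a patient not flagged at one layer does not proceed to the next.

```python
from typing import Callable, Mapping, Sequence, Tuple

Features = Mapping[str, float]
Classifier = Callable[[Features], bool]  # True means "flagged as possible lung cancer"


def run_cascade(patient: Features, layers: Sequence[Tuple[str, Classifier]]) -> str:
    """Pass a patient through the layers in order, stopping at the first negative."""
    for name, classify in layers:
        if not classify(patient):
            return f"not flagged at {name}"
    return "flagged by all layers"


# Placeholder classifiers: hypothetical score thresholds, NOT the trained models.
layers = [
    ("screening (ANN, 14 variables)", lambda p: p["screen_score"] >= 0.5),
    ("auxiliary diagnosis (ANN, + 5 biomarkers)", lambda p: p["marker_score"] >= 0.5),
    ("confirmation (C5.0, 22 radiomic features)", lambda p: p["radiomic_score"] >= 0.5),
]

low_risk = {"screen_score": 0.2, "marker_score": 0.9, "radiomic_score": 0.9}
high_risk = {"screen_score": 0.8, "marker_score": 0.7, "radiomic_score": 0.6}
print(run_cascade(low_risk, layers))   # not flagged at screening (ANN, 14 variables)
print(run_cascade(high_risk, layers))  # flagged by all layers
```

Ordering the layers from the cheapest inputs (questionnaire data) to the most expensive (CT radiomics) means later tests are only needed for patients flagged earlier, which matches the screening-then-confirmation roles given in the abstract.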