Pushpanjali Gupta, Sum-Fu Chiang, Prasan Kumar Sahoo, Suvendu Kumar Mohapatra, Jeng-Fu You, Djeane Debora Onthoni, Hsin-Yuan Hung, Jy-Ming Chiang, Yenlin Huang, Wen-Sy Tsai.
Abstract
This study uses machine learning (ML) to predict the tumor stage of colon cancer within the TNM (tumor, node, and metastasis) staging system from the most influential histopathology parameters, and to predict five-year disease-free survival (DFS). From the colorectal cancer (CRC) registry of Chang Gung Memorial Hospital, Linkou, Taiwan, 4021 patients were selected for the analysis. Various ML algorithms were applied to predict the tumor stage, with the Tumor Aggression Score (TAS) considered as a prognostic factor. The performance of each algorithm was evaluated using five-fold cross-validation. Accuracy was determined for two cases: standard TNM staging, and TNM staging with the TAS added. The Random Forest model achieved an F-measure of 0.89 when the TAS was included as an attribute alongside the standard attributes normally used for TNM stage prediction. Random Forest also outperformed all other algorithms in predicting five-year DFS, with an accuracy of approximately 84% and an area under the curve (AUC) of 0.82 ± 0.10.
Keywords: TNM staging; artificial intelligence; colon cancer; disease-free survival; machine learning; prediction
Year: 2019 PMID: 31842486 PMCID: PMC6966646 DOI: 10.3390/cancers11122007
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.639
Figure 1. Outline of colon cancer staging and survival prediction using machine learning.
Sources of parameters collected in the clinical system. TNM: tumor, node, and metastasis.
| Sources | Parameters Collected |
|---|---|
| | Age, gender, adjuvant therapy, status of follow-up, medical illness, pre-operation lab data |
| | Smoking history, coffee consumption, alcohol consumption, physical activity |
| | Operation date, intent of resection, operation timing, operation finding, operation type, early morbidity, late morbidity, mortality |
| | Tumor location, gross appearance, circumferential involvement, tumor size, histologic type, histologic grade, tumor extension, examined lymph node number, total positive lymph node number, TNM staging |
Parameters used in this study with p-value.
| Parameters | TAS <9.8, n (%) | TAS ≥9.8, n (%) | p-value |
|---|---|---|---|
| BMI | | | 0.004 |
| <18.5 | 215 (5.8) | 35 (11.90) | |
| 18.5–23.9 | 1665 (44.89) | 151 (51.36) | |
| 24.0–26.9 | 1070 (28.85) | 66 (22.45) | |
| ≥27 | 759 (20.46) | 42 (14.29) | |
| Family History (FH) | | | <0.001 |
| No | 2145 (57.83) | 180 (61.23) | |
| Yes | 1429 (38.53) | 104 (35.37) | |
| Unknown | 135 (3.64) | 10 (3.4) | |
| Age | | | 0.007 |
| <50 | 527 (14.20) | 50 (17) | |
| ≥50 | 3182 (85.80) | 244 (83) | |
| Gender | | | <0.001 |
| Male | 2114 (57) | 165 (56.12) | |
| Female | 1595 (43) | 129 (43.88) | |
| Hypertension | | | <0.001 |
| Yes | 2447 (65.97) | 191 (64.96) | |
| No | 1262 (34.03) | 103 (35.04) | |
| Diabetes | | | <0.001 |
| Yes | 3136 (84.55) | 231 (78.57) | |
| No | 573 (15.45) | 63 (21.43) | |
| Smoking | | | 0.001 |
| Never | 2324 (62.66) | 174 (59.18) | |
| Ex-Smoker | 546 (14.72) | 42 (14.29) | |
| Current | 839 (22.62) | 78 (26.53) | |
| Alcohol | | | <0.001 |
| Never | 2622 (70.69) | 213 (72.45) | |
| Ex-Drinker | 218 (5.88) | 18 (6.12) | |
| Current | 869 (23.43) | 63 (21.43) | |
| CEA Level | | | <0.001 |
| <5 | 2424 (65.35) | 145 (49.32) | |
| ≥5 | 1285 (34.65) | 149 (50.68) | |
| Hemoglobin | | | 0.9 |
| Low (<11) | 853 (23) | 182 (61.90) | |
| Normal | 2856 (77) | 112 (38.10) | |
| LAB_ALB | | | <0.001 |
| ≤3.5 | 424 (11.43) | 128 (43.54) | |
| >3.5 | 3285 (88.57) | 166 (56.46) | |
| LAB_CR | | | <0.001 |
| ≤1.1 | 2954 (79.64) | 233 (79.25) | |
| >1.1 | 755 (20.36) | 61 (20.75) | |
| WBC | | | <0.001 |
| ≤5500 | 202 (5.5) | 14 (4.8) | |
| >5500 | 3507 (94.5) | 280 (95.2) | |
| OP Time | | | 0.001 |
| Elective | 3635 (98) | 284 (96.6) | |
| Emergency | 74 (2) | 10 (3.4) | |
| OP Find | | | <0.001 |
| None | 3199 (86.25) | 205 (69.73) | |
| Combined | 470 (12.67) | 84 (28.57) | |
| Any one | 40 (1.08) | 5 (1.7) | |
| CirInvo | | | <0.001 |
| No | 1972 (53.17) | 26 (8.84) | |
| Yes | 1737 (46.83) | 268 (91.16) | |
| Tumor Differentiation | | | <0.001 |
| Grade I | 477 (12.86) | 7 (2.38) | |
| Grade II | 3001 (80.91) | 183 (62.24) | |
| Grade III | 231 (6.22) | 104 (35.37) | |
| Tumor Width | | | <0.001 |
| ≤4.4 | 2582 (69.61) | 8 (2.73) | |
| >4.4 | 1127 (30.39) | 286 (97.27) | |
| Tumor Length | | | <0.001 |
| ≤4.4 | 2679 (72.22) | 10 (3.4) | |
| >4.4 | 1030 (27.78) | 284 (96.6) | |
| T stage | | | <0.001 |
| T1 | 377 (10.16) | 5 (1.70) | |
| T2 | 531 (14.32) | 4 (1.36) | |
| T3 | 2322 (62.61) | 184 (62.59) | |
| T4 | 479 (12.91) | 101 (34.35) | |
| N stage | | | <0.001 |
| N0 | 2062 (55.6) | 179 (60.89) | |
| N1 | 1010 (27.23) | 57 (19.39) | |
| N2 | 522 (14.07) | 46 (1.24) | |
| N3 | 115 (3.10) | 12 (4.08) | |
Figure 2. Distribution of patients based on the T-Stage.
Performance of different machine learning (ML) training models for tumor staging taking only the tumor size as a prognostic factor.
| Algorithms | Accuracy | Precision | Recall | F-Measure |
|---|---|---|---|---|
| | 0.73 (± 0.01) | 0.70 (± 0.03) | 0.74 (± 0.01) | 0.67 (± 0.01) |
| | 0.63 (± 0.00) | 0.39 (± 0.00) | 0.63 (± 0.00) | 0.48 (± 0.00) |
| | 0.63 (± 0.00) | 0.39 (± 0.00) | 0.63 (± 0.00) | 0.48 (± 0.00) |
| | 0.63 (± 0.00) | 0.44 (± 0.12) | 0.63 (± 0.02) | 0.48 (± 0.00) |
| | 0.64 (± 0.01) | 0.57 (± 0.01) | 0.64 (± 0.01) | 0.53 (± 0.02) |
| | 0.73 (± 0.01) | 0.72 (± 0.08) | 0.73 (± 0.01) | 0.66 (± 0.01) |
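The "mean (± spread)" entries in the training tables are the kind of summary that five-fold cross-validation produces: one score per held-out fold, then an average. The following is a minimal sketch of that protocol, not the authors' pipeline. It assumes the ± value is a standard deviation across folds (the source does not say), uses synthetic labels with a hypothetical class mix, and substitutes a majority-class predictor for the real classifiers so that it runs with the standard library alone.

```python
import random
import statistics
from collections import Counter


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(labels, k=5, seed=0):
    """Return the per-fold accuracy of a majority-class stand-in classifier."""
    scores = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held_out_set = set(held_out)
        train = [labels[i] for i in range(len(labels)) if i not in held_out_set]
        # "Training": remember the most frequent stage in the other k-1 folds.
        majority = Counter(train).most_common(1)[0][0]
        # "Testing": predict that stage for every held-out sample.
        correct = sum(labels[i] == majority for i in held_out)
        scores.append(correct / len(held_out))
    return scores


# Synthetic T-stage labels (hypothetical class mix, not the registry data).
labels = ["T1"] * 100 + ["T2"] * 150 + ["T3"] * 600 + ["T4"] * 150
scores = cross_validate(labels)
mean = statistics.mean(scores)
spread = statistics.stdev(scores)
print(f"accuracy: {mean:.2f} (± {spread:.2f})")
```

Replacing the stand-in with a real model only changes the two commented steps inside the loop; the fold construction and the mean/spread summary stay the same.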
Performance of different ML testing models for tumor staging taking only the tumor size as a prognostic factor.
| Algorithms | Accuracy | Precision | Recall | F-Measure |
|---|---|---|---|---|
| | 0.74 | 0.77 | 0.74 | 0.67 |
| | 0.64 | 0.47 | 0.64 | 0.51 |
| | 0.65 | 0.48 | 0.65 | 0.54 |
| | 0.67 | 0.55 | 0.67 | 0.58 |
| | 0.63 | 0.50 | 0.63 | 0.51 |
| | 0.67 | 0.54 | 0.67 | 0.57 |
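The source does not state how precision, recall, and F-measure were averaged across the stage classes. One common convention, assumed here, weights each per-class score by that class's share of the samples; under it, recall always equals accuracy, which is consistent with most rows of these tables. The sketch below computes the four metrics that way on hypothetical labels, where the model predicts a single majority stage for every sample.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F-measure."""
    n = len(y_true)
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predictions per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    accuracy = sum(hits.values()) / n
    precision = recall = f_measure = 0.0
    for cls, count in support.items():
        # A class that is never predicted gets precision 0 by convention.
        p = hits[cls] / predicted[cls] if predicted[cls] else 0.0
        r = hits[cls] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = count / n
        precision += weight * p
        recall += weight * r
        f_measure += weight * f
    return accuracy, precision, recall, f_measure


# Hypothetical labels: 63% of cases are T3 and the model always predicts T3.
y_true = ["T3"] * 63 + ["T4"] * 20 + ["T2"] * 17
y_pred = ["T3"] * 100
acc, prec, rec, f1 = weighted_metrics(y_true, y_pred)
print(f"{acc:.2f} {prec:.2f} {rec:.2f} {f1:.2f}")  # prints: 0.63 0.40 0.63 0.49
```

With weighted averaging, a high accuracy paired with a much lower precision is the signature of a model that favors the largest class, so the precision and F-measure columns are the more informative ones when classes are imbalanced.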
Performance of different ML training models for tumor staging taking TAS as a prognostic factor.
| Algorithms | Accuracy | Precision | Recall | F-Measure |
|---|---|---|---|---|
| | 0.90 (± 0.01) | 0.90 (± 0.02) | 0.90 (± 0.02) | 0.90 (± 0.02) |
| | 0.73 (± 0.02) | 0.58 (± 0.08) | 0.73 (± 0.02) | 0.63 (± 0.02) |
| | 0.63 (± 0.00) | 0.41 (± 0.00) | 0.63 (± 0.00) | 0.49 (± 0.00) |
| | 0.63 (± 0.02) | 0.41 (± 0.07) | 0.63 (± 0.02) | 0.50 (± 0.03) |
| | 0.86 (± 0.01) | 0.88 (± 0.01) | 0.86 (± 0.01) | 0.85 (± 0.01) |
| | 0.89 (± 0.01) | 0.89 (± 0.01) | 0.89 (± 0.01) | 0.89 (± 0.01) |
Performance of different ML testing models for tumor staging taking Tumor Aggression Score (TAS) as a prognostic factor.
| Algorithms | Accuracy | Precision | Recall | F-Measure |
|---|---|---|---|---|
| | 0.89 | 0.89 | 0.88 | 0.89 |
| | 0.73 | 0.65 | 0.73 | 0.64 |
| | 0.62 | 0.38 | 0.62 | 0.48 |
| | 0.62 | 0.52 | 0.64 | 0.48 |
| | 0.85 | 0.87 | 0.85 | 0.84 |
| | 0.81 | 0.81 | 0.81 | 0.78 |
Figure 3. ROC curves with AUC for (a) Random Forest, (b) Support Vector Machine, (c) Logistic Regression, (d) Multilayer Perceptron, (e) K-Nearest Neighbors, and (f) AdaBoost, for tumor staging taking TAS as a prognostic factor.
Figure 4. Overall accuracy achieved for predicting five-year disease-free survival.
Performance of different ML training models for predicting five-year disease-free survival (DFS).
| Algorithms | Accuracy | Precision | Recall | F-Measure |
|---|---|---|---|---|
| | 0.84 (± 0.12) | 0.82 (± 0.14) | 0.83 (± 0.12) | 0.81 (± 0.14) |
| | 0.77 (± 0.03) | 0.74 (± 0.07) | 0.77 (± 0.03) | 0.71 (± 0.05) |
| | 0.76 (± 0.02) | 0.73 (± 0.04) | 0.76 (± 0.02) | 0.71 (± 0.02) |
| | 0.78 (± 0.11) | 0.77 (± 0.10) | 0.77 (± 0.11) | 0.77 (± 0.12) |
| | 0.75 (± 0.06) | 0.72 (± 0.08) | 0.75 (± 0.06) | 0.71 (± 0.02) |
| | 0.77 (± 0.03) | 0.75 (± 0.04) | 0.77 (± 0.03) | 0.74 (± 0.03) |
Performance of different ML testing models for predicting five-year DFS.
| Algorithms | Accuracy | Precision | Recall | F-Measure |
|---|---|---|---|---|
| | 0.76 | 0.74 | 0.76 | 0.71 |
| | 0.74 | 0.71 | 0.74 | 0.64 |
| | 0.73 | 0.70 | 0.73 | 0.71 |
| | 0.64 | 0.66 | 0.64 | 0.65 |
| | 0.73 | 0.70 | 0.73 | 0.70 |
| | 0.66 | 0.70 | 0.66 | 0.67 |
Figure 5. ROC curves with AUC for (a) Random Forest, (b) Support Vector Machine, (c) Logistic Regression, (d) Multilayer Perceptron, (e) K-Nearest Neighbors, and (f) AdaBoost, for predicting five-year DFS of colon cancer patients.
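For a binary outcome such as five-year DFS, the area under the ROC curve can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative one. The sketch below computes AUC directly from that definition. The labels and scores are hypothetical, and the coding of 1 as "disease-free at five years" is an assumption made for illustration only.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = disease-free at five years) and model scores.
y_true = [1, 1, 1, 0, 1, 0, 0, 1]
scores = [0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.35]
auc = roc_auc(y_true, scores)
print(f"AUC = {auc:.2f}")  # prints: AUC = 0.80
```

The pairwise loop is quadratic in the number of cases, which is fine for an illustration; a rank-based formulation gives the same value more efficiently on larger cohorts.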
Figure 6. DFS of patients with an increase in Tumor Aggression Score.