Danqing Hu, Huanyao Zhang, Shaolei Li, Huilong Duan, Nan Wu, Xudong Lu.
Abstract
BACKGROUND: Lung cancer is the leading cause of cancer death worldwide. Prognostic prediction plays a vital role in the decision-making process for postoperative non-small cell lung cancer (NSCLC) patients. However, the high imbalance ratio of prognostic data limits the development of effective prognostic prediction models.
Keywords: Active sampling; Ensemble learning; Non-small cell lung cancer; Prognostic prediction
Year: 2022 PMID: 36123745 PMCID: PMC9487160 DOI: 10.1186/s12911-022-01960-0
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 3.298
Fig. 1 The process of the ELAS
Statistics of the 1-year, 3-year, and 5-year patient prognoses (number of patients)
| Outcome | 1-year | 3-year | 5-year |
|---|---|---|---|
| Recurrence, n (%) | 102 (7.6%) | 296 (29.1%) | 377 (51.9%) |
| No recurrence, n (%) | 1,246 (92.4%) | 720 (70.9%) | 350 (48.1%) |
| Death, n (%) | 62 (4.6%) | 220 (21.8%) | 307 (43.7%) |
| No death, n (%) | 1,288 (95.4%) | 787 (78.2%) | 395 (56.3%) |
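The imbalance the abstract refers to can be read directly off these counts. The sketch below computes each task's event rate and an imbalance ratio. It assumes the common definition of the ratio (majority-class count divided by minority-class count); this excerpt does not say which definition the authors used.

```python
# Event counts from the prognosis table above: (events, non-events).
COUNTS = {
    "1-year recurrence": (102, 1246),
    "1-year death": (62, 1288),
    "3-year recurrence": (296, 720),
    "3-year death": (220, 787),
    "5-year recurrence": (377, 350),
    "5-year death": (307, 395),
}


def imbalance_ratio(n_a: int, n_b: int) -> float:
    """Assumed definition: majority-class size / minority-class size (always >= 1)."""
    return max(n_a, n_b) / min(n_a, n_b)


def event_rate(events: int, non_events: int) -> float:
    """Share of patients who experienced the event, in percent."""
    return 100.0 * events / (events + non_events)


if __name__ == "__main__":
    for task, (pos, neg) in COUNTS.items():
        print(f"{task:18s} rate={event_rate(pos, neg):5.1f}%  IR={imbalance_ratio(pos, neg):5.2f}")
```

With these counts the ratio falls from about 20.8:1 for 1-year death and 12.2:1 for 1-year recurrence to about 1.08:1 for 5-year recurrence. The short-horizon tasks are therefore by far the most skewed.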
AUROC values (mean and SD) of the base classifier algorithms (SVM, L2-LR, CART) and their ELAS versions
| Task | SVM mean | SVM SD | L2-LR mean | L2-LR SD | CART mean | CART SD | SVM-ELAS mean | SVM-ELAS SD | L2-LR-ELAS mean | L2-LR-ELAS SD | CART-ELAS mean | CART-ELAS SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1-year recurrence | 0.649 | 0.063 | 0.660 | 0.072 | 0.603 | 0.072 |  | 0.079 | 0.674 | 0.071 | 0.668 | 0.056 |
| 1-year death | 0.653 | 0.057 | 0.754 | 0.043 | 0.650 | 0.072 | **0.760** | 0.042 | 0.740 | 0.057 | 0.740 | 0.059 |
| 3-year recurrence | 0.713 | 0.041 | 0.697 | 0.027 | 0.637 | 0.031 |  | 0.033 | 0.709 | 0.029 | 0.706 | 0.036 |
| 3-year death | 0.702 | 0.044 | 0.711 | 0.040 | 0.663 | 0.041 | 0.733 | 0.035 | 0.720 | 0.037 |  | 0.040 |
| 5-year recurrence |  | 0.053 | 0.730 | 0.061 | 0.668 | 0.045 | 0.748 | 0.055 | 0.735 | 0.063 | 0.724 | 0.051 |
| 5-year death | 0.739 | 0.033 | 0.718 | 0.028 | 0.631 | 0.044 |  | 0.029 | 0.729 | 0.026 | 0.694 | 0.040 |
| All tasks | 0.701 | 0.063 | 0.711 | 0.056 | 0.642 | 0.057 |  | 0.052 | 0.718 | 0.055 | 0.711 | 0.054 |
Bold indicates the best result for each task. Blank cells: value not available.
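Every comparison in this section is reported as AUROC or AUPRC. AUROC is the probability that a randomly chosen patient with the event receives a higher predicted score than a randomly chosen patient without it (the Mann-Whitney formulation), with ties counted as one half. A value of 0.5 corresponds to random ranking. The sketch below computes AUROC straight from that definition. It is an illustration, not the study's evaluation code, and the toy labels and scores are hypothetical.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative).

    Ties count as one half. The O(n_pos * n_neg) double loop favours clarity over speed.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 3 of the 4 positive/negative pairs are ranked correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

For real data, scikit-learn's `roc_auc_score` computes the same quantity far more efficiently.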
AUPRC values (mean and SD) of the base classifier algorithms (SVM, L2-LR, CART) and their ELAS versions
| Task | SVM mean | SVM SD | L2-LR mean | L2-LR SD | CART mean | CART SD | SVM-ELAS mean | SVM-ELAS SD | L2-LR-ELAS mean | L2-LR-ELAS SD | CART-ELAS mean | CART-ELAS SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1-year recurrence | 0.145 | 0.028 |  | 0.070 | 0.118 | 0.046 | 0.153 | 0.052 | 0.173 | 0.074 | 0.156 | 0.055 |
| 1-year death | 0.123 | 0.042 |  | 0.042 | 0.109 | 0.029 | 0.129 | 0.039 | 0.133 | 0.040 | 0.136 | 0.041 |
| 3-year recurrence | 0.518 | 0.054 | 0.497 | 0.041 | 0.406 | 0.033 |  | 0.050 | 0.509 | 0.047 | 0.486 | 0.044 |
| 3-year death | 0.437 | 0.054 | 0.413 | 0.048 | 0.352 | 0.061 |  | 0.047 | 0.421 | 0.047 | 0.448 | 0.068 |
| 5-year recurrence |  | 0.057 | 0.742 | 0.065 | 0.648 | 0.054 | 0.758 | 0.055 | 0.745 | 0.064 | 0.724 | 0.046 |
| 5-year death | 0.694 | 0.045 | 0.680 | 0.036 | 0.532 | 0.035 |  | 0.040 | 0.690 | 0.034 | 0.634 | 0.052 |
| All tasks | 0.446 | 0.250 | 0.441 | 0.234 | 0.361 | 0.203 |  | 0.247 | 0.445 | 0.239 | 0.431 | 0.227 |
Bold indicates the best result for each task. Blank cells: value not available.
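Unlike AUROC, AUPRC depends on the event rate: a random scorer's expected AUPRC is close to the positive prevalence, not 0.5. The sketch below computes average precision, a common step-wise estimate of AUPRC. When scores are distinct, this matches the summary reported by scikit-learn's `average_precision_score`. This excerpt does not say which AUPRC estimator the authors used, so treat this as one plausible choice. The helper `chance_level_ap` is a hypothetical illustration that scores randomly generated cases at random.

```python
import random
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Cases are ranked by descending score; precision is recorded at the rank of
    every positive case and averaged over positives. Tied scores are not grouped.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive case")
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def chance_level_ap(prevalence: float, n: int = 20000, seed: int = 0) -> float:
    """Average precision of a random scorer on synthetic labels with the given prevalence."""
    rng = random.Random(seed)
    labels = [1 if rng.random() < prevalence else 0 for _ in range(n)]
    scores = [rng.random() for _ in range(n)]
    return average_precision(labels, scores)
```

Applied to the prevalences in the prognosis table, this puts the chance level for the 1-year tasks near 0.076 (recurrence) and 0.046 (death). The 1-year AUPRC values above (roughly 0.10 to 0.17) should be judged against those baselines, not against 0.5.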
Fig. 2 The AUROC values of the base classifier algorithms and the ELAS
Fig. 3 The AUPRC values of the base classifier algorithms and the ELAS
p-values of paired Student's t-tests between the base classifier algorithms and the ELAS
| Metric | Comparison | 1-year tasks | 3-year tasks | 5-year tasks | All tasks |
|---|---|---|---|---|---|
| AUROC | SVM versus SVM-ELAS | 0.411 | |||
| L2-LR versus L2-LR-ELAS | 0.487 | ||||
| CART versus CART-ELAS | |||||
| AUPRC | SVM versus SVM-ELAS | 0.165 | 0.378 | ||
| L2-LR versus L2-LR-ELAS | 0.093 | ||||
| CART versus CART-ELAS |
Bold indicates p < 0.05, i.e., a statistically significant difference between the two models.
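A paired Student's t-test asks whether the mean of the per-pair differences between two models' scores is zero. It uses the statistic t = mean(d) / (sd(d) / sqrt(n)) with n - 1 degrees of freedom. This excerpt does not state what the paired observations were. The sketch assumes one matched score per evaluation run for each model, and the example inputs are hypothetical. To stay dependency-free, the two-sided p-value is obtained by numerically integrating the t density with Simpson's rule. In practice, `scipy.stats.ttest_rel` performs the same test.

```python
import math
from typing import Sequence, Tuple


def paired_t_test(a: Sequence[float], b: Sequence[float], steps: int = 2000) -> Tuple[float, int, float]:
    """Two-sided paired Student's t-test on matched scores.

    Returns (t statistic, degrees of freedom, p-value). The p-value comes from
    Simpson's-rule integration of the t density, an approximation that is
    adequate for illustration. `steps` must be even.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    if var == 0:
        raise ValueError("all paired differences are identical")
    t = mean / math.sqrt(var / n)
    df = n - 1
    # Log of the t-density normalising constant.
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def density(x: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    # Simpson's rule for the probability mass between 0 and |t|.
    h = abs(t) / steps
    area = density(0.0) + density(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * density(i * h)
    area *= h / 3
    p = max(0.0, 1.0 - 2.0 * area)  # two-sided tail probability
    return t, df, p


# Hypothetical matched scores whose differences are 1, 2, 3, 4, 5.
print(paired_t_test([2, 3, 4, 5, 6], [1, 1, 1, 1, 1]))
```

In this example t = sqrt(18) ≈ 4.24 with 4 degrees of freedom, giving p ≈ 0.013, which is below the 0.05 threshold used in these tables.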
AUROC values (mean and SD) of the ensemble algorithms (SVM-AdaBoost, SVM-Bagging), the resampling algorithms (SVM-SMOTE, SVM-TomekLinks), and the proposed SVM-ELAS
| Task | SVM-AdaBoost mean | SVM-AdaBoost SD | SVM-Bagging mean | SVM-Bagging SD | SVM-SMOTE mean | SVM-SMOTE SD | SVM-TomekLinks mean | SVM-TomekLinks SD | SVM-ELAS mean | SVM-ELAS SD |
|---|---|---|---|---|---|---|---|---|---|---|
| 1-year recurrence | 0.682 | 0.082 | 0.673 | 0.072 | 0.620 | 0.073 | 0.650 | 0.065 |  | 0.079 |
| 1-year death |  | 0.055 | 0.726 | 0.047 | 0.670 | 0.058 | 0.668 | 0.058 | 0.760 | 0.042 |
| 3-year recurrence | 0.692 | 0.038 | 0.723 | 0.037 | 0.706 | 0.031 | 0.723 | 0.038 |  | 0.033 |
| 3-year death | 0.707 | 0.043 | 0.721 | 0.039 | 0.710 | 0.030 | 0.711 | 0.043 | **0.733** | 0.035 |
| 5-year recurrence |  | 0.055 |  | 0.053 | 0.751 | 0.053 |  | 0.053 | 0.748 | 0.055 |
| 5-year death | 0.724 | 0.031 | 0.739 | 0.032 | 0.732 | 0.031 | 0.738 | 0.036 |  | 0.029 |
| All tasks | 0.721 | 0.062 | 0.722 | 0.054 | 0.698 | 0.065 | 0.707 | 0.062 |  | 0.052 |
Bold indicates the best result for each task. Blank cells: value not available.
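SVM-SMOTE pairs the SVM with SMOTE oversampling. SMOTE creates synthetic minority-class cases on the line segment between a minority case and one of its k nearest minority-class neighbours. The sketch below is a simplified variant: base cases are drawn at random, and k defaults to 5. This excerpt does not give the configuration used in the study, and the toy points are hypothetical. Production implementations are available in imbalanced-learn (`imblearn.over_sampling.SMOTE`).

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Generate synthetic minority samples by SMOTE-style interpolation.

    Each new point lies between a randomly chosen minority sample and one of
    its k nearest minority-class neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical minority cases at the corners of the unit square.
print(smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], n_new=3, k=2))
```

Because every synthetic point is an interpolation, it stays inside the region spanned by the existing minority cases. SMOTE adds plausible minority examples rather than exact duplicates.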
AUPRC values (mean and SD) of the ensemble algorithms (SVM-AdaBoost, SVM-Bagging), the resampling algorithms (SVM-SMOTE, SVM-TomekLinks), and the proposed SVM-ELAS
| Task | SVM-AdaBoost mean | SVM-AdaBoost SD | SVM-Bagging mean | SVM-Bagging SD | SVM-SMOTE mean | SVM-SMOTE SD | SVM-TomekLinks mean | SVM-TomekLinks SD | SVM-ELAS mean | SVM-ELAS SD |
|---|---|---|---|---|---|---|---|---|---|---|
| 1-year recurrence | 0.150 | 0.047 |  | 0.049 | 0.114 | 0.033 | 0.151 | 0.043 | **0.153** | 0.052 |
| 1-year death |  | 0.033 | 0.125 | 0.043 | 0.101 | 0.034 | 0.124 | 0.050 | 0.129 | 0.039 |
| 3-year recurrence | 0.493 | 0.042 | 0.524 | 0.051 | 0.490 | 0.046 | 0.524 | 0.051 |  | 0.050 |
| 3-year death | 0.420 | 0.045 | 0.454 | 0.054 | 0.404 | 0.048 | 0.449 | 0.044 |  | 0.047 |
| 5-year recurrence | 0.762 | 0.052 |  | 0.054 | 0.762 | 0.057 | 0.763 | 0.057 | 0.758 | 0.055 |
| 5-year death | 0.681 | 0.040 | 0.693 | 0.045 | 0.678 | 0.048 | 0.685 | 0.050 |  | 0.040 |
| All tasks | 0.440 | 0.243 | 0.452 | 0.249 | 0.425 | 0.257 | 0.449 | 0.248 |  | 0.247 |
Bold indicates the best result for each task. Blank cells: value not available.
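SVM-TomekLinks uses the opposite strategy to SMOTE: it under-samples the majority class. Two cases of opposite classes form a Tomek link when each is the other's nearest neighbour. Deleting the majority-class member of every link thins out the majority class along the class boundary. The sketch below uses Euclidean distance and hypothetical one-dimensional points. imbalanced-learn offers `imblearn.under_sampling.TomekLinks` for real use. This excerpt does not state the study's exact settings.

```python
import math
from typing import List, Sequence, Tuple


def nearest_neighbour(points: Sequence[Sequence[float]], i: int) -> int:
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def tomek_links(points: Sequence[Sequence[float]], labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Pairs (i, j) with i < j that have opposite labels and are mutual nearest neighbours."""
    nn = [nearest_neighbour(points, i) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def remove_majority_links(points, labels, majority: int):
    """Under-sample by dropping the majority-class member of every Tomek link."""
    drop = set()
    for i, j in tomek_links(points, labels):
        drop.add(i if labels[i] == majority else j)
    kept = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in kept], [labels[k] for k in kept]


# Hypothetical data: only the cases at 1.0 (class 0) and 1.2 (class 1) form a link.
print(tomek_links([[0.0], [1.0], [1.2], [3.0], [5.0]], [0, 0, 1, 0, 0]))  # [(1, 2)]
```

Unlike random under-sampling, this removes only majority cases that sit right next to a minority case. As a result, it typically deletes few cases and leaves most of the class imbalance in place.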
Fig. 4 The AUROC values of the ensemble algorithms, resampling algorithms, and the ELAS
Fig. 5 The AUPRC values of the ensemble algorithms, resampling algorithms, and the ELAS
p-values of paired Student's t-tests between the benchmark algorithms and the ELAS
| Metric | Comparison | 1-year tasks | 3-year tasks | 5-year tasks | All tasks |
|---|---|---|---|---|---|
| AUROC | SVM-AdaBoost versus SVM-ELAS | 0.231 | |||
| SVM-Bagging versus SVM-ELAS | 0.490 | ||||
| SVM-SMOTE versus SVM-ELAS | 0.104 | ||||
| SVM-TomekLinks versus SVM-ELAS | 0.454 | ||||
| AUPRC | SVM-AdaBoost versus SVM-ELAS | 0.428 | 0.096 | ||
| SVM-Bagging versus SVM-ELAS | 0.396 | 0.146 | 0.337 | ||
| SVM-SMOTE versus SVM-ELAS | 0.084 | ||||
| SVM-TomekLinks versus SVM-ELAS | 0.334 | 0.287 | 0.088 |
Bold indicates p < 0.05, i.e., a statistically significant difference between the two models.