Silvia Moreno, Mario Bonfante, Eduardo Zurek, Dmitry Cherezov, Dmitry Goldgof, Lawrence Hall, Matthew Schabath.
Abstract
Lung cancer causes more deaths globally than any other type of cancer. Detecting EGFR and KRAS mutations helps determine the best treatment, but non-invasive ways to obtain this information are not available. In addition, sufficiently large, relevant public datasets are often lacking, so single classifiers do not perform well. In this paper, an ensemble approach is applied to improve EGFR and KRAS mutation prediction using a small dataset. A new voting scheme, Selective Class Average Voting (SCAV), is proposed, and its performance is assessed for both machine learning models and CNNs. For the EGFR mutation, in the machine learning approach, sensitivity increased from 0.66 to 0.75 and AUC from 0.68 to 0.70. With the deep learning approach, an AUC of 0.846 was obtained, and SCAV increased the accuracy of the model from 0.80 to 0.857. For the KRAS mutation, a significant increase in performance was found both in the machine learning models (AUC from 0.65 to 0.71) and in the deep learning models (AUC from 0.739 to 0.778). These results show how to learn effectively from small image datasets to predict EGFR and KRAS mutations, and that ensembles with SCAV increase the performance of machine learning classifiers and CNNs. They also provide confidence that, as large datasets become available, tools to augment clinical capabilities can be fielded.
Keywords: CNN; EGFR; KRAS; NSCLC; ensembles; machine learning; radiogenomics
Year: 2021 PMID: 33946756 PMCID: PMC8162978 DOI: 10.3390/tomography7020014
Source DB: PubMed Journal: Tomography ISSN: 2379-1381
Mutation status data summary.
| Variable | Values | Number of Cases (%) |
|---|---|---|
| EGFR Mutation Status | Mutant | 12 (14%) |
| | Wildtype | 71 (86%) |
| | Total | 83 (100%) |
| KRAS Mutation Status | Mutant | 20 (24%) |
| | Wildtype | 63 (76%) |
| | Total | 83 (100%) |
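Both mutation labels are strongly imbalanced, which matters when reading the accuracy columns of the result tables. The short sketch below recomputes the class shares from the counts above and adds a majority-class baseline, that is, the accuracy of a hypothetical classifier that always predicts wildtype. The function name `class_summary` and the baseline itself are illustrative additions and do not come from the source.

```python
# Class counts taken from the mutation status summary table.
counts = {
    "EGFR": {"Mutant": 12, "Wildtype": 71},
    "KRAS": {"Mutant": 20, "Wildtype": 63},
}


def class_summary(by_class):
    """Return (total, share per class, majority-class baseline accuracy)."""
    total = sum(by_class.values())
    shares = {label: n / total for label, n in by_class.items()}
    # Hypothetical baseline: always predict the most frequent class.
    baseline = max(by_class.values()) / total
    return total, shares, baseline


for gene, by_class in counts.items():
    total, shares, baseline = class_summary(by_class)
    pretty = ", ".join(f"{label} {share:.0%}" for label, share in shares.items())
    print(f"{gene}: n={total}, {pretty}, majority baseline={baseline:.3f}")
```

For EGFR the baseline is roughly 0.86, so an accuracy near that level can coexist with a sensitivity of zero. Sensitivity and AUC are therefore the more informative columns for this dataset.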
Clinical features data summary.
| Variable | Overall | EGFR Mutant | EGFR Wildtype | KRAS Mutant | KRAS Wildtype |
|---|---|---|---|---|---|
| Median Age (Range) | 69 (46–85) | 72 (55–85) | 69 (46–84) | 68 (50–81) | 69 (46–85) |
| Gender | |||||
| Male | 65 (78%) | 7 (8%) | 58 (70%) | 16 (19%) | 49 (59%) |
| Female | 18 (22%) | 5 (6%) | 13 (16%) | 4 (5%) | 14 (17%) |
| Smoking Status | |||||
| Current | 18 (22%) | 1 (1%) | 17 (21%) | 6 (6%) | 12 (16%) |
| Former | 56 (67%) | 8 (9%) | 48 (58%) | 14 (17%) | 42 (50%) |
| Non-smoker | 9 (11%) | 3 (4%) | 6 (7%) | 0 (0%) | 9 (11%) |
| Pathological T Stage | |||||
| Tis | 3 (4%) | 1 (1%) | 2 (3%) | 0 (0%) | 3 (4%) |
| T1a | 17 (21%) | 1 (1%) | 16 (20%) | 4 (5%) | 13 (16%) |
| T1b | 19 (23%) | 5 (6%) | 14 (17%) | 3 (3%) | 16 (20%) |
| T2a | 26 (31%) | 3 (3%) | 23 (28%) | 7 (8%) | 19 (23%) |
| T2b | 6 (7%) | 1 (1%) | 5 (6%) | 1 (1%) | 5 (6%) |
| T3 | 8 (9%) | 1 (1%) | 7 (8%) | 5 (6%) | 3 (3%) |
| T4 | 4 (5%) | 0 (0%) | 4 (5%) | 0 (0%) | 4 (5%) |
| Pathological N Stage | |||||
| N0 | 65 (78%) | 10 (12%) | 55 (66%) | 16 (20%) | 49 (58%) |
| N1 | 8 (10%) | 1 (1%) | 7 (9%) | 1 (1%) | 7 (9%) |
| N2 | 10 (12%) | 1 (1%) | 9 (11%) | 3 (3%) | 7 (9%) |
| Pathological M Stage | |||||
| M0 | 80 (96%) | 12 (14%) | 68 (82%) | 19 (23%) | 61 (73%) |
| M1b | 3 (4%) | 0 (0%) | 3 (4%) | 1 (1%) | 2 (3%) |
| Histology | |||||
| Adenocarcinoma | 66 (80%) | 12 (14%) | 54 (66%) | 19 (23%) | 47 (57%) |
| Squamous cell carcinoma | 14 (17%) | 0 (0%) | 14 (17%) | 0 (0%) | 14 (17%) |
| | | | | 1 (1%) | 2 (2%) |
Figure 1. Selective Class Average Voting (SCAV) algorithm.
Figure 2. Experiment 1 workflow.
Figure 3. Architecture 1 of CNN base models.
Figure 4. Architecture 4 of CNN base models.
Figure 5. Architecture 6 of CNN base models.
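The result tables compare three ways of combining base models: Average, Maximum, and SCAV with a vote threshold. The SCAV algorithm itself is only shown in Figure 1, which is not reproduced here, so the sketch below is one plausible reading rather than the authors' implementation. It assumes that each base model outputs a probability for the mutant class, that a model "votes" mutant when its probability reaches a cutoff of 0.5, and that SCAV averages only the probabilities of the winning side. The function names, the cutoff, and the example probabilities are all hypothetical.

```python
from statistics import mean


def average_vote(probs):
    """Plain average of the mutant-class probabilities of all base models."""
    return mean(probs)


def maximum_vote(probs):
    """Highest mutant-class probability among the base models."""
    return max(probs)


def scav_vote(probs, threshold, cutoff=0.5):
    """ASSUMED reading of SCAV, not confirmed by the source.

    If at least `threshold` models vote mutant (probability >= cutoff),
    return the average over those models only; otherwise return the
    average over the models that voted wildtype.
    """
    if not 1 <= threshold <= len(probs):
        raise ValueError("threshold must be between 1 and the number of models")
    mutant_votes = [p for p in probs if p >= cutoff]
    wildtype_votes = [p for p in probs if p < cutoff]
    if len(mutant_votes) >= threshold:
        return mean(mutant_votes)
    # Fewer than `threshold` mutant votes, so at least one wildtype vote exists.
    return mean(wildtype_votes)


# Hypothetical outputs of 10 base models for a single case.
probs = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.4, 0.2, 0.1, 0.3]
print("average:", round(average_vote(probs), 3))
print("maximum:", maximum_vote(probs))
print("SCAV thresh 3:", round(scav_vote(probs, threshold=3), 3))
print("SCAV thresh 6:", round(scav_vote(probs, threshold=6), 3))
```

Under this reading, a low threshold lets a few confident models carry a mutant call, while a high threshold demands broad agreement. That is consistent with the pattern in the EGFR ensemble table, where threshold 3 gives higher sensitivity and threshold 6 gives higher specificity.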
EGFR mutation prediction results on Test dataset, base classifiers.
| Feature Selection | Classifier | SMOTE | Accuracy | Sensitivity | Specificity | AUC |
|---|---|---|---|---|---|---|
| MW (5 features) | nnet | No | 0.83 | 0.00 | 0.98 | 0.43 |
| ReliefF (15 features) | SVM | Yes | 0.76 | 0.66 | 0.78 | 0.68 |
| ReliefF (10 features) | RF | Yes | 0.76 | 0.41 | 0.82 | 0.67 |
| ReliefF (15 features) | nnet | Yes | 0.76 | 0.58 | 0.79 | 0.67 |
| ReliefF (5 features) | RF | Yes | 0.77 | 0.50 | 0.82 | 0.64 |
| ReliefF (20 features) | RF | Yes | 0.73 | 0.16 | 0.83 | 0.63 |
| ReliefF (20 features) | SVM | Yes | 0.68 | 0.66 | 0.69 | 0.63 |
| ReliefF (5 features) | nnet | Yes | 0.71 | 0.50 | 0.75 | 0.60 |
| ReliefF (5 features) | SVM | Yes | 0.79 | 0.25 | 0.89 | 0.59 |
| ReliefF (15 features) | RF | Yes | 0.72 | 0.25 | 0.80 | 0.57 |
| MW (5 features) | gbm | Yes | 0.68 | 0.16 | 0.78 | 0.53 |
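The accuracy, sensitivity, and specificity columns are linked through the class sizes, which helps when comparing rows. The sketch below gives the standard definitions in terms of confusion-matrix counts and shows the accuracy implied by a pair of class-wise rates. It assumes that the metrics were computed over all 83 cases (12 mutant, 71 wildtype), which the source does not state. The counts in the first example are hypothetical and the function names are illustrative.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of mutant cases that were detected
    specificity = tn / (tn + fp)  # share of wildtype cases correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Accuracy implied by the two class-wise rates and the class sizes."""
    return (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)


# Hypothetical counts, chosen only to illustrate the definitions.
acc, sens, spec = confusion_metrics(tp=8, fn=4, tn=56, fp=15)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")

# ReliefF (15 features) + SVM row: sensitivity 0.66, specificity 0.78.
# ASSUMPTION: evaluation over 12 mutant and 71 wildtype cases.
implied = accuracy_from_rates(0.66, 0.78, n_pos=12, n_neg=71)
print(f"implied accuracy={implied:.2f}")
```

Because wildtype cases dominate, accuracy tracks specificity closely: a model can lose most of its sensitivity and still report an accuracy close to 0.8.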
EGFR mutation prediction best results on Test dataset, ensembles.
| Ensemble Combination | Classifiers | Accuracy | Sensitivity | Specificity | AUC |
|---|---|---|---|---|---|
| Ensemble SCAV thresh 3 (10 models) | gbm, SVM, nnet | 0.59 | 0.75 | 0.57 | 0.70 |
| Ensemble SCAV thresh 6 (10 models) | gbm, SVM, nnet | 0.80 | 0.33 | 0.89 | 0.68 |
| Ensemble Average (10 models) | gbm, SVM, nnet | 0.78 | 0.16 | 0.89 | 0.68 |
| Ensemble Average (5 models) | All | 0.78 | 0.16 | 0.89 | 0.67 |
| Ensemble Average (5 models) | RF, SVM, nnet | 0.79 | 0.33 | 0.87 | 0.66 |
| Ensemble Maximum (10 models) | gbm, SVM, nnet | 0.75 | 0.41 | 0.82 | 0.59 |
KRAS mutation prediction results on Test dataset, base classifiers.
| Feature Selection | Classifier | SMOTE | Accuracy | Sensitivity | Specificity | AUC |
|---|---|---|---|---|---|---|
| MW (10 features) | nnet | No | 0.72 | 0.10 | 0.93 | 0.44 |
| Relief (10 features) | nnet | No | 0.75 | 0.00 | 1.00 | 0.44 |
| ReliefF (5 features) | SVM | Yes | 0.70 | 0.35 | 0.81 | 0.65 |
| MW (15 features) | SVM | Yes | 0.64 | 0.40 | 0.72 | 0.64 |
| ReliefF (5 features) | gbm | Yes | 0.64 | 0.60 | 0.65 | 0.63 |
| ReliefF (20 features) | gbm | Yes | 0.63 | 0.50 | 0.67 | 0.63 |
| MW (10 features) | SVM | Yes | 0.64 | 0.45 | 0.70 | 0.63 |
| MW (20 features) | SVM | Yes | 0.71 | 0.40 | 0.81 | 0.63 |
| ReliefF (15 features) | SVM | Yes | 0.67 | 0.45 | 0.75 | 0.62 |
| MW (5 features) | SVM | Yes | 0.71 | 0.35 | 0.83 | 0.62 |
| MW (15 features) | gbm | Yes | 0.67 | 0.40 | 0.77 | 0.62 |
| ReliefF (15 features) | RF | Yes | 0.62 | 0.40 | 0.70 | 0.60 |
KRAS mutation prediction best results on Test dataset, ensembles.
| Ensemble Combination | Classifiers | Accuracy | Sensitivity | Specificity | AUC |
|---|---|---|---|---|---|
| Ensemble SCAV thresh 8 (10 models) | SVM, nnet | 0.72 | 0.20 | 0.89 | 0.71 |
| Ensemble SCAV thresh 6 (10 models) | SVM, nnet | 0.73 | 0.30 | 0.87 | 0.69 |
| Ensemble Average (5 models) | SVM | 0.70 | 0.35 | 0.81 | 0.67 |
| Ensemble Maximum (5 models) | SVM | 0.70 | 0.40 | 0.80 | 0.66 |
| Ensemble Average (10 models) | SVM, nnet | 0.66 | 0.35 | 0.76 | 0.65 |
EGFR mutation best results on Test dataset, CNNs.
| Model | Optimizer | Learning Rate | Epochs | Accuracy | Sensitivity | Specificity | AUC |
|---|---|---|---|---|---|---|---|
| Arch. 4 | SGD | 0.0005 | 30 | 0.800 | 0.667 | 0.846 | 0.846 |
| Arch. 6 | SGD | 0.0005 | 30 | 0.771 | 0.222 | 0.961 | 0.752 |
| Arch. 1 | SGD | 0.01 | 8 | 0.400 | 1.000 | 0.192 | 0.688 |
| Arch. 6 | SGD | 0.01 | 10 | 0.657 | 0.666 | 0.654 | 0.675 |
| Arch. 3 | SGD | 0.01 | 7 | 0.543 | 0.778 | 0.461 | 0.671 |
| Arch. 4 | SGD | 0.01 | 10 | 0.543 | 0.778 | 0.461 | 0.628 |
| Arch. 1 | SGD | 0.01 | 30 | 0.514 | 0.778 | 0.423 | 0.623 |
| Arch. 2 | SGD | 0.01 | 30 | 0.542 | 0.667 | 0.538 | 0.571 |
| Arch. 4 | SGD | 0.01 | 20 | 0.600 | 0.444 | 0.654 | 0.559 |
EGFR mutation best results on Test dataset, ensembles of CNNs.
| Model | Accuracy | Sensitivity | Specificity | AUC |
|---|---|---|---|---|
| Ensemble (3 models) SCAV thresh 3 | 0.828 | 0.667 | 0.885 | 0.820 |
| Ensemble (5 models) SCAV thresh 5 | 0.857 | 0.667 | 0.923 | 0.778 |
| Ensemble (3 models) Average | 0.486 | 0.778 | 0.385 | 0.743 |
| Ensemble (5 models) Average | 0.628 | 0.778 | 0.577 | 0.641 |
| Ensemble (3 models) Maximum | 0.371 | 0.778 | 0.231 | 0.624 |
KRAS mutation best results on Test dataset, CNNs.
| Model | Optimizer | Learning Rate | Epochs | Accuracy | Sensitivity | Specificity | AUC |
|---|---|---|---|---|---|---|---|
| Arch. 1 | SGD | 0.01 | 60 | 0.667 | 0.000 | 1.000 | 0.739 |
| Arch. 6 | Adam | 0.005 | 10 | 0.333 | 1.000 | 0.000 | 0.607 |
| Arch. 6 | Adam | 0.001 | 10 | 0.667 | 0.000 | 1.000 | 0.593 |
| Arch. 1 | Adam | 0.005 | 15 | 0.722 | 0.250 | 0.958 | 0.566 |
| Arch. 1 | SGD | 0.01 | 90 | 0.667 | 0.000 | 1.000 | 0.555 |
| Arch. 1 | SGD | 0.01 | 10 | 0.555 | 0.667 | 0.500 | 0.531 |
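Several rows above report a sensitivity of 0.000 and a specificity of 1.000 together with an AUC well above 0.5. This is possible because AUC depends only on how the scores rank mutant cases against wildtype cases, not on the decision cutoff. The sketch below computes AUC as the fraction of mutant/wildtype pairs that are ranked correctly (ties count as half), using hypothetical scores that all fall below a 0.5 cutoff. The scores, the cutoff, and the function names are assumptions for illustration only.

```python
def auc(pos_scores, neg_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos_scores) * len(neg_scores))


def sens_spec_at(pos_scores, neg_scores, cutoff=0.5):
    """Sensitivity and specificity when scores >= cutoff are called mutant."""
    sens = sum(s >= cutoff for s in pos_scores) / len(pos_scores)
    spec = sum(s < cutoff for s in neg_scores) / len(neg_scores)
    return sens, spec


# Hypothetical mutant-class scores; none of them reaches the 0.5 cutoff.
mutant = [0.40, 0.35, 0.20]
wildtype = [0.30, 0.25, 0.10, 0.38]

print("sensitivity, specificity:", sens_spec_at(mutant, wildtype))
print("AUC:", round(auc(mutant, wildtype), 3))
```

In this example every case is called wildtype at the 0.5 cutoff, yet most mutant cases still score higher than most wildtype cases, so the AUC stays above chance. A different cutoff would trade specificity for sensitivity without changing the AUC.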
KRAS mutation best results on Test dataset, ensembles of CNNs.
| Model | Accuracy | Sensitivity | Specificity | AUC |
|---|---|---|---|---|
| Ensemble (3 models) Average | 0.722 | 0.250 | 0.958 | 0.778 |
| Ensemble (3 models) SCAV thresh 2 | 0.722 | 0.250 | 0.958 | 0.722 |
| Ensemble (4 models) SCAV thresh 3 | 0.722 | 0.250 | 0.958 | 0.642 |
| Ensemble (7 models) SCAV thresh 4 | 0.694 | 0.416 | 0.833 | 0.618 |
| Ensemble (7 models) SCAV thresh 5 | 0.694 | 0.083 | 1.000 | 0.604 |