Siow-Wee Chang, Sameem Abdul-Kareem, Amir Feisal Merican, Rosnah Binti Zain.
Abstract
BACKGROUND: Machine learning techniques are increasingly used as an alternative to conventional approaches to medical diagnosis and prognosis because they handle noisy and incomplete data well and can yield significant results despite a small sample size. Traditionally, clinicians make prognostic decisions based on clinicopathologic markers. However, even the most skilful clinician finds it difficult to arrive at an accurate prognosis using these markers alone. Thus, genomic markers are needed to improve the accuracy of prognosis. The main aim of this research is to apply a hybrid of feature selection and machine learning methods to oral cancer prognosis based on the parameters of the correlation of clinicopathologic and genomic markers.
Year: 2013 PMID: 23725313 PMCID: PMC3673908 DOI: 10.1186/1471-2105-14-170
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Framework for oral cancer prognostic model.
1-year, 2-year and 3-year survival
| Survival period | Status | No. of patients | Percentage (%) |
| 1-year | Survive | 27 | 87.1 |
| | Dead | 4 | 12.9 |
| | Lost to follow-up | 0 | 0.0 |
| 2-year | Survive | 19 | 61.3 |
| | Dead | 10 | 32.3 |
| | Lost to follow-up | 2 | 6.5 |
| 3-year | Survive | 17 | 54.8 |
| | Dead | 11 | 35.5 |
| | Lost to follow-up | 3 | 9.7 |
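The percentages in the survival table follow from the patient counts, each divided by the cohort size of 31. A minimal sketch of that arithmetic, using only the counts listed above (the function and variable names are illustrative):

```python
# Patient counts per follow-up period, copied from the survival table.
counts = {
    "1-year": {"Survive": 27, "Dead": 4, "Lost to follow-up": 0},
    "2-year": {"Survive": 19, "Dead": 10, "Lost to follow-up": 2},
    "3-year": {"Survive": 17, "Dead": 11, "Lost to follow-up": 3},
}


def percentages(status_counts):
    """Return each status as a percentage of the period total, to 1 decimal."""
    total = sum(status_counts.values())
    return {status: round(100 * n / total, 1) for status, n in status_counts.items()}


for period, status_counts in counts.items():
    # Each period should account for the same number of patients.
    print(period, sum(status_counts.values()), percentages(status_counts))
```

Printing the period total next to the percentages makes it easy to confirm that every follow-up period accounts for all 31 patients.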
Description and membership function for clinicopathologic and genomic variables
| Variable | Description | Membership function |
| Age | Age at diagnosis | 1 - 40–50, 2 - >50-60, 3 - >60-70, 4 - >70 |
| Eth | Ethnicity | 1 - Malay, 2 - Chinese, 3 - Indian |
| Gen | Gender | 1 - Male, 2 - Female |
| Smoke | Smoking habit | 1 - Yes, 2 - No |
| Drink | Alcohol drinking habit | 1 - Yes, 2 - No |
| Chew | Quid chewing habit | 1 - Yes, 2 - No |
| Site | Primary site of tumor | 1 - Buccal mucosa, 2 - tongue, 3 - floor, 4 - others |
| Subtype | Subtype and differentiation for SCC | 1 - Well differentiated, 2 - moderately differentiated, 3 - poorly differentiated |
| Inv | Depth of Invasion front | 1 - Non-cohesive, 2 - cohesive |
| Node | Neck nodes | 1 - Negative, 2 - positive |
| PT | Pathological tumor staging | 1 - T1, 2 - T2, 3 - T3, 4 - T4 |
| PN | Pathological lymph nodes | 1 - N0, 2 - N1, 3 - N2A, 4 - N2B |
| Stage | Overall stage | 1 - I, 2 - II, 3 - III, 4 - IV |
| Size | Size of tumor | 1 - 0-2 cm, 2 - >2-4 cm, 3 - >4-6 cm, 4 - >6 cm |
| Treat | Type of treatment | 1 - Surgery only, 2 - Surgery + Radiotherapy, 3 - Surgery + Chemotherapy |
| | Tumor suppressor gene | 1 - negative, 2 - positive |
| | Tumor suppressor gene | 1 - negative, 2 - positive |
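The coding scheme in the table above can be read as a set of small lookup rules. The sketch below encodes a few of the variables under that reading. The function names, the `YES_NO` mapping and the example patient are illustrative, and raising an error for ages below 40 is an assumption, since the table defines no level for them.

```python
def age_code(age):
    """Age at diagnosis -> code (1: 40-50, 2: >50-60, 3: >60-70, 4: >70)."""
    if age > 70:
        return 4
    if age > 60:
        return 3
    if age > 50:
        return 2
    if age >= 40:
        return 1
    # Assumption: the table lists no level below 40, so reject such input.
    raise ValueError("age below 40 has no code in the table")


def size_code(size_cm):
    """Tumor size in cm -> code (1: 0-2, 2: >2-4, 3: >4-6, 4: >6)."""
    if size_cm < 0:
        raise ValueError("tumor size cannot be negative")
    if size_cm > 6:
        return 4
    if size_cm > 4:
        return 3
    if size_cm > 2:
        return 2
    return 1


# Habit variables (Smoke, Drink, Chew) share the same coding: 1 - Yes, 2 - No.
YES_NO = {"yes": 1, "no": 2}


def encode_patient(age, size_cm, smoke, drink, chew):
    """Encode one hypothetical patient record into the table's integer codes."""
    return {
        "Age": age_code(age),
        "Size": size_code(size_cm),
        "Smoke": YES_NO[smoke],
        "Drink": YES_NO[drink],
        "Chew": YES_NO[chew],
    }


print(encode_patient(63, 3.5, "yes", "no", "no"))
# {'Age': 3, 'Size': 2, 'Smoke': 1, 'Drink': 2, 'Chew': 2}
```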
Figure 2. Procedures for IHC results analysis and scoring.
Figure 3. Genetic algorithm feature selection flowchart.
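The flowchart itself is not reproduced here, so the following is only a generic sketch of genetic-algorithm feature selection, not the authors' procedure. It searches for a fixed-size subset of feature indices. The operators (truncation selection, union-based crossover, swap mutation), all parameter values, and `toy_fitness` are assumptions made for illustration. In practice the fitness would be a classifier's cross-validated performance on the candidate subset.

```python
import random


def ga_select(n_features, n_select, fitness, pop_size=20, generations=30,
              mutation_rate=0.1, seed=0):
    """Search for n_select feature indices (out of n_features) maximising fitness."""
    rng = random.Random(seed)

    def random_subset():
        return tuple(sorted(rng.sample(range(n_features), n_select)))

    def crossover(a, b):
        # The child draws its features from the union of both parents.
        pool = sorted(set(a) | set(b))
        return tuple(sorted(rng.sample(pool, n_select)))

    def mutate(subset):
        # Occasionally swap one selected feature for an unselected one.
        if rng.random() < mutation_rate and n_select < n_features:
            chosen = list(subset)
            unused = [i for i in range(n_features) if i not in subset]
            chosen[rng.randrange(n_select)] = rng.choice(unused)
            return tuple(sorted(chosen))
        return subset

    population = [random_subset() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [mutate(crossover(*rng.sample(parents, 2)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children  # the best subsets survive unchanged
    return max(population, key=fitness)


def toy_fitness(subset):
    # Hypothetical stand-in for classifier accuracy: reward features 2, 5 and 11.
    return len(set(subset) & {2, 5, 11})


best = ga_select(n_features=17, n_select=3, fitness=toy_fitness)
print(best, toy_fitness(best))
```

Keeping the subset size fixed mirrors the n-input models reported in the tables, where each run selects exactly n features.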
Mean square error rate for n-input model
| 1-input | 0.3881 | 0.3626 |
| 2-input | 0.4193 | 0.2903 |
| 3-input | 0.3871 | 0.2581 |
| 4-input | 0.3871 | 0.2903 |
| 5-input | 0.3871 | 0.3226 |
| 6-input | 0.3871 | 0.3548 |
| 7-input | 0.4571 | 0.3548 |
| 8-input | 0.4839 | 0.4194 |
| 9-input | 0.5161 | 0.4516 |
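For reference, the mean square error is the average of the squared differences between the target and the model output. The sketch below computes it. It uses hard 0/1 predictions, which is an assumption about how the outputs were scored, and in that case the error reduces to the fraction of misclassified cases. The labels in the example are made up for illustration and are not patient data.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical example: 31 cases, 8 of them misclassified (4 in each class).
y_true = [1] * 17 + [0] * 14
y_pred = [1] * 13 + [0] * 4 + [0] * 10 + [1] * 4

print(round(mean_squared_error(y_true, y_pred), 4))  # 0.2581, i.e. 8/31
```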
Figure 4. ANFIS rules for a 3-input model.
Feature subset selected for group 1
| GA | |
| 3-input | |
| 4-input | |
| 5-input | |
| 6-input | |
| 7-input | |
| CC | |
| 3-input | |
| 4-input | |
| 5-input | |
| 6-input | |
| 7-input | |
| ReliefF | |
| 3-input | |
| 4-input | |
| 5-input | |
| 6-input | |
| 7-input | |
| CC-GA | |
| 3-input | |
| 4-input | |
| 5-input | |
| 6-input | |
| 7-input | |
| ReliefF-GA | |
| 3-input | |
| 4-input | |
| 5-input | |
| 6-input | |
| 7-input |
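As an illustration of a filter-style selector such as CC, the sketch below ranks features by the absolute value of their Pearson correlation with the outcome and keeps the top k. Reading CC as Pearson's correlation coefficient is an assumption, and the feature columns and outcome are toy values, not study data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(columns, outcome, k):
    """Return the k feature names most correlated (in absolute value) with outcome."""
    scores = {name: abs(pearson(values, outcome)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data for six hypothetical patients (1 = survived, 0 = died).
outcome = [1, 1, 0, 0, 1, 0]
columns = {
    "Stage": [4, 3, 1, 2, 4, 1],
    "Gen": [1, 2, 1, 2, 1, 2],
    "Size": [3, 3, 1, 2, 2, 1],
}

print(rank_by_correlation(columns, outcome, k=2))  # ['Stage', 'Size']
```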
Classification accuracy and AUC for group 1
| Method | 3-input Accuracy (%) | 3-input AUC | 4-input Accuracy (%) | 4-input AUC | 5-input Accuracy (%) | 5-input AUC | 6-input Accuracy (%) | 6-input AUC | 7-input Accuracy (%) | 7-input AUC |
| ANFIS | | | | | | | | | | |
| GA | 70.95 | 0.66 | 67.42 | 0.61 | 64.76 | 0.63 | 58.57 | 0.55 | 57.62 | 0.54 |
| CC | 58.10 | 0.53 | 74.76 | 0.70 | 51.43 | 0.43 | 57.62 | 0.50 | 64.29 | 0.58 |
| ReliefF | 61.43 | 0.53 | 50.59 | 0.50 | 58.10 | 0.50 | 64.29 | 0.54 | 64.29 | 0.54 |
| CC-GA | 44.76 | 0.44 | 67.62 | 0.57 | 63.81 | 0.55 | 64.29 | 0.54 | 57.62 | 0.52 |
| ReliefF-GA | 67.14 | 0.55 | 60.48 | 0.59 | 67.62 | 0.59 | 51.90 | 0.47 | 64.76 | 0.57 |
| ANN | | | | | | | | | | |
| GA | 45.52 | 0.53 | 52.43 | 0.53 | 45.05 | 0.47 | 48.38 | 0.52 | 45.33 | 0.50 |
| CC | 54.48 | 0.61 | 53.57 | 0.59 | 51.29 | 0.58 | 51.29 | 0.51 | 52.33 | 0.53 |
| ReliefF | 51.52 | 0.48 | 41.62 | 0.47 | 46.05 | 0.49 | 46.05 | 0.48 | 44.10 | 0.48 |
| CC-GA | 49.24 | 0.51 | 49.48 | 0.52 | 46.67 | 0.49 | 48.29 | 0.49 | 50.48 | 0.51 |
| ReliefF-GA | 50.24 | 0.55 | 52.86 | 0.59 | 56.76 | 0.58 | 47.00 | 0.51 | 50.05 | 0.54 |
| SVM | | | | | | | | | | |
| GA | 60.95 | 0.53 | 61.43 | 0.51 | 58.10 | 0.48 | 58.10 | 0.46 | 61.43 | 0.49 |
| CC | 60.95 | 0.53 | 60.95 | 0.53 | 58.10 | 0.46 | 51.43 | 0.41 | 51.43 | 0.41 |
| ReliefF | 54.29 | 0.44 | 50.95 | 0.42 | 51.43 | 0.42 | 48.10 | 0.40 | 50.95 | 0.45 |
| CC-GA | 63.81 | 0.55 | 61.43 | 0.51 | 58.10 | 0.46 | 58.10 | 0.48 | 58.10 | 0.49 |
| ReliefF-GA | 64.29 | 0.50 | 64.29 | 0.50 | 64.29 | 0.50 | 64.29 | 0.50 | 54.76 | 0.46 |
| LR | | | | | | | | | | |
| GA | 64.29 | 0.56 | 67.62 | 0.60 | 64.76 | 0.55 | 68.10 | 0.64 | 64.29 | 0.60 |
| CC | 64.29 | 0.56 | 60.48 | 0.57 | 67.62 | 0.61 | 67.62 | 0.61 | 64.29 | 0.58 |
| ReliefF | 50.59 | 0.44 | 50.59 | 0.44 | 48.10 | 0.39 | 41.43 | 0.34 | 44.29 | 0.39 |
| CC-GA | 67.62 | 0.57 | 67.62 | 0.60 | 61.43 | 0.51 | 70.95 | 0.72 | 64.76 | 0.67 |
| ReliefF-GA | 54.29 | 0.54 | 51.43 | 0.52 | 61.43 | 0.62 | 47.62 | 0.55 | 48.10 | 0.51 |
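The two metrics reported in these tables can be computed as follows. This is a minimal sketch rather than the authors' evaluation code. Accuracy is the percentage of correct hard predictions. AUC is computed through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as half. The labels and scores are illustrative.

```python
def accuracy(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative labels (1 = survived) and model scores for five cases.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hard labels at a 0.5 threshold

print(accuracy(y_true, y_pred))       # 60.0
print(round(auc(y_true, scores), 2))  # 0.83
```

An AUC near 0.5 means the scores rank cases no better than chance, which is why several rows in the table pair a moderate accuracy with an AUC close to 0.5.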
Feature subset selected for group 2
| GA | |
| 3-input | |
| 4-input | |
| 5-input | |
| 6-input | |
| 7-input | |
| CC | |
| 3-input | |
| 4-input | |
| 5-input | |
| 6-input | |
| 7-input | |
| ReliefF | |
| 3-input | |
| 4-input | |
| 5-input | |
| 6-input | |
| 7-input | |
| CC-GA | |
| 3-input | |
| 4-input | |
| 5-input | |
| 6-input | |
| 7-input | |
| ReliefF-GA | |
| 3-input | |
| 4-input | |
| 5-input | |
| 6-input | |
| 7-input |
Classification accuracy and AUC for group 2
| Method | 3-input Accuracy (%) | 3-input AUC | 4-input Accuracy (%) | 4-input AUC | 5-input Accuracy (%) | 5-input AUC | 6-input Accuracy (%) | 6-input AUC | 7-input Accuracy (%) | 7-input AUC |
| ANFIS | | | | | | | | | | |
| GA | 74.76 | 0.74 | 67.62 | 0.70 | 41.90 | 0.40 | 58.57 | 0.58 | 35.71 | 0.36 |
| CC | 58.10 | 0.48 | 58.10 | 0.52 | 51.90 | 0.48 | 48.57 | 0.46 | 61.90 | 0.59 |
| ReliefF | 54.29 | 0.47 | 44.29 | 0.38 | 48.10 | 0.53 | 67.14 | 0.62 | 67.14 | 0.62 |
| CC-GA | 74.76 | 0.70 | 70.48 | 0.71 | 54.76 | 0.57 | 61.43 | 0.61 | 64.29 | 0.65 |
| ReliefF-GA | 93.81 | 0.90 | 93.81 | 0.90 | 65.71 | 0.63 | 64.76 | 0.62 | 68.10 | 0.67 |
| ANN | | | | | | | | | | |
| GA | 45.14 | 0.50 | 51.48 | 0.55 | 45.81 | 0.49 | 46.14 | 0.50 | 47.71 | 0.51 |
| CC | 46.24 | 0.46 | 49.38 | 0.49 | 46.14 | 0.50 | 57.38 | 0.58 | 55.48 | 0.57 |
| ReliefF | 40.62 | 0.48 | 43.24 | 0.49 | 47.71 | 0.50 | 49.48 | 0.51 | 48.76 | 0.50 |
| CC-GA | 49.38 | 0.52 | 53.90 | 0.60 | 47.05 | 0.52 | 44.76 | 0.48 | 55.19 | 0.57 |
| ReliefF-GA | 84.62 | 0.83 | 73.38 | 0.75 | 48.00 | 0.52 | 51.57 | 0.53 | 45.86 | 0.47 |
| SVM | | | | | | | | | | |
| GA | 74.76 | 0.70 | 54.76 | 0.51 | 70.95 | 0.65 | 60.95 | 0.55 | 50.95 | 0.42 |
| CC | 64.76 | 0.55 | 64.76 | 0.55 | 64.76 | 0.55 | 67.62 | 0.56 | 67.62 | 0.62 |
| ReliefF | 54.29 | 0.44 | 54.29 | 0.44 | 44.29 | 0.36 | 48.10 | 0.46 | 34.76 | 0.28 |
| CC-GA | 74.76 | 0.70 | 54.76 | 0.51 | 61.43 | 0.50 | 58.10 | 0.54 | 61.43 | 0.57 |
| ReliefF-GA | 74.76 | 0.70 | 71.43 | 0.68 | 74.76 | 0.70 | 74.43 | 0.66 | 54.76 | 0.53 |
| LR | | | | | | | | | | |
| GA | 74.76 | 0.70 | 63.81 | 0.64 | 67.14 | 0.57 | 54.76 | 0.43 | 54.29 | 0.47 |
| CC | 71.43 | 0.67 | 71.43 | 0.67 | 61.43 | 0.59 | 68.10 | 0.65 | 61.43 | 0.59 |
| ReliefF | 50.59 | 0.45 | 48.10 | 0.39 | 48.10 | 0.41 | 44.76 | 0.43 | 41.43 | 0.41 |
| CC-GA | 74.76 | 0.70 | 63.81 | 0.64 | 60.48 | 0.61 | 64.29 | 0.63 | 60.48 | 0.54 |
| ReliefF-GA | 74.76 | 0.70 | 74.76 | 0.70 | 71.43 | 0.68 | 58.10 | 0.55 | 61.43 | 0.60 |
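Best accuracy for n-input model based on feature selection method
| Feature selection method | Group 1, 3-input | Group 1, 4-input | Group 1, 5-input | Group 1, 6-input | Group 1, 7-input | Group 2, 3-input | Group 2, 4-input | Group 2, 5-input | Group 2, 6-input | Group 2, 7-input |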
| GA | 70.95 | 67.62 | 64.76 | 68.10 | 64.29 | 74.76 | 67.62 | 70.95 | 60.95 | 54.29 |
| CC | 64.29 | 74.76 | 67.62 | 67.62 | 64.29 | 71.43 | 71.43 | 64.76 | 68.10 | 67.62 |
| ReliefF | 61.43 | 50.59 | 58.10 | 64.29 | 64.29 | 54.29 | 54.29 | 48.10 | 67.14 | 67.14 |
| CC-GA | 67.62 | 67.62 | 63.81 | 70.95 | 64.76 | 74.76 | 70.48 | 61.43 | 64.29 | 64.29 |
| ReliefF-GA | 67.14 | 64.29 | 67.62 | 64.29 | 64.76 | 93.81 | 93.81 | 74.76 | 74.43 | 68.10 |
Figure 5. Graphs for best accuracy for n-input model based on feature selection method for Group 1.
Figure 6. Graphs for best accuracy for n-input model based on feature selection method for Group 2.
Best accuracy by classification method
| Feature selection method | Group 1 ANFIS | Group 1 ANN | Group 1 SVM | Group 1 LR | Group 2 ANFIS | Group 2 ANN | Group 2 SVM | Group 2 LR |
| GA | 70.95 | 52.43 | 61.43 | 68.10 | 74.76 | 51.48 | 74.76 | 74.76 |
| CC | 74.76 | 54.48 | 60.95 | 67.62 | 61.90 | 57.38 | 67.62 | 71.43 |
| ReliefF | 64.29 | 51.52 | 54.29 | 50.59 | 67.14 | 49.48 | 54.29 | 50.59 |
| CC-GA | 67.62 | 50.48 | 63.81 | 70.95 | 74.76 | 55.19 | 74.76 | 74.76 |
| ReliefF-GA | 67.62 | 56.76 | 64.29 | 61.43 | 93.81 | 84.62 | 74.76 | 74.76 |
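Both best-accuracy tables are summaries of the per-classifier accuracy tables: one takes the maximum across classifiers for each model size, the other the maximum across model sizes for each classifier. The sketch below reproduces the GA entries for Group 1 from the accuracies listed in the Group 1 classification table. Treating the five accuracy columns as the 3-input to 7-input models is an assumption.

```python
# GA accuracies (%) for Group 1, one list per classifier,
# ordered as 3-input, 4-input, 5-input, 6-input, 7-input (assumed column order).
ga_group1 = {
    "ANFIS": [70.95, 67.42, 64.76, 58.57, 57.62],
    "ANN": [45.52, 52.43, 45.05, 48.38, 45.33],
    "SVM": [60.95, 61.43, 58.10, 58.10, 61.43],
    "LR": [64.29, 67.62, 64.76, 68.10, 64.29],
}

# Best accuracy for each n-input model: maximum across classifiers.
best_per_input = [max(column) for column in zip(*ga_group1.values())]

# Best accuracy for each classifier: maximum across model sizes.
best_per_classifier = {name: max(values) for name, values in ga_group1.items()}

print(best_per_input)       # [70.95, 67.62, 64.76, 68.1, 64.29]
print(best_per_classifier)  # {'ANFIS': 70.95, 'ANN': 52.43, 'SVM': 61.43, 'LR': 68.1}
```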
Figure 7. Graphs for best accuracy by classification method for Group 1.
Figure 8. Graphs for best accuracy by classification method for Group 2.
Best models with accuracy, AUC, classification method and selected features
| Model | Accuracy (%) | AUC | Classification method | Selected features |
| Group 1 | | | | |
| CC-4-input | 74.76 | 0.70 | ANFIS | |
| GA-3-input | 70.95 | 0.66 | ANFIS | |
| CC-GA-6-input | 70.95 | 0.73 | LR | |
| Group 2 | | | | |
| ReliefF-GA-3-input | 93.81 | 0.90 | ANFIS | |
| ReliefF-GA-4-input | 93.81 | 0.90 | ANFIS | |
| ReliefF-GA-3-input | 84.62 | 0.83 | ANN | |
| GA-3-input | 74.76 | 0.74 | ANFIS | |
| CC-GA-3-input | 74.76 | 0.70 | ANFIS | |
| CC-GA-3-input | 74.76 | 0.70 | SVM | |
| CC-GA-3-input | 74.76 | 0.70 | LR | |
| ReliefF-GA-3-input | 74.76 | 0.70 | SVM | |
| ReliefF-GA-3-input | 74.76 | 0.70 | LR | |
| ReliefF-GA-4-input | 74.76 | 0.70 | LR | |
| ReliefF-GA-5-input | 74.76 | 0.70 | SVM | |
| ReliefF-GA-6-input | 74.43 | 0.66 | SVM | |
| ReliefF-GA-4-input | 73.38 | 0.75 | ANN | |
| ReliefF-GA-4-input | 71.43 | 0.68 | SVM | |
| ReliefF-GA-5-input | 71.43 | 0.68 | LR | |
| CC-3-input | 71.43 | 0.67 | LR | |
| CC-4-input | 71.43 | 0.67 | LR | |
| CC-GA-4-input | 70.48 | 0.71 | ANFIS |
Comparison between the current work and the literature
| Passaro et al. [ | 124 patients, 231 controls | 74-79 |
| Oliveira et al. [ | 500 | 5-year survival of 28.6% |
| Exarchos et al. [ | 41 | 100 |
| Exarchos et al. [ | 86 | 100 |
| Dom et al. [ | 84 patients, 87 controls | 82 |
| Current work | 31 | 93.81 |
Validation test with random permutation of 3-input model and full input model for Group 2
| Model | Accuracy (%) | AUC |
| | 64.76 | 0.63 |
| | 57.14 | 0.49 |
| | 58.10 | 0.51 |
| | 70.95 | 0.59 |
| | 39.05 | 0.32 |
| | 80.48 | 0.70 |
| | 67.14 | 0.67 |
| | 54.76 | 0.55 |
| | 32.86 | 0.28 |
| | 48.10 | 0.41 |
| Full model with ANFIS | N.A.* | N.A.* |
| Full model with NN | 42.90 | 0.47 |
| Full model with SVM | 54.76 | 0.46 |
| Full model with LR | 54.76 | 0.59 |
*N.A. - Results not available due to over-fitting problem as the rule-base generated was too large.
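One plausible reading of this validation test is that the selected 3-input model is compared against 3-input models whose features are drawn at random. The sketch below generates ten such random feature sets reproducibly. That reading, the seed, and the placeholder names `Gene1` and `Gene2` for the two tumor suppressor gene variables are all assumptions. The other variable codes come from the variable description table.

```python
import random

# Variable codes from the description table; Gene1 and Gene2 are hypothetical
# placeholders for the two tumor suppressor gene variables.
VARIABLES = [
    "Age", "Eth", "Gen", "Smoke", "Drink", "Chew", "Site", "Subtype", "Inv",
    "Node", "PT", "PN", "Stage", "Size", "Treat", "Gene1", "Gene2",
]


def random_input_sets(variables, n_inputs=3, n_sets=10, seed=0):
    """Draw n_sets random feature subsets of size n_inputs, reproducibly."""
    rng = random.Random(seed)
    return [sorted(rng.sample(variables, n_inputs)) for _ in range(n_sets)]


for subset in random_input_sets(VARIABLES):
    # Each subset would then be evaluated with the same classifier and
    # cross-validation scheme as the selected model.
    print(subset)
```

If the selected features carry real prognostic signal, most random subsets should score clearly below the selected model under the same evaluation.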
Classification results for 1-year to 3-year oral cancer prognosis
| Survival period | Accuracy (%) | AUC |
| 1-year | 93.33 | 0.90 |
| 2-year | 84.29 | 0.77 |
| 3-year | 93.81 | 0.90 |