Mehdi Pirooznia, Jack Y Yang, Mary Qu Yang, Youping Deng.
Abstract
BACKGROUND: Several classification and feature selection methods have been studied for the identification of differentially expressed genes in microarray data. Classification methods such as SVM, RBF Neural Nets, MLP Neural Nets, Bayesian, Decision Tree and Random Forest methods have been used in recent studies. The accuracy of these methods has been calculated with validation methods such as v-fold validation. However, there is a lack of comparison between these methods to find a better framework for classification, clustering and analysis of microarray gene expression results.
Year: 2008 PMID: 18366602 PMCID: PMC2386055 DOI: 10.1186/1471-2164-9-S1-S13
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Eight datasets used in the experiment
| Dataset | Comparison | Genes | Samples |
| 1. Lymphoma (Devos et al., 2002) | Tumor vs. Normal | 7129 | 25 |
| 2. Breast Cancer (Perou et al., 2000) | Tumor subtype vs. Normal | 1753 | 84 |
| 3. Colon Cancer (Alon et al., 1999) | Epithelial vs. Tumor | 7464 | 45 |
| 4. Lung Cancer (Garber et al., 2001) | Tumor vs. Normal | 917 | 72 |
| 5. Adenocarcinoma (Beer et al., 2002) | NP vs. NN | 5377 | 86 |
| 6. Lymphoma (Alizadeh et al., 2000) | DLBCL1 vs. DLBCL2 | 4027 | 96 |
| 7. Melanoma (Bittner et al., 2000) | Tumor vs. Normal | 8067 | 38 |
| 8. Ovarian Cancer (Welsh et al., 2001) | Tumor vs. Normal | 7129 | 39 |
Figure 1. Percentage accuracy of 10-fold cross validation of classification methods for all genes. Results of 10-fold cross validation of the classification methods applied to all datasets without performing any feature selection.
Percentage accuracy of 10-fold cross validation of classification methods for all genes
| Dataset | SVM | RBF | MLP | Bayesian | J48 | ID3 | R. Forest | Bagging |
| 1. Lymphoma (Devos et al., 2002) | 96.0 | 84.0 | 68.0 | 88.0 | 64.0 | 76.0 | 48.0 | 52.0 |
| 2. Breast Cancer (Perou et al., 2000) | 97.6 | 97.6 | 96.4 | 92.9 | 92.9 | 96.4 | 94.0 | 96.4 |
| 3. Colon Cancer (Alon et al., 1999) | 95.6 | 91.1 | 91.1 | 93.3 | 91.1 | 80.0 | 88.9 | 93.3 |
| 4. Lung Cancer (Garber et al., 2001) | 97.2 | 97.2 | 97.2 | 95.8 | 94.4 | 95.8 | 97.2 | 97.2 |
| 5. Adenocarcinoma (Beer et al., 2002) | 96.5 | 94.2 | 75.6 | 75.6 | 74.4 | 79.1 | 66.3 | 79.1 |
| 6. Lymphoma (Alizadeh et al., 2000) | 96.9 | 88.5 | 75.0 | 85.4 | 75.0 | 76.0 | 62.5 | 84.4 |
| 7. Melanoma (Bittner et al., 2000) | 94.7 | 81.6 | 84.2 | 76.3 | 81.6 | 81.6 | 52.6 | 81.6 |
| 8. Ovarian Cancer (Welsh et al., 2001) | 94.9 | 84.6 | 89.7 | 87.2 | 87.2 | 89.7 | 74.4 | 89.7 |
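The 10-fold cross validation behind these accuracies can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`k_fold_indices`, `cross_val_accuracy`), the toy one-feature data, and the nearest-mean stand-in classifier are all assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Split sample indices 0..n_samples-1 into k shuffled, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one sample.
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Percentage of samples predicted correctly when each fold is held out once."""
    correct = 0
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, features[i]) == labels[i] for i in held_out)
    return 100.0 * correct / len(labels)


# Hypothetical stand-in classifier: nearest class mean on the first feature only.
def fit_centroid(xs, ys):
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x[0]
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_centroid(model, x):
    return min(model, key=lambda y: abs(model[y] - x[0]))


# Toy data (assumed): 25 samples with one well-separated feature.
features = [[0.01 * i] for i in range(12)] + [[5.0 + 0.01 * i] for i in range(13)]
labels = ["normal"] * 12 + ["tumor"] * 13

accuracy = cross_val_accuracy(features, labels, fit_centroid, predict_centroid)
```

Every sample is held out exactly once, so the reported percentage is the share of all samples predicted correctly by a model that never saw them during training.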
Percentage accuracy of 10-fold cross validation of clustering methods for all genes
| 1. Lymphoma (Devos et al., 2002) | 64.0 | 52.0 | 64.0 | 64.0 |
| 2. Breast Cancer (Perou et al., 2000) | 67.9 | 71.4 | 85.7 | 67.9 |
| 3. Colon Cancer (Alon et al., 1999) | 53.3 | 71.1 | 68.9 | 53.3 |
| 4. Lung Cancer (Garber et al., 2001) | 79.2 | 37.5 | 75.0 | 80.6 |
| 5. Adenocarcinoma (Beer et al., 2002) | 42.0 | 54.7 | 74.4 | 51.2 |
| 6. Lymphoma (Alizadeh et al., 2000) | 52.1 | 54.2 | 78.1 | 54.2 |
| 7. Melanoma (Bittner et al., 2000) | 73.7 | 81.6 | 73.7 | 73.7 |
| 8. Ovarian Cancer (Welsh et al., 2001) | 61.5 | 61.5 | 89.7 | 66.7 |
Figure 2. Percentage accuracy of 10-fold cross validation of clustering methods for all genes. Results of 10-fold cross validation of the two-class clustering methods applied to all datasets.
10-fold cross validation evaluation results of feature selection methods applied to the classification methods. In each X:Y pattern, X is the number of misclassified cancer samples and Y is the number of misclassified normal samples.
Dataset 1: Lymphoma (Devos et al., 2002)
| Feature selection | # Genes | SVM | RBF | MLP | Bayesian | J48 | ID3 | R. Forest | Bagging |
| SVM-RFE | 50 | 0:0 | 0:1 | 2:3 | 1:1 | 1:2 | 1:1 | 4:5 | 3:6 |
| CFS | 50 | 1:0 | 2:1 | 3:3 | 2:1 | 4:4 | 2:2 | 3:6 | 3:6 |
| ChiSquared | 50 | 1:0 | 2:2 | 4:3 | 1:1 | 3:4 | 2:3 | 2:3 | 4:1 |
| All features | 7129 | 1:0 | 2:2 | 4:4 | 2:1 | 4:5 | 2:4 | 7:6 | 9:3 |
Dataset 2: Breast Cancer (Perou et al., 2000)
| Feature selection | # Genes | SVM | RBF | MLP | Bayesian | J48 | ID3 | R. Forest | Bagging |
| SVM-RFE | 50 | 0:0 | 1:0 | 1:1 | 3:1 | 4:1 | 1:1 | 2:1 | 1:0 |
| CFS | 50 | 1:0 | 1:0 | 2:1 | 3:2 | 3:1 | 1:1 | 1:1 | 1:0 |
| ChiSquared | 50 | 1:1 | 1:1 | 1:1 | 2:2 | 3:1 | 1:0 | 1:1 | 1:0 |
| All features | 1753 | 1:1 | 1:1 | 2:1 | 4:2 | 4:2 | 2:1 | 4:1 | 2:1 |
Dataset 3: Colon Cancer (Alon et al., 1999)
| Feature selection | # Genes | SVM | RBF | MLP | Bayesian | J48 | ID3 | R. Forest | Bagging |
| SVM-RFE | 50 | 0:0 | 0:1 | 1:0 | 1:0 | 2:0 | 3:1 | 1:1 | 1:1 |
| CFS | 50 | 1:1 | 2:1 | 1:1 | 2:0 | 1:1 | 2:2 | 1:1 | 1:0 |
| ChiSquared | 50 | 1:0 | 2:2 | 2:0 | 1:0 | 1:0 | 1:1 | 2:1 | 1:0 |
| All features | 7464 | 2:0 | 2:2 | 2:2 | 3:0 | 2:2 | 6:3 | 3:2 | 2:1 |
Dataset 4: Lung Cancer (Garber et al., 2001)
| Feature selection | # Genes | SVM | RBF | MLP | Bayesian | J48 | ID3 | R. Forest | Bagging |
| SVM-RFE | 50 | 0:0 | 0:1 | 1:0 | 1:1 | 1:1 | 1:0 | 1:0 | 1:0 |
| CFS | 50 | 1:1 | 1:1 | 1:0 | 1:0 | 1:1 | 1:1 | 1:1 | 1:1 |
| ChiSquared | 50 | 1:0 | 1:0 | 1:0 | 1:1 | 2:1 | 2:0 | 1:1 | 1:1 |
| All features | 917 | 2:0 | 2:0 | 1:1 | 2:1 | 2:2 | 2:1 | 1:1 | 1:1 |
Dataset 5: Adenocarcinoma (Beer et al., 2002)
| Feature selection | # Genes | SVM | RBF | MLP | Bayesian | J48 | ID3 | R. Forest | Bagging |
| SVM-RFE | 50 | 0:0 | 2:1 | 2:3 | 4:5 | 4:5 | 3:6 | 4:5 | 3:6 |
| CFS | 50 | 1:0 | 1:1 | 3:3 | 3:6 | 3:6 | 3:6 | 3:6 | 3:6 |
| ChiSquared | 50 | 1:0 | 2:2 | 4:3 | 5:5 | 3:5 | 5:5 | 2:3 | 5:5 |
| All features | 5377 | 2:1 | 3:2 | 15:6 | 15:6 | 15:7 | 14:4 | 17:13 | 12:6 |
Dataset 6: Lymphoma (Alizadeh et al., 2000)
| Feature selection | # Genes | SVM | RBF | MLP | Bayesian | J48 | ID3 | R. Forest | Bagging |
| SVM-RFE | 50 | 0:0 | 0:1 | 2:3 | 4:5 | 4:5 | 3:6 | 4:5 | 3:6 |
| CFS | 50 | 1:1 | 2:3 | 3:3 | 3:6 | 3:6 | 3:6 | 3:6 | 3:6 |
| ChiSquared | 50 | 1:1 | 2:2 | 4:3 | 5:5 | 3:5 | 5:5 | 2:3 | 5:5 |
| All features | 4027 | 2:1 | 9:2 | 15:7 | 12:2 | 14:10 | 16:7 | 21:15 | 12:3 |
Dataset 7: Melanoma (Bittner et al., 2000)
| Feature selection | # Genes | SVM | RBF | MLP | Bayesian | J48 | ID3 | R. Forest | Bagging |
| SVM-RFE | 50 | 0:0 | 0:1 | 2:1 | 2:1 | 3:1 | 3:1 | 4:5 | 3:1 |
| CFS | 50 | 1:0 | 2:3 | 2:2 | 2:2 | 2:1 | 2:1 | 3:6 | 3:2 |
| ChiSquared | 50 | 1:0 | 2:2 | 3:2 | 2:3 | 2:2 | 2:2 | 2:3 | 3:2 |
| All features | 8067 | 2:0 | 4:3 | 4:2 | 6:3 | 4:3 | 4:3 | 15:3 | 5:2 |
Dataset 8: Ovarian Cancer (Welsh et al., 2001)
| Feature selection | # Genes | SVM | RBF | MLP | Bayesian | J48 | ID3 | R. Forest | Bagging |
| SVM-RFE | 50 | 0:0 | 0:1 | 1:1 | 1:1 | 1:1 | 1:1 | 2:1 | 3:1 |
| CFS | 50 | 1:0 | 3:2 | 1:2 | 1:1 | 1:1 | 1:1 | 2:2 | 2:1 |
| ChiSquared | 50 | 1:0 | 2:2 | 2:1 | 1:1 | 1:1 | 1:1 | 2:3 | 1:1 |
| All features | 7129 | 2:0 | 4:2 | 2:2 | 3:2 | 3:2 | 2:2 | 7:3 | 3:1 |
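The X:Y error patterns relate to percentage accuracy by simple arithmetic: subtract both error counts from the number of samples, then divide by the number of samples. The sketch below assumes X and Y are counts of misclassified samples and uses the "All features" row of the 25-sample lymphoma dataset (7129 genes); the function name `accuracy_from_errors` is hypothetical.

```python
def accuracy_from_errors(pattern, n_samples):
    """Convert an 'X:Y' error pattern into percentage accuracy.

    X = misclassified cancer samples, Y = misclassified normal samples
    (assumed to be counts, not rates).
    """
    cancer_errors, normal_errors = (int(part) for part in pattern.split(":"))
    return 100.0 * (n_samples - cancer_errors - normal_errors) / n_samples


# "All features" row, lymphoma dataset with 7129 genes and 25 samples.
lymphoma_all_features = {"SVM": "1:0", "RBF": "2:2", "MLP": "4:4", "Bayesian": "2:1"}

accuracies = {
    name: round(accuracy_from_errors(pattern, 25), 2)
    for name, pattern in lymphoma_all_features.items()
}
```

For these four classifiers the computed values are 96.0, 84.0, 68.0 and 88.0, which agree with the accuracies reported for the same dataset when all genes are used.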
Percentage accuracy of 10-fold cross validation of feature selection methods applied to the classification methods.
Dataset 1: Lymphoma (Devos et al., 2002)
| Feature selection | # Genes | SVM | RBF | MLP | Bayesian | J48 | ID3 | R. Forest | Bagging |
| SVM-RFE | 50 | 100.00 | 96.00 | 80.00 | 92.00 | 88.00 | 92.00 | 64.00 | 64.00 |
| CFS | 50 | 96.00 | 88.00 | 76.00 | 88.00 | 68.00 | 84.00 | 64.00 | 64.00 |
| ChiSquared | 50 | 96.00 | 84.00 | 72.00 | 92.00 | 72.00 | 80.00 | 80.00 | 80.00 |
| All features | 7129 | 96.00 | 84.00 | 68.00 | 88.00 | 64.00 | 76.00 | 48.00 | 52.00 |
Dataset 2: Breast Cancer (Perou et al., 2000)
| Feature selection | # Genes | SVM | RBF | MLP | Bayesian | J48 | ID3 | R. Forest | Bagging |
| SVM-RFE | 50 | 100.00 | 98.81 | 97.62 | 95.24 | 94.05 | 97.62 | 96.43 | 98.81 |
| CFS | 50 | 98.81 | 98.81 | 96.43 | 94.05 | 95.24 | 97.62 | 97.62 | 98.81 |
| ChiSquared | 50 | 97.62 | 97.62 | 97.62 | 95.24 | 95.24 | 98.81 | 97.62 | 98.81 |
| All features | 1753 | 97.62 | 97.62 | 96.43 | 92.86 | 92.86 | 96.43 | 94.05 | 96.43 |
Dataset 3: Colon Cancer (Alon et al., 1999)
| Feature selection | # Genes | SVM | RBF | MLP | Bayesian | J48 | ID3 | R. Forest | Bagging |
| SVM-RFE | 50 | 100.00 | 97.78 | 97.78 | 97.78 | 95.56 | 91.11 | 95.56 | 95.56 |
| CFS | 50 | 95.56 | 93.33 | 95.56 | 95.56 | 95.56 | 95.56 | 95.56 | 97.78 |
| ChiSquared | 50 | 97.78 | 91.11 | 95.56 | 97.78 | 97.78 | 95.56 | 93.33 | 97.78 |
| All features | 7464 | 95.56 | 91.11 | 91.11 | 93.33 | 91.11 | 80.00 | 88.89 | 93.33 |
Dataset 4: Lung Cancer (Garber et al., 2001)
| Feature selection | # Genes | SVM | RBF | MLP | Bayesian | J48 | ID3 | R. Forest | Bagging |
| SVM-RFE | 50 | 100.00 | 98.61 | 98.61 | 97.22 | 97.22 | 98.61 | 98.61 | 98.61 |
| CFS | 50 | 97.22 | 97.22 | 98.61 | 98.61 | 97.22 | 97.22 | 97.22 | 97.22 |
| ChiSquared | 50 | 98.61 | 98.61 | 98.61 | 97.22 | 95.83 | 97.22 | 97.22 | 97.22 |
| All features | 917 | 97.22 | 97.22 | 97.22 | 95.83 | 94.44 | 95.83 | 97.22 | 97.22 |
Dataset 5: Adenocarcinoma (Beer et al., 2002)
| Feature selection | # Genes | SVM | RBF | MLP | Bayesian | J48 | ID3 | R. Forest | Bagging |
| SVM-RFE | 50 | 100.00 | 96.51 | 94.19 | 89.53 | 89.53 | 89.53 | 89.53 | 89.53 |
| CFS | 50 | 98.84 | 97.67 | 93.02 | 89.53 | 89.53 | 89.53 | 89.53 | 89.53 |
| ChiSquared | 50 | 98.84 | 95.35 | 91.86 | 88.37 | 90.70 | 88.37 | 94.19 | 88.37 |
| All features | 5377 | 96.51 | 94.19 | 75.58 | 75.58 | 74.42 | 79.07 | 66.28 | 79.07 |
Dataset 6: Lymphoma (Alizadeh et al., 2000)
| Feature selection | # Genes | SVM | RBF | MLP | Bayesian | J48 | ID3 | R. Forest | Bagging |
| SVM-RFE | 50 | 100.00 | 100.00 | 94.79 | 90.63 | 90.63 | 90.63 | 90.63 | 90.63 |
| CFS | 50 | 97.92 | 94.79 | 93.75 | 90.63 | 90.63 | 90.63 | 90.63 | 90.63 |
| ChiSquared | 50 | 97.92 | 95.83 | 92.71 | 89.58 | 91.67 | 89.58 | 94.79 | 89.58 |
| All features | 4027 | 96.88 | 88.54 | 77.08 | 85.42 | 75.00 | 76.04 | 62.50 | 84.38 |
Dataset 7: Melanoma (Bittner et al., 2000)
| Feature selection | # Genes | SVM | RBF | MLP | Bayesian | J48 | ID3 | R. Forest | Bagging |
| SVM-RFE | 50 | 100.00 | 97.37 | 92.11 | 92.11 | 89.47 | 89.47 | 76.32 | 89.47 |
| CFS | 50 | 97.37 | 86.84 | 89.47 | 89.47 | 92.11 | 92.11 | 76.32 | 86.84 |
| ChiSquared | 50 | 97.37 | 89.47 | 86.84 | 86.84 | 89.47 | 89.47 | 86.84 | 86.84 |
| All features | 8067 | 94.74 | 81.58 | 84.21 | 76.32 | 81.58 | 81.58 | 52.63 | 81.58 |
Dataset 8: Ovarian Cancer (Welsh et al., 2001)
| Feature selection | # Genes | SVM | RBF | MLP | Bayesian | J48 | ID3 | R. Forest | Bagging |
| SVM-RFE | 50 | 100.00 | 100.00 | 94.87 | 94.87 | 94.87 | 94.87 | 92.31 | 89.74 |
| CFS | 50 | 97.44 | 87.18 | 92.31 | 94.87 | 94.87 | 94.87 | 89.74 | 92.31 |
| ChiSquared | 50 | 97.44 | 89.74 | 92.31 | 94.87 | 94.87 | 94.87 | 87.18 | 94.87 |
| All features | 7129 | 94.87 | 84.62 | 89.74 | 87.18 | 87.18 | 89.74 | 74.36 | 89.74 |