Rattikorn Hewett, Phongphun Kijsanayothin.
Abstract
BACKGROUND: Gene expression profiles based on microarray data are recognized as potential diagnostic indices of cancer. Molecular tumor classifications derived from these data with learning algorithms have advanced our understanding of the genetic changes associated with cancer etiology and development. However, classifications are not always perfect, and in such cases the classification rankings (likelihoods of correct class predictions) can be useful for directing further research (e.g., by deriving inferences about predictive indicators or prioritizing future experiments). Classification ranking is a challenging problem, particularly for microarray data, where the number of possibly regulated genes is very large and no rating function is known. This study investigates the possibility of making tumor classification more informative by using a method for classification ranking that requires no additional ranking analysis and maintains relatively good classification accuracy.
Year: 2008 PMID: 18831787 PMCID: PMC2559886 DOI: 10.1186/1471-2164-9-S2-S21
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Features in the classification models obtained by MDR.
| Name | Features |
| ALL-AML Leukemia | attribute1834, attribute6855 |
| Lung cancer | 37954_at, 33328_at, 1500_at, 34320_at, 37716_at |
| Prostate cancer | 37639_at, 34163_g_at, 38406_f_at, 1776_at, 33784_at, 32057_at |
| Lymphoma | GENE3328X, GENE3512X, GENE3261X |
| Breast cancer | AL080059, AF035278, AB014543, Contig16531_RC, Contig64861_RC, NM_004469, Contig34634_RC, Contig15044_RC |
| BCR-ABL | 1636_g_at, 36591_at, 37602_at, 40698_at |
| E2A-PBX1 | 32063_at |
| Hyp | 31308_at, 38461_at, 37543_at, 1916_s_at, 36620_at, 39721_at, 36517_at, 38402_at |
| MALL | 33412_at, 31397_at, 34306_at, 31318_at, 40506_s_at, 31329_at, 38413_at, 31324_at, 36777_at |
| TALL | 38319_at |
| TEL-AML1 | 36985_at, 31572_at, 31492_at, 36239_at, 32645_at, 31691_g_at |
Figure 1. ROC curves on prostate cancer expressions
Classification results on five cancer types.
| ALL-AML Leukemia | ||||||
| Learner | Recall | FPR | FNR | ACC | AUC | Precision |
| C4.5 | 85.11 | 32.00 | 14.89 | 79.17 | 72.00 | 83.33 |
| Bayes | 100.00 | 4.00 | 0.00 | 98.61 | 97.90 | 97.92 |
| 3NN | 95.74 | 40.00 | 4.26 | 83.33 | 87.50 | 81.82 |
| SVM | 100.00 | 4.00 | 0.00 | 98.61 | 98.00 | 97.92 |
| ZeroR | 100.00 | 100.00 | 0.00 | 65.28 | 42.60 | 65.28 |
| MDR | 97.87 | 4.00 | 2.13 | 97.22 | 99.02 | 97.87 |
| Lung cancer | ||||||
| Learner | Recall | FPR | FNR | ACC | AUC | Precision |
| C4.5 | 87.10 | 3.33 | 12.90 | 95.03 | 93.00 | 84.38 |
| Bayes | 96.77 | 1.33 | 3.23 | 98.34 | 97.70 | 93.75 |
| 3NN | 64.52 | 0.00 | 35.48 | 93.92 | 96.50 | 100.00 |
| SVM | 96.77 | 0.00 | 3.23 | 99.45 | 98.00 | 100.00 |
| ZeroR | 0.00 | 0.00 | 100.00 | 82.87 | 48.50 | nan |
| MDR | 90.32 | 2.00 | 9.68 | 96.69 | 97.61 | 90.32 |
| Prostate cancer | ||||||
| Learner | Recall | FPR | FNR | ACC | AUC | Precision |
| C4.5 | 87.01 | 30.51 | 12.99 | 79.41 | 79.00 | 78.82 |
| Bayes | 32.47 | 13.56 | 67.53 | 55.88 | 59.50 | 75.76 |
| 3NN | 84.42 | 28.81 | 15.58 | 78.68 | 87.10 | 79.27 |
| SVM | 92.21 | 10.17 | 7.79 | 91.18 | 91.00 | 92.21 |
| ZeroR | 100.00 | 100.00 | 0.00 | 56.62 | 47.90 | 56.62 |
| MDR | 89.61 | 13.56 | 10.39 | 88.24 | 92.30 | 89.61 |
| Lymphoma cancer | ||||||
| Learner | Recall | FPR | FNR | ACC | AUC | Precision |
| C4.5 | 69.57 | 12.50 | 30.43 | 78.72 | 77.00 | 84.21 |
| Bayes | 100.00 | 4.17 | 0.00 | 97.87 | 97.90 | 95.83 |
| 3NN | 60.87 | 8.33 | 39.13 | 76.60 | 81.30 | 87.50 |
| SVM | 100.00 | 4.17 | 0.00 | 97.87 | 97.90 | 95.83 |
| ZeroR | 0.00 | 0.00 | 100.00 | 51.06 | 40.80 | nan |
| MDR | 86.96 | 12.50 | 13.04 | 87.23 | 93.61 | 86.96 |
| Breast cancer | ||||||
| Learner | Recall | FPR | FNR | ACC | AUC | Precision |
| C4.5 | 52.17 | 27.45 | 47.83 | 62.89 | 66.00 | 63.16 |
| Bayes | 4.35 | 0.00 | 95.65 | 54.64 | 52.20 | 100.00 |
| 3NN | 45.65 | 31.37 | 54.35 | 57.73 | 59.50 | 56.76 |
| SVM | 69.57 | 33.33 | 30.43 | 68.04 | 68.10 | 65.31 |
| ZeroR | 0.00 | 0.00 | 100.00 | 52.58 | 46.50 | nan |
| MDR | 60.87 | 35.29 | 39.13 | 62.89 | 63.65 | 60.87 |
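The columns in the table above follow from confusion-matrix counts. The sketch below uses the standard definitions of these metrics, which is an assumption about how the authors computed them. The counts are not stated in the source: they are inferred here from the class sizes (47 ALL vs. 25 AML; 31 MPM vs. 150 ADCA) and the reported rates, and they reproduce the MDR row for ALL-AML leukemia and the ZeroR row for lung cancer.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Return Recall, FPR, FNR, ACC and Precision in percent.

    A ratio with a zero denominator is reported as NaN, which matches
    the 'nan' precision entries shown for ZeroR.
    """
    def pct(num, den):
        return 100.0 * num / den if den else math.nan

    return {
        "Recall": pct(tp, tp + fn),
        "FPR": pct(fp, fp + tn),
        "FNR": pct(fn, tp + fn),
        "ACC": pct(tp + tn, tp + fn + fp + tn),
        "Precision": pct(tp, tp + fp),
    }


# Inferred counts (assumption): MDR on ALL-AML, target class ALL.
mdr_all_aml = binary_metrics(tp=46, fn=1, fp=1, tn=24)

# Inferred counts (assumption): ZeroR on lung cancer always predicts
# the majority class ADCA, so it never predicts the target class MPM.
zero_r_lung = binary_metrics(tp=0, fn=31, fp=0, tn=150)

for name, metrics in [("MDR / ALL-AML", mdr_all_aml), ("ZeroR / Lung", zero_r_lung)]:
    print(name, {k: round(v, 2) for k, v in metrics.items()})
```

The ZeroR case shows why precision is undefined whenever a learner makes no positive predictions, and why its accuracy simply equals the majority-class share of the data.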
Classification results on six subtypes of acute lymphoblastic leukemia.
| ALL-BCR-ABL | ||||||
| Learner | Recall | FPR | FNR | ACC | AUC | Precision |
| C4.5 | 33.33 | 2.88 | 66.67 | 94.19 | 59.10 | 35.71 |
| Bayes | 0.00 | 0.32 | 100.00 | 95.11 | 49.80 | 0.00 |
| 3NN | 13.33 | 0.00 | 86.67 | 96.02 | 75.10 | 100.00 |
| SVM | 26.67 | 0.00 | 73.33 | 96.64 | 63.30 | 100.00 |
| ZeroR | 0.00 | 0.00 | 100.00 | 95.41 | 41.40 | nan |
| MDR | 20.00 | 3.85 | 80.00 | 92.66 | 84.65 | 20.00 |
| ALL-E2A-PBX1 | ||||||
| Learner | Recall | FPR | FNR | ACC | AUC | Precision |
| C4.5 | 100.00 | 0.00 | 0.00 | 100.00 | 100.00 | 100.00 |
| Bayes | 3.70 | 0.00 | 96.30 | 92.05 | 53.70 | 100.00 |
| 3NN | 92.59 | 0.00 | 7.41 | 99.39 | 99.00 | 100.00 |
| SVM | 96.30 | 0.00 | 3.70 | 99.69 | 98.10 | 100.00 |
| ZeroR | 0.00 | 0.00 | 100.00 | 91.74 | 46.10 | nan |
| MDR | 100.00 | 0.00 | 0.00 | 100.00 | 100.00 | 100.00 |
| ALL-HYP | ||||||
| Learner | Recall | FPR | FNR | ACC | AUC | Precision |
| C4.5 | 65.63 | 6.08 | 34.38 | 88.38 | 79.50 | 72.41 |
| Bayes | 20.31 | 1.52 | 79.69 | 83.18 | 60.20 | 76.47 |
| 3NN | 73.44 | 1.52 | 26.56 | 93.58 | 93.30 | 92.16 |
| SVM | 87.50 | 0.76 | 12.50 | 96.94 | 93.40 | 96.55 |
| ZeroR | 0.00 | 0.00 | 100.00 | 80.43 | 47.70 | nan |
| MDR | 37.50 | 9.13 | 62.50 | 80.43 | 85.44 | 50.00 |
| ALL-MALL | ||||||
| Learner | Recall | FPR | FNR | ACC | AUC | Precision |
| C4.5 | 80.00 | 1.95 | 20.00 | 96.94 | 89.00 | 72.73 |
| Bayes | 0.00 | 0.33 | 100.00 | 93.58 | 49.80 | 0.00 |
| 3NN | 50.00 | 0.00 | 50.00 | 96.94 | 86.50 | 100.00 |
| SVM | 70.00 | 0.00 | 30.00 | 98.17 | 85.00 | 100.00 |
| ZeroR | 0.00 | 0.00 | 100.00 | 93.88 | 49.70 | nan |
| MDR | 45.00 | 3.58 | 55.00 | 93.27 | 88.90 | 45.00 |
| ALL-T-ALL | ||||||
| Learner | Recall | FPR | FNR | ACC | AUC | Precision |
| C4.5 | 100.00 | 0.35 | 0.00 | 99.69 | 99.80 | 97.73 |
| Bayes | 23.26 | 0.35 | 76.74 | 89.60 | 61.50 | 90.91 |
| 3NN | 81.40 | 0.00 | 18.60 | 97.55 | 94.40 | 100.00 |
| SVM | 97.67 | 0.00 | 2.33 | 99.69 | 98.80 | 100.00 |
| ZeroR | 0.00 | 0.00 | 100.00 | 86.85 | 47.20 | nan |
| MDR | 100.00 | 0.00 | 0.00 | 100.00 | 100.00 | 100.00 |
| ALL-TEL-AML1 | ||||||
| Learner | Recall | FPR | FNR | ACC | AUC | Precision |
| C4.5 | 87.34 | 2.42 | 12.66 | 95.11 | 92.50 | 92.00 |
| Bayes | 37.97 | 2.42 | 62.03 | 83.18 | 70.60 | 83.33 |
| 3NN | 93.67 | 3.63 | 6.33 | 95.72 | 97.90 | 89.16 |
| SVM | 100.00 | 1.61 | 0.00 | 98.78 | 99.20 | 95.18 |
| ZeroR | 0.00 | 0.00 | 100.00 | 75.84 | 49.10 | nan |
| MDR | 82.28 | 5.65 | 17.72 | 91.44 | 91.50 | 82.28 |
AUC Comparisons.
| Data set | C4.5 | Bayes | 3NN | SVM | ZeroR | MDR |
| ALLAML | 72.00 | 97.90 | 87.50 | 98.00 | 42.60 | |
| Lung | 93.00 | 97.70 | 96.50 | 98.00 | 48.50 | 97.61 |
| Prostate | 79.00 | 59.50 | 87.10 | 91.00 | 47.90 | |
| Lymphoma | 77.00 | 97.90 | 81.30 | 97.90 | 40.80 | 93.61 |
| Breast | 66.00 | 52.20 | 59.50 | 68.10 | 46.50 | 63.65 |
| BCR-ABL | 59.10 | 49.80 | 75.10 | 63.30 | 41.40 | |
| E2A-PBX1 | 100.00 | 53.70 | 99.00 | 98.10 | 46.10 | |
| Hyp | 79.50 | 60.20 | 93.30 | 93.40 | 47.70 | 85.47 |
| MALL | 89.00 | 49.80 | 86.50 | 85.00 | 49.70 | |
| TALL | 99.80 | 61.50 | 94.40 | 98.80 | 47.20 | |
| TEL-AML1 | 92.50 | 70.60 | 97.90 | 99.20 | 49.10 | 91.60 |
| Average | 82.45 | 68.25 | 87.10 | 90.07 | 46.14 | |
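The Average row can be checked against the per-data-set AUC values listed in the classification tables. As a small sketch, the snippet below averages the C4.5 and SVM AUC values over the eleven data sets; the variable names are illustrative only.

```python
from statistics import mean

# AUC values per data set, in the order ALLAML, Lung, Prostate, Lymphoma,
# Breast, BCR-ABL, E2A-PBX1, Hyp, MALL, TALL, TEL-AML1, copied from the
# per-data-set classification tables.
auc_by_learner = {
    "C4.5": [72.00, 93.00, 79.00, 77.00, 66.00, 59.10,
             100.00, 79.50, 89.00, 99.80, 92.50],
    "SVM": [98.00, 98.00, 91.00, 97.90, 68.10, 63.30,
            98.10, 93.40, 85.00, 98.80, 99.20],
}

# Unweighted mean over the eleven data sets, rounded to two decimals.
averages = {name: round(mean(values), 2) for name, values in auc_by_learner.items()}
print(averages)
```

The averages are unweighted, so a small data set such as Lymphoma (47 instances) counts as much as each 327-instance ALL subtype task.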
Figure 2. Basic steps of MDR
Figure 3. Example of a model produced by MDR
Figure 4. Performance metrics in binary classification
Figure 5. ROC curves on breast cancer classification of the same accuracy
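Figure 5 concerns classifiers that share the same accuracy yet differ in their ROC curves. A minimal sketch of that effect follows, using made-up scores that are not taken from the study. AUC is computed here as the fraction of positive-negative pairs that are ranked correctly (the Mann-Whitney formulation), which is one standard way to obtain it and not necessarily the procedure used by the authors.

```python
def auc_from_scores(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of correct predictions when scores >= threshold mean class 1."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: three positives followed by three negatives.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]  # every positive outranks every negative
model_b = [0.9, 0.6, 0.1, 0.4, 0.3, 0.2]  # one positive is ranked last

for name, scores in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(scores, labels), auc_from_scores(scores, labels))
```

Both toy models misclassify exactly one instance at the 0.5 threshold, so their accuracy is identical, but model A orders all positives above all negatives while model B does not. This is the sense in which a ranking-aware measure such as AUC can separate classifiers that accuracy alone cannot.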
Information on cancer expression data.
| Name | # Attributes | # Inst. | Target class | # Inst. | Other class | # Inst. |
| ALL-AML Leukemia | 15154 | 72 | ALL | 47 | AML | 25 |
| Lung cancer | 12533 | 181 | MPM | 31 | ADCA | 150 |
| Prostate cancer | 12600 | 136 | Tumor | 77 | Normal | 59 |
| Lymphoma | 4026 | 47 | Germinal | 24 | Activated | 23 |
| Breast cancer | 24481 | 97 | Relapse | 46 | Non-relapse | 51 |
| ALL (Acute Lymphoblastic Leukemia)-subtypes | 12558 | 327 | BCR-ABL | 15 | non-BCR-ABL | 312 |
| ALL (Acute Lymphoblastic Leukemia)-subtypes | 12558 | 327 | E2A-PBX1 | 27 | non-E2A-PBX1 | 300 |
| ALL (Acute Lymphoblastic Leukemia)-subtypes | 12558 | 327 | Hyp | 64 | non-Hyp | 263 |
| ALL (Acute Lymphoblastic Leukemia)-subtypes | 12558 | 327 | MALL | 20 | non-MALL | 307 |
| ALL (Acute Lymphoblastic Leukemia)-subtypes | 12558 | 327 | T-ALL | 43 | non-T-ALL | 284 |
| ALL (Acute Lymphoblastic Leukemia)-subtypes | 12558 | 327 | TEL-AML1 | 79 | non-TEL-AML1 | 248 |