Alexander Statnikov, Lily Wang, Constantin F Aliferis.
Abstract
BACKGROUND: Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology, with several molecular signatures on their way toward clinical deployment. Using the most accurate classification algorithms available for microarray gene expression data is critical to developing the best possible molecular signatures for patient care. As suggested by a large body of literature to date, support vector machines can be considered "best of class" algorithms for classification of such data. Recent work, however, suggests that random forest classifiers may outperform support vector machines in this domain.
Year: 2008 PMID: 18647401 PMCID: PMC2492881 DOI: 10.1186/1471-2105-9-319
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Classification performance of SVMs and RFs without gene selection. Performance is estimated using the area under the ROC curve (AUC) for binary classification tasks and relative classifier information (RCI) for multicategory tasks.
Comparison of classification performance of SVMs and RFs without gene selection.
| Metric | SVM | RF | Nominally better classifier | P-value |
| --- | --- | --- | --- | --- |
| AUC | 0.867 | 0.867 | - | 1 |
| AUC | 0.821 | 0.767 | SVM | 0.409 |
| AUC | 0.992 | 0.973 | SVM | 0.500 |
| AUC | 0.964 | 0.944 | SVM | 0.377 |
| AUC | 0.798 | 0.646 | SVM | |
| AUC | 0.519 | 0.561 | RF | 0.546 |
| AUC | 0.663 | 0.763 | RF | 0.061 |
| AUC | 0.692 | 0.600 | SVM | 0.235 |
| AUC | 0.689 | 0.629 | SVM | 0.140 |
| AUC | 0.747 | 0.754 | RF | 0.867 |
| AUC | 0.777 | 0.660 | SVM | |
| RCI | 1.000 | 1.000 | - | 1 |
| RCI | 0.944 | 0.894 | SVM | 0.658 |
| RCI | 0.895 | 0.763 | SVM | |
| RCI | 0.939 | 0.934 | SVM | 1 |
| RCI | 1.000 | 1.000 | - | 1 |
| RCI | 0.775 | 0.733 | SVM | 0.498 |
| RCI | 0.823 | 0.611 | SVM | |
| RCI | 0.905 | 0.861 | SVM | |
| RCI | 0.770 | 0.819 | RF | 0.249 |
| RCI | 0.958 | 0.910 | SVM | |
| RCI | 0.451 | 0.304 | SVM | |
Performance is estimated using the area under the ROC curve (AUC) for binary classification tasks and relative classifier information (RCI) for multicategory tasks. See the subsection "Statistical comparison among classifiers" for a description of the statistical test used to compute the reported p-values. P-values shown in boldface denote statistically significant differences between classification methods at the α = 0.05 level.
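The two metrics named in the note above can be sketched as follows. This is a minimal illustration, not the authors' implementation. `auc` uses the rank-based (Mann-Whitney) formulation of the area under the ROC curve. `rci` assumes the usual entropy-based definition of relative classifier information, in which the reduction in uncertainty about the true class given the predicted class is divided by the entropy of the true class distribution. The function names and the toy inputs in the usage note are hypothetical.

```python
import math


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def entropy(counts):
    """Shannon entropy (in bits) of the distribution given by raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def rci(confusion):
    """Relative classifier information under the assumed definition.

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j. Requires at least two true classes with
    samples, otherwise the true-class entropy is zero.
    """
    total = sum(sum(row) for row in confusion)
    h_true = entropy([sum(row) for row in confusion])
    h_true_given_pred = 0.0
    for j in range(len(confusion[0])):
        column = [row[j] for row in confusion]
        column_total = sum(column)
        if column_total > 0:
            h_true_given_pred += column_total / total * entropy(column)
    return (h_true - h_true_given_pred) / h_true
```

For example, `auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])` gives 0.75, because three of the four positive-negative pairs are ranked correctly. A perfectly diagonal confusion matrix gives an RCI of 1, and a matrix whose predictions carry no information about the true class gives 0.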
Figure 2. Classification performance of SVMs and RFs with gene selection. Performance is estimated using the area under the ROC curve (AUC) for binary classification tasks and relative classifier information (RCI) for multicategory tasks.
Comparison of classification performance of SVMs and RFs with gene selection.
| Metric | SVM | RF | Nominally better classifier | P-value |
| --- | --- | --- | --- | --- |
| AUC | 0.938 | 0.917 | SVM | 0.626 |
| AUC | 0.821 | 0.781 | SVM | 0.624 |
| AUC | 0.992 | 0.975 | SVM | 0.502 |
| AUC | 0.964 | 0.972 | RF | 0.812 |
| AUC | 0.798 | 0.648 | SVM | |
| AUC | 0.519 | 0.561 | RF | 0.550 |
| AUC | 0.713 | 0.763 | RF | 0.750 |
| AUC | 0.692 | 0.629 | SVM | 0.506 |
| AUC | 0.689 | 0.631 | SVM | 0.128 |
| AUC | 0.758 | 0.754 | SVM | 0.954 |
| AUC | 0.777 | 0.716 | SVM | 0.082 |
| RCI | 1.000 | 1.000 | - | 1 |
| RCI | 0.944 | 0.911 | SVM | 0.624 |
| RCI | 0.895 | 0.817 | SVM | 0.125 |
| RCI | 0.953 | 0.934 | SVM | 1 |
| RCI | 1.000 | 1.000 | - | 1 |
| RCI | 0.812 | 0.733 | SVM | 0.220 |
| RCI | 0.823 | 0.688 | SVM | 0.079 |
| RCI | 0.911 | 0.880 | SVM | 0.066 |
| RCI | 0.876 | 0.856 | SVM | 0.626 |
| RCI | 0.958 | 0.922 | SVM | 0.078 |
| RCI | 0.451 | 0.371 | SVM | 0.262 |
Performance is estimated using the area under the ROC curve (AUC) for binary classification tasks and relative classifier information (RCI) for multicategory tasks. See the subsection "Statistical comparison among classifiers" for a description of the statistical test used to compute the reported p-values. P-values shown in boldface denote statistically significant differences between classification methods at the α = 0.05 level.
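As a quick check on the table with gene selection, the nominal winner on each task can be tallied directly from the two performance columns. The sketch below copies the 22 rows from that table; the variable and function names are hypothetical. It counts nominal differences only and says nothing about statistical significance, which the p-value column addresses.

```python
from collections import Counter

# (metric, SVM, RF) rows copied from the table with gene selection.
rows = [
    ("AUC", 0.938, 0.917), ("AUC", 0.821, 0.781), ("AUC", 0.992, 0.975),
    ("AUC", 0.964, 0.972), ("AUC", 0.798, 0.648), ("AUC", 0.519, 0.561),
    ("AUC", 0.713, 0.763), ("AUC", 0.692, 0.629), ("AUC", 0.689, 0.631),
    ("AUC", 0.758, 0.754), ("AUC", 0.777, 0.716),
    ("RCI", 1.000, 1.000), ("RCI", 0.944, 0.911), ("RCI", 0.895, 0.817),
    ("RCI", 0.953, 0.934), ("RCI", 1.000, 1.000), ("RCI", 0.812, 0.733),
    ("RCI", 0.823, 0.688), ("RCI", 0.911, 0.880), ("RCI", 0.876, 0.856),
    ("RCI", 0.958, 0.922), ("RCI", 0.451, 0.371),
]


def nominal_winner(svm, rf):
    """Name the classifier with the higher score, or 'tie' if equal."""
    if svm > rf:
        return "SVM"
    if rf > svm:
        return "RF"
    return "tie"


# Overall tally across all tasks, then a separate tally per metric.
tally = Counter(nominal_winner(svm, rf) for _, svm, rf in rows)
by_metric = {
    metric: Counter(nominal_winner(s, r) for m, s, r in rows if m == metric)
    for metric in ("AUC", "RCI")
}

print(dict(tally))
print({metric: dict(counts) for metric, counts in by_metric.items()})
```

On these rows, SVMs are nominally better on 17 tasks, RFs on 3, and the two tie on 2.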
Number of genes selected for each microarray dataset and gene selection method.
| 4026 | 12 | 62 | 73 | 19 | 15 | |
| 2000 | 105 | 16 | 3 | 15 | 13 | |
| 11225 | 74 | 709 | 57 | 106 | 48 | |
| 12600 | 289 | 27 | 15 | 1864 | 653 | |
| 5327 | 12 | 456 | 336 | 42 | 4 | |
| 2308 | 28 | 17 | 18 | 15 | 11 | |
| 10367 | 1598 | 126 | 101 | 476 | 926 | |
| 5920 | 186 | 34 | 16 | 70 | 435 | |
| 15009 | 3346 | 966 | 411 | 8248 | 10277 | |
| 13247 | 1576 | 12 | 4 | 4129 | 1364 | |
| 5469 | 8 | 15 | 6 | 13 | 89 | |
| 10509 | 157 | 58 | 21 | 22 | 38 | |
| 5726 | 169 | 152 | 73 | 93 | 97 | |
| 12533 | 2429 | 845 | 320 | 1318 | 1927 | |
| 7129 | 201 | 15 | 7 | 953 | 1380 | |
| 12600 | 21 | 46 | 7 | 138 | 61 | |
| 7070 | 103 | 38 | 7 | 168 | 185 | |
| 7129 | 70 | 29 | 13 | 445 | 439 | |
| 7399 | 2338 | 124 | 27 | 3201 | 3897 | |
| 24188 | 1056 | 124 | 20 | 5388 | 4405 | |
| 24188 | 491 | 149 | 39 | 1194 | 1764 | |
| 12240 | 1187 | 21 | 6 | 3077 | 1869 |
Average number of genes selected over 10 cross-validation training sets.
Gene expression microarray datasets used in this study.
| Classes | Genes | Samples | Diagnostic task or outcome |
| --- | --- | --- | --- |
| 3 | 4026 | 62 | Diffuse large B-cell lymphoma, follicular lymphoma, chronic lymphocytic leukemia |
| 2 | 2000 | 62 | Colon tumors and normal tissues |
| 3 | 11225 | 72 | AML, ALL and mixed-lineage leukemia (MLL) |
| 5 | 12600 | 203 | 4 lung cancer types and normal tissues |
| 3 | 5327 | 72 | Acute myelogenous leukemia (AML), acute lymphoblastic leukemia (ALL) B-cell and ALL T-cell |
| 4 | 2308 | 83 | Small, round blue cell tumors of childhood |
| 4 | 10367 | 50 | 4 malignant glioma types |
| 5 | 5920 | 90 | 5 human brain tumor types |
| 26 | 15009 | 308 | 14 various human tumor types and 12 normal tissue types |
| 2 | 13247 | 76 | Metastatic and primary tumors |
| 2 | 5469 | 77 | Diffuse large B-cell lymphomas and follicular lymphomas |
| 2 | 10509 | 102 | Prostate tumor and normal tissues |
| 9 | 5726 | 60 | 9 various human tumor types |
| 11 | 12533 | 174 | 11 various human tumor types |
| 2 | 7129 | 86 | Lung adenocarcinoma survival |
| 2 | 12600 | 62 | Lung adenocarcinoma 4-year survival |
| 2 | 7070 | 60 | Hepatocellular carcinoma 1-year recurrence-free survival |
| 2 | 7129 | 60 | Medulloblastoma survival |
| 2 | 7399 | 240 | Non-Hodgkin lymphoma survival |
| 2 | 24188 | 97 | Breast cancer 5-year metastasis-free survival |
| 3 | 24188 | 115 | Breast cancer 5-year metastasis-free survival, metastasis within 5 years, germline BRCA1 mutation |
| 2 | 12240 | 233 | Acute lymphocytic leukemia relapse-free survival |
The reference paper for each dataset is provided in Additional File 3.