Abstract
BACKGROUND: The most fundamental task using gene expression data in clinical oncology is to classify tissue samples according to their gene expression levels. Compared with traditional pattern classifications, gene expression-based data classification is typically characterized by high dimensionality and small sample size, which make the task quite challenging.
Year: 2006 PMID: 16774678 PMCID: PMC1513256 DOI: 10.1186/1471-2105-7-299
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparison of the classifiers in terms of the best results. The comparison of all the classifiers in terms of their best average test error rates (%). For each data set, we chose the N most discriminatory genes, where N = 10, 20, 40, 60, 80, 100, 200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000, respectively; repeated the experiment 100 times at each value of N; and then calculated the average test error rates and their standard deviations over the 100 experiments. For the comparison, a classifier is assigned a score of 1 if it achieves the best result on a data set, 2 if it achieves the next best result, and so on. The average score roughly evaluates the global performance of a classifier on these twelve data sets.
| KNN | ULDA | DLDA | SVM | KerNN | |
| 3.32 (1.21) | 3.08 (1.09) | 2.95 (0.78) | |||
| 6.17 (2.75) | 5.19 (2.95) | 2.83 (2.37) | 3.21 (2.18) | ||
| 19.52 (5.88) | 22.42 (5.58) | 13.35 (7.52) | 15.32 (5.60) | ||
| 7.12 (4.12) | 4.92 (4.40) | 4.64 (4.39) | 4.48 (4.45) | ||
| 13.12 (5.91) | 9.92 (5.16) | 7.92 (5.39) | 8.36 (4.48) | ||
| 14.03 (3.76) | 16.84 (6.14) | 12.65 (4.58) | 11.84 (4.28) | ||
| 1.21 (0.98) | 0.81 (0.73) | 0.47 (0.57) | 0.53 (0.61) | ||
| 2.05 (2.58) | 2.05 (2.09) | 6.23 (2.88) | 1.90 (2.05) | ||
| 0.74 (0.87) | 0.02 (0.13) | 1.58 (0.81) | 0.17 (0.42) | ||
| 7.41 (2.47) | 5.22 (2.99) | 6.73 (3.02) | 4.90 (2.53) | ||
| 2.57 (0.86) | 2.45 (0.92) | 2.60 (1.02) | 2.42 (0.82) | ||
| Average Score | 4.5 | 2.8 | 3.3 | 2.3 | |
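The scoring scheme described in the table caption can be sketched as follows. This is only an illustration under stated assumptions: the function names, the tie handling (tied classifiers share a score), and the toy error rates are all hypothetical and are not taken from the paper or from the table above.

```python
def rank_scores(errors):
    """Score the classifiers on one data set: 1 for the lowest error rate,
    2 for the next lowest, and so on.
    Assumption: classifiers with equal error rates share the same score."""
    distinct = sorted(set(errors.values()))
    return {name: distinct.index(err) + 1 for name, err in errors.items()}


def average_scores(table):
    """Average each classifier's per-data-set score over all data sets."""
    collected = {}
    for errors in table.values():
        for name, score in rank_scores(errors).items():
            collected.setdefault(name, []).append(score)
    return {name: sum(s) / len(s) for name, s in collected.items()}


# Hypothetical error rates (%) for three made-up classifiers on two
# made-up data sets; these numbers are NOT results from the paper.
toy = {
    "set_1": {"A": 3.0, "B": 2.0, "C": 1.0},
    "set_2": {"A": 5.0, "B": 7.0, "C": 4.0},
}

if __name__ == "__main__":
    print(average_scores(toy))
```

A lower average score indicates a classifier that ranks near the top more consistently across data sets, which is how the "Average Score" row is meant to be read.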
Figure 1. Test error rates for the ALL-AML data set. Stability comparison on the ALL-AML data. The average test error rate (%) as a function of the selected gene number.
Figure 2. Test error rates for the Colon data set. Stability comparison on the Colon data. The average test error rate (%) as a function of the selected gene number.
Figure 3. Test error rates for the Prostate data set. Stability comparison on the Prostate data. The average test error rate (%) as a function of the selected gene number.
Figure 4. The effect of the disturbed resampling on Prostate. The effect of adopting the technique of disturbed resampling on a relatively large data set, Prostate, which contains 102 samples. (a) Results on the training data. (b) Results on the test data.
Figure 5. The effect of the disturbed resampling on Breast-ER. The effect of adopting the technique of disturbed resampling on a relatively small data set, Breast-ER, which contains only 49 samples. (a) Results on the training data. (b) Results on the test data.