Nicola Ancona, Rosalia Maglietta, Annarita D'Addabbo, Sabino Liuni, Graziano Pesole.
Abstract
BACKGROUND: The advent of DNA microarray technology constitutes an epochal change in the classification and discovery of different types of cancer, because the information provided by DNA microarrays allows cancer analysis to be approached from a quantitative rather than qualitative point of view. Cancer classification requires well-founded mathematical methods that can predict the status of new specimens with high significance levels starting from a limited number of data. In this paper we assess the performance of Regularized Least Squares (RLS) classifiers, originally proposed in regularization theory, by comparing them with Support Vector Machines (SVM), the state-of-the-art supervised learning technique for cancer classification by DNA microarray data. The performance of both approaches has also been investigated with respect to the number of selected genes and different gene selection strategies.
Year: 2005 PMID: 16351746 PMCID: PMC1866388 DOI: 10.1186/1471-2105-6-S4-S2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Minimum LOO error on the Leukemia training set (composed of 38 examples), error on the test set (34 examples) and minimum LOO error on the whole Leukemia data set (72 examples).
| | SVM | RLS |
| LOO error on training set | 1 | 1 |
| Test error | 1 | 1 |
| LOO error on the whole data set | 1 | 1 |
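The leave-one-out (LOO) error counts reported for the RLS classifier can be illustrated with a minimal sketch. It assumes the standard Tikhonov form of RLS, in which the coefficients solve (K + λnI)c = y, together with a linear kernel and no bias term; the formulation actually used in the paper is not shown in this section, so the kernel, the toy data, and every function name below are hypothetical. The code is pure Python to stay self-contained; a numerical library would normally handle the linear solve.

```python
def linear_kernel(u, v):
    """Dot product of two expression profiles (assumed kernel choice)."""
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def rls_fit(X, y, lam):
    """Return coefficients c solving (K + lam * n * I) c = y."""
    n = len(X)
    K = [[linear_kernel(X[i], X[j]) + (lam * n if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, [float(v) for v in y])


def rls_predict(X_train, c, x):
    """Classify x by the sign of f(x) = sum_i c_i k(x_i, x)."""
    score = sum(ci * linear_kernel(xi, x) for ci, xi in zip(c, X_train))
    return 1 if score >= 0 else -1


def loo_errors(X, y, lam):
    """Count misclassified specimens when each one is held out in turn."""
    errors = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        c = rls_fit(X_tr, y_tr, lam)
        if rls_predict(X_tr, c, X[i]) != y[i]:
            errors += 1
    return errors


# Toy data (hypothetical): 6 specimens, 2 genes, labels +1 / -1.
X = [[2.0, 0.1], [1.5, -0.2], [1.8, 0.3],
     [-2.0, 0.2], [-1.7, -0.1], [-1.9, 0.0]]
y = [1, 1, 1, -1, -1, -1]

# "Minimum LOO error": the smallest count over a grid of regularization values.
min_loo = min(loo_errors(X, y, lam) for lam in (0.01, 0.1, 1.0))
print(min_loo)
```

Retraining once per held-out specimen keeps the sketch easy to follow; RLS also admits a closed-form LOO computation, which is omitted here.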
Figure 1: Observed T(j) distribution computed on the Leukemia data set, compared to randomly permuted class distinctions. The number of genes highly expressed in a) ALL and b) AML is shown on the y-axis.
Minimum LOO error computed on the Leukemia data set (composed of 72 examples), for various numbers of genes, selected with the S2N and NRFE statistics.
| genes | SVM (S2N) | SVM (NRFE) | RLS (S2N) | RLS (NRFE) |
| 1000 | 1 | 1 | 2 | 1 |
| 100 | 1 | 0 | 1 | 0 |
| 50 | 1 | 0 | 2 | 0 |
| 40 | 2 | 0 | 2 | 0 |
| 30 | 2 | 0 | 2 | 0 |
| 20 | 2 | 0 | 2 | 0 |
| 10 | 2 | 1 | 2 | 0 |
| 5 | 1 | 1 | 2 | 2 |
| 3 | 4 | 3 | 4 | 2 |
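The S2N (signal-to-noise) gene ranking referred to in these tables is not defined in this section. The sketch below assumes the commonly used definition, (mean in class +1 minus mean in class -1) divided by the sum of the two class standard deviations, and ranks genes by absolute score. Both the definition and the ranking rule are assumptions for illustration, and the data and function names are hypothetical.

```python
from statistics import mean, stdev


def s2n_scores(X, y):
    """Signal-to-noise score per gene.

    X: list of specimens, each a list of gene expression values.
    y: class labels, +1 or -1.
    Assumes every gene has nonzero spread in at least one class.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == -1]
        scores.append((mean(pos) - mean(neg)) / (stdev(pos) + stdev(neg)))
    return scores


def top_genes(X, y, k):
    """Indices of the k genes with the largest absolute S2N score."""
    scores = s2n_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)
    return ranked[:k]


# Toy data (hypothetical): 6 specimens, 3 genes.
# Gene 0 is high in class +1, gene 1 is uninformative, gene 2 is high in class -1.
X = [[5.0, 1.0, 2.0], [5.2, 1.3, 1.6], [4.8, 0.8, 2.3],
     [1.0, 1.1, 3.0], [1.2, 0.9, 3.4], [0.9, 1.2, 2.7]]
y = [1, 1, 1, -1, -1, -1]

scores = s2n_scores(X, y)
selected = top_genes(X, y, 2)
print(selected)  # → [0, 2]
```

A positive score marks a gene more highly expressed in class +1 and a negative score one more highly expressed in class -1, so ranking by absolute value keeps discriminative genes from both classes.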
Minimum LOO error computed on the Colon data set (composed of 62 examples), for various numbers of genes, selected with the S2N and NRFE statistics.
| genes | SVM (S2N) | SVM (NRFE) | RLS (S2N) | RLS (NRFE) |
| 2000 | 6 | 6 | ||
| 500 | 6 | 6 | 7 | 6 |
| 400 | 7 | 6 | 6 | 6 |
| 300 | 6 | 6 | 6 | 6 |
| 200 | 7 | 6 | 7 | 4 |
| 100 | 9 | 5 | 7 | 4 |
| 50 | 8 | 4 | 6 | 1 |
| 10 | 6 | 6 | 7 | 5 |
| 5 | 7 | 8 | 7 | 8 |
Figure 3: LOO error (dotted line) and empirical risk (solid line) with respect to the regularization parameter, obtained on the Colon data set by using a) SVM and b) RLS classifiers.
Minimum LOO error computed on the Multi-cancer data set (composed of 280 examples), for various numbers of genes, selected with the S2N and NRFE statistics.
| genes | SVM (S2N) | SVM (NRFE) | RLS (S2N) | RLS (NRFE) |
| 16063 | 105 | 90 | ||
| 1400 | 46 | 40 | 59 | 49 |
| 1000 | 42 | 41 | 57 | 50 |
| 500 | 50 | 41 | 57 | 56 |
| 300 | 51 | 38 | 57 | 54 |
| 200 | 51 | 50 | 55 | 50 |
| 100 | 63 | 97 | 51 | 58 |
| 50 | 59 | 76 | 43 | 61 |
| 10 | 63 | 74 | 59 | 73 |