Chun-Hou Zheng, De-Shuang Huang, Xiang-Zhen Kong, Xing-Ming Zhao.
Abstract
We propose a new method for tumor classification from gene expression data that consists of three main steps. First, the original DNA microarray gene expression data are modeled by independent component analysis (ICA). Second, the most discriminant eigenassays extracted by ICA are selected by the sequential floating forward selection (SFFS) technique. Finally, a support vector machine (SVM) is used to classify the modeled data. To validate the proposed method, we applied it to three DNA microarray datasets involving various human normal and tumor tissue samples. The experimental results show that the method is efficient and feasible.
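
The second step, sequential floating forward selection, can be illustrated with a minimal pure-Python sketch. The function name `sffs`, its `score` callback, and the toy criterion below are assumptions made for illustration only; this abstract does not describe the selection criterion or implementation actually used.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple


def sffs(n_features: int, k: int,
         score: Callable[[FrozenSet[int]], float]) -> List[int]:
    """Pick k feature indices by sequential floating forward selection.

    `score` is a hypothetical criterion (higher is better), for example a
    cross-validated accuracy computed on the chosen components.
    """
    if not 1 <= k <= n_features:
        raise ValueError("k must be between 1 and n_features")
    selected: Set[int] = set()
    # Best (score, subset) recorded so far for each subset size.
    best: Dict[int, Tuple[float, FrozenSet[int]]] = {}

    while len(selected) < k:
        # Forward step: add the single feature that helps the most.
        candidates = [f for f in range(n_features) if f not in selected]
        selected.add(max(candidates,
                         key=lambda f: score(frozenset(selected | {f}))))
        current = score(frozenset(selected))
        size = len(selected)
        if size not in best or current > best[size][0]:
            best[size] = (current, frozenset(selected))

        # Floating step: drop features while that beats the best smaller set.
        while len(selected) > 2:
            drop = max(selected,
                       key=lambda f: score(frozenset(selected - {f})))
            reduced = frozenset(selected - {drop})
            reduced_score = score(reduced)
            if reduced_score <= best[len(reduced)][0]:
                break
            selected = set(reduced)
            best[len(reduced)] = (reduced_score, reduced)

    return sorted(best[k][1])


# Toy additive criterion (an assumption): each feature has a fixed weight.
weights = [0.1, 0.9, 0.5, 0.7]
toy_score = lambda subset: sum(weights[i] for i in subset)
print(sffs(n_features=4, k=2, score=toy_score))  # [1, 3]
```

In a pipeline like the one described, the indices would refer to ICA components and `score` would wrap a classifier evaluation, but that wiring is left out here because its details are not given.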
Year: 2008 PMID: 18973863 PMCID: PMC5054104 DOI: 10.1016/S1672-0229(08)60022-4
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Fig. 1. The gene expression data synthesis model. To find a set of independent basis snapshots (eigenassays), the snapshots in X are considered to be a linear combination of statistically independent basis snapshots (the rows in S), where W is the unmixing matrix and A is an unknown mixing matrix. The independent eigenassays are estimated as the output U of the learned ICA.
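
Read as matrix equations, the model in the caption can be written roughly as follows. This is a sketch under the usual ICA assumption that the learned unmixing matrix approximates the inverse of the mixing matrix; the matrix dimensions are not given here and are left unspecified.

```latex
X = A\,S, \qquad U = W\,X = W A\,S \approx S \quad \text{when } W \approx A^{-1}
```

As is generally the case for ICA, the rows of U recover the rows of S only up to reordering and scaling.
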
Overview of the three datasets for classification
| Dataset | No. of training samples (Class 1) | No. of training samples (Class 2) | No. of test samples (Class 1) | No. of test samples (Class 2) | No. of genes |
|---|---|---|---|---|---|
| Colon cancer | 14 | 26 | 8 | 14 | 2,000 |
| Acute leukemia | 11 | 27 | 14 | 20 | 7,129 |
| High-grade glioma | 21 | 14 | 14 | 15 | 12,625 |
Comparison of the classification performances of five methods on three datasets
| No. | Method | Colon cancer: LOOCV performance (%) | Colon cancer: accuracy on training set (%) | Colon cancer: accuracy on test set (%) |
|---|---|---|---|---|
| 1 | SVM | 88.25±3.74 | 95.00±2.35 | 88.18±3.83 |
| 2 | PCA+SVM | 87.25±2.99 | 93.25±2.05 | 89.54±3.74 |
| 3 | PCA+SFFS+SVM | 89.25±3.13 | 91.00±4.28 | 89.54±3.74 |
| 4 | ICA+SVM | 83.50±4.44 | 96.00±3.37 | 89.09±4.39 |
| 5 | ICA+SFFS+SVM | 91.25±2.12 | 93.75±2.42 | 89.54±3.74 |

| No. | Method | Acute leukemia: LOOCV performance (%) | Acute leukemia: accuracy on training set (%) | Acute leukemia: accuracy on test set (%) |
|---|---|---|---|---|
| 1 | SVM | 93.69±2.21 | 100±0.00 | 95.30±3.45 |
| 2 | PCA+SVM | 91.59±2.99 | 100±0.00 | 93.24±5.01 |
| 3 | PCA+SFFS+SVM | 96.58±2.78 | 97.90±2.78 | 93.53±4.55 |
| 4 | ICA+SVM | 90.82±3.79 | 100±0.00 | 95.30±3.45 |
| 5 | ICA+SFFS+SVM | 97.90±2.07 | 99.21±1.27 | 96.77±2.57 |

| No. | Method | High-grade glioma: LOOCV performance (%) | High-grade glioma: accuracy on training set (%) | High-grade glioma: accuracy on test set (%) |
|---|---|---|---|---|
| 1 | SVM | 80.00±6.67 | 99.52±1.51 | 66.55±4.00 |
| 2 | PCA+SVM | 78.09±7.17 | 94.76±3.51 | 67.93±3.65 |
| 3 | PCA+SFFS+SVM | 88.10±5.61 | 97.62±3.37 | 67.24±6.35 |
| 4 | ICA+SVM | 77.64±6.51 | 99.52±1.51 | 71.38±4.31 |
| 5 | ICA+SFFS+SVM | 88.54±4.00 | 97.62±3.37 | 71.73±3.91 |
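
The LOOCV column refers to leave-one-out cross-validation: each training sample is held out once, a classifier is fitted on the rest, and the percentage of correctly predicted held-out samples is reported. A minimal sketch of that loop is shown below. The names `loocv_accuracy` and `nearest_centroid` are hypothetical, the nearest-centroid rule is a simple stand-in rather than an SVM, and the toy data are invented for illustration. How the ± spreads in the tables were obtained is not stated here, so the sketch does not attempt to reproduce them.

```python
from typing import Callable, Hashable, List, Sequence

Sample = Sequence[float]
Classifier = Callable[[List[Sample], List[Hashable], Sample], Hashable]


def nearest_centroid(train_x: List[Sample], train_y: List[Hashable],
                     x: Sample) -> Hashable:
    """Stand-in classifier (not an SVM): label of the closest class mean."""
    best_label, best_dist = None, float("inf")
    for label in set(train_y):
        members = [s for s, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(xs: List[Sample], ys: List[Hashable],
                   classify: Classifier) -> float:
    """Percentage of samples predicted correctly when each is held out once."""
    hits = 0
    for i in range(len(xs)):
        # Train on everything except sample i, then predict sample i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += classify(train_x, train_y, xs[i]) == ys[i]
    return 100.0 * hits / len(xs)


# Invented one-dimensional toy data with two well-separated classes.
toy_x = [[0.0], [0.2], [1.0], [1.2]]
toy_y = [0, 0, 1, 1]
print(loocv_accuracy(toy_x, toy_y, nearest_centroid))  # 100.0
```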