Bin Gan, Chun-Hou Zheng, Jun Zhang, Hong-Qiang Wang.
Abstract
Accurate tumor classification is crucial to the proper treatment of cancer. To date, sparse representation (SR) has shown strong performance in tumor classification. This paper proposes a new SR-based method for tumor classification using gene expression data. In the proposed method, we first use latent low-rank representation (LatLRR) to extract salient features and remove noise from the original sample data. We then use the sparse representation classifier (SRC) to build the tumor classification model. Experimental results on several real-world data sets show that our method is more efficient and more effective than previous classification methods, including SVM, SRC, and LASSO.
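The SRC step named in the abstract follows a generic decision rule. A test sample is coded as a sparse combination of all training samples. It is then assigned to the class whose own training samples reconstruct it with the smallest residual. The Python sketch below illustrates that rule under stated assumptions. The ℓ1 problem is solved in its Lagrangian (LASSO-style) form with plain ISTA. The helper names `ista_l1` and `src_predict` and the values of `lam` and `n_iter` are hypothetical. This is not the authors' implementation and does not reproduce their settings.

```python
import numpy as np


def ista_l1(A: np.ndarray, y: np.ndarray, lam: float = 0.01, n_iter: int = 500) -> np.ndarray:
    """Approximately solve min_x 0.5*||y - A x||_2^2 + lam*||x||_1 by ISTA."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))  # gradient step on the least-squares term
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft thresholding
    return x


def src_predict(A: np.ndarray, labels: np.ndarray, y: np.ndarray, lam: float = 0.01):
    """Sparse representation classification of one test sample y.

    A: (features x training samples) dictionary; labels: class label of each column.
    """
    A = A / np.linalg.norm(A, axis=0, keepdims=True)  # unit-norm training columns
    y = y / np.linalg.norm(y)
    x = ista_l1(A, y, lam)
    classes = np.unique(labels)
    # Class-wise residual: reconstruct y using only that class's coefficients.
    residuals = [np.linalg.norm(y - A[:, labels == c] @ x[labels == c]) for c in classes]
    return classes[int(np.argmin(residuals))]


# Toy usage: each class lies in its own random 3-D subspace of a 50-D space.
rng = np.random.default_rng(0)
bases = [rng.standard_normal((50, 3)) for _ in range(2)]
A = np.hstack([b @ rng.standard_normal((3, 10)) for b in bases])
labels = np.repeat([0, 1], 10)
y = bases[1] @ rng.standard_normal(3)  # test sample drawn from class 1
print(src_predict(A, labels, y))
```

Under this reading, SRC-LatLRR passes LatLRR-processed samples into the same classification rule. The summary above does not specify how the paper builds the dictionary or which solver it uses.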
Year: 2014 PMID: 24678505 PMCID: PMC3942202 DOI: 10.1155/2014/420856
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Three binary data sets used in the experiments.
| Dataset | Samples (Class 1) | Samples (Class 2) | Genes |
|---|---|---|---|
| Colon cancer | 40 | 22 | 2000 |
| Prostate cancer | 77 | 59 | 12600 |
| DLBCL | 58 | 19 | 5469 |
Classification accuracies (%) by different methods for the three binary data sets.
| Dataset | SVM | LASSO | SRC | SRC-LatLRR |
|---|---|---|---|---|
| Colon cancer | 85.48 | 85.48 | 85.48 |  |
| Prostate cancer | 91.18 | 91.91 |  | 94.12 |
| DLBCL | 96.10 | 96.10 |  |  |
Classification accuracies (%) by different methods with gene selection for the three binary data sets. The number in parentheses after each data set is the number of selected genes.
| Dataset (selected genes) | SVM | LASSO | SRC | SRC-LatLRR |
|---|---|---|---|---|
| Colon cancer (1000) | 87.1 | 87.1 | 87.1 |  |
| Prostate cancer (1500) | 94.85 | 91.18 | 95.59 |  |
| DLBCL (800) | 97.40 | 93.51 | 97.40 |  |
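Every percentage in the two binary tables corresponds to a whole number of correctly classified samples out of the sample totals in the data-set table. For example, 53 of the 62 colon samples is 85.48%. The sketch below assumes that accuracy is the share of all samples in a data set that are classified correctly. This summary does not state the evaluation protocol, so treat that as an assumption. Under it, the code back-computes the implied count of correct predictions. `implied_correct` and `TOTALS` are hypothetical names introduced for illustration.

```python
# Sample totals from the binary data-set table (Class 1 + Class 2 samples).
TOTALS = {"Colon cancer": 40 + 22, "Prostate cancer": 77 + 59, "DLBCL": 58 + 19}


def implied_correct(accuracy_pct: float, n_samples: int) -> tuple[int, float]:
    """Hypothetical helper: back-compute the number of correct predictions.

    Assumes accuracy = correct / n_samples * 100, which the summary does not state.
    Returns the implied count and the percentage it reproduces (2 decimals).
    """
    correct = round(accuracy_pct / 100 * n_samples)
    return correct, round(100 * correct / n_samples, 2)


# Reported SVM accuracies without gene selection, one per binary data set.
for name, acc in [("Colon cancer", 85.48), ("Prostate cancer", 91.18), ("DLBCL", 96.10)]:
    n = TOTALS[name]
    k, reproduced = implied_correct(acc, n)
    print(f"{name}: {k}/{n} correct -> {reproduced}%")
```

The same check shows how coarse the differences are on these small data sets. On the 62-sample colon set, one extra correct sample moves accuracy by about 1.6 percentage points (85.48% to 87.10%).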
Descriptions of the four multiclass data sets used in the classification experiments.
| Dataset | Classes | Samples | Genes |
|---|---|---|---|
| Lung cancer | 5 | 203 | 12600 |
| Leukemia | 3 | 72 | 11225 |
| 11_tumors | 11 | 174 | 12533 |
| 9_tumors | 9 | 60 | 5726 |
Classification accuracies (%) by different methods for the multiclass data sets.
| Dataset | SVM | SRC | SRC-LatLRR |
|---|---|---|---|
| Lung cancer |  | 95.07 | 95.07 |
| Leukemia | 96.60 | 95.83 |  |
| 11_tumors | 94.68 |  |  |
| 9_tumors | 65.10 |  |  |
Classification accuracies (%) by different methods with gene selection for the multiclass data sets. The number in parentheses after each data set is the number of selected genes.
| Dataset (selected genes) | SVM | SRC | SRC-LatLRR |
|---|---|---|---|
| Lung cancer (2000) |  | 95.07 | 95.57 |
| Leukemia (3000) | 96.90 | 95.83 |  |
| 11_tumors (1000) |  | 95.40 | 95.40 |
| 9_tumors (2000) |  | 71.67 | 80.00 |
Figure 1: Classification accuracy and removed noise level as functions of λ on the colon cancer data set.
Figure 2: Classification accuracy and removed noise level as functions of λ on the prostate cancer data set.
Figure 3: Classification accuracy and removed noise level as functions of λ on the DLBCL data set.
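Figures 1 to 3 vary λ, which is presumably the LatLRR weight on the sparse noise term. The Python code below is a hedged sketch of that feature-extraction step. It solves the standard LatLRR problem, min ||Z||_* + ||L||_* + λ||E||_1 subject to X = XZ + LX + E, using the inexact augmented Lagrange multiplier method with auxiliary variables J = Z and S = L. Here X is a genes x samples matrix. The salient features are taken to be LX. The norm of E serves only as a rough stand-in for the "removed noise level", because this summary does not define that quantity. The function names (`latlrr`, `svt`, `soft`) and the defaults for `rho`, `mu`, `mu_max`, `tol`, and `max_iter` are assumptions, not the authors' settings.

```python
import numpy as np


def svt(M: np.ndarray, tau: float) -> np.ndarray:
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def soft(M: np.ndarray, tau: float) -> np.ndarray:
    """Entrywise soft thresholding: proximal operator of tau * l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)


def latlrr(X, lam=0.5, rho=1.1, mu=1e-6, mu_max=1e6, tol=1e-6, max_iter=1000):
    """Inexact ALM for min ||Z||_* + ||L||_* + lam*||E||_1 s.t. X = XZ + LX + E.

    X is (d features x n samples); returns Z (n x n), L (d x d), E (d x n).
    """
    d, n = X.shape
    Z, J, Y2 = np.zeros((n, n)), np.zeros((n, n)), np.zeros((n, n))
    L, S, Y3 = np.zeros((d, d)), np.zeros((d, d)), np.zeros((d, d))
    E, Y1 = np.zeros((d, n)), np.zeros((d, n))
    inv_a = np.linalg.inv(np.eye(n) + X.T @ X)  # constant across iterations
    inv_b = np.linalg.inv(np.eye(d) + X @ X.T)
    for _ in range(max_iter):
        J = svt(Z + Y2 / mu, 1.0 / mu)  # low-rank proxy for Z
        S = svt(L + Y3 / mu, 1.0 / mu)  # low-rank proxy for L
        Z = inv_a @ (X.T @ (X - L @ X - E) + J + (X.T @ Y1 - Y2) / mu)
        L = ((X - X @ Z - E) @ X.T + S + (Y1 @ X.T - Y3) / mu) @ inv_b
        E = soft(X - X @ Z - L @ X + Y1 / mu, lam / mu)  # sparse noise
        r1, r2, r3 = X - X @ Z - L @ X - E, Z - J, L - S
        Y1 += mu * r1  # dual (multiplier) updates
        Y2 += mu * r2
        Y3 += mu * r3
        mu = min(rho * mu, mu_max)
        if max(np.abs(r1).max(), np.abs(r2).max(), np.abs(r3).max()) < tol:
            break
    return Z, L, E


# Toy usage on random data: 30 "genes" x 20 samples.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 20))
Z, L, E = latlrr(X, lam=0.5)
salient = L @ X  # features that would be handed to the classifier
print(salient.shape, np.linalg.norm(E))
```

The d x d matrix L grows with the gene count. For the 12600-gene data sets it alone would hold about 1.6 x 10^8 doubles, roughly 1.27 GB. Running this sketch directly on full gene sets would therefore be memory-heavy.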