Weibiao Li1, Bo Liao1, Wen Zhu1, Min Chen1, Li Peng1, Xiaohui Wei1, Changlong Gu1, Keqin Li2.
Abstract
The classification of tumors is crucial for the proper treatment of cancer. The sparse representation-based classifier (SRC) exhibits good classification performance and has been used successfully to classify tumors from gene expression profile data. In this study, we propose a three-step maxdenominator reweighted sparse representation classification (MRSRC) method to classify tumors. First, we extract a set of metagenes from the training samples. These metagenes capture the structures inherent to the data and are more effective for classification than the original gene expression data. Second, we use a reweighted regularization method to obtain the sparse representation coefficients; reweighting enhances sparsity and yields better coefficients. Third, we classify the data using a maxdenominator residual error function. The maxdenominator strategy reduces the residual error and improves the accuracy of the final classification. Extensive experiments on publicly available gene expression profile data sets show that the performance of MRSRC is comparable with or better than that of many existing representative methods.
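The second and third steps of the abstract can be illustrated with a minimal sketch, under several assumptions that are not taken from the paper. The columns of `D` stand in for metagenes (the extraction step is omitted), the reweighting follows the generic iteratively reweighted l1 scheme with an ISTA solver, and the final decision uses the standard SRC class-wise residual, because this record does not define the maxdenominator residual function. All function names and the parameters `lam`, `eps`, and `n_reweight` are hypothetical.

```python
import numpy as np

def weighted_l1_ista(D, y, weights, lam=0.05, n_iter=500):
    """Minimise 0.5*||y - D x||^2 + lam * sum_i weights[i] * |x_i| with ISTA."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - step * (D.T @ (D @ x - y))        # gradient step on the quadratic term
        thresh = step * lam * weights             # per-coefficient soft threshold
        x = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
    return x

def reweighted_src_predict(D, labels, y, n_reweight=3, eps=1e-3, lam=0.05):
    """Assumed sketch: reweighted l1 coding followed by the plain SRC residual rule."""
    labels = np.asarray(labels)
    D = D / np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm dictionary columns
    weights = np.ones(D.shape[1])
    for _ in range(n_reweight):
        x = weighted_l1_ista(D, y, weights, lam)
        weights = 1.0 / (np.abs(x) + eps)         # small coefficients get larger penalties
    classes = np.unique(labels)
    # Standard SRC residual per class (NOT the paper's maxdenominator function).
    residuals = [np.linalg.norm(y - D[:, labels == c] @ x[labels == c]) for c in classes]
    return classes[int(np.argmin(residuals))], x
```

The reweighting loop penalises coefficients that were small in the previous solve more heavily, which is one common way to push the solution toward greater sparsity than a single l1 solve.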
Year: 2017 PMID: 28393883 PMCID: PMC5385541 DOI: 10.1038/srep46030
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Optimal classification accuracy of MRSRC on the four binary-class datasets.
The descriptions of four data sets for two-class classification.
| Data set | Classes | Genes | Samples |
|---|---|---|---|
| Acute leukemia data | 2 | 7,129 | 72 |
| Colon cancer data | 2 | 2,000 | 62 |
| Gliomas data | 2 | 12,625 | 50 |
| DLBCL data | 2 | 7,129 | 77 |
Figure 2. Comparison of prediction accuracy on four binary classification datasets as the number of samples per subclass varies.
The classification sensitivity of two-class classification when the number of metagenes per subclass is fixed at 10.
| Data set | SRC | MSRC | MRSRC |
|---|---|---|---|
| Acute leukemia data | 93.33% | | |
| Colon cancer data | 80.00% | 85.83% | |
| Gliomas data | 67.22% | | |
| DLBCL data | 93.33% | 93.33% | |
The classification specificity of two-class classification when the number of metagenes per subclass is fixed at 10.
| Data set | SRC | MSRC | MRSRC |
|---|---|---|---|
| Acute leukemia data | 93.24% | 94.59% | |
| Colon cancer data | 77.33% | 84.67% | |
| Gliomas data | 70.00% | 72.50% | |
| DLBCL data | 88.54% | 88.96% | |
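For reference, sensitivity and specificity in a two-class setting are usually computed from the confusion counts as TP / (TP + FN) and TN / (TN + FP). The sketch below assumes this standard definition. Which class the paper treats as positive is not stated in this record, so the `positive` argument and the function name are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a binary problem.

    `positive` marks the label treated as the positive class (an assumption;
    the source does not say which class is positive).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against classes that are absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

For example, with true labels `[1, 1, 1, 0, 0]` and predictions `[1, 1, 0, 0, 1]`, two of the three positives are recovered and one of the two negatives is correctly rejected.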
The descriptions of four data sets for multiclass classification.
| Data set | Classes | Genes | Samples |
|---|---|---|---|
| SRBCT data | 4 | 2,308 | 83 |
| ALL data | 6 | 12,625 | 248 |
| MLLLeukemia data | 3 | 12,582 | 72 |
| LukemiaGloub data | 3 | 7,129 | 72 |
Figure 3. Comparison of prediction accuracy on four multiclass classification datasets as the number of samples per subclass varies.
Figure 4. Comparison of accuracy on four binary classification datasets as the number of top selected genes varies.
Figure 5. Comparison of accuracy on four multiclass classification datasets as the number of top selected genes varies.
10-fold CV prediction accuracy of eight tumor microarray datasets using different classification methods.
| Data set | SRC | MSRC | MRSRC |
|---|---|---|---|
| Acute leukemia data | | | |
| Colon cancer data | 95.83% | 97.22% | |
| Gliomas data | 72.00% | 72.00% | |
| DLBCL data | 96.10% | 92.21% | |
| SRBCT data | 96.39% | | |
| ALL data | 97.98% | 97.58% | |
| MLLLeukemia data | | | |
| LukemiaGloub data | 95.83% | | |
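The 10-fold cross-validation protocol behind these figures can be sketched as follows. This is a generic, unstratified k-fold split with a fixed shuffle; whether the paper stratified folds by class, and how it seeded the split, is not stated here. The names `k_fold_indices`, `cross_val_accuracy`, and `predict_fn` are hypothetical.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]

def cross_val_accuracy(X, y, predict_fn, k=10, seed=0):
    """Pooled accuracy over all folds.

    predict_fn(X_train, y_train, X_test) must return predicted labels for X_test.
    """
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(y), k, seed):
        preds = predict_fn(X[train_idx], y[train_idx], X[test_idx])
        correct += int(np.sum(preds == y[test_idx]))
    return correct / len(y)
```

Each sample is used for testing exactly once, so the pooled accuracy is the fraction of all samples classified correctly when held out.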
10-fold CV prediction sensitivity of eight tumor microarray datasets using different classification methods.
| Data set | SRC | MSRC | MRSRC |
|---|---|---|---|
| Acute leukemia data | 72.73% | | |
| Colon cancer data | 92.00% | 92.00% | |
| Gliomas data | 71.43% | 71.43% | |
| DLBCL data | 94.74% | 94.74% | |
| SRBCT data | 96.55% | | |
| ALL data | 86.67% | 86.67% | |
| MLLLeukemia data | | | |
| LukemiaGloub data | 88.89% | 88.89% | |
10-fold CV prediction specificity of eight tumor microarray datasets using different classification methods.
| Data set | SRC | MSRC | MRSRC |
|---|---|---|---|
| Acute leukemia data | 87.50% | 87.50% | |
| Colon cancer data | 97.87% | | |
| Gliomas data | 72.73% | 72.73% | |
| DLBCL data | 96.55% | 89.66% | |
| SRBCT data | | | |
| ALL data | 98.71% | | |
| MLLLeukemia data | | | |
| LukemiaGloub data | | | |
Figure 6. Values of the sparse representation coefficients of MSRC and MRSRC on four binary classification datasets when one sample is chosen as the test set.
Figure 7. Values of the sparse representation coefficients of MSRC and MRSRC on four multiclass classification datasets when one sample is chosen as the test set.