Weibiao Li1, Bo Liao2, Wen Zhu1, Min Chen1, Zejun Li1, Xiaohui Wei1, Lihong Peng3, Guohua Huang1, Lijun Cai1, HaoWen Chen1.
Abstract
Tumor classification is crucial to the clinical diagnosis and proper treatment of cancers. In recent years, the sparse representation-based classifier (SRC) has been proposed for tumor classification. The employed dictionary plays an important role in sparse representation-based or sparse coding-based classification. However, sparse representation-based tumor classification models have not made full use of the employed dictionary, which limits their performance. Furthermore, the sparse representation model assumes that the coding residual follows a Gaussian or Laplacian distribution, which may not describe the coding residual effectively in practical tumor classification. In the present study, we formulated a novel and effective cancer classification technique, namely, Fisher discrimination regularized robust coding (FDRRC), by combining the Fisher discrimination dictionary learning method with the regularized robust coding (RRC) model, which searches for a maximum a posteriori solution to coding problems by assuming that the coding residual and representation coefficient are independent and identically distributed. The proposed FDRRC model was evaluated on eight tumor microarray datasets and showed superior performance compared with various state-of-the-art tumor classification methods in a variety of classification tasks.
Year: 2018 PMID: 29904059 PMCID: PMC6002553 DOI: 10.1038/s41598-018-27364-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Descriptions of the eight tumor data sets.
| Data set | Classes | Genes | Samples |
|---|---|---|---|
| Acute leukemia data | 2 | 7,129 | 72 |
| Colon cancer data | 2 | 2,000 | 62 |
| Gliomas data | 2 | 12,625 | 50 |
| DLBCL data | 2 | 7,129 | 77 |
| Prostate data | 2 | 12,600 | 136 |
| ALL data | 6 | 12,625 | 248 |
| MLLLeukemia data | 3 | 12,582 | 72 |
| LukemiaGloub data | 3 | 7,129 | 72 |
10-fold CV prediction specificity (%) on eight tumor microarray datasets for various classification methods using the top 400 genes.
| Dataset | SRC | MSRC | MRSRC | FDRRC |
|---|---|---|---|---|
| Colon cancer data | 82.50 | 82.50 | | |
| Acute leukemia data | 95.74 | 97.87 | | |
| Gliomas data | 68.18 | 68.18 | 77.27 | |
| DLBCL data | 89.66 | 91.38 | 86.21 | |
| Prostate data | 84.00 | 88.00 | | |
| ALL data | 99.14 | 98.71 | 98.71 | |
| MLLLeukemia data | | | | |
| LukemiaGloub data | | | | |
Figure 1. Comparison of prediction accuracy on five two-class classification datasets by varying the number of samples per subclass.
Figure 2. Comparison of prediction accuracy on three multi-class classification datasets by varying the number of samples per subclass.
Figure 3. Comparison of accuracy on eight datasets by varying the number of top selected genes.
10-fold CV prediction accuracy (%) on eight tumor microarray datasets for various classification methods using the top 400 genes.
| Dataset | SRC | MSRC | MRSRC | FDRRC |
|---|---|---|---|---|
| Colon cancer data | 77.42 | 80.65 | 82.26 | |
| Acute leukemia data | 94.44 | 95.83 | 95.83 | |
| Gliomas data | 70.00 | 70.00 | 74.00 | |
| DLBCL data | 90.91 | 92.21 | 89.61 | |
| Prostate data | 88.24 | 95.10 | 92.16 | |
| ALL data | 97.18 | 97.58 | | |
| MLLLeukemia data | 97.22 | | | |
| LukemiaGloub data | 94.44 | 95.83 | 97.22 | |
10-fold CV prediction sensitivity (%) on eight tumor microarray datasets for various classification methods using the top 400 genes.
| Dataset | SRC | MSRC | MRSRC | FDRRC |
|---|---|---|---|---|
| Colon cancer data | 68.18 | 68.18 | 77.27 | |
| Acute leukemia data | 92.00 | 92.00 | 88.00 | |
| Gliomas data | 71.43 | 71.43 | 71.43 | |
| DLBCL data | 94.74 | 94.74 | | |
| Prostate data | 92.31 | 94.23 | | |
| ALL data | 80.00 | 86.67 | 86.67 | |
| MLLLeukemia data | 95.83 | | | |
| LukemiaGloub data | 88.89 | 88.89 | 88.89 | |
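The accuracy, sensitivity, and specificity tables report three views of the same cross-validated predictions. As a minimal sketch of how the three figures relate for a two-class dataset, the snippet below computes them from confusion-matrix counts. The function name `binary_metrics` and the counts are hypothetical: the counts were chosen only because they reproduce the SRC entries for the Colon cancer data (62 samples), not because the source reports them, and which class is treated as "positive" is an assumption.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) as percentages."""
    total = tp + fn + tn + fp
    accuracy = 100.0 * (tp + tn) / total      # all correct predictions
    sensitivity = 100.0 * tp / (tp + fn)      # correct among true positives
    specificity = 100.0 * tn / (tn + fp)      # correct among true negatives
    return accuracy, sensitivity, specificity


# Hypothetical counts (assumption): 22 positive and 40 negative samples.
acc, sen, spe = binary_metrics(tp=15, fn=7, tn=33, fp=7)
print(f"accuracy={acc:.2f} sensitivity={sen:.2f} specificity={spe:.2f}")
# prints: accuracy=77.42 sensitivity=68.18 specificity=82.50
```

How the multi-class datasets are reduced to a single sensitivity or specificity value is not stated here, so this sketch covers the two-class case only.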
Update of representation coefficient in the Fisher discrimination dictionary learning model.
| 1. Initialization: |
| 2. while neither convergence nor the maximal iteration number is reached do |
|
|
| where |
|
|
| 3. Return |
Update of dictionary D in the Fisher discrimination dictionary learning model.
| Fix α and update each |
| 1. Let |
| 2. Fix all |
|
|
| After some derivation, we obtain the solution |
| 3. Then fix D and update α as in Table |
The RRC algorithm.
| 1. Set the initial value of iteration count |
| 2. Compute the coding residual: |
|
|
| where |
| 3. Estimate the weight of each gene as follows: |
|
|
| where |
| 4. Weighted regularized sparse representation coefficient: |
|
|
| where |
| 5. Update the sparse representation coefficients: |
| If |
| If |
| where 0 < |
| 6. Reconstruct the test sample by sparse representation coefficient and all metagenes: |
| 7. Go back to Step 4 until condition of convergence |
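The loop structure of the steps above (compute the residual, weight each gene, solve a weighted regularized coding problem, take a damped update, repeat) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the formulas are missing from the steps as shown, so the logistic weight function, the choice of `delta` as the 0.8 quantile of squared residuals, `mu = 8 / delta`, the uniform initialization, and the l2 (ridge) penalty used in place of a sparse penalty are all assumptions, and the names `rrc_l2`, `logistic_weights`, `lam`, and `eta` are hypothetical.

```python
import numpy as np


def logistic_weights(residual, mu, delta):
    """Assumed weight function: near 1 for small squared residuals, near 0 for large ones."""
    z = mu * (residual ** 2 - delta)
    return 1.0 / (1.0 + np.exp(np.clip(z, -50.0, 50.0)))  # clip avoids overflow


def rrc_l2(D, y, lam=1e-3, eta=1.0, max_iter=20, tol=1e-8):
    """Iteratively reweighted coding of y over dictionary D with an l2 penalty (assumption)."""
    n_atoms = D.shape[1]
    alpha = np.full(n_atoms, 1.0 / n_atoms)        # assumed uniform start
    w = np.ones(D.shape[0])
    for _ in range(max_iter):
        residual = y - D @ alpha                   # coding residual per gene
        delta = np.quantile(residual ** 2, 0.8)    # assumed demarcation point
        mu = 8.0 / (delta + 1e-12)                 # assumed decay rate
        w = logistic_weights(residual, mu, delta)  # weight of each gene
        DW = D * w[:, None]                        # rows of D scaled by weights
        # Weighted ridge solution: (D^T W D + lam I)^-1 D^T W y
        alpha_star = np.linalg.solve(D.T @ DW + lam * np.eye(n_atoms), DW.T @ y)
        alpha_next = alpha + eta * (alpha_star - alpha)  # damped update, 0 < eta <= 1
        converged = np.linalg.norm(alpha_next - alpha) < tol
        alpha = alpha_next
        if converged:
            break
    return alpha, w


# Synthetic example: 200 "genes", 5 atoms, 10 grossly corrupted measurements.
rng = np.random.default_rng(0)
D = rng.normal(size=(200, 5))
alpha_true = np.array([1.0, 0.0, 0.0, 0.5, 0.0])
y = D @ alpha_true
bad = np.arange(10)
y[bad] += 20.0
alpha_hat, w = rrc_l2(D, y)
```

In this synthetic setting the corrupted entries receive weights close to zero, so they barely influence the recovered coefficients. That is the motivation for weighting residuals rather than assuming a fixed Gaussian or Laplacian residual model.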
The FDRRC algorithm.
| Testing samples |
| We initialize the atoms of |
| Fix |
| Fix |
| Fix |
| When the algorithm converges, we can classify the test samples as follows: |
| where |
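The final classification formula is not legible in the steps above. As a rough sketch of the general idea in SRC-style methods, the snippet below assigns a test sample to the class whose atoms reconstruct it with the smallest (optionally gene-weighted) residual. The function `classify_by_residual` and its arguments are hypothetical, and the exact FDRRC decision rule may include further terms that are not shown here.

```python
import numpy as np


def classify_by_residual(D, atom_labels, y, alpha, weights=None):
    """Pick the class whose atoms give the smallest weighted reconstruction error."""
    atom_labels = np.asarray(atom_labels)
    if weights is None:
        weights = np.ones(D.shape[0])            # unweighted residual by default
    errors = {}
    for c in np.unique(atom_labels):
        mask = atom_labels == c                  # atoms belonging to class c
        r = y - D[:, mask] @ alpha[mask]         # residual using class-c atoms only
        errors[c] = float(np.sum(weights * r ** 2))
    label = min(errors, key=errors.get)
    return label, errors


# Toy example: one atom per class, three genes.
D_toy = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])
y_toy = np.array([0.1, 0.9, 0.0])
alpha_toy = np.array([0.1, 0.9])
label, errors = classify_by_residual(D_toy, [0, 1], y_toy, alpha_toy)
print(label)  # prints: 1
```

Passing the per-gene weights from a robust coding step as `weights` would down-weight unreliable genes in the decision as well, which is one plausible way to combine the two stages.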