Shu-Lin Wang, Liuchao Sun, Jianwen Fang.
Abstract
MOTIVATION: Previous studies have demonstrated that machine-learning-based molecular cancer classification using gene expression profiling (GEP) data is promising for the clinical diagnosis and treatment of cancer. Novel classification methods with high efficiency and prediction accuracy are still needed to deal with the high dimensionality and small sample size of typical GEP data. Recently, the sparse representation (SR) method has been successfully applied to cancer classification. Nevertheless, its efficiency needs to be improved when analyzing large-scale GEP data.
Year: 2014 PMID: 25473795 PMCID: PMC4271561 DOI: 10.1186/1471-2105-15-S15-S2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The analysis flowchart of cancer GEP data using SR-based methods for predicting cancer types.
The summary of the nine cancer datasets.
| Types | Datasets | #Samples | #Genes | #Subclasses |
|---|---|---|---|---|
| Microarray | DLBCL | 77 | 7,129 | 2 |
| Microarray | ALL | 248 | 12,626 | 6 |
| Microarray | GCM | 190 | 16,063 | 14 |
| Microarray | Lung | 203 | 12,601 | 5 |
| Microarray | MLL | 72 | 7,129 | 3 |
| NGS | BRCACancer | 216 | 20,531 | 2 |
| NGS | KIRCCancer | 130 | 20,531 | 2 |
| NGS | LUADCancer | 110 | 20,531 | 2 |
| NGS | THCACancer | 112 | 20,531 | 2 |
Figure 2. The prediction accuracy on the four data sets varying with different value.
Figure 3. The prediction accuracy of the four methods varying with different numbers of meta-samples on the four datasets.
The classification accuracy obtained by five SR-based methods on the nine cancer datasets.
| Types | Datasets | SRC | MSRC | MRSRC | MRRCC1 | MRRCC2 |
|---|---|---|---|---|---|---|
| Microarray | DLBCL | 94.75 | 96.10 | 94.81 | 94.81 | |
| Microarray | ALL | 97.70 | 97.18 | 97.81 | 96.77 | |
| Microarray | GCM | 82.93 | 82.32 | 78.79 | 79.80 | 78.79 |
| Microarray | Lung | 94.53 | 95.57 | 96.55 | | |
| Microarray | MLL | 96.31 | 95.83 | 98.61 | 97.22 | |
| NGS | BRCACancer | 96.76 | 95.83 | 99.07 | | |
| NGS | KIRCCancer | 95.92 | 95.38 | 96.92 | | |
| NGS | LUADCancer | 94.91 | 99.09 | 99.09 | | |
| NGS | THCACancer | 93.30 | 87.50 | 95.54 | 92.86 | |
Figure 4. The performance of seven methods varying with the number of genes on the four microarray GEP datasets.
Figure 5. The performance of seven methods varying with the number of genes on the four NGS GEP datasets.