Xiaoping Cheng, Hongmin Cai, Yue Zhang, Bo Xu, Weifeng Su.
Abstract
BACKGROUND: Classifying cancers by gene selection is among the most important and challenging procedures in biomedicine. A major challenge is to design an effective method that eliminates irrelevant, redundant, or noisy genes from the classification, while retaining all of the highly discriminative genes.
Year: 2015 PMID: 26159165 PMCID: PMC4498526 DOI: 10.1186/s12859-015-0629-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Experiments on feature weight estimation on Fermat’s Spiral. (a) Each class of 200 samples is labeled by a different color. To test the accuracy of feature weighting by LHDA, artificial noisy features of various dimensions (0 to 1000) were added to the dataset. The first two features completely determine the labels of the synthetic samples, while the other features are redundant noise. Estimated feature weights are plotted for noisy features of dimensions (b) 100; (c) 600; and (d) 1000. These results are consistent with the data setting scheme.
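The data setting described in the Fig. 1 caption can be sketched roughly as follows. This is only an illustrative reconstruction, not the authors' code: the function name `fermat_spiral_dataset`, the assumption of two classes (one per branch of the spiral r = ±a·sqrt(θ)), the angle range, and the use of standard Gaussian noise for the irrelevant features are all assumptions that the record does not state.

```python
import math
import random


def fermat_spiral_dataset(n_per_class=200, n_noise=100, a=1.0,
                          max_theta=4 * math.pi, seed=0):
    """Return (X, y) for a hypothetical two-class Fermat's spiral problem.

    Columns 0 and 1 carry the spiral coordinates that determine the label;
    the remaining n_noise columns are irrelevant Gaussian noise (assumed).
    """
    rng = random.Random(seed)
    X, y = [], []
    # Assumption: class 0 lies on the branch r = +a*sqrt(theta),
    # class 1 on the branch r = -a*sqrt(theta).
    for label, sign in ((0, 1.0), (1, -1.0)):
        for _ in range(n_per_class):
            theta = rng.uniform(0.0, max_theta)
            r = sign * a * math.sqrt(theta)
            informative = [r * math.cos(theta), r * math.sin(theta)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = fermat_spiral_dataset(n_per_class=200, n_noise=100)
    print(len(X), len(X[0]))  # number of samples, number of features
```

Under these assumptions, a feature weighting method that works as the caption describes would be expected to assign large weights to columns 0 and 1 and weights near zero to the remaining columns.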
Fig. 2 Performance of LHDA and various feature selection methods on the Fermat’s Spiral problem with additional irrelevant features (dimensions ranging from 0 to 1000): (a) experimental results of LOOCV; (b) experimental results of 10-fold cross validation
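The two evaluation protocols named in the Fig. 2 caption differ only in how samples are partitioned: leave-one-out cross validation (LOOCV) is k-fold cross validation with k equal to the number of samples. A minimal sketch of the index splitting, assuming a shuffled, unstratified partition (the record does not say how the folds were actually formed), with `k_fold_indices` as a hypothetical helper name:

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation.

    Setting k == n_samples gives leave-one-out cross validation.
    Shuffling without stratification is an assumption of this sketch.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


if __name__ == "__main__":
    n = 400  # e.g. two classes of 200 samples each (assumed)
    print(sum(1 for _ in k_fold_indices(n, n)))   # number of LOOCV rounds
    print(sum(1 for _ in k_fold_indices(n, 10)))  # number of 10-fold rounds
```

Each sample appears in exactly one test fold, so averaging the per-fold accuracy gives one estimate per method and per number of irrelevant features.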
Summary of the speed of five local-based methods
| No. of irrelevant features | LHDA time (s) | LDPP time (s) | LSDA time (s) | LM-NNDA time (s) | I-Relief time (s) |
|---|---|---|---|---|---|
| 100 | 18.82 | 2.09 | 0.05 | 57.86 | 3.23 |
| 200 | 40.24 | 2.63 | 0.07 | 134.36 | 7.72 |
| 300 | 87.96 | 3.00 | 0.11 | 632.91 | 11.83 |
| 400 | 139.13 | 3.31 | 0.25 | 487.89 | 30.71 |
| 500 | 203.28 | 3.60 | 0.24 | 493.70 | 35.03 |
| 600 | 285.81 | 3.58 | 0.35 | 560.68 | 61.91 |
| 700 | 366.59 | 5.45 | 0.34 | 651.41 | 73.77 |
| 800 | 505.73 | 6.05 | 0.32 | 758.55 | 83.78 |
| 900 | 655.34 | 6.58 | 0.39 | 972.84 | 95.71 |
| 1000 | 699.42 | 7.11 | 0.35 | 1201.89 | 108.58 |
Summary of the tested microarray datasets
| Dataset | Gene No. | Sample No. |
|---|---|---|
| Adenocarcinoma | 9868 | 76 |
| Colon | 2000 | 62 |
| SRBCT | 2038 | 83 |
| GCM | 16063 | 280 |
| Leukemia | 7129 | 72 |
| Leukemia1 | 5327 | 72 |
| Leukemia2 | 11225 | 72 |
| Ovarian | 15154 | 253 |
| AML-prognosis | 12625 | 58 |
| Breast | 4869 | 77 |
| CML | 12625 | 28 |
| Gastric | 7129 | 30 |
| Medulloblastoma | 2059 | 23 |
| CNS | 7129 | 34 |
| Prostate1 | 12600 | 102 |
| Prostate2 | 12625 | 88 |
| Prostate3 | 12626 | 33 |
| DLBCL | 7129 | 77 |
| Lung | 12533 | 181 |
| Lymphoma | 2647 | 62 |
Summary of the speed of three embedded methods
| No. of irrelevant features | LHDA time (s) | SVM-RFE time (s) | RF time (s) |
|---|---|---|---|
| 100 | 18.82 | 26.71 | 1.44 |
| 200 | 40.24 | 86.21 | 2.91 |
| 300 | 87.96 | 165.85 | 3.46 |
| 400 | 139.13 | 278.81 | 4.49 |
| 500 | 203.28 | 468.72 | 5.76 |
| 600 | 285.81 | 705.27 | 6.91 |
| 700 | 366.59 | 1010.30 | 8.69 |
| 800 | 505.73 | 1365.20 | 9.73 |
| 900 | 655.34 | 1833.64 | 11.90 |
| 1000 | 699.42 | 2029.50 | 11.41 |