Qingzhong Liu, Andrew H Sung, Zhongxue Chen, Jianzhong Liu, Lei Chen, Mengyu Qiao, Zhaohui Wang, Xudong Huang, Youping Deng.
Abstract
BACKGROUND: Microarray data have a high dimension of variables and a small sample size. Two important issues in microarray data analysis are how to choose genes that provide reliable and accurate prediction of disease status, and how to determine the final gene set that is best for classification. Because genetic markers are associated with one another, the resulting information redundancy can be exploited to potentially reduce classification cost in terms of time and money.
Year: 2011 PMID: 22369383 PMCID: PMC3287491 DOI: 10.1186/1471-2164-12-S5-S1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. The average testing accuracies of different gene selection methods on six benchmark data sets, using four classifiers (NBC, NMSC, SVM, RF). The x-axis and y-axis give the feature dimension and the testing accuracy, respectively.
Table 1. Mean values and standard errors of HS_HR and MS_HR.
| DATA SET | GENE SELECTION METHOD | MEAN(HS_HR) ± STD(HS_HR), % | | | | MEAN(MS_HR) ± STD(MS_HR), % | | | |
|---|---|---|---|---|---|---|---|---|---|
| | | NMSC | SVM | NBC | RF | NMSC | SVM | NBC | RF |
| Leukemia | NBC-MMC | 99.4 ± 1.2 | 98.3 ± 2.3 | 98.4 ± 1.4 | 98.1 ± 1.4 | 93.4 ± 2.8 | 94.3 ± 2.8 | 95.6 ± 2.3 | |
| NMSC-MMC | 99.1 ± 1.3 | 98.4 ± 1.9 | 98.6 ± 1.9 | 97.9 ± 1.2 | 93.3 ± 2.8 | 95.2 ± 2.8 | 95.7 ± 3.4 | ||
| NBC-MSC | 99.4 ± 1.1 | 99.1 ± 1.3 | 98.4 ± 1.7 | 94.6 ± 2.7 | 96.0 ± 2.5 | ||||
| NMSC-MSC | 99.7 ± 0.9 | 98.6 ± 1.7 | 98.7 ± 1.7 | 97.7 ± 1.4 | 94.8 ± 2.5 | 94.6 ± 3.4 | 95.7 ± 3.1 | ||
| GLGS | 99.6 ± 1.0 | 98.9 ± 1.7 | 98.6 ± 1.7 | 98.6 ± 1.7 | 97.8 ± 1.7 | 92.5 ± 3.8 | 95.0 ± 2.5 | ||
| LOOCSFS | 97.1 ± 3.3 | 98.0 ± 1.5 | 97.7 ± 1.9 | 93.9 ± 3.5 | 94.8 ± 3.1 | 94.5 ± 2.7 | |||
| | SVMRFE | 98.0 ± 2.0 | 95.4 ± 3.9 | 97.3 ± 2.1 | 98.0 ± 2.0 | 95.7 ± 2.8 | 92.5 ± 5.2 | 92.5 ± 3.0 | 93.4 ± 1.9 |
| | SFFS-LSBOUND | 97.1 ± 2.5 | 97.4 ± 3.8 | 96.3 ± 4.1 | 97.1 ± 2.8 | 93.8 ± 4.3 | 92.9 ± 3.8 | 90.2 ± 5.8 | 92.6 ± 4.1 |
| | SFS-LSBOUND | 97.1 ± 2.8 | 97.0 ± 3.0 | 96.4 ± 3.6 | 97.3 ± 3.0 | 94.6 ± 3.5 | 93.6 ± 3.8 | 91.2 ± 5.0 | 93.0 ± 5.1 |
| | T-TEST | 94.8 ± 3.5 | 95.4 ± 4.5 | 93.3 ± 6.9 | 96.8 ± 2.9 | 92.2 ± 3.9 | 90.7 ± 4.8 | 90.1 ± 6.5 | 93.5 ± 3.6 |
| Lymphoma | NBC-MMC | 98.1 ± 2.6 | 97.3 ± 2.6 | 96.4 ± 2.8 | 96.2 ± 4.3 | 93.8 ± 2.8 | 91.7 ± 3.9 | 91.6 ± 3.7 | |
| | NMSC-MMC | 99.2 ± 1.2 | 98.8 ± 1.6 | 97.9 ± 2.6 | 96.5 ± 3.7 | 96.9 ± 1.9 | 93.0 ± 2.8 | 93.1 ± 3.3 | 92.3 ± 4.0 |
| NBC-MSC | 99.4 ± 1.1 | 98.4 ± 1.8 | 97.9 ± 2.6 | 96.8 ± 3.3 | 93.1 ± 3.5 | 92.7 ± 3.5 | 92.6 ± 4.1 | ||
| NMSC-MSC | 98.8 ± 1.6 | 97.2 ± 1.9 | |||||||
| | GLGS | 98.6 ± 1.8 | 98.2 ± 1.9 | 97.0 ± 2.6 | 96.9 ± 2.3 | 96.5 ± 2.1 | 92.5 ± 3.8 | 92.3 ± 3.6 | 91.7 ± 2.9 |
| | LOOCSFS | 87.0 ± 7.2 | 93.0 ± 5.3 | 87.3 ± 5.1 | 92.9 ± 4.8 | 85.8 ± 6.8 | 87.8 ± 5.4 | 85.1 ± 4.5 | 88.2 ± 4.3 |
| | SVMRFE | 99.2 ± 1.5 | 96.5 ± 3.9 | 97.2 ± 3.4 | 96.6 ± 3.1 | 96.5 ± 2.0 | 91.8 ± 4.3 | 93.1 ± 4.0 | 93.3 ± 4.0 |
| | SFFS-LSBOUND | 88.7 ± 6.1 | 95.1 ± 3.3 | 84.0 ± 4.9 | 92.2 ± 4.7 | 87.0 ± 5.7 | 88.2 ± 4.9 | 80.6 ± 3.9 | 86.8 ± 4.8 |
| | SFS-LSBOUND | 87.7 ± 6.1 | 96.1 ± 3.5 | 86.1 ± 3.5 | 91.8 ± 4.2 | 86.4 ± 5.6 | 91.1 ± 3.7 | 82.7 ± 3.4 | 86.1 ± 4.8 |
| | T-TEST | 86.0 ± 5.7 | 94.4 ± 3.0 | 86.5 ± 7.0 | 91.7 ± 5.2 | 84.3 ± 5.8 | 87.7 ± 3.3 | 83.9 ± 6.1 | 87.2 ± 4.5 |
| Prostate | NBC-MMC | 96.3 ± 2.4 | 95.8 ± 2.5 | 94.8 ± 2.6 | 94.2 ± 2.8 | 91.6 ± 2.3 | 90.4 ± 2.7 | 92.1 ± 2.2 | |
| | NMSC-MMC | 95.6 ± 2.3 | 95.9 ± 2.5 | 93.7 ± 2.8 | 95.3 ± 2.3 | 92.7 ± 2.3 | 91.4 ± 2.8 | 90.7 ± 3.1 | 91.3 ± 2.3 |
| NBC-MSC | 96.4 ± 2.0 | 96.6 ± 1.9 | 92.5 ± 2.3 | 91.0 ± 2.3 | |||||
| NMSC-MSC | 94.5 ± 2.0 | 95.8 ± 1.8 | 94.5 ± 2.4 | 92.0 ± 1.9 | |||||
| | GLGS | 93.6 ± 3.0 | 96.1 ± 2.2 | 90.4 ± 3.9 | 94.7 ± 2.0 | 91.5 ± 2.7 | 91.7 ± 2.6 | 87.5 ± 3.4 | 90.0 ± 2.5 |
| | LOOCSFS | 88.4 ± 5.2 | 94.9 ± 2.9 | 90.7 ± 5.3 | 95.2 ± 2.6 | 87.0 ± 4.7 | 91.1 ± 3.4 | 88.0 ± 4.5 | 92.3 ± 2.3 |
| SVMRFE | 94.1 ± 3.4 | 92.3 ± 2.7 | 92.8 ± 4.3 | 95.7 ± 2.6 | 92.4 ± 3.3 | 86.7 ± 3.5 | 90.0 ± 4.0 | ||
| | SFFS-LSBOUND | 90.4 ± 3.2 | 93.4 ± 2.8 | 86.2 ± 5.8 | 90.2 ± 3.2 | 88.9 ± 3.1 | 86.0 ± 3.2 | 84.4 ± 5.1 | 86.1 ± 4.0 |
| | SFS-LSBOUND | 89.7 ± 4.9 | 92.7 ± 4.0 | 87.3 ± 5.4 | 92.4 ± 3.5 | 88.3 ± 5.1 | 87.2 ± 5.0 | 85.1 ± 5.4 | 89.0 ± 3.9 |
| | T-TEST | 91.4 ± 4.1 | 92.5 ± 2.1 | 91.7 ± 2.8 | 94.0 ± 3.0 | 89.7 ± 3.7 | 87.1 ± 3.2 | 89.0 ± 4.3 | 91.0 ± 3.1 |
| Colon | NBC-MMC | 88.7 ± 5.5 | 86.5 ± 4.0 | 89.7 ± 4.9 | 84.5 ± 5.2 | 80.9 ± 6.0 | 78.2 ± 4.9 | 82.5 ± 5.5 | |
| NMSC-MMC | 87.4 ± 5.3 | 90.0 ± 4.0 | 84.9 ± 7.1 | 80.8 ± 5.9 | 83.3 ± 5.4 | ||||
| NBC-MSC | 89.4 ± 4.3 | 86.9 ± 4.6 | 90.0 ± 4.0 | 80.3 ± 5.6 | 82.1 ± 4.8 | ||||
| NMSC-MSC | 91.0 ± 5.3 | 87.6 ± 4.7 | 88.1 ± 3.3 | 90.0 ± 4.4 | 80.9 ± 5.5 | 83.9 ± 4.5 | |||
| GLGS | 87.3 ± 6.2 | 87.3 ± 4.6 | 85.2 ± 4.8 | 83.7 ± 6.6 | 81.2 ± 5.5 | 77.6 ± 5.8 | 83.0 ± 4.5 | ||
| | LOOCSFS | 85.0 ± 5.3 | 86.3 ± 3.9 | 81.6 ± 5.8 | 86.8 ± 5.3 | 82.2 ± 4.6 | 79.3 ± 5.2 | 76.7 ± 6.9 | 80.3 ± 5.3 |
| | SVMRFE | 86.0 ± 6.7 | 86.8 ± 4.8 | 82.1 ± 7.4 | 86.3 ± 5.5 | 81.8 ± 7.2 | 80.7 ± 4.7 | 77.7 ± 7.5 | 80.3 ± 6.0 |
| | SFFS-LSBOUND | 85.0 ± 4.8 | 87.1 ± 4.4 | 72.7 ± 7.0 | 82.6 ± 6.0 | 82.4 ± 4.4 | 76.2 ± 6.3 | 69.5 ± 8.3 | 74.6 ± 6.8 |
| | SFS-LSBOUND | 85.3 ± 4.6 | 85.8 ± 5.3 | 76.8 ± 7.1 | 86.0 ± 4.1 | 83.3 ± 4.7 | 77.7 ± 6.4 | 72.5 ± 6.2 | 77.6 ± 4.5 |
| | T-TEST | 77.4 ± 10.4 | 85.5 ± 4.0 | 76.3 ± 8.3 | 81.5 ± 7.2 | 74.9 ± 10.8 | 75.3 ± 5.7 | 72.8 ± 8.2 | 75.1 ± 7.8 |
| CNS | NBC-MMC | 91.8 ± 6.1 | 77.8 ± 5.2 | 86.7 ± 6.0 | 82.4 ± 4.7 | 67.3 ± 4.1 | |||
| | NMSC-MMC | 90.0 ± 6.4 | 92.2 ± 5.7 | 78.0 ± 5.3 | 82.7 ± 5.2 | 82.8 ± 6.8 | 82.1 ± 5.6 | 67.5 ± 5.5 | 73.5 ± 4.9 |
| NBC-MSC | 92.0 ± 4.4 | 81.1 ± 4.1 | 85.5 ± 4.9 | 70.2 ± 3.7 | 75.9 ± 5.3 | ||||
| NMSC-MSC | 92.8 ± 4.0 | 91.6 ± 4.9 | 84.9 ± 4.1 | 85.6 ± 4.3 | 81.4 ± 6.2 | 70.0 ± 4.5 | 74.4 ± 4.2 | ||
| | GLGS | 84.7 ± 3.3 | 91.1 ± 5.4 | 78.8 ± 5.5 | 84.2 ± 5.0 | 82.4 ± 3.6 | 81.3 ± 4.8 | 67.9 ± 4.5 | 75.3 ± 4.3 |
| LOOCSFS | 71.3 ± 9.8 | 85.0 ± 5.9 | 79.1 ± 7.7 | 83.2 ± 4.4 | 69.3 ± 8.0 | 77.6 ± 4.5 | 75.3 ± 5.1 | ||
| | SVMRFE | 83.2 ± 8.9 | 85.1 ± 8.4 | 77.1 ± 6.8 | 83.5 ± 4.3 | 77.0 ± 8.0 | 75.0 ± 8.8 | 65.7 ± 7.2 | 73.3 ± 4.9 |
| | SFFS-LSBOUND | 68.1 ± 6.7 | 71.9 ± 7.1 | 67.6 ± 7.7 | 76.2 ± 4.5 | 65.3 ± 6.3 | 59.4 ± 7.5 | 61.3 ± 6.1 | 66.9 ± 4.8 |
| | SFS-LSBOUND | 67.8 ± 6.2 | 72.4 ± 4.9 | 69.8 ± 8.2 | 76.2 ± 5.0 | 65.7 ± 5.4 | 60.7 ± 5.1 | 63.7 ± 7.2 | 68.4 ± 4.5 |
| | T-TEST | 67.5 ± 8.8 | 77.4 ± 6.4 | 67.0 ± 7.1 | 75.5 ± 5.9 | 63.4 ± 7.6 | 67.3 ± 5.8 | 60.9 ± 6.8 | 67.8 ± 4.9 |
| Breast | NBC-MMC | 82.5 ± 6.0 | 82.9 ± 3.5 | 84.1 ± 3.0 | 84.1 ± 3.6 | 81.3 ± 5.7 | 73.2 ± 3.8 | 78.4 ± 3.4 | 78.4 ± 3.8 |
| NMSC-MMC | 82.0 ± 3.3 | 82.4 ± 4.3 | 83.7 ± 4.7 | 80.4 ± 4.0 | 72.0 ± 3.8 | 78.4 ± 4.3 | 77.0 ± 4.3 | ||
| NBC-MSC | 83.4 ± 5.8 | 79.1 ± 3.0 | |||||||
| NMSC-MSC | 82.8 ± 4.4 | 82.4 ± 3.8 | 84.1 ± 4.0 | 83.9 ± 4.0 | 79.6 ± 4.0 | 73.7 ± 3.9 | 77.7 ± 4.0 | ||
| | GLGS | 80.8 ± 3.7 | 79.3 ± 4.5 | 81.4 ± 4.1 | 83.7 ± 4.6 | 79.2 ± 3.9 | 70.7 ± 4.6 | 77.8 ± 3.7 | 77.0 ± 4.2 |
| | LOOCSFS | 71.7 ± 6.5 | 77.3 ± 5.2 | 78.0 ± 5.8 | 80.3 ± 3.8 | 70.4 ± 6.5 | 69.2 ± 4.7 | 74.7 ± 5.1 | 74.3 ± 4.2 |
| | SVMRFE | 74.3 ± 7.1 | 78.3 ± 5.2 | 77.2 ± 5.3 | 80.4 ± 4.1 | 73.2 ± 6.6 | 72.1 ± 5.8 | 73.9 ± 4.5 | 73.9 ± 3.7 |
| | SFFS-LSBOUND | 76.2 ± 5.2 | 78.9 ± 2.8 | 76.9 ± 7.3 | 81.5 ± 5.3 | 75.0 ± 5.3 | 67.8 ± 3.3 | 75.2 ± 6.8 | 75.6 ± 4.9 |
| | SFS-LSBOUND | 77.5 ± 5.6 | 78.9 ± 4.2 | 79.8 ± 5.2 | 81.3 ± 5.2 | 75.8 ± 5.5 | 68.0 ± 4.7 | 76.9 ± 6.3 | 75.4 ± 5.2 |
| | T-TEST | 71.1 ± 5.3 | 77.6 ± 5.2 | 72.6 ± 6.3 | 76.3 ± 5.7 | 69.3 ± 5.3 | 69.9 ± 3.6 | 70.5 ± 5.8 | 71.1 ± 5.8 |
In each data set, the highest mean value is highlighted in bold
The number of occurrences of the best testing in Table 1
| Gene selection method | # Best testing accumulated with each classifier | | # Best testing among the four classifiers | |
|---|---|---|---|---|
| | HS_HR | MS_HR | HS_HR | MS_HR |
| NBC-MMC | 6 | 1 | 1 | 0 |
| NMSC-MMC | 4 | 1 | 2 | 0 |
| NBC-MSC | 8 | 12 | 2 | 6 |
| NMSC-MSC | 7 | 8 | 2 | 1 |
| GLGS | 1 | 1 | 0 | 0 |
| LOOCSFS | 1 | 2 | 0 | 0 |
| SVMRFE | 0 | 1 | 0 | 0 |
| SFFS-LSBOUND | 0 | 0 | 0 | 0 |
| SFS-LSBOUND | 0 | 0 | 0 | 0 |
| T-TEST | 0 | 0 | 0 | 0 |
| Total | 27 | 26 | 7 | 7 |
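The totals row can be checked directly from the per-method counts. The sketch below is only an illustration: the counts are transcribed from the table, while the names `best_counts`, `column_totals`, and `PROPOSED` are hypothetical and not from the paper. It sums each column and reports how many of the best results fall to the four proposed methods (NBC-MMC, NMSC-MMC, NBC-MSC, NMSC-MSC).

```python
# Counts transcribed from the table above, in column order:
# (accumulated HS_HR, accumulated MS_HR, best-of-four HS_HR, best-of-four MS_HR)
best_counts = {
    "NBC-MMC": (6, 1, 1, 0),
    "NMSC-MMC": (4, 1, 2, 0),
    "NBC-MSC": (8, 12, 2, 6),
    "NMSC-MSC": (7, 8, 2, 1),
    "GLGS": (1, 1, 0, 0),
    "LOOCSFS": (1, 2, 0, 0),
    "SVMRFE": (0, 1, 0, 0),
    "SFFS-LSBOUND": (0, 0, 0, 0),
    "SFS-LSBOUND": (0, 0, 0, 0),
    "T-TEST": (0, 0, 0, 0),
}

# Hypothetical grouping: the four methods the table treats as new.
PROPOSED = ("NBC-MMC", "NMSC-MMC", "NBC-MSC", "NMSC-MSC")


def column_totals(counts):
    """Sum each of the four count columns over the given methods."""
    return tuple(sum(column) for column in zip(*counts.values()))


totals = column_totals(best_counts)
proposed_totals = column_totals({m: best_counts[m] for m in PROPOSED})

print("all methods:     ", totals)           # (27, 26, 7, 7)
print("proposed methods:", proposed_totals)  # (25, 22, 7, 7)
```

Under this reading, the proposed methods account for 25 of 27 and 22 of 26 accumulated best results, and for all 7 best-of-four results in each column.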
P-values from testing superiority of new methods to others
| Method | NBC-MMC | | NMSC-MMC | | NBC-MSC | | NMSC-MSC | |
|---|---|---|---|---|---|---|---|---|
| | HS_HR | MS_HR | HS_HR | MS_HR | HS_HR | MS_HR | HS_HR | MS_HR |
| GLGS | 0.092 | 0.15 | 0.13 | 0.22 | 0.023 | 0.0212 | 0.038 | 0.048 |
| LOOCSFS | <0.0001 | <0.0001 | <0.0001 | 0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
| SVMRFE | <0.0001 | <0.0001 | <0.0001 | 0.0077 | <0.0001 | 0.0001 | <0.0001 | 0.0004 |
| SFFS-LSBOUND | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
| SFS-LSBOUND | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
| T-TEST | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
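To read the p-value table at a glance, one can list the comparisons that do not reach a chosen significance level. The sketch below assumes a threshold of 0.05, which is an illustrative choice and is not stated in the source. Entries reported as `<0.0001` are kept as upper bounds rather than exact values. The names `P_VALUES`, `upper_bound`, and `not_significant` are hypothetical.

```python
# Column order of the table: each proposed method has an HS_HR and an MS_HR column.
COLUMNS = [
    ("NBC-MMC", "HS_HR"), ("NBC-MMC", "MS_HR"),
    ("NMSC-MMC", "HS_HR"), ("NMSC-MMC", "MS_HR"),
    ("NBC-MSC", "HS_HR"), ("NBC-MSC", "MS_HR"),
    ("NMSC-MSC", "HS_HR"), ("NMSC-MSC", "MS_HR"),
]

# P-values transcribed from the table above, kept as strings so that
# censored entries such as "<0.0001" are not mistaken for exact values.
P_VALUES = {
    "GLGS": ["0.092", "0.15", "0.13", "0.22", "0.023", "0.0212", "0.038", "0.048"],
    "LOOCSFS": ["<0.0001", "<0.0001", "<0.0001", "0.0001",
                "<0.0001", "<0.0001", "<0.0001", "<0.0001"],
    "SVMRFE": ["<0.0001", "<0.0001", "<0.0001", "0.0077",
               "<0.0001", "0.0001", "<0.0001", "0.0004"],
    "SFFS-LSBOUND": ["<0.0001"] * 8,
    "SFS-LSBOUND": ["<0.0001"] * 8,
    "T-TEST": ["<0.0001"] * 8,
}


def upper_bound(p_text):
    """Return the reported p-value, or its upper bound for '<x' entries."""
    return float(p_text.lstrip("<"))


def not_significant(p_values, alpha=0.05):
    """List (baseline, proposed method, measure) with p >= alpha.

    alpha=0.05 is an assumed threshold for illustration only.
    """
    return [
        (baseline, *COLUMNS[i])
        for baseline, row in p_values.items()
        for i, p_text in enumerate(row)
        if upper_bound(p_text) >= alpha
    ]


for entry in not_significant(P_VALUES):
    print(entry)
```

With this assumed threshold, the only comparisons that fall short are NBC-MMC and NMSC-MMC against GLGS, on both HS_HR and MS_HR.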
Comparison of LPPO and Random Strategy
| Data | Gene selection method | MEAN(S_LPPO - MS_HR), % | | | |
|---|---|---|---|---|---|
| | | NMSC | SVM | NBC | RF |
| Leukemia | NBC-MMC | 0.8 | -0.1 | 2.3 | 1.4 |
| | NMSC-MMC | 1.0 | 0.9 | 1.8 | 1.6 |
| | NBC-MSC | -0.2 | 0.3 | 1.9 | 1.1 |
| | NMSC-MSC | 1.6 | 0.7 | 2.5 | 1.3 |
| Lymphoma | NBC-MMC | 0.6 | 0.1 | -1.0 | 0.4 |
| | NMSC-MMC | 1.3 | -0.4 | 1.4 | 1.2 |
| | NBC-MSC | 0.4 | 1.2 | 1.5 | 1.4 |
| | NMSC-MSC | 0.9 | 0.1 | 1.6 | 0.6 |
| Prostate | NBC-MMC | 0.2 | 0.1 | 0.0 | 0.5 |
| | NMSC-MMC | 0.9 | 0.4 | 0.9 | 1.1 |
| | NBC-MSC | 0.3 | 0.7 | 0.6 | 1.8 |
| | NMSC-MSC | 0.4 | 0.8 | 0.2 | 1.0 |
| Colon | NBC-MMC | 0.3 | 0.2 | -1.1 | 0.4 |
| | NMSC-MMC | 0.6 | 0.0 | 0.1 | 0.3 |
| | NBC-MSC | -0.2 | -0.5 | -2.6 | -1.3 |
| | NMSC-MSC | 0.9 | 0.3 | -2.2 | -0.5 |
| CNS | NBC-MMC | 2.1 | 1.8 | 2.2 | 3.1 |
| | NMSC-MMC | 0.8 | 1.0 | 0.4 | 1.6 |
| | NBC-MSC | 1.2 | 0.0 | 0.6 | 0.6 |
| | NMSC-MSC | 1.9 | 2.2 | 2.4 | 1.3 |
| Breast | NBC-MMC | 0.2 | 1.3 | 0.5 | 1.5 |
| | NMSC-MMC | 0.6 | 3.2 | -1.2 | 0.9 |
| | NBC-MSC | 0.0 | 1.7 | -1.6 | -0.6 |
| | NMSC-MSC | 1.7 | 1.3 | -1.1 | 1.0 |
| Average | | 0.8 | 0.7 | 0.4 | 0.9 |
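The Average row can be reproduced from the 24 per-row differences. The sketch below is a plain arithmetic check, not the authors' code: the values are transcribed from the table, and the names `DIFFS` and `column_means` are hypothetical. It assumes the Average row is the unweighted mean over all data set and method combinations, rounded to one decimal.

```python
# MEAN(S_LPPO - MS_HR) in percent, transcribed from the table above.
# Each row is (NMSC, SVM, NBC, RF); rows run Leukemia, Lymphoma, Prostate,
# Colon, CNS, Breast, each with NBC-MMC, NMSC-MMC, NBC-MSC, NMSC-MSC.
DIFFS = [
    (0.8, -0.1, 2.3, 1.4), (1.0, 0.9, 1.8, 1.6), (-0.2, 0.3, 1.9, 1.1), (1.6, 0.7, 2.5, 1.3),
    (0.6, 0.1, -1.0, 0.4), (1.3, -0.4, 1.4, 1.2), (0.4, 1.2, 1.5, 1.4), (0.9, 0.1, 1.6, 0.6),
    (0.2, 0.1, 0.0, 0.5), (0.9, 0.4, 0.9, 1.1), (0.3, 0.7, 0.6, 1.8), (0.4, 0.8, 0.2, 1.0),
    (0.3, 0.2, -1.1, 0.4), (0.6, 0.0, 0.1, 0.3), (-0.2, -0.5, -2.6, -1.3), (0.9, 0.3, -2.2, -0.5),
    (2.1, 1.8, 2.2, 3.1), (0.8, 1.0, 0.4, 1.6), (1.2, 0.0, 0.6, 0.6), (1.9, 2.2, 2.4, 1.3),
    (0.2, 1.3, 0.5, 1.5), (0.6, 3.2, -1.2, 0.9), (0.0, 1.7, -1.6, -0.6), (1.7, 1.3, -1.1, 1.0),
]

CLASSIFIERS = ("NMSC", "SVM", "NBC", "RF")


def column_means(rows):
    """Unweighted mean of each classifier column, rounded to one decimal."""
    return [round(sum(column) / len(column), 1) for column in zip(*rows)]


for name, mean in zip(CLASSIFIERS, column_means(DIFFS)):
    print(f"{name}: {mean:+.1f}")
```

A positive mean indicates that, on average over these rows, S_LPPO exceeds MS_HR for that classifier.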
Figure 2. Boxplots of testing accuracies of the LPPO with four gene selection methods using two different classifiers (NBC, NMSC), compared to varSelRF, for six data sets. RF is the final classifier. On all six data sets, varSelRF yields lower accuracies than our proposed feature selection and optimization algorithm with the same RF classifier.
Figure 3. A sketch of the Lagging Prediction Peephole Optimization on the Prostate data set.