Jianzhong Wang, Shuang Zhou, Yugen Yi, Jun Kong.
Abstract
Feature selection is a key issue in machine learning and related fields. The results of feature selection directly affect a classifier's accuracy and generalization performance. Recently, a statistical feature selection method named effective range based gene selection (ERGS) was proposed. However, ERGS only considers the overlapping area (OA) among the effective ranges of each class for every feature; it fails to handle the case in which one effective range includes another. To overcome this limitation, a novel, efficient statistical feature selection approach called improved feature selection based on effective range (IFSER) is proposed in this paper. In IFSER, an including area (IA) is introduced to characterize the inclusion relation of effective ranges. Moreover, the proportion of each class's samples that fall in the OA and the IA is also taken into consideration for every feature. As a result, IFSER outperforms the original ERGS and some other state-of-the-art algorithms. Experiments on several well-known databases are performed to demonstrate the effectiveness of the proposed method.
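To make the distinction between an overlapping and an included effective range concrete, the following is a minimal sketch and not the authors' implementation. It assumes an effective range of the form mean ± (1 − prior) · γ · standard deviation with γ = 1.732, a form recalled from the ERGS literature and not stated in this record. The helper names (`effective_range`, `overlap_length`, `is_included`, `fraction_inside`) and the toy data are hypothetical, and plain interval intersection stands in for the paper's exact OA and IA definitions.

```python
import numpy as np

# Assumption: scaling constant recalled from the ERGS literature; not given in this record.
GAMMA = 1.732


def effective_range(values, prior, gamma=GAMMA):
    """Assumed effective range: mean -/+ (1 - prior) * gamma * std for one class."""
    values = np.asarray(values, dtype=float)
    half_width = (1.0 - prior) * gamma * values.std()  # population std (ddof=0)
    mu = values.mean()
    return mu - half_width, mu + half_width


def overlap_length(a, b):
    """Length of the intersection of two closed intervals (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def is_included(inner, outer):
    """True if interval `inner` lies entirely within interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def fraction_inside(values, interval):
    """Proportion of samples that fall inside the given interval."""
    values = np.asarray(values, dtype=float)
    return float(np.mean((values >= interval[0]) & (values <= interval[1])))


# Toy one-feature data: class A is widely spread, class B is tightly clustered.
class_a = [0, 2, 4, 6, 8, 10]
class_b = [4, 5, 5, 6]
n_total = len(class_a) + len(class_b)

range_a = effective_range(class_a, len(class_a) / n_total)
range_b = effective_range(class_b, len(class_b) / n_total)

# Region shared by both ranges (valid here because the ranges intersect).
shared = (max(range_a[0], range_b[0]), min(range_a[1], range_b[1]))

print("range A:", range_a)
print("range B:", range_b)
print("overlap length:", overlap_length(range_a, range_b))
print("B included in A:", is_included(range_b, range_a))
print("share of A samples in shared region:", fraction_inside(class_a, shared))
print("share of B samples in shared region:", fraction_inside(class_b, shared))
```

In this toy case the range of class B sits entirely inside the range of class A, so a measure based only on the size of the shared region cannot tell inclusion apart from a partial overlap of the same length. The two classes also place different shares of their samples in that region, which is the kind of information the abstract says IFSER takes into account.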
Year: 2014 PMID: 24688449 PMCID: PMC3932247 DOI: 10.1155/2014/972125
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1: The effective ranges (ER) of gene accession numbers 9241 and 3689 from the Leukemia2 gene database.
Figure 2: Different features with the same size of overlapping area but different sample proportions in the two areas.
Algorithm 1
Classification accuracies (%) of different feature selection methods with C4.5 on Lymphoma database.
| Method | 10 | 30 | 50 | 70 | 90 | 110 | 130 |
|---|---|---|---|---|---|---|---|
| PCC | 84.38 | 82.29 | 80.21 | 78.13 | 79.17 | 79.17 | 80.21 |
| Relief-F | 69.79 | 72.92 | 72.92 | 75.00 | 68.75 | 66.67 | 82.29 |
| IG | 78.13 | 76.04 | 76.04 | 72.92 | 77.08 | 77.08 | 77.08 |
| MRMR | 71.88 | 79.17 | 79.17 | 80.21 | 81.25 | 80.21 | 79.17 |
| ERGS | 86.46 | 85.42 | 82.29 | 81.25 | 83.33 | 83.33 | 84.38 |
| IFSER | 93.75 | 86.46 | 83.33 | 83.33 | 83.33 | 80.21 | 79.17 |
Classification accuracies (%) of different feature selection methods with C4.5 on Leukemia1 database.
| Method | 10 | 30 | 50 | 70 | 90 | 110 | 130 |
|---|---|---|---|---|---|---|---|
| PCC | 88.89 | 88.89 | 88.89 | 87.50 | 87.50 | 87.50 | 87.50 |
| Relief-F | 75.00 | 79.17 | 75.00 | 80.56 | 81.94 | 79.17 | 80.56 |
| IG | 80.56 | 84.72 | 84.72 | 84.72 | 84.72 | 84.72 | 84.72 |
| MRMR | 84.72 | 84.72 | 84.72 | 84.72 | 84.72 | 84.72 | 84.72 |
| ERGS | 88.89 | 88.89 | 88.89 | 88.89 | 88.89 | 88.89 | 88.89 |
| IFSER | 84.72 | 84.72 | 86.11 | 90.28 | 90.28 | 88.89 | 87.50 |
Classification accuracies (%) of different feature selection methods with C4.5 on Leukemia2 database.
| Method | 10 | 30 | 50 | 70 | 90 | 110 | 130 |
|---|---|---|---|---|---|---|---|
| PCC | 80.56 | 83.33 | 87.50 | 87.50 | 87.50 | 86.11 | 86.11 |
| Relief-F | 77.78 | 75.00 | 84.72 | 86.11 | 80.56 | 77.78 | 76.39 |
| IG | 84.72 | 87.50 | 87.50 | 87.50 | 87.50 | 87.50 | 87.50 |
| MRMR | 84.72 | 88.89 | 88.89 | 88.89 | 88.89 | 88.89 | 88.89 |
| ERGS | 86.11 | 84.72 | 88.89 | 88.89 | 88.89 | 87.50 | 87.50 |
| IFSER | 79.17 | 88.89 | 90.28 | 88.89 | 88.89 | 87.50 | 88.89 |
Classification accuracies (%) of different feature selection methods with C4.5 on 9_Tumors database.
| Method | 10 | 30 | 50 | 70 | 90 | 110 | 130 |
|---|---|---|---|---|---|---|---|
| PCC | 28.33 | 28.33 | 26.67 | 25.00 | 28.33 | 26.67 | 28.33 |
| Relief-F | 20.00 | 16.67 | 30.00 | 28.33 | 31.67 | 36.67 | 36.67 |
| IG | 38.33 | 38.33 | 41.67 | 40.00 | 40.00 | 40.00 | 38.33 |
| MRMR | 38.33 | 38.33 | 40.00 | 36.67 | 38.33 | 40.00 | 40.00 |
| ERGS | 28.33 | 28.33 | 23.33 | 25.00 | 23.33 | 21.67 | 26.67 |
| IFSER | 25.00 | 36.67 | 43.33 | 48.33 | 46.67 | 43.33 | 43.33 |
Classification accuracies (%) of different feature selection methods with NN on Lymphoma database.
| Method | 10 | 30 | 50 | 70 | 90 | 110 | 130 |
|---|---|---|---|---|---|---|---|
| PCC | 89.58 | 96.88 | 94.79 | 95.83 | 97.92 | 97.92 | 96.88 |
| Relief-F | 68.75 | 84.38 | 86.46 | 88.54 | 87.50 | 85.42 | 88.54 |
| IG | 88.54 | 95.83 | 94.79 | 94.79 | 95.83 | 96.88 | 96.88 |
| MRMR | 88.54 | 91.67 | 93.75 | 93.75 | 93.75 | 93.75 | 93.75 |
| ERGS | 89.58 | 94.79 | 95.83 | 97.92 | 95.83 | 97.92 | 97.92 |
| IFSER | 94.79 | 94.79 | 96.88 | 96.88 | 97.92 | 97.92 | 97.92 |
Classification accuracies (%) of different feature selection methods with NN on Leukemia1 database.
| Method | 10 | 30 | 50 | 70 | 90 | 110 | 130 |
|---|---|---|---|---|---|---|---|
| PCC | 93.06 | 94.44 | 95.83 | 97.22 | 95.83 | 97.22 | 95.83 |
| Relief-F | 69.44 | 76.31 | 75.00 | 75.00 | 73.61 | 76.39 | 80.56 |
| IG | 93.06 | 94.44 | 91.67 | 93.06 | 93.06 | 94.44 | 93.06 |
| MRMR | 88.89 | 93.06 | 90.28 | 93.06 | 93.06 | 94.44 | 93.06 |
| ERGS | 94.44 | 95.83 | 94.44 | 95.83 | 95.83 | 95.83 | 95.83 |
| IFSER | 81.94 | 91.67 | 93.06 | 91.67 | 97.22 | 94.44 | 95.83 |
Classification accuracies (%) of different feature selection methods with NN on Leukemia2 database.
| Method | 10 | 30 | 50 | 70 | 90 | 110 | 130 |
|---|---|---|---|---|---|---|---|
| PCC | 88.89 | 88.89 | 90.28 | 93.06 | 91.67 | 91.67 | 91.67 |
| Relief-F | 69.44 | 83.33 | 83.33 | 83.33 | 87.50 | 93.06 | 94.44 |
| IG | 83.33 | 83.33 | 94.44 | 94.44 | 94.44 | 94.44 | 94.44 |
| MRMR | 88.89 | 90.28 | 93.06 | 93.06 | 93.06 | 93.06 | 93.06 |
| ERGS | 86.11 | 86.11 | 93.06 | 93.06 | 91.67 | 93.06 | 93.06 |
| IFSER | 84.27 | 91.67 | 93.06 | 91.67 | 88.89 | 90.28 | 94.44 |
Classification accuracies (%) of different feature selection methods with NN on 9_Tumors database.
| Method | 10 | 30 | 50 | 70 | 90 | 110 | 130 |
|---|---|---|---|---|---|---|---|
| PCC | 28.33 | 41.67 | 51.67 | 51.67 | 51.67 | 50.00 | 51.67 |
| Relief-F | 25.00 | 28.33 | 21.67 | 26.67 | 30.00 | 35.00 | 33.33 |
| IG | 48.33 | 51.67 | 60.00 | 58.33 | 60.00 | 61.67 | 58.33 |
| MRMR | 38.33 | 46.67 | 56.67 | 55.00 | 60.00 | 65.00 | 61.67 |
| ERGS | 25.00 | 30.00 | 40.00 | 38.33 | 41.67 | 41.67 | 45.00 |
| IFSER | 35.00 | 36.67 | 38.33 | 46.67 | 46.67 | 45.00 | 46.67 |
Figure 3: The classification accuracies of different algorithms on the Lymphoma database.
Figure 4: The classification accuracies of different algorithms on the Leukemia1 database.
Figure 5: The classification accuracies of different algorithms on the Leukemia2 database.
Figure 6: The classification accuracies of different algorithms on the 9_Tumors database.