Kai Zeng, Kun She, Xinzheng Niu.
Abstract
Feature selection plays an important role in machine learning and data mining. In recent years, various feature measures have been proposed to select significant features from high-dimensional datasets. However, most traditional feature selection methods ignore features that have strong classification ability as a group but are weak individually. To address this problem, we redefine the redundancy, interdependence, and independence of features using neighborhood entropy. A neighborhood entropy-based feature contribution is then proposed within the framework of cooperative game theory. The evaluation criterion for a feature is formalized as the product of its contribution and a classical feature measure. Finally, the proposed method is tested on several UCI datasets. The results show that the neighborhood entropy-based cooperative game theory model (NECGT) yields better performance than the classical methods.
Year: 2014 PMID: 25276120 PMCID: PMC4158261 DOI: 10.1155/2014/479289
Source DB: PubMed Journal: Comput Intell Neurosci
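The abstract builds its feature measures on neighborhood entropy but does not restate the definition. A minimal sketch follows, assuming the form commonly used in the neighborhood rough set literature: the average over samples of -log2(|neighborhood| / n), where a neighborhood holds every sample within distance `delta`. The function name, the Euclidean distance, the base-2 logarithm, and the toy data are assumptions for illustration, not details taken from the paper.

```python
import math

def neighborhood_entropy(samples, delta):
    """Average of -log2(|delta-neighborhood of x| / n) over all samples x.

    Assumes Euclidean distance; each sample counts as its own neighbor.
    """
    n = len(samples)
    total = 0.0
    for x in samples:
        # Size of the delta-neighborhood of x in the chosen feature subspace.
        size = sum(1 for y in samples if math.dist(x, y) <= delta)
        total -= math.log2(size / n)
    return total / n

# Hypothetical one-feature data: two tight pairs of samples.
toy = [(0.0,), (0.1,), (1.0,), (1.1,)]
print(neighborhood_entropy(toy, delta=0.2))  # each neighborhood has 2 of 4 samples
print(neighborhood_entropy(toy, delta=2.0))  # one neighborhood covers everything
```

With a very large `delta` every sample falls in a single neighborhood and the entropy drops to zero; with `delta = 0` and distinct samples it reduces to the entropy of a uniform distribution over the samples.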
Algorithm 1: Feature contribution evaluation based on the Banzhaf value.
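The body of Algorithm 1 is not reproduced here. As a rough illustration of the underlying idea, the sketch below computes the textbook Banzhaf value: a player's marginal contribution averaged over all coalitions of the other players. It is not the paper's algorithm; the function names and the toy characteristic function are hypothetical, and full enumeration is exponential in the number of features, so it is only practical for small examples.

```python
from itertools import combinations

def banzhaf_values(players, value):
    """Return {player: Banzhaf value} for a characteristic function `value`.

    `value` maps a frozenset of players to a number.
    """
    players = list(players)
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        # Enumerate every coalition S that does not contain p.
        for size in range(n):
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += value(s | {p}) - value(s)
        result[p] = total / 2 ** (n - 1)
    return result

# Hypothetical game: "a" and "b" are worthless alone but worth 1.0 together;
# "c" is worth 0.5 on its own.
def toy_value(coalition):
    score = 0.0
    if {"a", "b"} <= coalition:
        score += 1.0
    if "c" in coalition:
        score += 0.5
    return score

contrib = banzhaf_values(["a", "b", "c"], toy_value)
print(contrib)
```

In this toy game all three features receive the same value, even though "a" and "b" score zero individually, which is the group effect the abstract says individual feature measures miss.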
Algorithm 2: Feature selection with NECGT.
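The abstract describes the evaluation criterion as the product of a feature's contribution and a classical feature measure. The sketch below shows only that weighting step under simplifying assumptions: it ranks features once by the product, whereas criteria such as mRMR are re-evaluated as features are added. The function name and all numbers are hypothetical.

```python
def rank_by_weighted_score(base_score, contribution):
    """Rank features by base_score * contribution, highest first.

    Ties are broken by feature name so the order is deterministic.
    """
    scores = {f: base_score[f] * contribution[f] for f in base_score}
    return sorted(scores, key=lambda f: (-scores[f], f))

# Hypothetical values: f1 looks best on its own, but contributes little
# to coalitions, so the weighting demotes it.
base_score = {"f1": 0.9, "f2": 0.5, "f3": 0.4}
contribution = {"f1": 0.2, "f2": 0.6, "f3": 0.5}

print(rank_by_weighted_score(base_score, contribution))
```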
Data description.
| ID | Data | Samples | Features | Classes |
|---|---|---|---|---|
| 1 | Glass | 214 | 9 | 7 |
| 2 | Cardiotocography | 2126 | 22 | 3 |
| 3 | Wpbc | 198 | 33 | 2 |
| 4 | Crx | 690 | 15 | 2 |
| 5 | Hepatitis | 155 | 19 | 2 |
| 6 | Wine | 178 | 13 | 3 |
| 7 | Spectf | 267 | 44 | 2 |
| 8 | Lymphography | 148 | 18 | 4 |
| 9 | German | 1000 | 20 | 2 |
Order of feature selection on Glass.
| Method | Order |
|---|---|
| mRMR | 4, 7, 2, 3, 8, 1, 6, 5, 9 |
| NECGT-mRMR | 4, 7, 3, 1, 6, 2, 8, 5, 9 |
Figure 1: The contribution of each feature on Glass.
Figure 2: The results of NECGT-mRMR versus mRMR.
Order of feature selection on different datasets.
| Data | NECGT-mRMR | mRMR | NECGT-RS | RS |
|---|---|---|---|---|
| Lymphography | 13, 5, 17, 8, 2, 15 | 13, 5, 18, 1, 9, 2 | 13, 2, 15, 14, 3, 16 | 13, 2, 15, 14, 10, 1 |
| Crx | 9, 14, 6, 8, 15 | 9, 15, 6, 11, 8 | 9, 6, 12, 1, 14 | 9, 10, 14, 6, 2 |
| Cardiotocography | 6, 22, 11, 2, 1, 4, 5, 7, 19 | 6, 22, 11, 2, 18, 5, 7, 4, 8 | 5, 15, 4, 3, 16, 17, 11, 22, 1 | 6, 8, 13, 7, 1, 2, 3, 4, 5 |
| Spectf | 26, 34, 36, 24, 43, 28, 44, 8, 30, 10, 16, 15, 25, 14 | 26, 34, 10, 4, 40, 14, 7, 28, 6, 43, 30, 15, 32, 42 | 33, 37, 21, 38, 3, 5, 24, 27, 34, 13, 29, 20, 23, 36 | 26, 43, 3, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12 |
| Hepatitis | 12, 18, 17, 6, 11, 5 | 14, 12, 19, 17, 11, 6 | 15, 16, 8, 9, 3, 18 | 15, 16, 1, 2, 3, 4 |
Classification accuracies based on CART (%).
| Data | Raw data | NECGT-mRMR | mRMR | NECGT-RS | RS |
|---|---|---|---|---|---|
| Lymphography | 78.31 | 82.86 | 70.00 | 82.14 | 79.03 |
| Crx | 84.92 | 84.50 | 84.93 | 81.88 | 81.17 |
| Cardiotocography | 85.94 | 85.74 | 85.69 | 75.11 | 83.22 |
| Spectf | 81.29 | 85.37 | 83.50 | 70.39 | 70.09 |
| Hepatitis | 82.33 | 84.17 | 77.33 | 78.50 | 72.33 |
| Average | 82.55 | 84.52 | 80.20 | 77.60 | 77.16 |
Classification accuracies based on LSVM (%).
| Data | Raw data | NECGT-mRMR | mRMR | NECGT-RS | RS |
|---|---|---|---|---|---|
| Lymphography | 79.74 | 76.88 | 74.74 | 76.43 | 76.88 |
| Crx | 85.51 | 85.51 | 85.51 | 85.51 | 85.51 |
| Cardiotocography | 84.60 | 85.80 | 84.87 | 82.60 | 80.56 |
| Spectf | 84.65 | 84.63 | 84.23 | 79.41 | 79.41 |
| Hepatitis | 82.33 | 84.33 | 81.00 | 79.50 | 79.50 |
| Average | 83.36 | 83.43 | 82.07 | 80.69 | 80.37 |
Classification accuracies based on RSVM (%).
| Data | Raw data | NECGT-mRMR | mRMR | NECGT-RS | RS |
|---|---|---|---|---|---|
| Lymphography | 55.52 | 84.03 | 72.14 | 76.17 | 75.71 |
| Crx | 69.14 | 84.50 | 84.21 | 83.34 | 86.37 |
| Cardiotocography | 79.54 | 85.23 | 85.14 | 83.12 | 82.56 |
| Spectf | 83.52 | 85.71 | 85.74 | 79.41 | 79.41 |
| Hepatitis | 87.00 | 82.50 | 81.83 | 78.83 | 77.50 |
| Average | 74.94 | 84.39 | 81.81 | 80.17 | 80.31 |
A comparison of results.
| Model | Win : Loss | Tie |
|---|---|---|
| NECGT-mRMR versus mRMR | 12 : 2 | 1 |
| NECGT-RS versus RS | 8 : 3 | 4 |
| NECGT-SIGFD versus SIGFD | 20 : 5 | 5 |
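The first two rows of this comparison can be re-derived from the CART, LSVM, and RSVM accuracy tables above, reading each "a : b" entry as wins : losses over the 15 dataset and classifier pairs. The sketch below is a plain tally written for this check, not code from the paper; the SIGFD row cannot be verified the same way because its accuracies are not listed here.

```python
# Accuracies (%) copied from the CART, LSVM, and RSVM tables, in dataset order:
# Lymphography, Crx, Cardiotocography, Spectf, Hepatitis.
ACC = {
    "CART": {
        "NECGT-mRMR": [82.86, 84.50, 85.74, 85.37, 84.17],
        "mRMR": [70.00, 84.93, 85.69, 83.50, 77.33],
        "NECGT-RS": [82.14, 81.88, 75.11, 70.39, 78.50],
        "RS": [79.03, 81.17, 83.22, 70.09, 72.33],
    },
    "LSVM": {
        "NECGT-mRMR": [76.88, 85.51, 85.80, 84.63, 84.33],
        "mRMR": [74.74, 85.51, 84.87, 84.23, 81.00],
        "NECGT-RS": [76.43, 85.51, 82.60, 79.41, 79.50],
        "RS": [76.88, 85.51, 80.56, 79.41, 79.50],
    },
    "RSVM": {
        "NECGT-mRMR": [84.03, 84.50, 85.23, 85.71, 82.50],
        "mRMR": [72.14, 84.21, 85.14, 85.74, 81.83],
        "NECGT-RS": [76.17, 83.34, 83.12, 79.41, 78.83],
        "RS": [75.71, 86.37, 82.56, 79.41, 77.50],
    },
}

def tally(ours, baseline):
    """Count (wins, losses, ties) of `ours` against `baseline` over all tables."""
    wins = losses = ties = 0
    for table in ACC.values():
        for a, b in zip(table[ours], table[baseline]):
            if a > b:
                wins += 1
            elif a < b:
                losses += 1
            else:
                ties += 1
    return wins, losses, ties

print(tally("NECGT-mRMR", "mRMR"))  # (12, 2, 1)
print(tally("NECGT-RS", "RS"))      # (8, 3, 4)
```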
Figure 3: The contribution of each feature on Lymphography and Wpbc.
Order of feature selection.
| Data | Model | Order |
|---|---|---|
| Lymphography | NECGT-MI | 13, 14, 2, 15, 3, 16, 5, 4, 8, 6, 10, 11, 17, 12, 7, 18, 1, 9 |
| Lymphography | CoFS-MI | 13, 14, 15, 2, 3, 5, 6, 4, 8, 11, 16, 17, 7, 10, 12, 9, 18, 1 |
| Wpbc | NECGT-MI | 2, 14, 10, 34, 22, 24, 21, 15, 27, 9, 18, 20, 13, 12, 25, 33, 32, 19, 30, 4, 26, 28, 31, 16, 1, 11, 29, 5, 8, 3, 6, 7, 23, 17 |
| Wpbc | CoFS-MI | 2, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34 |
Figure 4: The results of NECGT-MI versus CoFS-MI. (a) The classification algorithm is CART; (b) the classification algorithm is RSVM.
Order of feature selection on different datasets.
| Data | NECGT-MI | CoFS-MI |
|---|---|---|
| Crx | 9, 6, 12, 1, 14 | 9, 7, 6, 1, 12 |
| German | 1, 4, 7, 3, 11 | 1, 3, 4, 7, 6 |
| Glass | 3, 1, 5, 6 | 4, 1, 2, 7 |
| Wine | 9, 5, 3, 4, 8, 10 | 9, 5, 3, 4, 8, 1 |
Classification accuracies (%) on the selected feature space.
| Data | CART: NECGT-MI | CART: CoFS-MI | LSVM: NECGT-MI | LSVM: CoFS-MI | RSVM: NECGT-MI | RSVM: CoFS-MI |
|---|---|---|---|---|---|---|
| Crx | 81.88 | 81.47 | 85.51 | 85.51 | 83.34 | 80.57 |
| German | 72.40 | 71.82 | 70.00 | 70.00 | 70.30 | 70.90 |
| Glass | 71.68 | 71.18 | 52.52 | 50.39 | 53.77 | 58.45 |
| Wine | 86.67 | 86.60 | 87.78 | 83.89 | 91.11 | 86.67 |
| Average | 78.15 | 77.76 | 73.95 | 72.44 | 74.63 | 74.14 |
Running time (seconds) for each feature selection model.
| Data | MI | NECGT-MI | CoFS-MI |
|---|---|---|---|
| Crx | 9.9 | 26.1 + 9.9 | 26.7 + 9.9 |
| German | 47.4 | 102.9 + 47.4 | 104.4 + 47.4 |
| Glass | 0.2 | 1.22 + 0.2 | 1.44 + 0.2 |
| Wine | 0.3 | 2.94 + 0.3 | 3.03 + 0.3 |