Shanshan Xie1, Yan Zhang2, Danjv Lv1, Xu Chen1, Jing Lu1, Jiang Liu3.
Abstract
Feature selection plays a significant role in the success of pattern recognition and data mining. Building on the maximal relevance and minimal redundancy (mRMR) method, this paper proposes an improved maximal relevance and minimal redundancy (ImRMR) feature selection method that operates on feature subsets. In ImRMR, the Pearson correlation coefficient and mutual information are first used to measure the relevance of each individual feature to the sample category, and a factor is introduced to adjust the weights of the two criteria. An equal grouping method then generates candidate feature subsets from the ranked features. Next, the relevance and redundancy of the candidate feature subsets are calculated, and an ordered sequence of these subsets is obtained by incremental search. Finally, the optimal feature subset is obtained from these subsets by combining the sequence forward search method with a classification learning algorithm. Experiments are conducted on seven datasets. The results show that ImRMR effectively removes irrelevant and redundant features, which not only reduces the dimension of the sample features and the time for model training and prediction, but also improves classification performance.
Keywords: Feature selection; Feature subset; ImRMR; Sequence forward search; mRMR
Year: 2022 PMID: 36060093 PMCID: PMC9424812 DOI: 10.1007/s11227-022-04763-2
Source DB: PubMed Journal: J Supercomput ISSN: 0920-8542 Impact factor: 2.557
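The first step in the abstract, scoring each feature by a weighted mix of Pearson correlation and mutual information, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the combination `alpha * |r| + (1 - alpha) * MI`, the equal-width binning used to estimate mutual information, the treatment of class labels as integers for the correlation, and all function names are assumptions introduced here.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def discretize(xs, bins=4):
    """Equal-width binning (an assumption; the source does not specify one)."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0] * len(xs)
    width = (hi - lo) / bins
    return [min(int((x - lo) / width), bins - 1) for x in xs]


def mutual_information(a, b):
    """Mutual information in bits between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log2(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def relevance(feature, labels, alpha=0.5, bins=4):
    """Weighted relevance; alpha is the hypothetical weighting factor."""
    r = abs(pearson(feature, labels))
    mi = mutual_information(discretize(feature, bins), labels)
    return alpha * r + (1 - alpha) * mi


def rank_features(columns, labels, alpha=0.5):
    """Return feature indices sorted from most to least relevant."""
    scores = [relevance(col, labels, alpha) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    informative = [0.1, 0.2, 0.9, 1.0]
    noisy = [0.5, 0.4, 0.5, 0.4]
    print(rank_features([noisy, informative], y))
```

Mutual information in bits can exceed 1 when there are more than two classes, so the two terms are not on the same scale; a practical version would probably normalize them before weighting.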
Fig. 1 Flow chart of feature selection with the ImRMR method
Fig. 2 Generating feature subsets with the equal grouping method
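The subset stage described in the abstract (equal grouping of the ranked features, incremental ordering of the groups by relevance and redundancy, then a forward search guided by a classifier) might look like the sketch below. The details are assumptions, since the excerpt does not give them: groups are taken as consecutive slices of the ranking, the ordering uses the difference form of the mRMR criterion (relevance minus mean redundancy), and the forward search keeps a group only if it improves the score. The names `equal_groups`, `order_subsets`, `forward_search` and the `evaluate` callback are hypothetical.

```python
def equal_groups(ranked, group_size):
    """Split a ranked feature list into consecutive groups of equal size.

    The last group may be shorter when the length is not divisible
    (an assumption about how the remainder is handled).
    """
    if group_size < 1:
        raise ValueError("group_size must be positive")
    return [ranked[i:i + group_size] for i in range(0, len(ranked), group_size)]


def order_subsets(rel, red):
    """Greedy incremental ordering of candidate subsets.

    rel[i]    : precomputed relevance of subset i to the class.
    red[i][j] : precomputed redundancy between subsets i and j.
    Each step picks the subset with the largest relevance minus its mean
    redundancy with the subsets already ordered.
    """
    remaining = list(range(len(rel)))
    order = []
    while remaining:
        def score(i):
            if not order:
                return rel[i]
            return rel[i] - sum(red[i][j] for j in order) / len(order)
        best = max(remaining, key=score)  # ties go to the lowest index
        order.append(best)
        remaining.remove(best)
    return order


def forward_search(ordered_groups, evaluate):
    """Add groups in order, keeping each one only if the score improves.

    evaluate(features) stands in for training a classifier on the given
    feature indices and returning a validation score.
    """
    selected, best = [], float("-inf")
    for group in ordered_groups:
        trial = selected + group
        score = evaluate(trial)
        if score > best:
            selected, best = trial, score
    return selected, best


if __name__ == "__main__":
    groups = equal_groups([4, 0, 2, 1, 3], 2)
    order = order_subsets([0.9, 0.8, 0.1],
                          [[0.0, 0.9, 0.0], [0.9, 0.0, 0.0], [0.0, 0.0, 0.0]])
    ordered = [groups[i] for i in order]
    # Toy evaluator: rewards features 0 and 4, penalizes subset size.
    print(forward_search(ordered, lambda f: len(set(f) & {0, 4}) - 0.1 * len(f)))
```

Passing the classifier in as a callback keeps the search logic independent of any particular learning library.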
The information of datasets
| Dataset | Training set | Test set | Number of features | Number of categories |
|---|---|---|---|---|
| Musk | 334 | 142 | 166 | 2 |
| Urban | 478 | 197 | 147 | 9 |
| Ionosphere | 246 | 105 | 34 | 2 |
| Glass | 150 | 64 | 9 | 6 |
| Movement | 252 | 108 | 90 | 15 |
| PU | 1800 | 900 | 103 | 9 |
| Crane | 243 | 100 | 75 | 7 |
Experimental results of ImRMR-EGM and ImRMR-RS
| Dataset | Performance (%) | Raw feature | ImRMR-EGM | ImRMR-RS |
|---|---|---|---|---|
| Musk | Acc | 85.21 | 94.72 | |
| | Dr | 0.00 | 90.96 | |
| | Z | 42.61 | 92.88 | |
| | Recall | 92.21 | 94.81 | |
| | F1 | 87.12 | 95.17 | |
| | Precision | 82.56 | 94.60 | |
| Urban | Acc | 82.23 | 87.02 | |
| | Dr | 0.00 | 88.41 | |
| | Z | 41.12 | 87.72 | |
| | Recall | 82.40 | 87.43 | |
| | F1 | 81.81 | 87.02 | |
| | Precision | 84.18 | 87.99 | |
| Ionosphere | Acc | 91.43 | 95.24 | |
| | Dr | 0.00 | 80.59 | |
| | Z | 45.71 | 88.14 | |
| | Recall | 81.82 | 87.88 | |
| | F1 | 85.71 | 92.06 | |
| | Precision | 90.00 | 95.80 | |
| Glass | Acc | 65.63 | 82.81 | |
| | Dr | 0.00 | 46.30 | |
| | Z | 32.81 | 64.55 | |
| | Recall | 64.62 | 76.06 | |
| | F1 | 63.36 | 77.18 | |
| | Precision | 66.02 | 84.24 | |
| Movement | Acc | 81.48 | 85.19 | |
| | Dr | 0.00 | 83.52 | |
| | Z | 40.74 | 84.83 | |
| | Recall | 81.94 | 85.96 | |
| | F1 | 81.08 | 84.75 | |
| | Precision | 83.03 | 87.08 | |
| PU | Acc | 93.22 | 94.94 | |
| | Dr | 0.00 | 85.28 | |
| | Z | 46.61 | 90.11 | |
| | Recall | 93.22 | 94.94 | |
| | F1 | 93.13 | 94.89 | |
| | Precision | 93.33 | 94.98 | |
| Crane | Acc | 90.00 | 92.47 | |
| | Dr | 0.00 | 81.16 | |
| | Z | 45.00 | 86.81 | |
| | Recall | 90.32 | 92.70 | |
| | F1 | 90.24 | 92.72 | |
| | Precision | 91.59 | 93.35 | |
Best performance in bold
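The Dr and Z rows, and the row listed between Recall and Precision, are not defined in this excerpt. The tabulated numbers are consistent with the following readings, which are inferences from the data rather than definitions taken from the paper: Dr is the percentage of features removed, Z is the mean of Acc and Dr, and the remaining row is the F1 score. The harmonic-mean identity reproduces the binary datasets (Musk, Ionosphere) exactly, while the multi-class rows look macro-averaged. The function names are hypothetical, and the count of 15 retained features is chosen only because it reproduces the 90.96 figure for Musk.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def reduction_rate(n_selected, n_total):
    """Percentage of features removed (assumed meaning of Dr)."""
    return 100.0 * (1 - n_selected / n_total)


def combined_score(acc, dr):
    """Mean of accuracy and reduction rate (assumed meaning of Z)."""
    return (acc + dr) / 2


if __name__ == "__main__":
    # Musk, raw features: Precision 82.56, Recall 92.21, Acc 85.21, Dr 0.00
    print(round(f1_score(82.56, 92.21), 2))      # table lists 87.12
    print(combined_score(85.21, 0.0))            # table lists 42.61
    print(round(reduction_rate(15, 166), 2))     # table lists 90.96
```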
Fig. 3 Performance of ImRMR-EGM and ImRMR-RS
Experimental results of ImRMR and five other methods
| Dataset | Performance (%) | Raw feature | ImRMR | mRMR | IG | SU | GR | RfF |
|---|---|---|---|---|---|---|---|---|
| Musk | Acc | 85.21 | 83.80 | 87.32 | 86.62 | 91.55 | | |
| | Dr | 0.00 | 90.96 | 92.17 | 90.36 | 92.17 | 89.16 | |
| | Z | 42.61 | 93.02 | 89.19 | 88.84 | 89.39 | 90.35 | |
| | Recall | 92.21 | 94.81 | 96.10 | 88.31 | 93.51 | 92.21 | |
| | F1 | 87.12 | 95.42 | 85.53 | 88.89 | 88.20 | 92.59 | |
| | Precision | 82.56 | 94.87 | 82.93 | 84.71 | 84.52 | 88.24 | |
| Urban | Acc | 82.23 | 82.74 | 84.26 | 85.28 | 84.26 | 82.74 | |
| | Dr | 0.00 | 90.48 | 90.48 | 90.47 | 86.39 | 89.80 | |
| | Z | 41.12 | 86.61 | 87.71 | 87.88 | 85.33 | 86.27 | |
| | Recall | 82.40 | 80.62 | 84.37 | 84.70 | 85.98 | 83.08 | |
| | F1 | 81.81 | 82.08 | 83.64 | 85.14 | 85.38 | 82.66 | |
| | Precision | 84.18 | 86.48 | 84.61 | 86.99 | 85.96 | 84.22 | |
| Ionosphere | Acc | 91.43 | 94.29 | 93.33 | 92.38 | 94.29 | 94.29 | |
| | Dr | 0.00 | 82.35 | 79.41 | 79.41 | 70.59 | | |
| | Z | 45.71 | 88.80 | 86.37 | 88.84 | 86.85 | 82.44 | |
| | Recall | 81.82 | 84.85 | 84.85 | 81.82 | | | |
| | F1 | 85.71 | 90.32 | 88.89 | 87.10 | 90.63 | 90.63 | |
| | Precision | 90.00 | 96.55 | 93.33 | 93.10 | 93.55 | 93.55 | |
| Glass | Acc | 65.63 | 82.81 | 70.31 | 78.13 | 68.75 | 71.88 | |
| | Dr | 0.00 | 44.44 | 33.33 | 33.33 | 22.22 | 33.33 | |
| | Z | 32.81 | 63.63 | 51.82 | 55.73 | 45.49 | 52.60 | |
| | Recall | 64.62 | 75.99 | 70.80 | 73.88 | 63.96 | 69.51 | |
| | F1 | 63.36 | 76.50 | 68.95 | 71.71 | 62.13 | 71.28 | |
| | Precision | 66.02 | 81.23 | 70.47 | 72.28 | 72.44 | 80.02 | |
| Movement | Acc | 81.48 | 81.48 | 82.41 | 83.33 | 84.26 | 82.41 | |
| | Dr | 0.00 | 75.56 | 74.44 | 77.78 | 78.89 | 80.00 | |
| | Z | 40.74 | 78.52 | 78.43 | 80.56 | 81.57 | 81.20 | |
| | Recall | 81.94 | 81.71 | 82.61 | 83.96 | 84.56 | 83.10 | |
| | F1 | 81.08 | 81.00 | 81.44 | 82.75 | 83.91 | 81.78 | |
| | Precision | 83.03 | 83.42 | 84.17 | 83.90 | 85.45 | 84.00 | |
| PU | Acc | 93.22 | 93.67 | 94.78 | 94.44 | 94.67 | 93.89 | |
| | Dr | 0.00 | 51.46 | 81.55 | 80.58 | 81.55 | 77.67 | |
| | Z | 46.61 | 72.56 | 88.17 | 87.51 | 88.11 | 85.78 | |
| | Recall | 93.22 | 93.67 | 94.78 | 94.44 | 94.67 | 93.89 | |
| | F1 | 93.13 | 93.59 | 94.71 | 94.37 | 94.62 | 93.84 | |
| | Precision | 93.33 | 93.76 | 94.83 | 94.51 | 94.68 | 94.03 | |
| Crane | Acc | 90.00 | 90.00 | 86.00 | 85.00 | 92.00 | 90.00 | |
| | Dr | 0.00 | 84.00 | 80.00 | 80.00 | 80.00 | 80.00 | |
| | Z | 45.00 | 87.67 | 83.00 | 82.50 | 86.00 | 85.00 | |
| | Recall | 90.32 | 89.73 | 86.72 | 86.00 | 92.06 | 91.80 | |
| | F1 | 90.24 | 90.03 | 86.63 | 85.44 | 92.43 | 90.90 | |
| | Precision | 91.59 | 91.03 | 87.70 | 86.17 | 93.38 | 90.36 | |
Best performance in bold
Fig. 4 Performance of ImRMR with other methods
Comparative analysis with other methods
| Study | Year | Method | Dataset | Acc (%) | Dr (%) | Z (%) |
|---|---|---|---|---|---|---|
| Mafarja et al. [ | 2018 | Whale optimization approaches for wrapper feature selection with KNN classifier | Ionosphere | 92.56 | 57.60 | 75.08 |
| Mafarja et al. [ | 2019 | Binary grasshopper optimization algorithm with mutation with KNN classifier | Ionosphere | 79.41 | 88 | |
| Du et al. [ | 2020 | Improved binary symbiotic organism search algorithm with transfer functions with KNN classifier | Ionosphere | 92.96 | – | – |
| Xu et al. [ | 2020 | Maximum feature tree embedded with mutual information and coefficient of variation with random forest classifier | Ionosphere | 94.32 | 64.71 | 79.52 |
| Ghosh et al. [ | 2020 | Binary social mimic optimization algorithm with x-shaped transfer function with KNN classifier | Ionosphere | 95.71 | 76.47 | 86.09 |
| Han et al. [ | 2021 | Multi-objective particle swarm optimization with adaptive strategies with KNN classifier | Ionosphere | 89.18 | – | – |
| Kang et al. [ | 2022 | Grey wolf improved flower pollination algorithm with KNN classifier | Ionosphere | 95.36 | 76.18 | 85.77 |
| Proposed method | 2022 | ImRMR with random forest classifier | Ionosphere | 95.24 | ||
| Xu et al. [ | 2020 | Mutual information and coefficient of variation with random forest classifier | Crane | 91.00 | 45.33 | 68.17 |
| Proposed method | 2022 | ImRMR with random forest classifier | Crane | |||
| Zhang et al. [ | 2018 | Maximum joint mutual information algorithm with KNN classifier | Movement | 80.67 | 3.33 | 42.00 |
| Proposed method | 2022 | ImRMR with random forest classifier | Movement | |||
| Zhang et al. [ | 2018 | Maximum joint mutual information algorithm with KNN classifier | Musk | 79.52 | 22.89 | 51.21 |
| Chen et al. [ | 2021 | Self-learning feature selection with random forest classifier | Musk | 88.63 | 53.73 | 71.18 |
| Han et al. [ | 2021 | Multi-objective particle swarm optimization with adaptive strategies with KNN classifier | Musk | 83.80 | – | – |
| Proposed method | 2022 | ImRMR with random forest classifier | Musk |
Best performance in bold
Fig. 5 Feature selection method based on SFSFs
Experimental results of ImRMR and five other methods based on SFSFs
| Dataset | Performance (%) | Raw feature | ImRMR | mRMR | IG | SU | GR | RfF |
|---|---|---|---|---|---|---|---|---|
| Musk | Acc | 85.21 | 85.21 | 85.92 | 85.21 | 85.92 | 85.21 | |
| | Dr | 0.00 | 43.37 | 72.89 | 81.93 | 84.94 | 77.71 | |
| | Z | 42.61 | 64.29 | 79.40 | 83.57 | 85.43 | 81.46 | |
| | Recall | 89.61 | 89.61 | 90.91 | | | | |
| | F1 | 87.12 | 86.79 | 87.65 | 86.79 | 87.50 | 87.12 | |
| | Precision | 82.56 | 84.15 | 83.53 | 84.15 | 84.34 | 82.56 | |
| Urban | Acc | 82.23 | 82.23 | 82.74 | 83.25 | 82.74 | 82.23 | |
| | Dr | 0.00 | 51.02 | 54.42 | 51.02 | 61.22 | 44.22 | |
| | Z | 41.12 | 66.63 | 68.58 | 67.13 | 71.98 | 63.23 | |
| | Recall | 82.40 | 82.59 | 83.48 | 83.15 | 83.35 | 81.99 | |
| | F1 | 81.81 | 83.38 | 83.12 | 82.26 | 82.96 | 81.73 | |
| | Precision | 84.18 | 84.75 | 84.33 | 84.63 | 84.20 | 83.98 | |
| Ionosphere | Acc | 91.43 | 91.43 | 91.43 | 92.38 | 92.38 | 91.43 | |
| | Dr | 0.00 | 70.59 | 26.47 | 70.59 | 70.59 | 70.59 | |
| | Z | 45.71 | 81.96 | 58.95 | 81.48 | 81.48 | 81.01 | |
| | Recall | 81.82 | 81.82 | 81.82 | 81.82 | | | |
| | F1 | 85.71 | 86.15 | 85.71 | 87.10 | 87.50 | 85.71 | |
| | Precision | 90.00 | 87.50 | 90.00 | 93.10 | 90.32 | 90.00 | |
| Glass | Acc | 65.63 | 65.63 | 68.75 | 68.75 | 68.75 | 67.19 | |
| | Dr | 0.00 | 33.33 | 44.44 | 33.33 | 22.22 | 55.56 | |
| | Z | 32.81 | 58.07 | 56.60 | 51.04 | 45.49 | 61.37 | |
| | Recall | 64.62 | 54.49 | 68.02 | 66.12 | 63.96 | 67.14 | |
| | F1 | 63.36 | 61.47 | 66.69 | 64.75 | 62.13 | 65.95 | |
| | Precision | 66.02 | 70.51 | 68.77 | 67.60 | 72.44 | 67.94 | |
| Movement | Acc | 81.48 | 82.41 | 82.41 | 81.48 | 82.41 | 82.41 | |
| | Dr | 0.00 | 25.56 | 31.11 | 31.11 | 44.44 | 34.44 | |
| | Z | 40.74 | 53.98 | 56.76 | 56.30 | 63.43 | 58.43 | |
| | Recall | 81.94 | 83.67 | 83.24 | 82.48 | 83.34 | 82.28 | |
| | F1 | 81.08 | 82.35 | 81.70 | 81.16 | 81.87 | 81.22 | |
| | Precision | 83.03 | 83.45 | 82.70 | 83.19 | 82.33 | 83.57 | |
| PU | Acc | 93.22 | 93.22 | 93.67 | 93.44 | 93.56 | 93.22 | |
| | Dr | 0.00 | 38.83 | 28.16 | 49.51 | 69.90 | 29.13 | |
| | Z | 46.61 | 66.03 | 60.91 | 71.48 | 81.73 | 61.17 | |
| | Recall | 93.22 | 93.22 | 93.67 | 93.44 | 93.56 | 93.22 | |
| | F1 | 93.13 | 93.16 | 93.59 | 93.38 | 93.51 | 93.19 | |
| | Precision | 93.33 | 93.28 | 93.81 | 93.51 | 93.63 | 93.33 | |
| Crane | Acc | | | | | | | |
| | Dr | 0.00 | 53.33 | 57.33 | 49.33 | 34.67 | 21.33 | |
| | Z | 45.00 | 71.67 | 73.67 | 69.67 | 62.33 | 55.67 | |
| | Recall | 90.32 | 89.73 | 91.80 | 89.81 | 91.80 | 89.81 | |
| | F1 | 90.24 | 90.02 | 90.08 | 89.24 | 90.51 | 89.71 | |
| | Precision | 89.23 | 91.29 | 90.40 | 89.10 | 89.90 | 90.56 | |
Best performance in bold
Fig. 6 Comparison of the effects of six methods