Hamouda Chantar, Mohammad Tubishat, Mansour Essgaer, Seyedali Mirjalili.
Abstract
Many fields are affected by the growth of data dimensionality. The major problems that result from high-dimensional data include high memory requirements, high computational cost, and low machine learning classifier performance. Selecting the relevant features from the set of available features and removing the irrelevant ones helps address these problems. To solve the feature selection (FS) problem, an improved version of the Dragonfly Algorithm (DA) is proposed by combining it with Simulated Annealing (SA); the improved algorithm is named BDA-SA. To overcome the local optima problem of DA and enhance its ability to select the best subset of features for classification problems, SA is applied to the best solution found by the Binary Dragonfly Algorithm (BDA) in an attempt to improve its accuracy. A set of frequently used data sets from the UCI repository was used to evaluate the performance of the proposed FS approach. Results show that the proposed hybrid approach, BDA-SA, outperforms the compared wrapper-based FS methods, including a feature selection method based on the basic version of the Binary Dragonfly Algorithm. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s42979-021-00687-5.
Keywords: Dragonfly algorithm; Feature selection; Optimization; Simulated annealing algorithm
Year: 2021 PMID: 34056623 PMCID: PMC8147911 DOI: 10.1007/s42979-021-00687-5
Source DB: PubMed Journal: SN Comput Sci ISSN: 2661-8907
Fig. 1 Flowchart of BDA-SA algorithm
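The hybrid step described in the abstract, where SA refines the best solution found by BDA, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the neighbour move (a single bit flip), the geometric cooling schedule, the temperature values, and the toy fitness function are all assumptions introduced here, and `bda_best` is a hypothetical placeholder for the feature mask that BDA would return.

```python
import math
import random
from typing import Callable, List

Mask = List[int]  # 1 = feature selected, 0 = feature dropped


def simulated_annealing(
    start: Mask,
    fitness: Callable[[Mask], float],
    t0: float = 1.0,        # assumed initial temperature
    cooling: float = 0.93,  # assumed geometric cooling factor
    t_min: float = 1e-3,    # assumed stopping temperature
    seed: int = 0,
) -> Mask:
    """Refine a binary feature mask; lower fitness is better."""
    rng = random.Random(seed)
    current, current_fit = start[:], fitness(start)
    best, best_fit = current[:], current_fit
    t = t0
    while t > t_min:
        # Neighbour: flip one randomly chosen feature bit.
        candidate = current[:]
        i = rng.randrange(len(candidate))
        candidate[i] = 1 - candidate[i]
        cand_fit = fitness(candidate)
        delta = cand_fit - current_fit
        # Accept improvements always, worse moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_fit = candidate, cand_fit
            if current_fit < best_fit:
                best, best_fit = current[:], current_fit
        t *= cooling
    return best


# Hypothetical toy problem: only features 0 and 2 carry signal.
RELEVANT = {0, 2}


def toy_fitness(mask: Mask) -> float:
    selected = {i for i, bit in enumerate(mask) if bit}
    error = len(RELEVANT - selected) / len(RELEVANT)
    return 0.99 * error + 0.01 * len(selected) / len(mask)


bda_best = [1, 1, 1, 1, 1, 1]  # placeholder for the mask returned by BDA
refined = simulated_annealing(bda_best, toy_fitness)
print(refined, toy_fitness(refined))
```

Because the sketch keeps track of the best mask seen so far, the returned solution is never worse than the starting mask under the supplied fitness function. In a real wrapper setting, `toy_fitness` would be replaced by a function that trains and scores a classifier on the selected features.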
Data sets used in the experiments
| Data set | Features | Instances |
|---|---|---|
| Breastcancer | 9 | 699 |
| BreastEW | 30 | 569 |
| CongressEW | 16 | 435 |
| Exactly | 13 | 1000 |
| Exactly2 | 13 | 1000 |
| HeartEW | 13 | 270 |
| IonosphereEW | 34 | 351 |
| KrvskpEW | 36 | 3196 |
| Lymphography | 18 | 148 |
| M-of-n | 13 | 1000 |
| PenglungEW | 325 | 73 |
| SonarEW | 60 | 208 |
| SpectEW | 22 | 267 |
| Tic-tac-toe | 9 | 958 |
| Vote | 16 | 300 |
| WaveformEW | 40 | 5000 |
| WineEW | 13 | 178 |
| Zoo | 16 | 101 |
Parameter setting of algorithms
| Parameter | Value |
|---|---|
| | 5 |
| | 0.99 |
| Number of search agents | 10 |
| Maximum number of iterations | 100 |
| | from 2 to 0 |
| | from 0.9 to 0.2 |
| | 2.0 |
Averages of classification accuracy, best fitness and selected features obtained from BDA and BDA-SA
| Data set | Accuracy | Best Fitness | Selected Features | |||
|---|---|---|---|---|---|---|
| BDA | BDA-SA | BDA | BDA-SA | BDA | BDA-SA | |
| Breastcancer | 0.968 | 0.038 | 6.000 | 6.000 | ||
| BreastEW | 0.960 | 0.043 | 13.500 | |||
| CongressEW | 0.967 | 0.035 | 5.000 | |||
| Exactly | 0.982 | 0.023 | 6.250 | |||
| Exactly2 | 0.744 | 0.244 | 1.450 | 1.450 | ||
| HeartEW | 0.842 | 0.159 | 6.900 | |||
| IonosphereEW | 0.919 | 0.079 | 11.650 | |||
| KrvskpEW | 0.958 | 0.031 | 18.350 | |||
| Lymphography | 0.872 | 0.131 | 8.500 | |||
| M-of-n | 0.995 | 0.008 | 6.200 | |||
| PenglungEW | 0.909 | 0.093 | 141.300 | |||
| SonarEW | 0.914 | 0.086 | 27.550 | |||
| SpectEW | 0.857 | 0.143 | 8.850 | |||
| Tic-tac-toe | 0.784 | 0.207 | 8.200 | |||
| Vote | 0.953 | 0.049 | 6.550 | |||
| WaveformEW | 0.755 | 0.236 | 21.100 | |||
| WineEW | 0.987 | 0.009 | 8.850 | |||
| Zoo | 0.959 | 0.042 | 8.350 | |||
Bold values represent the best results
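The relationship between the three reported quantities can be illustrated with a short calculation. The fitness formula is not given in this record, so the sketch below assumes the weighted form commonly used in wrapper-based FS, `alpha * error + (1 - alpha) * selected / total`, and further assumes that the unlabeled value 0.99 in the parameter table is the weight `alpha`. The function name `wrapper_fitness` is hypothetical.

```python
def wrapper_fitness(accuracy: float, n_selected: float, n_total: int,
                    alpha: float = 0.99) -> float:
    """Assumed wrapper fitness: weighted error rate plus feature ratio."""
    error = 1.0 - accuracy
    return alpha * error + (1.0 - alpha) * n_selected / n_total


# Breastcancer row: accuracy 0.968, 6 selected features out of 9.
breastcancer = wrapper_fitness(accuracy=0.968, n_selected=6.0, n_total=9)
print(round(breastcancer, 3))  # 0.038
```

Under these assumptions the Breastcancer row evaluates to about 0.038, which agrees with the best fitness printed on that row. This is a consistency check on one row only and does not confirm the formula for the whole table.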
Standard deviation of the averages of classification accuracy, best fitness and selected features obtained from BDA and BDA-SA
| Data set | Accuracy | Best Fitness | Selected Features | |||
|---|---|---|---|---|---|---|
| BDA | BDA-SA | BDA | BDA-SA | BDA | BDA-SA | |
| Breastcancer | 0.001278 | 0.000633 | 0.000000 | 0.000000 | ||
| BreastEW | 0.005140 | 0.004360 | 2.665000 | |||
| CongressEW | 0.004183 | 0.003603 | 1.716790 | |||
| Exactly | 0.063000 | 0.063070 | 0.910460 | |||
| Exactly2 | 0.035800 | 0.000498 | 1.394500 | |||
| HeartEW | 0.014274 | 0.008007 | 0.998683 | |||
| IonosphereEW | 0.012887 | 0.011020 | 2.680800 | |||
| KrvskpEW | 0.041039 | 0.005190 | 2.542270 | |||
| Lymphography | 0.011371 | 0.011450 | 1.960129 | |||
| M-of-n | 0.016938 | 0.008298 | 0.410391 | |||
| PenglungEW | 0.028996 | 0.016101 | 8.676041 | |||
| SonarEW | 0.021943 | 0.018594 | 4.006245 | |||
| SpectEW | 0.008352 | 0.008304 | 2.433862 | |||
| Tic-tac-toe | 0.038510 | 0.010051 | 1.880649 | |||
| Vote | 0.006840 | 0.003612 | 2.459675 | |||
| WaveformEW | 0.028255 | 0.006278 | 2.991215 | |||
| WineEW | 0.030413 | 0.004565 | 1.814416 | |||
| Zoo | 0.016711 | 0.013161 | 1.565248 | |||
Bold values represent the best results
P values of the Wilcoxon rank-sum test over 20 runs for classification accuracy, best fitness and selected features of BDA and BDA-SA (p-values ≥ 0.05 are underlined)
| Data set | Accuracy | Best fitness | Selected features |
|---|---|---|---|
| Breastcancer | 0.000080 | 0.000080 | N/A |
| BreastEW | 0.000080 | 0.000080 | |
| CongressEW | 0.000540 | 0.000320 | |
| Exactly | N/A | N/A | N/A |
| Exactly2 | 0.000140 | 0.000080 | N/A |
| HeartEW | 0.000080 | 0.000080 | 0.006140 |
| IonosphereEW | 0.024440 | ||
| KrvskpEW | 0.000500 | ||
| Lymphography | 0.000080 | 0.000080 | 0.047700 |
| M-of-n | N/A | N/A | N/A |
| PenglungEW | 0.013900 | 0.000100 | 0.000160 |
| SonarEW | 0.020880 | ||
| SpectEW | 0.023200 | 0.044440 | |
| Tic-tac-toe | 0.000080 | 0.000080 | N/A |
| Vote | 0.000200 | 0.000080 | 0.000420 |
| WaveformEW | 0.000080 | 0.000080 | |
| WineEW | 0.001000 | ||
| Zoo | 0.000640 | 0.000080 | 0.001420 |
Underlined values indicate that there is no significant difference
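A two-sided Wilcoxon rank-sum test of the kind reported above can be sketched in plain Python. The version below uses the normal approximation with average ranks for ties and no tie correction; whether the authors used this variant or an exact test is not stated here, so treat it as an assumption. The two samples are synthetic values made up for illustration and are not results from the paper.

```python
import math
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (z, two-sided p) using the normal approximation."""
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values at positions i..j-1 share the mean of ranks i+1..j.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    expected = n1 * (n1 + n2 + 1) / 2.0
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_x - expected) / std
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return z, p


# Synthetic accuracies for 20 runs of two hypothetical methods.
runs_a = [0.950 + 0.0005 * k for k in range(20)]
runs_b = [0.970 + 0.0005 * k for k in range(20)]
z, p = rank_sum_test(runs_a, runs_b)
print(f"z = {z:.3f}, p = {p:.2e}, significant at 0.05: {p < 0.05}")
```

With 20 runs per method the normal approximation is usually considered adequate; for much smaller samples an exact test would be the safer choice.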
Fig. 2 Averages of computational time for BDA and BDA-SA
Comparison between BDA-SA and other algorithms in terms of classification accuracy
| Data set | BPSO | BALO | BGWO | BDA-SA |
|---|---|---|---|---|
| Breastcancer | 0.949 | 0.931 | 0.957 | |
| BreastEW | 0.924 | 0.930 | 0.935 | |
| CongressEW | 0.912 | 0.915 | 0.923 | |
| Exactly | 0.672 | 0.626 | 0.666 | |
| Exactly2 | 0.725 | 0.702 | 0.717 | |
| HeartEW | 0.789 | 0.751 | 0.751 | |
| IonosphereEW | 0.845 | 0.813 | 0.787 | |
| KrvskpEW | 0.850 | 0.761 | 0.857 | |
| Lymphography | 0.730 | 0.684 | 0.786 | |
| M-of-n | 0.814 | 0.733 | 0.800 | |
| PenglungEW | 0.728 | 0.789 | 0.724 | |
| SonarEW | 0.799 | 0.778 | 0.796 | |
| SpectEW | 0.808 | 0.831 | 0.787 | |
| Tic-tac-toe | 0.712 | 0.694 | 0.707 | |
| Vote | 0.905 | 0.882 | 0.915 | |
| WaveformEW | 0.743 | 0.701 | 0.743 | |
| WineEW | 0.916 | 0.916 | 0.895 | |
| Zoo | 0.846 | 0.808 | 0.829 |
Bold values represent the best results
Comparison between BDA-SA and other algorithms in terms of best fitness
| Data set | BPSO | BALO | BGWO | BDA-SA |
|---|---|---|---|---|
| Breastcancer | 0.038 | 0.038 | 0.031 | |
| BreastEW | 0.044 | 0.036 | 0.045 | |
| CongressEW | 0.034 | 0.045 | 0.030 | |
| Exactly | 0.037 | 0.164 | 0.163 | |
| Exactly2 | 0.243 | 0.257 | 0.246 | |
| HeartEW | 0.135 | 0.165 | 0.166 | |
| IonosphereEW | 0.113 | 0.141 | 0.176 | |
| KrvskpEW | 0.030 | 0.045 | 0.049 | |
| Lymphography | 0.145 | 0.156 | 0.105 | |
| M-of-n | 0.046 | 0.042 | 0.008 | |
| PenglungEW | 0.165 | 0.122 | 0.216 | |
| SonarEW | 0.093 | 0.123 | 0.126 | |
| SpectEW | 0.134 | 0.127 | 0.137 | |
| Tic-tac-toe | 0.201 | 0.217 | 0.191 | |
| Vote | 0.062 | 0.046 | 0.037 | |
| WaveformEW | 0.200 | 0.219 | 0.217 | |
| WineEW | 0.018 | 0.038 | ||
| Zoo | 0.045 | 0.033 | 0.084 |
Comparison between BDA-SA and other algorithms in terms of selected features
| Data set | BPSO | BALO | BGWO | BDA-SA |
|---|---|---|---|---|
| Breastcancer | 6.000 | 6.000 | 6.000 | |
| BreastEW | 12.800 | 22.600 | 20.600 | |
| CongressEW | 9.300 | 9.500 | 4.150 | |
| Exactly | 6.200 | 7.700 | 7.750 | |
| Exactly2 | 2.050 | 8.450 | 8.200 | |
| HeartEW | 8.900 | 8.500 | 6.900 | |
| IonosphereEW | 12.300 | 25.500 | 20.550 | |
| KrvskpEW | 16.850 | 26.900 | 24.300 | |
| Lymphography | 11.050 | 11.800 | 8.500 | |
| M-of-n | 7.500 | 7.650 | 6.150 | |
| PenglungEW | 146.250 | 253.650 | 204.700 | |
| SonarEW | 27.000 | 47.450 | 40.050 | |
| SpectEW | 15.350 | 13.750 | 8.850 | |
| Tic-tac-toe | 8.900 | 9.000 | 7.800 | |
| Vote | 5.000 | 8.500 | 8.100 | |
| WaveformEW | 21.150 | 32.400 | 30.600 | |
| WineEW | 10.250 | 8.400 | 8.850 | |
| Zoo | 8.150 | 10.550 | 9.350 |
Fig. 3 Comparison between BDA-SA and other algorithms in terms of computational time