Olatunji A Akinola, Jeffrey O Agushaka, Absalom E Ezugwu.
Abstract
Selecting an appropriate feature subset is a vital task in machine learning. Its main goal is to remove noisy, irrelevant, and redundant features that could degrade the learning model's accuracy, and thereby to improve classification performance without information loss. Increasingly advanced optimization methods have therefore been employed to locate the optimal subset of features. This paper presents a binary version of the dwarf mongoose optimization (DMO) algorithm, called BDMO, to solve the high-dimensional feature selection problem. The approach was validated on 18 high-dimensional datasets from the Arizona State University feature selection repository, and its efficacy was compared with that of other well-known feature selection techniques in the literature. The results show that BDMO outperforms the other methods, producing the lowest average fitness value in 14 of the 18 datasets (77.77%). BDMO also demonstrates stability, returning the lowest standard deviation (SD) in 13 of the 18 datasets (72.22%). Furthermore, it achieved higher validation accuracy than the other methods in 15 of the 18 datasets (83.33%). The proposed approach also reached the highest attainable validation accuracy on the COIL20 and Leukemia datasets, which underscores the superiority of BDMO.
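The percentages quoted in the abstract are win ratios over the 18 datasets. The short check below recomputes them; it suggests that the reported figures are truncated rather than rounded to two decimals (14/18 rounds to 77.78%). The function name `share_percent` is illustrative only and does not come from the paper.

```python
from fractions import Fraction


def share_percent(wins, total, digits=2):
    """Return (truncated, rounded) percentage of `wins` out of `total`."""
    exact = Fraction(wins, total) * 100          # exact rational percentage
    scale = 10 ** digits
    truncated = int(exact * scale) / scale       # drop extra decimals
    rounded = round(float(exact), digits)        # conventional rounding
    return truncated, rounded


# Win counts as stated in the abstract, each out of 18 datasets.
for label, wins in [("best mean fitness", 14),
                    ("lowest SD", 13),
                    ("best validation accuracy", 15)]:
    truncated, rounded = share_percent(wins, 18)
    print(f"{label}: truncated={truncated:.2f}%  rounded={rounded:.2f}%")
```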
Year: 2022 PMID: 36201524 PMCID: PMC9536540 DOI: 10.1371/journal.pone.0274850
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Fig 1. The optimization procedures of DMO [15].
Fig 2. Flowchart depicting the structure of the BDMO algorithm.
Datasets and their properties.
| Number | Datasets | # features | # instances | # classes | Categories |
|---|---|---|---|---|---|
| 1 | ALLAML | 7129 | 72 | 2 | Biological |
| 2 | CLL-SUB-111 | 11,340 | 111 | 3 | Biological |
| 3 | COIL20 | 1024 | 1440 | 20 | Face image |
| 4 | Colon | 2000 | 62 | 2 | Biological |
| 5 | GLA-BRA-180 | 49,151 | 180 | 4 | Biological |
| 6 | GLI-85 | 22,283 | 85 | 2 | Biological |
| 7 | GLIOMA | 4434 | 50 | 4 | Biological |
| 8 | Leukemia | 7070 | 72 | 2 | Biological |
| 9 | Lung | 3312 | 203 | 5 | Biological |
| 10 | Lymphoma | 4026 | 96 | 9 | Biological |
| 11 | Nci9 | 9712 | 60 | 9 | Biological |
| 12 | Orlraw10P | 10,306 | 100 | 10 | Face image |
| 13 | Prostate_GE | 5966 | 102 | 2 | Biological |
| 14 | SMK-CAN-187 | 19,993 | 187 | 2 | Biological |
| 15 | TOX-171 | 1748 | 171 | 4 | Biological |
| 16 | warpAR10P | 2400 | 130 | 10 | Face image |
| 17 | warpPIE10P | 2420 | 210 | 10 | Face image |
| 18 | Yale | 1024 | 165 | 15 | Face image |
Experimental parameter settings.
| Parameter | Value |
|---|---|
| K-fold cross-validation number | 10 |
| Agent number | 10 |
| Number of runs | 20 |
| Maximum iterations | 100 |
| Dimension of problems | Features’ number in the dataset |
| Length of CSA flight | 1.5 |
| CSO’s social factor | 0.2 |
| CSA’s awareness probability | 1.5 |
| MFO’s Parameter | 1 |
| HDBPSO’s acceleration factors | 2 |
| Parameters | 2,2 |
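To give a sense of the computational budget these settings imply, the sketch below encodes the first four rows of the table and multiplies them out. It rests on two assumptions that the table itself does not state: that each agent is evaluated once per iteration, and that every fitness evaluation runs a full k-fold cross-validation. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    k_folds: int = 10           # K-fold cross-validation number
    n_agents: int = 10          # Agent number
    n_runs: int = 20            # Number of runs
    max_iterations: int = 100   # Maximum iterations

    def fitness_evaluations_per_run(self) -> int:
        # Assumption: one fitness evaluation per agent per iteration.
        return self.n_agents * self.max_iterations

    def classifier_fits_per_run(self) -> int:
        # Assumption: each fitness evaluation trains one model per fold.
        return self.fitness_evaluations_per_run() * self.k_folds

    def classifier_fits_total(self) -> int:
        # Independent runs multiply the per-run cost.
        return self.classifier_fits_per_run() * self.n_runs


config = ExperimentConfig()
print(config.fitness_evaluations_per_run())
print(config.classifier_fits_per_run())
print(config.classifier_fits_total())
```

Under these assumptions, each algorithm trains on the order of 200,000 classifiers per dataset, which helps explain why the running times reported later grow with the number of features.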
Mean fitness values.
| No | Datasets | BDMO | SBWOA | S-SBWOS | JA | MFO | BPSO | CSA | CSO | GNDO | SSA | HDBPSO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
| ALLAML |
| 0.0827 | 0.0970 | 0.1387 | 0.1317 |
| 0.1453 | 0.1443 | 0.1385 | 0.1532 | 0.1673 |
|
| CLL-SUB-111 |
| 0.2609 | 0.2719 | 0.3240 | 0.3138 |
| 0.3476 | 0.3408 | 0.3320 | 0.3612 | 0.3926 |
|
| COIL20 |
| 0.0045 | NA | NA | NA | 0.0010 | NA | NA | NA | NA | NA |
|
| Colon |
| 0.0980 | 0.1003 | 0.1538 | 0.1395 | NA | 0.1665 | 0.1585 | 0.1548 | 0.1693 | 0.1988 |
|
| GLA-BRA-180 |
|
| NA | NA | NA | 0.1944 | NA | NA | NA | NA | NA |
|
| GLI-85 |
| 0.0717 | 0.0775 | 0.0967 | 0.0967 |
| 0.1092 | 0.1092 | 0.1050 | 0.1183 | 0.1442 |
|
| GLIOMA | 0.20 |
| 0.1358 | 0.1688 | 0.1658 | 0.20 | 0.1721 | 0.1767 | 0.1688 | 0.1817 | 0.1979 |
|
| Leukemia | 0.0321 | 0.0337 | 0.0397 | 0.0725 | 0.0693 |
| 0.0840 | 0.0812 | 0.0772 | 0.0858 | 0.1003 |
|
| Lung | 0.0388 |
| 0.0217 | 0.0280 | 0.0260 | 0.045 | 0.0295 | 0.0287 | 0.0301 | 0.0320 | 0.0394 |
|
| Lymphoma | 0.1184 | 0.0651 |
| 0.0798 | 0.0755 | 0.1263 | 0.0799 | 0.0829 | 0.0799 | 0.0820 | 0.0910 |
|
| Nci9 |
| 0.4695 | 0.4793 | 0.5487 | 0.5348 |
| 0.5570 | 0.5590 | 0.5482 | 0.5690 | 0.5973 |
|
| Orlraw10P |
| 0.060 | 0.0621 | 0.1036 | 0.1021 |
| 0.1043 | 0.1057 | 0.1036 | 0.1057 | 0.1136 |
|
| Prostate_GE |
| 0.0945 | 0.1053 | 0.1278 | 0.1242 | 0.145 | 0.1363 | 0.1363 | 0.1339 | 0.1425 | 0.1602 |
|
| SMK-CAN-187 |
| 0.2372 | 0.2468 | 0.2713 | 0.2629 | 0.0622 | 0.2828 | 0.2740 | 0.2770 | 0.2870 | 0.3033 |
|
| TOX-171 |
| 0.2221 | 0.2317 | 0.2346 | 0.2275 | 0.1368 | 0.2704 | 0.2454 | 0.2454 | 0.2717 | 0.3142 |
|
| warpPIE10P |
| 0.1129 | 0.1167 | 0.1371 | 0.1354 | 0.1179 | 0.1476 | 0.1423 | 0.1385 | 0.1521 | 0.1665 |
|
| warpAR10P |
| 0.3878 | 0.4121 | 0.4837 | 0.4774 | 0.2789 | 0.5055 | 0.4965 | 0.4946 | 0.5092 | 0.5426 |
|
| Yale |
| 0.3499 | 0.3632 | 0.3823 | 0.3641 | 0.2833 | 0.3960 | 0.3767 | 0.3869 | 0.3998 | 0.4297 |
| | Friedman’s test mean rank | 2.53 | 3 | 3.47 | 5.77 | 4.5 | 4.2 | 8.1 | 7.67 | 6.67 | 9.43 | 10.67 |
| | Rank | 1 | 2 | 3 | 6 | 5 | 4 | 9 | 8 | 7 | 10 | 11 |
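The last two rows of the table are related in a simple way: Friedman's mean rank averages each algorithm's per-dataset rank (1 = lowest mean fitness), and the Rank row orders those averages. The sketch below reproduces the Rank row from the mean-rank values given in the table, and checks that the mean ranks of 11 algorithms sum to about 11 × 12 / 2 = 66, as they should up to rounding. The helper `average_ranks` is an illustrative name, not something taken from the paper.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1   # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


algorithms = ["BDMO", "SBWOA", "S-SBWOS", "JA", "MFO", "BPSO",
              "CSA", "CSO", "GNDO", "SSA", "HDBPSO"]
# Friedman mean ranks as reported in the table.
mean_rank = [2.53, 3, 3.47, 5.77, 4.5, 4.2, 8.1, 7.67, 6.67, 9.43, 10.67]

final_rank = [int(r) for r in average_ranks(mean_rank)]
expected_sum = len(algorithms) * (len(algorithms) + 1) / 2

for name, mr, fr in zip(algorithms, mean_rank, final_rank):
    print(f"{name:8s} mean rank {mr:5.2f} -> rank {fr}")
print("sum of mean ranks:", round(sum(mean_rank), 2), "expected:", expected_sum)
```

The same `average_ranks` helper could be applied to each dataset row and the results averaged column by column to obtain the mean ranks themselves.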
Standard deviation of fitness values.
| No | Datasets | BDMO | SBWOA | S-SBWOS | JA | MFO | BPSO | CSA | CSO | GNDO | SSA | HDBPSO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
| ALLAML |
| 0.0380 | 0.0290 | 0.0364 | 0.0343 |
| 0.0361 | 0.0346 | 0.0393 | 0.0346 | 0.0326 |
|
| CLL-SUB-111 |
| 0.0336 | 0.0346 | 0.0367 | 0.0342 |
| 0.0335 | 0.0343 | 0.0399 | 0.0382 | 0.0344 |
|
| COIL20 |
| 0.0032 | NA | NA | NA |
| NA | NA | NA | NA | NA |
|
| Colon |
| 0.0369 | 0.0324 | 0.0425 | 0.0434 | NA | 0.0466 | 0.0515 | 0.0442 | 0.0474 | 0.0544 |
|
| GLA-BRA-180 | 0.0062 | 0.0188 | NA | NA | NA |
| NA | NA | NA | NA | NA |
|
| GLI-85 |
| 0.0484 | 0.0277 | 0.0361 | 0.0361 |
| 0.0380 | 0.0447 | 0.0416 | 0.0452 | 0.0469 |
|
| GLIOMA |
| 0.0311 | 0.0600 | 0.0628 | 0.0577 |
| 0.0578 | 0.0607 | 0.0621 | 0.0669 | 0.0788 |
|
| Leukemia | 0.0365 |
| 0.0287 | 0.0371 | 0.0361 | 0.0262 | 0.0339 | 0.0403 | 0.0379 | 0.0349 | 0.0385 |
|
| Lung | 0.0128 | 0.0101 | 0.0088 | 0.0091 |
| 0.0103 | 0.0084 | 0.0093 | 0.0098 | ||
|
| Lymphoma | 0.0234 | 0.0153 |
| 0.0220 | 0.0231 | 0.0265 | 0.0232 | 0.0248 | 0.0213 | 0.0236 | 0.0251 |
|
| Nci9 |
| 0.0520 | 0.0606 | 0.0670 | 0.0586 |
| 0.0599 | 0.0624 | 0.0623 | 0.0599 | 0.0578 |
|
| Orlraw10P |
| 0.0183 | 0.0193 | 0.0207 | 0.0224 |
| 0.0218 | 0.0238 | 0.0240 | 0.0224 | 0.0243 |
|
| Prostate_GE |
| 0.0197 | 0.0346 | 0.0254 | 0.0260 | 0.0154 | 0.0271 | 0.0261 | 0.0259 | 0.0220 | 0.0256 |
|
| SMK-CAN-187 |
| 0.0265 | 0.0327 | 0.0423 | 0.0384 | 0.0127 | 0.0404 | 0.0434 | 0.0425 | 0.0405 | 0.0410 |
|
| TOX-171 |
| 0.0385 | 0.0360 | 0.0389 | 0.0330 | 0.0239 | 0.0358 | 0.0270 | 0.0271 | 0.0347 | 0.0299 |
|
| warpAR10P |
| 0.0474 | 0.0331 | 0.0583 | 0.0617 | 0.0246 | 0.0613 | 0.0607 | 0.0647 | 0.0312 | 0.0311 |
|
| warpPIE10P |
| 0.0243 | 0.0243 | 0.0284 | 0.0267 | 0.0053 | 0.0298 | 0.0375 | 0.0268 | 0.0617 | 0.0637 |
|
| Yale | 0.0178 | 0.0331 | 0.0291 | 0.0445 | 0.0413 |
| 0.0385 | 0.0375 | 0.0347 | 0.0355 | 0.0438 |
Fig 3. Comparison between the proposed BDMO and the state-of-the-art methods based on validation accuracy on all selected high-dimensional feature selection datasets.
Fig 4. Average number of features selected by BDMO and other approaches.
Fig 5. Convergence curves of the three most prominent approaches employed in this study (BDMO, BPSO, and SBWOA) on all the selected high-dimensional feature selection datasets.
Average computation time of BDMO and other approaches (in seconds).
| No | Datasets | BDMO | SBWOA | S-SBWOS | JA | MFO | BPSO | CSA | CSO | GNDO | SSA | HDBPSO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ALLAML | 11.2 | 3.1 | | 14.4 | 14.5 | 13.9 | 13.4 | 8.8 | 28.0 | 13.2 | 24.6 |
| 2 | CLL-SUB-111 | 18.8 | 5.8 | | 32.0 | 31.7 | 22.1 | 29.1 | 18.9 | 56.3 | 28.8 | 54.5 |
| 3 | COIL20 | 74.7 | | NA | NA | NA | 52.8 | NA | NA | NA | NA | NA |
| 4 | Colon | 6.1 | 1.4 | | 2.4 | 2.5 | 5.8 | 2.4 | 1.8 | 4.7 | 2.3 | 3.9 |
| 5 | GLA-BRA-180 | 108.8 | | NA | NA | NA | 119.3 | NA | NA | NA | NA | NA |
| 6 | GLI-85 | 25.5 | 14.8 | | 48.2 | 47.6 | 35.9 | 46.6 | 31.4 | 92.8 | 45.0 | 88.2 |
| 7 | GLIOMA | 6.8 | 2.1 | | 3.6 | 3.8 | 8.2 | 3.4 | 3.0 | 7.0 | 3.3 | 10.5 |
| 8 | Leukemia | 10.5 | 3.2 | | 13.9 | 14.5 | 12.8 | 13.8 | 8.7 | 27.3 | 13.4 | 24.8 |
| 9 | Lung | 14.1 | 4.5 | | 18.5 | 18.6 | 12.6 | 18.0 | 10.2 | 32.1 | 16.4 | 30.0 |
| 10 | Lymphoma | 9.8 | 2.7 | | 5.5 | 9.4 | 10.3 | 5.4 | 3.8 | 11.0 | 5.0 | 19.0 |
| 11 | Nci9 | 10.8 | 3.6 | | 15.6 | 16.4 | 15.1 | 15.0 | 10.9 | 29.3 | 14.9 | 27.9 |
| 12 | Orlraw10P | 15.9 | 5.3 | | 27.0 | 27.2 | 19.1 | 24.6 | 15.7 | 50.1 | 23.6 | 48.4 |
| 13 | Prostate_GE | 12.1 | 3.4 | | 16.8 | 17.0 | 13.6 | 15.7 | 10.2 | 32.0 | 15.3 | 27.2 |
| 14 | SMK-CAN-187 | 53.5 | 22.6 | | 94.9 | 95.2 | 53.3 | 89.7 | 53.1 | 183.1 | 88.4 | 174.6 |
| 15 | TOX-171 | 17.6 | 6.2 | | 26.7 | 25.3 | 17.1 | 24.5 | 13.5 | 48.7 | 24.2 | 41.7 |
| 16 | warpAR10P | 8.1 | 2.1 | | 4.5 | 4.6 | 0.7 | 4.5 | 2.9 | 7.8 | 4.0 | 14.7 |
| 17 | warpPIE10P | 12.2 | 3.5 | | 14.1 | 14.0 | 10.5 | 13.9 | 7.2 | 26.4 | 13.3 | 22.9 |
| 18 | Yale | 6.9 | 2.1 | | 3.2 | 3.2 | 5.8 | 3.1 | 1.9 | 6.2 | 3.0 | 4.4 |
Precision of BDMO and other approaches.
| No | Datasets | BDMO | SBWOA | S-SBWOS | JA | MFO | BPSO | CSA | CSO | GNDO | SSA | HDBPSO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
| ALLAML |
| 0.8442 | 0.8328 | 0.8142 | 0.8211 |
| 0.8121 | 0.8099 | 0.8082 | 0.8126 | 0.8115 |
|
| CLL-SUB-111 |
| 0.6955 | 0.6802 | 0.6652 | 0.6483 | 0.7 | 0.6410 | 0.6440 | 0.6625 | 0.6546 | 0.6552 |
|
| COIL20 |
| NA | NA | NA |
| NA | NA | NA | NA | NA | |
|
| Colon | 0.875 | 0.8162 | 0.8202 | 0.8072 | 0.7953 |
| 0.7930 | 0.8050 | 0.7987 | 0.8132 | 0.8062 |
|
| GLA-BRA-180 | 0.8214 | 0.6667 | NA | NA | NA |
| NA | NA | NA | NA | NA |
|
| GLI-85 |
| 0.7026 | 0.6896 | 0.6812 | 0.7031 |
| 0.6699 | 0.6924 | 0.7063 | 0.7175 | 0.6975 |
|
| GLIOMA | 0.625 | 0.7644 | 0.7562 | 0.7702 | 0.7681 | 0.625 |
| 0.7652 | 0.7663 | 0.7479 | 0.7681 |
|
| Leukemia |
| 0.8873 | 0.8993 | 0.8851 | 0.8917 |
| 0.8856 | 0.8916 | 0.8892 | 0.8977 | 0.8882 |
|
| Lung |
| 0.9426 | 0.9358 | 0.9486 | 0.9508 |
| 0.9491 | 0.9207 | 0.8832 | 0.9489 | 0.9333 |
|
| Lymphoma |
| 0.9366 | 0.9319 | 0.9429 | 0.9371 |
| 0.9383 | 0.6278 | 0.6395 | 0.9333 | 0.6344 |
|
| Nci9 |
| 0.6721 | 0.6404 | 0.6518 | 0.6405 | 0.8 | 0.6463 | 0.2964 | 0.2876 | 0.6658 | 0.2805 |
|
| Orlraw10P |
| 0.9362 | 0.9411 | 0.9412 | 0.9403 |
| 0.9420 | 0.9255 | 0.9275 | 0.9390 | 0.9255 |
|
| Prostate_GE |
| 0.8772 | 0.8641 | 0.8661 | 0.8742 |
| 0.8727 | 0.8674 | 0.8618 | 0.8729 | 0.8691 |
|
| SMK-CAN-187 |
| 0.6572 | 0.6751 | 0.6466 | 0.6532 |
| 0.6533 | 0.6585 | 0.6571 | 0.6608 | 0.6489 |
|
| TOX-171 | 0.9091 | 0.7035 | 0.6731 | 0.6900 | 0.6705 |
| 0.6570 | 0.6750 | 0.6697 | 0.6672 | 0.6640 |
|
| warpAR10P | 0.6364 | 0.5986 | 0.5789 | 0.5900 | 0.5998 |
| 0.5916 | 0.5233 | 0.5039 | 0.5905 | 0.5091 |
|
| warpPIE10P |
| 0.8807 | 0.8902 | 0.8675 | 0.8638 |
| 0.8813 | 0.8642 | 0.8719 | 0.8725 | 0.8681 |
|
| Yale | 0.8214 | 0.7020 | 0.6779 | 0.7060 | 0.7017 |
| 0.7015 | 0.6251 | 0.6217 | 0.6976 | 0.6447 |
F-measure of BDMO and other approaches.
| No | Datasets | BDMO | SBWOA | S-SBWOS | JA | MFO | BPSO | CSA | CSO | GNDO | SSA | HDBPSO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
| ALLAML | 0.75 | 0.8915 |
| 0.8799 | 0.8885 | 0.75 | 0.8822 | 0.8780 | 0.8784 | 0.8819 | 0.8818 |
|
| CLL-SUB-111 |
| 0.6505 | 0.6430 | 0.6041 | 0.5852 |
| 0.5823 | 0.5814 | 0.5992 | 0.5923 | 0.5943 |
|
| COIL20 |
| NA | NA | NA |
| NA | NA | NA | NA | NA | |
|
| Colon | 0.875 | 0.8322 | 0.8334 | 0.8225 | 0.8091 |
| 0.8159 | 0.8129 | 0.8111 | 0.8274 | 0.8231 |
|
| GLA-BRA-180 | 0.8364 | 0.7692 | NA | NA | NA |
| NA | NA | NA | NA | NA |
|
| GLI-85 |
| 0.7316 | 0.7047 | 0.7203 | 0.7454 |
| 0.7194 | 0.7259 | 0.7493 | 0.7714 | 0.7326 |
|
| GLIOMA |
| 0.7225 | 0.7128 | 0.7240 | 0.7312 |
| 0.7283 | 0.7312 | 0.7269 | 0.7182 | 0.7348 |
|
| Leukemia |
| 0.9129 | 0.9314 | 0.9163 | 0.9269 |
| 0.9220 | 0.9231 | 0.9235 | 0.9264 | 0.9198 |
|
| Lung | 0.9167 | 0.8447 | 0.8599 | 0.9009 | 0.8741 |
| 0.8826 | 0.8917 | 0.8635 | 0.8807 | 0.9066 |
|
| Lymphoma |
| 0.6157 | 0.6308 | 0.6576 | 0.6373 |
| 0.6540 | 0.6436 | 0.6555 | 0.6242 | 0.6488 |
|
| Nci9 |
| 0.3334 | 0.2645 | 0.3049 | 0.2993 | 0.8421 | 0.2980 | 0.3037 | 0.2998 | 0.2933 | 0.2899 |
|
| Orlraw10P |
| 0.8939 | 0.8998 | 0.8997 | 0.8974 |
| 0.8970 | 0.8903 | 0.8935 | 0.8939 | 0.8903 |
|
| Prostate_GE |
| 0.8665 | 0.8319 | 0.8331 | 0.8437 |
| 0.8489 | 0.8356 | 0.8312 | 0.8422 | 0.8406 |
|
| SMK-CAN-187 |
| 0.6179 | 0.6159 | 0.6013 | 0.6073 |
| 0.6070 | 0.6141 | 0.6049 | 0.6052 | 0.6084 |
|
| TOX-171 | 0.8511 | 0.6711 | 0.6428 | 0.6556 | 0.6389 |
| 0.6265 | 0.6393 | 0.6354 | 0.6386 | 0.6302 |
|
| warpAR10P | 0.7180 | 0.4690 | 0.4578 | 0.4343 | 0.4198 |
| 0.4273 | 0.4317 | 0.4290 | 0.4232 | 0.4260 |
|
| warpPIE10P |
| 0.8585 | 0.8705 | 0.8442 | 0.8435 |
| 0.8597 | 0.8407 | 0.8506 | 0.8484 | 0.8442 |
|
| Yale |
| 0.5428 | 0.5289 | 0.5449 | 0.5447 | 0.8302 | 0.5369 | 0.5236 | 0.5267 | 0.5362 | 0.5356 |
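The precision and F-measure tables above report scores without restating their definitions. As a reminder, the sketch below computes both from the counts of a single positive class, using the standard definitions; how the paper averages these scores over classes for the multi-class datasets is not stated in this excerpt. The counts in the example are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall and F-measure (F1) for one positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"precision={precision:.4f} recall={recall:.4f} F-measure={f1:.4f}")
```

Because the F-measure is a harmonic mean, it sits closer to the smaller of precision and recall, so a method can score well on precision and still have a modest F-measure when its recall is low.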
Results of the Wilcoxon signed-rank test on validation accuracy.
| Algorithms | | N | Mean Rank | Sum of Ranks | Z | Asymp. Sig. (2-tailed) |
|---|---|---|---|---|---|---|
| SBWOA–BDMO | Negative Ranks | 15 | 8.80 | 132.00 | -2.627b | 0.009 |
| | Positive Ranks | 2 | 10.50 | 21.00 | | |
| | Ties | 1 | | | | |
| | Total | 18 | | | | |
| S-SBWOS–BDMO | Negative Ranks | 13 | 8.08 | 105.00 | -2.556b | 0.011 |
| | Positive Ranks | 2 | 7.50 | 15.00 | | |
| | Ties | 1 | | | | |
| | Total | 16 | | | | |
| JA–BDMO | Negative Ranks | 14 | 9.36 | 131.00 | -3.258b | 0.001 |
| | Positive Ranks | 2 | 2.50 | 5.00 | | |
| | Ties | 0 | | | | |
| | Total | 16 | | | | |
| MFO–BDMO | Negative Ranks | 13 | 8.73 | 113.50 | -3.039b | 0.002 |
| | Positive Ranks | 2 | 3.25 | 6.50 | | |
| | Ties | 1 | | | | |
| | Total | 16 | | | | |
| BPSO–BDMO | Negative Ranks | 9 | 5.67 | 51.00 | -2.395b | 0.017 |
| | Positive Ranks | 1 | 4.00 | 4.00 | | |
| | Ties | 8 | | | | |
| | Total | 18 | | | | |
| CSA–BDMO | Negative Ranks | 14 | 9.36 | 131.00 | -3.258b | 0.001 |
| | Positive Ranks | 2 | 2.50 | 5.00 | | |
| | Ties | 0 | | | | |
| | Total | 16 | | | | |
| CSO–BDMO | Negative Ranks | 14 | 9.36 | 131.00 | -3.258b | 0.001 |
| | Positive Ranks | 2 | 2.50 | 5.00 | | |
| | Ties | 0 | | | | |
| | Total | 16 | | | | |
| GNDO–BDMO | Negative Ranks | 14 | 9.36 | 131.00 | -3.258b | 0.001 |
| | Positive Ranks | 2 | 2.50 | 5.00 | | |
| | Ties | 0 | | | | |
| | Total | 16 | | | | |
| SSA–BDMO | Negative Ranks | 14 | 9.36 | 131.00 | -3.258b | 0.001 |
| | Positive Ranks | 2 | 2.50 | 5.00 | | |
| | Ties | 0 | | | | |
| | Total | 16 | | | | |
| HDBPSO–BDMO | Negative Ranks | 14 | 8.50 | 119.00 | -3.351b | <0.001 |
| | Positive Ranks | 1 | 1.00 | 1.00 | | |
| | Ties | 1 | | | | |
| | Total | 16 | | | | |

b. Based on positive ranks.
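The Z and significance columns can be checked from the rank sums alone. The sketch below applies the usual normal approximation to the positive rank sum, using the number of non-tied pairs (negative plus positive ranks) as the sample size. It omits any tie correction and is a consistency check on the reported numbers, not a reconstruction of the authors' statistical workflow; the function names are illustrative.

```python
import math


def wilcoxon_z(pos_rank_sum, n_nonzero):
    """Normal-approximation z for the Wilcoxon signed-rank statistic.

    Uses the positive rank sum and the count of non-tied pairs;
    no tie correction is applied (a simplifying assumption).
    """
    mean = n_nonzero * (n_nonzero + 1) / 4
    sd = math.sqrt(n_nonzero * (n_nonzero + 1) * (2 * n_nonzero + 1) / 24)
    return (pos_rank_sum - mean) / sd


def two_tailed_p(z):
    """Two-tailed p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# (negative rank sum, positive rank sum, non-tied pairs), copied from the table.
rows = {
    "SBWOA-BDMO": (132.00, 21.00, 15 + 2),
    "S-SBWOS-BDMO": (105.00, 15.00, 13 + 2),
    "JA-BDMO": (131.00, 5.00, 14 + 2),
    "BPSO-BDMO": (51.00, 4.00, 9 + 1),
    "HDBPSO-BDMO": (119.00, 1.00, 14 + 1),
}

for pair, (neg_sum, pos_sum, n) in rows.items():
    # The two rank sums must add up to 1 + 2 + ... + n.
    assert neg_sum + pos_sum == n * (n + 1) / 2
    z = wilcoxon_z(pos_sum, n)
    print(f"{pair:13s} z={z:.3f}  p={two_tailed_p(z):.4f}")
```

For the pairs listed, the recomputed z values agree with the table to three decimals, and the p-values round to the reported significance levels.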