Hossam M Zawbaa, E Emary, Crina Grosan.
Abstract
BACKGROUND: Selecting a subset of relevant properties from a large set of features that describe a dataset is a challenging machine learning task. In biology, for instance, advances in the available technologies enable the generation of a very large number of biomarkers that describe the data. Choosing the most informative markers while also performing high-accuracy classification over the data can be a daunting task, particularly if the data are high dimensional. A commonly adopted approach is to formulate feature selection as a bi-objective optimization problem that aims to maximize the performance of the data analysis model (the quality of its fit to the training data) while minimizing the number of features used.
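The bi-objective formulation is often collapsed into a single objective by taking a weighted sum of the classification error and the fraction of features kept. The Python sketch below illustrates that scalarization under stated assumptions: the function name, the weight `alpha = 0.99`, and the use of a weighted sum at all are illustrative choices, since the abstract does not give the exact fitness function.

```python
def feature_selection_fitness(error_rate: float, n_selected: int,
                              n_total: int, alpha: float = 0.99) -> float:
    """Hypothetical weighted-sum fitness for wrapper feature selection.

    Lower is better. alpha weights the classification error; (1 - alpha)
    weights the fraction of features kept. alpha = 0.99 is an assumption.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    if not 0 <= n_selected <= n_total or n_total == 0:
        raise ValueError("need 0 <= n_selected <= n_total and n_total > 0")
    selection_ratio = n_selected / n_total
    return alpha * error_rate + (1.0 - alpha) * selection_ratio


if __name__ == "__main__":
    # Same error rate, fewer features: the first subset gets a lower (better) fitness.
    print(feature_selection_fitness(0.05, 4, 9))
    print(feature_selection_fitness(0.05, 8, 9))
```

With a weight this close to 1, accuracy dominates and subset size mainly breaks ties between subsets of similar error.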
Year: 2016 PMID: 26963715 PMCID: PMC4786139 DOI: 10.1371/journal.pone.0150652
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Exploration rate at different iterations in the original ALO.
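In the original ALO, ants random-walk within bounds around a selected antlion, and those bounds are divided by a ratio I that grows in steps as the iterations advance, so the search shifts from exploration to exploitation. The sketch below assumes the step schedule found in the widely distributed ALO reference code, I = 1 + 10^w · t/T with w = 2, 3, 4, 5, 6 once t/T exceeds 0.1, 0.5, 0.75, 0.9, and 0.95 (and I = 1 before that), and reports 1/I as an illustrative exploration rate. This record does not define the plotted quantity, so treat both the schedule and the 1/I reading as assumptions.

```python
def alo_shrink_ratio(t: int, max_iter: int) -> float:
    """Assumed ALO boundary-shrinking ratio I at iteration t (0 <= t <= max_iter)."""
    frac = t / max_iter
    # Check the largest threshold first so the highest applicable w is used.
    for threshold, w in ((0.95, 6), (0.9, 5), (0.75, 4), (0.5, 3), (0.1, 2)):
        if frac > threshold:
            return 1.0 + 10.0 ** w * frac
    return 1.0  # early iterations: walk bounds are not shrunk


def exploration_rate(t: int, max_iter: int) -> float:
    """Illustrative exploration rate: fraction of the original walk range kept."""
    return 1.0 / alo_shrink_ratio(t, max_iter)


if __name__ == "__main__":
    T = 70  # number of iterations, as in the CALO parameter table
    for t in (1, 10, 40, 60, 66, 70):
        print(t, round(exploration_rate(t, T), 6))
```

The stepwise jumps in I make the exploration rate fall by roughly an order of magnitude at each threshold, which is the behaviour a chaotic schedule could be used to vary.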
Fig 2. The exploration and exploitation values of different chaotic maps.
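CALO is evaluated with five chaotic maps, abbreviated Log, Piec, Singer, Sinu, and Tent in the result tables. The sketch below assumes these are the logistic, piecewise, Singer, sinusoidal, and tent maps in their commonly published forms; the parameter values (a = 4, P = 0.4, μ = 1.07, a = 2.3, tent break point 0.7) and the seed x0 = 0.63 are assumptions, not settings taken from this record.

```python
import math


def logistic(x: float, a: float = 4.0) -> float:
    return a * x * (1.0 - x)


def piecewise(x: float, p: float = 0.4) -> float:
    if x < p:
        return x / p
    if x < 0.5:
        return (x - p) / (0.5 - p)
    if x < 1.0 - p:
        return (1.0 - p - x) / (0.5 - p)
    return (1.0 - x) / p


def singer(x: float, mu: float = 1.07) -> float:
    return mu * (7.86 * x - 23.31 * x**2 + 28.75 * x**3 - 13.302875 * x**4)


def sinusoidal(x: float, a: float = 2.3) -> float:
    return a * x**2 * math.sin(math.pi * x)


def tent(x: float) -> float:
    return x / 0.7 if x < 0.7 else (10.0 / 3.0) * (1.0 - x)


# Keys follow the abbreviations used in the result tables (CALO-Log, CALO-Piec, ...).
CHAOTIC_MAPS = {
    "Log": logistic,
    "Piec": piecewise,
    "Singer": singer,
    "Sinu": sinusoidal,
    "Tent": tent,
}


def chaotic_sequence(step, x0: float = 0.63, n: int = 70):
    """Iterate a map n times from x0 and return x_1..x_n, one value per iteration.

    The seed is arbitrary; some seeds land on a breakpoint (e.g. 0.6 for the
    piecewise map) and degenerate, so it should be chosen with care.
    """
    values, x = [], x0
    for _ in range(n):
        x = step(x)
        values.append(x)
    return values


if __name__ == "__main__":
    for name, step in CHAOTIC_MAPS.items():
        print(name, [round(v, 3) for v in chaotic_sequence(step, n=5)])
```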
Fig 3. The proposed chaotic antlion optimization (CALO).
Datasets used in the experiments.
| Dataset | No. of features | No. of samples | Scientific area |
|---|---|---|---|
| Breastcancer | 9 | 699 | Biology |
| BreastEW | 30 | 569 | |
| Exactly | 13 | 1000 | |
| Exactly2 | 13 | 1000 | |
| HeartEW | 13 | 270 | |
| Lymphography | 18 | 148 | |
| M-of-n | 13 | 1000 | |
| PenglungEW | 325 | 73 | |
| SonarEW | 60 | 208 | |
| SpectEW | 22 | 267 | |
| CongressEW | 16 | 435 | Politics |
| IonosphereEW | 34 | 351 | Electromagnetic |
| KrvskpEW | 36 | 3196 | Game |
| Tic-tac-toe | 9 | 958 | Game |
| Vote | 16 | 300 | Politics |
| WaveformEW | 40 | 5000 | Physics |
| WineEW | 13 | 178 | Chemistry |
| Zoo | 16 | 101 | Artificial, 7 classes of animals |
Parameter settings for CALO.
| Parameter | Value |
|---|---|
| No. of search agents | 8 |
| No. of iterations | 70 |
| Problem dimension | Equal to the number of features in the dataset |
| Search domain | [0, 1] |
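A minimal sketch of how the settings above might be used: each of the 8 search agents holds a continuous position in [0, 1]^d, where d is the number of features, and a position is turned into a feature subset by thresholding. The 0.5 threshold and the helper names are hypothetical; the record does not state how positions are binarized.

```python
import random

# Settings from the CALO parameter table.
CALO_PARAMS = {"search_agents": 8, "iterations": 70, "domain": (0.0, 1.0)}


def init_agents(n_agents, n_features, rng, domain=(0.0, 1.0)):
    """Draw each agent's continuous position uniformly from the search domain."""
    lo, hi = domain
    return [[lo + (hi - lo) * rng.random() for _ in range(n_features)]
            for _ in range(n_agents)]


def to_feature_mask(position, threshold=0.5):
    """Hypothetical binarization: feature j is selected when position[j] > threshold."""
    return [1 if v > threshold else 0 for v in position]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducibility
    n_features = 9           # problem dimension = number of features (e.g. Breastcancer)
    agents = init_agents(CALO_PARAMS["search_agents"], n_features, rng,
                         CALO_PARAMS["domain"])
    for position in agents:
        print(to_feature_mask(position))
```

Keeping the search continuous and binarizing only when a subset is evaluated lets the same optimizer run unchanged on datasets of any dimension.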
Fig 4. The fitness values obtained from different methods for all 18 datasets.
Fig 5. The fitness values obtained from different methods for biological datasets.
Fig 6. The fitness values obtained from different methods for non-biological datasets.
Average classification accuracy over 20 independent runs for GA, PSO, ALO, and CALO with five different chaotic maps.
| Dataset | ALO | CALO-Log | CALO-Piec | CALO-Singer | CALO-Sinu | CALO-Tent | GA | PSO |
|---|---|---|---|---|---|---|---|---|
| 0.954 | 0.955 | 0.955 | 0.950 | 0.951 | 0.955 | 0.951 | ||
| 0.943 | 0.943 | 0.946 | 0.949 | 0.948 | 0.946 | 0.942 | ||
| 0.660 | 0.675 | 0.655 | 0.671 | 0.661 | 0.659 | 0.671 | ||
| 0.741 | 0.737 | 0.747 | 0.730 | 0.736 | 0.728 | 0.737 | ||
| 0.813 | 0.813 | 0.811 | 0.822 | 0.820 | 0.802 | |||
| 0.744 | 0.732 | 0.728 | 0.720 | 0.724 | 0.744 | 0.692 | ||
| 0.875 | 0.891 | 0.882 | 0.880 | 0.851 | 0.867 | 0.841 | ||
| 0.658 | 0.659 | 0.625 | 0.644 | 0.603 | 0.632 | 0.627 | ||
| 0.714 | 0.723 | 0.680 | 0.734 | 0.697 | 0.717 | 0.723 | ||
| 0.787 | 0.784 | 0.780 | 0.778 | 0.778 | 0.787 | 0.782 | ||
| 0.946 | 0.941 | 0.942 | 0.950 | 0.942 | 0.932 | 0.938 | ||
| 0.848 | 0.819 | 0.838 | 0.836 | 0.843 | 0.824 | 0.815 | ||
| 0.938 | 0.948 | 0.953 | 0.941 | 0.930 | 0.950 | 0.946 | ||
| 0.737 | 0.739 | 0.754 | 0.725 | 0.732 | 0.739 | 0.722 | ||
| 0.914 | 0.922 | 0.920 | 0.918 | 0.882 | 0.914 | 0.898 | ||
| 0.765 | 0.767 | 0.766 | 0.764 | 0.762 | 0.766 | 0.757 | ||
| 0.937 | 0.930 | 0.950 | 0.950 | 0.953 | 0.947 | 0.923 | ||
| 0.836 | 0.798 | 0.805 | 0.854 | 0.846 | 0.824 | 0.805 |
Average selection size (fraction of features selected) over 20 independent runs for GA, PSO, ALO, and CALO with five different chaotic maps.
| Dataset | ALO | CALO-Log | CALO-Piec | CALO-Singer | CALO-Sinu | CALO-Tent | GA | PSO |
|---|---|---|---|---|---|---|---|---|
| 0.800 | 0.844 | 0.756 | 0.689 | 0.822 | 0.733 | 0.667 | ||
| 0.567 | 0.753 | 0.580 | 0.687 | 0.653 | 0.680 | 0.527 | ||
| 0.538 | 0.538 | 0.723 | 0.692 | 0.754 | 0.738 | 0.615 | ||
| 0.785 | 0.492 | 0.646 | 0.585 | 0.477 | 0.738 | 0.508 | ||
| 0.677 | 0.615 | 0.738 | 0.662 | 0.723 | 0.615 | 0.677 | ||
| 0.456 | 0.456 | 0.600 | 0.611 | 0.456 | 0.522 | 0.433 | ||
| 0.815 | 0.800 | 0.754 | 0.846 | 0.954 | 0.631 | 0.708 | ||
| 0.266 | 0.419 | 0.316 | 0.357 | 0.442 | 0.494 | 0.476 | ||
| 0.357 | 0.350 | 0.433 | 0.537 | 0.360 | 0.483 | 0.497 | ||
| 0.391 | 0.418 | 0.400 | 0.427 | 0.527 | 0.518 | 0.455 | ||
| 0.362 | 0.512 | 0.388 | 0.325 | 0.388 | 0.450 | 0.375 | ||
| 0.324 | 0.200 | 0.694 | 0.524 | 0.235 | 0.553 | 0.535 | ||
| 0.739 | 0.611 | 0.561 | 0.550 | 0.672 | 0.594 | 0.528 | ||
| 0.711 | 0.756 | 0.711 | 0.756 | 0.622 | 0.667 | 0.711 | ||
| 0.475 | 0.588 | 0.613 | 0.500 | 0.625 | 0.325 | 0.475 | ||
| 0.850 | 0.765 | 0.825 | 0.855 | 0.805 | 0.810 | 0.600 | ||
| 0.615 | 0.585 | 0.462 | 0.554 | 0.677 | 0.615 | 0.508 | ||
| 0.637 | 0.662 | 0.700 | 0.613 | 0.613 | 0.600 |
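The selection sizes lie between 0 and 1, which suggests they are fractions of the available features rather than raw counts; for example, keeping 7 of 13 features gives 7/13 ≈ 0.538. Assuming that reading, the averages could be computed as in this sketch; the helper names are hypothetical.

```python
from statistics import mean


def selection_size(mask):
    """Fraction of features selected by a binary mask (1 = selected)."""
    if not mask:
        raise ValueError("mask must not be empty")
    return sum(mask) / len(mask)


def average_selection_size(masks):
    """Mean selection size over independent runs (one mask per run)."""
    return mean(selection_size(m) for m in masks)


if __name__ == "__main__":
    mask = [1] * 7 + [0] * 6  # 7 of 13 features kept
    print(round(selection_size(mask), 3))
```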
An example of the features selected (and the resulting subset size) by each optimization algorithm on the Breastcancer dataset.
| Optimizer | No. of selected features | Selected indices | Selected labels |
|---|---|---|---|
| | 4 | 1, 2, 5, 6 | Clump Thickness, Uniformity of Cell Size, Single Epithelial Cell Size, Bare Nuclei |
| | 6 | 1, 3, 4, 6, 7, 9 | Clump Thickness, Uniformity of Cell Shape, Marginal Adhesion, Bare Nuclei, Bland Chromatin, Mitoses |
| | 7 | 1, 2, 3, 5, 6, 7, 9 | Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Mitoses |
| | 7 | 1, 2, 3, 5, 6, 7, 9 | Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Mitoses |
| | 5 | 1, 3, 6, 7, 9 | Clump Thickness, Uniformity of Cell Shape, Bare Nuclei, Bland Chromatin, Mitoses |
| | 8 | 1, 2, 4, 5, 6, 7, 8, 9 | Clump Thickness, Uniformity of Cell Size, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, Mitoses |
| | 5 | 1, 3, 6, 7, 9 | Clump Thickness, Uniformity of Cell Shape, Bare Nuclei, Bland Chromatin, Mitoses |
| | 5 | 1, 3, 5, 6, 7 | Clump Thickness, Uniformity of Cell Shape, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin |
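The indices in the table are 1-based positions in the Breastcancer feature list, and every row is consistent with the ordering below (1 Clump Thickness through 9 Mitoses). This sketch decodes a binary selection mask into the count, indices, and labels shown in the table; the function name and the mask representation are assumptions.

```python
# Feature names in index order; each index/label pair matches the table above.
BREASTCANCER_FEATURES = [
    "Clump Thickness", "Uniformity of Cell Size", "Uniformity of Cell Shape",
    "Marginal Adhesion", "Single Epithelial Cell Size", "Bare Nuclei",
    "Bland Chromatin", "Normal Nucleoli", "Mitoses",
]


def describe_selection(mask, names):
    """Return (count, 1-based indices, labels) for a binary selection mask."""
    if len(mask) != len(names):
        raise ValueError("mask and names must have the same length")
    indices = [j + 1 for j, bit in enumerate(mask) if bit]
    labels = [names[j - 1] for j in indices]
    return len(indices), indices, labels


if __name__ == "__main__":
    count, indices, labels = describe_selection(
        [1, 1, 0, 0, 1, 1, 0, 0, 0], BREASTCANCER_FEATURES)
    print(count, indices)
    print(", ".join(labels))
```

The mask in the usage example reproduces the first row of the table (4 features: indices 1, 2, 5, 6).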
An example of the features selected (and the resulting subset size) by each optimization algorithm on the HeartEW dataset.
| Optimizer | No. of selected features | Selected indices | Selected labels |
|---|---|---|---|
| | 5 | 3, 9, 10, 11, 12 | chest pain type, exercise-induced angina, oldpeak, slope of the peak exercise ST segment, number of major vessels |
| | 6 | 1, 3, 7, 9, 12, 13 | age, chest pain type, resting electrocardiographic results, exercise-induced angina, number of major vessels, defect type |
| | 8 | 1, 3, 4, 7, 8, 11, 12, 13 | age, chest pain type, resting blood pressure, resting electrocardiographic results, maximum heart rate, slope of the peak exercise ST segment, number of major vessels, defect type |
| | 8 | 1, 2, 3, 6, 8, 10, 11, 12 | age, sex, chest pain type, fasting blood sugar, maximum heart rate, oldpeak, slope of the peak exercise ST segment, number of major vessels |
| | 6 | 1, 2, 3, 7, 12, 13 | age, sex, chest pain type, resting electrocardiographic results, number of major vessels, defect type |
| | 6 | 2, 3, 5, 11, 12, 13 | sex, chest pain type, serum cholesterol, slope of the peak exercise ST segment, number of major vessels, defect type |
| | 7 | 3, 5, 8, 9, 10, 11, 12 | chest pain type, serum cholesterol, maximum heart rate, exercise-induced angina, oldpeak, slope of the peak exercise ST segment, number of major vessels |
| | 8 | 1, 2, 3, 6, 7, 10, 12, 13 | age, sex, chest pain type, fasting blood sugar, resting electrocardiographic results, oldpeak, number of major vessels, defect type |
Average Fisher score (F-score) over 20 independent runs for GA, PSO, ALO, and CALO with five different chaotic maps.
| Dataset | ALO | CALO-Log | CALO-Piec | CALO-Singer | CALO-Sinu | CALO-Tent | GA | PSO |
|---|---|---|---|---|---|---|---|---|
| 9.159 | 10.084 | 8.780 | 8.251 | 9.661 | 8.417 | 8.155 | ||
| 7.855 | 9.552 | 7.792 | 9.085 | 8.349 | 9.625 | 7.382 | ||
| 0.017 | 0.022 | 0.023 | 0.024 | 0.025 | 0.018 | 0.021 | ||
| 0.025 | 0.020 | 0.018 | 0.023 | 0.015 | 0.015 | |||
| 1.408 | 1.301 | 1.410 | 1.356 | 1.453 | 1.254 | 1.370 | ||
| 3.104 | 3.203 | 5.110 | 4.633 | 4.436 | 5.390 | 4.084 | ||
| 0.403 | 0.403 | 0.403 | 0.404 | 0.404 | 0.401 | 0.389 | ||
| 66.689 | 103.142 | 81.710 | 87.756 | 110.113 | 120.123 | 118.180 | ||
| 0.900 | 0.933 | 1.171 | 1.362 | 0.914 | 1.343 | 1.335 | ||
| 0.482 | 0.512 | 0.492 | 0.515 | 0.634 | 0.616 | 0.525 | ||
| 3.043 | 4.164 | 3.322 | 3.171 | 3.096 | 3.625 | 3.118 | ||
| 1.077 | 0.850 | 1.809 | 0.771 | 1.620 | 1.585 | 1.399 | ||
| 0.852 | 0.796 | 0.781 | 0.753 | 0.794 | 0.773 | 0.769 | ||
| 0.050 | 0.055 | 0.054 | 0.053 | 0.050 | 0.051 | 0.048 | ||
| 3.337 | 3.831 | 4.047 | 3.353 | 2.480 | 3.647 | 3.186 | ||
| 6.178 | 5.903 | 6.110 | 6.314 | 5.823 | 5.875 | 5.485 | ||
| 7.742 | 7.332 | 6.392 | 7.217 | 8.567 | 8.648 | 6.268 | ||
| 299.533 | 288.376 | 267.946 | 260.639 | 265.226 | 244.569 | 298.357 |
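The F-score values exceed 1 for several datasets, so they are presumably Fisher scores rather than F1 measures. The sketch below computes the standard per-feature Fisher score (between-class scatter of the class means divided by the pooled within-class variance) and, as an assumption, sums it over the selected features to score a subset; the record does not state how the subset score is aggregated, and the helper names are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(column, labels):
    """Per-feature Fisher score: sum_k n_k (mu_k - mu)^2 / sum_k n_k sigma_k^2."""
    overall = mean(column)
    between = within = 0.0
    for c in sorted(set(labels)):
        values = [v for v, y in zip(column, labels) if y == c]
        n_k = len(values)
        between += n_k * (mean(values) - overall) ** 2
        within += n_k * pvariance(values)
    if within == 0.0:
        # Perfectly tight classes: infinitely separable unless the means also coincide.
        return float("inf") if between > 0.0 else 0.0
    return between / within


def subset_fisher_score(X, labels, mask):
    """Assumed subset score: sum of per-feature Fisher scores of selected features."""
    return sum(fisher_score([row[j] for row in X], labels)
               for j, bit in enumerate(mask) if bit)


if __name__ == "__main__":
    X = [[1.0, 5.0], [2.0, 1.0], [9.0, 3.0], [10.0, 2.0]]
    y = [0, 0, 1, 1]
    print(subset_fisher_score(X, y, [1, 0]), subset_fisher_score(X, y, [0, 1]))
```

In the toy data the first feature separates the two classes cleanly and scores far higher than the second, which is the property the F-score table is meant to capture.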