Jinyan Li, Simon Fong, Raymond K Wong, Richard Millham, Kelvin K L Wong.
Abstract
Motivated by the high dimensionality of modern datasets, we propose a new method based on the Wolf Search Algorithm (WSA) for optimising feature selection. The proposed approach uses the natural strategy established by Charles Darwin; that is, 'It is not the strongest of the species that survives, but the most adaptable'. This means that in the evolution of a swarm, the elitists are motivated to quickly obtain more and better resources. The memory function helps the proposed method avoid repeatedly searching the worst positions, which makes the search more effective, while the binary strategy reduces the feature selection problem to a function optimisation problem. Furthermore, the wrapper strategy combines these strengthened wolves with an extreme learning machine classifier to find a sub-dataset with a reasonable number of features that maximises the accuracy of the global classification model. Experimental results on six public high-dimensional bioinformatics datasets demonstrate that the proposed method can outperform some conventional feature selection methods by up to 29% in classification accuracy, and reduce computational time by up to 99.81% compared with previous WSAs.
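
The binary and wrapper strategies described in the abstract can be illustrated with a minimal sketch. Each candidate solution is a 0/1 mask over the features, and its fitness is the accuracy of a classifier trained only on the selected columns. The paper uses an extreme learning machine as the classifier; the nearest-centroid classifier below is a toy stand-in chosen to keep the example dependency-free, and all function names and data values are hypothetical.

```python
from statistics import mean


def select_columns(rows, mask):
    """Keep only the columns whose mask bit is 1."""
    return [[v for v, bit in zip(row, mask) if bit] for row in rows]


def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Toy stand-in classifier (assumption: the paper uses an ELM instead)."""
    centroids = {}
    for label in sorted(set(train_y)):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*members)]

    def predict(x):
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    hits = sum(predict(x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


def wrapper_fitness(mask, train_x, train_y, test_x, test_y):
    """Fitness of a binary feature mask = accuracy on the selected features."""
    if not any(mask):
        return 0.0  # an empty feature subset cannot classify anything
    return nearest_centroid_accuracy(
        select_columns(train_x, mask), train_y,
        select_columns(test_x, mask), test_y,
    )


# Made-up data: feature 0 separates the classes, feature 1 is misleading noise.
train_x = [[0.0, 5.0], [0.2, 1.0], [1.0, 6.0], [0.9, 2.0]]
train_y = [0, 0, 1, 1]
test_x = [[0.1, 4.9], [0.8, 1.1]]
test_y = [0, 1]

fitness_informative_only = wrapper_fitness([1, 0], train_x, train_y, test_x, test_y)
fitness_all_features = wrapper_fitness([1, 1], train_x, train_y, test_x, test_y)
```

On this toy data the mask that drops the noisy feature scores higher than the mask that keeps every feature, which is the effect a wrapper-based search tries to exploit.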
Year: 2017 PMID: 28659577 PMCID: PMC5489518 DOI: 10.1038/s41598-017-04037-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Accuracy of all datasets with different methods (best results highlighted in bold).
| Accuracy | ALLAML | GLI_85 | Prostate_GE | SMK_CAN_187 | Colon | Leukemia |
|---|---|---|---|---|---|---|
| ELM | 0.61 | 0.66 | 0.53 | 0.5 | 0.52 | 0.6 |
| CHSAE | 0.65 | 0.64 | 0.57 | 0.50 | | 0.69 |
| INFORGAE | 0.68 | 0.67 | 0.61 | 0.50 | 0.63 | 0.72 |
| RFAE | 0.56 | 0.61 | 0.52 | 0.48 | 0.68 | 0.58 |
| PSO | 0.71 ± 0.03 | 0.69 ± 0.05 | 0.66 ± 0.04 | 0.53 ± 0.02 | 0.64 ± 0.06 | 0.68 ± 0.05 |
| BPSO | 0.70 ± 0.06 | 0.70 ± 0.06 | 0.7 ± 0.08 | 0.54 ± 0.04 | 0.66 ± 0.02 | 0.68 ± 0.05 |
| WSA | 0.72 ± 0.04 | 0.73 ± 0.04 | 0.66 ± 0.11 | 0.56 ± 0.04 | 0.66 ± 0.06 | 0.68 ± 0.05 |
| EBWSA | | | | | 0.68 ± 0.04 | |
Kappa statistics for all datasets with different methods (best results highlighted in bold).
| Kappa | ALLAML | GLI_85 | Prostate_GE | SMK_CAN_187 | Colon | Leukemia |
|---|---|---|---|---|---|---|
| ELM | 0.16 | 0.17 | 0.12 | −0.0174 | −0.06 | 0.06 |
| CHSAE | 0.37 | 0.19 | 0.14 | −0.0165 | | 0.37 |
| INFORGAE | 0.39 | 0.24 | 0.22 | −0.0149 | 0.23 | 0.43 |
| RFAE | 0.06 | 0.08 | 0.04 | −0.0427 | 0.28 | 0.1 |
| PSO | 0.36 ± 0.07 | 0.21 ± 0.12 | 0.32 ± 0.08 | 0.07 ± 0.05 | 0.21 ± 0.14 | 0.30 ± 0.11 |
| BPSO | 0.33 ± 0.12 | 0.25 ± 0.15 | 0.39 ± 0.17 | 0.08 ± 0.07 | 0.26 ± 0.06 | 0.30 ± 0.10 |
| WSA | 0.38 ± 0.08 | 0.30 ± 0.10 | 0.32 ± 0.21 | 0.12 ± 0.09 | 0.26 ± 0.14 | 0.30 ± .012 |
| EBWSA | | | | | 0.29 ± 0.14 | |
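
As a reminder of how the Kappa statistic relates to plain accuracy, the following sketch computes Cohen's kappa from a confusion matrix using the standard definition (observed agreement corrected for chance agreement). The function name and the example matrices are illustrative and are not taken from the paper.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: actual, cols: predicted)."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: sum over classes of (row marginal * column marginal).
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / total ** 2
    if expected == 1.0:
        return 0.0  # degenerate case: every sample and prediction in one class
    return (observed - expected) / (1.0 - expected)


kappa_good = cohen_kappa([[20, 5], [10, 15]])      # accuracy 0.70
kappa_chance = cohen_kappa([[10, 10], [10, 10]])   # accuracy 0.50
kappa_worse = cohen_kappa([[5, 15], [15, 5]])      # accuracy 0.25
```

A classifier at chance level on balanced classes gets a kappa of zero, and one that does worse than chance gets a negative kappa, which is how negative entries can arise in a table like the one above when accuracy is close to 0.5.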
Precision of all datasets with different methods (best results highlighted in bold).
| Precision | ALLAML | GLI_85 | Prostate_GE | SMK_CAN_187 | Colon | Leukemia |
|---|---|---|---|---|---|---|
| ELM | 0.68 | 0.34 | 0.46 | 0.33 | 0.62 | 0.74 |
| CHSAE | 0.82 | 0.42 | 0.56 | 0.34 | 0.73 | 0.68 |
| INFORGAE | 0.62 | | 0.66 | 0.37 | 0.65 | 0.7 |
| RFAE | 0.59 | 0.35 | 0.48 | 0.41 | | 0.66 |
| PSO | 0.76 ± 0.04 | 0.34 ± 0.09 | 0.62 ± 0.06 | 0.41 ± 0.11 | 0.72 ± 0.07 | 0.77 ± 0.07 |
| BPSO | 0.74 ± 0.07 | 0.38 ± 0.12 | 0.65 ± 0.15 | 0.48 ± 0.10 | 0.72 ± 0.02 | 0.76 ± 0.08 |
| WSA | 0.78 ± 0.05 | 0.40 ± 0.09 | 0.64 ± 0.13 | 0.47 ± 0.14 | 0.75 ± 0.05 | 0.75 ± 0.05 |
| EBWSA | | 0.47 ± 0.11 | | | | |
Recall of all datasets with different methods (best results highlighted in bold).
| Recall | ALLAML | GLI_85 | Prostate_GE | SMK_CAN_187 | Colon | Leukemia |
|---|---|---|---|---|---|---|
| ELM | 0.71 | 0.45 | 0.52 | 0.46 | 0.62 | 0.65 |
| CHSAE | 0.72 | 0.44 | 0.56 | 0.46 | | 0.82 |
| INFORGAE | 0.81 | 0.46 | 0.58 | 0.47 | 0.74 | 0.84 |
| RFAE | 0.62 | 0.36 | 0.51 | 0.45 | 0.73 | 0.68 |
| PSO | 0.78 ± 0.03 | 0.51 ± 0.11 | 0.67 ± 0.05 | 0.53 ± 0.03 | 0.72 ± 0.06 | 0.75 ± 0.04 |
| BPSO | 0.75 ± 0.04 | 0.53 ± 0.12 | 0.70 ± 0.08 | 0.53 ± 0.05 | 0.74 ± 0.03 | 0.76 ± 0.04 |
| WSA | 0.80 ± 0.03 | | 0.66 ± 0.11 | 0.55 ± 0.04 | 0.74 ± 0.06 | 0.76 ± 0.05 |
| EBWSA | | 0.59 ± 0.08 | | | 0.75 ± 0.04 | |
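
For reference, precision and recall follow the usual definitions in terms of true positives (TP), false positives (FP) and false negatives (FN). The sketch below uses made-up counts. How the paper aggregates these measures across classes and repeated runs is not stated in this record, so the helpers only cover the single-class case.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of actual positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts for one class.
example_precision = precision(tp=8, fp=2)
example_recall = recall(tp=8, fn=4)
```

With these counts, precision is 8/10 and recall is 8/12, which shows that the two measures can diverge even for the same classifier.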
Figure 1. Average classification accuracy for all datasets.
Figure 2. Average Kappa value of classification for all datasets.
Dimensions of all datasets with different methods.
| Dimension | ALLAML | GLI_85 | Prostate_GE | SMK_CAN_187 | Colon | Leukemia |
|---|---|---|---|---|---|---|
| ELM | 7130 | 22284 | 5967 | 19994 | 2001 | 7071 |
| CHSAE | 1026 | 192 | 1354 | 1867 | 28 | 1250 |
| INFORGAE | 2432 | 3562 | 2451 | 1727 | 220 | 1321 |
| RFAE | 5898 | 16780 | 4514 | 12525 | 1320 | 5569 |
| PSO | 3674.1 ± 1816.1 | 12456.5 ± 6205.8 | 2018.9 ± 1742.6 | 9505.2 ± 7589.8 | 1025.8 ± 687.3 | 2526.9 ± 2057.1 |
| BPSO | 3674.1 ± 566.5 | 12456.5 ± 5619.5 | 2018.9 ± 2258.6 | 9505.2 ± 5127.3 | 1025.8 ± 479.1 | 2526.9 ± 2339.6 |
| WSA | 3925.7 ± 3401.21 | 11174.3 ± 7988.38 | 2816.6 ± 2866.1 | 6406.4 ± 8237.1 | 1823.1 ± 368.3 | 5365.1 ± 2900.9 |
| EBWSA | 1098.4 ± 2179.4 | 8267.2 ± 5777.9 | 43.2 ± 59.4 | 25.3 ± 22.6 | 818.4 ± 554.6 | 972.5 ± 1554.1 |
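
To put the dimension counts in perspective, the mean number of features retained by EBWSA can be expressed as a percentage of each dataset's original feature count. The sketch below uses the EBWSA means from the table above and the feature counts listed for the bioinformatics datasets. Whether the paper's percentage figures use exactly this denominator is an assumption, and the variable names are hypothetical.

```python
# Original feature counts, as listed for the bioinformatics datasets.
original_features = {
    "ALLAML": 7129,
    "GLI_85": 22283,
    "Prostate_GE": 5966,
    "SMK_CAN_187": 19993,
    "Colon": 2000,
    "Leukemia": 7070,
}

# Mean number of features selected by EBWSA (from the dimensions table).
ebwsa_selected = {
    "ALLAML": 1098.4,
    "GLI_85": 8267.2,
    "Prostate_GE": 43.2,
    "SMK_CAN_187": 25.3,
    "Colon": 818.4,
    "Leukemia": 972.5,
}

# Percentage of the original features that remain after selection.
retained_pct = {
    name: 100.0 * ebwsa_selected[name] / original_features[name]
    for name in original_features
}

for name, pct in retained_pct.items():
    print(f"{name}: {pct:.2f}% of features retained")
```

Under this reading, EBWSA keeps well under 1% of the features on Prostate_GE and SMK_CAN_187, but roughly 41% on Colon.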
Figure 3. Average dimension (%) for all datasets.
Figure 4. Time consumed by each swarm intelligence algorithm (i.e. PSO, BPSO, WSA, and EBWSA).
Figure 5. Average accuracy, Kappa and dimensions (%) of the ELM, CHSAE, INFORGAE, RFAE, PSO, BPSO, WSA, and EBWSA methods.
Figure 6. Hunting behaviors of WSA (based on an example of a population of five wolves in an iteration).
Figure 7. Hunting behaviors of EBWSA (based on an example of a pack of five wolves in an iteration).
Figure 8. The variation of a wolf’s position during iteration.
Figure 9. Flow chart of the elitist process of WSA.
Figure 10. An example: variations of each wolf’s (weight × population) for a population of 5 within a maximum of 100 iterations in sub-figures (a–e); variations of the living environments are shown in the last sub-figure (f).
Bioinformatics datasets used in experiments.
| Dataset | Samples | Features |
|---|---|---|
| ALLAML | 72 | 7129 |
| GLI_85 | 85 | 22283 |
| Colon | 62 | 2000 |
| Prostate_GE | 102 | 5966 |
| SMK_CAN_187 | 187 | 19993 |
| Leukemia | 72 | 7070 |
| EBWSA Pseudo code |
|---|
| Objective function |
| Initialize the population of wolves |
| Define and initialize parameters: |
| The memory length |
| … |
| // different weights of different wolves in different iterations |
| … |
| 9. ELSE IF |
| … |
| 11. END IF |
| … |
| if yes, repeatedly generate a new location |
| 15. END IF |
| … |
| 18. END IF |
| 19. END FOR |
| … |
| 23. ELSE |
| … |
| 25. END IF |
| // Update the weight |
| … |
| 29. ELSE |
| … |
| 31. END IF |
| 32. END FOR |
| … |
| 40. END IF |
| 41. END FOR |
| … |
| 45. END WHILE |
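
Because most steps of the pseudo code are missing here, the following is only a loose sketch of the three ideas the abstract names: binary positions, a per-wolf memory of rejected positions, and an elitist step. It is not the authors' EBWSA. The single-bit-flip move, the rule that the worst wolf restarts from the best one, and every name and default value are assumptions made for illustration.

```python
import random
from collections import deque


def binary_search_with_memory(fitness, n_bits, n_wolves=5, iterations=100,
                              memory_length=10, seed=0):
    """Simplified binary swarm search (illustrative, not the published EBWSA)."""
    rng = random.Random(seed)
    wolves = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_wolves)]
    scores = [fitness(w) for w in wolves]
    # Each wolf remembers a bounded number of recently rejected positions.
    memories = [deque(maxlen=memory_length) for _ in range(n_wolves)]

    for _ in range(iterations):
        for i in range(n_wolves):
            candidate = wolves[i][:]
            candidate[rng.randrange(n_bits)] ^= 1  # flip one random bit
            key = tuple(candidate)
            if key in memories[i]:
                continue  # already known to be worse: skip the evaluation
            score = fitness(candidate)
            if score > scores[i]:
                wolves[i], scores[i] = candidate, score
            else:
                memories[i].append(key)
        # Assumed elitist step: the worst wolf restarts from the best position.
        best = max(range(n_wolves), key=scores.__getitem__)
        worst = min(range(n_wolves), key=scores.__getitem__)
        if worst != best:
            wolves[worst], scores[worst] = wolves[best][:], scores[best]
            memories[worst].clear()

    best = max(range(n_wolves), key=scores.__getitem__)
    return wolves[best], scores[best]


# Toy objective: number of bits that agree with a hidden target mask.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_fitness(mask):
    return sum(int(a == b) for a, b in zip(mask, TARGET))


best_mask, best_score = binary_search_with_memory(toy_fitness, n_bits=len(TARGET))
```

In a wrapper setting, `toy_fitness` would be replaced by a function that trains and scores a classifier on the features selected by the mask. The memory saves evaluations in that case, since each skipped candidate avoids one round of classifier training.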