Yu Zhou, Junhao Kang, Xiao Zhang.
Abstract
Recent discretization-based feature selection methods use entropy-based cut-points to integrate discretization and feature selection into a single stage for high-dimensional data, with clear advantages. However, current methods usually treat features independently and ignore the interaction between features with cut-points and those without, which results in information loss. In this paper, we propose a cooperative coevolutionary algorithm based on the genetic algorithm (GA) and particle swarm optimization (PSO) that searches simultaneously for feature subsets with and without entropy-based cut-points. For the features with cut-points, a ranking mechanism controls the probability of mutation and crossover in the GA. For the features without cut-points, a binary-coded PSO updates the indices of the selected features. Experimental results on 10 real datasets verify the effectiveness of our algorithm in classification accuracy compared with several state-of-the-art competitors.
Keywords: cooperative coevolutionary; entropy-based cut-points; feature selection; genetic algorithms; particle swarm optimization
Year: 2020; PMID: 33286385; PMCID: PMC7517144; DOI: 10.3390/e22060613
Source DB: PubMed; Journal: Entropy (Basel); ISSN: 1099-4300; Impact factor: 2.524
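
To make the notion of an entropy-based cut-point from the abstract concrete, the following is a minimal sketch that picks, for a single numeric feature, the binary split with the highest information gain. It is an illustration only: the function names (`entropy`, `best_cut_point`, `discretize`) are hypothetical, and the abstract does not say which acceptance criterion the authors apply to candidate cut-points, so none is implemented here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut_point(values, labels):
    """Return (cut, gain) for the binary split with the highest information gain.

    Candidate cuts are midpoints between consecutive distinct sorted values.
    Returns (None, 0.0) when no split improves on the unsplit entropy,
    e.g. when the feature has a single distinct value.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    ys = [y for _, y in pairs]
    n = len(pairs)
    base = entropy(ys)
    best_cut, best_gain = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary between identical values
        left, right = ys[:i], ys[i:]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_cut, best_gain = (lo + hi) / 2, gain
    return best_cut, best_gain


def discretize(values, cut):
    """Map each value to 0 (at or below the cut) or 1 (above the cut)."""
    return [0 if v <= cut else 1 for v in values]


# Toy example (made-up data): two classes separated between 2.0 and 3.0.
cut, gain = best_cut_point([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])
bins = discretize([1.0, 2.0, 3.0, 4.0], cut)
```

A feature for which `best_cut_point` returns `None` has no usable cut-point under this simplified criterion; in the setting of the abstract, such features would belong to the group handled without cut-points.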
Figure 1. Bit-flip mutation. Each gene of an individual has a certain probability to perform the flip operation.
Figure 2. Discrete crossover. Two individuals X and Y are selected as parents, and genes are selected from the parents to produce offspring.
Figure 3. The particle representation of EPSO.
Figure 4. The particle representation of PPSO.
Figure 5. Overview of our proposed method.
Figure 6. The particle's representation of our algorithm.
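
The two GA operators named in the captions of Figures 1 and 2 can be sketched as below. This is a hedged illustration, not the authors' implementation: it assumes binary genes, a fixed mutation probability `p_mut`, and a uniform 50/50 choice between parents for each gene. The ranking mechanism that, according to the abstract, controls the mutation and crossover probabilities is not reproduced. The function names are hypothetical.

```python
import random


def bit_flip_mutation(individual, p_mut, rng=random):
    """Flip each binary gene independently with probability p_mut."""
    return [1 - g if rng.random() < p_mut else g for g in individual]


def discrete_crossover(x, y, rng=random):
    """Build one offspring by copying each gene from parent x or parent y.

    Assumption: each parent is chosen with probability 0.5 per gene.
    """
    if len(x) != len(y):
        raise ValueError("parents must have the same length")
    return [xi if rng.random() < 0.5 else yi for xi, yi in zip(x, y)]


# Usage with a seeded generator so runs are repeatable.
rng = random.Random(0)
parent_x = [1, 0, 1, 1, 0, 0]
parent_y = [0, 0, 1, 0, 1, 1]
child = discrete_crossover(parent_x, parent_y, rng)
mutant = bit_flip_mutation(child, p_mut=0.1, rng=rng)
```

Passing the generator explicitly keeps the operators free of global state, which makes them easier to test.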
Datasets.
| Dataset | # of features | # of samples (S) | # of classes (C) | % in smallest class | % in largest class |
|---|---|---|---|---|---|
| SRBCT | 2308 | 83 | 4 | 13 | 35 |
| DLBCL | 5469 | 77 | 2 | 25 | 75 |
| 9Tumor | 5726 | 60 | 9 | 3 | 15 |
| Leukemia1 | 5327 | 72 | 3 | 13 | 53 |
| Leukemia2 | 11,225 | 72 | 3 | 28 | 39 |
| Brain Tumor1 | 5920 | 90 | 5 | 4 | 67 |
| Brain Tumor2 | 10,367 | 50 | 4 | 14 | 30 |
| Prostate | 10,509 | 102 | 2 | 49 | 51 |
| Lung Cancer | 12,600 | 203 | 5 | 3 | 68 |
| 11Tumor | 12,533 | 174 | 11 | 4 | 16 |
S is the number of samples; C is the number of classes; the last two columns give the percentage of samples in the smallest and the largest class.
Parameter setting.
| Parameter | Setting |
|---|---|
| Population | Number of features / 20 (capped at 300, no less than 100) |
| Maximum iteration | 100 |
| c1 and c2 | 1.49445 |
| | 0.5 |
| Lmin | 0.25 |
| Lmax | 0.5 |
| Stopping criterion | The fitness value of |
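
The population rule and the acceleration coefficients in the parameter table can be sketched in code as follows. Only `c1 = c2 = 1.49445` and the population bounds come from the table. Everything else is an assumption for illustration: floor division for "features/20", the inertia weight `W`, the velocity clamp `v_max`, and the sigmoid transfer function of the standard binary PSO, which may differ from the authors' update rule.

```python
import math
import random

C1 = C2 = 1.49445  # acceleration coefficients from the parameter table
W = 0.7298         # inertia weight: an assumption, not given in the table


def population_size(n_features):
    """Number of features / 20, kept within [100, 300] (floor division assumed)."""
    return max(100, min(300, n_features // 20))


def bpso_step(position, velocity, pbest, gbest, rng=random, v_max=6.0):
    """One update of a standard binary PSO particle (sigmoid transfer).

    position, pbest, gbest are lists of 0/1 bits; velocity is a list of floats.
    Returns the new (position, velocity).
    """
    new_pos, new_vel = [], []
    for x, v, pb, gb in zip(position, velocity, pbest, gbest):
        v = W * v + C1 * rng.random() * (pb - x) + C2 * rng.random() * (gb - x)
        v = max(-v_max, min(v_max, v))      # clamp to avoid saturating the sigmoid
        prob = 1.0 / (1.0 + math.exp(-v))   # probability that the bit becomes 1
        new_vel.append(v)
        new_pos.append(1 if rng.random() < prob else 0)
    return new_pos, new_vel


# Usage: one step for a 5-bit particle with a seeded generator.
rng = random.Random(42)
pos, vel = bpso_step([0, 1, 0, 1, 0], [0.0] * 5, [1, 1, 0, 0, 0], [1, 0, 0, 1, 0], rng)
```

With the feature counts from the dataset table, this rule gives 115 individuals for SRBCT (2308 features) and hits the cap of 300 for every dataset with 6000 or more features.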
Experimental results. DMLP, deep multilayer perceptron; FS, feature selection; CCB-DFS, coevolutionary discretization-based FS using bare-bone PSO. (Bold indicates the best value).
| Dataset | Method | # of Features | Best (%) | Avg (std) | S |
|---|---|---|---|---|---|
| SRBCT | Full | 2308.0 | 87.08 | | + |
| | DMLP | 128 | | 97.72 (1.67) | + |
| | PSO-FS | 150.0 | 97.50 | 91.31 (2.71) | + |
| | EPSO | 137.3 | | 96.89 (1.64) | + |
| | PPSO | 108.5 | | 95.78 (1.96) | + |
| | CCB-DFS | 168.1 | | | = |
| | CC-DFS | 220.6 | | 98.90 (1.05) | |
| DLBCL | Full | 5469.0 | 83.00 | | + |
| | DMLP | 128 | | | − |
| | PSO-FS | 101.8 | 96.67 | 80.03 (6.13) | + |
| | EPSO | 42.8 | 94.17 | 85.18 (5.46) | + |
| | PPSO | 44.0 | 94.17 | 86.22 (3.58) | + |
| | CCB-DFS | 85.6 | 96.67 | 90.28 (3.29) | = |
| | CC-DFS | 77.6 | 96.67 | 90.37 (3.37) | |
| 9Tumor | Full | 5726.0 | 36.67 | | + |
| | DMLP | 128 | 55.97 | 48.48 (5.61) | + |
| | PSO-FS | 955.0 | 55.00 | 45.95 (4.93) | + |
| | EPSO | 138.5 | | 58.22 (3.12) | − |
| | PPSO | 118.1 | | | − |
| | CCB-DFS | 314.2 | 58.20 | 52.66 (3.64) | = |
| | CC-DFS | 278.0 | 61.48 | 53.78 (3.59) | |
| Leukemia1 | Full | 5327.0 | 72.08 | | + |
| | DMLP | 128 | | 91.64 (3.99) | + |
| | PSO-FS | 150.0 | 92.22 | 81.60 (4.72) | + |
| | EPSO | 135.9 | 95.56 | 93.37 (1.83) | = |
| | PPSO | 80.4 | 95.42 | | = |
| | CCB-DFS | 126.8 | 96.67 | 94.14 (1.35) | = |
| | CC-DFS | 166.4 | 97.50 | 94.02 (1.45) | |
| Leukemia2 | Full | 11,225.0 | 89.44 | | + |
| | DMLP | 128 | 96.94 | 93.48 (1.75) | + |
| | PSO-FS | 150.0 | 93.89 | 86.11 (3.97) | + |
| | EPSO | 139.9 | 94.44 | 89.93 (2.79) | + |
| | PPSO | 86.7 | | | = |
| | CCB-DFS | 346.8 | | 95.17 (2.00) | = |
| | CC-DFS | 131.7 | | 95.57 (2.09) | |
| Brain Tumor1 | Full | 5920.0 | 72.08 | | + |
| | DMLP | 128 | 82.57 | 73.76 (3.69) | + |
| | PSO-FS | 317.3 | 78.75 | 71.00 (3.06) | + |
| | EPSO | 150.7 | 79.17 | 72.79 (3.48) | + |
| | PPSO | 73.4 | 82.08 | 74.40 (3.67) | + |
| | CCB-DFS | 189.5 | 80.58 | 75.90 (2.49) | = |
| | CC-DFS | 187.4 | | | |
| Brain Tumor2 | Full | 10,367.0 | 62.50 | | + |
| | DMLP | 128 | 81.81 | 73.93 (3.25) | = |
| | PSO-FS | 417.9 | 82.08 | 69.11 (5.89) | + |
| | EPSO | 152.8 | 83.75 | 70.76 (5.30) | + |
| | PPSO | 66.7 | 74.58 | 68.75 (4.24) | + |
| | CCB-DFS | 298.6 | | | − |
| | CC-DFS | 138.7 | 83.75 | 72.22 (5.01) | |
| Prostate | Full | 10,509.0 | 85.33 | | + |
| | DMLP | 128 | 83.40 | 74.25 (3.21) | + |
| | PSO-FS | 777.4 | 90.33 | 85.20 (2.35) | + |
| | EPSO | 54.9 | 90.33 | 83.74 (3.55) | + |
| | PPSO | 65.6 | | | − |
| | CCB-DFS | 129.8 | 92.50 | 89.06 (2.08) | = |
| | CC-DFS | 180.6 | 92.17 | 88.52 (1.63) | |
| 11Tumor | Full | 12,533.0 | 71.42 | | + |
| | DMLP | | 79.36 | 73.69 (3.19) | + |
| | PSO-FS | 1638.8 | 86.07 | 82.62 (1.70) | + |
| | EPSO | 149.9 | 83.68 | 79.29 (2.11) | + |
| | PPSO | 167.0 | 83.20 | 76.83 (2.91) | + |
| | CCB-DFS | 1422.4 | | | = |
| | CC-DFS | 1890.2 | 87.77 | 84.84 (2.47) | |
| Lung Cancer | Full | 12,600.0 | 78.05 | | + |
| | DMLP | 128 | 81.21 | 73.78 (3.73) | + |
| | PSO-FS | 686.2 | 85.73 | 81.72 (2.08) | = |
| | EPSO | 150.8 | 85.58 | 80.60 (2.42) | = |
| | PPSO | 203.0 | 84.11 | 79.38 (3.26) | + |
| | CCB-DFS | 433.9 | 86.92 | | = |
| | CC-DFS | 155.6 | | 81.40 (3.10) | |
Figure 7. With reset operation and without reset operation.
Running time (s). W, with the reset operation; W/R, without the reset operation.
| Dataset | Time (W) | Time (W/R) |
|---|---|---|
| SRBCT | 157.1 | 352.2 |
| DLBCL | 397.1 | 782.3 |
| 9Tumor | 260.4 | 637.9 |
| Leukemia1 | 475.1 | 778.9 |
| Leukemia2 | 1465.4 | 2283.8 |
| Brain Tumor1 | 909.8 | 1445.1 |
| Brain Tumor2 | 1591.6 | 2041.4 |
| Prostate | 884.4 | 1334.8 |
| 11Tumor | 2062.2 | 4885.1 |
| Lung Cancer | 4756.9 | 12,075.0 |
Figure 8. Comparison of the running time.
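
The effect of the reset operation on running time is easier to read as a ratio. The short sketch below computes, for each dataset, the time without the reset operation divided by the time with it, using only the values from the running-time table. The names `running_time` and `speedup` are hypothetical.

```python
# (time with reset, time without reset) in seconds, copied from the table above.
running_time = {
    "SRBCT": (157.1, 352.2),
    "DLBCL": (397.1, 782.3),
    "9Tumor": (260.4, 637.9),
    "Leukemia1": (475.1, 778.9),
    "Leukemia2": (1465.4, 2283.8),
    "Brain Tumor1": (909.8, 1445.1),
    "Brain Tumor2": (1591.6, 2041.4),
    "Prostate": (884.4, 1334.8),
    "11Tumor": (2062.2, 4885.1),
    "Lung Cancer": (4756.9, 12075.0),
}


def speedup(with_reset, without_reset):
    """How many times faster the run with the reset operation is."""
    return without_reset / with_reset


speedups = {name: speedup(w, wr) for name, (w, wr) in running_time.items()}

for name, ratio in sorted(speedups.items(), key=lambda item: item[1]):
    print(f"{name:<13} {ratio:.2f}x")
```

On these figures the run with the reset operation is faster on every dataset, with the smallest ratio on Brain Tumor2 and the largest on Lung Cancer.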