Yongqiang Dai1, Lili Niu2, Linjing Wei1, Jie Tang1.
Abstract
High-dimensional biomedical data contain many irrelevant or weakly correlated features, which reduce the efficiency of disease diagnosis. This manuscript presents a feature selection method for high-dimensional biomedical data based on the chemotaxis foraging-shuffled frog leaping algorithm (BF-SFLA). The method improves on the shuffled frog leaping algorithm by introducing a chemokine operation and a balanced grouping strategy, which maintain the balance between global and local optimization and reduce the likelihood of the algorithm becoming trapped in a local optimum. To evaluate the proposed method's effectiveness, we used the K-NN (k-nearest neighbor) and C4.5 decision tree classification algorithms in a comparative analysis. We compared our proposed approach with improved genetic algorithms, particle swarm optimization, and the basic shuffled frog leaping algorithm. Experimental results showed that the feature selection method based on BF-SFLA obtained a better feature subset, improved classification accuracy, and shortened classification time.
Keywords: bacterial foraging algorithm; biomedical data; classification accuracy; feature selection; shuffled frog leaping algorithm
Year: 2022 PMID: 35509450 PMCID: PMC9058075 DOI: 10.3389/fnins.2022.854685
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 5.152
FIGURE 1. The simulation diagram of biological characteristics of SFLA.
FIGURE 2. The curve of function a.
Parameters of the benchmark functions.
| Function | Dimensions (n) | Search range | Optimal value | Accuracy (absolute error below) |
| f1 | 30/60/90 | [–5.12, 5.12] | 0 | 1 × 10^–16 |
| f2 | 30/60/90 | [–30, 30] | 0 | 1 × 10^1 |
| f3 | 30/60/90 | [–5.12, 5.12] | 0 | 1 × 10^1 |
| f4 | 30/60/90 | [–600, 600] | 0 | 1 × 10^–2 |
| f5 | 30/60/90 | [–32, 32] | 0 | 1 × 10^–7 |
| f6 | 30/60/90 | [–100, 100] | 0 | 1 × 10^0 |
| f7 | 30/60/90 | [–10, 10] | 0 | 1 × 10^–16 |
| f8(x) = Max{|xi|} | 30/60/90 | [–100, 100] | 0 | 1 × 10^–2 |
| f9 | 30/60/90 | [–100, 100] | 0 | 1 × 10^–16 |
| f10 | 30/60/90 | [–1.28, 1.28] | 0 | 1 × 10^–3 |
| f11 | 30/60/90 | [–500, 500] | –418.9829 | … |
| f12 | 30/60/90 | [–50, 50] | 0 | 1 × 10^–15 |
| f13 | 2 | [–5, 5] | –1.0316285 | 1 × 10^–3 |
| f14 | 2 | [–15, 15] | 0.398 | 1 × 10^–2 |
| f15 | 2 | [–100, 100] | –1 | 1 × 10^–4 |
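As a small illustration of how the accuracy column can be read, the sketch below implements the one benchmark whose formula survives in the table, f8(x) = Max{|xi|}, together with the criterion |actual value − optimal value| < tolerance. The function names `f8` and `reached_accuracy` are chosen here for illustration and are not taken from the paper.

```python
def f8(x):
    """Benchmark f8 from the table: the largest absolute coordinate of x."""
    return max(abs(xi) for xi in x)


def reached_accuracy(actual, optimal, tolerance):
    """True when |actual - optimal| is below the given accuracy threshold."""
    return abs(actual - optimal) < tolerance


# f8 attains its optimal value 0 at the origin (checked here for n = 30).
origin = [0.0] * 30
print(f8(origin))                                    # 0.0
print(reached_accuracy(f8(origin), 0.0, 1e-2))       # True
print(reached_accuracy(f8([0.5] * 30), 0.0, 1e-2))   # False
```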
FIGURE 3. The average contribution rate of each group updating P.
The experimental results under fixed iteration number.
| Function | SFLA Ave | SFLA Std | SFLA1 Ave | SFLA1 Std | SFLA2 Ave | SFLA2 Std | SFLA[25] Ave | SFLA[25] Std | SFLA[26] Ave | SFLA[26] Std | BF-SFLA Ave | BF-SFLA Std |
| f1 | 9.36E–01 | 8.66E–02 | 1.47E–33 | 4.92E–20 | 9.05E–01 | 6.68E–02 | 6.45E–03 | 3.12E–03 | 5.22E–03 | 7.32E–33 | 3.21E–18 | 5.02E–33 |
| f2 | 1.46E+02 | 6.59E+01 | 2.54E+01 | 1.71E+01 | 1.01E+02 | 6.08E+01 | 2.67E+02 | 5.28E+01 | 1.29E+02 | 3.05E–01 | 2.57E+01 | 4.63E–01 |
| f3 | 1.59E+01 | 4.39E+00 | 1.03E+00 | 3.19E+00 | 1.30E+01 | 4.11E+00 | 1.95E+01 | 7.07E+00 | 1.16E+01 | 1.56E+00 | 8.73E+00 | 2.03E+00 |
| f4 | 1.09E+00 | 4.57E–02 | 1.00E+00 | 1.60E–16 | 1.04E+00 | 3.11E–02 | 1.00E+00 | 2.14E–04 | 1.00E+00 | 1.93E–16 | 1.00E+00 | 2.14E–16 |
| f5 | 1.41E+00 | 5.68E–01 | 1.06E–14 | 2.62E–12 | 1.07E+00 | 5.26E–01 | 1.09E+00 | 6.62E–01 | 7.50E–01 | 3.18E–15 | 1.12E–12 | 7.68E–15 |
| f6 | 2.44E+01 | 7.33E+00 | 1.91E–01 | 3.37E+00 | 2.27E+01 | 8.54E+00 | 1.91E+01 | 4.46E+00 | 1.69E+01 | 2.88E–01 | 6.05E+00 | 3.88E–01 |
| f7 | 1.01E+00 | 1.08E–01 | 1.14E–17 | 6.22E–35 | 9.68E–01 | 3.66E–02 | 5.99E–01 | 1.50E–01 | 3.11E–01 | 2.66E–17 | 1.14E–35 | 2.77E–18 |
| f8 | 6.62E+00 | 9.98E–01 | 3.32E–04 | 3.53E–01 | 4.06E+00 | 9.44E–01 | 5.01E+00 | 7.40E–01 | 4.32E+00 | 2.54E–04 | 1.08E+00 | 4.47E–04 |
| f9 | 0.00E+00 | 0.00E+00 | 0.00E+00 | 0.00E+00 | 0.00E+00 | 0.00E+00 | 0.00E+00 | 0.00E+00 | 0.00E+00 | 0.00E+00 | 0.00E+00 | 0.00E+00 |
| f10 | 5.18E–01 | 1.57E–01 | 1.02E–03 | 8.29E–04 | 5.02E–01 | 9.63E–02 | 2.16E–03 | 8.03E–04 | 2.90E–03 | 3.30E–04 | 2.41E–03 | 3.99E–04 |
| f11 | –3.05E+03 | 4.01E+02 | –4.61E+03 | 6.48E+02 | –3.01E+03 | 4.20E+02 | –5.09E+03 | 5.67E+02 | –4.77E+03 | 3.55E+02 | –4.94E+03 | 2.48E+02 |
| f12 | 9.30E–01 | 6.60E–02 | 1.92E–32 | 7.25E–15 | 8.09E–01 | 8.40E–02 | 4.90E–02 | 7.51E–02 | 5.74E–02 | 2.06E–33 | 1.11E–17 | 1.40E–32 |
| f13 | –7.68E–01 | 2.01E–01 | –1.03E+00 | 2.51E–04 | –7.87E–01 | 2.53E–01 | –1.03E+00 | 0.00E+00 | –1.03E+00 | 1.05E–03 | –1.03E+00 | 9.72E–04 |
| f14 | 3.98E–01 | 1.70E–01 | 3.98E–01 | 0.00E+00 | 3.98E–01 | 1.78E–01 | 3.98E–01 | 3.98E–01 | 3.98E–01 | 0.00E+00 | 3.98E–01 | 0.00E+00 |
| f15 | –8.77E–01 | 6.40E–02 | –1.00E+00 | 3.36E–03 | –8.63E–01 | 7.23E–02 | –1.00E+00 | 0.00E+00 | –9.98E–01 | 2.69E–04 | –1.00E+00 | 7.35E–04 |
The experimental results under fixed optimization accuracy.
| Function | SFLA Ave(%) | SFLA AveN | SFLA1 Ave(%) | SFLA1 AveN | SFLA2 Ave(%) | SFLA2 AveN | SFLA[25] Ave(%) | SFLA[25] AveN | SFLA[26] Ave(%) | SFLA[26] AveN | BF-SFLA Ave(%) | BF-SFLA AveN |
| f1 | 0% | – | 23% | 407 | 0% | – | 100% | 261 | 100% | 283 | 100% | 248 |
| f2 | 0% | – | 93% | 298 | 0% | – | 100% | 121 | 97% | 260 | 100% | 94 |
| f3 | 23% | 385 | 100% | 145 | 43% | 306 | 100% | 140 | 47% | 295 | 100% | 126 |
| f4 | 0% | – | 90% | 201 | 0% | – | 97% | 149 | 80% | 339 | 93% | 138 |
| f5 | 0% | – | 100% | 267 | 0% | – | 100% | 266 | 0% | – | 100% | 249 |
| f6 | 3% | 482 | 100% | 208 | 7% | 437 | 100% | 192 | 7% | 424 | 100% | 182 |
| f7 | 0% | – | 100% | 344 | 0% | – | 0% | – | 0% | – | 100% | 342 |
| f8 | 0% | – | 100% | 120 | 0% | – | 0% | – | 0% | – | 100% | 120 |
| f9 | 100% | 128 | 100% | 23 | 100% | 127 | 100% | 65 | 100% | 76 | 100% | 20 |
| f10 | 100% | 93 | 100% | 32 | 100% | 72 | 100% | 23 | 100% | 119 | 100% | 26 |
| f11 | 63% | 231 | 93% | 66 | 73% | 220 | 63% | 220 | 70% | 231 | 100% | 170 |
| f12 | 0% | – | 93% | 232 | 0% | – | 0% | – | 0% | – | 97% | 192 |
| f13 | 70% | 148 | 100% | 31 | 40% | 144 | 100% | 72 | 100% | 59 | 100% | 77 |
| f14 | 100% | 20 | 100% | 16 | 100% | 20 | 100% | 19 | 100% | 16 | 100% | 15 |
| f15 | 77% | 220 | 80% | 133 | 87% | 217 | 87% | 182 | 100% | 117 | 100% | 74 |
The symbol “–” indicates that the fixed optimization accuracy could not be achieved within 500 iterations.
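One plausible reading of the two columns is that Ave(%) is the share of independent runs that reach the target accuracy and AveN is the mean iteration count over those successful runs. The sketch below works under that assumption; the function `summarize_runs` and the run data are hypothetical and do not come from the paper.

```python
def summarize_runs(iterations_to_target):
    """Summarize one algorithm on one function.

    iterations_to_target holds one entry per run: the iteration at which
    the target accuracy was reached, or None if it was never reached
    within the iteration cap.
    """
    successes = [n for n in iterations_to_target if n is not None]
    rate = 100.0 * len(successes) / len(iterations_to_target)
    # AveN is undefined when no run succeeds (shown as "–" in the table).
    mean_iters = sum(successes) / len(successes) if successes else None
    return rate, mean_iters


runs = [120, None, 95, 130, None]  # hypothetical outcomes of five runs
rate, mean_iters = summarize_runs(runs)
print(f"{rate:.0f}%", "–" if mean_iters is None else round(mean_iters))  # 60% 115
```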
Mean of each index under a fixed number of iterations.
| Attribute | SFLA | SFLA1 | SFLA2 | SFLA[25] | SFLA[26] | BF-SFLA |
| AVE(Ave) | 6.48E+02 | 5.32E+02 | 6.47E+02 | 5.20E+02 | 5.31E+02 | |
| AVE(Std) | 3.21E+01 | 4.48E+01 | 3.30E+01 | 4.22E+01 | 2.38E+01 | |
The best value is in bold.
Mean of each index under fixed optimization accuracy.
| Attribute | SFLA | SFLA1 | SFLA2 | SFLA[25] | SFLA[26] | BF-SFLA |
| AVE(Ave(%)) | 35.73% | 91.47% | 36.67% | 76.47% | 57.21% | |
| AVE(AveN) | 323.62 | 139.85 | 311.00 | 217.54 | 282.77 | |
The best value is in bold.
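The AVE(Ave(%)) entries can be checked against the per-function results, assuming each entry is the unweighted mean of the 15 success rates Ave(%) for f1 to f15. The sketch below does this for the SFLA and SFLA1 columns; the helper name `column_mean` is illustrative only.

```python
# Per-function success rates Ave(%) for f1..f15, copied from the
# fixed-accuracy results table for two of the algorithms.
success_rate = {
    "SFLA":  [0, 0, 23, 0, 0, 3, 0, 0, 100, 100, 63, 0, 70, 100, 77],
    "SFLA1": [23, 93, 100, 90, 100, 100, 100, 100, 100, 100, 93, 93, 100, 100, 80],
}


def column_mean(values):
    """Unweighted arithmetic mean of one algorithm's column."""
    return sum(values) / len(values)


for name, rates in success_rate.items():
    print(f"{name}: {column_mean(rates):.2f}%")
# SFLA: 35.73%
# SFLA1: 91.47%
```

Both values agree with the corresponding AVE(Ave(%)) cells.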
Addition and subtraction of discrete binary solutions.
| X1 | X2 | X1-X2 | X1+X2 |
| (1, 0, 1, 0) | (0, 1, 0, 0) | (0, 1, 1, 0) | (1, 1, 1, 0) |
FIGURE 4. The feature selection flow chart.
The format of datasets.
| Data set | Instances | Attributes | Classes | K-NN (k = 5) | C4.5 |
| ColonTumor | 62 | 2,000 | 2 | 73.87 (0.24) | 73.87 (0.24) |
| DLBCL-Outcome | 58 | 7,129 | 2 | 47.46 (0.51) | 47.46 (0.51) |
| ALL-AML-Leukemia | 106 | 7,130 | 2 | 88.39 (0.13) | 88.39 (0.13) |
| Lung cancer-Ontario | 39 | 2,880 | 2 | 56.38 (0.34) | 56.38 (0.34) |
| DLBCL-Stanford | 47 | 4,026 | 2 | 75.51 (0.26) | 75.51 (0.26) |
| Lung cancer-Harvard2 | 181 | 12,534 | 2 | 94.38 (0.04) | 94.38 (0.04) |
| Nervous-System | 60 | 7,129 | 2 | 54.63 (0.42) | 54.63 (0.42) |
| Lung cancer-Harvard1 | 203 | 12,600 | 5 | 87.56 (0.09) | 87.56 (0.09) |
| DLBCL-NIH | 160 | 7,400 | 2 | 47.23 (0.46) | 47.23 (0.46) |
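To make the evaluation step concrete, the following is a minimal sketch of how a candidate feature subset, encoded as a binary mask over the attributes, could be scored by K-NN classification accuracy. It is an illustration under stated assumptions, not the authors' implementation: the leave-one-out protocol, the squared Euclidean distance, the toy data, and all function names are choices made here.

```python
from collections import Counter


def apply_mask(sample, mask):
    """Keep only the attributes whose mask bit is 1."""
    return [value for value, bit in zip(sample, mask) if bit == 1]


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training samples nearest to query."""
    ranked = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def subset_accuracy(samples, labels, mask, k=5):
    """Leave-one-out K-NN accuracy using only the selected attributes."""
    if not any(mask):
        return 0.0  # an empty subset cannot classify anything
    reduced = [apply_mask(s, mask) for s in samples]
    correct = 0
    for i, query in enumerate(reduced):
        train_x = reduced[:i] + reduced[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if knn_predict(train_x, train_y, query, k) == labels[i]:
            correct += 1
    return correct / len(samples)


# Toy data (hypothetical): attribute 0 separates the classes, attribute 1 is noise.
samples = [[0.0, 5.0], [0.1, -3.0], [0.2, 4.0], [0.3, -6.0],
           [1.0, 5.5], [1.1, -2.5], [1.2, 4.5], [1.3, -5.5]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(subset_accuracy(samples, labels, [1, 0], k=3))  # 1.0
```

A search algorithm such as BF-SFLA would call a scoring function of this kind for each candidate mask; selecting only the noisy attribute (mask `[0, 1]`) scores lower on this toy data.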
Running results of the four algorithms.
| Data set | Algorithm | Ave(%) | Max(%) | Min(%) | Std | AveN | S |
| ColonTumor | BF-SFLA | | | 90.23 | | | 6 |
| | SFLA | 89.02 | 91.66 | | 2.69 | 36.16 | 6 |
| | IGA | 86.67 | 88.33 | 83.33 | 2.36 | 38.24 | 6 |
| | IPSO | 87.67 | 91.67 | 85.01 | 3.65 | 49.40 | 6 |
| DLBCL-outcome | BF-SFLA | | | | | | 8 |
| | SFLA | 69.21 | 75.20 | 65.33 | 3.84 | 51.43 | 8 |
| | IGA | 64.33 | 70.06 | 60.00 | 5.21 | 27.62 | 8 |
| | IPSO | 71.11 | 76.67 | 63.33 | 5.34 | 51.24 | 8 |
| ALL-AML-leukemia | BF-SFLA | | | | | | 8 |
| | SFLA | 97.27 | 99.09 | 94.52 | 1.93 | 45.65 | 8 |
| | IGA | 95.09 | 97.27 | 92.73 | 1.65 | 30.63 | 8 |
| | IPSO | 99.01 | 100.00 | 98.18 | 1.04 | 113.5 | 8 |
| LungCancer-ontario | BF-SFLA | | | | | | 8 |
| | SFLA | 70.22 | 85.12 | 62.54 | 4.84 | 18.46 | 8 |
| | IGA | 65.51 | 75.21 | 57.52 | 4.18 | 10.22 | 8 |
| | IPSO | 70.00 | 77.50 | 57.50 | 4.89 | 56.25 | 8 |
| DLBCL-stanford | BF-SFLA | | | | | | 8 |
| | SFLA | 80.01 | 82.01 | 78.04 | 2.06 | 25.67 | 8 |
| | IGA | 78.80 | 84.02 | 72.02 | 4.83 | 18.43 | 8 |
| | IPSO | 78.10 | 80.02 | 74.11 | 3.19 | 49.50 | 8 |
| LungCancer-Harvard2 | BF-SFLA | | | | | | 8 |
| | SFLA | 98.02 | 98.81 | 96.66 | 1.06 | 75.25 | 8 |
| | IGA | 96.67 | 98.33 | 95.56 | 1.11 | 52.80 | 8 |
| | IPSO | 96.36 | 99.98 | 93.34 | 2.33 | 98.31 | 8 |
| Nervous-system | BF-SFLA | | | | | | 8 |
| | SFLA | 76.08 | 80.05 | 71.67 | 3.64 | 57.86 | 8 |
| | IGA | 71.67 | 81.67 | 61.67 | 7.16 | 30.25 | 8 |
| | IPSO | 72.67 | 78.33 | 63.33 | 6.07 | 45.03 | 8 |
| LungCancer-harvard1 | BF-SFLA | 90.03 | 91.12 | 88.49 | | | 9 |
| | SFLA | | | | 1.23 | 54.71 | 9 |
| | IGA | 85.90 | 87.50 | 84.09 | 1.29 | 31.81 | 9 |
| | IPSO | 91.90 | 94.14 | 90.04 | 1.51 | 44.20 | 9 |
| DLBCL-NIH | BF-SFLA | | 56.84 | | | | 8 |
| | SFLA | 54.16 | | 50.63 | 3.13 | 30.75 | 8 |
| | IGA | 56.02 | 61.24 | 51.78 | 3.66 | 32.12 | 8 |
| | IPSO | 55.11 | 65.02 | 47.51 | 9.01 | 35.11 | 8 |
The best value is in bold.
Average index values over the nine datasets.
| Attributes | BF-SFLA | SFLA | IGA | IPSO |
| AVE(Ave) | | 80.57 | 77.55 | 80.21 |
| AVE(Std) | | 2.60 | 3.49 | 4.11 |
| AVE(AveN) | | 43.99 | 30.23 | 60.28 |
The best value is in bold.
FIGURE 5. The variation trend of classification accuracy and feature subset of ColonTumor.
FIGURE 6. The variation trend of classification accuracy and feature subset of DLBCL-Outcome.
FIGURE 7. The variation trend of classification accuracy and feature subset of ALL-AML-Leukemia.
FIGURE 9. The variation trend of classification accuracy and feature subset of DLBCL-Stanford.
FIGURE 10. The variation trend of classification accuracy and feature subset of LungCancer-Harvard2.
FIGURE 12. The variation trend of classification accuracy and feature subset of LungCancer-Harvard1.
FIGURE 13. The variation trend of classification accuracy and feature subset of DLBCL-NIH.