Halah AlMazrua, Hala AlShamlan.
Abstract
This paper presents two novel swarm intelligence algorithms for gene selection, HHO-SVM and HHO-KNN. Both are based on Harris Hawks Optimization (HHO), one combined with support vector machines (SVM) and the other with k-nearest neighbors (k-NN). In both algorithms, the goal is to find a small gene subset that classifies samples with high accuracy. Each algorithm has two phases. The first phase performs redundancy analysis and relevance calculation to obtain an accurate gene set and to cope with high-dimensional data. The second phase solves the gene selection problem by applying SVM or k-NN with leave-one-out cross-validation. The two algorithms were evaluated on six microarray data sets. A comparison with several known algorithms indicates that both perform well in terms of classification accuracy and the number of selected genes.
Keywords: Harris Hawks Optimization; bio-inspired algorithms; bioinformatics; cancer classification; evolutionary algorithm; feature selection; gene expression; k-nearest neighbor; support vector machine
Year: 2022 PMID: 36236372 PMCID: PMC9572901 DOI: 10.3390/s22197273
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. HHO-SVM and HHO-KNN flowchart.
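The second phase described in the abstract scores a candidate gene subset by leave-one-out cross-validation (LOOCV). The sketch below illustrates that idea for the k-NN case using only the Python standard library. It is not the authors' implementation: the function names `knn_predict` and `loocv_accuracy`, the Euclidean distance, the default `k=3`, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training samples closest to `query`.

    Euclidean distance is an assumption; the source does not name a metric.
    """
    neighbours = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(X, y, gene_idx, k=3):
    """Leave-one-out accuracy using only the gene columns listed in `gene_idx`."""
    if not gene_idx:
        return 0.0  # an empty subset cannot classify anything
    reduced = [[row[j] for j in gene_idx] for row in X]
    correct = 0
    for i, held_out in enumerate(reduced):
        # Train on every sample except sample i, then test on sample i.
        train_X = reduced[:i] + reduced[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, held_out, k) == y[i]:
            correct += 1
    return correct / len(reduced)


# Toy data (hypothetical): 6 samples x 3 genes. Gene 0 separates the classes,
# gene 1 is misleading, gene 2 is noise.
X = [
    [0.10, 5.0, 0.9],
    [0.20, 1.0, 0.1],
    [0.15, 3.0, 0.5],
    [0.90, 4.0, 0.2],
    [1.00, 2.0, 0.8],
    [0.95, 6.0, 0.4],
]
y = [0, 0, 0, 1, 1, 1]

informative = loocv_accuracy(X, y, gene_idx=[0], k=1)
misleading = loocv_accuracy(X, y, gene_idx=[1], k=1)
```

In an HHO-style wrapper, a score like `loocv_accuracy` would act as the fitness of each hawk's candidate subset; swapping `knn_predict` for an SVM classifier would give the HHO-SVM variant.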
Description of microarray data sets.
| Data Set | No. Total Genes | No. Samples | No. Classes |
|---|---|---|---|
| Colon Tumor | 2000 | 62 | 2 |
| Lung Cancer | 7129 | 96 | 2 |
| Leukemia2 | 7129 | 72 | 3 |
| Leukemia3 | 7129 | 72 | 2 |
| SRBCT | 2308 | 83 | 4 |
| Lymphoma | 4026 | 66 | 3 |
Parameter settings for HHO-SVM and HHO-KNN.
| Parameter | Value |
|---|---|
| Dimension | No. genes in data set |
| No. iterations (Max_iter) | 100 |
| Lower bound (LB) | 0 |
| Upper bound (UB) | 1 |
| No. Harris’s hawks (SearchAgents_no) | 10 |
| No. runs (m) | 30 |
| | 7 |
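The parameter table gives each hawk a position with one coordinate per gene, bounded by LB = 0 and UB = 1. The source does not say how such a continuous position is turned into a gene subset, so the sketch below assumes a simple rule: a gene is selected when its coordinate exceeds 0.5. The function names and the threshold are hypothetical and only illustrate how the listed settings could fit together.

```python
import random

# Values taken from the parameter table.
LB, UB = 0.0, 1.0
SEARCH_AGENTS_NO = 10
MAX_ITER = 100


def init_population(n_agents, dim, rng):
    """Draw each hawk's position uniformly from [LB, UB] in every dimension."""
    return [[rng.uniform(LB, UB) for _ in range(dim)] for _ in range(n_agents)]


def clip(position):
    """Pull a position back inside [LB, UB] after an update step."""
    return [min(UB, max(LB, v)) for v in position]


def decode(position, threshold=0.5):
    """Assumed decoding rule: keep gene j when its coordinate exceeds threshold."""
    return [j for j, v in enumerate(position) if v > threshold]


rng = random.Random(0)
# Dimension equals the number of genes, e.g. 2000 for the colon data set.
hawks = init_population(SEARCH_AGENTS_NO, 2000, rng)
subsets = [decode(h) for h in hawks]
```

Under this assumed encoding, a random initial hawk selects roughly half of the genes, which is one reason a filtering phase that shrinks the dimension first is helpful.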
Colon data set results.
| Algorithm | No. Genes | Best | Average | Worst |
|---|---|---|---|---|
| HHO-KNN | 20 | 82.26% | 75.92% | 64.52% |
| | 16 | 90.32% | 74.42% | 53.23% |
| | 10 | 88.71% | 74.65% | 61.29% |
| | 5 | 83.87% | 68.12% | 53.23% |
| | 2 | 79.03% | 64.01% | 48.39% |
| HHO-SVM | 20 | 85.48% | 76.21% | 62.90% |
| | 16 | 87.10% | 74.48% | 56.45% |
| | 10 | 90.32% | 73.94% | 51.61% |
| | 5 | 83.87% | 69.02% | 56.45% |
| | 2 | 74.19% | 64.16% | 51.61% |
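The Best, Average, and Worst columns presumably summarize classification accuracy over the repeated runs (m = 30 in the parameter settings). With leave-one-out cross-validation, each accuracy is the number of correctly classified samples divided by the number of samples, so on the 62-sample colon data set 82.26% corresponds to 51 of 62 correct. The sketch below shows that arithmetic; the function names `summarize_runs` and `as_percent` are hypothetical.

```python
from statistics import mean


def summarize_runs(accuracies):
    """Best, average, and worst accuracy over a list of independent runs."""
    return {
        "best": max(accuracies),
        "average": mean(accuracies),
        "worst": min(accuracies),
    }


def as_percent(correct, n_samples):
    """Format a LOOCV hit count as a percentage with two decimals."""
    return f"{100 * correct / n_samples:.2f}%"


# Three made-up run accuracies, only to show the shape of the summary.
summary = summarize_runs([0.5, 0.75, 1.0])

# Colon data set has 62 samples: 51 correct and 40 correct respectively.
best_20_genes = as_percent(51, 62)
worst_20_genes = as_percent(40, 62)
```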
Leukemia2 data set results.
| Algorithm | No. Genes | Best | Average | Worst |
|---|---|---|---|---|
| HHO-KNN | 16 | 66.67% | 61.25% | 54.17% |
| | 11 | 94.44% | 64.29% | 52.78% |
| | 6 | 73.61% | 64.12% | 55.56% |
| | 2 | 72.22% | 61.62% | 48.61% |
| HHO-SVM | 16 | 68.06% | 64.95% | 62.50% |
| | 11 | 97.22% | 66.17% | 58.33% |
| | 6 | 72.22% | 65.09% | 58.33% |
| | 2 | 72.22% | 65.32% | 59.72% |
Leukemia3 data set results.
| Algorithm | No. Genes | Best | Average | Worst |
|---|---|---|---|---|
| HHO-KNN | 30 | 69.44% | 59.07% | 50.00% |
| | 25 | 90.28% | 58.52% | 45.83% |
| | 20 | 63.89% | 55.42% | 44.44% |
| | 15 | 59.72% | 52.69% | 43.06% |
| | 5 | 61.11% | 52.82% | 43.06% |
| HHO-SVM | 30 | 69.44% | 55.97% | 37.50% |
| | 25 | 84.72% | 58.19% | 50.00% |
| | 20 | 65.28% | 53.43% | 38.89% |
| | 15 | 63.89% | 53.33% | 38.89% |
| | 5 | 66.67% | 55.00% | 40.28% |
Lung data set results.
| Algorithm | No. Genes | Best | Average | Worst |
|---|---|---|---|---|
| HHO-KNN | 19 | 98.96% | 92.38% | 83.33% |
| | 10 | 100.00% | 93.70% | 86.46% |
| | | | 91.57% | 84.38% |
| | 1 | 97.92% | 89.65% | 85.42% |
| HHO-SVM | 19 | 95.83% | 90.25% | 89.58% |
| | 10 | 100.00% | 90.90% | 87.50% |
| | | | 92.09% | 86.46% |
| | 1 | 97.92% | 89.76% | 86.46% |
Lymphoma data set results.
| Algorithm | No. Genes | Best | Average | Worst |
|---|---|---|---|---|
| HHO-KNN | 12 | 98.48% | 92.31% | 80.30% |
| | 10 | 100.00% | 92.22% | 77.27% |
| | 3 | 100.00% | 77.23% | 60.61% |
| | | | 73.63% | 60.61% |
| | 1 | 75.76% | 66.36% | 56.06% |
| HHO-SVM | 12 | 100.00% | 93.43% | 80.30% |
| | 10 | 100.00% | 92.66% | 72.73% |
| | 3 | 100.00% | 78.35% | 65.15% |
| | 2 | 96.97% | 74.01% | 65.15% |
| | 1 | 81.82% | 70.61% | 66.67% |
SRBCT data set results.
| Algorithm | No. Genes | Best | Average | Worst |
|---|---|---|---|---|
| HHO-KNN | 30 | 83.13% | 60.64% | 39.76% |
| | 29 | 91.57% | 56.43% | 37.35% |
| | 20 | 83.13% | 53.41% | 39.76% |
| | 10 | 75.90% | 47.83% | 28.92% |
| | 5 | 59.04% | 42.41% | 30.12% |
| HHO-SVM | 30 | 90.36% | 59.92% | 33.73% |
| | 29 | 92.77% | 57.35% | 40.96% |
| | 20 | 83.13% | 53.21% | 34.94% |
| | 10 | 78.31% | 45.26% | 21.69% |
| | 5 | 69.88% | 39.92% | 24.10% |
Comparison between the proposed selection methods and previous methods in terms of the number of selected genes and accuracy. Each cell reports accuracy with the number of selected genes in parentheses.
| Algorithms | Colon | Lung | Leukemia2 | Leukemia3 | Lymphoma | SRBCT |
|---|---|---|---|---|---|---|
| HHO-KNN | 90.32%(16) | | 94.44%(11) | 90.28%(25) | | 91.57%(29) |
| HHO-SVM | 90.32%(10) | | 97.22%(11) | 84.72%(25) | 100%(3) | 92.77%(29) |
| HS-GA | 95.9%(20) | - | 97.5%(20) | - | - | - |
| FF-SVM | 92.7%(22) | | 99.5%(11) | - | 92.6%(19) | 97.5%(14) |
| GBC | 98.38%(20) | - | | - | - | - |
| MIM-mMFA | | 100%(20) | 100%(6) | | 100%(4) | 100%(23) |
| QMFOA | 100%(27) | 100%(20) | 100%(32) | 100%(30) | - | 100%(23) |
| BQPSO | 83.59%(46) | 100%(46) | 93.1%(48) | - | 100%(49) | - |
| PCC-GA | 91.94%(29) | 97.54%(42) | 100%(35) | - | 100%(39) | |