Gayathri Nagarajan, L D Dhinesh Babu.
Abstract
Feature selection has gained importance because of the voluminous nature of data. Owing to the computational complexity of wrapper approaches, the poor performance of filtering techniques, and the classifier dependency of embedded approaches, hybrid approaches are more commonly used in feature selection. Hybrid approaches use filtering metrics to reduce the computational complexity of wrapper algorithms and have been shown to yield better feature subsets. Though filtering metrics select features based on their significance, most of them are unstable and biased towards the metric used. Moreover, the choice of filtering metric depends largely on the distribution and type of the data. Biomedical datasets contain features with different distributions and types, which complicates the choice of filtering metric. We address this problem by proposing a stable filtering method based on rank aggregation in a hybrid feature selection model with an Improved Squirrel Search Algorithm for biomedical datasets. Our proposed model is compared with other well-known and state-of-the-art methods, and the results show that it achieves superior classification accuracy and computational time. The robustness of the model is demonstrated by experiments on nine biomedical datasets with three different classifiers.
Keywords: Biomedical data classification; Hybrid feature selection; Linguistic fuzzy modeling; Rank aggregation
Year: 2021 PMID: 34094808 PMCID: PMC8170065 DOI: 10.1007/s13721-021-00313-7
Source DB: PubMed Journal: Netw Model Anal Health Inform Bioinform ISSN: 2192-6670
Fig. 1 Graphical abstract of H-FBRA+ISSA
Fig. 2 Sample fuzzy system–triangular membership function
Fig. 3 Sample defuzzification of the features in our proposed model–centroid method
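The captions of Fig. 2 and Fig. 3 name two standard fuzzy-logic building blocks: a triangular membership function and centroid defuzzification. The following is a minimal sketch of those two generic operations only. The breakpoints `0.2, 0.5, 0.8`, the sampling grid, and the function names are illustrative assumptions, not values or code from the paper, and the sketch does not reproduce how the proposed model combines them.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangle with feet at a and c and peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge


def centroid(xs, memberships):
    """Discrete centroid defuzzification: membership-weighted mean of xs."""
    total = sum(memberships)
    if total == 0:
        raise ValueError("all memberships are zero; centroid is undefined")
    return sum(x * m for x, m in zip(xs, memberships)) / total


# Hypothetical universe of discourse: 101 evenly spaced points on [0, 1].
xs = [i / 100 for i in range(101)]
# Hypothetical fuzzy set "medium" with assumed breakpoints.
mu = [triangular(x, 0.2, 0.5, 0.8) for x in xs]
crisp = centroid(xs, mu)  # crisp value obtained from the fuzzy set
```

Because the assumed triangle is symmetric around its peak, the centroid falls at the peak; an asymmetric triangle would shift it towards the longer side.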
Dataset description
| Dataset no | Dataset | Source | Rows | Features | Classes | Feature types | Sample proportion |
|---|---|---|---|---|---|---|---|
| 1 | Pancan | UCI repository | 801 | 20531 | 5 | Continuous | 17–18–18–10–37% |
| 2 | Cancer Gene | Kaggle | 72 | 7129 | 2 | Continuous | 65–35% |
| 3 | Lymphoma | llmpp.nih.gov | 96 | 4026 | 9 | Continuous | 10–11–48–9–2–2–6–4–6% |
| 4 | Colon | Kaggle | 62 | 2000 | 2 | Continuous | 65–35% |
| 5 | MicroRNAs | Kaggle | 133 | 1928 | 2 | Continuous | 92–8% |
| 6 | Chronic kidney | UCI repository | 400 | 24 | 2 | Mixed | 62–38% |
| 7 | Spine disease | Kaggle | 310 | 12 | 2 | Continuous | 68–32% |
| 8 | Heart disease | UCI repository | 270 | 13 | 2 | Mixed | 55–45% |
| 9 | Cancer disease | Kaggle | 569 | 31 | 2 | Continuous | 37–63% |
Fig. 4 Weight matrix of the datasets based on the three different measures: information, correlation, and distance
Fig. 5 Final feature ranking by FBRA
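To make the idea of rank aggregation concrete, the sketch below combines per-metric feature rankings into a single ranking by averaging ranks. This is a deliberately simple stand-in and is not the fuzzy-based FBRA procedure of the paper. The three score lists borrow the measure families named in the Fig. 4 caption, but all numbers and function names are hypothetical.

```python
from statistics import mean


def ranks_from_scores(scores):
    """Rank features by descending score; the best feature gets rank 1."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, feature_index in enumerate(order, start=1):
        ranks[feature_index] = position
    return ranks


def aggregate_mean_rank(score_lists):
    """Average each feature's rank across all filter metrics (lower is better)."""
    per_metric_ranks = [ranks_from_scores(scores) for scores in score_lists]
    n_features = len(score_lists[0])
    return [mean(r[i] for r in per_metric_ranks) for i in range(n_features)]


def top_k(mean_ranks, k):
    """Indices of the k features with the lowest aggregated rank."""
    return sorted(range(len(mean_ranks)), key=lambda i: mean_ranks[i])[:k]


# Hypothetical scores for four features under three filter metrics.
information = [0.90, 0.10, 0.50, 0.30]
correlation = [0.80, 0.20, 0.70, 0.10]
distance = [0.60, 0.40, 0.90, 0.05]

mean_ranks = aggregate_mean_rank([information, correlation, distance])
selected = top_k(mean_ranks, k=2)  # reduced candidate set for a wrapper search
```

In a hybrid pipeline, a reduced set such as `selected` would then be handed to the wrapper stage, so the expensive search runs over fewer candidate features.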
Comparison of H-FBRA + ISSA with Individual filtering metrics—SVM
| Approaches | DS1 | DS2 | DS3 | DS4 | DS5 | DS6 | DS7 | DS8 | DS9 |
|---|---|---|---|---|---|---|---|---|---|
| FS | 81 [38] | 83.95 [20] | 75.58 [19] | 87.2 [17] | 84.48 [16] | 87.83 [14] | 75.59 [8] | 78.48 [7] | 82.8 [16] |
| CHS | 82 [40] | 82.78 [22] | 74.86 [21] | 86.84 [19] | 83.76 [15] | 87.38 [10] | 74.06 [6] | 78.21 [8] | 82.35 [17] |
| ReliefF | 83.2 [39] | 83.77 [27] | 75.4 [18] | 85.76 [22] | 85.2 [17] | 86.3 [15] | 74.69 [4] | 79.74 [10] | 83.43 [20] |
| IG | 84.2 [37] | 85.84 [27] | 74.32 [18] | 88.1 [20] | 84.75 [18] | 87.02 [9] | 75.5 [5] | 76.68 [8] | 86.04 [17] |
| MRMR | 85 [38] | 85.66 [26] | 75.58 [16] | 87.65 [18] | 85.29 [19] | 89 [8] | 75.77 [7] | 79.74 [9] | 85.86 [16] |
| CFS | 81.9 [40] | 85.39 [21] | 75.49 [17] | 88.1 [19] | 84.3 [17] | 89.7 [9] | 74.78 [8] | 78.66 [10] | 84.15 [15] |
| MRMD | 84.2 [35] | 85.94 [18] | 76.3 [20] | 85.85 [20] | 85.56 [16] | 88.28 [12] | 73.89 [4] | 80.01 [12] | 84.78 [17] |
| SFR | 83.7 [42] | 85.48 [17] | 75.85 [18] | 87.47 [22] | 84.48 [17] | 87.47 [17] | 75.5 [5] | 79.56 [9] | 83.79 [15] |
| NQF | 85.1 [35] | 84.49 [19] | 74.86 [17] | 86.66 [20] | 85.2 [15] | 88.19 [12] | 73.79 [6] | 74.88 [7] | 82.62 [14] |
| CCI | 84 [40] | 86.02 [15] | 75.31 [16] | 87.65 [19] | 83.4 [14] | 86.84 [11] | 75.77 [7] | 78.75 [9] | 83.34 [17] |
| RMI | 85 [39] | 85.48 [17] | 75.85 [18] | 86.84 [17] | 84.75 [13] | 87.29 [8] | 74.78 [7] | 79.47 [8] | 84.33 [16] |
| FBRA |
The bold values represent the best solution corresponding to the evaluated metric
Comparison of H-FBRA + ISSA with Individual filtering metrics—RF
| Approaches | DS1 | DS2 | DS3 | DS4 | DS5 | DS6 | DS7 | DS8 | DS9 |
|---|---|---|---|---|---|---|---|---|---|
| FS | 82.5 [38] | 84.58 [20] | 73.87 [19] | 86.39 [17] | 83.58 [16] | 86.48 [14] | 72.8 [8] | 78.66 [7] | 88.83 [16] |
| CHS | 83 [40] | 83.59 [22] | 74.68 [21] | 87.38 [19] | 84.3 [15] | 87.47 [10] | 74.87 [6] | 79.11 [8] | 84.78 [17] |
| ReliefF | 83.5 [39] | 84.94 [27] | 73.6 [18] | 85.85 [22] | 84.39 [17] | 85.4 [15] | 73.52 [4] | 80.19 [10] | 88.02 [20] |
| IG | 84.2 [37] | 85.66 [27] | 74.14 [18] | 86.39 [20] | 83.67 [18] | 86.66 [9] | 74.69 [5] | 78.57 [8] | 86.85 [17] |
| MRMR | 85.1 [38] | 85.93 [26] | 74.86 [16] | 87.2 [18] | 83.49 [19] | 87.83 [8] | 75.41 [7] | 80.1 [9] | 87.39 [16] |
| CFS | 83.1 [40] | 85.48 [21] | 74.59 [17] | 86.84 [19] | 84.3 [17] | 87.65 [9] | 76.22 [8] | 79.65 [10] | 85.86 [15] |
| MRMD | 84.8 [35] | 85.21 [18] | 74.05 [20] | 85.85 [20] | 84.57 [16] | 85.58 [12] | 74.6 [4] | 80.64 [12] | 86.58 [17] |
| SFR | 84.1 [42] | 85.48 [17] | 74.59 [18] | 86.39 [22] | 84.3 [17] | 87.29 [17] | 72.89 [5] | 80.1 [9] | 86.49 [15] |
| NQF | 85.9 [35] | 84.85 [19] | 74.87 [17] | 85.76 [20] | 83.76 [15] | 87.47 [12] | 74.06 [6] | 78.66 [7] | 87.57 [14] |
| CCI | 84.1 [40] | 86.02 [15] | 74.86 [16] | 86.48 [19] | 84.48 [14] | 88.1 [11] | 74.6 [7] | 78.84 [9] | 88.47 [17] |
| RMI | 85 [39] | 85.48 [17] | 75.31 [18] | 87.2 [17] | 84.12 [13] | 87.29 [8] | 75.5 [7] | 80.28 [8] | 89.28 [16] |
| FBRA |
The bold values represent the best solution corresponding to the evaluated metric
Comparison of H-FBRA + ISSA with Individual filtering metrics–DNN
| Approaches | DS1 | DS2 | DS3 | DS4 | DS5 | DS6 | DS7 | DS8 | DS9 |
|---|---|---|---|---|---|---|---|---|---|
| FS | 83 [38] | 83.66 [20] | 77.38 [19] | 87.83 [17] | 83.2 [16] | 81.8 [14] | 71.27 [8] | 73.89 [7] | 80.19 [16] |
| CHS | 84.1 [40] | 82.58 [22] | 76.57 [21] | 87.74 [19] | 82.38 [15] | 82.07 [10] | 67.58 [6] | 75.78 [8] | 80.91 [17] |
| ReliefF | 84.1 [39] | 84.02 [27] | 77.38 [18] | 86.48 [22] | 84.2 [17] | 83.6 [15] | 70.46 [4] | 78.48 [10] | 81.09 [20] |
| IG | 84.8 [37] | 85.15 [27] | 76.66 [18] | 87.83 [20] | 84.14 [18] | 82.88 [9] | 72.17 [5] | 77.67 [8] | 82.17 [17] |
| MRMR | 85.3 [38] | 85.38 [26] | 78.64 [16] | 87.11 [18] | 83.56 [19] | 84.86 [8] | 71.63 [7] | 78.3 [9] | 81.81 [16] |
| CFS | 84.6 [40] | 84.2 [21] | 77.65 [17] | 87.2 [19] | 84.2 [17] | 85.4 [9] | 70.91 [8] | 76.77 [10] | 81 [15] |
| MRMD | 85 [35] | 85.03 [18] | 78.46 [20] | 86.57 [20] | 82.74 [16] | 84.59 [12] | 71.18 [4] | 77.58 [12] | 80.91 [17] |
| SFR | 84.7 [42] | 84.66 [17] | 77.74 [18] | 87.56 [22] | 84.2 [17] | 83.6 [17] | 70.19 [5] | 76.86 [9] | 78.48 [15] |
| NQF | 86 [35] | 84.12 [19] | 78.55 [17] | 86.84 [20] | 85.1 [15] | 84.23 [12] | 69.74 [6] | 78.3 [7] | 80.28 [14] |
| CCI | 85 [40] | 85.92 [15] | 78.46 [16] | 87.38 [19] | 83.1 [14] | 82.79 [11] | 70.46 [7] | 77.58 [9] | 81.18 [17] |
| RMI | 86 [39] | 84.38 [17] | 77.56 [18] | 87.29 [17] | 83.75 [13] | 84.5 [8] | 67.85 [7] | 76.86 [8] | 81.45 [16] |
| FBRA |
The bold values represent the best solution corresponding to the evaluated metric.
Comparison of H-FBRA + ISSA with other aggregation approaches—SVM
| Approaches | DS1 | DS2 | DS3 | DS4 | DS5 | DS6 | DS7 | DS8 | DS9 |
|---|---|---|---|---|---|---|---|---|---|
| Borda | 84 [38] | 84.9 [29] | 76.2 [21] | 87 [23] | 85.1 [16] | 87 [15] | 73 [10] | 79 [10] | 85.6 [20] |
| RRA | 84.9 [38] | 85.6 [28] | 76 [26] | 87.8 [21] | 84.3 [19] | 87.6 [17] | 72 [9] | 80 [12] | 86.1 [18] |
| SA | 85 [38] | 86.1 [26] | 77 [19] | 87.2 [20] | 86 [15] | 88.1 [19] | 76 [8] | 79.3 [9] | 84 [19] |
| MVFS | 85.5 [38] | 87.3 [27] | 77.8 [20] | 88 [21] | 87 [14] | 89 [17] | 74 [6] | 80 [8] | 86 [18] |
| FBRA |
The bold values represent the best solution corresponding to the evaluated metric
Comparison of H-FBRA + ISSA with other aggregation approaches–RF
| Approaches | DS1 | DS2 | DS3 | DS4 | DS5 | DS6 | DS7 | DS8 | DS9 |
|---|---|---|---|---|---|---|---|---|---|
| Borda | 84 [38] | 86 [29] | 75.9 [21] | 85.8 [23] | 87 [16] | 87 [15] | 73 [10] | 80 [10] | 88.2 [20] |
| RRA | 85.5 [38] | 87.1 [28] | 76.5 [26] | 86.3 [21] | 85.4 [19] | 86 [17] | 72.4 [9] | 80.5 [12] | 89 [18] |
| SA | 85 [38] | 86.9 [26] | 77 [19] | 87 [20] | 86.3 [15] | 88.1 [19] | 76.2 [8] | 81 [9] | 86.3 [19] |
| MVFS | 86 [38] | 87.6 [27] | 78 [20] | 87.4 [21] | 87.1 [14] | 88.4 [17] | 74 [6] | 81.5 [8] | 89.4 [18] |
| FBRA |
The bold values represent the best solution corresponding to the evaluated metric
Comparison of H-FBRA + ISSA with other aggregation approaches—DNN
| Approaches | DS1 | DS2 | DS3 | DS4 | DS5 | DS6 | DS7 | DS8 | DS9 |
|---|---|---|---|---|---|---|---|---|---|
| Borda | 85 [38] | 87.19 [29] | 78.5 [21] | 87.4 [23] | 86.6 [16] | 85 [15] | 71 [10] | 79 [10] | 80 [20] |
| RRA | 85.6 [38] | 87.18 [28] | 79 [26] | 88 [21] | 87.1 [19] | 84.4 [17] | 68 [9] | 78.4 [12] | 82 [18] |
| SA | 85.9 [38] | 87.6 [26] | 78.9 [19] | 87.9 [20] | 87.19 [15] | 85.3 [19] | 69 [8] | 79 [9] | 81.3 [19] |
| MVFS | 86 [38] | 86.9 [27] | 80 [20] | 88.1 [21] | 87.5 [14] | 85.1 [17] | 70 [6] | 79.2 [8] | 82 [18] |
| FBRA |
The bold values represent the best solution corresponding to the evaluated metric
Fig. 6 Classification accuracy of optimization algorithms on different classifiers
Fig. 7 Number of features selected by optimization algorithms on different classifiers
Comparison of H-FBRA + ISSA with state-of-the-art methods—classification accuracy
| Dataset | Jain et al. ( | Apolloni et al. ( | Bonilla-Huerta et al. ( | H-FBRA + ISSA | Original number of features |
|---|---|---|---|---|---|
| 1 | 94 [47] | 95 [67] | 97 [94] | | 20531 |
| 2 | 99.3 [ | 96 [9] | 95.8 [11] | | 7129 |
| 3 | 99.4 [3] | 100 [24] | | | 4026 |
| 4 | 98.7 [7] | 93.8 [ | 99 [9] | | 2000 |
| 5 | 94 [8] | 97 [9] | 98 [ | | 1928 |
| 6 | 99 [11] | 99.2 [ | 100 [12] | | 24 |
| 7 | 91 [6] | 90 [ | 92 [8] | | 12 |
| 8 | 87 [ | 90 [9] | 89.1 [6] | | 13 |
| 9 | 92.4 [ | 95.6 [15] | | | 31 |
The bold values represent the best solution corresponding to the evaluated metric
Comparison of average execution time in seconds
| Dataset | Jain et al. ( | Apolloni et al. ( | Bonilla-Huerta et al. ( | H-FBRA + ISSA |
|---|---|---|---|---|
| 1 | 5900 | 4200 | 3900 | 1058 |
| 2 | 208 | 195 | 177 | 121 |
| 3 | 347 | 280 | 217 | 201 |
| 4 | 102 | 95 | 67 | 32 |
| 5 | 123 | 108 | 72 | 35 |
| 6 | 32 | 27 | 19 | 11 |
| 7 | 16 | 13 | 9 | 8 |
| 8 | 18 | 11 | 8 | 6 |
| 9 | 31 | 21 | 16 | 13 |
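The execution-time comparison is easier to read as speedup ratios. The short sketch below divides each baseline's time by the H-FBRA + ISSA time for the same dataset, using only the values from the table above; the variable and function names are illustrative.

```python
# Average execution time in seconds, copied from the table above.
# Column order: Jain et al., Apolloni et al., Bonilla-Huerta et al., H-FBRA + ISSA.
times = {
    1: (5900, 4200, 3900, 1058),
    2: (208, 195, 177, 121),
    3: (347, 280, 217, 201),
    4: (102, 95, 67, 32),
    5: (123, 108, 72, 35),
    6: (32, 27, 19, 11),
    7: (16, 13, 9, 8),
    8: (18, 11, 8, 6),
    9: (31, 21, 16, 13),
}


def speedups(row):
    """Ratio of each baseline time to the H-FBRA + ISSA time (last entry)."""
    *baselines, proposed = row
    return [baseline / proposed for baseline in baselines]


speedup_table = {dataset: speedups(row) for dataset, row in times.items()}

for dataset, ratios in speedup_table.items():
    formatted = ", ".join(f"{ratio:.2f}x" for ratio in ratios)
    print(f"Dataset {dataset}: {formatted}")
```

On these figures the gap is widest on dataset 1, the highest-dimensional one, and narrows on the small datasets, where all methods finish within seconds.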
Subset of features selected by H-FBRA + ISSA
| Dataset no | Dataset | Subset of features |
|---|---|---|
| 1 | Pancan | ZNF193, SF3A3, UBE2Z, G0, AGTPBP1, B4GALT3, RIPPLY1, DLGAP3, LOC399815, MAK16, NDUFS6, LPIN2, MBOAT2, ADRA1B, RNF185, ARL2BP, RIPPLY 1 |
| 2 | Cancer Gene | 4847 (Zyxin), 804 (Macmarcks), 1882 (CST3 Cystatin C), 6855 (TCF3 Transcription factor 3), 6919 (RNS2 Ribonuclease 2), 2348 (ACADM Acyl-Coenzyme A dehydrogenase), 461, 1962, 5552, 2131 |
| 3 | Lymphoma | 390, 3066 |
| 4 | Colon | 377 (H.sapiens mRNA for GCAP-II/uroguanylin precursor), 765, 590, 384, 266, 1058 (H.sapiens a-L-fucosidase gene), 1541, 1873 (Human MXI1 mRNA, complete cds.) |
| 5 | MicroRNAs | miR-505-5p, miR-125b-5p, miR-21-5p, miR-96-5p, miR-3613-3p, miR-4668-5p, miR-4516, miR-3656, miR-4488, miR-5704 |
| 6 | Chronic kidney | Specific gravity, albumin, blood glucose random, potassium, haemoglobin, packed cell volume, red blood cell count, hypertension, serum creatinine, anaemia |
| 7 | Spine disease | Degree spon, pelvic incidence, lumbar angle, pelvic rad, pelvic tilt, cervical tilt |
| 8 | Heart disease | Age, fasting blood sugar, Trestbps, Cholesterol, Thal, Slope, Cp |
| 9 | Cancer disease | Perimeter-largest-worst, area-largest-worst, smoothness-largest-worst, compactness-mean, concave-points-mean, texture-largest-worst, texture-mean, symmetry-mean, concavity-largest-worst, concavity-mean, fractal-dimension-largest-worst, perimeter-mean, radius-se, area-mean |