Loganathan Meenachi, Srinivasan Ramakrishnan.
Abstract
Cancer is one of the deadliest human diseases, and a patient is more likely to survive if the disease is diagnosed in its early stages. In this Letter, the authors propose a genetic search fuzzy rough (GSFR) feature selection algorithm, which hybridises the evolutionary sequential genetic search technique with the fuzzy rough set to select features. The genetic operators selection, crossover and mutation are applied to generate subsets of features from the dataset. Each generated subset is evaluated with the modified dependency function of the fuzzy rough set, which uses the positive and boundary regions and acts as the fitness function. The generation and evaluation of feature subsets continue until the best subset is found, and this subset is used to develop the classification model. The selected features are applied to different classifiers, among which the fuzzy-rough nearest neighbour (FRNN) classifier performs best in terms of classification accuracy and computation time. Hence, the FRNN is used to compare the performance of existing feature selection algorithms against the proposed GSFR feature selection algorithm. The results of the proposed GSFR feature selection algorithm proved to be more precise than those of the other feature selection algorithms.
Keywords: FRNN classifier; GSFR feature selection algorithm; boundary regions; cancer; classification accuracy; computation time; deadly diseases; evolutionary sequential genetic search technique-based cancer classification; feature selection; fitness function; fuzzy rough nearest neighbour classifier; fuzzy rough set; fuzzy set theory; generated subset; genetic algorithms; genetic operator; genetic search fuzzy rough feature selection algorithm; mathematical operators; medical computing; modified dependency function; pattern classification; positive regions; rough set theory; search problems
Year: 2018 PMID: 30155265 PMCID: PMC6103784 DOI: 10.1049/htl.2018.5041
Source DB: PubMed Journal: Healthc Technol Lett ISSN: 2053-3713
Description of the datasets
| | Breast | DLBCL | SRBCT | Leukaemia | EEG | Gisette |
|---|---|---|---|---|---|---|
| features | 9217 | 4027 | 2309 | 12,583 | 4097 | 5000 |
| instances | 54 | 58 | 63 | 57 | 500 | 13,500 |
| classes | 5 | 6 | 4 | 3 | 5 | 2 |
Fig. 1 Block diagram of the classification model, with feature selection by the GSFR algorithm using the genetic search technique and fuzzy rough set
Fig. 2 Algorithm: proposed GSFR algorithm
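The loop described in the abstract (generate feature subsets with selection, crossover and mutation, then score each subset with a fuzzy rough dependency function) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' GSFR algorithm: the fitness here is a plain fuzzy-rough positive-region dependency rather than the modified dependency function with boundary regions, features are assumed to be scaled to [0, 1], and the function names, the similarity measure, the size penalty and all parameter values are hypothetical.

```python
import random


def fuzzy_rough_dependency(X, y, mask):
    """Mean positive-region membership of the samples for the features in mask.

    Assumes features scaled to [0, 1]. The similarity of two samples is the
    minimum over the selected features of 1 - |a - b|; a sample's membership
    in the positive region is the minimum of (1 - similarity) over all
    samples that belong to a different class.
    """
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    total = 0.0
    for i, xi in enumerate(X):
        pos = 1.0
        for k, xk in enumerate(X):
            if y[k] != y[i]:
                sim = min(1.0 - abs(xi[j] - xk[j]) for j in cols)
                pos = min(pos, 1.0 - sim)
        total += pos
    return total / len(X)


def genetic_search(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Search for a feature mask (list of bools) with high dependency."""
    rng = random.Random(seed)
    n = len(X[0])  # needs at least two features for the crossover cut
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]

    def fitness(mask):
        # Size penalty (an assumption) so that smaller subsets win ties.
        return fuzzy_rough_dependency(X, y, mask) - 0.01 * sum(mask) / n

    best_mask, best_fit = None, float("-inf")
    for _ in range(generations):
        fits = [fitness(m) for m in pop]
        for m, f in zip(pop, fits):
            if f > best_fit:
                best_mask, best_fit = list(m), f
        # Selection: binary tournament.
        parents = []
        for _ in range(pop_size):
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            parents.append(pop[a] if fits[a] >= fits[b] else pop[b])
        # Crossover: single cut point between consecutive parents.
        children = []
        for k in range(0, pop_size - 1, 2):
            cut = rng.randrange(1, n)
            children.append(parents[k][:cut] + parents[k + 1][cut:])
            children.append(parents[k + 1][:cut] + parents[k][cut:])
        if len(children) < pop_size:  # odd population: carry the last parent
            children.append(list(parents[-1]))
        # Mutation: flip each bit with probability p_mut.
        pop = [[bit != (rng.random() < p_mut) for bit in c] for c in children]
    return best_mask, best_fit
```

The pairwise dependency computation is quadratic in the number of instances, so a sketch like this is only practical for small datasets such as the microarray sets listed above.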
Fig. 3 Algorithm: FRNN algorithm
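For orientation, a fuzzy-rough nearest neighbour decision rule is commonly described as follows: take the K nearest training samples, compute the fuzzy lower and upper approximation membership of the test sample for each class, and predict the class with the highest average of the two. The sketch below follows that general description and may differ from the algorithm in Fig. 3. The similarity measure (1 minus the mean absolute difference on features scaled to [0, 1]), the Kleene-Dienes implicator, the minimum t-norm and the name `frnn_predict` are assumptions made for illustration.

```python
def frnn_predict(X_train, y_train, x, k=10):
    """Predict the class of x from its k most similar training samples."""
    # Assumed similarity: 1 - mean absolute difference, features in [0, 1].
    sims = [
        1.0 - sum(abs(a - b) for a, b in zip(row, x)) / len(x)
        for row in X_train
    ]
    nearest = sorted(range(len(X_train)), key=lambda i: -sims[i])[:k]

    best_class, best_score = None, -1.0
    for c in sorted(set(y_train)):
        # Lower approximation: min over neighbours of max(1 - sim, membership),
        # which for crisp classes is 1 for same-class neighbours.
        lower = min(1.0 if y_train[i] == c else 1.0 - sims[i] for i in nearest)
        # Upper approximation: max over neighbours of min(sim, membership),
        # which for crisp classes is 0 for other-class neighbours.
        upper = max(sims[i] if y_train[i] == c else 0.0 for i in nearest)
        score = (lower + upper) / 2.0
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

A call such as `frnn_predict([[0.0], [0.1], [0.9], [1.0]], [0, 0, 1, 1], [0.05], k=4)` returns class `0`, because the test point is highly similar to the class 0 samples and dissimilar to the class 1 samples.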
Datasets with reduced features considering different feature selection techniques. The technique columns give the number of features selected with the proposed and other feature selection techniques (% of features selected)
| Dataset | Total number of features | CFS | CSE | PCA | WSE | Proposed GSFR |
|---|---|---|---|---|---|---|
| breast | 9217 | 2677 (29%) | 1928 (21%) | 1458 (16%) | 1337 (15%) | 1329 (14%) |
| DLBCL | 4027 | 1014 (25%) | 872 (22%) | 543 (13%) | 313 (8%) | 306 (8%) |
| SRBCT | 2309 | 254 (11%) | 167 (7%) | 96 (4%) | 53 (2%) | 47 (2%) |
| Leukaemia | 12,583 | 1239 (10%) | 767 (6%) | 593 (5%) | 193 (2%) | 190 (2%) |
| EEG | 4097 | 1075 (26%) | 912 (22%) | 527 (13%) | 298 (7%) | 155 (4%) |
| Gisette | 5000 | 1202 (24%) | 923 (18%) | 519 (10%) | 392 (8%) | 65 (1%) |
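The percentages in the table are the selected feature counts divided by the total feature counts. A short check of the GSFR column, using only the numbers printed above (the helper name `percent_selected` is made up for this example):

```python
# Total features and GSFR-selected features, copied from the table above.
total_features = {
    "breast": 9217, "DLBCL": 4027, "SRBCT": 2309,
    "Leukaemia": 12583, "EEG": 4097, "Gisette": 5000,
}
gsfr_selected = {
    "breast": 1329, "DLBCL": 306, "SRBCT": 47,
    "Leukaemia": 190, "EEG": 155, "Gisette": 65,
}


def percent_selected(selected, total):
    """Share of features kept by a selection technique, in percent."""
    return 100.0 * selected / total


for name, total in total_features.items():
    pct = percent_selected(gsfr_selected[name], total)
    print(f"{name}: {gsfr_selected[name]} of {total} features ({pct:.1f}%)")
```

Rounded to whole numbers, these values reproduce the percentages reported in the Proposed GSFR column.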
Classification accuracy and computation time for different classifiers with (A) and without (B) the proposed GSFR feature selection algorithm
| Dataset | Metric | NN (B) | NN (A) | J48 (B) | J48 (A) | FNN (B) | FNN (A) | RandF (B) | RandF (A) | GradBoost (B) | GradBoost (A) | SVM (B) | SVM (A) | FRNN (B) | FRNN (A) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| breast | acc., % | 60.37 | 66.67 | 70.37 | 73.19 | 66.67 | 70.37 | 62.96 | 72.22 | 73.78 | 73.90 | 73.54 | 74.04 | 74.02 | 74.07 |
| | time, s | 0.37 | 0.10 | 1.30 | 0.07 | 0.25 | 0.06 | 0.34 | 0.18 | 4.88 | 0.65 | 1.34 | 0.06 | 0.26 | 0.04 |
| DLBCL | acc., % | 72.76 | 77.59 | 68.80 | 72.40 | 80.76 | 82.75 | 75.86 | 81.03 | 87.93 | 89.66 | 90.00 | 94.45 | 93.10 | 96.55 |
| | time, s | 0.12 | 0.08 | 0.44 | 0.06 | 0.12 | 0.04 | 0.09 | 0.01 | 2.44 | 0.18 | 0.12 | 0.05 | 0.11 | 0.02 |
| SRBCT | acc., % | 66.19 | 74.60 | 72.54 | 74.60 | 67.30 | 76.19 | 72.54 | 75.71 | 75.71 | 78.65 | 78.89 | 79.41 | 79.15 | 80.95 |
| | time, s | 0.08 | 0.04 | 0.18 | 0.10 | 0.06 | 0.01 | 0.06 | 0.01 | 1.38 | 0.02 | 0.06 | 0.04 | 0.01 | 0.01 |
| Leukaemia | acc., % | 67.72 | 73.68 | 61.19 | 65.78 | 56.74 | 57.90 | 73.68 | 75.44 | 70.70 | 81.23 | 72.48 | 82.98 | 70.17 | 87.71 |
| | time, s | 0.37 | 0.12 | 0.71 | 0.12 | 0.36 | 0.05 | 0.84 | 0.02 | 3.77 | 0.06 | 0.44 | 0.01 | 0.54 | 0.05 |
| EEG | acc., % | 60.08 | 61.80 | 74.20 | 88.80 | 73.20 | 89.80 | 75.80 | 89.89 | 78.00 | 84.23 | 78.20 | 87.20 | 78.80 | 90.80 |
| | time, s | 1.18 | 0.05 | 1.28 | 0.36 | 1.75 | 0.05 | 1.41 | 0.12 | 1.17 | 0.18 | 1.42 | 0.39 | 1.82 | 0.05 |
| Gisette | acc., % | 69.08 | 76.03 | 70.02 | 78.76 | 70.94 | 79.09 | 72.50 | 80.00 | 73.27 | 82.60 | 75.00 | 87.80 | 75.25 | 90.60 |
| | time, s | 1.34 | 0.12 | 1.46 | 0.19 | 1.12 | 0.09 | 0.90 | 0.05 | 1.02 | 0.06 | 0.50 | 0.04 | 0.24 | 0.01 |
Performance analysis of FRNN with different feature selection techniques
| Feature selection technique | Dataset | Sensitivity | Specificity | Precision | F-measure | Accuracy, % |
|---|---|---|---|---|---|---|
| CFS | Breast | 0.653 | 0.889 | 0.709 | 0.660 | 66.67 |
| | DLBCL | 0.825 | 0.948 | 0.863 | 0.829 | 77.59 |
| | SRBCT | 0.709 | 0.884 | 0.711 | 0.709 | 71.43 |
| | Leukaemia | 0.735 | 0.848 | 0.721 | 0.726 | 73.68 |
| | EEG | 0.832 | 0.952 | 0.819 | 0.816 | 83.2 |
| | Gisette | 0.818 | 0.786 | 0.813 | 0.796 | 81.82 |
| CSE | Breast | 0.687 | 0.892 | 0.675 | 0.666 | 68.52 |
| | DLBCL | 0.846 | 0.952 | 0.875 | 0.846 | 79.31 |
| | SRBCT | 0.732 | 0.892 | 0.728 | 0.726 | 73.02 |
| | Leukaemia | 0.747 | 0.861 | 0.743 | 0.743 | 75.44 |
| | EEG | 0.85 | 0.957 | 0.844 | 0.838 | 85 |
| | Gisette | 0.836 | 0.802 | 0.834 | 0.814 | 83.6 |
| PCA | Breast | 0.685 | 0.905 | 0.793 | 0.706 | 70.37 |
| | DLBCL | 0.856 | 0.956 | 0.884 | 0.859 | 81.03 |
| | SRBCT | 0.747 | 0.901 | 0.749 | 0.744 | 74.60 |
| | Leukaemia | 0.761 | 0.872 | 0.761 | 0.758 | 77.19 |
| | EEG | 0.866 | 0.962 | 0.865 | 0.854 | 86.6 |
| | Gisette | 0.843 | 0.822 | 0.847 | 0.830 | 84.26 |
| WSE | Breast | 0.739 | 0.912 | 0.796 | 0.728 | 72.22 |
| | DLBCL | 0.878 | 0.96 | 0.877 | 0.873 | 82.76 |
| | SRBCT | 0.767 | 0.907 | 0.788 | 0.760 | 76.19 |
| | Leukaemia | 0.856 | 0.923 | 0.864 | 0.858 | 77.78 |
| | EEG | 0.882 | 0.967 | 0.877 | 0.869 | 88.2 |
| | Gisette | 0.871 | 0.846 | 0.878 | 0.857 | 87 |
| Proposed GSFR | Breast | 0.741 | 0.948 | 0.826 | 0.781 | 74.07 |
| | DLBCL | 0.942 | 0.985 | 0.904 | 0.923 | 96.55 |
| | SRBCT | 0.849 | 0.94 | 0.841 | 0.845 | 80.95 |
| | Leukaemia | 0.705 | 0.856 | 0.746 | 0.725 | 87.71 |
| | EEG | 0.908 | 0.977 | 0.916 | 0.908 | 90.8 |
| | Gisette | 0.906 | 0.85 | 0.91 | 0.904 | 90.6 |
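The measures in this table are standard confusion-matrix quantities. The sketch below shows one way to compute them for a multi-class problem. It assumes macro-averaging (an unweighted mean over classes), which the source does not state, so the averaging used for the reported figures may differ. The function name `macro_metrics` and the example matrix are hypothetical.

```python
def macro_metrics(cm):
    """Macro-averaged metrics from a square confusion matrix.

    cm[r][c] is the number of samples of true class r predicted as class c.
    Every class needs at least one true sample and one correct prediction,
    otherwise a division by zero occurs.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    sens, spec, prec, f1 = [], [], [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # class c, predicted other
        fp = sum(cm[r][c] for r in range(n)) - tp     # other, predicted class c
        tn = total - tp - fn - fp
        s = tp / (tp + fn)
        p = tp / (tp + fp)
        sens.append(s)
        spec.append(tn / (tn + fp))
        prec.append(p)
        f1.append(2 * p * s / (p + s))

    def mean(values):
        return sum(values) / len(values)

    return {
        "sensitivity": mean(sens),
        "specificity": mean(spec),
        "precision": mean(prec),
        "f_measure": mean(f1),
        "accuracy": 100.0 * sum(cm[c][c] for c in range(n)) / total,
    }


# Made-up two-class example, not data from the paper.
example = macro_metrics([[8, 2], [1, 9]])
print(example)
```

Sensitivity and specificity are averaged per class here, while accuracy is computed over all samples, so the two need not move together across datasets.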