Chinnaswamy Arunkumar, Srinivasan Ramakrishnan.
Abstract
This Letter proposes a customised approach for attribute selection applied to the fuzzy rough quick reduct (FRQR) algorithm. The unbalanced data is balanced using the synthetic minority oversampling technique. The high dimensionality of the cancer data is reduced using a correlation-based filter. The dimensionality-reduced, balanced gene subset is then used to compute the final minimal reduct set using a customised fuzzy triangular norm operator within the fuzzy rough quick reduct algorithm. The customised fuzzy triangular norm operator is used with a Lukasiewicz fuzzy implicator to compute the fuzzy approximation. The customised operator selects the smallest number of informative feature genes from the dimensionality-reduced datasets. Using a customised function for the Lukasiewicz triangular norm operator, classification accuracies under leave-one-out cross validation of 94.85, 76.54, 98.11, and 99.13% are obtained on the leukemia, central nervous system, lung, and ovarian datasets, respectively. The conventional fuzzy rough quick reduct and the proposed method are compared using classification accuracy, precision, recall, F-measure, scatter plots, receiver operating characteristic area, the McNemar test, the chi-squared test, the Matthews correlation coefficient and false discovery rate, to show that the proposed approach performs better than available methods in the literature.
Keywords: Lukasiewicz fuzzy implicator; Matthew's correlation coefficient; McNemar test; approximation theory; cancer; cancer data; central nervous system; chi-squared test; correlation-based filter; fuzzy approximation; fuzzy rough machine learning approaches; fuzzy rough quick reduct algorithm; fuzzy set theory; fuzzy triangular norm operator; learning (artificial intelligence); leukaemia; lung; medical computing; ovarian datasets; pattern classification; rough set theory; sampling methods
Year: 2018 PMID: 30881694 PMCID: PMC6407447 DOI: 10.1049/htl.2018.5055
Source DB: PubMed Journal: Healthc Technol Lett ISSN: 2053-3713
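The abstract refers to a Lukasiewicz triangular norm and implicator used to compute the fuzzy approximation. The following is a minimal sketch of the standard (textbook) Lukasiewicz operators and of fuzzy lower and upper approximations built from them. It is not the customised operator proposed in the Letter, whose definition is not given here; the function names and the toy similarity and membership values are illustrative assumptions.

```python
# Standard Lukasiewicz operators (NOT the Letter's customised variant).
# All function names and example values below are hypothetical.

def lukasiewicz_tnorm(x, y):
    """Lukasiewicz t-norm: T(x, y) = max(0, x + y - 1)."""
    return max(0.0, x + y - 1.0)


def lukasiewicz_implicator(x, y):
    """Lukasiewicz implicator: I(x, y) = min(1, 1 - x + y)."""
    return min(1.0, 1.0 - x + y)


def lower_approximation(similarity_row, membership):
    """Fuzzy lower approximation of one object: inf over y of I(R(x, y), A(y))."""
    return min(lukasiewicz_implicator(r, a)
               for r, a in zip(similarity_row, membership))


def upper_approximation(similarity_row, membership):
    """Fuzzy upper approximation of one object: sup over y of T(R(x, y), A(y))."""
    return max(lukasiewicz_tnorm(r, a)
               for r, a in zip(similarity_row, membership))


# Toy example: similarity of one sample to three samples (itself included),
# and the crisp membership of those three samples in the sample's own class.
similarity = [1.0, 0.7, 0.2]
membership = [1.0, 0.0, 1.0]

lower = lower_approximation(similarity, membership)
upper = upper_approximation(similarity, membership)
print(lower, upper)
```

In this toy case the lower approximation is driven by the similar sample of the other class (similarity 0.7), which is the behaviour a quick reduct search exploits when scoring candidate gene subsets.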
Fig. 1. Framework of the proposed approach
Performance analysis on different datasets – conventional FRQR versus proposed customised FRQR
| Dataset | Number of genes in the raw dataset | Number of genes obtained using CFS | Method | No. of feature genes selected | CA, % | FDR | Precision | Recall | F-measure | TP | FN | FP | TN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| leukemia | 7129 | 112 | conventional FRQR | 8 | 87.63 | 0.120 | 0.880 | 0.880 | 0.880 | 44 | 6 | 6 | 41 |
| leukemia | 7129 | 112 | proposed method | 2 | 94.85 | 0.063 | 0.938 | 0.957 | 0.947 | 45 | 2 | 3 | 47 |
| CNS | 7129 | 100 | conventional FRQR | 10 | 69.14 | 0.333 | 0.667 | 0.810 | 0.731 | 34 | 8 | 17 | 22 |
| CNS | 7129 | 100 | proposed method | 3 | 76.54 | 0.275 | 0.725 | 0.881 | 0.796 | 37 | 5 | 14 | 25 |
| lung cancer | 12,533 | 252 | conventional FRQR | 8 | 96.70 | 0.026 | 0.974 | 0.980 | 0.977 | 147 | 3 | 4 | 58 |
| lung cancer | 12,533 | 252 | proposed method | 2 | 98.11 | 0.020 | 0.981 | 0.993 | 0.987 | 149 | 1 | 3 | 59 |
| ovarian cancer | 15,154 | 44 | conventional FRQR | 8 | 96.22 | 0.059 | 0.941 | 0.981 | 0.961 | 159 | 3 | 10 | 172 |
| ovarian cancer | 15,154 | 44 | proposed method | 3 | 99.13 | 0.000 | 1.000 | 0.981 | 0.991 | 159 | 3 | 0 | 182 |
TP, true positive; FN, false negative; FP, false positive; TN, true negative.
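The derived columns in the table above follow from the four confusion counts. As a minimal sketch, using the standard textbook definitions (the function name and dictionary keys are illustrative assumptions, not taken from the Letter), the leukemia row for the proposed method can be recomputed as:

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics from confusion counts."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy_pct": 100.0 * (tp + tn) / total,   # CA, %
        "fdr": fp / (tp + fp),                       # false discovery rate = 1 - precision
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
    }


# Leukemia, proposed method: TP = 45, FN = 2, FP = 3, TN = 47 (from the table).
leukemia_proposed = confusion_metrics(tp=45, fn=2, fp=3, tn=47)
for name, value in leukemia_proposed.items():
    print(f"{name}: {value:.4f}")
```

The results agree with the tabulated values for this row (94.85, 0.063, 0.938, 0.957, 0.947) up to rounding.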
Fig. 2. Scatter plot for conventional FRQR for leukemia dataset
Fig. 3. Scatter plot for proposed customised FRQR for leukemia dataset
Kappa statistic (κ), MAE, RMSE metrics for conventional FRQR versus proposed customised FRQR
| Dataset | Conventional FRQR: κ | Conventional FRQR: MAE | Conventional FRQR: RMSE | Proposed method: κ | Proposed method: MAE | Proposed method: RMSE |
|---|---|---|---|---|---|---|
| leukemia | 0.752 | 0.147 | 0.340 | 0.900 | 0.089 | 0.225 |
| CNS | 0.377 | 0.365 | 0.475 | 0.526 | 0.347 | 0.420 |
| lung cancer | 0.919 | 0.069 | 0.161 | 0.954 | 0.037 | 0.138 |
| ovarian cancer | 0.924 | 0.064 | 0.191 | 0.983 | 0.036 | 0.097 |
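The first metric in each group of the table above appears to be Cohen's kappa statistic; that reading is an assumption, checked here only against the leukemia row for conventional FRQR. A minimal sketch (the function name is illustrative) computes kappa from the confusion counts reported in the performance table:

```python
def cohen_kappa(tp, fn, fp, tn):
    """Cohen's kappa for a 2x2 confusion matrix (assumed meaning of the first column)."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement: product of actual and predicted marginals for each class.
    expected = ((tp + fn) * (tp + fp) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1.0 - expected)


# Leukemia, conventional FRQR: TP = 44, FN = 6, FP = 6, TN = 41.
kappa_leukemia_conventional = cohen_kappa(tp=44, fn=6, fp=6, tn=41)
print(f"{kappa_leukemia_conventional:.3f}")
```

MAE and RMSE are not recomputed here because they depend on per-sample predicted probabilities, which the tables do not report.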
MCC, McNemar, chi-squared metrics for conventional FRQR versus proposed customised FRQR
| Dataset | Conventional FRQR: MCC | Conventional FRQR: McNemar chi-squared value (p-value) | Conventional FRQR: chi-squared test value (p-value) | Proposed method: MCC | Proposed method: McNemar chi-squared value (p-value) | Proposed method: chi-squared test value (p-value) |
|---|---|---|---|---|---|---|
| leukemia | 0.752 | 0.08 (0.386) | 51.93 (<0.0001) | 0.897 | 0.00 (0.500) | 74.51 (<0.0001) |
| CNS | 0.387 | 3.80 (0.109) | 10.56 (0.0006) | 0.540 | 2.89 (0.213) | 21.44 (<0.0001) |
| lung cancer | 0.920 | 0.04 (0.500) | 174.96 (<0.0001) | 0.954 | 0.02 (0.614) | 188.41 (<0.0001) |
| ovarian cancer | 0.925 | 2.14 (0.096) | 290.71 (<0.0001) | 0.983 | 1.33 (0.248) | 328.22 (<0.0001) |
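As a minimal sketch, the MCC and the chi-squared test value can be recomputed from the confusion counts in the performance table. The MCC formula is the standard one; the use of a Yates continuity correction in the chi-squared statistic is an assumption, adopted because it reproduces the leukemia entries. Function names are illustrative, and p-values and the McNemar statistic are not recomputed.

```python
import math


def matthews_corrcoef_2x2(tp, fn, fp, tn):
    """Matthews correlation coefficient for a 2x2 confusion matrix."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator


def chi_squared_yates_2x2(tp, fn, fp, tn):
    """Pearson chi-squared statistic for a 2x2 table with Yates correction (assumed)."""
    n = tp + fn + fp + tn
    corrected = max(0.0, abs(tp * tn - fp * fn) - n / 2.0)
    denominator = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    return n * corrected ** 2 / denominator


# Leukemia, conventional FRQR: TP = 44, FN = 6, FP = 6, TN = 41.
mcc_conventional = matthews_corrcoef_2x2(tp=44, fn=6, fp=6, tn=41)
chi2_conventional = chi_squared_yates_2x2(tp=44, fn=6, fp=6, tn=41)

# Leukemia, proposed method: TP = 45, FN = 2, FP = 3, TN = 47.
mcc_proposed = matthews_corrcoef_2x2(tp=45, fn=2, fp=3, tn=47)
chi2_proposed = chi_squared_yates_2x2(tp=45, fn=2, fp=3, tn=47)

print(f"{mcc_conventional:.3f} {chi2_conventional:.2f}")
print(f"{mcc_proposed:.3f} {chi2_proposed:.2f}")
```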
TPR, FPR for conventional FRQR versus proposed customised FRQR
| Dataset | Conventional FRQR: TPR | Conventional FRQR: FPR | Proposed method: TPR | Proposed method: FPR |
|---|---|---|---|---|
| leukemia | 0.880 | 0.128 | 0.957 | 0.060 |
| CNS | 0.691 | 0.318 | 0.881 | 0.243 |
| lung cancer | 0.987 | 0.081 | 0.993 | 0.048 |
| ovarian cancer | 0.981 | 0.055 | 0.981 | 0.001 |
Comparison with state-of-the-art attribute selection methods
| Attribute selection method | CA, % | Number of genes in the reduced subset |
|---|---|---|
| independent component subspace [ | 83.00 | 10 |
| neighbourhood approximation [ | 84.64 | 3 |
| scalable feature selection [ | 84.64 | 4 |
| CFS-improved binary particle swarm optimisation [ | 84.53 | 7 |
| max dependency, relevance [ | 82.83 | 2 |
| CFS-PSO-FRQR [ | 90.19 | 10 |
| BDE-SVMrankf (binary differential evolution – support vector machine (SVMrankf)) [ | 91.80 | 4 |
| proposed customised FRQR | 92.16 | 3 |