Senthilkumar Devaraj, S Paulraj.
Abstract
Multidimensional medical data classification has recently received increased attention from researchers working on machine learning and data mining. In a multidimensional dataset (MDD), each instance is associated with multiple class values. Because of this complexity, feature selection and classifier construction on an MDD are typically more expensive and time-consuming. A robust feature selection technique is therefore needed to select a single optimal feature subset of the MDD for further analysis or for designing a classifier. In this paper, an efficient feature selection algorithm is proposed for the classification of MDD. The proposed multidimensional feature subset selection (MFSS) algorithm yields a unique feature subset for further analysis or for building a classifier, and it has a computational advantage on MDD over existing feature selection algorithms. The proposed work is applied to benchmark multidimensional datasets. With MFSS, the number of features was reduced to between 3% and 30% of the original feature set. The results show that MFSS is an efficient feature selection algorithm that does not affect classification accuracy even with the reduced number of features. The MFSS algorithm is also suitable for both problem transformation and algorithm adaptation, and it has great potential in applications that generate multidimensional datasets.
Year: 2015 PMID: 26491718 PMCID: PMC4601565 DOI: 10.1155/2015/821798
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1: The relationship among different classification paradigms.
Figure 2: Proposed MFSS for multidimensional dataset.
Details of the dataset used in experiments.
| Dataset | Number of instances | Number of features | Number of target classes |
|---|---|---|---|
| Thyroid | 9172 | 29 | 7 |
| Solar flare | 1389 | 10 | 3 |
| Scene | 2407 | 294 | 6 |
| Music | 593 | 72 | 6 |
| Yeast | 2417 | 103 | 14 |
Evaluation metrics for J48 algorithm.
| Dataset | HS (BFS) | HS (MFSS) | EM (BFS) | EM (MFSS) | HL (BFS) | HL (MFSS) | ZOL (BFS) | ZOL (MFSS) |
|---|---|---|---|---|---|---|---|---|
| Thyroid | | | 0.948 | 0.896 | 0.01 | 0.022 | 0.052 | 0.104 |
| Solar flare | | | 0.791 | 0.791 | 0.088 | 0.088 | 0.209 | 0.209 |
| Scene | 0.849 | 0.75 | 0.525 | 0.261 | 0.151 | 0.25 | 0.475 | 0.739 |
| Music | 0.723 | 0.748 | 0.213 | 0.233 | 0.277 | 0.252 | 0.787 | 0.767 |
| Yeast | 0.713 | 0.725 | 0.142 | 0.145 | 0.287 | 0.275 | 0.858 | 0.855 |
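The four metric groups in these tables come in complementary pairs: in the reported rows, HS and HL sum to roughly 1, as do EM and ZOL. The sketch below shows one common way such metrics are computed. It assumes that HS (Hamming score) is the fraction of individual class values predicted correctly, that EM (exact match) is the fraction of instances whose whole class vector is correct, and that HL and ZOL are their complements (Hamming loss and zero-one loss). The function name `multilabel_metrics` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def multilabel_metrics(y_true: Sequence[Sequence[int]],
                       y_pred: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Compute HS, EM, HL and ZOL for multidimensional predictions.

    Each row is one instance; each column is one class variable.
    Assumed definitions: HL = 1 - HS and ZOL = 1 - EM.
    """
    n_instances = len(y_true)
    n_classes = len(y_true[0])
    correct_values = 0   # individual class values predicted correctly
    exact_rows = 0       # instances with every class value correct
    for true_row, pred_row in zip(y_true, y_pred):
        matches = sum(1 for t, p in zip(true_row, pred_row) if t == p)
        correct_values += matches
        if matches == n_classes:
            exact_rows += 1
    hs = correct_values / (n_instances * n_classes)
    em = exact_rows / n_instances
    return {"HS": hs, "EM": em, "HL": 1.0 - hs, "ZOL": 1.0 - em}


# Toy data (hypothetical): 4 instances, 3 class variables each.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [1, 0, 1]]
metrics = multilabel_metrics(y_true, y_pred)
print(metrics)
```

Because EM requires every class value of an instance to be right, it is never larger than HS, which is consistent with the EM columns being lower than the HS columns wherever both are reported.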
Evaluation metrics for Naive Bayes algorithm.
| Dataset | HS (BFS) | HS (MFSS) | EM (BFS) | EM (MFSS) | HL (BFS) | HL (MFSS) | ZOL (BFS) | ZOL (MFSS) |
|---|---|---|---|---|---|---|---|---|
| Thyroid | 0.946 | 0.967 | 0.668 | 0.83 | 0.054 | 0.033 | 0.332 | 0.17 |
| Solar flare | 0.879 | 0.9 | 0.709 | 0.782 | 0.121 | 0.1 | 0.291 | 0.218 |
| Scene | 0.862 | 0.769 | 0.527 | 0.302 | 0.138 | 0.231 | 0.473 | 0.698 |
| Music | 0.782 | 0.767 | 0.297 | 0.248 | 0.218 | 0.233 | 0.703 | 0.752 |
| Yeast | 0.713 | 0.737 | 0.142 | 0.127 | 0.287 | 0.263 | 0.858 | 0.873 |
Evaluation metrics for SVM algorithm.
| Dataset | HS (BFS) | HS (MFSS) | EM (BFS) | EM (MFSS) | HL (BFS) | HL (MFSS) | ZOL (BFS) | ZOL (MFSS) |
|---|---|---|---|---|---|---|---|---|
| Thyroid | 0.968 | 0.967 | 0.792 | 0.788 | 0.032 | 0.033 | 0.208 | 0.212 |
| Solar flare | | | 0.791 | 0.791 | 0.088 | 0.088 | 0.209 | 0.209 |
| Scene | | | 0.695 | 0.315 | 0.09 | 0.226 | 0.305 | 0.685 |
| Music | | | 0.356 | 0.267 | 0.192 | 0.228 | 0.644 | 0.733 |
| Yeast | | | 0.251 | 0.173 | 0.209 | 0.231 | 0.749 | 0.827 |
Evaluation metrics for IBk algorithm.
| Dataset | HS (BFS) | HS (MFSS) | EM (BFS) | EM (MFSS) | HL (BFS) | HL (MFSS) | ZOL (BFS) | ZOL (MFSS) |
|---|---|---|---|---|---|---|---|---|
| Thyroid | 0.973 | 0.969 | 0.834 | 0.833 | 0.027 | 0.031 | 0.166 | 0.167 |
| Solar flare | 0.888 | 0.912 | 0.736 | 0.791 | 0.112 | 0.088 | 0.264 | 0.209 |
| Scene | 0.886 | 0.76 | 0.626 | 0.276 | 0.114 | 0.24 | 0.374 | 0.724 |
| Music | 0.753 | 0.74 | 0.243 | 0.233 | 0.247 | 0.252 | 0.757 | 0.767 |
| Yeast | 0.762 | 0.722 | 0.214 | 0.123 | 0.238 | 0.278 | 0.786 | 0.877 |
Figure 3: Hamming score, Naive Bayes.
Figure 4: Hamming score, SVM.
Figure 5: Hamming score, IBk.
Figure 6: Hamming score, J48.
Figure 7: Exact match, Naive Bayes.
Figure 8: Exact match, SVM.
Figure 9: Exact match, IBk.
Figure 10: Exact match, J48.
Evans correlation coefficient classification.
| Correlation coefficient value | Strength of correlation |
|---|---|
| 0.80–1.00 | Very strong |
| 0.60–0.79 | Strong |
| 0.40–0.59 | Moderate |
| 0.20–0.39 | Weak |
| 0.00–0.19 | Very weak |
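The banding in this table can be expressed as a small lookup. The sketch below is illustrative only: the function name `evans_strength` is hypothetical, and it assumes the bands apply to the absolute value of the coefficient, so that a strong negative correlation is also labelled as strong.

```python
def evans_strength(r: float) -> str:
    """Map a correlation coefficient to the verbal label in the table above."""
    magnitude = abs(r)  # assumption: the sign is ignored when grading strength
    if magnitude > 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if magnitude >= 0.80:
        return "Very strong"
    if magnitude >= 0.60:
        return "Strong"
    if magnitude >= 0.40:
        return "Moderate"
    if magnitude >= 0.20:
        return "Weak"
    return "Very weak"


# Hypothetical example values, one per band.
for value in (0.85, 0.65, 0.45, 0.25, 0.05):
    print(value, evans_strength(value))
```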
Correlation between BFS and MFSS for four classifiers.
| Metrics | J48 | Naive Bayes | SVM | IBk |
|---|---|---|---|---|
| Hamming score | 0.914 | 0.867 | 0.801 | 0.859 |
| Exact match | 0.943 | 0.908 | 0.853 | 0.878 |
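The page does not say how these coefficients were obtained. A plausible reading, treated here as an assumption, is a Pearson correlation between the BFS and MFSS values of a metric across the five datasets. The sketch below applies that reading to the Naive Bayes Hamming scores from the table earlier on this page; the helper name `pearson_r` is hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)


# Naive Bayes Hamming scores per dataset, in the order
# Thyroid, Solar flare, Scene, Music, Yeast (values from the table above).
nb_hs_bfs = [0.946, 0.879, 0.862, 0.782, 0.713]
nb_hs_mfss = [0.967, 0.9, 0.769, 0.767, 0.737]

r = pearson_r(nb_hs_bfs, nb_hs_mfss)
print(round(r, 3))
```

Under this assumption the result is close to the 0.867 listed for Naive Bayes in the Hamming score row, which suggests, but does not confirm, that the coefficients were computed this way.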
Paired t-test results of different evaluation metrics before and after applying MFSS for four classifiers.
| Metrics | J48 | Accept/reject | Naive Bayes | Accept/reject | SVM | Accept/reject | IBk | Accept/reject |
|---|---|---|---|---|---|---|---|---|
| Hamming score | 0.675 | Accept | 0.376 | Accept | 1.549 | Accept | 1.239 | Accept |
| Exact match | 1.111 | Accept | 0.166 | Accept | 1.577 | Accept | 1.110 | Accept |
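The values in this table behave like paired t statistics computed over the five datasets. The sketch below shows that computation for the Naive Bayes Hamming scores. Two things are assumptions rather than statements from the source: that the test is two-tailed at the 5% level (critical value 2.776 for 4 degrees of freedom), and that "Accept" refers to the null hypothesis of no difference between BFS and MFSS. The helper name `paired_t` is hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t(before: Sequence[float], after: Sequence[float]) -> float:
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [b - a for b, a in zip(before, after)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error


# Naive Bayes Hamming scores per dataset (Thyroid, Solar flare, Scene,
# Music, Yeast), taken from the Naive Bayes table on this page.
nb_hs_bfs = [0.946, 0.879, 0.862, 0.782, 0.713]
nb_hs_mfss = [0.967, 0.9, 0.769, 0.767, 0.737]

t_stat = paired_t(nb_hs_bfs, nb_hs_mfss)

# Assumed decision rule: two-tailed test, alpha = 0.05, df = 5 - 1 = 4.
T_CRITICAL = 2.776
decision = "Accept" if abs(t_stat) < T_CRITICAL else "Reject"
print(round(t_stat, 3), decision)
```

With these inputs the statistic is close to the 0.376 listed for Naive Bayes in the Hamming score row, and every value in the table lies below the assumed critical value, which matches the uniform "Accept" entries.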
Features selected using proposed MFSS.
| Dataset | Number of features in the dataset | Number of features selected using MFSS | Percentage of selected features using MFSS |
|---|---|---|---|
| Thyroid | 28 | 5 | 18 |
| Solar flare | 10 | 3 | 30 |
| Scene | 294 | 8 | 3 |
| Music/emotions | 71 | 6 | 8 |
| Yeast | 103 | 7 | 7 |
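The percentage column is the selected count divided by the total, rounded to the nearest whole percent. The short check below reproduces it from the other two columns; the dictionary layout and the name `selected_percentage` are hypothetical.

```python
def selected_percentage(total_features: int, selected_features: int) -> int:
    """Share of features kept, as a whole-number percentage."""
    return round(100 * selected_features / total_features)


# (total features, features selected by MFSS), copied from the table above.
feature_counts = {
    "Thyroid": (28, 5),
    "Solar flare": (10, 3),
    "Scene": (294, 8),
    "Music/emotions": (71, 6),
    "Yeast": (103, 7),
}

percentages = {
    name: selected_percentage(total, selected)
    for name, (total, selected) in feature_counts.items()
}
print(percentages)
```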
Figure 11: Features selected using proposed MFSS.
Comparison of time complexity.
| Existing feature selection techniques in the literature | Proposed MFSS |
|---|---|
| | |
fs_i: feature subset for class c_i after problem transformation, i = 1, …, m; “m”: the number of classes.
fs: optimal single unique feature subset using proposed MFSS for all the “m” classes.