Nader Salari, Shamarina Shohaimi, Farid Najafi, Meenakshii Nallappan, Isthrinayagy Karishnarajah.
Abstract
Among the many artificial intelligence approaches, k-Nearest Neighbor algorithms, genetic algorithms, and artificial neural networks are widely regarded as the most common and effective methods for classification problems. This study presents the results of implementing a novel hybrid feature selection-classification model built on these three methods. The purpose is to benefit from the synergies obtained by combining these technologies when developing classification models. Such a combination makes it possible to exploit the strengths of each algorithm and to compensate for their deficiencies. To develop the proposed model and obtain the best array of features, feature ranking techniques such as Fisher's discriminant ratio and class separability criteria were first used to prioritize features. Second, the resulting arrays of top-ranked features were used as the initial population of a genetic algorithm to produce optimum arrays of features. Third, a modified k-Nearest Neighbor method and an improved backpropagation neural network were used to carry out classification based on the optimum feature arrays selected by the genetic algorithm. The performance of the proposed model was compared with thirteen well-known classification models on seven datasets. Statistical analysis was performed using the Friedman test followed by post-hoc tests. The experimental findings indicated that the proposed hybrid model achieved significantly better classification performance than all 13 classification methods. Finally, the performance of the proposed model was benchmarked against the best results reported for state-of-the-art classifiers, in terms of classification accuracy, on the same data sets. This comprehensive comparative study showed that the classification accuracy of the proposed model is desirable, promising, and competitive with existing state-of-the-art classification models.
Year: 2014 PMID: 25419659 PMCID: PMC4242540 DOI: 10.1371/journal.pone.0112987
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
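The abstract names Fisher's discriminant ratio as one of the feature ranking techniques used to prioritize features before the genetic algorithm runs. The following is a minimal sketch of that ranking step for a two-class problem, assuming the common textbook form of the ratio, (mean difference)^2 / (sum of class variances). The function names and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean, pvariance

def fisher_ratio(values, labels):
    """Two-class Fisher's discriminant ratio for a single feature:
    (mu_a - mu_b)^2 / (var_a + var_b). Assumed textbook form."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    a = [v for v, y in zip(values, labels) if y == classes[0]]
    b = [v for v, y in zip(values, labels) if y == classes[1]]
    denom = pvariance(a) + pvariance(b)
    if denom == 0:
        # Degenerate case: no within-class spread at all.
        return float("inf") if mean(a) != mean(b) else 0.0
    return (mean(a) - mean(b)) ** 2 / denom

def rank_features(columns, labels):
    """Return feature names sorted from most to least discriminative."""
    scores = {name: fisher_ratio(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (hypothetical values, not from any dataset in the paper).
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "feat_a": [1.0, 1.2, 0.8, 3.0, 3.2, 2.8],  # classes well separated
    "feat_b": [1.0, 3.0, 2.0, 1.5, 2.5, 2.0],  # identical class means
}
ranking = rank_features(columns, labels)
print(ranking)  # ['feat_a', 'feat_b']
```

In a pipeline like the one the abstract describes, a ranking of this kind would seed the initial population of the genetic algorithm rather than serve as the final feature selection.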
Class sample distribution in the ACSEKI dataset.
| Class Label | Class Name | Frequency | Percent |
| 1 | STEMI | 316 | 15.28 |
| 2 | NSTEMI | 461 | 22.29 |
| 3 | UA | 1196 | 57.83 |
| 4 | Other | 95 | 4.59 |
| Total | 2068 | 100.0 |
Detailed description of the recorded clinical features in the ACSEKI database.
| No. | Name | Description | measurement scale |
| 1 | dmSex | Female, male | Nominal |
| 2 | age | Age in years | Numeric |
| 3 | BMI | Body mass index | Numeric |
| 4 | phMI | History of prior myocardial infarction | Nominal |
| 5 | phPriorAP | History of prior angina pectoris | Nominal |
| 6 | phCHF | History of congestive heart failure | Nominal |
| 7 | phStroke | History of stroke | Nominal |
| 8 | phLungDis | History of chronic lung disease | Nominal |
| 9 | phPCI | Prior PCI | Nominal |
| 10 | phCABG | Prior CABG | Nominal |
| 11 | phSmoking | Smoking status | Nominal |
| 12 | phDiab | Diabetes mellitus | Nominal |
| 13 | phHT | History of hypertension | Nominal |
| 14 | phHyChol | History of hypercholesterolemia | Nominal |
| 15 | phFamHist | Family history of Coronary Artery Disease | Nominal |
| 16 | adPresSymp | Predominant presenting symptom | Nominal |
| 17 | adHR | Heart rate | Nominal |
| 18 | adSysBP | Systolic blood pressure | Numeric |
| 19 | adtroponMax | Max measure of Troponin in first four tests | Numeric |
| 20 | adSTTchange | ECG STT changes | Nominal |
| 21 | adCKMB | CKMB mass elevated | Nominal |
| 22 | adSChol | Total cholesterol value | Numeric |
| 23 | adSCreat | Serum creatinine value | Numeric |
| 24 | adGluc | Glucose value | Numeric |
| 25 | adHaem | Haemoglobin value | Numeric |
| 26 | adRhythm | ECG rhythm | Nominal |
| 27 | dcDisDiag | Discharge diagnosis(class) | Nominal |
Detailed description of the recorded clinical features in the Cleveland dataset.
| No. | Name | Description | measurement scale |
| 1 | Sex | Sex | Nominal |
| 2 | Age | Age | Numeric |
| 3 | Cp | Chest pain type | Nominal |
| 4 | Trestbps | Resting blood pressure | Numeric |
| 5 | Chol | Serum cholesterol | Numeric |
| 6 | Fbs | Fasting blood sugar | Nominal |
| 7 | Restecg | Electrocardiographic results during rest | Nominal |
| 8 | Thalach | Maximum heart rate achieved | Numeric |
| 9 | Exang | Exercise induced angina | Nominal |
| 10 | Oldpeak | ST depression induced by exercise relative to the rest | Nominal |
| 11 | Slope | Slope of the peak exercise ST segment | Nominal |
| 12 | CA | Number of major vessels colored by fluoroscopy | Nominal |
| 13 | Thal | The heart status | Nominal |
| 14 | class | Healthy = 0, sick type = 1, 2, 3, and 4 (based on severity of heart disease) | Nominal |
Class sample distribution in the Cleveland dataset (multiple classes).
| Class Numbers | Class Name | Frequency | Percent |
| 1 | Healthy | 164 | 54.1 |
| 2 | Sick level 1 | 55 | 18.1 |
| 3 | Sick level 2 | 36 | 11.9 |
| 4 | Sick level 3 | 35 | 11.6 |
| 5 | Sick level 4 | 13 | 4.3 |
| Total | 303 | 100.0 |
Class sample distribution in the Cleveland dataset (binary class).
| Class Numbers | Class Name | Frequency | Percent |
| 1 | Healthy | 164 | 54.1 |
| 2 | Sick | 139 | 45.9 |
| Total | 303 | 100.0 |
Class sample distribution in the Hungarian dataset (binary class).
| Class Numbers | Class Name | Frequency | Percent |
| 1 | Healthy | 188 | 64 |
| 2 | Sick | 106 | 36 |
| Total | 294 | 100.0 |
Description of the recorded clinical features computed from the digital images of FNA of the breast masses in the WBC dataset.
| No. | Name | Value |
| 1 | Clump Thickness | 1–10 |
| 2 | Uniformity of Cell Size | 1–10 |
| 3 | Uniformity of Cell Shape | 1–10 |
| 4 | Marginal Adhesion | 1–10 |
| 5 | Single Epithelial Cell Size | 1–10 |
| 6 | Bare Nuclei | 1–10 |
| 7 | Bland Chromatin | 1–10 |
| 8 | Normal Nucleoli | 1–10 |
| 9 | Mitoses | 1–10 |
| 10 | Class | Benign = 1, Malignant = 2 |
Class sample distribution in the WBC dataset (binary class).
| Class Numbers | Class Name | Frequency | Percent |
| 1 | Malignant | 241 | 34 |
| 2 | Benign | 458 | 66 |
| Total | 699 | 100.0 |
Description of the recorded clinical features computed from digital images of FNA of the breast masses in the WDBC dataset.
| No. | Name | Description |
| 1 | radius | mean of distances from center to points on the perimeter |
| 2 | texture | standard deviation of gray-scale values |
| 3 | perimeter | the border around cell nucleus |
| 4 | area | the amount of space that covers cell nucleus |
| 5 | smoothness | local variation in radius lengths |
| 6 | compactness | (perimeter^2 / area - 1) |
| 7 | concavity | severity of concave portions of the contour |
| 8 | concave points | number of concave portions of the contour |
| 9 | symmetry | the quality of being symmetrical |
| 10 | fractal dimension | (“coastline approximation” – 1) |
Class sample distribution in the WDBC dataset (binary class).
| Class Numbers | Class Name | Frequency | Percent |
| 1 | Malignant | 212 | 37 |
| 2 | Benign | 357 | 63 |
| Total | 569 | 100.0 |
Description of the recorded clinical features in the Pima dataset.
| No. | Name | measurement scale |
| 1 | Number of times pregnant | Numeric |
| 2 | Plasma glucose concentration (mg/dL) | Numeric |
| 3 | Diastolic blood pressure (mm Hg) | Numeric |
| 4 | Triceps skin fold thickness (mm) | Numeric |
| 5 | 2-hour serum insulin (mu U/ml) | Numeric |
| 6 | Body mass index (weight in kg/(height in m)^2) | Numeric |
| 7 | Diabetes pedigree function | Numeric |
| 8 | Age (years) | Numeric |
Class sample distribution in the Pima dataset (binary class).
| Class Numbers | Class Name | Frequency | Percent |
| 1 | Healthy | 500 | 65 |
| 2 | Sick (diabetic) | 268 | 35 |
| Total | 768 | 100.0 |
Figure 1. Backpropagation Neural Network.
Figure 2. Implementation stages of the proposed new hybrid model.
Figure 3. Graph of the logistic function and its derivative function.
Figure 4. Graph of the dynamic logistic function and its derivative function for active input range [−5, 7] and output range [−1, 1].
Figure 5. Graph of the dynamic logistic function and its derivative function for active input range [−7.3, 6.5] and output range [1.2, 3.4].
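The figure captions above refer to the logistic function, its derivative, and a "dynamic" variant with a configurable active input range and output range. The sketch below shows the standard logistic function and its derivative. The `rescaled_logistic` helper is only an assumption about what such a variant could look like (an affine rescaling of input and output); it is not the paper's formula, and the `span` parameter is hypothetical.

```python
import math

def logistic(x):
    """Standard logistic function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def logistic_derivative(x):
    """Derivative of the logistic function: f(x) * (1 - f(x))."""
    fx = logistic(x)
    return fx * (1.0 - fx)

def rescaled_logistic(x, in_lo, in_hi, out_lo, out_hi, span=12.0):
    """Hypothetical rescaling (an assumption, not the paper's definition):
    map the active input range [in_lo, in_hi] onto [-span/2, span/2],
    apply the logistic, then stretch the result to [out_lo, out_hi]."""
    centre = (in_lo + in_hi) / 2.0
    gain = span / (in_hi - in_lo)
    return out_lo + (out_hi - out_lo) * logistic(gain * (x - centre))

print(logistic(0.0))                          # 0.5
print(logistic_derivative(0.0))               # 0.25
print(rescaled_logistic(1.0, -5, 7, -1, 1))   # centre of the input range -> 0.0
```

The derivative peaks at the centre of the input range, which is why backpropagation updates are largest for inputs near that centre and shrink toward the saturated ends.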
A typical fuzzy class membership array.
| Number of classes | 1 | 2 | 3 | 4 | 5 |
| | 1.2% | 42% | 21% | 56% | 78% |
The conventional data layout for the 2×2 confusion matrix.
| | | Predicted class | | |
| | | Positive | Negative | |
| Actual class | Positive | TP | FN | TPR |
| | Negative | FP | TN | TNR |
| | | PPV | NPV | |
Figure 6. The definitions of confusion matrix-derived accuracy measures.
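The six binary-class criteria reported in the results tables (accuracy, sensitivity, specificity, precision, F-score, and MCC) are all derived from the four cells of the confusion matrix. The sketch below assumes the standard textbook definitions of these measures; the function name and the example counts are hypothetical.

```python
import math

def binary_metrics(tp, fn, fp, tn):
    """Standard measures derived from a 2x2 confusion matrix.
    Assumes every row and column total is non-zero."""
    sensitivity = tp / (tp + fn)   # TPR, also called recall
    specificity = tn / (tn + fp)   # TNR
    precision = tp / (tp + fp)     # PPV
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
        "f_score": f_score,
        "mcc": mcc,
    }

# Hypothetical counts, for illustration only.
m = binary_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells symmetrically, which makes it more informative on imbalanced class distributions such as those shown in the dataset tables.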
Assessment results of the proposed model compared with the other thirteen methods on the five binary-class data sets, using six commonly used performance evaluation criteria.
| Data sets | Performance evaluation criteria | ||||||
| Name of Classifier | Accuracy | Sensitivity | Specificity | Precision | F-score | MCC | |
|
| ANFIS | 0.774 | 0.865 | 0.671 | 0.753 | 0.803 | 0.551 |
| NB | 0.751 | 0.774 | 0.725 | 0.761 | 0.765 | 0.501 | |
| BPNN | 0.805 | 0.837 | 0.769 | 0.799 | 0.818 | 0.609 | |
| GLM binomial | 0.821 | 0.821 | 0.823 | 0.839 | 0.827 | 0.642 | |
| GLM inv. gaussian | 0.814 | 0.901 | 0.666 | 0.77 | 0.845 | 0.639 | |
| GLM normal | 0.843 | 0.903 | 0.773 | 0.824 | 0.86 | 0.687 | |
| GLM poisson | 0.848 | 0.892 | 0.751 | 0.817 | 0.868 | 0.698 | |
| ID3 | 0.744 | 0.761 | 0.727 | 0.769 | 0.761 | 0.491 | |
| Bagging-ID3 | 0.804 | 0.844 | 0.758 | 0.805 | 0.822 | 0.606 | |
| k-NN | 0.801 | 0.825 | 0.775 | 0.812 | 0.815 | 0.602 | |
| DWk-NN | 0.811 | 0.834 | 0.784 | 0.817 | 0.823 | 0.621 | |
| PDk-NN | 0.802 | 0.837 | 0.76 | 0.799 | 0.816 | 0.599 | |
| RBF | 0.757 | 0.853 | 0.645 | 0.734 | 0.787 | 0.517 | |
| Proposed Model | 0.856 | 0.906 | 0.801 | 0.841 | 0.872 | 0.714 | |
|
| ANFIS | 0.806 | 0.894 | 0.651 | 0.818 | 0.853 | 0.572 |
| NB | 0.779 | 0.785 | 0.769 | 0.856 | 0.817 | 0.542 | |
| BPNN | 0.787 | 0.883 | 0.618 | 0.813 | 0.840 | 0.517 | |
| GLM binomial | 0.815 | 0.862 | 0.733 | 0.846 | 0.853 | 0.601 | |
| GLM inv. gaussian | 0.801 | 0.927 | 0.585 | 0.794 | 0.854 | 0.564 | |
| GLM normal | 0.825 | 0.903 | 0.685 | 0.831 | 0.868 | 0.610 | |
| GLM poisson | 0.820 | 0.911 | 0.658 | 0.827 | 0.866 | 0.600 | |
| ID3 | 0.768 | 0.824 | 0.670 | 0.819 | 0.819 | 0.497 | |
| Bagging-ID3 | 0.803 | 0.869 | 0.695 | 0.827 | 0.846 | 0.577 | |
| k-NN | 0.778 | 0.874 | 0.611 | 0.801 | 0.834 | 0.511 | |
| DWk-NN | 0.783 | 0.867 | 0.633 | 0.808 | 0.835 | 0.521 | |
| PDk-NN | 0.784 | 0.875 | 0.633 | 0.803 | 0.836 | 0.530 | |
| RBF | 0.748 | 0.826 | 0.607 | 0.806 | 0.807 | 0.438 | |
| Proposed Model | 0.842 | 0.935 | 0.678 | 0.839 | 0.883 | 0.652 | |
|
| ANFIS | 0.908 | 0.965 | 0.799 | 0.901 | 0.931 | 0.795 |
| NB | 0.938 | 0.956 | 0.903 | 0.948 | 0.952 | 0.863 | |
| BPNN | 0.931 | 0.923 | 0.946 | 0.971 | 0.946 | 0.854 | |
| GLM binomial | 0.972 | 0.961 | 0.973 | 0.976 | 0.968 | 0.941 | |
| GLM inv. gaussian | 0.917 | 0.979 | 0.783 | 0.892 | 0.933 | 0.821 | |
| GLM normal | 0.961 | 0.978 | 0.927 | 0.961 | 0.971 | 0.915 | |
| GLM poisson | 0.949 | 0.979 | 0.881 | 0.938 | 0.958 | 0.889 | |
| ID3 | 0.949 | 0.959 | 0.931 | 0.962 | 0.961 | 0.889 | |
| Bagging-ID3 | 0.968 | 0.971 | 0.965 | 0.981 | 0.976 | 0.932 | |
| k-NN | 0.960 | 0.977 | 0.953 | 0.974 | 0.975 | 0.932 | |
| DWk-NN | 0.967 | 0.976 | 0.950 | 0.973 | 0.974 | 0.927 | |
| PDk-NN | 0.968 | 0.974 | 0.957 | 0.976 | 0.975 | 0.931 | |
| RBF | 0.921 | 0.971 | 0.831 | 0.914 | 0.942 | 0.825 | |
| Proposed Model | 0.981 | 0.982 | 0.978 | 0.988 | 0.985 | 0.957 | |
|
| ANFIS | 0.916 | 0.955 | 0.852 | 0.915 | 0.934 | 0.820 |
| NB | 0.954 | 0.972 | 0.923 | 0.955 | 0.964 | 0.902 | |
| BPNN | 0.907 | 0.932 | 0.866 | 0.927 | 0.920 | 0.801 | |
| GLM binomial | 0.947 | 0.949 | 0.944 | 0.965 | 0.957 | 0.889 | |
| GLM inv. gaussian | 0.889 | 0.996 | 0.711 | 0.851 | 0.917 | 0.772 | |
| GLM normal | 0.946 | 0.987 | 0.878 | 0.931 | 0.958 | 0.885 | |
| GLM poisson | 0.931 | 0.994 | 0.825 | 0.905 | 0.947 | 0.855 | |
| ID3 | 0.928 | 0.939 | 0.909 | 0.946 | 0.942 | 0.847 | |
| Bagging-ID3 | 0.944 | 0.962 | 0.914 | 0.951 | 0.956 | 0.881 | |
| k-NN | 0.968 | 0.989 | 0.932 | 0.961 | 0.975 | 0.932 | |
| DWk-NN | 0.966 | 0.988 | 0.929 | 0.958 | 0.973 | 0.927 | |
| PDk-NN | 0.963 | 0.987 | 0.922 | 0.956 | 0.971 | 0.921 | |
| RBF | 0.954 | 0.983 | 0.907 | 0.946 | 0.964 | 0.903 | |
| Proposed Model | 0.983 | 0.998 | 0.959 | 0.976 | 0.987 | 0.965 | |
|
| ANFIS | 0.696 | 0.797 | 0.511 | 0.751 | 0.773 | 0.319 |
| NB | 0.747 | 0.896 | 0.471 | 0.759 | 0.822 | 0.415 | |
| BPNN | 0.741 | 0.890 | 0.463 | 0.761 | 0.821 | 0.383 | |
| GLM binomial | 0.761 | 0.861 | 0.581 | 0.790 | 0.824 | 0.463 | |
| GLM inv. gaussian | 0.769 | 0.926 | 0.479 | 0.767 | 0.839 | 0.471 | |
| GLM normal | 0.769 | 0.893 | 0.539 | 0.783 | 0.835 | 0.471 | |
| GLM poisson | 0.765 | 0.896 | 0.521 | 0.777 | 0.832 | 0.459 | |
| ID3 | 0.702 | 0.775 | 0.571 | 0.768 | 0.772 | 0.347 | |
| Bagging-ID3 | 0.747 | 0.829 | 0.593 | 0.793 | 0.811 | 0.432 | |
| k-NN | 0.739 | 0.848 | 0.535 | 0.775 | 0.809 | 0.405 | |
| DWk-NN | 0.737 | 0.846 | 0.536 | 0.773 | 0.808 | 0.403 | |
| PDk-NN | 0.728 | 0.838 | 0.523 | 0.767 | 0.801 | 0.381 | |
| RBF | 0.759 | 0.889 | 0.521 | 0.774 | 0.828 | 0.449 | |
| Proposed Model | 0.774 | 0.936 | 0.611 | 0.797 | 0.861 | 0.481 | |
Results of Levene's homoscedasticity test on the six performance evaluation criteria for the five binary-class data sets.
| Performance evaluation criteria | ||||||
| Name of dataset | Accuracy | Sensitivity | Specificity | Precision | F-score | MCC |
|
| 0.00 * | 0.00 * | 0.00 * | 0.00 * | 0.00 * | 0.01 * |
|
| 0.03 * | 0.00 * | 0.00 * | 0.20 | 0.00 * | 0.01 * |
|
| 0.02 * | 0.00 * | 0.00 * | 0.01* | 0.00 * | 0.00 * |
|
| 0.01 * | 0.00 * | 0.00 * | 0.02 * | 0.00 * | 0.00 * |
|
| 0.03 * | 0.00 * | 0.00 * | 0.10 | 0.00 * | 0.01 * |
The symbol “*” indicates that the homoscedasticity condition was not satisfied (Levene's test was statistically significant, p-value < 0.05).
Results of the Friedman multiple comparison test on the six performance evaluation criteria for the five binary-class data sets.
| Performance evaluation criteria | ||||||
| Accuracy | Sensitivity | Specificity | Precision | F-score | MCC | |
|
| 32.872 | 46.399 | 43.175 | 32.904 | 29.902 | 34.354 |
|
| 0.001781 * | 0.000012 * | 0.000042 * | 0.001761 * | 0.001761* | 0.001063 * |
The symbol “*” indicates that the Friedman test was statistically significant (p-value < 0.05).
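The Friedman test ranks the classifiers within each data set and then checks whether the rank sums differ more than chance would allow. The sketch below computes the Friedman chi-square statistic in plain Python, assuming the standard formula without tie correction. It uses only three classifiers from the binary-class results table, so it illustrates the mechanics and does not reproduce the statistics reported above. The function names are hypothetical.

```python
def average_ranks(scores):
    """Rank scores so the highest gets rank 1; ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0   # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def friedman_statistic(table):
    """table[d][c] = score of classifier c on data set d.
    Returns (chi-square statistic, mean rank per classifier)."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for c, r in enumerate(average_ranks(row)):
            rank_sums[c] += r
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return stat, [r / n for r in rank_sums]

# Accuracy values from the binary-class results table, one row per data set.
# Columns: proposed model, k-NN, ANFIS.
accuracy = [
    [0.856, 0.801, 0.774],
    [0.842, 0.778, 0.806],
    [0.981, 0.960, 0.908],
    [0.983, 0.968, 0.916],
    [0.774, 0.739, 0.696],
]
stat, mean_ranks = friedman_statistic(accuracy)
print(round(stat, 3), [round(r, 2) for r in mean_ranks])  # 8.4 [1.0, 2.2, 2.8]
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom to obtain the p-value; that step needs a chi-square CDF and is omitted here.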
Results of Holm's pairwise multiple comparison (post-hoc) test, with the proposed hybrid model as the control algorithm against the remaining algorithms, on the six performance evaluation criteria for the five binary-class data sets.
| Evaluation criteria | Name of Classifier | Z-Score | P-Value | Holm adjustment coefficient | Adjusted P-Value |
|
|
| ||||
| ANFIS | 4.5356 | .000006 | 13 | .000076* | |
| RBF | 4.1576 | .000032 | 12 | .000388* | |
| ID3 | 3.9686 | .000073 | 11 | .000798* | |
| NB | 3.7796 | .000157 | 10 | .001573* | |
| GLM inv. gaussian | 3.5907 | .000330 | 9 | .002970* | |
| BPNN | 3.4017 | .000670 | 8 | .005359* | |
| k-NN | 3.4017 | .000670 | 7 | .004689* | |
| PDk-NN | 3.1749 | .001499 | 6 | .008993* | |
| DWk-NN | 3.0237 | .002497 | 5 | .012484* | |
| GLM binomial | 2.8347 | .004587 | 4 | .018346* | |
| GLM poisson | 2.6458 | .008150 | 3 | .024449* | |
| GLM normal | 2.4568 | .014018 | 2 | .028036* | |
| Bagging-ID3 | 2.2678 | .023341 | 1 | .023341* | |
|
|
| ||||
| NB | 4.5356 | .000006 | 13 | .000076* | |
| GLM binomial | 4.1576 | .000032 | 12 | .000388* | |
| ID3 | 3.9686 | .000073 | 11 | .000798* | |
| ANFIS | 3.7796 | .000157 | 10 | .001573* | |
| RBF | 3.4017 | .000670 | 9 | .006028* | |
| BPNN | 3.2127 | .001315 | 8 | .010519* | |
| DWk-NN | 3.0993 | .001940 | 7 | .013578* | |
| PDk-NN | 3.0237 | .002497 | 6 | .014981* | |
| k-NN | 2.8347 | .004587 | 5 | .022933* | |
| GLM normal | 2.6458 | .008150 | 4 | .032598* | |
| GLM poisson | 2.5702 | .010164 | 3 | .030491* | |
| Bagging-ID3 | 2.4568 | .014018 | 2 | .028036* | |
| GLM inv. gaussian | 2.2678 | .023341 | 1 | .023341* | |
|
|
| ||||
| GLM inv. gaussian | 4.6868 | .000003 | 13 | .000037* | |
| RBF | 3.9686 | .000073 | 12 | .000870* | |
| ANFIS | 3.8552 | .000116 | 11 | .001275* | |
| GLM poisson | 3.3639 | .000769 | 10 | .007686* | |
| BPNN | 3.4017 | .000670 | 9 | .006028* | |
| NB | 3.0993 | .001940 | 8 | .015517* | |
| ID3 | 3.0237 | .002497 | 7 | .017478* | |
| PDk-NN | 2.9103 | .003611 | 6 | .021664* | |
| GLM normal | 2.6458 | .008150 | 5 | .040748* | |
| k-NN | 2.3434 | .019109 | 4 | .076436* | |
| DWk-NN | 2.4568 | .014018 | 3 | .042054* | |
| Bagging-ID3 | 2.2678 | .023341 | 2 | .046683* | |
| GLM binomial | 2.1544 | .031209 | 1 | .031209* | |
|
|
| ||||
| ANFIS | 4.2332 | .000023 | 13 | .000302* | |
| GLM inv. gaussian | 4.1198 | .000038 | 12 | .000458* | |
| RBF | 3.5151 | .000440 | 11 | .004838* | |
| PDk-NN | 2.8725 | .004072 | 10 | .040721* | |
| DWk-NN | 2.6458 | .008150 | 9 | .073346* | |
| NB | 2.6080 | .009107 | 8 | .072857* | |
| BPNN | 2.5702 | .010164 | 7 | .071146* | |
| GLM poisson | 2.5702 | .010164 | 6 | .060983* | |
| GLM normal | 2.4946 | .012610 | 5 | .063049* | |
| ID3 | 2.4568 | .014018 | 4 | .056072* | |
| k-NN | 2.3434 | .019109 | 3 | .057327* | |
| GLM binomial | 2.2678 | .023341 | 2 | .046683* | |
| Bagging-ID3 | 2.2300 | .025748 | 1 | .025748* | |
|
|
| ||||
| ID3 | 4.5356 | .000006 | 13 | .000076* | |
| ANFIS | 4.1576 | .000032 | 12 | .000388* | |
| RBF | 3.7796 | .000157 | 11 | .001731* | |
| BPNN | 3.2883 | .001008 | 10 | .010080* | |
| NB | 3.2505 | .001152 | 9 | .010368* | |
| GLM inv. gaussian | 3.0237 | .002497 | 8 | .019975* | |
| k-NN | 2.9103 | .003611 | 7 | .025274* | |
| PDk-NN | 2.8347 | .004587 | 6 | .027520* | |
| DWk-NN | 2.7213 | .006502 | 5 | .032512* | |
| Bagging-ID3 | 2.6458 | .008150 | 4 | .032598* | |
| GLM poisson | 2.4568 | .014018 | 3 | .042054* | |
| GLM binomial | 2.3434 | .019109 | 2 | .038218* | |
| GLM normal | 2.2678 | .023341 | 1 | .023341* | |
|
|
| ||||
| ID3 | 4.1198 | .000038 | 13 | .000496* | |
| ANFIS | 3.9308 | .000085 | 12 | .001019* | |
| BPNN | 3.6285 | .000285 | 11 | .003138* | |
| NB | 3.4017 | .000670 | 10 | .006698* | |
| RBF | 3.3261 | .000881 | 9 | .007927* | |
| GLM inv. gaussian | 3.0993 | .001940 | 8 | .015517* | |
| PDk-NN | 3.0237 | .002497 | 7 | .017478* | |
| k-NN | 2.9103 | .003611 | 6 | .021664* | |
| DWk-NN | 2.8347 | .004587 | 5 | .022933* | |
| GLM poisson | 2.6458 | .008150 | 4 | .032598* | |
| GLM normal | 2.4568 | .014018 | 3 | .042054* | |
| Bagging-ID3 | 2.3812 | .017256 | 2 | .034513* | |
| GLM binomial | 2.3056 | .021133 | 1 | .021133* |
The symbol “*” indicates that the pairwise multiple comparison (post-hoc) test was statistically significant (p-value < 0.05).
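The post-hoc table lists, for each comparison against the control, a z-score, its p-value, Holm's coefficient (13 down to 1), and the product of the two. The sketch below recomputes these quantities from the z-scores of the first block of the table, assuming two-sided normal p-values. The "Adjusted P-Value" column appears to be the plain product, without the running-maximum step that the usual Holm step-down procedure applies, so the code returns both versions. The function names are hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p-value of a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def holm_adjust(p_values):
    """Holm step-down adjustment.
    Returns (raw products, monotone adjusted p-values), both in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    raw = [0.0] * m
    adjusted = [0.0] * m
    running_max = 0.0
    for step, i in enumerate(order):
        product = (m - step) * p_values[i]        # coefficients m, m-1, ..., 1
        raw[i] = product
        running_max = max(running_max, product)   # enforce monotonicity
        adjusted[i] = min(1.0, running_max)
    return raw, adjusted

# Z-scores of the first block of the post-hoc table (13 comparisons).
z_scores = [4.5356, 4.1576, 3.9686, 3.7796, 3.5907, 3.4017, 3.4017,
            3.1749, 3.0237, 2.8347, 2.6458, 2.4568, 2.2678]
p_values = [two_sided_p(z) for z in z_scores]
raw, adjusted = holm_adjust(p_values)
for z, p, r, a in zip(z_scores, p_values, raw, adjusted):
    print(f"z={z:.4f}  p={p:.6f}  product={r:.6f}  holm={a:.6f}")
```

A comparison is declared significant when its adjusted p-value is below the chosen level (0.05 in the table footnote). With the monotone version, once one comparison fails, every later one in the ordering fails as well.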
Assessment results of the proposed model compared with the other thirteen methods on the two multi-class data sets, using nine commonly used performance evaluation criteria.
| Data sets | Performance evaluation criteria | |||||||||
| Name of Classifier | Accuracy | P-micro | P-macro | R-micro | R-macro | F-micro | F-macro | MCC | 1-CEN | |
|
| ANFIS | 0.546 | 0.365 | 0.352 | 0.378 | 0.351 | 0.371 | 0.327 | 0.297 | 0.579 |
| NB | 0.525 | 0.331 | 0.312 | 0.323 | 0.301 | 0.327 | 0.290 | 0.264 | 0.578 | |
| BPNN | 0.588 | 0.364 | 0.350 | 0.329 | 0.314 | 0.346 | 0.275 | 0.408 | 0.646 | |
| GLM binomial | 0.613 | 0.371 | 0.357 | 0.357 | 0.344 | 0.364 | 0.291 | 0.360 | 0.711 | |
| GLM inv. gaussian | 0.584 | 0.353 | 0.347 | 0.349 | 0.332 | 0.351 | 0.287 | 0.291 | 0.697 | |
| GLM normal | 0.613 | 0.358 | 0.343 | 0.361 | 0.348 | 0.359 | 0.288 | 0.364 | 0.713 | |
| GLM poisson | 0.594 | 0.351 | 0.332 | 0.342 | 0.329 | 0.346 | 0.284 | 0.328 | 0.696 | |
| ID3 | 0.505 | 0.321 | 0.308 | 0.325 | 0.304 | 0.323 | 0.290 | 0.242 | 0.541 | |
| Bagging-ID3 | 0.593 | 0.378 | 0.361 | 0.359 | 0.342 | 0.368 | 0.332 | 0.279 | 0.601 | |
| k-NN | 0.559 | 0.321 | 0.294 | 0.298 | 0.285 | 0.309 | 0.268 | 0.294 | 0.612 | |
| DWk-NN | 0.561 | 0.319 | 0.311 | 0.312 | 0.299 | 0.315 | 0.286 | 0.296 | 0.597 | |
| PDk-NN | 0.569 | 0.325 | 0.318 | 0.322 | 0.304 | 0.323 | 0.292 | 0.289 | 0.598 | |
| RBF | 0.531 | 0.285 | 0.261 | 0.218 | 0.209 | 0.247 | 0.157 | 0.050 | 0.721 | |
| Proposed Model | 0.621 | 0.423 | 0.507 | 0.431 | 0.418 | 0.427 | 0.436 | 0.497 | 0.804 | |
|
| ANFIS | 0.501 | 0.483 | 0.278 | 0.512 | 0.499 | 0.497 | 0.207 | 0.252 | 0.839 |
| NB | 0.507 | 0.502 | 0.481 | 0.451 | 0.448 | 0.475 | 0.357 | 0.348 | 0.669 | |
| BPNN | 0.776 | 0.651 | 0.332 | 0.534 | 0.250 | 0.587 | 0.183 | 0.001 | 0.770 | |
| GLM binomial | 0.563 | 0.493 | 0.487 | 0.341 | 0.325 | 0.403 | 0.319 | 0.215 | 0.659 | |
| GLM inv. gaussian | 0.621 | 0.495 | 0.489 | 0.412 | 0.401 | 0.450 | 0.392 | 0.356 | 0.674 | |
| GLM normal | 0.547 | 0.491 | 0.484 | 0.324 | 0.319 | 0.390 | 0.303 | 0.206 | 0.655 | |
| GLM poisson | 0.582 | 0.523 | 0.516 | 0.361 | 0.353 | 0.427 | 0.344 | 0.267 | 0.665 | |
| ID3 | 0.915 | 0.781 | 0.765 | 0.768 | 0.763 | 0.774 | 0.763 | 0.856 | 0.892 | |
| Bagging-ID3 | 0.937 | 0.793 | 0.773 | 0.752 | 0.745 | 0.772 | 0.741 | 0.894 | 0.926 | |
| k-NN | 0.777 | 0.656 | 0.648 | 0.591 | 0.580 | 0.622 | 0.584 | 0.604 | 0.751 | |
| DWk-NN | 0.784 | 0.642 | 0.637 | 0.589 | 0.585 | 0.614 | 0.590 | 0.615 | 0.756 | |
| PDk-NN | 0.763 | 0.607 | 0.598 | 0.576 | 0.569 | 0.591 | 0.574 | 0.579 | 0.729 | |
| RBF | 0.621 | 0.421 | 0.407 | 0.352 | 0.345 | 0.383 | 0.276 | 0.184 | 0.777 | |
| Proposed Model | 0.952 | 0.868 | 0.784 | 0.812 | 0.792 | 0.839 | 0.774 | 0.915 | 0.931 | |
Results of Levene's homoscedasticity test on the nine performance evaluation criteria for the two multi-class data sets.
| Performance evaluation criteria | |||||||||
| Name of dataset | Accuracy | P-micro | P-macro | R-micro | R-macro | F-micro | F-macro | MCC | 1-CEN |
|
| 0.00* | 0.04* | 0.02* | 0.01* | 0.00* | 0.03* | 0.00* | 0.17 | 0.13 |
|
| 0.00* | 0.01* | 0.11* | 0.00* | 0.00* | 0.06 | 0.00* | 0.02* | 0.04* |
The symbol “*” indicates that the homoscedasticity condition was not satisfied (Levene's test was statistically significant, p-value < 0.05).
Results of the Friedman multiple comparison test on the nine performance evaluation criteria for the two multi-class data sets.
| Performance evaluation criteria | |||||||||
| Name of Classifier | Accuracy | P-micro | P-macro | R-micro | R-macro | F-micro | F-macro | MCC | 1-CEN |
|
| 23.614 | 24.647 | 22.771 | 24.068 | 24.056 | 23.492 | 29.24 | 30.375 | 32.571 |
|
| .035* | .026* | .045* | .031* | .031* | .036* | .006* | .004* | .002* |
The symbol “*” indicates that the Friedman test was statistically significant (p-value < 0.05).
Results of Holm's pairwise multiple comparison (post-hoc) test, with the proposed hybrid model as the control algorithm against the remaining algorithms, on the nine performance evaluation criteria for the two multi-class data sets.
| Evaluation criteria | Name of Classifier | Z-Score | P-Value | Holm adjustment coefficient | Adjusted P-Value |
|
|
| ||||
| NB | 4.5356 | .000006 | 13 | .000075* | |
| ANFIS | 4.5356 | .000006 | 12 | .000069* | |
| RBF | 4.0329 | .000055 | 11 | .000606* | |
| GLM normal | 3.7154 | .000203 | 10 | .002029* | |
| GLM binomial | 3.4659 | .000528 | 9 | .004756* | |
| GLM poisson | 3.4017 | .000670 | 8 | .005357* | |
| GLM inv. gaussian | 3.2770 | .001049 | 7 | .007344* | |
| PDk-NN | 3.1371 | .001706 | 6 | .010238* | |
| k-NN | 3.0237 | .002497 | 5 | .012485* | |
| ID3 | 2.9481 | .003197 | 4 | .012789* | |
| DWk-NN | 2.8990 | .003744 | 3 | .011231* | |
| BPNN | 1.7651 | .077547 | 2 | .155094 | |
| Bagging-ID3 | 1.1339 | .256837 | 1 | .256837 | |
|
|
| ||||
| RBF | 4.1576 | .000032 | 13 | .000418* | |
| NB | 3.9686 | .000072 | 12 | .000868* | |
| ANFIS | 3.7796 | .000157 | 11 | .001728* | |
| GLM normal | 3.4017 | .000670 | 10 | .006697* | |
| GLM binomial | 3.2127 | .001315 | 9 | .011834* | |
| GLM poisson | 3.0993 | .001940 | 8 | .015518* | |
| k-NN | 3.0237 | .002497 | 7 | .017479* | |
| GLM inv. gaussian | 2.9481 | .003197 | 6 | .019184* | |
| PDk-NN | 2.8725 | .004072 | 5 | .020362* | |
| DWk-NN | 2.7705 | .005597 | 4 | .022388* | |
| ID3 | 1.8256 | .067911 | 3 | .203732 | |
| BPNN | 1.5119 | .130559 | 2 | .261119 | |
| Bagging-ID3 | .3780 | .705431 | 1 | .705431 | |
|
|
| ||||
| RBF | 4.4108 | .000010 | 13 | .000134* | |
| NB | 3.6549 | .000257 | 12 | .003087* | |
| ANFIS | 3.6549 | .000257 | 11 | .002830* | |
| BPNN | 3.5264 | .000421 | 10 | .004213* | |
| GLM normal | 3.4017 | .000670 | 9 | .006027* | |
| GLM poisson | 3.3261 | .000881 | 8 | .007046* | |
| GLM binomial | 3.1371 | .001706 | 7 | .011944* | |
| k-NN | 3.0237 | .002497 | 6 | .014982* | |
| DWk-NN | 2.9103 | .003611 | 5 | .018054* | |
| PDk-NN | 2.8347 | .004587 | 4 | .018347* | |
| GLM inv. gaussian | 2.6458 | .008169 | 3 | .024507* | |
| ID3 | 1.8898 | .058785 | 2 | .117569 | |
| Bagging-ID3 | .3780 | .705431 | 1 | .705431 | |
|
|
| ||||
| GLM normal | 4.5356 | .000006 | 13 | .000075* | |
| RBF | 4.4108 | .000010 | 12 | .000124* | |
| GLM binomial | 3.4017 | .000670 | 11 | .007366* | |
| GLM poisson | 3.1484 | .001642 | 10 | .016417* | |
| NB | 3.0237 | .002497 | 9 | .022473* | |
| GLM inv. gaussian | 3.0237 | .002497 | 8 | .019976* | |
| k-NN | 3.0237 | .002497 | 7 | .017479* | |
| ID3 | 2.9481 | .003197 | 6 | .019184* | |
| DWk-NN | 2.8347 | .004587 | 5 | .022934* | |
| PDk-NN | 2.7969 | .005160 | 4 | .020638* | |
| ANFIS | 2.7213 | .006503 | 3 | .019508* | |
| BPNN | 1.6366 | .101714 | 2 | .203428 | |
| Bagging-ID3 | 1.2586 | .208175 | 1 | .208175 | |
|
|
| ||||
| RBF | 4.5356 | .000006 | 13 | .000075* | |
| BPNN | 4.0442 | .000053 | 12 | .000630* | |
| NB | 3.4017 | .000670 | 11 | .007366* | |
| k-NN | 3.4017 | .000670 | 10 | .006697* | |
| DWk-NN | 2.9103 | .003611 | 9 | .032497* | |
| GLM poisson | 2.7591 | .005796 | 8 | .046369* | |
| GLM normal | 2.7213 | .006503 | 7 | .045518* | |
| PDk-NN | 2.7213 | .006503 | 6 | .039015* | |
| GLM binomial | 2.6458 | .008150 | 5 | .040749* | |
| GLM inv. gaussian | 2.6458 | .008150 | 4 | .032599* | |
| ID3 | 1.5875 | .112399 | 3 | .337198* | |
| ANFIS | 1.1339 | .256837 | 2 | .513673 | |
| Bagging-ID3 | 1.0583 | .289919 | 1 | .289919 | |
|
|
| ||||
| RBF | 4.9135 | .000001 | 13 | .000012* | |
| GLM normal | 4.5356 | .000006 | 12 | .000070* | |
| GLM poisson | 4.1576 | .000032 | 11 | .000356* | |
| GLM binomial | 3.9686 | .000073 | 10 | .000725* | |
| NB | 3.7796 | .000157 | 9 | .001416* | |
| GLM inv. gaussian | 3.4017 | .000670 | 8 | .005359* | |
| PDk-NN | 3.0237 | .002497 | 7 | .017478* | |
| DWk-NN | 2.9103 | .003611 | 6 | .021664* | |
| k-NN | 2.8347 | .004587 | 5 | .022933* | |
| ANFIS | 2.7213 | .006502 | 4 | .026009* | |
| BPNN | 2.6458 | .008150 | 3 | .024449* | |
| ID3 | 1.5761 | .115003 | 2 | .230005 | |
| Bagging-ID3 | .6312 | .527910 | 1 | .527910 | |
|
|
| ||||
| RBF | 4.6868 | .000003 | 13 | .000037* | |
| BPNN | 4.5356 | .000006 | 12 | .000070* | |
| GLM normal | 4.1576 | .000032 | 11 | .000356* | |
| GLM poisson | 3.9686 | .000073 | 10 | .000725* | |
| ANFIS | 3.7796 | .000157 | 9 | .001416* | |
| GLM binomial | 3.4017 | .000670 | 8 | .005359* | |
| k-NN | 3.2127 | .001315 | 7 | .009205* | |
| GLM inv. gaussian | 3.0993 | .001940 | 6 | .011638* | |
| NB | 3.0237 | .002497 | 5 | .012484* | |
| DWk-NN | 2.8347 | .004587 | 4 | .018346* | |
| PDk-NN | 2.7213 | .006502 | 3 | .019507* | |
| ID3 | 2.6458 | .008150 | 2 | .016299* | |
| Bagging-ID3 | .5292 | .596667 | 1 | .596667 | |
|
|
| ||||
| RBF | 4.8379 | .000001 | 13 | .000018* | |
| NB | 4.5356 | .000006 | 12 | .000070* | |
| GLM normal | 4.3466 | .000014 | 11 | .000154* | |
| GLM poisson | 4.1576 | .000032 | 10 | .000324* | |
| GLM binomial | 3.8930 | .000099 | 9 | .000893* | |
| GLM inv. gaussian | 3.7796 | .000157 | 8 | .001259* | |
| PDk-NN | 3.6663 | .000246 | 7 | .001724* | |
| BPNN | 3.5907 | .000330 | 6 | .001980* | |
| ANFIS | 3.4017 | .000670 | 5 | .003349* | |
| k-NN | 3.3261 | .000881 | 4 | .003523* | |
| DWk-NN | 3.0237 | .002497 | 3 | .007491* | |
| ID3 | 2.6458 | .008150 | 2 | .016299* | |
| Bagging-ID3 | 2.0788 | .037636 | 1 | .037636* | |
|
|
| ||||
| NB | 4.5356 | .000006 | 13 | .000076* | |
| PDk-NN | 4.1576 | .000032 | 12 | .000388* | |
| DWk-NN | 3.9686 | .000073 | 11 | .000798* | |
| GLM poisson | 3.7796 | .000157 | 10 | .001573* | |
| k-NN | 3.6663 | .000246 | 9 | .002217* | |
| GLM normal | 3.5907 | .000330 | 8 | .002640* | |
| GLM binomial | 3.4017 | .000670 | 7 | .004689* | |
| GLM inv. gaussian | 3.2505 | .001152 | 6 | .006912* | |
| ANFIS | 3.0237 | .002497 | 5 | .012484* | |
| BPNN | 2.8347 | .004587 | 4 | .018346* | |
| ID3 | 2.6458 | .008150 | 3 | .024449* | |
| Bagging-ID3 | 2.2678 | .023341 | 2 | .046683* | |
| RBF | 2.0410 | .041251 | 1 | .041251* |
The symbol “*” indicates that the pairwise multiple comparison (post-hoc) test was statistically significant (p-value < 0.05).
Classification accuracies obtained with the proposed hybrid model and with other state-of-the-art classifiers from the recent literature for the data sets under consideration.
| Data sets | Kind of Hybrid | Author | Name of Classifier | Year | Accuracy |
|
|
| Zhang et al. | RF | 2008 | 55.62 |
| Zhang et al. | RF- AdaBoost | 2008 | 56.20 | ||
| Ghaemi et al. | FW-FOA | 2014 | 58.14 | ||
|
| Madhu et al. | SVM-ZDISC | 2014 | 57.90 | |
| Madhu et al. | SVM-Bayesian | 2014 | 56.08 | ||
| Madhu et al. | SVM-Fayyad-Irani | 2014 | 57.74 | ||
| Madhu et al. | SVM-CACC | 2014 | 56.70 | ||
| Forghani et al. | SVM-Fuzzy | 2014 | 63.00 | ||
|
| Zhang et al. | AdaBoost | 2008 | 54.45 | |
| Zhang et al. | MultiBoost | 2008 | 55.52 | ||
| Madhu et al. | C4.5-ZDISC | 2014 | 57.09 | ||
| Madhu et al. | C4.5-Bayesian | 2014 | 52.50 | ||
| Madhu et al. | C4.5-Fayyad-Irani | 2014 | 57.97 | ||
| Madhu et al. | C4.5-CACC | 2014 | 50.80 | ||
|
|
|
|
| ||
|
|
| Tan et al. | SVM-GA | 2009 | 84.07 |
| Ozcift | RF-CFS | 2011 | 80.49 | ||
| Ballings et al. | RF | 2013 | 82.12 | ||
| Ballings et al. | KIRF-RBF | 2013 | 67.55 | ||
| Fernandez-Delgado et al. | RF | 2014 | 80.40 | ||
|
| Tan et al. | SVM-GA | 2009 | 84.07 | |
| Fernandez-Delgado et al. | SVM-DKP | 2014 | 79.90 | ||
| Fernandez-Delgado et al. | SVM | 2014 | 81.60 | ||
| Chen et al. | GRID-SVM | 2014 | 83.44 | ||
| Chen et al. | PSO-SVM | 2014 | 86.55 | ||
| Chen et al. | PTVPSO-SVM | 2014 | 87.21 | ||
|
| Zhang et al. | LP-Adaboost | 2011 | 77.04 | |
| Zhang et al. | LP-WV | 2011 | 83.22 | ||
| Zhang et al. | MCE-WV | 2011 | 81.70 | ||
| Ahmad et al. | Improved GA-MLP | 2013 | 85.50 | ||
| Ballings et al. | KF-RBF | 2013 | 75.91 | ||
|
|
|
|
| ||
|
| |||||
|
| Rodriguez et al. | Resampling-AdaBoost | 2008 | 80.96 | |
| Rodriguez et al. | Reweighting -AdaBoost | 2008 | 81.44 | ||
| Sarkar et al. | Naïve Bayes-GA | 2012 | 73.30 | ||
| Sarkar et al. | C4.5-GA | 2012 | 78.08 | ||
| Sarkar et al. | ANN-GA | 2012 | 69.43 | ||
|
|
|
|
| ||
|
|
| Yao et al. | RF_CFS | 2011 | 96.26 |
| Yao et al. | RF-MARS | 2011 | 96.29 | ||
| Ozcift | RF-Bayes Network | 2012 | 96.22 | ||
| Ozcift | RF- Naive Bayes | 2012 | 96.22 | ||
| Ozcift | RF- RBF | 2012 | 96.22 | ||
| Ozcift | RF-kstar | 2012 | 99.05 | ||
| Ozcift | RF- Logistics | 2012 | 98.11 | ||
| Khan | RF-GA | 2013 | 76.26 | ||
| Cadenas et al. | RF-Fuzzy | 2013 | 95.20 | ||
| Cadenas et al. | RF-Fuzzy-fs | 2013 | 95.25 | ||
|
| Nandi et al. | SVM-SOM–RBF | 2006 | 98.00 | |
| Polat et al. | SVM- LS | 2007 | 98.53 | ||
| Kumar et al. | SVM-DT | 2010 | 87.08 | ||
| Chen et al. | SVM-RS | 2011 | 96.87 | ||
| Chen et al. | PSO-SVM | 2012 | 99.30 | ||
| Desir | RF-OC | 2012 | 96.00 | ||
| Chaurasia et al. | SVM-CFS | 2013 | 96.40 | ||
| Zheng et al. | K-means -SVM | 2014 | 97.38 | ||
| Gorunescu et al. | SVM | 2014 | 95.58 | ||
| Chen et al. | PSO-SVM | 2014 | 98.01 | ||
| Chen et al. | PTVPSO-SVM | 2014 | 98.44 | ||
| Chen et al. | GRID-SVM | 2014 | 97.45 | ||
|
| Hassan et al. | Fuzzy-HMM | 2010 | 98.16 | |
| Chin et al. | Fuzzy tree-CB | 2011 | 98.90 | ||
| Ballings et al. | KF-RBF | 2013 | 94.19 | ||
| Gorunescu et al. | MLP-GA | 2014 | 93.58 | ||
|
|
|
|
| ||
|
|
| Desir et al. | RF-One class | 2012 | 96.00 |
| Seera et al. | RF-FuzzyMM-CART | 2014 | 97.29 | ||
| Bonissone et al. | RF-Fuzzy | 2010 | 97.30 | ||
| Bonissone et al. | RF | 2010 | 97.07 | ||
|
| Desir et al. | SVM-One class | 2012 | 92.00 | |
| Stoean et al. | SVM | 2013 | 96.50 | ||
| Stoean et al. | SVM-FS | 2013 | 97.07 | ||
| Fernandez-Delgado et al. | SVM-DKP | 2014 | 96.10 | ||
| Fernandez-Delgado et al. | SVM | 2014 | 97.10 | ||
| Gorunescu et al. | SVM | 2014 | 96.92 | ||
| Chen et al. | PSO-SVM | 2014 | 97.55 | ||
| Chen et al. | PTVPSO-SVM | 2014 | 98.62 | ||
| Chen et al. | GRID-SVM | 2014 | 96.62 | ||
|
| Bonissone et al. | Fuzzy tree-Bagging | 2010 | 95.68 | |
| Bonissone et al. | Fuzzy tree-Boosting | 2010 | 94.51 | ||
| Polat et al. | Fuzzy-AIRS | 2007 | 98.51 | ||
| Subashini et al. | SVM-CFS | 2011 | 92.13 | ||
| Orkcu et al. | Binary Coded-GA | 2011 | 94.00 | ||
| Luukka | similarity classifier | 2011 | 97.49 | ||
| Luukka | similarity classifier-Fuzzy entropy | 2011 | 97.18 | ||
| Gorunescu et al. | MLP-GA | 2014 | 91.42 | ||
| Seera et al. | FuzzyMM-CART | 2014 | 93.14 | ||
|
|
|
|
| ||
|
|
| Bonissone et al. | RF-Fuzzy | 2010 | 76.53 |
| Bonissone et al. | RF | 2010 | 75.26 | ||
| Desir et al. | RF-One class | 2012 | 68.00 | ||
| Tripoliti et al. | RF | 2012 | 77.30 | ||
| Cadenas et al. | RF-Fuzzy | 2013 | 76.43 | ||
| Cadenas et al. | RF-Fuzzy-fs | 2013 | 75.69 | ||
| Fernandez-Delgado et al. | RF | 2014 | 74.60 | ||
| Seera et al. | RF-FuzzyMM-CART | 2014 | 76.56 | ||
|
| Polat et al. | GDA–LS-SVM | 2008 | 82.05 | |
| Tan et al. | SVM-GA | 2009 | 78.26 | ||
| Desir et al. | SVM- One class | 2012 | 34.00 | ||
| Chorowski et al. | SVM | 2014 | 76.00 | ||
| Chorowski et al. | SVM-LS | 2014 | 76.00 | ||
| Fernandez-Delgado et al. | SVM-DKP | 2014 | 74.7 | ||
| Fernandez-Delgado et al. | SVM | 2014 | 75.8 | ||
| Chen et al. | PSO-SVM | 2014 | 77.58 | ||
| Chen et al. | PTVPSO-SVM | 2014 | 78.14 | ||
| Chen et al. | GRID-SVM | 2014 | 76.65 | ||
|
| Dogantekin et al. | LDA-ANFIS | 2010 | 84.61 | |
| Bonissone et al. | Fuzzy tree | 2010 | 67.55 | ||
| Bonissone et al. | Fuzzy tree-Bagging | 2010 | 73.63 | ||
| Bonissone et al. | Fuzzy tree-Boosting | 2010 | 66.18 | ||
| Ozcift | RF-CFS | 2011 | 74.47 | ||
| Luukka | similarity classifier | 2011 | 75.29 | ||
| Luukka | similarity classifier-Fuzzy entropy | 2011 | 75.97 | ||
| Chorowski et al. | ELM | 2014 | 76.00 | ||
| Chorowski et al. | ML-ELM | 2014 | 77.00 | ||
| Fernandez-Delgado et al. | SVM-DKP | 2014 | 74.7 | ||
| Fernandez-Delgado et al. | SVM | 2014 | 75.8 | ||
| Seera et al. | FuzzyMM-CART | 2014 | 69.13 | ||
|
|
|
|
|