Abstract
Inspired by group decision making, ensembles (combinations) of classifiers have proven effective in a wide variety of application domains. Some researchers propose mixing two different types of classification algorithms to create a hybrid ensemble, but why such an ensemble works remains an open question. Building on the concept of diversity, one of the fundamental elements behind the success of ensembles, we conduct a theoretical analysis of why hybrid ensembles work, connecting the use of different algorithms to accuracy gain. We also conduct experiments on the classification performance of hybrid ensembles created with the decision tree and naïve Bayes algorithms, both of which are top data mining algorithms that are often used to create non-hybrid ensembles. Through this paper, we thus complement the theoretical foundation for creating and using hybrid ensembles.
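The role of diversity can be illustrated with a toy example that is not taken from the paper. Three members with the same individual accuracy (4 of 6 samples correct) gain nothing under majority voting when they make identical mistakes, but reach perfect accuracy when their mistakes fall on different samples. The data and the plurality-vote combiner below are illustrative assumptions.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(member_predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-member label predictions by plurality vote, sample by sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*member_predictions)]


def accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


truth = [1, 1, 1, 1, 1, 1]
# Three members, each wrong on a different pair of samples (diverse errors).
diverse = [
    [0, 0, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 0],
]
# Three copies of the same member (no diversity).
identical = [diverse[0]] * 3

for name, members in [("diverse", diverse), ("identical", identical)]:
    individual = [round(accuracy(m, truth), 3) for m in members]
    print(name, individual, round(accuracy(majority_vote(members), truth), 3))
# diverse [0.667, 0.667, 0.667] 1.0
# identical [0.667, 0.667, 0.667] 0.667
```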
Year: 2017 PMID: 28255296 PMCID: PMC5307253 DOI: 10.1155/2017/1930702
Source DB: PubMed Journal: Comput Intell Neurosci
Algorithm 1: Process to create a hybrid ensemble.
Algorithm 2: Process to estimate instability.
Algorithm 3: Process to estimate differentiability.
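The steps of Algorithm 1 (process to create a hybrid ensemble) are not reproduced in this excerpt. As a hedged illustration only, the sketch below builds a hybrid ensemble under assumptions that are not taken from the paper: each member is trained on a bootstrap replicate (as in bagging), members alternate between two base learners, and predictions are combined by plurality vote. The names `build_hybrid_ensemble`, `learner_a`, `learner_b`, and the toy learners `stump_learner` and `majority_learner` are hypothetical; in the paper's setting the two learners would be a decision tree and naïve Bayes.

```python
import random
from collections import Counter
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]            # (feature vector, class label)
Classifier = Callable[[Sequence[float]], int]
Learner = Callable[[List[Sample]], Classifier]  # trains on data, returns a classifier


def build_hybrid_ensemble(data: List[Sample], learner_a: Learner, learner_b: Learner,
                          size: int, seed: int = 0) -> List[Classifier]:
    """Train `size` members on bootstrap replicates, alternating the two learners."""
    rng = random.Random(seed)
    members = []
    for i in range(size):
        replicate = [rng.choice(data) for _ in data]  # sample with replacement
        learner = learner_a if i % 2 == 0 else learner_b
        members.append(learner(replicate))
    return members


def predict(members: List[Classifier], x: Sequence[float]) -> int:
    """Plurality vote over member predictions (an odd size avoids ties)."""
    return Counter(m(x) for m in members).most_common(1)[0][0]


# Hypothetical toy learners standing in for the two real algorithms.
def majority_learner(train: List[Sample]) -> Classifier:
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def stump_learner(train: List[Sample]) -> Classifier:
    """Threshold feature 0 halfway between the two class means (labels 0/1)."""
    by_class = {c: [x[0] for x, y in train if y == c] for c in (0, 1)}
    if not by_class[0] or not by_class[1]:
        return majority_learner(train)
    m0, m1 = mean(by_class[0]), mean(by_class[1])
    threshold = (m0 + m1) / 2
    high = 1 if m1 > m0 else 0
    return lambda x: high if x[0] > threshold else 1 - high


data = [((float(v),), int(v >= 50)) for v in range(100)]
ensemble = build_hybrid_ensemble(data, stump_learner, majority_learner, size=5)
print(predict(ensemble, (99.0,)), predict(ensemble, (0.0,)))  # 1 0
```

Alternating the learners gives a roughly even split between the two algorithms; a real experiment would plug in actual decision tree and naïve Bayes trainers (for example, scikit-learn's `DecisionTreeClassifier` and `GaussianNB`) in place of the toy learners.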
Characteristics of the data sets used in the experiments.
| Number | Name | Samples (all) | Samples (minority class) | Nominal attributes | Numeric attributes |
|---|---|---|---|---|---|
| (1) | biomed | 209 | 75 | 1 (0) | 7 (2) |
| (2) | boston-binary | 506 | 132 | 0 (0) | 13 (0) |
| (3) | breast-w | 699 | 241 | 0 (0) | 9 (1) |
| (4) | colic | 368 | 136 | 15 (14) | 7 (7) |
| (5) | credit-a | 690 | 307 | 9 (5) | 6 (2) |
| (6) | credit-g | 1000 | 300 | 13 (0) | 7 (0) |
| (7) | credit | 490 | 217 | 9 (5) | 6 (2) |
| (8) | diabetes | 768 | 268 | 0 (0) | 8 (0) |
| (9) | heart-c-binary | 303 | 138 | 7 (1) | 6 (1) |
| (10) | heart-h-binary | 294 | 106 | 7 (5) | 6 (4) |
| (11) | heart-statlog | 270 | 120 | 0 (0) | 13 (0) |
| (12) | hepatitis | 155 | 32 | 13 (10) | 6 (5) |
| (13) | hprice-binary | 546 | 271 | 11 (0) | 0 (0) |
| (14) | ICU | 200 | 40 | 16 (0) | 3 (0) |
| (15) | ionosphere | 351 | 126 | 0 (0) | 34 (0) |
| (16) | kr-vs-kp | 3196 | 1527 | 36 (0) | 0 (0) |
| (17) | schizo | 340 | 163 | 2 (0) | 11 (11) |
| (18) | sick | 3772 | 231 | 22 (1) | 7 (7) |
| (19) | sonar | 208 | 97 | 0 (0) | 60 (0) |
| (20) | vote | 435 | 168 | 16 (16) | 0 (0) |
Results for instability.
| Number | DT | NB |
|---|---|---|
| (1) | 0.16 | 0.01 |
| (2) | 0.04 | 0.06 |
| (3) | 0 | 0 |
| (4) | 0.61 | 0.45 |
| (5) | 0.72 | 0.25 |
| (6) | 1 | 0.97 |
| (7) | 0.61 | 0.23 |
| (8) | 0.96 | 0.66 |
| (9) | 0.9 | 0.37 |
| (10) | 0.9 | 0.18 |
| (11) | 0.7 | 0.33 |
| (12) | 0.72 | 0.58 |
| (13) | 1 | 0.24 |
| (14) | 0.76 | 0.68 |
| (15) | 0.05 | 0.49 |
| (16) | 0 | 0.18 |
| (17) | 1 | 0.85 |
| (18) | 0 | 0.04 |
| (19) | 0.19 | 0.9 |
| (20) | 0 | 0 |
Results for differentiability.
| Number | DT versus NB |
|---|---|
| (1) | 0.93 |
| (2) | 1 |
| (3) | 0.05 |
| (4) | 1 |
| (5) | 1 |
| (6) | 1 |
| (7) | 1 |
| (8) | 1 |
| (9) | 1 |
| (10) | 1 |
| (11) | 1 |
| (12) | 0.98 |
| (13) | 1 |
| (14) | 0.97 |
| (15) | 1 |
| (16) | 1 |
| (17) | 1 |
| (18) | 1 |
| (19) | 1 |
| (20) | 0.95 |
Performance in accuracy.
| Number | Single DT | Single NB | Bagging DT | Bagging NB | HE (DT + NB) |
|---|---|---|---|---|---|
| (1) | 0.891 ± 0.014 | 0.894 ± 0.005 | 0.908 ± 0.009 | 0.893 ± 0.005 | 0.909 ± 0.007 |
| (2) | 0.901 ± 0.008 | 0.709 ± 0.003 | 0.918 ± 0.007 | 0.717 ± 0.005 | 0.873 ± 0.006 |
| (3) | 0.948 ± 0.004 | 0.964 ± 0.001 | 0.958 ± 0.004 | 0.961 ± 0.001 | 0.967 ± 0.002 |
| (4) | 0.852 ± 0.004 | 0.784 ± 0.004 | 0.854 ± 0.005 | 0.786 ± 0.006 | 0.847 ± 0.006 |
| (5) | 0.857 ± 0.007 | 0.778 ± 0.003 | 0.862 ± 0.004 | 0.784 ± 0.003 | 0.828 ± 0.006 |
| (6) | 0.714 ± 0.007 | 0.751 ± 0.006 | 0.738 ± 0.009 | 0.759 ± 0.006 | 0.755 ± 0.008 |
| (7) | 0.865 ± 0.009 | 0.779 ± 0.005 | 0.882 ± 0.007 | 0.784 ± 0.005 | 0.836 ± 0.009 |
| (8) | 0.745 ± 0.007 | 0.755 ± 0.004 | 0.759 ± 0.008 | 0.756 ± 0.005 | 0.767 ± 0.004 |
| (9) | 0.775 ± 0.016 | 0.833 ± 0.005 | 0.787 ± 0.018 | 0.834 ± 0.004 | 0.835 ± 0.007 |
| (10) | 0.793 ± 0.016 | 0.843 ± 0.004 | 0.796 ± 0.014 | 0.845 ± 0.005 | 0.841 ± 0.006 |
| (11) | 0.784 ± 0.016 | 0.839 ± 0.007 | 0.801 ± 0.017 | 0.838 ± 0.006 | 0.847 ± 0.008 |
| (12) | 0.784 ± 0.016 | 0.839 ± 0.009 | 0.805 ± 0.019 | 0.842 ± 0.012 | 0.852 ± 0.008 |
| (13) | 0.766 ± 0.013 | 0.817 ± 0.003 | 0.783 ± 0.009 | 0.818 ± 0.003 | 0.818 ± 0.003 |
| (14) | 0.823 ± 0.013 | 0.808 ± 0.008 | 0.838 ± 0.012 | 0.806 ± 0.01 | 0.835 ± 0.011 |
| (15) | 0.891 ± 0.012 | 0.823 ± 0.005 | 0.925 ± 0.008 | 0.825 ± 0.007 | 0.882 ± 0.008 |
| (16) | 0.994 ± 0.001 | 0.878 ± 0.002 | 0.994 ± 0.001 | 0.878 ± 0.002 | 0.952 ± 0.002 |
| (17) | 0.562 ± 0.016 | 0.575 ± 0.004 | 0.595 ± 0.016 | 0.576 ± 0.006 | 0.60 ± 0.01 |
| (18) | 0.987 ± 0.001 | 0.928 ± 0.001 | 0.988 ± 0.001 | 0.927 ± 0.002 | 0.982 ± 0.001 |
| (19) | 0.737 ± 0.019 | 0.689 ± 0.009 | 0.787 ± 0.026 | 0.684 ± 0.019 | 0.728 ± 0.014 |
| (20) | 0.965 ± 0.003 | 0.90 ± 0.002 | 0.965 ± 0.004 | 0.90 ± 0.002 | 0.944 ± 0.003 |
Performance in F1-measure.
| Number | Single DT | Single NB | Bagging DT | Bagging NB | HE (DT + NB) |
|---|---|---|---|---|---|
| (1) | 0.842 ± 0.019 | 0.833 ± 0.008 | 0.865 ± 0.012 | 0.833 ± 0.008 | 0.863 ± 0.011 |
| (2) | 0.805 ± 0.017 | 0.615 ± 0.002 | 0.842 ± 0.013 | 0.616 ± 0.005 | 0.778 ± 0.009 |
| (3) | 0.925 ± 0.006 | 0.944 ± 0.002 | 0.939 ± 0.006 | 0.944 ± 0.002 | 0.952 ± 0.002 |
| (4) | 0.782 ± 0.005 | 0.722 ± 0.006 | 0.786 ± 0.006 | 0.724 ± 0.006 | 0.772 ± 0.008 |
| (5) | 0.838 ± 0.007 | 0.705 ± 0.005 | 0.847 ± 0.004 | 0.709 ± 0.005 | 0.795 ± 0.008 |
| (6) | 0.458 ± 0.014 | 0.542 ± 0.01 | 0.497 ± 0.016 | 0.546 ± 0.012 | 0.516 ± 0.017 |
| (7) | 0.844 ± 0.01 | 0.707 ± 0.007 | 0.866 ± 0.009 | 0.716 ± 0.008 | 0.804 ± 0.012 |
| (8) | 0.617 ± 0.012 | 0.631 ± 0.006 | 0.636 ± 0.011 | 0.633 ± 0.008 | 0.635 ± 0.007 |
| (9) | 0.757 ± 0.017 | 0.813 ± 0.005 | 0.763 ± 0.021 | 0.814 ± 0.005 | 0.811 ± 0.008 |
| (10) | 0.692 ± 0.028 | 0.788 ± 0.004 | 0.706 ± 0.023 | 0.782 ± 0.006 | 0.776 ± 0.008 |
| (11) | 0.751 ± 0.017 | 0.815 ± 0.007 | 0.773 ± 0.018 | 0.814 ± 0.007 | 0.821 ± 0.01 |
| (12) | 0.408 ± 0.065 | 0.645 ± 0.02 | 0.439 ± 0.066 | 0.645 ± 0.02 | 0.652 ± 0.02 |
| (13) | 0.757 ± 0.014 | 0.809 ± 0.003 | 0.775 ± 0.011 | 0.811 ± 0.003 | 0.809 ± 0.003 |
| (14) | 0.423 ± 0.044 | 0.482 ± 0.016 | 0.455 ± 0.037 | 0.466 ± 0.015 | 0.476 ± 0.042 |
| (15) | 0.842 ± 0.017 | 0.778 ± 0.005 | 0.882 ± 0.013 | 0.783 ± 0.007 | 0.847 ± 0.01 |
| (16) | 0.994 ± 0.001 | 0.871 ± 0.002 | 0.994 ± 0.001 | 0.871 ± 0.002 | 0.945 ± 0.003 |
| (17) | 0.50 ± 0.027 | 0.505 ± 0.006 | 0.555 ± 0.022 | 0.509 ± 0.009 | 0.506 ± 0.016 |
| (18) | 0.894 ± 0.008 | 0.569 ± 0.006 | 0.901 ± 0.01 | 0.567 ± 0.005 | 0.842 ± 0.01 |
| (19) | 0.717 ± 0.02 | 0.702 ± 0.008 | 0.762 ± 0.03 | 0.701 ± 0.018 | 0.736 ± 0.015 |
| (20) | 0.955 ± 0.003 | 0.877 ± 0.002 | 0.955 ± 0.004 | 0.877 ± 0.003 | 0.928 ± 0.004 |
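The excerpt does not state which class is treated as positive or whether F1 is averaged over classes. Assuming the standard binary definition for a single positive class, F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R), which equals 2TP / (2TP + FP + FN). A minimal sketch from confusion-matrix counts:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Binary F1 for the positive class from confusion-matrix counts."""
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(tp=8, fp=2, fn=4), 4))  # 0.7273, i.e. 16/22
```

Because F1 ignores true negatives, it can sit well below accuracy when the positive class is rare, which may account for gaps such as NB on sick (18): 0.928 accuracy versus 0.569 F1.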
Results for Wilcoxon test for accuracy (W/L: row versus column).
| | NB | B-DT | B-NB | HE (DT + NB) |
|---|---|---|---|---|
| DT | 10/9 | 0/16 | 10/9 | 8/11 |
| NB | | 7/12 | 0/1 | 0/17 |
| B-DT | | | 12/7 | 9/8 |
| B-NB | | | | 1/17 |
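The excerpt does not describe how the W/L counts were obtained. Because no win/loss pair exceeds the 20 data sets, one plausible reading is that each cell counts the data sets on which a per-data-set Wilcoxon signed-rank test over repeated runs found a significant win or loss. The sketch below implements that reading under stated assumptions: a two-sided test with the normal approximation (no continuity or tie-variance correction), zero differences dropped, and a hypothetical significance level `alpha = 0.05`. The functions `wilcoxon_signed_rank` and `win_loss` are illustrative names.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon signed-rank test using the normal approximation.

    Returns (W+, p_value). Zero differences are dropped and tied |differences|
    receive average ranks; no continuity or tie-variance correction is applied.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w_plus, p


def win_loss(runs_a: List[Sequence[float]], runs_b: List[Sequence[float]],
             alpha: float = 0.05) -> Tuple[int, int]:
    """Count data sets where method A is significantly better (win) or worse (loss)."""
    wins = losses = 0
    for a, b in zip(runs_a, runs_b):  # one pair of run lists per data set
        _, p = wilcoxon_signed_rank(a, b)
        if p < alpha:
            if mean(a) > mean(b):
                wins += 1
            else:
                losses += 1
    return wins, losses
```

For small numbers of runs, SciPy's `scipy.stats.wilcoxon` offers exact as well as approximate p-values and would be the more careful choice.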
Results for Wilcoxon test for F1-measure (W/L: row versus column).
| | NB | B-DT | B-NB | HE (DT + NB) |
|---|---|---|---|---|
| DT | 18/0 | 0/18 | 9/9 | 7/11 |
| NB | | 8/11 | 1/2 | 2/13 |
| B-DT | | | 11/7 | 10/7 |
| B-NB | | | | 2/12 |
Lower bounds for accuracy gain and the probability.
| Number | Acc1 | Acc2 | Instability (DT) | Differentiability (DT versus NB) | Acc. gain | Prob. | B-DT | HE | Acc. diff. |
|---|---|---|---|---|---|---|---|---|---|
| (1) | 0.891 | 0.894 | 0.16 | 0.93 | −7.5 × 10⁻⁶ | 0.9412 | 0.908 | 0.909 | 0.001 |
| (3) | 0.948 | 0.96 | 0 | 0.05 | −0.00003 | 0.05 | 0.958 | 0.967 | 0.009 |
| (6) | 0.714 | 0.751 | 1 | 1 | −9.25 × 10⁻⁵ | 1 | 0.738 | 0.755 | 0.017 |
| (8) | 0.745 | 0.755 | 0.96 | 1 | −0.000025 | 1 | 0.759 | 0.767 | 0.008 |
| (9) | 0.775 | 0.833 | 0.9 | 1 | −0.000145 | 1 | 0.787 | 0.835 | 0.048 |
| (11) | 0.784 | 0.839 | 0.7 | 1 | −0.0001375 | 1 | 0.801 | 0.847 | 0.046 |
| (12) | 0.784 | 0.839 | 0.72 | 0.98 | −0.0001375 | 0.9944 | 0.805 | 0.85 | 0.045 |
| (17) | 0.56 | 0.575 | 1 | 1 | −3.75 × 10⁻⁵ | 1 | 0.595 | 0.6 | 0.005 |
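This excerpt does not give the formulas behind the lower-bound table. Its fourth and fifth columns match the DT instability and the DT-versus-NB differentiability reported earlier, and Acc1 and Acc2 match the single DT and single NB accuracies. Every row is then reproduced exactly by two simple relations: Prob. = 1 − (1 − I)(1 − D), with I the instability and D the differentiability, and Acc. gain = −0.0025 × (Acc2 − Acc1); Acc. diff. is HE minus B-DT. These relations are inferred from the numbers, not taken from the paper's derivation, and the constant 0.0025 in particular is only an empirical fit. The check below makes them explicit.

```python
import math

# (number, Acc1, Acc2, instability I, differentiability D,
#  Acc. gain, Prob., B-DT, HE, Acc. diff.) copied from the table above.
ROWS = [
    (1, 0.891, 0.894, 0.16, 0.93, -7.5e-6, 0.9412, 0.908, 0.909, 0.001),
    (3, 0.948, 0.96, 0.0, 0.05, -0.00003, 0.05, 0.958, 0.967, 0.009),
    (6, 0.714, 0.751, 1.0, 1.0, -9.25e-5, 1.0, 0.738, 0.755, 0.017),
    (8, 0.745, 0.755, 0.96, 1.0, -0.000025, 1.0, 0.759, 0.767, 0.008),
    (9, 0.775, 0.833, 0.9, 1.0, -0.000145, 1.0, 0.787, 0.835, 0.048),
    (11, 0.784, 0.839, 0.7, 1.0, -0.0001375, 1.0, 0.801, 0.847, 0.046),
    (12, 0.784, 0.839, 0.72, 0.98, -0.0001375, 0.9944, 0.805, 0.85, 0.045),
    (17, 0.56, 0.575, 1.0, 1.0, -3.75e-5, 1.0, 0.595, 0.6, 0.005),
]


def prob_from_table(instability: float, differentiability: float) -> float:
    # 1 - (1 - I)(1 - D): matches the Prob. column.
    return 1 - (1 - instability) * (1 - differentiability)


def gain_from_table(acc1: float, acc2: float, c: float = 0.0025) -> float:
    # Empirical fit only: c = 0.0025 reproduces every tabulated Acc. gain.
    return -c * (acc2 - acc1)


def check_rows(rows) -> list:
    """Return the row numbers whose tabulated values the relations fail to reproduce."""
    mismatches = []
    for num, a1, a2, inst, diff, gain, prob, bdt, he, acc_diff in rows:
        ok = (math.isclose(prob_from_table(inst, diff), prob, abs_tol=1e-9)
              and math.isclose(gain_from_table(a1, a2), gain, abs_tol=1e-12)
              and math.isclose(he - bdt, acc_diff, abs_tol=1e-9))
        if not ok:
            mismatches.append(num)
    return mismatches


print(check_rows(ROWS))  # []
```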