Jędrzej Biedrzycki, Robert Burduk.
Abstract
A vital aspect of constructing Multiple Classifier Systems is the integration of the base models. For example, the Random Forest approach uses the majority voting rule to fuse base classifiers obtained by bagging the training dataset. In this paper we propose an algorithm that partitions the feature space, with the splits determined by the decision rules at the nodes of each decision tree serving as a base classification model. After the feature space is divided, the centroid of each new subspace is determined. These centroids are used to determine the weights needed in the integration phase, which is based on the weighted majority voting rule. The proposal was compared with other Multiple Classifier Systems approaches. Experiments on multiple open-source benchmark datasets demonstrate the effectiveness of our method. To discuss the results of our experiments, we use micro- and macro-average classification performance measures.
Keywords: classifier ensemble; classifier integration; decision tree; majority voting; random forest
Year: 2020 PMID: 33286898 PMCID: PMC7597268 DOI: 10.3390/e22101129
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
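The integration idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-split `fit_stump` helper, the use of the mean of the training points as the subspace centroid, and the weight `1 / (1 + distance)` are all assumptions chosen for illustration, since the abstract does not give the exact weighting formula.

```python
import math
from collections import defaultdict


def fit_stump(X, y, feature, threshold):
    """Hypothetical one-split tree: splits the space at x[feature] <= threshold."""
    leaves = {}
    for side in (True, False):
        members = [(x, c) for x, c in zip(X, y) if (x[feature] <= threshold) == side]
        if not members:
            continue  # no training points on this side, so no leaf is stored
        dim = len(members[0][0])
        # Assumption: the centroid is the mean of the training points in the subspace.
        centroid = tuple(sum(x[d] for x, _ in members) / len(members) for d in range(dim))
        labels = [c for _, c in members]
        label = max(sorted(set(labels)), key=labels.count)  # majority label of the leaf
        leaves[side] = (centroid, label)
    return feature, threshold, leaves


def vote(stumps, x, weighted):
    """Sum per-class support; each stump votes with the label of the leaf containing x."""
    scores = defaultdict(float)
    for feature, threshold, leaves in stumps:
        leaf = leaves.get(x[feature] <= threshold)
        if leaf is None:
            continue
        centroid, label = leaf
        # Assumption: the weight decays with the Euclidean distance to the leaf centroid.
        scores[label] += 1.0 / (1.0 + math.dist(x, centroid)) if weighted else 1.0
    return max(sorted(scores), key=scores.get)


X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (4.0, 4.0), (5.0, 4.0), (4.0, 5.0)]
y = [0, 0, 0, 1, 1, 1]
stumps = [fit_stump(X, y, 0, 2.0), fit_stump(X, y, 0, 4.5), fit_stump(X, y, 1, 4.5)]

query = (4.2, 4.1)
plain = vote(stumps, query, weighted=False)
weighted = vote(stumps, query, weighted=True)
print(plain, weighted)  # plain majority -> 0, centroid-weighted -> 1
```

In this toy setting two of the three stumps place the query in a large subspace whose centroid is far away, so their votes are down-weighted and the single stump with a nearby centroid decides the outcome.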
Figure 1. The process of extracting subspaces from base classifiers and determining neighbors for a subspace.
Descriptions of datasets used in experiments (name with abbreviation, number of instances, number of features, imbalance ratio).
| Dataset | Instances | Features | Imbalance ratio |
|---|---|---|---|
| Indoor Channel Measurements (aa) | 7840 | 5 | 208.0 |
| Appendicitis (ap) | 106 | 7 | 4.0 |
| Banana (ba) | 5300 | 2 | 5.9 |
| QSAR biodegradation (bi) | 1055 | 41 | 2.0 |
| Liver Disorders (BUPA) (bu) | 345 | 6 | 1.4 |
| Cryotherapy (c) | 90 | 7 | 1.1 |
| Banknote authentication (d) | 1372 | 5 | 1.2 |
| Ecoli (e) | 336 | 7 | 71.5 |
| Haberman’s Survival (h) | 306 | 3 | 2.8 |
| Ionosphere (io) | 351 | 34 | 1.8 |
| Iris plants (ir) | 150 | 4 | 1.0 |
| Magic (ma) | 19,020 | 10 | 1.0 |
| Ultrasonic flowmeter diagnostics (me) | 540 | 173 | 1.4 |
| Phoneme (ph) | 5404 | 5 | 2.4 |
| Pima (pi) | 768 | 8 | 1.9 |
| Climate model simulation crashes (po) | 540 | 18 | 10.7 |
| Ring (r) | 7400 | 20 | 1.0 |
| Spambase (sb) | 4597 | 57 | 1.5 |
| Seismic-bumps (se) | 2584 | 19 | 14.2 |
| Texture (te) | 5500 | 40 | 1.0 |
| Thyroid (th) | 7200 | 21 | 1.0 |
| Titanic (ti) | 2201 | 3 | 2.1 |
| Twonorm (tw) | 7400 | 20 | 1.0 |
| Breast Cancer (Diagnostic) (wd) | 569 | 30 | 1.7 |
| Breast Cancer (Original) (wi) | 699 | 9 | 1.9 |
| Wine quality – red (wr) | 1599 | 11 | 68.1 |
| Wine quality – white (ww) | 4898 | 11 | 439.6 |
| Yeast (y) | 1484 | 8 | 92.6 |
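The source does not define the imbalance ratio. A common convention, assumed in the sketch below, is the size of the largest class divided by the size of the smallest class, so that a perfectly balanced dataset scores 1.0. The helper name `imbalance_ratio` is hypothetical.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Assumed definition: count of the largest class / count of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Three equally sized classes, as in a balanced three-class dataset.
balanced = ["a"] * 50 + ["b"] * 50 + ["c"] * 50
# A skewed two-class example.
skewed = ["neg"] * 80 + ["pos"] * 20

print(imbalance_ratio(balanced))  # 1.0
print(imbalance_ratio(skewed))    # 4.0
```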
Average accuracy and f-scores for the random forest, the majority voting and the proposed algorithm together with Friedman ranks.
| Dataset | Average Accuracy | | | F-Score (micro) | | | F-Score (macro) | | |
|---|---|---|---|---|---|---|---|---|---|
| aa | 0.917 | 0.918 | 0.919 | 0.469 | 0.474 | 0.477 | 0.196 | 0.192 | 0.176 |
| ap | 0.853 | 0.812 | 0.863 | 0.853 | 0.812 | 0.863 | 0.676 | 0.559 | 0.692 |
| ba | 0.789 | 0.808 | 0.815 | 0.683 | 0.712 | 0.722 | 0.483 | 0.493 | 0.502 |
| bi | 0.736 | 0.736 | 0.702 | 0.736 | 0.736 | 0.702 | 0.717 | 0.717 | 0.561 |
| bu | 0.579 | 0.527 | 0.536 | 0.579 | 0.527 | 0.536 | 0.563 | 0.520 | 0.512 |
| c | 0.762 | 0.867 | 0.684 | 0.762 | 0.867 | 0.684 | 0.773 | 0.870 | 0.698 |
| d | 0.935 | 0.935 | 0.938 | 0.935 | 0.935 | 0.938 | 0.934 | 0.934 | 0.936 |
| e | 0.825 | 0.827 | 0.825 | 0.414 | 0.423 | 0.414 | 0.110 | 0.167 | 0.106 |
| h | 0.637 | 0.691 | 0.657 | 0.637 | 0.691 | 0.657 | 0.480 | 0.581 | 0.491 |
| io | 0.862 | 0.868 | 0.458 | 0.862 | 0.868 | 0.458 | 0.845 | 0.853 | 0.578 |
| ir | 0.965 | 0.961 | 0.978 | 0.947 | 0.942 | 0.968 | 0.945 | 0.943 | 0.968 |
| ma | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| me | 0.582 | 0.681 | 0.615 | 0.582 | 0.681 | 0.615 | 0.583 | 0.688 | 0.607 |
| ph | 0.771 | 0.767 | 0.774 | 0.771 | 0.767 | 0.774 | 0.720 | 0.718 | 0.724 |
| pi | 0.699 | 0.685 | 0.704 | 0.699 | 0.685 | 0.704 | 0.661 | 0.653 | 0.670 |
| po | 0.879 | 0.872 | 0.897 | 0.879 | 0.872 | 0.897 | 0.468 | 0.466 | 0.473 |
| r | 0.726 | 0.723 | 0.728 | 0.726 | 0.723 | 0.728 | 0.733 | 0.730 | 0.735 |
| sb | 0.711 | 0.718 | 0.712 | 0.711 | 0.718 | 0.712 | 0.686 | 0.694 | 0.686 |
| se | 0.924 | 0.921 | 0.926 | 0.924 | 0.921 | 0.926 | 0.520 | 0.516 | 0.497 |
| te | 0.889 | 0.890 | 0.892 | 0.389 | 0.392 | 0.408 | 0.393 | 0.387 | 0.404 |
| th | 0.983 | 0.981 | 0.982 | 0.974 | 0.972 | 0.973 | 0.849 | 0.825 | 0.851 |
| ti | 0.788 | 0.778 | 0.681 | 0.788 | 0.778 | 0.681 | 0.752 | 0.732 | 0.405 |
| tw | 0.717 | 0.714 | 0.724 | 0.717 | 0.714 | 0.724 | 0.717 | 0.714 | 0.724 |
| wd | 0.902 | 0.893 | 0.918 | 0.902 | 0.893 | 0.918 | 0.893 | 0.884 | 0.911 |
| wi | 0.936 | 0.955 | 0.944 | 0.936 | 0.955 | 0.944 | 0.931 | 0.951 | 0.941 |
| wr | 0.831 | 0.827 | 0.823 | 0.493 | 0.481 | 0.468 | 0.241 | 0.227 | 0.225 |
| ww | 0.839 | 0.838 | 0.840 | 0.459 | 0.457 | 0.464 | 0.205 | 0.224 | 0.208 |
| y | 0.866 | 0.861 | 0.865 | 0.349 | 0.325 | 0.344 | 0.223 | 0.214 | 0.234 |
| rank | 2.00 | 2.14 | 1.61 | 2.00 | 2.14 | 1.61 | 1.93 | 2.04 | 1.79 |
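The rank row can be read as an average of per-dataset ranks. The sketch below shows one way such Friedman-style average ranks are usually computed, assuming that a higher score is better, that rank 1 is best, and that ties share the mean of the tied ranks. It uses only four rows of the average accuracy columns (ap, ba, bi, d), so it does not reproduce the rank row of the full table.

```python
def average_ranks(rows):
    """Mean rank per column; rank 1 is the highest value, ties share the mean rank."""
    k = len(rows[0])
    totals = [0.0] * k
    for row in rows:
        for j, value in enumerate(row):
            better = sum(1 for other in row if other > value)
            tied = sum(1 for other in row if other == value)  # includes the value itself
            totals[j] += better + (tied + 1) / 2
    return [t / len(rows) for t in totals]


# Average accuracy of the three compared methods on ap, ba, bi and d.
subset = [
    (0.853, 0.812, 0.863),
    (0.789, 0.808, 0.815),
    (0.736, 0.736, 0.702),
    (0.935, 0.935, 0.938),
]

ranks = average_ranks(subset)
print(ranks)  # [2.25, 2.25, 1.5]
```

Within each row the ranks sum to 1 + 2 + 3 = 6, so the average ranks of the three columns always sum to 6 as well, which is a quick sanity check.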
Micro-average precision and recall for the random forest, the majority voting and the proposed algorithm together with Friedman ranks.
| Dataset | Precision | | | Recall | | |
|---|---|---|---|---|---|---|
| aa | 0.469 | 0.475 | 0.477 | 0.469 | 0.473 | 0.477 |
| ap | 0.853 | 0.812 | 0.863 | 0.853 | 0.812 | 0.863 |
| ba | 0.683 | 0.712 | 0.722 | 0.683 | 0.712 | 0.722 |
| bi | 0.736 | 0.736 | 0.702 | 0.736 | 0.736 | 0.702 |
| bu | 0.579 | 0.527 | 0.536 | 0.579 | 0.527 | 0.536 |
| c | 0.762 | 0.867 | 0.684 | 0.762 | 0.867 | 0.684 |
| d | 0.935 | 0.935 | 0.938 | 0.935 | 0.935 | 0.938 |
| e | 0.418 | 0.424 | 0.417 | 0.411 | 0.423 | 0.411 |
| h | 0.637 | 0.691 | 0.657 | 0.637 | 0.691 | 0.657 |
| io | 0.862 | 0.868 | 0.458 | 0.862 | 0.868 | 0.458 |
| ir | 0.947 | 0.942 | 0.968 | 0.947 | 0.942 | 0.968 |
| ma | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| me | 0.582 | 0.681 | 0.615 | 0.582 | 0.681 | 0.615 |
| ph | 0.771 | 0.767 | 0.774 | 0.771 | 0.767 | 0.774 |
| pi | 0.699 | 0.685 | 0.704 | 0.699 | 0.685 | 0.704 |
| po | 0.879 | 0.872 | 0.897 | 0.879 | 0.872 | 0.897 |
| r | 0.726 | 0.723 | 0.728 | 0.726 | 0.723 | 0.728 |
| sb | 0.711 | 0.718 | 0.712 | 0.711 | 0.718 | 0.712 |
| se | 0.924 | 0.921 | 0.926 | 0.924 | 0.921 | 0.926 |
| te | 0.389 | 0.392 | 0.408 | 0.389 | 0.392 | 0.408 |
| th | 0.974 | 0.972 | 0.973 | 0.974 | 0.972 | 0.973 |
| ti | 0.788 | 0.778 | 0.681 | 0.788 | 0.778 | 0.681 |
| tw | 0.717 | 0.714 | 0.724 | 0.717 | 0.714 | 0.724 |
| wd | 0.902 | 0.893 | 0.918 | 0.902 | 0.893 | 0.918 |
| wi | 0.936 | 0.955 | 0.944 | 0.936 | 0.955 | 0.944 |
| wr | 0.493 | 0.481 | 0.468 | 0.493 | 0.481 | 0.468 |
| ww | 0.459 | 0.457 | 0.464 | 0.459 | 0.457 | 0.464 |
| y | 0.350 | 0.325 | 0.344 | 0.349 | 0.325 | 0.344 |
| rank | 2.00 | 2.14 | 1.64 | 2.00 | 2.14 | 1.61 |
Macro-average precision and recall for the random forest, the majority voting and the proposed algorithm together with Friedman ranks.
| Dataset | Precision | | | Recall | | |
|---|---|---|---|---|---|---|
| aa | 0.179 | 0.171 | 0.152 | 0.217 | 0.218 | 0.209 |
| ap | 0.705 | 0.557 | 0.710 | 0.663 | 0.563 | 0.684 |
| ba | 0.475 | 0.475 | 0.486 | 0.491 | 0.512 | 0.519 |
| bi | 0.712 | 0.712 | 0.526 | 0.721 | 0.721 | 0.614 |
| bu | 0.564 | 0.521 | 0.513 | 0.561 | 0.520 | 0.511 |
| c | 0.779 | 0.872 | 0.702 | 0.767 | 0.868 | 0.693 |
| d | 0.936 | 0.935 | 0.938 | 0.932 | 0.933 | 0.935 |
| e | 0.079 | 0.123 | 0.075 | 0.182 | 0.259 | 0.186 |
| h | 0.474 | 0.594 | 0.485 | 0.486 | 0.568 | 0.500 |
| io | 0.860 | 0.881 | 0.596 | 0.831 | 0.828 | 0.562 |
| ir | 0.944 | 0.942 | 0.968 | 0.945 | 0.944 | 0.968 |
| ma | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| me | 0.582 | 0.683 | 0.614 | 0.585 | 0.692 | 0.600 |
| ph | 0.726 | 0.721 | 0.730 | 0.715 | 0.716 | 0.718 |
| pi | 0.664 | 0.652 | 0.670 | 0.659 | 0.655 | 0.669 |
| po | 0.451 | 0.450 | 0.451 | 0.487 | 0.483 | 0.496 |
| r | 0.742 | 0.738 | 0.743 | 0.724 | 0.721 | 0.726 |
| sb | 0.723 | 0.728 | 0.721 | 0.653 | 0.663 | 0.655 |
| se | 0.542 | 0.527 | 0.495 | 0.504 | 0.507 | 0.501 |
| te | 0.397 | 0.380 | 0.400 | 0.390 | 0.393 | 0.409 |
| th | 0.816 | 0.803 | 0.826 | 0.886 | 0.848 | 0.879 |
| ti | 0.853 | 0.780 | 0.341 | 0.673 | 0.692 | 0.500 |
| tw | 0.718 | 0.714 | 0.724 | 0.717 | 0.714 | 0.724 |
| wd | 0.894 | 0.883 | 0.911 | 0.892 | 0.886 | 0.911 |
| wi | 0.926 | 0.948 | 0.931 | 0.935 | 0.954 | 0.951 |
| wr | 0.246 | 0.228 | 0.234 | 0.236 | 0.226 | 0.216 |
| ww | 0.226 | 0.247 | 0.230 | 0.189 | 0.206 | 0.191 |
| y | 0.232 | 0.200 | 0.244 | 0.216 | 0.230 | 0.226 |
| rank | 1.86 | 2.07 | 1.79 | 2.18 | 1.82 | 1.82 |
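The difference between the micro- and macro-averaged tables can be illustrated with a small sketch. It assumes single-label classification, a confusion matrix with true classes in rows and predicted classes in columns, and a per-class precision of 0 when a class is never predicted. The function name `micro_macro` is hypothetical.

```python
def micro_macro(conf):
    """Return (micro_precision, micro_recall, macro_precision, macro_recall)."""
    n = len(conf)
    tp = [conf[i][i] for i in range(n)]
    true_totals = [sum(conf[i]) for i in range(n)]                       # row sums
    pred_totals = [sum(conf[r][c] for r in range(n)) for c in range(n)]  # column sums

    # Micro-averaging pools the counts of all classes before dividing.
    micro_p = sum(tp) / sum(pred_totals)
    micro_r = sum(tp) / sum(true_totals)

    # Macro-averaging computes the metric per class, then takes the unweighted mean.
    macro_p = sum(t / p if p else 0.0 for t, p in zip(tp, pred_totals)) / n
    macro_r = sum(t / s if s else 0.0 for t, s in zip(tp, true_totals)) / n
    return micro_p, micro_r, macro_p, macro_r


# A classifier that always predicts the majority class of a 90/10 problem.
conf = [
    [90, 0],
    [10, 0],
]
micro_p, micro_r, macro_p, macro_r = micro_macro(conf)
print(micro_p, micro_r, macro_p, macro_r)  # 0.9 0.9 0.45 0.5
```

Because every sample receives exactly one predicted label, the pooled predicted and true totals are equal, so micro precision and micro recall coincide when computed on the same predictions. Macro averages weight each class equally, so they drop sharply when minority classes are missed, which is the typical pattern on datasets with a high imbalance ratio.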
Figure 2. Friedman ranks for calculated metrics together with Nemenyi critical values.
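The Nemenyi critical value is commonly obtained as CD = q_α · sqrt(k(k + 1) / (6N)) for k methods and N datasets. The sketch below evaluates it for k = 3 and N = 28 (the number of datasets listed above). The significance level is not stated in this excerpt, so α = 0.05 with the tabulated constant q = 2.343 is an assumption, and the result may differ from the values drawn in the figure.

```python
import math
from itertools import combinations


def nemenyi_cd(k, n, q_alpha):
    """Critical difference between average ranks for k methods over n datasets."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))


Q_ALPHA_005_K3 = 2.343  # tabulated constant for k = 3 at alpha = 0.05 (assumed level)

cd = nemenyi_cd(k=3, n=28, q_alpha=Q_ALPHA_005_K3)

# Friedman ranks of the three methods for average accuracy, taken from the rank row.
accuracy_ranks = [2.00, 2.14, 1.61]
significant = [
    (i, j)
    for i, j in combinations(range(len(accuracy_ranks)), 2)
    if abs(accuracy_ranks[i] - accuracy_ranks[j]) > cd
]
print(round(cd, 3), significant)  # 0.626 []
```

Two methods are considered to differ significantly only when their average ranks are further apart than the critical difference, so the outcome depends directly on the assumed α.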