Zefeng Liang, Huan Wang, Kaixiang Yang, Yifan Shi.
Abstract
The imbalance problem is widespread in real-world applications. When a classifier is trained on imbalanced datasets, it struggles to learn an appropriate decision boundary, which leads to unsatisfactory classification performance. Various ensemble algorithms have been proposed to deal with the imbalance problem. However, conventional ensemble algorithms do not explore an effective feature space that could further improve performance. In addition, they treat all base classifiers equally and ignore the different contributions each base classifier makes to the ensemble result. To address these problems, we propose a novel ensemble algorithm that combines an effective data transformation with an adaptive weighted voting scheme. First, we use a modified metric learning method to obtain an effective feature space from the imbalanced data. Next, the base classifiers are adaptively assigned different weights. Experiments on multiple imbalanced datasets, including image and biomedical datasets, verify the superiority of the proposed ensemble algorithm.
Keywords: classification; ensemble learning; imbalance learning; information fusion; metric learning
Year: 2022 PMID: 35295673 PMCID: PMC8918481 DOI: 10.3389/fnbot.2022.827913
Source DB: PubMed Journal: Front Neurorobot ISSN: 1662-5218 Impact factor: 2.650
Figure 1. The overall framework of the ensemble algorithm.
Imbalance Ensemble Framework.
Figure 2. Diagram of the proposed data transformation. After the transformation, similar neighbors are pulled closer to the anchor sample, while dissimilar neighbors are pushed away and kept at a certain distance from it. (A) Before data transformation. (B) After data transformation.
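The pull/push behavior in Figure 2 resembles large-margin metric learning (for example LMNN-style triplet objectives). This section does not give the objective of the modified metric learning, so what follows is only a minimal sketch. It assumes a linear map `L` and a squared-distance triplet hinge loss. The names `transform`, `sq_dist`, `triplet_loss`, and the `margin` value are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def transform(L: Matrix, x: Sequence[float]) -> List[float]:
    """Apply the linear map L to sample x (i.e. compute L @ x)."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]


def sq_dist(L: Matrix, a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance after mapping by L.

    This equals a Mahalanobis distance with M = L^T L.
    """
    ta, tb = transform(L, a), transform(L, b)
    return sum((u - v) ** 2 for u, v in zip(ta, tb))


def triplet_loss(L: Matrix, anchor, similar, dissimilar, margin: float = 1.0) -> float:
    """Pull term plus hinge push term for one (anchor, similar, dissimilar) triplet.

    The pull term is the anchor-to-similar distance. The hinge term is non-zero
    only when the dissimilar sample lies within d(anchor, similar) + margin of
    the anchor, i.e. inside the margin.
    """
    d_pos = sq_dist(L, anchor, similar)
    d_neg = sq_dist(L, anchor, dissimilar)
    return d_pos + max(0.0, margin + d_pos - d_neg)


identity = [[1.0, 0.0], [0.0, 1.0]]
shrink = [[0.5, 0.0], [0.0, 0.5]]
loss_before = triplet_loss(identity, [0.0, 0.0], [1.0, 0.0], [3.0, 0.0])
loss_after = triplet_loss(shrink, [0.0, 0.0], [1.0, 0.0], [3.0, 0.0])
```

With the identity map the loss is 1.0, coming only from the pull term. The shrinking map lowers it to 0.25 while the dissimilar sample stays outside the margin. A real learner would search for the `L` that minimizes this loss summed over many triplets.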
Adaptive Weight Procedure.
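This section does not state the adaptive weighting rule. The following is a hedged sketch that makes two assumptions. First, each base classifier's weight is proportional to a validation score, such as its AUC on held-out data. Second, the ensemble fuses the predicted minority-class probabilities by a weighted average. The names `adaptive_weights` and `weighted_vote` are hypothetical.

```python
from typing import List, Sequence, Tuple


def adaptive_weights(val_scores: Sequence[float]) -> List[float]:
    """Normalize per-classifier validation scores into weights summing to 1."""
    total = sum(val_scores)
    if total <= 0:
        raise ValueError("validation scores must have a positive sum")
    return [s / total for s in val_scores]


def weighted_vote(
    probas: Sequence[Sequence[float]],
    weights: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[List[float], List[int]]:
    """Fuse minority-class probabilities with a weighted average.

    probas[k][i] is classifier k's probability that sample i belongs to the
    minority class. Returns fused probabilities and hard labels (1 = minority).
    """
    n_samples = len(probas[0])
    fused = [sum(w * p[i] for w, p in zip(weights, probas)) for i in range(n_samples)]
    labels = [1 if f >= threshold else 0 for f in fused]
    return fused, labels


# Example: one strong base classifier and two weaker ones.
weights = adaptive_weights([0.9, 0.6, 0.5])
fused, labels = weighted_vote([[0.8, 0.2], [0.3, 0.6], [0.4, 0.7]], weights)
```

Take the first sample. An unweighted majority vote of the thresholded outputs would label it majority, because two of the three classifiers give it less than 0.5. The weighted vote instead labels it minority, with a fused probability of about 0.55, because the higher-scoring classifier carries more weight.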
Attributes of the datasets.

| Dataset | Imbalance ratio | Samples | Features |
|---|---|---|---|
| climate | 10.74 | 540 | 18 |
| libras_move | 14.00 | 360 | 90 |
| ecoli2 | 5.46 | 90 | 7 |
| glass_0_1_2_3_vs_4_5_6 | 3.20 | 214 | 9 |
| yeast3 | 8.10 | 1484 | 8 |
| cleveland_0_vs_4 | 12.31 | 173 | 13 |
| winequality_red_4 | 29.17 | 1599 | 11 |
| ecoli1 | 3.36 | 336 | 7 |

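
The table does not define its first numeric column. This sketch assumes it is the usual imbalance ratio, meaning the majority-class count divided by the minority-class count. The 494/46 split in the example is inferred from the 540 samples and 10.74 ratio listed for climate, not taken from the source. The function name `imbalance_ratio` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def imbalance_ratio(labels: Iterable[Hashable]) -> float:
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return max(counts.values()) / min(counts.values())


# Inferred split for climate: 494 majority and 46 minority samples.
labels = [0] * 494 + [1] * 46
print(round(imbalance_ratio(labels), 2))  # 10.74
```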
Comparison between imbalance learning algorithms and our proposed method in terms of AUC.

| Dataset | | | | | |
|---|---|---|---|---|---|
| climate | 0.8785 ± 0.0092 | 0.5353 ± 0.0188 | 0.7462 ± 0.0245 | 0.8496 ± 0.0206 | **0.8787 ± 0.0162** |
| libras_move | **0.8741 ± 0.0227** | 0.8133 ± 0.025 | 0.8015 ± 0.06 | 0.8544 ± 0.0153 | 0.8146 ± 0.0286 |
| ecoli2 | 0.8671 ± 0.0121 | 0.813 ± 0.0113 | 0.8252 ± 0.0504 | **0.8678 ± 0.0138** | 0.8537 ± 0.0102 |
| glass_0_1_2_3_vs_4_5_6 | **0.8945 ± 0.0172** | 0.876 ± 0.0129 | 0.8441 ± 0.039 | 0.8804 ± 0.0251 | 0.827 ± 0.0295 |
| yeast3 | 0.8278 ± 0.8278 | 0.7348 ± 0.0219 | 0.8163 ± 0.0258 | 0.8256 ± 0.0273 | **0.8381 ± 0.0163** |
| cleveland_0_vs_4 | **0.8919 ± 0.0103** | 0.6367 ± 0.088 | 0.7547 ± 0.05 | 0.848 ± 0.0405 | 0.8871 ± 0.0236 |
| winequality_red_4 | **0.6667 ± 0.0175** | 0.5291 ± 0.0093 | 0.5919 ± 0.0323 | 0.6515 ± 0.6515 | 0.6615 ± 0.0111 |
| ecoli1 | 0.8715 ± 0.0208 | 0.8461 ± 0.0297 | 0.7868 ± 0.0464 | 0.8756 ± 0.0228 | **0.8786 ± 0.0062** |
| Average AUC | **0.8465** | 0.723 | 0.7708 | 0.8316 | 0.8299 |

Bold values indicate the best result among the compared algorithms.
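The AUC tables report mean ± standard deviation per dataset. The average row is the plain mean of the per-dataset means: the first column's eight means average to 0.8465. The following is a minimal sketch of both computations. It assumes the rank-based (Mann-Whitney) definition of AUC and a sample standard deviation over repeated runs. The section states neither the number of runs nor which deviation is used, and the function names are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation). Ties count as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def mean_std(run_aucs: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of AUC over repeated runs."""
    return statistics.mean(run_aucs), statistics.stdev(run_aucs)


# Average row: mean of the per-dataset mean AUCs in the first column.
first_column = [0.8785, 0.8741, 0.8671, 0.8945, 0.8278, 0.8919, 0.6667, 0.8715]
average_auc = round(statistics.mean(first_column), 4)
```

The pairwise loop runs in O(n_pos × n_neg) time, which is fine for datasets of this size. A sort-based rank computation gives the same value faster on large inputs.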
Figure 3. The Fashion-MNIST dataset.
Comparison between imbalance learning algorithms and our proposed method in terms of AUC.

| Dataset | | | | | |
|---|---|---|---|---|---|
| Fashion-MNIST | 0.9384 ± 0.0018 | 0.9523 ± 0.0025 | 0.9414 ± 0.006 | **0.9561 ± 0.0043** | 0.9543 ± 0.0044 |

Bold values indicate the best result among the compared algorithms.
Figure 4. Effect of the number of subspaces on ensemble performance on six datasets.
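Figure 4 varies the number of subspaces used by the ensemble. The following sketch assumes the subspaces are random feature subsets, as in the random subspace method, which this section does not confirm. The names `sample_subspaces` and `project`, the subspace size of 9, and the toy data are hypothetical. Training and evaluating the base classifiers is omitted.

```python
import random
from typing import List, Sequence


def sample_subspaces(n_features: int, n_subspaces: int, subspace_size: int,
                     seed: int = 0) -> List[List[int]]:
    """Draw n_subspaces feature subsets, each of subspace_size distinct indices."""
    if not 0 < subspace_size <= n_features:
        raise ValueError("subspace_size must be in (0, n_features]")
    rng = random.Random(seed)  # seeded for reproducible subsets
    return [sorted(rng.sample(range(n_features), subspace_size))
            for _ in range(n_subspaces)]


def project(X: Sequence[Sequence[float]], idx: Sequence[int]) -> List[List[float]]:
    """Keep only the columns listed in idx for every row of X."""
    return [[row[j] for j in idx] for row in X]


# Hypothetical sweep over the number of subspaces (18 features, as for climate).
X = [[float(i + j) for j in range(18)] for i in range(4)]
for k in (5, 10, 20):
    subspaces = sample_subspaces(n_features=18, n_subspaces=k, subspace_size=9)
    views = [project(X, idx) for idx in subspaces]  # one view per base classifier
```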