Guillem Collell, Drazen Prelec, Kaustubh R Patil.
Abstract
Class imbalance presents a major hurdle in the application of classification methods. A commonly taken approach is to learn ensembles of classifiers using rebalanced data. Examples include bootstrap averaging (bagging) combined with either undersampling or oversampling of the minority class examples. However, rebalancing methods entail asymmetric changes to the examples of different classes, which in turn can introduce their own biases. Furthermore, these methods often require specifying the performance measure of interest a priori, i.e., before learning. An alternative is to employ the threshold-moving technique, which applies a threshold to the continuous output of a model and can therefore be adapted to a performance measure a posteriori, i.e., as a plug-in method. Surprisingly, little attention has been paid to the combination of a bagging ensemble and threshold-moving. In this paper, we study this combination and demonstrate its competitiveness. In contrast to resampling methods, we preserve the natural class distribution of the data, resulting in well-calibrated posterior probabilities. Additionally, we extend the proposed method to handle multiclass data. We validate our method on binary and multiclass benchmark data sets using both decision trees and neural networks as base classifiers. We perform analyses that provide insights into the proposed method.
Keywords: Bagging ensembles; Binary classification; Imbalanced data; Multiclass classification; Posterior calibration; Resampling
Year: 2018 PMID: 29398782 PMCID: PMC5750819 DOI: 10.1016/j.neucom.2017.08.035
Source DB: PubMed Journal: Neurocomputing ISSN: 0925-2312 Impact factor: 5.719
Pseudo-code for the bagging ensemble.
| 1. Learning: |
| 1.1. Input: A training set |
| 1.2. Generate |
| 1.3. Learn |
| 2. Prediction: |
| 2.1. Input: an instance |
| 2.2. Each base classifier |
| 2.3. Compute averages of probabilistic predictions for each class |
| 2.4. Rank each class |
| 2.5. Assign the label for which the score in 2.4 is the highest. |
¹ The sampling mechanism has been purposefully left unspecified; it is specified in the context of the respective methods.
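The two phases of the pseudo-code can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the sampling mechanism is taken to be a plain bootstrap (which keeps the natural class distribution), the base classifier `KNNProba` is a hypothetical toy learner chosen only because it returns class probabilities, and because the scoring rule of step 2.4 is truncated in this record, the averaged probability itself is used as the score.

```python
import random
from collections import Counter


def bootstrap_sample(X, y, rng):
    """Step 1.2 (assumed form): draw len(X) examples with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]


class KNNProba:
    """Hypothetical toy base classifier: class frequencies among the k nearest points."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict_proba(self, x, classes):
        # Squared Euclidean distance from x to every stored training point.
        dist = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
            for xi, yi in zip(self.X, self.y)
        )
        votes = Counter(label for _, label in dist[: self.k])
        return {c: votes[c] / self.k for c in classes}


def fit_bagging(X, y, n_estimators=10, seed=0):
    """Steps 1.1-1.3: learn one base classifier per bootstrap sample."""
    rng = random.Random(seed)
    return [KNNProba().fit(*bootstrap_sample(X, y, rng)) for _ in range(n_estimators)]


def predict_bagging(models, x, classes):
    """Steps 2.1-2.5: average class probabilities and pick the top-scoring class."""
    avg = {c: 0.0 for c in classes}
    for m in models:
        for c, p in m.predict_proba(x, classes).items():
            avg[c] += p / len(models)
    # Assumption: the score of step 2.4 is the averaged probability itself.
    label = max(classes, key=lambda c: avg[c])
    return label, avg
```

A threshold-moving variant would replace the final `max` with a rule that compares `avg` against a class-specific threshold.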
Confusion matrix in binary classification.
| | Predicted positive | Predicted negative |
|---|---|---|
| Actual positive | TP (true positive) | FN (false negative) |
| Actual negative | FP (false positive) | TN (true negative) |
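The four cells of the confusion matrix are enough to compute the two measures reported in the later tables and to illustrate threshold moving. The sketch below assumes the common definitions (macro-accuracy as the mean of the per-class recalls, macro F1 as the mean of the per-class F1 scores); the exact definitions used by the authors are not shown in this record. The function names and the choice of candidate thresholds are illustrative, and the data on which the threshold should be tuned is left to the caller.

```python
def binary_measures(tp, fn, fp, tn):
    """Macro-accuracy and macro F1 from the four confusion-matrix cells."""

    def ratio(num, den):
        return num / den if den else 0.0

    recall_pos = ratio(tp, tp + fn)
    recall_neg = ratio(tn, tn + fp)
    f1_pos = ratio(2 * tp, 2 * tp + fp + fn)
    f1_neg = ratio(2 * tn, 2 * tn + fn + fp)  # negative class treated as "positive"
    return {
        "macro_accuracy": (recall_pos + recall_neg) / 2,
        "macro_f1": (f1_pos + f1_neg) / 2,
    }


def confusion(probs, labels, threshold):
    """Count TP, FN, FP, TN when predicting positive iff prob >= threshold."""
    tp = fn = fp = tn = 0
    for p, y in zip(probs, labels):
        predicted_positive = p >= threshold
        if y == 1:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    return tp, fn, fp, tn


def best_threshold(probs, labels, measure="macro_f1"):
    """Threshold moving: keep the candidate threshold that maximizes `measure`."""
    candidates = sorted(set(probs))
    return max(
        candidates,
        key=lambda t: binary_measures(*confusion(probs, labels, t))[measure],
    )
```

Because the threshold is chosen after the model is trained, the same posterior estimates can be reused for a different measure by changing only the `measure` argument.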
Overview of the binary data sets obtained from UCI, HDDT* and KEEL† repositories (names were shortened for convenience).
| Dataset | #Inst | #Attr | #Num | %Min | Dataset | #Inst | #Attr | #Num | %Min |
|---|---|---|---|---|---|---|---|---|---|
| pima | 768 | 8 | 8 | 34.5 | br-y† | 277 | 9 | 0 | 29.2 |
| ion | 351 | 34 | 34 | 35.9 | cl0vs4† | 173 | 13 | 13 | 7.5 |
| sonar | 208 | 60 | 60 | 46.6 | ecoli4† | 336 | 7 | 7 | 5.9 |
| spectf* | 267 | 44 | 44 | 20.6 | hab† | 306 | 3 | 3 | 26.5 |
| phon* | 5404 | 5 | 5 | 29.3 | led7_xvs1† | 443 | 7 | 7 | 8.3 |
| page* | 5473 | 10 | 10 | 10.2 | pb-1-3vs4† | 472 | 10 | 10 | 5.9 |
| ism* | 11,180 | 6 | 6 | 2.3 | shut-0vs4† | 1829 | 9 | 9 | 6.7 |
| letter* | 20,000 | 16 | 16 | 3.9 | vow0† | 988 | 13 | 13 | 9.1 |
| satim* | 6430 | 36 | 36 | 9.7 | yst-2vs4† | 514 | 8 | 8 | 9.9 |
| compu* | 13,657 | 20 | 20 | 3.8 | yst4† | 1484 | 8 | 8 | 3.4 |
| segm* | 2310 | 19 | 19 | 14.3 | glass6† | 214 | 9 | 9 | 13.5 |
| oil* | 937 | 49 | 49 | 4.4 | new-th1† | 215 | 5 | 5 | 16.3 |
| estate* | 5322 | 12 | 12 | 12 | wisc† | 683 | 9 | 9 | 34.5 |
| hypo* | 2000 | 24 | 6 | 6.1 | car-gd† | 1728 | 6 | 0 | 4 |
| boun* | 3505 | 175 | 0 | 3.5 | flare-F† | 1066 | 11 | 0 | 4 |
| cred* | 1000 | 20 | 7 | 30 | kdd† | 1642 | 41 | 26 | 3.2 |
| hrt-v* | 133 | 9 | 4 | 23.3 | veh0† | 846 | 18 | 18 | 23.5 |
| ab9-18† | 731 | 8 | 7 | 5.7 | w-red-4† | 1599 | 11 | 11 | 3.3 |
Overview of the multiclass data sets, all from the KEEL repository.
| Dataset | #Inst | #Attr | #Num | %Min | #Class |
|---|---|---|---|---|---|
| contraceptive | 1473 | 9 | 6 | 22.6 | 3 |
| dermatology | 366 | 34 | 34 | 5.5 | 6 |
| balance | 625 | 4 | 4 | 7.8 | 3 |
| penbased | 1100 | 16 | 16 | 9.5 | 10 |
| shuttle | 2175 | 9 | 9 | 0.09 | 5 |
| wine | 178 | 13 | 13 | 27 | 3 |
| yeast | 1484 | 8 | 8 | 0.3 | 10 |
| pageblocks | 548 | 10 | 10 | 0.6 | 5 |
| thyroid | 720 | 21 | 21 | 2.4 | 3 |
| ecoli | 336 | 7 | 7 | 0.6 | 8 |
| autos | 159 | 25 | 15 | 1.9 | 6 |
| glass | 214 | 9 | 9 | 4.2 | 6 |
| new-thyroid | 215 | 5 | 5 | 13.9 | 3 |
| hayes-roth | 132 | 4 | 4 | 22.7 | 3 |
| lymphography | 148 | 18 | 3 | 1.3 | 4 |
Fig. 1. Average test performance across datasets for different numbers of classifiers in AUCPR (left), macro-accuracy (middle) and macro F1 (right). The first row shows results for DT ensembles and the second row for NN ensembles. The interpolated lines are shown for convenience.
Win/Tie/Loss tables. Each element expresses how many times the method in the row wins/ties/loses against the method in the column. The top tables show results with DT methods, and the bottom tables show results with NN methods.
DT methods
| Macro-accuracy | EB | RB | SMOTE | RNB | Single | Macro F1-score | EB | RB | SMOTE | RNB | Single |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PTMA | 27/0/9 | 23/1/12 | 24/1/11 | 23/1/12 | 31/2/3 | PTF1 | 33/0/3 | 31/1/4 | 19/2/15 | 19/1/16 | 30/0/6 |
| EB | – | 6/4/26 | 20/2/14 | 19/1/16 | 31/0/5 | EB | – | 2/2/32 | 1/1/34 | 1/2/33 | 10/0/26 |
| RB | – | – | 24/1/11 | 23/4/9 | 33/0/3 | RB | – | – | 5/2/29 | 6/2/28 | 15/0/21 |
| SMOTE | – | – | – | 15/4/17 | 30/1/5 | SMOTE | – | – | – | 18/2/16 | 28/0/8 |
| RNB | – | – | – | – | 34/1/1 | RNB | – | – | – | – | 30/0/6 |
NN methods
| Macro-accuracy | EB | RB | SMOTE | RNB | Single | Macro F1-score | EB | RB | SMOTE | RNB | Single |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PTMA | 13/1/12 | 16/0/10 | 20/0/6 | 15/2/9 | 26/0/0 | PTF1 | 20/2/4 | 20/0/6 | 14/1/11 | 11/4/11 | 24/0/2 |
| EB | – | 15/2/9 | 19/1/6 | 14/1/11 | 26/0/0 | EB | – | 4/0/22 | 4/1/21 | 3/1/22 | 12/0/14 |
| RB | – | – | 19/1/6 | 13/1/12 | 26/0/0 | RB | – | – | 6/2/18 | 5/2/19 | 13/0/13 |
| SMOTE | – | – | – | 4/2/20 | 22/0/4 | SMOTE | – | – | – | 13/2/11 | 23/1/2 |
| RNB | – | – | – | – | 26/0/0 | RNB | – | – | – | – | 22/0/4 |
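Each win/tie/loss entry summarizes a pairwise comparison of two methods over all data sets. The sketch below shows one way such an entry could be tallied from per-data-set scores. How a tie was decided in the study is not stated in this record, so the tolerance `tol` is an assumption, and the function name and input format are hypothetical.

```python
def win_tie_loss(row_scores, col_scores, tol=0.0):
    """Count data sets where the row method wins, ties, or loses against the column method.

    Scores are paired per data set; a difference of at most `tol` counts as a tie
    (an assumed rule, not necessarily the one used in the study).
    """
    wins = ties = losses = 0
    for a, b in zip(row_scores, col_scores):
        if abs(a - b) <= tol:
            ties += 1
        elif a > b:
            wins += 1
        else:
            losses += 1
    return wins, ties, losses


def format_entry(wins, ties, losses):
    """Render the counts in the W/T/L form used in the tables."""
    return f"{wins}/{ties}/{losses}"
```

The three counts always sum to the number of data sets compared, which gives a quick consistency check on each cell.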
Fig. 2. Average recall across data sets for different numbers of classifiers, shown separately for the minority class (solid line) and the majority class (dashed line). The left plot shows results for DT ensembles and the right plot for NN ensembles.
The average difference between the actual performance and full potential. The first row shows results for DT methods and second row for NN methods.
| | Macro-accuracy | | | | | | Macro F1-score | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | PTMA | EB | RB | SMOTE | RNB | single | PTF1 | EB | RB | SMOTE | RNB | single |
| DT | 1.9% | 2.5% | 2.3% | 4.7% | 3.4% | 0.9% | 2.6% | 9.5% | 6.9% | 2.9% | 3.2% | 1.1% |
| NN | 1.7% | 2.0% | 2.1% | 5.9% | 3.3% | 0.6% | 2.9% | 6.5% | 5.5% | 2.9% | 3.6% | 0.9% |
Fig. 3. Reliability plots for DT ensembles with 100 classifiers: spectf (UCI, left), pb-1-3vs4 (KEEL, middle) and satim (HDDT, right). We used 10 bins to discretize the posterior probability for the minority class (x-axis) across all five runs and two folds. The corresponding observed frequency of the minority class (y-axis) was calculated for each bin (i.e., the “true” probability). A method that lines up with the diagonal is well calibrated, while values below the diagonal indicate that the probability of the minority class is overestimated.
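The binning described in the caption can be sketched as follows. This is an illustrative reconstruction assuming equal-width bins on [0, 1] and labels coded as 1 for the minority class; the function name and return format are hypothetical.

```python
def reliability_bins(probs, labels, n_bins=10):
    """Group predictions into equal-width probability bins.

    Returns one (mean predicted probability, observed minority frequency, count)
    tuple per non-empty bin, ordered from the lowest to the highest bin.
    """
    sums = [0.0] * n_bins
    positives = [0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        sums[b] += p
        positives[b] += y
        counts[b] += 1
    return [
        (sums[b] / counts[b], positives[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b]
    ]
```

Plotting the first element of each tuple against the second gives the reliability curve; a point below the diagonal means the predicted probability exceeds the observed frequency in that bin.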