Jinyan Li, Simon Fong, Yunsick Sung, Kyungeun Cho, Raymond Wong, Kelvin K L Wong.
Abstract
BACKGROUND: An imbalanced dataset is a training dataset in which the class of interest and the class of no interest are represented in unequal proportions. In biomedical applications, samples from the class of interest, such as medical anomalies, positive clinical tests, and particular diseases, are often rare in a population. Because the target samples in the original dataset are few in number, a classification model induced from such training data predicts poorly, owing to insufficient training on the minority class.
Keywords: Biomedical data; Classification; Dynamic Multi-objective; Imbalanced dataset; SMOTE; Swarm optimisation; Under-sampling
Year: 2016 PMID: 27980678 PMCID: PMC5131504 DOI: 10.1186/s13040-016-0117-1
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Fig. 1 Principle of adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique (SMOTE)
Fig. 2 Flow chart of dynamic multi-objective SMOTE (SDMRA)
Fig. 3 Snapshot of fluctuating values of accuracy and Kappa (an example of an imbalanced dataset with 100 majority class samples and 5 minority class samples)
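The Fig. 3 example (100 majority and 5 minority samples) shows why accuracy alone can mislead on imbalanced data. The following is a minimal sketch, assuming a hypothetical classifier that always predicts the majority class and treating the minority class as positive. The function names are illustrative and do not come from the paper.

```python
def accuracy(tp, fn, fp, tn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def cohen_kappa(tp, fn, fp, tn):
    """Cohen's Kappa: agreement beyond what chance alone would produce."""
    total = tp + fn + fp + tn
    p_observed = (tp + tn) / total
    # Chance agreement from the marginal totals of actual and predicted labels.
    p_pos = ((tp + fn) / total) * ((tp + fp) / total)
    p_neg = ((fp + tn) / total) * ((fn + tn) / total)
    p_expected = p_pos + p_neg
    if p_expected == 1.0:
        return 0.0  # degenerate case: only one class present and predicted
    return (p_observed - p_expected) / (1.0 - p_expected)


# Always-majority classifier on 100 majority / 5 minority samples:
# every minority sample is missed (fn=5), every majority sample is correct (tn=100).
acc = accuracy(tp=0, fn=5, fp=0, tn=100)
kappa = cohen_kappa(tp=0, fn=5, fp=0, tn=100)
print(f"accuracy={acc:.3f}, kappa={kappa:.3f}")  # accuracy=0.952, kappa=0.000
```

Accuracy stays above 0.95 although no minority sample is detected, while Kappa drops to zero.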
Information on our biomedical datasets
| Name | Imb.Ratio | Target | Samples |
|---|---|---|---|
| Thoracic Surgery | 5.7:1 | died | 470 |
| Ecoli | 8.6:1 | imU | 336 |
| Sick Euthyroid | 9.8:1 | sick euthyroid | 3163 |
| Yeast_ML8 | 13:1 | target 8 | 2407 |
| Thyroid Sick | 15:1 | sick | 3772 |
| Arrhythmia | 17:1 | class = 06 | 452 |
| Mammography | 42:1 | minority | 11183 |
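The imbalance ratio and the sample count in the table above together imply approximate class sizes. The sketch below is a small worked example, assuming the ratio is majority:minority expressed as r:1. Because the published ratios are rounded, the counts are estimates, and `class_counts` is a hypothetical helper name.

```python
def class_counts(total_samples, imbalance_ratio):
    """Estimate (majority, minority) counts from a majority:minority ratio r:1."""
    # total = minority * (r + 1), so minority = total / (r + 1).
    minority = round(total_samples / (imbalance_ratio + 1))
    majority = total_samples - minority
    return majority, minority


print(class_counts(470, 5.7))     # Thoracic Surgery -> (400, 70)
print(class_counts(11183, 42.0))  # Mammography      -> (10923, 260)
```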
Average Kappa of different algorithms with different datasets
| Kappa | NN | SMOTE-NN | R-SMOTE-NN | SRA-NN | ASCB_DmSMOTE-NN |
|---|---|---|---|---|---|
| Thoracic Surgery | 0.049 | 0.305 | 0.312 ± 0.48 | 0.670 ± 0.21 | |
| Ecoli | 0.502 | 0.807 | 0.723 ± 0.12 | 0.850 ± 0.05 | |
| Sick Euthyroid | 0.497 | 0.831 | 0.688 ± 0.13 | 0.824 ± 0.07 | |
| Yeast_ML8 | | 0.381 | 0.578 ± 0.23 | | 0.927 ± 0.04 |
| Thyroid Sick | 0.360 | 0.833 | 0.762 ± 0.12 | | 0.829 ± 0.11 |
| Arrhythmia | 0.068 | 0.761 | 0.826 ± 0.13 | 0.937 ± 0.04 | |
| Mammography | 0.436 | 0.794 | 0.729 ± 0.14 | 0.673 ± 0.16 | |
| K_average | 0.273 | 0.673 | 0.660 ± 0.22 | 0.833 ± 0.09 | |
The italicized entries represent the best performance
Average Accuracy of different algorithms with different datasets
| Accuracy | NN | SMOTE-NN | R-SMOTE-NN | SRA-NN | ASCB_DmSMOTE-NN |
|---|---|---|---|---|---|
| Thoracic Surgery | 0.848 | 0.653 | 0.686 ± 0.23 | 0.895 ± 0.05 | |
| Ecoli | 0.925 | 0.904 | 0.817 ± 0.18 | | 0.918 ± 0.02 |
| Sick Euthyroid | 0.936 | 0.916 | 0.781 ± 0.19 | | 0.927 ± 0.02 |
| Yeast_ML8 | 0.926 | 0.690 | 0.756 ± 0.18 | | 0.959 ± 0.02 |
| Thyroid Sick | 0.953 | 0.916 | 0.852 ± 0.13 | | 0.946 ± 0.03 |
| Arrhythmia | 0.858 | 0.880 | 0.871 ± 0.11 | 0.958 ± 0.04 | |
| Mammography | | 0.897 | 0.884 ± 0.10 | | 0.956 ± 0.02 |
| A_average | 0.919 | 0.837 | 0.807 ± 0.16 | | 0.938 ± 0.02 |
The italicized entries represent the best performance
Average G-mean value of different algorithms with different datasets
| G-mean | NN | SMOTE-NN | R-SMOTE-NN | SRA-NN | ASCB_DmSMOTE-NN |
|---|---|---|---|---|---|
| Thoracic Surgery | 0.179 | 0.651 | 0.479 ± 0.22 | 0.715 ± 0.12 | |
| Ecoli | 0.630 | 0.904 | 0.768 ± 0.18 | 0.813 ± 0.12 | |
| Sick Euthyroid | 0.613 | 0.916 | 0.750 ± 0.15 | 0.832 ± 0.10 | |
| Yeast_ML8 | 0.000 | 0.690 | 0.641 ± 0.14 | 0.926 ± 0.07 | |
| Thyroid Sick | 0.453 | 0.916 | 0.811 ± 0.14 | | 0.836 ± 0.6 |
| Arrhythmia | 0.091 | 0.880 | 0.802 ± 0.13 | 0.904 ± 0.06 | |
| Mammography | 0.520 | 0.896 | 0.795 ± 0.12 | 0.746 ± 0.06 | |
| G_average | 0.355 | 0.836 | 0.721 ± 0.14 | 0.833 ± 0.10 | |
The italicized entries represent the best performance
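For reference, the G-mean reported in the table above is conventionally the geometric mean of sensitivity and specificity. The sketch below assumes that standard definition and a 2×2 confusion matrix with the minority class as positive. The counts in the usage lines are made up for illustration and are not taken from the experiments.

```python
import math


def g_mean(tp, fn, fp, tn):
    """Geometric mean of sensitivity (minority recall) and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# A G-mean of 0.000 arises whenever one of the two rates is zero,
# for instance when no minority sample is recognised at all.
collapsed = g_mean(tp=0, fn=5, fp=0, tn=100)

# Illustrative balanced case: 4 of 5 minority and 90 of 100 majority correct.
balanced = g_mean(tp=4, fn=1, fp=10, tn=90)
print(f"{collapsed:.3f} {balanced:.3f}")
```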
Average F-measure of different algorithms with different datasets
| F-measure(F1) | NN | SMOTE-NN | R-SMOTE-NN | SRA-NN | ASCB_DmSMOTE-NN |
|---|---|---|---|---|---|
| Thoracic Surgery | | 0.667 | 0.643 ± 0.27 | 0.642 ± 0.1 | 0.865 ± 0.04 |
| Ecoli | | 0.902 | 0.762 ± 0.18 | 0.795 ± 0.09 | 0.874 ± 0.04 |
| Sick Euthyroid | | 0.916 | 0.793 ± 0.16 | 0.787 ± 0.07 | 0.891 ± 0.05 |
| Yeast_ML8 | | 0.693 | 0.809 ± 0.15 | 0.902 ± 0.09 | 0.939 ± 0.04 |
| Thyroid Sick | | 0.915 | 0.820 ± 0.15 | 0.863 ± 0.08 | 0.821 ± 0.03 |
| Arrhythmia | 0.876 | 0.878 | 0.847 ± 0.16 | 0.895 ± 0.08 | |
| Mammography | | 0.901 | 0.812 ± 0.13 | 0.726 ± 0.07 | 0.943 ± 0.03 |
| F1_average | | 0.839 | 0.784 ± 0.17 | 0.801 ± 0.09 | 0.912 ± 0.04 |
The italicized entries represent the best performance
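The F-measure (F1) in the table above is conventionally the harmonic mean of precision and recall. The sketch below assumes that standard definition. The record does not state which class is treated as positive, so the choice of the minority class here, like the illustrative counts, is an assumption.

```python
def f1_score(tp, fn, fp):
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Illustrative counts: 4 minority samples found, 1 missed, 2 false alarms.
example_f1 = f1_score(tp=4, fn=1, fp=2)
print(f"{example_f1:.3f}")  # 0.727
```

True negatives do not enter the formula, so the score depends on which class is chosen as positive.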
Average imbalance ratio (majority:minority) of different algorithms with different datasets
| Imb.Ratio (ma/mi) | NN | SMOTE-NN | R-SMOTE-NN | SRA-NN | ASCB_DmSMOTE-NN |
|---|---|---|---|---|---|
| Thoracic Surgery | 5.7:1 | 1:1 | 1.2 ± 0.7:1 | 0.7 ± 0.4:1 | 0.5 ± 0.3:1 |
| Ecoli | 8.6:1 | 1:1 | 1.3 ± 0.5:1 | 0.6 ± 0.2:1 | 0.4 ± 0.3:1 |
| Sick Euthyroid | 9.8:1 | 1:1 | 1.8 ± 0.5:1 | 1.1 ± 0.3:1 | 0.7 ± 0.4:1 |
| Yeast_ML8 | 12.6:1 | 1:1 | 1.9 ± 0.6:1 | 0.6 ± 0.2:1 | 0.7 ± 0.2:1 |
| Thyroid Sick | 15.3:1 | 1:1 | 1.6 ± 0.4:1 | 0.8 ± 0.3:1 | 0.9 ± 0.1:1 |
| Arrhythmia | 17.1:1 | 1:1 | 1.3 ± 0.7:1 | 0.7 ± 0.3:1 | 0.5 ± 0.2:1 |
| Mammography | 42.0:1 | 1:1 | 1.5 ± 0.5:1 | 0.9 ± 0.2:1 | 0.8 ± 0.3:1 |
| I_average | 15.9:1 | 1:1 | 1.5 ± 0.6:1 | 0.8 ± 0.3:1 | 0.6 ± 0.3:1 |
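The 1:1 ratio in the SMOTE-NN column corresponds to oversampling the minority class until it matches the majority class in size. The following is a minimal sketch of plain SMOTE interpolation using only the standard library. It is not the swarm-optimised variant proposed in the paper, and the function name, the toy points and the choice of `k` are all assumptions made for illustration.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority points by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base sample, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data: 5 minority points; assume 100 majority samples, as in the Fig. 3 example.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
n_majority = 100
new_points = smote_sketch(minority, n_majority - len(minority), k=3)
balanced_minority = minority + new_points
print(len(balanced_minority))  # 100, i.e. a 1:1 ratio
```

Each synthetic point lies on a line segment between two existing minority samples, so no point falls outside the region the minority class already spans.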
Fig. 4 Boxplot of average Kappa and average accuracy of different methods over all datasets
Fig. 5 Average of Kappa, accuracy and reliable accuracy of all experiments
Fig. 6 Average performance of imbalanced dataset classification indices of all experiments
Fig. 7 Size variations of the datasets after processing with different methods
List of abbreviations
| Abbreviation | Meaning |
|---|---|
| ASCB_DmSMOTE | Adaptive Swarm Cluster-Based Dynamic Multi-objective SMOTE |
| SDMRA/DMSMOTE | Swarm Dynamic Multi-objective Rebalancing Algorithm |
| Imb.Ratio | Imbalanced Ratio |
| ma/mi | Majority class/Minority class |
| NN | Neural Network |
| PSO | Particle Swarm Optimization |
| R-SMOTE | Random-SMOTE |
| SMOTE | Synthetic Minority Oversampling Technique |
| SRA | Swarm Rebalancing Algorithm |