Abstract
The Mahalanobis Taguchi System (MTS) is considered one of the most promising binary classification algorithms for handling imbalanced data. Unfortunately, MTS lacks a method for determining an efficient threshold for the binary classification. In this paper, a nonlinear optimization model, named the Modified Mahalanobis Taguchi System (MMTS), is formulated based on minimizing the distance between the MTS Receiver Operating Characteristics (ROC) curve and the theoretical optimal point. To validate its classification efficacy, MMTS is benchmarked against Support Vector Machines (SVMs), Naive Bayes (NB), the Probabilistic Mahalanobis Taguchi System (PTM), Synthetic Minority Oversampling Technique (SMOTE), Adaptive Conformal Transformation (ACT), Kernel Boundary Alignment (KBA), Hidden Naive Bayes (HNB), and other improved Naive Bayes algorithms. MMTS outperforms the benchmarked algorithms, especially when the imbalance ratio is greater than 400. A real-life case study from the manufacturing sector demonstrates the applicability of the proposed model and compares its performance with the Mahalanobis Genetic Algorithm (MGA).
Year: 2017 PMID: 28811820 PMCID: PMC5546084 DOI: 10.1155/2017/5874896
Source DB: PubMed Journal: Comput Intell Neurosci
Algorithm 1: Modified Mahalanobis Taguchi System (MMTS) pseudo code.
Figure 1: Receiver Operating Characteristics (ROC) curve for MTS.
Confusion matrix.
| | | True class | |
|---|---|---|---|
| | | Negative | Positive |
| Hypothesis output | Negative | TN(x) | FN(x) |
| | Positive | FP(x) | TP(x) |
| Sum | | N (negative) | N (positive) |
TN(x): true negative, FN(x): false negative, FP(x): false positive, and TP(x): true positive, each based on threshold x; N (negative): number of negative observations; N (positive): number of positive observations.
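The confusion matrix above depends on the threshold x, and the abstract describes choosing that threshold by minimizing the distance between the ROC curve and the theoretical optimal point. The following is a minimal sketch of that idea, not the paper's nonlinear optimization model: it assumes the optimal point is (false positive rate = 0, true positive rate = 1), that an observation is classified as positive when its score exceeds x, and it simply scans the observed scores as candidate thresholds. The function names and the example scores are hypothetical.

```python
import math

def confusion_counts(scores, labels, x):
    """Return (TN, FN, FP, TP) at threshold x.

    Assumption: an observation is predicted positive when its score
    (for example a Mahalanobis distance) is strictly greater than x.
    """
    tn = fn = fp = tp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score > x
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tn, fn, fp, tp

def distance_to_ideal(scores, labels, x):
    """Euclidean distance from the ROC point at x to the point (FPR=0, TPR=1).

    Assumes both classes are present, so neither denominator is zero.
    """
    tn, fn, fp, tp = confusion_counts(scores, labels, x)
    false_positive_rate = fp / (fp + tn)
    true_positive_rate = tp / (tp + fn)
    return math.hypot(false_positive_rate, 1.0 - true_positive_rate)

def best_threshold(scores, labels):
    """Scan the observed scores and keep the threshold closest to the ideal point."""
    candidates = sorted(set(scores))
    return min(candidates, key=lambda x: distance_to_ideal(scores, labels, x))

# Hypothetical scores: five negative (0) and two positive (1) observations.
example_scores = [0.4, 0.7, 0.9, 1.1, 1.3, 2.8, 3.5]
example_labels = [0, 0, 0, 0, 0, 1, 1]
chosen = best_threshold(example_scores, example_labels)
print(chosen, confusion_counts(example_scores, example_labels, chosen))
```

A grid scan like this is only a stand-in for a proper optimizer, but it makes the objective concrete: each candidate threshold yields one confusion matrix, hence one ROC point, and the selected threshold is the one whose ROC point lies nearest the ideal corner.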
Summary of the dataset used in the study.
| Number | Dataset | Class (major/minor) | # variables | Negative observations | Positive observations | Fisher discriminant ratio (a) | IR (b) | P value (c) | Statistically significant (d) |
|---|---|---|---|---|---|---|---|---|---|
| (1) | Abalone | Remainder/Class 24 | 8 | 4175 | 2 | 7.797 | 2088 : 1 | 0.0000 | Yes |
| (2) | Abalone | Remainder/Class 22 | 8 | 4171 | 6 | 0.814 | 695 : 1 | 0.0000 | Yes |
| (3) | Abalone | Remainder/Class 23 | 8 | 4168 | 9 | 0.661 | 463 : 1 | 0.0000 | Yes |
| (4) | Abalone | Remainder/Class 3 | 8 | 4162 | 10 | 8.227 | 417 : 1 | 0.0028 | Yes |
| (5) | Abalone | Remainder/Class 21 | 8 | 4165 | 12 | 1.244 | 347 : 1 | 0.0000 | Yes |
| (6) | Abalone | Remainder/Class 21 | 8 | 4163 | 14 | 1.000 | 297 : 1 | 0.0000 | Yes |
| (7) | Abalone | Remainder/Class 21 | 8 | 4151 | 22 | 1.019 | 189 : 1 | 0.0000 | Yes |
| (8) | Abalone | Remainder/Class 21 | 8 | 4151 | 26 | 0.868 | 160 : 1 | 0.0000 | Yes |
| (9) | Abalone | Remainder/Class 19 | 8 | 4145 | 32 | 0.555 | 130 : 1 | 0.0000 | Yes |
| (10) | ECOLI | Remainder/Class OML | 7 | 331 | 5 | 56.509 | 66 : 1 | 0.0000 | Yes |
| (11) | Welding (e) | Normal/Expulsion | 28 | 316 | 6 | 18.837 | 53 : 1 | 0.0122 | Yes |
| (12) | Yeast | Remainder/Class ME2 | 8 | 1433 | 51 | 1.144 | 28 : 1 | 0.0000 | Yes |
| (13) | Shuttle | Remainder/Class 5 | 9 | 41042 | 2458 | 11.513 | 17 : 1 | 0.0000 | Yes |
| (14) | Glass | Remainder/Class 7 | 9 | 185 | 29 | 2.806 | 6 : 1 | 0.8156 | No |
| (15) | Heart disease | Absence/Presence | 13 | 150 | 120 | 0.872 | 1.25 : 1 | 0.0000 | Yes |
(a) Fisher discriminant ratio (data overlapping index); (b) imbalance ratio = Negative/Positive; (c) based on the Kruskal-Wallis nonparametric test; (d) is there a statistically significant difference among the classifiers' performance (yes/no)?; (e) [40].
Figure 2: Support Vector Machines (SVMs).
Summary of the classifiers performance ranks for all datasets.
| Number | Dataset | Fisher discriminant ratio (a) | IR (b) | MTS LL (h) | MTS Med. (h) | MTS UL (h) | MMTS LL | MMTS Med. | MMTS UL | PTM LL | PTM Med. | PTM UL | SVM LL | SVM Med. | SVM UL | NB LL | NB Med. | NB UL | Rank MTS (c) | Rank MMTS (d) | Rank PTM (e) | Rank SVM (f) | Rank NB (g) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (1) | Abalone | 7.797 | 2088 : 1 | 88.02 | 88.91 | 89.52 | 98.31 | 98.77 | 99.22 | 00.00 | 49.20 | 98.40 | 00.00 | 00.00 | 00.00 | 00.00 | 00.00 | 00.00 | 2 | 1 | 2 | 3 | 3 |
| (2) | Abalone | 0.814 | 695 : 1 | 50.50 | 60.40 | 70.20 | 74.70 | 82.90 | 91.00 | 00.00 | 00.00 | 39.60 | 28.40 | 56.80 | 68.30 | 00.00 | 00.00 | 27.90 | 2 | 1 | 3 | 2 | 4 |
| (3) | Abalone | 0.661 | 463 : 1 | 56.80 | 61.90 | 67.00 | 72.58 | 75.65 | 76.54 | 21.20 | 42.40 | 43.80 | 29.20 | 51.60 | 66.90 | 00.00 | 00.00 | 00.00 | 2 | 1 | 3 | 2 | 4 |
| (4) | Abalone | 8.227 | 417 : 1 | 84.14 | 84.58 | 85.43 | 96.63 | 97.33 | 97.92 | 28.70 | 57.30 | 77.80 | 81.30 | 90.00 | 98.70 | 69.70 | 80.50 | 94.40 | 2 | 1 | 3 | 1 | 2 |
| (5) | Abalone | 1.244 | 347 : 1 | 61.00 | 66.70 | 73.80 | 69.30 | 75.70 | 79.80 | 25.80 | 51.40 | 57.80 | 63.40 | 73.60 | 80.70 | 00.00 | 18.80 | 37.10 | 2 | 1 | 3 | 2 | 4 |
| (6) | Abalone | 1.000 | 297 : 1 | 64.40 | 69.90 | 75.50 | 71.54 | 76.18 | 81.03 | 32.20 | 51.50 | 60.30 | 63.50 | 74.00 | 85.20 | 23.50 | 38.70 | 45.30 | 2 | 1 | 3 | 1 | 4 |
| (7) | Abalone | 1.019 | 189 : 1 | 71.84 | 73.9 | 76.62 | 72.78 | 77.36 | 80.02 | 18.20 | 35.30 | 44.70 | 70.70 | 76.80 | 83.50 | 13.00 | 25.90 | 44.60 | 1 | 1 | 2 | 1 | 2 |
| (8) | Abalone | 0.868 | 160 : 1 | 69.06 | 71.84 | 74.68 | 73.82 | 78.15 | 81.35 | 29.40 | 44.50 | 56.30 | 81.78 | 83.30 | 85.18 | 24.20 | 38.90 | 48.50 | 3 | 2 | 4 | 1 | 4 |
| (9) | Abalone | 0.555 | 130 : 1 | 54.07 | 58.1 | 61.82 | 65.14 | 67.88 | 70.21 | 10.90 | 21.90 | 34.00 | 68.90 | 77.40 | 79.90 | 30.63 | 34.25 | 39.62 | 3 | 2 | 5 | 1 | 4 |
| (10) | ECOLI | 56.509 | 66 : 1 | 84.36 | 86.5 | 88.76 | 98.71 | 99.07 | 99.42 | 99.30 | 99.30 | 99.42 | 99.08 | 99.54 | 99.77 | 0.000 | 0.000 | 28.90 | 3 | 2 | 2 | 1 | 4 |
| (11) | Welding (H) | 18.837 | 53 : 1 | 57.80 | 65.00 | 71.70 | 79.70 | 89.10 | 98.30 | 79.60 | 86.10 | 92.70 | 38.90 | 69.00 | 89.20 | 38.00 | 67.50 | 84.50 | 2 | 1 | 1 | 2 | 3 |
| (12) | Yeast | 1.144 | 28 : 1 | 67.95 | 69.26 | 70.45 | 69.66 | 72.14 | 74.55 | 17.00 | 29.60 | 33.90 | 63.40 | 82.90 | 84.20 | 17.30 | 25.90 | 35.60 | 3 | 2 | 4 | 1 | 4 |
| (13) | Shuttle | 11.513 | 17 : 1 | 87.59 | 87.68 | 87.81 | 99.83 | 99.92 | 99.93 | 06.13 | 06.85 | 07.29 | 99.98 | 99.98 | 99.99 | 99.24 | 99.37 | 99.47 | 4 | 2 | 5 | 1 | 3 |
| (14) | Heart | 0.872 | 1.25 : 1 | 61.59 | 65.56 | 69.25 | 75.19 | 76.58 | 77.50 | 73.31 | 74.90 | 76.57 | 80.31 | 81.73 | 83.58 | 75.56 | 76.90 | 78.48 | 4 | 2 | 3 | 1 | 3 |
(a) Fisher discriminant ratio (data overlapping index); (b) imbalance ratio = Negative/Positive; (c) MTS: Mahalanobis Taguchi System classifier at threshold = 1; (d) MMTS: Modified Mahalanobis Taguchi System classifier; (e) PTM: Probabilistic Mahalanobis Taguchi System classifier; (f) SVM: Support Vector Machines classifier; (g) NB: Naive Bayes classifier; (h) LL: lower limit, Med.: median, and UL: upper limit of the 95% confidence interval obtained with the one-sample Wilcoxon method; (H) [40].
Classification performance results (Gmeans) of the MMTS classifier versus modified SVM classifiers for class-imbalanced data.
| Dataset | Fisher discriminant ratio | # variables | Negative observations | Positive observations | IR | SVM | SMOTE | ACT | KBA | MMTS |
|---|---|---|---|---|---|---|---|---|---|---|
| Car | 1.01 | 6 | 1659 | 69 | 24 : 1 | 99.0 ± 2.2 | 99.0 ± 2.3 | 99.9 ± 0.2 | 99.9 ± 0.2 | 85.3 ± 2.2 |
| Yeast | 1.14 | 8 | 1433 | 51 | 28 : 1 | 59.0 ± 12.1 | 69.9 ± 10.0 | 78.5 ± 4.5 | 82.2 ± 7.1 | 72.2 ± 2.9 |
| Abalone | 0.55 | 8 | 4145 | 32 | 130 : 1 | 0.0 ± 0.0 | 0.0 ± 0.0 | 51.9 ± 7.6 | 57.8 ± 5.4 | 67.7 ± 3.4 |
Classification performance results (Gmeans) for the modified Naive Bayes classifiers.
| Dataset | Fisher discriminant ratio | # variables | Negative observations | Positive observations | IR | HNB | TAN | NBTree | AODE | WAODE |
|---|---|---|---|---|---|---|---|---|---|---|
| Car | 1.01 | 6 | 1659 | 69 | 24 : 1 | 74.8 ± 7.1 | 49.6 ± 11.5 | 52.2 ± 22.8 | 2.3 ± 7.3 | 8.6 ± 14.5 |
| Yeast | 1.14 | 8 | 1433 | 51 | 28 : 1 | 45.5 ± 26.9 | 34.9 ± 25.1 | 23.5 ± 25.5 | 4.1 ± 12.9 | 25.8 ± 22.8 |
| Abalone | 0.55 | 8 | 4145 | 32 | 130 : 1 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 |
Description of welding data.
| Dataset | Classes | Number of vars. | Neg. obs. | Pos. obs. | Fisher discriminant ratio | IR |
|---|---|---|---|---|---|---|
| AC welding | Normal/expulsion | 28 | 3288 | 6 | 4.1104 | 548 : 1 |
Classification results for AC welding dataset with IR 548.
| Classifier type | Repetition | Threshold | Specificity | Sensitivity | Precision | Gmeans | F-measure |
|---|---|---|---|---|---|---|---|
| MMTS | 1 | 4.661 | 99.392 | 75.000 | 99.195 | 86.339 | 85.417 |
| | 2 | 1.588 | 87.319 | 100.000 | 88.746 | 93.444 | 94.037 |
| | 3 | 6.415 | 99.345 | 75.000 | 99.134 | 86.318 | 85.395 |
| | 4 | 2.339 | 95.367 | 100.000 | 95.572 | 97.656 | 97.736 |
| | 5 | 1.858 | 91.015 | 100.000 | 91.756 | 95.402 | 95.701 |
| | 6 | 2.929 | 98.549 | 100.000 | 98.570 | 99.272 | 99.280 |
| | 7 | 2.789 | 98.315 | 100.000 | 98.343 | 99.154 | 99.165 |
| | 8 | 1.653 | 89.190 | 100.000 | 90.245 | 94.441 | 94.872 |
| | 9 | 1.254 | 79.551 | 100.000 | 83.023 | 89.191 | 90.724 |
| | 10 | 3.074 | 98.690 | 100.000 | 98.707 | 99.343 | 99.349 |
| PTM | 1 | 3.803 | 98.549 | 75.000 | 98.103 | 85.972 | 85.010 |
| | 2 | 3.718 | 98.737 | 75.000 | 98.343 | 86.054 | 85.100 |
| | 3 | 3.279 | 96.912 | 75.000 | 96.045 | 85.255 | 84.228 |
| | 4 | 2.312 | 95.087 | 100.000 | 95.317 | 97.512 | 97.602 |
| | 5 | 4.416 | 99.438 | 25.000 | 97.803 | 49.859 | 39.821 |
| | 6 | 2.503 | 97.099 | 100.000 | 97.181 | 98.539 | 98.570 |
| | 7 | 2.112 | 95.367 | 100.000 | 95.572 | 97.656 | 97.736 |
| | 8 | 4.775 | 99.064 | 50.000 | 98.163 | 70.379 | 66.253 |
| | 9 | 3.137 | 95.929 | 75.000 | 94.851 | 84.821 | 83.766 |
| | 10 | 2.492 | 96.912 | 100.000 | 97.004 | 98.444 | 98.479 |
| SVM | 1 | — | 99.953 | 25.000 | 99.813 | 49.988 | 39.985 |
| | 2 | — | 100.000 | 50.000 | 100.000 | 70.711 | 66.667 |
| | 3 | — | 99.953 | 75.000 | 99.938 | 86.582 | 85.691 |
| | 4 | — | 99.953 | 75.000 | 99.938 | 86.582 | 85.691 |
| | 5 | — | 100.000 | 25.000 | 100.000 | 50.000 | 40.000 |
| | 6 | — | 99.906 | 50.000 | 99.813 | 70.678 | 66.625 |
| | 7 | — | 99.906 | 50.000 | 99.813 | 70.678 | 66.625 |
| | 8 | — | 100.000 | 25.000 | 100.000 | 50.000 | 40.000 |
| | 9 | — | 99.953 | 75.000 | 99.938 | 86.582 | 85.691 |
| | 10 | — | 99.953 | 25.000 | 99.813 | 49.988 | 39.985 |
| NB | 1 | — | 100.000 | 25.000 | 100.000 | 50.000 | 40.000 |
| | 2 | — | 100.000 | 25.000 | 100.000 | 50.000 | 40.000 |
| | 3 | — | 99.906 | 0.000 | 0.000 | 0.000 | NaN (a) |
| | 4 | — | 100.000 | 25.000 | 100.000 | 50.000 | 40.000 |
| | 5 | — | 99.953 | 0.000 | 0.000 | 0.000 | NaN |
| | 6 | — | 100.000 | 0.000 | NaN | 0.000 | NaN |
| | 7 | — | 100.000 | 25.000 | 100.000 | 50.000 | 40.000 |
| | 8 | — | 100.000 | 0.000 | NaN | 0.000 | NaN |
| | 9 | — | 99.906 | 0.000 | 0.000 | 0.000 | NaN |
| | 10 | — | 99.953 | 0.000 | 0.000 | 0.000 | NaN |
| MGA | 1 | 1.000 | 77.164 | 100.000 | 81.410 | 87.843 | 89.752 |
| | 2 | 1.000 | 77.118 | 100.000 | 81.379 | 87.817 | 89.733 |
| | 3 | 1.000 | 76.977 | 100.000 | 81.286 | 87.737 | 89.677 |
| | 4 | 1.000 | 77.164 | 100.000 | 81.410 | 87.843 | 89.752 |
| | 5 | 1.000 | 76.837 | 100.000 | 81.193 | 87.657 | 89.621 |
| | 6 | 1.000 | 77.492 | 100.000 | 81.627 | 88.029 | 89.884 |
| | 7 | 1.000 | 77.211 | 100.000 | 81.441 | 87.870 | 89.771 |
| | 8 | 1.000 | 77.164 | 100.000 | 81.410 | 87.843 | 89.752 |
| | 9 | 1.000 | 77.164 | 100.000 | 81.410 | 87.843 | 89.752 |
| | 10 | 1.000 | 77.632 | 100.000 | 81.721 | 88.109 | 89.941 |
(a) NaN because the denominator is zero.
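The last three numeric columns of the table above appear consistent with rate-based formulas computed from specificity and sensitivity alone. The sketch below states those formulas as an inference from the reported values, not as the source's definitions: precision as sensitivity / (sensitivity + (100 − specificity)), the geometric mean as the square root of specificity × sensitivity, and the F-measure as the harmonic mean of precision and sensitivity. The function name `rate_metrics` is hypothetical.

```python
import math

def rate_metrics(specificity, sensitivity):
    """Return (precision, gmeans, f_measure), all in percent.

    Inputs are percentages. A metric is NaN when its denominator is zero,
    mirroring the NaN entries in the table. The formulas are inferred from
    the reported numbers and are an assumption, not the source's statement.
    """
    false_positive_rate = 100.0 - specificity
    denominator = sensitivity + false_positive_rate
    precision = 100.0 * sensitivity / denominator if denominator > 0 else math.nan
    gmeans = math.sqrt(specificity * sensitivity)
    if math.isnan(precision) or precision + sensitivity == 0:
        f_measure = math.nan
    else:
        f_measure = 2.0 * precision * sensitivity / (precision + sensitivity)
    return precision, gmeans, f_measure

# First MMTS repetition from the table: specificity 99.392, sensitivity 75.000.
print(rate_metrics(99.392, 75.0))
# Sixth NB repetition: no positives predicted at all, so precision is undefined.
print(rate_metrics(100.0, 0.0))
```

Because the inputs are already rounded to three decimals, recomputed values can differ from the table in the last digit.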
Mann–Whitney test P values (a). Results for the AC welding dataset with IR 548.
| | MMTS | PTM | SVM | NB | MGA | Classifier rank |
|---|---|---|---|---|---|---|
| MMTS | — | 0.0410 | 0.0005 | 0.0001 | 0.0129 | 1 |
| PTM | — | — | 0.0521 | 0.0003 | | 2 |
| SVM | — | — | — | 0.0070 | 0.0001 | 3 |
| NB | — | — | — | — | — | 4 |
| MGA (b) | — | | 0.0001 | 0.0001 | — | 2 |
(a) The null hypothesis H0: Median1 = Median2 is tested against the alternative hypothesis H1: Median1 > Median2 at significance level α = 0.05; (b) Mahalanobis Genetic Algorithm [3].
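The one-sided comparison described in the footnote can be sketched with a plain Mann-Whitney U test. This is an illustrative sketch under stated assumptions: the source does not say which metric or software produced the P values, so the example assumes the test is applied to the ten Gmeans values per classifier from the AC welding results, and it uses the normal approximation with tie and continuity corrections. The function name is hypothetical. Under these assumptions the MMTS versus SVM comparison comes out close to the reported 0.0005.

```python
import math

def mann_whitney_greater(a, b):
    """One-sided Mann-Whitney U test of H1: median(a) > median(b).

    Returns (U, p). Uses the normal approximation with tie and continuity
    corrections; it is not an exact small-sample test and assumes the
    pooled sample is not entirely tied (variance > 0).
    """
    n1, n2 = len(a), len(b)
    # U counts pairs where a beats b; ties between groups count one half.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    pooled = sorted(list(a) + list(b))
    n = n1 + n2
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i  # size of this group of equal values
        tie_term += t ** 3 - t
        i = j
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2.0 - 0.5) / math.sqrt(variance)
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return u, p

# Gmeans per repetition, copied from the AC welding results table.
mmts_gmeans = [86.339, 93.444, 86.318, 97.656, 95.402,
               99.272, 99.154, 94.441, 89.191, 99.343]
svm_gmeans = [49.988, 70.711, 86.582, 86.582, 50.000,
              70.678, 70.678, 50.000, 86.582, 49.988]
u_stat, p_value = mann_whitney_greater(mmts_gmeans, svm_gmeans)
print(u_stat, p_value)
```

With only ten repetitions per classifier, an exact test would be the more careful choice; the normal approximation is used here only to keep the sketch short and dependency-free.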
Figure 3: ROC curves for MMTS, PTM, SVMs, and NB classifiers for the AC welding dataset.