Xiaoqing Gu, Tongguang Ni, Hongyuan Wang.
Abstract
In medical dataset classification, the support vector machine (SVM) is considered one of the most successful methods. However, most real-world medical datasets contain outliers or noise, and the data are often class-imbalanced. In this paper, a fuzzy support vector machine (FSVM) for the class imbalance problem (called FSVM-CIP) is presented. It can be seen as a modified FSVM that incorporates manifold regularization and assigns a separate misclassification cost to each of the two classes. The proposed FSVM-CIP can handle the class imbalance problem in the presence of outliers/noise and enhances the locality maximum margin. Five real-world medical datasets from the UCI medical database (breast, heart, hepatitis, BUPA liver, and pima diabetes) are used to illustrate the method. Experimental results on these datasets show that FSVM-CIP performs better than or comparably to the compared methods.
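As a rough illustration of the two ingredients named in the abstract (per-sample fuzzy memberships and one misclassification cost per class), the following Python sketch computes a hinge penalty weighted by both. It is a minimal sketch under stated assumptions, not the FSVM-CIP objective itself: the manifold regularization term is omitted, and the function name `weighted_hinge_penalty`, the parameters `c_pos` and `c_neg`, and all numeric values are hypothetical.

```python
def weighted_hinge_penalty(scores, labels, memberships, c_pos, c_neg):
    """Sum of hinge losses scaled by class cost and fuzzy membership.

    scores      -- decision values f(x_i) of some classifier (assumed given)
    labels      -- class labels, +1 or -1
    memberships -- fuzzy weights s_i in (0, 1]; small values down-weight
                   samples suspected to be outliers or noise
    c_pos/c_neg -- misclassification costs for the positive/negative class
    """
    total = 0.0
    for f, y, s in zip(scores, labels, memberships):
        cost = c_pos if y == 1 else c_neg      # one cost per class
        hinge = max(0.0, 1.0 - y * f)          # zero when the margin is met
        total += cost * s * hinge
    return total


# Hypothetical toy values: the negative class is treated as the minority
# class and given the larger cost; the last sample is a suspected outlier.
scores = [2.0, 0.5, -0.5, 1.0]
labels = [1, 1, -1, -1]
memberships = [1.0, 1.0, 1.0, 0.5]
penalty = weighted_hinge_penalty(scores, labels, memberships, c_pos=1.0, c_neg=4.0)
print(penalty)
```

Raising `c_neg` relative to `c_pos` makes errors on the negative class more expensive, while a membership below 1 reduces the influence of a sample suspected to be an outlier.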
Year: 2014 PMID: 24790571 PMCID: PMC3982259 DOI: 10.1155/2014/536434
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1: The hyperplanes of linear FSVM-CIP.
Algorithm 1: FSVM-CIP in the linear case.
Algorithm 2: Kernel FSVM-CIP.
Characteristics of the selected datasets.
| Datasets | #pos | #neg | #pos used | #neg used | Ratio | #features |
|---|---|---|---|---|---|---|
| Breast | 458 | 241 | 240 | 120 | 2 : 1 | 9 |
| Heart | 120 | 150 | 80 | 20 | 4 : 1 | 13 |
| Hepatitis | 123 | 32 | 100 | 10 | 10 : 1 | 19 |
| BUPA liver | 200 | 145 | 150 | 10 | 15 : 1 | 6 |
| Pima diabetes | 268 | 500 | 180 | 10 | 18 : 1 | 8 |
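The Ratio column appears to be the fourth column divided by the fifth; this reading of the two columns is an assumption, since their labels are missing from this record. A short Python check, using a hypothetical helper `imbalance_ratio`, reduces each pair of counts to lowest terms:

```python
from math import gcd


def imbalance_ratio(n_pos, n_neg):
    """Reduce a pair of class counts to lowest terms, e.g. (240, 120) -> (2, 1)."""
    g = gcd(n_pos, n_neg)
    return n_pos // g, n_neg // g


# (dataset, fourth-column count, fifth-column count) copied from the table above
rows = [
    ("Breast", 240, 120),
    ("Heart", 80, 20),
    ("Hepatitis", 100, 10),
    ("BUPA liver", 150, 10),
    ("Pima diabetes", 180, 10),
]

ratios = {name: imbalance_ratio(p, n) for name, p, n in rows}
for name, (p, n) in ratios.items():
    print(f"{name}: {p} : {n}")
```

The reduced pairs match the Ratio column for all five datasets, which supports the assumed reading of the two count columns.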
Comparison of the classification results (%) on breast dataset.
| Kernel | Method | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|
| Linear | FSVM | 95.87 ± 0.017 | 95.04 ± 0.043 | 95.58 ± 0.035 |
| Linear | SVDD | 97.71 ± 0.065 | 90.90 ± 0.013 | 95.28 ± 0.052 |
| Linear | FSVM-CIL | 95.87 ± 0.024 | 95.87 ± 0.015 | 95.81 ± 0.028 |
| Linear | WCS-FSVM | 96.33 ± 0.067 | 95.04 ± 0.056 | 95.87 ± 0.047 |
| Linear | FSVM-CIPlin | 96.98 ± 0.039 | 96.49 ± 0.022 | 96.76 ± 0.040 |
| Linear | FSVM-CIPexp | 96.68 ± 0.011 | 96.69 ± 0.042 | 96.76 ± 0.037 |
| Gaussian kernel | FSVM | 96.33 ± 0.023 | 95.87 ± 0.051 | 96.17 ± 0.050 |
| Gaussian kernel | SVDD | 97.30 ± 0.065 | 91.25 ± 0.013 | 95.44 ± 0.052 |
| Gaussian kernel | FSVM-CIL | 96.79 ± 0.059 | 95.87 ± 0.042 | 96.46 ± 0.055 |
| Gaussian kernel | WCS-FSVM | 96.97 ± 0.030 | 96.69 ± 0.093 | 96.76 ± 0.067 |
| Gaussian kernel | FSVM-CIPlin | 97.25 ± 0.055 | 96.29 ± 0.032 | 97.05 ± 0.042 |
| Gaussian kernel | FSVM-CIPexp | 97.25 ± 0.055 | 97.52 ± 0.045 | 97.34 ± 0.033 |
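Sensitivity, specificity, and accuracy in these tables are presumably the usual confusion-matrix quantities. This record does not show the paper's evaluation protocol, so the Python sketch below uses the standard definitions with made-up counts; the function name `classification_metrics` and the counts are hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard confusion-matrix metrics, returned as fractions in [0, 1].

    tp/fn -- positive samples classified correctly/incorrectly
    tn/fp -- negative samples classified correctly/incorrectly
    """
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # overall hit rate
    return sensitivity, specificity, accuracy


# Hypothetical imbalanced test set: 100 positives, 50 negatives.
sens, spec, acc = classification_metrics(tp=90, fn=10, tn=40, fp=10)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```

Because accuracy is a count-weighted average of sensitivity and specificity, it is pulled toward the majority class on imbalanced data, which is why the tables report all three.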
Comparison of the classification results (%) on heart dataset.
| Kernel | Method | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|
| Linear | FSVM | 87.50 ± 0.080 | 80.77 ± 0.069 | 82.35 ± 0.069 |
| Linear | SVDD | 87.03 ± 0.021 | 77.69 ± 0.005 | 80.00 ± 0.051 |
| Linear | FSVM-CIL | 85.00 ± 0.046 | 82.04 ± 0.110 | 82.35 ± 0.072 |
| Linear | WCS-FSVM | 87.30 ± 0.071 | 81.54 ± 0.089 | 82.94 ± 0.088 |
| Linear | FSVM-CIPlin | 85.00 ± 0.063 | 82.31 ± 0.083 | 82.84 ± 0.054 |
| Linear | FSVM-CIPexp | 87.50 ± 0.025 | 82.31 ± 0.083 | 83.53 ± 0.055 |
| Gaussian kernel | FSVM | 86.70 ± 0.099 | 82.61 ± 0.087 | 83.35 ± 0.042 |
| Gaussian kernel | SVDD | 90.35 ± 0.022 | 80.77 ± 0.034 | 82.80 ± 0.070 |
| Gaussian kernel | FSVM-CIL | 87.05 ± 0.034 | 81.54 ± 0.067 | 82.94 ± 0.044 |
| Gaussian kernel | WCS-FSVM | 91.00 ± 0.076 | 81.73 ± 0.083 | 84.12 ± 0.085 |
| Gaussian kernel | FSVM-CIPlin | 90.00 ± 0.045 | 82.31 ± 0.086 | 84.12 ± 0.052 |
| Gaussian kernel | FSVM-CIPexp | 86.05 ± 0.023 | 83.08 ± 0.078 | 84.71 ± 0.066 |
Comparison of the classification results (%) on hepatitis dataset.
| Kernel | Method | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|
| Linear | FSVM | 82.60 ± 0.053 | 22.73 ± 0.087 | 53.33 ± 0.073 |
| Linear | SVDD | 73.91 ± 0.071 | 45.45 ± 0.011 | 60.00 ± 0.046 |
| Linear | FSVM-CIL | 77.66 ± 0.026 | 45.46 ± 0.082 | 61.02 ± 0.070 |
| Linear | WCS-FSVM | 79.56 ± 0.107 | 27.27 ± 0.062 | 53.33 ± 0.059 |
| Linear | FSVM-CIPlin | 78.26 ± 0.046 | 45.46 ± 0.032 | 62.22 ± 0.023 |
| Linear | FSVM-CIPexp | 78.26 ± 0.068 | 50.00 ± 0.086 | 64.44 ± 0.071 |
| Gaussian kernel | FSVM | 73.91 ± 0.038 | 31.82 ± 0.012 | 53.33 ± 0.025 |
| Gaussian kernel | SVDD | 82.60 ± 0.053 | 42.86 ± 0.025 | 63.64 ± 0.030 |
| Gaussian kernel | FSVM-CIL | 77.26 ± 0.041 | 50.00 ± 0.086 | 63.84 ± 0.064 |
| Gaussian kernel | WCS-FSVM | 78.26 ± 0.015 | 36.36 ± 0.074 | 57.78 ± 0.056 |
| Gaussian kernel | FSVM-CIPlin | 73.51 ± 0.064 | 54.55 ± 0.037 | 64.44 ± 0.058 |
| Gaussian kernel | FSVM-CIPexp | 73.91 ± 0.050 | 59.10 ± 0.011 | 66.67 ± 0.036 |
Comparison of the classification results (%) on BUPA liver dataset.
| Kernel | Method | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|
| Linear | FSVM | 88.10 ± 0.008 | 66.42 ± 0.073 | 72.19 ± 0.057 |
| Linear | SVDD | 87.27 ± 0.021 | 68.05 ± 0.063 | 72.72 ± 0.042 |
| Linear | FSVM-CIL | 88.00 ± 0.004 | 67.44 ± 0.042 | 73.19 ± 0.015 |
| Linear | WCS-FSVM | 84.00 ± 0.360 | 67.15 ± 0.068 | 71.66 ± 0.051 |
| Linear | FSVM-CIPlin | 88.00 ± 0.004 | 67.88 ± 0.063 | 73.26 ± 0.031 |
| Linear | FSVM-CIPexp | 86.00 ± 0.048 | 69.34 ± 0.072 | 73.80 ± 0.054 |
| Gaussian kernel | FSVM | 96.00 ± 0.057 | 66.67 ± 0.026 | 74.60 ± 0.038 |
| Gaussian kernel | SVDD | 95.43 ± 0.033 | 71.24 ± 0.050 | 77.23 ± 0.017 |
| Gaussian kernel | FSVM-CIL | 95.00 ± 0.045 | 72.59 ± 0.052 | 78.37 ± 0.050 |
| Gaussian kernel | WCS-FSVM | 90.08 ± 0.070 | 67.44 ± 0.083 | 73.73 ± 0.062 |
| Gaussian kernel | FSVM-CIPlin | 94.00 ± 0.049 | 74.10 ± 0.045 | 79.46 ± 0.048 |
| Gaussian kernel | FSVM-CIPexp | 94.00 ± 0.049 | 73.33 ± 0.084 | 79.92 ± 0.074 |
Comparison of the classification results (%) on pima diabetes dataset.
| Kernel | Method | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|
| Linear | FSVM | 91.91 ± 0.022 | 49.98 ± 0.053 | 55.36 ± 0.051 |
| Linear | SVDD | 88.65 ± 0.081 | 53.43 ± 0.062 | 58.45 ± 0.029 |
| Linear | FSVM-CIL | 86.36 ± 0.064 | 55.10 ± 0.059 | 59.86 ± 0.060 |
| Linear | WCS-FSVM | 87.50 ± 0.043 | 52.65 ± 0.024 | 57.96 ± 0.030 |
| Linear | FSVM-CIPlin | 85.23 ± 0.021 | 57.76 ± 0.064 | 61.94 ± 0.043 |
| Linear | FSVM-CIPexp | 84.09 ± 0.009 | 57.96 ± 0.062 | 61.94 ± 0.053 |
| Gaussian kernel | FSVM | 93.18 ± 0.031 | 51.02 ± 0.073 | 57.44 ± 0.053 |
| Gaussian kernel | SVDD | 91.76 ± 0.025 | 56.86 ± 0.052 | 62.57 ± 0.028 |
| Gaussian kernel | FSVM-CIL | 90.91 ± 0.047 | 58.78 ± 0.084 | 63.67 ± 0.077 |
| Gaussian kernel | WCS-FSVM | 92.05 ± 0.010 | 54.69 ± 0.066 | 60.38 ± 0.053 |
| Gaussian kernel | FSVM-CIPlin | 88.84 ± 0.040 | 61.38 ± 0.063 | 65.57 ± 0.063 |
| Gaussian kernel | FSVM-CIPexp | 88.64 ± 0.029 | 61.43 ± 0.074 | 65.57 ± 0.070 |
Figure 2: The effect of the parameter η on kernel FSVM-CIPexp.
Figure 3: The effect of the parameter k on kernel FSVM-CIPexp.