Mustafa Serter Uzer, Nihat Yilmaz, Onur Inan.
Abstract
This paper presents a hybrid approach that uses the artificial bee colony (ABC) algorithm for feature selection and support vector machines (SVM) for classification. Its purpose is to test how eliminating the unimportant and obsolete features of the datasets affects classification success with the SVM classifier. The approach is applied to the diagnosis of liver diseases and diabetes, which are commonly observed and reduce the quality of life. For the diagnosis of these diseases, the hepatitis, liver disorders, and diabetes datasets from the UCI database were used, and the proposed system reached classification accuracies of 94.92%, 74.81%, and 79.29%, respectively. These accuracies were obtained using 10-fold cross-validation. The results show that the method is highly successful compared with other reported results and seems very promising for pattern recognition applications.
Year: 2013 PMID: 23983632 PMCID: PMC3745978 DOI: 10.1155/2013/419187
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Pseudocode 1. Pseudocode for SFS [22].
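The pseudocode body is not reproduced in this record. As a rough illustration only, the following is a minimal sketch of generic sequential forward selection (SFS), which may differ from the variant given in [22]. The function name, the `score` callback, and the toy scoring function are assumptions introduced for this example.

```python
from typing import Callable, List, Sequence, Tuple


def sequential_forward_selection(
    n_features: int,
    score: Callable[[Sequence[int]], float],
    k: int,
) -> Tuple[List[int], float]:
    """Greedily grow a feature subset until it holds k features."""
    selected: List[int] = []
    remaining = list(range(n_features))
    best_score = float("-inf")
    while remaining and len(selected) < k:
        # Try adding each unused feature and keep the one that scores best.
        candidate_scores = [(score(selected + [f]), f) for f in remaining]
        best_score, best_feature = max(candidate_scores)
        selected.append(best_feature)
        remaining.remove(best_feature)
    return selected, best_score


# Hypothetical scoring function: features 0 and 2 help, the others hurt slightly.
# In practice the score would be something like cross-validated accuracy.
def toy_score(subset: Sequence[int]) -> float:
    weights = {0: 0.3, 2: 0.5}
    return sum(weights.get(f, -0.1) for f in subset)


subset, value = sequential_forward_selection(n_features=4, score=toy_score, k=2)
print(subset, round(value, 2))
```

SFS never removes a feature once it has been added, so it can miss subsets whose members are only useful together.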
Pseudocode 2. Pseudocode of the ABC algorithm [24].
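The pseudocode body is not reproduced in this record. For orientation, here is a minimal sketch of a textbook-style ABC loop (employed bees, onlooker bees, scouts) minimizing a continuous function. It is not the authors' implementation: the function name `abc_minimize`, all parameter defaults, and the sphere test function are assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Tuple


def abc_minimize(
    f: Callable[[List[float]], float],
    dim: int,
    lower: float,
    upper: float,
    n_sources: int = 10,   # number of food sources (needs at least 2)
    limit: int = 20,       # stagnation count before a source is abandoned
    n_cycles: int = 100,
    seed: int = 0,
) -> Tuple[List[float], float, List[float]]:
    rng = random.Random(seed)

    def random_source() -> List[float]:
        return [rng.uniform(lower, upper) for _ in range(dim)]

    def quality(cost: float) -> float:
        # Common ABC fitness transform: larger means a better source.
        return 1.0 / (1.0 + cost) if cost >= 0 else 1.0 + abs(cost)

    sources = [random_source() for _ in range(n_sources)]
    costs = [f(s) for s in sources]
    trials = [0] * n_sources
    best_cost = min(costs)
    best = list(sources[costs.index(best_cost)])
    history = [best_cost]

    def try_neighbour(i: int) -> None:
        # Perturb one coordinate relative to a different, randomly chosen source.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        candidate = list(sources[i])
        phi = rng.uniform(-1.0, 1.0)
        candidate[d] += phi * (sources[i][d] - sources[k][d])
        candidate[d] = min(max(candidate[d], lower), upper)
        cost = f(candidate)
        if cost < costs[i]:  # greedy selection keeps the better of the two
            sources[i], costs[i], trials[i] = candidate, cost, 0
        else:
            trials[i] += 1

    for _ in range(n_cycles):
        # Employed bee phase: one neighbour search per source.
        for i in range(n_sources):
            try_neighbour(i)
        # Onlooker bee phase: better sources are picked more often.
        weights = [quality(c) for c in costs]
        for _ in range(n_sources):
            i = rng.choices(range(n_sources), weights=weights)[0]
            try_neighbour(i)
        # Scout phase: abandon the most stagnant source if it exceeded the limit.
        stale = max(range(n_sources), key=lambda j: trials[j])
        if trials[stale] > limit:
            sources[stale] = random_source()
            costs[stale] = f(sources[stale])
            trials[stale] = 0
        # Memorize the best solution found so far.
        cycle_best = min(costs)
        if cycle_best < best_cost:
            best_cost = cycle_best
            best = list(sources[costs.index(cycle_best)])
        history.append(best_cost)
    return best, best_cost, history


def sphere(x: List[float]) -> float:
    return sum(v * v for v in x)


best, best_cost, history = abc_minimize(sphere, dim=2, lower=-5.0, upper=5.0)
print(best, best_cost)
```

A feature-selection variant would replace the real-valued food sources with candidate feature subsets and use classifier accuracy as the quality measure; that adaptation is not shown here.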
Figure 1. Linear classifier defined by the hyperplane H (w · x + b = 0).
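To make the caption concrete, the sketch below classifies a point by the side of the hyperplane w · x + b = 0 on which it falls. The weight vector, bias, and sample points are hypothetical values, not taken from the paper, and assigning points exactly on the hyperplane to the positive class is an arbitrary choice made here.

```python
from typing import Sequence


def decision_value(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """Return w . x + b, the signed (unnormalized) distance to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w: Sequence[float], x: Sequence[float], b: float) -> int:
    """Return +1 on or above the hyperplane, -1 below it."""
    return 1 if decision_value(w, x, b) >= 0 else -1


# Hypothetical hyperplane x1 - x2 = 0 in two dimensions.
w, b = [1.0, -1.0], 0.0
print(classify(w, [2.0, 1.0], b))  # 2 - 1 = 1, positive side
print(classify(w, [1.0, 3.0], b))  # 1 - 3 = -2, negative side
```

Training an SVM amounts to choosing w and b so that this hyperplane separates the classes with the largest margin.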
The four classification performance indices included in the confusion matrix.
| Actual class | Predicted positive | Predicted negative |
|---|---|---|
| Positive | True positive (TP) | False negative (FN) |
| Negative | False positive (FP) | True negative (TN) |
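The performance criteria reported for the classifier (accuracy, sensitivity, specificity, positive and negative predictive value) follow from these four counts. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not results from the paper.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Compute percentage metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),  # true positive rate
        "specificity": 100.0 * tn / (tn + fp),  # true negative rate
        "ppv": 100.0 * tp / (tp + fp),          # positive predictive value
        "npv": 100.0 * tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

The function does not guard against empty rows or columns; a zero denominator (for example, no predicted positives) raises `ZeroDivisionError`.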
Figure 2. Block diagram of the proposed system.
Attribute names and value ranges for the hepatitis dataset [35].
| Attribute number | Attribute name | Attribute range |
|---|---|---|
| 1 | Age | 7–78 |
| 2 | Sex | Male, Female |
| 3 | Steroid | No, Yes |
| 4 | Antivirals | No, Yes |
| 5 | Fatigue | No, Yes |
| 6 | Malaise | No, Yes |
| 7 | Anorexia | No, Yes |
| 8 | Liver big | No, Yes |
| 9 | Liver firm | No, Yes |
| 10 | Spleen palpable | No, Yes |
| 11 | Spiders | No, Yes |
| 12 | Ascites | No, Yes |
| 13 | Varices | No, Yes |
| 14 | Bilirubin | 0.3–8 |
| 15 | Alk phosphate | 26–295 |
| 16 | SGOT | 14–648 |
| 17 | Albumin | 2.1–6.4 |
| 18 | Protime | 0–100 |
| 19 | Histology | No, Yes |
Attribute names and value ranges for the liver disorders dataset [35].
| Attribute number | Attribute name | Attribute description | Attribute range |
|---|---|---|---|
| 1 | MCV | Mean corpuscular volume | 65–103 |
| 2 | Alkphos | Alkaline phosphatase | 23–138 |
| 3 | SGPT | Alanine aminotransferase | 4–155 |
| 4 | SGOT | Aspartate aminotransferase | 5–82 |
| 5 | gammaGT | Gamma-glutamyl transpeptidase | 5–297 |
| 6 | Drinks | Number of half-pint equivalents of alcoholic beverages drunk per day | 0–20 |
Features and parameters of the diabetes dataset.
| Features | Mean | Standard deviation | Min | Max |
|---|---|---|---|---|
| Number of times pregnant | 3.8 | 3.4 | 0 | 17 |
| Plasma glucose concentration, 2 h in an oral glucose tolerance test | 120.9 | 32.0 | 0 | 199 |
| Diastolic blood pressure (mm Hg) | 69.1 | 19.4 | 0 | 122 |
| Triceps skinfold thickness (mm) | 20.5 | 16.0 | 0 | 99 |
| 2-hour serum insulin (μU/mL) | 79.8 | 115.2 | 0 | 846 |
| Body mass index (kg/m²) | 32.0 | 7.9 | 0 | 67.1 |
| Diabetes pedigree function | 0.5 | 0.3 | 0.078 | 2.42 |
| Age (years) | 33.2 | 11.8 | 21 | 81 |
Pseudocode 3. Pseudocode of the developed feature selection algorithm based on ABC.
List of datasets.
| Databases | Number of classes | Samples | Number of features | Number of selected features | Selected features |
|---|---|---|---|---|---|
| Hepatitis | 2 | 155 | 19 | 11 | 12, 14, 13, 15, 18, 1, 17, 5, 16, 2, 4 |
| Liver disorders | 2 | 345 | 6 | 5 | 5, 3, 2, 4, 1 |
| Diabetes | 2 | 768 | 8 | 6 | 2, 8, 6, 7, 4, 5 |
List of classification parameters.
| Parameters | Value |
|---|---|
| Method | SVM |
| Optimization algorithm | SMO |
| Validation method | 10-fold cross-validation |
| Kernel_Function | Linear |
| TolKKT | 1.0000 |
| MaxIter | 15000 |
| KernelCacheLimit | 5000 |
| The initial value | Random |
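The abstract states that accuracies were obtained with 10-fold cross-validation. The sketch below shows one simple way to produce such splits: shuffle the sample indices, deal them into k folds, and use each fold once as the test set. It is an illustration under assumptions, not the authors' procedure: the function name and seed are hypothetical, and the split is not stratified by class.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 155 is the number of samples in the hepatitis dataset.
splits = list(k_fold_indices(n_samples=155, k=10))
for train, test in splits:
    print(len(train), len(test))
```

The reported accuracy would then be the mean of the per-fold test accuracies, with the classifier retrained on each training split.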
Performance of classification for the hepatitis, liver disorders, and diabetes datasets.
| Performance criteria | Hepatitis dataset | Liver disorders dataset | Diabetes dataset |
|---|---|---|---|
| Classification accuracy (%) | 94.92 | 74.81 | 79.29 |
| Sensitivity (%) | 97.13 | 88.22 | 89.84 |
| Specificity (%) | 88.33 | 56.68 | 59.61 |
| Positive predictive value (%) | 96.91 | 73.99 | 80.63 |
| Negative predictive value (%) | 88.33 | 78.57 | 75.65 |
Classification accuracies obtained by our method and other classifiers for the hepatitis dataset.
| Author (year) | Method | Classification accuracy (%) |
|---|---|---|
| Polat and Güneş (2006) | FS-AIRS with fuzzy res. (10-fold CV) | 92.59 |
| Polat and Güneş (2007) | FS-Fuzzy-AIRS (10-fold CV) | 94.12 |
| Polat and Güneş (2007) | AIRS (10-fold CV) | 76.00 |
| | PCA-AIRS (10-fold CV) | 94.12 |
| Kahramanli and Allahverdi (2009) | Hybrid system (ANN and AIS) (without | 96.8 |
| Dogantekin et al. (2009) | LDA-ANFIS | 94.16 |
| Bascil and Temurtas (2011) | MLNN (MLP) + LM (10-fold CV) | 91.87 |
| Our study | ABCFS + SVM (10-fold CV) | 94.92 |
Classification accuracies obtained by our method and other classifiers for the liver disorders dataset.
| Author (year) | Method | Classification accuracy (%) |
|---|---|---|
| Lee and Mangasarian (2001) | SSVM (10-fold CV) | 70.33 |
| van Gestel et al. (2002) | SVM with GP (10-fold CV) | 69.7 |
| Gonçalves et al. (2006) | HNFB-1 method | 73.33 |
| Özşen and Güneş (2008) | AWAIS (10-fold CV) | 70.17 |
| | AIS with hybrid similarity measure (10-fold CV) | 60.57 |
| | AIS with Manhattan distance (10-fold CV) | 60.21 |
| | AIS with Euclidean distance (10-fold CV) | 60.00 |
| Li et al. (2011) | A fuzzy-based nonlinear transformation method + SVM | 70.85 |
| Chen et al. (2012) | (PSO) + 1-NN method (5-fold CV) | 68.99 |
| Chang et al. (2012) | CBR + PSO (train: 75%-test: 25%) | 76.81 |
| Our study | ABCFS + SVM (train: 75%-test: 25%) | |
| | ABCFS + SVM (10-fold CV) | 74.81 |
Classification accuracies obtained by our method and other classifiers for the diabetes dataset.
| Author (year) | Method | Classification accuracy (%) |
|---|---|---|
| Şahan et al. (2005) | AWAIS (10-fold CV) | 75.87 |
| Polat and Güneş (2007) | Combining PCA and ANFIS | 89.47 |
| Polat et al. (2008) | LS-SVM (10-fold CV) | 78.21 |
| | GDA-LS-SVM (10-fold CV) | 82.05 |
| Kahramanli and Allahverdi (2008) | Hybrid system (ANN and FNN) | 84.2 |
| Patil et al. (2010) | Hybrid prediction model (HPM) with reduced dataset | 92.38 |
| Isa and Mamat (2011) | Clustered-HMLP | 80.59 |
| Aibinu et al. (2011) | AR1 + NN (3-fold CV) | 81.28 |
| Our study | ABCFS + SVM (train: 75%-test: 25%) | |
| | ABCFS + SVM (10-fold CV) | 79.29 |