Juanying Xie, Jinhu Lei, Weixin Xie, Yong Shi, Xiaohui Liu.
Abstract
This paper proposes two-stage hybrid feature selection algorithms for building stable and efficient diagnostic models, and introduces a new accuracy measure to assess those models. The two-stage hybrid algorithms adopt Support Vector Machines (SVM) as the classification tool; the extended Sequential Forward Search (SFS), Sequential Forward Floating Search (SFFS), and Sequential Backward Floating Search (SBFS), respectively, as search strategies; and the generalized F-score (GF) to evaluate the importance of each feature. The new accuracy measure serves as the criterion for evaluating the performance of a temporary SVM, which in turn directs the feature selection algorithms. These hybrid methods combine the advantages of filters and wrappers to select an optimal feature subset from the original feature set and so build stable and efficient classifiers. To obtain stable, statistically meaningful, and optimal classifiers, we conduct 10-fold cross-validation experiments in the first stage; for each algorithm, we then merge the 10 feature subsets selected across the 10 folds into a new full feature set for feature selection in the second stage. In the second stage, we rerun each hybrid feature selection algorithm on the fold that achieved the best result in the first stage. Experimental results show that the proposed two-stage hybrid feature selection algorithms construct efficient diagnostic models that achieve better accuracy than those built by the corresponding hybrid feature selection algorithms without the second-stage feature selection procedure. Furthermore, our methods achieve better classification accuracy than the available algorithms for diagnosing erythemato-squamous diseases.
Year: 2013 PMID: 26042184 PMCID: PMC4453584 DOI: 10.1186/2047-2501-1-10
Source DB: PubMed Journal: Health Inf Sci Syst ISSN: 2047-2501
Figure 1. New hybrid feature selection algorithms.
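The filter-plus-wrapper idea described in the abstract can be illustrated with a minimal sketch. Several parts are assumptions rather than the authors' definitions: the scoring formula (between-class scatter of class means over summed within-class variance) is one common multi-class form of the F-score, the accept-only-if-accuracy-improves rule is a simplified forward search, and a toy nearest-centroid classifier stands in for the temporary SVM. All function names and the data are hypothetical.

```python
from statistics import mean

def generalized_f_score(column, labels):
    """Assumed multi-class F-score: scatter of class means / summed class variances."""
    overall = mean(column)
    between = within = 0.0
    for cls in set(labels):
        values = [x for x, y in zip(column, labels) if y == cls]
        cls_mean = mean(values)
        between += (cls_mean - overall) ** 2
        if len(values) > 1:
            within += sum((x - cls_mean) ** 2 for x in values) / (len(values) - 1)
    return between / within if within > 0 else float("inf")

def rank_features(X, y):
    """Filter step: feature indices sorted by descending score."""
    n_features = len(X[0])
    scores = [generalized_f_score([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

def forward_search(ranked, evaluate):
    """Wrapper step: keep a candidate feature only if it raises the score."""
    selected, best = [], float("-inf")
    for j in ranked:
        score = evaluate(selected + [j])
        if score > best:
            selected, best = selected + [j], score
    return selected, best

def centroid_accuracy(X, y, subset):
    """Toy stand-in for the temporary SVM: nearest-centroid training accuracy."""
    classes = sorted(set(y))
    centroids = {
        c: [mean(row[j] for row, lab in zip(X, y) if lab == c) for j in subset]
        for c in classes
    }

    def predict(row):
        def distance(c):
            return sum((row[j] - m) ** 2 for j, m in zip(subset, centroids[c]))
        return min(classes, key=distance)

    return sum(predict(row) == lab for row, lab in zip(X, y)) / len(y)

# Made-up data: feature 0 separates the two classes, feature 1 is noise.
X = [[0, 1], [0, 2], [1, 1], [3, 2], [3, 1], [2, 2]]
y = [0, 0, 0, 1, 1, 1]

ranked = rank_features(X, y)
selected, accuracy = forward_search(ranked, lambda s: centroid_accuracy(X, y, s))
print(ranked, selected, accuracy)
```

Passing the evaluator as a callable keeps the search independent of the classifier, so an SVM with cross-validated accuracy (or the paper's new accuracy measure) could be substituted without changing the search loop.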
Erythemato-squamous diseases dataset from UCI
| Diseases (patient number) | Clinical feature | Histopathological feature |
|---|---|---|
| Psoriasis (111) | Feature 1: Erythema | Feature 12: Melanin incontinence |
| Seborrheic dermatitis (60) | Feature 2: Scaling | Feature 13: Eosinophils in the infiltrate |
| Lichen planus (71) | Feature 3: Definite borders | Feature 14: PNL infiltrate |
| Pityriasis rosea (48) | Feature 4: Itching | Feature 15: Fibrosis of the papillary dermis |
| Chronic dermatitis (48) | Feature 5: Koebner phenomenon | Feature 16: Exocytosis |
| Pityriasis rubra pilaris (20) | Feature 6: Polygonal papules | Feature 17: Acanthosis |
| | Feature 7: Follicular papules | Feature 18: Hyperkeratosis |
| | Feature 8: Oral mucosal involvement | Feature 19: Parakeratosis |
| | Feature 9: Knee and elbow involvement | Feature 20: Clubbing of the rete ridges |
| | Feature 10: Scalp involvement | Feature 21: Elongation of the rete ridges |
| | Feature 11: Family history | Feature 22: Thinning of the suprapapillary epidermis |
| | Feature 34: Age | Feature 23: Spongiform pustule |
| | | Feature 24: Munro microabscess |
| | | Feature 25: Focal hypergranulosis |
| | | Feature 26: Disappearance of the granular layer |
| | | Feature 27: Vacuolization and damage of basal layer |
| | | Feature 28: Spongiosis |
| | | Feature 29: Saw-tooth appearance of retes |
| | | Feature 30: Follicular horn plug |
| | | Feature 31: Perifollicular parakeratosis |
| | | Feature 32: Inflammatory mononuclear infiltrate |
| | | Feature 33: Band-like infiltrate |
Experimental results of GFSFS with ordinary accuracy
| Fold | Selected feature subset | Size of selected feature subset | Accuracy (%) |
|---|---|---|---|
| 1 | 33, 27, 29, 31, 6, 12, 20, 15, 25, 22, 8, 7, 21, 30, 9, 10, 16, 24, 28, 14, 5, 26 | 22 | 100.0000 |
| 2 | 33, 29, 27, 31, 6, 12, 15, 20, 25, 22, 7, 8, 21, 30, 9, 24, 10, 28, 16, 14, 5, 26 | 22 | 97.2222 |
| 3 | 33, 27, 31, 29, 6, 12, 22, 25, 15, 20, 8, 7, 30, 21, 9, 28, 10, 16, 24, 14, 5, 34, 26 | 23 | 100.0000 |
| 4 | 33, 27, 31, 29, 6, 12, 20, 15, 22, 7, 25, 8, 21, 30, 9, 28, 16, 24, 10, 14, 5, 34, 26 | 23 | 94.4444 |
| 5 | 33, 27, 6, 31, 29, 12, 15, 22, 20, 25, 7, 8, 21, 30, 9, 28, 24, 16, 10, 14, 5 | 21 | 100.0000 |
| 6 | 27, 33, 31, 6, 29, 12, 25, 22, 15, 20, 7, 8, 21, 30, 9, 24, 16, 28, 10, 14, 5, 34, 26 | 23 | 100.0000 |
| 7 | 33, 27, 31, 29, 6, 12, 20, 15, 22, 7, 25, 8, 21, 30, 9, 24, 28, 16, 10, 14, 5 | 21 | 100.0000 |
| 8 | 33, 27, 29, 12, 31, 6, 15, 22, 20, 7, 25, 8, 21, 30, 9, 28, 16, 24, 10, 14, 5, 34, 26 | 23 | 97.2222 |
| 9 | 33, 27, 29, 31, 6, 12, 22, 20, 25, 15, 7, 8, 21, 30, 9, 28, 16, 24, 10, 14, 5, 34, 26 | 23 | 100.0000 |
| 10 | 33, 27, 31, 29, 6, 12, 20, 22, 15, 25, 8, 7, 21, 30, 9, 28, 10, 24, 16, 14, 5 | 21 | 100.0000 |
| Average & common | 33, 27, 29, 31, 6, 12, 20, 15, 25, 22, 8, 7, 21, 30, 9, 10, 16, 24, 28, 14, 5 | 22.20 | 98.89 |
Experimental results of GFSFS with new accuracy
| Fold | Selected feature subset | Size of selected feature subset | Accuracy (%) |
|---|---|---|---|
| 1 | 33, 27, 29, 31, 6, 12, 20, 15, 25, 22, 8, 7, 21, 30, 9, 10, 16, 24, 28, 14, 5, 26 | 22 | 100.0000 |
| 2 | 33, 29, 27, 31, 6, 12, 15, 20, 25, 22, 7, 8, 21, 30, 9, 24, 10, 28, 16, 14, 5, 26 | 22 | 97.2222 |
| 3 | 33, 27, 31, 29, 6, 12, 22, 25, 15, 20, 8, 7, 30, 21, 9, 28, 10, 16, 24, 14, 5, 34, 26 | 23 | 100.0000 |
| 4 | 33, 27, 31, 29, 6, 12, 20, 15, 22, 7, 25, 8, 21, 30, 9, 28, 16, 24, 10, 14, 5, 34, 26 | 23 | 94.4444 |
| 5 | 33, 27, 6, 31, 29, 12, 15, 22, 20, 25, 7, 8, 21, 30, 9, 28, 24, 16, 10, 14, 5 | 21 | 100.0000 |
| 6 | 27, 33, 31, 6, 29, 12, 25, 22, 15, 20, 7, 8, 21, 30, 9, 24, 16, 28, 10, 14, 5, 34, 26 | 23 | 100.0000 |
| 7 | 33, 27, 31, 29, 6, 12, 20, 15, 22, 7, 25, 8, 21, 30, 9, 24, 28, 16, 10, 14, 5 | 21 | 100.0000 |
| 8 | 33, 27, 29, 12, 31, 6, 15, 22, 20, 7, 25, 8, 21, 30, 9, 28, 16, 24, 10, 14, 5 | 21 | 100.0000 |
| 9 | 33, 27, 29, 31, 6, 12, 22, 20, 25, 15, 7, 8, 21, 30, 9, 28, 16, 24, 10, 14, 5, 34, 26 | 23 | 100.0000 |
| 10 | 33, 27, 31, 29, 6, 12, 20, 22, 15, 25, 8, 7, 21, 30, 9, 28, 10, 24, 16, 14, 5 | 21 | 100.0000 |
| Average & common | 33, 27, 29, 31, 6, 12, 20, 15, 25, 22, 8, 7, 21, 30, 9, 10, 16, 24, 28, 14, 5 | 22 | 99.17 |
Experimental results of GFSFFS with ordinary accuracy
| Fold | Selected feature subset | Size of selected feature subset | Accuracy (%) |
|---|---|---|---|
| 1 | 7, 31, 9, 5, 34, 4, 14, 28, 15, 17, 26, 25 | 12 | 100 |
| 2 | 5, 7, 14, 9, 31, 28, 15, 21, 16, 1, 17, 33, 18, 13 | 14 | 91.6667 |
| 3 | 26, 7, 31, 28, 9, 34, 15, 21, 14, 5, 2, 4, 17 | 13 | 100 |
| 4 | 7, 31, 30, 9, 5, 34, 28, 15, 21, 16, 4, 14, 1, 25, 33 | 15 | 88.8889 |
| 5 | 7, 31, 28, 15, 21, 5, 4, 14, 9, 34, 33, 29, 26, 18, 17 | 15 | 97.2222 |
| 6 | 7, 31, 9, 34, 28, 15, 21, 14, 16, 5, 33, 27, 26 | 13 | 97.2222 |
| 7 | 7, 31, 28, 9, 15, 21, 16, 14, 5, 4, 2, 33, 25 | 13 | 94.4444 |
| 8 | 7, 31, 33, 5, 28, 21, 15, 26, 29 | 9 | 97.2222 |
| 9 | 7, 31, 9, 34, 28, 21, 15, 5, 16, 14, 4, 2, 26, 17 | 14 | 94.1176 |
| 10 | 7, 31, 9, 28, 34, 15, 21, 5, 16, 4, 1, 18, 33, 32, 13 | 15 | 100 |
| Average & common | 7, 31, 5, 28, 15 | 13.3 | 96.08 |
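The "Average & common" row can be reproduced from the ten fold rows of this table. The short check below copies the fold data from the table and reads "common" as the set intersection of the ten subsets; that reading is an assumption, although it does reproduce the reported row. The variable names are illustrative.

```python
# Fold -> (selected feature subset, accuracy in %), copied from the table above.
folds = {
    1: ([7, 31, 9, 5, 34, 4, 14, 28, 15, 17, 26, 25], 100.0),
    2: ([5, 7, 14, 9, 31, 28, 15, 21, 16, 1, 17, 33, 18, 13], 91.6667),
    3: ([26, 7, 31, 28, 9, 34, 15, 21, 14, 5, 2, 4, 17], 100.0),
    4: ([7, 31, 30, 9, 5, 34, 28, 15, 21, 16, 4, 14, 1, 25, 33], 88.8889),
    5: ([7, 31, 28, 15, 21, 5, 4, 14, 9, 34, 33, 29, 26, 18, 17], 97.2222),
    6: ([7, 31, 9, 34, 28, 15, 21, 14, 16, 5, 33, 27, 26], 97.2222),
    7: ([7, 31, 28, 9, 15, 21, 16, 14, 5, 4, 2, 33, 25], 94.4444),
    8: ([7, 31, 33, 5, 28, 21, 15, 26, 29], 97.2222),
    9: ([7, 31, 9, 34, 28, 21, 15, 5, 16, 14, 4, 2, 26, 17], 94.1176),
    10: ([7, 31, 9, 28, 34, 15, 21, 5, 16, 4, 1, 18, 33, 32, 13], 100.0),
}

subsets = [set(features) for features, _ in folds.values()]
accuracies = [accuracy for _, accuracy in folds.values()]

average_size = sum(len(s) for s in subsets) / len(subsets)
average_accuracy = sum(accuracies) / len(accuracies)
common = set.intersection(*subsets)  # features selected in every fold

print(round(average_size, 1), round(average_accuracy, 2), sorted(common))
# prints: 13.3 96.08 [5, 7, 15, 28, 31]
```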
Experimental results of GFSFFS with new accuracy
| Fold | Selected feature subset | Size of selected feature subset | Accuracy (%) |
|---|---|---|---|
| 1 | 33, 29, 31, 6, 20, 15, 7, 21, 10, 16, 28, 14, 5, 26, 18 | 15 | 100.0000 |
| 2 | 33, 29, 31, 15, 20, 22, 7, 28, 16, 14, 5, 26, 18 | 13 | 94.4444 |
| 3 | 33, 31, 6, 22, 25, 15, 20, 7, 28, 10, 16, 14, 5, 26 | 14 | 100.0000 |
| 4 | 33, 31, 6, 20, 15, 7, 25, 21, 9, 28, 24, 14, 5, 26, 19 | 15 | 94.4444 |
| 5 | 33, 6, 31, 15, 22, 20, 25, 28, 16, 10, 14, 5, 4, 26 | 14 | 100.0000 |
| 6 | 27, 31, 22, 15, 20, 7, 16, 10, 14, 5, 26 | 11 | 97.2222 |
| 7 | 33, 31, 6, 20, 15, 22, 25, 28, 16, 10, 14, 5, 26, 18 | 14 | 100.0000 |
| 8 | 33, 29, 31, 6, 15, 22, 20, 16, 14, 5, 26 | 11 | 97.2222 |
| 9 | 33, 29, 31, 6, 22, 15, 9, 28, 16, 10, 14, 5, 26 | 13 | 100.0000 |
| 10 | 33, 31, 20, 15, 7, 21, 9, 28, 10, 14, 5, 26 | 12 | 100.0000 |
| Average & common | 31, 15, 14, 5, 26 | 13.2 | 98.33 |
Experimental results of GFSBFS with ordinary accuracy
| Fold | Selected feature subset | Size of selected feature subset | Accuracy (%) |
|---|---|---|---|
| 1 | 17, 13, 19, 2, 3, 4, 34, 26, 5, 14, 28, 16, 15, 33 | 14 | 94.7368 |
| 2 | 32, 18, 13, 17, 1, 26, 5, 14, 28, 15, 31, 29 | 12 | 94.4444 |
| 3 | 32, 18, 17, 13, 1, 19, 3, 23, 4, 26, 34, 5, 14, 16, 28, 9, 15, 33 | 18 | 100 |
| 4 | 32, 13, 17, 1, 19, 2, 3, 23, 4, 26, 34, 5, 14, 9, 7, 22, 15, 31, 33 | 19 | 97.2222 |
| 5 | 1, 13, 17, 19, 2, 23, 26, 14, 24, 28, 9, 7, 25, 22, 15, 33 | 16 | 94.4444 |
| 6 | 32, 18, 1, 13, 17, 19, 2, 3, 4, 26, 5, 28, 9, 15, 31, 27 | 16 | 91.6667 |
| 7 | 32, 13, 18, 1, 17, 19, 2, 11, 4, 23, 3, 26, 34, 5, 14, 10, 16, 28, 24, 9, 30, 21, 8, 25, 7, 22, 15, 20, 12, 6, 29, 31, 27, 33 | 34 | 97.2222 |
| 8 | 32, 17, 13, 1, 3, 26, 5, 9, 30, 20, 22, 15, 31, 33 | 14 | 97.2222 |
| 9 | 18, 13, 17, 1, 19, 2, 23, 3, 4, 26, 5, 14, 28, 15, 31, 33 | 16 | 97.0588 |
| 10 | 32, 18, 13, 17, 1, 19, 11, 2, 3, 4, 23, 26, 34, 5, 14, 16, 24, 10, 28, 9, 30, 21, 7, 8, 25, 15, 22, 20, 12, 6, 29, 31, 27, 33 | 34 | 94.1176 |
| Average & common | 17, 13, 26, 15 | 19.3 | 95.81 |
Experimental results of GFSBFS with new accuracy
| Fold | Selected feature subset | Size of selected feature subset | Accuracy (%) |
|---|---|---|---|
| 1 | 13, 1, 19, 2, 4, 23, 34, 26, 5, 14, 28, 16, 7, 15, 33 | 15 | 97.36842105 |
| 2 | 18, 13, 17, 19, 2, 3, 4, 23, 26, 5, 16, 28, 9, 22, 31, 33 | 16 | 94.44444444 |
| 3 | 32, 18, 17, 13, 1, 19, 23, 4, 26, 34, 5, 14, 28, 9, 3, 7, 15, 33 | 18 | 94.44444444 |
| 4 | 32, 18, 13, 17, 19, 2, 3, 4, 26, 34, 5, 14, 16, 28, 7, 15, 20, 33 | 18 | 97.22222222 |
| 5 | 1, 13, 17, 2, 3, 26, 4, 5, 28, 9, 7, 15, 33 | 13 | 94.44444444 |
| 6 | 18, 1, 13, 17, 2, 3, 4, 23, 26, 34, 5, 14, 28, 16, 15, 22, 31, 27 | 18 | 91.66666667 |
| 7 | 13, 18, 1, 17, 23, 26, 34, 5, 14, 16, 7, 22, 15, 20, 27, 33 | 16 | 97.22222222 |
| 8 | 32, 18, 17, 13, 1, 19, 2, 3, 4, 26, 34, 5, 10, 28, 9, 21, 25, 7, 22, 15, 27 | 21 | 88.88888889 |
| 9 | 32, 13, 1, 11, 19, 2, 23, 3, 26, 34, 5, 14, 10, 24, 16, 28, 9, 21, 7, 15, 33 | 21 | 97.05882353 |
| 10 | 13, 1, 19, 3, 4, 26, 5, 14, 28, 9, 15, 22, 29, 31, 33 | 15 | 100 |
| Average & common | 13, 26, 5 | 17.1 | 95.28 |
The merged feature subsets for our two-stage hybrid feature selection algorithms
| For methods | Merged feature subset of the 10-fold cross-validation experiment | Size of new full feature subset |
|---|---|---|
| Two-stage GFSFS | 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 20, 21, 22, 24, 25, 26, 27, 28, 29, 30, 31, 33, 34 | 23 |
| Two-stage new GFSFS | 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 20, 21, 22, 24, 25, 26, 27, 28, 29, 30, 31, 33, 34 | 23 |
| Two-stage GFSFFS | 1, 2, 4, 5, 7, 9, 13, 14, 15, 16, 17, 18, 21, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34 | 23 |
| Two-stage new GFSFFS | 4, 5, 6, 7, 9, 10, 14, 15, 16, 18, 19, 20, 21, 22, 24, 25, 26, 27, 28, 29, 31, 33 | 22 |
| Two-stage GFSBFS | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34 | 34 |
| Two-stage new GFSBFS | 1, 2, 3, 4, 5, 7, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34 | 30 |
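The merging step that produces the second-stage feature set can be checked directly. The sketch below takes the ten fold subsets reported for GFSFFS with new accuracy and forms their union; treating the merge as a plain set union is an assumption, though it reproduces the 22 features listed for two-stage new GFSFFS. The variable names are illustrative.

```python
# Fold subsets copied from the "GFSFFS with new accuracy" table.
fold_subsets = [
    [33, 29, 31, 6, 20, 15, 7, 21, 10, 16, 28, 14, 5, 26, 18],
    [33, 29, 31, 15, 20, 22, 7, 28, 16, 14, 5, 26, 18],
    [33, 31, 6, 22, 25, 15, 20, 7, 28, 10, 16, 14, 5, 26],
    [33, 31, 6, 20, 15, 7, 25, 21, 9, 28, 24, 14, 5, 26, 19],
    [33, 6, 31, 15, 22, 20, 25, 28, 16, 10, 14, 5, 4, 26],
    [27, 31, 22, 15, 20, 7, 16, 10, 14, 5, 26],
    [33, 31, 6, 20, 15, 22, 25, 28, 16, 10, 14, 5, 26, 18],
    [33, 29, 31, 6, 15, 22, 20, 16, 14, 5, 26],
    [33, 29, 31, 6, 22, 15, 9, 28, 16, 10, 14, 5, 26],
    [33, 31, 20, 15, 7, 21, 9, 28, 10, 14, 5, 26],
]

# Merge: every feature selected in at least one fold enters the new full set.
merged = sorted(set().union(*fold_subsets))

print(len(merged), merged)
```

A union keeps any feature that proved useful in at least one fold, which is why the merged set is larger than each individual fold subset yet, for the forward-search variants, still smaller than the original 34 features.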
Experimental results of our two-stage hybrid feature selection algorithms
| Methods | Selected feature subset | Size of selected feature subset | Accuracy (%) |
|---|---|---|---|
| GFSFS | 33, 27, 31, 29, 6, 12, 20, 22, 15, 25, 8, 7, 21, 30, 9, 28, 10, 24, 16, 14, 5 | 21 | 100 |
| Two-stage GFSFS | 22, 19, 17, 21, 2, 7, 11, 9, 3, 13, 15, 4, 12, 20, 5, 18, 14, 6, 10, 8 | 20 | 100 |
| New GFSFS | 33, 27, 31, 29, 6, 12, 20, 22, 15, 25, 8, 7, 21, 30, 9, 28, 10, 24, 16, 14, 5 | 21 | 100 |
| Two-stage new GFSFS | 22, 19, 17, 21, 2, 7, 11, 9, 3, 13, 15, 4, 12, 20, 5, 18, 14, 6, 10, 8 | 20 | 100 |
| GFSFFS | 7, 31, 9, 28, 34, 15, 21, 5, 16, 4, 1, 18, 33, 32, 13 | 15 | 100 |
| Two-stage GFSFFS | 22, 18, 20, 9, 13, 6, 17, 10, 8, 4, 15 | 11 | 100 |
| New GFSFFS | 33, 31, 20, 15, 7, 21, 9, 28, 10, 14, 5, 26 | 12 | 100 |
| Two-stage new GFSFFS | 22, 20, 21, 12, 8, 4, 14, 16, 13, 5, 19, 6, 9, 7, 2, 17 | 16 | 100 |
| GFSBFS | 32, 18, 13, 17, 1, 19, 11, 2, 3, 4, 23, 26, 34, 5, 14, 16, 24, 10, 28, 9, 30, 21, 7, 8, 25, 15, 22, 20, 12, 6, 29, 31, 27, 33 | 34 | 94.1176 |
| Two-stage GFSBFS | 32, 13, 19, 2, 3, 26, 34, 5, 14, 16, 28, 9, 21, 15, 33 | 15 | 100 |
| New GFSBFS | 13, 1, 19, 3, 4, 26, 5, 14, 28, 9, 15, 22, 29, 31, 33 | 15 | 100 |
| Two-stage new GFSBFS | 28, 15, 10, 14, 1, 16, 2, 3, 20, 23, 30, 5, 11, 13, 25, 7, 6, 12, 29 | 19 | 97.05882 |
The accuracy comparison of all available classifiers
| Authors | Methods | Accuracy (%) |
|---|---|---|
| Übeyli and Güler (2005) | ANFIS | 95.50 |
| Luukka and Leppälampi (2006) | Fuzzy similarity-based classification | 97.02 |
| Polat and Günes (2006) | Fuzzy weighted pre-processing | 88.18 |
| | | 97.57 |
| | Decision tree | 99.00 |
| Nanni (2006) | LSVM | 97.22 |
| | RS | 97.22 |
| | B1_5 | 97.50 |
| | B1_10 | 98.10 |
| | B1_15 | 97.22 |
| | B2_5 | 97.50 |
| | B2_10 | 97.80 |
| | B2_15 | 98.30 |
| Luukka (2007) | Similarity measure | 97.80 |
| Übeyli (2008) | Multiclass SVM with the ECOC | 98.32 |
| Polat and Günes (2009) | C4.5 and one-against-all | 96.71 |
| Übeyli (2009) | CNN | 97.77 |
| Liu et al. (2009) | Naïve Bayes | 96.72 |
| | 1-NN | 92.18 |
| | C4.5 | 95.08 |
| | RIPPER | 92.20 |
| Karabatak and Ince (2009) | AR and NN | 98.61 |
| Übeyli and Doğdu (2010) | K-means clustering | 94.22 |
| Xie | IFSFFS | 97.58 |
| Xie | IFSFS | 98.61 |
| This study | GFSFS | 98.89 |
| | new GFSFS | 99.17 |
| | GFSFFS | 96.08 |
| | new GFSFFS | 98.33 |
| | GFSBFS | 95.81 |
| | new GFSBFS | 95.28 |
| | two-stage GFSFS | 100 |
| | two-stage new GFSFS | 100 |
| | two-stage GFSFFS | 100 |
| | two-stage new GFSFFS | 100 |
| | two-stage GFSBFS | 100 |
| | two-stage new GFSBFS | 97.06 |