C V Subbulakshmi, S N Deepa.
Abstract
Medical data classification is a key data mining problem that has been studied for a decade and has attracted many researchers around the world. Most classifiers are designed to learn from the data itself through a training process, because complete expert knowledge for determining classifier parameters is impracticable. This paper proposes a hybrid methodology based on the machine learning paradigm. It integrates a successful exploration mechanism, the self-regulated learning capability of the particle swarm optimization (PSO) algorithm, with the extreme learning machine (ELM) classifier. ELM, a recent off-line learning method, is a single-hidden-layer feedforward neural network (FFNN) that has proved to be an excellent classifier when it uses a large number of hidden-layer neurons. In this research, PSO is used to determine the optimal set of parameters for the ELM, which reduces the number of hidden-layer neurons and further improves the network's generalization performance. The proposed method is evaluated on five benchmark medical datasets from the UCI Machine Learning Repository. Simulation results show that the proposed approach achieves good generalization performance compared with the results of other classifiers.
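In the standard ELM formulation, the input weights and hidden biases of the single hidden layer are drawn at random and kept fixed, and only the output weights are computed in closed form. The pure-Python sketch below illustrates that idea on a toy problem; it is not the authors' implementation. It assumes sigmoid hidden units and one-hot targets, and it replaces the Moore-Penrose pseudo-inverse with a small ridge term `reg` so that a plain linear solve suffices. The helper names (`train_elm`, `predict`, `solve`, `hidden_layer`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Logistic activation, clipped so math.exp never overflows.
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, z))))


def solve(a, b):
    """Solve a @ x = b for square a and matrix b (list of rows) by Gauss-Jordan elimination."""
    n = len(a)
    m = [a[i][:] + b[i][:] for i in range(n)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def hidden_layer(x, w, b):
    # H[i][j] = sigmoid(w_j . x_i + b_j) for every sample i and hidden neuron j.
    return [[sigmoid(sum(wk * xk for wk, xk in zip(wj, xi)) + bj)
             for wj, bj in zip(w, b)] for xi in x]


def train_elm(x, t, n_hidden, reg=1e-6, seed=0):
    rng = random.Random(seed)
    # Step 1: random input weights and biases, never updated afterwards.
    w = [[rng.uniform(-1, 1) for _ in x[0]] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    h = hidden_layer(x, w, b)
    # Step 2: output weights from (H^T H + reg*I) beta = H^T T,
    # an assumed ridge-regularized stand-in for beta = pinv(H) T.
    hth = [[sum(row[p] * row[q] for row in h) + (reg if p == q else 0.0)
            for q in range(n_hidden)] for p in range(n_hidden)]
    htt = [[sum(row[p] * ti[c] for row, ti in zip(h, t)) for c in range(len(t[0]))]
           for p in range(n_hidden)]
    return w, b, solve(hth, htt)


def predict(x, model):
    w, b, beta = model
    scores = [[sum(hp * beta[p][c] for p, hp in enumerate(row)) for c in range(len(beta[0]))]
              for row in hidden_layer(x, w, b)]
    # Predicted class = index of the largest output neuron.
    return [max(range(len(s)), key=s.__getitem__) for s in scores]
```

Because only the linear solve in step 2 is "training", the cost is dominated by forming `H^T H`, which is why ELM is fast compared with iterative backpropagation.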
Year: 2015; PMID: 26491713; PMCID: PMC4605351; DOI: 10.1155/2015/418060
Source DB: PubMed; Journal: ScientificWorldJournal; ISSN: 1537-744X
Figure 1. Architecture of the ELM (single-hidden-layer FFNN).
Figure 2. Flowchart of the basic PSO.
Figure 3. Flowchart of the PSO with the self-regulated learning scheme.
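The basic PSO loop of Figure 2 can be illustrated with a textbook global-best PSO that uses an inertia weight. This sketch does not implement the self-regulated learning scheme of Figure 3. Its defaults follow the parameter table (50 particles, acceleration constants 2.0, 500 iterations); reading the "0.4 and 1.0" entry as an inertia weight that decreases linearly from 1.0 to 0.4, and the velocity clamp `v_max`, are our assumptions. The function name `pso_minimize` is hypothetical.

```python
import random


def pso_minimize(f, dim, bounds, n_particles=50, iters=500, c1=2.0, c2=2.0,
                 w_start=1.0, w_end=0.4, seed=0):
    """Global-best PSO with a linearly decreasing inertia weight; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    v_max = 0.2 * (hi - lo)  # assumed velocity clamp, not taken from the paper
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]  # each particle's best position so far
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position found by the swarm
    for t in range(iters):
        inertia = w_start - (w_start - w_end) * t / max(iters - 1, 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (inertia * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])  # cognitive pull toward pbest
                     + c2 * r2 * (gbest[d] - pos[i][d]))    # social pull toward gbest
                vel[i][d] = max(-v_max, min(v_max, v))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

For example, `pso_minimize(lambda p: sum(v * v for v in p), dim=3, bounds=(-5.0, 5.0), n_particles=30)` searches for the minimum of the sphere function, which lies at the origin.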
Description of datasets.
| Dataset | Number of instances | Number of features | Number of classes |
|---|---|---|---|
| Wisconsin Breast Cancer | 699 | 9 | 2 |
| Pima Indians Diabetes | 768 | 8 | 2 |
| Heart-Statlog | 270 | 13 | 2 |
| Hepatitis | 155 | 19 | 2 |
| Cleveland Heart Disease | 296 | 13 | 5 |
Parameters for the proposed algorithms.
| ELM parameter | Value | SRLPSO parameter | Value |
|---|---|---|---|
| Initial weights and bias between networks | 0 | Particle size | 50 |
| Learning rate | 1 | Acceleration constants | 2.0 |
| Number of neurons in input and output layers | Based on datasets considered | | 0.4 and 1.0 |
| Maximum iteration | 500 | Maximum iteration | 500 |
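The abstract states that PSO determines the ELM parameters. One plausible way to wire this up, sketched below under stated assumptions, is to let each particle encode the ELM input weights and hidden biases, solve the output weights analytically for every candidate, and use the validation error rate as the fitness. The sketch uses plain inertia-weight PSO rather than SRLPSO and keeps the number of hidden neurons fixed. Its defaults mirror the table (50 particles, c1 = c2 = 2.0, 500 iterations) plus the assumed inertia range 1.0 to 0.4. The particle encoding, the [-1, 1] search range, the velocity clamp, the ridge term, the ±1 labels, and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, z))))


def ridge_solve(h, y, reg=1e-6):
    # Solve (H^T H + reg*I) beta = H^T y by Gaussian elimination with partial pivoting.
    n = len(h[0])
    a = [[sum(r[p] * r[q] for r in h) + (reg if p == q else 0.0) for q in range(n)]
         + [sum(r[p] * yi for r, yi in zip(h, y))] for p in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    beta = [0.0] * n
    for c in reversed(range(n)):  # back substitution
        beta[c] = (a[c][n] - sum(a[c][k] * beta[k] for k in range(c + 1, n))) / a[c][c]
    return beta


def hidden(x, w, b):
    return [[sigmoid(sum(wk * xk for wk, xk in zip(wj, xi)) + bj) for wj, bj in zip(w, b)]
            for xi in x]


def fitness(particle, data, n_hidden):
    """Validation error rate of an ELM whose input weights and biases come from `particle`."""
    (x_tr, y_tr), (x_va, y_va) = data  # labels assumed to be -1.0 / +1.0
    n_in = len(x_tr[0])
    # Hypothetical encoding: n_hidden*n_in input weights followed by n_hidden biases.
    w = [particle[j * n_in:(j + 1) * n_in] for j in range(n_hidden)]
    b = particle[n_hidden * n_in:]
    beta = ridge_solve(hidden(x_tr, w, b), y_tr)  # output weights stay analytic
    out = [sum(hp * bp for hp, bp in zip(row, beta)) for row in hidden(x_va, w, b)]
    return sum((o >= 0) != (y > 0) for o, y in zip(out, y_va)) / len(y_va)


def pso_elm(data, n_hidden=5, n_particles=50, iters=500, c1=2.0, c2=2.0,
            w_max=1.0, w_min=0.4, seed=0):
    rng = random.Random(seed)
    dim = n_hidden * (len(data[0][0][0]) + 1)  # weights plus biases
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pval = [fitness(p, data, n_hidden) for p in pos]
    g = min(range(n_particles), key=pval.__getitem__)
    gbest, gval = pbest[g][:], pval[g]
    for t in range(iters):
        inertia = w_max - (w_max - w_min) * t / max(iters - 1, 1)  # assumed linear schedule
        for i in range(n_particles):
            for d in range(dim):
                v = (inertia * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-0.5, min(0.5, v))  # assumed velocity clamp
                pos[i][d] = max(-1.0, min(1.0, pos[i][d] + vel[i][d]))
            err = fitness(pos[i], data, n_hidden)
            if err < pval[i]:
                pbest[i], pval[i] = pos[i][:], err
                if err < gval:
                    gbest, gval = pos[i][:], err
        if gval == 0.0:
            break  # nothing left to improve on this validation set
    return gbest, gval
```

Keeping the output weights analytic means PSO only searches the hidden-layer parameters, so the particle dimension grows with `n_hidden * (n_features + 1)` rather than with the full network size.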
Classification results on the Breast Cancer dataset.
| Methodology adopted | Accuracy (%) | Sensitivity (%) | Specificity (%) | Selected features |
|---|---|---|---|---|
| Optimized LVQ (10x CV) [ | 96.70 | 91.29 | 92.34 | 2, 3, 6 |
| Big LVQ (10x CV) [ | 96.80 | 95.23 | 96.10 | 2, 3, 6 |
| AIRS (10x CV) [ | 97.20 | 96.92 | 95.00 | 2, 3, 6, 7 |
| Supervised fuzzy clustering (10x CV) [ | 95.57 | 98.23 | 97.36 | 2, 3, 6, 7, 8 |
| Fuzzy-AIS-knn (10x CV) [ | 99.14 | 99.56 | 100 | 2, 3, 6, 7, 8 |
| | 99.51 | 99.24 | 98.61 | 2, 3, 6, 7 |
| Association rule + neural network [ | 97.4 | 93.12 | 91.26 | 2, 3, 6, 7, 8 |
| Artificial metaplasticity neural network [ | 99.26 | 100 | 97.89 | 2, 3, 6, 7, 8 |
| Mean selection method [ | 95.99 | 93 | 97 | 2, 3, 6, 7 |
| Half selection method [ | 96.71 | 94 | 98 | 2, 3, 6, 7, 8 |
| Neural network for threshold selection [ | 97.28 | 94 | 99 | 1, 2, 3, 5, 6, 7, 8 |
| PSO + ELM | | | | |
| Proposed SRLPSO + ELM | | | | |
Figure 4. Classification rate for the Breast Cancer dataset.
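Most methods in the Breast Cancer table are marked "10x CV", which we read as 10-fold cross-validation. A minimal, unstratified sketch of how such folds can be generated follows; the function names and the shuffling seed are illustrative assumptions, and the cited works may have used stratified folds or a different partitioning.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over n samples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # shuffle once so folds do not follow file order
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train, test


def cross_val_mean(n, score_fn, k=10, seed=0):
    """Average score_fn(train, test) over the k folds, e.g. an accuracy in percent."""
    scores = [score_fn(train, test) for train, test in k_fold_indices(n, k, seed)]
    return sum(scores) / len(scores)
```

In practice `score_fn` would train a classifier such as the ELM on the `train` indices and return its accuracy on the `test` indices; for the 699-instance Breast Cancer data each test fold holds 69 or 70 samples.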
Classification results on the Pima Indians Diabetes dataset.
| Methodology adopted | Accuracy (%) | Sensitivity (%) | Specificity (%) | Selected features |
|---|---|---|---|---|
| PCA-ANFIS (10x FC) [ | 89.47 | 70 | 71.1 | 1, 2, 6, 7, 8 |
| LS-ELM (10x FC) [ | 78.21 | 73.91 | 80 | 1, 2, 6, 8 |
| GDA-LS-ELM (10x FC) [ | 79.16 | 79.1 | 83.33 | 1, 2, 6, 7, 8 |
| MLNN with LM (10x FC) [ | 79.62 | 70 | 70.31 | 1, 2, 6, 8 |
| PNN (10x FC) [ | 78.05 | 71 | 70.5 | 2, 6, 8 |
| LDA-MWELM [ | 89.74 | 83.33 | 93.75 | 1, 2, 6, 7, 8 |
| Mean selection method [ | 76.04 | 71 | 78 | 2, 6, 8 |
| Half selection method [ | 75.91 | 69 | 79 | 1, 2, 6, 8 |
| Neural network for threshold selection [ | 76.04 | 71 | 78 | 2, 6, 8 |
| PSO + ELM | | | | |
| Proposed SRLPSO + ELM | | | | |
Figure 5. Classification rate for the Diabetes dataset.
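Every results table reports accuracy, sensitivity, and specificity. With the standard confusion-matrix definitions, accuracy = (TP + TN) / N, sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP), all three can be computed as in the sketch below. Which class counts as "positive" (for example, diabetic or malignant) is our assumption, since the tables do not state it; `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity (in %) for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    nan = float("nan")  # returned when a rate is undefined (no samples of that class)
    return {
        "accuracy": 100.0 * (tp + tn) / len(pairs),
        "sensitivity": 100.0 * tp / (tp + fn) if tp + fn else nan,  # true positive rate
        "specificity": 100.0 * tn / (tn + fp) if tn + fp else nan,  # true negative rate
    }
```

For instance, `binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])` gives an accuracy of 60.0, a sensitivity of about 66.67, and a specificity of 50.0.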
Classification results on the Heart-Statlog dataset.
| Methodology adopted | Accuracy (%) | Sensitivity (%) | Specificity (%) | Selected features |
|---|---|---|---|---|
| Evolutionary sigmoidal unit neural network (ESUNN) [ | 83.22 | 84.32 | 81.65 | 3, 8, 9, 11, 12 |
| Evolutionary product unit neural network (EPUNN) [ | 81.89 | 83.67 | 84.91 | 8, 9, 11, 12 |
| Multilogistic regression + EPUNN [ | 83.12 | 78.15 | 80.59 | 8, 9, 11, 12, 13 |
| Mean selection method [ | 84.44 | 85 | 84 | 3, 8, 9, 11, 12, 13 |
| Half selection method [ | 84.81 | 85 | 84 | 3, 8, 9, 10, 11, 12, 13 |
| Neural network for threshold selection [ | 85.19 | 85 | 86 | 3, 11, 12, 13 |
| PSO + ELM | | | | |
| Proposed SRLPSO + ELM | | | | |
Figure 6. Classification rate for the Heart-Statlog dataset.
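The "Selected features" column lists feature numbers. We assume they are 1-based positions in the UCI attribute order (the Pima entries, for instance, never exceed its 8 features). Under that assumption, a table cell such as "3, 11, 12, 13" can be applied to a data matrix as follows; the helper names are hypothetical.

```python
def parse_selected(spec, n_features):
    """Turn a 'Selected features' cell such as '3, 11, 12, 13' or 'All' into 1-based indices."""
    if spec.strip().lower() == "all":
        return list(range(1, n_features + 1))
    return [int(tok) for tok in spec.split(",")]  # int() tolerates surrounding spaces


def select_features(rows, selected):
    """Keep only the listed 1-based feature columns of each sample."""
    n = len(rows[0])
    for f in selected:
        if not 1 <= f <= n:
            raise ValueError(f"feature {f} is outside 1..{n}")
    return [[row[f - 1] for f in selected] for row in rows]
```

The reduced matrix can then be passed to any classifier, so a feature subset from the table and the classifier that uses it stay independent.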
Classification results on the Hepatitis dataset.
| Methodology adopted | Accuracy (%) | Sensitivity (%) | Specificity (%) | Selected features |
|---|---|---|---|---|
| Conventional artificial neural network [ | 97.00 | 92.31 | 94.5 | All |
| Mean selection method [ | 82.58 | 87 | 60 | 5, 6, 11, 12, 13, 14, 17, 19 |
| Half selection method [ | 85.16 | 90 | 66 | 2, 5, 6, 10, 11, 12, 13, 14, 17, 19 |
| Neural network for threshold selection [ | 85.16 | 90 | 66 | 2, 5, 6, 10, 11, 12, 13, 14, 17, 19 |
| PSO + ELM | | | | |
| Proposed SRLPSO + ELM | | | | |
Figure 7. Classification rate for the Hepatitis dataset.
Classification results on the Cleveland Heart Disease dataset.
| Methodology adopted | Accuracy (%) | Sensitivity (%) | Specificity (%) | Selected features |
|---|---|---|---|---|
| C4.5 [ | 81.11 | 77.23 | 76.58 | All |
| Naive Bayes [ | 81.48 | 80.97 | 81.22 | 3, 9, 11, 12 |
| BNND [ | 81.11 | 82.13 | 80.42 | 3, 11, 12 |
| BNNF [ | 80.96 | 76.93 | 75.81 | 3, 8, 9, 11, 12 |
| AIRS [ | 84.50 | 75.34 | 72.96 | All |
| Hybrid neural network [ | 87.40 | 93.00 | 78.50 | 3, 8, 9, 10, 11, 12 |
| Neural networks ensemble [ | 89.01 | 80.95 | 95.91 | All |
| Mean selection method [ | 81.75 | 82 | 82 | 3, 8, 9, 11, 12, 13 |
| Half selection method [ | 83.44 | 84 | 83 | 3, 8, 9, 10, 11, 12, 13 |
| Neural network for threshold selection [ | 84.46 | 82 | 82 | 3, 12, 13 |
| PSO + ELM | | | | |
| Proposed SRLPSO + ELM | | | | |
Figure 8. Classification rate for the Cleveland Heart Disease dataset.
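The Cleveland Heart Disease data has five classes, and the text does not say how sensitivity and specificity were computed for it. Two common conventions are collapsing classes 1 to 4 into "disease present" and macro-averaging one-vs-rest rates. The sketch below shows the second convention purely as an assumption; `one_vs_rest_metrics` is a hypothetical helper.

```python
def one_vs_rest_metrics(y_true, y_pred, classes):
    """Accuracy plus macro-averaged one-vs-rest sensitivity and specificity, in %."""
    pairs = list(zip(y_true, y_pred))
    sens, spec = [], []
    for c in classes:
        # Treat class c as positive and every other class as negative.
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        if tp + fn:
            sens.append(tp / (tp + fn))
        if tn + fp:
            spec.append(tn / (tn + fp))
    accuracy = 100.0 * sum(t == p for t, p in pairs) / len(pairs)
    return accuracy, 100.0 * sum(sens) / len(sens), 100.0 * sum(spec) / len(spec)
```

For example, `one_vs_rest_metrics([0, 0, 1, 2], [0, 1, 1, 2], [0, 1, 2])` yields an accuracy of 75.0, a macro sensitivity of about 83.33, and a macro specificity of about 88.89; the binarized convention would generally give different numbers.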