Xiao Liu, Xiaoli Wang, Qiang Su, Mo Zhang, Yanhong Zhu, Qiugen Wang, Qian Wang.
Abstract
Heart disease is one of the most common diseases in the world. The objective of this study is to aid the diagnosis of heart disease using a hybrid classification system based on the ReliefF and Rough Set (RFRS) method. The proposed system contains two subsystems: the RFRS feature selection system and a classification system with an ensemble classifier. The first system includes three stages: (i) data discretization, (ii) feature extraction using the ReliefF algorithm, and (iii) feature reduction using the heuristic Rough Set reduction algorithm that we developed. In the second system, an ensemble classifier is proposed based on the C4.5 classifier. The Statlog (Heart) dataset, obtained from the UCI database, was used for experiments. A maximum classification accuracy of 92.59% was achieved according to a jackknife cross-validation scheme. The results demonstrate that the performance of the proposed system is superior to the performances of previously reported classification techniques.
Year: 2017 PMID: 28127385 PMCID: PMC5239990 DOI: 10.1155/2017/8272091
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Algorithm 1: Pseudocode of ReliefF.
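The pseudocode itself is not reproduced in this record. As a rough illustration of the general ReliefF procedure only, a minimal Python sketch might look like the following. The function name `relieff`, its parameters `n_iter`, `k`, and `seed`, the range-scaled absolute difference, and the Manhattan distance are all assumptions made for this sketch and are not taken from the paper; nominal features would normally use a 0/1 mismatch instead.

```python
import random
from collections import defaultdict

def relieff(X, y, n_iter=100, k=5, seed=0):
    """Illustrative ReliefF: X is a list of numeric rows, y the class labels."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Feature ranges scale every per-feature difference into [0, 1].
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    prior = {c: len(idx) / n for c, idx in by_class.items()}

    w = [0.0] * d
    for _ in range(n_iter):
        i = rng.randrange(n)
        r, c_r = X[i], y[i]
        for c, members in by_class.items():
            # Up to k nearest neighbours of r inside class c, excluding r itself.
            near = sorted((m for m in members if m != i),
                          key=lambda m: dist(r, X[m]))[:k]
            if not near:
                continue
            # Hits (same class) lower the weight; misses raise it,
            # weighted by the prior of the miss class.
            scale = -1.0 if c == c_r else prior[c] / (1.0 - prior[c_r])
            for j in range(d):
                mean_diff = sum(diff(j, r, X[m]) for m in near) / len(near)
                w[j] += scale * mean_diff / n_iter
    return w

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relieff(X, y, n_iter=50, k=2)
print(weights)
```

On this toy data the informative feature should receive the larger weight; features would then be ranked by weight before any further reduction step.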
Figure 1: Structure of RFRS-based classification system.
Algorithm 2: Pseudocode of heuristic RS reduction algorithm.
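The authors' heuristic is not reproduced in this record, so the sketch below does not implement it. It shows a generic greedy attribute reduction based on the rough set dependency degree (in the spirit of QuickReduct), purely to illustrate the kind of computation a Rough Set reduct involves. The names `dependency` and `greedy_reduct` and the toy decision table are hypothetical, and the input is assumed to be already discretized.

```python
from collections import defaultdict

def dependency(rows, labels, attrs):
    """Share of rows whose equivalence class under `attrs` carries one label
    (size of the positive region divided by the number of rows)."""
    seen = defaultdict(set)
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        seen[key].add(label)
        counts[key] += 1
    consistent = sum(counts[key] for key, labs in seen.items() if len(labs) == 1)
    return consistent / len(rows)

def greedy_reduct(rows, labels):
    """Add the attribute that raises the dependency degree most,
    until it matches the dependency of the full attribute set."""
    n_attrs = len(rows[0])
    full = dependency(rows, labels, range(n_attrs))
    reduct = []
    while dependency(rows, labels, reduct) < full:
        best = max((a for a in range(n_attrs) if a not in reduct),
                   key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return sorted(reduct)

# Hypothetical discretized decision table: attribute 0 alone decides the label.
rows = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (0, 0, 1)]
labels = [0, 0, 1, 1, 0]
print(greedy_reduct(rows, labels))  # prints [0]
```

A greedy search like this returns one attribute subset that preserves the dependency of the full set; it is not guaranteed to be minimal, and different heuristics can yield different reducts.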
Feature information of Statlog (Heart) dataset.
| Feature | Code | Description | Domain | Data type | Mean | Standard deviation |
|---|---|---|---|---|---|---|
| Age | | — | 29–77 | Real | 54 | 9 |
| Sex | | Male, female | 0, 1 | Binary | — | — |
| Chest pain type | | Angina, asymptomatic, abnormal | 1, 2, 3, 4 | Nominal | — | — |
| Resting blood pressure | | — | 94–200 | Real | 131.344 | 17.862 |
| Serum cholesterol in mg/dl | | — | 126–564 | Real | 249.659 | 51.686 |
| Fasting blood sugar > 120 mg/dl | | — | 0, 1 | Binary | — | — |
| Resting electrocardiographic results | | Norm, abnormal, hyper | 0, 1, 2 | Nominal | — | — |
| Maximum heart rate achieved | | — | 71–202 | Real | 149.678 | 23.1666 |
| Exercise-induced angina | | — | 0, 1 | Binary | — | — |
| Old peak = ST depression induced by exercise relative to rest | | — | 0–6.2 | Real | 1.05 | 1.145 |
| Slope of the peak exercise ST segment | | Up, flat, down | 1, 2, 3 | Ordered | — | — |
| Number of major vessels (0–3) colored by fluoroscopy | | — | 0, 1, 2, 3 | Real | — | — |
| Thal | | Normal, fixed defect, reversible defect | 3, 6, 7 | Nominal | — | — |
The confusion matrix.
| | Predicted patients with heart disease | Predicted healthy persons |
|---|---|---|
| Actual patients with heart disease | True positive (TP) | False negative (FN) |
| Actual healthy persons | False positive (FP) | True negative (TN) |
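The Sn, Sp, and ACC values reported in the result tables follow from these four counts. A minimal sketch, assuming the standard definitions of sensitivity, specificity, and accuracy; the function name `classification_metrics` and the counts are hypothetical and not taken from the paper:

```python
def classification_metrics(tp, fn, fp, tn):
    """Return sensitivity, specificity and accuracy as percentages."""
    sn = 100.0 * tp / (tp + fn)                    # diseased cases correctly flagged
    sp = 100.0 * tn / (tn + fp)                    # healthy cases correctly cleared
    acc = 100.0 * (tp + tn) / (tp + fn + fp + tn)  # all correct predictions
    return sn, sp, acc

# Hypothetical counts, for illustration only.
sn, sp, acc = classification_metrics(tp=26, fn=4, fp=2, tn=22)
print(f"Sn={sn:.2f}%  Sp={sp:.2f}%  ACC={acc:.2f}%")
# prints: Sn=86.67%  Sp=91.67%  ACC=88.89%
```

Reporting Sn and Sp alongside ACC matters in a diagnostic setting because a missed patient (FN) and a false alarm (FP) carry different costs, which accuracy alone hides.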
Results of the ReliefF algorithm.
| Feature | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Weight | 0.172 | 0.147 | 0.126 | 0.122 | 0.106 | 0.098 | 0.057 | 0.046 | 0.042 | 0.032 | 0.028 | 0.014 | 0.011 |
Performance values for different reduced subset.
| Code | Reduct | Number | | Ensemble classifier, test Sn (%) | Ensemble classifier, test Sp (%) | Ensemble classifier, test ACC (%) |
|---|---|---|---|---|---|---|
| | | 7 | 50 | 83.33 | 87.5 | 85.19 |
| | | | 100 | 83.33 | 95.83 | 88.89 |
| | | | 150 | 86.67 | 83.33 | 85.19 |
| | | 7 | 50 | 86.67 | 91.67 | 88.89 |
| | | | 100 | 93.33 | 87.50 | 92.59 |
| | | | 150 | 93.33 | 87.04 | 90.74 |
| | | 7 | 50 | 86.67 | 83.33 | 85.19 |
| | | | 100 | 93.33 | 79.17 | 87.04 |
| | | | 150 | 80 | 91.67 | 85.19 |
| | | 8 | 50 | 86.67 | 83.33 | 85.19 |
| | | | 100 | 93.33 | 83.33 | 88.89 |
| | | | 150 | 86.67 | 87.5 | 87.04 |
Figure 2: Training process of R 7.
Figure 3: ROC curves for training and test sets.
Classification results using the four classifiers.
| Classifiers | Test Sn (%) | Test Sp (%) | Test Acc (%) |
|---|---|---|---|
| Ensemble classifier ( | 86.67 | 91.67 | 88.89 |
| Ensemble classifier ( | 93.33 | 87.50 | 92.59 |
| Ensemble classifier ( | 93.33 | 87.04 | 90.74 |
| C4.5 tree | 93.1 | 80 | 87.03 |
| Naïve Bayes | 93.75 | 68.18 | 83.33 |
| Bayesian Neural Networks (BNN) | 93.75 | 72.72 | 85.19 |
Comparison of our results with those of other studies.
| Author | Method | Classification accuracy (%) |
|---|---|---|
| Our study | RFRS classification system | 92.59 |
| Lee [ | Graphical characteristics of BSWFM combined with Euclidean distance | 87.4 |
| Tomar and Agarwal [ | Feature selection-based LSTSVM | 85.59 |
| Buscema et al. [ | TWIST algorithm | 84.14 |
| Subbulakshmi et al. [ | ELM | 87.5 |
| Karegowda et al. [ | GA + Naïve Bayes | 85.87 |
| Srinivas et al. [ | Naïve Bayes | 83.70 |
| Polat and Güneş [ | RBF kernel | 83.70 |
| Özşen and Güneş [ | GA-AWAIS | 87.43 |
| Helmy and Rasheed [ | Algebraic Sigmoid | 85.24 |
| Wang et al. [ | Linear kernel SVM classifiers | 83.37 |
| Özşen and Güneş [ | Hybrid similarity measure | 83.95 |
| Kahramanli and Allahverdi [ | Hybrid neural network method | 86.8 |
| Yan et al. [ | ICA + SVM | 83.75 |
| Şahan et al. [ | AWAIS | 82.59 |
| Duch et al. [ | | 85.6 |
BSWFM: bounded sum of weighted fuzzy membership functions; LSTSVM: Least Square Twin Support Vector Machine; TWIST: Training with Input Selection and Testing; ELM: Extreme Learning Machine; GA: genetic algorithm; SVM: support vector machine; ICA: imperialist competitive algorithm; AWAIS: attribute weighted artificial immune system; KNN: k-nearest neighbor.