Liaqat Ali, Shafqat Ullah Khan, Noorbakhsh Amiri Golilarz, Imrana Yakubu, Iqbal Qasim, Adeeb Noor, Redhwan Nour.
Abstract
Heart failure (HF) is considered one of the deadliest diseases worldwide. Consequently, many intelligent medical decision support systems have been proposed in the literature for detecting HF. However, the low accuracy achieved on HF data remains a major problem for these systems. To improve prediction accuracy, we developed a feature-driven decision support system consisting of two main stages. In the first stage, the χ2 statistical model is used to rank the 13 commonly used HF features. Based on the χ2 test scores, an optimal subset of features is searched for using a forward best-first search strategy. In the second stage, a Gaussian Naive Bayes (GNB) classifier is used as the predictive model. The performance of the proposed method (χ2-GNB) is evaluated on an online heart disease database of 297 subjects. Experimental results show that the proposed method achieves a prediction accuracy of 93.33%. The developed method (i.e., χ2-GNB) improves the HF prediction performance of the GNB model by 3.33%. Moreover, the proposed method also outperforms the available methods in the literature, which achieved accuracies in the range of 57.85-92.22%.
Year: 2019 PMID: 31885684 PMCID: PMC6925936 DOI: 10.1155/2019/6314328
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
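The two-stage idea in the abstract (rank features, then pick the subset that gives the best GNB accuracy) can be illustrated with a minimal, standard-library-only sketch. Several things here are assumptions rather than details from the source: the function names (`fit_gnb`, `predict_gnb`, `best_top_n`) are hypothetical, the feature ranking is passed in as a precomputed list instead of being derived from χ2 scores, a plain scan over the top-n ranked features stands in for the forward best-first search, and the toy data is invented purely for illustration.

```python
import math
from collections import defaultdict

def fit_gnb(X, y, eps=1e-9):
    """Estimate per-class prior, feature means, and feature variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(col) / n for col in cols]
        # eps keeps the variance strictly positive for constant features
        variances = [sum((v - m) ** 2 for v in col) / n + eps
                     for col, m in zip(cols, means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gnb(model, row):
    """Return the class with the highest Gaussian log-posterior."""
    def log_post(prior, means, variances):
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=lambda c: log_post(*model[c]))

def accuracy(model, X, y):
    return sum(predict_gnb(model, r) == t for r, t in zip(X, y)) / len(y)

def project(X, cols):
    """Keep only the selected feature columns."""
    return [[row[c] for c in cols] for row in X]

def best_top_n(ranked, X_train, y_train, X_test, y_test):
    """Evaluate the top-n ranked features for n = 1..len(ranked).

    Returns (best accuracy, selected column indices). Ties keep the
    smaller subset because only strict improvements replace the best.
    """
    best = (-1.0, [])
    for n in range(1, len(ranked) + 1):
        cols = ranked[:n]
        model = fit_gnb(project(X_train, cols), y_train)
        acc = accuracy(model, project(X_test, cols), y_test)
        if acc > best[0]:
            best = (acc, cols)
    return best

# Invented toy data: feature 0 separates the classes, feature 1 is noise.
X_train = [[1.0, 5.0], [1.2, 1.0], [0.8, 3.0],
           [3.0, 1.1], [3.2, 5.1], [2.8, 2.9]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[1.1, 3.0], [2.9, 3.0]]
y_test = [0, 1]

result = best_top_n([0, 1], X_train, y_train, X_test, y_test)
print(result)
```

Selecting the subset on the same held-out data used for reporting is done here only to keep the sketch short; a separate validation split would be the more cautious choice.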
Commonly used HF features of the dataset.
| Feature no. | Feature description | Feature code |
|---|---|---|
| 1 | Age | AGE |
| 2 | Sex | SEX |
| 3 | Chest pain type | CPT |
| 4 | Resting blood pressure | RBP |
| 5 | Serum cholesterol | SCH |
| 6 | Fasting blood sugar | FBS |
| 7 | Resting electrocardiographic results | RES |
| 8 | Maximum heart rate achieved | MHR |
| 9 | Exercise induced angina | EIA |
| 10 | Old peak | OPK |
| 11 | Peak exercise slope | PES |
| 12 | Number of major vessels colored by fluoroscopy | VCA |
| 13 | Thallium scan | THA |
Table to compute χ2 test score.
| | Positive class | Negative class | Total |
|---|---|---|---|
| Feature f present | α | β | μ |
| Feature f absent | λ | γ | τ − μ |
| Total | ω | τ − ω | τ |
The number of instances containing feature f is denoted by μ, the number of instances without feature f by τ − μ, the number of positive instances by ω, and the number of negative instances by τ − ω, where τ is the total number of instances. Let the observed values be α, β, λ, and γ, with expected values E_α, E_β, E_λ, and E_γ. Under the hypothesis that the two events (occurrence of the feature and class membership) are independent, each expected value is the product of the corresponding row and column totals divided by τ. For example, if α counts the positive instances that contain f, then

$$E_\alpha = \frac{\mu\,\omega}{\tau}.$$
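As a worked illustration of the quantities defined above, the following sketch computes the standard Pearson χ2 statistic for a 2x2 feature/class table. The cell layout (α = feature present and positive, β = feature present and negative, λ = feature absent and positive, γ = feature absent and negative) and the function name `chi2_score` are assumptions made for this example; the source may use a different but equivalent formulation.

```python
def chi2_score(alpha, beta, lam, gamma):
    """Pearson chi-square statistic for a 2x2 feature/class table.

    Assumed cell layout (not confirmed by the source):
        alpha: feature present, positive class
        beta:  feature present, negative class
        lam:   feature absent,  positive class
        gamma: feature absent,  negative class
    Every row and column total must be non-zero, otherwise an
    expected count is zero and the statistic is undefined.
    """
    tau = alpha + beta + lam + gamma   # total number of instances
    mu = alpha + beta                  # instances containing the feature
    omega = alpha + lam                # positive instances
    observed = [alpha, beta, lam, gamma]
    # Expected counts under independence: row total * column total / tau
    expected = [
        mu * omega / tau,
        mu * (tau - omega) / tau,
        (tau - mu) * omega / tau,
        (tau - mu) * (tau - omega) / tau,
    ]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Invented counts, for illustration only.
score = chi2_score(30, 10, 15, 45)
print(round(score, 4))
```

A feature whose presence is unrelated to the class yields a score near zero, while a strongly class-dependent feature yields a large score, which is what makes the statistic usable for ranking.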
Results of different subsets of features for the heart disease dataset.
| n | Test acc. (%) | Train acc. (%) | Spec. (%) | Sens. (%) | MCC |
|---|---|---|---|---|---|
| 1 | 78.88 | 75.36 | 81.63 | 75.60 | 0.573 |
| 2 | 88.11 | 78.26 | 79.59 | 82.92 | 0.622 |
| 3 | 84.44 | 81.64 | 89.79 | 78.04 | 0.686 |
| 4 | 86.66 | 79.71 | 91.83 | 80.48 | 0.732 |
| 5 | 86.66 | 80.19 | 93.87 | 78.04 | 0.734 |
| 6 | 86.66 | 80.67 | 89.79 | 82.92 | 0.730 |
| 7 | 90.00 | 82.12 | 91.83 | 87.80 | 0.798 |
| 8 | 90.00 | 83.57 | 93.87 | 85.36 | 0.799 |
| 9 | | | | | |
| 10 | 90.00 | 81.64 | 93.87 | 85.36 | 0.799 |
| 11 | 90.00 | 82.60 | 93.87 | 85.36 | 0.599 |
| 12 | 90.00 | 84.05 | 93.87 | 85.36 | 0.799 |
| 13 | 90.00 | 82.12 | 93.87 | 85.36 | 0.799 |
Figure 1. Confusion matrix of training data.
Figure 2. Confusion matrix of testing data.
Experimental results of other optimized machine learning models.
| Model | Hyperparameters | Acc. (%) | Spec. (%) | Sens. (%) | MCC |
|---|---|---|---|---|---|
| SVM (linear) | | 90 | 93.87 | 85.36 | 0.799 |
| SVM (RBF) | | 90 | 93.87 | 85.36 | 0.799 |
| Adaboost | | 88 | 89.79 | 87.80 | 0.776 |
| Extra tree | | 88 | 89.79 | 87.80 | 0.776 |
| Random forest | | 88 | 93.87 | 82.92 | 0.777 |
Figure 3. ROC charts of the χ2-GNB model and optimized SVM and ensemble models. ROC chart of the (a) proposed model, (b) Adaboost ensemble model, (c) extra tree ensemble model, (d) random forest, (e) linear SVM model, and (f) SVM (RBF) model.
Details of other machine learning methods proposed for HF prediction and their obtained HF prediction accuracies.
| Study (year) | Method | Accuracy (%) |
|---|---|---|
| ToolDiag, RA | IB1-4 | 50.00 |
| WEKA, RA | InductH | 58.50 |
| ToolDiag, RA | RBF | 60.00 |
| WEKA, RA | FOIL | 64.00 |
| ToolDiag, RA | MLP + BP | 65.60 |
| WEKA, RA | T2 | 68.10 |
| WEKA, RA | 1R | 71.40 |
| WEKA, RA | IB1c | 74.00 |
| WEKA, RA | K | 76.70 |
| Robert Detrano | Logistic regression | 77.00 |
| Cheung (2001) | C4.5 | 81.11 |
| Cheung (2001) | Naive Bayes | 81.48 |
| Cheung (2001) | BNND | 81.11 |
| Cheung (2001) | BNNF | 80.96 |
| WEKA, RA | Naive Bayes | 83.60 |
| Ster and Dobnikar | Fisher discriminant analysis | 84.2 |
| Ster and Dobnikar | Linear discriminant analysis | 84.5 |
| Ster and Dobnikar | Naive Bayes | 82.5–83.4 |
| Polat et al. (2005) | AIRS | 84.50 |
| Ozsen et al. (2005) | Kernel functions with AIS | 85.93 |
| Kahramanli and Allahverdi (2008) | Hybrid neural network system | 86.8 |
| Polat et al. (2006) | Fuzzy-AIRS-Knn-based system | 87.00 |
| Özşen and Güneş (2009) | Modified artificial immune system | 87.43 |
| Das et al. (2009) | Neural network ensembles | 89.01 |
| Jankowski and Kadirkamanathan (1997) | IncNet | 90.00 |
| Kumar (2011) | ANFIS | 91.18 |
| Samuel et al. (2017) | ANN-fuzzy-AHP | 91.10 |
| Kumar (2012) | Fuzzy resolution mechanism | 91.83 |
| Ali et al. (2019) | Stacked and optimized SVMs | 92.22 |
| Paul et al. (2018) | Adaptive weighted fuzzy system ensemble | 92.31 |