Yar Muhammad, Muhammad Tahir, Maqsood Hayat, Kil To Chong.
Abstract
Heart disease is a fatal condition whose incidence is rising rapidly worldwide, in both developed and undeveloped countries. In this disease, the heart fails to supply enough blood to the other parts of the body for them to function normally. Early and timely diagnosis is essential to protect patients from further damage and to save their lives. Among the conventional invasive techniques, angiography is the best-known method for diagnosing heart problems, but it has some limitations. Non-invasive methods, such as intelligent learning-based computational techniques, have been found to be more reliable and effective for heart disease diagnosis. Here, an intelligent computational predictive system is introduced for the identification and diagnosis of cardiac disease. Various machine learning classification algorithms are investigated. To remove irrelevant and noisy data from the extracted feature space, four distinct feature selection algorithms are applied, and the results of each feature selection algorithm combined with each classifier are analyzed. Several performance metrics, namely accuracy, sensitivity, specificity, AUC, F1-score, MCC, and the ROC curve, are used to assess the effectiveness and strength of the developed model. The classification rates of the developed system are examined on both the full and the optimal feature spaces; the model performs better on the highly varied optimal feature space. In addition, the P-value and Chi-square statistic are computed for the ET classifier with each feature selection technique. It is anticipated that the proposed system will help physicians diagnose heart disease accurately and effectively.
Year: 2020 PMID: 33184369 PMCID: PMC7665174 DOI: 10.1038/s41598-020-76635-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Classifiers’ success rates on full features using 10-fold CV on S1.
| Classification model | Accuracy | Sensitivity | Specificity | AUC | Precision | F1-score | MCC |
|---|---|---|---|---|---|---|---|
| KNN (k = 7) | 85.55 | 85.93 | 85.17 | 95.64 | 86.09 | 0.86 | 0.71 |
| DT | 86.82 | 89.73 | 83.76 | 91.89 | 85.40 | 0.87 | 0.73 |
| ET | 92.09 | 91.82 | 92.38 | 97.92 | 92.84 | 0.92 | 0.84 |
| GB | 91.34 | 90.32 | 91.52 | 96.87 | 92.14 | 0.92 | 0.83 |
| RF (n = 100) | 89.45 | 88.57 | 90.19 | 94.22 | 90.82 | 0.87 | 0.81 |
| SVM (kernel = ‘rbf’) | 84.28 | 93.15 | 74.94 | 92.00 | 79.77 | 0.86 | 0.69 |
| AB | 88.09 | 90.84 | 84.38 | 93.92 | 87.84 | 0.86 | 0.76 |
| NB | 82.33 | 86.31 | 78.15 | 90.17 | 80.98 | 0.83 | 0.65 |
| LR (C = 10) | 84.08 | 89.92 | 77.95 | 92.28 | 81.24 | 0.85 | 0.69 |
| ANN (13, 20, 2) | 85.07 | 84.35 | 83.72 | 92.54 | 83.19 | 0.82 | 0.70 |
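The results in these tables are reported under 10-fold cross-validation. The following is a minimal sketch of how such folds can be formed, using only the Python standard library. The function name `k_fold_indices`, the seed, and the sample count of 25 are hypothetical choices for illustration; the source does not state how the folds were drawn or whether they were stratified.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for plain (unstratified) k-fold CV."""
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical sample count, chosen only to keep the example small.
splits = list(k_fold_indices(25, k=10))
for train_idx, test_idx in splits:
    # A classifier would be fitted on train_idx and scored on test_idx here;
    # the per-fold scores are then averaged to give one figure per metric.
    pass
```

Each sample appears in exactly one test fold, so every record contributes to the reported averages once.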
Classifiers’ success rates on full features using 10-fold CV on S2.
| Classification model | Accuracy | Sensitivity | Specificity | AUC | Precision | F1-score | MCC |
|---|---|---|---|---|---|---|---|
| KNN (K = 7) | 89.16 | 90.30 | 87.97 | 93.30 | 88.93 | 0.90 | 0.78 |
| DT | 86.82 | 89.73 | 83.76 | 92.90 | 85.40 | 0.87 | 0.74 |
| ET | 96.74 | 96.36 | 97.40 | 98.15 | 96.10 | 0.97 | 0.93 |
| GB | 96.05 | 96.10 | 97.20 | 97.85 | 95.90 | 0.96 | 0.92 |
| RF | 94.60 | 94.62 | 94.19 | 96.27 | 94.25 | 0.94 | 0.90 |
| SVM (kernel = ‘rbf’) | 84.18 | 92.01 | 75.95 | 90.48 | 80.31 | 0.86 | 0.70 |
| AB | 92.09 | 91.82 | 92.38 | 97.12 | 92.84 | 0.92 | 0.84 |
| NB | 82.33 | 86.31 | 78.15 | 90.71 | 80.98 | 0.83 | 0.65 |
| LR (C = 10) | 84.08 | 89.92 | 77.95 | 90.08 | 81.24 | 0.85 | 0.68 |
| ANN (13, 20, 2) | 95.80 | 96.80 | 95.45 | 97.70 | 96.90 | 0.95 | 0.91 |
Figure 1. ROC curves of all classifiers on full feature space using 10-fold cross-validation on S1.
Classifiers' performance on optimal feature space of the FCBF feature selection algorithm.
| Classification model | Accuracy | Sensitivity | Specificity | AUC | Precision | F1-score | MCC |
|---|---|---|---|---|---|---|---|
| KNN (K = 7) | 87.50 | 85.55 | 89.57 | 92.01 | 86.25 | 0.87 | 0.75 |
| DT | 84.97 | 85.53 | 89.57 | 91.49 | 89.77 | 0.85 | 0.70 |
| ET | 94.14 | 94.29 | 93.98 | 94.21 | 94.47 | 0.94 | 0.88 |
| GB | 93.36 | 94.67 | 91.98 | 93.87 | 92.44 | 0.93 | 0.87 |
| RF | 88.48 | 90.87 | 85.57 | 92.42 | 87.05 | 0.88 | 0.77 |
| SVM (kernel = ‘rbf’) | 82.62 | 90.68 | 74.14 | 89.68 | 78.86 | 0.84 | 0.66 |
| AB | 87.50 | 89.92 | 84.96 | 91.13 | 86.44 | 0.88 | 0.75 |
| NB | 81.25 | 84.22 | 78.15 | 88.52 | 80.55 | 0.82 | 0.62 |
| LR (C = 10) | 82.33 | 86.88 | 77.55 | 89.51 | 80.61 | 0.83 | 0.65 |
| ANN (13, 20, 2) | 88.19 | 92.20 | 83.56 | 90.17 | 85.48 | 0.89 | 0.76 |
Figure 2. ROC curve of all classifiers on features selected by the FCBF feature selection algorithm.
Classifiers' performance on optimal feature space of the mRMR feature selection algorithm.
| Classification model | Accuracy | Sensitivity | Specificity | AUC | Precision | F1-score | MCC |
|---|---|---|---|---|---|---|---|
| KNN (K = 7) | 86.62 | 85.93 | 87.37 | 94.69 | 87.88 | 0.87 | 0.73 |
| DT | 82.24 | 84.98 | 79.35 | 87.49 | 81.38 | 0.83 | 0.64 |
| ET | 93.42 | 93.92 | 93.88 | 93.23 | 94.45 | 0.94 | 0.88 |
| GB | 91.30 | 92.61 | 98.92 | 91.81 | 90.40 | 0.91 | 0.85 |
| RF (n = 100) | 88.29 | 92.20 | 84.56 | 94.42 | 86.33 | 0.89 | 0.77 |
| SVM (kernel = ‘rbf’) | 82.14 | 85.36 | 78.75 | 86.11 | 81.11 | 0.83 | 0.64 |
| AB | 87.41 | 88.40 | 86.37 | 92.81 | 86.32 | 0.88 | 0.75 |
| NB | 81.84 | 81.93 | 81.76 | 89.06 | 82.85 | 0.82 | 0.63 |
| LR (C = 10) | 82.14 | 84.03 | 80.16 | 86.38 | 81.81 | 0.83 | 0.64 |
| ANN (13, 20, 2) | 90.55 | 91.48 | 89.58 | 96.95 | 90.16 | 0.90 | 0.84 |
Figure 3. ROC curve of all classifiers on features selected by the mRMR feature selection algorithm.
Classifiers’ performance on optimal feature space of LASSO feature selection algorithm.
| Classification model | Accuracy | Sensitivity | Specificity | AUC | Precision | F1-score | MCC |
|---|---|---|---|---|---|---|---|
| KNN (K = 7) | 85.65 | 83.84 | 87.57 | 92.10 | 86.82 | 0.85 | 0.71 |
| DT | 84.28 | 83.84 | 84.76 | 88.68 | 84.26 | 0.84 | 0.68 |
| ET | 89.36 | 88.21 | 90.58 | 92.05 | 88.90 | 0.88 | 0.77 |
| GB | 88.47 | 89.54 | 87.37 | 92.69 | 86.39 | 0.88 | 0.77 |
| RF | 88.18 | 89.92 | 85.97 | 92.52 | 86.17 | 0.87 | 0.76 |
| SVM (kernel = ‘rbf’) | 80.57 | 83.26 | 77.75 | 88.03 | 80.06 | 0.81 | 0.61 |
| AB | 85.94 | 86.50 | 85.37 | 90.72 | 84.33 | 0.86 | 0.72 |
| NB | 82.62 | 83.84 | 81.36 | 86.81 | 82.76 | 0.83 | 0.65 |
| LR (C = 10) | 80.77 | 83.46 | 77.95 | 85.00 | 80.35 | 0.81 | 0.61 |
| ANN (13, 20, 2) | 87.59 | 88.02 | 86.57 | 92.40 | 87.58 | 0.87 | 0.74 |
Figure 4. ROC curve of all classifiers on the feature space selected by the LASSO feature selection algorithm.
Classifiers’ performance on optimal feature space of Relief feature selection algorithm.
| Classification model | Accuracy | Sensitivity | Specificity | AUC | Precision | F1-score | MCC |
|---|---|---|---|---|---|---|---|
| KNN (K = 7) | 87.11 | 84.41 | 89.97 | 94.08 | 86.06 | 0.87 | 0.74 |
| DT | 84.98 | 89.16 | 80.56 | 87.83 | 82.06 | 0.85 | 0.70 |
| ET | 94.41 | 94.93 | 94.89 | 94.24 | 95.46 | 0.95 | 0.89 |
| GB | 92.35 | 93.66 | 98.90 | 92.86 | 91.44 | 0.92 | 0.87 |
| RF | 91.51 | 92.39 | 89.37 | 94.78 | 89.42 | 0.91 | 0.81 |
| SVM (kernel = ‘rbf’) | 81.26 | 87.83 | 74.34 | 84.26 | 82.60 | 0.82 | 0.63 |
| AB | 83.01 | 83.26 | 82.76 | 88.14 | 83.78 | 0.83 | 0.66 |
| NB | 80.29 | 81.93 | 78.55 | 84.94 | 79.17 | 0.81 | 0.60 |
| LR (C = 10) | 80.77 | 84.79 | 76.55 | 84.47 | 78.29 | 0.82 | 0.61 |
| ANN (13, 20, 2) | 86.72 | 89.73 | 83.56 | 91.79 | 85.66 | 0.87 | 0.73 |
Figure 5. ROC curve of all classifiers on features selected by the Relief feature selection algorithm.
Performance of ET classifier using 10-fold CV on different sub-feature spaces on S1.
| Features size | Accuracy | Sensitivity | Specificity | AUC | Precision | F1-score | MCC |
|---|---|---|---|---|---|---|---|
| 1 | 72.86 | 73.33 | 73.91 | 77.02 | 75.71 | 0.75 | 0.47 |
| 2 | 75.81 | 76.36 | 75.36 | 75.19 | 78.70 | 0.77 | 0.52 |
| 3 | 81.12 | 85.45 | 76.08 | 88.13 | 83.38 | 0.84 | 0.62 |
| 4 | 85.11 | 86.84 | 80.26 | 90.56 | 84.83 | 0.85 | 0.68 |
| 5 | 92.60 | 91.65 | 92.72 | 95.80 | 91.48 | 0.93 | 0.85 |
| 6 | 94.41 | 94.93 | 94.89 | 94.24 | 95.46 | 0.95 | 0.89 |
| 7 | 92.12 | 91.85 | 92.42 | 96.98 | 92.86 | 0.92 | 0.84 |
| 8 | 92.12 | 91.85 | 92.42 | 96.98 | 92.86 | 0.92 | 0.84 |
| 9 | 92.10 | 91.84 | 92.39 | 96.94 | 92.84 | 0.92 | 0.84 |
| 10 | 92.10 | 91.84 | 92.39 | 96.94 | 92.84 | 0.92 | 0.84 |
| 11 | 92.09 | 91.82 | 92.38 | 97.92 | 92.84 | 0.92 | 0.84 |
| 12 | 92.09 | 91.82 | 92.38 | 97.92 | 92.84 | 0.92 | 0.84 |
Performance of ET classifier using 10-fold CV on different sub-feature spaces on S2.
| Features size | Accuracy | Sensitivity | Specificity | AUC | Precision | F1-score | MCC |
|---|---|---|---|---|---|---|---|
| 1 | 75.98 | 76.80 | 75.15 | 77.11 | 76.69 | 0.76 | 0.52 |
| 2 | 82.62 | 76.23 | 89.37 | 91.48 | 88.43 | 0.82 | 0.66 |
| 3 | 83.80 | 79.27 | 88.57 | 93.27 | 88.19 | 0.83 | 0.68 |
| 4 | 91.99 | 90.30 | 93.78 | 98.25 | 93.91 | 0.92 | 0.86 |
| 5 | 96.80 | 97.20 | 96.25 | 97.80 | 97.12 | 0.97 | 0.94 |
| 6 | 98.04 | 97.21 | 98.15 | 98.95 | 98.03 | 0.98 | 0.95 |
| 7 | 96.81 | 97.24 | 96.30 | 98.16 | 97.12 | 0.97 | 0.94 |
| 8 | 96.80 | 97.22 | 96.28 | 98.14 | 97.12 | 0.97 | 0.94 |
| 9 | 96.80 | 97.22 | 96.28 | 98.14 | 97.12 | 0.97 | 0.94 |
| 10 | 96.78 | 97.16 | 96.34 | 97.96 | 96.94 | 0.97 | 0.94 |
| 11 | 96.76 | 97.22 | 96.24 | 97.82 | 96.88 | 0.97 | 0.94 |
| 12 | 96.74 | 96.36 | 97.40 | 98.15 | 96.10 | 0.97 | 0.93 |
P-value and chi-square of ET classifier on different feature selection techniques.
| Classifier + feature selection | Chi-square | P-value (α = 0.05) |
|---|---|---|
| ET + relief | 116.7555 | < 0.00001 |
| ET + FCBF | 114.5667 | < 0.00001 |
| ET + mRMR | 104.2308 | < 0.00001 |
| ET + LASSO | 88.1475 | < 0.00001 |
Classification accuracy of the developed system and other approaches in the literature using heart disease dataset.
| Publications | Approach | Accuracy |
|---|---|---|
| Amin et al. | Hybrid framework | 86.00 |
| Mohan et al. | HRFLM | 88.70 |
| Kumar et al. | ANFIS | 91.00 |
| Samuel et al. | ANN-fuzzy-AHP | 91.10 |
| Liaqat et al. | Stacked SVM | 91.11 |
| Developed model | Intelligent framework (full features) | 92.09 |
| Developed model | Intelligent framework (selected-features) | 94.41 |
Description of CHDD and HHDD datasets.
| S. no | Feature name | Feature code | Description | Range of values |
|---|---|---|---|---|
| 1 | Age | Age | Age in years | 26 < age < 88 |
| 2 | Sex | Sex | 1 = male; 0 = female | 0, 1 |
| 3 | Chest pain type | CPT | 0 = atypical angina; 1 = typical angina; 2 = asymptomatic; 3 = non-anginal pain | 0, 1, 2, 3 |
| 4 | Resting blood pressure | RBP | In mmHg (in the hospital) | 94–200 |
| 5 | Serum cholesterol | SCH | In mg/dl | 120–564 |
| 6 | Fasting blood sugar | FBS | FBS > 120 mg/dl (0 = false; 1 = true) | 0, 1 |
| 7 | Resting electrocardiographic results | RECG | 0 = normal; 1 = having ST-T; 2 = hypertrophy | 0, 1, 2 |
| 8 | Thallium scan | THA | 0 = normal; 1 = fixed defect; 2 = reversible defect | 0, 1, 2 |
| 9 | Number of major vessels colored by fluoroscopy | VCA | – | 0, 1, 2, 3 |
| 10 | The slope of peak exercise ST segments | PES | 0 = upsloping; 1 = flat/no slope; 2 = downsloping | 0, 1, 2 |
| 11 | Old peak | OPK | – | 0–6.5 |
| 12 | Exercise-induced angina | EIA | 0 = no; 1 = yes | 0, 1 |
| 13 | Maximum heart rate | MHR | – | 70–204 |
Figure 6. An intelligent hybrid framework for the prediction of heart disease.
Selected features by FCBF algorithm and their scores.
| S. no. | Features | Feature codes | Score |
|---|---|---|---|
| 1 | 8 | THA | 0.230 |
| 2 | 3 | CPT | 0.177 |
| 3 | 12 | EIA | 0.171 |
| 4 | 9 | VCA | 0.166 |
| 5 | 10 | PES | 0.109 |
| 6 | 7 | RES | 0.024 |
Figure 7. Feature rankings by the four feature selection algorithms (FCBF, LASSO, mRMR, Relief).
Selected features by mRMR algorithm and their scores.
| S. no. | Features | Feature codes | Score |
|---|---|---|---|
| 1 | 3 | PES | 0.59 |
| 2 | 5 | CPT | 0.58 |
| 3 | 10 | OPK | 0.57 |
| 4 | 9 | VCA | 0.53 |
| 5 | 2 | SEX | 0.52 |
| 6 | 8 | THA | 0.48 |
Selected features by LASSO algorithm and their scores.
| S. no. | Features | Feature codes | Score |
|---|---|---|---|
| 1 | 2 | SEX | 0.20 |
| 2 | 9 | VCA | 0.19 |
| 3 | 12 | EIA | 0.18 |
| 4 | 3 | CPT | 0.13 |
| 5 | 10 | PES | 0.11 |
| 6 | 8 | THA | 0.10 |
Selected features by relief algorithm and their scores.
| S. no. | Features | Feature codes | Score |
|---|---|---|---|
| 1 | 3 | CPT | 0.184 |
| 2 | 1 | Age | 0.109 |
| 3 | 6 | FBS | 0.102 |
| 4 | 9 | VCA | 0.078 |
| 5 | 8 | THA | 0.064 |
| 6 | 13 | MHR | 0.062 |
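The Relief scores above weight each feature by how well it separates a sample from its nearest neighbor of the other class compared with its nearest neighbor of the same class. The following is a minimal sketch of the basic two-class Relief update in plain Python. It is an assumption that this matches the variant used here: the source does not say which Relief variant, distance measure, or sampling scheme was applied. The function name `relief_weights` and the toy data are hypothetical.

```python
def relief_weights(X, y):
    """Basic two-class Relief, visiting every sample once (no random sampling)."""
    n, d = len(X), len(X[0])
    # Scale each feature to [0, 1] so per-feature differences are comparable.
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]
    Z = [[(row[j] - lo[j]) / span[j] for j in range(d)] for row in X]

    def dist(a, b):
        # Manhattan distance on the scaled features.
        return sum(abs(u - v) for u, v in zip(a, b))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        near_hit = min(hits, key=lambda k: dist(Z[i], Z[k]))
        near_miss = min(misses, key=lambda k: dist(Z[i], Z[k]))
        for j in range(d):
            # Reward features that differ on the nearest miss,
            # penalize features that differ on the nearest hit.
            w[j] += (abs(Z[i][j] - Z[near_miss][j])
                     - abs(Z[i][j] - Z[near_hit][j])) / n
    return w


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.6]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X, y)
```

Ranking features by weight and keeping the top six would give a subset of the same size as the one listed in the table.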
Confusion matrix.
| | Predicted (no disease) | Predicted (heart disease) |
|---|---|---|
| Actual (no disease) | TN | FP |
| Actual (heart disease) | FN | TP |
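The metrics reported in the tables above can all be derived from the four cells of this confusion matrix. The following is a minimal sketch using the standard definitions, with heart disease as the positive class. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def classification_metrics(tn, fp, fn, tp):
    """Standard binary metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tn=40, fp=10, fn=5, tp=45)
```

The tables report accuracy, sensitivity, specificity, and precision as percentages, so these values would be multiplied by 100 to compare; F1-score and MCC are reported on their native scales. The sketch assumes every class has at least one actual and one predicted sample, since the ratios are otherwise undefined.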