Ch Anwar Ul Hassan1, Jawaid Iqbal2, Rizwana Irfan3, Saddam Hussain4, Abeer D Algarni5, Syed Sabir Hussain Bukhari6, Nazik Alturki7, Syed Sajid Ullah8.
Abstract
Coronary heart disease is one of the major causes of death around the globe. Predicting heart disease is one of the most challenging tasks in clinical data analysis. Machine learning (ML) is useful for diagnostic assistance, supporting decision making and prediction on the basis of the data produced by the healthcare sector globally, and ML techniques are already employed for disease prediction in the medical field. In this regard, numerous studies have been published on heart disease prediction using ML classifiers. In this paper, we used eleven ML classifiers to identify key features that improve the predictability of heart disease. To build the prediction model, various feature combinations and well-known classification algorithms were used. Gradient boosted trees and the multilayer perceptron each achieved about 95% accuracy in the heart disease prediction model. Random Forest performed best, with an accuracy of 96%.
Keywords: disease prediction; heart disease dataset; machine learning; supervised learning
Year: 2022 PMID: 36236325 PMCID: PMC9573101 DOI: 10.3390/s22197227
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
State of the Art.
| Author | Year | Methods/Classifiers | Datasets | Evaluation | Highest Accuracy% |
|---|---|---|---|---|---|
| [ | 2022 | LR, NB, RF REP, M5P Tree, J48, JRIP | Hungarian and Statlog (heart) dataset | RMSE, MAE | RF 99.81% |
| [ | 2021 | RF, DT, LR | UCI Cleveland database | Accuracy | LR 92.10% |
| [ | 2021 | AB, ET, LR, MNB, CART, LDA, SVM, RF, XGB | Heart Dataset | Accuracy | AB 90% |
| [ | 2021 | SVM, NB, DT | Heart Dataset | Accuracy | DT 90% |
| [ | 2022 | KNN, DT, LR, NB, SVM | Heart Dataset | Accuracy, Specificity, Sensitivity, F1-Score | LR 92% |
| [ | 2022 | RF into fetal echocardiography | Congenital heart disease database of 3910 singleton fetuses | Sensitivity, Specificity | Sensitivity 0.85, specificity 0.88 |
| [ | 2022 | LR, Evimp functions, Multivariate adaptive regression | DiScRi dataset | Accuracy, Sensitivity, Specificity | 94.09% |
| [ | 2022 | LR, KNN, SVM, RF | Pathogen, Host feature | Accuracy | RF 99% |
| [ | 2022 | DT, LR, XGB, NB, GB, RF, SVM, PEM | Cardiovascular disease dataset (Mendeley Data Center) | Accuracy | EM 96.75% |
| [ | 2021 | NB, LM, LR, DT, RF, SVM, HRFLM | Heart Cleveland | Accuracy, Precision, Specificity, Sensitivity, F-Measure | HRFLM 88.4% |
| [ | 2021 | RF, LR, KNN, SVM, DT, XGB | Public Health Dataset | Accuracy, Specificity, Sensitivity | SVM 84% |
| [ | 2022 | K-NN, DT, RF, MLP, NB, L-SVM | IoT based Produced Data | Accuracy | L-SVM 92.30% |
| [ | 2022 | DT, NB, KNN, RF, ANN, Ada, GBA | Heart Disease (Kaggle Repository) | Accuracy, Precision, recall, f1-score | RF 86.89% |
| In our Proposed Scheme | |||||
| Proposed Methodology | 2022 | LR, SVM, NB, RF, XGB, DT, NN, RBF, KNN, GBT, MLP | Heart Disease (UCI Repository) | Accuracy, Precision (specificity), Recall (sensitivity), F-Measure | RF 96.28% |
Figure 1. System Working Methodology.
Dataset Attributes Description.
| No. | Feature | Description | Value |
|---|---|---|---|
| 1. | Age | Age of the patient. | Integer. |
| 2. | Sex | Gender | Female = 0, Male = 1 |
| 3. | Chest pain (cp) | Type of chest pain the patient is suffering from. | Typical angina = 1, atypical angina = 2, non-anginal pain = 3, asymptomatic = 4 |
| 4. | Resting Blood Pressure (trestbps) | Resting blood pressure; high blood pressure combined with other factors increases the risk. | Integer or float. |
| 5. | Cholesterol (chol) | Serum cholesterol | Integer or float. |
| 6. | Fasting Blood Sugar (fbs) | Whether fasting blood sugar is more than 120 mg/dL | 0 = false; 1 = true |
| 7. | Resting ECG (restecg) | Resting electrocardiographic result | Normal = 0, left ventricular hypertrophy = 1, ST-T wave abnormality = 2 |
| 8. | Max Heart Rate Achieved (thalach) | Maximum heart rate achieved by the patient. | Integer or float. |
| 9. | Exercise-Induced Angina (exang) | Angina induced by exercise | No = 0, yes = 1 |
| 10. | Oldpeak | Exercise-induced ST depression relative to rest | Integer or float. |
| 11. | Slope | Slope of the peak exercise ST segment | Upsloping = 0, flat = 1, downsloping = 2 |
| 12. | Coronary Artery (ca) | Number of major vessels colored by fluoroscopy. | Integer or float. |
| 13. | Thalassemia (thal) | Normal, fixed defect, or reversible defect | 3 = normal; 6 = fixed defect; 7 = reversible defect |
| 14. | Num(target: Heart Disease predicting attribute) | Heart disease diagnosis (angiographic disease status) | 0 indicates a diameter narrowing of less than 50%, 1 indicates a diameter narrowing of more than 50%. |
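
The coded attributes in the table above can be turned back into readable labels with a small lookup. The following is only a minimal sketch: the code values are copied as listed in the table, while the dictionary layout, the `decode_record` helper, and the example record are hypothetical and do not come from the paper.

```python
# Code maps copied from the attribute table; keys use the abbreviations
# given there. The structure itself is an illustrative assumption.
CODE_MAPS = {
    "sex": {0: "female", 1: "male"},
    "cp": {1: "typical angina", 2: "atypical angina",
           3: "non-anginal pain", 4: "asymptomatic"},
    "fbs": {0: "not above 120 mg/dL", 1: "above 120 mg/dL"},
    "restecg": {0: "normal", 1: "left ventricular hypertrophy",
                2: "ST-T wave abnormality"},
    "exang": {0: "no", 1: "yes"},
    "slope": {0: "upsloping", 1: "flat", 2: "downsloping"},
    "thal": {3: "normal", 6: "fixed defect", 7: "reversible defect"},
    "num": {0: "narrowing below 50%", 1: "narrowing above 50%"},
}


def decode_record(record):
    """Return a copy of `record` with coded fields replaced by labels.

    Numeric fields (age, trestbps, chol, ...) pass through unchanged.
    A code that the table does not list raises ValueError.
    """
    decoded = {}
    for name, value in record.items():
        mapping = CODE_MAPS.get(name)
        if mapping is None:
            decoded[name] = value          # not a coded field
        elif value in mapping:
            decoded[name] = mapping[value]
        else:
            raise ValueError(f"unexpected code {value!r} for {name!r}")
    return decoded


# Hypothetical patient record, invented purely for illustration.
example = {"age": 54, "sex": 1, "cp": 4, "trestbps": 130, "fbs": 0,
           "exang": 1, "slope": 1, "thal": 7, "num": 1}
decoded_example = decode_record(example)
```

Rejecting unlisted codes, rather than passing them through, is a design choice that surfaces rows using a different encoding of the same dataset before they reach a classifier.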
Correlation Matrix Value.
| Attributes | Value |
|---|---|
| Age | 0.225439 |
| Sex | 0.280937 |
| Chest Pain | 0.433798 |
| Fasting Blood Sugar | 0.028046 |
| Resting Blood Pressure | 0.144931 |
| Cholesterol | 0.085239 |
| Exercise-Induced Angina | 0.436757 |
| Max Heart Rate Achieved | 0.421741 |
| Resting ECG | 0.137230 |
| Oldpeak | 0.430696 |
| Slope | 0.345877 |
| Coronary Artery | 0.391724 |
| Thalassemia | 0.344029 |
| Heart Disease Diagnosis | 1.000000 |
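
One way values like those in the table above could be obtained is by correlating each attribute with the diagnosis column. The sketch below assumes the absolute Pearson coefficient; the source does not state which coefficient was used, and the function names and toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def rank_by_target_correlation(columns, target):
    """Rank features by |Pearson r| with the target, strongest first.

    `columns` maps a feature name to its list of values.
    Taking the absolute value is an assumption of this sketch.
    """
    scores = {name: abs(pearson(values, target))
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data, invented for illustration; not taken from the dataset.
toy_columns = {
    "exang": [0, 1, 1, 0, 1, 0],
    "fbs":   [0, 1, 0, 1, 1, 0],
}
toy_target = [0, 1, 1, 0, 1, 0]
ranking = rank_by_target_correlation(toy_columns, toy_target)
```

Ranking features this way mirrors how the table can be read: attributes such as exercise-induced angina and chest pain sit near the top, fasting blood sugar near the bottom.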
Figure 2. Correlation Matrix with a Heatmap.
Figure 3. Heart Disease Status.
Figure 4. Sex vs. heart disease chances.
Figure 5. Heart Disease Dataset Age Statistics.
Figure 6. Chest pain vs. heart disease chances.
Figure 7. Fasting blood sugar vs. disease chances.
Figure 8. Resting ECG vs. heart disease chances.
Figure 9. Exercise-Induced Angina vs. disease chances.
Figure 10. Slope vs. heart disease chances.
Figure 11. Coronary Artery vs. disease chances.
Figure 12. Thalassemia vs. heart disease chances.
Accuracy of ML Classifiers.
| Classifiers | Accuracy | Precision | Recall | F-Measure |
|---|---|---|---|---|
| Logistic Regression | 88.25% | 0.8791 | 0.8825 | 0.8865 |
| Support Vector Regression | 84.97% | 0.8407 | 0.8496 | 0.8437 |
| Naive Bayes | 88.25% | 0.8825 | 0.8854 | 0.8825 |
| Random Forest | 96.28% | 0.9628 | 0.9537 | 0.9668 |
| XGBoost | 88.25% | 0.8786 | 0.8810 | 0.8815 |
| Decision Tree | 84.97% | 0.8497 | 0.8475 | 0.8527 |
| Neural Network | 84.33% | 0.8433 | 0.8501 | 0.8413 |
| k-Nearest Neighbors | 70.21% | 0.7021 | 0.6901 | 0.7101 |
| Gradient Boosted Tree | 95.83% | 0.9493 | 0.9583 | 0.9613 |
| Radial Basis Function | 86.35% | 0.8635 | 0.8644 | 0.8635 |
| Multilayer perceptron | 94.96% | 0.9516 | 0.9506 | 0.9506 |
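
For reference, the four metrics reported in the table above have standard definitions in terms of confusion-matrix counts. The sketch below uses those textbook binary definitions; the function name and the example counts are hypothetical, and the table's values may be averaged across classes, which this sketch does not reproduce.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-measure for a binary classifier.

    tp/fp/fn/tn are true-positive, false-positive, false-negative and
    true-negative counts. Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for 100 test patients, invented for illustration.
metrics = classification_metrics(tp=45, fp=5, fn=3, tn=47)
```

With these made-up counts, accuracy is 92/100, precision is 45/50, and recall is 45/48; the F-measure falls between precision and recall, as a harmonic mean of two values always does.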
Figure 13. ML Classifiers Accuracy.
Figure 14. Random Forest Classifier ROC.
Figure 15. Gradient Boosting Tree Classifier ROC.
Figure 16. Multilayer Perceptron Classifier ROC.