Bhanu Prakash Doppala, Debnath Bhattacharyya, Midhunchakkaravarthy Janarthanan, Namkyun Baik.
Abstract
Machine intelligence can convert raw clinical data into an informational source that supports decision making and prediction. As a result, cardiovascular disease can be addressed as early as possible, before it affects a patient's lifespan. Artificial intelligence has taken research on disease diagnosis and identification to another level. Although several methods and models already exist, classification and forecast accuracy can still be improved, and a well-chosen combination of models and features is one way to achieve this. To that end, this paper proposes a reliable ensemble model. The proposed model achieved an accuracy of 96.75% on the cardiovascular disease dataset obtained from the Mendeley Data Center, 93.39% on the comprehensive dataset collected from IEEE DataPort, and 88.24% on the Cleveland dataset. With this model, we can support the safety and health security of the individual.
Year: 2022 | PMID: 35299686 | PMCID: PMC8923755 | DOI: 10.1155/2022/2585235
Source DB: PubMed | Journal: J Healthc Eng | ISSN: 2040-2295 | Impact factor: 2.682
Existing models' accuracy comparison.
| Authors | Model used | Accuracy (%) |
|---|---|---|
| Al-Milli | NN | 81 |
| Sonawane and Patil | MPNN | 98 |
| Dai et al. | AdaBoost | 82 |
| Radhimeenakshi | SVM, ANN | 86 |
| Saqlain et al. | LR and RF | 80.69 |
| Karayılan and Kılıç | ANN | 95 |
| Esfahani and Ghazanfari | DT | 86.80 |
| Cheng and Chiu | ANN | 82.5 |
| Doppala et al. | Hybrid model | 84.40 |
| Nasarian et al. | Hybrid feature selection | 81.23 |
| Doppala et al. | Ensemble | 85.24 |
| Kumar et al. | CNN | 88 |
| Bayu Adhi et al. | Ensemble | 93.55 |
| Doppala et al. | GA-RBF | 85.40, 94.20 |
| Waqas Nadeem et al. | SVM | 96.23 |
Figure 1. Proposed model architecture.
Dataset attributes' description [33].
| S. no. | Cleveland dataset features | Comprehensive dataset features | Mendeley dataset features | Unit |
|---|---|---|---|---|
| 1 | Age | Age | Age | In years |
| 2 | Sex | Sex | Gender | 0, 1 (0 = female; 1 = male) |
| 3 | cp | Chest pain type | Chest pain | Value 0: typical angina; value 1: atypical angina; value 2: non-anginal pain; value 3: asymptomatic |
| 4 | trestbps | Resting bps | Resting BP | 94–200 (in mmHg) |
| 5 | chol | Cholesterol | Serum cholesterol | 126–564 (in mg/dl) |
| 6 | fbs | Fasting blood sugar | Fasting blood sugar | 0, 1; fasting blood sugar > 120 mg/dl (0 = false; 1 = true) |
| 7 | restecg | Resting ECG | Restingrelectro | 0, 1, 2 (value 0: normal; value 1: having ST-T-wave abnormality (T-wave inversions and/or ST elevation or depression of >0.05 mV); value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria) |
| 8 | thalach | Max heart rate | Max heart rate | 71–202 (beats per minute) |
| 9 | exang | Exercise angina | Exercise angina | 0, 1 (0 = no; 1 = yes) |
| 10 | Oldpeak | Oldpeak | Oldpeak | 0–6.2 (ST depression induced by exercise relative to rest) |
| 11 | Slope | ST slope | Slope | 1, 2, 3 (1 = upsloping; 2 = flat; 3 = downsloping) |
| 12 | ca | — | No. of major vessels | 0, 1, 2, 3 |
| 13 | thal | — | — | Thalassemia: 3 = normal; 6 = fixed defect; 7 = reversible defect |
| 14 | Target | Target | Target | 0, 1 (0 = absence of heart disease; 1 = presence of heart disease) |
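As a concrete companion to the attribute table, the sketch below (Python/scikit-learn) loads the Cleveland data and separates the features from the binary target. The file name, column order, and split ratio are assumptions based on the public UCI export, not details specified in the paper.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Column order follows the public UCI "processed.cleveland.data" export;
# the file name and preprocessing below are illustrative assumptions,
# not details taken from the paper.
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]

df = pd.read_csv("processed.cleveland.data", names=COLUMNS, na_values="?")
df = df.dropna()  # 'ca' and 'thal' carry a few '?' (missing) entries

X = df.drop(columns="target")
y = (df["target"] > 0).astype(int)  # collapse disease grades 1-4 into "present"

# Hold out a stratified test split, mirroring the Train_data/Test_data
# step of the paper's algorithm.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```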
Figure 2. Heatmap of the heart disease dataset obtained from the Cleveland repository.
Figure 3. Heatmap of the dataset obtained from IEEE DataPort.
Figure 4. Heatmap of the cardiovascular disease dataset obtained from the Mendeley Data Center.
Figure 5. Representation of the decision tree [40].
Figure 6. Representation of the logistic function [41].
Figure 7. Representation of naive Bayes [38].
Proposed algorithm.

| Algorithm |
|---|
| Train_data, Test_data = split(heart_disease_data, labels) |
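Only the data-partition step of the published pseudocode survives above, so the following is a minimal sketch of one plausible reading of the proposed ensemble: a soft-voting combination of several of the benchmark classifiers evaluated below. The choice of member models, the soft-voting scheme, and all hyperparameters are assumptions rather than the authors' exact configuration; the split variables come from the earlier loading sketch.

```python
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed ensemble members: the paper benchmarks these base learners but
# does not state here which ones (or what weights) form the final model.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("gb", GradientBoostingClassifier(random_state=42)),
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("nb", GaussianNB()),
    ],
    voting="soft",  # average the members' predicted class probabilities
)

ensemble.fit(X_train, y_train)
print(f"Held-out accuracy: {ensemble.score(X_test, y_test):.4f}")
```

Soft voting lets well-calibrated members outvote uncertain ones, which is one common way an ensemble can exceed each base classifier's individual accuracy, as the tables below report for the proposed model.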
Achieved accuracies using benchmark classifiers.
| Classification technique | Accuracy (%) achieved with the Cleveland dataset | Accuracy (%) achieved with the comprehensive dataset | Accuracy (%) achieved with the Mendeley dataset |
|---|---|---|---|
| Decision tree | 77.86 | 82.56 | 95 |
| Random forest | 78.68 | 90.75 | 95.12 |
| Naive Bayes | 81.14 | 84.24 | 94.25 |
| Logistic regression | 81.96 | 84.03 | 95.25 |
| Support vector machine | 79.05 | 81.52 | 93.15 |
| Gradient boosting | 81.14 | 86.13 | 95.15 |
| XGBoost | 80.32 | 88.23 | 96.12 |
Proposed model performance representation.
| Classification technique | Accuracy (%) achieved with the Cleveland dataset | Accuracy (%) achieved with the comprehensive dataset | Accuracy (%) achieved with the Mendeley dataset |
|---|---|---|---|
| Proposed ensemble model | 88.24 | 93.39 | 96.75 |
Performance metrics of all the machine learning models.
| Classification technique | Accuracy (%) achieved with the Cleveland dataset | Sensitivity | Specificity | Precision | Recall | F1-score | MCC |
|---|---|---|---|---|---|---|---|
| Decision tree | 77.86 | 0.81 | 0.73 | 0.77 | 0.81 | 0.79 | 0.55 |
| Random forest | 78.68 | 0.78 | 0.77 | 0.80 | 0.78 | 0.79 | 0.55 |
| Naive Bayes | 81.14 | 0.87 | 0.73 | 0.79 | 0.87 | 0.83 | 0.62 |
| Logistic regression | 81.96 | 0.93 | 0.66 | 0.76 | 0.79 | 0.84 | 0.63 |
| Support vector machine | 79.05 | 0.77 | 0.75 | 0.79 | 0.85 | 0.78 | 0.54 |
| Gradient boosting | 81.14 | 0.93 | 0.66 | 0.76 | 0.93 | 0.84 | 0.63 |
| XGBoost | 80.32 | 0.87 | 0.71 | 0.78 | 0.87 | 0.82 | 0.60 |
| Proposed ensemble model | 88.24 | 0.91 | 0.84 | 0.85 | 0.90 | 0.88 | 0.76 |

| Classification technique | Accuracy (%) achieved with the comprehensive dataset | Sensitivity | Specificity | Precision | Recall | F1-score | MCC |
|---|---|---|---|---|---|---|---|
| Decision tree | 82.56 | 0.79 | 0.85 | 0.83 | 0.79 | 0.81 | 0.65 |
| Random forest | 90.75 | 0.93 | 0.88 | 0.88 | 0.93 | 0.90 | 0.81 |
| Naive Bayes | 84.24 | 0.85 | 0.82 | 0.82 | 0.85 | 0.84 | 0.68 |
| Logistic regression | 84.03 | 0.87 | 0.80 | 0.81 | 0.87 | 0.84 | 0.68 |
| Support vector machine | 81.52 | 0.83 | 0.82 | 0.82 | 0.84 | 0.83 | 0.69 |
| Gradient boosting | 86.13 | 0.92 | 0.79 | 0.81 | 0.92 | 0.86 | 0.72 |
| XGBoost | 83.23 | 0.91 | 0.84 | 0.85 | 0.91 | 0.88 | 0.76 |
| Proposed ensemble model | 93.39 | 0.94 | 0.89 | 0.99 | 0.88 | 0.90 | 0.85 |

| Classification technique | Accuracy (%) achieved with the Mendeley dataset | Sensitivity | Specificity | Precision | Recall | F1-score | MCC |
|---|---|---|---|---|---|---|---|
| Decision tree | 95 | 0.95 | 0.94 | 0.96 | 0.95 | 0.95 | 0.88 |
| Random forest | 95.12 | 0.94 | 0.96 | 0.97 | 0.94 | 0.96 | 0.90 |
| Naive Bayes | 94.25 | 0.95 | 0.90 | 0.94 | 0.95 | 0.94 | 0.86 |
| Logistic regression | 95.25 | 0.97 | 0.95 | 0.97 | 0.97 | 0.97 | 0.92 |
| Support vector machine | 93.15 | 0.95 | 0.90 | 0.93 | 0.95 | 0.93 | 0.85 |
| Gradient boosting | 95.15 | 0.95 | 0.95 | 0.97 | 0.95 | 0.96 | 0.90 |
| XGBoost | 96.12 | 0.96 | 0.95 | 0.97 | 0.96 | 0.96 | 0.92 |
| Proposed ensemble model | 96.75 | 0.96 | 0.97 | 0.98 | 0.96 | 0.97 | 0.93 |
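Every metric column above can be derived from the 2×2 confusion matrix. Reusing the fitted ensemble and test split from the earlier sketches, the snippet below reproduces each reported metric with scikit-learn; sensitivity equals recall on the positive class, and specificity, which has no dedicated scikit-learn helper, is taken from the matrix directly.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    matthews_corrcoef,
    precision_score,
    recall_score,
)

y_pred = ensemble.predict(X_test)

# Specificity = TN / (TN + FP), read off the confusion matrix.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

print(f"Accuracy:    {accuracy_score(y_test, y_pred) * 100:.2f}%")
print(f"Sensitivity: {tp / (tp + fn):.2f}")
print(f"Specificity: {tn / (tn + fp):.2f}")
print(f"Precision:   {precision_score(y_test, y_pred):.2f}")
print(f"Recall:      {recall_score(y_test, y_pred):.2f}")
print(f"F1-score:    {f1_score(y_test, y_pred):.2f}")
print(f"MCC:         {matthews_corrcoef(y_test, y_pred):.2f}")
```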
Figure 8. Classifiers' performance on the Cleveland dataset.
Figure 9. Classifiers' performance on the comprehensive dataset.
Figure 10. Classifiers' performance on the Mendeley dataset.
Figure 11. ROC curve for all models on the Cleveland dataset.
Figure 12. ROC curve for all models on the comprehensive dataset.
Figure 13. ROC curve for all models on the Mendeley dataset.