T R Mahesh, V Dhilip Kumar, V Vinoth Kumar, Junaid Asghar, Oana Geman, G Arulkumaran, N Arun.
Abstract
Advances in technology have made many features available for heart disease diagnosis. Large data sets, however, have several drawbacks, including limited storage capacity and long access and processing times. Early diagnosis of heart problems is crucial for medical therapy. Heart disease is a devastating condition that is increasing quickly in both developed and developing countries and is a cause of death. In heart disease, the heart fails to supply enough blood to the different parts of the body for them to perform their regular functions. Early and accurate diagnosis of this condition is critical for averting further damage and saving patients' lives. In this work, machine learning (ML) is used to determine whether a person has cardiac disease. Both types of ensemble classifiers are implemented: homogeneous and heterogeneous (the latter formed by combining two separate classifiers). The Synthetic Minority Oversampling Technique (SMOTE) is employed as a data mining preprocessing step to cope with class imbalance and noise. The proposed work has two phases: SMOTE is applied first to reduce the impact of data imbalance, and the data are then classified using Naive Bayes (NB), decision tree (DT) algorithms, and their ensembles. The experimental results demonstrate that the AdaBoost-Random Forest classifier achieves 95.47% accuracy in the early detection of heart disease.
Year: 2022 PMID: 35479597 PMCID: PMC9038394 DOI: 10.1155/2022/9005278
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1: Proposed flow diagram.
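
The first phase of the proposed flow, SMOTE, creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the function name `smote_oversample`, the parameters `n_new`, `k`, and `seed`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, seed=0):
    """Return n_new synthetic samples built from minority samples X_min.

    Hypothetical helper: each synthetic point lies on the line segment
    between a minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)

    # Pairwise Euclidean distances within the minority class.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample, one of its neighbours, and a random gap in [0, 1).
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])


# Toy minority class with 5 samples (made-up values, not the study's data).
X_minority = np.array(
    [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7]]
)
synthetic = smote_oversample(X_minority, n_new=15, k=3, seed=42)
X_balanced_minority = np.vstack([X_minority, synthetic])
print(X_balanced_minority.shape)
```

In practice a maintained implementation would normally be preferred over a hand-written one, and oversampling is usually applied to the training split only so that synthetic points do not leak into the evaluation data.
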
Attributes of the dataset.
| Sl. No. | Features | Description | Values |
|---|---|---|---|
| 1 | Age | Age in years | Continuous |
| 2 | Sex | Gender of patient | Male/female |
| 3 | CP | Chest pain | Four types |
| 4 | Trestbps | Resting blood pressure | Continuous |
| 5 | Chol | Serum cholesterol | Continuous |
| 6 | FBS | Fasting blood sugar | ≤120 or >120 mg/dl |
| 7 | Restecg | Resting electrocardiograph | Five values |
| 8 | Thalach | Maximum heart rate achieved | Continuous |
| 9 | Exang | Exercise induced angina | Yes/no |
| 10 | Oldpeak | ST depression induced by exercise relative to rest | Continuous |
| 11 | Slope | Slope of peak exercise ST segment | Up/flat/down |
| 12 | Ca | Number of major vessels colored by fluoroscopy | 0–3 |
| 13 | Thal | Defect type | Reversible/fixed/normal |
| 14 | Num (disorder) | Heart disease | Not present (“NO”)/present in the four major types (“YES”) |
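
The last row of the table implies that the multi-valued `Num` attribute is collapsed into a binary target. A minimal sketch of that mapping follows, assuming `Num` is coded as an integer from 0 (not present) to 4 (the four disease types); that coding and the helper name `num_to_label` are assumptions, since the table does not give the numeric codes.

```python
def num_to_label(num: int) -> str:
    """Map the assumed 0-4 'Num' code to the binary label used in the table."""
    if not 0 <= num <= 4:
        raise ValueError(f"unexpected Num value: {num}")
    # 0 means no disease; 1-4 are the four disease types, all mapped to YES.
    return "YES" if num > 0 else "NO"


# Made-up example codes, not records from the dataset.
labels = [num_to_label(v) for v in [0, 2, 0, 4, 1]]
print(labels)  # ['NO', 'YES', 'NO', 'YES', 'YES']
```
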
Figure 2: Heatmap depiction of the dataset.
Single classifier evaluation comparison.
| Performance metrics | Naive Bayes | AltDTree | RF | RedEPTree | CART |
|---|---|---|---|---|---|
| TTBM (sec) | 4.56 | 60.18 | 2.11 | 10.25 | 52.24 |
| Accuracy (%) | 78.6 | 93.56 | 92.45 | 79.23 | 78.67 |
| MAE | 0.60 | 0.28 | 0.27 | 0.26 | 0.27 |
| RMSE | 0.83 | 0.41 | 0.42 | 0.42 | 0.56 |
| RAE | 120 | 67.71 | 77.87 | 79.12 | 68.91 |
| RRSE | 127.41 | 95.33 | 82.92 | 97.89 | 98.34 |
| F1-score | 0.3 | 0.85 | 0.84 | 0.83 | 0.81 |
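
The error rows in the table (MAE, RMSE, RAE, RRSE) are not defined in this record. The sketch below computes them under their commonly used definitions, in which RAE and RRSE are expressed as percentages relative to a baseline that always predicts the mean of the true values. Whether the study used exactly these definitions is an assumption, and the function name `error_metrics` and the sample values are illustrative only.

```python
import math


def error_metrics(y_true, y_pred):
    """Compute MAE, RMSE, RAE (%) and RRSE (%) under the assumed definitions."""
    n = len(y_true)
    mean_true = sum(y_true) / n

    abs_err = sum(abs(p - t) for t, p in zip(y_true, y_pred))
    sq_err = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))

    # Baseline errors: always predicting the mean of the true values.
    base_abs = sum(abs(t - mean_true) for t in y_true)
    base_sq = sum((t - mean_true) ** 2 for t in y_true)

    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE": 100 * abs_err / base_abs,
        "RRSE": 100 * math.sqrt(sq_err / base_sq),
    }


# Made-up labels and predicted probabilities, not the study's data.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.6, 0.1, 0.4, 0.8, 0.3]
metrics = error_metrics(y_true, y_prob)
print(metrics)
```

Under these definitions, an RAE or RRSE above 100 means the classifier does worse than the mean-predicting baseline.
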
Figure 3: Accuracy prediction for single classifiers.
Figure 4: Error rates of individual classifiers.
AdaBoost classifier.
| Performance metrics | AB-NB | AB-AltDTree | AB-RF | AB-RedEPTree | AB-CART |
|---|---|---|---|---|---|
| TTBM (sec) | 18.32 | 30.01 | 10.34 | 64.35 | 295.45 |
| Accuracy (%) | 80.6 | 93.56 | 95.47 | 82.23 | 81.67 |
| MAE | 0.54 | 0.21 | 0.14 | 0.21 | 0.20 |
| RMSE | 0.76 | 0.43 | 0.38 | 0.41 | 0.41 |
| RAE | 129.79 | 57.78 | 35.87 | 45.19 | 41.61 |
| RRSE | 155.62 | 96.23 | 65.47 | 91.03 | 91.08 |
| F1-score | 0.81 | 0.94 | 0.98 | 0.83 | 0.87 |
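
The AB-RF column corresponds to AdaBoost with a Random Forest as the base learner. A hedged sketch of such a homogeneous ensemble using scikit-learn is shown below. The synthetic data from `make_classification`, the 13-feature shape, and all hyperparameters are assumptions for illustration; the study's tooling and settings are not given in this record, so the numbers in the table will not be reproduced by this code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with 13 input features (assumed, not the real dataset).
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Small Random Forest used as the base learner that AdaBoost reweights.
base_forest = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0)

# The base learner is passed positionally because its keyword name
# differs between scikit-learn versions.
ab_rf = AdaBoostClassifier(base_forest, n_estimators=5, random_state=0)
ab_rf.fit(X_train, y_train)

ab_rf_accuracy = ab_rf.score(X_test, y_test)
print(f"AB-RF accuracy on the synthetic hold-out set: {ab_rf_accuracy:.3f}")
```
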
Figure 5: Accuracy of AdaBoost classifier.
Figure 6: AdaBoost classifier error rate.
Heterogeneous ensemble classifiers.
| Performance metrics | NB + AltDTree | NB + RF | AltDTree + RF | RF + RedEPTree | RF + CART | AltDTree + RedEPTree | AltDTree + CART |
|---|---|---|---|---|---|---|---|
| TTBM (sec) | 30.03 | 32.05 | 398.12 | 7.89 | 7.34 | 357.77 | 598.02 |
| Accuracy (%) | 76.45 | 76.05 | 70.12 | 85.45 | 86.29 | 74.49 | 71.29 |
| MAE | 0.42 | 0.43 | 0.37 | 0.35 | 0.34 | 0.37 | 0.41 |
| RMSE | 0.42 | 0.39 | 0.49 | 0.36 | 0.36 | 0.37 | 0.42 |
| RAE | 99.23 | 92.23 | 80.12 | 71.01 | 70.89 | 73.23 | 89.23 |
| RRSE | 98.23 | 97.49 | 101.22 | 91.29 | 90.12 | 93.37 | 99.34 |
| F1-score | 0.74 | 0.75 | 0.68 | 0.84 | 0.85 | 0.73 | 0.69 |
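
Each column in the table pairs two different classifiers. This record does not say how the two models are combined, so the sketch below assumes soft voting (averaging predicted class probabilities) for the NB + RF pair, using scikit-learn. The combination rule, the synthetic data, and the hyperparameters are all assumptions rather than the authors' setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data with 13 input features (assumed, not the real dataset).
X, y = make_classification(n_samples=300, n_features=13, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

# Heterogeneous ensemble: two different model families, combined by
# averaging their predicted class probabilities (soft voting, assumed).
nb_rf = VotingClassifier(
    estimators=[
        ("nb", GaussianNB()),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=1)),
    ],
    voting="soft",
)
nb_rf.fit(X_train, y_train)

nb_rf_accuracy = nb_rf.score(X_test, y_test)
nb_rf_proba = nb_rf.predict_proba(X_test)
print(f"NB + RF accuracy on the synthetic hold-out set: {nb_rf_accuracy:.3f}")
```

Other pairs from the table could be tried by swapping the entries in `estimators`, subject to the same caveat about the unknown combination rule.
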
Figure 7: Accuracy of heterogeneous ensemble classifiers.
Figure 8: Error rates for heterogeneous ensemble classifiers.