Alabi Waheed Banjoko, Kawthar Opeyemi Abdulazeez.
Abstract
BACKGROUND: Computerised classification and prediction of heart disease can help medical personnel reach a fast and accurate diagnosis. This study presents an efficient classification method for predicting heart disease using a data-mining algorithm.
Keywords: biserial correlation; cross-validation; heart disease; splitting ratios; weighted support vector machine
Year: 2021 PMID: 35115894 PMCID: PMC8793974 DOI: 10.21315/mjms2021.28.5.12
Source DB: PubMed Journal: Malays J Med Sci ISSN: 1394-195X
Description of the variables in the heart disease data
| S/no | Factors | Description | Factor levels |
|---|---|---|---|
| 1 | Age | Patient age in years | Continuous |
| 2 | Sex | Patient sex | 1 = male |
| 3 | CP | Chest pain type | 1 = typical angina |
| 4 | RBP | Resting blood pressure | Continuous |
| 5 | Chol | Serum cholesterol in mg/dL | Continuous |
| 6 | FBS | Fasting blood sugar in mg/dL | 1 = ≥ 120 mg/dL |
| 7 | RECGR | Resting electrocardiographic results | 0 = normal |
| 8 | MHRA | Maximum heart rate achieved | Continuous |
| 9 | EXANG | Exercise-induced angina | 0 = no |
| 10 | OLDPK | Depression induced by exercise relative to rest | Continuous |
| 11 | SLOPE | Slope of the peak exercise | 1 = up sloping |
| 12 | CA | Number of major vessels | 0 – 3 values |
| 13 | THAL | Defect type | 3 = normal |
Angiographic test result
| Heart disease absent (−) | Heart disease present (+) | Total |
|---|---|---|
| 160 (53.9%) | 137 (46.1%) | 297 (100%) |
Clinical characteristics of the continuous variables for all 297 heart disease patients
| Variable | Minimum | Maximum | Mean (SD) |
|---|---|---|---|
| Age | 29 | 77 | 55.5 (9.1) |
| Chol | 126 | 564 | 247.4 (52.0) |
| MHRA | 71 | 202 | 149.6 (22.9) |
| RBP | 94 | 200 | 131.7 (17.8) |
| OLDPK | 0 | 6.2 | 1.1 (1.2) |
Figure 1. Box plot of the continuous variables in Table 3. ‘No’ and ‘Yes’ indicate absence (−1) and presence (+1) of heart disease, respectively
Summary of the clinical characteristics of the categorical variables for all 297 heart disease patients
| Factors | Factor levels | Frequency (Percentage) |
|---|---|---|
| Total number of patients | 297 | |
| Sex | Male | 201 (67.7%) |
| | Female | 96 (32.3%) |
| CP | Typical angina | 23 (7.7%) |
| | Atypical angina | 49 (16.5%) |
| | Non-angina pain | 83 (27.9%) |
| | Asymptomatic | 142 (47.8%) |
| FBS | ≥120 mg/dL | 254 (85.5%) |
| | < 120 mg/dL | 43 (14.5%) |
| RECGR | Normal | 147 (49.5%) |
| | Abnormality | 4 (1.3%) |
| | Left ventricular hypertrophy | 146 (49.2%) |
| EXANG | No | 200 (67.3%) |
| | Yes | 97 (32.7%) |
| SLOPE | Up sloping | 139 (46.8%) |
| | Flat | 137 (46.1%) |
| | Down sloping | 21 (7.1%) |
| CA | 0 | 174 (58.6%) |
| | 1 | 65 (21.9%) |
| | 2 | 38 (12.8%) |
| | 3 | 20 (6.7%) |
| THAL | Normal | 164 (55.2%) |
| | Fixed | 18 (6.1%) |
| | Reversible | 115 (38.7%) |
| NUM | Yes (Presence of heart disease) | 137 (46.1%) |
| | No (Absence of heart disease) | 160 (53.9%) |
A typical 2 × 2 confusion matrix for binary response data
| Predicted class (P) | True class (T) = 1 | True class (T) = −1 | Marginal total |
|---|---|---|---|
| 1 | TP | FP | TP + FP |
| −1 | FN | TN | FN + TN |
| Marginal total | TP + FN | FP + TN | N |
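The cells of the confusion matrix above can be turned into performance indices with a few lines of arithmetic. The sketch below is illustrative only: it assumes that Se, Sp, P+ and P− in the performance table denote sensitivity, specificity, positive predictive value and negative predictive value, and that MER is the misclassification error rate (100 − ACC). JI is left out because its definition is not given in this excerpt. The function name `binary_metrics` and the example counts are hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return performance indices (in %) from 2 x 2 confusion-matrix counts.

    tp, fp, fn, tn follow the layout of the confusion matrix:
    rows are predicted classes, columns are true classes.
    """
    n = tp + fp + fn + tn
    acc = (tp + tn) / n
    return {
        "ACC": 100 * acc,            # overall accuracy
        "MER": 100 * (1 - acc),      # misclassification error rate
        "Se": 100 * tp / (tp + fn),  # sensitivity (assumed meaning of Se)
        "Sp": 100 * tn / (tn + fp),  # specificity (assumed meaning of Sp)
        "P+": 100 * tp / (tp + fp),  # positive predictive value (assumed)
        "P-": 100 * tn / (tn + fn),  # negative predictive value (assumed)
    }


# Hypothetical counts for illustration only; they are not taken from the study.
metrics = binary_metrics(tp=40, fp=5, fn=10, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

The function does not guard against empty rows or columns (for example, no predicted positives), so a real evaluation loop would need to handle division by zero.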
Degree of relationship and weight of each predictor
| Variables in the data | Degree of relationship | Weight | Rank |
|---|---|---|---|
| Age | 0.2271 | 0.0581 | 9 |
| Sex | 0.2785 | 0.0712 | 8 |
| CP | 0.4089 | 0.1046 | 6 |
| RBP | 0.1535 | 0.0393 | 11 |
| Chol | 0.0803 | 0.0205 | 12 |
| FBS | 0.0032 | 0.0008 | 13 |
| RECGR | 0.1663 | 0.0425 | 10 |
| MHRA | 0.4238 | 0.1084 | 4 |
| EXANG | 0.4214 | 0.1078 | 5 |
| OLDPK | 0.4241 | 0.1085 | 3 |
| SLOPE | 0.3331 | 0.0852 | 7 |
| CA | 0.4632 | 0.1185 | 2 |
| THAL | 0.5266 | 0.1347 | 1 |
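The weights in this table are numerically consistent with dividing each degree-of-relationship value by the sum of all thirteen values (3.9100), so that the weights add up to 1. This is an observation about the tabulated numbers, not a statement of the authors' exact procedure. The sketch below reproduces the weights and ranks under that assumption; the variable names are hypothetical.

```python
# Degree-of-relationship values copied from the table.
relationship = {
    "Age": 0.2271, "Sex": 0.2785, "CP": 0.4089, "RBP": 0.1535,
    "Chol": 0.0803, "FBS": 0.0032, "RECGR": 0.1663, "MHRA": 0.4238,
    "EXANG": 0.4214, "OLDPK": 0.4241, "SLOPE": 0.3331, "CA": 0.4632,
    "THAL": 0.5266,
}

# Assumed rule: weight = value / sum of all values, so the weights sum to 1.
total = sum(relationship.values())
weights = {name: value / total for name, value in relationship.items()}

# Rank 1 goes to the predictor with the largest weight.
ordered = sorted(weights, key=weights.get, reverse=True)
ranks = {name: position for position, name in enumerate(ordered, start=1)}

for name in ordered:
    print(f"{ranks[name]:>2}  {name:<6} {weights[name]:.4f}")
```

Under this reading, THAL, CA and OLDPK carry the most weight and FBS contributes almost nothing.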
Performance measures of w-SVM on the heart disease classification, by splitting ratio (%)
| Performance index (%) | 95:5 | 90:10 | 80:20 | 75:25 | 50:50 |
|---|---|---|---|---|---|
| ACC | 90.00 | 90.62 | 90.53 | 90.49 | 88.62 |
| MER | 10.00 | 9.38 | 9.47 | 9.51 | 11.38 |
| Se | 90.68 | 90.95 | 90.99 | 90.90 | 89.91 |
| Sp | 90.34 | 91.02 | 90.72 | 90.78 | 88.50 |
| P+ | 83.30 | 87.41 | 87.99 | 87.34 | 85.29 |
| P− | 89.95 | 90.54 | 90.53 | 90.53 | 89.82 |
| JI | 81.67 | 82.66 | 82.89 | 82.80 | 80.04 |
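To make the splitting ratios concrete, the sketch below partitions the 297 patient records at each ratio in the table. It assumes the first number of each ratio is the training share and the second the test share, and it uses a single seeded random shuffle. The excerpt does not describe the study's actual resampling scheme, so the function `split_indices` and the seed are hypothetical.

```python
import random

N_PATIENTS = 297  # sample size reported in the tables
RATIOS = [(95, 5), (90, 10), (80, 20), (75, 25), (50, 50)]


def split_indices(n, train_pct, seed=0):
    """Shuffle the indices 0..n-1 and cut them into (train, test) lists."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_train = round(n * train_pct / 100)  # assumed: first number = training share
    return indices[:n_train], indices[n_train:]


split_sizes = {}
for train_pct, test_pct in RATIOS:
    train, test = split_indices(N_PATIENTS, train_pct)
    split_sizes[f"{train_pct}:{test_pct}"] = (len(train), len(test))
    print(f"{train_pct}:{test_pct} -> {len(train)} training, {len(test)} test")
```

At 95:5 the test set holds only about 15 patients, so a single split at that ratio gives a noisy estimate; averaging over repeated splits is the usual way to stabilise such figures.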
Figure 2. Flow chart of the w-SVM prediction algorithm for the heart disease data
Figure 3. MER results of w-SVM, SVM, RF and NB
Accuracy of Cleveland heart disease prediction with different classifiers
| Authors | Year | Classifier used | Accuracy (%) |
|---|---|---|---|
| Otoom et al. | 2015 | BayesNet | 84.50 |
| | | SVM | 85.10 |
| | | Functional trees | 84.50 |
| Vembandasamy et al. | 2015 | Naïve Bayes | 86.42 |
| Dwivedi | 2018 | Naïve Bayes | 83.00 |
| | | Classification trees | 77.00 |
| | | K-NN | 80.00 |
| | | Logistic regression | 85.00 |
| | | SVM | 82.00 |
| | | ANN | 84.00 |
| Deepika et al. | 2016 | Naïve Bayes | 93.85 |
| | | Decision tree | 92.59 |
| | | SVM | 95.20 |
| | | ANN | 94.27 |
| Zriqat et al. | 2017 | Decision tree | 99.01 |
| | | Naïve Bayes | 78.88 |
| | | Discriminant | 83.50 |
| | | Random forest | 93.40 |
| | | SVM | 76.57 |
| Rajdhan et al. | 2020 | Decision tree | 81.97 |
| | | Logistic regression | 85.25 |
| | | Random forest | 90.16 |
| | | Naïve Bayes | 85.25 |
| Proposed | 2021 | w-SVM | 90.53 |