Ebenezer Owusu, Prince Boakye-Sekyerehene, Justice Kwame Appati, Julius Yaw Ludu.
Abstract
Heart diseases are a leading cause of death worldwide and have attracted considerable interest in the scientific community. Because of the high number of sudden deaths associated with them, early detection is critical. This study proposes a boosting Support Vector Machine (SVM) technique as the backbone of computer-aided diagnostic tools for more accurately forecasting heart disease risk levels. The dataset, which contains 13 attributes such as gender, age, blood pressure, and chest pain, is taken from the Cleveland Clinic. In total, there were 303 records, 6 of which had missing values. To clean the data, we removed these 6 records by listwise deletion. Given the size of the data and the fact that the removed records are a purely random subset, this approach had no significant effect on the experiment because it introduced no bias. Salient features are selected using the boosting technique to speed up training and improve accuracy. Using the train/test split approach, the data is then partitioned into training and testing sets. An SVM is then trained and tested on the data, with the C parameter set to 0.05 and a linear kernel function. Logistic regression, Naïve Bayes, decision trees, Multilayer Perceptron, and random forest were used for comparison. The proposed boosting SVM performed exceptionally well, making it a better tool than the existing techniques.
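The cleaning and partitioning steps described in the abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names `listwise_delete` and `train_test_split`, the toy records, and the 0.3 test fraction are hypothetical and are not taken from the paper, which does not state its split ratio here.

```python
import random

def listwise_delete(records):
    """Keep only records in which no field is missing (None)."""
    return [r for r in records if all(v is not None for v in r.values())]

def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (train, test) lists."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy records, made up for illustration (not Cleveland data).
records = [{"age": 40 + i, "chol": 200 + 5 * i, "target": i % 2} for i in range(11)]
records[4]["chol"] = None  # one record with a missing value

complete = listwise_delete(records)      # drops the record with the None field
train, test = train_test_split(complete) # assumed 70/30 split
print(len(records), len(complete), len(train), len(test))
```

Listwise deletion discards an entire record as soon as any one of its fields is missing, which is why it is only safe when the missing records are few and missing at random, as the abstract argues.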
Year: 2021 PMID: 34976036 PMCID: PMC8718315 DOI: 10.1155/2021/3152618
Source DB: PubMed Journal: Comput Intell Neurosci
Comparative performance of the training and testing accuracies of methods.
| Method | Training accuracy (%) | Testing accuracy (%) | Testing time (s) |
|---|---|---|---|
| Random forest | 100 | 83.33 | 3.0 |
| Multilayer Perceptron | 75.36 | 80.0 | 5.8 |
| Decision tree | 92.15 | 83.33 | 4.0 |
| Naïve Bayes | 82.13 | 85.5 | 3.2 |
| Logistic regression | 84.06 | 84.44 | 4.5 |
| Boosting SVM | 99.92 | 99.75 | 2.1 |
Comparative confusion matrices of different methods.
| Method | Confusion matrix | |
|---|---|---|
| Random forest | 47 | 8 |
| | 7 | 28 |
| Multilayer Perceptron | 46 | 9 |
| | 9 | 26 |
| Decision tree | 48 | 7 |
| | 8 | 27 |
| Naïve Bayes | 47 | 8 |
| | 5 | 30 |
| Logistic regression | 45 | 10 |
| | 4 | 31 |
| Boosting SVM | 51 | 4 |
| | 2 | 33 |
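As a worked check on how a confusion matrix relates to a testing accuracy, the sketch below sums the diagonal (correct predictions) and divides by the total count. The function name `accuracy` is a hypothetical helper. The result does not depend on whether rows or columns hold the true classes, since only the diagonal and the grand total are used.

```python
def accuracy(matrix):
    """Fraction of correct predictions: diagonal sum divided by total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Matrices copied from the table above.
random_forest = [[47, 8], [7, 28]]
logistic_regression = [[45, 10], [4, 31]]

print(f"{accuracy(random_forest):.2%}")        # 83.33%
print(f"{accuracy(logistic_regression):.2%}")  # 84.44%
```

Both values agree with the testing accuracies listed for random forest and logistic regression in the accuracy table.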
Comparing classification report on test data.
| Method | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Random forest | 0.87 | 0.85 | 0.86 | 55 |
| | 0.78 | 0.80 | 0.79 | 35 |
| Multilayer Perceptron | 0.84 | 0.84 | 0.84 | 55 |
| | 0.74 | 0.74 | 0.74 | 35 |
| Decision tree | 0.86 | 0.87 | 0.86 | 55 |
| | 0.79 | 0.77 | 0.78 | 35 |
| Naïve Bayes | 0.90 | 0.85 | 0.88 | 55 |
| | 0.79 | 0.82 | 0.82 | 35 |
| Logistic regression | 0.92 | 0.82 | 0.87 | 55 |
| | 0.76 | 0.89 | 0.82 | 35 |
| Boosting SVM | 0.94 | 0.87 | 0.90 | 55 |
| | 0.82 | 0.89 | 0.85 | 35 |
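The per-class figures in a classification report follow directly from a confusion matrix. The sketch below assumes that each matrix row holds a true class and each column a predicted class; this reading is an assumption, although it is consistent with the row sums of the random forest matrix (55 and 35) matching the Support column. The helper name `class_report` is hypothetical.

```python
def class_report(matrix):
    """Per-class (precision, recall, F1, support) from a square confusion matrix.

    Assumes matrix[i][j] counts samples of true class i predicted as class j.
    Raises ZeroDivisionError if a class is never predicted or never occurs.
    """
    n = len(matrix)
    report = []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum = support
        precision = tp / predicted_k
        recall = tp / actual_k
        f1 = 2 * precision * recall / (precision + recall)
        report.append((round(precision, 2), round(recall, 2), round(f1, 2), actual_k))
    return report

# Random forest confusion matrix from the earlier table.
print(class_report([[47, 8], [7, 28]]))
# [(0.87, 0.85, 0.86, 55), (0.78, 0.8, 0.79, 35)]
```

Under that assumption the output reproduces the two random forest rows of the report above.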
Performances of different methods on Cleveland datasets.
| Author | Method | Accuracy (%) |
|---|---|---|
| Mirza et al. [ | RBFSVM | 87.114 |
| Amen et al. [ | Logistic regression | 82 |
| Sajja et al. [ | SVM | 92–94 |
| Waris & Koteeswaran [ | Novel KNN | 93 |
| Gupta et al. [ | Naïve Bayes | 88.16 |
| Saini et al. [ | Hybrid classifier with weighted voting (HCWV) | 82.54 |
| Abdeldjouad et al. [ | GFS-logicboost-C | 94.17 |
| Motarwar et al. [ | AdaBoost | 80.32 |
| Alotaibi [ | Decision tree | 93.19 |
| Gupta et al. [ | Ensemble of Naïve Bayes, AdaBoost, and boosted tree | 87.97 |
| Proposed method | Boosting SVM | 99.92 |
Figure 1. Comparative ROC of various classifiers.
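An ROC curve is usually summarized by the area under it (AUC). As a minimal sketch, the AUC can be computed without drawing the curve as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counting one half. The function name `auc_from_scores` and the toy labels and scores are hypothetical and are not results from the paper.

```python
def auc_from_scores(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 0/1 class labels; scores: classifier scores (higher = more positive).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count pairs where the positive outranks the negative; ties count 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_from_scores(labels, scores))  # 0.75
```

This pairwise form is quadratic in the number of samples, which is acceptable for a test set of the size used here.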
Description of the attributes.
| No | Attribute | Description | Ranges |
|---|---|---|---|
| 1 | Age | Ages of patients taken in years. | 29 to 77 |
| 2 | Sex | 0 for female, 1 for male. | 0, 1 |
| 3 | Chest pain type | There are four types: 1 for typical angina, 2 for atypical angina, 3 for nonanginal pain, and 4 for asymptomatic. | 1, 2, 3, 4 |
| 4 | Resting blood pressure | Blood pressure of the patient when at rest in mm Hg. | 94 to 200 |
| 5 | Serum cholesterol | The amount of cholesterol in the blood in mg/dL. | 126 to 564 |
| 6 | Fasting blood sugar | Amount of sugar present at fasting. 0 for false—fasting blood sugar is not above 120 mg/dL; 1 for true—fasting blood sugar is above 120 mg/dL. | 0, 1 |
| 7 | Resting electrocardiograph | Values produced by electrocardiography at rest. 0 is normal; 1 is having ST-T wave abnormality; 2 for showing probable or definite left ventricular hypertrophy. | 0, 1, 2 |
| 8 | Maximum heart rate | Maximum heart rate of patient. | 71 to 202 |
| 9 | Exercise-induced angina | Whether or not the patient gets angina when exercise is performed. They are 0 for no and 1 for yes. | 0, 1 |
| 10 | ST depression | Finding on an electrocardiogram wherein the trace of the ST segment is abnormally low below the baseline. Values contain ST depression induced by exercise relative to rest. ST refers to the segment of the electrocardiogram between the S wave and the T wave. | 1 to 3 |
| 11 | Slope | The slope of the ST segment for peak exercise by the patient. 1 for upsloping, 2 for flat, and 3 for downsloping. | 1, 2, 3 |
| 12 | Number of vessels | Number of vessels colored by fluoroscopy. | 0 to 3 |
| 13 | Thallium stress test result | How well blood flows to the heart while at rest or during exercise. 3 is normal, 6 is a fixed defect, and 7 is a reversible defect. | 3, 6, 7 |
| 14 | Diagnosis | Predicted attribute that contains values showing no presence or presence of heart disease to varying degrees. 0 for no presence, 1 for least likelihood, 2 for moderate likelihood, 3 for a high likelihood, and 4 for very high likelihood. Values 1 through 4 are compressed to a single value, 1, representing the presence of heart disease. | 0 or 1 |
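The compression of the diagnosis attribute described in the last row can be sketched as follows. The function name `binarize_diagnosis` is a hypothetical helper; the mapping itself (0 stays 0, values 1 through 4 become 1) is the one stated in the table.

```python
def binarize_diagnosis(value):
    """Map the original 0-4 diagnosis to 0 (no disease) or 1 (disease present)."""
    if value not in (0, 1, 2, 3, 4):
        raise ValueError(f"unexpected diagnosis value: {value!r}")
    return 0 if value == 0 else 1

print([binarize_diagnosis(v) for v in [0, 1, 2, 3, 4]])  # [0, 1, 1, 1, 1]
```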
Figure 2. The proposed framework.
Algorithm 1. Gini index computation.
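The body of this algorithm is not reproduced in this record. As a hedged illustration only, the sketch below uses the standard definition of Gini impurity, 1 minus the sum of squared class proportions, and the size-weighted Gini of a binary split. The function names are hypothetical, and the paper's own procedure may differ in detail.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_of_split(left, right):
    """Size-weighted Gini impurity of the two sides of a binary split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

print(gini_impurity([0, 0, 1, 1]))      # 0.5 (maximally mixed for two classes)
print(gini_of_split([0, 0], [1, 1]))    # 0.0 (a perfect split)
```

A feature whose best split lowers the weighted Gini more than other features is, in tree-based methods, treated as more important.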
Figure 3. Importance of each feature.
Figure 4. A correlation matrix with heatmap.
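A correlation matrix of this kind holds one pairwise correlation coefficient per pair of attributes. The sketch below assumes the Pearson coefficient, which this record does not confirm, and uses made-up toy columns. The names `pearson` and `correlation_matrix` are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)  # undefined for constant columns

def correlation_matrix(columns):
    """Nested dict m[a][b] with the correlation of column a and column b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Toy columns, made up for illustration (not Cleveland data).
columns = {
    "age": [40, 50, 60, 70],
    "max_heart_rate": [180, 170, 160, 150],  # falls linearly as age rises
    "chol": [210, 250, 230, 260],
}
matrix = correlation_matrix(columns)
print(round(matrix["age"]["max_heart_rate"], 3))  # -1.0
```

In a heatmap, each cell of this matrix is drawn as a color, so strongly related attribute pairs stand out visually.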
Interpretation of AUC values.
| AUC value | Connotation |
|---|---|
| 0.9 < AUC ≤ 1.0 | Excellent |
| 0.8 < AUC ≤ 0.9 | Good |
| 0.7 < AUC ≤ 0.8 | Fair |
| 0.6 < AUC ≤ 0.7 | Poor |
| 0.5 < AUC ≤ 0.6 | Insignificant |
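The table can be read as a lookup from an AUC value to a label, taking each threshold as a strict lower bound. A minimal sketch follows; the function name `interpret_auc` is hypothetical, and returning `None` for values of 0.5 or below is an assumption, since the table does not cover that range.

```python
def interpret_auc(auc):
    """Return the table's label for an AUC value, or None if AUC <= 0.5."""
    bands = [
        (0.9, "Excellent"),
        (0.8, "Good"),
        (0.7, "Fair"),
        (0.6, "Poor"),
        (0.5, "Insignificant"),
    ]
    for lower_bound, label in bands:
        if auc > lower_bound:  # thresholds are strict lower bounds
            return label
    return None  # not covered by the table (assumption)

print(interpret_auc(0.95))  # Excellent
print(interpret_auc(0.75))  # Fair
```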
Figure 5. Prediction page showing prediction result for patient with low risk level.
Figure 6. Screen showing prediction history of all patients.