Javad Hassannataj Joloudari, Edris Hassannataj Joloudari, Hamid Saadatfar, Mohammad GhasemiGol, Seyyed Mohammad Razavi, Amir Mosavi, Narjes Nabipour, Shahaboddin Shamshirband, Laszlo Nadai.
Abstract
Heart disease is one of the most common diseases among middle-aged people. Among the many types of heart disease, coronary artery disease (CAD) is a common cardiovascular disease with a high death rate. The most widely used tool for diagnosing CAD is medical imaging, e.g., angiography. However, angiography is costly and associated with a number of side effects. Hence, the purpose of this study is to increase the accuracy of coronary heart disease diagnosis by selecting significant predictive features in order of their ranking. In this study, we propose an integrated method using machine learning. The machine learning methods used are random trees (RTs), the C5.0 decision tree, support vector machine (SVM), and the Chi-squared automatic interaction detection (CHAID) decision tree. The proposed method shows promising results, and the study confirms that the RTs model outperforms the other models.
Keywords: big data; coronary artery disease; data science; ensemble model; health informatics; heart disease diagnosis; industry 4.0; machine learning; predictive model; random forest
Year: 2020 PMID: 31979257 PMCID: PMC7037941 DOI: 10.3390/ijerph17030731
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. Database knowledge discovery (KDD) process steps [1].
Figure 2. Support vector machine in two-dimensional space.
Figure 3. Proposed methodology.
Description of the features used in the Z-Alizadeh Sani dataset with their valid ranges.
| Feature Type | Feature Name | Range | Mean | Std. Error of Mean | Std. Deviation | Variance |
|---|---|---|---|---|---|---|
| Demographic | Age | (30–80) | 58.90 | 0.6 | 10.39 | 108 |
| Demographic | Weight | (48–120) | 73.83 | 0.69 | 11.99 | 143.7 |
| Demographic | Length | (140–188) | 164.72 | 0.54 | 9.33 | 87.01 |
| Demographic | Sex | Male, Female | --- | --- | --- | --- |
| Demographic | BMI (body mass index kg/m²) | (18–41) | 27.25 | 0.24 | 4.1 | 16.8 |
| Demographic | DM (diabetes mellitus) | (0, 1) | 0.3 | 0.03 | 0.46 | 0.21 |
| Demographic | HTN (hypertension) | (0, 1) | 0.6 | 0.03 | 0.49 | 0.24 |
| Demographic | Current smoker | (0, 1) | 0.21 | 0.02 | 0.41 | 0.17 |
| Demographic | Ex-smoker | (0, 1) | 0.03 | 0.01 | 0.18 | 0.03 |
| Demographic | FH (family history) | (0, 1) | 0.16 | 0.02 | 0.37 | 0.13 |
| Demographic | Obesity | Yes if BMI > 25, No otherwise | --- | --- | --- | --- |
| Demographic | CRF (chronic renal failure) | Yes, No | --- | --- | --- | --- |
| Demographic | CVA (cerebrovascular accident) | Yes, No | --- | --- | --- | --- |
| Demographic | Airway disease | Yes, No | --- | --- | --- | --- |
| Demographic | Thyroid disease | Yes, No | --- | --- | --- | --- |
| Demographic | CHF (congestive heart failure) | Yes, No | --- | --- | --- | --- |
| Demographic | DLP (dyslipidemia) | Yes, No | --- | --- | --- | --- |
| Symptom and examination | BP (blood pressure mm Hg) | (90–190) | 129.55 | 1.09 | 18.94 | 358.65 |
| Symptom and examination | PR (pulse rate ppm) | (50–110) | 75.14 | 0.51 | 8.91 | 79.42 |
| Symptom and examination | Edema | (0, 1) | 0.04 | 0.01 | 0.2 | 0.04 |
| Symptom and examination | Weak peripheral pulse | Yes, No | --- | --- | --- | --- |
| Symptom and examination | Lung rales | Yes, No | --- | --- | --- | --- |
| Symptom and examination | Systolic murmur | Yes, No | --- | --- | --- | --- |
| Symptom and examination | Diastolic murmur | Yes, No | --- | --- | --- | --- |
| Symptom and examination | Typical chest pain | (0, 1) | 0.54 | 0.03 | 0.5 | 0.25 |
| Symptom and examination | Dyspnea | Yes, No | --- | --- | --- | --- |
| Symptom and examination | Function class | 1, 2, 3, 4 | 0.66 | 0.06 | 1.03 | 1.07 |
| Symptom and examination | Atypical | Yes, No | --- | --- | --- | --- |
| Symptom and examination | Nonanginal chest pain | Yes, No | --- | --- | --- | --- |
| Symptom and examination | Exertional chest pain | Yes, No | --- | --- | --- | --- |
| Symptom and examination | Low TH Ang (low-threshold angina) | Yes, No | --- | --- | --- | --- |
| ECG | Rhythm | Sin, AF | --- | --- | --- | --- |
| ECG | Q wave | (0, 1) | 0.05 | 0.01 | 0.22 | 0.05 |
| ECG | ST elevation | (0, 1) | 0.05 | 0.01 | 0.21 | 0.04 |
| ECG | ST depression | (0, 1) | 0.23 | 0.02 | 0.42 | 0.18 |
| ECG | T inversion | (0, 1) | 0.3 | 0.03 | 0.46 | 0.21 |
| ECG | LVH (left ventricular hypertrophy) | Yes, No | --- | --- | --- | --- |
| ECG | Poor R-wave progression | Yes, No | --- | --- | --- | --- |
| Laboratory and echo | FBS (fasting blood sugar mg/dL) | (62–400) | 119.18 | 2.99 | 52.08 | 2712.29 |
| Laboratory and echo | Cr (creatinine mg/dL) | (0.5–2.2) | 1.06 | 0.02 | 0.26 | 0.07 |
| Laboratory and echo | TG (triglyceride mg/dL) | (37–1050) | 150.34 | 5.63 | 97.96 | 9596.05 |
| Laboratory and echo | LDL (low-density lipoprotein mg/dL) | (18–232) | 104.64 | 2.03 | 35.4 | 1252.93 |
| Laboratory and echo | HDL (high-density lipoprotein mg/dL) | (15–111) | 40.23 | 0.61 | 10.56 | 111.49 |
| Laboratory and echo | BUN (blood urea nitrogen mg/dL) | (6–52) | 17.5 | 0.4 | 6.96 | 48.4 |
| Laboratory and echo | ESR (erythrocyte sedimentation rate mm/h) | (1–90) | 19.46 | 0.92 | 15.94 | 253.97 |
| Laboratory and echo | HB (hemoglobin g/dL) | (8.9–17.6) | 13.15 | 0.09 | 1.61 | 2.59 |
| Laboratory and echo | K (potassium mEq/lit) | (3.0–6.6) | 4.23 | 0.03 | 0.46 | 0.21 |
| Laboratory and echo | Na (sodium mEq/lit) | (128–156) | 141 | 0.22 | 3.81 | 14.5 |
| Laboratory and echo | WBC (white blood cell cells/mL) | (3700–18,000) | 7562.05 | 138.67 | 2413.74 | 5,826,137.52 |
| Laboratory and echo | Lymph (lymphocyte %) | (7–60) | 32.4 | 0.57 | 9.97 | 99.45 |
| Laboratory and echo | Neut (neutrophil %) | (32–89) | 60.15 | 0.59 | 10.18 | 103.68 |
| Laboratory and echo | PLT (platelet 1000/mL) | (25–742) | 221.49 | 3.49 | 60.8 | 3696.18 |
| Laboratory and echo | EF (ejection fraction %) | (15–60) | 47.23 | 0.51 | 8.93 | 79.7 |
| Laboratory and echo | Region with RWMA | (0–4) | 0.62 | 0.07 | 1.13 | 1.28 |
| Laboratory and echo | VHD (valvular heart disease) | Normal, Mild, Moderate, Severe | --- | --- | --- | --- |
| Categorical | Target class: Cath | CAD, Normal | --- | --- | --- | --- |
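The summary columns in the table above are related by two standard formulas: the variance is the square of the standard deviation, and the standard error of the mean is the standard deviation divided by the square root of the sample size. The following minimal sketch checks a few rows. The sample size of 303 records is an assumption here, since the table itself does not state it, and the helper name `summarize` is hypothetical.

```python
import math

# Assumption: the dataset holds 303 patient records (not stated in the table).
N_RECORDS = 303


def summarize(std_dev, n=N_RECORDS):
    """Return (standard error of the mean, variance) for a sample standard deviation."""
    sem = std_dev / math.sqrt(n)
    variance = std_dev ** 2
    return sem, variance


# Standard deviations copied from the table above.
for name, sd in [("Age", 10.39), ("Weight", 11.99), ("BP", 18.94)]:
    sem, var = summarize(sd)
    print(f"{name}: SEM = {sem:.2f}, variance = {var:.2f}")
```

Under that assumption, the computed values agree with the tabulated ones to within rounding of the reported standard deviations.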
Confusion matrix for detection of coronary artery disease (CAD).
| The Predicted Class | Actual Class: Disease (CAD) | Actual Class: Healthy (Normal) |
|---|---|---|
| Positive | True Positive | False Positive |
| Negative | False Negative | True Negative |
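The four cells of the confusion matrix are typically combined into summary metrics such as accuracy, sensitivity, and specificity. The sketch below shows the usual definitions. The function name `confusion_metrics` and the counts passed to it are hypothetical and are not results from the study.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute common metrics from the four confusion-matrix cells."""
    total = tp + fp + fn + tn
    return {
        # Share of all cases classified correctly.
        "accuracy": (tp + tn) / total,
        # Share of actual CAD cases that were predicted positive.
        "sensitivity": tp / (tp + fn),
        # Share of actual Normal cases that were predicted negative.
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts for illustration only; not taken from the paper.
metrics = confusion_metrics(tp=40, fp=5, fn=10, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```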
Figure 4. Comparison based on ROC of models: (a) Normal class, (b) CAD class.
Figure 5. Results based on gain of models: (a) Normal class, (b) CAD class.
Figure 6. Results based on confidence through the lift chart of models: (a) CAD class, (b) Normal class.
Figure 7. Results based on profit of models: (a) Normal class, (b) CAD class.
Figure 8. Results based on ROI of models: (a) CAD class, (b) Normal class.
Figure 9. Results based on response of models: (a) CAD class, (b) Normal class.
Predictor significance for features based on ranking for the random trees model.
| No. | Feature | Predictor Significance |
|---|---|---|
| 1 | Typical chest pain | 0.98 |
| 2 | TG | 0.66 |
| 3 | BMI | 0.63 |
| 4 | Age | 0.58 |
| 5 | Weight | 0.54 |
| 6 | BP | 0.51 |
| 7 | K | 0.48 |
| 8 | FBS | 0.43 |
| 9 | Length | 0.37 |
| 10 | BUN | 0.3 |
| 11 | PR | 0.29 |
| 12 | HB | 0.26 |
| 13 | Function class | 0.25 |
| 14 | Neut | 0.25 |
| 15 | EF-TTE | 0.25 |
| 16 | WBC | 0.24 |
| 17 | DM | 0.23 |
| 18 | PLT | 0.2 |
| 19 | Atypical | 0.19 |
| 20 | FH | 0.18 |
| 21 | HDL | 0.16 |
| 22 | ESR | 0.16 |
| 23 | CR | 0.14 |
| 24 | LDL | 0.14 |
| 25 | T inversion | 0.13 |
| 26 | DLP | 0.13 |
| 27 | Region RWMA | 0.12 |
| 28 | HTN | 0.11 |
| 29 | Obesity | 0.1 |
| 30 | Systolic murmur | 0.09 |
| 31 | Sex | 0.09 |
| 32 | Dyspnea | 0.08 |
| 33 | Current smoker | 0.06 |
| 34 | BBB | 0.05 |
| 35 | LVH | 0.03 |
| 36 | Edema | 0.02 |
| 37 | Ex-smoker | 0.02 |
| 38 | VHD | 0.01 |
| 39 | ST depression | 0.01 |
| 40 | Lymph | 0.0 |
Predictor significance for features based on ranking for the support vector machine (SVM) model.
| No. | Feature | Predictor Significance |
|---|---|---|
| 1 | Typical chest pain | 0.04 |
| 2 | Atypical | 0.03 |
| 3 | Sex | 0.02 |
| 4 | Obesity | 0.02 |
| 5 | FH | 0.02 |
| 6 | Age | 0.02 |
| 7 | DM | 0.02 |
| 8 | Dyspnea | 0.02 |
| 9 | Systolic murmur | 0.02 |
| 10 | ST depression | 0.02 |
| 11 | HTN | 0.02 |
| 12 | LDL | 0.02 |
| 13 | Current smoker | 0.02 |
| 14 | DLP | 0.02 |
| 15 | BP | 0.02 |
| 16 | LVH | 0.02 |
| 17 | Nonanginal | 0.02 |
| 18 | T inversion | 0.02 |
| 19 | Length | 0.02 |
| 20 | Function class | 0.02 |
| 21 | BBB | 0.02 |
| 22 | VHD | 0.02 |
| 23 | CHF | 0.02 |
| 24 | PR | 0.02 |
| 25 | WBC | 0.02 |
| 26 | BUN | 0.02 |
| 27 | FBS | 0.02 |
| 28 | ESR | 0.02 |
| 29 | CVA | 0.02 |
| 30 | Thyroid disease | 0.02 |
| 31 | Lymph | 0.02 |
| 32 | Weight | 0.02 |
| 33 | CR | 0.02 |
| 34 | Airway disease | 0.02 |
| 35 | TG | 0.02 |
| 36 | CRF | 0.02 |
| 37 | Diastolic murmur | 0.02 |
| 38 | Low TH ang | 0.02 |
| 39 | Exertional CP | 0.02 |
| 40 | Weak peripheral pulse | 0.02 |
| 41 | Neut | 0.02 |
| 42 | PLT | 0.02 |
| 43 | ST elevation | 0.02 |
| 44 | EF-TTE | 0.02 |
| 45 | K | 0.02 |
| 46 | BMI | 0.02 |
| 47 | Ex-smoker | 0.02 |
| 48 | Lung rales | 0.02 |
| 49 | HDL | 0.02 |
| 50 | Na | 0.01 |
| 51 | Edema | 0.01 |
| 52 | Q wave | 0.01 |
| 53 | HB | 0.01 |
| 54 | Poor R progression | 0.01 |
| 55 | Region RWMA | 0.01 |
Predictor significance for features based on ranking for the C5.0 model.
| No. | Feature | Predictor Significance |
|---|---|---|
| 1 | Typical chest pain | 0.28 |
| 2 | CR | 0.14 |
| 3 | ESR | 0.13 |
| 4 | T inversion | 0.1 |
| 5 | Edema | 0.09 |
| 6 | Region RWMA | 0.08 |
| 7 | Poor R progression | 0.04 |
| 8 | Sex | 0.03 |
| 9 | DM | 0.03 |
| 10 | BMI | 0.02 |
| 11 | WBC | 0.02 |
| 12 | DLP | 0.02 |
| 13 | Length | 0.01 |
| 14 | Dyspnea | 0.0 |
| 15 | EF-TTE | 0.0 |
Predictor significance for features based on ranking for the Chi-squared automatic interaction detection (CHAID) model.
| No. | Feature | Predictor Significance |
|---|---|---|
| 1 | Typical chest pain | 0.33 |
| 2 | Age | 0.15 |
| 3 | T inversion | 0.11 |
| 4 | VHD | 0.1 |
| 5 | DM | 0.09 |
| 6 | HTN | 0.04 |
| 7 | Nonanginal | 0.03 |
| 8 | BP | 0.02 |
| 9 | Region RWMA | 0.02 |
| 10 | HDL | 0.02 |
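One way to read the two decision-tree rankings together is to look at which features both models retain. The sketch below is an illustration rather than part of the authors' method: it intersects the C5.0 and CHAID feature lists, using the significance values copied from the two tables above. The variable names are hypothetical.

```python
# Predictor significance values copied from the C5.0 table.
c50 = {
    "Typical chest pain": 0.28, "CR": 0.14, "ESR": 0.13, "T inversion": 0.1,
    "Edema": 0.09, "Region RWMA": 0.08, "Poor R progression": 0.04,
    "Sex": 0.03, "DM": 0.03, "BMI": 0.02, "WBC": 0.02, "DLP": 0.02,
    "Length": 0.01, "Dyspnea": 0.0, "EF-TTE": 0.0,
}

# Predictor significance values copied from the CHAID table.
chaid = {
    "Typical chest pain": 0.33, "Age": 0.15, "T inversion": 0.11, "VHD": 0.1,
    "DM": 0.09, "HTN": 0.04, "Nonanginal": 0.03, "BP": 0.02,
    "Region RWMA": 0.02, "HDL": 0.02,
}

# Features that appear in both rankings.
shared = set(c50) & set(chaid)

# List them by their C5.0 significance, highest first.
for feature in sorted(shared, key=lambda f: c50[f], reverse=True):
    print(f"{feature}: C5.0 = {c50[feature]}, CHAID = {chaid[feature]}")
```

Typical chest pain ranks first in both tables, so it heads the shared list as well.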
The most significant obtained rules for CAD diagnosis using random trees (top decision rules for ‘cath’ class).
| Decision Rule | Most Frequent Category | Rule Accuracy | Forest Accuracy | Interestingness Index |
|---|---|---|---|---|
| (BP > 110.0), (FH > 0.0), (Neut > 51.0) and (Typical Chest Pain > 0.0) | CAD | 1.000 | 1.000 | 1.000 |
| (BMI ≤ 29.02), (EF-TTE > 50.0), (CR ≤ 0.9), (Typical Chest Pain > 0.0) and (Atypical = {N}) | CAD | 1.000 | 1.000 | 1.000 |
| (Weight > 8.0), (CR > 0.9), (Typical Chest Pain > 0.0) and (Atypical = {N}) | CAD | 1.000 | 1.000 | 1.000 |
| (K ≤ 4.9), (WBC > 5700.0), (CR < 0.9), (DM > 0.0) and (Typical Chest Pain > 0.0) | CAD | 1.000 | 1.000 | 1.000 |
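A decision rule of this kind is a conjunction of threshold tests on a patient record. The sketch below encodes the first rule from the table as a predicate and measures how often the records it covers carry the predicted label. Reading "rule accuracy" as that fraction is an assumption, the function names are hypothetical, and the three patient records are invented for illustration and do not come from the dataset.

```python
def rule_1(p):
    """First rule in the table: all four conditions must hold."""
    return (
        p["BP"] > 110.0
        and p["FH"] > 0.0
        and p["Neut"] > 51.0
        and p["Typical Chest Pain"] > 0.0
    )


def rule_accuracy(rule, records, predicted="CAD"):
    """Fraction of records covered by the rule whose label equals `predicted`."""
    covered = [r for r in records if rule(r)]
    if not covered:
        return None  # The rule fires on no record.
    return sum(r["Cath"] == predicted for r in covered) / len(covered)


# Invented records for illustration only; not taken from the dataset.
records = [
    {"BP": 130, "FH": 1, "Neut": 60, "Typical Chest Pain": 1, "Cath": "CAD"},
    {"BP": 120, "FH": 1, "Neut": 55, "Typical Chest Pain": 1, "Cath": "CAD"},
    {"BP": 100, "FH": 0, "Neut": 45, "Typical Chest Pain": 0, "Cath": "Normal"},
]

print(rule_accuracy(rule_1, records))
```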
Previous work on CAD diagnosis using the Z-Alizadeh Sani dataset with the 10-fold cross-validation method.
| Reference | Methods | No. Features Subset Selection | Accuracy (%) | AUC (%) | Gini (%) |
|---|---|---|---|---|---|
| [ | Naïve Bayes-SMO | 16 | 88.52 | Not reported | Not reported |
| [ | SMO along with information Gain | 34 | 94.08 | Not reported | Not reported |
| [ | SVM along with average information gain and also information gain | 24 | 86.14 for LAD | Not reported | Not reported |
| [ | Neural network-genetic algorithm-weight by SVM | 22 | 93.85 | Not reported | Not reported |
| [ | SVM along with feature engineering | 32 | 96.40 | 92 | Not reported |
| [ | N2Genetic-nuSVM | 29 | 93.08 | Not reported | Not reported |
| In our study | Random trees | 40 | 91.47 | 96.70 | 93.40 |
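The AUC and Gini values in the last row are consistent with the standard relation Gini = 2 × AUC − 1. Whether the authors derived the Gini index this way is an assumption; the sketch below only checks that the two reported numbers agree under that relation. The function name `gini_from_auc` is hypothetical.

```python
def gini_from_auc(auc):
    """Convert an AUC in [0, 1] to a Gini index via Gini = 2 * AUC - 1."""
    return 2.0 * auc - 1.0


# AUC reported for the random trees model in the table above (96.70%).
gini = gini_from_auc(0.9670)
print(f"Gini = {gini * 100:.2f}%")
```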