Roya Najafi-Vosough1, Javad Faradmal1,2, Seyed Kianoosh Hosseini3, Abbas Moghimbeigi4,5, Hossein Mahjub1,6.
Abstract
OBJECTIVES: Heart failure (HF) is a common disease with a high hospital readmission rate. This study considered class imbalance and missing data, which are two common issues in medical data. The current study's main goal was to compare the performance of six machine learning (ML) methods for predicting hospital readmission in HF patients.
Keywords: Classification; Data Analysis; Heart Failure; Machine Learning; Patient Readmission
Year: 2021 PMID: 34788911 PMCID: PMC8654329 DOI: 10.4258/hir.2021.27.4.307
Source DB: PubMed Journal: Healthc Inform Res ISSN: 2093-3681
Clinical characteristics of heart failure patients
| Variable | No readmission (n = 1,314): Median | No readmission: Min–Max | No readmission: Mean ± SD | Readmission (n = 542): Median | Readmission: Min–Max | Readmission: Mean ± SD |
|---|---|---|---|---|---|---|
| Age (yr) | 76.0 | 22.0–97.0 | 74.0 ± 13.5 | 73.0 | 27.0–97.0 | 71.7 ± 13.4 |
| BMI (kg/m²) | 25.3 | 14.3–53.3 | 25.8 ± 5.1 | 25.0 | 13.8–47.3 | 25.8 ± 4.9 |
| Ejection fraction (%) | 25.0 | 10.0–55.0 | 26.4 ± 10.9 | 20.0 | 10.0–50.0 | 22.8 ± 10.1 |
| SBP (mmHg) | 121.0 | 67.0–220.0 | 125.4 ± 24.4 | 125.0 | 70.0–220.0 | 126.0 ± 24.1 |
| DBP (mmHg) | 80.0 | 40.0–137.0 | 77.9 ± 15.6 | 80.0 | 44.0–140.0 | 78.1 ± 15.1 |
| FBS (mg/dL) | 98.5 | 38.0–455.0 | 113.3 ± 51.2 | 97.0 | 31.0–453.0 | 112.6 ± 53.0 |
| BUN (mg/dL) | 23.0 | 10.0–127.5 | 26.5 ± 13.6 | 24.0 | 11.5–95.0 | 27.2 ± 12.1 |
| Creatinine (mg/dL) | 1.3 | 0.5–10.7 | 1.4 ± 0.8 | 1.3 | 0.7–10.2 | 1.4 ± 0.7 |
| Cholesterol (mg/dL) | 141.0 | 50.0–386.0 | 146.7 ± 43.5 | 135.0 | 30.0–317.0 | 142.3 ± 41.2 |
| Triglycerides (mg/dL) | 103.0 | 25.0–437.0 | 115.3 ± 52.3 | 99.0 | 28.0–358.0 | 110.8 ± 50.5 |
| HDL (mg/dL) | 36.0 | 20.0–85.0 | 38.0 ± 9.7 | 38.0 | 20.0–74.0 | 38.8 ± 9.8 |
| LDL (mg/dL) | 81.0 | 24.0–313.0 | 86.7 ± 32.9 | 80.0 | 26.0–382.0 | 84.2 ± 32.7 |
| CK-MB (U/L) | 22.0 | 2.0–980.0 | 34.5 ± 57.6 | 22.0 | 7.0–1089.0 | 32.5 ± 60.3 |
| Sodium (Na) (mmol/L) | 139.5 | 116.0–164.5 | 139.1 ± 4.0 | 140.0 | 121.5–148.0 | 139.6 ± 3.7 |
| Potassium (K) (mmol/L) | 4.2 | 2.9–7.3 | 4.2 ± 0.5 | 4.2 | 2.7–6.6 | 4.2 ± 0.4 |
| WBC (×10⁹/L) | 7.6 | 2.5–23.6 | 8.1 ± 2.9 | 7.4 | 2.4–20.9 | 7.9 ± 2.7 |
| RBC (×10⁹/L) | 4.6 | 2.6–8.4 | 4.6 ± 0.7 | 4.6 | 2.7–7.5 | 4.7 ± 0.7 |
| Hemoglobin (Hb) (g/dL) | 13.6 | 6.0–19.9 | 13.5 ± 2.1 | 13.7 | 8.1–19.8 | 13.7 ± 2.1 |
| Hct (%) | 41.8 | 19.7–63.3 | 41.8 ± 6.0 | 41.8 | 25.2–62.8 | 42.3 ± 5.9 |
| RDW (%) | 14.5 | 11.5–24.6 | 14.9 ± 2.0 | 14.5 | 11.8–23.6 | 15.0 ± 2.0 |
| Platelet (×10³/μL) | 188.0 | 40.0–573.0 | 197.1 ± 69.5 | 185.0 | 50.0–578.0 | 198.3 ± 71.9 |
| MCV (fL) | 90.1 | 58.3–118.1 | 89.4 ± 7.1 | 90.9 | 62.3–112.3 | 90.3 ± 6.9 |
| MCH (pg) | 29.3 | 16.4–43.1 | 29.0 ± 2.9 | 29.6 | 17.8–37.7 | 29.2 ± 2.9 |
| MCHC (g/dL) | 32.4 | 25.9–43.5 | 32.3 ± 1.6 | 32.4 | 26.2–36.3 | 32.3 ± 1.6 |
| PT (s) | 13.2 | 12.0–36.0 | 14.4 ± 3.5 | 13.3 | 12.0–36.0 | 14.4 ± 3.4 |
| INR | 1.1 | 1.0–10.5 | 1.3 ± 0.7 | 1.1 | 1.0–6.5 | 1.3 ± 0.6 |
| PTT (s) | 27.0 | 20.0–120.0 | 29.5 ± 9.5 | 27.0 | 20.5–120.0 | 29.6 ± 10.0 |
BMI: body mass index, SBP: systolic blood pressure, DBP: diastolic blood pressure, FBS: fasting blood glucose, BUN: blood urea nitrogen, HDL: high-density lipoprotein, LDL: low-density lipoprotein, CK-MB: creatine kinase-MB, WBC: white blood cell, RBC: red blood cell, RDW: red cell distribution width, Hct: hematocrit, MCV: mean corpuscular volume, MCH: mean corpuscular hemoglobin, MCHC: mean corpuscular hemoglobin concentration, PT: prothrombin time, INR: international normalized ratio, PTT: partial thromboplastin time, SD: standard deviation.
Baseline characteristics of heart failure patients
| Variable | No readmission (n = 1,314) | Readmission (n = 542) |
|---|---|---|
| Hospital departments (ward) | 803 (61.1) | 356 (65.7) |
| Sex (male) | 742 (56.5) | 350 (64.4) |
| History of diabetes (yes) | 377 (28.7) | 158 (29.2) |
| History of hypertension (yes) | 780 (59.4) | 309 (57.0) |
| History of blood lipids (yes) | 152 (11.6) | 52 (9.6) |
| Smoking (yes) | 175 (13.3) | 113 (20.8) |
| Substance abuse (yes) | 179 (13.6) | 99 (18.3) |
| History of MI (yes) | 57 (4.3) | 41 (7.6) |
| Family history of HF (yes) | 59 (4.5) | 40 (7.4) |
| History of stroke (yes) | 65 (4.9) | 21 (3.9) |
| COPD (yes) | 57 (4.3) | 34 (6.3) |
| Thyroid disease (yes) | 75 (5.7) | 29 (5.4) |
| Respiratory disease (yes) | 142 (10.8) | 65 (12.0) |
| Kidney disease (yes) | 143 (10.9) | 61 (11.3) |
| CABG (yes) | 128 (9.7) | 79 (14.6) |
| CAG (yes) | 125 (9.5) | 60 (11.1) |
Values are presented as number (%).
MI: myocardial infarction, HF: heart failure, COPD: chronic obstructive pulmonary disease, CABG: coronary artery bypass graft, CAG: coronary angiography.
Performance criteria of machine learning methods using the median imputation method
| Methods | Set | Sensitivity | Specificity | PPV | NPV | Accuracy |
|---|---|---|---|---|---|---|
| SVM | Train | 0.66 (0.010) | 0.99 (0.001) | 0.98 (0.007) | 0.93 (0.001) | 0.94 (0.001) |
| SVM | Test | 0.62 (0.016) | 0.95 (0.011) | 0.75 (0.049) | 0.92 (0.005) | 0.89 (0.009) |
| LS-SVM | Train | 0.87 (0.008) | 0.53 (0.070) | 0.28 (0.027) | 0.95 (0.005) | 0.59 (0.057) |
| LS-SVM | Test | 0.86 (0.015) | 0.51 (0.073) | 0.27 (0.032) | 0.94 (0.007) | 0.57 (0.059) |
| Bagging | Train | 0.54 (0.017) | 0.99 (0.001) | 0.95 (0.015) | 0.91 (0.003) | 0.91 (0.003) |
| Bagging | Test | 0.52 (0.021) | 0.96 (0.011) | 0.75 (0.060) | 0.90 (0.006) | 0.88 (0.009) |
| AdaBoost | Train | 1.00 (0) | 1.00 (0) | 1.00 (0) | 1.00 (0) | 1.00 (0) |
| AdaBoost | Test | 0.85 (0.012) | 0.87 (0.020) | 0.58 (0.044) | 0.96 (0.003) | 0.86 (0.016) |
| RF | Train | 0.81 (0.008) | 1.00 (0) | 1.00 (0) | 0.96 (0.001) | 0.97 (0.001) |
| RF | Test | 0.72 (0.016) | 0.95 (0.011) | 0.78 (0.047) | 0.94 (0.004) | 0.91 (0.009) |
| NB | Train | 0.67 (0.007) | 0.96 (0.004) | 0.77 (0.020) | 0.93 (0.001) | 0.91 (0.004) |
| NB | Test | 0.64 (0.017) | 0.93 (0.011) | 0.66 (0.041) | 0.92 (0.004) | 0.88 (0.009) |
The numbers in parentheses denote standard deviations.
PPV: positive predictive value, NPV: negative predictive value, SVM: support vector machine, LS-SVM: least-squares support vector machine, RF: random forest, NB: naïve Bayes.
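The five criteria reported in the table above all follow from the four cells of a binary confusion matrix, with readmission taken as the positive class. The sketch below shows those standard definitions in plain Python. The function name `classification_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not taken from the study, and the source does not describe the authors' own code.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification criteria from confusion-matrix counts.

    tp/fn: readmitted patients predicted as readmitted / not readmitted
    tn/fp: non-readmitted patients predicted as not readmitted / readmitted
    Assumes every denominator is non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),   # share of true readmissions detected
        "specificity": tn / (tn + fp),   # share of non-readmissions correctly cleared
        "ppv": tp / (tp + fp),           # share of positive predictions that are correct
        "npv": tn / (tn + fn),           # share of negative predictions that are correct
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only; not data from the study.
metrics = classification_metrics(tp=60, fp=20, tn=380, fn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because PPV and NPV depend on how many positives are in the evaluated sample, they can shift with class balance even when sensitivity and specificity stay the same, which is one reason all five criteria are worth reading together.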
Performance criteria of machine learning methods using the multiple imputation method
| Methods | Set | Sensitivity | Specificity | PPV | NPV | Accuracy |
|---|---|---|---|---|---|---|
| SVM | Train | 0.66 (0.010) | 0.99 (0.001) | 0.98 (0.006) | 0.93 (0.001) | 0.94 (0.001) |
| SVM | Test | 0.62 (0.017) | 0.95 (0.011) | 0.75 (0.052) | 0.92 (0.004) | 0.90 (0.008) |
| LS-SVM | Train | 0.87 (0.009) | 0.55 (0.061) | 0.29 (0.026) | 0.95 (0.004) | 0.60 (0.049) |
| LS-SVM | Test | 0.86 (0.015) | 0.54 (0.060) | 0.28 (0.028) | 0.95 (0.005) | 0.60 (0.049) |
| Bagging | Train | 0.48 (0.024) | 0.99 (0.001) | 0.95 (0.016) | 0.90 (0.004) | 0.90 (0.004) |
| Bagging | Test | 0.46 (0.027) | 0.95 (0.014) | 0.69 (0.066) | 0.89 (0.006) | 0.87 (0.010) |
| AdaBoost | Train | 1.00 (0) | 1.00 (0) | 1.00 (0) | 1.00 (0) | 1.00 (0) |
| AdaBoost | Test | 0.84 (0.012) | 0.84 (0.021) | 0.54 (0.039) | 0.96 (0.003) | 0.84 (0.017) |
| RF | Train | 0.80 (0.009) | 1.00 (0) | 1.00 (0) | 0.96 (0.001) | 0.96 (0.001) |
| RF | Test | 0.69 (0.018) | 0.94 (0.013) | 0.73 (0.051) | 0.93 (0.004) | 0.90 (0.010) |
| NB | Train | 0.69 (0.008) | 0.94 (0.005) | 0.71 (0.019) | 0.93 (0.001) | 0.89 (0.004) |
| NB | Test | 0.66 (0.017) | 0.90 (0.018) | 0.59 (0.046) | 0.92 (0.004) | 0.86 (0.014) |
The numbers in parentheses denote standard deviations.
PPV: positive predictive value, NPV: negative predictive value, SVM: support vector machine, LS-SVM: least-squares support vector machine, RF: random forest, NB: naïve Bayes.
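Both performance tables depend on how missing values were filled in before model fitting. As a rough illustration of the simpler of the two approaches, median imputation, the sketch below replaces each missing entry of a numeric variable with the median of its observed values. The helper names `fit_median` and `apply_median` and the sample values are hypothetical. Taking the median from the training split only is an assumption made here to avoid leaking test information; the source does not say how the authors handled this.

```python
from statistics import median


def fit_median(train_values):
    """Return the median of the observed (non-missing) training values."""
    observed = [v for v in train_values if v is not None]
    return median(observed)


def apply_median(values, fill):
    """Replace each missing entry (None) with the given fill value."""
    return [fill if v is None else v for v in values]


# Hypothetical ejection-fraction values (%); None marks a missing measurement.
train_ef = [25.0, None, 20.0, 30.0, None, 15.0]
test_ef = [None, 35.0]

fill_value = fit_median(train_ef)            # median of 15, 20, 25, 30
train_imputed = apply_median(train_ef, fill_value)
test_imputed = apply_median(test_ef, fill_value)
print(fill_value, train_imputed, test_imputed)
```

Multiple imputation, the second approach compared in the source, instead generates several plausible completed datasets and combines the results, so it cannot be reduced to a single fill value in this way.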
Figure 1. Top 10 variable importance (VIMP) values for predicting hospital readmission in heart failure patients using two imputation methods for missing data: (A) median imputation method and (B) multiple imputation method. EF: ejection fraction, PTT: partial thromboplastin time, CK-MB: creatine kinase-MB, BUN: blood urea nitrogen, Hct: hematocrit, DBP: diastolic blood pressure, LDL: low-density lipoprotein.