John Wallert, Mattia Tomasoni, Guy Madison, Claes Held.
Abstract
BACKGROUND: Machine learning algorithms hold potential for improved prediction of all-cause mortality in cardiovascular patients, yet have not previously been developed with high-quality population data. This study compared four popular machine learning algorithms trained on unselected, nationwide population data from Sweden to solve the binary classification problem of predicting survival versus non-survival 2 years after first myocardial infarction (MI).
Keywords: Cardiovascular disease; Classification; Coronary Artery Syndrome; Myocardial infarction; Prognostic Modelling; Registries; Supervised machine learning
Year: 2017 PMID: 28679442 PMCID: PMC5499032 DOI: 10.1186/s12911-017-0500-y
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Predictors for all cases, by each class, and univariate class comparisons
| Predictors | All cases | Survivors | Non-survivors | Survivors vs. non-survivors (P) |
|---|---|---|---|---|
| Age (yrs) | 68.8 ± 12.3 | 67.5 ± 11.9 | 79.3 ± 9.8 | < 0.0001 |
| Male sex | 33,620 (64.7) | 30,523 (66.0) | 3097 (54.2) | < 0.0001 |
| Weight (kg) | 79.1 ± 15.9 | 80.0 ± 15.6 | 72.1 ± 16.3 | < 0.0001 |
| Ambulance to CCU | 31,654 (60.9) | 27,816 (60.2) | 3838 (67.2) | < 0.0001 |
| Comorbid conditions | ||||
| Smoking | 12,717 (24.5) | 11,740 (25.4) | 977 (17.1) | < 0.0001 |
| Diabetes | 8552 (16.5) | 7046 (15.2) | 1506 (26.4) | < 0.0001 |
| Hypertension | 23,432 (45.1) | 20,386 (44.1) | 3046 (53.3) | < 0.0001 |
| Previous stroke | 3623 (7.0) | 2759 (6.0) | 864 (15.1) | < 0.0001 |
| Admission medication | ||||
| ACE inhibitors | 8409 (16.2) | 7096 (15.3) | 1313 (23.0) | < 0.0001 |
| A2 blockers | 5893 (11.3) | 5164 (11.2) | 729 (12.8) | 0.0003 |
| Beta blockers | 14,485 (27.9) | 12,084 (26.1) | 2401 (42.0) | < 0.0001 |
| Statins | 9904 (19.1) | 8677 (18.8) | 1227 (21.5) | < 0.0001 |
| Presenting symptoms | ||||
| Chest pain | 44,589 (85.8) | 40,761 (88.2) | 3828 (67.0) | < 0.0001 |
| Dyspnea | 3472 (6.7) | 2368 (5.1) | 1104 (19.3) | < 0.0001 |
| Other | 3580 (6.9) | 2834 (6.1) | 746 (13.1) | < 0.0001 |
| ECG rhythm at CCU | ||||
| Sinus | 46,297 (89.1) | 41,983 (90.8) | 4314 (75.6) | < 0.0001 |
| Atrial fibrillation | 4469 (8.6) | 3308 (7.2) | 1161 (20.3) | < 0.0001 |
| ECG QRS at CCU | ||||
| Normal | 35,819 (69.0) | 32,709 (70.7) | 3110 (54.5) | < 0.0001 |
| Pathological Q-wave | 5407 (10.4) | 4753 (10.3) | 654 (11.5) | 0.0062 |
| Left bundle branch block | 2458 (4.7) | 1877 (4.1) | 581 (10.2) | < 0.0001 |
| Other | 5711 (11.0) | 4862 (10.5) | 849 (14.9) | < 0.0001 |
| ECG STT at CCU | ||||
| Normal | 11,729 (22.6) | 10,805 (23.4) | 924 (16.2) | < 0.0001 |
| ST-elevation | 17,641 (34.0) | 16,251 (35.2) | 1390 (24.3) | < 0.0001 |
| ST-depression | 11,462 (22.1) | 9690 (21.0) | 1772 (31.0) | < 0.0001 |
| Other | 5820 (11.2) | 4726 (10.2) | 1094 (19.2) | < 0.0001 |
| Pulmonary rales at CCU | ||||
| No | 46,205 (89.0) | 42,081 (91.0) | 4124 (72.2) | < 0.0001 |
| Rales | 3880 (7.5) | 2762 (6.0) | 1118 (19.6) | < 0.0001 |
| Other measures at CCU | ||||
| Troponin (ng) | 1360 (310–6460) | 1350 (280–1587) | 1400 (319–10,000) | 0.1761 |
| HR(bpm) | 76 (65–90) | 75 (65–90) | 86 (71–86) | < 0.0001 |
| SBP (mm Hg) | 148.9 ± 28.6 | 150.0 ± 28.2 | 143.0 ± 30.7 | < 0.0001 |
| Reperfusion at CCU | ||||
| No | 34,469 (66.4) | 29,740 (64.3) | 4729 (82.8) | < 0.0001 |
| Primary PCI | 14,665 (28.2) | 13,884 (30.0) | 781 (13.7) | < 0.0001 |
| Discharge medication | ||||
| ACE inhibitors | 31,547 (60.7) | 28,712 (62.1) | 2835 (49.6) | < 0.0001 |
| A2 blockers | 6445 (12.4) | 5670 (12.3) | 775 (13.6) | 0.0046 |
| Oral anticoagulants | 2993 (5.8) | 2514 (5.4) | 479 (8.4) | < 0.0001 |
| Other antiplatelet | 41,741 (80.4) | 38,461 (83.2) | 3280 (57.4) | < 0.0001 |
| Beta blockers | 46,623 (89.8) | 41,789 (90.4) | 4834 (84.7) | < 0.0001 |
| Statins | 45,366 (87.3) | 41,918 (90.7) | 3448 (60.4) | < 0.0001 |
| ECG rhythm at discharge | ||||
| Atrial fibrillation | 3703 (7.1) | 2645 (5.7) | 1058 (18.5) | < 0.0001 |
Values are mean ± SD, median (IQR), or count (%). Uncorrected P-values are from Welch’s t-tests for Gaussian variables, Mann-Whitney U-tests for non-Gaussian variables, and Pearson’s χ² tests for categorical variables
ACE angiotensin-converting-enzyme, A2 angiotensin-2 receptor, CCU coronary care unit, ECG electrocardiogram, HR heart rate, PCI percutaneous coronary intervention, SBP systolic blood pressure
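As a rough illustration of the Pearson χ² comparison named in the table note, the sketch below recomputes the test for one categorical row (A2 blockers at admission). It rests on several assumptions that are not in the source: the class sizes are not given in this excerpt, so they are back-calculated from each count and its rounded percentage and are only approximate; no continuity correction is applied; and the function name `pearson_chi2_2x2` is hypothetical. The result should therefore be read as being of the same order as the tabulated P-value, not as a reproduction of the authors' analysis.

```python
from math import erfc, sqrt


def pearson_chi2_2x2(events_a, n_a, events_b, n_b):
    """Pearson chi-square test (1 df, no continuity correction) for a 2x2 table.

    events_* are the counts with the characteristic in each class,
    n_* are the class sizes.
    """
    a, b = events_a, n_a - events_a   # class A: with / without
    c, d = events_b, n_b - events_b   # class B: with / without
    n = n_a + n_b
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# ASSUMPTION: class sizes are back-calculated from count / (percent / 100),
# because they are not stated in this excerpt. Percentages are rounded,
# so these sizes are approximate.
n_survivors = round(5164 / 0.112)      # A2 blockers: 5164 (11.2%)
n_non_survivors = round(729 / 0.128)   # A2 blockers: 729 (12.8%)

chi2, p = pearson_chi2_2x2(5164, n_survivors, 729, n_non_survivors)
print(f"chi2 = {chi2:.2f}, p = {p:.5f}")
```

Because the percentages are rounded to one decimal, the back-calculated class sizes shift the statistic slightly; with exact class sizes the same function could be applied to any count (%) row of the table.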
Fig. 1. Training results. Top panel: Model training result as a function of increasing sample size (1–100%). Bottom panel: Model training performance on the three predictor sets using 100% of training samples (n = 31,166) with the 5 and 10 predictor sets as chosen by each model. Points are mean values of each model’s resampled training runs optimized on the Area Under the Receiver Operating Characteristic (AUROC). Error bars indicate ± SD. C5.0, Boosted C5.0; LR, Logistic regression; RF, Random Forest; SVM, Support Vector Machine
Fig. 2. The importance of the 15 most important predictors chosen by each model. Derived from 100% of training samples (n = 31,166). Importance is scaled relative to the most important predictor within each model based on model-specific metrics (LR, z-value; C5.0, tree split usage; RF, Gini importance; SVM, univariate AUROC). Prefixes: Previous = before the first MI; Intake = at hospital/lab arrival; CCU = during the Coronary Care Unit stay; Discharge = at discharge from hospital. An unspecified prefix signifies either a fixed predictor or that the predictor was registered at some time-point before hospital discharge. C5.0, Boosted C5.0; LR, Logistic regression; RF, Random Forest; SVM, Support Vector Machine; ACE, Angiotensin-converting-enzyme; ECG, Electrocardiogram; PCI, Percutaneous coronary intervention
Additional test performance metrics
| Model | Sens/Spec | PPV/NPV | Detection rate | Detection incidence | Accuracy (95% CI) |
|---|---|---|---|---|---|
| Full predictor set | |||||
| LR | 0.771/0.770 | 0.293/0.965 | 0.085 | 0.290 | 0.770 (0.764 to 0.776) |
| C5.0 | 0.798/0.739 | 0.275/0.967 | 0.088 | 0.320 | 0.746 (0.740 to 0.752) |
| RF | 0.789/0.752 | 0.282/0.966 | 0.087 | 0.307 | 0.756 (0.750 to 0.762) |
| SVM | 0.784/0.751 | 0.280/0.966 | 0.086 | 0.308 | 0.755 (0.749 to 0.761) |
| Reduced predictor set | |||||
| LR | 0.754/0.758 | 0.278/0.961 | 0.083 | 0.298 | 0.758 (0.752 to 0.763) |
| C5.0 | 0.768/0.757 | 0.281/0.964 | 0.084 | 0.301 | 0.758 (0.752 to 0.764) |
| RF | 0.771/0.746 | 0.272/0.963 | 0.085 | 0.311 | 0.748 (0.742 to 0.754) |
| SVM | 0.751/0.756 | 0.275/0.961 | 0.083 | 0.300 | 0.755 (0.749 to 0.761) |
| Minimal predictor set | |||||
| LR | 0.749/0.750 | 0.270/0.960 | 0.082 | 0.305 | 0.750 (0.744 to 0.756) |
| C5.0 | 0.758/0.736 | 0.262/0.961 | 0.083 | 0.319 | 0.738 (0.732 to 0.744) |
| RF | 0.755/0.703 | 0.239/0.959 | 0.083 | 0.348 | 0.708 (0.702 to 0.715) |
| SVM | 0.732/0.753 | 0.268/0.958 | 0.080 | 0.300 | 0.751 (0.745 to 0.757) |
Results of trained models on 100% of testing data (n = 20,777) by predictor set. For all models, Base Rate Incidence = 0.110, and No Information Rate = 0.890
Sens sensitivity, Spec specificity, PPV positive predictive value, NPV negative predictive value, CI confidence interval, NIR no information rate, C5.0 C5.0 boosted decision trees, LR logistic regression, RF random forest, SVM support vector machine
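The columns of the test-performance table are tied together by the base rate, and a short sketch can make that relationship explicit. The code below assumes that non-survival is the positive class (inferred from the stated Base Rate Incidence of 0.110), that "Detection rate" is the true-positive fraction of all cases, and that "Detection incidence" is the fraction of all cases flagged positive. The confidence interval uses a normal (Wald) approximation, which is an assumption, since the source does not state how the intervals were computed. The function names are hypothetical.

```python
from math import sqrt


def derived_metrics(sensitivity, specificity, base_rate):
    """Derive PPV, NPV, detection rate/incidence and accuracy.

    All quantities are expressed as fractions of the whole test set.
    """
    tp = sensitivity * base_rate                # true positives
    fn = (1 - sensitivity) * base_rate          # false negatives
    tn = specificity * (1 - base_rate)          # true negatives
    fp = (1 - specificity) * (1 - base_rate)    # false positives
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "detection_rate": tp,
        "detection_incidence": tp + fp,
        "accuracy": tp + tn,
    }


def wald_ci(proportion, n, z=1.96):
    """Normal-approximation 95% CI for a proportion (ASSUMED method)."""
    half_width = z * sqrt(proportion * (1 - proportion) / n)
    return proportion - half_width, proportion + half_width


# LR, full predictor set: Sens/Spec = 0.771/0.770, base rate 0.110, n = 20,777
lr_full = derived_metrics(0.771, 0.770, 0.110)
ci_low, ci_high = wald_ci(lr_full["accuracy"], 20777)

for name, value in lr_full.items():
    print(f"{name}: {value:.3f}")
print(f"accuracy 95% CI: {ci_low:.3f} to {ci_high:.3f}")
```

Under these assumptions the LR row of the full predictor set is recovered to three decimals, which also shows why accuracy stays below the No Information Rate of 0.890: with an 11% base rate, a model that trades specificity for sensitivity necessarily misclassifies many survivors.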