Osung Kwon, Wonjun Na, Dong Hyun Yang, Young-Hak Kim, Heejun Kang, Tae Joon Jun, Jihoon Kweon, Gyung-Min Park, YongHyun Cho, Cinyoung Hur, Jungwoo Chae, Do-Yoon Kang, Pil Hyung Lee, Jung-Min Ahn, Duk-Woo Park, Soo-Jin Kang, Seung-Whan Lee, Cheol Whan Lee, Seong-Wook Park, Seung-Jung Park.
Abstract
BACKGROUND: Although there is growing interest in prediction models based on electronic medical records (EMRs) to identify patients at risk of adverse cardiac events following invasive coronary treatment, robust models that fully utilize EMR data are limited.
Keywords: adverse cardiac event; big data; coronary artery disease; electronic medical record; machine learning; mortality; prediction
Year: 2022 PMID: 35544292 PMCID: PMC9133980 DOI: 10.2196/26801
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Study diagram. Database, machine learning, and validation. AMC: Asan Medical Center; CABG: coronary artery bypass grafting; EMR: electronic medical record; ML: machine learning; PCI: percutaneous coronary intervention.
Figure 2. An example case incorporating serial and various electronic medical record data to predict adverse events. BP: blood pressure; BSA: body surface area; BUN: blood urea nitrogen; CAG: coronary angiography; CK-MB: creatine kinase myocardial band; Dia: diameter; EDD: end diastolic dimension; EF: ejection fraction; EKG: electrocardiogram; ESD: end systolic dimension; FFR: fractional flow reserve; GLS: global longitudinal strain; Hb: hemoglobin; HR: heart rate; LDL: low-density lipoprotein; Leng: length; Lp(a): lipoprotein(a); LV: left ventricle; PCI: percutaneous coronary intervention; pLAD: proximal left anterior descending; Pr: pressure; RR: respiratory rate.
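Hyperparameters and their values for each model.
| Model | Hyperparameter | Value |
| Logistic regression | Solver | liblinear |
| Logistic regression | Maximal iteration | 100 |
| Random forest | Number of estimators | 100 |
| Random forest | Maximal depth | 10 |
| Gradient boosting machine | Objective | binary |
| Gradient boosting machine | Estimators | 150 |
| Gradient boosting machine | Boosting type | Gradient boosting decision tree |
| Gradient boosting machine | Number of leaves | 15 |
| Gradient boosting machine | Maximal depth | -1 (no limit) |
| Gradient boosting machine | Learning rate | 0.025 |
| Gradient boosting machine | Minimal number of data in child | 90 |
| Feedforward neural network | Learning rate | 0.0002 |
| Feedforward neural network | Hidden layer units | (64,64) |
| Feedforward neural network | Batch size | 64 |
| Feedforward neural network | Epoch | 40 |
| Feedforward neural network | Dropout rate | 0.5 |
| Feedforward neural network | Optimizer | Adam (beta1=0.5, beta2=0.999) |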
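The hyperparameters above can be collected into keyword-argument dictionaries, one per model. The following is a minimal sketch under stated assumptions: the source does not name the software used, so the keyword names (chosen to resemble scikit-learn, LightGBM, and Keras-style arguments), the grouping of rows by model, and the `HYPERPARAMS` and `describe` names are all hypothetical.

```python
# Hypothetical mapping of the tabulated hyperparameters to keyword arguments.
# Assumption: the source does not state which libraries were used; the key
# names below merely follow common scikit-learn / LightGBM / Keras spellings.
HYPERPARAMS = {
    # Keys resemble sklearn.linear_model.LogisticRegression arguments.
    "logistic_regression": {"solver": "liblinear", "max_iter": 100},
    # Keys resemble sklearn.ensemble.RandomForestClassifier arguments.
    "random_forest": {"n_estimators": 100, "max_depth": 10},
    # Keys resemble lightgbm.LGBMClassifier arguments.
    "gradient_boosting_machine": {
        "objective": "binary",
        "n_estimators": 150,
        "boosting_type": "gbdt",   # gradient boosting decision tree
        "num_leaves": 15,
        "max_depth": -1,           # -1 means no depth limit
        "learning_rate": 0.025,
        "min_child_samples": 90,   # minimal number of data in a child node
    },
    # No single constructor is implied here; these are plain settings.
    "feedforward_neural_network": {
        "learning_rate": 0.0002,
        "hidden_units": (64, 64),
        "batch_size": 64,
        "epochs": 40,
        "dropout_rate": 0.5,
        "optimizer": "adam",
        "adam_beta_1": 0.5,
        "adam_beta_2": 0.999,
    },
}


def describe(model_name):
    """Return the settings of one model as a single readable string."""
    params = HYPERPARAMS[model_name]
    return ", ".join(f"{key}={value!r}" for key, value in params.items())


if __name__ == "__main__":
    for name in HYPERPARAMS:
        print(f"{name}: {describe(name)}")
```

Keeping the settings in one plain dictionary makes it easy to log them alongside results or to unpack them into whichever estimator constructor is actually used.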
Baseline clinical characteristics of the development and internal validation set.
| Characteristics | Total population (N=16,793) | Percutaneous coronary intervention | Coronary artery bypass grafting surgery (n=4274) |
| Age (years), mean (SD) | 62.7 (10.2) | 62.2 (10.5) | 64.1 (9.4) |
| Male sex, n (%) | 12,465 (74.2) | 9312 (74.4) | 3153 (73.8) |
| Body mass index (kg/m2), mean (SD) | 24.9 (3.1) | 25.0 (3.0) | 24.6 (3.1) |
| Hypertension, n (%) | 10,697 (63.7) | 7758 (62) | 2939 (68.8) |
| Diabetes mellitus, n (%) | 6084 (36.2) | 4127 (33) | 1957 (45.8) |
| Hyperlipidemia, n (%) | 9200 (54.8) | 6932 (55.4) | 2268 (53.1) |
| Current cigarette smoker, n (%) | 3009 (17.9) | 2424 (19.4) | 585 (13.7) |
| Prior myocardial infarction, n (%) | 568 (3.4) | 394 (3.1) | 174 (4.1) |
| Previous cerebrovascular accident, n (%) | 596 (3.5) | 420 (3.4) | 176 (4.1) |
| History of congestive heart failure, n (%) | 243 (1.4) | 132 (1.1) | 111 (2.6) |
| Peripheral vascular disease, n (%) | 278 (1.7) | 199 (1.6) | 79 (1.8) |
| Valvular heart disease, n (%) | 387 (2.3) | 106 (0.8) | 281 (6.6) |
| Chronic renal insufficiency, n (%) | 566 (3.4) | 363 (2.9) | 203 (4.7) |
| Chronic lung disease, n (%) | 386 (2.3) | 306 (2.4) | 80 (1.9) |
| Chronic liver disease, n (%) | 487 (2.9) | 396 (3.2) | 91 (2.1) |
| History of malignancy, n (%) | 1019 (6.1) | 816 (6.5) | 203 (4.7) |
| Presentation with acute myocardial infarction, n (%) | 3032 (18.1) | 2509 (20) | 523 (12.2) |
| Admission via emergency department, n (%) | 5054 (30.1) | 3941 (31.5) | 1113 (26) |
| Admission via outpatient clinics, n (%) | 11,739 (69.9) | 8578 (68.5) | 3161 (74) |
Baseline clinical characteristics of the external validation set.
| Characteristics | Total population (n=4159) | Percutaneous coronary intervention | Coronary artery bypass grafting surgery (n=209) |
| Age (years), mean (SD) | 61.7 (10.9) | 61.6 (9.4) | 62.7 (10.9) |
| Male sex, n (%) | 2913 (70) | 2779 (70.3) | 134 (64.1) |
| Body mass index (kg/m2), mean (SD) | 24.0 (5.4) | 24.0 (5.2) | 23.8 (6.4) |
| Hypertension, n (%) | 1947 (46.8) | 1851 (46.8) | 96 (45.9) |
| Diabetes mellitus, n (%) | 1278 (30.7) | 1195 (30.2) | 83 (39.7) |
| Hyperlipidemia, n (%) | 1154 (27.7) | 1098 (27.7) | 56 (26.7) |
| Current cigarette smoker, n (%) | 1285 (30.9) | 1234 (31.2) | 51 (24.4) |
| Prior myocardial infarction, n (%) | 280 (6.7) | 265 (6.7) | 15 (7.1) |
| Previous cerebrovascular accident, n (%) | 233 (5.6) | 220 (5.5) | 13 (6.2) |
| History of congestive heart failure, n (%) | 76 (1.8) | 71 (1.7) | 5 (2.3) |
| Peripheral vascular disease, n (%) | 49 (1.1) | 45 (1.1) | 4 (1.9) |
| Valvular heart disease, n (%) | 27 (0.6) | 18 (0.4) | 9 (4.3) |
| Chronic renal insufficiency, n (%) | 130 (3.1) | 123 (3.1) | 7 (3.3) |
| Chronic lung disease, n (%) | 146 (3.5) | 143 (3.6) | 3 (1.4) |
| Chronic liver disease, n (%) | 201 (4.8) | 193 (4.8) | 8 (3.8) |
| History of malignancy, n (%) | 192 (4.6) | 183 (4.6) | 9 (4.3) |
| Presentation with acute myocardial infarction, n (%) | 1357 (32.6) | 1314 (33.2) | 43 (20.5) |
| Admission via emergency department, n (%) | 1706 (41) | 1634 (41.3) | 72 (34.4) |
| Admission via outpatient clinics, n (%) | 2453 (58.9) | 2316 (58.6) | 137 (65.5) |
Figure 3. Five-fold cross-validation performance of each machine learning model in predicting 30-day mortality after invasive treatment. A. Area under the receiver operating characteristic curve, B. Area under the precision-recall curve, and C. Calibration plot with Brier score.
Figure 4. External validation performance of each machine learning model in predicting 30-day mortality after invasive treatment. A. Area under the receiver operating characteristic curve, B. Area under the precision-recall curve, and C. Calibration plot with Brier score.
Figure 5. Prediction performance of the gradient boosting machine model assessed by the area under the receiver operating characteristic curve. A. Each data category, B. Combination of data categories. AUROC: area under the receiver operating characteristic curve.
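The five-fold cross-validation named in the Figure 3 caption can be sketched as follows. This is an illustrative sketch only: the source does not say whether folds were stratified by outcome, so the stratification, the fixed seed, and the function names `stratified_kfold_indices` and `train_test_splits` are assumptions.

```python
import random


def stratified_kfold_indices(labels, k=5, seed=0):
    """Assign sample indices to k folds, spreading each class evenly.

    Assumption: stratifying by outcome is a choice made for this sketch,
    because rare events (such as 30-day mortality) could otherwise be
    absent from a fold.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def train_test_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices), holding out each fold once."""
    folds = stratified_kfold_indices(labels, k, seed)
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(
            i for f, fold in enumerate(folds) if f != held_out for i in fold
        )
        yield train, test


if __name__ == "__main__":
    # Toy outcome vector: 10 events among 100 patients (made-up data).
    labels = [1] * 10 + [0] * 90
    for n, (train, test) in enumerate(train_test_splits(labels), start=1):
        events = sum(labels[i] for i in test)
        print(f"fold {n}: train={len(train)} test={len(test)} events={events}")
```

Each sample appears in exactly one held-out fold, so every prediction used for the cross-validated metrics comes from a model that did not see that sample during training.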
Performance of machine learning models for predicting major adverse cardiac events.
| Model | Area under the receiver operating characteristic curve | 95% CI | P value | Area under the precision-recall curve | Brier score |
| Logistic regression | 0.83 | 0.82-0.88 | <.001 | 0.37 | 0.06 |
| Random forest | 0.85 | 0.83-0.88 | <.001 | 0.39 | 0.06 |
| Gradient boosting machine | 0.88 | 0.85-0.90 | <.001 | 0.50 | 0.05 |
| Feedforward neural network | 0.85 | 0.83-0.88 | <.001 | 0.41 | 0.06 |
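Two of the metrics reported above, the area under the receiver operating characteristic curve and the Brier score, can be computed directly from predicted probabilities and observed outcomes. The sketch below uses only the standard definitions; the function names and the toy data are hypothetical and are not taken from the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome.

    Lower is better; 0 means perfectly confident, correct predictions.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true, y_prob):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen event case receives a
    higher score than a randomly chosen non-event case (ties count 0.5).
    This O(n_pos * n_neg) form is meant for illustration, not large data.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up outcomes (1 = event) and predicted probabilities.
    y_true = [0, 0, 1, 0, 1]
    y_prob = [0.10, 0.40, 0.35, 0.20, 0.80]
    print(f"AUROC = {auroc(y_true, y_prob):.3f}")
    print(f"Brier = {brier_score(y_true, y_prob):.4f}")
```

The two metrics answer different questions: AUROC measures how well the model ranks event cases above non-event cases, while the Brier score also penalizes poorly calibrated probabilities.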
Top 10 important variables of each machine learning model.
| Rank | Logistic regression | Random forest | Gradient boosting machine | Feedforward neural network |
| 1 | Systolic blood pressure | Serum aspartate aminotransferase | Serum protein | Serum phosphorus |
| 2 | Diastolic blood pressure | PaCO2 | Age | PaCO2 |
| 3 | Respiratory rate | Arterial pH | Serum phosphorus | Hemoglobin |
| 4 | PaCO2 | PaO2 | Systolic blood pressure | Systolic blood pressure |
| 5 | Arterial pH | Serum alanine aminotransferase | Platelet | Normal sinus rhythm in electrocardiogram |
| 6 | PaO2 | Total bilirubin | Serum aspartate aminotransferase | Estimated glomerular filtration rate |
| 7 | Aspartate aminotransferase | Creatine kinase-myocardial band | PaO2 | Serum glucose |
| 8 | Pulse rate | White blood cell | Serum albumin | Platelet |
| 9 | Blood urea nitrogen | Serum sodium | Pulse rate | PaO2 |
| 10 | Serum phosphorus | Platelet | Activated partial thromboplastin time | Arterial pH |