Hyung-Chul Lee, Hyun-Kyu Yoon, Karam Nam, Youn Joung Cho, Tae Kyong Kim, Won Ho Kim, Jae-Hyon Bahk.
Abstract
Machine learning approaches have been introduced to predict postoperative outcomes with predictive ability better than or comparable to that of conventional statistical analysis. We sought to compare the performance of machine learning approaches with that of logistic regression analysis in predicting acute kidney injury after cardiac surgery. We retrospectively reviewed 2010 patients who underwent open heart surgery and thoracic aortic surgery. Data on baseline medical condition, intraoperative anesthesia, and surgery were obtained. The primary outcome was postoperative acute kidney injury (AKI), defined according to the Kidney Disease Improving Global Outcomes criteria. The following machine learning techniques were used: decision tree, random forest, extreme gradient boosting, support vector machine, neural network classifier, and deep learning. The performance of these techniques was compared with that of logistic regression analysis in terms of the area under the receiver-operating characteristic curve (AUC). During the first postoperative week, AKI occurred in 770 patients (38.3%). The best AUC was achieved by the gradient boosting machine, both for predicting AKI of all stages (0.78, 95% confidence interval (CI) 0.75–0.80) and for predicting stage 2 or 3 AKI. The AUC of logistic regression analysis was 0.69 (95% CI 0.66–0.72). Decision tree, random forest, and support vector machine showed performance similar to that of logistic regression. In our comprehensive comparison of machine learning approaches with logistic regression analysis, the gradient boosting technique showed the best performance, with the highest AUC and a lower error rate. We developed an Internet-based risk estimator that could be used for real-time processing of patient data to estimate the risk of AKI at the end of surgery.
Keywords: acute kidney injury; cardiovascular surgery; machine learning
Year: 2018 PMID: 30282956 PMCID: PMC6210196 DOI: 10.3390/jcm7100322
Source DB: PubMed Journal: J Clin Med ISSN: 2077-0383 Impact factor: 4.241
Table 1. Patient characteristics and postoperative renal function in the dataset.
| Variables | All | Training Set | Test Set | p-value |
|---|---|---|---|---|
| Patient population, n | 2010 | 1005 | 1005 | |
| Demographic data | ||||
| Age (years) | 64 (56–71) | 64 (56–71) | 64 (55–71) | 0.884 |
| Female (n) | 553 (27.5) | 279 (27.8) | 274 (27.3) | 0.803 |
| Body-mass index (kg/m2) | 23.8 (21.6–25.9) | 23.9 (21.7–25.9) | 23.7 (21.5–25.9) | 0.563 |
| Surgery type | ||||
| Coronary artery bypass (n) | 911 (45.3) | 473 (47.1) | 438 (43.6) | 0.117 |
| Valvular heart surgery (n) | 1052 (52.3) | 503 (50.0) | 549 (54.6) | 0.060 |
| Thoracic aortic surgery (n) | 47 (2.3) | 29 (2.9) | 18 (1.8) | 0.104 |
| Emergency (n) | 51 (2.5) | 26 (2.6) | 25 (2.5) | 0.887 |
| Previous cardiac surgery (n) | 149 (7.4) | 75 (7.5) | 74 (7.4) | 0.932 |
| Medical history | ||||
| Hypertension (n) | 1057 (52.6) | 538 (53.5) | 519 (51.6) | 0.396 |
| Diabetes mellitus (n) | 588 (29.3) | 302 (30.0) | 286 (28.5) | 0.433 |
| Three vessel disease (n) | 602 (30.0) | 306 (30.4) | 296 (29.5) | 0.626 |
| Previous coronary stent insertion (n) | 235 (11.7) | 118 (11.7) | 117 (11.6) | 0.945 |
| Cerebrovascular accident (n) | 228 (11.3) | 101 (10.0) | 127 (12.6) | 0.078 |
| COPD (n) | 100 (5.0) | 49 (4.9) | 51 (5.1) | 0.837 |
| Pulmonary hypertension (n) | 129 (6.4) | 60 (6.0) | 69 (6.9) | 0.413 |
| Chronic kidney disease (n) | 121 (6.0) | 57 (5.7) | 64 (6.4) | 0.512 |
| Preoperative Medication | ||||
| ACEi (n) | 114 (5.7) | 58 (5.8) | 56 (5.6) | 0.847 |
| ARB (n) | 249 (12.4) | 122 (12.1) | 127 (12.6) | 0.735 |
| β-blocker (n) | 389 (19.4) | 199 (19.8) | 190 (18.9) | 0.611 |
| Diuretics (n) | 297 (14.8) | 133 (13.2) | 164 (16.3) | 0.059 |
| Calcium channel blocker (n) | 287 (14.3) | 151 (15.0) | 136 (13.5) | 0.339 |
| Statins (n) | 506 (25.2) | 255 (25.4) | 251 (25.0) | 0.837 |
| Aspirin (n) | 957 (47.6) | 498 (49.6) | 459 (45.7) | 0.090 |
| Baseline laboratory findings | ||||
| Preoperative LVEF (%) | 58 (52–63) | 58 (53–63) | 57 (52–63) | 0.427 |
| Hematocrit (%) | 38 (34–42) | 38 (34–42) | 38 (34–42) | 0.844 |
| Serum creatinine (mg/dL) | 0.94 (0.80–1.12) | 0.93 (0.80–1.10) | 0.94 (0.80–1.13) | 0.613 |
| Serum Albumin (g/dL) | 4.1 (3.8–4.3) | 4.1 (3.9–4.3) | 4.1 (3.8–4.3) | 0.183 |
| Serum uric acid (mg/dL) | 4.6 (3.7–5.6) | 4.6 (3.7–5.7) | 4.5 (3.6–5.5) | 0.190 |
| Blood glucose (mg/dL) | 115 (96–146) | 116 (96–146) | 113 (96–147) | 0.500 |
| Surgery and anesthesia details | ||||
| Operation time (h) | 6.25 (5.33–7.25) | 6.25 (5.41–7.27) | 6.25 (5.33–7.24) | 0.654 |
| Anesthesia time (h) | 7.50 (6.25–8.50) | 7.50 (6.50–8.50) | 7.50 (6.50–8.42) | 0.608 |
| Total intravenous anesthesia (n) | 1858 (92.4) | 937 (93.2) | 921 (91.6) | 0.206 |
| Inhalational anesthesia (n) | 152 (7.6) | 68 (6.8) | 84 (8.4) | 0.206 |
| Intraoperative crystalloid infusion (mL) | 2150 (1150–3000) | 2200 (1100–3100) | 2150 (1200–2950) | 0.656 |
| Intraoperative colloid use (mL) | 900 (350–1500) | 1000 (350–1550) | 800 (350–1500) | 0.067 |
| pRBC transfusion during surgery (units) | 2 (0–3) | 2 (0–3) | 2 (0–3) | 0.725 |
| FFP transfusion during surgery (units) | 0 (0–3) | 0 (0–3) | 0 (0–3) | 0.589 |
| Intraoperative mean arterial pressure (mmHg) | 72 (67–78) | 72 (67–78) | 72 (67–78) | 0.974 |
| Intraoperative mean cardiac index (L/min/m2) | 2.3 (2.1–2.7) | 2.3 (2.1–2.7) | 2.3 (2.1–2.7) | 0.257 |
| Intraoperative mean SvO2 (%) | 73 (69–76) | 73 (69–76) | 73 (68–76) | 0.207 |
| Intraoperative diuretics use (n) | 204 (10.1) | 91 (9.1) | 113 (11.2) | 0.107 |
| Postoperative renal function | ||||
| AKI according to KDIGO criteria (n) | | | | 0.596 |
| Stage 1 | 591 (29.4) | 282 (28.1) | 309 (30.7) | |
| Stage 2 | 114 (5.7) | 60 (6.0) | 54 (5.4) | |
| Stage 3 | 65 (3.2) | 33 (3.3) | 32 (3.2) | |
| Hemodialysis dependent (n) | 125 (6.2) | 60 (6.0) | 65 (6.5) | 0.644 |
| GFR at postoperative day one (mL/min/1.73 m2) | 79 (58–94) | 79 (57–95) | 78 (58–94) | 0.864 |
Data are presented as median (interquartile range) or number (%). COPD = chronic obstructive pulmonary disease, ACEi = angiotensin-converting-enzyme inhibitor, AKI = acute kidney injury, ARB = angiotensin II receptor blocker, LVEF = left ventricular ejection fraction, pRBC = packed red blood cell transfusion, FFP = fresh-frozen plasma, SvO2 = mixed venous oxygen saturation, KDIGO = kidney disease improving global outcomes, GFR = glomerular filtration rate.
Table 2. Comparison of area under receiver-operating characteristic curve among the different models.
| Model | Software or R Packages | Error Rate of Test Data Set | AUC in the Test Set |
|---|---|---|---|
| Machine learning techniques | |||
| Decision tree, CART | tree, rpart | 28.9% | 0.71 (0.67–0.74) |
| ROSE decision tree | ROSE | 30.6% | 0.66 (0.65–0.72) |
| Random forest model | randomForest | 30.4% | 0.68 (0.64–0.71) |
| Random forest SMOTE model | DMwR | 33.5% | 0.68 (0.65–0.71) |
| Gradient boosting classification | XGBoost | 26.0% | 0.78 (0.75–0.80) * |
| Support vector machine, classifier | e1071 | 31.4% | 0.67 (0.63–0.70) |
| Support vector machine, SMOTE model | UBL | 33.3% | 0.68 (0.65–0.71) |
| Support vector machine, least square | kernlab | 30.2% | 0.69 (0.66–0.72) |
| Neural network classifier | nnet | 38.4% | 0.64 (0.61–0.68) |
| Neural network classifier | neuralnet | 43.9% | 0.57 (0.53–0.61) |
| Deep belief network | h2o | 47.2% | 0.55 (0.51–0.59) |
| Risk scores from logistic regression analysis | |||
| Logistic regression model, stepwise variable selection | R | 33.6% | 0.69 (0.66–0.72) |
| Logistic regression model, without variable selection | R | 32.8% | 0.70 (0.68–0.73) |
| AKICS score | R | 43.4% | 0.57 (0.53–0.60) |
| Wijeysundera and colleagues | R | 45.2% | 0.55 (0.51–0.59) |
| Mehta and colleagues | R | 45.8% | 0.55 (0.52–0.59) |
| Thakar and colleagues | R | 45.3% | 0.56 (0.53–0.60) |
| Brown and colleagues | R | 43.1% | 0.58 (0.54–0.61) |
| Aronson and colleagues | R | 43.3% | 0.58 (0.51–0.62) |
| Fortescue and colleagues | R | 44.2% | 0.56 (0.52–0.60) |
| Rahmanian and colleagues | R | 47.0% | 0.55 (0.52–0.58) |
Error rate was defined as the number of false-positive and false-negative cases divided by the total number of cases in the test set. * Significantly greater than the AUC of all the other techniques. AUC = area under the receiver operating characteristic curve, CART = Classification And Regression Tree, ROSE = Random Over-Sampling Examples, SMOTE = Synthetic Minority Over-sampling Technique, DMwR = Data Mining with R, XGBoost = eXtreme Gradient Boosting, UBL = utility-based learning, AKICS = acute kidney injury following cardiac surgery.
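The two metrics reported in this table can be illustrated with a minimal sketch. It assumes binary labels (1 = AKI, 0 = no AKI), a hypothetical 0.5 probability threshold for the error rate, and a rank-based (Mann-Whitney) formulation of the AUC. The function names and toy data are illustrative only and are not taken from the study, which used R packages.

```python
def error_rate(y_true, y_pred):
    """(false positives + false negatives) / all cases in the test set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return (fp + fn) / len(y_true)


def auc(y_true, scores):
    """Probability that a random positive case outranks a random negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study).
y_true = [1, 0, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.8]

# Assumed classification threshold of 0.5 to turn scores into labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("error rate:", error_rate(y_true, y_pred))
print("AUC:", auc(y_true, scores))
```

The error rate depends on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is one reason a model can rank differently on the two columns.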
Figure 1. Comparison of AUC among the different machine learning models and logistic regression model. AKICS = acute kidney injury after cardiac surgery.
Table 3. Development of multivariable logistic regression model to predict acute kidney injury using stepwise variable selection.
| Variable | Beta-Coefficient | Odds Ratio | 95% CI | p-value |
|---|---|---|---|---|
| Age, per 10 years | 0.128 | 1.14 | 1.04–1.61 | 0.004 |
| History of hypertension | 0.320 | 1.38 | 1.12–1.69 | 0.002 |
| Baseline chronic kidney disease | 0.907 | 2.48 | 1.62–3.78 | <0.001 |
| Preoperative E/e´ > 15 | 0.454 | 1.58 | 1.27–1.96 | <0.001 |
| Preoperative hematocrit, % | −0.062 | 0.94 | 0.92–0.96 | <0.001 |
| Surgery time, per 1 h | 0.073 | 1.08 | 1.01–1.15 | 0.036 |
| Intraoperative red blood cell transfusion, unit | 0.056 | 1.06 | 1.01–1.11 | 0.022 |
| Intraoperative fresh frozen plasma transfusion, unit | 0.085 | 1.09 | 1.03–1.15 | 0.001 |
| Intraoperative diuretics use | 0.630 | 1.88 | 1.36–2.60 | <0.001 |
Multivariable logistic regression analysis was performed using all the variables in Table 1. Stepwise backward variable selection was used, with a p-value cutoff of less than 0.10. Nagelkerke’s R2 was 0.32, and the Hosmer-Lemeshow goodness-of-fit test showed good calibration (chi-square = 12.1, p = 0.231). CI = confidence interval, E/e´ = ratio of early transmitral flow velocity to early diastolic velocity of the mitral annulus.
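In a logistic regression model, each odds ratio is the exponential of its beta-coefficient. The sketch below re-derives the odds ratios from the coefficients listed in the table as a consistency check. The variable labels are shortened for illustration, and the 0.015 tolerance is an assumption that allows for rounding of the published coefficients to three decimals. The model intercept is not reported here, so absolute risks cannot be computed from these values alone.

```python
import math

# (label, beta-coefficient, reported odds ratio), copied from the table above.
coefficients = [
    ("Age, per 10 years", 0.128, 1.14),
    ("History of hypertension", 0.320, 1.38),
    ("Baseline chronic kidney disease", 0.907, 2.48),
    ("Preoperative E/e' > 15", 0.454, 1.58),
    ("Preoperative hematocrit, %", -0.062, 0.94),
    ("Surgery time, per 1 h", 0.073, 1.08),
    ("Intraoperative red blood cell transfusion, unit", 0.056, 1.06),
    ("Intraoperative fresh frozen plasma transfusion, unit", 0.085, 1.09),
    ("Intraoperative diuretics use", 0.630, 1.88),
]


def odds_ratio(beta, units=1.0):
    """Odds ratio for a change of `units` in a predictor with coefficient beta."""
    return math.exp(beta * units)


# Assumed tolerance covering rounding of the published values.
TOLERANCE = 0.015

for label, beta, reported in coefficients:
    computed = odds_ratio(beta)
    status = "ok" if abs(computed - reported) < TOLERANCE else "check"
    print(f"{label}: exp({beta}) = {computed:.3f}, reported {reported} [{status}]")
```

Because the model is additive on the log-odds scale, the effect of several units multiplies: `odds_ratio(0.085, units=3)` gives the odds ratio associated with three units of fresh frozen plasma under this model.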
Figure 2. Simple decision tree model showing the classification of patients with (1) and without (0) acute kidney injury (AKI). The two-decimal number in each cell is the probability of developing AKI at that node of the classification tree. The blue or green shading becomes darker as a patient is more likely to develop AKI or not. The percentage in each box denotes the percentage of patients with each discriminating variable from the CART (Classification And Regression Tree) analysis. Intraop = intraoperative, preop = preoperative, pRBC = packed red blood cells, Hct = hematocrit, Cr = creatinine, FFP = fresh frozen plasma, E_or_e_prime = preoperative ratio of early transmitral flow velocity to early diastolic velocity of the mitral annulus.
Figure 3. Importance matrix plot of the gradient boosting machine. This figure shows the importance of each covariate in the final model. ARB = angiotensin receptor blocker, BMI = body-mass index, CABG = coronary artery bypass graft, CCB = calcium channel blocker, CKD = chronic kidney disease, Cr = creatinine, CVA = history of cerebrovascular accident, EF = ejection fraction, E_or_e_prime = preoperative ratio of early transmitral flow velocity to early diastolic velocity of the mitral annulus, FFP = fresh frozen plasma, hct = hematocrit, HTN = hypertension, intraop = intraoperative, mean SvO2 = intraoperative mean mixed venous oxygen saturation, three_VD = three vessel coronary disease, preop = preoperative, pRBC = packed red blood cells.