Ira S Hofer1,2, Christine Lee2, Eilon Gabel1, Pierre Baldi3, Maxime Cannesson1.
Abstract
During the perioperative period patients often suffer complications, including acute kidney injury (AKI), reintubation, and mortality. To prevent these complications effectively, high-risk patients must be readily identified. However, most current risk scores are designed to predict a single postoperative complication and often lack specificity at the patient level. In other fields, machine learning (ML) has been shown to successfully create models that predict multiple end points from a single input feature set. We hypothesized that ML can be used to create models to predict postoperative mortality, AKI, reintubation, and a combined outcome using a single set of features available at the end of surgery. A set of 46 features available at the end of surgery, including drug dosing, blood loss, vital signs, and others, was extracted. In addition, six features accounting for total intraoperative hypotension were extracted and trialed for different models. A total of 59,981 surgical procedures met inclusion criteria, and the deep neural networks (DNN) were trained on 80% of the data, with 20% reserved for testing. The network performances were then compared to ASA Physical Status. In addition to creating separate models for each outcome, a multitask learning model was trialed that used information on all outcomes to predict the likelihood of each outcome individually. The overall rate of the examined complications in this data set was 0.79% for mortality, 22.3% (of 21,676 patients with creatinine values) for AKI, and 1.1% for reintubation. Overall, there was significant overlap between the various model types for each outcome, with no one modeling technique consistently performing the best. However, the best DNN models did beat the ASA score for all outcomes other than mortality.
The highest area under the receiver operating characteristic curve (AUC) models were 0.792 (0.775-0.808) for AKI, 0.879 (0.851-0.905) for reintubation, 0.907 (0.872-0.938) for mortality, and 0.874 (0.864-0.866) for any outcome. The ASA score alone achieved AUCs of 0.652 (0.636-0.669) for AKI, 0.787 (0.757-0.818) for reintubation, 0.839 (0.804-0.875) for mortality, and 0.76 (0.748-0.773) for any outcome. Overall, the DNN architecture was able to create models that outperformed the ASA physical status to predict all outcomes based on a single feature set, consisting of objective data available at the end of surgery. No one model architecture consistently performed the best.
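The AUC values and 95% CIs quoted in the abstract can be illustrated with a short sketch. The abstract does not state how the intervals were computed, so the percentile bootstrap used here is an assumption, as are the function names `auc` and `bootstrap_auc_ci` and the toy labels and scores.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive case is scored
    above a random negative case (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, not
    necessarily the one used in the paper)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class: AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data, purely illustrative (not from the study).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)
toy_ci = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=200)
```

The pairwise AUC is quadratic in the number of cases, which is fine for a sketch but too slow for a test set of 11,996 patients; a rank-based implementation would be preferable there.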
Keywords: Disease-free survival; Health policy; Translational research
Year: 2020 PMID: 32352036 PMCID: PMC7170922 DOI: 10.1038/s41746-020-0248-0
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
Description of demographic features.
| | Train | Test |
|---|---|---|
| # Patients | 47,985 | 11,996 |
| Age | 56 ± 17 | 56 ± 94 |
| EBL | 96 ± 539 | 18 ± 410 |
| # With Aline | 8583 | 2135 |
| # With PA | 1641 | 430 |
| # With CVC | 2443 | 635 |
| ASA score | ||
| 1 | 3022 | 762 |
| 2 | 17,930 | 4477 |
| 3 | 23,960 | 5985 |
| 4 | 2910 | 735 |
| 5 | 144 | 30 |
| 6 | 4 | 0 |
| Unknown | 15 | 7 |
| Primary CPT by specialty | ||
| Gastroenterology | 6615 (13.8%) | 1614 (13.5%) |
| General Surgery | 6552 (13.7%) | 1646 (13.7%) |
| Urology | 4005 (8.3%) | 1062 (8.9%) |
| Orthopedics | 3916 (8.2%) | 979 (8.2%) |
| Neurosurgery | 3686 (7.7%) | 916 (7.6%) |
| Otolaryngology | 3268 (6.8%) | 860 (7.2%) |
| Obstetrics and Gynecology | 2630 (5.5%) | 672 (5.6%) |
| Vascular Surgery | 1834 (3.8%) | 445 (3.7%) |
| Cardiac Surgery | 1396 (2.9%) | 372 (3.1%) |
| Thoracic Surgery | 1095 (2.3%) | 273 (2.3%) |
| Other | 8497 (17.7%) | 2049 (17.1%) |
| Unknown | 4491 (9.4%) | 1108 (9.2%) |
| AKI | ||
| Class 1 | 2501 (5.21%) | 622 (5.19%) |
| Class 2 | 369 (0.77%) | 99 (0.83%) |
| Class 3 | 1001 (2.09%) | 246 (2.05%) |
| Null | 30,616 (63.8%) | 7689 (64.1%) |
| Reintubation | 548 (1.14%) | 159 (1.33%) |
| Mortality | 389 (0.81%) | 87 (0.73%) |
AUC for prediction of acute kidney injury (AKI), reintubation, mortality, and any outcome with 95% CIs for the test set (N = 11,996) for the ASA score, logistic regression (LR) models, deep neural networks predicting individual outcomes (DNN individual), and deep neural networks predicting all three outcomes (DNN combined).
| Score | AKI^a | Reintubation | Mortality | Any outcome |
|---|---|---|---|---|
| ASA | 0.652 (0.636–0.669) | 0.787 (0.757–0.818) | 0.839 (0.804–0.875) | 0.76 (0.748–0.773) |
| RQI^b | 0.652 (0.623–0.683) | 0.878 (0.842–0.909) | 0.907 (0.86–0.942) | 0.8 (0.778–0.821) |
| RSI^c | 0.594 (0.571–0.615) | 0.829 (0.783–0.873) | 0.97 (0.944–0.99) | 0.597 (0.576–0.621) |
Each model was also evaluated on each feature set: the original feature set (OFS), OFS plus the minimum MAP features (OFS + MAP), and the reduced feature set (RFS). Note that for the LR and individual models, there is one model per outcome, and the predicted outcome probabilities from each model are stacked to predict any outcome. For the combined models, there is one model for all three outcomes, and those probabilities are stacked to predict any outcome. Bold results indicate the best AUC for that measure.
^a AKI labels were available for only 4307 of the test patients, so all AKI AUCs reflect results for only those patients with AKI labels.
^b RQI was calculated on 5591 test patients (63 reintubation, 38 mortality, and 491 any label), and on 2319 test patients with AKI labels (445 positive).
^c RSI was calculated on 11,939 test patients (159 reintubation, 86 mortality, and 1066 any label), and on 4294 test patients with AKI labels (967 positive).
Fig. 1 Visual depiction of the any outcome stacked models.
Summary figure describing the stacked “any” postoperative outcome models for the combined deep neural networks (DNN combined) trained to output probabilities of all three outcomes vs the deep neural networks (DNN individual) and logistic regression (LR) models that were individually trained per outcome.
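The record says the per-outcome probabilities are "stacked" to predict any outcome but does not describe the stacking model. As one possible reading, the sketch below fits a small logistic regression meta-model on the three predicted probabilities. Everything here is an assumption for illustration: the choice of logistic regression as the meta-model, the names `fit_stacker` and `predict_any`, the learning rate and epoch count, and the toy probabilities.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stacker(prob_rows, any_labels, lr=0.5, epochs=2000):
    """Fit a logistic regression on rows of
    (p_aki, p_reintubation, p_mortality) by batch gradient descent.
    Hypothetical meta-model; the paper's stacking method is not stated."""
    n = len(prob_rows)
    n_feat = len(prob_rows[0])
    w = [0.0] * n_feat
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for x, y in zip(prob_rows, any_labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_any(w, b, x):
    """Probability of any outcome for one row of per-outcome probabilities."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy per-outcome probabilities and any-outcome labels (illustrative only).
rows = [
    (0.05, 0.01, 0.01), (0.10, 0.02, 0.01), (0.60, 0.05, 0.02),
    (0.20, 0.40, 0.30), (0.08, 0.01, 0.02), (0.70, 0.30, 0.20),
]
labels = [0, 0, 1, 1, 0, 1]
w, b = fit_stacker(rows, labels)
low_risk = predict_any(w, b, (0.05, 0.01, 0.01))
high_risk = predict_any(w, b, (0.70, 0.30, 0.20))
```

Under this reading, the only difference between the individual and combined variants is where the three input probabilities come from: three separate models, or one multitask model with three outputs.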
Best threshold chosen by highest F1 score.
| Score | Threshold | F1 score (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | Precision (95% CI) | TN | FP | FN | TP | Accuracy (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| AKI^a | | | | | | | | | | |
| ASA | 3 | 0.412 (0.393–0.43) | 0.914 (0.896–0.93) | 0.27 (0.255–0.284) | 0.266 (0.251–0.281) | 901 | 2439 | 83 | 884 | 41.4 |
| | | | 0.631 (0.597–0.661) | 0.793 (0.78–0.807) | 0.469 (0.442–0.497) | 2650 | 690 | 357 | 610 | 75.7 |
| LR OFS + MAP features | 0.27574 | 0.537 (0.512–0.563) | 0.624 (0.59–0.654) | 0.798 (0.785–0.812) | 0.472 (0.444–0.5) | 2666 | 674 | 364 | 603 | 75.9 |
| LR RFS | 0.287606 | 0.537 (0.51–0.563) | 0.607 (0.575–0.637) | 0.811 (0.798–0.823) | 0.482 (0.454–0.511) | 2708 | 632 | 380 | 587 | 76.5 |
| DNN individual OFS | 0.408436 | 0.545 (0.52–0.569) | 0.654 (0.622–0.682) | 0.784 (0.77–0.798) | 0.467 (0.441–0.493) | 2618 | 722 | 335 | 632 | 75.5 |
| | | | 0.548 (0.515–0.579) | 0.881 (0.87–0.892) | 0.571 (0.542–0.603) | 2942 | 398 | 437 | 530 | 80.6 |
| DNN individual RFS | 0.406397 | 0.542 (0.516–0.568) | 0.618 (0.586–0.648) | 0.808 (0.794–0.821) | 0.483 (0.455–0.51) | 2699 | 641 | 369 | 598 | 76.5 |
| DNN combined OFS | 0.906036 | 0.548 (0.521–0.575) | 0.568 (0.536–0.598) | 0.854 (0.843–0.865) | 0.53 (0.501–0.559) | 2853 | 487 | 418 | 549 | 79.0 |
| DNN combined OFS + MAP features | 0.901522 | 0.549 (0.524–0.575) | 0.58 (0.55–0.61) | 0.846 (0.833–0.857) | 0.521 (0.493–0.552) | 2825 | 515 | 406 | 561 | 78.6 |
| DNN combined RFS | 0.869984 | 0.557 (0.53–0.583) | 0.575 (0.543–0.606) | 0.858 (0.846–0.87) | 0.539 (0.51–0.569) | 2865 | 475 | 411 | 556 | 79.4 |
Comparison of F1 score, sensitivity, and specificity with best thresholds for acute kidney injury (AKI), reintubation, mortality, and any outcome with 95% CIs for the test set (N = 11,996) for the ASA score, logistic regression (LR) models, deep neural networks predicting individual outcomes (DNN individual), and deep neural networks predicting all three outcomes (DNN combined). Each model was also evaluated on each feature set: the original feature set (OFS), OFS plus the minimum MAP features (OFS + MAP), and the reduced feature set (RFS). Note that for the LR and individual models, there is one model per outcome, and the predicted outcome probabilities from each model are stacked to predict any outcome. For the combined models, there is one model for all three outcomes, and those probabilities are stacked to predict any outcome.
^a AKI labels were available for only 4307 of the test patients, so all results for AKI are from those patients with AKI labels. The best F1 scores for the logistic regression and DNN models are shown in bold.
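The reported metrics follow directly from the confusion-matrix counts, which can be checked with a short sketch. The counts below are the ASA row for AKI from the table (TN 901, FP 2439, FN 83, TP 884). The function names `confusion_metrics` and `best_f1_threshold`, the convention that a score at or above the threshold is a positive prediction, and the toy scores are assumptions for illustration.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Sensitivity, specificity, precision, F1 and accuracy from counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
    }

def best_f1_threshold(labels, scores):
    """Return (threshold, F1) maximizing F1 over the observed scores.
    Assumes score >= threshold means a positive prediction."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        fn = sum(1 for y, s in zip(labels, scores) if s < t and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# ASA row for AKI, counts taken from the table.
asa = confusion_metrics(tn=901, fp=2439, fn=83, tp=884)
# Rounded: sensitivity 0.914, specificity 0.27, precision 0.266,
# F1 0.412, accuracy 41.4%, matching the reported row.

# Toy example of threshold selection (illustrative data only).
toy_best = best_f1_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Choosing the threshold by F1 trades specificity for sensitivity differently across models, which is why rows with similar F1 scores can have quite different accuracy.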
a McNemar test results comparing the classification errors of logistic regression (LR) models and deep neural network (DNN) models when the best thresholds are chosen by the highest F1 score. b McNemar test results comparing individual DNN to combined DNN.
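| Logistic regression model | DNN model | AKI^a p value | AKI^a significant | Reintubation p value | Reintubation significant | Mortality p value | Mortality significant | Any outcome p value | Any outcome significant |
|---|---|---|---|---|---|---|---|---|---|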
| LR OFS | DNN combined RFS | 4.62E−15 | TRUE | 4.39E−01 | FALSE | 1.77E−06 | TRUE | 5.92E−34 | TRUE |
| LR OFS | DNN combined OFS | 1.34E−11 | TRUE | 8.42E−06 | TRUE | 8.78E−01 | FALSE | 6.05E−03 | TRUE |
| LR OFS | DNN combined OFS + MAP features | 8.01E−10 | TRUE | 5.08E−01 | FALSE | 1.26E−01 | FALSE | 2.54E−21 | TRUE |
| LR OFS | DNN individual OFS | 5.92E−01 | FALSE | 5.72E−04 | TRUE | 2.01E−02 | TRUE | 1.90E−02 | TRUE |
| LR OFS | DNN individual RFS | 3.34E−02 | TRUE | 1.33E−06 | TRUE | 2.12E−12 | TRUE | 1.32E−07 | TRUE |
| LR OFS | DNN individual OFS + MAP features | 5.29E−06 | TRUE | 2.89E−03 | TRUE | ||||
| LR RFS | DNN combined RFS | 2.39E−10 | TRUE | 3.15E−01 | FALSE | 1.75E−01 | FALSE | 7.52E−04 | TRUE |
| LR RFS | DNN combined OFS | 7.48E−08 | TRUE | 3.12E−05 | TRUE | 1.82E−03 | TRUE | 4.49E−24 | TRUE |
| LR RFS | DNN combined OFS + MAP features | 3.63E−06 | TRUE | 6.80E−01 | FALSE | 8.58E−06 | TRUE | 3.67E−37 | TRUE |
| LR RFS | DNN individual OFS | 1.28E−02 | TRUE | 1.76E−03 | TRUE | 8.14E−02 | FALSE | 2.86E−03 | TRUE |
| LR RFS | DNN individual RFS | 9.53E−01 | FALSE | 3.56E−06 | TRUE | 4.77E−05 | TRUE | 3.25E−09 | TRUE |
| LR RFS | DNN individual OFS + MAP features | 1.36E−17 | TRUE | 3.03E−05 | TRUE | 3.21E−01 | FALSE | 6.21E−18 | TRUE |
| LR OFS + MAP features | DNN combined RFS | 4.54E−14 | TRUE | 6.38E−01 | FALSE | 1.77E−06 | TRUE | 4.11E−02 | TRUE |
| LR OFS + MAP features | DNN combined OFS | 7.89E−11 | TRUE | 2.51E−06 | TRUE | 8.83E−01 | FALSE | 1.49E−18 | TRUE |
| LR OFS + MAP features | DNN combined OFS + MAP features | 7.09E−09 | TRUE | 1.35E−01 | FALSE | 2.81E−31 | TRUE | ||
| LR OFS + MAP features | DNN individual OFS | 2.90E−01 | FALSE | 1.41E−04 | TRUE | 1.15E−01 | FALSE | ||
| LR OFS + MAP features | DNN individual RFS | 1.09E−01 | FALSE | 3.59E−07 | TRUE | 4.03E−12 | TRUE | 5.36E−06 | TRUE |
| LR OFS + MAP features | DNN individual OFS + MAP features | 3.81E−21 | TRUE | 9.69E−07 | TRUE | 6.60E−03 | TRUE | 2.09E−13 | TRUE |
McNemar test p values < 0.05 were considered significant, indicating that the classifiers have significantly different proportions of errors when classifying acute kidney injury (AKI), reintubation, mortality, or any outcome for the test set (N = 11,996) when comparing the logistic regression (LR) models, deep neural networks predicting individual outcomes (DNN individual), and deep neural networks predicting all three outcomes (DNN combined). Each model was also evaluated on each feature set: the original feature set (OFS), OFS plus the minimum MAP features (OFS + MAP), and the reduced feature set (RFS). Note that for the LR and individual models, there is one model per outcome, and the predicted outcome probabilities from each model are stacked to predict any outcome. For the combined models, there is one model for all three outcomes, and those probabilities are stacked to predict any outcome.
Bolded results are the smallest p values for the given outcome.
An example of how to interpret this table: for correctly classifying any outcome, all LR and DNN models were significantly different (p < 0.05) from each other except for LR OFS + MAP and DNN individual OFS. By F1 score, the best-performing LR model was LR OFS (F1 score 0.504, sensitivity 0.542, specificity 0.941, and precision 0.471) and the best-performing DNN model was DNN individual OFS + MAP (F1 score 0.482, sensitivity 0.584, specificity 0.918, and precision 0.41).
^a AKI labels were available for only 4307 of the test patients, so all results for AKI are from those patients with AKI labels.
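McNemar's test compares two classifiers on the same cases using only the discordant pairs, that is, cases one model classifies correctly and the other does not. The record does not say which variant was used, so the sketch below assumes the chi-square form with continuity correction. The function name `mcnemar` and the toy correctness vectors are hypothetical.

```python
import math

def mcnemar(correct_a, correct_b):
    """McNemar chi-square test with continuity correction.
    correct_a / correct_b: per-case booleans, True if that model
    classified the case correctly. Returns (statistic, p value)."""
    # b: model A right, model B wrong; c: model A wrong, model B right.
    b = sum(1 for a, bb in zip(correct_a, correct_b) if a and not bb)
    c = sum(1 for a, bb in zip(correct_a, correct_b) if not a and bb)
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Toy example: 10 cases only A gets right, 2 only B gets right,
# 20 both get right (illustrative, not study data).
a_correct = [True] * 10 + [False] * 2 + [True] * 20
b_correct = [False] * 10 + [True] * 2 + [True] * 20
stat, p_value = mcnemar(a_correct, b_correct)
significant = p_value < 0.05
```

With few discordant pairs, an exact binomial version of the test is usually preferred over the chi-square approximation.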
Fig. 2 ROC curves for AKI, mortality, reintubation and any outcome.
ROC curves for AKI (a), mortality (b), reintubation (c) and any outcome (d). Receiver operating characteristic curves for acute kidney injury (AKI), reintubation, mortality, and any outcome for the test set (N = 11,996) for the ASA score, logistic regression (LR) models, deep neural networks predicting individual outcomes (DNN individual), and deep neural networks predicting all three outcomes (DNN combined). Each model was also evaluated on each feature set: the original feature set (OFS), OFS plus the minimum MAP features (OFS + MAP), and the reduced feature set (RFS). Note that for the LR and individual models, there is one model per outcome, and the predicted outcome probabilities from each model are stacked to predict any outcome. For the combined models, there is one model for all three outcomes, and those probabilities are stacked to predict any outcome. *AKI labels were available for only 4307 of the test patients, so all AKI AUCs reflect results for only those patients with AKI labels.
Fig. 3 Scatter plot and Pearson correlations for potential outcome pairs.
Scatter plot comparison and Pearson correlation (r) for predicted probabilities of AKI, mortality, and reintubation from the best performing AUC DNN model with OFS + MAP features. a AKI vs Mortality; b Reintubation vs Mortality; c AKI vs Reintubation.
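The Pearson correlation (r) between two sets of predicted probabilities can be computed as in this minimal sketch. The function name `pearson_r` and the toy probability vectors are assumptions for illustration, not values from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Toy predicted probabilities for two outcomes (illustrative only).
p_aki = [0.05, 0.10, 0.60, 0.20, 0.70]
p_mortality = [0.01, 0.01, 0.05, 0.30, 0.20]
r_aki_mortality = pearson_r(p_aki, p_mortality)
```

A high r between two outcomes' predicted probabilities would suggest the models rank patients similarly for both, which is one way to read the scatter plots in panels a to c.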