Armando D Bedoya1, Joseph Futoma2,3, Meredith E Clement4, Kristin Corey5,6, Nathan Brajer5,6, Anthony Lin5,6, Morgan G Simons5,6, Michael Gao5, Marshall Nichols5, Suresh Balu5,6, Katherine Heller2, Mark Sendak5, Cara O'Brien7.
Abstract
OBJECTIVE: To determine whether deep learning detects sepsis earlier and more accurately than other models, and to evaluate model performance using implementation-oriented metrics that simulate clinical practice.
Keywords: ROC curve; adult; decision support systems, clinical; electronic health records/statistics and numerical data; emergency service, hospital/statistics and numerical data; hospitalization/statistics and numerical data; machine learning; retrospective studies; sepsis/mortality
Year: 2020 PMID: 32734166 PMCID: PMC7382639 DOI: 10.1093/jamiaopen/ooaa006
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
Baseline characteristics of internal development and validation cohorts (90% and 10% of full data), and of temporal validation cohort
| Baseline characteristic of cohort | Development | Septic development | Internal validation | Septic internal validation | Temporal validation | Septic temporal validation |
|---|---|---|---|---|---|---|
| Age (years), mean (SD) | 55.9 (18.7) | 59.8 (17.1) | 56.2 (18.6) | 59.6 (17.3) | 50.4 (19.5) | 59.7 (17.0) |
| Male sex, n (%) | 18 203 (47.1) | 4005 (54.5) | 2050 (47.7) | 434 (53.4) | 18 272 (45.9) | 1418 (55.3) |
| Weight (lbs), mean (SD) | 158.8 (19.7) | 160.2 (19.8) | 158.4 (19.4) | 159.7 (19.8) | 185.8 (72.1) | 184.4 (60.1) |
| Admission source | ||||||
| Home/non-healthcare facility | 30 063 (77.7) | 5072 (69.0) | 3363 (78.3) | 577 (71.0) | 34 848 (87.6) | 1854 (72.4) |
| Transfer from hospital | 4930 (12.7) | 1498 (20.4) | 534 (12.4) | 147 (18.1) | 2877 (7.2) | 538 (21.0) |
| Missing/other | 3689 (9.5) | 777 (10.6) | 400 (9.3) | 89 (10.9) | 2061 (5.2) | 170 (6.6) |
| Admission type | ||||||
| Elective | 11 854 (30.6) | 571 (7.8) | 1338 (31.1) | 60 (7.4) | 5620 (14.1) | 138 (5.4) |
| Emergency | 16 478 (42.6) | 4813 (65.5) | 1797 (41.8) | 522 (64.2) | 30 048 (75.5) | 1917 (74.8) |
| Urgent | 10 342 (26.7) | 1963 (26.7) | 1162 (27.0) | 231 (27.8) | 4118 (10.4) | 507 (19.8) |
| Race | ||||||
| Black or African American | 11 390 (29.4) | 2329 (31.7) | 1252 (29.1) | 255 (31.4) | 15 805 (39.7) | 931 (36.3) |
| Caucasian/White | 24 317 (62.9) | 4661 (63.4) | 2681 (62.4) | 499 (61.4) | 19 701 (49.5) | 1454 (56.8) |
| Missing/other | 2975 (7.7) | 357 (4.9) | 364 (8.5) | 59 (7.3) | 4280 (10.8) | 177 (6.9) |
| Comorbidities | ||||||
| Congestive heart failure | 5656 (14.6) | 1576 (21.5) | 627 (14.6) | 175 (21.6) | 3329 (8.4) | 469 (18.3) |
| Valvular disease | 5288 (13.7) | 1298 (17.7) | 544 (12.7) | 137 (16.9) | 1975 (5.0) | 232 (9.1) |
| Peripheral vascular disease | 4283 (11.1) | 1016 (13.8) | 474 (11.0) | 119 (14.6) | 1678 (4.2) | 198 (7.7) |
| Hypertension | 18 251 (47.2) | 4114 (56.0) | 2009 (46.8) | 445 (54.7) | 11 874 (29.8) | 1014 (39.6) |
| Other neurological disorders | 6725 (17.4) | 1974 (26.9) | 731 (17.0) | 226 (27.8) | 2538 (6.4) | 239 (9.3) |
| Pulmonary circulation disorders | 6917 (17.9) | 1830 (24.9) | 779 (18.1) | 192 (23.6) | 4047 (10.2) | 338 (13.2) |
| Diabetes mellitus without chronic complications | 6071 (15.7) | 1394 (19.0) | 685 (15.9) | 169 (20.8) | 2896 (7.3) | 225 (8.8) |
| Renal failure | 6188 (16.0) | 1876 (25.5) | 673 (15.7) | 216 (26.6) | 3829 (9.6) | 600 (23.4) |
| Solid tumor without metastasis | 4711 (12.2) | 809 (11.0) | 525 (12.2) | 93 (11.4) | 3879 (9.7) | 395 (15.4) |
| Coagulopathy | 4503 (11.6) | 1588 (21.6) | 497 (11.6) | 173 (21.3) | 1558 (3.9) | 341 (13.3) |
| Obesity | 5542 (14.3) | 1203 (16.4) | 598 (13.9) | 133 (16.4) | 3213 (8.1) | 207 (8.1) |
| Fluid and electrolyte disorders | 10 204 (26.4) | 3221 (43.8) | 1110 (25.8) | 353 (43.4) | 4855 (12.2) | 668 (26.1) |
| Anemia | 9242 (23.9) | 2763 (37.6) | 1055 (24.6) | 309 (38.0) | 3327 (8.4) | 396 (15.5) |
| Depression | 6308 (16.3) | 1526 (20.8) | 715 (16.6) | 168 (20.7) | 2721 (6.8) | 174 (6.8) |
| Prior sepsis encounters in past year | ||||||
| 0 | 36 634 (94.7) | 6363 (86.6) | 4103 (95.5) | 727 (89.4) | 38 872 (97.7) | 2319 (90.5) |
| 1 | 1514 (3.9) | 688 (9.4) | 149 (3.5) | 64 (7.9) | 681 (1.7) | 165 (6.4) |
| 2 or more | 534 (1.4) | 296 (4.0) | 45 (1.0) | 22 (2.7) | 233 (0.6) | 78 (3.0) |
| Overall in-hospital mortality (%) | 1257 (3.2) | 696 (9.5) | 121 (2.8) | 59 (7.3) | 577 (1.5) | 337 (13.2) |
| Overall length of stay (h), median (25th–75th percentile) | 95 (57–168) | 167 (95–318) | 95 (55–168) | 167 (94–315) | 13 (5–90) | 172 (95–342) |
| Overall rate of ICU admission (%) | 5870 (15.2) | 1530 (20.8) | 646 (15.0) | 166 (20.4) | 4598 (11.6) | 1148 (44.8) |
| Septic (%) | 7347 (19.0) | 7347 (100.0) | 813 (18.9) | 813 (100.0) | 2562 (6.4) | 2562 (100.0) |
Note: For each cohort, characteristics are also broken out among the subgroup of patients who acquire sepsis.
Figure 1. Results of our deep learning model compared with the clinical scores. (A) ROC curves for the MGP–RNN and the 3 clinical scores considered (SIRS, NEWS, and qSOFA) are shown; the accompanying table lists C-statistics with bootstrap confidence intervals. (B) The average number of sepsis cases each day we expect to detect early, before a definition for sepsis is met (ie, a more interpretable version of sensitivity), is shown as a function of how many alarms each method would produce each hour. We limit the average alarms per hour to less than 10, as this is the operating range in which we expect the model to be used in practice. There were an average of 17.9 sepsis cases per 24-h period in the dataset, so sensitivity can be recovered by dividing the reported y-axis value in panel B by 17.9. Positive predictive value at a particular threshold can be recovered by dividing the reported y-axis value by 24 times the reported x-axis value (ie, the average number of alarms per 24-h period). MGP–RNN, multi-output Gaussian process and recurrent neural network; NEWS, national early warning score; qSOFA, quick Sequential Organ Failure Assessment; SIRS, systemic inflammatory response syndrome.
Figure 2. Results of our deep learning model compared with the other machine learning models. (A) ROC curves for the MGP–RNN and the 3 other machine learning models considered (Cox regression, penalized logistic regression, and random forest) are shown; the accompanying table lists C-statistics with bootstrap confidence intervals. (B) The average number of sepsis cases each day we expect to detect early, before a definition for sepsis is met (ie, a more interpretable version of sensitivity), is shown as a function of how many alarms each method would produce each hour. We limit the average alarms per hour to less than 10, as this is the operating range in which we expect the model to be used in practice. There were an average of 17.9 sepsis cases per 24-h period in the dataset, so sensitivity can be recovered by dividing the reported y-axis value in panel B by 17.9. Positive predictive value at a particular threshold can be recovered by dividing the reported y-axis value by 24 times the reported x-axis value (ie, the average number of alarms per 24-h period). MGP–RNN, multi-output Gaussian process and recurrent neural network; PLR, penalized logistic regression; RF, random forest.
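The captions above describe how to convert the figure axes back into standard metrics: sensitivity is the y-axis value (early detections per day) divided by the average sepsis cases per day, and PPV is the y-axis value divided by 24 times the x-axis value (alarms per hour). A minimal sketch of that arithmetic, with illustrative numbers that are not taken from the figures:

```python
# Conversion from the panel-B axes of Figures 1 and 2 to sensitivity and PPV,
# following the formulas stated in the captions. Input values are illustrative.

CASES_PER_DAY = 17.9  # average sepsis cases per 24-h period (internal validation)


def sensitivity_from_detections(detections_per_day, cases_per_day=CASES_PER_DAY):
    """Sensitivity = early detections per day / average sepsis cases per day."""
    return detections_per_day / cases_per_day


def ppv_from_axes(detections_per_day, alarms_per_hour):
    """PPV = detections per day / (24 * alarms per hour), ie, alarms per day."""
    return detections_per_day / (24.0 * alarms_per_hour)


# Example: a method detecting 12 cases/day while raising 2 alarms/hour
sens = sensitivity_from_detections(12.0)  # ≈ 0.67
ppv = ppv_from_axes(12.0, 2.0)            # = 0.25
```

For the temporal validation cohort (Figure 4), the same conversion applies with 14.4 cases per day in place of 17.9.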
Figure 3. (A) The AUROC obtained from the internal validation cohort is compared with the AUROC from the temporal validation cohort for each method, along with bootstrap confidence intervals. (B) The AUROC for each method in the temporal validation cohort is shown as a function of hours after presentation to the ED, limited to the first 24 h following initial presentation. (C) The PPV at 75% sensitivity for each method is shown as a function of the number of hours after presentation to the ED.
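The figures report C-statistics (AUROC) with bootstrap confidence intervals. The study's actual implementation is not given here; as a hedged illustration only, a percentile-bootstrap C-statistic could be sketched in pure Python as follows (the resample count and data are illustrative assumptions):

```python
# Illustrative sketch: C-statistic (AUROC) with a percentile bootstrap CI.
# Not the study's code; a generic implementation of the standard technique.
import random


def auroc(labels, scores):
    """C-statistic: probability a random positive outscores a random negative
    (ties count 0.5), computed by direct pairwise comparison."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUROC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # resample must contain both classes
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi
```

In practice a library routine (eg, scikit-learn's `roc_auc_score`) would replace the pairwise `auroc` for efficiency; the bootstrap wrapper is the part of interest.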
Figure 4. Results for the temporal validation cohort (analogous to Figures 1 and 2, which show results on the internal validation cohort). (A) ROC curves and (B) the corresponding early-detection curves as a function of alarm rate are shown. There were an average of 14.4 sepsis cases per 24-h period in the dataset, so sensitivity can be recovered by dividing the reported y-axis value in panel B by 14.4. Positive predictive value at a particular threshold can be recovered by dividing the reported y-axis value by 24 times the reported x-axis value (ie, the average number of alarms per 24-h period).