Parth K Shah (1), Jennifer C Ginestra (2), Lyle H Ungar (3), Paul Junker (4), Jeff I Rohrbach (4), Neil O Fishman (5), Gary E Weissman (2, 5)
1. Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA.
2. Palliative and Advanced Illness Research Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA.
3. Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA.
4. Clinical Effectiveness and Quality Improvement, Hospital of the University of Pennsylvania, Philadelphia, PA.
5. Department of Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA.
Abstract
OBJECTIVES: The National Early Warning Score, Modified Early Warning Score, and quick Sepsis-related Organ Failure Assessment can predict clinical deterioration. These scores exhibit only moderate performance and are often evaluated using aggregated measures over time. A simulated prospective validation strategy that assesses multiple predictions per patient-day would provide the best pragmatic evaluation. We developed a deep recurrent neural network deterioration model and conducted a simulated prospective evaluation.
DESIGN: Retrospective cohort study.
SETTING: Four hospitals in Pennsylvania.
PATIENTS: Inpatient adults discharged between July 1, 2017, and June 30, 2019.
INTERVENTIONS: None.
MEASUREMENTS AND MAIN RESULTS: We trained a deep recurrent neural network and a logistic regression model on electronic health record data to predict, each hour, the composite outcome of transfer to the ICU or death within the next 24 hours. We analyzed 146,446 hospitalizations comprising 16.75 million patient-hours. The hourly event rate was 1.6% (12,842 transfers or deaths, corresponding to 260,295 patient-hours within the predictive horizon). On a hold-out dataset, the deep recurrent neural network achieved an area under the precision-recall curve of 0.042 (95% CI, 0.04-0.043), comparable with the logistic regression model (0.043; 95% CI, 0.041-0.045), and outperformed the National Early Warning Score (0.034; 95% CI, 0.032-0.035), Modified Early Warning Score (0.028; 95% CI, 0.027-0.03), and quick Sepsis-related Organ Failure Assessment (0.021; 95% CI, 0.021-0.022). At a fixed sensitivity of 50%, the deep recurrent neural network achieved a positive predictive value of 3.4% (95% CI, 3.4-3.5) and outperformed the logistic regression model (3.1%; 95% CI, 3.1-3.2), National Early Warning Score (2.0%; 95% CI, 2.0-2.0), Modified Early Warning Score (1.5%; 95% CI, 1.5-1.5), and quick Sepsis-related Organ Failure Assessment (1.5%; 95% CI, 1.5-1.5).
CONCLUSIONS: Commonly used early warning scores for clinical decompensation, along with a logistic regression model and a deep recurrent neural network model, show very poor performance characteristics when assessed using a simulated prospective validation. None of these models may be suitable for real-time deployment.
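The two headline quantities in the abstract, the hourly event rate and the positive predictive value at a fixed sensitivity, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function name `ppv_at_sensitivity` and the toy labels and scores are hypothetical, and only the counts and the 3.4% figure are taken from the abstract. Each element is assumed to represent one patient-hour, labeled 1 if a transfer or death falls within the following 24 hours.

```python
def ppv_at_sensitivity(labels, scores, target_sensitivity=0.5):
    """Return (threshold, sensitivity, ppv) at the highest score threshold
    whose sensitivity reaches target_sensitivity.

    labels: 0/1 per patient-hour (1 = event within the predictive horizon)
    scores: model output per patient-hour (higher = more risk)
    Hypothetical helper for illustration only.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("no positive labels")
    # Rank patient-hours from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = 0        # flagged patient-hours that are true events
    flagged = 0   # all patient-hours at or above the current threshold
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every patient-hour tied at this score before evaluating.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            flagged += 1
            i += 1
        sensitivity = tp / total_pos
        if sensitivity >= target_sensitivity:
            return threshold, sensitivity, tp / flagged
    raise ValueError("target sensitivity must be at most 1")


# Hourly event rate from the counts reported in the abstract
# (16.75 million is itself a rounded figure).
hourly_event_rate = 260_295 / 16.75e6

# Toy data (made up): ten patient-hours, four of them positive.
toy_labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_result = ppv_at_sensitivity(toy_labels, toy_scores, 0.5)

# Derived, not reported in the abstract: flagged patient-hours per true
# positive at the reported PPV of 3.4%.
alerts_per_true_positive = 1 / 0.034

print(f"hourly event rate: {hourly_event_rate:.2%}")
print("toy (threshold, sensitivity, ppv):", toy_result)
print(f"alerts per true positive at PPV 3.4%: {alerts_per_true_positive:.1f}")
```

Read this way, a positive predictive value of 3.4% means that roughly 29 patient-hours would be flagged for each one that truly precedes an event, which is one way to interpret the abstract's conclusion about real-time deployment.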