Gary E Weissman (1,2,3), Rebecca A Hubbard (4), Lyle H Ungar (5), Michael O Harhay (2,4), Casey S Greene (6,7,8), Blanca E Himes (4,8), Scott D Halpern (1,2,3,4).
1. Division of Pulmonary, Allergy, and Critical Care, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA.
2. Palliative and Advanced Illness Research Center, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA.
3. Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, PA.
4. Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA.
5. Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA.
6. Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA.
7. Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA.
8. Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA.
Abstract
OBJECTIVES: Early prediction of undesired outcomes among newly hospitalized patients could improve patient triage and prompt conversations about patients' goals of care. We evaluated the performance of logistic regression, gradient boosting machine, random forest, and elastic net regression models, with and without unstructured clinical text data, to predict a binary composite outcome of in-hospital death or ICU length of stay greater than or equal to 7 days using data from the first 48 hours of hospitalization.
DESIGN: Retrospective cohort study with split sampling for model training and testing.
SETTING: A single urban academic hospital.
PATIENTS: All hospitalized patients who required ICU care at the Beth Israel Deaconess Medical Center in Boston, MA, from 2001 to 2012.
INTERVENTIONS: None.
MEASUREMENTS AND MAIN RESULTS: Among 25,947 eligible hospital admissions, we observed 5,504 (21.2%) in which patients died or had ICU length of stay greater than or equal to 7 days. The gradient boosting machine model had the highest discrimination both without text-derived variables (area under the receiver operating characteristic curve, 0.83; 95% CI, 0.81-0.84) and with them (area under the receiver operating characteristic curve, 0.89; 95% CI, 0.88-0.90). Both gradient boosting machines and random forests outperformed logistic regression without text data (p < 0.001), whereas all other models outperformed logistic regression with text data (p < 0.02). The inclusion of text data increased the discrimination of all four model types (p < 0.001). Among those models using text data, the increasing presence of the terms "intubated" and "poor prognosis" was positively associated with mortality and ICU length of stay, whereas the term "extubated" was inversely associated with them.
CONCLUSIONS: Variables extracted from unstructured clinical text from the first 48 hours of hospital admission using natural language processing techniques significantly improved the abilities of logistic regression and other machine learning models to predict which patients died or had long ICU stays. Learning health systems may adapt such models using open-source approaches to capture local variation in care patterns.
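The discrimination figures above are areas under the receiver operating characteristic curve (AUROC) with 95% confidence intervals. As a minimal sketch of how such a figure can be computed, the Python below calculates AUROC from the Mann-Whitney rank statistic and wraps it in a percentile bootstrap. The abstract does not state how the intervals were obtained, so the bootstrap is an assumption made here for illustration, and the function names (`auroc`, `bootstrap_ci`) and the synthetic data are hypothetical.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive case scores higher
    than a random negative case (ties count one half)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores and give each the average rank.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0  # Mann-Whitney U
    return u / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (an assumed method,
    not one confirmed by the abstract)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[k] for k in idx]
        if 0 < sum(ys) < n:  # skip resamples containing a single class
            stats.append(auroc(ys, [scores[k] for k in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic illustration only: about 21% positives, echoing the reported
# outcome rate of 5,504 / 25,947 admissions. Scores are simulated.
rng = random.Random(42)
y = [1 if rng.random() < 0.21 else 0 for _ in range(500)]
s = [0.8 * label + rng.gauss(0.0, 0.6) for label in y]
lower, upper = bootstrap_ci(y, s)
print(f"AUROC {auroc(y, s):.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

To compare models with and without text-derived variables, the same functions would be applied to each model's predicted probabilities on the held-out test split; a formal comparison of two AUROCs requires a paired test, which this sketch does not implement.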