Arash Kia, Prem Timsina, Himanshu N Joshi, Eyal Klang, Rohit R Gupta, Robert M Freeman, David L Reich, Max S Tomlinson, Joel T Dudley, Roopa Kohli-Seth, Madhu Mazumdar, Matthew A Levin.
Abstract
Early detection of patients at risk for clinical deterioration is crucial for timely intervention. Traditional detection systems rely on a limited set of variables and are unable to predict the time of decline. We describe a machine learning model called MEWS++ that enables the identification of patients at risk of escalation of care or death six hours prior to the event. A retrospective single-center cohort study was conducted from July 2011 to July 2017 of adult (age > 18) inpatients excluding psychiatric, parturient, and hospice patients. Three machine learning models were trained and tested: random forest (RF), linear support vector machine, and logistic regression. We compared the models' performance to the traditional Modified Early Warning Score (MEWS) using sensitivity, specificity, and Area Under the Curve for Receiver Operating Characteristic (AUC-ROC) and Precision-Recall curves (AUC-PR). The primary outcome was escalation of care from a floor bed to an intensive care or step-down unit, or death, within 6 h. A total of 96,645 patients with 157,984 hospital encounters and 244,343 bed movements were included. Overall rate of escalation or death was 3.4%. The RF model had the best performance with sensitivity 81.6%, specificity 75.5%, AUC-ROC of 0.85, and AUC-PR of 0.37. Compared to traditional MEWS, sensitivity increased 37%, specificity increased 11%, and AUC-ROC increased 14%. This study found that using machine learning and readily available clinical data, clinical deterioration or death can be predicted 6 h prior to the event. The model we developed can warn of patient deterioration hours before the event, thus helping make timely clinical decisions.
Keywords: Failure to Rescue; Machine Learning Classifiers; Modified Early Warning Score; Unexpected Escalation; clinical deterioration
Year: 2020; PMID: 32012659; PMCID: PMC7073544; DOI: 10.3390/jcm9020343
Source DB: PubMed; Journal: J Clin Med; ISSN: 2077-0383; Impact factor: 4.241
Figure 1. Prediction time and sampling window. t0 is the time of escalation or death, or the time of discharge for patients with no event. Prediction time tp is the time prior to t0 at which a prediction was generated. The sampling window is the 24-h period preceding the prediction time tp.
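The timing described in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function name `sampling_window`, its parameters, and the example timestamp are hypothetical, and the 6-h default lead is taken from the primary outcome definition in the abstract.

```python
from datetime import datetime, timedelta

def sampling_window(t0, lead_hours=6, window_hours=24):
    """Return (window_start, tp) for an event or discharge at time t0.

    tp is the prediction time, lead_hours before t0; the sampling
    window is the window_hours period that ends at tp.
    Names and defaults are illustrative assumptions.
    """
    tp = t0 - timedelta(hours=lead_hours)
    window_start = tp - timedelta(hours=window_hours)
    return window_start, tp

# Example: an escalation at 18:00 on 2 March 2015 (made-up timestamp).
t0 = datetime(2015, 3, 2, 18, 0)
window_start, tp = sampling_window(t0)
print(tp)            # 2015-03-02 12:00:00
print(window_start)  # 2015-03-01 12:00:00
```

Under these assumptions, only data recorded between `window_start` and `tp` would feed a prediction, so nothing from the final 6 h before the event is used.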
Cohort demographics.
| Characteristic | Category | Total (%) | Training (%) | Test (%) | p-value |
|---|---|---|---|---|---|
| Bed movements | | 117,884 | 15,818 | 102,066 | |
| Bed movements per encounter | | 1.67 ± 1.15 | 1.33 ± 0.76 | 1.59 ± 0.99 | |
| Unique patients * | | 63,100 | 13,168 | 58,742 | |
| Age, years | 18–45 | 19,422 (16.5) | 2107 (13.3) | 17,315 (17.0) | <0.001 |
| | 45–65 | 40,942 (34.7) | 5060 (32.0) | 35,882 (35.2) | |
| | 65–80 | 37,596 (31.9) | 5266 (33.3) | 32,330 (31.7) | |
| | >80 | 19,924 (16.9) | 3385 (21.4) | 16,539 (16.2) | |
| Gender | Female | 58,345 (49.5) | 7760 (49.1) | 50,585 (49.6) | 0.5 |
| | Male | 59,532 (50.5) | 8057 (50.9) | 51,475 (50.4) | |
| | Other | 7 (0.0) | 1 (0.0) | 6 (0.0) | |
| Major Diagnostic Category (MDC) | Circulatory system | 29,904 (25.4) | 3930 (24.8) | 25,974 (25.4) | <0.001 |
| | Musculoskeletal system & connective tissue | 12,521 (10.6) | 1291 (8.2) | 11,230 (11.0) | |
| | Nervous system | 8767 (7.4) | 1329 (8.4) | 7438 (7.3) | |
| | Hepatobiliary/pancreas | 7368 (6.3) | 1223 (7.7) | 6145 (6.0) | |
| | Respiratory system | 7094 (6.0) | 1190 (7.5) | 5904 (5.8) | |
| | Infectious & parasitic | 5762 (4.9) | 1327 (8.4) | 4435 (4.3) | |
| | Kidney & urinary tract | 5474 (4.6) | 723 (4.6) | 4751 (4.7) | |
| | Endocrine/nutrition/metabolic | 4207 (3.6) | 513 (3.2) | 3694 (3.6) | |
| | Ear, nose, mouth, and throat | 2859 (2.4) | 319 (2.0) | 2540 (2.5) | |
| | Female reproductive system | 2809 (2.4) | 259 (1.6) | 2550 (2.5) | |
| | Skin, subcutaneous tissue, breast | 2459 (2.1) | 236 (1.5) | 2223 (2.2) | |
| | Other (MDCs with ≤ 2% occurrence) | 28,660 (24.3) | 3478 (22.0) | 25,182 (24.7) | |
| Overall length of stay at hospital | ≤5 days | 52,087 (44.2) | 5410 (34.2) | 46,677 (45.7) | <0.001 |
| | 5–12 days | 35,210 (29.9) | 4876 (30.8) | 30,334 (29.7) | |
| | 12–42 days | 26,753 (22.7) | 4482 (28.3) | 22,271 (21.8) | |
| | >42 days | 3834 (3.3) | 1050 (6.6) | 2784 (2.7) | |
| Length of stay by hospital unit | ≤24 h | 52,932 (44.9) | 6699 (42.4) | 46,233 (45.3) | <0.001 |
| | 1–3 days | 35,748 (30.3) | 4865 (30.8) | 30,883 (30.3) | |
| | 3–7 days | 20,916 (17.7) | 2833 (17.9) | 18,083 (17.7) | |
| | >7 days | 8288 (7.0) | 1421 (9.0) | 6867 (6.7) | |
| Length of stay in the ICU | ≤24 h | 2805 (28.8) | 198 (27.1) | 2607 (29.0) | 0.36 |
| | 1–3 days | 4048 (41.6) | 322 (44.1) | 3726 (41.4) | |
| | 3–7 days | 1928 (19.8) | 134 (18.4) | 1794 (19.9) | |
| | >7 days | 947 (9.7) | 76 (10.4) | 871 (9.7) | |
* Some patients appeared in both training and test sets because the data were split on bed movements, not patients.
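The size of that overlap can be estimated from the unique-patient counts in the table by inclusion-exclusion. The short Python sketch below is an illustration only: it assumes every one of the 63,100 unique patients appears in at least one of the two sets, and the variable names are hypothetical.

```python
# Unique-patient counts from the cohort demographics table.
total_unique = 63_100
train_unique = 13_168
test_unique = 58_742

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|,
# assuming every patient appears in at least one of the two sets.
overlap = train_unique + test_unique - total_unique
share_of_train = overlap / train_unique

print(overlap)                  # 8810
print(f"{share_of_train:.1%}")  # 66.9%
```

Under that assumption, roughly two thirds of the patients in the training set also contribute bed movements to the test set.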
Model performance metrics.
| Model | Sensitivity, % | Specificity, % | Accuracy, % | PPV, % | F1 Score | AUC ROC, % | AUC PR, % | p-value * |
|---|---|---|---|---|---|---|---|---|
| Random Forest (MEWS++) | 78.9 | 79.1 | 79.1 | 11.5 | 0.2 | 87.9 | 36.2 | <0.0001 |
| Linear SVM | 79.0 | 77.9 | 77.9 | 11.0 | 0.19 | 87.3 | 28.7 | <0.0001; 0.16 ** |
| LR | 61.4 | 78.5 | 77.9 | 9.0 | 0.16 | 79.1 | 17.2 | <0.0001 |
| MEWS Score | 64.2 | 66.2 | 66.2 | 6.1 | 0.11 | 66.7 | 7.0 | |
* p-value for the difference between the AUC ROC of the respective ML model and that of the MEWS Score. ** p-value = 0.16 for Random Forest vs. Linear SVM. AUC PR = Area Under the Precision-Recall Curve, LR = Logistic Regression, PPV = Positive Predictive Value, SVM = Support Vector Machine, ROC = Receiver Operating Characteristic.
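The low PPV values in the table, despite sensitivity and specificity near 79%, follow from the low event rate. The Python sketch below applies Bayes' rule to the reported sensitivity and specificity. It is an illustration and not the authors' calculation: it assumes the 3.4% overall event rate from the abstract also holds in the test set, which the source does not state, and the function names are hypothetical.

```python
def ppv_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity, prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

def f1_score(ppv, sensitivity):
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    return 2.0 * ppv * sensitivity / (ppv + sensitivity)

# Assumption: overall event rate from the abstract, applied to the test set.
prevalence = 0.034

rf_ppv = ppv_from_rates(0.789, 0.791, prevalence)    # Random Forest row
mews_ppv = ppv_from_rates(0.642, 0.662, prevalence)  # MEWS Score row
rf_f1 = f1_score(rf_ppv, 0.789)

print(f"RF PPV ~ {rf_ppv:.3f}, RF F1 ~ {rf_f1:.2f}, MEWS PPV ~ {mews_ppv:.3f}")
```

With these inputs the estimates come out near 0.12 for the random forest PPV, 0.20 for its F1 score, and 0.06 for the MEWS PPV, close to the reported 11.5%, 0.2, and 6.1%. The small gaps are expected if the test-set event rate differs from 3.4%.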
Figure 2. ROC and precision-recall curves. Receiver Operating Characteristic (ROC) curves (left panel) and Precision-Recall curves (right panel) for the four models evaluated. MEWS++ (RF) performs better than the other algorithms. LR = Logistic Regression, SVM = Support Vector Machine.
Figure 3. Comparison of 24-h performance of RF Model (MEWS++) vs. classical MEWS. Predictions were generated every 2 h for 24 h prior to escalation. A threshold of 2 was used for MEWS, and 0.5 (the default) for the RF model. (a) Sensitivity of MEWS begins to degrade after 4 h whereas sensitivity of MEWS++ remains stable. (b) Specificity of MEWS++ is consistently higher than MEWS.
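The evaluation scheme in Figure 3 can be sketched as follows. This is a hedged illustration, not the authors' implementation: all function names are hypothetical, the grid is assumed to run from 2 h to 24 h before the event, an alarm is assumed to fire when a score is greater than or equal to its threshold (the source does not say whether the comparison is strict), and the example alarm data are made up.

```python
def lead_times(step_hours=2, horizon_hours=24):
    """Lead times in hours before the event: 2, 4, ..., 24 (assumed grid)."""
    return list(range(step_hours, horizon_hours + 1, step_hours))

def mews_alarm(score, threshold=2):
    # Assumption: alarm when the MEWS score is >= the threshold.
    return score >= threshold

def rf_alarm(probability, threshold=0.5):
    # Assumption: alarm when the predicted probability is >= the threshold.
    return probability >= threshold

def sensitivity_by_lead(alarms_by_lead):
    """Map lead time -> fraction of event patients flagged at that lead time.

    alarms_by_lead maps a lead time in hours to a list of booleans,
    one per patient who went on to have an event.
    """
    return {h: sum(flags) / len(flags) for h, flags in alarms_by_lead.items()}

# Made-up alarms for four event patients at two lead times.
example = {
    2: [mews_alarm(s) for s in (3, 2, 1, 4)],
    4: [mews_alarm(s) for s in (1, 2, 0, 1)],
}
print(lead_times())
print(sensitivity_by_lead(example))
```

Computing specificity over lead times would follow the same pattern, using patients without an event and counting those who were not flagged.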