Literature DB >> 33181231

Dynamic individual vital sign trajectory early warning score (DyniEWS) versus snapshot national early warning score (NEWS) for predicting postoperative deterioration.

Yajing Zhu1, Yi-Da Chiu2, Sofia S Villar3, Jonathan W Brand4, Mathew V Patteril5, David J Morrice6, James Clayton7, Jonathan H Mackay8.   

Abstract

AIMS: International early warning scores (EWS) including the additive National Early Warning Score (NEWS) and logistic EWS currently utilise physiological snapshots to predict clinical deterioration. We hypothesised that a dynamic score including vital sign trajectory would improve discriminatory power.
METHODS: Multicentre retrospective analysis of electronic health record data from postoperative patients admitted to cardiac surgical wards in four UK hospitals. Least absolute shrinkage and selection operator-type regression (LASSO) was used to develop a dynamic model (DyniEWS) to predict a composite adverse event of cardiac arrest, unplanned intensive care re-admission or in-hospital death within 24 h.
RESULTS: A total of 13,319 postoperative adult cardiac patients contributed 442,461 observations, of which 4234 (0.96%) preceded an adverse event within 24 h. The new dynamic model (AUC = 0.80 [95% CI 0.78-0.83], AUPRC = 0.12 [0.10-0.14]) outperforms both an updated snapshot logistic model (AUC = 0.76 [0.73-0.79], AUPRC = 0.08 [0.06-0.10]) and the additive National Early Warning Score (AUC = 0.73 [0.70-0.76], AUPRC = 0.05 [0.02-0.08]). Controlling the false alarm rate to remain at current levels using NEWS cut-offs of 5 and 7, DyniEWS delivers a 7% improvement in balanced accuracy and increases sensitivity from 41% to 54% at NEWS 5 and from 18% to 30% at NEWS 7.
CONCLUSIONS: Using an advanced statistical approach, we created a model that can detect dynamic changes in risk of unplanned readmission to intensive care, cardiac arrest or in-hospital mortality and can be used in real time to risk-prioritise clinical workload.
Copyright © 2020 The Authors. Published by Elsevier B.V. All rights reserved.

Keywords:  Cardiac surgery; Dynamic prediction; Early warning scores; National early warning score; Postoperative deterioration

Year:  2020        PMID: 33181231      PMCID: PMC7762721          DOI: 10.1016/j.resuscitation.2020.10.037

Source DB:  PubMed          Journal:  Resuscitation        ISSN: 0300-9572            Impact factor:   5.262


Introduction

The National Early Warning Score (NEWS) utilises vital sign snapshots to detect patients at high risk of clinical deterioration.1, 2, 3 Despite NHS endorsement as the ‘gold standard’, the predictive value of any given NEWS score is uncertain and subject to inter-specialty variation.4, 5 Although it seems intuitive that vital sign trajectory should be an important determinant of the need for clinical review, no currently used Early Warning Score (EWS) includes vital sign trends.6, 7, 8, 9 We recently developed a logistic early warning score (logEWS) that was better than NEWS at discriminating patients who had an adverse event after cardiac surgery from those who did not. Any given logEWS score reflects the percentage chance of acute deterioration within a 24 h period. As well as offering greater transparency, the logEWS concept introduces the potential for specialty-specific calibration. Electronic calculation of logEWS using a web-based app reveals wide differences in the percentage chance of deterioration for many common clinical scenarios at the two recommended escalation thresholds of NEWS ≥ 5 and ≥ 7.10, 11, 12 Patients with stable or improving physiology and so-called ‘soft NEWS scores’ contribute to the high ‘non-event’ rate at both escalation thresholds and to the loss of confidence in NEWS. We hypothesise that individual patient trajectory should be factored into the EWS model by giving additional weighting to the deteriorating patient and reduced weighting to the improving patient. Using an advanced statistical approach, we developed a dynamic scoring model that takes into account both improving and deteriorating physiology and the rate of that change over time.

Methods

Study population and data collection

Health Research Authority approval was granted for this study; ethics committee approval was not applicable. The study population and data sources have been described previously in our earlier snapshot logistic early warning score paper. Briefly, we studied adult patients undergoing risk-stratified major cardiac surgery over a three-year period (1st April 2014 to 31st March 2017) in four UK adult cardiac surgical centres: Coventry, Middlesbrough, Papworth and Wolverhampton. All centres used vitalPAC™ (CareFlow Vitals, System C Healthcare, Maidstone, Kent, UK) to electronically record patients’ vital signs on the postoperative surgical wards. Note that the data have been curated from our earlier work. We excluded observations: with missing values due to software errors and unused oxygen delivery values; recorded before cardiac surgery; from long stayers (>180 days post-surgery); recorded after readmission to hospital following surgery; and duplicates (Fig. 1). The final dataset contains no missing data.
Fig. 1

Flow chart of data processing steps for vitalPAC™.


Outcomes

Our primary objective was to develop a dynamic scoring model that uses information on patient trajectory, comparing it against a simple snapshot logistic model and the additive NEWS in predicting a composite adverse event of cardiac arrest, unplanned intensive care re-admission or in-hospital death within 24 h. Secondary objectives included identifying the features of patient vital sign trends that contribute most to the prediction of adverse events.

Score calculation

We developed a dynamic prediction model (DyniEWS) that uses both snapshots and individual patient trajectories of vital signs. To compare directly against NEWS, we constructed the model using the same snapshot variables. We increased the number of categories for oxygen therapy from the two used by NEWS to four: category 0, room air; category 1, FiO2 0.25−0.34, Venturi mask or nasal cannulae with oxygen flow <5 l.min−1; category 2, FiO2 0.35−0.44, standard oxygen face mask or nasal cannulae with oxygen flow ≥5 l.min−1; and category 3, FiO2 ≥ 0.45 or reservoir oxygen mask. Following previous findings, we allowed for non-symmetric effects of continuous predictors by breaking each physiological measurement into two variables reflecting positive (max(0, value − median)) and negative (max(0, median − value)) deviations from the median value. To capture the trends of each vital sign for each patient, we constructed the difference from the most recent value (the most recent rate of change), and the average (level) and standard deviation (variability) of the three most recent values (i.e. a rolling window of three previous records), as well as those across all sequential values. The frequency of measurements in the 6 h prior to each vital record was also included in the predictor set, under the clinical hypothesis that monitoring frequency is associated with deteriorating patient characteristics. Further discussion of the choice of observation window is available in Supplementary Section 2.5 and Supplementary Table 8.
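The snapshot and trajectory features described above can be sketched as follows. This is a minimal illustrative Python/pandas sketch on a hypothetical heart-rate series, not the authors' R code; the function and column names (trajectory_features, dev_plus, roll_mean3, etc.) are invented for illustration.

```python
import pandas as pd

def trajectory_features(values, median_ref):
    """Snapshot and trajectory features for one vital sign of one patient.

    values: chronologically ordered measurements of a single vital sign;
    median_ref: reference median used for the asymmetric deviations.
    """
    s = pd.Series(values, dtype=float)
    return pd.DataFrame({
        # asymmetric snapshot deviations from the reference median
        "dev_plus": (s - median_ref).clip(lower=0),
        "dev_minus": (median_ref - s).clip(lower=0),
        # most recent rate of change (difference from the previous record)
        "diff_last": s.diff(),
        # level and variability over the three most recent records
        "roll_mean3": s.rolling(3).mean(),
        "roll_sd3": s.rolling(3).std(),
        # summaries across all sequential values so far
        "cum_mean": s.expanding().mean(),
        "cum_sd": s.expanding().std(),
    })

# hypothetical heart-rate trajectory for a deteriorating patient
print(trajectory_features([82, 85, 91, 104, 118], median_ref=82))
```

In the study, analogous features are built per vital sign, together with the count of records in the preceding 6 h.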

Statistical analysis

Least absolute shrinkage and selection operator-type regression (LASSO) was used to perform variable selection by imposing a penalty on models with a large number of predictors, shrinking the coefficients of the less influential variables to zero. This approach improves predictive accuracy while reducing model complexity. DyniEWS was compared against the snapshot logEWS2 (a revised version of the original logEWS) and the classical NEWS. Sensitivity analyses for a number of alternative forms of DyniEWS were also performed (Supplementary Section 2.2).
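LASSO-type selection of this kind can be sketched with an L1-penalised logistic regression on synthetic data. This illustrates the general technique using scikit-learn, not the authors' model; the data-generating process, penalty strength and seed are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 2000, 20
X = rng.normal(size=(n, p))
# only the first three predictors truly drive the outcome
logit = -2.0 + 1.2 * X[:, 0] + 0.8 * X[:, 1] - 0.6 * X[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

Xs = StandardScaler().fit_transform(X)
# penalty="l1" gives LASSO-type shrinkage; small C = strong penalty
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.02)
model.fit(Xs, y)

selected = np.flatnonzero(model.coef_[0])
print("predictors with non-zero coefficients:", selected)
```

With a strong enough penalty, the coefficients of uninformative predictors are shrunk exactly to zero, leaving a sparse, interpretable model.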

Model evaluation

Model performance was assessed by area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (AUPRC), sensitivity, specificity and balanced accuracy.15, 16 Considering clinical utility, sensitivity was assessed with specificity fixed at the levels achieved by NEWS cut-offs of 5 and 7, respectively. Published comparisons of different EWS have traditionally utilised AUC to measure discriminatory performance and the ability to predict serious adverse events (SAEs). There is accumulating evidence that, for heavily imbalanced data with very low incidences of SAEs, AUC may be misleading as it can remain high even when most or all of the rare events are misclassified.17, 18 The precision-recall curve, on the other hand, focuses on patients with events and plots precision (the fraction of predicted positives that are true positives) against recall (sensitivity). The reference is a horizontal line at the prevalence. It has been recommended that both ROC and precision-recall curves be presented for the imbalanced outcomes typically encountered in these studies.16, 17 Further discussion is available in Supplementary Sections 1.2 and 2.5.
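The contrast between AUC and AUPRC on heavily imbalanced outcomes can be demonstrated on synthetic data with a ~1% event rate, similar to the prevalence in this study; the score distribution below is an arbitrary assumption for illustration.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(1)
n = 50_000
y = (rng.random(n) < 0.01).astype(int)     # ~1% event rate, as in this study
# a moderately informative risk score: events tend to score ~1.5 SD higher
score = rng.normal(size=n) + 1.5 * y

auc = roc_auc_score(y, score)
auprc = average_precision_score(y, score)  # baseline AUPRC = prevalence
print(f"AUC = {auc:.2f}, AUPRC = {auprc:.2f}, prevalence = {y.mean():.4f}")
```

The AUC looks reassuring while the AUPRC stays close to the low prevalence baseline, which is why both curves are reported here.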

Model validation

The time-series nature of patient records allows for a temporal model validation procedure that reflects both the association of consecutive records within a patient's trajectory and the heterogeneity across patients. The 36-month data in each hospital were split into nine 4-month folds in temporal order. The last 4-month fold in each hospital (roughly 10% of all observations) was used as the held-out test set, while the remaining 32 months of data (the training set) were used for model development and internal temporal validation. AUC, calibration slope (ideal value 1; values >1 indicate under-fitting and <1 over-fitting), calibration-in-the-large (ideal value 0; values >0 indicate under-fitting and <0 over-fitting) and calibration plots were reported. Details of the statistical and validation schemes are available in Supplementary Sections 2.3 and 2.5.
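The temporal fold split and the two calibration summaries can be sketched as follows, assuming the usual definitions: the calibration slope is the coefficient of a logistic recalibration of outcomes on the predicted log-odds, and calibration-in-the-large is the intercept of a logistic model with those log-odds as a fixed offset (fitted here by a few Newton steps). The simulated, well-calibrated predictions are an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# nine consecutive 4-month folds over a 36-month period; last fold held out
folds = [list(range(m + 1, m + 5)) for m in range(0, 36, 4)]
train_folds, test_fold = folds[:-1], folds[-1]

def calibration_slope(y, p):
    """Slope of a logistic recalibration of outcomes on predicted log-odds
    (ideal 1; >1 suggests under-fitting, <1 over-fitting)."""
    lp = np.log(p / (1 - p)).reshape(-1, 1)
    return LogisticRegression(C=1e6).fit(lp, y).coef_[0, 0]  # ~unpenalised

def calibration_in_the_large(y, p, iters=30):
    """Intercept of a logistic model with the predicted log-odds as a fixed
    offset (ideal 0), fitted by Newton iteration."""
    lp = np.log(p / (1 - p))
    a = 0.0
    for _ in range(iters):
        q = 1 / (1 + np.exp(-(a + lp)))
        a += np.sum(y - q) / np.sum(q * (1 - q))
    return a

# simulated, well-calibrated predictions: both summaries near their ideals
rng = np.random.default_rng(2)
p = 1 / (1 + np.exp(-rng.normal(size=20_000)))
y = rng.binomial(1, p)
print(f"slope = {calibration_slope(y, p):.2f}, "
      f"CITL = {calibration_in_the_large(y, p):.3f}")
```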

Materials

All analyses were performed using the R statistical software version 3.5.3 (packages: pROC, caret, dplyr, ggplot2). Programming scripts are fully available in the supplementary materials. Model results were reported following the TRIPOD multivariable prediction model checklist and the STROBE checklist for observational studies.

Results

Data description

A total of 13,319 patients contributed 442,461 records across four hospitals, of which 4234 records (0.96% of records, involving 4.1% of patients) preceded a composite adverse event within 24 h. Record-level data were used across all analyses. Patient characteristics and the distribution of vitals are reported in Table 1. Descriptive statistics for the other hospitals and for all variables that capture the trajectory of vitals are available in Supplementary Tables 1.1 & 1.2.
Table 1

Distribution of records, outcomes and snapshot patient vitals.

Characteristics: Total

Patients and events
 Patients, n: 13,319
 Records, n: 442,461
 Repeated measurements per patient, median (Q1, Q3): 39 (25, 66)
 Days in wards, median (Q1, Q3): 7 (4, 13)
 Composite adverse events in 24 h, n (%): 4234 (0.96)
 NEWS score, mean (SD): 2.2 (1.7)
 Records in 6 h before each new record, median (Q1, Q3): 1 (1, 1)

Physiological vitals
 FiO2 category, n (%): Room air 297,723 (67.3); Nasal cannula 125,321 (28.3); Simple mask 19,211 (4.3); Reservoir mask 206 (0.1)
 Level of consciousness, n (%): Alert 441,126 (99.7); Others 1335 (0.3)
 Respiratory rate (breaths min−1), mean (SD): 17 (2.4)
 Oxygen saturation (%), mean (SD): 96 (2.0)
 Temperature (°C), mean (SD): 36.6 (0.5)
 Systolic blood pressure (mmHg), mean (SD): 120 (18)
 Heart rate (beats min−1), mean (SD): 82 (16)

# Patients = number of patients, # records = number of records, ... and so on


DyniEWS and feature importance

The final model, DyniEWS, contains the frequency of measurements in the 6 h prior to each new record, positive and negative deviations from the median of each snapshot measurement of each vital, the most recent rate of change of each vital, the average and standard deviation of the most recent three measurements of each vital, and the average of all historical values of the FiO2 categories. In general, we find that the four most influential predictors are: snapshot FiO2 oxygen therapy categories (low, medium or high FiO2), ‘not alert’ conscious level, the average of all sequential values of the ordered FiO2 categories, and the frequency of measurements in the previous 6 h (Table 2). Among the five snapshot physiological measurements, above-median respiratory rate, below-median systolic blood pressure, below-median oxygen saturation, below-median temperature and above-median heart rate are the most influential predictors of an SAE in 24 h. Among the sequential values, the rolling averages of oxygen saturation and respiratory rate were ranked most important, while for the most recent rate of change, heart rate, systolic blood pressure and respiratory rate were the only influential features. Of the total importance of predictors of SAEs in 24 h, snapshot measurements account for 82% and trajectory information for 18%, of which the frequency of measurements in the previous 6 h alone accounts for 4%. More details are available in Supplementary Table 4.
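The relative importances in Table 2 are percentages of the largest standardised effect. The computation can be sketched as follows; the coefficient magnitudes are hypothetical and were chosen only so that the derived percentages reproduce those published in Table 2.

```python
# Hypothetical standardised LASSO coefficient magnitudes; the absolute
# values are invented so the derived percentages reproduce Table 2.
coefs = {
    "FiO2: reservoir mask": 1.48,
    "Level of consciousness: not alert": 0.46,
    "Average of all historical FiO2 categories": 0.24,
    "Frequency of measurements, previous 6 h": 0.20,
}

# relative importance = |standardised coefficient| / largest effect, in %
largest = max(abs(v) for v in coefs.values())
importance = {k: round(100 * abs(v) / largest, 1) for k, v in coefs.items()}
for k, v in importance.items():
    print(f"{k}: {v}%")
```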
Table 2

Ranking (from highest to lowest) of important features selected into the final DyniEWS model. Relative importance of each feature computed as percentages of the largest effect based on standardised features.

Rank (from highest to lowest): Most important features
1: FiO2 categories (Reservoir mask (100%), Simple mask (73.5%), Nasal cannula (41.5%))
2: Level of consciousness (Not alert (31.1%))
3: Average of all historical values of the FiO2 categories (16.2%)
4: Frequency of measurements in the previous 6 h (13.5%)

roll: rolling summaries of the most recent three values.

diff: the most recent rate of change.

mean: average.

sd: standard deviation.

plus: positive deviation from the median.

minus: negative deviation from the median.


Comparing DyniEWS to other models

Internal temporal validation shows that all candidate models are well-calibrated overall (calibration slope at 1 and calibration-in-the-large at 0; more details in Supplementary Table 2). Detailed inspection via a calibration plot of 10 equal-sized probability groups (Fig. 2A) finds that probabilities below 40% are well-calibrated. Similar to our previous findings, higher probabilities (rare events, roughly 1 in 5000 incidence) are not well-calibrated and need to be interpreted with caution. Among the candidate models, DyniEWS has the highest discriminability (median AUC = 0.79 [Q1–Q3: 0.78–0.80] vs LogEWS2 = 0.76 [0.75−0.77]). Youden’s best threshold for an alarm was found to be 0.97% [0.92%–1.23%] from this internal validation (Supplementary Table 5).
Fig. 2

Assessments of model performance on training and test data. (A) Calibration plot for internal temporal validation using training data (the number of observations = 405,692, the number of patients = 12,307). (B) (Left): Receiver-operating characteristic curves for fitting each method to the test data (the number of observations = 36,769, the number of patients = 1,150, the reference random-classification gives a 45-degree straight line with area under the curve at 50%). (B) (Right): Precision-recall curves for fitting each method to the test data (the reference is the horizontal line with precision equal to the prevalence of adverse events, 0.93%). P denotes the number of adverse events and N denotes the number of non-events.

Focusing on the performance of each model on the test set (Fig. 2B, Supplementary Fig. 3 and Table 6), we find that DyniEWS has a clear advantage (AUC = 0.80 [95% CI 0.78, 0.83], AUPRC = 0.12 [0.10, 0.14]) over NEWS (AUC = 0.73 [0.70, 0.76], AUPRC = 0.05 [0.02, 0.08]) and the snapshot LogEWS2 (AUC = 0.76 [0.73, 0.79], AUPRC = 0.08 [0.06, 0.10]). Note that the revised AUC value for NEWS is lower than previously reported due to further data curation and the introduction of a new temporal validation scheme that further corrects for optimism in model performance. Fixing false alarm (false positive) rates at current levels using NEWS cut-offs of 5 and 7, we also observed a clear (7%) improvement in balanced accuracy for DyniEWS, with sensitivity increased over NEWS by 13% at cut-off 5 and 12% at cut-off 7 (Supplementary Table 4). Given the poor performance of NEWS, the relative improvement is sizeable (32% (13%/41%) and 67% (12%/18%), respectively). In terms of actual patient outcomes on the test data, Table 3 shows that NEWS (cut-off values of 5 and 7) is better at identifying non-events (over 90% of non-events were correctly not flagged) at the cost of failing to pick up true events (fewer than 41% of events were flagged). DyniEWS performs better than NEWS and LogEWS2, correctly flagging 69% of events.
Table 3

A comparison of actual patient outcomes using four scoring systems. Numbers of cases for “events + alarms from the system”, “non-events + no alarm”, “events + no alarm” and “non-events + alarm” are reported for the unseen test data (total number of observations = 36,769 across 4 months in 4 centres), of which 340 were adverse events and 36,429 were non-events. Percentages in the last two columns were computed as the % of total no-alarms followed by SAEs and that of total alarms followed by SAEs, respectively. Youden’s thresholds (cut-off values) for LogEWS2, DyniEWS.simplified and DyniEWS that maximised the sum of sensitivity and specificity were derived from internal validation.

Methods: Event & alarm, N (% of 340 events) | Non-event & no-alarm, N (% of 36,429 non-events) | Event & no-alarm, N (% of total no-alarms) | Non-event & alarm, N (% of total alarms) | Total alarms, N | Total no-alarms, N
NEWS (cut-off = 3): 242 (71) | 22,827 (63) | 98 (0.4) | 13,602 (98) | 13,844 | 22,925
NEWS (cut-off = 5): 140 (41) | 32,644 (90) | 200 (0.6) | 3785 (96) | 3925 | 32,844
NEWS (cut-off = 7): 63 (19) | 35,624 (98) | 277 (0.8) | 805 (93) | 868 | 35,901
LogEWS2 (cut-off = 1.01%): 207 (61) | 27,648 (76) | 133 (0.5) | 8781 (98) | 8918 | 27,851
DyniEWS.simplified (cut-off = 1.14%): 211 (62) | 29,422 (81) | 129 (0.4) | 7007 (97) | 7218 | 29,551
DyniEWS (cut-off = 0.97%): 233 (69) | 28,127 (77) | 107 (0.4) | 8302 (97) | 8535 | 28,234
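The sensitivity, specificity and balanced-accuracy comparisons quoted in the text follow directly from the confusion counts in Table 3; a short sketch, using the NEWS cut-off 5 and DyniEWS rows:

```python
def metrics(tp, tn, fn, fp):
    """Sensitivity, specificity and balanced accuracy from confusion counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return sens, spec, (sens + spec) / 2

# counts from Table 3 (test set: 340 events, 36,429 non-events)
rows = {
    "NEWS (cut-off = 5)": (140, 32_644, 200, 3_785),
    "DyniEWS (cut-off = 0.97%)": (233, 28_127, 107, 8_302),
}
for name, (tp, tn, fn, fp) in rows.items():
    sens, spec, bal = metrics(tp, tn, fn, fp)
    print(f"{name}: sensitivity {sens:.0%}, specificity {spec:.0%}, "
          f"balanced accuracy {bal:.0%}")
```

The roughly 7% balanced-accuracy gain quoted in the Results corresponds to the difference between the two balanced-accuracy figures.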

Practical clinical utility

Two hypothetical but relatively common clinical scenarios, hypotension and respiratory failure, in which a stable NEWS may offer false reassurance, highlight the practical clinical utility (Table 4). Observation set H represents an initially stable ward-based hypotensive patient (H1) who suffers a significant deterioration within 1 h (H2) and is resuscitated with fluids (H3). As opposed to a stable NEWS of 7, DyniEWS takes into account the patient’s recent trajectory and dynamically updates the probability of an SAE in 24 h from 2% at H1 to 7% and then 4% over the following hours. Observation set R represents a patient with progressively worsening hypoxia from R1 to R3. DyniEWS scores 2%, 5% and 16% over 4 h, correctly indicating a worsening trend that NEWS fails to recognise.
Table 4

Comparison of NEWS and DyniEWS scores for two hypothetical dynamic clinical scenarios. H denotes a set of three consecutive records for a hypotensive patient over 2 h. R denotes a set of three consecutive records over 4 h for a worsening patient with respiratory failure. Decision thresholds for each model are indicated in brackets. Physiological parameter and oxygen therapy categories are shown in colour to demonstrate how the total additive NEWS score is calculated in each set of observations (score 0 = black, score 1 = green, score 2 = blue and score 3 = red).


Discussion

Added knowledge

Our newly derived dynamic score offers substantially improved discriminatory performance (AUC = 0.80 [95% CI 0.78−0.83], AUPRC = 0.12 [0.10−0.14]) versus the current gold-standard additive NEWS (AUC = 0.73 [0.70−0.76], AUPRC = 0.05 [0.02−0.08]). The improvements in sensitivity from 41% to 54% at the lower NEWS threshold of 5 and from 18% to 30% at the higher NEWS threshold of 7 are of potential clinical importance.

Strengths

The consensus-based NEWS is a simple additive score, whereas DyniEWS is an evidence-based, completely data-driven dynamic score. Recognition of the asymmetric contribution of above- and below-median values of the five physiological observations, and the use of patient trajectory, are key advantages of DyniEWS. Unlike data-hungry deep-learning or machine-learning algorithms, DyniEWS is not a black-box approach and produces more robust and clinically interpretable results. Our study reinforces the case for revising the relative weightings of vital signs and subdividing oxygen therapy based on FiO2, whilst illustrating the benefits of including patient trajectory in future EWS models. Traditional EWS models give equal weighting to positive and negative divergences from median values. Although hypertension, bradypnoea, bradycardia and hyperthermia are undesirable, our study highlights the greater comparative risks of hypotension, tachypnoea, tachycardia, hypoxia and hypothermia.10, 21 Another recent study has similarly reported that subdivision of FiO2 into three subcategories improves NEWS discriminatory performance. Our findings suggest that the greatest gains from including patient trajectory are achieved with the most recent three records of each patient. With the exception of historical FiO2 categories, the contribution of other vital sign trends over a longer trajectory is small. The clinical scenario R1–R3 (Table 4) emphasises the important relative contribution of oxygen therapy, particularly the higher FiO2 categories. Our model introduces ‘frequency of observations in the previous 6 h’ as a new independent risk factor; the magnitude of this variable’s effect placed it among the top five most influential predictors. Frequency of ward observations in a postoperative cardiac surgical ward depends on three main factors: time out from intensive care, new patient symptoms and ‘nursing concern’.
A very weak correlation between the frequency of ‘records in the previous 6 h’ and ‘time out of ICU’ (0.092) provides circumstantial evidence for the inclusion of nursing concern in future EWS models.23, 24 Dynamic individual patient trajectory prediction is an advanced, highly interpretable and computationally efficient statistical method. Instead of leaving one centre out for external validation, we believe it is essential to develop the model using data from all available centres to better capture patient heterogeneity and case-mix, maximise the use of data and reduce optimism in the final model’s predictive performance. The temporal ordering of vital signs allows for a validation procedure using held-out test data collected after the period used for model development. The potential added value of DyniEWS is best illustrated by two relatively common clinical scenarios (Table 4) that demonstrate the potential clinical utility of a dynamic model in situations where a stable NEWS could provide false reassurance. Scenarios H1 and R1 are both common and frequently deemed low-risk after initial escalation and clinical review. In both scenarios, further deterioration may not be appropriately escalated because a stable NEWS is mistaken for clinical stability.

Weaknesses

The absence of an ‘event’ does not mean the absence of any ‘therapeutic intervention’. Many NEWS 5 and 7 alarms result in ward-based therapeutic interventions that do not require ICU readmission. Accurate measurement of the proportion of alarms leading to ward-level interventions is notoriously challenging for multiple reasons, including major inter-individual differences in the definition and documentation of ward interventions. For this reason, EWS validation studies invariably focus on events that can be objectively measured.5, 20, 21, 26, 27, 28 As with all EWS, the high apparent non-event rate after an alarm (Table 3) is a cause for concern. The constant burden of alarms leading to ‘alarm fatigue’ may lead to scores being ignored, loss of confidence and possible abandonment of hospital-wide physiological surveillance.13, 29 In the test data, our total of 8302 so-called ‘non-events and alarms’ with DyniEWS translates into ∼519 alarms per week over four hospitals, or 19 alarms per cardiac surgical unit per day (Table 3). Clinical experience suggests that a small number of sick patients will account for the majority of these ‘non-events and alarms’, and many of these ‘alarms’ will subsequently result in (often undocumented) ward-level therapeutic interventions. Although we believe this frequency of alarms will be acceptable, prospective studies are required to optimise alarm thresholds and confirm clinical utility. Our model is a probabilistic model primarily predicting the risk of ICU readmission. The choice of threshold warrants careful consideration of clinical utility, and the decision-making process needs to take into account the relative weightings of false positives and false negatives to decide on the optimal trade-off.30, 31

Future studies and practice changes

Currently, DyniEWS has been derived, validated and calibrated in postoperative cardiac surgical patients. Recalibration in other surgical, medical and paediatric specialties is entirely possible. Despite DyniEWS being focussed entirely on NEWS parameters, consideration could be given to including other strong predictors such as renal function. NEWS3 is scheduled to be launched in 2022. By this time the majority of hospitals in the developed world will be using electronic observation charts. We believe this provides the opportunity for radical revisions to NEWS2 and a shift away from outdated ‘consensus-based’ scores towards an ‘evidence-based’ model specifically calibrated for the patient group it aims to protect. NEWS3 should recognise the additional safety benefits of a dynamic score including patient trajectory and observation frequency. After 2022, for the minority of hospitals still using ‘pen & paper’ charts, consideration should be given to subdivision of supplementary oxygen in an updated additive NEWS. Changes to the efferent response to DyniEWS with tiered thresholds for escalation would also need to be developed, agreed and implemented. Methodologically, more black-box-type machine learning methods may also be considered but the trade-off between clinical interpretability, data representativeness and computational efficiency should be carefully evaluated.

Conclusions

There is a worldwide trend towards investing in electronic observations. A dynamic EWS with specialty specific recalibration offers the potential to substantially reduce missed event rates and improve patient safety. The failure of current snapshot models to distinguish between rapidly deteriorating and improving situations is a major potential weakness. Scoring systems should utilise and process what is important rather than just what is easy.

Authors’ contributions

Study conception design: YZ, YDC, SSV, JHM. Data acquisition: JC, JWB, MVP, DJM, JHM. Data analysis and model construction: YZ, YDC, SSV. Interpreting the results: YZ, YDC, SSV, JHM. Initial drafting of manuscript: YZ, JHM. Critical revision of manuscript: YZ, YDC, SSV, JC, JHM.

Conflicts of interest

The authors declare that they have no conflicts of interest.

Funding

Sofia S. Villar thanks the National Institute for Health Research Cambridge Biomedical Research Centre at Cambridge University Hospitals NHS Foundation Trust and the UK Medical Research Council (grant number: MC_UU_00002/15) for their funding.

CRediT authorship contribution statement

Yajing Zhu: Conceptualization, Data curation, Formal analysis, Methodology, Validation, Writing - original draft, Writing - review & editing. Yi-Da Chiu: Conceptualization, Data curation, Methodology, Writing - review & editing. Sofia S. Villar: Conceptualization, Formal analysis, Funding acquisition, Supervision, Writing - review & editing. Jonathan W. Brand: Investigation, Resources, Writing - review & editing. Mathew V. Patteril: Investigation, Resources, Writing - review & editing. David J. Morrice: Investigation, Resources, Writing - review & editing. James Clayton: Data curation, Resources, Writing - review & editing. Jonathan H. Mackay: Conceptualization, Investigation, Project administration, Supervision, Writing - original draft, Writing - review & editing.
References: 24 in total

1.  Human error: models and management.

Authors:  J Reason
Journal:  BMJ       Date:  2000-03-18

2.  ViEWS – Towards a national early warning score for detecting adult inpatient deterioration.

Authors:  David R Prytherch; Gary B Smith; Paul E Schmidt; Peter I Featherstone
Journal:  Resuscitation       Date:  2010-08       Impact factor: 5.262

3.  Improving early warning scores - more data, better validation, the same response.

Authors:  J H Mackay; J W Brand; Y D Chiu; S S Villar
Journal:  Anaesthesia       Date:  2020-04       Impact factor: 6.955

4.  The value of vital sign trends for detecting clinical deterioration on the wards.

Authors:  Matthew M Churpek; Richa Adhikari; Dana P Edelson
Journal:  Resuscitation       Date:  2016-02-16       Impact factor: 5.262

5.  Validating the Electronic Cardiac Arrest Risk Triage (eCART) Score for Risk Stratification of Surgical Inpatients in the Postoperative Setting: Retrospective Cohort Study.

Authors:  Bartlomiej Bartkowiak; Ashley M Snyder; Andrew Benjamin; Andrew Schneider; Nicole M Twu; Matthew M Churpek; Kevin K Roggin; Dana P Edelson
Journal:  Ann Surg       Date:  2019-06       Impact factor: 12.969

6.  Impact of introducing an electronic physiological surveillance system on hospital mortality.

Authors:  Paul E Schmidt; Paul Meredith; David R Prytherch; Duncan Watson; Valerie Watson; Roger M Killen; Peter Greengross; Mohammed A Mohammed; Gary B Smith
Journal:  BMJ Qual Saf       Date:  2014-09-23       Impact factor: 7.035

7.  The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets.

Authors:  Takaya Saito; Marc Rehmsmeier
Journal:  PLoS One       Date:  2015-03-04       Impact factor: 3.240

Review 8.  Nurses' worry or concern and early recognition of deteriorating patients on general wards in acute care hospitals: a systematic review.

Authors:  Gooske Douw; Lisette Schoonhoven; Tineke Holwerda; Getty Huisman-de Waal; Arthur R H van Zanten; Theo van Achterberg; Johannes G van der Hoeven
Journal:  Crit Care       Date:  2015-05-20       Impact factor: 9.097

9.  The fifth vital sign? Nurse worry predicts inpatient deterioration within 24 hours.

Authors:  Santiago Romero-Brufau; Kim Gaines; Clara T Nicolas; Matthew G Johnson; Joel Hickman; Jeanne M Huddleston
Journal:  JAMIA Open       Date:  2019-08-28

10.  Logistic early warning scores to predict death, cardiac arrest or unplanned intensive care unit re-admission after cardiac surgery.

Authors:  Y-D Chiu; S S Villar; J W Brand; M V Patteril; D J Morrice; J Clayton; J H Mackay
Journal:  Anaesthesia       Date:  2019-07-03       Impact factor: 6.955

Cited by: 2 in total

1.  Evaluation of NEWS2 response thresholds in a retrospective observational study from a UK acute hospital.

Authors:  Tanya Pankhurst; Elizabeth Sapey; Helen Gyves; Felicity Evison; Suzy Gallier; George Gkoutos; Simon Ball
Journal:  BMJ Open       Date:  2022-02-08       Impact factor: 2.692

2.  Dynamic early warning scores for predicting clinical deterioration in patients with respiratory disease.

Authors:  Sherif Gonem; Adam Taylor; Grazziela Figueredo; Sarah Forster; Philip Quinlan; Jonathan M Garibaldi; Tricia M McKeever; Dominick Shaw
Journal:  Respir Res       Date:  2022-08-11
