Heart rate variability based machine learning models for risk prediction of suspected sepsis patients in the emergency department.

Calvin J Chiew¹, Nan Liu^2,3, Takashi Tagami^3,4, Ting Hway Wong^1,5, Zhi Xiong Koh⁶, Marcus E H Ong^2,3,6.

Abstract

Early identification of high-risk septic patients in the emergency department (ED) may guide appropriate management and disposition, thereby improving outcomes. We compared the performance of machine learning models against conventional risk stratification tools, namely the Quick Sequential Organ Failure Assessment (qSOFA), National Early Warning Score (NEWS), Modified Early Warning Score (MEWS), and our previously described Singapore ED Sepsis (SEDS) model, in the prediction of 30-day in-hospital mortality (IHM) among suspected sepsis patients in the ED. Adult patients who presented to Singapore General Hospital (SGH) ED between September 2014 and April 2016, and who met ≥2 of the 4 Systemic Inflammatory Response Syndrome (SIRS) criteria were included. Patient demographics, vital signs and heart rate variability (HRV) measures obtained at triage were used as predictors. Baseline models were created using qSOFA, NEWS, MEWS, and SEDS scores. Candidate models were trained using k-nearest neighbors, random forest, adaptive boosting, gradient boosting and support vector machine. Models were evaluated on F1 score and area under the precision-recall curve (AUPRC). A total of 214 patients were included, of whom 40 (18.7%) met the outcome. Gradient boosting was the best model with an F1 score of 0.50 and AUPRC of 0.35, and performed better than all the baseline comparators (SEDS, F1 0.40, AUPRC 0.22; qSOFA, F1 0.32, AUPRC 0.21; NEWS, F1 0.38, AUPRC 0.28; MEWS, F1 0.30, AUPRC 0.25). A machine learning model can be used to improve prediction of 30-day IHM among suspected sepsis patients in the ED compared to traditional risk stratification tools.

Year: 2019 PMID: 30732136 PMCID: PMC6380871 DOI: 10.1097/MD.0000000000014197

Source DB: PubMed Journal: Medicine (Baltimore) ISSN: 0025-7974 Impact factor: 1.817

Introduction

Sepsis is increasing in incidence and has a 10% to 20% in-hospital mortality (IHM) rate. Risk stratification of septic patients in the Emergency Department (ED) may help to guide appropriate management and disposition, thereby reducing morbidity and mortality. A number of clinical tools have been developed to risk stratify septic patients in the ED, where certain clinical information, such as laboratory investigations, is not readily available. The Quick Sequential Organ Failure Assessment (qSOFA) score was externally validated among septic patients presenting to the ED using the worst level of the 3 components observed during the ED stay and showed good prognostic accuracy for IHM. A recent study showed that commonly used early warning scores such as the National Early Warning Score (NEWS) and the Modified Early Warning Score (MEWS) were more accurate than qSOFA in predicting mortality in patients with suspected infection presenting to the ED.

Several studies have also reported the prognostic value of heart rate variability (HRV) parameters in septic patients presenting to the ED. Septic patients have reduced sympatho-vagal balance and impaired sympathetic activity, which lead to varying degrees of cardiac autonomic dysfunction. This can be detected by HRV analysis, a quick, non-invasive technique of evaluating the beat-to-beat variation in heart rate. HRV analyses are divided into linear and non-linear methods. Linear methods include HRV parameters measured in time or frequency domains. Time domain HRV parameters are statistical calculations of consecutive R-R time intervals and how they correlate with each other. Frequency domain HRV parameters are based on spectral analysis.
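As an illustration of what time domain HRV parameters look like in practice, the following minimal sketch computes three commonly reported statistics from a list of R-R intervals. The specific parameters (mean heart rate, SDNN, RMSSD), the function name and the sample intervals are assumptions chosen for illustration; the text above does not list which time domain parameters were used.

```python
import math
import statistics


def time_domain_hrv(rr_ms):
    """Return basic time domain HRV statistics for R-R intervals in milliseconds.

    Illustrative only: the choice of statistics is an assumption,
    not the parameter set used in the study.
    """
    if len(rr_ms) < 3:
        raise ValueError("need at least 3 R-R intervals")

    mean_rr = statistics.mean(rr_ms)
    # SDNN: sample standard deviation of all R-R intervals.
    sdnn = statistics.stdev(rr_ms)
    # RMSSD: root mean square of successive R-R differences.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))

    return {
        "mean_rr_ms": mean_rr,
        "mean_hr_bpm": 60000.0 / mean_rr,
        "sdnn_ms": sdnn,
        "rmssd_ms": rmssd,
    }


# Hypothetical R-R interval series (milliseconds).
example_rr = [800, 810, 790, 805, 795]
hrv = time_domain_hrv(example_rr)
```

SDNN summarizes overall variability across the recording, while RMSSD reflects how much each interval differs from the one before it.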
Studies have suggested that regulators of the cardiovascular system interact in a non-linear way, and HRV analysis using non-linear methods reflects these mechanisms.

We previously described a 5-variable Singapore ED Sepsis (SEDS) model to predict the risk of 30-day IHM among septic patients in the ED. The SEDS model was the first risk stratification tool to incorporate HRV parameters with other traditional prognosticators such as patient demographics and vital signs. It was developed via stepwise logistic regression and had improved predictive performance over existing tools that only utilize vital signs in their scoring criteria.

With the widespread adoption of electronic medical records in healthcare and availability of high-resolution data, particularly in the intensive care unit (ICU) setting, machine learning algorithms have become popular for modelling patient health status. Machine learning models have shown good performance in early detection of sepsis among ICU patients and prediction of progression to septic shock among patients with sepsis. A randomised controlled trial also showed that the use of a machine learning-based severe sepsis surveillance and alert system improved patient outcomes such as length of stay and IHM. To date, only 1 study has demonstrated the use of machine learning for risk prediction of septic patients in the ED. However, it did not explore modern algorithms such as boosting and support vector machine, and did not incorporate HRV measures.

In this study, we aimed to compare the performance of HRV-based machine learning models against the SEDS model and other conventional risk stratification tools, namely the qSOFA, NEWS and MEWS, in the prediction of 30-day IHM among suspected sepsis patients in the ED setting. We are interested in the use of these models for early risk stratification based on clinical information that is quickly obtainable during triage and without chart review.

Methods

Ethics approval for the study was obtained from SingHealth's Centralised Institutional Review Board (CIRB, Reference Number 2016/2858), with waiver of patient consent. We conducted secondary analysis of electronic health data from patients above 21 years old who presented to the Singapore General Hospital (SGH) ED between September 2014 and April 2016 with suspected sepsis, and who met at least 2 of the 4 Systemic Inflammatory Response Syndrome (SIRS) criteria. The SIRS criteria are temperature (<36°C or >38°C), heart rate (>90 beats/min), respiratory rate (>20 breaths/min) and total white count (<4000/mm³ or >12,000/mm³).

All patients presenting to the SGH ED are triaged by a trained nurse on arrival. The first set of vital signs recorded and routine 5-minute one-lead electrocardiogram (ECG) tracings performed at triage were used for analysis. Patient demographics and vital signs were obtained from the hospital's electronic medical records. The ECGs were obtained from X-Series Monitor (ZOLL Medical Corporation, Chelmsford, MA) and subsequently loaded into Kubios HRV software version 2.2 (Kuopio, Finland) for computation of HRV parameters. The program automatically detected QRS complexes, but each ECG was also manually screened to ensure QRS detection was correct, and their positions were adjusted if misplaced. The R-R interval time series was then screened for rhythm, artifacts and ectopic beats. If artifacts or ectopic beats were few (<5), they were removed from the R-R interval time series. Patients with non-sinus rhythm or >5 artifacts and/or ectopic beats were excluded. The outcome of interest was IHM within 30 days of ED admission.
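The SIRS inclusion rule described above can be expressed as a short check. This is a minimal sketch using the thresholds stated in the text; the function and argument names are hypothetical and the example values are invented for illustration.

```python
def count_sirs_criteria(temp_c, heart_rate, resp_rate, wbc_per_mm3):
    """Count how many of the 4 SIRS criteria a patient meets.

    Thresholds follow the text: temperature <36 or >38 degrees C,
    heart rate >90 beats/min, respiratory rate >20 breaths/min,
    white count <4000 or >12,000 per cubic millimetre.
    """
    criteria = [
        temp_c < 36.0 or temp_c > 38.0,
        heart_rate > 90,
        resp_rate > 20,
        wbc_per_mm3 < 4000 or wbc_per_mm3 > 12000,
    ]
    return sum(criteria)


def meets_sirs_inclusion(temp_c, heart_rate, resp_rate, wbc_per_mm3):
    """Inclusion required at least 2 of the 4 SIRS criteria."""
    return count_sirs_criteria(temp_c, heart_rate, resp_rate, wbc_per_mm3) >= 2


# Hypothetical patients: the first meets 2 criteria, the second only 1.
febrile_tachycardic = meets_sirs_inclusion(38.5, 95, 18, 9000)
tachypnoeic_only = meets_sirs_inclusion(37.0, 80, 22, 9000)
```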
Objective variables quickly obtainable during triage and without chart review were considered as predictors, namely 3 demographic variables (age, gender, ethnicity), 6 vital signs (temperature, heart rate, respiratory rate, systolic and diastolic blood pressures, and Glasgow Coma Scale (GCS) score), and 22 HRV parameters in time, frequency and non-linear domains. These variables were also compared between patients who met the outcome and patients who did not using the Mann-Whitney U test for continuous variables, and the chi-square test or Fisher exact test as appropriate for categorical variables. One-hot encoding was applied to categorical variables (such as ethnicity) and all variables were scaled prior to modeling. We randomly selected 60% of the observations to train the models, holding the remaining 40% as a test set for subsequent model evaluation.

Baseline models were created using qSOFA, NEWS and MEWS scores. Their scoring criteria and thresholds for predicting positive outcome (≥2 for qSOFA, ≥7 for NEWS, ≥5 for MEWS) were taken from their original articles. Two sets of qSOFA scores were computed, one using initial vital signs recorded at triage, and another using worst vital signs recorded during the entire ED stay as described by Freund et al.

Candidate models were trained using k-nearest neighbors (KNN), random forest (RF), adaptive boosting (ADA), gradient boosting (GB) and support vector machine (SVM). Class imbalance was addressed by applying class weights. Parameter tuning was performed via grid search with 5-fold cross-validation, with the aim of optimizing F1 score. We used each model to predict on the test set and calculated its precision (equivalent to positive predictive value) and recall (equivalent to sensitivity) from its confusion matrix. For each model, we also generated a precision-recall curve (PRC), and calculated its F1 score, which is the harmonic mean of precision and recall, as well as area under the PRC (AUPRC).
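The modelling workflow described above (scaling, a 60/40 split, grid search with 5-fold cross-validation optimizing F1, and test-set evaluation) might be sketched with scikit-learn roughly as follows. This is not the authors' code: the data are synthetic, the hyperparameter grid is invented, and stratified splitting is an assumption. Because `GradientBoostingClassifier` has no `class_weight` parameter, balanced sample weights are used here as a stand-in for class weights, and `average_precision_score` is used as one common estimate of AUPRC.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import average_precision_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils.class_weight import compute_sample_weight

# Synthetic stand-in data: 214 "patients", 8 numeric predictors,
# roughly 19% positive outcomes. None of this is study data.
rng = np.random.default_rng(0)
X = rng.normal(size=(214, 8))
risk = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=214)
y = (risk > np.quantile(risk, 0.81)).astype(int)

# 60% training, 40% held-out test set (stratification is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", GradientBoostingClassifier(random_state=0)),
])

# Hypothetical grid; the tuned parameters in the study are not stated.
param_grid = {
    "clf__n_estimators": [50, 100],
    "clf__max_depth": [2, 3],
    "clf__learning_rate": [0.05, 0.1],
}

search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
# Balanced sample weights approximate class weighting for imbalance.
train_weights = compute_sample_weight("balanced", y_train)
search.fit(X_train, y_train, clf__sample_weight=train_weights)

y_pred = search.predict(X_test)
y_score = search.predict_proba(X_test)[:, 1]
test_f1 = f1_score(y_test, y_pred)
test_auprc = average_precision_score(y_test, y_score)

# Impurity-based feature importances of the tuned model.
importances = search.best_estimator_.named_steps["clf"].feature_importances_
```

Fitting the scaler inside the pipeline means it is re-estimated on each cross-validation training fold, so no information from validation folds or the test set leaks into preprocessing.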
We chose these performance metrics as they are more informative and less misleading than specificity and Receiver Operating Characteristics (ROC) plots for evaluating binary classifiers on imbalanced datasets. We computed 95% confidence intervals (CI) for the F1 scores by sampling from 1000 bootstrapped test sets. We used F1 as our main evaluation metric since it takes both precision and recall into account and we believe both are important in this context. To better understand how the GB model worked, we also visualized feature importance in terms of the total decrease in node impurity (indicated by Gini index) due to branching over a given predictor, averaged over all trees. Univariate statistical analysis was carried out in Stata version 13 (StataCorp 2013, College Station, TX). Machine learning models were developed in Python 3.6 (Python Software Foundation, Wilmington, DE) using the scikit-learn library.
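To make the F1 score and its bootstrapped confidence interval concrete, here is a minimal standard-library sketch. It assumes a percentile bootstrap over resampled test-set predictions; the text does not state which bootstrap interval was used, and the function names and example labels are hypothetical.

```python
import random


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 from binary labels (1 = outcome)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def bootstrap_f1_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for F1 (the percentile method is an assumption)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Resample test-set cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        _, _, f1 = precision_recall_f1(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
        scores.append(f1)
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical test set: 16 positives (8 detected), 70 negatives (8 false alarms).
y_true = [1] * 16 + [0] * 70
y_pred = [1] * 8 + [0] * 8 + [1] * 8 + [0] * 62
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
ci_lower, ci_upper = bootstrap_f1_ci(y_true, y_pred)
```

With few positive cases in the test set, the resulting interval tends to be wide, which is worth keeping in mind when comparing models on small cohorts.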

Results

Figure 1 shows the cohort selection process. A total of 214 patients were included in the study, of whom 40 (18.7%) met the outcome. One hundred eight (50.5%) of them were male, with median age of 67.5 years (inter-quartile range, IQR 57–79). The most commonly identified sources of infection were respiratory (33.2%), urinary tract (17.8%), gastrointestinal (7.0%), musculoskeletal (5.6%), and hepatobiliary (5.6%). There were no significant differences in the sources of infection between those who did and did not meet the outcome.

Figure 1

Cohort selection flow with breakdown of outcome.

Table 1 compares the patient demographics, vital signs and HRV parameters of the 2 patient groups. Patients who met the outcome were older (median age 76 years; IQR 68–83 years) than those who did not (median age 66 years; IQR 56–77 years). There were no significant differences in gender and ethnicity distributions between the 2 groups. In terms of vital signs, patients who met the outcome had higher respiratory rates, as well as lower temperatures and GCS scores, compared to patients who did not meet the outcome. Most of the HRV parameters across time, frequency and non-linear domains showed significant differences between the 2 groups.

Table 1

Summary of predictor variables by presence of outcome.

Table 2 summarizes the precision, recall, F1 score and AUPRC of the baseline and candidate models. Gradient boosting (GB) was the best candidate model with an F1 score of 0.50 and AUPRC of 0.35, and performed better than all the baseline models. Figure 2 shows the precision-recall curves of the GB model and baseline comparators.

Table 2

Results of model evaluation.

Figure 2

Precision-recall curves of gradient boosting model and baseline comparators.

Figure 3 shows the most predictive features in the GB model and their relative importance. Top predictors for 30-day IHM included temperature, detrended fluctuation analysis (DFA) α-2, heart rate, Glasgow Coma Scale (GCS) score and approximate entropy.

Figure 3

Relative importance of top 10 predictive features in the gradient boosting model.

Discussion

In this study, we applied machine learning to improve the 30-day IHM prediction of suspected sepsis patients in the ED. Baseline comparators were the qSOFA, NEWS, MEWS, and SEDS scoring systems. Our gradient boosting model outperformed all of them in terms of F1 score and AUPRC. Compared to a previous study by Taylor et al, which employed all available clinical variables collected during the entire ED stay, our study only used predictors that were objective and quickly attainable in the first 5 minutes of patient presentation, namely demographics, vital signs and HRV parameters derived from routine ECGs. This allows risk stratification to be done at triage, facilitating early recognition of high-risk patients for allocation of care resources in the ED.

Our outcome of interest was in-hospital mortality within 30 days during the same admission where the vitals and ECG were taken. Some studies did not specify a time period for mortality or whether it was strictly within the same admission. We chose this endpoint as it is more likely to be sepsis-related compared to an out-of-hospital mortality or mortality from a subsequent admission. It is also more meaningful for physicians in terms of administering possible interventions such as closer monitoring and less conservative management of high-risk patients.

Among the top predictors in our machine learning model are temperature and heart rate, which are also part of the NEWS and MEWS scoring criteria, as well as GCS score, which is part of qSOFA and similar to the AVPU scale used in NEWS and MEWS. The most important HRV predictor is DFA, which is a non-linear parameter quantifying the self-similarity of signals using the fractal property. In other words, it measures the long-range correlation patterns of the R-R interval time series, summarized by a short-term and a long-term fractal scaling exponent, α-1 and α-2, respectively.
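For readers unfamiliar with DFA, the following is a minimal standard-library sketch of how a scaling exponent can be estimated: the mean-centred series is integrated, split into windows, linearly detrended within each window, and the slope of log fluctuation against log window size gives α. The window range used here (4 to 16 beats) and the synthetic test signals are assumptions for illustration; the window ranges used for α-1 and α-2 in the study are not stated in the text.

```python
import math
import random


def _detrended_ss(segment):
    """Sum of squared residuals after removing a least-squares line."""
    n = len(segment)
    t_mean = (n - 1) / 2
    y_mean = sum(segment) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(segment))
    slope = s_ty / s_tt
    return sum(
        (y - y_mean - slope * (t - t_mean)) ** 2 for t, y in enumerate(segment)
    )


def dfa_alpha(series, scales):
    """Estimate a DFA scaling exponent over the given window sizes."""
    mean_value = sum(series) / len(series)
    # Integrated (cumulative) profile of the mean-centred series.
    profile, running = [], 0.0
    for x in series:
        running += x - mean_value
        profile.append(running)

    log_n, log_f = [], []
    for n in scales:
        n_windows = len(profile) // n
        if n_windows < 1:
            continue
        ss = sum(
            _detrended_ss(profile[i * n:(i + 1) * n]) for i in range(n_windows)
        )
        fluctuation = math.sqrt(ss / (n_windows * n))
        log_n.append(math.log(n))
        log_f.append(math.log(fluctuation))

    # Slope of log F(n) versus log n.
    x_mean = sum(log_n) / len(log_n)
    y_mean = sum(log_f) / len(log_f)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(log_n, log_f))
    den = sum((x - x_mean) ** 2 for x in log_n)
    return num / den


# Synthetic signals: uncorrelated noise and its cumulative sum (random walk).
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(2000)]
walk, total = [], 0.0
for step in noise:
    total += step
    walk.append(total)

alpha_noise = dfa_alpha(noise, range(4, 17))
alpha_walk = dfa_alpha(walk, range(4, 17))
```

Uncorrelated noise gives an exponent near 0.5 (somewhat higher at small window sizes), while a random walk gives a value near 1.5, so larger exponents indicate stronger long-range correlation in the series.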
The degree of fractal correlation has been shown to reflect sympathetic and parasympathetic tone. Nonetheless, more research is needed to understand the physiological significance and normal range of values for each of the HRV parameters.

Our study had several limitations. Firstly, this was a single-institution study with a small sample size. Therefore, the results might not be generalizable to other settings and larger multi-centre prospective studies are required to validate our results. Secondly, we had included patients in our study based on clinical suspicion of sepsis and meeting at least 2 of the 4 SIRS criteria. Sepsis largely remains a clinical diagnosis and there is no gold standard to determine whether a patient is septic. Other studies have attempted to address this issue by including only patients with administered intravenous antibiotics, blood culture investigations or confirmed source of infection. We acknowledge that our cohort definition reflects suspected sepsis rather than confirmed sepsis. However, given the aim of early risk stratification during triage where laboratory testing and confirmed diagnoses are not available, we believe this is suitable and does not detract from the model's value. In addition, while the SIRS criteria have recently been replaced by a new definition of sepsis as a life-threatening organ dysfunction caused by a dysregulated host response to infection, the usefulness of the SIRS criteria in the diagnosis of sepsis was still emphasized by the same task force. Lastly, even though HRV measures are predictive of adverse outcomes in suspected sepsis patients as shown in this study, they cannot be manually calculated from a patient's ECG. Currently, we are developing a portable hardware device which can be used at the bedside to perform HRV analysis. We acknowledge that the use of a machine learning model requiring computational resources on the ground may be challenging.
However, many modern EDs already employ electronic data collection systems, on which predictive machine learning models could be deployed, making them even more convenient than traditional manual scoring tools. Future studies should implement the clinical use of such models and evaluate whether they translate into improved outcomes for septic patients.

In conclusion, a machine learning model incorporating HRV analysis can be used to improve prediction of 30-day IHM among suspected sepsis patients in the ED compared to traditional risk stratification tools. This model could be used at triage as a clinical decision support tool to identify high-risk septic patients for early, appropriate management.

Acknowledgments

We would like to thank all doctors, nurses and research assistants from the Department of Emergency Medicine, Singapore General Hospital, who contributed towards this project.

Author contributions

Conceptualization: Calvin J Chiew, Nan Liu, Marcus EH Ong. Data curation: Zhi Xiong Koh. Formal analysis: Calvin J Chiew, Nan Liu, Takashi Tagami, Ting Hway Wong, Zhi Xiong Koh. Methodology: Calvin J Chiew, Nan Liu, Takashi Tagami, Ting Hway Wong, Zhi Xiong Koh, Marcus EH Ong. Supervision: Nan Liu, Marcus EH Ong. Writing – original draft: Calvin J Chiew, Nan Liu, Takashi Tagami, Ting Hway Wong, Zhi Xiong Koh, Marcus EH Ong. Writing – review & editing: Calvin J Chiew, Nan Liu, Takashi Tagami, Ting Hway Wong, Zhi Xiong Koh, Marcus EH Ong.

33 in total

1. Critical care in the emergency department: A physiologic assessment and outcome evaluation.

Authors: H B Nguyen; E P Rivers; S Havstad; B Knoblich; J A Ressler; A M Muzzin; M C Tomlanovich
Journal: Acad Emerg Med Date: 2000-12 Impact factor: 3.451

2. Characteristics of heart rate variability can predict impending septic shock in emergency department patients with sepsis.

Authors: Wei-Lung Chen; Cheng-Deng Kuo
Journal: Acad Emerg Med Date: 2007-03-26 Impact factor: 3.451

Review 3. Severe sepsis and septic shock.

Authors: Derek C Angus; Tom van der Poll
Journal: N Engl J Med Date: 2013-08-29 Impact factor: 91.245

4. Benchmarking the incidence and mortality of severe sepsis in the United States.

Authors: David F Gaieski; J Matthew Edwards; Michael J Kallan; Brendan G Carr
Journal: Crit Care Med Date: 2013-05 Impact factor: 7.598

5. An Interpretable Machine Learning Model for Accurate Prediction of Sepsis in the ICU.

Authors: Shamim Nemati; Andre Holder; Fereshteh Razmi; Matthew D Stanley; Gari D Clifford; Timothy G Buchman
Journal: Crit Care Med Date: 2018-04 Impact factor: 7.598

Review 6. Nonlinear dynamics in cardiology.

Authors: Trine Krogh-Madsen; David J Christini
Journal: Annu Rev Biomed Eng Date: 2012-04-18 Impact factor: 9.590

Review 7. Definitions for sepsis and organ failure and guidelines for the use of innovative therapies in sepsis. The ACCP/SCCM Consensus Conference Committee. American College of Chest Physicians/Society of Critical Care Medicine.

Authors: R C Bone; R A Balk; F B Cerra; R P Dellinger; A M Fein; W A Knaus; R M Schein; W J Sibbald
Journal: Chest Date: 1992-06 Impact factor: 9.410

8. Prediction of In-hospital Mortality in Emergency Department Patients With Sepsis: A Local Big Data-Driven, Machine Learning Approach.

Authors: R Andrew Taylor; Joseph R Pare; Arjun K Venkatesh; Hani Mowafi; Edward R Melnick; William Fleischman; M Kennedy Hall
Journal: Acad Emerg Med Date: 2016-02-13 Impact factor: 3.451

Review 9. Advances in heart rate variability signal analysis: joint position statement by the e-Cardiology ESC Working Group and the European Heart Rhythm Association co-endorsed by the Asia Pacific Heart Rhythm Society.

Authors: Roberto Sassi; Sergio Cerutti; Federico Lombardi; Marek Malik; Heikki V Huikuri; Chung-Kang Peng; Georg Schmidt; Yoshiharu Yamamoto
Journal: Europace Date: 2015-07-14 Impact factor: 5.214

10. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets.

Authors: Takaya Saito; Marc Rehmsmeier
Journal: PLoS One Date: 2015-03-04 Impact factor: 3.240
