| Literature DB >> 34124082 |
Michael Moor, Bastian Rieck, Max Horn, Catherine R. Jutzeler, Karsten Borgwardt.
Abstract
Background: Sepsis is among the leading causes of death in intensive care units (ICUs) worldwide, and its recognition, particularly in the early stages of the disease, remains a medical challenge. The advent of an abundance of digital health data has created a setting in which machine learning can be used for digital biomarker discovery, with the ultimate goal of advancing the early recognition of sepsis.
Objective: To systematically review and evaluate studies employing machine learning for the prediction of sepsis in the ICU.
Data Sources: Using Embase, Google Scholar, PubMed/Medline, Scopus, and Web of Science, we systematically searched the existing literature for machine learning-driven sepsis onset prediction for patients in the ICU.
Study Eligibility Criteria: All peer-reviewed articles using machine learning for the prediction of sepsis onset in adult ICU patients were included. Studies focusing on patient populations outside the ICU were excluded.
Study Appraisal and Synthesis Methods: A systematic review was performed according to the PRISMA guidelines. Moreover, a quality assessment of all eligible studies was performed.
Results: Out of 974 identified articles, 22 and 21 met the criteria to be included in the systematic review and quality assessment, respectively. A multitude of machine learning algorithms were applied to refine the early prediction of sepsis. The quality of the studies ranged from "poor" (satisfying ≤40% of the quality criteria) to "very good" (satisfying ≥90% of the quality criteria). The majority of the studies (n = 19, 86.4%) employed an offline training scenario combined with a horizon evaluation, while two studies (n = 2, 9.1%) implemented an online scenario. The massive inter-study heterogeneity in terms of model development, sepsis definition, prediction time windows, and outcomes precluded a meta-analysis.
Last, only two studies provided publicly accessible source code and data sources, fostering reproducibility.
Limitations: Articles were only eligible for inclusion when employing machine learning algorithms for the prediction of sepsis onset in the ICU. This restriction led to the exclusion of studies focusing on the prediction of septic shock, sepsis-related mortality, and patient populations outside the ICU.
Conclusions and Key Findings: A growing number of studies employ machine learning to optimize the early prediction of sepsis through digital biomarker discovery. This review, however, highlights several shortcomings of the current approaches, including low comparability and reproducibility. Finally, we gather recommendations on how these challenges can be addressed before deploying these models in prospective analyses.
Systematic Review Registration Number: CRD42020200133.
Keywords: early detection; machine learning; onset prediction; sepsis; systematic review
Year: 2021 PMID: 34124082 PMCID: PMC8193357 DOI: 10.3389/fmed.2021.607952
Source DB: PubMed Journal: Front Med (Lausanne) ISSN: 2296-858X
Figure 1PRISMA flowchart of the search strategy. A total of 22 studies were eligible for the literature review and 21 for the quality assessment.
Overview of included studies.
| 1 | Abromavičius et al. | Emory University Hospital, MIMIC-III | Sepsis-3 (with modified time windows) | 2,932 | 7.3 | Yes | No | No | AdaBoost and Discriminant Subspace Learning | – | – | No | Demographics, labs, vitals | 11 |
| 2 | Barton et al. | MIMIC-III, UCSF | Sepsis-3 | 3,673 | 3.3 | No | No | No | XGBoost | 0.88 | 0 | No | Vitals | 6 |
| 3 | Bloch et al. | RMC | Sepsis-2 related | 300 | 50.0 | No | No | No | Neural Networks, SVM, logistic regression | 0.88 | 4 | No | Vitals | 4 |
| 4 | Calvert et al. | MIMIC-II | Sepsis-2 related | 159 | 11.4 | No | No | No | InSight Algorithm | 0.92 | 3 | No | Demographics, labs, vitals | 9 |
| 5 | Desautels et al. | MIMIC-III | Sepsis-3 | 1,840 | 9.7 | No | No | No | InSight Algorithm | 0.88 | 0 | No | Demographics, vitals | 8 |
| 6 | Futoma et al. | Duke University Health System | Sepsis-2 related | 11,064 | 21.4 | No | No | No | MGP-RNN | 0.91 | 0 | No | Comorbidities, demographics, labs, medications, vitals | 77 |
| 7 | Kaji et al. | MIMIC-III | Sepsis-2 related | 36,176 | 63.6 | Yes | Yes | Yes | LSTM | 0.88 | “Next day” | No | Demographics, labs, medications, vitals | 119 |
| 8 | Kam and Kim | MIMIC-II | Sepsis-2 related | 360 | 6.2 | No | No | No | SepLSTM | 0.99 | 0 | No | Demographics, labs, vitals | 9 |
| 9 | Lauritsen et al. | Danish EHR | Sepsis-2 related | – | – | No | No | No | CNN-LSTM | 0.88 | 0.25 | No | Diagnoses, labs, imaging, medications, vitals, procedures | – |
| 10 | Lukaszewski et al. | Queen Alexandra Hospital | Sepsis-2 related | 25 | 53.2 | No | No | No | MLP | – | – | No | Clinical parameters, cytokine mRNA expression | – |
| 11 | Mao et al. | MIMIC-III, UCSF | Sepsis-2 related | 1,965 | 9.1 | Yes | No | No | InSight Algorithm | 0.92 | 0 | Yes | Vitals | 30 |
| 12 | McCoy and Das | CRMC | Sepsis-3, Severe Sepsis | 407 | 24.4 | No | No | No | InSight Algorithm | 0.91 | – | – | Labs, vitals | – |
| 13 | Moor et al. | MIMIC-III | Sepsis-3 | 570 | 9.2 | Yes | Yes | Yes | MGP-TCN | 0.91 | 0 | No | Labs, vitals | 44 |
| 14 | Nemati et al. | Emory Healthcare system, MIMIC-III | Sepsis-3 (modified time windows) | 2,375 | 8.6 | No | No | No | Weibull-Cox proportional hazards model | 0.85 | 4 | Yes | Demographics, vitals | 48 |
| 15 | Reyna et al. | Emory University Hospital, MIMIC-III | Sepsis-3 (modified time windows) | 2,932 | 7.3 | Yes | No | No | – | – | – | Yes | Demographics, labs, vitals | 40 |
| 16 | Schamoni et al. | University Medical Centre Mannheim | Sepsis tag by ICU clinicians | 200 | 32.3 | No | No | No | Non-linear ordinal regression | 0.84 | 4 | No | Comorbidities, demographics, labs, vitals | 55 |
| 17 | Scherpf et al. | MIMIC-III | Sepsis-2 related | 2,724 | 7.7 | No | No | No | RNN-GRU | 0.81 | 3 | No | Labs, vitals | 10 |
| 18 | Shashikumar et al. | Emory Healthcare system | Sepsis-3 | 242 | 22.0 | No | No | No | ElasticNet | 0.78 | 4 | No | Comorbidities, clinical context, demographics, vitals | 17 |
| 19 | Shashikumar et al. | Emory Healthcare system | Sepsis-3 | 100 | 40.0 | No | No | No | SVM | 0.8 | 4 | No | Demographics, comorbidity, clinical context, vitals | 2 |
| 20 | Sheetrit et al. | MIMIC-III | Sepsis-2 related | 1,034 | 41.4 | No | No | No | Temporal Probabilistic Profiles | – | – | No | Demographics, labs, vitals | – |
| 21 | van Wyk et al. | MLH System | Sepsis-2 related | – | 50.0 | No | No | No | Random Forests, RNN | – | – | No | Labs, vitals | 7 |
| 22 | van Wyk et al. | MLH System | Sepsis-2 related | 377 | 50.0 | No | No | No | Random Forests | 0.79 | 0 | No | Vitals | 7 |
The performance and the corresponding prediction window (in hours before onset) are reported only if the area under the receiver operating characteristic curve (AUROC) was reported in an early prediction setup. As these windows were highly heterogeneous, to achieve more comparability, we report the minimal hour before onset that was reported. Notably, due to heterogeneous sepsis definition implementations and experimental setups, these metrics likely have low comparability between studies, which is why we deemed a quantitative meta-analysis inappropriate.
AUROC, area under the ROC curve; CNN-LSTM, convolutional neural network long short-term memory; EHR, electronic health record; ICU, intensive care unit; LSTM, long short-term memory; MGP-RNN, multi-task Gaussian process recurrent neural network; MGP-TCN, multi-task Gaussian process temporal convolutional network; MIMIC, medical information mart for intensive care; MLH, Methodist Le Bonheur Healthcare System; MLP, multilayer perceptron; RMC, Rabin Medical Center; RNN-GRU, recurrent neural network with gated recurrent units; SepLSTM, proper name of an LSTM-based model for sepsis detection; SVM, support vector machine; UCSF, University of California San Francisco Health System.
Figure 2A boxplot of the sepsis prevalence distribution of all studies, with the median prevalence highlighted. Note that some studies subset their controls to balance the class ratios in order to facilitate the training of the machine learning model. Thus, the prevalence in the study cohort (i.e., the subset) can differ from the prevalence of the original data source (e.g., MIMIC-III).
An overview of experimental details: the sepsis definition used, the exact prediction task, and the type of temporal case–control alignment employed (if any).
| 1 | Abromavičius et al. | Online training, online evaluation | Sepsis-3 (with modified time windows) | – | – |
| 2 | Barton et al. | Offline training, horizon evaluation | Sepsis-3 | Random onset matching | Inpatients, age ≥18 years, at least one observation per measurement, prediction times between 7 and 2,000 h |
| 3 | Bloch et al. | Offline training, horizon evaluation | Sepsis-2 related: SIRS criteria plus diagnosis of infection | Random onset matching (at least 12 h after admission to the ICU) | Age >18 years, admitted to ICU; minimum stay of 12 h in the ICU; patients did not meet SIRS criteria at time of admission to the ICU; continuous documented measurements of vital signs were available for at least 12 h |
| 4 | Calvert et al. | Offline training, horizon evaluation | Sepsis-2 related: ICD-9 code 995.9 and a 5-h persisting window of fulfilled SIRS | – | Medical ICU, age >18 years, SIRS not fulfilled upon admission, measurements for set of nine variables available |
| 5 | Desautels et al. | Offline training, horizon evaluation, but retrained for each prediction horizon | Sepsis-3 | – | Age ≥15 years, any measurements present, Metavision logging, for cases: sepsis onset between 7 and 500 h after ICU admission, all variables measured at least once, excluded patients that received antibiotics before the ICU |
| 6 | Futoma et al. | Offline training, horizon evaluation | Sepsis-2 related: SIRS fulfilled and blood culture drawn and 1 abnormal vital (time windows not stated) | Relative onset matching | Entire EHR cohort included |
| 7 | Kaji et al. | Offline training, horizon evaluation | Sepsis-2 related: SIRS criteria plus ICD-9 code consistent with infection | Fixed length of 14 days in ICU (truncation if longer, zero filling and masking if shorter) | Individual patient ICU admissions of 2 days or longer were identified |
| 8 | Kam and Kim | Offline training, horizon evaluation | Sepsis-2 related: ICD-9 code 995.9 and the first 5-h persisting window of fulfilled SIRS | Insufficient detail: during training, 5-h windows are randomly extracted from cases before sepsis onset and from the entire control stay; during testing it is not stated which data are used for controls | Medical ICU, age >18 years, patient can be checked for 5-h SIRS window plus ICD-9 995.9 code (patients with only one of the two available were excluded) |
| 9 | Lauritsen et al. | Offline training, horizon evaluation | Sepsis-2 related: SIRS criteria plus clinically suspected infection | Random onset matching (excluding the first and last 3 h) | Inpatients, admissions ≥3 h, hospital departments with sepsis prevalence ≥2%, ≥1 observations for each vital sign measurement |
| 10 | Lukaszewski et al. | Offline training, offline evaluation (fixed 24-h horizon) | Sepsis-2 related: SIRS criteria plus positive microbiological culture | Insufficient detail (but age-matching between cases and controls; healthy volunteers used as controls) | Blood samples taken daily; last sample on day of diagnosis or last day of ICU stay |
| 11 | Mao et al. | Offline training, offline evaluation (single fixed 4-h horizon) | Sepsis-2 related (suspected infection and first hour of fulfilled SIRS criteria), Severe Sepsis: ICD-9 plus SIRS plus organ dysfunction criteria; Septic Shock: ICD-9 plus manually defined conditions | – | Inpatients, age ≥18 years, ≥1 observations for each vital sign measurement, prediction time between 7 and 2,000 h |
| 12 | McCoy and Das | Offline training, evaluation on retrospective dataset, prospective evaluation implemented as risk score | Sepsis-3, Severe Sepsis (SIRS criteria plus 2 organ dysfunction lab values) | – | Age >18 years; two or more SIRS criteria during stay (unclear whether the statement “Patient encounters were included in the sepsis-related outcome metrics if they met two or more SIRS criteria at some point during their stay.” is an inclusion criterion or the label definition) |
| 13 | Moor et al. | Offline training, horizon evaluation | Sepsis-3 | Absolute onset matching | Age ≥15 years, chart data including ICU admission/discharge time available, Metavision logging, cases: onset at least 7 h into ICU stay |
| 14 | Nemati et al. | Offline training, horizon evaluation | Sepsis-3 (with modified time windows) | – | Age ≥18 years; sepsis onset not earlier than 4 h after ICU admission |
| 15 | Reyna et al. | Online training, online evaluation | Sepsis-3 (with modified time windows) | – | ≥8 h of measurements |
| 16 | Schamoni et al. | Offline training, horizon evaluation as well as prediction of severity (ordinal regression) | Sepsis tag by ICU clinicians via electronic questionnaire | – | Sepsis onset not earlier than on the second day after ICU admission |
| 17 | Scherpf et al. | Offline training, horizon evaluation | Sepsis-2 related: ICD-9 codes plus SIRS criteria | Random onset matching via drawing fixed-size time windows | Age ≥18 years, at least one measurement for SIRS parameters, no sepsis on admission, at least 5 h plus prediction time of measurements |
| 18 | Shashikumar et al. | Offline training, offline prediction (single fixed 4-h horizon) | Sepsis-3 | – | – |
| 19 | Shashikumar et al. | Offline training, offline prediction (single fixed 4-h horizon) | Sepsis-3 | – | – |
| 20 | Sheetrit et al. | Offline training, horizon evaluation on two prediction windows (12 and 1 h) | Sepsis-2 related: ICD-9 codes 995.91 or 995.92 plus antibiotics administered; onset time is defined as the earliest of either antibiotics prescription or fulfilled qSOFA criteria | Insufficient detail: the paper uses the “equivalent time” as the feature window of the control group | ICU admission, age ≥15 years, for sepsis cases: onset not before the third day |
| 21 | van Wyk et al. | Offline training, horizon evaluation | Sepsis-2 related: SIRS criteria plus suspicion of infection, indicated by the presence of a blood culture and the administration of antibiotics during the encounter, along with relevant ICD-10 codes | Insufficient detail: the paper uses “a given 6-h observational period” for the control group | At least 8 h of continuous data, absence of cardiovascular disease |
| 22 | van Wyk et al. | Offline training, horizon evaluation | Sepsis-2 related: SIRS criteria plus suspicion of infection, indicated by the presence of a blood culture and the administration of antibiotics during the encounter, along with relevant ICD-10 codes | Insufficient detail: the paper uses “a given 3-h observational period” for the control group | Age >18 years, physiological data available for at least 3 or 6 h, respectively; absence of cardiovascular disease |
Abbreviations: EHR, electronic health record; ICD-9, International Classification of Disease Version 9; ICD-10, International Classification of Disease Version 10; ICU, intensive care unit; qSOFA, quick Sequential Organ Failure Assessment; SIRS, Systemic Inflammatory Response Syndrome.
Figure 3(A) Offline training scenario and case–control matching. Every case has a specific sepsis onset. Given a random control, there are multiple ways of determining a matched onset time: (i) relative refers to the relative time since intensive care unit (ICU) admission (here, 75% of the ICU stay); (ii) absolute refers to the absolute time since ICU admission; (iii) random refers to a pseudo-random time during the ICU stay, often with the requirement that the onset is not too close to ICU discharge. (B) Horizon evaluation scenario. Given a case and control, with a matched relative sepsis onset, the look-back horizon indicates how early a specific model is capable of predicting sepsis. As the (matched) sepsis onset is approached, this task typically becomes progressively easier. Notice the difference in the prediction targets (labels) for predicting a case vs. predicting a control.
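The three matching strategies in (A) can be sketched in code. This is an illustrative reconstruction, not taken from any of the reviewed studies; the function names and the `margin` parameter are assumptions, and all times are in hours since ICU admission.

```python
import random

def relative_onset(case_onset, case_los, control_los):
    # (i) Relative matching: place the control's pseudo-onset at the same
    # fraction of the ICU stay as the case's true onset (e.g., 75% of the stay).
    return (case_onset / case_los) * control_los

def absolute_onset(case_onset, control_los):
    # (ii) Absolute matching: use the same number of hours since ICU
    # admission, capped at the control's discharge time.
    return min(case_onset, control_los)

def random_onset(control_los, margin=2.0, rng=None):
    # (iii) Random matching: draw a pseudo-random onset, kept `margin` hours
    # away from discharge so the matched onset is not too close to it.
    rng = rng or random.Random()
    return rng.uniform(0.0, max(control_los - margin, 0.0))
```

For a case with onset 6 h into an 8 h stay and a control staying 12 h, relative matching yields a pseudo-onset at 9 h, whereas absolute matching yields 6 h; this difference is one reason the matching strategy must be reported explicitly.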
Figure 4Online training and evaluation scenario. Here, the model predicts at regular intervals during an ICU stay (we show predictions in 1-h intervals). For sepsis cases, there is no prima facie notion of the point in time at which positive predictions ought to be considered true positive (TP) or false positive (FP) predictions (mutatis mutandis, this applies to negative predictions). For illustrative purposes, we here consider positive predictions up until 1 h before or after sepsis onset (for a case) to be TP.
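Under the illustrative convention above (positives up to 1 h after onset still count as TP), the labeling of one stay's positive predictions might look like this sketch; the function name and `tolerance` parameter are assumptions, not from any reviewed study.

```python
def label_positive_predictions(pred_times, onset_time=None, tolerance=1.0):
    """Classify the positive predictions of one ICU stay as TP or FP.

    pred_times  -- hours since admission at which the model predicted sepsis
    onset_time  -- sepsis onset for cases, None for controls
    tolerance   -- positives up to `tolerance` hours after onset count as TP
    """
    labels = []
    for t in pred_times:
        if onset_time is not None and t <= onset_time + tolerance:
            labels.append("TP")  # fired before onset, or just after within tolerance
        else:
            labels.append("FP")  # control stay, or fired too long after onset
    return labels
```

Different tolerance choices change TP/FP counts and hence all downstream metrics, which is why online evaluation results are hard to compare across studies.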
Quality assessment of all studies.
| 1 | Abromavičius et al. | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | 50% |
| 2 | Barton et al. | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | 57% |
| 3 | Bloch et al. | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | 71% |
| 4 | Calvert et al. | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ | 43% |
| 5 | Desautels et al. | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | 50% |
| 6 | Futoma et al. | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ | 50% |
| 7 | Kaji et al. | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | 93% |
| 8 | Kam and Kim | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ | 36% |
| 9 | Lauritsen et al. | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | 57% |
| 10 | Lukaszewski et al. | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✓ | 43% |
| 11 | Mao et al. | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | 64% |
| 12 | McCoy and Das | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✓ | 36% |
| 13 | Moor et al. | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | 93% |
| 14 | Nemati et al. | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | 50% |
| 15 | Schamoni et al. | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | 57% |
| 16 | Scherpf et al. | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | 43% |
| 17 | Shashikumar et al. | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | 50% |
| 18 | Shashikumar et al. | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | 50% |
| 19 | Sheetrit et al. | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | 43% |
| 20 | van Wyk et al. | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✓ | 36% |
| 21 | van Wyk et al. | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✓ | 43% |
| 100% | 95% | 19% | 81% | 10% | 10% | 19% | 29% | 95% | 81% | 62% | 14% | 38% | 86% |
The 14 criteria are grouped into five categories (unmet need, reproducibility, stability, generalizability, and clinical significance); the last column gives each study's percentage of satisfied criteria, and the bottom row gives, per criterion, the percentage of studies satisfying it.
We excluded Reyna et al. from the quality assessment.
Figure 5A boxplot of the number of sepsis encounters reported by all studies, with the median number of encounters highlighted. Since the numbers span different orders of magnitude, we employed logarithmic scaling. The marks indicate which definition (or modification thereof) was used. Sepsis-3: squares; Sepsis-2: triangles; domain expert label: asterisk.
| Recommendation | Rationale | Examples |
| Make code publicly available or usable | A prerequisite for replicating the results of any study, or for using any model in a comparative setting, is access to the raw code (or a binary variant thereof) that was used to perform the experiments. Authors are encouraged to share their code, for example via platforms such as GitHub, or their binaries using container technologies like Docker. | GitHub, Docker |
| Use external validation for the machine learning model | External validation of a classifier is crucial for assessing the model's generalizability. Several publicly available data sources exist that can be used for this purpose. | MIMIC-II, MIMIC-III, eICU, HiRID |
| Provide exact definition of sepsis label | Implementations vary drastically in terms of prevalence and number of sepsis encounters. Thus, reporting the label generation process is essential, particularly when labels deviate from the international definitions of sepsis. For instance, when using the eICU dataset, microbiology measurements are under-reported for defining suspected infection, yet the exact modifications of sepsis implementations have not explicitly been stated. | Provide code showing how the sepsis label was determined. |
| Provide a detailed description of a control and, if applicable, its matched onset | While cases have a defined event time, it is much more challenging to determine at what time to extract data for a control, for whom no event occurred. For transparency and replication, it is crucial to provide details on how controls were defined and how their matched onsets were determined. | Provide code showing how a control was defined and, if applicable, how its matched onset was determined. |
| Make data available | If possible and in compliance with international data protection laws, data sources should be made accessible to bona fide researchers. There are multiple data repositories that researchers can use to make their data accessible while complying with data protection laws. | Harvard Dataverse, PhysioNet, Zenodo |
| Ensure comparability of models and their performances | To advance the field, it is important that researchers compare their models to existing ones in order to evaluate performance across different studies. This necessitates improvements in prevalence reporting as well as in the choice of performance metrics. | Report prevalence and AUPRC in addition to other metrics. |
| Use licenses for code | Licenses protect the creators and the users of code. Numerous open source licenses exist, making it possible to satisfy the constraints of most authors, including companies that want to protect their intellectual property. | Apache license, BSD licenses, GPL |
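The recommendation to report prevalence and AUPRC alongside AUROC can be illustrated with a small, dependency-free sketch; in practice one would likely use scikit-learn's `roc_auc_score` and `average_precision_score`, and all names below are illustrative.

```python
def auroc(y_true, y_score):
    # Probability that a random positive is ranked above a random negative
    # (ties count as 0.5); equivalent to the area under the ROC curve.
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auprc(y_true, y_score):
    # Average precision: mean of the precision values at each true positive,
    # taken in decreasing order of predicted score.
    order = sorted(range(len(y_true)), key=lambda i: -y_score[i])
    tp, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            ap += tp / rank
    return ap / sum(y_true)

def performance_report(y_true, y_score):
    # Report prevalence together with both ranking metrics, since a random
    # classifier's AUPRC equals the prevalence of the positive class.
    return {
        "prevalence": sum(y_true) / len(y_true),
        "auroc": auroc(y_true, y_score),
        "auprc": auprc(y_true, y_score),
    }
```

Because the AUPRC baseline equals the prevalence, reporting the two together makes results interpretable across cohorts with very different sepsis prevalences, such as those in Figure 2.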