Melissa Y. Yan, Lise Tuset Gustad, Øystein Nytrø.
Abstract
OBJECTIVE: To determine the effects of using unstructured clinical text in machine learning (ML) for prediction, early detection, and identification of sepsis.
Keywords: electronic health records; machine learning; natural language processing; sepsis; systematic review
Year: 2022 PMID: 34897469 PMCID: PMC8800516 DOI: 10.1093/jamia/ocab236
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart for study selection.
Study characteristics
| Study (year) | Clinical setting and data source | Sample size | Cohort criteria / infection definition | Task and objective |
|---|---|---|---|---|
| Horng et al. | ED, Beth Israel Deaconess (Boston, MA, United States), Dec 17, 2008–Feb 17, 2013 | 230 936 patient visits. Infection: 32 103 P (14%); no infection: 198 833 P (86%). Train: 147 799 P (64%); validation: 46 187 P (20%); test: 36 950 P (16%) | Angus Sepsis ICD-9-CM abstraction criteria | Identify patients with suspected infection, to demonstrate the benefits of combining clinical text with structured data for detecting ED patients with suspected infection. |
| Apostolova and Velez | ICU, MIMIC-III, 2001–2012 | 634 369 nursing notes. Infection present: 186 158 N (29%); possible infection: 3262 N (1%); no infection: 448 211 N (70%). Train: 70%; test: 30% | Notes describing a patient taking or being prescribed antibiotics to treat infection | Identify notes with suspected or present infection, to develop a system for detecting infection signs and symptoms in free-text nursing notes. |
| Culliton et al. | Inpatient care, Baystate hospitals (Springfield, MA, United States), 2012–2016 | 203 000 adult inpatient admission encounters; 68 482 E used. Severe sepsis: 1427 E (2.1%). 3-fold cross-validation (text data only); model construction: 2012–2015 data; test set: 2016 data (13 603 E used; severe sepsis: 425 P, 3.1%) | Modified Baystate clinical definition of severe sepsis (8 structured variables) and severe sepsis ICD codes | Predict severe sepsis 4, 8, and 24 h before the earliest time structured variables meet the severe sepsis definition, to compare the accuracy of predicting which patients will meet the clinical definition of sepsis using unstructured data only, structured data only, or both. |
| Delahanty et al. | ED, Tenet Healthcare hospitals (Nashville, TN, United States), Jan 1, 2016–Oct 31, 2017 | 2 759 529 patient encounters. Sepsis: 54 661 E (2%); no sepsis: 2 704 868 E (98%). Train: 1 839 503 E (66.7%; sepsis: 36 458 E, 2%; no sepsis: 1 803 045 E, 98%). Test: 920 026 E (33.3%; sepsis: 18 203 E, 2%; no sepsis: 901 823 E, 98%) | Rhee's modified Sepsis-3 definition | Predict sepsis risk 1, 3, 6, 12, and 24 h after the first vital sign or laboratory result is recorded in the EHR, to develop a new sepsis screening tool comparable to benchmark screening tools. |
| Liu et al. | ICU, MIMIC-III, 2001–2012 | 38 645 adult patients. Train: 70% P; test: 30% P. Model applied to 15 930 P with suspected infection and at least 1 physiological EHR measurement | Sepsis-3 definition | Predict septic shock in sepsis patients before the earliest time septic shock criteria are met, to demonstrate an approach using NLP features for septic shock prediction. |
| Amrollahi et al. | ICU, MIMIC-III, 2001–2012 | 40 175 adult patients. Sepsis: 2805 P (∼7%). Train: 80% P; test: 20% P | Sepsis-3 definition | Predict sepsis onset hours in advance using a deep learning approach, to show that a pre-trained neural language representation model can improve early sepsis detection. |
| Hammoud et al. | ICU, MIMIC-II, 2001–2007 | 17 763 patients. Sepsis: 6097 P; severe sepsis: 3962 P; septic shock: 1469 P. 5-fold cross-validation | Sepsis definition based on Henry et al | Predict early septic shock in ICU patients using a model that can be optimized based on user preference or performance metrics. |
| Goh et al. | ICU, Singapore government-based hospital (Singapore), Apr 2, 2015–Dec 31, 2017 | 5317 patients (114 602 notes). Train and validation: 3722 P (80 162 N; sepsis: 6.45%; no sepsis: 93.55%). Test: 1595 P (34 440 N; sepsis: 5.45%; no sepsis: 94.55%) | ICU admission with an ICD-10 code for sepsis, severe sepsis, or septic shock | Identify whether a patient has sepsis at consultation time, or predict sepsis 4, 6, 12, 24, and 48 h after consultation, to develop an algorithm that uses structured and unstructured data to diagnose and predict sepsis. |
| Qin et al. | ICU, MIMIC-III, 2001–2012 | 49 168 patients. Train: 33 434 P (sepsis: 1353 P; no sepsis: 32 081 P). Validation: 8358 P (sepsis: 338 P; no sepsis: 8020 P). Test: 7376 P (sepsis: 229 P; no sepsis: 7077 P) | PhysioNet Challenge restrictive Sepsis-3 definition | Predict whether a patient will develop sepsis, to explore how numerical and textual features can be used to build a predictive model for early sepsis prediction. |
ED: emergency department; ICU: intensive care unit; ICD: International Classification of Diseases; ICD-9-CM: ICD, 9th revision, Clinical Modification; ICD-10: ICD, 10th revision; MIMIC-II: Multiparameter Intelligent Monitoring in Intensive Care II database; MIMIC-III: Medical Information Mart for Intensive Care dataset.
Sample size unit abbreviations: P: patients; N: notes; E: encounters.
Clinical documentation from electronic health records
| Documentation types | Author | Description | Temporal perspective | Record latency | Frequency |
|---|---|---|---|---|---|
| Chief complaints | Physician, Nurse, Specialist | Symptoms or complaints provided by a patient at the start of care, explaining why they are seeking care. | Current | Seconds to days | One per episode |
| History-and-physical notes | Physician, Nurse | Past medical history, family history, developmental history of present illness, problems related to present illness, past medications or immunizations, allergies, or habits. | Retrospective | Immediately | One per episode |
| Progress notes | Physician, Nurse, Specialist (eg, respiratory therapist) | Observations of patient status and care provided, to document progress and response to treatment plans. For physicians, this includes determining diagnosis, prescriptions, and laboratory orders. | Retrospective, Prospective | 4–8 h | One per shift |
| Reports | Specialist | Radiology results and cardiology results. | Retrospective | Days | One to many per episode |
| Discharge summary notes | Health care personnel | Episode-of-care summary and follow-up plans. | Retrospective, Prospective | At discharge or days after | One per episode |
| Discharge summary letter | Physician | Formal required letter containing follow-up treatment plans. | Retrospective, Prospective | Days to months after episode | One per episode |
| Laboratory results | Laboratory technician | Laboratory test analysis results from provided samples (eg, blood, urine, skin, and device) based on the physician's order. | Retrospective | Days | One to many per episode |
| ICD codes | Physician, Professional ICD coder, ICD data aggregator organization | Diagnosis classification for billing. | Retrospective | Days to months | One per episode |
| Administrative | Administration | Patient information such as name, age, gender, address, contact information, and occupation. | Retrospective, Current | Immediately | One per episode |
Record latency is defined as the time between a measurement or observation and the availability of the results in the electronic health record.
Figure 2. Overview of data from a patient timeline used to create models. There is typically a delay between events in a patient's actual state and the documentation recorded in the electronic health record. Green represents patient states as sepsis develops in a patient. Yellow represents observations made by clinicians. Documentation includes ICU vital signs (a) in pink, narrative notes in blue, and ICD codes in orange. ICU vital sign (a) documentation can be instantaneous, narrative notes can be written after observations are made, and ICD codes are typically registered after a patient is discharged. PIVC: peripheral intravenous catheter. (a) Vital signs include temperature, pulse, blood pressure, respiratory rate, oxygen saturation, and level of consciousness and awareness.
Figure 3. Different types of windows were used to obtain longitudinal data. Each gray box represents a single window, which can vary in duration (length of time) depending on the study. One window with the whole encounter means the study used a single window containing data spanning the whole encounter, from admittance until discharge. One window before onset signifies data from a window covering a duration of time before sepsis, severe sepsis, or septic shock onset. Sliding windows are consecutive windows leading up to sepsis, severe sepsis, or septic shock onset; these include non-overlapping and overlapping sliding windows. Non-overlapping sliding windows indicate that data within one fixed-duration window do not appear in the next window. In contrast, overlapping sliding windows indicate that fixed-duration windows overlap, so data within one window will partially appear in the next window.
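The window types above differ only in how far each window advances relative to its duration. A minimal sketch (hypothetical helper, with times expressed as plain numbers of hours rather than any study's actual timestamps): a step equal to the duration yields non-overlapping windows, while a step smaller than the duration yields overlapping ones.

```python
def sliding_windows(start, end, duration, step):
    """Yield (window_start, window_end) pairs between start and end.

    step == duration gives non-overlapping sliding windows;
    step < duration gives overlapping sliding windows.
    """
    t = start
    while t + duration <= end:
        yield (t, t + duration)
        t += step

# Non-overlapping: 4-h windows advancing 4 h at a time.
non_overlapping = list(sliding_windows(0, 12, duration=4, step=4))
# Overlapping: 4-h windows advancing 2 h at a time.
overlapping = list(sliding_windows(0, 12, duration=4, step=2))
```

Here `non_overlapping` is `[(0, 4), (4, 8), (8, 12)]`, with no shared data between windows, whereas `overlapping` is `[(0, 4), (2, 6), (4, 8), (6, 10), (8, 12)]`, with each window sharing 2 h of data with the next.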
Text used in studies
| Study (year) | Free-text document type | Unit of analysis | Text processing |
|---|---|---|---|
| Horng et al. | ED chief complaints; nursing triage assessments | One note | Representation: bi-gram BoW (15 240-word vocabulary); LDA topic modeling (500 topics). Techniques: convert to lowercase; remove rare tokens and punctuation; negation |
| Apostolova and Velez (2017) | Nursing notes | One note | Representation: BoW; CBOW (200-dimensional vectors with window size 7; 441-term vocabulary of antibiotic usage and rules for negation and speculation); tf-idf; PV (600-dimensional document-level vectors). Techniques: convert to lowercase; remove frequent tokens and non-alphanumeric characters; negation |
| Culliton et al. | Clinical notes (mostly progress notes and history-and-physical notes) | One patient encounter | Representation: GloVe (300-dimensional vectors), summing word vectors. Techniques: concatenate all notes for an encounter into a single text block |
| Delahanty et al. | ED chief complaints | Keywords | Other: keywords extracted by experts |
| Liu et al. | All MIMIC-III clinical notes, such as but not limited to nursing notes and physician notes | One note | Representation: BoW (8907 unique-term vocabulary and 832 predictive terms); GloVe (300-dimensional vector for each unique term). Techniques: convert to lowercase; remove rare tokens, frequent tokens, and non-alphanumeric characters |
| Amrollahi et al. | Nursing notes; physician notes | One note | Representation: tf-idf (2227 features = 2187 text features + 40 structured features); ClinicalBERT (808 features = 768 text features + 40 structured features). Techniques: remove rare tokens, frequent tokens, stop words, dates, and special characters |
| Hammoud et al. | All MIMIC-II notes except discharge summaries, such as but not limited to nursing progress notes and respiratory therapist progress notes | One note | Representation: BoW; tf-idf. Techniques: remove rare and frequent tokens |
| Goh et al. | Physician notes: admission notes, progress notes, ICU consultations, pharmacy notes, allied health notes | One note | Representation: tf-idf; LDA topic modeling (100 topics). Techniques: remove rare tokens, punctuation, and stop words; lemmatization; POS tagging; manual classification of topics into categories |
| Qin et al. | Nursing notes; physician notes; radiology notes; respiratory notes | Many notes | Representation: tf-idf (1000-dimensional vectors = 1000 most common terms); ClinicalBERT (768 features). Techniques: random-order concatenation of all clinical notes within the hour under consideration; named entity recognition |
BoW: Bag-of-words; CBOW: Continuous bag-of-words; ClinicalBERT: Clinical Bidirectional Encoder Representations from Transformers; ED: emergency department; GloVe: Global Vectors for Word Representation; ICU: intensive care unit; LDA: Latent Dirichlet Allocation; POS tagging: Part-of-speech tagging; PV: paragraph vectors; tf-idf: term frequency-inverse document frequency.
Representation and technique details for Qin et al were provided through personal communications (with Fred Qin on September 7, 2021).
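Several of the studies above list negation handling among their text-processing techniques, since a mention such as "denies fever" should not count as evidence of infection. A toy rule-based sketch, in the spirit of trigger-based approaches like NegEx (the trigger list and helper are illustrative assumptions, not any study's implementation):

```python
# Hypothetical negation triggers; real systems use much larger curated lists.
NEGATION_TRIGGERS = ("no ", "denies ", "without ", "negative for ")

def is_negated(sentence, term):
    """Return True if `term` appears after a negation trigger in `sentence`."""
    s = sentence.lower()
    idx = s.find(term.lower())
    if idx == -1:
        return False  # term not mentioned at all
    prefix = s[:idx]
    return any(trigger in prefix for trigger in NEGATION_TRIGGERS)
```

For example, `is_negated("Patient denies fever or chills", "fever")` is `True`, while `is_negated("Patient has a fever", "fever")` is `False`.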
Figure 4. The unit of analysis used to train machine learning models in the included studies was either (1) a single note, (2) a set of many notes, or (3) keywords. In general, text was preprocessed and represented as features interpretable by a computer, structured data were then added, and the combined data were used to fit machine learning models.
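The general pipeline in Figure 4 can be sketched as follows. This is a minimal illustration under stated assumptions: the notes, labels, and structured values are invented toy data, and scikit-learn's tf-idf vectorizer and logistic regression stand in for the various study-specific representations and models.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy data: one note per patient with a binary sepsis label (hypothetical).
notes = [
    "fever and hypotension, suspected infection",
    "routine follow-up, no acute complaints",
    "rising lactate, started broad-spectrum antibiotics",
    "stable vitals, discharged home",
]
labels = [1, 0, 1, 0]

# Step 1: preprocess text and represent it as computer-interpretable features.
vectorizer = TfidfVectorizer(lowercase=True)
X_text = vectorizer.fit_transform(notes).toarray()

# Step 2: append structured data (here, invented temperature and systolic BP).
structured = np.array([[38.9, 85], [36.8, 120], [39.2, 80], [36.6, 118]])
X = np.hstack([X_text, structured])

# Step 3: fit a machine learning model on the combined feature matrix.
model = LogisticRegression(max_iter=1000).fit(X, labels)
```

The same three-step shape applies whether the unit of analysis is a single note, a concatenation of many notes, or extracted keywords; only what enters the vectorizer changes.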
Figure 5. Overview of area under the curve (AUC) values for identification or early detection of infection, sepsis, septic shock, and severe sepsis using different data types (structured data and text, structured data only, and text only). Each panel shows the study and year, machine learning model (a), and natural language processing technique (b). (A) AUC values for infection identification. Horng et al 2017: SVM (BoW) has 2 AUC values: 0.86 when using chief complaints and nursing notes and 0.83 when using only chief complaints. (B) AUC values for early sepsis detection. Amrollahi et al AUC values are from detecting 4 h before sepsis onset, and Qin et al AUC values are the average from detecting 0 to 6 h before sepsis onset. (C) AUC values for early septic shock detection. Hammoud et al AUC values are from detecting 30.64 h before septic shock onset, and Liu et al AUC values are from detecting 6.0 to 7.3 h before septic shock onset. (D) AUC values for early sepsis, severe sepsis, or septic shock detection and sepsis identification in Goh et al. Different symbols separate data types. (E) AUC values for early severe sepsis detection for Culliton et al using results from the test set. (F) AUC values for early severe sepsis detection for Culliton et al using results from 3-fold validation. Disclaimer: AUC values should not be directly compared between studies or between the panels for infection, sepsis, severe sepsis, and septic shock. Additionally, the lines connecting points do not indicate AUC values changing over time (Figure 5D and 5F); lines only visually separate the different methods.
(a) Machine learning models: dag: dagging (partition data into disjoint subgroups); GBT: gradient boosted trees; GRU: gated recurrent unit; LSTM: long short-term memory; NB: Naïve Bayes; RF: random forest; SVM: support vector machines.
(b) Natural language processing techniques: BoW: Bag-of-words; ClinicalBERT: Clinical Bidirectional Encoder Representations from Transformers; ClinicalBERT-m: ClinicalBERT from merging all textual features to get embeddings; ClinicalBERT-sf: finetuned ClinicalBERT from concatenating individual embeddings of each textual feature; CM: Amazon Comprehend Medical service for named entity recognition; GloVe: Global Vectors for Word Representation; LDA: Latent Dirichlet Allocation; tf-idf: term frequency-inverse document frequency.
Study outcome overview of best and worst area under the curve values
| Study (year) | Hours | Data types (DVLMC) | Text | Models | AUC |
|---|---|---|---|---|---|
| Horng et al. | Identify | DV- - - | CC + NN | RF (BoW) | 0.87 |
| | | DV- - - | – | NB | 0.65 |
| Apostolova and Velez | Identify | - - - - - | NN | SVM (BoW + tf-idf) | – |
| | | - - - - - | NN | Logistic regression + KNN + SVM (PV) | – |
| Culliton et al. | −4 | - - - - - | CN | Ridge regression (GloVe) | 0.64 |
| | −8 | - - - - - | CN | Ridge regression (GloVe) | 0.66 |
| | −24 | - - - - - | CN | Ridge regression (GloVe) | 0.73 |
| | −24 | -V- -C | CN | Ridge regression (GloVe) | 0.85 |
| | | -V- -C | – | Ridge regression (GloVe) | 0.80 |
| Delahanty et al. | +1 | -VL- - | – | GBT | 0.93 |
| | +3 | -VL- - | – | GBT | 0.95 |
| | +6 | -VL- - | – | GBT | 0.96 |
| | +12 | -VL- - | – | GBT | 0.97 |
| | +24 | -VL- - | – | GBT | 0.97 |
| Liu et al. | −7 | -VLM- | CN | GRU (GloVe) | 0.92 |
| | −7.3 | -VLM- | CN | GBT (BoW) | 0.91 |
| | −6 | -VLM- | – | GBT | 0.85 |
| Amrollahi et al. | −4 | -VL- - | PN + NN | LSTM (ClinicalBERT) | 0.84 |
| | | - - - - - | PN + NN | LSTM (ClinicalBERT) | 0.74 |
| Hammoud et al. | −30.6 | DVL- - | CN | Lasso regression (BoW + tf-idf) | 0.89 |
| Goh et al. | Identify | DVLM- | PN | Logistic regression + RF (LDA) | 0.94 |
| | | DVLM- | PN | dag + Logistic regression (LDA) | 0.92 |
| | −4 | DVLM- | – | Logistic regression + RF | 0.93 |
| | | DVLM- | PN | dag + Logistic regression (LDA) | 0.85 |
| | −6 | DVLM- | PN | Logistic regression + RF (LDA) | 0.92 |
| | | DVLM- | PN | dag + Logistic regression (LDA) | 0.89 |
| | −12 | DVLM- | PN | Logistic regression + RF (LDA) | 0.94 |
| | | DVLM- | – | Logistic regression + RF | 0.79 |
| | −24 | DVLM- | PN | Logistic regression + RF (LDA) | 0.90 |
| | | DVLM- | – | Logistic regression + RF | 0.78 |
| | −48 | DVLM- | PN | Logistic regression + RF (LDA) | 0.87 |
| | | DVLM- | – | Logistic regression + RF | 0.77 |
| Qin et al. | −6 to 0 | -VL- - | CN | GBT (ClinicalBERT-sf) | 0.89 |
| | | -VL- - | – | GBT (ClinicalBERT-m) | 0.86 |
Hours: Identify: identification without an hours-before or hours-after offset; −: hours before an event; +: hours after an event.
Data types: D: demographics; V: vitals; L: laboratory; M: medications; C: codes; T: text; a dash's position in DVLMC indicates which data type was not used.
Text data types: CC: chief complaints; CN: various types of clinical notes; NN: nursing notes; PN: physician notes; –: no notes.
Machine learning models: dag: dagging (partition data into disjoint subgroups); GBT: gradient boosted trees; GRU: gated recurrent unit; KNN: K-nearest neighbors; LSTM: long short-term memory; NB: Naïve Bayes; RF: random forest; SVM: support vector machines.
Natural language processing (NLP) techniques: BoW: Bag-of-words; ClinicalBERT: Clinical Bidirectional Encoder Representations from Transformers; ClinicalBERT-m: ClinicalBERT from merging all textual features to get embeddings; ClinicalBERT-sf: finetuned ClinicalBERT from concatenating individual embeddings of each textual feature; GloVe: Global Vectors for Word Representation; LDA: Latent Dirichlet Allocation; PV: paragraph vectors; tf-idf: term frequency-inverse document frequency.
AUC: area under the curve. Apostolova and Velez did not report AUC values.
Culliton et al performed 2 experiments; these results are from the test set rather than 3-fold validation.
Number of hours before onset for Amrollahi et al was confirmed through personal communications (with Shamim Nemati on May 27, 2021 and Fatemeh Amrollahi on June 13, 2021).
Qin et al AUC values are an average from 0 to 6 h before sepsis, not the specified hours.
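The AUC values tabulated above summarize how well each model's risk scores rank positive cases above negative ones. A minimal sketch of the computation (the scores and labels are invented, and scikit-learn's `roc_auc_score` stands in for whatever implementation each study used):

```python
from sklearn.metrics import roc_auc_score

# Hypothetical predicted sepsis risk scores and true labels for 8 encounters.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_score = [0.1, 0.3, 0.2, 0.4, 0.8, 0.6, 0.9, 0.7]

auc = roc_auc_score(y_true, y_score)
```

With no tied scores, the AUC equals the fraction of (positive, negative) pairs in which the positive case receives the higher score, which is why values are comparable within a study's own test set but, as the disclaimer above notes, not directly between studies with different cohorts and prevalences.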