Hong-Fei Deng1,2, Ming-Wei Sun3, Yu Wang1,2,3,4, Jun Zeng1,2,3,4, Ting Yuan1,2, Ting Li1,2, Di-Huan Li1,2, Wei Chen5, Ping Zhou6, Qi Wang7, Hua Jiang1,2,3,4.
Abstract
Machine learning studies of sepsis prediction have developed rapidly in recent years. In this review, we propose a set of new evaluation criteria and reporting standards and use them, following PRISMA, to assess the quality of 21 eligible machine learning models. Our assessment shows that (1) the definition of sepsis is not consistent among the studies; (2) data sources, data preprocessing methods, machine learning models, feature engineering, and inclusion types vary widely among the studies; (3) the closer the prediction is to the onset of sepsis, the higher the AUROC; (4) the improvement in AUROC is primarily due to using machine learning as a feature engineering tool; and (5) deep neural networks coupled with Sepsis-3 diagnostic criteria tend to yield better results on time series data collected from patients with sepsis. The new evaluation criteria and reporting standards will facilitate the development of improved machine learning models for clinical applications.
Keywords: Clinical medicine; Machine learning
Year: 2021 PMID: 35028534 PMCID: PMC8741489 DOI: 10.1016/j.isci.2021.103651
Source DB: PubMed Journal: iScience ISSN: 2589-0042
Figure 1. Literature screening flowchart
Basic information of the included studies
| Study | Sepsis definition | Target | Data sources | Missing data processing | Training data | Testing data | Validation data |
|---|---|---|---|---|---|---|---|
| | Sepsis-3 | Early prediction of sepsis | 49 urban community hospitals operated by Tenet Healthcare | NR | 1,839,503 | 920,026 | NR |
| | Sepsis-3 | Detection and early prediction of sepsis | UCSF data + BIDMC data | Carry-forward and replacement by mean | NR | NR | NR |
| | Infection + SIRS | Mortality prediction of sepsis | Four emergency departments | K-means | 4222 | NR | 1056 |
| | ICD-9 | Detection and early prediction of sepsis | MIMIC-II | Replacement by nearest measured value | 252 | 72 | 36 |
| | SIRS | Sepsis detection | UCSF data + BIDMC data | Carry-forward and replacement by mean | 80% | 20% | NR |
| | Clinical adjudication label | Early prediction of sepsis | Carle Foundation Hospital | NR | NR | NR | NR |
| | Angus | Early prediction of sepsis | MIMIC-III | Forward-filling | 81% | 10% | 9% |
| | SIRS + qSOFA | Mortality prediction of sepsis | Chang Gung Research Database | Replacement by the column median | 70% | 30% | NR |
| | Criteria of the Agency for Healthcare Research and Quality | Severe sepsis prediction | DECLARE data | Replacement by mean value | 70% | NR | 30% |
| | Infection + SIRS | Early prediction of sepsis | Israel Rabin Medical Center | NR | 75% | 25% | NR |
| | Infection + qSOFA | Mortality prediction of sepsis | Four hospitals in Korea | NR | 74% | 18% | 8% |
| | Sepsis-3 | Early prediction of sepsis | Two hospitals within the Emory Healthcare system and an ICU database | NR | 80% | 20% | NR |
| | Infection + SIRS | Early detection and prediction of sepsis | Data from four Danish municipalities | NR | 80% | 10% | 10% |
| | ICD-9 + SIRS | Early prediction of sepsis | MIMIC-III | Linear interpolation and "carry forward/backward" extrapolation | NR | NR | NR |
| | Sepsis-3 | Mortality prediction of sepsis | MIMIC-III v1.4 | Removal of variables with more than 20% of observations missing + multiple imputation | NR | NR | NR |
| | Sepsis-3 | Mortality prediction of sepsis | MIMIC-III | Removal of patients with more than 30% of predictor variables missing + replacement by mean value | NR | NR | NR |
| | SIRS + infection + end organ failure | Early detection of sepsis | ED of a quaternary academic hospital | NR | NR | NR | NR |
| | Infection + SIRS/SOFA | Mortality prediction of sepsis | ED at the Maastricht University Medical Center+ | NR | 1244 | NR | 100 |
| | ICD-9 | Mortality prediction of sepsis | MIMIC-III v1.4 | Removal of patients with more than 30% of data missing + replacement by mean value | NR | NR | NR |
| | SIRS | Early severe sepsis prediction | The Dascena Analysis Dataset and the Cabell Huntington Hospital Dataset | Last-one carry forward | NR | NR | NR |
| | Sepsis-3 | Mortality prediction of sepsis | MIMIC-III | Removal of patients with more than 40% of data missing + replacement by 21% and mean value | NR | NR | NR |
Abbreviations: SIRS: systemic inflammatory response syndrome; ICD-9: International Classification of Diseases, Ninth Revision; NR: not reported.
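Several rows of the table above list "carry-forward and replacement by mean" as the missing-data strategy. The following is a minimal Python sketch of one way the two steps could be combined for a single vital-sign series. The function name, the sample readings, and the choice to use the mean only for gaps before the first observation are illustrative assumptions, not details reported by the included studies.

```python
from statistics import mean
from typing import List, Optional, Sequence


def carry_forward_then_mean(values: Sequence[Optional[float]]) -> List[float]:
    """Fill each gap with the last observed value.

    Gaps that occur before the first observation have nothing to carry
    forward, so they are filled with the mean of all observed values
    (an assumption made for this sketch).
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("series has no observed values to impute from")
    fallback = mean(observed)

    filled: List[float] = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last if last is not None else fallback)
    return filled


# Hypothetical hourly heart-rate readings; None marks a missing measurement.
heart_rate = [None, 88.0, None, None, 96.0, None]
print(carry_forward_then_mean(heart_rate))
# [92.0, 88.0, 88.0, 88.0, 96.0, 96.0]
```

Carry-forward assumes a value stays valid until it is re-measured, which may hide deterioration between sparse measurements; this is one reason the reporting standards ask authors to state their preprocessing explicitly.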
Quality evaluation of the included studies
| Study | Inclusion criteria | Data preprocessed | Data source and collection | The source of the feature | Ethical issue | Detail discussion | Measurement of models' performance | Cross-validation/evaluation method |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | |
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
| Taylro, 2015 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
| 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | |
| 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | |
| 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | |
| 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | |
| 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | |
| 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | |
| 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | |
| 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | |
| 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | |
| 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | |
| 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | |
| 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
| 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
| 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | |
| 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 |
Annotation: The contents of have been tweaked to better fit machine learning research.
Prediction (AUROC) of each model at different hours in the sepsis studies
| Study | Model | Algorithm | Different hours | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| −48 | −24 | −12 | −10 | −8 | −6 | −5 | −4 | −3 | −2 | −1 | −0.25 | 0 | |||
| RoS | Gradient boosting | 0.97 | 0.93 | ||||||||||||
| MLA | Gradient boosted trees | 0.83 | 0.84 | 0.88 | |||||||||||
| SepLSTM | long short-term memory | 0.93 | 0.94 | 0.96 | 0.99 | ||||||||||
| SVM-RBF | SVM-RBF | 0.8141 | 0.8879 | 0.8807 | 0.8639 | 0.8675 | |||||||||
| Weibull-Cox proportional hazards | Weibull-Cox proportional hazards | 0.79 | 0.8 | 0.81 | 0.82 | ||||||||||
| CNN-LSTM | CNN-LSTM | 0.752 | 0.792 | 0.842 | 0.879 | ||||||||||
| RNN | RNN | 0.76 | 0.79 | 0.81 | |||||||||||
Abbreviations: RoS: Risk of Sepsis; MLA: machine learning algorithm; LSTM: long short-term memory; SVM-RBF: support vector machine with radial basis function; CNN-LSTM: convolutional neural network-long short-term memory; RNN: recurrent neural network.
Figure 2. Predictive performance at multiple time points, related to Table 3
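The AUROC values compared across prediction horizons can be read as the probability that a randomly chosen septic patient receives a higher risk score than a randomly chosen non-septic patient. The Python sketch below computes that quantity directly by pairwise comparison and evaluates it at two horizons. The labels, scores, and horizon keys are made-up illustrative values, not data from any included study.

```python
from typing import Dict, List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = developed sepsis, 0 = did not.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical risk scores produced 6 hours before onset and at onset.
scores_by_hour: Dict[int, List[float]] = {
    -6: [0.6, 0.4, 0.7, 0.5, 0.3, 0.2],
    0: [0.9, 0.8, 0.7, 0.5, 0.3, 0.2],
}
for hour, scores in scores_by_hour.items():
    print(hour, round(auroc(labels, scores), 3))
# -6 0.889
# 0 1.0
```

The pairwise loop is quadratic in the number of patients; it is written this way for clarity, and a rank-based formulation would be preferable on large cohorts.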
AUROC and time points of mortality prediction studies
| Study | Model | Algorithm | AUROC | Time |
|---|---|---|---|---|
| Taylor, 2015 | Logistic regression | Logistic regression | 0.755 | 28 days |
| | CART | Classification and regression tree | 0.693 | |
| | Random forest | Random forest | 0.860 | |
| | MEDS score | NR | 0.705 | |
| | CURB-65 score | NR | 0.734 | |
| | REMS score | NR | 0.717 | |
| | KNN | KNN | 0.84 | 28 days |
| | SoftMax | SoftMax | 0.88 | |
| | PCA + SoftMax | PCA + SoftMax | 0.91 | |
| | AE + SoftMax | AE + SoftMax | 0.90 | |
| | CNN + SoftMax | CNN + SoftMax | 0.92 | |
| | qSOFA scores | NR | 0.78 | 3 days |
| | qSOFA-based machine-learning models | Extreme gradient boosting, light gradient boosting machine, and random forest | 0.86 | |
| | XGBoost | Extreme gradient boosting | 0.857 | 30 days |
| | Logistic regression | Logistic regression | 0.819 | |
| | SAPS-II scores | Simplified acute physiology score-II | 0.797 | |
| | LASSO | Least absolute shrinkage and selection operator | 0.829 | In hospital |
| | RF | Random forest | 0.829 | |
| | GBM | Gradient boosting machine | 0.845 | |
| | LR | Logistic regression | 0.833 | |
| | SAPS II | Simplified acute physiology score-II | 0.77 | |
| | GBDT | GBDT | 0.992 | In hospital |
| | LR | Logistic regression | 0.876 | |
| | KNN | k-nearest neighbor | 0.877 | |
| | RF | Random forest | 0.980 | |
| | SVM | Support vector machine | 0.898 | |
| | XGBoost | Extreme gradient boosting | 0.848 | In hospital |
| | SAPSII | The simplified acute physiology score | 0.777 | |
| | SOFA | Sequential organ failure assessment score | 0.704 | |
| | SIRS | Systemic inflammatory response syndrome | 0.609 | |
| | qSOFA | Quick sequential organ failure assessment | 0.580 | |
Abbreviations: CART: classification and regression tree; MEDS: mortality in emergency department sepsis score; KNN: k-nearest neighbor; REMS: rapid emergency medicine score; CURB-65 score: confusion, urea nitrogen, respiratory rate, blood pressure, 65 years of age and older; PCA: principal component analysis; AE: autoencoder; CNN: convolutional neural network; qSOFA: quick Sequential Organ Failure Assessment.
Feature engineering and included features of each study
| Study | Number of initial features | Number of final features | Included features |
|---|---|---|---|
| | 217 | 13 | Lactic acid (max), Shock index age (last), WBC count (max), Lactic acid (change), Neutrophils (max), Glucose (max), Blood urea nitrogen (max), Shock index age (first), Respiratory rate (max), Albumin (last), Systolic blood pressure (min), Serum creatinine (max), Temperature (max) |
| | 6 | 6 | SpO2, heart rate, respiratory rate, temperature, systolic blood pressure, diastolic blood pressure |
| Taylor, 2015 | 566 | 20 | Oxygen saturation, Respiratory rate, Blood pressure, BUN, Albumin, Intubation, Procedures (in ED), Need for vasopressors, Age, RN resp care, RDW, Potassium, AST, Heart rate, Acuity level (triage), ED impression (Dx), CO2 (Lab), ECG performed, Beta-blocker (Home Med), Cardiac dysrhythmia (PMHx) |
| | 9 | 9 | systolic pressure, pulse pressure, heart rate, body temperature, respiratory rate, WBC count, pH, blood oxygen saturation, age |
| | 6 | 6 | systolic blood pressure, diastolic blood pressure, heart rate, respiratory rate, temperature, peripheral capillary oxygen saturation |
| | 31 | NR | TNF-α, IL-1β, GCSF, IL-6, PCT, sTREM1, IL18, MMP9, TNFR1, TNFR2, IP10, MCP1, IL-1ra, NA, CD64, WBC, Lactic Acid, Systolic Blood Pressure, Diastolic Blood Pressure, Pulse, Temperature, Respirations, PCO2, Age, Gender, Bilirubin, Glasgow Coma Scale, Creatinine, Platelet, SOFA score, qSOFA score |
| | 47 | 34 | White blood cell count, Heart rate, Diastolic blood pressure, Systolic blood pressure, Mean blood pressure, Weight, Anion gap, Bicarbonate, Oxygen saturation, Height, Temperature, pH |
| | 20 | 4 | the number of trend changes in respiratory rate and arterial pressure, the minimal change in respiratory rate, and the median change in heart rate |
| | 14 | NR | Age, sex, diagnoses at the ED, systolic blood pressure, respiration rate, mental status, body temperature, heart rate, arterial partial pressure of carbon dioxide, white blood cell count, duration of hospitalization, ICU admission, mechanical ventilation, mortality |
| | 65 | 65 | RRSTD, MAPSTD, HRV1, BPV1, HRV2, BPV2, MAP, HR, O2Sat, SBP, DBP, RESP, Temp, GCS, PaO2, FIO2, WBC, Hemoglobin, Hematocrit, Creatinine, Bilirubin and Bilirubin direct, Platelets, INR, PTT, AST, Alkaline Phosphatase, Lactate, Glucose, Potassium, Calcium, BUN, Phosphorus, Magnesium, Chloride, B-type BNP, Troponin, Fibrinogen, CRP, Sedimentation Rate, Ammonia, pH, pCO2, HCO3, Base Excess, SaO2, Care Unit (Surgical, Cardiac Care, or Neuro intensive care), Surgery in the past 12 h, Wound Class (clean, contaminated, dirty, or infected), Surgical Specialty (Cardiovascular, Neuro, Ortho-Spine, Oncology, Urology, etc.), Number of antibiotics in the past 12, 24, and 48 h, Age, CCI, Mechanical Ventilation, maximum change in SOFA score over the past 6 h |
| | 22 | 11 | urine output, lactate, BUN, sysbp, INR, age, cancer, SpO2, sodium, AG, creatinine |
Annotation: The studies of Tanejia2017 and YS2020 each established several models with different numbers of included features, so all features are listed. The Saqib et al. (2018) study reports only some of its features.
Abbreviations: WBC count: white blood cell count; BUN: blood urea nitrogen; RDW: red blood cell distribution width; AST: aspartate aminotransferase; ED: emergency department; ECG: electrocardiogram; SOFA: Sequential Organ Failure Assessment; qSOFA: quick Sequential Organ Failure Assessment; RRSTD: standard deviation of respiratory rate intervals; MAPSTD: standard deviation of mean arterial pressure; HRV1: average multiscale entropy of respiratory rate; BPV1: average multiscale entropy of mean arterial pressure; HRV2: average multiscale conditional entropy of respiratory rate; MAP: mean arterial blood pressure; HR: heart rate; O2Sat: oxygen saturation; SBP: systolic blood pressure; DBP: diastolic blood pressure; RESP: respiratory rate; Temp: temperature; GCS: Glasgow Coma Scale; PaO2: partial pressure of arterial oxygen; FIO2: fraction of inspired O2; INR: international normalized ratio; PTT: partial thromboplastin time; BNP: B-type natriuretic peptide; CCI: Charlson Comorbidity Index; sysbp: systolic blood pressure; AG: anion gap.
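Several of the features listed above are summary statistics of a time series, tagged (first), (last), (min), (max), or (change). The Python sketch below shows how such summaries might be derived from one patient's repeated measurements. The function name and lactate values are hypothetical, and defining "change" as last minus first is an assumption, since the table does not say how the included study computed it.

```python
from typing import Dict, Sequence


def summarize(series: Sequence[float]) -> Dict[str, float]:
    """Collapse a chronologically ordered series into summary features."""
    if not series:
        raise ValueError("series must contain at least one measurement")
    return {
        "first": series[0],
        "last": series[-1],
        "min": min(series),
        "max": max(series),
        # Assumption for this sketch: change = last value minus first value.
        "change": series[-1] - series[0],
    }


# Hypothetical lactate measurements (mmol/L) in chronological order.
lactate = [2.0, 2.5, 4.0, 3.0]
print(summarize(lactate))
# {'first': 2.0, 'last': 3.0, 'min': 2.0, 'max': 4.0, 'change': 1.0}
```

Reducing each variable to a handful of summaries is one way a study can go from hundreds of initial features to a short final list.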
Reporting standards checklist for machine learning in clinical medicine
| Section and topic | Item | Description |
|---|---|---|
| Title/Abstract/Keywords | 1 | Can be identified as machine learning predictive research (keywords such as machine learning, prediction). |
| Introduction | 2 | Introduce the background, existing problems, and study targets, such as evaluating machine learning models to predict prognosis and the probability of disease occurrence. |
| Method research subject | 3 | Inclusion and exclusion criteria, locations where data were collected, and time range. |
| | 4 | Describe the reasons for patient selection, including symptoms, laboratory results, or the disease gold standard. |
| | 5 | Describe the gold standard and provide references. |
| Research data | 6 | Describe whether the study is based on existing datasets (retrospective study) or newly collected data (prospective study). |
| | 7 | Describe the data collection process. |
| | 8 | Describe the feature engineering process; at a minimum, explain why this feature selection method was chosen. |
| Results Building model | 9 | Provide a flowchart of the inclusion and exclusion process; describe demographic and clinical characteristics (such as age, sex, height, and weight). |
| | 10 | Describe data preprocessing methods, including missing data processing and smoothing of sparse data. |
| | 11 | Describe the mathematical theory of the algorithm and its advantages. |
| | 12 | Describe the number and names of the features finally included. |
| Research results | 13 | Describe model performance at different time points (provide at least one evaluation indicator, such as AUROC or accuracy). |
| Discussion | 14 | Discuss the clinical generalizability of the predictive models, including heterogeneity and prospective clinical validation. |
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Studies' methodologies and AUROC of prediction | Contained in the article | N/A |
| MIMIC database | MIMIC database | |
A literature retrieval strategy for sepsis prediction
| Databases | Search strategy |
|---|---|
| PubMed | ((sepsis [Title/Abstract]) and (machine learning [Title/Abstract])) and (prediction [Title/Abstract]) |
| ScienceDirect | Title, abstract, keywords: sepsis, machine learning, predict |
| Engineering Index | (((sepsis) and (machine learning) and (prediction) and (mortality) and (onset)) WN KY) |
| Web of Science | Title:(sepsis) and Title:(machine learning) and Title:(prediction) |
| CNKI | ky = 'sepsis' and ky = 'machine learning' and ky= 'prediction' |
| WANFANG DATA | Title or keywords: “sepsis” and “machine learning” and “prediction” |
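The PubMed strategy in the table above can also be expressed as a request to the NCBI E-utilities `esearch` endpoint. The Python sketch below only builds the request URL and does not contact the server. The helper names and the `retmax` value are illustrative assumptions, and the review does not state that the authors queried PubMed programmatically.

```python
from typing import Sequence
from urllib.parse import parse_qs, urlencode, urlsplit

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def title_abstract_query(terms: Sequence[str]) -> str:
    """Join terms with AND, restricting each to the Title/Abstract field."""
    return " AND ".join(f"({t}[Title/Abstract])" for t in terms)


def esearch_url(terms: Sequence[str], retmax: int = 100) -> str:
    """Build an esearch URL for PubMed; retmax is an arbitrary default."""
    params = {
        "db": "pubmed",
        "term": title_abstract_query(terms),
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


url = esearch_url(["sepsis", "machine learning", "prediction"])
print(url)

# Decode the query string again to confirm the search term round-trips.
print(parse_qs(urlsplit(url).query)["term"][0])
```

Recording the exact query string in this form makes a literature search easier to rerun and audit later.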