Ke Li¹, Qinwen Shi¹, Siru Liu², Yilin Xie¹, Jialin Liu³,⁴.
Abstract
Sepsis is a leading cause of mortality in the intensive care unit (ICU). Early prediction of sepsis can reduce both the overall mortality rate and the cost of sepsis treatment. Some studies have used machine learning models to predict the development of sepsis and sepsis mortality; however, there is a gap between the development of machine learning algorithms and their implementation in clinical practice.

This study used data from the Medical Information Mart for Intensive Care III (MIMIC-III) database. We developed and compared 5 machine learning models for predicting the mortality of patients with sepsis: gradient boosting decision tree (GBDT), logistic regression (LR), k-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM).

A total of 3937 patients with sepsis from MIMIC-III were included, with a mortality rate of 34.3%. Among the 5 models, GBDT performed best, with the highest area under the receiver operating characteristic curve (AUROC, 0.992), precision (94.8%), recall (91.7%), accuracy (95.4%), and F1 score (0.933). The RF, SVM, and KNN models had higher AUROCs (0.980, 0.898, and 0.877, respectively) than LR (0.876), although the difference between KNN and LR was not statistically significant (P = 0.774).

The GBDT model outperformed the other machine learning models (LR, KNN, RF, and SVM) in predicting the mortality of patients with sepsis in the ICU and could be used to develop a clinical decision support system in the future.
Year: 2021 PMID: 34106618 PMCID: PMC8133100 DOI: 10.1097/MD.0000000000025813
Source DB: PubMed Journal: Medicine (Baltimore) ISSN: 0025-7974 Impact factor: 1.889
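The abstract names GBDT as the best-performing model but does not describe how it was configured. For orientation only, the sketch below implements generic gradient boosting for binary classification from scratch: each round fits a depth-1 regression tree (a stump) to the log-loss residuals y − p and sets each leaf to a single Newton step. The tree depth, learning rate, number of rounds, and the function names (`fit_gbdt`, `predict_proba`) are illustrative assumptions, not the authors' setup; in practice a library implementation such as scikit-learn's `GradientBoostingClassifier` would normally be used instead.

```python
import math
from typing import List, Optional, Tuple

Row = List[float]
Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def best_split(X: List[Row], r: List[float]) -> Optional[Tuple[int, float]]:
    # Least-squares stump: choose the (feature, threshold) whose two leaves fit the
    # residuals best, i.e. maximise sum_L^2 / n_L + sum_R^2 / n_R.
    best, best_score = None, -math.inf
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            score = sum(left) ** 2 / len(left) + sum(right) ** 2 / len(right)
            if score > best_score:
                best, best_score = (j, t), score
    return best


def newton_value(residuals: List[float], probs: List[float]) -> float:
    # One Newton step for log-loss in a leaf: sum(y - p) / sum(p * (1 - p)).
    den = sum(p * (1 - p) for p in probs)
    return sum(residuals) / den if den > 1e-12 else 0.0


def fit_gbdt(X: List[Row], y: List[int], n_rounds: int = 50, learning_rate: float = 0.3):
    # Start from the log-odds of the positive class (both classes must be present).
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1 - p0))
    F = [f0] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        probs = [sigmoid(f) for f in F]
        r = [yi - pi for yi, pi in zip(y, probs)]  # negative gradient of log-loss
        split = best_split(X, r)
        if split is None:  # every feature is constant: nothing left to split on
            break
        j, t = split
        left = [i for i, row in enumerate(X) if row[j] <= t]
        right = [i for i, row in enumerate(X) if row[j] > t]
        v_left = newton_value([r[i] for i in left], [probs[i] for i in left])
        v_right = newton_value([r[i] for i in right], [probs[i] for i in right])
        stumps.append((j, t, v_left, v_right))
        for i in left:
            F[i] += learning_rate * v_left
        for i in right:
            F[i] += learning_rate * v_right
    return f0, stumps, learning_rate


def predict_proba(model, x: Row) -> float:
    # Sum the shrunken stump outputs on top of the initial log-odds.
    f0, stumps, learning_rate = model
    f = f0 + sum(learning_rate * (vl if x[j] <= t else vr) for j, t, vl, vr in stumps)
    return sigmoid(f)


# Hypothetical toy data: one feature, label 1 = death, 0 = survival.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gbdt(X, y)
print(round(predict_proba(model, [1.5]), 3), round(predict_proba(model, [5.5]), 3))
```

On this toy data every round splits at x = 3.5, so new points below the split are pushed toward survival and points above it toward death; real GBDT models use deeper trees and many features.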
Figure 1. Flowchart of patient inclusion in the study.
Patient demographic information.
| Variable | Death (n = 1352) | Survival (n = 2585) | P value |
|---|---|---|---|
| Gender | | | |
| Female | 578 (42.8%) | 1147 (44.4%) | .344 |
| Male | 774 (57.2%) | 1438 (55.6%) | .344 |
| Age (y), mean ± SD | 68.9 ± 14.9 | 65.5 ± 16.7 | <.01 |
| Ethnicity | | | |
| Caucasian | 950 (70.3%) | 1894 (73.3%) | .047 |
| Hispanic | 37 (2.7%) | 90 (3.5%) | .218 |
| African American | 109 (8.1%) | 246 (9.5%) | .143 |
| Other | 256 (18.9%) | 355 (13.7%) | <.01 |
| ICU days, mean ± SD | 17.4 ± 18.1 | 18.2 ± 16.5 | .176 |
Death = in-hospital death of patients with sepsis; ICU = intensive care unit; SD = standard deviation.
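The table does not say which statistical tests produced its P values. Assuming a Pearson chi-square test for the gender rows and a two-sample comparison of means for age (common choices, but assumptions here), the values can be approximated from the table's own counts and summary statistics with the standard library alone. Without a continuity correction the gender comparison gives P ≈ 0.33, close to but not identical to the reported .344, so the authors' exact test may differ; the age comparison gives P far below .01, in line with the table. The helper names `chi_square_2x2` and `mean_difference_z` are illustrative.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    The 1-degree-of-freedom P value uses P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


def mean_difference_z(
    m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int
) -> Tuple[float, float]:
    """Welch-type statistic from summary data, with a normal approximation for P.

    With more than 1000 patients per group the t distribution is practically normal.
    """
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (m1 - m2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))


# Rows: female, male; columns: death, survival (counts from the table).
gender_stat, gender_p = chi_square_2x2(578, 1147, 774, 1438)
# Age: mean and SD in the death (n = 1352) and survival (n = 2585) groups.
age_z, age_p = mean_difference_z(68.9, 14.9, 1352, 65.5, 16.7, 2585)
print(f"gender: chi2 = {gender_stat:.3f}, P = {gender_p:.3f}")
print(f"age: z = {age_z:.2f}, P = {age_p:.2e}")
```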
Comparison of performance of the 5 models.
| Metric | LR | KNN | SVM | RF | GBDT |
|---|---|---|---|---|---|
| AUROC | 0.876 | 0.877 | 0.898 | 0.980 | 0.992 |
| Precision | 0.723 | 0.806 | 0.828 | 0.931 | 0.948 |
| Recall | 0.776 | 0.624 | 0.749 | 0.885 | 0.917 |
| Accuracy | 0.821 | 0.819 | 0.860 | 0.938 | 0.954 |
| F1 score | 0.715 | 0.702 | 0.780 | 0.907 | 0.933 |
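Precision, recall, accuracy, and F1 all derive from the confusion matrix of predicted versus observed outcomes; the sketch below treats death as the positive class, which is an assumption since the text does not state it. It computes the metrics from hypothetical counts and checks the identity F1 = 2PR/(P + R) against the reported GBDT values: precision 0.948 and recall 0.917 give F1 ≈ 0.932, consistent with the reported 0.933 once rounding of the inputs is allowed for. The text does not describe how the reported metrics were aggregated (for example, across cross-validation folds), so other columns need not satisfy the identity to the third decimal.

```python
from typing import Dict


def f1_from(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Precision, recall, accuracy and F1 from confusion-matrix counts (death = positive)."""
    precision = tp / (tp + fp)  # share of predicted deaths that were actual deaths
    recall = tp / (tp + fn)  # share of actual deaths that the model flagged
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1_from(precision, recall),
    }


# Hypothetical counts for 300 patients (not from the study).
print(classification_metrics(tp=90, fp=10, fn=30, tn=170))
# Reported GBDT precision and recall from the performance table.
print(round(f1_from(0.948, 0.917), 3))
```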
Comparison of AUROC and F1 among the different models.
| Model | AUROC | F1 | P (AUROC) | P (F1) |
|---|---|---|---|---|
| GBDT | 0.992 (0.989–0.994) | 0.933 (0.929–0.938) | <0.01 vs LR; <0.01 vs KNN; <0.01 vs RF; <0.01 vs SVM | <0.01 vs LR; <0.01 vs KNN; <0.01 vs RF; <0.01 vs SVM |
| LR | 0.876 (0.864–0.885) | 0.715 (0.704–0.723) | 0.774 vs KNN; <0.01 vs RF; 0.012 vs SVM | 0.354 vs KNN; <0.01 vs RF; <0.01 vs SVM |
| KNN | 0.877 (0.871–0.885) | 0.702 (0.665–0.730) | <0.01 vs RF; 0.010 vs SVM | <0.01 vs RF; <0.01 vs SVM |
| RF | 0.980 (0.978–0.984) | 0.907 (0.896–0.930) | <0.01 vs SVM | <0.01 vs SVM |
| SVM | 0.898 (0.880–0.914) | 0.780 (0.771–0.801) | – | – |
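The table reports intervals and pairwise P values without stating how they were computed. One common approach, shown here as an assumption rather than the authors' method, is a paired bootstrap: resample patients with replacement, score the same resampled patients with both models, and read a percentile interval and a two-sided P value off the distribution of AUROC differences. AUROC is computed directly as the probability that a randomly chosen death receives a higher score than a randomly chosen survivor. The scores are simulated, and `paired_bootstrap` with its defaults is illustrative.

```python
import random
from typing import List, Sequence, Tuple


def auroc(y: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap(
    y: Sequence[int],
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_boot: int = 1000,
    seed: int = 0,
) -> Tuple[float, float, float, float]:
    """Paired bootstrap of AUROC(A) - AUROC(B).

    Returns (observed difference, 2.5th percentile, 97.5th percentile, two-sided P).
    """
    rng = random.Random(seed)
    n = len(y)
    observed = auroc(y, scores_a) - auroc(y, scores_b)
    diffs: List[float] = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if 0 < sum(yb) < n:  # AUROC needs both classes in the resample
            a = auroc(yb, [scores_a[i] for i in idx])
            b = auroc(yb, [scores_b[i] for i in idx])
            diffs.append(a - b)
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    # Two-sided P: twice the smaller share of resampled differences on either side of 0.
    share_le = sum(d <= 0 for d in diffs) / n_boot
    share_ge = sum(d >= 0 for d in diffs) / n_boot
    p_value = min(1.0, 2 * min(share_le, share_ge))
    return observed, lower, upper, p_value


# Simulated scores (hypothetical): model A separates deaths from survivors better than B.
rng = random.Random(42)
y = [1] * 40 + [0] * 60
scores_a = [rng.gauss(1.5 if t else 0.0, 1.0) for t in y]
scores_b = [rng.gauss(0.5 if t else 0.0, 1.0) for t in y]
diff, lo, hi, p = paired_bootstrap(y, scores_a, scores_b, n_boot=300)
print(f"AUROC difference {diff:.3f} (95% CI {lo:.3f} to {hi:.3f}), P = {p:.3f}")
```

Resampling patients rather than scores keeps the two models' predictions paired, which matters here because all 5 models were evaluated on the same patients.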
Figure 2. Comparison of the ROC curves of the 5 models. ROC = receiver operating characteristic.
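The ROC curves themselves are not reproduced here. A ROC curve is traced by sweeping the decision threshold from the highest predicted score to the lowest and recording the false-positive rate (FPR) and true-positive rate (TPR) at each step; the area under the resulting curve, by the trapezoidal rule, is the AUROC reported in the tables. The labels and scores below are hypothetical, and `roc_points` and `trapezoid_auc` are illustrative names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false-positive rate, true-positive rate)


def roc_points(y: Sequence[int], scores: Sequence[float]) -> List[Point]:
    """Sweep the threshold from the highest score downward, one distinct score at a time."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    ranked = sorted(zip(scores, y), key=lambda pair: pair[0], reverse=True)
    points: List[Point] = [(0.0, 0.0)]  # threshold above every score: nobody flagged
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Cases tied at this score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points: List[Point]) -> float:
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels (1 = death) and predicted probabilities for six patients.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
print(curve)
print(round(trapezoid_auc(curve), 3))
```

For these six cases the area is 8/9 ≈ 0.889: of the 9 death–survival pairs, the death case is ranked higher in 8.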
Figure 3. Top 10 variables by importance in the GBDT model. GBDT = gradient boosting decision tree; GCS = Glasgow Coma Scale; max2 = maximum value within 48 hours of admission; mean1 = mean value within 24 hours of admission; mean2 = mean value within 48 hours of admission; min2 = minimum value within 48 hours of admission; PTT = partial thromboplastin time.
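The caption implies that each clinical variable was summarized over fixed windows after admission: the mean over the first 24 hours, and the mean, maximum, and minimum over the first 48 hours. A minimal sketch of that feature construction for one variable follows, assuming measurements arrive as (hours since admission, value) pairs and that window boundaries are inclusive; the function name `window_features`, feature keys such as `gcs_mean1`, and the sample GCS values are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

Measurement = Tuple[float, float]  # (hours since admission, value)


def window_features(measurements: List[Measurement], name: str) -> Dict[str, Optional[float]]:
    """Build mean1, mean2, max2 and min2 features for one variable.

    Assumption: a measurement belongs to the 24-h (48-h) window if it was taken
    0 <= t <= 24 (0 <= t <= 48) hours after admission; an empty window gives None.
    """
    first24 = [v for t, v in measurements if 0 <= t <= 24]
    first48 = [v for t, v in measurements if 0 <= t <= 48]
    return {
        f"{name}_mean1": mean(first24) if first24 else None,
        f"{name}_mean2": mean(first48) if first48 else None,
        f"{name}_max2": max(first48) if first48 else None,
        f"{name}_min2": min(first48) if first48 else None,
    }


# Hypothetical GCS charting for one patient: (hours since admission, score).
gcs = [(2.0, 9.0), (10.0, 11.0), (30.0, 13.0), (47.5, 14.0), (60.0, 15.0)]
print(window_features(gcs, "gcs"))
```

The reading at 60 hours falls outside both windows and is ignored, so a feature like `gcs_min2` reflects only the early course of the ICU stay.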