Nianzong Hou1, Mingzhe Li2, Lu He3, Bing Xie1, Lin Wang4, Rumin Zhang4, Yong Yu4, Xiaodong Sun5, Zhengsheng Pan6, Kai Wang7.
Abstract
BACKGROUND: Sepsis is a significant cause of in-hospital mortality, especially in ICU patients. Early prediction of sepsis is essential, as prompt and appropriate treatment can improve survival outcomes. Machine learning methods are flexible prediction algorithms with potential advantages over conventional regression and scoring systems. The aims of this study were to develop a machine learning approach using XGboost to predict 30-day mortality for MIMIC-III patients with Sepsis-3 and to determine whether such a model performs better than traditional prediction models.
Keywords: Logistic regression; MIMIC-III; Machine learning; SAPS-II score; Sepsis-3; Xgboost
Year: 2020 PMID: 33287854 PMCID: PMC7720497 DOI: 10.1186/s12967-020-02620-5
Source DB: PubMed Journal: J Transl Med ISSN: 1479-5876 Impact factor: 5.531
Fig. 1 The detailed process of data extraction
Baseline characteristics, vital signs, laboratory parameters and statistical results of MIMIC-III patients with sepsis
| Variable | Death within 30 days | Survival within 30 days | p value |
|---|---|---|---|
| Number (sample size) | 889 | 3670 | |
| Baseline variables and in-hospital factors | |||
| Age (years), mean (SD) | 71.42 ± 15.93 | 63.61 ± 17.73 | 8.95E−36 |
| Sex (%) | |||
| Female | 413 | 1609 | |
| Male | 476 | 2061 | 0.1706 |
| Ethnicity (%) | |||
| White | 617 (69.4%) | 2642 (72.1%) | |
| Black | 66 (7.4%) | 338 (9.2%) | |
| Yellow | 34 (3.8%) | 145 (4%) | |
| Others | 172 (19.3%) | 545 (14.9%) | 0.005914 |
| Weight (kg), mean (SD) | 76.91 ± 21.31 | 82.69 ± 28.61 | |
| Height (cm), mean (SD) | 168.27 ± 11.66 | 169.51 ± 10.73 | |
| BMI (kg/m²), mean (SD) | 27.82 ± 7.65 | 29.25 ± 8.91 | |
| Length of stay in hospital, days, mean (SD) | 7.04 ± 6.3 | 10.99 ± 10.4 | |
| Length of stay in the ICU, days, mean (SD) | 4.87 ± 4.99 | 4.66 ± 6.31 | |
| Admission type | |||
| MED | 580 (65.2%) | 1912 (52.1%) | |
| CMED | 86 (9.7%) | 477 (13.0%) | |
| Others | 223 (25.1%) | 1282 (34.9%) | 1.36E−11 |
| Vital signs | |||
| Heartrate_min (times/min), mean (SD) | 72.92 ± 19.16 | 73.08 ± 15.72 | 0.826205355 |
| Heartrate_mean (times/min), mean (SD) | 91.12 ± 18.27 | 87.81 ± 16.15 | 0.00000579 |
| Sysbp_min (mmHg), mean (SD) | 80.46 ± 20.01 | 90.69 ± 16.18 | 6.49E−36 |
| Diasbp_mean (mmHg), mean (SD) | 58.65 ± 10.42 | 61.7 ± 10.08 | 6.62E−13 |
| Meanbp_min (mmHg), mean (SD) | 48.41 ± 15.65 | 56.24 ± 13.89 | 6.74E−34 |
| Resprate_mean (times/min), mean (SD) | 21.73 ± 4.68 | 19.54 ± 4.06 | 3.66E−30 |
| Tempc_min (℃), mean (SD) | 35.77 ± 1.17 | 36.15 ± 0.86 | 1.78E−16 |
| Tempc_max (℃), mean (SD) | 37.34 ± 1.17 | 37.65 ± 0.85 | 3.37E−11 |
| Spo2_mean (%), mean (SD) | 96.03 ± 4.02 | 97.1 ± 1.96 | 3.06E−12 |
| Laboratory parameters | |||
| Aniongap_max (mEq/L), mean (SD) | 19.12 ± 6.26 | 16.17 ± 4.65 | 1.45E−31 |
| Aniongap_min (mEq/L), mean (SD) | 14.58 ± 4.66 | 12.45 ± 3.08 | 1.48E−30 |
| Creatinine_min (mg/dL), mean (SD) | 1.65 ± 1.23 | 1.35 ± 1.39 | 3.89E−09 |
| Chloride_min (mmol/L), mean (SD) | 101.67 ± 7.65 | 101.93 ± 6.69 | 0.39868745 |
| Hemoglobin_min (g/dL), mean (SD) | 9.84 ± 2.21 | 10.08 ± 2.08 | 0.009534544 |
| Hemoglobin_max (g/dL), mean (SD) | 11.77 ± 2.26 | 12 ± 2.09 | 0.012634047 |
| Lactate_min (mmol/L), mean (SD) | 2.36 ± 2.07 | 1.55 ± 0.81 | 4.26E−24 |
| Platelet_min (10⁹/L), mean (SD) | 189.73 ± 125.29 | 195.98 ± 108.29 | 0.207608419 |
| Potassium_min (mmol/L), mean (SD) | 3.84 ± 0.7 | 3.71 ± 0.54 | 0.00000328 |
| Sodium_min (mmol/L), mean (SD) | 136.27 ± 6.62 | 136.08 ± 5.35 | 0.454879474 |
| Sodium_max (mmol/L), mean (SD) | 141.28 ± 6.8 | 140.51 ± 5.03 | 0.003570629 |
| Bun_min (mmol/L), mean (SD) | 36.09 ± 25.43 | 24.22 ± 19.69 | 8.45E−31 |
| Bun_max (mmol/L), mean (SD) | 42.88 ± 28.49 | 30.23 ± 24.39 | 1.12E−27 |
| Wbc_min (10⁹/L), mean (SD) | 12.54 ± 12.22 | 10.41 ± 6.55 | 4.81E−06 |
| Wbc_max (10⁹/L), mean (SD) | 17.54 ± 19.99 | 14.8 ± 9.87 | 0.000293182 |
| Inr_max, mean (SD) | 2.12 ± 1.79 | 1.61 ± 1.34 | 2.89E−14 |
| Urine output | 1225.29 ± 1307.53 | 1993.04 ± 1551.57 | 2.82E−48 |
| Score system | |||
| SOFA | 8.02 ± 4.33 | 5.22 ± 2.85 | 2.74E−55 |
| qSOFA | 2.15 ± 0.64 | 1.9 ± 0.69 | 4.60E−21 |
| SAPS II | 54.67 ± 16.37 | 37.51 ± 13.22 | 2.2E−16 |
| Advanced life support | |||
| Mechanical ventilation | 531 (59.69%) | 1668 (45.55%) | |
| Renal replacement therapy | 53 (5.95%) | 135 (3.61%) | 0.2503 |
| Accompanied diseases (comorbidity) | |||
| Diabetes | 246 (27.64%) | 2631 (71.70%) | |
| Malignant tumour | 116 (13.05%) | 160 (4.37%) | |
| Others | 527 (59.31%) | 879 (23.93%) | 2.20E−16 |
| Common sources of infection | |||
| Blood culture | 418 (47%) | 1343 (36.6%) | |
| MRSA screen | 267 (30%) | 1384 (37.72%) | |
| Urine | 151 (17%) | 642 (17.5%) | |
| Swab | 18 (2%) | 70 (1.9%) | |
| Others | 35 (4%) | 231 (6.3%) | 7.82E−08 |
| Outcome | |||
| 30-day mortality | 19.50% | 80.50% | |
MED medical (general service for internal medicine), CMED cardiac medical (non-surgical cardiac-related admissions), sysbp systolic blood pressure, diasbp diastolic blood pressure, meanbp mean blood pressure, resprate respiratory rate, tempc temperature, bun blood urea nitrogen, wbc white blood cell, INR international normalized ratio, SOFA sequential organ failure assessment, qSOFA quick SOFA, SAPS II simplified acute physiology score II, Spo2 oxyhemoglobin saturation, Max maximum, Min minimum
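The sex rows of the table give enough information to check how a p value of this kind can be obtained from raw counts. The sketch below recomputes a Pearson chi-square test with Yates continuity correction on the 2×2 table of counts. The choice of test is an assumption inferred from the numbers, since the record does not name it, and the function name is illustrative.

```python
import math

def chi2_yates_2x2(a, b, c, d):
    """Pearson chi-square with Yates correction for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            # Yates: shrink |observed - expected| by 0.5, never below zero
            diff = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            stat += diff * diff / expected
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Counts from the table: (female, male) for deaths, then for survivors
stat, p = chi2_yates_2x2(413, 476, 1609, 2061)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

Under these assumptions the result agrees with the 0.1706 reported for sex, which suggests, but does not prove, that the categorical comparisons were made this way.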
Fig. 2 Characteristics of MIMIC-III patients with sepsis by ethnicities (a) and characteristics of MIMIC-III patients with sepsis by common sources of infection (b)
Features selected in the conventional logistic regression
| Variable | OR (CI) | p value |
|---|---|---|
| (Intercept) | 52,913.003 (87.92–33,934,517.782) | < 0.001 |
| Sofa | 1.142 (1.106–1.179) | < 0.001 |
| Aniongap_min | 1.078 (1.043–1.115) | < 0.001 |
| Creatinine_min | 0.676 (0.592–0.767) | < 0.001 |
| Chloride_min | 0.98 (0.962–0.999) | 0.03393 |
| Hematocrit_min | 1.113 (1.053–1.178) | < 0.001 |
| Hemoglobin_min | 0.748 (0.623–0.895) | 0.00169 |
| Hemoglobin_max | 0.926 (0.863–0.992) | 0.02993 |
| Lactate_min | 1.308 (1.194–1.435) | < 0.001 |
| Potassium_min | 1.179 (1.001–1.389) | 0.04922 |
| Sodium_max | 1.046 (1.019–1.074) | < 0.001 |
| Bun_min | 1.033 (1.018–1.048) | < 0.001 |
| Bun_max | 0.986 (0.973–0.997) | 0.01542 |
| Wbc_min | 1.062 (1.036–1.09) | < 0.001 |
| Wbc_max | 0.969 (0.952–0.987) | < 0.001 |
| Heartrate_min | 0.987 (0.977–0.997) | 0.0111 |
| Heartrate_mean | 1.022 (1.011–1.033) | < 0.001 |
| Sysbp_min | 0.991 (0.984–0.998) | 0.00839 |
| Meanbp_min | 0.992 (0.985–1) | 0.0468 |
| Resprate_mean | 1.062 (1.038–1.086) | < 0.001 |
| Tempc_min | 0.897 (0.81–0.993) | 0.03242 |
| Tempc_max | 0.781 (0.698–0.873) | < 0.001 |
| Spo2_mean | 0.947 (0.909–0.986) | 0.00839 |
| Age | 1.029 (1.022–1.035) | < 0.001 |
| Diabetes | 0.779 (0.639–0.948) | 0.01328 |
| Vent | 1.824 (1.48–2.251) | < 0.001 |
OR odds ratio, CI confidence interval, SOFA sequential organ failure assessment, bun blood urea nitrogen, wbc white blood cell, sysbp systolic blood pressure, meanbp mean blood pressure, resprate respiratory rate, Spo2 oxyhemoglobin saturation, vent ventilation, Max maximum, Min minimum
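Each odds ratio in this table applies to a one-unit change in its predictor, so a value that looks small, such as 1.029 per year of age, compounds over clinically meaningful intervals. The sketch below rescales an OR and its interval to a k-unit change by raising each to the power k, which follows from the log-odds being linear in the predictor. The helper name and the step sizes are illustrative, and the inputs are the rounded values printed in the table.

```python
def rescale_or(or_per_unit, ci_low, ci_high, k):
    """Odds ratio and CI bounds for a k-unit change in the predictor.

    Assumes k > 0; for a negative k the lower and upper bounds would swap.
    """
    return or_per_unit ** k, ci_low ** k, ci_high ** k

# Age: 1.029 (1.022–1.035) per year, rescaled to 10 years
age_10 = rescale_or(1.029, 1.022, 1.035, 10)
# Lactate_min: 1.308 (1.194–1.435) per unit, rescaled to 2 units
lactate_2 = rescale_or(1.308, 1.194, 1.435, 2)

print("Age, per 10 years:    %.2f (%.2f-%.2f)" % age_10)
print("Lactate, per 2 units: %.2f (%.2f-%.2f)" % lactate_2)
```

Because the inputs are rounded to three decimals, the rescaled values are approximate.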
Features selected in the XGboost model
| Variable | OR (CI) | p value |
|---|---|---|
| (Intercept) | 493.907 (9.063–27,931.087) | 0.00247 |
| Urineoutput | 1 (1–1) | < 0.001 |
| Lactate_min | 1.401 (1.288–1.527) | < 0.001 |
| Bun_mean | 1.018 (1.013–1.023) | < 0.001 |
| Sysbp_min | 0.979 (0.974–0.984) | < 0.001 |
| Metastatic_cancer | 2.997 (2.217–4.038) | < 0.001 |
| Inr_max | 1.058 (1.002–1.115) | 0.03709 |
| Age | 1.019 (1.013–1.025) | < 0.001 |
| Sodium_max | 1.016 (1.001–1.031) | 0.03835 |
| Aniongap_max | 1.048 (1.026–1.069) | < 0.001 |
| Creatinine_min | 0.766 (0.686–0.852) | < 0.001 |
| Spo2_mean | 0.897 (0.865–0.93) | < 0.001 |
OR odds ratio, CI confidence interval, bun blood urea nitrogen, sysbp systolic blood pressure, INR international normalized ratio, Spo2 oxyhemoglobin saturation, Max maximum, Min minimum
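A table of odds ratios with an intercept defines a risk equation: the log-odds are the log of the intercept term plus each predictor value times the log of its OR, and the logistic function maps that sum to a probability. The sketch below applies this to the rounded values printed above for a hypothetical patient. Two points are assumptions of the sketch rather than statements from the source: the intercept is read as being on the odds scale like the other rows, and Urineoutput is left out because its OR is printed as 1, so the table as printed cannot reproduce the fitted model exactly. The output is an illustration of the arithmetic, not a clinical estimate.

```python
import math

# Odds-scale values as printed in the table (rounded to three decimals)
INTERCEPT_ODDS = 493.907
ODDS_RATIOS = {
    "lactate_min": 1.401,
    "bun_mean": 1.018,
    "sysbp_min": 0.979,
    "metastatic_cancer": 2.997,
    "inr_max": 1.058,
    "age": 1.019,
    "sodium_max": 1.016,
    "aniongap_max": 1.048,
    "creatinine_min": 0.766,
    "spo2_mean": 0.897,
}

def predicted_risk(patient):
    """Logistic prediction: sigmoid(log(intercept) + sum(x * log(OR)))."""
    log_odds = math.log(INTERCEPT_ODDS)
    for name, odds_ratio in ODDS_RATIOS.items():
        log_odds += patient[name] * math.log(odds_ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical patient; every value here is made up for illustration
patient = {
    "lactate_min": 1.5, "bun_mean": 25.0, "sysbp_min": 90.0,
    "metastatic_cancer": 0, "inr_max": 1.5, "age": 65.0,
    "sodium_max": 140.0, "aniongap_max": 16.0,
    "creatinine_min": 1.2, "spo2_mean": 97.0,
}
baseline = predicted_risk(patient)
# Same patient with a higher minimum lactate; an OR above 1 must raise the risk
higher_lactate = predicted_risk({**patient, "lactate_min": 4.0})
print(f"baseline: {baseline:.3f}, higher lactate: {higher_lactate:.3f}")
```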
Fig. 3 Top 11 features selected using XGBoost and the corresponding variable importance scores. The X-axis indicates the importance score, which is the relative number of times a variable is used to split the data; the Y-axis indicates the top 11 weighted variables
Fig. 4 The receiver operating characteristic (ROC) curves. a Traditional logistic regression model, area under the curve (AUC) 0.819 [95% confidence interval (CI) 0.800–0.838]; b SAPS-II score model, AUC 0.797 [0.781–0.813]; c XGboost model, AUC 0.857 [0.839–0.876]. The XGboost model performed best of the three
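The AUC values quoted here have a direct probabilistic reading: the chance that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor, with ties counted as one half. A minimal pure-Python sketch of that definition follows. The labels and scores are made up for illustration, the function name is not from the source, and the record does not say how the confidence intervals were computed, so none is attempted here.

```python
def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (Mann-Whitney U)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))

# Toy data: 1 = died within 30 days, 0 = survived (illustrative only)
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.4]
toy_auc = auc(labels, scores)
print(round(toy_auc, 3))
```

The pairwise loop is quadratic in cohort size, which is acceptable for a sketch; a rank-based formulation gives the same value more efficiently.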
Fig. 5 Decision curve analysis (DCA) of the three prediction models. The net benefit curves for the three prognostic models are shown. The X-axis indicates the threshold probability for the critical care outcome and the Y-axis indicates the net benefit. Solid green line = XGboost model, solid red line = traditional logistic model, solid blue line = SAPS-II score model. The preferred model is the XGboost model, whose net benefit was larger than that of the traditional logistic model and the SAPS-II score model over the range of threshold probabilities
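The net benefit plotted on the Y-axis of a decision curve is conventionally defined as the true-positive rate in the whole cohort minus the false-positive rate weighted by the odds of the threshold probability, pt / (1 − pt). The sketch below computes that quantity for a model and for the "treat all" reference strategy. The toy data and function names are illustrative and are not taken from the study.

```python
def net_benefit(labels, risks, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 0)
    # Harm of one false positive relative to the benefit of one true positive
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight

def net_benefit_treat_all(labels, threshold):
    """Reference strategy: every patient is treated as high risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Toy cohort of 10 patients, 3 with the outcome (illustrative only)
labels = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
risks = [0.8, 0.6, 0.7, 0.2, 0.1, 0.3, 0.4, 0.05, 0.15, 0.25]

nb_model = net_benefit(labels, risks, 0.5)
nb_all = net_benefit_treat_all(labels, 0.5)
print(nb_model, nb_all)
```

A decision curve repeats this calculation over a grid of thresholds; a model is preferred where its curve lies above both the competing models and the treat-all and treat-none (net benefit 0) references.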
Fig. 6 Nomogram to estimate the risk of mortality in sepsis patients. To use the nomogram, first draw a line from each parameter value to the score axis to obtain its score. Then add the points for all the parameters. Finally, draw a line from the total score axis to the lower line of the nomogram to determine the risk of mortality
Fig. 7 Clinical impact curve (CIC) of the XGboost model. The red curve (number of high-risk individuals) indicates the number of people classified as positive (high risk) by the model at each threshold probability; the blue curve (number of high-risk individuals with outcome) is the number of true positives at each threshold probability. The CIC visually indicated that the nomogram conferred high clinical net benefit and confirmed the clinical value of the XGboost model
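The two curves described in this caption are simple counts taken at each threshold probability. The sketch below computes both for a toy cohort and scales them to a population of 1000. That scaling is a common plotting convention and is assumed here, since the record does not state it; the data and function name are likewise illustrative.

```python
def clinical_impact(labels, risks, thresholds, per=1000):
    """Return (threshold, n_high_risk, n_high_risk_with_outcome) per threshold.

    Counts are scaled to a population of `per` patients.
    """
    n = len(labels)
    rows = []
    for t in thresholds:
        # Outcomes of the patients the model flags as high risk at threshold t
        flagged = [y for y, r in zip(labels, risks) if r >= t]
        high_risk = len(flagged) * per / n
        with_outcome = sum(flagged) * per / n
        rows.append((t, high_risk, with_outcome))
    return rows

# Toy cohort of 10 patients, 3 with the outcome (illustrative only)
labels = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
risks = [0.8, 0.6, 0.7, 0.2, 0.1, 0.3, 0.4, 0.05, 0.15, 0.25]

impact = clinical_impact(labels, risks, [0.2, 0.5])
for threshold, high_risk, with_outcome in impact:
    print(threshold, high_risk, with_outcome)
```

The gap between the two counts at a given threshold is the number of false positives, so the curves converge where the model flags mostly patients who go on to have the outcome.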