James Yeongjun Park1, Tzu-Chun Hsu2, Jiun-Ruey Hu3, Chun-Yuan Chen4, Wan-Ting Hsu5,6, Matthew Lee6, Joshua Ho7, Chien-Chang Lee2,7.
Abstract
BACKGROUND: Although machine learning (ML) algorithms have been applied to point-of-care sepsis prognostication, ML has not been used to predict sepsis mortality in an administrative database. Therefore, we examined the performance of common ML algorithms in predicting mortality in adult patients with sepsis and compared it with that of the conventional context knowledge-based logistic regression approach.
Keywords: SuperLearner; machine learning; mortality; sepsis
Year: 2022 PMID: 35416785 PMCID: PMC9047761 DOI: 10.2196/29982
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 7.076
Sensitivity analysis of tree numbers in the random forest algorithm.
| Number of trees allowed | AUCa (95% CI) | Pairwise comparison of AUC | P value |
| 100 | 0.876 (0.874-0.878) | 100 trees versus 200 trees | <.001b |
| 200 | 0.877 (0.876-0.879) | 200 trees versus 300 trees | <.001b |
| 300 | 0.878 (0.877-0.880) | 300 trees versus 400 trees | <.001b |
| 400 | 0.878 (0.877-0.880) | 400 trees versus 500 trees | .30 |
aAUC: area under the curve.
bValues are significant at P<.001.
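As a minimal sketch of the quantity compared across forest sizes above, the AUC can be computed as the probability that a randomly chosen nonsurvivor receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The function name `auc_rank` and the toy data are hypothetical and are not taken from the study; the source does not state which software routine produced its AUC values.

```python
def auc_rank(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): 1 = died in hospital, 0 = survived.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auc = auc_rank(toy_labels, toy_scores)
print(toy_auc)  # 3 of the 4 positive-negative pairs are ordered correctly: 0.75
```

The pairwise loop is quadratic in cohort size, so it is only meant to illustrate the definition; a cohort of this size would need a rank-based implementation.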
Association between the number of variables allowed to be considered at each split in the random forest model and model discrimination.
| Number of variables allowed | AUCa (95% CI) | Pairwise comparison of AUC | P value |
| 3 | 0.852 (0.850-0.854) | 3 variables versus 5 variables | <.001b |
| 5 | 0.860 (0.858-0.862) | 5 variables versus 9 variables | <.001b |
| 9 | 0.868 (0.866-0.869) | 9 variables versus 15 variables | <.001b |
| 15 | 0.874 (0.872-0.875) | 15 variables versus 20 variables | <.001b |
| 20 | 0.875 (0.874-0.877) | 20 variables versus 25 variables | <.001b |
| 25 | 0.877 (0.875-0.879) | 25 variables versus 40 variables | <.001b |
| 40 | 0.878 (0.876-0.880) | 40 variables versus 50 variables | .02c |
| 50 | 0.878 (0.877-0.880) | 50 variables versus 70 variables | .53 |
| 70 | 0.878 (0.877-0.880) | N/Ad | N/A |
aAUC: area under the curve.
bValues are significant at P<.001.
cValues are significant at P<.05.
dN/A: not applicable.
Figure 1. The architecture of the 4-layered neural network to predict sepsis mortality. ReLU: rectified linear unit.
Figure 2. Flowchart depicting the construction of the study cohort from the Nationwide Inpatient Sample (NIS) database. LASSO: least absolute shrinkage and selection operator.
Characteristics of patients with sepsis in the Nationwide Inpatient Sample stratified by in-hospital survival status (N=923,759).
| Characteristics | Survivors of sepsis (n=726,918) | Nonsurvivors of sepsis (n=196,841) | Total |
| Age (years), mean (SE) | 67.15 (16.44) | 70.85 (14.88) | 67.94 (16.19) |
| Women, n (%) | 358,756 (49.4) | 96,708 (49.1) | 455,464 (49.3) |
| | | | |
| White | 511,579 (70.4) | 137,807 (70) | 649,386 (70.3) |
| Black | 112,801 (15.5) | 30,207 (15.3) | 143,008 (15.5) |
| Hispanic | 61,174 (8.4) | 16,386 (8.3) | 77,560 (8.4) |
| Others | 41,364 (5.7) | 12,441 (6.3) | 53,805 (5.8) |
| | | | |
| Medicare | 221,228 (30.4) | 60,933 (31) | 282,161 (30.5) |
| Medicaid | 185,758 (25.6) | 48,838 (24.8) | 234,596 (25.4) |
| Commercial | 172,650 (23.8) | 45,437 (23.1) | 218,087 (23.6) |
| Other | 147,282 (20.3) | 41,633 (21.2) | 188,915 (20.5) |
| | | | |
| Early mechanical ventilation | 118,939 (16.4) | 76,773 (39) | 195,712 (21.2) |
| Late mechanical ventilation | 36,649 (5) | 35,531 (18.1) | 72,180 (7.8) |
| Shock | 305,375 (42) | 132,582 (67.4) | 437,957 (47.4) |
| Hemodialysis | 58,962 (8.1) | 28,691 (14.6) | 87,653 (9.5) |
| ICUa care (at least one day) | 67,810 (9.3) | 58,756 (29.8) | 126,566 (13.7) |
| | | | |
| Anemia | 265,364 (36.5) | 55,632 (28.3) | 320,996 (34.7) |
| Depression | 81,827 (11.3) | 14,612 (7.4) | 96,439 (10.4) |
| Diabetes | 256,947 (35.3) | 57,294 (29.1) | 314,241 (34) |
| Drug and substance abuse | 25,311 (3.5) | 4188 (2.1) | 29,499 (3.2) |
| Chronic lung disease | 188,546 (25.9) | 50,749 (25.8) | 239,295 (25.9) |
| Congestive heart failure | 173,776 (23.9) | 56,036 (28.5) | 229,812 (24.9) |
| Hypertension | 424,834 (58.4) | 102,862 (52.3) | 527,696 (57.1) |
| Hypothyroid disease | 100,256 (13.8) | 23,856 (12.1) | 124,112 (13.4) |
| Liver disease | 42,065 (5.8) | 17,995 (9.1) | 60,060 (6.5) |
| Renal failure, chronic | 210,371 (28.9) | 57,171 (29) | 267,542 (29) |
| Lymphoma | 13,691 (1.9) | 5469 (2.8) | 19,160 (2.1) |
| Metastatic carcinomas | 30,789 (4.2) | 17,109 (8.7) | 47,898 (5.2) |
| Neurological conditions | 117,134 (16.1) | 27,791 (14.1) | 144,925 (15.7) |
| Obesity | 100,716 (13.9) | 18,173 (9.2) | 118,889 (12.9) |
| Malignant solid tumors | 27,426 (3.8) | 10,057 (5.1) | 37,483 (4.1) |
| Rheumatoid arthritis or collagen vascular diseases | 27,294 (3.8) | 6324 (3.2) | 33,618 (3.6) |
| Paraplegia | 53,755 (7.4) | 10,955 (5.6) | 64,710 (7) |
| Perivascular conditions | 68,641 (9.4) | 22,853 (11.6) | 91,494 (9.9) |
| Psychiatric diseases | 44,282 (6.1) | 6902 (3.5) | 51,184 (5.5) |
| Pulmonary-circulatory | 43,697 (6) | 15,327 (7.8) | 59,024 (6.4) |
| Weight loss | 146,865 (20.2) | 47,320 (24) | 194,185 (21) |
| | | | |
| Renal dysfunction | 433,920 (59.7) | 129,768 (65.9) | 563,688 (61) |
| Cardiovascular dysfunction or shock | 281,647 (38.7) | 132,079 (67.1) | 413,726 (44.8) |
| Acute respiratory failure | 161,921 (22.3) | 116,406 (59.1) | 278,327 (30.1) |
| CNSb dysfunction | 162,716 (22.4) | 51,146 (26) | 213,862 (23.2) |
| Hepatic dysfunction | 18,579 (2.6) | 20,561 (10.4) | 39,140 (4.2) |
| | | | |
| Smoking | 75,404 (10.4) | 15,033 (7.6) | 90,437 (9.8) |
| Alcoholism | 32,879 (4.5) | 10,674 (5.4) | 43,553 (4.7) |
aICU: intensive care unit.
bCNS: central nervous system.
Characteristics of patients with sepsis in the Nationwide Inpatient Sample stratified by training and validation cohort (N=923,759).
| Characteristic | Training (2010-2013): survivors of sepsis (n=548,930) | Training (2010-2013): nonsurvivors of sepsis (n=155,316) | Testing (2014): survivors of sepsis (n=177,988) | Testing (2014): nonsurvivors of sepsis (n=41,525) |
| Age (years), mean (SE) | 67.25 (16.46) | 70.96 (14.93) | 66.84 (16.37) | 70.44 (14.68) |
| Women, n (%) | 271,311 (49.4) | 76,496 (49.3) | 87,445 (49.1) | 20,212 (48.7) |
| | | | | |
| White | 385,330 (70.2) | 108,405 (69.8) | 126,249 (70.9) | 29,402 (70.8) |
| Black | 86,727 (15.8) | 24,295 (15.6) | 26,074 (14.6) | 5912 (14.2) |
| Hispanic | 45,887 (8.4) | 12,954 (8.3) | 15,287 (8.6) | 3432 (8.3) |
| Others | 30,986 (5.6) | 9662 (6.2) | 10,378 (5.8) | 2779 (6.7) |
| | | | | |
| Medicare | 166,023 (30.2) | 47,814 (30.8) | 55,205 (31) | 13,119 (31.6) |
| Medicaid | 136,607 (24.9) | 37,627 (24.2) | 49,151 (27.6) | 11,211 (27) |
| Commercial | 132,428 (24.1) | 36,387 (23.4) | 40,222 (22.6) | 9050 (21.8) |
| Other | 113,872 (20.7) | 33,488 (21.6) | 33,410 (18.8) | 8145 (19.6) |
| | | | | |
| Early mechanical ventilation | 92,718 (16.9) | 60,822 (39.2) | 26,221 (14.7) | 15,951 (38.4) |
| Late mechanical ventilation | 28,892 (5.3) | 28,532 (18.4) | 7757 (4.4) | 6999 (16.9) |
| Shock | 232,963 (42.4) | 103,544 (66.7) | 72,412 (40.7) | 29,038 (69.9) |
| Hemodialysis | 46,180 (8.4) | 22,818 (14.7) | 12,782 (7.2) | 5873 (14.1) |
| ICUa care (at least one day) | 53,146 (9.7) | 46,914 (30.2) | 14,664 (8.2) | 11,842 (28.5) |
| | | | | |
| Anemia | 201,132 (36.6) | 43,380 (27.9) | 64,232 (36.1) | 12,252 (29.5) |
| Depression | 59,998 (10.9) | 11,239 (7.2) | 21,829 (12.3) | 3373 (8.1) |
| Diabetes | 191,296 (34.8) | 44,598 (28.7) | 65,651 (36.9) | 12,696 (30.6) |
| Drug and substance abuse | 17,689 (3.2) | 3113 (2) | 7622 (4.3) | 1075 (2.6) |
| Chronic lung disease | 140,276 (25.6) | 39,550 (25.5) | 48,270 (27.1) | 11,199 (27) |
| Congestive heart failure | 130,913 (23.8) | 43,716 (28.1) | 42,863 (24.1) | 12,320 (29.7) |
| Hypertension | 316,301 (57.6) | 79,939 (51.5) | 108,533 (61) | 22,923 (55.2) |
| Hypothyroid disease | 73,904 (13.5) | 18,348 (11.8) | 26,352 (14.8) | 5508 (13.3) |
| Liver disease | 30,753 (5.6) | 13,796 (8.9) | 11,312 (6.4) | 4199 (10.1) |
| Renal failure, chronic | 158,078 (28.8) | 44,704 (28.8) | 52,293 (29.4) | 12,467 (30) |
| Lymphoma | 10,371 (1.9) | 4281 (2.8) | 3320 (1.9) | 1188 (2.9) |
| Metastatic carcinomas | 23,087 (4.2) | 13,352 (8.6) | 7702 (4.3) | 3757 (9) |
| Neurological conditions | 87,994 (16) | 21,699 (14) | 29,140 (16.4) | 6092 (14.7) |
| Obesity | 71,693 (13.1) | 13,392 (8.6) | 29,023 (16.3) | 4781 (11.5) |
| Malignant solid tumors | 20,417 (3.7) | 7814 (5) | 7009 (3.9) | 2243 (5.4) |
| Rheumatoid arthritis or collagen vascular diseases | 20,368 (3.7) | 4898 (3.2) | 6926 (3.9) | 1426 (3.4) |
| Paraplegia | 40,811 (7.4) | 8488 (5.5) | 12,944 (7.3) | 2467 (5.9) |
| Perivascular conditions | 50,853 (9.3) | 17,734 (11.4) | 17,788 (10) | 5119 (12.3) |
| Psychiatric diseases | 32,698 (6) | 5320 (3.4) | 11,584 (6.5) | 1582 (3.8) |
| Pulmonary-circulatory | 32,214 (5.9) | 11,625 (7.5) | 11,483 (6.5) | 3702 (8.9) |
| Weight loss | 113,028 (20.6) | 37,182 (23.9) | 33,837 (19) | 10,138 (24.4) |
| | | | | |
| Renal dysfunction | 324,840 (59.2) | 101,420 (65.3) | 109,080 (61.3) | 28,348 (68.3) |
| Cardiovascular dysfunction or shock | 215,545 (39.3) | 103,064 (66.4) | 66,102 (37.1) | 29,015 (69.9) |
| Acute respiratory failure | 125,706 (22.9) | 92,008 (59.2) | 36,215 (20.3) | 24,398 (58.8) |
| CNSb dysfunction | 118,837 (21.6) | 38,642 (24.9) | 43,879 (24.7) | 12,504 (30.1) |
| Hepatic dysfunction | 14,091 (2.6) | 15,752 (10.1) | 4488 (2.5) | 4809 (11.6) |
| | | | | |
| Smoking | 54,038 (9.8) | 11,205 (7.2) | 21,366 (12) | 3828 (9.2) |
| Alcoholism | 24,025 (4.4) | 8083 (5.2) | 8854 (5) | 2591 (6.2) |
aICU: intensive care unit.
bCNS: central nervous system.
Measures of model discrimination and accuracy in the validation data set (Nationwide Inpatient Sample 2014), including area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
| Model | AUC (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) |
| Reference logistic regression (Severe Sepsis Prediction score) | 0.786 (0.783-0.788) | 0.708 (0.704-0.713) | 0.722 (0.720-0.774) | 0.373 (0.370-0.376) | 0.914 (0.912-0.915) |
| LASSOa | 0.878 (0.876-0.879) | 0.812 (0.808-0.816) | 0.784 (0.782-0.786) | 0.468 (0.464-0.471) | 0.947 (0.946-0.948) |
| Random forest | 0.878 (0.877-0.880) | 0.818 (0.814-0.821) | 0.771 (0.769-0.773) | 0.454 (0.451-0.458) | 0.948 (0.947-0.949) |
| Xgboost | 0.888 (0.886-0.889) | 0.829 (0.826-0.833) | 0.781 (0.781-0.785) | 0.472 (0.468-0.475) | 0.952 (0.950-0.953) |
| Deep neural network | 0.893 (0.891-0.895) | 0.826 (0.823-0.830) | 0.794 (0.793-0.796) | 0.484 (0.480-0.488) | 0.951 (0.950-0.953) |
| Super Learner | 0.883 (0.881-0.885) | 0.833 (0.829-0.837) | 0.769 (0.768-0.771) | 0.458 (0.455-0.460) | 0.952 (0.951-0.953) |
aLASSO: least absolute shrinkage and selection operator.
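The PPV and NPV columns above follow from sensitivity, specificity, and the number of deaths and survivors in the cohort. The sketch below recomputes them for the deep neural network row, assuming the 2014 testing cohort counts reported earlier (41,525 nonsurvivors and 177,988 survivors). The helper name `predictive_values` is hypothetical, and because the inputs are the rounded table values, the result can differ from the published figures in the third decimal.

```python
def predictive_values(sensitivity, specificity, n_pos, n_neg):
    """Return (PPV, NPV) implied by sensitivity, specificity, and class counts."""
    tp = sensitivity * n_pos   # expected true positives (deaths flagged)
    fn = n_pos - tp            # deaths missed
    tn = specificity * n_neg   # expected true negatives (survivors cleared)
    fp = n_neg - tn            # survivors flagged
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Deep neural network row: sensitivity 0.826, specificity 0.794.
# Class counts are the 2014 testing cohort sizes from the cohort table.
ppv, npv = predictive_values(0.826, 0.794, n_pos=41_525, n_neg=177_988)
print(round(ppv, 3), round(npv, 3))
```

This gives roughly 0.483 and 0.951, against the tabulated 0.484 and 0.951. With mortality near 19% in the testing cohort, a PPV under 0.5 is compatible with sensitivity and specificity around 0.8.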
Figure 3. Receiver operating characteristic curves of different machine learning models in predicting sepsis mortality. AUC: area under the curve; LASSO: least absolute shrinkage and selection operator.
Figure 4. Precision-recall curves of different machine learning models in predicting sepsis mortality. AUC: area under the curve; LASSO: least absolute shrinkage and selection operator.
The area under the precision-recall curve (AUC-PR), recall, and precision of different machine learning models in predicting sepsis mortality.
| Model | AUC-PR, mean (SD) | Recall (95% CI) | Precision (95% CI) |
| Reference logistic regression | 0.443 (0.003) | 0.587 (0.583-0.591) | 0.403 (0.401-0.405) |
| LASSOa | 0.636 (0.001) | 0.806 (0.805-0.807) | 0.410 (0.410-0.411) |
| Random forest | 0.653 (0.002) | 0.806 (0.805-0.807) | 0.415 (0.414-0.416) |
| Xgboost | 0.673 (0.002) | 0.814 (0.813-0.816) | 0.420 (0.420-0.421) |
| Neural networks | 0.681 (0.002) | 0.815 (0.814-0.816) | 0.427 (0.426-0.428) |
aLASSO: least absolute shrinkage and selection operator.
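One common way to summarize a precision-recall curve is average precision: the mean of the precision values measured at the rank of each true positive. The sketch below illustrates that estimator on toy data. Whether the study used this estimator or a trapezoidal area is not stated in the source, so treat it as an assumption; the function name and toy values are hypothetical.

```python
def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive case appears."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive case")
    # Rank cases from highest to lowest predicted risk (ties are not merged here).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision among the top-k ranked cases
    return total / n_pos


# Toy data (hypothetical, not from the study): 1 = died in hospital, 0 = survived.
toy_ap = average_precision([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80])

# A no-skill model has precision equal to the outcome prevalence; the counts
# below are the 2014 testing cohort sizes from the cohort table.
no_skill = 41_525 / (41_525 + 177_988)
print(round(toy_ap, 4), round(no_skill, 3))
```

The no-skill level of about 0.189 is a useful reference when reading the AUC-PR column, since, unlike the AUC, the chance level of AUC-PR depends on how common the outcome is.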
Figure 5. Calibration plots of observed versus predicted hospital mortality and associated mortality ratios by risk deciles in the development and validation cohorts. LASSO: least absolute shrinkage and selection operator.
Calibration measures of different machine learning models in predicting sepsis mortality.
| Model | Brier score | Slope | Intercept |
| Reference logistic regression | 0.129 | 1.048 | −0.054 |
| LASSOa | 0.108 | 1.044 | −0.028 |
| Random forest | 0.103 | 1.458 | 0.245 |
| Xgboost | 0.102 | 1.087 | −0.092 |
| Neural networks | 0.0954 | 1.096 | 0.073 |
aLASSO: least absolute shrinkage and selection operator.
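The Brier score is the mean squared difference between the predicted probability and the observed 0/1 outcome, so lower is better. The sketch below shows the calculation on toy data and also computes the score of a constant prediction equal to the mortality rate, as a reference point. It assumes the tabulated scores refer to the 2014 testing cohort, which the source does not state explicitly; the function name and toy values are hypothetical.

```python
def brier_score(outcomes, probs):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(outcomes) != len(probs) or not outcomes:
        raise ValueError("outcomes and probs must be non-empty and equal length")
    return sum((p - y) ** 2 for y, p in zip(outcomes, probs)) / len(outcomes)


# Toy data (hypothetical, not from the study).
toy_brier = brier_score([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.1])

# Reference: always predicting the cohort mortality rate p scores p * (1 - p).
# Counts are the 2014 testing cohort sizes (assumed evaluation set).
prevalence = 41_525 / (41_525 + 177_988)
reference_brier = prevalence * (1 - prevalence)
print(round(toy_brier, 4), round(reference_brier, 3))
```

Under that assumption the uninformative reference is about 0.153, so every score in the table, from 0.129 down to 0.0954, improves on a constant prediction.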
Figure 6. Variables of importance from random forest ranked by impurity-based variable importance. CNS: central nervous system; ICU: intensive care unit.
Figure 7. Variables of importance from xgboost ranked by mean Shapley Additive Explanations (SHAP) values. ICU: intensive care unit.
Figure 8. Variance inflation factor scores of top 50 variables by Shapley Additive Explanations (SHAP) values. ICU: intensive care unit; VIF: variance inflation factor.
Performance comparison of the machine learning models with the logistic regression model with the same features.
| Model | Brier score | AUC-PRa, mean (SD) | AUCb (95% CI) | P value (AUC comparison) |
| Logistic regression model—all features | 0.102 | 0.634 (0.003) | 0.857 (0.855-0.859) | N/Ac |
| LASSOd | 0.108 | 0.636 (0.001) | 0.878 (0.876-0.879) | <.001e |
| Random forest | 0.103 | 0.653 (0.002) | 0.878 (0.877-0.880) | <.001e |
| Xgboost | 0.102 | 0.673 (0.002) | 0.888 (0.886-0.889) | <.001e |
| Neural networks | 0.0954 | 0.681 (0.002) | 0.893 (0.891-0.895) | <.001e |
aAUC-PR: area under the precision-recall curve.
bAUC: area under the curve.
cN/A: not applicable.
dLASSO: least absolute shrinkage and selection operator.
eValues are significant at P<.001.
Variables of importance from the random forest (random train–test split cohort).
| Variable name | Importance rank | Top 50 from previous cohort |
| Acute respiratory failure | 1 | Yes |
| Age | 2 | Yes |
| Respiratory intubation and mechanical ventilation (primary procedure) | 3 | Yes |
| Combined comorbidity score | 4 | Yes |
| Shock (other diagnosis) | 5 | Yes |
| ICUa care (at least one day) | 6 | Yes |
| Cardiovascular dysfunction or shock | 7 | Yes |
| Other aftercare (other diagnosis) | 8 | Yes |
| Early mechanical ventilation | 9 | Yes |
| Respiratory intubation and mechanical ventilation (secondary procedure) | 10 | Yes |
| Shock | 11 | Yes |
| Late mechanical ventilation | 12 | Yes |
| Insurance | 13 | Yes |
| Hepatic dysfunction | 14 | Yes |
| Other liver diseases (other diagnosis) | 15 | Yes |
| Coma, stupor, and brain damage (other diagnosis) | 16 | Yes |
| Location or teaching status of hospital | 17 | |
| Bacterial infection, unspecified site (other diagnosis) | 18 | Yes |
| Race | 19 | Yes |
| Urinary tract infections (other diagnosis) | 20 | Yes |
| Pneumonia (except that caused by tuberculosis or sexually transmitted disease; other diagnosis) | 21 | Yes |
| Other gastrointestinal disorders (other diagnosis) | 22 | Yes |
| Joint disorders and dislocations, trauma-related (secondary diagnosis) | 23 | Yes |
| Acute and unspecified renal failure (other diagnosis) | 24 | Yes |
| Residual codes, unclassified (other diagnosis) | 25 | Yes |
| Aspiration pneumonitis and food or vomitus (other diagnosis) | 26 | Yes |
| Secondary malignancies (other diagnosis) | 27 | Yes |
| Anemia | 28 | Yes |
| Renal dysfunction | 29 | Yes |
| Other nervous system disorders (other diagnosis) | 30 | Yes |
| Other nutritional, endocrine, and metabolic disorders (other diagnosis) | 31 | Yes |
| Coagulation and hemorrhagic disorders (other diagnosis) | 32 | Yes |
| Other injuries and conditions because of external causes (other diagnosis) | 33 | Yes |
| Cardiac dysrhythmias (other diagnosis) | 34 | Yes |
| Insertion, replacement, or removal of extracranial ventricular shunt (primary procedure) | 35 | Yes |
| CNSb dysfunction | 36 | Yes |
| Sex | 37 | Yes |
| Hemodialysis | 38 | Yes |
| Diabetes | 39 | Yes |
| Septicemia (except in labor; other diagnosis) | 40 | Yes |
| Nutritional deficiencies (other diagnosis) | 41 | Yes |
| Hypertension | 42 | Yes |
| Administrative or social admission (other diagnosis) | 43 | Yes |
| Allergic reactions (other diagnosis) | 44 | No |
| Pleurisy, pneumothorax, and pulmonary collapse (other diagnosis) | 45 | No |
| Metastatic cancer | 46 | Yes |
| Weight loss | 47 | Yes |
| Deficiency and other anemia (other diagnosis) | 48 | No |
| Delirium, dementia, and amnestic and other cognitive disorders (other diagnosis) | 49 | No |
| Coronary atherosclerosis and other heart disease (other) | 50 | No |
aICU: intensive care unit.
bCNS: central nervous system.