Hanyin Wang, Yikuan Li, Andrew Naidech, Yuan Luo.
Abstract
BACKGROUND: Sepsis is one of the most life-threatening conditions for critically ill patients in the United States, yet diagnosing it is challenging because standardized criteria for sepsis identification are still under development. Disparities in the social determinants of sepsis patients can interfere with the performance of machine-learning risk prediction.
Keywords: Disparity; Machine learning; Mortality prediction; Sepsis; Social determinants
Year: 2022 PMID: 35710407 PMCID: PMC9204861 DOI: 10.1186/s12911-022-01871-0
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 3.298
Fig. 1 The workflow
Fig. 2 Forest plot for disparities in social determinants across various sepsis criteria. The proportion of sepsis patients in each subpopulation identified by each set of sepsis criteria is shown as a point with a 95% confidence interval. Sepsis criteria are shown in different colors, while results for each subpopulation are shown in a row corresponding to the labels on the y-axis. Explicit: the explicit criteria; Angus: the Angus methodology; Martin: the Martin methodology; CMS: criteria presented by the Centers for Medicare & Medicaid Services (CMS); CDC: the complete surveillance criteria presented by the Centers for Disease Control and Prevention (CDC); Sepsis-3: the Sepsis-3 criteria
Statistics of 5783 sepsis patients
| Social Determinants | Category | n | % sepsis population | In-hospital mortality | % in-hospital mortality | Training | Testing |
|---|---|---|---|---|---|---|---|
| Race | Asian | 179 | 3.10 | 26 | 14.53 | 129 | 50 |
| | Black or African American | 501 | 8.66 | 52 | 10.38 | 348 | 153 |
| | Hispanic or Latino | 188 | 3.25 | 18 | 9.57 | 132 | 56 |
| | Other | 714 | 12.35 | 165 | 23.11 | 527 | 187 |
| | White | 4201 | 72.64 | 575 | 13.69 | 2912 | 1289 |
| Sex | Female | 2562 | 44.30 | 384 | 14.99 | 1798 | 764 |
| | Male | 3221 | 55.70 | 452 | 14.03 | 2250 | 971 |
| Marital status | Separated | 398 | 6.88 | 52 | 13.07 | 287 | 111 |
| | Significant other | 2559 | 44.25 | 363 | 14.19 | 1788 | 771 |
| | Single | 1638 | 28.32 | 174 | 10.62 | 1157 | 481 |
| | Unknown | 332 | 5.74 | 102 | 30.72 | 248 | 84 |
| | Widowed | 856 | 14.80 | 145 | 16.94 | 568 | 288 |
| Insurance type | Government | 166 | 2.87 | 13 | 7.83 | 115 | 51 |
| | Medicaid | 570 | 9.86 | 67 | 11.75 | 395 | 175 |
| | Medicare | 3358 | 58.07 | 560 | 16.68 | 2335 | 1023 |
| | Private | 1639 | 28.34 | 185 | 11.29 | 1168 | 471 |
| | Self-pay | 50 | 0.86 | 11 | 22.00 | 35 | 15 |
| Language | English | 5167 | 89.35 | 727 | 14.07 | 3631 | 1536 |
| | Other | 499 | 8.63 | 94 | 18.84 | 339 | 160 |
| | Spanish | 117 | 2.02 | 15 | 12.82 | 78 | 39 |
Categories of each social determinant are ranked alphabetically; n: number of sepsis patients in the category; % sepsis population: percentage of the number of sepsis patients in the category among the 5,783 sepsis patients; In-hospital mortality: number of patients in the category deceased in-hospital; % in-hospital mortality: percentage of patients in the category deceased in-hospital; Training: number of patients of the given category that were assigned to the training set during train-test split; Testing: number of patients of the given category that were assigned to the test set during train-test split
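The two percentage columns follow directly from the counts, as defined in the footnote. A minimal Python sketch that recomputes them for three rows copied from the table (the helper name `derived_columns` is illustrative and not from the source):

```python
# Recompute the percentage columns of the cohort table from its raw counts.
TOTAL_SEPSIS = 5783  # cohort size stated in the table title

# (category, n, in-hospital deaths, training, testing), copied from the table
rows = [
    ("Asian", 179, 26, 129, 50),
    ("Black or African American", 501, 52, 348, 153),
    ("Spanish", 117, 15, 78, 39),
]

def derived_columns(n, deaths, total=TOTAL_SEPSIS):
    """Return (% sepsis population, % in-hospital mortality), rounded to 2 places."""
    pct_population = round(100 * n / total, 2)
    pct_mortality = round(100 * deaths / n, 2)
    return pct_population, pct_mortality

for name, n, deaths, train, test in rows:
    # Training and testing counts should partition each category.
    assert train + test == n
    pct_pop, pct_mort = derived_columns(n, deaths)
    print(f"{name}: {pct_pop:.2f}% of cohort, {pct_mort:.2f}% in-hospital mortality")
```

The same check applies to any other row; note how small some test subgroups are (for example, 39 Spanish-speaking patients), which matters when reading the subgroup comparisons further down.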
Detailed performances on the entire testing set
| Classifier | Accuracy | AUC | Precision | Recall | F1_binary | F1_macro | Specificity |
|---|---|---|---|---|---|---|---|
| Ridge classifier | 0.6790 | 0.7774 | 0.2682 | 0.7052 | 0.3886 | 0.5855 | 0.6745 |
| Perceptron | 0.6720 | 0.7786 | 0.2634 | 0.7052 | 0.3835 | 0.5801 | 0.6664 |
| Passive-aggressive | 0.6841 | 0.7582 | 0.2733 | 0.7131 | 0.3951 | 0.5907 | 0.6792 |
| kNN | 0.7135 | 0.7299 | 0.2780 | 0.6135 | 0.3826 | 0.5981 | 0.7305 |
| Random forest | 0.7516 | 0.6459 | 0.2826 | 0.4661 | 0.3519 | 0.5991 | 0.7999 |
| LinearSVC_L1 | 0.6749 | 0.7781 | 0.2654 | 0.7052 | 0.3856 | 0.5823 | 0.6698 |
| LinearSVC_L2 | 0.6784 | 0.7777 | 0.2678 | 0.7052 | 0.3882 | 0.5850 | 0.6739 |
| SGDClassifier_L1 | 0.6790 | 0.7759 | 0.2682 | 0.7052 | 0.3886 | 0.5855 | 0.6745 |
| SGDClassifier_L2 | 0.6790 | 0.7749 | 0.2668 | 0.6972 | 0.3859 | 0.5843 | 0.6759 |
| SGDClassifier_EN | 0.6801 | 0.7753 | 0.2683 | 0.7012 | 0.3881 | 0.5858 | 0.6765 |
| MultinomialNB | 0.6392 | 0.7040 | 0.2348 | 0.6614 | 0.3466 | 0.5487 | 0.6354 |
| BernoulliNB | 0.3107 | 0.5724 | 0.1665 | 0.9402 | 0.2830 | 0.3096 | 0.2042 |
| Logistic regression | 0.6824 | 0.7761 | 0.2720 | 0.7131 | 0.3938 | 0.5893 | 0.6772 |
| SVC_rbf | 0.6847 | 0.7744 | 0.2702 | 0.6932 | 0.3888 | 0.5882 | 0.6833 |
| SVC_poly | 0.6749 | 0.7751 | 0.2654 | 0.7052 | 0.3856 | 0.5823 | 0.6698 |
| SVC_sigmoid | 0.6277 | 0.6873 | 0.2349 | 0.6972 | 0.3514 | 0.5451 | 0.6159 |
F1_binary: F1 score for the positive class; F1_macro: macro-averaged F1 score; Passive-aggressive: passive-aggressive classifier; kNN: k-Nearest Neighbors; LinearSVC_L1 or _L2: support vector machine with linear kernel coupled with L1 or L2 regularization; SGDClassifier_L1 or _L2 or _EN: stochastic gradient descent with L1 or L2 or Elastic Net regularization; MultinomialNB: Multinomial naïve Bayes; BernoulliNB: Bernoulli naïve Bayes; SVC_rbf or _poly or _sigmoid: support vector machine with rbf kernel or polynomial kernel or sigmoid kernel
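To make the relationship between these metrics concrete, the sketch below derives all threshold-based columns from a single confusion matrix. The counts are an assumption: the source does not report them. They were back-calculated from the Ridge classifier row and the test-set size of 1,735 patients (764 female plus 971 male in the cohort table), and they reproduce the reported values to four decimals. AUC is not included because it depends on the ranking of scores rather than on one confusion matrix. The function name `classification_metrics` is illustrative.

```python
# ASSUMPTION: confusion-matrix counts are NOT reported in the source. They were
# back-calculated from the Ridge classifier row (recall 0.7052, specificity
# 0.6745) and a test set of 1,735 patients, and are used only for illustration.
TP, FN, FP, TN = 177, 74, 483, 1001

def classification_metrics(tp, fn, fp, tn):
    """Compute the threshold-based metrics listed in the table header."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                 # sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1_pos = 2 * tp / (2 * tp + fp + fn)    # F1_binary: positive class only
    f1_neg = 2 * tn / (2 * tn + fn + fp)    # F1 with the negative class as target
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1_binary": f1_pos,
        "f1_macro": (f1_pos + f1_neg) / 2,  # unweighted mean over both classes
        "specificity": specificity,
    }

metrics = classification_metrics(TP, FN, FP, TN)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, only about 14% of test patients are positive, which is why precision and F1_binary stay low even when recall is around 0.70, and why F1_macro (which also credits the majority class) is noticeably higher than F1_binary.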
Observed differences between the testing results and each race with p values from permutation tests
| Classifier | Asian | Black or African American | Hispanic or Latino | Other | White |
|---|---|---|---|---|---|
| | Observed difference | Observed difference | Observed difference | Observed difference | Observed difference |
| Ridge classifier | −0.2812 | −0.0241 | −0.2208 | 0.0011 | 0.0175 |
| Perceptron | −0.2748 | 0.0078 | −0.2453 | −0.0026 | 0.0158 |
| Passive-aggressive | −0.3188 | −0.0075 | −0.1749 | 0.0159 | 0.0111 |
| kNN | −0.1314 | −0.0628 | −0.1865 | −0.0144 | 0.0214 |
| Random forest | −0.0834 | −0.0939 | −0.1226 | 0.0536 | 0.0056 |
| LinearSVC_L1 | −0.2819 | −0.0172 | −0.2247 | 0.0003 | 0.0173 |
| LinearSVC_L2 | −0.2815 | −0.0221 | −0.2211 | 0.0005 | 0.0175 |
| SGDClassifier_L1 | −0.2872 | −0.0041 | −0.2159 | 0.0036 | 0.0184 |
| SGDClassifier_L2 | −0.2900 | −0.0087 | −0.2182 | 0.0044 | 0.0191 |
| SGDClassifier_EN | −0.2905 | −0.0046 | −0.2186 | 0.0050 | 0.0181 |
| MultinomialNB | −0.2797 | 0.0671 | −0.2373 | 0.0051 | 0.0051 |
| BernoulliNB | −0.1974 | −0.0034 | 0.0476 | 0.0173 | 0.0012 |
| Logistic regression | −0.2875 | −0.0257 | −0.2061 | 0.0043 | 0.0174 |
| SVC_rbf | −0.3085 | 0.0042 | −0.2311 | −0.0176 | 0.0175 |
| SVC_poly | −0.2978 | 0.0027 | −0.2751 | 0.0087 | 0.0154 |
| SVC_sigmoid | −0.1343 | −0.0941 | −0.0606 | −0.0099 | 0.0208 |
Observed difference: observed difference in AUC when compared with the performance on the entire testing set; p_val: p value, p values less than or equal to 0.05 were highlighted; Passive-aggressive: passive-aggressive classifier; kNN: k-Nearest Neighbors; LinearSVC_L1 or _L2: support vector machine with linear kernel coupled with L1 or L2 regularization; SGDClassifier_L1 or _L2 or _EN: stochastic gradient descent with L1 or L2 or Elastic Net regularization; MultinomialNB: Multinomial naïve Bayes; BernoulliNB: Bernoulli naïve Bayes; SVC_rbf or _poly or _sigmoid: support vector machine with rbf kernel or polynomial kernel or sigmoid kernel
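This record does not describe how the permutation tests were run, so the following is only a sketch of one common design, not the authors' procedure: compute the subgroup AUC minus the whole-test-set AUC, then repeatedly shuffle subgroup membership to build a null distribution for that difference. All names (`auc`, `permutation_test_auc`) and the toy data are hypothetical.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_test_auc(labels, scores, in_group, n_perm=1000, seed=0):
    """Two-sided permutation test for (subgroup AUC - overall AUC).

    ASSUMPTION: subgroup membership is shuffled while labels and scores stay
    fixed; the source may have used a different scheme.
    """
    rng = random.Random(seed)
    overall = auc(labels, scores)

    def diff(mask):
        idx = [i for i, m in enumerate(mask) if m]
        return auc([labels[i] for i in idx], [scores[i] for i in idx]) - overall

    observed = diff(in_group)
    mask = list(in_group)
    extreme = valid = 0
    for _ in range(n_perm):
        rng.shuffle(mask)
        try:
            d = diff(mask)
        except ValueError:  # shuffled subgroup contained a single class; skip it
            continue
        valid += 1
        extreme += abs(d) >= abs(observed)
    # Add-one correction keeps the estimated p value strictly positive.
    return observed, (extreme + 1) / (valid + 1)

# Toy data (synthetic, not from the paper): 120 patients, a subgroup of 30.
rng = random.Random(1)
labels = [1 if i % 3 == 0 else 0 for i in range(120)]
scores = [0.6 * y + rng.random() for y in labels]
in_group = [i < 30 for i in range(120)]
observed, p_val = permutation_test_auc(labels, scores, in_group)
print(f"observed difference = {observed:.4f}, p_val = {p_val:.4f}")
```

With subgroups as small as those in the test split (50 Asian and 56 Hispanic or Latino patients), the subgroup AUC rests on very few positive cases, so large observed differences can still carry wide uncertainty.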
Observed differences between the testing results and each language with p values from permutation tests
| Classifier | English | Other | Spanish |
|---|---|---|---|
| | Observed difference | Observed difference | Observed difference |
| Ridge classifier | 0.0154 | −0.0760 | −0.3422 |
| Perceptron | 0.0182 | −0.0916 | −0.3551 |
| Passive-aggressive | 0.0122 | −0.0555 | −0.2288 |
| kNN | 0.0166 | −0.0768 | −0.3063 |
| Random forest | 0.0037 | −0.0057 | −0.2342 |
| LinearSVC_L1 | 0.0160 | −0.0772 | −0.3428 |
| LinearSVC_L2 | 0.0156 | −0.0763 | −0.3424 |
| SGDClassifier_L1 | 0.0184 | −0.0783 | −0.3347 |
| SGDClassifier_L2 | 0.0187 | −0.0752 | −0.3396 |
| SGDClassifier_EN | 0.0181 | −0.0760 | −0.3283 |
| MultinomialNB | 0.0221 | −0.1210 | −0.2746 |
| BernoulliNB | 0.0076 | −0.0621 | 0.0306 |
| Logistic regression | 0.0145 | −0.0703 | −0.3173 |
| SVC_rbf | 0.0159 | −0.0825 | −0.3332 |
| SVC_poly | 0.0176 | −0.0860 | −0.3633 |
| SVC_sigmoid | −0.0030 | 0.0341 | −0.1814 |
Observed difference: observed difference in AUC when compared with the performance on the entire testing set; p_val: p value, p values less than or equal to 0.05 were highlighted; Passive-aggressive: passive-aggressive classifier; kNN: k-Nearest Neighbors; LinearSVC_L1 or _L2: support vector machine with linear kernel coupled with L1 or L2 regularization; SGDClassifier_L1 or _L2 or _EN: stochastic gradient descent with L1 or L2 or Elastic Net regularization; MultinomialNB: Multinomial naïve Bayes; BernoulliNB: Bernoulli naïve Bayes; SVC_rbf or _poly or _sigmoid: support vector machine with rbf kernel or polynomial kernel or sigmoid kernel
Pairwise comparisons among different racial groups
| Classifier | Asian vs. Black or African American | Asian vs. Hispanic or Latino | Asian vs. Other | Asian vs. White | Black or African American vs. Hispanic or Latino |
|---|---|---|---|---|---|
| | Observed difference | Observed difference | Observed difference | Observed difference | Observed difference |
| Ridge classifier | 0.2572 | 0.0605 | 0.2824 | 0.2988 | −0.1967 |
| Perceptron | 0.2827 | 0.0295 | 0.2722 | 0.2906 | −0.2531 |
| Passive-aggressive | 0.3114 | 0.1439 | 0.3348 | 0.3299 | −0.1674 |
| kNN | 0.0686 | −0.0552 | 0.1170 | 0.1528 | −0.1238 |
| Random forest | −0.0104 | −0.0392 | 0.1370 | 0.0890 | −0.0287 |
| LinearSVC_L1 | 0.2647 | 0.0571 | 0.2822 | 0.2991 | −0.2076 |
| LinearSVC_L2 | 0.2594 | 0.0605 | 0.2820 | 0.2990 | −0.1990 |
| SGDClassifier_L1 | 0.2832 | 0.0714 | 0.2908 | 0.3057 | −0.2118 |
| SGDClassifier_L2 | 0.2813 | 0.0718 | 0.2944 | 0.3091 | −0.2095 |
| SGDClassifier_EN | 0.2858 | 0.0718 | 0.2954 | 0.3086 | −0.2140 |
| MultinomialNB | 0.3468 | 0.0424 | 0.2848 | 0.2848 | −0.3044 |
| BernoulliNB | 0.1940 | 0.2450 | 0.2147 | 0.1986 | 0.0510 |
| Logistic regression | 0.2617 | 0.0814 | 0.2918 | 0.3049 | −0.1804 |
| SVC_rbf | 0.3127 | 0.0774 | 0.2909 | 0.3259 | −0.2352 |
| SVC_poly | 0.3005 | 0.0227 | 0.3066 | 0.3132 | −0.2778 |
| SVC_sigmoid | 0.0402 | 0.0736 | 0.1244 | 0.1551 | 0.0334 |
Observed difference: observed difference in AUC when comparing the performance between the sub-populations; p_val: p value, p values less than or equal to 0.05 were highlighted; Passive-aggressive: passive-aggressive classifier; kNN: k-Nearest Neighbors; LinearSVC_L1 or _L2: support vector machine with linear kernel coupled with L1 or L2 regularization; SGDClassifier_L1 or _L2 or _EN: stochastic gradient descent with L1 or L2 or Elastic Net regularization; MultinomialNB: Multinomial naïve Bayes; BernoulliNB: Bernoulli naïve Bayes; SVC_rbf or _poly or _sigmoid: support vector machine with rbf kernel or polynomial kernel or sigmoid kernel
Pairwise comparisons among different language groups
| Classifier | English vs. Other | English vs. Spanish | Other vs. Spanish |
|---|---|---|---|
| | Observed difference | Observed difference | Observed difference |
| Ridge classifier | −0.0915 | −0.3576 | −0.2661 |
| Perceptron | −0.1098 | −0.3733 | −0.2635 |
| Passive-aggressive | −0.0677 | −0.2410 | −0.1733 |
| kNN | −0.0934 | −0.3230 | −0.2295 |
| Random forest | −0.0094 | −0.2379 | −0.2285 |
| LinearSVC_L1 | −0.0931 | −0.3587 | −0.2656 |
| LinearSVC_L2 | −0.0919 | −0.3580 | −0.2661 |
| SGDClassifier_L1 | −0.0967 | −0.3531 | −0.2564 |
| SGDClassifier_L2 | −0.0939 | −0.3583 | −0.2643 |
| SGDClassifier_EN | −0.0940 | −0.3463 | −0.2523 |
| MultinomialNB | −0.1432 | −0.2967 | −0.1535 |
| BernoulliNB | −0.0697 | 0.0230 | 0.0927 |
| Logistic regression | −0.0849 | −0.3318 | −0.2469 |
| SVC_rbf | −0.0984 | −0.3491 | −0.2507 |
| SVC_poly | −0.1035 | −0.3809 | −0.2773 |
| SVC_sigmoid | 0.0372 | −0.1784 | −0.2155 |
Observed difference: observed difference in AUC when comparing the performance between the sub-populations; p_val: p value, p values less than or equal to 0.05 were highlighted; Passive-aggressive: passive-aggressive classifier; kNN: k-Nearest Neighbors; LinearSVC_L1 or _L2: support vector machine with linear kernel coupled with L1 or L2 regularization; SGDClassifier_L1 or _L2 or _EN: stochastic gradient descent with L1 or L2 or Elastic Net regularization; MultinomialNB: Multinomial naïve Bayes; BernoulliNB: Bernoulli naïve Bayes; SVC_rbf or _poly or _sigmoid: support vector machine with rbf kernel or polynomial kernel or sigmoid kernel
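The pairwise tables appear to be consistent with the per-group tables: for the Ridge classifier, each "A vs. B" value equals the per-group difference for B minus that for A, up to rounding. This sign convention is inferred from the numbers and is not stated in the source. A short Python check under that assumption (the helper `max_deviation` is illustrative):

```python
# Ridge classifier rows copied from the per-group tables (difference vs. whole test set).
race = {"Asian": -0.2812, "Black or African American": -0.0241,
        "Hispanic or Latino": -0.2208, "Other": 0.0011, "White": 0.0175}
language = {"English": 0.0154, "Other": -0.0760, "Spanish": -0.3422}

# Ridge classifier rows copied from the pairwise tables, keyed as (first, second).
race_pairs = {
    ("Asian", "Black or African American"): 0.2572,
    ("Asian", "Hispanic or Latino"): 0.0605,
    ("Asian", "Other"): 0.2824,
    ("Asian", "White"): 0.2988,
    ("Black or African American", "Hispanic or Latino"): -0.1967,
}
language_pairs = {
    ("English", "Other"): -0.0915,
    ("English", "Spanish"): -0.3576,
    ("Other", "Spanish"): -0.2661,
}

def max_deviation(per_group, pairwise):
    """Largest gap between reported pairwise values and (second - first).

    ASSUMPTION: 'A vs. B' is read as difference(B) - difference(A).
    """
    return max(abs((per_group[b] - per_group[a]) - reported)
               for (a, b), reported in pairwise.items())

TOLERANCE = 2e-4  # room for rounding of four-decimal inputs
for label, groups, pairs in [("race", race, race_pairs),
                             ("language", language, language_pairs)]:
    deviation = max_deviation(groups, pairs)
    print(f"{label}: max deviation {deviation:.4f}, consistent: {deviation <= TOLERANCE}")
```

Because the whole-test-set AUC cancels in the subtraction, the pairwise values carry no information beyond the per-group differences; only their p values, which are not shown here, would differ.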