Yingying Hu1, Ruijia Chen1, Haibing Gao2, Haitao Lin3, Jinye Wang1, Xiaowei Wang3, Jingfeng Liu4, Yongyi Zeng5.
Abstract
Spontaneous bacterial peritonitis (SBP) is a life-threatening complication in patients with cirrhosis. We aimed to develop an explainable machine learning model for early prediction of SBP and interpretation of its predictions. We used the CatBoost algorithm to construct MODEL-1 with 46 variables. After dimensionality reduction, we constructed MODEL-2. We calculated and compared the sensitivity and negative predictive value (NPV) of MODEL-1 and MODEL-2. Finally, we used the SHAP (SHapley Additive exPlanations) method to explain the model's predictions. MODEL-2 (AUROC: 0.822; 95% confidence interval [CI] 0.783-0.856), like MODEL-1 (AUROC: 0.822; 95% CI 0.784-0.856), predicted the risk of SBP in cirrhotic patients with ascites well. The 6 most influential predictive variables were total protein, C-reactive protein, prothrombin activity, cholinesterase, lymphocyte ratio and apolipoprotein A1. As binary classifiers, MODEL-1 had a sensitivity of 0.894 and an NPV of 0.885, while MODEL-2 had a sensitivity of 0.927 and an NPV of 0.904. We applied the CatBoost algorithm to establish a practical and explainable prediction model for the risk of SBP in cirrhotic patients with ascites. We also identified 6 important variables closely related to the occurrence of SBP.
Year: 2021 PMID: 34737270 PMCID: PMC8569162 DOI: 10.1038/s41598-021-00218-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Demographic and clinical features of the training set.
| Variables | Spontaneous bacterial peritonitis: No (n = 592) | Spontaneous bacterial peritonitis: Yes (n = 359) | P value |
|---|---|---|---|
| Gender | | | 0.349 |
| Male | 419 (70.8%) | 265 (73.8%) | |
| Female | 173 (29.2%) | 94 (26.2%) | |
| Age, years | 52.4 ± 12.1 | 52.6 ± 12.7 | 0.749 |
| Etiology of cirrhosis | | | 0.106 |
| HBV-related cirrhosis | 368 (62.2%) | 209 (58.2%) | |
| Alcoholic cirrhosis | 75 (12.7%) | 47 (13.1%) | |
| HBV- and alcohol-related cirrhosis | 45 (7.6%) | 45 (12.5%) | |
| Cryptogenic cirrhosis | 39 (6.6%) | 29 (8.1%) | |
| Autoimmune cirrhosis | 36 (6.1%) | 12 (3.3%) | |
| Other | 29 (4.9%) | 17 (4.8%) | |
| | | | 0.768 |
| No | 539 (91.0%) | 324 (90.3%) | |
| Yes | 53 (9.0%) | 35 (9.7%) | |
| | | | < 0.001 |
| No | 565 (95.4%) | 319 (88.9%) | |
| Yes | 27 (4.6%) | 40 (11.1%) | |
| | | | 0.836 |
| No | 538 (90.9%) | 324 (90.3%) | |
| Yes | 54 (9.1%) | 35 (9.7%) | |
| | | | 0.426 |
| No | 546 (92.2%) | 325 (90.5%) | |
| Yes | 46 (7.8%) | 34 (9.5%) | |
| | | | 0.32 |
| No | 471 (79.6%) | 275 (76.6%) | |
| Yes | 121 (20.4%) | 84 (23.4%) | |
| | | | 0.119 |
| No | 452 (76.4%) | 257 (71.6%) | |
| Yes | 140 (23.6%) | 102 (28.4%) | |
| | | | 0.51 |
| No | 406 (68.6%) | 238 (66.3%) | |
| Yes | 186 (31.4%) | 121 (33.7%) | |
| No | 571 (96.5%) | 305 (85.0%) | < 0.001 |
| Yes | 21 (3.5%) | 54 (15.0%) | |
| C-reactive protein (mg/L) | 7.71 ± 13.7 | 18.4 ± 30.7 | < 0.001 |
| Procalcitonin (ng/ml) | 0.425 ± 2.55 | 0.612 ± 2.60 | 0.279 |
| White blood cell count (× 10⁹/L) | 4.48 ± 2.26 | 5.70 ± 3.71 | < 0.001 |
| Red blood cell count (× 10¹²/L) | 3.84 ± 0.784 | 3.64 ± 0.825 | < 0.001 |
| Mean corpuscular volume (fL) | 93.4 ± 10.0 | 93.7 ± 10.2 | 0.702 |
| Hematocrit (%) | 34.6 ± 9.77 | 31.7 ± 10.7 | < 0.001 |
| Neutrophil ratio (%) | 60.2 ± 12.2 | 66.3 ± 12.9 | < 0.001 |
| Lymphocyte ratio (%) | 29.4 ± 11.0 | 23.0 ± 11.4 | < 0.001 |
| Monocyte ratio (%) | 7.53 ± 2.59 | 8.10 ± 3.04 | 0.00357 |
| Hemoglobin (g/L) | 119 ± 26.0 | 113 ± 26.2 | 0.00219 |
| Mean hemoglobin (pg) | 31.0 ± 3.98 | 31.3 ± 3.98 | 0.186 |
| Mean hemoglobin concentration (g/L) | 331 ± 17.4 | 334 ± 18.3 | 0.0102 |
| Platelet (× 10⁹/L) | 95.7 ± 56.2 | 98.5 ± 70.2 | 0.51 |
| Mean platelet volume (fL) | 10.6 ± 1.33 | 10.5 ± 1.48 | 0.095 |
| Total protein (g/L) | 64.2 ± 14.1 | 48.0 ± 24.3 | < 0.001 |
| Alanine aminotransferase (U/L) | 159 ± 362 | 222 ± 494 | 0.0356 |
| Aspartate aminotransferase (U/L) | 158 ± 314 | 216 ± 394 | 0.017 |
| Gamma glutamyl transpeptidase (U/L) | 169 ± 245 | 181 ± 298 | 0.526 |
| Alkaline phosphatase (U/L) | 142 ± 85.7 | 154 ± 105 | 0.0814 |
| Total bilirubin (µmol/L) | 66.4 ± 88.0 | 134 ± 159 | < 0.001 |
| Direct bilirubin (µmol/L) | 37.4 ± 57.3 | 77.5 ± 94.9 | < 0.001 |
| Total bile acid (µmol/L) | 80.3 ± 89.5 | 127 ± 126 | < 0.001 |
| Cholinesterase (U/L) | 3860 ± 1690 | 2870 ± 1290 | < 0.001 |
| Lactate dehydrogenase (U/L) | 252 ± 207 | 278 ± 170 | 0.0415 |
| Creatine kinase (U/L) | 134 ± 163 | 157 ± 190 | 0.0548 |
| Total cholesterol (mmol/L) | 3.94 ± 1.48 | 3.39 ± 1.60 | < 0.001 |
| Triglyceride (mmol/L) | 1.16 ± 0.742 | 1.18 ± 1.03 | 0.74 |
| High-density lipoprotein (mmol/L) | 0.965 (0.518) | 0.674 (0.498) | < 0.001 |
| Apolipoprotein A1 (g/L) | 1.07 ± 0.403 | 0.792 ± 0.404 | < 0.001 |
| Apolipoprotein B (g/L) | 0.793 ± 0.330 | 0.747 ± 0.316 | 0.0339 |
| Urea (mmol/L) | 4.90 ± 3.22 | 5.29 ± 4.13 | 0.127 |
| Serum creatinine (µmol/L) | 76.1 ± 66.2 | 79.2 ± 67.7 | 0.498 |
| Uric acid (µmol/L) | 307 ± 116 | 289 ± 135 | 0.0332 |
| Prothrombin activity (%) | 69.6 ± 19.3 | 56.2 ± 19.7 | < 0.001 |
| Sodium (mmol/L) | 139 ± 3.46 | 136 ± 5.04 | < 0.001 |
Data are presented as n (%) or mean ± SD.
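The source does not name the statistical tests behind the P values in this table. As a plausibility check only, the sketch below assumes a chi-square test with Yates' continuity correction for a 2 × 2 table and applies it to the Male/Female counts. The function name `chi2_yates_2x2` is illustrative, and the choice of test is an assumption, not something the source confirms. Under that assumption the result lands close to the 0.349 reported with the Male/Female rows.

```python
import math


def chi2_yates_2x2(a, b, c, d):
    """Yates-corrected chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    NOTE: the choice of test is an assumption; the source does not state it.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Yates' correction shrinks |ad - bc| by n/2 before squaring.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    stat = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts from the table: rows = Male, Female; columns = SBP No, SBP Yes.
stat, p_value = chi2_yates_2x2(419, 265, 173, 94)
print(f"chi2 = {stat:.3f}, p = {p_value:.3f}")
```

The same function can be pointed at any of the No/Yes row pairs in the table, since each forms a 2 × 2 table with the two SBP columns.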
Figure 1. Importance matrix plot of the CatBoost model.
Figure 2. SHAP summary plot of the CatBoost model.
Figure 3. Receiver operating characteristic (ROC) curves and calibration curves for MODEL-1 and MODEL-2 (validation set). (A) ROC curves showing the prediction performance of MODEL-1 and MODEL-2. (B) Calibration curves showing the agreement between the predicted risk and the actual risk for MODEL-1 and MODEL-2.
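For readers unfamiliar with how an AUROC and its confidence interval can be obtained, here is a minimal sketch. It computes the AUROC as the probability that a randomly chosen positive case is scored above a randomly chosen negative case, and it estimates a 95% CI with a percentile bootstrap. The source does not say how its CI was derived, so the bootstrap is an assumption, and the labels and scores below are synthetic, not study data.

```python
import random


def auroc(y_true, y_score):
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, y_score, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUROC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples containing both classes
            stats.append(auroc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic example: positives tend to receive higher scores than negatives.
rng = random.Random(42)
y_true = [1 if rng.random() < 0.4 else 0 for _ in range(120)]
y_score = [min(1.0, max(0.0, 0.35 + 0.25 * y + rng.gauss(0, 0.2))) for y in y_true]

point = auroc(y_true, y_score)
lo, hi = bootstrap_ci(y_true, y_score)
print(f"AUROC = {point:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

The pairwise comparison is quadratic in the sample size, which is acceptable for a sketch but slow for large cohorts; a rank-based formula gives the same value faster.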
Summary of prediction results of models on the validation set.
| Value | MODEL-1 | MODEL-2 |
|---|---|---|
| AUC | 0.822 | 0.822 |
| 95% CI of the AUC | 0.784–0.856 | 0.783–0.856 |
| F2-score | 0.840 | 0.815 |
| Threshold | 0.376 | 0.306 |
| Sensitivity | 0.894 | 0.927 |
| Specificity | 0.543 | 0.457 |
| Positive predictive value | 0.565 | 0.532 |
| Negative predictive value | 0.885 | 0.904 |
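The threshold-dependent rows of this table (sensitivity, specificity, predictive values, F2-score) all follow from a confusion matrix at a chosen probability cut-off. The sketch below shows the standard definitions. The labels and probabilities are made up for illustration, the function names are hypothetical, and treating a probability at or above the threshold as positive is an assumption. Only the two cut-off values, 0.376 and 0.306, are taken from the table.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, TN, FN when predicting positive for prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0  # assumed convention: >= is positive
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn, beta=2.0):
    """Standard metrics; assumes no denominator is zero."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    b2 = beta ** 2
    # F-beta weights recall (sensitivity) beta times as much as precision (PPV).
    f_beta = (1 + b2) * ppv * sensitivity / (b2 * ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_beta": f_beta,
    }


# Made-up labels and predicted probabilities (NOT study data).
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.90, 0.40, 0.33, 0.50, 0.35, 0.10]

for threshold in (0.376, 0.306):  # the two cut-offs listed in the table
    metrics = binary_metrics(*confusion_counts(y_true, y_prob, threshold))
    print(threshold, {k: round(v, 3) for k, v in metrics.items()})
```

On this toy data, lowering the cut-off raises sensitivity and NPV at the cost of specificity, the same direction of trade-off seen between the two columns of the table.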
Figure 4. SHAP force plots explaining the CatBoost model's predictions for 4 patients from the validation set. Protein, total protein; CRP, C-reactive protein; PTA, prothrombin activity; LYR, lymphocyte ratio; Apoa1, apolipoprotein A1.
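A force plot relies on the additivity of Shapley values: a base value plus the per-feature contributions equals the model output for that patient. The sketch below computes exact Shapley values by brute force for a toy three-feature risk function to make that property concrete. The value function and all of its numbers are invented for illustration; only the feature abbreviations are borrowed from the caption. Enumerating every subset is practical only for a handful of features, so this is not how contributions would be computed for a real tree ensemble.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating all subsets (exponential cost)."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_risk(known):
    """Hypothetical risk when only the features in `known` are revealed."""
    risk = 0.30  # base value: made-up average risk
    if "Protein" in known:
        risk += 0.20
    if "CRP" in known:
        risk += 0.10
    if "PTA" in known:
        risk -= 0.05
    if "Protein" in known and "CRP" in known:
        risk += 0.06  # made-up interaction, split equally by Shapley
    return risk


features = ["Protein", "CRP", "PTA"]
phi = shapley_values(toy_risk, features)
base = toy_risk(set())
full = toy_risk(set(features))

print({k: round(v, 3) for k, v in phi.items()})
print(f"base + sum(phi) = {base + sum(phi.values()):.3f}, model output = {full:.3f}")
```

The interaction term is shared equally between the two features involved, and the contributions sum exactly to the gap between the base value and the full prediction.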
Figure 5. Patient screening process for model construction.