Kuo-Hua Lee, Yuan-Chia Chu, Ming-Tsun Tsai, Wei-Cheng Tseng, Yao-Ping Lin, Shuo-Ming Ou, Der-Cherng Tarng.
Abstract
Sepsis may lead to kidney function decline in patients with chronic kidney disease (CKD), and the deleterious effect may persist in patients who survive sepsis. We used a machine learning approach to predict the risk of end-stage renal disease (ESRD) in sepsis survivors. A total of 11,661 sepsis survivors were identified from a single-center database of 112,628 CKD patients between 2010 and 2018. During a median follow-up of 3.5 years, 1366 (11.7%) sepsis survivors developed ESRD after hospital discharge. We adopted the random forest, extra trees, extreme gradient boosting, light gradient boosting machine (LGBM), and gradient boosting decision tree (GBDT) algorithms to predict the risk of ESRD development among these patients. GBDT yielded the highest area under the receiver operating characteristic curve (0.879), followed by LGBM (0.868) and extra trees (0.865). The GBDT model revealed the strong effect of an estimated glomerular filtration rate <25 mL/min/1.73 m² at discharge in predicting ESRD development. Hemoglobin and proteinuria were also important predictors. Based on a large-scale dataset, we established a machine learning model that computes the risk of ESRD among sepsis survivors with CKD. External validation is required to evaluate the generalizability of this model.
Keywords: artificial intelligence; chronic kidney disease; end-stage renal disease; machine learning; sepsis
Year: 2022 PMID: 35327348 PMCID: PMC8945427 DOI: 10.3390/biomedicines10030546
Source DB: PubMed Journal: Biomedicines ISSN: 2227-9059
Figure 1. Study flowchart. This retrospective cohort consisted of 142,624 adults who received a diagnosis of chronic kidney disease between 2010 and 2018. A total of 14,234 patients with concomitant sepsis were treated in the hospital. After excluding patients who died during admission and those with insufficient data, we enrolled 11,661 patients who survived sepsis to discharge. Abbreviations: CKD, chronic kidney disease; eGFR, estimated glomerular filtration rate; ESRD, end-stage renal disease.
Clinical features of the patients in the training and validation sets used in machine learning models.
| Characteristic | All | Training Set | Validation Set |
|---|---|---|---|
| Demographic and Clinical Characteristics | | | |
| Age, years | 76.7 (63.3, 85.5) | 76.7 (63.3, 85.5) | 76.7 (63.1, 85.6) |
| Male sex, n (%) | 6927 (59.4) | 4865 (59.6) | 2062 (58.9) |
| Smoking, n (%) | 4289 (36.8) | 3009 (36.9) | 1280 (36.6) |
| Alcohol consumption, n (%) | 3291 (28.2) | 2318 (28.4) | 973 (27.8) |
| ICU admission, n (%) | 6367 (54.6) | 4457 (54.6) | 1910 (54.6) |
| Use of mechanical ventilators, n (%) | 4291 (36.8) | 3004 (36.8) | 1287 (36.8) |
| Use of inotropic agents, n (%) | 5562 (47.7) | 3893 (47.7) | 1669 (47.7) |
| Underlying Comorbidities | | | |
| Hypertension, n (%) | 7540 (64.7) | 5270 (64.6) | 2270 (64.9) |
| Diabetes mellitus, n (%) | 6046 (51.8) | 4234 (51.9) | 1812 (51.8) |
| Coronary artery disease, n (%) | 3576 (30.7) | 2511 (30.8) | 1065 (30.4) |
| Heart failure, n (%) | 2551 (21.9) | 1792 (22.0) | 759 (21.7) |
| Peptic ulcer disease, n (%) | 2822 (24.2) | 1990 (24.4) | 832 (23.8) |
| COPD, n (%) | 2267 (19.4) | 1606 (19.7) | 661 (18.9) |
| Malignancy, n (%) | 4886 (41.9) | 3422 (41.9) | 1464 (41.8) |
| Charlson comorbidity index | 4 (3, 6) | 4 (3, 6) | 4 (2, 6) |
| Laboratory Data at Hospital Discharge | | | |
| White blood cells, /mm³ | 8100 (5700, 11,900) | 8100 (5700, 11,900) | 8100 (5700, 12,000) |
| HGB, g/dL | 10.5 (9.3, 12.0) | 10.5 (9.3, 12.0) | 10.5 (9.3, 12.0) |
| Total cholesterol, mg/dL | 160.0 (134.0, 188.0) | 160.0 (134.0, 189.0) | 159.0 (133.0, 187.0) |
| LDL-C, mg/dL | 91.0 (70.0, 114.0) | 91.0 (70.0, 115.0) | 91.0 (69.0, 113.0) |
| HDL-C, mg/dL | 41.0 (32.0, 51.0) | 41.0 (32.0, 51.0) | 41.0 (32.0, 51.0) |
| Glucose, mg/dL | 116.0 (95.0, 156.0) | 116.0 (94.0, 155.0) | 117.0 (95.0, 157.0) |
| Uric acid, mg/dL | 5.5 (4.1, 7.1) | 5.5 (4.1, 7.1) | 5.6 (4.1, 7.1) |
| HbA1c, % | 7.2 (6.1, 10.3) | 7.1 (6.1, 10.3) | 7.2 (6.1, 10.5) |
| Albumin, g/dL | 3.0 (2.6, 3.4) | 3.0 (2.6, 3.4) | 3.0 (2.6, 3.4) |
| Blood urea nitrogen, mg/dL | 24.0 (14.0, 51.0) | 24.0 (14.0, 51.0) | 24.0 (14.0, 50.0) |
| Creatinine, mg/dL | 1.1 (0.7, 2.1) | 1.1 (0.7, 2.2) | 1.1 (0.7, 2.1) |
| eGFR, mL/min/1.73 m² * | 59.3 (35.5, 83.6) | 59.2 (33.4, 83.6) | 59.3 (35.1, 83.2) |
| C-reactive protein, mg/dL | 3.4 (1.2, 9.0) | 3.4 (1.2, 9.1) | 3.3 (1.1, 8.7) |
| Sodium, mmol/L | 139.0 (135.0, 142.0) | 139.0 (135.0, 142.0) | 139.0 (135.0, 142.0) |
| Potassium, mmol/L | 4.1 (3.6, 4.6) | 4.1 (3.6, 4.6) | 4.1 (3.6, 4.6) |
| Chloride, mmol/L | 103.0 (98.0, 106.0) | 103.0 (98.0, 106.0) | 103.0 (98.0, 106.0) |
| Calcium, mg/dL | 8.5 (8.0, 9.0) | 8.5 (8.0, 9.0) | 8.5 (8.0, 9.0) |
| Phosphate, mg/dL | 3.3 (2.6, 4.0) | 3.3 (2.6, 4.0) | 3.3 (2.7, 4.1) |
| Bicarbonate, mmol/L | 23.7 (19.3, 28.0) | 23.7 (19.3, 28.0) | 23.8 (19.4, 28.0) |
| INR | 1.1 (1.0, 1.2) | 1.1 (1.0, 1.2) | 1.1 (1.0, 1.2) |
| aPTT, seconds | 29.9 (27.1, 34.0) | 29.9 (27.2, 34.2) | 29.9 (27.1, 33.8) |
| D-dimer, µg/mL | 3.6 (1.6, 8.1) | 3.6 (1.5, 7.7) | 3.9 (1.8, 9.3) |
| Lactate dehydrogenase, U/L | 253.0 (196.0, 361.0) | 252.0 (196.0, 361.0) | 255.0 (197.0, 361.0) |
| NT-pro-BNP, pg/mL | 3146.0 (836.5, 11,617.0) | 3142.0 (823.8, 11,648.5) | 3185.0 (856.8, 11,580.8) |
| Total bilirubin, mg/dL | 0.6 (0.4, 1.1) | 0.6 (0.4, 1.1) | 0.6 (0.4, 1.1) |
| Alanine transaminase, U/L | 25.0 (15.0, 44.0) | 25.0 (15.0, 45.0) | 25.0 (15.0, 44.0) |
| Aspartate transaminase, U/L | 29.0 (20.0, 51.0) | 29.0 (20.0, 51.0) | 29.0 (20.0, 50.0) |
| Alkaline phosphatase, U/L | 95.0 (70.0, 147.0) | 95.0 (69.0, 147.0) | 94.0 (70.0, 147.0) |
| Gamma-glutamyl transferase, U/L | 54.0 (25.0, 125.0) | 53.0 (25.0, 125.0) | 54.0 (24.0, 126.0) |
| UPCR, mg/mg | 0.43 (0.13, 1.72) | 0.44 (0.13, 1.73) | 0.40 (0.12, 1.67) |
| Concomitant Medications | | | |
| Calcium channel blockers, n (%) | 6412 (55.0) | 4517 (55.3) | 1895 (54.2) |
| Beta-blockers, n (%) | 5164 (44.3) | 3636 (44.5) | 1528 (43.7) |
| Alpha-blockers, n (%) | 3672 (31.5) | 2592 (31.8) | 1080 (30.9) |
| RAS inhibitors, n (%) | 5710 (49.0) | 3969 (48.6) | 1741 (49.8) |
| Anti-platelets, n (%) | 4472 (38.4) | 3154 (38.6) | 1318 (37.7) |
| Nitrates, n (%) | 3195 (27.4) | 2236 (27.4) | 959 (27.4) |
| Warfarin, n (%) | 758 (6.5) | 538 (6.6) | 220 (6.3) |
| Statins, n (%) | 2903 (24.9) | 2028 (24.8) | 875 (25.0) |
| Diuretics, n (%) | 2414 (20.7) | 1690 (20.7) | 724 (20.7) |
| NSAID, n (%) | 5550 (47.6) | 3885 (47.6) | 1665 (47.6) |
| COX-2 inhibitors, n (%) | 1633 (14.0) | 1143 (14.0) | 490 (14.0) |
| Metformin, n (%) | 1703 (14.6) | 1192 (14.6) | 511 (14.6) |
| Sulfonylurea, n (%) | 1085 (9.3) | 760 (9.3) | 325 (9.3) |
| Meglitinide analogues, n (%) | 1050 (9.0) | 735 (9.0) | 315 (9.0) |
| SGLT2 inhibitors, n (%) | 47 (0.4) | 33 (0.4) | 14 (0.4) |
| Dipeptidyl peptidase-4 inhibitors, n (%) | 1330 (11.4) | 931 (11.4) | 399 (11.4) |
| Insulin, n (%) | 5543 (47.5) | 3895 (47.7) | 1648 (47.1) |
Data are presented as n (%) or median (interquartile range). * Calculated by the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) creatinine equation. Abbreviations: ICU, intensive care unit; LDL-C, low-density lipoprotein cholesterol; HDL-C, high-density lipoprotein cholesterol; HbA1c, glycated hemoglobin; eGFR, estimated glomerular filtration rate; INR, international normalized ratio; NT-pro-BNP, N-terminal pro-brain natriuretic peptide; COPD, chronic obstructive pulmonary disease; HGB, hemoglobin; UPCR, urine protein/creatinine ratio; RAS, renin–angiotensin system; NSAID, nonsteroidal anti-inflammatory drug; COX, cyclooxygenase; SGLT2, sodium–glucose cotransporter 2.
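The footnote names the CKD-EPI creatinine equation but not which version was used. As a minimal sketch, assuming the 2009 form and omitting its race coefficient (both are assumptions, not statements from the source), eGFR could be computed as follows. The function name `ckd_epi_2009` is illustrative.

```python
def ckd_epi_2009(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """eGFR in mL/min/1.73 m^2 from serum creatinine (mg/dL), age, and sex.

    Assumes the 2009 CKD-EPI creatinine equation; the race coefficient of
    that equation is deliberately left out (assumption for this sketch).
    """
    kappa = 0.7 if female else 0.9          # sex-specific creatinine scaling
    alpha = -0.329 if female else -0.411    # exponent below the kappa threshold
    ratio = scr_mg_dl / kappa
    egfr = (
        141.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.209
        * 0.993 ** age_years
    )
    if female:
        egfr *= 1.018
    return egfr


if __name__ == "__main__":
    # Hypothetical patient: 75-year-old man, creatinine 2.5 mg/dL at discharge.
    print(round(ckd_epi_2009(2.5, 75, female=False), 1))
```

Under these assumptions the hypothetical patient falls just below the 25 mL/min/1.73 m² level that the abstract highlights as a strong predictor.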
Model performance in predicting risk for end-stage renal disease among the sepsis survivors.
| Model | AUC | Accuracy | F1 | Precision | Recall | Average Precision | Specificity | Sensitivity |
|---|---|---|---|---|---|---|---|---|
| GBDT | 0.879 | 0.891 | 0.716 | 0.853 | 0.617 | 0.784 | 0.969 | 0.617 |
| LGBM | 0.868 | 0.889 | 0.712 | 0.851 | 0.612 | 0.782 | 0.969 | 0.612 |
| Extra-trees | 0.865 | 0.878 | 0.661 | 0.876 | 0.531 | 0.754 | 0.978 | 0.531 |
| Random forest | 0.864 | 0.860 | 0.565 | 0.927 | 0.406 | 0.765 | 0.991 | 0.406 |
| XGBoost | 0.859 | 0.885 | 0.708 | 0.820 | 0.623 | 0.769 | 0.961 | 0.623 |
| Logistic regression | 0.854 | 0.869 | 0.665 | 0.780 | 0.580 | 0.733 | 0.953 | 0.580 |
Abbreviations: AUC, area under the receiver operating characteristic curve; GBDT, gradient boosting decision tree; LGBM, light gradient boosting machine; XGBoost, extreme gradient boosting.
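For readers less familiar with the threshold-based metrics in the table above, the following sketch derives them from a confusion matrix. The counts are hypothetical and are not taken from the study. Note that recall and sensitivity are the same quantity by definition.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # ESRD predicted, ESRD observed
    fp: int  # ESRD predicted, no ESRD observed
    tn: int  # no ESRD predicted, no ESRD observed
    fn: int  # no ESRD predicted, ESRD observed


def threshold_metrics(c: Confusion) -> dict:
    sensitivity = c.tp / (c.tp + c.fn)   # also called recall
    specificity = c.tn / (c.tn + c.fp)
    precision = c.tp / (c.tp + c.fp)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "recall": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical validation-set counts, chosen only for illustration.
example = threshold_metrics(Confusion(tp=60, fp=10, tn=390, fn=40))

if __name__ == "__main__":
    for name, value in example.items():
        print(f"{name:12s} {value:.3f}")
```

With a rare outcome, a model can reach high accuracy and specificity while missing many true cases, which is why F1, precision, and recall are reported alongside accuracy.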
Figure 2. Receiver operating characteristic curves (a) and precision–recall curves (b) of machine learning models. GBDT yielded the highest area under the receiver operating characteristic curve and the highest average precision; by AUC it was followed by LGBM and extra trees. Abbreviations: ROC, receiver operating characteristic; AUC, area under the curve; GBDT, gradient boosting decision tree; XGBoost, extreme gradient boosting; LGBM, light gradient boosting machine; PR, precision–recall; AP, average precision.
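The comparison summarized in this figure can be sketched with scikit-learn on synthetic data. This is not the study's pipeline. The 70/30 stratified split, the hyperparameters, and the synthetic dataset are assumptions, and LGBM and XGBoost are left out so that the example depends on scikit-learn only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cohort: about 12% positives, 20 features.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=6,
    weights=[0.88, 0.12], random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "GBDT": GradientBoostingClassifier(random_state=0),
    "Extra-trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    prob = model.predict_proba(X_val)[:, 1]  # predicted probability of the positive class
    results[name] = {
        "auc": roc_auc_score(y_val, prob),           # area under the ROC curve
        "ap": average_precision_score(y_val, prob),  # summary of the PR curve
    }

# Rank models by AUC, highest first.
for name, r in sorted(results.items(), key=lambda kv: kv[1]["auc"], reverse=True):
    print(f"{name:20s} AUC={r['auc']:.3f}  AP={r['ap']:.3f}")
```

Both metrics are computed from predicted probabilities rather than hard labels, so they do not depend on a classification threshold.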
Figure 3. Feature importance plot (a) and SHAP summary plot (b) of the top 25 clinical features predicting end-stage renal disease development in the GBDT model. Abbreviations: SHAP, Shapley additive explanation; eGFR, estimated glomerular filtration rate; HGB, hemoglobin; UPCR, urine protein/creatinine ratio; CCI, Charlson comorbidity index; DM, diabetes mellitus; HbA1C, glycohemoglobin; OHA, oral hypoglycemic agents; CHOL, cholesterol; COPD, chronic obstructive pulmonary disease; CCB, calcium channel blockers; CAD, coronary artery disease; PUD, peptic ulcer disease; HF, heart failure; RAS, renin–angiotensin system; HTN, hypertension.
Figure 4. SHAP dependence plots of the GBDT model. Panels a–c show the impact of eGFR (a), HGB (b), and UPCR (c) on the prediction model's output. The predicted risk of end-stage renal disease increases where the SHAP value of a feature exceeds zero, as indicated by the red lines. Panels d–f show the interaction effects between eGFR and the use of insulin (d), eGFR and the use of RAS inhibitors (e), and eGFR and HTN (f) in the prediction model. The dotted lines mark a SHAP value of zero. Features that push the prediction higher are shown in red, and those that push it lower in blue. The gray area shows the distribution of patients by eGFR level. Abbreviations: SHAP, Shapley additive explanation; eGFR, estimated glomerular filtration rate; HGB, hemoglobin; UPCR, urine protein/creatinine ratio; RAS, renin–angiotensin system; HTN, hypertension; GBDT, gradient boosting decision tree.
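To make the meaning of a SHAP value concrete, the sketch below computes exact Shapley values by brute force for a toy three-feature risk function. The function `toy_risk`, its coefficients, and the background rows are all hypothetical. The study's GBDT model and the SHAP library are not used here.

```python
from itertools import combinations
from math import factorial


def toy_risk(z):
    """Hypothetical risk score of (eGFR, hemoglobin, UPCR); not the study's model."""
    egfr, hgb, upcr = z
    low_egfr = max(0.0, 60.0 - egfr)
    return 0.02 * low_egfr + 0.05 * max(0.0, 12.0 - hgb) + 0.1 * upcr + 0.005 * low_egfr * upcr


def shapley_values(f, x, background):
    """Exact Shapley attribution of f(x) relative to the mean of f over background."""
    n = len(x)

    def value(subset):
        # Mean prediction when features in `subset` take the patient's values
        # and all other features take the values of each background row.
        total = 0.0
        for row in background:
            total += f([x[i] if i in subset else row[i] for i in range(n)])
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                contrib += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(contrib)
    return phi


# Hypothetical background patients and one patient to explain.
background = [[60.0, 10.5, 0.4], [35.0, 9.3, 0.13], [84.0, 12.0, 1.7]]
patient = [20.0, 9.0, 2.0]

phi = shapley_values(toy_risk, patient, background)
baseline = sum(toy_risk(row) for row in background) / len(background)

if __name__ == "__main__":
    for name, p in zip(["eGFR", "HGB", "UPCR"], phi):
        print(f"{name:5s} {p:+.3f}")
    print("baseline + sum(phi) =", baseline + sum(phi), " f(patient) =", toy_risk(patient))
```

The attributions add up to the difference between the patient's prediction and the baseline. A positive value for a feature therefore means that the feature pushes this patient's predicted risk above the average, which is the reading used in the dependence plots.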
Figure 5. Receiver operating characteristic curves of the GBDT prediction model and the kidney failure risk equation (KFRE) predicting 2-year risk of ESRD (a) and 5-year risk of ESRD (b). The GBDT model had a higher AUC, accuracy, and average precision than the KFRE in predicting 2-year and 5-year risk of ESRD (c). The diagonal dotted line represents an AUC of 0.5. Abbreviations: ROC, receiver operating characteristic; AUC, area under the curve; GBDT, gradient boosting decision tree; KFRE, kidney failure risk equation; ESRD, end-stage renal disease.
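Comparing two risk scores at a fixed horizon requires turning follow-up data into a binary outcome first. The source does not describe how censoring was handled, so the sketch below makes one common assumption: patients censored before the horizon are dropped because their outcome is unknown. All follow-up times and both score vectors are made up, and `score_a` and `score_b` are only stand-ins for two competing models.

```python
def horizon_labels(times, events, horizon):
    """Return (kept indices, binary labels) for the outcome 'ESRD within horizon years'.

    events[i] is True if ESRD was observed at times[i], False if follow-up
    ended without ESRD (censored).
    """
    keep, labels = [], []
    for i, (t, observed) in enumerate(zip(times, events)):
        if observed and t <= horizon:
            keep.append(i)
            labels.append(1)      # ESRD occurred within the horizon
        elif t >= horizon:
            keep.append(i)
            labels.append(0)      # known to be ESRD-free through the horizon
        # Otherwise censored before the horizon: outcome unknown, patient dropped.
    return keep, labels


def auc(labels, scores):
    """AUC as the probability that a case outranks a non-case (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical follow-up (years), event flags, and two risk scores.
times = [0.5, 1.5, 3.0, 4.0, 6.0, 1.0, 7.0, 2.5]
events = [True, True, True, False, False, False, False, True]
score_a = [0.9, 0.8, 0.6, 0.3, 0.2, 0.5, 0.1, 0.7]
score_b = [0.45, 0.9, 0.3, 0.4, 0.5, 0.2, 0.1, 0.35]

comparison = {}
for horizon in (2, 5):
    keep, labels = horizon_labels(times, events, horizon)
    comparison[horizon] = {
        "n": len(keep),
        "auc_a": auc(labels, [score_a[i] for i in keep]),
        "auc_b": auc(labels, [score_b[i] for i in keep]),
    }

if __name__ == "__main__":
    for horizon, r in comparison.items():
        print(f"{horizon}-year: n={r['n']}  AUC A={r['auc_a']:.3f}  AUC B={r['auc_b']:.3f}")
```

Because the set of evaluable patients changes with the horizon, the 2-year and 5-year AUCs are computed on different subsets and should be compared between models at the same horizon rather than across horizons.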