Zheyi Dong1, Qian Wang1, Yujing Ke1, Weiguang Zhang1, Quan Hong1, Chao Liu1, Xiaomin Liu1, Jian Yang1, Yue Xi1, Jinlong Shi2, Li Zhang1, Ying Zheng1, Qiang Lv1, Yong Wang1, Jie Wu1, Xuefeng Sun1, Guangyan Cai1, Shen Qiao2, Chengliang Yin2, Shibin Su3, Xiangmei Chen4.
Abstract
BACKGROUND: Established prediction models of diabetic kidney disease (DKD) are limited to the analysis of clinical research data or general population data and do not consider hospital visits. This study aimed to construct a 3-year DKD risk prediction model for patients with type 2 diabetes mellitus (T2DM) using machine learning, based on electronic medical records (EMR).
Keywords: Diabetic kidney disease; Electronic medical records; Light gradient boosting machine; Machine learning; Risk assessment; Type 2 diabetes
Year: 2022 PMID: 35346252 PMCID: PMC8959559 DOI: 10.1186/s12967-022-03339-1
Source DB: PubMed Journal: J Transl Med ISSN: 1479-5876 Impact factor: 5.531
Fig. 1 Flow diagram of patient selection. Non-DKD, no diabetic kidney disease; DKD, diabetic kidney disease; eGFR, estimated glomerular filtration rate
Baseline demographic and clinical characteristics of the included patients
| Baseline characteristics | All | non-DKD | DKD | P value |
|---|---|---|---|---|
| Patient population, n | 816 | 408 | 408 | |
| Male, n (%) | 541 (66.3) | 291 (71.3) | 250 (61.3) | 0.002 |
| Age (years) | 56.00 (48.25, 65.00) | 52.5 (47, 60) | 61 (50, 71) | 0.000 |
| BMI (kg/m2) | 26.03 (24.22, 28.61) | 25.79 (24.46, 28.24) | 26.30 (23.96, 29.06) | 0.343 |
| Hypertension (%) | 349 (42.8) | 157 (38.5) | 192 (47.1) | 0.013 |
| Cardiovascular disease (%) | 194 (23.8) | 79 (19.4) | 115 (28.2) | 0.292 |
| Cerebrovascular disease (%) | 81 (9.9) | 36 (8.8) | 45 (11) | 0.003 |
| Peripheral neuropathy (%) | 31 (3.8) | 13 (3.2) | 18 (4.4) | 0.360 |
| Diabetic retinopathy (%) | 21 (2.6) | 9 (2.2) | 12 (2.9) | 0.507 |
| eGFR CKD-EPI (ml/min/1.73m2) | 98.42 ± 18.63 | 103.25 ± 16.15 | 93.6 ± 19.69 | 0.000 |
| SCR (μmol/L) | 68.62 ± 14.06 | 67.13 ± 12.85 | 70.1 ± 15.05 | 0.003 |
| BUN (mmol/L) | 5.36 (4.5, 6.43) | 5.22 (4.47, 6.16) | 5.51 (4.53, 6.66) | 0.004 |
| SUA (μmol/L) | 331.50 ± 91.85 | 336 ± 84.46 | 326.99 ± 98.58 | 0.161 |
| HbA1c (%) | 6.8 (6.3, 7.7) | 6.6 (6.18, 7.30) | 7.00 (6.41, 8.03) | 0.000 |
| ALB (g/L) | 43.21 ± 4.04 | 43.89 ± 3.63 | 42.54 ± 4.31 | 0.000 |
| TC (mmol/L) | 4.41 (3.72, 5.23) | 4.46 (3.85, 5.24) | 4.33 (3.65, 5.23) | 0.153 |
| TG (mmol/L) | 1.58 (1.11, 2.35) | 1.69 (1.16, 2.42) | 1.50 (1.03, 2.21) | 0.019 |
| HDL (mmol/L) | 1.04 (0.89, 1.28) | 1.03 (0.88, 1.26) | 1.07 (0.90, 1.31) | 0.107 |
| LDL (mmol/L) | 2.76 ± 0.91 | 2.82 ± 0.86 | 2.71 ± 0.95 | 0.09 |
| K (mmol/L) | 4.09 ± 0.38 | 4.07 ± 0.33 | 4.1 ± 0.42 | 0.372 |
| Na (mmol/L) | 142 (140, 143.3) | 142.00 (140.80, 143.88) | 141.60 (139.4, 143.00) | 0.000 |
| Ca (mmol/L) | 2.29 ± 0.11 | 2.29 ± 0.10 | 2.28 ± 0.12 | 0.381 |
| P (mmol/L) | 1.18 ± 0.18 | 1.20 ± 0.18 | 1.17 ± 0.18 | 0.023 |
| Bicarbonate (mmol/L) | 26.13 (24.95, 27.5) | 25.98 (24.9, 27.10) | 26.23 (25.01, 27.78) | 0.017 |
| Hcy | 12.59 (10.17, 15.45) | 11.75 (9.71, 14.16) | 13.61 (10.74, 16.65) | 0.000 |
| Hb (g/L) | 141.93 ± 18.8 | 145.51 ± 16.82 | 138.34 ± 19.98 | 0.000 |
| NLR | 1.84 (1.42, 2.44) | 1.76 (1.35, 2.32) | 1.97 (1.45, 2.61) | 0.000 |
| FIB (g/L) | 3.09 (2.70, 3.54) | 3.00 (2.65, 3.42) | 3.18 (2.77, 3.69) | 0.000 |
Values for continuous variables are expressed as mean ± standard deviation or median (interquartile range); values for categorical data are given as number (percent). The P value represents the comparison between the non-DKD group and the DKD group.
Abbreviations and definitions: BMI, body mass index; eGFR, estimated glomerular filtration rate; SCR, serum creatinine; BUN, blood urea nitrogen; SUA, serum uric acid; ALB, serum albumin; TC, total cholesterol; TG, triglyceride; HDL, high-density lipoprotein; LDL, low-density lipoprotein; K, serum potassium; Na, serum sodium; Ca, calcium; P, phosphate; Hcy, homocysteine; Hb, hemoglobin; NLR, neutrophil-to-lymphocyte ratio; FIB, plasma fibrinogen
Performance of the prediction models generated by the seven machine learning algorithms
| Models | AUC | 95% CI lower bound | 95% CI upper bound | SE (recall) | SP | AC | F1 | PPV | NPV |
|---|---|---|---|---|---|---|---|---|---|
| LightGBM | 0.815 | 0.747 | 0.882 | 0.741 | 0.797 | 0.768 | 0.768 | 0.797 | 0.741 |
| XGBoost | 0.779 | 0.706 | 0.853 | 0.682 | 0.785 | 0.732 | 0.725 | 0.773 | 0.697 |
| AdaBoost | 0.805 | 0.738 | 0.872 | 0.659 | 0.772 | 0.713 | 0.704 | 0.757 | 0.678 |
| Artificial Neural Network | 0.800 | 0.730 | 0.869 | 0.659 | 0.911 | 0.768 | 0.747 | 0.862 | 0.680 |
| Decision Tree | 0.579 | 0.503 | 0.655 | 0.576 | 0.595 | 0.579 | 0.587 | 0.598 | 0.603 |
| Support Vector Machine | 0.791 | 0.720 | 0.862 | 0.612 | 0.886 | 0.744 | 0.712 | 0.852 | 0.680 |
| Logistic Regression | 0.798 | 0.728 | 0.868 | 0.718 | 0.759 | 0.738 | 0.739 | 0.763 | 0.714 |
SE: sensitivity; SP: specificity; AC: accuracy; PPV: positive predictive value; NPV: negative predictive value
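The metrics in the table above all derive from the four cells of a confusion matrix. The following is a minimal sketch of those definitions; the function name `classification_metrics` and the counts passed to it are assumptions for illustration. The source does not report a confusion matrix, so the counts below are hypothetical values chosen only because they round to the same figures as the LightGBM row.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute the table's metrics from confusion-matrix counts.

    tp/fn: DKD patients predicted as DKD / as non-DKD
    tn/fp: non-DKD patients predicted as non-DKD / as DKD
    """
    return {
        "SE": tp / (tp + fn),                   # sensitivity (recall)
        "SP": tn / (tn + fp),                   # specificity
        "AC": (tp + tn) / (tp + fn + tn + fp),  # accuracy
        "F1": 2 * tp / (2 * tp + fp + fn),      # harmonic mean of PPV and SE
        "PPV": tp / (tp + fp),                  # positive predictive value
        "NPV": tn / (tn + fn),                  # negative predictive value
    }


# Hypothetical counts (NOT reported in the source); assumed for illustration.
metrics = classification_metrics(tp=63, fn=22, tn=63, fp=16)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts the sketch gives SE 0.741, SP 0.797, AC 0.768, F1 0.768, PPV 0.797 and NPV 0.741.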
Fig. 2 Evaluation of the seven machine learning algorithms based on the AUC of the ROC curve. AUC, area under the curve; ROC, receiver operating characteristic
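As a reminder of what the AUC measures, it equals the probability that a randomly chosen DKD patient receives a higher predicted risk than a randomly chosen non-DKD patient, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name `auc_from_scores` and the labels and scores are invented for illustration and are not data from the study.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy example: 1 = DKD, 0 = non-DKD; scores are made-up predicted risks.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.7, 0.4, 0.5, 0.3, 0.2]
auc = auc_from_scores(labels, scores)
print(f"AUC = {auc:.4f}")
```

Here 13 of the 16 positive-negative pairs are ordered correctly, so the toy AUC is 0.8125. The pairwise loop is quadratic and is meant for clarity rather than for large datasets.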
Fig. 3 a Importance matrix plot of the LightGBM model, depicting the importance of each variable for predicting 3-year DKD risk in patients with T2DM and normo-albuminuria. b SHAP summary plot of the top 8 clinical features of the LightGBM model. There is one dot per patient per feature colored according to an attribution value, where red represents a higher value and blue represents a lower value. Hcy, homocysteine; BMI, body mass index; ALB, serum albumin; eGFR, estimated glomerular filtration rate; LDL, low-density lipoprotein
Fig. 4 SHAP dependence plot of the LightGBM model, depicting how a single variable affects the prediction. SHAP values for specific features that exceed zero suggest an increased risk of DKD. Hcy, homocysteine; BMI, body mass index; eGFR, estimated glomerular filtration rate
Fig. 5 SHAP force plot for patients in the dataset at high (a) or low (b) risk of developing DKD; c SHAP values (global interpretation) for the training set. The abscissa represents each patient, and the ordinate represents the SHAP value. More red indicates a higher overall risk. Hcy, homocysteine; BMI, body mass index; eGFR, estimated glomerular filtration rate
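The SHAP plots rest on Shapley values, which split the gap between one patient's prediction and a reference prediction into per-feature contributions that sum to that gap. The sketch below computes exact Shapley values by enumerating feature subsets. Everything in it is assumed for illustration: `toy_risk` is a made-up scoring function rather than the study's LightGBM model, the patient and baseline values are invented, and brute-force enumeration stands in for the tree-specific algorithm that SHAP tooling normally uses for gradient-boosted models.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the patient's value, the rest the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                without_i = set(s)
                total += weight * (value(without_i | {i}) - value(without_i))
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical risk score; coefficients are invented, not from the study."""
    hcy, egfr, bmi = z
    return 0.02 * hcy - 0.005 * egfr + 0.01 * bmi + 0.001 * hcy * bmi


patient = [16.0, 85.0, 29.0]    # assumed Hcy, eGFR, BMI for one patient
baseline = [12.0, 100.0, 26.0]  # assumed reference values
phi = shapley_values(toy_risk, patient, baseline)
for name, contribution in zip(["Hcy", "eGFR", "BMI"], phi):
    print(f"{name}: {contribution:+.4f}")
```

In this toy setting every contribution is positive: Hcy and BMI are above the reference and eGFR is below it, and each of those shifts raises the hypothetical score. The contributions add up to `toy_risk(patient) - toy_risk(baseline)`, which is the additivity that a force plot visualizes.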