Qing Liu, Qing Zhou, Yifeng He, Jingui Zou, Yan Guo, Yaqiong Yan.
Abstract
Identifying people at high risk of developing diabetes among those with prediabetes may facilitate targeted lifestyle and pharmacological interventions. We aimed to establish machine learning models based on demographic and clinical characteristics to predict the risk of incident diabetes. We used data from the free medical examination service project for elderly people aged 65 years or older to develop logistic regression (LR), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost) models for the follow-up results of 2019 and 2020, and performed internal validation. The receiver operating characteristic (ROC) curve, sensitivity, specificity, accuracy, and F1 score were used to select the best-performing model. The average annual progression rate to diabetes in elderly people with prediabetes was 14.21%. Each model was trained using eight features and one outcome variable from 9607 individuals with prediabetes, and model performance was assessed in 2402 individuals with prediabetes. All four models predicted better in the first year than in the second year. The XGBoost model performed relatively well (ROC: 0.6742 for 2019 and 0.6707 for 2020). We established and compared four machine learning models to predict the risk of progression from prediabetes to diabetes. Although the four models differed little in performance, the XGBoost model had a relatively good ROC value and may be worth further exploration in this field.
Keywords: incident diabetes; machine learning; prediabetes; predictive models
Year: 2022 PMID: 35887552 PMCID: PMC9324396 DOI: 10.3390/jpm12071055
Source DB: PubMed Journal: J Pers Med ISSN: 2075-4426
Figure 1. Flowchart of study participants.
Baseline characteristics between the groups of participants with incident diabetes at different time points.
| Variables | 2019: Without DM | 2019: DM | p-value | 2020: Without DM | 2020: DM | p-value |
|---|---|---|---|---|---|---|
| Age (years) | 72.06 ± 5.10 | 72.17 ± 5.22 | 0.393 | 72.08 ± 5.13 | 72.06 ± 5.10 | 0.813 |
| Gender, n (%) | | | <0.001 | | | 0.018 |
| Male | 4536 (83.86) | 873 (16.14) | | 3813 (70.49) | 1596 (29.51) | |
| Female | 5695 (86.29) | 905 (13.71) | | 4782 (72.45) | 1818 (27.55) | |
| Education, n (%) | | | <0.001 | | | <0.001 |
| ≤Primary school | 6485 (86.98) | 971 (13.02) | | 5529 (74.16) | 1927 (25.84) | |
| Middle school | 1990 (82.10) | 434 (17.90) | | 1615 (66.63) | 809 (33.37) | |
| High school | 931 (82.10) | 203 (17.90) | | 764 (67.37) | 370 (32.63) | |
| ≥University | 825 (82.91) | 170 (17.09) | | 687 (69.05) | 308 (30.95) | |
| Marital status, n (%) | | | 0.383 | | | 0.897 |
| Married | 7762 (84.96) | 1374 (15.04) | | 6541 (71.60) | 2595 (28.40) | |
| Divorced | 57 (87.69) | 8 (12.31) | | 44 (67.69) | 21 (32.31) | |
| Widowed | 2331 (85.76) | 387 (14.24) | | 1947 (71.63) | 771 (28.37) | |
| Unmarried | 81 (90.00) | 9 (10.00) | | 63 (70.00) | 27 (30.00) | |
| Hypertension, n (%) | | | <0.001 | | | <0.001 |
| No | 4893 (87.02) | 730 (12.98) | | 4153 (73.86) | 1470 (26.14) | |
| Yes | 5338 (83.59) | 1048 (16.41) | | 4442 (69.56) | 1944 (30.44) | |
| Myocardial infarction, n (%) | | | 0.463 | | | 0.298 |
| No | 10,177 (85.18) | 1771 (14.82) | | 8555 (71.60) | 3393 (28.40) | |
| Yes | 54 (88.52) | 7 (11.48) | | 40 (65.57) | 21 (34.43) | |
| Coronary heart disease, n (%) | | | 0.841 | | | 0.144 |
| No | 9632 (85.22) | 1670 (14.78) | | 8106 (71.72) | 3196 (28.28) | |
| Yes | 599 (84.72) | 108 (15.28) | | 489 (69.17) | 218 (30.83) | |
| Angina pectoris, n (%) | | | 0.828 | | | 0.437 |
| No | 10,187 (85.19) | 1771 (14.81) | | 8556 (71.55) | 3402 (28.45) | |
| Yes | 44 (86.27) | 7 (13.73) | | 39 (76.47) | 12 (23.53) | |
| Fatty liver, n (%) | | | 0.315 | | | 0.055 |
| No | 9979 (85.25) | 1727 (14.75) | | 8393 (71.70) | 3313 (28.30) | |
| Yes | 252 (83.17) | 51 (16.83) | | 202 (66.67) | 101 (33.33) | |
| Exercise, n (%) | | | 0.587 | | | 0.455 |
| No | 3942 (85.42) | 673 (14.58) | | 3321 (71.96) | 1294 (28.04) | |
| Yes | 6289 (85.06) | 1105 (14.94) | | 5274 (71.33) | 2120 (28.67) | |
| Smoking, n (%) | | | 0.705 | | | 0.883 |
| No | 8804 (85.15) | 1536 (14.85) | | 7403 (71.60) | 2937 (28.40) | |
| Yes | 1427 (85.50) | 242 (14.50) | | 1192 (71.42) | 477 (28.58) | |
| Drinking, n (%) | | | 0.295 | | | 0.212 |
| No | 8544 (85.35) | 1467 (14.65) | | 7188 (71.80) | 2823 (28.20) | |
| Yes | 1687 (84.43) | 311 (15.57) | | 1407 (70.42) | 591 (29.58) | |
| BMI (kg/m²) | 24.56 ± 3.41 | 25.48 ± 3.31 | <0.001 | 24.44 ± 3.42 | 25.34 ± 3.33 | <0.001 |
| WC (cm) | 85.57 ± 9.73 | 88.65 ± 9.23 | <0.001 | 85.20 ± 9.68 | 88.10 ± 9.52 | <0.001 |
| SBP (mmHg) | 139.17 ± 19.48 | 140.30 ± 18.70 | 0.023 | 138.96 ± 19.52 | 140.28 ± 18.96 | <0.001 |
| DBP (mmHg) | 80.99 ± 11.09 | 81.57 ± 10.64 | 0.041 | 80.86 ± 11.11 | 81.60 ± 10.79 | 0.001 |
| FPG (mmol/L) | 6.42 ± 0.24 | 6.54 ± 0.26 | <0.001 | 6.40 ± 0.24 | 6.52 ± 0.26 | <0.001 |
| TC (mmol/L) | 5.05 ± 1.05 | 4.99 ± 1.03 | 0.021 | 5.06 ± 1.05 | 5.00 ± 1.03 | 0.007 |
| TG (mmol/L) | 1.32 (0.96) | 1.50 (1.06) | <0.001 | 1.30 (0.93) | 1.48 (1.05) | <0.001 |
| HDL-C (mmol/L) | 1.39 ± 0.40 | 1.34 ± 0.51 | <0.001 | 1.40 ± 0.40 | 1.33 ± 0.44 | <0.001 |
| LDL-C (mmol/L) | 2.80 ± 0.92 | 2.76 ± 1.02 | 0.147 | 2.78 ± 0.93 | 2.81 ± 0.95 | 0.231 |
| ALT (U/L) | 18.00 (11.00) | 19.10 (12.00) | <0.001 | 18.00 (10.90) | 19.00 (12.00) | <0.001 |
| AST (U/L) | 22.00 (8.30) | 22.00 (9.90) | 0.797 | 22.00 (8.30) | 22.30 (9.50) | 0.034 |
| TBil (μmol/L) | 12.80 (6.80) | 13.10 (6.20) | 0.131 | 12.80 (6.90) | 12.90 (6.50) | 0.931 |
| Scr (μmol/L) | 77.50 (29.00) | 76.90 (29.00) | 0.107 | 78.00 (28.00) | 76.00 (29.00) | <0.001 |
| BUN (mmol/L) | 5.80 (2.26) | 5.70 (2.05) | 0.002 | 5.83 (2.29) | 5.67 (2.07) | <0.001 |
| SUA (μmol/L) | 332.93 ± 99.46 | 347.54 ± 95.61 | <0.001 | 333.43 ± 99.44 | 344.32 ± 97.40 | <0.001 |
Data are shown as means ± standard deviation for normally distributed variables, median (interquartile range) for non-normally distributed variables, and percentages for categorical variables. DM: Diabetes mellitus; BMI: Body mass index; WC: Waist circumference; SBP: Systolic blood pressure; DBP: Diastolic blood pressure; FPG: Fasting plasma glucose; TC: Total cholesterol; TG: Triglyceride; HDL-C: High density lipoprotein cholesterol; LDL-C: Low density lipoprotein cholesterol; ALT: Alanine aminotransferase; AST: Aspartate aminotransferase; TBil: Total bilirubin; Scr: Serum creatinine; BUN: Blood urea nitrogen; SUA: Serum uric acid.
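The 14.21% average annual progression rate quoted in the abstract can be recovered from the counts in the baseline table. The sketch below sums the male and female rows and assumes that "average annual" means the two-year cumulative incidence divided by two; that reading is an assumption, not something the source states, and the variable names are illustrative.

```python
# Counts copied from the gender rows of the baseline table (male + female).
total = 4536 + 873 + 5695 + 905   # all participants with prediabetes at baseline
dm_by_2019 = 873 + 905            # incident diabetes at the 2019 follow-up
dm_by_2020 = 1596 + 1818          # incident diabetes at the 2020 follow-up (cumulative)


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


cumulative_1y = pct(dm_by_2019, total)   # 14.81
cumulative_2y = pct(dm_by_2020, total)   # 28.43

# Assumption: average annual rate = 2-year cumulative incidence / 2 years.
average_annual = round(100 * dm_by_2020 / total / 2, 2)   # 14.21

print(total, cumulative_1y, cumulative_2y, average_annual)
```

The total of 12,009 also equals the 9607 training plus 2402 validation individuals mentioned in the abstract.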
Figure 2. Receiver operating characteristic (ROC) curves for prediction horizons of 1 and 2 years using the four models: logistic regression (LR), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost). (a) 1-year forecast period; (b) 2-year forecast period.
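Assuming the "ROC" values reported for each model are areas under the ROC curve (AUC), the quantity has a simple interpretation: the probability that a randomly chosen person who progressed to diabetes receives a higher predicted risk than a randomly chosen person who did not. The following minimal sketch computes it by pairwise comparison; the function name and the toy labels and scores are hypothetical and not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = progressed to diabetes, 0 = did not.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; library implementations typically sort the scores instead.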
Performance of four machine learning models for two forecast periods.
| Metrics | LR | DT | RF | XGBoost |
|---|---|---|---|---|
| 1-year forecast period | | | | |
| Sensitivity | 0.5559 | 0.5213 | 0.5824 | 0.6569 |
| Specificity | 0.6876 | 0.7004 | 0.6807 | 0.5972 |
| Accuracy | 0.6669 | 0.6724 | 0.6653 | 0.6066 |
| F1 score | 0.3432 | 0.3325 | 0.3527 | 0.3433 |
| 2-year forecast period | | | | |
| Sensitivity | 0.6232 | 0.5580 | 0.5754 | 0.6130 |
| Specificity | 0.6016 | 0.6612 | 0.6647 | 0.6443 |
| Accuracy | 0.6078 | 0.6316 | 0.6391 | 0.6353 |
| F1 score | 0.4772 | 0.4653 | 0.4780 | 0.4913 |
LR: Logistic regression; DT: Decision tree; RF: Random forest; XGBoost: Extreme gradient boosting.
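For reference, the four metrics in the table follow the standard confusion-matrix definitions. The sketch below uses made-up counts, since the study's confusion matrices appear only as a figure; the function name is illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)            # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts: 100 true cases among 1000 people (10% prevalence).
metrics = classification_metrics(tp=60, fp=300, tn=600, fn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these hypothetical counts, sensitivity is 0.60 and accuracy 0.66, yet the F1 score is only about 0.26, because false positives heavily outnumber true positives when the outcome is uncommon. A similar effect may explain why the F1 scores in the table are lower for the 1-year period, where fewer participants had progressed to diabetes.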
Figure 3. Confusion matrices for prediction horizons of 1 and 2 years based on the extreme gradient boosting (XGBoost) model: (a) 1-year forecast period; (b) 2-year forecast period.
Figure 4. Feature importance in predicting incident diabetes according to the XGBoost model. The Shapley additive explanation (SHAP) algorithm is used to calculate the SHAP value, which approximates how much each feature contributes to the average prediction for the dataset. (a) 1-year forecast period. (b) 2-year forecast period.
Figure 5. SHAP summary plot of the XGBoost model. (a) 1-year forecast period. (b) 2-year forecast period.
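The idea behind the SHAP values in Figures 4 and 5 can be illustrated with an exact Shapley computation on a tiny model. The sketch below is not the SHAP library and not the study's XGBoost model: `toy_risk` and its coefficients are invented for illustration, and practical SHAP implementations rely on approximations or model-specific algorithms rather than enumerating every feature ordering.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values: average marginal contribution of each feature
    over all orderings in which features switch from baseline to x."""
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]              # reveal feature i
            value = predict(current)
            totals[i] += value - previous  # marginal contribution of feature i
            previous = value
    return [t / len(orderings) for t in totals]


def toy_risk(features):
    """Hypothetical risk score over (FPG, BMI) with an interaction term."""
    fpg, bmi = features
    return 0.5 * fpg + 0.1 * bmi + 0.02 * fpg * bmi


person = [7.0, 30.0]     # hypothetical individual: FPG (mmol/L), BMI (kg/m²)
reference = [6.0, 25.0]  # hypothetical reference point
phi = shapley_values(toy_risk, person, reference)
print(phi)
```

The contributions sum to the difference between the prediction for the individual and the prediction at the reference point, which is the additivity property that SHAP summary plots depend on.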