Yang Wu, Haofei Hu, Jinlin Cai, Runtian Chen, Xin Zuo, Heng Cheng, Dewen Yan.
Abstract
Purpose: We aimed to establish and validate a risk assessment system that combines demographic and clinical variables to predict the 3-year risk of incident diabetes in Chinese adults.
Keywords: Incident diabetes; extreme gradient boosting; machine learning; risk; simple stepwise model
Year: 2021 PMID: 34268283 PMCID: PMC8275929 DOI: 10.3389/fpubh.2021.626331
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
Figure 1. Flowchart of study participants.
Baseline characteristics of the training and validation sets.
| Characteristic | Training set | Validation set | P-value |
| --- | --- | --- | --- |
| Participants | 7,940 | 7,988 | |
| Incident diabetes | | | 0.901 |
| No | 7,795 (98.17%) | 7,840 (98.15%) | |
| Yes | 145 (1.83%) | 148 (1.85%) | |
| Age (year) | 43.43 ± 12.45 | 43.24 ± 12.17 | 0.339 |
| Gender | | | 0.595 |
| Male | 5,157 (64.95%) | 5,156 (64.55%) | |
| Female | 2,783 (35.05%) | 2,832 (35.45%) | |
| BMI (kg/m2) | 23.51 ± 3.28 | 23.54 ± 3.32 | 0.552 |
| SBP (mmHg) | 119.90 ± 16.00 | 119.62 ± 15.77 | 0.266 |
| DBP (mmHg) | 75.12 ± 10.46 | 75.04 ± 10.38 | 0.633 |
| FPG (mmol/L) | 4.86 ± 0.66 | 4.84 ± 0.66 | 0.247 |
| TG (mmol/L) | 1.17 (0.80–1.77) | 1.16 (0.80–1.75) | 0.287 |
| HDL-C (mmol/L) | 1.30 ± 0.31 | 1.30 ± 0.33 | 0.198 |
| LDL-C (mmol/L) | 2.75 ± 0.69 | 2.75 ± 0.69 | 0.913 |
| ALT (U/L) | 20.00 (14.00–30.00) | 20.00 (14.00–30.30) | 0.566 |
| BUN (mmol/L) | 4.66 ± 1.17 | 4.67 ± 1.16 | 0.880 |
| Scr (μmol/L) | 72.04 ± 15.07 | 72.11 ± 15.25 | 0.767 |
| Smoking status | | | 0.443 |
| Ever/current | 1,972 (24.84%) | 2,026 (25.36%) | |
| Never | 5,968 (75.16%) | 5,962 (74.64%) | |
| Drinking status | | | 0.624 |
| Ever/current | 1,544 (19.45%) | 1,578 (19.75%) | |
| Never | 6,396 (80.55%) | 6,410 (80.25%) | |
| Family history | | | 0.157 |
| No | 7,400 (93.20%) | 7,489 (93.75%) | |
| Yes | 540 (6.80%) | 499 (6.25%) | |
Values are n (%), mean ± SD, or median (interquartile range).
BMI, Body mass index; SBP, Systolic blood pressure; DBP, Diastolic blood pressure; FPG, Fasting plasma glucose; TG, Triglyceride; HDL-C, High density lipoprotein cholesterol; LDL-C, Low density lipoprotein cholesterol; ALT, Alanine aminotransferase; BUN, Blood urea nitrogen; Scr, Serum creatinine; Family history, Family history of diabetes.
Figure 2. Shapley values-based interpretation of the model. Contributing feature importance of the variables selected by the XGBoost model.
Figure 3. The ROC curves of the MFP model, full model and stepwise model in the training set (A) and validation set (B).
Variables selected using stepwise logistic regression.
| Variable | β | SE | z | RR (95% CI) | P-value |
| --- | --- | --- | --- | --- | --- |
| (Intercept) | −24.07232 | 1.34753 | −17.86405 | – | – |
| FPG (mmol/L) | 2.45073 | 0.15763 | 15.54774 | 11.2812 (8.0798–16.4983) | <0.0001 |
| HDL-C (mmol/L) | 1.14025 | 0.29593 | 3.85313 | 3.1101 (1.7651–5.8612) | <0.0001 |
| BMI (kg/m2) | 0.15291 | 0.03016 | 5.07010 | 1.1647 (1.0911–1.2413) | <0.0001 |
| Age (year) | 0.04191 | 0.00765 | 5.47752 | 1.0427 (1.0276–1.0578) | <0.0001 |
| ALT (U/L) | 0.00852 | 0.00335 | 2.53939 | 1.0085 (1.0022–1.0146) | 0.0060 |
| LDL-C (mmol/L) | −0.32400 | 0.14526 | −2.23050 | 0.7238 (0.5438–0.9229) | 0.0030 |
FPG, Fasting plasma glucose; HDL-C, High density lipoprotein cholesterol; BMI, Body mass index; LDL-C, Low density lipoprotein cholesterol; ALT, Alanine aminotransferase; RR, Relative risk; CI, Confidence interval.
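The coefficients in the table can be turned into an individual risk estimate. The following is a minimal sketch, assuming the stepwise model is an ordinary logistic regression whose linear predictor is the intercept plus each coefficient times the predictor value in the units listed; the function name `predicted_risk` and the dictionary layout are illustrative and not taken from the paper.

```python
import math

# Coefficients copied from the stepwise logistic regression table.
# Assumption: predictors enter the model untransformed, in the listed units.
COEFFICIENTS = {
    "intercept": -24.07232,
    "fpg": 2.45073,     # fasting plasma glucose, mmol/L
    "hdl_c": 1.14025,   # HDL cholesterol, mmol/L
    "bmi": 0.15291,     # body mass index, kg/m2
    "age": 0.04191,     # years
    "alt": 0.00852,     # alanine aminotransferase, U/L
    "ldl_c": -0.32400,  # LDL cholesterol, mmol/L
}


def predicted_risk(fpg, hdl_c, bmi, age, alt, ldl_c):
    """Return the model probability under the assumed logistic form."""
    values = {"fpg": fpg, "hdl_c": hdl_c, "bmi": bmi,
              "age": age, "alt": alt, "ldl_c": ldl_c}
    # Linear predictor: intercept + sum(coefficient * value).
    linear_predictor = COEFFICIENTS["intercept"] + sum(
        COEFFICIENTS[name] * value for name, value in values.items()
    )
    # Inverse logit maps the linear predictor to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-linear_predictor))


if __name__ == "__main__":
    # Example input: the training-set averages from the baseline table.
    risk = predicted_risk(fpg=4.86, hdl_c=1.30, bmi=23.51,
                          age=43.43, alt=20.0, ldl_c=2.75)
    print(f"Predicted 3-year risk: {risk:.4%}")
```

Because FPG has by far the largest coefficient, small changes in fasting glucose dominate the estimate under this form; the sketch does not reproduce any rescaling that the published nomogram may apply.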
Figure 4. The nomogram of the stepwise model to predict the 3-year risk of incident diabetes. To predict an individual's 3-year risk of diabetes, locate that person's value on each variable axis and draw a vertical line from that value to the top Points scale to read off the points assigned to it. Sum the points from all variables, locate the sum on the Total Points scale, and project it vertically onto the bottom axis to obtain a personalized 3-year risk of diabetes.
Figure 5. The ROC curves of the stepwise model in the training set and validation set.
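The area under an ROC curve can be computed without plotting it. The sketch below uses the rank-based (Mann-Whitney) definition: the AUC is the fraction of case/non-case pairs in which the case received the higher score, with ties counted as half. The function name `roc_auc` is illustrative, and this is not the software the authors used.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random case outranks a random non-case."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for case_score in cases:
        for other_score in non_cases:
            if case_score > other_score:
                wins += 1.0   # correctly ordered pair
            elif case_score == other_score:
                wins += 0.5   # tie counts as half
    return wins / (len(cases) * len(non_cases))


if __name__ == "__main__":
    # Toy data: two cases (label 1) and two non-cases (label 0).
    print(roc_auc([0.9, 0.8, 0.2, 0.1], [1, 0, 1, 0]))
```

The pairwise loop is quadratic in the sample size, which is acceptable for illustration; a sort-based rank sum gives the same value more efficiently on large cohorts.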
Figure 6. Comparison of predicted and observed 3-year incidence across deciles of the diabetes risk score predicted by the nomogram in the training set (A) and validation set (B).
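A decile-based calibration comparison of this kind can be sketched as follows: participants are sorted by predicted risk, split into ten equally sized groups, and the mean predicted risk in each group is set against the observed incidence. The function name `calibration_by_decile` and the toy data are illustrative assumptions, not the authors' code or data.

```python
def calibration_by_decile(predicted, observed, n_groups=10):
    """Return (group, size, mean predicted risk, observed incidence) rows."""
    # Sort participants by predicted risk so groups are risk-ordered.
    pairs = sorted(zip(predicted, observed), key=lambda pair: pair[0])
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        # Integer arithmetic keeps group sizes within one of each other.
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        mean_predicted = sum(p for p, _ in group) / len(group)
        observed_rate = sum(y for _, y in group) / len(group)
        rows.append((g + 1, len(group), mean_predicted, observed_rate))
    return rows


if __name__ == "__main__":
    # Toy data: 100 participants, events only among the highest predictions.
    predicted = [i / 100 for i in range(100)]
    observed = [1 if i >= 90 else 0 for i in range(100)]
    for group, size, mean_predicted, observed_rate in calibration_by_decile(
            predicted, observed):
        print(group, size, round(mean_predicted, 3), round(observed_rate, 3))
```

A well-calibrated model shows observed incidence close to mean predicted risk in every group; systematic gaps in the top deciles indicate over- or under-prediction among the highest-risk participants.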
Figure 7. Decision curve analysis of the stepwise model for predicting incident diabetes in the training set (A) and validation set (B). Net benefit is shown on the y-axis. The red line represents the model; the thin gray line represents the assumption that all participants develop diabetes; the thin black line represents the assumption that no participants develop diabetes. The decision curve shows that when a patient's threshold probability is >1%, using the model to predict incident diabetes yields more net benefit than screening all participants for diabetes (e.g., with an oral glucose tolerance test) or screening none.
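The net benefit plotted on a decision curve follows a standard formula: true positives per participant minus false positives per participant, the latter weighted by the odds of the threshold probability. The sketch below applies that general definition; the function names and toy data are illustrative and are not drawn from the study.

```python
def net_benefit(predicted, observed, threshold):
    """Net benefit of acting on everyone with predicted risk >= threshold."""
    n = len(observed)
    flagged = [(p, y) for p, y in zip(predicted, observed) if p >= threshold]
    true_positives = sum(1 for _, y in flagged if y == 1)
    false_positives = sum(1 for _, y in flagged if y == 0)
    # Odds of the threshold express how many false positives one
    # true positive is considered to be worth.
    weight = threshold / (1.0 - threshold)
    return true_positives / n - false_positives / n * weight


def net_benefit_treat_all(observed, threshold):
    """Reference strategy: act on every participant regardless of risk."""
    prevalence = sum(observed) / len(observed)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    predicted = [0.9, 0.8, 0.2, 0.1]
    observed = [1, 0, 1, 0]
    for threshold in (0.2, 0.5):
        print(threshold,
              net_benefit(predicted, observed, threshold),
              net_benefit_treat_all(observed, threshold))
```

The "act on no one" strategy has a net benefit of zero at every threshold, so a model is useful at a given threshold only if its net benefit exceeds both zero and the treat-all value.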
Modifications and interactions between each predictor selected by the stepwise model.
| Predictor 1 | Predictor 2 | HR (95% CI) | P for interaction |
| --- | --- | --- | --- |
| Age | BMI | 0.997 (0.994, 1.001) | 0.186 |
| Age | FPG | 0.980 (0.958, 1.002) | 0.077 |
| Age | ALT | 1.000 (0.999, 1.000) | 0.824 |
| Age | HDL-C | 1.015 (0.969, 1.064) | 0.524 |
| Age | LDL-C | 0.996 (0.974, 1.018) | 0.699 |
| ALT | FPG | 1.001 (0.991, 1.011) | 0.902 |
| ALT | BMI | 1.000 (0.999, 1.002) | 0.627 |
| ALT | HDL-C | 0.999 (0.979, 1.019) | 0.896 |
| ALT | LDL-C | 0.994 (0.986, 1.002) | 0.148 |
| BMI | FPG | 0.904 (0.832, 0.982) | 0.017 |
| BMI | HDL-C | 0.978 (0.840, 1.139) | 0.776 |
| BMI | LDL-C | 1.001 (0.923, 1.086) | 0.979 |
| FPG | HDL-C | 1.903 (0.692, 5.233) | 0.213 |
| FPG | LDL-C | 1.034 (0.643, 1.665) | 0.889 |
| HDL-C | LDL-C | 1.268 (0.560, 2.872) | 0.569 |
FPG, Fasting plasma glucose; HDL-C, High density lipoprotein cholesterol; BMI, Body mass index; LDL-C, Low density lipoprotein cholesterol; ALT, Alanine aminotransferase; HR, Hazard Ratio; CI, Confidence interval.
Figure 8. The ROC curves of the external validation.
Figure 9. Comparison of predicted and observed 3-year incidence across deciles of the predicted diabetes risk score in the external validation set.