Jing Li1, Zheng Xu2, Tengda Xu1, Songbai Lin1.
Abstract
Purpose: To evaluate the performance of machine-learning models based on multiple years of continuous data to predict incident diabetes among patients with metabolic syndrome.
Keywords: diabetes; machine-learning method; metabolic syndrome; prevention
Year: 2022; PMID: 36186938; PMCID: PMC9525025; DOI: 10.2147/DMSO.S381146
Source DB: PubMed; Journal: Diabetes Metab Syndr Obes; ISSN: 1178-7007; Impact factor: 3.249
The Criteria of the International Diabetes Federation (IDF) for the Definition of Metabolic Syndrome (MetS)
| The patient with MetS should meet at least three of the following factors | |
|---|---|
| Waist circumference | Male ≥90 cm; Female ≥80 cm |
| Triglyceride | ≥1.7 mmol/L |
| High-density lipoprotein cholesterol | Male <1.03 mmol/L; Female <1.29 mmol/L |
| Blood pressure | Systolic blood pressure ≥130 mmHg |
| Fasting plasma glucose | ≥5.6 mmol/L or medical history of diabetes |
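The rule in the table above can be sketched as a small check. This is a minimal illustration, not the authors' code: the `Exam` record, its field names, and the choice to treat a missing measurement as "factor not met" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Exam:
    """One health-check record (hypothetical structure; units follow the table)."""
    sex: str                        # "male" or "female"
    wc_cm: Optional[float]          # waist circumference
    tg_mmol_l: Optional[float]      # triglyceride
    hdlc_mmol_l: Optional[float]    # HDL cholesterol
    sbp_mmhg: Optional[float]       # systolic blood pressure
    fpg_mmol_l: Optional[float]     # fasting plasma glucose
    diabetes_history: bool = False


def count_mets_factors(e: Exam) -> int:
    """Count how many of the five tabulated factors are met.

    Assumption: a missing (None) measurement counts as not met.
    """
    male = e.sex == "male"
    flags = [
        e.wc_cm is not None and e.wc_cm >= (90 if male else 80),
        e.tg_mmol_l is not None and e.tg_mmol_l >= 1.7,
        e.hdlc_mmol_l is not None and e.hdlc_mmol_l < (1.03 if male else 1.29),
        e.sbp_mmhg is not None and e.sbp_mmhg >= 130,
        e.diabetes_history or (e.fpg_mmol_l is not None and e.fpg_mmol_l >= 5.6),
    ]
    return sum(flags)


def has_mets(e: Exam) -> bool:
    """Apply the 'at least three factors' rule from the table."""
    return count_mets_factors(e) >= 3
```

For instance, a male patient with a waist circumference of 92 cm, triglyceride of 1.8 mmol/L, and fasting plasma glucose of 5.7 mmol/L meets three factors even with normal HDL cholesterol and blood pressure.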
The Missing Percentages of Each Variable
| Variables | Missing Percentage (%) |
|---|---|
| Age | 0 |
| Height (cm) | 0.43 |
| Weight (kg) | 0.43 |
| BMI (kg/m2) | 0.43 |
| WC (cm) | 41.10 |
| SBP (mmHg) | 0.46 |
| DBP (mmHg) | 0.47 |
| HDLC (mmol/L) | 6.25 |
| LDLC (mmol/L) | 2.33 |
| TG (mmol/L) | 0.53 |
| FPG (mmol/L) | 0.62 |
| HbA1c (%) | 38.68 |
| TSH (mIU/L) | 35.56 |
| UA (μmol/L) | 3.75 |
Abbreviations: BMI, body mass index; WC, waist circumference; SBP, systolic blood pressure; DBP, diastolic blood pressure; HDLC, high-density lipoprotein cholesterol; LDLC, low-density lipoprotein cholesterol; TG, triglyceride; FPG, fasting plasma glucose; HbA1c, glycated hemoglobin; TSH, thyroid-stimulating hormone; UA, uric acid.
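A missing percentage of this kind is the share of records in which a variable has no value. The sketch below shows one way to compute it with the standard library only; the record layout (a list of dictionaries with `None` or an absent key for a missing value) is an assumption for illustration, not the structure of the study data.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def missing_percentages(records: List[Record], variables: List[str]) -> Dict[str, float]:
    """Return the percentage of records in which each variable is missing.

    A value is treated as missing when the key is absent or maps to None
    (an assumption of this sketch).
    """
    if not records:
        raise ValueError("records must not be empty")
    total = len(records)
    result: Dict[str, float] = {}
    for var in variables:
        missing = sum(1 for r in records if r.get(var) is None)
        result[var] = round(100.0 * missing / total, 2)
    return result


# Toy data, not taken from the study.
toy = [
    {"Age": 50, "WC": None},
    {"Age": 40, "WC": 88.0},
    {"Age": 61},
    {"Age": 33, "WC": 91.0},
]
percentages = missing_percentages(toy, ["Age", "WC"])
```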
Figure 1. The definition of each longitudinal dataset in the timeline.
Baseline Characteristics of Sub-Groups from Patient Cohorts
| Variables | Patients With Diabetes (Mean±SD) | Patients Without Diabetes (Mean±SD) | P-value |
|---|---|---|---|
| Age | 55.71±14.60 | 47.06±13.01 | b |
| Height (cm) | 167.22±9.15 | 168.94±8.91 | b |
| Weight (kg) | 76.54±13.07 | 73.83±11.80 | b |
| BMI (kg/m2) | 27.25±3.33 | 25.75±2.96 | b |
| WC (cm) | 91.46±9.31 | 87.43±8.52 | b |
| SBP (mmHg) | 135.27±17.80 | 125.52±15.44 | b |
| DBP (mmHg) | 79.55±10.73 | 76.76±9.82 | b |
| HDLC (mmol/L) | 1.13±0.27 | 1.16±0.26 | b |
| LDLC (mmol/L) | 3.20±0.85 | 3.19±0.77 | a |
| TG (mmol/L) | 2.22±2.22 | 1.90±1.36 | b |
| FPG (mmol/L) | 6.43±1.11 | 5.37±0.45 | b |
| HbA1c (%) | 6.00±0.65 | 5.47±0.30 | b |
| TSH (mIU/L) | 2.61±3.61 | 2.34±2.43 | b |
| UA (μmol/L) | 364.85±78.98 | 354.98±84.40 | b |
Notes: a, P-value <0.05; b, P-value <0.01.
Abbreviations: BMI, body mass index; WC, waist circumference; SBP, systolic blood pressure; DBP, diastolic blood pressure; HDLC, high-density lipoprotein cholesterol; LDLC, low-density lipoprotein cholesterol; TG, triglyceride; FPG, fasting plasma glucose; HbA1c, glycated hemoglobin; TSH, thyroid-stimulating hormone; UA, uric acid.
Performance Metrics of Machine-Learning Models Using Longitudinal Data
| Models | AUROC | Recall | Precision | F1-Score |
|---|---|---|---|---|
| Year 1 | | | | |
| Logistic Regression | 0.794 | 0.681 | 0.728 | 0.702 |
| Random Forest | 0.804 | 0.747 | 0.737 | 0.738 |
| XGBoost | 0.772 | 0.687 | 0.719 | 0.699 |
| Year 2 | | | | |
| Logistic Regression | 0.838 | 0.746 | 0.783 | 0.763 |
| Random Forest | 0.838 | 0.747 | 0.752 | 0.748 |
| XGBoost | 0.823 | 0.759 | 0.757 | 0.756 |
| Year 3 | | | | |
| Logistic Regression | 0.828 | 0.728 | 0.775 | 0.748 |
| Random Forest | 0.862 | 0.774 | 0.778 | 0.766 |
| XGBoost | 0.833 | 0.789 | 0.798 | 0.789 |
| Year 1–2 | | | | |
| Logistic Regression | 0.670 | 0.549 | 0.646 | 0.584 |
| Random Forest | 0.847 | 0.801 | 0.794 | 0.796 |
| XGBoost | 0.856 | 0.748 | 0.779 | 0.757 |
| Year 1–3 | | | | |
| Logistic Regression | 0.686 | 0.597 | 0.622 | 0.603 |
| Random Forest | 0.893 | 0.789 | 0.820 | 0.803 |
| XGBoost | 0.897 | 0.831 | 0.837 | 0.834 |
Abbreviation: AUROC, area under the receiver operating characteristic curve.
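For readers unfamiliar with the four metrics in the table, the sketch below computes them from their standard definitions using only the standard library. It is not the authors' evaluation pipeline; the function names and the toy labels are illustrative assumptions. Recall is TP/(TP+FN), precision is TP/(TP+FP), the F1-score is their harmonic mean, and AUROC is computed here as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Recall, precision and F1 for binary labels (1 = incident diabetes)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison of positive and negative scores.

    Quadratic in the number of samples; fine for illustration only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores, not taken from the study.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.3, 0.6, 0.8, 0.1]
metrics = classification_metrics(y_true, y_pred)
area = auroc(y_true, scores)
```

Note that a tabulated F1-score need not equal the harmonic mean of the tabulated precision and recall if the metrics were averaged over folds or classes; whether that applies here is not stated in this record.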
Figure 2. ROC curves of the three models for all the datasets.
Figure 3. Feature importance of each dataset using LASSO.