Yan-Ting Wu, Chen-Jie Zhang, Ben Willem Mol, Andrew Kawai, Cheng Li, Lei Chen, Yu Wang, Jian-Zhong Sheng, Jian-Xia Fan, Yi Shi, He-Feng Huang.
Abstract
CONTEXT: Accurate methods for predicting gestational diabetes mellitus (GDM) early, during the first trimester of pregnancy, are lacking in Chinese and other populations.
Keywords: BMI; GDM; early prediction; early pregnancy; machine learning models; thyroxine
Year: 2021 PMID: 33351102 PMCID: PMC7947802 DOI: 10.1210/clinem/dgaa899
Source DB: PubMed Journal: J Clin Endocrinol Metab ISSN: 0021-972X Impact factor: 5.958
Figure 1. Variable selection results. A, Spearman correlation coefficients between each variable and the gestational diabetes mellitus (GDM)/non-GDM label vector, over all the samples. The bar plots from left to right represent absolute values from high to low. B, Spearman correlation coefficients between all the variables over vectors of all the samples. Detailed correlation coefficient values can be found in Supplementary Table S5 (18). C, Variable-way hierarchical clustering results using distance metrics based on Spearman correlation coefficients.
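The correlation screen described for panel A can be illustrated with a minimal, standard-library-only sketch. It assumes each variable is a numeric column and that GDM status is coded 0/1. The function names and the toy data are hypothetical, and the distance `1 - |rho|` is only one common choice for correlation-based clustering; the caption does not state the exact metric the authors used.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def correlation_distance(x, y):
    """Assumed clustering distance: strongly correlated variables are close."""
    return 1 - abs(spearman(x, y))

# Hypothetical toy data, not taken from the study.
variables = {
    "fpg": [4.3, 4.5, 4.4, 5.2, 5.0, 4.9],
    "age": [28, 35, 30, 33, 29, 36],
}
gdm = [0, 0, 0, 1, 1, 1]

# Rank variables by absolute correlation with the label, high to low.
ranking = sorted(
    ((name, spearman(col, gdm)) for name, col in variables.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
```

Sorting by absolute value mirrors the ordering of the bars in panel A; the pairwise `correlation_distance` values could then be passed to any hierarchical clustering routine.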
Figure 2. A and B, Ten-fold cross-validation (CV)-based detailed prediction outcomes of each variable selection iteration. The yellow and blue elements represent predicted gestational diabetes mellitus (GDM) cases and predicted non-GDM cases, respectively. A, Seeking optimal accuracy. B, Seeking optimal area under the curve (AUC). C and D, Variable selection trajectory guided by classification accuracy and AUC, respectively, under a 10-fold CV framework. E and F, Leave-one-out CV-based detailed prediction outcomes of each variable selection iteration. The yellow and blue elements represent predicted GDM cases and predicted non-GDM cases, respectively. E, Seeking optimal accuracy. F, Seeking optimal AUC. G and H, Variable selection trajectory guided by classification accuracy and AUC, respectively, under a leave-one-out CV framework.
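A selection trajectory of this kind can be sketched as greedy forward selection scored by cross-validated accuracy. This is an illustrative assumption: the caption does not give the exact search procedure, the value of k, or any preprocessing, and all names and data below are hypothetical. The sketch uses a plain k-nearest neighbor classifier with Euclidean distance, so real features would need to be put on a comparable scale first.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if votes * 2 > k else 0

def cv_accuracy(rows, labels, features, n_folds=5, k=3):
    """Accuracy of KNN restricted to `features`, estimated by n-fold CV.

    Setting n_folds = len(rows) gives leave-one-out CV.
    """
    correct = 0
    for fold in range(n_folds):
        test_idx = [i for i in range(len(rows)) if i % n_folds == fold]
        train_idx = [i for i in range(len(rows)) if i % n_folds != fold]
        train_x = [[rows[i][f] for f in features] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        for i in test_idx:
            query = [rows[i][f] for f in features]
            correct += knn_predict(train_x, train_y, query, k) == labels[i]
    return correct / len(rows)

def forward_select(rows, labels, n_features, **cv_args):
    """Add the feature that most improves CV accuracy; stop when none does."""
    selected, best, trajectory = [], 0.0, []
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        scores = {f: cv_accuracy(rows, labels, selected + [f], **cv_args)
                  for f in candidates}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best:
            break
        selected.append(f_best)
        best = scores[f_best]
        trajectory.append((f_best, best))
    return selected, trajectory

# Hypothetical toy data: column 0 separates the classes, column 1 is noise.
rows = [[0.10, 5], [0.20, 1], [0.15, 9], [0.30, 3], [0.25, 7],
        [0.90, 4], [1.00, 8], [0.95, 2], [1.10, 6], [0.85, 0]]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

selected, trajectory = forward_select(rows, labels, n_features=2,
                                      n_folds=5, k=3)
```

Swapping `cv_accuracy` for a cross-validated AUC would give the AUC-guided variant; the recorded `trajectory` corresponds to the kind of curve shown in panels C, D, G, and H.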
Sociodemographic characteristics of the training group and testing group
| Characteristic | 2017 training group, n = 16 819 | 2018 testing group, n = 15 371 | P |
|---|---|---|---|
| Age, y, median (IQR) | 31 (28-34) | 31 (28-34) | .68 |
| BMI before pregnancy (kg/m2), median (IQR) | 20.8 (19.3-22.6) | 20.5 (19.5-22.5) | .51 |
| Smoking | 95 (0.6) | 74 (0.5) | .30 |
| Educational background | | | |
| Primary school degree | 15 (0.1) | 9 (0.1) | .70 |
| Junior high school degree | 388 (2.3) | 360 (2.3) | |
| High school degree | 889 (5.3) | 789 (5.1) | |
| University degree and above | 15 527 (92.3) | 14 213 (92.5) | |
| Family history of diabetes in a first-degree relative | 1202 (7.1) | 1046 (6.8) | .23 |
| GDM | 2696 (16.0) | 2216 (14.4) | .07 |
| Personal history of GDM | 176 (1.0) | 138 (0.9) | .18 |
| Natural pregnancy | 14 504 (86.2) | 13 258 (86.3) | .96 |
| Multiple pregnancy | 489 (2.9) | 466 (3.0) | .51 |
| Multipara | 5539 (32.9) | 4833 (31.4) | .004 |
Abbreviations: BMI, body mass index; GDM, gestational diabetes mellitus; IQR, interquartile range.
Sociodemographic characteristics of gestational diabetes mellitus (GDM) and non-GDM cases
| Characteristic | 2017 training group: GDM cases (n = 2696), n (%) | 2017 training group: controls (n = 14 123), n (%) | P | 2018 testing group: GDM cases (n = 2216), n (%) | 2018 testing group: controls (n = 13 155), n (%) | P |
|---|---|---|---|---|---|---|
| Age, y, median (IQR) | 32 (29-36) | 30 (28-34) | < .001 | 33 (30-36) | 30 (28-33) | < .001 |
| < 38 | 2340 (86.8) | 13 240 (93.7) | < .001 | 1933 (87.2) | 12 324 (93.7) | < .001 |
| ≥ 38 | 356 (13.2) | 883 (6.3) | | 283 (12.8) | 831 (6.3) | |
| Weight, kg, before pregnancy, median (IQR) | 56.0 (52.0-62.0) | 54.5 (50.0-59.0) | < .001 | 58.0 (52.0-64.0) | 55.0 (50.0-59.0) | < .001 |
| Height, cm, median (IQR) | 161.0 (158.0-165.0) | 162.0 (159.0-165.0) | < .001 | 161.0 (158.0-165.0) | 162.0 (160.0-165.0) | < .001 |
| BMI before pregnancy (kg/m2), median (IQR) | 21.6 (20.1-23.6) | 20.7 (19.3-22.3) | < .001 | 22.1 (20.1-24.4) | 20.8 (19.5-22.1) | < .001 |
| ≤ 23 | 1620 (60.1) | 9926 (70.2) | < .001 | 1386 (62.5) | 11 064 (84.1) | < .001 |
| > 23 | 1076 (39.9) | 4197 (29.7) | | 830 (37.5) | 2091 (15.9) | |
| Drinking | 7 (0.3) | 39 (0.3) | 1.00 | 35 (1.6) | 218 (1.7) | 0.79 |
| Smoking | 14 (0.5) | 81 (0.6) | .89 | 13 (0.6) | 61 (0.5) | 0.44 |
| Educational background | | | | | | |
| Primary school degree | 5 (0.2) | 10 (0.1) | < .001 | 3 (0.1) | 6 (0.05) | < .001 |
| Junior high school degree | 102 (3.8) | 286 (2.0) | | 65 (2.9) | 295 (2.2) | |
| High school degree | 162 (6.0) | 727 (5.1) | | 142 (6.4) | 647 (4.9) | |
| University degree and above | 2427 (90.0) | 13 100 (92.8) | | 2006 (90.5) | 12 207 (92.8) | |
| Family history of diabetes in a first-degree relative | 439 (16.3) | 763 (5.4) | < .001 | 341 (15.4) | 705 (5.4) | < .001 |
Abbreviations: BMI, body mass index; GDM, gestational diabetes mellitus; IQR, interquartile range.
Selecting variables by k-nearest neighbor
| | 10-fold (accuracy) | LOO (accuracy) | 10-fold (AUC) | LOO (AUC) |
|---|---|---|---|---|
| Selected variables | FPG | FPG | FPG | FPG |
| | Lipoprotein(a) | FPG | FPG | HbA1c |
| | Total 3,5,3′-triiodothyronine | Lipoprotein(a) | Lipoprotein(a) | Family history of diabetes in a first-degree relative |
| | Age | Total 3,5,3′-triiodothyronine | Total 3,5,3′-triiodothyronine | Triglyceride |
| | Multiple pregnancy | Age | Triglyceride | Age |
| | | Total thyroxin | Age | Total 3,5,3′-triiodothyronine |
| | | ApoA | HbA1c | Lipoprotein(a) |
| | | Multipara | Total thyroxin | Age |
| | | Multiple pregnancy | Age | Total thyroxin |
| | | | ApoB | Multipara |
| | | | Multipara | ApoA |
| | | | Previous GDM | Multiple pregnancy |
| | | | Multiple pregnancy | Previous GDM |
| | | | Smoking | |
Using the 10-fold method, 5 variables were selected to obtain optimal accuracy; using the LOO method, 9 variables were selected to obtain optimal accuracy; using the 10-fold method, 14 variables were selected to obtain the optimal ROC area; using the LOO method, 13 variables were selected to obtain the optimal ROC area.
Abbreviations: ApoA, apolipoprotein A; ApoB, apolipoprotein B; AUC, area under the curve; FPG, fasting plasma glucose; GDM, gestational diabetes mellitus; HbA1c, glycated hemoglobin; LOO, leave-one-out; ROC, receiver operating characteristic.
Categorical variable: age younger than 38 years: 0, age 38 years or older: 1; FPG less than 5.1 mmol/L: 0, FPG 5.1 or greater and less than 7.0 mmol/L: 1; HbA1c 5.7 or less: 0, HbA1c greater than 5.7 and less than 6.5: 1.
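The categorical coding in the footnote above can be written out as a short sketch. The function names are hypothetical, and FPG is assumed to be in mmol/L and HbA1c in percent. The footnote does not define a code for FPG of 7.0 mmol/L or more, or for HbA1c of 6.5 or more, so the sketch returns `None` for those values as an explicit assumption rather than guessing a category.

```python
def code_age(age_years):
    """0 for age younger than 38 years, 1 for age 38 years or older."""
    return 1 if age_years >= 38 else 0

def code_fpg(fpg_mmol_l):
    """0 for FPG < 5.1 mmol/L, 1 for 5.1 <= FPG < 7.0 mmol/L.

    Values of 7.0 mmol/L or more are not covered by the footnote,
    so None is returned (an assumption of this sketch).
    """
    if fpg_mmol_l < 5.1:
        return 0
    if fpg_mmol_l < 7.0:
        return 1
    return None

def code_hba1c(hba1c_percent):
    """0 for HbA1c <= 5.7, 1 for 5.7 < HbA1c < 6.5.

    Values of 6.5 or more are not covered by the footnote,
    so None is returned (an assumption of this sketch).
    """
    if hba1c_percent <= 5.7:
        return 0
    if hba1c_percent < 6.5:
        return 1
    return None
```

Because age, FPG, and HbA1c each appear both as a continuous value and as a coded category, the same variable name can occur twice in one column of the selection table.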
Selected 17 variables in the training group and testing group
| Characteristic | 2017 training group: GDM cases (n = 2696), n (%) | 2017 training group: controls (n = 14 123), n (%) | P | 2018 testing group: GDM cases (n = 2216), n (%) | 2018 testing group: controls (n = 13 155), n (%) | P |
|---|---|---|---|---|---|---|
| Age, y, median (IQR) | 32 (29-36) | 30 (28-34) | < .001 | 33 (30-36) | 30 (28-33) | < .001 |
| Age ≥ 38 y | 356 (13.2) | 883 (6.3) | < .001 | 283 (12.8) | 831 (6.3) | < .001 |
| Smoking | 14 (0.5) | 81 (0.6) | .89 | 13 (0.6) | 61 (0.5) | .44 |
| Family history of diabetes in a first-degree relative | 439 (16.3) | 763 (5.4) | < .001 | 341 (15.4) | 705 (5.4) | < .001 |
| Personal history of GDM | 132 (4.9) | 44 (0.3) | < .001 | 94 (4.2) | 44 (0.3) | < .001 |
| Multiple pregnancy | 110 (4.1) | 379 (2.7) | < .001 | 80 (3.6) | 386 (2.9) | < .001 |
| Multipara | 1053 (39.1) | 4486 (31.8) | < .001 | 825 (37.2) | 4008 (30.5) | < .001 |
| ApoA | 2.01 (1.93-2.08) | 1.98 (1.94-2.02) | < .001 | 2.15 (2.01-2.29) | 2.14 (2.01-2.27) | .17 |
| ApoB | 0.89 (0.84-0.94) | 0.85 (0.84-0.88) | < .001 | 0.79 (0.70-0.91) | 0.74 (0.65-0.85) | < .001 |
| Triglyceride | 1.47 (1.15-1.89) | 1.22 (0.97-1.52) | < .001 | 1.49 (1.16-1.93) | 1.24 (0.98-1.57) | < .001 |
| Lipoprotein(a) | 157.8 (101.5-185.9) | 191.2 (173.3-210.9) | < .001 | 103.0 (46.0-216.3) | 123.0 (57.0-232.0) | < .001 |
| FPG, mM | 4.77 (4.49-5.13) | 4.50 (4.30-4.70) | < .001 | 4.78 (4.50-5.14) | 4.54 (4.33-4.73) | < .001 |
| FPG 5.1 to < 7.0 mM | 766 (28.4) | 494 (3.5) | < .001 | 614 (27.7) | 400 (3.0) | < .001 |
| HbA1c, % | 5.3 (5.1-5.5) | 5.1 (5.0-5.3) | < .001 | 5.4 (5.2-5.6) | 5.2 (5.1-5.4) | < .001 |
| HbA1c > 5.7% and < 6.5% | 179 (6.6) | 71 (0.5) | < .001 | 241 (10.9) | 131 (1.0) | < .001 |
| Total thyroxin, pM | 114.2 (106.6-119.0) | 116.0 (112.6-120.1) | < .001 | 115.7 (99.4-132.9) | 118.9 (102.8-134.2) | < .001 |
| Total 3,5,3′-triiodothyronine, nM | 2.10 (2.00-2.23) | 2.02 (1.97-2.08) | < .001 | 2.10 (1.90-2.40) | 2.00 (1.80-2.30) | < .001 |
Abbreviations: ApoA, apolipoprotein A; ApoB, apolipoprotein B; FPG, fasting plasma glucose; GDM, gestational diabetes mellitus; HbA1c, glycated hemoglobin.
Categorical variable: age younger than 38 years: 0, age 38 years or older: 1; FPG less than 5.1 mmol/L: 0, FPG 5.1 or greater and less than 7.0 mmol/L: 1; HbA1c 5.7 or less: 0, HbA1c greater than 5.7 and less than 6.5: 1.
Multivariate analysis for the 7-variable logistic regression model
| Variable | β | Adjusted odds ratio (95% CI) | P |
|---|---|---|---|
| Intercept | −14.2334 | – | < .001 |
| Age | .0681 | 1.070 (1.058-1.083) | < .001 |
| Previous GDM | 2.6181 | 13.710 (9.532-19.718) | < .001 |
| Family history of diabetes in a first-degree relative | 1.1062 | 3.023 (2.610-3.501) | < .001 |
| Multiple pregnancy | .4349 | 1.545 (1.208-1.976) | .001 |
| FPG | 2.8165 | 16.718 (14.125-19.788) | < .001 |
| HbA1c | 1.6925 | 5.433 (4.472-6.600) | < .001 |
| Triglyceride | .5005 | 1.650 (1.528-1.781) | < .001 |
Abbreviations: FPG, fasting plasma glucose; GDM, gestational diabetes mellitus; HbA1c, glycated hemoglobin.
Categorical variable: FPG less than 5.1 mM: 0, FPG 5.1 mM or greater: 1.
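The coefficients in the table above can be turned into a predicted probability with the standard logistic function, and each adjusted odds ratio is the exponential of its β. The sketch below is an illustration under stated assumptions, not the authors' implementation: the dictionary keys are hypothetical names, age is assumed to be in years, HbA1c in percent, triglyceride in the units reported in the study's tables, and the history, multiple-pregnancy, and FPG terms are assumed to be 0/1 indicators (the footnote confirms this only for FPG).

```python
import math

# β values copied from the table; key names are hypothetical.
COEFFICIENTS = {
    "intercept": -14.2334,
    "age_years": 0.0681,
    "previous_gdm": 2.6181,        # assumed 0/1 indicator
    "family_history": 1.1062,      # assumed 0/1 indicator
    "multiple_pregnancy": 0.4349,  # assumed 0/1 indicator
    "fpg_ge_5_1": 2.8165,          # 1 if FPG >= 5.1 mM, else 0
    "hba1c_percent": 1.6925,
    "triglyceride": 0.5005,
}

def odds_ratio(beta):
    """Odds ratio for a one-unit increase in the predictor."""
    return math.exp(beta)

def predicted_probability(patient):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    z = COEFFICIENTS["intercept"] + sum(
        COEFFICIENTS[name] * patient[name]
        for name in COEFFICIENTS
        if name != "intercept"
    )
    return 1 / (1 + math.exp(-z))

# Hypothetical patient, for illustration only.
example_patient = {
    "age_years": 30,
    "previous_gdm": 0,
    "family_history": 0,
    "multiple_pregnancy": 0,
    "fpg_ge_5_1": 0,
    "hba1c_percent": 5.2,
    "triglyceride": 1.2,
}
risk = predicted_probability(example_patient)
```

Exponentiating each β reproduces the odds ratios listed in the table to within rounding, which is a quick consistency check on the transcription.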
Figure 3. Discriminative power comparison between different prediction models. A, Receiver operating characteristic (ROC) curves of different prediction models based on the 7-variable panel (*) and all-variable panel (**). B and C, Violin plot comparisons of predicted score distribution using different prediction models with the 7-variable panel and the all-variable panel.
Sensitivity and specificity of different models
| Prediction model | AUC (95% CI) | Optimum threshold probability | Sensitivity, % | Specificity, % | Youden index |
|---|---|---|---|---|---|
| LR (a) | 0.77 (0.76-0.78) | 0.13 | 59 | 82 | 0.41 |
| LR (b) | 0.77 (0.76-0.78) | 0.02 | 58 | 86 | 0.44 |
| KNN (a) | 0.65 (0.63-0.66) | – | 31 | 98 | 0.29 |
| KNN (b) | 0.61 (0.59-0.62) | – | 23 | 99 | 0.22 |
| SVM (a) | 0.66 (0.65-0.67) | 0.14 | 32 | 98 | 0.30 |
| SVM (b) | 0.77 (0.76-0.78) | 0.15 | 32 | 98 | 0.30 |
| DNN (a) | 0.77 (0.76-0.78) | 0.10 | 70 | 69 | 0.39 |
| DNN (b) | 0.80 (0.79-0.81) | 0.15 | 63 | 82 | 0.45 |
Abbreviations: AUC, area under the curve; DNN, deep neural network; KNN, k-nearest neighbor; LR, logistic regression; SVM, support vector machine.
(a) Seven-variable model.
(b) All-variable model.
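The Youden index column follows from sensitivity and specificity as J = sensitivity + specificity − 1, and an optimum threshold is commonly taken as the cutoff that maximizes J. The sketch below checks the published rows and shows one way such a threshold could be located. The helper names and the toy scores are hypothetical, and the rule "predict GDM when the score is at or above the threshold" is an assumption; the table does not state how the authors searched for their thresholds.

```python
def youden_index(sensitivity, specificity):
    """Youden's J for sensitivity and specificity given as fractions."""
    return sensitivity + specificity - 1

def best_threshold(scores, labels):
    """Return (threshold, J) maximizing J, predicting 1 when score >= threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best = (None, -1.0)
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = youden_index(tp / positives, tn / negatives)
        if j > best[1]:
            best = (t, j)
    return best

# (sensitivity %, specificity %, published Youden index), in table order.
published_rows = [
    (59, 82, 0.41), (58, 86, 0.44), (31, 98, 0.29), (23, 99, 0.22),
    (32, 98, 0.30), (32, 98, 0.30), (70, 69, 0.39), (63, 82, 0.45),
]
recomputed = [youden_index(se / 100, sp / 100) for se, sp, _ in published_rows]

# Hypothetical predicted scores and true labels, for illustration only.
toy_scores = [0.05, 0.10, 0.20, 0.40, 0.60]
toy_labels = [0, 0, 1, 0, 1]
toy_threshold, toy_j = best_threshold(toy_scores, toy_labels)
```

All eight published Youden indices agree with the recomputed values to two decimal places.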
Figure 4. Calibration of different models. The P values of all prediction models in Hosmer-Lemeshow (HL) tests are less than .001. The 7-variable models (A to C) show better HL test performance than the all-variable models (D to F). A model that incorporates all of the variables without selection tends to overfit, which impairs model calibration.
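For readers unfamiliar with the HL test, the statistic compares observed and expected event counts within groups of patients ordered by predicted risk. The following is a minimal sketch under stated assumptions: ten roughly equal-sized groups by default (the caption does not say how the authors grouped patients), hypothetical function and variable names, and toy data. Only the statistic is computed; the P value would come from a chi-square distribution, which the Python standard library does not provide.

```python
def hosmer_lemeshow_statistic(probabilities, outcomes, n_groups=10):
    """Sum over risk-ordered groups of (O - E)^2 / (n * p_bar * (1 - p_bar)).

    O is the observed number of events, E the sum of predicted
    probabilities, n the group size, and p_bar = E / n. A group whose
    mean prediction is exactly 0 or 1 would divide by zero.
    """
    pairs = sorted(zip(probabilities, outcomes))
    n = len(pairs)
    statistic = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        size = len(group)
        expected = sum(p for p, _ in group)
        observed = sum(y for _, y in group)
        mean_p = expected / size
        statistic += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
    return statistic

# Hypothetical toy data: two risk groups of ten patients each.
toy_probabilities = [0.2] * 10 + [0.8] * 10
well_calibrated = [1, 1] + [0] * 8 + [1] * 8 + [0, 0]   # 2/10 and 8/10 events
poorly_calibrated = [1] * 10 + [1] * 8 + [0, 0]         # 10/10 and 8/10 events

hl_good = hosmer_lemeshow_statistic(toy_probabilities, well_calibrated, n_groups=2)
hl_bad = hosmer_lemeshow_statistic(toy_probabilities, poorly_calibrated, n_groups=2)
```

A larger statistic indicates a larger gap between predicted and observed event rates. With cohorts as large as those in this study, even small gaps can produce very small P values.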
Clinical features of gestational diabetes mellitus (GDM) and non-GDM cases in the first trimester
| Characteristic | 2017 training group: GDM cases (n = 2696), n (%) | 2017 training group: controls (n = 14 123), n (%) | P | 2018 testing group: GDM cases (n = 2216), n (%) | 2018 testing group: controls (n = 13 155), n (%) | P |
|---|---|---|---|---|---|---|
| SBP, mm Hg, median (IQR) | 114 (10-122) | 110 (102-117) | < .001 | 115 (106-124) | 110 (102-117) | < .001 |
| DBP, mm Hg, median (IQR) | 71 (65-77) | 68 (62-73) | < .001 | 71 (64-79) | 68 (62-74) | < .001 |
| PCOS | 13 (0.5) | 30 (0.2) | .02 | 30 (1.4) | 65 (0.5) | < .001 |
| Personal history of GDM | 132 (4.9) | 44 (0.3) | < .001 | 94 (4.2) | 44 (0.3) | < .001 |
| Natural pregnancy | 2180 (80.9) | 12 324 (87.3) | < .001 | 1796 (81.0) | 11 462 (87.1) | < .001 |
| Multiple pregnancy | 110 (4.1) | 379 (2.7) | < .001 | 80 (3.6) | 386 (2.9) | < .001 |
| Multipara | 1053 (39.1) | 4486 (31.8) | < .001 | 825 (37.2) | 4008 (30.5) | < .001 |
Abbreviations: DBP, diastolic blood pressure; GDM, gestational diabetes mellitus; IQR, interquartile range; PCOS, polycystic ovary syndrome; SBP, systolic blood pressure.