Seung Mi Lee1,2, Suhyun Hwangbo3, Errol R Norwitz4, Ja Nam Koo5, Ig Hwan Oh5, Eun Saem Choi2, Young Mi Jung1,2, Sun Min Kim1,6, Byoung Jae Kim1,6, Sang Youn Kim7, Gyoung Min Kim8, Won Kim9,10, Sae Kyung Joo9,10, Sue Shin11,12, Chan-Wook Park1,2, Taesung Park3,13, Joong Shin Park1,2.
Abstract
BACKGROUND/AIMS: To develop an early prediction model for gestational diabetes mellitus (GDM) using machine learning and to evaluate whether the inclusion of nonalcoholic fatty liver disease (NAFLD)-associated variables increases the performance of the model.
Keywords: Diabetes, Gestational; Machine learning; Nonalcoholic fatty liver disease; Prediction; Pregnancy, High-risk
Year: 2021 PMID: 34649307 PMCID: PMC8755469 DOI: 10.3350/cmh.2021.0174
Source DB: PubMed Journal: Clin Mol Hepatol ISSN: 2287-2728
Figure 1. Workflow of the study.
Baseline features and pregnancy outcomes of the study population
| Characteristic | No GDM (n=1,357) | GDM (n=86) | P-value | |
|---|---|---|---|---|
| Baseline characteristic | ||||
| Age (years) | 32.3±4.0 | 32.4±4.6 | 0.758 | |
| Nulliparity | 716 (52.8) | 51 (59.3) | 0.286 | |
| BMI before pregnancy (kg/m²) | 22.1±3.6 | 25.1±4.9 | <0.001 | |
| WC before pregnancy (cm) (n=1,418) | 70.9±5.7 | 74.5±7.3 | <0.001 | |
| Laboratory result in early pregnancy | ||||
| Gestational age at measurement | 7.8±1.4 | 7.8±1.5 | 0.952 | |
| Hemoglobin (g/dL) | 12.7±1.0 | 13.0±1.0 | 0.006 | |
| Platelet counts (×10³/µL) | 250.7±53.6 | 273.6±57.1 | <0.001 | |
| AST (U/L) | 16.0±5.2 | 18.8±15.5 | 0.098 | |
| ALT (U/L) | 14.3±10.1 | 20.9±20.7 | 0.005 | |
| Laboratory and ultrasound result at 10–14 weeks | ||||
| Gestational age at measurement | 12.4±0.5 | 12.3±0.6 | 0.209 | |
| AST (U/L) | 16.6±10.7 | 17.6±9.1 | 0.819 | |
| ALT (U/L) | 12.8±14.4 | 16.5±14.1 | 0.001 | |
| Cholesterol (mg/dL) | 172.1±30.3 | 179.2±29.5 | 0.028 | |
| HDL cholesterol (mg/dL) | 68.6±14.2 | 63.6±15.3 | 0.012 | |
| LDL cholesterol (mg/dL) | 81.4±22.3 | 84.3±25.4 | 0.150 | |
| Triglycerides (mg/dL) | 111.0±43.1 | 151.7±77.6 | <0.001 | |
| γ-GT (U/L) | 13.7±8.4 | 16.1±10.1 | 0.001 | |
| Fasting glucose (mg/dL) | 79.6±8.9 | 88.7±13.0 | <0.001 | |
| HSI | 30.3±5.0 | 34.5±5.6 | <0.001 | |
| NAFLD by liver ultrasound | 158 (11.8) | 32 (37.6) | <0.001 | |
| Pregnancy outcome | n=1,327 | n=85 | ||
| Gestational age at delivery (weeks) | 38.9±1.4 | 38.5±1.7 | 0.033 | |
| Birthweight (kg) | 3.2±0.4 | 3.2±0.5 | 0.998 | |
| Large-for-gestational age neonates | 137 (10.3) | 15 (17.6) | 0.053 | |
Values are presented as mean±standard deviation or number (%).
GDM, gestational diabetes mellitus; BMI, body mass index; WC, waist circumference; AST, aspartate aminotransferase; ALT, alanine aminotransferase; HDL, high-density lipoprotein; LDL, low-density lipoprotein; γ-GT, gamma-glutamyl transferase; HSI, hepatic steatosis index; NAFLD, nonalcoholic fatty liver disease.
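The HSI row above is a composite score, and the record does not spell out how it was calculated. As a minimal sketch, assuming the commonly used definition of the hepatic steatosis index (8 × ALT/AST + BMI, plus 2 for women and plus 2 for diabetes mellitus), it could be computed as follows. The function name and the example values are hypothetical and are not taken from the study.

```python
def hepatic_steatosis_index(alt: float, ast: float, bmi: float,
                            female: bool, diabetes: bool) -> float:
    """Hepatic steatosis index under the commonly used definition (assumption):
    HSI = 8 * (ALT / AST) + BMI (+2 if female, +2 if diabetes mellitus)."""
    if ast <= 0:
        raise ValueError("AST must be positive")
    score = 8.0 * (alt / ast) + bmi
    if female:
        score += 2.0
    if diabetes:
        score += 2.0
    return score


# Hypothetical patient, for illustration only (not study data).
example_hsi = hepatic_steatosis_index(alt=16.0, ast=16.0, bmi=22.0,
                                      female=True, diabetes=False)
print(example_hsi)
```

Because every participant in a pregnancy cohort is female, the +2 term for sex would apply to all of them under this definition.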
Comparison of risk factors in the study population
| Characteristic | No GDM (n=1,357) | GDM (n=86) | P-value | ||
|---|---|---|---|---|---|
| Risk factors in old criteria, 1998 | |||||
| Classified as high-risk women by old criteria | 387 (28.5) | 51 (59.3) | <0.001 | ||
| Severe obesity, BMI ≥30 kg/m² | 51 (3.8) | 13 (15.1) | <0.001 | ||
| Family history of type 2 diabetes | 290 (21.4) | 31 (36.0) | 0.002 | ||
| Previous GDM | 24 (1.8) | 7 (8.1) | <0.001 | ||
| Impaired fasting glucose | 20 (1.5) | 18 (20.9) | <0.001 | ||
| Glucosuria | 35 (2.6) | 8 (9.3) | 0.001 | ||
| Risk factors in new ACOG criteria, 2018 | |||||
| Classified as high-risk women by new criteria | 194 (14.3) | 36 (41.9) | <0.001 | ||
| Overweight or obese, BMI ≥23 kg/m² | 418 (30.8) | 47 (54.7) | <0.001 | ||
| Physical inactivity | 161 (11.9) | 10 (11.6) | 1.000 | ||
| Family history of type 2 diabetes | 290 (21.4) | 31 (36.0) | 0.002 | ||
| High-risk race or ethnicity | 0 (0.0) | 0 (0.0) | - | ||
| Previous macrosomia | 15 (1.1) | 1 (1.2) | 1.000 | ||
| Previous GDM | 24 (1.8) | 7 (8.1) | <0.001 | ||
| Preexisting hypertension | 11 (0.8) | 3 (3.5) | 0.059 | ||
| Low HDL, <35 mg/dL | 13/1,350 (1.0) | 1/84 (1.2) | 1.000 | ||
| High TG, >250 mg/dL | 14/1,350 (1.0) | 6/84 (7.1) | <0.001 | ||
| PCOS | 23 (1.7) | 2 (2.3) | 0.993 | ||
| Impaired fasting glucose | 20 (1.5) | 18 (20.9) | <0.001 | ||
| History of cardiovascular disease | 8 (0.6) | 1 (1.2) | 1.000 | ||
| Severe obesity, BMI ≥30 kg/m² | 51 (3.8) | 13 (15.1) | <0.001 | ||
Values are presented as number (%).
The risk factors in the old criteria were from the 4th International Workshop Conference on GDM in 1998 [2]; the risk factors in the new criteria were based on the recommendation of the American Diabetes Association, which defined high-risk women as overweight or obese women with one of the risk factors [3].
GDM, gestational diabetes mellitus; BMI, body mass index; ACOG, American College of Obstetricians and Gynecologists; HDL, high-density lipoprotein; TG, triglycerides; PCOS, polycystic ovarian syndrome.
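The "classified as high-risk" rows above are enough to work out how well each set of criteria screens for GDM. The sketch below derives sensitivity and specificity from those counts; the function name is hypothetical, and the only inputs are the numbers printed in the table.

```python
from typing import Tuple


def sensitivity_specificity(cases_flagged: int, cases_total: int,
                            noncases_flagged: int, noncases_total: int) -> Tuple[float, float]:
    """Sensitivity = flagged cases / all cases.
    Specificity = unflagged non-cases / all non-cases."""
    sensitivity = cases_flagged / cases_total
    specificity = (noncases_total - noncases_flagged) / noncases_total
    return sensitivity, specificity


# Counts from the table: GDM n=86, no GDM n=1,357.
old_criteria = sensitivity_specificity(51, 86, 387, 1357)
new_criteria = sensitivity_specificity(36, 86, 194, 1357)

for label, (sens, spec) in (("Old criteria", old_criteria),
                            ("New criteria", new_criteria)):
    print(f"{label}: sensitivity {sens:.1%}, specificity {spec:.1%}")
```

For the old criteria this reproduces the 59.3% sensitivity and 71.5% specificity quoted in the caption of Figure 2. For the new criteria the counts give a sensitivity of 41.9% and a specificity of about 85.7%, marginally different from the 85.9% stated in that caption.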
Results of predictive modeling
| Setting | Variables used | Prediction model | Development set AUC | Development set Sen | Development set Spe | Development set P-value | Test set AUC | Test set Sen | Test set Spe | Test set P-value | P-value vs. setting 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Setting 1 | (1) Conventional ACOG risk factors | LR | 0.728 | 0.649 | 0.723 | <0.001 | 0.609 [e] | 0.483 | 0.698 | 0.041 | 0.194 [a] |
| | | RF | 0.667 | 0.368 | 0.961 | <0.001 | 0.565 | 0.172 | 0.962 | 0.082 | 0.003 [b] |
| | | SVM | 0.713 | 0.649 | 0.723 | <0.001 | 0.600 | 0.483 | 0.698 | 0.053 | 0.003 [c] |
| | | DNN | 0.683 | 0.525 | 0.817 | <0.001 | 0.585 | 0.359 | 0.796 | 0.042 | 0.023 [d] |
| Setting 2 | (1) + (2) New ACOG risk factors from 2017 | LR | 0.777 | 0.719 | 0.734 | <0.001 | 0.563 | 0.481 | 0.728 | 0.364 | 0.105 [a] |
| | | RF | 0.702 | 0.456 | 0.945 | <0.001 | 0.578 | 0.222 | 0.951 | 0.069 | 0.009 [b] |
| | | SVM | 0.729 | 0.737 | 0.667 | <0.001 | 0.697 [e] | 0.704 | 0.666 | <0.001 | 0.084 [c] |
| | | DNN | 0.686 | 0.631 | 0.672 | <0.001 | 0.609 | 0.548 | 0.616 | 0.135 | 0.054 [d] |
| Setting 3 | (1) + (2) + (3) Routine clinical variables | LR | 0.842 | 0.809 | 0.761 | <0.001 | 0.617 | 0.520 | 0.758 | 0.104 | 0.297 [a] |
| | | RF | 0.983 | 0.915 | 0.955 | <0.001 | 0.643 [e] | 0.440 | 0.859 | 0.033 | 0.167 [b] |
| | | SVM | 0.810 | 0.638 | 0.870 | <0.001 | 0.605 | 0.520 | 0.725 | 0.095 | 0.008 [c] |
| | | DNN | 0.615 | 0.545 | 0.599 | 0.035 | 0.597 | 0.480 | 0.628 | 0.250 | 0.014 [d] |
| Setting 4 | (1) + (2) + (3) + (4) Variables associated with NAFLD | LR | 0.881 | 0.800 | 0.868 | <0.001 | 0.740 | 0.500 | 0.929 | <0.001 | 0.652 [a] |
| | | RF | 1.000 | 1.000 | 1.000 | <0.001 | 0.781 [e] | 0.750 | 0.670 | <0.001 | 0.647 [b] |
| | | SVM | 1.000 | 1.000 | 1.000 | <0.001 | 0.756 | 0.708 | 0.747 | <0.001 | 0.246 [c] |
| | | DNN | 0.800 | 0.572 | 0.807 | <0.001 | 0.745 | 0.517 | 0.836 | <0.001 | 0.457 [d] |
| Setting 5 | Top 11 important variables selected | LR | 0.840 | 0.778 | 0.779 | <0.001 | 0.719 | 0.542 | 0.872 | 0.001 | 1 |
| | | RF | 1.000 | 1.000 | 0.996 | <0.001 | 0.763 | 0.708 | 0.755 | <0.001 | 1 |
| | | SVM | 0.800 | 0.733 | 0.775 | <0.001 | 0.819 [e] | 0.708 | 0.866 | <0.001 | 1 |
| | | DNN | 0.806 | 0.759 | 0.678 | <0.001 | 0.777 | 0.750 | 0.654 | <0.001 | 1 |
Sen (i.e., sensitivity) and Spe (i.e., specificity) are represented as the values at the threshold with the maximum balanced accuracy.
AUC, area under the receiver operating characteristic curve; Sen, sensitivity; Spe, specificity; ACOG, American College of Obstetricians and Gynecologists; LR, logistic regression; RF, random forest; SVM, support vector machine; DNN, deep neural network; NAFLD, nonalcoholic fatty liver disease.
[a] P-value when compared with the LR model in setting 5 in the test dataset.
[b] P-value when compared with the RF model in setting 5 in the test dataset.
[c] P-value when compared with the SVM model in setting 5 in the test dataset.
[d] P-value when compared with the DNN model in setting 5 in the test dataset.
[e] The maximum test AUC for each setting.
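The table notes state that sensitivity and specificity are reported at the threshold with the maximum balanced accuracy. A minimal sketch of that selection rule is given below, assuming balanced accuracy is defined as the mean of sensitivity and specificity and that a score at or above the threshold counts as a positive prediction. The function name, the tie-breaking rule (lowest threshold wins), and the toy data are assumptions for illustration, not the authors' implementation.

```python
from typing import Sequence, Tuple


def best_balanced_accuracy_threshold(
    y_true: Sequence[int], scores: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) at the candidate threshold
    that maximizes balanced accuracy = (sensitivity + specificity) / 2."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best = None  # (balanced accuracy, threshold, sensitivity, specificity)
    for threshold in sorted(set(scores)):  # every observed score is a candidate
        true_pos = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
        true_neg = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
        sensitivity = true_pos / positives
        specificity = true_neg / negatives
        balanced = (sensitivity + specificity) / 2
        if best is None or balanced > best[0]:  # strict '>' keeps the lowest threshold on ties
            best = (balanced, threshold, sensitivity, specificity)

    _, threshold, sensitivity, specificity = best
    return threshold, sensitivity, specificity


# Toy labels and predicted probabilities (hypothetical, not study data).
labels = [0, 0, 0, 1, 0, 1, 1, 1]
probabilities = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]
print(best_balanced_accuracy_threshold(labels, probabilities))
```

Because the threshold is chosen separately for each model, the Sen and Spe columns show each model at its own best operating point rather than at a shared cut-off.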
Figure 2. Receiver operating characteristic curves of the best prediction model for gestational diabetes in settings 1–5. Setting 1, conventional risk factors using older ACOG criteria. Setting 2, addition of new ACOG risk factors to setting 1. Setting 3, addition of routine clinical variables to setting 2. Setting 4, addition of variables associated with NAFLD to setting 3. Setting 5, top 11 variables. High risk 1, old criteria (from the 4th international workshop) had a sensitivity of 59.3% and specificity of 71.5% for GDM. High risk 2, new criteria (from the ADA) had a sensitivity of 41.9% and specificity of 85.9% for GDM. ACOG, American College of Obstetricians and Gynecologists; NAFLD, nonalcoholic fatty liver disease; GDM, gestational diabetes mellitus; ADA, American Diabetes Association.
Figure 3. Variable importance of the top 11 selected variables in the support vector machine model. TG, triglycerides; HDL, high-density lipoprotein; ALT, alanine aminotransferase; NAFLD, nonalcoholic fatty liver disease; PCOS, polycystic ovarian syndrome; GDM, gestational diabetes mellitus; AUC, area under the receiver operating characteristic curve.