Yuhan Du, Anthony R Rafferty, Fionnuala M McAuliffe, Lan Wei, Catherine Mooney.
Abstract
Gestational Diabetes Mellitus (GDM), a common pregnancy complication associated with many maternal and neonatal consequences, is more prevalent in mothers with overweight and obesity. Interventions initiated early in pregnancy can reduce the rate of GDM in these women; however, untargeted interventions can be costly and time-consuming. We have developed an explainable machine learning-based clinical decision support system (CDSS) to identify at-risk women in need of targeted pregnancy intervention. Maternal characteristics and blood biomarkers at baseline from the PEARS study were used. After appropriate data preparation, synthetic minority oversampling technique and feature selection, five machine learning algorithms were applied with five-fold cross-validated grid search optimising the balanced accuracy. Our models were explained with Shapley additive explanations to increase the trustworthiness and acceptability of the system. We developed multiple models for different use cases: theoretical (AUC-PR 0.485, AUC-ROC 0.792), GDM screening during a normal antenatal visit (AUC-PR 0.208, AUC-ROC 0.659), and remote GDM risk assessment (AUC-PR 0.199, AUC-ROC 0.656). Our models have been implemented as a web server that is publicly available for academic use. Our explainable CDSS demonstrates the potential to assist clinicians in screening at-risk patients who may benefit from early pregnancy GDM prevention strategies.
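The synthetic minority oversampling technique named in the abstract creates new minority-class samples by interpolating between an existing minority sample and one of its nearest minority neighbours. The following is a minimal standard-library sketch of that idea, not the authors' pipeline: the function name `smote`, its parameters, and the toy two-feature data are assumptions made purely for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of equal-length numeric feature vectors (minority class only).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (Euclidean distance)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up minority samples with two features; these are not PEARS data.
toy_minority = [[4.8, 11.0], [5.0, 12.5], [4.9, 10.2], [5.2, 13.1]]
new_points = smote(toy_minority, n_new=6, k=2)
print(new_points)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the range spanned by the minority class. In practice, oversampling of this kind is normally applied to training folds only, so that synthetic points never leak into validation or test data.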
Year: 2022 PMID: 35064173 PMCID: PMC8782851 DOI: 10.1038/s41598-022-05112-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Previously published machine learning-based GDM risk prediction models.
| Authors | Subjects/data | Algorithm | Specificity | Sensitivity | AUC-ROC |
|---|---|---|---|---|---|
| Qiu et al. | 4,378 women | Cost-sensitive hybrid model of logistic regression, support vector machine and CHAID tree | 0.998 | 0.622 | 0.847 |
| Zheng et al. | 4,771 women | Multivariate Bayesian logistic regression | 0.75 | 0.66 | 0.766 |
| Ye et al. | 22,242 pregnancies | Gradient boosting decision tree | 0.99 | 0.15 | 0.74 |
| | | | 0.26 | 0.90 | |
| Artzi et al. | 588,622 pregnancies | Gradient boosting | – | – | 0.854 |
| Xiong et al. | 490 women | Light gradient boosting machine | 0.995 | 0.883 | 0.942 |
| Yan et al. | 3,988 women | Logistic regression | – | 0.706 | 0.779 |
| Hou et al. | 1,000 samples | Light gradient boosting machine | – | – | 0.852 |
| Wu et al. | 32,190 women | Deep neural network | 0.82 | 0.63 | 0.80 |
| Wu et al. | 17,005 women | Random forest | 0.269 | 0.91 | 0.746 |
| | | | 0.524 | 0.75 | |
| | | | 0.777 | 0.487 | |
Where multiple models were developed, the best performing model is described.
Descriptive features for gestational diabetes mellitus prediction.
| Feature | Non-GDM (n=413) | GDM (n=71) |
|---|---|---|
| Gestational Age (Weeks) | 14.94 (1.66) | 14.76 (1.62) |
| Maternal Age (Years) | 32.35 (4.44) | 33.80 (3.98) |
| Pobal HP Deprivation Index | 6.07 (11.30) | 7.11 (10.83) |
| Parity | 0.71 (0.90) | 0.91 (1.02) |
| Height (m) | 1.64 (0.07) | 1.64 (0.07) |
| Weight (kg) | 78.71 (10.72) | 81.72 (10.86) |
| Body Mass Index (BMI) (kg/m²) | 29.06 (3.21) | 30.23 (3.67) |
| Mid Upper Arm Circumference (MUAC) (cm) | 30.90 (2.47) | 31.33 (2.47) |
| White Cell Count ( | 8.64 (1.89) | 9.28 (1.72) |
| Fasting Glucose (mmol/L) | 4.50 (0.29) | 4.87 (0.38) |
| Insulin (mU/L) | 8.99 (4.03) | 11.11 (5.36) |
| C-peptide (ng/mL) | 1.38 (0.58) | 1.79 (0.93) |
| Total Cholesterol (mmol/L) | 5.45 (0.94) | 5.20 (0.88) |
| High-Density Lipoprotein (HDL) Cholesterol (mmol/L) | 1.52 (0.43) | 1.48 (0.45) |
| Low-Density Lipoprotein (LDL) Cholesterol (mmol/L) | 3.30 (0.91) | 3.06 (0.88) |
| Triglycerides (mmol/L) | 1.39 (0.49) | 1.45 (0.48) |
| Complement Component 3 (C3) (mg/dL) | 156.04 (26.02) | 166.88 (28.16) |
| C-Reactive Protein (CRP) (mg/L) | 3.06 (7.12) | 6.11 (16.22) |
| Leptin (ng/mL) | 40.98 (19.36) | 46.71 (24.72) |
| Adiponectin ( | 17.55 (9.87) | 12.39 (5.67) |
| Ethnicity | | |
| White Irish | 312 (75.54) | 49 (69.01) |
| Other White | 66 (15.98) | 12 (16.90) |
| Black | 5 (1.21) | 0 (0) |
| Chinese | 5 (1.21) | 2 (2.82) |
| Other Asian | 11 (2.66) | 4 (5.63) |
| Mixed | 5 (1.21) | 2 (2.82) |
| Not Specified | 9 (2.18) | 2 (2.82) |
| Education | | |
| Level 1: No schooling | 0 (0) | 0 (0) |
| Level 2: Primary school education only | 0 (0) | 0 (0) |
| Level 3: Some secondary education only | 10 (2.42) | 1 (1.41) |
| Level 4: Complete secondary education only | 48 (11.62) | 7 (9.86) |
| Level 5: Some third degree education only | 84 (20.34) | 17 (23.94) |
| Level 6: Complete third degree education | 262 (63.44) | 43 (60.56) |
| Family History of Diabetes Mellitus (DM) | | |
| 1: Yes | 78 (18.89) | 26 (36.62) |
| 2: No | 328 (79.42) | 45 (63.38) |
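The group sizes in the table header show how imbalanced the cohort is, which matters when reading the AUC-PR values reported for the models: a classifier with no skill has an expected AUC-PR roughly equal to the positive-class prevalence. The short sketch below computes that prevalence and reproduces two of the tabulated percentages from the counts. The helper `pct` is a hypothetical name, and the prevalence is for the whole cohort; the test-set prevalence is not stated in this table.

```python
n_non_gdm, n_gdm = 413, 71  # group sizes from the table header

# Share of GDM cases in the cohort; also the approximate no-skill AUC-PR.
prevalence = n_gdm / (n_non_gdm + n_gdm)


def pct(count, total):
    """Percentage of a group, rounded to two decimals as in the table."""
    return round(100 * count / total, 2)


# Family history of diabetes mellitus = "Yes", per group.
family_history_yes = {"non_gdm": pct(78, n_non_gdm), "gdm": pct(26, n_gdm)}

print(f"GDM prevalence: {prevalence:.3f}")
print(family_history_yes)
```

Against a baseline near 0.15, an AUC-PR of 0.485 represents a much larger gain than the raw number suggests on a 0 to 1 scale.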
Figure 1. Workflow diagram.
Figure 2. (a) Percentage of missing values for each feature. (b) Mean absolute error for generated missing values using different imputation methods.
Figure 3. Correlation plot.
Hyper-parameters for each algorithm in the grid search.
| Algorithm | Hyper-parameters |
|---|---|
| Logistic regression | C: 0.1, 1, 10 |
| | solver: newton-cg, lbfgs, liblinear, sag, saga |
| | penalty: l1 (liblinear, saga solver only), l2, elasticnet (saga solver only) |
| Random forest | n_estimators: 100, 200, 300, 500 |
| | max_depth: 10, 20, 30, 50 |
| | max_features: auto, sqrt |
| | min_samples_leaf: 1, 2, 4 |
| | min_samples_split: 2, 5, 10 |
| Support vector machine | kernel: rbf, poly, sigmoid, linear |
| | C: 0.1, 1, 10 |
| | degree: 2, 3, 4 (poly kernel only) |
| | gamma: scale, auto (rbf, poly, sigmoid kernel only) |
| Adaptive boosting | n_estimators: 20, 50, 100 |
| | learning_rate: 0.1, 0.2, 0.3 |
| Extreme gradient boosting | n_estimators: 20, 50, 100 |
| | learning_rate: 0.1, 0.2, 0.3 |
| | max_depth: 4, 6, 8 |
| | objective: binary:logistic |
| | subsample: 0.6, 0.8, 1 |
| | colsample_bytree: 0.6, 0.8, 1 |
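Several entries in the grid are conditional (for example, `l1` only with the `liblinear` and `saga` solvers, `degree` only with the `poly` kernel), so the searched space is smaller than the full Cartesian product. The sketch below enumerates the valid combinations for logistic regression and the support vector machine under one reading of those constraints. The variable names are illustrative assumptions, and whether the authors pruned the grid in exactly this way is not stated.

```python
from itertools import product

C_VALUES = [0.1, 1, 10]

# Logistic regression: solver/penalty compatibility as listed in the table.
SOLVERS = ["newton-cg", "lbfgs", "liblinear", "sag", "saga"]
ALLOWED_SOLVERS = {
    "l1": {"liblinear", "saga"},
    "l2": set(SOLVERS),
    "elasticnet": {"saga"},
}
logreg_grid = [
    {"C": c, "solver": s, "penalty": p}
    for c, s, p in product(C_VALUES, SOLVERS, ALLOWED_SOLVERS)
    if s in ALLOWED_SOLVERS[p]
]

# Support vector machine: gamma applies to rbf/poly/sigmoid, degree to poly only.
KERNELS = ["rbf", "poly", "sigmoid", "linear"]
svm_grid = []
for kernel, c in product(KERNELS, C_VALUES):
    gammas = [None] if kernel == "linear" else ["scale", "auto"]
    degrees = [2, 3, 4] if kernel == "poly" else [None]
    for gamma, degree in product(gammas, degrees):
        params = {"kernel": kernel, "C": c}
        if gamma is not None:
            params["gamma"] = gamma
        if degree is not None:
            params["degree"] = degree
        svm_grid.append(params)

print(len(logreg_grid), "logistic regression settings")
print(len(svm_grid), "support vector machine settings")
```

Under these constraints the logistic regression grid has 24 settings rather than the 45 of the unconstrained product, and the support vector machine grid has 33 rather than 72.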
Figure 4. Models’ balanced accuracy in cross-validation.
Model performance on the entire independent test set and complete-case independent test set.
| Model | Test cases | AUC-PR | AUC-ROC | Sensitivity | Specificity | Balanced accuracy |
|---|---|---|---|---|---|---|
| 1 | All (110 cases) | 0.485 | 0.792 | 0.733 | 0.768 | 0.751 |
| 1 | Complete (77 cases) | 0.551 | 0.860 | 0.833 | 0.754 | 0.794 |
| 2 | All (110 cases) | 0.208 | 0.659 | 0.6 | 0.6 | 0.6 |
| 2 | Complete (77 cases) | 0.256 | 0.690 | 0.583 | 0.6 | 0.592 |
| 3 | All (110 cases) | 0.199 | 0.656 | 0.533 | 0.674 | 0.604 |
| 3 | Complete (77 cases) | 0.320 | 0.687 | 0.5 | 0.708 | 0.604 |
Model 1: feature-agnostic model. Features: family history of diabetes mellitus, weight, white cell count, fasting glucose, insulin.
Model 2: clinical routine model. Features: gestational age, maternal age, family history of diabetes mellitus, weight, white cell count.
Model 3: remotely usable model. Features: gestational age, maternal age, family history of diabetes mellitus, weight.
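Balanced accuracy, the metric optimised in the grid search, is the mean of sensitivity and specificity. As a quick consistency check, the sketch below recomputes it from the sensitivity and specificity columns of the table above and compares the result with the reported column. The tolerance of 0.001 is an assumption that allows for the inputs themselves being rounded to three decimals.

```python
# (model, test cases, sensitivity, specificity, reported balanced accuracy)
rows = [
    ("Model 1", "All", 0.733, 0.768, 0.751),
    ("Model 1", "Complete", 0.833, 0.754, 0.794),
    ("Model 2", "All", 0.6, 0.6, 0.6),
    ("Model 2", "Complete", 0.583, 0.6, 0.592),
    ("Model 3", "All", 0.533, 0.674, 0.604),
    ("Model 3", "Complete", 0.5, 0.708, 0.604),
]


def balanced_accuracy(sensitivity, specificity):
    """Mean of the true-positive rate and the true-negative rate."""
    return (sensitivity + specificity) / 2


gaps = []
for model, cases, sens, spec, reported in rows:
    computed = balanced_accuracy(sens, spec)
    gaps.append(abs(computed - reported))
    print(f"{model} ({cases}): computed {computed:.4f}, reported {reported}")

max_gap = max(gaps)
```

Every row agrees with the reported value to within rounding.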
Model performance on the independent test set and independent cross-cultural/ethnic test set.
| Model | Test population | AUC-PR | AUC-ROC | Sensitivity | Specificity | Balanced accuracy |
|---|---|---|---|---|---|---|
| 1 | Non-white (45 cases) | 0.572 | 0.717 | 0.6 | 0.8 | 0.7 |
| 1 | White (110 cases) | 0.485 | 0.792 | 0.733 | 0.768 | 0.751 |
| 2 | Non-white (45 cases) | 0.263 | 0.643 | 0.3 | 0.686 | 0.493 |
| 2 | White (110 cases) | 0.208 | 0.659 | 0.6 | 0.6 | 0.6 |
| 3 | Non-white (45 cases) | 0.293 | 0.677 | 0.3 | 0.829 | 0.564 |
| 3 | White (110 cases) | 0.199 | 0.656 | 0.533 | 0.674 | 0.604 |
Model 1: feature-agnostic model. Features: family history of diabetes mellitus, weight, white cell count, fasting glucose, insulin.
Model 2: clinical routine model. Features: gestational age, maternal age, family history of diabetes mellitus, weight, white cell count.
Model 3: remotely usable model. Features: gestational age, maternal age, family history of diabetes mellitus, weight.
Independent test cases.
| Case | Outcome | Gestational age (weeks) | Maternal age (years) | Family history of DM | Weight (kg) | White cell count ( | Fasting glucose (mmol/L) | Insulin (mU/L) |
|---|---|---|---|---|---|---|---|---|
| 1 | GDM | 14 | 30.48 | 2: no | 99.2 | 9.8 | 4.8 | 11.29 |
| 2 | non-GDM | 14 | 30.08 | 2: no | 75.5 | 9.9 | 4.5 | 8.42 |
Figure 5. Global feature importance based on mean absolute SHAP value for (a) Model 1 (b) Model 2 (c) Model 3.
Figure 6. Local interpretation based on SHAP values for (a) Model 1 (b) Model 2 (c) Model 3.
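A local SHAP interpretation assigns each feature a contribution such that the contributions sum to the difference between the prediction for the case and a reference prediction. The sketch below computes exact Shapley values by averaging marginal contributions over all feature orderings. It is a toy illustration only: the logistic model `toy_risk`, its weights, and the single-point baseline are hypothetical and are not the fitted models or the SHAP library's algorithm; the instance borrows the weight, fasting glucose, and family history of Case 1 purely as example inputs.

```python
from itertools import permutations
from math import exp, factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values relative to a single baseline point."""
    n = len(instance)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = predict(current)
        for i in order:
            # Switch feature i from its baseline value to the instance value.
            current[i] = instance[i]
            value = predict(current)
            phi[i] += value - previous
            previous = value
    n_orders = factorial(n)
    return [total / n_orders for total in phi]


# Hypothetical logistic model; the weights are made up for illustration.
WEIGHTS = [0.04, 1.5, 0.9]  # weight (kg), fasting glucose (mmol/L), family history
BIAS = -11.0


def toy_risk(x):
    z = BIAS + sum(w * v for w, v in zip(WEIGHTS, x))
    return 1 / (1 + exp(-z))


instance = [99.2, 4.8, 0.0]   # example inputs; family history 0 = no
baseline = [78.71, 4.50, 0.0]  # arbitrary reference point (assumption)

phi = shapley_values(toy_risk, instance, baseline)
print(phi, sum(phi), toy_risk(instance) - toy_risk(baseline))
```

The contributions add up to the gap between the two predictions, and a feature whose value equals the baseline (family history here) receives a contribution of zero. Enumerating all orderings costs n! model evaluations, which is feasible only for a handful of features such as the four or five used by these models.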
Comparison with previously published machine learning-based GDM prediction models.
| Model | Population | No. of features | Clinical test needed | Specificity | Sensitivity | AUC-ROC | AUC-PR |
|---|---|---|---|---|---|---|---|
| Model 1 | Overweight & obese | 5 | Fasting blood test (fasting glucose, insulin, white cell count) | 0.768 | 0.733 | 0.792 | 0.485 |
| Model 2 | Overweight & obese | 5 | Routine blood test (white cell count) | 0.6 | 0.6 | 0.659 | 0.208 |
| Model 3 | Overweight & obese | 4 | No | 0.674 | 0.533 | 0.656 | 0.199 |
| Qiu et al. | General | 49 | Fasting blood test (fasting plasma glucose, complete blood count, liver function test...) | 0.998 | 0.622 | 0.847 | – |
| Zheng et al. | General | 4 | Fasting blood test (fasting plasma glucose, triglycerides) | 0.75 | 0.66 | 0.766 | – |
| Ye et al. | General | 17 | Fasting blood test (fasting glucose, HbA1c, triglycerides...) | 0.99 | 0.15 | 0.74 | – |
| | | | | 0.26 | 0.90 | | |
| Artzi et al. | General | 2,355 | Laboratory tests including fasting blood test (glucose, white cell count...), blood pressure measurement, urine test, and blood test in previous pregnancy (glucose tolerance test...). | – | – | 0.854 | 0.318 |
| | | 9 | Blood test in previous pregnancy (HbA1c% test, glucose challenge test/oral glucose tolerance test) | – | – | 0.799 | 0.241 |
| Xiong et al. | General | 43 | Routine blood test, hepatic and renal function examination, coagulation function examination | 0.995 | 0.883 | 0.942 | – |
| Yan et al. | General | 61 | Blood test (complete blood test, liver function test...), urine test (urine glucose, urinary gallbladder, nitrite...) | – | 0.706 | 0.779 | – |
| Hou et al. | General | 83 | Single nucleotide polymorphism genes, blood test (cholesterol, white blood cell...) | – | – | 0.852 | – |
| Wu et al. | General | 73 | Fasting blood test (fasting plasma glucose, complete blood count, liver function test) | 0.82 | 0.63 | 0.80 | – |
| | | 7 | Fasting blood test (fasting plasma glucose, HbA1c, triglycerides) | 0.82 | 0.59 | 0.77 | – |
| Wu et al. | General | 15 | Blood test (complete blood count, liver function test, ferritin...) | 0.269 | 0.91 | 0.746 | – |
| | | | | 0.524 | 0.75 | | |
| | | | | 0.777 | 0.487 | | |