Abstract
Risk models for abnormal pregnancy-related outcomes such as stillbirth and preterm birth have been proposed in the past. They commonly use maternal demographic and medical history information as predictors and are based on conventional statistical modelling techniques. In this study, we apply state-of-the-art machine learning methods to the task of predicting early stillbirth, late stillbirth and preterm birth. The aim of this experimentation is to discover novel risk models that could be used in a clinical setting. A CDC data set of almost sixteen million observations was used to conduct feature selection, parameter optimization and verification of the proposed models. An additional NYC data set was used for external validation. Algorithms such as logistic regression, artificial neural network and gradient boosting decision tree were used to construct individual classifiers. Ensemble learning strategies combining these classifiers were also evaluated. The best-performing machine learning models achieved an AUC of 0.76 for early stillbirth, 0.63 for late stillbirth and 0.64 for preterm birth on the external NYC test data. The repeatable performance of our models demonstrates the robustness that is required in this context. Our proposed novel models provide a solid foundation for risk prediction and could be further improved with the addition of biochemical and/or biophysical markers.
Keywords: Machine learning; Preterm; Risk prediction; Stillbirth
Year: 2020 PMID: 32226625 PMCID: PMC7096343 DOI: 10.1007/s13755-020-00105-9
Source DB: PubMed Journal: Health Inf Sci Syst ISSN: 2047-2501
Feature variables
| Category | ID | Feature | Type |
|---|---|---|---|
| Demographics | f1 | Age (years) | Discrete |
| | f2 | Race (White, Black, American Indian or Alaskan Native, Asian or Pacific Islander) | Nominal |
| | f3 | Marital status | Nominal |
| | f4 | Education (8th grade or less to doctorate) | Nominal |
| | f5 | Number of previous terminations | Discrete |
| | f6 | Special supplemental nutrition program (WIC) | Binary |
| | f7 | Smoking before pregnancy | Nominal |
| | f8 | Body mass index (BMI) | Continuous |
| | f9 | Height (inches) | Continuous |
| | f10 | Weight (pounds) | Continuous |
| | f11 | Parity | Nominal |
| Pregnancy history | f12 | Pre-pregnancy diabetes | Binary |
| | f13 | Gestational diabetes | Binary |
| | f14 | Pre-pregnancy hypertension | Binary |
| | f15 | Gestational hypertension | Binary |
| | f16 | Hypertension eclampsia | Binary |
| | f17 | Previous preterm births | Binary |
| | f18 | Infertility treatment | Binary |
| | f19 | Infertility drugs | Binary |
| | f20 | Assisted reproductive technology (ART) | Binary |
| | f21 | Previous cesarean sections | Binary |
| Infections | f22 | Gonorrhea | Binary |
| | f23 | Syphilis | Binary |
| | f24 | Chlamydia | Binary |
| | f25 | Hepatitis B | Binary |
| | f26 | Hepatitis C | Binary |
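
Features f8 to f10 are closely related: BMI is conventionally derived from weight and height. The sketch below shows the standard conversion from the imperial units listed in the table. Whether the source data records BMI this way is an assumption, and the function name `bmi_from_imperial` is hypothetical.

```python
def bmi_from_imperial(weight_lb: float, height_in: float) -> float:
    """Body mass index (kg/m^2) from weight in pounds and height in inches.

    Assumption: the standard BMI definition; the source does not state
    how the BMI feature (f8) was computed.
    """
    if weight_lb <= 0 or height_in <= 0:
        raise ValueError("weight and height must be positive")
    weight_kg = weight_lb * 0.45359237  # exact pound-to-kilogram factor
    height_m = height_in * 0.0254       # exact inch-to-metre factor
    return weight_kg / (height_m * height_m)


# Illustrative input only: 156 lb at 64.2 in
print(round(bmi_from_imperial(156, 64.2), 1))  # 26.6
```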
Data observations
| Data | Normal | Early stillbirth | Late stillbirth | PTB |
|---|---|---|---|---|
| Feature selection | 1,178,146 (92.5%) | 782 (0.06%) | 809 (0.06%) | 93,813 (7.36%) |
| Training | 8,331,492 (92.5%) | 5578 (0.06%) | 5806 (0.06%) | 662,026 (7.35%) |
| Validation | 1,196,173 (92.5%) | 796 (0.06%) | 845 (0.07%) | 95,033 (7.35%) |
| Test | 1,195,800 (92.5%) | 768 (0.06%) | 850 (0.07%) | 95,429 (7.38%) |
| NYC | 266,419 (93.2%) | 139 (0.05%) | 110 (0.04%) | 19,203 (6.72%) |
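
The percentages in the table above follow directly from the counts, and they make the class imbalance explicit. As a minimal sketch, the snippet below recomputes the shares for the Training row; the helper name `class_shares` is hypothetical.

```python
# Counts copied from the "Training" row of the data observations table.
training_counts = {
    "Normal": 8_331_492,
    "Early stillbirth": 5_578,
    "Late stillbirth": 5_806,
    "PTB": 662_026,
}


def class_shares(counts):
    """Return each class's share of the total, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


shares = class_shares(training_counts)
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```

With stillbirth classes at roughly 0.06% of observations, plain accuracy is uninformative, which is one reason ranking metrics such as AUC are reported.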
Descriptive statistics of CDC data
| | Normal | Early stillbirth | Late stillbirth | PTB |
|---|---|---|---|---|
| Age | ||||
| Mean (SD) | 28.5 (5.65) | 28.0 (5.99) | 28.3 (6.01) | 28.7 (6.02) |
| Range | 18.0–50.0 | 18.0–50.0 | 18.0–48.0 | 18.0–50.0 |
| Race | ||||
| White | 9,170,035 (77.0%) | 4701 (59.3%) | 5620 (67.6%) | 669,375 (70.7%) |
| Black | 1,771,832 (14.9%) | 2770 (35.0%) | 2231 (26.8%) | 205,091 (21.7%) |
| American Indian or Alaskan Native | 129,546 (1.1%) | 116 (1.5%) | 102 (1.2%) | 11,962 (1.3%) |
| Asian or Pacific Islander | 830,198 (7.0%) | 337 (4.3%) | 357 (4.3%) | 59,873 (6.3%) |
| Marital status | ||||
| Married | 4,592,768 (38.6%) | 4359 (55.0%) | 3932 (47.3%) | 439,202 (46.4%) |
| Not married | 7,308,843 (61.4%) | 3565 (45.0%) | 4378 (52.7%) | 507,099 (53.6%) |
| Education | ||||
| 8th grade or less | 412,938 (3.5%) | 281 (3.5%) | 347 (4.2%) | 33,244 (3.5%) |
| 9th through 12th grade with no diploma | 1,169,038 (9.8%) | 1027 (13.0%) | 1031 (12.4%) | 118,041 (12.5%) |
| High school graduate or GED completed | 2,985,280 (25.1%) | 2667 (33.7%) | 2665 (32.1%) | 267,006 (28.2%) |
| Some college credit, but not a degree | 2,574,183 (21.6%) | 1865 (23.5%) | 1828 (22.0%) | 218,090 (23.0%) |
| Associate degree | 995,733 (8.4%) | 607 (7.7%) | 650 (7.8%) | 78,195 (8.3%) |
| Bachelor’s degree | 2,393,391 (20.1%) | 984 (12.4%) | 1277 (15.4%) | 148,341 (15.7%) |
| Master’s degree | 1,065,979 (9.0%) | 392 (4.9%) | 420 (5.1%) | 64,909 (6.9%) |
| Doctorate or Professional Degree | 305,069 (2.6%) | 101 (1.3%) | 92 (1.1%) | 18,475 (2.0%) |
| Number of previous terminations | ||||
| Mean (SD) | 0.40 (0.85) | 0.69 (1.20) | 0.65 (1.17) | 0.52 (1.03) |
| Range | 0.00–30.0 | 0.00–13.0 | 0.00–17.0 | 0.00–27.0 |
| WIC | ||||
| No | 6,950,753 (58.4%) | 5108 (64.5%) | 5410 (65.1%) | 518,777 (54.8%) |
| Yes | 4,950,858 (41.6%) | 2816 (35.5%) | 2900 (34.9%) | 427,524 (45.2%) |
| Smoking before pregnancy | ||||
| Nonsmoker | 10,685,669 (89.8%) | 6707 (84.6%) | 7179 (86.4%) | 817,116 (86.3%) |
| 1–5 | 309,680 (2.6%) | 329 (4.2%) | 300 (3.6%) | 31,856 (3.4%) |
| 6–10 | 415,142 (3.5%) | 438 (5.5%) | 402 (4.8%) | 43,269 (4.6%) |
| 11–20 | 419,861 (3.5%) | 397 (5.0%) | 356 (4.3%) | 45,524 (4.8%) |
| 21–40 | 61,875 (0.5%) | 45 (0.6%) | 63 (0.8%) | 7383 (0.8%) |
| 41 or more | 9384 (0.1%) | 8 (0.1%) | 10 (0.1%) | 1153 (0.1%) |
| BMI | ||||
| Mean (SD) | 26.6 (6.52) | 28.6 (7.70) | 28.3 (7.51) | 27.3 (7.16) |
| Range | 10.5–168 | 13.7–68.7 | 10.0–67.4 | 10.0–125 |
| Height (in.) | ||||
| Mean (SD) | 64.2 (2.84) | 64.0 (2.83) | 64.0 (2.84) | 63.9 (2.87) |
| Range | 30.0–78.0 | 48.0–78.0 | 46.0–78.0 | 34.0–78.0 |
| Weight (pounds) | ||||
| Mean (SD) | 156 (40.5) | 167 (47.8) | 165 (46.4) | 159 (44.3) |
| Range | 75.0–375 | 75.0–375 | 75.0–375 | 75.0–375 |
| Parity | ||||
| Nulliparous | 6,786,170 (57.0%) | 5649 (71.3%) | 5373 (64.7%) | 576,415 (60.9%) |
| Parous | 5,115,441 (43.0%) | 2275 (28.7%) | 2937 (35.3%) | 369,886 (39.1%) |
| Pre-pregnancy diabetes | ||||
| No | 11,823,600 (99.3%) | 7787 (98.3%) | 8135 (97.9%) | 922,519 (97.5%) |
| Yes | 78,011 (0.7%) | 137 (1.7%) | 175 (2.1%) | 23,782 (2.5%) |
| Gestational diabetes | ||||
| No | 11,255,544 (94.6%) | 7752 (97.8%) | 8019 (96.5%) | 868,686 (91.8%) |
| Yes | 646,067 (5.4%) | 172 (2.2%) | 291 (3.5%) | 77,615 (8.2%) |
| Pre-pregnancy hypertension | ||||
| No | 11,737,430 (98.6%) | 7662 (96.7%) | 8084 (97.3%) | 906,296 (95.8%) |
| Yes | 164,181 (1.4%) | 262 (3.3%) | 226 (2.7%) | 40,005 (4.2%) |
| Gestational hypertension | ||||
| No | 11,362,046 (95.5%) | 7632 (96.3%) | 8010 (96.4%) | 817,889 (86.4%) |
| Yes | 539,565 (4.5%) | 292 (3.7%) | 300 (3.6%) | 128,412 (13.6%) |
| Hypertension eclampsia | ||||
| No | 11,883,299 (99.8%) | 7895 (99.6%) | 8289 (99.7%) | 936,176 (98.9%) |
| Yes | 18,312 (0.2%) | 29 (0.4%) | 21 (0.3%) | 10,125 (1.1%) |
| Previous preterm birth | ||||
| No | 11,629,604 (97.7%) | 7163 (90.4%) | 7733 (93.1%) | 854,379 (90.3%) |
| Yes | 272,007 (2.3%) | 761 (9.6%) | 577 (6.9%) | 91,922 (9.7%) |
| Infertility treatment | ||||
| No | 11,784,432 (99.0%) | 7774 (98.1%) | 8178 (98.4%) | 932,874 (98.6%) |
| Yes | 117,179 (1.0%) | 151 (1.9%) | 133 (1.6%) | 13,427 (1.4%) |
| Infertility drugs | ||||
| No | 11,848,427 (99.6%) | 7867 (99.3%) | 8249 (99.3%) | 939,687 (99.3%) |
| Yes | 53,184 (0.4%) | 57 (0.7%) | 61 (0.7%) | 6614 (0.7%) |
| ART | ||||
| No | 11,847,091 (99.5%) | 7844 (99.0%) | 8250 (99.3%) | 940,647 (99.4%) |
| Yes | 54,520 (0.5%) | 80 (1.0%) | 60 (0.7%) | 5654 (0.6%) |
| Previous cesarean sections | ||||
| No | 10,107,199 (84.9%) | 6972 (88.0%) | 7204 (86.7%) | 777,914 (82.2%) |
| Yes | 1,794,412 (15.1%) | 952 (12.0%) | 1106 (13.3%) | 168,387 (17.8%) |
| Gonorrhea | ||||
| No | 11,873,368 (99.8%) | 7893 (99.6%) | 8271 (99.5%) | 942,694 (99.6%) |
| Yes | 28,243 (0.2%) | 31 (0.4%) | 39 (0.5%) | 3607 (0.4%) |
| Syphilis | ||||
| No | 11,892,843 (99.9%) | 7915 (99.9%) | 8300 (99.9%) | 945,206 (99.9%) |
| Yes | 8768 (0.1%) | 9 (0.1%) | 10 (0.1%) | 1095 (0.1%) |
| Chlamydia | ||||
| No | 11,695,524 (98.3%) | 7724 (97.5%) | 8142 (98.0%) | 925,808 (97.8%) |
| Yes | 206,087 (1.7%) | 200 (2.5%) | 168 (2.0%) | 20,493 (2.2%) |
| Hepatitis B | ||||
| No | 11,874,921 (99.8%) | 7907 (99.8%) | 8298 (99.9%) | 944,139 (99.8%) |
| Yes | 26,690 (0.2%) | 17 (0.2%) | 12 (0.1%) | 2162 (0.2%) |
| Hepatitis C | ||||
| No | 11,863,301 (99.7%) | 7875 (99.4%) | 8273 (99.6%) | 939,814 (99.3%) |
| Yes | 38,310 (0.3%) | 49 (0.6%) | 37 (0.4%) | 6487 (0.7%) |
Fig. 1 Pearson correlation matrix of feature variables
Fig. 2 Venn diagram of the three infertility-related feature variables from the whole study data
Univariate results; selected variables per outcome are highlighted
| Feature | Early stillbirth | p | Late stillbirth | p | PTB | p |
|---|---|---|---|---|---|---|
| Age | 1.00 (0.98, 1.01) | 0.43 | < | |||
| Race | < | 1.00 (0.91, 1.08) | 0.95 | < | ||
| Marital status | < | < | < | |||
| Education | < | < | < | |||
| Number of previous terminations | < | < | < | |||
| WIC | < | < | < | |||
| Smoking before pregnancy | < | 1.06 (0.96, 1.15) | 0.22 | < | ||
| BMI | < | < | < | |||
| Height | 0.98 (0.95, 1.00) | 0.07 | < | < | ||
| Parity | < | < | ||||
| Pre-pregnancy diabetes | < | < | < | |||
| Gestational diabetes | < | < | < | |||
| Pre-pregnancy hypertension | < | < | < | |||
| Gestational hypertension | 0.87 (0.60, 1.23) | 0.46 | < | |||
| Hypertension eclampsia | < | 2.41 (0.60, 6.26) | 0.13 | < | ||
| Previous preterm births | < | < | < | |||
| Infertility treatment | < | < | < | |||
| Infertility drugs | 1.73 (0.68, 3.52) | 0.18 | 1.95 (0.84, 3.79) | 0.08 | < | |
| ART | < | |||||
| Previous cesarean sections | 0.87 (0.71, 1.06) | 0.18 | < | |||
| Gonorrhea | 1.09 (0.18, 3.37) | 0.90 | 2.11 (0.65, 4.92) | 0.14 | < | |
| Syphilis | 1.70 (0.10, 7.49) | 0.60 | < 0.01 (< 0.01, < 0.01) | 0.94 | < | |
| Chlamydia | 1.09 (0.63, 1.75) | 0.73 | 1.42 (0.88, 2.14) | 0.12 | < | |
| Hepatitis B | 1.71 (0.43, 4.46) | 0.35 | < 0.01 (< 0.01, < 0.01) | 0.93 | 1.04 (0.91, 1.19) | 0.56 |
| Hepatitis C | 1.99 (0.71, 4.29) | 0.13 | 1.15 (0.29, 2.99) | 0.81 | < |
Model results of CDC test data
| Model | Early stillbirth AUC (95% CI) | TPR at 10% FPR (%) | Late stillbirth AUC (95% CI) | TPR at 10% FPR (%) | Preterm AUC (95% CI) | TPR at 10% FPR (%) |
|---|---|---|---|---|---|---|
| Logistic regression | 0.73 (0.71, 0.74) | 38 | 0.58 (0.55, 0.60) | 15 | 0.64 (0.64, 0.64) | 27 |
| Deep NN | 0.73 (0.72, 0.75) | 37 | 0.57 (0.54, 0.60) | 16 | 0.66 (0.66, 0.66) | 30 |
| SELU network | 0.75 (0.73, 0.76) | 40 | 0.59 (0.56, 0.62) | 17 | 0.67 (0.66, 0.67) | 31 |
| LGBM | 0.75 (0.74, 0.77) | 39 | 0.60 (0.58, 0.63) | 17 | 0.67 (0.67, 0.67) | 31 |
| Averaged ensemble | 0.75 (0.74, 0.77) | 39 | 0.60 (0.57, 0.62) | 18 | 0.67 (0.66, 0.67) | 31 |
| WA ensemble | 0.75 (0.74, 0.77) | 40 | 0.60 (0.58, 0.63) | 19 | 0.67 (0.67, 0.67) | 31 |
Model results of NYC test data
| Model | Early stillbirth AUC (95% CI) | TPR at 10% FPR (%) | Late stillbirth AUC (95% CI) | TPR at 10% FPR (%) | Preterm AUC (95% CI) | TPR at 10% FPR (%) |
|---|---|---|---|---|---|---|
| Logistic regression | 0.74 (0.69, 0.78) | 37 | 0.61 (0.56, 0.66) | 18 | 0.62 (0.61, 0.62) | 22 |
| Deep NN | 0.74 (0.70, 0.77) | 37 | 0.54 (0.49, 0.59) | 15 | 0.63 (0.63, 0.64) | 24 |
| SELU network | 0.76 (0.73, 0.79) | 38 | 0.59 (0.54, 0.65) | 15 | 0.64 (0.63, 0.64) | 24 |
| LGBM | 0.76 (0.70, 0.79) | 37 | 0.61 (0.55, 0.67) | 22 | 0.64 (0.63, 0.64) | 24 |
| Averaged ensemble | 0.75 (0.72, 0.79) | 38 | 0.63 (0.57, 0.68) | 21 | 0.63 (0.63, 0.64) | 22 |
| WA ensemble | 0.76 (0.71, 0.79) | 38 | 0.62 (0.56, 0.67) | 26 | 0.63 (0.63, 0.64) | 23 |
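
Both results tables report AUC and the true positive rate (TPR) at a 10% false positive rate (FPR). The sketch below shows one common way to compute these two metrics from labels and predicted scores. It is not the authors' implementation: the function names and the toy data are hypothetical, and the source does not say whether the TPR was interpolated between thresholds.

```python
def roc_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) formulation, averaging tied ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    n_pos = sum(labels)
    n_neg = n - n_pos
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def tpr_at_fpr(labels, scores, max_fpr=0.10):
    """Highest TPR at any score threshold whose FPR does not exceed max_fpr."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    order = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(order):
        j = i
        # Lower the threshold past all observations sharing this score.
        while j < len(order) and order[j][0] == order[i][0]:
            tp += order[j][1]
            fp += 1 - order[j][1]
            j += 1
        if fp / n_neg > max_fpr:
            break
        best = tp / n_pos
        i = j
    return best


# Toy data (hypothetical): 3 positives, 10 negatives.
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05]
print(roc_auc(labels, scores))     # 26 of 30 positive-negative pairs ranked correctly
print(tpr_at_fpr(labels, scores))  # 2 of 3 positives found with 1 of 10 false positives
```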
Fig. 3 Weight grid searches of early stillbirth (a), late stillbirth (b) and preterm (c) for WA ensemble. Color is determined by the calculated AUC of the ensemble
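
A weight grid search for a weighted-average (WA) ensemble can be sketched as follows: each candidate weight vector is used to blend the individual classifiers' predictions, and the blend is scored by AUC. This is an illustration under assumptions, not the authors' procedure. The grid resolution, the constraint that weights sum to one, the function names and the toy predictions are all hypothetical. In practice the weights would normally be tuned on a validation split rather than on test data.

```python
from itertools import product


def pairwise_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def weighted_average(preds, weights):
    """Blend per-model prediction lists using the given weights."""
    total = sum(weights)
    n = len(preds[0])
    return [sum(w * p[i] for w, p in zip(weights, preds)) / total for i in range(n)]


def grid_search_weights(labels, preds, steps=5):
    """Evaluate every weight vector on a grid with weights summing to 1.

    Returns (best_auc, best_weights).
    """
    best_auc, best_weights = -1.0, None
    for raw in product(range(steps + 1), repeat=len(preds)):
        if sum(raw) != steps:
            continue  # keep only grid points on the simplex
        weights = [r / steps for r in raw]
        auc = pairwise_auc(labels, weighted_average(preds, weights))
        if auc > best_auc:
            best_auc, best_weights = auc, weights
    return best_auc, best_weights


# Toy predictions from three hypothetical classifiers.
labels = [1, 0, 1, 0, 1, 0]
preds = [
    [0.6, 0.5, 0.4, 0.3, 0.7, 0.2],
    [0.7, 0.4, 0.3, 0.5, 0.6, 0.1],
    [0.5, 0.6, 0.8, 0.2, 0.4, 0.3],
]
best_auc, best_weights = grid_search_weights(labels, preds)
print(best_auc, best_weights)
```

Because the grid includes the corner points where a single model gets all the weight, the best blend found this way scores at least as well as the best individual classifier on the data used for the search.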