Davide De Francesco1,2,3, Yair J Blumenfeld4, Ivana Marić1, Jonathan A Mayo3, Alan L Chang1,2,3, Ramin Fallahzadeh1,2,3, Thanaphong Phongpreecha1,2,5, Alex J Butwick1, Maria Xenochristou1,2,3, Ciaran S Phibbs3,6, Neda H Bidoki1,2,3, Martin Becker1,2,3, Anthony Culos1,2,3, Camilo Espinosa1,2,3, Qun Liu1,2,3, Karl G Sylvester7, Brice Gaudilliere1,3, Martin S Angst1, David K Stevenson3, Gary M Shaw3, Nima Aghaeepour1,2,3.
Abstract
Whereas prematurity is a major cause of neonatal mortality, morbidity, and lifelong impairment, the degree of prematurity is usually defined by the gestational age (GA) at delivery rather than by neonatal morbidity. Here we propose a multi-task deep neural network model that simultaneously predicts twelve neonatal morbidities, as the basis for a new data-driven approach to define prematurity. Maternal demographics, medical history, obstetrical complications, and prenatal fetal findings were obtained from linked birth certificates and maternal/infant hospitalization records for 11,594,786 livebirths in California from 1991 to 2012. Overall, our model outperformed traditional approaches to assessing prematurity, which are based on GA and/or birthweight (area under the precision-recall curve was 0.326 for our model, 0.229 for GA, and 0.156 for small for GA). These findings highlight the potential of using machine learning techniques to predict multiple prematurity phenotypes and inform clinical decisions to prevent, diagnose, and treat neonatal morbidities.
Keywords: Biological sciences; Cell biology; Molecular biology
Year: 2022 PMID: 35402862 PMCID: PMC8990172 DOI: 10.1016/j.isci.2022.104143
Source DB: PubMed Journal: iScience ISSN: 2589-0042
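As a rough illustration of the multi-task idea in the abstract (one shared representation feeding one output per morbidity), the sketch below runs a forward pass through a tiny network in plain Python. The layer sizes, random weights, feature vector, and function names are all hypothetical. This is not the authors' architecture, only a minimal example of a shared layer with twelve sigmoid heads and a summed binary cross-entropy loss.

```python
import math
import random

N_FEATURES = 5   # hypothetical number of input variables
N_HIDDEN = 8     # hypothetical size of the shared layer
N_TASKS = 12     # twelve neonatal morbidities, as stated in the abstract


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_weights(n_out, n_in, rng):
    # Small random weights; a real model would learn these during training.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def forward(x, w_shared, w_heads):
    # Shared layer: one ReLU layer whose output is reused by every task.
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_shared]
    # Task heads: one sigmoid output (a predicted probability) per morbidity.
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_heads]


def multitask_loss(probs, labels):
    # Sum of the per-task binary cross-entropies.
    eps = 1e-12
    return -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(probs, labels)
    )


rng = random.Random(0)
w_shared = init_weights(N_HIDDEN, N_FEATURES, rng)
w_heads = init_weights(N_TASKS, N_HIDDEN, rng)

x = [0.2, -1.0, 0.5, 0.0, 1.3]   # made-up standardized features for one birth
labels = [0] * N_TASKS
labels[0] = 1                    # made-up outcome vector: first morbidity present

probs = forward(x, w_shared, w_heads)
loss = multitask_loss(probs, labels)
```

Summing the per-task losses is the simplest way to train all heads at once; weighting tasks by prevalence is a common alternative when some outcomes are very rare.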
Descriptive statistics of study population
| | Training dataset (n = 2,000,000) | Validation dataset | Test dataset |
|---|---|---|---|
| Age [years] | 28 (23, 32) | 28 (23, 32) | 28 (23, 32) |
| Race/Ethnicity | |||
| | 630,210 (31.5%) | 630,985 (31.5%) | 2,400,043 (31.6%) |
| | 125,883 (6.3%) | 125,628 (6.3%) | 478,152 (6.3%) |
| | 222,181 (11.1%) | 222,553 (11.1%) | 841,332 (11.1%) |
| | 10,293 (0.5%) | 10,438 (0.5%) | 38,983 (0.5%) |
| | 982,397 (49.1%) | 981,378 (49.1%) | 3,726,848 (49.1%) |
| | 9,092 (0.5%) | 9,027 (0.5%) | 33,622 (0.4%) |
| | 1,323 (0.1%) | 1,361 (0.1%) | 5,085 (0.1%) |
| | 18,621 (0.9%) | 18,630 (0.9%) | 70,721 (0.9%) |
| Education | |||
| | 595,377 (29.8%) | 595,814 (29.8%) | 2,257,225 (29.7%) |
| | 542,869 (27.1%) | 541,316 (27.1%) | 2,062,298 (27.2%) |
| | 402,080 (20.1%) | 402,262 (20.1%) | 1,529,583 (20.1%) |
| | 417,984 (20.9%) | 419,258 (20.9%) | 1,587,390 (20.9%) |
| | 41,690 (2.1%) | 41,350 (2.1%) | 158,290 (2.1%) |
| Parity | |||
| | 778,620 (38.9%) | 776,915 (38.8%) | 2,955,981 (38.9%) |
| | 628,421 (31.4%) | 628,523 (31.4%) | 2,386,792 (31.4%) |
| | 340,841 (17.0%) | 342,372 (17.1%) | 1,296,547 (17.1%) |
| | 250,513 (12.5%) | 250,684 (12.5%) | 949,593 (12.5%) |
| | 1,605 (0.1%) | 1,577 (0.1%) | 5,873 (0.1%) |
| Gender | |||
| | 1,022,913 (51.1%) | 1,023,808 (51.2%) | 3,882,528 (51.1%) |
| | 977,075 (48.9%) | 976,172 (48.8%) | 3,712,177 (48.9%) |
| | 12 (0.0%) | 20 (0.0%) | 81 (0.0%) |
| Birthweight [kg] | 3.37 (3.03, 3.69) | 3.34 (3.03, 3.69) | 3.37 (3.03, 3.69) |
| GA [days] | 276 (268, 282) | 276 (268, 283) | 276 (268, 282) |
| GA ≤32 weeks | 24,793 (1.2%) | 25,008 (1.2%) | 93,371 (1.2%) |
| GA >32 and ≤37 weeks | 170,446 (8.5%) | 170,023 (8.5%) | 644,986 (8.5%) |
| GA >37 and <40 weeks | 965,946 (48.3%) | 964,362 (48.2%) | 3,664,423 (48.3%) |
| GA ≥40 weeks | 661,724 (33.1%) | 663,515 (33.2%) | 2,516,606 (33.1%) |
| Unknown GA | 177,091 (8.9%) | 177,092 (8.9%) | 675,400 (8.9%) |
| SGA | 189,532 (10.0%) | 188,741 (10.0%) | 719,093 (10.0%) |
| PTB | 213,426 (11.7%) | 213,385 (11.7%) | 808,043 (11.7%) |
| RDS | 30,621 (1.5%) | 30,645 (1.5%) | 116,644 (1.5%) |
| IVH | 3,883 (0.2%) | 3,877 (0.2%) | 14,729 (0.2%) |
| NEC | 1,527 (0.1%) | 1,522 (0.1%) | 5,839 (0.1%) |
| ROP | 4,232 (0.2%) | 4,237 (0.2%) | 15,923 (0.2%) |
| BPD | 2,737 (0.1%) | 2,750 (0.1%) | 10,401 (0.1%) |
| PDA | 17,056 (0.9%) | 17,097 (0.9%) | 64,545 (0.9%) |
| PVL | 138 (0.01%) | 141 (0.01%) | 562 (0.01%) |
| Sepsis | 18,969 (0.9%) | 18,994 (0.9%) | 71,827 (0.9%) |
| Pulmonary hemorrhage | 585 (0.03%) | 629 (0.03%) | 2,323 (0.03%) |
| CP | 36 (0.002%) | 29 (0.001%) | 141 (0.002%) |
| Pulmonary HTN | 410 (0.02%) | 421 (0.02%) | 1,562 (0.02%) |
| Jaundice | 269,534 (13.5%) | 269,767 (13.5%) | 1,023,709 (13.5%) |
| ≥1 outcome | 297,988 (14.9%) | 298,065 (14.9%) | 1,131,381 (14.9%) |
Summary of maternal and newborn characteristics, including neonatal morbidities, in the training, validation, and test datasets.
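The counts in the table can be cross-checked with simple arithmetic: the gender rows of each column should add up to that dataset's size, and the three sizes should add up to the cohort reported in the abstract. The sketch below does this with numbers copied from the table; the variable and function names are illustrative only.

```python
# Counts per gender category, copied from the three rows under "Gender".
gender_counts = {
    "training":   [1_022_913, 977_075, 12],
    "validation": [1_023_808, 976_172, 20],
    "test":       [3_882_528, 3_712_177, 81],
}

# Each dataset's size is the sum of its gender categories.
split_sizes = {name: sum(counts) for name, counts in gender_counts.items()}
total = sum(split_sizes.values())


def percent(count, n):
    # Percentage rounded to one decimal place, as displayed in the table.
    return round(100.0 * count / n, 1)


# Share of births with at least one outcome ("≥1 outcome" row, training column).
any_outcome_pct = percent(297_988, split_sizes["training"])
```

With these values the splits come to 2,000,000, 2,000,000, and 7,594,786 births, which together give the 11,594,786 livebirths stated in the abstract, and the training-set share with at least one outcome rounds to 14.9%.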
Figure 1. Schematic of the study design and correlations between neonatal morbidities
(A) Schematic of the study design.
(B) Correlation network between neonatal morbidities and clinical variables: an edge is drawn between each pair of neonatal morbidities/clinical variables whose correlation coefficient exceeds 0.1 in absolute value; green (red) edges indicate positive (negative) correlations, and the width of each edge is proportional to the correlation coefficient.
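The edge rule in panel (B) can be sketched as follows, assuming Pearson correlation (the caption does not name the coefficient) and using the absolute value of the coefficient as the edge width. The toy data and the names `pearson` and `correlation_edges` are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    # Pearson correlation coefficient of two equal-length numeric lists.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_edges(columns, threshold=0.1):
    # Keep one edge per variable pair whose |r| exceeds the threshold.
    edges = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            edges.append({
                "pair": (a, b),
                "color": "green" if r > 0 else "red",  # sign of the correlation
                "width": abs(r),                        # assumed width scaling
            })
    return edges


# Made-up indicators for eight births; not data from the study.
toy = {
    "RDS": [1, 1, 0, 0, 1, 0, 0, 0],
    "PDA": [1, 0, 0, 0, 1, 0, 0, 0],
    "GA_weeks": [28, 30, 39, 40, 27, 38, 41, 39],
}
edges = correlation_edges(toy)
```

In this toy example the two morbidities are positively correlated with each other (green edge) and both are negatively correlated with gestational age (red edges).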
Figure 2. Model performance in terms of precision and recall
Precision-recall curves (AUPRCs) in the test data set for the full model (black solid line), the reduced model (grey solid line), and the logistic regression models considering gestational age (blue dotted line), SGA (red square) and PTB (green dot). The horizontal light grey dashed line represents the expected precision-recall curve for a random classifier and depends on the prevalence of the morbidity.
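To make the comparison with the random-classifier line concrete, the sketch below computes average precision, one common estimator of the area under the precision-recall curve, together with the prevalence that a random classifier's precision is expected to equal. How the article itself computed AUPRC is not stated here, so this is an assumption; the labels and scores are made up.

```python
def average_precision(y_true, scores):
    # Rank cases from highest to lowest predicted risk
    # (ties keep their input order because sorted() is stable).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            ap += hits / rank  # precision at the rank of each true positive
    return ap / n_pos


def random_baseline(y_true):
    # Expected precision of a random classifier: the outcome prevalence.
    return sum(y_true) / len(y_true)


# Made-up example: 2 affected newborns out of 10.
y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.01]

ap = average_precision(y_true, scores)
baseline = random_baseline(y_true)
```

Because the baseline equals the prevalence, AUPRC values for rare morbidities should be read against a much lower reference line than those for common ones such as jaundice.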
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Administrative data from the California Office of Statewide Health Planning and Development | ||
| Birth certificates from the California Department of Health Care Services | ||
| R | The Comprehensive R Archive Network | version 3.6.3 |