Ann Weber 1, Gary L Darmstadt 2, Susan Gruber 3, Megan E Foeller 4, Suzan L Carmichael 2, David K Stevenson 2, Gary M Shaw 2.
1. March of Dimes Prematurity Center, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. Electronic address: annweber@stanford.edu.
2. March of Dimes Prematurity Center, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA.
3. Putnam Data Sciences, Cambridge, MA.
4. Department of Obstetrics and Gynecology, Stanford University School of Medicine, Stanford, CA.
Abstract
PURPOSE: Spontaneous preterm birth is a leading cause of perinatal mortality in the United States and occurs disproportionately among non-Hispanic black women compared with women of other race-ethnicities. Clinicians lack tools to identify first-time mothers at risk for spontaneous preterm birth. This study assessed prediction of early (<32 weeks) spontaneous preterm birth among non-Hispanic black and white women by applying state-of-the-art machine learning to multilevel data from a large birth cohort.
METHODS: Data from birth certificate and hospital discharge records for 336,214 singleton births to nulliparous women in California from 2007 to 2011 were used in cross-validated regressions, with multiple imputation for missing covariate data. Residential census tract information was overlaid for 281,733 births. Prediction was assessed with areas under the receiver operating characteristic curves (AUCs).
RESULTS: Cross-validated AUCs were low (0.62 [min = 0.60, max = 0.63] for non-Hispanic blacks and 0.63 [min = 0.61, max = 0.65] for non-Hispanic whites). Combining racial-ethnic groups improved prediction (cross-validated AUC = 0.67 [min = 0.65, max = 0.68]), approaching what others have achieved using biomarkers. Census tract-level information did not improve prediction.
CONCLUSIONS: The resolution of administrative data was inadequate to precisely predict individual risk for early spontaneous preterm birth despite the use of advanced statistical methods.
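For readers unfamiliar with the metric, the following is a minimal sketch of how an AUC can be computed on held-out data and summarized in the "AUC [min, max]" form used above. It assumes the bracketed minimum and maximum are taken over held-out evaluation sets, which the abstract does not state. The function names `auc` and `summarize_folds` and the toy numbers are hypothetical illustrations, not the study's code or data.

```python
from statistics import mean


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_folds(folds):
    """folds: list of (labels, scores) pairs, one per held-out fold.
    Returns (mean AUC, minimum AUC, maximum AUC) across folds."""
    values = [auc(labels, scores) for labels, scores in folds]
    return mean(values), min(values), max(values)


# Toy held-out folds with made-up predicted risks (NOT study data).
folds = [
    ([1, 0, 0, 1], [0.9, 0.2, 0.4, 0.3]),
    ([0, 1, 0, 0], [0.1, 0.8, 0.3, 0.5]),
]
avg, lowest, highest = summarize_folds(folds)
print(f"AUC = {avg:.2f} [min = {lowest:.2f}, max = {highest:.2f}]")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is why values of 0.62 to 0.67 are described as low for individual risk prediction.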