Timothy L Lash, Barbara Abrams, Lisa M Bodnar.
From the (a) Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA; (b) Division of Epidemiology, School of Public Health, University of California at Berkeley, Berkeley, CA; (c) Department of Epidemiology, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA; (d) Department of Obstetrics, Gynecology, and Reproductive Sciences, School of Medicine, University of Pittsburgh, Pittsburgh, PA; and (e) Magee-Womens Research Institute, Pittsburgh, PA.
Abstract
BACKGROUND: Epidemiologic data sets continue to grow larger. Probabilistic-bias analyses, which simulate hundreds of thousands of replications of the original data set, may challenge desktop computational resources. METHODS: We implemented a probabilistic-bias analysis to evaluate the direction, magnitude, and uncertainty of the bias arising from misclassification of prepregnancy body mass index when studying its association with early preterm birth in a cohort of 773,625 singleton births. We compared 3 bias analysis strategies: (1) using the full cohort, (2) using a case-cohort design, and (3) weighting records by their frequency in the full cohort. RESULTS: Underweight and overweight mothers were more likely to deliver early preterm. A validation substudy demonstrated misclassification of prepregnancy body mass index derived from birth certificates. Probabilistic-bias analyses suggested that the association between underweight and early preterm birth was overestimated by the conventional approach, whereas the associations between overweight categories and early preterm birth were underestimated. The 3 bias analyses yielded equivalent results and challenged our typical desktop computing environment. Analyses applied to the full cohort, case cohort, and weighted full cohort required 7.75 days and 4 terabytes, 15.8 hours and 287 gigabytes, and 8.5 hours and 202 gigabytes, respectively. CONCLUSIONS: Large epidemiologic data sets often include variables that are imperfectly measured, often because data were collected for other purposes. Probabilistic-bias analysis allows quantification of errors but may be difficult in a desktop computing environment. Solutions that allow these analyses in this environment can be achieved without new hardware and within reasonable computational time frames.
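The third strategy in METHODS (weighting records by their frequency in the full cohort) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a binary exposure, a single pair of predictive values applied to cases and noncases alike, and made-up Beta parameters standing in for a validation substudy; the function names `collapse`, `bias_adjusted_rr`, and `simulate` are hypothetical. The point is only that each simulation iteration works on a handful of weighted cells rather than on every record, which is why the approach is cheaper in time and storage.

```python
import random
from collections import Counter


def collapse(records):
    """Collapse identical (exposed, case) records into frequency-weighted cells."""
    return Counter(records)


def bias_adjusted_rr(cells, ppv, npv):
    """Risk ratio after reallocating observed cell counts with predictive values.

    Uses expected (not sampled) reallocated counts, so random error is ignored.
    """
    a_obs, b_obs = cells[(1, 1)], cells[(1, 0)]  # classified exposed: cases, noncases
    c_obs, d_obs = cells[(0, 1)], cells[(0, 0)]  # classified unexposed: cases, noncases
    # Truly exposed = correctly classified exposed + misclassified "unexposed"
    a = ppv * a_obs + (1 - npv) * c_obs
    b = ppv * b_obs + (1 - npv) * d_obs
    c = (a_obs + c_obs) - a
    d = (b_obs + d_obs) - b
    return (a / (a + b)) / (c / (c + d))


def simulate(cells, n_iter=1000, seed=0):
    """Draw predictive values repeatedly; return (median, 2.5th, 97.5th) of the RR."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_iter):
        ppv = rng.betavariate(90, 10)  # hypothetical validation data
        npv = rng.betavariate(95, 5)   # hypothetical validation data
        estimates.append(bias_adjusted_rr(cells, ppv, npv))
    estimates.sort()
    return (estimates[n_iter // 2],
            estimates[int(0.025 * n_iter)],
            estimates[int(0.975 * n_iter)])


# Hypothetical cohort of 10,000 records reduced to 4 weighted cells
records = ([(1, 1)] * 30 + [(1, 0)] * 970 +
           [(0, 1)] * 100 + [(0, 0)] * 8900)
cells = collapse(records)
median_rr, lower, upper = simulate(cells)
```

With perfect classification (`ppv = npv = 1`) the function returns the conventional risk ratio, which is a quick check that the reallocation is wired correctly. A fuller analysis would also handle multiple exposure categories, covariate strata, and random error.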