Prince Addo Allotey (1), Ofer Harel (2). (1) Department of Statistics, College of Liberal Arts and Sciences, University of Connecticut, 215 Glenbrook Rd, Unit 4120, Storrs, CT, USA. (2) Department of Statistics, College of Liberal Arts and Sciences, University of Connecticut, 215 Glenbrook Rd, Unit 4120, Storrs, CT, USA. ofer.harel@uconn.edu.
Abstract
PURPOSE OF REVIEW: Incomplete data are a common problem in the statistical analysis of environmental epidemiological research, yet many researchers still ignore this complication. We evaluate the performance of two commonly used multiple imputation (MI) methods, fully conditional specification and multivariate normal imputation, for handling missing data and compare them with complete case analysis (CCA). We further discuss issues that arise when these methods are used.

RECENT FINDINGS: MI is a simulation-based approach to dealing with incomplete data. It replaces the missing values with plausible values and accounts for the additional uncertainty caused by the missing information. In general, MI performs better than ad hoc techniques such as CCA. To illustrate this, we use data on 944 women from the Collaborative Perinatal Project and compare estimates across these methods. The goal is to examine whether each of two outcomes, birth weight and spontaneous abortion, is associated with the mother's smoking status during pregnancy, adjusting for baseline covariates. Results indicate that MI is better suited for handling incomplete data and led to a significant improvement in parameter estimates compared with CCA. The two MI methods produced similar point estimates but slightly different standard errors.
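To make the mechanics concrete, here is a minimal sketch of the MI workflow: impute each missing value several times, analyse every completed data set, and combine the results with Rubin's rules. The pooled variance is T = W̄ + (1 + 1/m)B, where W̄ is the average within-imputation variance and B is the between-imputation variance. The sketch assumes a single incomplete variable `x` that is missing at random given a fully observed covariate `z`. The data are simulated, and all function names (`simulate`, `complete_case_mean`, `impute_mean`, `rubin_pool`) are hypothetical. The imputation model is a plain linear regression with a bootstrap step, which is one way to propagate uncertainty in the imputation model. It is not the fully conditional specification or multivariate normal software used in the study.

```python
import math
import random
import statistics


def rubin_pool(estimates, variances):
    """Combine m point estimates and their within-imputation variances (Rubin's rules)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    w_bar = statistics.fmean(variances)      # average within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance (m - 1 denominator)
    return q_bar, w_bar + (1 + 1 / m) * b    # total variance T


def fit_line(xs, ys):
    """Ordinary least squares of ys on xs: returns intercept, slope, residual SD."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - intercept - slope * x for x, y in zip(xs, ys)]
    return intercept, slope, math.sqrt(sum(r * r for r in resid) / (len(xs) - 2))


def simulate(n, rng):
    """Hypothetical data: z always observed; x is missing more often when z is large (MAR)."""
    data = []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = 1.0 + 0.8 * z + rng.gauss(0, 0.6)       # true mean of x is 1.0
        p_missing = 1 / (1 + math.exp(-2 * z))      # depends only on the observed z
        data.append((z, None if rng.random() < p_missing else x))
    return data


def complete_case_mean(data):
    """CCA: drop incomplete records, then estimate the mean of x and its variance."""
    xs = [x for _, x in data if x is not None]
    return statistics.fmean(xs), statistics.variance(xs) / len(xs)


def impute_mean(data, m, rng):
    """MI: impute x from a regression on z m times, estimate the mean each time, then pool."""
    observed = [(z, x) for z, x in data if x is not None]
    estimates, variances = [], []
    for _ in range(m):
        # Refit the imputation model on a bootstrap sample of the complete cases so
        # each imputation uses different parameter values (model uncertainty).
        boot = [rng.choice(observed) for _ in observed]
        a, b, s = fit_line([z for z, _ in boot], [x for _, x in boot])
        # Fill each missing x with a prediction plus random residual noise.
        completed = [x if x is not None else a + b * z + rng.gauss(0, s) for z, x in data]
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    return rubin_pool(estimates, variances)


if __name__ == "__main__":
    rng = random.Random(2024)
    data = simulate(1000, rng)
    cca_est, cca_var = complete_case_mean(data)
    mi_est, mi_var = impute_mean(data, m=20, rng=rng)
    print(f"CCA mean of x: {cca_est:.3f} (SE {math.sqrt(cca_var):.3f})")
    print(f"MI  mean of x: {mi_est:.3f} (SE {math.sqrt(mi_var):.3f})  true value: 1.0")
```

Because the records that remain under CCA are concentrated at low values of `z`, the complete-case mean should fall well below the true value. The MI estimate, which borrows information from `z`, should land close to it. The (1 + 1/m)B term is what the abstract means by allowing for additional uncertainty: it widens the standard error to reflect the fact that the imputed values are not real observations.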
Keywords: Complete case analysis; Complete data; Missing data; Multiple imputation; Spontaneous abortion; Traditional statistical methods