Daniel Westreich¹, Jessie K Edwards², Stephen R Cole², Robert W Platt³, Sunni L Mumford⁴, Enrique F Schisterman⁴
1. Department of Epidemiology, Gillings School of Global Public Health, UNC-Chapel Hill, NC, USA, djw@unc.edu.
2. Department of Epidemiology, Gillings School of Global Public Health, UNC-Chapel Hill, NC, USA.
3. Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, QC, Canada.
4. Epidemiology Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD, USA.
Abstract
BACKGROUND: The fundamental problem of causal inference is one of missing data, and specifically of missing potential outcomes: if potential outcomes were fully observed, causal inference would be trivial. Although rarely discussed explicitly in the epidemiological literature, the connections between causal inference and missing data can provide additional intuition. METHODS: We demonstrate how causal inference can be approached in the same way as other missing-data problems, using multiple imputation and the parametric g-formula. RESULTS: We explain and demonstrate the use of these methods in example data, and discuss implications for more traditional approaches to causal inference. CONCLUSIONS: Although both multiple imputation and g-formula approaches have advantages and disadvantages, epidemiologists can benefit from framing their causal inference problems as missing-data problems, because this perspective may offer new and clarifying insights into their analyses.
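The missing-data framing can be made concrete with a small sketch. The Python below uses hypothetical toy data (a binary confounder L, treatment A, and outcome Y, all made up for illustration, not taken from the article) and stores each person's unobserved potential outcome as a missing value. `g_formula_rd` fills it with a prediction from a saturated outcome model for E[Y | A, L] (with a saturated model the parametric g-formula reduces to standardization), while `imputation_rd` fills it with random draws from that model and averages the point estimates across imputations. The function names, data, and the simplified imputation scheme (no redrawing of model parameters, no variance combination) are assumptions, not the authors' code or analysis.

```python
import random
from collections import defaultdict

# Hypothetical toy data, made up for illustration (not from the article):
# each record is (L, A, Y) = (binary confounder, binary treatment, binary outcome).
DATA = (
    [(0, 0, 1)] * 2 + [(0, 0, 0)] * 6    # L=0, untreated: risk 2/8
    + [(0, 1, 1)] * 1 + [(0, 1, 0)] * 1  # L=0, treated:   risk 1/2
    + [(1, 0, 1)] * 1 + [(1, 0, 0)] * 1  # L=1, untreated: risk 1/2
    + [(1, 1, 1)] * 6 + [(1, 1, 0)] * 2  # L=1, treated:   risk 6/8
)


def potential_outcome_table(data):
    """Return (L, {0: Y^0, 1: Y^1}) per person. Only the potential outcome
    under the treatment actually received is observed; the other is None."""
    rows = []
    for l_val, a, y in data:
        po = {0: None, 1: None}
        po[a] = y
        rows.append((l_val, po))
    return rows


def fit_outcome_model(data):
    """Saturated model for E[Y | A=a, L=l]: the observed mean in each (a, l) cell."""
    totals, counts = defaultdict(float), defaultdict(int)
    for l_val, a, y in data:
        totals[(a, l_val)] += y
        counts[(a, l_val)] += 1
    return {cell: totals[cell] / counts[cell] for cell in counts}


def g_formula_rd(data):
    """Fill each missing potential outcome with its model prediction, then
    contrast the means of the completed Y^1 and Y^0 columns. With a saturated
    model this equals predicting both potential outcomes for everyone."""
    model = fit_outcome_model(data)
    rows = potential_outcome_table(data)
    means = {}
    for a in (0, 1):
        filled = [po[a] if po[a] is not None else model[(a, l_val)] for l_val, po in rows]
        means[a] = sum(filled) / len(filled)
    return means[1] - means[0]


def imputation_rd(data, m=200, seed=1):
    """Fill each missing potential outcome with a Bernoulli draw from the
    fitted model, repeat m times, and average the m risk differences.
    Simplification: model parameters are not redrawn between imputations,
    so this understates model uncertainty relative to proper multiple imputation."""
    rng = random.Random(seed)
    model = fit_outcome_model(data)
    rows = potential_outcome_table(data)
    estimates = []
    for _ in range(m):
        means = {}
        for a in (0, 1):
            filled = [
                po[a] if po[a] is not None else int(rng.random() < model[(a, l_val)])
                for l_val, po in rows
            ]
            means[a] = sum(filled) / len(filled)
        estimates.append(means[1] - means[0])
    return sum(estimates) / len(estimates)


def crude_rd(data):
    """Unadjusted contrast E[Y | A=1] - E[Y | A=0], ignoring L."""
    def risk(a):
        ys = [y for _, x, y in data if x == a]
        return sum(ys) / len(ys)
    return risk(1) - risk(0)
```

On this toy data, `g_formula_rd(DATA)` gives a risk difference of 0.25, `imputation_rd(DATA)` lands close to it, and `crude_rd(DATA)` gives about 0.40. The crude contrast is inflated because treated people in the toy data are concentrated in the higher-risk stratum L = 1.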