Paul N Zivich (1, 2), Alexander Breskin (3)
1. Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC.
2. Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC.
3. NoviSci, Durham, NC.
Abstract
BACKGROUND: Modern causal inference methods allow machine learning to be used to weaken parametric modeling assumptions. However, the use of machine learning may result in complications for inference. Doubly robust cross-fit estimators have been proposed to yield better statistical properties.

METHODS: We conducted a simulation study to assess the performance of several different estimators for the average causal effect. The data generating mechanisms for the simulated treatment and outcome included log-transforms, polynomial terms, and discontinuities. We compared singly robust estimators (g-computation, inverse probability weighting) and doubly robust estimators (augmented inverse probability weighting, targeted maximum likelihood estimation). We estimated nuisance functions with parametric models and ensemble machine learning separately. We further assessed doubly robust cross-fit estimators.

RESULTS: With correctly specified parametric models, all of the estimators were unbiased and confidence intervals achieved nominal coverage. When used with machine learning, the doubly robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.

CONCLUSIONS: Due to the difficulty of properly specifying parametric models in high-dimensional data, doubly robust estimators with ensemble learning and cross-fitting may be the preferred approach for estimation of the average causal effect in most epidemiologic studies. However, these approaches may require larger sample sizes to avoid finite-sample issues.
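To make the idea of a doubly robust cross-fit estimator concrete, the following is a minimal sketch of a cross-fit augmented inverse probability weighting (AIPW) estimator of the average causal effect. It is not the authors' implementation. It assumes a single random partition of the data, a continuous outcome, and simple parametric nuisance models (logistic regression for the treatment, linear regression for the outcome) standing in for the ensemble machine learning described in the abstract. The function names `fit_logistic` and `crossfit_aipw`, the propensity score bounds, and the simulated data are all illustrative assumptions.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        weights = p * (1.0 - p)
        grad = X.T @ (y - p)
        # Small ridge term keeps the Hessian invertible (illustrative choice).
        hess = (X * weights[:, None]).T @ X + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def crossfit_aipw(y, a, w, n_splits=2, seed=0):
    """Cross-fit AIPW: nuisance models are fit on one part of the data
    and used to predict on the held-out part, for every fold in turn."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = rng.permutation(n) % n_splits      # random, near-equal folds
    X = np.column_stack([np.ones(n), w])       # design matrix with intercept
    phi = np.empty(n)                          # per-subject AIPW contributions

    for k in range(n_splits):
        train, test = folds != k, folds == k

        # Treatment (propensity score) model, fit on the training folds only.
        beta = fit_logistic(X[train], a[train])
        ps = 1.0 / (1.0 + np.exp(-X[test] @ beta))
        ps = np.clip(ps, 0.01, 0.99)           # assumed bounds, not from the source

        # Outcome models fit separately in each treatment arm.
        t1, t0 = train & (a == 1), train & (a == 0)
        g1 = np.linalg.lstsq(X[t1], y[t1], rcond=None)[0]
        g0 = np.linalg.lstsq(X[t0], y[t0], rcond=None)[0]
        m1, m0 = X[test] @ g1, X[test] @ g0

        # AIPW contribution evaluated on the held-out fold.
        at, yt = a[test], y[test]
        phi[test] = (m1 - m0
                     + at * (yt - m1) / ps
                     - (1 - at) * (yt - m0) / (1 - ps))

    ace = phi.mean()
    se = phi.std(ddof=1) / np.sqrt(n)          # influence-function-based SE
    return ace, se


# Simulated data with a known average causal effect of 1.0 (illustrative only).
rng = np.random.default_rng(42)
n = 4000
w = rng.normal(size=n)
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * w))).astype(float)
y = 1.0 * a + 2.0 * w + rng.normal(size=n)

ace, se = crossfit_aipw(y, a, w)
print(f"ACE estimate: {ace:.3f}, 95% CI: ({ace - 1.96 * se:.3f}, {ace + 1.96 * se:.3f})")
```

In this sketch each observation's contribution is computed from nuisance models that never saw that observation, which is the feature cross-fitting relies on when flexible learners are used. Replacing the two parametric fits with an ensemble learner would leave the rest of the procedure unchanged.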