
Machine Learning for Causal Inference: On the Use of Cross-fit Estimators.

Paul N Zivich, Alexander Breskin.

Abstract

BACKGROUND: Modern causal inference methods allow machine learning to be used to weaken parametric modeling assumptions. However, the use of machine learning may result in complications for inference. Doubly robust cross-fit estimators have been proposed to yield better statistical properties.
METHODS: We conducted a simulation study to assess the performance of several different estimators for the average causal effect. The data generating mechanisms for the simulated treatment and outcome included log-transforms, polynomial terms, and discontinuities. We compared singly robust estimators (g-computation, inverse probability weighting) and doubly robust estimators (augmented inverse probability weighting, targeted maximum likelihood estimation). We estimated nuisance functions with parametric models and ensemble machine learning separately. We further assessed doubly robust cross-fit estimators.
RESULTS: With correctly specified parametric models, all of the estimators were unbiased and confidence intervals achieved nominal coverage. When used with machine learning, the doubly robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
CONCLUSIONS: Due to the difficulty of properly specifying parametric models in high-dimensional data, doubly robust estimators with ensemble learning and cross-fitting may be the preferred approach for estimation of the average causal effect in most epidemiologic studies. However, these approaches may require larger sample sizes to avoid finite-sample issues.
Copyright © 2021 Wolters Kluwer Health, Inc. All rights reserved.
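
The doubly robust cross-fit idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy data-generating process with a single binary confounder, uses stratified sample means as stand-ins for the ensemble machine learning nuisance models, and uses one K-fold partition rather than any repeated splitting. The function names (simulate, fit_nuisance, crossfit_aipw) and all parameter values are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean, stdev


def simulate(n, seed=0):
    """Toy data (an assumption, not the paper's simulation): binary confounder W,
    binary treatment A, continuous outcome Y. The true average causal effect is 2.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < (0.7 if w else 0.3) else 0
        y = 2.0 * a + 1.5 * w + rng.gauss(0, 1)
        rows.append((w, a, y))
    return rows


def fit_nuisance(train):
    """Estimate the nuisance functions by stratified means.
    These stand in for whatever learner one would actually use."""
    ps = {}   # estimated P(A = 1 | W = w)
    out = {}  # estimated E[Y | A = a, W = w]
    for w in (0, 1):
        ps[w] = mean(a for (wi, a, _) in train if wi == w)
        for a in (0, 1):
            out[(a, w)] = mean(y for (wi, ai, y) in train if wi == w and ai == a)
    return ps, out


def crossfit_aipw(rows, k=5, seed=0):
    """Cross-fit augmented inverse probability weighting (AIPW):
    nuisance functions are fit on k-1 folds and evaluated on the held-out fold."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    phi = []  # per-observation AIPW contributions
    for held in folds:
        held_set = set(held)
        train = [rows[i] for i in idx if i not in held_set]
        ps, out = fit_nuisance(train)
        for i in held:
            w, a, y = rows[i]
            m1, m0, g = out[(1, w)], out[(0, w)], ps[w]
            # Outcome-model contrast plus inverse-probability-weighted residuals
            phi.append(m1 - m0 + a * (y - m1) / g - (1 - a) * (y - m0) / (1 - g))
    estimate = mean(phi)
    std_err = stdev(phi) / len(phi) ** 0.5  # influence-function-based standard error
    return estimate, std_err


estimate, std_err = crossfit_aipw(simulate(4000, seed=1))
print(f"ACE estimate: {estimate:.3f} (SE {std_err:.3f})")
```

With a single binary confounder the stratified means are already a correct nonparametric model, so the sketch only shows the mechanics of sample splitting. The finite-sample issues the abstract mentions arise with flexible learners and richer covariates.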

Year:  2021        PMID: 33591058      PMCID: PMC8012235          DOI: 10.1097/EDE.0000000000001332

Source DB:  PubMed          Journal:  Epidemiology        ISSN: 1044-3983            Impact factor:   4.860


  29 in total

1.  On the use of generalized additive models in time-series studies of air pollution and health.

Authors:  Francesca Dominici; Aidan McDermott; Scott L Zeger; Jonathan M Samet
Journal:  Am J Epidemiol       Date:  2002-08-01       Impact factor: 4.897

2.  Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study.

Authors:  Jared K Lunceford; Marie Davidian
Journal:  Stat Med       Date:  2004-10-15       Impact factor: 2.373

3.  Diagnosing and responding to violations in the positivity assumption.

Authors:  Maya L Petersen; Kristin E Porter; Susan Gruber; Yue Wang; Mark J van der Laan
Journal:  Stat Methods Med Res       Date:  2010-10-28       Impact factor: 3.021

4.  Doubly robust estimation in missing data and causal inference models.

Authors:  Heejung Bang; James M Robins
Journal:  Biometrics       Date:  2005-12       Impact factor: 2.571

5.  An empirical comparison of tree-based methods for propensity score estimation.

Authors:  Stephanie Watkins; Michele Jonsson-Funk; M Alan Brookhart; Steven A Rosenberg; T Michael O'Shea; Julie Daniels
Journal:  Health Serv Res       Date:  2013-05-23       Impact factor: 3.402

6.  Recursive partitioning for heterogeneous causal effects.

Authors:  Susan Athey; Guido Imbens
Journal:  Proc Natl Acad Sci U S A       Date:  2016-07-05       Impact factor: 11.205

7.  Understanding and diagnosing the potential for bias when using machine learning methods with doubly robust causal estimators.

Authors:  Asma Bahamyirou; Lucie Blais; Amélie Forget; Mireille E Schnitzer
Journal:  Stat Methods Med Res       Date:  2018-05-02       Impact factor: 3.021

8.  Improving propensity score weighting using machine learning.

Authors:  Brian K Lee; Justin Lessler; Elizabeth A Stuart
Journal:  Stat Med       Date:  2010-02-10       Impact factor: 2.373

Review 9.  You are smarter than you think: (super) machine learning in context.

Authors:  Alexander P Keil; Jessie K Edwards
Journal:  Eur J Epidemiol       Date:  2018-05-09       Impact factor: 8.082

10.  Parametric assumptions equate to hidden observations: comparing the efficiency of nonparametric and parametric models for estimating time to AIDS or death in a cohort of HIV-positive women.

Authors:  Jacqueline E Rudolph; Stephen R Cole; Jessie K Edwards
Journal:  BMC Med Res Methodol       Date:  2018-11-19       Impact factor: 4.615

  9 in total

1.  State of the Art Causal Inference in the Presence of Extraneous Covariates: A Simulation Study.

Authors:  Raluca Cobzaru; Sharon Jiang; Kenney Ng; Stan Finkelstein; Roy Welsch; Zach Shahn
Journal:  AMIA Annu Symp Proc       Date:  2022-02-21

2.  Targeted maximum likelihood estimation of causal effects with interference: A simulation study.

Authors:  Paul N Zivich; Michael G Hudgens; Maurice A Brookhart; James Moody; David J Weber; Allison E Aiello
Journal:  Stat Med       Date:  2022-07-18       Impact factor: 2.497

3.  Performance Evaluation of Parametric and Nonparametric Methods When Assessing Effect Measure Modification.

Authors:  Gabriel Conzuelo Rodriguez; Lisa M Bodnar; Maria M Brooks; Abdus Wahed; Edward H Kennedy; Enrique Schisterman; Ashley I Naimi
Journal:  Am J Epidemiol       Date:  2022-01-01       Impact factor: 5.363

Review 4.  Applying Machine Learning in Distributed Data Networks for Pharmacoepidemiologic and Pharmacovigilance Studies: Opportunities, Challenges, and Considerations.

Authors:  Jenna Wong; Daniel Prieto-Alhambra; Peter R Rijnbeek; Rishi J Desai; Jenna M Reps; Sengwee Toh
Journal:  Drug Saf       Date:  2022-05-17       Impact factor: 5.228

5.  AIPW: An R Package for Augmented Inverse Probability-Weighted Estimation of Average Causal Effects.

Authors:  Yongqi Zhong; Edward H Kennedy; Lisa M Bodnar; Ashley I Naimi
Journal:  Am J Epidemiol       Date:  2021-12-01       Impact factor: 5.363

6.  Analyses of child cardiometabolic phenotype following assisted reproductive technologies using a pragmatic trial emulation approach.

Authors:  Jonathan Yinhao Huang; Shirong Cai; Zhongwei Huang; Mya Thway Tint; Wen Lun Yuan; Izzuddin M Aris; Keith M Godfrey; Neerja Karnani; Yung Seng Lee; Jerry Kok Yen Chan; Yap Seng Chong; Johan Gunnar Eriksson; Shiao-Yng Chan
Journal:  Nat Commun       Date:  2021-09-23       Impact factor: 14.919

7.  Use of Machine Learning to Estimate the Per-Protocol Effect of Low-Dose Aspirin on Pregnancy Outcomes: A Secondary Analysis of a Randomized Clinical Trial.

Authors:  Yongqi Zhong; Maria M Brooks; Edward H Kennedy; Lisa M Bodnar; Ashley I Naimi
Journal:  JAMA Netw Open       Date:  2022-03-01

8.  Baseline haemoglobin A1c and the risk of COVID-19 hospitalization among patients with diabetes in the INSIGHT Clinical Research Network.

Authors:  Jea Young Min; Nicholas Williams; Will Simmons; Samprit Banerjee; Fei Wang; Yongkang Zhang; April B Reese; Alvin I Mushlin; James H Flory
Journal:  Diabet Med       Date:  2022-02-28       Impact factor: 4.213

Review 9.  Machine learning for improving high-dimensional proxy confounder adjustment in healthcare database studies: An overview of the current literature.

Authors:  Richard Wyss; Chen Yanover; Tal El-Hay; Dimitri Bennett; Robert W Platt; Andrew R Zullo; Grammati Sari; Xuerong Wen; Yizhou Ye; Hongbo Yuan; Mugdha Gokhale; Elisabetta Patorno; Kueiyu Joshua Lin
Journal:  Pharmacoepidemiol Drug Saf       Date:  2022-07-05       Impact factor: 2.732

