Miguel Angel Luque-Fernandez, Michael Schomaker, Bernard Rachet, Mireille E Schnitzer.
Abstract
When estimating the average effect of a binary treatment (or exposure) on an outcome, methods that incorporate propensity scores, the G-formula, or targeted maximum likelihood estimation (TMLE) are preferred over naïve regression approaches, which are biased under misspecification of a parametric outcome model. In contrast, propensity score methods require correct specification of an exposure model. Double-robust methods require correct specification of only one of the two models, either the outcome model or the exposure model. TMLE is a semiparametric double-robust method that improves the chances of correct model specification by allowing flexible estimation with (nonparametric) machine-learning methods. It therefore requires weaker assumptions than its competitors. We provide a step-by-step guided implementation of TMLE and illustrate it in a realistic scenario based on cancer epidemiology in which the assumptions of correct model specification and positivity (ie, that every study participant has a nonzero probability of receiving the treatment) are nearly violated. This article provides a concise and reproducible educational introduction to TMLE for a binary outcome and exposure. The reader should gain sufficient understanding of TMLE from this introductory tutorial to be able to apply the method in practice. Extensive R code is provided in easy-to-read boxes throughout the article for replicability. Stata users will find a testing implementation of TMLE and additional material in Appendix S1 and at the following GitHub repository: https://github.com/migariane/SIM-TMLE-tutorial.
Keywords: causal inference; ensemble learning; machine learning; observational studies; targeted maximum likelihood estimation
Year: 2018 PMID: 29687470 PMCID: PMC6032875 DOI: 10.1002/sim.7628
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.373
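The double robustness described in the abstract can be illustrated on a toy example. The sketch below is in Python rather than the article's R, and it uses the augmented inverse-probability weighting (AIPTW) estimator, not TMLE itself. The population counts, the function names, and the choice of a single binary covariate W are all assumptions made for illustration. In this invented population the true average treatment effect is 0.2 by construction.

```python
# Hypothetical population of 1000 subjects, one tuple per (W, A) cell:
# (w, a, number of subjects, number of deaths). All counts are invented.
CELLS = [
    (0, 0, 400, 80),
    (0, 1, 100, 40),
    (1, 0, 100, 50),
    (1, 1, 400, 280),
]
N = sum(n for _, _, n, _ in CELLS)


def risk(keep):
    """Observed risk pooled over the cells selected by keep(w, a)."""
    chosen = [(n, d) for w, a, n, d in CELLS if keep(w, a)]
    return sum(d for _, d in chosen) / sum(n for n, _ in chosen)


def share_treated(keep):
    """Observed proportion treated among the cells selected by keep(w)."""
    chosen = [(a, n) for w, a, n, _ in CELLS if keep(w)]
    return sum(n for a, n in chosen if a == 1) / sum(n for _, n in chosen)


# Correct (saturated) models condition on W; misspecified models ignore it.
def q_correct(a, w):
    return risk(lambda w_, a_: (w_, a_) == (w, a))


def q_wrong(a, w):
    return risk(lambda w_, a_: a_ == a)


def g_correct(w):
    return share_treated(lambda w_: w_ == w)


def g_wrong(w):
    return share_treated(lambda w_: True)


def aiptw(q, g):
    """AIPTW estimate of the ATE from outcome model q(a, w) and propensity g(w)."""
    total = 0.0
    for w, a, n, d in CELLS:
        q1, q0, p = q(1, w), q(0, w), g(w)
        if a == 1:
            total += (d - n * q1) / p          # weighted residuals, treated
        else:
            total -= (d - n * q0) / (1 - p)    # weighted residuals, untreated
        total += n * (q1 - q0)                 # outcome-model (G-formula) part
    return total / N


for label, q, g in [
    ("both models correct", q_correct, g_correct),
    ("outcome model wrong", q_wrong, g_correct),
    ("exposure model wrong", q_correct, g_wrong),
    ("both models wrong", q_wrong, g_wrong),
]:
    print(f"{label}: ATE = {aiptw(q, g):.3f}")
```

In this example the estimate stays at 0.2 whenever at least one of the two models is correct, and moves to 0.38 (the unadjusted risk difference) only when both are wrong.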
Figure 1. Directed acyclic graph. Legend: Conditional exchangeability of the treatment or exposure (A) with respect to cancer mortality (Y) is obtained by conditioning on a set of available covariates W, ie, (Y(1), Y(0)) ⊥ A | W. The average treatment effect for the structural framework is estimated as the average risk difference between the expected outcome conditional on W among those treated (E(Y|A = 1; W)) and the expected outcome conditional on W among those untreated (E(Y|A = 0; W)). Y: binary mortality indicator (1 death, 0 alive); A: binary cancer treatment, monotherapy versus dual therapy (1 mono, 0 dual); W: covariates (W1: sex; W2: age at diagnosis; W3: cancer stage, TNM classification; W4: comorbidities).
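Written out, the estimand described in the legend takes the following form. The symbol ψ is introduced here only for illustration and does not appear in the text above. The first equality relies on the conditional exchangeability stated in the legend, together with positivity and consistency.

```latex
\begin{align*}
\psi_{\mathrm{ATE}}
  &= E\big[Y(1)\big] - E\big[Y(0)\big] \\
  &= E_W\big[\, E(Y \mid A = 1;\, W) - E(Y \mid A = 0;\, W) \,\big]
\end{align*}
```

The outer expectation averages the covariate-specific risk differences over the distribution of W in the whole study population, not only among the treated or the untreated.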
Final dataset for the update of the initial outcome predictions (Q1W0, Q0W0) to their targeted versions
| id | Q1W0 | Q0W0 | g | Epsilon 1 | Epsilon 2 | |
|---|---|---|---|---|---|---|
| 1 | 0.8551 | 0.6702 | 0.1967 | 0.003 | 0.0027 | 0.1858 |
| 2 | 0.639 | 0.3787 | 0.0184 | 0.003 | 0.0027 | 0.2927 |
| 3 | 0.7494 | 0.5073 | 0.0509 | 0.003 | 0.0027 | 0.2511 |
| 4 | 0.6604 | 0.4011 | 0.0095 | 0.003 | 0.0027 | 0.3187 |
| 5 | 0.9152 | 0.7879 | 0.5908 | 0.003 | 0.0027 | 0.1264 |
| … | … | … | … | … | … | … |
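The rows listed in this table can be used to sketch the targeting (update) step of TMLE. The Python snippet below is an illustration, not the article's R code, and it rests on several assumptions that the table itself does not state: that g is the estimated probability of treatment P(A = 1 | W), that Epsilon 1 is the fluctuation coefficient for the untreated clever covariate (1 − A)/(1 − g), that Epsilon 2 is the coefficient for the treated clever covariate A/g, and that the last, unlabeled column holds the updated individual difference.

```python
from math import exp, log


def logit(p):
    return log(p / (1 - p))


def expit(x):
    return 1 / (1 + exp(-x))


# (id, Q1W0, Q0W0, g, last unlabeled column), copied from the listed rows.
ROWS = [
    (1, 0.8551, 0.6702, 0.1967, 0.1858),
    (2, 0.6390, 0.3787, 0.0184, 0.2927),
    (3, 0.7494, 0.5073, 0.0509, 0.2511),
    (4, 0.6604, 0.4011, 0.0095, 0.3187),
    (5, 0.9152, 0.7879, 0.5908, 0.1264),
]
EPS_UNTREATED = 0.003   # "Epsilon 1" (assumed role, see lead-in)
EPS_TREATED = 0.0027    # "Epsilon 2" (assumed role, see lead-in)

updated_diffs = []
for row_id, q1, q0, g, last_col in ROWS:
    # Shift each initial prediction on the logit scale by epsilon times
    # the clever covariate evaluated at A = 1 and A = 0 respectively.
    q1_star = expit(logit(q1) + EPS_TREATED / g)
    q0_star = expit(logit(q0) + EPS_UNTREATED / (1 - g))
    diff = q1_star - q0_star
    updated_diffs.append(diff)
    print(f"id {row_id}: updated difference {diff:.4f}, table {last_col}")
```

Under these assumptions the computed differences should agree with the last column up to the rounding of the printed epsilon values. The ATE estimate would be the mean of these differences over all n rows of the full dataset, which is not shown here.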
ATE and COR Monte Carlo simulations for mildly misspecified models and near‐positivity violation, n = 1000
| Misspecified treatment and outcome models | Naïve | AIPTW | TMLE‐1 | TMLE‐2 | TMLE‐3 |
|---|---|---|---|---|---|
| True ATE | 0.193 | | | | |
| Estimate ATE | 0.208 | 0.199 | 0.193 | 0.193 | |
| Absolute bias ATE | 0.015 | 0.006 | 0.000 | 0.000 | |
| Relative bias ATE (%) | 7.2% | 3.0% | 0.0% | 0.0% | |
| True MOR | 2.5 | | | | |
| Estimate MOR | 3.1 | 3.0 | 3.0 | 2.9 | 2.8 |
Estimate ATE: Estimated average treatment effect from the 1000 simulation repetitions.
Estimate MOR: Estimated marginal odds ratio from the 1000 simulation repetitions.
Naïve: Logistic regression.
AIPTW: Augmented inverse‐probability treatment weights estimation under dual misspecification (model for the treatment and the outcome).
TMLE‐1: Dual misspecification. Algorithm computed by hand with naïve predictions from logistic regression models, without Super‐Learner (SL).
TMLE‐2: Dual misspecification. Algorithm estimated using the R package with the default SL library (SL.glm, SL.step, and SL.glm.interaction).
TMLE‐3: Dual misspecification. Algorithm computed using the R package with a user‐supplied SL library (SL.gam, SL.randomForest, and SL.rpart).
"Treatment model correctly specified" refers to the use of the correct logistic regression model for the propensity score. For TMLE‐2 and TMLE‐3, SL is used to estimate the outcome model as in the first scenario.
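The bias summaries reported in the table (mean estimate, absolute bias, and relative bias across repetitions) can be sketched with a much simpler Monte Carlo experiment. The Python code below is an illustration under stated assumptions, not the article's simulation: the data-generating process, the seed, the number of repetitions (200), and the two estimators (an unadjusted risk difference and a stratified G-formula) are all hypothetical choices. The true ATE is 0.2 by construction.

```python
import random


def simulate(n, rng):
    """Draw n subjects (w, a, y) from a hypothetical confounded process."""
    data = []
    for _ in range(n):
        w = int(rng.random() < 0.5)                      # binary confounder
        a = int(rng.random() < 0.2 + 0.6 * w)            # treatment depends on W
        y = int(rng.random() < 0.2 + 0.2 * a + 0.3 * w)  # true effect of A: 0.2
        data.append((w, a, y))
    return data


def mean(values):
    return sum(values) / len(values)


def naive_ate(data):
    """Unadjusted risk difference between treated and untreated."""
    treated = [y for w, a, y in data if a == 1]
    untreated = [y for w, a, y in data if a == 0]
    return mean(treated) - mean(untreated)


def gformula_ate(data):
    """Risk difference standardised over the observed distribution of W."""
    ate = 0.0
    for level in (0, 1):
        stratum = [(a, y) for w, a, y in data if w == level]
        r1 = mean([y for a, y in stratum if a == 1])
        r0 = mean([y for a, y in stratum if a == 0])
        ate += len(stratum) / len(data) * (r1 - r0)
    return ate


TRUE_ATE = 0.2
rng = random.Random(2018)  # fixed seed so the run is reproducible
samples = [simulate(1000, rng) for _ in range(200)]

results = {}
for name, estimator in [("naive", naive_ate), ("g-formula", gformula_ate)]:
    estimate = mean([estimator(s) for s in samples])
    abs_bias = abs(estimate - TRUE_ATE)
    rel_bias = 100 * abs_bias / TRUE_ATE
    results[name] = (estimate, abs_bias, rel_bias)
    print(f"{name}: estimate {estimate:.3f}, "
          f"absolute bias {abs_bias:.3f}, relative bias {rel_bias:.1f}%")
```

In this setup the unadjusted estimator is expected to centre near 0.38 because of confounding by W, whereas the stratified estimator should centre near the true value. Relative bias is the absolute bias divided by the true value, expressed as a percentage.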
Figure 2. Probability density function of the propensity score by treatment status for one randomly selected sample from the 1000 Monte Carlo simulations.
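Near-violations of positivity of the kind Figure 2 displays can also be screened numerically by inspecting the estimated propensity scores and the inverse-probability weights they imply. The Python sketch below reuses the five g values listed in the update table and assumes that g denotes P(A = 1 | W). The cut-off of 0.025 is a hypothetical choice for illustration, not a value taken from the article.

```python
# Estimated propensity scores for the five listed subjects (id -> g),
# assuming g = P(A = 1 | W).
PROPENSITY = {1: 0.1967, 2: 0.0184, 3: 0.0509, 4: 0.0095, 5: 0.5908}
BOUND = 0.025  # hypothetical cut-off for "too close to 0 or 1"

flagged = []
for subject_id, g in PROPENSITY.items():
    weight_treated = 1 / g          # weight if the subject is treated
    weight_untreated = 1 / (1 - g)  # weight if the subject is untreated
    near_violation = g < BOUND or g > 1 - BOUND
    if near_violation:
        flagged.append(subject_id)
    print(f"id {subject_id}: g = {g:.4f}, "
          f"1/g = {weight_treated:.1f}, 1/(1-g) = {weight_untreated:.2f}, "
          f"near violation: {near_violation}")

print("flagged ids:", flagged)
```

With this cut-off, subjects 2 and 4 are flagged. A treated subject with g = 0.0095 would carry a weight above 100, which is how a handful of observations can dominate weighted estimators when overlap is poor.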