Noémi Kreif, Susan Gruber, Rosalba Radice, Richard Grieve, Jasjeet S Sekhon.
Abstract
Statistical approaches for estimating treatment effectiveness commonly model the endpoint, or the propensity score, using parametric regressions such as generalised linear models. Misspecification of these models can lead to biased parameter estimates. We compare two approaches that combine the propensity score and the endpoint regression, and that can make weaker modelling assumptions by using machine learning to estimate the regression function and the propensity score. Targeted maximum likelihood estimation is a double-robust method designed to reduce bias in the estimate of the parameter of interest. Bias-corrected matching uses regression predictions to reduce the bias that arises from covariate imbalance within matched pairs. We illustrate the methods in an evaluation of different types of hip prosthesis on the health-related quality of life of patients with osteoarthritis. We undertake a simulation study, grounded in the case study, to compare the relative bias, efficiency and confidence interval coverage of the methods. We consider data-generating processes with non-linear functional-form relationships, and with normal and non-normal endpoints. We find that, across the circumstances considered, bias-corrected matching generally produced less bias but higher variance than targeted maximum likelihood estimation. When either targeted maximum likelihood estimation or bias-corrected matching incorporated machine learning, bias was much reduced compared with using misspecified parametric models.
Keywords: bias-corrected matching; double robustness; machine learning; model misspecification; targeted maximum likelihood estimation; treatment effectiveness
Year: 2014; PMID: 24525488; PMCID: PMC5051604; DOI: 10.1177/0962280214521341
Source DB: PubMed; Journal: Stat Methods Med Res; ISSN: 0962-2802; Impact factor: 3.021
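Targeted maximum likelihood estimation, as summarised in the abstract, starts from an initial endpoint regression (Q), combines it with a propensity score model (g), and then updates Q in a targeting step. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it uses an OLS endpoint model and a logistic PS model fitted by Newton-Raphson (the paper also uses super learner and boosted CART), truncates the PS at 0.01 and 0.99, and uses a linear fluctuation for a continuous endpoint (a logistic fluctuation on a bounded endpoint is the more common choice). The function names, truncation bounds and simulated data are hypothetical.

```python
import numpy as np


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, a, n_iter=25):
    """Logistic regression of a on X by Newton-Raphson; returns fitted PS."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = expit(Xd @ beta)
        grad = Xd.T @ (a - p)
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return expit(Xd @ beta)


def tmle_ate(X, a, y, bounds=(0.01, 0.99)):
    """Hypothetical TMLE of the ATE: OLS endpoint model Q, logistic PS model g."""
    n = len(y)

    def design(treat):
        return np.column_stack([np.ones(n), treat, X])

    # Step 1: initial endpoint regression Q(A, X), predicted at A, A = 1 and A = 0
    coef, *_ = np.linalg.lstsq(design(a), y, rcond=None)
    q_a, q_1, q_0 = (design(t) @ coef for t in (a, np.ones(n), np.zeros(n)))

    # Step 2: propensity score g(X), truncated away from 0 and 1
    g = np.clip(fit_logistic(X, a), *bounds)

    # Step 3: clever covariate H(A, X) and its values under A = 1 and A = 0
    h_a = a / g - (1 - a) / (1 - g)
    h_1, h_0 = 1.0 / g, -1.0 / (1 - g)

    # Step 4: targeting step with a linear fluctuation Q + eps * H, where eps is
    # the least-squares coefficient from regressing the residual on H
    eps = np.sum(h_a * (y - q_a)) / np.sum(h_a ** 2)
    q_a, q_1, q_0 = q_a + eps * h_a, q_1 + eps * h_1, q_0 + eps * h_0

    # Step 5: plug-in ATE and influence-curve-based standard error
    ate = np.mean(q_1 - q_0)
    ic = h_a * (y - q_a) + (q_1 - q_0) - ate
    return ate, np.sqrt(np.var(ic) / n)


# Hypothetical confounded data with a true ATE of 0.4
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
a = rng.binomial(1, expit(0.5 * X[:, 0] - 0.5 * X[:, 1]))
y = 1.0 + 0.4 * a + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
ate, se = tmle_ate(X, a, y)
print(f"TMLE ATE = {ate:.3f} (SE {se:.3f})")
```

Because the fluctuation coefficient is the least-squares solution, the updated fit makes the mean of H(Y − Q*) exactly zero; this is the estimating equation that underlies the method's double robustness.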
Balance on pre-operative characteristics: means and standardised mean differences (SMD, %), hybrid versus cementless prostheses.
| Covariate | Mean, hybrid | Mean, cementless | SMD (%) |
|---|---|---|---|
| Age | 69.7 | 69.3 | 15.98 |
| Oxford hip scoreᵃ | 20.2 | 19.9 | 2.83 |
| Pre-operative EQ-5Dᵃ | 0.401 | 0.399 | 0.63 |
| Index of deprivationᵃ | 3.26 | 3.03 | 15.92 |
| ASA grade 1ᵃ | 0.0903 | 0.120 | 9.55 |
| ASA grade 2 | 0.740 | 0.738 | 0.52 |
| Disability score | 0.617 | 0.596 | 4.19 |
| Obeseᵃ | 0.270 | 0.266 | 0.69 |
| Morbidly obeseᵃ | 0.104 | 0.111 | 4.30 |
| Number of comorbidities | 1.00 | 0.96 | 4.14 |
| Comorbidities | | | |
| Heart disease | 0.176 | 0.15 | 7.86 |
| High bp | 0.399 | 0.422 | 4.55 |
| Stroke | 0.0285 | 0.0169 | 7.78 |
| Circulation | 0.0777 | 0.0671 | 4.08 |
| Lung disease | 0.0555 | 0.0640 | 3.61 |
| Diabetes | 0.130 | 0.123 | 2.20 |
| Kidney disease | 0.0127 | 0.0207 | 6.24 |
| Nervous system | 0.00634 | 0.0118 | 5.20 |
| Liver disease | 0.00951 | 0.00339 | 7.65 |
| Cancer | 0.0602 | 0.0515 | 3.80 |
| Depression | 0.0491 | 0.0373 | 5.84 |
| Consultant | 0.803 | 0.869 | 17.64 |
| Treatment centre | 0.0491 | 0.122 | 26.16 |
Note: SMD: standardised mean difference, calculated as $\mathrm{SMD} = 100 \times |\bar{x}_h - \bar{x}_c| / \sqrt{(s_h^2 + s_c^2)/2}$, where $\bar{x}_h$ and $\bar{x}_c$ are the means for the hybrid and cementless groups, and the denominator is the pooled standard deviation of the two groups, with $s_h$ and $s_c$ the group standard deviations for a given covariate. Variables are dichotomous, with the exception of age, Oxford hip score, pre-operative EQ-5D-3L score, index of deprivation and number of comorbidities.
ᵃ Variables with missing values; for these, SMDs were combined across imputed datasets using Rubin's rules.
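The SMD formula in the note can be checked against the balance table. The sketch below assumes, since the note does not say so, that the standard deviation of a dichotomous covariate is taken as $\sqrt{p(1-p)}$; it does not implement the Rubin's-rules combination used for variables with missing values. The function names are hypothetical.

```python
import math


def smd_percent(mean_h, mean_c, sd_h, sd_c):
    """Absolute SMD (%), dividing by the pooled SD sqrt((s_h^2 + s_c^2) / 2)."""
    pooled_sd = math.sqrt((sd_h ** 2 + sd_c ** 2) / 2)
    return 100 * abs(mean_h - mean_c) / pooled_sd


def smd_binary_percent(p_h, p_c):
    """SMD (%) for a dichotomous covariate; assumes each group's SD is sqrt(p(1 - p))."""
    return smd_percent(p_h, p_c, math.sqrt(p_h * (1 - p_h)), math.sqrt(p_c * (1 - p_c)))


# Stroke row of the balance table: 0.0285 (hybrid) versus 0.0169 (cementless)
print(f"{smd_binary_percent(0.0285, 0.0169):.2f}")  # about 7.79; the table reports 7.78
```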
Figure 1. Densities of the propensity score (PS) estimated using logistic regression, hybrid versus cementless total hip replacement (THR): hybrid (dashed line), cementless (black line).
Figure 2. Point estimates and 95% CIs of the average treatment effect (ATE) on EQ-5D-3L score, hybrid versus cementless THR, across statistical methods. SL: super learner.
Summary of the data-generating processes (DGPs) used in the simulation study.
| DGP | Overlap | Confounder–endpoint association | Endpoint distribution |
|---|---|---|---|
| DGP 1 | Good | Moderate | Normal |
| DGP 2 | Good | Strong | Normal |
| DGP 3 | Poor | Strong | Normal |
| DGP 4 | Poor | Strong | Gamma |
| DGP 5 | Poor | Strong | Semi-continuous |
Figure 3. Densities of the true PS in the simulations for a typical sample (n = 10,000): treated (dashed line) versus control (black line). (a) Good overlap (DGP 1 and 2); (b) poor overlap (DGP 3–5).
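Overlap describes how far the PS distributions of treated and control units share common support. The sketch below uses a hypothetical one-covariate DGP, not the one used in the paper, to show how strengthening the covariate–treatment association pushes true PSs towards 0 and 1, which is the pattern in panel (b). The function name, the strengths 0.5 and 3.0, and the 0.05/0.95 cut-offs are illustrative assumptions.

```python
import numpy as np


def extreme_ps_share(strength, n=10_000, seed=1):
    """Share of units whose true PS lies outside [0.05, 0.95] under a
    hypothetical DGP with logit(PS) = strength * x and x ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    ps = 1.0 / (1.0 + np.exp(-strength * x))
    return np.mean((ps < 0.05) | (ps > 0.95))


# A weak covariate-treatment association gives good overlap; a strong one gives poor overlap
for label, strength in [("good overlap", 0.5), ("poor overlap", 3.0)]:
    print(f"{label}: {100 * extreme_ps_share(strength):.1f}% of units with PS < 0.05 or > 0.95")
```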
Simulation results for DGP 1, over 1000 replications: normal endpoint, moderate confounder–endpoint association, good overlap.
| Method | Relative bias (%) | Variance | RMSE | 95% CI coverage (%) |
|---|---|---|---|---|
| Endpoint model correct, PS model correct | | | | |
| OLS | −0.1 | 0.005 | 0.070 | 95 |
| IPTW | 0.5 | 0.008 | 0.091 | 99 |
| PS matching | 1.2 | 0.011 | 0.106 | 98 |
| TMLE | −0.1 | 0.005 | 0.071 | 95 |
| BCM | −0.1 | 0.007 | 0.082 | 95 |
| Endpoint model correct, PS model misspecified | | | | |
| OLS | −0.1 | 0.005 | 0.070 | 95 |
| IPTW | −15.0 | 0.008 | 0.110 | 97 |
| PS matching | −8.1 | 0.013 | 0.117 | 96 |
| TMLE | −0.2 | 0.005 | 0.070 | 94 |
| BCM | 0.7 | 0.007 | 0.085 | 93 |
| Endpoint model misspecified, PS model correct | | | | |
| OLS | −11.7 | 0.008 | 0.098 | 90 |
| IPTW | 0.5 | 0.008 | 0.091 | 99 |
| PS matching | 1.2 | 0.011 | 0.106 | 98 |
| WLS | 0.6 | 0.008 | 0.087 | 95 |
| TMLE | 0.6 | 0.008 | 0.087 | 95 |
| BCM | 0.7 | 0.009 | 0.097 | 95 |
| Endpoint model misspecified, PS model misspecified | | | | |
| OLS | −11.7 | 0.008 | 0.098 | 90 |
| IPTW | −15.0 | 0.008 | 0.110 | 97 |
| PS matching | −8.1 | 0.013 | 0.117 | 96 |
| WLS | −12.7 | 0.008 | 0.103 | 90 |
| TMLE | −12.9 | 0.008 | 0.104 | 90 |
| BCM | −7.4 | 0.011 | 0.108 | 93 |
| Machine learning | | | | |
| Regression (Q super learner) | −3.1 | 0.006 | 0.079 | 95 |
| IPTW (g boosted CART) | 10.2 | 0.007 | 0.091 | 98 |
| WLS (Q OLS, g boosted CART) | 0.5 | 0.006 | 0.076 | 97 |
| TMLE (Q SL, g boosted CART) | 1.1 | 0.006 | 0.074 | 94 |
| BCM (Q SL, g boosted CART) | 2.1 | 0.008 | 0.092 | 95 |
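Note: In DGP 1 the true ATE was 0.4, and the relative bias of the naive estimator (the unadjusted difference in means) was 20%. WLS uses a main-terms-only endpoint regression, so it is reported as a misspecified estimator. OLS: ordinary least squares; IPTW: inverse probability of treatment weighting; WLS: weighted least squares; TMLE: targeted maximum likelihood estimation; BCM: bias-corrected matching; RMSE: root mean squared error; Q: endpoint regression; g: PS model; SL: super learner; CART: classification and regression trees.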
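The performance measures reported in these tables can be computed from simulation replicates as in the sketch below. It assumes Wald-type 95% CIs (estimate ± 1.96 SE) for coverage, and the replicates are synthetic rather than the paper's. Because RMSE² = bias² + variance, a reported RMSE can also be recovered, up to rounding, from the relative bias, variance and true ATE; the last lines check this for the DGP 4 BCM row with machine learning. Function names are hypothetical.

```python
import numpy as np


def performance(estimates, ses, true_ate, z=1.96):
    """Summarise simulation replicates of an ATE estimator."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(ses, dtype=float)
    bias = est.mean() - true_ate
    variance = est.var()  # ddof=0, so RMSE**2 == bias**2 + variance exactly
    rmse = np.sqrt(np.mean((est - true_ate) ** 2))
    covered = (est - z * se <= true_ate) & (true_ate <= est + z * se)
    return {
        "relative_bias_pct": 100 * bias / true_ate,
        "variance": variance,
        "rmse": rmse,
        "coverage_pct": 100 * covered.mean(),
    }


def rmse_from_reported(relative_bias_pct, variance, true_ate):
    """RMSE implied by a table row's relative bias (%), variance and true ATE."""
    bias = relative_bias_pct / 100 * true_ate
    return np.sqrt(bias ** 2 + variance)


# Synthetic replicates: 1000 unbiased normal estimates with variance 0.005
# around a true ATE of 0.4, each paired with its correct standard error
rng = np.random.default_rng(2014)
sd = np.sqrt(0.005)
summary = performance(rng.normal(0.4, sd, size=1000), np.full(1000, sd), true_ate=0.4)
print(summary)

# DGP 4, BCM (Q SL, g boosted CART): relative bias -2.5%, variance 6.755, true ATE 9.98
print(f"{rmse_from_reported(-2.5, 6.755, 9.98):.3f}")  # about 2.611; the table reports 2.610
```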
Simulation results for DGPs 4 and 5, over 1000 replications: gamma and semi-continuous endpoints, strong confounder–endpoint relationship, poor overlap.
| Method | Relative bias (%) | Variance | RMSE | 95% CI coverage (%) |
|---|---|---|---|---|
| DGP 4 (gamma endpoint): misspecified parametric models | | | | |
| OLS | −93.3 | 10.175 | 9.843 | 16 |
| IPTW | −102.7 | 11.850 | 10.817 | 34 |
| PS matching | −85.6 | 19.120 | 9.595 | 59 |
| WLS | −96.9 | 11.475 | 10.252 | 19 |
| TMLE | −96.4 | 10.303 | 10.140 | 17 |
| BCM | −80.7 | 17.642 | 9.085 | 37 |
| DGP 4 (gamma endpoint): machine learning | | | | |
| Regression (Q super learner) | −11.8 | 7.600 | 2.998 | 90 |
| IPTW (g boosted CART) | −80.1 | 16.585 | 8.974 | 62 |
| WLS (Q OLS, g boosted CART) | −32.1 | 11.024 | 4.612 | 81 |
| TMLE (Q SL, g boosted CART) | −20.7 | 6.115 | 3.224 | 70 |
| BCM (Q SL, g boosted CART) | −2.5 | 6.755 | 2.610 | 98 |
| DGP 5 (semi-continuous endpoint): misspecified parametric models | | | | |
| OLS | 26.0 | 0.0002 | 0.022 | 78 |
| IPTW | 15.0 | 0.0003 | 0.019 | 99 |
| PS matching | 26.9 | 0.0004 | 0.026 | 93 |
| WLS | 23.9 | 0.0003 | 0.022 | 83 |
| TMLE | 17.9 | 0.0002 | 0.019 | 90 |
| BCM | 27.1 | 0.0003 | 0.024 | 82 |
| DGP 5 (semi-continuous endpoint): machine learning | | | | |
| Regression (Q super learner) | 13.5 | 0.0002 | 0.017 | 91 |
| IPTW (g boosted CART) | 59.4 | 0.0003 | 0.041 | 72 |
| WLS (Q OLS, g boosted CART) | 12.9 | 0.0002 | 0.017 | 90 |
| TMLE (Q SL, g boosted CART) | 7.2 | 0.0002 | 0.016 | 87 |
| BCM (Q SL, g boosted CART) | −1.1 | 0.0004 | 0.019 | 95 |
Note: In DGPs 4 and 5, the true ATEs were 9.98 and 0.062, and the relative biases of the naive estimator (the unadjusted difference in means) were 170% and 150%, respectively.
Figure 4. Estimated ATEs in the simulations. The boxplots summarise bias and variation across 1000 replications, showing the median, the quartiles and whiskers at 1.5 times the interquartile range. The dashed lines are the true values. The left panel shows results when the PS and endpoint models were estimated with misspecified fixed parametric methods (d1); the right panel, when machine learning estimation (d2) was used. (a) DGP 3, (b) DGP 4, (c) DGP 5.
Simulation results for DGP 2 and 3, over 1000 replications: normal endpoint, strong confounder–endpoint association, good and poor overlap.
| Method | Relative bias (%) | Variance | RMSE | 95% CI coverage (%) |
|---|---|---|---|---|
| DGP 2 (good overlap): misspecified parametric models | | | | |
| OLS regression | −45.9 | 0.052 | 0.292 | 86 |
| IPTW | −59.1 | 0.067 | 0.350 | 98 |
| PS matching | −34.0 | 0.099 | 0.342 | 96 |
| WLS | −50.2 | 0.059 | 0.315 | 87 |
| TMLE | −45.7 | 0.041 | 0.272 | 86 |
| BCM | −31.4 | 0.074 | 0.299 | 90 |
| DGP 2 (good overlap): machine learning | | | | |
| Regression (Q super learner) | −8.6 | 0.025 | 0.162 | 96 |
| IPTW (g boosted CART) | 41.0 | 0.036 | 0.251 | 99 |
| WLS (Q OLS, g boosted CART) | 2.6 | 0.022 | 0.149 | 100 |
| TMLE (Q SL, g boosted CART) | 3.1 | 0.011 | 0.106 | 95 |
| BCM (Q SL, g boosted CART) | 9.8 | 0.029 | 0.174 | 98 |
| DGP 3 (poor overlap): misspecified parametric models | | | | |
| OLS regression | −119.2 | 0.050 | 0.527 | 40 |
| IPTW | −160.6 | 0.082 | 0.703 | 71 |
| PS matching | −81.1 | 0.100 | 0.453 | 84 |
| WLS | −137.9 | 0.063 | 0.606 | 39 |
| TMLE | −129.7 | 0.046 | 0.561 | 35 |
| BCM | −73.8 | 0.072 | 0.399 | 74 |
| DGP 3 (poor overlap): machine learning | | | | |
| Regression (Q super learner) | −22.0 | 0.046 | 0.233 | 94 |
| IPTW (g boosted CART) | 100.6 | 0.034 | 0.442 | 82 |
| WLS (Q OLS, g boosted CART) | −12.8 | 0.025 | 0.165 | 99 |
| TMLE (Q SL, g boosted CART) | 5.6 | 0.019 | 0.139 | 87 |
| BCM (Q SL, g boosted CART) | 12.3 | 0.034 | 0.191 | 98 |
Note: In DGPs 2 and 3, the true ATE was 0.4, and the relative biases of the naive estimator (the unadjusted difference in means) were 80% and 190%, respectively.
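Bias-corrected matching, as described in the abstract, adjusts each matched pair for remaining covariate differences using regression predictions. The sketch below applies a correction in the style of Abadie and Imbens under stated assumptions: 1:1 nearest-neighbour matching with replacement on a known PS, and OLS endpoint regressions fitted separately by arm. The paper's BCM instead uses super learner and boosted CART, and its matching algorithm and variance estimator are not given here. All names and data are hypothetical.

```python
import numpy as np


def ols_predictions(X, y, rows):
    """Fit OLS of y on [1, X] using `rows` only; predict for every unit."""
    Xd = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xd[rows], y[rows], rcond=None)
    return Xd @ coef


def matching_ate(ps, a, y, mu1, mu0):
    """1:1 nearest-neighbour PS matching with replacement.
    Returns (uncorrected ATE, bias-corrected ATE)."""
    treated, control = np.flatnonzero(a == 1), np.flatnonzero(a == 0)
    diff_raw = np.empty(len(y))
    diff_bc = np.empty(len(y))
    for i in range(len(y)):
        pool = control if a[i] == 1 else treated
        j = pool[np.argmin(np.abs(ps[pool] - ps[i]))]  # closest PS in the other arm
        if a[i] == 1:
            # Impute Y(0) from the match, adjusted by the control-arm regression
            y0 = y[j] + mu0[i] - mu0[j]
            diff_raw[i], diff_bc[i] = y[i] - y[j], y[i] - y0
        else:
            # Impute Y(1) from the match, adjusted by the treated-arm regression
            y1 = y[j] + mu1[i] - mu1[j]
            diff_raw[i], diff_bc[i] = y[j] - y[i], y1 - y[i]
    return diff_raw.mean(), diff_bc.mean()


# Hypothetical data: true ATE 0.4, treatment depends on x1 + x2
rng = np.random.default_rng(3)
n = 2000
X = rng.normal(size=(n, 2))
ps = 1.0 / (1.0 + np.exp(-(X[:, 0] + X[:, 1])))
a = rng.binomial(1, ps)
y = 0.4 * a + 2.0 * X[:, 0] - X[:, 1] + rng.normal(size=n)
mu1 = ols_predictions(X, y, a == 1)
mu0 = ols_predictions(X, y, a == 0)
ate_raw, ate_bcm = matching_ate(ps, a, y, mu1, mu0)
print(f"PS matching: {ate_raw:.3f}, bias-corrected matching: {ate_bcm:.3f}")
```

Returning the uncorrected estimate alongside the corrected one makes the effect of the regression adjustment visible on the same matched pairs.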