K Ellicott Colson, Kara E Rudolph, Scott C Zimmerman, Dana E Goin, Elizabeth A Stuart, Mark van der Laan, Jennifer Ahern.
Abstract
Matching methods are common in studies across many disciplines. However, there is limited evidence on how to optimally combine matching with subsequent analysis approaches to minimize bias and maximize efficiency for the quantity of interest. We conducted simulations to compare the performance of a wide variety of matching methods and analysis approaches in terms of bias, variance, and mean squared error (MSE). We then compared these approaches in an applied example of an employment training program. Combining full matching with double robust analysis performed best in both the simulations and the applied example, particularly when paired with machine learning estimation methods. To reduce bias, current guidelines advise researchers to select the technique with the best post-matching covariate balance, but this work finds that such an approach does not always minimize MSE. These findings have important implications for future research utilizing matching. To minimize MSE, investigators should consider additional diagnostics and the use of simulations tailored to the study of interest to identify the optimal matching and analysis combination.
Year: 2016 PMID: 26980444 PMCID: PMC4793248 DOI: 10.1038/srep23222
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. (a) Density of estimated propensity scores for treated and control units in good support scenario. (b) Density of estimated propensity scores for treated and control units in poor support scenario. The plots illustrate substantial overlap of the propensity scores for treated and control units in the good support scenario and minimal overlap in the poor support scenario. The probability of treatment ranged between 0.093 and 0.776 in the good support scenario and between <0.001 and >0.999 in the poor support scenario. Distributions and results for the medium support scenario fell in between those of the good and poor support scenarios and are presented in SI Text, SI Fig. S1, and SI Table S2.
Balance metrics by simulation scenario and matching method.
| Match method | % of covariates with ASMD < 20% | % of covariates with ASMD < 5% | Median ASMD | Maximum ASMD | ASMD in propensity score |
|---|---|---|---|---|---|
| Good support | |||||
| None | 46.1 | 5.3 | 0.214 | 0.443 | 0.362 |
| NN | 100 | 80.6 | 0.024 | 0.147 | 0.072 |
| Opt | 100 | 81.1 | 0.025 | 0.148 | 0.069 |
| Genetic | |||||
| Sub | 100 | 99.7 | 0.013 | 0.055 | 0.048 |
| Full | 100 | 94.7 | 0.014 | 0.092 | 0.054 |
| IPTW | 100 | 99.7 | 0.013 | 0.072 | 0.068 |
| Poor support | |||||
| None | 48.1 | 13.8 | 0.726 | 0.941 | 1.184 |
| NN | 74.4 | 36.5 | 0.090 | 0.330 | 0.126 |
| Opt | 49.6 | 20.7 | 0.544 | 0.822 | 0.953 |
| Genetic | |||||
| Sub | 78.9 | 7.4 | 0.131 | 0.451 | 0.132 |
| Full | 79.0 | 36.3 | 0.111 | 0.335 | 0.145 |
| IPTW | 62.5 | 15.1 | 0.130 | 1.643 | 0.200 |
ASMD: Absolute standardized mean difference in covariate values between treated and control groups. NN: greedy nearest neighbor. Opt: optimal nearest neighbor. Sub: subclassification. IPTW: inverse probability of treatment weighting. Metrics are averaged across 1,000 simulation runs. Bolded values indicate the best balance according to each metric and scenario.
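
The balance metrics defined above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the difference in means is standardized by the pooled standard deviation of the two groups (other conventions exist), and it reads the 20% and 5% column headers as ASMD thresholds of 0.20 and 0.05. The function names `asmd` and `balance_summary` and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, median, variance


def asmd(treated, control):
    """Absolute standardized mean difference for one covariate.

    Assumption: the difference in means is divided by the pooled SD,
    sqrt((var_treated + var_control) / 2). Requires a nonzero pooled SD.
    """
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return abs(mean(treated) - mean(control)) / pooled_sd


def balance_summary(covariates):
    """Summarize balance over a dict {name: (treated_values, control_values)}."""
    values = [asmd(t, c) for t, c in covariates.values()]
    n = len(values)
    return {
        "pct_below_20": 100 * sum(v < 0.20 for v in values) / n,
        "pct_below_05": 100 * sum(v < 0.05 for v in values) / n,
        "median": median(values),
        "max": max(values),
    }


# Toy data (made up for illustration): one balanced and one imbalanced covariate.
toy = {
    "age": ([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]),
    "education": ([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]),
}
summary = balance_summary(toy)
```

In the tables, these per-sample summaries are further averaged across simulation runs or bootstrapped samples.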
Simulation results comparing matching and analysis combinations in good support scenario.
| Match | Analysis | % Bias | Var | MSE | Bias rank | Var rank | MSE rank |
|---|---|---|---|---|---|---|---|
| Full | TMLE parametric | 0.28% | 0.0063 | 0.0063 | 16 | 2 | 1 |
| None | TMLE with SL | −0.02% | 0.0065 | 0.0065 | 3 | 3 | 2 |
| None | TMLE parametric | −0.14% | 0.0066 | 0.0066 | 11 | 4 | 3 |
| IPTW | g-computation | 0.13% | 0.0066 | 0.0066 | 9 | 5 | 4 |
| IPTW | TMLE with SL | 0.02% | 0.0067 | 0.0067 | 2 | 6 | 5 |
| Sub | g-computation | −0.48% | 0.0067 | 0.0067 | 19 | 7 | 6 |
| IPTW | Naive | −0.52% | 0.0068 | 0.0069 | 20 | 9 | 7 |
| Sub | TMLE with SL | −0.45% | 0.0069 | 0.0070 | 18 | 10 | 8 |
| Opt | g-computation | −1.35% | 0.0067 | 0.0071 | 23 | 8 | 9 |
| Sub | Naive | 0.89% | 0.0070 | 0.0072 | 21 | 11 | 10 |
| Opt | TMLE with SL | −0.08% | 0.0072 | 0.0072 | 6 | 13 | 11 |
| Full | TMLE with SL | −0.04% | 0.0073 | 0.0073 | 4 | 14 | 12 |
| Opt | TMLE parametric | −1.25% | 0.0071 | 0.0074 | 22 | 12 | 13 |
| Full | g-computation | −0.21% | 0.0077 | 0.0077 | 12 | 16 | 14 |
| Full | Naive | 0.24% | 0.0080 | 0.0081 | 14 | 17 | 15 |
| Sub | TMLE parametric | −1.52% | 0.0082 | 0.0087 | 24 | 18 | 16 |
| Genetic | TMLE with SL | 0.11% | 0.0090 | 0.0090 | 7 | 19 | 17 |
| Genetic | g-computation | 0.05% | 0.0093 | 0.0093 | 5 | 20 | 18 |
| Genetic | Naive | 0.14% | 0.0093 | 0.0093 | 10 | 21 | 19 |
| NN | TMLE with SL | 0.02% | 0.0102 | 0.0102 | 1 | 23 | 20 |
| Opt | Naive | 2.23% | 0.0097 | 0.0107 | 25 | 22 | 21 |
| NN | g-computation | −0.23% | 0.0107 | 0.0107 | 13 | 24 | 22 |
| NN | Naive | 0.28% | 0.0113 | 0.0113 | 15 | 25 | 23 |
| Genetic | TMLE parametric | 0.33% | 0.0115 | 0.0115 | 17 | 26 | 24 |
| NN | TMLE parametric | −0.12% | 0.0129 | 0.0129 | 8 | 27 | 25 |
| None | g-computation | −8.51% | 0.0054 | 0.0202 | 27 | 1 | 26 |
| IPTW | TMLE parametric | 7.97% | 0.0077 | 0.0207 | 26 | 15 | 27 |
| None | Naive | 21.22% | 0.0147 | 0.1073 | 28 | 28 | 28 |
Var: variance. MSE: mean squared error. NN: greedy nearest neighbor. Opt: optimal nearest neighbor. Sub: subclassification. IPTW: inverse probability of treatment weighting. SL: using SuperLearner for semi-parametric estimation.
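
The three performance measures reported in this table are linked by the identity MSE = bias² + variance, which is why a low-variance method can still rank poorly on MSE when it is biased. The sketch below is an illustration under stated assumptions, not the authors' code: it takes percent bias to mean bias relative to the true effect, and it uses the population variance (dividing by the number of runs) so that the identity holds exactly. The function name `summarize_runs` and the numbers are made up.

```python
from statistics import mean, pvariance


def summarize_runs(estimates, truth):
    """Bias, variance, and MSE of an estimator across simulation runs.

    Assumptions: percent bias is expressed relative to the true effect,
    and variance divides by n (population variance).
    """
    bias = mean(estimates) - truth
    var = pvariance(estimates)
    mse = mean([(e - truth) ** 2 for e in estimates])
    return {"bias": bias, "pct_bias": 100 * bias / truth, "var": var, "mse": mse}


# Hypothetical estimates from four simulation runs with a true effect of 1.0.
result = summarize_runs([0.9, 1.1, 1.2, 1.0], truth=1.0)

# The decomposition holds up to floating-point error.
decomposition_gap = abs(result["mse"] - (result["var"] + result["bias"] ** 2))
```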
Simulation results comparing matching and analysis combinations in poor support scenario.
| Match | Analysis | % Bias | Var | MSE | Bias rank | Var rank | MSE rank |
|---|---|---|---|---|---|---|---|
| Full | TMLE with SL | −0.81% | 0.0088 | 0.0091 | 8 | 3 | 1 |
| IPTW | TMLE with SL | −0.76% | 0.0090 | 0.0092 | 6 | 4 | 2 |
| Genetic | TMLE with SL | −0.65% | 0.0130 | 0.0132 | 3 | 8 | 3 |
| NN | TMLE with SL | −0.70% | 0.0140 | 0.0141 | 4 | 9 | 4 |
| Opt | TMLE with SL | 0.53% | 0.0166 | 0.0167 | 2 | 14 | 5 |
| IPTW | g-computation | −0.70% | 0.0169 | 0.0171 | 5 | 15 | 6 |
| None | TMLE with SL | 0.52% | 0.0179 | 0.0179 | 1 | 16 | 7 |
| Sub | Naive | 3.47% | 0.0158 | 0.0201 | 18 | 13 | 8 |
| Genetic | g-computation | −0.78% | 0.0210 | 0.0212 | 7 | 19 | 9 |
| Full | g-computation | −1.55% | 0.0204 | 0.0212 | 10 | 17 | 10 |
| Full | Naive | 1.77% | 0.0210 | 0.0221 | 13 | 18 | 11 |
| Genetic | Naive | 1.59% | 0.0216 | 0.0224 | 11 | 20 | 12 |
| NN | g-computation | −1.82% | 0.0217 | 0.0229 | 14 | 21 | 13 |
| NN | Naive | 1.76% | 0.0222 | 0.0232 | 12 | 22 | 14 |
| Opt | TMLE parametric | 1.11% | 0.0247 | 0.0251 | 9 | 24 | 15 |
| Sub | g-computation | −3.46% | 0.0243 | 0.0285 | 17 | 23 | 16 |
| None | TMLE parametric | 1.89% | 0.0285 | 0.0297 | 15 | 25 | 17 |
| Genetic | TMLE parametric | −6.69% | 0.0145 | 0.0303 | 19 | 10 | 18 |
| IPTW | TMLE parametric | −7.56% | 0.0106 | 0.0308 | 22 | 5 | 19 |
| NN | TMLE parametric | −7.22% | 0.0147 | 0.0331 | 20 | 12 | 20 |
| Full | TMLE parametric | −7.84% | 0.0118 | 0.0335 | 23 | 7 | 21 |
| IPTW | Naive | −1.94% | 0.0493 | 0.0506 | 16 | 27 | 22 |
| Sub | TMLE parametric | −7.50% | 0.0467 | 0.0665 | 21 | 26 | 23 |
| Sub | TMLE with SL | −8.04% | 0.0512 | 0.0740 | 24 | 28 | 24 |
| None | g-computation | −26.75% | 0.0074 | 0.2605 | 25 | 1 | 25 |
| Opt | g-computation | −28.68% | 0.0074 | 0.2982 | 26 | 2 | 26 |
| Opt | Naive | 45.27% | 0.0147 | 0.7393 | 27 | 11 | 27 |
| None | Naive | 60.41% | 0.0110 | 1.3013 | 28 | 6 | 28 |
Var: variance. MSE: mean squared error. NN: greedy nearest neighbor. Opt: optimal nearest neighbor. Sub: subclassification. IPTW: inverse probability of treatment weighting. SL: using SuperLearner for semi-parametric estimation.
Figure 2. Density of estimated propensity scores for treated experimental units and observational control units.
The plot illustrates extremely poor overlap of the propensity scores for treated and control units in the applied example. Propensity scores ranged between <0.001 and 0.488, and were substantially skewed towards 0 for the control group. These patterns indicate that the baseline characteristics of the control individuals are very different from those who participated in the program; this example is most similar to the poor support simulation scenario.
Balance metrics for NSW observational data by matching method.
| Match method | % of covariates with ASMD < 20% | % of covariates with ASMD < 5% | Median ASMD | Maximum ASMD | ASMD in propensity score |
|---|---|---|---|---|---|
| None | 12.2 | 5.0 | 1.182 | 1.471 | 4.901 |
| NN | 94.4 | 45.1 | 0.066 | 0.309 | 0.490 |
| Opt | 98.6 | 56.2 | 0.048 | 0.227 | 0.255 |
| Genetic | 0.160 | 0.060 | |||
| Sub | 60.0 | 4.8 | 0.181 | 0.352 | 0.062 |
| Full | 97.9 | 58.3 | 0.048 | 0.215 | |
| IPTW | 100 | 91.8 | 0.015 | 0.028 | |
ASMD: Absolute standardized mean difference in covariate values between treated and control groups. NN: greedy nearest neighbor. Opt: optimal nearest neighbor. Sub: subclassification. IPTW: inverse probability of treatment weighting. Metrics are averaged across 500 bootstrapped samples. Bolded values indicate the best balance according to each metric and scenario.
Figure 3. Comparison of matching and analysis combinations to estimate the effect of NSW participation using observational control data.
Colored points represent point estimates of the effect of treatment on the treated (ATT), with corresponding 95% error bars. The unadjusted estimate of the ATT in the NSW observational data was -$8,526, dramatically different from the experimental result of $1,794 (indicated by the grey line). The success of matching and analysis combinations in recovering the experimental result varied substantially. Almost all methods underestimated the experimental result, suggesting a consistent residual bias that may be the result of unmeasured covariates. The confidence intervals for all ATT estimates were wide, and few estimates excluded the null. IPTW: inverse probability of treatment weighting. TMLE: targeted minimum loss-based estimation. SL: using SuperLearner for semi-parametric estimation.
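
To make the IPTW estimate of the ATT concrete, the following sketch shows one standard weighting scheme: treated units receive weight 1 and control units receive weight e/(1 − e), where e is the estimated propensity score. It is a simplified illustration under assumptions, not the authors' implementation: it takes the propensity scores as given, uses a weighted difference in means (comparable in spirit to the naive analysis), and produces no confidence interval. The function name `att_iptw` and the data are hypothetical.

```python
def att_iptw(outcomes, treated, pscores):
    """Weighted difference in mean outcomes targeting the ATT.

    Treated units get weight 1; controls get weight e / (1 - e).
    Assumes every control unit has a propensity score strictly below 1.
    """
    treated_sum = treated_weight = 0.0
    control_sum = control_weight = 0.0
    for y, a, e in zip(outcomes, treated, pscores):
        if a == 1:
            treated_sum += y
            treated_weight += 1.0
        else:
            w = e / (1.0 - e)  # odds of treatment
            control_sum += w * y
            control_weight += w
    return treated_sum / treated_weight - control_sum / control_weight


# Made-up example: two treated units and two controls.
estimate = att_iptw(
    outcomes=[10, 12, 8, 6],
    treated=[1, 1, 0, 0],
    pscores=[0.5, 0.5, 0.5, 0.2],
)
```

Control units with propensity scores near 0 receive almost no weight, while the few controls that resemble treated units dominate the comparison; with overlap as poor as in Figure 2, that concentration of weight is one reason such estimates can be unstable.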