Clémence Leyrat, Shaun R Seaman, Ian R White, Ian Douglas, Liam Smeeth, Joseph Kim, Matthieu Resche-Rigon, James R Carpenter, Elizabeth J Williamson.
Abstract
Inverse probability of treatment weighting is a popular propensity score-based approach to estimate marginal treatment effects in observational studies at risk of confounding bias. A major issue when estimating the propensity score is the presence of partially observed covariates. Multiple imputation is a natural approach to handle missing data on covariates: covariates are imputed and a propensity score analysis is performed in each imputed dataset to estimate the treatment effect. The treatment effect estimates from each imputed dataset are then combined to obtain an overall estimate. We call this method MIte. However, an alternative approach has been proposed, in which the propensity scores are combined across the imputed datasets (MIps). Therefore, there are remaining uncertainties about how to implement multiple imputation for propensity score analysis: (a) should we apply Rubin's rules to the inverse probability of treatment weighting treatment effect estimates or to the propensity score estimates themselves? (b) does the outcome have to be included in the imputation model? (c) how should we estimate the variance of the inverse probability of treatment weighting estimator after multiple imputation? We studied the consistency and balancing properties of the MIte and MIps estimators and performed a simulation study to empirically assess their performance for the analysis of a binary outcome. We also compared the performance of these methods to complete case analysis and the missingness pattern approach, which uses a different propensity score model for each pattern of missingness, and a third multiple imputation approach in which the propensity score parameters are combined rather than the propensity scores themselves (MIpar). 
Under a missing at random mechanism, complete case and missingness pattern analyses were biased in most cases for estimating the marginal treatment effect, whereas multiple imputation approaches were approximately unbiased as long as the outcome was included in the imputation model. Only MIte was unbiased in all the studied scenarios and Rubin's rules provided good variance estimates for MIte. The propensity score estimated in the MIte approach showed good balancing properties. In conclusion, when using multiple imputation in the inverse probability of treatment weighting context, MIte with the outcome included in the imputation model is the preferred approach.
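As an illustration of the pooling step in MIte, the following minimal sketch applies Rubin's rules to per-imputation log relative risks. The function name `pool_rubin`, the input numbers, and the normal-approximation confidence interval are assumptions made for illustration; they are not taken from the paper.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool M per-imputation estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (M - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical log relative risks and their variances from M = 5 imputed datasets.
log_rr = [-0.42, -0.45, -0.40, -0.44, -0.43]
var_log_rr = [0.015, 0.016, 0.015, 0.017, 0.016]

pooled, total_var = pool_rubin(log_rr, var_log_rr)
half_width = 1.96 * math.sqrt(total_var)  # normal approximation (assumption)
print("pooled RR:", math.exp(pooled))
print("95% CI:", math.exp(pooled - half_width), math.exp(pooled + half_width))
```

Pooling is done on the log scale and back-transformed, which is a common choice for ratio measures; a t-based interval could be substituted for the normal approximation.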
Keywords: Missing covariates; Rubin’s rules; chained equations; inverse probability of treatment weighting; missingness pattern
Year: 2017 PMID: 28573919 PMCID: PMC6313366 DOI: 10.1177/0962280217713032
Source DB: PubMed Journal: Stat Methods Med Res ISSN: 0962-2802 Impact factor: 3.021
Figure 1. The three approaches considered after multiple imputation (MI) of the partially observed covariates. The diagram distinguishes the missing values in the original dataset from the imputed values in the kth imputed dataset, and shows the estimated treatment effect and the estimated propensity scores (PS) obtained from the kth of the M imputed datasets. The MIte approach pools the M treatment effects estimated with IPTW on each imputed dataset. The MIps estimate is obtained by using the average PS across the M imputed datasets in the IPTW estimator. Finally, the MIpar approach uses the PS evaluated at the average covariate values across the M imputed datasets, with the average PS parameters as regression coefficients.
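The difference between the three approaches can be sketched in a few lines. The sketch below assumes that a logistic PS model has already been fitted in each imputed dataset, so that per-dataset propensity scores, design rows (with a leading 1 for the intercept) and coefficients are available. The normalised weighted-mean form of the IPTW estimator and all function names are illustrative assumptions, not the authors' implementation.

```python
import math
from statistics import mean


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def iptw_log_rr(z, y, ps):
    """Log relative risk from weighted outcome means (weights 1/ps and 1/(1 - ps))."""
    risk1 = (sum(zi * yi / p for zi, yi, p in zip(z, y, ps))
             / sum(zi / p for zi, p in zip(z, ps)))
    risk0 = (sum((1 - zi) * yi / (1 - p) for zi, yi, p in zip(z, y, ps))
             / sum((1 - zi) / (1 - p) for zi, p in zip(z, ps)))
    return math.log(risk1 / risk0)


def mite(z, y, ps_by_dataset):
    """MIte: estimate the effect in each imputed dataset, then average the effects."""
    return mean([iptw_log_rr(z, y, ps) for ps in ps_by_dataset])


def mips(z, y, ps_by_dataset):
    """MIps: average each individual's PS across datasets, then estimate once."""
    avg_ps = [mean(col) for col in zip(*ps_by_dataset)]
    return iptw_log_rr(z, y, avg_ps)


def mipar(z, y, x_by_dataset, beta_by_dataset):
    """MIpar: average PS coefficients and imputed covariates, then estimate once."""
    beta_bar = [mean(b) for b in zip(*beta_by_dataset)]
    ps = []
    for i in range(len(z)):
        # Average individual i's design row across the imputed datasets.
        x_bar = [mean(vals) for vals in zip(*(x[i] for x in x_by_dataset))]
        ps.append(expit(sum(b * v for b, v in zip(beta_bar, x_bar))))
    return iptw_log_rr(z, y, ps)
```

Only `mite` produces one estimate per imputed dataset, which is why Rubin's rules apply to it directly; `mips` and `mipar` each yield a single estimate from combined quantities.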
Factors used in the simulation study (2³ factorial design).
| Factor | Values | Description | Comments |
|---|---|---|---|
| | 0.3 or 0.6 | Correlation between the covariates | After dichotomization, |
| RR | 1 or 2 | Relative risk | In model (9), |
| | 0 or −0.4 | Association between the outcome and the probability of missingness | When |
For each scenario, 5000 datasets of sample size 2000 were generated. The prevalence of the treatment was 0.3 and each partially observed covariate had 30% of data missing. Ten imputed datasets were created.
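The three two-level factors in the table above cross to give eight main scenarios. A minimal sketch of enumerating them follows; the dictionary keys are hypothetical labels chosen for this example, while the values are those stated in the table and in the sentence above.

```python
from itertools import product

# Factor levels from the table; the key names are hypothetical labels.
factors = {
    "covariate_correlation": (0.3, 0.6),
    "relative_risk": (1, 2),
    "outcome_missingness_association": (0, -0.4),
}

# Settings shared by every main scenario, as stated in the text.
common = {
    "n_simulated_datasets": 5000,
    "sample_size": 2000,
    "treatment_prevalence": 0.3,
    "missing_rate": 0.3,
    "n_imputations": 10,
}

scenarios = [
    {**common, **dict(zip(factors, levels))}
    for levels in product(*factors.values())
]

for s in scenarios:
    print(s["covariate_correlation"], s["relative_risk"],
          s["outcome_missingness_association"])
```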
Simulation parameters used for the sensitivity analysis.
| Factor | Values | Description |
|---|---|---|
| n | 500 | Sample size |
| | 0.1 or 0.6 | Missingness rate |
| M | 5 or 20 | Number of imputed datasets |
For each scenario of the sensitivity analysis, 5000 datasets were generated. The other parameters were held fixed, including RR = 2.
Figure 2. Absolute value of the bias for four of the eight main scenarios. CC: complete case; MP: missingness pattern; MIte: treatment effects combined after multiple imputation; MIps: propensity scores combined after multiple imputation; MIpar: propensity score parameters combined after multiple imputation. For the three MI approaches ‘+’ means that the outcome is included in the imputation model, ‘−’ means that the outcome is not in the imputation model. RR: relative risk.
Standardized differences (in %) after IPTW for each method for one scenario: RR = 2, outcome a predictor of missingness and included in the imputation model (n = 2000).
| Method | X1 | X2 | X3 |
|---|---|---|---|
| Crude (without IPTW) | 81.3 | 74.7 | 51.7 |
| Full | 4.6 | 4.6 | 2.4 |
| CC (n = 1074) | 7.6 | 7.3 | 3.5 |
| MP | | | |
| Balance on full data | 14.6 | 4.3 | 8.5 |
| Balance on the observed part of the covariate | 6.1 | 4.3 | 2.9 |
| Balance on the missing part of the covariate | 48.6 | NA | 28.3 |
| MIte | | | |
| Balance on full data | 15.0 | 4.5 | 9.1 |
| Balance on each imputed dataset | 4.5 | 4.5 | 2.4 |
| MIps | | | |
| Balance on full data | 15.9 | 5.5 | 10.7 |
| Balance on the average imputed dataset | 15.8 | 5.5 | 10.6 |
| Balance on the observed part of the covariate | 7.6 | 5.5 | 4.9 |
| Balance on the imputed part of the covariate | 58.1 | NA | 36.9 |
| MIpar | | | |
| Balance on full data | 15.1 | 4.8 | 9.6 |
| Balance on the average imputed dataset | 14.7 | 4.8 | 9.7 |
| Balance on the observed part of the covariate | 7.7 | 4.8 | 5.4 |
| Balance on the imputed part of the covariate | 52.5 | NA | 34.3 |
CC: complete case; MP: missingness pattern; MIte: treatment effects combined after multiple imputation; MIps: propensity scores combined after multiple imputation; MIpar: propensity score parameters combined after multiple imputation; RR: relative risk; NA: not applicable because X2 is fully observed.
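A weighted standardized difference of the kind reported above can be computed as in the sketch below. It assumes the common definition (absolute difference in weighted means divided by the square root of the average of the two weighted variances, times 100); the exact variance formula used in the paper is not given here, so this choice and the function names are assumptions.

```python
import math


def weighted_mean(x, w):
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)


def weighted_var(x, w):
    m = weighted_mean(x, w)
    return sum(wi * (xi - m) ** 2 for xi, wi in zip(x, w)) / sum(w)


def standardized_difference(x, z, ps):
    """Standardized difference (in %) of covariate x between arms after IPTW."""
    # IPTW weights: 1/ps for treated (z == 1), 1/(1 - ps) for untreated.
    w = [1 / p if zi == 1 else 1 / (1 - p) for zi, p in zip(z, ps)]
    x1 = [xi for xi, zi in zip(x, z) if zi == 1]
    w1 = [wi for wi, zi in zip(w, z) if zi == 1]
    x0 = [xi for xi, zi in zip(x, z) if zi == 0]
    w0 = [wi for wi, zi in zip(w, z) if zi == 0]
    pooled_sd = math.sqrt((weighted_var(x1, w1) + weighted_var(x0, w0)) / 2)
    return 100 * abs(weighted_mean(x1, w1) - weighted_mean(x0, w0)) / pooled_sd
```

Setting every propensity score to the same constant gives equal weights within each arm, which reproduces the crude (unweighted) standardized difference.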
Figure 3. Coverage rate of the 95% CI for each method compared. Results are pooled for the 8 main scenarios. CC: complete case; MP: missingness pattern; MIte: treatment effects combined after multiple imputation; MIps: propensity scores combined after multiple imputation; MIpar: propensity score parameters combined after multiple imputation. RR: relative risk. For the three MI methods, the outcome is included in the imputation model.
Bias of the log(RR), its estimated variance and coverage rate for the three MI approaches according to the sample size n for one scenario (RR = 2, outcome a predictor of missingness and included in the imputation model). Results based on 5000 simulations.
| Measure | Full, n = 500 | Full, n = 2000 | CC, n = 500 | CC, n = 2000 | MP, n = 500 | MP, n = 2000 | MIte, n = 500 | MIte, n = 2000 | MIps, n = 500 | MIps, n = 2000 | MIpar, n = 500 | MIpar, n = 2000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bias | 0.007 | 0.002 | 0.110 | 0.141 | 0.153 | 0.130 | 0.010 | 0.005 | 0.038 | 0.028 | 0.024 | 0.017 |
| Variance | 0.022 | 0.006 | 0.050 | 0.014 | 0.029 | 0.008 | 0.026 | 0.007 | 0.022 | 0.006 | 0.023 | 0.006 |
| Empirical variance | 0.024 | 0.006 | 0.059 | 0.014 | 0.027 | 0.007 | 0.025 | 0.006 | 0.024 | 0.006 | 0.025 | 0.006 |
| Coverage rate | 0.940 | 0.947 | 0.887 | 0.769 | 0.855 | 0.691 | 0.955 | 0.957 | 0.939 | 0.932 | 0.943 | 0.942 |
MIte: treatment effects combined after multiple imputation; MIps: propensity scores combined after multiple imputation; MIpar: propensity score parameters combined after multiple imputation.
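The four summaries in the table above can be obtained from simulation output roughly as follows. This is a sketch under stated assumptions: the function name `summarize` is hypothetical, and coverage is computed with a normal-based Wald interval, whereas the paper may have used a t-based interval from Rubin's rules.

```python
import math
from statistics import mean, variance


def summarize(estimates, est_variances, truth, z_crit=1.96):
    """Bias, mean estimated variance, empirical variance and CI coverage."""
    covered = sum(
        1 for est, var in zip(estimates, est_variances)
        if abs(est - truth) <= z_crit * math.sqrt(var)
    )
    return {
        "bias": mean(estimates) - truth,
        "variance": mean(est_variances),            # average of estimated variances
        "empirical_variance": variance(estimates),  # spread of estimates across runs
        "coverage": covered / len(estimates),
    }
```

Comparing "variance" with "empirical_variance" shows whether the variance estimator is well calibrated, which is the comparison the table makes for each method.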
Figure 4. Absolute value of the bias according to the missingness rate. CC: complete case; MP: missingness pattern; MIte: treatment effects combined after multiple imputation; MIps: propensity scores combined after multiple imputation; MIpar: propensity score parameters combined after multiple imputation; RR: relative risk. For the three MI methods, the outcome is included in the imputation model.
Figure 5. Distribution of the propensity score estimated on complete cases for statin users and non users (n = 5168).
Description and comparison of statin users and non users (n = 7158).
| Variable | Missing (%), statin users | Statin users (n = 599) | Missing (%), non statin users | Non statin users (n = 6559) | Std. diff. (%), Crude | Std. diff. (%), CC* | Std. diff. (%), MP | Std. diff. (%), MIte | Std. diff. (%), MIps | Std. diff. (%), MIpar |
|---|---|---|---|---|---|---|---|---|---|---|
| Characteristics | | | | | | | | | | |
| Age (mean (sd)) | | 66.9 (10.7) | | 69.8 (10.9) | 27.0 | 3.8 | 2.0 | 1.4 | 1.4 | 1.4 |
| Male | | 322 (53.8) | | 3173 (48.4) | 10.8 | 2.0 | 6.2 | 2.2 | 2.1 | 2.2 |
| BMI (mean (sd)) | 43 (7.2) | 27.6 (5.9) | 1444 (22.0) | 25.8 (5.9) | 31.9 | 7.8 | 9.0 | 9.0 | 11.4 | 11.4 |
| Drinkers | 67 (11.2) | 98 (18.4) | 1334 (20.3) | 814 (15.6) | 7.6 | 2.1 | 0.3 | 2.3 | 2.9 | 3.0 |
| Smokers | 7 (1.2) | 256 (43.2) | 505 (7.7) | 2728 (45.1) | 3.7 | 1.7 | 1.5 | 2.8 | 3.0 | 3.0 |
| Medical history | | | | | | | | | | |
| Diabetes | | 243 (40.6) | | 715 (10.9) | 72.1 | 5.0 | 7.7 | 7.1 | 7.2 | 7.1 |
| Cardiovascular disease | | 141 (23.5) | | 651 (9.9) | 37.1 | 11.4 | 11.4 | 13.6 | 13.6 | 13.6 |
| Circulatory disease | | 426 (71.1) | | 3471 (52.9) | 38.2 | 13.6 | 9.8 | 16.6 | 16.7 | 16.6 |
| Heart failure | | 51 (8.5) | | 426 (6.5) | 7.7 | 11.6 | 6.2 | 12.8 | 12.8 | 12.8 |
| Cancer | | 37 (6.2) | | 607 (9.2) | 11.5 | 2.1 | 0.4 | 0.4 | 0.0 | 0.1 |
| Dementia | | 6 (1.0) | | 190 (2.9) | 13.7 | 7.3 | 13.0 | 11.6 | 11.6 | 11.6 |
| Hypertension | | 336 (56.1) | | 1165 (17.8) | 52.1 | 13.3 | 21.5 | 18.7 | 18.7 | 18.7 |
| Hyperlipidaemia | | 205 (34.2) | | 182 (2.8) | 88.5 | 1.1 | 4.1 | 1.9 | 2.0 | 2.0 |
| Treatments | | | | | | | | | | |
| Antidepressant | | 108 (18.0) | | 995 (15.2) | 7.7 | 1.7 | 5.9 | 0.3 | 0.1 | 0.1 |
| Antipsychotic | | 11 (1.8) | | 340 (5.2) | 18.3 | 0.5 | 11.3 | 5.0 | 5.0 | 5.0 |
| Hormone replacement therapy | | 37 (6.2) | | 277 (4.2) | 8.8 | 0.9 | 0.9 | 1.0 | 1.0 | 1.0 |
| Steroid | | 93 (15.5) | | 1090 (16.6) | 3.0 | 1.0 | 2.2 | 0.4 | 0.3 | 0.3 |
| Antihypertensive | | 272 (45.4) | | 1165 (17.8) | 62.3 | 12.6 | 27.5 | 18.0 | 17.8 | 17.9 |
| Diuretics | | 319 (53.3) | | 2416 (36.8) | 33.4 | 14.3 | 19.8 | 15.8 | 15.9 | 15.9 |
| Betablocker | | 193 (32.2) | | 1061 (16.2) | 38.1 | 11.4 | 7.2 | 13.8 | 13.8 | 13.8 |
| Nitrate | | 74 (12.4) | | 334 (5.1) | 25.9 | 17.3 | 14.8 | 17.5 | 17.6 | 17.6 |
For CC analysis, n = 5168 (503 statin users and 4665 non users).
Estimate of the relative risk of mortality and its 95% confidence interval for statin vs non statin users (motivating example) (n = 7158).
| Method | RR estimate | 95% CI |
|---|---|---|
| Crude | 0.587 | [0.497;0.684] |
| CC | 0.702 | [0.534;0.924] |
| MP | 0.708 | [0.555;0.904] |
| MIte | 0.654 | [0.513;0.835] |
| MIps | 0.653 | [0.512;0.834] |
| MIpar | 0.654 | [0.513;0.834] |
RR: relative risk; CC: complete case; MP: missingness pattern; MIte: treatment effects combined after multiple imputation; MIps: propensity scores combined after multiple imputation; MIpar: propensity score parameters combined after multiple imputation.