Ian R White, Patrick Royston.
Abstract
Multiple imputation is commonly used to impute missing data, and is typically more efficient than complete cases analysis in regression analysis when covariates have missing values. Imputation may be performed using a regression model for the incomplete covariates on other covariates and, importantly, on the outcome. With a survival outcome, it is a common practice to use the event indicator D and the log of the observed event or censoring time T in the imputation model, but the rationale is not clear. We assume that the survival outcome follows a proportional hazards model given covariates X and Z. We show that a suitable model for imputing binary or Normal X is a logistic or linear regression on the event indicator D, the cumulative baseline hazard H0(T), and the other covariates Z. This result is exact in the case of a single binary covariate; in other cases, it is approximately valid for small covariate effects and/or small cumulative incidence. If we do not know H0(T), we approximate it by the Nelson-Aalen estimator of H(T) or estimate it by Cox regression. We compare the methods using simulation studies. We find that using log T biases covariate-outcome associations towards the null, while the new methods have lower bias. Overall, we recommend including the event indicator and the Nelson-Aalen estimator of H(T) in the imputation model. Copyright 2009 John Wiley & Sons, Ltd.
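The exactness claim for a single binary covariate can be checked with a short calculation. The following is a sketch under stated assumptions, not the authors' own derivation: X is binary with p = P(X = 1) (the symbol p is introduced here), the hazard is h0(t)exp(βX), and the censoring distribution does not depend on X. The likelihood contribution of (T, D) given X = x is then proportional to [h0(T)exp(βx)]^D exp{−H0(T)exp(βx)}, and Bayes' theorem gives:

```latex
\begin{align*}
\operatorname{logit} P(X = 1 \mid T, D)
  &= \log \frac{p \,\bigl[h_0(T)\, e^{\beta}\bigr]^{D} \exp\{-H_0(T)\, e^{\beta}\}}
               {(1 - p)\, h_0(T)^{D} \exp\{-H_0(T)\}} \\
  &= \operatorname{logit} p \;+\; \beta D \;-\; \bigl(e^{\beta} - 1\bigr) H_0(T).
\end{align*}
```

The right-hand side is linear in D and H0(T), which is why a logistic regression of X on D and H0(T) is an exact imputation model in this case.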
Year: 2009 PMID: 19452569 PMCID: PMC2998703 DOI: 10.1002/sim.3618
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.373
Summary of data from the MRC RE01 study (n = 347).
| Variable | Code | Mean | SD | Per cent missing |
|---|---|---|---|---|
| Erythrocyte sedimentation rate | esr | 49.6 | 35.1 | 51.3 |
| Haemoglobin | haem | 12.3 | 1.9 | 6.6 |
| White cell count | wcc | 8.7 | 4.1 | 6.6 |
| Days from metastasis to randomization | t_mt | 129 | 421 | 0.3 |
| WHO performance status | who | 0 | 27 | 0 |
| | | 1 | 48 | |
| | | 2 | 24 | |
| Treatment with IFN | trt | control | 50 | 0 |
| | | IFN | 50 | |
Figure 1. Smoothed mean and SD of X|T, D with β = 0.7, h0(t) = 1.
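The quantities plotted in Figure 1 can also be approximated numerically rather than by smoothing simulated data. The sketch below assumes X is standard Normal (an assumption; the caption does not state the distribution of X) and uses h0(t) = 1, so that H0(t) = t. Under a proportional hazards model the density of X given T = t and D = d is then proportional to φ(x) exp(βxd) exp{−t exp(βx)}. The function name `conditional_mean_sd` and the integration grid are illustrative choices.

```python
import math


def conditional_mean_sd(t, d, beta=0.7, lo=-8.0, hi=8.0, steps=4000):
    """Mean and SD of X given T=t, D=d.

    Assumes X ~ N(0, 1) and hazard exp(beta * x), i.e. h0(t) = 1 and H0(t) = t.
    Integrals are computed with the trapezoid rule on [lo, hi].
    """
    h = (hi - lo) / steps
    m0 = m1 = m2 = 0.0  # zeroth, first and second unnormalised moments
    for i in range(steps + 1):
        x = lo + i * h
        # log of: Normal prior * hazard^d * survival probability at t
        log_f = -0.5 * x * x + beta * x * d - t * math.exp(beta * x)
        f = math.exp(log_f)
        w = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        m0 += w * f
        m1 += w * f * x
        m2 += w * f * x * x
    mean = m1 / m0
    var = m2 / m0 - mean * mean
    return mean, math.sqrt(var)


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 2.0):
        for d in (0, 1):
            mean, sd = conditional_mean_sd(t, d)
            print(f"t={t:4.1f} d={d} mean={mean:6.3f} sd={sd:5.3f}")
```

Two limiting cases give a quick sanity check: at t = 0 with d = 0 the conditional distribution is the N(0, 1) prior, and at t = 0 with d = 1 it is N(β, 1).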
Models considered for imputing missing values of incomplete X.
| Abbreviation | Description |
|---|---|
| NO-T | Regression of |
| LOGT | Regression of |
| T | Regression of |
| T2 | Regression of |
| NA | Regression of |
| NA-INT | Regression of |
| COX | Regression of |
| COX* | Same as COX, but with only two iterations used for |
‘Regression’ means logistic regression for binary X and linear regression for Normal X.
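As an illustration of the approach the abstract recommends (imputing X from the event indicator D, the Nelson-Aalen estimate of H(T), and Z), the sketch below computes the Nelson-Aalen estimate at each subject's own time and assembles the predictors for the imputation regression. The function names `nelson_aalen` and `imputation_predictors` are hypothetical, and the regression and imputation steps themselves are left to whatever multiple imputation software is in use.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen estimate of the cumulative hazard at each subject's time.

    times  : observed event or censoring times T
    events : event indicators D (1 = event, 0 = censored)
    """
    n_at_risk = len(times)
    cumulative = 0.0
    estimate_at = {}
    for t in sorted(set(times)):
        n_events = sum(1 for ti, di in zip(times, events) if ti == t and di == 1)
        n_leaving = sum(1 for ti in times if ti == t)
        cumulative += n_events / n_at_risk  # increment d_j / n_j at time t
        estimate_at[t] = cumulative
        n_at_risk -= n_leaving  # events and censorings both leave the risk set
    return [estimate_at[t] for t in times]


def imputation_predictors(times, events, z):
    """Rows (D, H_hat(T), Z) to use as predictors when imputing X."""
    h_hat = nelson_aalen(times, events)
    return list(zip(events, h_hat, z))


if __name__ == "__main__":
    rows = imputation_predictors([1, 2, 3, 4], [1, 0, 1, 1], [0.5, -1.0, 0.0, 2.0])
    for row in rows:
        print(row)
```

The estimate uses all subjects, including those with missing X, because T and D are assumed to be fully observed.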
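Simulation results for parameter β in univariate model with binary X. The three blocks of rows give, in order, bias, per cent error in model standard error and power (per cent).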
| Settings | Analyses | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | β | | PERFECT | CC | NO-T | LOGT | T | T2 | NA | COX | COX* |
| Base case | 0 | 336 | 0.00 | −0.01 | −0.01 | −0.01 | −0.01 | −0.01 | −0.01 | −0.01 | −0.01 |
| 0.5 | 336 | 0.00 | 0.00 | 0.00 | −0.02 | 0.00 | 0.00 | 0.00 | |||
| 1 | 84 | 0.01 | 0.03 | 0.03 | 0.02 | 0.03 | |||||
| Shape 2 | 1 | 84 | 0.01 | 0.03 | 0.00 | 0.03 | 0.03 | ||||
| Admin cens | 1 | 84 | 0.01 | 0.03 | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | |
| Base case | 0 | 336 | 0 | 1 | 4 | −1 | 0 | −1 | 0 | 1 | |
| 0.5 | 336 | −1 | 1 | 5 | 0 | −1 | 0 | 0 | 0 | ||
| 1 | 84 | −2 | −6 | 4 | −2 | −4 | −2 | −4 | −2 | ||
| Shape 2 | 1 | 84 | −2 | −6 | 4 | 0 | −2 | −2 | −4 | −2 | |
| Admin cens | 1 | 84 | −4 | −1 | −1 | −2 | −2 | −2 | 0 | 0 | |
| Base case | 0.5 | 336 | 92 | 65 | 65 | 65 | 64 | 64 | |||
| 1 | 84 | 92 | 64 | 64 | 63 | 64 | 65 | ||||
| Shape 2 | 1 | 84 | 92 | 64 | 63 | 64 | 63 | 64 | 64 | ||
| Admin cens | 1 | 84 | 96 | 73 | 71 | 73 | 73 | 72 | 71 | 71 | |
Base case: π = 0.5, π = 0.5, λ = 0.002, shape κ = 1, random censoring with λ = 0.002.
Admin cens: administrative censoring.
Monte Carlo error: ≤0.016 for bias, ≤4 per cent for per cent error in model standard error, ≤1.6 per cent for power. Bold cells have bias greater than 0.03, more than 10 per cent error in model standard error or power more than 3 per cent worse than method T.
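The three performance measures named in this note can be computed from the per-replicate estimates and model-based standard errors. The sketch below is one plausible set of definitions, not necessarily the ones used for the table: bias is the mean estimate minus the true β, per cent error in model standard error compares the mean model-based standard error with the empirical standard deviation of the estimates, and power is the percentage of replicates with |estimate / standard error| above 1.96. The function name and the 1.96 threshold are assumptions.

```python
import statistics


def summarise_simulation(estimates, model_ses, true_beta, z_crit=1.96):
    """Bias, per cent error in model SE, and power (per cent) over replicates."""
    mean_estimate = statistics.mean(estimates)
    empirical_se = statistics.stdev(estimates)  # SD of estimates across replicates
    mean_model_se = statistics.mean(model_ses)
    n_significant = sum(
        1 for b, se in zip(estimates, model_ses) if abs(b / se) > z_crit
    )
    return {
        "bias": mean_estimate - true_beta,
        "pct_error_model_se": 100.0 * (mean_model_se / empirical_se - 1.0),
        "power_pct": 100.0 * n_significant / len(estimates),
    }


if __name__ == "__main__":
    # Made-up replicate results, for illustration only.
    print(summarise_simulation([0.4, 0.5, 0.6], [0.1, 0.1, 0.1], true_beta=0.5))
```

A positive per cent error under this definition means the model-based standard error overstates the true sampling variability.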
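Simulation results for parameter β in univariate model with Normal X. The three blocks of rows give, in order, bias, per cent error in model standard error and power (per cent).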
| Settings | Analyses | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Model | β | | PERFECT | CC | LOGT | T | T2 | NA | COX | COX* |
| Base case | 0 | 336 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 0.25 | 336 | 0.00 | 0.01 | −0.02 | 0.00 | −0.01 | 0.00 | 0.00 | 0.00 | |
| 0.5 | 84 | 0.02 | −0.03 | −0.02 | −0.03 | |||||
| Shape 2 | 0.5 | 84 | 0.02 | −0.02 | −0.03 | −0.02 | −0.02 | |||
| Admin cens | 0.5 | 84 | 0.01 | 0.01 | −0.03 | −0.03 | −0.03 | −0.03 | −0.02 | |
| Base case | 0 | 336 | −2 | −2 | 3 | −1 | 1 | −1 | −1 | 0 |
| 0.25 | 336 | −3 | −4 | 2 | 0 | 4 | 0 | 6 | 0 | |
| 0.5 | 84 | 0 | −3 | 9 | 8 | |||||
| Shape 2 | 0.5 | 84 | 0 | −3 | 9 | 9 | 8 | |||
| Admin cens | 0.5 | 84 | −1 | −5 | 1 | 1 | 2 | 1 | 2 | 1 |
| Base case | 0.25 | 336 | 89 | 62 | 60 | 60 | 59 | 60 | ||
| 0.5 | 84 | 88 | 58 | 53 | 53 | 55 | ||||
| Shape 2 | 0.5 | 84 | 88 | 58 | 52 | 53 | 53 | 55 | ||
| Admin cens | 0.5 | 84 | 89 | 58 | 53 | 53 | 53 | 53 | 55 | 54 |
Base case: π = 0.5, π = 0.5, λ = 0.002, shape κ = 1, random censoring with λ = 0.002.
Admin cens: administrative censoring.
Monte Carlo error: ≤0.01 for bias, ≤5 per cent for per cent error in model standard error, ≤1.6 per cent for power. Bold cells have bias greater than 0.03, more than 10 per cent error in model standard error or power more than 3 per cent worse than method T.
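The base-case settings quoted in the footnote can be turned into a small data generator. The sketch below is an assumption-laden illustration, not the authors' simulation code: it takes the baseline cumulative hazard to be H0(t) = λt^κ (a Weibull parameterisation assumed here), draws random censoring times from an exponential distribution with rate 0.002, and leaves the generation of X and of the missingness to the caller. The function name `simulate_survival` and its arguments are hypothetical.

```python
import math
import random


def simulate_survival(x_values, beta, lam=0.002, kappa=1.0, cens_rate=0.002, seed=0):
    """Simulate (T, D) pairs under a proportional hazards model.

    Assumed model: H0(t) = lam * t**kappa, hazard ratio exp(beta * x),
    independent exponential censoring with rate cens_rate.
    """
    rng = random.Random(seed)
    data = []
    for x in x_values:
        u = 1.0 - rng.random()  # uniform on (0, 1], keeps log(u) finite
        # Invert S(t | x) = exp(-lam * t**kappa * exp(beta * x)) at u
        event_time = (-math.log(u) / (lam * math.exp(beta * x))) ** (1.0 / kappa)
        cens_time = rng.expovariate(cens_rate)
        t = min(event_time, cens_time)
        d = 1 if event_time <= cens_time else 0
        data.append((t, d))
    return data


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.gauss(0.0, 1.0) for _ in range(336)]  # Normal X, as in this table
    sample = simulate_survival(xs, beta=0.25)
    print(sum(d for _, d in sample), "events out of", len(sample))
```

With β = 0 and κ = 1 the event and censoring times have the same exponential rate, so about half of the subjects are expected to have an observed event.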
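Simulation results for parameter β_X in bivariate model. The three blocks of rows give, in order, bias, per cent error in model standard error and power (per cent).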
| Settings | Analyses | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | β_X | β_Z | ρ | | PERFECT | CC | LOGT | T | T2 | NA | NA-INT | COX* |
| MCAR | 0 | 0 | 0 | 336 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 0 | 0 | 0.5 | 336 | 0.00 | −0.01 | 0.00 | −0.01 | 0.00 | −0.01 | 0.00 | 0.00 | |
| 0 | 0.5 | 0 | 336 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 0 | 0.5 | 0.5 | 336 | −0.01 | −0.01 | 0.00 | 0.00 | 0.00 | 0.00 | −0.01 | 0.00 | |
| 0.5 | 0 | 0 | 84 | 0.01 | −0.03 | |||||||
| 0.5 | 0 | 0.5 | 84 | 0.01 | ||||||||
| 0.5 | 0.5 | 0 | 84 | 0.00 | 0.03 | |||||||
| 0.5 | 0.5 | 0.5 | 84 | 0.01 | 0.03 | |||||||
| MAR | 0.5 | 0.5 | 0 | 84 | 0.00 | |||||||
| 0.5 | 0.5 | 0.5 | 84 | 0.01 | ||||||||
| MCAR | 0 | 0 | 0 | 336 | 0 | 1 | 5 | 1 | 4 | 1 | 3 | 1 |
| 0 | 0 | 0.5 | 336 | 0 | 4 | 8 | 4 | 7 | 4 | 6 | 5 | |
| 0 | 0.5 | 0 | 336 | 1 | 61 | 5 | 2 | 6 | 1 | 2 | 3 | |
| 0 | 0.5 | 0.5 | 336 | 2 | 4 | 8 | 7 | 6 | 7 | 9 | ||
| 0.5 | 0 | 0 | 84 | 0 | −6 | 8 | 7 | 5 | 6 | 9 | ||
| 0.5 | 0 | 0.5 | 84 | 0 | −5 | 9 | 7 | 6 | 5 | 8 | ||
| 0.5 | 0.5 | 0 | 84 | −2 | −5 | 9 | 9 | 6 | 6 | |||
| 0.5 | 0.5 | 0.5 | 84 | −2 | −2 | 10 | ||||||
| MAR | 0.5 | 0.5 | 0 | 84 | −2 | −9 | 7 | |||||
| 0.5 | 0.5 | 0.5 | 84 | −2 | −3 | |||||||
| MCAR | 0.5 | 0 | 0 | 84 | 85 | 54 | 50 | 50 | 48 | 49 | ||
| 0.5 | 0 | 0.5 | 84 | 74 | 42 | 36 | 39 | 36 | 39 | 38 | 40 | |
| 0.5 | 0.5 | 0 | 84 | 85 | 53 | 44 | 46 | 48 | 47 | 47 | ||
| 0.5 | 0.5 | 0.5 | 84 | 73 | 43 | 32 | 34 | 35 | 37 | 34 | ||
| MAR | 0.5 | 0.5 | 0 | 84 | 85 | 49 | 38 | 39 | 40 | 40 | 41 | |
| 0.5 | 0.5 | 0.5 | 84 | 73 | 38 | 27 | 27 | 28 | 30 | 27 | ||
Monte Carlo error is ≤0.01 for bias, ≤3 per cent for per cent error in model standard error and ≤1.6 per cent for power.
Bold cells have bias greater than 0.03, more than 10 per cent error in model standard error or power more than 3 per cent worse than method T.
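Simulation results for parameter β_Z in bivariate model. The three blocks of rows give, in order, bias, per cent error in model standard error and power (per cent).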
| Settings | Analyses | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | β_X | β_Z | ρ | | PERFECT | CC | LOGT | T | T2 | NA | NA-INT | COX* |
| MCAR | 0 | 0 | 0 | 336 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 0 | 0 | 0.5 | 336 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 0 | 0.5 | 0 | 336 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | |
| 0 | 0.5 | 0.5 | 336 | 0.01 | 0.02 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | |
| 0.5 | 0 | 0 | 84 | 0.00 | 0.00 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | |
| 0.5 | 0 | 0.5 | 84 | 0.00 | 0.01 | 0.03 | 0.02 | 0.02 | 0.02 | |||
| 0.5 | 0.5 | 0 | 84 | 0.02 | 0.00 | 0.00 | −0.01 | 0.00 | 0.02 | 0.00 | ||
| 0.5 | 0.5 | 0.5 | 84 | 0.02 | 0.03 | 0.03 | ||||||
| MAR | 0.5 | 0.5 | 0 | 84 | 0.02 | 0.00 | −0.01 | −0.01 | −0.01 | 0.01 | −0.01 | |
| 0.5 | 0.5 | 0.5 | 84 | 0.02 | ||||||||
| MCAR | 0 | 0 | 0 | 336 | −2 | 1 | −1 | −1 | −1 | −1 | −2 | −1 |
| 0 | 0 | 0.5 | 336 | −2 | 1 | 2 | 1 | 2 | 1 | 1 | 1 | |
| 0 | 0.5 | 0 | 336 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 2 | |
| 0 | 0.5 | 0.5 | 336 | 0 | 0 | 4 | 3 | 4 | 3 | 3 | 3 | |
| 0.5 | 0 | 0 | 84 | 4 | −6 | 9 | 9 | 9 | 6 | 9 | ||
| 0.5 | 0 | 0.5 | 84 | 4 | −6 | 8 | 6 | 8 | 6 | 5 | 6 | |
| 0.5 | 0.5 | 0 | 84 | 2 | −4 | 6 | 7 | 8 | 6 | 5 | 8 | |
| 0.5 | 0.5 | 0.5 | 84 | 4 | −3 | 5 | 5 | 6 | 5 | 5 | 5 | |
| MAR | 0.5 | 0.5 | 0 | 84 | 2 | −7 | 9 | 9 | 10 | 9 | ||
| 0.5 | 0.5 | 0.5 | 84 | 4 | −7 | 9 | 8 | 9 | 8 | 7 | 7 | |
| MCAR | 0 | 0.5 | 0 | 336 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| 0 | 0.5 | 0.5 | 336 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | ||
| 0.5 | 0.5 | 0 | 84 | 86 | 77 | 76 | 76 | 76 | 78 | 77 | ||
| 0.5 | 0.5 | 0.5 | 84 | 76 | 68 | 66 | 69 | 66 | 68 | 67 | ||
| MAR | 0.5 | 0.5 | 0 | 84 | 86 | 72 | 72 | 72 | 73 | 73 | 73 | |
| 0.5 | 0.5 | 0.5 | 84 | 76 | 64 | 65 | 66 | 64 | 63 | 65 | ||
Monte Carlo error is ≤0.01 for bias, ≤3 per cent for per cent error in model standard error and ≤1.6 per cent for power. Bold cells have bias greater than 0.03, more than 10 per cent error in model standard error or power more than 3 per cent worse than method T.
Renal cancer data: results of proportional hazards models by complete cases and eight different imputation methods. Tabulated values are estimates, with standard errors in parentheses.
| CC | Imputation methods ( | |||||||
|---|---|---|---|---|---|---|---|---|
| Variable | ( | NO-T | LOGT | T | T2 | NA | COX | COX* |
| esr/35.1 | 0.24 | 0.26 | 0.25 | 0.25 | ||||
| (0.11) | (0.10) | (0.12) | (0.12) | (0.12) | (0.12) | (0.12) | (0.11) | |
| who2 | −0.84 | −0.87 | −0.86 | −0.84 | −0.87 | −0.87 | −0.87 | |
| (0.17) | (0.17) | (0.17) | (0.17) | (0.18) | (0.17) | (0.17) | ||
| who3 | −0.62 | −0.62 | −0.62 | −0.61 | −0.61 | −0.61 | −0.61 | −0.61 |
| (0.14) | (0.15) | (0.15) | (0.15) | (0.15) | (0.15) | (0.15) | ||
| haem/2.00 | −0.27 | −0.26 | −0.26 | −0.26 | ||||
| (0.12) | (0.09) | (0.10) | (0.10) | (0.10) | (0.10) | (0.10) | (0.10) | |
| wcc^3/13.5 | 0.33 | 0.34 | 0.34 | 0.33 | 0.34 | 0.34 | 0.34 | |
| (0.08) | (0.08) | (0.08) | (0.08) | (0.08) | (0.08) | (0.08) | ||
| log(t_mt+1)/1.42 | −0.24 | −0.23 | −0.24 | −0.24 | −0.23 | −0.24 | −0.24 | −0.24 |
| (0.06) | (0.06) | (0.06) | (0.06) | (0.06) | (0.06) | (0.06) | ||
| trt | −0.37 | −0.37 | −0.37 | −0.37 | −0.37 | −0.37 | −0.37 | |
| (0.12) | (0.12) | (0.12) | (0.12) | (0.12) | (0.12) | (0.12) | ||
Bold cells indicate estimates that differ from the NA estimate by more than 20 per cent of the NA standard error, or standard errors that differ from the NA standard error by more than 20 per cent of the NA standard error. Monte Carlo error in parameter estimates is no more than 0.003 in all cases.
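The flagging rule in this note is simple arithmetic and can be written as a short check. The sketch below applies it to a single cell; the function name `differs_from_na` and the example numbers are made up for illustration and are not taken from the table.

```python
def differs_from_na(estimate, se, na_estimate, na_se, fraction=0.2):
    """Apply the note's rule to one cell.

    Returns (estimate_flag, se_flag): each is True when the value differs
    from the NA value by more than `fraction` of the NA standard error.
    """
    threshold = fraction * na_se
    estimate_flag = abs(estimate - na_estimate) > threshold
    se_flag = abs(se - na_se) > threshold
    return estimate_flag, se_flag


if __name__ == "__main__":
    # Hypothetical values: estimate 0.30 (SE 0.09) against NA estimate 0.25 (SE 0.10).
    print(differs_from_na(0.30, 0.09, na_estimate=0.25, na_se=0.10))
```

Both comparisons are scaled by the NA standard error, so the same threshold applies to estimates and to standard errors within a row.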