Jonathan W Bartlett, Shaun R Seaman, Ian R White, James R Carpenter.
Abstract
Missing covariate data commonly occur in epidemiological and clinical research, and are often dealt with using multiple imputation. Imputation of partially observed covariates is complicated if the substantive model is non-linear (e.g. Cox proportional hazards model), or contains non-linear (e.g. squared) or interaction terms, and standard software implementations of multiple imputation may impute covariates from models that are incompatible with such substantive models. We show how imputation by fully conditional specification, a popular approach for performing multiple imputation, can be modified so that covariates are imputed from models which are compatible with the substantive model. We investigate through simulation the performance of this proposal, and compare it with existing approaches. Simulation results suggest our proposal gives consistent estimates for a range of common substantive models, including models which contain non-linear covariate effects or interactions, provided data are missing at random and the assumed imputation models are correctly specified and mutually compatible. Stata software implementing the approach is freely available.
Keywords: compatibility; fully conditional specification; interactions; multiple imputation; non-linearities; rejection sampling
Year: 2014 PMID: 24525487 PMCID: PMC4513015 DOI: 10.1177/0962280214521348
Source DB: PubMed Journal: Stat Methods Med Res ISSN: 0962-2802 Impact factor: 3.021
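The abstract describes substantive model compatible FCS (SMC-FCS) only in outline. The Python sketch below illustrates the idea for a single partially observed covariate X in a linear substantive model with a quadratic term, Y = β0 + β1X + β2X² + ε. It is not the authors' Stata implementation. It assumes that X is the only covariate, that its imputation (covariate) model is X ~ N(μ, τ²), and that both models' parameters are drawn from the usual noninformative-prior posterior of a normal linear model. Missing X values are imputed by rejection sampling: a candidate x* is drawn from the covariate model and accepted with probability exp{−(y − β0 − β1x* − β2x*²)²/(2σ²)}, the normal outcome density divided by its maximum, so accepted draws follow f(X | Y) ∝ f(Y | X) f(X). The function names and the `n_iter` and `max_tries` settings are hypothetical.

```python
import numpy as np


def draw_ols_posterior(design, y, rng):
    """Draw (beta, sigma2) from the posterior of a normal linear model
    under the standard noninformative prior p(beta, sigma2) ∝ 1/sigma2."""
    n, p = design.shape
    xtx_inv = np.linalg.inv(design.T @ design)
    beta_hat = xtx_inv @ design.T @ y
    rss = np.sum((y - design @ beta_hat) ** 2)
    sigma2 = rss / rng.chisquare(n - p)
    beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
    return beta, sigma2


def outcome_design(x):
    """Design matrix of the substantive model Y = b0 + b1*X + b2*X^2 + error."""
    return np.column_stack([np.ones_like(x), x, x ** 2])


def smcfcs_quadratic(y, x, n_iter=20, max_tries=1000, rng=None):
    """Return one imputed copy of x; np.nan marks missing values."""
    rng = np.random.default_rng() if rng is None else rng
    miss = np.isnan(x)
    x = x.copy()
    # Crude starting values: resample observed values of X.
    x[miss] = rng.choice(x[~miss], size=miss.sum())
    for _ in range(n_iter):
        # 1. Substantive model parameters, given the current completed data.
        beta, sigma2 = draw_ols_posterior(outcome_design(x), y, rng)
        # 2. Covariate model parameters: X ~ N(mu, tau2) (intercept-only model).
        mu, tau2 = draw_ols_posterior(np.ones((x.size, 1)), x, rng)
        # 3. Rejection sampling: propose from N(mu, tau2), accept with
        #    probability f(y | x*) / max_x f(y | x) = exp(-resid^2 / (2 sigma2)).
        pending = np.flatnonzero(miss)
        for _ in range(max_tries):
            if pending.size == 0:
                break
            proposal = rng.normal(mu[0], np.sqrt(tau2), size=pending.size)
            resid = y[pending] - outcome_design(proposal) @ beta
            accept = rng.uniform(size=pending.size) <= np.exp(-resid ** 2 / (2 * sigma2))
            x[pending[accept]] = proposal[accept]
            pending = pending[~accept]
        # Any value still pending keeps its previous draw (a choice of this sketch).
    return x


# Example: simulate, delete about 40% of X completely at random, impute M = 5 times.
rng = np.random.default_rng(2014)
x_full = rng.normal(size=1000)
y = x_full + x_full ** 2 + rng.normal(size=1000)
x_obs = np.where(rng.uniform(size=1000) < 0.4, np.nan, x_full)
imputations = [smcfcs_quadratic(y, x_obs, rng=rng) for _ in range(5)]
# Point estimate of beta2: average of the per-imputation OLS estimates (Rubin's rules).
b2 = np.mean([np.linalg.lstsq(outcome_design(xi), y, rcond=None)[0][2] for xi in imputations])
```

Fitting the quadratic model to each completed dataset and pooling with Rubin's rules gives the multiple imputation estimate; the example only pools the point estimate. A realistic covariate model would also condition on the fully observed covariates, which this sketch omits for brevity.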
Baseline characteristics of n = 382 ADNI subjects with MCI at baseline.
| Variable | Mean (SD) or no. (% of observed) | No. of missing values (%) |
|---|---|---|
| A | 16.4 (5.5) | 190 (49.7%) |
| log(tau) (log pg/mL) | 4.50 (0.49) | 193 (50.5%) |
| log(p-tau) (log pg/mL) | 3.44 (0.50) | 189 (49.5%) |
| Mother had AD | 77 (25.3%) | 77 (20.2%) |
| Father had AD | 26 (9.0%) | 93 (24.3%) |
| Intracranial volume (cm³) | 1474 (150) | 43 (11.3%) |
| Hippocampal volume (cm³) | 6.47 (1.04) | 43 (11.3%) |
| APOE4 positive | 207 (54.2%) | 0 (0%) |
ADNI: Alzheimer’s Disease Neuroimaging Initiative; MCI: mild cognitive impairment; AD: Alzheimer’s disease.
Simulation results – linear regression with quadratic covariate effects. Empirical mean (SD) of estimates of the quadratic coefficient β2 = 1 from 1000 simulations, using linear passive imputation, JAV imputation, the polynomial combination method and SMC-FCS. Empirical coverage of nominal 95% confidence intervals is also shown (Cov). Monte-Carlo errors for means and SDs are less than 0.003, except in the MAR scenario with log-normal X, where they are less than 0.02. Monte-Carlo errors for confidence interval coverage are less than 1.6%.
| Missingness | Distribution of X | Linear passive: Mean (SD) | Linear passive: Cov | JAV: Mean (SD) | JAV: Cov | Polynomial combination: Mean (SD) | Polynomial combination: Cov | SMC-FCS: Mean (SD) | SMC-FCS: Cov |
|---|---|---|---|---|---|---|---|---|---|
| MCAR | Normal | 0.696 (0.041) | 0.0 | 1.001 (0.041) | 91.9 | 1.005 (0.040) | 91.5 | 0.998 (0.038) | 93.9 |
| MCAR | Log-normal | 0.789 (0.084) | 16.7 | 1.012 (0.100) | 82.6 | 1.025 (0.097) | 83.1 | 1.000 (0.059) | 96.2 |
| MCAR | | 0.493 (0.036) | 0.0 | 0.997 (0.036) | 94.7 | 1.003 (0.034) | 95.8 | 0.942 (0.036) | 65.9 |
| MAR | Normal | 0.618 (0.045) | 0.0 | 1.192 (0.073) | 12.3 | 1.045 (0.069) | 75.9 | 0.995 (0.049) | 94.4 |
| MAR | Log-normal | 0.790 (0.265) | 58.6 | 1.488 (0.324) | 25.7 | 1.288 (0.179) | 27.6 | 1.002 (0.158) | 91.7 |
| MAR | | 0.450 (0.033) | 0.0 | 1.085 (0.047) | 48.1 | 1.009 (0.048) | 87.8 | 0.840 (0.038) | 3.4 |
JAV: just another variable; MCAR: missing completely at random; MAR: missing at random.
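The summaries reported in the table above (empirical mean and SD of the estimates across simulation replicates, and the percentage of replicates whose nominal 95% interval contains the true value) can be computed as in the following Python sketch. The function name and the default critical value of 1.96 are assumptions: intervals from Rubin's rules would normally use a t quantile with the Rubin degrees of freedom, which can be passed through `crit`.

```python
import numpy as np


def summarise_replicates(estimates, std_errors, true_value, crit=1.96):
    """Empirical mean, empirical SD and CI coverage (%) over simulation replicates."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    # Nominal 95% interval for each replicate: estimate +/- crit * SE.
    lower = est - crit * se
    upper = est + crit * se
    covered = (lower <= true_value) & (true_value <= upper)
    return {
        "mean": float(est.mean()),
        "sd": float(est.std(ddof=1)),  # empirical SD across replicates
        "coverage_pct": 100.0 * float(covered.mean()),
    }


# Four made-up replicates of an estimate of beta2 = 1, each with SE 0.1.
result = summarise_replicates([0.9, 1.1, 1.0, 1.5], [0.1] * 4, true_value=1.0)
```

Here three of the four intervals contain 1, so `coverage_pct` is 75.0; with 1000 replicates per scenario, as in the simulations above, the same calculation yields the Mean (SD) and Cov columns.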
Simulation results – linear regression with interaction. Empirical mean (SD) of estimates of β1 = 1 and β3 = 1 from 1000 simulations, using standard FCS with passive imputation (Passive FCS), JAV imputation and SMC-FCS. Empirical coverage of nominal 95% confidence intervals is also shown (Cov). Monte-Carlo errors for means and SDs are all less than 0.04 for β1 and less than 0.02 for β3. Monte-Carlo errors for confidence interval coverage are less than 1.6%.
| Parameter | Passive FCS: Mean (SD) | Passive FCS: Cov | JAV: Mean (SD) | JAV: Cov | SMC-FCS: Mean (SD) | SMC-FCS: Cov |
|---|---|---|---|---|---|---|
| MCAR | | | | | | |
| β1 | 1.50 (0.39) | 82.7 | 1.02 (0.53) | 94.5 | 0.99 (0.45) | 95.4 |
| β3 | 0.74 (0.14) | 75.1 | 1.00 (0.23) | 95.2 | 1.01 (0.19) | 95.4 |
| β1 | 1.70 (0.57) | 72.4 | 1.03 (0.61) | 95.1 | 0.81 (0.56) | 92.7 |
| β3 | 0.69 (0.23) | 64.5 | 1.00 (0.23) | 94.8 | 1.08 (0.22) | 92.1 |
| β1 | 2.20 (0.65) | 36.6 | 1.03 (0.50) | 95.3 | 1.09 (0.52) | 93.0 |
| β3 | 0.70 (0.27) | 57.6 | 1.00 (0.13) | 94.6 | 1.09 (0.14) | 87.9 |
| β1 | 1.11 (0.21) | 91.8 | 1.00 (0.23) | 95.3 | 0.99 (0.22) | 95.0 |
| β3 | 0.80 (0.15) | 80.7 | 0.99 (0.20) | 95.3 | 0.99 (0.17) | 94.6 |
| β1 | 1.74 (0.63) | 78.3 | 1.01 (0.74) | 95.1 | 1.28 (0.65) | 92.1 |
| β3 | 0.71 (0.24) | 78.4 | 0.98 (0.28) | 94.5 | 0.89 (0.24) | 92.0 |
| MAR | | | | | | |
| β1 | 1.63 (0.37) | 78.6 | 1.31 (0.60) | 91.1 | 1.03 (0.46) | 95.5 |
| β3 | 0.64 (0.12) | 56.2 | 0.96 (0.30) | 94.0 | 0.97 (0.19) | 95.3 |
| β1 | 2.53 (0.95) | 45.5 | 1.57 (1.17) | 91.7 | 1.02 (0.93) | 94.9 |
| β3 | 0.16 (0.35) | 30.0 | 1.06 (0.57) | 95.0 | 1.00 (0.45) | 92.2 |
| β1 | 2.39 (1.33) | 41.9 | 1.68 (0.63) | 81.1 | 1.29 (0.54) | 94.8 |
| β3 | 0.13 (0.28) | 24.3 | 1.17 (0.20) | 84.3 | 1.10 (0.20) | 90.7 |
| β1 | 1.11 (0.21) | 92.2 | 1.14 (0.22) | 88.4 | 1.00 (0.22) | 95.0 |
| β3 | 0.78 (0.15) | 81.6 | 0.97 (0.22) | 95.3 | 0.98 (0.17) | 95.6 |
| β1 | 1.84 (0.74) | 85.1 | 1.11 (0.99) | 95.1 | 1.24 (0.77) | 93.1 |
| β3 | 0.68 (0.28) | 84.6 | 0.96 (0.38) | 94.8 | 0.91 (0.28) | 92.9 |

Results based on 999 simulations (\*) or 968 simulations (\*\*).
FCS: fully conditional specification; JAV: just another variable; SMC-FCS: substantive model compatible-fully conditional specification; MCAR: missing completely at random; MAR: missing at random.
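The caption's bound on the Monte-Carlo error for coverage follows from binomial variability: with 1000 replicates, the standard error of an estimated proportion p is √(p(1 − p)/1000), which is largest at p = 0.5 and equals about 1.58 percentage points. The Python sketch below computes this bound together with the usual Monte-Carlo standard errors for the empirical mean (SD/√n_sim) and, assuming the estimates are roughly normally distributed, for the empirical SD (SD/√(2(n_sim − 1)))). The function names are illustrative, and the caption does not state which formulas the authors used.

```python
import math


def mcse_mean(emp_sd, n_sim):
    """Monte Carlo SE of the empirical mean of the estimates."""
    return emp_sd / math.sqrt(n_sim)


def mcse_sd(emp_sd, n_sim):
    """Approximate Monte Carlo SE of the empirical SD (assumes near-normal estimates)."""
    return emp_sd / math.sqrt(2 * (n_sim - 1))


def mcse_coverage(coverage, n_sim):
    """Monte Carlo SE of a coverage proportion, from the binomial variance."""
    return math.sqrt(coverage * (1 - coverage) / n_sim)


# Worst case for coverage is a proportion of 0.5.
print(f"{100 * mcse_coverage(0.5, 1000):.2f}%")  # 1.58%
```

Because the worst case is at p = 0.5, the bound holds for every cell regardless of the observed coverage, which is why a single figure suffices in the caption.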
Cox proportional hazards outcome model simulation results. Empirical mean (SD) of estimates of β1 = 1 and β2 = 1 from 1000 simulations, using complete case analysis, MI of X1 and X2 by FCS with the event indicator and the Nelson–Aalen estimate of the marginal cumulative hazard function as covariates (FCS), and SMC-FCS. Empirical coverage of nominal 95% confidence intervals is also shown (Cov). Monte-Carlo errors in means and SDs are no more than 0.02 for n = 100 and 0.005 for n = 1000.
| n | Parameter | Complete case: Mean (SD) | Complete case: Cov | FCS: Mean (SD) | FCS: Cov | SMC-FCS: Mean (SD) | SMC-FCS: Cov |
|---|---|---|---|---|---|---|---|
| 100 | β1 = 1 | 1.04 (0.47) | 95.6 | 0.94 (0.36) | 96.5 | 1.02 (0.41) | 94.7 |
| 100 | β2 = 1 | 1.05 (0.26) | 95.6 | 0.89 (0.17) | 94.0 | 1.05 (0.21) | 94.8 |
| 1000 | β1 = 1 | 1.000 (0.129) | 95.2 | 0.902 (0.107) | 89.1 | 1.002 (0.114) | 95.0 |
| 1000 | β2 = 1 | 1.007 (0.070) | 94.8 | 0.861 (0.049) | 45.7 | 1.006 (0.058) | 95.1 |
FCS: fully conditional specification; SMC-FCS: substantive model compatible-fully conditional specification.
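The FCS comparator in the Cox simulation imputes X1 and X2 from models that include the event indicator and the Nelson–Aalen estimate of the marginal cumulative hazard, evaluated at each subject's observed time. A minimal Python sketch of that extra covariate follows, assuming right-censored data stored in `time` and `event` arrays (hypothetical names). At each distinct event time t_j the estimate adds d_j/r_j, where d_j is the number of events at t_j and r_j the number of subjects with time ≥ t_j, so subjects censored at an event time remain in that risk set.

```python
import numpy as np


def nelson_aalen_at_times(time, event):
    """Nelson–Aalen estimate of the marginal cumulative hazard H(t_i) at each subject's time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])  # sorted distinct event times
    d = np.array([np.sum((time == t) & (event == 1)) for t in event_times])  # events at t_j
    r = np.array([np.sum(time >= t) for t in event_times])  # number at risk at t_j
    cum_haz = np.concatenate([[0.0], np.cumsum(d / r)])
    # H(t_i) sums the increments over event times <= t_i.
    return cum_haz[np.searchsorted(event_times, time, side="right")]


time = np.array([1, 2, 2, 3, 4])
event = np.array([1, 1, 0, 1, 0])
H = nelson_aalen_at_times(time, event)
# Predictors added to the FCS imputation models for X1 and X2.
extra_covariates = np.column_stack([event, H])
```

In this toy example the increments are 1/5, 1/4 and 1/2 at times 1, 2 and 3, so H is 0.2, 0.45, 0.45, 0.95 and 0.95 for the five subjects.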
Estimates of log hazard ratios (standard errors) from the Cox proportional hazards model relating the hazard of conversion to AD to baseline risk factors, based on complete case analysis, FCS imputation and SMC-FCS.
| Variable | Complete case | FCS | SMC-FCS |
|---|---|---|---|
| A | 0.31 (0.19) | 0.08 (0.10) | 0.28 (0.16) |
| A² | −0.011 (0.005) | −0.004 (0.003) | −0.010 (0.005) |
| log(tau) (log pg/mL) | −0.60 (0.47) | −0.23 (0.37) | −0.19 (0.36) |
| log(p-tau) (log pg/mL) | 1.29 (0.51) | 0.52 (0.38) | 0.44 (0.40) |
| Mother had AD | −0.61 (0.32) | −0.15 (0.22) | −0.14 (0.21) |
| Father had AD | −1.07 (0.68) | −0.22 (0.35) | −0.28 (0.34) |
| Intracranial volume (cm³) | 0.0005 (0.0010) | 0.0010 (0.0007) | 0.0011 (0.0007) |
| Hippocampal volume (cm³) | −0.64 (0.17) | −0.47 (0.10) | −0.51 (0.10) |
| APOE4 positive | −0.06 (0.30) | 0.31 (0.22) | 0.41 (0.20) |
FCS: fully conditional specification; SMC-FCS: substantive model compatible-fully conditional specification.
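For a Cox substantive model such as the one above, rejection sampling needs an acceptance probability bounded by 1. One way to obtain it is sketched below in Python. It assumes that the cumulative baseline hazard H0(t) at the subject's observed time is available (for example, a Breslow estimate from the current Cox fit). Write η for the linear predictor at a candidate covariate value and u = H0(t)e^η. As a function of the covariate, the likelihood contribution is exp(−u) for a censored subject, which is at most 1, and proportional to u·exp(−u) for a subject with an event, which is at most e^(−1). Dividing each by its maximum gives the acceptance probabilities used here. All function and argument names are hypothetical, and this is not the authors' Stata code.

```python
import numpy as np


def cox_acceptance_prob(eta, cum_base_haz, event):
    """Acceptance probability for a candidate covariate value under a Cox model.

    eta: linear predictor at the candidate value; cum_base_haz: H0(t) at the
    subject's observed time; event: 1 if the event was observed, 0 if censored.
    """
    u = cum_base_haz * np.exp(eta)  # subject's cumulative hazard H0(t) * exp(eta)
    # Event: u * exp(-u) peaks at 1/e, so rescale by e. Censored: exp(-u) <= 1.
    return np.where(np.asarray(event) == 1, u * np.exp(1.0 - u), np.exp(-u))


def draw_missing_covariate(propose, eta_of, cum_base_haz, event, rng, max_tries=1000):
    """Rejection sampling: propose from the covariate model, accept as above."""
    for _ in range(max_tries):
        x_star = propose(rng)
        if rng.uniform() <= cox_acceptance_prob(eta_of(x_star), cum_base_haz, event):
            return x_star
    return None  # no acceptance within max_tries; the caller decides what to do


# Hypothetical example: X ~ N(0, 1) covariate model, Cox model with eta = 0.5 * x.
rng = np.random.default_rng(1)
x = draw_missing_covariate(lambda r: r.normal(), lambda v: 0.5 * v,
                           cum_base_haz=0.8, event=1, rng=rng)
```

Dividing by the maximum, rather than by some looser bound, keeps the acceptance rate as high as the likelihood allows; the linear predictor function `eta_of` can include non-linear terms such as a squared covariate without changing the acceptance rule.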