Shaun R Seaman, Jonathan W Bartlett, Ian R White.
Abstract
BACKGROUND: Multiple imputation is often used for missing data. When a model contains as covariates more than one function of a variable, it is not obvious how best to impute missing values in these covariates. Consider a regression with outcome Y and covariates X and X². In 'passive imputation' a value X* is imputed for X and then X² is imputed as (X*)². A recent proposal is to treat X² as 'just another variable' (JAV) and impute X and X² under multivariate normality.
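As a rough illustration of the distinction drawn in the abstract, the sketch below contrasts the two strategies for a single imputation draw. It is not the authors' code: the function names `impute_passive` and `impute_jav` are hypothetical, the imputation models condition only on Y, and the parameters are fixed at their complete-case estimates rather than drawn from a posterior, so this is not proper multiple imputation.

```python
import numpy as np


def impute_passive(x, y, rng):
    """Impute missing X from a linear regression on Y, then set X^2 = (X*)^2."""
    miss = np.isnan(x)
    obs = ~miss
    # Fit X ~ 1 + Y on the complete cases (simplifying assumption).
    design = np.column_stack([np.ones(obs.sum()), y[obs]])
    coef, *_ = np.linalg.lstsq(design, x[obs], rcond=None)
    sigma = (x[obs] - design @ coef).std(ddof=2)
    x_imp = x.copy()
    x_imp[miss] = coef[0] + coef[1] * y[miss] + rng.normal(0.0, sigma, miss.sum())
    # Passive step: the square is derived from the imputed X.
    return x_imp, x_imp ** 2


def impute_jav(x, y, rng):
    """Impute (X, X^2) jointly given Y, treating X^2 as just another variable."""
    miss = np.isnan(x)
    obs = ~miss
    # Under joint normality of (Y, X, X^2), the conditional mean of (X, X^2)
    # given Y is linear in Y, so a two-outcome regression on Y suffices.
    targets = np.column_stack([x[obs], x[obs] ** 2])
    design = np.column_stack([np.ones(obs.sum()), y[obs]])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    resid = targets - design @ coef
    cov = resid.T @ resid / (obs.sum() - 2)
    design_miss = np.column_stack([np.ones(miss.sum()), y[miss]])
    noise = rng.multivariate_normal(np.zeros(2), cov, size=miss.sum())
    draws = design_miss @ coef + noise
    x_imp = x.copy()
    x2_imp = x ** 2  # observed squares; NaN where X is missing
    x_imp[miss] = draws[:, 0]
    x2_imp[miss] = draws[:, 1]  # not constrained to equal x_imp ** 2
    return x_imp, x2_imp
```

Both functions take an array `x` with `np.nan` marking missing values, a fully observed `y`, and a `numpy.random.Generator`. The passive version always returns a square that is consistent with the imputed X, whereas the JAV version generally does not.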
Year: 2012 PMID: 22489953 PMCID: PMC3403931 DOI: 10.1186/1471-2288-12-46
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Figure 1 Typical datasets for normally or log-normally distributed X. Dotted line shows expected value of Y given X.
Linear regression with Y ~ N(2X + X², ϕ)
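| Method | R²=0.1 bias | R²=0.1 cover | R²=0.1 prec | R²=0.5 bias | R²=0.5 cover | R²=0.5 prec | R²=0.8 bias | R²=0.8 cover | R²=0.8 prec |
|---|---|---|---|---|---|---|---|---|---|
| MCAR, X ~ normal | | | | | | | | | |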
| CData | -3 | 95 | 100 | -1 | 95 | 100 | 0 | 95 | 100 |
| CCase | -2 | 95 | 64 | -1 | 95 | 64 | 0 | 95 | 64 |
| Passive | -32 | 99 | 124 | -21 | 95 | 104 | -20 | 87 | 86 |
| PMM | -3 | 92 | 59 | 0 | 93 | 65 | 2 | 92 | 64 |
| JAV | -4 | 94 | 61 | -1 | 95 | 61 | 0 | 95 | 62 |
| MAR, X ~ normal | | | | | | | | | |
| CData | -6 | 95 | 100 | -1 | 96 | 100 | -2 | 95 | 100 |
| CCase | -23 | 95 | 72 | -13 | 95 | 59 | -8 | 94 | 48 |
| Passive | -45 | 99 | 144 | -27 | 95 | 120 | -42 | 50 | 122 |
| PMM | -36 | 89 | 50 | -13 | 93 | 49 | 8 | 91 | 36 |
| JAV | -12 | 94 | 52 | -1 | 95 | 42 | 0 | 93 | 38 |
| MAR, X ~ log normal | | | | | | | | | |
| CData | -6 | 96 | 100 | 0 | 95 | 100 | -1 | 95 | 100 |
| CCase | -21 | 94 | 42 | -19 | 94 | 24 | -7 | 94 | 20 |
| Passive | -72 | 98 | 70 | 24 | 93 | 21 | -3 | 88 | 31 |
| PMM | -46 | 88 | 29 | -19 | 90 | 15 | 47 | 86 | 6 |
| JAV | -7 | 92 | 28 | 7 | 91 | 12 | 18 | 91 | 10 |
Table 1 Percentage bias, coverage and relative precision for quadratic term in linear regression when Y ~ N(2X + X², ϕ). The true value of the quadratic term is 1. For MCAR, X ~ normal, the maximum Monte Carlo standard errors (MCSEs) among the five methods are 4, 1 and 1% for R² = 0.1, 0.5 and 0.8, respectively. For MAR, X ~ normal, they are 5, 2 and 1%. For MAR, X ~ log normal, they are 7, 4 and 3%.
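The three summaries reported in these tables can be computed from the per-replicate output of a simulation study along the following lines. This is a sketch under stated assumptions rather than the authors' procedure: the function name `performance` is hypothetical, coverage uses a normal-based 95% interval, and relative precision is taken to be the empirical variance of the complete-data estimator divided by that of the method in question, expressed as a percentage (which is consistent with CData scoring 100 in every table).

```python
import numpy as np


def performance(estimates, std_errors, true_value, reference_estimates, z=1.96):
    """Return (percentage bias, percentage coverage, relative precision)."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    ref = np.asarray(reference_estimates, dtype=float)

    # Percentage bias: mean error relative to the true parameter value.
    pct_bias = 100.0 * (est.mean() - true_value) / true_value

    # Coverage: share of nominal 95% intervals that contain the truth
    # (normal-based interval is an assumption of this sketch).
    covered = (est - z * se <= true_value) & (true_value <= est + z * se)
    coverage = 100.0 * covered.mean()

    # Relative precision: variance of the reference (complete-data)
    # estimator over the variance of this method's estimator (assumed definition).
    rel_precision = 100.0 * ref.var(ddof=1) / est.var(ddof=1)

    return pct_bias, coverage, rel_precision
```

Each argument is one array per method with one entry per simulated dataset; passing the complete-data estimates as both `estimates` and `reference_estimates` returns a relative precision of 100.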
Linear regression with Y ~ N((X-2)², ϕ)
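| Method | R²=0.1 bias | R²=0.1 cover | R²=0.1 prec | R²=0.5 bias | R²=0.5 cover | R²=0.5 prec | R²=0.8 bias | R²=0.8 cover | R²=0.8 prec |
|---|---|---|---|---|---|---|---|---|---|
| MCAR, X ~ normal | | | | | | | | | |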
| CData | -1 | 95 | 100 | 0 | 95 | 100 | 0 | 95 | 100 |
| CCase | 0 | 95 | 64 | 0 | 95 | 64 | 0 | 95 | 64 |
| Passive | -31 | 86 | 110 | -31 | 48 | 55 | -30 | 32 | 21 |
| PMM | -2 | 93 | 62 | 1 | 93 | 61 | 2 | 90 | 47 |
| JAV | -1 | 94 | 64 | 0 | 93 | 63 | 0 | 92 | 52 |
| MAR, X ~ normal | | | | | | | | | |
| CData | 0 | 94 | 100 | 0 | 95 | 100 | 0 | 94 | 100 |
| CCase | -14 | 92 | 54 | -9 | 88 | 37 | -4 | 91 | 30 |
| Passive | -41 | 80 | 108 | -38 | 45 | 52 | -32 | 48 | 18 |
| PMM | -10 | 88 | 42 | 4 | 88 | 26 | 16 | 51 | 10 |
| JAV | 0 | 93 | 41 | 18 | 68 | 21 | 22 | 19 | 10 |
| MAR, X ~ log normal | | | | | | | | | |
| CData | 2 | 94 | 100 | 0 | 95 | 100 | 0 | 95 | 100 |
| CCase | -12 | 94 | 44 | -8 | 94 | 27 | -4 | 94 | 20 |
| Passive | -41 | 96 | 81 | -25 | 87 | 18 | -9 | 90 | 4 |
| PMM | -10 | 88 | 29 | 8 | 91 | 12 | 35 | 70 | 3 |
| JAV | 7 | 92 | 27 | 41 | 70 | 6 | 71 | 20 | 2 |
Table 2 Percentage bias, coverage and relative precision for quadratic term in linear regression when Y ~ N((X-2)², ϕ). For MCAR, X ~ normal, the maximum MCSEs among the five methods are 1, 0 and 0% for R² = 0.1, 0.5 and 0.8, respectively. For MAR, X ~ normal, they are 1, 1 and 1%. For MAR, X ~ log normal, they are 2, 2 and 2%.
Linear regression with interaction
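| Method | R²=0.1 bias | R²=0.1 cover | R²=0.1 prec | R²=0.5 bias | R²=0.5 cover | R²=0.5 prec | R²=0.8 bias | R²=0.8 cover | R²=0.8 prec |
|---|---|---|---|---|---|---|---|---|---|
| MCAR, X, Z ~ normal | | | | | | | | | |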
| CData | 3 | 93 | 100 | 1 | 93 | 100 | 0 | 93 | 100 |
| CCase | -3 | 95 | 71 | -1 | 95 | 71 | 0 | 95 | 71 |
| Passive1 | -31 | 97 | 136 | -19 | 94 | 116 | -18 | 88 | 106 |
| Passive2 | -11 | 95 | 86 | -17 | 94 | 115 | -17 | 89 | 103 |
| PMM | -12 | 96 | 86 | -15 | 96 | 106 | -13 | 91 | 93 |
| JAV | -2 | 93 | 66 | -1 | 94 | 65 | 0 | 94 | 65 |
| MAR, X, Z ~ normal | | | | | | | | | |
| CData | -1 | 94 | 100 | -2 | 95 | 100 | 0 | 95 | 100 |
| CCase | -15 | 96 | 82 | -12 | 94 | 69 | -5 | 95 | 62 |
| Passive1 | -36 | 99 | 147 | -24 | 94 | 112 | -25 | 79 | 110 |
| Passive2 | -14 | 96 | 75 | -26 | 94 | 111 | -25 | 82 | 89 |
| PMM | -19 | 97 | 84 | -23 | 94 | 94 | -17 | 90 | 85 |
| JAV | -3 | 94 | 60 | -4 | 92 | 54 | 1 | 94 | 53 |
| MAR, X, Z ~ log normal | | | | | | | | | |
| CData | -1 | 96 | 100 | 2 | 95 | 100 | 1 | 96 | 100 |
| CCase | -17 | 94 | 57 | -9 | 95 | 38 | -4 | 96 | 30 |
| Passive1 | -43 | 98 | 129 | -20 | 96 | 76 | -34 | 68 | 65 |
| Passive2 | -40 | 96 | 71 | -42 | 89 | 58 | -45 | 73 | 27 |
| PMM | -40 | 96 | 79 | -38 | 92 | 66 | -27 | 85 | 30 |
| JAV | -3 | 93 | 41 | 8 | 92 | 26 | 14 | 92 | 20 |
Table 3 Percentage bias, coverage and relative precision for interaction term in linear regression. The true value of the interaction term is 1. For MCAR, X, Z ~ normal, the maximum MCSEs are 4, 1 and 1% for R² = 0.1, 0.5 and 0.8, respectively. For MAR, X, Z ~ normal, they are 4, 2 and 1%. For MAR, X, Z ~ log normal, they are 6, 3 and 1%.
Logistic regression with quadratic term
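| Method | (p, β₂)=(0.5, 1/12) bias | (0.5, 1/12) cover | (0.5, 1/12) prec | (p, β₂)=(0.5, 1/6) bias | (0.5, 1/6) cover | (0.5, 1/6) prec | (p, β₂)=(0.1, 1/12) bias | (0.1, 1/12) cover | (0.1, 1/12) prec |
|---|---|---|---|---|---|---|---|---|---|
| MCAR, X ~ normal | | | | | | | | | |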
| CData | 1 | 95 | 100 | -1 | 95 | 100 | -6 | 94 | 100 |
| CCase | 1 | 96 | 70 | -1 | 95 | 73 | -8 | 95 | 67 |
| Passive | -30 | 97 | 137 | -30 | 92 | 136 | -34 | 99 | 119 |
| PMM | 0 | 94 | 67 | -1 | 94 | 70 | -10 | 93 | 63 |
| JAV | -7 | 96 | 76 | -23 | 92 | 102 | 27 | 91 | 72 |
| MCAR, X ~ log normal | | | | | | | | | |
| CData | 6 | 95 | 100 | 4 | 94 | 100 | 4 | 94 | 100 |
| CCase | 7 | 94 | 69 | 4 | 95 | 73 | 4 | 96 | 71 |
| Passive | -36 | 96 | 222 | -45 | 90 | 308 | -40 | 93 | 127 |
| PMM | 8 | 93 | 67 | 6 | 92 | 68 | 5 | 95 | 68 |
| JAV | -66 | 71 | 178 | -118 | 3 | 398 | 55 | 85 | 56 |
| MAR, X ~ normal | | | | | | | | | |
| CData | 0 | 96 | 100 | 1 | 95 | 100 | -8 | 96 | 100 |
| CCase | -1 | 97 | 67 | 0 | 95 | 63 | -28 | 96 | 28 |
| Passive | 33 | 97 | 125 | -30 | 92 | 115 | -71 | 99 | 171 |
| PMM | -2 | 94 | 65 | -1 | 92 | 59 | -33 | 85 | 27 |
| JAV | 37 | 89 | 59 | 56 | 62 | 79 | 51 | 82 | 26 |
| MAR, X ~ log normal | | | | | | | | | |
| CData | 5 | 93 | 100 | 5 | 96 | 100 | 5 | 94 | 100 |
| CCase | 7 | 93 | 70 | 7 | 95 | 69 | 7 | 95 | 38 |
| Passive | -8 | 98 | 100 | 2 | 99 | 106 | -202 | 16 | 81 |
| PMM | 8 | 91 | 67 | 7 | 93 | 64 | 5 | 84 | 34 |
| JAV | 22 | 92 | 81 | -30 | 80 | 105 | 333 | 25 | 7 |
Table 4 Percentage bias, coverage and relative precision for quadratic term in logistic regression. For MCAR, X ~ normal, the maximum MCSEs are 2, 1 and 3% for (p, β₂) = (0.5, 1/12), (0.5, 1/6) and (0.1, 1/12), respectively. For MCAR, X ~ log normal, they are 2, 2 and 2%. For MAR, X ~ normal, they are 2, 1 and 4%. For MAR, X ~ log normal, they are 2, 2 and 6%.
Figure 2 Log plasma vitamin C and log dietary vitamin C in 15415 individuals for whom both variables are observed.
Analysis of vitamin-C data
| Term | Complete cases Est | Complete cases SE | FCS with passive Est | FCS with passive SE | FCS with PMM Est | FCS with PMM SE | JAV Est | JAV SE |
|---|---|---|---|---|---|---|---|---|
| intercept | 0.990 | 0.201 | 0.570 | 0.177 | 0.903 | 0.181 | 1.030 | 0.163 |
| log diet C | 1.141 | 0.090 | 1.322 | 0.079 | 1.163 | 0.081 | 1.106 | 0.075 |
| log diet C sqrd | -0.090 | 0.010 | -0.113 | 0.009 | -0.094 | 0.009 | -0.088 | 0.008 |
| sex | 0.169 | 0.008 | 0.173 | 0.007 | 0.172 | 0.007 | 0.172 | 0.007 |
| weight (per 10 kg) | -0.042 | 0.003 | -0.041 | 0.003 | -0.040 | 0.003 | -0.041 | 0.003 |
| age (per 10 yrs) | -0.052 | 0.004 | -0.043 | 0.003 | -0.043 | 0.003 | -0.043 | 0.003 |
| former smoker | 0.212 | 0.015 | 0.213 | 0.012 | 0.213 | 0.012 | 0.212 | 0.012 |
| never smoker | 0.216 | 0.014 | 0.218 | 0.012 | 0.218 | 0.012 | 0.219 | 0.012 |
Table 5 Point estimates and SEs from complete-case analysis and three MI methods (full conditional specification with and without predictive mean matching, and JAV) for the regression of log plasma vitamin C on log dietary vitamin C ('log diet C'), its square ('log diet C sqrd') and a set of confounders
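For the three MI columns, each point estimate and SE summarises results across several imputed datasets. The excerpt does not state the pooling rule used; the sketch below assumes the standard Rubin's rules, and the function name `pool_rubin` is hypothetical.

```python
import numpy as np


def pool_rubin(estimates, std_errors):
    """Pool per-imputation estimates and SEs with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(std_errors, dtype=float) ** 2
    m = q.size
    q_bar = q.mean()            # pooled point estimate
    within = u.mean()           # average within-imputation variance
    between = q.var(ddof=1)     # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, np.sqrt(total)
```

The function takes one coefficient estimate and one SE per imputed dataset (at least two datasets) and returns the pooled estimate and pooled SE for that coefficient.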