Katherine J Lee, John B Carlin.
Abstract
BACKGROUND: Multiple imputation is becoming increasingly popular for handling missing data. However, it is often implemented without adequate consideration of whether it offers any advantage over complete case analysis for the research question of interest, or whether potential gains may be offset by bias from a poorly fitting imputation model, particularly as the amount of missing data increases.
Year: 2012 PMID: 22695083 PMCID: PMC3544721 DOI: 10.1186/1742-7622-9-3
Source DB: PubMed Journal: Emerg Themes Epidemiol ISSN: 1742-7622
Spearman rank correlations between covariates and distress at Wave II in the synthetic population (n = 971,327)
| | Diet | Log(Distress WI) | Grade | Health | Fitness |
|---|---|---|---|---|---|
| Log(Distress WI) | 0.07 | | | | |
| Grade | 0.06 | 0.10 | | | |
| Health | 0.07 | 0.22 | 0.03 | | |
| Fitness | 0.13 | 0.22 | 0.09 | 0.42 | |
| | | | | | |
| Log(Distress WII) | −0.0008 | 0.56 | 0.07 | 0.20 | 0.20 |
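A Spearman rank correlation such as those tabulated above is the Pearson correlation of the ranks of the two variables, with tied values given their average rank. The following is a minimal sketch in plain Python; the function names and the toy data are illustrative assumptions, not the authors' code or the synthetic population.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data (not from the study): a binary indicator and a score.
diet = [0, 0, 1, 1, 0, 1]
score = [1.2, 0.7, 1.9, 1.1, 0.9, 2.4]
rho = spearman(diet, score)
```

Average ranks matter here because a binary covariate, such as a dieting indicator, consists almost entirely of ties.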
Performance of methods for regression of (continuous) distress at Wave II, with missing baseline distress
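| % complete | Method | Bias | SE | StdBias | Coverage | Bias | SE | StdBias | Coverage | Bias | SE | StdBias | Coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 90% | CCA | −0.016 | 0.069 | −0.229 | 93.9% | <0.001 | 0.036 | 0.014 | 95.1% | 0.002 | 0.032 | 0.063 | 93.7% |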
| | MVNI-log | −0.001 | 0.064 | −0.012 | 93.0% | −0.009 | 0.035 | −0.252 | 95.1% | 0.003 | 0.031 | 0.095 | 94.1% |
| | MVNI-skew0 | −0.001 | 0.064 | −0.010 | 93.2% | <0.001 | 0.035 | 0.001 | 95.4% | 0.002 | 0.031 | 0.061 | 94.2% |
| 75% | CCA | −0.031 | 0.082 | −0.383 | 92.4% | −0.009 | 0.040 | −0.219 | 95.4% | 0.001 | 0.036 | 0.016 | 95.3% |
| | MVNI-log | 0.004 | 0.067 | 0.059 | 95.1% | −0.032 | 0.037 | −0.863 | 84.7% | 0.003 | 0.032 | 0.110 | 94.7% |
| | MVNI-skew0 | 0.003 | 0.066 | 0.047 | 94.5% | −0.009 | 0.038 | −0.233 | 96.0% | 0.001 | 0.031 | 0.033 | 94.5% |
| 50% | CCA | −0.054 | 0.118 | −0.458 | 91.9% | −0.011 | 0.051 | −0.210 | 95.5% | 0.003 | 0.048 | 0.064 | 94.9% |
| | MVNI-log | 0.002 | 0.075 | 0.031 | 94.6% | −0.064 | 0.045 | −1.426 | 68.7% | 0.008 | 0.034 | 0.246 | 94.5% |
| | MVNI-skew0 | <0.001 | 0.074 | −0.003 | 95.2% | −0.012 | 0.046 | −0.263 | 94.7% | 0.002 | 0.033 | 0.050 | 94.8% |
| 25% | CCA | −0.058 | 0.190 | −0.308 | 91.8% | −0.020 | 0.075 | −0.265 | 93.7% | 0.002 | 0.071 | 0.031 | 94.8% |
| | MVNI-log | 0.004 | 0.091 | 0.041 | 95.4% | −0.106 | 0.060 | −1.775 | 57.4% | 0.013 | 0.039 | 0.328 | 93.9% |
| | MVNI-skew0 | 0.004 | 0.092 | 0.045 | 94.6% | −0.023 | 0.059 | −0.397 | 90.7% | 0.003 | 0.039 | 0.072 | 94.4% |
| 10% | CCA | −0.023 | 0.345 | −0.068 | 94.8% | −0.025 | 0.125 | −0.196 | 95.9% | 0.008 | 0.120 | 0.071 | 96.0% |
| | MVNI-log | 0.011 | 0.129 | 0.084 | 96.0% | −0.145 | 0.092 | −1.576 | 60.9% | 0.021 | 0.049 | 0.423 | 91.3% |
| | MVNI-skew0 | 0.013 | 0.136 | 0.096 | 95.1% | −0.056 | 0.091 | −0.614 | 89.5% | 0.009 | 0.052 | 0.183 | 94.8% |
Measures of performance are mean values from the estimation of the β parameters in Equation 1 across the 1000 simulated datasets of 1000 observations (compared to the true values from the synthetic population of 971,327). CCA = Complete Case Analysis; MVNI = Multivariate normal imputation; StdBias = standardised bias; SE = average (estimated) standard error across the 1000 datasets.
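The performance measures described in the note above can be computed from per-dataset estimates and standard errors roughly as follows. This is a sketch under stated assumptions, not the authors' code: it standardises the bias by the average estimated standard error (dividing by the empirical standard deviation of the estimates is a common alternative), it treats the percentage columns as coverage of nominal 95% confidence intervals, and it forms those intervals as estimate ± 1.96 × SE. The function name and the example numbers are hypothetical.

```python
from statistics import mean


def performance(estimates, std_errors, true_value, z=1.96):
    """Summarise simulation results for one regression coefficient.

    estimates / std_errors hold one value per simulated dataset.
    z = 1.96 gives nominal 95% intervals under a normal approximation
    (an assumption; the source does not state how intervals were formed).
    """
    bias = mean(estimates) - true_value
    avg_se = mean(std_errors)
    # Assumption: bias is standardised by the average estimated SE.
    std_bias = bias / avg_se
    # Count the intervals that contain the true parameter value.
    covered = [
        est - z * se <= true_value <= est + z * se
        for est, se in zip(estimates, std_errors)
    ]
    coverage = sum(covered) / len(covered)
    return {"bias": bias, "se": avg_se, "std_bias": std_bias, "coverage": coverage}


# Hypothetical results from four simulated datasets (illustration only).
result = performance(
    estimates=[0.48, 0.52, 0.55, 0.41],
    std_errors=[0.03, 0.03, 0.02, 0.04],
    true_value=0.50,
)
```

In the study itself each summary is taken over 1000 simulated datasets, with the true value taken from the synthetic population.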
Figure 1. Mean Squared Errors from regression of (continuous) distress at Wave II with missing baseline distress. a) Diet, b) Emotional Distress at Wave I. Results presented are the average Mean Squared Error, across the 1000 simulated datasets, of the parameter estimates from linear regression of (continuous) emotional distress at Wave II from Equation 1, with missing data on emotional distress at baseline. CCA = Complete Case Analysis; MVNI = Multivariate normal imputation.
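The mean squared error of a parameter estimate across simulated datasets combines the two quantities reported separately in the tables: it equals the squared bias plus the variance of the estimates. A minimal sketch follows, with hypothetical function names and made-up estimates; it illustrates the definition rather than reproducing the figure.

```python
from statistics import mean, pvariance


def mean_squared_error(estimates, true_value):
    """Average squared distance of the estimates from the true value."""
    return mean([(est - true_value) ** 2 for est in estimates])


# Hypothetical estimates from four simulated datasets (illustration only).
estimates = [0.48, 0.52, 0.55, 0.41]
true_value = 0.50

mse = mean_squared_error(estimates, true_value)
squared_bias = (mean(estimates) - true_value) ** 2
variance = pvariance(estimates)  # population variance, dividing by n

# The decomposition MSE = bias^2 + variance holds exactly with pvariance.
decomposition_gap = abs(mse - (squared_bias + variance))
```

This decomposition is why a method with some bias can still achieve a lower mean squared error than an unbiased one if its estimates are sufficiently less variable.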
Performance of methods for regression of (continuous) distress at Wave II with missing dieting indicator
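| % complete | Method | Bias | SE | StdBias | Coverage | Bias | SE | StdBias | Coverage | Bias | SE | StdBias | Coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 90% | CCA | 0.001 | 0.067 | 0.019 | 94.2% | −0.006 | 0.036 | −0.154 | 94.9% | −0.001 | 0.032 | −0.020 | 95.1% |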
| | MVNI-skew0 | 0.005 | 0.065 | 0.074 | 94.6% | 0.001 | 0.034 | 0.020 | 94.7% | −0.001 | 0.030 | −0.043 | 95.3% |
| 75% | CCA | −0.001 | 0.075 | −0.014 | 95.0% | −0.014 | 0.040 | −0.360 | 94.9% | <0.001 | 0.036 | 0.009 | 95.5% |
| | MVNI-skew0 | 0.008 | 0.070 | 0.119 | 96.1% | <0.001 | 0.034 | 0.013 | 95.2% | −0.001 | 0.030 | −0.042 | 95.6% |
| 50% | CCA | −0.006 | 0.097 | −0.065 | 94.2% | −0.028 | 0.049 | −0.577 | 91.8% | 0.004 | 0.047 | 0.086 | 94.6% |
| | MVNI-skew0 | 0.014 | 0.080 | 0.175 | 95.6% | 0.001 | 0.034 | 0.032 | 94.6% | <0.001 | 0.030 | −0.002 | 96.3% |
| 25% | CCA | −0.011 | 0.147 | −0.076 | 94.6% | −0.043 | 0.072 | −0.603 | 91.9% | −0.001 | 0.071 | −0.014 | 94.3% |
| | MVNI-skew0 | 0.018 | 0.104 | 0.172 | 96.1% | <0.001 | 0.034 | −0.009 | 93.6% | −0.002 | 0.030 | −0.071 | 94.2% |
| 10% | CCA | −0.020 | 0.250 | −0.081 | 95.8% | −0.053 | 0.118 | −0.452 | 92.2% | 0.009 | 0.120 | 0.077 | 94.7% |
| | MVNI-skew0 | 0.011 | 0.149 | 0.071 | 95.0% | −0.001 | 0.035 | −0.042 | 95.9% | <0.001 | 0.031 | −0.014 | 95.4% |
Measures of performance are mean values in the estimation of the β parameters from Equation 1 across the 1000 simulated datasets of 1000 observations (compared to the true values from the synthetic population of 971,327). CCA = Complete Case Analysis; MVNI = Multivariate normal imputation; StdBias = standardised bias; SE = average (estimated) standard error across the 1000 datasets.
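Complete case analysis, as abbreviated above, discards every record with a missing value before fitting the model. The sketch below shows listwise deletion followed by a single-covariate least-squares fit; the variable names, the toy records, and the reduction to one covariate are assumptions made for illustration, whereas Equation 1 in the source involves several covariates.

```python
def complete_cases(rows):
    """Keep only records with no missing (None) values: listwise deletion."""
    return [row for row in rows if all(v is not None for v in row.values())]


def ols_slope(x, y):
    """Least-squares slope of y on a single covariate x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical records; None marks a missing dieting indicator.
rows = [
    {"diet": 1, "distress_w2": 2.0},
    {"diet": 0, "distress_w2": 1.0},
    {"diet": None, "distress_w2": 3.0},
    {"diet": 1, "distress_w2": 2.4},
    {"diet": None, "distress_w2": 0.5},
    {"diet": 0, "distress_w2": 1.4},
]

cc = complete_cases(rows)
slope = ols_slope([r["diet"] for r in cc], [r["distress_w2"] for r in cc])
fraction_complete = len(cc) / len(rows)
```

The two discarded records still carry observed outcome values, which is the information multiple imputation attempts to recover.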
Figure 2. Mean Squared Errors from regression of (continuous) distress at Wave II with missing dieting indicator. a) Diet, b) Emotional Distress at Wave I. Results presented are the average Mean Squared Error, across the 1000 simulated datasets, of the parameter estimates from linear regression of (continuous) emotional distress at Wave II from Equation 1, with missing data on the dieting indicator. CCA = Complete Case Analysis; MVNI = Multivariate normal imputation.
Performance of methods for regression of (dichotomous) distress at Wave II with missing baseline distress
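| % complete | Method | Bias | SE | StdBias | Coverage | Bias | SE | StdBias | Coverage | Bias | SE | StdBias | Coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 90% | CCA | −0.115 | 0.257 | −0.449 | 92.6% | 0.021 | 0.197 | 0.106 | 95.0% | 0.007 | 0.115 | 0.065 | 95.0% |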
| | MVNI-skew0 | 0.002 | 0.226 | 0.008 | 93.9% | −0.003 | 0.199 | −0.013 | 95.3% | 0.005 | 0.106 | 0.048 | 95.1% |
| 75% | CCA | −0.209 | 0.319 | −0.653 | 91.9% | 0.013 | 0.227 | 0.059 | 93.7% | 0.006 | 0.133 | 0.047 | 95.4% |
| | MVNI-skew0 | −0.007 | 0.233 | −0.030 | 95.8% | −0.043 | 0.230 | −0.187 | 94.6% | 0.004 | 0.108 | 0.039 | 94.3% |
| 50% | CCA | −0.243 | 0.485 | −0.501 | 96.3% | 0.046 | 0.306 | 0.149 | 94.5% | 0.008 | 0.180 | 0.043 | 94.0% |
| | MVNI-skew0 | 0.014 | 0.251 | 0.057 | 95.9% | −0.082 | 0.304 | −0.270 | 94.6% | 0.008 | 0.115 | 0.066 | 95.6% |
| 25% | CCA | −0.120 | 0.833 | −0.144 | 82.4% | 0.091 | 0.487 | 0.188 | 79.4% | 0.021 | 0.286 | 0.073 | 80.0% |
| | MVNI-skew0 | 0.021 | 0.308 | 0.068 | 80.6% | −0.155 | 0.447 | −0.348 | 78.5% | 0.002 | 0.132 | 0.012 | 80.5% |
Measures of performance are mean values in the estimation of the β parameters from Equation 1 across the 1000 simulated datasets of 1000 observations (compared to the true values from the synthetic population of 971,327). CCA = Complete Case Analysis; MVNI = Multivariate normal imputation; StdBias = standardised bias; SE = average (estimated) standard error across the 1000 datasets.
Note that in this example it was not possible to include an analysis in which only 10% of the data were complete, because a large number of datasets had zero counts in the cross-tabulation of diet and distress at Wave II.
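A check of the kind this note implies can be sketched as follows: cross-tabulate the two binary variables among the complete cases and flag the dataset if any cell is empty. The function name, variable names, and toy records are hypothetical. An empty cell in such a 2×2 table generally means the corresponding log odds ratio has no finite maximum likelihood estimate, which is presumably why those datasets could not be analysed.

```python
from collections import Counter


def has_empty_cell(rows, var_a, var_b, levels=(0, 1)):
    """True if any cell of the var_a x var_b cross-tabulation is empty."""
    counts = Counter((r[var_a], r[var_b]) for r in rows)
    # Counter returns 0 for combinations that never occur.
    return any(counts[(a, b)] == 0 for a in levels for b in levels)


# Hypothetical complete cases from one small simulated dataset.
sparse = [
    {"diet": 0, "distress_w2": 0},
    {"diet": 0, "distress_w2": 1},
    {"diet": 1, "distress_w2": 0},
]
# Adding a dieter with distress fills the previously empty cell.
filled = sparse + [{"diet": 1, "distress_w2": 1}]

sparse_flag = has_empty_cell(sparse, "diet", "distress_w2")
filled_flag = has_empty_cell(filled, "diet", "distress_w2")
```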