Cattram D Nguyen, John B Carlin, Katherine J Lee.
Abstract
BACKGROUND: Multiple imputation (MI) is becoming increasingly popular as a strategy for handling missing data, but there is a scarcity of tools for checking the adequacy of imputation models. The Kolmogorov-Smirnov (KS) test has been identified as a potential diagnostic method for assessing whether the distribution of imputed data deviates substantially from that of the observed data. The aim of this study was to evaluate the performance of the KS test as an imputation diagnostic.
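The diagnostic described in the abstract compares, for each incomplete variable, the empirical distribution of the observed values with that of the imputed values. A minimal pure-Python sketch of the two-sample KS statistic and its large-sample p-value follows. The function names `ks_two_sample` and `kolmogorov_sf` are illustrative and do not come from the paper, and the p-value uses the asymptotic Kolmogorov series, which may differ from the software the authors used.

```python
import math
from bisect import bisect_right


def kolmogorov_sf(lam, terms=100):
    """Asymptotic tail probability 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    if lam < 0.2:
        # The alternating series is awkward near zero; the tail is effectively 1 here.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def ks_two_sample(observed, imputed):
    """Return (D, approximate p-value) comparing observed and imputed values."""
    x, y = sorted(observed), sorted(imputed)
    n, m = len(x), len(y)
    # Both empirical CDFs are step functions, so the largest gap sits at a data point.
    d = max(abs(bisect_right(x, v) / n - bisect_right(y, v) / m) for v in x + y)
    lam = d * math.sqrt(n * m / (n + m))
    return d, kolmogorov_sf(lam)
```

A small p-value only indicates that the two distributions differ; whether that signals an inadequate imputation model is the question the study sets out to examine.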
Year: 2013 PMID: 24252653 PMCID: PMC3840572 DOI: 10.1186/1471-2288-13-144
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Linear regression analysis results for the Longitudinal Study of Australian Children example
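| Variable | Missing, n (%) | Complete case: coef. | Complete case: SE | Model 1: coef. | Model 1: SE | Model 2: coef. | Model 2: SE | Model 3: coef. | Model 3: SE | Model 4: coef. | Model 4: SE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |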
| Mother completed high school | 3 (0.1) | −0.147 | 0.061 | −0.216 | 0.054 | −0.212 | 0.054 | −0.233 | 0.054 | −0.212 | 0.053 |
| Socioeconomic position | 151 (3.6) | −0.192 | 0.028 | −0.183 | 0.026 | −0.184 | 0.026 | −0.184 | 0.025 | −0.180 | 0.025 |
| Male child | 0 (0) | 0.236 | 0.048 | 0.281 | 0.043 | 0.283 | 0.043 | 0.284 | 0.043 | 0.281 | 0.043 |
| Warm parenting | 251 (6.0) | −0.309 | 0.059 | −0.297 | 0.053 | −0.290 | 0.055 | −0.295 | 0.054 | −0.303 | 0.054 |
| Harsh discipline score | 970 (23.0) | 0.252 | 0.017 | 0.232 | 0.018 | 0.212 | 0.017 | 0.247 | 0.017 | 0.241 | 0.018 |
| K6 total score | 287 (6.8) | 0.033 | 0.008 | 0.035 | 0.007 | 0.036 | 0.008 | 0.032 | 0.007 | 0.033 | 0.008 |
| Smoker | 971 (23.1) | 0.155 | 0.069 | 0.193 | 0.063 | 0.199 | 0.062 | 0.153 | 0.077 | 0.208 | 0.063 |
The multiple imputation results were estimated using 20 imputations. The four imputation models were: 1 = “optimal” model, 2 = no outcome variable, 3 = no auxiliary variables, 4 = no de-skewing.
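Estimates from multiply imputed datasets are conventionally combined with Rubin's rules; the record does not state the pooling method, so the sketch below assumes that convention. The function name `pool_rubin` and its inputs are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # sample variance across imputations
    total = within + (1 + 1 / m) * between         # total variance of the pooled estimate
    return pooled, math.sqrt(total)
```

With 20 imputations, as in this example, the between-imputation variance is inflated by a factor of 1 + 1/20 before it is added to the within-imputation variance.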
Kolmogorov-Smirnov (KS) test p-values for the Longitudinal Study of Australian Children example
| Variable | Model 1 | Model 2 | Model 3 | Model 4 |
| --- | --- | --- | --- | --- |
| Family socioeconomic position | 4.94 × 10⁻⁷ | 9.02 × 10⁻⁷ | 0.024 | 2.85 × 10⁻⁶ |
| Warm parenting | 4.81 × 10⁻¹² | 4.81 × 10⁻¹² | 3.38 × 10⁻¹² | 1.73 × 10⁻⁸ |
| Harsh discipline | 2.52 × 10⁻⁵ | 6.21 × 10⁻⁵ | 8.45 × 10⁻⁶ | 6.39 × 10⁻¹⁵ |
| Mother’s emotional distress | 4.48 × 10⁻⁶ | 2.44 × 10⁻⁶ | 9.95 × 10⁻⁶ | 1.78 × 10⁻¹⁶ |
Results are median KS p-values over 20 imputed datasets. The four imputation models were: 1 = “optimal” model, 2 = no outcome variable, 3 = no auxiliary variables, 4 = no de-skewing.
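The summary reported in this table, one median p-value per variable across the imputed datasets, could be computed along the following lines. This is a sketch only: `median_ks_pvalues` is a hypothetical helper, and the p-values in `example` are made up for illustration and are not study results.

```python
from statistics import median


def median_ks_pvalues(pvalues_by_variable):
    """Map each variable to the median of its per-imputation KS p-values."""
    return {name: median(pvals) for name, pvals in pvalues_by_variable.items()}


# Made-up p-values for three imputed datasets (the study used 20).
example = {
    "variable_a": [0.03, 0.20, 0.08],
    "variable_b": [1e-6, 4e-5, 2e-7],
}
summary = median_ks_pvalues(example)
```

The median is less sensitive than the mean to a single imputed dataset that happens to produce an extreme p-value.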
Figure 1. Results from the Longitudinal Study of Australian Children example. Density plots of observed (solid line) and imputed (dashed line) data for family socioeconomic advantage. Larger values represent greater socioeconomic advantage. The data from the 20 imputed datasets have been pooled.
Figure 2. Simulation study results for degrees of freedom = 1000. a) Line plots of the root mean square error (RMSE) of the beta coefficient estimates against alpha (the parameter controlling the skewness), and b) Bar charts of median Kolmogorov-Smirnov (KS) test p-values across the 20 imputed datasets and 1000 replications against alpha. MCAR = missing completely at random, MAR = missing at random.
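The RMSE plotted in panel a) summarises how far the coefficient estimates fall from the true value across simulation replications. A minimal sketch, assuming the usual definition (square root of the mean squared deviation from the true coefficient); the function name `rmse` and its arguments are illustrative.

```python
import math


def rmse(estimates, true_value):
    """Root mean square error of replicate estimates around the true coefficient."""
    squared_errors = [(b - true_value) ** 2 for b in estimates]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

In the simulation described in the caption, `estimates` would hold one pooled beta estimate per replication (1000 in total) for a given value of alpha.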