Laura Rodwell, Katherine J Lee, Helena Romaniuk, John B Carlin.
Abstract
BACKGROUND: Multiple imputation (MI) was developed as a method to enable valid inferences to be obtained in the presence of missing data rather than to re-create the missing values. Within the applied setting, it remains unclear how important it is that imputed values should be plausible for individual observations. One variable type for which MI may lead to implausible values is a limited-range variable, where imputed values may fall outside the observable range. The aim of this work was to compare methods for imputing limited-range variables, with a focus on those that restrict the range of the imputed values.
Year: 2014 PMID: 24766825 PMCID: PMC4021274 DOI: 10.1186/1471-2288-14-57
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Figure 1. Distribution of complete data for three scoring methods of GHQ at wave 8 (714 females).
Average percentage of missing values imputed outside the specified range using linear regression imputation, by scoring method
| Scoring method | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Likert | 2.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 | 0.2 |
| C-GHQ | 12.2 | 11.6 | 0.1 | 0.1 | 7.9 | 7.4 | 1.8 | 0.2 |
| Standard | 23.0 | 24.0 | 0.0 | 0.0 | 14.3 | 14.9 | 6.5 | 7.5 |
Note: Mean number of missing values per dataset is 238.
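To make the percentages in the table above concrete, the following Python sketch counts how many imputed values fall outside a specified range. The function names, the 0 to 12 range, and the example values are hypothetical and are not taken from the study. Moving out-of-range values to the nearest bound is only one simple range-restricting fix, assumed here for illustration.

```python
def percent_outside_range(imputed_values, lower, upper):
    """Return the percentage of imputed values below `lower` or above `upper`."""
    if not imputed_values:
        raise ValueError("imputed_values must not be empty")
    outside = sum(1 for v in imputed_values if v < lower or v > upper)
    return 100.0 * outside / len(imputed_values)


def clip_to_range(imputed_values, lower, upper):
    """Move each out-of-range value to the nearest bound (an assumed fix)."""
    return [min(max(v, lower), upper) for v in imputed_values]


# Hypothetical imputed scores for a variable assumed to range from 0 to 12.
example_imputations = [-1.4, 0.3, 2.0, 5.7, 12.6]
share_outside = percent_outside_range(example_imputations, 0, 12)
clipped = clip_to_range(example_imputations, 0, 12)
print(f"{share_outside:.1f}% of imputed values fall outside the range")
print(clipped)
```

Averaging `share_outside` over many simulated datasets would give a summary of the kind reported in the table, under these assumptions.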
Performance measures for the estimation of the marginal mean, GHQ scores on raw scale
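| Imputation method | Average MI estimate | Bias | Average within-imputation variance | Variance of MI estimates | (1 + m⁻¹)E[Bm] | Coverage |
|---|---|---|---|---|---|---|
| Scoring method: Likert | | | | | | |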
| Regression, non-rounded | 10.2319 | 0.0008 | 0.1862 | 0.0179 | 0.0174 | 0.943 |
| Post-imputation rounding | 10.2445 | 0.0134 | 0.1851 | 0.0180 | 0.0167 | 0.941 |
| Truncated regression | 10.2206 | -0.0105 | 0.1846 | 0.0172 | 0.0172 | 0.959 |
| Predictive mean matching | 10.1805 | -0.0506 | 0.1832 | 0.0176 | 0.0138 | 0.894 |
| | | | | | | |
| Regression, non-rounded | 10.2243 | -0.0068 | 0.1838 | 0.0220 | 0.0191 | 0.939 |
| Post-imputation rounding | 10.2353 | 0.0042 | 0.1828 | 0.0221 | 0.0185 | 0.928 |
| Truncated regression | 10.2183 | -0.0128 | 0.1825 | 0.0219 | 0.0191 | 0.936 |
| Predictive mean matching | 10.1378 | -0.0933 | 0.1818 | 0.0213 | 0.0158 | 0.852 |
| Scoring method: C-GHQ | | | | | | |
| Regression, non-rounded | 3.3178 | -0.0002 | 0.1058 | 0.0055 | 0.0054 | 0.956 |
| Post-imputation rounding | 3.3741 | 0.0561 | 0.1023 | 0.0053 | 0.0044 | 0.846 |
| Truncated regression | 3.5582 | 0.2402 | 0.1025 | 0.0054 | 0.0077 | 0.233 |
| Predictive mean matching | 3.2931 | -0.0248 | 0.1048 | 0.0058 | 0.0050 | 0.925 |
| | | | | | | |
| Regression, non-rounded | 3.3150 | -0.0029 | 0.1055 | 0.0065 | 0.0061 | 0.946 |
| Post-imputation rounding | 3.3687 | 0.0508 | 0.1021 | 0.0061 | 0.0050 | 0.880 |
| Truncated regression | 3.5505 | 0.2326 | 0.1024 | 0.0060 | 0.0086 | 0.318 |
| Predictive mean matching | 3.2664 | -0.0515 | 0.1045 | 0.0067 | 0.0059 | 0.880 |
| Scoring method: Standard | | | | | | |
| Regression, non-rounded | 1.8085 | 0.0004 | 0.0951 | 0.0045 | 0.0045 | 0.950 |
| Post-imputation rounding | 1.9276 | 0.1195 | 0.0894 | 0.0046 | 0.0029 | 0.436 |
| Truncated regression | 2.2988 | 0.4907 | 0.0982 | 0.0078 | 0.0434 | 0.293 |
| Predictive mean matching | 1.7791 | -0.0290 | 0.0934 | 0.0044 | 0.0035 | 0.894 |
| | | | | | | |
| Regression, non-rounded | 1.8057 | -0.0024 | 0.0941 | 0.0056 | 0.0051 | 0.933 |
| Post-imputation rounding | 1.9198 | 0.1117 | 0.0887 | 0.0054 | 0.0033 | 0.552 |
| Truncated regression | 2.3043 | 0.4962 | 0.0984 | 0.0082 | 0.0445 | 0.291 |
| Predictive mean matching | 1.7624 | -0.0457 | 0.0930 | 0.0053 | 0.0040 | 0.848 |
Key: U = estimated variance of the complete data estimate; average MI estimate = average of the MI-based point estimates across 1000 simulated datasets; bias = difference between the average MI estimate and the complete data estimate; average within-imputation variance = average of the estimated within-imputation variance across simulated datasets; variance of MI estimates = variance of the MI point estimates across simulated datasets; (1 + m⁻¹)E[Bm] = average of the estimated between-imputation variance (with adjustment for the number of imputations, m) across simulated datasets; coverage = proportion of (nominally) 95% confidence intervals that contain the complete data estimate.
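The within-imputation and between-imputation variance terms in the key are the components combined by the standard MI pooling rules (Rubin's rules) for a single dataset. The sketch below shows that pooling step in Python. The function name `pool_rubin` and the example numbers are hypothetical, and this is not the authors' code.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Pool m imputed-data estimates and their variances with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need at least two estimates, each with a variance")
    pooled_estimate = mean(estimates)      # MI point estimate
    within = mean(within_variances)        # within-imputation variance
    between = variance(estimates)          # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between  # total variance of the MI estimate
    return {
        "estimate": pooled_estimate,
        "within": within,
        "between": between,
        "total": total,
    }


# Hypothetical results from m = 3 imputed datasets.
pooled = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled)
```

The factor `(1 + 1 / m)` is the adjustment for the number of imputations that the key refers to.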
Performance measures for the estimation of the marginal mean with transformed GHQ scores
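| Imputation method | Average MI estimate | Bias | Average within-imputation variance | Variance of MI estimates | (1 + m⁻¹)E[Bm] | Coverage |
|---|---|---|---|---|---|---|
| Scoring method: Likert | | | | | | |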
| Regression, non-rounded | 10.2366 | 0.0055 | 0.1861 | 0.0181 | 0.0174 | 0.947 |
| Post-imputation rounding | 10.1446 | -0.0865 | 0.1857 | 0.0416 | 0.0170 | 0.820 |
| Truncated regression | 10.2227 | -0.0084 | 0.1840 | 0.0181 | 0.0163 | 0.935 |
| Predictive mean matching | 10.1926 | -0.0385 | 0.1837 | 0.0174 | 0.0148 | 0.916 |
| | | | | | | |
| Regression, non-rounded | 10.2119 | -0.0192 | 0.1846 | 0.0223 | 0.0197 | 0.928 |
| Post-imputation rounding | 10.0985 | -0.1326 | 0.1842 | 0.0628 | 0.0193 | 0.758 |
| Truncated regression | 10.2010 | -0.0301 | 0.1825 | 0.0217 | 0.0180 | 0.915 |
| Predictive mean matching | 10.1401 | -0.0910 | 0.1819 | 0.0216 | 0.0164 | 0.858 |
| Scoring method: C-GHQ | | | | | | |
| Regression, non-rounded | 3.3268 | 0.0088 | 0.1087 | 0.0057 | 0.0063 | 0.960 |
| Post-imputation rounding | 3.3231 | 0.0051 | 0.1053 | 0.0058 | 0.0052 | 0.931 |
| Truncated regression | 3.5563 | 0.2384 | 0.1028 | 0.0053 | 0.0048 | 0.096 |
| Predictive mean matching | 3.3009 | -0.0170 | 0.1049 | 0.0057 | 0.0053 | 0.941 |
| | | | | | | |
| Regression, non-rounded | 3.3265 | 0.0086 | 0.1094 | 0.0065 | 0.0077 | 0.969 |
| Post-imputation rounding | 3.3185 | 0.0005 | 0.1055 | 0.0065 | 0.0063 | 0.954 |
| Truncated regression | 3.5452 | 0.2273 | 0.1027 | 0.0060 | 0.0055 | 0.160 |
| Predictive mean matching | 3.2722 | -0.0457 | 0.1045 | 0.0069 | 0.0059 | 0.886 |
| Scoring method: Standard | | | | | | |
| Regression, non-rounded | 263.63 | 261.82 | 30917 | 53900000 | 998000000 | 0.992 |
| Post-imputation rounding | 1.7996 | -0.0086 | 0.1055 | 0.0037 | 0.0074 | 0.992 |
| Truncated regression | 2.2661 | 0.4580 | 0.0953 | 0.0056 | 0.0047 | 0.000 |
| Predictive mean matching | 1.8052 | -0.0029 | 0.0943 | 0.0043 | 0.0041 | 0.940 |
| | | | | | | |
| Regression, non-roundedƗ | | | | | | |
| Post-imputation rounding | 1.8035 | -0.0046 | 0.1074 | 0.0043 | 0.0095 | 0.989 |
| Truncated regression | 2.2603 | 0.4522 | 0.0951 | 0.0061 | 0.0054 | 0.000 |
| Predictive mean matching | 1.7829 | -0.0252 | 0.0937 | 0.0053 | 0.0045 | 0.909 |
ƗValues larger than those obtained under the MCAR condition.
Key: U = estimated variance of the complete data estimate; average MI estimate = average of the MI-based point estimates across 1000 simulated datasets; bias = difference between the average MI estimate and the complete data estimate; average within-imputation variance = average of the estimated within-imputation variance across simulated datasets; variance of MI estimates = variance of the MI point estimates across simulated datasets; (1 + m⁻¹)E[Bm] = average of the estimated between-imputation variance (with adjustment for the number of imputations, m) across simulated datasets; coverage = proportion of (nominally) 95% confidence intervals that contain the complete data estimate.
Performance measures for the estimation of the regression coefficient with GHQ scores on raw scale
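| Imputation method | Average MI estimate | Bias | Average within-imputation variance | Variance of MI estimates | (1 + m⁻¹)E[Bm] | Coverage |
|---|---|---|---|---|---|---|
| Scoring method: Likert | | | | | | |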
| Regression, non-rounded | 0.03181 | -0.00046 | 0.02211 | 0.00027 | 0.00022 | 0.922 |
| Post-imputation rounding | 0.03192 | -0.00035 | 0.02219 | 0.00027 | 0.00022 | 0.920 |
| Truncated regression | 0.03247 | 0.00020 | 0.02217 | 0.00028 | 0.00022 | 0.914 |
| Predictive mean matching | 0.02551 | -0.00676 | 0.02223 | 0.00019 | 0.00016 | 0.918 |
| | | | | | | |
| Regression, non-rounded | 0.02888 | -0.00340 | 0.02270 | 0.00080 | 0.00067 | 0.927 |
| Post-imputation rounding | 0.02926 | -0.00301 | 0.02279 | 0.00079 | 0.00066 | 0.924 |
| Truncated regression | 0.03010 | -0.00217 | 0.02278 | 0.00084 | 0.00066 | 0.926 |
| Predictive mean matching | 0.01727 | -0.01500 | 0.02298 | 0.00036 | 0.00036 | 0.911 |
| Scoring method: C-GHQ | | | | | | |
| Regression, non-rounded | 0.04694 | -0.00100 | 0.04014 | 0.00077 | 0.00076 | 0.946 |
| Post-imputation rounding | 0.04805 | 0.00012 | 0.04123 | 0.00080 | 0.00069 | 0.932 |
| Truncated regression | 0.04360 | -0.00433 | 0.04130 | 0.00086 | 0.00073 | 0.925 |
| Predictive mean matching | 0.03738 | -0.01055 | 0.04033 | 0.00053 | 0.00056 | 0.939 |
| | | | | | | |
| Regression, non-rounded | 0.04283 | -0.00511 | 0.04056 | 0.00232 | 0.00219 | 0.939 |
| Post-imputation rounding | 0.04932 | 0.00138 | 0.04158 | 0.00224 | 0.00195 | 0.928 |
| Truncated regression | 0.06470 | 0.01676 | 0.04133 | 0.00243 | 0.00210 | 0.906 |
| Predictive mean matching | 0.02442 | -0.02352 | 0.04087 | 0.00109 | 0.00121 | 0.929 |
| Scoring method: Standard | | | | | | |
| Regression, non-rounded | 0.05066 | -0.00170 | 0.04336 | 0.00101 | 0.00085 | 0.938 |
| Post-imputation rounding | 0.05252 | 0.00016 | 0.04550 | 0.00106 | 0.00067 | 0.889 |
| Truncated regression | 0.04613 | -0.00623 | 0.04218 | 0.00101 | 0.00096 | 0.930 |
| Predictive mean matching | 0.04069 | -0.01167 | 0.04368 | 0.00065 | 0.00061 | 0.929 |
| | | | | | | |
| Regression, non-rounded | 0.04557 | -0.00679 | 0.04441 | 0.00278 | 0.00261 | 0.942 |
| Post-imputation rounding | 0.06226 | 0.00990 | 0.04590 | 0.00238 | 0.00187 | 0.911 |
| Truncated regression | 0.09485 | 0.04249 | 0.04092 | 0.00233 | 0.00259 | 0.857 |
| Predictive mean matching | 0.02669 | -0.02567 | 0.04517 | 0.00122 | 0.00139 | 0.939 |
Key: U = estimated variance of the complete data estimate; average MI estimate = average of the MI-based point estimates across 1000 simulated datasets; bias = difference between the average MI estimate and the complete data estimate; average within-imputation variance = average of the estimated within-imputation variance across simulated datasets; variance of MI estimates = variance of the MI point estimates across simulated datasets; (1 + m⁻¹)E[Bm] = average of the estimated between-imputation variance (with adjustment for the number of imputations, m) across simulated datasets; coverage = proportion of (nominally) 95% confidence intervals that contain the complete data estimate.
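The simulation summaries defined in the key (average estimate, bias, variance of the point estimates, and coverage) can be sketched in Python as follows. The function name `summarise_simulation`, the example inputs, and the use of a normal quantile of 1.96 for the nominal 95% interval are assumptions made for illustration. MI analyses commonly use a t reference distribution instead, and the source does not say which was used.

```python
from math import sqrt
from statistics import mean, variance


def summarise_simulation(point_estimates, total_variances, reference, z=1.96):
    """Summarise MI estimates from repeated simulated datasets against a reference value."""
    if len(point_estimates) < 2 or len(point_estimates) != len(total_variances):
        raise ValueError("need at least two estimates, each with a total variance")
    average_estimate = mean(point_estimates)
    bias = average_estimate - reference
    empirical_variance = variance(point_estimates)
    # Count the intervals estimate +/- z * standard error that contain the reference.
    covered = sum(
        1
        for q, t in zip(point_estimates, total_variances)
        if q - z * sqrt(t) <= reference <= q + z * sqrt(t)
    )
    return {
        "average_estimate": average_estimate,
        "bias": bias,
        "empirical_variance": empirical_variance,
        "coverage": covered / len(point_estimates),
    }


# Hypothetical MI estimates and total variances from three simulated datasets.
summary = summarise_simulation([1.0, 2.0, 3.0], [0.25, 0.25, 0.25], reference=2.0)
print(summary)
```

Here `reference` plays the role of the complete data estimate, so coverage is measured against that value, as in the key.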