Avoiding bias due to perfect prediction in multiple imputation of incomplete categorical variables
Ian R. White, Rhian Daniel, Patrick Royston.
Abstract
Multiple imputation is a popular way to handle missing data. Automated procedures are widely available in standard software. However, such automated procedures may hide many assumptions and possible difficulties from the view of the data analyst. Imputation procedures such as monotone imputation and imputation by chained equations often involve the fitting of a regression model for a categorical outcome. If perfect prediction occurs in such a model, then automated procedures may give severely biased results. This is a problem in some standard software, but it may be avoided by bootstrap methods, penalised regression methods, or a new augmentation procedure.Entities:
Keywords: Missing data; Multiple imputation; Perfect prediction; Separation
Year: 2010 PMID: 24748700 PMCID: PMC3990447 DOI: 10.1016/j.csda.2010.04.005
Source DB: PubMed Journal: Comput Stat Data Anal ISSN: 0167-9473 Impact factor: 1.681
Dental pain trial: comparison of complete cases and two different MI procedures for the logistic regression of outcome on treatment group at the final timepoint. Entries are the estimated log odds of pain relief (standard error) and the fraction of missing information (FMI) in model (1).
| Parameter | SAS/PROC MI (100 imputations): Estimate (s.e.) | FMI | R/MICE (100 imputations): Estimate (s.e.) | FMI | Complete cases: Estimate (s.e.) |
|---|---|---|---|---|---|
|  | 0.37 (0.74) | 0.84 | 0.43 (0.71) | 0.83 | 0.83 (0.45) |
|  | 1.21 (0.53) | 0.59 | 1.19 (0.78) | 0.80 | 1.72 (0.49) |
|  | 1.04 (0.47) | 0.53 | 1.10 (0.58) | 0.67 | 1.46 (0.42) |
|  | 2.04 (0.59) | 0.44 | 1.29 (1.02) | 0.85 | 2.71 (0.73) |
|  | 2.32 (0.65) | 0.41 | 1.59 (0.70) | 0.69 | 2.48 (0.60) |
|  | 1.00 (0.45) | 0.51 | 0.90 (0.35) | 0.23 | 1.30 (0.38) |
|  | 1.52 (1.85) | 0.92 | 0.23 (1.65) | 0.95 | 1.39 (0.79) |
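The pooled estimates, standard errors, and FMI values in the table above combine the per-imputation results by Rubin's rules. A minimal sketch of that pooling step (using the simple large-sample FMI approximation, not the small-sample adjustment some packages apply; the input numbers below are hypothetical):

```python
import math

def rubin_pool(estimates, variances):
    """Combine per-imputation point estimates and variances by Rubin's rules."""
    m = len(estimates)
    qbar = sum(estimates) / m                              # pooled estimate
    w = sum(variances) / m                                 # within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    t = w + (1 + 1 / m) * b                                # total variance
    fmi = (1 + 1 / m) * b / t                              # large-sample FMI approximation
    return qbar, math.sqrt(t), fmi

# Hypothetical estimates from m = 3 imputations, each with variance 0.04:
est, se, fmi = rubin_pool([1.0, 1.2, 0.8], [0.04, 0.04, 0.04])
print(est, se, fmi)
```

A large FMI, as in several rows of the table, signals that the between-imputation variance dominates the total and many imputations are needed for stable inference.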
Artificial data used to illustrate the perfect prediction problem.
| x | y = 0 | y = 1 | y missing |
|---|---|---|---|
| 0 | 100 | 0 | 100 |
| 1 | 100 | 100 | 100 |
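A quick way to see the problem in this table: among the observed cases, x = 0 perfectly predicts y = 0, so the logistic-regression log-likelihood climbs towards a supremum it never attains and no finite maximum-likelihood estimate exists. A minimal sketch of this (not code from the paper):

```python
import math

# Observed cells of the artificial data: x = 0 gives 100 failures and no
# successes, so among complete cases x = 0 perfectly predicts y = 0.
counts = {(0, 0): 100, (0, 1): 0, (1, 0): 100, (1, 1): 100}

def loglik(alpha, beta):
    """Log-likelihood of logit P(y=1|x) = alpha + beta*x for the cell counts."""
    ll = 0.0
    for (x, y), n in counts.items():
        if n == 0:
            continue
        p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))  # P(y = 1 | x)
        ll += n * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return ll

# Along the path alpha = -t, beta = t the x = 1 group keeps P(y = 1) = 0.5
# while P(y = 1 | x = 0) -> 0, so the log-likelihood increases towards its
# supremum 200*log(0.5) without ever reaching it: separation.
for t in (1.0, 5.0, 10.0, 20.0):
    print(t, loglik(-t, t))
```

Standard fitting routines stop at some arbitrary large slope with a huge (or collapsed) standard error, which is exactly what corrupts the posterior draws used by automated imputation procedures.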
Fig. 1 Profile log-likelihood for the model fitted to the data in Table 2.
Fig. 2 Mean number of imputed successes in the data in Table 2 using the "Normal/augment" method: comparison of different values added to all cells, for three different numbers of failures.
Fig. 3 Artificial data of Table 2: number of imputed successes in the 100 individuals with x = 0 (left panels) and in the 100 individuals with x = 1 (right panels), using various analysis methods and software packages.
Fig. 4 Dental pain data: estimated log odds of pain relief at the 10th occasion (with 95% confidence interval) for various analysis methods.
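The augmentation idea behind the "Normal/augment" method can be illustrated generically: adding a small weighted success and a small weighted failure at each covariate level makes the log-likelihood attain an interior maximum. The half-unit pseudo-count and the crude grid search below are arbitrary illustration choices, not the paper's calibrated weights or fitting algorithm:

```python
import math

def weighted_loglik(counts, alpha, beta):
    # counts maps (x, y) -> a (possibly fractional) case weight
    ll = 0.0
    for (x, y), n in counts.items():
        if n == 0:
            continue
        p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
        ll += n * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return ll

# Observed cells of the artificial data (x = 0 perfectly predicts y = 0).
raw = {(0, 0): 100, (0, 1): 0, (1, 0): 100, (1, 1): 100}

# Generic augmentation: half a success and half a failure at each x level.
aug = {k: v + 0.5 for k, v in raw.items()}

def best_slope(counts, grid=15.0, step=0.1):
    # Crude grid search over (alpha, beta). Under separation the best slope
    # escapes to the edge of any finite grid; after augmentation it settles
    # at a finite interior value.
    best_ll, best_beta = float("-inf"), None
    n = int(round(grid / step))
    for bi in range(-n, n + 1):
        beta = bi * step
        for ai in range(-n, n + 1):
            ll = weighted_loglik(counts, ai * step, beta)
            if ll > best_ll:
                best_ll, best_beta = ll, beta
    return best_beta

s_raw = best_slope(raw)   # hits the grid edge: the slope diverges
s_aug = best_slope(aug)   # finite interior slope
print(s_raw, s_aug)
```

Imputing from the augmented fit then produces a sensible spread of imputed successes instead of the degenerate all-or-nothing imputations that perfect prediction induces.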
Results of a simulation study comparing the Normal/allow, Bootstrap, Normal/penalise and Normal/augment methods for handling perfect prediction. EmpSE is the empirical standard error; ModSE (%) is the relative error of the model-based standard error compared with the empirical standard error; Coverage is the coverage of a nominal 95% confidence interval; Power is the power to reject parameter = 0.
| Parameter | Method | Bias | EmpSE | ModSE (%) | Coverage (%) | Power (%) |
|---|---|---|---|---|---|---|
|  | Normal/allow | 0.07 | 0.05 | 130 | 99 | 23 |
|  | Bootstrap | 0.00 | 0.02 | 1 | 95 | 100 |
|  | Normal/penalise | 0.00 | 0.02 | 3 | 95 | 100 |
|  | Normal/augment | 0.00 | 0.02 | 3 | 95 | 100 |
|  | Normal/allow | −0.22 | 0.36 | 47 | 93 | 20 |
|  | Bootstrap | 0.02 | 0.43 | 1 | 94 | 65 |
|  | Normal/penalise | 0.00 | 0.42 | 2 | 94 | 65 |
|  | Normal/augment | −0.02 | 0.42 | 3 | 95 | 63 |
|  | Normal/allow | −0.11 | 0.24 | 26 | 96 | 90 |
|  | Bootstrap | 0.01 | 0.25 | 1 | 96 | 98 |
|  | Normal/penalise | 0.00 | 0.25 | 1 | 95 | 97 |
|  | Normal/augment | 0.00 | 0.25 | 2 | 96 | 97 |
|  | Normal/allow | 0.00 | 0.09 | −3 | 94 | 100 |
|  | Bootstrap | 0.00 | 0.09 | −3 | 94 | 100 |
|  | Normal/penalise | 0.00 | 0.09 | −3 | 94 | 100 |
|  | Normal/augment | 0.00 | 0.09 | −3 | 94 | 100 |
| Maximum Monte Carlo error |  | 0.01 | 0.01 | 5 | 1 | 2 |