Cattram D Nguyen, John B Carlin, Katherine J Lee.
Abstract
Multiple imputation is a recommended method for handling incomplete data problems. One of the barriers to its successful use is the breakdown of the multiple imputation procedure, often due to numerical problems with the algorithms used within the imputation process. These problems frequently occur when imputation models contain large numbers of variables, especially with the popular approach of multivariate imputation by chained equations. This paper describes common causes of failure of the imputation procedure including perfect prediction and collinearity, focusing on issues when using Stata software. We outline a number of strategies for addressing these issues, including imputation of composite variables instead of individual components, introducing prior information and changing the form of the imputation model. These strategies are illustrated using a case study based on data from the Longitudinal Study of Australian Children.
Keywords: Auxiliary variables; Collinearity; Convergence; Missing data; Multiple imputation; Multivariate imputation by chained equations; Multivariate normal imputation; Perfect prediction
Year: 2021 PMID: 33794933 PMCID: PMC8017730 DOI: 10.1186/s12982-021-00095-3
Source DB: PubMed Journal: Emerg Themes Epidemiol ISSN: 1742-7622
Cross-tabulation of two simulated variables Y (binary) and X (categorical)
| Y | X = 0 | X = 1 | X = 2 | X missing | Total |
|---|---|---|---|---|---|
| 0 | 25 | 17 | 2 | 6 | 50 |
| 1 | 6 | 4 | 0 | 1 | 11 |
| Missing | 5 | 2 | 1 | 1 | 9 |
| Total | 36 | 23 | 3 | 8 | 70 |
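The empty cell (Y = 1, X = 2) is what makes this a perfect-prediction example: among cases with both variables observed, X = 2 occurs only with Y = 0, so in a logistic imputation model for Y with X as a categorical predictor the X = 2 coefficient has no finite maximum-likelihood estimate. A minimal Python sketch of the cell check, using the observed counts from the table and ignoring the missing-value margins; `sparse_cells` and its `min_count` parameter are hypothetical helpers for illustration, not part of Stata or any library:

```python
# Observed counts from the cross-tabulation above (cases with both Y and X observed).
# Keys are (y, x) pairs; the "Missing" row and column are deliberately excluded.
counts = {
    (0, 0): 25, (0, 1): 17, (0, 2): 2,
    (1, 0): 6,  (1, 1): 4,  (1, 2): 0,
}


def sparse_cells(table, min_count=1):
    """Return (y, x) cells whose count is below min_count, within observed X levels.

    With min_count=1 only empty cells are returned. For a binary Y, an empty cell
    means that X level occurs with a single value of Y among complete cases, so it
    predicts Y perfectly in a logistic model that treats X as categorical.
    """
    x_levels = sorted({x for _, x in table})
    y_levels = sorted({y for y, _ in table})
    flagged = []
    for x in x_levels:
        column_total = sum(table.get((y, x), 0) for y in y_levels)
        if column_total == 0:
            continue  # level never observed, so nothing to predict
        for y in y_levels:
            if table.get((y, x), 0) < min_count:
                flagged.append((y, x))
    return flagged


print(sparse_cells(counts))               # [(1, 2)]
print(sparse_cells(counts, min_count=5))  # [(1, 1), (0, 2), (1, 2)]
```

With the default only the empty cell is returned; raising `min_count` also flags sparse cells. The threshold of 5 is an arbitrary illustrative choice, not a value from the source.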
Correlation matrix of simulated variables Y, V1, V2, V3 and V4
| | Y | V1 | V2 | V3 | V4 |
|---|---|---|---|---|---|
| Y | 1 | | | | |
| V1 | −0.09 | 1 | | | |
| V2 | −0.02 | 0.94 | 1 | | |
| V3 | −0.18 | 0.92 | 0.92 | 1 | |
| V4 | −0.18 | 0.89 | 0.93 | 0.96 | 1 |
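V1 to V4 are strongly intercorrelated (0.89 to 0.96), the pattern that produces collinearity when they are used together as predictors in an imputation model for Y. A short Python sketch, assuming NumPy is available, that lists pairs above a correlation threshold and computes variance inflation factors (VIFs) for V1 to V4 directly from the matrix, using the standard identity that the VIF for predictor j is the j-th diagonal element of the inverse of the predictors' correlation matrix. The helper names and the 0.9 threshold are illustrative assumptions; a VIF above 10 is a common rule of thumb, not a cut-off from the source.

```python
import numpy as np

names = ["Y", "V1", "V2", "V3", "V4"]
# The lower triangle from the table above, filled in symmetrically.
R = np.array([
    [1.00, -0.09, -0.02, -0.18, -0.18],
    [-0.09, 1.00, 0.94, 0.92, 0.89],
    [-0.02, 0.94, 1.00, 0.92, 0.93],
    [-0.18, 0.92, 0.92, 1.00, 0.96],
    [-0.18, 0.89, 0.93, 0.96, 1.00],
])


def high_pairs(corr, labels, threshold=0.9):
    """Pairs of variables whose absolute correlation exceeds the threshold."""
    out = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if abs(corr[i, j]) > threshold:
                out.append((labels[i], labels[j], float(corr[i, j])))
    return out


def vif_from_corr(corr):
    """VIF_j is the j-th diagonal element of the inverse predictor correlation matrix."""
    return np.diag(np.linalg.inv(corr))


# Treat V1-V4 as the predictors in an imputation model for Y.
predictors = R[1:, 1:]
print(high_pairs(R, names))
print({n: round(float(v), 1) for n, v in zip(names[1:], vif_from_corr(predictors))})
```

With these values every pair among V1 to V4 exceeds 0.9 except V1 with V4 (0.89), and the VIFs are well above 1, indicating that each of V1 to V4 is largely predictable from the other three.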
Strategies for exploring reasons for failed imputation procedures
| Strategy | Problem identified |
|---|---|
| Remove variables from the imputation model in turn | If the model runs successfully after omitting a particular variable, this might provide some insight into which variable(s) is causing the problem |
| Create cross-tabulations of categorical variables in the imputation model (such as the cross-tabulation of Y and X above) | Look for sparse or empty cells, as these may cause perfect prediction. It may be necessary to explore patterns across more than two variables, because perfect prediction can occur within strata defined by combinations of several variables |
| Explore correlations between variables | This can help identify possible sources of collinearity |
| Examine any output the software produces before the MI procedure breaks down (e.g. interim estimates of model parameters) | Look for signs of collinearity, such as large standard errors and coefficients that are unstable across iterations. If the software omits variables from a model, this may also signal perfect prediction or collinearity. If the imputation procedure runs for a long time before failing, it may be advisable to run a small number of iterations in order to obtain some output |
| For problems with MICE, test the univariate imputation models outside the MICE framework by fitting them to the observed data (i.e. complete cases) | Check whether the software removes any variables or issues warnings when fitting the univariate models, as these messages may contain information that is not reported when the imputation procedure fails. When fitting the univariate models, additional diagnostics can also be used, such as the variance inflation factor, which indicates whether standard errors are inflated due to collinearity [
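The first strategy in the table, removing variables in turn, can be automated. The sketch below is an illustrative Python stand-in, not the authors' Stata workflow: as a cheap proxy for "does the model run", it checks whether the design matrix (with an intercept) has full column rank, then reports which single omission restores full rank. The data are hypothetical, constructed so that V3 is an exact linear combination of V1 and V2; `full_rank` and `drop_one_diagnostic` are hypothetical helper names.

```python
import numpy as np


def full_rank(X):
    """True if the columns of X, plus an intercept column, are linearly independent."""
    design = np.column_stack([np.ones(X.shape[0]), X])
    return np.linalg.matrix_rank(design) == design.shape[1]


def drop_one_diagnostic(X, labels, check=full_rank):
    """Re-run `check` with each variable omitted in turn; return omissions that pass."""
    fixes = []
    for j, name in enumerate(labels):
        reduced = np.delete(X, j, axis=1)
        if check(reduced):
            fixes.append(name)
    return fixes


# Hypothetical data: V3 = V1 + V2 exactly, so V1, V2 and V3 are perfectly collinear.
rng = np.random.default_rng(1)
v1 = rng.normal(size=100)
v2 = rng.normal(size=100)
v4 = rng.normal(size=100)
X = np.column_stack([v1, v2, v1 + v2, v4])
labels = ["V1", "V2", "V3", "V4"]

print(full_rank(X))                    # False
print(drop_one_diagnostic(X, labels))  # ['V1', 'V2', 'V3']
```

Three different omissions each fix the problem here, which illustrates why dropping variables in turn "might provide some insight" rather than pinpointing a single culprit. In practice `check` would be replaced by an attempt to run the actual imputation model.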
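Fitting the univariate model outside MICE and examining interim estimates can be illustrated together using the complete cases from the cross-tabulation of Y and X. The sketch below is a hand-written Newton-Raphson logistic regression in Python with NumPy (our own illustration, not Stata's `logit`), using one indicator per level of X and no intercept, so each coefficient is the log-odds of Y = 1 at that level. Because X = 2 occurs only with Y = 0, its estimate should keep drifting rather than converge.

```python
import numpy as np

# Complete cases from the cross-tabulation: (x level, y value, count).
cells = [(0, 0, 25), (0, 1, 6), (1, 0, 17), (1, 1, 4), (2, 0, 2), (2, 1, 0)]
x = np.concatenate([np.full(n, xv) for xv, _, n in cells])
y = np.concatenate([np.full(n, yv, dtype=float) for _, yv, n in cells])

# Cell-means coding: one indicator column per X level, no intercept.
X = np.column_stack([(x == level).astype(float) for level in (0, 1, 2)])


def newton_logit(X, y, iterations=10):
    """Newton-Raphson for logistic regression; returns the estimate after each iteration."""
    beta = np.zeros(X.shape[1])
    history = []
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        information = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(information, gradient)
        history.append(beta.copy())
    return np.array(history)


history = newton_logit(X, y)
for i, b in enumerate(history, start=1):
    print(i, np.round(b, 3))
```

The X = 0 and X = 1 coefficients settle at log(6/25) and log(4/17) within a few iterations, while the X = 2 coefficient moves from −2 after the first iteration to below −11 after the tenth, falling by close to 1 each time. Large, steadily drifting estimates like this are the kind of interim output worth looking for.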
Fig. 1 Results of five imputation approaches applied to data from the Longitudinal Study of Australian Children. Results shown are log-odds ratios and 95% confidence intervals for the association between BMI z-score and poor health-related quality of life. The five imputation approaches correspond to the scenarios with a “Yes” in the “Successful?” column in Additional file 1: Supplementary Table 2, i.e. strategies 2, 3, 8, 9 and 11 respectively. MICE, multivariate imputation by chained equations; MVNI, multivariate normal imputation