Md Hamidul Huque, John B Carlin, Julie A Simpson, Katherine J Lee.
Abstract
BACKGROUND: Multiple imputation (MI) is now widely used to handle missing data in longitudinal studies. Several MI techniques have been proposed to impute incomplete longitudinal covariates, including standard fully conditional specification (FCS-Standard) and joint multivariate normal imputation (JM-MVN), which treat repeated measurements as distinct variables, and various extensions based on generalized linear mixed models. Although these MI approaches have been implemented in various software packages, there has not been a comprehensive evaluation of the relative performance of these methods in the context of longitudinal data.
Keywords: FCS; Joint modelling; Linear mixed model; Longitudinal missing data; MICE; Multilevel multiple imputation; Multiple imputation
Year: 2018 PMID: 30541455 PMCID: PMC6292063 DOI: 10.1186/s12874-018-0615-6
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Summary of imputation approaches for handling missing data in longitudinal studies available in standard software
| MI approaches | Method | Details | Software |
|---|---|---|---|
| Joint modelling (JM) | JM-MVN | • Repeated measurements of time-dependent variables are imputed as distinct variables. | SAS (7), SPSS (42), Stata (8), Mplus (43) and R (9) |
| | JM-MLMM | • Repeated measurements of time-dependent variables are imputed using hierarchical models. | Mplus, |
| | JM-MLMM-LN | • Repeated measurements of time-dependent variables are imputed using hierarchical models. | Realcom-impute [ |
| Fully conditional specification (FCS) | FCS-Standard | • Repeated measurements of time-dependent variables are imputed as distinct variables. | SAS, SPSS, Stata, Mplus and R |
| | FCS-Twofold | • Repeated measurements of time-dependent variables are imputed as distinct variables. | Stata |
| | FCS-MTW | • Repeated measurements of time-dependent variables are imputed as distinct variables. | Stata |
| | FCS-LMM | • Repeated measurements of time-dependent variables are imputed using hierarchical models. | R package mice ( |
| | FCS-LMM-het | • Repeated measurements of time-dependent variables are imputed using hierarchical models. | R package mice ( |
| | FCS-GLMM | • Repeated measurements of time-dependent variables are imputed using hierarchical models. | R package micemd [ |
| | FCS-MLMM-LN | • Repeated measurements of time-dependent variables are imputed using hierarchical models. | Mplus, R package micemd |
| | FCS-LMM-LN | • Repeated measurements of time-dependent variables are imputed using hierarchical models. | Blimp [ |
| | FCS-LMM-PMM | • Repeated measurements of time-dependent variables are imputed using hierarchical models. | R package miceadds [ |
The following abbreviations are used to denote the different MI methods. MVN: multivariate normal imputation; MLMM: multivariate linear mixed-effects model; MLMM-LN: multivariate linear mixed-effects model with latent normal variables; LMM: linear mixed-effects model; PMM: predictive mean matching; GLMM: generalised linear mixed-effects model; MTW: moving time window.
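To make the phrase "imputed as distinct variables" concrete, the following minimal sketch reshapes long-format repeated measurements into one column per wave, which is the layout that methods such as JM-MVN and FCS-Standard work with. The `to_wide` helper, the `(id, wave, value)` record layout and the `bmiz` variable name are illustrative assumptions and do not come from any of the software packages listed in the table.

```python
from collections import defaultdict


def to_wide(long_rows, value_name, waves):
    """Reshape long-format rows (id, wave, value) into one record per id
    with one column per wave. Waves with no record become None (missing)."""
    by_id = defaultdict(dict)
    for pid, wave, value in long_rows:
        by_id[pid][wave] = value
    return {
        pid: {f"{value_name}_w{w}": observed.get(w) for w in waves}
        for pid, observed in by_id.items()
    }


# Hypothetical toy data: participant 2 has no record at wave 2.
long_rows = [
    (1, 1, 0.3), (1, 2, 0.5), (1, 3, None),
    (2, 1, -0.1), (2, 3, 0.2),
]
wide = to_wide(long_rows, "bmiz", waves=[1, 2, 3])
print(wide)
```

In the wide layout each wave-specific column is a separate variable with its own missing values, whereas the hierarchical (mixed-model) methods keep the data in long format and model the within-individual correlation directly.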
Comparisons of the missing data proportions in both LSAC and simulated data
| Data collection wave | BMI z-score, case study | BMI z-score, simulation study | FamStruc, case study | FamStruc, simulation study |
|---|---|---|---|---|
| 1 | 0.06 | 0.01 | 0.00 | 0.00 |
| 2 | 0.08 | 0.06 | 0.03 | 0.06 |
| 3 | 0.09 | 0.10 | 0.05 | 0.08 |
| 4 | 0.14 | 0.15 | 0.09 | 0.10 |
| 5 | 0.19 | 0.19 | 0.14 | 0.15 |
| 6 | 0.30 | 0.28 | 0.24 | 0.24 |
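Each entry in this table is the number of missing values at a wave divided by the sample size. A minimal sketch of that calculation, assuming wide-format records in which a missing measurement is stored as `None`; the helper name `missing_proportion_by_wave`, the column naming scheme and the toy data are hypothetical and are not taken from LSAC.

```python
def missing_proportion_by_wave(wide_rows, prefix, waves):
    """Return {wave: proportion of records whose value at that wave is None}."""
    n = len(wide_rows)
    return {
        w: sum(row.get(f"{prefix}_w{w}") is None for row in wide_rows) / n
        for w in waves
    }


# Hypothetical toy data with four participants and two waves.
rows = [
    {"bmiz_w1": 0.3, "bmiz_w2": None},
    {"bmiz_w1": -0.1, "bmiz_w2": 0.4},
    {"bmiz_w1": None, "bmiz_w2": None},
    {"bmiz_w1": 1.2, "bmiz_w2": 0.9},
]
props = missing_proportion_by_wave(rows, "bmiz", waves=[1, 2])
print(props)
```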
Fig. 1. Distribution of the bias in the estimated regression coefficients (i.e., mean changes in the QoL z-score associated with each covariate) for analysis model (1) across the 1000 simulated datasets, for the complete-data analysis, the available-data analysis and 12 multiple imputation methods. The top and bottom panels show the distribution of the bias in the estimated regression coefficients for covariates with missing data, whereas the middle panel shows the distribution of the bias associated with the fully observed covariate.
Fig. 2. Estimated coverage of the 95% confidence interval for the regression coefficients in analysis model (1), derived from 1000 simulated datasets. The dotted lines indicate the nominal value of 95%.
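The two performance measures named in these captions, bias and coverage, can be sketched as follows. This is a minimal illustration under stated assumptions: it takes one point estimate and one standard error per simulated dataset, and it builds each interval with a normal approximation (`z = 1.96`), which may differ from the interval construction used in the study. The function name `bias_and_coverage` and the toy numbers are hypothetical.

```python
import statistics


def bias_and_coverage(estimates, std_errors, true_value, z=1.96):
    """Return (mean bias, coverage proportion) across simulated datasets.

    Bias is estimate minus true value. Coverage is the fraction of
    intervals estimate +/- z * SE that contain the true value.
    """
    biases = [est - true_value for est in estimates]
    covered = [
        (est - z * se) <= true_value <= (est + z * se)
        for est, se in zip(estimates, std_errors)
    ]
    return statistics.mean(biases), sum(covered) / len(covered)


# Hypothetical toy results from four simulated datasets.
estimates = [0.48, 0.52, 0.55, 0.41]
std_errors = [0.03, 0.03, 0.02, 0.03]
mean_bias, coverage = bias_and_coverage(estimates, std_errors, true_value=0.5)
print(mean_bias, coverage)
```

With 1000 simulated datasets, as in the figures, a coverage estimate close to 0.95 indicates that the intervals behave as intended, while values well below 0.95 point to bias or underestimated standard errors.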
Fig. 3. Distribution of the bias in the estimated regression coefficients (i.e., mean changes in the QoL z-score associated with each covariate) for analysis model (2) across the 1000 simulated datasets, for the complete-data analysis, the available-data analysis and 12 multiple imputation methods. Top, left and bottom right panels show the distribution of the bias in the estimated regression coefficients for covariates with missing data, and all other panels show the distribution of the bias associated with fully observed covariates.
Fig. 4. Estimated coverage of the 95% confidence interval for the regression coefficients in analysis model (2), derived from 1000 simulated datasets. The dotted lines indicate the nominal value of 95%.
Fig. 5. Average computational time (in seconds) for a single imputation for each of the MI methods when applied to a single simulated dataset.
Fig. 6. Estimated regression coefficients with 95% CI for analysis model (1), obtained from the available-data analysis and from all the approaches to handling missing data in LSAC.
Fig. 7. Estimated regression coefficients with 95% CI for analysis model (2), obtained from the available-data analysis and from all the MI approaches to handling missing data in LSAC.