Panteha Hayati Rezvan, Katherine J Lee, Julie A Simpson.
Abstract
BACKGROUND: Missing data are common in medical research and, if not handled appropriately, can lead to a loss of statistical power and potentially biased results. Multiple imputation (MI) is a statistical method, widely adopted in practice, for dealing with missing data. Many academic journals now emphasise the importance of reporting information regarding missing data, and proposed guidelines for documenting the application of MI have been published. This review evaluated the reporting of missing data, the application of MI (including the details provided regarding the imputation model), and the frequency of sensitivity analyses within the MI framework in medical research articles.
Year: 2015 PMID: 25880850 PMCID: PMC4396150 DOI: 10.1186/s12874-015-0022-1
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Number of articles using multiple imputation from January 2008 to December 2013 by type of study
| Type of study | | | Total |
|---|---|---|---|
| Randomised controlled trials | 35 | 34 | 69 |
| Non-randomised controlled trials<sup>a</sup> | 2 | 2 | 4 |
| Total | 37 | 36 | 73 |
| | | | |
| Prospective studies | 11 | 7 | 18 |
| Retrospective studies<sup>b</sup> | 6 | 1 | 7 |
| Cross-sectional studies | 3 | 0 | 3 |
| Case–control studies | 1 | 1 | 2 |
| Total | 21 | 9 | 30 |
<sup>a</sup> Quasi-experimental studies; <sup>b</sup> studies that performed a retrospective analysis of routinely collected data.
Figure 1. Search results.
Figure 2. Number of articles in the Lancet and New England Journal of Medicine that used MI: overall and by study type.
Reporting of missing data in articles using multiple imputation
| n (%)* | n = 73 | n = 30 | Total (n = 103) |
|---|---|---|---|
| Availability of any information about the amount of missing data/complete cases | 46 (63) | 23 (77) | 69 (67) |
| Proportion of complete cases was stated/available | 31 (42) | 10 (33) | 41 (40) |
| Median [Range] % complete cases | 88 [47-99] | 41 [28-74] | 84 [28-99] |
| Proportion of missing by each variable was stated/available | 36 (49) | 19 (63) | 55 (53) |
| Assessed differences between individuals with complete and incomplete data in text? | 8 (11) | 5 (17) | 13 (13) |
| Provided a table | 3 | 4 | 7 |
| Statement regarding missing data mechanism assumed in the analysis | 13 (18) | 7 (23) | 20 (19) |
*Unless otherwise stated.
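The percentages in this table appear to use the group totals from the first table as denominators (73 and 30 articles, 103 overall). The following is a minimal Python sketch of that arithmetic, assuming those denominators apply to every percentage row. The names `DENOMINATORS`, `REPORTED` and `percentages` are illustrative and do not come from the article.

```python
# Assumed column denominators: the two group totals from the first table
# (73 and 30 articles) and their sum (103).
DENOMINATORS = (73, 30, 103)

# Counts and reported percentages copied from the rows of the table above.
REPORTED = {
    "Any information on amount of missing data": ((46, 23, 69), (63, 77, 67)),
    "Proportion of complete cases stated": ((31, 10, 41), (42, 33, 40)),
    "Proportion missing by variable stated": ((36, 19, 55), (49, 63, 53)),
    "Assessed complete vs incomplete differences": ((8, 5, 13), (11, 17, 13)),
    "Missing data mechanism stated": ((13, 7, 20), (18, 23, 19)),
}


def percentages(counts, denominators=DENOMINATORS):
    """Return each count as a whole-number percentage of its column denominator."""
    return tuple(round(100 * n / d) for n, d in zip(counts, denominators))


for label, (counts, reported) in REPORTED.items():
    status = "matches" if percentages(counts) == reported else "differs"
    print(f"{label}: computed {percentages(counts)}, reported {reported} ({status})")
```

The same check can be applied to the later tables, which appear to share the same three columns.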
Reporting of variables imputed in articles using multiple imputation
| n (%)* | n = 73 | n = 30 | Total (n = 103) |
|---|---|---|---|
| Variables imputed specified/available | 61 (84) | 28 (93) | 89 (86) |
| Number of variable(s) imputed | | | |
| 1 | 31 | 8 | 39 |
| 2 | 9 | 7 | 16 |
| >2 | 17 | 10 | 27 |
| Unclear<sup>a</sup> | 4 | 3 | 7 |
| Outcome variable imputed | | | |
| Yes | 55 (75) | 9 (30) | 64 (62) |
| Not stated | 12 (16) | 2 (7) | 14 (14) |
| No | 6 (8) | 19 (63) | 25 (24) |
| Type of outcome variable imputed | | | |
| Numerical | 31 | 6 | 37 |
| Categorical | 16 | 2 | 18 |
| Numerical and categorical | 8 | 1 | 9 |
| Number of imputed outcome variables | | | |
| 1 | 30 | 6 | 36 |
| 2 | 10 | 1 | 11 |
| >2<sup>b</sup> | 12 | 2 | 14 |
| Unclear<sup>a</sup> | 3 | 0 | 3 |
| Covariate imputed | | | |
| Yes | 13 (18) | 21 (70) | 34 (33) |
| Not stated | 12 (16) | 2 (7) | 14 (14) |
| No | 48 (66) | 7 (23) | 55 (53) |
| Type of covariates imputed | | | |
| Numerical | 6 | 4 | 10 |
| Categorical | 3 | 8 | 11 |
| Numerical and categorical | 3 | 8 | 11 |
| Unclear<sup>a,c</sup> | 1 | 1 | 2 |
| Number of imputed covariates | | | |
| 1 | 6 | 3 | 9 |
| 2 | 2 | 7 | 9 |
| >2 | 5 | 8 | 13 |
| Unclear<sup>a</sup> | 1 | 3 | 4 |
*Unless otherwise stated.
<sup>a</sup> Authors provided a generic statement regarding the imputed variables (e.g. the missing data in the covariates were imputed), and did not explicitly specify which outcome or covariate with missing data was imputed, so the number or type of imputed variables could not be verified.
<sup>b</sup> One article [128] imputed missing data in 5 incomplete variables for two questionnaires recorded at 6 different waves of data collection (i.e. 60 imputed variables).
<sup>c</sup> In one paper [98], the use of MI for imputing missing data in the covariates was derived from the cited reference, so the data type of imputed variables was not clear.
Reporting of MI procedure in articles using multiple imputation
| n (%)* | n = 73 | n = 30 | Total (n = 103) |
|---|---|---|---|
| Any imputation details provided<sup>a</sup> | 60 (82) | 27 (90) | 87 (85) |
| Imputation method stated | 29 (40) | 9 (30) | 38 (37) |
| MI using chained equations (MICE) | 14 | 6 | 20 |
| MI using multivariate normal model (MVNI)<sup>b</sup> | 7 | 1 | 8 |
| MI using predictive mean matching (PMM) | 1 | 0 | 1 |
| MI using regression-based imputation<sup>c</sup> | 4 | 1 | 5 |
| MI using MICE & PMM<sup>d</sup> | 1 | 1 | 2 |
| MI using propensity score | 1 | 0 | 1 |
| MI using propensity score or regression modelling<sup>e</sup> | 1 | 0 | 1 |
| General procedure/command specified | 5 (7) | 2 (7) | 7 (7) |
| Proc MI | 4 | 1 | 5 |
| MI command | 0 | 1 | 1 |
| Model-based MI<sup>f</sup> | 1 | 0 | 1 |
| Imputation method inferred | 11 (15) | 10 (33) | 21 (20) |
| MICE (SAS, IVEware) | 1 | 2 | 3 |
| MICE (Stata, pre-V11) | 1 | 2 | 3 |
| MICE (multiple packages)<sup>g</sup> | 1 | 0 | 1 |
| MVNI (SAS pre-V9.3, imputed more than 1 variable) | 5 | 1 | 6 |
| MVNI (R, Amelia II) | 0 | 2 | 2 |
| MVNI (S-plus) | 2 | 0 | 2 |
| Regression-based imputation (SAS pre-V9.3, imputed 1 categorical variable) | 1 | 3 | 4 |
| Non-normal variables transformed prior to imputation | 6 (8) | 6 (20) | 12 (12) |
| Log transformation<sup>h</sup> | 4 | 4 | 8 |
| Logit transformation | 0 | 1 | 1 |
| General comment about applying normalising transformation | 2 | 1 | 3 |
| Provided details on the variables included in the imputation model | 26 (36) | 13 (43) | 39 (38) |
| Included auxiliary variable(s) | 6 | 4 | 10 |
| Included interaction term(s) | 2 | 2 | 4 |
| Included auxiliary variable and interaction | 3 | 2 | 5 |
| No information provided on auxiliary variables and interaction terms | 15 | 5 | 20 |
| Number of imputations | 28 (38) | 19 (63) | 47 (46) |
| ≤5 | 8 | 3 | 11 |
| 10 | 6 | 3 | 9 |
| 11-50 | 8 | 6 | 14 |
| 100 | 4 | 6 | 10 |
| >100 | 2 | 1 | 3 |
| Carried out diagnostic checks of the imputation model<sup>i</sup> | 0 (0) | 2 (7) | 2 (2) |
| Assessed differences between results obtained from CC/LOCF and MI in the text/table<sup>j</sup> | 45 (62) | 17 (57) | 62 (60) |
| | | | |
| Imputation software stated<sup>k,l</sup> | 51 (70) | 25 (83) | 76 (74) |
| SAS | 23 | 10 | 33 |
| Stata | 18 | 9 | 27 |
| R | 6 | 6 | 12 |
| Other packages (SOLAS, S-plus, SPSS) | 4 | 0 | 4 |
| | | | |
| MI used in the primary analysis | 26 (36) | 12 (40) | 38 (37) |
| MI used as a secondary analysis | 47 (64) | 19 (63) | 66<sup>l</sup> (64) |
| Methods used for primary analysis if MI applied as a secondary analysis | | | |
| Complete case analysis (CC)<sup>m,n</sup> | 43 | 19 | 62 |
| Last observation carried forward (LOCF) | 4 | 0 | 4 |
| Sensitivity analysis following MI | 3 (4) | 0 (0) | 3 (3) |
| Pattern-mixture model approach | 1 | 0 | 1 |
| Selection model approach | 0 | 0 | 0 |
| Performed but the method not stated<sup>o</sup> | 2 | 0 | 2 |
*Unless otherwise stated.
Abbreviations: MI, multiple imputation; MICE, multiple imputation by chained equations; MVNI, multivariate normal imputation; PMM, predictive mean matching; MCMC, Markov chain Monte Carlo; CC, complete case; LOCF, last observation carried forward.
<sup>a</sup> Any information provided by the authors with regard to the imputation process. Note: a general procedure/command stated by the authors and imputation methods inferred by the reviewers are not included in this category.
<sup>b</sup> In five articles [35,61,68,90] MI via MCMC algorithm was used for imputing missing data.
<sup>c</sup> Three articles [40,47,84] stated logistic regression, and two articles [39,113] stated linear regression, as the imputation method for handling missing data.
<sup>d</sup> Two articles [61,93] imputed one or two variables with missing data under PMM (because of non-normality), and imputed other incomplete variables under MICE.
<sup>e</sup> One article [91] stated that MI was used on the basis of either propensity scoring or regression modelling for imputation of missing data in the primary and secondary outcome measures.
<sup>f</sup> One article [51] stated that model-based MI was used to account for missing data in the clinical outcome.
<sup>g</sup> In one article [77] multiple packages were used for the analyses, i.e. SPSS version 15.0 and Stata version 10.1. The default imputation method in either of these packages (given the specified versions) was chained equations.
<sup>h</sup> One article [93] used both the square root and log transformations for non-normally distributed variables.
<sup>i</sup> Both articles [82,130] compared the observed and imputed data.
<sup>j</sup> The MI estimates were not provided in 6 articles [34,37,81,85,87,120]; instead, a comparison of the results between the different approaches for dealing with the missing data was commented on in the text (e.g. the analysis of complete cases and the imputed data provided the same results).
<sup>k</sup> For eight articles [59,77,81,88,94,96,115,127] it was not possible to extract this information because multiple packages for the statistical analyses were mentioned with no explicit statement regarding which package was used for imputation.
<sup>l</sup> Articles that did not provide the name of the imputation software (R, Stata, SAS, etc.), but instead gave the name of the procedure/application used for imputing missing data (e.g. Amelia II, IVEware), were also included here.
<sup>m</sup> One article [99] used MI as well as CC for the primary analysis to impute the missing confounder values (with no imputation of missing data in the exposure and outcome), and used MI again as a sensitivity analysis to impute missing data in all confounders and the outcome (but not the exposure), as well as a CC analysis.
<sup>n</sup> Two articles [40,100] used LOCF for the secondary analysis as well as MI; one of them described the MI as a sensitivity analysis.
<sup>o</sup> A general statement was made about performing a sensitivity analysis, but the results and details were not provided.
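The table above records how many imputations each article used, but not how results from the imputed datasets are combined. For orientation, the following is a minimal Python sketch of the standard pooling step for MI (Rubin's rules). It is not taken from any of the reviewed articles, and the function name `pool_rubin` and the example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)        # average of the m point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divides by m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results from m = 5 imputed datasets (illustrative values only).
example_estimates = [1.0, 1.2, 0.8, 1.1, 0.9]
example_variances = [0.04, 0.04, 0.04, 0.04, 0.04]
pooled_estimate, total_variance = pool_rubin(example_estimates, example_variances)
print(pooled_estimate, total_variance)
```

The between-imputation term is what distinguishes MI from single imputation: it inflates the variance to reflect uncertainty about the missing values.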