
A fair comparison of tree-based and parametric methods in multiple imputation by chained equations.

Emily Slade, Melissa G Naylor.

Abstract

Multiple imputation by chained equations (MICE) has emerged as a leading strategy for imputing missing epidemiological data because it is easy to implement and can yield unbiased effect estimates and valid inference. Within the MICE algorithm, imputation can be performed using a variety of parametric or nonparametric methods. Previous studies have suggested that nonparametric tree-based imputation methods outperform parametric methods in terms of bias and coverage when there are interactions or other nonlinear effects among the variables. However, these studies do not provide a fair comparison, because they do not follow the well-established recommendation that any effects in the final analysis model (including interactions) should also be included in the parametric imputation model. We show via simulation that properly incorporating interactions in the parametric imputation model leads to much better performance. In fact, correctly specified parametric imputation and tree-based random forest imputation perform similarly when estimating the interaction effect. Parametric imputation leads to slightly higher coverage for the interaction effect, but it produces wider confidence intervals than random forest imputation and requires correct specification of the imputation model. Epidemiologists should take care in specifying MICE imputation models, and this paper assists in that task by providing a fair comparison of parametric and tree-based imputation in MICE.
© 2020 John Wiley & Sons, Ltd.
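The abstract's central point is that an interaction in the analysis model must also appear in the parametric imputation model. A minimal, stdlib-only Python sketch can illustrate it. This is not the authors' simulation. The data-generating model (binary X2, Y = X1 + 0.5·X2 + X1·X2 + e), the 40% missing-completely-at-random values in X1, the sample size, the seed, and all function names are hypothetical choices made here. Because only X1 is incomplete, no chaining is needed, and each pass is a single stochastic linear-regression imputation step. The sketch does not draw the regression parameters from their posterior, as proper MICE would. X1 is imputed in one of two ways: either from Y and X2 alone, or separately within each level of X2. The second approach is one way of including the X1×X2 interaction in the imputation model. The completed-data interaction estimates are then averaged over 5 imputations.

```python
import random

# Hypothetical data-generating model (not from the paper):
# Y = 1.0*X1 + 0.5*X2 + 1.0*X1*X2 + e, so the true interaction is 1.0.
TRUE_INTERACTION = 1.0


def ols(rows, ys):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(p)]
    for col in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        for k in range(col + 1, p):
            f = a[k][col] / a[col][col]
            for c in range(col, p + 1):
                a[k][c] -= f * a[col][c]
    b = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        b[i] = (a[i][p] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    return b


def fit(rows, ys):
    """Return OLS coefficients and the residual standard deviation."""
    b = ols(rows, ys)
    sse = sum((y - sum(bi * xi for bi, xi in zip(b, r))) ** 2 for r, y in zip(rows, ys))
    return b, (sse / (len(ys) - len(b))) ** 0.5


def simulate(n, rng, p_miss=0.4):
    """Rows (x1 or None, x2, y); x1 is missing completely at random."""
    data = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = 1 if rng.random() < 0.5 else 0
        y = 1.0 * x1 + 0.5 * x2 + TRUE_INTERACTION * x1 * x2 + rng.gauss(0.0, 1.0)
        data.append((None if rng.random() < p_miss else x1, x2, y))
    return data


def make_imputer(complete, stratified):
    """Return a function (x2, y) -> (mean, sd) of the imputation model for X1."""
    if stratified:
        # X1 ~ 1 + Y fitted separately in each level of X2: the intercept, the
        # slope on Y and the residual SD may all differ by X2 (interaction included).
        fits = {}
        for g in (0, 1):
            grp = [d for d in complete if d[1] == g]
            fits[g] = fit([[1.0, y] for _, _, y in grp], [x1 for x1, _, _ in grp])

        def predict(x2, y):
            b, sd = fits[x2]
            return b[0] + b[1] * y, sd
    else:
        # X1 ~ 1 + Y + X2: uses the outcome but omits the interaction.
        b, sd = fit([[1.0, y, x2] for _, x2, y in complete], [x1 for x1, _, _ in complete])

        def predict(x2, y):
            return b[0] + b[1] * y + b[2] * x2, sd
    return predict


def pooled_interaction(data, rng, stratified, m=5):
    """Impute X1 m times, fit Y ~ X1 + X2 + X1:X2, average the interaction estimates."""
    predict = make_imputer([d for d in data if d[0] is not None], stratified)
    estimates = []
    for _ in range(m):
        rows, ys = [], []
        for x1, x2, y in data:
            if x1 is None:
                mean, sd = predict(x2, y)
                x1 = rng.gauss(mean, sd)  # random draw, not the conditional mean
            rows.append([1.0, x1, x2, x1 * x2])
            ys.append(y)
        estimates.append(ols(rows, ys)[3])
    return sum(estimates) / m  # Rubin's rules point estimate (variance pooling omitted)


rng = random.Random(2020)
data = simulate(5000, rng)
naive = pooled_interaction(data, rng, stratified=False)
strat = pooled_interaction(data, rng, stratified=True)
print(f"true {TRUE_INTERACTION:.2f}  Y+X2 model: {naive:.2f}  stratified by X2: {strat:.2f}")
```

In this hypothetical setup, the Y + X2 imputation model should attenuate the interaction estimate toward zero, while the stratified model should land close to the true value of 1.0. Stratifying works here only because X2 is binary. With a continuous modifier, one would instead add product terms such as Y·X2 to the imputation model, which is generally only an approximation to the correct conditional distribution.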

Keywords:  imputation; interaction; missing data; regression tree

Year: 2020; PMID: 31997388; PMCID: PMC9136914; DOI: 10.1002/sim.8468

Source DB: PubMed; Journal: Stat Med; ISSN: 0277-6715; Impact factor: 2.497
