
Variable Selection in the Presence of Missing Data: Imputation-based Methods.

Yize Zhao, Qi Long.

Abstract

Variable selection plays an essential role in regression analysis: it identifies important variables that are associated with outcomes and is known to improve the predictive accuracy of the resulting models. Variable selection methods have been widely investigated for fully observed data. In the presence of missing data, however, methods for variable selection need to be carefully designed to account for the missing data mechanism and for the statistical techniques used to handle the missing data. Since imputation is arguably the most popular method for handling missing data due to its ease of use, statistical methods for variable selection that are combined with imputation are of particular interest. These methods, which are valid under the missing at random (MAR) and missing completely at random (MCAR) assumptions, largely fall into three general strategies. The first strategy applies existing variable selection methods to each imputed dataset and then combines the variable selection results across all imputed datasets. The second strategy applies existing variable selection methods to the stacked imputed datasets. The third strategy combines resampling techniques such as the bootstrap with imputation. Despite recent advances, this area remains under-developed and offers fertile ground for further research.
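The three strategies can be contrasted in a small simulation. The sketch below is illustrative only and is not the authors' method. It assumes a linear outcome model, a single incomplete covariate (column 0) that is missing completely at random, and stochastic regression imputation without parameter draws, so it is a simplification of proper multiple imputation. It also assumes the lasso with a fixed penalty `lam` as the "existing variable selection method" and a 0.5 threshold for majority voting and bootstrap inclusion frequency. All function names (`lasso_cd`, `impute_once`, `select_each_then_vote`, `select_stacked`, `select_bootstrap_impute`) are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent, minimizing
    (1/(2n))||y - Xb||^2 + lam*||b||_1 on standardized X and centered y."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # columns: mean 0, mean square 1
    r = y - y.mean()                           # residual when b = 0
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                # partial residual without feature j
            z = X[:, j] @ r / n
            b[j] = np.sign(z) * max(abs(z) - lam, 0.0)  # soft-thresholding
            r -= X[:, j] * b[j]
    return b


def simulate(rng, n=200):
    """Hypothetical data: true predictors are columns 0, 2 and 5."""
    X = rng.normal(size=(n, 6))
    beta = np.array([1.5, 0.0, 1.0, 0.0, 0.0, 0.8])
    y = X @ beta + rng.normal(size=n)
    X[rng.random(n) < 0.3, 0] = np.nan         # MCAR: about 30% missing in column 0
    return X, y


def impute_once(X, y, rng):
    """Stochastic regression imputation of column 0 from the other covariates
    and the outcome (assumption: only column 0 is incomplete)."""
    Xc = X.copy()
    miss = np.isnan(Xc[:, 0])
    Z = np.column_stack([np.ones(len(y)), Xc[:, 1:], y])
    coef, *_ = np.linalg.lstsq(Z[~miss], Xc[~miss, 0], rcond=None)
    sigma = (Xc[~miss, 0] - Z[~miss] @ coef).std(ddof=Z.shape[1])
    Xc[miss, 0] = Z[miss] @ coef + rng.normal(0.0, sigma, miss.sum())
    return Xc


def select_each_then_vote(X, y, rng, M=10, lam=0.25, pi=0.5):
    """Strategy 1: select on each imputed dataset, keep variables chosen in >= pi of them."""
    votes = np.zeros(X.shape[1])
    for _ in range(M):
        votes += lasso_cd(impute_once(X, y, rng), y, lam) != 0
    return {int(j) for j in np.flatnonzero(votes / M >= pi)}


def select_stacked(X, y, rng, M=10, lam=0.25):
    """Strategy 2: one lasso fit on the M imputed datasets stacked row-wise.
    The 1/(2nM) scaling gives each imputed copy equal weight 1/M (no extra weights)."""
    Xs = np.vstack([impute_once(X, y, rng) for _ in range(M)])
    ys = np.tile(y, M)
    return {int(j) for j in np.flatnonzero(lasso_cd(Xs, ys, lam) != 0)}


def select_bootstrap_impute(X, y, rng, B=50, lam=0.25, pi=0.5):
    """Strategy 3: bootstrap the incomplete data, impute each resample, select,
    and keep variables whose inclusion frequency is >= pi."""
    n = len(y)
    counts = np.zeros(X.shape[1])
    for _ in range(B):
        idx = rng.integers(0, n, n)
        Xb, yb = X[idx], y[idx]
        counts += lasso_cd(impute_once(Xb, yb, rng), yb, lam) != 0
    return {int(j) for j in np.flatnonzero(counts / B >= pi)}


if __name__ == "__main__":
    rng = np.random.default_rng(2017)
    X, y = simulate(rng)
    print("each-then-vote:", select_each_then_vote(X, y, rng))
    print("stacked:", select_stacked(X, y, rng))
    print("bootstrap + imputation:", select_bootstrap_impute(X, y, rng))
```

The imputation model includes the outcome `y` so that the imputed covariate keeps its association with the outcome. In practice the penalty would be tuned, for example by cross-validation or an information criterion, rather than fixed as it is here.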


Keywords:  MAR; MCAR; MNAR; bootstrap; imputation; missing data; resampling; variable selection

Year: 2017; PMID: 29085552; PMCID: PMC5659333; DOI: 10.1002/wics.1402

Source DB: PubMed; Journal: Wiley Interdiscip Rev Comput Stat; ISSN: 1939-0068


