

missForest with feature selection using binary particle swarm optimization improves the imputation accuracy of continuous data.

Heejin Jin, Surin Jung, Sungho Won.

Abstract

BACKGROUND: Missing data are a common problem in large-scale datasets, and their appropriate handling is crucial for data analysis. Missingness can be categorized as (1) missing completely at random (MCAR), (2) missing at random (MAR), and (3) missing not at random (MNAR). Different missingness mechanisms require different imputation strategies. Multiple imputation, an approach that averages outcomes across multiple imputed datasets, is more suitable than single imputation for dealing with various missingness mechanisms. missForest, a nonparametric missing-value imputation strategy based on random forests, is one of the most prevalent multiple imputation methods for missing data because it can be applied to mixed-type data and does not require distributional assumptions. However, a recent study found that missForest can produce biased results for non-normal data. In addition, missForest is computationally expensive.
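
To make the three missingness categories concrete, the following is a minimal Python sketch that masks one variable under each mechanism. It is an illustration only and is not taken from the study: the function name `make_missing`, the two-column layout (an always-observed `x` and a partially missing `y`), the median thresholds, and the `rate` parameter are all assumptions chosen for the example.

```python
import random


def make_missing(rows, mechanism, rate=0.3, seed=0):
    """Return a copy of rows, a list of (x, y) pairs, with some y set to None.

    x is always observed; y is the variable that receives missing values.
    The thresholds and probabilities are illustrative assumptions.
    """
    rng = random.Random(seed)
    x_median = sorted(r[0] for r in rows)[len(rows) // 2]
    y_median = sorted(r[1] for r in rows)[len(rows) // 2]
    out = []
    for x, y in rows:
        if mechanism == "MCAR":
            # Same probability for every row, unrelated to any value.
            p = rate
        elif mechanism == "MAR":
            # Depends only on the observed variable x.
            p = 2 * rate if x > x_median else 0.0
        elif mechanism == "MNAR":
            # Depends on the value of y itself, which becomes unobserved.
            p = 2 * rate if y > y_median else 0.0
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append((x, None if rng.random() < p else y))
    return out


# Toy data: two independent standard-normal columns.
data_rng = random.Random(1)
rows = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
mcar = make_missing(rows, "MCAR")
mar = make_missing(rows, "MAR")
mnar = make_missing(rows, "MNAR")
```

Under MAR the missing rows can be predicted from `x` alone, whereas under MNAR the information needed to explain the missingness is itself lost, which is why the mechanism affects the choice of imputation strategy.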
OBJECTIVE: We therefore aimed to further develop the missForest algorithm by combining it with a binary particle swarm optimization (BPSO)-based feature-selection strategy.
METHODS: BPSO is an evolutionary algorithm that is well known for global optimization and computational efficiency. Applying a BPSO-based feature-selection step before imputing missing values with missForest prunes redundant variables, which can increase the imputation accuracy for continuous variables.
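
The feature-selection step described above can be sketched with a generic binary PSO, in which each particle is a bit mask over candidate predictors and a sigmoid of the velocity gives the probability that a bit is set. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the fitness function or the swarm settings, so `toy_fitness`, `RELEVANT`, and the defaults for `w`, `c1`, `c2`, and `v_max` are hypothetical. In practice the fitness would be an imputation-error estimate obtained by running missForest on the selected columns.

```python
import math
import random


def bpso(fitness, n_features, n_particles=20, n_iter=50,
         w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Minimize fitness(mask) over binary masks with binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Pull the velocity toward the personal and global bests.
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp to avoid saturation
                vel[i][d] = v
                # Sigmoid maps the velocity to the probability of bit = 1.
                prob_one = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical stand-in for an imputation-error estimate: omitting an
# informative column costs 1.0, keeping a redundant column costs 0.1.
RELEVANT = {0, 2, 5}


def toy_fitness(mask):
    missed = sum(1 for j in RELEVANT if mask[j] == 0)
    extra = sum(1 for j, bit in enumerate(mask) if bit and j not in RELEVANT)
    return missed + 0.1 * extra


best_mask, best_fit = bpso(toy_fitness, n_features=8)
selected = [j for j, bit in enumerate(best_mask) if bit]
```

The columns listed in `selected` would then be the only predictors passed to the imputation step. Because each fitness evaluation in a real setting involves fitting random forests, the swarm size and iteration count directly determine the added computational cost.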
RESULTS: missForest with BPSO (BPSOmf), which performs feature selection before the imputation step, showed better imputation accuracy for continuous variables than missForest alone.
CONCLUSIONS: BPSOmf is an appropriate and robust method when the imputation target data consist mainly of continuous variables.


Keywords: Binary particle swarm optimization; Feature selection; Imputation; missForest


Year: 2022 PMID: 35384632 DOI: 10.1007/s13258-022-01247-8

Source DB: PubMed Journal: Genes Genomics ISSN: 1976-9571 Impact factor: 1.839
