A random forest method with feature selection for developing medical prediction models with clustered and longitudinal data.

Jaime Lynn Speiser

Abstract

BACKGROUND: Machine learning methodologies are gaining popularity for developing medical prediction models for datasets with a large number of predictors, particularly in the setting of clustered and longitudinal data. Binary Mixed Model (BiMM) forest is a promising machine learning algorithm which may be applied to develop prediction models for clustered and longitudinal binary outcomes. Although machine learning methods for clustered and longitudinal data, such as BiMM forest, exist, feature selection for these methods has not been analyzed via data simulations. Feature selection improves the practicality and ease of use of prediction models for clinicians by reducing the burden of data collection. Thus, feature selection procedures are not only beneficial, but are often necessary for development of medical prediction models. In this study, we aim to assess feature selection within the BiMM forest setting for modeling clustered and longitudinal binary outcomes.
METHODS: We conducted a simulation study to compare BiMM forest with feature selection (backward elimination or stepwise selection) to standard generalized linear mixed model feature selection methods (shrinkage and backward elimination). As an example application of the proposed methodology, we also evaluated the feature selection methods for developing models that predict mobility disability in older adults using the Health, Aging and Body Composition Study dataset.
RESULTS: BiMM forest with backward elimination generally offered higher computational efficiency, similar or higher predictive performance (accuracy and area under the receiver operating characteristic curve), and similar or higher ability to identify correct features compared to linear methods for the different simulated scenarios. For predicting mobility disability in older adults, methods generally performed similarly in terms of accuracy, area under the receiver operating characteristic curve, and specificity; however, BiMM forest with backward elimination had the highest sensitivity.
CONCLUSIONS: This study is novel because it is the first investigation of feature selection for developing random forest prediction models for clustered and longitudinal binary outcomes. Results from the simulation study reveal that, compared to other feature selection methods, BiMM forest with backward elimination has the highest accuracy (predictive performance and identification of correct features) and the lowest computation time in some scenarios, and similar performance in the other scenarios. Many informatics datasets have clustered and longitudinal outcomes, and results from this study suggest that BiMM forest with backward elimination may be beneficial for developing medical prediction models.
Copyright © 2021 Elsevier Inc. All rights reserved.
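
Illustrative note: the abstract names backward elimination as a feature selection strategy but does not spell out the procedure. The sketch below shows one generic, greedy form of backward elimination under stated assumptions; it is not the BiMM forest implementation and does not model clustering or random effects. The function names (`backward_eliminate`, `toy_score`), the feature names, and the `tolerance` parameter are hypothetical. The `toy_score` function is a made-up stand-in for a real validation metric such as cross-validated accuracy of a fitted model.

```python
from typing import Callable, List, Sequence, Tuple


def backward_eliminate(
    features: Sequence[str],
    score: Callable[[Tuple[str, ...]], float],
    tolerance: float = 0.0,
) -> List[str]:
    """Greedy backward elimination over a set of feature names.

    Starts from all features and repeatedly drops the single feature whose
    removal yields the best score, stopping once the best achievable score
    after a removal falls more than `tolerance` below the best score seen.
    """
    selected = list(features)
    best = score(tuple(selected))
    while len(selected) > 1:
        # Score every subset obtained by dropping exactly one feature.
        candidates = [
            (score(tuple(f for f in selected if f != drop)), drop)
            for drop in selected
        ]
        cand_score, drop = max(candidates)
        # Stop when even the least harmful removal costs too much.
        if cand_score < best - tolerance:
            break
        selected.remove(drop)
        best = max(best, cand_score)
    return selected


# Hypothetical stand-in for a validation metric: two features carry signal,
# and every extra noise feature costs a small penalty.
SIGNAL = {"age", "gait_speed"}


def toy_score(subset: Tuple[str, ...]) -> float:
    n_signal = sum(1 for f in subset if f in SIGNAL)
    n_noise = len(subset) - n_signal
    return 0.5 + 0.2 * n_signal - 0.01 * n_noise


if __name__ == "__main__":
    all_features = ["age", "gait_speed", "noise_1", "noise_2"]
    print(backward_eliminate(all_features, toy_score))
```

In practice the scoring callable would refit the chosen model on each candidate subset and evaluate it on held-out data, which is where most of the computation time goes.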

Keywords:  Binary mixed model forest; Clustered outcomes; Feature selection; Longitudinal outcomes; Random forest; Variable selection

Year:  2021        PMID: 33781921      PMCID: PMC8131242          DOI: 10.1016/j.jbi.2021.103763

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317

