Comparison of variable selection methods for clinical predictive modeling.

L Nelson Sanchez-Pinto, Laura Ruth Venable, John Fahrenbach, Matthew M Churpek.

Abstract

OBJECTIVE: Modern machine learning-based modeling methods are increasingly applied to clinical problems. One such application is variable selection for predictive modeling. However, there is limited research comparing the performance of classic and modern methods for variable selection in clinical datasets.
MATERIALS AND METHODS: We analyzed the performance of eight different variable selection methods: four regression-based methods (stepwise backward selection using p-values, stepwise backward selection using the Akaike information criterion [AIC], Least Absolute Shrinkage and Selection Operator, and Elastic Net) and four tree-based methods (Variable Selection Using Random Forest, Regularized Random Forests, Boruta, and Gradient Boosted Feature Selection). We used two clinical datasets of different sizes: a multicenter adult clinical deterioration cohort and a single-center pediatric acute kidney injury cohort. Method evaluation included measures of parsimony, variable importance, and discrimination.
RESULTS: In the large, multicenter dataset, the modern tree-based Variable Selection Using Random Forest and the Gradient Boosted Feature Selection methods achieved the best parsimony. In the smaller, single-center dataset, the classic regression-based stepwise backward selection using p-value and AIC methods achieved the best parsimony. In both datasets, variable selection tended to decrease the accuracy of the random forest models and increase the accuracy of logistic regression models.
CONCLUSIONS: The performance of classic regression-based and modern tree-based variable selection methods is associated with the size of the clinical dataset used. Classic regression-based variable selection methods seem to achieve better parsimony in clinical prediction problems in smaller datasets while modern tree-based methods perform better in larger datasets.
Copyright © 2018 Elsevier B.V. All rights reserved.
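
The evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes parsimony is summarized as the number of variables a method retains and discrimination as the area under the ROC curve (AUC), neither of which the abstract defines. The function names, method labels, variable names, and toy data below are all hypothetical.

```python
from typing import Dict, List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def parsimony(selected: Dict[str, List[str]]) -> Dict[str, int]:
    """Assumed parsimony summary: count of variables each method retained."""
    return {method: len(variables) for method, variables in selected.items()}


# Toy inputs invented for illustration only; they are not study data.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
selected = {
    "lasso": ["age", "heart_rate", "creatinine"],
    "boruta": ["age", "creatinine"],
}

print(auc(labels, scores))
print(parsimony(selected))
```

Under these assumptions, a method that retains fewer variables while keeping a similar AUC would be considered the more parsimonious choice. The pairwise AUC loop is quadratic in the number of patients, so a rank-based implementation would be preferable on large cohorts.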

Keywords:  Data interpretation; Electronic health records; Machine learning; Models; Regression analysis; Statistical; Variable selection

Year:  2018        PMID: 29887230      PMCID: PMC6003624          DOI: 10.1016/j.ijmedinf.2018.05.006

Source DB:  PubMed          Journal:  Int J Med Inform        ISSN: 1386-5056            Impact factor:   4.046

