
Bayesian feature selection for high-dimensional linear regression via the Ising approximation with applications to genomics.

Charles K Fisher, Pankaj Mehta.

Abstract

MOTIVATION: Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many genomic datasets.
RESULTS: Here, we introduce a new approach, the Bayesian Ising Approximation (BIA), to rapidly calculate posterior probabilities for feature relevance in L2 penalized linear regression. In the regime where the regression problem is strongly regularized by the prior, we show that computing the marginal posterior probabilities for features is equivalent to computing the magnetizations of an Ising model with weak couplings. Using a mean field approximation, we show it is possible to rapidly compute the feature selection path described by the posterior probabilities as a function of the L2 penalty. We present simulations and analytical results illustrating the accuracy of the BIA on some simple regression problems. Finally, we demonstrate the applicability of the BIA to high-dimensional regression by analyzing a gene expression dataset with nearly 30 000 features. These results also highlight the impact of correlations between features on Bayesian feature selection.
AVAILABILITY AND IMPLEMENTATION: An implementation of the BIA in C++, along with data for reproducing our gene expression analyses, is freely available at http://physics.bu.edu/~pankajm/BIACode.
© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
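
ILLUSTRATION: The mean field step mentioned in the abstract can be sketched roughly as follows. This is a generic mean field solver for an Ising model, not the authors' C++ implementation. The function names, the damping scheme, and the numerical values of the fields `h` and couplings `J` are all hypothetical; the abstract does not give the expressions that the BIA uses for them. The sketch also assumes the usual spin convention in which s_i = +1 marks a feature as relevant, so that the inclusion probability is (1 + m_i) / 2.

```python
import math


def mean_field_magnetizations(h, J, damping=0.5, tol=1e-10, max_iter=10_000):
    """Solve m_i = tanh(h_i + sum_j J_ij * m_j) by damped fixed-point iteration.

    h: list of external fields, one per feature (hypothetical inputs).
    J: square list-of-lists of couplings; the diagonal is ignored.
    """
    p = len(h)
    # Start from the non-interacting solution m_i = tanh(h_i).
    m = [math.tanh(hi) for hi in h]
    for _ in range(max_iter):
        new_m = []
        for i in range(p):
            field = h[i] + sum(J[i][j] * m[j] for j in range(p) if j != i)
            # Damping mixes the old and new values to stabilize the iteration.
            new_m.append((1 - damping) * m[i] + damping * math.tanh(field))
        delta = max(abs(a - b) for a, b in zip(new_m, m))
        m = new_m
        if delta < tol:
            break
    return m


def inclusion_probabilities(m):
    """Map magnetizations in [-1, 1] to probabilities in [0, 1]."""
    return [(1 + mi) / 2 for mi in m]


# Hypothetical example: features 0 and 1 are weakly anti-coupled
# (as correlated features might be); feature 2 is independent.
h = [1.0, 1.0, -0.5]
J = [
    [0.0, -0.2, 0.0],
    [-0.2, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
m = mean_field_magnetizations(h, J)
probs = inclusion_probabilities(m)
```

In this toy setting the negative coupling between features 0 and 1 lowers both of their magnetizations relative to the uncoupled value tanh(1.0), which loosely mirrors the abstract's remark that correlations between features affect Bayesian feature selection.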

Year:  2015        PMID: 25619995     DOI: 10.1093/bioinformatics/btv037

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  7 in total

1.  A high-bias, low-variance introduction to Machine Learning for physicists.

Authors:  Pankaj Mehta; Ching-Hao Wang; Alexandre G R Day; Clint Richardson; Marin Bukov; Charles K Fisher; David J Schwab
Journal:  Phys Rep       Date:  2019-03-14       Impact factor: 25.600

2.  Unsupervised Bayesian Ising Approximation for decoding neural activity and other biological dictionaries.

Authors:  Damián G Hernández; Samuel J Sober; Ilya Nemenman
Journal:  Elife       Date:  2022-03-22       Impact factor: 8.713

3.  BOSO: A novel feature selection algorithm for linear regression with high-dimensional data.

Authors:  Luis V Valcárcel; Edurne San José-Enériz; Xabier Cendoya; Ángel Rubio; Xabier Agirre; Felipe Prósper; Francisco J Planes
Journal:  PLoS Comput Biol       Date:  2022-05-31       Impact factor: 4.779

4.  DeepCOMBI: explainable artificial intelligence for the analysis and discovery in genome-wide association studies.

Authors:  Bettina Mieth; Alexandre Rozier; Juan Antonio Rodriguez; Marina M C Höhne; Nico Görnitz; Klaus-Robert Müller
Journal:  NAR Genom Bioinform       Date:  2021-07-20

5.  Partition: a surjective mapping approach for dimensionality reduction.

Authors:  Joshua Millstein; Francesca Battaglin; Malcolm Barrett; Shu Cao; Wu Zhang; Sebastian Stintzing; Volker Heinemann; Heinz-Josef Lenz
Journal:  Bioinformatics       Date:  2020-02-01       Impact factor: 6.937

6.  Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies.

Authors:  Bettina Mieth; Marius Kloft; Juan Antonio Rodríguez; Sören Sonnenburg; Robin Vobruba; Carlos Morcillo-Suárez; Xavier Farré; Urko M Marigorta; Ernst Fehr; Thorsten Dickhaus; Gilles Blanchard; Daniel Schunk; Arcadi Navarro; Klaus-Robert Müller
Journal:  Sci Rep       Date:  2016-11-28       Impact factor: 4.379

7.  Variable habitat conditions drive species covariation in the human microbiota.

Authors:  Charles K Fisher; Thierry Mora; Aleksandra M Walczak
Journal:  PLoS Comput Biol       Date:  2017-04-27       Impact factor: 4.475
