Random forests for genetic association studies.

Benjamin A Goldstein, Eric C Polley, Farren B S Briggs.

Abstract

The Random Forests (RF) algorithm has become a commonly used machine learning algorithm for genetic association studies. It is well suited to genetic applications because it is computationally efficient and models genetic causal mechanisms well. As its use has grown, RF has been applied inconsistently and suboptimally in the literature. The purpose of this review is to break down the theoretical and statistical basis of RF so that practitioners can apply it in their work. Emphasis is placed on showing how the various components of the algorithm contribute to bias and variance, and on discussing variable importance measures. Applications specific to genetic studies are highlighted. To provide context, RF is compared with other commonly used machine learning algorithms.
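
As a rough illustration of the variable importance idea mentioned in the abstract, the sketch below is not the authors' method or any library's implementation. It assumes simulated SNP genotypes coded 0/1/2 with a single causal SNP, and it swaps in a heavily simplified stand-in for RF: a bagged ensemble of depth-one trees (stumps), each grown on a bootstrap sample with a random subset of features. Importance is then estimated as the drop in out-of-bag accuracy after permuting a feature. All function names and parameter values (`simulate`, `fit_forest`, `mtry`, the risk model) are hypothetical choices made for the example.

```python
import random

def simulate(n=400, p=10, causal=0, seed=0):
    """Simulate n individuals with p SNPs coded 0/1/2 (assumed toy risk model)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.choice([0, 1, 2]) for _ in range(p)]
        risk = 0.2 + 0.3 * row[causal]  # disease risk rises with the causal genotype
        X.append(row)
        y.append(1 if rng.random() < risk else 0)
    return X, y

def gini(labels):
    """Gini impurity of a list of binary labels."""
    q = sum(labels) / len(labels)
    return 2 * q * (1 - q)

def majority(labels):
    """Majority class of binary labels (ties go to class 1)."""
    return 1 if 2 * sum(labels) >= len(labels) else 0

def fit_stump(X, y, rows, features):
    """Choose the single split (feature, threshold) with the lowest weighted Gini."""
    best_score, best = float("inf"), None
    for j in features:
        for t in (0, 1):  # split genotype <= t versus > t
            left = [y[i] for i in rows if X[i][j] <= t]
            right = [y[i] for i in rows if X[i][j] > t]
            if not left or not right:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if score < best_score:
                best_score = score
                best = (j, t, majority(left), majority(right))
    if best is None:  # no usable split: predict the majority class everywhere
        m = majority([y[i] for i in rows])
        best = (features[0], 0, m, m)
    return best

def fit_forest(X, y, n_trees=200, mtry=3, seed=1):
    """Grow stumps on bootstrap samples, each seeing mtry randomly chosen features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]
        oob = sorted(set(range(n)) - set(boot))  # out-of-bag individuals
        features = rng.sample(range(p), mtry)
        forest.append((fit_stump(X, y, boot, features), oob))
    return forest

def permutation_importance(X, y, forest, seed=2):
    """Mean drop in out-of-bag accuracy after permuting each feature."""
    rng = random.Random(seed)
    importance = [0.0] * len(X[0])
    for (j, t, pred_left, pred_right), oob in forest:
        if not oob:
            continue
        base = sum((pred_left if X[i][j] <= t else pred_right) == y[i] for i in oob)
        # A stump uses one feature, so permuting any other feature changes nothing.
        shuffled = [X[i][j] for i in oob]
        rng.shuffle(shuffled)
        perm = sum((pred_left if v <= t else pred_right) == y[i]
                   for v, i in zip(shuffled, oob))
        importance[j] += (base - perm) / len(oob)
    return [v / len(forest) for v in importance]

X, y = simulate()
forest = fit_forest(X, y)
importance = permutation_importance(X, y, forest)
ranking = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
print("SNPs ranked by permutation importance:", ranking)
```

In this toy setting the causal SNP (index 0) should rank first, while the noise SNPs should receive importance values near zero. Real RF implementations grow deep trees and evaluate splits at every node, so this sketch only conveys the bootstrap, random-feature and permutation steps.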

Year:  2011        PMID: 22889876      PMCID: PMC3154091          DOI: 10.2202/1544-6115.1691

Source DB:  PubMed          Journal:  Stat Appl Genet Mol Biol        ISSN: 1544-6115


