Random forest fishing: a novel approach to identifying organic group of risk factors in genome-wide association studies.

Wei Yang1, C Charles Gu2.   

Abstract

Genome-wide association studies (GWAS) have brought methodological challenges in handling massive high-dimensional data, as well as real opportunities for studying the joint effect of many risk factors acting in concert as an organic group. The random forest (RF) methodology is recognized by many for its potential in examining interaction effects in large data sets. However, RF is not designed to directly handle GWAS data, which typically have hundreds of thousands of single-nucleotide polymorphisms as predictor variables. We propose and evaluate a novel extension of RF, called random forest fishing (RFF), for GWAS analysis. RFF repeatedly updates a relatively small set of predictors obtained by RF tests to find globally important groups predictive of the disease phenotype, using a novel search algorithm based on genetic programming and simulated annealing. A key improvement of RFF results from the use of guidance incorporating empirical test results of genome-wide pairwise interactions. Evaluated using simulated and real GWAS data sets, RFF is shown to be effective in identifying important predictors, particularly when both marginal effects and interactions exist, and is applicable to very large GWAS data sets.
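
The abstract gives no implementation details, so the following is only a rough sketch of the general idea of repeatedly updating a small predictor subset under a simulated-annealing acceptance rule. It is not the authors' RFF algorithm: the genetic-programming component is omitted, and the names `anneal_subset`, `toy_score`, `weights`, and all parameter defaults are hypothetical. The `score` callable stands in for an importance measure that, in an RFF-like setting, would presumably come from a random forest fitted on the subset; the optional `weights` loosely mimic guidance from pairwise-interaction test results.

```python
import math
import random


def anneal_subset(n_predictors, subset_size, score, steps=2000,
                  t0=1.0, cooling=0.995, weights=None, seed=0):
    """Search for a high-scoring predictor subset by simulated annealing.

    `score` (hypothetical) maps a frozenset of predictor indices to a
    number, higher being better. `weights` (hypothetical) biases which
    predictor is proposed next; None means uniform proposals.
    """
    rng = random.Random(seed)
    population = list(range(n_predictors))
    current = set(rng.sample(population, subset_size))
    current_score = score(frozenset(current))
    best, best_score = frozenset(current), current_score
    temperature = t0
    for _ in range(steps):
        # Propose swapping one subset member for one (possibly guided) pick.
        drop = rng.choice(sorted(current))
        add = rng.choices(population, weights=weights, k=1)[0]
        if add in current:
            continue  # pick is already in the subset; skip this step
        proposal = (current - {drop}) | {add}
        proposal_score = score(frozenset(proposal))
        delta = proposal_score - current_score
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = frozenset(current), current_score
        temperature = max(temperature * cooling, 1e-12)
    return best, best_score


# Toy stand-in for an RF-based score: three "causal" predictors with
# marginal effects, plus an interaction bonus when 3 and 17 co-occur.
CAUSAL = {3, 17, 42}


def toy_score(subset):
    marginal = len(subset & CAUSAL)
    interaction = 2 if {3, 17} <= subset else 0
    return marginal + interaction


best, best_score = anneal_subset(50, 5, toy_score, steps=3000, seed=1)
print(sorted(best), best_score)
```

The swap move keeps the subset size fixed, which matches the abstract's description of a "relatively small set of predictors" being updated; any real scoring function would be far more expensive than `toy_score`, so the number of steps would need to be chosen accordingly.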

Year:  2013        PMID: 23695277      PMCID: PMC3895629          DOI: 10.1038/ejhg.2013.109

Source DB:  PubMed          Journal:  Eur J Hum Genet        ISSN: 1018-4813            Impact factor:   4.246


  25 in total

1.  Tree and spline based association analysis of gene-gene interaction models for ischemic stroke.

Authors:  Nancy R Cook; Robert Y L Zee; Paul M Ridker
Journal:  Stat Med       Date:  2004-05-15       Impact factor: 2.373

2.  On safari to Random Jungle: a fast implementation of Random Forests for high-dimensional data.

Authors:  Daniel F Schwarz; Inke R König; Andreas Ziegler
Journal:  Bioinformatics       Date:  2010-05-26       Impact factor: 6.937

3.  Genome-wide strategies for detecting multiple loci that influence complex diseases.

Authors:  Jonathan Marchini; Peter Donnelly; Lon R Cardon
Journal:  Nat Genet       Date:  2005-03-27       Impact factor: 38.330

4.  GWAsimulator: a rapid whole-genome simulation program.

Authors:  Chun Li; Mingyao Li
Journal:  Bioinformatics       Date:  2007-11-15       Impact factor: 6.937

5.  Interacting genetic loci on chromosomes 20 and 10 influence extreme human obesity.

Authors:  Chuanhui Dong; Shuang Wang; Wei-Dong Li; Ding Li; Hongyu Zhao; R Arlen Price
Journal:  Am J Hum Genet       Date:  2002-12-11       Impact factor: 11.025

6.  An application of Random Forests to a genome-wide association dataset: methodological considerations & new findings.

Authors:  Benjamin A Goldstein; Alan E Hubbard; Adele Cutler; Lisa F Barcellos
Journal:  BMC Genet       Date:  2010-06-14       Impact factor: 2.797

Review 7.  Detecting gene-gene interactions that underlie human diseases.

Authors:  Heather J Cordell
Journal:  Nat Rev Genet       Date:  2009-06       Impact factor: 53.242

8.  A genome-wide association study of Alzheimer's disease using random forests and enrichment analysis.

Authors:  Liang Zou; Qiong Huang; Ao Li; Minghui Wang
Journal:  Sci China Life Sci       Date:  2012-08-04       Impact factor: 6.038

9.  Screening large-scale association study data: exploiting interactions using random forests.

Authors:  Kathryn L Lunetta; L Brooke Hayward; Jonathan Segal; Paul Van Eerdewegh
Journal:  BMC Genet       Date:  2004-12-10       Impact factor: 2.797

10.  Modifier effects between regulatory and protein-coding variation.

Authors:  Antigone S Dimas; Barbara E Stranger; Claude Beazley; Robert D Finn; Catherine E Ingle; Matthew S Forrest; Matthew E Ritchie; Panos Deloukas; Simon Tavaré; Emmanouil T Dermitzakis
Journal:  PLoS Genet       Date:  2008-10-31       Impact factor: 5.917

  5 in total

1.  A whole-genome simulator capable of modeling high-order epistasis for complex disease.

Authors:  Wei Yang; C Charles Gu
Journal:  Genet Epidemiol       Date:  2013-10-01       Impact factor: 2.135

Review 2.  Towards the identification of the loci of adaptive evolution.

Authors:  Carolina Pardo-Diaz; Camilo Salazar; Chris D Jiggins
Journal:  Methods Ecol Evol       Date:  2015-02-12       Impact factor: 7.781

3.  Detecting gene-gene interactions using a permutation-based random forest method.

Authors:  Jing Li; James D Malley; Angeline S Andrew; Margaret R Karagas; Jason H Moore
Journal:  BioData Min       Date:  2016-04-06       Impact factor: 2.522

Review 4.  What makes a good prediction? Feature importance and beginning to open the black box of machine learning in genetics.

Authors:  Anthony M Musolf; Emily R Holzinger; James D Malley; Joan E Bailey-Wilson
Journal:  Hum Genet       Date:  2021-12-04       Impact factor: 5.881

Review 5.  Multivariate Methods for Genetic Variants Selection and Risk Prediction in Cardiovascular Diseases.

Authors:  Alberto Malovini; Riccardo Bellazzi; Carlo Napolitano; Guia Guffanti
Journal:  Front Cardiovasc Med       Date:  2016-06-08