On safari to Random Jungle: a fast implementation of Random Forests for high-dimensional data.

Daniel F Schwarz, Inke R König, Andreas Ziegler.

Abstract

MOTIVATION: Genome-wide association (GWA) studies have proven successful in helping to unravel the genetic basis of complex genetic diseases. However, the identified associations are not well suited for disease prediction, and only a modest portion of the heritability can be explained for most diseases, such as Type 2 diabetes or Crohn's disease. This may partly be due to the low power of standard statistical approaches to detect gene-gene and gene-environment interactions when small marginal effects are present. A promising alternative is Random Forests, which have already been successfully applied in candidate gene analyses. Important single nucleotide polymorphisms are detected by permutation importance measures. Until now, applying Random Forests to GWA data has been highly cumbersome with existing implementations because of the high computational burden.
RESULTS: Here, we present the new freely available software package Random Jungle (RJ), which facilitates the rapid analysis of GWA data. The program yields valid results and computes up to 159 times faster than the fastest alternative implementation, while still maintaining all options of other programs. Specifically, it offers the different permutation importance measures available. It includes new options such as the backward elimination method. We illustrate the application of RJ to a GWA of Crohn's disease. The most important single nucleotide polymorphisms (SNPs) validate recent findings in the literature and reveal potential interactions.
AVAILABILITY: The RJ software package is freely available at http://www.randomjungle.org
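
The permutation importance idea mentioned in the abstract can be sketched as follows. This is a minimal, model-agnostic illustration in Python and not Random Jungle's implementation: the function names `accuracy`, `permutation_importance` and `predict`, the toy genotype data, and the fixed rule standing in for a fitted forest are all assumptions made for illustration. Random Forests typically compute this measure per tree on out-of-bag samples; for brevity the sketch permutes columns of the whole data set instead.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) equals the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when a single column is shuffled, per column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only; this breaks its link to the label
            # while leaving every other column untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data (hypothetical): genotypes coded 0/1/2 at three SNPs.
# Only the first SNP determines case/control status.
data_rng = random.Random(1)
X = [[data_rng.randint(0, 2) for _ in range(3)] for _ in range(300)]
y = [1 if row[0] == 2 else 0 for row in X]


def predict(row):
    # Stand-in for a fitted forest (assumption): a fixed rule on SNP 0.
    return 1 if row[0] == 2 else 0


imp = permutation_importance(predict, X, y)
for j, value in enumerate(imp):
    print(f"SNP {j}: mean accuracy drop = {value:.3f}")
```

In this toy setting only the first SNP should receive a clearly positive importance, while the two SNPs the rule never reads receive exactly zero, because shuffling them cannot change any prediction.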

Year:  2010        PMID: 20505004      PMCID: PMC2894507          DOI: 10.1093/bioinformatics/btq257

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  84 in total

1.  Next generation analytic tools for large scale genetic epidemiology studies of complex diseases.

Authors:  Leah E Mechanic; Huann-Sheng Chen; Christopher I Amos; Nilanjan Chatterjee; Nancy J Cox; Rao L Divi; Ruzong Fan; Emily L Harris; Kevin Jacobs; Peter Kraft; Suzanne M Leal; Kimberly McAllister; Jason H Moore; Dina N Paltoo; Michael A Province; Erin M Ramos; Marylyn D Ritchie; Kathryn Roeder; Daniel J Schaid; Matthew Stephens; Duncan C Thomas; Clarice R Weinberg; John S Witte; Shunpu Zhang; Sebastian Zöllner; Eric J Feuer; Elizabeth M Gillanders
Journal:  Genet Epidemiol       Date:  2011-12-06       Impact factor: 2.135

2.  Multiple testing in high-throughput sequence data: experiences from Group 8 of Genetic Analysis Workshop 17.

Authors:  Inke R König; Jeremie Nsengimana; Charalampos Papachristou; Matthew A Simonson; Kai Wang; Jason A Weisburd
Journal:  Genet Epidemiol       Date:  2011       Impact factor: 2.135

3.  Detecting genome-wide epistases based on the clustering of relatively frequent items.

Authors:  Minzhu Xie; Jing Li; Tao Jiang
Journal:  Bioinformatics       Date:  2011-11-03       Impact factor: 6.937

4.  Power of data mining methods to detect genetic associations and interactions.

Authors:  Annette M Molinaro; Nicholas Carriero; Robert Bjornson; Patricia Hartge; Nathaniel Rothman; Nilanjan Chatterjee
Journal:  Hum Hered       Date:  2011-09-17       Impact factor: 0.444

5.  BOOST: A fast approach to detecting gene-gene interactions in genome-wide case-control studies.

Authors:  Xiang Wan; Can Yang; Qiang Yang; Hong Xue; Xiaodan Fan; Nelson L S Tang; Weichuan Yu
Journal:  Am J Hum Genet       Date:  2010-09-10       Impact factor: 11.025

6.  ATHENA: the analysis tool for heritable and environmental network associations.

Authors:  Emily R Holzinger; Scott M Dudek; Alex T Frase; Sarah A Pendergrass; Marylyn D Ritchie
Journal:  Bioinformatics       Date:  2013-10-21       Impact factor: 6.937

7.  Random forest fishing: a novel approach to identifying organic group of risk factors in genome-wide association studies.

Authors:  Wei Yang; C Charles Gu
Journal:  Eur J Hum Genet       Date:  2013-05-22       Impact factor: 4.246

8.  Mind the dbGAP: the application of data mining to identify biological mechanisms.

Authors:  Eric C Wooten; Gordon S Huggins
Journal:  Mol Interv       Date:  2011-04

9.  High-dimensional pharmacogenetic prediction of a continuous trait using machine learning techniques with application to warfarin dose prediction in African Americans.

Authors:  Erdal Cosgun; Nita A Limdi; Christine W Duarte
Journal:  Bioinformatics       Date:  2011-03-30       Impact factor: 6.937

10.  ATHENA: a tool for meta-dimensional analysis applied to genotypes and gene expression data to predict HDL cholesterol levels.

Authors:  Emily R Holzinger; Scott M Dudek; Alex T Frase; Ronald M Krauss; Marisa W Medina; Marylyn D Ritchie
Journal:  Pac Symp Biocomput       Date:  2013
