Daniel F Schwarz (1), Inke R König, Andreas Ziegler.
(1) Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Maria-Goeppert-Strasse 1, 23562 Lübeck, Germany.
Abstract
MOTIVATION: Genome-wide association (GWA) studies have proven to be a successful approach for helping unravel the genetic basis of complex genetic diseases. However, the identified associations are not well suited for disease prediction, and only a modest portion of the heritability can be explained for most diseases, such as Type 2 diabetes or Crohn's disease. This may partly be due to the low power of standard statistical approaches to detect gene-gene and gene-environment interactions when small marginal effects are present. A promising alternative is Random Forests, which have already been successfully applied in candidate gene analyses. Important single nucleotide polymorphisms (SNPs) are detected by permutation importance measures. Until now, applying Random Forests to GWA data has been highly cumbersome with existing implementations because of the high computational burden.
RESULTS: Here, we present the new freely available software package Random Jungle (RJ), which facilitates the rapid analysis of GWA data. The program yields valid results and computes up to 159 times faster than the fastest alternative implementation, while still maintaining all options of other programs. Specifically, it offers the different permutation importance measures available. It also includes new options such as the backward elimination method. We illustrate the application of RJ to a GWA study of Crohn's disease. The most important SNPs validate recent findings in the literature and reveal potential interactions.
AVAILABILITY: The RJ software package is freely available at http://www.randomjungle.org
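ILLUSTRATION: The abstract states that important SNPs are detected by permutation importance measures. The following is a minimal sketch of that general idea, not the RJ implementation: a small forest of shallow classification trees is grown on bootstrap samples, and the importance of each SNP is estimated as the average drop in out-of-bag accuracy after that SNP's genotypes are shuffled. The function names, the simulated genotypes (coded 0/1/2), the 10% label noise and all parameter values are assumptions made for illustration only.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def build_tree(X, y, rows, rng, mtry, depth):
    """Grow a small classification tree on the given row indices."""
    labels = [y[i] for i in rows]
    majority = int(2 * sum(labels) >= len(labels))
    if depth == 0 or gini(labels) == 0.0:
        return majority  # leaf: predicted class
    best = None
    for j in rng.sample(range(len(X[0])), mtry):  # random SNP subset per node
        for cut in (0, 1):  # genotypes <= cut go left
            left = [i for i in rows if X[i][j] <= cut]
            right = [i for i in rows if X[i][j] > cut]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, j, cut, left, right)
    if best is None:
        return majority
    _, j, cut, left, right = best
    return (j, cut,
            build_tree(X, y, left, rng, mtry, depth - 1),
            build_tree(X, y, right, rng, mtry, depth - 1))

def predict(tree, x):
    """Follow splits until a leaf (an int class label) is reached."""
    while isinstance(tree, tuple):
        j, cut, lo, hi = tree
        tree = lo if x[j] <= cut else hi
    return tree

def permutation_importance(X, y, n_trees=30, mtry=3, depth=3, seed=1):
    """Mean drop in out-of-bag accuracy after shuffling each SNP."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    importance = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]  # out-of-bag rows
        tree = build_tree(X, y, boot, rng, mtry, depth)
        base = sum(predict(tree, X[i]) == y[i] for i in oob) / len(oob)
        for j in range(p):
            shuffled = [X[i][j] for i in oob]
            rng.shuffle(shuffled)  # break the SNP-phenotype link
            hits = sum(
                predict(tree, X[i][:j] + [v] + X[i][j + 1:]) == y[i]
                for i, v in zip(oob, shuffled)
            )
            importance[j] += (base - hits / len(oob)) / n_trees
    return importance

# Simulated data (assumption): 300 individuals, 6 SNPs, only SNP 0 is causal.
data_rng = random.Random(0)
X = [[data_rng.choice((0, 1, 2)) for _ in range(6)] for _ in range(300)]
y = [int((row[0] >= 1) != (data_rng.random() < 0.1)) for row in X]

imp = permutation_importance(X, y)
ranked = sorted(range(len(imp)), key=lambda j: -imp[j])
print("SNPs ranked by permutation importance:", ranked)
```

In this toy setting the causal SNP should rank first, while the noise SNPs receive importances close to zero. A backward elimination scheme, as mentioned in the abstract, could plausibly be layered on top by repeatedly discarding the lowest-ranked SNPs and refitting, but the details of how RJ does this are not given here.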