BACKGROUND: Genetic association studies have thus far focused on the individual main effects of SNP markers. Nonetheless, there is a clear need to model epistasis, or gene-gene interactions, to better understand the biological basis of existing associations. Tree-based methods have been widely studied as tools for building prediction models based on complex variable interactions. Understanding the power of such methods to discover genetic associations in the presence of complex interactions is of great importance. Here, we systematically evaluate the power of three leading algorithms: random forests (RF), Monte Carlo logic regression (MCLR), and multifactor dimensionality reduction (MDR). METHODS: We use the algorithm-specific variable importance measures (VIMs) as test statistics and employ permutation-based resampling to generate the null distribution and associated p values. The power of the three methods is assessed via simulation studies. Additionally, in a data analysis, we evaluate the associations between individual SNPs in pro-inflammatory and immunoregulatory genes and the risk of non-Hodgkin lymphoma. RESULTS: The power of RF is highest in all simulation models, that of MCLR is similar to that of RF in half of the models, and that of MDR is consistently the lowest. CONCLUSIONS: Our study indicates that the power of RF VIMs is the most reliable. However, in addition to tuning parameters, the power of RF is notably influenced by the type of variable (continuous vs. categorical) and the chosen VIM.
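The permutation step described in METHODS can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_importance` is a hypothetical stand-in for an algorithm-specific VIM (it only measures a marginal difference in mean genotype and would not capture interactions), and the simulated data, the effect size, and the names `permutation_p_values` and `simulate` are assumptions made for illustration. In practice the `importance` argument would wrap the VIM of RF, MCLR, or MDR.

```python
import random


def toy_importance(genotypes, labels):
    """Hypothetical stand-in VIM: per-SNP absolute difference in mean
    genotype between cases (label 1) and controls (label 0)."""
    cases = [row for row, y in zip(genotypes, labels) if y == 1]
    controls = [row for row, y in zip(genotypes, labels) if y == 0]
    n_snps = len(genotypes[0])
    return [
        abs(sum(r[j] for r in cases) / len(cases)
            - sum(r[j] for r in controls) / len(controls))
        for j in range(n_snps)
    ]


def permutation_p_values(genotypes, labels, importance, n_perm=200, seed=0):
    """Permute the outcome labels to build a null distribution of the
    importance statistic for each SNP, then return one p value per SNP."""
    rng = random.Random(seed)
    observed = importance(genotypes, labels)
    exceed = [0] * len(observed)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-outcome association
        null = importance(genotypes, shuffled)
        for j, (obs, nul) in enumerate(zip(observed, null)):
            if nul >= obs:
                exceed[j] += 1
    # Add-one correction keeps every p value strictly positive.
    return [(count + 1) / (n_perm + 1) for count in exceed]


def simulate(n=200, n_snps=5, seed=1):
    """Assumed toy data: genotypes coded 0/1/2; only SNP 0 affects risk."""
    rng = random.Random(seed)
    genotypes = [[rng.choice([0, 1, 2]) for _ in range(n_snps)]
                 for _ in range(n)]
    labels = [1 if rng.random() < 0.2 + 0.3 * row[0] else 0
              for row in genotypes]
    return genotypes, labels


genotypes, labels = simulate()
p_values = permutation_p_values(genotypes, labels, toy_importance)
print(p_values)
```

Under this setup, power at a chosen significance level could be estimated by repeating `simulate` with different seeds and recording the fraction of datasets in which the p value for the causal SNP falls below that level.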