Power of data mining methods to detect genetic associations and interactions.

Annette M Molinaro, Nicholas Carriero, Robert Bjornson, Patricia Hartge, Nathaniel Rothman, Nilanjan Chatterjee.

Abstract

BACKGROUND: Genetic association studies have so far focused on analyzing the main effects of individual SNP markers. Nonetheless, there is a clear need to model epistasis, or gene-gene interactions, to better understand the biologic basis of existing associations. Tree-based methods have been widely studied as tools for building prediction models based on complex variable interactions. Understanding the power of such methods to discover genetic associations in the presence of complex interactions is therefore of great importance. Here, we systematically evaluate the power of three leading algorithms: random forests (RF), Monte Carlo logic regression (MCLR), and multifactor dimensionality reduction (MDR).
METHODS: We use the algorithm-specific variable importance measures (VIMs) as test statistics and employ permutation-based resampling to generate their null distributions and the associated p values. The power of the three methods is assessed via simulation studies. Additionally, in an applied data analysis, we evaluate the associations between individual SNPs in pro-inflammatory and immunoregulatory genes and the risk of non-Hodgkin lymphoma.
RESULTS: The power of RF is highest in all simulation models, that of MCLR is similar to that of RF in half of the models, and that of MDR is consistently the lowest.
CONCLUSIONS: Our study indicates that RF VIMs provide the most reliable power. However, in addition to tuning parameters, the power of RF is notably influenced by the type of variable (continuous vs. categorical) and the chosen VIM.
Copyright © 2011 S. Karger AG, Basel.
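The permutation scheme summarized under METHODS, and the simulation-based estimate of power, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: `importance_stat` is a hypothetical placeholder (the case-control difference in mean minor-allele count) standing in for the RF, MCLR, or MDR VIMs, and the two-SNP interaction model in `simulate_dataset`, its risk values, and all function names are illustrative choices rather than the paper's simulation settings. Power is estimated as the fraction of simulated datasets whose permutation p value is at or below `alpha`.

```python
import random


def importance_stat(genotypes, labels, j):
    """Hypothetical stand-in for a variable importance measure (VIM).

    Returns the absolute difference in mean minor-allele count (0/1/2) of
    SNP j between cases (label 1) and controls (label 0). This is NOT the
    RF, MCLR, or MDR importance; any per-SNP statistic could be plugged in.
    """
    cases = [g[j] for g, y in zip(genotypes, labels) if y == 1]
    controls = [g[j] for g, y in zip(genotypes, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_pvalue(genotypes, labels, j, n_perm=199, rng=None):
    """Permutation p value for SNP j: shuffle the phenotype labels to break
    any genotype-phenotype association, recompute the statistic, and count
    how often the null statistic is at least as large as the observed one."""
    rng = rng or random.Random()
    observed = importance_stat(genotypes, labels, j)
    permuted = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(permuted)
        if importance_stat(genotypes, permuted, j) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one permutation,
    # so the smallest attainable p value is 1 / (n_perm + 1).
    return (exceed + 1) / (n_perm + 1)


def simulate_dataset(n, n_snps, rng, maf=0.3):
    """Hypothetical two-SNP interaction model: disease risk is 0.8 when a
    subject carries at least one minor allele at both SNP 0 and SNP 1,
    otherwise 0.2. All other SNPs are unrelated to disease status."""
    genotypes, labels = [], []
    for _ in range(n):
        # Genotype = number of minor alleles on two independently drawn
        # chromosomes (Hardy-Weinberg proportions, no linkage).
        g = [sum(rng.random() < maf for _ in range(2)) for _ in range(n_snps)]
        risk = 0.8 if (g[0] > 0 and g[1] > 0) else 0.2
        genotypes.append(g)
        labels.append(1 if rng.random() < risk else 0)
    return genotypes, labels


def estimate_power(snp, n_datasets=50, n=300, n_snps=5,
                   alpha=0.05, n_perm=99, seed=1):
    """Empirical power: fraction of simulated datasets in which the
    permutation p value for `snp` is at or below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_datasets):
        genotypes, labels = simulate_dataset(n, n_snps, rng)
        if permutation_pvalue(genotypes, labels, snp, n_perm, rng) <= alpha:
            hits += 1
    return hits / n_datasets
```

For example, `estimate_power(snp=0)` targets a SNP involved in the simulated interaction, while `estimate_power(snp=4)` targets a SNP with no effect, whose rejection rate is expected to sit near `alpha`. Because the placeholder statistic is marginal, it would miss a pure interaction without main effects (such as an XOR-type model); capturing interactions is why the abstract turns to methods that model them directly.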

Year: 2011; PMID: 21934324; PMCID: PMC3222116; DOI: 10.1159/000330579

Source DB: PubMed; Journal: Hum Hered; ISSN: 0001-5652; Impact factor: 0.444