Identifying SNPs predictive of phenotype using random forests.

Alexandre Bureau, Josée Dupuis, Kathleen Falls, Kathryn L Lunetta, Brooke Hayward, Tim P Keith, Paul Van Eerdewegh.

Abstract

There has been great interest, and a few successes, in the identification of complex disease susceptibility genes in recent years. Association studies, in which a large number of single-nucleotide polymorphisms (SNPs) are typed in a sample of cases and controls to determine which genes are associated with a specific disease, provide a powerful approach for complex disease gene mapping. Genes of interest in these studies may contain large numbers of SNPs that classical statistical methods cannot handle simultaneously without requiring prohibitively large sample sizes. By contrast, high-dimensional nonparametric methods thrive on large numbers of predictors. This work explores the application of one such method, random forests, to the problem of identifying SNPs predictive of the phenotype in the case-control study design. A random forest is a collection of classification trees grown on bootstrap samples of observations, using a random subset of predictors to define the best split at each node. The observations left out of the bootstrap samples are used to estimate prediction error. The importance of a predictor is quantified by the increase in misclassification occurring when the values of the predictor are randomly permuted. We extend the concept of importance to pairs of predictors to capture joint effects, and we explore the behavior of importance measures over a range of two-locus disease models in the presence of a varying number of SNPs unassociated with the phenotype. We illustrate the application of random forests with a data set of asthma cases and unaffected controls genotyped at 42 SNPs in ADAM33, a previously identified asthma susceptibility gene. SNPs and SNP pairs highly associated with asthma tend to have the highest importance index values, but predictive importance and association do not always coincide. © 2004 Wiley-Liss, Inc.
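The abstract describes the resampling and permutation steps only in words. The sketch below illustrates them in plain Python; it is not the authors' implementation. Assumptions: genotypes are coded 0/1/2 (minor-allele count), each tree is a depth-limited Gini classification tree, `mtry` defaults to the square root of the number of SNPs, and the data are simulated with a hypothetical two-locus rule in which case status depends only on SNPs 0 and 1 (SNPs 2 to 5 are unassociated). For pair importance, the same permutation is applied to both SNPs of the pair; whether this matches the paper's exact definition is an assumption. All function names are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    # Gini impurity of a list of 0/1 phenotype labels.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def grow_tree(X, y, idx, mtry, rng, depth=0, max_depth=4):
    # Leaves are class labels; internal nodes are (snp, threshold, left, right).
    labels = [y[i] for i in idx]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority(labels)
    best = None
    # Only a random subset of mtry SNPs is considered for the split at this node.
    for j in rng.sample(range(len(X[0])), mtry):
        for t in (0, 1):  # genotype split: {<= t} versus {> t}
            left = [i for i in idx if X[i][j] <= t]
            right = [i for i in idx if X[i][j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(idx)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:
        return majority(labels)
    _, j, t, left, right = best
    return (j, t, grow_tree(X, y, left, mtry, rng, depth + 1, max_depth),
            grow_tree(X, y, right, mtry, rng, depth + 1, max_depth))

def predict(node, row):
    # Walk from the root to a leaf label.
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node

def grow_forest(X, y, n_trees=100, mtry=None, seed=0):
    # Each tree is grown on a bootstrap sample; the left-out indices are kept.
    rng = random.Random(seed)
    n = len(X)
    mtry = mtry or max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]
        oob = sorted(set(range(n)) - set(boot))
        forest.append((grow_tree(X, y, boot, mtry, rng), oob))
    return forest

def oob_error(forest, X, y):
    # Majority vote over the trees for which each subject was left out.
    votes = [Counter() for _ in y]
    for tree, oob in forest:
        for i in oob:
            votes[i][predict(tree, X[i])] += 1
    scored = [i for i in range(len(y)) if votes[i]]
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return wrong / len(scored)

def importance(forest, X, y, cols, seed=1):
    # Mean per-tree increase in left-out misclassification after permuting the
    # SNPs in `cols` among that tree's left-out subjects. For a pair, the same
    # permutation is applied to both SNPs (an assumption, see lead-in).
    rng = random.Random(seed)
    diffs = []
    for tree, oob in forest:
        if not oob:
            continue
        base = sum(predict(tree, X[i]) != y[i] for i in oob)
        donors = oob[:]
        rng.shuffle(donors)
        permuted = 0
        for i, k in zip(oob, donors):
            row = list(X[i])
            for j in cols:
                row[j] = X[k][j]
            permuted += predict(tree, row) != y[i]
        diffs.append((permuted - base) / len(oob))
    return sum(diffs) / len(diffs)

# Simulated toy data (hypothetical): 200 subjects, 6 SNPs coded 0/1/2.
# Case status depends only on SNPs 0 and 1; SNPs 2-5 are unassociated.
data_rng = random.Random(42)
X = [[data_rng.randint(0, 1) + data_rng.randint(0, 1) for _ in range(6)]
     for _ in range(200)]
y = [int(row[0] + row[1] >= 3) for row in X]
forest = grow_forest(X, y, n_trees=100)
err = oob_error(forest, X, y)
single = [importance(forest, X, y, [j]) for j in range(6)]
pair_01 = importance(forest, X, y, [0, 1])
pair_45 = importance(forest, X, y, [4, 5])
print(f"OOB error: {err:.3f}; single: {[round(v, 3) for v in single]}")
print(f"pair (0,1): {pair_01:.3f}  pair (4,5): {pair_45:.3f}")
```

Because the toy phenotype is a deterministic function of SNPs 0 and 1, their single and joint importances should stand well above those of the unassociated SNPs. Real case-control data are far noisier, and, as the abstract notes, predictive importance and association need not coincide.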

Year:  2005        PMID: 15593090     DOI: 10.1002/gepi.20041

Source DB:  PubMed          Journal:  Genet Epidemiol        ISSN: 0741-0395            Impact factor:   2.135


Related articles: 136 in total

1.  A screening methodology based on Random Forests to improve the detection of gene-gene interactions.

Authors:  Lizzy De Lobel; Pierre Geurts; Guy Baele; Francesc Castro-Giner; Manolis Kogevinas; Kristel Van Steen
Journal:  Eur J Hum Genet       Date:  2010-05-12       Impact factor: 4.246

2.  On safari to Random Jungle: a fast implementation of Random Forests for high-dimensional data.

Authors:  Daniel F Schwarz; Inke R König; Andreas Ziegler
Journal:  Bioinformatics       Date:  2010-05-26       Impact factor: 6.937

3.  Testing SNPs and sets of SNPs for importance in association studies.

Authors:  Holger Schwender; Ingo Ruczinski; Katja Ickstadt
Journal:  Biostatistics       Date:  2010-07-02       Impact factor: 5.899

4.  Phenomics: the next challenge (review).

Authors:  David Houle; Diddahally R Govindaraju; Stig Omholt
Journal:  Nat Rev Genet       Date:  2010-12       Impact factor: 53.242

5.  Bayesian neural adjustment of inhibitory control predicts emergence of problem stimulant use.

Authors:  Katia M Harlé; Jennifer L Stewart; Shunan Zhang; Susan F Tapert; Angela J Yu; Martin P Paulus
Journal:  Brain       Date:  2015-09-03       Impact factor: 13.501

6.  Machine learning for detecting gene-gene interactions: a review.

Authors:  Brett A McKinney; David M Reif; Marylyn D Ritchie; Jason H Moore
Journal:  Appl Bioinformatics       Date:  2006

7.  Interactions between environmental factors and polymorphisms in angiogenesis pathway genes in esophageal adenocarcinoma risk: a case-only study.

Authors:  Rihong Zhai; Yang Zhao; Geoffrey Liu; Monica Ter-Minassian; I-Chen Wu; Zhaoxi Wang; Li Su; Kofi Asomaning; Feng Chen; Matthew H Kulke; Xihong Lin; Rebecca S Heist; John C Wain; David C Christiani
Journal:  Cancer       Date:  2011-07-12       Impact factor: 6.860

8.  A meta-analytic framework for detection of genetic interactions.

Authors:  Yulun Liu; Yong Chen; Paul Scheet
Journal:  Genet Epidemiol       Date:  2016-08-15       Impact factor: 2.135

9.  Analysis of a large data set to identify predictors of blood transfusion in primary total hip and knee arthroplasty.

Authors:  ZeYu Huang; Cheng Huang; JinWei Xie; Jun Ma; GuoRui Cao; Qiang Huang; Bin Shen; Virginia Byers Kraus; FuXing Pei
Journal:  Transfusion       Date:  2018-08-25       Impact factor: 3.157

10.  An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests.

Authors:  Carolin Strobl; James Malley; Gerhard Tutz
Journal:  Psychol Methods       Date:  2009-12