
An integrated approach to reduce the impact of minor allele frequency and linkage disequilibrium on variable importance measures for genome-wide data.

Raymond Walters, Charles Laurin, Gitta H Lubke.

Abstract

MOTIVATION: There is growing momentum to develop statistical learning (SL) methods as an alternative to conventional genome-wide association studies (GWAS). Methods such as random forests (RF) and gradient boosting machines (GBM) yield variable importance measures that indicate how well each single-nucleotide polymorphism (SNP) predicts the phenotype. For RF, it has been shown that variable importance measures are systematically affected by minor allele frequency (MAF) and linkage disequilibrium (LD). To establish RF and GBM as viable alternatives for analyzing genome-wide data, it is necessary to address this potential bias and show that SL methods do not significantly under-perform conventional GWAS methods.
RESULTS: Both LD and MAF have a significant impact on the variable importance measures commonly used in RF and GBM. Dividing SNPs into overlapping subsets with approximate linkage equilibrium and applying SL methods to each subset successfully reduces the impact of LD. A welcome side effect of this approach is a dramatic reduction in parallel computing time, increasing the feasibility of applying SL methods to large datasets. The created subsets also facilitate a potential correction for the effect of MAF using pseudocovariates. Simulations using simulated SNPs embedded in empirical data (assessing varying effect sizes, minor allele frequencies and LD patterns) suggest that the sensitivity to detect effects is often improved by subsetting and does not significantly under-perform the Armitage trend test, even under ideal conditions for the trend test.
AVAILABILITY: Code for the LD subsetting algorithm and pseudocovariate correction is available at http://www.nd.edu/~glubke/code.html.
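
The idea of splitting SNPs into subsets in approximate linkage equilibrium can be illustrated with a minimal sketch. This is not the authors' algorithm (their code is at the URL above): it is a simplified greedy variant that produces non-overlapping subsets, whereas the abstract describes overlapping ones. The function names `r_squared` and `greedy_ld_subsets`, the threshold `r2_max=0.1`, and the toy genotypes are all assumptions made for illustration.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (allele counts 0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a monomorphic SNP is treated as uncorrelated
    return sxy * sxy / (sxx * syy)


def greedy_ld_subsets(snps, r2_max=0.1):
    """Assign each SNP to the first subset in which its r^2 with every member is <= r2_max."""
    subsets = []  # each subset is a list of SNP indices
    for i, snp in enumerate(snps):
        for members in subsets:
            if all(r_squared(snp, snps[m]) <= r2_max for m in members):
                members.append(i)
                break
        else:
            # No existing subset is in approximate equilibrium with this SNP.
            subsets.append([i])
    return subsets


# Toy data (hypothetical): SNP 1 is a copy of SNP 0, i.e. the two are in perfect LD.
random.seed(0)
snp_a = [random.randint(0, 2) for _ in range(200)]
snp_b = list(snp_a)
snp_c = [random.randint(0, 2) for _ in range(200)]
snp_d = [random.randint(0, 2) for _ in range(200)]
snps = [snp_a, snp_b, snp_c, snp_d]

subsets = greedy_ld_subsets(snps)
print(subsets)
```

Under these assumptions, an SL method such as RF or GBM would then be fitted to each subset separately, so that SNPs in strong LD never compete within the same model. Because the subsets are independent, the fits can run in parallel.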

Year:  2012        PMID: 22847933      PMCID: PMC3467741          DOI: 10.1093/bioinformatics/bts483

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  8 in total

1.  A Bayesian linear mixed model for prediction of complex traits.

Authors:  Yang Hai; Yalu Wen
Journal:  Bioinformatics       Date:  2020-12-17       Impact factor: 6.937

2.  Gradient Boosting as a SNP Filter: an Evaluation Using Simulated and Hair Morphology Data.

Authors:  Gh Lubke; C Laurin; R Walters; N Eriksson; P Hysi; Td Spector; Gw Montgomery; Ng Martin; Se Medland; DI Boomsma
Journal:  J Data Mining Genomics Proteomics       Date:  2013-10-20

3.  Investigation of regions impacting inbreeding depression and their association with the additive genetic effect for United States and Australia Jersey dairy cattle.

Authors:  Jeremy T Howard; Mekonnen Haile-Mariam; Jennie E Pryce; Christian Maltecca
Journal:  BMC Genomics       Date:  2015-10-19       Impact factor: 3.969

4.  Genomic Prediction of Breeding Values Using a Subset of SNPs Identified by Three Machine Learning Methods.

Authors:  Bo Li; Nanxi Zhang; You-Gan Wang; Andrew W George; Antonio Reverter; Yutao Li
Journal:  Front Genet       Date:  2018-07-04       Impact factor: 4.599

5.  Pathway analysis of genome-wide data improves warfarin dose prediction.

Authors:  Roxana Daneshjou; Nicholas P Tatonetti; Konrad J Karczewski; Hersh Sagreiya; Stephane Bourgeois; Katarzyna Drozda; James K Burmester; Tatsuhiko Tsunoda; Yusuke Nakamura; Michiaki Kubo; Matthew Tector; Nita A Limdi; Larisa H Cavallari; Minoli Perera; Julie A Johnson; Teri E Klein; Russ B Altman
Journal:  BMC Genomics       Date:  2013-05-28       Impact factor: 3.969

6.  r2VIM: A new variable selection method for random forests in genome-wide association studies.

Authors:  Silke Szymczak; Emily Holzinger; Abhijit Dasgupta; James D Malley; Anne M Molloy; James L Mills; Lawrence C Brody; Dwight Stambolian; Joan E Bailey-Wilson
Journal:  BioData Min       Date:  2016-02-01       Impact factor: 2.522

7.  Inherited variations in human pigmentation-related genes modulate cutaneous melanoma risk and clinicopathological features in Brazilian population.

Authors:  Gustavo Jacob Lourenço; Cristiane Oliveira; Benilton Sá Carvalho; Caroline Torricelli; Janet Keller Silva; Gabriela Vilas Bôas Gomez; José Augusto Rinck-Junior; Wesley Lima Oliveira; Vinicius Lima Vazquez; Sergio Vicente Serrano; Aparecida Machado Moraes; Carmen Silvia Passos Lima
Journal:  Sci Rep       Date:  2020-07-22       Impact factor: 4.379

8.  The revival of the Gini importance?

Authors:  Stefano Nembrini; Inke R König; Marvin N Wright
Journal:  Bioinformatics       Date:  2018-11-01       Impact factor: 6.937

