
Variable selection method for the identification of epistatic models.

Emily Rose Holzinger, Silke Szymczak, Abhijit Dasgupta, James Malley, Qing Li, Joan E Bailey-Wilson.

Abstract

Standard analysis methods for genome-wide association studies (GWAS) are not robust to complex disease models, such as interactions between variables with small main effects. These types of effects likely contribute to the heritability of complex human traits. Machine learning methods that are capable of identifying interactions, such as Random Forests (RF), are an alternative analysis approach. One caveat of RF is that there is no standardized method for selecting variables that reduces false positives while retaining adequate power. To this end, we have developed a novel variable selection method called the relative recurrency variable importance metric (r2VIM). This method incorporates recurrency and variance estimation to assist in optimal threshold selection. For this study, we specifically address how this method performs in data with almost completely epistatic effects (i.e., no marginal effects). Our results show that with appropriate parameter settings, r2VIM can identify interaction effects when the marginal effects are virtually nonexistent. It also outperforms logistic regression, which has essentially no power under this type of model when the number of potential features (genetic variants) is large. (All supplementary data are available at http://research.nhgri.nih.gov/manuscripts/Bailey-Wilson/r2VIM_epi/.)
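Two ideas in the abstract can be illustrated with a small, self-contained sketch. First, a "purely epistatic" two-SNP model is one in which every genotype of each SNP has the same marginal penetrance, so a single-SNP main-effect test (such as per-variant logistic regression) has nothing to detect. Second, the abstract says r2VIM combines recurrency with variance estimation to choose a threshold, but it does not spell out the rule. The code below assumes one reading: each random forest run's permutation importances are divided by the absolute value of that run's minimum importance (taken as a rough estimate of null variability), and a variable is kept only if its relative importance reaches a threshold `factor` in every run. The genotype frequencies, the XOR-like penetrance table, the importance scores, and the names `marginal_penetrance`, `has_no_marginal_effects`, `relative_importance`, `select_recurrent`, and `factor` are all hypothetical. This is not the authors' implementation, and the forests themselves are not reproduced here.

```python
import math

# --- Part 1: a purely epistatic two-SNP model (illustrative assumption) ---

# Genotype frequencies for 0, 1, 2 minor alleles under Hardy-Weinberg
# equilibrium with minor allele frequency 0.5 (chosen for illustration).
GENO_FREQ = [0.25, 0.5, 0.25]

# Hypothetical XOR-like penetrance table: PENETRANCE[i][j] is
# P(disease | SNP1 genotype i, SNP2 genotype j).
PENETRANCE = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]


def marginal_penetrance(table, freqs, axis):
    """Average the joint penetrance over the other SNP's genotype frequencies."""
    n = len(freqs)
    if axis == 0:
        return [sum(freqs[j] * table[i][j] for j in range(n)) for i in range(n)]
    return [sum(freqs[i] * table[i][j] for i in range(n)) for j in range(n)]


def has_no_marginal_effects(table, freqs, tol=1e-12):
    """True if, for each SNP, all genotypes share the same marginal penetrance."""
    for axis in (0, 1):
        m = marginal_penetrance(table, freqs, axis)
        if not all(math.isclose(x, m[0], abs_tol=tol) for x in m):
            return False
    return True


# --- Part 2: recurrency-based selection on relative importance (assumed rule) ---

def relative_importance(scores):
    """Scale one run's importance scores by the absolute value of its minimum."""
    min_abs = abs(min(scores.values()))
    if min_abs == 0:
        raise ValueError("minimum importance is zero; relative scores undefined")
    return {var: s / min_abs for var, s in scores.items()}


def select_recurrent(runs, factor):
    """Keep variables whose relative importance is >= factor in every run."""
    rel_runs = [relative_importance(r) for r in runs]
    # Only variables scored in every run are candidates.
    shared = set(rel_runs[0]).intersection(*rel_runs[1:])
    return sorted(v for v in shared if min(rel[v] for rel in rel_runs) >= factor)


# Hypothetical permutation importance scores from three forests grown with
# different random seeds (made-up numbers, not results from the paper).
EXAMPLE_RUNS = [
    {"snp1": 0.020, "snp2": 0.018, "snp3": -0.002, "snp4": 0.001},
    {"snp1": 0.015, "snp2": 0.021, "snp3": 0.003, "snp4": -0.004},
    {"snp1": 0.019, "snp2": 0.017, "snp3": 0.006, "snp4": -0.001},
]

if __name__ == "__main__":
    print("SNP1 marginals:", marginal_penetrance(PENETRANCE, GENO_FREQ, 0))
    print("SNP2 marginals:", marginal_penetrance(PENETRANCE, GENO_FREQ, 1))
    print("no marginal effects:", has_no_marginal_effects(PENETRANCE, GENO_FREQ))
    print("selected:", select_recurrent(EXAMPLE_RUNS, factor=3))
```

In this toy model every marginal penetrance equals 0.05 even though the joint penetrance ranges from 0.0 to 0.1, which is why a main-effect test sees no signal. With the made-up scores, `factor=3` keeps `snp1` and `snp2` and drops `snp3` and `snp4`, each of which sits at the noise floor in at least one run. Raising `factor` trades power for fewer false positives; the value 3 here is arbitrary, and choosing it is the kind of parameter setting the abstract says matters.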

Year: 2015
PMID: 25592581
PMCID: PMC4299919
Source DB: PubMed
Journal: Pac Symp Biocomput
ISSN: 2335-6928

