Iterative hard thresholding for model selection in genome-wide association studies.

Kevin L Keys, Gary K Chen, Kenneth Lange.

Abstract

A genome-wide association study (GWAS) correlates marker and trait variation in a study sample. Each subject is genotyped at a multitude of SNPs (single nucleotide polymorphisms) spanning the genome. Here, we assume that subjects are randomly collected unrelated individuals and that trait values are normally distributed or can be transformed to normality. Over the past decade, geneticists have been remarkably successful in applying GWAS analysis to hundreds of traits. The massive amount of data produced in these studies presents unique computational challenges. Penalized regression with the ℓ1 penalty (LASSO) or the minimax concave penalty (MCP) is capable of selecting a handful of associated SNPs from millions of potential SNPs. Unfortunately, model selection can be corrupted by false positives and false negatives, obscuring the genetic underpinning of a trait. In this paper, we compare LASSO and MCP penalized regression to iterative hard thresholding (IHT). On GWAS regression data, IHT is better at model selection than both methods of penalized regression and comparable to them in speed. This conclusion holds for both simulated and real GWAS data. IHT fosters parallelization and scales well in problems with large numbers of causal markers. Our parallel implementation of IHT accommodates SNP genotype compression and exploits multiple CPU cores and graphics processing units (GPUs). This allows statistical geneticists to leverage commodity desktop computers in GWAS analysis and to avoid supercomputing.

AVAILABILITY: Source code is freely available at https://github.com/klkeys/IHT.jl.
© 2017 WILEY PERIODICALS, INC.
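
The comparison in the abstract turns on how each method enforces sparsity. As a rough illustration only (this is not the IHT.jl implementation, whose API is not described here), the generic IHT iteration for a linear model seeks β minimizing ½‖y − Xβ‖² subject to at most k nonzero entries, and repeats β ← P_k(β + μ Xᵀ(y − Xβ)), where P_k keeps the k largest-magnitude entries and zeroes the rest. LASSO, by contrast, soft-thresholds, shrinking every retained coefficient toward zero. The sketch below assumes dense in-memory data in plain Python lists and uses a normalized step size μ = ‖g_S‖²/‖X g_S‖² computed on the current support S, with step halving so the loss never increases; these choices and all function names (`iht`, `hard_threshold`, etc.) are hypothetical.

```python
"""Minimal iterative hard thresholding (IHT) for sparse least squares.

Illustrative sketch only: dense pure-Python lists, no genotype
compression, no multithreading or GPU support. All names are hypothetical.
"""
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(X: Matrix, b: Vector) -> Vector:
    """Return the product X b."""
    return [sum(xij * bj for xij, bj in zip(row, b)) for row in X]


def tmatvec(X: Matrix, r: Vector) -> Vector:
    """Return the product X^T r."""
    out = [0.0] * len(X[0])
    for row, ri in zip(X, r):
        for j, xij in enumerate(row):
            out[j] += xij * ri
    return out


def top_k(v: Vector, k: int) -> List[int]:
    """Indices of the k entries of v with the largest magnitude."""
    return sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:k]


def hard_threshold(v: Vector, k: int) -> Vector:
    """Project v onto {b : ||b||_0 <= k} by zeroing all but the top-k entries."""
    keep = set(top_k(v, k))
    return [vj if j in keep else 0.0 for j, vj in enumerate(v)]


def loss(X: Matrix, y: Vector, b: Vector) -> float:
    """Half the residual sum of squares, 0.5 * ||y - X b||^2."""
    return 0.5 * sum((yi - fi) ** 2 for yi, fi in zip(y, matvec(X, b)))


def iht(X: Matrix, y: Vector, k: int, max_iter: int = 500, tol: float = 1e-12) -> Vector:
    """Estimate a k-sparse coefficient vector by iterative hard thresholding."""
    p = len(X[0])
    b = [0.0] * p
    cur = loss(X, y, b)
    for _ in range(max_iter):
        r = [yi - fi for yi, fi in zip(y, matvec(X, b))]
        g = tmatvec(X, r)  # negative gradient of the loss at b
        # Step size from the gradient restricted to the current support
        # (the top-k gradient entries when b has no nonzeros, e.g. at the start).
        support = {j for j in range(p) if b[j] != 0.0} or set(top_k(g, k))
        g_s = [g[j] if j in support else 0.0 for j in range(p)]
        den = sum(v * v for v in matvec(X, g_s))
        if den == 0.0:
            break  # zero gradient on the support: nothing left to do
        mu = sum(v * v for v in g_s) / den
        # Backtrack: halve mu until the thresholded step does not raise the loss.
        for _ in range(50):
            cand = hard_threshold([bj + mu * gj for bj, gj in zip(b, g)], k)
            new = loss(X, y, cand)
            if new <= cur:
                break
            mu /= 2.0
        else:
            break  # no acceptable step found; keep the current estimate
        converged = cur - new <= tol * (1.0 + cur)
        b, cur = cand, new
        if converged:
            break
    return b


if __name__ == "__main__":
    # Simulated data: 200 subjects, 20 predictors, 3 causal effects, no noise.
    rng = random.Random(1)
    n, p = 200, 20
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    beta = [0.0] * p
    beta[2], beta[7], beta[15] = 3.0, -2.0, 1.5
    y = matvec(X, beta)
    b = iht(X, y, k=3)
    print("selected:", [j for j, bj in enumerate(b) if bj != 0.0])
```

Running the script prints which of the 20 simulated predictors the sketch retained. Unlike a LASSO penalty weight, the sparsity level k is supplied directly, so in practice it would have to be tuned, for example by cross-validation.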

Keywords:  genetic association studies; greedy algorithm; parallel computing; sparse regression

Year: 2017; PMID: 28875524; PMCID: PMC5696071; DOI: 10.1002/gepi.22068

Source DB: PubMed; Journal: Genet Epidemiol; ISSN: 0741-0395; Impact factor: 2.135


