Strategies for developing prediction models from genome-wide association studies.

Jincao Wu, Ruth M Pfeiffer, Mitchell H Gail.

Abstract

Genome-wide association studies (GWASs) have identified hundreds of single nucleotide polymorphisms (SNPs) associated with complex human diseases. However, risk prediction models based on them have limited discriminatory accuracy. It has been suggested that including many such SNPs can improve predictive performance. Here, we studied various aspects of model building to improve discriminatory accuracy, as measured by the area under the receiver operating characteristic curve (AUC), including: (1) How well does a one-phase procedure that selects SNPs and estimates odds ratios on the same data perform? (2) How should training data be allocated between SNP selection (Phase 1) and estimation (Phase 2) in a two-phase procedure? (3) Should SNP selection be based on P-value thresholding or ranking P-values? (4) How many SNPs should be selected? and (5) Is multivariate estimation preferred to univariate estimation in the presence of linkage disequilibrium (LD)? We used realistic estimates of the distributions of genetic effect sizes, allele frequencies, and LD patterns based on GWAS data for Crohn's disease and prostate cancer. Theory and simulations were used to estimate AUC. Empirical risk models based on 10,000 cases and controls had considerably lower AUC than theoretically achievable. The most critical aspect of prediction model building was initial SNP selection. The single-phase procedure achieved higher AUC than the two-phase procedure. Multivariate estimation did not perform as well as univariate (marginal) estimation. For complex diseases and samples of 10,000 or fewer cases and controls, one should limit the number of SNPs to tens or hundreds.
© 2013 WILEY PERIODICALS, INC.
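
A minimal sketch of the one-phase procedure described in the abstract may help: SNPs are selected by ranking marginal association statistics, univariate log odds ratios are estimated on the same training data, and discriminatory accuracy is measured by AUC on separate test data. This is an illustration under stated assumptions, not the authors' implementation. The function names (`allelic_log_or`, `select_top_k`, `auc`, `demo`) are hypothetical, the allelic odds ratio with a 0.5 continuity correction stands in for whichever univariate estimator the paper used, and the simulated SNPs are independent (no LD) with arbitrary effect sizes and allele frequencies rather than the Crohn's disease or prostate cancer distributions.

```python
import numpy as np


def allelic_log_or(genotypes, is_case):
    """Marginal (univariate) per-allele log odds ratio and its standard error.

    genotypes: (n_subjects, n_snps) array of risk-allele counts in {0, 1, 2}.
    is_case:   boolean array of length n_subjects.
    """
    cases = genotypes[is_case]
    controls = genotypes[~is_case]
    # 2x2 allele-count table per SNP; 0.5 is added to every cell
    # (assumption: a simple continuity correction to avoid zero cells).
    a = cases.sum(axis=0) + 0.5                              # risk alleles, cases
    b = 2 * cases.shape[0] - cases.sum(axis=0) + 0.5         # other alleles, cases
    c = controls.sum(axis=0) + 0.5                           # risk alleles, controls
    d = 2 * controls.shape[0] - controls.sum(axis=0) + 0.5   # other alleles, controls
    log_or = np.log(a * d / (b * c))
    se = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def select_top_k(log_or, se, k):
    """Indices of the k SNPs with the largest |z|.

    Ranking by |z| is equivalent to ranking two-sided Wald P-values.
    """
    z = np.abs(log_or / se)
    return np.argsort(-z)[:k]


def auc(scores, is_case):
    """AUC as the probability that a case outscores a control (ties count 1/2)."""
    pos = scores[is_case]
    neg = scores[~is_case]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


def demo(n=4000, m=200, n_causal=10, k=20, seed=0):
    """Simulate independent SNPs, fit on one half, evaluate AUC on the other."""
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.05, 0.5, size=m)           # hypothetical allele frequencies
    genotypes = rng.binomial(2, maf, size=(n, m))
    beta = np.zeros(m)
    beta[:n_causal] = 0.2                          # hypothetical true log odds ratios
    linear = (genotypes - 2 * maf) @ beta
    is_case = rng.random(n) < 1 / (1 + np.exp(-linear))

    train = np.arange(n) < n // 2
    test = ~train
    # One-phase: selection and estimation both use the training half.
    log_or, se = allelic_log_or(genotypes[train], is_case[train])
    top = select_top_k(log_or, se, k)
    score = genotypes[test][:, top] @ log_or[top]
    return auc(score, is_case[test])


if __name__ == "__main__":
    print(f"Test-set AUC with top-ranked SNPs: {demo():.3f}")
```

Varying `k` in `demo` gives a rough feel for the trade-off the abstract points to: adding more SNPs brings in more noisy odds ratio estimates, which can offset the gain from capturing additional true associations.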

Keywords:  AUC; GWAS; ROC curve; discriminatory accuracy; probability of correct classification; risk prediction

Year:  2013        PMID: 24166696     DOI: 10.1002/gepi.21762

Source DB:  PubMed          Journal:  Genet Epidemiol        ISSN: 0741-0395            Impact factor:   2.135


  12 in total

Review 1.  Methodological challenges in constructing DNA methylation risk scores.

Authors:  Anke Hüls; Darina Czamara
Journal:  Epigenetics       Date:  2019-07-22       Impact factor: 4.528

Review 2.  Statistical learning approaches in the genetic epidemiology of complex diseases.

Authors:  Anne-Laure Boulesteix; Marvin N Wright; Sabine Hoffmann; Inke R König
Journal:  Hum Genet       Date:  2019-05-02       Impact factor: 4.132

3.  Multi-locus genetic risk score predicts risk for Crohn's disease in Slovenian population.

Authors:  Katarina Zupančič; Kristijan Skok; Katja Repnik; Rinse K Weersma; Uroš Potočnik; Pavel Skok
Journal:  World J Gastroenterol       Date:  2016-04-14       Impact factor: 5.742

Review 4.  Polygenic Risk Scores in Clinical Psychology: Bridging Genomic Risk to Individual Differences.

Authors:  Ryan Bogdan; David A A Baranger; Arpana Agrawal
Journal:  Annu Rev Clin Psychol       Date:  2018-05-07       Impact factor: 18.561

5.  Contemporary Considerations for Constructing a Genetic Risk Score: An Empirical Approach.

Authors:  Benjamin A Goldstein; Lingyao Yang; Elias Salfati; Themistoclies L Assimes
Journal:  Genet Epidemiol       Date:  2015-07-22       Impact factor: 2.135

Review 6.  Developing and evaluating polygenic risk prediction models for stratified disease prevention.

Authors:  Nilanjan Chatterjee; Jianxin Shi; Montserrat García-Closas
Journal:  Nat Rev Genet       Date:  2016-05-03       Impact factor: 53.242

7.  Estimating the predictive ability of genetic risk models in simulated data based on published results from genome-wide association studies.

Authors:  Suman Kundu; Raluca Mihaescu; Catherina M C Meijer; Rachel Bakker; A Cecile J W Janssens
Journal:  Front Genet       Date:  2014-06-13       Impact factor: 4.599

8.  Regularized machine learning in the genetic prediction of complex traits.

Authors:  Sebastian Okser; Tapio Pahikkala; Antti Airola; Tapio Salakoski; Samuli Ripatti; Tero Aittokallio
Journal:  PLoS Genet       Date:  2014-11-13       Impact factor: 5.917

9.  Incorporation of personal single nucleotide polymorphism (SNP) data into a national level electronic health record for disease risk assessment, part 1: an overview of requirements.

Authors:  Timur Beyan; Yeşim Aydın Son
Journal:  JMIR Med Inform       Date:  2014-07-24

Review 10.  Genetic-based prediction of disease traits: prediction is very difficult, especially about the future.

Authors:  Steven J Schrodi; Shubhabrata Mukherjee; Ying Shan; Gerard Tromp; John J Sninsky; Amy P Callear; Tonia C Carter; Zhan Ye; Jonathan L Haines; Murray H Brilliant; Paul K Crane; Diane T Smelser; Robert C Elston; Daniel E Weeks
Journal:  Front Genet       Date:  2014-06-02       Impact factor: 4.599
