Bjarni J Vilhjálmsson, Jian Yang, Hilary K Finucane, Alexander Gusev, Sara Lindström, Stephan Ripke, Giulio Genovese, Po-Ru Loh, Gaurav Bhatia, Ron Do, Tristan Hayeck, Hong-Hee Won, Sekar Kathiresan, Michele Pato, Carlos Pato, Rulla Tamimi, Eli Stahl, Noah Zaitlen, Bogdan Pasaniuc, Gillian Belbin, Eimear E Kenny, Mikkel H Schierup, Philip De Jager, Nikolaos A Patsopoulos, Steve McCarroll, Mark Daly, Shaun Purcell, Daniel Chasman, Benjamin Neale, Michael Goddard, Peter M Visscher, Peter Kraft, Nick Patterson, Alkes L Price.
Abstract
Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, predicted R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
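The contrast the abstract draws between pruning followed by thresholding and posterior-mean effect sizes can be illustrated with a minimal sketch. This is not the LDpred implementation: the abstract does not specify the prior, so the sketch assumes a Gaussian (infinitesimal) prior, standardized genotypes, and a known LD correlation matrix. The function names, the thresholds `r2_max` and `p_max`, and the inputs `n_samples` and `h2` (heritability) are all hypothetical choices made for illustration.

```python
import numpy as np


def prune_and_threshold(beta_hat, p_values, ld, r2_max=0.2, p_max=0.05):
    """Greedy LD pruning followed by p value thresholding (illustrative).

    Markers are visited from most to least significant. A marker is kept
    only if its p value is at most p_max and its squared correlation with
    every marker already kept is below r2_max. All other markers get a
    weight of zero, which is how this approach discards information.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    p_values = np.asarray(p_values, dtype=float)
    ld = np.asarray(ld, dtype=float)
    weights = np.zeros_like(beta_hat)
    kept = []
    for j in np.argsort(p_values):
        if p_values[j] > p_max:
            break  # every remaining marker is less significant
        if all(ld[j, k] ** 2 < r2_max for k in kept):
            kept.append(j)
            weights[j] = beta_hat[j]
    return weights


def posterior_mean_gaussian(beta_hat, ld, n_samples, h2):
    """Posterior mean effects under an ASSUMED Gaussian prior.

    Model assumed here (not stated in the abstract):
      prior:      beta ~ N(0, (h2 / M) * I)
      likelihood: beta_hat | beta ~ N(D beta, D / N)
    where D is the LD matrix, M the number of markers, N the sample size.
    The posterior mean is then (D + M / (N * h2) * I)^{-1} beta_hat.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    ld = np.asarray(ld, dtype=float)
    m = beta_hat.shape[0]
    shrink = m / (n_samples * h2)
    return np.linalg.solve(ld + shrink * np.eye(m), beta_hat)


def risk_score(genotypes, weights):
    """Polygenic score: weighted sum of standardized genotypes per person."""
    return np.asarray(genotypes, dtype=float) @ weights
```

With no LD (an identity matrix for `ld`), the posterior mean under these assumptions reduces to a uniform shrinkage of each marginal estimate by the factor h2 / (h2 + M/N), so every marker still contributes to the score, whereas pruning and thresholding zeroes out all markers that fail either filter.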
Year: 2015 PMID: 26430803 PMCID: PMC4596916 DOI: 10.1016/j.ajhg.2015.09.001
Source DB: PubMed Journal: Am J Hum Genet ISSN: 0002-9297 Impact factor: 11.025