Zijie Zhao, Yanyao Yi, Jie Song, Yuchang Wu, Xiaoyuan Zhong, Yupei Lin, Timothy J Hohman, Jason Fletcher, Qiongshi Lu.
Abstract
Polygenic risk scores (PRSs) have wide applications in human genetics research, but often include tuning parameters which are difficult to optimize in practice due to limited access to individual-level data. Here, we introduce PUMAS, a novel method to fine-tune PRS models using summary statistics from genome-wide association studies (GWASs). Through extensive simulations, external validations, and analysis of 65 traits, we demonstrate that PUMAS can perform various model-tuning procedures using GWAS summary statistics and effectively benchmark and optimize PRS models under diverse genetic architecture. Furthermore, we show that fine-tuned PRSs will significantly improve statistical power in downstream association analysis.
Keywords: GWAS; Model tuning; Polygenic risk score; Summary statistics
Year: 2021 PMID: 34488838 PMCID: PMC8419981 DOI: 10.1186/s13059-021-02479-9
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 A workflow of model-tuning strategies. A Traditional approaches split individual-level data into training and validation subsets to fine-tune prediction models. B Our method directly generates training and validation summary statistics without using individual-level information and uses the simulated summary statistics as input to select the best model
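The idea in panel B can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a per-SNP summary statistic is a sum of independent, approximately normal per-individual contributions, so that a training-subset sum can be drawn conditionally on the full-sample sum and the validation sum obtained by subtraction. The function name `split_summary_stat` and the parameter `unit_var` (variance of one individual's contribution) are hypothetical, and the sketch treats SNPs independently, ignoring LD.

```python
import numpy as np

def split_summary_stat(total, n_total, n_train, unit_var, rng):
    """Split a full-sample sum into simulated training and validation sums.

    Assumption (not from the source text): `total` is a sum of `n_total`
    i.i.d. contributions with variance `unit_var`. Under a normal
    approximation, the sum over a random subset of `n_train` individuals,
    given the full sum, has
        mean     = total * n_train / n_total
        variance = unit_var * n_train * (n_total - n_train) / n_total
    """
    n_val = n_total - n_train
    cond_mean = np.asarray(total, dtype=float) * n_train / n_total
    cond_var = unit_var * n_train * n_val / n_total
    train = rng.normal(cond_mean, np.sqrt(cond_var))
    validation = total - train  # the two parts always add up to the full sum
    return train, validation

rng = np.random.default_rng(0)
full_sums = np.array([120.0, -35.0, 4.0])  # made-up per-SNP sums, illustration only
train_sums, val_sums = split_summary_stat(full_sums, 10_000, 7_500, 1.0, rng)
```

Repeating the draw with different random seeds would mimic repeated training/validation splits without access to individual-level data.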
Fig. 2 Comparison of PUMAS and repeated learning. A, C Model-tuning results based on PUMAS. B, D Results of repeated learning with individual-level data as input. The proportion of causal variants was set to 0.001 in A and B and 0.1 in C and D. The X-axis shows the log-transformed p-value thresholds. The Y-axis shows the predictive performance quantified by average R² across four folds. Parameter α was set to 0 in this simulation (see the “Methods” section). Results for other settings are summarized in Additional file 1: Fig. S1-S9
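The individual-level baseline in panels B and D can be sketched as follows. This is an illustrative reading of "repeated learning", not the authors' code: each fold runs a marginal per-SNP regression on the training part, builds a p-value-thresholded PRS, and records the squared correlation with the phenotype on the held-out part. The function names, the normal approximation for p-values, and the omission of LD clumping are assumptions of this sketch.

```python
import math
import numpy as np

def marginal_gwas(G, y):
    """Per-SNP simple linear regression; returns effect sizes and p-values.

    Assumes every SNP is polymorphic in the sample (no zero-variance columns).
    """
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    ssx = (Gc ** 2).sum(axis=0)
    beta = Gc.T @ yc / ssx
    resid_var = ((yc[:, None] - Gc * beta) ** 2).sum(axis=0) / (len(y) - 2)
    z = beta / np.sqrt(resid_var / ssx)
    # Two-sided p-value under a normal approximation to the test statistic.
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])
    return beta, p

def cross_validated_r2(G, y, cutoffs, n_folds=4, seed=0):
    """Average predictive R^2 across folds for each p-value cutoff."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    r2 = np.zeros((n_folds, len(cutoffs)))
    for k, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        beta, p = marginal_gwas(G[train_idx], y[train_idx])
        for c, cutoff in enumerate(cutoffs):
            keep = p < cutoff
            if not keep.any():
                continue  # empty PRS: leave R^2 at 0
            score = G[val_idx][:, keep] @ beta[keep]
            if score.std() == 0:
                continue  # constant score carries no information
            r2[k, c] = np.corrcoef(score, y[val_idx])[0, 1] ** 2
    return r2.mean(axis=0)

# Toy data for illustration only: 3 causal SNPs out of 30.
rng = np.random.default_rng(42)
G = rng.binomial(2, 0.3, size=(400, 30))
y = G[:, :3] @ np.ones(3) + rng.normal(size=400)
cutoffs = [1e-4, 1.0]
mean_r2 = cross_validated_r2(G, y, cutoffs)
best_cutoff = cutoffs[int(np.argmax(mean_r2))]
```

The cutoff with the highest fold-averaged R² is taken as the tuned parameter; the figure compares this kind of curve with the one obtained from summary statistics alone.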
Fig. 3 Model-tuning performance on real GWAS data. A PUMAS performance on the EA training set. B Prediction performance on two validation sets for EA. C PUMAS performance on the AD training set. D Prediction performance on two validation sets for AD. The X-axis shows the log-transformed p-value cutoff in the PRS, which is the tuning parameter of interest. The Y-axis indicates predictive R². EA educational attainment, AD Alzheimer’s disease
Fig. 4 Technical issues involving sample size and LD clumping. A PUMAS results on LDL cholesterol and EA with various sample size specifications. The two gray dashed lines represent the optimal p-value cutoffs selected by the “QCed” setting for LDL and EA, respectively. B Predictive performance on external validation for AD PRS based on pruned and clumped summary statistics. The two gray dashed lines mark the optimal p-value cutoffs inferred by PUMAS on pruned and clumped summary statistics. LDL low-density lipoprotein, EA educational attainment, AD Alzheimer’s disease
Fig. 5 An atlas of optimized PRSs for complex diseases and traits. The figure includes 45 diseases/traits with optimized R² > 0.005. Each circle represents a disease or trait. The size of each circle indicates the sample size of the study; colors mark the five trait categories. The X-axis indicates the negative log-transformed p-value cutoff in the PRS, which is also the tuning parameter of interest. The Y-axis indicates the optimal R². Information on all diseases and traits is summarized in Additional file 7: Table S6
Fig. 6 Identifying neuroimaging trait PRSs associated with AD. A QQ plot for the associations between 211 neuroimaging trait PRSs and AD. p-values were based on the meta-analysis of IGAP 2019 GWAS and the UK Biobank with a proxy AD phenotype. B Effect size estimates for top associations. Imaging trait PRSs that reached a p-value < 0.01 in the meta-analysis are shown in the plot. X-axis: effect sizes of imaging trait PRSs on the AD-proxy phenotype in the UK Biobank; Y-axis: effect sizes on AD in the IGAP 2019 GWAS. Imaging traits whose p-value achieved Bonferroni-corrected significance in the meta-analysis are highlighted in red. The dashed lines represent the standard error of effect size estimates
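The Bonferroni criterion mentioned in the caption can be made concrete with a short sketch. The family-wise error rate of 0.05 is an assumption here, since the caption does not state it, and the function names are hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold controlling the family-wise error rate."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

def bonferroni_significant(pvalues, alpha=0.05):
    """Flag each p-value that passes the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(len(pvalues), alpha)
    return [p < threshold for p in pvalues]

# With 211 PRSs tested, as in the caption, and the assumed alpha of 0.05:
threshold_211 = bonferroni_threshold(211)
```

Under that assumption the per-test threshold for 211 PRSs is 0.05 / 211, roughly 2.4e-4, which is much stricter than the p-value < 0.01 display cutoff used for panel B.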