Etienne J Orliac, Daniel Trejo Banos, Sven E Ojavee, Kristi Läll, Reedik Mägi, Peter M Visscher, Matthew R Robinson.
Abstract
Genetically informed, deep-phenotyped biobanks are an important research resource and it is imperative that the most powerful, versatile, and efficient analysis approaches are used. Here, we apply our recently developed Bayesian grouped mixture of regressions model (GMRM) in the UK and Estonian Biobanks and obtain the highest genomic prediction accuracy reported to date across 21 heritable traits. When compared to other approaches, GMRM accuracy was greater than annotation prediction models run in the LDAK or LDPred-funct software by 15% (SE 7%) and 14% (SE 2%), respectively, and was 18% (SE 3%) greater than a baseline BayesR model without single-nucleotide polymorphism (SNP) markers grouped into minor allele frequency-linkage disequilibrium (MAF-LD) annotation categories. For height, the prediction accuracy R² was 47% in a UK Biobank holdout sample, which was 76% of the estimated SNP heritability. We then extend our GMRM prediction model to provide mixed-linear model association (MLMA) SNP marker estimates for genome-wide association (GWAS) discovery, which increased the independent loci detected to 16,162 in unrelated UK Biobank individuals, compared to 10,550 from BoltLMM and 10,095 from Regenie, a 62 and 65% increase, respectively. The average χ² value of the leading markers increased by 15.24 (SE 0.41) for every 1% increase in prediction accuracy gained over a baseline BayesR model across the traits. Thus, we show that modeling genetic associations accounting for MAF and LD differences among SNP markers, and incorporating prior knowledge of genomic function, is important for both genomic prediction and discovery in large-scale individual-level studies.
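As an illustration of the height figures quoted in the abstract, the following is a minimal sketch of how a holdout prediction accuracy and its share of the SNP heritability could be computed. It assumes that prediction accuracy is the squared Pearson correlation between the observed phenotype and the genetic predictor in the holdout sample; the function names and the simulated data are hypothetical and are not taken from the GMRM software.

```python
import numpy as np

def prediction_r2(y, y_hat):
    """Squared Pearson correlation between phenotype and genetic predictor
    (assumed definition of prediction accuracy for this sketch)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    r = np.corrcoef(y, y_hat)[0, 1]
    return r ** 2

def fraction_of_upper_bound(r2, h2_snp):
    """Prediction accuracy expressed as a fraction of the SNP heritability."""
    return r2 / h2_snp

# Synthetic holdout data (not biobank data): a predictor plus independent noise.
rng = np.random.default_rng(0)
g = rng.normal(size=5000)            # hypothetical genetic predictor
y = g + rng.normal(size=5000)        # hypothetical phenotype
r2 = prediction_r2(y, g)

# Height figures from the abstract: R² = 0.47 is 76% of the SNP heritability,
# so the implied SNP heritability is 0.47 / 0.76.
implied_h2 = 0.47 / 0.76
print(f"simulated R2 = {r2:.3f}, implied height h2_SNP = {implied_h2:.3f}")
```

Dividing the reported accuracy by the reported fraction recovers the heritability estimate the abstract leaves implicit, which is roughly 0.62 for height.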
Keywords: Bayesian penalized regression; association study; genomic prediction
Year: 2022 PMID: 35905320 PMCID: PMC9351350 DOI: 10.1073/pnas.2121279119
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 12.779
Fig. 1. Prediction accuracy of a GMRM. (A) Prediction accuracy obtained by GMRM for the 21 traits compared to the best individual-level LDAK prediction model (LDAK), a BayesR model with five mixture groups (BayesR), or polygenic risk scores calculated using BoltLMM mixed-linear model association SNP marker effects (PRS). (B) The prediction accuracy of LDAK and GMRM models as a percentage difference from the accuracy obtained from the BayesR model. Error bars give 95% CIs. Full trait code descriptions are given in .
Fig. 2. Prediction accuracy of GMRM in UK and Estonian Biobanks. (A) Prediction accuracy of the GMRM effect sizes as a percentage of their upper bound (the SNP heritability) for 21 traits. (B) Prediction accuracy obtained by GMRM for the 21 traits compared to that expected from ridge-regression theory. (C) Prediction accuracy obtained using GMRM UK Biobank estimates in UK Biobank holdout data (UK → UK), GMRM UK Biobank estimates in Estonian data (UK → EE), and UK Biobank and Estonian meta-analysis GMRM estimates in Estonian holdout data (UK+EE → EE) for five focal traits. (D) Odds ratio for top 1% of the GMRM genetic predictor compared to all others, within UK → UK and UK+EE → EE for T2D, CAD, and high BP. Error bars give 95% CIs. Full trait code descriptions are given in .
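The odds ratio in panel D of Fig. 2 compares individuals in the top 1% of the genetic predictor with everyone else. A minimal sketch of that comparison from a 2×2 table is given below. It assumes a simple unadjusted odds ratio with a Wald 95% CI on the log scale and requires all four cells to be non-zero; the paper may have used a covariate-adjusted model, and the function name and toy data are hypothetical.

```python
import numpy as np

def top_fraction_odds_ratio(score, case, top_fraction=0.01):
    """Unadjusted odds ratio of being a case for individuals in the top
    `top_fraction` of `score` versus all others, with a Wald 95% CI.
    Assumes every cell of the 2x2 table is non-zero."""
    score = np.asarray(score, dtype=float)
    case = np.asarray(case, dtype=bool)
    cutoff = np.quantile(score, 1.0 - top_fraction)
    top = score >= cutoff

    a = np.sum(top & case)      # cases in the top group
    b = np.sum(top & ~case)     # controls in the top group
    c = np.sum(~top & case)     # cases in the remainder
    d = np.sum(~top & ~case)    # controls in the remainder

    odds_ratio = (a * d) / (b * c)
    se_log = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = np.exp(np.log(odds_ratio) - 1.96 * se_log)
    upper = np.exp(np.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, (lower, upper)

# Toy data (not biobank data): 1,000 individuals, 10 of them in the top 1%.
toy_score = np.arange(1000)
toy_case = np.zeros(1000, dtype=bool)
toy_case[990:995] = True   # 5 cases among the top 10
toy_case[:50] = True       # 50 cases among the remaining 990
toy_or, toy_ci = top_fraction_odds_ratio(toy_score, toy_case)
print(toy_or, toy_ci)
```

With only a handful of individuals in the top group the interval is wide, which is why the error bars matter when reading a top-1% comparison.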
Fig. 3. GWAS discovery of a GMRM in the UK Biobank. (A) Number of LD-independent genomic regions identified at by GMRM, compared to in BoltLMM (Bolt) and Regenie (Regenie) across 21 traits. (B) For SNP markers identified at by Bolt, Regenie, and GMRM, we estimated the difference in χ² value between GMRM and Regenie and plotted this against the difference in prediction accuracy of GMRM compared to a BayesR model, to test whether discovery power scales with improved prediction accuracy of using MAF-LD annotation groups. Shaded area gives the 95% CIs of the regression line. Full trait code descriptions are given in .