Omer Weissbrod1, Masahiro Kanai2,3, Huwenbo Shi4,5, Steven Gazal4,6,7, Wouter J Peyrot4,8, Amit V Khera2,9, Yukinori Okada3,10, Alicia R Martin2, Hilary K Finucane2,11, Alkes L Price12,13. 1. Epidemiology Department, Harvard School of Public Health, Boston, MA, USA. oweissbrod@hsph.harvard.edu. 2. Broad Institute of MIT and Harvard, Cambridge, MA, USA. 3. Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan. 4. Epidemiology Department, Harvard School of Public Health, Boston, MA, USA. 5. OMNI Bioinformatics, San Francisco, CA, USA. 6. Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA. 7. Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA. 8. Department of Psychiatry, Amsterdam UMC, Vrije Universiteit, Amsterdam, the Netherlands. 9. Verve Therapeutics, Cambridge, MA, USA. 10. Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan. 11. Department of Medicine, Massachusetts General Hospital, Boston, MA, USA. 12. Epidemiology Department, Harvard School of Public Health, Boston, MA, USA. aprice@hsph.harvard.edu. 13. Broad Institute of MIT and Harvard, Cambridge, MA, USA. aprice@hsph.harvard.edu.
Abstract
Polygenic risk scores suffer reduced accuracy in non-European populations, exacerbating health disparities. We propose PolyPred, a method that improves cross-population polygenic risk scores by combining two predictors: a new predictor that leverages functionally informed fine-mapping to estimate causal effects (instead of tagging effects), addressing linkage disequilibrium differences, and BOLT-LMM, a published predictor. When a large training sample is available in the non-European target population, we propose PolyPred+, which further incorporates the non-European training data. We applied PolyPred to 49 diseases/traits in four UK Biobank populations using UK Biobank British training data, and observed relative improvements versus BOLT-LMM ranging from +7% in south Asians to +32% in Africans, consistent with simulations. We applied PolyPred+ to 23 diseases/traits in UK Biobank east Asians using both UK Biobank British and Biobank Japan training data, and observed improvements of +24% versus BOLT-LMM and +12% versus PolyPred. Summary statistics-based analogs of PolyPred and PolyPred+ attained similar improvements.
Polygenic risk scores suffer reduced accuracy in non-European populations, exacerbating health disparities. We propose PolyPred, a method that improves cross-population polygenic risk scores by combining two predictors: a new predictor that leverages functionally informed fine-mapping to estimate causal effects (instead of tagging effects), addressing linkage disequilibrium differences, and BOLT-LMM, a published predictor. When a large training sample is available in the non-European target population, we propose PolyPred+, which further incorporates the non-European training data. We applied PolyPred to 49 diseases/traits in four UK Biobank populations using UK Biobank British training data, and observed relative improvements versus BOLT-LMM ranging from +7% in south Asians to +32% in Africans, consistent with simulations. We applied PolyPred+ to 23 diseases/traits in UK Biobank east Asians using both UK Biobank British and Biobank Japan training data, and observed improvements of +24% versus BOLT-LMM and +12% versus PolyPred. Summary statistics-based analogs of PolyPred and PolyPred+ attained similar improvements.
Polygenic risk scores (PRS) can identify individuals at elevated risk of complex diseases, providing opportunities for preventative action[1-6]. However, many studies have shown that PRS based on European training data attain lower accuracy when applied to populations of non-European ancestry[7-26]. This loss of accuracy is primarily driven by LD differences[12-15], allele frequency differences (including population-specific SNPs)[13,14,27], and causal effect size differences[12-14,28-31], though differences in heritability also play a minor role[13,14,32]. PRS based on non-European training data do not suffer from these limitations, but are currently limited by much smaller training sample sizes[1,12-15,21,33]. The development of new methods to reduce this gap in cross-population PRS accuracy has the potential to ameliorate health disparities[13].Here, we propose PolyPred, which linearly combines two complementary predictors derived from European training data: (1) PolyFun-pred, a new predictor that circumvents LD differences by applying genome-wide functionally informed fine-mapping[34,35] to precisely estimate causal effects (instead of tagging effects); and (2) BOLT-LMM[36,37], a published predictor that analyzes all loci jointly and can capture all signals in extremely polygenic loci. BOLT-LMM requires individual-level training data. If individual-level training data is not available, we propose two analogous methods: (i) PolyPred-S, which linearly combines PolyFun-pred with SBayesR[38], and (ii) PolyPred-P, which linearly combines PolyFun-pred with PRS-CS[39]. Recommendations for when to use PolyPred, PolyPred-S, or PolyPred-P are provided below.In the special case where there exists a large (e.g., N≥50K) non-European training sample from the target population (or a closely related population), we propose PolyPred+, a polygenic prediction method that leverages both European and non-European training data. PolyPred+ linearly combines (1) PolyFun-pred; (2) BOLT-LMM; and (3) BOLT-LMM-pop, which is obtained by applying BOLT-LMM to the non-European training data, addressing MAF differences and causal effect size differences. If individual-level training data is not available, we propose the alternative methods PolyPred-S+ and PolyPred-P+, which replace BOLT-LMM with either SBayesR or PRS-CS, respectively. Recommendations for when to use PolyPred+, PolyPred-S+, or PolyPred-P+ are provided below.We compared PolyPred and PolyPred+ (and their summary statistic-based analogues) to state-of-the-art polygenic prediction methods via simulations and analyses of 49 diseases and complex traits in 4 populations from the UK Biobank[40], in Biobank Japan[41], and in Uganda-APCDR[42,43]. We conclude that PolyPred and its summary statistic-based analogues substantially increase cross-population polygenic prediction accuracy, and that PolyPred+ and its summary statistic-based analogues further increases cross-population prediction accuracy in the special case where non-European training data is available in large sample size.
Results
Overview of Methods
PolyPred combines two complementary predictors: PolyFun-pred and BOLT-LMM (Table 1 and Figure 1a). PolyFun-pred is a new predictor that leverages genome-wide functionally informed fine-mapping[34,35] to estimate posterior mean causal effects (instead of tagging effects; see Supplementary Note) for all SNPs with European MAF≥0.1% (18 million SNPs in this study) by applying PolyFun + SuSiE[35] to European training data across 2,763 overlapping 3Mb loci. Leveraging fine-mapped posterior mean causal effects for cross-population polygenic prediction aims to address LD differences between populations. BOLT-LMM[36,37] is a published predictor that estimates posterior mean tagging effects of common SNPs (1.2 million HapMap 3 SNPs[44] in this study) using European individual-level training data. Combining PolyFun-pred with BOLT-LMM is advantageous because they have complementary advantages: PolyFun-pred estimates causal effects rather than tagging effects. BOLT-LMM estimates tagging effects, but it analyzes all loci jointly and it can potentially capture all signals in extremely polygenic loci (Methods).
Table 1:
Summary of main methods evaluated.
Method
Constituent methods
SNP set
Training data
Fine-mapped effect sizes
Summary statistics
Ref.
P+T
-
All (18 million)
Eur
No
Yes
[45,46]
BOLT-LMM
-
HapMap 3 (1.2 million)
Eur
No
No
[36,37]
SBayesR
-
HapMap 3 (1.2 million)
Eur
No
Yes
38
PRS-CS
-
HapMap 3 (1.2 million)
Eur
No
Yes
39
PolyPred
PolyFun-pred, BOLT-LMM
All (18 million)
Eur
Yes
No
This work
PolyPred-S
PolyFun-pred, SBayesR
All (18 million)
Eur
Yes
Yes
This work
PolyPred-P
PolyFun-pred, PRS-CS
All (18 million)
Eur
Yes
Yes
This work
PolyPred+
PolyFun-pred, BOLT-LMM, BOLT-LMM-pop
All (18 million)
Eur + target pop
Yes
No
This work
PolyPred-S+
PolyFun-pred, SBayesR, SBayesR-pop
All (18 million)
Eur + target pop
Yes
Yes
This work
PolyPred-P+
PolyFun-pred, PRS-CS, PRS-CS-pop
All (18 million)
Eur + target pop
Yes
Yes
This work
For each method we report its constituent methods (or “-” for individual methods), the set of SNPs analyzed in model training using UK Biobank training data (and its size when restricted to imputed UK Biobank SNPs with European MAF≥0.1% and INFO score≥0.6), the training data analyzed, whether it incorporates fine-mapped effect sizes (as opposed to tagging effect sizes), whether it can work with summary statistics, and the corresponding reference. Eur: European; target pop: non-European target population; Method-pop: Method applied to training data from non-European target population.
Figure 1:
Overview of PolyPred and PolyPred+.
(a) Overview of PolyPred. PolyPred linearly combines the effect sizes of BOLT-LMM (βBOLT–LMM) and PolyFun-pred (βPolyFun–pred), (trained using European training data). It uses a small training sample from the target population to estimate mixing weights (ω1, ω2) for the constituent methods. (b) Overview of PolyPred+. PolyPred+ linearly combines the effect sizes of BOLT-LMM (βBOLT–LMM), PolyFun-pred (βPolyFun–pred) (trained using European training data), and BOLT-LMM-pop (βBOLT–LMM–pop) (trained using non-European training data from the target population). It uses a small training sample from the target population to estimate mixing weights (ω1, ω2, ω3) for the constituent methods. PolyPred-S and PolyPred-P (resp. Poly-Pred-S+ and PolyPred-P+) replace all instances of BOLT-LMM with SBayesR or PRS-CS, respectively.
In the special case where a large training sample is available in the target population (or a closely related population), we propose PolyPred+, which combines three complementary predictors: PolyFun-pred, BOLT-LMM, and BOLT-LMM-pop (Table 1 and Figure 1b); BOLT-LMM-pop refers to application of BOLT-LMM to common SNPs (1.2 million HapMap 3 SNPs in this study) using training data from the non-European target population, addressing MAF differences and causal effect size differences.PolyPred computes linear combinations of the estimated effect sizes of their constituent predictors:
where i indexes SNPs, j indexes the constituent predictors (PolyFun-pred and BOLT-LMM for PolyPred; PolyFun-pred, BOLT-LMM and BOLT-LMM-pop for PolyPred+), is the PolyPred (+) per-allele effect size of SNP i, w are method-specific weights, and is the per-allele effect size of SNP i for method j (or 0 if SNP i was not considered by method j). Predicted phenotypes are computed by applying effect sizes to target genotypes:
where is the predicted phenotype of an individual from the target population and x is the number of minor alleles of SNP i carried by the individual. The mixing weights w in Equation 1 are estimated via non-negative least squares regression using a small number of training individuals from the target population (500 in this study), regressing true phenotypes on a linear combination of the constituent predictors (which are computed as in Equation 2).PolyPred requires individual-level training data for its BOLT-LMM component. If only summary statistics (and summary LD information) are available, we propose two analogous methods (Table 1): (i) PolyPred-S, which linearly combines PolyFun-pred and SBayesR[38]; and (ii) PolyPred-P, which linearly combines PolyFun-pred and PRS-CS[39]. We also propose the analogous methods PolyPred-P+ and PolyPred-S+ (Table 1). Further details of PolyPred and PolyPred+ (and their summary statistic-based analogues) are provided in the Methods section; we have publicly released open-source software implementing these methods (see Code Availability).We evaluate prediction accuracy for each method and target population using relative-R2, defined as the R2 obtained in the target non-European population (after correcting for covariates and potential confounders; see Methods) divided by the R2 obtained by BOLT-LMM in UK Biobank non-British Europeans (employing the same correction), using the same training data for the numerator and the denominator. This quotient transforms the prediction accuracies from an absolute scale to a scale of relative improvement (vs. the BOLT-LMM predictor in the UK Biobank non-British European target population), which is invariant to factors such as training sample size and trait heritability. For disease traits, we additionally evaluated the area under the receiving operating characteristic. We provide further details in the Methods section. We compare PolyPred and PolyPred+ (and their summary statistic-based analogues) to 4 published methods: LD-pruning + P-value thresholding (P+T)[45,46], BOLT-LMM[36,37], SBayesR[38], and PRS-CS[39] (Table 1).Our recommendation for which version of PolyPred to use (see Table 1) depends on three factors: (i) whether individual-level training data is available; (ii) the size and consistency of matched ancestry of the LD reference panel (if individual-level training data is not available); and (iii) whether non-European training data is available. Our results for the underlying constituent methods are summarized in Table 2 (detailed below), and our recommendations are summarized in Figure 2.
Table 2:
Summary of the relative performance of constituent PRS methods.
LD
BOLT-LMM
SBayesR
PRS-CS
Figure(s)/Table(s)
Individual-level data (UKB, N=337K)
✔✔
✔
✔
Figures 3,4,6
In-sample LD (UKB, N=337K)
---
✔✔
✔
Figures 3,4,6
Very large unmatched LD (UKB, N=337K)
---
✔
✔✔
Extended Data Figure 1
Small unmatched LD (1000G, N=489)
---
✘
✔✔*
Tables S4–S6
For each of three constituent PRS methods (BOLT-LMM, SBayesR, PRS-CS) we report its relative performance in prediction in UK Biobank non-British Europeans under various settings; we also provide links to the corresponding Figure(s)/Table(s). ✔✔: the method is significantly more accurate than the second best method in the same row, and combining this method with PolyFun-pred increases prediction accuracy; ✔✔*: the method is significantly more accurate than the second best method in the same row, and combining this method with PolyFun-pred does not increase prediction accuracy; ✔: the method is significantly less accurate than the best method in the same row, but is significantly more accurate than P+T; ✘: the method is not significantly more accurate than P+T; ---: the method is not applicable, because it requires individual-level data. For individual-level data, the difference between BOLT-LMM and the second-best method was significant in simulations but non-significant in real trait analyses. For In-sample LD, the difference between SBayesR and PRS-CS was significant in simulations but non-significant in real traits analyses. For Very large unmatched LD (a likely scenario when analyzing summary statistics from a meta-analysis of many cohorts), we performed real trait analyses only, as simulations would have required another very large individual-level data set in addition to UK Biobank (see Supplementary Note). For small unmatched LD, we performed both simulations and real trait analyses but report results of real trait analyses, which we believe to be most reflective of real-life settings (in simulations, SBayesR was significantly more accurate than PRS-CS; see Supplementary Note). Results for non-European target populations from UK Biobank were similar, though some of the differences were not statistically significant due to smaller prediction accuracies and sample sizes. We have facilitated the use of very large LD reference panels for European training data by publicly releasing summary LD information for N=337K British-ancestry UK Biobank samples across 18 million SNPs (see Data availability).
Figure 2:
Recommendations for the application of PolyPred, PolyPred+ and related methods.
(a) Flowchart of recommendations when only European training data is available. (b) Flowchart of recommendations when both European and non-European training data are available. We note that when working with summary statistics from a meta-analysis of many cohorts, there is typically no LD reference panel that closely matches the GWAS population. Also, it is possible that the answers to the flowchart questions are different for European vs. non-European training data, in which case the recommendation would be to use a hybrid method based on the answers to each flowchart in turn (e.g. PolyFun-pred + BOLT-LMM + PRS-CS-pop; not listed in Table 1). For both (a) and (b), we recommend training PolyFun-pred using a very large LD reference panel (e.g. N=337K UK Biobank British) with a dense SNP set (e.g. 8 million SNPs). We have facilitated this by publicly releasing summary LD information for N=337K British-ancestry UK Biobank samples across 18 million SNPs (see Data availability).
Simulations with in-sample LD
We compared PolyPred, PolyPred-S and PolyPred-P to P+T, BOLT-LMM, SBayesR, and PRS-CS via simulations, using real genotypes or in-sample LD from the UK Biobank[40]. We trained each method using 337,491 unrelated British-ancestry individuals[40], and computed predictions in four target populations: non-British Europeans, South Asians, East Asians, and Africans. We estimated mixing weights for PolyPred, PolyPred-S and PolyPred-P using 500 individuals from the target population. We evaluated prediction accuracy using held-out individuals from each target population that were not included in the training sets: 42K non-British Europeans, 7.7K South Asians, 0.9K East Asians, and 6.2K Africans. We computed PRS using 250,963 MAF≥0.1% SNPs with INFO score≥0.6 on chromosome 22.Generative trait architectures were specified as follows. We simulated traits with polygenicity (genome-wide proportion of causal SNPs) equal to either 0.1% (less polygenic) or 0.3% (more polygenic) and heritability equal to 5%. We specified prior causal probabilities for each SNP in proportion to per-SNP heritabilities, which we generated for each SNP based on its British LD, MAF, and functional annotations, using the baseline-LF model[47]. For each causal SNP, we sampled ancestry-specific causal effect sizes from a multivariate normal distribution assuming cross-population genetic correlations of 0.8[13,30]. Other parameter settings were explored in secondary analyses (see below).We computed relative-R2 for each method, target population, and trait architecture, averaged across 100 simulations. In addition to the simulations with in-sample LD described below, we also performed simulations with reference panel LD (Supplementary Note; also see Table 2). Further details of the simulation framework are provided in the Methods section.The simulation results are reported in Figure 3 and Supplementary Table 1 (also see Table 2). PolyPred was the most accurate method in each target population, with relative improvements vs. BOLT-LMM (resp. P-values for improvement) ranging from +13% in non-British Europeans (P<10−16) to +65% in Africans (P<10−16) for the less polygenic architecture, and from +2% in non-British Europeans (P=0.0001) to +17% in Africans (P=10−8) for the more polygenic architecture. PolyPred-S and PolyPred-P performed slightly worse than PolyPred, but were substantially and significantly more accurate than their corresponding constituent methods. Among the remaining methods, BOLT-LMM was consistently the most accurate and P+T was consistently the least accurate method, far underperforming the other methods (despite its widespread recent use[11,13-18,23,31,48-52]). We note that the higher accuracy of BOLT-LMM vs. SBayesR and PRS-CS does not imply that BOLT-LMM is a superior method, as BOLT-LMM analyzes individual-level training data whereas SBayesR and PRS-CS analyze summary statistics.
Figure 3:
Cross-population PRS results for simulated UK Biobank traits using in-sample LD.
We report average prediction accuracy (relative-R2; see main text) for PRS trained in UK Biobank British samples (N=337K) and applied to 4 UK Biobank target populations across 100 simulated traits with less polygenic (0.1% of SNPs causal; left panel) or more polygenic (0.3% of SNPs causal; right panel) architectures. Target population sample sizes are indicated in parentheses; PolyPred and its summary statistic-based analogues used 500 additional training samples from each target population to estimate mixing weights. Asterisks above each bar denote statistical significance of the difference vs. BOLT-LMM, with black asterisks denoting an advantage and red asterisks denoting a disadvantage (*P<0.05; **P<0.001). P-values were computed using a two-sided Wald test and were not adjusted for multiple comparisons. Errors bars denote standard errors. Numerical results, absolute prediction accuracies (R2), and P-values of relative improvements vs. BOLT-LMM are reported in Supplementary Table 1.
We additionally performed many secondary analyses to investigate the sensitivity of the results to the simulation parameters, the SNP set and the functional annotations, and to evaluate the computational cost and memory cost of each method (Supplementary Note, Supplementary Tables 1–2).We conclude that PolyPred and its summary statistic-based analogues are more accurate than BOLT-LMM, SBayesR, PRS-CS, and P+T, with small but significant improvements vs. BOLT-LMM in Europeans and substantial improvements in Africans.
PRS in 4 UK Biobank populations using British training data
We applied PolyPred and its summary statistic-based analogues to 49 diseases and complex traits from the UK Biobank, analyzing 4 target populations (Methods, Supplementary Table 3). As in our simulations, we used UK Biobank British training data (average N=325K) to estimate SNP effect sizes; used 500 additional individuals from the target population to estimate mixing weights; evaluated prediction accuracy using individuals from each of the 4 target populations that were not included in the training data: 42K non-British Europeans, 7.7K South Asians, 0.9K East Asians, and 6.2K Africans; and compared PolyPred and its summary statistic-based analogues to P+T, BOLT-LMM, SBayesR, and PRS-CS. We meta-analyzed relative-R2 across traits by restricting to 7 well-powered, independent complex traits from the UK Biobank[40] (|rg|<0.3; see Methods and Supplementary Table 3) that were also available in Biobank Japan and in Uganda-APCDR (see below). We have publicly released SNP effect sizes used for prediction for each of the 4 methods (see Data Availability).We computed relative-R2 for each method and target population. The results are summarized in Figure 4 and provided in Supplementary Tables 4–6 (also see Table 2). Among the published methods, BOLT-LMM attained the highest prediction accuracy in all target populations (differences between BOLT-LMM and SBayesR were small and not statistically significant). P+T was much less accurate than the other methods (despite its widespread recent use[11,13-18,23,31,48-52]), suffering relative losses of 37–50% vs. BOLT-LMM. We thus used BOLT-LMM as a benchmark.
Figure 4:
Cross-population PRS results for real UK Biobank traits.
We report average prediction accuracy (relative-R2; see main text), meta-analyzed across 7 well-powered, independent traits, for PRS trained in UK Biobank British samples (average N=325K) and applied to 4 UK Biobank target populations. Target population sample sizes are indicated in parentheses; PolyPred and its summary statistic-based analogues used 500 additional training samples from each target population to estimate mixing weights. Asterisks above each bar denote statistical significance of the difference vs. BOLT-LMM, with black asterisks denoting an advantage and red asterisks denoting a disadvantage (*P<0.05; **P<0.001). P-values were computed using a two-sided Wald test and were not adjusted for multiple comparisons. Errors bars denote standard errors. Numerical results, results for all 49 traits analyzed, absolute prediction accuracies (R2), and P-values of relative improvements vs. BOLT-LMM are reported in Supplementary Tables 4–6.
Among all 7 methods, PolyPred attained the highest prediction accuracy in each target population. Improvements in average relative-R2 of PolyPred vs. BOLT-LMM were equal to +7.5% in non-British Europeans (P=0.05), +6.8% in South Asians (P=0.02), +11% in East Asians (P=0.12) and +32% in Africans (P=0.02). The larger improvement in Africans reflects the larger LD differences vs. British training data, due to earlier divergence times[13,14,53]. The lack of statistical significance in East Asians reflects the low power to detect significant differences in very small target samples. PolyPred-S and PolyPred-P were consistently the second and third most accurate methods, respectively, with statistically significant improvements vs. their constituent methods. We additionally verified that PolyPred was well-calibrated (i.e., regressing the true phenotype on the predicted phenotype yields a slope of 1) in all target populations, whereas the alternative methods were not always well-calibrated (Supplementary Tables 4–6, Supplementary Note). Despite the improvements attained by PolyPred, the reductions in prediction accuracy in non-European populations remained significant (P<0.002), with meta-analyzed absolute R2 equal to 0.17 in non-British Europeans, 0.11 in South Asians, 0.093 in East Asians, and 0.053 in Africans (Methods, Supplementary Tables 4–5).As a secondary analysis, we meta-analyzed the results of each method across three independent diseases: type 2 diabetes, asthma, and all autoimmune disease (Methods); these diseases were not included in our primary meta-analyses due to low heritabilities. PolyPred attained the highest prediction accuracy for each target population and each disease, except for East Asians (where standard errors were large due to the small sample size) and for type 2 diabetes in non-British Europeans (where BOLT-LMM performed slightly but non-significantly better) (Supplementary Table 4). We performed additional secondary analyses to evaluate the impact of the LD reference panel and the SNP set on prediction accuracy, to evaluate additional methods, and to evaluate the results when modifying the parameters of PolyPred and the other evaluated methods (Supplementary Note, Supplementary Tables 4–7).We conclude that PolyPred and its summary statistic-based analogues substantially increase cross-population polygenic prediction accuracy vs. published methods (with a particularly large improvement in Africans), consistent with simulations. However, there remains a large gap in cross-population polygenic prediction accuracy as compared to Europeans.
PRS using ENGAGE meta-analysis training data
We sought to analyze training data consisting of summary statistics for real traits from a meta-analysis of many European cohorts, for which in-sample LD is generally not available. We analyzed 8.1 million meta-analyzed summary statistics from the European Network for Genetic and Genomic Epidemiology (ENGAGE) consortium[54-56] for four traits (BMI, waist-hip-ratio (adjusted for BMI), total cholesterol, and triglycerides; average N=61,365), and evaluated the prediction accuracy using the same four UK Biobank populations analyzed previously. For each method, we used an LD reference panel based on UK Biobank British individuals; we emphasize that unlike the other primary analyses, the LD reference panel was mis-specified, because it was not based on in-sample LD. We excluded methods that require individual-level training data (BOLT-LMM and PolyPred) from this analysis.The results are summarized in Extended Data Figure 1 and reported in Supplementary Tables 5 and 8 (also see Table 2). Briefly, PolyPred-P was generally the most accurate method, and PRS-CS outperformed SBayesR (with a significant improvement for non-British Europeans and Africans), consistent with a previous study[57] (unlike our analysis of UK Biobank training data, where SBayesR outperformed PRS-CS; Figure 4). However, differences between similarly performing methods were generally not statistically significant (due to moderately large standard errors), and thus caution should be exercised in their interpretation; for this reason, we did not perform secondary analyses to further assess differences between methods.
Extended Data Fig. 1
Cross-population PRS results for real UK Biobank traits, using summary statistics from a meta-analysis of many cohorts.
We report average prediction accuracy (relative-R2, but computed with respect to PRS-CS instead of BOLT-LMM; see main text), meta-analyzed across 4 well-powered, approximately independent traits, for PRS trained in European Network for Genetic and Genomic Epidemiology (ENGAGE) samples (average N=61,365) and applied to 4 UK Biobank populations. Target population sample sizes are indicated in parentheses; PolyPred and its summary statistic-based analogues used 500 additional training samples from each target population to estimate mixing weights. Asterisks above each bar denote statistical significance of the difference vs. PRS-CS, with red asterisks denoting a disadvantage (*P<0.05; **P<0.001). P-values were computed using a two-sided Wald test and were not adjusted for multiple comparisons. Errors bars denote standard errors. Numerical results, results for all 4 traits analyzed, absolute prediction accuracies (R2), and P-values of relative improvements vs. PRS-CS are reported in Supplementary Table 5 and Supplementary Table 8.
We conclude that PolyPred-P can increase cross-population polygenic prediction accuracy vs. published methods when analyzing summary statistics from a meta-analysis of many cohorts.
PRS in Biobank Japan and Uganda-APCDR cohorts
We applied PolyPred and its summary statistic-based analogues to predict 23 diseases and complex traits in Biobank Japan[41] and 7 complex traits in Uganda-APCDR, an African-ancestry cohort[42,43] (Methods, Supplementary Table 3). We performed these experiments to avoid training effect sizes and testing predictions in the same cohort, which may produce inflated prediction accuracies[33,58-60]. We again used UK Biobank British training data (average N=325K) to estimate SNP effect sizes, and used 500 individuals from the target population to estimate mixing weights. We evaluated prediction accuracy using individuals from each of the 2 target cohorts that were not included in the training data: 5K Biobank Japan individuals and 1.3K Uganda-APCDR individuals. We again compared PolyPred and its summary statistic-based analogues to P+T, BOLT-LMM, SBayesR, and PRS-CS. We meta-analyzed relative-R2 across the same 7 well-powered, independent complex traits used in the UK Biobank analyses (Supplementary Table 3).The results are summarized in Figure 5 and reported in Supplementary Tables 5 and 9. Among the published methods, we again observed that BOLT-LMM attained the highest prediction accuracy in each target population, and that P+T was substantially less accurate than the other methods. Among all 7 methods, PolyPred attained the highest prediction accuracy in Biobank Japan, and PolyPred-P attained the highest prediction accuracy in Uganda-APCDR (although the difference between PolyPred and PolyPred-P in Uganda-APCDR was not statistically significant). Improvements of PolyPred vs. BOLT-LMM in average relative-R2 were equal to +13% in Biobank Japan (P=2×10−6) and +22% in Uganda-APCDR (P=0.26), similar to our UK Biobank results above. We observed similar improvements for PolyPred-S vs. SBayesR and PolyPred-P vs. PRS-CS (both of which were statistically significant in Biobank Japan). Prediction accuracy for each method was much smaller in Biobank Japan and Uganda-APCDR (e.g. 0.32 and 0.11 for PolyPred; Figure 5) than in UK Biobank East Asians and UK Biobank Africans (0.62 and 0.34; Figure 4), likely due to higher SNP-heritabilities in the UK Biobank (see below). We also applied PolyPred+ and its summary statistic-based analogues to Biobank Japan, incorporating additional Biobank Japan training data (average N=124K), with the caveat that this analysis involved training and testing in the same cohort (Methods). PolyPred+ attained increased prediction accuracy, with a further +23% improvement vs. PolyPred (P=0.0004), with similar results for PolyPred-S+ and PolyPred-P+ (Supplementary Tables 5, 9).
Figure 5:
Cross-population PRS results for Biobank Japan and Uganda-APCDR traits.
We report average prediction accuracy (relative-R2; see main text), meta-analyzed across 7 well-powered, independent traits, for PRS trained in UK Biobank British samples (average N=325K) and applied to Biobank Japan and Uganda-APCDR target populations. Target population sample sizes are indicated in parentheses; PolyPred and its summary statistic-based analogues used 500 additional training samples from each target population to estimate mixing weights. Asterisks above each bar denote statistical significance of the difference vs. BOLT-LMM, with black asterisks denoting an advantage and red asterisks denoting a disadvantage (*P<0.05; **P<0.001). P-values were computed using a two-sided Wald test and were not adjusted for multiple comparisons. Errors bars denote standard errors. Numerical results, results for all 23 traits analyzed, absolute prediction accuracies (R2), and P-values of relative improvements vs. BOLT-LMM are reported in Supplementary Table 9.
We performed additional experiments to investigate the above result of decreased prediction accuracy in Biobank Japan vs. UK Biobank East Asians. We matched the BOLT-LMM British training sample size to the Biobank Japan training sample size, and obtained a relative-R2 in UK Biobank non-British Europeans (using UK Biobank British training samples) +108% larger than in Biobank Japan (using Biobank Japan training samples), consistent with the +104% increase expected from theory[61,62] based on the +67% higher SNP-heritabilities in UK Biobank (Supplementary Table 10, Supplementary Note). This suggests that differences in SNP-heritability due to ancestry or cohort differences may explain most of the differences in prediction accuracies observed between the UK Biobank and Biobank Japan. Further experiments and interpretation are provided in the Supplementary Note. We performed 6 additional secondary analyses to evaluate the sensitivity of the results to various factors (Supplementary Note, Supplementary Tables 5, 9).We conclude that PolyPred and its summary statistic-based analogues substantially increase cross-population polygenic prediction accuracy vs. published methods when applied to target cohorts different from the training cohort.
PRS in East Asians using British and Japanese training data
We applied PolyPred+ and its summary statistic-based analogues to predict 23 diseases and complex traits in UK Biobank East Asians using UK Biobank British and Biobank Japan training data (Supplementary Table 3). We performed this experiment to explore the special case where non-European training data is available in large sample size from a population that is genetically similar to the target population, in a cohort that is distinct from the target cohort (previous studies considered only European training data or analyzed non-European training data from the target cohort[11,13-17]). We note that this experiment is still imperfect in that the European training data and non-European target data are from the same cohort (UK Biobank); however, we believe that cohort effects would deflate rather than inflate the relative improvement of PolyPred+ vs. other methods, since they would confer an advantage to the European training data but not the non-European training data. We used UK Biobank British training data (average N=325K) and Biobank Japan training data (average N=124K) to estimate SNP effect sizes. We again used 500 individuals from the target population to estimate mixing weights, and evaluated prediction accuracy using 900 UK Biobank East Asians that were not included in the training data. We compared PolyPred, PolyPred+, and their summary statistic-based analogues to P+T, BOLT-LMM, SBayesR, and PRS-CS (Methods). We meta-analyzed relative-R2 across the same 7 well-powered, independent complex traits used in the previous analyses (Supplementary Table 3).The results are summarized in Figure 6 and reported in Supplementary Tables 4–6. PolyPred+ attained the highest prediction accuracy, with a +24% improvement vs. BOLT-LMM (P=0.0009) and a +12% improvement vs. PolyPred (P=0.0014). This implies that incorporating non-European training data can provide a substantial advantage, if it is available in large sample size. Results for PolyPred-S+ (vs. SBayesR and PolyPred-S) and PolyPred-P+ (vs. PRS-CS and PolyPred-P) were similar. We emphasize that the +12% improvement for PolyPred+ vs. PolyPred should be viewed as a lower bound on the improvement that would be obtained in settings without cohort effects that may confer an advantage to the European training data. We performed additional secondary analyses to evaluate the sensitivity of the results to various factors (Supplementary Note, Supplementary Tables 4–6).
Figure 6:
Cross-population PRS results for UK Biobank East Asians when incorporating both European and non-European training data.
We report average prediction accuracy (relative-R2; see main text), meta-analyzed across 7 well-powered, independent traits, for PRS trained in UK Biobank British (average N=325K) and Biobank Japan samples (average N=124K; used by PolyPred+ and its summary statistic-based analogues only) and applied to UK Biobank East Asians. The target population sample size is indicated in parentheses; PolyPred, PolyPred+, and their summary statistic-based analogues used 500 additional training samples from the target population to estimate mixing weights. Asterisks above each bar denote statistical significance of the difference vs. BOLT-LMM, with black asterisks denoting an advantage and red asterisks denoting a disadvantage (*P<0.05; **P<0.001). P-values were computed using a two-sided Wald test and were not adjusted for multiple comparisons. Errors bars denote standard errors. Numerical results, results for all 23 traits analyzed, absolute prediction accuracies (R2), and P-values of relative improvements vs. BOLT-LMM are reported in Supplementary Tables 4–6.
We conclude that PolyPred+ and its summary statistic-based analogues further increase cross-population prediction accuracy in the special case where non-European training data from the target population (or a closely related population) is available in large sample size. We emphasize that efforts to assess the benefit of incorporating non-European training data should analyze non-European training data from a cohort that is distinct from the target cohort, otherwise results may be inflated due to cohort effects.
Discussion
We have introduced PolyPred, which improves cross-population polygenic risk prediction by incorporating causal effects in addition to tagging effects, addressing cross-population LD differences. Across seven well-powered independent traits, PolyPred significantly increased prediction accuracy over BOLT-LMM by 32% in UK Biobank Africans and by 13% in Biobank Japan (with similar results vs. SBayesR and PRS-CS). In the special case where a large training sample is available in the non-European target population (or a closely related population), we have introduced PolyPred+, which further incorporates the non-European training data, addressing MAF differences and causal effect size differences. PolyPred+ significantly increased prediction accuracy in UK Biobank East Asians over BOLT-LMM by 24% (and over PolyPred by 12%). PolyPred and PolyPred+ require individual-level training data (for their BOLT-LMM component), but we have also introduced summary statistic-based analogues of PolyPred and PolyPred+ in cases where individual-level training data is not available; specific recommendations are provided in Figure 2 (also see Table 2). In conclusion, PolyPred and its summary statistic-based analogues substantially improve cross-population polygenic prediction accuracy, ameliorating health disparities[13]. We have publicly released the PRS coefficients for all SNPs and traits analyzed under all evaluated methods (see Data Availability).Although we substantially improved cross-population PRS accuracy over the state of the art, prediction accuracy in non-Europeans is still substantially lower compared to Europeans, even within the UK Biobank. There are two reasons for the remaining accuracy gap. First, European sample sizes are still limited, which limits the ability of PolyFun-pred to estimate causal rather than tagging effects. Second, non-European sample sizes are limited, which limits the ability of BOLT-LMM applied to non-European samples to estimate tagging effects. Even with an infinite European training sample, which allows estimating causal effects perfectly (thus addressing LD differences), prediction accuracy could still be higher for Europeans vs. non-Europeans due to cross-population genetic correlations less than 1[13,30,63,64] and different allele frequencies (including population-specific SNPs) (Supplementary Note). Hence our theory and results confirm that larger non-European GWAS are the best way to further improve PRS accuracy in non-European populations[9,10,12,13,21].Our work has several limitations, providing opportunities for future work. First, we did not evaluate a setting where the British training data, the non-British training data, and the target population are sampled from three different cohorts. Second, PolyPred requires a large number of imputed SNPs (e.g. 8.1 million SNPs in the ENGAGE analysis) to perform fine-mapping, motivating the need for large cross-population imputation panels. Third, it may be possible to improve PRS accuracy for admixed individuals by using European effect sizes for European alleles and non-European effect sizes for non-European alleles[16,17]. Fourth, PolyPred and its summary statistic-based analogues are slower than alternative PRS methods (Supplementary Note). Fifth, PolyPred cannot use data from a fixed-effects meta-analysis of GWAS data of different populations (Supplementary Note). Sixth, PolyPred requires a small training sample from the target cohort to maintain calibrated predictions (Supplementary Note). Seventh, PolyPred prediction accuracy could in principle be improved if it were possible to decompose its constituent predictors into shared and non-shared components (Supplementary Note). Despite all these limitations, PolyPred and PolyPred+ and their summary statistic-based analogues provide a clear improvement for cross-population polygenic risk prediction.
Methods
PolyPred and its summary statistic-based analogues
All methods in this paper use a linear PRS, i.e., , where is the PRS of an individual, x is the number of minor alleles of SNP i carried by that individual, and is the estimated per-allele causal effect size of SNP i. The methods differ in the way they estimate .PolyPred and PolyPred+ both combine the methods PolyFun-pred and BOLT-LMM; PolyPred-S and PolyPred-S+ both combine the methods PolyFun-pred and SBayesR; and PolyPred-P and PolyPred-P+ both combine the methods PolyFun-pred and PRS-CS. PolyFun-pred estimates as the (approximate) posterior mean causal effect size of SNP i, as estimated by PolyFun + SuSiE[35] based on European training data, using 187 functional annotations to specify prior causal probabilities (see below). BOLT-LMM (resp. SBayesR and PRS-CS) estimates tagging effects (Supplementary Note) of HapMap 3 SNPs by applying BOLT-LMM[36,37] (resp. SBayesR[38] and PRS-CS[39]) to European training data. BOLT-LMM (resp. SBayesR) treats the effect of each SNP i as a random effect sampled from a mixture of two (resp. four) zero-mean normal distributions, whose variances and mixture weights are determined in a data-driven manner. PRS-CS treats the effect of each SNP i as a random effect sampled from a continuous shrinkage prior distribution.PolyPred and its summary statistic-based analogues compute the effect size of each SNP i that is either in HapMap 3 or has a European MAF≥0.1% and INFO score ≥0.6 as a weighted combination of (1) its PolyFun-pred effect size based on European training data; and (2) its BOLT-LMM (resp. SBayesR and PRS-CS) effect size based on European training data:
where is the PolyFun-pred approximate posterior mean causal effect size of SNP i based on European training data, is the approximate posterior mean tagging effect size of SNP i based on European training data using the indicated method (setting the effects of SNPs not in HapMap 3 to zero), and wPolyFun–pred, wBOLT–LMM/SBayesR/PRS–CS are mixing weights. PolyPred estimates the mixing weights via non-negative least squares estimation (i.e., least squares estimation constrained to produce to non-negative estimates) based on training individuals from the target cohort. Specifically, PolyPred (resp. PolyPred-S and PolyPred-P) estimates the mixing weights by computing the PRS corresponding to the PolyFun-pred effect sizes (given by ) and the PRS corresponding to the BOLT-LMM (resp. SBayesR and PRS-CS) effect sizes (given by ), and then fitting the mixing weights by regressing the true phenotypes y of the training individuals in the target cohort on the PolyFun-pred and the BOLT-LMM (resp. SBayesR and PRS-CS) PRSs. The use of non-negative least squares estimation guarantees that the correlation of the predicted phenotype with the true phenotype is at least as large as the smallest of the correlations between each constituent predicted phenotype and the true phenotype.PolyPred+ and its summary statistic-based analogues compute the effect size of each SNP i that is either in HapMap 3 or has a European MAF≥0.1% and INFO score ≥0.6 as a weighted combination of (1) its PolyFun-pred effect size based on European training data; (2) its BOLT-LMM (resp. SBayesR and PRS-CS) effect size based on European training data; and (3) its effect size as estimated by applying BOLT-LMM (resp. SBayesR and PRS-CS) to training data from the target population (or a closely related population):
where is the BOLT-LMM (resp. SBayesR or PRS-CS) approximate posterior mean tagging effect of SNP i based on training data from the non-European population (and set to zero for SNPs that are not in HapMap 3), and wBOLT–LMM/SBayesR/PRS–CS–nonEur is the mixing weight of . The mixing weights are estimated as in PolyPred.In practice, we apply PolyPred and its summary statistic-based analogues by linearly combining the PolyFun-pred PRS and the BOLT-LMM (or SBayesR or PRS-CS) PRS (rather than linearly combining the SNP effect sizes). The two procedures are almost mathematically identical, with the only difference being that a linear combination of PRSs can also accommodate an intercept, which explicitly bias-corrects the PRS to the target population.We applied PolyFun-pred in the same way that we applied PolyFun + SuSiE in our previous work[35]. Briefly, we applied PolyFun-pred across 2,763 overlapping 3Mb loci (equally spaced starting at chromosome 1, position 0) spanning 18,212,157 European MAF>0.1% imputed SNPs with INFO score>0.6 (excluding the HLA and two other long-range LD regions)[35], assuming 10 causal SNPs per locus. We used summary statistics computed by BOLT-LMM, based on up to N=337,491 unrelated British-ancestry UK Biobank individuals, and using summary LD information estimated directly from the target samples. Full details are provided in ref.[35]. We note that the use of BOLT-LMM summary statistics is mathematically equivalent to regressing the target phenotypes on BOLT-LMM off-chromosome PRS prior to applying PolyFun + SuSiE[37]. We also note that the use of 3Mb loci guarantees that for each SNP, the estimation of its causal effect size takes into account virtually all relevant SNPs that may be in LD with that SNP (because LD in European populations rarely ranges beyond 1Mb[65]), allowing to disentangle its causal effect size from its tagging effect size.PRS methods that include non-common SNPs (MAF<5%) may be sensitive to MAF-dependent and LD-dependent architectures[66,47,67]. Previous PRS methods have largely alleviated this concern by discarding non-common SNPs instead of explicitly modeling their lower per-SNP heritability[33,38,39,68-70,58,59,71-73]. In contrast, PolyFun-pred accounts for MAF-dependent and LD-dependent architectures by specifying SNP-specific prior causal probabilities based on the baseline-LF model[47] (Supplementary Table 11). In detail, PolyFun-pred uses 187 overlapping functional annotations from the baseline-LF model (previously described in ref.[35]), including 10 common MAF bins (MAF≥0.05); 10 low-frequency MAF bins (0.05>MAF≥0.001); 6 LD-related annotations for common SNPs; 5 LD-related annotations for low-frequency SNPs; 40 binary functional annotations for common SNPs; 7 continuous functional annotations for common SNPs; 40 binary functional annotations for low-frequency SNPs; 3 continuous functional annotations for low-frequency SNPs; and 66 annotations constructed via windows around other annotations[74] (Supplementary Table 11).
Estimating relative-R2 and its standard error
We measured prediction accuracy for each trait via a measure that we call relative-R2, defined via the following computations:Compute R2-PRS: the R2 obtained via a linear predictor that includes PRS, age, sex, age*sex (if the correlation with age was <0.95), UK Biobank assessment center (defined via dummy binary variables), genotyping array, 10 principal components (computed separately for each ancestry; see below), and dilution factor (for biochemical traits only).Compute R2-noPRS, defined like R2-PRS but omitting the PRSCompute R2-PRS-BOLT-EUR, computed by applying BOLT-LMM to UK Biobank non-British Europeans as in step 1Compute R2-noPRS-BOLT-EUR, computed by applying BOLT-LMM but omitting the PRS to non-British Europeans.Compute relative-R2 as (R2-PRS - R2-noPRS) / (R2-PRS-BOLT-EUR - R2 - noPRS-BOLT-EUR).We note that fold improvement in relative-R2 is the same as fold improvement in absolute difference in R2, (i.e., in R2-PRS - R2-noPRS), because the denominator (R2-PRS-BOLT-EUR - R2-noPRS- BOLT-EUR) is a trait-specific scaling factor.We computed standard errors of relative-R2, of differences in relative-R2 (e.g., vs. BOLT-LMM), of ancestry-specific regression slopes, and of the area under the receiver operating curve (for disease traits) via genomic block-jackknife, partitioning the genome into 200 equally-sized consecutive loci and omitting each one in turn. In secondary analyses, we computed standard errors by applying jackknife over individuals from the target population. These analyses yielded much smaller standard errors in the UK Biobank, suggesting that genomic block-jackknife standard errors may be conservative, whereas individual-based jackknife estimates maty be anti-conservative. We emphasize that individual-based jackknife explicitly assumes a fixed training set.We estimated statistics (e.g., relative-R2) for meta-analyzed traits via an inverse-variance weighted average, using weights inversely proportional to the standard error of the R2 of BOLT-LMM in the target population (as estimated via genomic block-jackknife). We estimated the standard error of the meta-analyzed statistics as the square root of the weighted average of the trait-specific sampling variances (obtained via genomic block-jackknife), divided by the square root of the number of traits. We computed p-values of differences in relative-R2 vs. BOLT-LMM via a Wald test.We computed the statistical significance of the decrease in R2 in non-European vs. European target samples via a Wald test for the difference in R2, conservatively estimating the sampling variance of this difference as the sum of the sampling variances of the European R2 and the non-European R2 (this is a conservative estimate as long as the R2 estimates in Europeans and non-Europeans are not negatively correlated, which is extremely unlikely).
Cohorts Analyzed
UK Biobank
The UK Biobank is a UK-based population cohort[40]. We used version 3 of the imputed genotypes, as described in our previous work[35]. We computed ancestry-specific PCs for UK Biobank Africans, UK Biobank East Asians, and UK Biobank South Asians via plink 1.9[75], restricting to SNPs that have ancestry-specific MAF>5%, missingness<10%, HWE p-value>10−10, and that were LD-pruned using the command --indep-pairwise 1000 50 0.05, and restricted to unrelated individuals (kinship coefficient <0.05) from the target ancestry with missingness <10%. We used the UK Biobank provided PCs for UK Biobank Europeans.We defined the ‘autoimmune disease’ trait in the UK Biobank as a union of the following UK Biobank codes: 1154 (irritable bowel syndrome); 1222 (type 1 diabetes); 1224 (thyroid problem); 1225 (hyperthyroidism/thyrotoxicosis); 1226 (hypothyroidism/myxoedema); 1256 (acute infective polyneuritis/guillain-barre syndrome); 1260 (myasthenia gravis); 1261 (multiple sclerosis); 1313 (ankylosing spondylitis); 1372 (vasculitis); 1377 (polymyalgia); 1378 (wegners granulmatosis); 1381 (systemic lupus erythematosis/sle); 1382 (sjogren’s syndrome/sicca syndrome); 1384 (scleroderma/systemic sclerosis); 1437 (myasthenia gravis); 1453 (psoriasis); 1456 (malabsorption/coeliac disease); 1461 (inflammatory bowel disease); 1462 (Crohns disease); 1463 (ulcerative colitis); 1464 (rheumatoid arthritis); 1477 (psoriatic arthropathy); 1522 (grave’s disease); 1661 (vitiligo); 1667 (alopecia / hair loss).
European Network for Genetic and Genomic Epidemiology
European Network for Genetic and Genomic Epidemiology (ENGAGE) is a consortium comprised of 24 cohorts to study the impact of genetic variations on medical phenotypes through GWAS[54]. The consortium has performed over 80,000 GWASs using genetic and phenotype samples from over 600,000 individuals, and made the GWAS summary statistics publicly available[54].We obtained ENGAGE GWAS summary statistics, representing fixed-effect meta-analyses from 22 studies of European ancestry, for 2 lipid phenotypes[55] (triglyceride (N=56,267) and total cholesterol (N=58,327)), and 2 obesity-related phenotypes[56] (BMI (N=80,938) and BMI-adjusted waist hip ratio (N=49,877)). In each ENGAGE study, up to 37.4 million autosomal variants were imputed using the 1000 Genomes Project (we used 8.1 million variants which were also imputed in the UK Biobank); phenotypes were adjusted for age, age squared, genotype principal components, and other study-/trait-specific covariates, and were inverse rank normalized; GWASs were performed for each sex separately and combined using fixed-effect meta-analysis; a single genomic control correction was performed for each study prior to a cross-study meta-analysis[55,56].
Biobank Japan
Biobank Japan (BBJ) is a multi-institutional hospital-based biobank with DNA and serum samples from approximately 200,000 participants from 12 medical institutions in Japan[41]. The participants are mainly of Japanese ancestry and had been diagnosed with at least one of 47 diseases by physicians at the cooperating hospitals. Written informed consent was obtained from all the participants, as approved by the ethics committees of RIKEN Center for Integrative Medical Sciences and the Institute of Medical Sciences at the University of Tokyo.We genotyped samples with either (i) the Illumina HumanOmniExpressExome BeadChip or (ii) a combination of the Illumina HumanOmniExpress and HumanExome BeadChips. We applied standard quality control criteria for both samples and variants as detailed elsewhere[76]. We then pre-phased genotypes with Eagle2[77] and imputed dosages with Minimac3[78] using 1000 Genomes project phase 3 (version 5) data (N=2,504) and Japanese whole-genome sequencing (WGS) data (N=1,037) as a reference[76]. We computed PCs using EIGENSOFT’s smartpca[79].For phenotypes, we retrieved clinical medical records from the participating hospitals through interviews and a standardized questionnaire. We used 23 diseases and complex traits in Biobank Japan which are also analyzed in UK Biobank (Supplementary Table 3). We normalized quantitative phenotypes via inverse-rank normal transformation as described elsewhere[80]. We defined the ‘autoimmune disease’ trait in Biobank Japan as a union of Graves’ disease and rheumatoid arthritis.
Uganda-APCDR
Uganda-APCDR is a population-based cohort from the General Population Cohort (GPC), Uganda. We retrieved genotype and phenotype data through the African Partnership for Chronic Disease Research (APCDR) initiative via the European Genome-Phenome Archive (EGA), using EGAD00010000965 to access genotype data. Phenotype data were accessed via sftp from EGA (reference: DD_PK_050716 gwas_phenotypes_28Oct14.txt). The participants are from nine ethno-linguistic groups in sub-Saharan Africa and had been recruited from the study area located in southwestern Uganda in Kyamulibwa subcounty of Kalungu district, approximately 120 km from Entebbe town. These ethno-linguistic groups have diverse population structure with varying degrees of admixture between Eurasian and East African Nilo-Saharan ancestries, which has been extensively characterized elsewhere[81]. The detailed cohort demographics, sample collection, and processing were described previously[42,43].Briefly, the samples were genotyped using the Illumina HumanOmni 2.5M BeadChip at the Wellcome Trust Sanger Institute. We used the Ricopili pipeline to conduct pre-imputation QC and perform phasing and imputation[82]. Briefly, we phased the data using Eagle 2.3.5[77] and imputed variants using minimac3[78] in chunks ≥3Mb. The 1000 Genomes project phase 3 haplotypes[65] were used as the reference panel for phasing and imputation.As described previously, phenotypes were collected using a standard individual questionnaire, blood samples (laboratory tests), and biophysical measurements (height, weight, waist and hip circumferences and blood pressure)[42]. We normalized quantitative phenotypes via inverse-rank normal transformation.
UK Biobank Simulations
We simulated data based on real genotypes of UK Biobank individuals, using 250,963 MAF≥0.1% SNPs with INFO score≥0.6 on chromosome 22 (including short indels) (Supplementary Note). We trained all methods using 337,491 unrelated British-ancestry individuals[40], and we estimated the mixing weights of PolyPred and its summary statistic-based analogues using up to 1000 additional individuals from each of the four non-British ancestries. We computed summary statistics by applying linear regression via Plink 2.0. We did not evaluate PolyPred+ in the simulations because of the relatively small sample sizes of the UK Biobank non-European populations. We evaluated prediction accuracy via R2, using held-out individuals that were not included in the training sets and were unrelated to the training set individuals and to each other, using 42K non-British Europeans, 7.7K South Asians, 0.9K East Asians, and 6.2K Africans. We computed PRSs by applying plink 2.0 with the --score command, using imputed dosage data (rather than hard-called SNP values). We computed standard errors via a jackknife over simulations.We trained BOLT-LMM by applying BOLT-LMM v2.3.4 to plink files of HapMap 3 SNPs (hard-coded from imputed dosages), using the same covariates specified in the “Estimating relative-R2 and its standard error” Methods subsection, and specifying the flag –predBetasFile to report PRS coefficients.We trained SBayesR using summary statistics from the infinitesimal version of BOLT-LMM (BOLT-LMM-inf[36]), which yielded far superior accuracy vs. using summary statistics from the non-infinitesimal version of BOLT-LMM. We ran SBayesR using 10,000 iterations, 4,000 burn-in iterations, using values from 10% of the iterations to compute posterior means, using the HapMap 3 LD files published the SBayesR authors[83]. We attempted to run SBayesR using a mixture of four distributions (using π = [0.95,0.02,0.02,0.01] and γ = [0,0.01,0.1,1]). In case SBayesR failed with these parameters, we iteratively shrank the last entry in the vector γ by 50% until it was smaller than 10−6, at which point we removed the last mixture component and redefined π such that the first entry was equal to 0.95 and all other entries had the same value such that all values sum to 1.0.We trained PRS-CS using summary statistics from BOLT-LMM-inf (as in SBayesR) with the parameters a=1, b=0.5, thin=5, n_iter=10000, n_burnin=500, and without specifying the value of phi (corresponding to PRS-CS-auto). We used the UK Biobank LD reference panels made publicly available by the authors of PRS-CS (see Data Availability).We trained P+T by applying plink with the command –clump-r2 0.5 –clump-kb 250 with various values of –clump-p1 (following ref.[13]), and using 10,000 randomly selected unrelated UK Biobank British individuals to compute LD. We estimated LD using 10,000 individuals to balance between runtime and accuracy (noting that P+T is relatively insensitive to the LD reference panel size compared to the other methods evaluated in this manuscript). We used summary statistics based on BOLT-LMM, using marginal effect sizes derived from reported χ2 values (i.e., the square root of χ2 divided by the square root of the BOLT-LMM effective sample size[35], and multiplied by the sign of the effect size estimated by the infinitesimal version of BOLT-LMM). We used the best value of –clump-p1 (out of the evaluated values 10−2, 10−3, 10−4, 10−6, 5×10−8) based on the target sample phenotypes, which leads to anti-conservative prediction accuracy estimates for P+T.We used slightly different LD reference panels for PolyFun-pred, SBayesR, and PRS-CS, because (i) they use different algorithms to impose sparsity on LD matrices, and different file formats to store them; and (ii) we assume that naively running SBayesR or PRS-CS using summary LD from the 18 million SNPs used by PolyFun-pred would be computationally infeasible, based on information provided in the manuscripts describing these methods[38,39]. When modifying the training sample size, we kept the LD reference panel sample size fixed to alleviate computational costs.
Analysis of real data
We performed four sets of analyses: (i) Analysis of 4 UK Biobank populations using UK Biobank British training data; (ii) Analysis of 4 UK Biobank populations using ENGAGE meta-analysis training data; (iii) Analysis of Biobank Japan and Uganda-APCDR cohorts; and (iv) Analysis of UK Biobank East Asians using UK Biobank British and Biobank Japan training data. In analysis sets (i), (iii) and (iv), we evaluated PRSs generated by training all methods using unrelated UK Biobank British-ancestry individuals. In analysis set (ii), we evaluated PRSs generated by training all methods using summary statistics from 8.1 million meta-analyzed summary statistics from the ENGAGE consortium[54-56]. In a subset of analysis set (iii) and in analysis set (iv) we additionally evaluated PRSs generated by training BOLT-LMM-BBJ (BOLT-LMM trained on Biobank Japan individuals). In all analysis sets, the individuals in the target populations were unrelated to each other and to the individuals in the training set (when available).In analysis sets (i), (iii) and (iv), we selected the 7 traits to meta-analyze by first restricting the set of 49 traits analyzed in ref.[35] to traits that are available in Biobank Japan and Uganda-APCDR and are well-powered across multiple ancestries, having h2>0.05 in UK Biobank non-British Europeans, in UK Biobank South Asians, and in UK Biobank Africans (see below for details on ancestry-specific heritability estimation). We then iteratively greedily selected ranked traits according to their heritability in UK Biobank non-British Europeans (estimated as in ref. [35]), such that no selected trait had |rg|<0.3 with a previously selected trait.We computed ancestry-specific SNP heritabilities in each UK Biobank ancestry by applying GCTA[84] to unrelated sets of individuals using hard-called HapMap 3 SNPs (using a random set of 10,000 individuals for non-British Europeans to facilitate the computations). We did not use more advanced methods[85] because of the relatively small sample sizes. We meta-analyzed ancestry-specific SNP heritabilities by averaging the estimated heritabilities, and we estimated the meta-analyzed standard error via the square root of the average sampling variance, divided by the square root of the number of traits.In analysis sets (i), (iii) and (iv), We trained all PRS methods on UK Biobank unrelated British-ancestry individuals (average N=325) as described in the Methods subsection “UK Biobank simulations”, but using summary statistics generated by BOLT-LMM when applied to UK Biobank British-ancestry individuals, as described in our previous work[35]. We trained P+T separately for each non-UK Biobank cohort by restricting the set of SNPs considered to the set of SNPs available in both the UK Biobank and in the target cohort. We computed the contribution of PolyFun-pred (resp. BOLT-LMM) towards PolyPred via the ratio of the mixing weight of PolyFun-pred (resp. BOLT-LMM) to the sum of the mixing weights of PolyPred and of BOLT-LMM.In analysis sets (i), (ii) and (iv), we computed a PRS for each UK Biobank individual using imputed dosage data as described in the “UK Biobank Simulations”. In analysis set (iii), we computed a PRS for each individual in Biobank Japan and in Uganda-APCDR using imputed dosage data using Plink 2.0[86,87].In secondary analyses of analysis set (i) we also evaluated LDpred[33]. We trained LDpred using HapMap 3 SNPs and using two different LD reference panels: 1000 Genomes project[65] and UK10K[88]. We used summary statistics from the infinitesimal version of BOLT-LMM (as in SBayesR) and with default parameters, using the parameter --ldr 400. We used the value of “--F” (corresponding to the assumed proportion of causal SNPs, using all the default evaluated values) that yielded the best prediction accuracy in the target sample, yielding anti-conservative accuracy estimates as in P+T.In analysis sets (iii) and (iv), we trained BOLT-LMM-BBJ, SBayesR-BBJ, and PRS-CS-BBJ (BOLT-LMM, SBayesR, and PRS-CS, respectively, trained using Biobank Japan training data) (average N=124K). We selected individuals for training these methods as described in our previous work[13], but excluding a random subset of 5,000 individuals that were used for evaluating prediction accuracy. For SBayesR-BBJ, we used a subset of individuals (N=50K) from Biobank Japan to compute in-sample LD, following the recommendations of the authors of SBayesR[38]. For PRS-CS-BBJ, we used the East Asian LD reference panels made publicly available by the authors of PRS-CS (see Data Availability).
Data availability
Access to the UK Biobank resource is available via application (http://www.ukbiobank.ac.uk). PRS coefficients generated in this study are available for public download at http://data.broadinstitute.org/alkesgroup/polypred_results. Summary LD information of N=337K British-ancestry UK Biobank individuals for 2,763 overlapping 3Mb loci is available at: https://data.broadinstitute.org/alkesgroup/UKBB_LD. Summary LD information of N=50K UK Biobank individuals for SBayesR is available at: https://zenodo.org/record/3350914. Summary LD information used by PRS-CS is available at: https://github.com/getian107/PRScs. Baseline-LF v2.2.UKB annotations and LD-scores for UK Biobank SNPs are available at: https://data.broadinstitute.org/alkesgroup/LDSCORE/baselineLF_v2.2.UKB.tar.gz
Source data are provided with this paper.
Code availability
PolyPred and PolyPred+ are provided as part of the open-source software package PolyFun, which is freely available at https://doi.org/10.5281/zenodo.6139679[89] and https://github.com/omerwe/polyfun. BOLT-LMM is available at https://data.broadinstitute.org/alkesgroup/BOLT-LMM. SBayesR is available at https://cnsgenomics.com/software/gctb. PRS-CS is available at https://github.com/getian107/PRScs.
Cross-population PRS results for real UK Biobank traits, using summary statistics from a meta-analysis of many cohorts.
We report average prediction accuracy (relative-R2, but computed with respect to PRS-CS instead of BOLT-LMM; see main text), meta-analyzed across 4 well-powered, approximately independent traits, for PRS trained in European Network for Genetic and Genomic Epidemiology (ENGAGE) samples (average N=61,365) and applied to 4 UK Biobank populations. Target population sample sizes are indicated in parentheses; PolyPred and its summary statistic-based analogues used 500 additional training samples from each target population to estimate mixing weights. Asterisks above each bar denote statistical significance of the difference vs. PRS-CS, with red asterisks denoting a disadvantage (*P<0.05; **P<0.001). P-values were computed using a two-sided Wald test and were not adjusted for multiple comparisons. Errors bars denote standard errors. Numerical results, results for all 4 traits analyzed, absolute prediction accuracies (R2), and P-values of relative improvements vs. PRS-CS are reported in Supplementary Table 5 and Supplementary Table 8.
Authors: Roseann E Peterson; Karoline Kuchenbaecker; Raymond K Walters; Chia-Yen Chen; Alice B Popejoy; Sathish Periyasamy; Max Lam; Conrad Iyegbe; Rona J Strawbridge; Leslie Brick; Caitlin E Carey; Alicia R Martin; Jacquelyn L Meyers; Jinni Su; Junfang Chen; Alexis C Edwards; Allan Kalungi; Nastassja Koen; Lerato Majara; Emanuel Schwarz; Jordan W Smoller; Eli A Stahl; Patrick F Sullivan; Evangelos Vassos; Bryan Mowry; Miguel L Prieto; Alfredo Cuellar-Barboza; Tim B Bigdeli; Howard J Edenberg; Hailiang Huang; Laramie E Duncan Journal: Cell Date: 2019-10-10 Impact factor: 41.582
Authors: Genevieve L Wojcik; Mariaelisa Graff; Katherine K Nishimura; Ran Tao; Jeffrey Haessler; Christopher R Gignoux; Heather M Highland; Yesha M Patel; Elena P Sorokin; Christy L Avery; Gillian M Belbin; Stephanie A Bien; Iona Cheng; Sinead Cullina; Chani J Hodonsky; Yao Hu; Laura M Huckins; Janina Jeff; Anne E Justice; Jonathan M Kocarnik; Unhee Lim; Bridget M Lin; Yingchang Lu; Sarah C Nelson; Sung-Shim L Park; Hannah Poisner; Michael H Preuss; Melissa A Richard; Claudia Schurmann; Veronica W Setiawan; Alexandra Sockell; Karan Vahi; Marie Verbanck; Abhishek Vishnu; Ryan W Walker; Kristin L Young; Niha Zubair; Victor Acuña-Alonso; Jose Luis Ambite; Kathleen C Barnes; Eric Boerwinkle; Erwin P Bottinger; Carlos D Bustamante; Christian Caberto; Samuel Canizales-Quinteros; Matthew P Conomos; Ewa Deelman; Ron Do; Kimberly Doheny; Lindsay Fernández-Rhodes; Myriam Fornage; Benyam Hailu; Gerardo Heiss; Brenna M Henn; Lucia A Hindorff; Rebecca D Jackson; Cecelia A Laurie; Cathy C Laurie; Yuqing Li; Dan-Yu Lin; Andres Moreno-Estrada; Girish Nadkarni; Paul J Norman; Loreall C Pooler; Alexander P Reiner; Jane Romm; Chiara Sabatti; Karla Sandoval; Xin Sheng; Eli A Stahl; Daniel O Stram; Timothy A Thornton; Christina L Wassel; Lynne R Wilkens; Cheryl A Winkler; Sachi Yoneyama; Steven Buyske; Christopher A Haiman; Charles Kooperberg; Loic Le Marchand; Ruth J F Loos; Tara C Matise; Kari E North; Ulrike Peters; Eimear E Kenny; Christopher S Carlson Journal: Nature Date: 2019-06-19 Impact factor: 69.504
Authors: Po-Ru Loh; George Tucker; Brendan K Bulik-Sullivan; Bjarni J Vilhjálmsson; Hilary K Finucane; Rany M Salem; Daniel I Chasman; Paul M Ridker; Benjamin M Neale; Bonnie Berger; Nick Patterson; Alkes L Price Journal: Nat Genet Date: 2015-02-02 Impact factor: 38.330
Authors: Isabelle Budin-Ljøsne; Julia Isaeva; Bartha Maria Knoppers; Anne Marie Tassé; Huei-yi Shen; Mark I McCarthy; Jennifer R Harris Journal: Eur J Hum Genet Date: 2013-06-19 Impact factor: 4.246
Authors: Elizabeth G Atkinson; Sevim B Bianchi; Gordon Y Ye; José Jaime Martínez-Magaña; Grace E Tietz; Janitza L Montalvo-Ortiz; Paola Giusti-Rodriguez; Abraham A Palmer; Sandra Sanchez-Roige Journal: Neuropsychopharmacology Date: 2022-06-23 Impact factor: 8.294
Authors: Alyna T Khan; Stephanie M Gogarten; Caitlin P McHugh; Adrienne M Stilp; Tamar Sofer; Michael L Bowers; Quenna Wong; L Adrienne Cupples; Bertha Hidalgo; Andrew D Johnson; Merry-Lynn N McDonald; Stephen T McGarvey; Matthew R G Taylor; Stephanie M Fullerton; Matthew P Conomos; Sarah C Nelson Journal: Cell Genom Date: 2022-07-26
Authors: Dongbing Lai; Tae-Hwi Schwantes-An; Marco Abreu; Grace Chan; Victor Hesselbrock; Chella Kamarajan; Yunlong Liu; Jacquelyn L Meyers; John I Nurnberger; Martin H Plawecki; Leah Wetherill; Marc Schuckit; Pengyue Zhang; Howard J Edenberg; Bernice Porjesz; Arpana Agrawal; Tatiana Foroud Journal: Transl Psychiatry Date: 2022-07-05 Impact factor: 7.989
Authors: Tian Ge; Marguerite R Irvin; Amit Patki; Vinodh Srinivasasainagendra; Yen-Feng Lin; Hemant K Tiwari; Nicole D Armstrong; Barbara Benoit; Chia-Yen Chen; Karmel W Choi; James J Cimino; Brittney H Davis; Ozan Dikilitas; Bethany Etheridge; Yen-Chen Anne Feng; Vivian Gainer; Hailiang Huang; Gail P Jarvik; Christopher Kachulis; Eimear E Kenny; Atlas Khan; Krzysztof Kiryluk; Leah Kottyan; Iftikhar J Kullo; Christoph Lange; Niall Lennon; Aaron Leong; Edyta Malolepsza; Ayme D Miles; Shawn Murphy; Bahram Namjou; Renuka Narayan; Mark J O'Connor; Jennifer A Pacheco; Emma Perez; Laura J Rasmussen-Torvik; Elisabeth A Rosenthal; Daniel Schaid; Maria Stamou; Miriam S Udler; Wei-Qi Wei; Scott T Weiss; Maggie C Y Ng; Jordan W Smoller; Matthew S Lebo; James B Meigs; Nita A Limdi; Elizabeth W Karlson Journal: Genome Med Date: 2022-06-29 Impact factor: 15.266
Authors: Nina Mars; Sini Kerminen; Yen-Chen A Feng; Masahiro Kanai; Kristi Läll; Laurent F Thomas; Anne Heidi Skogholt; Pietro Della Briotta Parolo; Benjamin M Neale; Jordan W Smoller; Maiken E Gabrielsen; Kristian Hveem; Reedik Mägi; Koichi Matsuda; Yukinori Okada; Matti Pirinen; Aarno Palotie; Andrea Ganna; Alicia R Martin; Samuli Ripatti Journal: Cell Genom Date: 2022-04-13