Margaux L A Hujoel, Po-Ru Loh, Benjamin M Neale, Alkes L Price.
Abstract
Polygenic risk scores (PRSs) derived from genotype data and family history (FH) of disease provide valuable information for predicting disease risk, but PRSs perform poorly when applied to diverse populations. Here, we explore methods for combining both types of information (PRS-FH) in UK Biobank data. PRSs were trained using all British individuals (n = 409,000), and target samples consisted of unrelated non-British Europeans (n = 42,000), South Asians (n = 7,000), or Africans (n = 7,000). We evaluated PRS, FH, and PRS-FH using liability-scale R², primarily focusing on 3 well-powered diseases (type 2 diabetes, hypertension, and depression). PRS attained average prediction R² values of 5.8%, 4.0%, and 0.53% in non-British Europeans, South Asians, and Africans, respectively, confirming poor cross-population transferability. In contrast, PRS-FH attained average prediction R² values of 13%, 12%, and 10%, respectively, representing a large improvement in Europeans and an extremely large improvement in Africans. In conclusion, including family history improves the accuracy of polygenic risk scores, particularly in diverse populations.
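To make "large" and "extremely large" concrete, the relative gains can be re-derived from the averages quoted in the abstract. This is plain arithmetic on the rounded values reported above, so the ratios are approximate; the dictionary and function names are illustrative only.

```python
# Average liability-scale prediction R2 (%) as quoted in the abstract.
prs_r2 = {"Non-British European": 5.8, "South Asian": 4.0, "African": 0.53}
prs_fh_r2 = {"Non-British European": 13.0, "South Asian": 12.0, "African": 10.0}


def fold_improvement(base, combined):
    """Ratio of PRS-FH R2 to PRS-alone R2 for each target population."""
    return {pop: combined[pop] / base[pop] for pop in base}


ratios = fold_improvement(prs_r2, prs_fh_r2)
for pop, ratio in ratios.items():
    gain = prs_fh_r2[pop] - prs_r2[pop]  # absolute gain in percentage points
    print(f"{pop}: +{gain:.1f} percentage points, {ratio:.1f}x")
```

On these numbers, PRS-FH improves on PRS alone by roughly 2.2-fold in non-British Europeans, 3-fold in South Asians, and about 19-fold in Africans.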
Year: 2022; PMID: 35935918; PMCID: PMC9351615; DOI: 10.1016/j.xgen.2022.100152
Source DB: PubMed; Journal: Cell Genom; ISSN: 2666-979X
Figure 1. Overview of PRS-FH methods
We list the 3 steps of PRS-FHlog and PRS-FHliab. Although the PRS-FHlog model coefficients and PRS-FHliab prior covariance shown here are the same for each parent, they may differ between mother and father. In addition, both methods can incorporate sibling history.
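The liability-scale idea behind PRS-FHliab can be illustrated with a rough Monte Carlo sketch: place a Gaussian prior on the liabilities of the target individual and both parents jointly with the PRS, condition on the observed PRS, keep only draws whose parental liabilities agree with the observed family history under a liability-threshold model, and average the target's liability. Everything specific here is an assumption rather than the paper's procedure: the covariance structure (parent-offspring liability covariance h2/2, no shared environment, unrelated parents, cov(PRS, parent) = rho/2), the rejection-sampling estimator, the choice of target quantity, and the hypothetical function names. Sibling history is not modeled.

```python
import numpy as np
from statistics import NormalDist


def prior_covariance(h2, rho):
    """Hypothetical prior covariance of (L_target, L_father, L_mother, PRS).

    Assumes standardized liabilities and PRS, additive genetics with no shared
    environment, unrelated parents, cov(PRS, L_target) = rho and
    cov(PRS, L_parent) = rho / 2. Valid (PSD) when rho**2 <= h2.
    """
    c = h2 / 2.0
    return np.array([
        [1.0, c, c, rho],
        [c, 1.0, 0.0, rho / 2.0],
        [c, 0.0, 1.0, rho / 2.0],
        [rho, rho / 2.0, rho / 2.0, 1.0],
    ])


def posterior_mean_liability(prs, father_affected, mother_affected,
                             h2, rho, K, n_draws=200_000, seed=0):
    """Monte Carlo estimate of E[L_target | PRS, parental disease status]."""
    sigma = prior_covariance(h2, rho)
    # Gaussian conditioning on the observed PRS (var(PRS) = 1).
    s_lp = sigma[:3, 3]
    cond_mean = s_lp * prs
    cond_cov = sigma[:3, :3] - np.outer(s_lp, s_lp)
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(cond_mean, cond_cov, size=n_draws)
    # Liability-threshold model: a parent is affected iff liability > t.
    t = NormalDist().inv_cdf(1.0 - K)
    keep = ((draws[:, 1] > t) == father_affected) & \
           ((draws[:, 2] > t) == mother_affected)
    if not keep.any():
        raise ValueError("no draws consistent with family history; raise n_draws")
    # Rejection step: average target liability over consistent draws only.
    return draws[keep, 0].mean()


for fa, mo in [(False, False), (True, False), (True, True)]:
    est = posterior_mean_liability(0.0, fa, mo, h2=0.5, rho=0.3, K=0.1)
    print(f"father={fa}, mother={mo}: E[L | PRS=0, FH] ~ {est:.3f}")
```

Rejection sampling is used only because it is easy to read; it becomes inefficient for rare diseases with affected relatives, where a truncated-normal sampler would be the more practical choice.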
Figure 2. PRS-FHlog and PRS-FHliab increase prediction accuracy in simulations
We report mean liability-scale R² across 10 simulations for PRS alone, FH alone (FHlog and FHliab), and PRS-FH methods (PRS-FHlog and PRS-FHliab) for different values of disease prevalence. Error bars denote standard errors. Numerical results are reported in Table S2.
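A toy version of this kind of simulation helps show why combining the two sources helps. The sketch below simulates parent-offspring trios under an assumed liability-threshold model, fits logistic regressions on PRS alone, parental FH alone, and both (in the spirit of PRS-FHlog, though the paper's features and fitting procedure may differ), and scores each on held-out data. Because liability is observed in simulation, the squared correlation between each score and the simulated liability is used as a stand-in for liability-scale R²; that choice, the generative model, and all function names and parameter values are assumptions.

```python
import numpy as np
from statistics import NormalDist


def simulate_trios(n, h2, prs_r2, K, rng):
    """Hypothetical liability-threshold simulation of parent-offspring trios.

    Standard-normal liabilities, additive genetics, no shared environment.
    The PRS is the child's genetic liability plus independent noise, scaled
    so that corr(PRS, child liability)**2 == prs_r2 (needs prs_r2 <= h2 < 1).
    """
    g_father = rng.normal(0.0, np.sqrt(h2), n)
    g_mother = rng.normal(0.0, np.sqrt(h2), n)
    # Child genetic liability: mid-parent value plus segregation noise.
    g_child = (g_father + g_mother) / 2 + rng.normal(0.0, np.sqrt(h2 / 2), n)
    env_sd = np.sqrt(1.0 - h2)
    liab_father = g_father + rng.normal(0.0, env_sd, n)
    liab_mother = g_mother + rng.normal(0.0, env_sd, n)
    liab_child = g_child + rng.normal(0.0, env_sd, n)
    prs = g_child + rng.normal(0.0, np.sqrt(h2**2 / prs_r2 - h2), n)
    t = NormalDist().inv_cdf(1.0 - K)  # affected iff liability > t
    father = (liab_father > t).astype(float)
    mother = (liab_mother > t).astype(float)
    child = (liab_child > t).astype(float)
    return prs, father, mother, child, liab_child


def fit_logistic(X, y, n_iter=25):
    """Unregularized logistic regression fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hessian, X.T @ (y - p))
    return beta


def compare(n=200_000, h2=0.5, prs_r2=0.1, K=0.1, seed=1):
    """Held-out squared correlation with simulated liability per predictor."""
    rng = np.random.default_rng(seed)
    prs, father, mother, child, liab = simulate_trios(n, h2, prs_r2, K, rng)
    ones = np.ones(n)
    designs = {
        "PRS": np.column_stack([ones, prs]),
        "FH": np.column_stack([ones, father, mother]),
        "PRS-FH": np.column_stack([ones, prs, father, mother]),
    }
    train, test = slice(0, n // 2), slice(n // 2, n)
    r2 = {}
    for name, X in designs.items():
        beta = fit_logistic(X[train], child[train])
        score = X[test] @ beta  # linear predictor on held-out individuals
        r2[name] = np.corrcoef(score, liab[test])[0, 1] ** 2
    return r2


print(compare())
```

Varying `K` in `compare` mimics the prevalence axis of the figure, although the absolute values depend entirely on the assumed `h2` and `prs_r2`.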
List of 10 UK Biobank diseases analyzed
| Disease | h²g (British) | N (British) | K (British) | N (Non-B. Eur.) | K (Non-B. Eur.) | N (S.A.) | K (S.A.) | N (Afr.) | K (Afr.) |
|---|---|---|---|---|---|---|---|---|---|
| Lung cancer | 0.096 | 408,903 | 0.006 | 41,842 | 0.006 | 7,048 | 0.002 | 7,087 | 0.003 |
| Bowel cancer | 0.160 | 408,903 | 0.013 | 41,842 | 0.011 | 7,048 | 0.005 | 7,087 | 0.009 |
| Stroke | 0.090 | 408,903 | 0.024 | 41,842 | 0.020 | 7,048 | 0.025 | 7,087 | 0.025 |
| COPD | 0.172 | 408,903 | 0.035 | 41,842 | 0.035 | 7,048 | 0.022 | 7,087 | 0.013 |
| Prostate cancer | 0.296 | 187,889 | 0.038 | 18,192 | 0.032 | 3,811 | 0.014 | 3,096 | 0.050 |
| Breast cancer | 0.204 | 221,014 | 0.061 | 23,650 | 0.061 | 3,237 | 0.036 | 3,991 | 0.028 |
| CAD | 0.206 | 408,903 | 0.085 | 41,842 | 0.077 | 7,048 | 0.140 | 7,087 | 0.063 |
For each disease, we report the SNP-heritability (h²g) in UK Biobank British training data and the number of samples (N) and disease prevalence (K) in each UK Biobank training (British) and target (Non-British European, South Asian, or African) population. We note that the sample size and prevalence in British training data include information from related individuals, but SNP-heritability was estimated using unrelated British individuals. Diseases are listed in order of disease prevalence in British training data. Our primary focus was on 3 well-powered diseases (type 2 diabetes, depression, and hypertension; denoted in bold) with (liability-scale) prediction R² > 0.05 for PRS and/or FH in each target population (no additional criteria were applied). Non-B. Eur., Non-British European; S.A., South Asian; Afr., African; COPD, chronic obstructive pulmonary disease, defined as chronic bronchitis/emphysema; T2D, type 2 diabetes; CAD, coronary artery disease; HTN, hypertension.
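Prevalence K matters because it sets the liability threshold and the scale factor between observed (0/1) R² and liability-scale R². As an illustration only, the sketch below uses the widely cited Lee et al. (2012) transformation for a random population sample (sample prevalence equal to K), R²_liab = R²_obs · K(1 − K) / z², where t = Φ⁻¹(1 − K) and z = φ(t). Whether the paper used exactly this transformation is an assumption; the function names are hypothetical, and the prevalences are the British training values from the table.

```python
from statistics import NormalDist


def liability_threshold(K):
    """Threshold t on a standard-normal liability such that P(L > t) = K."""
    return NormalDist().inv_cdf(1.0 - K)


def observed_to_liability_r2(r2_observed, K):
    """Lee et al. (2012) observed-to-liability R2 for a population sample (P = K)."""
    t = liability_threshold(K)
    z = NormalDist().pdf(t)  # standard normal density at the threshold
    return r2_observed * K * (1.0 - K) / z**2


# British training prevalences taken from the table.
prevalence = {"Lung cancer": 0.006, "Stroke": 0.024, "CAD": 0.085}
for disease, K in prevalence.items():
    factor = observed_to_liability_r2(1.0, K)
    print(f"{disease}: K={K}, t={liability_threshold(K):.2f}, "
          f"R2_liab / R2_obs = {factor:.1f}")
```

The scale factor grows as prevalence falls, so a small observed-scale R² for a rare disease can correspond to a much larger liability-scale R².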
Figure 3. PRS-FH increases prediction accuracy in analyses of UK Biobank diseases
(A) Analyses without covariates. We report liability-scale R² for PRS alone, FH alone (FHlog and FHliab), and PRS-FH methods (PRS-FHlog and PRS-FHliab) for different diseases and target populations.
(B) Analyses with covariates. We report the difference in liability-scale R² (see text) for the corresponding methods incorporating covariates (PRS+, FH+, PRS-FH+), for different diseases and target populations. Error bars denote standard errors; error bars are jittered for PRS-FH (left) and FH (right) for visualization purposes. We focus on three well-powered diseases with R² > 0.05 for PRS and/or FH in each target population (no additional criteria were applied). For depression in Africans, PRS-FHlog performs slightly worse than FHlog (difference in R² of −0.001 [p = 0.003 for difference] in analyses without covariates and difference in R² of −0.002 [p = 0.13 for difference] in analyses with covariates). Numerical results are reported in Tables S9 and S20.
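The exact definition of the "difference in liability-scale R²" for covariate-adjusted analyses is given in the main text, which is not reproduced here. One common construction, assumed here rather than taken from the paper, is incremental R²: the R² of a model with covariates plus the predictor minus the R² of covariates alone. The minimal numpy sketch below computes that quantity on synthetic continuous data; the function names are hypothetical, and for a binary disease outcome the result would still need an observed-to-liability conversion.

```python
import numpy as np


def r2_ols(y, X):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()


def incremental_r2(y, covariates, predictors):
    """R^2(covariates + predictors) minus R^2(covariates alone)."""
    full = np.column_stack([covariates, predictors])
    return r2_ols(y, full) - r2_ols(y, covariates)


# Synthetic example: two covariates (e.g. age, sex) plus one predictor score.
rng = np.random.default_rng(0)
n = 20_000
covs = rng.normal(size=(n, 2))
score = rng.normal(size=n)
y = covs @ np.array([0.5, 0.3]) + 0.4 * score + rng.normal(size=n)
print(f"incremental R2 of score: {incremental_r2(y, covs, score):.3f}")
```

Measuring the gain over a covariates-only model keeps the comparison focused on what PRS, FH, or PRS-FH add beyond information already carried by the covariates.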
KEY RESOURCES TABLE
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Deposited data | | |
| UK Biobank | Bycroft et al., 2018 | |
| SNP weights to construct PRS scores for 12 diseases with family history in UK Biobank | This paper | |
| PRS and family history weights for prediction models for 10 diseases in UK Biobank | This paper | |
| Software and algorithms | | |
| BOLT-LMM | Loh et al., 2015 | |
| PRS-FH | This paper | |