Ganna Leonenko, Emily Baker, Joshua Stevenson-Hoare, Annerieke Sierksma, Mark Fiers, Julie Williams, Bart de Strooper, Valentina Escott-Price.
Abstract
Polygenic Risk Scores (PRS) for Alzheimer's disease (AD) offer unique possibilities for reliable identification of individuals at high and low risk of AD. However, there is little agreement in the field as to which approach should be used to calculate genetic risk scores, how to model the effect of APOE, what the optimal p-value threshold (pT) for SNP selection is, and how to compare scores between studies and methods. We show that the best prediction accuracy is achieved by a model with two predictors (APOE and a PRS excluding the APOE region) with pT < 0.1 for SNP selection. Within a sample, prediction accuracy is similar across different PRS approaches, but individuals' scores and their associated rankings differ. We show that standardising PRS against the population mean, as opposed to the sample mean, makes individuals' scores comparable between studies. Our work highlights the best strategies for polygenic profiling when assessing individuals for AD risk.
Year: 2021 PMID: 34301930 PMCID: PMC8302739 DOI: 10.1038/s41467-021-24082-z
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Model description for the PRS models presented in the manuscript.
| Model Name | Model description |
|---|---|
| ORS.full | PRS including SNPs with a pT ≤ 1e-5 |
| ORS.no.APOE | PRS including SNPs with a pT ≤ 1e-5 and excluding SNPs in the APOE region |
| PRS.full | PRS including SNPs with a pT ≤ 0.1 (unless otherwise specified) |
| PRS.no.APOE | PRS including SNPs with a pT ≤ 0.1 and excluding SNPs in the APOE region |
| PRS.AD | PRS calculated as a weighted sum of PRS.no.APOE (including SNPs with a pT ≤ 0.1, unless otherwise specified) and APOE(ε2 + ε4) |
PRS prediction accuracy for the AD case-control dataset using different p-value thresholds and methods to model APOE.
| pT | PRS.full: N SNPs | PRS.full: AUC (%) | PRS.full: R² | PRS.full: OR (95% CI) | PRS.no.APOE: N SNPs | PRS.no.APOE: AUC (%) | PRS.no.APOE: R² | PRS.no.APOE: OR (95% CI) | PRS.AD: AUC (%) | PRS.AD: R² | PRS.AD: OR (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| APOE(ε2 + ε4) | 2 | 70.0 | 0.18 | 2.2 (1.8, 2.7) | – | – | – | – | 70.0 | 0.18 | 2.2 (1.8, 2.7) |
| 5e-8 | 65 | 69.8 | 0.16 | 2.2 (1.8, 2.7) | 17 | 55.7 | 0.02 | 1.2 (1.0, 1.5) | 71.4 | 0.19 | 2.4 (2.0, 3.0) |
| 1e-5 (ORS) | 126 | 69.4 | 0.16 | 2.2 (1.8, 2.7) | 66 | 56.7 | 0.02 | 1.2 (1.1, 1.5) | 72.0 | 0.20 | 2.4 (2.0, 3.0) |
| 0.1 | 68,681 | 64.9 | 0.09 | 1.8 (1.5, 2.2) | 68,516 | 61.3 | 0.06 | 1.6 (1.3, 1.9) | 74.1 | 0.24 | 2.8 (2.2, 3.4) |
| 0.5 | 203,950 | 62.6 | 0.07 | 1.7 (1.4, 2.0) | 203,710 | 60.5 | 0.05 | 1.5 (1.3, 1.8) | 73.7 | 0.23 | 2.7 (2.2, 3.4) |
Legend: PRSs were calculated on a case-control cohort (271 clinically defined AD cases and 278 cognitively normal controls) using Kunkle et al. (2019) summary statistics for LD-pruned SNPs at pT ≤ 5e-8, 1e-5, 0.1 and 0.5, and for APOE(ε2 + ε4). The number of SNPs (N SNPs) in each risk score is reported. Three PRS models were considered: PRS.full, calculated on the full summary statistics; PRS.no.APOE, where the APOE region (chr19:44.4–46.5 Mb) was excluded; and PRS.AD, which is calculated as a weighted sum of PRS.no.APOE and APOE(ε2 + ε4), where the APOE effects were weighted with the effect sizes (B(ε2) = −0.47 and B(ε4) = 1.12) from Kunkle et al. (2019). The number of SNPs for PRS.AD models is always two more than for PRS.no.APOE. Prediction was estimated in terms of AUC, R² and OR with 95% confidence intervals (CI).
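The three models in the legend can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Snp` record, the function names and the toy SNPs are hypothetical, LD pruning is assumed to have been done already, and PRS.AD is formed here as a plain sum of PRS.no.APOE and the APOE term, because the weighting used for that sum is not given in this excerpt. Only the APOE region boundaries (chr19:44.4–46.5 Mb), the p-value thresholds and the effect sizes B(ε2) = −0.47 and B(ε4) = 1.12 come from the legend.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    """Hypothetical record: one LD-pruned SNP for one individual."""
    chrom: int
    pos: int        # base-pair position
    beta: float     # effect size from the GWAS summary statistics
    pval: float     # association p-value from the summary statistics
    dosage: float   # individual's effect-allele count (0 to 2)


APOE_REGION = (19, 44_400_000, 46_500_000)  # chr19:44.4-46.5 Mb, from the legend
B_E2, B_E4 = -0.47, 1.12                    # APOE effect sizes, from the legend


def in_apoe_region(snp: Snp) -> bool:
    chrom, start, end = APOE_REGION
    return snp.chrom == chrom and start <= snp.pos <= end


def prs(snps, p_threshold, exclude_apoe=False):
    """Sum of beta * dosage over SNPs with p <= pT (optionally dropping the APOE region)."""
    return sum(
        s.beta * s.dosage
        for s in snps
        if s.pval <= p_threshold and not (exclude_apoe and in_apoe_region(s))
    )


def apoe_score(n_e2, n_e4):
    """APOE(e2 + e4): allele counts weighted by the effect sizes quoted in the legend."""
    return B_E2 * n_e2 + B_E4 * n_e4


def prs_ad(snps, n_e2, n_e4, p_threshold=0.1):
    # ASSUMPTION: the two components are simply added; the source's weighting is not shown here.
    return prs(snps, p_threshold, exclude_apoe=True) + apoe_score(n_e2, n_e4)


# Toy data (hypothetical positions, effects and p-values).
example_snps = [
    Snp(1, 207_000_000, 0.15, 1e-9, 2),    # genome-wide significant, outside APOE
    Snp(19, 45_400_000, 1.20, 1e-300, 1),  # inside the APOE region
    Snp(8, 27_000_000, -0.10, 0.04, 1),    # passes pT <= 0.1 only
    Snp(11, 60_000_000, 0.05, 0.30, 2),    # fails pT <= 0.1
]

ors_full = prs(example_snps, 1e-5)                        # ORS.full
prs_full = prs(example_snps, 0.1)                         # PRS.full
prs_no_apoe = prs(example_snps, 0.1, exclude_apoe=True)   # PRS.no.APOE
prs_ad_score = prs_ad(example_snps, n_e2=0, n_e4=1)       # PRS.AD for an e3/e4 carrier
```

Keeping the APOE term separate from the thresholded sum mirrors the two-predictor structure described in the legend; in practice each component would typically be standardised before being combined.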
Fig. 1 Effects of APOE allele frequencies and age on genetic risk scores.
A Frequencies of the ε4, ε3 and ε2 alleles (red, orange and green lines, respectively) in the case-control dataset (271 cases and 278 controls). B, C Mean scores for ε4 genotypes, ORS.no.APOE and PRS.no.APOE (red, green and blue lines, respectively), split by age group. ORS.no.APOE includes SNPs with p-value ≤ 1e-5, PRS.no.APOE includes SNPs with p-value ≤ 0.1, and both exclude the APOE region. A shows the full sample, B cases only and C controls only. Age groups are 55–65, 65–75, 75–85 and 85+. For comparability of the scores in B and C, the ε4 genotypes (originally coded as 0/1/2) were also standardised. ORS Oligogenic risk score, PRS Polygenic risk score, SNP Single nucleotide polymorphism.
Fig. 2 Prediction accuracy across different PRS methods (PRS(C + T), PRSice, LDpred-Inf, PRS-CS, LDAK and SBayesR) for ORS.full, ORS.no.APOE, PRS.full, PRS.no.APOE and PRS.AD.
Bar plots of prediction accuracy (AUC and R²) across six PRS approaches: PRS(C + T), PRSice, LDpred-Inf, PRS-CS, LDAK and SBayesR (red, yellow, green, teal, blue and pink bars, respectively). The colour of each PRS method is consistent across all plots. The upper panels show the ORS models and the lower panels show the PRS and PRS.AD models in the case-control dataset (271 cases and 278 controls). ORS.full includes SNPs with pT ≤ 1e-5 and PRS.full includes SNPs with pT ≤ 0.1; ORS.no.APOE and PRS.no.APOE exclude SNPs in the APOE region; PRS.AD models APOE separately and then adds it to PRS.no.APOE. AUC Area Under the Curve, ORS Oligogenic risk score, PRS Polygenic risk score, AD Alzheimer’s Disease, SNP Single nucleotide polymorphism.
Number of ORS/PRS extremes in the case-control dataset standardised within the sample and against 1000 Genomes European population.
| Sample | Risk score | Tail | In-sample: N cases (%) | In-sample: N controls (%) | In-sample: OR (95% CI) | In-sample: AUC (%) | Population-based: N cases (%) | Population-based: N controls (%) | Population-based: OR (95% CI) | Population-based: AUC (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| All | ORS.full | Positive | 18 (6.6) | 2 (0.7) | 9 (0.4, 207) | 84.2 | 33 (12) | 5 (1.8) | 10 (1, 75) | 74.6 |
| | | Negative | 1 (0.3) | 1 (0.3) | | | 2 (0.7) | 3 (1.1) | | |
| | PRS.full | Positive | 11 (4) | 2 (0.7) | 20 (3, 145) | 81.3 | 19 (7) | 3 (1.1) | 32 (6, 180) | 83.1 |
| | | Negative | 3 (1.1) | 11 (3.9) | | | 3 (1.1) | 15 (5.3) | | |
| | PRS.AD | Positive | 21 (7.7) | 1 (0.3) | 100 (3, 2989) | 84.5 | 33 (12) | 3 (1.1) | 124 (6, 2707) | 88.2 |
| | | Negative | 0 (0) | 3 (1) | | | 0 (0) | 6 (2) | | |
| ε3ε3 | ORS.no.APOE | Positive | 1 (1) | 3 (1.8) | 1.7 (0.1, 38) | 43.8 | 1 (1) | 2 (1.1) | 0.6 (0.03, 14) | 56.3 |
| | | Negative | 1 (1) | 5 (3) | | | 1 (1) | 3 (1.8) | | |
| | PRS.no.APOE | Positive | 4 (4) | 1 (0.6) | 39 (1, 1191) | 99.9 | 7 (7) | 2 (1.1) | 95 (3, 2683) | 95.7 |
| | | Negative | 0 (0) | 6 (3.6) | | | 0 (0) | 10 (6) | | |
Legend: In the case-control dataset, the numbers of cases (N cases) and controls (N controls) at the PRS extremes are given with their percentages of the total cases (271) and total controls (278). The prediction accuracy of these extremes was assessed with AUC and OR (95% confidence intervals) when scores were standardised (a) using the sample mean and SD and (b) using the mean and SD from 1000 Genomes data. We define PRS extremes as individuals whose score lies more than 2 SD above or below the sample mean or the population mean. Three models were used for the whole dataset (549 individuals): ORS.full (pT ≤ 1e-5), PRS.full (pT ≤ 0.1) and PRS.AD (pT ≤ 0.1); two models were used for ε3 homozygotes (N = 267, with 100 cases and 167 controls): ORS.no.APOE and PRS.no.APOE. ORS.no.APOE and PRS.no.APOE exclude the APOE region; PRS.AD models APOE separately and then adds it to PRS.no.APOE.
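The procedure in this legend (standardise, flag scores beyond ±2 SD, then compare the two tails) can be sketched as follows. The function names and the toy scores are hypothetical, and the reference mean and SD of 0 and 1 stand in for values that would come from a population panel such as 1000 Genomes. The odds ratio is assumed here to be the cross-product ratio of the 2×2 table of tail counts with a Woolf (log-scale) confidence interval, adding 0.5 to every cell when any cell is zero; the excerpt does not state the method, but this assumption appears consistent with the in-sample ORS.full and PRS.AD rows of the table.

```python
import math
from statistics import mean, stdev


def standardise(scores, ref_mean, ref_sd):
    """Z-score each value against a chosen reference mean and SD."""
    return [(s - ref_mean) / ref_sd for s in scores]


def count_extremes(z_scores, threshold=2.0):
    """Return (n above +threshold, n below -threshold)."""
    positive = sum(z > threshold for z in z_scores)
    negative = sum(z < -threshold for z in z_scores)
    return positive, negative


def odds_ratio_ci(pos_cases, pos_controls, neg_cases, neg_controls, z=1.96):
    """Cross-product odds ratio with a Woolf 95% CI.

    ASSUMPTION: 0.5 is added to all four cells if any of them is zero.
    """
    a, b, c, d = pos_cases, pos_controls, neg_cases, neg_controls
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Toy scores (hypothetical): one individual is far above the population reference.
scores = [-1.0, 0.0, 0.5, 1.0, 4.0]
z_sample = standardise(scores, mean(scores), stdev(scores))  # in-sample reference
z_pop = standardise(scores, 0.0, 1.0)                        # hypothetical population reference

in_sample_extremes = count_extremes(z_sample)    # the outlier inflates the sample SD
population_extremes = count_extremes(z_pop)      # the outlier is flagged

# Tail counts taken from the in-sample columns of the table above.
ors_full_or = odds_ratio_ci(18, 2, 1, 1)   # ORS.full
prs_ad_or = odds_ratio_ci(21, 1, 0, 3)     # PRS.AD (contains a zero cell)
```

In the toy data the same individual is an extreme under the population reference but not under the in-sample one, which is the kind of discrepancy between the two standardisations that the table quantifies.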