Shang-Fu Chen1,2, Raquel Dias1,2, Doug Evans1,2, Elias L Salfati1,2, Shuchen Liu1,2, Nathan E Wineinger1,2, Ali Torkamani3,4.
Abstract
BACKGROUND: Polygenic risk scores (PRSs) are a summarization of an individual's genetic risk for a disease or trait. These scores are being generated in research and commercial settings to study how they may be used to guide healthcare decisions. PRSs should be updated as genetic knowledgebases improve; however, no guidelines exist for their generation or updating.
Keywords: Coronary artery disease; Genetic risk score; Genome-wide score; Genotype imputation; Genotype phasing; PRS; Polygenic risk score; Polygenic score
Year: 2020 PMID: 33225976 PMCID: PMC7682022 DOI: 10.1186/s13073-020-00801-x
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Fig. 1 Study overview. A schematic overview of our polygenic risk computational process. Genetic data were standardized (pre-processing), then underwent imputation using three different common genotype imputation processes (imputation), followed by PRS analysis (PRS analysis)
The breakdown of typed vs imputed SNPs per PRS score
| PRS | Total SNPs | Found SNPs | Typed SNPs | Imputed SNPs | Reference |
|---|---|---|---|---|---|
| PRSCAD | 163 | 161 | 38 | 123 | [ |
| metaGRSCAD | 1,745,180 | 1,736,608 | 143,724 | 1,592,884 | [ |
| GPSCAD | 6,630,150 | 6,238,460 | 567,297 | 5,671,163 | [ |
| PRS-GWAST2D (547) | 558 | 547 | 66 | 481 | [ |
| PRS-GWAST2D (397) | 403 | 397 | 29 | 368 | [ |
| PRS-GWAST2D (170487) | 171,249 | 170,487 | 10,152 | 160,335 | [ |
| GPST2D | 6,917,436 | 6,482,889 | 570,779 | 5,912,110 | [ |
| PRS-GWASBC (239) | 313 | 239 | 34 | 205 | [ |
| PRS-GWASBC (2935) | 3820 | 2941 | 325 | 2616 | [ |
| GPSBC | 5218 | 4457 | 374 | 4083 | [ |
| PRS-GWASAfib | 166 | 166 | 19 | 147 | [ |
| GPSAfib | 6,730,541 | 6,302,924 | 570,913 | 5,732,011 | [ |
| PRS-GWASAD | 29 | 29 | 2 | 27 | [ |
| PRS-GWASGlaucoma | 2673 | 2657 | 224 | 2433 | [ |
Fig. 2 PRSCAD reproducibility. The variability in PRSCAD percentile values as determined by three different imputation processes. a Gold standard WGS-based PRS percentile (x-axis) vs six replicates of imputation-derived PRS percentiles (y-axis). Point darkness depicts WGS-based ranking. b Histogram of the absolute score deviations relative to the WGS-based standard. Bin for no change is not shown
The distribution of CAD risk score percentile changes caused by three different imputation processes
| Percentile change | Beagle (EUR) | Beagle (AFR) | Beagle (All) | Eagle+Minimac (EUR) | Eagle+Minimac (AFR) | Eagle+Minimac (All) | SHAPEIT+Minimac (EUR) | SHAPEIT+Minimac (AFR) | SHAPEIT+Minimac (All) |
|---|---|---|---|---|---|---|---|---|---|
| ≤ 1%tile | 1247 (86.18%) | 163 (68.2%) | 1410 (83.63%) | 1437 (99.31%) | 236 (98.74%) | 1673 (99.23%) | 259 (17.9%) | 5 (2.09%) | 264 (15.66%) |
| > 1%tile | 200 (13.82%) | 76 (31.8%) | 276 (16.37%) | 10 (0.69%) | 3 (1.26%) | 13 (0.77%) | 1188 (82.1%) | 234 (97.91%) | 1422 (84.34%) |
| > 5%tile | 21 (1.45%) | 5 (2.09%) | 26 (1.54%) | 7 (0.48%) | 1 (0.42%) | 8 (0.47%) | 331 (22.87%) | 168 (70.29%) | 499 (29.6%) |
| > 10%tile | 2 (0.14%) | 0 (0%) | 2 (0.12%) | 1 (0.07%) | 0 (0%) | 1 (0.06%) | 68 (4.7%) | 87 (36.4%) | 155 (9.19%) |
| > 20%tile | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 3 (0.21%) | 10 (4.18%) | 13 (0.77%) |
| ≤ 1%tile | 940 (64.96%) | 106 (44.35%) | 1046 (62.04%) | 1418 (98%) | 229 (95.82%) | 1647 (97.69%) | 153 (10.57%) | 13 (5.44%) | 166 (9.85%) |
| > 1%tile | 507 (35.04%) | 133 (55.65%) | 640 (37.96%) | 29 (2%) | 10 (4.18%) | 39 (2.31%) | 1294 (89.43%) | 226 (94.56%) | 1520 (90.15%) |
| > 5%tile | 12 (0.83%) | 2 (0.84%) | 14 (0.83%) | 5 (0.35%) | 1 (0.42%) | 6 (0.36%) | 277 (19.14%) | 154 (64.44%) | 431 (25.56%) |
| > 10%tile | 2 (0.14%) | 0 (0%) | 2 (0.12%) | 0 (0%) | 0 (0%) | 0 (0%) | 8 (0.55%) | 42 (17.57%) | 50 (2.97%) |
| > 20%tile | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| ≤ 1%tile | 1289 (89.08%) | 191 (79.92%) | 1480 (87.78%) | 1429 (98.76%) | 233 (97.49%) | 1662 (98.58%) | 328 (22.67%) | 20 (8.37%) | 348 (20.64%) |
| > 1%tile | 158 (10.92%) | 48 (20.08%) | 206 (12.22%) | 18 (1.24%) | 6 (2.51%) | 24 (1.42%) | 1119 (77.33%) | 219 (91.63%) | 1338 (79.36%) |
| > 5%tile | 26 (1.8%) | 0 (0%) | 26 (1.54%) | 2 (0.14%) | 0 (0%) | 2 (0.12%) | 239 (16.52%) | 69 (28.87%) | 308 (18.27%) |
| > 10%tile | 12 (0.83%) | 0 (0%) | 12 (0.71%) | 0 (0%) | 0 (0%) | 0 (0%) | 56 (3.87%) | 9 (3.77%) | 65 (3.86%) |
| > 20%tile | 2 (0.14%) | 0 (0%) | 2 (0.12%) | 0 (0%) | 0 (0%) | 0 (0%) | 6 (0.41%) | 1 (0.42%) | 7 (0.42%) |
Cells give the number and percentage of subjects in each population whose PRS percentile changed by the amount shown in the first column. EUR, European; AFR, African
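The thresholds in this table can be tallied with a short script. The following is a minimal sketch, not the authors' code: it assumes each individual has a reference percentile (taken here to be the WGS-based value described in the Fig. 2 caption) and a percentile from one imputation run. The function names `percentile_ranks` and `count_exceeding` are hypothetical.

```python
from bisect import bisect_right


def percentile_ranks(scores):
    """Return the percentile (0-100) of each raw score within its own cohort."""
    ordered = sorted(scores)
    n = len(ordered)
    # bisect_right counts how many cohort scores are <= the given score.
    return [100.0 * bisect_right(ordered, s) / n for s in scores]


def count_exceeding(reference_pct, test_pct, thresholds=(1, 5, 10, 20)):
    """Count individuals whose absolute percentile change exceeds each threshold.

    Returns {threshold: (count, percent_of_cohort)}.
    """
    changes = [abs(t - r) for r, t in zip(reference_pct, test_pct)]
    n = len(changes)
    result = {}
    for th in thresholds:
        count = sum(c > th for c in changes)
        result[th] = (count, 100.0 * count / n)
    return result


# Toy data only; these are not values from the study.
reference = [10, 50, 90, 30]
imputed = [10.5, 57, 78, 30]
print(count_exceeding(reference, imputed))
```

The "≤ 1%tile" row of the table is the complement of the "> 1%tile" row, so it does not need a separate count.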
Fig. 3 PRS reproducibility. The variability in PRS percentile values as determined by three different imputation processes for 14 different PRSs. Gold standard WGS-based PRS percentile (x-axis) vs six replicates of imputation-derived PRS percentiles (y-axis). Point darkness depicts WGS-based ranking
Fig. 4 PRSCAD variability as a function of PRS bin. The degree of variability in PRS percentile as a function of the expected WGS-based PRS tier across three different imputation processes. a Average absolute deviation per individual relative to their WGS-based gold standard. b Maximum absolute deviation per individual relative to the WGS-based gold standard. Box plots depict the interquartile range as is standard
The rate of risk tier re-classification due to imputation variability using quintile-based cutoffs
| Risk tier | Beagle: < 20%tile | Beagle: 20–80%tile | Beagle: > 80%tile | Eagle+Minimac: < 20%tile | Eagle+Minimac: 20–80%tile | Eagle+Minimac: > 80%tile | SHAPEIT+Minimac: < 20%tile | SHAPEIT+Minimac: 20–80%tile | SHAPEIT+Minimac: > 80%tile |
|---|---|---|---|---|---|---|---|---|---|
| < 20%tile | 19.89% | 0.09% | 0% | 19.99% | 0% | 0% | 19.27% | 0.72% | 0% |
| 20–80%tile | 0.04% | 59.90% | 0.09% | 0% | 60.01% | 0.01% | 0.48% | 58.88% | 0.64% |
| > 80%tile | 0% | 0.10% | 19.90% | 0% | 0.01% | 19.98% | 0% | 0.48% | 19.52% |
| < 20%tile | 19.88% | 0.11% | 0% | 19.98% | 0.01% | 0% | 19.47% | 0.51% | 0% |
| 20–80%tile | 0.29% | 59.59% | 0.13% | 0.01% | 60% | 0% | 0.51% | 58.95% | 0.54% |
| > 80%tile | 0% | 0.09% | 19.92% | 0% | 0.01% | 19.99% | 0% | 0.50% | 19.50% |
| < 20%tile | 19.85% | 0.14% | 0% | 19.96% | 0.02% | 0% | 19.57% | 0.42% | 0% |
| 20–80%tile | 0.08% | 59.76% | 0.17% | 0.03% | 60% | 0% | 0.59% | 58.93% | 0.48% |
| > 80%tile | 0% | 0.13% | 19.88% | 0% | 0% | 19.99% | 0% | 0.56% | 19.44% |
The rate of re-classification into low risk (< 20th percentile), intermediate risk (20–80th percentile), and high risk (> 80th percentile) due to phasing and imputation variability across all individuals included in this study (1447 EURs and 239 AFRs combined). Columns indicate the average individual-level imputation-based risk tier. Rows indicate the process-level imputation-based risk tier
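The quintile-based cross-tabulation described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's implementation: percentiles exactly equal to 20 or 80 are placed in the intermediate tier, and the names `risk_tier` and `tier_crosstab` are hypothetical.

```python
from collections import Counter


def risk_tier(pct):
    """Map a PRS percentile to low (<20), intermediate (20-80), or high (>80)."""
    if pct < 20:
        return "<20"
    if pct > 80:
        return ">80"
    return "20-80"


def tier_crosstab(row_pct, col_pct):
    """Cross-tabulate two percentile assignments for the same individuals.

    Returns {(row_tier, col_tier): percent_of_all_individuals}, so the
    values sum to 100 across the whole table, as in the table above.
    """
    counts = Counter(
        (risk_tier(r), risk_tier(c)) for r, c in zip(row_pct, col_pct)
    )
    n = len(row_pct)
    return {pair: 100.0 * k / n for pair, k in counts.items()}


# Toy data only: rows stand in for process-level percentiles, columns for
# the average individual-level percentiles.
process_level = [10, 25, 85, 50]
individual_avg = [15, 19, 90, 50]
print(tier_crosstab(process_level, individual_avg))
```

Off-diagonal entries, such as `("20-80", "<20")`, correspond to individuals whose tier differs between the two assignments.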
The rate of risk tier re-classification for high risk individuals due to phasing and imputation variability
| Risk tier | Beagle | Eagle+Minimac | SHAPEIT+Minimac |
|---|---|---|---|
| < 5%tile | 1.19% (− 1.13–3.51) | 0% (0–0) | 15.66% (7.84–23.48) |
| < 10%tile | 2.96% (0.4–5.51) | 0% (0–0) | 10.12% (5.56–14.68) |
| < 15%tile | 1.59% (0.04–3.13) | 0% (0–0) | 9.96% (6.26–13.67) |
| < 20%tile | 0.89% (− 0.11–1.9) | 0% (0–0) | 9.91% (6.7–13.12) |
| > 80%tile | 1.78% (0.37–3.19) | 0.3% (− 0.28–0.88) | 10% (6.81–13.19) |
| > 85%tile | 1.98% (0.26–3.69) | 0% (0–0) | 12.7% (8.59–16.81) |
| > 90%tile | 1.19% (− 0.45–2.83) | 0% (0–0) | 17.54% (11.84–23.24) |
| > 95%tile | 4.71% (0.2–9.21) | 0% (0–0) | 15.85% (7.95–23.76) |
| | | | |
| < 5%tile | 6.98% (1.59–12.36) | 1.19% (− 1.13–3.51) | 18.6% (10.38–26.83) |
| < 10%tile | 2.37% (0.07–4.66) | 0% (0–0) | 6.67% (2.86–10.47) |
| < 15%tile | 2.37% (0.5–4.25) | 0.79% (− 0.3–1.88) | 12.94% (8.82–17.06) |
| < 20%tile | 3.82% (1.79–5.86) | 0.3% (− 0.28–0.88) | 8.9% (5.86–11.94) |
| > 80%tile | 2.66% (0.95–4.38) | 0% (0–0) | 9.76% (6.6–12.93) |
| > 85%tile | 4.72% (2.12–7.33) | 0% (0–0) | 12.11% (8.11–16.11) |
| > 90%tile | 6.43% (2.76–10.11) | 0% (0–0) | 6.06% (2.42–9.7) |
| > 95%tile | 2.38% (− 0.88–5.64) | 0% (0–0) | 9.52% (3.25–15.8) |
| < 5%tile | 5.88% (0.88–10.88) | 0% (0–0) | 13.25% (5.96–20.55) |
| < 10%tile | 2.38% (0.08–4.69) | 0.59% (− 0.56–1.75) | 7.78% (3.72–11.85) |
| < 15%tile | 0.79% (− 0.3–1.88) | 0% (0–0) | 9.92% (6.23–13.61) |
| < 20%tile | 1.49% (0.19–2.78) | 0.3% (− 0.28–0.88) | 10% (6.81–13.19) |
| > 80%tile | 2.37% (0.75–3.99) | 0% (0–0) | 8.33% (5.38–11.29) |
| > 85%tile | 1.97% (0.26–3.68) | 0% (0–0) | 6.48% (3.41–9.55) |
| > 90%tile | 2.37% (0.07–4.66) | 0% (0–0) | 5.99% (2.39–9.59) |
| > 95%tile | 1.18% (− 1.12–3.47) | 0% (0–0) | 10.59% (4.05–17.13) |
The rate of individuals re-classified from high risk tiers due to imputation variability across all individuals included in this study (1447 EURs and 239 AFRs combined). An individual is considered re-classified if at least two imputation replicates produce a different risk tier than the WGS-based ground truth. Values in parentheses are 95% confidence intervals
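The re-classification rule and its interval can be illustrated with a short sketch. Two assumptions are made here and are not stated in the source: the intervals are normal-approximation (Wald) intervals, which is consistent with the negative lower bounds and the "0–0" entries in the table, and the 5% tier contains about 84 of the 1686 individuals. The function names are hypothetical.

```python
import math


def is_reclassified(wgs_tier, replicate_tiers, min_discordant=2):
    """True if at least `min_discordant` replicates disagree with the WGS tier."""
    discordant = sum(t != wgs_tier for t in replicate_tiers)
    return discordant >= min_discordant


def wald_ci_percent(k, n, z=1.96):
    """Rate and normal-approximation 95% CI, all in percent, for k events in n."""
    p = k / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return 100 * p, 100 * (p - half_width), 100 * (p + half_width)


# Six replicate tiers for one hypothetical individual in the > 80%tile tier.
replicates = [">80", "20-80", ">80", "20-80", ">80", ">80"]
print(is_reclassified(">80", replicates))  # two discordant replicates

# Assumed count: 1 re-classified individual out of 84 in a 5% tier.
rate, low, high = wald_ci_percent(1, 84)
print(f"{rate:.2f}% ({low:.2f} to {high:.2f})")
```

Under these assumptions, one re-classified individual out of 84 gives 1.19% (−1.13 to 3.51), which matches the Beagle "< 5%tile" entry in the first block. A Wald interval can extend below zero for small counts, which is why negative lower bounds appear.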
Fig. 5 Imputation accuracy relative to WGS-based gold standard expressed as R². a Distribution of R² values per individual for chromosome 22: average R² (left) and minimum R² (right) for 6 replicate imputation runs. Dashed vertical lines indicate the median values. b Range of R² values achieved per minor allele frequency (MAF) bin for chromosome 22. c Scatterplot of percent variation explained (log-scale) per SNP for CAD PRSs (left: PRSCAD, middle: metaGRSCAD, right: GPSCAD) vs imputation variability as defined by the Gini coefficient of SNP imputation results
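The caption does not say how the accuracy statistic was computed. One common definition is the squared Pearson correlation between imputed allele dosages and the WGS-derived genotypes, and the sketch below uses that definition as an assumption. The function name `squared_correlation` is hypothetical.

```python
import math


def squared_correlation(truth, imputed):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Returns NaN when either vector has zero variance, since the
    correlation is undefined in that case.
    """
    n = len(truth)
    mean_t = sum(truth) / n
    mean_i = sum(imputed) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(truth, imputed))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_t == 0 or var_i == 0:
        return math.nan
    return cov * cov / (var_t * var_i)


# Toy data only: WGS genotypes coded 0/1/2 and imputed dosages for one
# hypothetical individual across four SNPs.
wgs_genotypes = [0, 1, 2, 0]
imputed_dosages = [0.1, 0.9, 1.8, 0.2]
print(squared_correlation(wgs_genotypes, imputed_dosages))
```

Applied across the SNPs of one individual, this yields a per-individual value as in panel a. Applied across individuals for SNPs grouped by MAF, it yields per-bin values as in panel b.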