Annika Faucon, Julian Samaroo, Tian Ge, Lea K Davis, Nancy J Cox, Ran Tao, Megan M Shuey.
Abstract
To enable large-scale application of polygenic risk scores (PRSs) in a computationally efficient manner, we translate a widely used PRS construction method, PRS-continuous shrinkage (PRS-CS), to the Julia programming language as PRS.jl. On nine traits with varying genetic architectures, we demonstrate that PRS.jl maintains prediction accuracy while decreasing the average runtime by 5.5×. Additional programmatic modifications improve usability and robustness. This freely available software substantially improves workflow and democratizes the use of PRSs by lowering the computational burden of the PRS-continuous shrinkage method.
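Both PRS-CS and PRS.jl output per-SNP weights. In the standard formulation, an individual's polygenic score is then the weighted sum of that person's effect-allele dosages. The sketch below assumes the weights and dosages are already aligned to the same effect allele. The SNP IDs and values are hypothetical, and this is not the scoring code used in the study.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2) over SNPs that have a weight.

    SNPs present in only one of the two dicts are skipped.
    """
    return sum(weights[snp] * dose for snp, dose in dosages.items() if snp in weights)


# Hypothetical per-SNP weights and one person's dosages
weights = {"rs0001": 0.012, "rs0002": -0.008, "rs0003": 0.020}
person = {"rs0001": 2, "rs0002": 1, "rs0004": 1}
score = polygenic_score(person, weights)
```

Real pipelines also handle strand flips, allele swaps, and missing genotypes before summing. This sketch omits all three.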
Year: 2022 PMID: 35851544 PMCID: PMC9297586 DOI: 10.26508/lsa.202201382
Source DB: PubMed Journal: Life Sci Alliance ISSN: 2575-1077
Individual and average runtimes for polygenic risk score–continuous shrinkage (PRS-CS) and PRS.jl by phenotype.
| Phenotype | PRS-CS Run 1 | PRS-CS Run 2 | PRS-CS Run 3 | PRS-CS Mean (SD) | PRS.jl Run 1 | PRS.jl Run 2 | PRS.jl Run 3 | PRS.jl Mean (SD) | Average improvement (×) |
|---|---|---|---|---|---|---|---|---|---|
| Quantitative phenotypes | | | | | | | | | |
| Body mass index | 62:21:25 | 65:12:16 | 59:41:13 | 62:24:58 (1:28:32) | 12:01:15 | 12:20:31 | 11:19:32 | 11:53:46 (0:31:10) | 5.3 |
| Cholesterol | 56:51:43 | 55:36:35 | 59:12:59 | 57:13:46 (1:49:52) | 12:07:50 | 8:15:23 | 12:34:38 | 10:59:17 (2:22:34) | 5.2 |
| eGFR | 65:52:20 | 69:43:40 | 77:57:50 | 71:11:17 (6:10:36) | 13:54:28 | 15:57:40 | 14:01:35 | 14:37:54 (1:09:10) | 4.9 |
| High-density lipoprotein | 59:58:13 | 56:39:56 | 59:19:08 | 58:39:06 (1:45:02) | 8:21:45 | 8:16:13 | 10:58:48 | 9:12:15 (1:32:19) | 6.4 |
| Low-density lipoprotein | 63:38:06 | 56:17:24 | 56:47:47 | 58:54:26 (4:06:08) | 8:26:04 | 8:18:47 | 12:54:02 | 9:52:58 (2:36:51) | 6.0 |
| Triglycerides | 58:00:52 | 56:04:48 | 60:45:49 | 58:17:10 (2:21:13) | 8:18:01 | 8:09:24 | 11:48:25 | 9:25:17 (2:04:02) | 6.2 |
| Binary phenotypes | | | | | | | | | |
| Asthma | 41:58:26 | 40:15:14 | 41:29:02 | 41:14:14 (0:53:10) | 5:26:44 | 6:25:06 | 7:27:54 | 6:26:35 (1:00:36) | 6.4 |
| Coronary artery disease | 66:59:45 | 66:00:16 | 64:05:35 | 65:41:52 (1:28:32) | 10:43:53 | 15:05:15 | 13:04:09 | 12:57:46 (2:10:48) | 5.1 |
| Type 2 diabetes mellitus | 68:35:38 | 69:10:26 | 68:08:43 | 68:38:16 (0:30:57) | 17:03:18 | 20:13:24 | 16:38:33 | 17:58:25 (1:57:33) | 3.8 |
| All phenotypes combined | 544:16:28 | 535:00:35 | 547:28:06 | 542:15:03 (6:28:16) | 96:23:18 | 103:01:43 | 110:47:36 | 103:24:12 (7:12:35) | 5.5 |
All runtimes are presented as hours:minutes:seconds.
Average improvement is estimated as the mean PRS-CS runtime divided by the mean PRS.jl runtime.
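The summary columns can be recomputed from the three runs. The sketch below converts hours:minutes:seconds strings to seconds, takes the mean and the sample (n − 1) standard deviation, and divides the PRS-CS mean by the PRS.jl mean. Applied to the coronary artery disease row, it reproduces that row's reported values. The helper names are illustrative only.

```python
from statistics import mean, stdev


def to_seconds(hms: str) -> int:
    """Convert an 'hours:minutes:seconds' string (hours may exceed 24) to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds: float) -> str:
    """Format seconds as 'h:mm:ss', rounded to the nearest whole second."""
    total = round(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


def summarize(runs):
    """Return (mean, sample SD) of a list of runtimes, in seconds."""
    secs = [to_seconds(r) for r in runs]
    return mean(secs), stdev(secs)


# Coronary artery disease row of the runtime table
prs_cs = ["66:59:45", "66:00:16", "64:05:35"]
prs_jl = ["10:43:53", "15:05:15", "13:04:09"]

cs_mean, cs_sd = summarize(prs_cs)
jl_mean, jl_sd = summarize(prs_jl)
print(to_hms(cs_mean), to_hms(cs_sd))  # 65:41:52 1:28:32
print(to_hms(jl_mean), to_hms(jl_sd))  # 12:57:46 2:10:48
print(round(cs_mean / jl_mean, 1))     # 5.1
```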
Figure 1. Plots comparing polygenic risk score–continuous shrinkage (PRS-CS) and PRS.jl PRS estimates for each trait.
(A, B, C, D, E, F, G, H, I) Plots of the PRSs calculated by the Python implementation of PRS-CS (PRS-CS.py) on the y-axis against the scores calculated by PRS.jl on the x-axis for each trait: (A) asthma, (B) coronary artery disease, (C) type 2 diabetes mellitus, (D) body mass index, (E) cholesterol, (F) estimated glomerular filtration rate, (G) high-density lipoprotein, (H) low-density lipoprotein, and (I) triglycerides. The correlation R² is shown in the corner of each plot.
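This sketch assumes the R² in each panel is the squared Pearson correlation between the two implementations' scores for the same individuals. It computes that quantity directly. The score values are hypothetical.

```python
def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical scores for the same five individuals from each implementation
prs_cs_scores = [0.12, -0.41, 0.88, 0.05, -0.20]
prs_jl_scores = [0.12, -0.40, 0.88, 0.05, -0.21]
r2 = squared_correlation(prs_jl_scores, prs_cs_scores)
```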
Median squared error and P-value from the t test comparing SNP weights between the Python and Julia implementations for a single run.
| Trait | Median squared error | P-value | SNP count: GWAS | SNP count: polygenic risk score |
|---|---|---|---|---|
| Asthma | 2.07 × 10⁻¹¹ | 0.89 | 8,270,130 | 494,889 |
| Body mass index | 6.55 × 10⁻¹¹ | 1.00 | 2,529,253 | 719,311 |
| Coronary artery disease | 6.83 × 10⁻¹¹ | 0.91 | 8,440,435 | 782,510 |
| eGFR | 3.22 × 10⁻¹¹ | 0.93 | 17,393,472 | 774,105 |
| HDL | 3.29 × 10⁻¹¹ | 0.95 | 2,433,797 | 696,196 |
| LDL | 3.19 × 10⁻¹¹ | 0.97 | 2,424,334 | 695,115 |
| Total cholesterol | 3.25 × 10⁻¹¹ | 0.87 | 2,433,332 | 696,147 |
| Triglycerides | 3.17 × 10⁻¹¹ | 0.86 | 2,425,960 | 695,255 |
| Type 2 diabetes mellitus | 2.00 × 10⁻¹¹ | 0.85 | 35,369,247 | 780,627 |
eGFR, estimated glomerular filtration rate; GWAS, genome-wide association study; HDL, high-density lipoprotein; LDL, low-density lipoprotein; SNP, single nucleotide polymorphism.
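The median squared error in this table summarizes how closely the two implementations' SNP weights agree. The sketch below computes it over the SNPs present in both weight sets, which are represented as hypothetical dicts keyed by SNP ID. It does not reproduce the t test, because the table does not state which variant (paired or two-sample) was used.

```python
from statistics import median


def median_squared_error(weights_a, weights_b):
    """Median of squared per-SNP weight differences over SNPs present in both dicts."""
    shared = weights_a.keys() & weights_b.keys()
    if not shared:
        raise ValueError("no SNPs in common")
    return median((weights_a[snp] - weights_b[snp]) ** 2 for snp in shared)


# Hypothetical posterior weights from the two implementations
w_python = {"rs1": 0.10, "rs2": 0.20, "rs3": 0.30}
w_julia = {"rs1": 0.10, "rs2": 0.25, "rs3": 0.20, "rs9": 0.50}
mse = median_squared_error(w_python, w_julia)
```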
Comparison of polygenic risk score–continuous shrinkage (PRS-CS) and PRS.jl performance for quantitative traits, using age, sex, and principal components (PCs) 1–10 as covariates.
| Trait | R² (PRS-CS) | R² (PRS.jl) | Number of subjects |
|---|---|---|---|
| Body mass index | 0.1141 (<0.0001) | 0.1141 (<0.0001) | 60,584 |
| Cholesterol | 0.1089 (0.0002) | 0.1088 (0.0002) | 34,347 |
| eGFR | 0.4921 (<0.0001) | 0.4922 (<0.0001) | 34,797 |
| High-density lipoprotein | 0.2326 (<0.0001) | 0.2326 (<0.0001) | 33,338 |
| Low-density lipoprotein | 0.0835 (<0.0001) | 0.0835 (0.0001) | 32,061 |
| Triglycerides | 0.0657 (<0.0001) | 0.0657 (<0.0001) | 34,531 |
All data is presented as mean (SD).
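The table does not say whether R² is that of the full model (trait regressed on the PRS plus covariates) or the increment over a covariates-only model. The sketch below computes both with ordinary least squares in standard-library Python. The subject data are hypothetical and use only age and sex as covariates. The study's own model also included PCs 1–10.

```python
def _solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_r2(y, predictors):
    """R^2 of an OLS fit of y on an intercept plus the given predictor rows."""
    design = [[1.0] + list(row) for row in predictors]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical subjects: PRS, age, sex (0/1), and a quantitative trait
prs = [0.2, -0.5, 1.1, 0.3, -0.9, 0.7]
age = [45, 60, 38, 52, 70, 41]
sex = [0, 1, 1, 0, 1, 0]
trait = [27.1, 29.8, 31.0, 26.4, 28.9, 30.2]

full_r2 = ols_r2(trait, list(zip(prs, age, sex)))  # PRS + covariates
cov_r2 = ols_r2(trait, list(zip(age, sex)))        # covariates only
incremental_r2 = full_r2 - cov_r2
```

Because the covariates-only model is nested in the full model, its R² can never exceed that of the full model.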
Comparison of polygenic risk score–continuous shrinkage (PRS-CS) and PRS.jl performance for binary traits.
| Trait | Nagelkerke R² (PRS-CS) | Nagelkerke R² (PRS.jl) | Area under the curve (PRS-CS) | Area under the curve (PRS.jl) | 10% odds ratio (PRS-CS) | 10% odds ratio (PRS.jl) | Cases | Controls |
|---|---|---|---|---|---|---|---|---|
| Asthma | 0.0176 (<0.0001) | 0.0176 (<0.0001) | 0.560 (<0.0001) | 0.560 (0.0002) | 1.54 (0.02) | 1.54 (0.02) | 8,210 | 64,618 |
| Coronary artery disease | 0.3212 (0.0001) | 0.3214 (0.0001) | 0.552 (0.0004) | 0.552 (0.0005) | 1.71 (0.02) | 1.73 (0.01) | 16,807 | 56,021 |
| Type 2 diabetes mellitus | 0.1715 (<0.0001) | 0.1716 (<0.0001) | 0.626 (0.0001) | 0.626 (0.0001) | 2.75 (0.017) | 2.77 (0.008) | 13,688 | 59,140 |
All data is presented as mean (SD).
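Two of the binary-trait metrics have standard definitions that are easy to state in code. Nagelkerke R² rescales the Cox–Snell R², computed from the log-likelihoods of a null and a fitted logistic model, so that its maximum is 1. The area under the ROC curve is the probability that a randomly chosen case scores higher than a randomly chosen control. The sketch below assumes the log-likelihoods come from an already fitted model. Which null model the study used is not stated here. The "10% odds ratio" is not illustrated because its exact definition is not given in this section.

```python
import math


def nagelkerke_r2(ll_null, ll_full, n):
    """Nagelkerke R^2 from null and fitted model log-likelihoods on n subjects."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


def auc(case_scores, control_scores):
    """Probability a random case outscores a random control; ties count as 1/2.

    This is O(cases x controls), fine for illustration. Rank-based formulas
    scale better to cohorts of tens of thousands.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```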
Figure S1. PC1 by PC2 plot of genetically determined ancestry based on comparison with the 1000 Genomes reference panel.
Individuals from 1000 Genomes were used to create CEU-YRI and CEU-CHB axes. European ancestry inclusion was based on the following thresholds: ≥0.3 on the CEU-YRI axis and ≥0.4 on the CEU-CHB axis. Individuals in the region remaining after threshold exclusion are marked with red Xs and represent the individuals included in this study. The other colors represent the administratively assigned or self-reported race of patients excluded from the study. The color key is shown in the box in the upper right corner, with the following abbreviations: B, Black or African American; W, European American or White; I, American Indian or Alaska Native; U, Unknown; A, Asian; and N, Other.
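The inclusion rule above keeps individuals who meet both axis thresholds. The sketch below assumes each person already has a coordinate on each axis, stored under the hypothetical keys `ceu_yri` and `ceu_chb`. How those coordinates were derived from the 1000 Genomes projection is not reproduced here.

```python
def select_european(individuals, ceu_yri_min=0.3, ceu_chb_min=0.4):
    """Keep individuals at or above both axis thresholds (inclusive)."""
    return [
        person
        for person in individuals
        if person["ceu_yri"] >= ceu_yri_min and person["ceu_chb"] >= ceu_chb_min
    ]


# Hypothetical axis coordinates for three individuals
people = [
    {"id": "a", "ceu_yri": 0.5, "ceu_chb": 0.6},
    {"id": "b", "ceu_yri": 0.2, "ceu_chb": 0.9},  # fails the CEU-YRI threshold
    {"id": "c", "ceu_yri": 0.3, "ceu_chb": 0.4},  # exactly on both thresholds
]
kept = select_european(people)
```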