Shuang Song1,2, Wei Jiang3, Lin Hou1,2, Hongyu Zhao3.
Abstract
Genetic risk prediction is an important problem in human genetics, and accurate prediction can facilitate disease prevention and treatment. The polygenic risk score (PRS) has become widely used because of its simplicity and effectiveness: the standard method requires only summary statistics from genome-wide association studies. Recently, several methods have been proposed to improve the standard PRS by using external information, such as linkage disequilibrium (LD) and functional annotations. In this paper, we introduce EB-PRS, a novel method that leverages information on effect sizes across all markers to improve prediction accuracy. Unlike most existing genetic risk prediction methods, our method requires neither parameter tuning nor external information. Real data applications to six diseases (asthma, breast cancer, celiac disease, Crohn's disease, Parkinson's disease and type 2 diabetes) show that EB-PRS achieved relative improvements in predictive r² of 307.1%, 42.8%, 25.5%, 3.1%, 74.3% and 49.6%, respectively, over the standard PRS method with optimally tuned parameters. Compared to LDpred, which makes use of LD information, EB-PRS also achieved relative improvements of 37.9%, 33.6%, 8.6%, 36.2%, 40.6% and 10.8%, respectively. We note that our method is not the first to leverage effect size distributions. Here we first justify our method by presenting its theoretical optimality property relative to existing methods in this class, and substantiate our theoretical result with extensive simulation results. The R package EBPRS, which implements our method, is available on CRAN.
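As a rough illustration of the quantities named in the abstract, the following Python sketch computes a standard (unadjusted) PRS as a weighted sum of risk-allele counts and reports a relative improvement in predictive r². It assumes that predictive r² means the squared Pearson correlation between the score and the phenotype, which is a common convention but is not stated in this record. All function names and the toy data are hypothetical, and the sketch does not implement EB-PRS itself.

```python
import numpy as np

def polygenic_risk_score(genotypes, effect_sizes):
    """Weighted sum of risk-allele counts: PRS_i = sum_j beta_j * g_ij."""
    g = np.asarray(genotypes, dtype=float)
    beta = np.asarray(effect_sizes, dtype=float)
    return g @ beta

def predictive_r2(scores, phenotype):
    """Squared Pearson correlation between score and phenotype (assumed definition)."""
    r = np.corrcoef(scores, phenotype)[0, 1]
    return float(r ** 2)

def relative_improvement(r2_new, r2_baseline):
    """Relative gain of r2_new over r2_baseline, in percent."""
    return 100.0 * (r2_new - r2_baseline) / r2_baseline

# Toy data, simulated for illustration only (not from the paper).
rng = np.random.default_rng(0)
n_individuals, n_snps = 500, 200
genotypes = rng.binomial(2, 0.3, size=(n_individuals, n_snps))  # allele counts 0/1/2
true_effects = rng.normal(0.0, 0.1, size=n_snps)
liability = genotypes @ true_effects + rng.normal(0.0, 1.0, size=n_individuals)
phenotype = (liability > np.median(liability)).astype(int)       # balanced cases/controls

# Stand-in for GWAS summary statistics: true effects plus estimation noise.
estimated_effects = true_effects + rng.normal(0.0, 0.05, size=n_snps)

scores = polygenic_risk_score(genotypes, estimated_effects)
print("predictive r2:", predictive_r2(scores, phenotype))
```

Under this definition, a reported relative improvement of 42.8% means the new method's predictive r² is 1.428 times the baseline value.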
Year: 2020 PMID: 32045423 PMCID: PMC7039528 DOI: 10.1371/journal.pcbi.1007565
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. The average predictive r² of EB-PRS, P+T and So et al.’s method under different training sample sizes in simulation experiments with independent SNPs.
Here the control-to-case ratio is set to one. EB-PRS always outperformed the other methods. The error bars indicate the standard deviation of predictive r² across 10 simulation replicates.
Fig 2. ROC curves of EB-PRS, P+T and the methods of So et al. and Mak et al. under different control-to-case ratios (CCRs) in simulations with independent SNPs, when the training sample size is 2,000.
We use the bootstrap-based method of Robin et al. [36] to test for differences in AUC. The p-values shown are for the comparison between the AUCs of EB-PRS and P+T.
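A bootstrap comparison of two AUCs on the same test individuals can be sketched as follows. This follows the general idea of a paired bootstrap test: resample individuals, recompute the AUC difference, and standardize the observed difference by the bootstrap standard deviation. The stratified resampling, the normal approximation for the p-value and all function names are assumptions of this sketch, not details confirmed by the caption.

```python
import math
import numpy as np

def auc(labels, scores):
    """AUC as P(case score > control score), counting ties as 1/2.

    Builds a cases-by-controls matrix, so it is meant for small examples.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    cases = scores[labels == 1]
    controls = scores[labels == 0]
    diff = cases[:, None] - controls[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

def bootstrap_auc_test(labels, scores_a, scores_b, n_boot=2000, seed=0):
    """Paired bootstrap test of AUC(scores_a) - AUC(scores_b).

    Cases and controls are resampled separately (stratified), and the same
    resampled individuals are used for both scores (paired).
    """
    labels = np.asarray(labels)
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    rng = np.random.default_rng(seed)
    case_idx = np.flatnonzero(labels == 1)
    control_idx = np.flatnonzero(labels == 0)

    observed = auc(labels, scores_a) - auc(labels, scores_b)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = np.concatenate([
            rng.choice(case_idx, size=case_idx.size, replace=True),
            rng.choice(control_idx, size=control_idx.size, replace=True),
        ])
        diffs[b] = auc(labels[idx], scores_a[idx]) - auc(labels[idx], scores_b[idx])

    # Standardized difference, compared with a standard normal (two-sided).
    z = observed / np.std(diffs, ddof=1)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return observed, z, p_value
```

Calling `bootstrap_auc_test(labels, prs_a, prs_b)` with two hypothetical score vectors for the same individuals returns the observed AUC difference, the standardized statistic and a two-sided p-value.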
Fig 3. Predictive r² of EB-PRS and six other methods in simulations based on observed genotypes, using five-fold cross-validation. The error bars indicate the standard deviation of predictive r².
Predictive r² and AUC of EB-PRS, unadjusted PRS, P+T, LDpred-inf, LDpred, So et al.’s method and Mak et al.’s method in simulations based on observed genotypes, using five-fold cross-validation.
The simulations were based on individual-level genotype data accessed from the schizophrenia study (study accession number phs000021) in dbGaP. The dataset included 2,729 samples and 729,454 SNPs. The highest mean r² and AUC values are highlighted in boldface.
| | EB-PRS | Unadjusted PRS | P+T | LDpred-inf | LDpred | So’s | Mak’s |
|---|---|---|---|---|---|---|---|
| Predictive r² | | 0.014 | 0.022 | 0.014 | 0.024 | 0.018 | 0.006 |
| AUC | | 0.582 | 0.608 | 0.582 | 0.612 | 0.600 | 0.545 |
Summary of the training summary statistics and the testing genotype data in real data applications.
| Disease | Training dataset | Training sample size | Number of SNPs | Testing dataset | Testing sample size |
|---|---|---|---|---|---|
| AS | GABRIEL Consortium | | 535,060 | dbGaP (phs000490) | |
| BC | GAME-ON study | | 2,435,470 | CGEMS | |
| CEL | Dubois’ study | | 508,742 | NIDDK celiac disease study | |
| CD | IIBDGC (WTCCC removed) | | 871,743 | WTCCC | |
| PD | Simon-Sanchez J et al.’s study | | 450,439 | WTCCC2 | |
| T2D | DIAGRAM | | 2,400,624 | Northwestern NUgene Project | |
Fig 4. Comparison of predictive r² between EB-PRS and six other PRS methods on real data from six diseases.
AUCs of different methods on real datasets of six diseases.
The highest AUCs are highlighted in boldface.
| Disease | EB-PRS | Unadjusted PRS | P+T | LDpred-inf | LDpred | So’s | Mak’s |
|---|---|---|---|---|---|---|---|
| AS | | 0.532 | 0.526 | 0.539 | 0.541 | 0.546 | 0.543 |
| BC | | 0.551 | 0.629 | 0.551 | 0.628 | 0.640 | 0.640 |
| CD | | 0.632 | 0.684 | 0.623 | 0.661 | 0.685 | 0.676 |
| CEL | | 0.593 | 0.607 | 0.585 | 0.611 | 0.615 | 0.618 |
| PD | | 0.520 | 0.525 | 0.518 | 0.519 | 0.521 | 0.522 |
| T2D | | 0.586 | 0.595 | 0.581 | 0.614 | 0.594 | 0.604 |