Yiming Hu, Qiongshi Lu, Ryan Powles, Xinwei Yao, Can Yang, Fang Fang, Xinran Xu, Hongyu Zhao.
Abstract
Genetic risk prediction is an important goal in human genetics research and precision medicine. Accurate prediction models will have great impacts on both disease prevention and early treatment strategies. Despite the identification of thousands of disease-associated genetic variants through genome-wide association studies (GWAS), genetic risk prediction accuracy remains moderate for most diseases, which is largely due to the challenges in both identifying all the functionally relevant variants and accurately estimating their effect sizes in the presence of linkage disequilibrium. In this paper, we introduce AnnoPred, a principled framework that leverages diverse types of genomic and epigenomic functional annotations in genetic risk prediction for complex diseases. AnnoPred is trained using GWAS summary statistics in a Bayesian framework in which we explicitly model various functional annotations and allow for linkage disequilibrium estimated from reference genotype data. Compared with state-of-the-art risk prediction methods, AnnoPred achieves consistently improved prediction accuracy in both extensive simulations and real data.
Year: 2017 PMID: 28594818 PMCID: PMC5481142 DOI: 10.1371/journal.pcbi.1005589
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Mean correlation between simulated and predicted traits calculated from 100 replicates under different simulation settings.
The highest mean correlations are highlighted in boldface. Standard deviations are shown in parentheses. Traits were simulated from WTCCC genotype data, which contain 15,918 individuals genotyped for 393,273 SNPs. In each setting, we used 70% of the data to calculate the training summary statistics and randomly divided the remaining 30% into two parts for parameter tuning.
| Training samples | Heritability | #Causal | PRS_sig | PRS_all | PRS_P+T | LDpred | AnnoPred |
|---|---|---|---|---|---|---|---|
| Half (~5K) | 0.25 | 300 | 0.149(.028) | 0.08(.021) | 0.25(.028) | 0.279(.025) | |
| Half (~5K) | 0.25 | 3000 | NA | 0.082(.016) | 0.073(.020) | 0.087(.019) | |
| Half (~5K) | 0.5 | 300 | 0.304(.04) | 0.16(.022) | 0.48(.026) | 0.502(.033) | |
| Half (~5K) | 0.5 | 3000 | NA | 0.157(.019) | 0.157(.024) | 0.195(.021) | |
| Full (~10K) | 0.25 | 300 | 0.217(.031) | 0.11(.02) | 0.332(.023) | 0.35(.033) | |
| Full (~10K) | 0.25 | 3000 | NA | 0.11(.014) | 0.107(.018) | 0.136(.017) | |
| Full (~10K) | 0.5 | 300 | 0.373(.036) | 0.213(.023) | 0.548(.024) | 0.557(.047) | |
| Full (~10K) | 0.5 | 3000 | 0.078(.023) | 0.21(.019) | 0.243(.021) | 0.309(.021) | |
* NA means that no SNP reached the genome-wide significance level (5e-8).
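The evaluation protocol described in the caption and footnote above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the caption names only parameter tuning for the held-out data (treating the second held-out part as a test set is an assumption), and skipping replicates with no significant SNP when averaging is also an assumption.

```python
import numpy as np

GENOME_WIDE_P = 5e-8  # significance level quoted in the table footnote


def split_indices(n_samples, rng, train_frac=0.7):
    """Shuffle indices into a training set and two equal held-out parts."""
    perm = rng.permutation(n_samples)
    n_train = int(round(train_frac * n_samples))
    held_out = perm[n_train:]
    half = held_out.size // 2
    # Assumption: first half for tuning, second half for testing.
    return perm[:n_train], held_out[:half], held_out[half:]


def thresholded_score(genotypes, betas, pvalues, p_cut=GENOME_WIDE_P):
    """Weighted allele-count score over SNPs with p < p_cut.

    Returns None when no SNP passes the cut, which corresponds to an
    "NA" entry in the table.
    """
    keep = pvalues < p_cut
    if not keep.any():
        return None
    return genotypes[:, keep] @ betas[keep]


def correlation(predicted, observed):
    """Pearson correlation between predicted and simulated traits."""
    return float(np.corrcoef(predicted, observed)[0, 1])


def summarize(correlations):
    """Mean and sample standard deviation across replicates.

    Assumption: replicates with no score (None) are skipped.
    """
    vals = np.array([c for c in correlations if c is not None], dtype=float)
    if vals.size == 0:
        return None
    sd = float(vals.std(ddof=1)) if vals.size > 1 else 0.0
    return float(vals.mean()), sd


rng = np.random.default_rng(0)
train, tune, test = split_indices(100, rng)
print(len(train), len(tune), len(test))  # 70 15 15
```

In a full run, each of the 100 replicates would yield one `correlation` value per method, and `summarize` would produce the "mean(SD)" entries reported in the table.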
CORs of different methods.
The highest CORs are highlighted in boldface.
| Disease/Trait | PRS_sig | PRS_all | PRS_P+T | LDpred | AnnoPred |
|---|---|---|---|---|---|
| Crohn's Disease | 0.27 | 0.229 | 0.32 | 0.325 | |
| Breast Cancer | 0.084 | 0.055 | 0.12 | 0.122 | |
| Rheumatoid Arthritis | 0.204 | 0.114 | 0.248 | 0.282 | |
| Type-II Diabetes | 0.165 | 0.156 | 0.204 | 0.202 | |
| Celiac Disease | 0.11 | 0.136 | 0.18 | 0.197 | |