Declan Bennett, Donal O'Shea, John Ferguson, Derek Morris, Cathal Seoighe.
Abstract
Ongoing increases in the size of human genotype and phenotype collections offer the promise of improved understanding of the genetics of complex diseases. In addition to the biological insights that can be gained from the nature of the variants that contribute to the genetic component of complex trait variability, these data bring forward the prospect of predicting complex traits and the risk of complex genetic diseases from genotype data. Here we show that advances in phenotype prediction can be applied to improve the power of genome-wide association studies. We demonstrate a simple and efficient method to model genetic background effects using polygenic scores derived from SNPs that are not on the same chromosome as the target SNP. Using simulated and real data we found that this can result in a substantial increase in the number of variants passing genome-wide significance thresholds. This increase in power to detect trait-associated variants also translates into an increase in the accuracy with which the resulting polygenic score predicts the phenotype from genotype data. Our results suggest that advances in methods for phenotype prediction can be exploited to improve the control of background genetic effects, leading to more accurate GWAS results and further improvements in phenotype prediction.
Year: 2021 PMID: 34599249 PMCID: PMC8486788 DOI: 10.1038/s41598-021-99031-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The proportion of causal variants recovered in 100 simulations. The boxplot shows the median (center line), upper and lower quartiles (hinges) and the maximum and minimum values not more than 1.5 times the interquartile range from the corresponding hinge (whiskers). The simulations consisted of 100,000 individuals and a continuous trait, with narrow-sense heritability of 0.5 and 1000 causal variants. BOLT-LMM-165 denotes BOLT-LMM with a GRM derived from 165,684 variants resulting from strict LD-pruning. BOLT-LMM-664 refers to the use of BOLT-LMM with a GRM derived from all 664,393 variants in the simulations. Methods that include PGS in the name involved the use of a LOCO PGS fixed effect, derived either from pruning and thresholding (methods ending in PT) or using LDpred2.
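The leave-one-chromosome-out (LOCO) PGS adjustment described in the abstract and in the caption above can be illustrated with a minimal sketch. This is not the authors' pipeline: the helper names (`loco_pgs`, `loco_pgs_scan`), the toy data, and the use of marginal regression slopes as PGS weights are all assumptions made for illustration. In the paper the weights come from pruning and thresholding or from LDpred2, and the association tests are run with fastGWA or BOLT-LMM. The sketch only shows the core idea: when testing a SNP, adjust for a polygenic score built from SNPs on every other chromosome.

```python
import random

def mean(values):
    return sum(values) / len(values)

def ols_slope(x, y):
    """Slope of a simple linear regression of y on x (intercept included)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx if sxx > 0 else 0.0

def residualize(y, covariate):
    """Remove the linear effect of one covariate (plus intercept) from y."""
    slope = ols_slope(covariate, y)
    mc, my = mean(covariate), mean(y)
    return [yi - my - slope * (ci - mc) for ci, yi in zip(covariate, y)]

def loco_pgs(genotypes, snp_chrom, weights, target_chrom):
    """Polygenic score built only from SNPs that are NOT on target_chrom."""
    keep = [j for j, c in enumerate(snp_chrom) if c != target_chrom]
    return [sum(weights[j] * row[j] for j in keep) for row in genotypes]

def loco_pgs_scan(genotypes, snp_chrom, phenotype, weights):
    """Per-SNP effect estimates with the LOCO PGS as a fixed-effect covariate.

    Residualizing both the phenotype and the SNP on the PGS and regressing
    one residual on the other gives the same SNP coefficient as the joint
    model phenotype ~ SNP + PGS (Frisch-Waugh-Lovell theorem).
    """
    effects = [0.0] * len(snp_chrom)
    for chrom in sorted(set(snp_chrom)):
        pgs = loco_pgs(genotypes, snp_chrom, weights, chrom)
        adjusted_y = residualize(phenotype, pgs)
        for j, c in enumerate(snp_chrom):
            if c == chrom:
                snp = [row[j] for row in genotypes]
                effects[j] = ols_slope(residualize(snp, pgs), adjusted_y)
    return effects

# Toy data (hypothetical): 6 SNPs on 3 chromosomes, 400 individuals.
random.seed(0)
snp_chrom = [1, 1, 2, 2, 3, 3]
true_beta = [0.5, 0.0, 0.4, 0.0, 0.3, 0.0]
genotypes = [[random.choice([0, 1, 2]) for _ in snp_chrom] for _ in range(400)]
phenotype = [sum(b * g for b, g in zip(true_beta, row)) + random.gauss(0, 1)
             for row in genotypes]

# Assumption: first-pass marginal slopes stand in for P+T or LDpred2 weights.
weights = [ols_slope([row[j] for row in genotypes], phenotype)
           for j in range(len(snp_chrom))]
effects = loco_pgs_scan(genotypes, snp_chrom, phenotype, weights)
```

Excluding the target chromosome keeps the score from absorbing the signal of the SNP being tested, or of variants in linkage disequilibrium with it, while still accounting for trait variance explained by the rest of the genome.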
Pipeline computation time and memory for simulations consisting of 100,000 individuals and 664,393 variants.
| Method | CPU time (s): GWAS | CPU time (s): LOCO PGS | CPU time (s): GWAS (22 chr) | CPU time (s): total | Max memory (GB) |
|---|---|---|---|---|---|
| fastGWA | 501.2 | 0.0 | 0.0 | 501.2 | 0.5 |
| fastGWA-PGS-PT | 501.2 | 245.8 | 2953.3 | 3700.2 | 0.7 |
| fastGWA-PGS-LDpred2 | 501.2 | 58,880.0 | 2953.3 | 62,334.5 | 6.3 |
| BOLT-LMM-165 | 92,108.0 | 0.0 | 0.0 | 92,108.0 | 3.9 |
| BOLT-LMM-165-PGS-PT | 92,108.0 | 245.8 | 614,514.4 | 706,868.2 | 3.9 |
| BOLT-LMM-664 | 119,202.0 | 0.0 | 0.0 | 119,202.0 | 15.5 |
Analyses were performed on a single compute node with 32 Xeon(R) D-1541 CPUs and 128 GB of RAM. REGENIE was omitted from the table because the simulation is based on a single phenotype, which would unfairly disadvantage REGENIE: it is optimized for performing association analyses on multiple phenotypes simultaneously.
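As a quick consistency check on the timing table, the sketch below assumes that the total CPU time is the sum of the three stage columns (initial GWAS, LOCO PGS construction, and the 22 per-chromosome GWAS runs). The figures are copied from the table; the variable and function names are illustrative only.

```python
# CPU seconds per method: (GWAS, LOCO PGS, GWAS over 22 chromosomes, reported total).
timings = {
    "fastGWA": (501.2, 0.0, 0.0, 501.2),
    "fastGWA-PGS-PT": (501.2, 245.8, 2953.3, 3700.2),
    "fastGWA-PGS-LDpred2": (501.2, 58880.0, 2953.3, 62334.5),
    "BOLT-LMM-165": (92108.0, 0.0, 0.0, 92108.0),
    "BOLT-LMM-165-PGS-PT": (92108.0, 245.8, 614514.4, 706868.2),
    "BOLT-LMM-664": (119202.0, 0.0, 0.0, 119202.0),
}

def stage_sum(row):
    """Sum of the three per-stage CPU times."""
    return sum(row[:3])

for method, row in timings.items():
    print(f"{method}: stages sum to {stage_sum(row):.1f} s, reported total {row[3]:.1f} s")

# Ratio of total CPU time: BOLT-LMM with the full GRM versus fastGWA-PGS-PT.
cpu_ratio = timings["BOLT-LMM-664"][3] / timings["fastGWA-PGS-PT"][3]
print(f"BOLT-LMM-664 / fastGWA-PGS-PT CPU time ratio: {cpu_ratio:.1f}")
```

Under that assumption the stage sums match the reported totals to within rounding, and the dominant cost of the fastGWA-PGS-LDpred2 pipeline is the LDpred2 step rather than the association scans.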
Figure 2. Difference in sensitivity (between fastGWA-PGS-LDpred2 and fastGWA) as a function of specificity for 100 simulations of a continuous trait with narrow-sense heritability of 0.5 and 1000 causal variants in 100,000 individuals. The specificity (x-axis) is discretized in bins of size 0.0001. Each grey line shows the results of one simulation. The red line shows the mean difference over all simulations.
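The comparison in Figure 2 can be sketched as follows, under assumptions that are not taken from the paper: each method yields one p-value per variant, variants are labelled causal or non-causal by the simulation, and sensitivity at a given specificity is obtained by choosing the p-value cutoff that admits at most the allowed number of non-causal variants. The function names and the six-variant toy data are hypothetical.

```python
def sensitivity_at_specificity(pvalues, is_causal, specificity):
    """Fraction of causal variants called when non-causal variants are held to `specificity`."""
    null_p = sorted(p for p, causal in zip(pvalues, is_causal) if not causal)
    causal_p = [p for p, causal in zip(pvalues, is_causal) if causal]
    # Number of false positives allowed at this specificity (epsilon guards float error).
    max_fp = int((1.0 - specificity) * len(null_p) + 1e-9)
    # Call every variant whose p-value is below the (max_fp + 1)-th smallest null p-value.
    cutoff = null_p[max_fp] if max_fp < len(null_p) else float("inf")
    return sum(p < cutoff for p in causal_p) / len(causal_p)

def sensitivity_difference(p_new, p_base, is_causal, specificities):
    """Sensitivity of the new method minus that of the baseline at each specificity."""
    return [
        sensitivity_at_specificity(p_new, is_causal, s)
        - sensitivity_at_specificity(p_base, is_causal, s)
        for s in specificities
    ]

# Toy example: two causal variants followed by four non-causal variants.
is_causal = [True, True, False, False, False, False]
p_base = [0.001, 0.2, 0.01, 0.5, 0.03, 0.9]    # stand-in for fastGWA
p_new = [0.0005, 0.02, 0.01, 0.5, 0.03, 0.9]   # stand-in for fastGWA-PGS-LDpred2
diffs = sensitivity_difference(p_new, p_base, is_causal, [1.0, 0.75, 0.5])
print(diffs)  # [0.0, 0.5, 0.0]
```

In the figure the same kind of difference is evaluated on a much finer grid (bins of 0.0001) and averaged over the 100 simulations.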
Figure 3. Proportion of causal variants recovered in simulations of a quantitative trait over a range of values of narrow-sense heritability (h²) and the number of causal loci. Simulations in the top (A) and bottom (B) panels were based on 100,000 and 430,000 randomly sampled individuals from the UK Biobank, respectively.
Number of independent significant loci identified and resulting phenotype prediction model fit.
| Method | Significant loci | R² (full) | 95% CI | R² (PGS) | 95% CI | Spearman’s ρ | Phenotype |
|---|---|---|---|---|---|---|---|
| fastGWA | 1381 | 0.696 | 0.689, 0.702 | 0.165 | 0.158, 0.170 | 0.382 | Height |
| fastGWA-PGS-PT | 1583 | 0.701 | 0.694, 0.707 | 0.173 | 0.166, 0.179 | 0.391 | |
| fastGWA-PGS-LDpred2 | 1717 | 0.703 | 0.696, 0.709 | 0.176 | 0.170, 0.182 | 0.395 | |
| BOLT-LMM | 1804 | 0.703 | 0.697, 0.709 | 0.170 | 0.164, 0.176 | 0.388 | |
| fastGWA | 450 | 0.151 | 0.146, 0.158 | 0.130 | 0.124, 0.135 | 0.351 | BMI |
| fastGWA-PGS-PT | 493 | 0.153 | 0.147, 0.159 | 0.130 | 0.125, 0.136 | 0.351 | |
| fastGWA-PGS-LDpred2 | 500 | 0.151 | 0.146, 0.157 | 0.127 | 0.121, 0.133 | 0.346 | |
| BOLT-LMM | 583 | 0.155 | 0.150, 0.162 | 0.134 | 0.128, 0.139 | 0.356 | |
| fastGWA | 324 | 0.216 | 0.204, 0.232 | 0.158 | 0.144, 0.171 | 0.427 | HBMD |
| fastGWA-PGS-PT | 365 | 0.221 | 0.208, 0.238 | 0.164 | 0.152, 0.178 | 0.439 | |
| fastGWA-PGS-LDpred2 | 385 | 0.225 | 0.210, 0.241 | 0.167 | 0.154, 0.182 | 0.444 | |
| BOLT-LMM | 393 | 0.223 | 0.209, 0.238 | 0.165 | 0.152, 0.178 | 0.437 | |
R² (full) is the coefficient of determination of a model that includes the PGS, sex, age and 10 PCs as covariates, while R² (PGS) is the coefficient of determination of a model that includes only the PGS. BOLT-LMM was applied with a GRM consisting of 556,516 variants.
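For a model whose only predictor is the PGS, the coefficient of determination equals the squared Pearson correlation between the PGS and the phenotype, and Spearman's ρ is the Pearson correlation of the ranks. The sketch below illustrates both quantities on made-up numbers. It assumes that Spearman's ρ in the table is computed between the PGS and the observed phenotype, which the note above does not state, and it does not reproduce the confidence intervals or the covariate-adjusted full model.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))

def r_squared_pgs_only(pgs, phenotype):
    """R² of a simple linear regression of phenotype on the PGS alone."""
    return pearson(pgs, phenotype) ** 2

# Made-up values for illustration only.
pgs = [1.0, 2.0, 3.0, 4.0]
phenotype = [1.0, 4.0, 9.0, 16.0]
r2_pgs = r_squared_pgs_only(pgs, phenotype)
rho = spearman(pgs, phenotype)
```

Because the toy relationship is monotone but not linear, ρ is 1 while R² is slightly below 1, which shows why the two columns in the table need not move together.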