Armin P Schoech, Daniel M Jordan, Po-Ru Loh, Steven Gazal, Luke J O'Connor, Daniel J Balick, Pier F Palamara, Hilary K Finucane, Shamil R Sunyaev, Alkes L Price.
Abstract
Understanding the role of rare variants is important in elucidating the genetic basis of human disease. Negative selection can cause rare variants to have larger per-allele effect sizes than common variants. Here, we develop a method to estimate the minor allele frequency (MAF) dependence of SNP effect sizes. We use a model in which per-allele effect sizes have variance proportional to [p(1 − p)]^α, where p is the MAF and negative values of α imply larger effect sizes for rare variants. We estimate α for 25 UK Biobank diseases and complex traits. All traits produce negative α estimates, with best-fit mean of −0.38 (s.e. 0.02) across traits. Despite larger rare variant effect sizes, rare variants (MAF < 1%) explain less than 10% of total SNP-heritability for most traits analyzed. Using evolutionary modeling and forward simulations, we validate the α model of MAF-dependent trait effects and assess plausible values of relevant evolutionary parameters.
Year: 2019 PMID: 30770844 PMCID: PMC6377669 DOI: 10.1038/s41467-019-08424-6
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Estimates of α in simulations
| Simulated α | | Sample size | Polygenicity (%) | Imputation noise | LD dependent effects | Mean estimated α ± s.e.m. (with LD correction) | Mean estimated α ± s.e.m. (without LD correction) |
|---|---|---|---|---|---|---|---|
| −0.3 | 0.4 | 5000 | 1 | Yes | Yes | −0.276 ± 0.017 | −0.192 ± 0.019 |
| 0.0 | 0.4 | 5000 | 1 | Yes | Yes | 0.021 ± 0.020 | 0.120 ± 0.017 |
| −0.6 | 0.4 | 5000 | 1 | Yes | Yes | −0.573 ± 0.014 | −0.471 ± 0.015 |
| −0.3 | 0.2 | 5000 | 1 | Yes | Yes | −0.260 ± 0.024 | −0.148 ± 0.024 |
| −0.3 | 0.4 | 5000 | 100 | Yes | Yes | −0.308 ± 0.012 | −0.195 ± 0.013 |
| −0.3 | 0.4 | 5000 | 1 | No | Yes | −0.304 ± 0.016 | −0.191 ± 0.017 |
| −0.3 | 0.4 | 5000 | 1 | Yes | No | −0.373 ± 0.017 | −0.284 ± 0.017 |
| −0.3 | 0.4 | 2500 | 1 | Yes | Yes | −0.269 ± 0.026 | −0.157 ± 0.025 |
| −0.3 | 0.2 | 2500 | 1 | Yes | Yes | −0.266 ± 0.052 | −0.160 ± 0.034 |
We simulated phenotypes using imputed UK Biobank genotypes and applied our method to infer α. Each row shows results for phenotypes simulated using a given value of α, a second simulation parameter (second column), sample size, and proportion of causal SNPs (polygenicity). In most simulations, imputation noise and LD dependent SNP effects were included in the simulated phenotypes. In each case we report the mean estimated α and the standard error of the mean, using our estimation method either with LD correction or without LD correction (last two columns, respectively).
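The simulation design described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses random toy genotypes instead of imputed UK Biobank data, omits imputation noise and LD dependent effects, and the function name `simulate_phenotype` and the parameters `h2` (fraction of phenotypic variance that is genetic) and `causal_fraction` are hypothetical names introduced only for illustration. The one element taken from the text is that per-allele effect variance scales as [2p(1 − p)]^α.

```python
import numpy as np

def simulate_phenotype(genotypes, maf, alpha, h2, causal_fraction, rng):
    """Toy phenotype with per-allele effect variance proportional to [2p(1-p)]^alpha.

    h2 and causal_fraction are illustrative assumptions, not values from the paper.
    """
    n, m = genotypes.shape
    # Pick a random subset of SNPs to be causal.
    n_causal = max(1, int(round(causal_fraction * m)))
    causal = rng.choice(m, size=n_causal, replace=False)
    # Draw per-allele effects whose variance follows the alpha model.
    het = 2.0 * maf[causal] * (1.0 - maf[causal])
    beta = rng.normal(0.0, np.sqrt(het ** alpha))
    # Genetic component from mean-centred genotypes, rescaled to variance h2.
    g = (genotypes[:, causal] - 2.0 * maf[causal]) @ beta
    g *= np.sqrt(h2 / g.var())
    # Independent noise so that the total phenotypic variance is close to 1.
    noise = rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return g + noise, g

rng = np.random.default_rng(0)
n, m = 2000, 500
maf = rng.uniform(0.01, 0.5, size=m)                         # toy allele frequencies
genotypes = rng.binomial(2, maf, size=(n, m)).astype(float)  # toy unlinked genotypes
y, g = simulate_phenotype(genotypes, maf, alpha=-0.3, h2=0.4,
                          causal_fraction=0.1, rng=rng)
```

With a negative `alpha`, causal SNPs with low MAF receive larger per-allele effects on average, which is the pattern the estimation method is meant to detect.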
Estimates of α for 25 UK Biobank traits
| Phenotype | Sample size | α estimate [95% credible interval] |
|---|---|---|
| Age of menarche | 58,329 | −0.40 [−0.63, −0.11] |
| Blood pressure (diastolic) | 104,835 | −0.39 [−0.54, −0.20] |
| Blood pressure (systolic) | 104,835 | −0.38 [−0.54, −0.18] |
| BMI | 113,540 | −0.24 [−0.38, −0.06] |
| Bone mineral density | 110,611 | −0.35 [−0.45, −0.23] |
| FEV1/FVC | 97,075 | −0.44 [−0.55, −0.31] |
| FVC | 97,075 | −0.15 [−0.31, 0.04] |
| Height | 113,660 | −0.45 [−0.52, −0.39] |
| Smoking status | 113,560 | −0.16 [−0.43, 0.21] |
| Waist-hip ratio | 113,668 | −0.17 [−0.43, 0.19] |
| Allergic eczema | 113,707 | −0.60 [−0.85, −0.26] |
| Asthma | 113,707 | −0.25 [−0.60, 0.28] |
| College education | 112,811 | −0.32 [−0.54, −0.04] |
| Hypertension | 113,689 | −0.18 [−0.46, 0.21] |
| Eosinophil count | 108,957 | −0.40 [−0.54, −0.24] |
| High light scatter reticulocyte count | 108,785 | −0.53 [−0.65, −0.38] |
| Lymphocyte count | 108,664 | −0.52 [−0.63, −0.38] |
| Mean corpuscular hemoglobin | 108,513 | −0.42 [−0.53, −0.31] |
| Mean sphered cell volume | 109,523 | −0.43 [−0.56, −0.28] |
| Monocyte count | 110,026 | −0.19 [−0.35, −0.01] |
| Platelet count | 109,971 | −0.19 [−0.32, −0.03] |
| Platelet distribution width | 109,938 | −0.27 [−0.44, −0.07] |
| Red blood cell count | 110,054 | −0.39 [−0.51, −0.25] |
| Red blood cell distribution width | 109,913 | −0.20 [−0.36, −0.01] |
| White blood cell count | 110,186 | −0.25 [−0.42, −0.03] |
We computed α estimates for 25 UK Biobank traits, including 10 quantitative traits, 4 case–control traits, and 11 blood cell traits (all quantitative). The reported 95% credible intervals were calculated from the profile likelihood curves using a flat prior.
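As a rough illustration of how a credible interval can be read off a profile likelihood curve under a flat prior, the sketch below normalizes the likelihood over a grid of α values and takes equal-tailed quantiles. The grid range, the equal-tailed convention, and the function name `credible_interval` are assumptions made here; the source does not specify these details, and the toy log-likelihood is synthetic.

```python
import numpy as np

def credible_interval(alpha_grid, log_lik, level=0.95):
    """Equal-tailed credible interval from a log-likelihood curve on a grid.

    Assumes a flat prior restricted to the grid range, so the posterior
    density is proportional to the likelihood.
    """
    alpha_grid = np.asarray(alpha_grid, dtype=float)
    log_lik = np.asarray(log_lik, dtype=float)
    # Subtract the maximum before exponentiating for numerical stability.
    dens = np.exp(log_lik - log_lik.max())
    # Cumulative trapezoidal integration gives an unnormalized CDF.
    steps = 0.5 * (dens[1:] + dens[:-1]) * np.diff(alpha_grid)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]
    # Invert the CDF at the two tail probabilities.
    tail = (1.0 - level) / 2.0
    lo, hi = np.interp([tail, 1.0 - tail], cdf, alpha_grid)
    return lo, hi

# Synthetic example: a Gaussian-shaped log-likelihood centred at -0.4.
grid = np.linspace(-1.0, 0.5, 1501)
toy_log_lik = -0.5 * ((grid + 0.4) / 0.1) ** 2
lo, hi = credible_interval(grid, toy_log_lik)
print(f"95% credible interval: [{lo:.2f}, {hi:.2f}]")
```

For asymmetric likelihood curves the interval is asymmetric around the point estimate, as it is for several traits in the table.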
Fig. 1 Fraction of SNP-heritability in different MAF ranges given α. We report the fraction of SNP-heritability explained by SNPs up to a certain MAF (x-axis), for different values of α. For example, assuming α = −0.4, SNPs with MAF ≤ 5% collectively explain about 20% of the total SNP-heritability. These results are based on the UK10K allele frequency spectrum and our model assumption that squared per-allele effects are proportional to [2p(1 − p)]^α. Source data are provided as a Source Data file.
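The calculation behind this figure can be sketched as follows. A SNP with MAF p and per-allele effect β contributes variance 2p(1 − p)β², so under the α model its expected contribution to SNP-heritability is proportional to [2p(1 − p)]^(1 + α). The sketch below applies this to a toy log-uniform allele frequency spectrum, which is an assumption standing in for the UK10K spectrum; its printed fractions are therefore not expected to match the figure. The function name `heritability_fraction` is hypothetical.

```python
import numpy as np

def heritability_fraction(maf, alpha, maf_cutoff):
    """Fraction of SNP-heritability from SNPs with MAF <= maf_cutoff.

    Each SNP is weighted by its expected heritability contribution,
    [2p(1-p)]^(1 + alpha), under the alpha model.
    """
    maf = np.asarray(maf, dtype=float)
    weights = (2.0 * maf * (1.0 - maf)) ** (1.0 + alpha)
    return weights[maf <= maf_cutoff].sum() / weights.sum()

# Toy spectrum (assumption): MAFs drawn log-uniformly between 1e-4 and 0.5.
rng = np.random.default_rng(1)
toy_maf = np.exp(rng.uniform(np.log(1e-4), np.log(0.5), size=100_000))

for alpha in (0.0, -0.4, -0.8):
    frac = heritability_fraction(toy_maf, alpha, maf_cutoff=0.05)
    print(f"alpha = {alpha:+.1f}: fraction at MAF <= 5% = {frac:.3f}")
```

More negative values of α shift a larger share of the heritability toward low-frequency SNPs, whatever spectrum is used.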
Fig. 2 MAF-dependence of SNP effects in evolutionary forward simulations. Forward simulations confirm that the α model approximately holds above the MAF threshold T. We report simulated mean squared SNP effect sizes at a given MAF on a log-log plot, assuming τ = 0.4 and a genome-wide selection coefficient distribution with a given mean and shape parameter k = 0.25. Data points represent the mean squared effect size of 1000 SNPs of similar MAF, calculated assuming Eq. (2). The blue curve represents mean squared effect sizes under the α model (Eq. 1) with α = −0.32, fitted to SNPs above the MAF threshold T. The MAF threshold T = 0.006 is indicated by a dotted red line. Source data are provided as a Source Data file.
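One simple way to fit an α-type power law to binned mean squared effect sizes is ordinary least squares on the log-log scale, restricted to bins above the MAF threshold. The sketch below does this on synthetic data; it is not necessarily the fitting procedure used for the blue curve, and the function name `fit_alpha`, the noise level, and the distortion applied below the threshold are all illustrative assumptions.

```python
import numpy as np

def fit_alpha(maf, mean_sq_effect, maf_threshold):
    """Least-squares slope of log(mean squared effect) on log[2p(1-p)].

    Only bins with MAF above maf_threshold are used. Under the alpha model
    the slope estimates alpha.
    """
    maf = np.asarray(maf, dtype=float)
    mean_sq_effect = np.asarray(mean_sq_effect, dtype=float)
    keep = maf > maf_threshold
    x = np.log(2.0 * maf[keep] * (1.0 - maf[keep]))
    y = np.log(mean_sq_effect[keep])
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept

# Synthetic bins generated from the alpha model with multiplicative noise.
rng = np.random.default_rng(2)
toy_maf = np.geomspace(0.001, 0.5, 60)
true_alpha = -0.32
toy_effects = (2 * toy_maf * (1 - toy_maf)) ** true_alpha
toy_effects *= np.exp(rng.normal(0.0, 0.05, size=toy_maf.size))
# Arbitrary distortion below the threshold, only to show that excluded
# bins do not influence the fit.
toy_effects[toy_maf <= 0.006] *= 0.5

alpha_hat, _ = fit_alpha(toy_maf, toy_effects, maf_threshold=0.006)
print(f"fitted alpha = {alpha_hat:.3f}")
```

Fitting only above the threshold matters because the caption states that the α model holds approximately in that range only.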
Fig. 3 Value of α as a function of τ and other parameters in forward simulations. We report best-fit α estimates for simulations at each value of τ at a given genome-wide average selection coefficient. Selection coefficients were sampled from a gamma distribution with shape parameter k = 0.25 (solid lines) or k = 0.125 (dotted lines). α estimates were calculated by fitting the model in Eq. (1) to simulated SNP effects above twice the MAF threshold (in order to avoid edge effects near T), with error bars representing standard errors calculated by bootstrap resampling of 25 independent SLiM2 simulations. The horizontal dashed line indicates α = −0.38, the best-fit α across the 25 UK Biobank traits. Results for a broader range of k values are reported in Supplementary Figure 5. Source data are provided as a Source Data file.