Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations.

Steven Gazal^1,2, Po-Ru Loh^3,4, Hilary K Finucane^3,5, Andrea Ganna^3,6,7, Armin Schoech^8,3,9, Shamil Sunyaev^3,4,10, Alkes L Price^11,12,13.

Abstract

Common variant heritability has been widely reported to be concentrated in variants within cell-type-specific non-coding functional annotations, but little is known about low-frequency variant functional architectures. We partitioned the heritability of both low-frequency (0.5%≤ minor allele frequency <5%) and common (minor allele frequency ≥5%) variants in 40 UK Biobank traits across a broad set of functional annotations. We determined that non-synonymous coding variants explain 17 ± 1% of low-frequency variant heritability ($h^2_{lf}$) versus 2.1 ± 0.2% of common variant heritability ($h^2_c$). Cell-type-specific non-coding annotations that were significantly enriched for $h^2_c$ of corresponding traits were similarly enriched for $h^2_{lf}$ for most traits, but more enriched for brain-related annotations and traits. For example, H3K4me3 marks in brain dorsolateral prefrontal cortex explain 57 ± 12% of $h^2_{lf}$ versus 12 ± 2% of $h^2_c$ for neuroticism. Forward simulations confirmed that low-frequency variant enrichment depends on the mean selection coefficient of causal variants in the annotation, and can be used to predict effect size variance of causal rare variants (minor allele frequency <0.5%).

Year: 2018 PMID: 30297966 PMCID: PMC6236676 DOI: 10.1038/s41588-018-0231-8

Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330

Introduction

Common variant (minor allele frequency (MAF) ≥5%) trait heritability has been widely reported to be concentrated into noncoding functional annotations that are active in relevant cell-types or tissues, with a limited role for common coding variants[1-8]. Although common variants explain the bulk of heritability[9-11], low-frequency variants can have larger per-allele effect sizes than common variants when impacted by negative selection[9-17], and may thus yield important biological insights even though the heritability they explain is modest[6,7]. Recent large genome-wide association studies (GWAS) have identified low-frequency variants with large per-allele effect sizes and reported an excess of genome-wide significant low-frequency variants in coding regions[18-21], implying that low-frequency coding variants have larger effect sizes than other low-frequency variants. However, the relative contribution of low-frequency coding variants to low-frequency variant heritability is currently unknown. For cell-type-specific noncoding variants, discovery of genome-wide significant low-frequency variants has been limited, and their contribution to low-frequency variant heritability is also unknown. Dissecting low-frequency variant functional architectures can shed light on the action of negative selection across functional annotations and inform the design of low-frequency and rare variant association studies[14,22]. To investigate functional enrichments of low-frequency variants (defined here as 0.5%≤MAF<5%), we extended stratified LD-score regression[5,23] (S-LDSC) to partition the heritability of both low-frequency and common variants; our method produces robust (unbiased or slightly conservative) results in simulations. 
We applied our method to partition the heritability of low-frequency and common variants in 40 heritable traits from the UK Biobank[24-26] (average N=363K UK-ancestry samples) across a broad set of coding and noncoding functional annotations[5,6,8,23,27-31]. We performed forward simulations to connect estimated low-frequency and common variant functional enrichments to the action of negative selection, and to predict the effect size variance of causal rare variants (MAF<0.5%) within each functional annotation.

Results

Overview of methods

S-LDSC[5,23] is a method for partitioning the heritability causally explained by common variants across overlapping discrete or continuous annotations using genome-wide association study (GWAS) summary statistics for accurately imputed variants and a linkage disequilibrium (LD) reference panel. Here, we extended S-LDSC to partition the heritability causally explained by low-frequency variants using GWAS summary statistics for accurately imputed and poorly imputed variants. We included separate annotations for low-frequency and common variants, and used whole-genome sequencing (WGS) data from 3,567 UK10K samples[18] as an LD reference panel to ensure accurate LD information for low-frequency variants in the UK-ancestry target samples analyzed in this study (see Methods). We jointly analyzed 163 annotations (referred to as the “baseline-LF model”), including 33 main binary annotations, MAF bins, and LD-related annotations (Supplementary Table 1 and Supplementary Table 2; see Methods). We note that the inclusion of MAF- and LD-related annotations implies that the expected causal heritability of a SNP is a function of MAF and LD. We first estimated the heritability causally explained by all low-frequency variants ($h^2_{lf}$) and the heritability causally explained by all common variants ($h^2_c$). For the 33 main binary annotations, we computed their low-frequency variant enrichment (LFVE), defined as the proportion of $h^2_{lf}$ causally explained by variants in the annotation divided by the proportion of low-frequency variants that lie in the annotation, and common variant enrichment (CVE), defined analogously. Further details of the method are provided in the Methods section. We have released open-source software implementing the method, and have made our annotations publicly available (see URLs).
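The enrichment definitions above reduce to a ratio of two proportions. The following is a minimal sketch of that arithmetic only, not of the S-LDSC estimation itself; the function names and the example numbers are hypothetical and chosen purely for illustration.

```python
def enrichment(h2_annotation, h2_total, n_variants_annotation, n_variants_total):
    """Enrichment = (proportion of heritability in the annotation)
    / (proportion of variants in the annotation).

    Use low-frequency quantities for LFVE and common-variant quantities for CVE.
    """
    prop_h2 = h2_annotation / h2_total
    prop_variants = n_variants_annotation / n_variants_total
    return prop_h2 / prop_variants


def lfve_cve_ratio(lfve, cve):
    """Ratio used to compare low-frequency and common variant enrichment."""
    return lfve / cve


# Hypothetical example: an annotation holding 1% of low-frequency variants
# that explains 20% of low-frequency variant heritability.
example_lfve = enrichment(0.02, 0.10, 1_000, 100_000)
```

An enrichment of 1 means the annotation explains heritability in proportion to its size; values above 1 indicate that variants in the annotation carry more heritability per variant than average.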

Simulations of extending S-LDSC to low-frequency variants

Although S-LDSC has previously been shown to produce robust results for partitioning common variant heritability using overlapping binary and continuous annotations[23,32], we performed additional simulations to assess our extension to low-frequency variants. We first confirmed that S-LDSC with the UK10K LD reference panel produced unbiased heritability estimates for variants with MAF≥0.5% in simulations using UK10K target samples (see Supplementary Figure 1, Supplementary Table 3, and Supplementary Note). We subsequently performed more realistic simulations using target samples from the UK Biobank interim release[24], so that LD (and MAF) in the target samples and UK10K LD reference panel do not perfectly match (see Methods and Supplementary Figure 2). S-LDSC was run either by restricting regression variants to accurately imputed variants (i.e. INFO score[33] ≥0.99), as we recommended previously[5], or by including all variants (regardless of INFO score). We focused our simulations on two representative annotations spanning roughly 1% of the genome: coding and enhancer. We considered various MAF-dependent architectures[34,35], and conservatively specified our generative model to be different from the additive model assumed by S-LDSC (see Methods). For each of the two annotations, we simulated scenarios with no functional enrichment (“No Enrichment”) and scenarios with CVE roughly equal to 7× and lower LFVE (“Lower LFVE”), similar LFVE (“Same Enrichment”), or higher LFVE (“Higher LFVE”), respectively. For both annotations, we observed that including all variants in the regression produced slightly conservative LFVE estimates and unbiased LFVE/CVE ratio estimates, while restricting to accurately imputed variants produced upward biases (Figure 1, Supplementary Table 4). 
The slightly conservative $h^2_{lf}$ and LFVE estimates are due to LD-dependent architectures (coding and enhancer variants have lower than average levels of LD, as do other enriched functional annotations[23]), as we observed nearly unbiased estimates when creating shifted annotations with average levels of LD (see Methods and Supplementary Figure 3). We thus recommend including all variants in the regression when running S-LDSC using the baseline-LF model. Our simulations indicate that this method is robust (unbiased or slightly conservative) in estimating low-frequency and common variant functional enrichments and LFVE/CVE ratios across a wide range of genetic architectures, even in the presence of poorly imputed variants, a target sample that does not exactly match the UK10K LD reference panel, and a MAF-dependent architecture that does not match the additive model assumed by S-LDSC.

Figure 1:

Simulations to assess low-frequency variant enrichment estimates.

We report estimates of LFVE and LFVE/CVE ratio in simulations under a coding-enriched architecture (first row) or enhancer-enriched architecture (second row). We considered four different simulation scenarios (see main text). S-LDSC was run either by restricting regression variants to accurately imputed variants (S-LDSC – INFO ≥ 0.99), or by including all variants (S-LDSC – All variants). We do not report LFVE/CVE ratio for the No Enrichment simulation (CVE=LFVE=1) due to unstable estimates; however, all analyses of real traits in this paper focus on annotations with significant CVE. Results are averaged across 1,000 simulations. Error bars represent 95% confidence intervals. Numerical results for $h^2_{lf}$, $h^2_c$, LFVE, CVE and LFVE/CVE ratio are reported in Supplementary Table 4.

Low-frequency functional architecture of UK Biobank traits

We applied S-LDSC with the baseline-LF model to 40 polygenic, heritable complex traits and diseases from the full UK Biobank release[25] (average N=363K; Supplementary Table 5). Analyses were restricted to the set of 409K individuals with UK ancestry[25] to ensure a close ancestry match with the UK10K LD reference panel. Summary statistics were computed by running BOLT-LMM v2.3 (ref.[26]) on imputed dosages, and made publicly available (see URLs). S-LDSC results were meta-analyzed across 27 independent traits (average N=355K; see Supplementary Note). We observed a roughly linear relationship between estimates of $h^2_{lf}$ and $h^2_c$ (Figure 2 and Supplementary Table 5), with low-frequency variants explaining 6.3±0.2× less heritability and having 4.0±0.1× lower per-variant heritability than common variants on average. These ratios are consistent with a model in which the variance of per-normalized genotype effect sizes is proportional to $[p(1-p)]^{1+\alpha}$ (where p is the minor allele frequency; refs.[34,35]) with α=−0.37 (95% confidence interval [−0.40;−0.34]; similar to previous α estimates from raw genotype-phenotype data[10,11]), and consistent with a model in which low-frequency variants have smaller per-variant heritability but larger per-allele effect sizes[10,11,23,34,35] (Supplementary Figure 4).
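The dependence of per-variant heritability on α can be explored numerically. The sketch below is an illustration under stated assumptions rather than the analysis performed in this study: it takes per-variant heritability to be proportional to [p(1−p)]^(1+α) and, as a purely hypothetical simplification, weights minor allele frequencies within each bin by a density proportional to 1/p (the real UK10K frequency spectrum will differ). All function names are illustrative.

```python
def mean_per_variant_h2(alpha, maf_lo, maf_hi, n_grid=10_000):
    """Mean of [p(1-p)]^(1+alpha) over a MAF bin (up to a constant factor).

    Assumption: MAF density proportional to 1/p within the bin.
    Integrals are approximated with a midpoint Riemann sum.
    """
    step = (maf_hi - maf_lo) / n_grid
    weighted_sum = 0.0
    weight_total = 0.0
    for i in range(n_grid):
        p = maf_lo + (i + 0.5) * step
        weight = 1.0 / p
        weighted_sum += weight * (p * (1.0 - p)) ** (1.0 + alpha)
        weight_total += weight
    return weighted_sum / weight_total


def common_to_low_frequency_ratio(alpha):
    """Per-variant heritability of common (MAF >= 5%) relative to
    low-frequency (0.5% <= MAF < 5%) variants under the alpha model."""
    common = mean_per_variant_h2(alpha, 0.05, 0.5)
    low_frequency = mean_per_variant_h2(alpha, 0.005, 0.05)
    return common / low_frequency


ratio_at_estimate = common_to_low_frequency_ratio(-0.37)
```

With α=−1 the exponent is zero and every variant has the same expected heritability, so the ratio is 1; as α increases towards 0 the ratio grows, which is the sense in which a less negative α shifts heritability towards common variants.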

Figure 2:

Common variant heritability ($h^2_c$) and low-frequency variant heritability ($h^2_{lf}$) estimates for 40 UK Biobank traits.

We report $h^2_c$ and $h^2_{lf}$ estimated by S-LDSC with the baseline-LF model for 40 UK Biobank traits (for binary traits, estimates are on the liability scale), with 7 representative independent traits highlighted. Error bars represent 95% confidence intervals. The dashed black line represents the ratio between $h^2_{lf}$ and $h^2_c$ meta-analyzed across 27 independent traits (1/6.3). Grey lines represent expected ratios for different values of α (see main text). Numerical results are reported in Supplementary Table 5.

We compared the LFVE and CVE of the 33 main binary functional annotations of the baseline-LF model, meta-analyzed across traits (Figure 3, Supplementary Table 6). LFVE was highly correlated with CVE (r=0.79) and larger than CVE on average (regression slope=1.85). We identified 9 main functional annotations with significantly different LFVE and CVE (Figure 3, Supplementary Table 6). Non-synonymous variants had the largest LFVE and largest difference vs. CVE (5.0× ratio; LFVE=38.2±2.3×, vs. CVE=7.7±0.9×; P=$3\times10^{-36}$ for difference). As non-synonymous variants comprise 0.45% of low-frequency variants vs. only 0.27% of common variants due to strong negative selection on non-synonymous mutations[36,37] (see below), this difference is even larger when comparing the proportion of heritability they explain (8.2× ratio; 17.3±1.0% of $h^2_{lf}$, vs. 2.1±0.2% of $h^2_c$; P=$5\times10^{-47}$). Non-synonymous variants predicted to be deleterious by PolyPhen-2 (ref.[29]) had larger LFVE and LFVE/CVE ratio than non-synonymous variants predicted to be benign (Supplementary Figure 5).
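As a rough consistency check, the reported enrichments for non-synonymous variants can be approximately recovered from the rounded point estimates quoted above (heritability proportions of 17.3% and 2.1%, annotation sizes of 0.45% and 0.27%). Small discrepancies from the reported meta-analyzed values are expected because the inputs are rounded and the published enrichments are not computed from these summary figures.

```latex
\begin{align*}
\mathrm{LFVE} &\approx \frac{17.3\%}{0.45\%} \approx 38.4 && (\text{reported: } 38.2) \\
\mathrm{CVE} &\approx \frac{2.1\%}{0.27\%} \approx 7.8 && (\text{reported: } 7.7) \\
\frac{\mathrm{LFVE}}{\mathrm{CVE}} &= \frac{38.2}{7.7} \approx 5.0 \\
\frac{\text{proportion of } h^2_{lf}}{\text{proportion of } h^2_{c}} &= \frac{17.3\%}{2.1\%} \approx 8.2
\end{align*}
```

The heritability-proportion ratio (8.2×) exceeds the enrichment ratio (5.0×) by roughly the factor 0.45/0.27 ≈ 1.7, which reflects the larger share of non-synonymous variants among low-frequency variants.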

Figure 3:

Functional low-frequency and common variant architectures across 27 independent UK Biobank traits.

We plot LFVE vs. CVE (log scale) for the 33 main functional annotations of the baseline-LF model (meta-analyzed across the 27 independent traits), highlighting annotations for which LFVE is significantly different from CVE. Numbers in the legend represent the proportion of common / low-frequency variants inside the annotation, respectively. The first three conserved annotations are based on phastCons elements[27], Conserved in mammals* is based on GERP RS scores[28] (≥4), and Conserved in mammals** is based on Lindblad-Toh et al.[30]. The promoter flanking annotation has (non-significantly) negative LFVE and is not displayed for visualization purposes. The solid line represents LFVE=CVE; dashed lines represent LFVE=constant multiples of CVE. Error bars represent 95% confidence intervals. Numerical results are reported in Supplementary Table 6.

We also observed LFVE significantly larger than CVE for coding variants (2.5× ratio; P=$1\times10^{-18}$), 5’ UTR (2.5× ratio; P=$1\times10^{-4}$) and the five main conserved annotations[27,28,30] (ratios 1.5×-2.2×; each P<$5\times10^{-7}$; Figure 3, Supplementary Table 6). Surprisingly, phastCons regions conserved in primates[27] were more enriched than phastCons regions conserved in vertebrates or conserved in mammals[27] (even though regions conserved in more distant species may be viewed as more biologically critical). We observed that the significantly larger LFVE (compared to CVE) for all 5 conserved annotations is mainly due to conserved regions that are coding, and that coding enrichments are similar for regions conserved across different species (Supplementary Figure 6). Finally, we observed significantly smaller LFVE than CVE for intronic variants (0.85× ratio; P=$8\times10^{-5}$). These results were generally consistent across the 40 UK Biobank traits analyzed (Supplementary Figure 7). We also observed significantly larger enrichment/depletion for LFVE than for CVE in the first and/or last quintile of LD-related continuous annotations related to negative selection[23] (Supplementary Figure 8 and Supplementary Table 7); our forward simulations from ref.[23] confirmed larger effects of low-frequency variants in these LD-related annotations (Supplementary Table 8). Overall, our results suggest that LFVE is substantially larger than CVE only for annotations that are strongly constrained by negative selection, as the strongest differences were observed for coding and non-synonymous variants, which are known to be under strong negative selection[36,37]. A more detailed interpretation of the LFVE/CVE ratio is provided below (see Forward simulations).

Cell-type-specific enrichments of low-frequency variants

We sought to investigate the contribution to low-frequency variant architectures of cell-type-specific (CTS) annotations[1-6] (i.e. reflecting regulatory activity in a given cell type) with excess contributions to common variant architectures. For each of the 40 UK Biobank traits, we selected the subset of 396 CTS Roadmap annotations[6] with statistically significant common variant enrichment after conditioning on (non-CTS annotations in) the baseline-LD model[5,8] (see Methods). We selected a total of 637 trait-annotation pairs, with at least one CTS annotation for 36 of 40 traits (25 of 27 independent traits) (Supplementary Table 9); the 637 CTS annotations contained 2.7% of common variants and 3.0% of low-frequency variants on average (Supplementary Table 10). We analyzed each of these trait-annotation pairs using the baseline-LF model (Figure 4a and Supplementary Table 10). For the 25 trait-annotation pairs with the most statistically significant CVE for each of the 25 independent traits (critical CTS annotations), LFVE and CVE were similar, with LFVE 1.12±0.13× larger than CVE on average (other definitions of critical CTS annotations produced similar conclusions; see Supplementary Figure 9).

Figure 4:

Low-frequency and common variant architectures of cell-type-specific (CTS) annotations.

For 637 trait-annotation pairs with conditionally statistically significant common variant enrichment, we report (a) LFVE vs. CVE (log scale) and (b) proportion of $h^2_{lf}$ vs. proportion of $h^2_c$ explained. The dashed black line in (a) represents the regression slope for 25 critical CTS annotations for independent traits (see main text). Brain-specific annotations are denoted in blue. Two trait-H3K4me3 annotation pairs with LFVE significantly larger than CVE are denoted in dark blue (see main text); error bars represent 95% confidence intervals. The two arrows in (b) denote All autoimmune diseases (H3K4me1 in Regulatory T-cells; left arrow) and Monocyte count (H3K4me1 in Primary monocytes; right arrow) (see main text). Results for coding and non-synonymous annotations (meta-analysis across 27 independent traits) are denoted in red; error bars represent 95% confidence intervals. Numerical results are reported in Supplementary Table 10.

We observed Bonferroni-significant differences (after correcting each trait for 1–53 annotations tested) for two traits. The most significant trait-annotation pairs were neuroticism and H3K4me3 in brain dorsolateral prefrontal cortex, vs. CVE=8.3±1.5×; P=0.001; 63.2±15.4% of $h^2_{lf}$, vs. 11.1±2.0% of $h^2_c$). We note that these results are not driven by the fact that H3K4me3 marks are often located in 5’ UTR and exons[38] (Supplementary Table 10). Interestingly, these two annotations (and 55 of all 62 CTS annotations with LFVE/CVE>2) are brain-specific, implicating stronger selection against variants impacting gene regulation in brain tissues (see Forward simulations and Discussion). While CTS annotations generally have only moderately large LFVE (e.g. smaller than non-synonymous variants; Figure 4a), they often explain a large proportion of $h^2_{lf}$ (e.g. larger than non-synonymous variants; Figure 4b) due to large annotation size, as with common variant enrichment. In particular, H3K4me1 in regulatory T-cells (3.7% of low-frequency variants) explains 86.2±20.8% of $h^2_{lf}$ for All autoimmune diseases (vs. 3.4% of common variants explaining 48.9±9.1% of $h^2_c$), and H3K4me1 in primary monocytes (4.8% of low-frequency variants) explains 79.3±18.1% of $h^2_{lf}$ for monocyte count (vs. 4.6% of common variants explaining 70.8±8.6% of $h^2_c$; Figure 4b and Supplementary Table 10). Thus, CTS annotations often dominate low-frequency architectures, analogous to common variant architectures[5,8].

Larger non-synonymous enrichments in genes under selection

Recent studies have identified gene sets that are depleted for non-synonymous variants[31,39]. To further investigate the connection between functional enrichment and negative selection, we stratified the CVE and LFVE of non-synonymous variants (Figure 3a) based on the strength of selection on the underlying genes. We considered 5 bins of estimated values of selection coefficients for heterozygous protein-truncating variants[31] (s), with 3,073 protein-coding genes per bin, and added annotations based on non-synonymous variants within each bin to the baseline-LF model (see Methods). We determined that both the LFVE and CVE of non-synonymous variants correlated strongly with the predicted strength of selection on the underlying genes (Figure 5 and Supplementary Table 11). In particular, we observed extremely strong enrichments for non-synonymous variants in genes under the strongest selection (bin 1: LFVE=102.0±7.9× and CVE=41.5±4.8×). However, the LFVE/CVE ratio was smaller for non-synonymous variants in genes under the strongest selection (bin 1: 2.5×) than in genes under the weakest selection (bins 4+5: 5.8×); we discuss this surprising result below (see Forward simulations). We obtained similar results when stratifying non-synonymous variants in genes under varying levels of selective constraint based on other related criteria (Supplementary Figure 10).

Figure 5:

Low-frequency and common variant enrichments for non-synonymous variants vary with the strength of selection on the underlying genes.

We report LFVE vs. CVE (log scale) for non-synonymous variants in 5 bins of s (see main text), meta-analyzed across 27 independent UK Biobank traits; bins 4+5 are merged for visualization purposes. Numbers in the legend represent the proportion of common / low-frequency variants inside the annotation, respectively. The solid line represents LFVE=CVE; dashed lines represent LFVE=constant multiples of CVE. Error bars represent 95% confidence intervals. Numerical results for each bin are reported in Supplementary Table 11.

Forward simulations confirm role of negative selection

We hypothesized that the LFVE and CVE of different functional annotations would be informative for the action of negative selection, which constrains strongly selected variants to lower frequency[9-17]. To investigate this, we performed forward simulations[40] using a genetic architecture involving annotations mimicking non-synonymous variants (1% of the simulated genome), functional noncoding variants (1%), and ordinary noncoding variants (98%), with different respective distributions of selection coefficients s (Supplementary Figure 11). For each of these three annotations we specified the probability for a de novo variant to be deleterious ($\pi_{del}$), the mean selection coefficient for de novo deleterious variants ($\bar{s}_{del}$) and the probability for a deleterious variant to be causal for the trait ($\pi_{causal|del}$); the probability for a de novo variant to be causal for the trait is $\pi_{causal}=\pi_{del}\cdot\pi_{causal|del}$. Per-allele trait effect sizes were specified to be proportional to $|s|^{\tau}$, where $\tau$ parameterizes the coupling between selection coefficient and trait effect size in the Eyre-Walker model[12], implying that only deleterious variants have nonzero effects (see Methods). We investigated how the LFVE and CVE of the functional noncoding annotation varied as a function of the values of $\bar{s}_{del}$ and $\pi_{causal}$ for that annotation. To achieve a realistic simulation framework, we fixed the remaining values of $\pi_{del}$, $\bar{s}_{del}$ and $\pi_{causal|del}$ for the three annotations, as well as the value of $\tau$, to values that we fit using our UK Biobank estimate of 4.0× larger per-variant heritability for common vs. low-frequency variants, as well as the LFVE and CVE of non-synonymous variants (38.2× and 7.7×, respectively). Specifically, we fixed $\pi_{del}$=60% for the functional noncoding annotation (similar results for $\pi_{del}$=40%; see Methods); $\pi_{del}$=80% (ref.[13]), $\bar{s}_{del}$=−0.003 (ref.[13]) and $\pi_{causal|del}$=8% for the non-synonymous annotation; $\pi_{del}$=40%, $\bar{s}_{del}$=−0.0001 and $\pi_{causal|del}$=4% for the ordinary noncoding annotation; and $\tau$=0.75. We note that our fitted value of $\tau$ is larger than previous estimates[11,13,15,16] (see Discussion).
We determined that the CVE of the functional noncoding annotation in our simulations depends on both $\bar{s}_{del}$ and $\pi_{causal}$ (Figure 6a), while the LFVE/CVE ratio depends primarily on $\bar{s}_{del}$ (Figure 6b). When de novo deleterious variants are under strong selection ($|\bar{s}_{del}|$≥0.0003, corresponding to LFVE/CVE ratio ≥1.2×; Figure 6b), the CVE depends primarily on $\pi_{causal}$ (Figure 6a), as the mean selection coefficient of deleterious common variants varies only weakly with $\bar{s}_{del}$ (since most deleterious common variants have $|s| \ll |\bar{s}_{del}|$; Figure 6c). Finally, we observed that functional noncoding annotations with similar CVE and LFVE tend to have causal variants with slightly stronger selection coefficients (i.e. $\bar{s}_{del}$≈−0.0002) than ordinary noncoding causal variants ($\bar{s}_{del}$=−0.0001), for which LFVE is lower than CVE (Figure 6b). We note that the LFVE/CVE ratio can be used to infer the mean selection coefficient of deleterious causal variants as a function of MAF (see Figure 6c), because this ratio depends primarily on $\bar{s}_{del}$ and because the selection coefficients of de novo deleterious causal variants are drawn from a distribution with mean $\bar{s}_{del}$.
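The two relationships that define the simulated architecture can be written down directly. The sketch below is a minimal illustration, not the forward simulator used in this study: it covers only the deterministic parts (the product of probabilities and the |s|^τ scaling), omits any noise or sign terms of the Eyre-Walker model, and reads the 80% and 8% quoted for the non-synonymous annotation as the probability of being deleterious and the probability of being causal given deleterious, which is an assumption. Function and variable names are hypothetical.

```python
def causal_probability(pi_del, pi_causal_given_del):
    """P(de novo variant is causal) = P(deleterious) * P(causal | deleterious)."""
    return pi_del * pi_causal_given_del


def relative_effect_size(s, tau=0.75):
    """Per-allele effect magnitude up to a constant factor: |s|^tau.

    A neutral variant (s = 0) has zero effect under this coupling.
    """
    return abs(s) ** tau


# Non-synonymous annotation (assumed reading of the quoted 80% and 8%).
pi_causal_nonsyn = causal_probability(0.80, 0.08)

# Fold difference in effect magnitude between variants with selection
# coefficients equal to the two mean values quoted in the text
# (-0.003 for non-synonymous, -0.0001 for ordinary noncoding).
fold_difference = relative_effect_size(-0.003) / relative_effect_size(-0.0001)
```

Because effect sizes scale as a power of |s|, a 30-fold difference in selection coefficient translates into a 30^τ-fold difference in per-allele effect magnitude, so larger values of τ tie trait effects more tightly to selection.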

Figure 6:

Forward simulations enable inferences about negative selection and rare variant architectures.

Results are based on forward simulations involving an annotation mimicking functional noncoding variants, as well as other annotations (see text). (a,b) We report the CVE (a) and LFVE/CVE ratio (b) of the functional noncoding annotation as a function of the mean selection coefficient for de novo deleterious variants ($\bar{s}_{del}$) and the probability of a de novo variant to be causal ($\pi_{causal}$) for this annotation. $\bar{s}_{del}$ and $\pi_{causal}$ values for non-synonymous and ordinary noncoding annotations are described in the main text. (c) We report the mean absolute selection coefficient of deleterious variants in the functional noncoding annotation as a function of $\bar{s}_{del}$ and MAF (rare, low-frequency, common). (d) We report the mean squared per-allele effect size of causal variants in the functional noncoding annotation (normalized by the mean squared per-allele effect size of rare causal non-synonymous variants) as a function of $\bar{s}_{del}$ and MAF (rare, low-frequency and common). Red lines denote the value $\bar{s}_{del}$=−0.003 used to simulate non-synonymous variants; grey lines denote the value $\bar{s}_{del}$=−0.0001 used to simulate ordinary noncoding variants (see main text). The value $\pi_{causal}$=48% used in (d) (see Methods) is denoted via squares in (a) and (b). Numerical results are reported in Supplementary Table 12.

Our forward simulations provide an interpretation of the LFVE/CVE ratios of different functional annotations that we estimated for UK Biobank traits and annotations. First, they confirm that non-synonymous variants (which are strongly deleterious[41]: large $\pi_{del}$ and $|\bar{s}_{del}|$) can have a limited contribution to common variant architectures (2.1% of $h^2_c$) but a large contribution to low-frequency variant architectures (17.3% of $h^2_{lf}$) (Figure 3a). Second, they indicate that the proportion of causal variants ($\pi_{causal}$) is larger for critical cell-type-specific (CTS) annotations than for non-synonymous variants (based on their CVE; Figure 4a), but that the causal variants in critical CTS annotations have only slightly larger selection coefficients than ordinary noncoding variants, except for some brain annotations that are under much stronger selection (much larger $|\bar{s}_{del}|$, based on their LFVE/CVE ratios; Figure 4a). Third, they explain the extremely large CVE for non-synonymous variants inside genes predicted to be under strong negative selection[31] (large s; Figure 5), which are expected to correspond to genes with an extremely large proportion of deleterious non-synonymous variants (large $\pi_{del}$, implying large $\pi_{causal}=\pi_{del}\cdot\pi_{causal|del}$). However, despite extremely large CVE and LFVE, this class of variants had a smaller LFVE/CVE ratio than that of non-synonymous variants inside genes predicted to be under weak selection (Figure 5), a surprising result that appears to suggest a smaller $|\bar{s}_{del}|$ (Figure 6b) despite the extremely large value of $\pi_{del}$. We performed additional forward simulations to show that a larger $|\bar{s}_{del}|$ does not produce larger LFVE/CVE ratios for annotations with extremely large values of $\pi_{del}$, for which the ratio between the proportion of low-frequency variants that are deleterious and the proportion of common variants that are deleterious is reduced to 1 (Supplementary Figure 12).
Although our focus is primarily on low-frequency variants (0.5%≤MAF<5%), we also used our forward simulation framework to draw inferences about rare variant (MAF<0.5%) architectures of noncoding functional annotations, based on LFVE and CVE estimates from UK Biobank (Figure 4a). Specifically, we compared the mean squared per-allele effect size of rare causal variants in annotations mimicking functional noncoding variants and non-synonymous variants, respectively. We inferred disproportionate causal effects of rare variants in annotations under very strong selection ($\bar{s}_{del}$=−0.003, similar to non-synonymous variants[13]), with mean squared causal effect sizes 11×, 26× and 60× larger than annotations with $\bar{s}_{del}$=−0.0006, $\bar{s}_{del}$=−0.0003 and $\bar{s}_{del}$=−0.0002, respectively (Figure 6d and Supplementary Table 12; similar results for different choices of $\pi_{causal}$, Supplementary Figure 13). These results indicate that an annotation with large CVE needs to have even larger LFVE (e.g. LFVE/CVE ratio ≥2×, corresponding to $\bar{s}_{del}$≤−0.0006; Figure 6b) in order to harbor rare causal variants with substantial mean squared effect sizes (e.g. only an order of magnitude smaller than rare causal non-synonymous variants; Figure 6d). Unfortunately, most of the non-brain CTS annotations that we analyzed do not achieve this ratio (Figure 4a), motivating further work on more precise noncoding annotations (see Discussion).

Discussion

In this study, we partitioned the heritability of both low-frequency and common variants in 40 UK Biobank traits across numerous functional annotations, employing an extension of stratified LD score regression[5,23] to low-frequency and common variants that produces robust (unbiased or slightly conservative) results. Meta-analyzing functional enrichments across 27 independent traits, we highlighted the critical impact of low-frequency non-synonymous variants (17.3% of $h^2_{lf}$, LFVE=38.2×) compared to common non-synonymous variants (2.1% of $h^2_c$, CVE=7.7×). Other annotations previously linked to negative selection, including non-synonymous variants with high PolyPhen-2 scores[29], non-synonymous variants in genes under strong selection[31], and LD-related annotations[23], were also significantly more enriched for $h^2_{lf}$ as compared to $h^2_c$. Finally, at the trait level, we observed that CTS annotations[6,8] also dominate the low-frequency architecture, and that annotations with significant CVE tend to have similar LFVE, or larger LFVE for brain-related annotations and traits. This last observation implicates the action of negative selection on low-frequency variants affecting gene regulation in the brain, and is consistent with the interaction between brain enhancers and genes under stronger purifying selection[18], and with the excess of rare de novo mutations in regulatory elements active in fetal brain in patients with neurodevelopmental disorders[43]. We showed via forward simulations that the CVE of an annotation depends primarily on its proportion of causal variants ($\pi_{causal}$), while its LFVE/CVE ratio depends primarily on the mean selection coefficient for de novo deleterious variants ($\bar{s}_{del}$), and thus on the mean selection coefficient of causal variants (Figure 6). These conclusions are consistent with previous studies of the role of selection[9-17], including pleiotropic selection[17], in maintaining variants with large effects on complex traits at low frequencies.
Overall, our work quantifies the relationship between the strength of selection in specific functional annotations (both coding and noncoding) and low-frequency and common variant enrichment for human diseases and complex traits, providing an interpretation of the enrichments estimated for UK Biobank traits and annotations. Our results on low-frequency variant functional architectures have several implications for downstream analyses. First, our results provide guidance for the design of association studies targeting low-frequency variants. Non-synonymous variants should be strongly prioritized at the low-frequency variant level[21], as they explain a large proportion of $h^2_{lf}$ and directly implicate causal genes (and specifically implicate core disease genes rather than peripheral genes[7]), avoiding the challenge of mapping noncoding variants to genes[42,44]. However, we observed that all coding and UTR variants jointly explained only 26.8±1.9% of $h^2_{lf}$ (Supplementary Table 6), providing an upper bound on the proportion of low-frequency signal captured by whole-exome sequencing (WES) studies. This underscores the advantages of large GWAS (with imputed genotypes obtained using large reference panels), compared to WES or exome chip data, for querying low-frequency variation[16]. Furthermore, using functionally informed association tests that assign higher weight to low-frequency non-synonymous variants or CTS annotations should significantly improve power in these analyses[4,20,45]. Second, our results provide guidance for the design of association studies targeting rare (MAF<0.5%) variants, which require large sequencing datasets[14]. While WES datasets have been successfully used to detect new coding variants, genes and gene sets associated with human diseases and complex traits, there is an increasing focus on WGS that can capture rare noncoding variants.
However, our LFVE and CVE results for critical CTS annotations (Figure 4), coupled with our predictions of causal rare variant effect size variance (Figure 6d), suggest that in most instances these annotations do not harbor causal variants with large mean squared effect sizes (with brain-related annotations and traits as a notable exception; also see ref.[43]), highlighting the need for more precise noncoding annotations for prioritization in WGS. As a first step towards this goal, we estimated the LFVE and CVE of annotations constructed using a wide range of recently developed noncoding variant prioritization scores[46-50]. We identified only one annotation, defined using the top 0.5% of Eigen scores[48], with an LFVE/CVE ratio significantly larger than 1 (1.7× ratio; LFVE=22.0±2.2× vs. CVE=13.0±1.4×; $P=7\times10^{-4}$ for difference; Supplementary Figure 14). However, even for this annotation, the LFVE/CVE ratio <2 again implies that this annotation does not harbor causal variants with substantial mean squared effect sizes (only an order of magnitude smaller than rare causal non-synonymous variants; Figure 6d). Third, our results were consistent with strong coupling between selection coefficient and trait effect size (Eyre-Walker coupling parameter[12] $\tau_{EW}$=0.75; robust to error bars in LFVE and CVE estimates, see Supplementary Figure 15), implicating a larger impact of negative selection on complex traits than previously reported[11,13,15,16] and much larger effect sizes for rare variants in functional annotations with strong selection coefficients. This can be explained by the fact that our inference procedure explicitly allows different distributions of selection coefficients for non-synonymous and noncoding variants ($\bar{s}_{dn}$=−0.003 and $\bar{s}_{dn}$=−0.0001, respectively; Supplementary Figure 16).
Finally, the different LFVE/CVE ratios that we inferred for different functional annotations suggest that it may be appropriate to allow annotation-specific α values when using the α model (variance of per-normalized genotype effect sizes proportional to $[p(1-p)]^{1+\alpha}$, where $p$ is the allele frequency; refs.[10,11,34,35]). In the extreme case of non-synonymous variants, we explored different choices of α values for non-synonymous and other variants, and determined that a value of α=−1.10 for non-synonymous variants and α=−0.30 for other variants provided the best fit to our UK Biobank heritability and enrichment results (Supplementary Table 13). Although our work has provided insights on low-frequency variant architectures of human diseases and complex traits, it has several limitations (see Supplementary Note). Despite these limitations, our low-frequency and common variant enrichment results convincingly demonstrate and quantify the action of negative selection across coding and noncoding functional annotations.

Methods

Extension of S-LDSC to low-frequency variants.

S-LDSC[5,23] is a method for partitioning heritability explained by common variants across overlapping annotations (both binary and continuous[23]) using GWAS summary statistics. More precisely, S-LDSC models the vector of per-normalized genotype effect sizes $\beta$ as a mean-0 vector whose variance depends on $D$ continuous-valued annotations $a_1, \dots, a_D$:
$$\mathrm{Var}(\beta_j) = \sum_{d=1}^{D} a_d(j)\,\tau_d \quad (1)$$
where $a_d(j)$ is the value of annotation $a_d$ at variant $j$, and $\tau_d$ represents the per-variant contribution of one unit of the annotation $a_d$ to heritability. We can thus perform a regression to infer the values of $\tau_d$ using the following relationship with the expected $\chi^2$ statistic of variant $j$:
$$E[\chi^2_j] = N \sum_{d=1}^{D} \tau_d\, l(j,d) + N b + 1 \quad (2)$$
where $l(j,d) = \sum_k a_d(k)\, r^2_{jk}$ is the LD score of variant $j$ with respect to continuous values $a_d(k)$ of annotation $a_d$, $r_{jk}$ is the correlation between variants $j$ and $k$ in an LD reference panel, $N$ is the sample size of the GWAS study, and $b$ is a term that measures the contribution of confounding biases[51]. Then, the heritability causally explained by a subset of variants $S$ can be estimated as $h^2(S) = \sum_{j \in S} \mathrm{Var}(\beta_j)$. We note that this definition, used here to define and estimate $h^2_{lf}$ and $h^2_{c}$, is different from the definition of “SNP-heritability” (ref.[52]), which refers to the heritability tagged by a set of genotyped and/or imputed variants. To allow different effects for low-frequency and common variants inside a functional annotation $a$, we modeled the variance of the per-normalized genotype effect sizes using different $\tau$ for these two categories of variants. In a case where we consider $D$ functional annotations, we write:
$$\mathrm{Var}(\beta_j) = \mathbb{1}_{j \in lf} \sum_{d=1}^{D} a_d(j)\,\tau_{d,lf} + \mathbb{1}_{j \in c} \sum_{d=1}^{D} a_d(j)\,\tau_{d,c} \quad (3)$$
where $\mathbb{1}_{j \in lf}$ (resp. $\mathbb{1}_{j \in c}$) is an indicator function with value 1 if variant $j$ is a low-frequency (resp. common) variant, and 0 otherwise, and $\tau_{d,lf}$ (resp. $\tau_{d,c}$) represents the per-variant contribution of one unit of the annotation $a_d$ to the heritability explained by low-frequency (resp. common) variants. These parameters can be estimated using S-LDSC by writing equation (3) in the form:
$$\mathrm{Var}(\beta_j) = \sum_{d=1}^{D} a_{d,lf}(j)\,\tau_{d,lf} + \sum_{d=1}^{D} a_{d,c}(j)\,\tau_{d,c} \quad (4)$$
where $a_{d,lf}$ (resp. $a_{d,c}$) is an annotation equal to $a_d(j)$ if variant $j$ is a low-frequency (resp. common) variant and 0 otherwise.
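The split of one annotation into a low-frequency copy and a common copy, and the resulting expected χ² statistics, can be sketched in plain Python as follows. This is a minimal illustration rather than the S-LDSC implementation: the function names, the toy r² matrix and the τ values are hypothetical, and only the MAF thresholds (0.5% and 5%) are taken from the text.

```python
def split_annotation(values, maf, lf_min=0.005, common_min=0.05):
    """Split one annotation into low-frequency and common copies.

    A variant keeps its annotation value in the copy matching its MAF class
    and gets 0 in the other; rare variants (MAF < lf_min) get 0 in both.
    """
    a_lf = [v if lf_min <= f < common_min else 0.0 for v, f in zip(values, maf)]
    a_c = [v if f >= common_min else 0.0 for v, f in zip(values, maf)]
    return a_lf, a_c


def ld_scores(annot, r2):
    """LD score of each variant j: sum over k of annot[k] * r2[j][k]."""
    return [sum(a_k * r2_jk for a_k, r2_jk in zip(annot, row)) for row in r2]


def expected_chi2(ld_by_annot, tau, n, b=0.0):
    """Expected chi-square per variant: N * sum_d tau_d * l(j, d) + N * b + 1."""
    n_variants = len(ld_by_annot[0])
    return [
        n * sum(t * l[j] for t, l in zip(tau, ld_by_annot)) + n * b + 1.0
        for j in range(n_variants)
    ]


# Toy data (hypothetical): four variants, the first two low-frequency.
maf = [0.01, 0.03, 0.20, 0.40]
annotation = [1.0, 0.0, 1.0, 1.0]
r2 = [
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.2],
    [0.0, 0.0, 0.2, 1.0],
]

a_lf, a_c = split_annotation(annotation, maf)
l_lf, l_c = ld_scores(a_lf, r2), ld_scores(a_c, r2)
chi2 = expected_chi2([l_lf, l_c], tau=[1e-4, 2e-5], n=10_000)
```

In a real analysis the τ values are unknown and are obtained by regressing observed χ² statistics on the LD scores; the sketch only shows how the two copies of an annotation enter that regression as separate predictors.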
In all analyses we also added one annotation containing all the variants, 5 MAF bins for low-frequency variants, and 10 MAF bins for common variants in order to take into account MAF-dependent effects[23,53,54]. For each functional binary annotation of interest $a$, we compared its low-frequency variant enrichment (LFVE) and common variant enrichment (CVE), defined as the proportion of $h^2_{lf}$ (resp. $h^2_{c}$) explained by the annotation, divided by the proportion of low-frequency (resp. common) variants that are in the annotation (see Supplementary Note for a justification of the denominator). Standard errors were computed using a block jackknife procedure[5]. We note that these computations did not include the heritability causally explained by rare variants (MAF<0.5%). Application of S-LDSC was performed using 3,567 unrelated individuals of the UK10K data set[18] (ALSPAC and TWINSUK cohorts) as an LD reference panel. This choice was made in order to ensure a close ancestry match between the target sample used to compute summary statistics (UK Biobank) and the LD reference panel (UK10K), as LD patterns of low-frequency variants are expected to vary across European populations[55,56] (see Supplementary Note for more information on our application of S-LDSC). The main differences between our application of S-LDSC and standard S-LDSC analyses on common variants are summarized in Supplementary Table 14.
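The enrichment definition and a delete-one-block jackknife standard error can be sketched as below. This is an illustration under simplifying assumptions: the function names and toy numbers are hypothetical, per-variant heritabilities are taken as given, and the jackknife formula assumes blocks of equal size.

```python
import math


def enrichment(per_variant_h2, in_annotation, in_class):
    """Enrichment of an annotation within one MAF class (low-frequency or common).

    (share of the class heritability explained by the annotation) divided by
    (share of the class variants that fall in the annotation).
    """
    h2_class = sum(h for h, c in zip(per_variant_h2, in_class) if c)
    h2_annot = sum(
        h for h, a, c in zip(per_variant_h2, in_annotation, in_class) if a and c
    )
    n_class = sum(1 for c in in_class if c)
    n_annot = sum(1 for a, c in zip(in_annotation, in_class) if a and c)
    return (h2_annot / h2_class) / (n_annot / n_class)


def jackknife_se(leave_one_block_out):
    """Delete-one-block jackknife standard error from leave-one-out estimates."""
    n = len(leave_one_block_out)
    mean = sum(leave_one_block_out) / n
    return math.sqrt((n - 1) / n * sum((x - mean) ** 2 for x in leave_one_block_out))


# Toy example (hypothetical): 10 variants of one MAF class, one of them in the
# annotation and carrying 17% of the class heritability.
h2 = [0.17] + [0.83 / 9] * 9
in_annotation = [True] + [False] * 9
in_class = [True] * 10
toy_enrichment = enrichment(h2, in_annotation, in_class)
toy_se = jackknife_se([1.0, 2.0, 3.0])
```

Calling `enrichment` once with the low-frequency indicator and once with the common indicator gives the LFVE and CVE of the same annotation.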

Baseline-LF model and functional annotations.

We considered 34 main functional annotations from the baseline-LD model v1.1 (27 binary and 7 continuous annotations, including LD-related annotations; refs.[5,23,57,58]), including coding, UTR, promoter and intronic regions, the histone marks monomethylation (H3K4me1) and trimethylation (H3K4me3) of histone H3 at lysine 4, acetylation of histone H3 at lysine 9 (H3K9ac) and two versions of acetylation of histone H3 at lysine 27 (H3K27ac), open chromatin as reflected by DNase I hypersensitivity sites (DHSs), combined chromHMM and Segway predictions (which make use of many Encyclopedia of DNA Elements (ENCODE) annotations to produce a single partition of the genome into seven underlying chromatin states), three different conserved annotations, two versions of super-enhancers, FANTOM5 enhancers, typical enhancers, and 6 LD-related continuous annotations (see Supplementary Table 1). In order to further dissect the set of coding variants, a major focus of this study, we annotated each coding variant using ANNOVAR[59], and added one synonymous and one non-synonymous annotation to our model. We also added three new annotations based on phastCons[27] conserved elements (46-way) in vertebrates, mammals and primates, and one annotation based on flanking bivalent TSS/enhancers from Roadmap data[6] (see URLs). These 6 new annotations led to a total of 33 main binary annotations (see Supplementary Table 1). We included 500 bp windows around each binary annotation and 100 bp windows around four of the main annotations, leading to a total of 74 main functional annotations. Then, all annotations were duplicated for low-frequency and common variants as described in equation (4), except for the predicted allele age annotation[60] (which had too many missing values for low-frequency variants). Finally, we included one annotation containing all variants, 10 common variant MAF bins (as in the baseline-LD model[23]) and 5 low-frequency variant MAF bins.
We thus obtained a set of 163 total annotations. We refer to this set of annotations as the “baseline-LF model” (see Supplementary Table 2), which we used for all of our S-LDSC analyses. More details on the baseline-LF model are provided in the Supplementary Note. We note that the inclusion of MAF and LD-related annotations in this model implies that the expected causal heritability of a SNP is a function of MAF and LD. More details on LD-related heritability models are provided in the Supplementary Note.
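The MAF-bin part of the model (5 low-frequency bins and 10 common bins) can be sketched as a set of binary annotations. The text does not state how bin boundaries were chosen, so the equal-count (quantile) binning below is an assumption, as are the function name and the toy allele frequencies.

```python
def maf_bin_annotations(maf, n_lf_bins=5, n_common_bins=10,
                        lf_min=0.005, common_min=0.05):
    """Build binary MAF-bin annotations with equal variant counts per bin.

    Returns a dict mapping a bin name to a 0/1 list over variants. Rare
    variants (MAF < lf_min) belong to no bin.
    """
    classes = {
        "lf": ([j for j, f in enumerate(maf) if lf_min <= f < common_min], n_lf_bins),
        "common": ([j for j, f in enumerate(maf) if f >= common_min], n_common_bins),
    }
    bins = {}
    for name, (members, n_bins) in classes.items():
        ranked = sorted(members, key=lambda j: maf[j])
        for b in range(n_bins):
            bins[f"{name}_bin{b + 1}"] = [0] * len(maf)
        for rank, j in enumerate(ranked):
            b = rank * n_bins // len(ranked)  # equal-count bin index
            bins[f"{name}_bin{b + 1}"][j] = 1
    return bins


# Toy allele frequencies (hypothetical): 2 rare, 10 low-frequency, 20 common.
maf = (
    [0.001, 0.002]
    + [0.005 + 0.004 * i for i in range(10)]
    + [0.05 + 0.02 * i for i in range(20)]
)
bins = maf_bin_annotations(maf)
```

Each non-rare variant falls in exactly one bin, so the 15 bin annotations together partition the low-frequency and common variants, which is how they absorb MAF-dependent effects in the regression.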

Simulations using UK Biobank target samples to assess extension of S-LDSC to low-frequency variants.

To assess possible biases in heritability and enrichment estimates under a more realistic scenario, we simulated quantitative phenotypes from chromosome 1 of the UK Biobank interim release dataset with imputed variants from 1000 Genomes[61] and UK10K[18] (113,851 unrelated individuals, 1,023,655 variants with allele counts greater than or equal to 5 in UK10K). First, we randomly sampled integer-valued genotypes from UK Biobank imputation dosage data. Second, we set trait heritability to $h^2=0.5$, selected M=100,000 causal variants, and performed simulations under a coding-enriched architecture by simulating the variance of per-normalized genotype effect sizes proportional to $c\,\mathbb{1}_{j \in coding}\,[p_j(1-p_j)]^{1+\alpha_{coding}} + \mathbb{1}_{j \notin coding}\,[p_j(1-p_j)]^{1+\alpha}$, where $\mathbb{1}_{j \in coding}$ (resp. $\mathbb{1}_{j \notin coding}$) is an indicator function taking the value 1 if variant $j$ belongs (resp. does not belong) to the coding annotation, $p_j$ is the frequency of the causal variant in the simulated UK Biobank genotypes dataset, $\alpha$ was set to −0.25, and $c$ and $\alpha_{coding}$ were chosen to produce four different genetic architectures (see Supplementary Table 4). We note that this generative model is different from and more complex than the additive inference model implemented in S-LDSC, but may be more realistic as the effect size of coding variants now depends directly on their allele frequency (and not on their low-frequency/common status). We also performed simulations under an enhancer-enriched architecture by considering the baseline ChromHMM/Segway weak-enhancer[62] annotation, which has properties similar to those of the coding annotation (2.28% of reference low-frequency variants versus 1.83% for coding, and elements with a mean length of 249bp versus 315bp for coding). To investigate the impact of the LD-dependent architecture created by the enrichment of these two annotations (coding and weak-enhancer variants tend to have low levels of LD[23]), we randomly created 100 shifted coding (resp. weak-enhancer) annotations, and selected the annotation with an average level of LD (i.e. the shifted annotation with the 50th smallest level of LD computed on low-frequency variants; see ref.[23] for a definition of level of LD). Third, we used version 2.3 of the BOLT-LMM software[26,63] (see URLs) to compute association statistics on UK Biobank dosage data to mimic the fact that we computed summary statistics on imputed data. Finally, we used S-LDSC with our baseline-LF model (except that the 6 new functional annotations were not included in the simulation analyses) to estimate $h^2_{lf}$, $h^2_{c}$, and coding/enhancer CVE and LFVE. S-LDSC was run by restricting regression variants to accurately imputed variants (i.e. INFO score[33] ≥ 0.99), as we suggested previously[5], or to all variants (irrespective of INFO score). We also report results when using an INFO score threshold of 0.5 or 0.9, which did not improve the results (see Supplementary Table 4). We also considered including INFO score explicitly in the regression to down-weight poorly imputed variants (i.e. replacing equation (2) by [Formula: see text], where $I_j$ is the INFO score of variant $j$ and [Formula: see text]; this approximation assumes that genotype uncertainty decreases the association test statistics), but this did not improve the results, consistent with the fact that summary statistics computed from dosage data already down-weight poorly imputed variants (Supplementary Table 4). We performed 1,000 simulations for each simulation scenario. In each case, we removed 0–3 outlier simulations in which the estimate of $h^2_{lf}$ was below 0.0001; we did not observe any such outlier results in analyses of real traits (minimum $h^2_{lf}$=0.006; Supplementary Table 5).
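The effect-size step of these simulations can be sketched as follows. This is a rough illustration, not the authors' code: it assumes the variance of a causal variant is proportional to c·[p(1−p)]^(1+α_coding) inside the coding annotation and to [p(1−p)]^(1+α) outside it, then rescales so that the variances sum to the target heritability. The function names and the values of `c` and `alpha_coding` are hypothetical; α=−0.25 and h²=0.5 are taken from the text.

```python
import random


def effect_size_variances(p, is_coding, c, alpha_coding, alpha=-0.25, h2=0.5):
    """Per-variant effect-size variances, rescaled to sum to h2."""
    raw = []
    for p_j, coding in zip(p, is_coding):
        het = p_j * (1.0 - p_j)
        if coding:
            raw.append(c * het ** (1.0 + alpha_coding))
        else:
            raw.append(het ** (1.0 + alpha))
    scale = h2 / sum(raw)
    return [scale * w for w in raw]


def draw_effects(variances, seed=0):
    """Draw per-normalized genotype effect sizes from N(0, variance_j)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, v ** 0.5) for v in variances]


# Toy causal variants (hypothetical): one coding and two non-coding.
p = [0.01, 0.01, 0.30]
is_coding = [True, False, False]
variances = effect_size_variances(p, is_coding, c=5.0, alpha_coding=-1.0)
betas = draw_effects(variances)
```

With a more negative `alpha_coding`, the coding variant at frequency 1% receives far more variance than the non-coding variant at the same frequency, which is the kind of frequency-dependent coding enrichment these simulations are designed to test.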

S-LDSC analyses of UK Biobank data.

We applied S-LDSC with the baseline-LF model to 40 UK Biobank traits, estimated $h^2_{lf}$, $h^2_{c}$, and the $h^2_{c}/h^2_{lf}$ ratio using the 15 MAF bin annotations, and computed their standard errors using a jackknife procedure. We meta-analyzed the $h^2_{c}/h^2_{lf}$ ratio, and multiplied it by the ratio of the number of low-frequency and common variants in the LD reference sample (i.e. 3,398,397/5,353,593) to convert it into a per-variant heritability ratio. To match these ratios to a model in which the variance of per-normalized genotype effect sizes is proportional to $[p(1-p)]^{1+\alpha}$, we used low-frequency and common variants of our LD reference panel and computed the corresponding per-variant heritability ratio using different values of α. The CVE and LFVE of each functional annotation were compared using a two-sided z-test; these values are independent as they are computed using non-overlapping sets of variants. The regression slope of LFVE on CVE was computed with no intercept. As most of the 33 annotations are correlated, we did not attempt to assess the statistical significance of the regression slope, or of the corresponding correlation between CVE and LFVE. We note that after removing the 9 annotations with significantly different LFVE and CVE in Figure 3, LFVE remained highly correlated to CVE (r=0.83) and only slightly larger than CVE on average (regression slope=1.10).
For CTS analyses, we analyzed the 396 Roadmap[6] annotations constructed in Finucane et al.[8] from narrow peaks in six chromatin marks (DNase hypersensitivity, H3K27ac, H3K4me3, H3K4me1, H3K9ac, and H3K36me3) in a subset of a set of 88 primary cell types/tissues. We selected CTS annotations for which common variants are disease relevant following the guidelines of Finucane et al.[8]. First, we analyzed each CTS annotation in turn using default S-LDSC (i.e. not our extension to low-frequency variants) by conditioning on all the non-CTS annotations of the baseline-LD model v1.1, the union of annotations for each of the six chromatin marks, and the average of annotations for each mark (as performed in ref.[8]). We note that our choice to switch from the baseline model[5], as performed in ref.[8], to the baseline-LD model (which includes MAF bins and LD-related annotations in addition to new functional annotations) was motivated by our observation that the baseline model can slightly overestimate functional enrichment due to unmodeled annotations[23]. We also decided to consider only non-CTS annotations and to remove the four enhancer annotations derived from Vahedi et al.[64] (absent from the baseline model and added in the baseline-LD model) as they are T-cell specific and may impact the detection of relevant cell types for traits for which T-cells are a relevant cell type (such as asthma and eczema; see Supplementary Figure 17). We retained all the CTS annotations with a coefficient significantly larger than 0 (using P<0.05/396), selecting a total of 637 trait-annotation pairs with at least one CTS annotation for 36 of 40 traits (all traits except high light scatter reticulocyte count, high cholesterol, sunburn occasion, and age at menopause), including 25 of 27 independent traits (Supplementary Table 9). Finally, we re-analyzed these 637 trait-annotation pairs using our extended S-LDSC with the baseline-LF model, the union of the six chromatin marks, and the average of annotations for each mark. In Figure 4, we report all 637 pairs for completeness, demonstrating the consistency between CVE and LFVE for CTS annotations (Supplementary Table 10). However, as the 1–53 CTS annotations selected for each trait are often highly correlated with each other, we selected for each of the 25 independent traits the “most critical” CTS annotation, defined in the main text and Figure 4 as the CTS annotation with the most statistically significant CVE. For these 25 annotations, we regressed their LFVE on their CVE with no intercept. We also considered 5 alternative definitions of the “most critical” CTS annotation for each trait; for each of these definitions, LFVE were similar to CVE (Supplementary Figure 9). Finally, when testing if a CTS annotation has a significantly larger LFVE than CVE, we used a trait-specific Bonferroni threshold (i.e. 0.05 divided by the number of CTS annotations retained for the trait).
For gene set analyses based on the $s_{het}$ metric[31], we divided variants into 5 bins containing the same number of genes (3,073; 3,072 for the last bin). For S-LDSC analyses, we added to the baseline-LF model two annotations for variants inside a protein-coding gene (for low-frequency and common variants, respectively; we used the 17,484 protein-coding genes from ref.[65]), 10 annotations for variants inside the 5 gene sets, and 10 annotations for non-synonymous variants inside the 5 gene sets (22 annotations in total).
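The three simple computations used in these comparisons (a two-sided z-test for independent estimates, a regression slope through the origin, and the conversion to a per-variant heritability ratio) can be sketched as below. The function names are hypothetical. The example inputs are the rounded LFVE and CVE values reported for the Eigen annotation, so the resulting P value will not match the reported one exactly.

```python
import math


def two_sided_z_test(est_a, se_a, est_b, se_b):
    """z statistic and two-sided P value for the difference of two independent estimates."""
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p


def slope_no_intercept(x, y):
    """Least-squares slope of y on x for a regression line through the origin."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def per_variant_ratio(h2_ratio, n_lf=3_398_397, n_common=5_353_593):
    """Rescale a heritability ratio by the low-frequency/common variant counts."""
    return h2_ratio * n_lf / n_common


# LFVE = 22.0 +/- 2.2 versus CVE = 13.0 +/- 1.4 (rounded values from the text).
z, p_value = two_sided_z_test(22.0, 2.2, 13.0, 1.4)

# Toy LFVE-on-CVE regression (hypothetical values).
slope = slope_no_intercept([1.0, 2.0, 3.0], [1.1, 2.2, 3.3])
```

Adding the two squared standard errors is only valid because LFVE and CVE are computed on non-overlapping sets of variants; correlated estimates would need a covariance term.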

Forward simulations.

To investigate the connection between LFVE, CVE and the distribution of fitness effects (DFE), we performed forward simulations under a Wright-Fisher model with selection using the SLiM2 software[40] (see URLs). We simulated 1Mb regions of genetic length 1cM with a uniform recombination rate and a uniform mutation rate ($2.36\times10^{-8}$, as recommended in the SLiM manual). De novo mutations had probability $\pi_{del}$ to be deleterious with a dominance coefficient of 0.5 and a selection coefficient $s$ drawn from a gamma distribution with mean $\bar{s}_{dn}$ and shape $k$, and had probability $1-\pi_{del}$ to be neutral (i.e. $s=0$). We output a sample of 5,000 European genomes using the out-of-Africa demographic model of Gravel et al.[66] implemented in SLiM. Then, we used the Eyre-Walker model[12] to compute the per-allele effect size $\beta_j = c\,(4 N_e |s_j|)^{\tau_{EW}}(1+\varepsilon_j)$, where $c$ is a constant, $N_e$ is the effective population size, $s_j$ is the selection coefficient of variant $j$, $\tau_{EW}$ is the coupling coefficient between selection and phenotypic effect, and $\varepsilon_j$ is a normally distributed noise. Here, $c$ was set to have a trait heritability $h^2=0.5$ (i.e. $\sum_j 2p_j(1-p_j)\beta_j^2 = 0.5$, where $p_j$ is the allele frequency of variant $j$), $N_e$ was set as the expected coalescent time[67] of the European population of the Gravel et al. model (6,524), and $\varepsilon_j$ was set to 0 for simplicity. We note that we focused here on per-variant heritability (i.e. $2p_j(1-p_j)\beta_j^2$) and not directional effects, thus our conclusions are independent of the direction of the selection coefficient on the trait and are valid for traits that are either under direct or stabilizing selection. Unlike our previous forward simulation framework[23], we designed these simulations to have a realistic DFE for annotations mimicking both non-synonymous and noncoding variants. Briefly, we created 50 non-synonymous elements with a realistic length of 200bp (10kb in total, 1% of the 1Mb simulated genome) separated by non-coding elements of size 14.9kb (99% of the simulated genome; Supplementary Figure 11a).
To mimic non-synonymous elements, we used $\pi_{del}$ = 80%, $\bar{s}_{dn}$ = $-3.16\times10^{-3}$ and $k$ = 0.32, as previously estimated[13]. Then, we estimated that fixing $\pi_{del}$=40%, $\bar{s}_{dn}$=$-1.00\times10^{-4}$, $k$=0.32 for noncoding variants and $\tau_{EW}$=0.75 provides a good fit to our UK Biobank heritability and non-synonymous enrichment results (see Supplementary Note). In most subsequent simulations, we fixed the probability of a deleterious variant to be causal ($\pi_{causal|del}$) at 10%, so that the proportion of de novo non-synonymous variants that are causal ($\pi_{causal}$, defined as $\pi_{causal}=\pi_{del}\cdot\pi_{causal|del}$) is 8% (resp. 4% for noncoding variants). This allows non-synonymous variants to have LFVE and CVE on the same order of magnitude as the LFVE and CVE observed for the non-synonymous variants inside genes predicted to be under strong negative selection[31] (102.0× and 41.4×, respectively; Figure 5). We note that we replicated our main results when using $\pi_{causal|del}$=5% (Supplementary Figure 18). Next, we investigated the impact of $\bar{s}_{dn}$ and $\pi_{causal}$ on a “functional noncoding” annotation. To do so, we alternately considered 200bp functional elements as non-synonymous elements (1% of the simulated genome) or as functional noncoding elements (1% of the simulated genome), separated by “ordinary noncoding” elements of size 9.8kb (98% of the simulated genome; Supplementary Figure 11b). For each functional noncoding element, we fixed $\pi_{del}$=60% and $k$=0.32 (equal to the value of $k$ for non-synonymous and overall noncoding elements). We chose a value of $\pi_{del}$ in between the value for overall noncoding ($\pi_{del}$=40%) and non-synonymous ($\pi_{del}$=80%) annotations, as we hypothesized that enriched functional noncoding annotations in the human genome have a larger proportion of deleterious variants than the overall noncoding genome. However, we note that we obtained similar results when choosing $\pi_{del}$=40% for the functional noncoding annotation (Supplementary Figure 19). We varied $\bar{s}_{dn}$ and $\pi_{causal|del}$ (and thus $\pi_{causal}$) of the functional noncoding annotation, while retaining $\pi_{causal|del}$=10% for the variants in the non-synonymous and ordinary noncoding elements.
(We varied $\bar{s}_{dn}$ on the logarithmic scale, and report truncated values in the manuscript for simplicity; for example, $\bar{s}_{dn}$=−0.003 stands for $-3.1623\times10^{-3}$; see Supplementary Table 12 for exact values). For each scenario, we simulated 1,000 regions of 1Mb, merged the output variants, and considered 100 randomly chosen sets of causal variants. When drawing inferences about rare variant (MAF<0.5%) architectures of noncoding functional annotations, we focused on simulations with $\pi_{causal}$=48% for the functional noncoding annotation, because the CVE and LFVE/CVE ratios for the CTS annotations in Figure 4a (between 5 and 20, and between 1 and 2, respectively) roughly correspond to $\pi_{causal}$=48% and $|\bar{s}_{dn}|$ between 0.0002 and 0.0006 (Figure 6a-b).
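The mapping from selection coefficients to effect sizes in these simulations can be sketched as below. This is a minimal sketch, not the simulation pipeline itself (which relies on SLiM for the population genetics). It assumes the Eyre-Walker form β = c·(4·N_e·|s|)^τ with ε=0, with c chosen so that the per-variant heritabilities 2p(1−p)β² sum to h², and it expects only causal variants as input. The function names and toy inputs are hypothetical; N_e=6,524, τ=0.75 and h²=0.5 are taken from the text.

```python
import math
import random


def draw_selection_coefficient(rng, pi_del, mean_s, shape):
    """Draw s for one de novo mutation: 0 if neutral, else minus a gamma draw.

    mean_s is the (negative) mean selection coefficient of deleterious
    mutations; random.gammavariate takes (shape, scale) with mean shape * scale.
    """
    if rng.random() >= pi_del:
        return 0.0
    return -rng.gammavariate(shape, abs(mean_s) / shape)


def eyre_walker_effects(s, p, tau=0.75, n_e=6524, h2=0.5):
    """Per-allele effects and per-variant heritabilities for causal variants."""
    unscaled = [(4.0 * n_e * abs(s_j)) ** tau for s_j in s]
    total = sum(2.0 * p_j * (1.0 - p_j) * b * b for p_j, b in zip(p, unscaled))
    c = math.sqrt(h2 / total)  # scaling constant that fixes the trait heritability
    betas = [c * b for b in unscaled]
    per_variant_h2 = [2.0 * p_j * (1.0 - p_j) * b * b for p_j, b in zip(p, betas)]
    return betas, per_variant_h2


# Toy causal variants (hypothetical): strongly selected, weakly selected, neutral.
s = [-0.003, -0.0001, 0.0]
p = [0.01, 0.20, 0.30]
betas, per_variant_h2 = eyre_walker_effects(s, p)

rng = random.Random(1)
draws = [draw_selection_coefficient(rng, 0.8, -3.16e-3, 0.32) for _ in range(1000)]
```

Summing `per_variant_h2` separately over low-frequency and common variants inside and outside an annotation gives the quantities needed for simulated LFVE and CVE.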

62 in total

1. Estimation of SNP heritability from dense genotype data.

Authors: S Hong Lee; Jian Yang; Guo-Bo Chen; Stephan Ripke; Eli A Stahl; Christina M Hultman; Pamela Sklar; Peter M Visscher; Patrick F Sullivan; Michael E Goddard; Naomi R Wray
Journal: Am J Hum Genet Date: 2013-12-05 Impact factor: 11.025

2. Common SNPs explain a large proportion of the heritability for human height.

Authors: Jian Yang; Beben Benyamin; Brian P McEvoy; Scott Gordon; Anjali K Henders; Dale R Nyholt; Pamela A Madden; Andrew C Heath; Nicholas G Martin; Grant W Montgomery; Michael E Goddard; Peter M Visscher
Journal: Nat Genet Date: 2010-06-20 Impact factor: 38.330

3. Genome-wide inference of ancestral recombination graphs.

Authors: Matthew D Rasmussen; Melissa J Hubisz; Ilan Gronau; Adam Siepel
Journal: PLoS Genet Date: 2014-05-15 Impact factor: 5.917

4. A high-resolution map of human evolutionary constraint using 29 mammals.

Authors: Kerstin Lindblad-Toh; Manuel Garber; Or Zuk; Michael F Lin; Brian J Parker; Stefan Washietl; Pouya Kheradpour; Jason Ernst; Gregory Jordan; Evan Mauceli; Lucas D Ward; Craig B Lowe; Alisha K Holloway; Michele Clamp; Sante Gnerre; Jessica Alföldi; Kathryn Beal; Jean Chang; Hiram Clawson; James Cuff; Federica Di Palma; Stephen Fitzgerald; Paul Flicek; Mitchell Guttman; Melissa J Hubisz; David B Jaffe; Irwin Jungreis; W James Kent; Dennis Kostka; Marcia Lara; Andre L Martins; Tim Massingham; Ida Moltke; Brian J Raney; Matthew D Rasmussen; Jim Robinson; Alexander Stark; Albert J Vilella; Jiayu Wen; Xiaohui Xie; Michael C Zody; Jen Baldwin; Toby Bloom; Chee Whye Chin; Dave Heiman; Robert Nicol; Chad Nusbaum; Sarah Young; Jane Wilkinson; Kim C Worley; Christie L Kovar; Donna M Muzny; Richard A Gibbs; Andrew Cree; Huyen H Dihn; Gerald Fowler; Shalili Jhangiani; Vandita Joshi; Sandra Lee; Lora R Lewis; Lynne V Nazareth; Geoffrey Okwuonu; Jireh Santibanez; Wesley C Warren; Elaine R Mardis; George M Weinstock; Richard K Wilson; Kim Delehaunty; David Dooling; Catrina Fronik; Lucinda Fulton; Bob Fulton; Tina Graves; Patrick Minx; Erica Sodergren; Ewan Birney; Elliott H Margulies; Javier Herrero; Eric D Green; David Haussler; Adam Siepel; Nick Goldman; Katherine S Pollard; Jakob S Pedersen; Eric S Lander; Manolis Kellis
Journal: Nature Date: 2011-10-12 Impact factor: 49.962

5. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age.

Authors: Cathie Sudlow; John Gallacher; Naomi Allen; Valerie Beral; Paul Burton; John Danesh; Paul Downey; Paul Elliott; Jane Green; Martin Landray; Bette Liu; Paul Matthews; Giok Ong; Jill Pell; Alan Silman; Alan Young; Tim Sprosen; Tim Peakman; Rory Collins
Journal: PLoS Med Date: 2015-03-31 Impact factor: 11.069

6. A global reference for human genetic variation.

Authors: Adam Auton; Lisa D Brooks; Richard M Durbin; Erik P Garrison; Hyun Min Kang; Jan O Korbel; Jonathan L Marchini; Shane McCarthy; Gil A McVean; Gonçalo R Abecasis
Journal: Nature Date: 2015-10-01 Impact factor: 49.962

7. Integrative annotation of chromatin elements from ENCODE data.

Authors: Michael M Hoffman; Jason Ernst; Steven P Wilder; Anshul Kundaje; Robert S Harris; Max Libbrecht; Belinda Giardine; Paul M Ellenbogen; Jeffrey A Bilmes; Ewan Birney; Ross C Hardison; Ian Dunham; Manolis Kellis; William Stafford Noble
Journal: Nucleic Acids Res Date: 2012-12-05 Impact factor: 16.971

8. Integrative analysis of 111 reference human epigenomes.

Authors: Anshul Kundaje; Wouter Meuleman; Jason Ernst; Misha Bilenky; Angela Yen; Alireza Heravi-Moussavi; Pouya Kheradpour; Zhizhuo Zhang; Jianrong Wang; Michael J Ziller; Viren Amin; John W Whitaker; Matthew D Schultz; Lucas D Ward; Abhishek Sarkar; Gerald Quon; Richard S Sandstrom; Matthew L Eaton; Yi-Chieh Wu; Andreas R Pfenning; Xinchen Wang; Melina Claussnitzer; Yaping Liu; Cristian Coarfa; R Alan Harris; Noam Shoresh; Charles B Epstein; Elizabeta Gjoneska; Danny Leung; Wei Xie; R David Hawkins; Ryan Lister; Chibo Hong; Philippe Gascard; Andrew J Mungall; Richard Moore; Eric Chuah; Angela Tam; Theresa K Canfield; R Scott Hansen; Rajinder Kaul; Peter J Sabo; Mukul S Bansal; Annaick Carles; Jesse R Dixon; Kai-How Farh; Soheil Feizi; Rosa Karlic; Ah-Ram Kim; Ashwinikumar Kulkarni; Daofeng Li; Rebecca Lowdon; GiNell Elliott; Tim R Mercer; Shane J Neph; Vitor Onuchic; Paz Polak; Nisha Rajagopal; Pradipta Ray; Richard C Sallari; Kyle T Siebenthall; Nicholas A Sinnott-Armstrong; Michael Stevens; Robert E Thurman; Jie Wu; Bo Zhang; Xin Zhou; Arthur E Beaudet; Laurie A Boyer; Philip L De Jager; Peggy J Farnham; Susan J Fisher; David Haussler; Steven J M Jones; Wei Li; Marco A Marra; Michael T McManus; Shamil Sunyaev; James A Thomson; Thea D Tlsty; Li-Huei Tsai; Wei Wang; Robert A Waterland; Michael Q Zhang; Lisa H Chadwick; Bradley E Bernstein; Joseph F Costello; Joseph R Ecker; Martin Hirst; Alexander Meissner; Aleksandar Milosavljevic; Bing Ren; John A Stamatoyannopoulos; Ting Wang; Manolis Kellis
Journal: Nature Date: 2015-02-19 Impact factor: 69.504

9. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis.

Authors: Po-Ru Loh; Gaurav Bhatia; Alexander Gusev; Hilary K Finucane; Brendan K Bulik-Sullivan; Samuela J Pollack; Teresa R de Candia; Sang Hong Lee; Naomi R Wray; Kenneth S Kendler; Michael C O'Donovan; Benjamin M Neale; Nick Patterson; Alkes L Price
Journal: Nat Genet Date: 2015-11-02 Impact factor: 38.330

10. The UK10K project identifies rare variants in health and disease.

Authors: Klaudia Walter; Josine L Min; Jie Huang; Lucy Crooks; Yasin Memari; Shane McCarthy; John R B Perry; ChangJiang Xu; Marta Futema; Daniel Lawson; Valentina Iotchkova; Stephan Schiffels; Audrey E Hendricks; Petr Danecek; Rui Li; James Floyd; Louise V Wain; Inês Barroso; Steve E Humphries; Matthew E Hurles; Eleftheria Zeggini; Jeffrey C Barrett; Vincent Plagnol; J Brent Richards; Celia M T Greenwood; Nicholas J Timpson; Richard Durbin; Nicole Soranzo
Journal: Nature Date: 2015-09-14 Impact factor: 49.962
