Steven Gazal1,2, Po-Ru Loh3,4, Hilary K Finucane3,5, Andrea Ganna3,6,7, Armin Schoech8,3,9, Shamil Sunyaev3,4,10, Alkes L Price11,12,13. 1. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA. sgazal@hsph.harvard.edu. 2. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA. sgazal@hsph.harvard.edu. 3. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA. 4. Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA. 5. Schmidt Fellows Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA. 6. Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA. 7. Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA. 8. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA. 9. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA. 10. Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA. 11. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA. aprice@hsph.harvard.edu. 12. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA. aprice@hsph.harvard.edu. 13. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA. aprice@hsph.harvard.edu.
Abstract
Common variant heritability has been widely reported to be concentrated in variants within cell-type-specific non-coding functional annotations, but little is known about low-frequency variant functional architectures. We partitioned the heritability of both low-frequency (0.5% ≤ minor allele frequency < 5%) and common (minor allele frequency ≥ 5%) variants in 40 UK Biobank traits across a broad set of functional annotations. We determined that non-synonymous coding variants explain 17 ± 1% of low-frequency variant heritability ($h^2_{lf}$) versus 2.1 ± 0.2% of common variant heritability ($h^2_c$). Cell-type-specific non-coding annotations that were significantly enriched for $h^2_c$ of corresponding traits were similarly enriched for $h^2_{lf}$ for most traits, but more enriched for brain-related annotations and traits. For example, H3K4me3 marks in brain dorsolateral prefrontal cortex explain 57 ± 12% of $h^2_{lf}$ versus 12 ± 2% of $h^2_c$ for neuroticism. Forward simulations confirmed that low-frequency variant enrichment depends on the mean selection coefficient of causal variants in the annotation, and can be used to predict effect size variance of causal rare variants (minor allele frequency < 0.5%).
Common variant (minor allele frequency (MAF) ≥5%) trait heritability
has been widely reported to be concentrated in noncoding functional annotations
that are active in relevant cell-types or tissues, with a limited role for common
coding variants[1-8]. Although common variants explain the bulk of
heritability[9-11], low-frequency variants can have
larger per-allele effect sizes than common variants when impacted by negative
selection[9-17], and may thus yield important biological
insights even though the heritability they explain is modest[6,7].

Recent large genome-wide association studies (GWAS) have identified
low-frequency variants with large per-allele effect sizes and reported an excess of
genome-wide significant low-frequency variants in coding regions[18-21], implying that low-frequency coding variants have larger effect
sizes than other low-frequency variants. However, the relative contribution of
low-frequency coding variants to low-frequency variant heritability is currently
unknown. For cell-type-specific noncoding variants, discovery of genome-wide
significant low-frequency variants has been limited, and their contribution to
low-frequency variant heritability is also unknown. Dissecting low-frequency variant
functional architectures can shed light on the action of negative selection across
functional annotations and inform the design of low-frequency and rare variant
association studies[14,22].

To investigate functional enrichments of low-frequency variants (defined here
as 0.5%≤MAF<5%), we extended stratified LD-score regression[5,23] (S-LDSC) to partition the heritability of both low-frequency
and common variants; our method produces robust (unbiased or slightly conservative)
results in simulations. We applied our method to partition the heritability of
low-frequency and common variants in 40 heritable traits from the UK
Biobank[24-26] (average N=363K UK-ancestry
samples) across a broad set of coding and noncoding functional annotations[5,6,8,23,27-31]. We
performed forward simulations to connect estimated low-frequency and common variant
functional enrichments to the action of negative selection, and to predict the
effect size variance of causal rare variants (MAF<0.5%) within each
functional annotation.
Results
Overview of methods
S-LDSC[5,23] is a method for partitioning the
heritability causally explained by common variants across overlapping discrete
or continuous annotations using genome-wide association study (GWAS) summary
statistics for accurately imputed variants and a linkage disequilibrium (LD)
reference panel. Here, we extended S-LDSC to partition the heritability causally
explained by low-frequency variants using GWAS summary statistics for accurately
imputed and poorly imputed variants. We included separate annotations for
low-frequency and common variants, and used WGS data from 3,567 UK10K
samples[18] as an LD
reference panel to ensure accurate LD information for low-frequency variants in
the UK-ancestry target samples analyzed in this study (see Methods).

We jointly analyzed 163 annotations (referred to as the “baseline-LF
model”), including 33 main binary annotations, MAF bins, and LD-related
annotations (Supplementary
Table 1 and Supplementary Table 2; see Methods). We note that the inclusion of MAF- and LD-related
annotations implies that the expected causal heritability of a SNP is a function
of MAF and LD. We first estimated the heritability causally explained by all
low-frequency variants ($h^2_{lf}$) and the heritability causally explained by all
common variants ($h^2_c$). For the 33 main binary annotations, we
computed their low-frequency variant enrichment (LFVE), defined as the
proportion of $h^2_{lf}$ causally explained by variants in the
annotation divided by the proportion of low-frequency variants that lie in the
annotation, and common variant enrichment (CVE), defined analogously. Further
details of the method are provided in the Methods section. We have released
open-source software implementing the method, and have made our annotations
publicly available (see URLs).
Simulations of extending S-LDSC to low-frequency variants
Although S-LDSC has previously been shown to produce robust results for
partitioning common variant heritability using overlapping binary and continuous
annotations[23,32], we performed additional
simulations to assess our extension to low-frequency variants. We first
confirmed that S-LDSC with the UK10K LD reference panel produced unbiased
heritability estimates for variants with MAF≥0.5% in simulations using
UK10K target samples (see Supplementary Figure 1, Supplementary Table 3, and Supplementary Note). We
subsequently performed more realistic simulations using target samples from the
UK Biobank interim release[24],
so that LD (and MAF) in the target samples and UK10K LD reference panel do not
perfectly match (see Methods and Supplementary Figure 2).
S-LDSC was run either by restricting regression variants to accurately imputed
variants (i.e. INFO score[33]
≥0.99), as we recommended previously[5], or by including all variants (regardless of INFO
score). We focused our simulations on two representative annotations spanning
roughly 1% of the genome: coding and enhancer. We considered various
MAF-dependent architectures[34,35], and conservatively specified
our generative model to be different from the additive model assumed by S-LDSC
(see Methods). For each of the two
annotations, we simulated scenarios with no functional enrichment (“No
Enrichment”) and scenarios with CVE roughly equal to 7× and lower
LFVE (“Lower LFVE”), similar LFVE (“Same
Enrichment”), or higher LFVE (“Higher LFVE”), respectively.
For both annotations, we observed that including all variants in the regression
produced slightly conservative LFVE estimates and unbiased LFVE/CVE ratio
estimates, while restricting to accurately imputed variants produced upward
biases (Figure 1, Supplementary Table 4). The
slightly conservative $h^2_{lf}$ and LFVE estimates are due to LD-dependent
architectures (coding and enhancer variants have lower than average levels of
LD, as do other enriched functional annotations[23]), as we observed nearly unbiased
estimates when creating shifted annotations with average levels of LD (see Methods and Supplementary Figure 3). We thus
recommend including all variants in the regression when running S-LDSC using the
baseline-LF model. Our simulations indicate that this method is robust (unbiased
or slightly conservative) in estimating low-frequency and common variant
functional enrichments and LFVE/CVE ratios across a wide range of genetic
architectures, even in the presence of poorly imputed variants, a target sample
that does not exactly match the UK10K LD reference panel, and a MAF-dependent
architecture that does not match the additive model assumed by S-LDSC.
Figure 1:
Simulations to assess low-frequency variant enrichment estimates.
We report estimates of LFVE and LFVE/CVE ratio in simulations under a
coding-enriched architecture (first row) or enhancer-enriched architecture
(second row). We considered four different simulation scenarios (see main text).
S-LDSC was run either by restricting regression variants to accurately imputed
variants (S-LDSC – INFO ≥ 0.99), or by including all variants
(S-LDSC – All variants). We do not report LFVE/CVE ratio for the No
Enrichment simulation (CVE=LFVE=1) due to unstable estimates; however, all
analyses of real traits in this paper focus on annotations with significant CVE.
Results are averaged across 1,000 simulations. Error bars represent 95%
confidence intervals. Numerical results for $h^2_{lf}$, $h^2_c$, LFVE, CVE and LFVE/CVE ratio are reported in
Supplementary Table
4.
Low-frequency functional architecture of UK Biobank traits
We applied S-LDSC with the baseline-LF model to 40 polygenic, heritable
complex traits and diseases from the full UK Biobank release[25] (average N=363K; Supplementary Table 5).
Analyses were restricted to the set of 409K individuals with UK
ancestry[25] to ensure a
close ancestry match with the UK10K LD reference panel. Summary statistics were
computed by running BOLT-LMM v2.3 (ref.[26]) on imputed dosages, and made publicly available (see
URLs). S-LDSC results were
meta-analyzed across 27 independent traits (average N=355K; see
Supplementary
Note). We observed a roughly linear relationship between estimates of
$h^2_{lf}$ and $h^2_c$ (Figure 2
and Supplementary Table
5), with low-frequency variants explaining 6.3±0.2×
less heritability and having 4.0±0.1× lower per-variant
heritability than common variants on average. These ratios are consistent with a
model in which the variance of per-normalized genotype effect sizes is
proportional to $[p(1-p)]^{1+\alpha}$ (where p is the minor allele
frequency; refs.[34,35]) with
α=−0.37 (95% confidence interval
[−0.40;−0.34]; similar to previous α estimates from raw
genotype-phenotype data[10,11]), and consistent with a model
in which low-frequency variants have smaller per-variant heritability but larger
per-allele effect sizes[10,11,23,34,35] (Supplementary Figure 4).
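
The link between α and the common vs. low-frequency per-variant heritability ratio can be illustrated numerically. The following is a minimal sketch, not the authors' procedure: it assumes that per-variant heritability scales as $[p(1-p)]^{1+\alpha}$ and, as a further simplifying assumption that is not taken from the text, that variant density within each MAF bin is proportional to 1/p. The function name `per_variant_h2_ratio` is hypothetical.

```python
import numpy as np

def per_variant_h2_ratio(alpha, n_grid=20000):
    """Ratio of mean per-variant heritability, common vs. low-frequency.

    Assumes per-variant heritability is proportional to [p(1-p)]^(1+alpha)
    and that the MAF density in each bin is proportional to 1/p
    (illustrative assumption).
    """
    def mean_h2(lo, hi):
        # A grid that is uniform in log(p) corresponds to a density ~ 1/p.
        p = np.exp(np.linspace(np.log(lo), np.log(hi), n_grid))
        return np.mean((p * (1.0 - p)) ** (1.0 + alpha))

    common = mean_h2(0.05, 0.5)          # MAF >= 5%
    low_frequency = mean_h2(0.005, 0.05)  # 0.5% <= MAF < 5%
    return common / low_frequency

for alpha in (-1.0, -0.8, -0.37, 0.0):
    print(alpha, round(per_variant_h2_ratio(alpha), 2))
```

Under these assumptions, α=−1 gives a ratio of exactly 1 (per-variant heritability independent of MAF), and the ratio grows as α increases toward 0. The exact value at α=−0.37 depends on the assumed MAF spectrum, so it should not be expected to reproduce the reported 4.0× precisely.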
Figure 2:
Common variant heritability ($h^2_c$) and low-frequency variant heritability
($h^2_{lf}$) estimates for 40 UK Biobank traits.
We report $h^2_c$ and $h^2_{lf}$ estimated by S-LDSC with the baseline-LF model
for 40 UK Biobank traits (for binary traits, estimates are on the liability
scale), with 7 representative independent traits highlighted. Error bars
represent 95% confidence intervals. The dashed black line represents the ratio
between $h^2_{lf}$ and $h^2_c$ meta-analyzed across 27 independent traits
(1/6.3). Grey lines represent expected ratios for different values of
α (see main text). Numerical results are reported in Supplementary Table 5.
We compared the LFVE and CVE of the 33 main binary functional annotations
of the baseline-LF model, meta-analyzed across traits (Figure 3, Supplementary Table 6). LFVE were
highly correlated to CVE (r=0.79) and larger than CVE on
average (regression slope =1.85). We identified 9 main functional annotations
with significantly different LFVE and CVE (Figure
3, Supplementary
Table 6). Non-synonymous variants had the largest LFVE and largest
difference vs. CVE (5.0× ratio; LFVE=38.2±2.3×, vs.
CVE=7.7±0.9×; $P=3\times10^{-36}$
for difference). As non-synonymous variants comprise 0.45% of low-frequency
variants vs. only 0.27% of common variants due to strong negative selection on
non-synonymous mutations[36,37] (see below), this difference
is even larger when comparing the proportion of heritability they explain
(8.2× ratio; 17.3±1.0% of $h^2_{lf}$, vs. 2.1±0.2% of
$h^2_c$;
$P=5\times10^{-47}$). Non-synonymous
variants predicted to be deleterious by PolyPhen-2 (ref.[29]) had larger LFVE and LFVE/CVE ratio than
non-synonymous variants predicted to be benign (Supplementary Figure 5).
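
As a worked illustration of how enrichment relates to the proportions quoted above, the sketch below recomputes LFVE and CVE for non-synonymous variants from the rounded percentages in the text. The function name `enrichment` is hypothetical; because the inputs are rounded, the recomputed values only approximately match the S-LDSC estimates reported in the text (38.2× and 7.7×).

```python
def enrichment(prop_h2, prop_variants):
    """Enrichment = proportion of heritability / proportion of variants."""
    return prop_h2 / prop_variants

# Non-synonymous variants, using the rounded proportions quoted in the text.
lfve = enrichment(prop_h2=0.173, prop_variants=0.0045)  # low-frequency variants
cve = enrichment(prop_h2=0.021, prop_variants=0.0027)   # common variants

# Ratio of the heritability proportions themselves (reported as 8.2x).
h2_proportion_ratio = 0.173 / 0.021

print(f"LFVE ~ {lfve:.1f}x, CVE ~ {cve:.1f}x")
print(f"LFVE/CVE ~ {lfve / cve:.1f}x")
print(f"heritability proportion ratio ~ {h2_proportion_ratio:.1f}x")
```

The heritability proportion ratio exceeds the LFVE/CVE ratio because non-synonymous variants make up a larger share of low-frequency variants (0.45%) than of common variants (0.27%).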
Figure 3:
Functional low-frequency and common variant architectures across 27
independent UK Biobank traits.
We plot LFVE vs. CVE (log scale) for the 33 main functional annotations
of the baseline-LF model (meta-analyzed across the 27 independent traits),
highlighting annotations for which LFVE is significantly different from CVE.
Numbers in the legend represent the proportion of common / low-frequency
variants inside the annotation, respectively. The first three conserved
annotations are based on phastCons elements[27], Conserved in mammals* is based on GERP RS
scores[28] (≥4),
and Conserved in mammals** is based on Lindblad-Toh et al.[30]. The promoter flanking annotation has
(non-significantly) negative LFVE and is not displayed for visualization
purposes. The solid line represents LFVE=CVE; dashed lines represent
LFVE=constant multiples of CVE. Error bars represent 95% confidence intervals.
Numerical results are reported in Supplementary Table 6.
We also observed LFVE significantly larger than CVE for coding variants
(2.5× ratio; $P=1\times10^{-18}$),
5’ UTR (2.5× ratio;
$P=1\times10^{-4}$) and the five main
conserved annotations[27,28,30] (ratios 1.5×-2.2×; each
$P<5\times10^{-7}$; Figure 3, Supplementary Table 6).
Surprisingly, phastCons regions conserved in primates[27] were more enriched than phastCons
regions conserved in vertebrates or conserved in mammals[27] (even though regions conserved in more
distant species may be viewed as more biologically critical). We observed that
the significantly larger LFVE (compared to CVE) for all 5 conserved annotations
is mainly due to conserved regions that are coding, and that coding enrichments
are similar for regions conserved across different species (Supplementary Figure 6). Finally,
we observed significantly smaller LFVE than CVE for intronic variants
(0.85× ratio; $P=8\times10^{-5}$). These
results were generally consistent across the 40 UK Biobank traits analyzed
(Supplementary Figure
7).

We also observed significantly larger enrichment/depletion for LFVE than
for CVE in the first and/or last quintile of LD-related continuous annotations
related to negative selection[23] (Supplementary Figure 8 and Supplementary Table 7); our forward
simulations from ref.[23]
confirmed larger effects of low-frequency variants in these LD-related
annotations (Supplementary
Table 8). Overall, our results suggest that LFVE is substantially
larger than CVE only for annotations that are strongly constrained by negative
selection, as the strongest differences were observed for coding and
non-synonymous variants, which are known to be under strong negative
selection[36,37]. A more detailed interpretation of the
LFVE/CVE ratio is provided below (see Forward
simulations).
Cell-type-specific enrichments of low-frequency variants
We sought to investigate the contribution to low-frequency variant
architectures of cell-type-specific (CTS) annotations[1-6] (i.e. reflecting regulatory activity in a given cell type)
with excess contributions to common variant architectures. For each of the 40 UK
Biobank traits, we selected the subset of 396 CTS Roadmap annotations[6] with statistically significant
common variant enrichment after conditioning on (non-CTS annotations in) the
baseline-LD model[5,8] (see Methods). We selected a total of 637 trait-annotation pairs, with at
least one CTS annotation for 36 of 40 traits (25 of 27 independent traits)
(Supplementary Table
9); the 637 CTS annotations contained 2.7% of common variants and
3.0% of low-frequency variants on average (Supplementary Table 10). We
analyzed each of these trait-annotation pairs using the baseline-LF model (Figure 4a and Supplementary Table 10). For the 25
trait-annotation pairs with the most statistically significant CVE for each of
the 25 independent traits (critical CTS annotations), LFVE and CVE were similar,
with LFVE 1.12±0.13× larger than CVE on average (other definitions
of critical CTS annotations produced similar conclusions; see Supplementary Figure 9).
Figure 4:
Low-frequency and common variant architectures of cell-type-specific (CTS)
annotations.
For 637 trait-annotation pairs with conditionally statistically
significant common variant enrichment, we report (a) LFVE vs. CVE
(log scale) and (b) proportion of $h^2_{lf}$ vs. proportion of $h^2_c$ explained. The dashed black line in (a)
represents the regression slope for 25 critical CTS annotations for independent
traits (see main text). Brain-specific annotations are denoted in blue. Two
trait-H3K4me3 annotation pairs with LFVE significantly larger than CVE are
denoted in dark blue (see main text); error bars represent 95% confidence
intervals. The two arrows in (b) denote All autoimmune diseases (H3K4me1 in
Regulatory T-cells; left arrow) and Monocyte count (H3K4me1 in Primary
monocytes; right arrow) (see main text). Results for coding and non-synonymous
annotations (meta-analysis across 27 independent traits) are denoted in red;
error bars represent 95% confidence intervals. Numerical results are reported in
Supplementary Table
10.
We observed Bonferroni-significant differences (after correcting each
trait for 1–53 annotations tested) for two traits. The most significant
trait-annotation pairs were neuroticism and H3K4me3 in brain dorsolateral
prefrontal cortex, vs. CVE=8.3±1.5×; P=0.001;
63.2±15.4% of $h^2_{lf}$, vs. 11.1±2.0% of
$h^2_c$). We note that these results are not driven by
the fact that H3K4me3 marks are often located in 5’ UTR and
exons[38] (Supplementary Table 10).
Interestingly, these two annotations (and 55 of all 62 CTS annotations with
LFVE/CVE>2) are brain-specific, implicating stronger selection against
variants impacting gene regulation in brain tissues (see Forward simulations and Discussion).

While CTS annotations generally have only moderately large LFVE (e.g.
smaller than non-synonymous variants; Figure
4a), they often explain a large proportion of $h^2_{lf}$
(e.g. larger than non-synonymous variants;
Figure 4b) due to large annotation
size, as with common variant enrichment. In particular, H3K4me1 in regulatory
T-cells (3.7% of low-frequency variants) explains 86.2±20.8% of
$h^2_{lf}$ for All autoimmune diseases (vs. 3.4% of common
variants explaining 48.9±9.1% of $h^2_c$), and H3K4me1 in primary monocytes (4.8% of
low-frequency variants) explains 79.3±18.1% of $h^2_{lf}$ for monocyte count (vs. 4.6% of common variants
explaining 70.8±8.6% of $h^2_c$; Figure 4b
and Supplementary Table
10). Thus, CTS annotations often dominate low-frequency
architectures, analogous to common variant architectures[5,8].
Larger non-synonymous enrichments in genes under selection
Recent studies have identified gene sets that are depleted for
non-synonymous variants[31,39]. To further investigate the
connection between functional enrichment and negative selection, we stratified
the CVE and LFVE of non-synonymous variants (Figure 3a) based on the strength of selection on the underlying
genes. We considered 5 bins of estimated values of selection coefficients for
heterozygous protein-truncating variants[31] ($s_{het}$),
with 3,073 protein-coding genes per bin, and added annotations based on
non-synonymous variants within each bin to the baseline-LF model (see Methods). We determined that both the LFVE
and CVE of non-synonymous variants correlated strongly with the predicted
strength of selection on the underlying genes (Figure 5 and Supplementary Table 11). In particular, we observed extremely strong
enrichments for non-synonymous variants in genes under the strongest selection
(bin 1: LFVE=102.0±7.9× and CVE=41.5±4.8×). However,
the LFVE/CVE ratio was smaller for non-synonymous variants in genes under the
strongest selection (bin 1: 2.5×) than in genes under the weakest
selection (bins 4+5: 5.8×); we discuss this surprising result below (see
Forward simulations). We obtained
similar results when stratifying non-synonymous variants in genes under varying
levels of selective constraint based on other related criteria (Supplementary Figure 10).
Figure 5:
Low-frequency and common variant enrichments for non-synonymous variants vary
with the strength of selection on the underlying genes.
We report LFVE vs. CVE (log scale) for non-synonymous variants in 5 bins
of $s_{het}$ (see main text),
meta-analyzed across 27 independent UK Biobank traits; bins 4+5 are merged for
visualization purposes. Numbers in the legend represent the proportion of common
/ low-frequency variants inside the annotation, respectively. The solid line
represents LFVE=CVE; dashed lines represent LFVE=constant multiples of CVE.
Error bars represent 95% confidence intervals. Numerical results for each bin
are reported in Supplementary
Table 11.
Forward simulations confirm role of negative selection
We hypothesized that the LFVE and CVE of different functional
annotations would be informative for the action of negative selection, which
constrains strongly selected variants to lower frequency[9-17]. To investigate this, we performed forward
simulations[40] using a
genetic architecture involving annotations mimicking non-synonymous variants (1%
of the simulated genome), functional noncoding variants (1%), and ordinary
noncoding variants (98%), with different respective distributions of selection
coefficients s (Supplementary Figure 11). For each
of these three annotations we specified the probability for a de
novo variant to be deleterious
($\pi_{del}$), the mean selection coefficient
for de novo deleterious variants ($\bar{s}_{dn}$) and the probability for a deleterious variant
to be causal for the trait ($\pi_{causal|del}$); the
probability for a de novo variant to be causal for the trait is
$\pi_{causal}=\pi_{del}\cdot\pi_{causal|del}$.
Per-allele trait effect sizes were specified to be proportional
to $|s|^{\tau}$, where $\tau$ parameterizes the coupling between selection
coefficient and trait effect size in the Eyre-Walker model[12], implying that only deleterious variants
have nonzero effects (see Methods). We
investigated how the LFVE and CVE of the functional noncoding annotation varied
as a function of the values of $\bar{s}_{dn}$ and $\pi_{causal}$ for that annotation. To achieve a
realistic simulation framework, we fixed the remaining values of
$\pi_{del}$, $\bar{s}_{dn}$ and $\pi_{causal}$ for the three annotations, as well
as the value of $\tau$, to values that we fit using our UK Biobank
estimate of 4.0× larger per-variant heritability for common vs.
low-frequency variants, as well as the LFVE and CVE of non-synonymous variants
(38.2× and 7.7×, respectively). Specifically, we fixed
$\pi_{del}$=60% for the functional noncoding
annotation (similar results for $\pi_{del}$=40%; see
Methods);
$\pi_{del}$=80% (ref.[13]), $\bar{s}_{dn}$=−0.003 (ref.[13]) and $\pi_{causal}$=8% for the non-synonymous
annotation; $\pi_{del}$=40%,
$\bar{s}_{dn}$=−0.0001 and $\pi_{causal}$=4% for the ordinary
noncoding annotation; and $\tau$=0.75. We note that our fitted value of
$\tau$ is larger than previous estimates[11,13,15,16] (see Discussion).

We determined that the CVE of the functional noncoding annotation in our
simulations depends on both $\bar{s}_{dn}$ and $\pi_{causal}$ (Figure 6a), while the LFVE/CVE ratio depends primarily on
$\bar{s}_{dn}$ (Figure
6b). When de novo deleterious variants are under strong
selection ($\bar{s}_{dn}\le-0.0003$, corresponding to
LFVE/CVE ratio ≥1.2×; Figure
6b), the CVE depends primarily on $\pi_{causal}$ (Figure 6a), as the mean selection coefficient of
deleterious common variants varies only weakly with $\bar{s}_{dn}$ (since most deleterious common variants have
$|s| \ll |\bar{s}_{dn}|$; Figure
6c). Finally, we observed that functional noncoding annotations with
similar CVE and LFVE tend to have causal variants with slightly stronger
selection coefficients (i.e. $\bar{s}_{dn}\approx-0.0002$) than ordinary noncoding
causal variants ($\bar{s}_{dn}=-0.0001$), for which LFVE is lower than
CVE (Figure 6b). We note that the LFVE/CVE
ratio can be used to infer the mean selection coefficient of deleterious causal
variants as a function of MAF (see Figure
6c), because this ratio depends primarily on
$\bar{s}_{dn}$ and because the selection coefficients of
de novo deleterious causal variants are drawn from a
distribution with mean $\bar{s}_{dn}$.
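
The way de novo variants are assigned selection coefficients and trait effects in such a framework can be sketched as follows. This is an illustrative toy and not the authors' forward simulator: it draws only the de novo step (no drift or selection over generations), it assumes exponentially distributed selection coefficient magnitudes (the distribution is not specified in this section), and the value 0.8 for the probability of being causal given deleterious is an illustrative choice. The function name `simulate_de_novo` and its parameter names are hypothetical.

```python
import numpy as np

def simulate_de_novo(n, pi_del, s_mean, pi_causal_given_del, tau, seed=0):
    """Draw selection coefficients and per-allele effect magnitudes.

    pi_del: probability that a de novo variant is deleterious.
    s_mean: mean selection coefficient of deleterious variants (negative).
    pi_causal_given_del: probability that a deleterious variant is causal.
    tau: coupling between |s| and effect size (effect magnitude ~ |s|^tau).
    """
    rng = np.random.default_rng(seed)
    deleterious = rng.random(n) < pi_del
    # Assumption: exponential magnitudes with mean |s_mean|; neutral variants get s = 0.
    s = np.where(deleterious, -rng.exponential(abs(s_mean), n), 0.0)
    # Only deleterious variants can be causal.
    causal = deleterious & (rng.random(n) < pi_causal_given_del)
    # Effect magnitude, up to a proportionality constant; zero for non-causal variants.
    beta = np.where(causal, np.abs(s) ** tau, 0.0)
    return s, causal, beta

s, causal, beta = simulate_de_novo(
    n=200_000, pi_del=0.6, s_mean=-0.0003, pi_causal_given_del=0.8, tau=0.75
)
print("fraction causal:", causal.mean())  # close to 0.6 * 0.8
```

In this sketch the fraction of causal de novo variants is governed by the product of the two probabilities, while the mean selection coefficient only sets the scale of s and hence of the effect sizes; how those quantities translate into LFVE and CVE requires the full forward simulation.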
Figure 6:
Forward simulations enable inferences about negative selection and rare
variant architectures.
Results are based on forward simulations involving an annotation
mimicking functional noncoding variants, as well as other annotations (see
text). (a,b) We report the CVE (a) and LFVE/CVE ratio (b) of the
functional noncoding annotation as a function of the mean selection coefficient
for de novo deleterious variants ($\bar{s}_{dn}$) and the probability of a de
novo variant to be causal ($\pi_{causal}$) for this annotation.
$\bar{s}_{dn}$ and $\pi_{causal}$ values for non-synonymous and
ordinary noncoding annotations are described in the main text. (c)
We report the mean absolute selection coefficient of deleterious variants in the
functional noncoding annotation as a function of $\bar{s}_{dn}$ and MAF (rare, low-frequency, common).
(d) We report the mean squared per-allele effect size of causal
variants in the functional noncoding annotation (normalized by the mean squared
per-allele effect size of rare causal non-synonymous variants) as a function of
$\bar{s}_{dn}$ and MAF (rare, low-frequency and common). Red
lines denote the value $\bar{s}_{dn}=-0.003$ used to simulate non-synonymous
variants, grey lines denote the value $\bar{s}_{dn}=-0.0001$ used to simulate ordinary
noncoding variants (see main text). The value $\pi_{causal}$=48% used in (d) (see
Methods) is denoted via squares in (a) and (b). Numerical results are reported
in Supplementary Table
12.
Our forward simulations provide an interpretation of the LFVE/CVE ratios
of different functional annotations that we estimated for UK Biobank traits and
annotations. First, they confirm that non-synonymous variants (which are
strongly deleterious[41]: large
$\pi_{del}$ and $|\bar{s}_{dn}|$) can have a limited contribution to common
variant architectures (2.1% of $h^2_c$) but a large contribution to low-frequency
variant architectures (17.3% of $h^2_{lf}$) (Figure
3a). Second, they indicate that the proportion of causal variants
($\pi_{causal}$) is larger for critical cell-type-specific (CTS) annotations than for
non-synonymous variants (based on their CVE; Figure 4a), but that the causal variants in critical CTS annotations
have only slightly larger selection coefficients than ordinary noncoding
variants, except for some brain annotations that are under much stronger
selection (much larger $|\bar{s}_{dn}|$, based on their LFVE/CVE ratios;
Figure 4a). Third, they explain the
extremely large CVE for non-synonymous variants inside genes predicted to be
under strong negative selection[31] (large $s_{het}$;
Figure 5), which are expected to
correspond to genes with an extremely large proportion of deleterious
non-synonymous variants (large $\pi_{del}$, implying large
$\pi_{causal}=\pi_{del}\cdot\pi_{causal|del}$).
However, despite extremely large CVE and LFVE, this class of variants had a
smaller LFVE/CVE ratio than that of non-synonymous variants inside genes
predicted to be under weak selection (Figure
5), a surprising result that appears to suggest a smaller $|\bar{s}_{dn}|$ (Figure 6b) despite the extremely large value
of $\pi_{del}$. We performed additional forward
simulations to show that a larger $|\bar{s}_{dn}|$ does not produce larger LFVE/CVE ratios for
annotations with extremely large values of
$\pi_{del}$, for which the ratio between the
proportion of low-frequency variants that are deleterious and the proportion of
common variants that are deleterious is reduced to 1 (Supplementary Figure 12).

Although our focus is primarily on low-frequency variants
(0.5%≤MAF<5%), we also used our forward simulation framework to
draw inferences about rare variant (MAF<0.5%) architectures of noncoding
functional annotations, based on LFVE and CVE estimates from UK Biobank (Figure 4a). Specifically, we compared the
mean squared per-allele effect size of rare causal variants in annotations
mimicking functional noncoding variants and non-synonymous variants,
respectively. We inferred disproportionate causal effects of rare variants in
annotations under very strong selection ($\bar{s}_{dn}=-0.003$, similar to non-synonymous
variants[13]), with mean
squared causal effect sizes 11×, 26× and 60× larger than
annotations with $\bar{s}_{dn}=-0.0006$, $\bar{s}_{dn}=-0.0003$ and $\bar{s}_{dn}=-0.0002$, respectively (Figure 6d and Supplementary Table 12; similar
results for different choices of $\pi_{causal}$, Supplementary Figure 13). These
results indicate that an annotation with large CVE needs to have even larger
LFVE (e.g. LFVE/CVE ratio ≥2×, corresponding to
$\bar{s}_{dn}\le-0.0006$; Figure 6b) in order to harbor rare causal variants
with substantial mean squared effect sizes (e.g. only an order of magnitude
smaller than rare causal non-synonymous variants; Figure 6d). Unfortunately, most of the non-brain CTS annotations
that we analyzed do not achieve this ratio (Figure
4a), motivating further work on more precise noncoding annotations
(see Discussion).
Discussion
In this study, we partitioned the heritability of both low-frequency and
common variants in 40 UK Biobank traits across numerous functional annotations,
employing an extension of stratified LD score regression[5,23] to
low-frequency and common variants that produces robust (unbiased or slightly
conservative) results. Meta-analyzing functional enrichments across 27 independent
traits, we highlighted the critical impact of low-frequency non-synonymous variants
(17.3% of $h^2_{lf}$, LFVE=38.2×) compared to common
non-synonymous variants (2.1% of $h^2_c$, CVE=7.7×). Other annotations previously
linked to negative selection, including non-synonymous variants with high PolyPhen-2
scores[29], non-synonymous
variants in genes under strong selection[31], and LD-related annotations[23], were also significantly more enriched for
$h^2_{lf}$ as compared to $h^2_c$. Finally, at the trait level, we observed that CTS
annotations[6,8] also dominate the low-frequency architecture,
and that annotations with significant CVE tend to have similar LFVE, or larger LFVE for brain-related
annotations and traits. This last observation implicates the action of negative
selection on low-frequency variants affecting gene regulation in the brain, and is
consistent with the interaction between brain enhancers and genes under stronger
purifying selection[18], and with
the excess of rare de novo mutations in regulatory elements active
in fetal brain in patients with neurodevelopmental disorders[43]. We showed via forward simulations that the
CVE of an annotation depends primarily on its proportion of causal variants
(π), while its LFVE/CVE ratio depends primarily on the mean selection
coefficient for de novo deleterious variants
($\bar{s}_{dn}$), and thus on the mean selection coefficient of
causal variants (Figure 6). These conclusions
are consistent with previous studies of the role of selection[9-17], including pleiotropic selection[17], in maintaining variants with large effects
on complex traits at low frequencies. Overall, our work quantifies the relationship
between the strength of selection in specific functional annotations (both coding
and noncoding) and low-frequency and common variant enrichment for human diseases
and complex traits, providing an interpretation of the enrichments estimated for UK
Biobank traits and annotations.
Our results on low-frequency variant functional architectures have several
implications for downstream analyses. First, our results provide guidance for the
design of association studies targeting low-frequency variants. Non-synonymous
variants should be strongly prioritized at the low-frequency variant level[21], as they explain a large
proportion of $h^2_{lf}$ and directly implicate causal genes (and
specifically implicate core disease genes rather than peripheral genes[7]), avoiding the challenge of mapping
noncoding variants to genes[42,44]. However, we observed that all
coding and UTR variants jointly explained only 26.8±1.9% of $h^2_{lf}$
(Supplementary Table 6), providing an upper bound on the proportion of
low-frequency signal captured by whole-exome sequencing (WES) studies. This
underscores the advantages of large GWAS (with imputed genotypes obtained using
large reference panels), compared to WES or exome chip data, for querying
low-frequency variation[16].
Furthermore, using functionally informed association tests that assign higher weight
to low-frequency non-synonymous variants or CTS annotations should significantly
improve power in these analyses[4,20,45]. Second, our results provide guidance for the design of
association studies targeting rare (MAF<0.5%) variants, which require large
sequencing datasets[14]. While WES
datasets have been successfully used to detect new coding variants, genes and gene
sets associated with human diseases and complex traits, there is an increasing focus
on WGS that can capture rare noncoding variants. However, our LFVE and CVE results
for critical CTS annotations (Figure 4),
coupled with our predictions of causal rare variant effect size variance (Figure 6d), suggest that in most instances these
annotations do not harbor causal variants with large mean squared effect sizes (with
brain-related annotations and traits as a notable exception; also see ref.[43]), highlighting the need for more
precise noncoding annotations for prioritization in WGS. As a first step towards
this goal, we estimated the LFVE and CVE of annotations constructed using a wide
range of recently developed noncoding variant prioritization scores[46-50]. We identified only one annotation, defined using the top
0.5% of Eigen scores[48], with an
LFVE/CVE ratio significantly larger than 1 (1.7× ratio;
LFVE=22.0±2.2×, vs. CVE=13.0±1.4×;
P=7×10−4 for difference; Supplementary Figure 14).
However, even for this annotation, the LFVE/CVE ratio <2 again implies that
this annotation does not harbor causal variants with substantial mean squared effect
sizes (only an order of magnitude smaller than rare causal non-synonymous variants;
Figure 6d). Third, our results were
consistent with strong coupling between selection coefficient and trait effect size
(Eyre-Walker coupling parameter[12]
$\tau$=0.75; robust to error bars in LFVE and CVE
estimates, see Supplementary
Figure 15), implicating a larger impact of negative selection on complex
traits than previously reported[11,13,15,16] and much larger
effect sizes for rare variants in functional annotations with strong selection
coefficients. This can be explained by the fact that our inference procedure
explicitly allows different distributions of selection coefficients for
non-synonymous and noncoding variants ($\bar{s}_{dn}$=−0.003 and $\bar{s}_{dn}$=−0.0001, respectively; Supplementary Figure 16). Finally, the
different LFVE/CVE ratios that we inferred for different functional annotations
suggest that it may be appropriate to allow annotation-specific
α values when using the α model
(variance of per-normalized genotype effect sizes proportional to $[p(1-p)]^{1+\alpha}$, where $p$ is the allele frequency; refs.[10,11,34,35]).
In the extreme case of non-synonymous variants, we explored different choices of
α values for non-synonymous and other variants, and
determined that a value of α=−1.10 for non-synonymous
variants and α=−0.30 for other variants provided the
best fit to our UK Biobank heritability and enrichment results (Supplementary Table 13).
Although our work has provided insights into low-frequency variant
architectures of human diseases and complex traits, it has several limitations (see
Supplementary Note).
Despite these limitations, our low-frequency and common variant enrichment results
convincingly demonstrate and quantify the action of negative selection across coding
and noncoding functional annotations.
Methods
Extension of S-LDSC to low-frequency variants.
S-LDSC[5,23] is a method for partitioning
heritability explained by common variants across overlapping annotations (both
binary and continuous[23]) using
GWAS summary statistics. More precisely, S-LDSC models the vector of
per-normalized genotype effect sizes β as a mean-0 vector
whose variance depends on D continuous-valued annotations
$a_1, \ldots, a_D$:
$$\mathrm{Var}(\beta_j) = \sum_{d=1}^{D} a_d(j)\,\tau_d, \qquad (1)$$
where $a_d(j)$ is the value of annotation $a_d$ at
variant j, and $\tau_d$ represents the per-variant contribution of one
unit of the annotation $a_d$ to
heritability. We can thus perform a regression to infer the values of
$\tau_d$ using the following relationship with the
expected $\chi^2$ statistic of variant j:
$$E[\chi^2_j] = N \sum_{d=1}^{D} \tau_d\, l(j,d) + N b + 1, \qquad (2)$$
where $l(j,d) = \sum_k a_d(k)\, r_{jk}^2$ is the LD score of variant j
with respect to continuous values $a_d(k)$ of annotation
$a_d$, $r_{jk}$ is
the correlation between variants j and k in an
LD reference panel, N is the sample size of the GWAS, and
b is a term that measures the contribution of confounding
biases[51]. Then, the
heritability causally explained by a subset of variants S can
be estimated as $h^2(S) = \sum_{j \in S} \sum_{d=1}^{D} a_d(j)\,\hat{\tau}_d$. We note that this definition, used here to
define and estimate $h^2_{lf}$ and $h^2_c$, is different from the definition of
“SNP-heritability” (ref.[52]), which refers to the heritability tagged by a set of
genotyped and/or imputed variants.
To allow different effects for low-frequency and common variants inside
a functional annotation $a_d$,
we modeled the variance of the per-normalized genotype effect sizes using
different $\tau_d$ for these two categories of variants. In a case
where we consider D functional
annotations, we write:
$$\mathrm{Var}(\beta_j) = \sum_{d=1}^{D} a_d(j)\left[\mathbf{1}_{lf}(j)\,\tau_{d,lf} + \mathbf{1}_{c}(j)\,\tau_{d,c}\right], \qquad (3)$$
where $\mathbf{1}_{lf}(j)$ (resp. $\mathbf{1}_{c}(j)$) is an indicator function with value 1 if
variant j is a low-frequency (resp. common) variant, and 0
otherwise, and $\tau_{d,lf}$ (resp. $\tau_{d,c}$) represents the per-variant contribution of
one unit of the annotation
$a_d$ to the heritability
explained by low-frequency (resp. common) variants. These parameters can be
estimated using S-LDSC by writing equation (3) in the form:
$$\mathrm{Var}(\beta_j) = \sum_{d=1}^{D} a_{d,lf}(j)\,\tau_{d,lf} + \sum_{d=1}^{D} a_{d,c}(j)\,\tau_{d,c}, \qquad (4)$$
where $a_{d,lf}(j)$ (resp. $a_{d,c}(j)$) is an annotation equal to
$a_d(j)$ if variant
j is a low-frequency (resp. common) variant and 0
otherwise. In all analyses we also added one annotation containing all the
variants, 5 MAF bins for low-frequency variants, and 10 MAF bins for common
variants in order to take into account MAF-dependent effects[23,53,54].
For each functional binary annotation of interest
$a_d$, we compared its
low-frequency variant enrichment (LFVE) and common variant enrichment (CVE),
defined as the proportion of $h^2_{lf}$ (resp. $h^2_c$) explained by the annotation, divided by the
proportion of low-frequency (resp. common) variants that are in the annotation
(see Supplementary Note
for a justification of the denominator). Standard errors were computed using a
block jackknife procedure[5]. We
note that these computations did not include the heritability causally explained
by rare variants (MAF<0.5%).
Application of S-LDSC was performed using 3,567 unrelated individuals of the
UK10K data set[18] (ALSPAC and
TWINSUK cohorts) as an LD reference panel. This choice was made in order to
ensure a close ancestry match between the target sample used to compute summary
statistics (UK Biobank) and the LD reference panel (UK10K), as LD patterns of
low-frequency variants are expected to vary across European
populations[55,56] (see Supplementary Note for more
information on our application of S-LDSC). The main differences of our
application of S-LDSC compared to standard S-LDSC analyses on common variants
are summarized in Supplementary Table 14.
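The duplication of annotations by frequency class, the LD score of a variant with respect to an annotation, and the enrichment ratio described in this section can be sketched in a few lines of Python. This is a minimal illustration on made-up toy data, not the S-LDSC software: the function names, the toy LD matrix and the per-variant heritabilities are all hypothetical, and the real method estimates the per-variant contributions by regression on GWAS summary statistics rather than taking them as input.

```python
from typing import List, Sequence, Tuple

LF_MIN, COMMON_MIN = 0.005, 0.05  # MAF cut-offs quoted in the text


def split_by_maf(annot: Sequence[float], maf: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Build the low-frequency and common copies of an annotation (zero outside the class)."""
    a_lf = [a if LF_MIN <= f < COMMON_MIN else 0.0 for a, f in zip(annot, maf)]
    a_c = [a if f >= COMMON_MIN else 0.0 for a, f in zip(annot, maf)]
    return a_lf, a_c


def ld_scores(annot: Sequence[float], r2: Sequence[Sequence[float]]) -> List[float]:
    """LD score of each variant j: sum over k of annot[k] * r2[j][k]."""
    return [sum(a * r for a, r in zip(annot, row)) for row in r2]


def enrichment(h2: Sequence[float], in_annot: Sequence[bool], in_class: Sequence[bool]) -> float:
    """Share of the class heritability in the annotation / share of class variants in it."""
    h2_class = sum(h for h, c in zip(h2, in_class) if c)
    h2_annot = sum(h for h, a, c in zip(h2, in_annot, in_class) if a and c)
    n_class = sum(1 for c in in_class if c)
    n_annot = sum(1 for a, c in zip(in_annot, in_class) if a and c)
    return (h2_annot / h2_class) / (n_annot / n_class)


# Toy data (hypothetical): four variants and one binary annotation.
maf = [0.01, 0.02, 0.10, 0.30]
annot = [1.0, 0.0, 1.0, 0.0]
r2 = [[1.0, 0.5, 0.0, 0.0],
      [0.5, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 0.2],
      [0.0, 0.0, 0.2, 1.0]]
h2_per_variant = [0.03, 0.01, 0.02, 0.02]  # made-up per-variant heritabilities

a_lf, a_c = split_by_maf(annot, maf)
is_lf = [LF_MIN <= f < COMMON_MIN for f in maf]
is_common = [f >= COMMON_MIN for f in maf]
in_annot = [a > 0 for a in annot]

print("LD scores (low-frequency copy):", ld_scores(a_lf, r2))
print("LD scores (common copy):", ld_scores(a_c, r2))
print("LFVE:", enrichment(h2_per_variant, in_annot, is_lf))
print("CVE:", enrichment(h2_per_variant, in_annot, is_common))
```

Restricting both the numerator and the denominator of the enrichment to one frequency class is what makes LFVE and CVE comparable across classes containing different numbers of variants.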
Baseline-LF model and functional annotations.
We considered 34 main functional annotations from the baseline-LD model
v1.1 (27 binary and 7 continuous annotations, including LD-related annotations;
refs.[5,23,57,58]), including
coding, UTR, promoter and intronic regions, the histone marks monomethylation
(H3K4me1) and trimethylation (H3K4me3) of histone H3 at lysine 4, acetylation of
histone H3 at lysine 9 (H3K9ac) and two versions of acetylation of histone H3 at
lysine 27 (H3K27ac), open chromatin as reflected by DNase I hypersensitivity
sites (DHSs), combined chromHMM and Segway predictions (which make use of many
Encyclopedia of DNA Elements (ENCODE) annotations to produce a single partition
of the genome into seven underlying chromatin states), three different conserved
annotations, two versions of super-enhancers, FANTOM5 enhancers, typical
enhancers, and 6 LD-related continuous annotations (see Supplementary Table 1).
In order to further dissect the set of coding variants, a major focus of
this study, we annotated each coding variant using ANNOVAR[59], and added one synonymous and one
non-synonymous annotation to our model. We also added three new annotations
based on phastCons[27] conserved
elements (46 way) in vertebrates, mammals and primates, and one annotation based
on flanking bivalent TSS/enhancers from Roadmap data[6] (see URLs). These 6 new annotations led to a total of 33 main binary
annotations (see Supplementary
Table 1).
We included 500 bp windows around each binary annotation and 100 bp
windows around four of the main annotations, leading to a total of 74 main
functional annotations. Then, all annotations were duplicated for low-frequency
and common variants as described in equation (4), except for the predicted
allele age annotation[60] (which
had too many missing values for low-frequency variants). Finally, we included
one annotation containing all variants, 10 common variant MAF bins (as in the
baseline-LD model[23]) and 5
low-frequency variant MAF bins. We thus obtained a set of 163 total
annotations. We refer to this set of annotations as the “baseline-LF
model” (see Supplementary Table 2), which we used for all of our S-LDSC
analyses. More details on the baseline-LF model are provided in the Supplementary Note.
We note that the inclusion of MAF and LD-related annotations in this
model implies that the expected causal heritability of a SNP is a function of
MAF and LD. More details on LD-related heritability models are provided in the
Supplementary
Note.
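As a quick check on the annotation count given above, the total of 163 can be tallied as follows. This assumes one particular reading of the text, namely that the predicted allele age annotation is kept once (for common variants only) rather than duplicated.

```latex
\underbrace{2 \times 74}_{\text{low-frequency and common copies}}
\;-\; \underbrace{1}_{\text{allele age not duplicated}}
\;+\; \underbrace{1}_{\text{all variants}}
\;+\; \underbrace{10}_{\text{common MAF bins}}
\;+\; \underbrace{5}_{\text{low-frequency MAF bins}}
\;=\; 163
```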
Simulations using UK Biobank target samples to assess extension of S-LDSC to
low-frequency variants.
To assess possible biases in heritability and enrichment estimates under
a more realistic scenario, we simulated quantitative phenotypes from chromosome
1 of the UK Biobank interim release dataset with variants imputed from 1000
Genomes[61] and
UK10K[18] (113,851
unrelated individuals, 1,023,655 variants with allele counts greater than or equal to
5 in UK10K). First, we randomly sampled integer-valued genotypes from UK Biobank
imputation dosage data. Second, we set trait heritability to
$h^2$=0.5, selected M=100,000 causal variants, and
performed simulations under a coding-enriched architecture by simulating the
variance of per-normalized genotype effect sizes proportional to
$c \cdot \mathbf{1}_{coding}(j)\,[p_j(1-p_j)]^{1+\alpha_{coding}} + \mathbf{1}_{other}(j)\,[p_j(1-p_j)]^{1+\alpha_{other}}$, where $\mathbf{1}_{coding}(j)$
(resp. $\mathbf{1}_{other}(j)$) is an indicator function
taking the value 1 if variant j belongs (resp. does not belong)
to the coding annotation, $p_j$ is the frequency of the causal
variant in the simulated UK Biobank genotypes dataset,
$\alpha_{other}$ was set to
−0.25, and c and
$\alpha_{coding}$ were chosen
to produce four different genetic architectures (see Supplementary Table 4). We note
that this generative model is different from, and more complex than, the additive
inference model implemented in S-LDSC, but may be more realistic as the effect
size of coding variants now depends directly on their allele frequency (and not
on their low-frequency/common status). We also performed simulations under an
enhancer-enriched architecture by considering the baseline ChromHMM/Segway
weak-enhancer[62]
annotation, which has similar properties to the coding annotation (2.28% of
reference low-frequency variants versus 1.83% for coding, and elements with a
mean length of 249bp versus 315bp for coding). To investigate the impact of
the LD-dependent architecture created by the enrichment of these two annotations
(coding and weak-enhancer variants tend to have low levels of LD[23]), we randomly created 100
shifted coding (resp. weak-enhancer) annotations, and selected the annotation
with an average level of LD (i.e. the shifted annotation with the
50th smallest level of LD computed on low-frequency variants; see
ref.[23] for a
definition of level of LD). Third, we used version 2.3 of BOLT-LMM
software[26,63] (see URLs) to compute association statistics on UK Biobank dosage data to
mimic the fact that we computed summary statistics on imputed data. Finally, we
used S-LDSC with our baseline-LF model (except that the 6 new functional
annotations were not included in the simulation analyses) to estimate
$h^2_{lf}$, $h^2_c$, and coding/enhancer CVE and LFVE. S-LDSC was
run by restricting regression variants to accurately imputed variants (i.e. INFO
score[33] ≥
0.99), as we suggested previously[5], or to all variants (irrespective of INFO score). We also
report results when using an INFO score threshold of 0.5 or 0.9, which did not
improve the results (see Supplementary Table 4). We also considered including INFO score
explicitly in the regression to down-weight poorly imputed variants (i.e.
replacing equation (2) by [Formula: see text], where
$I_j$ is the INFO score of
variant j and [Formula: see text]; this approximation assumes that genotype
uncertainty decreases the association test statistics), but this did not improve
the results, consistent with the fact that summary statistics computed from
dosage data already down-weight poorly imputed variants (Supplementary Table 4). We
performed 1,000 simulations for each simulation scenario. In each case, we
removed 0–3 outlier simulations in which the estimate of
$h^2_{lf}$ was below 0.0001; we did not observe any such
outlier results in analyses of real traits (minimum $h^2_{lf}$=0.006; Supplementary Table 5).
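The coding-enriched generative model used in these simulations can be illustrated with a short sketch. It assumes that the variance of the per-normalized genotype effect size of a causal variant is proportional to c·[p(1−p)]^(1+α_coding) for coding variants and [p(1−p)]^(1+α_other) for the others, rescaled so that the variances sum to the target heritability. The values of `c` and `alpha_coding` below are arbitrary placeholders (the four architectures actually used are listed in Supplementary Table 4), the allele frequencies and coding labels are synthetic, and all function and variable names are hypothetical.

```python
import random
from typing import List, Sequence


def effect_size_variances(freqs: Sequence[float], is_coding: Sequence[bool],
                          c: float, alpha_coding: float, alpha_other: float,
                          h2: float) -> List[float]:
    """Variances of per-normalized-genotype effects, rescaled to sum to h2."""
    raw = []
    for p, coding in zip(freqs, is_coding):
        het = p * (1.0 - p)
        if coding:
            raw.append(c * het ** (1.0 + alpha_coding))
        else:
            raw.append(het ** (1.0 + alpha_other))
    scale = h2 / sum(raw)
    return [v * scale for v in raw]


rng = random.Random(1)
n_causal = 1000  # the text uses M=100,000; kept small for illustration
freqs = [rng.uniform(0.005, 0.5) for _ in range(n_causal)]  # synthetic frequencies
is_coding = [i % 50 == 0 for i in range(n_causal)]          # synthetic 2% coding variants

# c and alpha_coding are placeholders; alpha_other = -0.25 as in the text.
variances = effect_size_variances(freqs, is_coding, c=5.0, alpha_coding=-0.75,
                                  alpha_other=-0.25, h2=0.5)

# Draw one set of mean-0 causal effects with these variances.
betas = [rng.gauss(0.0, v ** 0.5) for v in variances]

coding_share = sum(v for v, k in zip(variances, is_coding) if k) / sum(variances)
print(f"share of simulated heritability in coding variants: {coding_share:.3f}")
```

Because the exponent depends on allele frequency rather than on a low-frequency/common indicator, this generative model is not the piecewise-constant model that S-LDSC fits, which is why it is a useful stress test of the inference.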
S-LDSC analyses of UK Biobank data.
We applied S-LDSC with the baseline-LF model to 40 UK Biobank traits,
estimated $h^2_{lf}$, $h^2_c$, and the ratio $h^2_c/h^2_{lf}$ using the 15 MAF bin annotations, and
computed their standard errors using a jackknife procedure. We meta-analyzed the
$h^2_c/h^2_{lf}$ ratio, and multiplied it by the ratio of the
number of low-frequency and common variants in the LD reference sample (i.e.
3,398,397/5,353,593) to convert it into a per-variant heritability ratio. To
match these ratios to a model in which the variance of per-normalized genotype
effect sizes is proportional to $[p(1-p)]^{1+\alpha}$, we used low-frequency and common variants of
our LD reference panel and computed the corresponding ratio using different values of
α.
The CVE and LFVE of each functional annotation were compared using a
two-sided z-test; these values are independent as they are
computed using non-overlapping sets of variants. The regression slope of LFVE on
CVE was computed with no intercept. As most of the 33 annotations are
correlated, we did not attempt to assess the statistical significance of the
regression slope, or of the corresponding correlation between CVE and LFVE. We
note that after removing the 9 annotations with significantly different LFVE and
CVE in Figure 3, LFVE remained highly
correlated to CVE (r=0.83) and only slightly larger than CVE on
average (regression slope=1.10).
For CTS analyses, we analyzed the 396 Roadmap[6] annotations constructed in Finucane et
al.[8] from narrow peaks
in six chromatin marks (DNase hypersensitivity, H3K27ac, H3K4me3, H3K4me1,
H3K9ac, and H3K36me3) in a subset of a set of 88 primary cell types/tissues. We
selected CTS annotations for which common variants are disease relevant
following Finucane et al.[8]
guidelines. First, we analyzed each CTS annotation in turn using default S-LDSC
(i.e. not our extension to low-frequency variants) by conditioning on all the
non-CTS annotations of the baseline-LD model v1.1, the union of annotations for
each of the six chromatin marks, and the average of annotations for each mark
(as performed in ref.[8]). We
note that our choice to switch from the baseline model[5], as performed in ref.[8], to the baseline-LD model (which includes
MAF bins and LD-related annotations in addition to new functional annotations)
was motivated by our observation that the baseline model can slightly overestimate
functional enrichment due to unmodeled annotations[23]. We also decided to consider only
non-CTS annotations and to remove the four enhancer annotations derived from
Vahedi et al.[64] (absent from
the baseline model and added in the baseline-LD model) as they are T-cell
specific and may impact the detection of relevant cell types for traits for
which T-cells are a relevant cell type (such as asthma and eczema; see Supplementary Figure 17).
We retained all the CTS annotations with a coefficient statistically larger than 0 (using
P<0.05/396), selecting a total of 637
trait-annotation pairs with at least one CTS annotation for 36 of 40 traits (all
traits except high light scatter reticulocyte count, high cholesterol, sunburn
occasion, and age at menopause), including 25 of 27 independent traits (Supplementary Table 9).
Finally, we re-analyzed these 637 trait-annotation pairs using our extended
S-LDSC with the baseline-LF model, the union of the six chromatin marks, and the
average of annotations for each mark. In Figure
4, we report all 637 pairs for completeness, demonstrating the
consistency between CVE and LFVE for CTS annotations (Supplementary Table 10). However,
as the 1–53 CTS annotations selected for each trait are often highly
correlated with each other, we selected for each of the 25 independent traits
the “most critical” CTS annotation, defined in the main text and
Figure 4 as the CTS annotation with the
most statistically significant CVE. For these 25 annotations, we regressed their
LFVE on their CVE with no intercept. We also considered 5 alternative
definitions of the “most critical” CTS annotation for each trait;
for each of these definitions, LFVE were similar to CVE (Supplementary Figure 9). Finally,
when testing if a CTS annotation has a significantly larger LFVE than CVE, we
used a trait-specific Bonferroni threshold (i.e. 0.05 divided by the number of
CTS annotations retained for the trait).
For gene set analyses based on the
$s_{het}$ metric[31], we divided variants into 5
bins containing the same number of genes (3,073; 3,072 for the last bin). For
S-LDSC analyses, we added to the baseline-LF model two annotations for variants
inside a protein-coding gene (for low-frequency and common variants,
respectively; we used the 17,484 protein-coding genes from ref.[65]), 10 annotations for variants inside the
5 gene sets, and 10 annotations for non-synonymous variants inside the 5 gene
sets (22 annotations in total).
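Three of the simple computations described in this section can be sketched as follows: the two-sided z-test comparing two independent enrichment estimates, the regression slope with no intercept, and the conversion of a heritability ratio into a per-variant ratio. This is an illustrative sketch with hypothetical function names. The numerical example reuses the rounded Eigen LFVE and CVE values quoted in the Discussion (22.0±2.2 and 13.0±1.4), so the P value it returns need not match the reported one exactly.

```python
import math
from typing import Sequence, Tuple


def two_sided_z_test(est_a: float, se_a: float, est_b: float, se_b: float) -> Tuple[float, float]:
    """z statistic and two-sided P value for the difference of two independent estimates."""
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


def slope_through_origin(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x with the intercept fixed at 0."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def to_per_variant_ratio(h2_ratio: float, n_lf: int, n_common: int) -> float:
    """Multiply a heritability ratio by the variant-count ratio n_lf / n_common."""
    return h2_ratio * n_lf / n_common


# LFVE versus CVE for the Eigen annotation, using the rounded values in the text.
z, p = two_sided_z_test(22.0, 2.2, 13.0, 1.4)
print(f"z = {z:.2f}, two-sided P = {p:.1e}")

# Toy (made-up) CVE and LFVE values for a no-intercept regression.
print(slope_through_origin([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))

# Variant-count ratio from the LD reference sample quoted in the text.
print(to_per_variant_ratio(1.0, 3_398_397, 5_353_593))
```

The z-test treats the two estimates as independent, which the text justifies by noting that LFVE and CVE are computed from non-overlapping sets of variants.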
Forward simulations.
To investigate the connection between LFVE, CVE and the distribution of
fitness effects (DFE), we performed forward simulations under a Wright-Fisher
model with selection using SLiM2 software[40] (see URLs). We
simulated 1Mb regions of genetic length 1cM with a uniform recombination rate
and a uniform mutation rate ($2.36\times10^{-8}$, as recommended
in the SLiM manual). De novo mutations had probability
$\pi_{del}$ to be deleterious with a
dominance coefficient of 0.5 and a selection coefficient s
drawn from a gamma distribution with mean $\bar{s}_{dn}$ and a given shape parameter, and had probability
$1 - \pi_{del}$ to be neutral (i.e.
s=0). We output a sample of 5,000 European genomes using
the out-of-Africa demographic model of Gravel et al.[66] implemented in SLiM. Then, we used
the Eyre-Walker model[12] to compute
the per-allele effect size $\beta_j = c\,(4 N_e |s_j|)^{\tau}(1+\varepsilon)$, where c is a constant,
$N_e$ is the effective population
size, $s_j$ the selection coefficient
of variant j, $\tau$ is the coupling coefficient between selection
and phenotypic effect, and ε is a normally distributed
noise. Here, c was set to have a trait heritability
$h^2$=0.5
(i.e. $\sum_j 2p_j(1-p_j)\beta_j^2 = 0.5$, where
$p_j$ is the allele frequency
of variant j), $N_e$
was set as the expected coalescent time[67] of the European population of the Gravel et al. model
(6,524), and ε was set to 0 for simplicity. We note that
we focused here on per-variant heritability (i.e. $2p_j(1-p_j)\beta_j^2$) and not directional effects; thus our
conclusions are independent of the direction of the selection coefficient on the
trait and are valid for traits that are under either direct or stabilizing
selection.
Unlike our previous forward simulation framework[23], we designed these simulations to have a
realistic DFE for annotations mimicking both non-synonymous and noncoding
variants. Briefly, we created 50 non-synonymous elements with a realistic length of
200bp (10kb in total, 1% of the 1Mb simulated genome) separated by non-coding
elements of size 14.9kb (99% of the simulated genome; Supplementary Figure 11a). To mimic
non-synonymous elements, we used a probability of a de novo mutation being deleterious of $\pi_{del}$ = 80%,
a mean selection coefficient $\bar{s}_{dn}$ = $-3.16 \times 10^{-3}$
and a gamma shape parameter of 0.32, as previously estimated[13]. Then, we estimated that
fixing $\pi_{del}$=40%, $\bar{s}_{dn}$=$-1.00\times10^{-4}$ and
a shape parameter of 0.32 for noncoding variants, together with a coupling coefficient $\tau$=0.75, provides a good fit to our UK Biobank
heritability and non-synonymous enrichment results (see Supplementary Note).
In most subsequent simulations, we fixed the probability of a
deleterious variant to be causal
($\pi_{del \to c}$) at 10%, so that the proportion
of de novo non-synonymous variants that are causal (π,
defined as
$\pi = \pi_{del} \cdot \pi_{del \to c}$)
is 8% (resp. 4% for noncoding variants). This allows non-synonymous variants to
have LFVE and CVE on the same order of magnitude as the LFVE and CVE observed
for the non-synonymous variants inside genes predicted to be under strong
negative selection[31]
(102.0× and 41.4×, respectively; Figure 5). We note that we replicated our main results when using
$\pi_{del \to c}$=5%
(Supplementary Figure
18).
Next, we investigated the impact of $\bar{s}_{dn}$ and π on a “functional
noncoding” annotation. To do so, we alternately considered 200bp
functional elements as non-synonymous elements (1% of the simulated genome) or
as functional noncoding elements (1% of the simulated genome), separated by
“ordinary noncoding” elements of size 9.8kb (98% of the simulated
genome; Supplementary Figure
11b). For each functional noncoding element, we fixed
the probability of a de novo mutation being deleterious at $\pi_{del}$=60% and
the gamma shape parameter at 0.32 (equal to its value
for non-synonymous and overall noncoding
elements). We chose a value of $\pi_{del}$ in between
the values for overall noncoding ($\pi_{del}$=40%)
and non-synonymous ($\pi_{del}$=80%) annotations, as
we hypothesized that enriched functional noncoding annotations in the human
genome have a larger proportion of deleterious variants than the overall
noncoding genome. However, we note that we obtained similar results when
choosing $\pi_{del}$=40% for the functional
noncoding annotation (Supplementary Figure 19). We varied $\bar{s}_{dn}$ and
the probability of a deleterious variant being causal, $\pi_{del \to c}$ (and
thus π), of the functional noncoding annotation, while retaining
$\pi_{del \to c}$=10% for
the variants in the non-synonymous and ordinary noncoding elements. (We varied
$\bar{s}_{dn}$ on the logarithmic scale, and report truncated
values in the manuscript for simplicity; for example, $\bar{s}_{dn}$=−0.003 stands for
$-3.1623\times10^{-3}$; see Supplementary Table 12 for exact
values). For each scenario, we simulated 1,000
regions of 1Mb, merged the output variants, and considered
100 randomly chosen sets of causal variants.
When drawing inferences about rare variant (MAF<0.5%)
architectures of noncoding functional annotations, we focused on simulations
with π=48% for the functional noncoding annotation, because the CVE and
LFVE/CVE ratios for the CTS annotations in Figure
4a (between 5 and 20, and between 1 and 2, respectively) roughly
correspond to π=48% and $|\bar{s}_{dn}|$ between 0.0002 and 0.0006 (Figure 6a-b).
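The mapping from selection coefficients to effect sizes used in these simulations can be sketched as below. The sketch assumes the Eyre-Walker form β = c·(4·N_e·|s|)^τ with the noise term set to 0, and chooses c so that the sum of 2p(1−p)β² over variants equals the target heritability; any constant factor inside the parentheses is absorbed by c. It does not run a forward simulation: the allele frequencies are synthetic placeholders standing in for SLiM output, every deleterious variant is treated as causal for simplicity, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

N_E = 6524   # effective population size quoted in the text
TAU = 0.75   # coupling coefficient quoted in the text


def draw_selection_coefficients(n: int, pi_del: float, mean_s: float,
                                shape: float, rng: random.Random) -> List[float]:
    """s = 0 with probability 1 - pi_del, otherwise minus a gamma draw with mean |mean_s|."""
    scale = abs(mean_s) / shape  # gamma mean = shape * scale
    return [-rng.gammavariate(shape, scale) if rng.random() < pi_del else 0.0
            for _ in range(n)]


def eyre_walker_effects(s: Sequence[float], freqs: Sequence[float],
                        tau: float, n_e: float, h2: float) -> List[float]:
    """Per-allele effects c * (4 * n_e * |s_j|) ** tau, scaled to total heritability h2."""
    raw = [(4.0 * n_e * abs(sj)) ** tau for sj in s]
    total = sum(2.0 * p * (1.0 - p) * b * b for p, b in zip(freqs, raw))
    c = math.sqrt(h2 / total)
    return [c * b for b in raw]


rng = random.Random(0)
n = 2000
# Non-synonymous-like parameters quoted in the text: 80% deleterious,
# mean selection coefficient -3.16e-3, gamma shape 0.32.
s = draw_selection_coefficients(n, pi_del=0.8, mean_s=-3.16e-3, shape=0.32, rng=rng)
freqs = [rng.uniform(0.001, 0.5) for _ in range(n)]  # synthetic, not from a forward simulation

betas = eyre_walker_effects(s, freqs, TAU, N_E, h2=0.5)
h2_check = sum(2.0 * p * (1.0 - p) * b * b for p, b in zip(freqs, betas))
print(f"heritability recovered from the scaled effects: {h2_check:.3f}")
```

In a real forward simulation the allele frequency of each variant is itself shaped by its selection coefficient, which is what produces the frequency-dependent enrichments studied here; drawing frequencies independently, as done above, removes that dependence and is only meant to show the effect-size mapping.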