
Quantifying genetic effects on disease mediated by assayed gene expression levels.

Douglas W Yao1, Luke J O'Connor2,3,4, Alkes L Price3,4,5, Alexander Gusev6,7,8.   

Abstract

Disease variants identified by genome-wide association studies (GWAS) tend to overlap with expression quantitative trait loci (eQTLs), but it remains unclear whether this overlap is driven by gene expression levels 'mediating' genetic effects on disease. Here, we introduce a new method, mediated expression score regression (MESC), to estimate disease heritability mediated by the cis genetic component of gene expression levels. We applied MESC to GWAS summary statistics for 42 traits (average N = 323,000) and cis-eQTL summary statistics for 48 tissues from the Genotype-Tissue Expression (GTEx) consortium. Averaging across traits, only 11 ± 2% of heritability was mediated by assayed gene expression levels. Expression-mediated heritability was enriched in genes with evidence of selective constraint and genes with disease-appropriate annotations. Our results demonstrate that assayed bulk tissue eQTLs, although disease relevant, cannot explain the majority of disease heritability.

Year:  2020        PMID: 32424349      PMCID: PMC7276299          DOI: 10.1038/s41588-020-0625-2

Source DB:  PubMed          Journal:  Nat Genet        ISSN: 1061-4036            Impact factor:   38.330


Introduction

In the past decade, genome-wide association studies (GWAS) have shown that most disease-associated variants lie in noncoding regions of the genome[1-3], leading to the hypothesis that regulation of gene expression levels is the primary biological mechanism through which genetic variants affect complex traits, and motivating large scale expression quantitative trait loci (eQTL) studies[4,5]. Many statistical methods have been developed to integrate eQTL data with GWAS data to gain functional insight into the genetic architecture of disease. These methods include: colocalization tests, which have shown that many genes have eQTLs that colocalize with GWAS loci[6-10]; transcriptome-wide association studies, which have shown that many genes exhibit significant cis-genetic correlations between their expression and disease[11-24]; and partitioning of disease heritability, which has shown that eQTLs as a whole are significantly enriched for disease heritability[25-28]. Despite these findings, it remains unclear the extent to which eQTLs from available studies capture mechanistic effects of gene expression on disease[9,29-31]. In particular, eQTLs from the largest available gene expression reference panels[5,32] are measured in bulk tissues in steady-state cellular conditions, which may not reflect the specific cell types or cellular contexts in which gene expression is causal for disease[33-35]. In addition, several different causal scenarios can result in similar patterns of enrichment/overlap between GWAS loci and eQTLs, summarized in Figure 1a: (1) mediation, (2) pleiotropy, and (3) linkage. Of these three scenarios, only scenario (1) is informative of the SNP’s mechanism of action on disease, but existing methods are unable to consistently distinguish scenarios (2) and (3) from scenario (1). Colocalization tests can sometimes rule out linkage as an explanation for overlap between eQTLs and disease SNPs, but cannot rule out pleiotropy[13,36]. 
Transcriptome-wide association studies cannot rule out either pleiotropy or linkage[13,29]. Among the methods that partition disease heritability, some aim to rule out linkage through fine-mapping of eQTLs[27], but none aim to rule out pleiotropy. Thus, it remains unclear whether enrichment/overlap between eQTLs and disease SNPs usually reflects mediation, or whether it more commonly reflects pleiotropy and/or linkage[9,29]. For example, in the case of autoimmune diseases, most instances of overlap between significant disease loci and immune cell eQTLs are driven by linkage[9], suggesting that linkage may be more prevalent than mediation[31].
Figure 1.

Schematic of MESC

(a) Three possible causal scenarios explaining enrichment/overlap between GWAS loci and eQTLs. GE, gene expression levels. (b) SNP effect sizes are modeled as the sum of a mediated component (defined as causal cis-eQTL effect sizes $\beta$ multiplied by gene-trait effect sizes $\alpha$) and a non-mediated component $\gamma$. (c) Heritability mediated by the cis-genetic component of gene expression levels ($h^2_{med}$) is defined as the squared mediated component of SNP effect sizes summed across all SNPs (assuming that genotypes and phenotypes are standardized). $h^2_{med}$ can be rewritten as the product of the number of genes $G$, the average expression cis-heritability $E[h^2_{cis}]$, and the average squared gene-trait effect size $E[\alpha^2]$. (d) The basic premise behind MESC is to regress squared GWAS effect sizes on squared eQTL effect sizes. Non-directional non-mediated effects are captured by the intercept, while directional mediated effects are captured by the slope, which equals $E[\alpha^2]$ given appropriate effect size independence assumptions (see Methods). (e) In practice, MESC involves regressing squared GWAS summary statistics on squared eQTL summary statistics. Differences in the level of LD between SNPs are captured by an LD score covariate. In the figure, we show a simplified LD architecture with two discrete levels of LD.

In this study, we aim to quantify the proportion of disease heritability mediated in cis by assayed expression levels (scenario (1) from above). We first define expression-mediated heritability under a generative model featuring both mediated and non-mediated (including pleiotropic and linkage) effects of SNPs on the trait. This definition can accommodate assayed gene expression levels measured in a tissue or cellular context not necessarily causal for the disease. We introduce a method, mediated expression score regression (MESC), to estimate expression-mediated heritability from GWAS summary statistics, linkage disequilibrium (LD) scores, and eQTL effect sizes obtained from external expression panels. Intuitively, MESC distinguishes mediated from non-mediated effects in a set of genes via the idea that mediation (unlike pleiotropy and linkage) induces a linear relationship between the magnitude of eQTL effect sizes and disease effect sizes. We applied MESC to GWAS summary statistics for 42 diseases and complex traits and cis-eQTL data for 48 tissues from the GTEx consortium[5] to quantify the proportion of disease heritability mediated by the expression levels of all genes as a whole, as well as by various functional gene sets.
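
The intuition that mediation induces a linear relationship between squared eQTL and squared disease effect sizes can be illustrated with a toy simulation. The sketch below is not the MESC software: it assumes one gene per SNP, no LD, and normally distributed effects, and the function names (`ols`, `simulate_premise`) and all parameter values are hypothetical. It regresses squared SNP-trait effects on squared eQTL effects, so that the slope should approximate the mean squared gene-trait effect and the intercept should approximate the variance of the non-mediated effects.

```python
import random


def ols(x, y):
    """Least-squares fit of y ~ intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simulate_premise(n_snps=100000, alpha_sq=0.5, gamma_var=0.2, seed=0):
    """Toy model: SNP effect on trait = beta * alpha + gamma (hypothetical values)."""
    rng = random.Random(seed)
    beta_sq, effect_sq = [], []
    for _ in range(n_snps):
        beta = rng.gauss(0.0, 1.0)                 # eQTL effect of the SNP on its gene
        alpha = rng.gauss(0.0, alpha_sq ** 0.5)    # gene-trait effect, E[alpha^2] = alpha_sq
        gamma = rng.gauss(0.0, gamma_var ** 0.5)   # non-mediated (pleiotropic/linkage) effect
        effect = beta * alpha + gamma
        beta_sq.append(beta ** 2)
        effect_sq.append(effect ** 2)
    return ols(beta_sq, effect_sq)


if __name__ == "__main__":
    slope, intercept = simulate_premise()
    print(f"slope ~ E[alpha^2]: {slope:.3f}; intercept ~ Var(gamma): {intercept:.3f}")
```

Because the non-mediated effects are drawn independently of the eQTL effects in this toy, they shift only the intercept; the real method additionally has to handle LD and estimated (rather than known) eQTL effects.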

Results

Definition of expression-mediated heritability

We briefly define heritability mediated by the cis-genetic component of gene expression levels ($h^2_{med}$). Cis-eQTL effects multiplied by gene-trait effects form an expression-mediated component of each SNP effect on trait (Figure 1b). This component is then squared and summed across all SNPs to obtain $h^2_{med}$ (Figure 1c,d). Our definition of $h^2_{med}$ additionally has two forms: $h^2_{med;causal}$, in which cis-eQTL effect sizes are hypothetically obtained in the causal cell types and contexts for the disease, and $h^2_{med;assayed}(T)$, in which cis-eQTL effect sizes are obtained in a given set of assayed tissues T (e.g. from GTEx). $h^2_{med;causal}$ and $h^2_{med;assayed}(T)$ are related by the formula $h^2_{med;assayed}(T) = r_g^2(T)\,h^2_{med;causal}$, where $r_g^2(T)$ is the average squared genetic correlation between expression in T and expression in the unobserved causal cell types/contexts for the disease. In practice, we only aim to estimate $h^2_{med;assayed}(T)$, but it is useful to conceptualize this quantity in terms of $h^2_{med;causal}$ since $h^2_{med;causal}$ has a more direct mechanistic interpretation. For brevity, we refer to $h^2_{med;assayed}(T)$ as simply $h^2_{med}$ for the remainder of the manuscript, where the set of tissues T is implicit. We also define a quantity $h^2_{med}(D)$ corresponding to the heritability mediated by the expression levels of gene category D, where D can be arbitrarily defined over any set of genes (e.g. genes in a specific molecular pathway). See Methods for a more detailed definition of $h^2_{med}$ and $h^2_{med}(D)$.
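
For orientation, the verbal definition above can be written out as a short derivation. This is a sketch under simplifying assumptions that this section does not spell out: genotypes, expression, and phenotype are standardized; SNPs are treated as uncorrelated; and the gene-trait effects $\alpha_i$ are mean-zero, independent across genes, and independent of the eQTL effects $\beta_{ik}$ (SNP $k$ on gene $i$). The indexing and the per-gene quantity $h^2_{cis,i} = \sum_k \beta_{ik}^2$ are chosen here for illustration; the Methods give the authors' formal treatment.

```latex
\begin{align}
h^2_{med} &= \sum_{k} \Big( \sum_{i} \beta_{ik}\,\alpha_i \Big)^2 \\
E_{\alpha}\big[h^2_{med}\big]
  &= \sum_{k} \sum_{i} \beta_{ik}^2 \, E[\alpha^2]
     \qquad \text{(cross terms vanish since } E[\alpha_i \alpha_{i'}] = 0 \text{ for } i \neq i') \\
  &= E[\alpha^2] \sum_{i=1}^{G} h^2_{cis,i}
   = G \cdot E[h^2_{cis}] \cdot E[\alpha^2]
\end{align}
```

The last line is the product form described in Figure 1c: number of genes, times average expression cis-heritability, times average squared gene-trait effect size.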

Estimating expression-mediated heritability using MESC

In order to estimate $h^2_{med}$, we propose an approach that involves regressing squared GWAS summary statistics on squared cis-eQTL summary statistics summed across genes (Figure 1d). Differences in LD between SNPs are captured by conditioning on LD scores (Figure 1e). In addition, to avoid bias (see below), we stratify the regression across both gene categories D and SNP categories C. The final regression equation used to estimate $h^2_{med}$ is

$$E[\chi^2_k] = N \sum_c \tau_c \ell_{k;c} + N \sum_d \pi_d \mathcal{L}_{k;d} + 1,$$

where $\chi^2_k$ is the GWAS $\chi^2$ statistic of SNP k, N is the number of samples, $\tau_c$ is the per-SNP contribution to non-mediated heritability of SNPs in SNP category C, $\ell_{k;c}$ is the LD score[2,37] of SNP k with respect to SNP category C (defined as $\ell_{k;c} = \sum_{j \in C} r^2_{jk}$), $\pi_d$ is the per-gene contribution to $h^2_{med}$, and $\mathcal{L}_{k;d}$ is the expression score of SNP k with respect to gene category D (defined as $\mathcal{L}_{k;d} = \sum_{i \in D} \sum_j r^2_{jk} \beta^2_{ij}$). Here, $r_{jk}$ refers to the LD between SNPs j and k, while $\beta_{ij}$ refers to the causal cis-eQTL effect size of SNP j on gene i. $\mathcal{L}_{k;d}$ can be conceptualized as the total expression cis-heritability of genes in D that is tagged by SNP k. The above equation allows us to estimate $\pi_d$ and $\tau_c$ via computationally efficient multiple regression of GWAS chi-square statistics against LD scores and expression scores. In order for the equation to provide unbiased estimates of $h^2_{med}$, two main effect size independence assumptions must be satisfied, of which violations can be addressed via careful partitioning of SNPs and/or genes (Methods; Supplementary Note). Throughout this study, we present estimates of three quantities that are a function of $h^2_{med}$ and/or $h^2_{med}(D)$: (1) the proportion of heritability mediated by expression (defined as $h^2_{med}/h^2_g$), (2) the proportion of expression-mediated heritability for gene category D (defined as $h^2_{med}(D)/h^2_{med}$), and (3) the enrichment of expression-mediated heritability for D (defined as the proportion of expression-mediated heritability in D divided by the proportion of genes in D). We estimate standard errors and p-values for all quantities by jackknifing over blocks of SNPs[2,37] (Methods). 
We have released open source software implementing our method (https://github.com/douglasyao/mesc).
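
To make the two covariates in the regression concrete, the following sketch computes LD scores and expression scores for a toy example with a single SNP category and a single gene category, and then evaluates the expected chi-square statistic for each SNP. It is not taken from the released software: the function names, the 3-SNP LD matrix, and the effect sizes are made up for illustration, and the causal eQTL effect sizes are assumed to be known, whereas in practice they must be estimated from an expression panel.

```python
def ld_scores(r):
    """LD score of SNP k: sum over SNPs j of r[j][k]**2 (one SNP category)."""
    m = len(r)
    return [sum(r[j][k] ** 2 for j in range(m)) for k in range(m)]


def expression_scores(r, beta):
    """Expression score of SNP k: sum over genes i and SNPs j of
    r[j][k]**2 * beta[i][j]**2 (one gene category)."""
    m = len(r)
    return [
        sum(r[j][k] ** 2 * beta_i[j] ** 2 for beta_i in beta for j in range(m))
        for k in range(m)
    ]


def expected_chisq(n, tau, pi, ld, expr):
    """Expected GWAS chi-square per SNP: N * tau * l_k + N * pi * L_k + 1."""
    return [n * tau * l_k + n * pi * e_k + 1.0 for l_k, e_k in zip(ld, expr)]


# Toy LD matrix: SNPs 0 and 1 are correlated (r = 0.5); SNP 2 is independent.
R = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

# Toy causal eQTL effects: BETA[i][j] is the effect of SNP j on gene i.
BETA = [[0.2, 0.0, 0.0],
        [0.0, 0.0, 0.1]]

LD = ld_scores(R)                   # approximately [1.25, 1.25, 1.0]
EXPR = expression_scores(R, BETA)   # approximately [0.04, 0.01, 0.01]

if __name__ == "__main__":
    print(LD, EXPR)
    print(expected_chisq(1000, 0.001, 0.5, LD, EXPR))
```

SNP 1 has no eQTL effect of its own, yet it receives a nonzero expression score because it tags the eQTL at SNP 0 through LD; this is the sense in which the expression score measures cis-heritability tagged by a SNP.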

Simulations assessing calibration and bias

We performed simulations to assess the calibration and bias of MESC in estimating $h^2_{med}$ and its standard error from simulated complex trait and expression data under a variety of genetic architectures (Methods). We performed all simulations using real genotypes from UK Biobank[38] (N = 10,000 GWAS samples; N = 100–1000 expression samples, M = 98,499 SNPs from chromosome 1). We evaluated the bias of MESC in estimating various values of $h^2_{med}$ in the following scenarios: (1) when varying expression panel sample size (Figure 2a), (2) when varying the proportion of SNPs and genes with nonzero effects (Figure 2b), (3) when simulating eQTL effect sizes in the gene expression panel that differ from those used to generate the complex trait phenotype, emulating the scenario in which assayed tissues differ from the causal tissue(s) for the disease (Figure 2c), (4) when using different methods to estimate expression scores (5 in total) (Supplementary Figure 1), (5) when varying total disease heritability (Supplementary Figure 2), and (6) when including rare variants and inducing an inverse relationship between eQTL/GWAS effect size magnitude and minor allele frequency (Supplementary Figure 3), consistent with negative selection acting on both gene expression[39,40] and complex traits[41,42]. We observed that MESC produced unbiased or nearly unbiased estimates of $h^2_{med}$ across all simulated genetic architectures with expression panel sample size greater than 500 when using the best-performing method to estimate expression scores, LASSO with REML correction (Methods). We note that available expression panel sample sizes for individual tissues are typically smaller than 500, which necessitates meta-analysis across tissues to attain larger expression panel sample sizes (Methods). For scenario (3), we expect in theory that MESC will estimate the quantity $r_g^2\,h^2_{med;causal}$ when using expression scores from a non-causal tissue with average squared genetic correlation $r_g^2$ of expression with the causal tissue. 
Our simulation results support this theoretical expectation.
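
The theoretical expectation for scenario (3) can be illustrated with a toy simulation. This is a loose analogue rather than a reproduction of the simulations reported here: it assumes one gene per SNP, no LD, and normally distributed effects, and it lets the correlation `r` between assayed and causal eQTL effect sizes stand in for the genetic correlation of expression. The function names and parameter values are hypothetical. Under these assumptions, regressing squared trait effects on squared assayed eQTL effects should give a slope close to `r**2` times the mean squared gene-trait effect.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def attenuated_slope(r, n_snps=100000, alpha_sq=0.5, gamma_var=0.2, seed=1):
    """Regress squared trait effects on squared eQTL effects from a proxy tissue."""
    rng = random.Random(seed)
    assayed_sq, effect_sq = [], []
    for _ in range(n_snps):
        beta_causal = rng.gauss(0.0, 1.0)          # eQTL effect in the causal context
        noise = rng.gauss(0.0, 1.0)
        # Assayed eQTL effect: unit variance, correlation r with the causal effect.
        beta_assayed = r * beta_causal + (1.0 - r * r) ** 0.5 * noise
        alpha = rng.gauss(0.0, alpha_sq ** 0.5)    # gene-trait effect
        gamma = rng.gauss(0.0, gamma_var ** 0.5)   # non-mediated effect
        effect = beta_causal * alpha + gamma       # trait effect uses the causal eQTL
        assayed_sq.append(beta_assayed ** 2)
        effect_sq.append(effect ** 2)
    return ols_slope(assayed_sq, effect_sq)


if __name__ == "__main__":
    for r in (1.0, 0.7):
        print(f"r = {r}: slope = {attenuated_slope(r):.3f}, "
              f"expected about {r * r * 0.5:.3f}")
```

The attenuation follows from the bivariate normal identity Cov(X^2, Y^2) = 2 r^2 for unit-variance X and Y with correlation r, divided by Var(Y^2) = 2.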
Figure 2.

Simulation results.

We simulated expression and complex trait architectures corresponding to various levels of $h^2_{med}$. GWAS sample size was fixed at 10,000 and $h^2_g$ was fixed at 0.5. Error bars represent mean standard errors across 300 simulations. (a) Impact of expression panel sample size on $h^2_{med}$ estimates. Expression scores were estimated from simulated expression panel samples using LASSO with REML correction. (b) Impact of sparse genetic/eQTL architectures on $h^2_{med}$ estimates. (c) $h^2_{med}$ estimates when eQTL effect sizes in the expression panel differ from those used to generate the complex trait. (d) $h^2_{med}$ estimates in the presence of a negative correlation between the magnitude of eQTL effect size and gene effect size (constituting a violation of gene-eQTL independence). Results are shown with and without stratifying genes by 5 expression cis-heritability bins. See Supplementary Figure 5 for estimates of individual bins. (e) $h^2_{med}$ estimates when 100% of eQTL effects and non-mediated effects lie within coding regions (constituting a violation of pleiotropy-eQTL independence). Results are shown stratifying SNPs by the baselineLD model and a version of the baselineLD model with the coding annotation removed. See Supplementary Figure 6 for additional similar simulations. (f) With $h^2_{med}$ fixed at 0, we varied the heritability enrichment of three eQTL-enriched SNP categories (coding, TSS, and conserved regions) from 2.5x to 10x. In the figure, we show the proportion of simulations in which the null hypothesis that $h^2_{med} = 0$ is rejected by MESC, and the proportion of simulations in which the null hypothesis of no enrichment for the set of all eQTLs is rejected by stratified LD-score regression (S-LDSC).

Next, we assessed the bias of MESC in two biologically plausible scenarios corresponding to violations of the two main effect size independence assumptions (Methods), and we assessed how well partitioning genes and SNPs ameliorated this bias. The assumptions can be summarized as: (1) gene-eQTL independence, where eQTL and gene effect size magnitude are independent within each gene category, and (2) pleiotropy-eQTL effect size independence, where eQTL and SNP non-mediated effect size magnitude are independent within each SNP category. We simulated violations of (1) by inducing a negative correlation between eQTL and gene effect size magnitude across the genome. We observed that partitioning genes into 5 bins by the magnitude of their expression heritability enabled us to obtain approximately unbiased estimates of $h^2_{med}$ (Figure 2d). We simulated violations of (2) by inducing enrichment of eQTLs and non-mediated effects within the same SNP categories (e.g. coding regions, transcription start sites, or conserved regions). We observed that partitioning SNPs by the baselineLD model[2,43] (a set of comprehensive functional SNP annotations) enabled us to obtain approximately unbiased estimates of $h^2_{med}$ (Figure 2e), even in extreme scenarios, e.g. when 100% of mediated and non-mediated heritability was concentrated in coding regions. Finally, we performed simulations comparing MESC to other methods. To our knowledge, no published methods specifically aim to estimate heritability mediated by expression levels. The closest analogues are approaches that measure the genome-wide heritability enrichment of eQTLs[25-28] using GCTA[44] or stratified LD score regression (S-LDSC)[2,43]. In simulations, we found that S-LDSC detected significant heritability enrichment of a SNP category corresponding to the set of all eQTLs in the absence of any mediation ($h^2_{med} = 0$), while MESC had a well-calibrated false positive rate for detecting significantly non-zero $h^2_{med}$ in this scenario (Figure 2f). 
In summary, we show that MESC produces approximately unbiased estimates of $h^2_{med}$ and well-calibrated standard errors under a wide variety of simulated genetic and gene architectures for expression panel sample sizes > 500, whereas other methods cannot distinguish mediated from non-mediated effects. See Supplementary Note for more details on simulations in this section.
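
The standard errors discussed in this section come from jackknifing over blocks of SNPs. A minimal sketch of a delete-one-block jackknife is given below, assuming contiguous blocks of nearly equal size and an arbitrary estimator supplied by the caller. The function names are hypothetical, and this is the plain equal-weight form; the released software may use a different variant.

```python
def block_jackknife_se(values, n_blocks, estimator):
    """Delete-one-block jackknife standard error of estimator(values).

    values    : list of per-SNP records, ordered along the genome
    n_blocks  : number of contiguous blocks to delete in turn
    estimator : function mapping a list of records to a single number
    """
    n = len(values)
    # Block boundaries; integer arithmetic keeps block sizes within one of each other.
    bounds = [b * n // n_blocks for b in range(n_blocks + 1)]
    leave_one_out = []
    for b in range(n_blocks):
        kept = values[:bounds[b]] + values[bounds[b + 1]:]
        leave_one_out.append(estimator(kept))
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (theta - mean_loo) ** 2 for theta in leave_one_out
    )
    return variance ** 0.5


def mean(values):
    """Example estimator: the arithmetic mean."""
    return sum(values) / len(values)


if __name__ == "__main__":
    # Three blocks [1, 2], [3, 4], [5, 6]; for the mean, the jackknife SE
    # equals the standard error of the three block means, sqrt(4 / 3).
    print(block_jackknife_se([1, 2, 3, 4, 5, 6], 3, mean))
```

Deleting whole blocks rather than single SNPs is what makes the procedure robust to correlation between neighbouring SNPs.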

Estimation of $h^2_{med}/h^2_g$ for 42 diseases and complex traits

We applied MESC to estimate the proportion of heritability mediated by the cis-genetic component of assayed expression levels ($h^2_{med}/h^2_g$) for 42 independent diseases and complex traits from the UK Biobank[38] and other publicly available datasets (average N = 323K; see Supplementary Table 1 for list of traits). In total, we produced three different types of expression scores: (1) expression scores for each individual GTEx tissue, (2) expression scores meta-analyzed within groups of GTEx tissues with common biological origin (Supplementary Table 2), and (3) expression scores meta-analyzed across all 48 GTEx tissues. Each type of expression score was used to estimate $h^2_{med}/h^2_g$ for each complex trait (Methods). To avoid biases, we partitioned genes by 5 expression cis-heritability bins and SNPs by the baselineLD model. We performed several analyses evaluating the robustness of these SNP and gene categories, finding that our estimates of $h^2_{med}/h^2_g$ were similar with other reasonable choices of SNP and gene categories but very biased when not partitioning genes or SNPs at all (Supplementary Note). Across all 42 traits, we observed an average $h^2_{med}/h^2_g$ of 0.11 (S.E. 0.02) from the all-tissue meta-analyzed expression scores. We did not observe a relationship between $h^2_{med}/h^2_g$ and $h^2_g$ across traits (R = 0.004) (Extended Data 1). Of the 42 traits, 26 had $h^2_{med}/h^2_g$ estimates greater than 0 at nominal significance (p-value < 0.05), with 10 reaching Bonferroni significance (p-value < 0.05 / 42). In Figure 3a, we report $h^2_{med}/h^2_g$ estimates from all-tissue and tissue-group meta-analyzed expression scores for a representative set of 10 genetically uncorrelated traits (full results in Extended Data 2 and Supplementary Tables 3 and 4). 
We observed consistently lower estimates of $h^2_{med}/h^2_g$ from individual-tissue expression scores than from meta-tissue expression scores, as well as a positive correlation between tissue sample size and magnitude of individual-tissue $h^2_{med}/h^2_g$ (R = 0.71) (Extended Data 3), suggesting downward biases in individual-tissue estimates due to low sample size.
Extended Data Fig. 1

Relationship between $h^2_{med}/h^2_g$ and $h^2_g$.

$h^2_{med}/h^2_g$ estimates were obtained using all-tissue meta-analyzed expression scores. $h^2_g$ estimates were obtained using stratified LD-score regression. Error bars represent jackknife standard errors.

Figure 3.

Estimates of proportion of heritability mediated by expression from GTEx.

(a) Estimated proportion of heritability mediated by the cis-genetic component of assayed gene expression levels ($h^2_{med}/h^2_g$) for 10 genetically uncorrelated traits (average N = 339K). See Supplementary Note for procedure behind selecting these 10 traits and Extended Data 2 for estimates of $h^2_{med}/h^2_g$ for all 42 traits. Error bars represent jackknife standard errors. For each trait, we report the $h^2_{med}/h^2_g$ estimate for “All tissues” (expression scores meta-analyzed across all 48 GTEx tissues) and “Best tissue group” (expression scores meta-analyzed within 7 tissue groups). Here, “best” refers to the tissue group resulting in the highest estimates of $h^2_{med}/h^2_g$ compared to all other tissue groups. (b) $h^2_{med}/h^2_g$ estimates meta-analyzed across all 42 traits (average N = 323K). Error bars represent standard errors from random-effects meta-analysis. Here, “Best tissue” refers to the individual tissue resulting in the highest estimates of $h^2_{med}/h^2_g$ compared to all other tissues. BMI, body mass index; CNS, central nervous system.
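
The caption refers to random-effects meta-analysis across traits without naming the estimator. One common choice is the DerSimonian-Laird estimator, sketched below purely as an assumption about how per-trait estimates and their standard errors could be pooled; the function name and the example numbers are hypothetical and are not results from this study.

```python
def random_effects_meta(estimates, std_errors):
    """DerSimonian-Laird random-effects meta-analysis.

    Returns (pooled_estimate, pooled_standard_error, tau_squared), where
    tau_squared is the estimated between-study (here: between-trait) variance.
    """
    k = len(estimates)
    weights = [1.0 / se ** 2 for se in std_errors]          # inverse-variance weights
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau_sq = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Re-weight with the between-trait variance added to each sampling variance.
    weights_re = [1.0 / (se ** 2 + tau_sq) for se in std_errors]
    sum_w_re = sum(weights_re)
    pooled = sum(w * y for w, y in zip(weights_re, estimates)) / sum_w_re
    return pooled, (1.0 / sum_w_re) ** 0.5, tau_sq


if __name__ == "__main__":
    # Two hypothetical estimates that disagree strongly relative to their SEs.
    print(random_effects_meta([0.0, 2.0], [0.5, 0.5]))
```

When the per-trait estimates agree within their standard errors, the between-trait variance is estimated as zero and the result reduces to the ordinary inverse-variance weighted mean.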

Extended Data Fig. 2

$h^2_{med}/h^2_g$ estimates for all diseases and expression scores.

Same as Figure 3a, but containing $h^2_{med}/h^2_g$ estimates for all 42 traits from all three types of expression scores: “All tissues” (expression scores meta-analyzed across all 48 GTEx tissues), “Best tissue group” (expression scores meta-analyzed within 7 tissue groups), and “Best tissue” (expression scores computed within individual tissues). Here, “best” refers to the tissue/tissue group resulting in the highest estimates of $h^2_{med}/h^2_g$ compared to all other tissues/tissue groups. Error bars represent jackknife standard errors.

Extended Data Fig. 3

Relationship between individual tissue sample size and magnitude of $h^2_{med}/h^2_g$.

$h^2_{med}/h^2_g$ estimates from expression scores estimated in each of 48 individual GTEx tissues were meta-analyzed across 42 complex traits, then plotted against the number of samples in each tissue. We use the following abbreviations: adipose visceral, adipose visceral omentum; brain ACC, brain anterior cingulate cortex BA24; brain CBG, brain caudate basal ganglia; brain CH, brain cerebellar hemisphere; brain FC, brain frontal cortex BA9; brain NABG, brain nucleus accumbens basal ganglia; brain PBG, brain putamen basal ganglia; cells CETL, cells EBV-transformed lymphocytes; cells TF, cells transformed fibroblasts; esophagus GJ, esophagus gastroesophageal junction; heart AA, heart atrial appendage; heart LV, heart left ventricle; skin NSES, skin not sun exposed suprapubic; skin SELL, skin sun exposed lower leg; small intestine, small intestine terminal ileum.

As independent validation, we used cis-eQTL summary statistics from eQTLGen[32] (N = 31,684 in blood only) to estimate $h^2_{med}/h^2_g$ for the same 42 traits we analyzed above. We obtained $h^2_{med}/h^2_g$ estimates very similar to those from GTEx all-tissue expression scores for blood/immune traits, and lower estimates for non-blood/immune traits (Extended Data 4, Supplementary Table 5), consistent with the fact that eQTLGen only captures expression levels in blood while GTEx all-tissue meta-analysis captures expression levels across diverse tissues.
Extended Data Fig. 4

$h^2_{med}/h^2_g$ estimates for 42 diseases and complex traits using data from eQTLGen.

We estimated expression scores for all SNPs using cis-eQTL summary statistics from eQTLGen (N = 31,684 blood samples), then estimated $h^2_{med}/h^2_g$ using GWAS summary statistics for the same 42 traits analyzed in the main text. Expression cis-heritability estimates for eQTLGen data were obtained using LD-score regression. For sake of comparison, we also display $h^2_{med}/h^2_g$ estimates obtained from expression scores from GTEx all-tissue meta-analysis and GTEx whole blood only. (a) $h^2_{med}/h^2_g$ estimates for 42 individual traits, organized into blood/immune and non-blood/immune traits. Error bars represent jackknife standard errors. (b) Results from (a) meta-analyzed across traits. Error bars represent standard errors from random-effects meta-analysis. Note that low estimates of $h^2_{med}/h^2_g$ for GTEx whole blood expression scores are caused by the small sample size of the GTEx whole blood data set (N = 369).

Genes with low expression heritability explain more $h^2_{med}$

To investigate the relationship between expression cis-heritability ($h^2_{cis}$) and amount of complex trait heritability mediated by those genes, we looked at the proportion of $h^2_{med}$ (defined as $h^2_{med}(D)/h^2_{med}$ for gene category D) mediated by genes stratified into 10 equally-sized bins by their $h^2_{cis}$. Across 26 traits with $h^2_{med}/h^2_g$ significantly greater than 0, we observed an inverse relationship between meta-tissue $h^2_{cis}$ and proportion of $h^2_{med}$ across gene bins (Figure 4, Supplementary Table 6), with 32% of $h^2_{med}$ explained by the lowest 2 bins (mean meta-tissue $h^2_{cis}$ = …) and only 3% of $h^2_{med}$ explained by the highest 2 bins (mean meta-tissue $h^2_{cis}$ = …). This result implies that genes with less heritable expression (i.e. weaker/fewer eQTLs) have substantially larger causal effect sizes on the complex trait.
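
The stratification described here can be sketched in a few lines. The snippet below splits genes into equally-sized bins ordered by expression cis-heritability and converts per-bin mediated heritability into proportions; the function names, gene identifiers, and numbers are hypothetical, and the per-bin mediated heritability values would in practice come from the stratified regression rather than being supplied by hand.

```python
def equal_sized_bins(h2cis_by_gene, n_bins):
    """Split genes into n_bins groups of (nearly) equal size, ordered by h2cis."""
    ranked = sorted(h2cis_by_gene, key=h2cis_by_gene.get)   # lowest h2cis first
    n = len(ranked)
    return [ranked[b * n // n_bins:(b + 1) * n // n_bins] for b in range(n_bins)]


def proportion_mediated(h2med_by_bin):
    """Proportion of total mediated heritability attributed to each bin."""
    total = sum(h2med_by_bin)
    return [h2med / total for h2med in h2med_by_bin]


if __name__ == "__main__":
    # Hypothetical cis-heritability estimates for four genes.
    genes = {"geneA": 0.30, "geneB": 0.01, "geneC": 0.10, "geneD": 0.05}
    print(equal_sized_bins(genes, 2))
    # Hypothetical per-bin mediated heritability, lowest-h2cis bin first.
    print(proportion_mediated([3.0, 1.0]))
```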
Figure 4.

Low heritability genes explain more expression-mediated disease heritability.

(a) Estimated proportion of expression-mediated heritability ($h^2_{med}(D)/h^2_{med}$) for 10 gene bins stratified by magnitude of expression cis-heritability. Results are meta-analyzed across 26 traits with nominally significant $h^2_{med}/h^2_g$. Error bars represent standard errors from random effects meta-analysis. Results for individual traits can be found in Supplementary Table 6.

We considered several reasons why genes with less heritable expression might have larger causal effects on the complex trait. One explanation is that negative selection purges strong eQTLs for genes with large effects on complex traits[5,45]. Alternatively, genes with low meta-tissue $h^2_{cis}$ may consist of genes with tissue-specific eQTLs, which have been shown to be enriched for disease heritability[5,27,28]. In support of the first explanation, we observed that the probability of being loss-of-function intolerant[46] (i.e. pLI) and the level of selection against protein-truncating variants[47] (i.e. $s_{het}$) were both inversely correlated with meta-tissue $h^2_{cis}$ (Spearman’s ρ = −0.23 and −0.21 respectively) (Extended Data 5). We did not observe strong evidence for the second hypothesis (Supplementary Note).
Extended Data Fig. 5

Relationship between expression cis-heritability and metrics of gene essentiality.

For each gene, pLI (probability of loss-of-function intolerance) was obtained from Lek et al. 2016 Nature and $s_{het}$ (selection against protein-truncating variants) was obtained from Cassa et al. 2017 Nature Genetics.

$h^2_{med}$ enrichment in functional gene sets

To gain insight into the distribution of expression-mediated effect sizes across various functional gene sets, we estimated $h^2_{med}$ enrichment, defined as (proportion of $h^2_{med}$) / (proportion of genes), for these gene sets. We analyzed 827 gene sets from three main sources: (1) 10 gene sets reflecting various broad metrics of gene essentiality; (2) 780 gene sets reflecting specific biological pathways, including gene sets from the KEGG[48], Reactome[49], and Gene Ontology (GO)[50] pathway databases; and (3) 37 gene sets comprising genes specifically expressed in 37 different GTEx tissues[51] (Methods; see Supplementary Table 7 for list of gene sets). We restricted our analyses to large gene sets with at least 200 genes, since we observed large standard errors in $h^2_{med}$ enrichment estimates for gene sets with fewer than 200 genes (Supplementary Figure 4). Out of 21,502 gene set-complex trait pairs (827 gene sets × 26 complex traits), we observed 226 gene set-complex trait pairs (comprising 117 unique gene sets) with FDR-significant $h^2_{med}$ enrichment (q-value < 0.05 accounting for 21,502 hypotheses tested). Significant $h^2_{med}$ enrichment estimates ranged from 1.5x to 51x across gene set-complex trait pairs. The full list of $h^2_{med}$ enrichment estimates for all 21,502 gene set-complex trait pairs is reported in Supplementary Table 8. In Figure 5a, we show $h^2_{med}$ enrichment estimates for all 10 broadly essential gene sets meta-analyzed across 26 complex traits (individual trait results in Extended Data 6). We observed Bonferroni-significant meta-trait $h^2_{med}$ enrichment (p < 0.05 / 10) for 8 gene sets, including ExAC loss-of-function intolerant genes[46] (3.9x enrichment; p = $2.3 \times 10^{-25}$), FDA-approved drug targets[52] (5.2x enrichment; p = $2.0 \times 10^{-5}$), genes essential in mice[53-55] (4.0x enrichment; p = $1.1 \times 10^{-10}$), and genes nearest to GWAS peaks[56] (3.9x enrichment; p = $5.0 \times 10^{-46}$).
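
The enrichment statistic and the multiple-testing thresholds used in this paragraph are simple arithmetic, sketched below. The function names and the numeric inputs to `enrichment` are hypothetical and chosen only to give a round answer; the counts 827, 26, and 10 are taken from the text.

```python
def enrichment(h2med_set, h2med_total, n_genes_set, n_genes_total):
    """Enrichment = (proportion of mediated heritability) / (proportion of genes)."""
    return (h2med_set / h2med_total) / (n_genes_set / n_genes_total)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Number of gene set-complex trait pairs tested: 827 gene sets x 26 traits.
N_PAIRS = 827 * 26

if __name__ == "__main__":
    # Hypothetical gene set holding 5% of genes but 20% of mediated heritability.
    print(enrichment(0.02, 0.10, 1000, 20000))
    print(N_PAIRS)
    print(bonferroni_threshold(0.05, 10))
```

An enrichment of 1 means a gene set mediates heritability in proportion to its size; values above 1 indicate that its genes carry more mediated heritability per gene than average.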
Figure 5.

Expression-mediated heritability enrichment estimates for functional gene sets.

For all plots, x axis represents complex traits and y axis represents gene sets. P-values for enrichment are obtained using a two-tailed z-test using jackknife standard errors for $h^2_{med}$ enrichment. (a) $h^2_{med}$ enrichment estimates for 10 broadly essential gene sets meta-analyzed across 26 complex traits. $h^2_{med}$ enrichment estimates for individual traits can be found in Extended Data 6. Error bars represent standard errors from random-effects meta-analysis. (b) For ease of display, we report $h^2_{med}$ enrichment estimates for a representative set of 14 pathway-specific gene sets across 10 complex traits. $h^2_{med}$ enrichment estimates for additional complex traits and gene sets can be found in Extended Data 7 and Supplementary Table 8. (c) $h^2_{med}$ enrichment estimates for 37 gene sets corresponding to specifically expressed genes in 37 GTEx tissues. Brain tissues (13 total) are indicated as such in the figure. $h^2_{med}$ enrichment estimates for additional complex traits, with individual GTEx tissues labelled, can be found in Supplementary Figure 7. LoF, loss of function.

Extended Data Fig. 6

$h^2_{med}$ enrichment estimates for all 10 broadly essential gene sets across all 26 complex traits.

Same as Figure 5a, but showing enrichment estimates for individual traits rather than meta-analyzed estimates.

Of the 780 pathway gene sets, we observed that 97 had a significant enrichment (q-value < 0.05) in at least one of the 26 complex traits. In Figure 5b, we show the enrichment estimates of a representative set of 140 gene set-complex trait pairs (full results in Extended Data 7). Most gene sets exhibited highly trait-specific patterns of enrichment that were consistent with the known biology of the trait, including fragile X mental retardation protein (FMRP)-interacting genes for schizophrenia[57,58], Wnt signaling for bone density[59], and hemostasis for platelet count[60].
Extended Data Fig. 7

$h^2_{med}$ enrichment estimates for 97 pathway-specific gene sets across all 26 complex traits.

Same as Figure 5b, but plotting all pathway-specific gene sets (out of 780 total) with FDR-significant enrichment in at least one of the 26 complex traits. For ease of display, we grouped together related traits and gene sets.

Finally, we investigated whether genes specifically expressed in 37 different GTEx tissues[51] were enriched for $h^2_{med}$. We found significant $h^2_{med}$ enrichment (q-value < 0.05) of genes specifically expressed in brain tissues for brain-related traits (schizophrenia and years of education) (Figure 5c), demonstrating that the complex trait heritability of SNPs near genes specifically expressed in causal tissues (at least for the two traits here) is in part mediated by the expression of those genes. Given that MESC can be used to prioritize disease-relevant gene sets based on the magnitude of their $h^2_{med}$ enrichment, it falls alongside a large class of methods that aim to perform gene set enrichment analysis from GWAS data[61-67]. We compared results from MESC to two other popular gene set enrichment methods applied to the same GWAS summary statistics we analyzed, MAGMA[64] and DEPICT[63]. We observed that MESC highlighted both broadly concordant and unique gene sets compared to these other methods (Supplementary Note; Extended Data 8; Supplementary Table 9).
Extended Data Fig. 8

Comparison between gene set enrichment estimates from MESC, MAGMA, and DEPICT.

See Supplementary Note for details on these analyses. (a) Venn diagram showing the overlap between significantly enriched trait-gene set pairs (FDR < 0.05) identified by the three methods. (b) Scatterplots of -log10 enrichment p-values from MESC vs. MAGMA (left), MESC vs. DEPICT (middle), and MAGMA vs. DEPICT (right). Each point represents a trait-gene set pair. (c) List of all 32 gene set-complex trait pairs detected as significant by MESC (FDR q-value < 0.05) that are not detected as significant by MAGMA or DEPICT. See Supplementary Table 9 for enrichment estimates for all gene set-complex trait pairs.

Discussion

We have developed a new method, mediated expression score regression (MESC), to estimate complex trait heritability mediated by the cis-genetic component of assayed expression levels ($h^2_{med}$) from GWAS summary statistics and eQTL effect sizes estimated from an external expression panel. Our method is distinct from existing methods that identify and quantify overlap between eQTLs and GWAS hits (including colocalization tests[6-10], transcriptome-wide association studies[11-14,16], and heritability partitioning by eQTL status[25-28]) in that it specifically aims to distinguish directional mediated effects from non-directional pleiotropic and linkage effects. Moreover, our polygenic approach does not require individual eQTLs or GWAS loci to be significant and is not impacted by the sparsity of eQTL effect sizes, so unlike other approaches[9-13,27] we do not exclude genes or SNPs from our analyses based on any significance thresholds. We applied our method to summary statistics for 42 traits and eQTL effect sizes estimated from 48 GTEx tissues. We show that across traits, a significant but modest proportion of complex trait heritability ($h^2_{med}/h^2_g$ = 0.11 ± 0.02) is mediated by the cis-genetic component of assayed expression levels. Though many previous approaches have hypothesized that SNPs impact complex traits by directly modulating gene expression levels, our results provide concrete genome-wide evidence for this hypothesis. On the other hand, the fact that our $h^2_{med}/h^2_g$ estimates are low for most traits suggests that eQTLs estimated from steady-state expression in bulk post-mortem tissues from GTEx do not capture most of the mediated effect of complex trait heritability, motivating additional assays to better identify molecular mechanisms impacted by regulatory GWAS variants. 
There are two possible explanations for our low $h^2_{med}/h^2_g$ estimates:

1. The proportion of complex trait heritability mediated by the cis-genetic component of gene expression levels is in fact high in causal cell types/contexts for the trait, but eQTL data from bulk assayed tissues from GTEx are a poor proxy for eQTL data in causal cell types/contexts, causing our estimates of $h^2_{med}/h^2_g$ to be low. In other words, $h^2_{med}/h^2_g$ in causal cell types/contexts is high, while $h^2_{med;A}(T)/h^2_{med}$, the fraction of it captured by the assayed tissues (see Methods), is low. Low $h^2_{med;A}(T)/h^2_{med}$ may be addressed by larger assays measuring context-specific expression[33,35] and/or single-cell expression[34].
2. The proportion of complex trait heritability mediated by the cis-genetic component of gene expression levels is low even in causal cell types/contexts for the trait. In particular, complex trait heritability may be mediated in ways other than through gene expression levels in cis, including through protein-coding changes, splicing, or expression levels in trans. In these scenarios, additional assays such as splicing[68], histone mark[4], chromosome conformation[69], and trans-eQTL[32] assays can potentially be informative for probing other molecular mechanisms impacted by GWAS variants. We note that much larger gene expression assays than currently available are necessary to estimate heritability mediated by gene expression levels in trans using MESC (Supplementary Note).

We anticipate that MESC can be used to estimate the proportion of disease heritability mediated by future QTL studies beyond cis-eQTLs.

We considered several other explanations for our low $h^2_{med}/h^2_g$ estimates and argue that they do not apply to our analysis. Our low estimates are not related to the fact that expression cis-heritability is also low, since the level of environmental/stochastic noise in gene expression measurements does not affect our estimates (Supplementary Note). Moreover, our estimates are not biased by rare variant effects on gene expression[39,40], since we only aim to estimate the proportion of common disease heritability mediated by gene expression levels (Supplementary Note).
We observed that expression scores meta-analyzed across tissues gave us higher estimates of $h^2_{med}/h^2_g$ than individual-tissue expression scores. This result is consistent with previous studies that reported higher heritability enrichment of cis-eQTLs meta-analyzed across all GTEx tissues compared to individual tissues[27], higher prediction accuracy for imputed expression using joint prediction from multiple tissues compared to individual tissues[70], and high cis-genetic correlations of expression between tissues overall[71,72].

We observed a strong inverse relationship between proportion of $h^2_{med}$ and expression cis-heritability across genes, suggesting that genes with low expression cis-heritability have large effects on complex traits. This result suggests that integrative association tests that prioritize genes based on probability of colocalization between eQTLs and GWAS hits[6,8,9] and/or significance of genetic correlation between expression and trait[11-13] may not detect the most mechanistically important genes, since these methods have lower power for genes with weaker eQTLs. Instead, our result suggests that genes with weaker eQTLs should be prioritized, and it motivates the implementation of larger eQTL studies and/or cell-type-specific assays to more accurately detect these weak eQTLs.

There are several limitations to our method. First, our method assumes that the magnitude of eQTL effect sizes is uncorrelated with the magnitude of both gene-trait effect sizes and non-mediated effect sizes within each SNP and gene category included in the model. Although we have evaluated the robustness of our choice of SNP and gene categories in both simulations and real data, these assumptions may still be violated. Second, our method relies on the accurate estimation of expression scores from external expression panel samples.
In order for our method to be well-powered, it requires large expression panel sample sizes that can only be obtained through meta-analysis across individual tissues at current sample sizes. Third, the quantity that our method estimates in practice (i.e. heritability mediated by assayed gene expression levels) can potentially be much smaller than the theoretical quantity of heritability mediated by expression levels in causal cell types/contexts if assayed gene expression levels do not adequately capture expression levels in causal cell types/contexts. Fourth, our method can only provide reliable enrichment estimates for large gene sets on the order of 200 or more genes, so smaller gene pathways or individual genes cannot be prioritized using our method. Fifth, our method does not capture non-additive effects of SNPs on gene expression or gene expression on trait. Despite these limitations, our method provides a novel framework to distinguish mediated effects from pleiotropic and linkage effects and will be useful for quantifying the improvement of new molecular QTL studies over existing assays in capturing regulatory disease mechanisms. Moreover, partitioning mediated heritability can provide insight into regulatory effects mediated by specific gene sets or pathways.

Methods

Definition of $h^2_{med}$

We model trait y for N individuals as follows:

$$y = X\gamma + XB\alpha + \epsilon \tag{1}$$

where y is an N-vector of phenotypes (standardized to mean 0 and variance 1), X is an N × M genotype matrix for M SNPs (standardized to mean 0 and variance 1), γ is an M-vector of non-mediated SNP effect sizes on the trait (including pleiotropic, linkage, and trans-eQTL-mediated effects), B is an M × G matrix of cis-eQTL effect sizes in the causal cell types/contexts for G genes, α is a G-vector of causal gene expression effect sizes on the trait, and ϵ is an N-vector of environmental effects. We treat all variables as random. We define $h^2_{med}$ as follows:

$$h^2_{med} = \mathrm{Var}(XB\alpha)$$

Under the assumption that α and β are independent of each other, we can rewrite this as follows:

$$\begin{aligned} h^2_{med} &= \mathrm{Var}(XB\alpha) \\ &= E[\mathrm{Var}(XB\alpha \mid B, \alpha)] \\ &= G\,E[\alpha^2]\,E[h^2_{cis}] \end{aligned}$$

where $E[\alpha^2]$ is the average squared per-gene effect of expression on trait and $E[h^2_{cis}]$ is the average cis-heritability of expression across all genes. The second line above follows from the first because $E[XB\alpha \mid B, \alpha] = 0$. We define $h^2_{nmed}$ in a similar fashion:

$$h^2_{nmed} = \mathrm{Var}(X\gamma) = M\,E[\gamma^2]$$

where $E[\gamma^2]$ is the average squared per-SNP effect on trait that is not mediated by gene expression. We consider additional expression causality scenarios, such as reverse mediation, cis-by-trans mediation, and mediation by unobserved intermediaries (Supplementary Figure 8), and we argue that these scenarios do not compromise our definition of $h^2_{med}$ (Supplementary Note).

In practice, expression levels in causal cell types/contexts for the complex trait are likely not assayed. Given a set of assayed tissues T (which may or may not be causal for the complex trait), we define $h^2_{med;A}(T)$ as follows:

$$h^2_{med;A}(T) = r_g^2(T)\,h^2_{med}$$

while we define $h^2_{nmed;A}(T)$ as $h^2_g - h^2_{med;A}(T)$, where $h^2_g$ is the total SNP heritability of the trait. Here, $r_g^2(T)$ denotes the average squared genetic correlation between expression in assayed tissues T vs. in causal cell types/contexts, where $\beta'_i$ represents cis-eQTL effect sizes on gene i in T. Note that $\beta'$ can refer to either single-tissue or meta-tissue cis-eQTL effect sizes, depending on whether T contains one or multiple tissues.
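As a numerical illustration of this definition, the following minimal Python sketch simulates model (1) and checks that the empirical variance of the mediated component XBα is close to G·E[α²]·E[h²_cis]. The sample sizes, effect-size distributions, the absence of LD between SNPs, and all variable names are assumptions made for illustration only; they are not the settings used in our analyses.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, G = 2_000, 400, 50            # individuals, SNPs, genes (illustrative sizes)
h2_cis = 0.1                        # cis-heritability assigned to every gene
h2_med_target, h2_nmed_target = 0.2, 0.3

# Standardized genotypes drawn independently, i.e. no LD (simplifying assumption).
X = rng.standard_normal((N, M))

# B is M x G: gene i gets 4 private eQTL SNPs whose squared effects sum to h2_cis.
B = np.zeros((M, G))
for i in range(G):
    signs = rng.choice([-1.0, 1.0], size=4)
    B[4 * i:4 * i + 4, i] = signs * np.sqrt(h2_cis / 4)

# Gene-trait effects alpha and non-mediated SNP effects gamma.
alpha = rng.standard_normal(G) * np.sqrt(h2_med_target / (G * h2_cis))
gamma = rng.standard_normal(M) * np.sqrt(h2_nmed_target / M)
eps = rng.standard_normal(N) * np.sqrt(1 - h2_med_target - h2_nmed_target)

mediated = X @ B @ alpha            # per-individual mediated component
y = X @ gamma + mediated + eps      # model (1)

# G * E[alpha^2] * E[h2_cis], evaluated on the realized effect sizes.
h2_med_formula = G * np.mean(alpha**2) * np.mean((B**2).sum(axis=0))
h2_med_empirical = mediated.var()
print(f"formula: {h2_med_formula:.3f}  empirical: {h2_med_empirical:.3f}")
```

Because the effect sizes are drawn once, both quantities fluctuate around the target value of 0.2 rather than matching it exactly; the point of the sketch is that they agree with each other.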

Unstratified MESC

For illustrative purposes, we walk through a derivation for MESC in the idealized scenario in which we know (1) the true eQTL effect sizes, β, of each SNP on each gene and (2) the true phenotypic effect sizes, ω, of each SNP on y. Under the generative model (1), the total effect of SNP k on the complex trait is

$$\omega_k = \gamma_k + \sum_{i=1}^{G} \beta_{ik}\,\alpha_i$$

Given conditional independence of α and γ given β, upon squaring ω we have

$$E[\omega_k^2 \mid \beta] = E[\alpha^2 \mid \beta] \sum_{i=1}^{G} \beta_{ik}^2 + E[\gamma^2 \mid \beta]$$

Assuming unconditional independence of α and γ (which requires that we make additional effect size independence assumptions involving β; see “Model assumptions”), this simplifies to

$$E[\omega_k^2 \mid \beta] = E[\alpha^2] \sum_{i=1}^{G} \beta_{ik}^2 + E[\gamma^2] \tag{2}$$

We use equation (2) to estimate $E[\alpha^2]$ by regressing $\omega_k^2$ for all SNPs on $\sum_i \beta_{ik}^2$ and taking the slope, while we estimate $E[\gamma^2]$ by taking the intercept. See Figure 1d for a plot illustrating this approach. $E[\alpha^2]$ can be multiplied by $G\,E[h^2_{cis}]$ to obtain $h^2_{med}$, while $E[\gamma^2]$ can be multiplied by M to obtain $h^2_{nmed}$. When we perform this regression using eQTL effect sizes obtained from non-causal tissues T with squared genetic correlation $r_g^2(T)$ with the causal tissue(s), we obtain an estimate of the quantity $h^2_{med;A}(T)$ rather than $h^2_{med}$ (Supplementary Note). Moreover, in practice we perform this regression using GWAS and eQTL summary statistics, in which case we account for differences in LD between SNPs with an LD score covariate (see Supplementary Note for derivation and regression equation).
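The idealized regression can be sketched in a few lines of Python. The example below draws true effect sizes, forms ω for every SNP, and regresses ω² on the per-SNP sum of squared eQTL effects; the slope and intercept should then land near the simulated E[α²] and E[γ²]. All numerical settings, the contiguous placement of eQTLs, and the use of `numpy.polyfit` for the fit are assumptions made for illustration, not a description of the MESC software.

```python
import numpy as np

rng = np.random.default_rng(1)
M, G, n_eqtl = 50_000, 5_000, 5       # SNPs, genes, eQTLs per gene (illustrative)
E_alpha2, E_gamma2 = 2e-3, 1e-5       # true values the regression should recover

# Sparse eQTL effects: each gene has n_eqtl adjacent eQTL SNPs at a random position.
starts = rng.integers(0, M - n_eqtl + 1, size=G)
snp_idx = (starts[:, None] + np.arange(n_eqtl)).ravel()
gene_idx = np.repeat(np.arange(G), n_eqtl)
beta = rng.normal(0.0, 0.1, size=G * n_eqtl)

alpha = rng.normal(0.0, np.sqrt(E_alpha2), size=G)
gamma = rng.normal(0.0, np.sqrt(E_gamma2), size=M)

# omega_k = gamma_k + sum_i beta_ik * alpha_i
omega = gamma.copy()
np.add.at(omega, snp_idx, beta * alpha[gene_idx])

# Regressor: sum_i beta_ik^2 for every SNP k.
sum_beta2 = np.zeros(M)
np.add.at(sum_beta2, snp_idx, beta**2)

# Slope estimates E[alpha^2]; intercept estimates E[gamma^2].
slope, intercept = np.polyfit(sum_beta2, omega**2, deg=1)
print(f"E[alpha^2] estimate: {slope:.2e}, E[gamma^2] estimate: {intercept:.2e}")
```

Multiplying the slope by the total of `sum_beta2` (which equals G times the average cis-heritability in this setup) and the intercept by M would give the mediated and non-mediated heritability, respectively.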

Model assumptions

The two main effect size independence assumptions that are needed to derive equation (2) are:

1. Across all genes, the magnitude of gene effect sizes is uncorrelated with the magnitude of eQTL effect sizes (i.e. $\mathrm{Cov}(\alpha^2, \beta^2) = 0$). We refer to this assumption as gene-eQTL effect size independence.
2. Across all SNPs, the magnitude of non-mediated SNP effect sizes is uncorrelated with the magnitude of eQTL effect sizes (i.e. $\mathrm{Cov}(\gamma^2, \beta^2) = 0$). We refer to this assumption as pleiotropy-eQTL effect size independence.

Violations of either of these two assumptions will result in biased estimates of $h^2_{med}$, where the direction of bias is the same as the direction of correlation between eQTL effect size magnitude and gene or non-mediated effect size magnitude. See Supplementary Note for a discussion of realistic scenarios in which these assumptions might be violated, as well as an illustration of how conditioning on SNP- and gene-level annotations can ameliorate any resulting bias.

Stratified MESC

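In this section, we extend unstratified MESC to estimate $h^2_{med}$ partitioned over groups of genes. Note that stratified MESC can be viewed as a special form of stratified LD score regression[2] (Supplementary Note). Given D potentially overlapping gene categories $\mathcal{D}_1, …, \mathcal{D}_D$, we define $h^2_{med}$ partitioned over gene categories as follows:

$$\begin{aligned} h^2_{med}(\mathcal{D}_d) &= E\Big[\sum_{i \in \mathcal{D}_d} \alpha_i^2\, h^2_{cis;i}\Big] \\ &= |\mathcal{D}_d|\; E[\alpha^2 \mid i \in \mathcal{D}_d]\; E[h^2_{cis} \mid i \in \mathcal{D}_d] \end{aligned}$$

where $h^2_{med}(\mathcal{D}_d)$ is the heritability mediated in cis through the expression of genes in category $\mathcal{D}_d$, $|\mathcal{D}_d|$ is the number of genes in $\mathcal{D}_d$, $E[\alpha^2 \mid i \in \mathcal{D}_d]$ is the average squared causal effect of expression on trait for genes in $\mathcal{D}_d$, and $E[h^2_{cis} \mid i \in \mathcal{D}_d]$ is the average cis-heritability of expression of genes in $\mathcal{D}_d$. Similar to our definition of $h^2_{med}$, the second line above relies on an independence assumption between α and β, namely that $\alpha \perp \beta \mid i \in \mathcal{D}_d$. For gene i, we model the variance of gene effect size $\alpha_i$ as

$$\mathrm{Var}(\alpha_i) = \sum_{d:\, i \in \mathcal{D}_d} \pi_d$$

If gene categories form a disjoint partition of the set of all genes, we have

$$\pi_d = E[\alpha^2 \mid i \in \mathcal{D}_d]$$

On the other hand, if gene categories are overlapping, then $\pi_d$ can be conceptualized as the contribution of annotation $\mathcal{D}_d$ to $h^2_{med}$ conditional on contributions from all other gene categories included in the model.

Given C potentially overlapping SNP categories $\mathcal{C}_1, …, \mathcal{C}_C$, we define $h^2_{nmed}$ partitioned over SNP categories as follows:

$$h^2_{nmed}(\mathcal{C}_c) = |\mathcal{C}_c|\; E[\gamma^2 \mid j \in \mathcal{C}_c]$$

where $h^2_{nmed}(\mathcal{C}_c)$ is the non-mediated heritability of SNPs in category $\mathcal{C}_c$, $|\mathcal{C}_c|$ is the number of SNPs in $\mathcal{C}_c$, and $E[\gamma^2 \mid j \in \mathcal{C}_c]$ is the average squared non-mediated effect size of SNPs in $\mathcal{C}_c$. For SNP j, we model the variance of non-mediated effect size $\gamma_j$ as follows:

$$\mathrm{Var}(\gamma_j) = \sum_{c:\, j \in \mathcal{C}_c} \tau_c$$

If SNP categories form a disjoint partition of the set of all SNPs, we have

$$\tau_c = E[\gamma^2 \mid j \in \mathcal{C}_c]$$

On the other hand, if SNP categories are overlapping, then $\tau_c$ can be conceptualized as the contribution of annotation $\mathcal{C}_c$ to $h^2_{nmed}$ conditional on contributions from all other SNP categories included in the model.

The equation for stratified MESC is

$$E[\chi^2_k] = N \sum_{c} \tau_c\, \ell_{k;c} + N \sum_{d} \pi_d\, \mathcal{L}_{k;d} + 1 \tag{3}$$

where $\chi^2_k$ is the GWAS χ²-statistic of SNP k, N is the number of samples, $\ell_{k;c}$ is the LD score of SNP k with respect to SNP category $\mathcal{C}_c$ (defined as $\ell_{k;c} = \sum_{j \in \mathcal{C}_c} r_{jk}^2$), and $\mathcal{L}_{k;d}$ is the expression score of SNP k with respect to gene category $\mathcal{D}_d$ (defined as $\mathcal{L}_{k;d} = \sum_{i \in \mathcal{D}_d} \sum_{j=1}^{M} r_{jk}^2\, \beta_{ij}^2$). Here, $r_{jk}$ refers to the LD between SNPs j and k.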
See Supplementary Note for a derivation of this equation. Analogous to unstratified MESC, when we perform this regression using expression scores in assayed tissues T rather than expression scores in causal cell types/contexts, we will estimate $h^2_{med}(\mathcal{D}_d)\, r_g^2(T, \mathcal{D}_d)$, where $r_g^2(T, \mathcal{D}_d)$ is the average squared genetic correlation of expression between T and causal cell types/contexts for genes in $\mathcal{D}_d$.
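The stratified regression can be illustrated with a small Python sketch that regresses χ² statistics jointly on LD scores and expression scores. This is a deliberately simplified version: it assumes an unweighted least-squares fit with the intercept fixed at 1, and it is checked only on noise-free synthetic input built from the expected χ² values. The function name `fit_stratified_mesc` and all numerical settings are hypothetical and do not describe the MESC software.

```python
import numpy as np

def fit_stratified_mesc(chi2, ld_scores, expr_scores, n_gwas):
    """Unweighted least-squares fit of the stratified equation, intercept fixed at 1.

    chi2:        (M,) GWAS chi-square statistics
    ld_scores:   (M, C) LD scores, one column per SNP category
    expr_scores: (M, D) expression scores, one column per gene category
    Returns (tau, pi), the per-category coefficients.
    """
    design = n_gwas * np.hstack([ld_scores, expr_scores])
    coef, *_ = np.linalg.lstsq(design, chi2 - 1.0, rcond=None)
    n_snp_cats = ld_scores.shape[1]
    return coef[:n_snp_cats], coef[n_snp_cats:]

# Noise-free check: build chi2 from its expectation and recover the coefficients.
rng = np.random.default_rng(2)
M, N = 5_000, 10_000
ld = rng.gamma(shape=2.0, scale=50.0, size=(M, 2))     # synthetic LD scores
ex = rng.gamma(shape=2.0, scale=0.05, size=(M, 2))     # synthetic expression scores
tau_true = np.array([2e-6, 5e-6])
pi_true = np.array([1e-3, 4e-3])
chi2_expected = N * (ld @ tau_true + ex @ pi_true) + 1.0

tau_hat, pi_hat = fit_stratified_mesc(chi2_expected, ld, ex, N)
print(tau_hat, pi_hat)
```

With real χ² statistics the fit is noisy, so standard errors (for example from a block jackknife, as used for the error bars in our figures) are needed to interpret the coefficients.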

Estimation of expression scores

In order to carry out the regression described in equation (3), we must first estimate expression scores $\mathcal{L}_{k;D}$ (where $\mathcal{L}_{k;D} = \sum_{i \in D} \sum_{j} r_{jk}^2\, \beta_{ij}^2$) from an external expression panel. We estimate $\mathcal{L}_{k;D}$ from either eQTL summary statistics or individual-level genotypes and expression measurements, where the latter provides less noisy estimates of $\mathcal{L}_{k;D}$ when it is available. In our case, we used the first procedure to estimate expression scores from eQTLGen data (since only eQTL summary statistics are provided), whereas we used the second procedure for GTEx data.

eQTL summary statistics.

We can estimate $\mathcal{L}_{k;D}$ from eQTL summary statistics using the following formula: $\hat{\mathcal{L}}_{k;D} = \sum_{i \in D} \tilde{\beta}_{ik}^2 - \frac{|D|}{N}$, where $\tilde{\beta}_{ik}$ is the marginal OLS eQTL effect size estimate of SNP k on gene i, |D| is the number of genes in gene category D, and N is the number of expression panel samples. The right-hand side of the formula is in expectation equal to $\mathcal{L}_{k;D}$ (Supplementary Note).
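A minimal Python sketch of this summary-statistic estimator follows. It assumes a dense genes-by-SNPs array of marginal effect estimates on standardized data and models their sampling noise as having variance 1/N; the function name `expression_scores_from_sumstats` and the simulated inputs are hypothetical. The comparison with the uncorrected sum of squares shows why the |D|/N term is subtracted.

```python
import numpy as np

def expression_scores_from_sumstats(beta_marginal, in_category, n_expr):
    """beta_marginal: (G, M) marginal OLS eQTL estimates on standardized data.
    in_category: boolean mask of length G selecting the genes in category D.
    n_expr: expression panel sample size.
    Returns a length-M array of estimated expression scores."""
    selected = beta_marginal[in_category]
    return (selected**2).sum(axis=0) - selected.shape[0] / n_expr

# Illustration with hypothetical inputs: true marginal effects plus sampling noise.
rng = np.random.default_rng(3)
G, M, N_expr = 50, 2_000, 500
true_marginal = rng.normal(0.0, 0.05, size=(G, M)) * (rng.random((G, M)) < 0.02)
estimates = true_marginal + rng.normal(0.0, np.sqrt(1.0 / N_expr), size=(G, M))

mask = np.ones(G, dtype=bool)                    # category containing all genes
true_scores = (true_marginal**2).sum(axis=0)
corrected = expression_scores_from_sumstats(estimates, mask, N_expr)
naive = (estimates**2).sum(axis=0)               # no |D|/N correction

print("mean bias, corrected:", np.mean(corrected - true_scores))
print("mean bias, naive:    ", np.mean(naive - true_scores))
```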

Individual-level genotypes and expression data.

We estimate $\mathcal{L}_{k;D}$ by first using LASSO[73] to obtain regularized estimates of causal eQTL effect sizes ($\hat{\beta}_{ij}$), then multiplying by the element-wise squared LD matrix $R^2$ as follows: $\hat{\mathcal{L}}_{k;D} = \sum_{i \in D} \sum_{j} \hat{r}^2_{adj;jk}\, (c_i \hat{\beta}_{ij})^2$. Here, $c_i$ is a scaling factor we apply to $\hat{\beta}_i$ so that $\sum_j (c_i \hat{\beta}_{ij})^2 = \hat{h}^2_{cis;i}$, where $\hat{h}^2_{cis;i}$ is the restricted maximum likelihood (REML) estimate of expression cis-heritability for gene i. We observed that scaling our estimates in this manner reduces noise and bias compared to unscaled estimates (Supplementary Figure 9). We obtain approximately unbiased estimates of the squared LD between two SNPs using the formula $\hat{r}^2_{adj} = \hat{r}^2 - \frac{1 - \hat{r}^2}{N - 2}$, where $\hat{r}^2$ denotes the standard biased estimator of $r^2$ and N is the number of samples used to estimate LD. We refer to this overall procedure as “LASSO with REML correction” and show that it provides the best performance in simulations compared to other methods (Supplementary Note).
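The scaling and LD-multiplication steps can be sketched as follows in Python. The sketch takes LASSO effect estimates and REML cis-heritability estimates as given inputs (it does not run LASSO or REML) and assumes that the scaling targets the sum of squared effects for each gene. The function names `adjusted_r2` and `expression_scores_from_weights` and the toy inputs are hypothetical.

```python
import numpy as np

def adjusted_r2(r2_hat, n_ref):
    """Approximately unbiased squared LD from the biased sample estimate."""
    return r2_hat - (1.0 - r2_hat) / (n_ref - 2)

def expression_scores_from_weights(beta_hat, h2cis_hat, r2):
    """beta_hat: (G, M) LASSO effect estimates; h2cis_hat: (G,) REML estimates;
    r2: (M, M) element-wise squared LD matrix. Returns length-M expression scores."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    h2cis_hat = np.asarray(h2cis_hat, dtype=float)
    sum_sq = (beta_hat**2).sum(axis=1)
    # c_i^2 is chosen so that sum_j (c_i * beta_ij)^2 equals h2cis_hat_i; genes with
    # all-zero LASSO weights contribute nothing.
    c2 = np.divide(h2cis_hat, sum_sq, out=np.zeros_like(sum_sq), where=sum_sq > 0)
    scaled_sq = (beta_hat**2) * c2[:, None]
    return np.asarray(r2, dtype=float) @ scaled_sq.sum(axis=0)

# Toy input: 2 genes, 3 SNPs; SNPs 0 and 1 are in partial LD (r^2 = 0.5).
beta_hat = np.array([[0.1, 0.0, 0.0],
                     [0.0, 0.2, 0.2]])
h2cis_hat = np.array([0.04, 0.08])
r2 = np.array([[1.0, 0.5, 0.0],
               [0.5, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
scores = expression_scores_from_weights(beta_hat, h2cis_hat, r2)
print(scores)
```

In the toy input, the first gene's single weight is rescaled from a squared effect of 0.01 to 0.04 to match its cis-heritability estimate, and SNP 1 picks up half of that signal through LD.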

Meta-analysis of expression scores

Given our method of computing expression scores from individual-level genotypes and expression data outlined above, we meta-analyze expression scores across tissues as follows. We first obtain meta-tissue expression cis-heritability ($h^2_{cis}$) estimates for each gene by averaging individual-tissue estimates across tissues. We scale individual-tissue LASSO-predicted causal eQTL effect sizes to the meta-tissue $h^2_{cis}$ (see above), then average the scaled causal eQTL effect sizes across tissues. Finally, we multiply the averaged causal eQTL effect sizes by the element-wise squared LD matrix to obtain expression scores. In simulations, we show that this method of meta-analyzing expression scores produces nearly unbiased estimates of $h^2_{med}$ at 5 tissues × 200 samples per tissue (Supplementary Figure 10), which is comparable to the number of expression panel samples in a given tissue group (Supplementary Table 2).
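The three meta-analysis steps can be sketched in Python as below. The sketch assumes that every gene has LASSO estimates in every tissue (tissues with all-zero weights for a gene are simply averaged in as zeros, which is a simplification) and that scaling targets the sum of squared effects. The function name `meta_tissue_weights` and the toy inputs are hypothetical.

```python
import numpy as np

def meta_tissue_weights(beta_by_tissue, h2cis_by_tissue):
    """beta_by_tissue: (T, G, M) LASSO estimates; h2cis_by_tissue: (T, G).
    Returns ((G, M) averaged scaled effects, (G,) meta-tissue cis-heritability)."""
    beta = np.asarray(beta_by_tissue, dtype=float)
    # Step 1: meta-tissue cis-heritability = average across tissues.
    h2_meta = np.asarray(h2cis_by_tissue, dtype=float).mean(axis=0)
    # Step 2: scale each tissue's effects so their squared sum equals h2_meta.
    sum_sq = (beta**2).sum(axis=2)                                  # (T, G)
    ratio = np.divide(h2_meta[None, :], sum_sq,
                      out=np.zeros_like(sum_sq), where=sum_sq > 0)
    scaled = beta * np.sqrt(ratio)[:, :, None]
    # Step 3: average the scaled effects across tissues.
    return scaled.mean(axis=0), h2_meta

# Toy input: 2 tissues, 1 gene, 2 SNPs.
beta_by_tissue = np.array([[[0.1, 0.0]],
                           [[0.0, 0.3]]])
h2cis_by_tissue = np.array([[0.02],
                            [0.06]])
weights, h2_meta = meta_tissue_weights(beta_by_tissue, h2cis_by_tissue)

r2 = np.eye(2)                                   # no LD between the two SNPs
meta_scores = r2 @ (weights**2).sum(axis=0)      # expression scores
print(weights, h2_meta, meta_scores)
```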

Simulations

All simulations were conducted using genotypes from UK Biobank[38] restricted to HapMap 3 SNPs[74] on chromosome 1 (M = 98,499 SNPs). All simulations followed the same overall procedure, outlined below in chronological order. See Supplementary Note for specific parameters used in each simulation.

1. Simulation of expression data. We simulated 1–5 eQTLs each for G = 1,000 genes, with effect sizes drawn from a normal distribution and locations randomly selected in a 1 Mb window around the gene. Total $h^2_{cis}$ was fixed at 0.05 for all simulations. We then simulated expression phenotypes for 100–1,000 expression panel samples (genotypes randomly selected from UK Biobank) using an additive generative model with normally distributed environmental noise added, representing an expression panel.
2. Simulation of GWAS data. We simulated non-mediated SNP effect sizes and gene-trait effect sizes from normal or point-normal distributions for all SNPs and genes corresponding to various levels of $h^2_{med}/h^2_g$. Total $h^2_g$ was fixed at 0.5 for all simulations (other than for Supplementary Figure 2, in which we varied $h^2_g$). Together with the eQTL effect sizes simulated in the previous step, we used these effect sizes to simulate trait phenotypes using an additive generative model with normally distributed environmental noise added for 10,000 GWAS samples (genotypes randomly selected from UK Biobank and distinct from the expression panel samples). We then produced GWAS summary statistics from this simulated data set using ordinary least squares.
3. Estimation of expression scores. We estimated expression scores from the expression panel samples using LASSO with REML correction (see “Estimation of expression scores” above). For computational ease, we did not actually use REML to estimate expression cis-heritability for each gene in each simulation; we instead took the true expression cis-heritability of the gene and added noise drawn from $N(0, 0.01^2)$ to simulate REML estimation error, which is consistent with empirical standard error estimates produced by GCTA (Supplementary Figure 11).
4. Estimation of $h^2_{med}$. We estimated $h^2_{med}$ using MESC with the previously estimated expression scores, in-sample LD scores (computed from the 10,000 GWAS samples), and GWAS summary statistics.

Data and quality control

Genotypes

For MESC, we used European samples in 1000G[75] as reference SNPs to compute LD scores. Regression SNPs were obtained from HapMap 3[74]. Notably, by restricting regression SNPs to HapMap 3 SNPs, we estimate common disease heritability mediated by gene expression levels (see Supplementary Note for discussion of rare vs. common variant $h^2_{med}$). SNPs with GWAS χ² statistics > max{80, 0.001N} (where N is the number of GWAS samples) and SNPs in the major histocompatibility complex (MHC) region were excluded. See Supplementary Note of ref.[2] for justification of these procedures. For computing expression scores, we downloaded genotypes derived from sequencing data for GTEx v7 from the GTEx Portal (Data Availability) as described in ref.[5]. We retained SNPs that were from HapMap 3[74].
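The SNP exclusion rule can be written compactly in Python, as in the sketch below. The MHC boundaries used here (chromosome 6, 25 to 34 Mb) are a commonly used convention that we assume for illustration; they are not specified in the text above. The function name `keep_snp_mask` is hypothetical.

```python
import numpy as np

def keep_snp_mask(chi2, chrom, pos, n_gwas, mhc=(6, 25_000_000, 34_000_000)):
    """Boolean mask of regression SNPs to keep.

    Drops SNPs with chi2 > max(80, 0.001 * N) and SNPs inside the MHC region.
    The default MHC coordinates are an assumed convention, not taken from the paper.
    """
    chi2, chrom, pos = np.asarray(chi2), np.asarray(chrom), np.asarray(pos)
    threshold = max(80.0, 0.001 * n_gwas)
    in_mhc = (chrom == mhc[0]) & (pos >= mhc[1]) & (pos <= mhc[2])
    return (chi2 <= threshold) & ~in_mhc

chi2 = [10.0, 100.0, 400.0, 5.0]
chrom = [1, 1, 2, 6]
pos = [1_000_000, 2_000_000, 3_000_000, 30_000_000]

mask_large_n = keep_snp_mask(chi2, chrom, pos, n_gwas=323_000)   # threshold 323
mask_small_n = keep_snp_mask(chi2, chrom, pos, n_gwas=50_000)    # threshold 80
print(mask_large_n, mask_small_n)
```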

Expression data

We obtained processed and quantile normalized gene expression data for GTEx v7 from the GTEx Portal (Data Availability) as described in ref.[5]. For each tissue, the following covariates were included in all analyses: 3 genetic principal components, sex, platform, and 14–35 expression factors[76] as selected by the main GTEx analysis.

Estimation of expression scores from GTEx data

We used REML as implemented in GCTA[44] to estimate the expression cis-heritability for each gene in each individual GTEx tissue. We then used LASSO as implemented in PLINK[77] (with the LASSO tuning parameter set as the estimated expression cis-heritability of the gene) to estimate eQTL effect sizes for each gene in each individual GTEx tissue. In all procedures, we excluded gene-tissue pairs for which LASSO did not converge when predicting effect sizes. For Figure 3 and Extended Data 2, we obtained causal eQTL effect size estimates in three different ways:

Meta-analysis across all tissues.

For each gene, we averaged the expression cis-heritability estimates across all 48 tissues. Within each tissue, we scaled the LASSO-predicted eQTL effect sizes to the averaged cis-heritability value. We then averaged the scaled eQTL effect sizes for each gene across all tissues. Genes were retained if they had a LASSO-converged eQTL effect size in at least one tissue.

Meta-analysis in tissue groups.

Of the 48 tissues, we grouped together 37 of them into 7 broad tissue groups: adipose, blood/immune, cardiovascular, CNS, digestive, endocrine, and skin (Supplementary Table 2). Within each tissue group, we averaged the expression cis-heritability estimates for each gene and scaled the LASSO-predicted eQTL effect sizes to the averaged cis-heritability value. We then averaged the scaled eQTL effect sizes for each gene across the tissues for each tissue group. Genes were retained in each tissue group if they had a LASSO-converged eQTL effect size in at least one tissue within that tissue group.

Individual tissues.

For each individual tissue, we scaled the LASSO-predicted eQTL effect sizes to the within-tissue-group averaged cis-heritability estimates. The final eQTL effect sizes were then multiplied by the element-wise squared LD matrix (estimated from 1000G[75]) in order to obtain expression scores (see “Estimation of expression scores”).

Set of 42 independent traits

Analogous to previous studies[27,78], we initially considered a set of 34 traits from publicly available sources and 55 traits from UK Biobank for which GWAS summary statistics had been computed using BOLT-LMM v2.3[79,80] (see Data Availability). We restricted our analysis to 47 traits with z-scores of total SNP heritability above 6 (computed using stratified LD-score regression). The 47 traits included 5 traits that were duplicated across two datasets (genetic correlation of at least 0.9). For duplicated traits, we retained the data set with the larger sample size, leaving us with a total of 42 independent traits. When meta-analyzing estimates across traits, we performed random effects meta-analysis using the R package rmeta.
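For readers who want to see what a random-effects meta-analysis across traits involves, the Python sketch below implements a DerSimonian-Laird estimator by hand. This is a stand-in written for illustration: our analyses used the R package rmeta, and we have not verified that this sketch reproduces its output in every detail. The function name `random_effects_meta` and the example inputs are hypothetical.

```python
import numpy as np

def random_effects_meta(estimates, std_errors):
    """DerSimonian-Laird random-effects meta-analysis.

    Returns (pooled estimate, pooled standard error, between-study variance tau2).
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    w = 1.0 / se**2                                   # fixed-effect weights
    mu_fixed = np.sum(w * est) / np.sum(w)
    q = np.sum(w * (est - mu_fixed) ** 2)             # Cochran's Q
    k = est.size
    denom = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / denom)            # truncated at zero
    w_star = 1.0 / (se**2 + tau2)                     # random-effects weights
    mu = np.sum(w_star * est) / np.sum(w_star)
    return mu, np.sqrt(1.0 / np.sum(w_star)), tau2

# Two hypothetical per-trait estimates with equal standard errors.
mu, mu_se, tau2 = random_effects_meta([0.0, 0.2], [0.1, 0.1])
print(mu, mu_se, tau2)
```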

BaselineLD categories

In all our analyses, we stratified SNPs by 72 functional categories specified by the baselineLD model v2.0[2,43] (Data Availability). These annotations include coding, conserved, regulatory (e.g., promoter, enhancer, histone marks, transcription factor binding sites), and LD-related annotations. The original baselineLD model v2.0 contains 76 total categories; we removed 4 categories corresponding to QTL MaxCPP annotations[27] because the information contained in these annotations is redundant with the eQTL effect size information contained in expression scores.

Gene set analyses

In order to obtain unbiased estimates of $h^2_{med}$ enrichment for the gene sets in our analysis, we must ensure that the gene-eQTL effect size independence assumption holds within each gene set (see “Model assumptions” above). Thus, in order to capture potential correlations between the magnitude of eQTL effect sizes and gene-trait effect sizes within gene sets, we partitioned each gene set into three equally sized bins based on the magnitude of their expression cis-heritability relative to other genes in the gene set. We then estimated $h^2_{med}$ for each individual bin and aggregated these values to estimate the overall $h^2_{med}$ enrichment of the gene set.
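The binning step can be sketched in Python as follows. The sketch ranks genes within a gene set by their cis-heritability estimate and splits the ranking into three bins of (nearly) equal size; how ties and remainders are handled here is our assumption for illustration, and the function name `bin_genes_by_h2cis` is hypothetical.

```python
import numpy as np

def bin_genes_by_h2cis(h2cis, n_bins=3):
    """Assign each gene in a gene set to one of n_bins equally sized bins,
    ordered from lowest (bin 0) to highest cis-heritability."""
    h2cis = np.asarray(h2cis, dtype=float)
    order = np.argsort(h2cis, kind="stable")          # gene indices, low to high
    bins = np.empty(h2cis.size, dtype=int)
    for b, chunk in enumerate(np.array_split(order, n_bins)):
        bins[chunk] = b
    return bins

# Hypothetical cis-heritability estimates for a gene set of six genes.
h2cis = [0.30, 0.01, 0.20, 0.05, 0.10, 0.40]
bins = bin_genes_by_h2cis(h2cis)
print(bins)
```

Each bin is then treated as its own gene category in the stratified regression, so that the independence assumption only needs to hold within bins.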

Broad gene sets

We obtained gene sets corresponding to all coding genes, genes near significant GWAS hits in the NHGRI GWAS catalog[56], genes essential in mice[53-55], genes essential in cultured cell lines[81], genes with any disease association in ClinVar[82], and genes that are FDA-approved drug targets[52] from the Macarthur lab GitHub page (Data Availability). We obtained an additional gene set for genes essential in cell lines[83], genes depleted for protein-truncating mutations[46,47], and genes depleted for missense mutations[84] from the supplementary data of the respective papers.

Pathway gene sets

We initially considered a set of 7,246 gene sets from the “canonical pathways” and “GO gene sets” collections from the Molecular Signatures Database[85] (Data Availability), consisting of gene sets from BioCarta, Reactome, KEGG, GO, PID, and other sources. We restricted our analysis to 780 gene sets for which the number of genes with LASSO estimates of eQTL effect sizes that converged in individual GTEx tissues was at least 100 when averaged across all individual tissues. Note that this roughly corresponds to gene sets with greater than 200 total genes; see Supplementary Table 7.

Tissue specific expression gene sets

We initially considered the full set of 48 GTEx tissues. We restricted our analysis to 37 gene sets for which the focal tissue belonged to one of the 7 main tissue groups we defined in our previous analyses (Supplementary Table 2). From ref.[51], we obtained the set of 10% most specifically expressed genes in each of the 37 tissues.

Data Availability

GWAS summary statistics for 42 diseases and complex traits can be found at https://data.broadinstitute.org/alkesgroup/sumstats_formatted/. Genotypes for 1000 Genomes Phase 3 data can be found at ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502. GTEx v7 data can be found at https://www.gtexportal.org/home/datasets, though to access genotypes one is required to have an approved application. eQTLGen data can be found at https://www.eqtlgen.org/cis-eqtls.html. BaselineLD v2.0 annotations can be found at https://data.broadinstitute.org/alkesgroup/LDSCORE/. Gene sets can be found from the Macarthur lab, https://github.com/macarthur-lab/gene_lists, and Molecular Signatures Database, http://software.broadinstitute.org/gsea/msigdb/collections.jsp. S-LDSC software can be found at https://github.com/bulik/ldsc. BOLT-LMM software can be found at https://data.broadinstitute.org/alkesgroup/BOLT-LMM/downloads/.

Code Availability

Software implementing MESC can be found at https://github.com/douglasyao/mesc.

Relationship between $h^2_{med}$ and $h^2_g$.

$h^2_{med}$ estimates were obtained using all-tissue meta-analyzed expression scores. $h^2_g$ estimates were obtained using stratified LD-score regression. Error bars represent jackknife standard errors.

$h^2_{med}/h^2_g$ estimates for all diseases and expression scores.

Same as Figure 3a, but containing estimates for all 42 traits from all three types of expression scores: “All tissues” (expression scores meta-analyzed across all 48 GTEx tissues), “Best tissue group” (expression scores meta-analyzed within 7 tissue groups), and “Best tissue” (expression scores computed within individual tissues). Here, “best” refers to the tissue/tissue group resulting in the highest estimates of $h^2_{med}/h^2_g$ compared to all other tissues/tissue groups. Error bars represent jackknife standard errors.

Relationship between individual tissue sample size and magnitude of $h^2_{med}/h^2_g$.

$h^2_{med}/h^2_g$ estimates from expression scores estimated in each of 48 individual GTEx tissues were meta-analyzed across 42 complex traits, then plotted against the number of samples in each tissue. We use the following abbreviations: adipose visceral, adipose visceral omentum; brain ACC, brain anterior cingulate cortex BA24; brain CBG, brain caudate basal ganglia; brain CH, brain cerebellar hemisphere; brain FC, brain frontal cortex BA9; brain NABG, brain nucleus accumbens basal ganglia; brain PBG, brain putamen basal ganglia; cells CETL, cells EBV-transformed lymphocytes; cells TF, cells transformed fibroblasts; esophagus GJ, esophagus gastroesophageal junction; heart AA, heart atrial appendage; heart LV, heart left ventricle; skin NSES, skin not sun exposed suprapubic; skin SELL, skin sun exposed lower leg; small intestine, small intestine terminal ileum.

$h^2_{med}/h^2_g$ estimates for 42 diseases and complex traits using data from eQTLGen.

We estimated expression scores for all SNPs using cis-eQTL summary statistics from eQTLGen (N = 31,684 blood samples), then estimated $h^2_{med}$ using GWAS summary statistics for the same 42 traits analyzed in the main text. Expression cis-heritability estimates for eQTLGen data were obtained using LD-score regression. For the sake of comparison, we also display estimates obtained from expression scores from GTEx all-tissue meta-analysis and GTEx whole blood only. (a) $h^2_{med}/h^2_g$ estimates for 42 individual traits, organized into blood/immune and non-blood/immune traits. Error bars represent jackknife standard errors. (b) Results from (a) meta-analyzed across traits. Error bars represent standard errors from random-effects meta-analysis. Note that low estimates of $h^2_{med}/h^2_g$ for GTEx whole blood expression scores are caused by the small sample size of the GTEx whole blood data set (N = 369).

Relationship between expression cis-heritability and metrics of gene essentiality.

For each gene, pLI (probability of loss-of-function intolerance) was obtained from Lek et al. 2016 Nature and s (selection against protein-truncating variants) was obtained from Cassa et al. 2017 Nature Genetics.

$h^2_{med}$ enrichment estimates for all 10 broadly essential gene sets across all 26 complex traits.

Same as Figure 5a, but showing enrichment estimates for individual traits rather than meta-analyzed estimates.

$h^2_{med}$ enrichment estimates for 97 pathway-specific gene sets across all 26 complex traits.

Same as Figure 5b, but plotting all pathway-specific gene sets (out of 780 total) with FDR-significant enrichment in at least one of the 26 complex traits. For ease of display, we grouped together related traits and gene sets.

  78 in total

1.  Colocalization of GWAS and eQTL Signals Detects Target Genes.

Authors:  Farhad Hormozdiari; Martijn van de Bunt; Ayellet V Segrè; Xiao Li; Jong Wha J Joo; Michael Bilow; Jae Hoon Sul; Sriram Sankararaman; Bogdan Pasaniuc; Eleazar Eskin
Journal:  Am J Hum Genet       Date:  2016-11-17       Impact factor: 11.025

2.  Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets.

Authors:  Zhihong Zhu; Futao Zhang; Han Hu; Andrew Bakshi; Matthew R Robinson; Joseph E Powell; Grant W Montgomery; Michael E Goddard; Naomi R Wray; Peter M Visscher; Jian Yang
Journal:  Nat Genet       Date:  2016-03-28       Impact factor: 38.330

3.  Estimating the causal tissues for complex traits and diseases.

Authors:  Halit Ongen; Andrew A Brown; Olivier Delaneau; Nikolaos I Panousis; Alexandra C Nica; Emmanouil T Dermitzakis
Journal:  Nat Genet       Date:  2017-10-23       Impact factor: 38.330

4.  Integrative approaches for large-scale transcriptome-wide association studies.

Authors:  Alexander Gusev; Arthur Ko; Huwenbo Shi; Gaurav Bhatia; Wonil Chung; Brenda W J H Penninx; Rick Jansen; Eco J C de Geus; Dorret I Boomsma; Fred A Wright; Patrick F Sullivan; Elina Nikkola; Marcus Alvarez; Mete Civelek; Aldons J Lusis; Terho Lehtimäki; Emma Raitoharju; Mika Kähönen; Ilkka Seppälä; Olli T Raitakari; Johanna Kuusisto; Markku Laakso; Alkes L Price; Päivi Pajukanta; Bogdan Pasaniuc
Journal:  Nat Genet       Date:  2016-02-08       Impact factor: 38.330

5.  The International Human Epigenome Consortium: A Blueprint for Scientific Collaboration and Discovery.

Authors:  Hendrik G Stunnenberg; Martin Hirst
Journal:  Cell       Date:  2016-11-17       Impact factor: 41.582

Review 6.  10 Years of GWAS Discovery: Biology, Function, and Translation.

Authors:  Peter M Visscher; Naomi R Wray; Qian Zhang; Pamela Sklar; Mark I McCarthy; Matthew A Brown; Jian Yang
Journal:  Am J Hum Genet       Date:  2017-07-06       Impact factor: 11.025

7.  Genetic and epigenetic fine mapping of causal autoimmune disease variants.

Authors:  Kyle Kai-How Farh; Alexander Marson; Jiang Zhu; Markus Kleinewietfeld; William J Housley; Samantha Beik; Noam Shoresh; Holly Whitton; Russell J H Ryan; Alexander A Shishkin; Meital Hatan; Marlene J Carrasco-Alfonso; Dita Mayer; C John Luckey; Nikolaos A Patsopoulos; Philip L De Jager; Vijay K Kuchroo; Charles B Epstein; Mark J Daly; David A Hafler; Bradley E Bernstein
Journal:  Nature       Date:  2014-10-29       Impact factor: 49.962

8.  Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types.

Authors:  Sung Chun; Alexandra Casparino; Nikolaos A Patsopoulos; Damien C Croteau-Chonka; Benjamin A Raby; Philip L De Jager; Shamil R Sunyaev; Chris Cotsapas
Journal:  Nat Genet       Date:  2017-02-20       Impact factor: 38.330

9.  A gene-based association method for mapping traits using reference transcriptome data.

Authors:  Eric R Gamazon; Heather E Wheeler; Kaanan P Shah; Sahar V Mozaffari; Keston Aquino-Michaels; Robert J Carroll; Anne E Eyler; Joshua C Denny; Dan L Nicolae; Nancy J Cox; Hae Kyung Im
Journal:  Nat Genet       Date:  2015-08-10       Impact factor: 38.330

10.  Partitioning heritability by functional annotation using genome-wide association summary statistics.

Authors:  Hilary K Finucane; Brendan Bulik-Sullivan; Alexander Gusev; Gosia Trynka; Yakir Reshef; Po-Ru Loh; Verneri Anttila; Han Xu; Chongzhi Zang; Kyle Farh; Stephan Ripke; Felix R Day; Shaun Purcell; Eli Stahl; Sara Lindstrom; John R B Perry; Yukinori Okada; Soumya Raychaudhuri; Mark J Daly; Nick Patterson; Benjamin M Neale; Alkes L Price
Journal:  Nat Genet       Date:  2015-09-28       Impact factor: 38.330

  58 in total

1.  Systematic identification of cis-regulatory variants that cause gene expression differences in a yeast cross.

Authors:  Kaushik Renganaath; Rocky Cheung; Laura Day; Sriram Kosuri; Leonid Kruglyak; Frank W Albert
Journal:  Elife       Date:  2020-11-12       Impact factor: 8.140

Review 2.  Where Are the Disease-Associated eQTLs?

Authors:  Benjamin D Umans; Alexis Battle; Yoav Gilad
Journal:  Trends Genet       Date:  2020-09-07       Impact factor: 11.639

3.  Novel loci and potential mechanisms of major depressive disorder, bipolar disorder, and schizophrenia.

Authors:  He Wang; Zhenghui Yi; Tieliu Shi
Journal:  Sci China Life Sci       Date:  2021-06-16       Impact factor: 6.038

4.  IL10RB as a key regulator of COVID-19 host susceptibility and severity.

Authors:  Georgios Voloudakis; Gabriel Hoffman; Sanan Venkatesh; Kyung Min Lee; Kristina Dobrindt; James M Vicari; Wen Zhang; Noam D Beckmann; Shan Jiang; Daisy Hoagland; Jiantao Bian; Lina Gao; André Corvelo; Kelly Cho; Jennifer S Lee; Sudha K Iyengar; Shiuh-Wen Luoh; Schahram Akbarian; Robert Striker; Themistocles L Assimes; Eric E Schadt; Miriam Merad; Benjamin R tenOever; Alexander W Charney; Kristen J Brennand; Julie A Lynch; John F Fullard; Panos Roussos
Journal:  medRxiv       Date:  2021-06-02

5.  Tejaas: reverse regression increases power for detecting trans-eQTLs.

Authors:  Saikat Banerjee; Franco L Simonetti; Kira E Detrois; Anubhav Kaphle; Raktim Mitra; Rahul Nagial; Johannes Söding
Journal:  Genome Biol       Date:  2021-05-06       Impact factor: 13.583

6.  Imputed gene expression risk scores: a functionally informed component of polygenic risk.

Authors:  Oliver Pain; Kylie P Glanville; Saskia Hagenaars; Saskia Selzam; Anna Fürtjes; Jonathan R I Coleman; Kaili Rimfeld; Gerome Breen; Lasse Folkersen; Cathryn M Lewis
Journal:  Hum Mol Genet       Date:  2021-05-17       Impact factor: 6.150

7.  Germline genetic contribution to the immune landscape of cancer.

Authors:  Rosalyn W Sayaman; Mohamad Saad; Vésteinn Thorsson; Donglei Hu; Wouter Hendrickx; Jessica Roelands; Eduard Porta-Pardo; Younes Mokrab; Farshad Farshidfar; Tomas Kirchhoff; Randy F Sweis; Oliver F Bathe; Carolina Heimann; Michael J Campbell; Cynthia Stretch; Scott Huntsman; Rebecca E Graff; Najeeb Syed; Laszlo Radvanyi; Simon Shelley; Denise Wolf; Francesco M Marincola; Michele Ceccarelli; Jérôme Galon; Elad Ziv; Davide Bedognetti
Journal:  Immunity       Date:  2021-02-09       Impact factor: 31.745

8.  Leveraging supervised learning for functionally informed fine-mapping of cis-eQTLs identifies an additional 20,913 putative causal eQTLs.

Authors:  Qingbo S Wang; David R Kelley; Jacob Ulirsch; Masahiro Kanai; Shuvom Sadhuka; Ran Cui; Carlos Albors; Nathan Cheng; Yukinori Okada; Francois Aguet; Kristin G Ardlie; Daniel G MacArthur; Hilary K Finucane
Journal:  Nat Commun       Date:  2021-06-07       Impact factor: 14.919

9.  A machine learning approach to brain epigenetic analysis reveals kinases associated with Alzheimer's disease.

Authors:  Yanting Huang; Xiaobo Sun; Huige Jiang; Shaojun Yu; Chloe Robins; Matthew J Armstrong; Ronghua Li; Zhen Mei; Xiaochuan Shi; Ekaterina Sergeevna Gerasimov; Philip L De Jager; David A Bennett; Aliza P Wingo; Peng Jin; Thomas S Wingo; Zhaohui S Qin
Journal:  Nat Commun       Date:  2021-07-22       Impact factor: 17.694

Review 10.  Transcriptomic Insight Into the Polygenic Mechanisms Underlying Psychiatric Disorders.

Authors:  Leanna M Hernandez; Minsoo Kim; Gil D Hoftman; Jillian R Haney; Luis de la Torre-Ubieta; Bogdan Pasaniuc; Michael J Gandal
Journal:  Biol Psychiatry       Date:  2020-06-12       Impact factor: 13.382
