Modeling regulatory network topology improves genome-wide analyses of complex human traits.

Xiang Zhu^1,2,3, Zhana Duren^4,5, Wing Hung Wong^6,7.

Abstract

Genome-wide association studies (GWAS) have cataloged many significant associations between genetic variants and complex traits. However, most of these findings have unclear biological significance, because they often have small effects and occur in non-coding regions. Integration of GWAS with gene regulatory networks addresses both issues by aggregating weak genetic signals within regulatory programs. Here we develop a Bayesian framework that integrates GWAS summary statistics with regulatory networks to infer genetic enrichments and associations simultaneously. Our method improves upon existing approaches by explicitly modeling network topology to assess enrichments, and by automatically leveraging enrichments to identify associations. Applying this method to 18 human traits and 38 regulatory networks shows that genetic signals of complex traits are often enriched in interconnections specific to trait-relevant cell types or tissues. Prioritizing variants within enriched networks identifies known and previously undescribed trait-associated genes revealing biological and therapeutic insights.

Year: 2021 PMID: 33990562 PMCID: PMC8121952 DOI: 10.1038/s41467-021-22588-0

Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919

Introduction

Genome-wide association studies (GWAS) have cataloged many significant and reproducible associations between common genetic variants, notably single-nucleotide polymorphisms (SNPs), and diverse human complex traits[1]. However, it remains challenging[2] to translate these findings into biological mechanisms and clinical applications, because most trait-associated variants have individually small effects and map to non-coding sequences. One hypothesis is that non-coding variants cumulatively affect complex traits through cell type- or tissue-specific[3] gene regulation[4]. To test this hypothesis, large-scale epigenomic[5,6] and transcriptomic[7-10] data have been made available spanning diverse human cell types and tissues. Exploiting these data many studies have shown enrichments of trait-associated SNPs in chromatin regions[11-13] and genes[14-16] that are active in trait-relevant cell types or tissues. These studies overlap regulatory maps with GWAS data and often ignore functional interactions among loci within regulatory programs. Gene regulatory networks[17-20] have proven useful in mining functional interactions of genes from genomic data. Transcriptional regulatory interactions, rather than gene expression alone, drive tissue specificity[19]. Further, context-specific regulatory networks have emerged as promising tools to dissect the genetics of complex traits[21-23]. Network-connectivity analyses in GWAS have shown that trait-associated genes are more interconnected than expected[18] and highly interconnected genes are enriched for trait heritability[24]. However, these analyses do not leverage observed enrichments to further enhance trait-associated gene discovery. To unleash the potential of regulatory networks in GWAS, we develop a Bayesian framework for simultaneous genome-wide network enrichment and gene prioritization analysis. 
Through extensive simulations we show several advantages of the method such as flexibility in various genetic architectures, robustness to a wide range of model mis-specification and improved performance over existing methods. Applying the method to 18 human traits and 38 regulatory networks, we identify strong enrichments of genetic associations in network topology specific to trait-relevant cell types or tissues. By prioritizing variants within enriched networks we identify trait-associated genes that were not implicated by the same GWAS. Many of these previously undescribed genes have strong support from multiple lines of external evidence; some are further validated by follow-up GWAS of the same traits with increased sample sizes. Together, these results demonstrate the potential for our method to yield additional biological and therapeutic insights from existing data.

Results

Method overview

Figure 1 shows the method schematic. In brief, we develop a model dissecting the total effect of a single SNP on a trait into effects of multiple (nearby and distal) genes through a regulatory network, and we combine it with a multiple-SNP regression likelihood[25] based on GWAS summary statistics to perform Bayesian inference.

Fig. 1

Schematic of RSS-NET.

a Decomposition of the total effect of a common SNP on a complex trait through multiple nearby and distal genes. b Gene regulatory network defined as a weighted and directed bipartite graph linking TFs to TGs. c RSS-NET exploits the topology of a TF-TG network to decompose the total genetic effect into cis- and trans-regulatory components. Both the SNP-gene (c) and TF-TG (v) weights in this decomposition are assumed known and are specified by existing omics data (Methods). In addition to TF-TG networks, RSS-NET also requires d GWAS summary statistics and e ancestry-matching LD estimates as input. f Bayesian hierarchical model underlying RSS-NET. An in-depth description is provided in Methods. g Given a network, RSS-NET produces a Bayes factor comparing the baseline (M0) and enrichment (M1) models to summarize the evidence for network enrichment. h RSS-NET prioritizes loci within an enriched network by computing P1, the posterior probability that at least one SNP j in a locus is trait-associated (β_j ≠ 0). Differences between P1 under M0 and M1 reflect the influence of a regulatory network on genetic associations, highlighting previously undescribed trait-associated genes.

Conceptually, we decompose the total effect of a common SNP on a complex trait into three components: a cis-regulatory component through nearby genes, a trans-regulatory component through distal genes that are regulated by genes near this SNP, and a remaining component due to other factors (Fig. 1a). Since common genetic variation contributes to complex traits primarily via gene regulation[22], we find this decomposition a sensible approximation to the genetic basis of complex traits. Although the regulatory components can be modeled in various ways, here we use cell type- or tissue-specific regulatory networks[18,20] linking transcription factors (TFs) to target genes (TGs). Specifically, we define a regulatory network as a directed bipartite graph with weighted edges from TFs to TGs (Fig. 1b).
Given a TF-TG network, we use its topology to decompose the total effect of each SNP into effects of multiple interconnected genes. As shown in Fig. 1c, we approximate the effect of SNP j using a weighted sum of cis effects of three nearby genes (outside-network gene k, TG u and TF g) and trans effects of three TGs (u and t on the same chromosome, and n on a different chromosome) that are directly regulated by TF g near SNP j. For identifiability we assume the SNP-gene (c) and TF-TG (v) weights in the decomposition are known, specified by existing omics data (Methods).

To implement this regulatory decomposition in GWAS, we formulate a network-induced prior for SNP-level effects (β), and combine it with a regression likelihood[25] of β based on single-SNP association statistics from a GWAS (Fig. 1d) and linkage disequilibrium (LD) estimates from a reference panel with ancestry matching the GWAS (Fig. 1e). We refer to the resulting Bayesian framework (Fig. 1f) as Regression with Summary Statistics exploiting NEtwork Topology (RSS-NET).

RSS-NET accomplishes two tasks simultaneously: (1) testing if a network is enriched for genetic associations (Fig. 1g); (2) identifying which genes within this network drive the enrichment (Fig. 1h). Specifically, RSS-NET estimates two independent enrichment parameters (θ and σ^2) that measure, respectively, the extent to which SNPs near network genes and regulatory elements (REs) have higher chances of being associated with the trait, and the extent to which SNPs near network edges have larger effect sizes. To assess network enrichment, RSS-NET computes a Bayes factor (BF) comparing the “enrichment model” (M1: θ > 0 or σ^2 > 0) against the “baseline model” (M0: θ = 0 and σ^2 = 0). To prioritize genes within enriched networks, RSS-NET contrasts posterior distributions of β estimated under M0 and M1.

RSS-NET improves upon its predecessor RSS-E[16]. Specifically, RSS-NET exploits the full network topology, whereas RSS-E ignores the edge information.
By explicitly modeling regulatory interconnections, RSS-NET outperforms RSS-E on both simulated and real data. Despite their different treatments of network information, RSS-NET and RSS-E share computation schemes (Box 1; Supplementary Notes 1–3), allowing us to reuse the efficient algorithm of RSS-E. Software is available at https://github.com/suwonglab/rss-net.

Box 1

Input: GWAS summary statistics, LD estimates, network annotations {a, O, W} and a grid of hyper-parameters indexed by h = 1, …, H; see Methods for details.

Output: variational parameters defining the closest mean-field approximation in Kullback–Leibler divergence to the exact conditional posterior of β given the hyper-parameters {θ0, θ, σ0, σ}.

1. Initialize: Set the initial values of the variational parameters randomly.
2. Optimize:
2a. Compute the prior parameters for each SNP j = 1, …, p.
2b. Determine the remaining per-SNP terms for SNP j = 1, …, p.
2c. Iterate through all SNPs to update the variational parameters.
2d. Repeat 2c until the variational parameters converge.
3. Repeat: Repeat 2 for each hyper-parameter setting in the grid to obtain the corresponding optimal variational parameters, h = 1, …, H.
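The weighted-sum decomposition of Fig. 1c can be illustrated with a small numerical sketch. This is only one plausible reading of the description above, not the RSS-NET model itself: the function name, the toy weights, the gene-level cis and trans effects, and in particular the choice to scale each trans term by the product of the SNP-gene weight (c) and the TF-TG weight (v) are all assumptions made for illustration.

```python
from typing import Dict, Tuple


def decompose_snp_effect(
    snp_gene_w: Dict[str, float],          # c: weights linking the SNP to nearby genes
    tf_tg_w: Dict[str, Dict[str, float]],  # v: TF -> {TG: edge weight}
    cis_effect: Dict[str, float],          # hypothetical gene-level cis effects
    trans_effect: Dict[str, float],        # hypothetical gene-level trans effects
) -> Tuple[float, float]:
    """Return (cis component, trans component) of one SNP's total effect."""
    # Cis component: weighted sum over genes near the SNP.
    cis = sum(c * cis_effect.get(gene, 0.0) for gene, c in snp_gene_w.items())
    # Trans component: for every nearby gene that is a TF in the network,
    # add the effects of its direct targets (assumed scaling: c * v).
    trans = 0.0
    for gene, c in snp_gene_w.items():
        for target, v in tf_tg_w.get(gene, {}).items():
            trans += c * v * trans_effect.get(target, 0.0)
    return cis, trans


# Toy version of Fig. 1c: SNP j lies near outside-network gene k, TG u and TF g;
# TF g regulates TGs u, t and n.
snp_gene_w = {"k": 0.5, "u": 0.2, "g": 1.0}
tf_tg_w = {"g": {"u": 0.5, "t": 0.25, "n": 0.25}}
cis_effect = {"k": 0.1, "u": 0.2, "g": 0.3}
trans_effect = {"u": 0.4, "t": 0.8, "n": 0.4}

cis, trans = decompose_snp_effect(snp_gene_w, tf_tg_w, cis_effect, trans_effect)
print(round(cis, 2), round(trans, 2))  # 0.39 0.5
```

In this toy setting only gene g contributes to the trans component, because it is the only nearby gene with outgoing TF-TG edges.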

Method comparison through simulations

The key contribution of RSS-NET is a unified framework that leverages network topology to infer enrichments from whole-genome association statistics and automatically prioritizes loci in light of inferred enrichments. We are not aware of any published method with the same features. However, one could ignore topology and simply annotate SNPs based on their proximity to network genes and REs (Methods). For these SNP-level annotations there are methods to assess global enrichments or local associations on GWAS summary data. Here we use Pascal[26], LDSC[13,27], and RSS-E[16] to benchmark RSS-NET.

Given a network, we first simulated SNP effects (β) from either RSS-NET or mis-specified models, and then combined them with real genotypes to simulate phenotypes from a genome-wide multiple-SNP model. We computed the single-SNP association statistics, on which we compared RSS-NET with other methods (Figs. 2–4; Supplementary Figs. 1–9). Since RSS-NET is model-based, we designed a large array of simulation scenarios covering both correctly specified and mis-specified models. To reduce the computational cost of this large-scale design, we mainly used genotypes[28] of 348,965 genome-wide common SNPs and a whole-genome regulatory network inferred for human B cells (436 TFs, 3,018 TGs)[20,29]. We obtained similar results from simulations based on genotypes[30] of 1 million common SNPs[31] (Supplementary Fig. 9) or a different network (Supplementary Figs. 2 and 8).

Fig. 2

Flexibility of RSS-NET to identify network-level enrichments from GWAS summary statistics.

We used a B cell-specific regulatory network and real genotypes of 348,965 genome-wide SNPs to simulate negative and positive individual-level data under two genetic architectures (“sparse” and “polygenic”). We simulated SNP effects (β) for negative datasets from the baseline model (M0: θ = 0 and σ^2 = 0). We simulated β for positive datasets from the enrichment model (M1: θ > 0 or σ^2 > 0) for the target network under three scenarios: a θ > 0, σ^2 = 0; b θ = 0, σ^2 > 0; c θ > 0, σ^2 > 0. Using the simulated individual-level data we computed single-SNP association statistics, on which we compared RSS-NET with RSS-E[16], LDSC-baseline[13], LDSC-baselineLD[27], and Pascal[26] using their default setups (Methods). Pascal includes two gene (“max”: maximum-of-χ2; “sum”: sum-of-χ2) and two pathway (“chi”: χ2 approximation; “emp”: empirical sampling) scoring options. For each dataset, Pascal and LDSC methods produced P-values, whereas RSS-E and RSS-NET produced BFs; these statistics were used to rank the significance of enrichments. A false positive occurs when a method identifies enrichment of the target network in a negative dataset, and a true positive occurs when it does so in a positive dataset. Each panel displays the trade-off between false and true positives via receiver operating characteristic (ROC) curves for all methods in 200 negative and 200 positive datasets of a simulation scenario, and also reports the corresponding areas under ROC curves (AUROCs, higher value indicating better performance). Dashed diagonal lines denote random ROC curves (AUROC = 0.5). d RSS-NET, as well as other methods, does not perform well when the target network harbors weak genetic associations. Simulation details and additional results are provided in Supplementary Figs. 1, 2.

We started with simulations where RSS-NET modeling assumptions were satisfied. We considered two genetic architectures: a sparse scenario with most SNPs being null and a polygenic scenario with most SNPs being trait-associated. For each architecture, we created negative datasets by simulating SNP effects (β) from M0 and positive datasets by simulating β from three M1 patterns (only θ > 0; only σ^2 > 0; both θ > 0 and σ^2 > 0) of the target network, and applied the methods to detect M1 from all datasets (Fig. 2; Supplementary Figs. 1, 2). Existing methods tend to perform well only in select settings. For example, Pascal and LDSC perform poorly when genetic signals are very sparse (Fig. 2b); RSS-E performs poorly when enrichment patterns are inconsistent with its modeling assumptions (Fig. 2c). Except for datasets with weak genetic signals on the network (Fig. 2d), RSS-NET performs consistently well in all scenarios. This is expected because the flexible model underlying RSS-NET can capture various genetic architectures and enrichment patterns. In practice, one rarely knows the correct architecture beforehand, which makes the flexibility of RSS-NET appealing.
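The AUROC used to compare methods in these simulations can be computed directly from the ranking statistics of positive and negative datasets. The sketch below uses the pairwise (Mann–Whitney) definition; the function names and the toy scores are illustrative assumptions, and P-values are mapped to −log10 P so that larger scores always mean stronger evidence.

```python
import math
from typing import Sequence


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random positive dataset outranks a random negative one.

    Ties count as one half, which is equivalent to the area under the ROC curve.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def pvalue_to_score(p: float) -> float:
    """Convert a P-value to a score where larger means more significant."""
    return -math.log10(p)


# Toy log10 BFs for three positive and three negative datasets (made-up numbers).
pos_log10_bf = [12.0, 3.5, 0.8]
neg_log10_bf = [0.1, 1.0, -0.2]
area = auroc(pos_log10_bf, neg_log10_bf)
print(round(area, 3))  # 0.889 (8 of 9 positive-negative pairs ranked correctly)
```

With 200 positive and 200 negative datasets per scenario, the quadratic loop involves only 40,000 comparisons, so no rank-based shortcut is needed.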

Fig. 3

Robustness of RSS-NET to model mis-specification in enrichment analyses.

Here positive datasets were generated from M1 with θ > 0 and σ^2 > 0 (Fig. 2c). Negative datasets were simulated from four scenarios where genetic associations were enriched in: a a random set of near-gene SNPs; b a random set of near-RE SNPs; c SNPs with MAF- and LD-dependent effects; d a random edge-altered network. By this design, RSS-NET was mis-specified in all four scenarios. Similar to positive datasets, the simulated false enrichments in all negative datasets manifested in both association proportion (more frequent) and magnitude (larger effect). RSS-E was excluded here because of its poor performance shown in Fig. 2c. The rest is the same as Fig. 2. Simulation details and additional results are provided in Supplementary Figs. 3–6.

Minor allele frequency (MAF)- and LD-dependent genetic architectures have been identified in complex traits[27]. To assess the impact of MAF- and LD-dependence on RSS-NET results, we simulated MAF- and LD-dependent SNP effects (β) from an additive model of 10 MAF bins and 6 LD-related annotations[27], which were then used to create negative datasets (Fig. 3c; Supplementary Fig. 5). Similarly, enrichments identified by RSS-NET are unlikely to be false positives induced by MAF- and LD-dependence.

Interconnections within regulatory programs play key roles in driving context specificity[19] and propagating disease risk[22], but existing methods often ignore the edge information. In contrast, RSS-NET leverages the full topology of a given network. The topology-aware feature increases the potential of RSS-NET to identify the most relevant network for a trait among candidates that share many nodes but differ in edges. To illustrate this feature, we designed a scenario where a real target network and random candidates had the same nodes and edge counts, but different edges.
We simulated positive and negative datasets where genetic associations were enriched in the target network and random candidates respectively, and then tested enrichment of the target network on all datasets. As expected, only RSS-NET can reliably distinguish true enrichments of the target network from enrichments of its edge-altered counterparts (Fig. 3d; Supplementary Fig. 6). To benchmark its prioritization component, we compared RSS-NET with gene-based association modules in RSS-E[16] and Pascal[26] (Fig. 4; Supplementary Figs. 7–9). Consistent with previous work[16], RSS methods outperform Pascal methods even without network enrichment (Fig. 4a). This is because RSS-NET and RSS-E exploit a multiple regression framework[25] to learn the genetic architecture from data of all genes and assess their effects jointly, whereas Pascal only uses data of a single gene to estimate its effect. Similar to enrichment simulations (Fig. 2), RSS-NET outperforms RSS-E in prioritizing genes across different architectures (Fig. 4b–d). This again highlights the flexibility of RSS-NET.
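The edge-altered control (a random candidate with the same nodes and edge count as the target network, but different edges) can be sketched as follows. The sampling scheme is an assumption: the source does not say how the random candidates were drawn, so this version simply samples distinct TF-TG pairs uniformly and ignores edge weights. The function name and toy network are hypothetical.

```python
import random
from typing import List, Tuple

Edge = Tuple[str, str]  # (TF, TG)


def edge_altered_network(edges: List[Edge], seed: int = 0) -> List[Edge]:
    """Draw a random TF-TG network on the same node sets with the same edge count."""
    tfs = sorted({tf for tf, _ in edges})
    tgs = sorted({tg for _, tg in edges})
    n_edges = len(set(edges))
    # Enumerate every possible TF -> TG pair, then sample without replacement
    # so that the random candidate has no duplicate edges.
    all_pairs = [(tf, tg) for tf in tfs for tg in tgs]
    rng = random.Random(seed)
    return rng.sample(all_pairs, n_edges)


target = [("TF1", "A"), ("TF1", "B"), ("TF2", "C"), ("TF3", "A")]
candidate = edge_altered_network(target, seed=1)
print(len(candidate), len(set(candidate)))  # 4 4
```

Uniform sampling can leave some nodes without edges in the candidate; a degree-preserving shuffle would be a stricter alternative if node degrees also need to match.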

Fig. 4

Power of RSS-NET to identify gene-level associations from GWAS summary statistics.

We used a B cell-specific regulatory network and real genotypes of 348,965 genome-wide SNPs to simulate individual-level GWAS data under four scenarios: a θ = 0, σ^2 = 0; b θ > 0, σ^2 = 0; c θ = 0, σ^2 > 0; d θ > 0, σ^2 > 0. Using the simulated individual-level data we computed single-SNP association statistics, on which we compared RSS-NET with gene-level association components of RSS-E[16] and Pascal[26]. RSS-E is a special case of RSS-NET assuming σ^2 = 0, and RSS-E-baseline is a special case of RSS-E assuming θ = 0. Pascal includes two gene scoring options: maximum-of-χ2 (“max”) and sum-of-χ2 (“sum”). Given a network, Pascal and RSS-E-baseline do not leverage any network information, RSS-E ignores the edge information, and RSS-NET exploits the full topology. Each scenario contains 200 datasets and each dataset contains 16,954 autosomal protein-coding genes for testing. We defined a gene as “trait-associated” if at least one SNP j within 100 kb of the transcribed region of this gene had non-zero effect (β_j ≠ 0). For each gene in each dataset, RSS methods produced posterior probabilities that the gene was trait-associated (P1), whereas Pascal methods produced association P-values; these statistics were used to rank the significance of gene-level associations. The first row of each panel displays ROC curves and AUROCs for all methods, with dashed diagonal lines indicating random performance (AUROC = 0.5). The second row of each panel displays precision-recall (PRC) curves and areas under PRC curves (AUPRCs) for all methods, with dashed horizontal lines indicating random performance. For both AUROC and AUPRC, higher value indicates better performance. Simulation details and additional results are provided in Supplementary Figs. 7, 8.

Finally, since RSS-NET uses a given network as is and most networks to date are algorithmically inferred, we performed simulations to assess the robustness of RSS-NET under noisy networks.
Specifically, we simulated datasets from a real target network, created noisy networks by randomly removing edges from this real target, and then fed the noisy networks (rather than the real one) to RSS-NET. By exploiting retained true nodes and edges, RSS-NET produces reliable results in identifying both network enrichments and genetic associations, and unsurprisingly, its performance drops as the noise level increases (Supplementary Fig. 10). In conclusion, RSS-NET is adaptive to various genetic architectures and enrichment patterns, it is robust to a wide range of model mis-specification, and it outperforms existing related methods. To further investigate its real-world utility, we applied RSS-NET to analyze 18 complex traits and 38 regulatory networks.
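The noisy-network experiment (randomly removing edges from the real target network before running the analysis) can be mimicked with a few lines. This is a minimal sketch under stated assumptions: the removal fraction, the uniform choice of edges to drop, the function name, and the toy weighted edges are all illustrative and not taken from the RSS-NET software.

```python
import random
from typing import List, Tuple

WeightedEdge = Tuple[str, str, float]  # (TF, TG, edge weight)


def remove_random_edges(
    edges: List[WeightedEdge], frac_removed: float, seed: int = 0
) -> List[WeightedEdge]:
    """Return a noisy copy of the network with a fraction of edges dropped."""
    if not 0.0 <= frac_removed <= 1.0:
        raise ValueError("frac_removed must be between 0 and 1")
    n_keep = len(edges) - round(frac_removed * len(edges))
    rng = random.Random(seed)
    # Sampling without replacement keeps the retained edges (and their weights)
    # exactly as they were in the real network.
    return rng.sample(edges, n_keep)


real_network = [
    ("TF1", "A", 0.9),
    ("TF1", "B", 0.4),
    ("TF2", "C", 0.7),
    ("TF3", "A", 0.2),
]
noisy_network = remove_random_edges(real_network, frac_removed=0.25, seed=42)
print(len(noisy_network))  # 3
```

Increasing `frac_removed` corresponds to the higher noise levels under which the source reports declining performance.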

Enrichment analyses of 38 networks across 18 traits

We first inferred[20] whole-genome regulatory networks for 38 human cell types and tissues (Methods; Supplementary Data 1) from public data[29] of paired expression and chromatin accessibility (PECA). On average each network has 431 TFs, 3,298 TGs, and 93,764 weighted TF-TG edges. Clustering showed that networks recapitulated context similarity, with immune cells and brain regions grouping together as two units (Fig. 5a; Supplementary Fig. 11).

Fig. 5

RSS-NET analyses of 18 complex traits and 38 regulatory networks.

a Clustering of 38 regulatory networks based on t-distributed stochastic neighbour embedding. Details are provided in Supplementary Fig. 11. b Similarity between a given tissue-specific PECA-based network and 394 CAGE-based networks for various cell types and tissues (a: adult samples; c: cell lines; f: fetal samples). The similarity between a PECA- and CAGE-based network is summarized by Jaccard indices of their node sets (x-axis) and edge sets (y-axis). To simplify visualization, only labels of the top four CAGE-based networks with the highest edge similarity are shown for each PECA-based network. See Supplementary Fig. 12 for additional results. c Ternary diagram showing, for each trait, percentages of the “best” enrichment model (with the largest BF) as M11: θ > 0, σ^2 = 0, M12: θ = 0, σ^2 > 0 and M13: θ > 0, σ^2 > 0 across networks. See Supplementary Table 4 for numerical values. Shown are 16 traits having multiple networks more enriched than the near-gene control. d Comparison of context-matched PECA-based (y-axis) and CAGE-based (x-axis) network enrichments on the same GWAS. Dashed lines have slope 1 and intercept 0. See Supplementary Fig. 14 for additional results. e Median proportion of genes whose enrichment-informed P1 is higher than the reference estimate (P1 under M0, or under M1 for the near-gene control), among genes with reference estimates higher than a given cutoff. Medians are evaluated among 16 traits in c. See Supplementary Table 5 for numerical values. Overlap of RSS-NET prioritized genes with genes implicated in f knockout mouse phenotypes[47] and g human Mendelian diseases[49,50]. An edge indicates that a category of knockout mouse or Mendelian genes is significantly enriched for genes prioritized for a GWAS trait (FDR ≤ 0.1). Thicker edges correspond to stronger enrichments. To simplify visualization, only top-ranked categories are shown for each trait (f: 3; g: 2). See Supplementary Data 4, 5 for full results. Trait abbreviations are defined in Supplementary Table 1.
As a validation, we assessed the pairwise similarity between the 38 PECA-based networks and 394 human cell type- and tissue-specific regulatory networks[18] reconstructed from independent cap analysis of gene expression (CAGE) data[7,8]. Reassuringly, PECA- and CAGE-based networks often reached maximum overlap when they were derived from biosamples of matched cell types or tissues (Fig. 5b; Supplementary Fig. 12), showing that the context specificity of regulatory networks is replicable.

On the 38 networks, we applied RSS-NET to analyze 1.1 million common SNPs[31] for 18 traits, using GWAS summary statistics from 20,883 to 253,288 European-ancestry individuals (Supplementary Table 1) and LD estimates[16] from the European panel of 1000 Genomes Project[30]. For each trait-network pair we computed a BF assessing network enrichment. Full results of 684 trait-network pairs are available online (Data availability).

To check whether observed enrichments could be driven by general regulatory enrichments, we created a “near-gene” control network with 18,334 protein-coding autosomal genes as nodes and no edges, and analyzed this control with RSS-NET on the same GWAS data. For most traits, the near-gene control has substantially weaker enrichment than the actual networks. In particular, 512 out of 684 trait-network pairs (one-sided binomial P = 2.2 × 10^−40) showed stronger enrichments than their near-gene counterparts (average log10 BF increase: 13.94; one-sided t P = 5.1 × 10^−15), and 16 out of 18 traits had multiple networks more enriched than the near-gene control (minimum: 5; one-sided Wilcoxon P = 1.2 × 10^−4). In contrast, LDSC and Pascal methods identified fewer trait-network pairs passing the near-gene enrichment control (LDSC maximum: 389, one-sided χ2 P = 1.7 × 10^−12; Pascal maximum: 69, P = 2.0 × 10^−129; Supplementary Table 2). Consistent with simulations (Fig. 3a, b), these results indicate that network enrichments identified by RSS-NET are unlikely to be driven by arbitrary enrichments harbored in the vicinity of genes.

Among 512 trait-network pairs passing the near-gene enrichment control, we further examined whether the observed enrichments could be confounded by network properties or genomic annotations. We did not observe any correlation between BFs and three network features (proportion of SNPs in a network: Pearson R = −3.0 × 10^−2, two-sided P = 0.49; node counts: R = −5.4 × 10^−2, P = 0.23; edge counts: R = −9.2 × 10^−3, P = 0.84). To check confounding effects of genomic annotations, we computed the correlation between BFs and proportions of SNPs falling into both a network and each of 73 functional categories[27], and we did not find any significant correlation (−0.13 < R < −0.01, P > 0.05/73). Similar patterns hold for all 684 trait-network pairs (Supplementary Table 3 and Data 2). Together, the results suggest that observed enrichments are unlikely to be driven by generic network or genome features.

For each trait-network pair, we also computed BFs comparing the baseline (M0) against three disjoint models where enrichment (M1) was contributed by (1) network genes and REs only (M11: θ > 0, σ^2 = 0); (2) TF-TG edges only (M12: θ = 0, σ^2 > 0); (3) network genes, REs and TF-TG edges (M13: θ > 0, σ^2 > 0). We found that M13 was the model most supported by the data (with the largest BF) for 411 out of 512 trait-network pairs (one-sided binomial P = 1.2 × 10^−45), highlighting the key role of TF-TG edges in driving enrichments. To further confirm this finding, we repeated RSS-NET analyses by fixing all TF-TG edge weights as zero (v = 0) and we observed substantially weaker enrichments (average log10 BF decrease: 30.46; one-sided t P = 8.6 × 10^−35; Supplementary Fig. 13). Altogether the results corroborate the “omnigenic model” that genetic signals of complex traits are distributed across the genome via regulatory interconnections[22].
Enrichment patterns varied considerably among traits (Fig. 5c; Supplementary Table 4). For type 2 diabetes (T2D), two of five networks passing the near-gene enrichment control showed the strongest support for M11. Many networks showed the strongest support for M12 in breast cancer (10), body mass index (BMI, 14), waist-hip ratio (37), and schizophrenia (38). Since one rarely knows the true enrichment patterns a priori, and M1 includes {M11, M12, M13} as special cases, we used M1-based BFs throughout this study. Collectively, these results highlight the heterogeneity of network enrichments across traits, which can be potentially learned from data by flexible approaches like RSS-NET.

Top-ranked enrichments recapitulated many trait-context links reported in previous GWAS. Genetic associations with BMI were enriched in the networks of pancreas (BF = 2.07 × 10^13), bowel (BF = 8.02 × 10^12), and adipose (BF = 4.73 × 10^12), consistent with the roles of obesity-related genes in insulin biology and energy metabolism. Networks of immune cells showed enrichments for rheumatoid arthritis (RA, BF = 2.95 × 10^60), inflammatory bowel disease (IBD, BF = 5.07 × 10^35) and Alzheimer’s disease (BF = 8.31 × 10^26). Networks of cardiac and other muscle tissues showed enrichments for coronary artery disease (CAD, BF = 9.78 × 10^28), atrial fibrillation (AF, BF = 8.55 × 10^14), and heart rate (BF = 2.43 × 10^7). Other examples are the brain network with neuroticism (BF = 2.12 × 10^19), and the liver network with high- and low-density lipoprotein (HDL, BF = 2.81 × 10^21; LDL, BF = 7.66 × 10^27).

Some top-ranked enrichments were not identified in the original GWAS, but they are biologically relevant. For example, the natural killer (NK) cell network showed the strongest enrichment among 38 networks for BMI (BF = 3.95 × 10^13), LDL (BF = 5.18 × 10^30), and T2D (BF = 1.49 × 10^77).
This result supports a recent mouse study[32] revealing the role of NK cells in obesity-induced inflammation and insulin resistance, and adds to the considerable evidence unifying metabolism and immunity in many pathological states[33]. Other examples include the adipose network with CAD[34] (BF = 1.67 × 10^29), the liver network with Alzheimer’s disease[16,35] (BF = 1.09 × 10^20) and the monocyte network with AF[36,37] (BF = 4.84 × 10^12).

Some networks show enrichments in multiple traits. To assess network co-enrichments among traits, we tested correlations for all trait pairs using their BFs of 38 networks (Supplementary Data 3). In total 29 of 153 trait pairs had significant correlations (two-sided Pearson P < 0.05/153). Reassuringly, subtypes of the same disease showed strongly correlated enrichments, as in IBD (R = 0.96, P = 1.3 × 10^−20) and CAD subtypes (R = 0.90, P = 3.3 × 10^−14). The results also recapitulated known genetic correlations including RA with IBD (R = 0.79, P = 5.3 × 10^−9) and neuroticism with schizophrenia (R = 0.73, P = 1.6 × 10^−7). Network enrichments of CAD were correlated with network enrichments of known CAD risk factors such as heart rate (R = 0.75, P = 5.1 × 10^−8), BMI (R = 0.71, P = 5.1 × 10^−7), AF (R = 0.65, P = 9.2 × 10^−6) and height (R = 0.64, P = 1.6 × 10^−5). Network enrichments of Alzheimer’s disease were strongly correlated with network enrichments of LDL (R = 0.90, P = 2.6 × 10^−14) and IBD (R = 0.78, P = 8.3 × 10^−9), consistent with roles of lipid metabolism and inflammation in Alzheimer’s disease[35]. Genetic correlations among traits are not predictive of correlations based on network enrichments (Pearson R = 0.12, two-sided P = 0.18), suggesting that regulatory networks provide additional explanatory power to reveal trait similarities in GWAS.

To show that RSS-NET can be applied more generally, we analyzed the CAGE-based networks[18] of 20 cell types and tissues that were present in 38 PECA-based networks (Fig. 5d; Supplementary Fig. 14).
PECA-based networks often produced larger BFs than their CAGE-based counterparts on the same GWAS data (average log10 BF increase: 17.36; one-sided t P = 1.4 × 10^−11), suggesting that PECA-based networks are more enriched in genetic signals. Reassuringly, PECA- and CAGE-based networks consistently highlighted known trait-context links (e.g., immune cells and autoimmune diseases, muscle tissues and heart diseases). For some traits PECA-based networks produced more informative results. For example, CAGE-based analysis of HDL showed a broad enrichment pattern across cell types and tissues (which is consistent with previous connectivity analysis[18] of the same data), whereas PECA-based analysis identified liver as the top-enriched context by a wide margin. Although not our main focus, these results highlight the potential for RSS-NET to systematically evaluate different network inferences in GWAS.
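The PECA-versus-CAGE comparison in Fig. 5b summarizes each pair of networks by Jaccard indices of their node sets and edge sets. A minimal sketch of that summary is given below; the function names and the two toy networks are illustrative assumptions, and edges are treated as unweighted (TF, TG) pairs.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (TF, TG)


def jaccard(a: Set, b: Set) -> float:
    """Size of the intersection divided by size of the union (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def network_similarity(edges_a: Set[Edge], edges_b: Set[Edge]) -> Tuple[float, float]:
    """Return (node-set Jaccard index, edge-set Jaccard index) for two networks."""
    nodes_a = {node for edge in edges_a for node in edge}
    nodes_b = {node for edge in edges_b for node in edge}
    return jaccard(nodes_a, nodes_b), jaccard(edges_a, edges_b)


peca_like = {("TF1", "A"), ("TF1", "B"), ("TF2", "C")}
cage_like = {("TF1", "A"), ("TF2", "B"), ("TF2", "C")}
node_sim, edge_sim = network_similarity(peca_like, cage_like)
print(node_sim, edge_sim)  # 1.0 0.5
```

The toy pair shares every node but only half of its edges, which is the kind of case where edge-level similarity is more informative than node-level similarity.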

Enrichment-informed prioritization of network genes

A key feature of RSS-NET is that inferred network enrichments automatically contribute to prioritization of network genes (Methods). Specifically, for each locus RSS-NET produces three estimates of P1, the posterior probability that at least one SNP in the locus is associated with the trait, assuming M0, M1 for the near-gene control network and M1 for a given network, respectively. When multiple networks are enriched, RSS-NET also produces an averaged estimate by averaging the network-specific estimates over all networks passing the near-gene control, weighted by their BFs. This allows us to assess genetic associations in light of enrichment without having to select a single enriched network. Differences between enrichment estimates (network-specific or BF-averaged) and reference estimates (under M0 or the near-gene control) reflect the impact of the network on a locus.

RSS-NET enhances genetic association detection by leveraging inferred enrichments. To quantify this improvement, for each trait we calculated the proportion of genes whose enrichment estimate was higher than the reference estimate (under M0 or the near-gene control), among genes with reference P1 passing a given cutoff (Fig. 5e). When using the M0 estimate as reference, we observed high proportions of genes with improved estimates (median: 82–98%) across a wide range of reference cutoffs (0−0.9), and as expected, the improvement decreased as the reference cutoff increased. When using the near-gene estimate as reference, we observed fewer genes with improved estimates than when using the M0 estimate (one-sided Wilcoxon P = 9.8 × 10^−4), suggesting that the observed improvement might be partially due to general near-gene enrichments, but proportions of genes with improved estimates remained high (median: 74–94%) nonetheless. Similar patterns occurred when we repeated the analysis with network-specific estimates across 512 trait-network pairs (Supplementary Table 5). Together the results demonstrate the strong influence of network enrichments on nominating additional trait-associated genes.

RSS-NET tends to promote more genes in networks with stronger enrichments. For each trait, the proportion of genes with improved estimates in a network is often positively correlated with the network enrichment BF (R: 0.28−0.91; Supplementary Table 6).
When a gene belongs to multiple networks, the highest often occurs in the top-enriched networks (Fig. 6). We illustrate this coherent pattern with MT1G, a liver-active[9] gene prioritized for HDL by RSS-NET and also implicated in a recent multi-ancestry genome-wide interaction analysis of HDL[38]. Although MT1G belongs to regulatory networks of 18 contexts, only the top enrichment in liver informs a strong association between MT1G and HDL (), and remaining networks with weaker enrichments yield minimal improvement (, ).
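The BF-weighted averaging described above can be sketched as follows. This is a minimal illustration, not the RSS-NET implementation: it assumes equal prior weight on every network that passes the near-gene control, so that averaging weights are proportional to the BFs. The function name `bma_p1` and all input numbers are hypothetical. Weights are computed from log10 BFs to stay numerically stable when BFs span many orders of magnitude.

```python
def bma_p1(p1_net, log10_bf):
    """Average per-network P1 values with weights proportional to their BFs.

    p1_net   : per-network posterior probabilities for one locus
    log10_bf : log10 enrichment Bayes factors, in the same order
    Assumption (not from the source): equal prior weight on each network.
    """
    if len(p1_net) != len(log10_bf) or not p1_net:
        raise ValueError("inputs must be non-empty and of equal length")
    top = max(log10_bf)
    # Subtract the largest log10 BF before exponentiating to avoid overflow.
    rel = [10.0 ** (b - top) for b in log10_bf]
    total = sum(rel)
    weights = [r / total for r in rel]
    averaged = sum(w * p for w, p in zip(weights, p1_net))
    return averaged, weights


# Toy inputs (made-up numbers): one strongly enriched network dominates.
p1_bma, w = bma_p1([0.98, 0.14, 0.12], [21.4, 6.0, 5.5])
```

With BFs this far apart, the average is essentially the P1 of the top-enriched network, which mirrors the pattern described for genes that belong to multiple networks.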

Fig. 6

RSS-NET gene prioritization results of select trait-network pairs.


Shown are four trait-network pairs: (a) body mass index and pancreas; (b) rheumatoid arthritis and B cell; (c) high-density lipoprotein cholesterol and liver; (d) neuroticism and putamen. In the first column of each panel, each point represents a member gene of a given network (blue circle: TF; orange triangle: TG). Dashed lines have slope 1 and intercept 0. In the second and third columns, each point represents a cell type- or tissue-specific network to which a select gene belongs. Numerical values of P1 and BF are available online (Data availability) and are provided as a Source Data file.

RSS-NET recapitulates many genes implicated in the same GWAS. For each analyzed dataset we downloaded the GWAS-implicated genes from the GWAS Catalog[1] and computed the proportion of these genes with high P1. With a stringent P1 cutoff, we observed a significant overlap (median across traits: 69%; median two-sided Fisher exact P = 1.2 × 10⁻²⁶; Supplementary Table 7). Reassuringly, many recapitulated genes are well-established for the traits (Supplementary Table 8), such as CACNA1C for schizophrenia, TCF7L2 for T2D, APOB for lipids, and STAT4 for autoimmune diseases.

RSS-NET also uncovers putative associations that were not reported in the same GWAS. To demonstrate that many of these previously undescribed associations are potentially real, we exploited 15 analyzed traits that each had an updated GWAS with larger sample size. In each case, we obtained newly implicated genes from the GWAS Catalog[1] and computed the proportion of these genes that were identified by RSS-NET. The overlap proportions remained significant (median: 12%; median two-sided Fisher exact P = 1.9 × 10⁻⁵; Supplementary Table 7), showing the potential of RSS-NET to identify trait-associated genes that can be validated by later GWAS with additional samples. Among these validated genes, many are strongly supported by multiple lines of external evidence (Table 1). A particular example is NR0B2, a liver-active[9] gene prioritized for HDL by RSS-NET ($P_1^{\mathsf{net}}$ = 0.98 in liver; Table 1) but not identified by standard GWAS[39] of the same data (minimum single-SNP association P = 1.4 × 10⁻⁷ within 100 kb, n = 99,900). NR0B2 was associated with mouse lipid traits[40-42] and human obesity[43], and identified in a later GWAS of HDL[44] with nearly doubled sample size (P = 9.7 × 10⁻¹⁶, n = 187,056).
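The overlap analysis above (proportion of GWAS-implicated genes that are also prioritized, with a two-sided Fisher exact test) can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: the function names, the gene identifiers and the choice of background set are hypothetical, and the source does not specify how its 2 × 2 tables were built.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x overlaps with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(total, 1.0)


def overlap_test(prioritized, implicated, background):
    """Return (proportion of implicated genes prioritized, Fisher P value)."""
    background = set(background)
    prioritized = set(prioritized) & background
    implicated = set(implicated) & background
    if not implicated:
        raise ValueError("no implicated genes in the background set")
    a = len(prioritized & implicated)   # prioritized and implicated
    b = len(prioritized - implicated)   # prioritized only
    c = len(implicated - prioritized)   # implicated only
    d = len(background) - a - b - c     # neither
    return a / len(implicated), fisher_exact_two_sided(a, b, c, d)


# Toy example with hypothetical gene labels g0..g19.
genes = [f"g{i}" for i in range(20)]
prop, pval = overlap_test(genes[:10], genes[:8] + ["g10"], genes)
```

The background set determines the "neither" cell of the table, so the resulting P value depends on that choice.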

Table 1

Trait	Gene (Role)	$P_1^{\mathsf{base}}$	$P_1^{\mathsf{near}}$	$P_1^{\mathsf{bma}}$	$P_1^{\mathsf{net}}$ (Network, BF)	Mouse trait	Therapeutic and clinical evidence
BMI	PAX2 (TF)	0.78	0.80	0.94	0.94 (Pancreas, 2.07 × 10¹³)	Eye, Renal	Ocular and renal anomalies
	FLT3 (TG)	0.61	0.70	0.85	0.85 (Cerebellum, 8.70 × 10¹¹)	Growth, Immune	Acute myeloid leukemia
WAIST	LAMB1 (TG)	0.97	0.97	0.98	0.98 (Esophagus, 6.78 × 10²³⁹)	Neuron, NS	Lissencephaly-5
BC	KCTD1 (TG)	0.89	0.93	0.98	0.98 (Heart, 8.08 × 10⁷)	CS	Scalp-ear-nipple syndrome
	CASP8 (TG)	0.71	0.72	0.94	0.94 (Aorta, 8.27 × 10⁸)	Growth, Immune	Hepatoma, Glionitrin A^*
RA	AIRE (TF)	0.54	0.61	0.84	0.84 (B cell, 3.31 × 10⁵⁷)	Immune	APS1
IBD	LPP (TG)	0.98	0.94	0.99	0.99 (Monocyte, 6.28 × 10³¹)	Cellular	Acute myeloid leukemia
	FOXP1 (TF)	0.84	0.78	0.95	0.95 (NK cell, 5.07 × 10³⁵)	Immune, Neuron	Language impairment
	CCND3 (TG)	0.81	0.89	0.95	0.95 (NK cell, 5.07 × 10³⁵)	Immune
HDL	ALOX5 (TG)	0.97	0.97	0.99	0.99 (Monocyte, 4.75 × 10¹⁵)	Immune, Metab.	Atherosclerosis
	GPAM (TG)	0.92	0.95	0.98	0.98 (Liver, 2.81 × 10²¹)	Liver, Metab.
	NR0B2 (TG)	0.84	0.93	0.98	0.98 (Liver, 2.81 × 10²¹)	Growth, Metab.	Early-onset obesity
LDL	CERS2 (TG)	0.99	0.99	1.00	1.00 (NK cell, 5.18 × 10³⁰)	Liver, Metab.
	ABCA1 (TG)	0.98	0.98	0.99	0.99 (Liver, 7.66 × 10²⁷)	Liver, Metab.	Tangier disease, Probucol^*
	ABCB11 (TG)	0.68	0.72	0.88	0.88 (Liver, 7.66 × 10²⁷)	Liver, Metab.	Cholestasis
	DLG4 (TG)	0.69	0.59	0.85	0.85 (NK cell, 5.18 × 10³⁰)	Metab., NS	Tat-NR2B9c^*
	SOX17 (TF)	0.52	0.65	0.82	0.84 (CD8, 5.86 × 10²⁸)	Liver, Metab.	Vesicoureteral reflux-3
CAD	TGFB1 (TG)	0.92	0.99	0.99	0.99 (Adipose, 1.67 × 10²⁹)	CS, Growth	Camurati-Engelmann disease
	FN1 (TG)	0.58	0.79	0.91	0.92 (GEJ, 9.78 × 10²⁸)	CS, Metab.	GFND2, SMDCF
	CDH13 (TG)	0.31	0.55	0.77	0.82 (Heart, 1.93 × 10²⁸)	CS, Metab.
	EDNRA (TG)	0.57	0.79	0.80	0.82 (Aorta, 1.09 × 10²⁷)	CS, Muscle	Ambrisentan^*, Macitentan^*
AF	SCN5A (TG)	0.87	0.92	1.00	1.00 (Heart, 6.89 × 10¹²)	CS, Muscle	Brugada syndrome-1, ATFB10
	ENPEP (TG)	0.50	0.76	0.92	0.94 (Uterus, 2.71 × 10¹¹)		QGC-001^*
	ATXN1 (TG)	0.45	0.62	0.90	0.90 (Colon, 7.54 × 10¹⁴)	Muscle, NS	Spinocerebellar ataxia-1
	MYOT (TG)	0.55	0.66	0.86	0.87 (Muscle, 8.55 × 10¹⁴)		Myofibrillar myopathy
SCZ	FOXP1 (TF)	1.00	1.00	1.00	1.00 (Colon, 1.20 × 10¹⁴⁴)	Growth, Neuron	Language impairment
	BCL11A (TG)	1.00	1.00	1.00	1.00 (Spleen, 1.44 × 10¹⁴¹)	Immune, NS	Dias-Logan syndrome
	SLC25A12 (TG)	0.79	0.81	0.88	0.88 (Muscle, 4.99 × 10¹²⁷)	Neuron, NS	DEE39
NEU	TCF4 (TF)	0.72	0.88	0.95	0.95 (CD8, 3.66 × 10²⁰)	Immune, NS	Pitt-Hopkins syndrome
	RAPSN (TG)	0.77	0.88	0.93	0.93 (Muscle, 8.20 × 10¹⁷)	Muscle, NS	Congenital myasthenic syndrome-11
	MEF2C (TF)	0.15	0.40	0.83	0.83 (Ileum, 8.56 × 10²²)	Growth, Neuron	Mental retardation-20
	SNCA (TG)	0.15	0.32	0.78	0.79 (Putamen, 2.12 × 10¹⁹)	Neuron, NS	Parkinsonism, BIIB054^*
	PAX6 (TF)	0.10	0.22	0.62	0.64 (Putamen, 2.12 × 10¹⁹)	NS, Vision	Optic nerve hypoplasia
	PCLO (TG)	0.06	0.17	0.63	0.63 (Ileum, 8.56 × 10²²)	Growth, NS	Pontocerebellar hypoplasia-3

Examples of RSS-NET highlighted genes that were not reported in GWAS of the same data but were implicated in later GWAS with increased sample sizes (genome-wide significance threshold: single-SNP association P < 5 × 10⁻⁸). The “mouse trait” column is based on the Mouse Genome Informatics[46]. The “therapeutic and clinical evidence” column is based on the Online Mendelian Inheritance in Man[49] and Therapeutic Target Database[52]. Drugs are identified with an asterisk (“*”). Trait abbreviations are defined in Supplementary Table 1. GEJ: gastroesophageal junction. CS: cardiovascular system. DS: digestive/alimentary system. Metab.: metabolism. NS: nervous system. APS1: autoimmune polyendocrinopathy syndrome-1. GFND2: glomerulopathy with fibronectin deposits-2. SMDCF: corner fracture type of spondylometaphyseal dysplasia. ATFB10: familial atrial fibrillation-10. DEE39: developmental and epileptic encephalopathy-39.

Biological and clinical relevance of prioritized genes

Besides looking up overlaps with GWAS publications, we cross-referenced RSS-NET prioritized genes with multiple orthogonal databases to systematically assess their biological and therapeutic themes.

Mouse phenomics provides important resources to study genetics of human traits[45]. Here we evaluated overlap between RSS-NET prioritized genes and genes implicated in 27 categories of knockout mouse phenotypes[46]. Network-informed genes were significantly enriched in 128 mouse-human trait pairs (FDR ≤ 0.1; Supplementary Data 4). Fewer significant pairs were identified without network information (119 and 80 when genes were prioritized by the reference estimates instead). For many human traits, top enrichments of network-prioritized genes occurred in closely related mouse phenotypes (Fig. 5f). Genes prioritized for schizophrenia were strongly enriched in nervous, neurological and growth phenotypes (OR: 1.77–2.04). Genes prioritized for autoimmune diseases were strongly enriched in immune and hematopoietic phenotypes (OR: 2.05–2.35). The cardiovascular system showed strong enrichments of genes prioritized for heart conditions (OR: 2.45–2.92). The biliary system showed strong enrichments of genes prioritized for lipids, BMI, CAD, and T2D (OR: 2.16–10.78). The phenotypically matched cross-species enrichments strengthen the biological relevance of RSS-NET results.

Genes causing Mendelian diseases often contribute to complex traits[47]. Here we quantified overlap between RSS-NET prioritized genes and genes causing 19 categories[48] of Mendelian disorders[49]. Leveraging regulatory networks, we observed 47 significantly enriched Mendelian-complex trait pairs (FDR ≤ 0.1; 44 and 31 with the reference estimates; Supplementary Data 5), among which the top-ranked ones were often phenotypically matched (Fig. 5g). Genes prioritized for schizophrenia were strongly enriched in Mendelian development and psychiatric disorders (OR: 2.22–2.23). Genes prioritized for AF and heart rate were strongly enriched in arrhythmia (OR: 7.16–8.28). Genes prioritized for autoimmune diseases were strongly enriched in monogenic immune dysregulation (OR: 3.11–4.32). Monogenic cardiovascular diseases showed strong enrichments of genes prioritized for lipids and heart conditions (OR: 2.69–3.70). We also identified pairs where Mendelian and complex traits seemed unrelated but were indeed linked. Examples include Alzheimer’s disease with immune dysregulation[35] (OR = 7.32) and breast cancer with insulin disorders[50] (OR = 9.71). The results corroborate the continuum between Mendelian and complex traits.

Human genetics has proven valuable in therapeutic development[51]. To evaluate their potential in drug discovery, we examined whether RSS-NET prioritized genes are pharmacologically active and clinically relevant[52]. We identified genes with drug indications matching GWAS traits. One exact match is EDNRA, a gene that is prioritized for CAD ($P_1^{\mathsf{bma}}$ = 0.80; $P_1^{\mathsf{net}}$ = 0.82 in aorta) and also a successful target of approved drugs for cardiovascular diseases (Table 1). We also identified genes with drug indications closely related to GWAS traits. For example, TTR is prioritized for Alzheimer’s disease ($P_1^{\mathsf{bma}}$ = 0.94) and is also a successful target of approved drugs for amyloidosis (Table 2). For early-stage development, overlaps between drug indications and GWAS traits may provide additional genetic confidence. For example, HCAR3 is prioritized for HDL ($P_1^{\mathsf{bma}}$ = 0.92) and is also a clinical trial target for lipid metabolism disorders (Table 2). Other examples include CASP8 with cancer, NFKB2 with IBD, and DLG4 with stroke (Tables 1, 2). For some genes we found mismatches between drug indications and GWAS traits, which could suggest drug repurposing opportunities[53]. For example, CSF3 is prioritized for AF ($P_1^{\mathsf{bma}}$ = 0.88; Table 2) and is also a successful target of an approved drug for aplastic anemia (AA). Since CSF3 is associated with various blood cell traits in mouse[54] and human[55], and inflammation plays a role in both AA and AF etiology[36,37,56], it is tempting to assess effects of the approved AA drug on AF. Mechanistic evaluations are required to understand the prioritized therapeutic genes, but they could form a useful basis for future studies.
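The enrichment screens above report odds ratios for gene-set overlaps and select trait pairs at FDR ≤ 0.1. The sketch below shows one common way to compute both quantities. It assumes the Benjamini-Hochberg procedure for FDR control, which the source does not name, and the function names and P values are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table [[a, b], [c, d]] (inf if b*c == 0)."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy P values (made up), one per trait pair.
pvals = [0.001, 0.008, 0.039, 0.041, 0.60]
qvals = benjamini_hochberg(pvals)
significant = [q <= 0.1 for q in qvals]
```

In this toy input the first four pairs pass the FDR ≤ 0.1 cutoff and the last does not.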

Table 2

Examples of RSS-NET highlighted genes that have not reached genome-wide significance in the GWAS Catalog[1] at the time of analysis.

Trait	Gene (Role)	$P_1^{\mathsf{base}}$	$P_1^{\mathsf{near}}$	$P_1^{\mathsf{bma}}$	$P_1^{\mathsf{net}}$ (Network, BF)	Mouse trait	Therapeutic and clinical evidence
BMI	NEXN (TG)	0.71	0.79	0.89	0.90 (Muscle, 9.31 × 10¹²)	CS, Muscle	Cardiomyopathy
	CDX2 (TF)	0.61	0.70	0.83	0.86 (NK cell, 3.95 × 10¹³)	DS, Growth
WAIST	BSCL2 (TG)	0.80	0.68	0.87	0.87 (Esophagus, 6.78 × 10²³⁹)	Adipose, Growth	Berardinelli-Seip syndrome
	FOXP2 (TF)	0.56	0.59	0.73	0.73 (Esophagus, 6.78 × 10²³⁹)	Growth, NS	Speech-language disorder-1
BC	ADSL (TG)	0.76	0.80	0.91	0.92 (Aorta, 8.27 × 10⁸)	CS, Eye	Adenylosuccinase deficiency
	SYNE1 (TG)	0.57	0.63	0.89	0.90 (Esophagus, 6.30 × 10⁷)	Growth, Muscle	AMC3, EDMD4, SCAR8
RA	TAL1 (TF)	0.71	0.79	0.91	0.93 (CD4, 3.02 × 10⁵²)	Immune, Tumor	Acute lymphocytic leukemia
	FHIT (TG)	0.30	0.60	0.90	0.91 (CD4, 3.02 × 10⁵²)	Immune, Tumor
	FLT3 (TG)	0.33	0.57	0.73	0.73 (B cell, 3.31 × 10⁵⁷)	Immune, Tumor	Acute myeloid leukemia
IBD	FHIT (TG)	0.63	0.87	0.95	0.95 (CD4, 5.32 × 10³³)	Immune, Tumor
	GATA3 (TF)	0.85	0.83	0.94	0.94 (NK cell, 5.07 × 10³⁵)	Immune, Renal	Barakat syndrome
	RORA (TF)	0.66	0.78	0.87	0.90 (B cell, 1.49 × 10³²)	Immune, NS	Intellectual disability
	NFKB2 (TF)	0.74	0.85	0.84	0.88 (B cell, 1.49 × 10³²)	Immune	Immunodeficiency, DIMS-0150^*
	LRBA (TG)	0.42	0.58	0.72	0.72 (NK cell, 5.07 × 10³⁵)	Immune	Immunodeficiency
	DOCK2 (TG)	0.38	0.53	0.71	0.71 (NK cell, 5.07 × 10³⁵)	Immune	Immunodeficiency
HDL	MT1G (TG)	0.10	0.09	0.98	0.98 (Liver, 2.81 × 10²¹)	CS, Metab.
	RETSAT (TG)	0.79	0.80	0.95	0.95 (Liver, 2.81 × 10²¹)	Adipose, Metab.
	ESR1 (TF)	0.77	0.82	0.95	0.95 (Liver, 2.81 × 10²¹)	CS, Metab.	Myocardial infarction
	HCAR3 (TG)	0.85	0.85	0.92	0.92 (Monocyte, 4.75 × 10¹⁵)	Metab.	ARI-3037MO^*
	TNNC1 (TG)	0.48	0.45	0.78	0.78 (Liver, 2.81 × 10²¹)	CS, Muscle	Cardiomyopathy, Levosimendan^*
LDL	RAF1 (TG)	0.79	0.83	0.90	0.90 (Aorta, 3.71 × 10²⁷)	CS, Immune	Cardiomyopathy, Semapimod^*
	APOA1 (TG)	0.70	0.76	0.90	0.90 (Liver, 7.66 × 10²⁷)	CS, Metab.	Amyloidosis, HDL deficiency
	ACADVL (TG)	0.69	0.59	0.85	0.85 (NK cell, 5.18 × 10³⁰)	Liver, Metab.	VLCAD deficiency
T2D	ITGB6 (TG)	0.75	0.99	0.99	0.99 (Ileum, 4.52 × 10⁶²)	Immune, Metab.	Amelogenesis imperfecta type IH
HR	TKT (TG)	0.65	0.67	0.92	0.93 (Aorta, 2.43 × 10⁷)	CS, Growth	SDDHD
CAD	OSM (TG)	0.56	0.78	0.86	0.86 (Aorta, 1.09 × 10²⁷)	Immune, Metab.	GSK2330811^*
	TRIB1 (TG)	0.43	0.68	0.85	0.85 (Adipose, 1.67 × 10²⁹)	Adipose, Metab.
	TAB2 (TG)	0.19	0.43	0.61	0.61 (CD8, 1.13 × 10²⁵)	CS	Congenital heart defects
AF	TPMT (TG)	0.88	0.93	0.99	0.99 (Ileum, 4.43 × 10¹³)	Metab.	Poor metabolism of thiopurines-1
	RUNX1 (TF)	0.44	0.60	0.88	0.89 (Heart, 2.15 × 10¹⁴)	CS, Immune	Acute myeloid leukemia, FPDMM
	CSF3 (TG)	0.56	0.72	0.88	0.88 (Muscle, 8.55 × 10¹⁴)	Blood, Immune	Interleukin-3^*
LOAD	CASP2 (TG)	0.99	1.00	1.00	1.00 (CD8, 8.31 × 10²⁶)	Cellular, NS	Caspase-2^*
	TTR (TG)	0.64	0.92	0.94	0.94 (Pancreas, 3.53 × 10²⁰)	Metab.	Amyloidosis, Inotersen^*, Patisiran^*
SCZ	RORA (TF)	1.00	1.00	1.00	1.00 (Cortex, 5.39 × 10¹²⁸)	Neuron, NS	Intellectual disability
	ERBB4 (TG)	1.00	1.00	1.00	1.00 (Putamen, 7.22 × 10¹¹⁶)	Neuron, NS	Amyotrophic lateral sclerosis-19
	NFIB (TF)	0.97	0.97	0.98	0.98 (Cortex, 5.39 × 10¹²⁸)	NS	MACID
	GRIK2 (TG)	0.90	0.94	0.97	0.97 (Cerebellum, 3.15 × 10¹²⁹)	Neuron, NS	Mental retardation
	SYT1 (TG)	0.84	0.89	0.93	0.93 (Cerebellum, 3.15 × 10¹²⁹)	Neuron, NS	Baker-Gordon syndrome
	ESR1 (TF)	0.80	0.84	0.93	0.93 (Colon, 1.07 × 10¹⁴¹)	Neuron, NS	Migraine
	NTRK2 (TG)	0.78	0.84	0.91	0.91 (Cerebellum, 3.15 × 10¹²⁹)	Neuron, NS	DEE58
	LRRK2 (TG)	0.73	0.78	0.86	0.86 (Monocyte, 5.85 × 10¹³¹)	Neuron, NS	Parkinsonism, DNL151^*, DNL201^*
	C9orf72 (TG)	0.74	0.78	0.83	0.83 (Spleen, 1.44 × 10¹⁴¹)	Neuron, NS	FTDALS1
	SNCA (TG)	0.60	0.66	0.74	0.74 (Cerebellum, 3.15 × 10¹²⁹)	Neuron, NS	Parkinsonism, BIIB054^*
NEU	LMBRD1 (TG)	0.42	0.66	0.94	0.94 (Ileum, 8.56 × 10²²)	Metab.	MAHCF
	PRKCQ (TG)	0.36	0.56	0.90	0.91 (Spleen, 2.13 × 10¹⁹)	Immune, NS
	ATP1A2 (TG)	0.33	0.39	0.76	0.78 (Putamen, 2.12 × 10¹⁹)	Neuron, NS	AHC1, FHM2

AMC3: myogenic-type arthrogryposis multiplex congenita-3. EDMD4: Emery-Dreifuss muscular dystrophy-4. SCAR8: autosomal recessive spinocerebellar ataxia-8. VLCAD: very long-chain acyl-CoA dehydrogenase. SDDHD: short stature, developmental delay, and congenital heart defects. FPDMM: familial platelet disorder with associated myeloid malignancy. MACID: acquired macrocephaly with impaired intellectual development. FTDALS1: frontotemporal dementia and/or amyotrophic lateral sclerosis. MAHCF: methylmalonic aciduria and homocystinuria of the cblF type. AHC1: alternating hemiplegia of childhood-1. FHM2: familial hemiplegic migraine-2. The remaining abbreviations are the same as in Table 1.


Discussion

We present RSS-NET, a topology-aware method for integrative analysis of regulatory networks and GWAS summary data. We demonstrate the improvement of RSS-NET over existing methods through extensive simulations, and illustrate its potential to yield biological and therapeutic insights via analyses of 38 networks and 18 traits. With multi-omics integration becoming routine in GWAS, we expect that researchers will find RSS-NET useful.

Compared with existing integrative approaches, RSS-NET has several key strengths. First, unlike many methods that require loci passing a significance threshold[11,12,17], RSS-NET uses data from genome-wide common variants. This potentially allows RSS-NET to identify subtle enrichments even in studies with few significant hits. Second, RSS-NET models enrichments directly as increased rates (θ) and sizes (σ²) of SNP-level associations, and thus bypasses the issue of converting SNP-level summary data to gene-level statistics[17,18,26]. Third, RSS-NET inherits from RSS-E[16] an important feature that inferred enrichments automatically highlight which network genes are most likely to be trait-associated. This prioritization component, though useful, is missing in current polygenic analyses[13,15,24,27]. Fourth, by making flexible modeling assumptions, RSS-NET is adaptive to unknown genetic architectures.

RSS-NET allows us to study complex trait genetics through the lens of regulatory topology. Complementing previous connectivity analyses[17-19,24], RSS-NET highlights a consistent pattern that genetic signals of complex traits are often distributed across the genome via regulatory topology. RSS-NET further leverages topology enrichments to enhance trait-associated gene discovery. The topology awareness of RSS-NET in both enrichment and prioritization analyses is enabled by a model that decomposes the effect of a single SNP into effects of multiple (cis or trans) genes through a regulatory network.

RSS-NET depends critically on the quality of input networks. The more accurate the networks are, the better RSS-NET performs. Currently, our understanding of regulatory networks remains incomplete, and most of the available networks are algorithmically inferred[17-20]. Artifacts in inferred networks can bias RSS-NET results; however, our simulations confirm the robustness of RSS-NET when input networks do not deviate severely from the ground truth. The modular design of RSS-NET enables systematic assessment of various networks in the same GWAS and provides interpretable performance metrics, as illustrated in our comparison of PECA- and CAGE-based networks. As more accurate networks become available in diverse cellular contexts, we expect the performance of RSS-NET to improve markedly.

Like any method, RSS-NET has several limitations in its current form. First, despite its prioritization feature, RSS-NET does not attempt to pinpoint associations to causal SNPs within prioritized loci. For this task, we recommend off-the-shelf fine-mapping methods[57]. Second, the computation time of RSS-NET increases as the total number of analyzed SNPs increases, and thus our simulations and analyses focused on 0.35–1.19 million genome-wide common SNPs[28,31]. Reducing the computational complexity will allow RSS-NET to analyze more SNPs jointly. Third, RSS-NET uses a simple method to derive SNP-gene relevance (c) from expression quantitative trait loci (eQTL). A more principled approach would be applying the RSS likelihood[25] to eQTL summary data (as we did in GWAS) and using the estimated SNP effects to specify c. However, our initial assessments indicated that the model-based approach was limited by the small sample sizes of current eQTL studies[9,10]. With eQTL studies reaching large sample sizes[58] comparable to current GWAS[1], this approach may improve c specification in RSS-NET. Fourth, RSS-NET analyzes one network at a time. Since a complex disease typically manifests in various sites, multiple cellular networks are likely to mediate disease risk jointly. To extend RSS-NET to incorporate multiple networks, an intuitive idea would be representing the total effect of a SNP as an average of its effect in each network, weighted by network relevance for a disease. Fifth, RSS-NET does not include known SNP-level[13,24,27] or gene-level[14-16] annotations. Although our mis-specification simulations and near-gene control analyses confirm that RSS-NET is robust to generic enrichments of known features, accounting for known annotations can help interpret observed network enrichments[24]. Our preliminary experiments showed that incorporating additional networks or annotations in RSS-NET increased computation costs. Hence, we view developing computationally efficient multi-network, multi-annotation methods as an important area for future work.

In summary, improved understanding of complex trait genetics requires biologically informed models beyond the standard one employed in GWAS. By modeling context-specific regulatory topology, RSS-NET is a step forward in this direction.

Methods

Gene and SNP information

This study used genes and SNPs from the human genome assembly GRCh37. This study used 18,334 protein-coding autosomal genes (http://ftp.ensembl.org/pub/grch37/release-94/gtf/homo_sapiens, accessed January 3, 2019). Simulations used 348,965 genome-wide SNPs[28] (https://www.wtccc.org.uk), and data analyses used 1,289,786 genome-wide HapMap3[31] SNPs (https://data.broadinstitute.org/alkesgroup/LDSCORE/w_hm3.snplist.bz2, accessed November 27, 2018). As discussed later, these SNP sets were chosen to reduce computation. This study excluded SNPs on sex chromosomes, SNPs with MAF less than 1%, and SNPs in the human leukocyte antigen region.
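The three SNP exclusion rules above (sex chromosomes, MAF below 1%, HLA region) can be expressed as a simple filter. The sketch below is illustrative only: the `Snp` record, the function name `keep_snp` and the SNP identifiers are hypothetical, and the HLA interval is a placeholder because the source does not state the boundaries it used.

```python
from dataclasses import dataclass

AUTOSOMES = {str(i) for i in range(1, 23)}

# Placeholder HLA interval on chromosome 6 (GRCh37 base pairs). The source does
# not give the boundaries, so these numbers are an assumption for illustration.
HLA_CHROM, HLA_START, HLA_END = "6", 25_000_000, 34_000_000


@dataclass(frozen=True)
class Snp:
    rsid: str
    chrom: str   # "1".."22", "X" or "Y"
    pos: int     # base-pair position
    maf: float   # minor allele frequency


def keep_snp(snp, min_maf=0.01):
    """Apply the three exclusion rules; return True if the SNP is retained."""
    if snp.chrom not in AUTOSOMES:      # drop sex chromosomes
        return False
    if snp.maf < min_maf:               # drop SNPs with MAF below 1%
        return False
    in_hla = snp.chrom == HLA_CHROM and HLA_START <= snp.pos <= HLA_END
    return not in_hla                   # drop SNPs in the HLA interval


# Toy records with made-up identifiers.
snps = [
    Snp("rs_a", "1", 1_000_000, 0.20),
    Snp("rs_b", "X", 5_000_000, 0.30),
    Snp("rs_c", "2", 2_000_000, 0.005),
    Snp("rs_d", "6", 30_000_000, 0.25),
]
kept = [s.rsid for s in snps if keep_snp(s)]
```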

Gene regulatory networks

In this study a regulatory network is a directed bipartite graph $\{V_{\mathrm{TF}}, V_{\mathrm{TG}}, E_{\mathrm{TF}\to\mathrm{TG}}\}$, where $V_{\mathrm{TF}}$ and $V_{\mathrm{TG}}$ denote the node sets of TFs and TGs respectively, and $E_{\mathrm{TF}\to\mathrm{TG}}$ denotes the set of TF-to-TG edges, summarizing how TFs regulate TGs through REs (Fig. 1b; Supplementary Note 4). Each edge has a weight between 0 and 1, measuring the relative regulation strength of a TF on a TG.

We inferred 38 regulatory networks from context-matched sequencing data of gene expression (e.g., RNA-seq) and chromatin accessibility (e.g., DNase-seq or ATAC-seq). We obtained these PECA data from ENCODE[29] (https://www.encodeproject.org, accessed December 14, 2018) and GTEx[9] (https://gtexportal.org, accessed July 13, 2019); see Supplementary Data 1. The network-construction software and TF-motif information are available at https://github.com/suwonglab/PECA. The 38 networks are available at https://github.com/suwonglab/rss-net, with descriptive statistics provided in Supplementary Tables 9–11.

We first constructed an “omnibus” network from PECA data of 201 biosamples across 80 cell types and tissues, using a regression-based method[20]. In brief, by modeling the distribution of TG expression levels conditional on RE accessibility levels and TF expression levels, we estimated a regression coefficient for each TF-TG pair. We selected a TF-TG pair as a network edge if this estimated coefficient was significantly non-zero, and divided the estimate by the maximum of estimates for all TF-TG pairs to set a (0, 1)-scale edge weight. We also estimated a regression coefficient for each RE-TG pair, which reflected the regulating strengths of REs on TGs and was later used to construct context-specific networks, i.e., {I} in Eq. (1). Here we defined REs as open chromatin peaks called from accessibility sequencing data by MACS2[59] (https://github.com/macs3-project/MACS, accessed July 12, 2018).

With the omnibus network in place, we then constructed context-specific networks for 5 immune cell types, 5 brain regions and 27 non-brain tissues. For each context (tissue or cell type), we computed a trans-regulation score (TRS) between TF g and TG t, given by Eq. (1), in which R is the correlation of TF g and TG t expression levels across all contexts; the expression (TF g, TG t) and accessibility (RE i) terms are normalized context-specific levels (here y denotes the actual accessibility or expression level in a given context, and ymed denotes the median level across all contexts); B reflects the motif binding strength of TF g on RE i, defined as the sum of motif position weight matrix-based log-odds probabilities of all binding sites on RE i and calculated by HOMER[60] (http://homer.ucsd.edu/homer/, accessed July 12, 2018); and I reflects the overall regulating strength of RE i on TG t, provided by the omnibus network. TRS naturally ranks and selects context-specific TF-TG edges because a larger value of TRS indicates a stronger regulating strength of TF g on TG t in the given context. We set (0, 1)-scale TF-TG edge weights from TRS.

To validate PECA-based networks and illustrate RSS-NET as a generally applicable tool, we also analyzed 394 cell type- and tissue-specific TF-TG circuits[18] inferred from independent CAGE data[7,8] (http://regulatorycircuits.org/, accessed May 8, 2019). When evaluating the similarity between PECA- and CAGE-based networks (Fig. 5b; Supplementary Fig. 12), we used their full node and edge sets to compute Jaccard indices. When running RSS-NET on context-matched PECA- and CAGE-based networks (Fig. 5d; Supplementary Fig. 14), we selected top-ranked CAGE-based edges to match PECA-based edge counts (Supplementary Table 10) and normalized CAGE-based edge weights (as a function of the original weight x) to match the scale of PECA-based edge weights (Supplementary Table 11).
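The network comparison steps above (Jaccard indices on full node and edge sets, and selecting top-ranked edges to match an edge count) can be sketched as below. The dictionary representation of a network, the function names and the toy weights are all hypothetical. The weight normalization applied to CAGE-based edges is deliberately not shown, because its formula is not recoverable from the text.

```python
def jaccard(set_a, set_b):
    """Jaccard index |A ∩ B| / |A ∪ B| (defined as 0.0 for two empty sets)."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def nodes(edges):
    """All TF and TG names appearing in a {(tf, tg): weight} network."""
    return {name for edge in edges for name in edge}


def top_edges(edges, k):
    """Keep the k highest-weight edges of a {(tf, tg): weight} network."""
    ranked = sorted(edges.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])


# Toy networks with made-up TF and TG names and weights.
peca = {("TF1", "G1"): 0.9, ("TF1", "G2"): 0.4, ("TF2", "G3"): 0.7}
cage = {("TF1", "G1"): 12.0, ("TF2", "G3"): 3.0,
        ("TF3", "G4"): 8.0, ("TF3", "G5"): 1.0}

edge_similarity = jaccard(set(peca), set(cage))
node_similarity = jaccard(nodes(peca), nodes(cage))
cage_matched = top_edges(cage, len(peca))   # match the PECA edge count
```

Similarity is computed on the full networks, whereas edge-count matching is a separate step applied only before running the two networks through the same analysis.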

External databases for cross-reference

To validate and interpret RSS-NET results, we used the following external databases (accessed November 28, 2019): GWAS Catalog[1] (https://www.ebi.ac.uk/gwas/), Mouse Genome Informatics[46] (http://www.informatics.jax.org/), Mendelian gene sets[48] (https://github.com/bogdanlab/gene_sets/), Online Mendelian Inheritance in Man[49] (https://www.omim.org/), and Therapeutic Target Database[52] (http://db.idrblab.net/ttd/). When quantifying overlaps between RSS-NET prioritized genes and mouse or Mendelian genes, we used all genes for each GWAS trait. We repeated the overlap analysis under the same significance cutoff (FDR ≤ 0.1) after excluding genes implicated in the same or later GWAS (Supplementary Table 7). Since GWAS-implicated genes overlap significantly with phenotypically matched mouse and Mendelian genes (median two-sided Fisher exact P = 7.1 × 10⁻⁷), we identified fewer discoveries as expected (mouse-human pairs: 26, Mendelian-complex pairs: 4; Supplementary Data 4–5), but we obtained consistent effect sizes nonetheless (mouse R = 0.78, two-sided P = 8.6 × 10⁻⁷³; Mendelian R = 0.89, P = 9.0 × 10⁻⁷⁴; Supplementary Fig. 15).

Network-induced effect size distribution

We model the total effect of SNP $j$ on a given trait, $\beta_j$, as

$$\beta_j \sim \pi_j \cdot \mathcal{N}(\mu_j, \sigma_0^2) + (1 - \pi_j) \cdot \delta_0, \tag{2}$$

where $\pi_j$ denotes the probability that SNP $j$ is associated with the trait ($\beta_j \neq 0$), $\mathcal{N}(\mu_j, \sigma_0^2)$ denotes a normal distribution with mean $\mu_j$ and variance $\sigma_0^2$ specifying the effect size of a trait-associated SNP $j$, and $\delta_0$ denotes point mass at zero ($\beta_j = 0$). We model the trait-association probability $\pi_j$ as

$$\pi_j = \left(1 + 10^{-(\theta_0 + a_j \theta)}\right)^{-1}, \tag{3}$$

where $\theta_0 < 0$ captures the genome-wide background proportion of trait-associated SNPs, $\theta > 0$ reflects the increase in probability, on the log10-odds scale, that a SNP near network genes and REs is trait-associated, and $a_j$ reflects the proximity of SNP $j$ to a network. Following previous analyses[15,16,24], we let $a_j = 1$ if SNP $j$ is within 100 kb of any member gene (TF, TG) or RE for a given network, and $a_j = 0$ otherwise. Equation (3) suggests that if a cell type or tissue plays an important role in a trait then genetic associations may occur more often in SNPs involved in the corresponding network genes and REs than expected by chance. We model the mean effect size $\mu_j$ as

$$\mu_j = \sum_{g \in O_j} w_{jg} \gamma_{jg}, \tag{4}$$

where $O_j$ is the set of all nearby or distal genes contributing to the total effect of SNP $j$, $w_{jg}$ measures the relevance between SNP $j$ and gene $g$, and $\gamma_{jg}$ denotes the effect of SNP $j$ on a trait due to gene $g$. Equation (4) provides a general decomposition of total SNP effect into gene effects through $\{O_j, w_{jg}\}$. Here we use a TF-TG network to specify $\{O_j, w_{jg}\}$ in Eq. (4):

$$\sum_{g \in O_j} w_{jg} \gamma_{jg} = \sum_{g \in G_j} c_{jg} \left( \gamma_{jg} + \sum_{t \in T_g} v_{gt} \gamma_{jt} \right), \tag{5}$$

where $G_j$ is the set of all genes within a 1 Mb window of SNP $j$ (a standard window size used in cis-eQTL studies[9,10,58]), $c_{jg}$ measures the relative impact of SNP $j$ on gene $g$, $T_g$ is the set of all genes directly regulated by TF $g$ in a given network ($T_g$ is empty if gene $g$ is not a TF), and $v_{gt}$ measures the relative impact of TF $g$ on its TG $t$. Since a genome-wide analysis typically involves many SNPs and genes, we fix $\{T_g, v_{gt}, c_{jg}\}$ to ensure the identifiability of Eq. (5). We use inferred edges and weights of a context-specific TF-TG network[20,29] to specify $T_g$ and $v_{gt}$ respectively. We use context-matched cis-eQTL[9,10,58] to specify $c_{jg}$ (Supplementary Note 5 and Tables 12, 13). Equation (5) suggests that the total effect of a SNP may fan out through some regulatory network of multiple (nearby or distal) genes to affect the trait[22]. We model the random effect $\gamma_{jg}$ of SNP $j$ due to gene $g$ as

$$\gamma_{jg} \sim \mathcal{N}(0, \sigma^2), \tag{6}$$

where the SNP-level subscript $j$ in $\gamma_{jg}$ ensures the exchangeability of $\beta_j$ in Eq. (2); see Supplementary Note 6. Equation (6) uses a constant $\sigma^2$ for computational convenience. Equation (6) could be modified by letting $\sigma^2$ depend on functional annotations[13,27] of SNP $j$ and context-specific expression[14-16] of gene $g$, though possibly at higher computational cost. Equations (2), (4), and (6) imply a variance decomposition for the SNP effect:

$$\mathrm{Var}(\beta_j) = \pi_j \cdot \left( \sigma_0^2 + \sigma^2 \sum_{g \in O_j} w_{jg}^2 \right). \tag{7}$$

We hypothesize that Eq. (7) may provide an alternative approach to heritability analyses[13,24,27] and we plan to investigate it elsewhere.
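As a rough illustration of the prior described above, the following Python sketch computes the association probability, collects the SNP-to-gene weights implied by a TF-TG network, and evaluates the resulting prior variance of a SNP effect. It assumes that Eq. (3) is the log10-odds model log10[π/(1 − π)] = θ0 + aθ, that Eq. (5) adds, for each nearby gene g, its own contribution c plus c·v for every target of g, and that the gene-level effects γ are independent across genes. The function names and toy inputs are hypothetical and are not part of the RSS-NET software.

```python
def assoc_prob(theta0, theta, a_j):
    """Association probability when log10-odds = theta0 + a_j * theta (assumed form of Eq. 3)."""
    return 1.0 / (1.0 + 10.0 ** (-(theta0 + a_j * theta)))


def network_weights(c_j, tf_targets):
    """Collect SNP-to-gene weights w_jg by expanding the assumed form of Eq. (5).

    c_j:        dict mapping each gene near SNP j to its cis weight c_jg
    tf_targets: dict mapping a TF to a dict {target gene: edge weight v_gt}
    """
    w = {}
    for g, c in c_j.items():
        # Direct contribution of the nearby gene g.
        w[g] = w.get(g, 0.0) + c
        # Indirect contributions through the targets of g (none if g is not a TF).
        for t, v in tf_targets.get(g, {}).items():
            w[t] = w.get(t, 0.0) + c * v
    return w


def effect_variance(pi_j, sigma0_sq, sigma_sq, weights):
    """Prior variance pi_j * (sigma0^2 + sigma^2 * sum_g w_jg^2), assuming independent gene effects."""
    return pi_j * (sigma0_sq + sigma_sq * sum(x * x for x in weights))


if __name__ == "__main__":
    # Toy SNP near one TF and one of its targets (hypothetical gene labels).
    w = network_weights({"TF1": 0.5, "G2": 0.2}, {"TF1": {"G2": 2.0, "G3": 1.0}})
    pi_j = assoc_prob(theta0=-2.0, theta=1.0, a_j=1)
    print(w, pi_j, effect_variance(pi_j, 1e-4, 1e-4, w.values()))
```

In this toy setting a distal gene (G3) receives a nonzero weight only because the nearby TF regulates it, which is the sense in which a SNP effect can fan out through the network.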

Bayesian hierarchical modeling

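Consider a GWAS with $n$ unrelated individuals measured on $p$ SNPs. In practice we do not know the true SNP-level effects $\beta_j$ in Eq. (2), but we can infer them from GWAS summary statistics and LD estimates. Specifically, we perform Bayesian inference for $\boldsymbol{\beta} := (\beta_1, \ldots, \beta_p)'$ by combining the network-based prior defined by Eqs. (2)–(6) with the RSS likelihood[25]:

$$\hat{\boldsymbol{\beta}} \sim \mathcal{N}\left(\hat{\mathbf{S}} \hat{\mathbf{R}} \hat{\mathbf{S}}^{-1} \boldsymbol{\beta},\ \hat{\mathbf{S}} \hat{\mathbf{R}} \hat{\mathbf{S}}\right), \tag{8}$$

where $\hat{\boldsymbol{\beta}} := (\hat{\beta}_1, \ldots, \hat{\beta}_p)'$, $\hat{\mathbf{S}} := \mathrm{diag}(\hat{\mathbf{s}})$ is a $p \times p$ diagonal matrix with diagonal elements $\hat{\mathbf{s}} := (\hat{s}_1, \ldots, \hat{s}_p)'$, $\hat{\beta}_j$ and $\hat{s}_j$ are the estimated single-SNP effect size of each SNP $j$ and its standard error from the GWAS, and $\hat{\mathbf{R}}$ is the $p \times p$ LD matrix estimated from a reference panel with ancestry matching the GWAS. RSS-NET, defined by Eqs. (2)–(6), and (8), consists of four unknown hyper-parameters $\{\theta_0, \theta, \sigma_0^2, \sigma^2\}$. To specify hyper-priors, we first introduce two free parameters $\{\eta, \rho\}$ to re-parameterize $\{\sigma_0^2, \sigma^2\}$:

$$\sigma_0^2 = \eta (1 - \rho) \left( \sum_{j=1}^{p} \pi_j n^{-1} \hat{s}_j^{-2} \right)^{-1}, \qquad \sigma^2 = \eta \rho \left( \sum_{j=1}^{p} \pi_j n^{-1} \hat{s}_j^{-2} \sum_{g \in O_j} w_{jg}^2 \right)^{-1}, \tag{9}$$

where, roughly, $\eta$ represents the proportion of the total phenotypic variation explained by $p$ SNPs, and $\rho$ represents the proportion of total genetic variation explained by network annotations $\{O_j, w_{jg}\}$. Because $n \hat{s}_j^2$ approximates the ratio of phenotype variance to genotype variance, Eq. (9) ensures that SNP effects ($\boldsymbol{\beta}$) do not rely on sample size $n$ and have the same measurement unit as the trait. See Supplementary Note 7 for derivation of Eq. (9). We then place independent uniform grid priors on $\{\theta_0, \theta, \eta, \rho\}$ (Supplementary Table 14). These simple hyper-priors produce accurate posterior estimates for hyper-parameters in simulations (Supplementary Fig. 16). RSS-NET results are robust to grid choice on both simulated and real data (Supplementary Figs. 17–18). (If one had specific information about $\{\theta_0, \theta, \eta, \rho\}$ in a given setting then this could be incorporated in the hyper-priors.)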
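To make the role of the LD matrix concrete, the sketch below evaluates the mean and covariance of the single-SNP estimates under the RSS likelihood, taken here to be multivariate normal with mean S R S⁻¹ β and covariance S R S, where S is the diagonal matrix of standard errors and R is the LD matrix. Plain Python lists are used for readability; the function name and the two-SNP inputs are hypothetical, and a genome-wide analysis would need a far more efficient representation of R.

```python
def rss_mean_cov(beta, se, ld):
    """Mean and covariance of the single-SNP estimates under the RSS likelihood.

    beta: true multiple-SNP effects (length p)
    se:   standard errors of the single-SNP estimates (length p)
    ld:   p x p LD (correlation) matrix as a list of rows
    """
    p = len(beta)
    # Mean: (S R S^{-1} beta)_i = s_i * sum_j R_ij * beta_j / s_j
    mean = [se[i] * sum(ld[i][j] * beta[j] / se[j] for j in range(p))
            for i in range(p)]
    # Covariance: (S R S)_ij = s_i * R_ij * s_j
    cov = [[se[i] * ld[i][j] * se[j] for j in range(p)] for i in range(p)]
    return mean, cov


if __name__ == "__main__":
    # Two SNPs in LD (r = 0.5); only the first has a nonzero true effect.
    mean, cov = rss_mean_cov([1.0, 0.0], [0.1, 0.2], [[1.0, 0.5], [0.5, 1.0]])
    print(mean, cov)
```

In the toy example the second SNP has no effect of its own, yet its expected single-SNP estimate is nonzero because it is correlated with the first SNP; this is the confounding by LD that the likelihood accounts for.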

Network enrichment

To assess whether a regulatory network is enriched for genetic associations with a trait, we evaluate a Bayes factor (BF):

$$\mathrm{BF} := \frac{f(\hat{\boldsymbol{\beta}} \mid \hat{\mathbf{S}}, \hat{\mathbf{R}}, \mathbf{a}, \mathbf{O}, \mathbf{W}, M_1)}{f(\hat{\boldsymbol{\beta}} \mid \hat{\mathbf{S}}, \hat{\mathbf{R}}, \mathbf{a}, \mathbf{O}, \mathbf{W}, M_0)}, \tag{10}$$

where $f(\cdot)$ denotes probability densities, $\mathbf{a}$ collects the annotations $a_j$ defined in Eq. (3), $\{\mathbf{O}, \mathbf{W}\}$ collect the gene sets $O_j$ and weights $w_{jg}$ defined in Eq. (4), $M_1$ denotes the enrichment model with $\theta > 0$ or $\sigma^2 > 0$, and $M_0$ denotes the baseline model with $\theta = 0$ and $\sigma^2 = 0$. The observed data are BF times more likely under $M_1$ than under $M_0$, and so the larger the BF, the stronger the evidence for network enrichment. See Supplementary Note 2 for computation details. To compute BFs used in Fig. 5c, we replace $M_1$ in Eq. (10) with three restricted enrichment models ($M_{11}$, $M_{12}$, $M_{13}$). Unless otherwise specified, all BFs reported in this work are based on $M_1$. Given a BF cutoff, false positive rates vary considerably across genetic architectures and enrichment patterns in simulations (Supplementary Table 15). As the genetic basis of most complex traits remains unknown, we find it impractical to fix a single significance threshold. Instead we recommend an adaptive approach. Specifically, for a given GWAS we run RSS-NET on a near-gene control network containing all genes as nodes and no edges (i.e., $a_j = 1$ for all SNPs within 100 kb of any gene and $v_{gt} = 0$ for all TF-TG pairs), and we use the resulting BF as the enrichment threshold in this GWAS. Our analyses show three advantages of this approach. First, it is adaptive to study heterogeneity such as trait differences and sample sizes (Supplementary Table 1). Second, it accounts for generic enrichments of genetic signals residing near genes. Third, it facilitates comparisons with non-Bayesian methods based on P-values (Supplementary Table 2).
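The following sketch shows one way a BF of this kind could be assembled once a log marginal likelihood is available at every hyper-parameter grid point. It assumes uniform weights over the grid points of each model, so that each marginal likelihood is a simple average, and it takes the per-grid-point log-likelihoods as given (how they are obtained is described in Supplementary Note 2 and is not reproduced here). The function names are hypothetical.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def log10_bayes_factor(loglik_m1, loglik_m0):
    """log10 BF from natural-log marginal likelihoods on two hyper-parameter grids.

    loglik_m1: values at grid points of the enrichment model M1
    loglik_m0: values at grid points of the baseline model M0
    Grid points within each model are weighted equally (uniform grid prior).
    """
    return (log_mean_exp(loglik_m1) - log_mean_exp(loglik_m0)) / math.log(10.0)


def is_enriched(log10_bf_network, log10_bf_near_gene):
    """Adaptive threshold: compare a network BF against the near-gene control BF."""
    return log10_bf_network > log10_bf_near_gene
```

Working on the log scale avoids overflow, since BFs in genome-wide analyses can span many orders of magnitude.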

Locus association

To identify the association between a locus and a trait, we compute $P_1$, the posterior probability that at least one SNP in the locus is associated with the trait:

$$P_1 := 1 - \Pr(\beta_j = 0,\ \forall j \in \text{locus} \mid D), \tag{11}$$

where $D$ is a shorthand for the input data of RSS-NET including GWAS summary statistics $\{\hat{\beta}_j, \hat{s}_j\}$, LD estimates $\hat{\mathbf{R}}$ and network annotations $\{\mathbf{a}, \mathbf{O}, \mathbf{W}\}$. See Supplementary Note 3 for computation details. For a locus, $P_1^{\mathrm{base}}$, $P_1^{\mathrm{near}}$, and $P_1^{\mathrm{net}}$ correspond to $P_1$ evaluated under the baseline model $M_0$, the enrichment model $M_1$ for the near-gene control network, and $M_1$ for a given TF-TG network. In this study, we defined a locus as the transcribed region of a gene plus 100 kb up and downstream, and we used “locus” and “gene” interchangeably. For $K$ networks with enrichments stronger than the near-gene control, we use Bayesian model averaging (BMA) to compute $P_1^{\mathrm{bma}}$ for each locus:

$$P_1^{\mathrm{bma}} := \frac{P_1^{\mathrm{near}} \cdot \mathrm{BF}^{\mathrm{near}} + \sum_{k=1}^{K} P_1^{\mathrm{net}}(k) \cdot \mathrm{BF}(k)}{\mathrm{BF}^{\mathrm{near}} + \sum_{k=1}^{K} \mathrm{BF}(k)}, \tag{12}$$

where $P_1^{\mathrm{net}}(k)$ and $\mathrm{BF}(k)$ are enrichment $P_1$ and BF for network $k$, and $\mathrm{BF}^{\mathrm{near}}$ is the BF for the near-gene control network. The ability to average across networks in Eq. (12) is an advantage of our Bayesian framework, because it allows us to assess associations in light of network enrichment without having to select a single enriched network. In this study we used $P_1 \geq 0.9$ as the significance cutoff, yielding a median false positive rate of $1.24 \times 10^{-4}$ and a median false discovery rate of $6.43 \times 10^{-2}$ in simulations (Supplementary Tables 16, 17). We also highlighted genes with (Fig. 6 and Tables 1, 2), because they showcase the influence of context-specific regulatory topology on prioritizing genetic associations.
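The two locus-level quantities can be sketched as follows. The first function assumes a fully factorized posterior approximation in which each SNP has its own inclusion probability, so that the probability of no association in the locus is a product; this is an assumption made for illustration, and Supplementary Note 3 describes the actual computation. The second function is generic model averaging in which each model's P1 is weighted in proportion to its BF, which corresponds to equal prior probabilities across the models being averaged. The function names are hypothetical.

```python
def locus_p1(inclusion_probs):
    """P1 = 1 - Pr(no SNP in the locus is associated), assuming independent SNPs."""
    prob_none = 1.0
    for alpha in inclusion_probs:
        prob_none *= 1.0 - alpha
    return 1.0 - prob_none


def bma_p1(p1_values, log10_bfs):
    """BF-weighted average of per-model P1 values.

    p1_values: P1 of the locus under each model being averaged
               (e.g. the near-gene control and each enriched network)
    log10_bfs: log10 BF of each model, in the same order
    """
    # Subtract the largest log10 BF before exponentiating to avoid overflow.
    top = max(log10_bfs)
    weights = [10.0 ** (b - top) for b in log10_bfs]
    total = sum(weights)
    return sum(p * w for p, w in zip(p1_values, weights)) / total


if __name__ == "__main__":
    # A locus that looks weak under the control but strong under an enriched network.
    print(bma_p1([0.2, 0.95], [3.0, 8.0]))
```

Because the weights are proportional to BFs, a network with a much larger BF than the control dominates the average, so the averaged P1 moves towards that network's estimate.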

Computation time

The total computation time of RSS-NET to analyze a trait-network pair is determined by the number of genome-wide SNPs analyzed, the size of the hyper-parameter grid, and the number of variational iterations until convergence, all of which can vary considerably among studies. It is thus hard to make general statements about computation time. However, to give a specific example, we finished the analysis of 1,032,214 HapMap3 SNPs and the liver network for HDL within 12 hours on a standard computer cluster (60 nodes, 8 CPUs, and 32 GB memory per node). The number of genome-wide SNPs analyzed ($p$) affects the computation time of RSS-NET in two distinct ways. First, the per-iteration complexity of RSS-NET is linear in $p$ (Box 1; Supplementary Note 1). Second, a large $p$ defines a large optimization problem, often requiring many iterations to converge. To quantify the impact of $p$ on computation time, we simulated datasets from different sets of genome-wide SNPs, analyzed them with RSS-NET on identical computers, and compared the computation time (Supplementary Fig. 9). When $p$ increased from 348,965 to 1,030,397, on average the total computation time was four times longer (one-sided Wilcoxon $P = 8.0 \times 10^{-132}$).

Simulation overview

To assess the network-induced model for SNP effects ($\boldsymbol{\beta}$) in RSS-NET, we simulated a large array of correctly- and mis-specified $\boldsymbol{\beta}$ for a given target network. Specifically, we generated “positive” datasets where the underlying $\boldsymbol{\beta}$ was simulated from $M_1$ for the target network, and “negative” datasets where $\boldsymbol{\beta}$ was simulated from either $M_0$ or the following scenarios: (1) random enrichments of near-gene SNPs; (2) random enrichments of near-RE SNPs; (3) MAF- and LD-dependent effect sizes; (4) $M_1$ for edge-altered copies of the target network. For a fair comparison in each scenario, we matched positive and negative datasets by both the number of trait-associated SNPs and the proportion of phenotypic variation explained by all SNPs. See Supplementary Figs. 1–9 for details. We combined the simulated $\boldsymbol{\beta}$ with genotypes of 348,965 genome-wide SNPs from 1,458 individuals[28] to simulate phenotypes using an additive multiple-SNP model with Gaussian noise. We performed the standard single-SNP analysis of simulated individual-level datasets to generate GWAS summary statistics, on which we compared RSS-NET with external methods.
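The last two steps, simulating phenotypes from an additive multiple-SNP model and running a single-SNP analysis, can be sketched in a few lines. This is a minimal illustration assuming a simple linear regression of the phenotype on each SNP in turn with an intercept; the function names, the tiny genotype matrix and the noise level are hypothetical and do not reproduce the simulation settings of the study.

```python
import math
import random


def simulate_phenotype(genotypes, beta, noise_sd, rng):
    """Additive multiple-SNP model: y_i = sum_j x_ij * beta_j + Gaussian noise."""
    return [sum(x * b for x, b in zip(row, beta)) + rng.gauss(0.0, noise_sd)
            for row in genotypes]


def single_snp_stats(genotypes, y):
    """Marginal least-squares effect estimate and standard error for each SNP.

    Assumes every SNP is polymorphic in the sample (nonzero genotype variance)
    and that there are at least three individuals.
    """
    n, p = len(y), len(genotypes[0])
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    stats = []
    for j in range(p):
        col = [row[j] for row in genotypes]
        x_mean = sum(col) / n
        xc = [v - x_mean for v in col]
        sxx = sum(v * v for v in xc)
        slope = sum(u * v for u, v in zip(xc, yc)) / sxx
        # Residual sum of squares of the one-SNP regression with intercept.
        rss = sum((v - slope * u) ** 2 for u, v in zip(xc, yc))
        se = math.sqrt(rss / (n - 2) / sxx)
        stats.append((slope, se))
    return stats


if __name__ == "__main__":
    rng = random.Random(0)
    # 200 individuals, 3 SNPs with genotypes coded 0/1/2; only SNP 0 is causal.
    geno = [[rng.choice([0, 1, 2]) for _ in range(3)] for _ in range(200)]
    pheno = simulate_phenotype(geno, [0.3, 0.0, 0.0], 1.0, rng)
    for slope, se in single_snp_stats(geno, pheno):
        print(round(slope, 3), round(se, 3))
```

The pairs of estimates and standard errors returned by the second function play the role of the GWAS summary statistics on which the methods are compared.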

External software for benchmarking

To benchmark RSS-NET, this study used the following software: RSS-E (https://github.com/stephenslab/rss, accessed October 19, 2018), Pascal (https://www2.unil.ch/cbg/index.php?title=Pascal, accessed October 5, 2017) and LDSC with two sets of baseline annotations as covariates (version 1.0.0, https://github.com/bulik/ldsc; baseline model v1.1, https://data.broadinstitute.org/alkesgroup/LDSCORE/1000G_Phase3_baseline_v1.1_ldscores.tgz; baselineLD model v2.1, https://data.broadinstitute.org/alkesgroup/LDSCORE/1000G_Phase3_baselineLD_v2.1_ldscores.tgz; accessed November 27, 2018). Versions of all packages and files were up-to-date at the time of analysis. Given a context-specific TF-TG network, RSS-E and LDSC methods use the same binary SNP-level annotations $\{a_j\}$ defined in Eq. (3). The interface design of Pascal does not allow direct usage of $\{a_j\}$. Here we supplied the Pascal program with a GMT file containing all member genes of a network and set SNP-to-gene window sizes to 100 kb (“--up=100000 --down=100000”). In this study all external methods were used with their default setups, which did not include the edge information of a network. RSS-E outputs the same statistics as RSS-NET (BF and $P_1$). Pascal implements two gene scoring methods (maximum-of-χ² and sum-of-χ²) to produce gene-based association P-values. Given gene scores, Pascal provides two gene set scoring options (χ² approximation and empirical sampling) to produce enrichment P-values. LDSC methods output enrichment P-values and coefficient Z-scores, yielding consistent results in our simulations (LDSC-baseline: $R = 0.98$, two-sided $P = 1.2 \times 10^{-67}$; LDSC-baselineLD: $R = 0.98$, $P = 9.1 \times 10^{-63}$; Supplementary Fig. 19). Due to the higher power shown in simulations (LDSC-baseline: average AUROC increase = 0.012, one-sided t $P = 4.0 \times 10^{-3}$; LDSC-baselineLD: average AUROC increase = 0.023, one-sided t $P = 1.5 \times 10^{-5}$), we used enrichment P-values from LDSC in this study.
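For readers unfamiliar with the GMT format mentioned above, a gene set is stored as one tab-delimited line: the set name, a free-text description, and then the member gene identifiers. The sketch below builds such a line from the node list of a network. The function name, the set name and the gene symbols are illustrative placeholders; the identifier type that a given tool expects should be checked against its own documentation.

```python
def gmt_line(set_name, genes, description="na"):
    """One GMT record: name, description, then de-duplicated gene identifiers."""
    unique_genes = sorted(set(genes))
    return "\t".join([set_name, description] + unique_genes)


if __name__ == "__main__":
    # TFs and TGs of a network are pooled into a single gene set (edges are not used).
    members = ["HNF4A", "APOA1", "HNF4A"]
    with open("network_genes.gmt", "w") as handle:
        handle.write(gmt_line("liver_network", members) + "\n")
```

Note that only network membership survives this conversion, which is consistent with the statement that the external methods did not use the edge information of a network.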

59 in total

1. Systematic localization of common disease-associated variation in regulatory DNA.

Authors: Matthew T Maurano; Richard Humbert; Eric Rynes; Robert E Thurman; Eric Haugen; Hao Wang; Alex P Reynolds; Richard Sandstrom; Hongzhu Qu; Jennifer Brody; Anthony Shafer; Fidencio Neri; Kristen Lee; Tanya Kutyavin; Sandra Stehling-Sun; Audra K Johnson; Theresa K Canfield; Erika Giste; Morgan Diegel; Daniel Bates; R Scott Hansen; Shane Neph; Peter J Sabo; Shelly Heimfeld; Antony Raubitschek; Steven Ziegler; Chris Cotsapas; Nona Sotoodehnia; Ian Glass; Shamil R Sunyaev; Rajinder Kaul; John A Stamatoyannopoulos
Journal: Science Date: 2012-09-05 Impact factor: 47.728

2. Integrating common and rare genetic variation in diverse human populations.

Authors: David M Altshuler; Richard A Gibbs; Leena Peltonen; David M Altshuler; Richard A Gibbs; Leena Peltonen; Emmanouil Dermitzakis; Stephen F Schaffner; Fuli Yu; Leena Peltonen; Emmanouil Dermitzakis; Penelope E Bonnen; David M Altshuler; Richard A Gibbs; Paul I W de Bakker; Panos Deloukas; Stacey B Gabriel; Rhian Gwilliam; Sarah Hunt; Michael Inouye; Xiaoming Jia; Aarno Palotie; Melissa Parkin; Pamela Whittaker; Fuli Yu; Kyle Chang; Alicia Hawes; Lora R Lewis; Yanru Ren; David Wheeler; Richard A Gibbs; Donna Marie Muzny; Chris Barnes; Katayoon Darvishi; Matthew Hurles; Joshua M Korn; Kati Kristiansson; Charles Lee; Steven A McCarrol; James Nemesh; Emmanouil Dermitzakis; Alon Keinan; Stephen B Montgomery; Samuela Pollack; Alkes L Price; Nicole Soranzo; Penelope E Bonnen; Richard A Gibbs; Claudia Gonzaga-Jauregui; Alon Keinan; Alkes L Price; Fuli Yu; Verneri Anttila; Wendy Brodeur; Mark J Daly; Stephen Leslie; Gil McVean; Loukas Moutsianas; Huy Nguyen; Stephen F Schaffner; Qingrun Zhang; Mohammed J R Ghori; Ralph McGinnis; William McLaren; Samuela Pollack; Alkes L Price; Stephen F Schaffner; Fumihiko Takeuchi; Sharon R Grossman; Ilya Shlyakhter; Elizabeth B Hostetter; Pardis C Sabeti; Clement A Adebamowo; Morris W Foster; Deborah R Gordon; Julio Licinio; Maria Cristina Manca; Patricia A Marshall; Ichiro Matsuda; Duncan Ngare; Vivian Ota Wang; Deepa Reddy; Charles N Rotimi; Charmaine D Royal; Richard R Sharp; Changqing Zeng; Lisa D Brooks; Jean E McEwen
Journal: Nature Date: 2010-09-02 Impact factor: 49.962

Review 3. From genome-wide associations to candidate causal variants by statistical fine-mapping.

Authors: Daniel J Schaid; Wenan Chen; Nicholas B Larson
Journal: Nat Rev Genet Date: 2018-08 Impact factor: 53.242

Review 4. Validating therapeutic targets through human genetics.

Authors: Robert M Plenge; Edward M Scolnick; David Altshuler
Journal: Nat Rev Drug Discov Date: 2013-07-19 Impact factor: 84.694

Review 5. Mechanisms of tissue and cell-type specificity in heritable traits and diseases.

Authors: Idan Hekselman; Esti Yeger-Lotem
Journal: Nat Rev Genet Date: 2020-01-08 Impact factor: 53.242

6. An atlas of active enhancers across human cell types and tissues.

Authors: Robin Andersson; Claudia Gebhard; Michael Rehli; Albin Sandelin; Irene Miguel-Escalada; Ilka Hoof; Jette Bornholdt; Mette Boyd; Yun Chen; Xiaobei Zhao; Christian Schmidl; Takahiro Suzuki; Evgenia Ntini; Erik Arner; Eivind Valen; Kang Li; Lucia Schwarzfischer; Dagmar Glatz; Johanna Raithel; Berit Lilje; Nicolas Rapin; Frederik Otzen Bagger; Mette Jørgensen; Peter Refsing Andersen; Nicolas Bertin; Owen Rackham; A Maxwell Burroughs; J Kenneth Baillie; Yuri Ishizu; Yuri Shimizu; Erina Furuhata; Shiori Maeda; Yutaka Negishi; Christopher J Mungall; Terrence F Meehan; Timo Lassmann; Masayoshi Itoh; Hideya Kawaji; Naoto Kondo; Jun Kawai; Andreas Lennartsson; Carsten O Daub; Peter Heutink; David A Hume; Torben Heick Jensen; Harukazu Suzuki; Yoshihide Hayashizaki; Ferenc Müller; Alistair R R Forrest; Piero Carninci
Journal: Nature Date: 2014-03-27 Impact factor: 49.962

7. An integrated encyclopedia of DNA elements in the human genome.

Authors:
Journal: Nature Date: 2012-09-06 Impact factor: 49.962

8. Understanding multicellular function and disease with human tissue-specific networks.

Authors: Casey S Greene; Arjun Krishnan; Aaron K Wong; Emanuela Ricciotti; Rene A Zelaya; Daniel S Himmelstein; Ran Zhang; Boris M Hartmann; Elena Zaslavsky; Stuart C Sealfon; Daniel I Chasman; Garret A FitzGerald; Kara Dolinski; Tilo Grosser; Olga G Troyanskaya
Journal: Nat Genet Date: 2015-04-27 Impact factor: 38.330

9. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types.

Authors: Hilary K Finucane; Yakir A Reshef; Verneri Anttila; Kamil Slowikowski; Alexander Gusev; Andrea Byrnes; Steven Gazal; Po-Ru Loh; Caleb Lareau; Noam Shoresh; Giulio Genovese; Arpiar Saunders; Evan Macosko; Samuela Pollack; John R B Perry; Jason D Buenrostro; Bradley E Bernstein; Soumya Raychaudhuri; Steven McCarroll; Benjamin M Neale; Alkes L Price
Journal: Nat Genet Date: 2018-04-09 Impact factor: 38.330

10. Understanding Tissue-Specific Gene Regulation.

Authors: Abhijeet Rajendra Sonawane; John Platig; Maud Fagny; Cho-Yi Chen; Joseph Nathaniel Paulson; Camila Miranda Lopes-Ramos; Dawn Lisa DeMeo; John Quackenbush; Kimberly Glass; Marieke Lydia Kuijjer
Journal: Cell Rep Date: 2017-10-24 Impact factor: 9.423
