Xiang Zhu, Zhana Duren, Wing Hung Wong.
Abstract
Genome-wide association studies (GWAS) have cataloged many significant associations between genetic variants and complex traits. However, most of these findings have unclear biological significance, because they often have small effects and occur in non-coding regions. Integration of GWAS with gene regulatory networks addresses both issues by aggregating weak genetic signals within regulatory programs. Here we develop a Bayesian framework that integrates GWAS summary statistics with regulatory networks to infer genetic enrichments and associations simultaneously. Our method improves upon existing approaches by explicitly modeling network topology to assess enrichments, and by automatically leveraging enrichments to identify associations. Applying this method to 18 human traits and 38 regulatory networks shows that genetic signals of complex traits are often enriched in interconnections specific to trait-relevant cell types or tissues. Prioritizing variants within enriched networks identifies known and previously undescribed trait-associated genes revealing biological and therapeutic insights.
Year: 2021 PMID: 33990562 PMCID: PMC8121952 DOI: 10.1038/s41467-021-22588-0
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Schematic of RSS-NET.
a Decomposition of the total effect of a common SNP on a complex trait through multiple nearby and distal genes. b Gene regulatory network defined as a weighted and directed bipartite graph linking TFs to TGs. c RSS-NET exploits the topology of a TF-TG network to decompose the total genetic effect into cis and trans-regulatory components. Both the SNP-gene (c) and TF-TG (v) weights in this decomposition are assumed known and are specified by existing omics data (Methods). In addition to TF-TG networks, RSS-NET also requires d GWAS summary statistics and e ancestry-matching LD estimates as input. f Bayesian hierarchical model underlying RSS-NET. An in-depth description is provided in Methods. g Given a network, RSS-NET produces a Bayes factor comparing the baseline (M0) and enrichment (M1) models to summarize the evidence for network enrichment. h RSS-NET prioritizes loci within an enriched network by computing P1, the posterior probability that at least one SNP j in a locus is trait-associated (β ≠ 0). Differences between P1 under M0 and M1 reflect the influence of a regulatory network on genetic associations, highlighting previously undescribed trait-associated genes.
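The locus-level quantity P1 in panel h can be illustrated with a minimal sketch. It assumes that each SNP j in a locus has a posterior inclusion probability and that these probabilities are treated as independent. This is an illustrative simplification, not necessarily how RSS-NET computes P1 (the paper's Methods give the actual procedure). The function name `locus_p1` and the input values are hypothetical.

```python
import math
from typing import Sequence


def locus_p1(pips: Sequence[float]) -> float:
    """Probability that at least one SNP in a locus has a non-zero effect.

    Assumption (not from the paper): per-SNP posterior inclusion
    probabilities are independent, so P1 = 1 - prod_j (1 - pip_j).
    """
    if any(p < 0.0 or p > 1.0 for p in pips):
        raise ValueError("each probability must lie in [0, 1]")
    if any(p == 1.0 for p in pips):
        return 1.0  # one certain SNP makes the whole locus certain
    # Sum logs instead of multiplying, which is more stable for many terms.
    log_none = sum(math.log1p(-p) for p in pips)
    return 1.0 - math.exp(log_none)


# Hypothetical inclusion probabilities for three SNPs in one locus.
print(locus_p1([0.10, 0.05, 0.20]))
```

Evaluating such a quantity once under M0 and once under M1 gives the pair of values whose difference the caption attributes to the regulatory network.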
Fig. 2 Flexibility of RSS-NET to identify network-level enrichments from GWAS summary statistics.
We used a B cell-specific regulatory network and real genotypes of 348,965 genome-wide SNPs to simulate negative and positive individual-level data under two genetic architectures (“sparse” and “polygenic”). We simulated SNP effects (β) for negative datasets from the baseline model (M0: θ = 0 and σ2 = 0). We simulated β for positive datasets from the enrichment model (M1: θ > 0 or σ2 > 0) for the target network under three scenarios: a θ > 0, σ2 = 0; b θ = 0, σ2 > 0; c θ > 0, σ2 > 0. Using the simulated individual-level data, we computed single-SNP association statistics, on which we compared RSS-NET with RSS-E[16], LDSC-baseline[13], LDSC-baselineLD[27], and Pascal[26] using their default setups (Methods). Pascal includes two gene (“max”: maximum-of-χ2; “sum”: sum-of-χ2) and two pathway (“chi”: χ2 approximation; “emp”: empirical sampling) scoring options. For each dataset, Pascal and LDSC methods produced P-values, whereas RSS-E and RSS-NET produced BFs; these statistics were used to rank the significance of enrichments. A false positive occurs if a method identifies enrichment of the target network in a negative dataset, and a true positive occurs if it identifies enrichment in a positive dataset. Each panel displays the trade-off between false and true positives via receiver operating characteristic (ROC) curves for all methods in 200 negative and 200 positive datasets of a simulation scenario, and also reports the corresponding areas under ROC curves (AUROCs; a higher value indicates better performance). Dashed diagonal lines denote random ROC curves (AUROC = 0.5). d RSS-NET, like the other methods, does not perform well when the target network harbors weak genetic associations. Simulation details and additional results are provided in Supplementary Figs. 1, 2.
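The AUROC reported in each panel can be sketched as the probability that a randomly chosen positive dataset receives a more significant score than a randomly chosen negative one. The snippet below is a minimal illustration of that ranking definition, not the authors' evaluation code; the function name `auroc` and the toy scores are hypothetical. Scores must be oriented so that larger means more significant (for example log BFs, or negated P-values).

```python
from typing import Sequence


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical log10 Bayes factors for four positive and four negative datasets.
positive = [3.2, 1.5, 0.4, 2.8]
negative = [0.1, -0.3, 0.6, 0.0]
print(auroc(positive, negative))  # 15 of 16 pairs are ordered correctly: 0.9375
```

A method that scores datasets at random would order about half of the pairs correctly, which corresponds to the dashed diagonal (AUROC = 0.5).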
Fig. 4 Power of RSS-NET to identify gene-level associations from GWAS summary statistics.
We used a B cell-specific regulatory network and real genotypes of 348,965 genome-wide SNPs to simulate individual-level GWAS data under four scenarios: a θ = 0, σ2 = 0; b θ > 0, σ2 = 0; c θ = 0, σ2 > 0; d θ > 0, σ2 > 0. Using the simulated individual-level data, we computed single-SNP association statistics, on which we compared RSS-NET with gene-level association components of RSS-E[16] and Pascal[26]. RSS-E is a special case of RSS-NET assuming σ2 = 0, and RSS-E-baseline is a special case of RSS-E assuming θ = 0. Pascal includes two gene scoring options: maximum-of-χ2 (“max”) and sum-of-χ2 (“sum”). Given a network, Pascal and RSS-E-baseline do not leverage any network information, RSS-E ignores the edge information, and RSS-NET exploits the full topology. Each scenario contains 200 datasets and each dataset contains 16,954 autosomal protein-coding genes for testing. We defined a gene as “trait-associated” if at least one SNP j within 100 kb of the transcribed region of this gene had a non-zero effect (β ≠ 0). For each gene in each dataset, RSS methods produced posterior probabilities that the gene was trait-associated (P1), whereas Pascal methods produced association P-values; these statistics were used to rank the significance of gene-level associations. The first row of each panel displays ROC curves and AUROCs for all methods, with dashed diagonal lines indicating random performance (AUROC = 0.5). The second row of each panel displays precision-recall (PRC) curves and areas under PRC curves (AUPRCs) for all methods, with dashed horizontal lines indicating random performance. For both AUROC and AUPRC, a higher value indicates better performance. Simulation details and additional results are provided in Supplementary Figs. 7, 8.
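The gene-level truth label used in these simulations (at least one SNP with non-zero effect within 100 kb of the transcribed region) can be sketched as follows. This is an illustrative reading of the definition, not the authors' code: the `Gene` and `Snp` classes, the inclusive window boundaries, and the toy coordinates are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # first base of the transcribed region
    end: int    # last base of the transcribed region


@dataclass(frozen=True)
class Snp:
    chrom: str
    pos: int
    beta: float  # simulated true effect


def is_trait_associated(gene: Gene, snps: Iterable[Snp], window: int = 100_000) -> bool:
    """True if any SNP within `window` bp of the gene has a non-zero effect."""
    return any(
        s.chrom == gene.chrom
        and gene.start - window <= s.pos <= gene.end + window  # inclusive bounds (assumed)
        and s.beta != 0.0
        for s in snps
    )


# Hypothetical genes and SNPs.
genes = [Gene("G1", "1", 500_000, 520_000), Gene("G2", "1", 900_000, 950_000)]
snps = [Snp("1", 410_000, 0.02), Snp("1", 700_000, 0.0), Snp("2", 910_000, 0.5)]
labels = [is_trait_associated(g, snps) for g in genes]
print(labels)  # [True, False]
```

Labels of this kind serve as the ground truth against which P1 values or P-values are ranked when drawing the ROC and precision-recall curves.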
Fig. 3 Robustness of RSS-NET to model mis-specification in enrichment analyses.
Here positive datasets were generated from M1 with θ > 0 and σ2 > 0 (Fig. 2c). Negative datasets were simulated from four scenarios where genetic associations were enriched in: a a random set of near-gene SNPs; b a random set of near-RE SNPs; c SNPs with MAF- and LD-dependent effects; d a random edge-altered network. By this design, RSS-NET was mis-specified in all four scenarios. Similar to positive datasets, the simulated false enrichments in all negative datasets manifested in both association proportion (more frequent) and magnitude (larger effect). RSS-E was excluded here because of its poor performance shown in Fig. 2c. The rest is the same as Fig. 2. Simulation details and additional results are provided in Supplementary Figs. 3–6.
Fig. 5 RSS-NET analyses of 18 complex traits and 38 regulatory networks.
a Clustering of 38 regulatory networks based on t-distributed stochastic neighbour embedding. Details are provided in Supplementary Fig. 11. b Similarity between a given tissue-specific PECA-based network and 394 CAGE-based networks for various cell types and tissues (a: adult samples; c: cell lines; f: fetal samples). The similarity between a PECA- and CAGE-based network is summarized by Jaccard indices of their node sets (x-axis) and edge sets (y-axis). To simplify visualization, only labels of the top four CAGE-based networks with the highest edge similarity are shown for each PECA-based network. See Supplementary Fig. 12 for additional results. c Ternary diagram showing, for each trait, percentages of the “best” enrichment model (with the largest BF) as M11: θ > 0, σ2 = 0, M12: θ = 0, σ2 > 0 and M13: θ > 0, σ2 > 0 across networks. See Supplementary Table 4 for numerical values. Shown are 16 traits having multiple networks more enriched than the near-gene control. d Comparison of context-matched PECA-based (y-axis) and CAGE-based (x-axis) network enrichments on the same GWAS. Dashed lines have slope 1 and intercept 0. See Supplementary Fig. 14 for additional results. e Median proportion of genes with P1 higher than reference estimates, among genes with reference estimates higher than a given cutoff. Medians are evaluated among 16 traits in c. See Supplementary Table 5 for numerical values. Overlap of RSS-NET prioritized genes with genes implicated in f knockout mouse phenotypes[47] and g human Mendelian diseases[49,50]. An edge indicates that a category of knockout mouse or Mendelian genes is significantly enriched for genes prioritized for a GWAS trait (FDR ≤ 0.1). Thicker edges correspond to stronger enrichments. To simplify visualization, only top-ranked categories are shown for each trait (f 3; g 2). See Supplementary Data 4, 5 for full results. Trait abbreviations are defined in Supplementary Table 1.
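The node-set and edge-set Jaccard indices used in panel b can be sketched as below. This is a minimal illustration, assuming each network is represented as a set of directed (TF, TG) pairs; the function names, the convention of returning 0.0 for two empty sets, and the toy networks are hypothetical.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (TF, TG), directed


def jaccard(a: Set, b: Set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0  # 0.0 for empty sets (assumed convention)


def network_similarity(edges_a: Set[Edge], edges_b: Set[Edge]) -> Tuple[float, float]:
    """Return (node-set Jaccard, edge-set Jaccard) for two networks."""
    nodes_a = {node for edge in edges_a for node in edge}
    nodes_b = {node for edge in edges_b for node in edge}
    return jaccard(nodes_a, nodes_b), jaccard(edges_a, edges_b)


# Hypothetical networks standing in for one PECA-based and one CAGE-based network.
net_a = {("TF1", "G1"), ("TF1", "G2"), ("TF2", "G2")}
net_b = {("TF1", "G1"), ("TF2", "G3")}
print(network_similarity(net_a, net_b))  # (0.6, 0.25)
```

In this toy case the two networks share most of their nodes but only one edge, which is the kind of contrast the two axes of panel b are meant to separate.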
Fig. 6 RSS-NET gene prioritization results of select trait-network pairs.
Shown are four trait-network pairs: a body mass index and pancreas; b rheumatoid arthritis and B cell; c high-density lipoprotein cholesterol and liver; d neuroticism and putamen. In the first column of each panel, each point represents a member gene of a given network (blue circle: TF; orange triangle: TG). Dashed lines have slope 1 and intercept 0. In the second and third columns, each point represents a cell type- or tissue-specific network to which a select gene belongs. Numerical values of P1 and BF are available online (Data availability) and are provided as a Source Data file.
Table 1 Examples of RSS-NET highlighted genes that were not reported in GWAS of the same data but were implicated in later GWAS with increased sample sizes (genome-wide significance threshold: single-SNP association P < 5 × 10^−8).
| Trait | Gene (Role) | | | | | Mouse trait | Therapeutic and clinical evidence |
|---|---|---|---|---|---|---|---|
| BMI | | 0.78 | 0.80 | 0.94 | 0.94 (Pancreas, 2.07 × 10^13) | Eye, Renal | Ocular and renal anomalies |
| | | 0.61 | 0.70 | 0.85 | 0.85 (Cerebellum, 8.70 × 10^11) | Growth, Immune | Acute myeloid leukemia |
| WAIST | | 0.97 | 0.97 | 0.98 | 0.98 (Esophagus, 6.78 × 10^239) | Neuron, NS | Lissencephaly-5 |
| BC | | 0.89 | 0.93 | 0.98 | 0.98 (Heart, 8.08 × 10^7) | CS | Scalp-ear-nipple syndrome |
| | | 0.71 | 0.72 | 0.94 | 0.94 (Aorta, 8.27 × 10^8) | Growth, Immune | Hepatoma, Glionitrin A* |
| RA | | 0.54 | 0.61 | 0.84 | 0.84 (B cell, 3.31 × 10^57) | Immune | APS1 |
| IBD | | 0.98 | 0.94 | 0.99 | 0.99 (Monocyte, 6.28 × 10^31) | Cellular | Acute myeloid leukemia |
| | | 0.84 | 0.78 | 0.95 | 0.95 (NK cell, 5.07 × 10^35) | Immune, Neuron | Language impairment |
| | | 0.81 | 0.89 | 0.95 | 0.95 (NK cell, 5.07 × 10^35) | Immune | |
| HDL | | 0.97 | 0.97 | 0.99 | 0.99 (Monocyte, 4.75 × 10^15) | Immune, Metab. | Atherosclerosis |
| | | 0.92 | 0.95 | 0.98 | 0.98 (Liver, 2.81 × 10^21) | Liver, Metab. | |
| | | 0.84 | 0.93 | 0.98 | 0.98 (Liver, 2.81 × 10^21) | Growth, Metab. | Early-onset obesity |
| LDL | | 0.99 | 0.99 | 1.00 | 1.00 (NK cell, 5.18 × 10^30) | Liver, Metab. | |
| | | 0.98 | 0.98 | 0.99 | 0.99 (Liver, 7.66 × 10^27) | Liver, Metab. | Tangier disease, Probucol* |
| | | 0.68 | 0.72 | 0.88 | 0.88 (Liver, 7.66 × 10^27) | Liver, Metab. | Cholestasis |
| | | 0.69 | 0.59 | 0.85 | 0.85 (NK cell, 5.18 × 10^30) | Metab., NS | Tat-NR2B9c* |
| | | 0.52 | 0.65 | 0.82 | 0.84 (CD8, 5.86 × 10^28) | Liver, Metab. | Vesicoureteral reflux-3 |
| CAD | | 0.92 | 0.99 | 0.99 | 0.99 (Adipose, 1.67 × 10^29) | CS, Growth | Camurati-Engelmann disease |
| | | 0.58 | 0.79 | 0.91 | 0.92 (GEJ, 9.78 × 10^28) | CS, Metab. | GFND2, SMDCF |
| | | 0.31 | 0.55 | 0.77 | 0.82 (Heart, 1.93 × 10^28) | CS, Metab. | |
| | | 0.57 | 0.79 | 0.80 | 0.82 (Aorta, 1.09 × 10^27) | CS, Muscle | Ambrisentan*, Macitentan* |
| AF | | 0.87 | 0.92 | 1.00 | 1.00 (Heart, 6.89 × 10^12) | CS, Muscle | Brugada syndrome-1, ATFB10 |
| | | 0.50 | 0.76 | 0.92 | 0.94 (Uterus, 2.71 × 10^11) | | QGC-001* |
| | | 0.45 | 0.62 | 0.90 | 0.90 (Colon, 7.54 × 10^14) | Muscle, NS | Spinocerebellar ataxia-1 |
| | | 0.55 | 0.66 | 0.86 | 0.87 (Muscle, 8.55 × 10^14) | | Myofibrillar myopathy |
| SCZ | | 1.00 | 1.00 | 1.00 | 1.00 (Colon, 1.20 × 10^144) | Growth, Neuron | Language impairment |
| | | 1.00 | 1.00 | 1.00 | 1.00 (Spleen, 1.44 × 10^141) | Immune, NS | Dias-Logan syndrome |
| | | 0.79 | 0.81 | 0.88 | 0.88 (Muscle, 4.99 × 10^127) | Neuron, NS | DEE39 |
| NEU | | 0.72 | 0.88 | 0.95 | 0.95 (CD8, 3.66 × 10^20) | Immune, NS | Pitt-Hopkins syndrome |
| | | 0.77 | 0.88 | 0.93 | 0.93 (Muscle, 8.20 × 10^17) | Muscle, NS | Congenital myasthenic syndrome-11 |
| | | 0.15 | 0.40 | 0.83 | 0.83 (Ileum, 8.56 × 10^22) | Growth, Neuron | Mental retardation-20 |
| | | 0.15 | 0.32 | 0.78 | 0.79 (Putamen, 2.12 × 10^19) | Neuron, NS | Parkinsonism, BIIB054* |
| | | 0.10 | 0.22 | 0.62 | 0.64 (Putamen, 2.12 × 10^19) | NS, Vision | Optic nerve hypoplasia |
| | | 0.06 | 0.17 | 0.63 | 0.63 (Ileum, 8.56 × 10^22) | Growth, NS | Pontocerebellar hypoplasia-3 |
The “mouse trait” column is based on the Mouse Genome Informatics database[47]. The “therapeutic and clinical evidence” column is based on the Online Mendelian Inheritance in Man[50] and the Therapeutic Target Database[53]. Drugs are identified with an asterisk (“*”). Trait abbreviations are defined in Supplementary Table 1. GEJ: gastroesophageal junction. CS: cardiovascular system. DS: digestive/alimentary system. Metab.: metabolism. NS: nervous system. APS1: autoimmune polyendocrinopathy syndrome-1. GFND2: glomerulopathy with fibronectin deposits-2. SMDCF: corner fracture type of spondylometaphyseal dysplasia. ATFB10: familial atrial fibrillation-10. DEE39: developmental and epileptic encephalopathy-39.
Table 2 Examples of RSS-NET highlighted genes that had not reached genome-wide significance in the GWAS Catalog[1] at the time of analysis.
| Trait | Gene (Role) | | | | | Mouse trait | Therapeutic and clinical evidence |
|---|---|---|---|---|---|---|---|
| BMI | | 0.71 | 0.79 | 0.89 | 0.90 (Muscle, 9.31 × 10^12) | CS, Muscle | Cardiomyopathy |
| | | 0.61 | 0.70 | 0.83 | 0.86 (NK cell, 3.95 × 10^13) | DS, Growth | |
| WAIST | | 0.80 | 0.68 | 0.87 | 0.87 (Esophagus, 6.78 × 10^239) | Adipose, Growth | Berardinelli-Seip syndrome |
| | | 0.56 | 0.59 | 0.73 | 0.73 (Esophagus, 6.78 × 10^239) | Growth, NS | Speech-language disorder-1 |
| BC | | 0.76 | 0.80 | 0.91 | 0.92 (Aorta, 8.27 × 10^8) | CS, Eye | Adenylosuccinase deficiency |
| | | 0.57 | 0.63 | 0.89 | 0.90 (Esophagus, 6.30 × 10^7) | Growth, Muscle | AMC3, EDMD4, SCAR8 |
| RA | | 0.71 | 0.79 | 0.91 | 0.93 (CD4, 3.02 × 10^52) | Immune, Tumor | Acute lymphocytic leukemia |
| | | 0.30 | 0.60 | 0.90 | 0.91 (CD4, 3.02 × 10^52) | Immune, Tumor | |
| | | 0.33 | 0.57 | 0.73 | 0.73 (B cell, 3.31 × 10^57) | Immune, Tumor | Acute myeloid leukemia |
| IBD | | 0.63 | 0.87 | 0.95 | 0.95 (CD4, 5.32 × 10^33) | Immune, Tumor | |
| | | 0.85 | 0.83 | 0.94 | 0.94 (NK cell, 5.07 × 10^35) | Immune, Renal | Barakat syndrome |
| | | 0.66 | 0.78 | 0.87 | 0.90 (B cell, 1.49 × 10^32) | Immune, NS | Intellectual disability |
| | | 0.74 | 0.85 | 0.84 | 0.88 (B cell, 1.49 × 10^32) | Immune | Immunodeficiency, DIMS-0150* |
| | | 0.42 | 0.58 | 0.72 | 0.72 (NK cell, 5.07 × 10^35) | Immune | Immunodeficiency |
| | | 0.38 | 0.53 | 0.71 | 0.71 (NK cell, 5.07 × 10^35) | Immune | Immunodeficiency |
| HDL | | 0.10 | 0.09 | 0.98 | 0.98 (Liver, 2.81 × 10^21) | CS, Metab. | |
| | | 0.79 | 0.80 | 0.95 | 0.95 (Liver, 2.81 × 10^21) | Adipose, Metab. | |
| | | 0.77 | 0.82 | 0.95 | 0.95 (Liver, 2.81 × 10^21) | CS, Metab. | Myocardial infarction |
| | | 0.85 | 0.85 | 0.92 | 0.92 (Monocyte, 4.75 × 10^15) | Metab. | ARI-3037MO* |
| | | 0.48 | 0.45 | 0.78 | 0.78 (Liver, 2.81 × 10^21) | CS, Muscle | Cardiomyopathy, Levosimendan* |
| LDL | | 0.79 | 0.83 | 0.90 | 0.90 (Aorta, 3.71 × 10^27) | CS, Immune | Cardiomyopathy, Semapimod* |
| | | 0.70 | 0.76 | 0.90 | 0.90 (Liver, 7.66 × 10^27) | CS, Metab. | Amyloidosis, HDL deficiency |
| | | 0.69 | 0.59 | 0.85 | 0.85 (NK cell, 5.18 × 10^30) | Liver, Metab. | VLCAD deficiency |
| T2D | | 0.75 | 0.99 | 0.99 | 0.99 (Ileum, 4.52 × 10^62) | Immune, Metab. | Amelogenesis imperfecta type IH |
| HR | | 0.65 | 0.67 | 0.92 | 0.93 (Aorta, 2.43 × 10^7) | CS, Growth | SDDHD |
| CAD | | 0.56 | 0.78 | 0.86 | 0.86 (Aorta, 1.09 × 10^27) | Immune, Metab. | GSK2330811* |
| | | 0.43 | 0.68 | 0.85 | 0.85 (Adipose, 1.67 × 10^29) | Adipose, Metab. | |
| | | 0.19 | 0.43 | 0.61 | 0.61 (CD8, 1.13 × 10^25) | CS | Congenital heart defects |
| AF | | 0.88 | 0.93 | 0.99 | 0.99 (Ileum, 4.43 × 10^13) | Metab. | Poor metabolism of thiopurines-1 |
| | | 0.44 | 0.60 | 0.88 | 0.89 (Heart, 2.15 × 10^14) | CS, Immune | Acute myeloid leukemia, FPDMM |
| | | 0.56 | 0.72 | 0.88 | 0.88 (Muscle, 8.55 × 10^14) | Blood, Immune | Interleukin-3* |
| LOAD | | 0.99 | 1.00 | 1.00 | 1.00 (CD8, 8.31 × 10^26) | Cellular, NS | Caspase-2* |
| | | 0.64 | 0.92 | 0.94 | 0.94 (Pancreas, 3.53 × 10^20) | Metab. | Amyloidosis, Inotersen*, Patisiran* |
| SCZ | | 1.00 | 1.00 | 1.00 | 1.00 (Cortex, 5.39 × 10^128) | Neuron, NS | Intellectual disability |
| | | 1.00 | 1.00 | 1.00 | 1.00 (Putamen, 7.22 × 10^116) | Neuron, NS | Amyotrophic lateral sclerosis-19 |
| | | 0.97 | 0.97 | 0.98 | 0.98 (Cortex, 5.39 × 10^128) | NS | MACID |
| | | 0.90 | 0.94 | 0.97 | 0.97 (Cerebellum, 3.15 × 10^129) | Neuron, NS | Mental retardation |
| | | 0.84 | 0.89 | 0.93 | 0.93 (Cerebellum, 3.15 × 10^129) | Neuron, NS | Baker-Gordon syndrome |
| | | 0.80 | 0.84 | 0.93 | 0.93 (Colon, 1.07 × 10^141) | Neuron, NS | Migraine |
| | | 0.78 | 0.84 | 0.91 | 0.91 (Cerebellum, 3.15 × 10^129) | Neuron, NS | DEE58 |
| | | 0.73 | 0.78 | 0.86 | 0.86 (Monocyte, 5.85 × 10^131) | Neuron, NS | Parkinsonism, DNL151*, DNL201* |
| | | 0.74 | 0.78 | 0.83 | 0.83 (Spleen, 1.44 × 10^141) | Neuron, NS | FTDALS1 |
| | | 0.60 | 0.66 | 0.74 | 0.74 (Cerebellum, 3.15 × 10^129) | Neuron, NS | Parkinsonism, BIIB054* |
| NEU | | 0.42 | 0.66 | 0.94 | 0.94 (Ileum, 8.56 × 10^22) | Metab. | MAHCF |
| | | 0.36 | 0.56 | 0.90 | 0.91 (Spleen, 2.13 × 10^19) | Immune, NS | |
| | | 0.33 | 0.39 | 0.76 | 0.78 (Putamen, 2.12 × 10^19) | Neuron, NS | AHC1, FHM2 |
AMC3: myogenic-type arthrogryposis multiplex congenita-3. EDMD4: Emery-Dreifuss muscular dystrophy-4. SCAR8: autosomal recessive spinocerebellar ataxia-8. VLCAD: very long-chain acyl-CoA dehydrogenase. SDDHD: short stature, developmental delay, and congenital heart defects. FPDMM: familial platelet disorder with associated myeloid malignancy. MACID: acquired macrocephaly with impaired intellectual development. FTDALS1: frontotemporal dementia and/or amyotrophic lateral sclerosis. MAHCF: methylmalonic aciduria and homocystinuria of the cblF type. AHC1: alternating hemiplegia of childhood-1. FHM2: familial hemiplegic migraine-2. The remaining abbreviations are the same as in Table 1.