A Pandey, N A Davis, B C White, N M Pajewski, J Savitz, W C Drevets, B A McKinney.
Abstract
Most pathway and gene-set enrichment methods prioritize genes by their main effects and do not account for variation due to interactions in the pathway. A portion of the presumed missing heritability in genome-wide association studies (GWAS) may be accounted for through gene-gene interactions and additive genetic variability. In this study, we prioritize genes for pathway enrichment in GWAS of bipolar disorder (BD) by aggregating gene-gene interaction information with main-effect associations through machine-learning feature selection (evaporative cooling) and epistasis network centrality analysis. We validate this approach in a two-stage (discovery/replication) pathway analysis of GWAS of BD. The discovery cohort comes from the Wellcome Trust Case Control Consortium (WTCCC) GWAS of BD, and the replication cohort comes from the National Institute of Mental Health (NIMH) GWAS of BD in individuals of European ancestry. Epistasis network centrality yields replicated enrichment of the Cadherin signaling pathway, whose genes have been hypothesized to play an important role in BD pathophysiology but have not shown enrichment in previous analyses. Other enriched pathways include Wnt signaling, the circadian rhythm pathway, axon guidance and neuroactive ligand-receptor interaction. In addition to pathway enrichment, the collective network approach elevates the importance of ANK3, DGKH and ODZ4 for BD susceptibility in the WTCCC GWAS, despite their weak single-locus effects in the data. These results provide evidence that numerous small interactions among common alleles may contribute to the diathesis for BD, and they demonstrate the importance of including information from the network of gene-gene interactions, as well as main effects, when prioritizing genes for pathway analysis.
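The abstract ranks genes by combining main effects with gene-gene interactions through network centrality. The record names the centrality measure SNPrank but does not give its formula, so the sketch below substitutes a generic personalized PageRank, which captures the same idea. It assumes a symmetric matrix `G` whose diagonal holds main-effect strengths (used as teleport weights) and whose off-diagonal entries hold pairwise interaction strengths; the function name, the damping value `gamma=0.85` and the toy matrix are illustrative assumptions, not the authors' method or data.

```python
from typing import List

def centrality(G: List[List[float]], gamma: float = 0.85,
               tol: float = 1e-12, max_iter: int = 1000) -> List[float]:
    """Personalized PageRank on an interaction matrix (illustrative stand-in, not SNPrank).

    G[i][i]: main-effect strength of variant i (teleport weight).
    G[i][j]: interaction strength between variants i and j (i != j).
    """
    n = len(G)
    main = [G[i][i] for i in range(n)]
    total_main = sum(main)
    # Teleport vector proportional to main effects (uniform if all are zero).
    teleport = [m / total_main for m in main] if total_main > 0 else [1.0 / n] * n
    # Total off-diagonal interaction weight attached to each node j.
    out = [sum(G[i][j] for i in range(n) if i != j) for j in range(n)]
    r = [1.0 / n] * n
    for _ in range(max_iter):
        # Score flowing into i from each neighbour j, split by j's interaction weights.
        flow = [sum(G[i][j] / out[j] * r[j] for j in range(n) if j != i and out[j] > 0)
                for i in range(n)]
        # Nodes with no interactions hand their score back through the teleport vector.
        dangling = sum(r[j] for j in range(n) if out[j] == 0)
        new = [gamma * (flow[i] + dangling * teleport[i]) + (1 - gamma) * teleport[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new, r)) < tol:
            return new
        r = new
    return r

# Hypothetical 3-variant network: variant 0 has a weak main effect
# but strong interactions with variants 1 and 2.
G = [
    [0.1, 0.9, 0.9],
    [0.9, 0.5, 0.0],
    [0.9, 0.0, 0.5],
]
scores = centrality(G)  # variant 0 ranks first despite its small main effect
```

The teleport term keeps main effects in the ranking, while the interaction-weighted walk lets a variant with a weak single-locus signal rise when it sits at the centre of many interactions.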
Year: 2012 PMID: 22892719 PMCID: PMC3432194 DOI: 10.1038/tp.2012.80
Source DB: PubMed Journal: Transl Psychiatry ISSN: 2158-3188 Impact factor: 6.222
Figure 1. Epistasis network analysis flowchart. Overview of the data analysis workflow used to identify variants by epistasis network centrality and to test pathways for replication. The analysis steps in the dotted frame are carried out for each of the three GWAS data sets at the top (WTCCC, NIMH and, as a secondary analysis, the two GWAS combined). On the bottom left, the enriched pathways are compared between the WTCCC and NIMH GWAS; a pathway is considered replicated when its FDR-adjusted P-value is less than 0.05 in both. On the bottom right, tables of the top genes ranked by epistasis network centrality are created for each data combination.
Table 1. WTCCC pathway enrichment
| Pathway | Adjusted P-value |
|---|---|
| Wnt signaling pathway(P) | 0.0008 |
| Neuroactive ligand-receptor interaction(K) | 0.0008 |
| Cadherin signaling pathway(P) | 0.004 |
| Shigellosis(K) | 0.0054 |
| Bacterial invasion of epithelial cells(K) | 0.0085 |
| Calcium signaling pathway(K) | 0.0141 |
| CFTR and beta 2 adrenergic receptor (b2ar) pathway(B) | 0.019 |
| Circadian rhythm—mammal(K) | 0.027 |
| Signaling events mediated by HDAC class III(N) | 0.0292 |
| Receptor-ligand complexes bind G proteins(R) | 0.0302 |
| ID(C) | 0.0315 |
| Corticosteroids and cardioprotection(B) | 0.0315 |
| β-Arrestins in gpcr desensitization(B) | 0.0338 |
| Activation of camp-dependent protein kinase pka(B) | 0.0338 |
| Role of β-arrestins in the activation and targeting of map kinases(B) | 0.0387 |
| O-glycan biosynthesis(K) | 0.0438 |
| Roles of β arrestin-dependent recruitment of src kinases in gpcr signaling(B) | 0.0491 |
Genes were prioritized by epistasis network analysis as described in the Materials and methods. Pathways with an adjusted hypergeometric enrichment P-value < 0.05 are shown.
Some of these pathways show evidence of replication in the NIMH-BD GWAS of European-ancestry individuals (see Table 2).
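The enrichment P-values above come from a hypergeometric test followed by an FDR adjustment. A minimal standard-library sketch of both steps follows. It assumes the Benjamini-Hochberg procedure for the FDR adjustment (the record does not name the method), and the gene counts in the usage lines are hypothetical.

```python
from math import comb
from typing import List

def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k): chance of seeing at least k pathway genes when n prioritized
    genes are drawn from N background genes, K of which belong to the pathway."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted P-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts: 20000 background genes, 150 prioritized genes,
# a 100-gene pathway, and 5 prioritized genes falling in that pathway.
p_raw = hypergeom_sf(5, 20000, 100, 150)
p_adj = benjamini_hochberg([p_raw, 0.04, 0.30])
```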
Table 2. NIMH-EA pathway enrichment
| Pathway | Adjusted P-value |
|---|---|
| M phase(R) | 0.009 |
| Cadherin signaling pathway(P) | 0.0094 |
| Glycosphingolipid biosynthesis—globo series(K) | 0.0149 |
| Glycosaminoglycan biosynthesis—keratan sulfate(K) | 0.017 |
| Syndecan-3-mediated signaling events(N) | 0.0214 |
| Protein processing in endoplasmic reticulum(K) | 0.0233 |
| Map kinase inactivation of smrt corepressor(B) | 0.0238 |
| Axon guidance(K) | 0.0282 |
| PDGFR-alpha signaling pathway(N) | 0.0289 |
| LPA receptor-mediated events(N) | 0.0397 |
| Ephrin B reverse signaling(N) | 0.0403 |
| RXR and RAR heterodimerization with other nuclear receptor(N) | 0.0465 |
| Glycosphingolipid biosynthesis—lacto and neolacto series(K) | 0.0497 |
| Pyruvate metabolism and TCA cycle(R) | 0.053 |
| Reelin signaling pathway(N) | 0.053 |
| NR transcription pathway(R) | 0.0599 |
| Alpha-synuclein signaling(N) | 0.0599 |
| Wnt signaling pathway(P) | 0.0606 |
Genes were prioritized by epistasis network analysis as described in the Materials and methods, and adjusted pathway enrichment P-values were calculated from the hypergeometric distribution.
Some of these pathways were also statistically significant in the WTCCC-BD GWAS (see Table 1).
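The workflow in Figure 1 calls a pathway replicated when its FDR-adjusted P-value is below 0.05 in both the WTCCC and the NIMH analyses. The sketch below applies that rule to a few rows copied from Tables 1 and 2; the dictionary layout and function name are illustrative, and a pathway absent from a table is treated as not significant in that cohort.

```python
ALPHA = 0.05

# Adjusted enrichment P-values copied from a few rows of Tables 1 and 2.
wtccc = {
    "Wnt signaling pathway(P)": 0.0008,
    "Neuroactive ligand-receptor interaction(K)": 0.0008,
    "Cadherin signaling pathway(P)": 0.004,
    "Calcium signaling pathway(K)": 0.0141,
}
nimh_ea = {
    "M phase(R)": 0.009,
    "Cadherin signaling pathway(P)": 0.0094,
    "Axon guidance(K)": 0.0282,
    "Wnt signaling pathway(P)": 0.0606,
}

def replicated(discovery, replication, alpha=ALPHA):
    """Pathways whose adjusted P-value is below alpha in both cohorts."""
    return sorted(
        name for name, p in discovery.items()
        if p < alpha and replication.get(name, 1.0) < alpha
    )

print(replicated(wtccc, nimh_ea))  # prints ['Cadherin signaling pathway(P)']
```

Wnt signaling drops out under this strict rule because its NIMH-EA adjusted P-value (0.0606) is just above 0.05, which matches the abstract's report of replicated Cadherin enrichment.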
Figure 2. Epistasis network for the WTCCC GWAS of bipolar disorder. Network inferred following ECML feature selection and regression-based genetic association interaction network (reGAIN) analysis of the WTCCC GWAS of bipolar disorder, annotated by the top enriched pathways. An edge threshold (0.575) was chosen as described in the Materials and methods; interactions below this threshold are hidden. The 146 nodes are colored by membership of their genes in the pathways with evidence of enrichment replication (Tables 1 and 2): red diamond (membership in both the Wnt signaling pathway and the cadherin signaling pathway), green square (Wnt signaling pathway only) and magenta triangle (neuroactive ligand-receptor interaction pathway). Edge weight is proportional to gene–gene interaction strength. The 183 edges are colored by the connection of a gene node to a gene in the given pathway, using the scheme above (red squiggle, green dashed, magenta solid). Node size is proportional to degree (number of edges). Note that ANK3, in the middle, is the most connected node.
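The caption describes hiding interactions below an edge threshold (0.575) and sizing nodes by degree. A minimal sketch of that step follows, assuming a symmetric interaction matrix; the gene labels and weights are hypothetical, not values from the study.

```python
from typing import Dict, List

def thresholded_degrees(genes: List[str], W: List[List[float]],
                        threshold: float = 0.575) -> Dict[str, int]:
    """Degree of each gene after hiding interactions weaker than the threshold."""
    n = len(genes)
    degree = {g: 0 for g in genes}
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle: count each undirected edge once
            if W[i][j] >= threshold:
                degree[genes[i]] += 1
                degree[genes[j]] += 1
    return degree

genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]  # hypothetical labels
W = [
    [0.0, 0.80, 0.60, 0.70],
    [0.80, 0.0, 0.10, 0.20],
    [0.60, 0.10, 0.0, 0.58],
    [0.70, 0.20, 0.58, 0.0],
]
deg = thresholded_degrees(genes, W)
hub = max(deg, key=deg.get)  # the most connected gene, drawn as the largest node
```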
Top genes from epistasis network centrality of combined WTCCC+NIMH GWAS
| Chromosome | SNP rsID | Gene | SNPrank score | Univariate OR | Univariate P |
|---|---|---|---|---|---|
| 5 | rs393291 | DAP | 7.61E-03 | 1.05 | 0.6388 |
| 10 | rs10509126 |  | 6.64E-03 | 1.192 | 0.01619 |
| 2 | rs10190186 | FHL2 | 6.63E-03 | 1.195 | 0.01106 |
| 4 | rs7679912 | ARAP2 | 6.41E-03 | 1.209 | 0.009473 |
| 3 | rs6773049 | ZIC1 | 6.30E-03 | 1.143 | 0.07756 |
| 12 | rs983421 | SUDS3 | 6.29E-03 | 1.154 | 0.05072 |
| 13 | rs606568 |  | 6.28E-03 | 0.8816 | 0.1125 |
| 13 | rs17088579 | OR7E156P | 6.27E-03 | 1.123 | 0.1374 |
| 12 | rs4135067 | TDG | 6.17E-03 | 1.091 | 0.2667 |
| 10 | rs2094179 | KLF6 | 6.05E-03 | 1.122 | 0.1266 |
| 1 | rs640718 | KMO | 6.00E-03 | 1.192 | 0.009732 |
| 1 | rs17484306 | RRAGC | 5.97E-03 | 1.231 | 0.00339 |
| 6 | rs3736712 | WDR27 | 5.93E-03 | 1.137 | 0.06991 |
| 11 | rs12275977 | GALNTL4 | 5.92E-03 | 1.127 | 0.09964 |
| 11 | rs6591941 |  | 5.84E-03 | 1.04 | 0.6031 |
| 3 | rs614566 | LAMP3 | 5.80E-03 | 1.204 | 0.005761 |
| 14 | rs6574988 | GPR65 | 5.80E-03 | 1.234 | 0.0003089 |
| 1 | rs495489 | POGK | 5.79E-03 | 0.9191 | 0.2722 |
| 1 | rs11161999 | LMO4 | 5.70E-03 | 1.193 | 0.007684 |
| 18 | rs17082921 | SOCS6 | 5.69E-03 | 1.144 | 0.07807 |
| 9 | rs17063814 | GNA14 | 5.62E-03 | 1.21 | 0.002639 |
| 14 | rs12588812 | RNASE1 | 5.55E-03 | 1.137 | 0.07456 |
| 3 | rs16852539 | GOLIM4 | 5.53E-03 | 1.073 | 0.2998 |
| 4 | rs7680321 | GABRB1 | 5.51E-03 | 1.25 | 0.0001764 |
| 8 | rs448578 | MSR1 | 5.50E-03 | 1.111 | 0.1176 |
| 8 | rs17069985 | CSMD1 | 5.49E-03 | 1.105 | 0.1615 |
| 1 | rs1890038 | CHD1L | 5.48E-03 | 1.137 | 0.05786 |
| 10 | rs10443995 | DOCK1 | 5.48E-03 | 1.047 | 0.5138 |
| 9 | rs13290547 | DAB2IP | 5.47E-03 | 1.192 | 0.01176 |
| 3 | rs9824570 | CLSTN2 | 5.45E-03 | 0.92 | 0.1817 |
| 16 | rs4843366 | LOC732275 | 5.44E-03 | 1.162 | 0.013 |
| 10 | rs1338007 | ADRA2A | 5.44E-03 | 1.075 | 0.3076 |
| 9 | rs615928 | GCNT1 | 5.44E-03 | 1.099 | 0.2024 |
| 14 | rs10137389 | C14orf106 | 5.43E-03 | 1.084 | 0.2648 |
| 7 | rs56183050 | POT1 | 5.43E-03 | 1.095 | 0.1748 |
| 12 | rs2468244 | CEP290 | 5.42E-03 | 1.096 | 0.1677 |
| 9 | rs3780621 | COL15A1 | 5.41E-03 | 1.157 | 0.01337 |
| 1 | rs6684324 | INADL | 5.41E-03 | 1.204 | 0.003692 |
| 13 | rs9514132 | SLC10A2 | 5.38E-03 | 0.9617 | 0.6074 |
| 1 | rs1318222 | C1orf94 | 5.37E-03 | 1.123 | 0.08309 |
| 18 | rs1560398 | MC4R | 5.36E-03 | 1.035 | 0.6496 |
| 5 | rs17653341 | ADRB2 | 5.32E-03 | 1.055 | 0.4495 |
| 1 | rs12046987 | MIR101-1 | 5.30E-03 | 1.158 | 0.02022 |
| 6 | rs7739908 | OGFRL1 | 5.29E-03 | 1.165 | 0.02408 |
| 18 | rs17739703 | C18orf34 | 5.28E-03 | 0.9017 | 0.1245 |
| 12 | rs1861674 | LOH12CR1 | 5.28E-03 | 1.204 | 0.001111 |
| 7 | rs7785575 | ELMO1 | 5.28E-03 | 1.117 | 0.08243 |
Top genes found by the epistasis network analysis workflow described in the Materials and methods for the merged WTCCC+NIMH-EA data sets. Rows are sorted by SNPrank epistasis network centrality score. Columns are chromosome, SNP rsID, gene symbol, SNPrank score, and univariate odds ratio and P-value. Bold gene symbols are genes that have strong evidence from univariate analysis of other, larger-scale GWAS of BD. Rankings for the unmerged data sets may be found in Supplementary Table 1.
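The table illustrates the abstract's point that network centrality can elevate variants with weak single-locus effects. The sketch below re-sorts a handful of rows copied from the table, once by SNPrank score and once by univariate P-value; the tuple layout `(rsid, gene, snprank, p)` is an assumption made for illustration.

```python
# (SNP rsID, gene, SNPrank score, univariate P), copied from rows of the table above.
rows = [
    ("rs393291", "DAP", 7.61e-03, 0.6388),
    ("rs10190186", "FHL2", 6.63e-03, 0.01106),
    ("rs6574988", "GPR65", 5.80e-03, 0.0003089),
    ("rs7680321", "GABRB1", 5.51e-03, 0.0001764),
    ("rs1560398", "MC4R", 5.36e-03, 0.6496),
]

by_snprank = sorted(rows, key=lambda r: r[2], reverse=True)  # highest centrality first
by_pvalue = sorted(rows, key=lambda r: r[3])                 # strongest main effect first

for rank, (rsid, gene, score, p) in enumerate(by_snprank, start=1):
    p_rank = by_pvalue.index((rsid, gene, score, p)) + 1
    print(f"{gene:7s} SNPrank rank {rank}  univariate rank {p_rank}  P={p}")
```

In this subset, DAP ranks first by SNPrank but only fourth by univariate P (0.6388), whereas GABRB1 has the smallest univariate P-value yet ranks fourth by centrality.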