Yong Xu, Jun Wang, Shuquan Rao, McKenzie Ritter, Lydia C Manor, Robert Backer, Hongbao Cao, Zaohuo Cheng, Sha Liu, Yansong Liu, Lin Tian, Kunlun Dong, Yin Yao Shugart, Guoqiang Wang, Fuquan Zhang.
Abstract
Studies to date have reported hundreds of genes connected to bipolar disorder (BP). However, many studies identifying candidate genes have lacked replication, and their results have, at times, been inconsistent with one another. This paper therefore offers a computational workflow to curate and evaluate BP-related genetic data. Our method integrated large-scale literature data with gene expression data acquired from both postmortem human brain regions (BP case/control: 45/50) and peripheral blood mononuclear cells (BP case/control: 193/593). To assess the pathogenic profiles of candidate genes, we conducted Pathway Enrichment, Sub-Network Enrichment, and Gene-Gene Interaction analyses, and proposed and validated 4 metrics for each gene. Using this approach, we developed a scalable BP genetic database (BP_GD) of BP-related genes, drugs, pathways, diseases, and supporting references. The 4 metrics successfully identified frequently studied BP genes (e.g. GRIN2A, DRD1, DRD2, HTR2A, CACNA1C, TH, BDNF, SLC6A3, P2RX7, DRD3, and DRD4) and also highlighted several recently reported BP genes (e.g. GRIK5, GRM1, and CACNA1A). The computational biology approach and the BP database developed in this study could contribute to a better understanding of the current state of BP genetic research and assist further studies in the field.
Year: 2017 PMID: 28751646 PMCID: PMC5532256 DOI: 10.1038/s41598-017-05846-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Diagram of the integrated computational marker evaluation approach.
Permutation test on the top genes yielding the highest CRs.
| Data Sets | Items | RScore | AScore | PScore | SScore | All Genes |
|---|---|---|---|---|---|---|
| GSE35977 (516/535) | Max CR (%) | 70.53 | 66.32 | 67.37 | 72.63 | 63.16 |
| | #Gene | 7 | 37 | 11 | 34 | 516 |
| | p-value | 0.0006 | 0.046 | 0.015 | 0.0008 | 0.11 |
| GSE82042 (515/535) | Max CR (%) | 56.36 | 57.00 | 57.63 | 57.12 | 54.70 |
| | #Gene | 384 | 61 | 141 | 187 | 515 |
| | p-value | 0.073 | 0.022 | 0.02 | 0.034 | 0.32 |
Note: p-values in the table are permutation p-values, defined as the number of runs (each using the same number of randomly selected genes) that yield a CR equal to or higher than the observed CR, divided by the total number of runs.
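The permutation p-value defined in the note can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `score_fn` is a hypothetical callback standing in for whatever computes a CR from a gene list (for example, the LOO cross-validation described with Figure 2), and the `n_runs` and `seed` defaults are illustrative.

```python
import random
from typing import Callable, Sequence


def permutation_p_value(
    observed_cr: float,
    all_genes: Sequence[str],
    n_genes: int,
    score_fn: Callable[[Sequence[str]], float],  # hypothetical CR function
    n_runs: int = 1000,  # illustrative default, not taken from the paper
    seed: int = 0,
) -> float:
    """Fraction of random gene sets of size n_genes whose CR is >= observed_cr."""
    rng = random.Random(seed)
    pool = list(all_genes)
    hits = 0
    for _ in range(n_runs):
        # Draw the same number of genes as the metric-selected set.
        random_genes = rng.sample(pool, n_genes)
        if score_fn(random_genes) >= observed_cr:
            hits += 1
    # As in the note: count of equal-or-higher runs divided by total runs.
    return hits / n_runs
```

Following the note literally, no +1 correction is applied, so a p-value of exactly 0 is possible when no random run matches the observed CR.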
Figure 2. Validation of the different metrics through leave-one-out (LOO) cross-validation. (a) Results from GSE35977. (b) Results from GSE82042. Mean CRs of randomly selected genes are displayed in green. The maximum CR for each metric is shown at the corresponding position.
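A LOO classification rate of the kind plotted in Figure 2 can be sketched as below. Two assumptions here are ours, not the paper's: CR is taken to mean the percentage of left-out samples that are classified correctly, and a nearest-centroid classifier is used purely for illustration, since the classifier is not specified in this excerpt. Rows of `X` are samples (e.g. BP cases and controls), columns are expression values of the selected genes, and `y` holds the class labels.

```python
from typing import List, Sequence


def _centroid(rows: List[Sequence[float]]) -> List[float]:
    """Column-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - yv) ** 2 for x, yv in zip(a, b))


def loo_classification_rate(X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """LOO CR (%) with a nearest-centroid classifier (illustrative choice)."""
    correct = 0
    for i in range(len(X)):
        # Train on every sample except sample i.
        train = [(X[j], y[j]) for j in range(len(X)) if j != i]
        labels = sorted({lab for _, lab in train})
        centroids = {
            lab: _centroid([x for x, l in train if l == lab]) for lab in labels
        }
        # Predict the class whose centroid is closest to the held-out sample.
        pred = min(labels, key=lambda lab: _sq_dist(X[i], centroids[lab]))
        correct += pred == y[i]
    return 100.0 * correct / len(X)
```

Swapping in a different classifier only changes the prediction step; the leave-one-out loop and the CR definition stay the same.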
Figure 3. Top BP genes selected by cross-metric analysis and their relationships to other diseases. The 11 genes shared by the RScore, PScore, and SScore groups are highlighted in red; the 3 genes shared by the AScore, PScore, and SScore groups are highlighted in yellow. The network was built using the 'network building' module of Pathway Studio.
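The cross-metric selection in Figure 3 amounts to intersecting the top-gene groups of several metrics. A minimal sketch, assuming each metric's top genes are already available as a set; the function name and the gene symbols in the example are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, Iterable, Set


def shared_genes(top_genes: Dict[str, Set[str]], metrics: Iterable[str]) -> Set[str]:
    """Return the genes that appear in the top-gene group of every listed metric."""
    groups = [top_genes[m] for m in metrics]
    if not groups:
        return set()
    return set.intersection(*groups)


# Hypothetical example (placeholder gene names, not the paper's data).
top = {
    "RScore": {"GENE_A", "GENE_B", "GENE_C"},
    "AScore": {"GENE_D", "GENE_E"},
    "PScore": {"GENE_B", "GENE_C", "GENE_D"},
    "SScore": {"GENE_B", "GENE_C", "GENE_D", "GENE_F"},
}
print(sorted(shared_genes(top, ["RScore", "PScore", "SScore"])))  # ['GENE_B', 'GENE_C']
print(sorted(shared_genes(top, ["AScore", "PScore", "SScore"])))  # ['GENE_D']
```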
Top 20 molecular function pathways/groups enriched in the 535 reported genes.
| Pathway/gene set name | Hit type | GO ID | # of Entities | Overlap | p-value | Jaccard similarity |
|---|---|---|---|---|---|---|
| synaptic transmission | biological_process | 0007268 | 472 | 112 | 8.95E-76 | 0.13 |
| neuronal cell body | cellular_component | 0043025 | 466 | 94 | 8.82E-58 | 0.10 |
| dendrite | cellular_component | 0030425 | 396 | 87 | 1.32E-56 | 0.10 |
| response to drug | biological_process | 0017035 | 509 | 87 | 1.19E-45 | 0.09 |
| synapse | cellular_component | 0045202 | 466 | 81 | 2.25E-44 | 0.09 |
| axon | cellular_component | 0030424 | 318 | 68 | 4.69E-43 | 0.09 |
| memory | biological_process | 0007613 | 76 | 39 | 1.51E-40 | 0.07 |
| postsynaptic membrane | cellular_component | 0045211 | 227 | 56 | 7.45E-39 | 0.08 |
| postsynaptic density | cellular_component | 0014069 | 168 | 48 | 1.87E-36 | 0.07 |
| neuron projection | cellular_component | 0043005 | 378 | 66 | 4.54E-36 | 0.08 |
Note: For each gene set, the p-value was calculated using Fisher's exact test as the probability that a randomly selected gene group of the same size (535) would yield an overlap with the given gene set equal to or greater than the observed overlap (q = 0.001 for FDR correction). Jaccard similarity (J) is a statistic for comparing the similarity and diversity of sample sets, defined as $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$, where A and B are two sample sets.
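Both statistics in the note can be sketched with the Python standard library. The one-sided Fisher's exact test on a 2×2 table reduces to an upper-tail hypergeometric probability. The background size `universe` is an assumption, because the total number of genes considered is not given in this excerpt. The Jaccard helper treats A as the gene set and B as the 535 reported genes, which reproduces the J column (for example, 112 / (472 + 535 - 112) ≈ 0.13 for synaptic transmission).

```python
from math import comb


def enrichment_p_value(overlap: int, set_size: int, query_size: int, universe: int) -> float:
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value.

    P(X >= overlap) when query_size genes are drawn at random from `universe`
    genes, of which set_size belong to the gene set. `universe` is assumed.
    """
    total = comb(universe, query_size)
    upper = min(set_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    # Exact integer arithmetic; the single division keeps tiny p-values accurate.
    return tail / total


def jaccard(overlap: int, set_size: int, query_size: int) -> float:
    """|A ∩ B| / |A ∪ B|, with |A ∪ B| = |A| + |B| - |A ∩ B|."""
    return overlap / (set_size + query_size - overlap)


print(round(jaccard(112, 472, 535), 2))  # 0.13 (synaptic transmission row)
print(round(jaccard(39, 76, 535), 2))  # 0.07 (memory row)
```

In practice, `scipy.stats.fisher_exact(table, alternative="greater")` applied to the corresponding 2×2 contingency table should give the same one-sided p-value.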