Zhe Li1,2, ZhenZhen Xiong3, Lydia C Manor4, Hongbao Cao5,6, Tao Li1.
Abstract
Recent studies have reported hundreds of genes linked to Alzheimer's disease (AD). However, many of these candidate genes are not identified again when the analyses are replicated in different studies, and some results are controversial. Here, we proposed a computational workflow to curate and evaluate AD-related genes. The method integrates large-scale literature knowledge data with gene expression data acquired from postmortem human brain regions (AD case/control: 31/32 and 22/8). Pathway enrichment, sub-network enrichment, and gene-gene interaction analyses were conducted to study the pathogenic profile of the candidate genes, with 4 metrics proposed and validated for each gene. Using this approach, a scalable AD genetic database was developed, including AD-related genes, pathways, diseases, and information on supporting references. The AD case/control classification supported the effectiveness of the 4 proposed metrics, which successfully identified 21 well-studied AD genes (i.e., TGFB1, CTNNB1, APP, IL1B, PSEN1, PTGS2, IL6, VEGFA, SOD1, AKT1, CDK5, TNF, GSK3B, TP53, CCL2, BDNF, NGF, IGF1, SIRT1, AGER and TLR) and highlighted one recently reported AD gene (i.e., ITGB1). The computational biology approach and the AD database developed in this study provide a valuable resource that may facilitate the understanding of the AD genetic profile.
Keywords: Alzheimer’s disease; Gene-gene interaction analysis; Pathway enrichment analysis; ResNet database; Sub-network enrichment analysis
Year: 2018 PMID: 30108454 PMCID: PMC6088103 DOI: 10.1016/j.sjbs.2018.05.019
Source DB: PubMed Journal: Saudi J Biol Sci ISSN: 1319-562X Impact factor: 4.219
Fig. 1 Diagram of the integrative computational marker evaluation approach for AD. First, literature-based analysis was conducted to identify the AD-related genes; then gene-gene interaction analysis, enrichment analysis, and metrics analysis were conducted on these genes, and the results were saved in the AD database. Finally, AD case/control classification was conducted on gene expression datasets to test the effectiveness of the identified genes.
Top 10 molecular function pathways/groups enriched by the 1669 reported genes.
| Pathway/gene set name | GO ID | # of Entities | Overlap | q-value | Jaccard similarity |
|---|---|---|---|---|---|
| Aging | 0016280 | 254 | 140 | 2.28E−84 | 0.077 |
| Neuronal cell body | 0043025 | 466 | 172 | 4.7E−72 | 0.086 |
| Neuron projection | 0043005 | 378 | 153 | 2.8E−70 | 0.079 |
| Response to lipopolysaccharide | 0032496 | 252 | 126 | 4.9E−69 | 0.069 |
| Response to hypoxia | 0001666 | 259 | 127 | 8.11E−68 | 0.069 |
| Response to organic cyclic compound | 0014070 | 253 | 122 | 1.79E−64 | 0.067 |
| Response to ethanol | 0017036 | 161 | 94 | 4.21E−59 | 0.053 |
| Negative regulation of apoptotic process | 0006916 | 650 | 187 | 1.21E−56 | 0.086 |
| Perinuclear region of cytoplasm | 0048471 | 688 | 188 | 2.11E−55 | 0.085 |
| Axon | 0030424 | 318 | 125 | 2.56E−55 | 0.066 |
For each gene set, the p-value was calculated using Fisher’s exact test against the hypothesis that a randomly selected gene group of the same size (1669) can generate the same or a higher overlap with the corresponding gene set (q = 0.001 for FDR correction). The Jaccard similarity (J) is a statistic used for comparing the similarity and diversity of sample sets, defined by $J(A, B) = |A \cap B| / |A \cup B|$, where A and B are two sample sets.
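The two statistics in this note can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the `background` parameter (the size of the gene universe, which the note does not state) are assumptions, and the one-sided Fisher's exact test is written here as a hypergeometric upper tail.

```python
from math import comb


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; returns 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def enrichment_p_value(overlap, set_size, list_size, background):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability that a random gene list of `list_size`, drawn from
    `background` genes (assumed universe size), shares at least `overlap`
    genes with a gene set of `set_size`.
    """
    total = comb(background, list_size)
    upper = min(set_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(background - set_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy data (hypothetical gene sets, for illustration only).
gene_list = {"APP", "PSEN1", "TNF"}
gene_set = {"APP", "PSEN1", "IL6", "BDNF"}
shared = len(gene_list & gene_set)
print(jaccard(gene_list, gene_set))
print(enrichment_p_value(shared, len(gene_set), len(gene_list), background=10))
```

Exact integer arithmetic via `math.comb` avoids underflow for very small p-values such as those in the table, at the cost of speed for large backgrounds.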
Fig. 2 Gene-gene interaction network for AD. The network contains the 1453 of the 1669 AD target genes that were enriched within the 151 AD target pathways. The weight of an edge between two nodes is the number of pathways shared by both nodes. The larger the size of a node, the larger the number of AD candidate pathways that include the gene (high PScore); the brighter the color, the larger the number of AD candidate genes associated with the gene (high SScore). The remaining 216 of the 1669 genes were not included in the network because they were not enriched within the top 151 AD candidate pathways.
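The edge-weight and node-size rules in this caption can be illustrated with a short Python sketch. It assumes pathway membership is available as a mapping from pathway name to gene set; the function name `build_network` and the toy pathways are hypothetical, and SScore is not computed because the caption does not give enough detail to derive it.

```python
from collections import defaultdict
from itertools import combinations


def build_network(pathways):
    """Return (pscore, edges) from a {pathway_name: set_of_genes} mapping.

    pscore[gene]   = number of pathways that include the gene
    edges[(a, b)]  = number of pathways shared by genes a and b (a < b)
    """
    pscore = defaultdict(int)
    edges = defaultdict(int)
    for genes in pathways.values():
        for gene in genes:
            pscore[gene] += 1
        # Sorting gives each undirected edge a single canonical key.
        for a, b in combinations(sorted(genes), 2):
            edges[(a, b)] += 1
    return dict(pscore), dict(edges)


# Toy pathways (hypothetical memberships, for illustration only).
toy_pathways = {
    "P1": {"APP", "PSEN1", "TNF"},
    "P2": {"APP", "PSEN1"},
    "P3": {"TNF", "IL6"},
}
pscore, edges = build_network(toy_pathways)
print(pscore)
print(edges)
```

Genes that appear in none of the selected pathways never enter `pscore`, which mirrors how the genes outside the top pathways are absent from the network.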
Permutation test on top genes corresponding to highest CRs.
| Data sets | Items | RScore | AScore | PScore | SScore | All genes |
|---|---|---|---|---|---|---|
| GSE29378 (31/32) | Max CR (%) | 80.95 | 79.37 | 82.54 | 74.60 | 60.32 |
| | #Gene | 66 | 109 | 20 | 57 | 1605 |
| | p-value | 0.0022 | 0.004 | 0.0004 | 0.03 | 0.96 |
| GSE28146 (22/8) | Max CR (%) | 80.00 | 90.00 | 83.33 | 90.00 | 73.33 |
| | #Gene | 18 | 102 | 25 | 22 | 1621 |
| | p-value | 0.017 | 0.001 | 0.0074 | 0.0002 | 0.90 |
Fig. 3 Validation of the different metrics through leave-one-out (LOO) cross-validation. (a) Results from GSE29378; (b) results from GSE28146. The mean CR obtained with randomly selected genes is displayed as a dashed grey line (legend: Random). The maximum CR for each metric is marked at the corresponding position.
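A leave-one-out evaluation with a random-gene baseline might look like the Python sketch below. Several points are assumptions rather than details from this record: the classifier (a nearest-centroid rule is used as a stand-in, since the classifier is not named here), the reading of CR as the percentage of correctly classified samples, and the permutation scheme (random gene sets of the same size as the selected set). All function names are hypothetical.

```python
import random


def centroid(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def loo_classification_ratio(samples, labels, gene_idx):
    """Percent of samples classified correctly under leave-one-out."""
    correct = 0
    for i in range(len(samples)):
        train = {}
        for j, (row, lab) in enumerate(zip(samples, labels)):
            if j != i:  # hold out sample i
                train.setdefault(lab, []).append([row[g] for g in gene_idx])
        held_out = [samples[i][g] for g in gene_idx]
        pred = min(train, key=lambda lab: sq_dist(held_out, centroid(train[lab])))
        correct += pred == labels[i]
    return 100.0 * correct / len(samples)


def permutation_p_value(samples, labels, gene_idx, n_perm=1000, seed=0):
    """Compare the observed CR with CRs from random gene sets of equal size."""
    rng = random.Random(seed)
    n_genes = len(samples[0])
    observed = loo_classification_ratio(samples, labels, gene_idx)
    hits = 0
    for _ in range(n_perm):
        random_idx = rng.sample(range(n_genes), len(gene_idx))
        if loo_classification_ratio(samples, labels, random_idx) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

To trace a curve like the ones in the figure, one would call `loo_classification_ratio` on the top-k genes ranked by a given metric for increasing k and record the maximum.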
Fig. 4 AD genes selected by cross-metrics analysis and their relation to other diseases. The 21 genes that overlapped among the RScore, PScore and SScore groups are highlighted in green; gene ITGB1, which overlapped among the AScore, PScore and SScore groups, is highlighted in yellow. The network was built using the ‘network building’ module of Pathway Studio.