Gene expression profiling combined with bioinformatics analysis identify biomarkers for Parkinson disease.

Hongyu Diao, Xinxing Li, Sheng Hu, Yunhui Liu.

Abstract

Parkinson disease (PD) progresses relentlessly and affects approximately 4% of the population aged over 80 years. It is difficult to diagnose in its early stages. The purpose of our study is to identify molecular biomarkers for PD initiation using a computational bioinformatics analysis of gene expression. We downloaded the gene expression profile of PD from Gene Expression Omnibus and identified differentially coexpressed genes (DCGs) and dysfunctional pathways in PD patients compared to controls. In addition, we built a regulatory network by mapping the DCGs to known regulatory data between transcription factors (TFs) and target genes, and calculated the regulatory impact factor of each transcription factor. As a result, a total of 1004 genes associated with PD initiation were identified. Pathway enrichment of these genes suggests that biological processes of protein turnover are impaired in PD. In the regulatory network, HLF, E2F1 and STAT4 were found to have altered expression levels in PD patients. The expression levels of the other transcription factors, NKX3-1, TAL1, RFX1 and EGR3, were not found to be altered; however, they regulated differentially expressed genes. In conclusion, we suggest that HLF, E2F1 and STAT4 may be used as molecular biomarkers for PD; however, more work is needed to validate our results.

Year:  2012        PMID: 23284986      PMCID: PMC3532340          DOI: 10.1371/journal.pone.0052319

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Parkinson disease (PD) is a common chronic neurodegenerative disorder characterized by selective loss of dopaminergic neurons from the substantia nigra and the presence of Lewy bodies [1]. Its most obvious symptoms are resting tremor, muscle rigidity, bradykinesia and other movement-related symptoms [2]. PD is difficult to diagnose in its early stages, and once it is diagnosed, the only treatment involves boosting inadequate levels of dopamine in the brain, which does not eliminate all symptoms. Therefore, it is of great importance to find molecular biomarkers of PD to improve diagnostic accuracy, monitor disease progression and develop therapeutic interventions. The etiology of PD remains a puzzling mix of environmental factors, genes and the aged brain [3], [4]. Epidemiological research indicates that exposure to pesticides elevates the risk of PD, whereas caffeine and tobacco are associated with reduced risk [5]. In recent years, several causative genes of PD have been identified, including α-synuclein (SNCA), parkin (PARK2), UCHL-1 (PARK5), PINK1 (PARK6), DJ-1 (PARK7), LRRK2 (PARK8) and ATP13A2 (PARK9) [6], [7]. These PD-linked molecules are candidate biomarkers for PD [8]. Among them, the levels of DJ-1 and α-synuclein in human cerebrospinal fluid and blood are the biomarkers most frequently compared between PD patients and non-PD controls in previous studies; however, the results are conflicting [9], [10], [11], [12], [13]. At this stage, neither DJ-1 nor α-synuclein alone appears to be a satisfactory biological biomarker for PD. In addition, changes in the levels of urinary 8-hydroxydeoxyguanosine (8-OHdG) and of proinflammatory cytokines such as tumor necrosis factor α (TNF-α), interleukin 6 (IL-6) and interleukin 1β (IL-1β) have also been studied as biomarkers for PD [14], [15]. Godau et al. recently showed that serum levels of insulin-like growth factor 2 (IGF-2) were significantly higher in PD patients than in controls [16].
The purpose of this study is to identify molecular biomarkers for PD initiation using a computational bioinformatics analysis of gene expression. The availability and integration of high-throughput gene expression data, together with computational bioinformatics analysis, may shed new light on the identification of molecular biomarkers for PD.

Materials and Methods

Affymetrix Microarray Data

The transcription profile GSE20333 was downloaded from GEO (Gene Expression Omnibus; http://www.ncbi.nlm.nih.gov/geo/), a public functional genomics data repository. The Affymetrix HG-Focus array was used to determine a global gene expression profile of clinically and neuropathologically confirmed cases of sporadic Parkinson disease (n = 6) compared with controls (n = 6). Postmortem human brains were obtained from individuals with moderate to severe parkinsonism according to the Hoehn & Yahr criteria. The average ages of the PD and control groups are 76.6 and 77.8 years, respectively, and the average postmortem delays are 26.2 and 19.8 hours, respectively.

Pathway Data

KEGG (Kyoto Encyclopedia of Genes and Genomes) is one of the most popular pathway databases; it groups genes into pathways of interacting genes and substrates and contains specific links between genes and substrates that interact directly [17], [18]. Its PATHWAY database records networks of molecular interactions in cells, together with organism-specific variants of these networks (http://www.genome.jp/kegg/). We collected pathway information from KEGG on June 30, 2011.
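The paper does not say in what form the KEGG pathway information was stored. As a minimal sketch, assume a two-column, tab-separated list of gene-pathway links, similar to the output of the KEGG REST `link` operation (for example `https://rest.kegg.jp/link/pathway/hsa`). The hypothetical function `parse_kegg_links` turns such lines into a map from pathway ID to gene set, which is the input a pathway enrichment test needs.

```python
from collections import defaultdict


def parse_kegg_links(lines):
    """Build {pathway_id: set_of_gene_ids} from tab-separated link lines.

    Each line holds one gene entry (e.g. 'hsa:10327') and one pathway entry
    (e.g. 'path:hsa00010'); either column order is accepted.
    """
    pathways = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        first, second = line.split("\t")
        # The pathway entry is the one carrying the 'path:' prefix.
        if first.startswith("path:"):
            path, gene = first, second
        else:
            path, gene = second, first
        pathways[path[len("path:"):]].add(gene)
    return dict(pathways)


# Toy input for illustration only (not data from the study).
example = [
    "hsa:10327\tpath:hsa00010",
    "path:hsa00010\thsa:124",
    "hsa:6122\tpath:hsa03010",
]
pathway_sets = parse_kegg_links(example)
print(pathway_sets)
```

Accepting either column order keeps the parser usable whichever direction the link file was requested in.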

Regulatory Data

The UCSC Genome Browser (http://genome.ucsc.edu) is an interactive website offering access to genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations. We downloaded human transcription factors (TFs) and their target chromosomal regions from UCSC. We then downloaded chromosome annotation information from NCBI and used it to determine the relationships between TFs and their target genes.
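The text does not state the rule used to assign a TF binding region to a target gene. One common choice, assumed here purely for illustration, is to call a gene a target when the binding region overlaps a promoter window around its transcription start site (TSS). The window sizes (`upstream=1000`, `downstream=500`), the `Gene` record and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def promoter_window(gene, upstream=1000, downstream=500):
    """Half-open [start, end) window around the TSS, oriented by strand."""
    if gene.strand == "+":
        tss = gene.start
        return max(0, tss - upstream), tss + downstream
    # Minus strand: the TSS is the last base (end - 1), so this mirrors the "+" case.
    return max(0, gene.end - downstream), gene.end + upstream


def map_sites_to_targets(sites, genes, upstream=1000, downstream=500):
    """sites: iterable of (tf, chrom, start, end). Returns a set of (tf, gene_symbol)."""
    pairs = set()
    for tf, chrom, s_start, s_end in sites:
        for gene in genes:
            if gene.chrom != chrom:
                continue
            w_start, w_end = promoter_window(gene, upstream, downstream)
            # Two half-open intervals overlap when each starts before the other ends.
            if s_start < w_end and w_start < s_end:
                pairs.add((tf, gene.symbol))
    return pairs


# Toy coordinates; gene and TF names are placeholders.
genes = [Gene("GENE_A", "chr1", 10_000, 20_000, "+"),
         Gene("GENE_B", "chr1", 50_000, 60_000, "-")]
sites = [("TF1", "chr1", 9_500, 9_520), ("TF2", "chr1", 60_800, 60_820),
         ("TF3", "chr1", 30_000, 30_020), ("TF4", "chr2", 9_500, 9_520)]
print(sorted(map_sites_to_targets(sites, genes)))
# → [('TF1', 'GENE_A'), ('TF2', 'GENE_B')]
```

The nested loop is quadratic and only meant to show the overlap logic; for genome-wide annotation an interval index or a tool such as `bedtools intersect` would be more practical.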

Differential Coexpression Analysis

From the perspective of systems biology, functionally related genes are frequently coexpressed across a set of samples [19], [20], [21]. Differentially Coexpressed Genes and Links (DCGL) is designed for identifying differentially coexpressed genes and links from gene expression microarray data [22]. For GSE20333, we used the DCGL package [22], [23] in R [24] to identify differentially coexpressed genes (DCGs) and links in PD patients compared to non-PD controls. We calculated p-values and converted the raw p-values into false discovery rates (FDRs) using the Benjamini-Hochberg method [25] to address the multiple-testing problem, which would otherwise produce too many false positives. Only genes with FDR < 0.25 were selected as differentially coexpressed genes.
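The paper does not say which DCGL method produced the DCGs. As a sketch only, the code below follows the idea of the DCp method described by the DCGL authors (Yu et al., BMC Bioinformatics 2011, in the reference list): each gene's correlations with all other genes are compared between the two conditions and summarized as dC_i = sqrt(sum over j != i of (rA_ij - rB_ij)^2 / (n - 1)), a p-value is estimated by permuting sample labels, and the Benjamini-Hochberg adjustment and FDR < 0.25 cut-off described above are applied. It omits DCGL's link-filtering step and is plain Python written for clarity, not speed; all function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dc_scores(expr_a, expr_b):
    """DCp-style dC score per gene; expr_a/expr_b map gene -> per-sample values."""
    genes = sorted(expr_a)
    n = len(genes)
    scores = {}
    for g in genes:
        ss = sum(
            (pearson(expr_a[g], expr_a[h]) - pearson(expr_b[g], expr_b[h])) ** 2
            for h in genes
            if h != g
        )
        scores[g] = math.sqrt(ss / (n - 1))
    return scores


def permutation_pvalues(expr_a, expr_b, n_perm=100, seed=0):
    """Empirical p-values from shuffling sample labels between the two conditions."""
    rng = random.Random(seed)
    genes = sorted(expr_a)
    observed = dc_scores(expr_a, expr_b)
    n_a = len(expr_a[genes[0]])
    pooled = {g: list(expr_a[g]) + list(expr_b[g]) for g in genes}
    idx = list(range(len(pooled[genes[0]])))
    exceed = dict.fromkeys(genes, 0)
    for _ in range(n_perm):
        rng.shuffle(idx)
        perm_a = {g: [pooled[g][k] for k in idx[:n_a]] for g in genes}
        perm_b = {g: [pooled[g][k] for k in idx[n_a:]] for g in genes}
        for g, score in dc_scores(perm_a, perm_b).items():
            if score >= observed[g]:
                exceed[g] += 1
    # Add-one correction keeps p-values strictly positive.
    return {g: (exceed[g] + 1) / (n_perm + 1) for g in genes}


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_dcgs(pvals, fdr_cutoff=0.25):
    """Genes whose BH-adjusted p-value falls below the cut-off."""
    genes = sorted(pvals)
    fdr = benjamini_hochberg([pvals[g] for g in genes])
    return [g for g, q in zip(genes, fdr) if q < fdr_cutoff]


# Toy random data (6 vs 6 samples, as in GSE20333), for illustration only.
rng = random.Random(1)
toy_a = {f"g{k}": [rng.gauss(0, 1) for _ in range(6)] for k in range(5)}
toy_b = {f"g{k}": [rng.gauss(0, 1) for _ in range(6)] for k in range(5)}
pvals = permutation_pvalues(toy_a, toy_b, n_perm=50)
print(select_dcgs(pvals))
```

The running minimum in `benjamini_hochberg` enforces the monotonicity of BH-adjusted values, so a gene never receives a smaller FDR than a gene with a smaller raw p-value.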

Pathway Enrichment Analysis

To facilitate the functional annotation and analysis of the large gene list in our results, we submitted all DCGs to DAVID (the Database for Annotation, Visualization and Integrated Discovery) for KEGG pathway enrichment analysis. DAVID identifies canonical pathways associated with a given list of genes by calculating a hypergeometric test p-value for the association between the gene list and each canonical pathway [26]. We chose p < 0.05 as the cut-off criterion.
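The hypergeometric test mentioned above can be sketched with the Python standard library. This is an illustrative implementation, not DAVID's code: DAVID's documentation describes its reported "EASE score" as a modified, more conservative Fisher exact p-value, which this sketch does not reproduce. The background gene set and the function names are assumptions.

```python
from math import comb


def hypergeom_upper_tail(k, pop_size, pathway_size, list_size):
    """P(X >= k) for X ~ Hypergeometric(pop_size, pathway_size, list_size).

    X counts how many of `list_size` genes drawn without replacement from a
    background of `pop_size` genes fall in a pathway of `pathway_size` genes.
    """
    upper = min(pathway_size, list_size)
    tail = sum(
        comb(pathway_size, x) * comb(pop_size - pathway_size, list_size - x)
        for x in range(k, upper + 1)
    )
    return tail / comb(pop_size, list_size)


def enrich(gene_list, pathways, background, cutoff=0.05):
    """Return (pathway_id, p_value, count, size) for pathways with p < cutoff."""
    genes = set(gene_list) & background
    results = []
    for pid, members in pathways.items():
        members = set(members) & background
        count = len(genes & members)
        if count == 0:
            continue
        p = hypergeom_upper_tail(count, len(background), len(members), len(genes))
        if p < cutoff:
            results.append((pid, p, count, len(members)))
    return sorted(results, key=lambda r: r[1])


# Toy example: a 10-gene background and two hypothetical pathways.
background = {f"G{i}" for i in range(10)}
pathways = {"P1": {"G0", "G1", "G2", "G3", "G4"}, "P2": {"G5", "G6"}}
print(enrich(["G0", "G1", "G2", "G3", "G4"], pathways, background))
```

The returned tuples carry the same Count and Size fields that Table 1 reports for each pathway.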

Measures of RIF

Regulatory impact factor (RIF) analysis appears to be a robust and valuable method for identifying the regulators with the strongest evidence of contributing to differential expression between two biological conditions. RIF is a metric assigned to each TF that combines the change in coexpression between the TF and the differentially expressed genes (DEGs), i.e. the potential targets. The measures of RIF are computed as follows [27]:

$$\mathrm{RIF1}_i = \frac{1}{n}\sum_{j=1}^{n}\frac{e_{1j}+e_{2j}}{2}\left(r_{1ij}-r_{2ij}\right)^2$$

$$\mathrm{RIF2}_i = \frac{1}{n}\sum_{j=1}^{n}\left[\left(e_{1j}\,r_{1ij}\right)^2-\left(e_{2j}\,r_{2ij}\right)^2\right]$$

where $n$ is the number of DEGs; $e_{1j}$ and $e_{2j}$ represent the expression values of the $j$th DEG in conditions 1 and 2, respectively; and $r_{1ij}$ and $r_{2ij}$ represent the coexpression correlation between the $i$th TF and the $j$th DEG in conditions 1 and 2, respectively.
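Assuming the RIF1 and RIF2 definitions of Reverter et al. (2010), RIF1 for TF i averages ((e1j + e2j) / 2) * (r1ij - r2ij)^2 over the n DEGs, and RIF2 averages (e1j * r1ij)^2 - (e2j * r2ij)^2. The plain-Python sketch below computes both from per-sample expression values; the function names are hypothetical, and the paper does not say how RIF1 and RIF2 were combined into the single RIF value reported in Table 2.

```python
import math


def pearson(x, y):
    """Pearson correlation (assumes neither sequence is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rif_scores(e1, e2, r1, r2):
    """RIF1 and RIF2 for one TF.

    e1[j], e2[j]: mean expression of DEG j in conditions 1 and 2.
    r1[j], r2[j]: correlation between the TF and DEG j in conditions 1 and 2.
    """
    n = len(e1)
    rif1 = sum(((a + b) / 2) * (x - y) ** 2 for a, b, x, y in zip(e1, e2, r1, r2)) / n
    rif2 = sum((a * x) ** 2 - (b * y) ** 2 for a, b, x, y in zip(e1, e2, r1, r2)) / n
    return rif1, rif2


def rif_for_tf(tf_c1, tf_c2, deg_c1, deg_c2):
    """tf_c1/tf_c2: TF expression per sample; deg_c1/deg_c2: {deg: expression per sample}."""
    degs = sorted(deg_c1)
    e1 = [sum(deg_c1[g]) / len(deg_c1[g]) for g in degs]
    e2 = [sum(deg_c2[g]) / len(deg_c2[g]) for g in degs]
    r1 = [pearson(tf_c1, deg_c1[g]) for g in degs]
    r2 = [pearson(tf_c2, deg_c2[g]) for g in degs]
    return rif_scores(e1, e2, r1, r2)


# Toy example: the DEG tracks the TF in condition 1 and mirrors it in condition 2.
print(rif_for_tf([1, 2, 3], [1, 2, 3], {"DEG_A": [2, 4, 6]}, {"DEG_A": [6, 4, 2]}))
# → (16.0, 0.0)
```

In the toy case the correlation flips from +1 to -1 at unchanged mean expression, so RIF1 (which rewards a change in wiring) is large while RIF2 (which compares the squared products) is zero.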

Results

Identification of Differentially Coexpressed Genes in PD

We downloaded the publicly available microarray dataset GSE20333 from the GEO database and applied the DCGL package in R to identify DCGs between 6 PD patients and 6 non-PD controls. Among all genes tested, we found a total of 1004 DCGs with FDR < 0.25. In addition, a total of 459683 links were predicted among these DCGs.

Enrichment of PD Associated Pathways

For functional annotation of this large gene list, we used the online biological classification tool DAVID and observed significant enrichment of the DCGs in multiple KEGG categories (Table 1). Pathway analysis revealed that the DCGs were strongly associated with Ribosome (p = 2.21E-06) and Neurotrophin signaling pathway (p = 1.45E-04). In addition, Steroid biosynthesis, Spliceosome and NOD-like receptor signaling pathway showed evidence of association with the differentially coexpressed genes (p < 0.01).
Table 1

The enriched KEGG pathways (p<0.05).

ID | P-value | Count | Size | Term
03010 | 2.21E-06 | 13 | 88 | Ribosome
00100 | 0.001496 | 4 | 17 | Steroid biosynthesis
03040 | 0.001795 | 11 | 128 | Spliceosome
04621 | 0.002702 | 7 | 62 | NOD-like receptor signaling pathway
04610 | 0.018662 | 6 | 69 | Complement and coagulation cascades
05215 | 0.019005 | 7 | 89 | Prostate cancer
00980 | 0.021211 | 6 | 71 | Metabolism of xenobiotics by cytochrome P450
00982 | 0.023986 | 6 | 73 | Drug metabolism - cytochrome P450
00140 | 0.027895 | 5 | 56 | Steroid hormone biosynthesis
00072 | 0.029307 | 2 | 9 | Synthesis and degradation of ketone bodies
04612 | 0.031962 | 6 | 78 | Antigen processing and presentation
00620 | 0.033264 | 4 | 40 | Pyruvate metabolism
05210 | 0.040861 | 5 | 62 | Colorectal cancer
04962 | 0.044997 | 4 | 44 | Vasopressin-regulated water reabsorption
00030 | 0.04856 | 3 | 27 | Pentose phosphate pathway

ID represents the pathway ID in KEGG. Count represents the number of DCGs enriched in each pathway. Size represents the total number of genes in each pathway. Term represents the pathway name.

Regulatory Network Construction

We matched the 1004 DCGs and the 459683 links to the known regulatory data between transcription factors (TFs) and target genes, and obtained a total of 745 regulatory pairs involving 82 TFs and 601 target genes. By integrating these regulatory relationships, we built a regulatory network using Cytoscape [28] (Figure 1).
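The matching step can be sketched as a set intersection. The exact matching rule is not spelled out in the text, so this sketch assumes a known TF-target pair is kept when the same two genes also form a differentially coexpressed link (in either order); the node attributes mirror the styling of Figure 1 (TF vs. target, DCG vs. non-DCG). Output uses Cytoscape's simple interaction format (SIF), one tab-separated source, interaction type, target line per edge; the interaction label `regulates` and all function names are hypothetical.

```python
def build_network(known_pairs, dc_links):
    """Keep known (tf, target) pairs whose two genes also form a DC link."""
    link_set = {frozenset(link) for link in dc_links}
    return sorted({(tf, t) for tf, t in known_pairs if frozenset((tf, t)) in link_set})


def node_attributes(edges, dcgs):
    """Map each node to (role, is_dcg); a node that regulates anything counts as a TF."""
    tfs = {tf for tf, _ in edges}
    nodes = tfs | {t for _, t in edges}
    return {n: ("TF" if n in tfs else "target", n in dcgs) for n in sorted(nodes)}


def to_sif(edges, interaction="regulates"):
    """One 'source<TAB>interaction<TAB>target' line per edge."""
    return "\n".join(f"{tf}\t{interaction}\t{t}" for tf, t in edges)


# Toy input loosely based on gene names in Table 3; GENE_X is a placeholder.
known = [("HLF", "GPR85"), ("E2F1", "TAL1"), ("TAL1", "INPP4B"), ("TAL1", "GENE_X")]
links = [("GPR85", "HLF"), ("E2F1", "TAL1"), ("TAL1", "INPP4B")]
edges = build_network(known, links)
print(to_sif(edges))
```

Storing links as `frozenset`s makes the lookup order-independent, since coexpression links are undirected while regulatory pairs are directed.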
Figure 1

Regulatory network construction among TFs and their target genes.

The red nodes represent TFs and the green nodes represent their target genes. Large nodes are differentially co-expressed genes and small nodes are non-DCGs.

Impact Analysis of Transcription Factors

The network above contains a large number of regulatory relationships. To focus on the most meaningful information, we calculated the RIF of each TF. The top 5 ranked TFs are HLF (hepatic leukemia factor), NKX3-1 (NK3 homeobox 1), TAL1 (T cell acute lymphocytic leukemia 1), RFX1 (regulatory factor X, 1) and EGR3 (early growth response 3) (Table 2). The relationships between these top 5 TFs and their target genes are shown in Figure 2 and Table 3. Table 3 shows that HLF, E2F1 (E2F transcription factor 1) and STAT4 (signal transducer and activator of transcription 4) are both TFs and DCGs. The other TFs, NKX3-1, TAL1, RFX1 and EGR3, are not DCGs, but their target genes are.
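Ranking TFs by RIF and checking which TFs and targets are DCGs are simple operations. The sketch below uses the RIF values from Table 2 and a few pairs from Table 3; the DCG set passed in is an illustrative subset, and the function names are hypothetical.

```python
def top_tfs(rif, k=5):
    """TF names sorted by descending RIF, truncated to the top k."""
    return sorted(rif, key=rif.get, reverse=True)[:k]


def dcg_status(pairs, dcgs):
    """Map each TF to (tf_is_dcg, [targets that are DCGs])."""
    summary = {}
    for tf, target in pairs:
        _, dcg_targets = summary.setdefault(tf, (tf in dcgs, []))
        if target in dcgs:
            dcg_targets.append(target)
    return summary


# RIF values from Table 2.
rif = {"HLF": 121368.2, "NKX3-1": 112874.1, "TAL1": 109026.6,
       "RFX1": 103119.9, "EGR3": 102361.6}
print(top_tfs(rif, 3))  # → ['HLF', 'NKX3-1', 'TAL1']

# A few TF-target pairs from Table 3 with an illustrative DCG subset.
pairs = [("HLF", "GPR85"), ("NKX3-1", "NUDT21"), ("E2F1", "TAL1")]
print(dcg_status(pairs, {"HLF", "NUDT21", "E2F1"}))
```

The summary separates the two situations described above: a TF that is itself a DCG (HLF, E2F1) and a TF that is not a DCG but regulates one (NKX3-1).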
Table 2

The top 5 ranked TFs.

TF | RIF | Rank
HLF | 121368.2 | 1
NKX3-1 | 112874.1 | 2
TAL1 | 109026.6 | 3
RFX1 | 103119.9 | 4
EGR3 | 102361.6 | 5

TF represents the transcription factor. RIF represents regulatory impact factor of TF. Rank represents the impact rank of TF.

Figure 2

The regulatory relationships between the top 5 TFs and their target genes.

The red nodes represent transcription factors and the green nodes represent their target genes.

Table 3

The regulatory relationships between the top 5 TFs and their target genes.

TF | Target gene | cor.1 | cor.2 | DCG
HLF | GPR85 | 0.970975 | −0.88501 | HLF
HLF | ING1 | 0.563873 | −0.93968 | HLF
HLF | KLHL12 | −0.03165 | −0.99013 | HLF
HLF | NUSAP1 | 0.227138 | −0.96687 | HLF
HLF | NYX | 0.294478 | −0.95254 | HLF
HLF | SULT1E1 | 0.963595 | −0.28899 | HLF
HLF | CALCA | −0.03387 | −0.91528 | HLF
HLF | COL15A1 | −0.053 | −0.85312 | HLF
HLF | KLF10 | −0.01067 | −0.90286 | HLF
NKX3-1 | NUDT21 | −0.39249 | 0.968038 | NUDT21
TAL1 | INPP4B | −0.5512 | 0.962619 | INPP4B
TAL1 | PPFIA2 | −0.12643 | 0.964919 | PPFIA2
TAL1 | STAG2 | −0.08273 | −0.93227 | STAG2
E2F1 | TAL1 | −0.63941 | 0.995085 | E2F1
STAT4 | TAL1 | −0.54116 | 0.964951 | STAT4
RFX1 | TAAR3 | −0.91249 | 0.91653 | TAAR3
RFX1 | BCL9 | 0.046861 | 0.870466 | BCL9
RFX1 | LTBP4 | −0.28708 | 0.991056 | LTBP4
EGR3 | CBL | −0.13715 | 0.969967 | CBL
EGR3 | CD248 | −0.47312 | 0.983806 | CD248
EGR3 | CDK2AP1 | 0.810745 | −0.91525 | CDK2AP1
EGR3 | DGKI | −0.25926 | 0.95291 | DGKI
EGR3 | ICAM5 | 0.025647 | 0.824395 | ICAM5
EGR3 | KPNA4 | −0.04424 | 0.970174 | KPNA4
EGR3 | NCK1 | −0.69186 | 0.954629 | NCK1
EGR3 | TCF12 | 0.835666 | −0.89573 | TCF12
EGR3 | TUBA8 | −0.10359 | 0.981482 | TUBA8
EGR3 | BLZF1 | −0.19994 | 0.959835 | BLZF1
EGR3 | TDRKH | −0.9193 | 0.793759 | TDRKH

TF represents the transcription factor. Target gene represents the target gene of the transcription factor. cor.1 and cor.2 represent the coexpression correlation between the TF and the target gene in conditions 1 and 2, respectively. DCG indicates which gene of each TF-target pair is a differentially coexpressed gene.

Discussion

Molecular biomarkers are useful for improving diagnosis, predicting clinical behavior and demonstrating the efficacy of new therapies. Because microarrays can interrogate the expression levels of thousands of genes in the human genome simultaneously, they have been widely used to discover disease biomarkers [29], [30], [31]. In this work, we analyzed gene expression data with computational methods with the aim of uncovering genes that are potentially dysregulated in PD. We identified a total of 1004 DCGs in PD patients compared to non-PD controls. After regulatory network construction and regulatory impact factor analysis, we found that the transcription factors HLF, E2F1, STAT4, NKX3-1, TAL1, RFX1 and EGR3 may play important roles in PD initiation. Of these, HLF, STAT4 and E2F1 were found to have altered expression levels in PD patients. The expression levels of the other transcription factors, NKX3-1, TAL1, RFX1 and EGR3, were not found to be altered; however, they regulated differentially expressed genes.

HLF encodes a member of the proline and acidic amino acid-rich (PAR) protein family, a subset of the bZIP transcription factors. Chromosomal translocations fusing portions of this gene with the E2A gene cause a subset of childhood B-lineage acute lymphoid leukemias [32]. Although HLF has been linked to malignancies of the lymphoid system, northern blotting detects it in the liver, kidney and adult nervous system [33]. Hitzler et al. found that HLF expression increased markedly with synaptogenesis and coincided with barrel formation, and suggested that HLF plays a role in the function of differentiated neurons in the adult nervous system [34]. In our analysis, HLF was the transcription factor with the highest regulatory impact on the differential expression of genes in PD patients.

E2F1 is a member of the E2F family of transcription factors. The E2F family plays a crucial role in the control of the cell cycle and the action of tumor suppressor proteins, and is also a target of the transforming proteins of small DNA tumor viruses. Several studies using in vitro models of neurodegeneration have demonstrated that E2F1 contributes to neuronal damage and death [35], [36], [37]. E2F1 immunoreactivity and/or protein levels have been reported to increase in neurons of patients with PD [38]. The same study showed not only that the pRb/E2F pathway is activated in dopaminergic neurons in PD, but also that activation of this pathway is instrumental in the degeneration of these neurons in the MPTP/MPP+ model of the disease [38]. In a recent study, Lu and colleagues reported that pathogenic LRRK2 mutations inhibit the translational repression of the transcription factors E2F1 and DP, suggesting a mechanism by which these mutations cause PD [39].

STAT4 is a transcription factor belonging to the signal transducer and activator of transcription protein family [40]. STAT4 is involved in the signaling of interleukin-12 and interferon-γ, as well as interleukin-23 [41]. Although we found STAT4 to be differentially expressed in PD patients compared to non-PD controls, the gene has no known role in PD pathogenesis to date.

Table 1 shows that the most significantly enriched pathway is the ribosome, which catalyzes the formation of proteins from individual amino acids. In addition to the ribosome, other pathways associated with biosynthesis and RNA processing, such as steroid biosynthesis and the spliceosome, were also enriched. This result suggests that biological processes of protein turnover are impaired in PD, which is in line with previous studies [42], [43].

In conclusion, we have identified candidate molecular biomarkers for PD initiation using a computational bioinformatics analysis of gene expression. A total of 1004 differentially coexpressed genes were identified between PD patients and non-PD controls. Pathway enrichment of these genes suggests that biological processes of protein turnover are impaired in PD. After regulatory network construction and regulatory impact factor analysis, we found that the transcription factors HLF, E2F1, STAT4, NKX3-1, TAL1, RFX1 and EGR3 may play important roles in PD initiation. Of these, HLF, STAT4 and E2F1 were found to have altered expression levels in PD patients. We therefore suggest that HLF, E2F1 and STAT4 may be used as biomarkers for PD; however, more work is needed to validate our results.
References (41 in total)

1.  KEGG: kyoto encyclopedia of genes and genomes.

Authors:  M Kanehisa; S Goto
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

Review 2.  Nongenetic causes of Parkinson's disease.

Authors:  A R Chade; M Kasten; C M Tanner
Journal:  J Neural Transm Suppl       Date:  2006

3.  Neuropilin-1 is a direct target of the transcription factor E2F1 during cerebral ischemia-induced neuronal death in vivo.

Authors:  Susan X Jiang; Melissa Sheldrick; Angele Desbois; Jacqueline Slinn; Sheng T Hou
Journal:  Mol Cell Biol       Date:  2006-12-18       Impact factor: 4.272

4.  Decreased alpha-synuclein in cerebrospinal fluid of aged individuals and subjects with Parkinson's disease.

Authors:  Takahiko Tokuda; Sultan A Salem; David Allsop; Toshiki Mizuno; Masanori Nakagawa; Mohamed M Qureshi; Joseph J Locascio; Michael G Schlossmacher; Omar M A El-Agnaf
Journal:  Biochem Biophys Res Commun       Date:  2006-08-14       Impact factor: 3.575

5.  DCGL: an R package for identifying differentially coexpressed genes and links from gene expression microarray data.

Authors:  Bao-Hong Liu; Hui Yu; Kang Tu; Chun Li; Yi-Xue Li; Yuan-Yuan Li
Journal:  Bioinformatics       Date:  2010-08-26       Impact factor: 6.937

6.  Involvement of the transcription factor E2F1/Rb in kainic acid-induced death of murine cerebellar granule cells.

Authors:  Robert A Smith; Teena Walker; Xiaoqi Xie; Sheng T Hou
Journal:  Brain Res Mol Brain Res       Date:  2003-08-19

7.  Stat4, a novel gamma interferon activation site-binding protein expressed in early myeloid differentiation.

Authors:  K Yamamoto; F W Quelle; W E Thierfelder; B L Kreider; D J Gilbert; N A Jenkins; N G Copeland; O Silvennoinen; J N Ihle
Journal:  Mol Cell Biol       Date:  1994-07       Impact factor: 4.272

Review 8.  Mechanisms of Disease: biomarkers and molecular targets from microarray gene expression studies in prostate cancer.

Authors:  Colin S Cooper; Colin Campbell; Sameer Jhavar
Journal:  Nat Clin Pract Urol       Date:  2007-12

9.  Link-based quantitative methods to identify differentially coexpressed genes and gene pairs.

Authors:  Hui Yu; Bao-Hong Liu; Zhi-Qiang Ye; Chun Li; Yi-Xue Li; Yuan-Yuan Li
Journal:  BMC Bioinformatics       Date:  2011-08-02       Impact factor: 3.169

10.  Analyzing microarray data of Alzheimer's using cluster analysis to identify the biomarker genes.

Authors:  Satya Vani Guttula; Apparao Allam; R Sridhar Gumpeny
Journal:  Int J Alzheimers Dis       Date:  2012-02-14