Regina Augustin, Stefan F Lichtenthaler, Michael Greeff, Jens Hansen, Wolfgang Wurst, Dietrich Trümbach
Abstract
The molecular mechanisms and genetic risk factors underlying Alzheimer's disease (AD) pathogenesis are only partly understood. To identify new factors that may contribute to AD, different approaches are taken, including proteomics, genetics, and functional genomics. Here, we used a bioinformatics approach and found that distinct AD-related genes share modules of transcription factor binding sites, suggesting transcriptional coregulation. To detect additional coregulated genes that may contribute to AD, we established a new bioinformatics workflow that combines known multivariate methods, such as support vector machines and biclustering, with predicted transcription factor binding site modules, using in silico analysis and over 400 expression arrays from human and mouse. Two significant modules are composed of three transcription factor families: CTCF, SP1F, and EGRF/ZBPF, which are conserved between human and mouse APP promoter sequences. The specific combination of in silico promoter and multivariate analysis can identify regulation mechanisms of genes involved in multifactorial diseases.
Year: 2011 PMID: 21559189 PMCID: PMC3090009 DOI: 10.4061/2011/154325
Source DB: PubMed Journal: Int J Alzheimers Dis
Figure 1. Workflow of the bioinformatics analysis of promoter sequences and gene expression data to identify modules of TFBSs in AD-related genes. The workflow is divided into two parts, the first and the second approach. The coloured boxes describe the methods that were used. The yellow boxes represent tools of the Genomatix Software (1: DiAlignTF, 2: FrameWorker, 3: ModelInspector), and the blue boxes indicate multivariate methods or filtering by Illumina detection score. The beginning and the end of each arrow specify the input and output of a method, respectively. The grey arrow denotes the comparison of the target genes of the modules with genes differentially regulated in microarray analyses. The scheme at the end of the second approach indicates a module composed of three TFBSs (blue, red, and green) that is common to three promoter sequences, with the transcription start site at the red arrow.
Table 1. Modules identified by the first approach: 17 modules composed of two or more TFBS families. Each TFBS family consists of several TFs (Table 2). The second column lists the AD key genes that the TFs of the module are predicted to bind, based on a ModelInspector search for the module in all human promoters. Human and mouse APP are indicated by hAPP and mApp, respectively.
| Module | AD key genes (targets of module) |
|---|---|
| CTCF-E2FF-SP1F | hAPP, mApp, BACE1, NCSTN, APH1A |
| CTCF-SP1F | hAPP, mApp, BACE1, PS2, NCSTN, APH1A |
| E2FF-E2FF-EGRF | hAPP, mApp, BACE1, BACE2, PEN-2, APH1A |
| CTCF-E2FF-EGRF | hAPP, BACE2, PEN-2, NCSTN, APH1A |
| CTCF-E2FF-EGRF | hAPP, mApp, BACE1, BACE2, PEN-2, APH1A |
| CTCF-HAND-SP1F | hAPP, mApp, BACE2, PS2 |
| CTCF-SP1F-SP1F | hAPP, mApp, BACE1, BACE2 |
| CTCF-NRF1-SP1F | hAPP, mApp, BACE1, APH1A |
| CTCF-EGRF-NRF1 | hAPP, BACE2, PS2, PEN-2 |
| CTCF-SP1F-ZBPF | hAPP, mApp, BACE2 |
| CTCF-EGRF-ZBPF | hAPP, BACE2, PS2 |
| CTCF-NRF1 | hAPP, mApp, BACE1, BACE2, PEN-2 |
| CTCF-EGRF-SP1F | hAPP, BACE2, PS2 |
| NRF1-ZBPF | hAPP, mApp, BACE2, PEN-2, APH1A |
| CTCF-EGRF | hAPP, mApp, BACE1, BACE2, PS1, PEN-2, APH1A |
| CTCF-E2FF | hAPP, mApp, BACE1, BACE2, NCSTN, APH1A |
| SP1F-ZBPF-ZBPF | BACE1, PS1, PEN-2, APH1A |
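The table can be queried programmatically, for example to ask which modules are predicted to target both the human and the mouse APP promoter. The following is a minimal Python sketch, not the authors' workflow: it copies only a subset of rows from the table, and the names `MODULE_TARGETS` and `modules_targeting` are hypothetical. Because the table lists CTCF-E2FF-EGRF twice with different target sets, a dictionary keyed on module name could not hold the full table without an extra row identifier.

```python
# Subset of rows copied from the table above; names are illustrative only.
MODULE_TARGETS = {
    "CTCF-SP1F": {"hAPP", "mApp", "BACE1", "PS2", "NCSTN", "APH1A"},
    "CTCF-SP1F-ZBPF": {"hAPP", "mApp", "BACE2"},
    "CTCF-EGRF-SP1F": {"hAPP", "BACE2", "PS2"},
    "CTCF-EGRF": {"hAPP", "mApp", "BACE1", "BACE2", "PS1", "PEN-2", "APH1A"},
    "SP1F-ZBPF-ZBPF": {"BACE1", "PS1", "PEN-2", "APH1A"},
}


def modules_targeting(genes, table=MODULE_TARGETS):
    """Return sorted module names whose target set contains every gene in `genes`."""
    wanted = set(genes)
    # `wanted <= targets` is the subset test: all requested genes must be targets.
    return sorted(name for name, targets in table.items() if wanted <= targets)


if __name__ == "__main__":
    # Modules (within this subset) predicted on both human and mouse APP promoters.
    print(modules_targeting({"hAPP", "mApp"}))
```

Within this subset, the query for `{"hAPP", "mApp"}` returns CTCF-EGRF, CTCF-SP1F, and CTCF-SP1F-ZBPF; CTCF-EGRF-SP1F is excluded because the table lists hAPP but not mApp for it.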
Table 2. Composition of the TF families of the modules. Each TF family consists of several transcription factors (TFs). The second and fourth columns give the description and the binding domains of each family, respectively.
| TF family | Description | TFs | Binding domains |
|---|---|---|---|
| CTCF | CTCF and BORIS gene family, transcriptional regulators with 11 highly conserved zinc finger domains | CTCF, CTCFL | C2H2 zinc finger domain |
| E2FF | E2F-myc activator/cell cycle regulator | E2F1, E2F2, E2F3, E2F4, E2F5, E2F6, E2F7, E2F8, TFDP1, TFDP2, TFDP3 | E2F winged helix |
| EGRF | EGR/nerve growth factor-induced protein C and related factors | EGR1, EGR2, EGR3, EGR4, WT1, ZBTB7A, ZBTB7B | C2H2 zinc finger domain |
| HAND | Twist subfamily of class B bHLH transcription factors | HAND1, HAND2, LYL1, MESP1, MESP2, NHLH1, NHLH2, SCXA, SCXB, TAL1, TAL2, TCF12, TCF15, TCF3, TWIST1, TWIST2 | bHLH |
| KLFS | Krueppel-like transcription factors | KLF1, KLF2, KLF3, KLF4, KLF6, KLF7, KLF8, KLF9, KLF12, KLF13, KLF15 | — |
| NRF1 | Nuclear respiratory factor 1 | NRF1 | bZIP |
| SP1F | GC-Box factors SP1/GC | KLF10, KLF11, KLF16, KLF5, SP1, SP2, SP3, SP4, SP5, SP6, SP7, SP8 | C2H2 zinc finger domain |
| ZBPF | Zinc binding protein factors | ZKSCAN3, ZNF148, ZNF202, ZNF219, ZNF281, ZNF300 | C2H2 zinc finger domain |
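Since a module name is a hyphen-separated list of TF families, it can be expanded into the candidate TFs of each family. The sketch below assumes that naming convention and copies four families from the table; `TF_FAMILIES`, `expand_module`, and `families_of` are hypothetical names introduced for illustration.

```python
# Four families copied from the table above; the remaining rows are omitted.
TF_FAMILIES = {
    "CTCF": ["CTCF", "CTCFL"],
    "EGRF": ["EGR1", "EGR2", "EGR3", "EGR4", "WT1", "ZBTB7A", "ZBTB7B"],
    "NRF1": ["NRF1"],
    "ZBPF": ["ZKSCAN3", "ZNF148", "ZNF202", "ZNF219", "ZNF281", "ZNF300"],
}


def expand_module(module, families=TF_FAMILIES):
    """Map each family in a hyphen-separated module name to its member TFs."""
    return [(family, families[family]) for family in module.split("-")]


def families_of(tf, families=TF_FAMILIES):
    """Return the sorted names of the families that list the given TF."""
    return sorted(name for name, members in families.items() if tf in members)


if __name__ == "__main__":
    for family, members in expand_module("CTCF-EGRF-ZBPF"):
        print(family, "->", ", ".join(members))
    print(families_of("WT1"))
```

A family that occurs twice in a module, as in SP1F-ZBPF-ZBPF, appears twice in the expanded list, which preserves the number of binding sites in the module.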
Figure 2. Relations of predicted target genes of three TFBS modules. This figure summarizes important target genes of the modules, the relation of target genes to KEGG pathways playing a role in AD (blue rectangle), and the relation of the target genes to some AD key genes (red pentagon). The target genes are coloured according to the microarray study they derive from; target genes with two colours derive from the analysis of two different microarray studies. The grey arrows are the predicted regulations of the target genes by the modules (orange rectangle), and the black lines indicate that the target gene is part of the corresponding KEGG pathway. Additionally, three different relations of the target genes to AD key genes are shown by purple, green, and blue lines, which indicate protein-protein binding, protein modification, and regulation, respectively.
Figure 3. Expression profiles of five different clusters of coregulated genes. The x-axis gives the sample IDs (GEO/NCBI accession numbers) included in the cluster, and the y-axis gives the expression values. Each line in a profile corresponds to one gene, and the target genes of the modules mentioned in the text are coloured. The two upper profiles (a, b) are clusters of coregulated genes from the AD patients dataset, the profile in the middle (c) corresponds to coregulated genes from the LOAD patients dataset, and the two lower profiles (d, e) correspond to coregulated genes from the double-transgenic mice dataset. The target genes of profiles (a) and (d) were used to establish the module CTCF-EGRF-SP1F, and those of profiles (b) and (e) to establish the module CTCF-SP1F-ZBPF; (b) was also used for the module KLFS-SP1F-ZBPF. The five lines in profile (e) correspond to genes that are involved in the MAPK signaling pathway. The target genes of profile (c) were used to establish the modules CTCF-SP1F-ZBPF and KLFS-SP1F-ZBPF.