Klaas Vandepoele, Tineke Casneuf, Yves Van de Peer.
Abstract
BACKGROUND: Transcriptional regulation plays an important role in the control of many biological processes. Transcription factor binding sites (TFBSs) are the functional elements that determine transcriptional activity and are organized into separable cis-regulatory modules, each defining the cooperation of several transcription factors required for a specific spatio-temporal expression pattern. Consequently, the discovery of novel TFBSs in promoter sequences is an important step to improve our understanding of gene regulation.
Year: 2006 PMID: 17090307 PMCID: PMC1794593 DOI: 10.1186/gb-2006-7-11-r103
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Network-level conservation filter. (a) The occurrence of a candidate TFBS in the set of orthologous Arabidopsis-poplar gene pairs is determined, and the significance of the overlap is measured using the hypergeometric distribution [24]. The network-level conservation score (NCS) is defined as the negative logarithm of the hypergeometric p value. (b) Distribution of NCS values for 1,000 randomly generated TFBSs (grey) and for the motifs found using the co-expression (black) and the two-way clustering (white) procedures. The left and right y-axes show the frequency for the random and the potentially functional TFBSs, respectively.
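The score described in panel (a) can be illustrated with a minimal sketch. It rests on several assumptions that the caption does not state: the population is the set of orthologous gene pairs, the p value is the upper tail of the hypergeometric distribution, and the logarithm is base 10. The function and parameter names are hypothetical.

```python
from math import comb, log10

def hypergeom_upper_tail_log10(k, N, K, n):
    """Return log10 of P(X >= k) for a hypergeometric draw.

    N: population size, K: successes in the population,
    n: number of draws, k: observed successes among the draws.
    Exact integer arithmetic avoids underflow for very small p values.
    """
    upper = min(K, n)
    if k > upper:
        raise ValueError("observed overlap exceeds the maximum possible overlap")
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return log10(tail) - log10(total)

def ncs(n_pairs, n_arabidopsis, n_poplar, n_both):
    """Network-level conservation score (assumed base-10 logarithm).

    n_pairs: orthologous gene pairs considered (assumed population)
    n_arabidopsis: pairs whose Arabidopsis promoter contains the motif
    n_poplar: pairs whose poplar promoter contains the motif
    n_both: pairs in which both promoters contain the motif
    """
    return -hypergeom_upper_tail_log10(n_both, n_pairs, n_arabidopsis, n_poplar)

# Toy numbers, not taken from the paper.
print(round(ncs(10, 5, 5, 5), 3))
```

A motif whose occurrences overlap between the two species no more often than expected by chance yields a score near zero, while a strongly conserved motif yields a large score.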
Figure 2. Detection of TFBSs using two-way clustering. Starting from the available set of 34 TFBSs identified using sets of co-expressed genes (see text for details), clusters of genes with similar TFBS combinations in their promoters are delineated. Next, within each set of genes with similar TFBS content, groups of co-expressed genes are identified. Finally, motif detection is applied and evolutionarily conserved TFBSs are retained. The panel on the right shows the identification of the TFBS HA_HSE2 involved in zygotic embryogenesis. The top picture depicts a subset of all 573 Arabidopsis genes containing the module consisting of two distinct G-boxes. The two images below show the three groups of co-expressed genes and the newly identified TFBSs found in a set of 22 genes containing both G-boxes in their promoters and showing embryo-specific expression. Note that the section indicated with the dotted line corresponds to the motif-detection approach applied to co-expressed genes in the first stage.
Overview of the TFBSs identified using co-expressed genes
| TFBS motif* | NCS† | Known motif | Site‡ | Functional enrichment targets: GO Biological Process or Molecular Function§ |
| --- | --- | --- | --- | --- |
| nrCAAnTC (a) | 5.77 | BJ_CAAT-box | | GO:0008152 metabolism 8.58E-04 (1.2); GO:0003824 catalytic activity 8.91E-05 (1.2) |
| GTACAwry (b) | 5.64 | | | GO:0007275 development 2.89E-02 (1.6); GO:0003824 catalytic activity 2.98E-03 (1.2) |
| TTCkwwTs | 5.79 | BOXIINTPATPB | | |
| sGCrGAGA | 5.77 | | | GO:0015980 energy derivation by oxidation of organic compounds 4.82E-02 (2.7); GO:0008152 metabolism 1.43E-03 (1.2); GO:0003824 catalytic activity 2.89E-03 (1.1) |
| kCCACGTn (4) | 17.54 | AT_G-box; HV_ABRE6; PH_boxII | | GO:0015979 photosynthesis 2.48E-04 (4.2); GO:0048316 seed development 2.64E-03 (3.6); GO:0009793 embryonic development (sensu Magnoliophyta) 6.15E-03 (3.5) |
| yCATTTnT (c) | 8.7 | GM_Unnamed_6 | G | GO:0003700 transcription factor activity 2.94E-03 (1.3); GO:0030528 transcription regulator activity 1.64E-02 (1.3); GO:0003677 DNA binding 3.86E-02 (1.2) |
| ynTTATCC | 6.75 | SREATMSD; AT_I-box | | |
| nGTTGACw (d) | 5.31 | ZM_O2-site | | GO:0006952 defense response 2.99E-04 (1.9); GO:0009607 response to biotic stimulus 3.56E-04 (1.7); GO:0016301 kinase activity 7.52E-11 (1.7) |
| TTTGCnrA | 6.13 | | | GO:0016773 phosphotransferase activity, alcohol group as acceptor 1.14E-02 (1.6); GO:0016772 transferase activity, transferring phosphorus-containing groups 2.60E-02 (1.5) |
| rATyTGGG | 5.58 | | | |
| TrTwTATA | 9.35 | AT_TATA-box | | GO:0019748 secondary metabolism 2.76E-02 (2.1); GO:0006519 amino acid and derivative metabolism 1.35E-02 (1.8); GO:0003700 transcription factor activity 3.36E-02 (1.3) |
| ATArwACA (e) | 5.79 | OS_Unnamed_2 | CCA | |
| nTTCCCGC (5) | 27.27 | NT_E2Fa | | GO:0006261 DNA-dependent DNA replication 6.48E-04 (6.2); GO:0000067 DNA replication and chromosome cycle 1.06E-07 (5.5); GO:0006260 DNA replication 3.57E-05 (5.1) |
| TkAGAwnA | 8.86 | BO_TCA-element3 | | GO:0006464 protein modification 4.52E-02 (1.7); GO:0003824 catalytic activity 5.20E-03 (1.1) |
| AAACCCTA (13) (f) | 40.06 | TELOBOXATEEF1AA1 | | Ribosome biogenesis and assembly 9.86E-13 (4.4); ribosome biogenesis 5.67E-12 (4.3); pre-mRNA splicing factor activity 3.20E-04 (3.9) |
| mGnyAAAG (g) | 6.38 | | | GO:0003824 catalytic activity 2.93E-02 (1.1) |
| GAnCnkmG | 6.29 | | | GO:0003729 mRNA binding 1.00E-02 (3.1); GO:0003735 structural constituent of ribosome 3.69E-02 (1.7); GO:0006412 protein biosynthesis 3.15E-03 (1.7) |
| TCnCTCTC | 8.98 | LE_5UTRPy-richstretch | TT | GO:0003777 microtubule motor activity 9.90E-03 (2.7); GO:0050789 regulation of biological process 2.27E-03 (1.4); GO:0016772 transferase activity, transferring phosphorus-containing groups 7.89E-03 (1.4) |
| wmGTCmAm | 7.16 | | | GO:0003824 catalytic activity 4.51E-03 (1.1) |
| ynCAACGG | 8.39 | CR_MSA-like | | GO:0003777 microtubule motor activity 3.17E-03 (3.4); GO:0003774 motor activity 8.55E-03 (2.9) |
| nmGATyCr | 5.66 | | | GO:0006944 membrane fusion 2.32E-02 (4.5); GO:0003735 structural constituent of ribosome 2.77E-03 (1.9); GO:0005198 structural molecule activity 7.11E-04 (1.9) |
| CGkCGmCn | 7.68 | OS_GC-motif5 | | |
| AGGCCCAw (9) | 21.94 | UP1ATMSD | | GO:0007046 ribosome biogenesis 3.56E-14 (4.3); GO:0042254 ribosome biogenesis and assembly 2.28E-14 (4.3); GO:0003735 structural constituent of ribosome 8.66E-29 (3.3) |
| AykyATwA | 6.09 | | | |
| CTGnCTCy | 6.91 | | | GO:0016301 kinase activity 3.44E-02 (1.3); GO:0003676 nucleic acid binding 3.48E-02 (1.2); GO:0005488 binding 2.60E-03 (1.2) |
| TsTCGnTT | 7.22 | | | GO:0003824 catalytic activity 5.10E-03 (1.1) |
| TmAsTGAn | 7.76 | OS_GTCAdirectrepeat | TAAGTCA | GO:0016491 oxidoreductase activity 3.85E-03 (1.5); GO:0008152 metabolism 5.74E-03 (1.2); GO:0003824 catalytic activity 5.70E-04 (1.2) |
| yyACrCGT (2) | 6.56 | ST_G-box | | GO:0009605 response to external stimulus 4.80E-02 (1.6); GO:0006950 response to stress 3.42E-02 (1.6) |
| mATATTTT | 5.51 | GM_Nodule-site1 | GATATATT | |
| CCAATnCm | 5.78 | CAATBOX1; HV_ATC-motif | | GO:0008152 metabolism 2.01E-02 (1.2) |
| rkTCAwGm | 5.42 | | | GO:0003824 catalytic activity 6.17E-05 (1.2) |
| ssCGCCnA (2) | 9.13 | E2F1OSPCNA | | GO:0000067 DNA replication and chromosome cycle 4.74E-02 (3.0); GO:0006259 DNA metabolism 2.15E-03 (2.3); GO:0007049 cell cycle 4.29E-02 (2.2) |
| TTTATGnG | 7.1 | | | |
| TCAwATAA | 6.74 | | | |
*Numbers in parentheses indicate the number of clusters (containing co-expressed genes) in which the motif was independently identified. The letters in parentheses refer to the updated TFBSs identified using the two-way clustering: (a) GCAAnTCn; (b) GTACmwGy; (c) yCATTTAT; (d) mkTTGACT; (e) ATrrwACA; (f) AAACCCTA; (g) mGnCAAAG. †Network-level conservation score. ‡Residues in bold indicate the matching positions between the known motif and the motif found in this study. Known motifs were retrieved from PLACE [26] and PlantCARE [27]. §Only the three GO categories with the highest enrichment scores are shown. The enrichment score is given as the number in parentheses.
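The motifs in the table are written with degenerate nucleotide letters (n, r, y, w, k, m, s), which appear to follow the standard IUPAC ambiguity codes. As a minimal sketch under that assumption, a motif can be translated into a regular expression and scanned against a promoter sequence. The helper names are hypothetical, the example sequence is invented, and only the given strand is searched.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def motif_to_pattern(motif):
    """Translate a degenerate motif such as 'kCCACGTn' into a regex string."""
    return "".join(IUPAC[letter.upper()] for letter in motif)

def find_sites(motif, sequence):
    """Return 0-based start positions of all matches, including overlapping ones.

    A lookahead is used so that overlapping occurrences are not skipped.
    Only the strand given in `sequence` is scanned.
    """
    pattern = re.compile("(?=(" + motif_to_pattern(motif) + "))")
    return [match.start() for match in pattern.finditer(sequence.upper())]

# Invented promoter fragment containing one G-box-like site.
print(find_sites("kCCACGTn", "AAGCCACGTGTT"))
```

Scanning the reverse complement as well would be needed to cover both strands; that step is omitted here to keep the sketch short.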
Figure 3. Motif synergy map for 139 modules with significant GO Biological Process annotation. The full and dotted lines connect motifs cooperating in modules containing two and three TFBSs, respectively. Line colors indicate the GO Biological Process enrichment for Arabidopsis genes containing this module (see also Additional data file 7).
Figure 4. Correlation between cis-regulatory modules and clusters of co-expressed genes. Rows depict co-expression clusters with their corresponding cluster number and brief description, if available, whereas columns show modules with their corresponding GO descriptions. The number of genes within each co-expression cluster is indicated in parentheses. Only expression clusters enriched for one (or more) modules are shown. Enrichment was calculated using the hypergeometric distribution and p values were corrected for multiple hypothesis testing with the false discovery rate method (q-value) [76].
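To illustrate the multiple-testing step, the sketch below adjusts a list of enrichment p values with the Benjamini-Hochberg procedure. This is a simpler stand-in chosen for illustration and is not necessarily the q-value method of reference [76]; the function name and the p values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    Each p value is scaled by m / rank (rank 1 = smallest), then a running
    minimum from the largest rank downward enforces monotonicity.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Invented p values for four module/cluster combinations.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [value <= 0.05 for value in adjusted]
print(adjusted, significant)
```

A module/cluster combination would then be reported as enriched only if its adjusted value falls below the chosen false discovery rate threshold.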
Figure 5. Motif distance distributions for 30 conserved modules in orthologous target genes between Arabidopsis and poplar. For all modules, the distance (in bp) between cooperative TFBSs was measured in 200 conserved orthologous target genes and plotted in a histogram for Arabidopsis and poplar. The white boxes denote the cumulative fraction.
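The histogram and cumulative fraction described in this caption can be sketched as follows. The bin width, the maximum distance and the example distances are assumptions made for illustration; the caption does not state how the distance between two sites is defined or which bins were used.

```python
def distance_histogram(distances, bin_width=50, max_distance=1000):
    """Bin integer distances (bp) and return (counts, cumulative_fraction).

    Distances outside [0, max_distance) are ignored. The cumulative fraction
    is computed over the distances that fall inside the binned range.
    """
    n_bins = -(-max_distance // bin_width)  # ceiling division
    counts = [0] * n_bins
    for distance in distances:
        if 0 <= distance < max_distance:
            counts[distance // bin_width] += 1
    total = sum(counts)
    cumulative = []
    running = 0
    for count in counts:
        running += count
        cumulative.append(running / total if total else 0.0)
    return counts, cumulative

# Invented distances between two cooperating TFBSs in four promoters.
counts, cumulative = distance_histogram([10, 60, 70, 120], bin_width=50, max_distance=200)
print(counts, cumulative)
```

Running the same function on the distances from each species separately would give the two histograms that the figure compares.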