Abstract
BACKGROUND: The human genome contains over one million Alu repeat elements whose distribution is not uniform. While metabolism-related genes were shown to be enriched with Alu, in structural genes Alu elements are under-represented. Such observations led researchers to suggest that Alu elements were involved in gene regulation and were selected to be present in some genes and absent from others. This hypothesis is gaining strength due to findings that indicate involvement of Alu elements in a variety of functions; for example, Alu sequences were found to contain several functional transcription factor (TF) binding sites (BSs). We performed a search for new putative BSs on Alu elements, using a database of Position Specific Score Matrices (PSSMs). We searched consensus Alu sequences as well as specific Alu elements located in the 5 Kbp regions upstream of the transcription start site (TSS) of about 14000 genes.
Year: 2006 PMID: 16740159 PMCID: PMC1513395 DOI: 10.1186/1471-2164-7-133
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Twenty-five binding motifs whose scores pass T1. Those that pass an FDR of Q = 0.20 on at least one major Alu subfamily sequence are in bold. The consensus sites of the PSSMs (in the IUPAC convention [33]) are in the second column, and the target sequences with the highest scores among all the major Alu consensus sequences are in the third, with nucleotides that agree with the consensus denoted by capital letters. The fourth column contains the locations of the putative target sites on the Alu sequence of the subfamily with the highest score (if the same sequence appears on several Alu elements, we choose the one with the highest number of copies in the 5 kb upstream regions). Two Alu subsequences serve as putative target sites of several TFs (designated by * and **). The fifth column contains the p-values (see text and methods), and the sixth column lists the number of subfamilies on which the BS of the third column resides.
| BM name | Consensus | Binding site | Location | p-value | No. of subfamilies |
| | NNGGATTAANNN | TGGGATTAcAGG | AluSx (-22:-33)* | 4.00E-03 | 8 |
| | ANGRGATTASMN | cTGGGATTACAG | AluSx (-23:-34)* | 2.60E-03 | 8 |
| | NNAAATCNNAGNN | TGtAATCCCAGCA | AluSx (24:36)* | 2.00E-03 | 8 |
| | WNTAATCCCAR | TGTAATCCCAG | AluSx (24:34)* | 1.00E-05 | 8 |
| | TCCCAGCTACTTTGGGA | TCCCAGCactTTgGGa | AluSx (29:45) | 1.00E-05 | 8 |
| | TTTGGGARR | TTTGGGAGG | AluSx (38:46) | 1.70E-03 | 8 |
| | NGNTYACTNNMGKTCA | GGATCACcTGAGGTCA | AluSx (58:73) | 1.00E-04 | 2 |
| | NATCACGTGAN | GATCACcTGAG | AluSx (59:69) | 1.60E-03 | 3 |
| V$RORA1.01 | NNWWNNAGGTCAN | ATcACGAGGTCAA | AluSc (60:72) | 8.50E-03 | 1 |
| | TYAAGTG | TCAAGTG | AluJb (-62:-68) | 1.00E-05 | 1 |
| V$HNF4.02 | NNGGNCNAAAGNTCN | GAGGTCAAgAGATCG | AluSc (65:79) | 6.60E-03 | 1 |
| | NNAGGTCANNGTGACCT | GTtGGTCAGGcTGgtCT | AluSp (-82:-98) | 4.90E-03 | 1 |
| V$CHREBP_MLX_01 | CAYGNGNNANNSNNGTG | CACGGtGAAACCCCGTc | AluY (96:112) | 6.50E-03 | 1 |
| | RTTKCATCA | GTTTCAcCA | AluSx (-100:-108) | 3.60E-03 | 5 |
| | TGTTACTAAAAATAGAAM | ctcTACTAAAAATAcAAA | AluSx (114:131)** | 4.00E-04 | 6 |
| | NNKWKCTAWAAATAGMNN | CTcTaCTAAAAATAcAAA | AluSx (114:131)** | 3.50E-03 | 6 |
| | NNNCTAWAAATAGMNN | CTACTAAAAATAcAAA | AluSx (116:131)** | 2.60E-03 | 6 |
| | NWKCTAWAAATAGMNN | CTaCTAAAAATAcAAA | AluSx (116:131)** | 3.30E-03 | 6 |
| | ANKCTAWAAATAGMWNN | cTaCTAAAAATAcAAAA | AluSx (116:132)** | 3.00E-03 | 6 |
| | NNANANAATCMANANNT | ACAAAAAATaCAAAAAT | AluJo (118:134)** | 4.00E-04 | 2 |
| V$BRN2.03 | NNMTWNATTWNMWTN | ACAaAAATTAGCTgG | AluSc (125:139)** | 8.30E-03 | 1 |
| V$HNF1.01 | GGTTAATNWTTAMM | GGcTAATTTTTgtA | AluSx (-126:-139)** | 8.80E-03 | 6 |
| V$HNF1.03 | GGTTAATNWTTRNC | GGcTAATTTTTGTa | AluSx (-126:-139)** | 6.60E-03 | 6 |
| | NAMWAATTASS | AAAAAATTAGC | AluY (127:137)** | 4.00E-04 | 1 |
| | SNAAAGYGAAACY | CAAgAGCGAAACT | AluSp (264:276) | 3.60E-03 | 2 |
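The capitalisation convention used in the third column (upper case where the target site agrees with the IUPAC consensus, lower case where it does not) can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' code: the function name `mark_agreement` is hypothetical, and the mapping is the standard IUPAC nucleotide code.

```python
# Standard IUPAC nucleotide codes mapped to the bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def mark_agreement(consensus: str, site: str) -> str:
    """Return `site` with matching positions upper-cased and mismatches lower-cased.

    Hypothetical helper: it only illustrates the table's notation.
    """
    if len(consensus) != len(site):
        raise ValueError("consensus and site must have the same length")
    marked = []
    for code, base in zip(consensus.upper(), site.upper()):
        # A base agrees with the consensus if the IUPAC code at that position allows it.
        marked.append(base if base in IUPAC[code] else base.lower())
    return "".join(marked)


# Two rows taken from the table above.
print(mark_agreement("NNGGATTAANNN", "TGGGATTACAGG"))  # TGGGATTAcAGG
print(mark_agreement("RTTKCATCA", "GTTTCACCA"))        # GTTTCAcCA
```

Both calls reproduce the strings shown in the table for these two rows; other rows are not checked here.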
TFs for which binding to the Alu-associated BM has been experimentally verified. Most passed T1 and an FDR of 20%. The last two (marked by **) did not pass our promoter-region search but were reported in the literature as bound to Alu. Four TFs (bold) are new: we found them, but their binding to Alu had gone unnoticed. The first column is the PSSM symbol from MatInspector [30] or TRANSFAC [31]; the second and third columns are the gene symbol and a description of the biological process. The last two columns list those target genes of each TF that have on-Alu BSs, and the appropriate references.
| Eye, heart, tooth and abdominal organs | [40,42] | |||
| Heart formation | [42] | |||
| V$LYF1.01* | ZNFN1A1 (LYF1) | T-cell differentiation | [27] | |
| Hematopoietic differentiation | [41] | |||
| V$LXRE.01* | [56] | |||
| V$ER.01–02 | [18] | |||
| V$RAR.01* | Retinoic acid receptors family. Embryonic morphogenesis | [22] | ||
| V$GC.01* | Nuclear factor. Tumor suppressor | [21] | ||
| Lung development | [39] | |||
| V$PAX6.02** | Pancreas development | [45] | ||
| V$YY1.01** | Nuclear factor | [20] |
List of human target genes of (murine) Gfi-1 that contain Alu in their 5' upstream region. In these genes some of the Gfi-1 target sites are on Alu sequences; such sequences are denoted by '+' in the third column. The first column contains the target gene name; the second column is a potential binding sequence in the promoter region of the gene. The fourth column gives the result of an EMSA binding experiment for the sequence in column 2 [41].
| AZU | 1, 5'-CTCCCAAAGTG | + | + |
| 2, 5'-CGGCCTGGAG | - | + | |
| 3, 5'-ACTTCCTGCCTC | - | - | |
| 4, 5'-TTTCACACCTG | + | + | |
| 5, 5'-GGTGGACAACTG | - | - | |
| AAT | 1, 5'-GCACACGCCTG | + | + |
| 2, 5'-CCCAGGAGGTT | - | + | |
| 3, 5'-ATAGCTGACC | - | - | |
| ACT | 1, 5'-GCTCATGCCTG | + | + |
| 2, 5'-TTCTTGGCTGCCACTGATTCCCTGTGCCTT-3' | - | + | |
| 3, 5'-GCCAAAATA | - | + | |
| Jak3 | 1, 5'-CTCACGTGTG | + | + |
| P21 | 1, 5'-GCATGCCTG | + | + |
| 2, 5'-ATGTCTGGG | - | + |
*These sites were tested in transient transfection assays.
List of biological processes whose regulation involves transcription factors with putative target sites on Alu. Those transcription factors that were experimentally found to bind to Alu (see Table 2) are in bold. These reported binding events involved specific Alu sequences, and were not based on genome-wide screening.
| Nuclear factors and stress response | |
| Hematopoietic differentiation | |
| Heart development/Muscle development | |
| Brain and CNS development: | |
| Eye development | |
| Pancreas development | |
| Embryonic development | GSC.01 ( |
| Sterol biosynthesis | SREBP.03 ( |
| Interferon | IRF1.01 ( |
Figure 1. Nucleotide density of different repetitive elements in the 5 Kbp regions upstream of the TSS of 13686 genes. The average genomic Alu content is 10% and the nucleotide density associated with all the repetitive elements is about 45% [1].
List of biological processes and statistics of the corresponding Alu content in the 5 kbp upstream regions of the associated genes.
| central nervous system development | 86 | 167 | 1.94 | 1.73E-06 |
| skeletal development | 105 | 210 | 2.00 | 2.13E-07 |
| potassium ion transport | 128 | 270 | 2.11 | 9.41E-08 |
| organogenesis | 816 | 1986 | 2.43 | 4.63E-25 |
| cell-cell adhesion | 114 | 281 | 2.46 | 1.28E-04 |
| neurogenesis | 335 | 835 | 2.49 | 2.20E-10 |
| cell-cell signaling | 470 | 1190 | 2.53 | 3.18E-13 |
| morphogenesis | 1010 | 2606 | 2.58 | 9.42E-24 |
| cell adhesion | 471 | 1288 | 2.74 | 2.54E-09 |
| muscle development | 150 | 412 | 2.74 | 7.95E-04 |
| sensory perception | 234 | 646 | 2.76 | 3.76E-05 |
| ion transport | 573 | 1595 | 2.78 | 5.06E-10 |
| inflammatory response | 181 | 507 | 2.80 | 5.50E-04 |
| development | 1519 | 4274 | 2.81 | 3.92E-21 |
| immune response | 635 | 1853 | 2.92 | 4.21E-08 |
| response to external stimulus | 910 | 2727 | 3.00 | 7.05E-09 |
| cell communication | 2535 | 7885 | 3.11 | 3.66E-14 |
| signal transduction | 2017 | 6538 | 3.24 | 2.79E-07 |
| genome | 13686 | 49748 | 3.63 | 1 |
| metabolism | 5605 | 21936 | 3.91 | 6.85E-08 |
| cell cycle | 651 | 2751 | 4.23 | 5.56E-06 |
| biosynthesis | 899 | 3827 | 4.26 | 2.92E-08 |
| intracellular transport | 406 | 1840 | 4.53 | 3.86E-08 |
| protein localization | 418 | 1981 | 4.74 | 7.15E-12 |
| protein biosynthesis | 452 | 2155 | 4.77 | 3.24E-13 |
| mRNA processing | 205 | 1006 | 4.91 | 2.37E-08 |
| RNA processing | 322 | 1611 | 5.00 | 6.70E-14 |
| RNA splicing | 173 | 872 | 5.04 | 1.42E-08 |
| RNA metabolism | 391 | 1987 | 5.08 | 3.66E-18 |
| ribosome biogenesis | 43 | 223 | 5.19 | 1.62E-03 |
| translation | 138 | 743 | 5.39 | 2.62E-10 |
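The table above carries no column headers. A quick arithmetic check suggests that the fourth column is the third divided by the second, which would fit reading column 2 as the number of genes in the process and column 3 as the number of Alu elements in their upstream regions. That reading is an assumption, not something the source states. The sketch below tests it on a few rows copied from the table; the names `ROWS`, `alu_per_gene` and `check_rows` are hypothetical.

```python
# (process, column 2, column 3, column 4) copied from the table above.
# Assumed meaning: column 2 = number of genes, column 3 = number of Alu
# elements, column 4 = Alu elements per gene. The source gives no headers.
ROWS = [
    ("central nervous system development", 86, 167, 1.94),
    ("skeletal development", 105, 210, 2.00),
    ("genome", 13686, 49748, 3.63),
    ("metabolism", 5605, 21936, 3.91),
    ("translation", 138, 743, 5.39),
]


def alu_per_gene(n_genes: int, n_alu: int) -> float:
    """Ratio of column 3 to column 2."""
    return n_alu / n_genes


def check_rows(rows, tol=0.01):
    """Return the names of rows whose tabulated ratio differs by more than `tol`."""
    return [name for name, genes, alu, ratio in rows
            if abs(alu_per_gene(genes, alu) - ratio) > tol]


for name, genes, alu, ratio in ROWS:
    print(f"{name}: computed {alu_per_gene(genes, alu):.3f}, table {ratio:.2f}")
print("rows outside tolerance:", check_rows(ROWS))
```

The tolerance of 0.01 is deliberate: a few tabulated ratios (for example, the translation row) differ from the recomputed value in the last printed digit.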
Figure 2. Number of BSs of 19 TFs (selected from the list of 66; see text) in the 5 Kbp region upstream of 13686 genes. A DNA sequence was considered to be a BS if its score was higher than Tmax. The BSs were counted separately for on-Alu (blue) and off-Alu (red) appearances. For 16 out of 19 TFs the majority of the putative BSs reside on Alu. (A) Developmental TFs; (B) other TFs.
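The counting behind this figure (keep hits whose score exceeds a threshold, then tally them as on-Alu or off-Alu) can be sketched as follows. This is an illustrative sketch only. The function name, the coordinate convention (inclusive start and end) and the rule that a hit is on-Alu only if it lies entirely inside one Alu interval are all assumptions; the actual criterion is described in the paper's methods, which are not reproduced here.

```python
from bisect import bisect_right


def count_on_off_alu(hits, alu_intervals, threshold):
    """Count hits scoring above `threshold` as on-Alu or off-Alu.

    hits          -- iterable of (start, end, score), inclusive coordinates
    alu_intervals -- non-overlapping (start, end) Alu annotations
    Assumption: a hit is on-Alu only if it lies fully inside one interval.
    """
    intervals = sorted(alu_intervals)
    starts = [start for start, _ in intervals]
    on_alu = off_alu = 0
    for start, end, score in hits:
        if score <= threshold:
            continue  # not a putative BS
        # Index of the last Alu interval starting at or before the hit start.
        i = bisect_right(starts, start) - 1
        if i >= 0 and end <= intervals[i][1]:
            on_alu += 1
        else:
            off_alu += 1
    return on_alu, off_alu


# Toy data, not taken from the paper.
alu = [(100, 380), (900, 1180)]
hits = [(120, 131, 0.95), (370, 385, 0.97), (500, 510, 0.99),
        (950, 960, 0.80), (1000, 1012, 0.93)]
print(count_on_off_alu(hits, alu, threshold=0.9))  # (2, 2)
```

In the toy data the second hit straddles the Alu boundary and is counted as off-Alu under the containment rule; a looser overlap rule would count it as on-Alu.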
Number of binding-motif occurrences (for 25 TFs with Tmax > T1) in the 500 bp and 5000 bp upstream regions of 13686 genes. For each BM we show statistics with respect to sequences that passed Tmax and T5 (see text for further details). We also checked whether each BS resides on Alu or not (see methods), and give the number of BSs that reside on Alu and their percentage out of the total number of BSs.
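| BM name | 5000 bp upstream: No. of BS on Alu (Tmax) | 5000 bp upstream: % BS on Alu (Tmax) | 5000 bp upstream: No. of BS on Alu (T5) | 5000 bp upstream: % BS on Alu (T5) | 500 bp upstream: No. of BS on Alu (Tmax) | 500 bp upstream: % BS on Alu (Tmax) | 500 bp upstream: No. of BS on Alu (T5) | 500 bp upstream: % BS on Alu (T5) |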
| V$PITX2_Q2 | 34188 | 99.4% | 45546 | 94.9% | 777 | 99.0% | 1022 | 86.0% |
| V$LXRE.01 | 8193 | 99.2% | 21686 | 74.9% | 185 | 96.9% | 473 | 42.2% |
| V$GSC.01 | 33565 | 96.0% | 36033 | 90.7% | 761 | 89.2% | 812 | 75.5% |
| V$GFI1.01 | 32311 | 95.2% | 33557 | 88.3% | 713 | 84.0% | 742 | 68.5% |
| V$SREBP.03 | 7695 | 93.0% | 18264 | 65.9% | 181 | 44.7% | 399 | 18.8% |
| V$LYF1.01 | 26778 | 91.8% | 27476 | 84.0% | 616 | 73.5% | 624 | 56.1% |
| V$OTX2.01 | 35993 | 84.8% | 35993 | 84.8% | 813 | 64.0% | 813 | 64.0% |
| V$PAX4.01 | 9760 | 84.0% | 23265 | 74.7% | 207 | 63.9% | 508 | 53.4% |
| V$AARE.01 | 13854 | 77.3% | 17047 | 62.0% | 282 | 51.1% | 328 | 30.5% |
| V$MEF2.04 | 17253 | 76.7% | 19766 | 65.2% | 355 | 58.7% | 414 | 43.6% |
| V$CHREBP_MLX_01 | 1537 | 70.3% | 9810 | 68.6% | 49 | 23.6% | 261 | 20.5% |
| V$ER.02 | 3166 | 68.2% | 12256 | 50.7% | 60 | 27.6% | 260 | 15.9% |
| V$NKX25.01 | 14144 | 67.7% | 14754 | 48.9% | 238 | 32.9% | 249 | 18.7% |
| V$LUN1_01 | 15768 | 99.9% | 50303 | 99.6% | 340 | 100.0% | 1143 | 98.1% |
| V$HNF4.02 | 2224 | 61.5% | 11368 | 61.0% | 39 | 18.9% | 259 | 21.6% |
| V$MEF2.02 | 18807 | 56.9% | 18807 | 56.9% | 389 | 30.7% | 389 | 30.7% |
| V$RSRFC4.01 | 18153 | 49.5% | 18153 | 49.5% | 381 | 26.3% | 381 | 26.3% |
| V$RSRFC4.02 | 18424 | 49.0% | 18424 | 49.0% | 385 | 26.7% | 385 | 26.7% |
| V$IRF1.01 | 2116 | 48.8% | 5343 | 19.6% | 57 | 20.0% | 144 | 7.1% |
| V$MEF2.03 | 18027 | 48.2% | 18027 | 48.2% | 383 | 26.3% | 383 | 26.3% |
| V$HNF1.03 | 8819 | 26.9% | 8819 | 26.9% | 216 | 12.9% | 216 | 12.9% |
| V$OC2.01 | 4430 | 25.5% | 8643 | 27.1% | 98 | 15.2% | 204 | 16.6% |
| V$RORA1.01 | 968 | 12.8% | 6891 | 26.7% | 25 | 5.8% | 152 | 10.6% |
| V$HNF1.01 | 2939 | 10.3% | 2939 | 10.3% | 74 | 4.5% | 74 | 4.5% |
| V$BRN2.03 | 2127 | 6.8% | 2127 | 6.8% | 60 | 4.0% | 60 | 4.0% |
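Because the table reports only the on-Alu count and its percentage of the total, the total and off-Alu counts have to be back-calculated. The sketch below does this for two rows of the 5000 bp, Tmax columns. The helper name `implied_total` is hypothetical, and since the percentages are rounded to one decimal, the results are approximate rather than the authors' exact counts.

```python
def implied_total(on_alu: int, pct_on_alu: float) -> int:
    """Approximate total number of BSs implied by an on-Alu count and its percentage."""
    return round(on_alu / (pct_on_alu / 100.0))


# (BM name, No. of BS on Alu, % BS on Alu) from the 5000 bp / Tmax columns above.
examples = [("V$PITX2_Q2", 34188, 99.4), ("V$BRN2.03", 2127, 6.8)]
for name, on_alu, pct in examples:
    total = implied_total(on_alu, pct)
    print(f"{name}: about {total} BSs in total, about {total - on_alu} off Alu")
```

For V$PITX2_Q2 this gives roughly 34394 sites in total with about 206 off Alu, whereas for V$BRN2.03 it gives roughly 31279 in total, nearly all of them off Alu.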
Figure 3. Number of BSs of 19 TFs (see text) in the 500 bp region upstream of 13686 genes. A DNA sequence was considered to be a BS if its score was higher than Tmax. The BSs were counted separately for on-Alu (blue) and off-Alu (red) appearances. For 7 out of 19 TFs the majority of the putative BSs reside on Alu. (A) Developmental TFs; (B) other TFs.
Figure 4. For 6 TFs, the number of putative BSs in the 5 Kbp upstream region, averaged over the genes that belong to various biological processes. The 6 TFs shown were chosen because their BSs are non-overlapping and their Tmax is greater than T5 (see text). (a) On-Alu BSs; (b) all BSs (on and off Alu).
BMs with target sites on B1_Mm and B1_Mus ('Alu-like' sequences in mouse). The scores of the target sites were above T5 (those for which the corresponding Alu BS had scores greater than T5 are in bold). The IUPAC convention [33] consensus sites are in column 2, and the third column shows the target site with the highest score among all the major consensus sequences (nucleotides that agree with the consensus are denoted by capital letters). The location of the putative target site is in column 4, given for the subfamily on which the target site of column 3 appears. Column 6 lists the number, n, of B1 subfamilies on which the BS shown in column 3 is located.
| BM name | Consensus | Binding site | location | p-val | n |
| | SNGCCACNNNNNN | CCaCCACGCCCGG | B1_Mm (-3:-15) | 2.8E-02 | 1 |
| V$ZBRK1_01 | SGGGSMRCAGNYMTTTKTKKSC | GGtGGCGCAcGCCTTTaaTcCC | B1_Mm (11:32) | 2.9E-02 | 2 |
| | NNYYWKTAATYWNWY | CGCCTTTAATCcCAg | B1_Mm (20:34) | 3.9E-02 | 2 |
| V$GKLF_02 | NTYAMAGGRN | ATTAAAGGcG | B1_Mm (-20:-29) | 1.1E-02 | 2 |
| V$CRX.01 | NNNGATTARNNT | TGGGATTAAAGg | B1_Mm (-22:-33) | 7.9E-03 | 2 |
| | NNGGATTAANNN | TGGGATTAAAGG | B1_Mm (-22:-33) | 3.0E-04 | 2 |
| V$TTF1.01 | NNWSTCAAGYRYWN | CCTCcCAAGTGCTG | B1_Mus1 (-33:-46) | 4.6E-03 | 1 |
| V$LEF1.02 | NNNWTCAAAGN | GTCTaCAAAGT | B1_Mm (83:93) | 1.4E-02 | 1 |
| V$PAX6.02 | NNAGKKCCAGGNNMG | TGAGTTCCAGGACAG | B1_Mm (93:107) | 1.0E-05 | 2 |
| | NNTCAAGGTCASNN | GTTCcAGGaCAGCC | B1_Mm (96:109) | 3.9E-02 | 2 |
| V$GRE.01 | NNGGTWCNNNNTGTTCTNR | GTaGccCTGGCTGTcCTGG | B1_Mus1 (-100:-118) | 1.9E-02 | 1 |
| V$CP2.02 | NWCYGSNNMWNNCTNGNY | CTCTGtATAgCCCTGGCT | B1_Mm (-106:-123) | 1.7E-02 | 1 |
| V$MTATA.01 | NNNTWTAAANCNNNNNN | CTCTgTAtAGCCCTGGC | B1_Mm (-107:-123) | 4.6E-02 | 1 |
| | NNTGTTACTAAAAATAGAAMNN | AGgGTTtCTctgtATAGccCTG | B1_Mm (-109:-130) | 3.4E-02 | 1 |
| | ANKCTAWAAATAGMWNN | tTTCTcTgtATAGCcCT | B1_Mm (-110:-126) | 4.9E-02 | 1 |
| | NNKWKCTAWAAATAGMNN | GGTTTCTcTgtATAGCCC | B1_Mm (-111:-128) | 3.9E-02 | 1 |
| | NWKCTAWAAATAGMNN | TTTCTcTgtATAGCCC | B1_Mm (-111:-126) | 4.9E-02 | 1 |
| | NNNNNTAAAAATANCNNN | CTGTCTcgAAAaAACAAA | B1_Mm (129:146) | 1.1E-02 | 2 |
| V$FAST1.01 | NNTTGTKKATTGGS | TTTTGTTTtTTcGa | B1_Mm (-134:-147) | 3.9E-02 | 2 |
| V$HFH8.01 | YRNATAAACANN | CGAAaAAACAAA | B1_Mm (135:146) | 1.1E-02 | 2 |
| | NNWATAAACANNN | CGAAaAAACAAAA | B1_Mm (135:147) | 4.5E-02 | 2 |
| V$HFH1.01 | AWATAAACAWTN | gAAaAAACAAaA | B1_Mm (136:147) | 2.7E-02 | 2 |
| | RAANAAAYAWTN | GAAAAAACAAaA | B1_Mm (136:147) | 2.0E-03 | 2 |
Figure 5. Scatter plot of the nucleotide densities, in the 5 Kbp upstream of the TSS, that belong to B1 (in mouse genes) and to Alu (in the orthologous human genes). For both species, the value represented by each point is obtained by averaging over the 200 genes that belong to a "gene set". There are 27 gene sets, obtained by sorting the 5400 human promoters (that have mouse orthologues) according to their Alu content and dividing them into 27 bins of 200 genes each. Each mouse gene set contains the orthologous genes of the corresponding human gene set.
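The binning described in this caption (sort by human Alu content, split into consecutive bins of 200, average both densities per bin) can be sketched as below. This is a minimal sketch on synthetic values, not the paper's data or code. The function name `bin_averages` and the input layout (one `(human_alu_density, mouse_b1_density)` pair per orthologous gene pair) are assumptions made for illustration.

```python
def bin_averages(pairs, bin_size=200):
    """Sort gene pairs by human Alu density and average each consecutive bin.

    pairs -- list of (human_alu_density, mouse_b1_density), one per
             human gene and its mouse orthologue (hypothetical layout).
    Returns one (mean_alu, mean_b1) point per complete bin; an incomplete
    trailing bin is dropped.
    """
    ordered = sorted(pairs, key=lambda pair: pair[0])
    points = []
    for i in range(0, len(ordered) - bin_size + 1, bin_size):
        chunk = ordered[i:i + bin_size]
        mean_alu = sum(alu for alu, _ in chunk) / bin_size
        mean_b1 = sum(b1 for _, b1 in chunk) / bin_size
        points.append((mean_alu, mean_b1))
    return points


# Synthetic densities for 5400 gene pairs (not real measurements).
pairs = [(i / 5400, 0.5 * i / 5400) for i in range(5400)]
points = bin_averages(pairs)
print(len(points))  # 27
```

With 5400 gene pairs and bins of 200, the function returns 27 points, matching the 27 gene sets in the caption.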
Nucleotide distribution in the consensus sequences of the major Alu subfamilies.
| ALU name | ALU length | C | G | T | G + C |
| AluJo | 283 | 28.62% | 33.92% | 16.25% | 62.54% |
| AluJb | 283 | 28.98% | 34.63% | 15.55% | 63.60% |
| AluSc | 280 | 28.57% | 33.57% | 15.71% | 62.14% |
| AluSg | 281 | 28.83% | 34.16% | 15.30% | 62.99% |
| AluSp | 284 | 28.17% | 33.10% | 15.85% | 61.27% |
| AluSq | 284 | 28.17% | 33.45% | 15.85% | 61.62% |
| AluSx | 283 | 28.98% | 33.92% | 15.55% | 62.90% |
| AluY | 282 | 28.37% | 34.75% | 14.89% | 63.12% |
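The percentages in this table are plain base frequencies of each consensus sequence. The table lists no A column, but the A fraction follows as 100% minus G + C minus T (about 21.21% for AluJo). The sketch below shows the computation on a short toy sequence; the function name `base_composition` is hypothetical, and the toy string is not an Alu consensus.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Percentage of each base in `seq`, plus the combined G + C percentage."""
    seq = seq.upper()
    counts = Counter(seq)
    n = len(seq)
    pct = {base: 100.0 * counts.get(base, 0) / n for base in "ACGT"}
    pct["G+C"] = pct["G"] + pct["C"]
    return pct


# Toy sequence for illustration only.
toy = "GGCCGATTAC"
for key, value in base_composition(toy).items():
    print(f"{key}: {value:.2f}%")
```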
Number of dinucleotide occurrences in the consensus sequences of the major Alu subfamilies.
| ALU name | ALU length | AA | AC | AG | AT | CA | CC | CG | CT | GA | GC | GG | GT | TA | TC | TG | TT |
| AluJo | 283 | 11 | 12 | 28 | 8 | 18 | 23 | 21 | 19 | 23 | 33 | 29 | 11 | 8 | 13 | 17 | 8 |
| AluJb | 283 | 13 | 13 | 25 | 7 | 18 | 24 | 22 | 18 | 22 | 31 | 32 | 13 | 6 | 14 | 18 | 6 |
| AluSc | 280 | 15 | 14 | 23 | 9 | 17 | 21 | 24 | 18 | 23 | 29 | 30 | 12 | 7 | 16 | 16 | 5 |
| AluSg | 281 | 15 | 14 | 23 | 8 | 17 | 22 | 25 | 17 | 22 | 30 | 32 | 12 | 7 | 15 | 15 | 6 |
| AluSp | 284 | 19 | 14 | 21 | 10 | 17 | 23 | 23 | 17 | 22 | 28 | 33 | 11 | 7 | 15 | 16 | 7 |
| AluSq | 284 | 19 | 14 | 22 | 8 | 18 | 23 | 21 | 18 | 20 | 28 | 34 | 13 | 7 | 15 | 17 | 6 |
| AluSx | 283 | 15 | 14 | 23 | 8 | 17 | 23 | 24 | 18 | 22 | 30 | 32 | 12 | 7 | 15 | 16 | 6 |
| AluY | 282 | 15 | 15 | 23 | 8 | 16 | 21 | 25 | 18 | 23 | 29 | 34 | 12 | 8 | 15 | 15 | 4 |
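Counts of this kind come from sliding a two-base window along the sequence, so the 16 counts for a sequence of length L sum to L - 1. The sketch below implements the count on a toy sequence and checks that identity against the AluJo row above. The function name `dinucleotide_counts` is hypothetical, and the toy string is not an Alu consensus.

```python
from itertools import product


def dinucleotide_counts(seq: str) -> dict:
    """Count overlapping dinucleotides (AA, AC, ..., TT) in `seq`."""
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product("ACGT", repeat=2)}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip windows containing non-ACGT symbols
            counts[pair] += 1
    return counts


# Toy sequence for illustration only.
toy = "GGCCGATTAC"
counts = dinucleotide_counts(toy)
print(sum(counts.values()) == len(toy) - 1)  # True

# AluJo row from the table above: 16 counts, consensus length 283.
alujo_row = [11, 12, 28, 8, 18, 23, 21, 19, 23, 33, 29, 11, 8, 13, 17, 8]
print(sum(alujo_row) == 283 - 1)  # True
```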