Yoshiharu Y Yamamoto, Hiroyuki Ichida, Minami Matsui, Junichi Obokata, Tetsuya Sakurai, Masakazu Satou, Motoaki Seki, Kazuo Shinozaki, Tomoko Abe.
Abstract
BACKGROUND: Plant promoter architecture is important for understanding the regulation and evolution of promoters, but our current knowledge of plant promoter structure, especially with respect to the core promoter, is insufficient. Several promoter elements, including the TATA box and several types of transcriptional regulatory elements, have been found to show a local distribution within promoters, and this feature has been successfully used to extract promoter constituents from the human genome.
Year: 2007 PMID: 17346352 PMCID: PMC1832190 DOI: 10.1186/1471-2164-8-67
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Examples of distribution peaks. Several examples of hexamer analysis of Arabidopsis promoters are shown. The vertical axis indicates the total count across the whole promoter database. Gray and solid lines show the raw counts and their moving average over a 15-bin window, respectively. As negative controls, a set of 3,000 random 1-kb fragments from the Arabidopsis genome was used for the occurrence analysis instead of the promoter database (shown as "random genome" in the bottom panels).
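The smoothing described above can be sketched as a centered moving average. This is a minimal sketch, not the authors' code: it assumes one bin per base position, contiguous positions, and that only positions with a complete 15-bin window are reported; the function name `centered_moving_average` is hypothetical. With hexamer start positions from -1,000 to -6 (hexamers lying entirely within -1,000 to -1), the most downstream smoothed position it yields is -13, consistent with footnote 1 of the Y Patch and TATA Box table.

```python
from typing import Dict


def centered_moving_average(counts: Dict[int, float], window: int = 15) -> Dict[int, float]:
    """Smooth a positional occurrence profile with a centered moving average.

    Assumptions (not taken from the paper): one bin per base position,
    contiguous positions, and no padding at the ends, so only centers whose
    full window lies inside the profile are returned.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd for a centered average")
    half = window // 2
    lo, hi = min(counts), max(counts)
    smoothed: Dict[int, float] = {}
    for center in range(lo + half, hi - half + 1):
        total = sum(counts.get(p, 0.0) for p in range(center - half, center + half + 1))
        smoothed[center] = total / window
    return smoothed


# Toy profile: one count at every hexamer start position from -1000 to -6.
raw = {p: 1.0 for p in range(-1000, -5)}
smoothed = centered_moving_average(raw)
print(min(smoothed), max(smoothed))  # -993 -13
```

Dropping incomplete windows at the edges, rather than padding, is one reasonable choice; it is what makes -13 the last reported position.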
Figure 2. Parameters for peak detection. (A) Distribution profile of CACGTG in Arabidopsis promoters; the moving average with a 15-bin window is shown. The dotted line indicates the Base Line, which is the average from -1,000 to -500. The light grey area shows the Peak Area. The dark grey area is Δarea, a measure of the fluctuation around the Base Line from -1,000 to -500. In addition, the following parameters have been defined: Relative Peak Area (RPA) = Peak Area/total area; Relative Peak Height (RPH) = peak height/Base Line; Peak Area/basal fluctuation = Peak Area/Δarea per peak width; Peak height/SD = peak height/standard deviation of occurrence from -1,000 to -500. Several parameters of this graph are shown in Table 2 (CACGTG). (B) All hexamers were analyzed to obtain these parameters, and Peak Area/basal fluctuation and peak position were calculated for each. Each dot shows the data for an individual hexamer. Among the 4,096 hexamers (grey dots), 247 peak-positive hexamers were selected (solid dots). The graph shows that hexamers with a significant value have a peak position from -200 to -13 (the most downstream position after smoothing).
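The parameter definitions above can be sketched as follows. This is a hedged illustration under several assumptions that the caption does not settle: the peak boundaries are supplied by the caller (peak detection itself is not shown), peak height and Peak Area are measured above the Base Line, total area is the sum of the whole profile, and Δarea is the summed absolute deviation from the Base Line over -1,000 to -500. The names `peak_parameters` and `_ratio` are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Dict, Tuple


def _ratio(num: float, den: float) -> float:
    """Divide, returning infinity when the denominator is zero."""
    return num / den if den else math.inf


def peak_parameters(profile: Dict[int, float], peak: Tuple[int, int]) -> Dict[str, float]:
    """Compute Figure 2-style parameters for one smoothed profile.

    Assumptions (hypothetical): `peak` gives the first and last positions of an
    already-detected peak; peak height and Peak Area are measured above the
    Base Line; total area is the sum of the whole profile; delta_area is the
    summed absolute deviation from the Base Line in the basal region.
    """
    basal = [profile[p] for p in range(-1000, -499) if p in profile]  # -1000..-500
    base_line = mean(basal)
    sd = pstdev(basal)
    delta_area = sum(abs(v - base_line) for v in basal)

    start, end = peak
    peak_vals = [profile[p] for p in range(start, end + 1) if p in profile]
    peak_width = len(peak_vals)
    peak_height = max(peak_vals) - base_line
    peak_area = sum(max(v - base_line, 0.0) for v in peak_vals)
    total_area = sum(profile.values())

    # Basal fluctuation rescaled to the width of the peak (assumed reading).
    delta_per_peak_width = delta_area / len(basal) * peak_width
    return {
        "base_line": base_line,
        "RPH": _ratio(peak_height, base_line),
        "RPA": _ratio(peak_area, total_area),
        "peak_area_over_basal_fluctuation": _ratio(peak_area, delta_per_peak_width),
        "peak_height_over_sd": _ratio(peak_height, sd),
    }


# Toy profile (invented): flat at 10 with a 21-position block at 30 from -90 to -70.
toy = {p: 10.0 for p in range(-1000, -12)}
toy.update({p: 30.0 for p in range(-90, -69)})
params = peak_parameters(toy, (-90, -70))
print(params["RPH"], round(params["RPA"], 4))  # 2.0 0.0408
```

Because the toy basal region is perfectly flat, the SD- and Δarea-based ratios come out as infinity; real profiles have nonzero basal noise.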
Y Patch and TATA Box identified from Arabidopsis hexamer analysis
| Sequence | Peak position¹ (bp) | Peak width² (bp) | #promoter³ | Relative Peak Height (RPH) | Relative Peak Area (RPA) |
| Y Patch | | | | | |
| TCTCTC | -13 | 158 | 6,741 | 10.96 | 0.25 |
| CCTCTC | -13 | 107 | 3,106 | 8.13 | 0.20 |
| CTTCTC | -13 | 88 | 5,916 | 7.64 | 0.15 |
| CTCCTC | -13 | 81 | 3,180 | 7.23 | 0.12 |
| CTCTTC | -13 | 91 | 5,393 | 7.02 | 0.14 |
| CTCTCC | -13 | 108 | 3,153 | 6.95 | 0.16 |
| TCCCTC | -13 | 93 | 2,140 | 6.13 | 0.15 |
| TTCTTC | -13 | 75 | 8,829 | 5.78 | 0.11 |
| TTCTCT | -13 | 109 | 8,314 | 5.77 | 0.12 |
| TATA Box | | | | | |
| TATAAA | -35 | 30 | 10,704 | 9.0 | 0.10 |
| TATATA | -36 | 27 | 10,315 | 6.38 | 0.07 |
| ATATAA | -35 | 27 | 10,062 | 6.14 | 0.07 |
| ATAAAT | -35 | 27 | 10,572 | 5.14 | 0.05 |
| TAAATA | -34 | 25 | 9,801 | 4.65 | 0.04 |
| ATATAT | -35 | 24 | 10,412 | 3.84 | 0.04 |
| TTATAA | -36 | 23 | 9,172 | 3.36 | 0.03 |
| TTATAT | -36 | 23 | 9,639 | 3.10 | 0.03 |
¹ In this analysis, -13 is the position assigned to the average over hexamer start positions -20 to -6, which together cover the region from -20 to -1; -13 is therefore the most downstream position.
² Peak width at the base of the peak.
³ Number of promoters containing the element out of 15,607 Arabidopsis promoters (-1,000 to -1). The number of promoters containing an element within the peak area can be roughly estimated as #promoter × RPA. For example, TATAAA is found in approx. 1,070 promoters within the peak area (10,704 × 0.10).
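The rough estimate in footnote 3 is a single multiplication. The sketch below applies it to a few rows of the table above; rounding to the nearest integer is an assumption, and `promoters_in_peak` is a hypothetical helper name.

```python
def promoters_in_peak(n_promoters: int, rpa: float) -> int:
    """Rough number of promoters carrying the element inside the peak area (#promoter x RPA)."""
    return round(n_promoters * rpa)


# (#promoter, RPA) pairs copied from the Y Patch and TATA Box table.
rows = {"TATAAA": (10_704, 0.10), "TCTCTC": (6_741, 0.25), "TTCTTC": (8_829, 0.11)}
for seq, (n, rpa) in rows.items():
    print(seq, promoters_in_peak(n, rpa))  # TATAAA gives 1070, as in the footnote
```

The estimate treats RPA as the fraction of all occurrences that fall in the peak and assumes roughly one occurrence per promoter, which is why it is only approximate.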
REGs identified from Arabidopsis hexamer analysis
| Sequence | Peak position (bp) | Peak width (bp) | #promoter | Relative Peak Height (RPH) | Relative Peak Area (RPA) |
| AGGCCC | -76 | 326 | 2,005 | 14.78 | 0.54 |
| GGCCCA | -73 | 347 | 1,225 | 12.26 | 0.53 |
| GGGCCT | -106 | 240 | 1,764 | 10.31 | 0.47 |
| TGGGCC | -107 | 262 | 2,867 | 9.29 | 0.46 |
| GGGCCC | -91 | 256 | 711 | 9.51 | 0.44 |
| GCCCAT | -76 | 320 | 2,925 | 8.41 | 0.43 |
| GCCCAA | -72 | 366 | 3,068 | 7.78 | 0.42 |
| AGCCCA | -85 | 284 | 2,963 | 7.53 | 0.39 |
| CACGTG | -80 | 273 | 3,039 | 6.85 | 0.38 |
| AAGCCC | -86 | 299 | 2,593 | 7.48 | 0.37 |
| CGGCCC | -62 | 189 | 732 | 7.66 | 0.36 |
| CCACGT | -83 | 260 | 2,367 | 5.66 | 0.35 |
| ATGGGC | -97 | 295 | 2,836 | 6.29 | 0.35 |
| CGTGGC | -97 | 251 | 1,459 | 5.96 | 0.35 |
| TAGGCC | -75 | 311 | 1,435 | 6.18 | 0.34 |
| CGTGTC | -79 | 289 | 1,909 | 5.57 | 0.33 |
| AAGGCC | -77 | 287 | 1,935 | 6.27 | 0.33 |
| GCGCGT | -59 | 244 | 632 | 5.56 | 0.32 |
| GCCACG | -83 | 215 | 1,411 | 6.64 | 0.31 |
| ACGCGC | -65 | 190 | 655 | 5.08 | 0.31 |
| GGGCCG | -85 | 196 | 711 | 6.01 | 0.30 |
| CACGCG | -138 | 182 | 884 | 5.22 | 0.30 |
Figure 3. Directional preference of LDSS-positive hexamers. When the reverse-complement sequence of a hexamer was not found in the LDSS-positive group, the hexamer was counted as "uniq", meaning orientation-sensitive. When it was found, the hexamer was counted as "comp", meaning orientation-insensitive. Both classes of hexamers were counted according to their peak position relative to the TSS and summarized in a bar graph. The inset is an enlargement showing more detail around the TSS.
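The "uniq"/"comp" labelling above can be sketched with a reverse-complement lookup. This is a minimal sketch assuming uppercase A/C/G/T sequences; the hexamers are a small subset taken from this section's tables, used for illustration only, and the function names are hypothetical.

```python
from typing import Dict, Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orientation_classes(positive: Iterable[str]) -> Dict[str, str]:
    """Label each LDSS-positive k-mer "comp" if its reverse complement is also positive, else "uniq"."""
    pos = set(positive)
    return {s: ("comp" if reverse_complement(s) in pos else "uniq") for s in sorted(pos)}


# Hexamers copied from the Y Patch/TATA box and REG tables (subset, illustration only).
hexamers = ["AGGCCC", "GGGCCT", "TGGGCC", "GGCCCA", "CACGTG", "TATAAA", "TCTCTC"]
for seq, label in orientation_classes(hexamers).items():
    print(seq, label)
```

A palindrome such as CACGTG is its own reverse complement, so it is always labelled "comp". In this subset the REG hexamers pair up (AGGCCC/GGGCCT, TGGGCC/GGCCCA), while TATAAA and TCTCTC do not.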
Figure 4. Comparison of Arabidopsis and rice. (A) 987 octamers that are LDSS-positive in either Arabidopsis or rice promoters were selected, and their Relative Peak Heights (RPH) were compared in a scatter plot. Each dot is data from an individual octamer sequence. (B) LDSS-positive octamer sequences of Arabidopsis and rice were compared, and common sequences found in both sets were identified. The figure shows the number of octamer sequences. Classification into the Y and TATA groups was based on distribution profiles, as shown in Figure 5. The REG group has a peak position between -51 and -200.
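Identifying shared octamers and applying the REG position window can be sketched with set intersection. Two assumptions are made here and are not stated in the caption: the window bounds are inclusive, and a shared octamer counts as a REG only if its peak lies in the window in both species. The peak positions in the example are invented toy values, not data from the study.

```python
from typing import Dict, Set


def is_reg_position(peak_position: int) -> bool:
    """REG window from the caption: peak between -200 and -51 (inclusive, assumed)."""
    return -200 <= peak_position <= -51


def shared_reg_octamers(at_peaks: Dict[str, int], rice_peaks: Dict[str, int]) -> Set[str]:
    """Octamers LDSS-positive in both species whose peaks fall in the REG window in both."""
    common = at_peaks.keys() & rice_peaks.keys()
    return {s for s in common if is_reg_position(at_peaks[s]) and is_reg_position(rice_peaks[s])}


# Toy peak positions (invented, for illustration only).
at = {"AGGCCCAA": -80, "CACGTGGC": -90, "TCTCTCTC": -13}
rice = {"AGGCCCAA": -75, "TCTCTCTC": -15, "CCACGTCA": -100}
print(sorted(shared_reg_octamers(at, rice)))
```

TCTCTCTC is shared but peaks near the TSS, so it falls outside the REG window and is excluded.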
Figure 5. Clustering of LDSS-positive sequences based on distribution profiles. The distribution profile of each LDSS-positive Arabidopsis octamer was subjected to hierarchical clustering. Three major clusters are shown.
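The caption does not state which distance measure or linkage method was used. As an assumption, the sketch below uses average linkage on 1 minus the Pearson correlation, a common choice for comparing profile shapes, implemented in plain Python. The six toy profiles are invented shapes that loosely mimic a near-TSS rise, a sharp TATA-like peak, and a broader upstream peak; they are not data from the study.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def average_linkage(profiles: Dict[str, Sequence[float]], n_clusters: int) -> List[List[str]]:
    """Agglomerative clustering with average linkage on 1 - Pearson r."""
    names = list(profiles)
    dist: Dict[tuple, float] = {}
    for a, b in combinations(names, 2):
        d = 1.0 - pearson(profiles[a], profiles[b])
        dist[(a, b)] = dist[(b, a)] = d

    clusters = [[n] for n in names]
    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest mean pairwise distance.
        _, i, j = min(
            (mean(dist[(a, b)] for a in clusters[i] for b in clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Invented toy profiles over 10 bins (upstream to downstream).
toy_profiles = {
    "y1": [0, 0, 0, 0, 0, 0, 1, 2, 4, 8], "y2": [0, 0, 0, 0, 0, 0, 1, 2, 5, 9],
    "t1": [0, 0, 0, 0, 0, 1, 8, 1, 0, 0], "t2": [0, 0, 0, 0, 0, 2, 9, 1, 0, 0],
    "r1": [0, 1, 3, 6, 3, 1, 0, 0, 0, 0], "r2": [0, 1, 4, 7, 3, 1, 0, 0, 0, 0],
}
result = average_linkage(toy_profiles, 3)
print(result)  # [['y1', 'y2'], ['t1', 't2'], ['r1', 'r2']]
```

A correlation-based distance groups profiles by shape rather than by absolute count, which matters when octamer frequencies differ widely. The quadratic-per-step search is fine for a sketch but would be replaced by a library routine for thousands of octamers.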
Figure 6. REG-promoter clustering. For each Arabidopsis promoter, the number of occurrences of each octamer REG within the region from -400 to -40 bp was scored and subjected to 2D hierarchical clustering. The vertical axis shows promoters and the horizontal axis shows REGs; each matrix cell gives the number of REG sequences. Two small promoter clusters are shown together with the full set of REGs. (A) Part of a promoter cluster rich in the GCCCA motif, associated with meristematic expression. Ribosomal proteins are shown in blue. (B) Part of a promoter cluster rich in the ACGT motif, associated with environmental responses. Promoter names are colored according to expression data from AtGenExpress. Red: abiotic stress-positive; orange: abiotic stress-negative; green: light-positive; black: no response to abiotic stress or light; grey: no expression data found. (C) An example of clustered REGs. Part of the ACGT cluster shown at the top of Panel A is enlarged. ACGT in the octamers is highlighted in orange.
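The promoter-by-REG count matrix that feeds the 2D clustering can be sketched as below. The following assumptions are not stated in the caption: a promoter string ends at position -1, an octamer must lie entirely within -400 to -40 to be counted, only the given strand is scanned, and overlapping matches are allowed. The helper `place` and the toy promoter are hypothetical.

```python
from typing import Dict, List


def count_in_window(promoter: str, motif: str, start: int = -400, end: int = -40) -> int:
    """Count occurrences of `motif` lying entirely within [start, end].

    Assumes `promoter` ends at position -1 (its first base is at -len(promoter)),
    counts only the given strand, and allows overlapping matches.
    """
    n, k = len(promoter), len(motif)
    count = 0
    for i in range(n - k + 1):
        first = i - n  # position of the motif's first base
        last = first + k - 1
        if start <= first and last <= end and promoter[i:i + k] == motif:
            count += 1
    return count


def reg_count_matrix(promoters: Dict[str, str], regs: List[str]) -> Dict[str, List[int]]:
    """Rows are promoters, columns are REGs, cells are counts in -400..-40."""
    return {pid: [count_in_window(seq, reg) for reg in regs] for pid, seq in promoters.items()}


def place(seq: str, motif: str, position: int) -> str:
    """Hypothetical helper: overwrite `seq` so that `motif` starts at `position`."""
    i = len(seq) + position
    return seq[:i] + motif + seq[i + len(motif):]


toy = "A" * 1000                      # positions -1000..-1
toy = place(toy, "CACGTGGC", -100)    # inside the window
toy = place(toy, "CACGTGGC", -20)     # downstream of -40, not counted
toy = place(toy, "AGGCCCAA", -405)    # starts upstream of -400, not counted
print(reg_count_matrix({"toy": toy}, ["CACGTGGC", "AGGCCCAA"]))  # {'toy': [1, 0]}
```

Since REGs are described as orientation-insensitive, a fuller version might also scan the reverse complement; that is left out here to keep the counting rule explicit.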
Classification of octamer REGs
| Group | Motif¹ | Motif name | Comment | Trans factor | Expression | Reference | At¹ | Rice¹ | At & Rice² |
| 1 | GCCCA | Element II of | | PCF1, PCF2, TCP20 | cell cycle/meristematic expression | [35, 60] | 36 | 68 | 71 |
| 2 | ACGT | "ACGT Core", G-box, ABRE, | | bZIP family (GBF, TGA1, etc.), PIF3 | environmental response (light, UV, drought, ABA) | [36, 61] | 33 | 4 | 9 |
| 3 | ACGCGC | CGCG box | | AtSR1(CaMBP) | stress response? | [62] | 7 | 1 | 0 |
| 4 | CCGAC | DRE | DRE core | DREB/CBF | stress response | [39] | 9 | 3 | 0 |
| 5 | AACCG(G/A) | novel | overlapping with GT1 box (TTAACC) | ? | not known | this study | 36 | 1 | 0 |
| 6 | AAACG(C/G) | novel | | ? | not known | this study | 13 | 1 | 2 |
| 7 | ACCCCT | novel | | ? | not known | this study | 4 | 0 | 0 |
| 8 | ACCCT | novel | | ? | not known | this study | 4 | 0 | 0 |
| 9 | ACGGGC | novel | | ? | not known | this study | 2 | 5 | 1 |
| 10 | CCATGG | novel | | ? | not known | this study | 1 | 1 | 2 |
| 11 | CCAACGG | novel | | ? | not known | this study | 1 | 4 | 6 |
| 12 | GGGACCC | novel | | ? | not known | this study | 4 | 3 | 4 |
| Rest | | | | | | | 74 | 66 | 1 |
| Total | | | | | | | 308 | 242 | 90 |
¹ Number of octamer sequences. This classification is not completely mutually exclusive.
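Because the classification is not mutually exclusive, one octamer can carry motifs of several groups. The sketch below checks motif presence only; it assumes single-strand substring matching and converts the table's "(X/Y)" alternatives into regex character classes. It does not reproduce the clustering-aided assignment described for Figure 7, and the octamer GCCCACGT is a hypothetical example chosen to contain two motifs.

```python
import re
from typing import Dict, List

# Motifs of groups 1-12 as listed in the octamer REG classification table.
GROUP_MOTIFS: Dict[int, str] = {
    1: "GCCCA", 2: "ACGT", 3: "ACGCGC", 4: "CCGAC", 5: "AACCG(G/A)",
    6: "AAACG(C/G)", 7: "ACCCCT", 8: "ACCCT", 9: "ACGGGC", 10: "CCATGG",
    11: "CCAACGG", 12: "GGGACCC",
}


def motif_to_regex(motif: str) -> str:
    """Turn the table's '(X/Y)' alternatives into regex character classes."""
    return re.sub(
        r"\(([ACGT](?:/[ACGT])+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        motif,
    )


def assign_groups(octamer: str) -> List[int]:
    """Return every group whose motif occurs in the octamer (groups may overlap)."""
    return [g for g, motif in GROUP_MOTIFS.items() if re.search(motif_to_regex(motif), octamer)]


for octamer in ["AGGCCCAA", "CACGTGGC", "GCCCACGT"]:
    print(octamer, assign_groups(octamer))
```

The last example is assigned to both group 1 (GCCCA) and group 2 (ACGT), which is the kind of overlap the footnote warns about.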
Figure 7. Clustering of REGs. Aided by REG-promoter clustering, Arabidopsis REGs were classified. Colored dots indicate the presence of the corresponding motif in the REG sequence. The tree is the same as the one in Figure 6A.
Several REG groups were identified from Arabidopsis and rice octamer analysis
| Arabidopsis | Rice |
| *GGCCCA* | |
| AGGCCCAA# | AGGCCCAA# |
| AGGCCCAC# | AGGCCCAC# |
| AGGCCCAG# | AGGCCCAG# |
| AGGCCCAT# | AGGCCCAT# |
| CGGCCCAA# | CGGCCCAA# |
| CGGCCCAT# | CGGCCCAC |
| GGGCCCAA# | CGGCCCAG |
| GGGCCCAG# | CGGCCCAT# |
| GGGCCCAT# | GGGCCCAA# |
| TGGCCCAA | GGGCCCAC |
| TGGCCCAG# | GGGCCCAG# |
| TGGCCCAT# | GGGCCCAT# |
| TGGCCCAC | |
| TGGCCCAG# | |
| TGGCCCAT# | |
| **ACGT**, *ACGT*** | |
| ACACGTCA | ACACGTGG# |
| ACACGTGA | CACGTCAC# |
| ACACGTGG# | CACGTCTC |
| CACGTCAC# | CACGTGGC# |
| CACGTCAG | CACGTGGG# |
| CACGTCAT | CACGTGTC# |
| CACGTCTC# | |
| CACGTGAC | |
| CACGTGCG | |
| CACGTGGA | |
| CACGTGGC# | |
| CACGTGGG# | |
| CACGTGGT | |
| CACGTGTA | |
| CACGTGTC# | |
| CACGTGTG | |
| CACGTGTT | |
| CCACGTAG | |
| CCACGTCA | |
| CCACGTCG | |
| GACGTCGT | |
REGs found in both Arabidopsis and rice are marked with a hash (#). An asterisk indicates any base and is used to fix the position of the motif within the octamer sequence.
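The asterisk notation can be read as a fixed-length pattern that must match the whole octamer. This minimal sketch assumes uppercase A/C/G/T octamers; the function names are hypothetical.

```python
import re


def position_pattern(pattern: str) -> "re.Pattern[str]":
    """Compile a table pattern such as '*GGCCCA*', where '*' stands for any base."""
    return re.compile("".join("[ACGT]" if c == "*" else c for c in pattern))


def matches(pattern: str, octamer: str) -> bool:
    """True if the octamer fits the pattern over its full length, fixing the motif's position."""
    return position_pattern(pattern).fullmatch(octamer) is not None


print(matches("*GGCCCA*", "AGGCCCAA"))  # True
print(matches("**ACGT**", "CCACGTAG"))  # True: ACGT at positions 3-6
print(matches("**ACGT**", "CACGTGGC"))  # False: ACGT is at positions 2-5
print(matches("*ACGT***", "CACGTGGC"))  # True
```

Using a full match rather than a substring search is what makes the asterisks restrict the motif's position.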
PLACE cis-elements found and not found in Arabidopsis REGs
| 1 | ACGT | |
| 2 | ACGTG | |
| 3 | CCGAC | |
| 4 | GCCAC | |
| 5 | GGGCC | |
| 6 | TTGAC | |
| 7 | CACGTG | |
| 8 | YAACKG | |
| 9 | CNGTTR | |
| 10 | TGGGCY | |
| 11 | ACGTGKC | |
| 12 | ACACNNG | |
| 13 | ACGTGTC | |
| 14 | TTAATGG | |
| 15 | CAAAACGC | |
| 16 | CACGTGGC | |
| 17 | TGACGTGG | |
| 18 | CCACGTCA | |
| 19 | CAACA | |
| 20 | RCCGAC | |
| 21 | TTGACC | |
| 22 | NGATT | |
| 23 | TGTCTC | |
| 24 | CCGTCG | |
| 25 | GATAAG | |
| 26 | WAACCA | |
| 27 | TAACTG | |
| 28 | CATGTG | |
| 29 | CACATG | |
| 30 | ACTCAT | |
| 31 | CACCTG | |
| 32 | TTATCC | |
| 33 | ACTTTG | |
| 34 | AGCCGCC | |
| 35 | TAACAAR | |
| 36 | CCAATGT | |
| 37 | ACCGACA | |
| 38 | CTAACCA | |
| 39 | GAGTGAG | |
| 40 | CCACGTGG | |
| 41 | AAMAATCT | |
| 42 | TTTCCCGC | |
| 43 | TAAATGYA | |
| 44 | CGCGGATC | |
| 45 | GTGATCAC | |
| 46 | CATGCATG | |
| 47 | AAACCCTA | |
| 48 | ATACGTGT | |
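Several PLACE elements above use IUPAC degenerate codes (R, Y, K, M, W, N). The extracted table no longer shows which elements were found, so the matching criterion here is an assumption: an element counts as found if it occurs on the given strand inside at least one REG sequence. The REG list in the example is a small subset taken from this section's tables.

```python
import re
from typing import Iterable

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]", "S": "[CG]", "W": "[AT]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(element: str) -> str:
    """Translate an IUPAC-coded element into a regex pattern."""
    return "".join(IUPAC[c] for c in element)


def found_in_regs(element: str, regs: Iterable[str]) -> bool:
    """True if the element occurs (same strand) inside at least one REG sequence."""
    rx = re.compile(iupac_to_regex(element))
    return any(rx.search(reg) for reg in regs)


regs = ["TGGGCC", "CACGTGGC", "AGGCCCAA"]  # subset of REGs from this section
for element in ["TGGGCY", "ACGTGKC", "TTGAC"]:
    print(element, found_in_regs(element, regs))
```

TGGGCY matches TGGGCC because Y covers C, and ACGTGKC matches inside CACGTGGC because K covers G.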
Figure 8. Identification of the YR Rule. (A) Dinucleotide sequences at the -1/+1 position relative to the Arabidopsis TSSs, determined from the full-length cDNA (fl-cDNA) data, were counted. Most TSSs have (C/T)(A/G); this YR Rule applies to 77% of the analyzed TSSs. (B) The frequency of dinucleotides fitting the YR Rule was scanned from -5 to +5 around Arabidopsis and rice TSSs. Each dinucleotide is plotted at the position of its downstream base; for example, the -1/+1 position is indicated as "1". The theoretical frequency of YR in an unbiased sequence is 0.25 (0.5 × 0.5).
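The YR scan in panel B can be sketched as follows. The sketch assumes each site is given as a sequence plus the index of its +1 base, with at least five bases of flank on each side, and that positions skip 0 as in TSS coordinates. The expected frequency assumes independent bases, so equal base composition gives 0.5 × 0.5 = 0.25. The toy sequences are invented.

```python
from typing import Dict, Iterable, Tuple

PYRIMIDINES, PURINES = set("CT"), set("AG")


def is_yr(seq: str, i: int) -> bool:
    """True if seq[i-1] is a pyrimidine and seq[i] is a purine."""
    return seq[i - 1] in PYRIMIDINES and seq[i] in PURINES


def yr_scan(sites: Iterable[Tuple[str, int]], span: int = 5) -> Dict[int, float]:
    """YR frequency of each dinucleotide within -span..+span of the TSS.

    Each site is (sequence, tss_index), where seq[tss_index] is the +1 base.
    Keys label a dinucleotide by its downstream base, so -1/+1 is 1 and
    -2/-1 is -1 (there is no position 0).
    """
    sites = list(sites)
    freqs: Dict[int, float] = {}
    for offset in range(-(span - 1), span):
        label = offset + 1 if offset >= 0 else offset
        hits = sum(is_yr(seq, tss + offset) for seq, tss in sites)
        freqs[label] = hits / len(sites)
    return freqs


def expected_yr(freq: Dict[str, float]) -> float:
    """YR probability for independent bases with the given composition."""
    return (freq["C"] + freq["T"]) * (freq["A"] + freq["G"])


# Invented toy sites: the first has C/A at -1/+1, the second has no YR at all.
toy_sites = [("GGGGCAGGGGG", 5), ("GGGGGGGGGGG", 5)]
scan = yr_scan(toy_sites)
print(scan[1])                                                      # 0.5
print(expected_yr({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}))   # 0.25
```

Labelling by the downstream base reproduces the caption's convention that the -1/+1 dinucleotide is shown as "1".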
Figure 9. Illustration of the YR Rule, Y Patch, TATA box, and REG. (A) Expected positions relative to the TSS are as follows: YR Rule (-1/+1), Y Patch (-100 to -1), TATA box (-50 to -20), and REG (-20 to -400). Among them, only the REG is orientation-insensitive; the other groups are orientation-sensitive. In many cases the Y Patch is located between the TATA box and the TSS, but it is also observed upstream of the TATA box. (B) An example of an Arabidopsis promoter that has a Y Patch and a TATA box. At1g10960 is one of the promoters clustered in Figure 6B. The promoter sequence from -100 to +1 is shown together with octamer motifs. Marks on the sequence are the same as in (A).