Hana Faiger, Marina Ivanchenko, Ilana Cohen, Tali E Haran.
Abstract
We carried out in vitro selection experiments to systematically probe the effects of TATA-box flanking sequences on its interaction with the TATA-box binding protein (TBP). This study validates our previous hypothesis that the effect of the flanking sequences on TBP/TATA-box interactions is much more significant when the TATA box has a context-dependent DNA structure. Several interesting observations, with implications for protein-DNA interactions in general, came out of this study. (i) Selected sequences are selection-method specific and TATA-box dependent. (ii) The variability in binding stability as a function of the flanking sequences for (T-A)4 boxes is as large as the variability in binding stability as a function of the core TATA box itself. Thus, for (T-A)4 boxes the flanking sequences completely dominate and determine the binding interaction. (iii) The binding stabilities of all but one of the individual selected sequences of the (T-A)4 form are significantly higher than that of their mononucleotide-based consensus sequence. (iv) Even though the (T-A)4 sequence is symmetric, the flanking sequence pattern is asymmetric. We propose that the plasticity of (T-A)n sequences increases the number of conformationally distinct TATA boxes without the need to extend the TBP contact region beyond the eight-base-pair-long TATA box.
Year: 2006 PMID: 16407329 PMCID: PMC1326239 DOI: 10.1093/nar/gkj414
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Table 1. Selected TBP/TATA-box complexes studied individually
| Name | Sequence | Half-life B | ‘B’ fraction | ‘A’ fraction | Reference |
|---|---|---|---|---|---|
| Dissociation kinetics | |||||
| MLPk93 | CCTCGG | 249 (±23) | 0.81 (±5) | 0.19 (±5) | This work |
| MLPk62 | TTGGCG | 213 (±4) | 0.79 (±5) | 0.21 (±5) | This work |
| MLPk52 | TGACGG | 212 (±7) | 0.73 (±3) | 0.27 (±3) | This work |
| MLPk88 | TTCGTC | 206 (±17) | 0.70 (±4) | 0.30 (±4) | This work |
| MLPkcon | TTTGGCG | 232 (±12) | 0.75 (±4) | 0.25 (±4) | This work |
| wt MLP | CGGGC | 255 (±24) | 0.83 (±3) | 0.17 (±3) | |
| fsMLP | CGGAC | 271 (±20) | 0.86 (±2) | 0.14 (±2) | |
| E4k28 | TCCTAGT | 333 (±14) | 0.93 (±2) | 0.07 (±2) | This work |
| E4k60 | TTGGGG | 325 (±12) | 0.89 (±2) | 0.11 (±2) | This work |
| E4k56 | GGGTCT | 254 (±11) | 0.89 (±2) | 0.11 (±2) | This work |
| E4k30 | TAGCGC | 243 (±6) | 0.89 (±3) | 0.11 (±3) | This work |
| E4k67 | GGTCGA | 197 (±3) | 0.86 (±1) | 0.14 (±1) | This work |
| E4k55 | GGAAGC | 192 (±27) | 0.76 (±4) | 0.24 (±4) | This work |
| E4k53 | TGAACC | 175 (±8) | 0.78 (±5) | 0.22 (±5) | This work |
| E4k36 | TGGTGC | 147 (±5) | 0.83 (±3) | 0.17 (±3) | This work |
| E4kcon mono | GGGGC | 145 (±5) | 0.90 (±3) | 0.10 (±3) | This work |
| E4kcon high | CCCGC | 182 (±8) | 0.88 (±1) | 0.12 (±1) | This work |
| (TA)4 | CGGGC | 163 (±6) | 0.74 (±4) | 0.26 (±4) | |
| fs(TA)4 | CGGAC | 157 (±5) | 0.81 (±5) | 0.19 (±5) | This work |
| wt E4 | AGTCC | 70 (±4) | 0.81 (±5) | 0.19 (±5) | This work |
| T5T7 | CGGGC | 78 (±6) | 0.87 (±2) | 0.13 (±2) | |
| fsT5T7 | CGGAC | 155 (±5) | 0.74 (±2) | 0.26 (±2) | |
| Binding affinity | |||||
| E4t10 | CCCTGC | 2.4 (±0.3) | | | This work |
| E4t16 | AGCCGC | 3.5 (±0.4) | | | This work |
| E4t45 | CCACCC | 4.1 (±0.7) | | | This work |
| E4t6 | GTCCGA | 7.5 (±0.4) | | | This work |
| E4tcon | TCCGT | 3.3 (±0.3) | | | This work |
| wtE4 | AGTCC | 19 (±2) | | | This work |
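a. The equation used is a biexponential decay, $f(t) = A\,e^{-k_1 t} + B\,e^{-k_2 t}$, where $f(t)$ is the fraction of complex remaining at time $t$. A and B are the fractions of molecules dissociating with macroscopic rate constants $k_1$ and $k_2$, respectively. The half-life was determined from $t_{1/2} = \ln 2 / k_2$.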
b. Numbers in parentheses are the standard error of the mean. This error includes the experimental error between the different independent experiments (5–9 experiments for each sequence) and the difference between the experimental points and the curve-fitting model.
c. These consensus sequences are based on linked higher-order sequence motifs, but also agree with a mononucleotide-based consensus sequence (see text for details).
d. Experiments are from (11), reanalyzed here by the equation above.
e. "fs" stands for "flanking sequences", and is the name given to these sequences by (11), where only one flanking-sequence variant was analyzed for each TATA box.
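The relation between the slow rate constant and the half-life in the footnote on the fitting equation can be checked numerically. The sketch below assumes a biexponential decay with fractions A and B; the function names and the value chosen for the fast rate constant `k1` are hypothetical, and only the half-life and fractions for MLPk93 are taken from the table. Time is in whatever unit the half-life is reported in.

```python
import math


def rate_from_half_life(t_half):
    """Rate constant k obtained from t1/2 = ln2 / k."""
    return math.log(2) / t_half


def fraction_bound(t, a, k1, b, k2):
    """Biexponential decay: A*exp(-k1*t) + B*exp(-k2*t)."""
    return a * math.exp(-k1 * t) + b * math.exp(-k2 * t)


# MLPk93 values from the table: half-life 249, B = 0.81, A = 0.19.
T_HALF, B, A = 249.0, 0.81, 0.19
k2 = rate_from_half_life(T_HALF)
k1 = 20 * k2  # hypothetical fast rate constant; not reported in the table

if __name__ == "__main__":
    # Print the modeled fraction of complex remaining at a few time points.
    for t in (0, 60, 249, 500):
        print(t, round(fraction_bound(t, A, k1, B, k2), 3))
```

By construction, the slow component alone has decayed to B/2 at the tabulated half-life, whatever value is assumed for `k1`.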
Figure 1. In vitro selection experiments. EMSA conducted for the separation of selected and non-selected DNA templates. (a) Selection for high binding affinity from a DNA pool containing the MLP TATA box and 10 random flanking sequences on each side. (b) Selection for high complex stability from a DNA pool containing the E4 TATA box and 10 random flanking sequences on each side. The gels shown are those after the first selection cycle.
Figure 2. Gels showing representative results for the dissociation kinetics of yTBPc (27 nM) from several E4-related TATA boxes embedded in hairpin constructs (0.4 nM). The number below each gel specifies the time after adding competitor DNA (1.76 mM).
Figure 5. Gels showing representative binding affinity measurements for yTBPc complexes with several E4-related TATA boxes embedded in hairpin constructs (50 pM).
Table 2. Consensus sequences and information content of sequences studied here
| Sequence | Mononucleotide consensus | Information content, 5′ side | Information content, TATA box | Information content, 3′ side | Information content, total | Total number of sequences | Sequences used in analysis |
|---|---|---|---|---|---|---|---|
| Position | 1 3 5 7 9 11 13 15 17 19 21 23 25 27 | | | | | | |
| Consensus of selected sequences based on mononucleotide frequencies | |||||||
| MLP therm. | n | 0.1 (1) | 15.6 (1) | 0.2 (1) | 15.9 (3) | 47 | 42 |
| MLP kinetic | nnnn | 0.5 (1) | 15.6 (1) | 0.5 (1) | 16.6 (2) | 45 | 42 |
| E4 therm. | nnnnnn | 0.3 (1) | 15.6 (1) | 1.4 (1) | 17.3 (2) | 51 | 47 |
| E4 kinetic | nnnnn | 0.6 (1) | 15.6 (1) | 1.9 (1) | 18.1 (3) | 42 | 40 |
| Consensus of selected sequences based on the mononucleotide frequencies observed in higher-order motifs | |||||||
| MLP therm. | nnnnnnnn | ||||||
| MLP kinetic | nn | ||||||
| E4 therm. | nnnnn | ||||||
| E4 kinetic | nnnnn | ||||||
| Consensus of natural promoters | |||||||
| MLP eukaryote | | 0.85 (3) | 15.90 (3) | 0.75 (3) | 17.50 (5) | 185 | 176 |
| MLP human | | 1.4 (2) | 15.5 (1) | 1.1 (2) | 18.0 (3) | 42 | 38 |
| E4 eukaryote | | 0.3 (1) | 15.7 (1) | 0.4 (1) | 16.3 (2) | 70 | 59 |
a. TATA boxes are underlined. Boldface letters in the flanking sequences indicate that the reduction in uncertainty for that position (Rseq) is larger than 1 SD from that expected for a sample of that size. Uppercase letters indicate that the frequency of that nucleotide is >50%. An ambiguous code is used whenever there are several nucleotides that are within 1 SD of the most frequent one, and is denoted by an uppercase letter when at least one nucleotide frequency is >50%. K = G or T; S = C or G; W = A or T; B = C, G or T; D = A, G or T; V = A, C or G.
b. Total number of sequences in the non-redundant data.
c. Unequivocal TATA-box sequences only. Sequences were deleted if they contained additional and alternative TATA boxes in the flanking sequences.
d. Based only on higher-order motifs (2, 3 or 4 bp long) that are statistically significant in the selected sequences (see Table 3 for details). Ambiguous codes are given as discussed in footnote a. Uppercase letters indicate that this nucleotide is the only one observed in this position, in all three higher-order levels. Italicized letters indicate that the frequency of this base is >50% in all three levels.
e. Sequences were retrieved from the Eukaryotic Promoter Database [release 82 (35)].
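The lettering rule in the first footnote (ambiguity codes plus upper or lower case) can be illustrated with a short sketch. It is not the authors' procedure: the footnote does not say how the "1 SD" window is computed, so the `tolerance` argument is a hypothetical stand-in for it, and the function name is also an assumption. The code table adds the standard IUPAC codes R, Y, M, H and N to those listed in the footnote.

```python
# Standard IUPAC nucleotide codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("GT"): "K", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACG"): "V", frozenset("ACT"): "H",
    frozenset("ACGT"): "N",
}


def consensus_symbol(freqs, tolerance):
    """Return the consensus letter for one position.

    freqs: dict mapping each base to its frequency (0..1).
    tolerance: hypothetical stand-in for the footnote's '1 SD' window.
    """
    top = max(freqs.values())
    # Every base whose frequency lies within the window of the top one.
    near = frozenset(b for b, f in freqs.items() if top - f <= tolerance)
    code = IUPAC[near]
    # Uppercase only when at least one base exceeds 50%.
    return code if top > 0.5 else code.lower()


if __name__ == "__main__":
    print(consensus_symbol({"A": 0.05, "C": 0.10, "G": 0.60, "T": 0.25}, 0.08))
    print(consensus_symbol({"A": 0.10, "C": 0.10, "G": 0.42, "T": 0.38}, 0.08))
```

With these illustrative frequencies, the first position is dominated by G and the second is a G/T ambiguity with no base above 50%.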
Figure 3. Information content of in vitro selected sequences flanking TATA boxes. (a) Constructs containing MLP-like sequences selected for high binding affinity. (b) Constructs containing MLP-like sequences selected for high complex stability. (c) Constructs containing E4-like sequences selected for high binding affinity. (d) Constructs containing E4-like sequences selected for high complex stability. Rseq is the reduction in uncertainty at each position of the binding site. The Rseq of the core TATA box is close to 2. We rescaled the graphs at Rseq = 0.6 to show the pattern in the flanking sequences more clearly.
Figure 4. Information content of TATA boxes and their flanking sequences found in the EPD (35). (a) MLP-like TATA boxes from all eukaryotic promoters. (b) MLP-like TATA boxes from human promoters. (c) E4-like TATA boxes from eukaryotic promoters. Rseq is the reduction in uncertainty at each position of the binding site. See Figure 3 for details.
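As a rough illustration of Rseq, the sketch below computes the reduction in uncertainty at each aligned position as 2 bits minus the Shannon entropy of the observed base frequencies. This is the common textbook definition, assumed here rather than taken from the paper, and it omits any small-sample correction, so its numbers will not match the tabulated values exactly. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def rseq(column):
    """Information content (bits) of one aligned position: 2 - H."""
    counts = Counter(column)
    n = len(column)
    # Shannon entropy of the observed base frequencies at this position.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 2.0 - entropy


def total_information(sequences):
    """Sum of per-position Rseq over equal-length aligned sequences."""
    return sum(rseq(col) for col in zip(*sequences))


if __name__ == "__main__":
    # Toy alignment: a fully conserved TATATATA box with one variable
    # flanking base on each side (hypothetical sequences).
    toy = ["GTATATATAC", "CTATATATAG", "ATATATATAT", "TTATATATAA"]
    print([round(rseq(col), 2) for col in zip(*toy)])
    print(round(total_information(toy), 2))
```

A fully conserved position contributes 2 bits and a position with all four bases equally represented contributes 0, which is why a conserved core box sits near Rseq = 2 while flanking positions fall well below it.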
Table 3. Statistically significant higher-order motifs in the selected DNA pools
| Position | E4 kinetic selection (40 sequences) | E4 thermodynamic selection (47 sequences) | MLP kinetic selection (42 sequences) | ||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Con | Statistically significant dinucleotide | Statistically significant trinucleotide | Statistically significant tetranucleotide | Con | Statistically significant dinucleotide | Statistically significant trinucleotide | Most frequent tetranucleotide | Con | Statistically significant dinucleotide | Statistically significant trinucleotide | Statistically significant tetranucleotide | ||||||||||
| 1 | n | n | n | ||||||||||||||||||
| 2 | n | n | n | ||||||||||||||||||
| 3 | n | n | n | ||||||||||||||||||
| CA 6 | 2.2 | ||||||||||||||||||||
| GT 7 | 2.8 | ||||||||||||||||||||
| 4 | n | n | n | CAT 5 | 5.4 | ||||||||||||||||
| GTT 5 | 5.4 | ||||||||||||||||||||
| AT 9 | 4.1 | CATT 3 | 7.0 | ||||||||||||||||||
| GTTT 3 | 7.0 | ||||||||||||||||||||
| 5 | n | AGG 4 | 4.3 | n | k | ATT 6 | 6.6 | ||||||||||||||
| GG 7 | 2.9 | TT 12 | 6.0 | TTTG 3 | 7.0 | ||||||||||||||||
| 6 | g | n | t | TTG 7 | 7.9 | ||||||||||||||||
| TC 9 | 3.7 | GTCC 3 | 6.6 | TG 11 | 5.3 | TTGG 4 | 9.5 | ||||||||||||||
| TTCG 3 | 7.0 | ||||||||||||||||||||
| 7 | g | s | n | TGG 7 | 7.9 | ||||||||||||||||
| CCCG 3 | 7.2 | CC 8 | 3.1 | TCCG 3 | 6.6 | GG 9 | 4.1 | TGGG 4 | 9.5 | ||||||||||||
| TGGC 3 | 7.0 | ||||||||||||||||||||
| 8 | n | CCG 4 | 4.3 | n | CCG 7 | 7.4 | n | GGG 4 | 4.2 | ||||||||||||
| GCG 4 | 4.3 | GGC 4 | 4.2 | ||||||||||||||||||
| CG 11 | 5.6 | GGTG 3 | 7.2 | CG 10 | 4.3 | CCGC 3 | 6.6 | ||||||||||||||
| CCGT 3 | 6.6 | ||||||||||||||||||||
| 9 | G | CGC 5 | 5.6 | G | CGT 5 | 5.0 | s | ||||||||||||||
| CGC 4 | 3.8 | ||||||||||||||||||||
| GC 13 | 6.9 | CGCT 5 | 5.6 | GT 9 | 3.7 | CGCT 4 | 3.8 | GG 7 | 2.8 | ||||||||||||
| CGTT 5 | 5.0 | ||||||||||||||||||||
| 10 | C | GCT 13 | 6.9 | n | GTT 9 | 3.7 | n | GGT 7 | 2.8 | ||||||||||||
| CT 20 | 3.7 | GCTA 13 | 6.9 | TT 16 | 1.4 | GT 15 | 1.6 | ||||||||||||||
| 11 | T | T | T | ||||||||||||||||||
| 18 | A | A | G | ||||||||||||||||||
| AC 17 | 2.6 | AC 19 | 2.4 | GT 17 | 2.3 | ||||||||||||||||
| 19 | c | ACA 8 | 3.6 | c | AGG 7 | 2.4 | k | GTT 7 | 2.8 | ||||||||||||
| ACG 7 | 2.9 | GGT 6 | 2.2 | ||||||||||||||||||
| CA 8 | 3.6 | ACGC 7 | 8.1 | GG 7 | 2.4 | TT 7 | 2.8 | ||||||||||||||
| CG 7 | 3.0 | ACAC 4 | 4.3 | ||||||||||||||||||
| AGTG 4 | 4.3 | ||||||||||||||||||||
| 20 | g | CGC 7 | 8.1 | g | n | ||||||||||||||||
| CAC 4 | 4.3 | ||||||||||||||||||||
| GTG 4 | 4.3 | ||||||||||||||||||||
| GC 10 | 4.9 | CGCG 3 | 7.2 | GT 9 | 3.7 | CCCT 3 | 6.6 | ||||||||||||||
| TG 8 | 3.8 | CGCC 3 | 7.2 | |||||||||||||||||||
| 21 | s | GCG 5 | 5.6 | n | GTT 6 | 6.2 | n | ||||||||||||||
| GCT 4 | 3.8 | ||||||||||||||||||||
| CCT 4 | 3.8 | ||||||||||||||||||||
| CG 8 | 3.6 | GCGG 3 | 7.2 | CT 11 | 4.9 | GTTG 4 | 8.9 | ||||||||||||||
| GG 7 | 3.0 | TT 9 | 3.7 | ||||||||||||||||||
| GC 7 | 3.0 | GG 7 | 2.4 | ||||||||||||||||||
| 22 | g | GCG 4 | 4.3 | T | TTG 6 | 6.2 | r | TAG 5 | 5.4 | ||||||||||||
| GGG 4 | 4.3 | GGC 5 | 5.0 | ||||||||||||||||||
| CGG 4 | 4.3 | CTG 5 | 5.0 | ||||||||||||||||||
| GG 10 | 4.9 | CGGC 3 | 7.2 | TG 15 | 7.3 | TTGG 4 | 8.9 | GG 7 | 2.8 | ||||||||||||
| GGGG 3 | 7.2 | GGCA 3 | 6.6 | ||||||||||||||||||
| 23 | G | GGG 6 | 6.9 | s | TGG 6 | 6.2 | s | ||||||||||||||
| GCG 4 | 4.3 | TGA 5 | 5.0 | ||||||||||||||||||
| GGC 4 | 4.3 | GGA 4 | 3.8 | ||||||||||||||||||
| GG 13 | 6.9 | GG 8 | 3.1 | TCCC 4 | 8.9 | ||||||||||||||||
| GC 7 | 3.0 | GA 8 | 3.1 | TACC 3 | 6.6 | ||||||||||||||||
| TGGC 3 | 6.6 | ||||||||||||||||||||
| 24 | G | GGC 6 | 6.9 | n | CCC 4 | 4.3 | k | CTG 4 | 4.2 | ||||||||||||
| GGC 4 | 4.3 | ||||||||||||||||||||
| GC 8 | 3.6 | CC 9 | 3.6 | CGGC 3 | 6.6 | CTGA 3 | 7.0 | ||||||||||||||
| AC 7 | 2.4 | ||||||||||||||||||||
| 25 | n | GGG 4 | 4.3 | c | n | TGA 4 | 4.2 | ||||||||||||||
| TGG 4 | 4.3 | GCT 4 | 4.2 | ||||||||||||||||||
| GG 9 | 4.2 | TGGG 3 | 7.2 | CG 7 | 2.4 | GGCG 3 | 6.6 | GG 7 | 2.8 | GCTC 3 | 7.0 | ||||||||||
| 26 | g | GGG 6 | 6.9 | n | CCC 5 | 5.0 | n | ||||||||||||||
| CGG 4 | 4.3 | CAC 5 | 5.0 | ||||||||||||||||||
| CGC 4 | 3.8 | ||||||||||||||||||||
| GG 12 | 6.2 | GGGG 4 | 9.9 | CC 8 | 3.1 | CACG 3 | 6.6 | GT 7 | 2.8 | ||||||||||||
| 27 | g | GGG 7 | 8.3 | b | ACC 4 | 3.8 | k | ||||||||||||||
| GG 7 | 3.0 | CG 8 | 3.1 | GC 7 | 2.8 | ||||||||||||||||
| 28 | n | n | n | ||||||||||||||||||
a. Position in the sequence.
b. Mononucleotide-based consensus sequence. For details on the lettering see Table 2.
c. DNA tracts (2, 3 or 4 bp long) that are statistically significant at each position. A dinucleotide is positioned between the two bases that constitute it, a trinucleotide on its central base, and a tetranucleotide between its second and third bases. Here lettering is not an indication of the frequency of occurrence. The subscript numbers (shown here as the plain number following each motif) are the number of occurrences of each motif.
d. Z-statistic, i.e. the deviation of the observed frequency of a DNA tract from that expected based on its mononucleotide composition. It is calculated by subtracting from the observed number of occurrences of the most frequent motif the expected number of occurrences based on the mononucleotide frequency of the respective base pairs, and then dividing this value by the expected standard deviation (25). Statistically significant motifs are those that appear with a frequency higher than that observed in a completely random sequence set, of similar size, in which there is an equal representation of each nucleotide in each position.
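The Z-statistic described in the last footnote can be sketched as follows. The footnote does not spell out the expected standard deviation, so the binomial form sqrt(N·p·(1−p)) used here is an assumption, as are the function name and arguments. Under the further simplification of equal base frequencies (0.25 each), the sketch gives about 7.2 for a tetranucleotide seen 3 times in 40 sequences, in line with entries of that kind in the table.

```python
import math


def z_statistic(observed, n_sequences, base_freqs):
    """Z = (observed - expected) / SD for a motif at a fixed position.

    observed: number of sequences carrying the motif at that position.
    n_sequences: size of the sequence set.
    base_freqs: mononucleotide frequency of each base of the motif at
        its position, e.g. [0.25, 0.25] for a dinucleotide under
        equal representation.
    """
    p = math.prod(base_freqs)                   # motif probability
    expected = n_sequences * p
    sd = math.sqrt(n_sequences * p * (1 - p))   # binomial SD (assumption)
    return (observed - expected) / sd


if __name__ == "__main__":
    # Tetranucleotide seen 3 times, uniform base frequencies.
    for n in (40, 42, 47):
        print(n, round(z_statistic(3, n, [0.25] * 4), 1))
```

Position-specific base frequencies can be passed in `base_freqs` instead of 0.25 to follow the footnote's mononucleotide-based expectation more closely.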