Svetlana Nikolajewa, Andreas Beyer, Maik Friedel, Jens Hollunder, Thomas Wilhelm.
Abstract
Restriction enzymes are among the best studied examples of DNA binding proteins. In order to find general patterns in DNA recognition sites, which may reflect important properties of protein-DNA interaction, we analyse the binding sites of all known type II restriction endonucleases. We find a significantly enhanced GC content and discuss three explanations for this phenomenon. Moreover, we study patterns of nucleotide order in recognition sites. Our analysis reveals a striking accumulation of adjacent purines (R) or pyrimidines (Y). We discuss three possible reasons: RR/YY dinucleotides are characterized by (i) stronger H-bond donor and acceptor clusters, (ii) specific geometrical properties and (iii) a low stacking energy. These features make RR/YY steps particularly accessible for specific protein-DNA interactions. Finally, we show that the recognition sites of type II restriction enzymes are underrepresented in host genomes and in phage genomes.
Year: 2005 PMID: 15888729 PMCID: PMC1097771 DOI: 10.1093/nar/gki575
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
All type II restriction enzymes with known three-dimensional structure and their cognate DNA recognition sequences [PDB, (20)]
| Enzyme | Source | Recognition sequence | Purine (1)–pyrimidine (0) pattern |
|---|---|---|---|
| MspI | | CCGG | 0011 |
| FokI | Flavobacterium okeanokoites | GGATG | 11101 |
| EcoRII | Escherichia coli | CCWGG | 00W11 |
| EcoRI | E.coli | GAATTC | 111000 |
| BamHI | Bacillus amyloliquefaciens | GGATCC | 111000 |
| HindIII | Haemophilus influenzae | AAGCTT | 111000 |
| BglII | Bacillus globigii | AGATCT | 111000 |
| BstYI | Bacillus stearothermophilus | RGATCY | 111000 |
| EcoRV | E.coli | GATATC | 110100 |
| Cfr10I | Citrobacter freundii | RCCGGY | 100110 |
| NaeI | Nocardia aerocolonigenes | GCCGGC | 100110 |
| NgoMIV | Neisseria gonorrhoeae | GCCGGC | 100110 |
| HincII | H.influenzae Rc | GTYRAC | 100110 |
| Bse634I | | RCCGGY | 100110 |
| MunI | | CAATTG | 011001 |
| PvuII | Proteus vulgaris | CAGCTG | 011001 |
| BsoBI | B.stearothermophilus | CYCGRG | 000111 |
| EcoO109I | E.coli | RGGNCCY | 111N000 |
| BglI | B.globigii | GCCNNNNNGGC | 100NNNNN110 |
The corresponding purine (1)–pyrimidine (0) coding shows that 11/00 is a common pattern in all binding sites.
Recognition sequence representations use the standard abbreviations (21) to represent ambiguity: R = G or A; K = G or T; S = G or C; B = not A (C or G or T); D = not C (A or G or T); Y = C or T; M = A or C; W = A or T; H = not G (A or C or T); V = not T (A or C or G); and N = A or C or G or T.
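The purine–pyrimidine coding used in the table above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `pur_pyr_pattern` is an assumption and does not come from the article. It maps A, G and R to 1 and C, T and Y to 0, and leaves every other ambiguity code (for example W or N) unchanged, as the table does.

```python
# Purine (1) / pyrimidine (0) coding as used in the table above.
# R (G or A) is always a purine and Y (C or T) is always a pyrimidine.
PUR_PYR = {"A": "1", "G": "1", "R": "1", "C": "0", "T": "0", "Y": "0"}


def pur_pyr_pattern(site: str) -> str:
    """Encode a recognition sequence; symbols that are not purely
    purine or purely pyrimidine (W, N, S, ...) are kept as they are."""
    return "".join(PUR_PYR.get(base, base) for base in site.upper())


# A few rows of the table, re-encoded (hypothetical helper usage).
for enzyme, site in [("EcoRI", "GAATTC"), ("EcoRII", "CCWGG"), ("BglI", "GCCNNNNNGGC")]:
    print(enzyme, site, pur_pyr_pattern(site))
```

Keeping unresolved symbols in place, rather than dropping them, preserves the length of the site so that positions in the pattern still line up with positions in the sequence.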
Purine–pyrimidine and ketobase–aminobase patterns in type II restriction enzyme recognition sequences
| Pattern | Symmetrical, purine (1)–pyrimidine (0): frequency | P-value | Symmetrical, keto (1)–amino (0): frequency | P-value | Asymmetrical, purine (1)–pyrimidine (0): frequency | P-value | Asymmetrical, keto (1)–amino (0): frequency | P-value |
|---|---|---|---|---|---|---|---|---|
| 00 | 1758 | 6.6E−63 | 1097 | 0.61 | 529 | 5.1E−12 | 294 | 1 |
| 01 | 817 | 1 | 1060 | 1 | 214 | 1 | 379 | 0.59 |
| 10 | 903 | 1 | 1278 | 0.01 | 348 | 0.98 | 524 | 2.0E−15 |
| 11 | 1743 | 1.7E−29 | 1389 | 0.01 | 501 | 4.7E−14 | 380 | 0.69 |
| 000 | 348 | 5.5E−08 | 78 | 1 | 288 | 1.5E−24 | 62 | 1 |
| 001 | 328 | 1.8E−08 | 250 | 9.3E−06 | 81 | 1 | 160 | 0.07 |
| 010 | 89 | 1 | 250 | 9.3E−06 | 79 | 1 | 210 | 1.0E−08 |
| 011 | 165 | 0.99 | 302 | 3.3E−10 | 102 | 0.99 | 129 | 0.92 |
| 100 | 269 | 0.04 | 194 | 0.41 | 140 | 0.79 | 142 | 0.52 |
| 101 | 105 | 1 | 117 | 1 | 104 | 0.99 | 156 | 0.16 |
| 110 | 264 | 0.00 | 271 | 1.8E−05 | 193 | 1.0E−05 | 210 | 3.1E−08 |
| 111 | 310 | 8.3E−13 | 132 | 1 | 231 | 1.5E−15 | 128 | 0.95 |
| 0000 | 150 | 3.2E−27 | 14 | 1 | ||||
| 0001 | 3 | 0.59 | 2 | 0.92 | 24 | 0.99 | 31 | 0.99 |
| 0010 | 26 | 0.99 | 91 | 3.4E−08 | ||||
| 0011 | 1 | 0.94 | 3 | 0.42 | 47 | 0.74 | 53 | 0.36 |
| 0100 | 4 | 0.36 | 1 | 0.98 | 32 | 0.99 | 31 | 0.99 |
| 0101 | 9 | 1 | 34 | 0.99 | ||||
| 0110 | 1 | 0.90 | 35 | 0.92 | 81 | 2.4E−05 | ||
| 0111 | 5 | 0.01 | 39 | 0.90 | 27 | 0.99 | ||
| 1000 | 8 | 0.01 | 1 | 0.98 | 78 | 0.00 | 14 | 1 |
| 1001 | 18 | 1 | 83 | 8.2E−06 | ||||
| 1010 | 1 | 0.94 | 2 | 0.68 | 36 | 0.99 | 89 | 2.3E−07 |
| 1011 | 7 | 0.01 | 5 | 0.01 | 45 | 0.73 | 44 | 0.86 |
| 1100 | 3 | 0.54 | 4 | 0.21 | 82 | 2.7E−05 | 24 | 0.99 |
| 1101 | 2 | 0.74 | 2 | 0.41 | 52 | 0.34 | 109 | 2.0E−13 |
| 1110 | 88 | 1.4E−07 | 91 | 1.2E−07 | ||||
| 1111 | 2 | 0.20 | 94 | 2.3E−10 | 20 | 1 | ||
In the pur–pyr coding, 1 stands for a purine (A, G, R) and 0 for a pyrimidine (T, C, Y); in the keto–amino coding, 1 stands for a ketobase (G, T, K) and 0 for an aminobase (A, C, M).
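The pattern frequencies in the table above are counts of short 0/1 words in coded recognition sequences. The following Python sketch shows one way such a tally could be made. It is not the authors' procedure: the names `encode` and `count_patterns` are hypothetical, the choice to skip windows that contain an unresolved ambiguity symbol is an assumption, and no P-values are computed.

```python
from collections import Counter

# The two binary codings described in the table note.
PUR_PYR = {"A": "1", "G": "1", "R": "1", "C": "0", "T": "0", "Y": "0"}
KETO_AMINO = {"G": "1", "T": "1", "K": "1", "A": "0", "C": "0", "M": "0"}


def encode(site, coding):
    """Translate a recognition sequence; unmapped symbols are kept."""
    return "".join(coding.get(base, base) for base in site.upper())


def count_patterns(sites, coding, k):
    """Count overlapping length-k words made only of 0 and 1.

    Assumption: windows containing an unresolved symbol (e.g. N, W)
    are skipped rather than expanded into all possibilities.
    """
    counts = Counter()
    for site in sites:
        coded = encode(site, coding)
        for i in range(len(coded) - k + 1):
            window = coded[i:i + k]
            if set(window) <= {"0", "1"}:
                counts[window] += 1
    return counts


# Small illustrative sample, not the full data set behind the table.
sample_sites = ["GAATTC", "GGATCC", "CCWGG"]
print(count_patterns(sample_sites, PUR_PYR, 2))
print(count_patterns(sample_sites, KETO_AMINO, 2))
```

Assessing whether a count is enriched would additionally require a null model for the expected frequencies, which this sketch deliberately leaves out.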
Figure 1. Example of an interaction between an H-bond donor cluster (resulting from two adjacent purines AA) and an H-bond acceptor (bifurcated hydrogen bond). The figure shows binding of residue Asn141 from EcoRI to the DNA subsequence 5′-D(GAA)-3′ (only one strand shown). Green lines indicate potential hydrogen donor–acceptor pairs; distances are in angstroms. The structure is according to PDB entry 1CKQ. Note the bending towards the major groove, which reduces the distances between the H-bond donors of the two adenines.
Examples of gene regulatory proteins that recognize specific short DNA sequences
| DNA binding protein | Recognition sequence (or consensus motif) | Purine (1)–pyrimidine (0) pattern | References |
|---|---|---|---|
| p53 | RRRCW2GYYYRRRCW2GYYY | 1110W210001110W21000 | ( |
| MADS box | CCW6GG | 00W611 | ( |
| ERSE | CCAATN9CCACG | 00110N900101 | ( |
| Ski oncoprotein | GTCTAGAC | 10001110 | ( |
| GAL4 | CGGN5TN5CCG | 011N50N5001 | ( |
| GAL4 | WGGN10–12CCG | W11N10–12001 | ( |
| nkx-2.5 | CWTTAATTN | 0W001100N | ( |
| Bicoid | TCTAATCCC | 000110000 | ( |
| AP-2 | GCCCCAGGC | 100001110 | ( |
| Stat5-RE | TTCN3GAA | 000N3111 | ( |
| GRE | AGAACAN3TGTTCT | 111101N3010000 | ( |
| SRF | CCW2AW3GG | 00W21W311 | ( |
| MCM1 | CCYW3N2GG | 000W3N211 | ( |
| NFκB | GGGACTTTCC | 1111000000 | ( |
| ANGCAANCGNTTNCNT | 1N1011N01N00N0N0 | ( | |
| YY1 | GGCCATCTTG | 1100100001 | ( |
| NF-1/CTF-1 | TGGN6GCCAA | 011N610011 | ( |
| PPAR | AGGAAACTGGA | 11111100111 | ( |
| NFAT | ATTGGAAA | 10011111 | ( |
| CREA | GCGGAGACCCCAG | 1011111000011 | ( |
| C/EBP | CCAAT | 00110 | ( |
| PacC | GCCARG | 100111 | ( |
| TTK finger1 | GAT | 110 | ( |
| TTK finger2 | AGG | 111 | ( |
| Zif finger1 | GCG | 101 | ( |
| Zif finger2 | TGG | 011 | ( |
| GLI finger4 | TTGGG | 00111 | ( |
| GLI finger5 | GACC | 1100 | ( |
| ( | |||
| σ70 (primary) | CTTGA | 00011 | |
| σ32 (heat shock) | CTTGAA | 000111 | |
| σ60 (nitr. reg. gene) | CTGGNA | 0011N1 | |
| σ54 (nit. ox. stress) | TTGG CACG | 0011 0101 | |
| σ28 (exter. stress) | CTAAA | 00111 |
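Several motifs in the table above use a compact repeat notation, such as W6 for six consecutive W positions or N9 for nine unspecified bases. The Python sketch below expands that notation before applying the purine–pyrimidine coding, so that the pattern has one symbol per base. The function names are hypothetical, and the sketch assumes a single integer directly after a letter is a repeat count; ranges such as N10–12 are not handled.

```python
import re

# Purine (1) / pyrimidine (0) coding; other symbols are left unchanged.
PUR_PYR = {"A": "1", "G": "1", "R": "1", "C": "0", "T": "0", "Y": "0"}


def expand_repeats(motif: str) -> str:
    """Expand 'W6' to 'WWWWWW'. Assumes a letter followed by one
    integer denotes a repeat; ranges like 'N10–12' are not supported."""
    return re.sub(
        r"([A-Za-z])(\d+)",
        lambda m: m.group(1) * int(m.group(2)),
        motif,
    )


def pur_pyr_pattern(motif: str) -> str:
    """Expand repeats, then encode each base as 1, 0 or itself."""
    expanded = expand_repeats(motif).upper()
    return "".join(PUR_PYR.get(base, base) for base in expanded)


for name, motif in [("MADS box", "CCW6GG"), ("Stat5-RE", "TTCN3GAA")]:
    print(name, expand_repeats(motif), pur_pyr_pattern(motif))
```

Expanding first avoids confusing a repeat count with the 0 and 1 symbols of the pattern itself, which can otherwise be hard to tell apart in entries such as 00W611.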