Valeriy Sorokin, Konstantin Severinov, Mikhail S. Gelfand.
Abstract
We present here the results of a systematic bioinformatics analysis of control (C) proteins, a class of DNA-binding regulators that control the time-delayed transcription of their own genes and of restriction endonuclease genes in many type II restriction-modification systems. More than 290 C protein homologs were identified, and DNA-binding sites for approximately 70% of new and previously known C proteins were predicted by a combination of phylogenetic footprinting and motif searches in DNA upstream of C protein genes. Additional analysis revealed that a large proportion of C protein genes are translated from leaderless RNA, which may contribute to the time-delayed nature of the genetic switches operated by these proteins. Analysis of the genetic contexts of newly identified C protein genes revealed that they are not exclusively associated with restriction-modification genes; numerous associations with genes originating from mobile genetic elements were observed. These instances might be vestiges of ancient horizontal transfers and indicate that, during evolution, ancestral restriction-modification system genes were sites of mobile element insertions.
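The motif-search step can be illustrated with a generic position weight matrix (PWM) scan over an upstream region. This is a minimal sketch, not the authors' pipeline: the log2-odds scoring, the 0.5 pseudocount, the uniform background, the score threshold and the example sites are all assumptions, the function names `build_pwm` and `scan` are hypothetical, and phylogenetic footprinting (comparison of orthologous upstream regions) is not modeled.

```python
import math

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def build_pwm(sites, pseudocount=0.5):
    """Build a log2-odds position weight matrix from aligned, equal-length sites.

    Assumes a uniform 0.25 background for each base (a simplification).
    """
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / 0.25)
            for base in BASES
        })
    return pwm


def reverse_complement(seq):
    # Non-ACGT characters (e.g. N) are kept as N.
    return "".join(COMPLEMENT.get(b, "N") for b in reversed(seq))


def scan(sequence, pwm, threshold):
    """Score every window on both strands; return (forward start, strand, score) hits."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for strand, seq in (("+", sequence), ("-", reverse_complement(sequence))):
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            if any(b not in BASES for b in window):
                continue  # skip windows containing ambiguous bases
            score = sum(pwm[i][b] for i, b in enumerate(window))
            if score >= threshold:
                # Convert minus-strand offsets back to forward-strand coordinates.
                pos = start if strand == "+" else len(seq) - start - width
                hits.append((pos, strand, score))
    return hits


# Hypothetical training sites and upstream fragment, for illustration only.
pwm = build_pwm(["TGACT", "TGACA", "TGACT"])
print(scan("CCTGACTGG", pwm, threshold=5.0))
```

Scanning both strands matters because a binding site can lie in either orientation relative to the gene; the threshold trades sensitivity against false positives and would need calibrating on real data.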
Year: 2008 PMID: 19056824 PMCID: PMC2632904 DOI: 10.1093/nar/gkn931
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Maximum likelihood tree built from REBASE and newly discovered putative C proteins. Colors indicate C proteins whose predicted binding sites fall into distinct motifs (1 through 10). The subdivided motifs (2, 2b, 2c and 6, 6b) are marked with similar but not identical colors. An asterisk preceding a protein ID indicates a relatively lower level of prediction confidence.
Motifs, the number of candidate C proteins and the gene content of the respective loci
| Motif | Total members | REBASE members | Motif type | Loci with RM-related ORFs | Loci with phage-related ORFs |
|---|---|---|---|---|---|
| Motif 1 | 21 | 0 | C.PvuII-like (double-box motif) | 7 | 15 |
| Motif 2 | 33 | 12 | C.PvuII-like (double-box motif) | 21 | 23 |
| Motif 2b | 14 | 6 | C.PvuII-like (double-box motif) | 8 | 10 |
| Motif 2c | 12 | 1 | C.PvuII-like (double-box motif) | 2 | 6 |
| Motif 3 | 18 | 0 | C.PvuII-like (double-box motif) | 0 | 18 |
| Motif 4 | 8 | 0 | C.PvuII-like (double-box motif) | 3 | 7 |
| Motif 5 | 10 | 0 | C.PvuII-like (single-box motif) | 3 | 9 |
| Motif 6 | 5 | 0 | C.PvuII-like (single-box motif) | 0 | 2 |
| Motif 6b | 10 | 0 | C.PvuII-like (single-box motif) | 1 | 7 |
| Motif 7 | 13 | 3 | C.EcoRV-like (palindromic motif) | 8 | 11 |
| Motif 8 | 14 | 2 | C.EcoO109I-like (palindromic motif) | 6 | 7 |
| Motif 9 | 15 | 6 | new (non-palindromic motif) | 7 | 11 |
| Motif 10 | 8 | 0 | new (palindromic motif) | 1 | 4 |
Figure 2. (a) Logos of C.PvuII-like motifs (1–6). The total number of members and the number of REBASE members (if any) are indicated for every motif. Paired palindromic boxes (consensus sequences) are marked with light green squares. Palindromic elements of the motifs’ architecture are underlined with colored arrows. Conserved trinucleotides found outside the motifs are underlined with orange arrows. (b) Logos of palindromic motifs (7, 8 and 10). The total number of members and the number of REBASE members (if any) are indicated for every motif. Paired palindromic boxes (consensus sequences) are marked with light green squares. Palindromic elements of the motifs’ architecture are underlined with colored arrows. (c) Logo of the new motif 9. The total number of members and the number of REBASE members are shown.
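The "palindromic" boxes in these captions are inverted repeats: a box reads the same on both DNA strands, so each base is paired with the complement of its mirror-image position. A minimal check is sketched below; the function names are hypothetical, odd-length boxes are assumed to have an unpaired central spacer base, and the example sequences are the EcoRI (GAATTC) and EcoRV (GATATC) recognition sites, used only as familiar palindromes rather than C protein binding sites.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def palindrome_mismatches(box):
    """Count base pairs that break reverse-complement symmetry in a box.

    Position i is paired with position n-1-i; for odd-length boxes the
    central base is treated as an unpaired spacer and ignored.
    """
    box = box.upper()
    n = len(box)
    return sum(
        1 for i in range(n // 2)
        if COMPLEMENT.get(box[i]) != box[n - 1 - i]
    )


def is_palindromic(box):
    """True if the box reads the same on both strands (a perfect inverted repeat)."""
    return palindrome_mismatches(box) == 0


print(is_palindromic("GAATTC"))         # EcoRI site -> True
print(is_palindromic("GATATC"))         # EcoRV site -> True
print(palindrome_mismatches("GAATTA"))  # one broken pair -> 1
```

Counting mismatched pairs rather than returning a yes/no answer is useful for degenerate natural sites, which are often only imperfectly palindromic.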
Distribution of candidate C protein genes in loci containing RM and phage-related genes
| Genomic loci | RM-related ORFs | Phage-related ORFs | Both | Total |
|---|---|---|---|---|
| New C protein family genes | 39 (23%) | 115 (68%) | 27 (16%) | 169 (100%) |
| REBASE C protein genes with predicted binding sites | 32 (100%) | 26 (78%) | 26 (81%) | 32 (100%) |
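The percentages in this table can be recomputed from the counts. The sketch below assumes that the RM-related and phage-related columns each include the loci also counted under "Both" (the only reading consistent with 39 + 115 + 27 exceeding 169); under that assumption, inclusion–exclusion gives the number of loci with neither kind of ORF. The function names are hypothetical.

```python
def row_percentages(counts, total):
    """Express each count as a whole-number percentage of the row total."""
    return {column: round(100 * n / total) for column, n in counts.items()}


def loci_with_neither(rm, phage, both, total):
    """Inclusion-exclusion: loci with no RM-related and no phage-related ORFs."""
    return total - (rm + phage - both)


new = {"RM-related": 39, "Phage-related": 115, "Both": 27}
rebase = {"RM-related": 32, "Phage-related": 26, "Both": 26}

print(row_percentages(new, 169))            # {'RM-related': 23, 'Phage-related': 68, 'Both': 16}
print(row_percentages(rebase, 32))          # {'RM-related': 100, 'Phage-related': 81, 'Both': 81}
print(loci_with_neither(39, 115, 27, 169))  # 42
```

The first row reproduces the tabulated 23%, 68% and 16%, and 42 of the 169 new loci would contain neither kind of ORF. In the REBASE row, 26 of 32 is 81%, so the listed 78% for phage-related ORFs does not match its count (78% of 32 would be 25 loci); one of the two values in that cell may be a transcription error.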
Loci containing two C protein genes
| Putative C protein gene 1 | Start | End | Binding motif class of C protein 1 | Putative C protein gene 2 | Start | End | Binding motif class of C protein 2 | GenBank ID | Distance (bp) |
|---|---|---|---|---|---|---|---|---|---|
| Mlo243 | 54253 | 54495 | Motif 7 | Mlo093 | 54714 | 54923 | Motif 4 | AL672113 | 219 |
| Bxe227 | 453842 | 454198 | Motif 8 | Bxe226 | 453455 | 453781 | Motif 8 | CP000272 | 61 |
| Ccr197 | 2919529 | 2919741 | Motif 2b | Ccr196 | 2919837 | 2920043 | Motif 2b | AE005673 | 96 |
| Mlo092 | 4943838 | 4944044 | Motif 4 | Mlo105 | 4944350 | 4944562 | Motif 1 | BA000012 | 306 |
| Plu192 | 157322 | 157555 | Motif 2 | Plu191 | 154190 | 154423 | Motif 2 | BX571873 | 2899 |
| Brl149 | 2459664 | 2459906 | Motif 5 | Brl147 | 2458276 | 2458518 | Motif 5 | CP000085 | 1146 |
| Bps148 | 521079 | 521321 | Motif 5 | Bps146 | 522475 | 522717 | Motif 5 | BX571966 | 1154 |
Genomic co-occurrence of C protein genes and known REBASE R–M systems
| No. | Putative system ID | Motif | Organism description | GenBank ID | Putative C protein start–end | REBASE system ID | Gene annotation | Gene start–end | Distance (bp) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Pst154 | Motif 1 | Pseudomonas stutzeri A1501, complete genome | CP000304.1 | 749928–750170 | PstA1501ORF647P | Methylase | 743080–745056 | 4872 |
| 2 | Lho088 | Motif 2c | Laribacter hongkongensis plasmid pHLHK8, complete sequence | AY858987.1 | 1735–1983 | LhopHLHKP | Restrictase | 3832–4806 | 1849 |
| 3 | Nha178 | Motif 2b | Nitrobacter hamburgensis X14, complete genome | CP000319.1 | 2773100–2773321 | NhaXORF2515P | Methylase | 2770295–2771002 | 2098 |
| 4 | Hch122 | Motif 2b | Hahella chejuensis KCTC 2396, complete genome | CP000155.1 | 2547264–2547500 | HchORF2488P | Methylase | 2547622–2548812 | 122 |
| 5 | Hne199 | Motif 2 | Hyphomonas neptunium ATCC 15444, complete genome | CP000158.1 | 2696435–2696641 | HneORF2545P | Unannotated short protein | 2696432–2696641 | 0 |
| 6 | Swi193 | Motif 2 | Sphingomonas wittichii RW1, complete genome | CP000699.1 | 1752458–1752658 | SwiRWORF1578P | Methylase | 1754713–1756164 | 2055 |
| 7 | Nwi091 | Motif 4 | Nitrobacter winogradskyi Nb-255, complete genome | CP000115.1 | 922287–922499 | NwiORF847P | Methylase | 924490–925248 | 1991 |
| 8 | Ent115 (C.Esp1396I) | Motif 2 | Enterobacter sp. RFL1396 plasmid pEsp1396, complete sequence | AF527822.1 | 1481–1717 | Esp1396I | Recently annotated C protein | 1481–1717 | 0 |
| 9 | Lpn060 | Motif 6b | Legionella pneumophila str. Corby, complete genome | CP000675.1 | 226951–227193 | LpnCMrrP | Restrictase | 233982–234959 | 6789 |
| 10 | Cef187 | Motif 2 | Corynebacterium efficiens plasmid pCE3 DNA, complete sequence | AP005226.1 | 16363–16605 | CefpCE3MrrP | Restrictase | 10596–11558 | 5047 |
| 11 | Gur068 | Motif 2 | Geobacter uraniumreducens Rf4, complete genome | CP000698.1 | 1337269–1337499 | GurRORF1135P | HTH-domain protein | 1337269–1337499 | 0 |
| 12 | Pgi032 | Motif 9 | Porphyromonas gingivalis W83, complete genome | AE015924.1 | 595797–595997 | PgiTORF544P | Methylase | 596192–598138 | 195 |
| 13 | Bth033 | Motif 9 | Bacteroides thetaiotaomicron VPI-5482, complete genome | AE015928.1 | 5932179–5932379 | BthVORF4518P | Restrictase | 5934256–5936961 | 1877 |
| 14 | Nha244 | Motif 7 | Nitrobacter hamburgensis X14, complete genome | CP000319.1 | 883322–883612 | NhaXORF803P | Methylase | 885888–887198 | 2276 |
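The Distance columns of the two tables above appear to follow one convention: the start coordinate of the downstream gene minus the end coordinate of the upstream gene, with overlapping genes reported as 0. The sketch below encodes that inferred convention (an assumption, not a stated definition) for forward-strand (start, end) coordinates with start < end, as listed in the tables, and checks it against several rows; `distance_bp` is a hypothetical name.

```python
def distance_bp(a, b):
    """Distance between two genes given as (start, end) coordinates.

    Computed as the start of the downstream gene minus the end of the
    upstream gene, with overlapping genes reported as 0.
    """
    (_, e1), (s2, _) = sorted([a, b])
    return max(0, s2 - e1)


# (C protein gene, partner gene, tabulated distance) taken from the two tables above.
rows = [
    ((54253, 54495), (54714, 54923), 219),          # Mlo243 / Mlo093
    ((453842, 454198), (453455, 453781), 61),       # Bxe227 / Bxe226
    ((157322, 157555), (154190, 154423), 2899),     # Plu192 / Plu191
    ((2547264, 2547500), (2547622, 2548812), 122),  # Hch122 / HchORF2488P
    ((2696435, 2696641), (2696432, 2696641), 0),    # Hne199 / HneORF2545P (overlap)
    ((226951, 227193), (233982, 234959), 6789),     # Lpn060 / LpnCMrrP
    ((16363, 16605), (10596, 11558), 5047),         # Cef187 / CefpCE3MrrP
]

for c_gene, partner, listed in rows:
    computed = distance_bp(c_gene, partner)
    flag = "" if computed == listed else "  <- differs from table"
    print(computed, listed, flag)
```

This convention reproduces every Distance value in both tables except row 10 (Cef187), where it gives 4805 bp; the listed 5047 equals 16605 − 11558, the difference between the two end coordinates.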
Figure 3. (a) Logos of the consensus sequences of motifs 1–4. Palindromic boxes are marked with light green squares and underlined with colored arrows. The upper logo represents the 5′ (distal) copy, and the lower logo represents the 3′ (proximal) copy. (b) Logos of the consensus sequences of palindromic motifs 7, 8 and 10. Palindromic boxes are marked with light green squares and underlined with colored arrows. The upper logo corresponds to annotated binding sites, and the lower logo corresponds to their weak downstream copies.
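In double-box motifs the same box occurs twice upstream of the gene, as a 5′ (distal) and a 3′ (proximal) copy. One simple way to locate repeated copies of a degenerate box is to convert an IUPAC consensus into a regular expression. Exact consensus matching is a cruder stand-in for the logo-based scoring these captions imply, and the consensus `ACTNAGT`, the sequence and the function name `find_box_copies` are hypothetical, chosen only for illustration.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_box_copies(sequence, consensus):
    """Return 0-based start positions of every (possibly overlapping) consensus match."""
    core = "".join(IUPAC[c] for c in consensus.upper())
    # Wrapping the pattern in a lookahead makes finditer report overlapping hits.
    pattern = re.compile(f"(?=({core}))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Hypothetical box consensus and upstream fragment, for illustration only.
upstream = "GGACTAAGTCCCCACTGAGTGG"
copies = find_box_copies(upstream, "ACTNAGT")
print(copies)                 # [2, 13]
print(copies[1] - copies[0])  # 11 (start-to-start spacing of the two copies)
```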
Figure 4. (a) Structure of the region upstream of a typical C.PvuII-like C protein gene. The binding site and the ATG start codon are shown in black; the palindromic elements of the site are underlined. (b) Histogram of distances between the candidate binding sites and the start codons of C protein genes. Only C.PvuII-like motifs (1, 2, 2b, 2c, 3, 4, 5, 6, 6b) are considered. Horizontal axis and numbers above the bars: distances; vertical axis: frequency of each distance.
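A histogram like the one in panel (b) amounts to tallying site-to-start-codon distances. The sketch below assumes that distance is measured as the ATG coordinate minus the 3′ end coordinate of the binding site on the gene's strand; the caption does not state the exact convention, and the coordinates and function names are hypothetical.

```python
from collections import Counter


def distance_histogram(site_ends, start_codons):
    """Tally distances from the 3' end of each binding site to its ATG start codon.

    Assumes 1-based coordinates on the gene's strand, paired by index, with the
    site lying upstream of the start codon; distance = ATG position - site end.
    """
    return Counter(atg - end for end, atg in zip(site_ends, start_codons))


def render(hist):
    """Render a text histogram: distance, a bar of '#', and the count."""
    return "\n".join(f"{d:>4} {'#' * hist[d]} {hist[d]}" for d in sorted(hist))


# Hypothetical coordinates, for illustration only.
hist = distance_histogram([100, 210, 330, 455], [112, 222, 341, 467])
print(render(hist))
```

A sharp peak at a short, fixed distance in such a histogram is what one would expect if binding sites overlap the promoter region immediately upstream of a leaderless start codon.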