Michael Dekhtyar, Amelie Morin, Vehary Sakanyan.
Abstract
BACKGROUND: Bacterial promoters, which increase the efficiency of gene expression, differ from other promoters by several characteristics. This difference, not yet widely exploited in bioinformatics, looks promising for the development of relevant computational tools to search for strong promoters in bacterial genomes.
Year: 2008 PMID: 18471287 PMCID: PMC2412878 DOI: 10.1186/1471-2105-9-233
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Text-format presentation of strong promoter candidates.
Figure 2. Diagram of the fusion DNA constructs used to express the . The argC gene was amplified with forward 5'-GGAGGGGGAACATATGATGAA and reverse 5'-GGACCACCGCGCTACTGCCG primers from pHAV2 [32] by conserving a 112-bp downstream region carrying transcriptional terminators of the vector DNA.
A+T content of bacterial genomes and of 300-bp regions located upstream of genes, and the percentage of strong promoter candidates predicted in 300-bp real genomic regions and in randomly generated regions of the same A+T content.
| N° | (A+T)% of bacterial genomes* | (A+T)% of 300-bp genomic regions | % of candidates in 300-bp genomic regions | % of candidates in 300-bp random sequences | |
| 1 | 32.99 | 34.19 | 0.19 | 0 | |
| 2 | 32.77 | 34.40 | 0.05 | 0 | |
| 3 | 34.51 | 34.50 | 0.20 | 0 | |
| 4 | 33.44 | 35.38 | 0.27 | 0 | |
| 5 | 34.93 | 35.64 | 0.05 | 0 | |
| 6 | 34.39 | 35.69 | 0.18 | 0 | |
| 7 | 35.23 | 36.02 | 0.05 | 0 | |
| 8 | 37.25 | 39.09 | 0.13 | 0 | |
| 9 | 37.27 | 39.66 | 0.24 | 0 | |
| 10 | 42.20 | 43.14 | 0.29 | 0 | |
| 11 | 45.64 | 43.20 | 0.74 | 0 | |
| 12 | 42.84 | 45.73 | 1.02 | 0 | |
| 13 | 47.22 | 47.01 | 0.37 | 0.3 | |
| 14 | 43.47 | 47.50 | 1.50 | 0.31 | |
| 15 | 47.78 | 51.08 | 3.54 | 0.7 | |
| 16 | 48.47 | 52.20 | 5.03 | 0.93 | |
| 17 | 49.50 | 52.54 | 4.80 | 1.1 | |
| 18 | 50.46 | 53.11 | 4.26 | 1.23 | |
| 19 | 52.28 | 53.71 | 2.89 | 1.7 | |
| 20 | 53.75 | 54.66 | 3.27 | 2.15 | |
| 21 | 52.30 | 54.94 | 3.22 | 2.4 | |
| 22 | 52.36 | 55.77 | 6.78 | 3.0 | |
| 23 | 57.73 | 57.70 | 4.72 | 4.15 | |
| 24 | 56.31 | 58.65 | 8.70 | 5.1 | |
| 25 | 56.48 | 59.30 | 10.28 | 5.8 | |
| 26 | 59.99 | 61.71 | 5.25 | 9.7 | |
| 27 | 59.69 | 61.73 | 9.01 | 9.7 | |
| 28 | 59.60 | 62.31 | 11.42 | 10.9 | |
| 29 | 59.42 | 62.80 | 14.77 | 12.5 | |
| 30 | 60.30 | 62.88 | 15.83 | 13.0 | |
| 31 | 61.49 | 63.99 | 16.87 | 14.3 | |
| 32 | 62.43 | 64.11 | 17.74 | 14.8 | |
| 33 | 62.56 | 64.30 | 12.07 | 15.5 | |
| 34 | 61.85 | 64.45 | 15.61 | 16.0 | |
| 35 | 68.31 | 69.50 | 15.99 | 35.0 | |
| 36 | 67.16 | 69.71 | 35.25 | 36.1 | |
| 37 | 69.45 | 71.36 | 32.07 | 41.8 | |
| 38 | 69.07 | 71.83 | 45.08 | 44.2 | |
| 39 | 71.40 | 73.18 | 40.00 | 54.1 | |
| 40 | 71.00 | 73.26 | 50.06 | 55.2 | |
| 41 | 71.43 | 74.74 | 53.94 | 58.1 | |
| 42 | 74.50 | 76.05 | 50.85 | 65.35 | |
| 43 | 74.67 | 78.36 | 58.05 | 74.5 | |
* A+T content of bacterial genomes was calculated from corresponding genomic DNA sequences available in gene banks.
** Similar values were found for the E. coli K12 genome.
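The two quantities compared in this table, the A+T content of a sequence and a random 300-bp region with the same A+T content, can be illustrated with a minimal Python sketch. The function names are hypothetical, and the random model (independent bases, with A and T sharing the A+T probability equally, as do G and C) is an assumption; the record does not state how the authors generated their random sequences.

```python
import random


def at_content(seq: str) -> float:
    """Return the A+T content of a DNA sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "AT" for base in seq) / len(seq)


def random_region(at_percent: float, length: int = 300, seed=None) -> str:
    """Draw a random sequence whose expected A+T content is at_percent.

    Assumption: bases are independent; A and T are equally likely,
    and so are G and C.
    """
    rng = random.Random(seed)
    p_at = at_percent / 100.0
    weights = [p_at / 2, p_at / 2, (1 - p_at) / 2, (1 - p_at) / 2]
    return "".join(rng.choices("ATGC", weights=weights, k=length))
```

The A+T content of any single random region fluctuates around the requested value, so a comparison like the one in the table would average over many such regions.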
Figure 3. The number of strong promoter candidate sequences as a function of the A+T content of bacterial genomes. For the score parameters sUp = 13, s35 = 5.5, s10 = 4.5 and the constants c1 = 0.22 and c2 = -11.7, the plot shows the straight line of the "exponential law" (thin line), which closely approximates the curve ln [N(A+T)] (thick line). The logarithm of the percentage of strong promoter candidates in real genomes is shown by (○).
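The straight-line fit mentioned in this caption can be read as ln N ≈ c1·(A+T) + c2. The following Python sketch evaluates that reading under two assumptions that are not stated explicitly in the caption: that A+T is expressed in percent, and that N is the percentage of strong promoter candidates. The function name is hypothetical.

```python
import math

C1 = 0.22    # slope c1, from the Figure 3 caption
C2 = -11.7   # intercept c2, from the Figure 3 caption


def predicted_percent(at_percent: float) -> float:
    """Percentage of candidates implied by ln N = c1 * (A+T) + c2.

    Assumption: at_percent is the A+T content in percent and N is a
    percentage of 300-bp regions.
    """
    return math.exp(C1 * at_percent + C2)
```

Because the fit is a straight line in log space, the predicted value grows without bound and exceeds 100% for very A+T-rich inputs, so under these assumptions it can only be an approximation over a limited range of A+T content.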
Number of sequences reminiscent of strong promoters in regions located upstream and downstream of the initiation codon of genes in bacterial genomes.
| N° | Length, bp | Number of genes | Upstream region | Downstream region | |
| 1 | 2648638 | 2681 | 5 | 1 | |
| 2 | 4016947 | 3787 | 2 | 0 | |
| 3 | 3716413 | 3477 | 7 | 0 | |
| 4 | 6264403 | 5570 | 15 | 2 | |
| 5 | 5076188 | 4197 | 2 | 0 | |
| 6 | 4411529 | 3922 | 7 | 0 | |
| 7 | 5175554 | 4344 | 2 | 0 | |
| 8 | 7036074 | 6693 | 9 | 0 | |
| 9 | 3654135 | 3375 | 8 | 0 | |
| 10 | 3268203 | 2770 | 8 | 1 | |
| 11 | 2841581 | 2701 | 20 | 1 | |
| 12 | 2117144 | 2059 | 21 | 4 | |
| 13 | 1138011 | 1083 | 4 | 3 | |
| 14 | 2154946 | 2329 | 35 | 13 | |
| 15 | 4857432 | 4608 | 163 | 61 | |
| 16 | 2272351 | 2226 | 112 | 45 | |
| 17 | 5528445 | 5478 | 263 | 79 | |
| 18 | 1751377 | 1900 | 81 | 24 | |
| 19 | 3573470 | 1074 | 31 | 6 | |
| 20 | 1860725 | 1926 | 63 | 10 | |
| 21 | 2961149 | 2887 | 93 | 37 | |
| 22 | 4653728 | 4042 | 274 | 61 | |
| 23 | 1551335 | 1503 | 71 | 37 | |
| 24 | 4202353 | 4125 | 359 | 87 | |
| 25 | 4214814 | 4182 | 430 | 111 | |
| 26 | 816394 | 705 | 37 | 14 | |
| 27 | 1069411 | 954 | 86 | 31 | |
| 28 | 2257487 | 1996 | 228 | 64 | |
| 29 | 1226565 | 1097 | 162 | 51 | |
| 30 | 2160837 | 2306 | 365 | 156 | |
| 31 | 1852441 | 1731 | 292 | 115 | |
| 32 | 2689445 | 2632 | 467 | 248 | |
| 33 | 3011208 | 3529 | 426 | 229 | |
| 34 | 1830138 | 1775 | 277 | 94 | |
| 35 | 580074 | 519 | 83 | 63 | |
| 36 | 2814816 | 2638 | 930 | 418 | |
| 37 | 1641481 | 1684 | 540 | 353 | |
| 38 | 3940880 | 3738 | 1685 | 916 | |
| 39 | 910724 | 875 | 350 | 292 | |
| 40 | 1111523 | 885 | 443 | 252 | |
| 41 | 3031430 | 2779 | 1499 | 772 | |
| 42 | 751719 | 645 | 328 | 236 | |
| 43 | 641454 | 584 | 339 | 225 |
Strong promoter candidates identified in T. maritima MSB8*.
| Downstream located gene(s)** | Strong promoter candidate sequence*** | Total score**** |
| TM_0013 | ACAATTTTTATCTGATATTTTTTTCACAttcaccatagtcgatTATAAC | 0.8475 |
| TM_0110 | ACCTTGATTTTAAATTATTTCCTGCATataattaatgtgaaCATAAT | 0.805 |
| TM_0280 | GCAATATTTGTCCAGAAATATACTTGATTtaacaaaaatggacaatgTAGAAT | 0.88 |
| TM_0339 | AGAAAAATTTTTTTGGAGACTTGACAaaatatttggtaatattcTAAAAT | 0.8975 |
| TM_0373 | TTTTACAAATTCTCATACGACCCCTTGACAtcccattctgtgcctcacTATAAT | 0.94 |
| TM_0657 | TAATGTAACTATTCAAAATCATTACAgtttataattatgtggTAAAAT | 0.8125 |
| TM_0682 | GAATACTCTGTCAGAAAGATTCGTGATCAtcttttcacctcgtgtagTATAAT | 0.915 |
| TM_1016 | TAAAAATTTCATGAAAAATTTCTTGAATtctgtgaccaaaagggTTTAAT | 0.9175 |
| TM_1167 | GAAAAGTTACAGAAAAAGTACCCTTGTTAtctgaaggtgaaaaatggTAAAAT | 0.865 |
| TM_t27 | TCATTCATTTTACCATCGAGTCCACTTGAAAttcaggaaggtatgtagTACAAT | 0.8675 |
| TM_1205 | GTTTTTTATCTCTACTAATTAGGTTGACAttattgattcagaagagTAAAAT | 0.88 |
| TM_1318 | AGAAACAATTTTGGAATTTGATCCATGGACAttattacctttaatgGATAAT | 0.8325 |
| TM_t34 | AGAAAAATTTCCGATGAGGGTACTTGAAAagggtgaaaacctgtgcTATTAT | 0.855 |
| TM_1429 | GCATTGTGATTTTTGTAACTATATTGACAtaaaacaaaaggtttgtTATAAT | 0.9175 |
| TM_t39 | AAAAATAAAAAGTCCTTCTGGGGATTGACCatatttcgtactcatgcTATAAT | 0.8725 |
| TM_1667 | AAGTATATCCTAAAAAAATATTTGAAAtgataccccaagattttaTATAAT | 0.905 |
| TM_1780 | GAAAATAACAGTGAAAAAACACTTCATAtaaatcatttcaaataatccTATAAT | 0.875 |
| TM_0150 (complem.) | AAAAATGTAAAAGAAGAGAAACTTGAATctttgaaaaacatcaTATACT | 0.855 |
| TM_0477 (complem.) | ACAAAAAAACTTTAGAAAACTCTTGAATttcctttggacgggatggTATAAT | 0.9425 |
| TM_0625 (complem.) | ATATTCGTTCTGAATGAAGGTTTTACATttcatccaaattattttggtTATAGT | 0.805 |
| TM_0656 (complem.) | AACTTAAGTAACACAAAATTAACCTTGACAacgaaaggggggtgggTATAAT | 0.8925 |
| TM_0755 (complem.) | AGAAATTCTTTGAAAACTATCTAGAATtcaaacgtcgcttttccagTATACT | 0.85 |
| TM_0971 (complem.) | AAATATAAATCTGAATTTACTAAATTCACAtttagcaaatcatcattTATAAT | 0.895 |
| TM_1015 (complem.) | ATAATTTTTGCAATTTTATCTCTATACAtctcacatcacctccggctaTATATT | 0.855 |
| TM_1067 (complem.) | GGATTATTTTATACTGAAAGCCCTTGACCttgttgtatgtttgttgaTATTAT | 0.92 |
| TM_1271 (complem.) | GGGTGATATTTCAACATTAAAATCTTGACAttctaccatgtcaaggtgTATAAT | 0.9525 |
| TM_1286 (complem.) | GTTTATGCAAATTTTCCTTCTGTTAACCAtgttacacacaacatgtggTATCAT | 0.8625 |
| TM_t31 (complem.) | AAGTTTTGATTTTTGTAAGGTTGAAAtaatctttctgacgatgtggTATAAT | 0.86 |
| TM_1412 (complem.) | ATATGGAAGTTCAAAAAACATCTTGCTTtcagagtgtgtttgtggTATAAA | 0.865 |
| TM_1419 (complem.) | AGAAAACTATTGGTAAAGCACTTGAAAtatatgactgtaaaaacgtgaTATAAT | 0.87 |
| TM_1439 (complem.) | TAGTATTCTACCCTAAACTCTTTCAttctggattcgataatTGTAAT | 0.835 |
| TM_t45 (complem.) | AAAAGAAGGAAGAAAAATGAAAACTTGAACaaggaaacgattgagtgTATAAT | 0.865 |
| TM_1786 (complem.) | GTATTATTCATTCTAAAAACTTGAAActgaccaaataaagtatTAGAAT | 0.855 |
| TM_1850 (complem.) | AAACGATTCTTCTAAAATGTGTTCTTGATTtgtatcactgttatgtTATAAA | 0.855 |
| TM_t11 | GAAAAGGGTTATCAGGAAATATCTTGAATagaaaaggttcgtgtgtTAAAAT | 0.8825 |
| TM_1272 | TTTCACATTTTGCATTATACACCTTGACAtggtagaatgtcaagatTTTAAT | 0.8975 |
| TM_0032 (complem.) | AATATTAGAATTTGAACTATAATTCGAAAtaattcctgttattcactCATAAT | 0.86 |
| TM_1490 (complem.) | GGTGAAAATATGCCCAGGAAACGTTTGACTggaatagttgtgagcgaTAAAAT | 0.845 |
* The genome annotation of T. maritima AE000512 used for analysis was dated 28th December 2005.
** The gene order for the first 34 candidate sequences is shown on both strands as described in the annotated genome [49]. The complementary strand is noted as (complem.).
*** The spacer between -35 and -10 sites and the region located downstream of the -10 site are shown in lowercase; the initiation codons of the ORFs are shown in capital letters at the end of the corresponding sequences.
**** The first 34 candidate sequences were detected with the score parameters sUP = 13, s35 = 6, s10 = 5; TM_t11, TM_1272, TM_0032 and TM_1490 were detected with sUP = 12, s35 = 6, s10 = 5 and used for analysis in a cell-free system (see Fig. 4).
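The case convention described in footnote *** makes it possible to split a listed candidate into its parts mechanically. The Python sketch below assumes each sequence has the form seen in the table rows above: an uppercase run (upstream region through the -35 site), a lowercase spacer, and a trailing uppercase run (the -10 site). The function name and the dictionary keys are hypothetical, and the sketch does not try to locate the -35 hexamer inside the first uppercase run.

```python
import re

# Uppercase run, lowercase spacer, uppercase run: the layout of the
# candidate sequences as they appear in the table (an assumption).
_CANDIDATE = re.compile(r"^([ACGT]+)([acgt]+)([ACGT]+)$")


def split_candidate(seq: str) -> dict:
    """Split a candidate sequence into its case-delimited segments."""
    match = _CANDIDATE.match(seq)
    if match is None:
        raise ValueError("sequence does not follow the upper/lower/upper layout")
    upstream_and_35, spacer, minus10 = match.groups()
    return {
        "upstream_and_35": upstream_and_35,
        "spacer": spacer,
        "spacer_length": len(spacer),
        "minus10": minus10,
    }


parts = split_candidate(
    "TTTTACAAATTCTCATACGACCCCTTGACAtcccattctgtgcctcacTATAAT"  # TM_0373
)
```

For the TM_0373 entry, the first segment ends in TTGACA and the last segment is TATAAT, with the lowercase spacer lying between them.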
Figure 4. Assessment of the strength of . Lanes: 1 – Ptac (reference); 2 – PTM0032; 3 – PTM0373; 4 – PTM0477; 5 – PTM1016; 6 – PTM1067; 7 – PTM1271; 8 – PTM1272; 9 – PTM1429; 10 – PTM1490; 11 – PTM1667; 12 – PTM1780; 13 – PTMt45; 14 – PTMt11; 15 – PargC. Similar results were obtained in 3 experiments.
Figure 5. Organization of strong bacterial promoters. (A) Alignment of 13 promoter candidates of T. maritima; (B) consensus sequences of T. maritima and E. coli strong promoters (the consensus of the E. coli UP element is described in [26, 27]); (C) the strong promoters Ptac and PargC, used as references in this study.