Abstract
BACKGROUND: Identification of co-regulated genes is essential for elucidating transcriptional regulatory networks and the function of uncharacterized genes. Although co-regulated genes should have at least one common sequence element, it is generally difficult to identify these genes from the presence of this element because it is very easily obscured by noise. To overcome this problem, we used conserved information from three closely related species: Bacillus subtilis, B. halodurans and B. stearothermophilus.
Year: 2001 PMID: 11737947 PMCID: PMC60312 DOI: 10.1186/gb-2001-2-11-research0048
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Histogram of PCE scores calculated from sequence alignments. (a) Three or (b) two sequences were aligned. Green bars show the scores of actual PCEs; yellow bars show the scores of spurious PCEs generated by joining upstream regions with unrelated coding regions. For the yellow bars, values averaged over five trials are shown with their error bars.
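The averaged control bars described in the caption can be sketched as follows. This is a minimal illustration, not the authors' procedure: the bin width, the toy scores, and the function names `histogram` and `average_trials` are hypothetical, and the error bar is assumed here to be the sample standard deviation, which the caption does not specify.

```python
from statistics import mean, stdev

def histogram(scores, bin_width, n_bins):
    """Count non-negative scores per bin; scores past the last bin go into it."""
    counts = [0] * n_bins
    for s in scores:
        idx = min(int(s // bin_width), n_bins - 1)
        counts[idx] += 1
    return counts

def average_trials(trial_scores, bin_width, n_bins):
    """Per-bin (mean, sample standard deviation) across control trials."""
    per_trial = [histogram(t, bin_width, n_bins) for t in trial_scores]
    return [(mean(col), stdev(col)) for col in zip(*per_trial)]

# Five toy control trials (made-up scores, for illustration only).
toy_trials = [[1, 2, 11], [3, 12, 13], [4, 5, 6], [7, 14], [8, 9, 15]]
summary = average_trials(toy_trials, bin_width=10, n_bins=2)
for bin_index, (avg, err) in enumerate(summary):
    print(bin_index, round(avg, 2), round(err, 2))
```

Each tuple in `summary` gives the bar height and the error-bar half-length for one score bin under these assumptions.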
Correspondence between known transcription factor binding sites and PCEs
| Factor name | Orthologs* | Number of known sites† | Number of sites to be detected‡ | Number of overlaps§ |
| AbrB | H S | 11 (1) | 7 | 3 |
| AhrC | H S | 5 (1) | 3 | 2 |
| AraR | H | 5 (1) | 2 | 1 |
| BirA | H S | 1 | 0 | - |
| BltR | H | 1 | 1 | 0 |
| BmrR | None | 1 | 0 | - |
| CcpA | H S | 33 (17) | 11 | 6 |
| CodY | H S | 2 | 1 | 1 |
| ComA | None | 5 | 2 | 0 |
| ComK | None | 1 | 0 | - |
| CtsR | H S | 6 | 6 | 4 |
| DegU | H S | 14 (3) | 5 | 1 |
| DeoR | H | 1 | 1 | 0 |
| LexA | H | 8 | 6 | 3 |
| ExuR | S | 1 | 1 | 0 |
| Fnr | H | 2 | 2 | 1 |
| GerE | H S | 21 (2) | 7 | 0 |
| GlnR | S | 6 | 3 | 0 |
| GltC | H S | 3 | 3 | 0 |
| GltR | None | 4 | 2 | 0 |
| GntR | None | 1 | 0 | - |
| Hpr | H | 8 (1) | 3 | 0 |
| HrcA | H S | 2 | 2 | 2 |
| IolR | None | 2 | 1 | 0 |
| LevR | S | 3 | 0 | - |
| LicT | H | 1 | 1 | 1 |
| LrpC | H | 1 | 1 | 0 |
| Mta | H S | 3 | 2 | 1 |
| MtrB | H S | 1 | 1 | 0 |
| PhoP | H S | 6 (2) | 2 | 0 |
| PyrR | H S | 3 | 3 | 3 |
| PurR | H S | 1 | 0 | - |
| RibC | H S | 1 | 0 | - |
| RocR | H | 4 | 2 | 0 |
| SacT | None | 1 | 0 | - |
| SacY | None | 1 | 0 | - |
| SenS | None | 1 | 0 | - |
| SinR | H | 6 | 5 | 5 |
| Spo0A | H S | 22 (1) | 18 | 10 |
| SpoIIID | H S | 12 (5) | 6 | 2 |
| TnrA | H S | 10 | 6 | 1 |
| TreR | H S | 2 | 2 | 2 |
| Xre | H | 4 | 0 | - |
| XylR | H S | 1 | 1 | 1 |
| MntR | H S | 2 | 1 | 1 |
| Zur | H S | 2 | 2 | 1 |
| Total | | 232 (34) | 122 | 52 |
*Name(s) of species having a gene orthologous to the B. subtilis gene. H: B. halodurans; S: B. stearothermophilus.
†Total number of experimentally verified binding sites of < 50 bp. The number of binding sites in the coding region is shown in parentheses.
‡Number of known binding sites in the region analyzed in this work.
§Number of analyzed sites overlapping with PCEs over 5 bp.
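The overlap count in the last column can be illustrated with a short sketch. It assumes inclusive (start, end) coordinates and treats "over 5 bp" as an overlap of 5 bp or more; both are assumptions, as are the toy coordinates and the function names. The totals 122 and 52 are taken from the Total row of the table.

```python
def overlap_bp(a, b):
    """Shared length in bp of two inclusive (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def count_sites_overlapping_pces(sites, pces, min_overlap=5):
    """Count sites sharing at least min_overlap bp with any PCE (assumed threshold)."""
    return sum(
        1 for site in sites
        if any(overlap_bp(site, pce) >= min_overlap for pce in pces)
    )

# Toy coordinates (hypothetical, not from the paper).
toy_sites = [(10, 25), (40, 52), (100, 110)]
toy_pces = [(20, 35), (50, 60)]
hits = count_sites_overlapping_pces(toy_sites, toy_pces)

# Totals from the table: 122 sites in the analyzed regions, 52 overlapping a PCE.
total_analyzed, total_overlapping = 122, 52
recovered_fraction = total_overlapping / total_analyzed
print(hits, round(recovered_fraction, 3))  # prints: 1 0.426
```

With the table totals, roughly 43% of the known sites in the analyzed regions overlap a PCE.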
Figure 2. Histogram of similarity scores used during the clustering process. Red bars represent clustering of PCEs within the upstream regions of orthologous genes, green bars the clustering of PCEs with randomly shuffled sequence, and blue bars the clustering of PCEs identified when the upstream regions are linked to unrelated coding regions. For the green and blue bars, average values are shown with their error bars.
Comparison of some typical regulons with our results
| Regulon | Gene* | Cluster information† | Sequence of PCE‡ |
| | | 34 | AGTCCAGAGAGGCTGAGAAGGA-T |
| | | 34 | AATCCAGAGAGGTTG |
| | | C | CAGAGAGGCTT |
| S-box regulon (regulator: unknown) | | 1,11 | |
| | | 1 | |
| | | 1,5,11 | |
| | | 5,11 | |
| | | 5 | |
| | | 5,11 | |
| | | 5,11 | |
| | | B | |
| | | A | |
| | | A | |
| | | A | |
| Hypothetical xanthine regulon (regulator: unknown) | | 20 | |
| | | 14,20 | |
| | | 14 | |
| Aminoacyl-tRNA synthetases (regulator: uncharged tRNA) | | 2 | AGGGTGGCAACGCGAG |
| | | 2 | AAAAAAGGTGGTACCGCGA |
| | | 2 | GAAAAAAGGGTGGAACCACGA |
| | | 2 | TTAGTAGGGTGGTACCGCGA |
| | | 2 | AGGGTGGTACCGCGGG |
| | | 2 | AGGGTGGTACCGCGTG |
| | | 2 | AGGGTGGTACCGCGGAAAG |
| | | 2 | AATAAGGGTGGTACCGCG |
| | | 2 | AACTAGGGTGGCACCACGGGTAT.. |
| | | 2 | GCAACTAGGGTGGAACCGCGGG |
| | | 2 | AGGGTGGTACCGCGAG-A |
| | | 2 | AGGGTGGTACCGCGAGA |
| | | 2 | AAGGTGGTACCACGGA |
| | | D | C-AAACAGAGTGGAACCGCG |
| | | C | AGGGTGG |
| | | A | |
| Heat-shock regulon (regulator: CtsR) | | 6 | |
| | | 6 | |
| | | 6 | GAAAGTCAAAGTCAGGCAT |
| | | B | |
| CcpA regulon§ (regulator: CcpA) | | 47 | |
| | | 47 | |
| | | 47 | ...TCTT-TAAAGCGCTTTCAT |
| | | 47 | GACCAAAGCGTTTTT |
| | | 59 | |
| | | 59 | TATAGAATGAAAGCGC |
| | | D | |
| | | D | |
| | | D | |
| | | B | |
| | | B | |
| | | B | |
| | | B | |
| | | E | |
| | | E | |
| | | E | |
| | | E | |
| | | E | |
| | | E | |
| | | E | |
| | | E | |
| | | E | |
| | | E | |
| | | E | |
| | | E | |
| | | E | |
| | | A | |
| | | A | |
| | | A | |
| | | A | |
| | | A | |
| | | A | |
| | | A | |
*Probable new members identified by our analysis are shown with an asterisk.
†Cluster number(s) are shown when available; otherwise, one of the following situation codes is shown: A, orthologous genes not found; B, no overlaps between known binding site and PCE; C, PCE overlaps with known site but is too short; D, PCE overlaps with known site but is slightly different; E, binding site exists within the coding region.
‡PCE sequence in B. subtilis. The region overlapping with a known binding site is shown in bold.
§CcpA-dependent genes identified by a systematic experiment [31] are not included.
Figure 3. Post-transcriptional regulation of the pyr operon. (a) The three attenuation regions in the operon. (b) Two alternative secondary structures of the transcript of each attenuation region. In the presence of high UMP concentration, PyrR binds to the anti-antiterminator and stabilizes the formation of the terminator structure, while preventing the formation of the antiterminator.
Clusters having SD-like PCEs
| Gene | Functional classification* |
| | Membrane bioenergetics |
| | Sporulation |
| | Sporulation |
| | Ribosomal proteins |
| | Elongation |
| | Ribosomal proteins |
| | Elongation |
| | None |
| | None |
| | None |
| | Sporulation |
| | Ribosomal proteins |
| | Sporulation |
| | Ribosomal proteins |
| | Ribosomal proteins |
| | Ribosomal proteins |
| | Cell division |
| | Ribosomal proteins |
| | Elongation |
| | Metabolism of amino acids and related molecules |
| | None |
| | Elongation |
| | Regulation |
| | Initiation |
| | Germination |
| | Aminoacyl-tRNA synthetases |
| | Termination |
| | None |
| | Metabolism of amino acids and related molecules |
| | Initiation |
| | Metabolism of lipids |
| | Termination |
| | Detoxification |
| | Cell division |
| | Cell wall |
| | Mobility and chemotaxis |
| | Metabolism of amino acids and related molecules |
| | Elongation |
| | Transport/binding proteins and lipoproteins |
| | Metabolism of nucleotides and nucleic acids |
| | None |
| | Aminoacyl-tRNA synthetases |
| | Detoxification |
| | Metabolism of amino acids and related molecules |
| | Ribosomal proteins |
| | Ribosomal proteins |
| | DNA replication |
| | Aminoacyl-tRNA synthetases |
| | RNA modification |
| | None |
| | Specific pathways |
| | Metabolism of amino acids and related molecules |
| | Regulation |
| | Specific pathways |
| | Transport/binding proteins and lipoproteins |
| | Aminoacyl-tRNA synthetases |
| | Protein folding |
| | None |
| | None |
| | Aminoacyl-tRNA synthetases |
| | Ribosomal proteins |
| | Specific pathways |
| | TCA cycle |
| | Detoxification |
| | Phage-related functions |
| | Metabolism of amino acids and related molecules |
| | Elongation |
| | Elongation |
| | None |
| | TCA cycle |
| | None |
| | None |
| | None |
| | Main glycolytic pathways |
| | Aminoacyl-tRNA synthetases |
| | Cell wall |
| | Ribosomal proteins |
| | Ribosomal proteins |
| | Cell wall |
| | None |
| | Regulation |
| | None |
| | Main glycolytic pathways |
| | None |
| | Main glycolytic pathways |
| | Metabolism of amino acids and related molecules |
| | Detoxification |
| | Transport/binding proteins and lipoproteins |
| | Transport/binding proteins and lipoproteins |
| | None |
| | Sporulation |
*Functional classification was obtained from the SubtiList website [42,43]. Genes belonging to the same cluster are grouped together.
Number of -35/-10 boxes that overlap with PCEs for each sigma factor
| Sigma factor | Number of sites* | Number of -35 boxes† | Number of -10 boxes‡ |
| SigA | 62 | 9 | 14 |
| SigB | 9 | 0 | 0 |
| SigD | 5 | 0 | 2 |
| SigE | 19 | 3 | 7 |
| SigF | 7 | 2 | 1 |
| SigG | 14 | 3 | 2 |
| SigH | 8 | 0 | 2 |
| SigK | 13 | 3 | 2 |
| SigL | 1 | 0 | 1 |
| SigW | 9 | 4 | 2 |
| SigX | 2 | 0 | 0 |
-35/-10 boxes that overlap with a PCE by 5 bp or more were counted. A box shorter than 5 bp was counted only if it fully overlapped with a PCE.
*Number of known -35/-10 boxes that exist in the regions analyzed in our work.
†Number of -35 boxes that overlap with a PCE.
‡Number of -10 boxes that overlap with a PCE.
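The counting rule in the note above (at least 5 bp of overlap, or full containment for boxes shorter than 5 bp) can be written as a small predicate. This is a sketch under the assumption of inclusive (start, end) coordinates; the function name and the toy coordinates are hypothetical.

```python
def box_overlaps_pce(box, pce, min_overlap=5):
    """Apply the table's counting rule to one promoter box and one PCE.

    Boxes of min_overlap bp or longer need at least min_overlap shared bp;
    shorter boxes must lie entirely inside the PCE.
    Coordinates are inclusive (start, end) pairs (an assumption).
    """
    box_len = box[1] - box[0] + 1
    shared = max(0, min(box[1], pce[1]) - max(box[0], pce[0]) + 1)
    if box_len < min_overlap:
        return shared == box_len
    return shared >= min_overlap

# Toy coordinates (hypothetical).
pce = (2, 10)
print(box_overlaps_pce((1, 6), pce))  # 6 bp box, 5 bp shared
print(box_overlaps_pce((1, 4), pce))  # 4 bp box, only 3 bp inside the PCE
print(box_overlaps_pce((3, 6), pce))  # 4 bp box, fully inside the PCE
```

Keeping the threshold as a parameter makes it easy to check how sensitive the counts are to the 5 bp cutoff.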