David J Studholme, Stephen D Bentley, Jan Kormanec.
Abstract
BACKGROUND: Streptomyces coelicolor is a bacterium with a vast repertoire of metabolic functions and complex systems of cellular development. Its genome sequence is rich in genes that encode regulatory proteins to control these processes in response to its changing environment. We wished to apply a recently published bioinformatic method for identifying novel regulatory sequence signals to gain new insights into regulation in S. coelicolor.
Year: 2004 PMID: 15072583 PMCID: PMC450296 DOI: 10.1186/1471-2180-4-14
Source DB: PubMed Journal: BMC Microbiol ISSN: 1471-2180 Impact factor: 3.605
| Matrix | Protein class code | Protein class | Number of ORFs in this set | Number of ORFs in this set belonging to this protein class | P value | Consensus sequence |
| 2083 | 2.1.3 | Degradation of polysaccharides | 54 | 10 | 7.557e-10 | See Figure 2A |
| 2318 | 2.2.3 | DNA – replication, repair, restriction / modification | 21 | 6 | 7.76e-08 | See Figure 2B |
| 1744 | 1.2.1 | Chromosome replication | 24 | 3 | 2.12e-06 | See Figure 2C |
| 1909 | 4.1.7 | Gram +ve exported / lipoprotein | 106 | 22 | 9.24e-08 | See Figure 2D |
| 46 | 6.2.1 | sigma factor | 116 | 10 | 6.60e-08 | See Figure 2E |
| 2034 | 6.3.13 | ArsR | 45 | 4 | 1.89e-06 | See Figure 2F |
| 1853 | 3.3.11 | Nucleotide interconversions | 46 | 5 | 4.09e-07 | See Figure 2G |
| 363 | 3.8.0 | Secondary metabolism | 9 | 5 | 4.89e-07 | See Figure 2H |
| 571 | 3.8.0 | Secondary metabolism | 10 | 5 | 9.621e-07 | See Figure 2H |
| 293 | 3.8.0 | Secondary metabolism | 10 | 5 | 9.62e-07 | See Figure 2H |
| 153 | 3.8.0 | Secondary metabolism | 18 | 6 | 1.31e-06 | See Figure 2H |
Table 1. Position-specific weight matrices (PSWMs) that represent DNA sequence motifs shared by functionally coherent sets of genes in Streptomyces coelicolor. A library of 2497 matrices was generated from alignments of over-represented DNA sequence dyads as described in the Methods section. Each matrix is essentially a statistical model of a DNA sequence motif [58]. The non-coding regions of the S. coelicolor genome were searched against the matrices to find matches to each of the sequence motifs. The scanning method assigned a score (maximum 100) to each match site, and a minimum score threshold of 80 was used. For each matrix, we recorded the number of genes whose upstream region contains at least one match site. We also recorded the number of those genes belonging to each functional category in the protein classification scheme, and calculated a P value to determine whether that functional category was significantly over-represented.
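The caption does not name the statistical test behind the P values. As an illustrative sketch only, the over-representation of a protein class within a gene set could be assessed with an upper-tail hypergeometric probability. The function name `enrichment_p_value` is hypothetical, and the genome-wide totals used in the example (`total_orfs`, `class_size`) are placeholder assumptions, not values from this record; only the set size (54) and class count (10) come from the first row of Table 1.

```python
from math import comb


def enrichment_p_value(total_orfs, class_size, set_size, class_in_set):
    """Upper-tail hypergeometric probability P(X >= class_in_set).

    total_orfs   -- all ORFs considered in the genome
    class_size   -- ORFs in the genome belonging to the protein class
    set_size     -- ORFs whose upstream region matches the matrix
    class_in_set -- matching ORFs that belong to the protein class
    """
    upper = min(class_size, set_size)
    # Count the ways to draw k class members and (set_size - k) non-members.
    favourable = sum(
        comb(class_size, k) * comb(total_orfs - class_size, set_size - k)
        for k in range(class_in_set, upper + 1)
    )
    return favourable / comb(total_orfs, set_size)


# Set size and class count from the first row of Table 1 (matrix 2083).
# total_orfs and class_size are ASSUMED placeholders for illustration.
p = enrichment_p_value(total_orfs=7825, class_size=60,
                       set_size=54, class_in_set=10)
print(f"P = {p:.3e}")
```

Because the totals are placeholders and the test itself is an assumption, the printed value is not expected to reproduce the P values listed in the table.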
Figure 2. Consensus sequences as indicated in Table 1.
Figure 1. Occurrence of sequence motifs in coding and non-coding DNA. The genomes of Streptomyces coelicolor (A), S. avermitilis (B), Mycobacterium tuberculosis (C) and Escherichia coli (D) were scanned against each matrix to find matches to the corresponding DNA sequence motifs. The scanning method assigned a score (maximum 100) to each match site. Scans were performed using a range of minimum score thresholds. For each threshold, we counted the number of match sites (with a score equal to or greater than the threshold) found in coding and in non-coding DNA (i.e. intragenic and intergenic sites respectively). The ratio of the number of intergenic sites to the number of intragenic sites is plotted for each threshold level that was used. The matrices are further described in Table 1.
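The ratio plotted in Figure 1 can be sketched as follows, assuming each match site has already been scored and labelled as intergenic or intragenic. The function name `intergenic_ratio_by_threshold` and the example hits are hypothetical and serve only to illustrate the counting step described in the caption.

```python
def intergenic_ratio_by_threshold(hits, thresholds):
    """Return {threshold: intergenic/intragenic ratio} for scored match sites.

    hits       -- list of (score, is_intergenic) pairs, score on a 0-100 scale
    thresholds -- minimum scores at which to count match sites
    """
    ratios = {}
    for t in thresholds:
        inter = sum(1 for score, is_inter in hits if score >= t and is_inter)
        intra = sum(1 for score, is_inter in hits if score >= t and not is_inter)
        # The ratio is undefined when no intragenic site passes the threshold.
        ratios[t] = inter / intra if intra else None
    return ratios


# Hypothetical match sites: (score, True if the site is intergenic).
hits = [(95, True), (91, True), (88, False), (84, True),
        (82, False), (76, False), (71, True)]
ratios = intergenic_ratio_by_threshold(hits, thresholds=[70, 80, 90])
for threshold, ratio in ratios.items():
    print(threshold, ratio)
```

A ratio that rises with the threshold would indicate that high-scoring matches are concentrated in non-coding DNA.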