Yu Zheng, Brian P Anton, Richard J Roberts, Simon Kasif.
Abstract
BACKGROUND: Microbial genomes contain an abundance of genes with conserved proximity, forming clusters on the chromosome. However, this conservation can result from many factors, such as vertical inheritance or functional selection. Identifying the conserved gene clusters that are under functional selection therefore provides an effective channel for gene annotation, microarray screening, and pathway reconstruction. The problem of devising a robust method to identify these conserved gene clusters, and to evaluate the significance of their conservation across multiple genomes, has a number of implications for comparative, evolutionary and functional genomics as well as synthetic biology.
Year: 2005 PMID: 16202130 PMCID: PMC1266350 DOI: 10.1186/1471-2105-6-243
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Improvement of signal-to-noise ratio of the phylogenetic method over the simple method. The upper conservation score (Cu) profiles for the genomic region surrounding the ent operons in Escherichia coli are shown. (a) the simple method; (b) the phylogenetic method.
Figure 2. Conservation score (Cu and Cd) profiles for the entCEBA genomic region using an increasing number of reference genomes.
Figure 3. Upstream conservation score calculated from orthology and similarity data for a genomic region surrounding the fhuABCD operon in Escherichia coli.
Figure 4. Receiver operating characteristic (ROC) curves of different methods on the RegulonDB operon dataset. Curves are color-coded for different methods. Points using cutoffs of 4.0 and 5.0 for our method are highlighted on the curve.
Statistics of conserved gene clusters in a number of microorganisms
| Genome | Total genes | Total genes in clusters | Total detected clusters | Fraction of genes in clusters | Average cluster size |
| Chlamydophila pneumoniae J138 | 1070 | 209 | 58 | 0.20 | 3.6 |
| Mycobacterium tuberculosis CDC1551 | 4187 | 547 | 161 | 0.13 | 3.4 |
| Sinorhizobium meliloti | 6205 | 841 | 230 | 0.14 | 3.7 |
| Clostridium acetobutylicum | 3672 | 707 | 166 | 0.19 | 4.3 |
| Mycobacterium tuberculosis H37Rv | 3918 | 543 | 156 | 0.14 | 3.5 |
| Staphylococcus aureus Mu50 | 2748 | 757 | 174 | 0.28 | 4.4 |
| Aeropyrum pernix | 2694 | 160 | 44 | 0.06 | 3.6 |
| Clostridium perfringens | 2723 | 633 | 147 | 0.23 | 4.3 |
| Mycoplasma genitalium | 480 | 153 | 40 | 0.32 | 3.8 |
| Agrobacterium tumefaciens C58 | 5301 | 805 | 221 | 0.15 | 3.6 |
| Deinococcus radiodurans | 3102 | 389 | 117 | 0.13 | 3.3 |
| Mycoplasma pneumoniae | 688 | 167 | 47 | 0.24 | 3.6 |
| Streptococcus pneumoniae R6 | 2043 | 546 | 151 | 0.27 | 3.6 |
| Agrobacterium tumefaciens C58 UWash | 5402 | 832 | 224 | 0.15 | 3.7 |
| Escherichia coli K12 | 4289 | 1313 | 287 | 0.31 | 4.6 |
| Mycoplasma pulmonis | 782 | 168 | 52 | 0.21 | 3.2 |
| Streptococcus pneumoniae TIGR4 | 2094 | 534 | 147 | 0.26 | 3.6 |
| Escherichia coli O157H7 | 5361 | 1327 | 288 | 0.25 | 4.6 |
| Neisseria meningitidis MC58 | 2025 | 457 | 129 | 0.23 | 3.5 |
| Streptococcus pyogenes | 1696 | 501 | 136 | 0.30 | 3.7 |
| Aquifex aeolicus | 1553 | 178 | 57 | 0.11 | 3.1 |
| Sulfolobus solfataricus | 2977 | 244 | 65 | 0.08 | 3.8 |
| Archaeoglobus fulgidus | 2407 | 250 | 73 | 0.10 | 3.4 |
| Nostoc sp | 6129 | 284 | 88 | 0.05 | 3.2 |
| Sulfolobus tokodaii | 2826 | 242 | 65 | 0.09 | 3.7 |
| Bacillus halodurans | 4066 | 952 | 219 | 0.23 | 4.3 |
| Borrelia burgdorferi | 1709 | 214 | 57 | 0.13 | 3.8 |
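The two derived columns in the table above appear to follow directly from the counts: the fifth column (values such as 0.20) matches total genes in clusters divided by total genes, so it is a fraction rather than a percent, and the average cluster size matches total genes in clusters divided by total detected clusters. A minimal Python check on three rows copied from the table is sketched below; the function name `cluster_stats` is illustrative and not from the paper.

```python
def cluster_stats(total_genes, genes_in_clusters, n_clusters):
    """Return (fraction of genes in clusters, average cluster size).

    Values are rounded to the precision used in the table
    (two decimals and one decimal, respectively).
    """
    fraction = genes_in_clusters / total_genes
    avg_size = genes_in_clusters / n_clusters
    return round(fraction, 2), round(avg_size, 1)


# (total genes, total genes in clusters, total detected clusters),
# copied from the table above.
rows = {
    "Chlamydophila pneumoniae J138": (1070, 209, 58),
    "Escherichia coli K12": (4289, 1313, 287),
    "Nostoc sp": (6129, 284, 88),
}

for genome, counts in rows.items():
    fraction, avg_size = cluster_stats(*counts)
    print(f"{genome}: fraction={fraction:.2f}, average cluster size={avg_size:.1f}")
```

Under this reading, Escherichia coli K12 has roughly 31% of its genes in detected clusters, whereas Nostoc sp has about 5%.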
Figure 5. Examples of gene clusters that are conserved in only a few genomes.
Statistics of functional enrichment
| Aeropyrum pernix | 39/45 | 6.33E-12 | 40/46 | 2.21E-12 | 0.349 |
| | 80/107 | 1.23E-32 | 81/111 | 8.71E-32 | 7.08 |
| | 329/503 | 1.86E-44 | 363/601 | 4.10E-38 | 2.20E+06 |
| | 167/246 | 1.28E-31 | 181/307 | 1.90E-23 | 1.48E+08 |
| | 95/132 | 3.94E-24 | 123/295 | 1.88E-03 | 4.77E+20 |
| | 104/137 | 5.03E-30 | 135/327 | 4.29E-03 | 8.53E+26 |
| | 173/221 | 4.07E-57 | 167/218 | 5.56E-52 | 1.37E+05 |
| | 439/750 | 8.92E-48 | 606/1553 | 0.19 | 2.13E+46 |
| | 127/150 | 7.60E-38 | 125/145 | 6.27E-39 | 0.0825 |
| | 120/151 | 1.11E-37 | 122/158 | 4.92E-36 | 0.443 |
| | 182/277 | 2.79E-28 | 196/319 | 4.19E-25 | 1500 |
| | 68/83 | 5.82E-22 | 64/79 | 5.91E-20 | 102 |
| | 203/293 | 1.05E-42 | 226/347 | 1.09E-41 | 10.4 |
| | 75/98 | 3.61E-11 | 77/103 | 1.44E-10 | 3.99 |
| | 150/222 | 1.25E-51 | 156/241 | 9.14E-51 | 7.31 |
a Ratio of the functional-enrichment P-values obtained by the phylogeny method and by the counting method. As tabulated, each value equals (column 5) / (column 3); for example, 2.21E-12 / 6.33E-12 = 0.349 for Aeropyrum pernix.
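The last column of the table can be reproduced from the two P-value columns. In the rows checked here, the tabulated ratio equals the second P-value divided by the first. The short Python sketch below recomputes it for three rows copied from the table; the helper name `pvalue_ratio` is hypothetical, and the P-values are parsed from their printed form with the built-in `float`.

```python
def pvalue_ratio(p_first, p_second):
    """Divide the second tabulated P-value by the first.

    Both arguments are strings as printed in the table, e.g. "6.33E-12".
    """
    return float(p_second) / float(p_first)


# (first P-value, second P-value, tabulated ratio), copied from the table.
rows = [
    ("6.33E-12", "2.21E-12", 0.349),
    ("1.23E-32", "8.71E-32", 7.08),
    ("8.92E-48", "0.19", 2.13e46),
]

for p_first, p_second, tabulated in rows:
    ratio = pvalue_ratio(p_first, p_second)
    # Print to three significant figures, the precision used in the table.
    print(f"computed {ratio:.3g}, tabulated {tabulated:.3g}")
```

A ratio far above 1 means the first P-value is many orders of magnitude smaller than the second, that is, the enrichment reported in the first pair of columns is much more significant for that genome.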
Figure 6. A simple genome phylogenetic tree.