Dongsheng Che, Guojun Li, Fenglou Mao, Hongwei Wu, Ying Xu.
Abstract
We present a study on computational identification of uber-operons in a prokaryotic genome, each of which represents a group of operons that are evolutionarily or functionally associated through operons in other (reference) genomes. Uber-operons represent a rich set of footprints of operon evolution, whose full utilization could lead to new and more powerful tools for elucidation of biological pathways and networks than what operons have provided, and a better understanding of prokaryotic genome structures and evolution. Our prediction algorithm predicts uber-operons through identifying groups of functionally or transcriptionally related operons, whose gene sets are conserved across the target and multiple reference genomes. Using this algorithm, we have predicted uber-operons for each of a group of 91 genomes, using the other 90 genomes as references. In particular, we predicted 158 uber-operons in Escherichia coli K12 covering 1830 genes, and found that many of the uber-operons correspond to parts of known regulons or biological pathways or are involved in highly related biological processes based on their Gene Ontology (GO) assignments. For some of the predicted uber-operons that are not parts of known regulons or pathways, our analyses indicate that their genes are highly likely to work together in the same biological processes, suggesting the possibility of new regulons and pathways. We believe that our uber-operon prediction provides a highly useful capability and a rich information source for elucidation of complex biological processes, such as pathways in microbes. All the prediction results are available at our Uber-Operon Database: http://csbl.bmb.uga.edu/uber, the first of its kind.
Year: 2006 PMID: 16682449 PMCID: PMC1458513 DOI: 10.1093/nar/gkl294
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. A schematic diagram showing how our algorithm works. In each panel (a to f), the first row represents genes and operons in one genome, and the second row represents genes and operons in another genome. (a) The initial homologous relationships (dashed lines) between the two genomes; each operon is considered as a vertex. (b) The weight of O4-O5′ is 3 (because the maximum mapping between them is 3), which is the largest of all the weights, so the two operons are merged into one operon group, where the solid lines represent orthologous relationships, and this operon group becomes a new vertex. (c) The weights of O1-O1′, O2-O2′ and O4′-O4O5′ are 2; each pair is merged into an operon group that becomes a new vertex. (d) The weight of O3-O4′O4O5′ is 2; they are merged into one operon group. Note that when the maximum mapping is re-calculated, one pair of orthologues between O4 and O5′ is re-predicted; the new prediction is more accurate because all four operons are considered, which acts as a correcting mechanism in this algorithm. (e) O1O1′ and O2O2′ are merged into one operon group. (f) O3′ and O1O1′O2O2′ are merged into one operon group. Note that when the maximum mapping is re-calculated, some of the predicted orthologous relationships may differ from those of the previous iteration. At the end, two uber-operons are generated in each genome.
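The merging steps in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`max_matching`, `weight`, `greedy_merge`), the `min_weight` stopping threshold, the toy operons and the way a weight is defined between two already merged groups (the cross-genome maximum matching between them) are all assumptions made for illustration. Only the general idea, taking the size of a maximum gene mapping as the edge weight and repeatedly merging the heaviest pair into a new vertex, comes from the caption.

```python
from itertools import combinations


def max_matching(genes_a, genes_b, homologs):
    """Size of a maximum bipartite matching between genome-A and genome-B genes."""
    match = {}  # genome-B gene -> genome-A gene currently mapped to it

    def augment(g, seen):
        # Try to map gene g, re-routing earlier mappings if necessary.
        for h in genes_b:
            if (g, h) in homologs and h not in seen:
                seen.add(h)
                if h not in match or augment(match[h], seen):
                    match[h] = g
                    return True
        return False

    return sum(augment(g, set()) for g in genes_a)


def weight(u, v, homologs):
    """Assumed edge weight: cross-genome mapping size between vertices u and v."""
    return (max_matching(sorted(u[0]), sorted(v[1]), homologs)
            + max_matching(sorted(v[0]), sorted(u[1]), homologs))


def greedy_merge(operons_a, operons_b, homologs, min_weight=1):
    """Repeatedly merge the heaviest pair of vertices until none reaches min_weight."""
    # A vertex is (genome-A genes, genome-B genes, names of the operons it contains).
    groups = [(frozenset(g), frozenset(), (n,)) for n, g in operons_a.items()]
    groups += [(frozenset(), frozenset(g), (n,)) for n, g in operons_b.items()]
    while True:
        best = None
        for i, j in combinations(range(len(groups)), 2):
            w = weight(groups[i], groups[j], homologs)
            if w >= min_weight and (best is None or w > best[0]):
                best = (w, i, j)
        if best is None:
            return groups
        _, i, j = best
        merged = (groups[i][0] | groups[j][0],
                  groups[i][1] | groups[j][1],
                  groups[i][2] + groups[j][2])
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]


# Hypothetical toy data: three operons per genome and five homologous gene pairs.
operons_a = {"O1": ["a1", "a2"], "O2": ["a3"], "O3": ["a4", "a5"]}
operons_b = {"P1": ["b1", "b2"], "P2": ["b3", "b4"], "P3": ["b5"]}
homologs = {("a1", "b1"), ("a2", "b2"), ("a3", "b2"), ("a4", "b3"), ("a5", "b4")}

result = greedy_merge(operons_a, operons_b, homologs)
for genes_a, genes_b, names in result:
    print(names, sorted(genes_a), sorted(genes_b))
```

Because each weight is recomputed from the full gene sets of the current groups, a gene mapping chosen in an early merge can change after later merges, which mirrors the correcting behaviour the caption describes for panels (d) and (f).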
Figure 2. An overview of the uber-operon prediction procedure, which consists of preparing operon data, identifying candidate uber-operons using a heuristic algorithm (the lower-layer algorithm), and clustering (the higher-layer algorithm).
AHMDs of all predicted uber-operons, means and standard deviations (SD) of AHMDs of randomly combined operons, and their corresponding Z-scores, for the known pathways and regulons, and ASgo of all predicted uber-operons, means and SD of ASgo of randomly combined operons and their corresponding Z-scores, for the known GO terms
| IR (a) | PathAHMD (b) | RandAHMD (SD) (c) | Z-score (d) | RegAHMD (e) | RandAHMD (SD) (f) | Z-score (g) | ASgo (h) | RandASgo (SD) (i) | Z-score (j) |
|---|---|---|---|---|---|---|---|---|---|
| 2.0 | 0.098 | 0.066 (0.0058) | 5.637 | 0.125 | 0.078 (0.0093) | 4.979 | 3.419 | 2.861 (0.079) | 7.102 |
| 3.0 | 0.107 | 0.082 (0.0063) | 3.991 | 0.166 | 0.104 (0.011) | 5.76 | 3.511 | 2.864 (0.069) | 9.378 |
| 4.0 | 0.112 | 0.085 (0.0069) | 3.93 | 0.166 | 0.107 (0.011) | 5.173 | 3.509 | 2.850 (0.068) | 9.691 |
| 5.0 | 0.115 | 0.085 (0.0071) | 4.091 | 0.159 | 0.110 (0.012) | 4.145 | 3.561 | 2.855 (0.074) | 9.579 |
Four different inflation values (2.0, 3.0, 4.0 and 5.0) in MCL were tested in our experiments.
(a) Inflation rate.
(b) AHMD(U) for the known pathways, calculated using formula (3).
(c) Average AHMD(U′) for 100 sets of pseudo uber-operons (randomly generated uber-operons), and its SD, calculated using formula (3).
(d) Z-score for AHMD(U) in column (b), calculated using formula (5).
(e) AHMD(U) for the known regulons, calculated using formula (3).
(f) Average AHMD(U′) for 100 sets of pseudo uber-operons (randomly generated uber-operons), and its SD, calculated using formula (3).
(g) Z-score for AHMD(U) in column (e), calculated using formula (5).
(h) ASgo for the predicted uber-operons, calculated using formula (6).
(i) Average ASgo for 100 sets of pseudo uber-operons, and its SD.
(j) Z-score for ASgo, calculated using formula (5).
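Formula (5) is not reproduced in this excerpt. Assuming it is the usual standardization, Z = (observed score - mean of the random sets) / SD of the random sets, the Z-score columns can be approximately rechecked from the rounded values in the table. The helper name `z_score` below is hypothetical, and the inputs are the pathway values from the first row (inflation rate 2.0).

```python
def z_score(observed, random_mean, random_sd):
    """Standard Z-score; assumed (not confirmed) to correspond to formula (5)."""
    return (observed - random_mean) / random_sd


# Pathway columns for inflation rate 2.0, as printed in the table.
path_ahmd = 0.098
rand_mean, rand_sd = 0.066, 0.0058

z = z_score(path_ahmd, rand_mean, rand_sd)
print(round(z, 3))
```

Values recomputed this way should land close to, but not exactly on, the tabulated Z-scores, presumably because the published means and SDs are rounded.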
Figure 3. Frequency distribution of the number of operons per uber-operon in E. coli. A total of 157 predicted uber-operons in E. coli were used; one uber-operon containing 28 operons was not included.