Hongwei Wu, Zhengchang Su, Fenglou Mao, Victor Olman, Ying Xu.
Abstract
We present a computational method for predicting functional modules encoded in microbial genomes. We have also developed a formal measure that quantifies the degree of consistency between predicted and known modules, and have carried out a statistical significance analysis of this consistency measure. We first evaluate the functional relationship between two genes from three perspectives: phylogenetic profile analysis, gene neighborhood analysis and Gene Ontology assignments. We then combine the three sources of information in the framework of Bayesian inference and use the combined information to measure the strength of the functional relationship between genes. Finally, we apply a threshold-based method to predict functional modules. Applying this method to Escherichia coli K12, we predicted 185 functional modules. Our predictions are highly consistent with the previously known functional modules in E. coli. These results demonstrate that our approach is highly promising for predicting the functional modules encoded in a microbial genome.
Year: 2005 PMID: 15901854 PMCID: PMC1130488 DOI: 10.1093/nar/gki573
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
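The final, threshold-based step described in the abstract can be illustrated with a minimal sketch. It assumes, which the abstract does not state, that genes form a graph whose edges are the gene pairs with a combined score of at least α, and that each connected component of that graph is reported as a module. The function name `predict_modules`, the `min_size` parameter and the toy scores are hypothetical.

```python
from collections import defaultdict


def predict_modules(scores, alpha, min_size=2):
    """Return connected components of the graph whose edges score >= alpha.

    scores: dict mapping (gene_a, gene_b) -> combined score (assumed input format).
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (gene_a, gene_b), score in scores.items():
        if score >= alpha:
            root_a, root_b = find(gene_a), find(gene_b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    modules = [sorted(m) for m in groups.values() if len(m) >= min_size]
    return sorted(modules, key=lambda m: m[0])


# Toy combined scores: hypothetical values, not taken from the paper.
toy_scores = {
    ("flgB", "flgC"): 7.1,
    ("flgC", "flgD"): 6.9,
    ("hemL", "hemB"): 7.4,
    ("flgD", "hemL"): 2.0,
}
modules = predict_modules(toy_scores, alpha=6.75)
```

Lowering `alpha` admits weaker edges, so separate modules may merge into larger ones.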
Figure 1. The directed acyclic graph induced from the GO term UDP-N-acetylgalactosamine biosynthesis (GO:0019277), in which the bottommost level holds the GO term of interest itself and the upper levels hold all its ancestors; adapted from the QuickGO GO Browser.
Means and standard deviations of SGO(g, g), d(g, g), SN(g, g) and Combined(g, g) for the positive and the random sets
| Set | SGO mean | SGO SD | d mean | d SD | SN mean | SN SD | Combined mean | Combined SD |
|---|---|---|---|---|---|---|---|---|
| Positive set | 3.652 | 1.871 | 23.273 | 11.365 | 0.864 | 0.436 | 0.286 | 1.192 |
| Random set | 3.111 | 1.244 | 26.882 | 16.077 | 0.720 | 0.266 | −0.262 | 0.813 |
Figure 2. Distribution of SGO(g, g) for the positive (circles) and the random (triangles) sets.
Figure 3. Distribution of d(g, g) for the positive (blue) and the random (red) sets.
Group assignments of the 134 reference genomes
| Phylum | Genomes |
|---|---|
| Crenarchaeota | |
| Aquificae | Aquifex aeolicus |
| Euryarchaeota | |
| Firmicutes | |
| Bacteroidetes | |
| Actinobacteria | |
| Spirochaetes | |
| Chlamydiae | |
| Fusobacteria | |
| Cyanobacteria | |
| Nanoarchaeota | |
| Planctomycetes | |
| Thermotogae | Thermotoga maritima |
| Proteobacteria | |
Figure 4. Distribution of SN(g, g) for the positive (blue) and the random (red) sets.
Figure 5. The normalized Combined(g, g) of both the naive Bayesian (red) and the Bayesian (blue) inference approaches for both the positive (solid) and the random (dashed) sets.
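As an illustration of the naive Bayesian combination mentioned in the Figure 5 caption, the sketch below sums per-source log-likelihood ratios. It rests on assumptions the source does not confirm: the class-conditional distribution of each score is modeled as a Gaussian, the (mean, SD) pairs are read from the table of means and standard deviations above with columns taken in the order the caption lists them, and the names `naive_bayes_llr` and `gaussian_logpdf` are hypothetical. The authors' actual density estimates and normalization may differ.

```python
import math

# (mean, SD) per evidence source, read from the table above (assumed column order).
POSITIVE = {"S_GO": (3.652, 1.871), "d": (23.273, 11.365), "S_N": (0.864, 0.436)}
RANDOM = {"S_GO": (3.111, 1.244), "d": (26.882, 16.077), "S_N": (0.720, 0.266)}


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution (an assumed model, not the paper's)."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def naive_bayes_llr(evidence):
    """Sum over sources of log P(x | positive set) - log P(x | random set).

    evidence: dict mapping a source name ("S_GO", "d", "S_N") to its observed score.
    Treating the sources as independent is what makes this the naive variant.
    """
    total = 0.0
    for name, x in evidence.items():
        total += gaussian_logpdf(x, *POSITIVE[name]) - gaussian_logpdf(x, *RANDOM[name])
    return total


# Hypothetical gene pair with all three kinds of evidence available.
example_llr = naive_bayes_llr({"S_GO": 6.0, "d": 10.0, "S_N": 1.5})
```

A positive value favors the positive set and a negative value favors the random set; sources missing for a gene pair can simply be left out of `evidence`.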
The maximum AHMDs and their associated α-values for the 10 experiments, each of which corresponds to one repeat of the procedure of forming the training set, computing the combined score and predicting modules
| Experiment | Pathway α | Pathway AHMD | Regulon α | Regulon AHMD | Operon α | Operon AHMD |
|---|---|---|---|---|---|---|
| 1 | 6.75 | 0.265 | 6.5 | 0.168 | 6.5 | 0.164 |
| 2 | 6.25 | 0.257 | 5.25 | 0.171 | 5.25 | 0.165 |
| 3 | 7 | 0.259 | 5.75 | 0.192 | 5.5 | 0.172 |
| 4 | 6.25 | 0.260 | 5.25 | 0.184 | 5 | 0.157 |
| 5 | 6 | 0.269 | 5.5 | 0.194 | 6 | 0.181 |
| 6 | 6.25 | 0.268 | 5.75 | 0.183 | 6 | 0.171 |
| 7 | 5.25 | 0.249 | 5.25 | 0.190 | 4.75 | 0.171 |
| 8 | 6.25 | 0.288 | 6 | 0.217 | 6 | 0.188 |
| 9 | 5.75 | 0.261 | 4.75 | 0.191 | 4.75 | 0.165 |
| 10 | 6 | 0.267 | 5.25 | 0.200 | 5.25 | 0.176 |
The maximum AHMD values, the associated values of α, the number (N) of predicted modules, the total number (|C|) of genes in all the predicted modules and the associated Z-scores, for the known pathways, regulons and operons achieved by using different sources of information
| Source of information | AHMD | α | N | \|C\| | Z-score |
|---|---|---|---|---|---|
| Pathways ( | |||||
| | 0.265 | 6.75 | 185 | 654 | 62.293 |
| | 0.236 | 3.75 | 189 | 998 | 58.474 |
| | 0.0364 | 3 | 28 | 221 | 4.753 |
| | 0.224 | 4 | 106 | 796 | 70.103 |
| Regulons ( | |||||
| | 0.168 | 6.5 | 194 | 717 | 37.908 |
| | 0.182 | 3.5 | 189 | 1099 | 40.576 |
| | 0.0200 | 2.75 | 26 | 431 | 0.769 |
| | 0.117 | 3.75 | 115 | 959 | 31.591 |
| Operons ( | |||||
| | 0.164 | 6.5 | 194 | 717 | 32.572 |
| | 0.176 | 5 | 188 | 702 | 39.406 |
| | 0.0147 | 3 | 28 | 221 | 0.502 |
| | 0.0708 | 3.75 | 115 | 959 | 13.868 |
The phylogenetic and neighborhood profiles are obtained by using the BDBH method.
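The Z-scores in the table express how far an observed consistency value lies from what random modules would give. The sketch below shows one generic way to obtain such a score and is not the authors' procedure: `pair_agreement` is a hypothetical stand-in for the consistency measure (AHMD is not defined in this record), and the null model, which draws random modules of the same sizes from the gene pool, is an assumption.

```python
import itertools
import random
import statistics


def pair_agreement(predicted, known):
    """Fraction of gene pairs sharing a predicted module that also share a known one."""
    known_pairs = {frozenset(p) for m in known for p in itertools.combinations(m, 2)}
    pred_pairs = [frozenset(p) for m in predicted for p in itertools.combinations(m, 2)]
    if not pred_pairs:
        return 0.0
    return sum(p in known_pairs for p in pred_pairs) / len(pred_pairs)


def shuffled_modules(predicted, genes, rng):
    """Random modules with the same sizes as the predicted ones (assumed null model)."""
    pool = rng.sample(genes, sum(len(m) for m in predicted))
    modules, start = [], 0
    for module in predicted:
        modules.append(pool[start:start + len(module)])
        start += len(module)
    return modules


def consistency_z_score(predicted, known, genes, n_random=1000, seed=0):
    """Z-score of the observed agreement against the random-module distribution."""
    rng = random.Random(seed)
    observed = pair_agreement(predicted, known)
    null = [pair_agreement(shuffled_modules(predicted, genes, rng), known)
            for _ in range(n_random)]
    sigma = statistics.stdev(null)
    if sigma == 0:
        return float("nan")  # degenerate null distribution
    return (observed - statistics.mean(null)) / sigma


# Toy data: hypothetical gene identifiers and modules.
toy_genes = [f"g{i}" for i in range(20)]
toy_known = [["g0", "g1", "g2"], ["g3", "g4", "g5"]]
toy_predicted = [["g0", "g1", "g2"], ["g3", "g4"]]
```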
The maximum AHMD values, the associated values of α, the number (N) of predicted modules, the total number (|C|) of genes in all the predicted modules and the associated Z-scores, for the known pathways, regulons and operons achieved by using the combined information, neighborhood profiles and phylogenetic profiles, respectively
| Source of information | AHMD | α | N | \|C\| | Z-score |
|---|---|---|---|---|---|
| Pathways ( | |||||
| Combined information | 0.248 | 7.25 | 191 | 700 | 66.694 |
| Neighborhood profiles | 0.212 | 3.5 | 173 | 920 | 52.832 |
| Phylogenetic profiles | 0.0416 | 3.25 | 61 | 416 | 2.647 |
| Regulons ( | |||||
| Combined information | 0.176 | 6 | 171 | 1006 | 45.653 |
| Neighborhood profiles | 0.170 | 3.25 | 165 | 1008 | 37.033 |
| Phylogenetic profiles | 0.0317 | 3.25 | 61 | 416 | −1.406 |
| Operons ( | |||||
| Combined information | 0.164 | 7.25 | 191 | 700 | 32.959 |
| Neighborhood profiles | 0.157 | 4.5 | 173 | 669 | 36.274 |
| Phylogenetic profiles | 0.0246 | 3.25 | 61 | 416 | −0.988 |
The neighborhood and phylogenetic profiles are obtained by using the reciprocal smallest distance algorithm (33).
Figure 6. (a) The number of edges as a function of α; and (b) AHMD values for the known pathways, regulons and operons as functions of α.
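The relationship in panel (a) can be sketched in a few lines, assuming an edge is any gene pair whose combined score is at least α. The function name `edge_counts`, the input format and the toy scores are hypothetical.

```python
from bisect import bisect_left


def edge_counts(scores, alphas):
    """For each threshold alpha, count the gene pairs whose score is >= alpha."""
    values = sorted(scores.values())
    # bisect_left finds the first index whose value is >= alpha.
    return [(alpha, len(values) - bisect_left(values, alpha)) for alpha in alphas]


# Toy combined scores: hypothetical values, not taken from the paper.
pair_scores = {("a", "b"): 7.0, ("b", "c"): 6.75, ("c", "d"): 3.0}
counts = edge_counts(pair_scores, [3.0, 6.75, 7.5])
```

The count can only stay the same or fall as α grows, because raising the threshold never adds an edge.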
Figure 7. Predicted modules consisting of at least three genes, obtained by using α = 6.75. Red edges connect genes belonging to the same known pathway, blue edges connect genes belonging to the same known regulon and green edges connect genes belonging to the same known operon; orange edges indicate a transporter unit, purple edges indicate similar GO assignments and black edges indicate relationships that have not been experimentally verified.
Figure 8. Predicted module corresponding to the flagellar metabolism pathway. According to EcoCyc, the genes in blue belong to the same regulon, and the genes in yellow, green and dark red belong to three different operons.
Figure 9. Predicted modules involving more than one pathway.
Figure 10. Genes are connected mainly because they are conserved neighboring genes along the same strand of DNA.
Figure 11. Predicted modules that have not been experimentally verified.
Figure 12. Predicted modules in which hemL (16128147) is involved, for different values of α: (a) α = 6.75, (b) α = 5.5 and (c) α = 4.