Huaiqiu Zhu, Gang-Qing Hu, Yi-Fan Yang, Jin Wang, Zhen-Su She.
Abstract
BACKGROUND: Despite remarkable success in the computational prediction of genes in Bacteria and Archaea, the lack of a comprehensive understanding of prokaryotic gene structures hinders further elucidation of differences among genomes. There remains interest in developing new ab initio algorithms that not only predict genes accurately but also facilitate comparative studies of prokaryotic genomes.
Year: 2007 PMID: 17367537 PMCID: PMC1847833 DOI: 10.1186/1471-2105-8-97
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Flow chart of the gene prediction process with the MED 2.0 system.
Prediction of 5' and 3' gene ends by five programs on test sets. Predictions of the 5' and 3' ends of genes are compared among MED 2.0 (MED), Glimmer 2.02 post-processed by RBSfinder (GL2), Glimmer 3.02 (GL3), GeneMarkS (GMK), ZCURVE 1.0 (ZCV) and EasyGene (EG) on a set of reliable test sets.
| Test set | Gene # | 3' end match (%) | | | | | | Both ends match (%) | | | | | |
| | | MED | GL2 | GL3 | GMK | ZCV | EG | MED | GL2 | GL3 | GMK | ZCV | EG |
| Bsub_All | 4100 | 98.7 | 98.2 | 97.6 | 98.9 (96.7) | 98.4 | 94.5 | 83.8 | 75.0 | 82.4 | 86.1 (83.2) | 83.1 | 79.6 |
| EcoGene | 854 | 99.1 | 99.3 | 99.4 | 99.9 (-) | 98.8 | 99.4 | 92.0 | 82.5 | 91.9 | 93.8 (-) | 89.2 | 91.1 |
| Link | 195 | 99.0 | 100.0 | 100.0 | 100.0 (100.0) | 100.0 | 100.0 | 93.3 | 85.6 | 94.4 | 94.4 (94.4) | 92.3 | 92.1 |
| EcoGene_short | 58 | 93.1 | 91.4 | 96.6 | 100.0 (-) | 86.2 | 93.3 | 91.4 | 77.6 | 89.7 | 98.3 (-) | 77.6 | 90.0 |
| Bsub123 | 123 | 95.1 | 91.1 | 87.8 | 97.6 (91.9) | 91.9 | 73.0 | 85.4 | 73.2 | 77.2 | 87.8 (82.9) | 78.0 | 66.0 |
| Bsub72 | 72 | 94.4 | 91.7 | 87.5 | 98.6 (94.4) | 93.1 | 82.4 | 87.5 | 75.0 | 77.8 | 93.1 (88.9) | 86.1 | 76.5 |
| Bsub51 | 51 | 92.2 | 88.2 | 82.3 | 98.0 (94.1) | 90.2 | 84.8 | 90.2 | 70.6 | 78.4 | 94.1 (90.2) | 84.3 | 81.8 |
| Psaer107 | 107 | 97.2 | 100.0 | 95.3 | 93.5 (-) | 95.3 | 100.0 | 93.5 | 83.2 | 90.6 | 85.0 (-) | 91.6 | 88.0 |
| Mtub66 | 66 | 95.5 | 98.5 | 97.0 | 98.5 (-) | 97.0 | 97.5 | 87.9 | 60.6 | 80.3 | 80.3 (-) | 75.8 | 82.5 |
| SolfGene | 56 | 100.0 | 100.0 | 100.0 | 100.0 (-) | 100.0 | 100.0 | 89.3 | 50.0 | 87.5 | 85.7 (-) | 73.2 | 89.3 |
Programs MED, GL2 (post-processed by RBSfinder), GL3 and ZCV were run locally, while GMK was run online, as described in the text. Predictions for EG were downloaded from [34].
Experimentally confirmed TIS data sets: the first three represent two well-studied genomes, B. subtilis (Bsub_All) and E. coli (EcoGene and Link); the fourth to seventh represent short genes for E. coli (EcoGene_short) and B. subtilis (Bsub123, Bsub72 and Bsub51); Psaer107 and Mtub66 are selected for two GC-rich genomes, P. aeruginosa (GC%: 66.6) and M. tuberculosis (GC%: 65.6); SolfGene corresponds to the archaeon S. solfataricus.
Numbers in parentheses are GeneMarkS results reported in the literature; (-) means no data were reported.
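The two accuracy measures in the table can be illustrated with a minimal sketch. It assumes that a gene counts as a 3' end match when a prediction shares its strand and stop coordinate, that it counts as a both-ends match when the predicted start (TIS) also agrees, and that both percentages are taken over the number of genes in the test set. The function name `end_match_rates`, the dictionary layout, and the toy coordinates are hypothetical and are not taken from the paper or from any of the programs compared.

```python
from typing import Dict, Tuple

# A gene is keyed by (strand, 3' end coordinate); the value is its 5' end (TIS).
# This representation is an assumption made for illustration only.
Gene = Tuple[str, int]


def end_match_rates(reference: Dict[Gene, int],
                    predicted: Dict[Gene, int]) -> Tuple[float, float]:
    """Return (3' end match %, both ends match %) relative to the reference set."""
    if not reference:
        raise ValueError("reference set is empty")
    three_prime = 0  # genes whose strand and 3' end are reproduced
    both = 0         # genes whose 3' end and 5' end are both reproduced
    for key, ref_start in reference.items():
        if key in predicted:
            three_prime += 1
            if predicted[key] == ref_start:
                both += 1
    n = len(reference)
    return 100.0 * three_prime / n, 100.0 * both / n


# Hypothetical toy data, not taken from the paper.
reference = {("+", 1200): 300, ("-", 5000): 5900,
             ("+", 9000): 8100, ("-", 12000): 12650}
predicted = {("+", 1200): 300, ("-", 5000): 5870, ("+", 9000): 8100}

three_prime_pct, both_pct = end_match_rates(reference, predicted)
print(f"3' end match: {three_prime_pct:.1f}%  both ends match: {both_pct:.1f}%")
```

In this toy case three of the four reference genes have their 3' end reproduced and two of those also have the correct start, which is why a both-ends figure can never exceed the corresponding 3' end figure.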
Figure 2. Sequence logos of TIS-upstream regions predicted by MED 2.0 for three archaeal genomes. We present the logos of three representative archaeal genomes: M. jannaschii, N. equitans and P. abyssi. The logos of the start codon at positions 0 to +2 are masked off.
Figure 3. Sequence logos of TIS-upstream regions for MED prediction and GenBank annotation to . (a) Venn diagram indicating the numbers of common and different gene starts given by MED 2.0 and GenBank; (b) sequence logos of the regions upstream of TISs agreed on by both MED 2.0 and GenBank; (c) sequence logos of the regions upstream of TISs predicted only by MED 2.0; (d) sequence logos of the regions upstream of TISs annotated only in GenBank. The logos of the start codon at positions 0 to +2 are masked off.
Figure 4. Sequence logos of TIS-upstream regions for MED, Glimmer, GeneMarkS and ZCURVE prediction to . The three Venn diagrams indicate the numbers of common and different gene starts for MED 2.0 versus Glimmer 3.02 (a), GeneMarkS (b) and ZCURVE 1.0 (c), respectively. The left-side sequence logos are for the regions upstream of TISs predicted by MED 2.0 but rejected by Glimmer 3.02 (d), GeneMarkS (f) and ZCURVE 1.0 (h). The right-side sequence logos are for the regions upstream of TISs predicted by Glimmer 3.02 (e), GeneMarkS (g) and ZCURVE 1.0 (i) but rejected by MED 2.0. The logos of the start codon at positions 0 to +2 are masked off.