Abstract
BACKGROUND: Although it is not difficult for state-of-the-art gene finders to identify coding regions in prokaryotic genomes, exact prediction of the corresponding translation initiation sites (TIS) is still a challenging problem. Recently, a number of post-processing tools have been proposed for improving the annotation of prokaryotic TIS. However, inherent difficulties of these approaches arise from the considerable variation of TIS characteristics across different species. Therefore, prior assumptions about the properties of prokaryotic gene starts may cause suboptimal predictions for newly sequenced genomes whose TIS signals differ from those of well-investigated genomes.
Year: 2006 PMID: 16526950 PMCID: PMC1434772 DOI: 10.1186/1471-2105-7-121
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Dependence on user-selected smoothing parameter. Performance of TICO on the genomes of E. coli, B. subtilis, P. aeruginosa, B. pseudomallei and R. solanacearum for a user-selected smoothing parameter varying according to σ = {0.1, 0.15, ..., 2.0}. Performance is measured as the percentage of correctly predicted TIS as compared with the respective reference dataset (see section »Datasets« for details).
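The parameter sweep described in the caption can be sketched as follows. This is only an illustration of the grid σ = {0.1, 0.15, ..., 2.0}: the function names `sigma_grid` and `sweep` are hypothetical, and `evaluate` stands in for whatever procedure returns the percentage of correctly predicted TIS for a given σ (it is not part of TICO).

```python
def sigma_grid(start=0.1, stop=2.0, step=0.05):
    """Return the smoothing parameter values start, start + step, ..., stop."""
    # Count the steps as an integer to avoid floating-point drift at the upper bound.
    n_steps = round((stop - start) / step)
    return [round(start + i * step, 2) for i in range(n_steps + 1)]


def sweep(evaluate, sigmas):
    """Call the caller-supplied evaluate(sigma) for each value; return (best_sigma, best_score)."""
    scored = [(evaluate(s), s) for s in sigmas]
    best_score, best_sigma = max(scored)
    return best_sigma, best_score


if __name__ == "__main__":
    grid = sigma_grid()
    print(len(grid), grid[:3], grid[-1])
```

With the default arguments the grid contains 39 values, which corresponds to the number of points per curve one would expect in such a plot.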
TIS prediction accuracy of our algorithm (TICO) in comparison with other post-processing tools. RBSfinder [10], GS-Finder [4] and MED-Start [5] were used as post-processors on the same GLIMMER2.02 prediction as TICO. Accuracy was measured as the percentage of TIS that were predicted correctly with respect to the reference annotations. Datasets are explained in detail in section »Results«.
| GLIMMER | MED-Start | TICO | GS-Finder | RBSfinder | |||
| EcoGene | 854 | 99.3 | 63.2 | 92.0 | 90.3 | 81.9 | |
| Link | 195 | 100 | 66.7 | 94.9 | 92.3 | 80.0 | |
| Bsub | 1248 | 98.6 | 61.3 | 89.2 | 87.9 | 78.5 | |
| 58 | 98.3 | 69.0 | 91.4 | 82.8 | |||
| PseudoCAP | 3281 | 97.5 | 57.8 | 3.6 | 83.6 | 67.7 | |
| 3440 | 97.2 | 51.5 | 5.0 | 71.4 | 56.8 | ||
| 1676 | 97.0 | 48.9 | 6.0 | 66.2 | 55.5 | ||
| 3399 | 97.7 | 53.2 | 5.5 | 64.3 | 53.3 | ||
| 2329 | 97.7 | 48.9 | 4.7 | 67.0 | 52.1 | ||
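The accuracy measure used in the table (percentage of TIS predicted correctly with respect to a reference annotation) can be illustrated with a minimal sketch. Several details here are assumptions rather than the evaluation protocol of the paper: genes are keyed by a hypothetical `(strand, stop_position)` pair, a TIS counts as correct only if the start coordinate matches exactly, and accuracy is computed over the reference genes that appear in the prediction.

```python
def tis_accuracy(reference, predicted):
    """Compare predicted gene starts with a reference annotation.

    Both arguments map a gene key, assumed here to be (strand, stop_position),
    to the start coordinate. Returns (found_pct, correct_pct): the percentage of
    reference genes present in the prediction, and the percentage of those
    shared genes whose start coordinate matches exactly.
    """
    shared = [key for key in reference if key in predicted]
    if not shared:
        return 0.0, 0.0
    found_pct = 100.0 * len(shared) / len(reference)
    correct = sum(1 for key in shared if reference[key] == predicted[key])
    return found_pct, 100.0 * correct / len(shared)


if __name__ == "__main__":
    # Toy coordinates, invented for illustration only.
    reference = {("+", 900): 100, ("+", 2000): 1500, ("-", 300): 800, ("-", 50): 400}
    predicted = {("+", 900): 100, ("+", 2000): 1530, ("-", 300): 800}
    print(tis_accuracy(reference, predicted))
```

Keying genes by strand and stop position reflects the fact that a post-processor relocates only the start of a predicted gene, so the 3' end can serve as a stable identifier.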
Figure 2. Exemplary trimer weights calculated by TICO. Positional weight matrix (PWM) values resulting from our algorithm for four exemplary trimers in the flanking regions of the TIS. Position 0 denotes the translation start. Selected trimers correspond to the most frequent subwords in the putative SD motifs determined by MED-Start [5] for P. aeruginosa (CCTGG, GCGCC, GCCTG, CGCCG and CGGCG). Negative weights indicate that occurrences of a trimer at the corresponding positions are atypical for strong TIS candidates.
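To make the role of positional trimer weights concrete, the sketch below scores a TIS candidate by summing the weights of all trimers found at each offset relative to the candidate start (offset 0). It is a generic PWM-style scoring example, not TICO's implementation: the window sizes, the `(trimer, offset)` dictionary layout, the function names and the weight values are all hypothetical.

```python
def score_candidate(sequence, tis_index, weights, upstream=20, downstream=20):
    """Sum positional trimer weights in a window around a candidate start.

    weights maps (trimer, offset) to a float, where offset is the position of
    the trimer's first base relative to the candidate start (offset 0).
    Pairs missing from the dictionary contribute 0.
    """
    total = 0.0
    for offset in range(-upstream, downstream + 1):
        pos = tis_index + offset
        if pos < 0 or pos + 3 > len(sequence):
            continue  # window extends past the sequence ends
        total += weights.get((sequence[pos:pos + 3], offset), 0.0)
    return total


def best_start(sequence, candidates, weights):
    """Return the candidate start index with the highest summed weight."""
    return max(candidates, key=lambda idx: score_candidate(sequence, idx, weights))


if __name__ == "__main__":
    # Toy sequence and weights, invented for illustration only.
    seq = "AAGGAGGTTTTATGAAA"
    toy_weights = {("GGA", -9): 1.5, ("TTT", -4): -0.5, ("ATG", 0): 2.0}
    print(score_candidate(seq, 11, toy_weights))
```

In this scheme a negative entry lowers the score of any candidate that has the corresponding trimer at that offset, which matches the interpretation of negative weights given in the caption.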