Shaun Mahony, James O McInerney, Terry J Smith, Aaron Golden.
Abstract
BACKGROUND: Many current gene prediction methods use only one model to represent protein-coding regions in a genome, and so are less likely to predict the location of genes that have an atypical sequence composition. It is likely that future improvements in gene finding will involve the development of methods that can adequately deal with intra-genomic compositional variation.
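The contrast between one model and several can be illustrated with a minimal sketch. It assumes codon-frequency vectors as the measure of composition and nearest-centroid scoring as the way to match a sequence to a model; both choices, and the names `codon_frequencies` and `closest_model`, are hypothetical and are not a description of how RescueNet builds or applies its models.

```python
from itertools import product
import math

# All 64 codons in a fixed order, so frequency vectors are comparable.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(cds: str) -> list[float]:
    """Relative frequency of each codon in an in-frame coding sequence."""
    seq = cds.upper()
    counts = dict.fromkeys(CODONS, 0)
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in counts:  # skip codons containing N or other ambiguity codes
            counts[codon] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[c] / total for c in CODONS]


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_model(cds: str, models: dict[str, list[float]]) -> tuple[str, float]:
    """Return (model name, distance) for the composition model nearest to cds."""
    freqs = codon_frequencies(cds)
    return min(
        ((name, euclidean(freqs, centroid)) for name, centroid in models.items()),
        key=lambda item: item[1],
    )


if __name__ == "__main__":
    # Hypothetical toy training sets: one G+C-rich, one A+T-rich.
    models = {
        "typical": codon_frequencies("GCCGCGGCC" * 10),
        "atypical": codon_frequencies("AAATTTAAA" * 10),
    }
    print(closest_model("AAATTT" * 5, models))
```

With only the "typical" model available, the A+T-rich query would still be assigned to it, just at a large distance; adding a second model gives such atypical sequences a close match of their own.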
Year: 2004 PMID: 15070404 PMCID: PMC385221 DOI: 10.1186/1471-2105-5-23
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Accuracy of RescueNet in 15 bacterial genomes.
| GC % | Genes annotated in GenBank | Genes in RescueNet training set |
|---|---|---|
| 26.2 | 564 | 292 |
| 28.6 | 857 | 403 |
| 30.6 | 1654 | 673 |
| 31.4 | 1715 | 692 |
| 31.7 | 483 | 301 |
| 38.0 | 1754 | 885 |
| 38.9 | 1593 | 712 |
| 43.3 | 1517 | 723 |
| 43.5 | 4220 | 1832 |
| 47.6 | 3169 | 954 |
| 47.6 | 4043 | 1640 |
| 50.8 | 4290 | 1983 |
| 67.0 | 2622 | 1436 |
| 67.0 | 3442 | 1748 |
| 72.1 | 7851 | 956 |
The genomes are listed in order of ascending G+C content. For each genome, the table shows: genome G+C content (GC %), the number of genes annotated in GenBank for that genome, the number of genes in the RescueNet training set, overall RescueNet sensitivity (Sn.), the sensitivity of RescueNet in finding genes longer than the 225 bp minimum prediction size (Sn. >225 bp), the sensitivity of RescueNet in finding genes that have been confirmed by homology with other genes in GenBank (Sn. Conserved), and, finally, overall RescueNet specificity (Sp.).
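The legend names the accuracy measures but this excerpt does not define them. A minimal sketch of how such columns could be computed, assuming the convention common in gene-finding evaluations (Sn = TP/(TP+FN) over annotated genes, Sp = TP/(TP+FP) over predictions) and assuming a prediction counts as correct when it shares strand and 3' stop coordinate with an annotated gene. The matching rule and the names `Gene`, `sensitivity`, `specificity` and `gc_percent` are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    start: int   # 1-based, lowest coordinate of the CDS
    end: int     # 1-based, highest coordinate, inclusive
    strand: str  # "+" or "-"

    @property
    def length(self) -> int:
        return self.end - self.start + 1

    @property
    def stop_key(self) -> tuple[str, int]:
        # Assumed matching rule: same strand and same 3' (stop codon) end.
        return (self.strand, self.end if self.strand == "+" else self.start)


def sensitivity(annotated, predicted, min_length=0):
    """Fraction of annotated genes longer than min_length that a prediction matches."""
    found = {p.stop_key for p in predicted}
    genes = [g for g in annotated if g.length > min_length]
    return sum(g.stop_key in found for g in genes) / len(genes) if genes else 0.0


def specificity(annotated, predicted):
    """Fraction of predictions that match an annotated gene, TP / (TP + FP)."""
    known = {g.stop_key for g in annotated}
    return sum(p.stop_key in known for p in predicted) / len(predicted) if predicted else 0.0


def gc_percent(seq: str) -> float:
    """G+C content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0
```

Under these assumptions, "Sn. >225 bp" corresponds to `sensitivity(annotated, predicted, min_length=225)`, and "Sn. Conserved" would apply `sensitivity` to the homology-confirmed subset of annotated genes.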
Figure 1. Screenshot from the Artemis sequence viewer [49] showing a sample region of D. radiodurans and the accompanying RescueNet predictions. Annotated genes are shown as white blocks, and predictions are shown in-frame as shaded blocks. Note the relative infrequency of stop codons (vertical lines in each frame), which are A+T rich and therefore rare in this G+C-rich genome, and the many ORFs that are not protein-coding regions. Note also the selected gene DR1142 and the contradicting RescueNet prediction. DR1142 is a hypothetical gene predicted by Glimmer2, and there is a strong possibility that the CDS marked by RescueNet is the correct prediction. RescueNet also raises the possibility that gene DR1143 is longer than previously annotated and contains a frameshift.
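Why stop codons become scarce as G+C content rises can be made concrete. A minimal sketch, assuming a random sequence with independent bases (real genomes are not random): with G+C fraction g, each of A and T occurs with probability (1 - g)/2 and each of G and C with probability g/2, so P(stop) = P(TAA) + P(TAG) + P(TGA). The sketch also scans one forward frame for stop-to-stop ORFs; the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_positions(seq: str, frame: int) -> list[int]:
    """0-based start positions of in-frame stop codons in forward frame 0, 1 or 2."""
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def stop_to_stop_orfs(seq: str, frame: int, min_length: int = 0) -> list[tuple[int, int]]:
    """Half-open (start, end) spans from the codon after one stop through the next stop.

    Partial ORFs before the first stop or after the last stop are ignored.
    """
    stops = stop_positions(seq, frame)
    orfs = []
    for prev, nxt in zip(stops, stops[1:]):
        start, end = prev + 3, nxt + 3
        if end - start >= min_length:
            orfs.append((start, end))
    return orfs


def expected_stop_probability(gc: float) -> float:
    """P(a random codon is TAA, TAG or TGA) for G+C fraction gc, bases independent."""
    at = (1.0 - gc) / 2.0  # probability of A, and of T
    cg = gc / 2.0          # probability of G, and of C
    return at ** 3 + 2 * at * at * cg  # TAA, plus TAG and TGA


if __name__ == "__main__":
    for gc in (0.5, 0.67):
        p = expected_stop_probability(gc)
        print(f"G+C {gc:.0%}: mean stop-to-stop length ~ {1 / p:.0f} codons")
```

Under this simple model the mean stop-to-stop length, roughly 1/P(stop) codons, is about 21 codons at 50% G+C but about 44 codons at 67% G+C, so long non-coding ORFs appear by chance far more often in a G+C-rich genome.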
Figure 2. Artemis screenshot showing a sample region of the H. influenzae genome and the associated RescueNet predictions. As in Fig. 1, annotated genes are shown as white blocks and RescueNet predictions are shown in-frame as shaded blocks. Note that genes HI0218 and HI0220 contain authentic frameshifts. RescueNet gives two predictions overlapping each of these genes, and the two predictions meet near the frameshift point.
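The pattern in Fig. 2 (two predictions in different frames covering one annotated gene and meeting near the shift) could be flagged automatically. A minimal sketch of such a heuristic, assuming simple coordinate intervals and a hypothetical `max_gap` tolerance of 30 nt; `Interval` and `frameshift_candidates` are illustrative names, and this is not a procedure described in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int   # 1-based, lowest coordinate
    end: int     # 1-based, highest coordinate, inclusive
    strand: str  # "+" or "-"

    @property
    def frame(self) -> int:
        # Frame of the 5' end relative to the sequence; comparable within a strand.
        return (self.start if self.strand == "+" else self.end) % 3

    def overlaps(self, other: "Interval") -> bool:
        return (self.strand == other.strand
                and self.start <= other.end and other.start <= self.end)


def frameshift_candidates(gene: Interval, predictions: list[Interval], max_gap: int = 30):
    """Consecutive same-strand predictions that overlap gene, lie in different
    frames, and whose facing ends are within max_gap nucleotides of each other."""
    hits = sorted((p for p in predictions if p.overlaps(gene)), key=lambda p: p.start)
    return [
        (a, b)
        for a, b in zip(hits, hits[1:])
        if a.frame != b.frame and abs(b.start - a.end) <= max_gap
    ]
```

Requiring a frame change, rather than only a short gap, avoids flagging two adjacent predictions in the same frame, which more likely reflect a split or truncated prediction than a frameshift.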