Katelyn McNair1, Carol Zhou2, Elizabeth A Dinsdale3, Brian Souza2, Robert A Edwards1,3,4.
Abstract
MOTIVATION: Currently there are no tools specifically designed for annotating genes in phages. Several tools are available that have been adapted to run on phage genomes, but due to their underlying design, they are unable to capture the full complexity of phage genomes. Phages have adapted their genomes to be extremely compact, having adjacent genes that overlap and genes completely inside of other longer genes. This non-delineated genome structure makes gene prediction difficult for the currently available gene annotators. Here we present PHANOTATE, a novel method for gene calling specifically designed for phage genomes. Although the compact nature of genes in phages is a problem for current gene annotators, we exploit this property by treating a phage genome as a network of paths in which open reading frames are favorable, and overlaps and gaps are less favorable but still possible. We represent this network of connections as a weighted graph, and use dynamic programming to find the optimal path.
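The weighted-graph idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring scheme, so everything numeric here is an assumption: the three weight constants, the single-strand coordinate model, and the helper names `connection_cost` and `best_path` are hypothetical and are not taken from PHANOTATE itself. Open reading frames carry a negative (favorable) cost, gaps and overlaps carry a positive cost, and a dynamic program over ORFs sorted by start position finds the lowest-cost path from the start of the genome to its end. Genes nested entirely inside a longer gene are not modeled in this sketch.

```python
from typing import List, Tuple

ORF = Tuple[int, int]  # (start, end), half-open coordinates on a single strand

# Hypothetical weights, chosen only for illustration (not PHANOTATE's scoring).
ORF_REWARD = -1.0       # cost per nucleotide inside an ORF (negative = favorable)
GAP_PENALTY = 0.5       # cost per nucleotide of non-coding gap
OVERLAP_PENALTY = 2.0   # cost per nucleotide shared by consecutive ORFs


def connection_cost(prev_end: int, next_start: int) -> float:
    """Cost of the edge joining one ORF's end to the next ORF's start."""
    distance = next_start - prev_end
    if distance >= 0:
        return GAP_PENALTY * distance        # gap between the two ORFs
    return OVERLAP_PENALTY * (-distance)     # the two ORFs overlap


def best_path(orfs: List[ORF], genome_length: int) -> Tuple[float, List[ORF]]:
    """Return (total cost, chosen ORFs) of the lowest-cost path through the genome."""
    orfs = sorted(orfs)                      # sorted order doubles as a topological order
    best = [0.0] * len(orfs)                 # best[j]: cheapest path ending with ORF j
    back = [-1] * len(orfs)                  # back[j]: previous ORF on that path, or -1

    for j, (start, end) in enumerate(orfs):
        orf_cost = ORF_REWARD * (end - start)
        # Option 1: ORF j is the first gene, reached by a gap from position 0.
        best[j] = connection_cost(0, start) + orf_cost
        # Option 2: ORF j follows an earlier ORF i that ends before j ends.
        for i in range(j):
            prev_end = orfs[i][1]
            if prev_end >= end:
                continue                     # j lies inside i; not handled in this sketch
            candidate = best[i] + connection_cost(prev_end, start) + orf_cost
            if candidate < best[j]:
                best[j], back[j] = candidate, i

    # Close the path at the end of the genome; the empty path is one long gap.
    total, last = GAP_PENALTY * genome_length, -1
    for j, (_, end) in enumerate(orfs):
        candidate = best[j] + connection_cost(end, genome_length)
        if candidate < total:
            total, last = candidate, j

    # Follow the back-pointers to recover the chosen ORFs in genome order.
    path: List[ORF] = []
    while last != -1:
        path.append(orfs[last])
        last = back[last]
    return total, path[::-1]


if __name__ == "__main__":
    candidates = [(0, 300), (290, 900), (350, 500), (1000, 1500)]
    cost, genes = best_path(candidates, genome_length=1500)
    print(cost, genes)
```

With these illustrative weights, the short overlap between the first two candidate ORFs is cheaper than skipping either one, so both are kept, while the small ORF lying inside the second one is left out of the path.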
Year: 2019 PMID: 31329826 PMCID: PMC6853651 DOI: 10.1093/bioinformatics/btz265
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Number of genes predicted by each of four different gene prediction algorithms and the combinations thereof. Orange background: predicted by a single algorithm; green background: predicted by two algorithms; blue background: predicted by three algorithms; pink background: predicted by all four algorithms
Numbers and lengths of the genes predicted by the different gene callers
| Gene caller | Number of genes | Mean length (nt) | SD of gene length (nt) |
|---|---|---|---|
| PHANOTATE | 225 518 | 603 | 708 |
| GeneMarkS | 213 101 | 628 | 719 |
| Glimmer | 211 278 | 631 | 719 |
| Prodigal | 211 886 | 631 | 720 |
Fig. 2. Violin plot of the ln(number of reads that map) to each of the ORFs predicted either by one (or more) of Prodigal, Glimmer or GeneMarkS; by no gene prediction algorithms (negative control); or by PHANOTATE alone