Shiyuyun Tang, Ivan Antonov, Mark Borodovsky.
Abstract
SUMMARY: Frameshift (FS) prediction is important for the analysis and biological interpretation of metagenomic sequences. Since the genomic context of a short metagenomic sequence is rarely known, there is not enough data available to estimate the parameters of species-specific statistical models of protein-coding and non-coding regions. The challenge of ab initio FS detection is therefore twofold: (i) to find a way to infer the necessary model parameters and (ii) to identify the positions of frameshifts (if any). Here we describe a new tool, MetaGeneTack, which uses a heuristic method to estimate the parameters of the sequence models used in the FS detection algorithm. It is shown on multiple test sets that the FS detection performance of MetaGeneTack is comparable to or better than that of the earlier developed program FragGeneScan.
Year: 2012 PMID: 23129300 PMCID: PMC3530910 DOI: 10.1093/bioinformatics/bts636
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
FS detection accuracy of FragGeneScan and MetaGeneTack for short fragments from 18 prokaryotic genomes
| Fragment length | Fragments having FS (%) | FragGeneScan Sn | FragGeneScan Sp | FragGeneScan Avg | MetaGeneTack Sn | MetaGeneTack Sp | MetaGeneTack Avg |
|---|---|---|---|---|---|---|---|
| 400 nt | 5 | 79.6 | 15.8 | 47.7 | 74.4 | 38.3 | 56.4 |
| 400 nt | 10 | 80.5 | 27.3 | 53.9 | 75.3 | 54.5 | 64.9 |
| 400 nt | 20 | 81.0 | 43.2 | 62.1 | 75.8 | 70.2 | 73.0 |
| 600 nt | 5 | 81.2 | 11.7 | 46.4 | 79.9 | 27.7 | 53.8 |
| 600 nt | 10 | 81.8 | 21.2 | 51.5 | 79.9 | 43.1 | 61.5 |
| 600 nt | 20 | 81.9 | 35.1 | 58.5 | 80.1 | 61.7 | 70.9 |
| 800 nt | 5 | 81.9 | 9.1 | 45.5 | 81.7 | 21.7 | 51.7 |
| 800 nt | 10 | 82.6 | 16.9 | 49.7 | 81.2 | 35.0 | 58.1 |
| 800 nt | 20 | 82.8 | 29.4 | 56.1 | 81.5 | 51.9 | 66.7 |
Values are averaged over genomes and then averaged between the insertion and deletion FS sets (see Supplementary Table S1 for details).
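The two-stage averaging described in the table note can be illustrated with a minimal Python sketch. The function name `two_stage_average` and the per-genome numbers are hypothetical and chosen only for illustration; they are not values from the paper or its supplement.

```python
from statistics import mean


def two_stage_average(per_genome):
    """Average over genomes within each FS type, then across FS types.

    `per_genome` maps an FS type (e.g. "insertion") to a list of
    per-genome accuracy values in percent.
    """
    # Stage 1: one mean per FS type, taken over the genomes in that set.
    per_type = [mean(values) for values in per_genome.values()]
    # Stage 2: mean of the per-type means, so each FS type has equal weight.
    return mean(per_type)


# Hypothetical per-genome values, NOT data from the paper.
example = {
    "insertion": [80.0, 70.0, 75.0],
    "deletion": [60.0, 70.0],
}
result = two_stage_average(example)
```

With these made-up numbers the insertion set averages 75.0 and the deletion set 65.0, so the result is 70.0. Pooling all five values would instead give 71.0, which is why the order of averaging matters when the sets differ in size.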
Fig. 1. Performance of MetaGeneTack with different combinations of filters, together with the performance of FragGeneScan (the leftmost columns), on the test set of 600 nt sequences, 20% of which carry simulated FSs. A predicted frameshift is counted as a true positive if it is located within 20 nt of the true simulated frameshift position. (A) Fragments with insertions; (B) fragments with deletions. Values are averaged over 18 genomes.
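The true-positive criterion in the caption (a prediction within 20 nt of the simulated frameshift) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, the 20 nt bound is treated as inclusive, each true frameshift is matched to at most one prediction, and sensitivity and specificity are taken as TP/(TP+FN) and TP/(TP+FP), a common convention in gene prediction that is assumed here.

```python
def match_frameshifts(predicted, true, tolerance=20):
    """Return (tp, fp, fn) for one fragment.

    `predicted` and `true` are lists of frameshift positions (nt).
    Assumption: a prediction is a true positive if it lies within
    `tolerance` nt (inclusive) of a not-yet-matched true position.
    """
    unmatched = sorted(true)
    tp = 0
    for pos in sorted(predicted):
        # Closest true position that has not been matched yet, if any.
        best = min(unmatched, key=lambda t: abs(t - pos), default=None)
        if best is not None and abs(best - pos) <= tolerance:
            tp += 1
            unmatched.remove(best)
    fp = len(predicted) - tp  # predictions with no nearby true FS
    fn = len(true) - tp       # true FSs that no prediction matched
    return tp, fp, fn


def rates(tp, fp, fn):
    """Assumed definitions: Sn = TP/(TP+FN), Sp = TP/(TP+FP)."""
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp


# Hypothetical fragment: one simulated FS at 300, two predictions.
counts = match_frameshifts([310, 95], [300])
```

Here the prediction at 310 is 10 nt from the simulated position and counts as a true positive, while the one at 95 is a false positive, so `counts` is `(1, 1, 0)` and `rates(*counts)` gives a sensitivity of 1.0 and a specificity of 0.5.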