Ben J Woodcroft, Joel A Boyd, Gene W Tyson.
Abstract
Finding and translating stretches of DNA lacking stop codons is a common task in the analysis of sequence data. However, the computational tools for finding open reading frames are sufficiently slow that they are becoming a bottleneck as the volume of sequence data grows. This computational bottleneck is especially problematic in metagenomics when searching unassembled reads or screening assembled contigs for genes of interest. Here, we present OrfM, a tool to rapidly identify open reading frames (ORFs) in sequence data by applying the Aho-Corasick algorithm to find regions uninterrupted by stop codons. Benchmarking revealed that OrfM finds ORFs identical to those found by similar tools ('GetOrf' and 'Translate') but is four to five times faster. While OrfM is sequencing platform-agnostic, it is best suited to large, high-quality datasets such as those produced by Illumina sequencers.
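To make the task concrete, the following is a minimal Python sketch of finding regions uninterrupted by stop codons. It is not OrfM's implementation: it uses a naive codon-by-codon scan in place of the Aho-Corasick algorithm named in the abstract, and it assumes the standard stop codons TAA, TAG and TGA. The function names and the `min_codons` threshold are hypothetical, chosen only for illustration.

```python
# Illustrative only: a naive scan, not the Aho-Corasick approach used by OrfM.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed: standard genetic code

# Translation table used to complement A/C/G/T bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def stop_free_regions(seq, min_codons=3):
    """Return (frame, start, end) for stop-free stretches in the 3 forward frames.

    start and end are 0-based nucleotide offsets, end exclusive, and the
    stop codon itself is never included. min_codons is a hypothetical
    minimum length filter, counted in codons.
    """
    seq = seq.upper()
    regions = []
    for frame in range(3):
        # Offset just past the last complete codon in this frame.
        last = frame + ((len(seq) - frame) // 3) * 3
        start = frame
        for pos in range(frame, last, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    regions.append((frame, start, pos))
                start = pos + 3  # resume after the stop codon
        # A stretch may run to the end of the sequence without a stop.
        if (last - start) // 3 >= min_codons:
            regions.append((frame, start, last))
    return regions
```

Calling `stop_free_regions(reverse_complement(seq))` covers the three reverse-strand frames, giving all six reading frames between the two calls.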
Year: 2016 PMID: 27153669 PMCID: PMC5013905 DOI: 10.1093/bioinformatics/btw241
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Time taken (wall time) by each program for the benchmark datasets. GetOrf and Translate take significantly more time than OrfM to call ORFs. Translate is unable to run on compressed reads, so its wall time was not measured for the first dataset. Error bars indicate the standard error of the mean among triplicate runs.