Sharma V Thankachan, Sriram P Chockalingam, Yongchao Liu, Alberto Apostolico, Srinivas Aluru.
Abstract
Alignment-free approaches are attracting growing interest in many sequence analysis applications, such as phylogenetic inference and metagenomic classification/clustering, especially for large-scale sequence datasets. Besides the widely used k-mer methods, the average common substring (ACS) approach has emerged as a well-known alignment-free approach. Two recent works generalize the ACS approach further by allowing a bounded number k of mismatches in the common substrings, using linear-time approximation and exact computation, respectively. Although it has a good worst-case time complexity [Formula: see text], the exact approach is complex and unlikely to be efficient in practice. Herein, we present ALFRED, an alignment-free distance computation method that solves the generalized common substring search problem via exact computation. Compared with the theoretical approach, our algorithm is easier to implement and more practical to use, while still offering highly competitive theoretical performance, with an expected run time of [Formula: see text]. Applying our program to phylogenetic inference as a case study, we find that it exactly reconstructs the topology of the reference phylogenetic tree for a set of 27 primate mitochondrial genomes at reasonable speed. ALFRED is implemented in C++, and the source code is freely available online.
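To make the k-mismatch ACS measure concrete, here is a minimal brute-force sketch. It is not ALFRED's algorithm and makes no attempt at its run time. It assumes the usual definition: for each start position i of sequence x, lambda_k(i) is the length of the longest substring beginning at x[i] that matches a substring of y with at most k mismatching positions (Hamming distance), and ACS_k(x, y) is the average of lambda_k(i) over all positions of x. The function names are hypothetical, and the step that turns ACS_k into a distance is omitted.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: length of the longest common prefix of x[i..] and y[j..]
// allowing at most k mismatching positions (Hamming distance <= k).
std::size_t kMismatchMatchLength(const std::string& x, std::size_t i,
                                 const std::string& y, std::size_t j,
                                 std::size_t k) {
    std::size_t len = 0, mismatches = 0;
    while (i + len < x.size() && j + len < y.size()) {
        if (x[i + len] != y[j + len]) {
            if (mismatches == k) break;  // one more mismatch would exceed k
            ++mismatches;
        }
        ++len;
    }
    return len;
}

// lambda_k(i) for every start position i of x: the best k-mismatch match of
// x[i..] against every start position of y. Naive O(|x| * |y| * L) scan,
// where L is the longest match length.
std::vector<std::size_t> kMismatchLengths(const std::string& x,
                                          const std::string& y,
                                          std::size_t k) {
    std::vector<std::size_t> lambda(x.size(), 0);
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < y.size(); ++j)
            lambda[i] = std::max(lambda[i], kMismatchMatchLength(x, i, y, j, k));
    return lambda;
}

// ACS_k(x, y): average of lambda_k(i) over all positions of x (0 if x is empty).
double averageCommonSubstringK(const std::string& x, const std::string& y,
                               std::size_t k) {
    if (x.empty()) return 0.0;
    const std::vector<std::size_t> lambda = kMismatchLengths(x, y, k);
    double total = 0.0;
    for (std::size_t v : lambda) total += static_cast<double>(v);
    return total / static_cast<double>(x.size());
}
```

For example, with x = "ACGT" and y = "ACTT", the exact-match lengths are {2, 1, 0, 1}, giving ACS_0 = 1. Allowing one mismatch raises them to {4, 3, 2, 1}, giving ACS_1 = 2.5. Note that ACS_k is asymmetric in x and y, which is why ACS-based distances are usually symmetrized.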
Keywords: approximate string matching; phylogenetic tree; suffix trees
Year: 2016 PMID: 27138275 DOI: 10.1089/cmb.2015.0217
Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479