| Literature DB >> 23829649 |
Ehsan Behnam1, Michael S Waterman, Andrew D Smith.
Abstract
Local alignment-free sequence comparison arises in the context of identifying similar segments of sequences that may not be alignable in the traditional sense. We propose a randomized approximation algorithm that is both accurate and efficient. We show that under D2 and its important variant [Formula: see text] as the similarity measure, local alignment-free comparison between a pair of sequences can be formulated as the problem of finding the maximum bichromatic dot product between two sets of points in high dimensions. We introduce a geometric framework that reduces this problem to that of finding the bichromatic closest pair (BCP), allowing the properties of the underlying metric to be leveraged. Local alignment-free sequence comparison can be solved by making a quadratic number of alignment-free substring comparisons. We show both theoretically and through empirical results on simulated data that our approximation algorithm requires a subquadratic number of such comparisons and trades only a small amount of accuracy to achieve this efficiency. Therefore, our algorithm can extend the current usage of alignment-free-based methods and can also be regarded as a substitute for local alignment algorithms in many biological studies.Mesh:
Year: 2013 PMID: 23829649 PMCID: PMC3704055 DOI: 10.1089/cmb.2012.0280
Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479