A geometric interpretation for local alignment-free sequence comparison.



Ehsan Behnam, Michael S. Waterman, Andrew D. Smith.

Abstract

Local alignment-free sequence comparison arises when identifying similar segments of sequences that may not be alignable in the traditional sense. We propose a randomized approximation algorithm that is both accurate and efficient. We show that under D2 and its important variant D2* as the similarity measure, local alignment-free comparison between a pair of sequences can be formulated as the problem of finding the maximum bichromatic dot product between two sets of points in high dimensions. We introduce a geometric framework that reduces this problem to that of finding the bichromatic closest pair (BCP), allowing the properties of the underlying metric to be leveraged. Local alignment-free sequence comparison can be solved exactly by making a quadratic number of alignment-free substring comparisons. We show, both theoretically and through empirical results on simulated data, that our approximation algorithm requires only a subquadratic number of such comparisons and trades only a small amount of accuracy to achieve this efficiency. Our algorithm can therefore extend the current usage of alignment-free methods and can also be regarded as a substitute for local alignment algorithms in many biological studies.
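The Python sketch below illustrates, under simplifying assumptions, three ideas from the abstract: D2 as the dot product of two k-mer count vectors, the exact but quadratic baseline that compares every pair of substrings, and the identity that links a maximum dot product to a minimum Euclidean distance when vectors have unit length. It is not the authors' randomized algorithm. The helper names (`kmer_counts`, `d2`, `best_window_pair`, `unit`, `dot`, `sq_dist`) and the restriction to fixed-length windows of size `win` are hypothetical choices made for illustration, and the standardized variant named in the abstract is not shown.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-word (k-mer) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x: str, y: str, k: int) -> int:
    """D2 = number of k-word matches between x and y,
    i.e. the dot product of their k-mer count vectors."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Counter returns 0 for absent keys, so unshared k-mers contribute nothing.
    return sum(c * cy[w] for w, c in cx.items())


def best_window_pair(a: str, b: str, win: int, k: int):
    """Brute-force local alignment-free comparison (illustrative only).
    Hypothetical simplification: only fixed-length windows of size `win`
    are compared. Every window of a is scored against every window of b,
    a quadratic number of substring comparisons."""
    best = (-1, None, None)  # (score, start in a, start in b)
    for i in range(len(a) - win + 1):
        for j in range(len(b) - win + 1):
            score = d2(a[i:i + win], b[j:j + win], k)
            if score > best[0]:  # strict '>' keeps the first maximum found
                best = (score, i, j)
    return best


def unit(c: Counter) -> dict:
    """Scale a count vector to Euclidean length 1 (c must be non-empty)."""
    norm = math.sqrt(sum(v * v for v in c.values()))
    return {w: v / norm for w, v in c.items()}


def dot(u: dict, v: dict) -> float:
    """Dot product of two sparse vectors stored as dicts."""
    return sum(x * v.get(w, 0.0) for w, x in u.items())


def sq_dist(u: dict, v: dict) -> float:
    """Squared Euclidean distance between two sparse vectors."""
    return sum((u.get(w, 0.0) - v.get(w, 0.0)) ** 2 for w in set(u) | set(v))


if __name__ == "__main__":
    print(best_window_pair("TTTTACGTTTTT", "GGACGTGG", win=4, k=2))  # (3, 4, 2)
    u = unit(kmer_counts("ACGTAC", 2))
    v = unit(kmer_counts("TACGTT", 2))
    # For unit vectors, ||u - v||^2 = 2 - 2 u.v, so the pair with the
    # largest dot product is exactly the pair with the smallest distance.
    print(math.isclose(sq_dist(u, v), 2 - 2 * dot(u, v)))  # True
```

The unit-vector identity is the intuition for why a closest-pair search can stand in for a maximum-dot-product search; the paper's actual reduction for its similarity measures, and its randomized BCP approximation, may differ in the details.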


Year: 2013; PMID: 23829649; PMCID: PMC3704055; DOI: 10.1089/cmb.2012.0280

Source DB: PubMed; Journal: J Comput Biol; ISSN: 1066-5277; Impact factor: 1.479
