Efficient tools for comparative substring analysis.

Alberto Apostolico, Olgert Denas, Andreas Dress.

Abstract

This paper introduces an efficient implementation of approaches to alignment-free comparative genome analysis and genome-based phylogeny that rely on substring composition. Distances derived from substring statistics have recently been proposed as a meaningful alternative to distances derived from sequence alignment. In particular, procaryote phylogenies based on comparative 5- and 6-mer analysis of whole proteomes have been worked out successfully. The present implementation extends the computation of composition-based distances to all k-mers for any k up to a preset maximum length K (including K = ∞). Remarkably, although there may be Θ(L²) distinct strings that occur in a given sequence of length L (and Θ(KL) of length k ≤ K), it is shown that composition-based distances, as well as many other details of interest in comparative genome analysis, can be computed in O(L) time and space, with a constant that is independent of K (that is, the same constant works for all K). A typical run with 2 sequences totaling 1.5 million characters computes their composition-based distance in about 2 s, a performance to be contrasted with the several hours needed by the direct method in use, even when attention is restricted to substrings of length at most 6. Copyright 2010 Elsevier B.V. All rights reserved.
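To make the idea of a composition-based distance concrete, here is a minimal Python sketch of the naive, direct approach that the abstract contrasts with its linear-time method. It is not the authors' implementation: the abstract does not give the exact distance formula, so the function names (`kmer_counts`, `composition_distance`) and the choice of a summed Euclidean distance between relative k-mer frequency vectors are assumptions made purely for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def composition_distance(a: str, b: str, max_k: int) -> float:
    """Illustrative distance (an assumption, not the paper's formula):
    for each k = 1..max_k, take the Euclidean distance between the
    relative k-mer frequency vectors of a and b, then sum over k."""
    total = 0.0
    for k in range(1, max_k + 1):
        ca, cb = kmer_counts(a, k), kmer_counts(b, k)
        na, nb = sum(ca.values()), sum(cb.values())
        if na == 0 or nb == 0:
            break  # one sequence is shorter than k, so no k-mers remain
        # A Counter returns 0 for k-mers absent from one of the sequences.
        words = ca.keys() | cb.keys()
        total += sqrt(sum((ca[w] / na - cb[w] / nb) ** 2 for w in words))
    return total


if __name__ == "__main__":
    print(composition_distance("ACGTACGTGA", "ACGTTCGTGA", max_k=3))
```

This direct version rescans both sequences once for every k, so its cost grows with K as well as with L. That dependence on K is what the abstract reports removing, by computing such distances in O(L) time and space for any K.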

Year: 2010 PMID: 20682467 DOI: 10.1016/j.jbiotec.2010.05.006

Source DB: PubMed Journal: J Biotechnol ISSN: 0168-1656 Impact factor: 3.307

Cited

1 in total

1. MissMax: alignment-free sequence comparison with mismatches through filtering and heuristics.

Authors: Cinzia Pizzi
Journal: Algorithms Mol Biol Date: 2016-04-21 Impact factor: 1.405
