Discovering sequence similarity by the algorithmic significance method.

Abstract

The minimal-length encoding approach is applied to define the concept of sequence similarity. A sequence is defined to be similar to another sequence or to a set of keywords if it can be encoded in a small number of bits by taking advantage of common subwords. The minimal-length encoding of a sequence is computed in linear time, using a data compression algorithm that is based on a dynamic programming strategy and the directed acyclic word graph data structure. No assumptions about common word ("k-tuple") length are made in advance, and common words of any length are considered. The newly proposed algorithmic significance method provides an exact upper bound on the probability that sequence similarity has occurred by chance, thus eliminating the need for any arbitrary choice of similarity thresholds. Preliminary experiments indicate that a small number of keywords can positively identify a DNA sequence, which is extremely relevant in the context of partial sequencing by hybridization.
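
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses a naive substring search instead of the directed acyclic word graph, so it does not run in linear time, and the bit costs (a flag bit, 2 bits per literal nucleotide, fixed-width start and length fields for a copied subword) are assumptions chosen for illustration. The function names are hypothetical. The bound `2 ** -d`, where `d` is the number of bits saved relative to a uniform 2-bits-per-nucleotide null model, is the form usually associated with the algorithmic significance method and is assumed here rather than taken from the abstract.

```python
import math

LITERAL_BITS = 1 + 2  # assumed cost: 1 flag bit + 2 bits for one of A, C, G, T


def longest_match(target, source, i):
    """Length of the longest prefix of target[i:] that occurs anywhere in source."""
    k = 0
    while i + k < len(target) and target[i:i + k + 1] in source:
        k += 1
    return k


def min_encoding_bits(target, source):
    """Fewest bits needed to write target as literals and copies of source subwords."""
    m = len(source)
    # Assumed pointer cost: flag bit + start position + copy length.
    start_bits = math.ceil(math.log2(max(m, 2)))
    length_bits = math.ceil(math.log2(m + 1)) if m else 0
    pointer_bits = 1 + start_bits + length_bits

    n = len(target)
    best = [0] * (n + 1)  # best[i] = minimal bits to encode target[i:]
    for i in range(n - 1, -1, -1):
        # Option 1: emit target[i] as a literal nucleotide.
        best[i] = LITERAL_BITS + best[i + 1]
        # Option 2: copy a common subword of any length k from source.
        for k in range(1, longest_match(target, source, i) + 1):
            best[i] = min(best[i], pointer_bits + best[i + k])
    return best[0]


def significance_bound(target, source):
    """Return (d, bound): bits saved versus the null model and the bound 2**-d."""
    null_bits = 2 * len(target)  # uniform null model: 2 bits per nucleotide
    d = null_bits - min_encoding_bits(target, source)
    return d, min(1.0, 2.0 ** -d)
```

Calling `significance_bound(target, source)` on two sequences that share long subwords gives a large `d` and a small bound; sequences with no common subwords give a negative `d` and a bound of 1.0, meaning the similarity is not significant under these assumed costs.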

Year: 1993 PMID: 7584347

Source DB: PubMed Journal: Proc Int Conf Intell Syst Mol Biol ISSN: 1553-0833

Cited

2 in total

1. Differential direct coding: a compression algorithm for nucleotide sequence data.

Authors: Gregory Vey
Journal: Database (Oxford) Date: 2009-09-14 Impact factor: 3.451

2. LifePrint: a novel k-tuple distance method for construction of phylogenetic trees.

Authors: Fabián Reyes-Prieto; Adda J García-Chéquer; Hueman Jaimes-Díaz; Janet Casique-Almazán; Juana M Espinosa-Lara; Rosaura Palma-Orozco; Alfonso Méndez-Tenorio; Rogelio Maldonado-Rodríguez; Kenneth L Beattie
Journal: Adv Appl Bioinform Chem Date: 2011-01-20
