Biological evaluation of d2, an algorithm for high-performance sequence comparison.

Abstract

A number of algorithms exist for searching sequence databases for biologically significant similarities based on the primary sequence similarity of aligned sequences. We have determined the biological sensitivity and selectivity of d2, a high-performance comparison algorithm that rapidly determines the relative dissimilarity of large datasets of genetic sequences. d2 uses sequence-word multiplicity as a simple measure of dissimilarity. It is not constrained by the comparison of direct sequence alignments and so can use word contexts to yield new information on relationships. It is extremely efficient, comparing a query of length 884 bases (INS1ECLAC) with 19,540,603 bases of the bacterial division of GenBank (release 76.0) in 51.77 CPU seconds on a Cray Y/MP-48 supercomputer. It is unique in that subsequences (words) of biological interest can be weighted to improve the sensitivity and selectivity of a search over existing methods. We have determined the ability of d2 to detect biologically significant matches between a query and large datasets of DNA sequences while varying parameters such as word-length and window size. We have also determined the distribution of dissimilarity scores within eukaryotic and prokaryotic divisions of GenBank. We have optimized parameters of the d2 program using Cray hardware and present an analysis of the sensitivity and selectivity of the algorithm. A theoretical analysis of the expectation for scores is presented. This work demonstrates that d2 is a unique, sensitive, and selective method of rapid sequence comparison that can detect novel sequence relationships which remain undetected by alternate methodologies.
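The word-multiplicity idea in the abstract can be illustrated with a minimal sketch. It assumes the dissimilarity is the sum, over all words of length k, of the squared difference in how often each word occurs in the two sequences, with an optional per-word weight. The function names (`word_counts`, `d2_score`), the default word length, and the weighting scheme are illustrative assumptions and are not taken from the d2 program; the windowing described in the abstract is omitted.

```python
from collections import Counter


def word_counts(seq, k):
    """Count every overlapping word of length k in seq (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_score(a, b, k=3, weights=None):
    """Sum of weighted squared differences in word multiplicity.

    weights is a hypothetical mapping from word to weight; words that are
    not listed get a weight of 1.0. A score of 0 means both sequences have
    identical word counts for this k.
    """
    counts_a = word_counts(a, k)
    counts_b = word_counts(b, k)
    weights = weights or {}
    words = counts_a.keys() | counts_b.keys()
    return sum(
        weights.get(w, 1.0) * (counts_a[w] - counts_b[w]) ** 2
        for w in words
    )
```

Because the score depends only on word counts, no alignment is computed, and up-weighting a word of interest increases the contribution of any mismatch in that word's count.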

Year: 1994 PMID: 8790465 DOI: 10.1089/cmb.1994.1.199

Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479

Cited

26 in total

1. STACK: Sequence Tag Alignment and Consensus Knowledgebase.

2. Distributional regimes for the number of k-word matches between two random sequences.

3. Metagenomic Classification Using an Abstraction Augmented Markov Model.

4. New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. (Review)

5. The distribution of word matches between Markovian sequences with periodic boundary conditions.

6. Alternative gene form discovery and candidate gene selection from gene indexing projects.

7. Fast genotyping of known SNPs through approximate k-mer matching.

8. PEACE: Parallel Environment for Assembly and Clustering of Gene Expression.

9. An overview of the wcd EST clustering tool. (Review)

10. k-link EST clustering: evaluating error introduced by chimeric sequences under different degrees of linkage.