Biological evaluation of d2, an algorithm for high-performance sequence comparison.

W Hide, J Burke, D B Davison.

Abstract

A number of algorithms exist for searching sequence databases for biologically significant similarities based on the primary sequence similarity of aligned sequences. We have determined the biological sensitivity and selectivity of d2, a high-performance comparison algorithm that rapidly determines the relative dissimilarity of large datasets of genetic sequences. d2 uses sequence-word multiplicity as a simple measure of dissimilarity. It is not constrained by the comparison of direct sequence alignments and so can use word contexts to yield new information on relationships. It is extremely efficient, comparing a query of length 884 bases (INS1ECLAC) with 19,540,603 bases of the bacterial division of GenBank (release 76.0) in 51.77 CPU seconds on a Cray Y/MP-48 supercomputer. It is unique in that subsequences (words) of biological interest can be weighted to improve the sensitivity and selectivity of a search over existing methods. We have determined the ability of d2 to detect biologically significant matches between a query and large datasets of DNA sequences while varying parameters such as word-length and window size. We have also determined the distribution of dissimilarity scores within eukaryotic and prokaryotic divisions of GenBank. We have optimized parameters of the d2 program using Cray hardware and present an analysis of the sensitivity and selectivity of the algorithm. A theoretical analysis of the expectation for scores is presented. This work demonstrates that d2 is a unique, sensitive, and selective method of rapid sequence comparison that can detect novel sequence relationships which remain undetected by alternate methodologies.
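
The abstract describes d2 only in words: dissimilarity is measured from sequence-word multiplicity, words of interest can be weighted, and word length and window size are tunable. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' Cray implementation. It assumes the score is the weighted sum of squared differences between the word counts of two sequences, and that a window is handled by scoring the query against each window of the target and keeping the smallest score. The names word_counts, d2_score, and min_window_d2, and the default weight of 1.0, are hypothetical.

```python
from collections import Counter


def word_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_score(seq_a, seq_b, k, weights=None):
    """Weighted sum of squared differences in word multiplicity.

    Assumption: this squared-difference form is an illustration;
    the abstract itself does not state the formula.
    """
    counts_a = word_counts(seq_a, k)
    counts_b = word_counts(seq_b, k)
    weights = weights or {}
    # A Counter returns 0 for a word that is absent from a sequence.
    return sum(
        weights.get(word, 1.0) * (counts_a[word] - counts_b[word]) ** 2
        for word in counts_a.keys() | counts_b.keys()
    )


def min_window_d2(query, target, k, window, weights=None):
    """Smallest score between the query and any window of the target.

    Assumption: only the target is windowed. A target shorter than
    the window is scored once, as a whole.
    """
    starts = range(max(1, len(target) - window + 1))
    return min(
        d2_score(query, target[s:s + window], k, weights) for s in starts
    )


if __name__ == "__main__":
    query = "ACGTACGT"
    target = "TTTTTTACGTACGTTTTTTT"
    print(d2_score(query, target, k=3))
    print(min_window_d2(query, target, k=3, window=len(query)))
```

Because only word counts are compared, no alignment is computed at any point, and raising the weight of a word makes a mismatch in that word's count contribute more to the score.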

Year: 1994    PMID: 8790465    DOI: 10.1089/cmb.1994.1.199

Source DB: PubMed    Journal: J Comput Biol    ISSN: 1066-5277    Impact factor: 1.479


  26 in total

1.  STACK: Sequence Tag Alignment and Consensus Knowledgebase.

Authors:  A Christoffels; A van Gelder; G Greyling; R Miller; T Hide; W Hide
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

2.  Distributional regimes for the number of k-word matches between two random sequences.

Authors:  Ross A Lippert; Haiyan Huang; Michael S Waterman
Journal:  Proc Natl Acad Sci U S A       Date:  2002-10-08       Impact factor: 11.205

3.  Metagenomic Classification Using an Abstraction Augmented Markov Model.

Authors:  Xiujun Sylvia Zhu; Monnie McGee
Journal:  J Comput Biol       Date:  2015-11-30       Impact factor: 1.479

4.  New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. (Review)

Authors:  Kai Song; Jie Ren; Gesine Reinert; Minghua Deng; Michael S Waterman; Fengzhu Sun
Journal:  Brief Bioinform       Date:  2013-09-23       Impact factor: 11.622

5.  The distribution of word matches between Markovian sequences with periodic boundary conditions.

Authors:  Conrad J Burden; Paul Leopardi; Sylvain Forêt
Journal:  J Comput Biol       Date:  2013-10-26       Impact factor: 1.479

6.  Alternative gene form discovery and candidate gene selection from gene indexing projects.

Authors:  J Burke; H Wang; W Hide; D B Davison
Journal:  Genome Res       Date:  1998-03       Impact factor: 9.043

7.  Fast genotyping of known SNPs through approximate k-mer matching.

Authors:  Ariya Shajii; Deniz Yorukoglu; Yun William Yu; Bonnie Berger
Journal:  Bioinformatics       Date:  2016-09-01       Impact factor: 6.937

8.  PEACE: Parallel Environment for Assembly and Clustering of Gene Expression.

Authors:  D M Rao; J C Moler; M Ozden; Y Zhang; C Liang; J E Karro
Journal:  Nucleic Acids Res       Date:  2010-06-03       Impact factor: 16.971

9.  An overview of the wcd EST clustering tool. (Review)

Authors:  Scott Hazelhurst; Winston Hide; Zsuzsanna Lipták; Ramon Nogueira; Richard Starfield
Journal:  Bioinformatics       Date:  2008-05-14       Impact factor: 6.937

10.  k-link EST clustering: evaluating error introduced by chimeric sequences under different degrees of linkage.

Authors:  Lauren M Bragg; Glenn Stone
Journal:  Bioinformatics       Date:  2009-07-01       Impact factor: 6.937
