FLASH: a fast look-up algorithm for string homology.

A Califano, I Rigoutsos.

Abstract

A key issue in managing today's large amounts of genetic data is the availability of efficient, accurate, and selective techniques for detecting homologies (similarities) between newly discovered and already stored sequences. A common characteristic of today's most advanced algorithms, such as FASTA, BLAST, and BLAZE, is the need to scan the contents of the entire database in order to find one or more matches. This design decision results either in excessively long search times or, as is the case with BLAST, in a sharp trade-off between the achieved accuracy and the required amount of computation. The homology detection algorithm presented in this paper, on the other hand, is based on a probabilistic indexing framework and requires minimal access to the database in order to determine matches. This is achieved by using the sequences of interest to generate a highly redundant set of very descriptive tuples, which are subsequently used as indices in a table look-up paradigm. In addition to the description of the algorithm, theoretical and experimental results on the sensitivity and accuracy of the suggested approach are provided. The storage and computational requirements are described, and the probabilities of correct matches and of false alarms are derived. Sensitivity and accuracy are shown to be close to those of dynamic programming techniques. A prototype system has been implemented using the described ideas. It contains the full Swiss-Prot database, release 25 (10 MR), and the genome of E. coli (2 MR). The system is currently being expanded to include the complete GenBank database. (ABSTRACT TRUNCATED AT 250 WORDS)
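The table look-up idea in the abstract (derive many redundant, descriptive tuples from each sequence, store them as keys, and answer queries by look-up instead of a database scan) can be illustrated with a minimal Python sketch. It is not the authors' exact scheme: the abstract does not specify how tuples are formed or scored, so the tuple layout (residues at `K` positions inside a window of length `WINDOW`, anchored at the window start, together with their relative offsets), the diagonal vote counting, and the names `WINDOW`, `K`, `tuples_at`, `build_index`, and `search` are all hypothetical choices made for illustration. The probabilistic scoring of matches and false alarms described in the paper is not modeled.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical parameters for illustration only; the abstract gives no values.
WINDOW = 6  # length of the window examined at each anchor position
K = 3       # residues sampled per tuple (the anchor plus K - 1 others)


def tuples_at(seq, start, window=WINDOW, k=K):
    """Yield redundant keys for the window beginning at `start`.

    Each key pairs a tuple of relative offsets (always including 0, the
    anchor) with the residues found at those offsets, so a full window
    yields C(window - 1, k - 1) distinct keys.
    """
    if start + window > len(seq):
        return
    for rest in combinations(range(1, window), k - 1):
        offsets = (0,) + rest
        yield (offsets, "".join(seq[start + o] for o in offsets))


def build_index(database):
    """Map every key to the (sequence id, position) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for pos in range(len(seq) - WINDOW + 1):
            for key in tuples_at(seq, pos):
                index[key].append((seq_id, pos))
    return index


def search(query, index):
    """Count key hits per (sequence id, diagonal) using look-ups only.

    The diagonal is the database position minus the query position, so
    hits belonging to the same ungapped alignment accumulate together.
    Results are sorted by descending vote count.
    """
    votes = defaultdict(int)
    for qpos in range(len(query) - WINDOW + 1):
        for key in tuples_at(query, qpos):
            # .get() does not insert missing keys into the defaultdict.
            for seq_id, pos in index.get(key, ()):
                votes[(seq_id, pos - qpos)] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])


if __name__ == "__main__":
    db = {"s1": "MKTAYIAKQR", "s2": "GGGGGGGGGG"}
    idx = build_index(db)
    print(search("TAYIAKQ", idx)[0])  # (('s1', 2), 20)
```

The redundancy is what makes this kind of index tolerant of substitutions: in this toy setting, a single mismatch at a non-anchor position invalidates only the 4 of the 10 keys per window that include that offset, so the true diagonal still collects votes. The cost is storage, since every window contributes many keys to the table.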

Year:  1993        PMID: 7584371

Source DB:  PubMed          Journal:  Proc Int Conf Intell Syst Mol Biol        ISSN: 1553-0833


  7 in total

1.  A web server for performing electronic PCR.

Authors:  Kirill Rotmistrovsky; Wonhee Jang; Gregory D Schuler
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

2.  BOND: Basic OligoNucleotide Design.

Authors:  Lucian Ilie; Hamid Mohamadi; Geoffrey Brian Golding; William F Smyth
Journal:  BMC Bioinformatics       Date:  2013-02-27       Impact factor: 3.169

3.  Optimizing a massive parallel sequencing workflow for quantitative miRNA expression analysis.

Authors:  Francesca Cordero; Marco Beccuti; Maddalena Arigoni; Susanna Donatelli; Raffaele A Calogero
Journal:  PLoS One       Date:  2012-02-20       Impact factor: 3.240

4.  YASS: enhancing the sensitivity of DNA similarity search.

Authors:  Laurent Noé; Gregory Kucherov
Journal:  Nucleic Acids Res       Date:  2005-07-01       Impact factor: 16.971

5.  SANS: high-throughput retrieval of protein sequences allowing 50% mismatches.

Authors:  J Patrik Koskinen; Liisa Holm
Journal:  Bioinformatics       Date:  2012-09-15       Impact factor: 6.937

6.  Short read DNA fragment anchoring algorithm.

Authors:  Wendi Wang; Peiheng Zhang; Xinchun Liu
Journal:  BMC Bioinformatics       Date:  2009-01-30       Impact factor: 3.169

7.  Choosing the best heuristic for seeded alignment of DNA sequences.

Authors:  Yanni Sun; Jeremy Buhler
Journal:  BMC Bioinformatics       Date:  2006-03-13       Impact factor: 3.169
