Efficient large-scale sequence comparison by locality-sensitive hashing.

J Buhler.

Abstract

MOTIVATION: Comparison of multimegabase genomic DNA sequences is a popular technique for finding and annotating conserved genome features. Performing such comparisons entails finding many short local alignments between sequences up to tens of megabases in length. To process such long sequences efficiently, existing algorithms find alignments by expanding around short runs of matching bases with no substitutions or other differences. Unfortunately, exact matches that are short enough to occur often in significant alignments also occur frequently by chance in the background sequence. Thus, these algorithms must trade off between efficiency and sensitivity to features without long exact matches.
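The efficiency/sensitivity trade-off for exact-match seeds can be made concrete with a short calculation. The sketch below uses a simplifying model that is an assumption, not the paper's analysis. Background bases are taken as i.i.d. and uniform over {A, C, G, T}, and each column of a significant ungapped alignment is assumed to match independently with probability `p` (its identity). The function names are hypothetical. The first function counts expected chance exact k-mer matches. The second gives the probability that an alignment contains at least one run of `k` consecutive matches, which a seed-and-extend method needs in order to find it.

```python
def expected_chance_seeds(m: int, n: int, k: int) -> float:
    """Expected number of exact k-mer matches between two random sequences
    of lengths m and n (assumes i.i.d. uniform bases, m >= k, n >= k)."""
    return (m - k + 1) * (n - k + 1) / 4 ** k


def prob_alignment_has_seed(length: int, k: int, p: float) -> float:
    """Probability that an ungapped alignment of the given length, whose
    columns match independently with probability p, contains at least one
    run of k consecutive matches (an exact k-mer seed)."""
    # dist[r]: probability that no run of length k has occurred yet and the
    # current trailing run of matches has length r (0 <= r < k).
    dist = [0.0] * k
    dist[0] = 1.0
    hit = 0.0
    for _ in range(length):
        nxt = [0.0] * k
        for r, pr in enumerate(dist):
            nxt[0] += pr * (1 - p)      # a mismatch resets the run
            if r + 1 == k:
                hit += pr * p           # run reaches length k: seed found
            else:
                nxt[r + 1] += pr * p    # a match extends the run
        dist = nxt
    return hit


if __name__ == "__main__":
    size = 10_000_000  # two 10 Mb sequences (illustrative sizes only)
    for k in (8, 11, 16):
        print(k,
              f"{expected_chance_seeds(size, size, k):.3g}",
              f"{prob_alignment_has_seed(100, k, 0.63):.3f}")
```

Under this model, each base removed from the seed multiplies the expected number of chance hits by 4, while the probability that a 63%-identity alignment contains a seed rises only gradually.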
RESULTS: We introduce a new algorithm, LSH-ALL-PAIRS, to find ungapped local alignments in genomic sequence with up to a specified fraction of substitutions. The length and substitution rate of these alignments can be chosen so that they appear frequently in significant similarities yet still remain rare in the background sequence. The algorithm finds ungapped alignments efficiently using a randomized search technique, locality-sensitive hashing. We have found LSH-ALL-PAIRS to be both efficient and sensitive for finding local similarities with as little as 63% identity in mammalian genomic sequences up to tens of megabases in length.
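The randomized search idea can be illustrated with the textbook bit-sampling LSH family for Hamming distance. This is a minimal sketch under stated assumptions, not LSH-ALL-PAIRS itself. It compares fixed-width windows of width `w` and hashes each window by the bases at `k` randomly chosen offsets. It repeats this for `iterations` independent draws and verifies every bucket collision by counting substitutions. All names (`lsh_similar_windows`, `w`, `k`, `iterations`, `max_mismatch`) are hypothetical. The paper's actual hash construction, parameter choices and handling of overlapping windows may differ.

```python
import random
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of substitutions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def lsh_similar_windows(s1, s2, w, k, iterations, max_mismatch, seed=0):
    """Return pairs (i, j) such that s1[i:i+w] and s2[j:j+w] differ in at
    most max_mismatch positions, using bit-sampling LSH (hypothetical
    sketch). It may miss pairs but never reports one that fails the check."""
    rng = random.Random(seed)
    found = set()
    for _ in range(iterations):
        # One hash function: the bases at k random offsets within a window.
        offsets = sorted(rng.sample(range(w), k))
        buckets = defaultdict(list)
        for i in range(len(s1) - w + 1):
            buckets["".join(s1[i + o] for o in offsets)].append(i)
        for j in range(len(s2) - w + 1):
            key = "".join(s2[j + o] for o in offsets)
            for i in buckets.get(key, ()):
                if (i, j) in found:
                    continue
                # Colliding windows are only candidates; verify each one.
                if hamming(s1[i:i + w], s2[j:j + w]) <= max_mismatch:
                    found.add((i, j))
    return found


def detection_probability(w, r, k, iterations):
    """Chance that a window pair with r mismatches collides under at least
    one of `iterations` independent hash functions."""
    return 1 - (1 - (1 - r / w) ** k) ** iterations


if __name__ == "__main__":
    a = "ACGTTGCAACGTAGCTAGGCTA"
    b = "ACGTTGCTACGTAGCTAGGCTA"  # one substitution at offset 7
    print(sorted(lsh_similar_windows(a, b, 12, 4, 20, 2)))
    print(detection_probability(12, 2, 4, 20))
```

Raising `k` makes chance collisions between unrelated windows rarer, since more sampled bases must agree. Raising `iterations` restores the chance of catching a true low-identity pair. `detection_probability` gives that chance under the usual assumption that offsets are drawn independently in each iteration.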

Year:  2001        PMID: 11331236     DOI: 10.1093/bioinformatics/17.5.419

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


Related articles (21 in total):

1.  BLAST: at the core of a powerful and diverse set of sequence analysis tools.

Authors:  Scott McGinnis; Thomas L Madden
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

2.  Mapping short DNA sequencing reads and calling variants using mapping quality scores.

Authors:  Heng Li; Jue Ruan; Richard Durbin
Journal:  Genome Res       Date:  2008-08-19       Impact factor: 9.043

3.  A geometric interpretation for local alignment-free sequence comparison.

Authors:  Ehsan Behnam; Michael S Waterman; Andrew D Smith
Journal:  J Comput Biol       Date:  2013-07       Impact factor: 1.479

4.  Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.

Authors:  Konstantin Berlin; Sergey Koren; Chen-Shan Chin; James P Drake; Jane M Landolin; Adam M Phillippy
Journal:  Nat Biotechnol       Date:  2015-05-25       Impact factor: 54.908

5.  Geometric aspects of biological sequence comparison.

Authors:  Aleksandar Stojmirović; Yi-Kuo Yu
Journal:  J Comput Biol       Date:  2009-04       Impact factor: 1.479

6.  The evolutionary analysis of emerging low frequency HIV-1 CXCR4 using variants through time--an ultra-deep approach.

Authors:  John Archer; Andrew Rambaut; Bruce E Taillon; P Richard Harrigan; Marilyn Lewis; David L Robertson
Journal:  PLoS Comput Biol       Date:  2010-12-16       Impact factor: 4.475

7.  Genome-scale NCRNA homology search using a Hamming distance-based filtration strategy.

Authors:  Yanni Sun; Osama Aljawad; Jikai Lei; Alex Liu
Journal:  BMC Bioinformatics       Date:  2012-03-21       Impact factor: 3.169

8.  Efficient computation of spaced seeds.

Authors:  Silvana Ilie
Journal:  BMC Res Notes       Date:  2012-02-28

9.  FSH: fast spaced seed hashing exploiting adjacent hashes.

Authors:  Samuele Girotto; Matteo Comin; Cinzia Pizzi
Journal:  Algorithms Mol Biol       Date:  2018-03-22       Impact factor: 1.405

10.  Mobilomics in Saccharomyces cerevisiae strains.

Authors:  Giulia Menconi; Giovanni Battaglia; Roberto Grossi; Nadia Pisanti; Roberto Marangoni
Journal:  BMC Bioinformatics       Date:  2013-03-20       Impact factor: 3.169