
Comparing compressed sequences for faster nucleotide BLAST searches.

Michael Cameron, Hugh E Williams.

Abstract

Molecular biologists, geneticists, and other life scientists use the BLAST homology search package as their first step in discovering information about unknown or poorly annotated genomic sequences. There are two main variants of BLAST: BLASTP for searching protein collections and BLASTN for searching nucleotide collections. Surprisingly, BLASTN has received very little attention; for example, the algorithms it uses do not follow those described in the 1997 BLAST paper, and no exact description has been published. It is important that BLASTN be state-of-the-art: nucleotide collections such as GenBank dwarf the protein collections in size, double in size almost yearly, and take many minutes to search on modern general-purpose workstations. This paper proposes significant improvements to the BLASTN algorithms. Each of our schemes is based on compressed byte-packed formats that allow queries and collection sequences to be compared four bases at a time, permitting very fast query evaluation using lookup tables and numeric comparisons. Our most significant innovations are two new, fast gapped alignment schemes that allow accurate sequence alignment without decompressing the collection sequences. Overall, our innovations more than double the speed of BLASTN with no effect on accuracy and have been integrated into our new version of BLAST, which is freely available for download from http://www.fsa-blast.org/.
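The idea of comparing byte-packed sequences four bases at a time with a lookup table can be illustrated with a minimal sketch. This is not the authors' FSA-BLAST code: the 2-bit encoding, the names `CODES`, `pack`, `MATCHES`, and `count_matches`, and the restriction to sequences whose length is a multiple of four are all assumptions made for illustration. Two packed bytes are XORed, and a 256-entry table gives the number of matching bases for each possible XOR result, so no decompression is needed.

```python
# Assumed 2-bit codes for the four bases (illustrative, not taken from the paper).
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(seq: str) -> bytes:
    """Pack a nucleotide string into bytes, four bases per byte.

    This sketch only handles A, C, G, T and lengths that are a multiple of 4.
    """
    if len(seq) % 4:
        raise ValueError("length must be a multiple of 4 in this sketch")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | CODES[base]
        out.append(byte)
    return bytes(out)


# MATCHES[x] is the number of 2-bit fields of x that are zero. When x is the
# XOR of two packed bytes, a zero field means the two bases at that position agree.
MATCHES = [
    sum(((x >> shift) & 3) == 0 for shift in (0, 2, 4, 6))
    for x in range(256)
]


def count_matches(a: bytes, b: bytes) -> int:
    """Count identical bases between two packed sequences, one byte (four bases) per step."""
    return sum(MATCHES[x ^ y] for x, y in zip(a, b))
```

For example, `count_matches(pack("ACGTACGT"), pack("ACGAACGT"))` compares eight bases with only two table lookups. A real implementation would also have to handle partial final bytes and ambiguity codes, which this sketch leaves out.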

Year: 2007   PMID: 17666756   DOI: 10.1109/TCBB.2007.1029

Source DB: PubMed   Journal: IEEE/ACM Trans Comput Biol Bioinform   ISSN: 1545-5963   Impact factor: 3.710


  3 in total

1.  REMiner: a tool for unbiased mining and analysis of repetitive elements and their arrangement structures of large chromosomes.

Authors:  Byung-Ik Chung; Kang-Hoon Lee; Kyung-Seop Shin; Woo-Chan Kim; Deug-Nam Kwon; Ri-Na You; Young-Kwan Lee; Kiho Cho; Dong-Ho Cho
Journal:  Genomics       Date:  2011-07-22       Impact factor: 5.736

2.  PSimScan: algorithm and utility for fast protein similarity search.

Authors:  Anna Kaznadzey; Natalia Alexandrova; Vladimir Novichkov; Denis Kaznadzey
Journal:  PLoS One       Date:  2013-03-07       Impact factor: 3.240

3.  Sequence Similarity Network Analysis Provides Insight into the Temporal and Geographical Distribution of Mutations in SARS-CoV-2 Spike Protein.

Authors:  Shruti S Patil; Helen N Catanese; Kelly A Brayton; Eric T Lofgren; Assefaw H Gebremedhin
Journal:  Viruses       Date:  2022-07-29       Impact factor: 5.818
