
Compact encoding strategies for DNA sequence similarity search.

D J States, P Agarwal.

Abstract

Determining whether two DNA sequences are similar is an essential component of DNA sequence analysis. Dynamic programming is the algorithm of choice if computational time is not the most important consideration. Heuristic search tools, such as BLAST, are computationally more efficient, but they may miss some of the sequence similarities (Altschul et al., 1990). These tools often use common k-tuples (words) between the two sequences to determine anchor points for the alignment, and spend most of their computational time extending the alignment beyond these anchor points. We discuss and provide a DNA sequence similarity search implementation (called SENSEI) that improves upon the performance of BLASTN by almost an order of magnitude for comparable sensitivity. This improvement is a result of using compactly encoded scoring tables for k-tuples, encoding bases with a single bit, filtering the sequence to remove the simple sequence repeats using XNUN, and masking the known species-specific repeats in the query sequence. To reduce memory requirements, especially for large genomic DNA query sequences, we recommend generating the neighborhood words from the target sequence at run-time, instead of generating them by preprocessing the query sequence.
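The abstract describes how heuristic tools find common k-tuples (words) shared by two sequences and use them as anchor points for alignment. The sketch below illustrates only that generic seeding step. It is not SENSEI's implementation. It packs each base into 2 bits, whereas the abstract says SENSEI encodes bases with a single bit but gives no details of that scheme. The function names `packed_words` and `find_anchors` and the default `k=8` are hypothetical choices for illustration. Extending alignments beyond the anchors, repeat filtering with XNUN, and neighborhood-word generation are not shown.

```python
from collections import defaultdict

# Hypothetical 2-bit code per base (assumption: SENSEI's single-bit scheme is not reproduced).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def packed_words(seq, k):
    """Yield (start position, packed integer) for every k-tuple without ambiguous bases."""
    mask = (1 << (2 * k)) - 1  # keeps only the last k bases (2 bits each)
    word = 0
    valid = 0  # count of consecutive unambiguous bases ending at the current position
    for i, base in enumerate(seq.upper()):
        code = CODE.get(base)
        if code is None:
            # An ambiguous base such as N breaks every k-tuple that spans it.
            word, valid = 0, 0
            continue
        word = ((word << 2) | code) & mask
        valid += 1
        if valid >= k:
            yield i - k + 1, word


def find_anchors(query, target, k=8):
    """Return (query_pos, target_pos) pairs where both sequences share an exact k-tuple."""
    index = defaultdict(list)
    for qpos, word in packed_words(query, k):
        index[word].append(qpos)
    anchors = []
    for tpos, word in packed_words(target, k):
        for qpos in index.get(word, ()):
            anchors.append((qpos, tpos))
    return anchors


if __name__ == "__main__":
    print(find_anchors("ACGTACGTTT", "GGACGTACGA", k=6))  # [(0, 2), (1, 3)]
```

In this sketch the query is indexed and the target is streamed past the index. A real tool would then extend each anchor with a scoring scheme, and that extension is where the abstract says most of the computation time is spent.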

Year:  1996        PMID: 8877521

Source DB:  PubMed          Journal:  Proc Int Conf Intell Syst Mol Biol        ISSN: 1553-0833


  4 in total

1.  Mastering seeds for genomic size nucleotide BLAST searches.

Authors:  Valer Gotea; Vamsi Veeramachaneni; Wojciech Makałowski
Journal:  Nucleic Acids Res       Date:  2003-12-01       Impact factor: 16.971

2.  High-resolution mapping of DNA copy alterations in human chromosome 22 using high-density tiling oligonucleotide arrays.

Authors:  Alexander Eckehart Urban; Jan O Korbel; Rebecca Selzer; Todd Richmond; April Hacker; George V Popescu; Joseph F Cubells; Roland Green; Beverly S Emanuel; Mark B Gerstein; Sherman M Weissman; Michael Snyder
Journal:  Proc Natl Acad Sci U S A       Date:  2006-03-14       Impact factor: 11.205

3.  Rapid identification and differentiation of Trichophyton species, based on sequence polymorphisms of the ribosomal internal transcribed spacer regions, by rolling-circle amplification.

Authors:  Fanrong Kong; Zhongsheng Tong; Xiaoyou Chen; Tania Sorrell; Bin Wang; Qixuan Wu; David Ellis; Sharon Chen
Journal:  J Clin Microbiol       Date:  2008-01-30       Impact factor: 5.948

4.  DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data.

Authors:  Gustavo Arango-Argoty; Emily Garner; Amy Pruden; Lenwood S Heath; Peter Vikesland; Liqing Zhang
Journal:  Microbiome       Date:  2018-02-01       Impact factor: 14.650
