An efficient algorithm for finding short approximate non-tandem repeats.

E F Adebiyi, T Jiang, M Kaufmann.

Abstract

We study the problem of approximate non-tandem repeat extraction. Given a long subject string S of length N over a finite alphabet Sigma and a threshold D, we would like to find all short substrings of S of length P that repeat with at most D differences, i.e., insertions, deletions, and mismatches. We give a careful theoretical characterization of the set of seeds (i.e., some maximal exact repeats) required by the algorithm, and prove a sublinear bound on their expected numbers. Using this result, we present a sub-quadratic algorithm for finding all short (i.e., of length O(log N)) approximate repeats. The running time of our algorithm is O(D N^(3 pow(epsilon) - 1) log N), where epsilon = D/P and pow(epsilon) is an increasing, concave function that is 0 when epsilon = 0 and about 0.9 for DNA and protein sequences.
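To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the seed-based algorithm of the paper; it is a naive baseline that compares every length-P substring against every later window, so it does not achieve the sub-quadratic bound. The function names, the choice to report pairs of start positions, and the requirement that the two copies do not overlap are assumptions made for illustration.

```python
def min_prefix_edit_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between `pattern` and any prefix of `text`.

    Standard dynamic programming over insertions, deletions and mismatches,
    keeping only one row of the table at a time.
    """
    # Row for the empty pattern: matching a prefix of length j costs j.
    prev = list(range(len(text) + 1))
    for i, pc in enumerate(pattern, start=1):
        curr = [i]  # matching i pattern characters against the empty prefix
        for j, tc in enumerate(text, start=1):
            curr.append(min(
                prev[j] + 1,               # pattern character has no partner
                curr[j - 1] + 1,           # text character has no partner
                prev[j - 1] + (pc != tc),  # match or mismatch
            ))
        prev = curr
    return min(prev)


def naive_approximate_repeats(s: str, p: int, d: int) -> list[tuple[int, int]]:
    """Return pairs (i, j) such that s[i:i+p] recurs starting at j with <= d differences.

    Assumptions (not from the source): 0 <= d < p, and the second copy
    starts at or after the end of the first one.
    """
    n = len(s)
    hits = []
    for i in range(n - p + 1):
        pattern = s[i:i + p]
        # The second copy has length between p - d and p + d.
        for j in range(i + p, n - (p - d) + 1):
            window = s[j:j + p + d]
            if min_prefix_edit_distance(pattern, window) <= d:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    print(naive_approximate_repeats("ACGTACGT", p=4, d=0))
    print(naive_approximate_repeats("ACGTTTACCT", p=4, d=1))
```

This baseline examines on the order of N^2 pairs of positions, each at a cost of O(P(P + D)) for the distance computation, which is the cost the seed-based approach described in the abstract is designed to avoid.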

Year:  2001        PMID: 11472987     DOI: 10.1093/bioinformatics/17.suppl_1.s5

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  5 in total

1.  ACMES: fast multiple-genome searches for short repeat sequences with concurrent cross-species information retrieval.

Authors:  Jeff Reneker; Chi-Ren Shyu; Peiyu Zeng; Joseph C Polacco; Walter Gassmann
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

2.  A novel signal processing measure to identify exact and inexact tandem repeat patterns in DNA sequences.

Authors:  Ravi Gupta; Divya Sarthi; Ankush Mittal; Kuldip Singh
Journal:  EURASIP J Bioinform Syst Biol       Date:  2007

3.  High-performance exact algorithms for motif search.

Authors:  Sanguthevar Rajasekaran; Sudha Balla; Chun-Hsi Huang; Vishal Thapar; Michael Gryk; Mark Maciejewski; Martin Schiller
Journal:  J Clin Monit Comput       Date:  2005-10       Impact factor: 1.977

4.  Refined repetitive sequence searches utilizing a fast hash function and cross species information retrievals.

Authors:  Jeff Reneker; Chi-Ren Shyu
Journal:  BMC Bioinformatics       Date:  2005-05-03       Impact factor: 3.169

5.  Understanding and identifying amino acid repeats.

Authors:  Hong Luo; Harm Nijveen
Journal:  Brief Bioinform       Date:  2014-07       Impact factor: 11.622
