An algorithm for approximate tandem repeats.

G M Landau, J P Schmidt, D Sokol.

Abstract

A perfect single tandem repeat is defined as a nonempty string that can be divided into two identical substrings, e.g., abcabc. An approximate single tandem repeat is one in which the substrings are similar, but not identical, e.g., abcdaacd. In this paper we consider two criteria of similarity: the Hamming distance (k mismatches) and the edit distance (k differences). For a string S of length n and an integer k, our algorithm reports all locally optimal approximate repeats, r = ūû, for which the Hamming distance of ū and û is at most k, in O(nk log(n/k)) time, or all those for which the edit distance of ū and û is at most k, in O(nk log k log(n/k)) time. This paper concentrates on a more general type of repeat called multiple tandem repeats. A multiple tandem repeat in a sequence S is a (periodic) substring r of S of the form r = u^a u', where u is a prefix of r and u' is a prefix of u. An approximate multiple tandem repeat is a multiple repeat with errors; the repeated subsequences are similar but not identical. We precisely define approximate multiple repeats, and present an algorithm that finds all repeats that conform to our definition. The time complexity of the algorithm, when searching for repeats with up to k errors in a string S of length n, is O(nka log(n/k)), where a is the maximum number of periods in any reported repeat. We present some experimental results concerning the performance and sensitivity of our algorithm. Finding repeats within a string is a computational problem with important applications in molecular biology. Both exact and inexact repeats occur frequently in the genome, and certain repeats occurring in the genome are known to be related to human diseases.
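The definitions in the abstract can be made concrete with a small brute-force sketch. This is not the algorithm of the paper: it runs in roughly cubic time rather than O(nk log(n/k)), covers only the Hamming-distance case, and does not filter for locally optimal repeats. The function names (`hamming`, `approximate_tandem_repeats`, `has_period`) are hypothetical and chosen only for illustration.

```python
from typing import Iterator, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def approximate_tandem_repeats(s: str, k: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (start, half_length, mismatches) for every substring
    s[start:start + 2*half_length] whose two halves differ in at most
    k positions. Brute force; no local-optimality filtering (assumption:
    the paper reports only locally optimal repeats, this reports all)."""
    n = len(s)
    for half in range(1, n // 2 + 1):
        for start in range(n - 2 * half + 1):
            left = s[start:start + half]
            right = s[start + half:start + 2 * half]
            d = hamming(left, right)
            if d <= k:
                yield start, half, d


def has_period(r: str, p: int) -> bool:
    """True if r has the exact form u^a u' with |u| = p,
    i.e. every character equals the one p positions later."""
    if p <= 0:
        raise ValueError("period must be positive")
    return all(r[i] == r[i + p] for i in range(len(r) - p))
```

With the two examples from the abstract, `approximate_tandem_repeats("abcabc", 0)` yields only the perfect repeat starting at position 0 with half-length 3, and `hamming("abcd", "aacd")` is 1, so abcdaacd qualifies as an approximate repeat for any k of at least 1.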

Year:  2001        PMID: 11339903     DOI: 10.1089/106652701300099038

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  15 in total

1.  mreps: Efficient and flexible detection of tandem repeats in DNA.

Authors:  Roman Kolpakov; Ghizlane Bana; Gregory Kucherov
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

2.  ACMES: fast multiple-genome searches for short repeat sequences with concurrent cross-species information retrieval.

Authors:  Jeff Reneker; Chi-Ren Shyu; Peiyu Zeng; Joseph C Polacco; Walter Gassmann
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

3.  A novel signal processing measure to identify exact and inexact tandem repeat patterns in DNA sequences.

Authors:  Ravi Gupta; Divya Sarthi; Ankush Mittal; Kuldip Singh
Journal:  EURASIP J Bioinform Syst Biol       Date:  2007

4.  Consensus higher order repeats and frequency of string distributions in human genome.

Authors:  Vladimir Paar; Ivan Basar; Marija Rosandić; Matko Gluncić
Journal:  Curr Genomics       Date:  2007-04       Impact factor: 2.236

5.  TRStalker: an efficient heuristic for finding fuzzy tandem repeats.

Authors:  Marco Pellegrini; M Elena Renda; Alessio Vecchio
Journal:  Bioinformatics       Date:  2010-06-15       Impact factor: 6.937

6.  TRedD--a database for tandem repeats over the edit distance.

Authors:  Dina Sokol; Firat Atagun
Journal:  Database (Oxford)       Date:  2010-07-06       Impact factor: 3.451

7.  Database of exact tandem repeats in the Zebrafish genome.

Authors:  Eric C Rouchka
Journal:  BMC Genomics       Date:  2010-06-01       Impact factor: 3.969

8.  Pattern matching through Chaos Game Representation: bridging numerical and discrete data structures for biological sequence analysis.

Authors:  Susana Vinga; Alexandra M Carvalho; Alexandre P Francisco; Luís Ms Russo; Jonas S Almeida
Journal:  Algorithms Mol Biol       Date:  2012-05-02       Impact factor: 1.405

9.  A monte carlo method for assessing the quality of duplication-aware alignment algorithms.

Authors:  Valerio Freschi; Alessandro Bogliolo
Journal:  Evol Bioinform Online       Date:  2011-05-10       Impact factor: 1.625

10.  NTRFinder: a software tool to find nested tandem repeats.

Authors:  Atheer A Matroud; M D Hendy; C P Tuffley
Journal:  Nucleic Acids Res       Date:  2011-11-25       Impact factor: 16.971
