Identifying satellites and periodic repetitions in biological sequences.

M F Sagot, E W Myers.

Abstract

We present an algorithm for identifying satellites in DNA sequences. Satellites (simple, micro, or mini) are repeats, numbering between 30 and as many as 1,000,000 copies, whose lengths vary between 2 and hundreds of base pairs and that appear, with some mutations, in tandem along the sequence. We concentrate here on short to moderately long (up to 30-40 base pairs) approximate tandem repeats where copies may differ by up to epsilon = 15-20% from a consensus model of the repeating unit (implying that individual units may differ from each other by up to 2 epsilon). The algorithm has two parts. The first is a filter that eliminates all regions whose probability of containing a satellite is less than one in 10^4 when epsilon = 10%. The second performs an exhaustive exploration of the space of all possible models for the repeating units present in the sequence. It therefore has the advantage over previous work of being able to report a consensus model, say m, of the repeated unit as well as the span of the satellite. The first phase was designed for efficiency and takes only O(n) time, where n is the length of the sequence. The second phase was designed for sensitivity and takes O(n · N(e, k)) time in the worst case, where k is the length of the repeating unit m, e = [epsilon k] is the number of differences allowed between each repeat unit and the model m, and N(e, k) is the maximum number of words that are not more than e differences from another word of length k. That is, N(e, k) is the maximum size of an e-neighborhood of a string of length k. Experiments reveal the second phase to be considerably faster in practice than the worst-case complexity bound suggests. Finally, the present algorithm is easily adapted to finding tandem repeats in protein sequences, and can be extended to identify mixed direct-inverse tandem repeats.
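
To make the e-neighborhood behind N(e, k) concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes that "differences" means single-character substitutions, insertions, and deletions (edit distance) over the DNA alphabet; the function names `single_edits`, `neighborhood`, and `edit_distance` are hypothetical. It enumerates every word within e differences of a given word by brute force, which is only feasible for small e and k.

```python
ALPHABET = "ACGT"  # assumption: DNA alphabet


def single_edits(word, alphabet=ALPHABET):
    """All strings exactly one substitution, insertion, or deletion away from word."""
    out = set()
    for i in range(len(word) + 1):
        for c in alphabet:
            out.add(word[:i] + c + word[i:])  # insert c before position i
        if i < len(word):
            out.add(word[:i] + word[i + 1:])  # delete position i
            for c in alphabet:
                if c != word[i]:
                    out.add(word[:i] + c + word[i + 1:])  # substitute position i
    return out


def neighborhood(word, e, alphabet=ALPHABET):
    """Set of strings reachable from word with at most e edits (word included)."""
    seen = {word}
    frontier = {word}
    for _ in range(e):
        # Expand one more edit from the strings found in the previous round.
        frontier = {v for u in frontier for v in single_edits(u, alphabet)} - seen
        seen |= frontier
    return seen


def edit_distance(a, b):
    """Levenshtein distance by the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    model = "ACG"  # hypothetical consensus model of length k = 3
    units = neighborhood(model, 1)
    print(len(units))
    # Any two units within e of the model are within 2e of each other.
    print(max(edit_distance(u, v) for u in units for v in units))
```

The second printed value illustrates the remark that units within epsilon of a common model can differ from each other by up to 2 epsilon, which follows from the triangle inequality for edit distance.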

Year:  1998        PMID: 9773349     DOI: 10.1089/cmb.1998.5.539

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  9 in total

Review 1.  Mapping the bacterial cell architecture into the chromosome.

Authors:  A Danchin; P Guerdoux-Jamet; I Moszer; P Nitschké
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2000-02-29       Impact factor: 6.237

2.  An appraisal of the potential for illegitimate recombination in bacterial genomes and its consequences: from duplications to genome reduction.

Authors:  Eduardo P C Rocha
Journal:  Genome Res       Date:  2003-05-12       Impact factor: 9.043

Review 3.  Computational approaches to identify promoters and cis-regulatory elements in plant genomes.

Authors:  Stephane Rombauts; Kobe Florquin; Magali Lescot; Kathleen Marchal; Pierre Rouzé; Yves van de Peer
Journal:  Plant Physiol       Date:  2003-07       Impact factor: 8.340

Review 4.  Comparative genomics and molecular dynamics of DNA repeats in eukaryotes.

Authors:  Guy-Franck Richard; Alix Kerrest; Bernard Dujon
Journal:  Microbiol Mol Biol Rev       Date:  2008-12       Impact factor: 11.056

5.  Consensus higher order repeats and frequency of string distributions in human genome.

Authors:  Vladimir Paar; Ivan Basar; Marija Rosandić; Matko Glunčić
Journal:  Curr Genomics       Date:  2007-04       Impact factor: 2.236

6.  Direct mapping of symbolic DNA sequence into frequency domain in global repeat map algorithm.

Authors:  Matko Glunčić; Vladimir Paar
Journal:  Nucleic Acids Res       Date:  2012-09-12       Impact factor: 16.971

7.  NTRFinder: a software tool to find nested tandem repeats.

Authors:  Atheer A Matroud; M D Hendy; C P Tuffley
Journal:  Nucleic Acids Res       Date:  2011-11-25       Impact factor: 16.971

8.  Browsing repeats in genomes: Pygram and an application to non-coding region analysis.

Authors:  Patrick Durand; Frédéric Mahé; Anne-Sophie Valin; Jacques Nicolas
Journal:  BMC Bioinformatics       Date:  2006-10-26       Impact factor: 3.169

9.  Combined evidence annotation of transposable elements in genome sequences.

Authors:  Hadi Quesneville; Casey M Bergman; Olivier Andrieu; Delphine Autard; Danielle Nouaud; Michael Ashburner; Dominique Anxolabehere
Journal:  PLoS Comput Biol       Date:  2005-07-29       Impact factor: 4.475
