
Searching for repeats, as an example of using the generalised Ruzzo-Tompa algorithm to find optimal subsequences with gaps.

John L Spouge, Leonardo Mariño-Ramírez, Sergey L Sheetlin.

Abstract

Some biological sequences contain subsequences of unusual composition; e.g. some proteins contain DNA binding domains, transmembrane regions and charged regions, and some DNA sequences contain repeats. The linear-time Ruzzo-Tompa (RT) algorithm finds subsequences of unusual composition, using a sequence of scores as input and the corresponding 'maximal segments' as output. In principle, permitting gaps in the output subsequences could improve sensitivity. Here, the input of the RT algorithm is generalised to a finite, totally ordered, weighted graph, so the algorithm locates paths of maximal weight through increasing but not necessarily adjacent vertices. By permitting the penalised deletion of unfavourable letters, the generalisation therefore includes gaps. The program RepWords, which finds inexact simple repeats in DNA, exemplifies the general concepts by out-performing a similar extant, ad hoc tool. With minimal programming effort, the generalised Ruzzo-Tompa algorithm could improve the performance of many programs for finding biological subsequences of unusual composition.
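As an illustration of the "sequence of scores in, maximal segments out" behaviour described above, the following is a minimal Python sketch of the classic (ungapped) Ruzzo-Tompa procedure. It is not the generalised, graph-based algorithm of the paper and is not taken from RepWords. The function name `maximal_segments` and the `(start, end, score)` output format are assumptions made for this example. For brevity it uses a plain right-to-left scan of the candidate list, so it does not guarantee the linear running time of the original algorithm.

```python
from typing import List, Tuple


def maximal_segments(scores: List[float]) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) for each maximal segment; end is exclusive.

    Hypothetical helper: a simplified sketch of the classic Ruzzo-Tompa
    algorithm, without the pointer bookkeeping that makes it linear-time.
    """
    # Each candidate is [start, end, L, R], where L is the cumulative score
    # just before the segment and R is the cumulative score at its end.
    candidates: List[list] = []
    total = 0
    for i, s in enumerate(scores):
        left = total
        total += s
        if s <= 0:
            continue  # non-positive scores never start a candidate
        start, end, L, R = i, i + 1, left, total
        while True:
            # Find the rightmost candidate j with L_j < L.
            j = len(candidates) - 1
            while j >= 0 and candidates[j][2] >= L:
                j -= 1
            # No such j, or candidate j already ends at least as high:
            # keep the new segment as a separate candidate.
            if j < 0 or candidates[j][3] >= R:
                candidates.append([start, end, L, R])
                break
            # Otherwise extend leftwards to absorb candidates j..end,
            # then reconsider the extended segment.
            start, L = candidates[j][0], candidates[j][2]
            del candidates[j:]
    return [(a, b, R - L) for a, b, L, R in candidates]


if __name__ == "__main__":
    example = [4, -5, 3, -3, 1, 2, -2, 2, -2, 1, 5]
    print(maximal_segments(example))  # [(0, 1, 4), (2, 3, 3), (4, 11, 7)]
```

In the example, the third segment spans several negative scores because absorbing them raises the total. The gapped generalisation in the abstract goes further by allowing unfavourable letters to be deleted at a penalty.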

Keywords:  DNA sequences; biological sequences; gaps; generalised Ruzzo–Tompa algorithm; optimal subsequences; repeats; unusual composition

Year:  2014        PMID: 24989859      PMCID: PMC4135518          DOI: 10.1504/IJBRA.2014.062991

Source DB:  PubMed          Journal:  Int J Bioinform Res Appl        ISSN: 1744-5485

