Searching for repeats, as an example of using the generalised Ruzzo-Tompa algorithm to find optimal subsequences with gaps.

John L Spouge, Leonardo Mariño-Ramírez, Sergey L Sheetlin.

Abstract

Some biological sequences contain subsequences of unusual composition; e.g. some proteins contain DNA binding domains, transmembrane regions and charged regions, and some DNA sequences contain repeats. The linear-time Ruzzo-Tompa (RT) algorithm finds subsequences of unusual composition, using a sequence of scores as input and the corresponding 'maximal segments' as output. In principle, permitting gaps in the output subsequences could improve sensitivity. Here, the input of the RT algorithm is generalised to a finite, totally ordered, weighted graph, so the algorithm locates paths of maximal weight through increasing but not necessarily adjacent vertices. By permitting the penalised deletion of unfavourable letters, the generalisation therefore includes gaps. The program RepWords, which finds inexact simple repeats in DNA, exemplifies the general concepts by out-performing a similar extant, ad hoc tool. With minimal programming effort, the generalised Ruzzo-Tompa algorithm could improve the performance of many programs for finding biological subsequences of unusual composition.
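For illustration only, the original (ungapped) RT procedure mentioned above, which takes a sequence of scores and returns its maximal segments, might be sketched in Python as follows. This is not the authors' RepWords code and it does not implement the generalised, gapped algorithm. The function name `maximal_segments` is hypothetical, and the sketch scans the candidate list directly rather than keeping the auxiliary pointers needed for the linear-time bound.

```python
def maximal_segments(scores):
    """Return the maximal scoring segments of `scores` as (start, end, total).

    `end` is exclusive. Hypothetical helper: a simplified sketch of the
    ungapped Ruzzo-Tompa procedure, not the generalised algorithm.
    """
    # Each candidate is [start, end, L, R], where L is the cumulative score
    # before the segment starts and R is the cumulative score at its end.
    candidates = []
    total = 0
    for i, s in enumerate(scores):
        left = total
        total += s
        if s <= 0:
            continue  # non-positive scores never start a segment
        cur = [i, i + 1, left, total]
        while True:
            # Find the rightmost candidate j with L_j < L_cur.
            j = len(candidates) - 1
            while j >= 0 and candidates[j][2] >= cur[2]:
                j -= 1
            # No such candidate, or it already ends at least as high: keep cur.
            if j < 0 or candidates[j][3] >= cur[3]:
                candidates.append(cur)
                break
            # Otherwise extend cur leftwards to absorb candidates j onwards,
            # then reconsider the merged segment.
            cur = [candidates[j][0], cur[1], candidates[j][2], cur[3]]
            del candidates[j:]
    return [(a, b, r - l) for a, b, l, r in candidates]


if __name__ == "__main__":
    print(maximal_segments([4, -5, 3, -3, 1, 2, -2, 2, -2, 1, 5]))
```

In this sketch each segment's score is the difference R - L of cumulative totals, so merging two candidates only requires updating the start position and L.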


Keywords: DNA sequences; biological sequences; gaps; generalised Ruzzo–Tompa algorithm; optimal subsequences; repeats; unusual composition

Substances: Proteins; DNA

Year: 2014; PMID: 24989859; PMCID: PMC4135518; DOI: 10.1504/IJBRA.2014.062991

Source DB: PubMed; Journal: Int J Bioinform Res Appl; ISSN: 1744-5485

