Morgane Thomas-Chollier, Olivier Sand, Jean-Valéry Turatsinze, Rekin's Janky, Matthieu Defrance, Eric Vervisch, Sylvain Brohée, Jacques van Helden.
Abstract
The regulatory sequence analysis tools (RSAT, http://rsat.ulb.ac.be/rsat/) is a software suite that integrates a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. The suite includes programs for sequence retrieval, pattern discovery, phylogenetic footprint detection, pattern matching, genome scanning and feature map drawing. Random controls can be performed with random gene selections or by generating random sequences according to a variety of background models (Bernoulli, Markov). Beyond the original word-based pattern-discovery tools (oligo-analysis and dyad-analysis), we recently added a battery of tools for matrix-based detection of cis-acting elements, with some original features (adaptive background models, Markov-chain estimation of P-values) that do not exist in other matrix-based scanning tools. The web server offers an intuitive interface, where each program can be accessed either separately or connected to the other tools. In addition, the tools are now available as web services, enabling their integration in programmatic workflows. Genomes are regularly updated from various genome repositories (NCBI and EnsEMBL) and 682 organisms are currently supported. Since 1998, the tools have been used by several hundred researchers from all over the world. Several predictions made with RSAT were validated experimentally and published.
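The random controls described in the abstract rely on background models (Bernoulli, Markov). The sketch below illustrates the general idea only and is not RSAT code: it estimates a Markov model of a chosen order from training sequences and draws a random sequence from it, with order 0 reducing to a Bernoulli model. The function names `train_markov` and `random_sequence` and the `pseudo` parameter are assumptions made for this illustration.

```python
import random
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(sequences, order):
    """Count, for each prefix of length `order`, how often each residue follows it."""
    counts = defaultdict(lambda: {b: 0 for b in ALPHABET})
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1
    return counts


def random_sequence(length, counts, order, rng, pseudo=1.0):
    """Draw a sequence from the model; order 0 is a Bernoulli model.

    `pseudo` (hypothetical parameter) is added to every count so that
    prefixes never seen in training still have a valid distribution.
    """
    # The first `order` residues have no full prefix, so draw them uniformly.
    seq = [rng.choice(ALPHABET) for _ in range(order)]
    while len(seq) < length:
        prefix = "".join(seq[len(seq) - order:])
        row = counts.get(prefix, {})
        weights = [row.get(b, 0) + pseudo for b in ALPHABET]
        seq.append(rng.choices(ALPHABET, weights=weights)[0])
    return "".join(seq[:length])


if __name__ == "__main__":
    rng = random.Random(42)
    training = ["ACGTACGTACGT", "TTGACGTCAA"]  # toy training set
    model = train_markov(training, order=1)
    print(random_sequence(30, model, order=1, rng=rng))
```

A higher order captures more sequence context but needs more training data, since the number of prefixes grows as 4 to the power of the order.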
Year: 2008 PMID: 18495751 PMCID: PMC2447775 DOI: 10.1093/nar/gkn304
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Short description of the programs supported on RSAT web sites
| Task | Program name | Input | Output | Description |
|---|---|---|---|---|
| Genomes and genes | | | Organism names | Returns the list of organisms supported on this site of rsa-tools. |
| | | Gene names | Genes | Selects genes whose identifier, name or description matches a list of query strings. Partial matches are supported. |
| | | Gene names | Operons + leader genes | Given one or more input genes, applies a simple distance-based rule to infer the operons to which those genes belong. Reports the predicted operon leader gene and/or the complete operon. |
| | | Organism | Genes | Selects a random set of genes. |
| | | | | Given a gene or a list of genes from a query organism, and a reference taxon, this program returns the orthologs of the query gene(s) in all the organisms belonging to the reference taxon. |
| Sequences | | Gene names | Sequences | Given a set of gene names, returns upstream, downstream or unspliced ORF sequences. The user defines the limits relative to the ORF start. Segments overlapping an upstream ORF can be excluded or included. |
| | | Sequences | Sequences | Discards large repetitive fragments from a sequence set. Program developed by Stefan Kurtz. |
| | | Sequences | Sequences | Interconversions between different sequence formats. |
| | | | Sequences | Generates random sequences. Different probabilistic models are proposed (equiprobable nucleotides, specific alphabet utilization and Markov chains). |
| Pattern discovery | | Sequences | Exceptional oligos | Analyzes oligonucleotide occurrences in a set of sequences, and detects over- or under-represented oligonucleotides. Various background models and scoring statistics are supported. |
| | | Sequences | Exceptional dyads | Detects overrepresented dyads (spaced pairs of oligonucleotides) within a set of sequences. |
| | | Sequences | Conserved dyads | Detects phylogenetic footprints by applying dyad-analysis in promoters of a set of orthologous genes. |
| | | Sequences | Positionally biased oligos | Calculates the positional distribution of oligonucleotides in a set of sequences, and detects those which significantly deviate from a homogeneous distribution. |
| | | Sequences | Locally over/under-represented oligos/dyads | Computes oligomer/dyad frequencies in a set of sequences, and detects locally over/underrepresented oligomers. |
| | | Oligos/dyads | Alignment | Aligns a set of strongly overlapping patterns (oligos or dyads). |
| | | String-based patterns (IUPAC) | Matches between patterns + related statistics | Counts matching residues between pairs of sequences/patterns from two sets, and assesses the statistical significance of the matches. Patterns can be described using the IUPAC code for ambiguous nucleotides. Spaced patterns (dyads) are also supported. |
| | | Sequences | PSSM | Detects shared motifs in unaligned sequences on the basis of a greedy algorithm. Developed by Jerry Hertz. |
| | | Sequences | PSSM | Detects shared motifs in unaligned sequences on the basis of a Gibbs sampling strategy. Developed by Andrew Neuwald. |
| Pattern matching | | Sequences + multiple patterns (string description) | Matching positions in input sequences | String-based pattern matching program specialized for DNA sequences. IUPAC code for partially specified nucleotides is supported, as well as regular expressions. Several patterns can be searched simultaneously in several sequences, allowing a fast detection |
| | | Multiple patterns (string description) | Matching positions in all upstream sequences | Pattern matching with |
| | | Sequences + multiple patterns (PSSM) | Matching positions in input sequences | Scans sequences with one or several PSSMs to identify instances of the corresponding motifs (putative sites). This program supports a variety of background models (Bernoulli, Markov chains of any order). |
| | | Sequences + one pattern (PSSM) | Matching positions in input sequences | Pattern matching program based on a position-specific scoring matrix description of the patterns. Developed by Jerry Hertz. |
| | | Single pattern (PSSM) | Matching positions in all upstream sequences | Pattern matching with patser, applied to all genes (upstream or downstream sequences) of a selected organism. |
| | | Background model | Background model | Interconversions between formats of background models supported by different programs. |
| | | Features | Features | Interconversions between various formats of feature description. |
| | | Features | Features + statistics | Compares two or more sets of features. This program takes as input several feature files (two or more), and calculates the intersection, union and difference between features. It also computes contingency tables and comparison statistics. |
| | | Patterns (PSSM) | Patterns (PSSM) | Performs interconversions between various formats of PSSMs. The program also performs a statistical analysis of the original matrix to provide different position-specific scores (weight, frequencies, information content). |
| | | Patterns (PSSM) | Theoretical score distribution | Computes the theoretical distribution of score probabilities of a given PSSM. Score probabilities can be computed according to Bernoulli as well as Markov-chain background models. |
| Drawing | | Matching positions | Drawing | Draws a map with the results of pattern matching programs. Several sequences can be represented in parallel, allowing visual comparison of matching positions. |
| | | Numbers | Drawing | Draws a 2D graph from a table of numerical data. |
Note that additional programs are available as Web Services and/or with the stand-alone tools.
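The string-based pattern matching summarized in the table (IUPAC code for partially specified nucleotides, several patterns searched in several sequences) can be sketched with regular expressions. This is a minimal illustration under stated assumptions, not the RSAT implementation: the function names, the `+`/`-` strand labels and the 1-based inclusive coordinates are choices made for this example.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(pattern):
    """Reverse complement of an IUPAC pattern."""
    return pattern.upper().translate(COMPLEMENT)[::-1]


def iupac_to_regex(pattern):
    """Translate an IUPAC pattern into a regular expression string."""
    return "".join(IUPAC[c] for c in pattern.upper())


def find_matches(sequence, pattern):
    """Return sorted (start, end, strand, matched_text) tuples.

    Coordinates are 1-based and inclusive, on the input strand. A lookahead
    is used so that overlapping occurrences are all reported. Reverse-strand
    hits are found by searching the reverse complement of the pattern, so
    matched_text is always read from the input strand.
    """
    seq = sequence.upper()
    hits = []
    for strand, pat in (("+", pattern), ("-", reverse_complement(pattern))):
        regex = re.compile("(?=(" + iupac_to_regex(pat) + "))")
        for m in regex.finditer(seq):
            text = m.group(1)
            hits.append((m.start() + 1, m.start() + len(text), strand, text))
    return sorted(hits)


if __name__ == "__main__":
    for hit in find_matches("AAGATAAGG", "WGATAR"):
        print(hit)
```

A reverse-palindromic pattern such as CACGTG is reported once per strand at the same position, so a caller may want to collapse such duplicates.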
Figure 1. Flow chart of the regulatory sequence analysis tools. Rounded boxes represent programs, rectangles represent data and results, and trapezoids represent user input. Bold arrows highlight the succession of tools used by the tool footprint-discovery.
Supported inter-conversions between formats
| Data type | Program name | Supported input formats | Supported output formats |
|---|---|---|---|
| Sequences | | EMBL, fasta, multi, raw, tab, wconsensus | fasta, ig, multi, raw, tab, wconsensus |
| Features | | | |
| PSSM | | | |
| Background models | | transition table, | |
Figure 2. Example of result from footprint-discovery. (A) Overrepresented dyads detected in promoters of orthologs of the yeast gene MET1. (B) PSSM obtained by assembling the most significant dyads and using them as seeds to scan the input sequences. (C) Feature map of the significant dyads. The clumps of overlapping boxes are indicative of good predictions for binding sites.
Figure 3. Example of matrix-scan result obtained by scanning yeast upstream sequences with matrices representing binding motifs for the transcription factors Met4p and Met31p. (A) Sequence logos representing binding motifs of the Met4p and Met31p transcription factors. (B) Feature map of the predicted sites and CRERs in upstream sequences of 26 yeast genes involved in methionine metabolism. (C) Random control: feature map of the predicted sites and CRERs detected in upstream sequences of 26 yeast genes selected at random. (D) Fragment of a matrix-scan result table reporting putative sites.
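Scanning sequences with a PSSM, as in the matrix-scan result of Figure 3, can be illustrated with a log-odds weight score. The sketch below is a simplified assumption-based example and not the matrix-scan algorithm: it uses a Bernoulli background only (the program described above also supports Markov chains), a pseudo-count distributed according to the background frequencies, and a fixed score threshold instead of P-values. The names `weight_matrix`, `scan`, `pseudo` and `threshold` are hypothetical.

```python
import math


def weight_matrix(counts, background, pseudo=1.0):
    """Convert per-position residue counts into log-odds weights.

    counts: list of dicts, one per motif position, e.g. {"A": 4, "C": 0, ...}
    background: dict of residue prior probabilities (Bernoulli model)
    The pseudo-count is shared between residues in proportion to their priors.
    """
    weights = []
    for column in counts:
        total = sum(column.get(b, 0) for b in background) + pseudo
        weights.append({
            b: math.log((column.get(b, 0) + pseudo * background[b])
                        / total / background[b])
            for b in background
        })
    return weights


def scan(sequence, weights, threshold):
    """Return (start, site, score) for each window scoring >= threshold.

    Start positions are 1-based. Windows containing residues absent from
    the matrix (for example N) are skipped.
    """
    width = len(weights)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        site = seq[start:start + width]
        if any(res not in col for col, res in zip(weights, site)):
            continue
        score = sum(col[res] for col, res in zip(weights, site))
        if score >= threshold:
            hits.append((start + 1, site, score))
    return hits


if __name__ == "__main__":
    background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    toy_counts = [{"A": 4}, {"C": 4}, {"G": 4}]  # toy 3-column motif
    pssm = weight_matrix(toy_counts, background)
    for start, site, score in scan("TTACGTTNACG", pssm, threshold=0.0):
        print(start, site, round(score, 3))
```

Only the forward strand is scanned here; a two-strand scan would repeat the loop on the reverse complement of the sequence.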