Alejandra Medina-Rivera, Matthieu Defrance, Olivier Sand, Carl Herrmann, Jaime A Castro-Mondragon, Jeremy Delerce, Sébastien Jaeger, Christophe Blanchet, Pierre Vincens, Christophe Caron, Daniel M Staines, Bruno Contreras-Moreira, Marie Artufel, Lucie Charbonnier-Khamvongsa, Céline Hernandez, Denis Thieffry, Morgane Thomas-Chollier, Jacques van Helden.
Abstract
RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), and a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase in the number of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/.
Year: 2015 PMID: 25904632 PMCID: PMC4489296 DOI: 10.1093/nar/gkv362
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Overview of the main applications of RSAT.
Selection of some tools available on RSAT Web servers
| Application field | Program name | Input | Output | Description |
|---|---|---|---|---|
| Obtaining sequences (Sequence Tools) | retrieve-seq | Gene names | Sequences | Given a set of gene names, returns upstream, downstream (relative to ORF start) or unspliced ORF sequences. Segments overlapping an upstream ORF can be excluded or included. |
| | * fetch-sequences (from UCSC) | Genomic coordinates | Sequences | From a set of genomic coordinates (BED file), collects the sequences from the UCSC genome browser. |
| | retrieve-ensembl-seq | Gene names | Sequences | Returns upstream, downstream, intronic, exonic, UTR, mRNA or CDS for a list of genes from Ensembl. Multi-genome queries enable automatic retrieval of sequences for gene orthologues. |
| | * retrieve-variation-seq | Identifier of variations | Sequences of the variants | Given a set of IDs for genetic variations, returns the corresponding variants and their flanking sequences. The output file can be scanned with the tool ‘variation-scan’. |
| Motif discovery | oligo-analysis | Sequences | Over/under-represented oligonucleotides + PSSM | Analyses oligonucleotide occurrences in a set of sequences and detects over/under-represented oligonucleotides, using various background models and scoring statistics. |
| | dyad-analysis | Sequences | Over/under-represented dyads + PSSM | Detects over-represented dyads (spaced pairs of oligonucleotides) within a set of sequences. |
| NGS ChIP-seq | peak-motifs | Sequences | Discovered motifs + predicted sites | Discovers motifs in ChIP-seq peak sequence sets and returns detailed information on sequence composition and discovered motifs, with correspondences in databases and predicted binding sites. |
| Pattern matching | * crer-scan | Transcription factor binding sites | | Given a set of |
| | matrix-scan (-quick) | Sequences + PSSMs | Matching positions in input sequences | Scans sequences with one or several PSSMs to identify instances of the corresponding motifs (putative sites). Supports a variety of background models (Bernoulli, Markov chains of any order). |
| | * variation-scan | Variant sequences | Regulatory variants | Scans variant sequences with PSSMs and reports variations that affect the binding score, in order to predict regulatory variants. |
| | dna-pattern | Sequences + patterns | Matching positions in input sequences | String-based pattern matching program specialized for DNA sequences. Supports the IUPAC code for partially specified nucleotides, regular expressions and simultaneous searches for multiple patterns. |
| Motif quality and comparisons (Matrix Tools) | matrix-quality | Motif (PSSM) + sequence set(s) | Score distribution statistics + ROC curves | Evaluates the quality of a PSSM by comparing score distributions obtained with this matrix in control sequence sets. |
| | compare-matrices | Two sets of PSSMs | Similarity scores + matrix alignments | Compares two collections of PSSMs and returns various similarity statistics + matrix alignments. |
| | * matrix-clustering | One set of PSSMs | Clusters of matrices + similarity trees | Clusters similar PSSMs and builds consensus matrices for each cluster. |
| Comparative genomics | get-orthologs | Gene names + taxon | List of homologous genes with percentage of identity, alignment length and e-value | Given a list of genes from a query organism and a reference taxon, returns the orthologues of the query gene(s) in all the organisms belonging to the reference taxon. |
| | footprint-discovery | Sequences | Conserved dyads + PSSM | Detects phylogenetic footprints by applying ‘dyad-analysis’ in promoters of a set of orthologous genes. |
| | * footprint-scan | Sequences + PSSM | Conserved motifs + binding sites | Scans promoters of orthologous genes with one or several PSSMs to detect enriched motifs and predict phylogenetically conserved target genes. |
| Building control sets | random-seq | Sequence specifications | Sequences | This program generates random sequences. Different probabilistic models are proposed (equiprobable nucleotides, specific alphabet utilization, Markov chains). |
| | random-genes | Name of an organism | Genes | Selects a random set of genes in a given genome. |
| | random-genome-fragments | Name of an organism | Randomly selected genome fragments | Selects a set of fragments with random positions in a given genome supported in either RSAT or Ensembl and returns their coordinates and/or sequences. |
| | permute-matrix | One set of PSSMs | Randomized PSSMs | Randomizes a set of input matrices by permuting their columns. The resulting motifs have the same nucleotide composition and information content as the original ones. |
This table only displays the most central tools available on the Web interface. See the RSAT Web site for an exhaustive list of available tools. The new tools since the 2011 Web software issue are marked with an asterisk (*).
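The permute-matrix entry states that permuting the columns of a matrix preserves its nucleotide composition and information content. The following Python sketch illustrates why, under stated assumptions: the count matrix `MOTIF` is a made-up toy example, the function names are hypothetical, and the information content is computed against a uniform (position-independent) background. It is not RSAT's implementation.

```python
import math
import random

# Hypothetical toy count matrix: one dict of nucleotide counts per motif position.
MOTIF = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 2, "C": 2, "G": 3, "T": 3},
    {"A": 0, "C": 0, "G": 10, "T": 0},
]


def column_information(column, background=0.25):
    """Information content (bits) of one column, assuming a uniform background."""
    total = sum(column.values())
    bits = 0.0
    for count in column.values():
        if count > 0:  # zero counts contribute nothing to the sum
            freq = count / total
            bits += freq * math.log2(freq / background)
    return bits


def matrix_information(matrix):
    """Total information content: the sum over columns, so column order is irrelevant."""
    return math.fsum(column_information(col) for col in matrix)


def nucleotide_composition(matrix):
    """Total count of each nucleotide over all columns."""
    totals = {"A": 0, "C": 0, "G": 0, "T": 0}
    for column in matrix:
        for nucleotide, count in column.items():
            totals[nucleotide] += count
    return totals


def permute_columns(matrix, rng):
    """Return a copy of the matrix with its columns in random order."""
    permuted = list(matrix)
    rng.shuffle(permuted)
    return permuted


if __name__ == "__main__":
    shuffled = permute_columns(MOTIF, random.Random(0))
    print(nucleotide_composition(MOTIF) == nucleotide_composition(shuffled))
    print(math.isclose(matrix_information(MOTIF), matrix_information(shuffled)))
```

Both quantities are sums over columns, so reordering the columns cannot change them. This is what makes column-permuted matrices usable as negative controls that differ from the original motif only in the arrangement of positions.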