BEAM web server: a tool for structural RNA motif discovery.

Marco Pietrosanto, Marta Adinolfi, Riccardo Casula, Gabriele Ausiello, Fabrizio Ferrè, Manuela Helmer-Citterich

Abstract

Motivation: RNA structural motif finding is a relevant problem that becomes computationally hard when working on high-throughput data (e.g. eCLIP, PAR-CLIP), often represented by thousands of RNA molecules. Currently, the BEAM server is the only web tool capable of handling tens of thousands of input RNAs, with a motif discovery procedure that is limited only by the accuracy of current secondary structure predictions.
Results: The recently developed method BEAM (BEAr Motifs finder) can analyze tens of thousands of RNA molecules and identify RNA secondary structure motifs, each associated with a measure of its statistical significance. BEAM is extremely fast thanks to the BEAR encoding, which transforms each RNA secondary structure into a string of characters. BEAM also exploits the evolutionary knowledge contained in a substitution matrix of secondary structure elements, extracted from the RFAM database of families of homologous RNAs. The BEAM web server has been designed to streamline data pre-processing by automatically handling the folding and encoding of RNA sequences, giving users a choice of folding program. The server provides an intuitive and informative results page with the list of secondary structure motifs identified, the logo of each motif, its significance, a graphic representation and information about its position in the RNA molecules sharing it.
Availability and implementation: The web server is freely available at http://beam.uniroma2.it/. It is implemented in NodeJS and Python, and all major browsers are supported.
Contact: marco.pietrosanto@uniroma2.it.
Supplementary information: Supplementary data are available at Bioinformatics online.

Year:  2018        PMID: 29095974      PMCID: PMC5860439          DOI: 10.1093/bioinformatics/btx704

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Structural motif finding in RNA is a growing branch of computational biology, especially given the rise of new experimental techniques capable of probing structural contexts at single-nucleotide resolution (Lu and Chang, 2016; Wan ), which in turn allow for more accurate secondary structure predictions (Lorenz ). Searching for structural motifs usually aims at finding the structural determinants associated with specific functions (e.g. protein interaction specificity, main actor of certain interactions or behaviours), for example the determinant of Staufen-RNA specificity (LeGendre ). In this sense, data coming from high-throughput in vivo experiments such as HITS-CLIP, PAR-CLIP, iCLIP or eCLIP provide a perfect playground, for they are often composed of a large number of molecules (up to 50k RNAs, or even more) with a shared binding ability. Current structural motif finders cannot handle inputs above fairly low size thresholds (1000 molecules is a hard limit for most) and, to our knowledge, only our method BEAM (Pietrosanto ) and the more recent SMARTIV (Polishchuk ) can tackle these large inputs. Ours is, however, the only web server that can discover motifs both in large windows (e.g. downstream or upstream of a binding site, or along a 500 nt RNA) and across tens of thousands of molecules.

2 Materials and methods

The standalone BEAM requires users to provide pre-computed RNA secondary structures converted into BEAR notation through a separate encoding software (Mattei ). We extended it by letting users upload a standard FASTA file containing only the RNA sequences. In this case, users can choose one of two structure prediction methods: RNAfold from the Vienna Package (Gruber ) or MaxExpect from RNAstructure (Reuter and Mathews, 2010). The resulting dot-bracket structures are then automatically converted into the BEAR encoding. Users can also directly upload a FASTA file in which each RNA sequence is accompanied by its secondary structure prediction in dot-bracket or BEAR notation. The same data can also be pasted into a text area.
Users can also upload a background dataset for computing motif significance; alternatively, the server generates a background automatically from Rfam seed sequences, filtered to match the input in length and amount of structural content (Mattei ). Another available feature is the possibility to upload a BED file, the most common output format of CLIP-Seq analysis tools: the web server manages all the required processing steps (namely extension of the intervals, intersection with a feature file to extract only specific genomic regions, sequence retrieval, secondary structure prediction and motif discovery).
The output page provides a table of all the RNA structure motifs identified. For each motif, the table shows a WebLogo (Crooks ) picture in the qBEAR alphabet (Pietrosanto ), statistics (such as P-value, coverage and BEAM score), a histogram of the motif position distribution with respect to the 5′ end of each RNA, and a picture of the motif model structure obtained using VARNA (Darty ).
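The FASTA-plus-dot-bracket input described above can be illustrated with a minimal sketch. It is not the server's code: the record layout (one header line, one sequence line, an optional dot-bracket line) and the names `parse_records` and `base_pairs` are assumptions made for illustration, and the BEAR encoding step itself is not reproduced here.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Record = Tuple[str, str, Optional[str]]  # (header, sequence, structure or None)


def _build(header: str, body: List[str]) -> Record:
    """Assemble one record; the second body line, if present, is the structure."""
    sequence = body[0] if body else ""
    structure = body[1] if len(body) > 1 else None
    if structure is not None and len(structure) != len(sequence):
        raise ValueError(f"{header}: sequence and structure lengths differ")
    return header, sequence, structure


def parse_records(text: str) -> Iterator[Record]:
    """Yield records from FASTA-like text (assumed layout, see lead-in)."""
    header: Optional[str] = None
    body: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _build(header, body)
            header, body = line[1:], []
        else:
            body.append(line)
    if header is not None:
        yield _build(header, body)


def base_pairs(structure: str) -> Dict[int, int]:
    """Map each paired position to its partner (0-based) in a dot-bracket string."""
    stack: List[int] = []
    pairs: Dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs
```

A record without a structure line would, in the workflow described above, be passed to a folding program first; validating the brackets up front catches malformed input before any encoding is attempted.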
It is also possible to expand the motif results by listing all sequences with a graphic illustration of the motif position relative to the sequence length, along with the dot-bracket and sequence alignments. This representation of a structural motif provides researchers with an overview of how sub-structures could be involved in the function shared by all, or a subset, of the input RNAs, such as protein–RNA or RNA–RNA interactions.
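The idea of a motif position relative to the sequence length, and of a histogram of such positions, can be sketched as follows. This is an illustrative assumption about how the quantities could be computed, not the server's implementation; the function names, the 0-based start convention and the equal-width binning are all hypothetical.

```python
from typing import List, Sequence, Tuple


def relative_positions(hits: Sequence[Tuple[int, int]]) -> List[float]:
    """Convert (motif_start, rna_length) pairs into fractions of RNA length.

    motif_start is assumed to be 0-based and counted from the 5' end,
    so every returned value lies in [0, 1).
    """
    fractions: List[float] = []
    for start, length in hits:
        if length <= 0 or not 0 <= start < length:
            raise ValueError(f"invalid hit: start={start}, length={length}")
        fractions.append(start / length)
    return fractions


def histogram(fractions: Sequence[float], bins: int = 10) -> List[int]:
    """Count values in `bins` equal-width bins covering [0, 1]."""
    counts = [0] * bins
    for value in fractions:
        index = min(int(value * bins), bins - 1)  # keep 1.0 in the last bin
        counts[index] += 1
    return counts
```

Normalising by length makes RNAs of different sizes comparable, whereas a histogram of raw distances from the 5′ end (as in the results table) keeps the absolute scale.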

3 Results

For large datasets, the application was tested (along with about a hundred unique datasets) on CLIP-Seq data for SLBP (Zhang ) (stem-loop binding protein)-interacting RNAs (GSE62154), LIN28A (Cho ; Zeng ) targets and the other DoRiNA (Blin ) datasets, for which all the significant motifs retrieved were presented in the original work (Pietrosanto ). In some datasets, we analysed more than 35k RNA sequences in a single run. In particular, SLBP is known to interact with dsRNA (Brooks ; Li ; Zhang ), and the accurate structural context (Fig. 1) can be retrieved with little effort. The server has been tested on datasets of up to 100k RNAs and up to 5 motifs per dataset, and the computational time is similar to that of the BEAM standalone version, as the post-analyses take negligible time to compute. The only substantial time added is that taken by the secondary structure prediction and, where used, by the genomic interval pre-processing with BEDtools (Quinlan, 2014). Current limitations and associated graphs are reported in the Supplementary Material and in the online documentation.
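The interval-extension step of the BED pre-processing mentioned above can be illustrated in plain Python. The server relies on BEDtools for this; the sketch below is a stand-in written for illustration only, and the function names and the strand-aware handling of the flanks are assumptions. It relies only on the standard BED convention of 0-based, half-open coordinates with the strand in the sixth column.

```python
from typing import Optional, Tuple

BedInterval = Tuple[str, int, int, str]  # (chrom, start, end, strand)


def parse_bed_line(line: str) -> BedInterval:
    """Read chrom, start, end and (if present) strand from a tab-separated BED line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start and end")
    strand = fields[5] if len(fields) > 5 else "."
    return fields[0], int(fields[1]), int(fields[2]), strand


def extend(
    interval: BedInterval,
    upstream: int,
    downstream: int,
    chrom_size: Optional[int] = None,
) -> BedInterval:
    """Extend an interval by flanks defined relative to the strand.

    On the minus strand, upstream lies at higher coordinates, so the
    flanks are swapped. The result is clamped to [0, chrom_size].
    """
    chrom, start, end, strand = interval
    if strand == "-":
        new_start, new_end = start - downstream, end + upstream
    else:
        new_start, new_end = start - upstream, end + downstream
    new_start = max(0, new_start)
    if chrom_size is not None:
        new_end = min(new_end, chrom_size)
    return chrom, new_start, new_end, strand
```

For example, a minus-strand peak at `chr1:100-120` extended by 10 nt upstream and 30 nt downstream becomes `chr1:70-130`; the extended intervals would then go on to feature intersection and sequence retrieval.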
Fig. 1

SLBP putative interaction motif. On the left, a logo describing the identified structural motif is shown in qBEAR notation, in which A stands for medium size stem of a hairpin, and X for a short size terminal loop. On the right, an instance of the motif secondary structure is shown

4 Conclusion

The BEAM web server is a web application for analysing RNA datasets in search of secondary structure motifs. It can work with tens of thousands of molecules (see Supplementary Material for more information) with a length of up to 2000 nt (different limits apply if folding predictors are used; see Supplementary Material). Therefore, it is the only tool that can perform structural motif discovery on large datasets (such as CLIP-Seq) along the full length of the RNAs. Moreover, thanks to the automation provided by the web server, researchers can use the tool without additional scripting. For advanced users, this resource is a fast test ground for BEAM and a valuable time saver for downstream analysis.
References (10 of 19 shown)

1.  A novel approach to represent and compare RNA secondary structures.

Authors:  Eugenio Mattei; Gabriele Ausiello; Fabrizio Ferrè; Manuela Helmer-Citterich
Journal:  Nucleic Acids Res       Date:  2014-04-21       Impact factor: 16.971

2.  A novel method for the identification of conserved structural patterns in RNA: From small scale to high-throughput applications.

Authors:  Marco Pietrosanto; Eugenio Mattei; Manuela Helmer-Citterich; Fabrizio Ferrè
Journal:  Nucleic Acids Res       Date:  2016-08-31       Impact factor: 16.971

3.  RNAstructure: software for RNA secondary structure prediction and analysis.

Authors:  Jessica S Reuter; David H Mathews
Journal:  BMC Bioinformatics       Date:  2010-03-15       Impact factor: 3.169

4.  LIN28A is a suppressor of ER-associated translation in embryonic stem cells.

Authors:  Jun Cho; Hyeshik Chang; S Chul Kwon; Baekgyu Kim; Yoosik Kim; Junho Choe; Minju Ha; Yoon Ki Kim; V Narry Kim
Journal:  Cell       Date:  2012-10-25       Impact factor: 41.582

5.  RNA targets and specificity of Staufen, a double-stranded RNA-binding protein in Caenorhabditis elegans.

Authors:  Jacqueline Baca LeGendre; Zachary T Campbell; Peggy Kroll-Conner; Phil Anderson; Judith Kimble; Marvin Wickens
Journal:  J Biol Chem       Date:  2012-11-29       Impact factor: 5.157

6.  The Vienna RNA websuite.

Authors:  Andreas R Gruber; Ronny Lorenz; Stephan H Bernhart; Richard Neuböck; Ivo L Hofacker
Journal:  Nucleic Acids Res       Date:  2008-04-19       Impact factor: 16.971

7.  DoRiNA 2.0--upgrading the doRiNA database of RNA interactions in post-transcriptional regulation.

Authors:  Kai Blin; Christoph Dieterich; Ricardo Wurmus; Nikolaus Rajewsky; Markus Landthaler; Altuna Akalin
Journal:  Nucleic Acids Res       Date:  2014-11-21       Impact factor: 16.971

8.  A multiprotein occupancy map of the mRNP on the 3' end of histone mRNAs.

Authors:  Lionel Brooks; Shawn M Lyons; J Matthew Mahoney; Joshua D Welch; Zhongle Liu; William F Marzluff; Michael L Whitfield
Journal:  RNA       Date:  2015-09-16       Impact factor: 4.942

9.  Landscape and variation of RNA secondary structure across the human transcriptome.

Authors:  Yue Wan; Kun Qu; Qiangfeng Cliff Zhang; Ryan A Flynn; Ohad Manor; Zhengqing Ouyang; Jiajing Zhang; Robert C Spitale; Michael P Snyder; Eran Segal; Howard Y Chang
Journal:  Nature       Date:  2014-01-30       Impact factor: 49.962

10.  RNA folding with hard and soft constraints.

Authors:  Ronny Lorenz; Ivo L Hofacker; Peter F Stadler
Journal:  Algorithms Mol Biol       Date:  2016-04-23       Impact factor: 1.405
