
CopraRNA and IntaRNA: predicting small RNA targets, networks and interaction domains.

Patrick R Wright¹, Jens Georg², Martin Mann³, Dragos A Sorescu³, Andreas S Richter⁴, Steffen Lott², Robert Kleinkauf³, Wolfgang R Hess², Rolf Backofen⁵.

Abstract

CopraRNA (Comparative prediction algorithm for small RNA targets) is the most recent addition to the Freiburg RNA Tools webserver. It incorporates and extends the functionality of the existing tool IntaRNA (Interacting RNAs) in order to predict targets, interaction domains and consequently the regulatory networks of bacterial small RNA molecules. The CopraRNA prediction results are accompanied by extensive postprocessing methods such as functional enrichment analysis and visualization of interacting regions. Here, we introduce the functionality of the CopraRNA and IntaRNA webservers and give detailed explanations of their postprocessing functionalities. Both tools are freely accessible at http://rna.informatik.uni-freiburg.de.

Year: 2014 PMID: 24838564 PMCID: PMC4086077 DOI: 10.1093/nar/gku359

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

In recent years, bacterial small RNAs (sRNAs) have proven to be potent, versatile and important regulators of prokaryotic gene expression (1,2). Furthermore, they are extremely abundant in various prokaryotic genomes (3–7). Owing to novel experimental (6,8,9) and computational (10–12) methods on the genomic scale, biologists face ever-increasing volumes of sRNA data that can, in many cases, only be harnessed by bioinformatics analyses (e.g. target predictions) preceding wet-lab verification. To make analysis methods accessible to a broad audience, graphical user interfaces (GUIs) are indispensable. Offering such interfaces in a web browser has proven to be useful and intuitive to many users in the past (13–16). The Freiburg RNA Tools webserver aims to supply an easy-to-use, free and comprehensive web resource for RNA analysis that is also suitable for non-expert users. Several sRNA target prediction algorithms have been developed in the past (17), and many of them are available as webservers (14,18–21). Here, we highlight that CopraRNA (Comparative prediction algorithm for small RNA targets) (22) and IntaRNA (Interacting RNAs) (23) not only produce sound results but also supply postprocessing that greatly aids in the interpretation and evaluation of the results. The tools are accompanied by extensive help pages, and direct help requests are rapidly answered. The results can be viewed in the browser and downloaded for further local analysis or archiving. Furthermore, the source code for both tools is available for download on the Freiburg RNA software page at http://www.bioinf.uni-freiburg.de/Software/.

CopraRNA AND IntaRNA

While CopraRNA is a comparative method that constructs a combined sRNA target prediction for a set of given organisms, IntaRNA predicts interactions in single organisms. An exemplary workflow incorporating both tools is given in Figure 1. Employing a statistical model, CopraRNA computes whole genome target predictions by combining whole genome IntaRNA target screens for homologous sRNA sequences from distinct organisms. Individual evolutionary distances between the organisms and the statistical dependencies in the data are accounted for and are corrected within the workflow of the algorithm. IntaRNA predicts interacting regions between two RNA molecules by incorporating the accessibility of both interaction sites and the presence of a seed interaction; both features are commonly observed in sRNA–mRNA interactions (24). IntaRNA, unlike CopraRNA, can also be applied to non-whole genome screens using smaller sets of RNA molecules as input. Thus, it is also applicable to RNA–RNA interaction prediction for eukaryotic systems (25).

Figure 1.

sRNA identification and classification workflow incorporating CopraRNA or IntaRNA. The first box mentions selected experiments that have aided in sRNA identification, i.e. RNAseq (8), dRNAseq (6) or Hfq co-immunoprecipitation (CoIP) (9). The cylinder represents databases that can be queried while looking for sRNA homologs. Examples are NCBI (BLAST) (26) or Rfam (27). The next step is the execution of the actual sRNA target prediction depending on presence of sRNA homologs (CopraRNA) or absence of sRNA homologs (IntaRNA). The final two stages consist of postprocessing and selection of candidates for experimental verification, e.g. by a GFP reporter system (32).

INPUT AND OUTPUT

Input data must be supplied in FASTA format. For CopraRNA, the FASTA file should contain three or more homologous sRNA sequences from distinct organisms. Homologous sRNA sequences may be retrieved from databases such as NCBI via BLAST (26) or from Rfam (27). While only three input sequences are mandatory, we suggest using at least five if available. For each sequence, CopraRNA requires the RefSeq ID of the corresponding organism as the FASTA header (see Figure 3A, top left for an example). If several RefSeq IDs correspond to replicons of one organism, any one of these IDs may be supplied. A maximum of eight input organisms is possible. One of these species must be selected as the central reference (organism of interest) for postprocessing and annotation.
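The input rules above can be checked locally before submission. The following is a minimal sketch, not the webserver's own validation: the regular expression for RefSeq IDs, the function names and the accepted nucleotide alphabet are assumptions made for illustration, and only the limits of three to eight sequences come from the text.

```python
import re

# Hypothetical pattern for RefSeq replicon accessions such as NC_000913;
# the webserver's real validation rules are not described in the text.
REFSEQ_ID = re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")

MIN_SEQS = 3  # "three or more homologous sRNA sequences"
MAX_SEQS = 8  # "a maximum of eight input organisms"


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_copra_input(text):
    """Return a list of problems; an empty list means the input looks fine."""
    records = parse_fasta(text)
    problems = []
    if not MIN_SEQS <= len(records) <= MAX_SEQS:
        problems.append(
            f"expected {MIN_SEQS}-{MAX_SEQS} sequences, got {len(records)}"
        )
    for header, seq in records:
        if not REFSEQ_ID.match(header):
            problems.append(f"header is not a RefSeq ID: {header!r}")
        # Assumed alphabet: RNA or DNA letters plus N for unknown bases.
        if not seq or set(seq.upper()) - set("ACGUTN"):
            problems.append(f"empty or non-nucleotide sequence for {header!r}")
    return problems
```

Calling `check_copra_input` on the text of a FASTA file returns human-readable messages, so a user can correct headers or add homologs before uploading.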

Figure 3.

CopraRNA webserver input (A) and output (B) page for the sRNA GcvB. The FASTA file may be pasted or uploaded to the webserver. Upon insertion of the sequences, the webserver automatically displays the RefSeq IDs’ organism affiliations (blue text in (A)). The output page contains a visualization of the primary result table, the interaction as predicted by IntaRNA and the interacting region plots within the sRNA and mRNA. Furthermore, the functional enrichment is visualized as interactive heatmap.

Currently, ∼2700 organisms are available for CopraRNA and IntaRNA whole genome target predictions and the list is updated on a monthly basis. As previously mentioned, IntaRNA can also compute interactions for smaller sets of RNAs. In this case, the user may supply two FASTA files. For these, all pairwise interactions are computed. Suggested standard parameters for IntaRNA are a seed length (p) of 7, a target folding window size (w) of 150 and a maximum base pair distance (L) of 100 (28). Both webservers provide the top 100 predictions of the respective methods as primary result table. Furthermore, the core results of the algorithms are accompanied by extensive postprocessing that aids interpretation and condensation of the result tables. For whole genome target predictions, CopraRNA and IntaRNA include automatic functional enrichment (29) of the top predicted targets and visualization of putative interacting regions within the sRNA and the mRNA. As a new feature of the webserver, the functionally enriched terms are represented within a heatmap, allowing ‘at a glance’ conclusions for target networks (see Figure 2 for an example). These results can guide the user while constructing functional networks and characterizing target binding mechanisms of a given sRNA. For users interested in the entire results, the corresponding job is available for download as compressed archive. Sample input and output pages for CopraRNA are displayed in Figure 3. Both tools’ source code is also available for download and local installation from the Freiburg RNA webserver download page.
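To make the "all pairwise interactions" mode concrete, the sketch below enumerates one prediction job per query-target pair together with the suggested standard parameters. The dictionary keys simply reuse the letters from the text (p, w, L); they are not claimed to match any command-line flags, and the sequence identifiers and sequences are toy placeholders.

```python
from itertools import product

# Suggested standard parameters quoted in the text; the keys mirror the
# letters used there and are an illustrative naming choice only.
INTARNA_DEFAULTS = {"p": 7, "w": 150, "L": 100}


def pairwise_jobs(queries, targets, params=INTARNA_DEFAULTS):
    """Yield one (query_id, target_id, params) job per query-target pair."""
    for q_id, t_id in product(queries, targets):
        # Copy the parameters so that each job can be modified independently.
        yield q_id, t_id, dict(params)


# Toy placeholder data standing in for the two uploaded FASTA files.
queries = {"sRNA_1": "ACGU", "sRNA_2": "GGCA"}
targets = {"mRNA_a": "AUGGC", "mRNA_b": "AUGCC", "mRNA_c": "AUGAA"}

jobs = list(pairwise_jobs(queries, targets))
print(len(jobs))  # 2 queries x 3 targets = 6 pairwise predictions
```

The number of predictions grows with the product of the two file sizes, which is worth keeping in mind when supplying larger sets.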

Figure 2.

The CopraRNA heatmap shows the targets with a p-value ≤ 0.01 (for IntaRNA the top 50 predicted targets are subjected to the initial functional enrichment), which have homologs in the organism of interest and are functionally enriched. All members of clusters with a DAVID enrichment score ≥ 1.0 are shown in a specific color. Each row represents a gene and each column a specific functional term. If the gene can be assigned to a term, the corresponding square is colored. If no assignment was made, the square remains white. Closely related terms are assigned to a cluster and have the same color. The opacity of the color depends on the p-value of the CopraRNA prediction. A more intense color represents a more significant p-value. The ‘fold enrichment’ is given in front of the term descriptions. It represents the enrichment of a term in the prediction group in relation to the whole prediction background (e.g. a term with an enrichment of 10 contains 10 times more genes belonging to the respective term than the background). The enrichment scores give a measure of the biological significance of the cluster. The DAVID enrichment score for a cluster is the log-transformed geometric mean of all enrichment p-values from the terms belonging to the respective cluster. A higher score represents a more statistically significant enrichment. The individual p-values for the terms are calculated by a modified Fisher's exact test. The length of the bars next to the groups of enriched genes corresponds to the size of the enrichment score. The publication on the DAVID webserver suggests investigating clusters with an enrichment score of ≥ 1.3, while also pointing out that clusters with lower enrichment scores need not necessarily be discarded and may also contain useful information (33). This specific heatmap represents the enrichment output for the enterobacterial (here Escherichia coli) sRNA FnrS. For reasons of space, only one term for each cluster is shown.
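The two quantities described in this caption can be reproduced with a few lines of arithmetic. The sketch below assumes that "log transformed" means the negative base-10 logarithm, under which the suggested cutoff of 1.3 corresponds to a geometric mean p-value of roughly 0.05; the function names and the example counts are illustrative and not part of DAVID or the webserver.

```python
import math


def enrichment_score(p_values):
    """Minus log10 of the geometric mean of the term p-values in a cluster."""
    if not p_values or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    # The mean of the logs equals the log of the geometric mean.
    mean_log10 = sum(math.log10(p) for p in p_values) / len(p_values)
    return -mean_log10


def fold_enrichment(hits_in_list, list_size, hits_in_background, background_size):
    """Frequency of a term in the prediction list relative to the background."""
    return (hits_in_list / list_size) / (hits_in_background / background_size)


# Illustrative numbers only: a cluster whose two terms have p = 0.01 and 0.0001,
# and a term found in 5 of 50 predicted targets versus 40 of 4000 background genes.
print(round(enrichment_score([0.01, 0.0001]), 3))
print(round(fold_enrichment(5, 50, 40, 4000), 3))
```

Under this assumption, a score above 1.3 means that the terms in the cluster are, on a geometric average, enriched at better than the 0.05 level.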

METHODS

CopraRNA utilizes IntaRNA to calculate single organism whole genome target predictions. IntaRNA predictions are computed for each sRNA-organism pair participating in the analysis. These individual predictions are the basis for the comparative model. In order to combine target predictions for homologous genes from distinct organisms, IntaRNA p-values are computed from the IntaRNA energy scores for each putative target with an energy score ≤ 0. Transforming energy scores to p-values is achieved by fitting generalized extreme value distributions to the IntaRNA energy scores. Using the resulting equations for each individual whole genome target prediction, p-values can be calculated for each putative target. Next, the DomClust algorithm (30) is applied in order to cluster homologous genes. The clustering is based on the amino acid sequences of the organisms’ protein coding genes. These clusters are then used to calculate a combined CopraRNA p-value for each cluster of homologous genes by employing Hartung's method for the combination of dependent p-values (31). Conveniently, this method not only accounts for the overall dependency within the data, but also makes it possible to weight individual p-values. This is important, as the organisms participating in the analysis can usually not be regarded as equidistant. More closely related organisms are consequently down-weighted. Excessive influence of outliers is corrected for by applying a root function to the weights. The final set of CopraRNA p-values is employed for q-value calculation. The q-values give an estimate of the false discovery rate of the target prediction. More detailed algorithmic explanations of CopraRNA and IntaRNA are given in the original publications (22,23).
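The combination step can be illustrated with a simplified weighted inverse-normal combination in which a single correlation coefficient is shared by all p-values. This is a sketch of the general idea behind combining dependent, weighted p-values and not the CopraRNA implementation: Hartung's method estimates the correlation from the data, whereas here `rho` is passed in by the caller, and the function names and the choice of a square root for the weights are assumptions.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def root_weights(raw_weights, root=2.0):
    """Dampen large weights by taking a root (the root order is an assumption)."""
    return [w ** (1.0 / root) for w in raw_weights]


def combine_dependent_p(p_values, weights, rho):
    """Weighted inverse-normal combination with a common correlation rho.

    Each p-value must lie strictly between 0 and 1, otherwise the inverse
    normal transform raises statistics.StatisticsError.
    """
    if not p_values or len(p_values) != len(weights):
        raise ValueError("need equally many p-values and weights")
    # Transform p-values to standard normal scores (small p -> negative score).
    scores = [_STD_NORMAL.inv_cdf(p) for p in p_values]
    numerator = sum(w * t for w, t in zip(weights, scores))
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    # Variance of the weighted sum when every pair of scores has correlation rho.
    variance = (1.0 - rho) * sum_w2 + rho * sum_w ** 2
    return _STD_NORMAL.cdf(numerator / sqrt(variance))


# Two homologs with p = 0.05 each and equal weights.
print(combine_dependent_p([0.05, 0.05], [1.0, 1.0], rho=0.0))  # independent case
print(combine_dependent_p([0.05, 0.05], [1.0, 1.0], rho=1.0))  # fully dependent
```

With `rho=0` the two p-values reinforce each other and the combined value drops to about 0.01, while with `rho=1` the second p-value adds no information and the result stays at 0.05. This is why accounting for dependency between closely related organisms matters.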

POSTPROCESSING AND PREDICTION QUALITY ESTIMATION

The benchmarking of CopraRNA showed that some predictions are more reliable (e.g. GcvB, RyhB, FnrS) than others (e.g. ArcZ) (22). To reduce the experimental workload (32), it is preferable to have a measure of the reliability of each individual prediction. Here the q-value and the postprocessing outputs provide guidance. A strong functional enrichment signature, pointing to a specific group of genes or a specific pathway, has proven to be a reliable signal for a meaningful prediction. However, functional enrichments are not always present. This may be due to low prediction quality, but it can also be caused by a lack of annotation for the organism of interest or its absence in the DAVID knowledge base (33). In these cases the user may opt to choose the organism with the best available annotation as organism of interest. If this proves ineffective, the user should resort to the q-value distribution and the interaction domain plots. A slowly growing q-value, i.e. a relatively high number of predictions with a q-value ≤ 0.5, is a hallmark of a more reliable prediction, especially if the interaction plots show distinct clustered interaction regions for the sRNA and mRNA homologs. A random distribution of the interaction sites in the mRNAs and/or sRNA homologs argues against a reliable prediction.
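The "slowly growing q-value" criterion amounts to counting how many predictions fall below a q-value cutoff. The sketch below uses Benjamini-Hochberg adjusted p-values as a stand-in for q-values, since the estimator used by CopraRNA is not specified in this text; the function names and the example p-values are illustrative.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q[i] = running_min
    return q


def count_below(q_values, cutoff=0.5):
    """Number of predictions whose q-value does not exceed the cutoff."""
    return sum(1 for q in q_values if q <= cutoff)


# Illustrative p-values for four putative targets.
q = bh_q_values([0.01, 0.04, 0.03, 0.5])
print(q, count_below(q))
```

A prediction list in which this count is comparatively high would, by the criterion above, be regarded as more reliable, provided the interaction plots also show clustered interaction regions.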

JOB ARCHIVING

Upon submission, a unique ID, which is only known by the submitting user, is automatically assigned to each job. This ID can be used to recall the results of a specific job at any time within the storage period. The Freiburg RNA webserver stores all computed results for 30 days. Within this time, selected results or the entire job directory may be downloaded for local archiving by the user. Online archiving within the 30 day period is aided by the possibility of setting job specific descriptions.

PRIOR APPLICATION AND EVALUATION

The predictive performance of CopraRNA and IntaRNA was previously evaluated on an extensive benchmarking dataset of 101 experimentally verified sRNA and target pairs from 18 enterobacterial sRNAs (22). They were compared to each other and to RNApredator (19) and TargetRNA (18). Both tools from the Freiburg RNA webserver outperformed the other tools in predictive accuracy. Furthermore, CopraRNA was compared to experimental target prediction by microarrays. Strikingly, it showed similar predictive quality with respect to the abundance of correctly predicted targets (22). From the CopraRNA benchmark predictions, 23 previously unreported, putative sRNA targets were selected for experimental verification. Of these, 17 were verified (22). This represents a success rate of ∼74%. CopraRNA has also been successfully applied in studies on non-enterobacterial species. These include investigations of the sRNAs PsrR1 from Synechocystis sp. PCC6803 and AbcR1 from Agrobacterium tumefaciens (unpublished data). Besides many other studies, computational predictions with IntaRNA enabled the identification of two novel targets of the cyanobacterial sRNA Yfr1 (34) and aided in finding that the archaeal sRNA162 targets both cis- and trans-encoded mRNAs via two distinct domains (35).

IMPLEMENTATION

The Freiburg RNA webserver is based on Apache Tomcat Java Server Pages (JSP) to enable high server-side performance for input validation, job execution and retrieval, and dedicated pre- and postprocessing. JavaScript is used to provide an intuitive and interactive user interface on the client side. The tools provided by the Freiburg RNA webserver are run on a dedicated computing cluster with up to 480 CPUs, depending on the workload. Jobs are automatically queued and started via Sun Grid Engine to ensure balanced and fast job processing given the varying execution requirements of the different tools provided. An automatic emailing system informs the user upon job completion if an (optional) email address is provided upon submission.

34 in total

Review 1. Bacterial small RNA regulators: versatile roles and rapidly evolving variations.

Authors: Susan Gottesman; Gisela Storz
Journal: Cold Spring Harb Perspect Biol Date: 2011-12-01 Impact factor: 10.005

2. An infection-relevant transcriptomic compendium for Salmonella enterica Serovar Typhimurium.

Authors: Carsten Kröger; Aoife Colgan; Shabarinath Srikumar; Kristian Händler; Sathesh K Sivasankaran; Disa L Hammarlöf; Rocío Canals; Joe E Grissom; Tyrrell Conway; Karsten Hokamp; Jay C D Hinton
Journal: Cell Host Microbe Date: 2013-12-11 Impact factor: 21.023

Review 3. Regulation by small RNAs in bacteria: expanding frontiers.

Authors: Gisela Storz; Jörg Vogel; Karen M Wassarman
Journal: Mol Cell Date: 2011-09-16 Impact factor: 17.970

4. Bioinformatic prediction and experimental verification of sRNAs in the haloarchaeon Haloferax volcanii.

Authors: Julia Babski; Brian Tjaden; Björn Voss; Angelika Jellen-Ritter; Anita Marchfelder; Wolfgang R Hess; Jörg Soppa
Journal: RNA Biol Date: 2011-06-29 Impact factor: 4.652

Review 5. RNA-Seq: a revolutionary tool for transcriptomics.

Authors: Zhong Wang; Mark Gerstein; Michael Snyder
Journal: Nat Rev Genet Date: 2009-01 Impact factor: 53.242

6. Global or local? Predicting secondary structure and accessibility in mRNAs.

Authors: Sita J Lange; Daniel Maticzka; Mathias Möhl; Joshua N Gagnon; Chris M Brown; Rolf Backofen
Journal: Nucleic Acids Res Date: 2012-02-28 Impact factor: 16.971

7. ViennaRNA Package 2.0.

Authors: Ronny Lorenz; Stephan H Bernhart; Christian Höner Zu Siederdissen; Hakim Tafer; Christoph Flamm; Peter F Stadler; Ivo L Hofacker
Journal: Algorithms Mol Biol Date: 2011-11-24 Impact factor: 1.405

8. DAVID-WS: a stateful web service to facilitate gene/protein list analysis.

Authors: Xiaoli Jiao; Brad T Sherman; Da Wei Huang; Robert Stephens; Michael W Baseler; H Clifford Lane; Richard A Lempicki
Journal: Bioinformatics Date: 2012-04-27 Impact factor: 6.937

9. CARNA--alignment of RNA structure ensembles.

Authors: Dragos Alexandru Sorescu; Mathias Möhl; Martin Mann; Rolf Backofen; Sebastian Will
Journal: Nucleic Acids Res Date: 2012-06-11 Impact factor: 16.971

10. An archaeal sRNA targeting cis- and trans-encoded mRNAs via two distinct domains.

Authors: Dominik Jäger; Sandy R Pernitzsch; Andreas S Richter; Rolf Backofen; Cynthia M Sharma; Ruth A Schmitz
Journal: Nucleic Acids Res Date: 2012-09-10 Impact factor: 16.971
