
IntaRNA 2.0: enhanced and customizable prediction of RNA-RNA interactions.

Martin Mann, Patrick R Wright, Rolf Backofen.

Abstract

The IntaRNA algorithm enables fast and accurate prediction of RNA-RNA hybrids by incorporating seed constraints and interaction site accessibility. Here, we introduce IntaRNAv2, which enables enhanced parameterization as well as fully customizable control over the prediction modes and output formats. Based on up-to-date benchmark data, the enhanced predictive quality is shown and further improvements due to more restrictive seed constraints are highlighted. The extended web interface provides visualizations of the new minimal energy profiles for RNA-RNA interactions. These allow a detailed investigation of interaction alternatives and can reveal potential interaction site multiplicity. IntaRNAv2 is freely available (source and binary) and is distributed via the conda package manager. Furthermore, it has been included in the Galaxy workflow framework, and its already established web interface enables ad hoc usage.
© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2017        PMID: 28472523      PMCID: PMC5570192          DOI: 10.1093/nar/gkx279

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The interaction of RNA molecules is an important factor for regulatory processes in all organisms. The in silico modeling and prediction of such RNA–RNA interactions is thus a central aspect in current research projects. In the last decades, various approaches to solve this problem have been proposed (see reviews (1,2)). To enable highly accurate predictions, state-of-the-art tools not only take the stability (energy) of possible RNA–RNA interactions into account but also consider the accessibility of the interacting subsequences (3). The latter is often incorporated by adding a pseudo-energy penalty that grows with the subsequences' probability to be involved in intramolecular structure formation (4). Prediction quality can be further improved by addressing additional features known from experimentally validated interactions. For instance, bacterial small RNAs (sRNA) and eukaryotic microRNAs form highly stable subinteractions, which are considered to seed the formation of extended RNA–RNA duplexes (5,6).

IntaRNA, first introduced in (7), is one of the most widely used state-of-the-art RNA–RNA interaction prediction approaches (2,3). It is employed in prokaryotic (8–10) and eukaryotic (11–13) systems, and its web server (14,15) received ∼16 500 external jobs in 2016. Its fast heuristic prediction mode enables genome-wide target prediction, while its high prediction accuracy is based on the incorporation of interaction site accessibility and seed constraints. However, IntaRNA has become technically outdated for the following reasons. Firstly, IntaRNA was fixed to an aged Vienna RNA library (v1.*) (16) and its energy parameters Turner99 (17). Secondly, it featured a strongly restricted user interface and was hard to maintain, update or extend.

Here, we introduce IntaRNAv2, a complete open-source reimplementation of the previous approach (from now on referred to as IntaRNAv1) with a strong focus on efficiency, modularity and flexibility.
The implementation closely follows the software development guidelines suggested in (18) and enables enhanced and customizable RNA–RNA interaction prediction. IntaRNAv2 enforces the seed interaction to be energetically favorable. Furthermore, a slightly changed treatment of dangling end energy contributions, compared to IntaRNAv1 and other approaches, has been incorporated. This, in concert with the usage of recent RNA energy parameters (from the Vienna RNA package v2.* (19)), enables an improvement of prediction quality. The newly performed benchmark uses a data set of 160 experimentally verified enterobacterial sRNA–mRNA interactions and evaluates genome-wide screens for each interaction pair.

The extended IntaRNAv2 web interface within the Freiburg RNA tools v4.4.2 (http://rna.informatik.uni-freiburg.de/IntaRNA) (14,15) enables full control over energy parameters, seed constraints and accessibility computation. The user either provides lists of interacting RNAs (all-versus-all predictions) or performs a genome-wide target screen for a given input RNA (providing the corresponding NCBI accession number). Using an efficient dynamic programming scheme, the minimum free energy interactions (or suboptimals if requested) are listed and additional details can be retrieved for each predicted interaction pair. Furthermore, the new version provides minimal energy profiles for interacting RNA pairs that enable a detailed investigation of interaction alternatives.
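The accessibility penalty mentioned in the introduction can be illustrated with a short sketch. The text does not state a formula; the form ED = −RT·ln(Pu), with Pu the probability that a subsequence is unpaired, is the one commonly used in RNAup-style models and is assumed here. The function names, the gas constant and the temperature of 37 °C are likewise assumptions of this example, not values taken from the paper.

```python
import math

# Gas constant in kcal/(mol*K) and 37 degrees Celsius in Kelvin.
# Both values are assumptions of this sketch, not quoted from the text.
GAS_CONSTANT = 0.0019872
TEMPERATURE_K = 310.15
RT = GAS_CONSTANT * TEMPERATURE_K


def ed_penalty(p_unpaired: float) -> float:
    """Pseudo-energy (kcal/mol) for making a subsequence accessible.

    Assumed form: ED = -RT * ln(Pu). A fully accessible site (Pu = 1)
    costs nothing; the penalty grows as the site becomes less accessible.
    """
    if not 0.0 < p_unpaired <= 1.0:
        raise ValueError("p_unpaired must be in (0, 1]")
    return -RT * math.log(p_unpaired)


def total_energy(hybrid_energy: float, p_unpaired_1: float, p_unpaired_2: float) -> float:
    """Hybridization energy plus the accessibility penalties of both sites."""
    return hybrid_energy + ed_penalty(p_unpaired_1) + ed_penalty(p_unpaired_2)
```

Under this assumed model, a hybrid with a strongly negative duplex energy can still score poorly if either interaction site is rarely unpaired in its own molecule.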

METHODS

IntaRNAv2 reimplements and extends the original IntaRNA approach (7) and closely follows established rules for software development in systems biology, e.g. see (18). Besides the heuristic prediction mode of IntaRNAv1 that allows fast, genome-wide predictions (7), IntaRNAv2 also features exact interaction prediction by reimplementing and extending the RNAup approach (4). Since seed constraints are optional for all prediction modes, IntaRNAv2 can emulate RNAup predictions (by disabling the seed requirement).

Furthermore, the user interface enables full control of accessibility usage within RNA–RNA interaction prediction. By default, unpaired probabilities are computed using the Vienna RNA package v2.* routines. Here, either global (RNAup-like) or local unpaired probabilities (as computed by RNAplfold (20)) can be selected. IntaRNAv2 also supports the loading of pre-computed accessibility data. This can significantly reduce the run time since the computation of the accessibility is highly time consuming (7). This feature is especially useful when screening a large target sequence (21). By disabling the integration of accessibility information, IntaRNAv2 can be used to emulate predictions from TargetScan (22) or RNAhybrid (23).

Seed stability constraints are a new feature of IntaRNAv2, which enable direct control over the stability of the seed interaction. To this end, an upper energy bound can be provided. This defaults to 0 kcal/mol and thus requires all seed interactions to be energetically favorable. Furthermore, the user can optionally provide a lower bound on the unpaired probability of the subsequences forming the seed to define whether highly accessible regions are to be favored.

Dangling end contributions incorporate a stabilizing effect due to the stacking of unpaired nucleotides directly neighboring base pairs (17). Since the interaction model of IntaRNA incorporates the accessibility, i.e. 'unpairedness', of the involved subsequences, an explicit dangling end case treatment was implemented in IntaRNAv1, i.e. dangling end contributions were only incorporated if the corresponding nucleotide was also part of the accessible subsequence considered for interaction formation (see Supplementary material for further details). IntaRNAv2 simplifies the dangling end treatment for interactions. Here, the conditional probability of the dangling end to be free when the interaction is formed is used to weight the corresponding dangling end contributions (see Supplementary material for details). These conditional probabilities can be directly computed from the respective accessibility terms already available for the interaction scoring.

Minimal energy profiles, either position-wise for each RNA (4) or intermolecular index pairwise for an RNA pair, can provide deeper insights into RNA–RNA interaction alternatives, since, as for single RNA structure prediction, the optimal interaction is not necessarily the only or the biologically correct option. With IntaRNAv2 both types of profiles can be generated by linking their generation with the respective dynamic programming computations. That is, whenever an interaction site (a cell in the corresponding table) is computed, the respective minimal energy profiles are updated. This feature comes with a linear/quadratic overhead in computation time for single/pairwise profiles, respectively, since all indices within the corresponding subsequences have to be updated. For instance, the run time of the heuristic prediction mode of O(n^2) is increased to O(n^3) for single sequence profiles and O(n^4) for pairwise minimal energy profiles. While computationally demanding, the investigation of these profiles provides meaningful additional information about possible interaction patterns, as discussed in the Results section.

Suboptimal interaction enumeration also enables the investigation of interaction alternatives. In IntaRNAv1, only query interactions were allowed to overlap, since interaction alternatives in the target were of interest. IntaRNAv2 now enables explicit control over where suboptimal interactions are allowed to overlap. Besides the (so far limited) possibility to constrain overlaps to one sequence, this enables the enumeration of variants of the minimum free energy interaction (overlap in both) as well as the suppression of any overlap in both.

Customizable output in CSV format complements the normal or detailed ASCII chart output of the interaction details. The CSV output provides, for the first time, a machine-readable output format of all interaction details. To this end, the user can specify all details of interest (columns of the CSV table) that will be reported for each predicted interaction. For further options of IntaRNAv2 to constrain the interaction, seed site, accessibility computation, energy model, multi-threading, output formats, installation, docker/conda/galaxy packages, etc. we refer to its documentation on GitHub (https://github.com/BackofenLab/IntaRNA/).
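Two of the ideas described in the Methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the IntaRNAv2 implementation: the tuple layout for an interaction, the function names, and the way the conditional probability is obtained (as the ratio of two unpaired probabilities) are all hypothetical choices made for this example.

```python
import math
from typing import List, Tuple

# One scored interaction: (start1, end1, start2, end2, energy) with 0-based,
# inclusive site boundaries in RNA 1 and RNA 2. Hypothetical layout.
Interaction = Tuple[int, int, int, int, float]


def dangle_weight(p_unpaired_site: float, p_unpaired_site_plus_dangle: float) -> float:
    """P(dangling nucleotide unpaired | interaction site unpaired).

    Assumption: the conditional probability is the unpaired probability of
    the site extended by the dangling nucleotide divided by the unpaired
    probability of the site alone.
    """
    return p_unpaired_site_plus_dangle / p_unpaired_site


def minimal_energy_profiles(n1: int, n2: int, interactions: List[Interaction]):
    """Update single and pairwise minimal energy profiles per interaction.

    Each interaction touches every index of its two sites (linear work for
    the single profiles) and every index pair (quadratic work for the
    pairwise profile), which mirrors the overhead described in the text.
    """
    single1 = [math.inf] * n1
    single2 = [math.inf] * n2
    pairwise = [[math.inf] * n2 for _ in range(n1)]
    for s1, e1, s2, e2, energy in interactions:
        for k1 in range(s1, e1 + 1):
            single1[k1] = min(single1[k1], energy)
        for k2 in range(s2, e2 + 1):
            single2[k2] = min(single2[k2], energy)
        for k1 in range(s1, e1 + 1):
            row = pairwise[k1]
            for k2 in range(s2, e2 + 1):
                row[k2] = min(row[k2], energy)
    return single1, single2, pairwise
```

Positions never covered by any interaction keep the value `math.inf` here; a real profile would more likely report only favorable energies (E < 0), which is left out of this sketch.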

RESULTS AND DISCUSSION

In order to evaluate the impact of recent energy parameters and the altered energy computation (see Methods), we performed a benchmark on genome-wide interaction predictions. In recent comprehensive benchmarks (2,24,25), IntaRNA and RNAup were shown to be the best performing tools that are not based on conservation. Thus, we limited our comparison to IntaRNAv1 and IntaRNAv2. To assess performance, we followed the procedure previously applied in (25). The dataset of verified RNA–RNA interactions was extended by several candidates that have been reported in recent literature (26–36). Together, this is a dataset of 160 verified sRNA–target interactions (see Supplementary Table S1). For interaction prediction, we extracted 200 nucleotides upstream and 100 nucleotides downstream of the annotated start codon for each protein coding gene in Escherichia coli (NC_000913) and Salmonella (NC_003197). This gives rise to 4319 putative targets in E. coli and 4553 putative targets in Salmonella, all of length 300 nucleotides with the start codon located at positions 201, 202 and 203. Target predictions were performed for sRNAs (see Supplementary FASTA) from both species individually with IntaRNAv1 and IntaRNAv2 (for parameters, see Supplementary material). The resulting predictions were ranked by energy score, with the lowest energy at rank 1. A comparison is provided in Figure 1, which shows a slightly increased performance of IntaRNAv2 compared to IntaRNAv1 using comparable parameterization for both tools with equivalent run time (data not shown). The memory requirement of IntaRNAv2 was reduced by a factor of about 6 (from an average of 79 MB to 13 MB).
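The target window described above (200 nucleotides upstream plus 100 nucleotides starting at the start codon, so that the codon occupies positions 201 to 203) could be extracted roughly as follows. This is a sketch only: the function names, the 0-based coordinate convention, and the treatment of the genome as a linear string (no wrap-around for circular chromosomes) are assumptions of this example and not part of the published pipeline.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def target_window(genome: str, codon_start: int, strand: str,
                  upstream: int = 200, downstream: int = 100) -> str:
    """Return the window around a start codon, read in 5'->3' direction.

    codon_start is the 0-based forward-strand index of the FIRST nucleotide
    of the start codon (for '-' strand genes this is the highest coordinate
    of the codon). The downstream part includes the start codon itself.
    """
    if strand == "+":
        begin, end = codon_start - upstream, codon_start + downstream
    elif strand == "-":
        begin, end = codon_start - downstream + 1, codon_start + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    if begin < 0 or end > len(genome):
        # Simplification: genes too close to the sequence ends are rejected
        # instead of wrapping around a circular genome.
        raise ValueError("window exceeds genome boundaries")
    window = genome[begin:end]
    return window if strand == "+" else reverse_complement(window)
```

With the default arguments every returned window has length 300, and `window[200:203]` holds the start codon, matching the 1-based positions 201 to 203 quoted in the text.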
Figure 1.

The figure shows the whole-genome target prediction performance of IntaRNAv1 (orange) and IntaRNAv2 (blue) on a benchmark set of 160 experimentally verified enterobacterial (E. coli, Salmonella) RNA–RNA interactions including 28 different sRNAs (see Supplementary Table S1 and Supplementary FASTA). Furthermore, the performance of IntaRNAv2 was assessed with the same data set under enforcement of a seed energy E ≤ –4.8 kcal/mol (dashed blue). The x-axis represents the number of top-ranked predictions considered per sRNA and the y-axis shows the cumulative number of true positives over all sRNA whole-genome target predictions.
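The curve in Figure 1 can be thought of as a cumulative count over ranks. The sketch below shows one way to compute such a curve from the ranks at which the verified targets were found; the function name and the example ranks are made up for illustration and are not benchmark results.

```python
from typing import Iterable, List


def cumulative_true_positives(ranks: Iterable[int], max_rank: int) -> List[int]:
    """Number of verified interactions recovered within the top r predictions.

    ranks holds, for each verified sRNA-target pair, the 1-based rank of the
    target in the genome-wide screen of its sRNA. Entry r-1 of the result is
    the number of pairs with rank <= r. Ranks beyond max_rank (or pairs that
    were not predicted at all and are simply left out) never count.
    """
    counts = [0] * (max_rank + 1)
    for rank in ranks:
        if 1 <= rank <= max_rank:
            counts[rank] += 1
    curve = []
    running = 0
    for r in range(1, max_rank + 1):
        running += counts[r]
        curve.append(running)
    return curve
```

For the hypothetical ranks `[1, 3, 3, 10, 250]` and `max_rank=5`, the function returns `[1, 1, 3, 3, 3]`.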


Seed evaluation

To investigate the impact of more restrictive seed constraints, we first evaluated the energy distribution of seed interactions (without accessibility penalties (ED) and dangling end contributions). To this end, we exhaustively generated all seeds (without bulges) for 5–8 bp, which revealed the seed energies to be normally distributed (see Table 1 for details). We observe that the mean value follows the formula μ(E) = (bp − 4) × (−1.6) kcal/mol, where −1.6 kcal/mol is the average contribution of an additional stacking base pair for the Turner04 energy parameters.
Table 1.

Distribution of seed energies E (in kcal/mol) for different seed lengths (= number of base pairs; no bulges allowed) with μ|σ = mean|standard deviation. The highlighted row (marked with *) corresponds to the seed constraints of the benchmark

Seed length   min(E)   max(E)   μ(E)   σ(E)
5             −9.2     8.3      −1.6   2.5
6             −12.5    9.6      −3.2   2.8
7*            −15.8    9.9      −4.8   3.1
8             −19.1    11.2     −6.4   3.4

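As a quick consistency check, the mean seed energies in Table 1 can be compared against the stated rule μ(E) = (bp − 4) × (−1.6). The snippet below only re-evaluates the numbers given in the table; the variable and function names are chosen for this example.

```python
import math

# Mean seed energies (kcal/mol) per seed length, copied from Table 1.
TABLE_1_MEAN = {5: -1.6, 6: -3.2, 7: -4.8, 8: -6.4}

# Average contribution of one additional stacked base pair, as stated in the text.
STACK_CONTRIBUTION = -1.6


def expected_mean_seed_energy(base_pairs: int) -> float:
    """Mean seed energy predicted by the rule (bp - 4) * (-1.6)."""
    return (base_pairs - 4) * STACK_CONTRIBUTION


for bp, mean in TABLE_1_MEAN.items():
    # Tolerance absorbs floating point rounding only.
    assert math.isclose(expected_mean_seed_energy(bp), mean, abs_tol=1e-9)
```

All four rows agree with the rule, so the mean for a 7 bp seed, −4.8 kcal/mol, is the value that reappears as the seed energy bound in the benchmark.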
To test tighter constraints, we required the seed's stability to be above average (P-value: 0.5) and reran the benchmark. Since the number of seed base pairs was set to 7, only interactions with a seed energy E ≤ −4.8 kcal/mol are reported. The results are shown in Figure 1 and reveal a noticeable performance increase. Furthermore, the run time is roughly halved, since fewer than half of the sequence pairs that showed a stable interaction without the constraint contain such a stable seed (only 55 227 of 121 988). While this looks promising, such boundaries have to be chosen carefully. For instance, the prediction for the verified interaction of DsrA with the rpoS mRNA (37) is completely lost (no seed with E ≤ −4.8 kcal/mol), while it is top ranked (rank 1) for both IntaRNAv1 and IntaRNAv2 without the seed constraint (see Supplementary Table S1). Thus, further research is needed on whether general suggestions for upper seed energy bounds can be provided or whether such bounds have to be chosen in a sequence- and organism-dependent manner.
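The link between the P-value of 0.5 and the bound of −4.8 kcal/mol follows from the normal approximation of the 7 bp seed energies (μ = −4.8, σ = 3.1 in Table 1). The sketch below uses Python's `statistics.NormalDist` to make this explicit and to recompute the fraction of sequence pairs that pass the constraint; the helper name `seed_energy_bound` is hypothetical, and deriving bounds for other P-values this way is an illustration rather than a recommendation from the paper.

```python
from statistics import NormalDist

# Normal approximation of 7 bp seed energies (kcal/mol), values from Table 1.
SEED7 = NormalDist(mu=-4.8, sigma=3.1)


def seed_energy_bound(dist: NormalDist, p_value: float) -> float:
    """Energy bound such that a fraction p_value of all seeds lies at or below it."""
    return dist.inv_cdf(p_value)


# Fraction of all possible 7 bp seeds at or below the mean energy.
fraction_of_seeds_below_mean = SEED7.cdf(-4.8)

# Sequence pairs that still yield a stable interaction with the constraint,
# relative to those without it (counts quoted in the text).
fraction_of_pairs_kept = 55227 / 121988
```

Here `fraction_of_seeds_below_mean` is 0.5 by symmetry of the normal distribution, and `fraction_of_pairs_kept` is about 0.45, consistent with the statement that fewer than half of the pairs remain.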

Minimal energy profiles

The extended web interface of IntaRNAv2 visualizes minimal energy profiles for interaction predictions, which give a detailed overview of interaction alternatives and their relative positioning. Figure 2 shows an example for the interaction of Spot42 with a region around the translation start site of the sthA (b3962) mRNA, which encodes a pyridine nucleotide transhydrogenase. Spot42 is known to interact with its targets via three accessible regions (I, II and III) (38), and the profile highlights sites I and III near the start codon. In fact, it has been shown that both regions I and III are important for regulation of the sthA mRNA (39), and this can also be deduced by investigating the profiles for the corresponding Spot42 mutants (see Supplementary material).
Figure 2.

Minimal energy profile for all intermolecular index pairs covered by any predicted interaction of Spot42 with the sthA mRNA (with E < 0). Conserved accessible regions I, II and III of Spot42 known to interact are tagged on the right.

Interestingly, while region II does not seem to be centrally involved in the interaction, its mutation still slightly reduces regulation by Spot42 (39). This might stem from a minor stability reduction of both sites I and III (predicted minimal energy increase of about 0.5 kcal/mol) when region II is mutated.

CONCLUSIONS

The reimplementation of IntaRNA enables state-of-the-art energy parameters as well as seed constraint and accessibility incorporation for fast and accurate RNA–RNA interaction prediction. Increased prediction quality for genome-wide target predictions is also beneficial for comparative interaction prediction approaches like CopraRNA (25,40). Based on an extended benchmark set, we show the enhanced prediction quality of IntaRNAv2 and highlight the possibility for further improvements, e.g. using more restrictive seed constraints. While promising, such constraints need careful tuning since they can lead to false negative predictions. One of the new features of the IntaRNA web server is the visualization of minimal energy profiles for interacting RNAs. This enables the detailed study of alternative RNA–RNA interactions. This way, it is possible to see whether multiple interaction sites are likely and whether these can occur in conjunction or only exclusively. Furthermore, mutational effects can be analyzed and thus guide wet-lab experiments that attempt in-depth validation of predicted target sites (33,39,41). The new flexible framework is the foundation for further upcoming extensions of IntaRNA. One direction is to enable multi-site RNA–RNA interaction predictions. This way, hypotheses from the minimal energy profile investigations can be extended and further consolidated for successive experimental validation. Another direction is to enable further established input and output formats for an even more general embedding of IntaRNA into high-throughput workflow systems like Galaxy (42). In summary, the reimplementation lays the groundwork for IntaRNA to remain among the state-of-the-art RNA–RNA interaction prediction tools in the future. Thus, it will continue to be useful and available for both bioinformaticians and wet-lab experimentalists.
REFERENCES (10 of 40 shown)

1.  Multiple factors dictate target selection by Hfq-binding small RNAs.

Authors:  Chase L Beisel; Taylor B Updegrove; Ben J Janson; Gisela Storz
Journal:  EMBO J       Date:  2012-03-02       Impact factor: 11.598

2.  Thermodynamics of RNA-RNA binding.

Authors:  Ulrike Mückstein; Hakim Tafer; Jörg Hackermüller; Stephan H Bernhart; Peter F Stadler; Ivo L Hofacker
Journal:  Bioinformatics       Date:  2006-01-29       Impact factor: 6.937

3.  Complex transcriptional and post-transcriptional regulation of an enzyme for lipopolysaccharide modification.

Authors:  Kyung Moon; David A Six; Hyun-Jung Lee; Christian R H Raetz; Susan Gottesman
Journal:  Mol Microbiol       Date:  2013-05-31       Impact factor: 3.501

4.  RIsearch2: suffix array-based large-scale prediction of RNA-RNA interactions and siRNA off-targets.

Authors:  Ferhat Alkan; Anne Wenzel; Oana Palasca; Peter Kerpedjiev; Anders Frost Rudebeck; Peter F Stadler; Ivo L Hofacker; Jan Gorodkin
Journal:  Nucleic Acids Res       Date:  2017-05-05       Impact factor: 16.971

5.  Spot 42 RNA mediates discoordinate expression of the E. coli galactose operon.

Authors:  Thorleif Møller; Thomas Franch; Christina Udesen; Kenn Gerdes; Poul Valentin-Hansen
Journal:  Genes Dev       Date:  2002-07-01       Impact factor: 11.361

6.  The sRNA NsiR4 is involved in nitrogen assimilation control in cyanobacteria by targeting glutamine synthetase inactivating factor IF7.

Authors:  Stephan Klähn; Christoph Schaal; Jens Georg; Desirée Baumgartner; Gernot Knippen; Martin Hagemann; Alicia M Muro-Pastor; Wolfgang R Hess
Journal:  Proc Natl Acad Sci U S A       Date:  2015-10-22       Impact factor: 11.205

7.  ViennaRNA Package 2.0.

Authors:  Ronny Lorenz; Stephan H Bernhart; Christian Höner Zu Siederdissen; Hakim Tafer; Christoph Flamm; Peter F Stadler; Ivo L Hofacker
Journal:  Algorithms Mol Biol       Date:  2011-11-24       Impact factor: 1.405

8.  Post-transcriptional control of the Escherichia coli PhoQ-PhoP two-component system by multiple sRNAs involves a novel pairing region of GcvB.

Authors:  Audrey Coornaert; Claude Chiaruttini; Mathias Springer; Maude Guillier
Journal:  PLoS Genet       Date:  2013-01-03       Impact factor: 5.917

9.  The sRNA RyhB regulates the synthesis of the Escherichia coli methionine sulfoxide reductase MsrB but not MsrA.

Authors:  Julia Bos; Yohann Duverger; Benoît Thouvenot; Claude Chiaruttini; Christiane Branlant; Mathias Springer; Bruno Charpentier; Frédéric Barras
Journal:  PLoS One       Date:  2013-05-09       Impact factor: 3.240

10.  The iron-sensing aconitase B binds its own mRNA to prevent sRNA-induced mRNA cleavage.

Authors:  Julie-Anna M Benjamin; Eric Massé
Journal:  Nucleic Acids Res       Date:  2014-08-04       Impact factor: 16.971

