A comprehensive benchmark of RNA-RNA interaction prediction tools for all domains of life.

Sinan Ugur Umu^1,2, Paul P Gardner^1,2,3.

Abstract

Motivation: The aim of this study is to assess the performance of RNA-RNA interaction prediction tools for all domains of life.
Results: Minimum free energy (MFE) and alignment methods constitute most of the current RNA interaction prediction algorithms. The MFE tools that incorporate accessibility into the final predicted binding energy (i.e. RNAup, IntaRNA and RNAplex) have better true positive rates (TPRs), together with high positive predictive values (PPVs), than other methods in all datasets. They can also differentiate almost half of the native interactions from background. Algorithms that add the effects of internal (intramolecular) binding energies to their models, as well as alignment methods, seem to have high TPRs but relatively low associated PPVs compared with accessibility-based methods.
Availability and Implementation: We have shared our wrapper scripts and datasets on GitHub (github.com/UCanCompBio/RNA_Interactions_Benchmark). All parameters are documented for personal use.
Contact: sinan.umu@pg.canterbury.ac.nz.
Supplementary information: Supplementary data are available at Bioinformatics online.


Year: 2017; PMID: 27993777; PMCID: PMC5408919; DOI: 10.1093/bioinformatics/btw728

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937

1 Introduction

RNA biology has become more prominent since the discovery of non-coding RNAs (ncRNAs) and their versatile functions (Ambros, 2004; Barquist and Vogel, 2015; Kidner and Martienssen, 2005; Mattick, 2004; Mattick, 2009; Storz ; Waters and Storz, 2009). The versatility of RNA molecules has led to the idea of an ‘RNA world’ in which RNA formed the first primitive life forms (Gilbert, 1986). The importance of RNA biology is highlighted by the relatively small fraction of most eukaryotic genomes that codes for proteins (Mattick, 2004, 2009). For example, 1.2% of the human genome contains protein-coding genes, while 76% is transcribed into RNA (Pennisi, 2012). Likewise, prokaryotic cells contain various ncRNA genes (Gottesman, 2004; Holmqvist and Vogel, 2013; Thébault ; Vogel, 2009) and have been shown to have eukaryote-like transcriptional complexity (Barquist and Vogel, 2015; Cohen ; Güell , 2009; Lindgreen ). Many ncRNA classes act through RNA–RNA base pairing, including bacterial/archaeal small RNAs (sRNAs) (Prasse ; Storz ), small interfering RNAs (siRNAs) (Carthew and Sontheimer, 2009), microRNAs (miRNAs) (Carthew and Sontheimer, 2009; Cuperus ), spliceosomal small nuclear RNAs (snRNAs) (Karijolich and Yu, 2010), small nucleolar RNAs (snoRNAs) (Brown ; Gardner ; Kiss, 2002; Omer ), Cajal body-specific small nuclear RNAs (scaRNAs) (Darzacq ), clustered regularly interspaced short palindromic repeat (CRISPR) RNAs (Bhaya ) and piwi-interacting RNAs (piRNAs) (Brennecke ; Klattenhoff and Theurkauf, 2008). Some long non-coding RNAs (lncRNAs), which are abundant in eukaryotes (Zhao ), may also engage in RNA–RNA interactions (Kung ). In addition to endogenous ncRNA genes, many experimental techniques take advantage of RNA–RNA interactions, such as gene silencing (i.e. knock-down) by artificial siRNAs (Deleavey and Damha, 2012; Reynolds ) and the design of oligonucleotides for ribosomal RNA (rRNA) depletion in RNA-seq experiments (O’Neil ).
Different clades of life utilize regulatory RNA–RNA interactions under different constraints: various mediator proteins (Carthew and Sontheimer, 2009; Vogel and Luisi, 2011), binding-region preferences and distinct complementarity requirements (Ameres and Zamore, 2013; Millar and Waterhouse, 2005). Consequently, many different tools have been developed to predict stable interactions. Some algorithms treat RNA–RNA interaction prediction as an alignment problem and use local alignment approaches (Hodas and Aalberts, 2004; Wenzel ). Most tools, however, use dynamic programming and minimum free energy (MFE) methods (Backofen and Hess, 2010; Dieterich and Stadler, 2012; Lorenz ), which are also widely used for RNA secondary structure prediction (Markham and Zuker, 2008; McCaskill, 1990; Nussinov and Jacobson, 1980; Zuker, 2000; Zuker and Sankoff, 1984; Zuker and Stiegler, 1981). In bacteria, comparative methods are becoming popular (Kery ; Pain ; Wright ), but they are restricted to conserved sRNAs, which are quite rare (Barquist and Vogel, 2015; Lindgreen ). RNA target detection remains a challenging task, yet computationally feasible and biologically relevant predictions of RNA–RNA interactions are vital for the functional annotation of uncharacterized transcripts. In this study, we assessed the performance of available RNA–RNA interaction prediction tools on trusted, verified datasets from all domains of life. We evaluated their ability to recover established RNA–RNA pairs in eukaryotic, bacterial and archaeal systems. We also assessed how well their binding scores separate native interactions from background, and report the significance of these predictions.

2 Materials and methods

All RNA interaction prediction algorithms used here are freely available and cited in the manuscript. We used Python, R and Bash for the scripts and wrappers, which are shared in our GitHub repository (github.com/UCanCompBio/RNA_Interactions_Benchmark). A wrapper (or parser) script was written for each of the tools benchmarked here, and all parameters and command-line arguments are available in the repository.
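The per-tool wrapper scripts are in the repository above and are not reproduced here. As a rough illustration only, a Python wrapper that runs one command-line tool and records its wall-clock time (the quantity summed in Table 1) might look like the sketch below. The function name `run_tool` is hypothetical, the command list is a placeholder, and each tool's real arguments must come from its own documentation; the demo uses the Python interpreter as a stand-in "tool".

```python
import subprocess
import sys
import time


def run_tool(cmd, stdin_text=None):
    """Run one command-line predictor and return (stdout, elapsed seconds).

    `cmd` is a list such as [program, arg1, arg2, ...]. The real arguments
    for each benchmarked tool are not reproduced here (hypothetical helper).
    """
    start = time.perf_counter()
    result = subprocess.run(
        cmd,
        input=stdin_text,      # e.g. FASTA text piped to the tool's stdin
        capture_output=True,
        text=True,
        check=True,            # raise CalledProcessError on a non-zero exit
    )
    return result.stdout, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in "tool": the Python interpreter echoing its standard input.
    echo = [sys.executable, "-c", "import sys; print(sys.stdin.read().strip())"]
    out, seconds = run_tool(echo, stdin_text=">q\nGGGAAACCC\n")
    print(out, f"({seconds:.3f} s)")
```

Summing the elapsed times over a fixed set of input files gives a total run time comparable in spirit to the first numeric column of Table 1.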

2.1 Benchmark datasets

We manually confirmed the correct interaction regions (which contain the binding base pairs) for all dataset items and used entire target regions (i.e. UTRs, coding regions or target RNAs) to make our benchmark as realistic as possible. We also manually confirmed that the true binding regions on target RNAs are contiguous, with only a few mismatches. The eukaryotic benchmark dataset consisted of miRNAs from human, Arabidopsis and Caenorhabditis elegans (C. elegans) (Chou ; Kozomara and Griffiths-Jones, 2013); C/D and H/ACA box snoRNAs from human, Arabidopsis, C. elegans and yeast (Brown ; Lestrade and Weber, 2006; Piekna-Przybylska ; Yoshihama ); human and yeast U6/U2 snRNAs (Will and Lührmann, 2011); endogenous siRNAs from Arabidopsis (Addo-Quaye ); and piRNAs from mouse (Gou ). Experimentally verified miRNA/siRNA/piRNA–target mRNA and snoRNA/snRNA–target RNA pairs were selected from as many different ncRNA families as possible (88 pairs in total) (Supplementary Table S1). We compiled a bacterial dataset of 60 verified sRNA–mRNA pairs from Salmonella, Escherichia coli (E. coli) and Listeria monocytogenes (L. monocytogenes) (Cao ; Lai and Meyer, 2015; Peer and Margalit, 2011). The target regions of bacterial sRNAs lie either in the 5′UTR or downstream of the start codon (Richter and Backofen, 2012; Storz ). We therefore selected regions from 200 nucleotides (nts) upstream to 100 nts downstream of the start codons (i.e. the 5′ end of the mRNA), which contain the verified binding regions. We extracted both the sRNAs and the target 5′-end mRNA regions from their associated genome sequences (accessions AE006468.1, AL591824.1 and U00096.3) (Supplementary Table S1). We gathered a set of archaeal C/D box snoRNAs consisting of 5 snoRNAs and their ribosomal RNA targets (Omer ). We also added one of the less-studied archaeal sRNAs (from Methanosarcina mazei) (Jäger ). Selected genes and targets were obtained from their associated archaeal genomes (AE008384.1) or GenBank (Supplementary Table S1).
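The extraction of the 300-nt bacterial target windows described above can be sketched as follows. This is a hypothetical helper, not the authors' script. It assumes 0-based plus-strand coordinates for the first base of the start codon (the A of ATG as read on the gene's own strand), counts the start codon as part of the 100 downstream nucleotides, and ignores wrap-around on circular chromosomes.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def start_codon_window(genome, start, strand, upstream=200, downstream=100):
    """Return the 5' mRNA window around a start codon, 5'->3' on the gene's strand.

    genome : plus-strand DNA sequence of the replicon
    start  : 0-based plus-strand coordinate of the first base of the start
             codon, as read on the gene's own strand (assumption)
    The window spans `upstream` nts before the start codon and `downstream`
    nts beginning at (and including) its first base.
    """
    if strand == "+":
        lo, hi = start - upstream, start + downstream
    elif strand == "-":
        # On the minus strand, "upstream" lies at higher plus-strand coordinates.
        lo, hi = start - downstream + 1, start + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    if lo < 0 or hi > len(genome):
        raise ValueError("window runs off the end of the sequence")
    window = genome[lo:hi]
    # DNA alphabet is kept; convert T to U if a tool requires RNA input.
    return window if strand == "+" else reverse_complement(window)
```

With the default arguments the start codon always sits at positions 200 to 202 of the returned 300-nt window, whichever strand the gene is on.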

2.2 Accuracy measures for binding site predictions

We calculated TPR (sensitivity) and PPV (precision) scores for each algorithm based on its prediction of RNA–RNA binding regions for 154 manually curated interactions from the scientific literature. These include functionally characterized RNA–RNA interactions from Archaea, Bacteria and Eukaryotes. Verified binding regions between ncRNAs and target RNAs are annotated with published base-pairing interactions, which can be used to assess the overlap between predicted and true binding regions on target RNAs. In this work, true positives (TPs) are the number of nucleotides in a correctly predicted binding region, false positives (FPs) are the number of nucleotides in a falsely predicted binding region (i.e. a predicted target that is not part of the curated set of interactions), and false negatives (FNs) are the number of nucleotides in a binding region where no interaction is predicted (Supplementary Fig. S1). True negatives (TNs) are generally not used in the assessment of RNA structure prediction, as the number of TNs grows exponentially with sequence length while TP, FP and FN grow linearly (Wenzel ). We can calculate an approximation to the Matthews correlation coefficient (MCC) (Matthews, 1975) using the geometric mean of TPR and PPV (Gorodkin ; Wenzel ). These can be defined as:

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{PPV} = \frac{TP}{TP + FP}, \qquad \mathrm{MCC} \approx \sqrt{\mathrm{TPR} \times \mathrm{PPV}}$$
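A minimal sketch of these nucleotide-level definitions, assuming each binding region is represented as a half-open interval on the target RNA and that a tool reporting no interaction contributes only FNs. The function names are illustrative and not taken from the benchmark repository.

```python
import math


def nucleotide_counts(predicted, true):
    """TP, FP and FN (in nucleotides) for one interaction on the target RNA.

    Both regions are half-open (start, end) intervals; `predicted` may be
    None when a tool reports no interaction.
    """
    true_nts = set(range(*true))
    pred_nts = set(range(*predicted)) if predicted else set()
    tp = len(pred_nts & true_nts)   # predicted and truly binding
    fp = len(pred_nts - true_nts)   # predicted but outside the true site
    fn = len(true_nts - pred_nts)   # true site nucleotides that were missed
    return tp, fp, fn


def scores(tp, fp, fn):
    """TPR, PPV and the geometric-mean approximation of the MCC."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return tpr, ppv, math.sqrt(tpr * ppv)
```

For example, a true site at target positions 100 to 119 and a prediction at 110 to 134 give TP = 10, FP = 15 and FN = 10, hence TPR = 0.5, PPV = 0.4 and MCC ≈ 0.45.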

2.3 A significance test for prediction scores

Besides these well-known accuracy measures, we also assessed the scores generated by the algorithms, which usually reflect the stability of an interaction (e.g. a binding MFE). For each true, verified target (positive control), we created 200 dinucleotide-shuffled sequences (negative controls) using the esl-shuffle tool (Eddy, 2011). Because nearest-neighbour energy models score stacked base pairs, preserving dinucleotide composition prevents the biases that simpler shuffling would introduce into the predicted energies (Workman and Krogh, 1999). To determine the significance of native interactions, we fitted the binding energies of the shuffled interactions (the background) to both normal and Gumbel distributions (using negated energies) (Gumbel, 1958), since MFE values mostly follow an extreme value distribution (Rehmsmeier ; Tjaden, 2008). In short, we assessed the significance of the positive controls against a set of negative controls. A similar methodology has shown that prokaryotes avoid crosstalk RNA–RNA interactions, which can be measured as a shift in binding energy (Umu ). We applied this approach only to the bacterial dataset, owing to time constraints and the uniform bacterial targets (i.e. identical 300-nt-long target mRNA regions). Whenever an algorithm produced more than one interaction, we selected the best-scoring one as the predicted native interaction; this applies to all our analyses.
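The study used esl-shuffle for the dinucleotide shuffle. To show what such a shuffle must preserve, the sketch below is an independent Python implementation of the Altschul–Erickson algorithm (not esl-shuffle's code). It treats each dinucleotide as an edge in a graph over residues and draws a random Eulerian path, so the shuffled sequence keeps the exact dinucleotide counts as well as the first and last residues. Generating 200 negative controls is then a loop over this function.

```python
import random
from collections import defaultdict


def _last_edges_form_tree(last_edge, end):
    """True if following last_edge from every vertex reaches `end` without a cycle."""
    for v in last_edge:
        seen, u = set(), v
        while u != end:
            if u in seen:
                return False
            seen.add(u)
            u = last_edge[u]
    return True


def dinucleotide_shuffle(seq, rng=random):
    """Shuffle `seq` preserving its exact dinucleotide counts (Altschul-Erickson)."""
    if len(seq) < 3:
        return seq
    edges = defaultdict(list)            # residue -> list of following residues
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    end = seq[-1]
    # Pick a random final exit edge for every vertex except the end vertex,
    # rejecting choices that do not form a tree rooted at the end vertex.
    while True:
        last_edge = {v: rng.choice(out) for v, out in edges.items() if v != end}
        if _last_edges_form_tree(last_edge, end):
            break
    # Shuffle the remaining edges; the chosen last edge is always used last.
    order = {}
    for v, out in edges.items():
        out = list(out)
        if v in last_edge:
            out.remove(last_edge[v])
            rng.shuffle(out)
            out.append(last_edge[v])
        else:
            rng.shuffle(out)
        order[v] = out
    # Walk the graph from the first residue, consuming edges in order.
    used = {v: 0 for v in order}
    current, result = seq[0], [seq[0]]
    for _ in range(len(seq) - 1):
        nxt = order[current][used[current]]
        used[current] += 1
        result.append(nxt)
        current = nxt
    return "".join(result)
```

Passing a seeded `random.Random` instance as `rng` makes a set of controls reproducible.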

3 Results and discussion

3.1 RNA–RNA interaction prediction tools

RNA–RNA interaction prediction methods fall mainly into three groups: alignment-like methods, MFE methods and comparative (homology) methods. The MFE methods can be further divided into three sub-classes according to whether they consider intramolecular base pairs (internal structure), neglect intramolecular structure or measure the accessibility of the binding region. There are also machine learning algorithms (Oğul ; Yang ) and probabilistic approaches such as RactIP (Kato ), which uses the CONTRAfold model (Do ) for RNA interaction prediction. RIsearch (Wenzel ), Bindigo (Hodas and Aalberts, 2004) and Guugle (Gerlach and Giegerich, 2006) are examples of alignment-like methods. RIsearch was mainly developed for rapidly searching genomes for RNA–RNA pairs in genome sequencing data, combining the Smith-Waterman-Gotoh algorithm with a nearest-neighbor energy model (Wenzel ), while Bindigo adopts an optimized Smith-Waterman algorithm to find optimal oligonucleotide–RNA pairs (Hodas and Aalberts, 2004). Guugle uses suffix arrays to search for RNA targets based on RNA helix rules that allow G-U pairs (Gerlach and Giegerich, 2006). Besides these alignment-based methods, tools such as BLAST (Altschul ), Blat (Kent, 2002), ssearch (Pearson and Lipman, 1988) or other local alignment implementations can be used to rapidly collect long (reverse-)complementary regions by including G-U pairs (C-U or G-A for the reverse complement) in the scoring matrix (Gerlach and Giegerich, 2006; Thébault ; Wenzel ). MFE methods form the majority of RNA–RNA interaction prediction tools (Backofen and Hess, 2010; Dieterich and Stadler, 2012; Lorenz ), and many secondary structure prediction tools also use MFE methods (Lorenz ; Markham and Zuker, 2008; Mathews and Turner, 2006; Zuker and Sankoff, 1984).
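The reverse-complement trick mentioned above can be illustrated with a small Smith-Waterman sketch. The ncRNA is reverse-complemented, so Watson-Crick pairs become identities, while G-U wobble pairs appear as C-U or A-G (reverse-complemented query base first) and receive a reduced score. The scores (+2 match, +1 wobble, −3 mismatch, −4 per gap) are hypothetical, the gap penalty is linear for brevity (local aligners such as ssearch typically use affine gap penalties), and the code is not taken from any tool in the benchmark.

```python
COMP = {"A": "U", "C": "G", "G": "C", "U": "A"}
# G-U wobble pairs as they appear after reverse-complementing the query:
# query G : target U -> (C, U);  query U : target G -> (A, G).
WOBBLE = {("C", "U"), ("A", "G")}


def reverse_complement(rna):
    return "".join(COMP[b] for b in reversed(rna))


def pair_score(rc_base, target_base, match=2, wobble=1, mismatch=-3):
    if rc_base == target_base:
        return match          # Watson-Crick pair
    if (rc_base, target_base) in WOBBLE:
        return wobble         # G-U wobble pair
    return mismatch


def best_complementary_region(query, target, gap=-4):
    """Smith-Waterman (linear gaps) of the reverse-complemented query vs target.

    Returns (score, (start, end)), a half-open interval on the target.
    """
    rc = reverse_complement(query)
    n, m = len(rc), len(target)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + pair_score(rc[i - 1], target[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_cell = H[i][j], (i, j)
    # Trace back from the best cell to where the local alignment starts.
    i, j = best_cell
    while H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + pair_score(rc[i - 1], target[j - 1]):
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (j, best_cell[1])
```

For example, `best_complementary_region("GCGCAG", "AAAAACUGCGCAAAAA")` finds the perfectly complementary site at target positions 5 to 10 (interval `(5, 11)`) with score 12.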
Some MFE methods, including RNAhybrid (Rehmsmeier ), RNAduplex (Lorenz ), DuplexFold (Reuter and Mathews, 2010) and TargetRNA (Tjaden, 2008), neglect intramolecular structures for the sake of algorithmic speed. Algorithms such as Pairfold (Andronescu ), RNAcofold (Bernhart ) and bifold (Reuter and Mathews, 2010) take intramolecular base pairing into account. RNAup (Mückstein ), RNAplex (Tafer and Hofacker, 2008) and IntaRNA (Busch ) include the accessibility of the binding regions in the final reported MFE of the RNA duplex, which is considered biophysically more realistic (Richter and Backofen, 2012). AccessFold includes accessibility through a method defined as pseudo-energy minimization (DiChiacchio ). BistaRNA also includes accessibility and can predict multiple binding sites (Poolsap ). Lastly, tools such as TargetRNA2 (Kery ), CopraRNA (Wright ), miRanda (John ), TargetScan (Lewis ), PETcofold (Seemann ) and DIANA-microT (Kiriakidou ) exploit homology and evolutionary conservation to predict interactions.

Some RNA–RNA interaction prediction tools were developed for a specific task or to predict a very specific group of interactions. For example, PLEXY is designed for C/D snoRNAs (Kehr ), RNAsnoop (Tafer ) for H/ACA snoRNAs and TargetRNA (Tjaden, 2008) for bacterial sRNAs (E. coli and Salmonella). In this study, we assessed the versatility of prediction tools across different datasets, as well as their predictive power where applicable. We excluded tools designed for specific RNA families, such as specialized miRNA algorithms (reviewed in Witkos ), specialized snoRNA target prediction algorithms and comparative bacterial sRNA prediction methods (reviewed in Backofen and Hess, 2010; Pain ). We also excluded inteRNA (Alkan ), IRIS (Pervouchine, 2004), piRNA (Chitsaz ) and biRNA (Chitsaz ), as they are either no longer supported or obsolete.
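Accessibility-based methods such as RNAup and IntaRNA add to the hybridization energy the free energy needed to make each binding site single-stranded, commonly written as ΔG = ΔG_hybrid + ED_target + ED_query with ED = −RT ln P_unpaired. The sketch below shows only this bookkeeping step. It assumes the unpaired probabilities are already available (the tools obtain them from partition-function calculations); individual tools differ in how they compute and approximate these terms, and the function names are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant, kcal/(mol*K)


def opening_energy(p_unpaired, temp_c=37.0):
    """Energy (kcal/mol) needed to make a site single-stranded: -RT ln P_unpaired."""
    rt = R_KCAL * (temp_c + 273.15)
    return -rt * math.log(p_unpaired)


def accessibility_adjusted_energy(dg_hybrid, p_unpaired_target, p_unpaired_query,
                                  temp_c=37.0):
    """Hybridization energy plus the cost of opening both binding sites."""
    return (dg_hybrid
            + opening_energy(p_unpaired_target, temp_c)
            + opening_energy(p_unpaired_query, temp_c))
```

With both sites fully accessible (P = 1) the opening terms vanish and the total equals the hybridization energy; a site that is unpaired with probability e^−1 costs exactly RT, about 0.62 kcal/mol at 37 °C. Poorly accessible sites therefore make an otherwise stable duplex less favourable, which is the effect the accessibility-based tools model.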
In summary, our final list of selected tools used for further analyses consisted of RIsearch (Wenzel ), IntaRNA (Busch ), RNAcofold (Bernhart ), RNAhybrid (Rehmsmeier ), RNAduplex (Lorenz ), RNAplex (Tafer and Hofacker, 2008), RNAup (Mückstein ), pairfold (Andronescu ), bifold (Reuter and Mathews, 2010), DuplexFold (Reuter and Mathews, 2010), ssearch (Pearson, 1991), RactIP (Kato ), bistaRNA (Poolsap ), AccessFold (DiChiacchio ) and NUPACK (Dirks ) (Supplementary Table S2).

3.2 Overall prediction performances

Our analyses of the overall performance of RNA interaction prediction algorithms show that three accessibility-based algorithms (RNAup, IntaRNA and RNAplex) achieved the best combination of sensitivity and precision (the highest MCCs). RNAup was the most precise tool (PPV 0.69; Fig. 1 and Table 1). IntaRNA ranked second, with an MCC almost identical to that of RNAup and a much shorter running time (24.44 s versus 137.48 s). RNAplex was comparable to both algorithms. RNAduplex had the best overall TPR, but it was far less precise than IntaRNA. Table 1 summarizes the 'cumulative' TPR, PPV and MCC scores, while Figure 1 shows their distributions for all interactions (n = 154) across all domains of life.

Fig. 1.

Table 1.

Total run time of algorithms, and the cumulative TPR, PPV and MCC scores

Algorithm	Total run time (s) on selected files (n = 50)	TPR (sensitivity)	PPV (precision)	MCC
AccessFold	596.44	0.38	0.31	0.35
bifold	404.63	0.37	0.31	0.34
bistaRNA	102.29	0.15	0.16	0.15
DuplexFold	5.33	0.48	0.17	0.29
IntaRNA	24.44	0.59	0.56	0.58
NUPACK	794.2	0.42	0.42	0.42
pairfold	90.24	0.39	0.29	0.34
ractIP	87.62	0.16	0.06	0.1
RIsearch	4.16	0.36	0.45	0.40
RNAcofold	15.28	0.41	0.32	0.36
RNAduplex	6.45	0.66	0.12	0.27
RNAhybrid	32.84	0.56	0.12	0.26
RNAplex	17.19	0.55	0.57	0.56
RNAup	137.48	0.51	0.69	0.60
ssearch	4.69	0.56	0.1	0.23

The cumulative scores (i.e. TPR, PPV, MCC) are calculated by adding individual TP, FP and FN values for all predictions.

The distribution of scores for RNA–RNA interaction prediction algorithms. (A) RNAduplex gave the highest median TPR (sensitivity), followed by IntaRNA. (B) RNAup was the most precise algorithm based on PPV, followed by the other accessibility-based methods, IntaRNA and RNAplex. (C) RNAup was the best prediction algorithm based on median MCC, with IntaRNA and RNAplex giving similar scores. RactIP produced the worst overall MCC.

RIsearch and ssearch were the fastest methods, but neither combined high sensitivity with high precision (Table 1). NUPACK, AccessFold and bifold had the longest run times, which appeared to increase for long RNA sequences such as ribosomal RNAs or large target UTR regions. RIsearch and bifold gave inconsistent results: their cumulative MCCs were 0.40 and 0.34, respectively (Table 1), but their median MCCs over the distribution of results in Figure 1 appear to be zero. This is because bifold frequently returned no duplex structure for some RNA pairs (e.g. the C. elegans miRNAs lin-4 and lsy-6-3p), and RIsearch produced many unsuccessful predictions for bacterial sRNAs; both cases yield MCC scores of zero.
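The gap between cumulative (Table 1) and median (Fig. 1) MCCs can be reproduced with hypothetical counts: when most interactions score zero but a few are predicted well, pooling TP, FP and FN before computing the MCC gives a positive value, while the per-interaction median stays at zero. The function names and numbers below are invented for illustration and do not come from the benchmark.

```python
import math
from statistics import median


def mcc_approx(tp, fp, fn):
    """Geometric mean of TPR and PPV, as used in this benchmark."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return math.sqrt(tpr * ppv)


def pooled_and_median_mcc(counts):
    """counts: list of (TP, FP, FN) tuples, one per benchmark interaction."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    pooled = mcc_approx(tp, fp, fn)                      # 'cumulative' score
    per_item = median([mcc_approx(*c) for c in counts])  # distribution summary
    return pooled, per_item


# Two complete misses (20-nt predictions in the wrong place) and one perfect hit.
print(pooled_and_median_mcc([(0, 20, 20), (0, 20, 20), (20, 0, 0)]))
```

Here the pooled MCC is 1/3 but the median MCC is 0, mirroring the pattern reported for RIsearch and bifold.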

3.3 The significance test results of bacterial dataset

Raw MFE values produced by the algorithms are difficult to interpret on their own, so it is common to use negative controls to determine the significance of predicted energy values (Rehmsmeier ), especially for structure predictions (Workman and Krogh, 1999). As described in Materials and methods, we created a set of negative controls for each native RNA–RNA interaction. Some algorithms were excluded from this assessment because they either do not produce a score (i.e. RactIP, bistaRNA and ssearch) or are biased towards internal structures (i.e. pairfold, RNAcofold, bifold and NUPACK). Thus, the significance test includes only 8 prediction algorithms (Table 2).

Table 2.

The test of significance results of selected algorithms on bacterial sRNAs.

Algorithm	Significant (P < 0.05) correct predictions, Gumbel fit (n = 60)	Significant (P < 0.05) correct predictions, normal fit (n = 60)	Median rank of native interactions (out of 201)
AccessFold	15	17	41.75
DuplexFold	2	8	63.5
IntaRNA	23	26	19
RIsearch	13	14	52.25
RNAduplex	8	11	54.25
RNAhybrid	5	6	76
RNAplex	23	30	10.5
RNAup	28	29	13.5

Higher is better for the second and third columns. Lower is better for the fourth column.

These results show that RNAplex and RNAup reported almost half of the native energies as significant when the background was fitted to a normal distribution. The Gumbel fit appears to be more conservative, which is likely to reduce the risk of FP predictions in high-throughput settings. RNAup results were almost identical for both distributions, and IntaRNA performed slightly worse than these two algorithms. The last column of Table 2 shows the median rank of native interactions. If the native interaction has the best prediction score (e.g. the lowest MFE) among the native target and its 200 shuffled controls, it is ranked 1 out of 201. A rank of r therefore means that r − 1 shuffled controls scored better than the native interaction, so the median ranks in the last column can be interpreted as the expected number of FPs an algorithm introduces before predicting the native interaction.
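A minimal sketch of the significance test and the rank statistic, assuming scores are negated binding energies (so higher means more stable) and that each native interaction is compared with its 200 shuffled controls. The fitting procedure is not specified here; this sketch uses simple method-of-moments estimates for the Gumbel fit, which is an assumption, and a maximum-likelihood fit would give somewhat different p-values. The function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

EULER_GAMMA = 0.5772156649015329


def normal_pvalue(native, background):
    """Upper-tail P-value of `native` under a normal fit to the background.

    All scores are negated binding energies, so larger means more stable.
    """
    fit = NormalDist(mean(background), stdev(background))
    return 1.0 - fit.cdf(native)


def gumbel_pvalue(native, background):
    """Upper-tail P-value under a Gumbel (maximum) fit.

    Assumption: location and scale are estimated by the method of moments,
    scale = s * sqrt(6) / pi and location = mean - gamma * scale.
    """
    scale = stdev(background) * math.sqrt(6) / math.pi
    loc = mean(background) - EULER_GAMMA * scale
    return 1.0 - math.exp(-math.exp(-(native - loc) / scale))


def native_rank(native, background):
    """Rank of the native score among native + controls (1 = best).

    Only controls scoring strictly better push the native interaction down;
    ties are resolved in favour of the native interaction.
    """
    return 1 + sum(score > native for score in background)
```

In use, the native MFE and each shuffled MFE from one tool are negated before calling these functions, and a native interaction counts as significant when its p-value is below 0.05.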

3.4 A summary of RNA–RNA interactions and algorithm performances for all domains of life

Eukaryotic RNA interactions mostly involve RNA interference (RNAi) (i.e. miRNAs and siRNAs) (Ambros, 2004; Carthew and Sontheimer, 2009; Chen, 2008). In animal RNAi, miRNAs (∼20 nts long) prefer perfect complementarity in the seed region and have lower overall complementarity than their plant counterparts (Ameres and Zamore, 2013; Axtell ). In plants, highly complementary target regions may lie in coding regions as well as UTRs, rather than only in 3′UTRs (Ameres and Zamore, 2013; Axtell ; Millar and Waterhouse, 2005). A miRNA can target more than one region, especially in animals, which is known to increase the efficiency of target gene downregulation (Millar and Waterhouse, 2005). However, in our benchmark we selected miRNA targets containing a single designated binding region. Piwi-associated piRNAs are also small endogenous RNAs (24–30 nts long) (Klattenhoff and Theurkauf, 2008; Zhang ), some of which, like miRNAs and siRNAs, use antisense binding to regulate target RNAs (Gou ). H/ACA and C/D snoRNAs have roles in rRNA and snRNA maturation (Brown ; Gardner ; Kiss, 2002). Their interactions differ: C/D snoRNAs bind their targets through a contiguous region of about 7–20 nts with a few mismatches (Gardner ; Kehr ), while H/ACA snoRNAs contain a stem loop within the binding region, which complicates target prediction (Gardner ; Kiss ; Tafer ). Spliceosomal snRNAs form ribonucleoprotein (RNP) complexes with other snRNAs (Karijolich and Yu, 2010), and they are also targeted by snoRNAs termed scaRNAs (Darzacq ). We included examples of both snRNA–snRNA and scaRNA–snRNA interactions in our dataset. Some lncRNAs are also known to use RNA–RNA interactions (Kung ), but these were not included in our benchmark. In the eukaryotic dataset, accessibility-based methods (except AccessFold and bistaRNA) performed best based on average MCC (Fig. 2). IntaRNA (av. MCC: 0.51) slightly outperformed RNAup (av. MCC: 0.49) and produced a higher PPV than the other tools benchmarked. RNAplex (av. MCC: 0.48) and the alignment-like method RIsearch (av. MCC: 0.48) were also comparable with these two algorithms on the eukaryotic dataset. Supplementary Table S3 lists the prediction scores for all 88 eukaryotic interactions.

Fig. 2.

This heatmap shows the MCC values of each tool for the entire dataset. Red cells indicate higher MCC values, denoting better predictions. Similar methods mostly cluster together based on these predictions (dendrogram at top). Row labels show the type of interaction. Predictions for the single archaeal sRNA are on the last row.

An in-depth examination of these results shows that the algorithms are poor at predicting human miRNA–mRNA interactions (av. MCC: 0.22), snoRNA interactions (weaker for H/ACA, as expected; av. MCC: 0.09) and mouse piRNA interactions (av. MCC: 0.07). Conversely, they perform best on Arabidopsis miRNAs (av. MCC: 0.72), siRNAs (av. MCC: 0.71) and bacterial sRNAs (av. MCC: 0.40), which is most likely an effect of the high complementarity of the binding regions in these RNA classes.

Bacterial small RNAs can be divided into three major types: antisense-binding sRNAs, Hfq-dependent sRNAs and CsrA-binding sRNAs (Storz ; Vogel, 2009). In this study, however, bacterial sRNAs refer to either antisense or Hfq-dependent sRNAs, which act via RNA–RNA base pairing. Bacterial sRNAs (50–200 nts long) prefer binding regions that are short relative to their size (Storz ; Vogel, 2009). This was also true for our dataset, with an average binding region size of 23 nts and the smallest just 7 nts long (Supplementary Table S1). Model bacteria such as E. coli or Salmonella contain hundreds of different sRNAs, which points to a complex regulatory system in prokaryotes (Waters and Storz, 2009). Moreover, an increasing number of RNA-seq studies (Cohen ; Sharma and Vogel, 2014; Sharma ) reveal that prokaryotes contain more novel regulatory ncRNAs than previously anticipated (Barquist and Vogel, 2015; Chen ; Lindgreen ). In the bacterial dataset, as in the eukaryotic dataset, accessibility-based methods performed better than the others based on average MCC. RNAup (av. MCC: 0.68) slightly outperformed IntaRNA (av. MCC: 0.65) on bacterial sRNA interactions, and RNAplex (av. MCC: 0.61) was comparable with both. RIsearch (av. MCC: 0.31) did not perform as well on the bacterial dataset as on the eukaryotic dataset, which decreased its overall performance (Fig. 2).

RNA interactions in archaea are not well characterized. Recent studies have shown that archaeal genomes contain large repertoires of ncRNAs, similar to bacterial genomes (Lindgreen ). Unfortunately, few verified RNA interactions are available for archaea, apart from archaeal snoRNAs. Archaeal genomes mostly contain C/D box snoRNAs; we therefore added 5 C/D box snoRNAs (Omer ) and one archaeal sRNA (Jäger ) as an archaeal benchmark dataset. The archaeal sRNA targets a bicistronic gene and trans-regulates the expression of two protein-coding genes concurrently (Jäger ) (Figs 1 and 2 and Supplementary Table S3). In the archaeal dataset, RNAplex (av. MCC: 0.65) performed better than the other algorithms, followed by IntaRNA (av. MCC: 0.61), RNAup (av. MCC: 0.53) and RIsearch (av. MCC: 0.40). RIsearch was better at snoRNA predictions than at the single archaeal sRNA, which reduced its average overall performance. For the archaeal sRNA, RNAplex recovered the binding region with a perfect MCC score, followed by IntaRNA.

3.5 Limitations of RNA–RNA interaction prediction algorithms

Unfortunately, 15 of the 154 RNA interaction pairs in our benchmark dataset could not be correctly predicted by any of the algorithms (i.e. an MCC score of 0 for all algorithms) (Fig. 2 and Supplementary Table S3), including 6 human miRNAs and snoRNAs from yeast, human and archaea. The mouse piRNA results were also unsatisfactory, and one piRNA (piR-013474) could not be detected by any of the algorithms. The algorithms benchmarked performed best on Arabidopsis miRNAs, siRNAs and bacterial sRNAs (Fig. 2). We applied the significance test to some of the failed eukaryotic interactions (e.g. mouse piRNAs, human miRNAs) to see whether the predicted scores could detect true interactions (i.e. separate native interactions from background) even when the binding regions were not correctly predicted. As expected, the two approaches gave consistent results. For example, the native interaction of piR-013474 could not be differentiated from background by any algorithm, and the same held for the other piRNAs and human miRNAs on which all algorithms consistently failed. The length of the target RNA regions (which include the binding regions) seems to influence prediction quality (also discussed by Lai and Meyer, 2015). The average eukaryotic target RNA in our dataset is 1690 nts long. However, this rises to around 2400 nts for the miRNAs that no algorithm predicted correctly, and is longer still for piRNAs. As described in Materials and methods, we did not truncate the targets (e.g. UTRs) that contained binding regions. We found a significant negative correlation (Pearson's r = −0.28, p < 0.05) between target RNA length and average MCC (i.e. overall performance). However, some of the algorithms (RNAup, RNAplex, RIsearch, RNAcofold and NUPACK) are less prone to this length bias (p > 0.05) (Supplementary Table S4), making them better suited to untruncated targets. Another explanation for inadequate prediction may be the quality of the dataset.
Not all experimental protocols are equally good at identifying correct binding regions, characterizing function or identifying new targets (Chou ; Kuhn ; Thomson ; Vogel and Wagner, 2007). However, the incorrectly predicted human miRNA interactions (hsa-miR-21-5p, hsa-miR-29b-3p, etc.) are supported by relatively strong evidence (Chou ), which argues against this explanation. RNA structure prediction (and RNA–RNA interaction prediction) algorithms rest on biophysical assumptions that neglect the influence of tertiary interactions and other factors (Mathews, 2006; Mathews and Turner, 2006; Wuchty ). The structure with the lowest free energy may not be the biologically active form, and an RNA may adopt multiple conformations with different free energies (Mathews, 2006; Mathews and Turner, 2006). Many algorithms ignore computationally expensive RNA structures (e.g. pseudoknots) (Do ; Hofacker ; Lorenz ). MFE methods also become less accurate for longer RNA sequences (Lai and Meyer, 2015; Lange ; Mathews and Turner, 2006; Meyer, 2008). RNA interaction prediction algorithms generally do not consider multiple binding regions; only a few, such as bistaRNA and RactIP, include multiple binding positions in their models (Kato ; Poolsap ). Cellular dynamics (e.g. interactions with other molecules, ion concentrations) can influence RNA structures (Onoa and Tinoco, 2004) and RNA interactions (Meyer, 2008; Mückstein ), and these effects are hard to incorporate into prediction models. The ssearch tool uses the Smith-Waterman algorithm (Pearson and Lipman, 1988) and is the only pure alignment tool in our benchmark, although similar tools such as BLAST or Blat could also be used to extract complementary regions for high-throughput predictions. Once its gap penalty and scoring matrix parameters were tuned for RNA–RNA interaction prediction, ssearch was quite successful and even comparable with some MFE methods (e.g. RNAhybrid and DuplexFold) (Fig. 1).
MFE methods that include internal structures (e.g. pairfold, RNAcofold, bifold, NUPACK) are biased towards internal structures, as many ncRNAs have stable internal structures (Clote ). Therefore, significance tests against negative controls may yield falsely significant predictions, because the internal structures of the interacting partners distort the MFE scores. We also observed this effect in our predictions (data not shown) and therefore excluded these algorithms from the significance test. They also have relatively long running times, and some have memory problems (e.g. bifold). NUPACK is the best of this type of prediction method and RNAcofold is the fastest (Table 1). Clearly, the algorithms do not perform equally well on all types of RNA–RNA interactions, and it is better to select algorithms appropriate to the input dataset. For example, RIsearch is fast and accurate on eukaryotic datasets and would be suitable for high-throughput predictions combined with statistical significance testing of the predicted scores. IntaRNA and RNAplex seem to be reliable and relatively fast on all datasets. RNAup is precise and less prone to length bias (Supplementary Table S4).

4 Conclusion

Here we present one of the most comprehensive benchmarks of RNA–RNA interaction prediction methods, covering almost all classes of RNA–RNA interactions in RNA biology. We extended previous work (Lai and Meyer, 2015; Pain ) by including all types of RNA–RNA interactions and the latest algorithms (DiChiacchio ) in the RNA interaction prediction field. We included a test of the statistical significance of the scores predicted by each algorithm. We also showed that increasing the length of the target RNAs containing the binding regions negatively influences overall prediction quality (Supplementary Table S4). Three accessibility-based algorithms, RNAup, IntaRNA and RNAplex, performed best across all types of interactions. We found that the accessibility-based MFE methods could also differentiate almost half of the native interactions from background in our bacterial dataset (Table 2). Therefore, carefully designed negative controls (e.g. dinucleotide shuffling) allow predicted MFE values to be used to separate native interactions from background. This makes the accessibility-based algorithms ideal tools for de novo predictions, especially those with shorter run times such as IntaRNA and RNAplex, since candidate target RNAs can be thousands of nts long. RNAplex is also effective at detecting correct interaction regions buried in larger RNA targets (Results and Supplementary Table S4). RNA interaction prediction is still an expanding field. Advances in sequencing technology have unveiled a vast number of novel, uncharacterized ncRNA transcripts in different clades of life. These methods also show that many ncRNAs use RNA–RNA interactions (Kudla ; Lu ; Sharma ), which makes RNA target prediction an important tool for determining the functions of novel genes. Comparative methods are becoming popular (Lai and Meyer, 2015; Pain ; Seemann ; Wright ) and may increase prediction accuracy (Pain ; Wright ).
However, other results suggest that there is little to be gained from comparative approaches for predicting interactions (Lai and Meyer, 2015; Richter and Backofen, 2012), owing to the low conservation of many ncRNAs (Lindgreen ). Unfortunately, most of the verified interactions in the RNA literature still come from model species (human, C. elegans, Arabidopsis, E. coli, etc.), which also raises the risk of overfitting results to a modest number of known interactions. The weak prediction performance for piRNAs may suggest that current methods are inadequate for novel regulatory RNAs, but even some well-known miRNA interactions were not detected by any of the benchmarked algorithms (Fig. 2). Archaeal regulatory systems are also poorly studied, and only a handful of archaeal sRNAs have been identified. Therefore, non-comparative methods are still a robust way to produce ab initio interaction predictions. Our benchmark will help researchers find an appropriate algorithm for the functional annotation of unknown transcripts, and can serve as a basis from which to improve existing methods or develop new ones. Our scripts and datasets are publicly available at Github (github.com/UCanCompBio/RNA_Interactions_Benchmark).
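The Conclusion recommends dinucleotide shuffling as a negative control. Below is a minimal Python sketch of one standard way to generate such controls, the Eulerian-walk method usually attributed to Altschul and Erickson: each overlapping dinucleotide is treated as an edge between nucleotides, and a shuffled sequence is a random walk that uses every edge exactly once, so the first and last nucleotides and all dinucleotide counts are preserved. The function names are hypothetical, the code is not taken from the benchmark's wrapper scripts, and it leaves open which partner (query, target or both) should be shuffled.

```python
import random
from collections import Counter, defaultdict


def dinucleotide_counts(seq: str) -> Counter:
    """Count overlapping dinucleotides, e.g. 'ACGA' -> AC, CG, GA."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def _reaches_end(v, edges, last_idx, end):
    """Follow chosen last-exit edges from v; True if `end` is hit without a cycle."""
    seen = set()
    while v != end:
        if v in seen:
            return False
        seen.add(v)
        v = edges[v][last_idx[v]]
    return True


def dinucleotide_shuffle(seq: str, rng: random.Random) -> str:
    """Shuffle seq while preserving its exact dinucleotide counts."""
    if len(seq) <= 2:
        return seq
    edges = defaultdict(list)  # nucleotide -> successors, one entry per dinucleotide
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    end = seq[-1]
    vertices = [v for v in edges if v != end]

    # Choose one "last exit" edge per vertex so that following last exits
    # from any vertex leads to `end` (a spanning tree rooted at `end`);
    # resample until the choice forms such a tree.
    while True:
        last_idx = {v: rng.randrange(len(edges[v])) for v in vertices}
        if all(_reaches_end(v, edges, last_idx, end) for v in vertices):
            break

    # Randomly order each vertex's other edges, keeping its last exit at the end.
    ordered = {}
    for v, succ in edges.items():
        succ = succ[:]
        last = succ.pop(last_idx[v]) if v in last_idx else None
        rng.shuffle(succ)
        if last is not None:
            succ.append(last)
        ordered[v] = succ

    # Walk from the first nucleotide, consuming each vertex's edges in order;
    # the tree of last exits guarantees every edge is used before stopping.
    pos = defaultdict(int)
    cur = seq[0]
    out = [cur]
    while pos[cur] < len(ordered.get(cur, [])):
        nxt = ordered[cur][pos[cur]]
        pos[cur] += 1
        out.append(nxt)
        cur = nxt
    return "".join(out)
```

Because nearest-neighbour energy models score stacks of adjacent base pairs, preserving dinucleotide content keeps the stacking potential of the controls close to that of the native sequence, which a plain mononucleotide shuffle does not guarantee.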

Funding

SUU is supported by a Biomolecular Interaction Centre and UC HPC (Bluefern) joint PhD Scholarship from the University of Canterbury. PPG is supported by Rutherford Discovery Fellowships, administered by the Royal Society of New Zealand.
