RNAconTest: comparing tools for noncoding RNA multiple sequence alignment based on structural consistency.

Abstract

The importance of noncoding RNA sequences has become increasingly clear over the past decade. New RNA families are often detected and analyzed using comparative methods based on multiple sequence alignments. Accordingly, a number of programs have been developed for aligning and deriving secondary structures from sets of RNA sequences. Yet, the best tools for these tasks remain unclear because existing benchmarks contain too few sequences belonging to only a small number of RNA families. RNAconTest (RNA consistency test) is a new benchmarking approach relying on the observation that secondary structure is often conserved across highly divergent RNA sequences from the same family. RNAconTest scores multiple sequence alignments based on the level of consistency among known secondary structures belonging to reference sequences in their output alignment. Similarly, consensus secondary structure predictions are scored according to their agreement with one or more known structures in a family. Comparing the performance of 10 popular alignment programs using RNAconTest revealed that DAFS, DECIPHER, LocARNA, and MAFFT created the most structurally consistent alignments. The best consensus secondary structure predictions were generated by DAFS and LocARNA (via RNAalifold). Many of the methods specific to noncoding RNAs exhibited poor scalability as the number or length of input sequences increased, and several programs displayed substantial declines in score as more sequences were aligned. Overall, RNAconTest provides a means of testing and improving tools for comparative RNA analysis, as well as highlighting the best available approaches. RNAconTest is available from the DECIPHER website (http://DECIPHER.codes/Downloads.html).

Keywords: benchmark; consensus secondary structure; multiple sequence alignment; noncoding RNA; secondary structure prediction

Substances: RNA, Untranslated

Year: 2020 PMID: 32005745 PMCID: PMC7161358 DOI: 10.1261/rna.073015.119

Source DB: PubMed Journal: RNA ISSN: 1355-8382 Impact factor: 4.942

INTRODUCTION

Multiple sequence alignment forms the basis of many comparative analyses of noncoding RNA sequences such as the prediction of secondary structure. The alignment of RNA sequences remains an unsolved challenge in computational biology (Fallmann et al. 2017). While protein sequences can be accurately aligned down to 30% similarity, RNA sequences become difficult to align below 60% similarity. This discrepancy can be attributed to the much smaller alphabet of RNA sequences (i.e., four nucleotides versus 20 amino acids) and the fact that many different primary sequences can fold into similar secondary structures through complementary base-pairing (Bussotti et al. 2013). For this reason, most RNA-specific tools take advantage of structural conservation to improve alignment performance. Despite the wide variety of tools that have been published, it remains unclear which tools are the best for RNA multiple sequence alignment. However, it is generally believed that programs performing simultaneous folding and alignment outperform programs that do not consider structure during alignment. Several benchmarks for RNA multiple sequence alignment have been published. The popular BRaliBase II (Gardner et al. 2005) and 2.1 (Wilm et al. 2006) benchmarks were constructed from Rfam (Kalvari et al. 2018) seed alignments, which are often derived from alignments created by automated programs. Since the ground truth is unknown in these cases, it is foreseeable that accuracy is partly based on the ability to match other programs’ outputs. Furthermore, the BRaliBase benchmarks contain only two to 15 sequences per set and many test sets were generated by repeated resampling of the same few RNA families. The paucity of families with low levels of sequence similarity resulted in the infamous “BRaliBase dent,” where programs appear to perform worse at an intermediary similarity range than they do for highly dissimilar sequences (Löwes et al. 2017). 
More recently, a collection of manually curated multiple alignments was generated for RNA families with a known structure (Widmann et al. 2012). Although structures can be used to adjust alignments by hand, there is no guarantee that the resulting alignment is optimal (Morrison 2009). Thus, existing benchmarks for RNA multiple sequence alignment leave room for improvement, especially as the number of RNA families and solved 3D structures have continued to increase. In contrast to RNA, there is a much wider variety of benchmarks for protein multiple sequence alignment. Many benchmarks rely on the superposition of multiple 3D crystal structures to identify colocated residues. Structural superposition is more challenging for RNA sequences because RNAs have modular 3D structures that behave as a collection of semi-independent rigid bodies (Rahrig et al. 2013; Čech et al. 2015; Piątkowski et al. 2017). Even though many more benchmarks are available for proteins than RNAs, there remains little consensus as to which benchmarking approach is optimal (Aniba et al. 2010; Edgar 2010; Iantorno et al. 2014). Recently, a new protein benchmark was developed to overcome some flaws in previous approaches by indirectly comparing protein multiple sequence alignments based on their utility for predicting correct secondary structure (Fox et al. 2015). This approach was shown to strongly correlate with the results of 3D superposition–based protein benchmarks (Sievers and Higgins 2019) but has yet to be applied for benchmarking RNA alignment programs. The goal of this study was to develop a new benchmark, RNAconTest (RNA consistency test), for assessing RNA multiple sequence alignments based on their degree of secondary structure consistency. Ten different alignment programs were compared using 29 Rfam families with two or more solved structures. 
This approach is based on the assumption that better secondary structure agreement results from a better alignment, that is, the alignment adequately represents a biologically conserved structure. The same concept was applied to assess consensus secondary structure predictions according to their agreement with empirically determined structures, although many alternative benchmarks already exist for this purpose (Puton et al. 2013; Miao et al. 2017). Collectively, the results reveal differences among programs for RNA alignment and could assist with improving future programs.

RESULTS

Scores on the RNAconTest benchmark are based on the consistency of known secondary structures across multiple sequences in an alignment or when compared to a predicted consensus structure (Fig. 1). Alignment and structure scores were defined using secondary structures in a manner that is analogous to weighted sum of pairs scores that are commonly used to compare sequence alignments (see Materials and Methods). Each set of reference sequences was derived from Rfam families with one or more solved 3D structures (Table 1). Only the unique sequences in each reference set were kept because many solved structures are redundant. Since reference sets contained a small number of unique sequences (1 to 93) with empirically supported structures, supplemental RNA sequences (N = 10 to 5120) were added from each Rfam family's full alignment. This resulted in a set of 29 different Rfam families with two or more solved structures, and 22 additional Rfam families with a single solved structure (Table 1). At least two reference structures are required to calculate an alignment score, whereas only one structure is needed to compare with a consensus secondary structure prediction (Fig. 1).

FIGURE 1.

Example scoring for an Rfam family. (A) Five selenocysteine transfer RNA (RF01852) reference sequences were supplemented with 1280 randomly selected sequences from the same Rfam family to generate an alignment using DECIPHER. (B) After discarding the supplemental sequences, alignment scores are calculated for each column (site) in the alignment according to the degree of consistency among their known secondary structures. Structures are shown in dot bracket notation (i.e., base-pairings “(” and “)”, pseudoknots “[” and “]”, and unpaired positions “.”) with chain breaks masked by “+” symbols. (C) These per-site scores are averaged, after weighting by column occupancy, to compute the final alignment score (0.72). (D) In a similar manner, DECIPHER's consensus structure prediction is compared to each reference sequence's known structure to calculate the fraction of matching paired and unpaired bases. These scores are averaged across sequences to obtain the final structure score (0.69). (E) An arc diagram connecting all empirically supported base-pairings in the alignment shows that only a subset of the predicted base-pairings was consistent among sequences. (F) The predicted consensus structure represented most of the consistent base-pairings in the alignment.

TABLE 1.

Reference sets and characteristics

Ten different alignment programs were tested for their ability to generate structurally consistent alignments. The programs vary in the way in which they handle folding and alignment. Two generic alignment programs, Clustal Omega (Sievers et al. 2011) and MUSCLE (Edgar 2004), do not make use of secondary structure predictions. Three programs, DECIPHER (Wright 2015), MAFFT (Katoh and Standley 2013), and R-Coffee (Wilm et al. 2008), are extensions of programs originally designed for protein alignment but handle RNA sequences differently by incorporating secondary structure predictions. 
The remaining five programs were specifically designed for the alignment of noncoding RNAs: DAFS (Sato et al. 2012), MASTR (Lindgreen et al. 2007), MXSCARNA (Tabei et al. 2008), LocARNA, and SPARSE (Will et al. 2015). Seven out of ten programs provide a consensus secondary structure prediction with their output (i.e., all except Clustal Omega, MUSCLE, and MAFFT).

Alignment consistency

As expected, all programs generated alignments following a monotonic pattern of lower consistency scores with increasing distance (Fig. 2A). Unlike previous benchmarks, there was no sudden transition to lower scores beyond a specific similarity threshold, suggesting that the concept of an alignment “twilight zone” may be less applicable to RNA sequences than proteins. However, part of the steady decline in scores may be due to true biological variability among secondary structures as the sequences evolve. For example, consistency scores varied ∼10% between different reference sets with similar average sequence identities (Fig. 3). Therefore, it is not feasible to determine an upper bound on consistency scores, and RNAconTest benchmark results should only be considered on a relative basis.

FIGURE 2.

Trends in score by sequence breadth and number. (A) All programs followed a monotonic trend of decreasing alignment score with increasing distance among aligned sequences. Each curve represents a smoothed cubic spline (with three degrees of freedom) fit to the set of all alignments for a given program. (B) Some programs declined in performance as more supplemental sequences were added to the reference alignment. Lines show the average alignment score relative to the baseline of 10 supplemental sequences. Lines end where fewer than five reference sets were aligned within the 12 h time limit per alignment. (C) Structure scores tended to decline with increasing distance among the aligned sequences. Each curve represents a smoothed cubic spline fit to the set of all alignments. Interestingly, some curves contain a peak at ∼50% pairwise identity, particularly for covariation-based prediction methods (e.g., DECIPHER). (D) Average structure scores changed considerably for some programs as more supplemental sequences were included in the alignment. In particular, DECIPHER exhibited the largest gains in performance, whereas MASTR displayed the reverse trend.

FIGURE 3.

Alignment scores by Rfam family. Reference sets are ordered (left to right) by decreasing average similarity among the reference sequences (values along top horizontal axis). Alignment scores are ordered from fewest to most supplemental sequences within each family (left to right). Horizontal gray lines denote the scores of 100 random permutations of the reference structures belonging to each family, representing the range of results expected from incorrect alignments. Some programs exhibited instability (e.g., SPARSE), with clear failures to accurately align some reference sets belonging to the same family depending on which and/or how many supplemental sequences were added. The top scoring program differed from family to family, but DAFS, DECIPHER, LocARNA, and MAFFT were consistently high scoring.

On average, LocARNA (0.758) was the highest scoring program (Table 2), followed by MAFFT (0.745), DECIPHER (0.744), and DAFS (0.739). However, the difference among the top scoring programs was not statistically significant (P ≥ 0.02), except in single cases where LocARNA had higher scores than DAFS (N = 10) and MAFFT (N = 20). For larger sets of sequences, the lack of statistical significance could be attributable to the relatively few alignments completed by LocARNA within the 12 h time limit (Table 2). The statistically significant scores were used to rank programs (Fig. 4), revealing that the highest scoring programs all held positions at the top of the hierarchy. MASTR (0.545) was consistently the lowest scoring program placed at the bottom of the hierarchy. Notably, MUSCLE (0.722) performed reasonably well given that it does not take into account secondary structure. SPARSE (0.662), which is an extension of LocARNA, performed considerably worse than LocARNA. Furthermore, SPARSE displayed considerable score variability within families, suggesting that the heuristics it uses to speed up LocARNA are sometimes detrimental to accuracy. Overall, all alignment programs performed better than expected by random chance, except on the reference sets with lowest average similarity (Fig. 3).

TABLE 2.

Programs tested and performance characteristics

FIGURE 4.

Competitive hierarchies of programs. (A) Alignment programs were ranked based on whether their alignment scores were greater than other alignment programs, with higher nodes in the hierarchy indicating better performance (see Materials and Methods). Arrows point toward the program that had significantly (P < 0.02) lower scores for reference sets with a given number of supplemental sequences (N). DAFS, DECIPHER, LocARNA, and MAFFT were consistently the highest ranked programs. Only DECIPHER and LocARNA were never significantly outperformed by another program. (B) An analogous competitive hierarchy of programs that provided a consensus secondary structure prediction. DAFS and LocARNA dominated the hierarchy, while DECIPHER climbed in rank as more supplemental sequences were added.

It is known that alignment programs tend to decrease in accuracy as more sequences are incorporated into the alignment (Sievers et al. 2013). Across the range of supplemental sequences added (10 to 5120), Clustal Omega, MASTR, MUSCLE, and R-Coffee exhibited steady falloffs in score as more sequences were aligned (Fig. 2B). In contrast, DAFS, DECIPHER, MAFFT, and MXSCARNA all displayed relatively constant or increasing scores.

Consistency of secondary structure predictions

The scores for consensus secondary structure predictions told a somewhat different story than alignment scores. Structure scores tended to decrease with greater distance among the aligned sequences. However, some programs displayed a local peak in structure scores around 50% sequence identity (Fig. 2C), suggesting that intermediate levels of identity balance the benefits of covariation information against the loss of alignment quality. The highest structure scores were achieved by DAFS (0.690) and LocARNA (0.680), with neither program being statistically significantly better than the other (Table 2). Interestingly, LocARNA uses RNAalifold for its structure predictions, yet R-Coffee (0.524) had considerably lower structure scores despite also relying on RNAalifold. DECIPHER's predictions, which are purely based on mutual information (MI) rather than free energy, improved substantially (>10% on average) by increasing the number of sequences in the alignment (Fig. 2D). In contrast, MASTR, MXSCARNA, R-Coffee, and SPARSE all exhibited declines in structure scores on larger alignments.

Empirical scalability

Scalability was split into two components: number of sequences being aligned (n) and average length of sequences (l). In terms of elapsed time, Clustal Omega was the fastest aligner tested, followed by DECIPHER. Table 2 shows that both programs scaled nearly linearly with the number of input sequences (n). DECIPHER was the only program to exhibit sublinear scalability with increasing length of sequences, likely due to heuristics it uses to constrain the dynamic programming matrix. Most other programs displayed quadratic or worse limiting behavior in n and/or l. In particular, SPARSE was similar in scalability to LocARNA, implying that its faster speed is not worth its lower accuracy. Clustal Omega (259) and DECIPHER (258) completed the most alignments within the 12 h time limit per alignment, while LocARNA (140) and SPARSE (142) completed the fewest.

DISCUSSION

The RNAconTest benchmark provides an alternative to existing benchmarks for RNA sequence alignments. The approach described here only tests for consistency among known secondary structures since the perfect alignment is unknown. Such a benchmarking approach would have been impractical a decade ago when far fewer empirical RNA structures were available. The advantages of RNAconTest are that it does not rely on manual curation, encompasses a wide variety of RNA families without substantial redundancy, and more realistically reflects many user scenarios for RNA alignment (e.g., thousands of input sequences). Nevertheless, it relies on the assumption that greater structural consistency corresponds to greater underlying alignment accuracy. This assumption was shown to be reasonable for protein sequences (Sievers and Higgins 2019), but would be invalidated if secondary structure diverged substantially during RNA evolution. It is known that RNA structures evolve in regions outside of their conserved core. For example, some tRNAs can have a different number of arms than the traditional four arm cloverleaf structure (Pons et al. 2019). Extensive structural variations also exist in RNase P (Brown 1999), group I introns (Jackson et al. 2002), and telomerase (Podlevsky et al. 2008) sequences. This raises the question of whether it is even feasible to align such divergent structures. The assumption made by most alignment software is that the input sequences are, at a minimum, homologous, if not orthologous (Morrison 2006). If true, it is reasonable to assume that greater structural consistency would result from a better alignment. However, the need for a large number of supplemental sequences likely introduces some false positive input sequences that score above Rfam's gathering cutoff (Nawrocki et al. 2015) for the full alignment but do not belong in the RNA family. 
Such sequences would violate the assumption of homology made by most aligners, although they also likely reflect a realistic user scenario. As alignments grow in size, it is possible that not all input sequences are true homologs and alignment software is still expected to generate a high quality alignment of any homologous sequences. A variety of different measures have been used to gauge the accuracy of secondary structure agreement (Gardner and Giegerich 2004; Xu et al. 2012). Here, the decision was made to focus on a single alignment score representing the agreement among paired and unpaired positions in known structures (see Materials and Methods). The alignment score takes into account the number of correctly/incorrectly aligned positions, as well as positions that were gapped that should have been aligned. The structure score incorporates correctly/incorrectly predicted base-pairings, as well as missing base-pairings. The advantage of these scoring schemes is that they provide a unified measure of accuracy. Notably, this differs from a traditional benchmark in that the maximum score is unknown, that is, scores can only be compared on a relative (not absolute) basis. Notwithstanding this limitation, the ranking of programs on RNAconTest was generally consistent with previous benchmarks showing that LocARNA is currently the best available program for noncoding RNA alignment (Löwes et al. 2017). Taken together, the results of this study suggest that there remains considerable room for improvement in the alignment of noncoding RNA sequences. This was evidenced by the fact that no program substantially outperformed all others across most Rfam families (Fig. 3). Users of alignment programs have many viable options for reasonably accurate RNA alignments. For a small number of input sequences (n < 1000), DAFS, LocARNA, and MAFFT all produced similarly accurate alignments. 
DECIPHER was the most scalable program that maintained high accuracy on alignments of a large number of sequences. DAFS and LocARNA (via RNAalifold) outperformed most other secondary structure prediction tools. Hopefully the availability of a new benchmark will reinvigorate developers’ efforts to improve tools for analyzing noncoding RNAs.

MATERIALS AND METHODS

Construction of the RNAconTest benchmark

The benchmark was constructed from the set of 3016 families in Rfam (v14.1). This set was narrowed to 751 with at least 100 sequences available in their full alignment. Available 3D structures were downloaded from the Protein Data Bank (wwPDB consortium 2019) and used to determine an empirical secondary structure with DSSR (Lu and Olson 2003). Briefly, RNA chains were extracted using PyMOL (v2.4.0a0) to prevent cross-chain bonds within multichain structures and any resulting fragments were mapped to their primary sequence. Masking symbols (“+”) were used to denote chain breaks in the resulting reference secondary structures that were excluded from scoring. The final sets were reduced to the set of unique sequences with structures containing at least five paired bases and <10% imbalance between left and right pairings. Imbalance sometimes resulted from structures where the Rfam family only matched a small subsequence of the full-length chain (e.g., RF00029). Up to 10 different input alignments were tested for each Rfam family. These input alignments were generated by supplementing the reference sequences with 10 to 5120 ({10 × 2^k : 0 ≤ k ≤ 9}) randomly selected sequences from the same Rfam family (full alignment). These two sets of sequences were combined as the input to each of the alignment programs and then the supplemental sequences were discarded after alignment. This resulted in a set of up to 93 aligned reference sequences that were scored for their secondary structure consistency. A total of 29 Rfam families with two or more solved structures were used for calculating alignment scores (Fig. 3), and 51 Rfam families with one or more solved structures were used for determining consensus secondary structure scores (Fig. 5).
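The set construction described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the benchmark's own code: the function names are hypothetical, and the imbalance measure (absolute difference between left and right pairing symbols divided by their total) is an assumed definition, since the text does not give a formula.

```python
import random


def supplemental_sizes(max_k=9):
    """Return the supplemental set sizes 10 * 2**k for k = 0..max_k."""
    return [10 * 2 ** k for k in range(max_k + 1)]


def passes_structure_filter(structure, min_paired=5, max_imbalance=0.10):
    """Apply the two structure filters to a dot-bracket string.

    Assumption: imbalance = |left - right| / (left + right), where left
    counts "(" and "[" and right counts ")" and "]".
    """
    left = sum(structure.count(c) for c in "([")
    right = sum(structure.count(c) for c in ")]")
    paired = left + right
    if paired < min_paired:
        return False
    return abs(left - right) / paired < max_imbalance


def build_input_set(references, family_pool, n_supplemental, seed=0):
    """Combine reference sequences with randomly chosen supplemental ones.

    Hypothetical helper: family_pool stands in for the sequences of an
    Rfam full alignment; sampling is without replacement.
    """
    if n_supplemental > len(family_pool):
        raise ValueError("not enough sequences in the family pool")
    rng = random.Random(seed)
    return list(references) + rng.sample(list(family_pool), n_supplemental)


if __name__ == "__main__":
    print(supplemental_sizes())
    print(passes_structure_filter("((((....))))"))
```

Doubling the number of supplemental sequences at each step gives ten sizes spanning nearly three orders of magnitude, which is what allows the trends with alignment size to be examined.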

FIGURE 5.

Structure scores by Rfam family. Reference sets are ordered (left to right) by decreasing average similarity among the aligned sequences (values along top horizontal axis). Structure scores are ordered from fewest to most supplemental sequences within each family (left to right). Horizontal gray lines denote the scores of 100 random permutations of the reference structures belonging to each family, representing the range of results expected from incorrect predictions. DAFS and LocARNA (via RNAalifold) outperformed the other programs for predicting consensus secondary structures.

Scoring sequence alignments and consensus structure predictions

For each alignment that was generated, alignment scores were calculated by inserting gaps in the known secondary structures to match those in the (m) aligned reference sequences after removing any supplemental sequences (Fig. 1). Excluding terminal gaps and any masked positions, a score (s) was assigned to each pair of structures (x and y) at a site in the alignment based on whether they were consistent or inconsistent:

$$s(x, y) = \begin{cases} 1 & \text{if } x \text{ and } y \text{ are consistent} \\ 0 & \text{if } x \text{ and } y \text{ are inconsistent} \end{cases}$$

For example, if two structures were both unpaired (“.”) then the position was assigned a score of one, whereas if one was paired and the other was unpaired it was assigned a score of zero. Paired bases were required to pair with the same opposing site in the alignment in order to be considered consistent. At each site (i) in the alignment, the column score (cs) was defined as the number of consistent sequence pairs divided by the total number of possible sequence pairs (i.e., for all combinations of reference structures, j and k), where x_{j,i} denotes structure j at site i:

$$cs_i = \frac{\sum_{j=1}^{m-1} \sum_{k=j+1}^{m} s(x_{j,i}, x_{k,i})}{\binom{m}{2}}$$

The alignment score for a set was calculated as the average of column scores across all (l) sites weighted by occupancy (1 − f):

$$\text{alignment score} = \frac{\sum_{i=1}^{l} cs_i \, (1 - f_i)}{\sum_{i=1}^{l} (1 - f_i)}$$

In this way, sites including more nucleotides (i.e., those that have a lower fraction of gaps, f) had a greater influence over the alignment score. Thus, the alignment score can range from zero to one, where higher values correspond to greater secondary structure consistency in the aligned reference sequences.

A structure score was calculated in a similar manner but without requiring multiple reference sequences per Rfam family. Here, a consensus secondary structure prediction was compared to each reference structure individually. A reference's (ref) individual score (ss) was defined as the fraction of paired or unpaired sites in the reference structure that agreed with the consensus (cons) secondary structure:

$$ss_{ref} = \frac{\text{number of paired or unpaired sites in } ref \text{ that agree with } cons}{\text{number of paired or unpaired sites in } ref}$$

Agreement required that a paired position pair with the same opposing position in both the reference structure and predicted consensus structure. The structure score for a set was the average of scores across all (m) reference sequences in cases where there was more than one reference structure available:

$$\text{structure score} = \frac{1}{m} \sum_{ref=1}^{m} ss_{ref}$$

In this manner, structure scores range from zero to one, with higher values denoting greater agreement between the consensus prediction and the known structures of reference sequences. Notably, perfect scores may be impossible to attain because of disagreement among known secondary structures or a suboptimal alignment. Nevertheless, higher relative scores are expected to result from better consensus structure predictions.

Structures were permuted randomly to estimate scores achievable by chance alone (i.e., horizontal gray lines in Figs. 3, 5). Here, the set of paired and unpaired positions were sampled without replacement (i.e., shuffled) in each reference structure before calculating an alignment and structure score. This process was repeated 100 times for each reference set to capture the distribution of scores expected from incorrect alignments or consensus secondary structure predictions. DECIPHER's alignments and consensus secondary structure predictions were used for this randomization procedure because it was relatively quick to generate replicate alignments and structure predictions. Structure scores lower than expected by random chance were often achieved by assigning many (incorrect) paired rather than unpaired (i.e., “.”) positions in a consensus structure. Unpaired positions can match with any other unpaired position, which would be expected to result in higher scores by chance alone than incorrectly assigned base-pairings.
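The two scores described above can be illustrated with a short Python sketch. It is not the R code distributed with the benchmark, and the function names are hypothetical. It assumes that each reference structure is a dot-bracket string already gapped (“-”) to alignment coordinates, that an internal gap is inconsistent with everything, that sequences with a terminal gap or mask (“+”) at a site are left out of that site's pair count, and that unmatched brackets are ignored.

```python
from itertools import combinations

OPEN_TO_CLOSE = {"(": ")", "[": "]"}


def pair_partners(structure):
    """Per column: partner column index, "." if unpaired, or None for
    gaps, masked positions, and unmatched brackets."""
    partners = [None] * len(structure)
    stacks = {")": [], "]": []}
    for i, ch in enumerate(structure):
        if ch == ".":
            partners[i] = "."
        elif ch in OPEN_TO_CLOSE:
            stacks[OPEN_TO_CLOSE[ch]].append(i)
        elif ch in stacks and stacks[ch]:
            j = stacks[ch].pop()
            partners[i], partners[j] = j, i
    return partners


def alignment_score(structures):
    """Occupancy-weighted mean of per-column pairwise consistency."""
    m, length = len(structures), len(structures[0])
    parts = [pair_partners(s) for s in structures]
    spans = []  # first and last non-gap column of each sequence
    for s in structures:
        cols = [i for i, ch in enumerate(s) if ch != "-"]
        spans.append((cols[0], cols[-1]) if cols else (length, -1))
    num = den = 0.0
    for i in range(length):
        # Assumption: terminal gaps and masked sites drop out of the pair count.
        active = [j for j in range(m)
                  if spans[j][0] <= i <= spans[j][1] and structures[j][i] != "+"]
        n_pairs = len(active) * (len(active) - 1) // 2
        if n_pairs == 0:
            continue
        consistent = sum(1 for j, k in combinations(active, 2)
                         if parts[j][i] is not None and parts[j][i] == parts[k][i])
        occupancy = sum(s[i] != "-" for s in structures) / m  # 1 - f
        num += occupancy * consistent / n_pairs
        den += occupancy
    return num / den if den else 0.0


def structure_score(consensus, references):
    """Mean fraction of reference sites that agree with the consensus."""
    cons = pair_partners(consensus)
    scores = []
    for ref in references:
        rp = pair_partners(ref)
        sites = [i for i, p in enumerate(rp) if p is not None]
        if sites:
            scores.append(sum(cons[i] == rp[i] for i in sites) / len(sites))
    return sum(scores) / len(scores) if scores else 0.0
```

Because partners are stored as alignment column indices, two paired bases count as consistent only when they pair with the same opposing column, which mirrors the requirement stated in the text.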

Alignment and structure prediction programs

Ten different alignment programs were compared, seven of which output a consensus secondary structure prediction (Table 2). Default invocations were used in all cases, with the exceptions of: Clustal Omega (“clustalo --seqtype=RNA --threads=1”), MAFFT (“mafft-qinsi --thread 1”), and R-Coffee (“tcoffee -mode rcoffee -ncore=1”). Alignments were limited to 12 h using the timeout command. For this reason, average scores in Table 2 are only listed for sets completed by all programs. To avoid a potential conflict of interest (Boulesteix et al. 2013) as the developer of DECIPHER, an older version of DECIPHER was used that dated before the creation of RNAconTest. An attempt was made to incorporate other published alignment programs, but many could not be installed or were no longer available. Unlike the other programs, DECIPHER is a package for the R programming language (R Core Team 2019). The DECIPHER algorithm for RNA sequences is analogous to that previously described for the alignment of protein sequences (Wright 2015). First, a k-mer–based guide tree is used to progressively construct an initial alignment. Then, secondary structure predictions for each sequence are derived from MI scores using an extension of a previously described method (Freyhult et al. 2005). In particular, the MI is modified by weights for each type of base-pairing and an average product correction (Buslje et al. 2009) is applied to the MI matrix. Secondary structures for each individual sequence are predicted from the consensus structure by removing paired positions that are inconsistent with base-pairing (e.g., A/A). Then a search is conducted between the remaining paired positions for stem–loops that are longer than expected by chance. The result is a predicted structure for each sequence, which is then incorporated during alignment with its own substitution matrix in the same way that base pairs are aligned. 
This process of constructing a guide tree, aligning, and calculating secondary structures is performed for two more iterations (by default) before producing the final alignment and consensus structure.
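To make the covariation step more concrete, the sketch below computes plain mutual information between alignment columns and applies the standard average product correction (column mean times column mean, divided by the overall mean). It is a generic illustration in Python rather than DECIPHER's R implementation: the base-pairing weights mentioned above are omitted because the text does not specify them, gaps are treated as an ordinary symbol, and the function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two equal-length alignment columns."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    return sum((c / n) * math.log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
               for (a, b), c in pab.items())


def mi_matrix_with_apc(alignment):
    """Return the MI matrix after average product correction (APC).

    alignment: list of equal-length aligned sequences (strings).
    """
    columns = ["".join(col) for col in zip(*alignment)]
    size = len(columns)
    mi = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            mi[i][j] = mi[j][i] = mutual_information(columns[i], columns[j])
    corrected = [[0.0] * size for _ in range(size)]
    if size < 2:
        return corrected
    # Mean MI of each column with all other columns, and the overall mean.
    col_mean = [sum(row) / (size - 1) for row in mi]
    overall = sum(col_mean) / size
    if overall == 0:
        return corrected
    for i in range(size):
        for j in range(size):
            if i != j:
                corrected[i][j] = mi[i][j] - col_mean[i] * col_mean[j] / overall
    return corrected


if __name__ == "__main__":
    toy = ["GAAC", "CAAG", "AAAU", "UAAA"]  # columns 1 and 4 covary
    for row in mi_matrix_with_apc(toy):
        print(["%.3f" % v for v in row])
```

The correction subtracts the background signal that a column shares with every other column, so pairs that stand out after correction are more likely to reflect genuine covariation than phylogenetic or sampling effects.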

Ranking programs by score

Aggregate scores for the benchmark as a whole were unobtainable because each program completed a different number of alignments within the 12 h time limit. Instead, a test of statistical significance was performed to decide whether a program outperformed another on the set of reference alignments completed by both programs (for a given number of supplemental sequences). The Wilcoxon signed-rank test (one-sided) with a P-value threshold of 0.02 was used to qualify significance. To display these results, the programs were ranked according to a method for assigning hierarchy to directed networks (Gupte et al. 2011). This method relies on a scoring system wherein arrows pointing down the hierarchy are given a positive score (+1), and arrows pointing against the hierarchy are given a negative score (−1) in the same manner. The optimal ranking of programs was determined using the R package rgenoud for integer genetic optimization (Mebane and Sekhon 2011) by performing 100 replicates starting from different initial conditions. The most compact ranking was preferred in cases where multiple competitive hierarchies had equivalent scores.
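The ranking step can be illustrated with a small Python sketch. It swaps the genetic optimization (rgenoud) used in the study for an exhaustive search, which is only practical for a handful of programs, and it assumes that an arrow scores +1 only when the winner sits on a strictly higher level than the loser and −1 otherwise (including same-level arrows). Compactness is approximated by preferring the smallest deepest level. Program names and arrows are hypothetical placeholders, not benchmark results.

```python
from itertools import product


def hierarchy_score(levels, arrows):
    """Sum of +1 for arrows pointing down the hierarchy and -1 otherwise.

    levels: dict mapping program -> level (0 is the top).
    arrows: list of (winner, loser) pairs from the significance tests.
    Assumption: same-level arrows count as -1.
    """
    return sum(1 if levels[w] < levels[l] else -1 for w, l in arrows)


def best_hierarchy(programs, arrows):
    """Exhaustively search level assignments (small inputs only)."""
    best_key, best_levels = None, None
    for assignment in product(range(len(programs)), repeat=len(programs)):
        levels = dict(zip(programs, assignment))
        # Higher score first, then the most compact (shallowest) ranking.
        key = (hierarchy_score(levels, arrows), -max(assignment))
        if best_key is None or key > best_key:
            best_key, best_levels = key, levels
    return best_levels, best_key[0]


if __name__ == "__main__":
    arrows = [("A", "B"), ("B", "C"), ("A", "C")]
    print(best_hierarchy(["A", "B", "C", "D"], arrows))
```

A search of this kind grows exponentially with the number of programs, which is presumably why a stochastic optimizer with multiple restarts was used for the ten programs in the study.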

Estimating scalability

Scalability in time (t) was approximated using a two-step procedure, first determining scalability with the number of input sequences (n) and then with their average length (l). In step 1, a line was fitted to the log of n versus the log of t for each reference set. The median of the fitted slopes is reported in Table 2 in big-O notation. In step 2, each set's slope was used to remove the effect of n on t. This resulted in an average time per reference set that was used to determine the slope of the log of l versus the log of t. The limiting behavior of scalability as l becomes large is reported separately from n in Table 2. Notably, one processor was used for all programs but additional speedup might be achieved by specifying more than one processor where possible (i.e., Clustal Omega, DECIPHER, MAFFT, and R-Coffee). All tests were performed on a Dell PowerEdge T650 with an Intel Xeon processor (E5-2690 v4 2.6 GHz) and 256 GB of memory running CentOS 7.
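The two-step procedure can be sketched in Python under stated assumptions. The sketch uses ordinary least squares for the log-log fits, divides each set's times by n raised to that set's own fitted slope, and takes the arithmetic mean of the result as the average time per reference set. The data layout and function names are hypothetical, and the timings are synthetic.

```python
import math
from statistics import median


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy / sxx


def estimate_scalability(sets):
    """Estimate exponents for n and l.

    sets is a list of dicts with keys 'length' (average sequence length l),
    'n' (numbers of input sequences) and 't' (matching run times).
    """
    # Step 1: one slope per reference set; report the median.
    slopes = [loglog_slope(s["n"], s["t"]) for s in sets]
    exponent_n = median(slopes)
    # Step 2: remove each set's n-dependence, average, then fit against l.
    per_set_time = []
    for s, k in zip(sets, slopes):
        normalized = [t / n ** k for n, t in zip(s["n"], s["t"])]
        per_set_time.append(sum(normalized) / len(normalized))
    exponent_l = loglog_slope([s["length"] for s in sets], per_set_time)
    return exponent_n, exponent_l


# Synthetic timings following t = 1e-9 * n^2 * l^3.
counts = [10, 20, 40, 80]
data = [
    {"length": l, "n": counts, "t": [1e-9 * n ** 2 * l ** 3 for n in counts]}
    for l in (100, 200, 400)
]
print(estimate_scalability(data))
```

With synthetic data of this form, the recovered exponents should be close to 2 for n and 3 for l, corresponding to O(n^2) and O(l^3) in the notation of Table 2.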

Availability of the RNAconTest benchmark

The RNAconTest (v1.0) benchmark and associated results are available from the DECIPHER website (http://DECIPHER.codes/Downloads.html). The benchmark consists of reference sequences, structures, and supplemental sequences for each Rfam family, as well as output alignments for all programs tested. Two R functions are included for scoring multiple sequence alignments and consensus secondary structure predictions.

40 in total

1. Correction for phylogeny, small number of observations and data redundancy improves the identification of coevolving amino acid pairs using mutual information.

Authors: Cristina Marino Buslje; Javier Santos; Jose Maria Delfino; Morten Nielsen
Journal: Bioinformatics Date: 2009-03-10 Impact factor: 6.937

Review 2. Who watches the watchmen? An appraisal of benchmarks for multiple sequence alignment.

Authors: Stefano Iantorno; Kevin Gori; Nick Goldman; Manuel Gil; Christophe Dessimoz
Journal: Methods Mol Biol Date: 2014

3. Making automated multiple alignments of very large numbers of protein sequences.

Authors: Fabian Sievers; David Dineen; Andreas Wilm; Desmond G Higgins
Journal: Bioinformatics Date: 2013-02-21 Impact factor: 6.937

4. Statistical evaluation of improvement in RNA secondary structure prediction.

Authors: Zhenjiang Xu; Anthony Almudevar; David H Mathews
Journal: Nucleic Acids Res Date: 2011-12-01 Impact factor: 16.971

5. SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.

Authors: Sebastian Will; Christina Otto; Milad Miladi; Mathias Möhl; Rolf Backofen
Journal: Bioinformatics Date: 2015-04-02 Impact factor: 6.937

6. The BRaliBase dent-a tale of benchmark design and interpretation.

Authors: Benedikt Löwes; Cedric Chauve; Yann Ponty; Robert Giegerich
Journal: Brief Bioinform Date: 2017-03-01 Impact factor: 11.622

7. Arm-less mitochondrial tRNAs conserved for over 30 millions of years in spiders.

Authors: Joan Pons; Pere Bover; Leticia Bidegaray-Batista; Miquel A Arnedo
Journal: BMC Genomics Date: 2019-08-23 Impact factor: 3.969

8. A plea for neutral comparison studies in computational sciences.

Authors: Anne-Laure Boulesteix; Sabine Lauer; Manuel J A Eugster
Journal: PLoS One Date: 2013-04-24 Impact factor: 3.240

9. R3D Align web server for global nucleotide to nucleotide alignments of RNA 3D structures.

Authors: Ryan R Rahrig; Anton I Petrov; Neocles B Leontis; Craig L Zirbel
Journal: Nucleic Acids Res Date: 2013-05-28 Impact factor: 16.971

10. Protein Data Bank: the single global archive for 3D macromolecular structure data.

Authors:
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971
