Benedikt Löwes, Cedric Chauve, Yann Ponty, Robert Giegerich.
Abstract
BRaliBase is a widely used benchmark for assessing the accuracy of RNA secondary structure alignment methods. In most case studies based on the BRaliBase benchmark, one can observe a puzzling drop in accuracy in the 40-60% sequence identity range, the so-called 'BRaliBase Dent'. In this article, we show this dent is owing to a bias in the composition of the BRaliBase benchmark, namely the inclusion of a disproportionate number of transfer RNAs, which exhibit a conserved secondary structure. Our analysis, aside from its interest regarding the specific case of the BRaliBase benchmark, also raises important questions regarding the design and use of benchmarks in computational biology.
Keywords: RNA family database; RNA structural alignment; benchmark
Year: 2017 PMID: 26984616 PMCID: PMC5444242 DOI: 10.1093/bib/bbw022
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1. (A) Original BRaliBase evaluation of 2005 [6]. Dashed lines show pure sequence aligners, solid lines show structural aligners and dotted-solid lines show structural aligners with varying parameters. (B) Extended evaluation for Foldalign and PMcomp that shows all results for the 118 pairwise alignments for both tools using the original data. SPS (Sum of Pairs Scores) is a measure of alignment accuracy compared with a reference data set introduced in [2]. A colour version of this figure is available at BIB online: https://academic.oup.com/bib.
Figure 2. (A) Re-evaluation of the 2006 BRaliBase data from [7] with currently available structural aligners. (B) The same re-evaluation with only three tools and box plots showing the detailed distribution of SPS. Here, we have chosen to add LocARNA as the best performing tool and substituted PMcomp by Lara, because some of PMcomp’s alignment computations resulted in errors and Lara represents an interesting alternative not fitting into the previously mentioned categories. (See Supplementary Data Table S1 for details.) A colour version of this figure is available at BIB online: https://academic.oup.com/bib.
Figure 3. The two plots show 9 of 36 RNA families with at least 180 alignments. (A) Familywise performance of LocARNA. The family names in the legend are further accompanied by the total number of alignments for each family in brackets. (B) Each family’s share of LocARNA’s SPS (after local regression) per sequence identity. The remaining families with <180 alignments are grouped into ‘other’. A colour version of this figure is available at BIB online: https://academic.oup.com/bib.
Figure 4. Separate evaluation of tRNA alignments, non-tRNA alignments and the complete data set. Comparison (A) by BRaliBase SPS and (B) as length-normalized score differences between the optimal version and the reference using PMcomp’s scoring scheme (PMcomp was used in place of LocARNA because its scoring scheme is easy to reverse engineer and implement). Additionally, (A) shows curves for a sampling approach in which no family’s share exceeds 20% per sequence identity and for the non-5S_rRNA alignments. A colour version of this figure is available at BIB online: https://academic.oup.com/bib.
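As a rough illustration of the SPS values plotted in the figures, the sketch below computes a sum-of-pairs score for a single pairwise alignment: the fraction of residue pairs aligned in the reference that the test alignment also aligns. It follows the commonly used sum-of-pairs definition; the exact variant introduced in [2] and used by the BRaliBase evaluation may differ in detail. The function names and the toy sequences are hypothetical and not taken from the article.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue-index pairs placed in the same column.

    row_a and row_b are the two gapped rows of one pairwise alignment;
    i and j count residues (not columns) in each ungapped sequence.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        a_is_residue = a != gap
        b_is_residue = b != gap
        if a_is_residue and b_is_residue:
            pairs.add((i, j))
        # Advance each residue counter only when that row has a residue here.
        i += a_is_residue
        j += b_is_residue
    return pairs


def sum_of_pairs_score(test, reference, gap="-"):
    """Fraction of reference residue pairs that the test alignment reproduces.

    test and reference are (row_a, row_b) tuples over the same two sequences.
    """
    ref_pairs = aligned_pairs(*reference, gap=gap)
    if not ref_pairs:
        raise ValueError("reference alignment contains no aligned residue pairs")
    test_pairs = aligned_pairs(*test, gap=gap)
    return len(ref_pairs & test_pairs) / len(ref_pairs)


# Toy example (hypothetical sequences): the reference has 4 aligned pairs,
# and the ungapped test alignment reproduces 3 of them.
reference = ("ACGU-A", "AC-UGA")
test = ("ACGUA", "ACUGA")
score = sum_of_pairs_score(test, reference)
```

Under this definition a test alignment identical to the reference scores 1.0, and the score falls as fewer reference pairs are recovered. Averaging such scores over the alignments in a sequence identity bin gives a curve of the kind shown in the figures.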