Milad Miladi (1), Martin Raden (1), Sebastian Will (2,3), Rolf Backofen (4,5).
1. Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, Freiburg, Germany.
2. Theoretical Biochemistry Group (TBI), Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, Vienna, Austria.
3. Bioinformatics group (AMIBIO), Laboratoire d'Informatique de l'École Polytechnique (LIX), Institut Polytechnique de Paris (IPP), Batiment Turing, 1 rue d'Estienne d'Orve, Palaiseau, France.
4. Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, Freiburg, Germany. backofen@informatik.uni-freiburg.de.
5. Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schänzlestr. 18, Freiburg, Germany.
Abstract
MOTIVATION: Simultaneous alignment and folding (SA&F) of RNAs is the indispensable gold standard for inferring the structure of non-coding RNAs and for their general analysis. The original algorithm, proposed by Sankoff, solves the theoretical problem exactly with a complexity of [Formula: see text] in the full energy model. Over the last two decades, several variants and improvements of the Sankoff algorithm have been proposed that reduce its extreme complexity by introducing simplified energy models or by imposing restrictions on the predicted alignments.
RESULTS: Here, we introduce a novel variant of Sankoff's algorithm that reconciles the simplifications of PMcomp, namely moving from the full energy model to a simpler base pair-based model, with the accuracy of the loop-based full energy model. Instead of estimating pseudo-energies from unconditional base pair probabilities, our model calculates energies from conditional base pair probabilities. These make it possible to accurately capture structure probabilities, which obey a conditional dependency. This model gives rise to the fast and highly accurate novel algorithm Pankov (Probabilistic Sankoff-like simultaneous alignment and folding of RNAs inspired by Markov chains).
CONCLUSIONS: Pankov benefits from the speed-up of excluding unreliable base pairings without compromising the loop-based free energy model of Sankoff's algorithm. We show that Pankov outperforms its predecessors LocARNA and SPARSE in folding quality and is faster than LocARNA.
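The difference between unconditional and conditional base pair probabilities can be illustrated with a toy example. The sketch below is not Pankov's scoring scheme. It assumes a hypothetical three-structure ensemble with made-up Boltzmann weights, two nested base pairs named `a` (outer) and `b` (inner), and a pseudo-energy of the form -RT ln p, all chosen only for illustration. It compares a structure probability estimated by multiplying unconditional probabilities with one obtained by chaining the conditional probability of the inner pair given the outer pair.

```python
import math

# Assumption: gas constant in kcal/(mol*K) and 37 degrees Celsius in Kelvin.
R = 0.0019872
T = 310.15
RT = R * T

# Hypothetical toy ensemble: each structure is a set of base pairs with an
# invented Boltzmann weight. "a" encloses "b"; "b" never occurs without "a".
ensemble = {
    frozenset(): 1.0,
    frozenset({"a"}): 1.0,
    frozenset({"a", "b"}): 8.0,
}
Z = sum(ensemble.values())  # partition function of the toy ensemble


def prob(*pairs):
    """Probability that all given base pairs occur together in the ensemble."""
    wanted = set(pairs)
    return sum(w for s, w in ensemble.items() if wanted <= s) / Z


def pseudo_energy(p):
    """Illustrative pseudo-energy of a probability: -RT * ln(p)."""
    return -RT * math.log(p)


p_a = prob("a")               # unconditional probability of the outer pair
p_b = prob("b")               # unconditional probability of the inner pair
p_ab = prob("a", "b")         # joint probability of both pairs
p_b_given_a = p_ab / p_a      # conditional probability of inner given outer

independent = p_a * p_b       # treats the two pairs as independent
chained = p_a * p_b_given_a   # Markov-chain-like product along the nesting

print(f"joint probability of a and b:  {p_ab:.3f}")
print(f"unconditional product:         {independent:.3f}")
print(f"conditional chain:             {chained:.3f}")
print(f"pseudo-energy (chain):         {pseudo_energy(chained):.3f} kcal/mol")
```

In this toy ensemble the conditional chain reproduces the joint probability of the two pairs, while the unconditional product underestimates it because the stacked pairs are strongly correlated. Since logarithms turn products into sums, the pseudo-energies of the chained terms add up to the pseudo-energy of the joint probability.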
Keywords:
Alignment and folding of RNAs; RNA secondary structure; Structural bioinformatics