
Evaluation of a sophisticated SCFG design for RNA secondary structure prediction.

Markus E Nebel, Anika Scheid.

Abstract

Predicting the secondary structure of RNA molecules is a fundamental and challenging problem in computational structural biology. Over the past decades, two main approaches have been used to predict RNA secondary structure from a single sequence: one relies on physics-based RNA models, the other on probabilistic ones. Free energy minimization (MFE) is usually considered the most popular and successful method. Moreover, building on the paradigm-shifting work of McCaskill, which proposed computing partition functions (PFs) and base pair probabilities from thermodynamics, several extended partition function algorithms, statistical sampling methods and clustering techniques have been developed in recent years. However, the accuracy of these algorithms is limited by the quality of the underlying physics-based models, which include a vast number of thermodynamic parameters and are still incomplete. The competing probabilistic approach is based on stochastic context-free grammars (SCFGs) or corresponding generalizations, such as conditional log-linear models (CLLMs). These methods abstract from free energies and instead try to capture the structural behavior of the molecules by learning a manageable number of probabilistic parameters from trusted RNA structure databases. In this work, we introduce and evaluate a sophisticated SCFG design that mirrors state-of-the-art physics-based RNA structure prediction procedures by distinguishing between all features of RNA that imply different energy rules. This SCFG serves as the foundation for a statistical sampling algorithm for RNA secondary structures of a single sequence, which represents a probabilistic counterpart to the sampling extension of the PF approach. Furthermore, some new ways to derive meaningful structure predictions from generated sample sets are presented.
They are used to compare the predictive accuracy of our model to that of other probabilistic and energy-based prediction methods. In particular, comparisons to lightweight SCFGs and corresponding CLLMs for RNA structure prediction indicate that more complex SCFG designs might yield higher accuracy but eventually require more comprehensive and pure training sets. Investigations of both the accuracy of predicted foldings and the overall quality of generated sample sets (especially at the level of abstract shapes of the generated structures, an abstraction level that is relevant for biologists) lead to the conclusion that the Boltzmann distribution of the PF sampling approach is more centered than the ensemble distribution induced by the sophisticated SCFG model, which implies greater structural diversity within the samples generated by the latter. In general, neither of the two distinct ensemble distributions is more adequate than the other, and the corresponding results obtained by statistical sampling can be expected to bear fundamental differences, so that the method to be preferred for a particular input sequence strongly depends on the considered RNA type.
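
As an illustration of what SCFG-based statistical sampling of secondary structures for a single sequence can look like, the following is a minimal sketch. It does not reproduce the sophisticated grammar evaluated in this work. It assumes a toy unambiguous grammar S -> x S | x S y S | epsilon, and the rule probabilities, emission probabilities, minimum hairpin loop size and all function names are hypothetical placeholders rather than trained parameters. The sketch first fills an inside table and then draws structures by a stochastic traceback, so that each structure is sampled with probability proportional to its derivation probability under the toy model.

```python
import random

# Hypothetical toy parameters (NOT taken from the paper). Rule probabilities for
#   S -> x S      (x unpaired)
#   S -> x S y S  (x paired with y)
#   S -> epsilon
P_UNPAIRED, P_PAIR, P_END = 0.5, 0.3, 0.2
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases enclosed by a pair


def emit_single(a):
    """Assumed uniform emission probability for an unpaired base."""
    return 0.25


def emit_pair(a, b):
    """Assumed uniform emission probability over canonical pairs, zero otherwise."""
    return 1.0 / 6 if (a, b) in CANONICAL else 0.0


def inside_table(seq):
    """inside[i][j] = total probability of all derivations of seq[i:j] from S."""
    n = len(seq)
    inside = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        inside[i][i] = P_END  # empty subsequence: S -> epsilon
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # Case 1: base i is unpaired.
            total = P_UNPAIRED * emit_single(seq[i]) * inside[i + 1][j]
            # Case 2: base i pairs with base k; the rest splits into inner and outer parts.
            for k in range(i + MIN_LOOP + 1, j):
                total += (P_PAIR * emit_pair(seq[i], seq[k])
                          * inside[i + 1][k] * inside[k + 1][j])
            inside[i][j] = total
    return inside


def sample_structure(seq, inside, rng):
    """Stochastic traceback: draw one dot-bracket structure from the toy ensemble."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq))]
    while stack:
        i, j = stack.pop()
        if i == j:
            continue
        # Weight of each possible fate of base i (None means unpaired).
        choices = [(None, P_UNPAIRED * emit_single(seq[i]) * inside[i + 1][j])]
        for k in range(i + MIN_LOOP + 1, j):
            w = (P_PAIR * emit_pair(seq[i], seq[k])
                 * inside[i + 1][k] * inside[k + 1][j])
            if w > 0:
                choices.append((k, w))
        partners, weights = zip(*choices)
        k = rng.choices(partners, weights=weights)[0]
        if k is None:
            stack.append((i + 1, j))
        else:
            structure[i], structure[k] = "(", ")"
            stack.append((i + 1, k))
            stack.append((k + 1, j))
    return "".join(structure)


if __name__ == "__main__":
    seq = "GGGAAAUCCC"  # arbitrary example sequence
    table = inside_table(seq)
    rng = random.Random(0)
    for _ in range(5):
        print(sample_structure(seq, table, rng))
```

A sample set generated this way could then be summarized, for example by picking the most frequently sampled structure. A grammar that distinguishes more structural features would replace the three rules above with many more, each with its own trained probability.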

Year:  2011        PMID: 22135038     DOI: 10.1007/s12064-011-0139-7

Source DB:  PubMed          Journal:  Theory Biosci        ISSN: 1431-7613            Impact factor:   1.919


