CENTROIDFOLD: a web server for RNA secondary structure prediction.

Kengo Sato, Michiaki Hamada, Kiyoshi Asai, Toutai Mituyama.

Abstract

The CENTROIDFOLD web server (http://www.ncrna.org/centroidfold/) is a web application for RNA secondary structure prediction powered by one of the most accurate prediction engines. The server accepts two kinds of sequence data: a single RNA sequence and a multiple alignment of RNA sequences. It responds with a prediction result shown in a popular base-pair notation and as a graph representation. A PDF version of the graph representation is also available. For a multiple alignment, the server predicts a common secondary structure. Usage of the server is simple: paste a single RNA sequence (FASTA or plain sequence text) or a multiple alignment (CLUSTAL-W format) into the text area, then click the 'execute CentroidFold' button. The server quickly responds with a prediction result. The major advantage of this server is that it employs our original CentroidFold software as its prediction engine, which achieves the best accuracy in our benchmark results. Our web server is freely available with no login requirement.

Year: 2009 PMID: 19435882 PMCID: PMC2703931 DOI: 10.1093/nar/gkp367

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Recent research has revealed that functional noncoding RNAs (ncRNAs) play essential roles in cells. It is well known that the functions of ncRNAs are closely related to their secondary structures rather than their primary sequences (e.g. hairpin structures for miRNA precursors and cloverleaf structures for tRNAs). The importance of accurate secondary structure prediction has therefore increased. The most successful approach for predicting RNA secondary structures is based on free energy minimization, as in Mfold (1) and RNAfold in the Vienna RNA package (2). An alternative approach is based on probabilistic frameworks, including stochastic context-free grammars (SCFGs), which can model RNA secondary structures without pseudoknots (3). These approaches employ a dynamic programming technique called the Cocke–Younger–Kasami (CYK) algorithm to calculate the minimum free energy (MFE) or maximum likelihood (ML) structure (4). However, several studies have pointed out a drawback of the MFE/ML estimators: the MFE/ML structure generally has an extremely low probability and is not even optimal with respect to the number of correctly predicted base pairs (5–8). Hence, alternative estimators that consider the ensemble of all possible solutions, instead of only the solution with the highest probability, have been developed. These include the centroid estimator employed by Sfold (6,7) and the maximum expected accuracy (MEA) estimator employed by CONTRAfold (9). These estimators maximize the expectation of an objective function related to the accuracy of the prediction. We have recently proposed a generalized centroid estimator, called the γ-centroid estimator, which can be more appropriate for the accuracy measures of RNA secondary structure prediction than the MEA estimator, and have furthermore shown that the γ-centroid estimator is theoretically and experimentally superior to the MEA estimator (10).
CentroidFold is an implementation of the γ-centroid estimator for predicting RNA secondary structures, and is distributed as free software from http://www.ncrna.org/software/centroidfold/. In this article, we introduce a web application of CentroidFold with a very simple interface. It takes an individual RNA sequence or a multiple alignment of RNA sequences, and returns the predicted (common) secondary structure with a graphical representation. Our web application is available at http://www.ncrna.org/centroidfold/ for unrestricted use.
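
The drawback of ML estimation noted above can be made concrete with a toy example. The following is a minimal sketch in Python and is not part of CentroidFold: the three-structure ensemble, its probabilities, and the helper names (`base_pair_probs`, `expected_gain`) are invented for illustration. The gain is taken to be the weighted count of true-positive and true-negative base pairs, with weight γ = 1.

```python
from typing import Dict, FrozenSet, Tuple

Pair = Tuple[int, int]
Structure = FrozenSet[Pair]


def base_pair_probs(ensemble: Dict[Structure, float]) -> Dict[Pair, float]:
    """Marginal probability of each base pair over the ensemble."""
    probs: Dict[Pair, float] = {}
    for structure, prob in ensemble.items():
        for pair in structure:
            probs[pair] = probs.get(pair, 0.0) + prob
    return probs


def expected_gain(candidate: Structure, ensemble: Dict[Structure, float],
                  n: int, gamma: float = 1.0) -> float:
    """Expected value of gamma * (true positives) + (true negatives)."""
    n_positions = n * (n - 1) // 2  # number of index pairs i < j
    total = 0.0
    for truth, prob in ensemble.items():
        tp = len(candidate & truth)                # paired in both
        tn = n_positions - len(candidate | truth)  # unpaired in both
        total += prob * (gamma * tp + tn)
    return total


# Hypothetical ensemble for a sequence of length 10 (0-based positions).
n = 10
ensemble: Dict[Structure, float] = {
    frozenset({(0, 5), (1, 4)}): 0.4,  # the single most probable structure
    frozenset({(3, 9), (4, 8)}): 0.3,
    frozenset({(2, 9), (4, 8)}): 0.3,
}

# ML estimate: the structure with the highest probability.
ml_structure = max(ensemble, key=ensemble.get)

# Centroid estimate for gamma = 1: keep pairs with probability above 0.5.
centroid = frozenset(
    pair for pair, p in base_pair_probs(ensemble).items() if p > 0.5
)

print("ML structure:", sorted(ml_structure),
      "expected gain:", expected_gain(ml_structure, ensemble, n))
print("Centroid:", sorted(centroid),
      "expected gain:", expected_gain(centroid, ensemble, n))
```

In this toy ensemble the most probable structure has probability 0.4, yet the single pair (4, 8) occurs with probability 0.6 and gives a higher expected gain, even though it is not one of the three listed structures.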

METHODS

Algorithm

CentroidFold predicts RNA secondary structures with the γ-centroid estimator (10), which is a kind of posterior decoding method based on statistical decision theory. For an RNA sequence $x$, we define a gain function between a true structure $y$ and a candidate structure $\hat{y}$ by

$$G_\gamma(y, \hat{y}) = \sum_{1 \le i < j \le |x|} \left[ \gamma \, I(y_{ij} = 1) \, I(\hat{y}_{ij} = 1) + I(y_{ij} = 0) \, I(\hat{y}_{ij} = 0) \right] \qquad (1)$$

where $\gamma$ is a weight for base pairs, $y_{ij}$ is 1 if the $i$-th and the $j$-th nucleotides form a base pair in $y$, or 0 otherwise, and $I(\text{condition})$ is an indicator function which takes a value of 1 or 0 depending on whether the condition is true or false. The gain function (1) is equal to the weighted sum of the number of true positives and the number of true negatives of base pairs. The expectation of the gain function (1) with respect to an ensemble of all possible secondary structures under a given posterior distribution $p(y|x)$ is

$$\mathbb{E}_{y|x}\left[G_\gamma(y, \hat{y})\right] = \sum_{y \in \mathcal{Y}(x)} p(y|x) \, G_\gamma(y, \hat{y}) = \sum_{1 \le i < j \le |x|} \left[ (\gamma + 1) p_{ij} - 1 \right] I(\hat{y}_{ij} = 1) + C \qquad (2)$$

where $\mathcal{Y}(x)$ is the set of all possible secondary structures for $x$, $|x|$ is the length of $x$ and $C$ is a constant independent of $\hat{y}$. The base-pairing probability $p_{ij} = \mathbb{E}_{y|x}[y_{ij}]$ is the probability that the $i$-th and $j$-th nucleotides form a base pair in $y$, which can be interpreted as a confidence measure for predicted base pairs. The posterior distribution $p(y|x)$ for calculating base-pairing probabilities can be chosen from various implementations, including the McCaskill model (11) and the CONTRAfold model (9). We employ the CONTRAfold model as the default setting of CentroidFold in accordance with our benchmark (10). Then, we can find the $\hat{y}$ which maximizes the expected gain (2) using the recursive equations

$$M_{i,j} = \max \begin{cases} M_{i+1,j} \\ M_{i,j-1} \\ M_{i+1,j-1} + (\gamma + 1) p_{ij} - 1 \\ \max_{i < k < j} \left[ M_{i,k} + M_{k+1,j} \right] \end{cases} \qquad (3)$$

and tracing back from $M_{1,|x|}$.

We can control the trade-off between specificity and sensitivity through $\gamma$. If $\gamma = 1$, our estimator is equivalent to the centroid estimator (7,8). The γ-centroid estimator is similar to the MEA estimator (9). The only difference between them lies in the gain functions: the gain function of the γ-centroid estimator is more suitable for the evaluation measures of RNA secondary structure prediction than that of the MEA estimator. See (10) for more details.
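
The decoding step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CentroidFold implementation: it assumes the base-pairing probabilities are already available as a dictionary keyed by 0-based index pairs (in practice they would come from a model such as McCaskill or CONTRAfold), uses a Nussinov-style dynamic program in which each predicted pair (i, j) contributes (γ + 1)p_ij − 1, and the function name `gamma_centroid` is hypothetical.

```python
from typing import Dict, List, Tuple


def gamma_centroid(n: int, bpp: Dict[Tuple[int, int], float],
                   gamma: float = 1.0) -> str:
    """Return a dot-bracket structure maximizing the summed pair scores.

    n     -- sequence length
    bpp   -- base-pairing probabilities keyed by 0-based (i, j), i < j
    gamma -- weight for base pairs
    """
    def score(i: int, j: int) -> float:
        # Contribution of predicting the pair (i, j) to the expected gain.
        return (gamma + 1.0) * bpp.get((i, j), 0.0) - 1.0

    # M[i][j]: best score on positions i..j; entries with i >= j stay 0.
    M: List[List[float]] = [[0.0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(M[i + 1][j], M[i][j - 1],
                       M[i + 1][j - 1] + score(i, j))
            for k in range(i + 1, j):  # bifurcation into two sub-intervals
                best = max(best, M[i][k] + M[k + 1][j])
            M[i][j] = best

    # Traceback from the full interval, collecting predicted pairs.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if M[i][j] == M[i + 1][j]:
            stack.append((i + 1, j))
        elif M[i][j] == M[i][j - 1]:
            stack.append((i, j - 1))
        elif M[i][j] == M[i + 1][j - 1] + score(i, j):
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if M[i][j] == M[i][k] + M[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return "".join(structure)


# Hypothetical probabilities for a sequence of length 8.
example_bpp = {(0, 7): 0.9, (1, 6): 0.8, (2, 5): 0.3}
print(gamma_centroid(8, example_bpp, gamma=1.0))  # ((....))
print(gamma_centroid(8, example_bpp, gamma=4.0))  # (((..)))
```

A pair can only raise the score when p_ij > 1/(γ + 1), so γ = 1 keeps only pairs with probability above 0.5, while γ = 4 also admits the pair with probability 0.3. This is the specificity and sensitivity trade-off described above.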

Web server

The CentroidFold web server can be accessed at http://www.ncrna.org/centroidfold/ and provides a very simple input form. The server accepts two sequence formats: the FASTA format for predicting the secondary structure of a single RNA sequence, and the CLUSTAL-W format for predicting the common secondary structure of a multiple alignment of RNA sequences. The format of the entered sequences is detected automatically, and the appropriate prediction method is executed after the ‘execute CentroidFold’ button is clicked (Figure 1). The prediction result is shown in a standard base-pair notation (Figure 2A) and as a graphical representation (Figure 2B). Each predicted base pair is colored on a heat color gradient from blue to red, corresponding to base-pairing probabilities from 0 to 1. A PDF version of the graphical representation is available from a link given below Figure 2.
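
The input handling and coloring described above could look roughly like the following Python sketch. It is only an illustration: the server's actual detection rules and color map are not specified in this article, so the function names (`detect_format`, `probability_to_rgb`), the header-based detection heuristic, and the linear blue-to-red interpolation are all assumptions.

```python
from typing import Tuple


def detect_format(text: str) -> str:
    """Guess the input format from its first non-empty line.

    Returns "clustal" for a CLUSTAL-W alignment, "fasta" for a FASTA
    record, and "plain" for a bare sequence (assumed heuristic).
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty input")
    first = lines[0]
    if first.upper().startswith("CLUSTAL"):
        return "clustal"  # multiple alignment: predict a common structure
    if first.startswith(">"):
        return "fasta"    # single sequence with a header line
    return "plain"        # single sequence without a header


def probability_to_rgb(p: float) -> Tuple[int, int, int]:
    """Map a base-pairing probability in [0, 1] to a blue-to-red color.

    Assumes a simple linear blend; 0 maps to pure blue, 1 to pure red.
    """
    p = min(1.0, max(0.0, p))  # clamp out-of-range values
    return (round(255 * p), 0, round(255 * (1.0 - p)))


print(detect_format(">example\nGGGAAACCC\n"))  # fasta
print(probability_to_rgb(1.0))                 # (255, 0, 0)
```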

Figure 1.

The CentroidFold web server.

Figure 2.

The result of predicting a common secondary structure for an example multiple alignment of Qrr RNAs. (A) A standard base-pair notation. (B) A graphical representation.

DISCUSSION AND CONCLUSIONS

The CentroidFold web server allows biologists to predict RNA (common) secondary structures with a prediction engine that achieves the best accuracy in our benchmark results. For example, RNAfold, which is based on MFE, fails to predict the secondary structure of a typical tRNA sequence (Rfam id: M19341.1/98–169), whereas CentroidFold predicts its secondary structure almost correctly, as shown in Figure 3. This result suggests that some ncRNA sequences do not necessarily form MFE secondary structures, and that posterior decoding methods, including the γ-centroid estimator, can provide more reliable predictions.

Figure 3.

Comparison of secondary structures of a tRNA sequence (Rfam id: M19341.1/98–169) between RNAfold (left), CentroidFold (center) and the reference structure (right).

The most recent CentroidFold software implements a stochastic suboptimal folding algorithm like that of Sfold (7), with a stochastic traceback algorithm for the CONTRAfold model instead of the McCaskill model. We are planning to provide a web interface to it for easy use.

FUNDING

This work was supported in part by a grant from ‘Functional RNA Project’ funded by the New Energy and Industrial Technology Development Organization (NEDO) of Japan, and was also supported in part by Grant-in-Aid for Scientific Research on Priority Area ‘Comparative Genomics’ from the Ministry of Education, Culture, Sports, Science and Technology of Japan. Funding for open access charge: Internal fund of Computational Biology Research Center. Conflict of interest statement. None declared.

10 in total

1. A statistical sampling algorithm for RNA secondary structure prediction.

Authors: Ye Ding; Charles E Lawrence
Journal: Nucleic Acids Res Date: 2003-12-15 Impact factor: 16.971

2. Pfold: RNA secondary structure prediction using stochastic context-free grammars.

Authors: Bjarne Knudsen; Jotun Hein
Journal: Nucleic Acids Res Date: 2003-07-01 Impact factor: 16.971

3. Vienna RNA secondary structure server.

Authors: Ivo L Hofacker
Journal: Nucleic Acids Res Date: 2003-07-01 Impact factor: 16.971

4. RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble.

Authors: Ye Ding; Chi Yu Chan; Charles E Lawrence
Journal: RNA Date: 2005-08 Impact factor: 4.942

5. CONTRAfold: RNA secondary structure prediction without physics-based models.

Authors: Chuong B Do; Daniel A Woods; Serafim Batzoglou
Journal: Bioinformatics Date: 2006-07-15 Impact factor: 6.937

6. The equilibrium partition function and base pair binding probabilities for RNA secondary structure.

Authors: J S McCaskill
Journal: Biopolymers Date: 1990 May-Jun Impact factor: 2.505

7. Centroid estimation in discrete high-dimensional spaces with applications in biology.

Authors: Luis E Carvalho; Charles E Lawrence
Journal: Proc Natl Acad Sci U S A Date: 2008-02-27 Impact factor: 11.205

8. Prediction of RNA secondary structure using generalized centroid estimators.

Authors: Michiaki Hamada; Hisanori Kiryu; Kengo Sato; Toutai Mituyama; Kiyoshi Asai
Journal: Bioinformatics Date: 2008-12-18 Impact factor: 6.937

9. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information.

Authors: M Zuker; P Stiegler
Journal: Nucleic Acids Res Date: 1981-01-10 Impact factor: 16.971

10. Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction.

Authors: Robin D Dowell; Sean R Eddy
Journal: BMC Bioinformatics Date: 2004-06-04 Impact factor: 3.169
