
RNAvista: a webserver to assess RNA secondary structures with non-canonical base pairs.

Maciej Antczak, Marcin Zablocki, Tomasz Zok, Agnieszka Rybarczyk, Jacek Blazewicz, Marta Szachniuk.

Abstract

Motivation: In the study of 3D RNA structure, information about non-canonical interactions between nucleobases is increasingly important. Specialized databases support investigation of this issue based on experimental data, and several programs can annotate non-canonical base pairs in the RNA 3D structure. However, predicting the extended RNA secondary structure which describes both canonical and non-canonical interactions remains difficult.
Results: Here, we present RNAvista, which predicts an extended RNA secondary structure from a sequence or from a list enumerating canonical base pairs only. RNAvista is implemented as a publicly available webserver with a user-friendly interface. It runs on all major web browsers.
Availability and implementation: http://rnavista.cs.put.poznan.pl.

Year:  2019        PMID: 29985979      PMCID: PMC6298044          DOI: 10.1093/bioinformatics/bty609

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Full understanding of RNA-mediated biology requires knowledge of RNA structure, which is divided into three levels of organization: primary (nucleotide sequence), secondary and tertiary. Unlike proteins, RNA can act in an unstructured form (e.g. codons must be unpaired from mRNA self-structure in order to be translated by pairing with tRNAs) and some conserved sequence motifs can be detected from the RNA sequence itself. Nevertheless, most functional motifs (involved in protein binding and regulation of cellular processes) have a structural context and are related to secondary structure patterns. Their structural similarity also arises in the absence of significant sequence identity (Pietrosanto ). Structural motifs can even encode a stronger functional signal than sequence ones. In general, knowing RNA secondary structure reveals essential constraints governing the molecule’s physical properties and function (Pietrosanto ; Rybarczyk ). At a fundamental level, RNA secondary structure consists of base-paired and unpaired nucleotides from which arise such structural elements as helical stems and single-stranded regions (hairpins, bulges, internal loops and n-way junctions). Base pairs are either canonical (Watson-Crick or wobble base pairs) or non-canonical (formed by edge-to-edge hydrogen bonding interactions between the bases) (Leontis and Westhof, 2001). Non-canonical ones play an important role, e.g. in base-specific interactions with proteins or ligands. Taking them into account is also essential to make RNA 3D structure modeling more reliable and accurate (Halder and Bhattacharyya, 2013). Experimental determination of RNA secondary structure is a laborious and expensive task (Weeks, 2010). Thus, its computational assessment via 3D structure-based annotation or sequence-based prediction is an attractive alternative.
Of more than 50 methods developed for the latter purpose, only seven can predict extended RNA secondary structure containing both canonical and non-canonical base pairs (Dallaire and Major, 2016; Honer zu Siederdissen ; Parisien and Major, 2008; Pietrosanto ; Rybarczyk ; Sloma and Mathews, 2017; Weinreb ). The remaining ones handle canonical base pairs only. For non-canonical pairs, the annotation problem appears to be better explored than prediction. Following this observation, we introduced our own methodology to predict extended RNA secondary structure (Rybarczyk ). It proceeds through RNA 3D structure prediction from sequence, followed by extended secondary structure annotation. Initially, we proposed to apply RNAComposer (Antczak ; Popenda ) in the first step, and RNApdbee (Antczak ; Zok ) in the following step. They had to be executed one by one, with all parameters set by the user separately in each step. Here, we present the RNAvista webserver, which facilitates the use of our approach by integrating specialized versions of the RNAComposer and RNApdbee engines in a fully automated computational pipeline.

2 Materials and methods

The RNAvista webserver assesses extended secondary structure of RNA from a given sequence or canonical secondary structure. It was built based on the following four-step procedure (Fig. 1): (i) prediction of canonical interactions, (ii) prediction of the tertiary structure, (iii) annotation of extended secondary structure and (iv) output data encoding and visualization.
Fig. 1.

Consecutive steps in the RNAvista workflow

The first step is optional. It runs when the user inputs an RNA sequence only and is skipped if canonical base pairs have been defined in dot-bracket notation (DBN). Six algorithms, CentroidFold (Sato ), ContextFold (Zakov ), CONTRAfold (Do ), IPknot (Sato ), RNAfold (Hofacker ) and RNAstructure (Reuter and Mathews, 2010), are incorporated in the RNAvista webserver to perform this step, and the user can decide which one to apply. In the second step, the RNAComposer method (Popenda ) is used to predict the RNA 3D model from the canonical secondary structure. The model is built from 3D structure elements derived from experimentally determined RNA structures that often include non-canonical and pseudoknot interactions. Thus, at this step, the structure is enriched with non-canonical base pair data. Next, the extended secondary structure is derived from the predicted RNA 3D model, and non-canonical interactions are classified according to both the Saenger (Saenger, 1984) and Leontis-Westhof (Leontis and Westhof, 2001) nomenclatures. These tasks are performed by the RNApdbee method (Antczak ). Optionally, the user can choose which of RNApdbee’s built-in annotation procedures, RNAView (Yang ), MC-Annotate (Gendron ) or 3DNA/DSSR (Lu and Olson, 2003), should be applied. Finally, the output structure is saved in three text formats, DBN (dot-bracket notation), BPSEQ and CT (connect), and visualized. Non-canonical base pairs are graphically annotated using Leontis-Westhof pictograms.
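To make the text formats mentioned above concrete, the following is a minimal Python sketch of how base pairs can be read from a dot-bracket string and written as BPSEQ lines (index, base, partner index, with 0 for unpaired). It is not RNAvista or RNApdbee code: the function names dbn_to_pairs and to_bpseq are hypothetical, only four bracket families are handled (extended DBN may use further symbols), and each nucleotide is assumed to have at most one partner, which does not cover residues involved in several non-canonical interactions.

```python
def dbn_to_pairs(dbn):
    """Return sorted 1-based (i, j) base pairs encoded in a dot-bracket string."""
    # Assumption: only these four bracket families are used; higher-order
    # brackets typically mark pseudoknots.
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}
    pairs = []
    for pos, ch in enumerate(dbn, start=1):
        if ch in openers:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched '{ch}' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unsupported symbol '{ch}' at position {pos}")
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched '{open_}' at position {stack[-1]}")
    return sorted(pairs)


def to_bpseq(seq, dbn):
    """Render a sequence and its dot-bracket string as BPSEQ text."""
    if len(seq) != len(dbn):
        raise ValueError("sequence and structure must have the same length")
    partner = {}
    for i, j in dbn_to_pairs(dbn):
        partner[i] = j
        partner[j] = i
    # One line per nucleotide: index, base, partner index (0 if unpaired).
    return "\n".join(
        f"{i} {base} {partner.get(i, 0)}" for i, base in enumerate(seq, start=1)
    )


if __name__ == "__main__":
    print(to_bpseq("GGAACC", "((..))"))
```

Keeping one stack per bracket family lets crossing (pseudoknotted) pairs such as `([.)]` be decoded without ambiguity.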

2.1 Input and output description

In the simplest usage scenario, the user inputs an RNA sequence (up to 500 nts long) in FASTA format and clicks the Run button. If a possible canonical secondary structure is known, it can be supplied with the input in extended DBN. Input data can be typed directly into the edit box or loaded from a local file. Three examples are available to facilitate familiarization with the system. RNAvista allows the user to set options for the intermediate processing steps. An option panel, displayed on clicking Show advanced options, lets the user select: (i) one of six algorithms for canonical base pair prediction (default: CentroidFold), (ii) one of three methods that derive extended secondary structure from the 3D model (default: 3DNA/DSSR with the Analyze helices option; Lu and Olson, 2003), (iii) one of two algorithms for resolving 2D structure topology (default: Hybrid Algorithm; Antczak ). Output data include: (i) the predicted secondary structure in graphical view (with non-canonical base pairs annotated) and in DBN, BPSEQ and CT formats, (ii) the list of non-canonical base pairs with their classification, (iii) a view of the corresponding 3D structure and (iv) log files regarding intermediate processing steps. The data are presented on the result page and can be downloaded to a local drive.
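The input constraints described above can be checked locally before submitting a job. The sketch below is only an illustration under stated assumptions: the function name read_single_fasta is hypothetical, the accepted alphabet (A, C, G, U) is an assumption rather than a documented rule of the webserver, and only the 500-nt limit comes from the text.

```python
MAX_LEN = 500            # length limit stated in the text
ALLOWED = set("ACGU")    # assumption: plain RNA alphabet, no ambiguity codes


def read_single_fasta(text, max_len=MAX_LEN):
    """Parse one FASTA record and return (name, sequence) after basic checks."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("expected exactly one FASTA record")
    name = lines[0][1:].strip()
    # Sequence lines may be wrapped; join them into one string.
    seq = "".join(lines[1:]).upper()
    if not seq:
        raise ValueError("empty sequence")
    bad = sorted(set(seq) - ALLOWED)
    if bad:
        raise ValueError(f"unsupported residue symbols: {''.join(bad)}")
    if len(seq) > max_len:
        raise ValueError(f"sequence has {len(seq)} nts, limit is {max_len}")
    return name, seq


if __name__ == "__main__":
    print(read_single_fasta(">demo\nGGGA\nAACCC\n"))
```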

3 Results

In our previous work (Rybarczyk ), we conducted large-scale tests to verify the accuracy of the results generated by the pipeline integrating RNAComposer and RNApdbee (now implemented in the RNAvista webserver). Using data from RNA STRAND (Andronescu ), we executed one prediction experiment based on RNA sequence only and a second one starting from canonical secondary structure. The input dataset was divided into subsets by size. The results showed that, depending on the input sequence length, the percentage of correctly predicted non-canonical base pairs ranged between 30.64 and 57.57% (for sequence-based prediction), and between 49.91 and 70.51% (for secondary structure-based prediction) in comparison to the reference structure. These results also hold for the RNAvista webserver. Here, we additionally decided to estimate the accuracy of predicting and annotating recurrent RNA motifs known to be defined by non-canonical interactions only. We ran RNAvista to predict the secondary structures of seven featured motifs from the RNA 3D Motif Atlas (Petrov ): K-turn, T-loop, C-loop, Sarcin, GNRA, Double sheared and Triple sheared. Twelve PDB-deposited RNA 3D structures carrying these modules were selected for the experiment. We executed RNAvista in both modes with the default settings (3DNA/DSSR, Hybrid Algorithm) to predict the whole structures of the 12 selected RNAs. Next, for every recurrent motif extracted from the predicted RNA model, we compared its extended secondary structure generated by RNAvista to the reference one, and we calculated positive predictive value (PPV), true positive rate (sensitivity, TPR), and Matthews correlation coefficient (MCC). PPV, TPR and MCC values were computed for the analyzed motifs exclusively, thus considering non-canonical interactions only. In the sequence-based mode (Table 1), RNAvista was tested with every incorporated method dedicated to canonical secondary structure prediction.
One can see that the first step of the computational pipeline profoundly influences the results. An accurate structure defined by canonical interactions significantly contributes to obtaining a precise extended secondary structure (Table 2). Additionally, the results reveal the advantage of CentroidFold (Sato ), the default algorithm of RNAvista, over the other methods.
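The three measures used in this section can be sketched in Python as follows. This is an illustration under assumptions, not the evaluation code behind the tables: the function names are hypothetical, base pairs are compared by position only (the interaction class is ignored), and MCC is approximated by the geometric mean of PPV and TPR, a common simplification when comparing base-pair sets. The text does not state which MCC formula was used, although the tabulated values are consistent with this approximation (for example, PPV 0.600 and TPR 1.000 give 0.775). The pair lists in the usage example are invented for illustration.

```python
import math


def pair_set(pairs):
    """Normalize base pairs so that (i, j) and (j, i) count as the same pair."""
    return {tuple(sorted(p)) for p in pairs}


def score(predicted, reference):
    """Return (PPV, TPR, MCC) for predicted versus reference base pairs."""
    pred, ref = pair_set(predicted), pair_set(reference)
    tp = len(pred & ref)   # pairs present in both sets
    fp = len(pred - ref)   # predicted pairs absent from the reference
    fn = len(ref - pred)   # reference pairs that were missed
    ppv = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    # Assumption: MCC approximated as sqrt(PPV * TPR).
    mcc = math.sqrt(ppv * tpr)
    return ppv, tpr, mcc


if __name__ == "__main__":
    # Hypothetical motif: 7 predicted pairs, 6 reference pairs, 5 shared.
    predicted = [(1, 10), (2, 9), (3, 8), (4, 7), (12, 20), (13, 19), (14, 18)]
    reference = [(1, 10), (2, 9), (3, 8), (4, 7), (12, 20), (15, 17)]
    print(tuple(round(v, 3) for v in score(predicted, reference)))
```

With these invented counts the function yields PPV 0.714, TPR 0.833 and MCC 0.772.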
Table 1.

The accuracy of non-canonical interactions within recurrent RNA motifs predicted by RNAvista from the sequence (best values in bold)

Motif           PDB ID:   CentroidFold       ContextFold        CONTRAfold         IPknot             RNAfold            RNAstructure
                Chain     PPV   TPR   MCC    PPV   TPR   MCC    PPV   TPR   MCC    PPV   TPR   MCC    PPV   TPR   MCC    PPV   TPR   MCC
T-loop          1J1U: B   1.000 1.000 1.000  1.000 1.000 1.000  0.000 0.000 0.000  1.000 1.000 1.000  1.000 1.000 1.000  1.000 1.000 1.000
                4P5J      1.000 1.000 1.000  0.000 0.000 0.000  1.000 1.000 1.000  1.000 1.000 1.000  0.000 0.000 0.000  1.000 1.000 1.000
Sarcin          1JBR: D   0.714 0.833 0.772  0.429 0.500 0.463  0.571 0.667 0.617  0.429 0.500 0.463  0.143 0.250 0.189  0.286 0.500 0.378
                1Q93: B   1.000 1.000 1.000  0.143 0.250 0.189  1.000 1.000 1.000  0.243 0.333 0.218  0.429 0.600 0.507  0.143 0.250 0.189
GNRA            1JID: B   1.000 1.000 1.000  1.000 0.500 0.707  1.000 0.500 0.707  1.000 0.500 0.707  1.000 0.500 0.707  1.000 0.500 0.707
                1Q93: B   1.000 1.000 1.000  0.000 0.000 0.000  1.000 1.000 1.000  0.000 0.000 0.000  1.000 1.000 1.000  0.000 0.000 0.000
C-loop          4JRC: A   0.000 0.000 0.000  0.000 0.000 0.000  0.000 0.000 0.000  0.000 0.000 0.000  0.000 0.000 0.000  0.000 0.000 0.000
                5B2Q: B   0.000 0.000 0.000  0.000 0.000 0.000  0.000 0.000 0.000  0.000 0.000 0.000  0.000 0.000 0.000  0.000 0.000 0.000
K-turn          5FJC: A   0.000 0.000 0.000  0.200 0.333 0.258  0.200 0.250 0.224  0.200 0.333 0.258  0.200 0.333 0.258  0.200 0.333 0.258
                4QVI: B   0.600 1.000 0.775  1.000 1.000 1.000  0.400 0.500 0.447  0.400 1.000 0.632  0.200 0.500 0.316  0.200 0.500 0.316
Double sheared  5AOX: F   0.500 0.500 0.500  0.000 0.000 0.000  1.000 1.000 1.000  1.000 1.000 1.000  1.000 1.000 1.000  1.000 1.000 1.000
                1MMS: C   0.000 0.000 0.000  0.000 0.000 0.000  0.000 0.000 0.000  0.000 0.000 0.000  0.000 0.000 0.000  0.000 0.000 0.000
Triple sheared  4GMA      0.000 0.000 0.000  1.000 1.000 1.000  0.333 0.667 0.471  0.333 1.000 0.577  0.000 0.000 0.000  0.000 0.000 0.000

Table 2.

The accuracy of non-canonical interactions within recurrent RNA motifs predicted by RNAvista from canonical secondary structure

Motif           PDB ID: Chain   Chain: Motif size [nts]   PPV    TPR    MCC
T-loop          1J1U: B         74: 9                     1.000  1.000  1.000
                4P5J            86: 9                     1.000  1.000  1.000
Sarcin          1JBR: D         31: 15                    0.714  0.833  0.772
                1Q93: B         27: 15                    1.000  1.000  1.000
GNRA            1JID: B         29: 6                     1.000  0.500  0.707
                1Q93: B         27: 6                     1.000  1.000  1.000
C-loop          4JRC: A         57: 7                     0.000  0.000  0.000
                5B2Q: B         94: 7                     1.000  1.000  1.000
K-turn          5FJC: A         94: 12                    1.000  1.000  1.000
                4QVI: B         81: 12                    1.000  1.000  1.000
Double sheared  5AOX: F         87: 8                     0.500  1.000  0.707
                1MMS: C         58: 8                     0.667  1.000  0.816
Triple sheared  4GMA            210: 12                   1.000  1.000  1.000

4 Conclusions

We presented RNAvista, the first webserver to predict extended RNA secondary structure (including non-canonical base pairs) from sequence or canonical secondary structure. We believe RNAvista can contribute to better understanding of RNA structure and improve its full description.

Funding

National Science Centre, Poland [2016/23/B/ST6/03931], Faculty of Computing, Poznan University of Technology, Poland [09/91/DSPB/0649], and the Institute of Bioorganic Chemistry, Polish Academy of Sciences.

Conflict of Interest: none declared.
  26 in total

1.  Quantitative analysis of nucleic acid three-dimensional structures.

Authors:  P Gendron; S Lemieux; F Major
Journal:  J Mol Biol       Date:  2001-05-18       Impact factor: 5.469

2.  Geometric nomenclature and classification of RNA base pairs.

Authors:  N B Leontis; E Westhof
Journal:  RNA       Date:  2001-04       Impact factor: 4.942

3.  3DNA: a software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures.

Authors:  Xiang-Jun Lu; Wilma K Olson
Journal:  Nucleic Acids Res       Date:  2003-09-01       Impact factor: 16.971

4.  RNAstructure: software for RNA secondary structure prediction and analysis.

Authors:  Jessica S Reuter; David H Mathews
Journal:  BMC Bioinformatics       Date:  2010-03-15       Impact factor: 3.169

5.  IPknot: fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming.

Authors:  Kengo Sato; Yuki Kato; Michiaki Hamada; Tatsuya Akutsu; Kiyoshi Asai
Journal:  Bioinformatics       Date:  2011-07-01       Impact factor: 6.937

6.  RNA STRAND: the RNA secondary structure and statistical analysis database.

Authors:  Mirela Andronescu; Vera Bereg; Holger H Hoos; Anne Condon
Journal:  BMC Bioinformatics       Date:  2008-08-13       Impact factor: 3.169

7.  Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas.

Authors:  Anton I Petrov; Craig L Zirbel; Neocles B Leontis
Journal:  RNA       Date:  2013-08-22       Impact factor: 4.942

8.  New algorithms to represent complex pseudoknotted RNA structures in dot-bracket notation.

Authors:  Maciej Antczak; Mariusz Popenda; Tomasz Zok; Michal Zurkowski; Ryszard W Adamiak; Marta Szachniuk
Journal:  Bioinformatics       Date:  2018-04-15       Impact factor: 6.937

9.  CENTROIDFOLD: a web server for RNA secondary structure prediction.

Authors:  Kengo Sato; Michiaki Hamada; Kiyoshi Asai; Toutai Mituyama
Journal:  Nucleic Acids Res       Date:  2009-05-12       Impact factor: 16.971

10.  Base pair probability estimates improve the prediction accuracy of RNA non-canonical base pairs.

Authors:  Michael F Sloma; David H Mathews
Journal:  PLoS Comput Biol       Date:  2017-11-06       Impact factor: 4.475
