RNArchitecture: a database and a classification system of RNA families, with a focus on structural information.

Pietro Boccaletto1, Marcin Magnus1, Catarina Almeida1, Adriana Zyla1, Astha Astha1, Radoslaw Pluta1, Blazej Baginski1, Elzbieta Jankowska1, Stanislaw Dunin-Horkawicz1, Tomasz K Wirecki1, Michal J Boniecki1, Filip Stefaniak1, Janusz M Bujnicki1,2.   

Abstract

RNArchitecture is a database that provides a comprehensive description of relationships between known families of structured non-coding RNAs, with a focus on structural similarities. The classification is hierarchical and similar to the system used in the SCOP and CATH databases of protein structures. Its central level is Family, which builds on the Rfam catalog and gathers closely related RNAs. Consensus structures of Families are described with a reduced secondary structure representation. Evolutionarily related Families are grouped into Superfamilies. Similar structures are further grouped into Architectures. The highest level, Class, organizes families into very broad structural categories, such as simple or complex structured RNAs. Some groups at different levels of the hierarchy are currently labeled as 'unclassified'. The classification is expected to evolve as new data become available. For each Family with one or more experimentally determined three-dimensional (3D) structures, a representative structure is provided. RNArchitecture also presents theoretical models of RNA 3D structure and is open for submission of structural models by users. Compared to other databases, RNArchitecture is unique in its focus on structure-based RNA classification, and in providing a platform for storing RNA 3D structure predictions. RNArchitecture can be accessed at http://iimcb.genesilico.pl/RNArchitecture/.
© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2018        PMID: 29069520      PMCID: PMC5753356          DOI: 10.1093/nar/gkx966

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

RNA molecules play fundamental roles in cellular processes. They have long been known to carry genetic information and to synthesize proteins. They may detect the presence of ions or small molecules in the environment, regulate gene expression at various levels (from DNA to RNA, to proteins) and catalyze chemical reactions (reviewed comprehensively in (1)). Many RNAs that have been structurally characterized form compact, functional, three-dimensional (3D) structures that determine their function and interactions with other molecules, in a similar manner to sequence-structure-function relationships that have been well described for proteins. Knowledge of similarity between biological macromolecules enables us to cluster them, to group them into families, superfamilies and higher-level organizations, to infer their evolutionary history, to detect functional motifs and thus to predict the mechanism of their action (2). The need to compare and classify protein structures has led to the development of commonly used databases and hierarchical structural classifications, such as SCOP (3) and CATH (4). Thanks to these and other computational resources, comparing and classifying proteins was demonstrated to be of crucial importance for function inference and it continues to be used in many applications. For RNA structures, several databases have been developed. However, some of them, including RNABase (5) and SCOR (6), are no longer updated, while others that implement automated clustering, such as HD-RNAS (7) or RNA Structure Atlas (8), provide information only about very close similarities. Currently, no comprehensive database of RNA families exists that provides information about structural similarities and dissimilarities at a level analogous to superfamily or fold in SCOP, or to topology or architecture in CATH. This unmet need has prompted us to develop the RNArchitecture database and its new hierarchical classification system.
RNArchitecture is based on an established catalogue of Rfam families and extends this classification toward higher levels of organization built on structural considerations.

DATABASE CONTENT

The RNArchitecture database has been developed to provide a comprehensive classification system that describes relationships between RNA families, with a focus on structural similarities (Figure 1). RNArchitecture uses and organizes information from several databases, including Rfam (9) and Protein Data Bank (10), and introduces a SCOP/CATH-like hierarchical classification. The central level of classification is Family, which has been largely taken from the Rfam database. RNArchitecture includes 2688 Families of which only 2.54% (74 Families) have a structural model solved experimentally. Families group together evolutionarily related RNAs with conserved structure and detectable sequence similarity. Families whose members exhibit structural variation are further subdivided into Subfamilies. On the other hand, Families with similar structures and functions, and likely to be evolutionarily related (or at least converged to fulfill the same role in essentially the same way) are grouped into Superfamilies (which are more extensive than clans currently defined by Rfam). Superfamilies that share a similar core structure, but which are not clear homologs, are grouped into Architectures. The highest level, Class, organizes the data into very broad structural and functional categories. The coarse-graining of secondary structure in RNArchitecture is not much used at the ‘evolutionary’ level of classification into Superfamilies, but mostly at higher levels of Architecture and Class, which group RNAs according to structural similarities, in analogy to fold and class in SCOP. RNArchitecture also serves as a repository of theoretical models of RNA 3D structures, open for submission from users.
Figure 1.

A sunburst plot illustrating the hierarchy of RNArchitecture and the content of the 1.0 release. The outermost layer indicates 2688 RNA Families. The successive layers combine these Families into 1721 Superfamilies, 22 Architectures and finally into two Classes. Names are shown for Classes and largest Architectures, and Superfamilies.

Dataset acquisition and processing

The dataset of Families (in each case including the multiple sequence alignment, the consensus secondary structure, and the consensus sequence) was constructed based on Rfam (release 12.3), and expanded to include known RNA families that are currently not covered by Rfam. In particular, we included group I and group II intron Families which are not included in Rfam as complete full-length sequences. Group I intron sequences and alignments were obtained from GISSD: Group I Intron Sequence and Structure Database (11), and group II intron sequences were obtained from The Database for Bacterial Group II Introns (12) and the alignments were generated by us. These Families were further subdivided into Subfamilies, as proposed in these databases. The consensus structures were used to calculate reduced shape representations, similarly to the RNA shapes approach (13,14). We used an in-house program to convert full secondary structure representation to Level1 reduced representation, in which a single pair of brackets corresponds to two uninterrupted segments of paired residues, and then to Level2 reduced representation, in which a single pair of brackets corresponds to two series of mutually paired segments that can be interrupted by bulges and loops, but are not interrupted by residues paired with other segments (Figure 2). Level2 reduced representations were compared and the most common shapes were used, along with the 3D structural information, wherever available, to define the Architectures.
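The two reduction levels described above can be illustrated with a short Python sketch. This is not the in-house program mentioned in the text, which is not shown here; it is a minimal reconstruction under stated assumptions: the input is a dot-bracket string, the bracket alphabets `()`, `[]`, `{}` and `<>` are used for nested and pseudoknotted pairs, and the function names (`pair_table`, `collapse_stacked`, `level1`, `level2`) are hypothetical. Level1 is obtained by collapsing directly stacked base pairs and then dropping unpaired positions; Level2 is obtained by dropping unpaired positions first, so that helices separated only by bulges or internal loops become adjacent and merge.

```python
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSE_TO_OPEN = {close: open_ for open_, close in OPEN_TO_CLOSE.items()}


def pair_table(structure):
    """Map every paired position to its partner (one stack per bracket type)."""
    stacks = {open_: [] for open_ in OPEN_TO_CLOSE}
    partner = {}
    for pos, char in enumerate(structure):
        if char in OPEN_TO_CLOSE:
            stacks[char].append(pos)
        elif char in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[char]]
            if not stack:
                raise ValueError(f"unmatched '{char}' at position {pos}")
            start = stack.pop()
            partner[start] = pos
            partner[pos] = start
        # any other character is treated as an unpaired residue
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return partner


def collapse_stacked(structure):
    """Drop every base pair (i, j) that is stacked directly on pair (i-1, j+1)."""
    partner = pair_table(structure)
    kept = []
    for pos, char in enumerate(structure):
        if pos in partner:
            five_prime = min(pos, partner[pos])
            three_prime = max(pos, partner[pos])
            if partner.get(five_prime - 1) == three_prime + 1:
                continue  # inner pair of an uninterrupted helix
        kept.append(char)
    return "".join(kept)


def strip_unpaired(structure):
    """Remove every character that is not a bracket."""
    return "".join(c for c in structure
                   if c in OPEN_TO_CLOSE or c in CLOSE_TO_OPEN)


def level1(structure):
    """One bracket pair per uninterrupted helix."""
    return strip_unpaired(collapse_stacked(structure))


def level2(structure):
    """One bracket pair per series of helices separated only by unpaired residues."""
    return collapse_stacked(strip_unpaired(structure))


print(level1("((..((...))..))"))  # (())
print(level2("((..((...))..))"))  # ()
print(level2("((..[[..))..]]"))   # ([)]
```

In this sketch a multiloop keeps its branches at both levels, because the outer helix is interrupted by residues paired within other segments, while two helices separated by an internal loop collapse to a single bracket pair only at Level2.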
Figure 2.

Example of key new information offered by the RNArchitecture database as a basis of the classification system for RNA Families. (A) 3D structural model of the DP (downstream peptide) Family representative. (B) Reduction of the DP Family representative secondary structure assignment to simplified representations used for classifications into Architectures and Classes—in this case a pseudoknot. Colors indicate conserved elements of secondary structure.

For each Family, a representative member was identified to illustrate exemplary structural information. First, for each Family with a member that has an experimentally determined 3D structure (according to Rfam annotation), the structural coordinates were obtained from the Protein Data Bank (10). For these structurally characterized RNAs, sequences were extracted from the PDB file using do_x3dna (15), and secondary structure was annotated using ClaRNA (16). Additional data, such as header, title, molecule, and information about ligands, cofactors and hetero atoms, were derived from the associated PDB file using pypdb (17). Second, for each Family without an experimentally determined 3D structure, we selected the closest sequence to the consensus sequence, in terms of sequence identity. Its secondary structure was assigned based on the Rfam consensus secondary structure. In a few selected cases (e.g., in the case of RNAs with additional structural and functional information), a different family representative was identified, and secondary structure was assigned manually, based on literature and database information.
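The second selection rule (closest sequence to the consensus by sequence identity) can be sketched in Python as below. The text does not specify how identity was computed, so the details here are assumptions: sequences are rows of one alignment with equal length, comparison is case-insensitive, columns where the consensus has a gap are skipped, and ambiguity codes in the consensus are compared literally. The function names and the toy alignment are hypothetical.

```python
def identity_to_consensus(aligned_seq, consensus, gap_chars="-."):
    """Fraction of non-gap consensus columns where the sequence matches."""
    if len(aligned_seq) != len(consensus):
        raise ValueError("sequence and consensus must have the same aligned length")
    compared = 0
    matches = 0
    for residue, cons in zip(aligned_seq.upper(), consensus.upper()):
        if cons in gap_chars:
            continue  # assumption: ignore columns with no consensus residue
        compared += 1
        if residue == cons:
            matches += 1
    return matches / compared if compared else 0.0


def pick_representative(alignment, consensus):
    """Return the name of the aligned sequence closest to the consensus."""
    return max(alignment,
               key=lambda name: identity_to_consensus(alignment[name], consensus))


# Toy alignment, for illustration only (not taken from any real Family).
toy_alignment = {
    "seq_a": "GGCAU-CC",
    "seq_b": "GGCAAGCC",
    "seq_c": "GACAAGCU",
}
toy_consensus = "GGCAAGCC"

print(pick_representative(toy_alignment, toy_consensus))  # seq_b
```

Ties are resolved in favour of the first sequence encountered, which is a property of `max` and another simplification of this sketch.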
For a number of Families without experimentally determined 3D structure, we generated 3D structural models, aiming to provide at least one 3D structural model for each Architecture and for the largest Superfamilies. Exceptions, for which we decided not to generate 3D structure models, include Families considered as largely unstructured, and hence unlikely to possess a stable unique 3D structure, or those without reliable structural information. Briefly, depending on the availability of a tentative structural template, a preliminary model for a family representative was generated with ModeRNA (18) or RNAComposer (19). This starting model was then refolded with SimRNA (20), with default parameters, using secondary structure restraints. Unless otherwise noted, the modeling was largely automated, and the resulting models have not been inspected for agreement with published literature; they must therefore be considered tentative and open to improvement by more exhaustive studies.

Search

Options for database searching and querying have been implemented, including search using PDB IDs, names of Class, Architecture, Superfamily, Family, Subfamily, Rfam accession number and RNA type. The database also includes a powerful and dedicated RNA shape search tool for the exploration of the Families that contain a particular architectural motif, e.g., a simple pseudoknot, by querying the database with ‘([)]’. Currently such searches are either not possible or very difficult to make with other databases of RNA families or RNA structures.
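As a rough illustration of what a shape query such as '([)]' involves, the Python sketch below filters a set of reduced shapes for a motif. It does not reproduce the search tool of the database, whose matching rules are not described here; it assumes that each Family is stored with a reduced shape string and that containment can be approximated by a plain substring test. The function name and the toy records are hypothetical and do not correspond to real Families.

```python
def find_families_with_motif(shapes_by_family, motif):
    """Return the names of all records whose reduced shape contains the motif."""
    return sorted(name for name, shape in shapes_by_family.items()
                  if motif in shape)


# Toy records, for illustration only.
toy_shapes = {
    "toy_hairpin": "()",
    "toy_pseudoknot": "([)]",
    "toy_multiloop_with_pseudoknot": "(()([)])",
    "toy_three_way_junction": "(()())",
}

print(find_families_with_motif(toy_shapes, "([)]"))
# ['toy_multiloop_with_pseudoknot', 'toy_pseudoknot']
```

A substring test misses cases where additional helices are inserted between the two strands of the motif, so a production search would need a structural matching rule rather than string containment.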

Database implementation

An object-relational PostgreSQL database management system is used to store all the information. The web server is implemented for the Linux operating system under Django. Programs to export, to import and to visualize have been implemented in Python 3, including our in-house package rna-pdb-tools (freely available https://github.com/mmagnus/rna-pdb-tools/). The secondary structure visualizations are generated with VARNA (21), the tertiary structure visualizations are rendered with JSmol (22).

Future prospects

The number of experimentally determined RNA molecules is increasing rapidly, in line with recent discoveries and growing interest in RNA functions. The number of experimentally determined RNA structures is also growing, albeit at a much slower rate. Therefore, RNArchitecture is expected to be updated systematically with new information. In particular, we envisage updating it following all major updates of the Rfam database. The catalog of Families is intended to be systematically expanded to include additional sequences, e.g., from the RNAcentral database (23). Structural information will be updated with new experimentally determined RNA structures, as well as with improved theoretical models. The generation of 3D structure representatives for all Families that are predicted to form stable 3D structures is an ongoing process and we intend to expand the current dataset of models to maximize the coverage. We encourage the users of RNArchitecture to submit models of RNA 3D structures, preferably ones with experimental support, to be included in the database, as well as suggestions for improvement of the existing classification. For the next release, we plan to expand the structural repertoire to include multiple structures (e.g., for different members of the same Family, for different functional/structural states of the same RNA, or for alternative theoretical models) and structural superpositions. New structural data will be used to update and potentially revise the classification system, in particular, the assignment to Superfamilies and Architectures, and the diversification of Classes. Another envisaged next step of the RNArchitecture database development is to link it with databases on other aspects of RNA structure, such as RMDB (24). In particular, we envisage that RNArchitecture will evolve in concert with the developments of the RNA-Puzzles experiment (25), and will serve the community of RNA structure predictors.
We hope that the RNArchitecture RNA structure classification project will prompt new advances in the field, for instance facilitating and stimulating the choice of targets for theoretical prediction and experimental determination of RNA 3D structures.

AVAILABILITY

The data are accessible freely for research purposes at http://iimcb.genesilico.pl/RNArchitecture/. All RNA structures in the PDB format, images and alignments in the Stockholm format are available for download. The scripts used to process the data are part of our in-house package rna-pdb-tools (freely available at https://github.com/mmagnus/rna-pdb-tools/).
  22 in total

1.  RNA-Puzzles: a CASP-like evaluation of RNA three-dimensional structure prediction.

Authors:  José Almeida Cruz; Marc-Frédérick Blanchet; Michal Boniecki; Janusz M Bujnicki; Shi-Jie Chen; Song Cao; Rhiju Das; Feng Ding; Nikolay V Dokholyan; Samuel Coulbourn Flores; Lili Huang; Christopher A Lavender; Véronique Lisi; François Major; Katarzyna Mikolajczak; Dinshaw J Patel; Anna Philips; Tomasz Puton; John Santalucia; Fredrick Sijenyi; Thomas Hermann; Kristian Rother; Magdalena Rother; Alexander Serganov; Marcin Skorupski; Tomasz Soltysinski; Parin Sripakdeevong; Irina Tuszynska; Kevin M Weeks; Christina Waldsich; Michael Wildauer; Neocles B Leontis; Eric Westhof
Journal:  RNA       Date:  2012-02-23       Impact factor: 4.942

2.  do_x3dna: a tool to analyze structural fluctuations of dsDNA or dsRNA from molecular dynamics simulations.

Authors:  Rajendra Kumar; Helmut Grubmüller
Journal:  Bioinformatics       Date:  2015-04-02       Impact factor: 6.937

3.  An RNA Mapping DataBase for curating RNA structure mapping experiments.

Authors:  Pablo Cordero; Julius B Lucks; Rhiju Das
Journal:  Bioinformatics       Date:  2012-09-12       Impact factor: 6.937

4.  Using sequence similarity networks for visualization of relationships across diverse protein superfamilies.

Authors:  Holly J Atkinson; John H Morris; Thomas E Ferrin; Patricia C Babbitt
Journal:  PLoS One       Date:  2009-02-03       Impact factor: 3.240

5.  ModeRNA: a tool for comparative modeling of RNA 3D structure.

Authors:  Magdalena Rother; Kristian Rother; Tomasz Puton; Janusz M Bujnicki
Journal:  Nucleic Acids Res       Date:  2011-02-07       Impact factor: 16.971

6.  Database for bacterial group II introns.

Authors:  Manuel A Candales; Adrian Duong; Keyar S Hood; Tony Li; Ryan A E Neufeld; Runda Sun; Bonnie A McNeil; Li Wu; Ashley M Jarding; Steven Zimmerly
Journal:  Nucleic Acids Res       Date:  2011-11-10       Impact factor: 16.971

7.  HD-RNAS: An Automated Hierarchical Database of RNA Structures.

Authors:  Shubhra Sankar Ray; Sukanya Halder; Stephanie Kaypee; Dhananjay Bhattacharyya
Journal:  Front Genet       Date:  2012-04-18       Impact factor: 4.599

8.  Rfam 12.0: updates to the RNA families database.

Authors:  Eric P Nawrocki; Sarah W Burge; Alex Bateman; Jennifer Daub; Ruth Y Eberhardt; Sean R Eddy; Evan W Floden; Paul P Gardner; Thomas A Jones; John Tate; Robert D Finn
Journal:  Nucleic Acids Res       Date:  2014-11-11       Impact factor: 19.160

9.  RNAcentral: an international database of ncRNA sequences.

Authors:  Anton I Petrov; Simon J E Kay; Richard Gibson; Eugene Kulesha; Dan Staines; Elspeth A Bruford; Mathew W Wright; Sarah Burge; Robert D Finn; Paul J Kersey; Guy Cochrane; Alex Bateman; Sam Griffiths-Jones; Jennifer Harrow; Patricia P Chan; Todd M Lowe; Christian W Zwieb; Jacek Wower; Kelly P Williams; Corey M Hudson; Robin Gutell; Michael B Clark; Marcel Dinger; Xiu Cheng Quek; Janusz M Bujnicki; Nam-Hai Chua; Jun Liu; Huan Wang; Geir Skogerbø; Yi Zhao; Runsheng Chen; Weimin Zhu; James R Cole; Benli Chai; Hsien-Da Huang; His-Yuan Huang; J Michael Cherry; Artemis Hatzigeorgiou; Kim D Pruitt
Journal:  Nucleic Acids Res       Date:  2014-10-28       Impact factor: 16.971

10.  GISSD: Group I Intron Sequence and Structure Database.

Authors:  Yu Zhou; Chen Lu; Qi-Jia Wu; Yu Wang; Zhi-Tao Sun; Jia-Cong Deng; Yi Zhang
Journal:  Nucleic Acids Res       Date:  2007-10-16       Impact factor: 16.971
