The ARTS web server for aligning RNA tertiary structures.

Oranit Dror¹, Ruth Nussinov, Haim J Wolfson.

Abstract

RNA molecules with common structural features may share similar functional properties. Structural comparison of RNAs and detection of common substructures are thus highly important tasks. Nevertheless, the tools currently available in the RNA community provide only a partial solution, since they either work at the 2D level or are suitable for detecting only predefined or local contiguous tertiary motifs. Here, we describe a web server built around ARTS, a method for aligning tertiary structures of nucleic acids (both RNA and DNA). ARTS receives a pair of 3D nucleic acid structures and searches for a priori unknown common substructures. The search is truly 3D and irrespective of the order of the nucleotides on the chain. The identified common substructures can be large global folds with hundreds and even thousands of nucleotides as well as small local motifs with at least two successive base pairs. The method is highly efficient and has been used to conduct an all-against-all comparison of all the RNA structures in the Protein Data Bank. The web server, together with a software package for download, is freely accessible at http://bioinfo3d.cs.tau.ac.il/ARTS.

Year: 2006 PMID: 16845038 PMCID: PMC1538835 DOI: 10.1093/nar/gkl312

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

In recent years there has been fast-growing interest in RNA molecules. This stems from the groundbreaking discovery that RNA is not solely a carrier of genetic information, but a key player in a wide range of essential processes within the cell, such as protein synthesis and transport, RNA processing and splicing, gene silencing, and chromosome replication (1–3). RNA is also involved in many pathological processes, such as cancerous tumors and retroviral infections like AIDS. As with proteins, understanding the functions of these active RNA molecules requires methods for analyzing their tertiary structures. However, in contrast to the wide range of 3D structure-based approaches available for proteins (4), a similar field for RNA is only now emerging.

Many methods for structure analysis of RNA have been developed to work at the secondary structure level, that is, the level of base pairing (5). In the absence of RNA tertiary structures, such methods provide an excellent starting point for exploring RNA structures. However, their inherent limitation is that they are incapable of predicting and annotating tertiary interactions. These interactions are formed between secondary structure elements and are crucial for establishing the global fold of an RNA (6,7).

Fortunately, in the past few years both the number and size of solved RNA tertiary structures have dramatically increased. This has given rise to various computational tools for 3D structural analysis. A variety of methods are available for analyzing and classifying nucleotide conformations and spatial base interactions (8–13). Several other methods have been suggested for measuring the similarity between larger RNA structures, but they require the structures to have the same number of nucleotides and a predefined correspondence between them (14–16). Fewer methods are available for locating small predefined motifs in larger structures (15,17). These methods are useful for finding new examples of known motifs, but are incapable of discovering novel ones. To date, the problem of identifying a priori unknown common substructures is only partially addressed by a few methods for recognizing recurring 3D contiguous fragments (18,19). Thus, there is a great need for new approaches.

Herein, we present a web server built around the ARTS method (20) for aligning 3D nucleic acid structures. Unlike the very few comparison tools currently available for this task, ARTS is suitable for identifying a priori unknown common substructures that may not necessarily be contiguous. The common substructures can be either large global folds containing hundreds and even thousands of nucleotides or small local spatial motifs with at least two successive base pairs. ARTS is also highly efficient, typically requiring a few seconds to compare a pair of average-size RNA structures with hundreds of nucleotides. The tool has been used to conduct an all-against-all comparison of all the RNA 3D structures currently available in the Protein Data Bank (PDB) (21). The results can be accessed via the website.

METHOD OUTLINE

The input is a pair of nucleic acid structures represented by the 3D coordinates of their atoms. The phosphate atoms are singled out as critical points, and each structure is represented as a set of points in 3D space, where each point is the position of a phosphate atom. Under this representation the problem is a version of the Largest Common Point Set (LCP) problem in computational geometry. Namely, the task is to find a rigid transformation (rotation and translation) that superimposes the largest number of phosphate atoms of one structure onto the phosphate atoms of the other within a predefined error in terms of the bottleneck matching distance (22). Although this problem has been studied extensively, the currently known exact and approximate algorithms for solving it are impractical, since they require O(n^32.5) and O(n^8.5) time, respectively, where n is the number of phosphate atoms (22,23).

ARTS (20) is thus a heuristic method. By exploiting the base pairing and stacking properties of nucleic acids, it is capable of providing biologically relevant solutions in practical running times, even for large compact structures with thousands of nucleotides like the ribosome. Its time complexity is O(n^3). Unlike the LCP problem, the goal is to maximize both the number of superimposed phosphate atoms and the number of superimposed base pairs. The rationale is that more than half of the nucleotides in an average non-coding RNA are involved in base pairing and the stems that they form are evolutionarily more conserved than loops (7). It is thus unlikely that an alignment with a large number of superimposed base pairs will be biologically meaningless, as might happen when solving the pure geometric LCP problem.

In the first stage, all the possible local alignments of two successive base pairs between the structures are constructed. Then, a greedy approach is used to extend the local alignments so that a maximal number of phosphate atoms and base pairs will be superimposed.
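The geometric objective described above (counting phosphate atoms that a rigid transformation brings within a distance threshold of each other) can be illustrated with a minimal sketch. This is not the ARTS implementation: the function names `apply_rigid` and `count_superimposed` and the threshold `eps` are hypothetical, and the greedy one-to-one pairing used here only gives a lower bound on the size of the best matching.

```python
import math

def apply_rigid(points, rotation, translation):
    """Apply the rigid transformation x -> R x + t to each 3D point."""
    moved = []
    for p in points:
        moved.append(tuple(
            sum(rotation[i][k] * p[k] for k in range(3)) + translation[i]
            for i in range(3)
        ))
    return moved

def count_superimposed(moved, fixed, eps):
    """Greedily pair points one-to-one, closest pairs first, within eps.

    Assumption: a greedy pairing stands in for an exact bipartite
    matching, so the count is a lower bound on the optimum.
    """
    candidates = []
    for i, p in enumerate(moved):
        for j, q in enumerate(fixed):
            d = math.dist(p, q)
            if d <= eps:
                candidates.append((d, i, j))
    candidates.sort()
    used_moved, used_fixed = set(), set()
    matched = 0
    for _, i, j in candidates:
        if i not in used_moved and j not in used_fixed:
            used_moved.add(i)
            used_fixed.add(j)
            matched += 1
    return matched

# Toy phosphate positions (made-up coordinates, not real structures).
structure_a = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (5, 5, 5)]
structure_b = [(1, 1, 0.5), (0, 0, 0), (1, 0, 1), (10, 10, 10)]
rot_z_90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]  # 90 degree rotation about z
shift = (1, 0, 0)

n_matched = count_superimposed(
    apply_rigid(structure_a, rot_z_90, shift), structure_b, eps=1.0
)
print(n_matched)
```

A search procedure would evaluate many candidate transformations with a score of this kind; according to the text, ARTS additionally rewards superimposed base pairs, which this sketch does not model.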
Finally, the global alignments are scored, clustered and ranked, and the highest scoring ones are reported. We estimate the significance of the obtained alignments by computing the P-value of their score with respect to a random dataset Γ of pairwise alignments. In the current version of the application Γ contains ∼245 000 alignments that have been randomly chosen from an all-against-all comparison of all the RNA structures in the PDB. The P-value of an alignment between a pair of RNA structures with n and m nucleotides is computed with respect to all alignments in Γ for which the number of nucleotides in the smaller structure is within ±20% of min(n,m). The resulting P-value represents the probability that a pairwise alignment for which the size of the smaller structure is similar would receive a higher or equal score by chance.
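The P-value computation described above can be sketched as an empirical tail fraction. The sketch below is an illustration under stated assumptions, not the server's code: the background set Γ is assumed to be a list of (size of smaller structure, score) pairs, and the function name `empirical_p_value` and the toy values are hypothetical.

```python
def empirical_p_value(score, n, m, background, tolerance=0.2):
    """Fraction of size-comparable background alignments scoring >= score.

    background: iterable of (smaller_structure_size, alignment_score) pairs,
    an assumed representation of the random dataset of alignments.
    Returns None when no background alignment falls in the size window.
    """
    size = min(n, m)
    low, high = (1 - tolerance) * size, (1 + tolerance) * size
    comparable = [s for smaller, s in background if low <= smaller <= high]
    if not comparable:
        return None
    return sum(1 for s in comparable if s >= score) / len(comparable)

# Toy background set (made-up numbers for illustration only).
gamma = [(100, 10.0), (110, 25.0), (90, 40.0), (300, 99.0), (115, 30.0)]
p = empirical_p_value(30.0, n=100, m=250, background=gamma)
print(p)
```

Restricting the comparison to alignments whose smaller structure has a similar size keeps scores of small motifs from being judged against scores of large folds.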

WEB SERVER

The ARTS web server as well as an accompanying software package are freely available at http://bioinfo3d.cs.tau.ac.il/ARTS.

Input

The user interface of the web server is straightforward (Figure 1a). It requires the user to enter an email address and a pair of nucleic acid structures in PDB format (21). The structures can be either uploaded to the server or retrieved from the PDB. In the latter case the user has to enter a four-character PDB code, optionally followed by a colon and a list of chain IDs, for instance ‘1u6b’, ‘1u6b:B’ and ‘1u6b:BC’. In both cases, the structures must contain all atoms and not only those of the backbone, because otherwise the hydrogen bonds needed for finding base pairs cannot be computed. Another requirement is that each structure has at least two successive base pairs.
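The structure specification described above (a four-character PDB code, optionally followed by a colon and chain IDs) could be parsed as in the following sketch. The function name `parse_structure_spec` and the regular expression are hypothetical and are not taken from the server; the sketch assumes single-character chain IDs, as in the PDB file format.

```python
import re

# Four-character PDB code (digit first), then optional ':' and chain IDs.
_SPEC = re.compile(r"^([0-9][0-9A-Za-z]{3})(?::([0-9A-Za-z]+))?$")

def parse_structure_spec(spec):
    """Split a spec such as '1u6b:BC' into ('1u6b', ['B', 'C']).

    An empty chain list means that no chains were named in the spec.
    """
    match = _SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"not a valid PDB code specification: {spec!r}")
    code, chains = match.groups()
    return code.lower(), (list(chains) if chains else [])

for example in ("1u6b", "1u6b:B", "1u6b:BC"):
    print(example, "->", parse_structure_spec(example))
```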

Figure 1

The ARTS web server. (a) The entrance page of the web server. The user is required to enter an Email address and a pair of nucleic acid structures in PDB format (21). The structures can be either uploaded to the server or retrieved from the PDB. In the second case the user has to enter a four-character PDB code, optionally followed by a colon and a list of chain IDs. (b) A web page with a summary of the 10 top-ranking alignments obtained for ‘1u6b:B’ and ‘1y0q’ PDB codes. (c) The page obtained after clicking on the ‘BP Core Size’ field of the top-ranking alignment in the summary page presented in (b). (d) The superimposition of the input structures displayed by PyMOL (26) after clicking on the ‘PDB Alignment’ field of the top-ranking alignment in the summary page presented in (b). The backbone of the two structures, PDB:1u6bB and PDB:1y0q, is depicted in red and blue, respectively. The matched base pairs are in green and the matched unpaired nucleotides are in yellow.

Output

A typical run of ARTS for comparing a pair of average-size nucleic acid structures with hundreds of nucleotides takes a few seconds. After the run completes, a web page with a summary of the obtained alignments is displayed. In addition, an email with a link to this web page is sent to the user. Figure 1b displays a summary page obtained for two self-splicing group I introns, the Azoarcus pre-tRNA intron with both exons [PDB code 1u6b:B (24)] and the Twort ribozyme intron [PDB code 1y0q (25)]. The page contains two tables. The upper table shows the names of the compared structures and the number of nucleotides and base pairs in each structure. The bottom table shows the 10 top-ranking alignments sorted in descending order by their score. Besides the score, the following data are presented for each alignment: (i) the number of matched base pairs (BP Core Size); (ii) the total number of matched nucleotides including unpaired ones (Core Size); (iii) the root mean square deviation (RMSD) between the phosphate atoms of the matched nucleotides in the core; (iv) the P-value; and (v) a PDB file with the aligned structures. Clicking on the ‘BP Core Size’ field of one of the alignments displays a new page with a table of the matched base pairs. Figure 1c shows the page obtained after clicking on the ‘BP Core Size’ field of the top-ranking alignment in the summary page presented in Figure 1b. The table consists of two columns, one for each structure. Each line corresponds to a match between two base pairs, and each entry provides the chain identifier, base type and residue number of the two nucleotides in the corresponding base pair. Clicking on the ‘Core Size’ field of one of the alignments in the summary page (Figure 1b) displays a similar page with a table of all matched nucleotides (paired and unpaired).
A PDB file with the input structures superimposed one onto the other will be downloaded or presented by a viewer (if configured) when clicking on the ‘PDB Alignment’ field of one of the alignments in the summary page. Figure 1d shows the superimposition of the input structures displayed by the PyMOL viewer (26) after clicking on the ‘PDB Alignment’ field of the top-ranking alignment in the summary page presented in Figure 1b. Scripts for easy use with viewers are provided with the software package. Among them are PyMOL (26) and RasMol (27) scripts for displaying the alignments and selecting the matched base pairs (bpcore) and all matched nucleotides including unpaired ones (core).
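The RMSD reported in the summary table is taken over the phosphate atoms of the matched nucleotides. As a minimal sketch of that quantity, the function below computes the RMSD of two equally long, already superimposed coordinate lists in which the i-th entries are a matched pair. The function name `rmsd` and the toy coordinates are hypothetical; how ARTS computes the superposition itself is not shown here.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between matched 3D points.

    coords_a[i] and coords_b[i] are assumed to be a matched pair of
    phosphate positions after superposition.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (a - b) ** 2
        for p, q in zip(coords_a, coords_b)
        for a, b in zip(p, q)
    )
    return math.sqrt(total / len(coords_a))

# Two matched pairs: the first differs by 2 along z, the second coincides.
core_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
core_b = [(0.0, 0.0, 2.0), (1.0, 0.0, 0.0)]
print(rmsd(core_a, core_b))
```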

CONCLUSIONS

We have presented a freely available web server accompanied by a software package for 3D structural alignment of nucleic acids. The web server receives as input a pair of tertiary structures of nucleic acids in PDB format, and searches for a priori unknown common substructures that are not necessarily contiguous. The output consists of the top-ranking superpositions between the two input structures in PDB format and corresponding lists of matched nucleotides in the common substructures. To the best of our knowledge, this is the first web server that performs RNA structural comparisons that are truly 3D and irrespective of the order of the nucleotides on the chain. The only requirement is that there are at least two consecutive base pairs in the match. The algorithm behind the web server is highly efficient: a typical comparison of two nucleic acids takes a few seconds on a standard PC. An all-against-all comparison of all the RNA structures currently available in the PDB has been carried out and the results can be accessed via the web server. In future work we intend to allow online searches of uploaded structures against the entire PDB.

23 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. Calculating nucleic acid secondary structure.

Authors: M Zuker
Journal: Curr Opin Struct Biol Date: 2000-06 Impact factor: 6.809

Review 3. Overview of nucleic acid analysis programs.

Authors: X J Lu; M S Babcock; W K Olson
Journal: J Biomol Struct Dyn Date: 1999-02

4. An expanding universe of noncoding RNAs.

Authors: Gisela Storz
Journal: Science Date: 2002-05-17 Impact factor: 47.728

5. Automated identification of RNA conformational motifs: theory and application to the HM LSU 23S rRNA.

Authors: Eli Hershkovitz; Emmanuel Tannenbaum; Shelley B Howerton; Ajay Sheth; Allen Tannenbaum; Loren Dean Williams
Journal: Nucleic Acids Res Date: 2003-11-01 Impact factor: 16.971

Review 6. The chemical repertoire of natural ribozymes.

Authors: Jennifer A Doudna; Thomas R Cech
Journal: Nature Date: 2002-07-11 Impact factor: 49.962

7. Representation, searching and discovery of patterns of bases in complex RNA structures.

Authors: Anne-Marie Harrison; Darren R South; Peter Willett; Peter J Artymiuk
Journal: J Comput Aided Mol Des Date: 2003-08 Impact factor: 3.686

8. Tools for the automatic identification and classification of RNA base pairs.

Authors: Huanwang Yang; Fabrice Jossinet; Neocles Leontis; Li Chen; John Westbrook; Helen Berman; Eric Westhof
Journal: Nucleic Acids Res Date: 2003-07-01 Impact factor: 16.971

9. RNA structure comparison, motif search and discovery using a reduced representation of RNA conformational space.

Authors: Carlos M Duarte; Leven M Wadley; Anna Marie Pyle
Journal: Nucleic Acids Res Date: 2003-08-15 Impact factor: 16.971

Review 10. From structure to function: methods and applications.

Authors: Haim J Wolfson; Maxim Shatsky; Dina Schneidman-Duhovny; Oranit Dror; Alexandra Shulman-Peleg; Buyong Ma; Ruth Nussinov
Journal: Curr Protein Pept Sci Date: 2005-04 Impact factor: 3.272
