
iPARTS2: an improved tool for pairwise alignment of RNA tertiary structures, version 2.

Chung-Han Yang1, Cheng-Ting Shih2, Kun-Tze Chen2, Po-Han Lee2, Ping-Han Tsai2, Jian-Cheng Lin2, Ching-Yu Yen2, Tiao-Yin Lin3, Chin Lung Lu4.   

Abstract

Since its first release in 2010, iPARTS has become a valuable tool for globally or locally aligning two RNA 3D structures. It was implemented by a structural alphabet (SA)-based approach, which uses an SA of 23 letters to reduce RNA 3D structures into 1D sequences of SA letters and applies traditional sequence alignment to these SA-encoded sequences for determining their global or local similarity. In this version, we have re-implemented iPARTS into a new web server iPARTS2 by constructing a totally new SA, which consists of 92 elements with each carrying both information of base and backbone geometry for a representative nucleotide. This SA is significantly different from the one used in iPARTS, because the latter consists of only 23 elements with each carrying only the backbone geometry information of a representative nucleotide. Our experimental results have shown that iPARTS2 outperforms its previous version iPARTS and also achieves better accuracy than other popular tools, such as SARA, SETTER and RASS, in RNA alignment quality and function prediction. iPARTS2 takes as input two RNA 3D structures in the PDB format and outputs their global or local alignments with graphical display. iPARTS2 is now available online at http://genome.cs.nthu.edu.tw/iPARTS2/.
© The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2016        PMID: 27185896      PMCID: PMC4987943          DOI: 10.1093/nar/gkw412

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

In addition to transmitting genetic information from DNA to proteins, RNA performs a wide range of biological functions in cells, including catalysis, genetic control and molecular recognition (1). Because the functions of RNAs are largely determined by their diverse three-dimensional (3D) structures, tools that can efficiently and accurately compare two RNA 3D structures are important in computational structural biology. Several popular tools for aligning two RNA 3D structures have been proposed based on heuristic approaches, such as SARA (2,3), iPARTS (4), SETTER (5,6) and RASS (7,8). SARA and iPARTS take a similar approach: each reduces the 3D structures to one-dimensional (1D) sequences according to local structural features of the nucleotide backbone conformation (backbone unit vectors in SARA and backbone pseudo-torsion angles in iPARTS) and then applies traditional sequence alignment algorithms to the resulting 1D sequences (2–4). SETTER divides the RNA 3D structure into non-overlapping local structural units, called generalized secondary structure units (GSSUs), and then obtains a structural alignment using a comparison method based on a distance derived from RMSD (root mean square deviation) transformations between all possible pairs of GSSUs (5,6). RASS uses a method based on elastic shape analysis, which treats RNA structures as 3D curves with their 1D nucleotide sequences encoded on three additional dimensions, so that the structural alignment of two RNAs is performed in a joint sequence-structure space of six dimensions (7,8).
iPARTS (4) was implemented with the so-called structural alphabet (SA)-based approach, which uses an SA of 23 letters to reduce RNA 3D structures into 1D sequences of SA letters and applies traditional sequence alignment to these SA-encoded sequences to determine their global or local similarity. The accuracy of iPARTS depends largely on the quality of the SA, which was constructed from a list of 117 RNA 3D structures using the pseudo-torsion angles of their nucleotide backbones. It has been shown that for RNAs, two pseudo-torsion angles (η and θ) are sufficient to describe the backbone conformation of each nucleotide (9). In the 5 years since the introduction of iPARTS, several hundred new RNA 3D structures have been determined and deposited in the PDB/NDB databases (10,11). These newly determined structures should help us improve the accuracy of iPARTS by constructing a new SA of sufficiently high quality. In addition, as reported in the studies of RASS (7,8), both the 1D nucleotide sequences and the 3D structures of RNAs need to be taken into account when determining their functions: the 1D sequence carries the side-chain information of nucleotides, the 3D structure carries their backbone geometry information, and these two different types of information can both play important roles in determining RNA functions. In this study, we have re-implemented our previous tool iPARTS as a new web server named iPARTS2 (meaning iPARTS version 2) by constructing a totally new SA from a representative and sufficiently non-redundant list of 876 atomic-resolution RNA 3D structures with 65154 nucleotides in total (12). The new SA consists of 92 elements, each carrying information on both the base (1D) and the backbone geometry (3D) of a representative nucleotide.
This SA is significantly different from the one used in iPARTS, because the latter, constructed from 117 crystal RNA structures with 9527 nucleotides, consists of only 23 elements, each of which carries only the backbone geometry information of a representative nucleotide. As in iPARTS, we equip iPARTS2 with two ways of aligning two RNA 3D structures: (i) global alignment, which can be used to determine their overall structural similarity, and (ii) local alignment, which can be used to find their locally similar substructures. It is worth mentioning that the local alignment function distinguishes iPARTS2 from SARA, SETTER and RASS, all of which provide global alignment only. For validation, we used the benchmark dataset FSCOR, with 419 RNA 3D structures, to test iPARTS2 and to compare the accuracy of its global alignment with that of its previous version iPARTS, as well as that of the other popular tools SARA, SETTER and RASS. Our experimental results show that iPARTS2 outperforms its previous version iPARTS and also achieves better accuracy than SARA, SETTER and RASS in RNA alignment quality and function prediction.
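
To make the backbone pseudo-torsion angles η and θ mentioned above concrete, the following is a minimal Python sketch. It assumes the definition commonly used in the RNA literature (η from the atoms C4'(i−1), P(i), C4'(i), P(i+1) and θ from P(i), C4'(i), P(i+1), C4'(i+1)); the text above does not spell this definition out, and the function names `dihedral` and `pseudo_torsions` are illustrative, not part of iPARTS or iPARTS2.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)   # normals of the two planes
    b1_len = math.sqrt(dot(b1, b1))
    x = dot(n1, n2)
    y = b1_len * dot(b0, n2)
    return math.degrees(math.atan2(y, x))

def pseudo_torsions(p, c4, i):
    """Return (eta, theta) for nucleotide i.

    p[k] and c4[k] are the (x, y, z) coordinates of the P and C4' atoms of
    nucleotide k. Assumed definition (not stated in the text above):
      eta   = dihedral(C4'(i-1), P(i),   C4'(i),  P(i+1))
      theta = dihedral(P(i),     C4'(i), P(i+1),  C4'(i+1))
    Both angles need a neighbour on each side, so they are only defined
    for non-terminal nucleotides.
    """
    eta = dihedral(c4[i - 1], p[i], c4[i], p[i + 1])
    theta = dihedral(p[i], c4[i], p[i + 1], c4[i + 1])
    return eta, theta
```

Because both angles need atoms from the preceding and following nucleotides, a sketch like this can only be applied to non-terminal nucleotides of a chain.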

MATERIALS AND METHODS

In this study, we implemented iPARTS2 using an improved SA-based algorithm as follows. First, 63402 non-terminal nucleotides from the RNA 3D Hub non-redundant list (version 1.89) of 876 RNA 3D structures (12) were classified into 23 conformation clusters according to their backbone pseudo-torsion angles. Nucleotides in the same cluster are structurally similar in backbone geometry. Next, 23 capital letters were used to represent the center nucleotides of these 23 clusters, and for each letter, four different background colors were used to represent the four possible base types A, G, C and U of the corresponding center nucleotide. As a result, we constructed an SA of 92 elements, with each element (a letter on a colored background) carrying information on both the backbone geometry (letter) and the base (background color) of a representative nucleotide. Finally, the SA was used to reduce input RNA 3D structures into 1D SA-encoded sequences, and a traditional sequence alignment, such as global alignment (without penalty for end gaps) (13) or local alignment (14), was applied to them to determine their global or local similarity. In addition, to improve the accuracy of aligning two SA-encoded sequences, the statistical method proposed by Henikoff and Henikoff (15) was applied to derive a BLOSUM-like substitution matrix that rewards more similar SA-encoded sequences with higher scores. We refer the reader to the Supplementary Data for the details of this improved SA-based algorithm. It is worth mentioning that the local alignment algorithm used in iPARTS2 is slightly different from the one used in iPARTS: we used the technique described in (16) to modify the local alignment algorithm so that the local alignments returned by iPARTS2 are non-intersecting, where two alignments are said to be non-intersecting if they do not have a match or mismatch in common. Non-intersecting local alignments of RNA structures are usually of more practical interest to the user.
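
The encoding and alignment steps described above can be sketched in Python as follows. This is only an illustration under several assumptions: the 23 cluster letters are taken to be A to W, the scoring function `toy_score` is a hypothetical stand-in for the BLOSUM-like substitution matrix, and a simple linear gap penalty replaces the gap open and extension penalties of the server. None of these names or values come from the iPARTS2 implementation.

```python
# Assumption: the 23 backbone clusters are labelled A..W. The text only says
# "23 capital letters", so the actual letters used by iPARTS2 may differ.
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVW"
BASES = "AGCU"

def sa_index(letter, base):
    """Map a (backbone letter, base) pair to one of the 23 * 4 = 92 SA elements."""
    return LETTERS.index(letter) * len(BASES) + BASES.index(base)

def toy_score(p, q):
    """Hypothetical stand-in for the BLOSUM-like substitution matrix."""
    if p == q:
        return 2          # same backbone cluster and same base
    if p[0] == q[0]:
        return 1          # same backbone cluster, different base
    return -1

def end_gap_free_score(x, y, score=toy_score, gap=-2):
    """Best global alignment score when end gaps are not penalized.

    x and y are lists of (letter, base) pairs. A linear gap penalty is used
    here for brevity; it is a simplification of affine gap costs.
    """
    n, m = len(x), len(y)
    prev = [0] * (m + 1)              # row 0: leading end gaps are free
    best = prev[m]
    for i in range(1, n + 1):
        cur = [0] * (m + 1)           # column 0: leading end gaps are free
        for j in range(1, m + 1):
            cur[j] = max(prev[j - 1] + score(x[i - 1], y[j - 1]),
                         prev[j] + gap,
                         cur[j - 1] + gap)
        best = max(best, cur[m])      # trailing end gap: stop in the last column
        prev = cur
    return max(best, max(prev))       # trailing end gap: stop in the last row

x = [("A", "G"), ("B", "C"), ("C", "U")]
y = [("K", "A")] + x + [("K", "A")]
print(end_gap_free_score(x, y))       # prints 6: three exact matches, free end gaps
```

Representing each element as a (letter, base) pair keeps the two kinds of information separate, which is what allows a scoring scheme to reward a shared backbone cluster even when the bases differ.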

USAGE OF iPARTS2

The kernel algorithms of iPARTS2 were written in PHP. iPARTS2 can be accessed through an easy-to-operate web interface, as illustrated in Figure 1. It provides the user with two kinds of alignments for comparing two RNA 3D structures: (i) global alignment for determining their overall structural similarity, and (ii) local alignment for finding common similar substructures. iPARTS2 takes as input two RNA 3D structures, each of which can be either a PDB/NDB ID or a PDB file uploaded by the user, their chain IDs if they have multiple chains, and optionally the starting and ending residue numbers of the substructures to be aligned. If required, the user can change the default settings of all the parameters, including the alignment method (global alignment by default), the gap open and extension penalties (−9 and −1 by default, respectively), and the number of suboptimal alignments (at least one). In the output page, iPARTS2 first shows the details of the input RNA molecules and the user-specified parameters. Next, it shows its running time and its alignment results, including the structural alignment score (SAS) (refer to the ‘Experimental Results’ section for its definition) between the input RNA 3D structures with the corresponding raw score in parentheses, the number of aligned nucleotide pairs, the RMSD, and the optimal/suboptimal alignments of their SA-encoded sequences and corresponding RNA sequences. Each letter in the aligned SA-encoded sequences is displayed on a colored background, which indicates the base type (A, G, C or U) of the corresponding nucleotide. Finally, iPARTS2 shows a JSmol graphical display (which requires no Java plugin) of the aligned RNA 3D structures, so that the user can view, rotate and enlarge the 3D structures of the input RNA molecules and their structural superposition, and download their alignment and PDB files. In the JSmol visualization, end-gap residues in a global alignment and non-aligned residues in a local alignment are displayed in light colors.
Figure 1. The web interface of iPARTS2.

EXPERIMENTAL RESULTS

First, we tested iPARTS2 by running its global alignment on a benchmark dataset called FSCOR and evaluated its accuracy in function assignment by comparing its receiver operating characteristic (ROC) curve with those obtained by iPARTS (4) and other popular tools, including SARA (2,3) and SETTER (5,6). The FSCOR dataset, originally proposed in (3), contains 419 RNA 3D structures that are classified into 168 functional classes. We ran all the tools mentioned above locally, aligning all 87571 pairs of RNA 3D structures in the FSCOR dataset. To take the quality of the structural alignments into account, the ROC curves of all the tools were computed based on a geometric match measure called SAS, defined as (RMSD × 100)/(number of aligned nucleotide pairs) (17,18), instead of the native alignment score. The reason is that, as suggested in (18), a better structural alignment should match more residues and also have a lower RMSD, so a lower SAS indicates a better alignment, and SAS separates good structural alignments from less good ones better than the native alignment score does. Two RNA structures in the FSCOR dataset are said to be functionally identical if they have the same deepest SCOR classification (i.e. their geodesic distance d = 0) or functionally similar if they differ at most in the deepest SCOR classification (i.e. d ≤ 2). To obtain the ROC curve of each tool, the alignments of all pairs of RNA structures computed by the tool are sorted by their SAS values. A threshold of SAS is then varied between the minimum and maximum of the sorted SAS values to produce the points of the ROC curve. For a fixed threshold, all pairs of aligned RNA structures whose SAS values are below the threshold are assumed positive and all above it negative. The pairs assumed positive are counted as true positives (TP) if they are functionally identical (d = 0) or similar (d ≤ 2) and as false positives (FP) otherwise; the pairs assumed negative are counted as true negatives (TN) if they are functionally non-identical (d > 0) or dissimilar (d > 2) and as false negatives (FN) otherwise. The point of the ROC curve corresponding to the fixed threshold is then produced by plotting its TP rate TP/(TP + FN) on the y-axis and its FP rate FP/(FP + TN) on the x-axis. The resulting ROC curves for all the evaluated tools are displayed in Figures 2 and 3 for d = 0 and d ≤ 2, respectively. These results show that iPARTS2 outperforms its previous version iPARTS and the other tools SARA and SETTER for function assignment in the FSCOR dataset, because iPARTS2 has the highest area under the ROC curve (AUC) values, 0.914 and 0.772 for d = 0 and d ≤ 2, respectively.
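
The evaluation procedure described above can be sketched in Python as follows. The sketch assumes that a lower SAS means a better alignment (lower RMSD over more aligned nucleotide pairs), so pairs are called positive in ascending order of SAS. The function names `sas`, `roc_points` and `auc` are illustrative, and the example values are made up rather than taken from the FSCOR experiments.

```python
def sas(rmsd, n_aligned):
    """Structural alignment score: (RMSD x 100) / number of aligned pairs."""
    return rmsd * 100.0 / n_aligned

def roc_points(scores, labels):
    """ROC points (FP rate, TP rate) obtained by sweeping a SAS threshold.

    scores: SAS value of each aligned pair (lower is assumed better).
    labels: True if the pair is functionally identical/similar, else False.
    Requires at least one positive and one negative label.
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, lab in pairs if lab)
    n_neg = len(pairs) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        # Pairs with the same SAS cross the threshold together.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Made-up example: four aligned pairs with their SAS values and labels.
example_scores = [sas(2.0, 50), sas(3.0, 50), sas(4.0, 40), sas(6.0, 30)]
example_labels = [True, False, True, False]
print(auc(roc_points(example_scores, example_labels)))  # prints 0.75
```

Grouping tied SAS values into a single threshold step avoids giving the curve an arbitrary shape when several pairs share the same score.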
Figure 2. ROC curves for d = 0 based on the SAS values of all aligned pairs of RNA 3D structures in the FSCOR dataset, where the AUC values of iPARTS, SARA, SETTER and iPARTS2 are 0.861, 0.883, 0.843 and 0.914, respectively. Note that the AUC value of RASS computed by using the 67006 pairs of RNA 3D structures is 0.892.

Figure 3. ROC curves for d ≤ 2 based on the SAS values of all aligned pairs of RNA 3D structures in the FSCOR dataset, where the AUC values of iPARTS, SARA, SETTER and iPARTS2 are 0.740, 0.761, 0.713 and 0.772, respectively. Note that the AUC value of RASS computed by using the 67006 pairs of RNA 3D structures is 0.758.

Next, we compared the capabilities of iPARTS, SARA, SETTER and iPARTS2 for function assignment with those of RASS (7,8) using the FSCOR dataset. As mentioned before, RASS is a recently developed tool that compares two RNAs by considering information on both their sequences (bases) and their 3D structures (backbone geometry). When running RASS on the FSCOR dataset, however, we noticed that for 20565 pairs among the 419 RNA 3D structures, RASS was not able to provide structural alignments, so their SAS values could not be computed. Therefore, for a fair comparison of all the evaluated tools, we calculated their ROC curves using only the 67006 pairs of RNA 3D structures for which RASS provided structural alignments. In this situation, iPARTS2 still performs better than all other tools, including RASS, according to the AUC values of their ROC curves (refer to Supplementary Figures S5 and S6). For the results of additional experiments, we refer the reader to the Supplementary Data.
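
The pair counts quoted above follow from simple arithmetic, which the short Python check below reproduces. It assumes only that every unordered pair of distinct FSCOR structures is aligned once; the variable names are illustrative.

```python
from math import comb

n_structures = 419                      # RNA 3D structures in FSCOR
all_pairs = comb(n_structures, 2)       # unordered pairs of distinct structures
rass_failed = 20565                     # pairs for which RASS gave no alignment
rass_usable = all_pairs - rass_failed   # pairs used for the comparison with RASS

print(all_pairs)    # prints 87571
print(rass_usable)  # prints 67006
```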
Finally, to compare the running times of all the tools mentioned above, we used five datasets containing two or more RNA 3D structures of various lengths: (i) five tRNA structures (1EHZ:A, 1H3E:B, 1I9V:A, 2TRA:A and 1YFG:A) with an average length of 76 nucleotides, (ii) three ribozyme P4-P6 domains (1GID:A, 1HR2:A and 1L8V:A) with an average length of 157 nucleotides, (iii) two domains V of 23S rRNA (1FFZ:A and 1FG0:A) with an average length of 496 nucleotides, (iv) two 16S rRNAs (1J5E:A and 4V4Q:AA) with an average length of 1522 nucleotides and (v) two 25S rRNAs (4V7R:B1 and 4V7R:D1) with an average length of 3396 nucleotides. The average running times of all the tools were obtained by running them with their default parameters on a local machine with 3.4 GHz Intel CPUs and 32 GB of RAM under Linux. As shown in Table 1, SETTER is the fastest of the five tools. However, iPARTS2, like iPARTS, is faster than both SARA and RASS on every dataset, and it finishes an alignment in anywhere from under a second to about three minutes.
Table 1. Comparison of average running times for iPARTS, SARA, SETTER, RASS and iPARTS2

Dataset                 iPARTS    SARA       SETTER   RASS      iPARTS2
tRNA                    0.30 s    0.83 s     0.08 s   1.52 s    0.27 s
Ribozyme P4-P6 domain   0.64 s    5.21 s     0.10 s   3.46 s    0.65 s
Domain V of 23S rRNA    3.44 s    1.87 min   0.81 s   9.17 s    3.79 s
16S rRNA                38.16 s   46.53 min  5.30 s   48.09 s   36.69 s
25S rRNA                2.92 min  6.65 h     17.54 s  5.60 min  3.13 min
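
Because Table 1 mixes seconds, minutes and hours, the running times are easier to compare after conversion to a common unit. The Python sketch below does this for the values in Table 1; the helper `to_seconds` and the dictionary layout are illustrative choices, not part of any of the tools.

```python
UNIT_SECONDS = {"s": 1.0, "min": 60.0, "h": 3600.0}

def to_seconds(text):
    """Convert a Table 1 entry such as '1.87 min' to seconds."""
    value, unit = text.split()
    return float(value) * UNIT_SECONDS[unit]

TOOLS = ["iPARTS", "SARA", "SETTER", "RASS", "iPARTS2"]

# Values copied from Table 1, in the column order given by TOOLS.
TABLE_1 = {
    "tRNA": ["0.30 s", "0.83 s", "0.08 s", "1.52 s", "0.27 s"],
    "Ribozyme P4-P6 domain": ["0.64 s", "5.21 s", "0.10 s", "3.46 s", "0.65 s"],
    "Domain V of 23S rRNA": ["3.44 s", "1.87 min", "0.81 s", "9.17 s", "3.79 s"],
    "16S rRNA": ["38.16 s", "46.53 min", "5.30 s", "48.09 s", "36.69 s"],
    "25S rRNA": ["2.92 min", "6.65 h", "17.54 s", "5.60 min", "3.13 min"],
}

# Running times in seconds, keyed by dataset and then by tool.
SECONDS = {
    dataset: {tool: to_seconds(entry) for tool, entry in zip(TOOLS, row)}
    for dataset, row in TABLE_1.items()
}

for dataset, times in SECONDS.items():
    fastest = min(times, key=times.get)
    ratio = times["SARA"] / times["iPARTS2"]
    print(f"{dataset}: fastest = {fastest}, SARA / iPARTS2 = {ratio:.1f}x")
```

Running the loop shows how the gap between SARA and iPARTS2 widens as the structures get longer, which is harder to see from the mixed units in the table.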

SUMMARY

In this study, we have re-implemented our previous tool iPARTS into a new web server iPARTS2 by constructing a totally new SA of 92 elements, with each element carrying both information of base (1D) and backbone geometry (3D) for a representative nucleotide. According to our experimental results on a benchmark dataset, iPARTS2 indeed outperforms iPARTS and also achieves better accuracy than other popular tools, such as SARA, SETTER and RASS, in RNA alignment quality and function prediction. Therefore, iPARTS2 can serve as a useful tool for aligning two RNA 3D structures, which can further provide insight into structural and functional properties of RNAs.
  16 in total

1.  The Protein Data Bank.

Authors:  H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Amino acid substitution matrices from protein blocks.

Authors:  S Henikoff; J G Henikoff
Journal:  Proc Natl Acad Sci U S A       Date:  1992-11-15       Impact factor: 11.205

3.  Evaluating and learning from RNA pseudotorsional space: quantitative validation of a reduced representation for RNA structure.

Authors:  Leven M Wadley; Kevin S Keating; Carlos M Duarte; Anna Marie Pyle
Journal:  J Mol Biol       Date:  2007-06-27       Impact factor: 5.469

4.  RNA structure alignment by a unit-vector approach.

Authors:  Emidio Capriotti; Marc A Marti-Renom
Journal:  Bioinformatics       Date:  2008-08-15       Impact factor: 6.937

5.  A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisons.

Authors:  M S Waterman; M Eggert
Journal:  J Mol Biol       Date:  1987-10-20       Impact factor: 5.469

6.  iPARTS: an improved tool of pairwise alignment of RNA tertiary structures.

Authors:  Chih-Wei Wang; Kun-Tze Chen; Chin Lung Lu
Journal:  Nucleic Acids Res       Date:  2010-05-27       Impact factor: 16.971

7.  SETTER: web server for RNA structure comparison.

Authors:  Petr Cech; Daniel Svozil; David Hoksza
Journal:  Nucleic Acids Res       Date:  2012-06-11       Impact factor: 16.971

8.  RASS: a web server for RNA alignment in the joint sequence-structure space.

Authors:  Gewen He; Albert Steppi; Jose Laborde; Anuj Srivastava; Peixiang Zhao; Jinfeng Zhang
Journal:  Nucleic Acids Res       Date:  2014-05-15       Impact factor: 16.971

9.  SARA: a server for function annotation of RNA structures.

Authors:  Emidio Capriotti; Marc A Marti-Renom
Journal:  Nucleic Acids Res       Date:  2009-05-29       Impact factor: 16.971

10.  RNA global alignment in the joint sequence-structure space using elastic shape analysis.

Authors:  Jose Laborde; Daniel Robinson; Anuj Srivastava; Eric Klassen; Jinfeng Zhang
Journal:  Nucleic Acids Res       Date:  2013-04-12       Impact factor: 16.971
