
CLICK--topology-independent comparison of biomolecular 3D structures.

Abstract

Our server, CLICK: http://mspc.bii.a-star.edu.sg/click, is capable of superimposing the 3D structures of any pair of biomolecules (proteins, DNA, RNA, etc.). The server makes use of the Cartesian coordinates of the molecules with the option of using other structural features such as secondary structure, solvent accessible surface area and residue depth to guide the alignment. CLICK first looks for cliques of points (3-7 residues) that are structurally similar in the pair of structures to be aligned. Using these local similarities, a one-to-one equivalence is charted between the residues of the two structures. A least-squares fit then superimposes the two structures. Our method is especially powerful in establishing protein relationships by detecting similarities in structural subdomains, domains and topological variants. CLICK has been extensively benchmarked and compared with other popular methods for protein and RNA structural alignments. In most cases, CLICK alignments were statistically significantly better in terms of structure overlap. The method also recognizes conformational changes that may have occurred in structural domains or subdomains in one structure with respect to the other. For this purpose, the server produces complementary alignments to maximize the extent of detectable similarity. Various examples showcase the utility of our web server.


Year: 2011 PMID： 21602266 PMCID： PMC3125785 DOI： 10.1093/nar/gkr393

Source DB: PubMed Journal: Nucleic Acids Res ISSN： 0305-1048 Impact factor: 16.971

INTRODUCTION

The 3D structures of biomolecules at near atomic-level resolution often give us unique insights into their evolution and function. This has been extensively studied for molecules such as proteins, where similarity in 3D structure often implies homology (1–4). Given the rapid pace with which new structures are deposited in the PDB (5), it is crucial to have tools to classify and categorize these structures and investigate them for similarities at different levels. In the case of proteins, it has been beneficial to have categorization based on 3D-fold types that follow the primary sequence order (2–4). There are, however, several structural features whose similarities do not follow primary sequence order. Detecting these similarities in topologically different structures establishes new relationships between proteins in the different categories mentioned above. Frequently, these new relationships are of functional significance (6). The functionality of a biomolecule depends on the spatial orientation of its chemically diverse atoms. Sometimes different topologies result in similar or identical functionality (6–8). Methods (1,9–20) that align a pair of structures imposing constraints on sequence order and topology may be inadequate to establish such functional similarities. To establish these relationships one needs to make use of non-sequential and non-topological protein structure matching programs (21–23).

Here, we report on a web server that uses the CLICK algorithm (24) to align the 3D structures of any pair of biomolecules, independent of topology. The CLICK algorithm aligns 3D structures by matching cliques of points. Cliques are groupings of representative atoms of the biomolecules within a certain spatial proximity. Every pairwise distance among clique members is less than a set threshold. Points in a clique could also be assigned values for features such as secondary structure, solvent accessibility and depth. A pair of biomolecules is structurally aligned by matching cliques with similar features. In general, any pair of constellations of points can be aligned using CLICK, and thus comparison between different types of biomolecules (for example, DNA with RNA) is also possible. To the best of our knowledge, this is the first web server equipped with this capability. We hope that our server is useful for a wide range of biomolecular structural analyses, especially in detecting conformational change, similarity of structural motifs (both local and global) and evolutionary relationships.

METHOD

Briefly, the CLICK algorithm consists of four sequential steps (for a comprehensive description of the method, see Supplementary Material).

Extracting features

Residues of a biomolecule are featured by the Cartesian coordinates of one or more representative atoms. If the biomolecules in question are proteins, additional structural features such as side-chain solvent accessibility, secondary structure and residue depth (25–26) are computed.
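As an illustration of the feature-extraction step, the sketch below pulls one representative atom per residue out of PDB-format text. It is not the CLICK implementation: the function name `representative_atoms` and its return layout are hypothetical, only ATOM records of the first model are read, and the additional protein features (solvent accessibility, secondary structure, depth) are not computed here.

```python
def representative_atoms(pdb_lines, atom_name="CA", chain=None):
    """Collect (chain, residue number, residue name, (x, y, z)) tuples.

    Hypothetical helper: reads fixed-column ATOM records and keeps the
    atoms whose name equals `atom_name` (for example "CA" for proteins
    or "C3'" for nucleic acids).
    """
    atoms = []
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # assumption: only the first model is of interest
        if not line.startswith("ATOM  "):
            continue
        if line[12:16].strip() != atom_name:
            continue
        if line[16] not in (" ", "A"):
            continue  # skip alternate locations other than the first
        if chain is not None and line[21] != chain:
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((line[21], int(line[22:26]), line[17:20].strip(), xyz))
    return atoms
```

Calling `representative_atoms(open("structure.pdb"), atom_name="CA", chain="A")` on a local file (the file name is a placeholder) would return the Cα trace of chain A as a list of tuples.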

Forming cliques

We define an n-body clique as a subset of n points, where the Euclidean distance between any pair within the clique is within a predefined threshold. For each of the two structures to be compared, all possible n-body (n in the range of 3–7) cliques are computed from the representative atoms.
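The clique definition above can be sketched as a small enumeration routine. This is a minimal illustration under stated assumptions, not the authors' code: the function name `find_cliques` is hypothetical, the threshold value is left to the caller because the text does not state it, and a strict "less than" comparison is assumed.

```python
from math import dist


def find_cliques(points, threshold, n):
    """Return all index tuples of size n whose pairwise distances are
    all below `threshold`. Indices in each tuple are in increasing order."""
    count = len(points)
    # Neighbour sets: which points lie within the threshold of point i.
    close = {
        i: {j for j in range(count)
            if j != i and dist(points[i], points[j]) < threshold}
        for i in range(count)
    }
    cliques = [(i,) for i in range(count)]
    for _ in range(n - 1):
        grown = []
        for clique in cliques:
            # A point can extend the clique only if it is close to every member.
            candidates = set.intersection(*(close[i] for i in clique))
            grown.extend(clique + (j,) for j in sorted(candidates)
                         if j > clique[-1])
        cliques = grown
    return cliques
```

For the range described in the text, one would call `find_cliques(points, threshold, n)` for each n from 3 to 7. Growing cliques one member at a time avoids testing every n-subset of the structure, which matters because the number of subsets rises steeply with n.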

Clique matching

Cliques are matched by the superimposition of their Cartesian coordinates subject to the matching of other features. The objective here is to establish local structural similarities.
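The sketch below illustrates the idea of comparing two cliques of equal size while respecting per-point features. It deliberately substitutes a simpler criterion for the coordinate superimposition described above: the deviation between the two intra-clique distance matrices (dRMSD), which needs no rotation but cannot tell mirror images apart. The function names, the `cutoff` value and the use of one-letter feature labels are all assumptions made for illustration.

```python
from itertools import combinations, permutations
from math import dist, sqrt


def clique_drmsd(a, b):
    """Root mean square difference between corresponding pairwise distances."""
    pairs = list(combinations(range(len(a)), 2))
    total = sum((dist(a[i], a[j]) - dist(b[i], b[j])) ** 2 for i, j in pairs)
    return sqrt(total / len(pairs))


def best_clique_match(a, b, feat_a=None, feat_b=None, cutoff=0.6):
    """Find the ordering of clique b that best matches clique a.

    Returns (score, permutation) or None when no ordering passes the
    feature check and the (hypothetical) dRMSD cutoff. Both cliques
    must contain the same number of points.
    """
    best = None
    for perm in permutations(range(len(b))):
        # Matched points must carry identical feature labels, if given.
        if feat_a is not None and any(
                feat_a[i] != feat_b[p] for i, p in enumerate(perm)):
            continue
        score = clique_drmsd(a, [b[p] for p in perm])
        if score <= cutoff and (best is None or score < best[0]):
            best = (score, perm)
    return best
```

Trying every permutation is affordable only because cliques are small (at most 7 points, so at most 5040 orderings per pair of cliques).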

Alignment

Clique matching identifies structurally equivalent residues in the two structures. Using these equivalences, a final 3D least-squares fit is performed to superimpose the two structures. The matching of cliques is not necessarily unique, i.e. multiple structural alignments can be generated. The chosen alignment is the one that maximizes structure overlap.
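A least-squares fit over a set of equivalent points can be computed in closed form. The sketch below uses the Kabsch procedure (singular value decomposition of the covariance matrix) with NumPy; the text does not say which least-squares procedure CLICK uses, so this choice and the function name `superpose` are assumptions.

```python
import numpy as np


def superpose(mobile, target):
    """Rigidly fit `mobile` onto `target` (both N x 3, rows are matched pairs).

    Returns the transformed copy of `mobile` and the RMSD after fitting.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    p = mobile - mobile_center
    q = target - target_center
    # Covariance between the two centred point sets.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip one axis if needed so the result is a rotation, not a reflection.
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    fitted = p @ rotation.T + target_center
    rmsd = float(np.sqrt(((fitted - target) ** 2).sum(axis=1).mean()))
    return fitted, rmsd
```

When several candidate equivalence sets exist, each could be fitted this way and the resulting superpositions compared by structure overlap, in line with the selection rule stated above.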

USER PERSPECTIVE

Input

The web server is freely accessible at http://mspc.bii.a-star.edu.sg/click without login requirements. Input biomolecular structures can be submitted by specifying the four-letter code for existing structures deposited in the PDB, or by uploading structures in PDB format. In addition to the four-letter code, users can specify the identity of particular chains from the two structures. This specification is however optional as CLICK produces alignments irrespective of the number of chains in the PDB structure. By default, it considers all residues in the PDB file. Users are given the options to select one or more representative atoms (default: Cα atom) for individual residues of the molecule. For protein structure alignments, residue solvent-accessible surface area, secondary structure and depth information can be included as structural features to guide the alignment. These additional features are not yet operable for aligning other biomolecular structures. When four-letter PDB codes are specified, by default solvent-accessible surface area and secondary structure features are selected.

Output

The structure alignment is shown on two lines, one line per structure. The number and chain identifier of the first aligned residue on both structures precede the listing of the residue one-letter codes. Each time the alignment fragments (probably because of topological differences), the number and chain identifier of the last residue in the fragments are also listed. Accompanying each alignment is a 3D rendition of the structural superimposition using Jmol (Figures 1 and 2).

Figure 1.

A snapshot of the server showing the output of a structural alignment of two topologically different yet structurally similar proteins that belong to different SCOP families, PDB codes 1iwm:A (salmon and SCOP entry: b.125.1.2) and 1oxe:A (green and SCOP entry: d.22.1.1), according to CLICK. (A) 3D representation of the superimposition is shown in an embedded Jmol viewer. (B) The measures of alignment accuracy such as structure overlap (coverage), RMSD, sequence identity, topology score and fragment score. (C) The sequence alignment between the two proteins as inferred from the structural alignment. The conserved residues are shown in bold and red lettering. (D) Download links to the resulting sequence alignment in PIR format, the detailed alignment and matched residue (atom) pairs in text format, as well as the link to download the superimposed coordinates of the two structures, in PDB format.

Figure 2.

Another snapshot of the server showing RNA structure alignments (PDB codes 1l2x:A, colored salmon, and 2c4y:S, colored green).

Should there be conformational changes in the proteins being compared, CLICK first reports the largest alignment, in terms of number of residues aligned. The method then seeks to compare the regions of the proteins that were not aligned first. The detection of further structural similarity results in additional output alignments, shown one below the other in the aforementioned format. The alignments are downloadable in PIR (Protein Information Resource) format and in CLICK format that shows one equivalent representative atom match per line. Also downloadable are the coordinates of the superimposed structures in PDB format. Statistics relevant to the alignment including structure overlap, root mean square deviation (RMSD), fragment score, topology score, number of representative atoms in the two structures, length of the match and the number of identical residue matches are displayed in a table. Detailed help pages explain the significance of the different alignment measures. Users can also download an executable file of the CLICK program along with associated library files used in the server.
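Some of the tabulated statistics can be reproduced from the matched atom pairs of a superimposed alignment. The sketch below is illustrative only: the text does not define structure overlap precisely, so the definition used here (percentage of representative atoms of the smaller structure whose matched partner lies within a distance cutoff, with 3.5 Å as a default) is an assumption, as are the function name and argument layout. Fragment and topology scores are not covered.

```python
from math import dist, sqrt


def alignment_stats(coords_a, coords_b, seq_a, seq_b,
                    n_atoms_a, n_atoms_b, cutoff=3.5):
    """Summarize a superimposed alignment given its matched pairs.

    coords_a / coords_b: matched representative atoms after superposition.
    seq_a / seq_b: one-letter residue codes of the matched pairs.
    n_atoms_a / n_atoms_b: representative atoms in each whole structure.
    """
    if not (len(coords_a) == len(coords_b) == len(seq_a) == len(seq_b)):
        raise ValueError("matched pairs must have equal lengths")
    if not coords_a:
        raise ValueError("at least one matched pair is required")
    distances = [dist(p, q) for p, q in zip(coords_a, coords_b)]
    within = sum(1 for d in distances if d <= cutoff)
    return {
        "match_length": len(distances),
        "rmsd": sqrt(sum(d * d for d in distances) / len(distances)),
        # Assumed definition; see the lead-in paragraph.
        "structure_overlap": 100.0 * within / min(n_atoms_a, n_atoms_b),
        "identical_residues": sum(1 for x, y in zip(seq_a, seq_b) if x == y),
    }
```

For example, four matched pairs at distances of 1, 1, 1 and 5 Å, from structures with 6 and 10 representative atoms, give an RMSD of √7 ≈ 2.65 Å and, under the assumed definition, a structure overlap of 50%.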

Examples

We demonstrate the versatility of the server with five different types of alignments. (i) A conventional structural alignment between a pair of proteins (PDB codes 1ANG and 7RSA) that are ∼35% identical in sequence and similar in overall topology. (ii) An alignment between two protein structures (1UBP chains A, B and C; and 1E9Y chains A and B) with multiple chains. Chains A and B of 1UBP superimpose onto chain A of 1E9Y, and chain C of 1UBP superimposes onto chain B of 1E9Y. This showcases the ability of CLICK to align structures regardless of chain breaks. (iii) An alignment between two proteins (PDB codes 1IWM chain A and 1OXE chain A) that are structurally similar and topologically different. The fragmented alignment is a result of the different topology. (iv) Multidomain proteins, where individual domains in one structure (PDB 1AIV) have a structural equivalence in the other (PDB 1OVT), but the relative orientations between domains differ in the two structures. Domain swapping and rigid body shift detection belong to this category. (v) An alignment showcasing the general utility of CLICK. Two DNA double-helical fragments bound to proteins (PDB codes 1YSA and 2AYG) are aligned with one another. The representative atom used in this instance was C3′. In principle, for such examples, representative atoms could have been used for amino acid and nucleotide residues at the same time.

IMPLEMENTATION

The program run time increases with both the sizes of the input structures and the number of best-matched cliques. On average, CLICK took 1 s to perform a single alignment of a pair of proteins, each of size ∼150 residues, on an Ubuntu 8.04 Linux platform with a 3.00 GHz CPU (Core 2 Duo E8400) and 3.5 GB primary memory.

PERFORMANCE

The performance of CLICK was compared with other popular structural alignment methods. For protein structure comparisons, three different data sets consisting of 9357, 64 and 89 pairs of structures were used. For details on each of these data sets, refer to Supplementary Tables S1a and b, S2a and b, and S3a and b. Over each of these datasets, the alignment accuracy of CLICK was compared with other popular protein structure alignment programs including MUSTANG (21), Geometric Hashing (C-alpha Match) (27–29), SALIGN (19), DALI (22,30) and alignments from the HOMSTRAD database (4). All these programs were run using default parameters, and no effort was made to adjust the parameters for specific cases. With the exception of SALIGN, CLICK alignments are statistically significantly better than those of the other methods compared in terms of structure overlap. In all of these datasets, the structure overlap from CLICK alignments was never below 40%. CLICK was also compared with programs that align RNA 3D structures, including ARTS (31,32) and SARA (33,34). See Supplementary Tables S4a and b for details. The structure overlap of CLICK in this comparison was also statistically significantly better.

CONCLUSION

The CLICK method formalizes the biomolecular structure superimposition problem as one of feature-point matching. The features include Cartesian coordinates, solvent-accessible surface area and residue depth. The method is flexible and can be easily implemented with other features considered important in different contexts. We have extensively benchmarked the method over various protein and RNA structure data sets. The accuracy of CLICK alignments, in terms of structure overlap, is on par with or statistically significantly better than several other existing methods for protein and RNA alignments. CLICK performs structural superposition on pairs of structures based on similarity of local structural packing, and thus is capable of aligning structures with dissimilar topologies, conformations or even molecular types. These unique properties make CLICK a utilitarian tool for detecting divergent evolution due to topology changes, convergent evolution where substructures of proteins are similar to one another, and conformational change such as domain swap and rigid-body shift where the relative orientation of domains changes. This server now sets the stage for interesting investigations including topology-independent structural motif detection, biomolecular structure design and super-secondary structure classifications in not just proteins but also in molecules such as RNA, DNA, etc.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Funding for open access charge: Biomedical Research Council (A*STAR), Singapore.

Conflict of interest statement. None declared.

33 in total

1. Residue depth: a novel parameter for the analysis of protein structure and stability.

Authors: S Chakravarty; R Varadarajan
Journal: Structure Date: 1999-07-15 Impact factor: 5.006

2. ARTS: alignment of RNA tertiary structures.

Authors: Oranit Dror; Ruth Nussinov; Haim Wolfson
Journal: Bioinformatics Date: 2005-09-01 Impact factor: 6.937

3. MUSTANG: a multiple structural alignment algorithm.

Authors: Arun S Konagurthu; James C Whisstock; Peter J Stuckey; Arthur M Lesk
Journal: Proteins Date: 2006-08-15

4. Using an alignment of fragment strings for comparing protein structures.

Authors: Iddo Friedberg; Tim Harder; Rachel Kolodny; Einat Sitbon; Zhanwen Li; Adam Godzik
Journal: Bioinformatics Date: 2007-01-15 Impact factor: 6.937

5. Structural search and retrieval using a tableau representation of protein folding patterns.

Authors: Arun S Konagurthu; Peter J Stuckey; Arthur M Lesk
Journal: Bioinformatics Date: 2008-01-05 Impact factor: 6.937

6. Architectures and functional coverage of protein-protein interfaces.

Authors: Nurcan Tuncbag; Attila Gursoy; Emre Guney; Ruth Nussinov; Ozlem Keskin
Journal: J Mol Biol Date: 2008-05-06 Impact factor: 5.469

7. RNA structure alignment by a unit-vector approach.

Authors: Emidio Capriotti; Marc A Marti-Renom
Journal: Bioinformatics Date: 2008-08-15 Impact factor: 6.937

8. SISYPHUS--structural alignments for proteins with non-trivial relationships.

Authors: Antonina Andreeva; Andreas Prlić; Tim J P Hubbard; Alexey G Murzin
Journal: Nucleic Acids Res Date: 2006-10-26 Impact factor: 16.971

9. TOPOFIT-DB, a database of protein structural alignments based on the TOPOFIT method.

Authors: Chesley M Leslin; Alexej Abyzov; Valentin A Ilyin
Journal: Nucleic Acids Res Date: 2006-10-25 Impact factor: 16.971

10. A comprehensive analysis of non-sequential alignments between all protein structures.

Authors: Alexej Abyzov; Valentin A Ilyin
Journal: BMC Struct Biol Date: 2007-11-16
