Omokage search: shape similarity search service for biomolecular structures in both the PDB and EMDB.

Hirofumi Suzuki, Takeshi Kawabata, Haruki Nakamura.

Abstract

Omokage search is a service for searching the global shape similarity of biological macromolecules and their assemblies in both the Protein Data Bank (PDB) and the Electron Microscopy Data Bank (EMDB). The server compares the global shapes of assemblies independently of sequence order and number of subunits. As a search query, the user inputs a structure ID (PDB ID or EMDB ID) or uploads an atomic model or 3D density map to the server. The search usually completes within 1 min, using one-dimensional profiles (incremental distance rank profiles) to characterize the shapes. Using the gmfit (Gaussian mixture model fitting) program, the structures found are fitted onto the query structure, and the superimposed structures are displayed in the Web browser. Our service provides new structural perspectives to life science researchers.
AVAILABILITY AND IMPLEMENTATION: Omokage search is freely accessible at http://pdbj.org/omokage/.
© The Author 2015. Published by Oxford University Press.

Year:  2015        PMID: 26508754      PMCID: PMC4743628          DOI: 10.1093/bioinformatics/btv614

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Electron microscopy of cellular macromolecules provides 3D density maps of many important molecular machines. More than 3000 density maps are now stored in the Electron Microscopy Data Bank (EMDB) (Lawson et al.). Atomic models, obtained by X-ray crystallography and the current hybrid method, are also available in the Protein Data Bank (PDB) (Berman et al.). We have been providing Web-based services, EM Navigator and Yorodumi, for both databanks (Kinjo et al.). Shape comparisons among these 3D density maps and atomic models facilitate the elucidation of structural differences and conformational changes, and the generation of atomic models from the density maps. However, very few Web services search for 3D density maps whose shapes are similar to those of atomic models. The Web server EM-SURFER (Esquivel-Rodriguez et al.) was recently developed for searching 3D density maps. However, it handles only the 3D density maps in the EMDB, not the atomic models in the PDB, and it does not provide 3D superimpositions. Here, we describe our new search service, Omokage search, which is based on the global shape similarity of the structure data in both the PDB and EMDB. Our server provides superimposed structures, computed with the program gmfit, so that users can visually assess the similarities from the 3D superimpositions.

2 Methods

2.1 Similarity search

For a fast search through the large dataset, the comparison is performed using one-dimensional (1D) profiles generated from 3D point models. We employ the vector quantization method to convert both density maps and atomic models into 3D point models, and use the Situs software for the conversion (Wriggers et al.). Four 1D profiles are generated from each 3D point model. Three of them are based on the distances between pairs of 3D points (incremental distance rank profiles). The fourth is based on the principal component analysis (PCA) of the 3D point model. Details of the procedure are described in the Supplementary Data.
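The idea of a distance-based 1D profile can be illustrated with a minimal sketch. It is not the procedure of the Supplementary Data: the sampling scheme (sorted pair distances read off at evenly spaced ranks), the function names `distance_rank_profile` and `profile_distance`, and the mean absolute difference used as a score are all assumptions made for illustration. The sketch only shows why such a profile is independent of the position and orientation of the point model.

```python
import math
from itertools import combinations


def distance_rank_profile(points, n_bins=8):
    """Hypothetical profile: sorted pair distances sampled at evenly spaced ranks.

    `points` is a sequence of (x, y, z) tuples from a 3D point model.
    The result has a fixed length `n_bins` regardless of the number of points.
    """
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    dists = sorted(math.dist(p, q) for p, q in combinations(points, 2))
    if not dists:
        raise ValueError("need at least two points")
    last = len(dists) - 1
    return [dists[round(i * last / (n_bins - 1))] for i in range(n_bins)]


def profile_distance(a, b):
    """Mean absolute difference of two equal-length profiles (smaller = more similar)."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# A unit square, a translated copy, and a copy scaled by a factor of two.
square = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
moved = [(x + 5, y + 5, z + 5) for x, y, z in square]
bigger = [(2 * x, 2 * y, 2 * z) for x, y, z in square]

p_square = distance_rank_profile(square, n_bins=6)
p_moved = distance_rank_profile(moved, n_bins=6)
p_bigger = distance_rank_profile(bigger, n_bins=6)
```

Because only pair distances enter the profile, the translated copy scores as identical to the original, whereas the scaled copy does not. Comparing short fixed-length vectors of this kind is what makes a scan over a large dataset inexpensive.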

2.2 3D superimposition of assemblies

A superimposition of assemblies is performed using the gmfit program (Kawabata, 2008), which employs the Gaussian mixture model (GMM) algorithm to represent both the 3D density maps and atomic models. The GMM representation considerably reduces the computational cost for superimposition. We calculate the ‘one-to-one’ fitting, where a single density map or an atomic model is superimposed onto another fixed density map or atomic model. We employ the principal component axes alignment to generate the initial configurations, and the steepest descent method to refine the initial configurations. The computation time for the one-to-one superimposition is less than one second. Details of the procedure are described in the Supplementary Data.
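Principal axes are defined only up to sign, so aligning the axes of one object with those of another leaves several candidate orientations. The following minimal sketch enumerates the four proper rotations that result, assuming both sets of axes are orthonormal, right-handed and given as matrix rows. The function `initial_rotations` and its helpers are hypothetical names; this is not the gmfit code, and the number of initial configurations that gmfit actually generates is not stated here.

```python
from itertools import product


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(m):
    """Transpose a 3x3 matrix given as nested lists."""
    return [list(row) for row in zip(*m)]


def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def apply(rot, v):
    """Apply a 3x3 rotation matrix to a 3-vector."""
    return [sum(rot[i][k] * v[k] for k in range(3)) for i in range(3)]


def initial_rotations(axes_fixed, axes_moving):
    """Return the four proper rotations that map the moving axes onto the fixed axes.

    Rows of each argument are orthonormal principal axes in a right-handed frame.
    """
    rotations = []
    for signs in product((1, -1), repeat=3):
        if signs[0] * signs[1] * signs[2] != 1:
            continue  # an odd number of flips is a reflection, not a rotation
        flip = [[signs[i] if i == j else 0 for j in range(3)] for i in range(3)]
        # R = F^T . S . M: express in the moving axes, flip signs, map to the fixed axes.
        rotations.append(matmul(transpose(axes_fixed), matmul(flip, axes_moving)))
    return rotations


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Moving frame whose first principal axis points along y (a 90 degree turn about z).
turned = [[0, 1, 0], [-1, 0, 0], [0, 0, 1]]
candidates = initial_rotations(identity, turned)
```

Each candidate would then serve as a starting point for a local refinement such as steepest descent, with the best refined fit retained.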

3 Server description

3.1 Dataset

In this service, the dataset comprises density maps from the EMDB and both asymmetric unit (AU) and biological unit (BU) models from the PDB. Approximately 2800 EMDB maps, 100 000 PDB AU models and 100 000 PDB BU models are presently available, and they are updated weekly.

3.2 Input

A query structure can be submitted either by specifying the ID of an entry in the search dataset or by uploading one’s own data file. Sample queries are provided for trial searches. For a PDB entry, the user can specify an assembly ID by appending a number to the four-character PDB ID (e.g. ‘1oel-1’) or by selecting one of the AU or BU images. PDB format files are accepted for uploading an atomic model or a dummy atom model derived from small-angle scattering. CCP4/MRC format files are accepted for uploading a 3D density map, in which case the surface level should be specified.
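The PDB query syntax described above (‘1oel-1’) can be checked on the client side before submission. The sketch below is an illustration only: the function name `parse_pdb_query` is hypothetical, the pattern assumes the classic four-character PDB ID (a digit followed by three alphanumeric characters), and it does not reproduce the server's own parsing or cover EMDB IDs.

```python
import re

# Classic PDB ID: one digit plus three alphanumerics, optionally followed by "-<assembly number>".
_PDB_QUERY = re.compile(r"^([0-9][A-Za-z0-9]{3})(?:-([0-9]+))?$")


def parse_pdb_query(text):
    """Split a query such as '1oel-1' into (pdb_id, assembly_id).

    Returns the PDB ID in lower case and the assembly number as an int,
    or None for the assembly when no number was appended.
    """
    match = _PDB_QUERY.match(text.strip())
    if match is None:
        raise ValueError(f"not a PDB ID with optional assembly number: {text!r}")
    pdb_id, assembly = match.groups()
    return pdb_id.lower(), int(assembly) if assembly is not None else None


with_assembly = parse_pdb_query("1oel-1")
without_assembly = parse_pdb_query("4BBR")
```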

3.3 Output

The search usually finishes within 1 min, and a list of up to 2000 structures similar in shape to the input query is shown in order of similarity (Fig. 1, left). Users can open a page with the interactive viewer Jmol/JSmol, which shows the found model superimposed onto the query model by the program gmfit (Kawabata, 2008; Fig. 1, right).
Fig. 1. The search results using the map data of RNA polymerase II (EMDB 2190) as the search input query (left), and the fitting of the atomic data (PDB 4BBR) onto the map data (right)

3.4 Evaluation

We evaluated how well Omokage and EM-SURFER detect biological similarities, and concluded that Omokage is more effective at detecting biological similarities among density maps with different resolutions and volumes. Details of the performance comparison using the ClpP-ClpB and 70S-ribosome datasets are described in the Supplementary Data.

4 Case studies and outlook

We emphasize that the advantage of our server is its ability to rapidly compare global shapes independently of sequence order, subunit number and type of structural data (atomic model or density map). We introduce three types of case studies. The first is a search for low-resolution structures. For the query of the 3D map data of RNA polymerase II (EMDB 2190; 25 Å resolution), 100 RNA polymerase structures were found in both databanks, independent of their resolutions. The second is a search for similar assembly forms with different subunits. A search with the PCNA clamp in the trimer ring form (AU of PDB 3IFV) yielded 100 clamps, including dimeric beta clamps. The third is the discovery of unexpectedly similar shapes that imply some functional similarity (‘molecular mimicry’). The shape similarity between the tRNA-EF-Tu complex (RNA and protein) and EF-G (single protein) is a famous example (Nissen et al.). The search using the tRNA-EF-Tu complex structure (AU of PDB 1OB2) yielded some EF-G structures. These case studies and the performance are described in detail in the Supplementary Data. Our server is not meant for sequence-order-dependent comparisons of atomic models of single protein chains, which can be easily performed with BLAST and DALI. The detection of local substructure similarity has not been incorporated in the current algorithm. For the hybrid and integrative methods (Sali et al.), such an algorithm should be developed in the future.
References

Review 1.  Self-organizing neural networks bridge the biomolecular resolution gap.

Authors:  W Wriggers; R A Milligan; K Schulten; J A McCammon
Journal:  J Mol Biol       Date:  1998-12-18       Impact factor: 5.469

2.  Multiple subunit fitting into a low-resolution density map of a macromolecular complex using a gaussian mixture model.

Authors:  Takeshi Kawabata
Journal:  Biophys J       Date:  2008-08-15       Impact factor: 4.033

3.  Crystal structure of the ternary complex of Phe-tRNAPhe, EF-Tu, and a GTP analog.

Authors:  P Nissen; M Kjeldgaard; S Thirup; G Polekhina; L Reshetnikova; B F Clark; J Nyborg
Journal:  Science       Date:  1995-12-01       Impact factor: 47.728

4.  EMDataBank.org: unified data resource for CryoEM.

Authors:  Catherine L Lawson; Matthew L Baker; Christoph Best; Chunxiao Bi; Matthew Dougherty; Powei Feng; Glen van Ginkel; Batsal Devkota; Ingvar Lagerstedt; Steven J Ludtke; Richard H Newman; Tom J Oldfield; Ian Rees; Gaurav Sahni; Raul Sala; Sameer Velankar; Joe Warren; John D Westbrook; Kim Henrick; Gerard J Kleywegt; Helen M Berman; Wah Chiu
Journal:  Nucleic Acids Res       Date:  2010-10-08       Impact factor: 16.971

5.  Protein Data Bank Japan (PDBj): maintaining a structural data archive and resource description framework format.

Authors:  Akira R Kinjo; Hirofumi Suzuki; Reiko Yamashita; Yasuyo Ikegawa; Takahiro Kudou; Reiko Igarashi; Yumiko Kengaku; Hasumi Cho; Daron M Standley; Atsushi Nakagawa; Haruki Nakamura
Journal:  Nucleic Acids Res       Date:  2011-10-05       Impact factor: 16.971

6.  The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data.

Authors:  Helen Berman; Kim Henrick; Haruki Nakamura; John L Markley
Journal:  Nucleic Acids Res       Date:  2006-11-16       Impact factor: 16.971

7.  Navigating 3D electron microscopy maps with EM-SURFER.

Authors:  Juan Esquivel-Rodríguez; Yi Xiong; Xusi Han; Shuomeng Guang; Charles Christoffer; Daisuke Kihara
Journal:  BMC Bioinformatics       Date:  2015-05-30       Impact factor: 3.169

8.  Outcome of the First wwPDB Hybrid/Integrative Methods Task Force Workshop.

Authors:  Andrej Sali; Helen M Berman; Torsten Schwede; Jill Trewhella; Gerard Kleywegt; Stephen K Burley; John Markley; Haruki Nakamura; Paul Adams; Alexandre M J J Bonvin; Wah Chiu; Matteo Dal Peraro; Frank Di Maio; Thomas E Ferrin; Kay Grünewald; Aleksandras Gutmanas; Richard Henderson; Gerhard Hummer; Kenji Iwasaki; Graham Johnson; Catherine L Lawson; Jens Meiler; Marc A Marti-Renom; Gaetano T Montelione; Michael Nilges; Ruth Nussinov; Ardan Patwardhan; Juri Rappsilber; Randy J Read; Helen Saibil; Gunnar F Schröder; Charles D Schwieters; Claus A M Seidel; Dmitri Svergun; Maya Topf; Eldon L Ulrich; Sameer Velankar; John D Westbrook
Journal:  Structure       Date:  2015-06-18       Impact factor: 5.006
