MEGADOCK 4.0: an ultra-high-performance protein-protein docking software for heterogeneous supercomputers.

Masahito Ohue1, Takehiro Shimoda1, Shuji Suzuki2, Yuri Matsuzaki3, Takashi Ishida3, Yutaka Akiyama1.   

Abstract

SUMMARY: The application of protein-protein docking in large-scale interactome analysis is a major challenge in structural bioinformatics and requires huge computing resources. In this work, we present MEGADOCK 4.0, an FFT-based docking software that makes extensive use of recent heterogeneous supercomputers and shows powerful, scalable performance of >97% strong scaling.
AVAILABILITY AND IMPLEMENTATION: MEGADOCK 4.0 is written in C++ with OpenMPI and NVIDIA CUDA 5.0 (or later) and is freely available to all academic and non-profit users at: http://www.bi.cs.titech.ac.jp/megadock.
CONTACT: akiyama@cs.titech.ac.jp
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2014. Published by Oxford University Press.

Year:  2014        PMID: 25100686      PMCID: PMC4221127          DOI: 10.1093/bioinformatics/btu532

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Protein–protein interactions can provide valuable insights for understanding the principles of biological systems and for elucidating the causes of incurable diseases. Although many structures of interacting proteins have been determined by X-ray crystallography and nuclear magnetic resonance spectroscopy, the structures of many protein complexes have still not been determined experimentally because of cost and technical limitations. Protein–protein docking, a computational method for predicting the structure of a protein complex from known component structures, is a powerful approach that facilitates the discovery of otherwise unattainable protein complex structures. A number of fast Fourier transform (FFT)-based rigid-body initial protein–protein docking tools have been developed for predicting protein complex structures (Cheng et al., 2007; Pierce et al., 2011; Ritchie and Venkatraman, 2010). However, faster docking tools are still required to perform large-scale interactome predictions. Some applications also require a huge number of dockings, such as ensemble docking techniques using multiple conformations for flexible docking (Grünberg et al., 2004; Król et al.), cross-docking for identification of protein interaction partners (Lopes et al.; Matsuzaki et al., 2009; Wass et al.; Zhang et al.) and multiple docking (Karaca and Bonvin, 2011). To carry out these large-scale analyses, the use of supercomputing environments has become necessary. At the same time, 35% of the computing performance of the supercomputers ranked on top500.org (June 2014) is achieved by hardware accelerators, such as graphics processing units (GPUs), and this percentage is increasing. Therefore, tools that can be used with such ‘heterogeneous’ supercomputers are necessary.
While some docking tools are accelerated by GPUs on a single node (Ritchie and Venkatraman, 2010; Sukhwani and Herbordt, 2009), ‘heterogeneous’ supercomputers, which have massive numbers of nodes each containing multiple CPU cores and GPU cards, have not yet been used to accelerate docking tools. Here, we present ultra–high-performance docking software, ‘MEGADOCK 4.0’, which makes extensive use of supercomputers equipped with GPUs.

2 IMPLEMENTATION

2.1 MEGADOCK scheme

MEGADOCK uses the Katchalski-Katzir algorithm, the traditional FFT-based rigid-docking scheme (Katchalski-Katzir et al., 1992). Its original scoring function, based on shape complementarity, electrostatics and desolvation free energy, is calculated with only one correlation function (Ohue et al., 2014). This is advantageous for speed because previous methods use multiple correlation functions, and thus multiple FFT calculations, to evaluate multiple effects (Kozakov et al., 2006; Pierce et al., 2011). See Supplementary Text S1 for details.
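As an illustration of the correlation idea (not MEGADOCK's actual code), the sketch below evaluates a score for every translation at once with an FFT and checks it against the direct sum. It uses a 1D toy grid in pure Python. Packing a shape term into the real part and an electrostatic term into the imaginary part is shown as one possible way for a single complex correlation to carry more than one effect; the grid values and all function names (`fft`, `correlate_direct`, `correlate_fft`) are assumptions made for this example.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def correlate_direct(r, l):
    """c[t] = sum_x r[x] * l[(x + t) % n], term by term: O(n^2)."""
    n = len(r)
    return [sum(r[x] * l[(x + t) % n] for x in range(n)) for t in range(n)]

def correlate_fft(r, l):
    """The same c[t] via the correlation theorem: O(n log n)."""
    n = len(r)
    R, L = fft(r), fft(l)
    # c[t] contains no complex conjugate, so the receptor spectrum is
    # index-reversed (R[-k]) rather than conjugated.
    C = [R[-k % n] * L[k] for k in range(n)]
    return [v / n for v in fft(C, inverse=True)]

# Hypothetical toy grids: a surface reward (+1), a core penalty (-9),
# and made-up electrostatic values.
shape_r = [0, 1, 1, -9, -9, 1, 1, 0]
elec_r = [0.0, 0.2, -0.3, 0.0, 0.0, 0.5, -0.1, 0.0]
shape_l = [1, 1, 0, 0, 0, 0, 0, 0]
charge_l = [0.4, -0.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

receptor = [complex(s, e) for s, e in zip(shape_r, elec_r)]
ligand = [complex(s, q) for s, q in zip(shape_l, charge_l)]

# Re(R * L) = shape_r * shape_l - elec_r * charge_l, so one complex
# correlation yields the sum of two real-valued terms.
scores = [c.real for c in correlate_fft(receptor, ligand)]
best_shift = max(range(len(scores)), key=scores.__getitem__)
```

The real docking problem is three-dimensional and repeats this correlation for every ligand rotation, so reducing the number of FFTs per rotation reduces the total cost proportionally.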

2.2 GPU implementation

MEGADOCK has been implemented on multiple GPUs using the CUDA library (Shimoda et al.). A previous study (Sukhwani and Herbordt, 2009) mapped only the FFT processes onto a GPU, and its implementation could not use multiple GPUs. We mapped the whole docking process (voxelization, ligand rotation, FFTs and finding solutions) onto GPUs, and our implementation can use multiple GPUs and CPU cores (Shimoda et al.).
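The four stages named above can be sketched on a toy problem. The following is a deliberately simplified CPU illustration in plain Python, not the CUDA implementation: it works in 2D, uses only four 90° rotations, scores shape complementarity alone, and replaces the FFTs with a direct correlation sum. The grid size `N`, the core penalty of -9, the contact-count surface reward and every function name are assumptions for this example.

```python
from itertools import product

N = 8  # toy grid edge length (hypothetical)

def rotate(atoms, quarter_turns):
    """Ligand rotation: rotate 2D integer coordinates by multiples of 90 degrees."""
    for _ in range(quarter_turns % 4):
        atoms = [(-y, x) for x, y in atoms]
    return atoms

def voxelize_ligand(atoms):
    """Voxelization: mark each ligand atom cell with 1."""
    grid = [[0] * N for _ in range(N)]
    for x, y in atoms:
        grid[y % N][x % N] = 1
    return grid

def voxelize_receptor(atoms, core=-9):
    """Voxelization: empty cells count their occupied neighbours; atom cells get a penalty."""
    grid = [[0] * N for _ in range(N)]
    occupied = {(x % N, y % N) for x, y in atoms}
    for x, y in occupied:
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = (x + dx) % N, (y + dy) % N
            if (nx, ny) not in occupied:
                grid[ny][nx] += 1
    for x, y in occupied:
        grid[y][x] = core
    return grid

def correlate(rec, lig):
    """Direct sum standing in for the FFTs: score of the ligand shifted by (tx, ty)."""
    scores = [[0] * N for _ in range(N)]
    for ty, tx in product(range(N), repeat=2):
        scores[ty][tx] = sum(
            rec[y][x] * lig[(y - ty) % N][(x - tx) % N]
            for y, x in product(range(N), repeat=2)
        )
    return scores

def dock(rec_atoms, lig_atoms):
    """Finding solutions: keep the best (score, quarter_turns, tx, ty) over all poses."""
    rec = voxelize_receptor(rec_atoms)
    best = None
    for q in range(4):
        scores = correlate(rec, voxelize_ligand(rotate(lig_atoms, q)))
        for ty, tx in product(range(N), repeat=2):
            if best is None or scores[ty][tx] > best[0]:
                best = (scores[ty][tx], q, tx, ty)
    return best

# A U-shaped receptor with a vertical slot, and a two-atom ligand that
# only fits after being rotated.
receptor_atoms = [(2, 2), (3, 2), (4, 2), (2, 3), (4, 3), (2, 4), (4, 4)]
ligand_atoms = [(0, 0), (1, 0)]
best = dock(receptor_atoms, ligand_atoms)
```

Every ligand rotation is independent of the others, which is what makes the loop in `dock` a natural unit to spread across GPUs and CPU threads.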

2.3 Hybrid CUDA, MPI and OpenMP parallelization

For extensive execution of docking jobs, an implementation that can run across many computing nodes is required. We previously parallelized the calculation of each docking process using MPI and OpenMP with the master/worker model (Matsuzaki et al., 2013). On cluster computers, a master process acquires a list of protein pairs and distributes the docking jobs to worker processes on available nodes. This implementation guarantees fault tolerance in that the master process monitors all docking jobs. The proposed software, MEGADOCK 4.0, is implemented by hybrid CUDA, MPI and OpenMP parallelization. Reducing memory usage is important on systems that have many CPU cores, multiple GPUs per node and relatively little memory (e.g. there is only 6 GB of memory on an NVIDIA Tesla K20X GPU). We assigned one docking job to each node and then distributed the calculations of ligand rotation by thread parallelization across CPU cores and GPUs. This implementation model manages one node as the master and the other nodes as workers. The master node distributes the docking jobs to worker nodes, and each worker node executes its docking jobs with multiple GPUs by CUDA and all CPU cores by OpenMP thread parallelization. This implementation also guarantees fault tolerance similar to the CPU version.
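The master/worker pattern described above can be sketched as follows. This is a single-machine illustration that uses Python threads and a queue in place of MPI processes on separate nodes; the retry policy (`max_attempts`), the simulated failure and all names (`run_master_worker`, `all_pairs`, `flaky_dock`) are assumptions for the example rather than details taken from MEGADOCK.

```python
import queue
import threading

def all_pairs(receptors, ligands):
    """Cross-docking job list: every receptor against every ligand."""
    return [(r, l) for r in receptors for l in ligands]

def run_master_worker(pairs, dock, n_workers=4, max_attempts=3):
    """Hand docking jobs to workers and requeue any job whose worker fails."""
    jobs = queue.Queue()
    for pair in pairs:                    # master: enqueue every docking job
        jobs.put((pair, 1))
    results, failed = {}, {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                pair, attempt = jobs.get_nowait()
            except queue.Empty:
                return                    # nothing left for this worker
            try:
                score = dock(pair)
            except Exception as exc:
                if attempt < max_attempts:
                    jobs.put((pair, attempt + 1))   # try the job again
                else:
                    with lock:
                        failed[pair] = repr(exc)
            else:
                with lock:
                    results[pair] = score

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, failed

# Demo with a stand-in docking function that fails once for one pair.
_already_failed = set()

def flaky_dock(pair):
    if pair == (1, 2) and pair not in _already_failed:
        _already_failed.add(pair)
        raise RuntimeError("simulated worker failure")
    return pair[0] * 10 + pair[1]         # placeholder "score"

results, failed = run_master_worker(all_pairs(range(4), range(4)), flaky_dock, n_workers=3)
```

Because the job list is tracked in one place, a job that fails on one worker is simply handed out again, which is the sense in which the master's bookkeeping supports fault tolerance.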

3 RESULTS AND DISCUSSION

To check the performance of MEGADOCK 4.0, we used the ZLAB benchmark 4.0 dataset (Hwang et al., 2010). Speed measurement experiments were conducted on the TSUBAME 2.5 supercomputing system (Tokyo Institute of Technology, Japan). We used its ‘thin nodes’ with a reservation service of exclusive use (up to 420 nodes). Each ‘thin’ node contained two Intel Xeon X5670 CPUs (six cores each, 2.93 GHz) and three NVIDIA Tesla K20X (GK110) GPUs. The specifications of the environment are shown in Supplementary Text S2 and Table S1. Figure 1 shows the average of five measurements of computation time and the parallel scalability of MEGADOCK 4.0 on 30 976 protein pairs from combinations between 176 receptors and 176 ligands, assuming a cross-docking study. The observed calculation acceleration was close to ideal. Strong scaling values from 35 nodes were >97% for all numbers of nodes measured here (Supplementary Table S2). Notably, a high scalability (98%) was obtained with the largest number of nodes (420 nodes).
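For readers unfamiliar with the metric, strong-scaling efficiency is commonly computed as the measured speedup divided by the ideal speedup relative to a baseline run of the same problem. The sketch below assumes that definition; the paper's exact formula is in its supplementary material and is not reproduced here. The timings in the example are invented for illustration and are not the paper's measurements.

```python
def strong_scaling_efficiency(base_nodes, base_time, nodes, time):
    """Measured speedup over ideal speedup for a fixed problem size."""
    speedup = base_time / time            # how much faster the larger run is
    ideal = nodes / base_nodes            # how much faster it could be at best
    return speedup / ideal

# Hypothetical timings (minutes) for the same workload on 35 and 420 nodes.
perfect = strong_scaling_efficiency(35, 120.0, 420, 10.0)
slightly_below = strong_scaling_efficiency(35, 120.0, 420, 10.2)
```

A value of 1.0 means that using 12 times as many nodes made the run exactly 12 times faster; values just below 1.0 reflect overheads such as job distribution and load imbalance.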
Fig. 1. Calculation time and acceleration by parallelization among nodes on 30 976 docking jobs

We also measured docking time on half a million and one million protein pairs for simulation of large-scale interactome analyses using average-sized proteins (FFT size of 108, see Supplementary Table S3). In this simulation, half a million docking jobs required 5.71 h, while one million jobs required 11.51 h. The epidermal growth factor receptor-related pathway, which we are studying in non–small-cell lung cancer, required approximately a quarter million dockings. This analysis could be completed in only 3 h with MEGADOCK 4.0 using 420 nodes, whereas solving the same problem requires several days with an older version of MEGADOCK.
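The reported timings can be cross-checked with simple arithmetic. The sketch below derives throughput from the two simulated runs and extrapolates to a quarter million dockings, assuming (this is an assumption, not a statement from the text) that all three workloads ran on the same number of nodes with the same average protein size. The function names are illustrative.

```python
def jobs_per_hour(jobs, hours):
    """Average docking throughput of a run."""
    return jobs / hours

def estimated_hours(jobs, rate):
    """Time needed for a workload at a given throughput."""
    return jobs / rate

rate_half = jobs_per_hour(500_000, 5.71)       # half a million jobs in 5.71 h
rate_full = jobs_per_hour(1_000_000, 11.51)    # one million jobs in 11.51 h
time_ratio = 11.51 / 5.71                      # close to 2 for near-linear cost
quarter_million_hours = estimated_hours(250_000, rate_full)
```

Doubling the number of jobs roughly doubles the run time, and the extrapolated time for a quarter million dockings is a little under 3 h, in line with the figure quoted for the pathway analysis.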

4 CONCLUSIONS

MEGADOCK 4.0 is docking software for heterogeneous supercomputing environments and shows excellent scalability. Heterogeneous supercomputers equipped with hardware accelerators, such as GPUs, will become common in the future. Fully using such computers is crucial for bioinformatics research, which must analyze massive amounts of data. MEGADOCK 4.0 can serve as a tool to promote analysis of the whole interactome within a reasonable time.
REFERENCES (10 of 15 listed)

1.  Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques.

Authors:  E Katchalski-Katzir; I Shariv; M Eisenstein; A A Friesem; C Aflalo; I A Vakser
Journal:  Proc Natl Acad Sci U S A       Date:  1992-03-15       Impact factor: 11.205

2.  Protein-protein docking benchmark version 4.0.

Authors:  Howook Hwang; Thom Vreven; Joël Janin; Zhiping Weng
Journal:  Proteins       Date:  2010-11-15

3.  Complementarity of structure ensembles in protein-protein binding.

Authors:  Raik Grünberg; Johan Leckner; Michael Nilges
Journal:  Structure       Date:  2004-12       Impact factor: 5.006

4.  PIPER: an FFT-based protein docking program with pairwise potentials.

Authors:  Dima Kozakov; Ryan Brenke; Stephen R Comeau; Sandor Vajda
Journal:  Proteins       Date:  2006-11-01

5.  pyDock: electrostatics and desolvation for effective scoring of rigid-body protein-protein docking.

Authors:  Tammy Man-Kuang Cheng; Tom L Blundell; Juan Fernandez-Recio
Journal:  Proteins       Date:  2007-08-01

6.  In silico screening of protein-protein interactions with all-to-all rigid docking and clustering: an application to pathway analysis.

Authors:  Yuri Matsuzaki; Yusuke Matsuzaki; Toshiyuki Sato; Yutaka Akiyama
Journal:  J Bioinform Comput Biol       Date:  2009-12       Impact factor: 1.122

7.  A multidomain flexible docking approach to deal with large conformational changes in the modeling of biomolecular complexes.

Authors:  Ezgi Karaca; Alexandre M J J Bonvin
Journal:  Structure       Date:  2011-04-13       Impact factor: 5.006

8.  Accelerating protein docking in ZDOCK using an advanced 3D convolution library.

Authors:  Brian G Pierce; Yuichiro Hourai; Zhiping Weng
Journal:  PLoS One       Date:  2011-09-19       Impact factor: 3.240

9.  MEGADOCK 3.0: a high-performance protein-protein interaction prediction software using hybrid parallel computing for petascale supercomputing environments.

Authors:  Yuri Matsuzaki; Nobuyuki Uchikoga; Masahito Ohue; Takehiro Shimoda; Toshiyuki Sato; Takashi Ishida; Yutaka Akiyama
Journal:  Source Code Biol Med       Date:  2013-09-03

10.  MEGADOCK: an all-to-all protein-protein interaction prediction system using tertiary structure data.

Authors:  Masahito Ohue; Yuri Matsuzaki; Nobuyuki Uchikoga; Takashi Ishida; Yutaka Akiyama
Journal:  Protein Pept Lett       Date:  2014       Impact factor: 1.890
