
CB-Dock2: improved protein-ligand blind docking by integrating cavity detection, docking and homologous template fitting.

Yang Liu¹, Xiaocong Yang¹, Jianhong Gan¹, Shuang Chen², Zhi-Xiong Xiao¹, Yang Cao^1,3.

Abstract

Protein-ligand blind docking is a powerful method for exploring the binding sites of receptors and the corresponding binding poses of ligands. It has been widely applied in pharmaceutical and biological research. Previously, we proposed a blind docking server, CB-Dock, which has been under heavy use (over 200 submissions per day) by researchers worldwide since 2019. Here, we substantially improved the docking method by combining CB-Dock with our template-based docking engine to enhance the accuracy of binding site identification and binding pose prediction. In benchmark tests, it achieved a success rate of ∼85% for binding pose prediction (RMSD < 2.0 Å), outperforming the original CB-Dock and most popular blind docking tools. The updated docking server, named CB-Dock2, features reconfigured input and output web interfaces together with a highly automated docking pipeline, making it a particularly efficient and easy-to-use tool for the bioinformatics and cheminformatics communities. The web server is freely available at https://cadd.labshare.cn/cb-dock2/.


Year: 2022 PMID: 35609983 PMCID: PMC9252749 DOI: 10.1093/nar/gkac394

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 19.160

INTRODUCTION

Predicting interactions between proteins and small molecules plays a key role in deciphering a wide variety of biological processes and is crucial to understanding protein functions as well as advancing drug development (1,2). A powerful approach for this purpose is protein–ligand blind docking, which identifies the binding regions of a protein and simultaneously predicts the binding pose of a molecule (3,4). The need for blind docking has recently become increasingly urgent because a massive number of protein structures have been predicted by AlphaFold2 (5) or RoseTTAFold (6), opening opportunities to explore new targeted therapies (7,8). State-of-the-art blind docking methods such as SwissDock (9), COACH-D (10), EDock (11) and MTiAutoDock (12) have been extensively used to explore potential binding sites and ligand-binding poses.

CB-Dock is a protein–ligand blind docking server developed by our lab (13). It employs our protein-surface-curvature-based cavity detection approach (CurPocket) (13,14) to guide molecular docking with AutoDock Vina (version 1.1.2) (15,16). Since its original release in 2019, the CB-Dock web server has received over 200 task submissions per day from around the world, and numerous researchers have used it to explore the binding properties of compounds and their molecular mechanisms. For instance, Alvarez et al. discovered a protein functional region by detecting a noncharged binding pocket and docking with acetic acid using CB-Dock (17). Singh et al. utilized the CurPocket algorithm in CB-Dock to predict the binding sites for curcumin on Vibrio cholerae cytolysin (VCC) (18). In particular, CB-Dock has been broadly used in the study of COVID-19 therapeutic agents (19–24).

The widespread use of CB-Dock can be attributed to the following advantages. (i) Quick result acquisition: the average task time is about one minute, which is suitable for real-time analysis. (ii) High-accuracy prediction: it showed a 16–30% improvement in docking success rate compared with other blind docking methods in our benchmark (13). (iii) Easy-to-use web interface: it provides interactive and intuitive visualization, which lowers the technical barrier for users. (iv) Exploratory capabilities for docking: it provides the centers, sizes and volumes of the predicted cavities so that they can be reused with other molecular docking tools.

Herein, we present an updated version of CB-Dock, CB-Dock2, with multiple new features in both the computational methods and the web interfaces. It inherits the structure-based cavity detection and docking module and integrates a novel template-based molecular docking module to further enhance accuracy. Our benchmark tests showed that CB-Dock2 surpassed CB-Dock with an improvement of over 16% in docking success rate. The details of the updates are described in the following sections.
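The cavity centers and sizes mentioned in advantage (iv) map naturally onto the search-box parameters of AutoDock Vina. The following is a minimal sketch of that reuse, assuming the user has copied a center and a box size from the server output; the function name vina_box_config and the file names are hypothetical, while the configuration keys (receptor, ligand, center_x/y/z, size_x/y/z, exhaustiveness) are those of a standard Vina configuration file.

```python
from typing import Sequence


def vina_box_config(receptor: str, ligand: str,
                    center: Sequence[float], size: Sequence[float],
                    exhaustiveness: int = 8) -> str:
    """Format a cavity center and box size as an AutoDock Vina config file.

    The helper itself is hypothetical; only the key names follow Vina.
    """
    if len(center) != 3 or len(size) != 3:
        raise ValueError("center and size must each have three components")
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    # Box center in angstroms, one key per axis
    for axis, value in zip("xyz", center):
        lines.append(f"center_{axis} = {value:.3f}")
    # Box edge lengths in angstroms, one key per axis
    for axis, value in zip("xyz", size):
        lines.append(f"size_{axis} = {value:.3f}")
    lines.append(f"exhaustiveness = {exhaustiveness}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any real docking job
    print(vina_box_config("protein.pdbqt", "ligand.pdbqt",
                          center=(12.5, -3.0, 40.1), size=(20, 20, 22)))
```

The resulting text can be saved to a file and passed to Vina with its --config option.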

CB-DOCK2: OVERVIEW AND NEW FEATURES

Computational pipeline

CB-Dock2 performs highly automated protein–ligand blind docking in four steps: (i) data input, (ii) data processing, (iii) cavity detection and docking, and (iv) visualization and analysis (Figure 1). The data input consists of a PDB file of the query protein and a MOL2/SDF/PDB file of the query ligand. In this new version, ligands can also be drawn manually using a built-in JSME (25) plug-in. The submitted ligand is processed by adding hydrogens and partial charges, and an initial 3D conformation is generated by RDKit. CB-Dock2 checks the submitted protein, adds missing side-chain atoms (26,27) and hydrogen atoms, reports missing residues in the protein (28) and removes co-crystallized waters as well as other het groups.

Cavity detection and docking start with template matching, which searches the prepared complex database for known complexes with similar proteins and ligands. If any similar complexes are retrieved, CB-Dock2 runs two parallel pipelines, structure-based and template-based blind docking, to perform the docking simulation. The structure-based blind docking pipeline is fully inherited from CB-Dock, and its detailed workflow is described in our previous publication (13). The template-based blind docking pipeline is powered by our recently developed docking method FitDock (29), which is elaborated in the next section. Each pipeline produces a list of protein–ligand binding sites and binding poses. These results are integrated by merging identical predicted binding sites and retaining the top-scoring binding poses. If no similar complex is retrieved, CB-Dock2 bypasses the template-based blind docking pipeline.

The visualization and analysis step presents the final results, which can be inspected in interactive NGL Viewers (30) for 3D structures and 2D sequences, together with detailed information about binding sites, template structures, binding scores, contact residues, docking centers, cavity volumes, etc. Users can adjust and compare the results in many forms and download them for further offline analysis.
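The integration of the two pipelines described above can be illustrated with a short sketch. It is not the server's implementation: the Pose record, the site_id label used to decide that two predictions refer to the same site, and the assumption that scores from both pipelines are on a comparable scale with lower values being better (the AutoDock Vina convention) are all introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Pose:
    site_id: str   # hypothetical label identifying a binding site
    score: float   # docking score; lower is assumed to be better
    source: str    # "structure" or "template"


def integrate(structure_poses: Iterable[Pose],
              template_poses: Iterable[Pose] = ()) -> List[Pose]:
    """Merge poses from both pipelines, keeping the best pose per site.

    Leaving template_poses empty mimics the case in which no similar
    complex is retrieved and the template-based pipeline is bypassed.
    """
    best: Dict[str, Pose] = {}
    for pose in list(structure_poses) + list(template_poses):
        current = best.get(pose.site_id)
        if current is None or pose.score < current.score:
            best[pose.site_id] = pose
    # Rank the retained poses from best to worst score
    return sorted(best.values(), key=lambda p: p.score)


if __name__ == "__main__":
    structure = [Pose("A", -7.1, "structure"), Pose("B", -5.0, "structure")]
    template = [Pose("A", -8.3, "template")]
    for pose in integrate(structure, template):
        print(pose)
```

In this toy run, site A is reported once with the template-based pose because it has the better score, and site B is retained from the structure-based pipeline.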

Figure 1.

The overall pipeline of CB-Dock2. It is constructed by modules of data input, data processing, cavity detection and docking, and visualization and analysis.

New feature: template-based blind docking

With the rapid accumulation of structures in the Protein Data Bank (31), docking simulation can benefit from the knowledge contained in solved protein–ligand complex structures (32,33). CB-Dock2 not only inherits the structure-based cavity detection and docking module from CB-Dock, but also integrates FitDock (29), a template-based molecular docking method that we developed and published recently. This new module extracts the docking modes from similar complex structures in the protein–ligand database and transfers them to the query protein and ligand, under the assumption that similar ligands result in similar binding modes (32). In our comprehensive benchmark tests, FitDock showed a 40–60% improvement in docking success rate and was an order of magnitude faster than popular docking methods when template structures were available (29).

The template structure database used in CB-Dock2 is taken from BioLip (version 2021.09.15) (34), which is currently the most comprehensive protein–ligand interaction database. After removing interactions involving ions, peptides, DNA/RNA and artifact ligands, the CB-Dock2 database includes 214 506 protein–ligand complex structures. For a given query protein and ligand, CB-Dock2 first searches for similar ligands using the FP2 fingerprint (35) with a minimum similarity threshold of 0.4. The query protein is then superposed onto the corresponding complex structure for FitDock. If no template structure is found, CB-Dock2 performs only the structure-based cavity detection and docking. When more than one template structure is found, CB-Dock2 merges the cavities from two different templates if they share over 50% of their binding residues; otherwise, it regards them as two different cavities and performs docking independently. By taking advantage of both structure-based and template-based docking, CB-Dock2 showed significant improvement over the original CB-Dock.
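The template search and cavity merging rules above can be sketched in a few lines. Several points are assumptions made for illustration: fingerprints are represented as sets of on-bit indices rather than computed by a cheminformatics toolkit, similarity is taken to be the Tanimoto coefficient (the text gives only the 0.4 threshold), and the 50% overlap is measured against the smaller of the two residue sets because the text does not specify the denominator. All function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar_templates(query_fp: Set[int], template_fps: Dict[str, Set[int]],
                      threshold: float = 0.4) -> List[str]:
    """Return IDs of templates whose similarity reaches the threshold, best first."""
    scored = [(tanimoto(query_fp, fp), tid) for tid, fp in template_fps.items()]
    return [tid for sim, tid in sorted(scored, reverse=True) if sim >= threshold]


def merge_cavities(cavities: List[Set[str]],
                   min_shared: float = 0.5) -> List[FrozenSet[str]]:
    """Greedily merge cavities sharing more than min_shared of the smaller set.

    Each cavity is a set of binding-residue labels such as "A:HIS57".
    """
    merged: List[Set[str]] = []
    for cavity in cavities:
        for existing in merged:
            smaller = min(len(cavity), len(existing))
            if smaller and len(cavity & existing) / smaller > min_shared:
                existing |= cavity      # same site: pool the residues
                break
        else:
            merged.append(set(cavity))  # distinct site: dock independently
    return [frozenset(c) for c in merged]


if __name__ == "__main__":
    query = {1, 2, 3, 5}
    templates = {"t1": {1, 2, 3, 4}, "t2": {1, 9}, "t3": {1, 2, 3}}
    print(similar_templates(query, templates))
    print(merge_cavities([{"A:10", "A:11", "A:12"}, {"A:11", "A:12", "A:20"}, {"B:5"}]))
```

A greedy single pass is the simplest choice and depends on input order; a production pipeline might instead cluster cavities until no pair exceeds the overlap threshold.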
We performed the same docking test as in our previous work using the Astex Diverse Set (13). Note that 82 of the 85 test cases employed template-based docking (see Supplementary Table S1), while the remaining cases used structure-based docking only. CB-Dock2 achieved an 85.9% success rate (the percentage of cases in which the top-ranking pose is within 2.0 Å root mean squared deviation (RMSD) of the crystal structure) on the whole data set, outperforming both CB-Dock (69.4%) and FitDock (83.5%) (Figure 2A). Compared with state-of-the-art blind docking servers such as MTiAutoDock (12), SwissDock (9) and COACH-D (10), CB-Dock2 exhibited a success rate at least 16% higher (Figure 2B), which indicates the significance of the update.
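The success-rate metric used in this benchmark can be written down compactly. The sketch below assumes that predicted and crystal ligand atoms are given in the same order, computes the RMSD in place without superposition, and ignores symmetry-equivalent atoms, which a careful evaluation would need to handle; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """In-place RMSD between two equally ordered atom lists (no superposition)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(total / len(pred))


def success_rate(rmsds: Sequence[float], cutoff: float = 2.0) -> float:
    """Percentage of cases whose top-ranking pose has RMSD below the cutoff."""
    return 100.0 * sum(r < cutoff for r in rmsds) / len(rmsds)


if __name__ == "__main__":
    # Two atoms, each displaced by 1 angstrom along z
    print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))
    # Illustrative RMSD values only, not benchmark data
    print(success_rate([0.8, 1.5, 2.4, 1.9]))
```

As a consistency check on the reported numbers, a success rate of 85.9% on 85 test cases corresponds to 73 top-ranking poses within the cutoff.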

Figure 2.

The overall performance of CB-Dock2 on the Astex Diverse Set. (A) The success rates of the top-ranking binding modes achieved by CB-Dock, FitDock and CB-Dock2, respectively. (B) The success rates of the top-ranking binding modes achieved by MTiAutoDock, SwissDock, COACH-D and CB-Dock2, respectively. COACH-D1 and COACH-D2 denote that the best pose generated by COACH-D was selected based on the c-score and the AutoDock Vina docking score, respectively.

New feature: the reconfiguration of input and output interfaces

The web interface was redesigned to provide more useful information and intuitive guidance. First, we added a new function that lets users perform cavity detection independently (Figure 3). It displays the predicted binding regions in an interactive 3D viewer, in which the cavities can be selected manually for structure-based or template-based docking (Figure 3B, D). In particular, the residues in the binding regions are highlighted in a sequence panel to facilitate the identification of binding sites. This function helps users focus their investigation on known binding pockets. Second, we updated the input interface (Figure 4A) and added a molecule editor, powered by JSME (25), to facilitate ligand input (Figure 4B). It allows users to submit query ligands by providing a SMILES string or by drawing the 2D structure in the JSME window. The uploaded ligand can also be previewed and modified in the window, which is convenient for comparative studies. Third, the docking result page provides a rich set of interactive operations for online analysis (Figure 4D). For instance, it lists the interacting residues using the distance threshold defined by CASP (the sum of the van der Waals radii of the involved atoms plus a tolerance of 0.5 Å) (36), enabling users to obtain contact residues more conveniently. It also displays the results of structure-based and template-based docking together to facilitate head-to-head comparison. Fourth, we added more parameter settings to improve the extensibility of CB-Dock2 and enrich the user experience, including modifying the number of cavities and uploading customized complex structures for template-based docking simulations.
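The contact criterion quoted above (sum of van der Waals radii plus a 0.5 Å tolerance) is simple to apply to atomic coordinates. The sketch below is an illustration rather than the server's code: the radii table uses Bondi values because the text does not state which radii the server uses, atoms are passed as plain tuples, and the function name and residue labels are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

# Bondi van der Waals radii in angstroms; the radii actually used by the
# server are not given in the text, so this table is an assumption.
VDW_RADII: Dict[str, float] = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

Coord = Tuple[float, float, float]
LigandAtom = Tuple[str, Coord]          # (element, xyz)
ProteinAtom = Tuple[str, str, Coord]    # (residue label, element, xyz)


def contact_residues(ligand: List[LigandAtom], protein: List[ProteinAtom],
                     tolerance: float = 0.5) -> List[str]:
    """Residues with any atom within r_i + r_j + tolerance of a ligand atom."""
    contacts: Set[str] = set()
    for residue, p_elem, p_xyz in protein:
        for l_elem, l_xyz in ligand:
            cutoff = VDW_RADII[p_elem] + VDW_RADII[l_elem] + tolerance
            if math.dist(p_xyz, l_xyz) <= cutoff:
                contacts.add(residue)
                break  # one close ligand atom is enough for this protein atom
    return sorted(contacts)


if __name__ == "__main__":
    # Toy coordinates chosen for illustration only
    ligand = [("C", (0.0, 0.0, 0.0))]
    protein = [
        ("SER195", "O", (3.5, 0.0, 0.0)),   # C-O cutoff is 3.72 A
        ("HIS57", "N", (4.0, 0.0, 0.0)),    # C-N cutoff is 3.75 A
        ("GLY193", "C", (3.8, 0.0, 0.0)),   # C-C cutoff is 3.90 A
    ]
    print(contact_residues(ligand, protein))
```

The double loop is quadratic in the number of atoms; for large complexes a spatial grid or k-d tree would be the usual replacement.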

Figure 3.

Output interfaces of cavity detection. (A) The cavities detected by analyzing the concave regions on the solvent accessible surface of the query protein. (B) The docking results after clicking the button of ‘BlindDock’. (C) The cavities detected based on homologous templates. (D) The docking results after clicking the button of ‘Template-based Docking’.

Figure 4.

The input and output interfaces of blind docking. (A) The panel for submission of query protein. (B) The panel for submission of query ligand. (C) The job list and status of the submissions. (D) The interactive visualization and meta-analysis for the blind docking results. Users can select and highlight the docking pose for each cavity.

Other optimization

A drawback of CB-Dock is that the cavity detection approach (CurPocket) is extremely time-consuming when the query protein has more than 2000 residues. To address this issue, we optimized the program that calculates protein-surface curvature, enabling rapid processing of ultra-large proteins. Test results show that the updated method runs 4–5 times faster and finishes in 50 s for a 2500-residue protein (Supplementary Figure S1).

CONCLUSIONS AND FUTURE PERSPECTIVES

CB-Dock2 inherits the popular features of the original version and improves on it by integrating a template-based blind docking module, which enables users to obtain potential binding sites and binding modes by referring to known protein–ligand structure information. The reconfigured user interface offers more options for sophisticated and diverse data submission and more convenient visualization of the results. The additional ligand drawing interface and the upload module for user-defined template complexes, together with the newly integrated FitDock, make CB-Dock2 particularly convenient for drug design and optimization beyond cavity detection and blind docking. Despite these enhancements, CB-Dock2 still has some shortcomings that need to be addressed in our subsequent work. For instance, the current version cannot distinguish between the asymmetric unit and the biological assembly of user-uploaded protein structures, which may result in artificial binding cavities. It also does not support fixing missing residues, optimizing protein structures or modeling receptor flexibility, which may be crucial for molecular docking. More importantly, the docking engines should be continuously improved. We hope that, with our continued efforts and users’ feedback, CB-Dock2 will become increasingly powerful and widely used.

DATA AVAILABILITY

The CB-Dock2 web server is publicly available at https://cadd.labshare.cn/cb-dock2/.

34 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

Review 2. Trends in the exploitation of novel drug targets.

Authors: Mathias Rask-Andersen; Markus Sällman Almén; Helgi B Schiöth
Journal: Nat Rev Drug Discov Date: 2011-08-01 Impact factor: 84.694

3. AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings.

Authors: Jerome Eberhardt; Diogo Santos-Martins; Andreas F Tillack; Stefano Forli
Journal: J Chem Inf Model Date: 2021-07-19 Impact factor: 4.956

4. Protein-ligand interaction prediction: an improved chemogenomics approach.

Authors: Laurent Jacob; Jean-Philippe Vert
Journal: Bioinformatics Date: 2008-08-01 Impact factor: 6.937

5. MTiOpenScreen: a web server for structure-based virtual screening.

Authors: Céline M Labbé; Julien Rey; David Lagorce; Marek Vavruša; Jérome Becot; Olivier Sperandio; Bruno O Villoutreix; Pierre Tufféry; Maria A Miteva
Journal: Nucleic Acids Res Date: 2015-04-08 Impact factor: 16.971

6. COACH-D: improved protein-ligand binding sites prediction with refined ligand-binding poses through molecular docking.

Authors: Qi Wu; Zhenling Peng; Yang Zhang; Jianyi Yang
Journal: Nucleic Acids Res Date: 2018-07-02 Impact factor: 16.971

7. CB-Dock: a web server for cavity detection-guided protein-ligand blind docking.

Authors: Yang Liu; Maximilian Grimm; Wen-Tao Dai; Mu-Chun Hou; Zhi-Xiong Xiao; Yang Cao
Journal: Acta Pharmacol Sin Date: 2019-07-01 Impact factor: 6.150

8. The Escherichia coli two-component signal sensor BarA binds protonated acetate via a conserved hydrophobic-binding pocket.

Authors: Adrián F Alvarez; Claudia Rodríguez; Ricardo González-Chávez; Dimitris Georgellis
Journal: J Biol Chem Date: 2021-11-04 Impact factor: 5.157

9. BioLiP: a semi-manually curated database for biologically relevant ligand-protein interactions.

Authors: Jianyi Yang; Ambrish Roy; Yang Zhang
Journal: Nucleic Acids Res Date: 2012-10-18 Impact factor: 16.971

10. In silico identification of Tretinoin as a SARS-CoV-2 envelope (E) protein ion channel inhibitor.

Authors: Debajit Dey; Subhomoi Borkotoky; Manidipa Banerjee
Journal: Comput Biol Med Date: 2020-10-20 Impact factor: 4.589
