Pedro H M Torres, Ana C R Sodero, Paula Jofily, Floriano P Silva-Jr.
Abstract
Molecular docking has been widely employed as a fast and inexpensive technique in the past decades, both in academic and industrial settings. Although this discipline has now had enough time to consolidate, many aspects remain challenging, and there is still no straightforward and accurate route to readily pinpoint true ligands among a set of molecules, nor to identify with precision the correct ligand conformation within the binding pocket of a given target molecule. Nevertheless, new approaches continue to be developed and the volume of published work grows at a rapid pace. In this review, we present an overview of the method and attempt to summarise recent developments regarding four main aspects of molecular docking approaches: (i) the available benchmarking sets, highlighting their advantages and caveats, (ii) the advances in consensus methods, (iii) recent algorithms and applications using fragment-based approaches, and (iv) the use of machine learning algorithms in molecular docking. These recent developments incrementally contribute to an increase in accuracy and are expected, given time and together with advances in computing power and hardware capability, to eventually realise the full potential of this area.
Keywords: benchmarking sets; computer-aided drug design; consensus methods; fragment-based; machine learning; structure-based drug design
Year: 2019 PMID: 31540192 PMCID: PMC6769580 DOI: 10.3390/ijms20184574
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. General workflow of molecular docking calculations. The approaches normally start by obtaining 3D structures of target and ligands. Then, protonation states and partial charges are assigned. If not previously known, the target binding site is detected, or a blind docking simulation may be performed. Molecular docking calculations are carried out in two main steps, posing and scoring, thus generating a ranked list of possible complexes between target and ligands.
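The posing-and-scoring loop summarised in Figure 1 can be illustrated with a deliberately tiny Python sketch. It assumes the ligands are already prepared (3D coordinates, protonation states and charges handled upstream) and that the binding site is known and modelled as a sphere; `toy_score`, `pose_ligand` and `dock` are hypothetical helpers, and real programs also sample rotations and torsions and use far richer scoring functions.

```python
import math
import random

Coord = tuple[float, float, float]

def toy_score(atoms: list[Coord], centre: Coord, radius: float) -> float:
    """Hypothetical score (lower is better): -1 per atom inside the pocket sphere,
    plus a quadratic penalty for each atom outside it. Not any real program's function."""
    total = 0.0
    for atom in atoms:
        d = math.dist(atom, centre)
        total += -1.0 if d <= radius else (d - radius) ** 2
    return total

def pose_ligand(atoms: list[Coord], centre: Coord, n_poses: int,
                rng: random.Random) -> list[list[Coord]]:
    """Posing step (translations only): place the ligand centroid at random points near the site."""
    centroid = [sum(a[i] for a in atoms) / len(atoms) for i in range(3)]
    poses = []
    for _ in range(n_poses):
        shift = [centre[i] + rng.uniform(-2.0, 2.0) - centroid[i] for i in range(3)]
        poses.append([(x + shift[0], y + shift[1], z + shift[2]) for x, y, z in atoms])
    return poses

def dock(ligands: dict[str, list[Coord]], centre: Coord, radius: float,
         n_poses: int = 50, seed: int = 0) -> list[tuple[float, str]]:
    """Pose and score every ligand, keep its best score, and return a ranked list."""
    rng = random.Random(seed)
    ranked = []
    for name, atoms in ligands.items():
        poses = pose_ligand(atoms, centre, n_poses, rng)
        best = min(toy_score(p, centre, radius) for p in poses)  # scoring step
        ranked.append((best, name))
    return sorted(ranked)  # lowest (best) score first

# Usage with two hypothetical two-atom "ligands": only the compact one fits the pocket.
ligands = {"compact": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           "elongated": [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0)]}
print(dock(ligands, centre=(10.0, 10.0, 10.0), radius=3.0))
```

The ranked list returned by `dock` plays the role of the final output in Figure 1; swapping `toy_score` for a different function changes only the scoring step, which is why posing and scoring are often discussed separately.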
Molecular docking software.
| Software | Posing | Scoring | Availability | Reference |
|---|---|---|---|---|
| Vina | Iterated Local Search + BFGS Local Optimiser | Empirical/Knowledge-based | Free (Apache License) | Trott, 2010 |
| AutoDock4 | Lamarckian Genetic Algorithm, Genetic Algorithm or Simulated Annealing | Semiempirical | Free (GNU License) | Morris, 2009; Huey, 2007 |
| Molegro/MolDock | Differential Evolution (alternatively Simplex Evolution and Iterated Simplex) | Semiempirical | Commercial | Thomsen, 2006 |
| Smina | Monte Carlo stochastic sampling + local optimisation | Empirical (customisable) | Free (GNU License) | Koes, 2013 |
| Plants | Ant Colony Optimisation | Empirical | Academic License | Korb, 2007; Korb, 2009 |
| ICM | Biased Probability Monte Carlo + Local Optimisation | Physics-based | Commercial | Abagyan, 1993; Abagyan, 1994 |
| Glide | Systematic search + Optimisation (XP mode also uses anchor-and-grow) | Empirical | Commercial | Friesner, 2004 |
| Surflex | Fragmentation and alignment to idealised molecule (Protomol) + BFGS optimisation | Empirical | Commercial | Jain, 2003; Jain, 2007 |
| GOLD | Genetic Algorithm | Physics-based (GoldScore), Empirical (ChemScore, ChemPLP) and Knowledge-based (ASP) | Commercial | Jones, 1997; Verdonk, 2003 |
| GEMDOCK | Generic Evolutionary Algorithm | Empirical (includes pharmacophore potential) | Free (for non-commercial research) | Yang, 2004 |
| Dock6 | Anchor-and-grow incremental construction | Physics-based (several other options available) | Academic License | Allen, 2015 |
| GAsDock | Entropy-based multi-population genetic algorithm | Physics-based | * | Li, 2004 |
| FlexX | Fragment-Based Pattern-recognition (Pose Clustering) + Incremental Growth | Empirical | Commercial | Rarey, 1996; Rarey, 1996b |
| Fred | Conformer generation + Systematic rigid body search | Empirical (defaults to Chemgauss3) | Commercial | McGann, 2011 |
| DockThor | Steady-state genetic algorithm (with Dynamic Modified Restricted Tournament Selection method) | Physics-based + Empirical | Free (Webserver) | De Magalhães, 2014 |
* Availability is unclear.
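Several programs in the table (e.g., Vina, Smina and ICM) couple stochastic sampling with local optimisation. The sketch below shows that general idea, a Monte Carlo walk over locally optimised points with a Metropolis acceptance test, on a hypothetical two-dimensional energy function that stands in for a scoring function over pose coordinates. It uses plain gradient descent rather than BFGS, and it is not a reimplementation of any listed program; `energy`, `local_opt` and `mc_local_search` are illustrative names.

```python
import math
import random

def energy(x: float, y: float) -> float:
    # Hypothetical rugged landscape standing in for a pose score:
    # global minimum of 0 at (0, 0), surrounded by many local minima.
    return x * x + y * y + 3.0 * (2.0 - math.cos(3.0 * x) - math.cos(3.0 * y))

def gradient(x: float, y: float) -> tuple[float, float]:
    # Analytic gradient of energy()
    return 2.0 * x + 9.0 * math.sin(3.0 * x), 2.0 * y + 9.0 * math.sin(3.0 * y)

def local_opt(x: float, y: float, lr: float = 0.02, steps: int = 200) -> tuple[float, float]:
    # Plain gradient descent, a simplified stand-in for a quasi-Newton (BFGS) optimiser
    for _ in range(steps):
        gx, gy = gradient(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

def mc_local_search(start, n_iter=300, step=1.0, temperature=1.0, seed=0):
    """Perturb the current point, minimise locally, accept with the Metropolis rule."""
    rng = random.Random(seed)
    x, y = local_opt(*start)
    e = energy(x, y)
    best = (e, x, y)
    for _ in range(n_iter):
        cx, cy = local_opt(x + rng.gauss(0.0, step), y + rng.gauss(0.0, step))
        ce = energy(cx, cy)
        # Always accept improvements; accept worse points with Boltzmann-like probability
        if ce <= e or rng.random() < math.exp(-(ce - e) / temperature):
            x, y, e = cx, cy, ce
            best = min(best, (e, x, y))
    return best

e, x, y = mc_local_search(start=(4.0, -4.0))
print(f"best energy {e:.4f} at ({x:.3f}, {y:.3f})")
```

Local optimisation alone would stay trapped in the minimum nearest the starting point; the random perturbations let the search hop between basins, while the acceptance rule occasionally allows uphill moves.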
Consensus docking methods.
| Source | T ᵃ | Posing ᵇ | F ᶜ | Consensus Strategy | Analysis | Ref. |
|---|---|---|---|---|---|---|
|  | 102/3 | 4 | 4 | Standard Deviation Consensus (SDC), | Rank/Score curves | Chaput, 2016 |
|  | 21 | 8 | 8 | Gradient Boosting | EF, ROCAUC | Ericksen, 2017 |
|  | 228/1 | Vina, AutoDock | 2 | Compound rejection if pose RMSD > 2.0 Å | Success rate | Houston, 2013 |
|  | 3 | GAsDock | 2 | Multi-Objective Scoring Function Optimisation | EF | Kang, 2019 |
|  | 1 | Glide | 26 | Linear Combination | BEI Correlation | Li, 2018 |
|  | 220 | FlexX | 9 | Several ᵉ | Compression and Accuracy | Oda, 2006 |
|  | 102 | Dock 3.6 | 15 | Genetic Algorithm used to combine SF components | EF, BEDROC | Perez-Castillo, 2019 |
|  | 1300 | 7 | 7 | RMSD-based pose consensus, multivariate linear regression | Success rate | Plewczynski, 2011 |
|  | 35 | 10 | 10 | Compound rejection based on RMSD consensus level | EF | Poli, 2016 |
|  | 3535 | 11 | 11 | Selection of representative pose with minimum RMSD | Success rate | Ren, 2018 |
|  | 100 | AutoDock | 11 | Supervised Learning (Random Forests), | Average RMSD, | Teramoto, 2007 |
|  | 130/3 | 10 | 10 | Compound rejection based on RMSD consensus level | EF, ROCAUC | Tuccinardi, 2014 |
|  | 421 | Glide | 7 | Support Vector Rank Regression | Top pose/Top Rank | Wang, 2013 |
|  | 4 | GEMDOCK | 2 | Rank-by-rank, | Rank/Score curve, GH Score, CS index | Yang, 2005 |
a Total number of targets used in the assay; b Posing software used. If more than two programs were used, only their number is indicated; c Number of scoring functions used; d In this study, the dataset was composed of 25 mammalian target of rapamycin (mTOR) kinase inhibitors retrieved from the literature and six mTOR crystal structures retrieved from the PDB; e The purpose of this study was to evaluate several different consensus strategies (e.g., rank-by-vote, rank-by-number, etc.).
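Several entries in the table (e.g., Houston, 2013; Poli, 2016; Tuccinardi, 2014) accept a compound only when different programs place it in similar poses. The sketch below illustrates that idea under stated assumptions: the top-ranked pose from each program is available as heavy-atom coordinates with matching atom order, all poses share the receptor's frame (so no superposition is applied), and the consensus level is taken as the largest number of programs agreeing within the 2.0 Å cutoff quoted in the table. The function names and the exact counting rule are hypothetical, not the published protocols.

```python
import math
from itertools import combinations

Pose = list[tuple[float, float, float]]

def rmsd(a: Pose, b: Pose) -> float:
    """RMSD with matched atom order in the shared receptor frame
    (no superposition, no symmetry correction)."""
    if len(a) != len(b):
        raise ValueError("poses must contain the same atoms in the same order")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

def consensus_level(poses: dict[str, Pose], cutoff: float = 2.0) -> int:
    """Largest number of programs (counting the reference program itself)
    whose poses lie within `cutoff` angstroms of one program's pose."""
    agree = {name: 1 for name in poses}
    for m, n in combinations(poses, 2):
        if rmsd(poses[m], poses[n]) <= cutoff:
            agree[m] += 1
            agree[n] += 1
    return max(agree.values())

def keep_compounds(results: dict[str, dict[str, Pose]], min_level: int,
                   cutoff: float = 2.0) -> list[str]:
    """Reject compounds whose consensus level is below `min_level`."""
    return [cid for cid, poses in results.items()
            if consensus_level(poses, cutoff) >= min_level]
```

With only two programs and `min_level=2`, this reduces to the simpler rule of rejecting a compound when its two poses differ by more than 2.0 Å.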
Recent works using consensus docking approaches.
| Target | Lig. | Posing | F ᵃ | Consensus Strategy | Hits/Test | Best Activity (IC50) | Ref. |
|---|---|---|---|---|---|---|---|
|  | 3.57 × 10⁷ | Vina, FlexX | 2 | Sequential Docking | - | - | Onawole, 2018 |
|  | 1.13 × 10⁵ | Surflex | 12 | Z-scaled rank-by-number | 2/20 | 51.6 μM | Liu, 2012 |
|  | 738 | 2 | 2 | Sequential Docking; Compound rejection if pose RMSD > 2.0 Å | - | - | Aliebrahimi, 2017 |
|  | 14,758 | 4 | 4 | vSDC | 12/14 | 47.3 nM | Mokrani, 2019 |
|  | 32,500 | 10 | 10 | Compound rejection based on RMSD consensus level | 1/10 | 13.4 μM | Spena, 2019 |
|  | 47 | LigandFit | 5 | Support Vector Regression | 6/6 ᵇ | 7.7 nM | Zhan, 2014 |
|  | 4.80 × 10⁵ | 4 | 4 | Compound rejection based on RMSD consensus level | 1/3 | 6.1 μM | Mouawad, 2019 |
a Number of scoring functions used; b This work consisted of a Quantitative Structure-Activity Relationship (QSAR) model using consensus docking as descriptors. Six compounds were designed, synthesised and tested, exhibiting IC50 values between 7.7 nM and 4.3 μM; c First IC50 value: inhibitory activity against PIN1 isomerisation. Second IC50 value: inhibitory effects on ovarian cancer cell lines.
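Score-based consensus strategies such as the Z-scaled rank-by-number used by Liu (2012), and the rank-by-vote strategy mentioned among the consensus methods, can be sketched as follows. Assumptions: every scoring function scores the same compounds, lower scores are better, rank-by-number averages each compound's z-scaled scores, and rank-by-vote counts how often a compound falls within a chosen top fraction of each function's list. The helper names and the default fraction are hypothetical.

```python
from statistics import mean, pstdev

def z_scale(scores: dict[str, float]) -> dict[str, float]:
    """Standardise one scoring function's scores so functions in different units can be combined."""
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    return {c: (s - mu) / sd if sd > 0 else 0.0 for c, s in scores.items()}

def rank_by_number(tables: dict[str, dict[str, float]]) -> list[str]:
    """Z-scaled rank-by-number: average each compound's z-scores across functions (lower is better)."""
    scaled = [z_scale(t) for t in tables.values()]
    compounds = list(scaled[0])
    return sorted(compounds, key=lambda c: mean(s[c] for s in scaled))

def rank_by_vote(tables: dict[str, dict[str, float]], top_fraction: float = 0.5) -> list[str]:
    """Rank-by-vote: one vote from each function that places the compound in its top fraction."""
    votes = {c: 0 for c in next(iter(tables.values()))}
    for t in tables.values():
        n_top = max(1, int(len(t) * top_fraction))
        for c in sorted(t, key=t.get)[:n_top]:
            votes[c] += 1
    return sorted(votes, key=lambda c: votes[c], reverse=True)

# Hypothetical scores from two functions on very different numerical scales
tables = {"sf1": {"a": -10.0, "b": -8.0, "c": -2.0},
          "sf2": {"a": -50.0, "b": -70.0, "c": -10.0}}
print(rank_by_number(tables), rank_by_vote(tables))
```

Without z-scaling, the function with the larger numerical range (here `sf2`) would dominate a plain average; standardising first gives each function comparable weight.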
Figure 2. Scopus search results for the query “TITLE-ABS-KEY (software AND docking) AND PUBYEAR > 1994 AND PUBYEAR < 2019”, where the word software is replaced by the name of one of the eight most commonly used docking programs or by the word consensus.
Figure 3. Ratio of the number of papers containing either the expression “molecular docking” or “ligand docking” to the number of papers containing either of the two expressions AND the word consensus.
Figure 4. Learning methods can be broadly divided into supervised learning, when data are available for training and parameterisation, and unsupervised learning, when no such data are available. Unsupervised learning cannot be used for binding affinity prediction and virtual screening. Supervised learning, in turn, can be divided into parametric and nonparametric learning. Parametric learning assumes a predetermined functional form, as in linear regression, and is the approach employed in classical scoring functions. Nonparametric learning, or simply machine learning, does not presume a predetermined functional form; the form is instead inferred from the data itself. It can yield continuous output, as in nonlinear regression, or discrete output for classification problems such as distinguishing binders from nonbinders.
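The parametric/nonparametric distinction drawn in Figure 4 can be seen on a toy one-dimensional problem. The parametric model is an ordinary least-squares line, the kind of fixed functional form assumed by classical linear scoring functions. For the nonparametric side, this sketch swaps in k-nearest-neighbour regression simply because it fits in a few lines; it is not one of the algorithms listed in the table that follows. The data is a synthetic parabola, not binding data, and `fit_linear`/`fit_knn` are illustrative names.

```python
from statistics import mean

def fit_linear(xs, ys):
    """Parametric: ordinary least squares for y = slope * x + intercept (form fixed in advance)."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_knn(xs, ys, k=3):
    """Nonparametric: predict the mean target of the k nearest training points."""
    data = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict

# Hypothetical toy data: a symmetric, clearly nonlinear relationship
xs = [float(x) for x in range(11)]
ys = [(x - 5.0) ** 2 for x in xs]
linear, knn = fit_linear(xs, ys), fit_knn(xs, ys)
print(linear(5.0), knn(5.0))  # the true value at x = 5 is 0
```

On this symmetric curve the fitted line is flat and predicts the mean target (10) everywhere, whereas the neighbour-based model follows the local shape of the data, which is the flexibility Figure 4 attributes to nonparametric learning.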
Recent developments using machine learning (ML) algorithms in molecular docking.
| SF Name | ML Algorithm | Training Database | Best Performance | Generic or Family-Specific | Type of Docking Study | Reference |
|---|---|---|---|---|---|---|
| RF-Score | RF ᵃ | PDBbind | Rp ᵇ = 0.776 | Generic | BAP ᶜ | Ballester, 2010 |
| B2BScore | RF | PDBbind | Rp = 0.746 | Generic | BAP | Liu, 2013 |
| SFCScoreRF | RF | PDBbind | Rp = 0.779 | Generic | BAP | Zilian, 2013 |
| PostDOCK | RF | Constructed from PDB | 92% accuracy | Generic | VS ᵈ | Springer, 2005 |
| - | SVM ᵉ | DUD | - | Both | VS | Kinnings, 2011 |
| ID-Score | SVR ᶠ | PDBbind | Rp = 0.85 | Generic | BAP | Li, 2013 |
| NNScore | NN ᵍ | PDB; MOAD; PDBbind-CN | EF = 10.3 | Generic | VS | Durrant, 2010 |
| CScore | NN | PDBbind | Rp = 0.7668 (generic); Rp = 0.8237 (family-specific) | Both | BAP | Ouyang, 2011 |
| - | Deep NN | CSAR, DUD-E | ROCAUC = 0.868 | Generic | VS | Ragoza, 2017 |
| - | Deep NN | DUD-E | ROCAUC = 0.92 | Both | VS | Imrie, 2018 |
| DLScore | Deep NN | PDBbind | Rp = 0.82 | Generic | BAP | Hassan, 2018 |
| DeepVS | Deep NN | DUD | ROCAUC = 0.81 | Generic | VS | Pereira, 2016 |
| Kdeep | Deep NN | PDBbind | Rp = 0.82 | Generic | BAP | Jiménez, 2018 |
a Random Forest; b Pearson’s Correlation Coefficient; c Binding Affinity Prediction; d Virtual Screening; e Support Vector Machine; f Support Vector Regression; g Neural Network.
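The performance figures in the table above use different metrics: Pearson's Rp for binding affinity prediction, and EF or ROCAUC for virtual screening. A small sketch of how these are commonly computed, assuming lower docking scores are better and treating the early-enrichment fraction as a free parameter (1% by default here); EF conventions vary slightly between studies, so this is one common definition rather than the one used by every cited work.

```python
import math
from statistics import mean

def pearson_r(xs, ys):
    """Pearson's correlation between predicted and experimental affinities."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def enrichment_factor(scores, is_active, fraction=0.01):
    """EF = (active rate in the top-ranked fraction) / (active rate in the whole library)."""
    n = len(scores)
    n_top = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: scores[i])  # best (lowest) scores first
    hits_top = sum(is_active[i] for i in order[:n_top])
    return (hits_top / n_top) / (sum(is_active) / n)

def roc_auc(scores, is_active):
    """Probability that a random active scores better than a random decoy (ties count 0.5)."""
    actives = [s for s, a in zip(scores, is_active) if a]
    decoys = [s for s, a in zip(scores, is_active) if not a]
    wins = sum(1.0 if a < d else 0.5 if a == d else 0.0 for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))

# Hypothetical screen: 10 compounds, 2 actives (one ranked first, one ranked last)
scores = [-9.0, -8.0, -7.0, -6.0, -5.0, -4.0, -3.0, -2.0, -1.0, 0.0]
is_active = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(enrichment_factor(scores, is_active, fraction=0.1), roc_auc(scores, is_active))
```

The example shows why the two screening metrics can disagree: the single early hit gives a high EF at 10%, while the last-ranked active pulls the ROCAUC down to the value expected from random ranking.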