José Jiménez-Luna, Alberto Cuzzolin, Giovanni Bolcato, Mattia Sturlese, Stefano Moro.
Abstract
While a plethora of protein-ligand docking protocols have been developed over the past twenty years, their performance depends greatly on the input protein-ligand pair. In this study, we developed a machine-learning model that combines convolutional and fully connected neural networks to predict the performance of several popular docking protocols given a protein structure and a small compound. We rigorously evaluated the model using a widely available database of protein-ligand complexes and different types of data splits. We also open-source all code related to this study so that potential users can make an informed choice of the protocol best suited to their particular protein-ligand pair.
Keywords: chemoinformatics; deep learning; molecular docking; structural biology
Year: 2020 PMID: 32471211 PMCID: PMC7321124 DOI: 10.3390/molecules25112487
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Predictive performance per docking protocol, for each of the four splits considered.
| Protocol | Random RMSE | Random Pearson’s R | Ligand Scaffold RMSE | Ligand Scaffold Pearson’s R | Protein Classes RMSE | Protein Classes Pearson’s R | Protein Classes Balanced RMSE | Protein Classes Balanced Pearson’s R |
|---|---|---|---|---|---|---|---|---|
| autodock-ga | 1.60 | 0.74 | 1.34 | 0.38 | 1.76 | 0.60 | 1.48 | 0.73 |
| autodock-lga | 2.01 | 0.65 | 1.82 | 0.30 | 2.20 | 0.57 | 1.89 | 0.70 |
| autodock-ls | 2.04 | 0.50 | 1.79 | 0.50 | 2.02 | 0.41 | 1.93 | 0.46 |
| glide-sp | 2.79 | 0.52 | 3.34 | 0.14 | 2.84 | 0.44 | 2.34 | 0.64 |
| gold-asp | 2.43 | 0.68 | 2.50 | 0.50 | 2.52 | 0.64 | 2.08 | 0.78 |
| gold-chemscore | 2.59 | 0.62 | 2.74 | 0.37 | 2.62 | 0.61 | 2.25 | 0.73 |
| gold-goldscore | 2.47 | 0.52 | 2.44 | 0.53 | 2.49 | 0.51 | 2.12 | 0.66 |
| gold-plp | 2.49 | 0.66 | 2.53 | 0.32 | 2.57 | 0.62 | 2.14 | 0.76 |
| plants-chemplp | 2.55 | 0.44 | 2.68 | −0.02 | 2.55 | 0.56 | 2.23 | 0.58 |
| plants-plp95 | 3.04 | 0.42 | 3.16 | −0.12 | 3.08 | 0.40 | 2.58 | 0.57 |
| plants-plp | 2.75 | 0.43 | 2.76 | 0.09 | 2.79 | 0.41 | 2.44 | 0.54 |
| rdock-solv | 3.95 | 0.35 | 3.58 | 0.09 | 3.73 | 0.42 | 3.33 | 0.54 |
| rdock-std | 3.92 | 0.35 | 3.62 | 0.08 | 3.71 | 0.42 | 3.23 | 0.56 |
| vina-std | 2.23 | 0.40 | 2.30 | 0.19 | 2.35 | 0.33 | 1.97 | 0.69 |
| Average | 2.63 | 0.52 | 2.62 | 0.24 | 2.66 | 0.50 | 2.29 | 0.64 |
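The two statistics reported in the table can be computed as in the following minimal sketch. The function names and the `observed`/`predicted` sample values are hypothetical and purely illustrative; only the `random_rmse` list is copied from the Random RMSE column above, to show that the table's final row is consistent with a column mean.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(y_true, y_pred):
    """Pearson's correlation coefficient R between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Hypothetical observed vs. predicted values for one protocol (not from the paper).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
print(rmse(observed, predicted), pearson_r(observed, predicted))

# Random-split RMSE values per protocol, copied from the table above.
random_rmse = [1.60, 2.01, 2.04, 2.79, 2.43, 2.59, 2.47,
               2.49, 2.55, 3.04, 2.75, 3.95, 3.92, 2.23]
mean_random_rmse = sum(random_rmse) / len(random_rmse)
print(round(mean_random_rmse, 2))
```

Note that a low RMSE and a high Pearson's R measure different things: the first penalizes absolute deviation, the second only rewards linear agreement in ranking and trend, which is why the table reports both.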
Ligand-centric evaluation for the four different split types proposed in this study.
| Split Type | Pearson’s R | RMSE |
|---|---|---|
| random | | |
| ligand scaffold | | |
| protein classes | | |
| protein classes balanced | | |
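Splits such as "ligand scaffold" or "protein classes" are generally understood as group-aware splits: every member of a group (a scaffold, a protein class) lands on the same side of the train/test boundary, so the test set probes generalization to unseen groups. The sketch below illustrates that idea only; it is not the authors' splitting procedure. The function `group_split`, the `n_test` parameter, and the ligand/scaffold pairs are all hypothetical, and scaffold keys are assumed to have been computed beforehand with a cheminformatics toolkit.

```python
from collections import defaultdict


def group_split(items, group_of, n_test):
    """Split items into train/test so that no group appears in both sets.

    Groups are assigned whole: the largest groups fill the training set
    until adding another would exceed its size budget; every group that
    does not fit goes to the test set.
    """
    groups = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)

    max_train = len(items) - n_test
    train, test = [], []
    for members in sorted(groups.values(), key=len, reverse=True):
        if len(train) + len(members) <= max_train:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


# Hypothetical (ligand id, scaffold key) pairs; not taken from the paper.
ligands = [
    ("lig01", "benzene"), ("lig02", "benzene"), ("lig03", "benzene"),
    ("lig04", "benzene"), ("lig05", "indole"), ("lig06", "indole"),
    ("lig07", "indole"), ("lig08", "pyridine"), ("lig09", "pyridine"),
    ("lig10", "purine"),
]
train, test = group_split(ligands, group_of=lambda pair: pair[1], n_test=3)
train_scaffolds = {scaffold for _, scaffold in train}
test_scaffolds = {scaffold for _, scaffold in test}
print(sorted(train_scaffolds), sorted(test_scaffolds))
```

The same function would serve a protein-class split by passing a `group_of` that returns a class label instead of a scaffold key.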
Figure 1. Ligand-centric evaluation merging all protocols, for all proposed split types.
Figure 2. Distribution of the values of the three evaluated metrics in a self-docking scenario using the PDBbind v.2017 database of cocrystals, for all the protocols described in Table 3 and for the approach proposed in this work under different evaluation scenarios.
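In a self-docking scenario, a ligand is docked back into the protein it was crystallized with, and pose quality is conventionally measured as the RMSD between the docked pose and the crystallographic pose. The sketch below assumes that convention, identical atom ordering in both poses, and no symmetry correction; `pose_rmsd` and the coordinates are hypothetical. No superposition is applied, because both poses already share the coordinate frame of the protein.

```python
import math


def pose_rmsd(reference, pose):
    """RMSD between two poses given as lists of (x, y, z) tuples.

    Assumes both lists describe the same atoms in the same order.
    The result is in the same length unit as the inputs.
    """
    if len(reference) != len(pose):
        raise ValueError("poses must contain the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for ref_atom, pose_atom in zip(reference, pose)
        for a, b in zip(ref_atom, pose_atom)
    )
    return math.sqrt(squared / len(reference))


# Hypothetical three-atom ligand: the docked pose is shifted by 1 unit along x.
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
docked = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
rmsd_value = pose_rmsd(crystal, docked)
print(rmsd_value)
```

A cutoff of 2 Å on this value is a widely used rule of thumb for calling a docked pose successful; whether the study applies that exact cutoff is not stated in this excerpt.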
Figure 3. Average Pearson’s R correlation coefficient for the metric, for all split types, disaggregated by the 30 most populated PFAM families in the PDBbind refined dataset.
Docking protocols, search algorithms, and scoring functions considered in this study.
| Software | Search Algorithm | Scoring Function | Protocol Abbrv. |
|---|---|---|---|
| Autodock 4.2 | Local search | Autodock SF | autodock-ls |
| Autodock 4.2 | Lamarckian GA | Autodock SF | autodock-lga |
| Autodock 4.2 | GA | Autodock SF | autodock-ga |
| Glide 6.5 | Glide algorithm | Standard precision | glide-sp |
| GOLD 5.4.1 | GA | ASP | gold-asp |
| GOLD 5.4.1 | GA | Chemscore | gold-chemscore |
| GOLD 5.4.1 | GA | Goldscore | gold-goldscore |
| GOLD 5.4.1 | GA | PLP | gold-plp |
| PLANTS 1.2 | ACO algorithm | ChemPLP | plants-chemplp |
| PLANTS 1.2 | ACO algorithm | PLP | plants-plp |
| PLANTS 1.2 | ACO algorithm | PLP95 | plants-plp95 |
| rDock 2013.1 | GA + MC + Simplex minimization | rDock master SF | rdock-std |
| rDock 2013.1 | GA + MC + Simplex minimization | rDock master SF + desolvation | rdock-solv |
| Vina 1.1.2 | MC + BFGS local search | Vina SF | vina-std |
GA (Genetic Algorithm), MC (Monte Carlo), BFGS (Broyden–Fletcher–Goldfarb–Shanno), ASP (Astex Statistical Potential), PLP (Pairwise Linear Potential), ACO (Ant Colony Optimization).
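The abstract's stated use case, choosing the protocol best suited to a given protein-ligand pair, amounts to ranking the 14 protocol abbreviations in the table above by a predicted quality value. The following sketch assumes a single hypothetical "predicted error" per protocol where lower is better; the function `rank_protocols` and the numeric values are illustrative and are not output of the published model.

```python
# Protocol abbreviations as listed in the table above.
PROTOCOLS = [
    "autodock-ls", "autodock-lga", "autodock-ga", "glide-sp",
    "gold-asp", "gold-chemscore", "gold-goldscore", "gold-plp",
    "plants-chemplp", "plants-plp", "plants-plp95",
    "rdock-std", "rdock-solv", "vina-std",
]


def rank_protocols(predicted_error):
    """Return protocol names sorted from lowest to highest predicted error."""
    unknown = set(predicted_error) - set(PROTOCOLS)
    if unknown:
        raise ValueError(f"unknown protocol(s): {sorted(unknown)}")
    return sorted(predicted_error, key=predicted_error.get)


# Hypothetical predictions for one protein-ligand pair (lower is better).
predicted_error = {
    "gold-asp": 1.2,
    "vina-std": 2.9,
    "glide-sp": 1.8,
    "rdock-std": 4.1,
}
ranking = rank_protocols(predicted_error)
print(ranking[0], ranking)
```

Validating names against the fixed list guards against silent typos such as `gold_asp`, which would otherwise be ranked as if it were a real protocol.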
Figure 4. Schema of the architecture proposed in this work. A fully connected neural network handles ECFP4 fingerprints and descriptors computed with RDKit, while a 3D convolutional neural network processes a voxelized representation of the protein binding site. The latent representations of both inputs are then concatenated and fed into further fully connected layers that predict the three outputs of interest per docking protocol.
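The data flow described in this caption can be sketched in PyTorch as follows. This is a shape-level illustration under stated assumptions, not the authors' implementation: the class name, layer widths, number of voxel channels, grid size, and fingerprint length are all hypothetical. Only the overall structure (one fully connected branch, one 3D convolutional branch, concatenation, and three outputs for each of the 14 protocols) is taken from the text.

```python
import torch
from torch import nn


class DockingPerformanceNet(nn.Module):
    """Two-branch network: ligand features (MLP) plus voxelized pocket (3D CNN)."""

    def __init__(self, n_ligand_features=1024, n_voxel_channels=8,
                 n_protocols=14, n_outputs=3):
        super().__init__()
        self.n_protocols = n_protocols
        self.n_outputs = n_outputs
        # Fully connected branch for fingerprints and descriptors (sizes assumed).
        self.ligand_branch = nn.Sequential(
            nn.Linear(n_ligand_features, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        # 3D convolutional branch for the binding-site voxel grid (sizes assumed).
        self.pocket_branch = nn.Sequential(
            nn.Conv3d(n_voxel_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # collapse the spatial dimensions to 1x1x1
            nn.Flatten(),             # shape becomes (batch, 32)
        )
        # Joint head applied to the concatenated latent vectors.
        self.head = nn.Sequential(
            nn.Linear(128 + 32, 128), nn.ReLU(),
            nn.Linear(128, n_protocols * n_outputs),
        )

    def forward(self, ligand_features, pocket_voxels):
        ligand_latent = self.ligand_branch(ligand_features)
        pocket_latent = self.pocket_branch(pocket_voxels)
        merged = torch.cat([ligand_latent, pocket_latent], dim=1)
        # One row of n_outputs values per docking protocol.
        return self.head(merged).view(-1, self.n_protocols, self.n_outputs)


model = DockingPerformanceNet()
ligand = torch.rand(2, 1024)           # hypothetical batch of ligand feature vectors
pocket = torch.rand(2, 8, 24, 24, 24)  # hypothetical batch of voxel grids
with torch.no_grad():
    prediction = model(ligand, pocket)
print(prediction.shape)
```

Predicting all protocols from one shared head, rather than training one model per protocol, lets the branches learn a common representation of the pair; whether the published model shares parameters in exactly this way is not stated in this excerpt.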