José Jiménez-Luna1, Laura Pérez-Benito2,3, Gerard Martínez-Rosell4, Simone Sciabola5, Rubben Torella6, Gary Tresadern3, Gianni De Fabritiis1,4,7.
Abstract
The capability to rank different potential drug molecules against a protein target for potency has always been a fundamental challenge in computational chemistry due to its importance in drug design. While several simulation-based methodologies exist, they are hard to use prospectively, and thus predicting potency in lead optimization campaigns remains an open challenge. Here we present the first machine learning approach specifically tailored for ranking congeneric series based on deep 3D-convolutional neural networks. Furthermore, we prove its effectiveness by blindly testing it on datasets provided by Janssen, Pfizer and Biogen totalling over 3246 ligands and 13 targets, as well as several well-known openly available sets, representing one of the largest evaluations ever performed. We also performed online learning simulations of lead optimization using the approach in a predictive manner, obtaining a significant advantage over experimental choice. We believe that the evaluation performed in this study is strong evidence of the usefulness of a modern deep learning model in lead optimization pipelines against more expensive simulation-based alternatives. This journal is © The Royal Society of Chemistry 2019.
Year: 2019 PMID: 32190246 PMCID: PMC7066671 DOI: 10.1039/c9sc04606b
Source DB: PubMed Journal: Chem Sci ISSN: 2041-6520 Impact factor: 9.825
Fig. 1 Architecture of the proposed model. A two-legged neural network with tied weights was constructed; a pair of protein–ligand voxelizations is fed forward through it, and the difference between the two resulting latent representations is then taken.
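The two-legged, tied-weight design described in the caption can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: a single dense layer with a tanh activation stands in for the 3D-convolutional legs, the inputs are random toy vectors rather than real voxelizations, and the names `make_encoder`, `encode` and `predict_difference` are hypothetical. It is not the authors' implementation.

```python
import math
import random


def make_encoder(n_in, n_latent, seed=0):
    """Create one weight matrix; both legs reuse it (tied weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_latent)]


def encode(weights, voxels):
    """Stand-in for one 3D-convolutional leg: a dense layer followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, voxels))) for row in weights]


def predict_difference(weights, head, voxels_a, voxels_b):
    """Encode both complexes with the same weights, subtract the latent
    vectors, and map the difference to a scalar with a bias-free linear head."""
    latent_a = encode(weights, voxels_a)
    latent_b = encode(weights, voxels_b)
    diff = [a - b for a, b in zip(latent_a, latent_b)]
    return sum(h * d for h, d in zip(head, diff))


rng = random.Random(1)
n_voxels, n_latent = 64, 8  # toy sizes, chosen arbitrarily
weights = make_encoder(n_voxels, n_latent)
head = [rng.uniform(-1.0, 1.0) for _ in range(n_latent)]
complex_a = [rng.random() for _ in range(n_voxels)]  # toy "voxelized" complex A
complex_b = [rng.random() for _ in range(n_voxels)]  # toy "voxelized" complex B

delta_ab = predict_difference(weights, head, complex_a, complex_b)
delta_ba = predict_difference(weights, head, complex_b, complex_a)
print(delta_ab, delta_ba)
```

In this sketch, swapping the two inputs flips the sign of the output and comparing a complex with itself gives zero. That follows from the shared weights and the bias-free head chosen here; it is a property of the sketch, not a statement about the published model.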
Fig. 2 Average Pearson's correlation coefficient R (±1 standard deviation) based on 25 independent runs on different sets for the Janssen PDE2, PDE3 and PDE10 targets.
Fig. 3 Average Pearson's correlation coefficient R (±1 standard deviation) based on several independent runs on two sets for the Janssen ROS1 and BACE targets.
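As a rough illustration of the summary statistic reported in these captions, the sketch below computes Pearson's R for each run and then the mean and standard deviation across runs. The affinity values are made up, the helper names are hypothetical, and the use of the sample standard deviation is an assumption, since the captions do not say which estimator was used.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def summarize_runs(experimental, predictions_per_run):
    """Return (mean R, sample standard deviation of R) over independent runs."""
    rs = [pearson_r(experimental, pred) for pred in predictions_per_run]
    return statistics.fmean(rs), statistics.stdev(rs)


# Toy data only: five compounds, three hypothetical runs.
experimental = [5.1, 6.3, 7.0, 7.8, 8.4]
runs = [
    [5.0, 6.0, 7.2, 7.5, 8.8],
    [5.5, 6.1, 6.8, 8.0, 8.1],
    [4.9, 6.6, 6.9, 7.4, 8.6],
]
mean_r, std_r = summarize_runs(experimental, runs)
print(f"R = {mean_r:.2f} ± {std_r:.2f}")
```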
Spearman's ρ between experimental and predicted absolute affinities on data provided by Pfizer I&I, where other empirical, simulation, and machine-learning based affinity prediction methods are compared on several congeneric series. Performance is poor for most tested models except for the sequential approach proposed here, with Pearson correlations averaging over 0.5 when as few as 10% of the analogues from the congeneric series at hand are used for training.
| Target | # ligands | Mol. weight ( | clog P | MM-GBSA ( | | This work (10% training, | This work (20% training, | This work (30% training, |
| Kinase #1 | 362 | 0.19 | 0.06 | 0.56 | 0.42 | 0.49 | 0.64 | 0.73 |
| Kinase #2 | 106 | 0.1 | 0.28 | 0.25 | 0.25 | 0.25 | 0.41 | 0.51 |
| Kinase #3 | 95 | 0 | 0.04 | 0.25 | –0.27 | 0.3 | 0.3 | 0.31 |
| Enzyme | 93 | 0.43 | 0.24 | 0.01 | 0.49 | 0.43 | 0.26 | 0.59 |
| Phosphodiesterase | 100 | 0.37 | 0.36 | 0.67 | 0 | 0.49 | 0.64 | 0.73 |
| Activator of transcriptions | 199 | 0.13 | 0.08 | 0.66 | 0.29 | 0.72 | 0.84 | 0.94 |
| Weighted avg. | | 0.19 | 0.14 | 0.47 | 0.25 | | | |
| Simple avg. | | 0.2 | 0.18 | 0.4 | 0.18 | | | |
Calculated log P as available in RDKit.
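The two summary rows of the table can be checked against the per-target entries. The sketch below assumes that "Weighted avg." means weighting each target's ρ by its number of ligands; that reading is an inference from the numbers, not something stated in the caption. Only the four baseline columns are used, because the averages for the "This work" columns are missing from this record.

```python
# Rows transcribed from the table: (target, number of ligands, rho for the
# four baseline columns in table order).
rows = [
    ("Kinase #1", 362, [0.19, 0.06, 0.56, 0.42]),
    ("Kinase #2", 106, [0.10, 0.28, 0.25, 0.25]),
    ("Kinase #3", 95, [0.00, 0.04, 0.25, -0.27]),
    ("Enzyme", 93, [0.43, 0.24, 0.01, 0.49]),
    ("Phosphodiesterase", 100, [0.37, 0.36, 0.67, 0.00]),
    ("Activator of transcriptions", 199, [0.13, 0.08, 0.66, 0.29]),
]


def simple_average(rows, col):
    """Unweighted mean of one column across targets."""
    return sum(r[2][col] for r in rows) / len(rows)


def weighted_average(rows, col):
    """Mean of one column weighted by ligand count (assumed definition)."""
    total_ligands = sum(r[1] for r in rows)
    return sum(r[1] * r[2][col] for r in rows) / total_ligands


for col in range(4):
    print(col, round(weighted_average(rows, col), 2), round(simple_average(rows, col), 2))
```

Under this assumption the first three baseline columns reproduce the tabulated averages to two decimals. The fourth comes out slightly higher (about 0.27 and 0.20 against the tabulated 0.25 and 0.18), which may reflect rounding of the per-target entries.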
Fig. 4 Results over 5 runs on Biogen's tyrosine-protein kinase and receptor-associated kinase using a temporal split, with MM-GBSA and QSAR random forest pipelines as baselines.
Simulation-based benchmark results over 10 independent runs for the different datasets. We show the number of molecules the model is allowed to pick at each synthesis epoch, the experimental order of the compound with the highest affinity in the series, the average synthesis epoch at which our model found said molecule, the total number of ligands the proposed model sampled before reaching the target compound, and the sampling advantages over the experimental and random orders.
| Target | Set | # ligands | Chosen per synthesis epoch | Experimental order | Found at synthesis epoch | Total sampled ligands | Advantage over experimental choice | Advantage over random choice |
| PDE2 | 1 | 900 | 10 | 766 | 12.2 | 132 | | |
| PDE2 | 2 | 303 | 10 | 61 | 1 | 20 | | |
| PDE2 | 3 | 278 | 10 | 253 | 5.9 | 69 | | |
| ROS1 | — | 165 | 10 | 73 | 3.1 | 41 | | |
| BACE | — | 229 | 10 | 190 | 20.8 | 218 | –28 | –103.5 |
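The arithmetic behind the table can be partly reconstructed from the BACE row, the only one whose advantage values survive in this record. The sketch below rests on three assumptions inferred from the numbers, not stated in the source: total sampled ligands equal one initial batch plus one batch per synthesis epoch; the advantage over experimental choice is the experimental order minus the total sampled ligands; and a random order is expected to reach the best compound halfway through the series. The function names are hypothetical.

```python
# (target, set, n_ligands, chosen_per_epoch, experimental_order,
#  found_at_epoch, total_sampled) transcribed from the table.
rows = [
    ("PDE2", "1", 900, 10, 766, 12.2, 132),
    ("PDE2", "2", 303, 10, 61, 1.0, 20),
    ("PDE2", "3", 278, 10, 253, 5.9, 69),
    ("ROS1", "-", 165, 10, 73, 3.1, 41),
    ("BACE", "-", 229, 10, 190, 20.8, 218),
]


def total_sampled(chosen_per_epoch, found_at_epoch):
    """Assumed: one initial batch plus one batch per synthesis epoch."""
    return round(chosen_per_epoch * (found_at_epoch + 1))


def advantage_over_experimental(experimental_order, sampled):
    """Assumed: ligands saved relative to the experimental synthesis order."""
    return experimental_order - sampled


def advantage_over_random(n_ligands, sampled):
    """Assumed: random picking reaches the best compound after n/2 ligands."""
    return n_ligands / 2 - sampled


for target, set_id, n, k, exp_order, epoch, sampled in rows:
    print(
        target,
        set_id,
        total_sampled(k, epoch),
        advantage_over_experimental(exp_order, sampled),
        advantage_over_random(n, sampled),
    )
```

These definitions reproduce the tabulated BACE values (–28 and –103.5) and the "Total sampled ligands" column for every row. The advantages they yield for the other rows are extrapolations and should not be read as the published figures.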
Fig. 5 Average model-picked training set affinity per number of compounds synthesized for the Janssen PDE2, ROS1 and BACE sets, as well as a baseline based on the actual experimental choice order of compounds.
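The online-learning simulation summarized in the table and in this figure can be sketched as a greedy batch-selection loop. This is only an illustration under stated assumptions: the affinities are synthetic, the predictor is a hypothetical noisy oracle standing in for the retrained network, and the names `simulate_campaign` and `make_noisy_oracle` are invented here. The source does not describe the authors' exact protocol in this record.

```python
import random
import statistics


def make_noisy_oracle(affinities, noise=0.5, seed=1):
    """Hypothetical stand-in for a trained model: true affinity plus noise."""
    rng = random.Random(seed)

    def predict(known, pool):
        # `known` (index -> measured affinity) is ignored by this stand-in;
        # a real model would be retrained on it at every epoch.
        return {i: affinities[i] + rng.gauss(0.0, noise) for i in pool}

    return predict


def simulate_campaign(affinities, predict, batch_size=10, seed=0):
    """Pick the top-scored batch each epoch until the best compound is sampled.

    Returns (epoch, number of sampled ligands, mean sampled affinity per epoch).
    """
    rng = random.Random(seed)
    best = max(range(len(affinities)), key=affinities.__getitem__)
    pool = list(range(len(affinities)))
    rng.shuffle(pool)
    sampled, pool = pool[:batch_size], pool[batch_size:]  # initial random batch
    trace = [statistics.fmean(affinities[i] for i in sampled)]
    epoch = 0
    while best not in sampled:
        epoch += 1
        known = {i: affinities[i] for i in sampled}
        scores = predict(known, pool)
        picked = sorted(pool, key=scores.__getitem__, reverse=True)[:batch_size]
        sampled.extend(picked)
        picked_set = set(picked)
        pool = [i for i in pool if i not in picked_set]
        trace.append(statistics.fmean(affinities[i] for i in sampled))
    return epoch, len(sampled), trace


data_rng = random.Random(42)
affinities = [data_rng.gauss(7.0, 1.0) for _ in range(200)]  # synthetic series
epoch, n_sampled, trace = simulate_campaign(affinities, make_noisy_oracle(affinities))
print(epoch, n_sampled)
```

Averaging `epoch` over several seeds would give a quantity analogous to the "Found at synthesis epoch" column, and `trace` corresponds loosely to the curve of picked training set affinity against compounds synthesized.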