Abstract
We introduce the QuanSA method for inducing physically meaningful field-based models of ligand binding pockets based on structure-activity data alone. The method is closely related to the QMOD approach, substituting a learned scoring field for a pocket constructed of molecular fragments. The problem of mutual ligand alignment is addressed in a general way, and optimal model parameters and ligand poses are identified through multiple-instance machine learning. We provide algorithmic details along with performance results on sixteen structure-activity data sets covering many pharmaceutically relevant targets. In particular, we show how models initially induced from small data sets can extrapolatively identify potent new ligands with novel underlying scaffolds with very high specificity. Further, we show that combining predictions from QuanSA models with those from physics-based simulation approaches is synergistic. QuanSA predictions yield binding affinities, explicit estimates of ligand strain, associated ligand pose families, and estimates of structural novelty and confidence. The method is applicable for fine-grained lead optimization as well as potent new lead identification.
Keywords: Binding affinity; Confidence estimation; Free-energy perturbation; Machine learning; Multiple-instance learning; Pose prediction; QSAR
Year: 2018 PMID: 29934750 PMCID: PMC6096883 DOI: 10.1007/s10822-018-0126-x
Source DB: PubMed Journal: J Comput Aided Mol Des ISSN: 0920-654X Impact factor: 3.686
Fig. 1 Model induction is fully automatic: it begins with pure structure-activity data (top, SMILES and pK), proceeds through generation of core alignments for diverse active ligands and their elaboration into full pose cliques (middle, single poses shown without variations), and ends with derivation of a final pocket-field model (lower left). Training ligand poses adapt to the induced model by optimizing their interactions with it (lower right, initial (cyan) and final (gray) poses of meclonazepam are shown).
Fig. 2 Canonical molecules from the globulin binding sets (top); optimal clique of single poses for each molecule (bottom left); all variants for a single molecule (bottom middle); and the observer points shown in relation to the most active SHBG molecules (bottom right).
Fig. 3 The response functions for QuanSA are computed from observer points (yellow circles); the functions are responsive to molecular shape, hydrogen bond donor/acceptor arrangement (including directionality), and electrostatic field.
Fig. 4 Comparison of pocket-field interactions for the top predicted ligand pose of DHT (left; steric interactions in gray sticks, ligand acceptor to protein donor interactions in red sticks, ligand donor to protein acceptor interactions in blue sticks, and Coulombic interactions in half gray and half red/blue/yellow sticks); experimentally determined ligand poses in the SHBG binding pocket (middle, PDB codes 1D2S and 1LHU); and QuanSA predicted alternative poses for estradiol (right).
Fig. 5 Typical response functions for steric and polar terms are very similar in effect to the scoring function of Surflex-Dock.
Fig. 6 The response functions for the electric field of the ligand can be quite complex, offering the ability to learn common physical interactions in protein-ligand complexes that include protein movement and the presence of complex entropic and water effects.
Fig. 7 The learned response functions for each type of molecular feature at each observer point reflect the sigmoidal and Gaussian shapes of the underlying functional forms; bolded plot points (upper left) correspond to all observed values for one response function each for four observers over all training molecules; bolded plot points (upper right) correspond to all observer values for every response function for only meclonazepam; and the colored circles indicate the particular highlighted observers in the plots and the 3D depictions (bottom).
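The sigmoidal and Gaussian shapes mentioned in the Fig. 7 caption can be pictured with the generic textbook forms below. This is only a sketch: the function names, parameters (`midpoint`, `steepness`, `center`, `width`, `scale`), and grid values are hypothetical and are not the parameterization used by QuanSA, which is not given in this record.

```python
import math


def sigmoid_response(x, midpoint=0.0, steepness=1.0, scale=1.0):
    """Generic logistic term: rises smoothly from 0 to `scale` around `midpoint`."""
    return scale / (1.0 + math.exp(-steepness * (x - midpoint)))


def gaussian_response(x, center=0.0, width=1.0, scale=1.0):
    """Generic Gaussian term: peaks at `center` with height `scale`."""
    return scale * math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


# Evaluate both forms on a small grid of hypothetical feature values
# (for example, a distance measured from an observer point).
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
sigmoid_values = [sigmoid_response(x) for x in grid]
gaussian_values = [gaussian_response(x) for x in grid]
```

A sigmoid saturates at large feature values, whereas a Gaussian rewards values near a preferred center and falls off on both sides.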
Summary of molecular datasets and their relative complexity.
| Set name | Benchmark source | N train | N blind test | N ChEMBL test |
|---|---|---|---|---|
| Steroid globulins | CoMFA/Compass | 21 (CBG), 21 (SHBG) | 10, 61 | – |
| 5-HT1a receptor | Compass/QMOD | 20 | 35 | – |
| FEP benchmark | FEP | 199 (eight targets) | – | – |
| GABA | CMF/QMOD | 98 | 49 | 1158 |
| COX2 | CMF/QMOD | 188 | 94 | 2308 |
| AchE | CMF/QMOD | 74 | 37 | 2436 |
| Thrombin | CMF/QMOD | 59 | 29 | 2947 |
| Muscarinic receptor | Pharmacia Med. Chem. | 43 (refine: +26) | – | 993 |
Fig. 8 Leave-one-out cross-validation results for CBG and SHBG (lines indicate the 1 and 2 kcal/mol error boundaries).
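Because the error boundaries in Fig. 8 are given in kcal/mol while the tables report MAE in pK units, a conversion is useful. The sketch below uses the standard relation ΔG = −RT ln(10) · pK. The temperature of 298.15 K is an assumption (this record does not state one), and the function names are illustrative.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def kcal_per_pk_unit(temperature_k=298.15):
    """Free-energy change (kcal/mol) corresponding to one pK unit."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(10.0)


def kcal_to_pk(error_kcal, temperature_k=298.15):
    """Convert a free-energy error in kcal/mol to the equivalent pK error."""
    return error_kcal / kcal_per_pk_unit(temperature_k)


# Assumed temperature: 298.15 K (not stated in the source).
one_kcal_in_pk = kcal_to_pk(1.0)  # roughly 0.73 pK units
two_kcal_in_pk = kcal_to_pk(2.0)  # roughly 1.47 pK units
```

Under this assumption, one pK unit corresponds to about 1.36 kcal/mol, so the 1 and 2 kcal/mol boundaries sit at roughly 0.73 and 1.47 pK units.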
Fig. 9 Blind prediction results for CBG and SHBG, with filled circles identifying in-model predictions.
Fig. 10 Training on 20 molecules, all with the canonical scaffolds shown above, produced a remarkably general model, as shown by a test of 35 molecules, including examples with very different scaffolds.
Fig. 11 FEP-calculated pK, using the specified target reference compound as the basis for calculating pK of the other target ligands in individual calculations (left plot); QuanSA-predicted pK from purely ligand-based models (middle plot) constructed using 80% of the training data (repeated 5 times on non-overlapping splits); and the combined performance of the two approaches (right plot). Typical examples of FEP mutation pairs for three targets, with the left-hand compound in each case being the target’s reference ligand and the right-hand one having the largest change in experimental free energy of binding of those computed.
Results on the FEP test set of 199 molecules under two prediction regimes for FEP and QuanSA (units for MAE are pK)
| Target | N | FEP (corrected) Tau (95% CI) | FEP (corrected) MAE | FEP (ref) MAE | QuanSA LOO Tau (95% CI) | QuanSA LOO MAE | QuanSA 80/20 (fivefold) Tau (95% CI) | QuanSA 80/20 (fivefold) MAE |
|---|---|---|---|---|---|---|---|---|
| BACE | 36 | 0.66 (0.48–0.80) | | | 0.51 (0.24–0.74) | | 0.57 (0.37–0.74) | |
| CDK2 | 16 | 0.29 (−0.16 to 0.71) | | | 0.82 (0.53–1.00) | | 0.78 (0.57–0.96) | |
| JNK1 | 21 | 0.87 (0.69–0.99) | | | 0.68 (0.47–0.87) | | 0.70 (0.52–0.86) | |
| MCL1 | 42 | 0.64 (0.49–0.77) | | | 0.64 (0.43–0.81) | | 0.63 (0.39–0.81) | |
| p38 | 34 | 0.53 (0.34–0.68) | | | 0.41 (0.21–0.58) | | 0.32 (0.07–0.55) | |
| PTP1b | 23 | 0.78 (0.50–0.96) | | | 0.59 (0.33–0.81) | | 0.49 (0.19–0.74) | |
| Thrombin | 11 | 0.60 (−0.05 to 1.00) | | | −0.07 (−0.89 to 0.55) | | 0.42 (−0.25 to 0.74) | |
| Tyk2 | 16 | 0.80 (0.56–0.96) | | | 0.72 (0.46–0.93) | | 0.59 (0.28–0.87) | |
| All | 199 | 0.68 (0.63–0.73) | | | 0.72 (0.67–0.77) | | 0.64 (0.57–0.71) | |
Tau using the FEP reference molecule is the same as for the corrected predictions in all cases, except when considering all molecules, where Tau was 0.63 (95% CI 0.57–0.68).
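Kendall's Tau and mean absolute error (MAE), the two statistics named in the table above, can be computed as in the following minimal sketch. The tau-b variant (with tie correction) is an assumption, since this record does not say which variant was used; the pK values are invented for illustration, and the confidence-interval procedure is not reproduced.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b: rank agreement over all pairs, corrected for ties."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both factors
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + only_x_tied) * (untied + only_y_tied))
    return (concordant - discordant) / denom if denom else float("nan")


def mean_absolute_error(experimental, predicted):
    """Average absolute difference between predicted and experimental values."""
    return sum(abs(p - e) for e, p in zip(experimental, predicted)) / len(experimental)


# Hypothetical pK values, for illustration only.
experimental_pk = [6.0, 7.0, 8.0, 9.0]
predicted_pk = [6.5, 6.8, 8.4, 8.1]

tau = kendall_tau_b(experimental_pk, predicted_pk)
mae = mean_absolute_error(experimental_pk, predicted_pk)
```

Tau measures only whether compounds are ranked in the right order, while MAE measures how far the predicted values are from the measured ones, so a model can score well on one and poorly on the other.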
Results from combining the FEP (uncorrected reference compound) predictions with the QuanSA 80/20 pure ligand-based predictions
| Target | Tau (95% CI) | MAE |
|---|---|---|
| BACE | 0.72 (0.54–0.86) | |
| CDK2 | 0.74 (0.46–0.97) | |
| JNK1 | 0.78 (0.62–0.88) | |
| MCL1 | 0.70 (0.55–0.83) | |
| p38 | 0.55 (0.33–0.72) | |
| PTP1b | 0.71 (0.42–0.91) | |
| Thrombin | 0.47 (−0.16 to 0.89) | |
| Tyk2 | 0.85 (0.69–1.00) | |
| All | 0.72 (0.67–0.77) | |
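This record does not state how the FEP and QuanSA predictions were combined. As a sketch only, the example below assumes a plain per-compound average of the two predicted pK values, and shows why such a combination can reduce error when the two methods err in different directions. All numbers and names are hypothetical.

```python
def mean_absolute_error(experimental, predicted):
    """Average absolute difference between predicted and experimental values."""
    return sum(abs(p - e) for e, p in zip(experimental, predicted)) / len(experimental)


def average_predictions(pred_a, pred_b):
    """Assumed combination rule: unweighted mean of two predictions per compound."""
    return [(a + b) / 2.0 for a, b in zip(pred_a, pred_b)]


# Hypothetical pK values; the two predictors err in opposite directions.
experimental_pk = [6.0, 7.0, 8.0, 9.0]
method_a_pk = [6.8, 6.4, 8.6, 8.2]  # stands in for a physics-based prediction
method_b_pk = [5.6, 7.4, 7.8, 9.6]  # stands in for a ligand-based prediction

combined_pk = average_predictions(method_a_pk, method_b_pk)

mae_a = mean_absolute_error(experimental_pk, method_a_pk)
mae_b = mean_absolute_error(experimental_pk, method_b_pk)
mae_combined = mean_absolute_error(experimental_pk, combined_pk)
```

In this constructed example the averaged prediction has a lower MAE than either input. That benefit depends on the two methods making at least partly uncorrelated errors; averaging two predictors that share the same bias would not help.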
Test results for the complete Sutherland benchmark
| Target | QuanSA in-model | | | | QuanSA full test | | | QMOD full test | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | % | | MAE | | | MAE | | | MAE | |
| BZR | 45 | 0.54 (0.36–0.72) | 0.54 (0.38–0.74) | 0.39 (0.15–0.69) | 0.53 | 0.61 | 0.36 | 0.42 | 0.65 | 0.27 |
| COX2 | 81 | 0.54 (0.41–0.66) | 0.84 (0.67–1.04) | 0.41 (0.24–0.57) | 0.49 | 0.90 | 0.34 | 0.39 | 1.01 | 0.22 |
| AchE | 68 | 0.58 (0.30–0.80) | 0.71 (0.51–0.95) | 0.57 (0.23–0.84) | 0.51 | 0.83 | 0.47 | 0.60 | 0.68 | 0.56 |
| THR | 66 | 0.51 (0.19–0.77) | 0.69 (0.50–0.89) | 0.51 (0.18–0.77) | 0.45 | 0.89 | 0.29 | 0.51 | 0.69 | 0.42 |
Fig. 12 The relative difficulty of the molecules to be predicted varied considerably, as measured by the nearest-neighbor 3D similarity of the final predicted pose for each test molecule relative to the closest training molecule.
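The nearest-neighbor measure described in the Fig. 12 caption reduces to taking, for each test molecule, the maximum similarity to any training molecule. The sketch below assumes the 3D similarity scores have already been computed by some external method; the molecule names and scores are hypothetical.

```python
def nearest_neighbor_similarity(similarity_to_training):
    """Highest similarity between one test molecule and any training molecule."""
    return max(similarity_to_training)


# Hypothetical similarity scores in [0, 1]: one list per test molecule,
# one entry per training molecule.
similarity_scores = {
    "test_mol_1": [0.91, 0.62, 0.55],
    "test_mol_2": [0.48, 0.51, 0.44],
}

nn_similarity = {
    name: nearest_neighbor_similarity(scores)
    for name, scores in similarity_scores.items()
}

# Order test molecules from least to most similar to the training set,
# i.e. from hardest to easiest under this measure.
by_difficulty = sorted(nn_similarity, key=nn_similarity.get)
```

A test molecule with a low nearest-neighbor similarity has no close analog in the training set, so its prediction is an extrapolation.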
Results for QuanSA models on diverse ChEMBL compounds, with N being the total number of tested compounds, “N i-m” being the number of in-model predictions, and the statistical performance assessed by Kendall’s Tau and mean absolute error
| Target | N | N i-m | Tau (95% CI) | MAE |
|---|---|---|---|---|
| BZR | 1158 | 148 | 0.25 (0.12–0.37) | 1.2 |
| COX2 | 2308 | 549 | 0.24 (0.18–0.30) | 1.0 |
| AchE | 2436 | 186 | 0.26 (0.16–0.35) | 1.6 |
| Thrombin | 2949 | 0 | – | – |
| Muscarinic | 993 | 291 | 0.34 (0.26–0.41) | 1.1 |
Fig. 13 Examples of extrapolation to ChEMBL molecules for the QuanSA BZR model. The left-most and middle molecules are both in-model predictions, and the right-most falls above the novelty threshold.
Fig. 14 Examples of extrapolation to ChEMBL molecules for the QuanSA AchE model.
Test results for the complete Sutherland benchmark.
| Target | Full ChEMBL set | | | QuanSA in-model pred. | | | | | | 1000 Decoys |
|---|---|---|---|---|---|---|---|---|---|---|
| | N | | | N | Mean | N TP | TP % | N FP | FP % | FP % |
| BZR | 1158 | 309 | 544 | 90 | 6.9 | 38 | 12.3 | 32 | 5.9 | 1.1 |
| COX2 | 2308 | 351 | 1488 | 169 | 7.2 | 90 | 25.6 | 49 | 3.3 | 0.0 |
| AchE | 2436 | 491 | 1437 | 36 | 7.6 | 24 | 4.9 | 14 | 1.0 | 0.0 |
| Musc. | 993 | 350 | 427 | 66 | 7.2 | 31 | 8.9 | 17 | 4.0 | 0.0 |
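The TP % and FP % columns in the table above can be cross-checked arithmetically. In every row, TP % equals N TP divided by the second numeric column, and FP % equals N FP divided by the third numeric column. The sketch below verifies this using only values copied from the table; reading those two columns as the denominators is an inference from the arithmetic, since their header labels are missing in this record.

```python
# Values copied from the table: (second numeric column, third numeric column,
# N TP, reported TP %, N FP, reported FP %).
table_rows = {
    "BZR": (309, 544, 38, 12.3, 32, 5.9),
    "COX2": (351, 1488, 90, 25.6, 49, 3.3),
    "AchE": (491, 1437, 24, 4.9, 14, 1.0),
    "Musc.": (350, 427, 31, 8.9, 17, 4.0),
}


def percent(numerator, denominator):
    """Ratio expressed as a percentage, rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


# Recompute both percentages for each target from the raw counts.
recomputed = {
    target: (percent(n_tp, tp_denominator), percent(n_fp, fp_denominator))
    for target, (tp_denominator, fp_denominator, n_tp, _, n_fp, _) in table_rows.items()
}

# True for a target when both recomputed values match the reported ones.
matches_reported = {
    target: abs(recomputed[target][0] - row[3]) < 0.05
    and abs(recomputed[target][1] - row[5]) < 0.05
    for target, row in table_rows.items()
}
```

For example, the BZR row gives 38 / 309 = 12.3% and 32 / 544 = 5.9%, matching the reported percentages.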
Fig. 15 Muscarinic model training, refinement, and scoring procedure.
Fig. 16 The human M2 receptor bound to QNB aligned with rat M2 and the QuanSA predicted conformation of QNB (top left); the pocket-field and interactions with QNB (top middle); a striking example of substituent effect non-additivity (top right); and the poses predicted by docking (a49 and b29, green) and by QuanSA (a49 in magenta, and b1/b29 in cyan).
Fig. 17 In-model QuanSA predictions of potent muscarinic antagonists of diverse scaffolds.