Daria B Kokh, Tom Kaufmann, Bastian Kister, Rebecca C Wade.
Abstract
Drug-target residence times c
Keywords: drug-protein residence time; drug-target binding kinetics; heat shock protein 90 (HSP90); machine learning; molecular dynamics simulation; structure-kinetic relationships (SKRs); tauRAMD
Year: 2019 PMID: 31179286 PMCID: PMC6543870 DOI: 10.3389/fmolb.2019.00036
Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X
Figure 1. Structural and binding kinetic properties of the dataset of 94 N-HSP90 inhibitors. (A,B) Two conformations of the ATP binding site in N-HSP90 with a bound ligand shown in stick representation and colored by atom type; α-helix3 (highlighted in red) can be distorted in the middle [loop-type conformation (A), compound 5, PDB ID 5J2X] or complete [helix-type conformation (B), compound 13, PDB ID 5J9X] (Amaral et al., 2017); the molecular surface of the binding pocket, colored by the Coulomb potential, is shown in insets for both conformations: the ATP binding site has a predominantly negative charge (red), whereas the transient sub-pocket under α-helix3 is mostly hydrophobic. (C) Protein-ligand contacts for helix-binding compounds, illustrated for compound 13 (PDB ID 5J9X): the ligand-protein binding network consisting of D93, T184, and three water molecules (red spheres) is common to all compounds; compounds bound to the helix conformation of the binding site also interact with F138 and may interact with residues in the hydrophobic pocket, such as W162 and Y139. (D) 2D representation showing the four main groups of compounds discussed in the text. (E) Similarity matrix of the 90 N-HSP90 inhibitors generated using Maestro [(Schrödinger, 2019); see text]. (F) Distribution of the experimental binding rate constants of the entire set of compounds. The three largest groups of compounds are colored as denoted in the legend: “Ind. exposed” (indazole-based compounds with different R1 fragments), “Ind. buried” (indazole compounds with different R2 fragments), compounds with resorcinol and quinazoline scaffolds, as well as bulky compounds with a tricyclic fragment and a different ATP-pocket binding core. (G) Distribution of residence times of the helix-binding and loop-binding compounds.
Figure 2. Workflow incorporating the simulation protocol for τRAMD simulations and the ML analysis. The τRAMD simulations provide (i) computed relative residence times and (ii) trajectories that are used to analyze protein-ligand contacts, to build an ML regression model for predicting residence times, and to determine the factors governing residence time (see section Methods and Materials); the data sets generated and the elements of the simulation workflow are highlighted by blue and gray backgrounds, respectively.
Figure 3. Workflow illustrating the generation of features from simulated τRAMD trajectories: (1) extraction of interaction fingerprints as features for ML; (2) discarding of the bound-state part of the trajectory (highlighted in pink), where the discarded part depends on the threshold used, resulting in data-sets A, B, and C (see text for details); (3) averaging of the features over all snapshots in each trajectory; (4) averaging of the features over all trajectories for each compound; (5) removal of features that appear rarely or are strongly correlated with other features. The size of each set of data at each stage of the workflow is indicated in red.
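Steps (3) to (5) of this workflow can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that steps (1) and (2) have already produced, for each trajectory, a list of per-snapshot fingerprint vectors, and the function names, the occurrence threshold `min_fraction`, and the correlation cutoff `max_corr` are hypothetical placeholders rather than values taken from the paper.

```python
from statistics import mean


def average_frames(trajectory):
    """Step 3: average each feature over all snapshots of one trajectory."""
    n_features = len(trajectory[0])
    return [mean(frame[j] for frame in trajectory) for j in range(n_features)]


def average_trajectories(trajectories):
    """Step 4: average the per-trajectory vectors of one compound."""
    per_trajectory = [average_frames(t) for t in trajectories]
    return [mean(column) for column in zip(*per_trajectory)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either column is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_features(matrix, min_fraction=0.1, max_corr=0.9):
    """Step 5: return indices of features that are neither rare nor
    strongly correlated with an already kept feature.

    matrix rows are compounds, columns are features.
    min_fraction and max_corr are hypothetical thresholds.
    """
    columns = list(zip(*matrix))
    kept = []
    for j, column in enumerate(columns):
        fraction_present = sum(1 for v in column if v > 0) / len(column)
        if fraction_present < min_fraction:
            continue  # feature appears too rarely
        if any(abs(pearson(column, columns[k])) > max_corr for k in kept):
            continue  # redundant with a feature that was already kept
        kept.append(j)
    return kept


# Toy data: one compound, two trajectories, two snapshots, three contacts.
demo_compound = [
    [[1, 0, 1], [1, 0, 0]],
    [[1, 0, 1], [0, 0, 1]],
]
demo_vector = average_trajectories(demo_compound)

# Toy compound-by-feature matrix (made-up numbers).
demo_matrix = [
    [0.75, 0.0, 0.75, 0.2],
    [0.50, 0.0, 0.50, 0.9],
    [0.25, 0.0, 0.25, 0.1],
]
demo_kept = select_features(demo_matrix)
```

In the toy matrix, the second column never occurs and the third duplicates the first, so only the first and fourth features survive the filter.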
Figure 4. Results of τRAMD simulations. (A) Scaled τRAMD residence times plotted vs. measured log(1/koff) values on a logarithmic scale for the complete set of compounds. The τRAMD residence times are scaled according to the linear fit (log(τRAMD) = 0.39*log(1/koff) − 0.52) of all compounds except for the 13 outliers identified in Kokh et al. (2018) (shown with yellow background circles). Two groups of indazoles (with different exposed R1 and buried R2 fragments, see Figure 1D), resorcinol and quinazoline compounds, as well as compounds with a tricyclic fragment, are colored as denoted in the legend; other compounds are shown in gray. The black line shows the one-to-one correspondence of the computed and experimental data, and the interval within 1.5-fold of the mean of the residuals (0.9 log([s])) is shown by the gray area. The distributions of measured and τRAMD computed residence times are shown along the X and Y axes, respectively. (B) Linear fits with 95% confidence intervals for several sub-groups of compounds highlighted in (A): indazole exposed, indazole buried, and resorcinol, colored as in (A). (C) Histogram showing the distribution of the mean absolute error, MAE, of τRAMD residence times relative to measured values; the long tail arises from the outliers.
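One plausible reading of the scaling in panel (A) is that the fitted line is inverted, so that raw τRAMD values are mapped onto the experimental log(1/koff) scale before comparison. The Python sketch below works through that arithmetic under this assumption; the slope and intercept come from the caption, while the function names and the input values are hypothetical, and the same logarithm base is assumed on both sides of the fit.

```python
import math

# Fit parameters quoted in the caption: log(tau_RAMD) = 0.39*log(1/koff) - 0.52
SLOPE = 0.39
INTERCEPT = -0.52


def scale_log_tau(log_tau_ramd):
    """Invert the linear fit to place a raw log(tau_RAMD) value on the
    experimental log(1/koff) scale (an assumed interpretation)."""
    return (log_tau_ramd - INTERCEPT) / SLOPE


def mean_absolute_error(predicted, measured):
    """MAE between scaled computed values and measured values."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)


# Hypothetical raw log(tau_RAMD) values and measured log(1/koff) values.
demo_raw = [0.26, -0.13]
demo_measured = [2.5, 1.0]

demo_scaled = [scale_log_tau(v) for v in demo_raw]
demo_mae = mean_absolute_error(demo_scaled, demo_measured)
```

With these made-up inputs the scaled values are close to 2.0 and 1.0, so the first compound contributes an error of about 0.5 log units and the second contributes essentially none.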
Figure 5. Results of the clustering analysis based on the IFs of the ligand dissociation trajectories. (A,B) Clustering of the complete data set of 94 compounds: (A) mean and standard deviation of log residence times in each cluster obtained in 50 clustering runs; (B) weights of IFs for each cluster. HB, ION, ARO, and APO denote hydrogen bond (donor or acceptor), ionic, aromatic, and apolar interactions, respectively. (C,E) Position of an indazole compound bound to the helix-type conformation of the binding pocket (PDB ID: 5LNZ) (C) and of a resorcinol compound bound to the loop-type conformation (PDB ID: 5J2X) (E); residues that contribute to the protein-ligand contacts along the ligand dissociation trajectories are shown in stick representation and colored by protein region consistently with (B). (D) Clustering of the resorcinol loop-binders (see compound list in Supplementary Table 2) showing the mean and standard deviation of the log residence time in each cluster (above) and the cluster composition (below).
Figure 6. Clustering of indazole compounds: (A) weights of IFs for each cluster (coloring scheme and labels as in Figure 4); (B) mean and standard deviation of log residence times of compounds in each cluster; (C) population of selected molecular fragments in each cluster (see Supplementary Figure 6 for naming convention); the structures of two compounds discussed in the text are shown below (fragment substitutions are highlighted in blue); (D) position of indazole compound 37 in the binding pocket; the main contact residues are shown as sticks and colored as in (A).
Figure 7. Representative examples of computed vs. experimental residence times obtained for data-sets A (A) and C (B) using linear (LR) and non-linear (SVR) ML models as well as from the τRAMD residence time estimation procedure. Black/blue and red points belong to the training/validation and external test sets, respectively.
Figure 8. Assessment of the RM quality. Histograms of the mean absolute error, MAE (A), and score (B) of the external test set obtained in 200 repeated test/training set splits using RMs, and the same values computed from τRAMD simulations (C,D), are shown in blue along with results for the Dummy model (orange); results for the sub-set of only quinazoline compounds (from the full data-set A) are shown by red lines; in τRAMD simulations, the values in (D) are negative for quinazoline compounds; in the right-hand plot of panels (C,D), all quinazoline compounds were removed as outliers. The data-sets used are denoted in each plot: A and C data-sets; MF, the data-set from molecular descriptors only.
Results of evaluation tests for different models: mean of MAE and score obtained from 200 rounds of simulations (the standard deviation is given in parentheses) for the external test sets.
| MAE | LR | 0.47(0.08) | 0.51(0.09) | 0.60(0.11) | 0.43(0.08) | 0.51(0.10) | 0.39(0.10) |
| | SVR | 0.48(0.09) | 0.53(0.10) | 0.60(0.11) | 0.43(0.08) | 0.45(0.11) | 0.39(0.11) |
| | τRAMD | 0.76(0.12) | 0.39(0.06) | – | 0.38(0.08) | | |
| | Dummy | 0.71(0.11) | 0.61(0.11) | 0.71(0.11) | 0.55(0.14) | | |
| Score | LR | 0.57(0.21) | 0.44(0.30) | 0.29(0.30) | 0.54(0.23) | 0.36(0.52) | 0.41(0.52) |
| | SVR | 0.56(0.22) | 0.44(0.30) | 0.28(0.30) | 0.51(0.25) | 0.52(0.30) | 0.38(0.58) |
| | τRAMD | −0.41(0.47) | 0.69(0.10) | – | 0.57(0.23) | | |
Calculations for data-sets A, B, and C (see main text) are based on the complete set of 94 compounds. The test sets in these three cases were required to contain some of the outliers found by applying the τRAMD procedure to estimate relative residence times (see Methods for details). A.
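The protocol behind these numbers (repeated random splits into training and external test sets, with the mean and standard deviation of the test-set MAE reported per model) can be sketched in plain Python. This is a simplified illustration, not the authors' pipeline: it uses a single-feature least-squares line in place of the multi-feature LR and SVR models, assumes that the Dummy model predicts the training-set mean, does not enforce the rule that test sets contain outliers, and runs on synthetic data. The names `repeated_split_mae` and `test_fraction` are hypothetical.

```python
import random
from statistics import mean, stdev


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def mae(predicted, observed):
    """Mean absolute error."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def repeated_split_mae(x, y, n_rounds=200, test_fraction=0.2, seed=0):
    """Return {model: (mean MAE, std of MAE)} over repeated random splits.

    test_fraction is a hypothetical choice, not a value from the paper.
    """
    rng = random.Random(seed)
    indices = list(range(len(x)))
    n_test = max(1, round(test_fraction * len(x)))
    scores = {"LR": [], "Dummy": []}
    for _ in range(n_rounds):
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        y_test = [y[i] for i in test]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        lr_pred = [slope * x[i] + intercept for i in test]
        # Dummy baseline: always predict the mean of the training targets.
        dummy_pred = [mean(y[i] for i in train)] * n_test
        scores["LR"].append(mae(lr_pred, y_test))
        scores["Dummy"].append(mae(dummy_pred, y_test))
    return {name: (mean(v), stdev(v)) for name, v in scores.items()}


# Synthetic data: a noisy linear relation between one feature and the target.
_rng = random.Random(1)
x_demo = [i / 10 for i in range(40)]
y_demo = [2 * xi + _rng.gauss(0, 0.1) for xi in x_demo]

results = repeated_split_mae(x_demo, y_demo)
for name, (avg, sd) in results.items():
    print(f"{name}: {avg:.2f}({sd:.2f})")
```

The printed format mirrors the table's "mean(standard deviation)" convention; comparing each model against the Dummy row shows how much of the apparent accuracy comes from the features rather than from the spread of the data alone.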
Figure 9. Average value of MAE for the sub-sets of compounds with experimental residence times in the ranges of (<1s), (1s-2s), (2s-3s), and (>3s), as obtained in 100 simulations for different test sets and compared with the Dummy model (null hypothesis) and τRAMD for the same set of compounds. (A–C) For the complete set of compounds in models (A–C), respectively; (D) for the data-set model A* (model A without outliers); (E) only a sub-set of indazole compounds from data-set A was used; (F) only molecular features were used.
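The per-range error analysis in this figure amounts to binning compounds by their measured value and averaging the absolute error within each bin. A minimal Python sketch follows, assuming the bin boundaries 1, 2, and 3 from the caption are applied directly to the measured values; the function name `mae_by_range` and the example numbers are hypothetical.

```python
import math


def mae_by_range(measured, predicted, edges=(1.0, 2.0, 3.0)):
    """Group compounds by measured value and return the MAE per range.

    Ranges are (<e0), (e0-e1), ..., (>e_last); empty ranges map to None.
    """
    labels = (
        [f"<{edges[0]:g}"]
        + [f"{a:g}-{b:g}" for a, b in zip(edges, edges[1:])]
        + [f">{edges[-1]:g}"]
    )
    errors = {label: [] for label in labels}
    for m, p in zip(measured, predicted):
        bin_index = sum(1 for e in edges if m >= e)  # number of edges passed
        errors[labels[bin_index]].append(abs(p - m))
    return {
        label: (sum(v) / len(v) if v else None) for label, v in errors.items()
    }


# Made-up measured and predicted values for five compounds.
demo_measured = [0.5, 1.5, 1.8, 2.5, 3.4]
demo_predicted = [1.0, 1.5, 2.2, 2.0, 3.0]
demo_bins = mae_by_range(demo_measured, demo_predicted)
```

Running the same function on Dummy-model predictions for the same compounds gives the null-hypothesis bars to compare against in each range.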
Figure 10. Coefficients of the LR model in the test set, averaged over 200 different splits of the training and external test sets, for the A and C data-sets (A) and for the LR built on molecular descriptors only (B), as denoted in each plot.