Matteo Aldeghi, Vytautas Gapsys, Bert L. de Groot.
Abstract
Resistance to small-molecule drugs often emerges in cancer cells, viruses, and bacteria as a result of the evolutionary pressure exerted by the therapy. Protein mutations that directly impair drug binding are frequently involved in resistance, and the ability to anticipate these mutations would be beneficial in drug development and clinical practice. Here, we evaluate the ability of three distinct computational methods to predict ligand binding affinity changes upon protein mutation for the cancer target Abl kinase. These structure-based approaches rely on first-principles statistical mechanics, mixed physics- and knowledge-based potentials, and machine learning, and were able to estimate binding affinity changes and identify resistant mutations with remarkable accuracy. We expect that these complementary approaches will enable the routine prediction of resistance-causing mutations in a variety of other target proteins.
Year: 2019 PMID: 31482130 PMCID: PMC6716344 DOI: 10.1021/acscentsci.9b00590
Source DB: PubMed Journal: ACS Cent Sci ISSN: 2374-7943 Impact factor: 14.553
Figure 1. Data set of Abl kinase mutations and associated TKI affinity changes (ΔΔG) studied. (a) Structure of human Abl kinase (PDB-ID 1OPJ) with imatinib (light orange) bound. Mutated wild-type residues are shown as violet sticks. (b) Name and chemical structure of the 8 TKIs studied. (c) Distribution of the 144 experimental ΔΔG values. The line at ΔΔG = 1.36 kcal/mol separates mutations defined as resistant from those defined as susceptible.
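The ΔΔG values in Figure 1 are binding free energy changes on mutation. For a dissociation constant Kd the standard relation is ΔΔG = RT ln(Kd,mut/Kd,wt), so a positive ΔΔG means weaker binding to the mutant. The minimal sketch below converts hypothetical Kd pairs into ΔΔG at an assumed T = 300 K and applies the 1.36 kcal/mol resistance cutoff. The function names and Kd values are illustrative only and do not come from the study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)
RESISTANCE_CUTOFF = 1.36  # kcal/mol, cutoff quoted in the Figure 1 caption


def ddg_from_kd(kd_wt: float, kd_mut: float, temperature: float = 300.0) -> float:
    """Binding free energy change on mutation: ddG = RT * ln(Kd_mut / Kd_wt).

    Positive ddG means the inhibitor binds the mutant more weakly.
    Both Kd values must use the same concentration unit.
    """
    return R_KCAL * temperature * math.log(kd_mut / kd_wt)


def is_resistant(ddg: float, cutoff: float = RESISTANCE_CUTOFF) -> bool:
    """Resistant if ddG exceeds the cutoff, susceptible otherwise."""
    return ddg > cutoff


# Hypothetical Kd pairs in nM (wild type, mutant), for illustration only
for kd_wt, kd_mut in [(10.0, 20.0), (10.0, 100.0), (10.0, 1000.0)]:
    ddg = ddg_from_kd(kd_wt, kd_mut)
    label = "resistant" if is_resistant(ddg) else "susceptible"
    print(f"{kd_mut / kd_wt:.0f}-fold Kd increase: ddG = {ddg:.2f} kcal/mol ({label})")
```

At 300 K a 10-fold loss of affinity corresponds to about 1.37 kcal/mol. The 1.36 kcal/mol cutoff therefore sits close to a 10-fold increase in Kd.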
Summary of the Approaches Used, Their Performance, and Computational Cost
| Abbreviation | Method | Force field or scoring function | Hardware | Approx. compute hours per ΔΔG | RMSE (kcal/mol) | Pearson r | AUPRC |
|---|---|---|---|---|---|---|---|
| OP3 | Molecular Dynamics | OPLS3 | 1 GPU | 72 | 1.07 [0.89, 1.25] | 0.49 [0.19, 0.69] | 0.56 [0.32, 0.76] |
| C22 | Molecular Dynamics | Charmm22* and CGenFF v3.0.1 | 10 CPU | 59 | 1.03 [0.85, 1.21] | 0.24 [0.01, 0.44] | 0.25 [0.12, 0.48] |
| A99 | Molecular Dynamics | Amber99sb*-ILDN and GAFF v2.1 | 10 CPU | 59 | 0.91 [0.77, 1.05] | 0.44 [0.24, 0.59] | 0.56 [0.32, 0.77] |
| | Molecular Dynamics | Amber99sb*-ILDN and GAFF v2.1 | 10 CPU | 98 | 0.91 [0.74, 1.09] | 0.42 [0.20, 0.59] | 0.51 [0.27, 0.75] |
| R15 | Rosetta | REF15 | 1 CPU | 32 | 0.72 [0.60, 0.83] | 0.67 [0.45, 0.81] | 0.53 [0.29, 0.74] |
| R16 | Rosetta | βNOV16 | 1 CPU | 32 | 0.83 [0.70, 0.96] | 0.59 [0.35, 0.74] | 0.39 [0.18, 0.60] |
| ML1 | Machine Learning | n/a | 1 CPU | 0.02 | 0.87 [0.68, 1.06] | 0.12 [-0.04, 0.29] | 0.20 [0.10, 0.39] |
| ML2 | Machine Learning | n/a | 1 CPU | 0.02 | 0.68 [0.55, 0.80] | 0.57 [0.34, 0.72] | 0.47 [0.25, 0.68] |
For the performance measures, the point estimates from the original samples and their 95% bootstrapped confidence intervals are shown as x [lower, upper]. RMSE: root-mean-square error; AUPRC: area under the precision-recall curve.
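The table reports each metric as a point estimate on the original sample together with a 95% bootstrap interval. A minimal sketch of that kind of calculation follows. It assumes a simple percentile bootstrap that resamples (experimental, calculated) pairs with replacement; the study's exact procedure is described in its SI Methods and may differ. The helpers `rmse`, `pearson`, and `bootstrap_ci` and the ΔΔG values are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Metric = Callable[[Sequence[float], Sequence[float]], float]


def rmse(exp: Sequence[float], calc: Sequence[float]) -> float:
    """Root-mean-square error of calculated vs experimental ddG (kcal/mol)."""
    return math.sqrt(sum((c - e) ** 2 for e, c in zip(exp, calc)) / len(exp))


def pearson(exp: Sequence[float], calc: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(exp)
    me, mc = sum(exp) / n, sum(calc) / n
    cov = sum((e - me) * (c - mc) for e, c in zip(exp, calc))
    var_e = sum((e - me) ** 2 for e in exp)
    var_c = sum((c - mc) ** 2 for c in calc)
    return cov / math.sqrt(var_e * var_c)


def bootstrap_ci(metric: Metric, exp: Sequence[float], calc: Sequence[float],
                 n_boot: int = 2000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float, float]:
    """Return (point estimate, lower, upper) using a percentile bootstrap.

    Each resample draws len(exp) mutations with replacement, keeping each
    (experimental, calculated) pair together.
    """
    rng = random.Random(seed)
    n = len(exp)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(metric([exp[i] for i in idx], [calc[i] for i in idx]))
        except ZeroDivisionError:
            continue  # skip degenerate resamples (e.g., all values identical)
    stats.sort()
    last = len(stats) - 1
    lower = stats[math.floor(alpha / 2 * last)]
    upper = stats[math.ceil((1 - alpha / 2) * last)]
    return metric(exp, calc), lower, upper


# Hypothetical experimental and calculated ddG values (kcal/mol)
exp = [0.1, 1.5, 2.8, -0.3, 0.9, 3.2, 0.4, 1.1]
calc = [0.3, 1.2, 2.1, 0.2, 1.4, 2.5, -0.1, 0.8]
for name, fn in [("RMSE", rmse), ("Pearson", pearson)]:
    x, lo, hi = bootstrap_ci(fn, exp, calc)
    print(f"{name}: {x:.2f} [{lo:.2f}, {hi:.2f}]")
```

Resampling whole pairs, rather than experimental and calculated values independently, preserves the per-mutation correspondence that both RMSE and correlation depend on.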
Data for the MD calculations with the OPLS3 force field (OP3) were taken from Hauser et al.[12]
Nvidia (Pascal architecture).
Intel Xeon E5-2630 v4.
Nvidia GeForce GTX 1080 Titan.
Intel Xeon (Broadwell architecture).
Compute time for charge-conserving mutations; for charge-changing mutations the simulation time is doubled.
Figure 2. Accuracy of the ΔΔG estimates. (a) Scatter plots of experimental versus calculated ΔΔG values. The identity line is shown as a dashed gray line. The four quadrants indicate the location of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) according to the definition of resistant and susceptible mutations used (resistant if ΔΔGexp > 1.36 kcal/mol, susceptible otherwise)[12] and an equivalent cutoff (1.36 kcal/mol) for the calculated ΔΔG values. Each ΔΔG estimate is color-coded according to its absolute error with respect to the experimental ΔΔG value; at 300 K, an error of 1.4 kcal/mol corresponds to a ∼10-fold error in the Kd change, and an error of 2.8 kcal/mol to a ∼100-fold error. (b) Summary of the performance of the ΔΔG estimates across approaches in terms of RMSE, Pearson correlation, and AUPRC; point estimates from the original samples and 95% bootstrapped confidence intervals are shown (SI Methods). Differences at three levels of significance are reported using labels within the chart: e.g., a “C22**” label above the RMSE mark of OP3 indicates that the RMSE of OP3 is significantly lower (i.e., better agreement with experiment) than that of C22 at α = 0.05.
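The quadrant assignment in Figure 2a and the fold-error conversion in its caption can both be written down directly. In this sketch a mutation counts as positive (resistant) when ΔΔG > 1.36 kcal/mol, for experiment and calculation alike. An absolute ΔΔG error |ε| maps to a fold error of exp(|ε|/RT) in the Kd change; at 300 K this gives roughly 10-fold for 1.4 kcal/mol and roughly 100-fold for 2.8 kcal/mol. The function names and data are hypothetical.

```python
import math
from collections import Counter

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)
CUTOFF = 1.36  # kcal/mol, used for both experimental and calculated ddG


def quadrant(ddg_exp: float, ddg_calc: float, cutoff: float = CUTOFF) -> str:
    """Place one mutation in TP/TN/FP/FN; 'positive' means resistant (ddG > cutoff)."""
    exp_resistant = ddg_exp > cutoff
    calc_resistant = ddg_calc > cutoff
    if exp_resistant:
        return "TP" if calc_resistant else "FN"
    return "FP" if calc_resistant else "TN"


def fold_error(abs_error: float, temperature: float = 300.0) -> float:
    """Fold error in the Kd change implied by an absolute ddG error (kcal/mol)."""
    return math.exp(abs_error / (R_KCAL * temperature))


# Hypothetical experimental and calculated ddG values (kcal/mol)
exp = [0.1, 1.5, 2.8, -0.3, 0.9, 3.2, 0.4, 1.1]
calc = [0.3, 1.2, 2.1, 0.2, 1.4, 2.5, -0.1, 0.8]
counts = Counter(quadrant(e, c) for e, c in zip(exp, calc))

# Precision and recall at the fixed 1.36 kcal/mol threshold
precision = counts["TP"] / (counts["TP"] + counts["FP"])
recall = counts["TP"] / (counts["TP"] + counts["FN"])
print(dict(counts), f"precision={precision:.2f}", f"recall={recall:.2f}")
print(f"1.4 kcal/mol -> {fold_error(1.4):.1f}-fold, 2.8 kcal/mol -> {fold_error(2.8):.1f}-fold")
```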
Figure 3. Precision-recall curves for selected approaches. A99 and R15 have been combined to give two consensus results: in “avg(A99,R15)”, the results of A99 and R15 have been averaged; in “max(A99,R15)”, the more positive of the A99 and R15 ΔΔG estimates was selected for each mutation. The curve for a random estimator is shown as a dashed black line (baseline AUPRC of 0.13). The precision and recall corresponding to the conventional threshold of ΔΔGcalc > 1.36 kcal/mol are reported and marked by a circle on each curve.
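The two consensus estimators and the precision-recall summary in Figure 3 can be sketched as follows. Labels come from the experimental cutoff (ΔΔGexp > 1.36 kcal/mol) and mutations are ranked by calculated ΔΔG. AUPRC is estimated here as average precision, one common estimator; the study may have integrated the curve differently, so values need not match the figure exactly. For a random ranking the expected AUPRC is about the fraction of resistant mutations, which is what a random-estimator baseline reflects. `method_a` and `method_b` are hypothetical stand-ins for two approaches such as A99 and R15.

```python
from typing import List, Sequence

CUTOFF = 1.36  # kcal/mol, experimental resistance cutoff


def consensus_avg(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Per-mutation mean of two ddG estimates, as in avg(A99,R15)."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def consensus_max(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Per-mutation most positive of two ddG estimates, as in max(A99,R15)."""
    return [max(x, y) for x, y in zip(a, b)]


def average_precision(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """Estimate AUPRC as average precision.

    Mutations are ranked by decreasing calculated ddG (ties keep input order),
    and precision is averaged over the ranks where true resistant mutations appear.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one resistant mutation is required")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
            total += tp / rank  # precision at this recall step
    return total / n_pos


# Hypothetical data: experimental ddG and two sets of calculated ddG (kcal/mol)
exp = [0.1, 1.5, 2.8, -0.3, 0.9, 3.2, 0.4, 1.1]
method_a = [0.3, 1.2, 2.1, 0.2, 1.4, 2.5, -0.1, 0.8]
method_b = [0.5, 1.9, 1.0, -0.2, 0.6, 2.9, 0.7, 0.4]
labels = [e > CUTOFF for e in exp]
for name, scores in [("avg", consensus_avg(method_a, method_b)),
                     ("max", consensus_max(method_a, method_b))]:
    print(name, round(average_precision(labels, scores), 3))
print("random baseline ~", sum(labels) / len(labels))
```

Taking the maximum favors flagging a mutation as resistant whenever either method predicts a large ΔΔG, which tends to raise recall at the fixed threshold at the possible cost of precision.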