Literature DB >> 31482130

Predicting Kinase Inhibitor Resistance: Physics-Based and Data-Driven Approaches.

Matteo Aldeghi¹, Vytautas Gapsys¹, Bert L de Groot¹.

Abstract

Resistance to small molecule drugs often emerges in cancer cells, viruses, and bacteria as a result of the evolutionary pressure exerted by the therapy. Protein mutations that directly impair drug binding are frequently involved in resistance, and the ability to anticipate these mutations would be beneficial in drug development and clinical practice. Here, we evaluate the ability of three distinct computational methods to predict ligand binding affinity changes upon protein mutation for the cancer target Abl kinase. These structure-based approaches rely on first-principle statistical mechanics, mixed physics- and knowledge-based potentials, and machine learning, and were able to estimate binding affinity changes and identify resistant mutations with remarkable accuracy. We expect that these complementary approaches will enable the routine prediction of resistance-causing mutations in a variety of other target proteins.

Entities: Chemical Disease Gene Species

Year: 2019 PMID： 31482130 PMCID： PMC6716344 DOI： 10.1021/acscentsci.9b00590

Source DB: PubMed Journal: ACS Cent Sci ISSN： 2374-7943 Impact factor: 14.553

Introduction

Drug resistance is one of the chief challenges to be overcome in the development of robust anticancer and antimicrobial therapies. While resistance can emerge via multiple mechanisms, such as increased drug efflux or activation of alternative signaling pathways, it is often caused by protein mutations directly impacting drug binding.[1,2] Anticipating these resistance-causing mutations is of interest for personalized medicine, as it would inform treatment decisions based on the patient’s genotype[3,4] and aid the development of combination therapies. It would also benefit drug development by allowing the parallel exploration of inhibitors with different resistance profiles. While large-scale experimentation is feasible,[5] it is neither cheap nor convenient, and accurate computer predictions mitigate the need for systematic experimentation. Protein kinases are among the most exploited drug targets, with 48 inhibitors approved to date in the United States.[6] The majority of these inhibitors target tyrosine kinases,[6] which play a critical role in the modulation of growth factor signaling.[7,8] Hence, tyrosine kinase inhibitors (TKIs) are employed against a number of malignancies, like chronic myelogenous leukemia (CML) and nonsmall-cell lung cancer.[7,8] In particular, TKIs targeting the human kinase Abl are the first-line therapy for the treatment of CML.[9] However, susceptibility to resistance requires continued development of new-generation inhibitors.[8] For instance, in nonsmall-cell lung cancer, acquired resistance inevitably occurs within 1–2 years of starting the therapy.[10] In CML, more than 25% of patients switch TKI due to intolerance or resistance,[11] the latter being most often caused by mutations in Abl.[8] Because kinases display a long tail of rare and uncharacterized mutations,[12] the sensitivity of many clinically identified kinase mutants to TKI treatment is often unknown. Thus, rapid and accurate computational approaches could impact clinical decision-making by providing oncologists with a first indication of whether the observed mutation is likely to cause resistance to certain inhibitors. Here, we show how both physics-based and data-driven computational approaches can be used to accurately estimate the change in affinity of TKIs for the human kinase Abl caused by single-point mutations. To test the different methodologies, we used a data set of 144 Abl:TKI affinity changes (ΔΔG) across 8 TKIs caused by 31 Abl mutations (Figure ), which was compiled by Hauser et al.[12] using publicly available cell viability IC50 data. The computational approaches tested were based on (i) molecular dynamics (MD) simulations[13] with a nonequilibrium free energy calculation protocol;[14−16] (ii) Rosetta, a modeling program that uses mixed physics- and knowledge-based potentials;[17,18] and (iii) machine learning (ML).[19] For completeness, we discuss our results also in light of those obtained by Hauser et al.[12] with the OPLS3 force field[20] and a MD-based approach similar to ours (Table ).

Figure 1

Table 1

Summary of the Approaches Used, Their Performance, and Computational Costa

			Approximate cost per ΔΔG estimate		Performance
Abbreviation	Method	Force field or scoring function	Hardware	Compute hours	RMSE (kcal/mol)	Pearson	AUPRC
OP3	Molecular Dynamics	OPLS3b	1 GPUc	72g	1.07_0.89^1.25	0.49_0.19^0.69	0.56_0.32^0.76
C22	Molecular Dynamics	Charmm22* and CGenFF v 3.0.1	10 CPUd cores and 1 GPUe	59	1.03_0.85^1.21	0.24_0.01^0.44	0.25_0.12^0.48
A99	Molecular Dynamics	Amber99sb*-ILDN and GAFF v 2.1	10 CPUd cores and 1 GPUe	59	0.91_0.77^1.05	0.44_0.24^0.59	0.56_0.32^0.77
	Molecular Dynamics	Amber99sb*-ILDN and GAFF v 2.1	10 CPUd cores and 1 GPUe	98g	0.91_0.74^1.09	0.42_0.20^0.59	0.51_0.27^0.75
R15	Rosetta	REF15	1 CPUf core	32	0.72_0.60^0.83	0.67_0.45^0.81	0.53_0.29^0.74
R16	Rosetta	βNOV16	1 CPUf core	32	0.83_0.70^0.96	0.59_0.35^0.74	0.39_0.18^0.60
ML1	Machine Learning	n/a	1 CPUf core	0.02	0.87_0.68^1.06	0.12_-0.04^0.29	0.20_0.10^0.39
ML2	Machine Learning	n/a	1 CPUf core	0.02	0.68_0.55^0.80	0.57_0.34^0.72	0.47_0.25^0.68

For the performance measures, the point estimates from the original samples and their 95% bootstrapped confidence intervals are shown (xlowerupper). RMSE: root-mean-square error; AUPRC: area under the precision-recall curve.

Data for the MD calculations with the OPLS3 force field refer to and were taken from Hauser et al.[12]

Nvidia (Pascal architecture).

Intel Xeon E5-2630 v 4.

Nvidia GeForce GTX 1080 Titan.

Intel Xeon (Broadwell architecture).

Time for charge-conserving mutations. For charge-changing mutations the simulation time is double.

Data set of Abl kinase mutations and associated TKI affinity changes (ΔΔG) studied. (a) Structure of human Abl kinase (PDB-ID 1OPJ) with imatinib (light orange) bound. Mutated wild-type residues are shown as violet sticks. (b) Name and chemical structure of the 8 TKIs studied. (c) Distribution of the 144 experimental ΔΔG values. The line at ΔΔG = 1.36 kcal/mol separates mutations defined as resistant from susceptible. For the performance measures, the point estimates from the original samples and their 95% bootstrapped confidence intervals are shown (xlowerupper). RMSE: root-mean-square error; AUPRC: area under the precision-recall curve. Data for the MD calculations with the OPLS3 force field refer to and were taken from Hauser et al.[12] Nvidia (Pascal architecture). Intel Xeon E5-2630 v 4. Nvidia GeForce GTX 1080 Titan. Intel Xeon (Broadwell architecture). Time for charge-conserving mutations. For charge-changing mutations the simulation time is double.

Results and Discussion

The performance of all calculations was assessed by the root-mean-square error (RMSE) between calculated and experimentally measured ΔΔG values, the Pearson correlation coefficient (r), and the area under the precision recall curve (AUPRC).[21] The latter measures the ability to classify mutations as resistant or susceptible. In particular, precision measures which fraction of mutations classified as resistant are actually resistant, whereas recall measures which fraction of resistant mutations are correctly classified as resistant. Consistent with previous work,[12] resistant mutations were defined as those causing more than a 10-fold drop in affinity (ΔΔGexp > 1.36 kcal/mol). On this data set, given the fraction of resistant mutations (Figure c), a random classifier would return an AUPRC of 0.13. First, we estimated changes in TKI affinity with an approach based on the first-principles of statistical mechanics. MD simulations were used to sample the isothermal–isobaric ensemble while a specific residue was “alchemically” mutated[14] into another one. From the nonequilibrium work required to transform the residue, it is possible to recover the equilibrium free energy difference[22,23] and, using a suitable thermodynamic cycle (Figure S1), a rigorous estimate of ΔΔG. We tested multiple force fields, and, on this data set, Amber force fields performed better than Charmm (Figure S2, Table S1); halogen bond modeling did not improve the Charmm results (Figure S3). For simplicity, we discuss only the results obtained with the best-performing Amber (A99)[24−27] and Charmm (C22)[28−31] force fields (Table ). The performance of A99 was comparable to that of OP3 (Figure ),[12] which was obtained by Hauser et al.[12] with the FEP+ program by Schrödinger Inc. and the recently optimized proprietary OPLS3 force field.[20] C22 performed significantly worse than A99 and OP3 in terms of correlation and classification ability. Surprisingly, combining force fields in a consensus approach (considering half of the simulations for each parent force field) did not generally improve the results (Figure S4). It is conceivable that, on this data set, the benefits of the additional sampling obtained with one force field outweighs those expected by combining multiple force fields.[16] We also did not observe a dependence of the accuracy of the calculation on the net charge change of the system upon mutation (Figure S5).

Figure 2

Accuracy of the ΔΔG estimates. (a) Scatter plots of experimental versus calculated ΔΔG values. The identity is shown as a dashed gray line. The four quadrants indicate the location of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) according to the definition of resistant and susceptible mutations used (resistant if ΔΔGexp > 1.36 kcal/mol, susceptible otherwise)[12] and an equivalent cutoff (1.36 kcal/mol) for the calculated ΔΔG values. Each ΔΔG estimate is color-coded according to its absolute error with respect to the experimental ΔΔG value; at 300 K, an error of 1.4 kcal/mol corresponds to a ∼10-fold error in Kd change, and an error of 2.8 kcal/mol to a ∼100-fold error in Kd change. (b) Summary of the performance of the ΔΔG estimates across approaches in terms of RMSE, Pearson correlation, and AUPRC; point estimates from the original samples and 95% bootstrapped confidence intervals are shown (SI Methods). Differences at three levels of significance are reported using labels within the chart: e.g., a “C22**” label above the RMSE mark of OP3 indicates that the RMSE of OP3 is significantly lower (i.e., better agreement with experiment) than that of C22 at α = 0.05. While our free energy protocol used a total of 216 ns of MD per ΔΔG estimate,[16] OP3 used 720 ns for charge-changing mutations, and 360 ns for charge-conserving ones.[12] We thus also tested a more expensive protocol , which matched exactly the simulation time invested by OP3. improved upon the uncertainty (Figure S6), but not upon the accuracy (Figure ), of the A99 estimates. Lower uncertainty is expected when more calculation repeats are used as in this case. On the other hand, accuracy is not necessarily proportional to the length of the simulations, as longer simulations allow for better sampling but also larger deviation from the experimentally determined structure in a potentially artificial fashion. Overall, this rigorous physics-based approach returned accurate ΔΔG estimates (e.g., for A99, RMSE = 0.91 kcal/mol, r = 0.44, and AUPRC = 0.56) and was able to discriminate between resistant and susceptible mutants well. However, a strong dependence of the results on the force field was observed. In addition, compared to the approaches discussed later, the MD based calculations required considerable computational resources (∼2.5 days on 10 CPU cores and one GPU per ΔΔG estimate). As an approach that is neither rigorously physics-based nor purely statistical, we tested a recently proposed Rosetta protocol[18] that was found to perform well also on protein–ligand binding.[16] This method samples multiple conformations of wild-type and mutant proteins with a Monte Carlo algorithm and estimates ΔΔG with the all-atom Rosetta energy function,[17] a mixed knowledge- and physics-based potential, by averaging over the generated wild-type and mutant ensembles. We found this approach (R15) to be highly accurate on this data set (Figure )—with low absolute errors (RMSE = 0.72 kcal/mol), strong correlation (r = 0.67), and good classification performance (AUPRC = 0.53)—while requiring only moderate computational resources (∼32 h on a single CPU core per ΔΔG estimate). In addition to the standard REF15 scoring function, we tested the βNOV16 function (R16), which returned slightly worse results than R15 for this system (Figure ). As had been noticed before,[16] here too, complementing Rosetta with the MD results via simple averaging generally resulted in enhanced performance (Figure S7). For instance, combining A99 and R15, we obtained a particularly strong consensus estimate (Figure S8) with RMSE = 0.62 kcal/mol, r = 0.71, and AUPRC = 0.61. This can be attributed to the complementary performance of the two approaches: while the MD based calculations showed better precision performance and a lower number of false positives, Rosetta displayed better recall and a lower number of false negatives (Table S2, Figure ). As a purely data-driven approach, we employed extremely randomized regression trees,[19] a ML technique that uses ensembles of decision trees similarly to random forest. As input features, we used terms derived from the crystallographic protein–ligand structures (e.g., hydrogen bonds, nonpolar contacts, residue-ligand distance), ligand and residue physicochemical properties, and fast empirical scoring functions.[32−34] However, the way information from these sources is transformed to produce a ΔΔG estimate does not rely on any specific physical model. The technical details of the ML model construction are provided in the SI Methods. An advantage of this statistical approach is speed: once the necessary input features have been computed, a ΔΔG estimate can be obtained in seconds. The ML model was trained and validated on a subset of the Platinum database,[35] and then tested on the TKI data set. The training/validation set contained 484 point-mutations with associated ΔΔG values across 84 proteins and 143 ligands. We computed a total of 128 features, and then used a greedy algorithm to select any number of features, up to 40, which minimized the mean-squared error of 10-fold cross-validation. The folds were built such that each of them would contain a unique set of proteins not present in the other folds. This procedure identified an optimal set of 29 features (Data S3). After training, the model was tested on the TKI data set (ML1 in Figure ). Given no tyrosine kinase was present in the training set, we effectively assessed whether the model could extrapolate to this protein target. ML1 performed well in terms of RMSE (0.87 kcal/mol) but was not significantly better than random in terms of correlation and classification ability (r = 0.12-0.040.29, AUPRC = 0.200.100.39), and was overall comparable to the MD results obtained with the Charmm force fields (Figure , Table S1). These results are similar to those obtained with another ML model[36] based on Gaussian process regression (Figure S9). Given that ML is expected to perform best when trained on the most relevant data,[37] we asked whether the model performance would improve when trained on tyrosine kinase mutations only. In this case, we effectively assessed the interpolative ability of the model, as it was trained and tested on the TKI data set (ML2). To do this, we used 8-fold nested cross-validation. At each iteration, the model selected a subset of the 128 features via 7-fold cross validation with 7 inhibitors and was then tested on the eighth. This was done 8 times to obtain predictions for all TKIs and thus all entries in the data set. ML2 accurately estimated ΔΔG values and could discriminate well between resistant and susceptible mutations (RMSE = 0.68 kcal/mol, r = 0.57, AUPRC = 0.47). If the objective is only to identify resistant mutations[38] rather than assessing their impact on ligand binding quantitatively, then AUPRC is the most relevant performance measure to examine. In this regard, MD based calculations with the OPLS3 or Amber99 force fields (OP3, A99, ), and Rosetta with the REF15 scoring function (R15) provided the best performance (Figure , Figure S10). Furthermore, in this scenario, one is likely willing to accept a larger fraction of false positives if this means recovering more resistant mutations. Thus, a suitable consensus strategy is to select the most positive ΔΔG estimate among those available to maximize the recall (at the expense of precision) at a given threshold (Figure S11). Using this approach with A99 and R15 returns an AUPRC of 0.62 and, at the conventional threshold of ΔΔGcalc > 1.36 kcal/mol, a recall of 0.79 and a precision of 0.48 (Figure ; Table S2). This means that about half of the mutations classified as resistant are in fact resistant, and 79% of the resistant mutations in the data set have been identified.

Figure 3

Precision recall curves for selected approaches. A99 and R15 have been combined to give two consensus results: in “avg(A99,R15)”, the results of A99 and R15 have been averaged; in “max(A99,R15)”, for each mutation, the most positive ΔΔG estimate among A99 and R15 was selected. The curve for a random estimator is shown as a dashed black line (baseline with AUPRC of 0.13). The precision and recall corresponding to the conventional threshold of ΔΔGcalc > 1.36 kcal/mol is reported and marked by a circle on the curves. From the results presented, it emerges that the latest computational approaches, from physics-based to data-driven ones, are able to predict TKI affinity changes upon Abl mutation and identify resistance-causing mutations. However, the different methods have different strengths and weaknesses. The MD-based calculations achieved a remarkable performance considering that force field parameters are based on fundamental physical properties of organic molecules (e.g., quantum mechanical energies and electrostatic potentials, liquid densities, and enthalpies of vaporization), rather than binding affinities directly. This, together with the rigorous statistical mechanical treatment, might lie at the basis of the demonstrated generalizability[16,39] of the approach across protein–ligand systems. The downside is the comparably higher computational cost required to obtain precise ΔΔG estimates. On the other hand, ML predictions proved accurate only when the binding affinities for Abl:TKI were used directly for training. With this information, however, a ΔΔG estimate could be generated in a matter of seconds to minutes on a single CPU core, with most of the time spent collecting the necessary structure-based features. Thus, the optimal approach choice in a prospective scenario will depend on the amount of prior information available for the system of interest, the nature of the study (a large-scale scan versus an in-depth study of a few mutations), as well as the computational resources at hand. That being said, we found that Rosetta struck a good balance between accuracy, generalizability, and computational cost. Similar to MD, this approach considers an ensemble of protein–ligand conformations,[18] and its scoring function is based on simple physical and structural properties of organic molecules and biomolecules,[17] so that it does not require binding affinities to the system of interest for training. In addition, approximations such as the use of implicit solvation and a more limited conformational sampling result in a moderate computational cost.

Conclusion

In summary, we have shown how fundamentally different structure-based approaches are able to predict TKI resistance to similar or better accuracy than reported thus far. Taking advantage of multiple techniques in a consensus fashion resulted in remarkable accuracy and tuning of the classification performance. These results suggest that a diverse set of computational approaches, each with its own strengths and weaknesses, is now available for the reliable estimate of resistance-causing protein mutations. It is our hope that the availability of multiple complementary approaches will enable the routine prediction of resistance-causing mutations across many other protein targets.

35 in total

1. Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences.

Authors: G E Crooks
Journal: Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics Date: 1999-09

2. Extending the treatment of backbone energetics in protein force fields: limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations.

Authors: Alexander D Mackerell; Michael Feig; Charles L Brooks
Journal: J Comput Chem Date: 2004-08 Impact factor: 3.376

3. Development and testing of a general amber force field.

Authors: Junmei Wang; Romain M Wolf; James W Caldwell; Peter A Kollman; David A Case
Journal: J Comput Chem Date: 2004-07-15 Impact factor: 3.376

4. Comparison of multiple Amber force fields and development of improved protein backbone parameters.

Authors: Viktor Hornak; Robert Abel; Asim Okur; Bentley Strockbine; Adrian Roitberg; Carlos Simmerling
Journal: Proteins Date: 2006-11-15

5. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading.

Authors: Oleg Trott; Arthur J Olson
Journal: J Comput Chem Date: 2010-01-30 Impact factor: 3.376

6. How robust are protein folding simulations with respect to force field parameterization?

Authors: Stefano Piana; Kresten Lindorff-Larsen; David E Shaw
Journal: Biophys J Date: 2011-05-04 Impact factor: 4.033

Review 7. Role of tyrosine kinase inhibitors in cancer therapy.

Authors: Amit Arora; Eric M Scholar
Journal: J Pharmacol Exp Ther Date: 2005-07-07 Impact factor: 4.030

8. Optimized molecular dynamics force fields applied to the helix-coil transition of polypeptides.

Authors: Robert B Best; Gerhard Hummer
Journal: J Phys Chem B Date: 2009-07-02 Impact factor: 2.991

9. CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields.

Authors: K Vanommeslaeghe; E Hatcher; C Acharya; S Kundu; S Zhong; J Shim; E Darian; O Guvench; P Lopes; I Vorobyov; A D Mackerell
Journal: J Comput Chem Date: 2010-03 Impact factor: 3.376

10. Improved side-chain torsion potentials for the Amber ff99SB protein force field.

Authors: Kresten Lindorff-Larsen; Stefano Piana; Kim Palmo; Paul Maragakis; John L Klepeis; Ron O Dror; David E Shaw
Journal: Proteins Date: 2010-06

12 in total

1. Large scale relative protein ligand binding affinities using non-equilibrium alchemy.

Authors: Vytautas Gapsys; Laura Pérez-Benito; Matteo Aldeghi; Daniel Seeliger; Herman van Vlijmen; Gary Tresadern; Bert L de Groot
Journal: Chem Sci Date: 2019-12-02 Impact factor: 9.825

2. Predicting Resistance Is (Not) Futile.

Authors: Alice E Brankin; Philip W Fowler
Journal: ACS Cent Sci Date: 2019-08-15 Impact factor: 14.553

Review 3. FGFR-TKI resistance in cancer: current status and perspectives.

Authors: Sitong Yue; Yukun Li; Xiaojuan Chen; Juan Wang; Meixiang Li; Yongheng Chen; Daichao Wu
Journal: J Hematol Oncol Date: 2021-02-10 Impact factor: 17.388

4. How Far Are We from the Rapid Prediction of Drug Resistance Arising Due to Kinase Mutations?

Authors: Mehmet Erguven; Tülay Karakulak; M Kasim Diril; Ezgi Karaca
Journal: ACS Omega Date: 2021-01-04

5. Identification of Novel Alectinib-Resistant ALK Mutation G1202K with Sensitization to Lorlatinib: A Case Report and in silico Structural Modelling.

Authors: Ping Yang; Ran Cao; Hua Bao; Xue Wu; Lingling Yang; Dongqin Zhu; Lu Zhang; Liming Peng; Yuefei Cai; Weijun Zhang; Yang Shao
Journal: Onco Targets Ther Date: 2021-03-25 Impact factor: 4.147

6. PremPLI: a machine learning model for predicting the effects of missense mutations on protein-ligand interactions.

Authors: Tingting Sun; Yuting Chen; Yuhao Wen; Zefeng Zhu; Minghui Li
Journal: Commun Biol Date: 2021-11-19

7. Predicting the mutation effects of protein-ligand interactions via end-point binding free energy calculations: strategies and analyses.

Authors: Yang Yu; Zhe Wang; Lingling Wang; Sheng Tian; Tingjun Hou; Huiyong Sun
Journal: J Cheminform Date: 2022-08-20 Impact factor: 8.489