Literature DB >> 33490784

How Far Are We from the Rapid Prediction of Drug Resistance Arising Due to Kinase Mutations?

Mehmet Erguven^1,2, Tülay Karakulak^1,2, M Kasim Diril^1,2, Ezgi Karaca^1,2.

Abstract

In all living organisms, protein kinases regulate various cell signaling events through phosphorylation. The phosphorylation occurs upon transferring an ATP's terminal phosphate to a target residue. Because of the central role of protein kinases in several proliferative pathways, point mutations occurring within the kinase's ATP-binding site can lead to a constitutively active enzyme, and ultimately, to cancer. A select set of these point mutations can also make the enzyme drug resistant toward the available kinase inhibitors. Because of technical and economical limitations, rapid experimental exploration of the impact of these mutations remains to be a challenge. This underscores the importance of kinase-ligand binding affinity prediction tools that are poised to measure the efficacy of inhibitors in the presence of kinase mutations. To this end, here, we compare the performances of six web-based scoring tools (DSX-ONLINE, KDEEP, HADDOCK2.2, PDBePISA, Pose&Rank, and PRODIGY-LIG) in assessing the impact of kinase mutations on their interactions with their inhibitors. This assessment is carried out on a new structure-based BINDKIN benchmark we compiled. BINDKIN contains wild-type and mutant structure pairs of kinase-inhibitor complexes, together with their corresponding experimental binding affinities (in the form of IC50, K d, and K i). The performance of various web servers over BINDKIN shows that they cannot predict the binding affinities (ΔGs) of wild-type and mutant cases directly. Still, they could catch whether a mutation improves or worsens the ligand binding (ΔΔGs) where the highest Pearson's R correlation coefficient is reached by DSX-ONLINE over the K i dataset. When homology models are used instead of K i-associated crystal structures, DSX-ONLINE loses its predictive capacity. These results highlight that there is room to improve the available scoring functions to estimate the impact of protein kinase point mutations on inhibitor binding. The BINDKIN benchmark with all related results is freely accessible online (https://github.com/CSB-KaracaLab/BINDKIN).

Entities: CellLine Chemical Disease Gene Mutation Species

Year: 2021 PMID： 33490784 PMCID： PMC7818309 DOI： 10.1021/acsomega.0c04672

Source DB: PubMed Journal: ACS Omega ISSN： 2470-1343

Introduction

In all living organisms, protein kinases are at the heart of numerous signaling pathways.[1] They regulate diverse cell signaling events through phosphorylation. The phosphorylation reaction involves the transfer of an ATP molecule’s terminal phosphate to a target residue (serine/threonine/tyrosine/histidine). This transfer adds negative charge to the phosphorylated residue and alters the physicochemical properties of the substrate protein. This phosphorylation is used as a signal to mediate most of the critical cellular processes, such as apoptosis, cell division, cell migration, and transcriptional regulation. Because of the essential role of kinases in cellular homeostasis, deregulation in the protein kinase activity often results in malignancies. Such deregulations might stem from expression level changes, as well as from point mutations at or around the ATP binding pocket of the kinase.

Fundamentals of Protein Kinase-Mediated Phosphorylation

Protein kinases mediate catalysis through their interdomain interactions (between the globular N-terminal and C-terminal domains) (Figure ). This architecture grants modularity to the enzyme to leverage the binding of ATP, cofactors, and other protein partners, during the enzyme’s catalytic cycle.[2] The smaller N-terminal lobe is mostly composed of antiparallel β-sheets, whereas the larger C-terminal lobe is enriched in α-helices.[3−5] All protein kinases share several catalytically important regions, that is, the gatekeeper residue, the activation loop, the DFG motif, and the glycine-rich loop. The kinase gatekeeper residue resides within the ATP-binding pocket (Figure ). In the inactive state, the 20–30 residue-long activation loop is found in the DFG-out conformation. In this state, the catalytic aspartate (D of DFG), responsible for transferring the phosphate from ATP to the substrate, blocks the catalytic cleft for substrate entry (Figure ). In the active DFG-in conformation, the ATP-binding pocket is accessible to ATP, while the side chain of the catalytic aspartate faces the ATP-binding pocket (Figure ).[6,7] The transition from DFG-out to DFG-in states is induced upon phosphorylation of the activation loop or by binding of ATP-competitive kinase inhibitors.[8−10] In its active form, the catalytic aspartate chelates either with magnesium or manganese ions, the essential divalent cofactor cations (Figure ).[11] Together with these cations, the ATP molecule is coordinated by the glycine-rich loop and a conserved lysine of the ATP-binding pocket. In this coordination, N1 and N6 nitrogen atoms of the adenine ring form specific hydrogen bonds with the backbone of the interdomain hinge region (Figure ).[12] The partially conserved nonpolar aliphatic residues (leucine, valine, phenylalanine, alanine, and methionine) of the ATP-binding pocket provide van der Waals contacts with ATP’s purine moiety.[13] The details on the reaction mechanism and kinetics are provided in the Supporting Information.[4,5,14−18]

Figure 1

Figure 2

Catalytic subunit of cAMP-dependent protein kinase in complex with an inhibitory peptide (which is not covered by the figure pane) and ATP (PDB entry, 1ATP).[20] The ATP, hinge region, and glycine-rich loop backbone, conserved lysine, and catalytic aspartate are shown in sticks. Manganese ions are depicted as pink spheres. The positions of the two common gatekeeper residues are shown as smaller gray spheres. The blue-, red-, and orange-colored atoms correspond to nitrogen, oxygen, and phosphorus, respectively. The possible dipole–dipole (hydrogen bond), ion–ion (salt bridge), and ion–dipole interactions between the kinase and ATP are depicted in blue-, red-, and gray-colored dashed lines, respectively.

Representative protein kinase structure in DFG-in/-out states. DFG-in and DFG-out conformations of ABL kinase were superimposed (PDB entries, 3KF4 and 3KFA, respectively).[19] In the DFG-in conformation (cyan), the activation loop exposes the ATP-binding pocket. It also orients the catalytic aspartate toward the ATP-binding pocket. In the DFG-out conformation (pink), the activation loop occludes the ATP-binding pocket and orients the catalytic aspartate away from the ATP-binding pocket. The glycine-rich loop is represented in orange. Catalytic subunit of cAMP-dependent protein kinase in complex with an inhibitory peptide (which is not covered by the figure pane) and ATP (PDB entry, 1ATP).[20] The ATP, hinge region, and glycine-rich loop backbone, conserved lysine, and catalytic aspartate are shown in sticks. Manganese ions are depicted as pink spheres. The positions of the two common gatekeeper residues are shown as smaller gray spheres. The blue-, red-, and orange-colored atoms correspond to nitrogen, oxygen, and phosphorus, respectively. The possible dipole–dipole (hydrogen bond), ion–ion (salt bridge), and ion–dipole interactions between the kinase and ATP are depicted in blue-, red-, and gray-colored dashed lines, respectively.

Protein Kinase Deregulation and Drug Resistance due to Point Mutations

Point mutations, within the vicinity of the protein kinase’s functionally important regions, can impact cellular processes by altering the specificity of the enzyme or creating a constitutively active protein kinase. These changes can lead to severe clinical outcomes. For example, naturally occurring bulkier side chain substitutions at the gatekeeper position cause drug resistance, as they sterically block the entry of drug molecules into the ATP-binding pocket. Two such gatekeeper mutations are T315I in ABL kinase and T790M in EGFR. These mutations sever the critical hydrogen bond between the gatekeeper threonine hydroxyl and the ligand. They also destabilize the inactive conformation of the enzyme, leading to a constitutively active state. Constitutive activity of these mutants is attributed to the enhanced van der Waals interactions between the phenylalanine of the DFG motif and the mutant gatekeeper residue.[21−24] The hydrogen bond network between the ATP-binding pocket and the adenine ring does not involve the gatekeeper residue side chain. Therefore, the gatekeeper residue mutations do not hinder the binding of ATP.[25−27] Instead, the adenine ring engages in a conserved hydrogen bonding network, involving the hinge region (backbone) of the enzyme (Figure ). Also, the increased space occupation by the larger gatekeeper residue does not impact the accommodation of ATP because of the large spatial gap between the ATP molecule and the gatekeeper residue. Frequently, mutations in cancer can also be found within the N-lobe side of the activation loop.[28] For example, V600 of BRAF is a commonly mutated residue in cancer. The V600D/E/G/K/L/M/R mutations of BRAF result in a constitutively active enzyme. This mutation position corresponds to activating D1228V/N/H in MET and D816E/H/V/N/F/Y/I in KIT. Here, we also see that physiochemically opposing mutations can yield to hyperactivation: while mutation of valine to aspartate cause overactivation in BRAF, mutation of aspartate to valine leads to overactivation in MET and KIT kinases. Other such prominent examples are L858R (activation loop) and G719S (proximal to glycine-rich loop) mutants of EGFR, which, similar to the gatekeeper residue mutations, destabilize the inactive conformation of the kinase.[29] These observations highlight the complex nature of mutation-induced physiochemical changes in protein kinases.

Utilization of Kinase Mutations for Functional Studies

Drug discovery and chemical genetic fields frequently make use of ATP-binding pocket mutations to study the kinase function. In the case of drug discovery, the protein kinase gatekeeper and other critical residues are artificially mutated in model systems to study the mechanism of drug resistance. This approach allows scientists to tailor novel compounds for the treatment of drug-resistant tumors.[30] In chemical genetics, naturally occurring mutations are used as a tool to elucidate the biological roles of protein kinases.[31] For this, the kinase gatekeeper residue is mutated to glycine or alanine through gene editing, which enables the binding of a bulky inhibitor. As a result of this reverse engineering, protein kinases become sensitive to inhibitors, which can selectively turn the enzyme off without interfering with other kinases.[26,32] Such engineered kinases include yeast v-Src (I338G), c-Abl (T315A), Cdk2 (F80G),[33] Mps1 (M516G),[34] and human Cdk12 (F813G).[35] Besides their numerous advantages, exploiting the ATP-binding pocket mutations also has potential handicaps, such as incompatibility of the mutation–drug combinations or reduced enzyme activity after mutation. To minimize these risks, prescreening of mutation–drug combinations with binding affinity prediction tools stands out as a promising strategy.

How Far Are We from Predicting the Impact of Mutations on Protein Kinase–Ligand Binding?

Several methods have been proposed to predict protein–ligand binding affinities. However, only recently, the accuracy of these methods has been evaluated for predicting the impact of protein kinase mutations on their ligand binding. In 2018, Hauser et al. compiled a benchmark set, encompassing 144 ligand-binding affinity data of clinically relevant Abl mutations.[36] A year later, Aldeghi et al. assessed the capacity of statistical mechanics, mixed physics- and knowledge-based potentials, and machine-learning approaches in predicting the impact of 31 Abl mutations on their drug interactions.[37] Both of these approaches used a hybrid structure-based Abl benchmark, composed of wild-type co-crystal/docked structures and modeled mutant cases. Moreover, both papers assessed the use of sophisticated tools, the use of which would be too complicated for many experimental biologists. Next to these approaches, there are also web-based protein–ligand binding affinity predictors, which can be easily used by nonexperts to plan/guide their experiments. However, a systematic assessment of these tools within the context of protein kinase mutations has never been performed. To address this need, here, we have benchmarked four web-based protein–ligand scoring functions, DSX-ONLINE, KDEEP, Pose&Rank, and PRODIGY-LIG, as well as two general scoring functions, PDBePISA and HADDOCK Score,[38−45] for predicting the impact of protein kinase mutations on their ligand binding. We have selected these predictors based on their user-friendliness and widespread use. The benchmarking has been carried out on our structure-based BINDKIN (effect of point mutations on the BINDing affinity of protein KINase–ligand complexes) data set.

Results and Discussion

Characteristics of the BINDKIN Benchmark

The BINDKIN benchmark covers nine unique protein kinases (seven EGFR, three Abl, three Mps1, three Src, two Cdk2, one ALK, one FGFR, one Kit, and one PKA). In the benchmark, each wild-type crystal structure has a mutant counterpart, where both of the cases are bound to the same drug. While compiling the BINDKIN set, we have made sure that wild-type and mutant structures and protein kinase–drug binding affinities come from the same study, so that the experimental values are comparable. These strict criteria resulted in the collection of 23 protein kinase–drug pairs (with 8, 9, and 6; IC50-, Kd-, and Ki-cases, respectively) and 42 individual cases (with 16, 15, and 11; IC50-, Kd-, and Ki-cases, respectively) (Table S1). Despite being ideal to convert different types of kinetic data to a common metric, that is, Gibbs-free energy (ΔG), we could not perform this conversion due to the unavailability of substrate concentrations. The 23 mutant cases in the BINDKIN benchmark include 17 single, three double, two triple, and one quintuple point mutants. A total of 36 mutations are distributed across 15 unique positions, most of which are located within or at the boundaries of functionally crucial segments: six positions are at the hinge region; two positions at the activation loop; and one position at the glycine-rich loop (Figure a). In their wild-type states, these mutation positions exert diverse biophysical characteristics (charged, polar, and hydrophobic), whereas in their mutant forms, they predominantly turn into hydrophobic and charged amino acids. Approximately, half of the mutations lead to bulkier residues. 8 out of 42 structures are in DFG-out, and the remaining structures are in DFG-in conformation. Except two case pairs, the wild-type and mutant forms of all proteins share a common DFG state. Among these two pairs, the wild-type 4WA9 and 5AP1 are in the DFG-out state, while the mutant 4TWP and 5AP4 are in the DFG-in state. The DFG state change induced by these mutants may cause the production of constitutively active protein kinases.

Figure 3

BINDKIN benchmark characteristics. (a) The corresponding positions of the mutated residues are mapped on a representative ATP-bound protein kinase (cAMP-dependent protein kinase with the PDB entry 1ATP).[20] The green sphere is the position where the majority of BINDKIN mutations are located. The blue sphere indicates the location of the sensitizing mutations, where the red spheres indicate the mutations conferring resistance. The light brown ones correspond to the neutral mutations. (b) The distribution of the binding kinetics data for 42 individual cases. The data was split into Kd, Ki, and IC50 subsets and plotted as box-and-whisker plots for the corresponding wild-type and mutant cases. (c) The probability distribution of the kinetic data for wild-type and mutant pairs. The blue and red areas indicate the drug-sensitive and drug-resistant cases, respectively. The KDE (kernel density estimate) is represented by the light-brown area. The distribution of all cases covering the range of 0–40 nM were given on the right. The subset of cases within the range of 0–0.5 nM were explicitly highlighted on the left. The 18 ligands presented in BINDKIN are all ATP-competing reversible protein kinase inhibitors. These inhibitors are usually composed of five or six-membered, mostly heterocyclic rings, which enrich the nitrogen content of the ligands (Table S1, Figure S1). The rings are generally linked by secondary amine, ether, or ketone groups. Because of the abundance of unprotonated nitrogen atoms in the rings and unprotonated oxygen atoms in the linker ether and ketone groups, the hydrogen bond acceptors are more abundant than the donors. Regarding the conformational degrees of freedom of inhibitors, BINDKIN contains both rigid and flexible ligands, as the number of rotatable bonds in the ligands range from 0 to 11. We have looked for a relationship between the ligand properties and the reported binding affinities, but could not observe a correlation between them (Figure S2).[46] BINDKIN spans the binding affinity ranges of 0.80–350, 0.73–185, and 0.07–1090 nM for IC50, Kd, and Ki subsets, respectively (Table S2).[47−62] Therefore, the Ki subset covers the broadest binding affinity range (Figure b). We classified a mutation as drug-sensitive if the binding affinity of the mutant complex is at least 10-folds higher (lower IC50, Kd, or Ki) than that of its wild-type state.[36,37,63] When the binding affinity of the mutant complex is lower (higher IC50, Kd, or Ki) than one-tenth of its wild-type state, then this case is considered as drug-resistant. The remaining cases are classified as neutral. According to these criteria, BINDKIN contains 5 sensitive (one Kd and four Ki), 5 resistant (three IC50, and two Kd), and 13 neutral (five IC50, six Kd, and two Ki) cases (Figure c). For more details on our BINDKIN benchmark, please see the Methods and https://github.com/CSB-KaracaLab/BINDKIN.

Majority of Web-Based Scoring Functions can Predict the ΔΔG-Related Changes Caused by Kinase Mutations

Advanced computational approaches have been demonstrated to accurately predict protein kinase–ligand binding affinities.[37,63] Considering that these approaches do not have a broad reach among the experimental biology community, we have focused on user-friendly web servers’ performance in predicting the impact of kinase mutations. To this end, we have tested DSX-ONLINE, HADDOCK2.2 (refinement interface), KDEEP, PDBePISA, Pose&Rank, and PRODIGY-LIG servers on the BINDKIN benchmark. To assess the performance of each tool, we have pooled the affinity predictions (n = 42) and analyzed their association with the experimental data. This direct assessment shows that the presented scoring functions could not predict the affinities of BINDKIN cases (Figure ). Here, the highest R-value is produced by PRODIGY-LIG (R = 0.26). When we divide the same set according to their kinetic metrics (IC50, Kd, Ki), the highest correlation is obtained by Pose&Rank over IC50 cases (R = ∼0.50). This performance is followed by KDEEP and DSX-ONLINE (0.40 < R < 0.50, IC50 cases). Assessing wild-type and mutant states separately did not improve the presented correlations (data not shown).

Figure 4

Performance of various scoring functions according to direct assessment. Because of the nature of prediction scores, a negative correlation for pKd (KDEEP) and a positive correlation for the other scoring terms are expected. (a) DSX-ONLINE, (b) HADDOCK2.2 (HADDOCK score), (c,d) KDEEP (pKd and ΔG), (e) PDBePISA (ΔG), (f,g) Pose&Rank (PoseScore and RankScore), and (h) PRODIGY-LIG (ΔG) performances. The sample sizes for IC50, Kd, and Ki data are n = 16, n = 15, and n = 11, respectively. Next to the direct assessment, we have used two other metrics, that is, delta and binary assessments (Figure S3, see the Methods, Section ). The delta assessment measures the absolute change in the affinity upon mutation (correlated to ΔΔG), improving the R values significantly. This is especially the case for the R-performances of DSX-ONLINE, KDEEP, and PoseScore over Ki-cases, which become 0.97, 0.76, and 0.56, respectively. When the performance of these three predictors is analyzed from the perspective of drug resistance/sensitivity, it is observed that for each scoring function, different cases deviate from the linear behavior (Figure ). This means that distinct cases pose different prediction difficulties for the presented scoring functions. In general, significant binding affinity changes, leading to a gain of drug sensitivity or resistance, are easier to predict. The Ki-subset is rich in drug-sensitive cases. Consequently, having the Ki-subset dominated by cases with significant binding affinity changes could be the reason for the good prediction performances over the Ki-cases. Unfortunately, none of the presented scoring functions could predict the neutral behavior of IC50- and Kd-cases.

Figure 5

Performance of various scoring functions according to the delta assessment. The delta assessment of DSX-ONLINE, KDEEP (ΔG), and Pose&Rank (PoseScore) is characterized based on the drug resistance/sensitivity of the cases. The data are evaluated as subsets based on the kinetic metrics. The neutral (gray), drug-resistant (red), and drug-sensitive (blue) cases are indicated. In the binary assessment, the accuracy of predictors was measured upon analyzing the agreement between the direction of change in the experimental and predicted binding affinities. According to this metric, DSX-ONLINE accurately predicts the affinity change of all Ki-cases. The success rate of DSX-ONLINE over Ki-cases is followed by PoseScore and PRODIGY-LIG, both leading to 83% accuracy. As another important highlight, KDEEP (pKd) reaches 88% success rate in IC50 cases. When the neutral cases were excluded from the analysis, it is observed that DSX-ONLINE, PRODIGY-LIG, and PoseScore can predict the drug-sensitive and drug-resistant cases, with 100, 90, and 90% accuracies, respectively. For the neutral cases, the best performance is achieved by DSX-ONLINE and PRODIGY-LIG, with 62 and 54% accuracies. Therefore, both delta and binary assessments are in agreement with the fact that the mutations causing the most dramatic binding affinity changes (resistance or sensitivity) are easier to predict. However, the amount of neutral cases (n = 13) is almost 3 times more than the drug-sensitive (n = 5) or drug-resistant (n = 5) ones. Therefore, we cannot rule out the fact that small and different sample sizes could have an impact on the observed prediction accuracy. As an across-metric comparison, over Ki-cases, DSX-ONLINE leads to R values of 0.45 and 0.97 for direct and delta assessments, respectively, and 100% accuracy for the binary assessment. This accuracy improvement is explained by the fact that while strong binders (complexes with lower Ki values) show an orthogonal spread in the direct assessment, they show a linear spread for the delta assessment (Figure S4). While delta and binary assessments agree in this particular case, in some other cases, delta and binary assessments contradict. For example, PRODIGY-LIG achieves an R-value of 0.29 in the delta assessment and 83% accuracy in the binary assessment over the Ki-cases. This means that a scoring function may achieve great success on the binary assessment because it can predict the direction of affinity change. However, the same scoring function may not be able to predict the amount of change (ΔΔG), which would lead to a lower R-value. Within the context of the biological meaning of the assessment approaches, we suggest the concurrent use of delta and binary assessments. After testing the web servers with crystal structures, we have scored the HADDOCK-refined complexes with the probed web servers too. These efforts have revealed that structural refinement of crystal structures did not improve the prediction accuracy (data not shown). Finally, among the three different kinetic data types, according to the direct assessment, IC50 turns out to be the best descriptor for most of the scoring functions. When the complexes are considered as mutant/wild-type pairs (delta and binary assessments), Ki comes out as the best descriptor (Figure S3). Interestingly, none of the methods could lead to a reliable prediction over the Kd set. Kd is a universal measure of binding, while Ki is defined for enzymes only, which could be the reason for the high R values reported for this set.[64] However, in this case too, we cannot rule out the fact that success rates of the kinetic descriptors may depend on the number of cases available for each descriptor (eight pairs for IC50, nine pairs for Kd, and six pairs for Ki).

Structure Quality has a Substantial Impact on the Prediction Accuracy

In a real-life scenario, the experimental structure of the kinase under study might not be available. Taking this into consideration, we have homology modeled the Ki-cases and probed these structures with DSX-ONLINE (as this combination leads to the best performance for the delta and binary assessments). The mutation-specific features such as hydropathy index change,[65] residue volume change,[66] and drug resistance or sensitivity were chosen for the characterization of the affinity data Figure a). As a result of this exercise, we show that the performance of DSX-ONLINE is dramatically worsened by the worsened quality of the input structure (an R-drop from 0.97 to 0.24). We also observe a decrease in the residue volume upon mutation accompanied by an increase of polarity and drug sensitivity. To understand how modeling has impacted the conformations, we have performed a minimal structure examination. The case pair with the visibly highest predicted affinity difference was chosen for this exercise (Figure a, red circled data points). In this regard, the human ABL1 kinase–AXI complex is the choice. The wild-type and mutant kinase structures of this case pair (4WA9, wild-type; 4TWP, mutant) has the highest root-mean-square deviation (rmsd) between their backbone heavy atoms, when compared with the other Ki-cases (Figure b,c). The structural comparison of ABL1 and ABL T315I mutant shows that the conformation of the glycine-rich loop is distinctly altered upon modeling, misleading the DSX-ONLINE prediction. This glycine-rich loop is an integral element of the ATP-binding pocket, which is in direct contact with ABL’s ligand AXI. This example highlights the importance of proper mutation modeling in the prediction of mutant protein kinase–drug binding affinity changes.

Figure 6

The default scoring function of DSX-ONLINE is tested on the Ki-associated crystal and homology model structures using delta assessment. (a) Pink color corresponds to the neutral cases (left) or a decrease in the hydropathy index (middle) or residue volume (right). Turquoise color corresponds to the sensitive cases (left) or an increase in the hydropathy index (middle) or residue volume (right). (b) The crystal structure (PDB entry, 4WA9, orange) and the model structure (blue) of wild-type human ABL1 kinase were aligned (RMSD = 2.9 Å). (c) The crystal structure (PDB entry 4TWP, orange) and the model structure (blue) of T315I mutant human ABL1 kinase were aligned (RMSD = 2.7 Å). The ATP-binding pockets of the superposed structures are shown for both alignments. The ligand, the residue to be mutated, and the mutant residue are shown in sticks. The glycine-rich loops are encircled with the gray ellipses.

Conclusions

Our work investigates whether the web-based scoring tools can predict the impact of protein kinase mutations on their ligand-binding affinities. To test this, we have compiled BINDKIN, the first structure-based affinity benchmark of wild-type and mutant protein kinase–inhibitor complexes. Based on the Ki-associated cases of BINDKIN, we have generated a homology model benchmark too. Using the crystal structure sets and their available experimental binding affinity data, we have assessed the prediction accuracy of six different web-based scoring tools, DSX-ONLINE, HADDOCK2.2, KDEEP, PDBePISA, Pose&Rank, and PRODIGY-LIG. Overall, ligand-specific scoring functions (used by DSX-ONLINE, KDEEP, Pose&Rank, and PRODIGY-LIG) performed better than the general scoring functions (used by PDBePISA and HADDOCK2.2). When the ligand-specific modified HADDOCK[67] scoring function was used, the prediction accuracy did not improve significantly (data not shown). When a deeper analysis is performed, we have observed that (i) Ki is a sensitive descriptor that can capture the effect of mutations on the binding affinity, (ii) DSX-ONLINE is the best tool in predicting the mutation-induced Ki changes, (iii) the quality of the input structure significantly impacts the prediction accuracy. Therefore, despite the recent advances presented by machine-learning tools in the field of computational structural biology, in our study, a classical knowledge-based approach (DSX-ONLINE) still performed better than a machine-learning approach (KDEEP). All in all, as an answer to our title question, our results show that we are currently far from rapidly and accurately predicting the impact of protein kinase mutations, especially when the kinase structure is not at hand. A more comprehensive benchmark than BINDKIN is necessary to train and improve the available predictors in a reliable manner. In this regard, drug companies working on mutant oncogenic protein kinases can facilitate a consistent and a more comprehensive data set by providing their related experimental data. From our end, to assist the kinase-research field, we are sharing our files associated with this work online at https://github.com/CSB-KaracaLab/BINDKIN.

Methods

Collection of the BINDKIN Benchmark

To construct the BINDKIN (effect of point mutations on the BINDing affinity of protein KINase–ligand complexes) benchmark, we performed a thorough search in the Protein Data Bank (PDB)[68] (https://www.rcsb.org/). We obtained the final list of wild-type and mutant kinase–ligand pairs according to the following criteria: For each mutant protein kinase–ligand complex, there has to be a wild-type complex, containing the same protein and the ligand. Structures of wild-type and mutant complexes should originate from the same research article. For each complex pair, there has to be experimentally determined binding affinity data, originating from the same research. The experimental binding affinity data were acquired from PDBbind (http://www.pdbbind-cn.org/index.php),[69−71] Binding DB (https://www.bindingdb.org/bind/index.jsp),[72] and Binding MOAD (http://bindingmoad.org/Search/advancesearch)[73,74] databases. The ligand has to be an ATP-competing reversible (noncovalent) inhibitor. These criteria left us with 23 pairs of wild-type and mutant complexes, making up the BINDKIN benchmark. In the benchmark, three kinases are found in the dimeric state (PDB IDs: 2IW8/2IW9, 3G0F, and 4EOR/4EOK). However, in these cases, the protein–protein dimerization interface is not involved in the ligand interactions. Therefore, these cases have been treated as monomeric kinase complexes. The BINDKIN benchmark includes seven EGFR, three Abl, three Mps1, three Src, two Cdk2, one ALK, one FGFR, one Kit, and one PKA proteins. These complexes present 36 point mutations distributed across 15 unique positions within or in the vicinity of the ATP-binding pocket (Figure a). BINDKIN covers binding modes of 18 different inhibitors (Figure S1). The 2D images of the ligands were generated from their respective unique SMILES strings, using the Smi2Depict tool of ChemDB Chemoinformatics Portal (http://cdb.ics.uci.edu/cgibin/Smi2DepictWeb.py).[75] The pharmacophoric characteristics of ligands were evaluated with the ALOGPS web server of the Virtual Computational Chemistry Laboratory (http://www.virtuallaboratory.org/web/alogps/)[76−85] and the pkCSM web server (http://structure.bioc.cam.ac.uk/pkcsm).[86] The available ligand-related data were obtained from RCSB PDB and PubChem (https://pubchem.ncbi.nlm.nih.gov/). All the benchmark-related features are deposited in https://github.com/CSB-KaracaLab/BINDKIN. Further details about the benchmark characteristics are described in the Results and Discussion section.

BINDKIN-Homology Model Benchmark

When the structure of the protein kinase under study is not at hand, the researchers often resort to homology modeling to predict the relevant structure. Therefore, to simulate a realistic scenario, we established a homology model benchmark for the Ki-associated cases. Here, we chose the Ki-associated cases, as their binding affinities were predicted with the overall highest accuracy based on the available BINDKIN crystal structures. The wild-type and mutant sequences were structurally modeled with the default settings of the I-TASSER web server (https://zhanglab.ccmb.med.umich.edu/I-TASSER/).[87−89] During modeling, the original coordinates of the wild-type or mutant structures were excluded from the template list. After obtaining the homology models, their corresponding ligands were placed at their catalytic-binding pocket using the original crystal structure as a template. The fitting was performed with PyMOL.[90] In these crude models, steric clashes were observed. To optimize the protein–ligand interface, we applied water refinement on each model using the HADDOCK2.2 refinement interface (https://milou.science.uu.nl/services/HADDOCK2.2/haddockserver-refinement.html).[42] The HADDOCK refinement performs very short molecular dynamics simulation cycles in explicit water.

Benchmarking of the Web Servers

The molecules present other than the protein and the ligand either led to an error in the web servers or were disregarded by the web servers. We should note here that DSX-ONLINE has an option to include water molecules in the calculations. However, to measure the performance on a standardized set of crystal structures, we decided to use the same input in all servers. Therefore, before benchmarking, the crystallization buffer additives, crystal water molecules, and ions were removed from the co-crystal structures. For each occurrence of multiple conformations, the conformer with the highest occupancy was retained, and the other conformers were removed. In one of the EGFR cases (5EM7), the ligand 5Q4 is present in two distinct conformations. Both conformations were taken into consideration. The ligand coordinate files were converted to “mol2” and “sdf” formats based on the different file format requirements of the web servers. For mol2 conversion, Open BABEL (v2.4.1), and for sdf conversion, Online SMILES Translator and Structure File Generator (National Cancer Institute) (https://cactus.nci.nih.gov/translate/) were used. For the sdf conversion, the “Aromatic” SMILES representation option (prints the unique SMILES string into the “sdf” file) and the “3D” coordinate option were chosen. The sdf files were protonated by default. The conformations and the 3D coordinates of the ligands were retained during conversion to both mol2 and sdf formats. Six different web-based scoring functions were used to predict the binding affinities listed within BINDKIN. A short description of each web server is provided in Table . Among the web servers, DSX-ONLINE became inaccessible shortly after the completion of its benchmarking.

Table 1

List of Benchmarked Servers

tool name	brief explanation	input format	link
DSX-ONLINE[41]	a knowledge-based scoring function composed of distance-dependent pair potentials, torsion-angle potentials, and solvent-accessible surface-dependent potentials	takes the protein and ligand coordinate files in pdb and mol2 formats	http://pc1664.pharmazie.uni-marburg.de/drugscore/index.php
KDEEP[39]	a deep convolutional neural network approach. The web server represents the binding site as a 24 Å voxel for pharmacophore featurization, considering eight pharmacophoric-like features. It reports pK_d and ΔG scores	takes the protein and the ligand coordinate files in pdb and sdf formats, respectively	https://www.playmolecule.org/Kdeep/
Pose & Rank[38]	PoseScore and RankScore are atomistic distance-dependent statistical scoring functions. The PoseScore is optimal for distinguishing the native binding geometries of compounds from other poses, whereas RankScore is optimal for distinguishing ligands from nonbinding small molecules	takes the protein and ligand coordinate files in pdb and mol2 formats, respectively	https://modbase.compbio.ucsf.edu/poseandrank/
PRODIGY-LIG[40]	an atomistic contact-based protein–small molecule binding affinity predictor	takes the protein and ligand coordinates in pdb format	https://bianca.science.uu.nl/prodigy/
HADDOCK2.2[42,91]	HADDOCK2.2 is a molecular docking program, using both force field-based and empirically-derived scoring terms. In this study, we used the refinement interface of HADDOCK2.2	Takes the protein and the ligand coordinates in pdb format	https://milou.science.uu.nl/services/HADDOCK2.2/haddockserver-refinement.html
PDBePISA[43−45]	scores according to the average free energy contributions of the hydrogen bonds and salt bridges between the subunits and the solvation free energy of folding	takes the protein and the ligand coordinates in pdb format	https://www.ebi.ac.uk/pdbe/pisa/

Assessment of the Prediction Results

Three different approaches were used to assess the prediction results. These are direct, delta, and binary assessments. The direct assessment is a classical approach, where a linear correlation between all scores and experimental data (wild-type and mutant) were sought. For the delta assessment, a direct relationship between the mutation-induced binding affinity (Δbaexp, eq ) and the mutation-induced score (Δbapred, eq ) changes was sought. In the case of binary assessment, an agreement in the direction of mutation-induced change was required. For example, if a mutation leads to a worsening of the experimental binding affinity, and it is predicted as such by the computational prediction, it was considered as a successful prediction. If a worsening in the predicted affinity accompanies a mutation-induced improvement in the experimental affinity, then this case was counted as a misprediction. In the binary assessment, the percentage of correct predictions was used as the overall success rate. For linear regression analyses, the associated R and p values and the plots were generated with R (v3.6.1),[92] invoked by RStudio (v1.2.5001).[93] The following R packages were used to generate scatterplots, barplots, or histograms: ggpubr,[94] ggplot2,[95] magrittr,[96] ggrepel,[97] grid,[92] and gridExtra.[98] The heatmap was produced with pheatmap[99] and RColorBrewer[100] R packages.

80 in total

1. Application of ALOGPS 2.1 to predict log D distribution coefficient for Pfizer proprietary compounds.

Authors: Igor V Tetko; Gennadiy I Poda
Journal: J Med Chem Date: 2004-11-04 Impact factor: 7.446

2. Virtual computational chemistry laboratory--design and description.

Authors: Igor V Tetko; Johann Gasteiger; Roberto Todeschini; Andrea Mauri; David Livingstone; Peter Ertl; Vladimir A Palyulin; Eugene V Radchenko; Nikolay S Zefirov; Alexander S Makarenko; Vsevolod Yu Tanchuk; Volodymyr V Prokopenko
Journal: J Comput Aided Mol Des Date: 2005-06 Impact factor: 3.686

Review 3. Protein kinases--structure and function.

Authors: D Bossemeyer
Journal: FEBS Lett Date: 1995-08-01 Impact factor: 4.124

4. Kinetic and structural analyses reveal residues in phosphoinositide 3-kinase α that are critical for catalysis and substrate recognition.

Authors: Sweta Maheshwari; Michelle S Miller; Robert O'Meally; Robert N Cole; L Mario Amzel; Sandra B Gabelli
Journal: J Biol Chem Date: 2017-07-04 Impact factor: 5.157