Mehmet Erguven1,2, Tülay Karakulak1,2, M Kasim Diril1,2, Ezgi Karaca1,2. 1. Izmir Biomedicine and Genome Center, 35330 Izmir, Turkey. 2. Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, 35340 Izmir, Turkey.
Abstract
In all living organisms, protein kinases regulate various cell signaling events through phosphorylation. The phosphorylation occurs upon transferring an ATP's terminal phosphate to a target residue. Because of the central role of protein kinases in several proliferative pathways, point mutations occurring within the kinase's ATP-binding site can lead to a constitutively active enzyme, and ultimately, to cancer. A select set of these point mutations can also make the enzyme drug resistant toward the available kinase inhibitors. Because of technical and economical limitations, rapid experimental exploration of the impact of these mutations remains to be a challenge. This underscores the importance of kinase-ligand binding affinity prediction tools that are poised to measure the efficacy of inhibitors in the presence of kinase mutations. To this end, here, we compare the performances of six web-based scoring tools (DSX-ONLINE, KDEEP, HADDOCK2.2, PDBePISA, Pose&Rank, and PRODIGY-LIG) in assessing the impact of kinase mutations on their interactions with their inhibitors. This assessment is carried out on a new structure-based BINDKIN benchmark we compiled. BINDKIN contains wild-type and mutant structure pairs of kinase-inhibitor complexes, together with their corresponding experimental binding affinities (in the form of IC50, K d, and K i). The performance of various web servers over BINDKIN shows that they cannot predict the binding affinities (ΔGs) of wild-type and mutant cases directly. Still, they could catch whether a mutation improves or worsens the ligand binding (ΔΔGs) where the highest Pearson's R correlation coefficient is reached by DSX-ONLINE over the K i dataset. When homology models are used instead of K i-associated crystal structures, DSX-ONLINE loses its predictive capacity. These results highlight that there is room to improve the available scoring functions to estimate the impact of protein kinase point mutations on inhibitor binding. The BINDKIN benchmark with all related results is freely accessible online (https://github.com/CSB-KaracaLab/BINDKIN).
In all living organisms, protein kinases regulate various cell signaling events through phosphorylation. The phosphorylation occurs upon transferring an ATP's terminal phosphate to a target residue. Because of the central role of protein kinases in several proliferative pathways, point mutations occurring within the kinase's ATP-binding site can lead to a constitutively active enzyme, and ultimately, to cancer. A select set of these point mutations can also make the enzyme drug resistant toward the available kinase inhibitors. Because of technical and economical limitations, rapid experimental exploration of the impact of these mutations remains to be a challenge. This underscores the importance of kinase-ligand binding affinity prediction tools that are poised to measure the efficacy of inhibitors in the presence of kinase mutations. To this end, here, we compare the performances of six web-based scoring tools (DSX-ONLINE, KDEEP, HADDOCK2.2, PDBePISA, Pose&Rank, and PRODIGY-LIG) in assessing the impact of kinase mutations on their interactions with their inhibitors. This assessment is carried out on a new structure-based BINDKIN benchmark we compiled. BINDKIN contains wild-type and mutant structure pairs of kinase-inhibitor complexes, together with their corresponding experimental binding affinities (in the form of IC50, K d, and K i). The performance of various web servers over BINDKIN shows that they cannot predict the binding affinities (ΔGs) of wild-type and mutant cases directly. Still, they could catch whether a mutation improves or worsens the ligand binding (ΔΔGs) where the highest Pearson's R correlation coefficient is reached by DSX-ONLINE over the K i dataset. When homology models are used instead of K i-associated crystal structures, DSX-ONLINE loses its predictive capacity. These results highlight that there is room to improve the available scoring functions to estimate the impact of protein kinase point mutations on inhibitor binding. The BINDKIN benchmark with all related results is freely accessible online (https://github.com/CSB-KaracaLab/BINDKIN).
In
all living organisms, protein kinases are at the heart of numerous
signaling pathways.[1] They regulate diverse
cell signaling events through phosphorylation. The phosphorylation
reaction involves the transfer of an ATP molecule’s terminal
phosphate to a target residue (serine/threonine/tyrosine/histidine).
This transfer adds negative charge to the phosphorylated residue and
alters the physicochemical properties of the substrate protein. This
phosphorylation is used as a signal to mediate most of the critical
cellular processes, such as apoptosis, cell division, cell migration,
and transcriptional regulation. Because of the essential role of kinases
in cellular homeostasis, deregulation in the protein kinase activity
often results in malignancies. Such deregulations might stem from
expression level changes, as well as from point mutations at or around
the ATP binding pocket of the kinase.
Fundamentals
of Protein Kinase-Mediated Phosphorylation
Protein kinases
mediate catalysis through their interdomain interactions
(between the globular N-terminal and C-terminal domains) (Figure ). This architecture
grants modularity to the enzyme to leverage the binding of ATP, cofactors,
and other protein partners, during the enzyme’s catalytic cycle.[2] The smaller N-terminal lobe is mostly composed
of antiparallel β-sheets, whereas the larger C-terminal lobe
is enriched in α-helices.[3−5] All protein kinases share several
catalytically important regions, that is, the gatekeeper residue,
the activation loop, the DFG motif, and the glycine-rich loop. The
kinase gatekeeper residue resides within the ATP-binding pocket (Figure ). In the inactive
state, the 20–30 residue-long activation loop is found in the
DFG-out conformation. In this state, the catalytic aspartate (D of DFG), responsible for transferring the phosphate from ATP
to the substrate, blocks the catalytic cleft for substrate entry (Figure ). In the active
DFG-in conformation, the ATP-binding pocket is accessible to ATP,
while the side chain of the catalytic aspartate faces the ATP-binding
pocket (Figure ).[6,7] The transition from DFG-out to DFG-in states is induced upon phosphorylation
of the activation loop or by binding of ATP-competitive kinase inhibitors.[8−10] In its active form, the catalytic aspartate chelates either with
magnesium or manganese ions, the essential divalent cofactor cations
(Figure ).[11] Together with these cations, the ATP molecule
is coordinated by the glycine-rich loop and a conserved lysine of
the ATP-binding pocket. In this coordination, N1 and N6 nitrogen atoms
of the adenine ring form specific hydrogen bonds with the backbone
of the interdomain hinge region (Figure ).[12] The partially
conserved nonpolar aliphatic residues (leucine, valine, phenylalanine,
alanine, and methionine) of the ATP-binding pocket provide van der
Waals contacts with ATP’s purine moiety.[13] The details on the reaction mechanism and kinetics are
provided in the Supporting Information.[4,5,14−18]
Figure 1
Representative protein kinase structure in DFG-in/-out
states.
DFG-in and DFG-out conformations of ABL kinase were superimposed (PDB
entries, 3KF4 and 3KFA,
respectively).[19] In the DFG-in conformation
(cyan), the activation loop exposes the ATP-binding pocket. It also
orients the catalytic aspartate toward the ATP-binding pocket. In
the DFG-out conformation (pink), the activation loop occludes the
ATP-binding pocket and orients the catalytic aspartate away from the
ATP-binding pocket. The glycine-rich loop is represented in orange.
Figure 2
Catalytic subunit of cAMP-dependent protein kinase in
complex with
an inhibitory peptide (which is not covered by the figure pane) and
ATP (PDB entry, 1ATP).[20] The ATP, hinge region, and glycine-rich
loop backbone, conserved lysine, and catalytic aspartate are shown
in sticks. Manganese ions are depicted as pink spheres. The positions
of the two common gatekeeper residues are shown as smaller gray spheres.
The blue-, red-, and orange-colored atoms correspond to nitrogen,
oxygen, and phosphorus, respectively. The possible dipole–dipole
(hydrogen bond), ion–ion (salt bridge), and ion–dipole
interactions between the kinase and ATP are depicted in blue-, red-,
and gray-colored dashed lines, respectively.
Representative protein kinase structure in DFG-in/-out
states.
DFG-in and DFG-out conformations of ABL kinase were superimposed (PDB
entries, 3KF4 and 3KFA,
respectively).[19] In the DFG-in conformation
(cyan), the activation loop exposes the ATP-binding pocket. It also
orients the catalytic aspartate toward the ATP-binding pocket. In
the DFG-out conformation (pink), the activation loop occludes the
ATP-binding pocket and orients the catalytic aspartate away from the
ATP-binding pocket. The glycine-rich loop is represented in orange.Catalytic subunit of cAMP-dependent protein kinase in
complex with
an inhibitory peptide (which is not covered by the figure pane) and
ATP (PDB entry, 1ATP).[20] The ATP, hinge region, and glycine-rich
loop backbone, conserved lysine, and catalytic aspartate are shown
in sticks. Manganese ions are depicted as pink spheres. The positions
of the two common gatekeeper residues are shown as smaller gray spheres.
The blue-, red-, and orange-colored atoms correspond to nitrogen,
oxygen, and phosphorus, respectively. The possible dipole–dipole
(hydrogen bond), ion–ion (salt bridge), and ion–dipole
interactions between the kinase and ATP are depicted in blue-, red-,
and gray-colored dashed lines, respectively.
Protein Kinase Deregulation and Drug Resistance
due to Point Mutations
Point mutations, within the vicinity
of the protein kinase’s functionally important regions, can
impact cellular processes by altering the specificity of the enzyme
or creating a constitutively active protein kinase. These changes
can lead to severe clinical outcomes. For example, naturally occurring
bulkier side chain substitutions at the gatekeeper position cause
drug resistance, as they sterically block the entry of drug molecules
into the ATP-binding pocket. Two such gatekeeper mutations are T315I
in ABL kinase and T790M in EGFR. These mutations sever the critical
hydrogen bond between the gatekeeperthreonine hydroxyl and the ligand.
They also destabilize the inactive conformation of the enzyme, leading
to a constitutively active state. Constitutive activity of these mutants
is attributed to the enhanced van der Waals interactions between the
phenylalanine of the DFG motif and the mutant gatekeeper residue.[21−24] The hydrogen bond network between the ATP-binding pocket and the
adenine ring does not involve the gatekeeper residue side chain. Therefore,
the gatekeeper residue mutations do not hinder the binding of ATP.[25−27] Instead, the adenine ring engages in a conserved hydrogen bonding
network, involving the hinge region (backbone) of the enzyme (Figure ). Also, the increased
space occupation by the larger gatekeeper residue does not impact
the accommodation of ATP because of the large spatial gap between
the ATP molecule and the gatekeeper residue.Frequently, mutations
in cancer can also be found within the N-lobe side of the activation
loop.[28] For example, V600 of BRAF is a
commonly mutated residue in cancer. The V600D/E/G/K/L/M/R mutations
of BRAF result in a constitutively active enzyme. This mutation position
corresponds to activating D1228V/N/H in MET and D816E/H/V/N/F/Y/I
in KIT. Here, we also see that physiochemically opposing mutations
can yield to hyperactivation: while mutation of valine to aspartate
cause overactivation in BRAF, mutation of aspartate to valine leads
to overactivation in MET and KIT kinases. Other such prominent examples
are L858R (activation loop) and G719S (proximal to glycine-rich loop)
mutants of EGFR, which, similar to the gatekeeper residue mutations,
destabilize the inactive conformation of the kinase.[29] These observations highlight the complex nature of mutation-induced
physiochemical changes in protein kinases.
Utilization
of Kinase Mutations for Functional
Studies
Drug discovery and chemical genetic fields frequently
make use of ATP-binding pocket mutations to study the kinase function.
In the case of drug discovery, the protein kinase gatekeeper and other
critical residues are artificially mutated in model systems to study
the mechanism of drug resistance. This approach allows scientists
to tailor novel compounds for the treatment of drug-resistant tumors.[30] In chemical genetics, naturally occurring mutations
are used as a tool to elucidate the biological roles of protein kinases.[31] For this, the kinase gatekeeper residue is mutated
to glycine or alanine through gene editing, which enables the binding
of a bulky inhibitor. As a result of this reverse engineering, protein
kinases become sensitive to inhibitors, which can selectively turn
the enzyme off without interfering with other kinases.[26,32] Such engineered kinases include yeast v-Src (I338G), c-Abl (T315A),
Cdk2 (F80G),[33] Mps1 (M516G),[34] and humanCdk12 (F813G).[35] Besides their numerous advantages, exploiting the ATP-binding
pocket mutations also has potential handicaps, such as incompatibility
of the mutation–drug combinations or reduced enzyme activity
after mutation. To minimize these risks, prescreening of mutation–drug
combinations with binding affinity prediction tools stands out as
a promising strategy.
How Far Are We from Predicting
the Impact
of Mutations on Protein Kinase–Ligand Binding?
Several
methods have been proposed to predict protein–ligand binding
affinities. However, only recently, the accuracy of these methods
has been evaluated for predicting the impact of protein kinase mutations
on their ligand binding. In 2018, Hauser et al. compiled a benchmark
set, encompassing 144 ligand-binding affinity data of clinically relevant
Abl mutations.[36] A year later, Aldeghi
et al. assessed the capacity of statistical mechanics, mixed physics-
and knowledge-based potentials, and machine-learning approaches in
predicting the impact of 31 Abl mutations on their drug interactions.[37] Both of these approaches used a hybrid structure-based
Abl benchmark, composed of wild-type co-crystal/docked structures
and modeled mutant cases. Moreover, both papers assessed the use of
sophisticated tools, the use of which would be too complicated for
many experimental biologists. Next to these approaches, there are
also web-based protein–ligand binding affinity predictors,
which can be easily used by nonexperts to plan/guide their experiments.
However, a systematic assessment of these tools within the context
of protein kinase mutations has never been performed. To address this
need, here, we have benchmarked four web-based protein–ligand
scoring functions, DSX-ONLINE, KDEEP, Pose&Rank, and PRODIGY-LIG,
as well as two general scoring functions, PDBePISA and HADDOCK Score,[38−45] for predicting the impact of protein kinase mutations on their ligand
binding. We have selected these predictors based on their user-friendliness
and widespread use. The benchmarking has been carried out on our structure-based BINDKIN (effect of point mutations on the BINDing affinity of protein KINase–ligand complexes)
data set.
Results and Discussion
Characteristics of the BINDKIN Benchmark
The BINDKIN
benchmark covers nine unique protein kinases (seven
EGFR, three Abl, three Mps1, three Src, two Cdk2, one ALK, one FGFR,
one Kit, and one PKA). In the benchmark, each wild-type crystal structure
has a mutant counterpart, where both of the cases are bound to the
same drug. While compiling the BINDKIN set, we have made sure that
wild-type and mutant structures and protein kinase–drug binding
affinities come from the same study, so that the experimental values
are comparable. These strict criteria resulted in the collection of
23 protein kinase–drug pairs (with 8, 9, and 6; IC50-, Kd-, and Ki-cases, respectively) and 42 individual cases (with 16, 15, and 11;
IC50-, Kd-, and Ki-cases, respectively) (Table S1). Despite being ideal to convert different types of kinetic data
to a common metric, that is, Gibbs-free energy (ΔG), we could not perform this conversion due to the unavailability
of substrate concentrations.The 23 mutant cases in the BINDKIN
benchmark include 17 single, three double, two triple, and one quintuple
point mutants. A total of 36 mutations are distributed across 15 unique
positions, most of which are located within or at the boundaries of
functionally crucial segments: six positions are at the hinge region;
two positions at the activation loop; and one position at the glycine-rich
loop (Figure a). In
their wild-type states, these mutation positions exert diverse biophysical
characteristics (charged, polar, and hydrophobic), whereas in their
mutant forms, they predominantly turn into hydrophobic and charged
amino acids. Approximately, half of the mutations lead to bulkier
residues. 8 out of 42 structures are in DFG-out, and the remaining
structures are in DFG-in conformation. Except two case pairs, the
wild-type and mutant forms of all proteins share a common DFG state.
Among these two pairs, the wild-type 4WA9 and 5AP1 are in the DFG-out state, while the mutant 4TWP and 5AP4 are in the DFG-in
state. The DFG state change induced by these mutants may cause the
production of constitutively active protein kinases.
Figure 3
BINDKIN benchmark characteristics.
(a) The corresponding positions
of the mutated residues are mapped on a representative ATP-bound protein
kinase (cAMP-dependent protein kinase with the PDB entry 1ATP).[20] The green sphere is the position where the majority of
BINDKIN mutations are located. The blue sphere indicates the location
of the sensitizing mutations, where the red spheres
indicate the mutations conferring resistance. The
light brown ones correspond to the neutral mutations.
(b) The distribution of the binding kinetics data for 42 individual
cases. The data was split into Kd, Ki, and IC50 subsets and plotted as
box-and-whisker plots for the corresponding wild-type and mutant cases.
(c) The probability distribution of the kinetic data for wild-type
and mutant pairs. The blue and red areas indicate the drug-sensitive and drug-resistant cases, respectively. The KDE
(kernel density estimate) is represented by the light-brown area.
The distribution of all cases covering the range of 0–40 nM
were given on the right. The subset of cases within the range of 0–0.5
nM were explicitly highlighted on the left.
BINDKIN benchmark characteristics.
(a) The corresponding positions
of the mutated residues are mapped on a representative ATP-bound protein
kinase (cAMP-dependent protein kinase with the PDB entry 1ATP).[20] The green sphere is the position where the majority of
BINDKIN mutations are located. The blue sphere indicates the location
of the sensitizing mutations, where the red spheres
indicate the mutations conferring resistance. The
light brown ones correspond to the neutral mutations.
(b) The distribution of the binding kinetics data for 42 individual
cases. The data was split into Kd, Ki, and IC50 subsets and plotted as
box-and-whisker plots for the corresponding wild-type and mutant cases.
(c) The probability distribution of the kinetic data for wild-type
and mutant pairs. The blue and red areas indicate the drug-sensitive and drug-resistant cases, respectively. The KDE
(kernel density estimate) is represented by the light-brown area.
The distribution of all cases covering the range of 0–40 nM
were given on the right. The subset of cases within the range of 0–0.5
nM were explicitly highlighted on the left.The 18 ligands presented in BINDKIN are all ATP-competing reversible
protein kinase inhibitors. These inhibitors are usually composed of
five or six-membered, mostly heterocyclic rings, which enrich the
nitrogen content of the ligands (Table S1, Figure S1). The rings are generally linked by secondary amine, ether,
or ketone groups. Because of the abundance of unprotonated nitrogen
atoms in the rings and unprotonated oxygen atoms in the linker ether
and ketone groups, the hydrogen bond acceptors are more abundant than
the donors. Regarding the conformational degrees of freedom of inhibitors,
BINDKIN contains both rigid and flexible ligands, as the number of
rotatable bonds in the ligands range from 0 to 11. We have looked
for a relationship between the ligand properties and the reported
binding affinities, but could not observe a correlation between them
(Figure S2).[46]BINDKIN spans the binding affinity ranges of 0.80–350,
0.73–185,
and 0.07–1090 nM for IC50, Kd, and Ki subsets, respectively
(Table S2).[47−62] Therefore, the Ki subset covers the
broadest binding affinity range (Figure b). We classified a mutation as drug-sensitive
if the binding affinity of the mutant complex is at least 10-folds
higher (lower IC50, Kd, or Ki) than that of its wild-type state.[36,37,63] When the binding affinity of
the mutant complex is lower (higher IC50, Kd, or Ki) than one-tenth of
its wild-type state, then this case is considered as drug-resistant. The remaining cases are classified as neutral.
According to these criteria, BINDKIN contains 5 sensitive (one Kd and four Ki), 5 resistant (three IC50, and
two Kd), and 13 neutral (five IC50, six Kd, and two Ki) cases (Figure c). For more details on our BINDKIN benchmark, please
see the Methods and https://github.com/CSB-KaracaLab/BINDKIN.
Majority of Web-Based Scoring Functions can
Predict the ΔΔG-Related Changes Caused
by Kinase Mutations
Advanced computational approaches have
been demonstrated to accurately predict protein kinase–ligand
binding affinities.[37,63] Considering that these approaches
do not have a broad reach among the experimental biology community,
we have focused on user-friendly web servers’ performance in
predicting the impact of kinase mutations. To this end, we have tested
DSX-ONLINE, HADDOCK2.2 (refinement interface), KDEEP, PDBePISA, Pose&Rank,
and PRODIGY-LIG servers on the BINDKIN benchmark. To assess the performance
of each tool, we have pooled the affinity predictions (n = 42) and analyzed their association with the experimental data.
This direct assessment shows that the presented scoring
functions could not predict the affinities of BINDKIN cases (Figure ). Here, the highest R-value is produced by PRODIGY-LIG (R =
0.26). When we divide the same set according to their kinetic metrics
(IC50, Kd, Ki), the highest correlation is obtained by Pose&Rank
over IC50 cases (R = ∼0.50). This
performance is followed by KDEEP and DSX-ONLINE (0.40 < R < 0.50, IC50 cases). Assessing wild-type
and mutant states separately did not improve the presented correlations
(data not shown).
Figure 4
Performance of various scoring functions according to direct
assessment. Because of the nature of prediction scores, a
negative correlation for pKd (KDEEP) and
a positive correlation for the other scoring terms are expected. (a)
DSX-ONLINE, (b) HADDOCK2.2 (HADDOCK score), (c,d)
KDEEP (pKd and ΔG), (e) PDBePISA (ΔG), (f,g) Pose&Rank
(PoseScore and RankScore), and (h)
PRODIGY-LIG (ΔG) performances. The sample sizes
for IC50, Kd, and Ki data are n = 16, n = 15, and n = 11, respectively.
Performance of various scoring functions according to direct
assessment. Because of the nature of prediction scores, a
negative correlation for pKd (KDEEP) and
a positive correlation for the other scoring terms are expected. (a)
DSX-ONLINE, (b) HADDOCK2.2 (HADDOCK score), (c,d)
KDEEP (pKd and ΔG), (e) PDBePISA (ΔG), (f,g) Pose&Rank
(PoseScore and RankScore), and (h)
PRODIGY-LIG (ΔG) performances. The sample sizes
for IC50, Kd, and Ki data are n = 16, n = 15, and n = 11, respectively.Next to the direct assessment, we have used
two
other metrics, that is, delta and binary
assessments (Figure S3, see the Methods, Section ). The delta assessment measures the
absolute change in the affinity upon mutation (correlated to ΔΔG), improving the R values significantly.
This is especially the case for the R-performances
of DSX-ONLINE, KDEEP, and PoseScore over Ki-cases, which become 0.97, 0.76, and 0.56,
respectively. When the performance of these three predictors is analyzed
from the perspective of drug resistance/sensitivity, it is observed
that for each scoring function, different cases deviate from the linear
behavior (Figure ).
This means that distinct cases pose different prediction difficulties
for the presented scoring functions. In general, significant binding
affinity changes, leading to a gain of drug sensitivity or resistance,
are easier to predict. The Ki-subset is
rich in drug-sensitive cases. Consequently, having the Ki-subset dominated by cases with significant binding affinity
changes could be the reason for the good prediction performances over
the Ki-cases. Unfortunately, none of the
presented scoring functions could predict the neutral behavior of
IC50- and Kd-cases.
Figure 5
Performance
of various scoring functions according to the delta assessment. The delta assessment of
DSX-ONLINE, KDEEP (ΔG), and Pose&Rank (PoseScore) is characterized based on the drug resistance/sensitivity
of the cases. The data are evaluated as subsets based on the kinetic
metrics. The neutral (gray), drug-resistant (red), and drug-sensitive
(blue) cases are indicated.
Performance
of various scoring functions according to the delta assessment. The delta assessment of
DSX-ONLINE, KDEEP (ΔG), and Pose&Rank (PoseScore) is characterized based on the drug resistance/sensitivity
of the cases. The data are evaluated as subsets based on the kinetic
metrics. The neutral (gray), drug-resistant (red), and drug-sensitive
(blue) cases are indicated.In the binary assessment, the accuracy of predictors
was measured upon analyzing the agreement between the direction of
change in the experimental and predicted binding affinities. According
to this metric, DSX-ONLINE accurately predicts the affinity change
of all Ki-cases. The success rate of DSX-ONLINE
over Ki-cases is followed by PoseScore and PRODIGY-LIG, both leading to 83% accuracy. As another important
highlight, KDEEP (pKd) reaches 88% success
rate in IC50 cases. When the neutral cases were excluded
from the analysis, it is observed that DSX-ONLINE, PRODIGY-LIG, and PoseScore can predict the drug-sensitive and drug-resistant
cases, with 100, 90, and 90% accuracies, respectively. For the neutral
cases, the best performance is achieved by DSX-ONLINE and PRODIGY-LIG,
with 62 and 54% accuracies. Therefore, both delta and binary assessments are in agreement with the
fact that the mutations causing the most dramatic binding affinity
changes (resistance or sensitivity) are easier to predict. However,
the amount of neutral cases (n = 13) is almost 3
times more than the drug-sensitive (n = 5) or drug-resistant
(n = 5) ones. Therefore, we cannot rule out the fact
that small and different sample sizes could have an impact on the
observed prediction accuracy.As an across-metric comparison,
over Ki-cases, DSX-ONLINE leads to R values of 0.45 and
0.97 for direct and delta assessments, respectively, and 100% accuracy for the binary assessment. This accuracy improvement is explained by the fact that while strong
binders (complexes with lower Ki values)
show an orthogonal spread in the direct assessment, they show a linear
spread for the delta assessment (Figure S4). While delta and binary
assessments agree in this particular case, in some other
cases, delta and binary assessments contradict. For example, PRODIGY-LIG achieves an R-value of 0.29 in the delta assessment and 83% accuracy
in the binary assessment over the Ki-cases. This means that a scoring function may achieve
great success on the binary assessment because it
can predict the direction of affinity change. However, the same scoring
function may not be able to predict the amount of change (ΔΔG), which would lead to a lower R-value. Within the context
of the biological meaning of the assessment approaches, we suggest
the concurrent use of delta and binary assessments. After testing the web servers with crystal structures, we have
scored the HADDOCK-refined complexes with the probed web servers too.
These efforts have revealed that structural refinement of crystal
structures did not improve the prediction accuracy (data not shown).Finally, among the three different kinetic data types, according
to the direct assessment, IC50 turns out
to be the best descriptor for most of the scoring functions. When
the complexes are considered as mutant/wild-type pairs (delta and binary assessments), Ki comes out as the best descriptor (Figure S3). Interestingly, none of the methods could lead to a reliable
prediction over the Kd set. Kd is a universal measure of binding, while Ki is defined for enzymes only, which could be the reason
for the high R values reported for this set.[64] However, in this case too, we cannot rule out
the fact that success rates of the kinetic descriptors may depend
on the number of cases available for each descriptor (eight pairs
for IC50, nine pairs for Kd, and six pairs for Ki).
Structure Quality has a Substantial Impact
on the Prediction Accuracy
In a real-life scenario, the experimental
structure of the kinase under study might not be available. Taking
this into consideration, we have homology modeled the Ki-cases and probed these structures with DSX-ONLINE (as
this combination leads to the best performance for the delta
and binary assessments). The mutation-specific features such
as hydropathy index change,[65] residue volume
change,[66] and drug resistance or sensitivity
were chosen for the characterization of the affinity data Figure a). As a result of
this exercise, we show that the performance of DSX-ONLINE is dramatically
worsened by the worsened quality of the input structure (an R-drop from 0.97 to 0.24). We also observe a decrease in
the residue volume upon mutation accompanied by an increase of polarity
and drug sensitivity. To understand how modeling has impacted the
conformations, we have performed a minimal structure examination.
The case pair with the visibly highest predicted affinity difference
was chosen for this exercise (Figure a, red circled data points). In this regard, the humanABL1 kinase–AXI complex is the choice. The wild-type and mutant
kinase structures of this case pair (4WA9, wild-type; 4TWP, mutant) has the highest root-mean-square
deviation (rmsd) between their backbone heavy atoms, when compared
with the other Ki-cases (Figure b,c). The structural comparison
of ABL1 and ABLT315I mutant shows that the conformation of the glycine-rich
loop is distinctly altered upon modeling, misleading the DSX-ONLINE
prediction. This glycine-rich loop is an integral element of the ATP-binding
pocket, which is in direct contact with ABL’s ligand AXI. This
example highlights the importance of proper mutation modeling in the
prediction of mutant protein kinase–drug binding affinity changes.
Figure 6
The default
scoring function of DSX-ONLINE is tested on the Ki-associated crystal and homology model structures
using delta assessment. (a) Pink color corresponds
to the neutral cases (left) or a decrease in the hydropathy index
(middle) or residue volume (right). Turquoise color corresponds to
the sensitive cases (left) or an increase in the hydropathy index
(middle) or residue volume (right). (b) The crystal structure (PDB
entry, 4WA9,
orange) and the model structure (blue) of wild-type human ABL1 kinase
were aligned (RMSD = 2.9 Å). (c) The crystal structure (PDB entry 4TWP, orange) and the
model structure (blue) of T315I mutant human ABL1 kinase were aligned
(RMSD = 2.7 Å). The ATP-binding pockets of the superposed structures
are shown for both alignments. The ligand, the residue to be mutated,
and the mutant residue are shown in sticks. The glycine-rich loops
are encircled with the gray ellipses.
The default
scoring function of DSX-ONLINE is tested on the Ki-associated crystal and homology model structures
using delta assessment. (a) Pink color corresponds
to the neutral cases (left) or a decrease in the hydropathy index
(middle) or residue volume (right). Turquoise color corresponds to
the sensitive cases (left) or an increase in the hydropathy index
(middle) or residue volume (right). (b) The crystal structure (PDB
entry, 4WA9,
orange) and the model structure (blue) of wild-type humanABL1 kinase
were aligned (RMSD = 2.9 Å). (c) The crystal structure (PDB entry 4TWP, orange) and the
model structure (blue) of T315I mutant humanABL1 kinase were aligned
(RMSD = 2.7 Å). The ATP-binding pockets of the superposed structures
are shown for both alignments. The ligand, the residue to be mutated,
and the mutant residue are shown in sticks. The glycine-rich loops
are encircled with the gray ellipses.
Conclusions
Our work investigates whether
the web-based scoring tools can predict
the impact of protein kinase mutations on their ligand-binding affinities.
To test this, we have compiled BINDKIN, the first structure-based
affinity benchmark of wild-type and mutant protein kinase–inhibitor
complexes. Based on the Ki-associated
cases of BINDKIN, we have generated a homology model benchmark too.
Using the crystal structure sets and their available experimental
binding affinity data, we have assessed the prediction accuracy of
six different web-based scoring tools, DSX-ONLINE, HADDOCK2.2, KDEEP,
PDBePISA, Pose&Rank, and PRODIGY-LIG. Overall, ligand-specific
scoring functions (used by DSX-ONLINE, KDEEP, Pose&Rank, and PRODIGY-LIG)
performed better than the general scoring functions (used by PDBePISA
and HADDOCK2.2). When the ligand-specific modified HADDOCK[67] scoring function was used, the prediction accuracy
did not improve significantly (data not shown).When a deeper
analysis is performed, we have observed that (i) Ki is a sensitive descriptor that can capture
the effect of mutations on the binding affinity, (ii) DSX-ONLINE is
the best tool in predicting the mutation-induced Ki changes, (iii) the quality of the input structure significantly
impacts the prediction accuracy. Therefore, despite the recent advances
presented by machine-learning tools in the field of computational
structural biology, in our study, a classical knowledge-based approach
(DSX-ONLINE) still performed better than a machine-learning approach
(KDEEP).All in all, as an answer to our title question, our
results show
that we are currently far from rapidly and accurately predicting the
impact of protein kinase mutations, especially when the kinase structure
is not at hand. A more comprehensive benchmark than BINDKIN is necessary
to train and improve the available predictors in a reliable manner.
In this regard, drug companies working on mutant oncogenic protein
kinases can facilitate a consistent and a more comprehensive data
set by providing their related experimental data. From our end, to
assist the kinase-research field, we are sharing our files associated
with this work online at https://github.com/CSB-KaracaLab/BINDKIN.
Methods
Collection of the BINDKIN
Benchmark
To construct the BINDKIN (effect of
point mutations
on the BINDing affinity of protein KINase–ligand
complexes) benchmark, we performed a thorough search in the Protein
Data Bank (PDB)[68] (https://www.rcsb.org/). We obtained
the final list of wild-type and mutant kinase–ligand pairs
according to the following criteria:For each mutant protein kinase–ligand complex,
there has to be a wild-type complex, containing the same protein and
the ligand.Structures of wild-type and
mutant complexes should
originate from the same research article.For each complex pair, there has to be experimentally
determined binding affinity data, originating from the same research.
The experimental binding affinity data were acquired from PDBbind
(http://www.pdbbind-cn.org/index.php),[69−71] Binding DB (https://www.bindingdb.org/bind/index.jsp),[72] and Binding MOAD (http://bindingmoad.org/Search/advancesearch)[73,74] databases.The ligand has to be an ATP-competing reversible (noncovalent)
inhibitor.These criteria left us with
23 pairs of wild-type and
mutant complexes, making up the BINDKIN benchmark. In the benchmark,
three kinases are found in the dimeric state (PDB IDs: 2IW8/2IW9, 3G0F, and 4EOR/4EOK). However, in these
cases, the protein–protein dimerization interface is not involved
in the ligand interactions. Therefore, these cases have been treated
as monomeric kinase complexes. The BINDKIN benchmark includes seven
EGFR, three Abl, three Mps1, three Src, two Cdk2, one ALK, one FGFR,
one Kit, and one PKA proteins. These complexes present 36 point mutations
distributed across 15 unique positions within or in the vicinity of
the ATP-binding pocket (Figure a).BINDKIN covers binding modes of 18 different inhibitors
(Figure S1). The 2D images of the ligands
were
generated from their respective unique SMILES strings, using the Smi2Depict
tool of ChemDB Chemoinformatics Portal (http://cdb.ics.uci.edu/cgibin/Smi2DepictWeb.py).[75] The pharmacophoric characteristics
of ligands were evaluated with the ALOGPS web server of the Virtual
Computational Chemistry Laboratory (http://www.virtuallaboratory.org/web/alogps/)[76−85] and the pkCSM web server (http://structure.bioc.cam.ac.uk/pkcsm).[86] The available ligand-related data
were obtained from RCSB PDB and PubChem (https://pubchem.ncbi.nlm.nih.gov/). All the benchmark-related features are deposited in https://github.com/CSB-KaracaLab/BINDKIN. Further details about the benchmark characteristics are described
in the Results and Discussion section.
BINDKIN-Homology Model Benchmark
When the structure
of the protein kinase under study is not at hand,
the researchers often resort to homology modeling to predict the relevant
structure. Therefore, to simulate a realistic scenario, we established
a homology model benchmark for the Ki-associated
cases. Here, we chose the Ki-associated
cases, as their binding affinities were predicted with the overall
highest accuracy based on the available BINDKIN crystal structures.The wild-type and mutant sequences were structurally modeled with
the default settings of the I-TASSER web server (https://zhanglab.ccmb.med.umich.edu/I-TASSER/).[87−89] During modeling, the original coordinates of the
wild-type or mutant structures were excluded from the template list.
After obtaining the homology models, their corresponding ligands were
placed at their catalytic-binding pocket using the original crystal
structure as a template. The fitting was performed with PyMOL.[90] In these crude models, steric clashes were observed.
To optimize the protein–ligand interface, we applied water
refinement on each model using the HADDOCK2.2 refinement interface
(https://milou.science.uu.nl/services/HADDOCK2.2/haddockserver-refinement.html).[42] The HADDOCK refinement performs very
short molecular dynamics simulation cycles in explicit water.
Benchmarking of the Web
Servers
The molecules present
other than the protein and the ligand either led to an error in the
web servers or were disregarded by the web servers. We should note
here that DSX-ONLINE has an option to include water molecules in the
calculations. However, to measure the performance on a standardized
set of crystal structures, we decided to use the same input in all
servers. Therefore, before benchmarking, the crystallization buffer
additives, crystal water molecules, and ions were removed from the
co-crystal structures. For each occurrence of multiple conformations,
the conformer with the highest occupancy was retained, and the other
conformers were removed. In one of the EGFR cases (5EM7), the ligand 5Q4
is present in two distinct conformations. Both conformations were
taken into consideration.The ligand coordinate files were converted
to “mol2” and “sdf” formats based on the
different file format requirements of the web servers. For mol2 conversion,
Open BABEL (v2.4.1), and for sdf conversion, Online SMILES Translator
and Structure File Generator (National Cancer Institute) (https://cactus.nci.nih.gov/translate/) were used. For the sdf conversion, the “Aromatic”
SMILES representation option (prints the unique SMILES string into
the “sdf” file) and the “3D” coordinate
option were chosen. The sdf files were protonated by default. The
conformations and the 3D coordinates of the ligands were retained
during conversion to both mol2 and sdf formats.Six different
web-based scoring functions were used to predict
the binding affinities listed within BINDKIN. A short description
of each web server is provided in Table . Among the web servers, DSX-ONLINE became
inaccessible shortly after the completion of its benchmarking.
Table 1
List of Benchmarked Servers
tool name
brief explanation
input format
link
DSX-ONLINE[41]
a knowledge-based scoring function
composed of distance-dependent pair
potentials, torsion-angle potentials, and solvent-accessible surface-dependent potentials
takes
the protein and ligand coordinate files in pdb and mol2
formats
a deep convolutional neural
network approach. The web server represents the binding site as a 24 Å voxel for pharmacophore featurization,
considering eight pharmacophoric-like features. It reports pKd and ΔG scores
takes the protein and the ligand
coordinate files in pdb and
sdf formats, respectively
https://www.playmolecule.org/Kdeep/
Pose & Rank[38]
PoseScore and RankScore are
atomistic distance-dependent statistical
scoring functions. The PoseScore is optimal for distinguishing
the native binding geometries of compounds from other poses, whereas RankScore is optimal for distinguishing ligands from nonbinding
small molecules
takes the protein and ligand coordinate
files in pdb and mol2
formats, respectively
https://modbase.compbio.ucsf.edu/poseandrank/
PRODIGY-LIG[40]
an atomistic contact-based protein–small molecule binding affinity predictor
takes the protein
and ligand coordinates in pdb format
https://bianca.science.uu.nl/prodigy/
HADDOCK2.2[42,91]
HADDOCK2.2 is a molecular docking program, using both force field-based and empirically-derived scoring terms. In this study, we used the refinement interface of
HADDOCK2.2
Takes the protein and the ligand coordinates
in pdb format
scores
according to the average free energy contributions of
the hydrogen bonds and salt bridges between the subunits and the solvation
free energy of folding
takes the protein and the ligand
coordinates in pdb format
https://www.ebi.ac.uk/pdbe/pisa/
Assessment of the Prediction Results
Three different
approaches were used to assess the prediction results.
These are direct, delta, and binary assessments. The direct assessment is a classical approach, where a linear correlation between all
scores and experimental data (wild-type and mutant) were sought. For
the delta assessment, a direct relationship between
the mutation-induced binding affinity (Δbaexp, eq ) and the mutation-induced
score (Δbapred, eq ) changes was sought.In the case of binary assessment, an agreement in the direction of mutation-induced change was required.
For example, if a mutation leads to a worsening of the experimental
binding affinity, and it is predicted as such by the computational
prediction, it was considered as a successful prediction. If a worsening
in the predicted affinity accompanies a mutation-induced improvement
in the experimental affinity, then this case was counted as a misprediction.
In the binary assessment, the percentage of correct
predictions was used as the overall success rate.For linear
regression analyses, the associated R and p values and the plots were generated with R (v3.6.1),[92] invoked by RStudio
(v1.2.5001).[93] The following R packages
were used to generate scatterplots, barplots, or histograms: ggpubr,[94] ggplot2,[95] magrittr,[96] ggrepel,[97] grid,[92] and gridExtra.[98] The
heatmap was produced with pheatmap[99] and
RColorBrewer[100] R packages.
Authors: Igor V Tetko; Johann Gasteiger; Roberto Todeschini; Andrea Mauri; David Livingstone; Peter Ertl; Vladimir A Palyulin; Eugene V Radchenko; Nikolay S Zefirov; Alexander S Makarenko; Vsevolod Yu Tanchuk; Volodymyr V Prokopenko Journal: J Comput Aided Mol Des Date: 2005-06 Impact factor: 3.686
Authors: Sweta Maheshwari; Michelle S Miller; Robert O'Meally; Robert N Cole; L Mario Amzel; Sandra B Gabelli Journal: J Biol Chem Date: 2017-07-04 Impact factor: 5.157