Tomomi Shimazaki1, Masanori Tachikawa2. 1. Graduate School of Nanobioscience, Yokohama City University, 22-2 Seto, Yokohama, Kanagawa 236-0027, Japan. 2. Graduate School of Data Science, Yokohama City University, 22-2, Seto, Yokohama, Kanagawa 236-0027, Japan.
Abstract
To improve virtual screening for drug discovery, we present a collaborative approach between explainable artificial intelligence (AI) and simplified chemical interaction scores to efficiently search for active ligands bound to the target receptor. In particular, we focus on cyclin-dependent kinase 2 (CDK2), which is well known as a cancer target protein. Docking simulation alone is insufficient to distinguish active ligands from decoy molecules. To identify active ligands, in this paper, machine learning is employed together with scoring functions that simplify the screened Coulomb and Lennard-Jones interactions between the ligands and residues of the target receptor. We demonstrate that these simplified interaction scores can significantly improve the classification ability of machine learning models. We also demonstrate that explainable AI together with the simplified scoring method can highlight the important residues of CDK2 for recognizing active ligands.
In recent years, the use of machine learning with molecular databases[1−10] has rapidly increased in the field of physical chemistry. Many machine learning algorithms focus on finding correlations (patterns) in the given data; however, the patterns found are often difficult for humans to understand because of the complex nonlinear behavior of machine learning models. The incomprehensibility of machine learning results has become a major problem in both scientific and nonscientific fields. To alleviate this problem, explainable artificial intelligence (AI) techniques, which attempt to understand and explain the behavior of machine learning models, have gained much attention.[11−13] In this paper, we examine several explainable AI algorithms, focusing on docking simulation (virtual screening) to search for drug candidate molecules, because a large amount of experimental data on both positive and negative cases is available.

In the early stages of drug development, virtual screening (docking simulation) is routinely employed to narrow down candidate molecules; in this process, the binding ability of ligand molecules to the target protein is tested virtually on a computer.[14−24] However, docking simulation alone is not sufficient to search for drug candidates. Thus, more precise physicochemical simulations, such as (subsystem-based) quantum chemistry and (classical) molecular mechanics/dynamics, have been performed to improve the screening process; however, they consume a large amount of computational resources.[25−36] Machine learning, in contrast, has been employed to achieve efficient and accurate virtual screening.[37−52] For example, Pereira et al. proposed a deep learning approach to improve docking simulations and extracted relevant features from protein–ligand complex data.[49] Yan et al. developed a descriptor based on protein–ligand interactions for machine learning methods.[53] Ragoza et al. reported a convolutional neural network scoring function approach and highlighted the key features of protein–ligand interactions.[41] Molecular descriptors suitable for machine learning have also been developed,[53−57] and several scores based on machine learning techniques have been reported.[50,58,59] In this paper, we discuss a virtual screening technique based on machine learning using simplified interaction scoring functions.

To verify virtual screening techniques, it is necessary to use
decoy molecules that have similar chemical structures and properties
to those of active ligands. The use of decoy molecules allows for
stricter verification than using active ligands alone. In other words,
virtual screening techniques must classify active ligands (positive
cases) from a large number of decoy molecules (negative cases). For
this purpose, Mysinger et al. developed the Directory of Useful Decoys:
Enhanced (DUD-E) data set, which provides not only experimentally
verified active ligands but also a large number of property-matched
decoy molecules.[60] The number of decoys
registered in the DUD-E data set is approximately 35 times the number
of active ligands for each target protein (receptor). In this paper,
we use the decoy molecules provided in the DUD-E data set to verify
the machine learning models.

We focus on cyclin-dependent kinase
2 (CDK2) as the target receptor
in this paper. CDK2 is a catalytic subunit of the cyclin-dependent
kinase complex and plays a critical role in the abnormal growth process
of cancer cells.[61,62] Thus, CDK2 inhibitors have been
widely researched as drug candidates.[63] For CDK2, 798 active ligands and 28,328 decoy molecules are provided
in the DUD-E data set.[60] Thus, a small
number of active ligands must be distinguished from a large number
of decoys. However, the affinity (binding free energy) determined
by conventional docking simulation is not sufficient to classify active
ligands. Therefore, we introduce simplified interactions (scores)
between ligands and the CDK2 receptor. These simplified interaction
scores can significantly improve the classification ability of machine
learning models. In addition, important residues for ligand recognition
of CDK2 can be highlighted based on a collaborative approach between
the interaction scores and explainable AI. Knowledge of these important residues may provide a useful guideline for understanding the binding phenomena between ligands and the target protein.

In the Methods section, we describe the docking simulation, the simplified interaction scores containing chemical features, the machine learning techniques, and the collaborative approach between the interaction scores and explainable AI. The calculation results and discussion are provided in the Results and Discussion section, and concluding remarks are presented in the Concluding Remarks section.
Methods
In this section, we describe the calculation process used to classify a small number of active ligands bound to the target protein (receptor) among a much larger number of decoy molecules. Figure 1 summarizes the classification process. The first step is docking simulation between ligands and the target receptor. We employed the AutoDock Vina program package (referred to as Vina in this paper).[21] Vina provides simulation results with a higher average accuracy than AutoDock4 and efficiently predicts the ligand–receptor structure using a simple scoring function.[16] Nevertheless, Vina shares many features with AutoDock4. For example, Vina employs the PDBQT molecular structure file format for input and output, which is also used in AutoDock4. In addition, the subroutines of MGLTools, which were originally developed for AutoDock4, can be used to prepare input files for the Vina docking simulation.[64,65] Vina also evaluates the binding free energy (affinity) between a ligand and the target protein. The receptor structure registered in DUD-E (PDB-ID 1H00)[66] was employed in this study. We also used the molecular data provided in DUD-E without any modifications.
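Vina reports the affinity of each docked pose in its output PDBQT file, on a `REMARK VINA RESULT:` line that also carries two RMSD values. As a minimal sketch, assuming that output layout (the function names are hypothetical), the per-ligand affinity later used as a feature could be collected as follows:

```python
import re

# Vina output PDBQT lines look like:
#   REMARK VINA RESULT:    -9.2      0.000      0.000
# i.e. affinity (kcal/mol), then RMSD lower and upper bounds relative to the best pose.
VINA_RESULT = re.compile(
    r"^REMARK VINA RESULT:\s+(-?\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)"
)


def vina_affinities(pdbqt_text: str) -> list[float]:
    """Return the affinity (kcal/mol) of every docked pose, in file order."""
    affinities = []
    for line in pdbqt_text.splitlines():
        match = VINA_RESULT.match(line)
        if match:
            affinities.append(float(match.group(1)))
    return affinities


def best_affinity(pdbqt_text: str) -> float:
    """Lowest (most favorable) affinity over all poses of one ligand."""
    affinities = vina_affinities(pdbqt_text)
    if not affinities:
        raise ValueError("no 'REMARK VINA RESULT' line found")
    return min(affinities)


# Hypothetical two-pose output, trimmed to the lines that matter here.
sample = """MODEL 1
REMARK VINA RESULT:    -9.2      0.000      0.000
ENDMDL
MODEL 2
REMARK VINA RESULT:    -8.7      1.523      2.011
ENDMDL"""
print(vina_affinities(sample))  # [-9.2, -8.7]
print(best_affinity(sample))    # -9.2
```

Taking the minimum rather than the first pose keeps the helper correct even if poses were reordered.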
Figure 1
Calculation and analysis
flow used in this paper. In the first
step, docking simulation using Vina is employed to obtain the complex
structure and binding energy (affinity) between CDK2 and ligands.
In the second step, the simplified screened Coulomb (SC) and Lennard-Jones
(LJ) interaction scores between ligands and residues around CDK2 are
calculated using ligand–receptor complex structures. In the
third step, machine learning models to classify active ligands are
created using feature vectors with simplified interaction scores and
affinities. In the final step, explainable AI analysis is performed
for the trained machine learning models.
Figure 2a presents the histograms of the affinities of active and decoy ligands for the target CDK2 protein. The blue and red transparent histograms represent active and decoy molecules, respectively. Because the number of active ligands was much smaller than the number of decoys, the histogram of the active ligands is much smaller than that of the decoys and is difficult to see. To compare the two distributions more easily, normalized histograms are presented in Figure 2b. The histograms overlap substantially, although the active ligands tend to have a slightly lower (more favorable) affinity than the decoys on average. This large overlap in affinity makes it difficult to classify active ligands. In fact, as shown later, we were unable to create a satisfactory classifier of active ligands with machine learning when only the affinity was used as a feature.
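The degree of overlap seen in the normalized histograms can be quantified. A minimal sketch, assuming the affinities are available as NumPy arrays; the overlap coefficient (shared area under the two normalized histograms) and the function names are illustrative choices, not part of the original analysis, and the data below are synthetic rather than the DUD-E affinities:

```python
import numpy as np


def normalized_histograms(active, decoy, bins=30):
    """Histogram both affinity sets on shared bin edges, each normalized to unit area."""
    edges = np.histogram_bin_edges(np.concatenate([active, decoy]), bins=bins)
    h_active, _ = np.histogram(active, bins=edges, density=True)
    h_decoy, _ = np.histogram(decoy, bins=edges, density=True)
    return edges, h_active, h_decoy


def overlap_coefficient(active, decoy, bins=30):
    """Shared area of the two normalized histograms: 1.0 = identical, 0.0 = disjoint."""
    edges, h_active, h_decoy = normalized_histograms(active, decoy, bins)
    return float(np.sum(np.minimum(h_active, h_decoy) * np.diff(edges)))


# Synthetic affinities (kcal/mol): a small, slightly shifted active set
# against a large decoy set, mimicking the imbalance described above.
rng = np.random.default_rng(0)
active = rng.normal(-9.0, 1.0, size=800)
decoy = rng.normal(-8.0, 1.0, size=28_000)
print(round(overlap_coefficient(active, decoy), 2))
```

Normalizing each histogram separately (`density=True`) is what makes the small active set comparable with the much larger decoy set.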
Figure 2
(a) Histograms of the binding energy (affinity)
of active and decoy
ligand molecules for CDK2 evaluated by Vina docking simulation. Here,
red and blue transparent histograms represent active and decoy molecules,
respectively. The number of active ligands is much smaller than that
of decoys. (b) Normalized histograms for active and decoy molecules
for a clearer comparison between the two. There is a large overlap
between the active and decoy histograms, indicating that docking simulation
cannot fully classify active ligands from a large number of decoys.
In the second step, simplified interaction scores between ligands and residues around CDK2 were calculated; these scores are used in the feature vectors to create improved machine learning models. To evaluate the interaction scores, we focused on the amino acid residues around the pocket region of the target receptor where ligands were bound. Figure 3a displays a CDK2–ligand complex structure obtained by Vina, and Figure 3b displays the 17 residues around the ligand-binding pocket of CDK2. In this study, we calculated the interaction scores between a ligand molecule and each residue based on the simplified SC interaction (eq 1) and the LJ potential (eq 2). Both scores are functions of the distance between each atom of a CDK2 residue and each atom of the ligand molecule, and a constant parameter is set to 0.1 Å⁻². To evaluate these scores, we used the ligand–receptor complex structure obtained from the Vina docking simulation. Equation 1 represents the simplified SC interaction. Here, only the atomic species O, N, S, and F are taken into account in evaluating the score because they play an important role in hydrogen bonding. Unlike classical force field models, the simplified SC scoring function does not take into account the partial charges on atoms. Equation 2 is based on the LJ potential; however, it does not distinguish between atom species. In the simplified LJ interaction score, the same parameter is employed for all atom species, whereas ordinary classical force fields have many parameters. We employed the simplified SC and LJ scores instead of more complex interactions because they enable easy, robust, and fast evaluation. In addition, this simplification is useful for highlighting residues that are important for binding active ligands to the target protein.
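Because the explicit forms of eqs 1 and 2 are not reproduced here, the following is only a sketch under stated assumptions: the SC score is taken as a Gaussian-damped Coulomb-like sum of exp(−αr²)/r over O, N, S, and F atom pairs (a form consistent with α having units of Å⁻², but not confirmed by the text), and the LJ score as a standard 12-6 term with a single, hypothetical σ = 3.5 Å for every species. All function names are hypothetical. The sketch shows how per-residue scores could be turned into two interaction features per pocket residue:

```python
import numpy as np

ALPHA = 0.1                          # constant parameter from the text, in Å^-2
SC_ELEMENTS = ("O", "N", "S", "F")   # only these species enter the SC score


def pair_distances(a_xyz, b_xyz):
    """Matrix of Euclidean distances (Å) between every atom of a and every atom of b."""
    a = np.asarray(a_xyz, dtype=float)
    b = np.asarray(b_xyz, dtype=float)
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def sc_score(res_xyz, res_elem, lig_xyz, lig_elem, alpha=ALPHA):
    """ASSUMED form: sum of exp(-alpha r^2) / r over O/N/S/F pairs; no partial charges."""
    res_mask = np.isin(res_elem, SC_ELEMENTS)
    lig_mask = np.isin(lig_elem, SC_ELEMENTS)
    if not res_mask.any() or not lig_mask.any():
        return 0.0
    r = pair_distances(np.asarray(res_xyz)[res_mask], np.asarray(lig_xyz)[lig_mask])
    return float(np.sum(np.exp(-alpha * r**2) / r))


def lj_score(res_xyz, lig_xyz, sigma=3.5):
    """ASSUMED form: 12-6 Lennard-Jones with one hypothetical sigma (Å) for all species."""
    sr6 = (sigma / pair_distances(res_xyz, lig_xyz)) ** 6
    return float(np.sum(sr6**2 - sr6))


def residue_features(residues, lig_xyz, lig_elem):
    """Per-residue [SC..., LJ...] scores; `residues` maps a name to (xyz, elements)."""
    sc = [sc_score(xyz, elem, lig_xyz, lig_elem) for xyz, elem in residues.values()]
    lj = [lj_score(xyz, lig_xyz) for xyz, _ in residues.values()]
    return np.array(sc + lj)


# Toy geometry (Å): one residue with an N and a C atom, a ligand with one O atom.
residues = {"Leu83": (np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]), np.array(["N", "C"]))}
ligand_xyz = np.array([[3.0, 0.0, 0.0]])
ligand_elem = np.array(["O"])
print(residue_features(residues, ligand_xyz, ligand_elem))  # [SC_Leu83, LJ_Leu83]
```

With 17 pocket residues this yields 34 interaction features per ligand; appending the Vina affinity gives the 35-dimensional vectors described in the Results and Discussion section.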
Figure 3
(a) Structure of CDK2
described by the ball-and-stick representation.
Here, a ligand molecule is bound to the pocket of CDK2. The ligand–receptor
complex structure is obtained by the Vina docking simulation. (b)
Residues around the pocket of CDK2, where ligand molecules are bound.
In the third step, we employed machine learning to classify active ligands from a large number of decoys. Here, the simplified SC and LJ interaction scores, as well as the affinity calculated by Vina, were used in the feature vectors to construct the machine learning models. We examined several machine learning algorithms using the scikit-learn and LightGBM (light gradient boosting machine) libraries.[67,68] The hyperparameters were tuned using grid search and Bayesian optimization with the scikit-optimize library.[69] Cross-validation was used to evaluate the machine learning models.

Finally, in the fourth step, we analyzed the trained machine learning models using explainable AI methods, namely the permutation importance algorithm[70] and the Shapley additive explanation (SHAP) method, using the scikit-learn[67] and shap libraries.[71,72] The explainable AI analysis allowed us to identify the CDK2 residues that are important for classifying active ligands. In the Results and Discussion section, we present the details of the calculation results based on the explainable AI approach together with the simplified SC and LJ interaction scores.
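As a minimal sketch of the tuning and cross-validation step, the following uses scikit-learn only: an SVM–RBF classifier stands in for the models examined in this paper, a grid search over a hypothetical hyperparameter grid is scored by MCC, and synthetic imbalanced data replace the real 35-dimensional feature vectors. LightGBM's `LGBMClassifier` or scikit-optimize's `BayesSearchCV` could be dropped into the same pipeline but are not shown.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the ligand feature matrix: 35 columns
# (17 SC scores + 17 LJ scores + 1 affinity), about 5% positives.
X, y = make_classification(n_samples=600, n_features=35, n_informative=8,
                           weights=[0.95], random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),   # standardization is refit inside every training fold
    ("clf", SVC(kernel="rbf")),
])
param_grid = {"clf__C": [0.1, 1.0, 10.0], "clf__gamma": ["scale", 0.01, 0.1]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # keeps the class ratio per fold

search = GridSearchCV(pipeline, param_grid, scoring=make_scorer(matthews_corrcoef), cv=cv)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Keeping `StandardScaler` inside the pipeline means the scaling statistics are computed on each training fold only, so nothing leaks from the validation fold; stratified folds matter here because a plain split could leave a fold with very few actives.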
Results and Discussion
This section discusses the calculation results obtained from machine learning models with simplified interaction scores. The simplified SC and LJ interaction scores were used in the feature vectors to construct the machine learning models. To calculate the simplified SC and LJ scores, 17 residues of CDK2 near the pocket were considered. In addition, the affinity obtained from Vina was employed to describe the features of the ligand molecules. Thus, the size (dimension) of the feature vector for each ligand was 35 (= 17 × 2 + 1). The feature vectors were standardized (normalized) before the machine learning models (classifiers) were constructed. In this study, we examined several classifiers based on the logistic regression, random forest, and support vector machine (SVM) algorithms implemented in scikit-learn and the gradient boosting algorithm implemented in LightGBM. In the SVM method, the radial basis function (RBF) and third-order polynomial (poly3) kernels were employed to capture nonlinear relationships in the data.[73,74]

Table 1 summarizes the performance of the machine learning models in classifying active ligands. Five-fold cross-validation was used to obtain these metrics. Because the number of active ligands was much smaller than the number of decoys, the accuracy (= (TP + TN)/(TP + TN + FP + FN)) was not a suitable metric for evaluating the machine learning models; for example, a trivial model that labels every molecule as a decoy would already reach an accuracy of about 0.97 (28,328/29,126). Here, TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively. We therefore also used other metrics, such as precision (= TP/(TP + FP)) and recall (= TP/(TP + FN)), which have a trade-off relationship. In addition, the F1 score and the Matthews correlation coefficient (MCC) can be used to compare the overall model performance.[75] Table 1 also presents the area under the receiver operating characteristic curve (AUC) scores.
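The metric definitions above can be made concrete with a small helper. A minimal sketch with a hypothetical function name; mapping zero denominators to 0.0 is our own convention, and the counts below are hypothetical, not taken from the CDK2 experiments:

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Accuracy, precision, recall, F1, and MCC from confusion-matrix counts.

    Undefined ratios (zero denominators) are reported as 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts for 1,000 molecules with 30 actives.
# A model that calls everything a decoy: accuracy 0.97, but recall, F1, and MCC are 0.
print(classification_metrics(tp=0, tn=970, fp=0, fn=30))
# A model that finds half of the actives: accuracy 0.98, precision 0.75,
# recall 0.5, F1 0.6, MCC about 0.603.
print(classification_metrics(tp=15, tn=965, fp=5, fn=15))
```

As a consistency check, the harmonic mean of the SVM–RBF precision and recall in Table 1, 2 × 0.79 × 0.60/(0.79 + 0.60), is about 0.68, matching its tabulated F1 score. The AUC cannot be computed from counts alone; it needs continuous classifier scores (for example, via `sklearn.metrics.roc_auc_score`).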
Table 1
Comparison of Machine Learning Models
in Classifying Active Ligands (Positive Cases) Bound to CDK2
models | accuracy | precision | recall | F1 score | MCC | AUC score
gradient boosting (LightGBM) | 0.98 | 0.98 | 0.48 | 0.60 | 0.65 | 0.93
SVM–RBF | 0.98 | 0.79 | 0.60 | 0.68 | 0.68 | 0.92
SVM–poly3 | 0.98 | 0.65 | 0.48 | 0.55 | 0.55 | 0.87
random forest | 0.98 | 0.99 | 0.28 | 0.44 | 0.52 | 0.89
logistic regression | 0.77 | 0.08 | 0.71 | 0.14 | 0.18 | 0.81
The results in Table 1 indicate that the LightGBM model, which is based on the gradient boosting algorithm, achieved the highest AUC score (0.93) together with a high precision (0.98), although the SVM–RBF model achieved a higher recall, F1 score, and MCC. Because the LightGBM algorithm was also much faster than the other methods, we focus on the LightGBM model in this paper. Table 2 presents the per-class precision, recall, and F1 score of the LightGBM model using 35 features. For comparison, Table 2 also presents the corresponding metrics when only the affinity was used. The decoys showed good results for all metrics in both cases because there are far more decoy molecules than active ligands. In contrast, the results for the active ligands differed markedly: the affinity alone was insufficient to construct an effective classifier. For example, an F1 score of 0.02 and a precision of 0.30 were obtained when only the affinity was used in the feature vectors. However, when the simplified SC and LJ interaction scores were used together with the affinity, a machine learning model with an F1 score of 0.60 and a precision of 0.98 could be constructed. The use of interaction scores thus greatly improved the machine learning model, and the LightGBM model with the simplified SC and LJ interaction scores was effective in classifying active ligands.
Table 2
Per-Class Precision, Recall, and F1 Score for the LightGBM Model
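model | class | precision | recall | F1 score
model using 35 features | active ligands | 0.98 | 0.43 | 0.60
model using 35 features | decoy molecules | 0.98 | 1.00 | 0.99
model using only the affinity | active ligands | 0.30 | 0.01 | 0.02
model using only the affinity | decoy molecules | 0.97 | 1.00 | 0.99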
To analyze the trained machine learning models, we employed the permutation importance algorithm, which is an explainable AI method. This algorithm makes it possible to evaluate the importance of features even in nonlinear models. In this algorithm, the values of one feature are randomly shuffled, and the target data are then predicted with the shuffled input. Shuffling breaks the relationship between that feature and the target, so the feature becomes uninformative, which usually reduces the predictive performance of the trained model. The importance of the feature is evaluated from this performance deterioration. The operation is repeated for every feature to evaluate the importance of each.

Figure 4a presents the 15 most important features obtained by the permutation importance method for the LightGBM model, where the maximum value was scaled to 1.0. More detailed results are provided in the Supporting Information. To obtain these results, only active ligands (positive cases) were employed to avoid the strong influence of the much larger number of decoys. The explainable AI analysis provided insight into the behavior of the complex nonlinear machine learning model. For example, we found that the affinity calculated by Vina was the most important feature in the LightGBM model; that is, the docking simulation results remained essential even when the simplified SC and LJ interaction scores were used. Note that the affinity was not the most important feature in the SVM model (see also the Supporting Information). The second-most important feature in the LightGBM model was the simplified LJ interaction between ligands and the Val18 residue, while the third- and fourth-most important features were the simplified SC scores for Val18 and Leu134, respectively. Thus, we obtained information on which interactions between ligands and the target protein were important for classifying active ligands.
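The shuffling procedure described above can be sketched in a few lines. This is a minimal from-scratch illustration with hypothetical names and a toy model; scikit-learn's `sklearn.inspection.permutation_importance` provides a full implementation, and accuracy is used here only as an example score:

```python
import numpy as np


def permutation_importance(predict, X, y, score, n_repeats=10, seed=0):
    """Mean drop in `score` when each column of X is shuffled independently."""
    rng = np.random.default_rng(seed)
    baseline = score(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link between feature j and y
            drops.append(baseline - score(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances


def scale_to_max(importances):
    """Rescale so that the largest importance equals 1.0, as in Figure 4."""
    return importances / np.max(importances)


def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))


def toy_predict(X):
    """Toy model that uses only column 0; column 1 is pure noise."""
    return (X[:, 0] > 0).astype(int)


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = toy_predict(X)
imp = permutation_importance(toy_predict, X, y, accuracy)
print(scale_to_max(imp))  # column 0 -> 1.0, column 1 -> 0.0
```

Because the toy model ignores column 1, shuffling it changes no prediction and its importance is exactly zero, while shuffling column 0 drops the accuracy to roughly chance level.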
Figure 4
Scaled importance of features obtained by the following explainable AI algorithms for the LightGBM model to classify active ligands: (a) permutation importance algorithm and (b) SHAP algorithm. These algorithms provide different interpretations of the importance of features, but common important residues are observed for recognizing active ligands. (c) Five of the top six residues of CDK2 are common to both algorithms.
Figure 4b illustrates the importance of features obtained by the SHAP algorithm, which is based on the Shapley value from cooperative game theory.[76] In this paper, these values are referred to as the SHAP importance. The SHAP importance values were also scaled so that the maximum value was 1.0 for comparison purposes. The SHAP importance values differed from those obtained by the permutation importance algorithm, suggesting that the importance of features is not uniquely determined but depends on the selected algorithm. Nevertheless, we were able to recognize common important residues in the different results. To clarify this point, we focus only on the residue information.

Table 3 presents the six most important residues provided by the permutation importance and SHAP algorithms. To obtain these results, the importance values were simply aggregated for each residue without distinguishing between the simplified SC and LJ scores. The results demonstrate that the permutation importance and SHAP algorithms produced similar important residues, although their order was slightly different. That is, five residues, Ala31, Lys33, Leu83, His84, and Leu134, were common to the top six residues of both algorithms, as illustrated in Figure 4c. Thus, the explainable AI techniques suggest that these residues play essential roles in the molecular recognition of CDK2 for classifying active ligands in the LightGBM model. Figure 5 displays the change in MCC with the number of features. Here, the features were first sorted by their importance values, and those with the highest importance were used. Both graphs display similar behavior, with the performance approaching saturation at approximately 15 features. Thus, the top-ranked features play the most essential roles in classifying active ligands.
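Shapley values can be made concrete with a brute-force sketch. It enumerates every coalition of features and replaces absent features by a single baseline vector, which is only one of several possible conventions. The shap library instead uses efficient estimators (for example, a tree-specific algorithm for gradient boosting models); this exact enumeration is exponential in the number of features and is practical only for a handful of them. The global importance is assumed here to be the mean absolute SHAP value, a common convention. Function names and the toy model are hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take their baseline value."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = len(x)

    def coalition_value(members):
        z = baseline.copy()
        idx = list(members)
        z[idx] = x[idx]
        return f(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for members in combinations(others, size):
                phi[i] += weight * (coalition_value(members + (i,)) - coalition_value(members))
    return phi


def mean_abs_shap(f, X, baseline):
    """Global importance as the mean |SHAP value| over samples (assumed convention)."""
    return np.mean([np.abs(shapley_values(f, x, baseline)) for x in X], axis=0)


def toy_model(z):
    # Additive terms for features 0 and 1 plus an interaction between features 0 and 2.
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


phi = shapley_values(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print(phi)  # approximately [3.5, 6.0, 1.5]
```

The additive terms go entirely to their own features, the interaction term (3 at this point) is split evenly between features 0 and 2, and the values sum to f(x) minus f(baseline), which is the efficiency property that distinguishes Shapley values from ad hoc attributions.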
Table 3
Most Important Residues Provided by
the Permutation Importance and SHAP Algorithms in the LightGBM Model
rank | permutation importance | SHAP
1 | Val18 | Leu83
2 | Leu134 | His84
3 | Lys33 | Ala31
4 | Leu83 | Leu134
5 | His84 | Asp86
6 | Ala31 | Lys33
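The residue-level aggregation behind Table 3 can be sketched as follows. The feature-naming scheme (for example, `SC_Val18`) and the summation rule are assumptions for illustration; the text states only that importance values were aggregated per residue without distinguishing SC from LJ. The importance values in `example` are hypothetical, while the last lines use the published Table 3 rankings to recover the five residues common to both methods.

```python
from collections import defaultdict


def residue_importance(feature_importance: dict[str, float]) -> dict[str, float]:
    """Sum SC and LJ importances per residue; features such as the affinity are skipped."""
    totals: dict[str, float] = defaultdict(float)
    for name, value in feature_importance.items():
        kind, _, residue = name.partition("_")   # hypothetical names like "SC_Val18"
        if kind in ("SC", "LJ") and residue:
            totals[residue] += value
    return dict(totals)


def top_residues(feature_importance: dict[str, float], k: int = 6) -> list[str]:
    """Residues ranked by aggregated importance, most important first."""
    totals = residue_importance(feature_importance)
    return sorted(totals, key=totals.get, reverse=True)[:k]


# Hypothetical scaled importances (not values from this paper).
example = {"affinity": 1.0, "LJ_Val18": 0.6, "SC_Val18": 0.5, "SC_Leu134": 0.4, "LJ_Leu83": 0.3}
print(top_residues(example, k=3))  # ['Val18', 'Leu134', 'Leu83']

# Overlap of the two top-six lists in Table 3.
permutation_top6 = ["Val18", "Leu134", "Lys33", "Leu83", "His84", "Ala31"]
shap_top6 = ["Leu83", "His84", "Ala31", "Leu134", "Asp86", "Lys33"]
print(sorted(set(permutation_top6) & set(shap_top6)))
```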
Figure 5
Change in MCC with the number of features used in the (a) LightGBM model and (b) SVM with the radial basis function kernel (SVM–RBF) model. Here, features sorted by permutation importance are used to obtain the results. The classification ability of both learning models saturates at approximately 15 features.
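A curve like the one in Figure 5 comes from refitting a model on progressively larger sets of top-ranked features. A minimal sketch with scikit-learn, assuming a precomputed importance ranking and using an SVM–RBF pipeline; the function name is hypothetical, and the data and ranking are synthetic rather than the CDK2 features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def mcc_vs_top_k(model, X, y, ranking, ks, cv=5):
    """Mean cross-validated MCC when only the top-k ranked features are kept."""
    scorer = make_scorer(matthews_corrcoef)
    curve = {}
    for k in ks:
        columns = list(ranking[:k])                      # most important features first
        scores = cross_val_score(model, X[:, columns], y, scoring=scorer, cv=cv)
        curve[k] = float(np.mean(scores))
    return curve


# Synthetic stand-in: with shuffle=False the informative columns come first,
# so the identity ordering plays the role of an importance ranking.
X, y = make_classification(n_samples=600, n_features=35, n_informative=5,
                           weights=[0.9], shuffle=False, random_state=0)
ranking = np.arange(X.shape[1])
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print(mcc_vs_top_k(model, X, y, ranking, ks=[1, 5, 15, 35]))
```

In practice the ranking would come from the permutation importance of a model trained on all features; re-estimating the ranking inside each fold would avoid any optimistic bias but is not shown.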
Next, we discuss the SVM algorithm with the RBF kernel (SVM–RBF) to investigate the behavior of the permutation importance algorithm. Table 4 presents the five most important residues for SVM–RBF, which were obtained in the same way as in Table 3, together with the results for the LightGBM model for comparison. The rankings differed slightly between the two models, reflecting differences in model behavior. For example, the precision of the SVM–RBF and LightGBM models was 0.79 and 0.98, respectively, and their recall was 0.60 and 0.48, respectively, as displayed in Table 1. The LightGBM model thus performed better in terms of precision, whereas the SVM–RBF model achieved better recall. Each machine learning model therefore focused on different aspects of the data, and the importance of the residues differed between the models. However, three residues, Lys33, Leu83, and His84, were common to the top five residues of both SVM–RBF and LightGBM.
Table 4
Most Important Residues Provided by
the Permutation Importance Algorithm in the SVM–RBF and LightGBM
Models
rank | SVM–RBF | LightGBM
1 | Leu83 | Val18
2 | Glu81 | Leu134
3 | Lys33 | Lys33
4 | Val18 | Leu83
5 | His84 | His84
In both the LightGBM and SVM models, the recall values are lower than the precision values. One possible factor is failure of the docking simulation. The simplified SC and LJ interaction scores are evaluated from the protein–ligand complex structure obtained from the docking simulation. If the docking simulation does not provide the correct complex structure, an active ligand may be misclassified as a decoy molecule. Thus, recall can remain low even when the simplified SC and LJ interaction scores are employed together with machine learning. In this paper, we executed Vina docking simulations against CDK2 with a fixed receptor structure. More sophisticated docking techniques, such as those using a flexible receptor model, may improve recall. The simplified SC and LJ interaction approach can be easily incorporated into other docking simulation techniques and may therefore be useful for enhancing the performance of various docking simulations.

To analyze the behavior of the trained LightGBM model, we employed partial dependence (PD) analysis.[77] In PD analysis, one feature is set counterfactually to a given value for all molecules, and the resulting (virtual) predictions are averaged over the observed values of the other features. Thus, we can investigate the average behavior of the machine learning model when a feature is changed. Here, the predicted probability of classifying a molecule as an active ligand is used in the counterfactual PD calculations. Figure 6 summarizes the PD analysis results for Lys33, Leu83, and His84, where the left and right columns correspond to the simplified SC and LJ interactions, respectively. The horizontal axis represents the standardized SC and LJ interaction score values, and the vertical axis shows the counterfactual property (probability) obtained from the PD analysis. From these calculations, we can investigate how the machine learning model distinguishes between actives and decoys. For example, when a stronger SC interaction with Lys33 is given, the LightGBM model tends to classify the molecule as an active ligand. In contrast, for the LJ interaction with Lys33, the counterfactual probability takes higher values only within a certain range of the score. We also observe similar behavior for the SC and LJ interaction scores of Leu83. These PD calculations indicate that a molecule needs appropriate interaction strengths with the Lys33 and Leu83 residues to be an active ligand. Thus, PD analysis with simplified interaction scores can provide a rough understanding of how the model recognizes ligand molecules.
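The counterfactual averaging described above can be sketched directly. A minimal version with hypothetical names and a toy probabilistic model; `sklearn.inspection.partial_dependence` offers a full implementation, and for a fitted scikit-learn classifier one would pass `lambda X: clf.predict_proba(X)[:, 1]` as the prediction function.

```python
import numpy as np


def partial_dependence_1d(predict_active, X, feature, grid):
    """Average predicted P(active) when one column is set to each grid value for all rows."""
    curve = []
    for value in grid:
        X_cf = np.array(X, dtype=float, copy=True)
        X_cf[:, feature] = value                              # counterfactual: every molecule gets this value
        curve.append(float(np.mean(predict_active(X_cf))))   # average over the other features
    return np.array(curve)


def toy_predict_active(X):
    """Toy classifier: P(active) rises with standardized feature 0 and ignores feature 1."""
    return 1.0 / (1.0 + np.exp(-2.0 * X[:, 0]))


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
grid = np.linspace(-2.0, 2.0, 9)
print(np.round(partial_dependence_1d(toy_predict_active, X, feature=0, grid=grid), 3))
```

For feature 0 the curve rises monotonically, like the SC score of Lys33 in Figure 6, whereas the curve for the ignored feature 1 is flat; a peaked curve, as seen for the LJ score of Lys33, would indicate a preferred intermediate range.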
Figure 6
PD analysis on simplified SC and LJ interaction scores for the LightGBM model. The horizontal and vertical axes represent the standardized interaction score and the counterfactual property (probability), respectively. The higher the counterfactual value, the more likely the machine learning model is to classify molecules as active ligands.
The residues Lys33, Leu83, and His84 appear among the top five residues of both models in Table 4; therefore, they may be particularly important for binding between ligands and CDK2. Figure 7 illustrates these residues, with the ligand molecule sandwiched between them. Because the simplified SC and LJ interaction scores are based on (simplified) physicochemical concepts, the explainable AI analysis can reflect the actual chemical and biological behavior of CDK2. In fact, the Leu83 residue is known to be essential for ligand recognition by the CDK2 receptor.[78−83] Many machine learning techniques focus only on correlations (patterns) in data rather than on causal relationships (physicochemical laws). Thus, the patterns found by machine learning models do not always have actual physicochemical meaning. However, the simplified SC and LJ interaction scores do possess physicochemical meaning, so explainable AI analysis based on them can capture actual physicochemical phenomena. This collaborative approach between explainable AI techniques and simplified interaction scores may help to extract the physical and chemical meaning contained in data. In particular, because causal relationships in biological data are sometimes weak, the collaborative approach may be especially useful in such situations.
Figure 7
Protein structure around the ligand-binding
pocket of CDK2. The
residues of Lys33, Leu83, and His84 are highlighted, which play an
essential role in classifying active ligands. The ligand molecule
is sandwiched between these residues.
In this paper, we showed that the simplified SC and LJ interaction scores work well for classifying active ligands. In some physical and chemical approaches, more precise descriptions of protein–ligand interactions, such as first-principles quantum chemistry and molecular dynamics, have been pursued. However, such precise descriptions can be difficult to apply to actual assays. For example, the X-ray resolution of the receptor structure may be low, and the protonation states of ligands and protein residues may be unclear. In such situations, the simplified score approach together with machine learning may be more suitable than precise interaction descriptions because of its robustness. Other scores may also be available for this purpose.[53−57] However, it is not always clear which simplifications will work well for each target application, and systematic studies of simplified interactions may be an important topic for future work.

Finally, we discuss calculation results based on a different data set of 824 molecules extracted from the ChEMBL database,[5] in which molecules with an IC50 value of 150 nM or less for CDK2 were classified as active ligands. In addition, we employed the CDK2 structure with PDB-ID 6Q4G, which has a resolution of 0.98 Å.[84] We thus examined classification conditions different from those used with the DUD-E data set. When only the affinity scores calculated by Vina were employed, active ligands could not be classified effectively: the LightGBM model showed precision, recall, and F1-score values of 0.24, 0.01, and 0.03, respectively. In contrast, the performance of the same machine learning algorithm improved substantially when the simplified SC and LJ scores were used, reaching precision, recall, and F1-score values of 0.70, 0.48, and 0.57, respectively. Therefore, the simplified interaction approach is useful for improving the classification performance of machine learning. Because our approach uses the simplified SC and LJ scores in addition to the docking simulation results to construct machine learning models, it can usually be expected to give better results than docking simulation alone, although such an improvement is not guaranteed because machine learning is a probability-based (statistical) approach. Machine learning allows both docking simulation results and simplified interaction scores to be handled systematically. In addition, the behavior of machine learning models can be investigated using explainable AI techniques to highlight important residues of the target protein. The collaborative approach between explainable AI and simplified chemical interactions can thus become a useful tool.
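The activity labeling used for the ChEMBL set is a simple threshold rule. A minimal sketch with hypothetical function names and hypothetical IC50 values; the equivalent pIC50 cutoff (about 6.82 for 150 nM) is shown only as a convenience and is not a quantity used in the text.

```python
import math

IC50_CUTOFF_NM = 150.0  # activity threshold stated in the text


def is_active(ic50_nm: float, cutoff_nm: float = IC50_CUTOFF_NM) -> bool:
    """Label a molecule as active when its IC50 is at or below the cutoff."""
    return ic50_nm <= cutoff_nm


def pic50(ic50_nm: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); with IC50 in nM this is 9 - log10(IC50)."""
    return 9.0 - math.log10(ic50_nm)


for value in (12.0, 150.0, 2400.0):  # hypothetical IC50 values in nM
    print(value, is_active(value), round(pic50(value), 2))
```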
Concluding Remarks
Docking simulation is routinely employed to explore ligand candidates in the early stages of drug research and development. However, the affinity (binding free energy) estimated from docking simulation is not sufficient for classifying active ligands. Thus, in this study, we introduced simplified SC and LJ interaction scores to improve the classification ability of machine learning models. The gradient boosting algorithm implemented in LightGBM showed good prediction ability for CDK2, achieving a precision of 0.98 in classifying active ligands. In addition, we examined the explainable AI approach using the permutation importance and SHAP algorithms to evaluate the trained machine learning models. The explainable AI analysis with the simplified interaction scores identified important residues for ligand recognition by CDK2.

The amount of experimental data increases as drug research progresses. The accumulated data and the proposed simplified interaction scores can improve the search for active ligands. In addition, data analysis based on machine learning has become indispensable in various chemical fields. Explainable AI can be a useful tool for analyzing the complex behavior of (nonlinear) machine learning models and for obtaining insights into various research topics. Although this study employed simplified chemical interactions to describe ligands, these interactions were effective in improving the machine learning models. The results thus suggest that incorporating chemical concepts into feature vectors is useful for improving machine learning models even when the target systems are not fully understood. In this paper, we discussed CDK2–ligand complexes based on a collaborative approach between the simplified interaction scores and explainable AI, using ligand molecules registered in DUD-E. Although this data set was carefully created, it may contain some bias. Other data sets, such as CASF-2007,[8] CASF-2013,[9] and CASF-2016,[10] have also been reported. In future work, we will examine other data sets based on the collaborative approach between the simplified interaction scores and machine learning techniques.
Authors: Scott M Lundberg; Gabriel Erion; Hugh Chen; Alex DeGrave; Jordan M Prutkin; Bala Nair; Ronit Katz; Jonathan Himmelfarb; Nisha Bansal; Su-In Lee Journal: Nat Mach Intell Date: 2020-01-17
Authors: Sunghwan Kim; Paul A Thiessen; Evan E Bolton; Jie Chen; Gang Fu; Asta Gindulyte; Lianyi Han; Jane He; Siqian He; Benjamin A Shoemaker; Jiyao Wang; Bo Yu; Jian Zhang; Stephen H Bryant Journal: Nucleic Acids Res Date: 2015-09-22