Enzymes in the glutathione transferase (GST) superfamily catalyze the conjugation of glutathione (GSH) to electrophilic substrates. As a consequence they are involved in a number of key biological processes, including protection of cells against chemical damage, steroid and prostaglandin biosynthesis, tyrosine catabolism, and cell apoptosis. Although virtual screening has been used widely to discover substrates by docking potential noncovalent ligands into active site clefts of enzymes, docking has been rarely constrained by a covalent bond between the enzyme and ligand. In this study, we investigate the accuracy of docking poses and substrate discovery in the GST superfamily, by docking 6738 potential ligands from the KEGG and MetaCyc compound libraries into 14 representative GST enzymes with known structures and substrates using the PLOP program [ Jacobson Proteins 2004 , 55 , 351 ]. For X-ray structures as receptors, one of the top 3 ranked models is within 3 Å all-atom root mean square deviation (RMSD) of the native complex in 11 of the 14 cases; the enrichment LogAUC value is better than random in all cases, and better than 25 in 7 of 11 cases. For comparative models as receptors, near-native ligand-enzyme configurations are often sampled but difficult to rank highly. For models based on templates with the highest sequence identity, the enrichment LogAUC is better than 25 in 5 of 11 cases, not significantly different from the crystal structures. In conclusion, we show that covalent docking can be a useful tool for substrate discovery and point out specific challenges for future method improvement.
Enzymes in the glutathione transferase (GST) superfamily catalyze the conjugation of glutathione (GSH) to electrophilic substrates. As a consequence they are involved in a number of key biological processes, including protection of cells against chemical damage, steroid and prostaglandin biosynthesis, tyrosine catabolism, and cell apoptosis. Although virtual screening has been used widely to discover substrates by docking potential noncovalent ligands into active site clefts of enzymes, docking has been rarely constrained by a covalent bond between the enzyme and ligand. In this study, we investigate the accuracy of docking poses and substrate discovery in the GST superfamily, by docking 6738 potential ligands from the KEGG and MetaCyc compound libraries into 14 representative GST enzymes with known structures and substrates using the PLOP program [ Jacobson Proteins 2004 , 55 , 351 ]. For X-ray structures as receptors, one of the top 3 ranked models is within 3 Å all-atom root mean square deviation (RMSD) of the native complex in 11 of the 14 cases; the enrichment LogAUC value is better than random in all cases, and better than 25 in 7 of 11 cases. For comparative models as receptors, near-native ligand-enzyme configurations are often sampled but difficult to rank highly. For models based on templates with the highest sequence identity, the enrichment LogAUC is better than 25 in 5 of 11 cases, not significantly different from the crystal structures. In conclusion, we show that covalent docking can be a useful tool for substrate discovery and point out specific challenges for future method improvement.
The canonical glutathione transferases (also known as GSTs; EC
2.5.1.18) catalyze addition of an excellent nucleophile to an electrophilic
center. They play important roles in the metabolism and detoxification
of numerous endogenous and xenobiotic compounds, including oxidized
lipids, drugs, and pollutants.[2−5] The canonical GSTs are a subset of the thioredoxin
fold family of proteins.[6,7] They consist of an N-terminal
thioredoxin domain and a C-terminal α-helical domain. Although
a number of GSTs are known to have specific substrates, many if not
most have no clearly assigned biological substrates. In addition,
the general nature of their chemistry leads to enzymes that tend to
be catalytically promiscuous even with respect to the transition state
for the reaction.[3] As a consequence, the de novo prediction of enzyme function by computational methods
becomes challenging.Computational docking methods have been
widely used in ligand discovery
for many enzymes.[8−12] In particular, these methods can predict the docking pose of a known
ligand (docking) and/or predict ligands in a large library of small
molecules (virtual screening). Docking consists of searching through
plausible binding modes of a compound and scoring each mode to distinguish
a near-native binding pose from others (docking). Virtual screening
consists of performing docking for each candidate ligand, followed
by the ranking of the candidate ligand by their best docking scores.Although this structure-based approach has been used successfully
for the prediction of both substrates and other types of ligands (e.g.,
orthosteric inhibitors and allosteric modulators), docking for substrate
discovery is most difficult, particularly when the enzyme may accommodate
more than one type of reaction. In addition, only ground state or
intermediate state complexes, not transition states, are typically
accessible by standard docking procedures. The resulting complexes
may or may not be directly competent for turnover in the absence of
information on the “preorganization” of the enzyme–substrate
complex required for catalysis. Despite the difficulties involved
in the docking of substrates, structure-based methods have been used
successfully for substrate discoveries in several systems.[13−18]There are two principal challenges in the de novo prediction of the substrate preferences of enzymes in large, functionally
diverse superfamilies. The first challenge is the availability of
experimentally determined enzyme structures, which lags behind the
number of known protein sequences by a factor of approximately 400
as of June 2013. To remedy this situation, virtual screening has also
relied on comparative models, not only experimentally determined structures.[19−26] The relative utility of comparative models versus experimentally
determined structures has been assessed.[19,27] The second challenge is that standard docking procedures lack sufficient
constraints that efficiently define the productive geometries between
the reacting species (substrates) on the enzyme surface. This issue
is addressed here by applying a covalent bond constraint between the
sulfur of GSH and the electrophilic substrate. Importantly, the effectiveness
of covalent docking has not been rigorously addressed yet.The
absence of a large-scale benchmarking of covalent docking using
either X-ray structures or comparative models, and the need to predict
substrates for GST enzymes, inspired us to investigate the following
questions. Can covalent docking accurately predict docking poses of
known ligands, given experimentally determined, homologous, or modeled
structures in the GST superfamily? Can covalent docking accurately
predict ligands despite the catalytic promiscuity of many GST enzymes?
What is the difference in the utility of apo, holo, comparative modeling,
and homologous structures for virtual screening in the GST superfamily?
Can the virtual screening be improved by consensus scoring, relying
on independent screening against multiple holo, apo, comparative modeling,
and homologous structures in the GST superfamily? If multiple models
are calculated on the basis of different templates, can any of them
outperform apo and even holo X-ray structures of the target? If so,
can one reliably identify which model will do so, or even perform
optimally among a set of modeled structures; are there sequence and/or
structural attributes (i.e., the overall target–template sequence
identity, the binding site target–template sequence identity,
and the predicted accuracy of a model) that reliably predict the accuracy
of ligand docking?In this report, we attempt to answer these
questions with the aid
of a virtual library of compounds and 14 representative GST enzymes
of known structure and function. The virtual library consists of known
substrates for the selected GST proteins, and a large number of compounds
selected from KEGG[28] and MetaCyc.[29] For each target, the entire virtual library
of compounds was docked to the known X-ray structures, homologous
structures, and comparative models of the protein. The results are
analyzed by comparing the docking poses of native products to X-ray
structures and calculating the enrichment of known products with respect
to the entire compound library.We begin by describing the GST
catalyzed reactions, the docking
library, the selected GST proteins, the automated modeling pipeline,
the docking pipeline, and methods to evaluate the accuracy of predicted
docking poses for known ligands and the accuracy of virtual ligand
screening (Methods). We then describe and
compare the results of docking native ligands and virtual screening
using apo, holo, comparative modeling, and homologous structures (Results). Finally, we discuss the implications of
the current approach and answer the questions we asked previously,
given our modeling, docking, and benchmark (Discussion).
Methods
We begin by listing the set of the GST catalyzed
reactions used
in this study, followed by a description of the corresponding products
that comprise a virtual screening library. Next, we describe how we
selected GST targets for docking and how we built their comparative
models. Finally, we describe the covalent docking pipeline as well
as the assessment criteria for evaluating the accuracy of docking
and virtual screening.
Construction of the Virtual Screening Library
GSTs
catalyze a range of different reactions. Here, we considered 11 different
kinds of GST catalyzed reactions, each with a different substrate
motif (Figure 1). For each substrate motif,
we found all matching molecules in the KEGG and MetaCyc compound databases.
The search was performed with the OECHEM tools[30] by matching a SMILES[31] string
of a database molecule to the SMARTS[31] strings
representing the substrate motifs. We found 4,149 and 3,259 in KEGG
and MetaCyc, respectively. We also added 64 substrates from the literature,[32−53] for the total of 6,738 unique substrates.
Figure 1
Partial list of chemical
reactions catalyzed by GSTs. Each row
defines a type of reaction based on the reactive functional group.
X indicates a leaving group in the nucleophilic substitution reactions.
EWG indicates an electron-withdrawing group.
Partial list of chemical
reactions catalyzed by GSTs. Each row
defines a type of reaction based on the reactive functional group.
X indicates a leaving group in the nucleophilic substitution reactions.
EWG indicates an electron-withdrawing group.Next, each of the substrates was converted to one or more
products,
as follows. The conversions corresponding to the 11 reactions were
carried out using OECHEM’s library generation function[30] with explicit hydrogens. The reactive functional
groups in the substrate were identified by comparing the substrate
SMILES string with the SMARTS string representing the substrate motif
undergoing conversion to a product motif, for each reaction. Each
match was then used to convert the substrate to a product that was
added to the virtual screening library.Finally, we prepared
products for docking. For products with undefined
stereocenters, we first enumerated the stereoisomers using OECHEM,[30] with the maximum number of stereoisomers retained
arbitrarily set to 16 for computational efficiency. Second, protonation
states of each stereoisomer was enumerated within pH range 6–8,
followed by generating a conformation for each protonation state,
using Epik[54,55] and LigPrep.[56] Finally, force field parameters for each product were generated
using the hetgrp_ffgen utility (Schrödinger, LCC).
Selection of
the GST Targets for Docking
A subset of GST structures
was selected from the Protein Data Bank (PDB) for retrospective docking
by maximizing the sequence and functional coverage of the GST superfamily
(Figure 2), following three steps: First, all
GST structures with a cocrystallized GSH-substrate conjugate were
extracted from the PDB. Second, structures for the same protein were
grouped together, resulting in 14 groups with unique sets of ligands.
Finally, for each group, the structure with the highest resolution,
the lowest Rfree,, and a unique ligand
was selected, resulting in 14 target holo X-ray structures (Supporting Information Table S1).
Figure 2
Representative network
view of the GST superfamily showing where
docking targets fall in this superfamily. In this view, 2,190 50%
sequence identity filtered nodes (“ID50 node”) representing
13,493 sequences are shown as dots, and each sequence similarity relationship
with a BLAST E-value ≤ 1 × 10–13 is shown as a line or edge between representative nodes. The largest
clusters are labeled by their subgroup in the structure–function
linkage database (SFLD).[38] The 13,493 sequences
for cytosolic GSTs (“cytGSTs”) are based on Pfam (v26.0)[35] sequences that were at least 100 residues in
length and had scores above the gathering threshold for at least one
Pfam hidden Markov model (HMM) corresponding to the cytGST N-terminal
domain (“GST_N,” “GST_N2″, or “GST_N3”;
collectively referred to here as “GST_N*”). Also included in the data set were 58
proteins that lacked matches to a GST_N* HMM but were chosen as Enzyme
Function Initiative (EFI)[57] targets for
experimental characterization because of their sequence similarities
to cytGSTs. Sequence similarity was detected with an all-by-all BLAST
search using blastp (v2.2.24).[36] The data
were viewed with Cytoscape (v2.8.3). Sequence identities are calculated
using CD-HIT[37] (“ID50” set).
The edges are shown using the organic layout where nodes that are
more highly interconnected are clustered more tightly together. PDB
structures associated with cytGST sequences (95% or more sequence
identity to the PDB sequence) were identified using the SFLD[38] interface. An ID50 node is shown as a blue dot
if any member sequence of the node was associated with a PDB structure.
Representative nodes where only an EFI structure was associated with
its member sequences are shown as red dots. Retrospective docking
targets are marked using dark blue borders; only 10 targets are shown
because the rest are more than 50% identical to these 10 representatives
shown.
Representative network
view of the GST superfamily showing where
docking targets fall in this superfamily. In this view, 2,190 50%
sequence identity filtered nodes (“ID50 node”) representing
13,493 sequences are shown as dots, and each sequence similarity relationship
with a BLAST E-value ≤ 1 × 10–13 is shown as a line or edge between representative nodes. The largest
clusters are labeled by their subgroup in the structure–function
linkage database (SFLD).[38] The 13,493 sequences
for cytosolic GSTs (“cytGSTs”) are based on Pfam (v26.0)[35] sequences that were at least 100 residues in
length and had scores above the gathering threshold for at least one
Pfam hidden Markov model (HMM) corresponding to the cytGST N-terminal
domain (“GST_N,” “GST_N2″, or “GST_N3”;
collectively referred to here as “GST_N*”). Also included in the data set were 58
proteins that lacked matches to a GST_N* HMM but were chosen as Enzyme
Function Initiative (EFI)[57] targets for
experimental characterization because of their sequence similarities
to cytGSTs. Sequence similarity was detected with an all-by-all BLAST
search using blastp (v2.2.24).[36] The data
were viewed with Cytoscape (v2.8.3). Sequence identities are calculated
using CD-HIT[37] (“ID50” set).
The edges are shown using the organic layout where nodes that are
more highly interconnected are clustered more tightly together. PDB
structures associated with cytGST sequences (95% or more sequence
identity to the PDB sequence) were identified using the SFLD[38] interface. An ID50 node is shown as a blue dot
if any member sequence of the node was associated with a PDB structure.
Representative nodes where only an EFI structure was associated with
its member sequences are shown as red dots. Retrospective docking
targets are marked using dark blue borders; only 10 targets are shown
because the rest are more than 50% identical to these 10 representatives
shown.Because GSTs function as dimers,
we used for docking the biological
dimer unit from the PDB. The substrate part of the GSH-substrate conjugate
in holo structures was removed, resulting in GST holo X-ray structures
with only GSH bound. Each dimer was then prepared for docking using
the Protein Preparation Wizard (Schrödinger, LCC) by designating
each of the two GSHs as the reactive cofactor, resulting in two receptor
binding sites (with the exception of 1GWC, which has only one site
occupied by GSH); in most cases, the two binding sites are slightly
different because symmetry is generally not imposed during crystallographic
structure refinement.
Comparative Modeling of the GST Targets
Comparative
models for each target were built by selecting template dimer structures
from the PDB with either an unmodified GSH (apo) or a conjugate form
(product) of GSH (holo). For each target, we aimed to include one
holo and one apo template structure for each of the four sequence
identity ranges: 25–40%, 40–55%, 55–70%, and
70–85%. The target–template sequence alignments were
generated using HHalign,[58] which in turn
relied on target and template profile hidden Markov models (HMMs)
generated using HHblits[58] with the UniProt20
database.[59] Five-hundred models were built
for each target–template alignment, using the automodel protocol
in MODELLER-9v10.[60] The coordinates for
GSH atoms in the template structure were then copied to the target
models, using the BLK function of MODELLER. Models were then assessed
by the discrete optimized protein energy (DOPE)[61] function of MODELLER. Finally, the model with the lowest
normalized DOPE score was used for docking.To compare different
comparative models and compare the models against X-ray structures,
we calculated the binding site sequence identity of the templates
to the X-ray structures, and the binding site non-hydrogen atom root-mean-square
deviation (RMSD) between a comparative model and X-ray structure.
The binding site was defined to contain atoms in all residues with
at least one atom within 6 Å of any ligand atom in the X-ray
structure. As expected, strong correlations were observed between
the overall target–template sequence identity and the target–template
binding site sequence identity, and between the target–template
binding site sequence identity and the model’s non-hydrogen
atom RMSD error (Figure 4A,B).
Figure 4
Comparison of comparative
model properties and the RMSD errors
of the docked poses. Each point represents the property of a single
comparative model and/or the RMSD error of the docked pose of its
corresponding crystallographic product against this model. Results
for models built based on apo and holo template structures are shown
in red and blue, respectively, with the least-squares linear fits
shown as dashed lines. The Pearson correlation coefficients R are also shown. The gray dashed line indicates the lower
bound of the RMSD of the top 3 ranked poses.
As designed by the choice of the template
structures, the distribution
of the non-hydrogen atom RMSD error of the binding site confirms that
there is a range of accuracy across the set of modeled structures.
The models with subangstrom non-hydrogen atom RMSD error tend to be
those based on templates with over 80% sequence identity in the binding
site residues to the target sequence. Over a third of models had a
binding site RMSD error over 4 Å. Thus, homology models cover
a range of sequence identities and a range of accuracy.
Protocol for
Covalent Docking
Covalent docking of a
potential product to a receptor was started by placing it in the binding
site. More specifically, the coordinates of the GSH part of the potential
product were matched to the GSH molecule in the receptor, and the
coordinates of the remaining atoms of the product were then built
using OMEGA.[62] Up to 20 initial configurations
were generated for each product molecule.For each initial configuration,
we then used PLOP’s tether pred (Academic version 25.6) function[1] to rotate the rotatable bonds in the GSH-substrate
conjugate. The rotatable bonds were identified using OECHEM. We sampled
all of the rotatable bonds in the substrate part of the GSH-substrate
conjugate as well as the CA–CB and CB–SG bonds in the
cysteine residue of GSH. Up to 50 configurations were generated by
PLOP starting from each initial configuration. These configurations
were then scored by calculating the total potential energy of the
product–receptor complex by PLOP (in the units of kilocalories
per mole), which in turn relies on the OPLS force field with a variable-dielectric
generalized Born model.[63] The product–receptor
distances alone were also scored by the atomic statistical potential
PoseScore (in arbitrary units).[64] While
the PLOP potential energy does not contain the entropic contributions
to the binding free energy, the PoseScore term does approximate the
contribution of the interface to the binding free energy.[65]Finally, for virtual screening, a substrate
was ranked using the
median PLOP energy of its products’ different stereoisomers
and protonation states. The PLOP energy of a product in a specific
stereoisomer form and protonation state was the median PLOP energy
of its different configurations.
Assessment of Docking and
Virtual Screening
When a
native product was docked (Supporting Information Table S2), the docking pose of the product was assessed for accuracy
based on its non-hydrogen atom RMSD from the native configuration,
after superposition of the receptor used for docking on the native
structure of the receptor; GSH was excluded, except for the cysteinesulfur atom.The accuracy of virtual screening was evaluated
by the enrichment for the known products (Supporting
Information Table S3) among the top scoring potential products.
The enrichment curve was obtained by plotting the percentage of actual
products found (y-axis) within the top ranked subset
of all database compounds (x-axis on logarithmic
scale). logAUC, the area under the curve of the enrichment plot, was
also calculated to indicate the accuracy of enrichment; random selection
has a logAUC of 14.5.
Results
We begin by evaluating the
docking pose of the native ligands using
holo and apo structures, followed by evaluating the docking pose of
the native ligands using comparative models and homologous structures.
Next, enrichment of the known ligands using X-ray structures is benchmarked
and analyzed. Finally, we analyze the enrichment of the known ligands
using comparative models.
Docking of Products into X-ray Structures
As the easiest
test, we first applied covalent docking by PLOP to reconstruct binding
poses of the crystallographic products in the corresponding holo X-ray
structures (Supporting Information Table
S1). In all 14 cases, a docking pose within 3 Å of the native
structure was sampled (within 2 Å for 13 cases), though not necessarily
recognized as such by the scoring function (Supporting
Information Table S4); the 3 Å threshold on the all-atom
RMSD subjectively distinguishes near-native and nonnative poses. However,
the top 3 ranked structure was within 3 Å to the native structure
in 9 out of 14 cases using the PLOP energy function, and in 11 out
of 14 cases using PoseScore (Figure 3).
Figure 3
Comparison of the predicted complex between
the native product
and its holo X-ray structure with the native pose: (A) 1F3B with GBX; (B) 2F3M with GTD; (C) 2GST with GPS; (D) 1M9B with IBG. The surface
of the receptor is shown, with red and blue corresponding to oxygen
and nitrogen atoms. The product is shown in the native configuration
(gray), the most accurate sampled configuration (yellow), and the
top ranked configuration by the PLOP energy function (light blue)
and PoseScore (green).
Next, we performed a slightly more difficult, but still easy, test.
For targets with more than one crystallographic product, we docked
each product to the nonnative holo X-ray structure of the target (cross-docking; Supporting Information Table S5). In all 12 cases,
a docking pose within 3 Å to the native structure was sampled
(within 2 Å for 11 cases). The top 3 ranked complex was within
3 Å to the native structure in 7 out of 12 cases for the PLOP
energy function, and in 8 out of 12 cases for PoseScore. To test whether
combining the two scoring functions can further improve the result,
we considered an optimal linear combination of the PoseScore and the
PLOP potential energy, using the average all-atom RMSD of the top
ranked docking poses as the optimization criterion; the optimal linear
combination has zero weight for the PLOP potential energy.Finally,
to test whether or not apo structures can be used for
predicting the native product docking pose, we docked the two products
of GSTM1_HUMAN (GDN and GTD) to its two available apo structures (1XW6 and 1YJ6). The results were
comparable to those from cross-docking (Supporting
Information Table S6).Comparison of the predicted complex between
the native product
and its holo X-ray structure with the native pose: (A) 1F3B with GBX; (B) 2F3M with GTD; (C) 2GST with GPS; (D) 1M9B with IBG. The surface
of the receptor is shown, with red and blue corresponding to oxygen
and nitrogen atoms. The product is shown in the native configuration
(gray), the most accurate sampled configuration (yellow), and the
top ranked configuration by the PLOP energy function (light blue)
and PoseScore (green).
Docking of Native Products into Comparative Models and Homologous
Crystallographic Structures
To test whether or not comparative
models can be used for docking pose prediction of native products,
a total of 62 homology models were generated for the 14 GST targets
(Supporting Information Table S7). We performed
native product docking against these models as well as some of their
template structures.To find out to what degree the success
of docking pose prediction was limited by sampling versus scoring,
we plotted the RMSD error of the top ranked docking pose versus the
RMSD error of the best sampled docking pose (Figure 4C). As expected there
was a significant correlation between these two RMSDs; the RMSD error
of the best sampled pose is a lower bound of the RMSD error of the
top ranked pose. Even when accurate poses were sampled, the scoring
function often did not recognize them (Figure 4C). In 56 out of 62 cases, a docking pose within 3 Å was sampled,
but only in 22 out of 62 cases, it was ranked in the top 3. Therefore,
the limiting factor for docking pose prediction is the accuracy of
the scoring function used. In this application, the inaccuracy of
the scoring function may be exacerbated by errors in the model of
the binding site (Figure 6). Despite the difficulties
in ranking a near-native pose in the context of comparative model,
for 6 of the 14 targets (1GWC, 2AB6, 3LJR, 3O76, 2C4J, and 1BYE; Figure 5),
the RMSD errors of the top ranked poses by PoseScore were comparable
to those against holo X-ray structures.
Figure 6
Docking of native products against comparative models. (A) Successful
docking of GBX onto a model of 3CSH, based on 85% sequence identity
to the template 3O76 (chain A). The ribbon representation of the crystal structure is
shown in light gray and that for the model in orange. The C-α
RMSD error of the model is 0.64 Å, and the non-hydrogen atom
RMSD error of the binding site is 0.71 Å. The ligand is shown
in the stick representation. The native pose is shown in light gray,
the best-sampled pose (RMSD of 0.90 Å) is shown in yellow, the
pose top ranked by PLOP (0.92 Å) is shown in cyan, and the pose
top ranked by POSESCORE (0.92 Å) is shown in green (not visible
because it is hidden behind the cyan structure). (B) Failed docking
of GPS onto a model of 1F3B, based on 76% sequence identity to the template 1YDK (chain B). The ribbon
representation of the crystal structure is shown in light gray and
that for the model in orange. The C-α RMSD error of the model
is 2.42 Å, and the non-hydrogen atom RMSD error of the binding
site is 2.47 Å. The ligand is shown in the stick representation.
The native pose is shown in light gray, the best-sampled pose (RMSD
of 5.38 Å) is shown in yellow, the pose top ranked by PLOP (6.58
Å) is shown in cyan, and the pose top ranked by POSESCORE (0.92
Å) is shown in green.
Figure 5
Non-hydrogen atom RMSD error of the docked poses, for the crystallographic
product against its corresponding comparative models with different
sequence identities. For each target, the RMSD errors of the best-sampled
poses using apo templates (stripped red bar), the RMSD errors of the
top-ranked poses by PoseScore using apo templates (stripped blue bar),
the RMSD errors of the best-sampled poses using holo templates (red
bar), and the RMSD errors of the top ranked pose by PoseScore using
apo templates (blue bar) are shown. For comparison, the RMSD errors
of the docked poses using holo X-ray structures are shown as horizontal
lines (blue solid line for the top ranked pose by PoseScore, red dashed
line for the best-sampled pose).
We then asked the question
whether or not there were model properties
that predict the best model for docking pose prediction, among multiple
models based on different template structures. We found no such predictors.
There was essentially no correlation between a model’s binding
site RMSD error and the RMSD error of the top ranked docking pose
(Figure 4D). There were only weak correlations
between the target–template sequence identity and the RMSD
error of the best-sampled docking pose (Figure 4F; R = −0.19 for apo tempalates, and R = −0.20 for holo templates). We also calculated
the correlation between the RMSD errors of the model binding site
and the best-sampled docking pose. There was no significant correlation
between these two RMSD errors. We also investigated whether or not
holo and apo structures performed differently for docking pose prediction.
On average, models based on holo template structures were not noticeably
better for docking pose prediction than models from apo template structures.Finally, we tested the accuracy of docking pose prediction using
a homologous structure itself, instead of a comparative target model
based on it. The test was performed with the two known products of
GSTM1_HUMAN (GDN and GTD) and its homologous structure (2C4J). The docking pose
of GTD had the RMSD error of 2.2 Å, and the docking pose of GDN
had the RMSD error of 4.7 Å, on average not significantly different
from that using the corresponding comparative model (3.0 Å for
GTD and 3.5 Å for GDN, binding site RMSD error).Comparison of comparative
model properties and the RMSD errors
of the docked poses. Each point represents the property of a single
comparative model and/or the RMSD error of the docked pose of its
corresponding crystallographic product against this model. Results
for models built based on apo and holo template structures are shown
in red and blue, respectively, with the least-squares linear fits
shown as dashed lines. The Pearson correlation coefficients R are also shown. The gray dashed line indicates the lower
bound of the RMSD of the top 3 ranked poses.Non-hydrogen atom RMSD error of the docked poses, for the crystallographic
product against its corresponding comparative models with different
sequence identities. For each target, the RMSD errors of the best-sampled
poses using apo templates (stripped red bar), the RMSD errors of the
top-ranked poses by PoseScore using apo templates (stripped blue bar),
the RMSD errors of the best-sampled poses using holo templates (red
bar), and the RMSD errors of the top ranked pose by PoseScore using
apo templates (blue bar) are shown. For comparison, the RMSD errors
of the docked poses using holo X-ray structures are shown as horizontal
lines (blue solid line for the top ranked pose by PoseScore, red dashed
line for the best-sampled pose).Docking of native products against comparative models. (A) Successful
docking of GBX onto a model of 3CSH, based on 85% sequence identity
to the template 3O76 (chain A). The ribbon representation of the crystal structure is
shown in light gray and that for the model in orange. The C-α
RMSD error of the model is 0.64 Å, and the non-hydrogen atom
RMSD error of the binding site is 0.71 Å. The ligand is shown
in the stick representation. The native pose is shown in light gray,
the best-sampled pose (RMSD of 0.90 Å) is shown in yellow, the
pose top ranked by PLOP (0.92 Å) is shown in cyan, and the pose
top ranked by POSESCORE (0.92 Å) is shown in green (not visible
because it is hidden behind the cyan structure). (B) Failed docking
of GPS onto a model of 1F3B, based on 76% sequence identity to the template 1YDK (chain B). The ribbon
representation of the crystal structure is shown in light gray and
that for the model in orange. The C-α RMSD error of the model
is 2.42 Å, and the non-hydrogen atom RMSD error of the binding
site is 2.47 Å. The ligand is shown in the stick representation.
The native pose is shown in light gray, the best-sampled pose (RMSD
of 5.38 Å) is shown in yellow, the pose top ranked by PLOP (6.58
Å) is shown in cyan, and the pose top ranked by POSESCORE (0.92
Å) is shown in green.
Enrichment for Known Products by Virtual Screening against X-ray
Structures
We now address the question whether or not holo
and apo X-ray structures can be used for virtual product screening
by covalent docking.We first applied PLOP to virtual substrate
discovery by docking the whole product library against holo structures
of 11 GSTs with more than 3 known substrates; this is the minimal
number of known ligands needed for computing an enrichment curve.
We docked the products onto each chain of the GST dimers independently
and analyzed the docking results separately for each chain as well
as collectively for all chains, by relying on our consensus scoring
scheme;[27] consensus scoring combines docking
scores from independent virtual screen against multiple chains/structures/models
of the same target. In all 11 cases, the enrichment result was better
than that from random selection (Figure 7).
In 7 of the 11 cases, logAUC was better than 25. In contrast to the
noncovalent docking benchmark,[27] consensus
scoring did not improve logAUC for any target.
Figure 7
Enrichment
curves for the 11 GST targets with 3 or more known products,
using holo X-ray structures. For each target, the enrichment curves
for the individual chains (blue, green, yellow, and cyan lines), the
consensus scoring for multiple chains (red line), and random selection
(black dashed line) are shown.
Next, we tested
the enrichment of known ligands using two apo structures
of GSTM1_HUMAN (1YJ6 and 1XW6; Supporting Information Figure S2). The averaging
logAUC (32.6) is slightly worse than that using the corresponding
holo structure (33.3).Finally, we tested whether or not consensus
scoring using multiple
structures could improve enrichment for known products by combining
virtual screening results from either two apo or two holo structures
of GSTM1_HUMAN. Using the two apo structures (1YJ6 and 1XW6), the logAUC for
consensus scoring was 34.5. Similarly, using holo structures (2F3M and 1XWK) resulted in a logAUC
value of 33.6. In both cases, logAUC for consensus scoring was between
the best and worst logAUC values of individual virtual screening runs.Enrichment
curves for the 11 GST targets with 3 or more known products,
using holo X-ray structures. For each target, the enrichment curves
for the individual chains (blue, green, yellow, and cyan lines), the
consensus scoring for multiple chains (red line), and random selection
(black dashed line) are shown.
Enrichment for Known Products by Virtual Screening against Comparative
Models and Homologous Crystallographic Structures
To evaluate
the utility of homology models for discovering substrates of GSTs,
the enrichment of known products was determined and compared to that
for the crystal structures.We first tested whether or not there
was an apparent correlation between the enrichment and the target–template
sequence identity by virtual screening against the 6 comparative models
of the target 2F3M. No correlation was observed between logAUC and the target–template
sequence identity (Figure 8). The absence of
such a correlation was also observed in an extensive noncovalent docking
benchmark.[27]
Figure 8
Enrichment curves for target 2F3M using its 6 comparative models based
on different apo and holo templates. For each model, the enrichment
curves for the individual chains (blue, green, yellow, and cyan lines),
the consensus scoring for multiple chains (red line), and random selection
(black dashed line) are shown. Target–template sequence identities
are as follows: The 2F3M-3T2U and 2F3M-2FHE sequence identities
are 33% and 50%, respectively (apo templates). Target–template
sequence identities are as follows: 33% (2F3M-3T2U, apo), 50% (2F3M-2FHE), 33% (2F3M-2GSR, holo),
42% (2F3M-1M99, holo), 66% (2F3M-1GSU, holo), and 85%
(2F3M-2C4J, holo). Binding
site RMSD errors are as follows: 5.9 Å (2F3M-3T2U, apo), 3.1 Å
(2F3M-2FHE), 7.4 Å (2F3M-2GSR, holo), 3.1 Å
(2F3M-1M99, holo), 1.4 Å
(2F3M-1GSU, holo), and 1.1
Å (2F3M-2C4J, holo).
Given the absence of
a correlation between the enrichment and the
target–template sequence identity, we next benchmarked the
utility of the best comparative models for virtual screening, for
the 11 targets with more than 3 known substrates; the best comparative
model is that based on the highest target–template sequence
identity. We docked the products onto each chain of the GST dimers
independently, and analyzed the docking results separately for each
chain as well as collectively for all chains, by relying on our consensus
scoring scheme.[27] In all cases, the enrichment
result was better than that from random selection (Figure 9). In 5 of the 11 cases, logAUC was better than
25. Again, the consensus scoring did not improve logAUC for any target.
Figure 9
Enrichment curves for the 11 GST targets with
3 or more known products,
using their comparative models based on templates with the highest
sequence identities. For each target, the enrichment curves for the
individual chains (blue, green, yellow, and cyan lines), the consensus
scoring for multiple chains (red line), and the random selection (black
dashed line) are shown. Target–template sequence identities
are as follows: 85% (2F3M-2C4J), 85%
(3CSH-2C4J), 77% (2C4J-6GSV), 54% (3LJR-2C3Q), 33% (2GSQ-2GSR), 28% (3GX0-1PN9), 76% (2GST-2C4J), 44% (1M9B-2C4J), 64% (1VF3-3KTL), 76% (2AB6-6GSV), and 46% (1GWC-2VO4).
We further tested whether or not consensus docking using multiple
comparative models can improve the enrichment by combining the docking
results from the 6 comparative models for target 2F3M (Figure 8). Consensus scoring resulted in a logAUC of 40.4,
which was slightly better than the best logAUC value for individual
models (40.3) and was better than logAUC for the holo X-ray structure.Finally, we tested whether or not homologous X-ray structures could
be used in place of comparative models for virtual screening, using
a homologue of GSTM1_HUMAN (2C4J). The corresponding logAUC is 30.9
for chain A, 31.6 for chain B, and 31.5 for consensus scoring using
chains A and B, which is significantly worse than that for the comparative
model based on 2C4J (34.9 for chain A, 35.8 for chain B, and 38.7 for consensus scoring).
A similar finding was obtained in the noncovalent docking benchmark.[27]Enrichment curves for target 2F3M using its 6 comparative models based
on different apo and holo templates. For each model, the enrichment
curves for the individual chains (blue, green, yellow, and cyan lines),
the consensus scoring for multiple chains (red line), and random selection
(black dashed line) are shown. Target–template sequence identities
are as follows: The 2F3M-3T2U and 2F3M-2FHE sequence identities
are 33% and 50%, respectively (apo templates). Target–template
sequence identities are as follows: 33% (2F3M-3T2U, apo), 50% (2F3M-2FHE), 33% (2F3M-2GSR, holo),
42% (2F3M-1M99, holo), 66% (2F3M-1GSU, holo), and 85%
(2F3M-2C4J, holo). Binding
site RMSD errors are as follows: 5.9 Å (2F3M-3T2U, apo), 3.1 Å
(2F3M-2FHE), 7.4 Å (2F3M-2GSR, holo), 3.1 Å
(2F3M-1M99, holo), 1.4 Å
(2F3M-1GSU, holo), and 1.1
Å (2F3M-2C4J, holo).Enrichment curves for the 11 GST targets with
3 or more known products,
using their comparative models based on templates with the highest
sequence identities. For each target, the enrichment curves for the
individual chains (blue, green, yellow, and cyan lines), the consensus
scoring for multiple chains (red line), and the random selection (black
dashed line) are shown. Target–template sequence identities
are as follows: 85% (2F3M-2C4J), 85%
(3CSH-2C4J), 77% (2C4J-6GSV), 54% (3LJR-2C3Q), 33% (2GSQ-2GSR), 28% (3GX0-1PN9), 76% (2GST-2C4J), 44% (1M9B-2C4J), 64% (1VF3-3KTL), 76% (2AB6-6GSV), and 46% (1GWC-2VO4).
Discussion
We developed a comparative
modeling and covalent docking pipeline
for predicting docking poses of known products and for the virtual
discovery of substrates, as well as benchmarked it on the GST enzymes.
Either a holo or apo structure can be used for predicting docking
poses relatively accurately in a majority of cases. While using homology
models for predicting docking poses is more difficult, a homology
model can also be useful. Specifically, for native holo X-ray structures
as receptors, one of the top 3 ranked models is within 3 Å all-atom
RMSD of the native complex in 11 of the 14 test cases (in 8 of 12
cases for cross-docking in which a product from another holo structure
of the same enzyme is docked; Supporting Information Tables S4 and S5). For comparative models based on more than 30%
sequence identity, one of the top 3 ranked models is within 3 Å
all-atom RMSD of the native complex in 22 of the 62 test cases (Figure 5). For virtual screening against holo X-ray structures,
the enrichment logAUC value is better than random (14.5) in all cases
and better than 25 in 7 of 11 cases (Figure 7). For models based on templates with the highest sequence identity
(approximately 30–85%), the logAUC is better than 25 in 5 of
11 cases (Figure 9), not significantly different
from the crystal structures. In conclusion, we show that covalent
docking can be a useful tool for substrate discovery.Next,
we discuss three points in turn. First, we assess to what
degree the accuracy of the docking poses and ranking of alternative
products is limited by the degree of structure sampling and the accuracy
of scoring. Second, we discuss the utility of comparative models for
docking and virtual screening. Third, we conclude by commenting on
docking transition state conformations of ligands.
Sampling versus Scoring
To predict an accurate docking
pose, two conditions need to be satisfied: (i) a near-native structure
needs to be generated by the sampling procedure and (ii) an accurate
sample structure needs to be scored better than the inaccurate structures
by the scoring function. Using X-ray structures as receptors, a near-native
docking pose (<3 Å) was sampled in all cases (Supporting Information Tables S4 and S5); thus,
the degree of sampling does not limit the prediction accuracy. But,
in 21% of the native docking cases and in 33% of the cross-docking
cases, a near-native model was not ranked in top 3 by PoseScore. Using
homology models as receptors, a near-native docking pose was sampled
in 56 of 62 cases but was only ranked in top 3 in 22 of 62 cases (Figure 5). The reason is that our scoring function is not
always accurate enough to differentiate between accurate and inaccurate
docking poses.To rank different substrates, we used the median
PLOP energy of their product–enzyme complexes. We chose the
median instead of the lowest PLOP energy because it resulted in a
significantly higher average logAUC (27.3 vs 18.9). A likely reason
is that the energy estimate of a single configuration is noisier than
the median, compensating for the lack of accounting for the proper
physics of the problem. In the future, other functions such as the
Boltzmann average of the sampled energies will be explored.Although PoseScore reranking of the docking poses generated by
optimizing the PLOP energy further improved the accuracy of the top
scoring docked poses, this result does not imply that PoseScore is
more accurate than the PLOP energy. It is possible that reranking
by PLOP would also improve the accuracy of the poses generated by
optimizing PoseScore; for technical reasons, it is not straightforward
to actually make this test. Our result indicates merely that the two
scoring functions are different and imperfect and that PoseScore contains
helpful information not present in PLOP, while being silent on whether
or not PLOP also contains information not present in PoseScore.In contrast to noncovalent docking,[27] on
average, consensus scoring did not improve virtual screening
accuracy. A combination of the following two assumptions explains
this finding: (i) covalent docking is limited more by scoring than
by sampling compared to noncovalent docking; and (ii) consensus scoring
works best when sampling is a limiting factor in docking. A major
difference between noncovalent and covalent docking is that the latter
is constrained by a covalent bond, resulting in an easier sampling
problem all other things considered equal. Using multiple receptor
structures to mimic receptor flexibility was already shown to lead
to more accurate sampled docking poses in noncovalent docking.[27] Thus, the disadvantage of a larger number of
decoys produced by using multiple templates, which increases the burden
on the scoring function for identifying the most accurate pose, is
outweighed by the more accurately sampled poses that are more likely
to be ranked highly by the scoring function. In contrast, if scoring
is limiting and sampling is not, as is the case in covalent docking
(cf. our native docking and cross-docking tests), considering poses
from multiple receptor structures is not likely to improve the results,
as observed.”Given the demonstrated limitations of current
scoring, the accuracy
of ranking ligand poses and ligands can be improved by more accurate
scoring functions. On the one hand, the ability of the PLOP potential
energy to rank the native state (or at least a near-native state)
may be improved by estimating the free energy of the native state.[66] On the other hand, the accuracy of the statistical
potential could be improved by applying the Bayesian framework for
inferring statistically optimized atomic potentials (SOAP).[67] This framework (1) uses multiple data-driven
“recovery” functions based on probability theory, without
recourse to questionable statistical mechanical assumptions about
statistical potentials; (2) restrains the relative orientation between
two covalent bonds instead of a simple distance between two atoms,
in an effort to capture orientation-dependent interactions such as
hydrogen bonds; (3) performs Bayesian smoothing for estimating the
underlying smooth distributions from noisy observations; (4) applies
Bayesian inference for calculating parameter values that maximize
the posteriori probability; and (5) benefits from Bayesian model selection
based on Bayesian predictive densities, in an effort to improve the
generalizability of the derived statistical potentials.
Comparative
Models for Docking and Virtual Screening
Using comparative
models for predicting docking poses often produces
inaccurate models (Figures 4 and 5). One possible reason is that covalent docking is sensitive
to minor changes in the binding site, especially for residues close
to the GSHsulfur atom due to the substrate–receptor covalent
bond constraint (Figure 6). Another possible
reason is that the active sites of GST enzymes are often defined in
part by the C-terminus, which is often flexible or missing in a template
structure (Figure 6). Moreover, there was no
significant correlation between the binding site non-hydrogen atom
RMSD and the docking pose RMSD (Figure 4).
The lack of this correlation may also result from the disproportionate
impact of residues close to the GSHsulfur atom on the docking pose.Virtual screening using comparative models generated results comparable
to the X-ray structures (average logAUC of 28.2 versus 27.3, better
than 25 in 5 versus 7 out of 11 cases), consistent with what has been
observed when using comparative models for virtual screening by noncovalent
docking (average logAUC of 28.7 versus 30.6),[27] even though comparative models performed much worse than X-ray structures
for docking pose prediction (average non-hydrogen atom RMSD of 3.98
Å versus 1.83 Å). This finding may be rationalized by comparatively
worse scoring of both true and decoy products (average PoseScore of
−4.2 and 4.1 versus −60.0 and −52.2) when docking
against inaccurate binding sites in comparative models.In prospective
applications, if some substrates for the target
of interest are known, virtual screening using comparative models
can be improved by selecting the most enriching model for known substrates.[68−70] If no substrates are known for the target of interest, it might
be advantageous to build and dock into multiple structurally diverse
models and analyze the virtual screening results by hand.[15] An experienced user may be able to identify
an accurate docking result based on the totality of their knowledge
about the proteins of interest.
Reactant, Product, Intermediate,
and Transition States
Generally, a substrate of an enzyme
is required to have a sufficiently
high kcat/Km; a typical threshold is 104 (mol/L)−1 s–1. An enzymatic reaction involves interconversions
between pairs of stable states, separated by a transition state; the
stable states include the enzyme and substrate without interaction
(E + S), an enzyme–substrate complex intermediate (ES), an
enzyme–product intermediate (EP), and the enzyme and product
without interaction (E + P).[71] In general, kcat/Km depends on
the free energies of all states. Thus, virtual ligand screening should
in principle consider all states, which is generally not the case.When conversion from ES to EP is the rate-limiting step, kcat is mostly determined by the transition state
energy barrier for this step. Enzymes lower this barrier by creating
a binding site that is complementary to the transition state of the
substrate. Thus, docking a substrate (or a product) in its transition
state conformation may be more appropriate than docking a substrate
(or a product) in its ground state. Although quantum mechanics calculations
can model the transition state and estimate the activation energy
barrier, running these calculations for the entire virtual screening
library is not computationally feasible.[72] Instead, we docked here the reaction product states and achieved
relative success. However, the enrichment results could possibly be
greatly improved if we can dock the transition state, or a transition
state like structure, without recourse to quantum mechanics. For GST
substrate discovery, compounds similar to the high-energy intermediates
used for docking to amidohydrolases[73] could
be constructed, provided some technical challenges are solved (e.g.,
building C atoms with 2 partial bonds for a total of 5 bonds). Although
enzymes presumably maximize complementarity to the transition state,
other states during the reaction must also be compatible with the
potentially flexible binding site of the enzyme. Thus, substrate discovery
might also be improved by consideration of docking of all ligand states
to the corresponding state(s) of the enzyme.
Conclusion
Can covalent docking accurately predict docking poses of
known ligands, given experimentally determined, homologous or modeled
structures in the GST superfamily? Yes, either a holo or apo
structure can be used for predicting docking poses accurately in a
majority of cases, while using homology models is less successful.
Specifically, for native holo X-ray structures as receptors, one of
the top 3 ranked models is within 3 Å all-atom RMSD of the native
complex in 11 of the 14 cases (in 8 of 12 cases for cross-docking).
For comparative models as receptors, one of the top 3 ranked models
is within 3 Å all-atom RMSD of the native complex in 22 of the
62 test cases.Can covalent docking accurately predict
ligands despite the
catalytically promiscuity of many GST enzymes? Often.What is the difference in the utility of apo, holo, comparative
modeling, and homologous structures for virtual screening in the GST
superfamily? Holo and apo structures, homologous structures,
and comparative models can all enrich for known products. For virtual
screening against holo X-ray structures, the enrichment logAUC value
is better than random in all cases, and better than 25 in 7 of 11
cases (average logAUC 27.3; Figure 7). Using
apo X-ray structures of 2F3M, the average logAUC (32.6) is slightly worse than
that using the corresponding holo structure (33.3). For models based
on templates with the highest sequence identity (approximately 30–85%),
the logAUC is better than 25 in 5 of 11 cases (average logAUC 28.2;
Figure 9), which is comparable to that for
X-ray structure. Using the homologous structure of 2F3M, the logAUC (31.3)
is significantly worse than that using comparative models (36.5).Can the virtual screening be improved by consensus scoring,
relying on independent screening against multiple holo, apo, comparative
modeling, and homologous structures in the GST superfamily? Rarely. Consensus scoring using multiple chains from the same structure/model
or multiple structures/models of the same target often does not improve
the virtual screening accuracy. Although consensus scoring improves
logAUC in comparison to using a single receptor in some cases, it
makes it worse in others.If multiple models are calculated
on the basis of different
templates, can any of them outperform apo and even holo X-ray structures
of the target? If so, can one reliably identify which model will do
so, or even perform optimally among a set of modeled structures; are
there sequence and/or structural attributes (i.e., the overall target–template
sequence identity, the binding site target–template sequence
identity, and the predicted accuracy of a model) that reliably predict
the accuracy of ligand docking? For docking pose prediction,
there are a few cases where comparative models outperformed the corresponding
holo X-ray structure (Figure 5). However, we
did not find a predictor that could reliably predict the accuracy
of the docked pose. For virtual screening, some models also outperform
the corresponding holo X-ray structure (Figures 8 and 9), but we did not find a correlation
between logAUC and the binding site sequence identity or the binding
site RMSD errors.
Authors: Matthew P Jacobson; David L Pincus; Chaya S Rapp; Tyler J F Day; Barry Honig; David E Shaw; Richard A Friesner Journal: Proteins Date: 2004-05-01
Authors: Hao Fan; Daniel S Hitchcock; Ronald D Seidel; Brandan Hillerich; Henry Lin; Steven C Almo; Andrej Sali; Brian K Shoichet; Frank M Raushel Journal: J Am Chem Soc Date: 2013-01-02 Impact factor: 15.419
Authors: B Mannervik; P Alin; C Guthenberg; H Jensson; M K Tahir; M Warholm; H Jörnvall Journal: Proc Natl Acad Sci U S A Date: 1985-11 Impact factor: 11.205
Authors: Luca Federici; Carlo Lo Sterzo; Silvia Pezzola; Adele Di Matteo; Flavio Scaloni; Giorgio Federici; Anna Maria Caccuri Journal: Cancer Res Date: 2009-10-06 Impact factor: 12.701
Authors: X Ji; W W Johnson; M A Sesay; L Dickert; S M Prasad; H L Ammon; R N Armstrong; G L Gilliland Journal: Biochemistry Date: 1994-02-08 Impact factor: 3.162
Authors: Ron Caspi; Tomer Altman; Kate Dreher; Carol A Fulcher; Pallavi Subhraveti; Ingrid M Keseler; Anamika Kothari; Markus Krummenacker; Mario Latendresse; Lukas A Mueller; Quang Ong; Suzanne Paley; Anuradha Pujar; Alexander G Shearer; Michael Travers; Deepika Weerasinghe; Peifen Zhang; Peter D Karp Journal: Nucleic Acids Res Date: 2011-11-18 Impact factor: 16.971
Authors: John A Gerlt; Jason T Bouvier; Daniel B Davidson; Heidi J Imker; Boris Sadkhin; David R Slater; Katie L Whalen Journal: Biochim Biophys Acta Date: 2015-04-18
Authors: Nir London; Jeremiah D Farelli; Shoshana D Brown; Chunliang Liu; Hua Huang; Magdalena Korczynska; Nawar F Al-Obaidi; Patricia C Babbitt; Steven C Almo; Karen N Allen; Brian K Shoichet Journal: Biochemistry Date: 2015-01-05 Impact factor: 3.162
Authors: Stacy T Knutson; Brian M Westwood; Janelle B Leuthaeuser; Brandon E Turner; Don Nguyendac; Gabrielle Shea; Kiran Kumar; Julia D Hayden; Angela F Harper; Shoshana D Brown; John H Morris; Thomas E Ferrin; Patricia C Babbitt; Jacquelyn S Fetrow Journal: Protein Sci Date: 2017-03-08 Impact factor: 6.725