Literature DB >> 24802635

Prediction of substrates for glutathione transferases by covalent docking.

Guang Qiang Dong¹, Sara Calhoun, Hao Fan, Chakrapani Kalyanaraman, Megan C Branch, Susan T Mashiyama, Nir London, Matthew P Jacobson, Patricia C Babbitt, Brian K Shoichet, Richard N Armstrong, Andrej Sali.

Abstract

Enzymes in the glutathione transferase (GST) superfamily catalyze the conjugation of glutathione (GSH) to electrophilic substrates. As a consequence they are involved in a number of key biological processes, including protection of cells against chemical damage, steroid and prostaglandin biosynthesis, tyrosine catabolism, and cell apoptosis. Although virtual screening has been used widely to discover substrates by docking potential noncovalent ligands into active site clefts of enzymes, docking has been rarely constrained by a covalent bond between the enzyme and ligand. In this study, we investigate the accuracy of docking poses and substrate discovery in the GST superfamily, by docking 6738 potential ligands from the KEGG and MetaCyc compound libraries into 14 representative GST enzymes with known structures and substrates using the PLOP program [ Jacobson Proteins 2004 , 55 , 351 ]. For X-ray structures as receptors, one of the top 3 ranked models is within 3 Å all-atom root mean square deviation (RMSD) of the native complex in 11 of the 14 cases; the enrichment LogAUC value is better than random in all cases, and better than 25 in 7 of 11 cases. For comparative models as receptors, near-native ligand-enzyme configurations are often sampled but difficult to rank highly. For models based on templates with the highest sequence identity, the enrichment LogAUC is better than 25 in 5 of 11 cases, not significantly different from the crystal structures. In conclusion, we show that covalent docking can be a useful tool for substrate discovery and point out specific challenges for future method improvement.

Entities: Chemical Disease Gene Species

Mesh：

Substances：

Year: 2014 PMID： 24802635 PMCID： PMC4068255 DOI： 10.1021/ci5001554

Source DB: PubMed Journal: J Chem Inf Model ISSN： 1549-9596 Impact factor: 4.956

Introduction

The canonical glutathione transferases (also known as GSTs; EC 2.5.1.18) catalyze addition of an excellent nucleophile to an electrophilic center. They play important roles in the metabolism and detoxification of numerous endogenous and xenobiotic compounds, including oxidized lipids, drugs, and pollutants.[2−5] The canonical GSTs are a subset of the thioredoxin fold family of proteins.[6,7] They consist of an N-terminal thioredoxin domain and a C-terminal α-helical domain. Although a number of GSTs are known to have specific substrates, many if not most have no clearly assigned biological substrates. In addition, the general nature of their chemistry leads to enzymes that tend to be catalytically promiscuous even with respect to the transition state for the reaction.[3] As a consequence, the de novo prediction of enzyme function by computational methods becomes challenging. Computational docking methods have been widely used in ligand discovery for many enzymes.[8−12] In particular, these methods can predict the docking pose of a known ligand (docking) and/or predict ligands in a large library of small molecules (virtual screening). Docking consists of searching through plausible binding modes of a compound and scoring each mode to distinguish a near-native binding pose from others (docking). Virtual screening consists of performing docking for each candidate ligand, followed by the ranking of the candidate ligand by their best docking scores. Although this structure-based approach has been used successfully for the prediction of both substrates and other types of ligands (e.g., orthosteric inhibitors and allosteric modulators), docking for substrate discovery is most difficult, particularly when the enzyme may accommodate more than one type of reaction. In addition, only ground state or intermediate state complexes, not transition states, are typically accessible by standard docking procedures. The resulting complexes may or may not be directly competent for turnover in the absence of information on the “preorganization” of the enzyme–substrate complex required for catalysis. Despite the difficulties involved in the docking of substrates, structure-based methods have been used successfully for substrate discoveries in several systems.[13−18] There are two principal challenges in the de novo prediction of the substrate preferences of enzymes in large, functionally diverse superfamilies. The first challenge is the availability of experimentally determined enzyme structures, which lags behind the number of known protein sequences by a factor of approximately 400 as of June 2013. To remedy this situation, virtual screening has also relied on comparative models, not only experimentally determined structures.[19−26] The relative utility of comparative models versus experimentally determined structures has been assessed.[19,27] The second challenge is that standard docking procedures lack sufficient constraints that efficiently define the productive geometries between the reacting species (substrates) on the enzyme surface. This issue is addressed here by applying a covalent bond constraint between the sulfur of GSH and the electrophilic substrate. Importantly, the effectiveness of covalent docking has not been rigorously addressed yet. The absence of a large-scale benchmarking of covalent docking using either X-ray structures or comparative models, and the need to predict substrates for GST enzymes, inspired us to investigate the following questions. Can covalent docking accurately predict docking poses of known ligands, given experimentally determined, homologous, or modeled structures in the GST superfamily? Can covalent docking accurately predict ligands despite the catalytic promiscuity of many GST enzymes? What is the difference in the utility of apo, holo, comparative modeling, and homologous structures for virtual screening in the GST superfamily? Can the virtual screening be improved by consensus scoring, relying on independent screening against multiple holo, apo, comparative modeling, and homologous structures in the GST superfamily? If multiple models are calculated on the basis of different templates, can any of them outperform apo and even holo X-ray structures of the target? If so, can one reliably identify which model will do so, or even perform optimally among a set of modeled structures; are there sequence and/or structural attributes (i.e., the overall target–template sequence identity, the binding site target–template sequence identity, and the predicted accuracy of a model) that reliably predict the accuracy of ligand docking? In this report, we attempt to answer these questions with the aid of a virtual library of compounds and 14 representative GST enzymes of known structure and function. The virtual library consists of known substrates for the selected GST proteins, and a large number of compounds selected from KEGG[28] and MetaCyc.[29] For each target, the entire virtual library of compounds was docked to the known X-ray structures, homologous structures, and comparative models of the protein. The results are analyzed by comparing the docking poses of native products to X-ray structures and calculating the enrichment of known products with respect to the entire compound library. We begin by describing the GST catalyzed reactions, the docking library, the selected GST proteins, the automated modeling pipeline, the docking pipeline, and methods to evaluate the accuracy of predicted docking poses for known ligands and the accuracy of virtual ligand screening (Methods). We then describe and compare the results of docking native ligands and virtual screening using apo, holo, comparative modeling, and homologous structures (Results). Finally, we discuss the implications of the current approach and answer the questions we asked previously, given our modeling, docking, and benchmark (Discussion).

Methods

We begin by listing the set of the GST catalyzed reactions used in this study, followed by a description of the corresponding products that comprise a virtual screening library. Next, we describe how we selected GST targets for docking and how we built their comparative models. Finally, we describe the covalent docking pipeline as well as the assessment criteria for evaluating the accuracy of docking and virtual screening.

Construction of the Virtual Screening Library

GSTs catalyze a range of different reactions. Here, we considered 11 different kinds of GST catalyzed reactions, each with a different substrate motif (Figure 1). For each substrate motif, we found all matching molecules in the KEGG and MetaCyc compound databases. The search was performed with the OECHEM tools[30] by matching a SMILES[31] string of a database molecule to the SMARTS[31] strings representing the substrate motifs. We found 4,149 and 3,259 in KEGG and MetaCyc, respectively. We also added 64 substrates from the literature,[32−53] for the total of 6,738 unique substrates.

Figure 1

Partial list of chemical reactions catalyzed by GSTs. Each row defines a type of reaction based on the reactive functional group. X indicates a leaving group in the nucleophilic substitution reactions. EWG indicates an electron-withdrawing group. Next, each of the substrates was converted to one or more products, as follows. The conversions corresponding to the 11 reactions were carried out using OECHEM’s library generation function[30] with explicit hydrogens. The reactive functional groups in the substrate were identified by comparing the substrate SMILES string with the SMARTS string representing the substrate motif undergoing conversion to a product motif, for each reaction. Each match was then used to convert the substrate to a product that was added to the virtual screening library. Finally, we prepared products for docking. For products with undefined stereocenters, we first enumerated the stereoisomers using OECHEM,[30] with the maximum number of stereoisomers retained arbitrarily set to 16 for computational efficiency. Second, protonation states of each stereoisomer was enumerated within pH range 6–8, followed by generating a conformation for each protonation state, using Epik[54,55] and LigPrep.[56] Finally, force field parameters for each product were generated using the hetgrp_ffgen utility (Schrödinger, LCC).

Selection of the GST Targets for Docking

A subset of GST structures was selected from the Protein Data Bank (PDB) for retrospective docking by maximizing the sequence and functional coverage of the GST superfamily (Figure 2), following three steps: First, all GST structures with a cocrystallized GSH-substrate conjugate were extracted from the PDB. Second, structures for the same protein were grouped together, resulting in 14 groups with unique sets of ligands. Finally, for each group, the structure with the highest resolution, the lowest Rfree,, and a unique ligand was selected, resulting in 14 target holo X-ray structures (Supporting Information Table S1).

Figure 2

Representative network view of the GST superfamily showing where docking targets fall in this superfamily. In this view, 2,190 50% sequence identity filtered nodes (“ID50 node”) representing 13,493 sequences are shown as dots, and each sequence similarity relationship with a BLAST E-value ≤ 1 × 10–13 is shown as a line or edge between representative nodes. The largest clusters are labeled by their subgroup in the structure–function linkage database (SFLD).[38] The 13,493 sequences for cytosolic GSTs (“cytGSTs”) are based on Pfam (v26.0)[35] sequences that were at least 100 residues in length and had scores above the gathering threshold for at least one Pfam hidden Markov model (HMM) corresponding to the cytGST N-terminal domain (“GST_N,” “GST_N2″, or “GST_N3”; collectively referred to here as “GST_N*”). Also included in the data set were 58 proteins that lacked matches to a GST_N* HMM but were chosen as Enzyme Function Initiative (EFI)[57] targets for experimental characterization because of their sequence similarities to cytGSTs. Sequence similarity was detected with an all-by-all BLAST search using blastp (v2.2.24).[36] The data were viewed with Cytoscape (v2.8.3). Sequence identities are calculated using CD-HIT[37] (“ID50” set). The edges are shown using the organic layout where nodes that are more highly interconnected are clustered more tightly together. PDB structures associated with cytGST sequences (95% or more sequence identity to the PDB sequence) were identified using the SFLD[38] interface. An ID50 node is shown as a blue dot if any member sequence of the node was associated with a PDB structure. Representative nodes where only an EFI structure was associated with its member sequences are shown as red dots. Retrospective docking targets are marked using dark blue borders; only 10 targets are shown because the rest are more than 50% identical to these 10 representatives shown.

Comparative Modeling of the GST Targets

Comparative models for each target were built by selecting template dimer structures from the PDB with either an unmodified GSH (apo) or a conjugate form (product) of GSH (holo). For each target, we aimed to include one holo and one apo template structure for each of the four sequence identity ranges: 25–40%, 40–55%, 55–70%, and 70–85%. The target–template sequence alignments were generated using HHalign,[58] which in turn relied on target and template profile hidden Markov models (HMMs) generated using HHblits[58] with the UniProt20 database.[59] Five-hundred models were built for each target–template alignment, using the automodel protocol in MODELLER-9v10.[60] The coordinates for GSH atoms in the template structure were then copied to the target models, using the BLK function of MODELLER. Models were then assessed by the discrete optimized protein energy (DOPE)[61] function of MODELLER. Finally, the model with the lowest normalized DOPE score was used for docking. To compare different comparative models and compare the models against X-ray structures, we calculated the binding site sequence identity of the templates to the X-ray structures, and the binding site non-hydrogen atom root-mean-square deviation (RMSD) between a comparative model and X-ray structure. The binding site was defined to contain atoms in all residues with at least one atom within 6 Å of any ligand atom in the X-ray structure. As expected, strong correlations were observed between the overall target–template sequence identity and the target–template binding site sequence identity, and between the target–template binding site sequence identity and the model’s non-hydrogen atom RMSD error (Figure 4A,B).

Figure 4

Comparison of comparative model properties and the RMSD errors of the docked poses. Each point represents the property of a single comparative model and/or the RMSD error of the docked pose of its corresponding crystallographic product against this model. Results for models built based on apo and holo template structures are shown in red and blue, respectively, with the least-squares linear fits shown as dashed lines. The Pearson correlation coefficients R are also shown. The gray dashed line indicates the lower bound of the RMSD of the top 3 ranked poses.

As designed by the choice of the template structures, the distribution of the non-hydrogen atom RMSD error of the binding site confirms that there is a range of accuracy across the set of modeled structures. The models with subangstrom non-hydrogen atom RMSD error tend to be those based on templates with over 80% sequence identity in the binding site residues to the target sequence. Over a third of models had a binding site RMSD error over 4 Å. Thus, homology models cover a range of sequence identities and a range of accuracy.

Protocol for Covalent Docking

Covalent docking of a potential product to a receptor was started by placing it in the binding site. More specifically, the coordinates of the GSH part of the potential product were matched to the GSH molecule in the receptor, and the coordinates of the remaining atoms of the product were then built using OMEGA.[62] Up to 20 initial configurations were generated for each product molecule. For each initial configuration, we then used PLOP’s tether pred (Academic version 25.6) function[1] to rotate the rotatable bonds in the GSH-substrate conjugate. The rotatable bonds were identified using OECHEM. We sampled all of the rotatable bonds in the substrate part of the GSH-substrate conjugate as well as the CA–CB and CB–SG bonds in the cysteine residue of GSH. Up to 50 configurations were generated by PLOP starting from each initial configuration. These configurations were then scored by calculating the total potential energy of the product–receptor complex by PLOP (in the units of kilocalories per mole), which in turn relies on the OPLS force field with a variable-dielectric generalized Born model.[63] The product–receptor distances alone were also scored by the atomic statistical potential PoseScore (in arbitrary units).[64] While the PLOP potential energy does not contain the entropic contributions to the binding free energy, the PoseScore term does approximate the contribution of the interface to the binding free energy.[65] Finally, for virtual screening, a substrate was ranked using the median PLOP energy of its products’ different stereoisomers and protonation states. The PLOP energy of a product in a specific stereoisomer form and protonation state was the median PLOP energy of its different configurations.

Assessment of Docking and Virtual Screening

When a native product was docked (Supporting Information Table S2), the docking pose of the product was assessed for accuracy based on its non-hydrogen atom RMSD from the native configuration, after superposition of the receptor used for docking on the native structure of the receptor; GSH was excluded, except for the cysteine sulfur atom. The accuracy of virtual screening was evaluated by the enrichment for the known products (Supporting Information Table S3) among the top scoring potential products. The enrichment curve was obtained by plotting the percentage of actual products found (y-axis) within the top ranked subset of all database compounds (x-axis on logarithmic scale). logAUC, the area under the curve of the enrichment plot, was also calculated to indicate the accuracy of enrichment; random selection has a logAUC of 14.5.

Results

We begin by evaluating the docking pose of the native ligands using holo and apo structures, followed by evaluating the docking pose of the native ligands using comparative models and homologous structures. Next, enrichment of the known ligands using X-ray structures is benchmarked and analyzed. Finally, we analyze the enrichment of the known ligands using comparative models.

Docking of Products into X-ray Structures

As the easiest test, we first applied covalent docking by PLOP to reconstruct binding poses of the crystallographic products in the corresponding holo X-ray structures (Supporting Information Table S1). In all 14 cases, a docking pose within 3 Å of the native structure was sampled (within 2 Å for 13 cases), though not necessarily recognized as such by the scoring function (Supporting Information Table S4); the 3 Å threshold on the all-atom RMSD subjectively distinguishes near-native and nonnative poses. However, the top 3 ranked structure was within 3 Å to the native structure in 9 out of 14 cases using the PLOP energy function, and in 11 out of 14 cases using PoseScore (Figure 3).

Figure 3

Comparison of the predicted complex between the native product and its holo X-ray structure with the native pose: (A) 1F3B with GBX; (B) 2F3M with GTD; (C) 2GST with GPS; (D) 1M9B with IBG. The surface of the receptor is shown, with red and blue corresponding to oxygen and nitrogen atoms. The product is shown in the native configuration (gray), the most accurate sampled configuration (yellow), and the top ranked configuration by the PLOP energy function (light blue) and PoseScore (green).

Next, we performed a slightly more difficult, but still easy, test. For targets with more than one crystallographic product, we docked each product to the nonnative holo X-ray structure of the target (cross-docking; Supporting Information Table S5). In all 12 cases, a docking pose within 3 Å to the native structure was sampled (within 2 Å for 11 cases). The top 3 ranked complex was within 3 Å to the native structure in 7 out of 12 cases for the PLOP energy function, and in 8 out of 12 cases for PoseScore. To test whether combining the two scoring functions can further improve the result, we considered an optimal linear combination of the PoseScore and the PLOP potential energy, using the average all-atom RMSD of the top ranked docking poses as the optimization criterion; the optimal linear combination has zero weight for the PLOP potential energy. Finally, to test whether or not apo structures can be used for predicting the native product docking pose, we docked the two products of GSTM1_HUMAN (GDN and GTD) to its two available apo structures (1XW6 and 1YJ6). The results were comparable to those from cross-docking (Supporting Information Table S6). Comparison of the predicted complex between the native product and its holo X-ray structure with the native pose: (A) 1F3B with GBX; (B) 2F3M with GTD; (C) 2GST with GPS; (D) 1M9B with IBG. The surface of the receptor is shown, with red and blue corresponding to oxygen and nitrogen atoms. The product is shown in the native configuration (gray), the most accurate sampled configuration (yellow), and the top ranked configuration by the PLOP energy function (light blue) and PoseScore (green).

Docking of Native Products into Comparative Models and Homologous Crystallographic Structures

To test whether or not comparative models can be used for docking pose prediction of native products, a total of 62 homology models were generated for the 14 GST targets (Supporting Information Table S7). We performed native product docking against these models as well as some of their template structures. To find out to what degree the success of docking pose prediction was limited by sampling versus scoring, we plotted the RMSD error of the top ranked docking pose versus the RMSD error of the best sampled docking pose (Figure 4C). As expected there was a significant correlation between these two RMSDs; the RMSD error of the best sampled pose is a lower bound of the RMSD error of the top ranked pose. Even when accurate poses were sampled, the scoring function often did not recognize them (Figure 4C). In 56 out of 62 cases, a docking pose within 3 Å was sampled, but only in 22 out of 62 cases, it was ranked in the top 3. Therefore, the limiting factor for docking pose prediction is the accuracy of the scoring function used. In this application, the inaccuracy of the scoring function may be exacerbated by errors in the model of the binding site (Figure 6). Despite the difficulties in ranking a near-native pose in the context of comparative model, for 6 of the 14 targets (1GWC, 2AB6, 3LJR, 3O76, 2C4J, and 1BYE; Figure 5), the RMSD errors of the top ranked poses by PoseScore were comparable to those against holo X-ray structures.

Figure 6

Docking of native products against comparative models. (A) Successful docking of GBX onto a model of 3CSH, based on 85% sequence identity to the template 3O76 (chain A). The ribbon representation of the crystal structure is shown in light gray and that for the model in orange. The C-α RMSD error of the model is 0.64 Å, and the non-hydrogen atom RMSD error of the binding site is 0.71 Å. The ligand is shown in the stick representation. The native pose is shown in light gray, the best-sampled pose (RMSD of 0.90 Å) is shown in yellow, the pose top ranked by PLOP (0.92 Å) is shown in cyan, and the pose top ranked by POSESCORE (0.92 Å) is shown in green (not visible because it is hidden behind the cyan structure). (B) Failed docking of GPS onto a model of 1F3B, based on 76% sequence identity to the template 1YDK (chain B). The ribbon representation of the crystal structure is shown in light gray and that for the model in orange. The C-α RMSD error of the model is 2.42 Å, and the non-hydrogen atom RMSD error of the binding site is 2.47 Å. The ligand is shown in the stick representation. The native pose is shown in light gray, the best-sampled pose (RMSD of 5.38 Å) is shown in yellow, the pose top ranked by PLOP (6.58 Å) is shown in cyan, and the pose top ranked by POSESCORE (0.92 Å) is shown in green.

Figure 5

Non-hydrogen atom RMSD error of the docked poses, for the crystallographic product against its corresponding comparative models with different sequence identities. For each target, the RMSD errors of the best-sampled poses using apo templates (stripped red bar), the RMSD errors of the top-ranked poses by PoseScore using apo templates (stripped blue bar), the RMSD errors of the best-sampled poses using holo templates (red bar), and the RMSD errors of the top ranked pose by PoseScore using apo templates (blue bar) are shown. For comparison, the RMSD errors of the docked poses using holo X-ray structures are shown as horizontal lines (blue solid line for the top ranked pose by PoseScore, red dashed line for the best-sampled pose).

We then asked the question whether or not there were model properties that predict the best model for docking pose prediction, among multiple models based on different template structures. We found no such predictors. There was essentially no correlation between a model’s binding site RMSD error and the RMSD error of the top ranked docking pose (Figure 4D). There were only weak correlations between the target–template sequence identity and the RMSD error of the best-sampled docking pose (Figure 4F; R = −0.19 for apo tempalates, and R = −0.20 for holo templates). We also calculated the correlation between the RMSD errors of the model binding site and the best-sampled docking pose. There was no significant correlation between these two RMSD errors. We also investigated whether or not holo and apo structures performed differently for docking pose prediction. On average, models based on holo template structures were not noticeably better for docking pose prediction than models from apo template structures. Finally, we tested the accuracy of docking pose prediction using a homologous structure itself, instead of a comparative target model based on it. The test was performed with the two known products of GSTM1_HUMAN (GDN and GTD) and its homologous structure (2C4J). The docking pose of GTD had the RMSD error of 2.2 Å, and the docking pose of GDN had the RMSD error of 4.7 Å, on average not significantly different from that using the corresponding comparative model (3.0 Å for GTD and 3.5 Å for GDN, binding site RMSD error). Comparison of comparative model properties and the RMSD errors of the docked poses. Each point represents the property of a single comparative model and/or the RMSD error of the docked pose of its corresponding crystallographic product against this model. Results for models built based on apo and holo template structures are shown in red and blue, respectively, with the least-squares linear fits shown as dashed lines. The Pearson correlation coefficients R are also shown. The gray dashed line indicates the lower bound of the RMSD of the top 3 ranked poses. Non-hydrogen atom RMSD error of the docked poses, for the crystallographic product against its corresponding comparative models with different sequence identities. For each target, the RMSD errors of the best-sampled poses using apo templates (stripped red bar), the RMSD errors of the top-ranked poses by PoseScore using apo templates (stripped blue bar), the RMSD errors of the best-sampled poses using holo templates (red bar), and the RMSD errors of the top ranked pose by PoseScore using apo templates (blue bar) are shown. For comparison, the RMSD errors of the docked poses using holo X-ray structures are shown as horizontal lines (blue solid line for the top ranked pose by PoseScore, red dashed line for the best-sampled pose). Docking of native products against comparative models. (A) Successful docking of GBX onto a model of 3CSH, based on 85% sequence identity to the template 3O76 (chain A). The ribbon representation of the crystal structure is shown in light gray and that for the model in orange. The C-α RMSD error of the model is 0.64 Å, and the non-hydrogen atom RMSD error of the binding site is 0.71 Å. The ligand is shown in the stick representation. The native pose is shown in light gray, the best-sampled pose (RMSD of 0.90 Å) is shown in yellow, the pose top ranked by PLOP (0.92 Å) is shown in cyan, and the pose top ranked by POSESCORE (0.92 Å) is shown in green (not visible because it is hidden behind the cyan structure). (B) Failed docking of GPS onto a model of 1F3B, based on 76% sequence identity to the template 1YDK (chain B). The ribbon representation of the crystal structure is shown in light gray and that for the model in orange. The C-α RMSD error of the model is 2.42 Å, and the non-hydrogen atom RMSD error of the binding site is 2.47 Å. The ligand is shown in the stick representation. The native pose is shown in light gray, the best-sampled pose (RMSD of 5.38 Å) is shown in yellow, the pose top ranked by PLOP (6.58 Å) is shown in cyan, and the pose top ranked by POSESCORE (0.92 Å) is shown in green.

Enrichment for Known Products by Virtual Screening against X-ray Structures

We now address the question whether or not holo and apo X-ray structures can be used for virtual product screening by covalent docking. We first applied PLOP to virtual substrate discovery by docking the whole product library against holo structures of 11 GSTs with more than 3 known substrates; this is the minimal number of known ligands needed for computing an enrichment curve. We docked the products onto each chain of the GST dimers independently and analyzed the docking results separately for each chain as well as collectively for all chains, by relying on our consensus scoring scheme;[27] consensus scoring combines docking scores from independent virtual screen against multiple chains/structures/models of the same target. In all 11 cases, the enrichment result was better than that from random selection (Figure 7). In 7 of the 11 cases, logAUC was better than 25. In contrast to the noncovalent docking benchmark,[27] consensus scoring did not improve logAUC for any target.

Figure 7

Enrichment curves for the 11 GST targets with 3 or more known products, using holo X-ray structures. For each target, the enrichment curves for the individual chains (blue, green, yellow, and cyan lines), the consensus scoring for multiple chains (red line), and random selection (black dashed line) are shown.

Next, we tested the enrichment of known ligands using two apo structures of GSTM1_HUMAN (1YJ6 and 1XW6; Supporting Information Figure S2). The averaging logAUC (32.6) is slightly worse than that using the corresponding holo structure (33.3). Finally, we tested whether or not consensus scoring using multiple structures could improve enrichment for known products by combining virtual screening results from either two apo or two holo structures of GSTM1_HUMAN. Using the two apo structures (1YJ6 and 1XW6), the logAUC for consensus scoring was 34.5. Similarly, using holo structures (2F3M and 1XWK) resulted in a logAUC value of 33.6. In both cases, logAUC for consensus scoring was between the best and worst logAUC values of individual virtual screening runs. Enrichment curves for the 11 GST targets with 3 or more known products, using holo X-ray structures. For each target, the enrichment curves for the individual chains (blue, green, yellow, and cyan lines), the consensus scoring for multiple chains (red line), and random selection (black dashed line) are shown.

Enrichment for Known Products by Virtual Screening against Comparative Models and Homologous Crystallographic Structures

To evaluate the utility of homology models for discovering substrates of GSTs, the enrichment of known products was determined and compared to that for the crystal structures. We first tested whether or not there was an apparent correlation between the enrichment and the target–template sequence identity by virtual screening against the 6 comparative models of the target 2F3M. No correlation was observed between logAUC and the target–template sequence identity (Figure 8). The absence of such a correlation was also observed in an extensive noncovalent docking benchmark.[27]

Figure 8

Enrichment curves for target 2F3M using its 6 comparative models based on different apo and holo templates. For each model, the enrichment curves for the individual chains (blue, green, yellow, and cyan lines), the consensus scoring for multiple chains (red line), and random selection (black dashed line) are shown. Target–template sequence identities are as follows: The 2F3M-3T2U and 2F3M-2FHE sequence identities are 33% and 50%, respectively (apo templates). Target–template sequence identities are as follows: 33% (2F3M-3T2U, apo), 50% (2F3M-2FHE), 33% (2F3M-2GSR, holo), 42% (2F3M-1M99, holo), 66% (2F3M-1GSU, holo), and 85% (2F3M-2C4J, holo). Binding site RMSD errors are as follows: 5.9 Å (2F3M-3T2U, apo), 3.1 Å (2F3M-2FHE), 7.4 Å (2F3M-2GSR, holo), 3.1 Å (2F3M-1M99, holo), 1.4 Å (2F3M-1GSU, holo), and 1.1 Å (2F3M-2C4J, holo).

Given the absence of a correlation between the enrichment and the target–template sequence identity, we next benchmarked the utility of the best comparative models for virtual screening, for the 11 targets with more than 3 known substrates; the best comparative model is that based on the highest target–template sequence identity. We docked the products onto each chain of the GST dimers independently, and analyzed the docking results separately for each chain as well as collectively for all chains, by relying on our consensus scoring scheme.[27] In all cases, the enrichment result was better than that from random selection (Figure 9). In 5 of the 11 cases, logAUC was better than 25. Again, the consensus scoring did not improve logAUC for any target.

Figure 9

Enrichment curves for the 11 GST targets with 3 or more known products, using their comparative models based on templates with the highest sequence identities. For each target, the enrichment curves for the individual chains (blue, green, yellow, and cyan lines), the consensus scoring for multiple chains (red line), and the random selection (black dashed line) are shown. Target–template sequence identities are as follows: 85% (2F3M-2C4J), 85% (3CSH-2C4J), 77% (2C4J-6GSV), 54% (3LJR-2C3Q), 33% (2GSQ-2GSR), 28% (3GX0-1PN9), 76% (2GST-2C4J), 44% (1M9B-2C4J), 64% (1VF3-3KTL), 76% (2AB6-6GSV), and 46% (1GWC-2VO4).

We further tested whether or not consensus docking using multiple comparative models can improve the enrichment by combining the docking results from the 6 comparative models for target 2F3M (Figure 8). Consensus scoring resulted in a logAUC of 40.4, which was slightly better than the best logAUC value for individual models (40.3) and was better than logAUC for the holo X-ray structure. Finally, we tested whether or not homologous X-ray structures could be used in place of comparative models for virtual screening, using a homologue of GSTM1_HUMAN (2C4J). The corresponding logAUC is 30.9 for chain A, 31.6 for chain B, and 31.5 for consensus scoring using chains A and B, which is significantly worse than that for the comparative model based on 2C4J (34.9 for chain A, 35.8 for chain B, and 38.7 for consensus scoring). A similar finding was obtained in the noncovalent docking benchmark.[27] Enrichment curves for target 2F3M using its 6 comparative models based on different apo and holo templates. For each model, the enrichment curves for the individual chains (blue, green, yellow, and cyan lines), the consensus scoring for multiple chains (red line), and random selection (black dashed line) are shown. Target–template sequence identities are as follows: The 2F3M-3T2U and 2F3M-2FHE sequence identities are 33% and 50%, respectively (apo templates). Target–template sequence identities are as follows: 33% (2F3M-3T2U, apo), 50% (2F3M-2FHE), 33% (2F3M-2GSR, holo), 42% (2F3M-1M99, holo), 66% (2F3M-1GSU, holo), and 85% (2F3M-2C4J, holo). Binding site RMSD errors are as follows: 5.9 Å (2F3M-3T2U, apo), 3.1 Å (2F3M-2FHE), 7.4 Å (2F3M-2GSR, holo), 3.1 Å (2F3M-1M99, holo), 1.4 Å (2F3M-1GSU, holo), and 1.1 Å (2F3M-2C4J, holo). Enrichment curves for the 11 GST targets with 3 or more known products, using their comparative models based on templates with the highest sequence identities. For each target, the enrichment curves for the individual chains (blue, green, yellow, and cyan lines), the consensus scoring for multiple chains (red line), and the random selection (black dashed line) are shown. Target–template sequence identities are as follows: 85% (2F3M-2C4J), 85% (3CSH-2C4J), 77% (2C4J-6GSV), 54% (3LJR-2C3Q), 33% (2GSQ-2GSR), 28% (3GX0-1PN9), 76% (2GST-2C4J), 44% (1M9B-2C4J), 64% (1VF3-3KTL), 76% (2AB6-6GSV), and 46% (1GWC-2VO4).

Discussion

We developed a comparative modeling and covalent docking pipeline for predicting docking poses of known products and for the virtual discovery of substrates, as well as benchmarked it on the GST enzymes. Either a holo or apo structure can be used for predicting docking poses relatively accurately in a majority of cases. While using homology models for predicting docking poses is more difficult, a homology model can also be useful. Specifically, for native holo X-ray structures as receptors, one of the top 3 ranked models is within 3 Å all-atom RMSD of the native complex in 11 of the 14 test cases (in 8 of 12 cases for cross-docking in which a product from another holo structure of the same enzyme is docked; Supporting Information Tables S4 and S5). For comparative models based on more than 30% sequence identity, one of the top 3 ranked models is within 3 Å all-atom RMSD of the native complex in 22 of the 62 test cases (Figure 5). For virtual screening against holo X-ray structures, the enrichment logAUC value is better than random (14.5) in all cases and better than 25 in 7 of 11 cases (Figure 7). For models based on templates with the highest sequence identity (approximately 30–85%), the logAUC is better than 25 in 5 of 11 cases (Figure 9), not significantly different from the crystal structures. In conclusion, we show that covalent docking can be a useful tool for substrate discovery. Next, we discuss three points in turn. First, we assess to what degree the accuracy of the docking poses and ranking of alternative products is limited by the degree of structure sampling and the accuracy of scoring. Second, we discuss the utility of comparative models for docking and virtual screening. Third, we conclude by commenting on docking transition state conformations of ligands.

Sampling versus Scoring

To predict an accurate docking pose, two conditions need to be satisfied: (i) a near-native structure needs to be generated by the sampling procedure and (ii) an accurate sample structure needs to be scored better than the inaccurate structures by the scoring function. Using X-ray structures as receptors, a near-native docking pose (<3 Å) was sampled in all cases (Supporting Information Tables S4 and S5); thus, the degree of sampling does not limit the prediction accuracy. But, in 21% of the native docking cases and in 33% of the cross-docking cases, a near-native model was not ranked in top 3 by PoseScore. Using homology models as receptors, a near-native docking pose was sampled in 56 of 62 cases but was only ranked in top 3 in 22 of 62 cases (Figure 5). The reason is that our scoring function is not always accurate enough to differentiate between accurate and inaccurate docking poses. To rank different substrates, we used the median PLOP energy of their product–enzyme complexes. We chose the median instead of the lowest PLOP energy because it resulted in a significantly higher average logAUC (27.3 vs 18.9). A likely reason is that the energy estimate of a single configuration is noisier than the median, compensating for the lack of accounting for the proper physics of the problem. In the future, other functions such as the Boltzmann average of the sampled energies will be explored. Although PoseScore reranking of the docking poses generated by optimizing the PLOP energy further improved the accuracy of the top scoring docked poses, this result does not imply that PoseScore is more accurate than the PLOP energy. It is possible that reranking by PLOP would also improve the accuracy of the poses generated by optimizing PoseScore; for technical reasons, it is not straightforward to actually make this test. Our result indicates merely that the two scoring functions are different and imperfect and that PoseScore contains helpful information not present in PLOP, while being silent on whether or not PLOP also contains information not present in PoseScore. In contrast to noncovalent docking,[27] on average, consensus scoring did not improve virtual screening accuracy. A combination of the following two assumptions explains this finding: (i) covalent docking is limited more by scoring than by sampling compared to noncovalent docking; and (ii) consensus scoring works best when sampling is a limiting factor in docking. A major difference between noncovalent and covalent docking is that the latter is constrained by a covalent bond, resulting in an easier sampling problem all other things considered equal. Using multiple receptor structures to mimic receptor flexibility was already shown to lead to more accurate sampled docking poses in noncovalent docking.[27] Thus, the disadvantage of a larger number of decoys produced by using multiple templates, which increases the burden on the scoring function for identifying the most accurate pose, is outweighed by the more accurately sampled poses that are more likely to be ranked highly by the scoring function. In contrast, if scoring is limiting and sampling is not, as is the case in covalent docking (cf. our native docking and cross-docking tests), considering poses from multiple receptor structures is not likely to improve the results, as observed.” Given the demonstrated limitations of current scoring, the accuracy of ranking ligand poses and ligands can be improved by more accurate scoring functions. On the one hand, the ability of the PLOP potential energy to rank the native state (or at least a near-native state) may be improved by estimating the free energy of the native state.[66] On the other hand, the accuracy of the statistical potential could be improved by applying the Bayesian framework for inferring statistically optimized atomic potentials (SOAP).[67] This framework (1) uses multiple data-driven “recovery” functions based on probability theory, without recourse to questionable statistical mechanical assumptions about statistical potentials; (2) restrains the relative orientation between two covalent bonds instead of a simple distance between two atoms, in an effort to capture orientation-dependent interactions such as hydrogen bonds; (3) performs Bayesian smoothing for estimating the underlying smooth distributions from noisy observations; (4) applies Bayesian inference for calculating parameter values that maximize the posteriori probability; and (5) benefits from Bayesian model selection based on Bayesian predictive densities, in an effort to improve the generalizability of the derived statistical potentials.

Comparative Models for Docking and Virtual Screening

Using comparative models for predicting docking poses often produces inaccurate models (Figures 4 and 5). One possible reason is that covalent docking is sensitive to minor changes in the binding site, especially for residues close to the GSH sulfur atom due to the substrate–receptor covalent bond constraint (Figure 6). Another possible reason is that the active sites of GST enzymes are often defined in part by the C-terminus, which is often flexible or missing in a template structure (Figure 6). Moreover, there was no significant correlation between the binding site non-hydrogen atom RMSD and the docking pose RMSD (Figure 4). The lack of this correlation may also result from the disproportionate impact of residues close to the GSH sulfur atom on the docking pose. Virtual screening using comparative models generated results comparable to the X-ray structures (average logAUC of 28.2 versus 27.3, better than 25 in 5 versus 7 out of 11 cases), consistent with what has been observed when using comparative models for virtual screening by noncovalent docking (average logAUC of 28.7 versus 30.6),[27] even though comparative models performed much worse than X-ray structures for docking pose prediction (average non-hydrogen atom RMSD of 3.98 Å versus 1.83 Å). This finding may be rationalized by comparatively worse scoring of both true and decoy products (average PoseScore of −4.2 and 4.1 versus −60.0 and −52.2) when docking against inaccurate binding sites in comparative models. In prospective applications, if some substrates for the target of interest are known, virtual screening using comparative models can be improved by selecting the most enriching model for known substrates.[68−70] If no substrates are known for the target of interest, it might be advantageous to build and dock into multiple structurally diverse models and analyze the virtual screening results by hand.[15] An experienced user may be able to identify an accurate docking result based on the totality of their knowledge about the proteins of interest.

Reactant, Product, Intermediate, and Transition States

Generally, a substrate of an enzyme is required to have a sufficiently high kcat/Km; a typical threshold is 104 (mol/L)−1 s–1. An enzymatic reaction involves interconversions between pairs of stable states, separated by a transition state; the stable states include the enzyme and substrate without interaction (E + S), an enzyme–substrate complex intermediate (ES), an enzyme–product intermediate (EP), and the enzyme and product without interaction (E + P).[71] In general, kcat/Km depends on the free energies of all states. Thus, virtual ligand screening should in principle consider all states, which is generally not the case. When conversion from ES to EP is the rate-limiting step, kcat is mostly determined by the transition state energy barrier for this step. Enzymes lower this barrier by creating a binding site that is complementary to the transition state of the substrate. Thus, docking a substrate (or a product) in its transition state conformation may be more appropriate than docking a substrate (or a product) in its ground state. Although quantum mechanics calculations can model the transition state and estimate the activation energy barrier, running these calculations for the entire virtual screening library is not computationally feasible.[72] Instead, we docked here the reaction product states and achieved relative success. However, the enrichment results could possibly be greatly improved if we can dock the transition state, or a transition state like structure, without recourse to quantum mechanics. For GST substrate discovery, compounds similar to the high-energy intermediates used for docking to amidohydrolases[73] could be constructed, provided some technical challenges are solved (e.g., building C atoms with 2 partial bonds for a total of 5 bonds). Although enzymes presumably maximize complementarity to the transition state, other states during the reaction must also be compatible with the potentially flexible binding site of the enzyme. Thus, substrate discovery might also be improved by consideration of docking of all ligand states to the corresponding state(s) of the enzyme.

Conclusion

Can covalent docking accurately predict docking poses of known ligands, given experimentally determined, homologous or modeled structures in the GST superfamily? Yes, either a holo or apo structure can be used for predicting docking poses accurately in a majority of cases, while using homology models is less successful. Specifically, for native holo X-ray structures as receptors, one of the top 3 ranked models is within 3 Å all-atom RMSD of the native complex in 11 of the 14 cases (in 8 of 12 cases for cross-docking). For comparative models as receptors, one of the top 3 ranked models is within 3 Å all-atom RMSD of the native complex in 22 of the 62 test cases. Can covalent docking accurately predict ligands despite the catalytically promiscuity of many GST enzymes? Often. What is the difference in the utility of apo, holo, comparative modeling, and homologous structures for virtual screening in the GST superfamily? Holo and apo structures, homologous structures, and comparative models can all enrich for known products. For virtual screening against holo X-ray structures, the enrichment logAUC value is better than random in all cases, and better than 25 in 7 of 11 cases (average logAUC 27.3; Figure 7). Using apo X-ray structures of 2F3M, the average logAUC (32.6) is slightly worse than that using the corresponding holo structure (33.3). For models based on templates with the highest sequence identity (approximately 30–85%), the logAUC is better than 25 in 5 of 11 cases (average logAUC 28.2; Figure 9), which is comparable to that for X-ray structure. Using the homologous structure of 2F3M, the logAUC (31.3) is significantly worse than that using comparative models (36.5). Can the virtual screening be improved by consensus scoring, relying on independent screening against multiple holo, apo, comparative modeling, and homologous structures in the GST superfamily? Rarely. Consensus scoring using multiple chains from the same structure/model or multiple structures/models of the same target often does not improve the virtual screening accuracy. Although consensus scoring improves logAUC in comparison to using a single receptor in some cases, it makes it worse in others. If multiple models are calculated on the basis of different templates, can any of them outperform apo and even holo X-ray structures of the target? If so, can one reliably identify which model will do so, or even perform optimally among a set of modeled structures; are there sequence and/or structural attributes (i.e., the overall target–template sequence identity, the binding site target–template sequence identity, and the predicted accuracy of a model) that reliably predict the accuracy of ligand docking? For docking pose prediction, there are a few cases where comparative models outperformed the corresponding holo X-ray structure (Figure 5). However, we did not find a predictor that could reliably predict the accuracy of the docked pose. For virtual screening, some models also outperform the corresponding holo X-ray structure (Figures 8 and 9), but we did not find a correlation between logAUC and the binding site sequence identity or the binding site RMSD errors.

65 in total

1. Ligand-supported homology modeling of g-protein-coupled receptor sites: models sufficient for successful virtual screening.

Authors: Andreas Evers; Gerhard Klebe
Journal: Angew Chem Int Ed Engl Date: 2004-01 Impact factor: 15.336

2. A hierarchical approach to all-atom protein loop prediction.

Authors: Matthew P Jacobson; David L Pincus; Chaya S Rapp; Tyler J F Day; Barry Honig; David E Shaw; Richard A Friesner
Journal: Proteins Date: 2004-05-01

3. Optimized atomic statistical potentials: assessment of protein interfaces and loops.

Authors: Guang Qiang Dong; Hao Fan; Dina Schneidman-Duhovny; Ben Webb; Andrej Sali
Journal: Bioinformatics Date: 2013-09-27 Impact factor: 6.937

4. Predicting subsite interactions of plasmin with substrates and inhibitors through computational docking analysis.

Authors: Keigo Gohda; Naoki Teno; Keiko Wanaka; Yuko Tsuda
Journal: J Enzyme Inhib Med Chem Date: 2011-10-12 Impact factor: 5.051

5. Assignment of pterin deaminase activity to an enzyme of unknown function guided by homology modeling and docking.

Authors: Hao Fan; Daniel S Hitchcock; Ronald D Seidel; Brandan Hillerich; Henry Lin; Steven C Almo; Andrej Sali; Brian K Shoichet; Frank M Raushel
Journal: J Am Chem Soc Date: 2013-01-02 Impact factor: 15.419

6. Identification of three classes of cytosolic glutathione transferase common to several mammalian species: correlation between structural data and enzymatic properties.

Authors: B Mannervik; P Alin; C Guthenberg; H Jensson; M K Tahir; M Warholm; H Jörnvall
Journal: Proc Natl Acad Sci U S A Date: 1985-11 Impact factor: 11.205

7. Transition state model and mechanism of nucleophilic aromatic substitution reactions catalyzed by human glutathione S-transferase M1a-1a.

Authors: Yury Patskovsky; Larysa Patskovska; Steven C Almo; Irving Listowsky
Journal: Biochemistry Date: 2006-03-28 Impact factor: 3.162

8. Structural basis for the binding of the anticancer compound 6-(7-nitro-2,1,3-benzoxadiazol-4-ylthio)hexanol to human glutathione s-transferases.

Authors: Luca Federici; Carlo Lo Sterzo; Silvia Pezzola; Adele Di Matteo; Flavio Scaloni; Giorgio Federici; Anna Maria Caccuri
Journal: Cancer Res Date: 2009-10-06 Impact factor: 12.701

9. Structure and function of the xenobiotic substrate binding site of a glutathione S-transferase as revealed by X-ray crystallographic analysis of product complexes with the diastereomers of 9-(S-glutathionyl)-10-hydroxy-9,10-dihydrophenanthrene.

Authors: X Ji; W W Johnson; M A Sesay; L Dickert; S M Prasad; H L Ammon; R N Armstrong; G L Gilliland
Journal: Biochemistry Date: 1994-02-08 Impact factor: 3.162

10. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases.

Authors: Ron Caspi; Tomer Altman; Kate Dreher; Carol A Fulcher; Pallavi Subhraveti; Ingrid M Keseler; Anamika Kothari; Markus Krummenacker; Mario Latendresse; Lukas A Mueller; Quang Ong; Suzanne Paley; Anuradha Pujar; Alexander G Shearer; Michael Travers; Deepika Weerasinghe; Peifen Zhang; Peter D Karp
Journal: Nucleic Acids Res Date: 2011-11-18 Impact factor: 16.971

9 in total

Review 1. Docking Screens for Novel Ligands Conferring New Biology.

Authors: John J Irwin; Brian K Shoichet
Journal: J Med Chem Date: 2016-03-15 Impact factor: 7.446

Review 2. Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST): A web tool for generating protein sequence similarity networks.

Authors: John A Gerlt; Jason T Bouvier; Daniel B Davidson; Heidi J Imker; Boris Sadkhin; David R Slater; Katie L Whalen
Journal: Biochim Biophys Acta Date: 2015-04-18

Review 3. Targeting Non-Catalytic Cysteine Residues Through Structure-Guided Drug Discovery.

Authors: Kenneth K Hallenbeck; David M Turner; Adam R Renslo; Michelle R Arkin
Journal: Curr Top Med Chem Date: 2017 Impact factor: 3.295

4. Covalent docking predicts substrates for haloalkanoate dehalogenase superfamily phosphatases.

Authors: Nir London; Jeremiah D Farelli; Shoshana D Brown; Chunliang Liu; Hua Huang; Magdalena Korczynska; Nawar F Al-Obaidi; Patricia C Babbitt; Steven C Almo; Karen N Allen; Brian K Shoichet
Journal: Biochemistry Date: 2015-01-05 Impact factor: 3.162

5. An approach to functionally relevant clustering of the protein universe: Active site profile-based clustering of protein structures and sequences.

Authors: Stacy T Knutson; Brian M Westwood; Janelle B Leuthaeuser; Brandon E Turner; Don Nguyendac; Gabrielle Shea; Kiran Kumar; Julia D Hayden; Angela F Harper; Shoshana D Brown; John H Morris; Thomas E Ferrin; Patricia C Babbitt; Jacquelyn S Fetrow
Journal: Protein Sci Date: 2017-03-08 Impact factor: 6.725

Review 9. Recent Trends and Applications of Molecular Modeling in GPCR⁻Ligand Recognition and Structure-Based Drug Design.

Authors: Xiaojing Yuan; Yechun Xu
Journal: Int J Mol Sci Date: 2018-07-20 Impact factor: 5.923