Andrew Anighoro1, Jürgen Bajorath1. 1. Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany.
Abstract
Ligand docking into homology models of G-protein-coupled receptors (GPCRs) is a widely used approach in computational compound screening. The generation of "double-hypothetical" models of ligand-target complexes has intrinsic accuracy limitations that further complicate compound ranking and selection compared to those of X-ray structures. Given these uncertainties, we have explored "fuzzy 3D similarity" between hypothetical binding modes of known ligands in homology models and docking poses of database compounds as an alternative to conventional scoring schemes. Therefore, GPCR homology models at varying accuracy levels were generated and used for docking. Increases in recall performance were observed for fuzzy 3D similarity ranking using single or multiple ligand poses compared to that of conventional scoring functions and interaction fingerprints. Fuzzy similarity ranking was also successfully applied to docking into an external model of a GPCR for which no experimental structure is currently available. Taken together, our results indicate that the use of putative ligand poses, albeit approximate at best, increases the odds of identifying active compounds in docking screens of GPCR homology models.
Ligand docking into homology models of G-protein-coupled receptors (GPCRs) is a widely used approach in computational compound screening. The generation of "double-hypothetical" models of ligand-target complexes has intrinsic accuracy limitations that further complicate compound ranking and selection compared to those of X-ray structures. Given these uncertainties, we have explored "fuzzy 3D similarity" between hypothetical binding modes of known ligands in homology models and docking poses of database compounds as an alternative to conventional scoring schemes. Therefore, GPCR homology models at varying accuracy levels were generated and used for docking. Increases in recall performance were observed for fuzzy 3D similarity ranking using single or multiple ligand poses compared to that of conventional scoring functions and interaction fingerprints. Fuzzy similarity ranking was also successfully applied to docking into an external model of a GPCR for which no experimental structure is currently available. Taken together, our results indicate that the use of putative ligand poses, albeit approximate at best, increases the odds of identifying active compounds in docking screens of GPCR homology models.
Docking of small molecules into binding sites of target proteins
continues to be the major approach to computational ligand identification.[1,2] Although high-resolution X-ray structures are generally preferred
as docking templates, homology models are also frequently used in
the absence of experimental structures.[3−5] Although protein models
are less accurate than X-ray structures from which they are derived,
they are also capable of enriching active compounds in database rankings
from docking screens and identifying new active compounds in prospective
applications.[6]G-protein-coupled
receptors (GPCRs) are a growth area for molecular
docking, due to stellar advances in the structural biology of these
highly complex membrane receptor systems over the past decade.[7−9] The increasing number of experimentally determined GPCR structures
has deepened our understanding of binding-site features and provided
unprecedented opportunities for structure-based compound screening
and ligand design.[10] At the same time,
structural coverage across the GPCR superfamily is still sparse compared
to that of other popular therapeutic targets. Accordingly, homology
models of GPCRs continue to play an important role for docking,[11,12] yielding impressive results in some applications.[6] In fact, with each new GPCR X-ray structure that is becoming
available, the knowledge base for homology modeling further increases
compared to that of earlier days when the X-ray structure of bovinerhodopsin was for long the only available template for GPCR modeling.[13,14]Despite incremental advances made over the years, force field-based
scoring of ligand poses still represents a major limitation of hit
identification via docking.[15−17] Difficulties in reliably ranking
active compounds on the basis of force field energy functions have
triggered the exploration of alternative scoring approaches tailored
toward different targets or families, including GPCRs.[18−20] In this context, synergies between docking and ligand-based computational
screening methods can also be taken into consideration,[21,22] for example, the inclusion of molecular similarity[23] in evaluating docking poses. However, only few studies
have so far attempted to combine ligand-based similarity assessment
and docking, for example, through sequential calculations[24,25] or data fusion.[26]As a step toward
integration of molecular similarity and docking
calculations, we have previously reported a methodology combining
the generation of docking poses with 3D similarity comparison to experimentally
determined ligand-binding modes.[27] Accordingly,
instead of applying scoring functions, docking poses and reference
ligands were compared and 3D similarity values calculated to generate
database rankings, which frequently yielded significant improvements
in compound recall over conventional scoring or the use of protein–ligand
interaction fingerprints (PLIFs) for pose prioritization.[27,28] In addition to other targets, the utility of this approach was also
demonstrated for GPCRs.[28] However, given
the dependence of the 3D similarity-based approach on crystallographic
ligand binding modes, its application was principally limited to targets
for which complex X-ray structures were available.Herein, we
explored 3D similarity-based compound ranking for docking
into homology models of GPCRs, which continue to be widely used in
virtual screening. This might appear to be counterintuitive at a first
glance because 3D similarity ranking was conceptually based on crystallographic
ligand-binding modes. The probability to correctly prioritize docking
poses was expected to increase with the increase in the accuracy of
the reference information. However, given that docking into homology
models yields “double-hypothetical” ligand–target
complexes[29] with principally limited accuracy,
we were interested in investigating whether “fuzzy similarity”
between the modeled reference and database compounds might also suffice
to guide compound selection. Therefore, different strategies were
evaluated for ranking on the basis of fuzzy similarity, including
single or multiple reference poses. As reported herein, these strategies
produced higher compound recall for GPCR homology models than force
field energy scoring or PLIFs, thus further extending the potential
of model-based compound selection.
Methods
and Materials
Receptor Structures and
New Homology Models
An X-ray structure of the β2 adrenergic (β2) receptor bound to the antagonist
carazolol (PDB code 2RH1(30)) was taken from the Protein Data Bank.
The structure was
prepared for docking using Molecular Operating Environment (MOE) 2014.09.[31] Bound ions, organic solvent, and water molecules
were removed from the receptor ligand-binding domain used as a template
for docking. Other preparation steps included the addition of hydrogen
atoms, computation of protonation states and tautomers (calculated
at pH 7), assignment of partial charges, and limited energy minimization
(structural relaxation) using the Amber10 force field until a root
mean square (RMS) gradient of 0.1 kcal/mol/Å2 was
reached.Two homology models of the β2 receptor
were built using the structure of β1 adrenergic (β1) receptor (PDB code 4BVN)[32] from turkey (Meleagris gallopavo) and human adenosine A2A (A2A) receptor (PDB code 4EIY)[33] as templates,
respectively. In addition, a homology model of the A2A receptor
was generated using the β2 receptor, 2RH1, as a template.
The modeling protocol detailed in the following was consistently applied.Initial sequence alignments were generated with Clustal Omega.[34] For homology modeling, the alignments were manually
edited to appropriately place insertions and deletions in variable
regions. In each case, MODELLER version 9.17[35] was used to build 500 initial models, and the one with the most
favorable DOPE energy score was selected. Selected models were prepared
for docking as described above for the X-ray structure. A known antagonist
was flexibly docked into the binding site of each selected model to
refine binding site coordinates and provide a reference binding mode.
Ligands included ZM241385[33] (for A2A) and carazolol[30] (for β2). To adjust side chain positions of the residues in the modeled
binding sites in a ligand-assisted manner, the induced-fit docking
protocol of MOE[31] was applied, permitting
side chain flexibility within 6 Å of the placed ligand. In each
case, 100 of 100 000 initially generated docking poses were
retained and scored with the GBVI/WSA dG function implemented in MOE
using default parameter settings. As a reference for RMSD calculations,
crystallographic binding modes of reference ligands were transferred
into the models following α carbon atom superposition of the
X-ray structures and corresponding models.
External
GPCR Model
A homology model
of the 5-hydroxytryptamine 6 (5-HT6) receptor was extracted
from the GPCRdb.[36] This database includes
various models of human GPCRs generated by fragment assembly from
different templates for backbone construction combined with position-specific
rotamer side chain modeling.[37] The 5-HT6 receptor model was prepared for docking as described above.
Compound Sets
A benchmark set for
the A2A receptor was extracted from the DEKOIS 2.0.[38] The selected data set included 37 antagonists
and 1100 corresponding decoys. To focus the study on the ranking of
antagonists, the only three known agonists (BDB50085666, BDB50085668,
and BDB50309479) present in the benchmark set were removed together
with 100 decoys selected by the developers of the DEKOIS database
to match physicochemical properties of these agonists. Activity annotations
were confirmed on the basis of the corresponding BindingDB[39] records. For β2, no functional
designation (agonist or antagonist) was provided for a number of ligands
in DEKOIS. Therefore, 23 known antagonists were taken from the IUPHAR/BPS
Guide to Pharmacology database (GtoPdb).[40] These compounds were found to correspond to 15 different Bemis–Murcko
(BM) scaffolds, indicating structural diversity. For these antagonists,
1150 decoys were generated via the DUD-E web server.[41] For the external GPCR model, a set of 35 5-HT6 antagonists (containing 32 unique BM scaffolds) was extracted from
GtoPdb[40] and 2350 decoys were generated
using DUD-E.[41]For each compound,
an initial low-energy conformation was generated with MOE and protonation
states and partial charges were assigned on the basis of its AM1-BCC
implementation following a previously reported protocol,[42] which was also applied to prepare crystallographic
ligands for docking.
Docking and Scoring
All docking trials
were carried out using the Dock module of MOE.[31] The triangle matcher function was used to generate 1000
docking poses for each ligand, and the top 30 poses on the basis of
the London dG scoring function were preselected and further refined
by rescoring using the GBVI/WSA dG scoring function to produce final
ranking. In previous studies,[27,28] this combination of
the London dG and GBVI/WSA dG functions was the consistently best
performing force field-based scoring scheme.
Similarity
Calculations
Similarity
to reference binding modes was quantified using a property density
function-based 3D similarity measure[43] that
was consistently applied in our previous studies for 3D similarity
evaluation of docking poses.[27,28] In addition, interaction
similarity was assessed on the basis of the PLIF implementation of
MOE.[31] In the former case, normalized overlap
of property density functions (ranging from 0 to 1) was calculated
as a measure of 3D similarity.[43] Accordingly,
both conformational and translational differences (e.g., different
orientations in a binding site and/or positional displacements) were
taken into account. In the latter case, receptor–ligand contacts
were assigned to six categories of interactions including side chain-mediated
hydrogen bonds (donor and acceptor), backbone-mediated hydrogen bonds
(donor and acceptor), ionic interactions, and surface interactions.
PLIFs were calculated with default settings and compared using the
Tanimoto coefficient.[44] For each similarity
measure, compound rankings were calculated.The reference ligands
used for similarity calculations were overall characterized by relatively
low similarity to the docked antagonists. Only one to two antagonists
in each data set had a two-dimensional (2D) similarity value of greater
than 0.8 calculated using MACCS fingerprint compared to that of the
reference compounds.
Performance Evaluation
Receiver operator
characteristic (ROC) plots were generated to evaluate compound rankings.
ROC curves monitor the percentage of known active compounds retrieved
at each position of the ranking. The area under the ROC curve (AUC)
was calculated as a measure of the enrichment of active compounds
in a ranking applying the composite trapezoidal rule. AUC values of
0.5 correspond to the random distribution of active compounds and
decoys in rankings, whereas increasing AUC values greater than 0.5
further indicate increasing enrichment of active compounds at high-rank
positions. Accordingly, an AUC value of 1 would be produced by a ranking
in which all active compounds would be ranked higher than the best
scoring decoys. In addition, to specifically assess early enrichment
of active compounds, the enrichment factor for 10% of the ranked database
(Ef10%) was also computed.[45] The maximum theoretical Ef10% for all three data sets was 10.For the external
homology model, the performance was also assessed by calculating Ef1% values. The maximum theoretical Ef1% for this data set was 68.6. Furthermore,
Rocker[46] was used to calculate BEDROC[47] values with α = 20.0.
Results and Discussion
Study Concept
Previously, we have
ranked docking poses on the basis of 3D similarity to crystallographic
ligand-binding modes as an alternative to conventional force field
scoring.[27,28] These were the first attempts to calculate
3D similarity for compound ranking from docking screens, which represents,
by definition, a knowledge-based approach. The ability of 3D similarity
calculations to enrich active compounds at high-rank positions was
attributed to the use of well-defined experimental ligand-binding
modes as references that many active compounds were anticipated to
resemble. In fact, core fragments of crystallographic ligands were
in some instances already sufficient to effectively guide compound
ranking.[28] Herein, we have investigated
the question whether approximate ligand poses might also be useful.
For example, in cases in which no complex structures are available,
known active compounds might be docked into X-ray structures of targets
to provide reference poses. If modeled poses would at least be approximately
correct, they might serve as a surrogate or alternative for scoring.
We also reasoned that the use of approximate ligand-binding modes
might be particularly suitable if the docking template itself was
approximate, that is, a computational model instead of a refined X-ray
structure. Hence, we essentially asked the question whether the odds
of docking into approximate models might be further improved by adding
one or more hypothetical compound-binding modes as a basis for compound
ranking. However, propagating inaccuracies resulting from modeling
of targets and complexes might also compromise similarity-based ranking.
In any event, the use of double-hypothetical models inevitable introduces
fuzziness into 3D similarity assessment, which is not due to the similarity
calculations (which are the same as for crystallographic references)
but rather to the use of approximate reference states. Fuzziness of
these calculations might be further increased by using alternative
binding poses instead of a single one. Because the assessment of fuzzy
similarity for compound ranking was the main motivation for our current
study, the analysis was deliberately focused on homology models of
GPCRs, which continue to be popular docking targets. Owing to advances
in GPCR crystallography, homology modeling of GPCRs has experienced
a renaissance in recent years.
Docking
into the X-ray Structure of the β2 Receptor
As a reference calculation, 3D similarity
scoring was initially applied to the X-ray structure of the β2 receptor complexed to the antagonist carazolol. Three alternative
rankings were generated on the basis of 3D similarity to the bound
antagonist carazolol, PLIF-based similarity, and the preferred scoring
function. The results are summarized in Table and graphically represented in Figure A. Both scoring and
3D similarity calculations resulted in a high enrichment of known
antagonists with an AUC of 0.80 and 0.83, respectively, and an Ef10% of 6.09 for both methods. By contrast,
PLIF-based similarity only resulted in an AUC of 0.62. Figure shows that PLIF-based ranking
was very sensitive to correct posing. When docked antagonists departed
from the binding of carazolol, even if only in part, ligand–receptor
interaction details were modulated, thereby reducing PLIF similarity
and leading to low ranks. In contrast, 3D similarity calculations
were much more robust and yielded comparably high ranks for antagonists
as long as the docking poses were at least approximate and parts of
ligands correctly aligned. Importantly, 3D similarity calculations
quantify whole-molecule resemblance by comparing atomic property density
functions but are insensitive to interaction differences, which provided
an advantage in this case. For the A2A receptor in which
ligand–receptor key interactions were mainly formed by a large
rigid aromatic ring system, posing was more stable than observed for
the β2 receptor herein and central interactions often
conserved, leading to more successful PLIF-based ranking of known
antagonists A2A.[28] In the presence
of approximate poses, 3D similarity assessment was clearly more effective.
Table 1
Docking Screen of
the β2 Receptor Structurea
protocol
reference
pose
AUC
Ef10%
FF scoring
none
0.80
6.09
3D similarity
X-ray
0.83
6.09
PLIF similarity
X-ray
0.62
4.35
Compounds were docked into the β2 receptor structure (PDB code 2RH1) and ranked on the basis of force field
(FF) energy scoring and 3D similarity as well as the PLIF similarity
to the X-ray structure of a bound ligand. Recall of active compounds
is reported. Results for the best performing methods are in bold.
Figure 1
ROC plots
for alternative ranking schemes. Orange curves represent
the results for the London dG scoring function and yellow curves for
3D similarity to the crystallographic binding mode of carazolol (panel
A), a docking pose of carazolol (panels B and C), or a docking pose
of a fragment of ZM241385 (consisting of the triazolotriazine core
and the furan ring) (panel D). Green curves represent the results
for PLIF-based compound ranking by using the same reference poses
as before, and the blue lines provide a reference for random compound
selection.
Figure 2
3D similarity vs PLIFs. Superposition of co-crystallized
carazolol
(green) and four representative docked antagonists (magenta) are shown.
In panel A, propranolol essentially matches all structural features
of carazolol and hence forms comparable interactions with the receptor.
In panel B, NIP also aligns well with carazolol, but the protonated
amine is too far away from Asp113 for forming a salt bridge. In panel
C, a salt bridge between CGP 12177 and Asp113 is present, but the
hydroxyl group is displaced and the aromatic ring system is found
in a head-to-tail orientation compared to that of carazolol. In panel
D, metoprolol only partly overlaps with carazolol and does not form
well-defined interactions. For each antagonist, 3D similarity scores
and percentage rank positions (in parentheses) are reported.
ROC plots
for alternative ranking schemes. Orange curves represent
the results for the London dG scoring function and yellow curves for
3D similarity to the crystallographic binding mode of carazolol (panel
A), a docking pose of carazolol (panels B and C), or a docking pose
of a fragment of ZM241385 (consisting of the triazolotriazine core
and the furan ring) (panel D). Green curves represent the results
for PLIF-based compound ranking by using the same reference poses
as before, and the blue lines provide a reference for random compound
selection.3D similarity vs PLIFs. Superposition of co-crystallized
carazolol
(green) and four representative docked antagonists (magenta) are shown.
In panel A, propranolol essentially matches all structural features
of carazolol and hence forms comparable interactions with the receptor.
In panel B, NIP also aligns well with carazolol, but the protonated
amine is too far away from Asp113 for forming a salt bridge. In panel
C, a salt bridge between CGP 12177 and Asp113 is present, but the
hydroxyl group is displaced and the aromatic ring system is found
in a head-to-tail orientation compared to that of carazolol. In panel
D, metoprolol only partly overlaps with carazolol and does not form
well-defined interactions. For each antagonist, 3D similarity scores
and percentage rank positions (in parentheses) are reported.Compounds were docked into the β2 receptor structure (PDB code 2RH1) and ranked on the basis of force field
(FF) energy scoring and 3D similarity as well as the PLIF similarity
to the X-ray structure of a bound ligand. Recall of active compounds
is reported. Results for the best performing methods are in bold.
Docking
into Homology Models
For
the next step, three homology models were built, including two of
the β2 and one of the A2A receptor (Table ). The two β2 models were constructed to represent different accuracy levels.
The first model was built using the structure of the β1 adrenergic (β1) receptor from turkey as a template.
This model is referred to as β2(β1). The β1 template and β2 receptors
shared high sequence identity and conserved binding site residues,
yielding an overall accurate model (RMSD 1.72 Å; magenta in Figure A). The alternative
model was obtained using the structure of the human A2A receptor as a template, referred to as β2(A2A). Although β1, β2, and
A2A belonged to class A (rhodopsin-like) GPCRs, β2 shared much lower sequence identity with A2A than
with β1 (Table ). The β2(A2A) model was
thus overall less accurate (RMSD of 3.06 Å; orange in Figure A), including the
ligand binding that was only partly conserved in β1 and A2A (Figure B). In addition, an approximate homology model of the A2A receptor was also generated using the β2 receptor as a template, referred to as A2A(β2) (RMSD 3.62 Å; pink in Figure C). Overall, A2A(β2) was the least accurate homology model, and binding site accuracy
of this model was also only limited (Figure D), as expected.
Table 2
In-House Homology Modelsa
model
template
sequence
identity (%)
template PDB code
template resol. (Å)
model RMSD (Å)
pocket
RMSD (Å)
docked ligand RMSD (Å)
DOPE score
β2
β1
52.9
4BVN
2.1
1.7
0.5
1.2
–41 854.8
β2
A2A
30.4
4EIY
1.8
3.1
2.3
5.7
–39 094.1
A2A
β2
30.4
2RH1
2.4
3.6
3.6
1.8
–38 530.9
For two GPCRs with known X-ray structures,
homology models were generated using different templates. All template
structures were from Homo sapiens,
except β1, which was from M. gallopavo. For templates and targets, sequence identity is reported. For template
structures, the crystallographic resolution is also given. In addition,
RMSD values are provided for comparison of each model with the corresponding
X-ray structure, residues forming the binding pocket, and docked ligands
and their crystallographic binding modes (after superposition of the
model and X-ray structure). DOPE scores computed by MODELLER to assess
model quality are also reported.
Figure 3
Structures and binding
modes. Panel A shows the superposition of
the X-ray structure of the β2 receptor (green) and
two homology models based on the β1 (magenta) and
the A2A (orange) receptor, respectively. In panel B, binding
site details are displayed including co-crystallized carazolol (green)
and modeled binding modes. Panel C shows the superposition of the
X-ray structure of the A2A receptor (cyan) and the β2 receptor-based model (pink). In panel D, binding site details
are shown including co-crystallized ZM241385 (cyan) and modeled binding
modes. Panel E shows the homology model of the 5-HT6 receptor
from GPCRdb. In panel F, binding site details are displayed including
three different docking poses of Ro 04-6790.
Structures and binding
modes. Panel A shows the superposition of
the X-ray structure of the β2 receptor (green) and
two homology models based on the β1 (magenta) and
the A2A (orange) receptor, respectively. In panel B, binding
site details are displayed including co-crystallized carazolol (green)
and modeled binding modes. Panel C shows the superposition of the
X-ray structure of the A2A receptor (cyan) and the β2 receptor-based model (pink). In panel D, binding site details
are shown including co-crystallized ZM241385 (cyan) and modeled binding
modes. Panel E shows the homology model of the 5-HT6 receptor
from GPCRdb. In panel F, binding site details are displayed including
three different docking poses of Ro 04-6790.For two GPCRs with known X-ray structures,
homology models were generated using different templates. All template
structures were from Homo sapiens,
except β1, which was from M. gallopavo. For templates and targets, sequence identity is reported. For template
structures, the crystallographic resolution is also given. In addition,
RMSD values are provided for comparison of each model with the corresponding
X-ray structure, residues forming the binding pocket, and docked ligands
and their crystallographic binding modes (after superposition of the
model and X-ray structure). DOPE scores computed by MODELLER to assess
model quality are also reported.Induced-fit docking was used to generate reference binding modes
of carazolol for the β2 models and of ZM241385 for
the A2A models. For β2, the accuracy of
the modeled binding modes of carazolol correlated with the accuracy
of the models. Thus, in β2(β1),
the pose was close to the experimental binding mode (RSMD 1.2 Å;
magenta in Figure B), whereas the pose was displaced in β2(A2A) with only partial structural overlap (RMSD 5.7 Å; orange in Figure B). For ZM241385A,
a pose was obtained that also displayed a partial displacement but
was overall well aligned with the experimental binding mode (RMSD
1.8 Å; Figure D). In the case of β2(β1) and β2(A2A), the pose with the lowest RMSD value generated
by the induced-fit docking protocol did not correspond to the one
with the best score that was selected as a reference. A pose with
an RMSD value of 0.5 Å was obtained for β2(β1) and another with an RMSD of 3.5 Å for β2(A2A). For consistency with our validation study, these
poses were not selected as a reference.For the three homology
models, docking screens were carried out
using specifically assembled sets of antagonists and decoys. For β2(β1) and β2(A2A), AUC values of 0.69 and 0.61 were obtained, respectively (Table ), whereas for A2A(β2), the least accurate model, the recall
of known antagonists on the basis of scoring, was close to a random
selection (Table ).
Hence, compound recall for homology models was lower than for the
corresponding X-ray structure and dependent on the accuracy of the
models, consistent with our expectation.
Table 3
Docking
Screens of In-House Homology
Modelsa
AUC
Ef10%
protocol
reference pose
β2(β1)
β2(A2A)
A2A(β2)
β2(β1)
β2(A2A)
A2A(β2)
FF scoring
none
0.69
0.61
0.49
3.48
1.74
1.08
3D similarity
Ind. Fit
0.70
0.79
0.65
5.22
4.35
2.97
Top 3 S.
0.69
0.79
0.50
5.00
4.09
0.83
Top 3 Dif.
0.82
0.76
0.76
5.45
3.18
4.72
PLIF similarity
Ind. Fit
0.63
0.47
0.40
3.92
1.74
0.27
Top 3 S.
0.60
0.43
0.52
3.18
1.36
1.94
Top 3 Dif.
0.62
0.50
0.53
1.82
2.73
1.67
Reported is the
recall performance
for docking into different homology models using alternative ranking
schemes. For 3D and PLIF similarity, three different reference pose
schemes are evaluated. “Ind. Fit” stands for induced
fit, “Top 3 S.” refers to the three top-scoring ligand
docking poses, and “Top 3 Dif.” to three dissimilar
docking poses. Results for the best performing methods are in bold.
Reported is the
recall performance
for docking into different homology models using alternative ranking
schemes. For 3D and PLIF similarity, three different reference pose
schemes are evaluated. “Ind. Fit” stands for induced
fit, “Top 3 S.” refers to the three top-scoring ligand
docking poses, and “Top 3 Dif.” to three dissimilar
docking poses. Results for the best performing methods are in bold.Next, docking trials were carried
out using the induced-fit docking
poses of carazolol and ZM241385 described above as reference binding
modes. In the case of ZM241385, only the core fragment composed by
the triazolotriazine core and furan ring was considered, which dominated
the 3D similarity calculations on X-ray templates.[28] For all three models, an increase in compound recall and
early enrichment was observed for 3D similarity ranking compared to
that of scoring, as reported in Figure B–D and Table .For the β2(β1) and β2(A2A) models, the use of induced-fit
docking poses
as a reference for 3D similarity calculations resulted in AUC values
of 0.70 and 0.79, respectively, and even for A2A(β2) a value of 0.65, although scoring was in this case not better
than random selection. By contrast, PLIF scoring only produced an
enrichment for β2(β1), with an AUC
value of 0.63, but was below random selection in the other cases,
which further illustrated the difficulties to score approximate interactions;
a ranking strategy we consider unsuitable for models. In contrast,
3D similarity calculations were successful. Even ligands displaced
in homology models relative to their crystallographic binding modes,
as shown in Figure , were sufficient to guide compound selection on the basis of 3D
similarity and achieve better enrichment than energy scoring. Thus,
as long as the orientation of a reference ligand within the binding
site was at least approximately correct, 3D similarity calculations
were robust and tolerant to limited inaccuracies when using homology
models and hypothetical complexes.
Multiple
Reference Poses
In light
of the positive results obtained for individual ligand poses, we also
asked the question whether the use of multiple poses might further
improve compound ranking. Taking inherent accuracy limitations into
account, multiple poses further increase the degree of fuzziness underlying
similarity calculations. If multiple poses were used, it would be
possible to select for each docked compound the pose yielding the
highest 3D similarity values and use this value for compound ranking.
This scenario corresponded to a nearest-neighbor (1-NN) search in
the pose space. Accordingly, it might be possible to balance inaccuracies
associated with single putative binding modes.To test this
hypothesis, preferred binding modes were collected for the reference
ligands carazolol and ZM241385 using rigid receptor docking into the
models and two alternative sets of three poses each were assembled
as references for 1-NN calculations. The first set consisted of the
three best-scoring binding modes (Top 3 S.), regardless of their structural
relationships, and the second set consisted of the three poses with
the largest RMSD values among the precomputed binding modes (Top 3
Dif.) The idea underlying the generation of the second set was further
increasing binding site coverage with poses for 1-NN calculations.Ranking by 3D similarity to these pose sets produced different
results. Whereas the use of the Top 3 S. sets did not lead to a further
improvement in compound ranking compared to that of the individual
ligand-binding modes, the use of Top 3 Dif. sets resulted in a notable
improvement for β2(β1) and A2A(β2), as reported in Figure A–C and Table . For β2(A2A),
recall performance remained essentially constant when using single
or multiple ligand poses. For β2(β1) and A2A(β2), large AUC values of 0.82
and 0.76 were obtained, respectively, together with large early enrichment
factors (5.45 and 4.72, respectively). Hence, overall, the use of
Top 3 Dif. pose sets was the preferred strategy for compound ranking.These results suggest that an enhanced conformational sampling
of reference poses may be beneficial for compound ranking. A potential
extension of this approach may include the generation of reference
poses via molecular dynamics or Monte Carlo simulations. This would
also enable sampling different conformations of both the ligand and
the receptor, thus partly accounting for protein flexibility.
Docking Screen of an External Model
To further assess
the use of multiple ligand-binding modes for compound
ranking, we also applied the preferred Top 3 Dif. strategy to an externally
derived homology model of a GPCR. For this purpose, a model of the
5-HT6 receptor was selected (Figure E), another topical drug target.[48,49] Structures of two other subtypes of the 5-HT receptor, including
5-HT1B and 5-HT2B, were reported in complex
with agonists,[50,51] but the structure of 5-HT6 is currently unknown. In addition, compared with our in-house
homology models, the 5-HT6 model was generated from multiple
templates using a different computational protocol.[37] Given the absence of an X-ray structure, the accuracy of
this model remained unknown.For 5-HT6, a set of
35 antagonists and 2350 decoys was assembled and used for docking
into the putative binding pocket of the model. Reference binding modes
were obtained by generating Top 3 Dif. docking poses of the known
antagonist, Ro 04-6790 (Figure F), one of the first 5-HT6 selective antagonists
that was discovered.[52] Three-dimensional
similarity-based ranking resulted in an AUC of 0.75 and Ef10% of 3.53 (Table ). As a control, force field-based scoring performed
surprisingly well for this model (in fact, better than that observed
for the other GPCR models), with an AUC of 0.77 and Ef10% of 3.14, hence yielding overall comparable results
(Table ). This trend
was reflected by the equivalent BEDROC values measured for the two
approaches (Table ). Nonetheless, 3D similarity calculations resulted in higher enrichment
of known antagonists within the first third of the ranking (Figure D).
Table 4
Docking Screen of
the External Homology
Model of the 5-HT6 Receptora
protocol
reference pose
AUC
Ef1%
Ef10%
BEDROC
FF scoring
none
0.77
5.71
3.14
0.22
3D similarity
Top 3 Dif.
0.75
5.88
3.53
0.22
Recall performance
is compared for
docking into the homology model of the 5-HT6 receptor on
the basis of best force field energy scoring and the preferred “Top
3 Dif.” 3D similarity pose strategy. Ef1% and BEDROC (α = 20.0) are reported as an additional
performance assessment. Results for the best performing methods are
in bold.
Figure 4
ROC plots for ranking
based on similarity to three different docking
poses. Yellow curves represent the results for compound ranking on
the basis of 3D similarity calculations relative to three different
docking poses of carazolol (panels A and B), a fragment of ZM241385
(consisting of the triazolotriazine core and the furan ring; panel
C), or Ro 04-6790 (panel D). For comparison, green curves represent
the results for PLIF-based compound ranking by using the same reference
poses as before and orange curves the results for the London dG scoring
function. The blue lines provide a reference for random compound selection.
ROC plots for ranking
based on similarity to three different docking
poses. Yellow curves represent the results for compound ranking on
the basis of 3D similarity calculations relative to three different
docking poses of carazolol (panels A and B), a fragment of ZM241385
(consisting of the triazolotriazine core and the furan ring; panel
C), or Ro 04-6790 (panel D). For comparison, green curves represent
the results for PLIF-based compound ranking by using the same reference
poses as before and orange curves the results for the London dG scoring
function. The blue lines provide a reference for random compound selection.Recall performance
is compared for
docking into the homology model of the 5-HT6 receptor on
the basis of best force field energy scoring and the preferred “Top
3 Dif.” 3D similarity pose strategy. Ef1% and BEDROC (α = 20.0) are reported as an additional
performance assessment. Results for the best performing methods are
in bold.
Control
Calculations
As additional
control calculations, each of the 23 antagonists in the β2 data set was docked into the β2 X-ray structure
and homology models and used as a reference for single-pose 3D similarity
calculations. A consistent enrichment of active compounds was observed,
with average AUC values over all of the 23 trials ranging from 0.60
for β2(β1) to 0.78 for β2 X-ray structure. Thus, these calculations demonstrated that
3D similarity calculations did not depend on individual antagonists
used as references. However, inverting the orientations of reference
binding modes within the binding site in a “head-to-tail”
manner by 180° mostly abolished the enrichment of active compounds,
demonstrating that at least approximately correct ligand orientations
were essential for fuzzy 3D similarity calculations.A correlation
analysis was carried out between the 3D similarity measured for all
of the compounds docked into the β2 X-ray structure
by taking the crystallographic binding mode of carazolol as reference
and 2D similarity to the same ligand on the basis of ECFP4 fingerprints.
A low correlation coefficient (r = 0.29) was obtained.
This finding was consistent with previous studies, indicating frequent
low correlation of 3D and 2D ligand similarities.[27,43]
Conclusions
Scoring and compound ranking
continue to be limiting factors of
docking screens. While docking algorithms have become increasingly
effective over the years, it continues to be difficult to separate
active from inactive compounds in rankings. As a consequence, potential
alternatives to force field energy scoring are considered. Among these
is the calculation of 3D similarity of test compounds relative to
binding modes of known ligands, for which proof-of-principle has been
established previously. Although this approach has been firmly rooted
in X-ray crystallography to provide reference binding modes, we have
investigated herein whether it might be further extendable to docking
tasks wherein accuracy is principally limited, that is, when homology
models are used as docking templates. Despite accuracy limitations,
the use of homology models is relevant for the practice of structure-based
virtual screening, especially when GPCRs are targeted. Perhaps provocatively,
we have asked the question whether putative reference binding modes
placed into homology models might be useful to guide compound ranking
and designed a study to investigate this question in detail. Therefore,
the approach reported herein deliberately introduced fuzziness into
3D similarity assessment at the level of modeled binding modes. Success
or failure of such calculations was essentially unpredictable. However,
for different GPCR models, our analysis revealed that fuzzy 3D similarity
calculations were indeed capable of further enriching active compounds
at high-ranking positions compared to that of other scoring schemes,
as long as modeled binding modes and orientations were at least approximately
correct. Moreover, ensembles of multiple structurally diverse poses
in combination with the nearest-neighbor similarity calculations were
overall more effective in compound ranking than single binding modes,
which further increased the fuzziness of the approach. Taken together,
the findings reported herein indicate that fuzzy binding mode resemblance
can be successfully exploited in docking screens, even in instances
in which no crystallographic information is available, and suggest
the use of pose ensembles for compound selection from GPCR homology
models as an alternative to scoring.