Harrison J Hocker1, Nandini Rambahal, Alemayehu A Gorfe. 1. Department of Integrative Biology and Pharmacology, The University of Texas Health Science Center at Houston , 6431 Fannin Street MSB 4.108, Houston, Texas 77030, United States.
Abstract
Incorporation of receptor flexibility into computational drug discovery through the relaxed complex scheme is well suited for screening against a single binding site. In the absence of a known pocket or if there are multiple potential binding sites, it may be necessary to do docking against the entire surface of the target (global docking). However no suitable and easy-to-use tool is currently available to rank global docking results based on the preference of a ligand for a given binding site. We have developed a protocol, termed LIBSA for LIgand Binding Specificity Analysis, that analyzes multiple docked poses against a single or ensemble of receptor conformations and returns a metric for the relative binding to a specific region of interest. By using novel filtering algorithms and the signal-to-noise ratio (SNR), the relative ligand-binding frequency at different pockets can be calculated and compared quantitatively. Ligands can then be triaged by their tendency to bind to a site instead of ranking by affinity alone. The method thus facilitates screening libraries of ligand cores against a large library of receptor conformations without prior knowledge of specific pockets, which is especially useful to search for hits that selectively target a particular site. We demonstrate the utility of LIBSA by showing that it correctly identifies known ligand binding sites and predicts the relative preference of a set of related ligands for different pockets on the same receptor.
Incorporation of receptor flexibility into computational drug discovery through the relaxed complex scheme is well suited for screening against a single binding site. In the absence of a known pocket or if there are multiple potential binding sites, it may be necessary to do docking against the entire surface of the target (global docking). However no suitable and easy-to-use tool is currently available to rank global docking results based on the preference of a ligand for a given binding site. We have developed a protocol, termed LIBSA for LIgand Binding Specificity Analysis, that analyzes multiple docked poses against a single or ensemble of receptor conformations and returns a metric for the relative binding to a specific region of interest. By using novel filtering algorithms and the signal-to-noise ratio (SNR), the relative ligand-binding frequency at different pockets can be calculated and compared quantitatively. Ligands can then be triaged by their tendency to bind to a site instead of ranking by affinity alone. The method thus facilitates screening libraries of ligand cores against a large library of receptor conformations without prior knowledge of specific pockets, which is especially useful to search for hits that selectively target a particular site. We demonstrate the utility of LIBSA by showing that it correctly identifies known ligand binding sites and predicts the relative preference of a set of related ligands for different pockets on the same receptor.
In recent decades,
the number of drugs brought to market has been
declining, and new low-cost methods for drug discovery are needed.[1] While virtual screening is able to reduce the
total number of ligands that need to be synthesized and screened experimentally,[2] it requires prior knowledge of the target site.[3,4] However, some of the most effective compounds on the market bind
to allosteric sites,[5] and these sites are
not always apparent in average structures from X-ray crystallography
or nuclear magnetic resonance spectroscopy (NMR).[6] There are several computational techniques for binding
site identification, including FTMap,[7] blind
docking,[8,9] solvent mapping,[10] and other simulation-based methods.[11] Once the sites are determined, virtual screening (VS) can be performed
to find small-molecule ligands that have the potential to become hits.[12] Typically, VS is followed by experimental assays
of the highest affinity compounds. Although the application of any
of these methods to one or a few conformations of a receptor is fairly
straightforward, there is no simple way to find a consensus-binding
site on a large number of structures or to prioritize hits by preference
to a particular binding site. This is important because ligand–receptor
affinity is a property of an ensemble,[13] and therefore incorporation of target flexibility is crucial for
success.[14] Moreover, almost all docking
methods rank ligands based on predicted binding affinities,[15] despite examples of low-affinity hits having
led to potent bioactive compounds.[16] We
have developed a technique called LIgand Binding Specificity Analysis
(LIBSA) that quantifies (and prioritizes by) pocket-specificity following
a search for allosteric ligand binding sites over an ensemble of receptor
conformations.LIBSA uses a small molecule or fragment as a
probe to search for
allosteric binding sites on the surface of a protein. This can be
done by popular docking programs such as Autodock,[17] which in principle are capable of finding the correct binding
site for the right ligand when the search area covers the entire protein
surface.[9,18] However, this approach is likely to yield
random docked poses. These poses typically have a low probability
of occurring and thus should be filtered out. We introduce two techniques
to remove such poses: a high-pass filter[19] and an algorithm that filters out ligand poses based on their frequency
of occurrence. The latter favors poses with high affinity, and the
high-pass filter[19] removes false-positive
hits by scaling down ligand–receptor contacts that fall below
a threshold value. We then use a signal-to-noise ratio (SNR),[20] defined as the relative frequency of a ligand
contacting a particular region of the protein versus all other regions,
to quantify binding specificity. In sum, the general procedure of
LIBSA entails probing a receptor conformation with a ligand, determining
the frequency of the different protein–ligand contacts, filtering
out random poses, and then quantifying the consistency of binding
with the SNR (Figure 1).
Figure 1
Schematic outline of
the LIBSA protocol for the determination of
ligand binding preference to allosteric sites on a single or ensemble
of receptor conformations. (Left) Workflow of LIBSA. (Right top) Optional
preprocessing to group protein conformers based on clustering or other
methods. (Right upper middle) Depiction of several Ras conformations
bound to theoretical probe ligands shown as cyan spheres. (Right lower
middle) Contact spectrum generated from the probe ligands where the
red peaks represent the noise and the green peaks represent the signal.
A filtered spectrum is shown as an inset. (Right bottom) Table summarizing
the hypothetical SNRs calculated from the contact spectrum, where
pocket A is defined by the green peaks and pockets B and C lie within
the red peaks.
Schematic outline of
the LIBSA protocol for the determination of
ligand binding preference to allosteric sites on a single or ensemble
of receptor conformations. (Left) Workflow of LIBSA. (Right top) Optional
preprocessing to group protein conformers based on clustering or other
methods. (Right upper middle) Depiction of several Ras conformations
bound to theoretical probe ligands shown as cyan spheres. (Right lower
middle) Contact spectrum generated from the probe ligands where the
red peaks represent the noise and the green peaks represent the signal.
A filtered spectrum is shown as an inset. (Right bottom) Table summarizing
the hypothetical SNRs calculated from the contact spectrum, where
pocket A is defined by the green peaks and pockets B and C lie within
the red peaks.As an initial test, we
used LIBSA to correctly identify the active
site on a set of receptors cocrystallized with small molecule drugs,
including a group of kinase domains,[21] nuclear
receptors,[22−24] the β2AR,[25] HIV-protease,[26] and K-Ras.[27,28] We then applied
LIBSA to rank four related compounds by binding site preference when
docked onto an ensemble of 148 experimental and simulated Ras structures.
Materials
and Methods
Generating Library of Receptor Conformers for Docking
We applied LIBSA on two sets of structural ensembles. The first involved
10 protein–ligand complexes from the Protein Data Bank (PDB)
representing five different protein families (G-proteins, G-protein
coupled receptors, kinases, nuclear receptors, and proteases). In
each case, the cocrystallized ligand was removed and used as a probe
for blind docking to search for the binding site on the cocrystal
conformation (Table 1) and, when available,
the corresponding apo structure (Table 2). The second ensemble containing 148 wild type and mutant
H- and K-Ras conformers was derived from molecular dynamics (MD) simulations
(80), crystallography (66), and NMR (2 averaged and energy-minimized
structures). The MD simulations were conducted as described in the SI and representative structures were identified
as follows. 10 ps-separated snapshots from trajectories were clustered
using Cα atom positions with the leader-RMSD-based algorithm
implemented in Wordom.[29] The RMSD cutoff
for clustering ranged from 1.3 to 1.8 Å so that the top 10 clusters
contained 92–99% of the conformations in each trajectory (see
the SI). The resulting 80 cluster centroids
were taken as representative conformations of the large phase space
sampled by the simulations. For an additional and more global characterization
of the ensemble, we regrouped the structures into five clusters in
principal component (PC) space with the K-means algorithm.[30]
Table 1
Performance of LIBSA
on Known Protein–Ligand
Complexesa
SNR
ligand
receptor
PDB ID
box vol. (Å3)
Pop (S&W)
#
Lig Tors
RMSD (Å)
no filter
affinity filter
Nilotinib
p38 MAPK
3GP0
2.1e5
225 (0.25)
7
0.72
1.12
1.15
Gleevec
c-Abl Kin.
1IEP
1.9e5
225 (0.25)
7
1.34
1.42
1.43
Tamoxifen
ERα
3ERT
2.2e5
150 (0.10)
10
1.46
2.06
3.75
Raloxifene
ERα
2QXS
2.1e5
150 (0.10)
9
0.95
1.50
1.76
estradiol
ERα
1QKT
2.1e5
150 (0.10)
2
1.01
3.64
3.64
BI-167107
β2AR
3SN6
2.5e5
225 (0.25)
8
1.48
1.15
1.07
Indinavir
HIV Pr.
1HSG
1.3e5
225 (0.25)
14
1.41
0.96
0.94
benzimidazole
K-Ras
4DSU
1.3e5
150 (0.10)
0
0.94
1.10
3.48
0QV
K-Ras
4EPW
1.1e5
150 (0.10)
3
0.74
1.17
3.54
0QW
K-Ras
4EPT
1.0e5
150 (0.10)
3
0.71
1.24
1.74
Selected protein–ligand complexes
from the protein databank were used to illustrate the ability of LIBSA
to identify the active site and ligand pose. SNR values were calculated
with the affinity filter and without any filter. The AutoDock ligand
population size (Pop), probability of performing a Solis and Wets
(S&W) local energy minimization, rotatable bonds in the ligand
(Lig. Tors), volume of the search box, and the lowest docked RMSD
relative to the crystal pose are listed. Kin. = kinase, Pr. = protease.
Table 2
LIBSA Analysis of
Test Ligands Docked
onto Crystal and MD-Derived apo Structuresa
ligand
receptor
structure
box vol. (Å3)
Pop (S&W)
#Lig. Tors
SNR
Docking onto an apo Structure
with Open or Semiopen
Binding Site
estradiol
ERα
2B23
1.5e5
150 (0.10)
2
3.61
0QW
H-Ras
2CL0, 2RGB, MD
1.1–1.2e5
150 (0.10)
3
1.11, 1.03, 0.55
BZI
H-Ras
2CL0, 2RGB, MD
1.1–1.2e5
150 (0.10)
0
1.05, 1.00, 1.58
0QV
H-Ras
2CL0, 2RGB, MD
1.1–1.2e5
150 (0.10)
3
0.87, 0.72, 0.86
Gleevec
p38 MAPK
1WFC
2.5e5
225 (0.25)
7
0.75
Nilotinib
p38 MAPK
1WFC
2.5e5
225 (0.25)
7
0.71
Docking onto an apo Structure
with Occluded or
Small Binding Pocket
BI-1667107
β2AR
2R4S
1.9e6
225 (0.25)
8
–0.04
Raloxifene
ERα
2B23
1.5e5
150 (0.10)
9
–0.16
Tamoxifen
ERα
2B23
1.5e5
150 (0.10)
10
–0.17
Global
Cross Docking
Nilotinib
c-Abl Kin.
1IEP
1.9e5
225 (0.25)
7
1.05
Gleevec
p38 MAPK
3GP0
2.1e5
225 (0.25)
7
1.05
Because the available Ras ligands
are reportedly nonspecific for the highly homologous Ras isoforms,[27,28] for a stringent test of LIBSA we docked ligands solved with the
K-Ras isoform on two PDB and an MD apo structure
of H-Ras. These structures were chosen because they possess a visually
discernible pocket that is similar to that in the ligand-bound K-Ras
PDB structures 4DSU, 4EPT, or 4EPW.
Selected protein–ligand complexes
from the protein databank were used to illustrate the ability of LIBSA
to identify the active site and ligand pose. SNR values were calculated
with the affinity filter and without any filter. The AutoDock ligand
population size (Pop), probability of performing a Solis and Wets
(S&W) local energy minimization, rotatable bonds in the ligand
(Lig. Tors), volume of the search box, and the lowest docked RMSD
relative to the crystal pose are listed. Kin. = kinase, Pr. = protease.Because the available Ras ligands
are reportedly nonspecific for the highly homologous Ras isoforms,[27,28] for a stringent test of LIBSA we docked ligands solved with the
K-Ras isoform on two PDB and an MD apo structure
of H-Ras. These structures were chosen because they possess a visually
discernible pocket that is similar to that in the ligand-bound K-Ras
PDB structures 4DSU, 4EPT, or 4EPW.
Molecular Docking
The proteins and
ligands were prepared
for docking by first removing ions and water molecules and building
missing protein atoms using the CHARMM27 force field[31] and VMD;[32] missing nucleotide
hydrogen atoms were built using AutoDock Tools[17] or OpenBabel.[33] The structure
of ligands andrographolide (AGP), 3,19-(2-bromobenzylidene) andrographolide
(SRJ09), 3,19-(3-bromobenzylidene) andrographolide (SRJ10), and 3,19-(3-chloro-4-fluorobenzylidene)
andrographolide (SRJ23)[8] were prepared
using the CHARMM generalized force field (CGenFF36),[34] as described before.[8] Then,
nonpolar hydrogen atoms were condensed into their respective heavy
atoms, and atomic charges were assigned using the Gasteiger–Marsili
method.[35] “Blind docking”[18] was carried out with AutoDock 4.2[17] with the aid of the AutoDock tools (ADT) package,[17] keeping the ligand flexible and the receptor
rigid. The search space was a cubic box covering the entire surface
and centered on the geometric center of the protein, extending 10
Å beyond the protein edge in each direction. The termination
criterion for each docking run was set to be either 10,000 LGA generations
or 109 energy evaluations, whichever comes first. The genetic
algorithm (GA) population size and the Solis and Wets local search
probability were optimized for each system depending on the volume
of the search area, the accessibility of the pocket, and the number
of rotatable bonds in the ligand. Additional details of the docking
parameters are listed in Tables 1 and 2.
Affinity Filtering: Noise Reduction Using
Docking Scores
It is well-known that ranking by predicted
affinity is a major source
of false positives (error of ∼2 kcal/mol),[17] and currently there is no simple way to identify potential
hits with low predicted affinity. Therefore, we have developed an
approach that favors binding consistency over affinity by putting
more weight on the frequency of occurrence of a particular docking
score rather than the magnitude of the score. A key concept behind
affinity filtering is that docking scores can be used as a metric
for pose uniqueness and not just ranking. The steps for performing
the filtering are as follows: (i) Generate a histogram of the binding
scores, which we refer to as an affinity spectrum, from the AutoDock DLG file (in kcal/mol). (Since AutoDock gives
the scores with a precision out to the hundredth decimal we used a
relatively fine binning width of 0.05 kcal/mol.) (ii) Identify the
peaks that represent frequently sampled affinity space based on a
simple cutoff relative to the maximum value within each spectrum.
We found that a threshold of 40% of the maximum peak value yields
the best compromise between elimination of false positives and retention
of alternative poses at a given site. We therefore collected all peaks
whose height is ≥40% of the maximum peak height and referred
to them as explicit peaks. (iii) To account for the
fact that scoring functions are error prone, we added all peaks whose
AutoDock score is within a certain percentage of that of the explicit
peaks, which we call auxiliary peaks (i.e., peaks
in neighboring bins each side of the explicit peaks). For example,
using a sampling window of 1% (as used throughout this paper) and
a hypothetical explicit peak with an average energy score of −10.0
kcal/mol in a spectrum of bin width 0.05 kcal/mol, we include the
peaks in the first 2 bins each side of the explicit peak ((|(−10.0
× 0.01)|)/0.05) = 2). (iv) Write out the structures that gave
rise to the explicit and auxiliary peaks; this requires keeping track
of each docked conformation in step (i).This procedure effectively
eliminates low frequency high affinity poses while assigning more
weight on the high frequency high affinity poses. Moreover, poses
that are similar in affinity to the high frequency ones but appear
less frequently can be captured through the incorporation of auxiliary
peaks around the explicit peaks.
Binding Site Identification
and Scoring with SNR
The
affinity filtering described above identifies docked poses that are
well sampled in affinity space irrespective of their binding site
on the receptor. To identify a ligand-binding site, we use histograms
of the frequency with which a ligand contacts residues on the receptor,
followed by an analysis of the signal-to-noise ratio (SNR, see below).
This entails two simple steps. First, generate a histogram of residue
contact frequencies by counting the number of times any heavy atom
of a residue lies within 4.0 Å of any ligand heavy atom during
repeated docking runs. In this histogram, which we refer to as a contact spectrum, each bin corresponds to a single residue,
and for each ligand there can be as many contact spectra as there
are receptor conformations. Second, define a surface patch of interest
(e.g., from the dominant peaks in the contact spectrum or from prior
knowledge about the biochemical/structural features of the protein).
Then, the binding preference of a ligand to the patch is quantified
by SNR (eq 1), where signal refers to peaks
that lie within the binding patch and noise denotes the peaks that
lie outside the binding patch:Here u refers to the peak height at residue i and is indexed over all the residues that lie within the
binding patch (total number = Nsignal),
and u is the peak height
at residue j indexed over all the residues that lie
outside the binding patch (total number = Nnoise). Since the function is divergent when the noise term is zero, we
arbitrarily set the noise floor to 0.0001; SNR is set to 0.0 when
the signal is zero.This approach allows for scanning the protein
surface for allosteric ligand binding sites, ranking ligands by their
preference for a given site, or establishing threshold SNR values
below which binding could be deemed nonspecific (see Results and Discussion). One caveat is that this procedure
cannot be easily automated for receptors whose ligand binding site(s)
is not well-defined. One solution could be to build a series of patches
over the entire surface and compare the final SNRs or use prior knowledge
about the target including interaction sites or sites of post-translational
modification (e.g., acetylation, phosphorylation, ubiquitination).
High-Pass Filter: Noise Reduction Based on Contact Frequency
Concepts from digital signal processing such as high-pass filters
can be used to directly remove docking noise from a contact spectrum
by treating it as a discrete signal. Here we use the Butterworth filter,[19] though other types of filters could also be
used. A fifth order Butterworth high-pass filter with a critical value
equal to the Golden ratio conjugate[36,37] (or golden
section taken as 0.618) efficiently removes less frequent contact
signals and allows for predominant peaks to be more easily identified.
The formulation we used is as followswhere u is the golden section, u is the input spectrum, and n is 5. Note
that in
this analysis, the contact spectra should all be internally normalized
to the maximum value in order to use a consistent cutoff across the
data. Moreover, using an appropriate cutoff is crucial because, if
the cutoff value is set too low, the noise will be amplified and if
it is set too high the signal is not properly enhanced. A contact
spectrum can be constructed from a single ligand–receptor pair
or by combining the results from multiple ligand–receptor pairs
into a single histogram. In either case, the general procedure involves
generating a contact spectrum as described in the previous section,
applying the high-pass filter and computing the SNR.Although
high-pass and affinity filters serve somewhat different purposes and
operate on different spectrum types, their end result is the same:
remove docking noise. Therefore, either can be used in many situations.
One advantage of applying a high-pass filter on contact spectra is
that it makes no assumption about the relationship between affinity
and ligand-binding residues (i.e., affinity is completely removed
from the postdocking analysis).
Consensus Ligand Binding
Site Identification with Ensemble SNR
In the context of docking,
the most notable difference between
NMR/X-ray crystallographic and MD data is the number of structures
involved. The former is typically one or several to dozens, while
the latter can be in the millions, which complicates the search for
emergent binding sites. Ensemble SNR, which is a simple extension
of the procedures discussed above, can help tackle this problem. The
key here is to cluster the receptor conformations into N groups (denoted
α, β, γ in Figure 1) based
on either global or local structural features, such as solvent accessible
surface area, principal components, pocket volume, backbone RMSD,
etc. Then a single contact spectrum can be built by combining the
data from all members of a cluster (e.g., α), which can then
be used to compute SNR for use in consensus-binding site identification.
Thus, this approach utilizes the results from multiple, independent
docking experiments and produces a single number that describes how
frequently a ligand binds to a given region. Such an ensemble-averaged
SNR can improve prediction as the results are not contingent upon
a single structure but on multiple structures.[14,38] Its major drawback lies in the assumption that the pocket is present
on most if not all of the structures within a cluster. It is thus
prudent to first examine individual SNRs to ensure that the desired
pocket is present in the majority of the cluster members before proceeding
to generating an ensemble-averaged SNR.
Results and Discussion
Validation of LIBSA
Blind Redocking of Cocrystallized Ligands
A set of
10 protein–ligand complexes (Figure 2a) was used to assess the ability of the method to reliably identify
known ligand binding sites. Table 1 shows that
calculation of SNR from the raw docking data yields values equal to
or greater than 1.0 for each system. While SNR > 0.0 (see eq 2) can be regarded as a successful identification
of the binding site, SNR ≥ 1.0 is even better because this
means that there are at least 10-times more hits in the active site
than elsewhere on the protein. Importantly, preceding the SNR calculations
by affinity filtering led to an even larger (i.e., better) SNR for
half of the test cases.
Figure 2
Global docking followed by Signal-to-Noise Ratio
(SNR) analysis
for our test cases. (a) Chemical structure of ligands used to evaluate
the performance of LIBSA on known protein–ligand complexes.
(b) Convergence of SNR values during docking (5–256 docked
poses without filters) to the value reported in Table 1 (see No filter column). (c) The cumulative variance of SNR
decreases as the number of poses increases. Estradiol, which has only
2 rotatable bonds, and Indinavir*, whose torsions were fixed, targeted
a single site and therefore yielded a constant SNR regardless of the
number of docked poses used. BZI = benzimidazole, 0QV = (4-hydroxypiperidin-1-yl)(1H-indol-3-yl)methanethione,
0QW = (2- hydroxyphenyl)(pyrrolidin-1-yl)methanethione.
Global docking followed by Signal-to-Noise Ratio
(SNR) analysis
for our test cases. (a) Chemical structure of ligands used to evaluate
the performance of LIBSA on known protein–ligand complexes.
(b) Convergence of SNR values during docking (5–256 docked
poses without filters) to the value reported in Table 1 (see No filter column). (c) The cumulative variance of SNR
decreases as the number of poses increases. Estradiol, which has only
2 rotatable bonds, and Indinavir*, whose torsions were fixed, targeted
a single site and therefore yielded a constant SNR regardless of the
number of docked poses used. BZI = benzimidazole, 0QV = (4-hydroxypiperidin-1-yl)(1H-indol-3-yl)methanethione,
0QW = (2- hydroxyphenyl)(pyrrolidin-1-yl)methanethione.Many docking algorithms including AutoDock are
sensitive to the
total number of rotatable bonds on the ligand.[39] It is therefore expected that the number of flexible torsions
will likewise affect postprocessing by SNR. Indeed, the smallest SNR
was found for Indinavir, which has the largest number of active torsions
(14). This is likely because it overwhelmed the search algorithm in
AutoDock, as suggested by the dramatically larger SNR (3.85) obtained
when the ligand is docked with all its torsions fixed to their crystal
structure values (see Indinavir* in Table 1 and Figure 2b). Therefore, the complexity
of the probe to be used in LIBSA will depend on the capability of
the program used for docking. Similarly, convergence of SNR should
depend on the number of docked poses generated, as shown by the profile
of the SNR calculated from contact spectra with 5–256 docked
poses (Figure 2b). Excluding estradiol and
Indinavir* whose SNR was invariant (Table S2), convergence was achieved (Figure 2b) and
the variance plateaued (Figure 2c) after about
100 runs, with the coefficient of variation being 0.01–0.03
for the additional 100–256 runs.Overall, LIBSA identified
the right binding site (Table 1 and Figure 2) and recovered
the correct ligand pose with reasonably small root-mean-square deviations
(RMSD) from the crystal poses (Table 1). Moreover,
affinity filtering increased the SNR for half of the ligand–protein
pairs without affecting the rest, supporting our expectation that
low probability poses are characteristic of docking noise.
Blind
Docking of Known Ligands on apo Structures
To
examine if LIBSA could identify the correct binding site on structures
solved without the probe ligand, we blind-docked estradiol on ERα,
Gleevec, and Nilotinib on p38 MAPK, BI-167107 on β2AR, and Tamoxifen
and Raloxifene on ERα. Indinavir was excluded due to its large
number of rotatable bonds, as discussed in the previous section. Of
these, visual analysis suggested that the ligand-binding pocket on
the apo structure of p38 is open and appears suitable
for binding, whereas that of ERα is small and β2AR’s
is closed. As shown in Table 2, LIBSA predicted
a positive binding preference of ∼0.7 for both p38 ligands
and 3.61 for the small ligand estradiol on ERα. In contrast,
the SNR is negative when the pocket is closed (β2AR) or the
ligand is too large to fit in the pocket (Tamoxifen/Raloxifene on
ERα). This result provides additional evidence that LIBSA can
discriminate between favorable and unfavorable binding.For
a further and more stringent test we docked three ligands that were
solved with K-Ras (0QW, 0QV, and BZI) onto two X-ray and one MD apo structures of the homologous protein H-Ras. The H-Ras
structures were chosen because they display a pocket similar to that
seen in the ligand-bound K-Ras. LIBSA yielded SNR values of ∼0.6–1.6
for these pairs (Table 2), showing that our
tools are robust and applicable to diverse problems. Finally, cross
docking of the chemically somewhat similar Gleevec and Nilotinib (Tanimoto
coefficient of 0.6)[40] on c-Abl and p38
kinases resulted in SNR = 1.05 for both. Thus LIBSA was able to recognize
the potential of ligands with similar chemical signatures to have
similar binding profiles.In sum, LIBSA was able to correctly
identify binding sites on apo structures derived
from crystallography or MD (Table 2), though
the success rate is somewhat smaller than
in “re-docking” (Table 1). Taken
together, these results highlight the potential of global docking
combined with our analytic tools to numerically describe ligand-binding
preference, identify putative binding sites, and filter out off target
poses.
Ranking Ligands and Classifying
Receptor Conformations
by Binding Specificity
Here we used an ensemble of 148 Ras
conformers (see Methods) and four ligands
(Figure 3a) from our previous work (AGP, SRJ09,
SRJ10, and SRJ23)[8] to illustrate how chemically
similar yet pharmacologically different[8,41] compounds
may exhibit distinct pocket and conformation preferences when analyzed
by LIBSA. Focusing only on four previously described sites,[9] we calculated SNR after removing nonspecific
hits by affinity filtering. The results (Table 3) show that the four compounds predominantly target p1 or p3a (SNR
> 0.0) and rarely visit pockets 2 and p3b. In fact, SNR ≥1.0
was obtained only for p1 and p3a. The distribution of the scores in
Figure 3b further shows that the SRJ compounds
target p1 on 70–80% of the conformers, whereas AGP targets
p1 and p3a on 40% and 60% of the conformers. Binding at p2 and p3b
is negligible. Thus, the four ligands can be ranked by their preference
for p1: SRJ10 > SRJ23 > SRJ09 > AGP or p3a: AGP ≫
SRJ10 ≈
SRJ09 > SRJ23. This shows that LIBSA can effectively identify binding
sites preferred by a specific ligand and prioritize ligands by their
preference for a given pocket.
Figure 3
LIBSA of four highly related ligands docked
onto an ensemble of
Ras conformers. (a) Chemical structure of Andrographolide, SRJ09,
SRJ10, and SRJ23. (b) Distribution of the hits with SNR >1.0 at
sites
p1, p2, p3a, and p3b, showing preference of the SRJ ligands for p1
(∼70–80%) and Andrographolide for p3a (58%). (c) PC
projection of the 148 Ras conformations used for docking (clustered
into 5 groups based on their first two PCs) as well as conformers
from a trajectory of p1-bound SRJ23-K-RasQ61H complex (gray shade).
Clusters 2 and 4, which overlay well with conformers of SRJ23-bound
K-RasQ61H, bind ligands at p1 according to the ensemble SNRs shown
in Figure 4.
Table 3
LIBSA-Predicted Ensemble-Average Binding
Preferences of AGP and Its Derivatives for Specific Pockets on Rasa
SNR
ligand
Ras cluster
p1
p2
p3a
p3b
1
–2.56
–2.24
3.80
0.53
2
0.83
0.13
0.63
–0.13
AGP
3
–1.27
–0.39
1.66
0.36
4
1.59
0.12
–3.54
–3.89
5
–1.39
–0.14
1.31
–0.22
1
–1.70
–2.25
1.67
0.68
2
1.07
0.03
–0.02
0.40
SRJ09
3
–0.30
–0.26
1.10
0.75
4
1.52
0.15
–4.17
–3.91
5
0.36
–0.43
0.73
0.44
1
–1.60
–1.95
2.65
0.91
2
1.15
0.13
–0.22
0.13
SRJ10
3
–0.40
–0.23
1.20
0.58
4
1.54
0.10
–3.81
–3.65
5
0.33
–0.29
0.70
0.13
1
–1.81
–2.04
1.75
0.82
2
1.47
0.29
–0.39
–0.29
SRJ23
3
–0.19
–0.06
0.98
–0.01
4
1.60
0.16
–5.00
–5.51
5
0.28
–0.26
0.77
–1.14
Receptor clusters were defined using
K-means clustering based on principal components (see text). SNR scores
were calculated after applying high-pass filter to the raw data. Based
on the definition of SNR (eq 2) and the benchmark
data in Tables 1 and 2, SNR ≥ 1.0 (highlighted in bold) implies high preference
for a given pocket, 0 < SNR < 1.0 (bold-italic) moderate preference
and SNR ≤ 0.0 not favored.
LIBSA of four highly related ligands docked
onto an ensemble of
Ras conformers. (a) Chemical structure of Andrographolide, SRJ09,
SRJ10, and SRJ23. (b) Distribution of the hits with SNR >1.0 at
sites
p1, p2, p3a, and p3b, showing preference of the SRJ ligands for p1
(∼70–80%) and Andrographolide for p3a (58%). (c) PC
projection of the 148 Ras conformations used for docking (clustered
into 5 groups based on their first two PCs) as well as conformers
from a trajectory of p1-bound SRJ23-K-RasQ61H complex (gray shade).
Clusters 2 and 4, which overlay well with conformers of SRJ23-bound
K-RasQ61H, bind ligands at p1 according to the ensemble SNRs shown
in Figure 4.
Figure 4
Ensemble contact spectra for Ras using SRJ23 as probe
(normalized
by the largest score). (a) The dominant peak in the ensemble-averaged
contact spectra of clusters 2 and 4 yielded a p1 SNR > 1.0 after
noise
removal by a high-pass filter. (b) The four large peaks in the ensemble-averaged
contact spectrum of cluster 1 yielded a p3a SNR > 1.0 after noise
removal by a high-pass filter. (c) The cluttered spectra for clusters
3 and 5 yielded insignificant SNRs even after noise removal due to
nonspecific binding. Insets in (a) and (b) illustrate binding to pockets
p1 and p3a using the surface of representative Ras structures colored
by contact spectrum (red-to-black: high to low binding probability).
Receptor clusters were defined using
K-means clustering based on principal components (see text). SNR scores
were calculated after applying high-pass filter to the raw data. Based
on the definition of SNR (eq 2) and the benchmark
data in Tables 1 and 2, SNR ≥ 1.0 (highlighted in bold) implies high preference
for a given pocket, 0 < SNR < 1.0 (bold-italic) moderate preference
and SNR ≤ 0.0 not favored.To check if AGP and its derivatives preferentially
(SNR ≥1.0)
hit a given site on the same set of conformers, we used the Jaccard
similarity coefficient, J, defined here as the ratio
between the number of conformers targeted by both ligands a and b (n) and the total number of conformers targeted by either ligand (n + n):Table 4 shows that
the rather similar SRJ compounds target the same pocket on the same
set of conformers about 50% of the time on average, whereas the chemically
more divergent AGP has less in common with the SRJs in terms of both
its pocket and conformation preference. This is consistent with our
observation from cross docking on Abl and p38 and highlights yet another
utility of LIBSA.
Table 4
Ligand Binding Similarity among MD-Derived
Ras Conformersa
The data represent quantification
of the ability of closely related ligand pairs to target a given pocket
on the same set of Ras conformers, calculated as the ratio between
the number of Ras conformers targeted by both ligands and the total
number of conformers targeted by either ligand. Upper triangle: pocket
p1, lower triangle: pocket p3a.
The data represent quantification
of the ability of closely related ligand pairs to target a given pocket
on the same set of Ras conformers, calculated as the ratio between
the number of Ras conformers targeted by both ligands and the total
number of conformers targeted by either ligand. Upper triangle: pocket
p1, lower triangle: pocket p3a.As mentioned earlier, affinity filtering can be replaced or complemented
by a high-pass digital filter if groups of structurally related receptor
conformations exhibit similar tendencies to bind small molecules.
We thus divided the Ras conformers into 5 groups with K-means clustering
using Euclidean distances based on principal components, analyzed
each group separately with the high-pass filter and SNR, and compared
the results. This led to the following observations (Figures 3, 4, and S2 and Table 3): (i) the SRJ compounds
have a preference (SNR ≥ 1.0) for p1 in clusters 2 and 4 and
p3a in clusters 1 and 3. (ii) In contrast, AGP favors p1 only in cluster
4 and p3a in clusters 1, 3, and 5. (iii) Cluster 5 shows no tendency
to bind any of the SRJ compounds. Thus, a combination of structure
clustering, high-pass filtering, and SNR calculation can isolate reasonably
well a group of receptor conformers that preferentially bind a similar
set of ligands.Ensemble contact spectra for Ras using SRJ23 as probe
(normalized
by the largest score). (a) The dominant peak in the ensemble-averaged
contact spectra of clusters 2 and 4 yielded a p1 SNR > 1.0 after
noise
removal by a high-pass filter. (b) The four large peaks in the ensemble-averaged
contact spectrum of cluster 1 yielded a p3a SNR > 1.0 after noise
removal by a high-pass filter. (c) The cluttered spectra for clusters
3 and 5 yielded insignificant SNRs even after noise removal due to
nonspecific binding. Insets in (a) and (b) illustrate binding to pockets
p1 and p3a using the surface of representative Ras structures colored
by contact spectrum (red-to-black: high to low binding probability).It should be noted that the ensemble
contact spectrum used here
does not represent a single conformation but multiple receptor conformations
in tandem (see Methods). Additionally, application
of the high-pass filter does not directly correlate with the simple
inclusion or exclusion of docked poses, as does affinity filtering.
As a result, the scaled spectrum cannot be represented by a single
structure. That said, the receptor conformations identified by a high-pass
filter plus SNR as favoring p1 binding (clusters 2 and 4) resemble
those from molecular dynamics of SRJ23-K-RasQ61H (Figure 3c and ref (8)). They are also consistent with those found by looking
at the high-specificity ligand–receptor pairs in a piecewise
fashion (discussed above). Thus, irrespective of the specific filter
employed, LIBSA provides information on the receptor conformations
that are best suited to binding small molecules at a given pocket.
Optimizing Binding Affinity
It is
important to note that whereas LIBSA can potentially identify a drug
core and/or receptor conformations that are suitable for ligand binding,
it is unlikely to yield high affinity hits with desired biochemical
properties. It is therefore important that LIBSA is followed up with
site-directed VS to optimize potential hits or identify new ones.
For example, a library of compounds could be generated using programs
such as DAIM,[42] and a pocket can be characterized
as preferentially binding a molecule with LIBSA. Then other methods
such as BOMB,[43] GANDI,[44] and BROOD[45] can be used to generate
and optimize additional compounds which can then be screened against
only the most relevant receptor conformations. Thus the initial library
can be built into a targeted ligand library that is tailored toward
a particular pocket on a particular receptor conformation.
Concluding
Remarks
We have presented a computational framework for identifying
a consensus-binding
site on a single or ensemble of receptor conformations. The method
relies on three simple yet novel techniques to remove nonrelevant
docked poses and compute binding preference. The first tool, affinity
filtering, removes low frequency docked poses based on the distribution
of affinity scores. The second is a high-pass filter that scales down
protein–ligand contacts based on their probability of occurrence.
The third technique combines probing the surface of a target protein
by blind docking with the concept of signal-to-noise-ratio (SNR) to
identify allosteric sites that have the potential to bind small molecule
ligands. Computation of SNR can be preceded with any of the two filtering
methods and can be applied on data generated from blind docking with
any docking algorithm or other methods that can scan the surface of
a target receptor with druglike molecular probes. The resulting protocol,
termed LIBSA, provides a metric for identifying binding sites and
ranking ligands by their consistency of binding to these sites. We
have demonstrated the usefulness of this approach by applying it on
a diverse set of known ligands and their receptors as well as a small
set of related ligands docked onto a large ensemble of Ras conformers
with multiple binding sites. LIBSA was able to correctly identify
the known active sites as the preferred binding site for the respective
ligands and predicted the preference of each of our four test ligands
for a particular pocket on the Ras structures.Our approach
is similar in spirit with an earlier study that has
shown that binding consistency is a necessary condition for successful
docking using multiple runs of a genetic algorithm.[46] However, the previous study has focused on identifying
a consistent binding mode at a particular site, whereas the goal of
LIBSA was to simultaneously identify a binding site and ligands that
bind consistently to that site. Furthermore, in principle, more rigorous
methods such as MD[48] can be used for binding
site identification and scoring. For example, Huang et al used microsecond
scale explicit solvent MD simulations to show that Darunavir binds
to HIV protease in a completely different mode than that found in
the starting crystal structure.[49] However,
while potentially more accurate, this approach is too expensive to
be used for screening tens of thousands of probes against a library
of receptor conformers. The computationally much more efficient LIBSA
can be used alone or in conjunction with MD, depending on need and
the necessary tradeoff between accuracy and computational efficiency.
Moreover, each of the techniques described in this work can be used
either for analyzing the binding mode of known drugs retrospectively
or for a prospective design of new inhibitors.
Authors: Julie R Schames; Richard H Henchman; Jay S Siegel; Christoph A Sotriffer; Haihong Ni; J Andrew McCammon Journal: J Med Chem Date: 2004-04-08 Impact factor: 7.446
Authors: S R Jada; C Matthews; M S Saad; A S Hamzah; N H Lajis; M F G Stevens; J Stanslas Journal: Br J Pharmacol Date: 2008-09-22 Impact factor: 8.739
Authors: K Vanommeslaeghe; E Hatcher; C Acharya; S Kundu; S Zhong; J Shim; E Darian; O Guvench; P Lopes; I Vorobyov; A D Mackerell Journal: J Comput Chem Date: 2010-03 Impact factor: 3.376
Authors: Garrett M Morris; Ruth Huey; William Lindstrom; Michel F Sanner; Richard K Belew; David S Goodsell; Arthur J Olson Journal: J Comput Chem Date: 2009-12 Impact factor: 3.376
Authors: Harrison J Hocker; Kwang-Jin Cho; Chung-Ying K Chen; Nandini Rambahal; Sreenivasa Rao Sagineedu; Khozirah Shaari; Johnson Stanslas; John F Hancock; Alemayehu A Gorfe Journal: Proc Natl Acad Sci U S A Date: 2013-06-04 Impact factor: 11.205
Authors: Paul W Manley; Nikolaus Stiefl; Sandra W Cowan-Jacob; Susan Kaufman; Jürgen Mestan; Markus Wartmann; Marion Wiesmann; Richard Woodman; Neil Gallagher Journal: Bioorg Med Chem Date: 2010-08-14 Impact factor: 3.641
Authors: Barry J Grant; Suryani Lukman; Harrison J Hocker; Jaqueline Sayyah; Joan Heller Brown; J Andrew McCammon; Alemayehu A Gorfe Journal: PLoS One Date: 2011-10-25 Impact factor: 3.240