Robin M Betz1,2,3,4,5, Ron O Dror1,2,3,4,5. 1. Biophysics Program , Stanford University , Stanford , California 94305 , United States. 2. Department of Computer Science , Stanford University , Stanford, California 94305 , United States. 3. Department of Molecular and Cellular Physiology , Stanford University , Stanford , California 94305 , United States. 4. Department of Structural Biology , Stanford University , Stanford , California 94305 , United States. 5. Institute for Computational and Mathematical Engineering , Stanford University , Stanford , California 94305 , United States.
Abstract
Molecular dynamics (MD) simulations that capture the spontaneous binding of drugs and other ligands to their target proteins can reveal a great deal of useful information, but most drug-like ligands bind on time scales longer than those accessible to individual MD simulations. Adaptive sampling methods-in which one performs multiple rounds of simulation, with the initial conditions of each round based on the results of previous rounds-offer a promising potential solution to this problem. No comprehensive analysis of the performance gains from adaptive sampling is available for ligand binding, however, particularly for protein-ligand systems typical of those encountered in drug discovery. Moreover, most previous work presupposes knowledge of the ligand's bound pose. Here we outline existing methods for adaptive sampling of the ligand-binding process and introduce several improvements, with a focus on methods that do not require prior knowledge of the binding site or bound pose. We then evaluate these methods by comparing them to traditional, long MD simulations for realistic protein-ligand systems. We find that adaptive sampling simulations typically fail to reach the bound pose more efficiently than traditional MD. However, adaptive sampling identifies multiple potential binding sites more efficiently than traditional MD and also provides better characterization of binding pathways. We explain these results by showing that protein-ligand binding is an example of an exploration-exploitation dilemma. Existing adaptive sampling methods for ligand binding in the absence of a known bound pose vastly favor the broad exploration of protein-ligand space, sometimes failing to sufficiently exploit intermediate states as they are discovered. We suggest potential avenues for future research to address this shortcoming.
Molecular dynamics (MD) simulations that capture the spontaneous binding of drugs and other ligands to their target proteins can reveal a great deal of useful information, but most drug-like ligands bind on time scales longer than those accessible to individual MD simulations. Adaptive sampling methods-in which one performs multiple rounds of simulation, with the initial conditions of each round based on the results of previous rounds-offer a promising potential solution to this problem. No comprehensive analysis of the performance gains from adaptive sampling is available for ligand binding, however, particularly for protein-ligand systems typical of those encountered in drug discovery. Moreover, most previous work presupposes knowledge of the ligand's bound pose. Here we outline existing methods for adaptive sampling of the ligand-binding process and introduce several improvements, with a focus on methods that do not require prior knowledge of the binding site or bound pose. We then evaluate these methods by comparing them to traditional, long MD simulations for realistic protein-ligand systems. We find that adaptive sampling simulations typically fail to reach the bound pose more efficiently than traditional MD. However, adaptive sampling identifies multiple potential binding sites more efficiently than traditional MD and also provides better characterization of binding pathways. We explain these results by showing that protein-ligand binding is an example of an exploration-exploitation dilemma. Existing adaptive sampling methods for ligand binding in the absence of a known bound pose vastly favor the broad exploration of protein-ligand space, sometimes failing to sufficiently exploit intermediate states as they are discovered. We suggest potential avenues for future research to address this shortcoming.
A longstanding
goal of molecular simulation has been to capture
the binding process of a drug or other small-molecule ligand to its
target. If unbiased, atomic-level molecular dynamics (MD) simulations
of spontaneous ligand binding could be performed reliably, then they
could determine not only ligand-binding sites and poses but also binding
pathways and structural determinants of binding kinetics, all of which
are topics of great interest in drug discovery.[1,2] Thanks
to improvements in algorithms, software, and computer hardware, individual
all-atom simulations can now reach lengths of many microseconds (or
even a millisecond with specialized hardware).[3] This has enabled unguided simulations of the full binding process
for a number of ligand–protein pairs, generating considerable
excitement.[4−8] The great majority of ligands, however, bind on time scales substantially
longer than those currently accessible to individual MD simulations,
making binding a rare event.A number of methods exist for this
problem of rare event sampling
and appear in the literature under different names, beginning with
“splitting” strategies introduced in the 1950s.[9] Currently, many different path-sampling strategies
have been applied to biomolecular systems,[10] most notably the weighted ensemble method.[11] These methods have the added benefit of producing unbiased data
sets from which statistical mechanical observables can be calculated.
However, the majority of path-sampling methods presuppose or require
prior knowledge of the pathway of interest, requiring either start
and ending points or some progress coordinate that must be defined
prior to the run. Defining such coordinates for the binding of a ligand
to an unknown location on a protein presents a significant challenge.An alternative approach for sampling rare events is the use of
heuristic methods, where the results may be biased by the choice of
sampling strategy, but less information needs to be known beforehand
about the desired result. One such class of heuristic method is sometimes
referred to as adaptive sampling, in which multiple
rounds of simulations are performed, initiating the simulations in
each round from molecular system configurations deemed most promising
based on the results of previous rounds.In a typical adaptive
sampling protocol, one runs many independent,
simultaneous MD simulations (samplers) at each round.
A subsequent machine-learning step uses available simulation trajectories
to build a statistical model of what is currently known about what
the ligand can do. The group of samplers is then restarted from the
“most interesting” protein/ligand positions, as determined
by the model, to begin a new round of adaptive sampling.By
iterating these two steps of sampling and machine learning,
adaptive sampling explores the protein–ligand space in a parallelized
manner. In principle, adaptive sampling may also offer speedups relative
to traditional MD simulation by “wasting” less simulation
time on uninteresting regions of the space of all possible configurations.
The degree to which this is true in the case of protein–ligand
binding, if at all, is the topic of this paper.Adaptive sampling
methods have been applied primarily to the problem
of sampling protein conformations. For this application, a reasonable
assumption is that any stable conformation that is different from
those previously observed is interesting. Quantifying novelty in this
way is relatively straightforward. Prior work uses metrics that aim
to minimize redundant sampling,[12] reduce
the uncertainty of transitions,[13] capture
slow events,[14] or explore free energy[15] or other landscapes.[16]Adaptive sampling has also been used to study ligand binding,[17−19] but its performance relative to traditional MD simulation on protein–ligand
systems typical of those encountered in drug discovery has not been
determined. Existing implementations have been evaluated only on small
test systems, present little justification for their choice of methods
and parameters, and frequently rely on knowledge of the true bound
pose to guide sampling.In this Article, we evaluate the performance
of adaptive sampling
with no prior knowledge of both simple and complex protein–ligand
systems using long unbiased MD simulations as a baseline. We introduce
a suggested adaptive sampling protocol and implementation that requires
no a priori knowledge of binding site, bound pose, or even that the
ligand can bind at all—intended to be broadly transferable
to ligand binding, in general. We provide the reasoning behind all
design decisions to aid the reader in evaluating or implementing adaptive
sampling for their own applications.We find that current adaptive
sampling methods as well as the improvements
we present here do not usually sample bound poses faster than traditional
MD simulation. However, they do excel at tasks requiring broad sampling
of protein–ligand space, such as identifying possible interaction
sites all over the protein, and they are able to accurately and efficiently
characterize ligand pathways to a binding site even when the bound
pose is not identified. We analyze our results by placing adaptive
sampling in the context of the algorithmic exploration–exploitation
dilemma, a class of problems featuring a trade-off between
obtaining new knowledge (exploration) and using that knowledge (exploitation).[20]
Methods
Overview
Adaptive sampling protocols
generally consist of four main steps, as depicted in Figure .
Figure 1
Overview
of the adaptive sampling protocol. Initially, multiple
independent molecular dynamics simulations are run of an input system
with multiple copies of the ligand in solution. All configurations
of the molecular system observed in simulation trajectories are clustered,
then used to update a statistical model. Clusters are selected for
resampling in the next round by assigning a score to each cluster,
and new molecular system configurations are built from selected clusters
as starting points for another round of simulation.
Run multiple independent molecular dynamics
simulations, generally starting from different configurations of the
molecular system.Cluster all configurations of the molecular
system observed in simulation trajectories.Select clusters for resampling in the next
round. This involves assigning a score to each cluster, often on the
basis of a statistical model that describes what is known about the
dynamics of the molecular system given the simulations performed so
far.Build new molecular
system configurations
to serve as the starting points for another round of simulation, based
on configurations from the selected clusters.Overview
of the adaptive sampling protocol. Initially, multiple
independent molecular dynamics simulations are run of an input system
with multiple copies of the ligand in solution. All configurations
of the molecular system observed in simulation trajectories are clustered,
then used to update a statistical model. Clusters are selected for
resampling in the next round by assigning a score to each cluster,
and new molecular system configurations are built from selected clusters
as starting points for another round of simulation.Many design decisions and hyperparameters must
be set to have a
functional adaptive sampling implementation. In Sections –2.6, we introduce a recommended adaptive sampling protocol that
requires no a priori knowledge of the ligand’s bound pose or
even the neighborhood of its binding site, with well-defined, user-tunable
parameters.We allow a choice of several scoring functions,
including the hub scores metric, which has not been
previously used for
adaptive sampling. Additionally, our implementation is the first to
enable the effective use of multiple identical ligands in a single
simulation to further enhance sampling, with a system building step
to place them as efficiently as possible.The source code for
our implementation and configuration files
used for test systems are publicly available at github.com/drorlab/adaptive_sampling.
Molecular Dynamics Simulation Step
Each round of adaptive sampling begins with running many simulations
in parallel. For the initial round of sampling, one or more identical
ligands are placed at random positions that are at least some minimum
distance from the protein and from one another, typically in the water
surrounding the protein. In subsequent runs, the ligands are placed
in regions of interest, as defined by the scoring function (see Section ).Each
simulation begins with a static structure with initial atom velocities
assigned randomly. This diversifies sampling because initial random
assignment of velocities may remove the system from low energy states,
favoring exploration.[21] The structure is
minimized and equilibrated with restraints on the ligand and protein
(see the Supporting Information (SI)).Using nligands in each simulation rather
than only one yields an nligands-fold
increase in sampling but runs the risk of mischaracterizing molecular
system behavior if the ligands frequently interact with one another
or communicate through the protein. This parameter should be selected
such that ligands have sufficient “room to roam” and
do not exhibit a tendency to aggregate in simulation.Two additional
user-specified parameters determine the number N and length trun of samplers.
Available computational resources and competing research priorities
will place a realistic upper bound on N, but larger
values are generally better because more samplers means greater exploration
of configuration space in the same amount of wall-clock time.The choice of trun requires some consideration.
Individual trajectories need to be long enough to have a reasonable
chance of capturing relevant events but short enough that one can
perform multiple rounds of simulation, with the results of each guiding
the next.
Clustering Ligand Positions
Raw simulation
data specify the spatial coordinates of every atom for every frame
in the trajectory. To select ligand coordinates for resampling in
the next round, we must divide the system configurations sampled in
simulations thus far into a discrete list of clusters, which can be used to construct a statistical model and then scored.
In practice, each cluster will correspond roughly to a set of ligand
locations, although clusters may also reflect the internal conformation
of the ligand and protein.To cluster effectively, this high-dimensional
trajectory data set must be reduced to a lower-dimensional representation
that captures the features of interest, particularly the position
of each ligand relative to the protein. We wish to assign these features
without prior knowledge of the binding site or pose, disqualifying
the use of metrics like root-mean-square deviation (RMSD) to bound
pose or distance to binding pocket residues.Because multiple
identical ligands may be in the system, we featurize
each ligand independently. That is, for each frame of a simulation
with 10 ligands, 10 feature vectors will be generated. We make the
assumption that ligands do not interact substantially, which is moderately
enforced at the system building phase by placing ligands far from
each other.Ligands are featurized according to the log of the
minimum distance
of each ligand heavy atom to each protein residue. The dimensionality
of the vector is reduced to the slowest evolving components in time
with time-structure-based Independent Components Analysis (tICA).[22,23]Clustering on the tICA components produces “geometric”
clusters because the only information present in the original feature
vectors is spatial. Simple geometric clustering is insufficient to
produce a good representation of the system. In particular, there
are far more spatially distinct ligand positions in the bulk solvent
than there are protein-interacting ones, but these solvent positions
are not meaningful because ligands in solvent rapidly exchange between
different geometric clusters and completely solvated ligands are not
a priority for resampling.We therefore apply a second clustering
step in which geometric
clusters that quickly interconvert are “lumped” into
macrostates. The resulting macrostate clusters are geometrically distinct
but of varying size due to this second, kinetic clustering step (see
the SI). Clustering is done on the entire
data set, not just the new trajectories from each round, because cluster
assignments change when more of the configurational space is explored.User-defined parameters in this step include the tICA lag time,
the number of tICA components used for clustering, the number of geometric
clusters, and the number of macrostates. Of these, the tICA lag time
is of special importance because it determines the fastest dynamics
that may be resolved in the final model, regardless of the frequency
with which individual simulation frames are saved.
Model Construction
After clustering,
observed transitions between macrostates are counted and a Markov
state model (MSM) is constructed.[24] This
model represents each macrostate cluster as a node in a directed graph,
where edges between nodes correspond to possible transitions from
one ligand macrostate to another, with a “weight” specifying
a transition rate estimated from the frequency of transition observed
in simulation. This model can be used to estimate many different properties
of the molecular system, including pathways, kinetics, equilibrium
populations, and free energies.[25]The construction of this model uses several user-defined parameters,
including the number of macrostates and MSM lag time, which are further
described in the SI. For a high-quality
MSM suitable for further analysis, parameter selections are validated
by observing convergence in the model’s implied time scales.[26,27] However, this validation requires a significant amount of simulation
data, creating a chicken-and-egg problem for adaptive sampling runs.
We instead choose the model parameters somewhat arbitrarily, using
the MSM only to guide sampling with the assumption that it is of dubious
quality. Indeed, we find that even for successful adaptive sampling
runs, our parameter selections are not ideal (Supplementary Figure 1).
Scoring
Functions
The choice of which
clusters from the macrostate model to resample is made by selecting
clusters for resampling with a probability inversely proportional
to their score, as determined by some scoring function. We apply a
scoring function new to adaptive sampling, hub scores, and compare it to two simpler ones, populations and counts.An ideal scoring function will
be resistant to model uncertainty; that is, when the model is far
from convergence, scores will still reflect useful states for resampling
despite little data or an inaccurate picture of sampling. The scoring
function should also be transferable between systems and should work
regardless of the depth of the binding pathway, the presence or absence
of a lipid bilayer, and so on.The simplest scoring function
we investigated is the counts function, where a state
is selected with probability inversely proportional
to the number of times the state has been sampled in simulation. This
rewards the exploration of new or sparsely sampled states regardless
of their context. It has been used in several previous studies.[18,19,28]The populations scoring function selects states
for resampling with probability inversely proportional to their predicted
equilibrium population by the current MSM. This is equivalent to sampling
states that the current model predicts have high free energy. This
scoring function aims to both reduce uncertainty in the model, as
poorly characterized states often have low predicted equilibrium population,
and encourage resampling of higher energy transition states, such
as those present in binding pathways. This scoring function has been
previously implemented as the exploration phase of Free Energy Guided
Sampling.[15]Finally, we consider
the use of hub scores, again
with states selected for resampling with probability inversely proportional
to their score. The hub score, originally applied to quantify protein
folding networks,[29] has not been used previously
for adaptive sampling. This score is a measure of a state’s
connectivity in the MSM, with a higher hub score specifying greater
connectivity. More specifically, the hub score measures the probability
that any given state will lie on a pathway between any two points
in the configuration space, according to the MSM. In a typical protein–ligand
MSM, the bulk solvent state has the highest hub score, and states
more distant from it are lower.Although the hub scores
function has not been previously
used for protein–ligand adaptive sampling, it appears promising
due to its scoring of states in the context of the full model. When
searching for bound poses, it makes sense to look for states that
are at the end of pathways. Conversely, transient protein–ligand
interactions intuitively should have a higher rate of interconversion
with solvent, resulting in a higher hub score than a bound state at
the end of a longer pathway.In this study, we aimed to conduct
a survey of broadly different
scoring functions rather than a complete enumeration, and, as such,
we omitted several previous implementations that have been either
subsumed by or shown to be less useful than our selected three. For
example, resampling uncertain transitions more[13] is approximately equivalent to resampling uncertain states
more with the counts function.
System Building
The final step of
each adaptive sampling iteration is the construction of simulation
systems for the next set of MD runs. This step is complicated by the
fact that we typically use multiple ligands in each simulation, and
we wish to place each of them preferentially in clusters that received
low (i.e., good) scores. We therefore do not simply restart simulations
from frames of previous simulations.To ensure some diversity
in cases where one cluster has a much lower score than others, the
top-scoring N clusters are automatically sampled
in the next round. A random simulation frame from each of these clusters
is used to set the initial protein conformation and the position of
a single ligand for one sampler. For each additional ligand, a cluster
is selected at random with probability inversely proportional to its
score. The ligand coordinates are set from a randomly chosen frame
assigned to that cluster, provided that it is not too close to the
protein or to other already-placed ligands, as determined by a user-configurable
cutoff. Ligands that cannot be placed are positioned randomly in the
bulk solvent.Once each system is built, simulation begins,
kicking off the next
round of the adaptive sampling process.
Standard
Test System: Trypsin–Benzamidine
The binding of the
small molecular inhibitor benzamidine to the
protein trypsin is frequently used as a model system for evaluating
methods involving protein–ligand binding.[5,30] The
ligand-binding site is exposed to solvent, and little conformational
change takes place upon binding. The high on-rate of benzamidine also
enables easy sampling of multiple binding events, with an experimental
association constant of 2.9 × 107 mol–1 s–1[31] and a computationally
determined mean first passage time of 500 ns at a concentration of
0.0037 M.[17]We ran six trials of
adaptive sampling with N = 10 samplers with individual
simulation length trun = 10 ns and two
trials with each of the hubs, counts, or populations scoring metrics.
We arbitrarily chose a maximum run time of 80 sampling rounds because
this allowed all conditions to sample at least one binding event.
An initial 10 ns of equilibration was omitted from each trajectory
to avoid biasing the model; see the SI.As a control, we ran six trials of traditional MD simulation, each
with 10 samplers running for 800 ns each, which is equivalent to the
total amount of simulation performed with adaptive sampling.
Realistic System: β2AR–Dihydroalprenolol
We also evaluated the performance of adaptive sampling on a more
difficult system that represents a more typical real-world use case
of the method. We selected the binding of the beta-blocker dihydroalprenolol
to the β2 adrenergic receptor (β2AR) for this test because its binding process has previously been
characterized by extensive traditional MD simulation.[7]β2AR is a membrane-bound G-protein-coupled
receptor (GPCR) where the ligand-binding pocket is removed over 15
Å from bulk solvent. The binding pathway requires the ligand
to first enter the “extracellular vestibule” of the
protein, at which point it has a >50% chance of continuing on to
the
binding site.[7]This system is especially
challenging because it contains a lipid
bilayer, and the ligand is rather hydrophobic and will frequently
partition into the bilayer. A previous unadaptive MD simulation
study[7] of this system cut short runs where
all 10 dihydroalprenolol molecules entered the membrane because dihydroalprenolol
leaves the membrane quite slowly once having entered.We ran
two trials of adaptive sampling using hub scores and two
using population scores. In each case, we used N =
20 samplers with individual simulation length trun = 40 ns (with the initial 10 ns of equilibration omitted;
see the SI) for 40 rounds of sampling.The counts metric was not benchmarked on this system because initial
attempts resulted in all ligands partitioning into the lipid in various
locations. These ligands fail to bind to the protein without first
partitioning back into solvent, which would require prohibitive amounts
of simulation time to sample. Ligand positions in the membrane are
geometrically distinct and do not rapidly interconvert, resulting
in a unique cluster assignment at each membrane location. Because
there are many possible places where the ligand may partition into
the plane of the lipid, the probability of seeing one location many
times is low, resulting in a low count for each membrane-bound cluster.
The counts metric therefore ends up preferentially sampling all possible
ligand locations in the membrane and fails to progress along actual
binding pathways.Although the hub scores function loosely correlates
with observed
counts, the proximity of membrane-partitioned locations to solvent
results in these clusters having a higher hub score and therefore
avoiding resampling. There is no correlation between clusters with
low count and hub score (Supplementary Figure 2).As a control, we ran two trials of traditional MD
simulation, with
each replicate running for an equivalent simulation time of 1.6 μs.
We also obtained trajectories used in the previous binding study[7] for additional validation.Because adaptive
sampling involves running a group of simulations
simultaneously, all comparisons between our implementation and traditional
MD simulation will be done using groups of independent simulations
with equal total amounts of simulation time.
Results
Adaptive Sampling Correctly Identifies Binding
Pathways and Intermediate States
Questions of whether or
not adaptive sampling is faster or better than traditional MD simulation
are moot if the results of adaptive sampling are not interpretable
or fail to capture necessary information to describe binding. It is
also possible that the pathways predicted by adaptive sampling do
not represent the true predominant binding pathway, as the method
could repeatedly resample states in such a way as to force the system
to move over high-energy barriers while a lower energy transition
state is missed. For both test systems, we evaluate if our method
sampled the known bound pose and pathway.We compared our β2AR adaptive sampling runs to the conclusions of a previous
study where long MD simulations were used to characterize binding.[7] That paper notes two entry pathways: In 11 of
the 12 simulations where a ligand bound, it first entered the “extracellular
vestibule” (a region enclosed by extracellular loops (ECLs)
2 and 3 and helices 5–7) and then proceeded into the binding
pocket. In the remaining simulation, the ligand entered between ECL
2 and helices 2 and 7 before proceeding into the binding pocket.All of our adaptive sampling trials captured both of these binding
pathways (Figure ).
Moreover, they did so without requiring any manual analysis―to
obtain binding pathways from these adaptive sampling runs, we simply
queried the MSM for the top pathways from the bulk solvent cluster
to the cluster corresponding to the bound pose. Interestingly, both
pathways were obtained even from an adaptive sampling trial that did
not sample the crystallographic bound pose; in this case, we queried
for a pathway that reached the cluster closest to the bound pose.
Figure 2
Primary
(left) and alternative (right) pathways for ligand entry
to the binding pocket of β2AR, as determined by 60
rounds of two adaptive sampling trials using the hub scores criteria.
At the top are the predicted pathways from a trial where the crystallographic
pose for the ligand was sampled, and at the bottom are those from
a trial where the ligand entered the binding pocket but did not sample
the crystallographic pose. In both cases, the model was queried for
all paths from the most populous cluster (representing bulk solvent)
to the cluster with the lowest mean RMSD to the crystallographic pose.
The pathways shown were selected from the top ten highest flux paths
based on the number of well-defined clusters visited. The crystallographic
pose is shown as sticks, and ligand positions along the pathway are
shown as pins, with the pinpoint at the nitrogen atom and the round
end at the benzene ring center. The pins are colored by the cluster
assignment within the pathway, with 20 pins shown per cluster.
Primary
(left) and alternative (right) pathways for ligand entry
to the binding pocket of β2AR, as determined by 60
rounds of two adaptive sampling trials using the hub scores criteria.
At the top are the predicted pathways from a trial where the crystallographic
pose for the ligand was sampled, and at the bottom are those from
a trial where the ligand entered the binding pocket but did not sample
the crystallographic pose. In both cases, the model was queried for
all paths from the most populous cluster (representing bulk solvent)
to the cluster with the lowest mean RMSD to the crystallographic pose.
The pathways shown were selected from the top ten highest flux paths
based on the number of well-defined clusters visited. The crystallographic
pose is shown as sticks, and ligand positions along the pathway are
shown as pins, with the pinpoint at the nitrogen atom and the round
end at the benzene ring center. The pins are colored by the cluster
assignment within the pathway, with 20 pins shown per cluster.By contrast, when we performed
the same automated MSM-based analysis
for the trajectories obtained by the same amount of traditional MD
simulation, we did not identify both binding pathways. For the first
traditional MD simulation trial, only the less common binding pathway
was clearly identified by the MSM. Manual analysis of the trajectories
from this run shows that only one pathway was sampled in the simulation.
For the second traditional MD run, neither binding pathway was clearly
identified because the clusters did not have sufficient spatial resolution;
in particular, the pathway intermediates were lumped together with
bulk solvent.This marked difference in the quality of the predicted
binding
pathway suggests that by repeatedly placing the ligand in regions
of interest, adaptive sampling may better characterize all available
binding pathways. Traditional MD simulation may require significantly
more simulation time to do this, because once the ligand is bound,
it does not leave the pocket for some time.For the trypsin–benzamidine
system, our results support
those of previous work[17] in which adaptive
sampling consistently samples the crystallographic pose of benzamidine.
However, our featurization scheme, which neglects protein conformation,
results in adaptive sampling’s failure to capture the protein’s
metastable conformational states associated with binding, as the bound
state, as reported by the final MSM, contains many different protein
conformations (Supplementary Figure 3).
The predicted binding pathway involves the ligand exploring the solvent
and, once in the neighborhood of the binding site, quickly entering
the pocket and binding. However, trypsin’s binding pocket is
defined by three mobile protein loops, and the true binding pathway
involves conformational changes that expose the binding site.[32]These results suggest that for systems
where the protein undergoes
a conformational change associated with ligand binding, our adaptive
sampling implementation fails to capture sufficient conformational
hints from which to reconstruct an accurate pathway, even when the
bound pose is sampled in the simulation. However, this limitation
is not inherent to adaptive sampling methods, in general, and a more
sophisticated featurization scheme could produce macrostates that
capture both protein and ligand conformation and is an interesting
topic for future work.
Adaptive Sampling Does
Not Typically Sample
the Bound Pose More Quickly, Especially for Realistic Systems
For both systems, both adaptive sampling and traditional MD simulations
were successfully able to sample the bound state. For adaptive sampling,
all of the benchmarked scoring functions obtained binding events on
at least one test system, although the time to binding varied considerably
between trials (Supplementary Figures 4 and 5). The only scoring function that failed to obtain binding on both
test systems was counts, which resulted in all ligands partitioning
into the membrane in the β2AR–dihydroalprenolol
system. This result indicates that there is a significant opportunity
to further refine scoring functions for adaptive sampling because
counts is by far the most common metric used in prior work.Surprisingly, adaptive sampling did not consistently sample the bound
pose faster than traditional molecular dynamics simulation, especially
for the β2AR system, where one traditional MD simulation
samples a binding event in the first 200 ns of simulation (Figure b). By contrast,
the fastest adaptive sampling run captured a binding event after 400
ns. Calculating the total simulation time per binding event shows
that whereas adaptive sampling frequently makes more effective use
of simulation time, high variance between runs prevents us from drawing
stronger conclusions about the differences between adaptive sampling
and traditional MD (Supplementary Figure 6).
Figure 3
Root-mean-square deviation (RMSD) to the crystallographic pose
for (a) the single benzamidine ligand present in the trypsin system
for two selected trials and (b) the 10 dihydroalprenolol ligands present
in the β2AR system over time for all trials, including
data from a previous study.[7] For adaptive
sampling, vertical lines indicate resampling events every 10 (for
trypsin–benzamidine) or 40 ns (for β2AR–dihydroalprenolol).
All independent simulations are plotted simultaneously. Dark traces
show RMSD values smoothed with a Savitsky–Golay filter with
window size 5.8 ns, with unsmoothed RMSD traces shown behind in a
lighter color. Binding events are defined as the ligand going from
RMSD >3 Å to <2 Å and are indicated by arrows. A dotted
horizontal line demarcates this 2 Å RMSD cutoff. (c) Cumulative
number of binding events observed over all trials in this paper for
the trypsin (top) and β2AR (bottom) systems.
Root-mean-square deviation (RMSD) to the crystallographic pose
for (a) the single benzamidine ligand present in the trypsin system
for two selected trials and (b) the 10 dihydroalprenolol ligands present
in the β2AR system over time for all trials, including
data from a previous study.[7] For adaptive
sampling, vertical lines indicate resampling events every 10 (for
trypsin–benzamidine) or 40 ns (for β2AR–dihydroalprenolol).
All independent simulations are plotted simultaneously. Dark traces
show RMSD values smoothed with a Savitsky–Golay filter with
window size 5.8 ns, with unsmoothed RMSD traces shown behind in a
lighter color. Binding events are defined as the ligand going from
RMSD >3 Å to <2 Å and are indicated by arrows. A dotted
horizontal line demarcates this 2 Å RMSD cutoff. (c) Cumulative
number of binding events observed over all trials in this paper for
the trypsin (top) and β2AR (bottom) systems.It is important to note that by repeatedly resampling
areas of
interest and ignoring others, adaptive sampling yields trajectories
that are correlated, and the number and type of binding events obtained
are therefore not directly equivalent to those present in independent
simulations. Researchers should perform additional validation on pathways
predicted by adaptive sampling to check that these correlations did
not unduly bias the results.The performance difference between
traditional MD simulation and
adaptive sampling is not quite as extreme as these data suggest because
traditional MD typically does not obtain a bound pose in 200 ns. Other
simulations run obtained binding in anywhere from 800–2000
ns, with a total of 3/40 traditional MD runs and 5/20 previously published
trajectories obtaining a ligand position with an RMSD <2 Å
to the crystallographic pose in an equivalent amount of simulation
time to our adaptive sampling trials.However, unlike traditional
MD, where all trajectories are wholly
independent, adaptive sampling typically requires multiple simultaneous
samplers that then exchange information. If there are cluster resources
available for running 20 simulations simultaneously, for example,
the odds of obtaining binding in at least one traditional MD simulation
on this cluster are generally good, with a reasonable probability
that a binding event is observed within the first microsecond or so.
Adaptive sampling may not be worth the resources required if the only
desired information about the protein–ligand system is the
bound pose of the ligand.
Adaptive Sampling Cannot
Reliably Identify
Bound Poses without Follow-Up Analysis
Given that adaptive
sampling seems to be able to sample the pose in simulation, we now
address the issue of whether adaptive sampling can reliably identify
bound poses.Querying the final models after 60 rounds of sampling,
the β2AR system finds that both runs using the hub
scores criterion had the bound pose represented as one of the top
three scoring clusters (of 50), whereas neither run with the populations
criterion identified the pose in the top ten clusters despite repeatedly
sampling the pose in the input trajectories. This agrees with previous
suggestions[33] that MSM predictions are
highly sensitive to the parameters used to construct the model. In
this case, the choice of scoring function affects result quality significantly,
and other model parameters such as lag time or number of macrostates
may have even more dramatic effects.Our clustering scheme introduces
an additional complication to
identifying poses because the kinetic clustering step results in clusters
that can be spread over large areas of the simulation box. Even if
the bound pose is assigned to a given cluster, other frames assigned
to that cluster could span the entire binding pocket, preventing pose
identification. For example, of the three highest scoring clusters
for one hub score run on β2AR, the top two have high
spatial variance (Figure ). Despite not representing the crystallographic pose, these
two clusters may occupy possible binding sites because an intracellular
agonist has been crystallized in approximately the same region as
the top-scoring cluster,[34] and the other
cluster encompasses a location where cholesterol molecules are frequently
present in crystal structures.
Figure 4
Twenty ligand positions from each of the
top three clusters (blue,
red, and yellow, respectively) are shown for two trials of adaptive
sampling on the β2AR system using the hub scores
criteria for a total of 2.4 μs aggregate simulation. For each
trial, the model built from 60 sampling rounds was queried for the
top three scoring clusters, and the displayed frames assigned to each
cluster were randomly chosen. The crystal structure is shown with
the protein as gray ribbons and the ligand as black sticks.
Twenty ligand positions from each of the
top three clusters (blue,
red, and yellow, respectively) are shown for two trials of adaptive
sampling on the β2AR system using the hub scores
criteria for a total of 2.4 μs aggregate simulation. For each
trial, the model built from 60 sampling rounds was queried for the
top three scoring clusters, and the displayed frames assigned to each
cluster were randomly chosen. The crystal structure is shown with
the protein as gray ribbons and the ligand as black sticks.The MSMs used in adaptive sampling
are constructed primarily to
inform subsequent rounds of sampling and are not intended to be high-quality
models. Because the bound pose is indeed sampled in
all of our adaptive sampling runs for the β2AR system,
postrun analysis on the set of trajectories that uses geometric clustering
only, incorporates knowledge of the expected binding site, or involves
more sophisticated kinetic calculation would most likely be able to
identify the pose. Future work could improve predictions by selecting
clusters that have features likely to be characteristic of bound poses,
such as low spatial variance or multiple hydrogen bonds.
Adaptive Sampling Better Characterizes Possible
Allosteric Sites
Our benchmark simulations reveal another
advantage of adaptive sampling: Binding sites away from the most ligand-accessible
areas, such as the water-exposed regions of the β2AR system, are more rapidly sampled. Often, ligands may bind at locations
on the protein other than the canonical, or orthosteric, binding site.
These “allosteric” sites are of increasing interest
as pharmaceutical targets, as differences in the orthosteric pocket
between receptor subtypes can be subtle and targeting allosteric sites
can thus allow for better subtype selectivity.[35,36]In assessing the effectiveness of adaptive sampling at identifying
allosteric sites, we quantify how frequently the ligand samples the
allosteric sites found in our test systems. Because all possible
or even all common allosteric sites are not characterized for β2AR, we use information about allosteric sites present in the
entire family of GPCRs, of which β2AR is a member
with the locations of possible allosteric sites approximated by the
locations of crystallographically resolved cholesterol (and cholesterol
hemisuccinate) molecules. This is reasonable because cholesterol is
found in multiple, conserved binding sites on multiple GPCRs and is
a hydrophobic molecule that is chemically similar to dihydroalprenolol.Adaptive sampling using either population scores or hub scores
achieves significantly greater sampling of all possible cholesterol
sites compared with traditional MD simulation (Figure ). In an equivalent amount of simulation
time, the ligands in adaptive sampling simulations also sample the
entire space within 5 Å of the protein more completely than traditional
MD (Supplementary Figure 7).
Figure 5
On the upper
left, β2AR is shown as ribbons with
sticks showing the locations of all resolved cholesterols (either
as cholesterol or cholesterol hemisuccinate) in Class A GPCR structures
deposited in the Protein Data Bank (PDB).[48] Plotted is a sampling of each of those cholesterol locations in
terms of the probability of any ligand atom occupying the region in
a given frame. (See the SI.)
On the upper
left, β2AR is shown as ribbons with
sticks showing the locations of all resolved cholesterols (either
as cholesterol or cholesterol hemisuccinate) in Class A GPCR structures
deposited in the Protein Data Bank (PDB).[48] Plotted is a sampling of each of those cholesterol locations in
terms of the probability of any ligand atom occupying the region in
a given frame. (See the SI.)This improvement in sampling also holds true for
the trypsin system.
Again, the ligands in adaptive sampling simulations also sample the
space within 5 Å of the protein more completely than an equivalent
amount of traditional MD (Supplementary Figure 8).
Discussion
Thinking about Adaptive Sampling in the Context
of the Exploration–Exploitation Dilemma
Adaptive sampling
is most commonly used in the field of robotic remote sensing, where
a robot or team of robots is tasked with finding a small number of
interesting regions in a large, poorly characterized environment.[37] When solving this exploration problem, the robots
have the choice of either visiting regions in the neighborhood of
the currently known best region (exploitation) or minimizing uncertainty
in unknown parts of the space (exploration).[38] A well known review by Thrun gives an overview of this trade-off
in the context of reinforcement learning.[39]Exploring the configuration space of a protein–ligand
system with the goal of identifying (or at the very least, sampling)
bound poses is a similar problem on a much smaller physical scale.
Adaptive sampling methods here face the same exploration–exploitation dilemma, where a trade-off must be made between searching for new
binding sites and interactions (exploration) or resampling already
seen locations (exploitation) that may provide more accurate bound
poses or progress further along binding pathways. This trade-off has
been previously recognized in the context of protein conformational
sampling[16,40] but has not yet been discussed for the ligand-binding
problem.Some computational methods for the ligand-binding problem
choose
to emphasize exploration to an extreme, mapping out the entire protein
surface probabilistically in terms of where the ligand can bind.[41,42] Others employ a more exploitation-based approach using a knowledge-based
metric such as RMSD to bound pose or ligand distance to binding pocket
residues to determine which states should be resampled.[21,32,43,44] Human intuition can also be used to manually determine which states
are of interest for resampling.[45]In cases where the binding site of the ligand is treated as an
unknown, the scoring functions that have been used previously for
adaptive sampling of protein–ligand binding tend to favor exploration
over exploitation. This is also true of the hub score we have introduced,
as newly discovered states frequently have the lowest hub score.To encourage exploitation over exploration, one needs a scoring
function that can recognize when a state is likely to be on the binding
pathway. For our purposes, an effective scoring function (1) does
not require any a priori knowledge of binding pose or site, (2) should
avoid biasing the simulations toward the sampling of unlikely, or
worse, unphysical pathways, and (3) produces superior sampling for
the quantity of interest relative to traditional MD simulation.The scoring function used is roughly equivalent to the progress
coordinate used in path-sampling methods, but applications of these
methods to protein–ligand systems[42,46] require knowledge of the ligand-binding site. Designing a scoring
or progress metric that does meet this first criterion is complicated
by the inefficiency of these methods in sampling high-dimensional
spaces,[47] and reducing the dimensionality
of this space requires the researcher to make some assumptions of
the nature of the binding pathway.Designing such a function
for either adaptive or path-sampling
contexts without prior knowledge of where the ligand binds is nontrivial
but not impossible. For example, one might use the ligand’s
solvent-accessible surface area (SASA), the variance of all ligand
positions assigned to a macrostate, or the interaction energy between
the ligand and the protein. Indeed, prior work has balanced exploration–exploitation
trade-offs for sampling protein conformations by incorporating physical
properties such as protein SASA, free energy, or experimental measurements.[16,40]However, adaptive sampling algorithms that favor exploitation
must
have the ability to identify when exploitation has failed to know
when more exploration is necessary,[39] a
challenging thing for a problem where even expert medicinal chemists
cannot readily determine what differentiates true bound poses from
decoys. Developing a scoring function that better handles this dilemma
is an interesting subject for future work.
Rationalizing
Adaptive Sampling’s Performance
Relative to Traditional MD Simulation
It is no surprise that
a method that favors exploration better explores possible ligand locations
compared with traditional MD simulation. For systems where getting
the ligand in the vicinity of the binding pocket is sufficient for
binding, adaptive sampling can be expected to outperform traditional
MD because it especially excels at obtaining ligand sampling broadly
across the protein surface. This is why adaptive sampling is a win
compared with traditional MD for the trypsin–benzamidine system
(as seen in both our implementation and that of others). This binding
process does not require consistent sampling of some intermediate
state (exploitation). Instead, the ligand binds quickly once in the
neighborhood of the binding site (exploration).For β2AR and dihydroalprenolol, however, finding the exact bound
pose requires considerable exploitation of an intermediate
state in the binding pathway. In this system, the key intermediate
in binding is in the extracellular vestibule,[7] where the ligand may dwell for quite some time before finally assuming
the bound pose (see traditional MD RMSD traces in Figure b). This intermediate state
must therefore be sampled repeatedly to have a reasonable chance of
seeing a binding event.In addition to using scoring criteria
that may fail to recognize
relevant intermediate states, our adaptive sampling protocol enforces
diversity among the samplers at the system building step when placing
the first ligand. Although a perfect implementation might, in this
case, have all samplers focusing on the intermediate state, in reality
the models are quite underdetermined, and allowing all samplers to
resample a single state results in a substantial chance of wasting
sampling on irrelevant ligand positions.In the β2AR test case, enforcing diversity in
the built systems results in the ligand being removed from the vestibule
to sample some other location of interest. As a result, adaptive sampling
results in a much lower ligand dwell time in the vestibule and thus
fewer binding events for β2AR (Supplementary Figure 9).Although adaptive sampling
is not consistently the best approach
for quickly identifying bound states, it excels in sampling many binding
events and characterizing pathways, even if the exact bound state
is not found. Our adaptive sampling runs that obtained binding events
sampled many more binding events than the traditional MD simulations.In traditional MD simulation, once a ligand has bound, it rarely
unbinds due to the higher energy barrier for unbinding. This prevents
sampling of multiple binding events in a single simulation. During
adaptive sampling, the ligand is frequently repositioned and can be
removed from the binding site to sample other locations of interest,
resulting in many more binding events. This results in a better characterization
of the binding process, with less uncertainty in the transitions between
states along the pathway but less time spent in the bound state, especially
for β2AR (Supplementary Figure 10).Additionally, adaptive sampling’s ability
to better sample
all ligand locations close to the protein means that it holds promise
for identifying regions where the ligand has a reasonable dwell time.
Docking, follow-up simulation, or other analyses may then be performed
on those locations to produce possible bound poses of that ligand
at that potential site.
Conclusions
Whereas
it may be initially surprising that adaptive sampling may
offer little benefit in terms of getting ligands to bind faster, putting
the problem of ligand binding in the context of the exploration–exploitation
dilemma provides an explanation. Getting ligands to bind in simulation
is algorithmically equivalent to exploring some large, underdetermined
space (in this case, the space consists of all possible protein–ligand
interactions) to find a small number of regions of interest (ligand-binding
sites), with a high cost of sampling (running simulations is slow).Adaptive sampling with the scoring functions typically used in
the absence of a known binding site is an inherently exploration-focused
method. This results in a longer time to obtain binding for systems
such as β2AR with a key intermediate state that must
be resampled.However, for scientific questions that require
the broad exploration
of protein–ligand configuration space, such as the identification
of possible allosteric binding sites, adaptive sampling offers a substantial
benefit to the researcher, as its focus on exploration results in
much broader sampling of ligand locations near the protein, in general.
Likewise, adaptive sampling methods excel at providing an accurate
description of ligand-binding pathways.Scoring functions that
favor exploitation represent a promising
area of future work. Indeed, one might be able to combine the ability
of adaptive sampling methods to rapidly identify potential ligand-binding
sites with sampling approaches that are better able to exploit these
sites to get true bound poses within them.
Authors: Ron O Dror; Hillary F Green; Celine Valant; David W Borhani; James R Valcourt; Albert C Pan; Daniel H Arlow; Meritxell Canals; J Robert Lane; Raphaël Rahmani; Jonathan B Baell; Patrick M Sexton; Arthur Christopoulos; David E Shaw Journal: Nature Date: 2013-10-13 Impact factor: 49.962
Authors: Jan-Hendrik Prinz; Hao Wu; Marco Sarich; Bettina Keller; Martin Senne; Martin Held; John D Chodera; Christof Schütte; Frank Noé Journal: J Chem Phys Date: 2011-05-07 Impact factor: 3.488
Authors: Doris A Schuetz; Wilhelmus Egbertus Arnout de Witte; Yin Cheong Wong; Bernhard Knasmueller; Lars Richter; Daria B Kokh; S Kashif Sadiq; Reggie Bosma; Indira Nederpelt; Laura H Heitman; Elena Segala; Marta Amaral; Dong Guo; Dorothee Andres; Victoria Georgi; Leigh A Stoddart; Steve Hill; Robert M Cooke; Chris De Graaf; Rob Leurs; Matthias Frech; Rebecca C Wade; Elizabeth Cunera Maria de Lange; Adriaan P IJzerman; Anke Müller-Fahrnow; Gerhard F Ecker Journal: Drug Discov Today Date: 2017-04-13 Impact factor: 7.851
Authors: Yibing Shan; Eric T Kim; Michael P Eastwood; Ron O Dror; Markus A Seeliger; David E Shaw Journal: J Am Chem Soc Date: 2011-05-13 Impact factor: 15.419
Authors: Paulo C T Souza; Sebastian Thallmair; Paolo Conflitti; Carlos Ramírez-Palacios; Riccardo Alessandri; Stefano Raniolo; Vittorio Limongelli; Siewert J Marrink Journal: Nat Commun Date: 2020-07-24 Impact factor: 14.919