Literature DB >> 30645108

How Effectively Can Adaptive Sampling Methods Capture Spontaneous Ligand Binding?

Robin M Betz^1,2,3,4,5, Ron O Dror^1,2,3,4,5.

Abstract

Molecular dynamics (MD) simulations that capture the spontaneous binding of drugs and other ligands to their target proteins can reveal a great deal of useful information, but most drug-like ligands bind on time scales longer than those accessible to individual MD simulations. Adaptive sampling methods-in which one performs multiple rounds of simulation, with the initial conditions of each round based on the results of previous rounds-offer a promising potential solution to this problem. No comprehensive analysis of the performance gains from adaptive sampling is available for ligand binding, however, particularly for protein-ligand systems typical of those encountered in drug discovery. Moreover, most previous work presupposes knowledge of the ligand's bound pose. Here we outline existing methods for adaptive sampling of the ligand-binding process and introduce several improvements, with a focus on methods that do not require prior knowledge of the binding site or bound pose. We then evaluate these methods by comparing them to traditional, long MD simulations for realistic protein-ligand systems. We find that adaptive sampling simulations typically fail to reach the bound pose more efficiently than traditional MD. However, adaptive sampling identifies multiple potential binding sites more efficiently than traditional MD and also provides better characterization of binding pathways. We explain these results by showing that protein-ligand binding is an example of an exploration-exploitation dilemma. Existing adaptive sampling methods for ligand binding in the absence of a known bound pose vastly favor the broad exploration of protein-ligand space, sometimes failing to sufficiently exploit intermediate states as they are discovered. We suggest potential avenues for future research to address this shortcoming.

Entities: Chemical Disease Gene Species

Mesh：

Substances：

Year: 2019 PMID： 30645108 PMCID： PMC6795214 DOI： 10.1021/acs.jctc.8b00913

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

A longstanding goal of molecular simulation has been to capture the binding process of a drug or other small-molecule ligand to its target. If unbiased, atomic-level molecular dynamics (MD) simulations of spontaneous ligand binding could be performed reliably, then they could determine not only ligand-binding sites and poses but also binding pathways and structural determinants of binding kinetics, all of which are topics of great interest in drug discovery.[1,2] Thanks to improvements in algorithms, software, and computer hardware, individual all-atom simulations can now reach lengths of many microseconds (or even a millisecond with specialized hardware).[3] This has enabled unguided simulations of the full binding process for a number of ligand–protein pairs, generating considerable excitement.[4−8] The great majority of ligands, however, bind on time scales substantially longer than those currently accessible to individual MD simulations, making binding a rare event. A number of methods exist for this problem of rare event sampling and appear in the literature under different names, beginning with “splitting” strategies introduced in the 1950s.[9] Currently, many different path-sampling strategies have been applied to biomolecular systems,[10] most notably the weighted ensemble method.[11] These methods have the added benefit of producing unbiased data sets from which statistical mechanical observables can be calculated. However, the majority of path-sampling methods presuppose or require prior knowledge of the pathway of interest, requiring either start and ending points or some progress coordinate that must be defined prior to the run. Defining such coordinates for the binding of a ligand to an unknown location on a protein presents a significant challenge. An alternative approach for sampling rare events is the use of heuristic methods, where the results may be biased by the choice of sampling strategy, but less information needs to be known beforehand about the desired result. One such class of heuristic method is sometimes referred to as adaptive sampling, in which multiple rounds of simulations are performed, initiating the simulations in each round from molecular system configurations deemed most promising based on the results of previous rounds. In a typical adaptive sampling protocol, one runs many independent, simultaneous MD simulations (samplers) at each round. A subsequent machine-learning step uses available simulation trajectories to build a statistical model of what is currently known about what the ligand can do. The group of samplers is then restarted from the “most interesting” protein/ligand positions, as determined by the model, to begin a new round of adaptive sampling. By iterating these two steps of sampling and machine learning, adaptive sampling explores the protein–ligand space in a parallelized manner. In principle, adaptive sampling may also offer speedups relative to traditional MD simulation by “wasting” less simulation time on uninteresting regions of the space of all possible configurations. The degree to which this is true in the case of protein–ligand binding, if at all, is the topic of this paper. Adaptive sampling methods have been applied primarily to the problem of sampling protein conformations. For this application, a reasonable assumption is that any stable conformation that is different from those previously observed is interesting. Quantifying novelty in this way is relatively straightforward. Prior work uses metrics that aim to minimize redundant sampling,[12] reduce the uncertainty of transitions,[13] capture slow events,[14] or explore free energy[15] or other landscapes.[16] Adaptive sampling has also been used to study ligand binding,[17−19] but its performance relative to traditional MD simulation on protein–ligand systems typical of those encountered in drug discovery has not been determined. Existing implementations have been evaluated only on small test systems, present little justification for their choice of methods and parameters, and frequently rely on knowledge of the true bound pose to guide sampling. In this Article, we evaluate the performance of adaptive sampling with no prior knowledge of both simple and complex protein–ligand systems using long unbiased MD simulations as a baseline. We introduce a suggested adaptive sampling protocol and implementation that requires no a priori knowledge of binding site, bound pose, or even that the ligand can bind at all—intended to be broadly transferable to ligand binding, in general. We provide the reasoning behind all design decisions to aid the reader in evaluating or implementing adaptive sampling for their own applications. We find that current adaptive sampling methods as well as the improvements we present here do not usually sample bound poses faster than traditional MD simulation. However, they do excel at tasks requiring broad sampling of protein–ligand space, such as identifying possible interaction sites all over the protein, and they are able to accurately and efficiently characterize ligand pathways to a binding site even when the bound pose is not identified. We analyze our results by placing adaptive sampling in the context of the algorithmic exploration–exploitation dilemma, a class of problems featuring a trade-off between obtaining new knowledge (exploration) and using that knowledge (exploitation).[20]

Methods

Overview

Adaptive sampling protocols generally consist of four main steps, as depicted in Figure .

Figure 1

Overview of the adaptive sampling protocol. Initially, multiple independent molecular dynamics simulations are run of an input system with multiple copies of the ligand in solution. All configurations of the molecular system observed in simulation trajectories are clustered, then used to update a statistical model. Clusters are selected for resampling in the next round by assigning a score to each cluster, and new molecular system configurations are built from selected clusters as starting points for another round of simulation.

Run multiple independent molecular dynamics simulations, generally starting from different configurations of the molecular system. Cluster all configurations of the molecular system observed in simulation trajectories. Select clusters for resampling in the next round. This involves assigning a score to each cluster, often on the basis of a statistical model that describes what is known about the dynamics of the molecular system given the simulations performed so far. Build new molecular system configurations to serve as the starting points for another round of simulation, based on configurations from the selected clusters. Overview of the adaptive sampling protocol. Initially, multiple independent molecular dynamics simulations are run of an input system with multiple copies of the ligand in solution. All configurations of the molecular system observed in simulation trajectories are clustered, then used to update a statistical model. Clusters are selected for resampling in the next round by assigning a score to each cluster, and new molecular system configurations are built from selected clusters as starting points for another round of simulation. Many design decisions and hyperparameters must be set to have a functional adaptive sampling implementation. In Sections –2.6, we introduce a recommended adaptive sampling protocol that requires no a priori knowledge of the ligand’s bound pose or even the neighborhood of its binding site, with well-defined, user-tunable parameters. We allow a choice of several scoring functions, including the hub scores metric, which has not been previously used for adaptive sampling. Additionally, our implementation is the first to enable the effective use of multiple identical ligands in a single simulation to further enhance sampling, with a system building step to place them as efficiently as possible. The source code for our implementation and configuration files used for test systems are publicly available at github.com/drorlab/adaptive_sampling.

Molecular Dynamics Simulation Step

Each round of adaptive sampling begins with running many simulations in parallel. For the initial round of sampling, one or more identical ligands are placed at random positions that are at least some minimum distance from the protein and from one another, typically in the water surrounding the protein. In subsequent runs, the ligands are placed in regions of interest, as defined by the scoring function (see Section ). Each simulation begins with a static structure with initial atom velocities assigned randomly. This diversifies sampling because initial random assignment of velocities may remove the system from low energy states, favoring exploration.[21] The structure is minimized and equilibrated with restraints on the ligand and protein (see the Supporting Information (SI)). Using nligands in each simulation rather than only one yields an nligands-fold increase in sampling but runs the risk of mischaracterizing molecular system behavior if the ligands frequently interact with one another or communicate through the protein. This parameter should be selected such that ligands have sufficient “room to roam” and do not exhibit a tendency to aggregate in simulation. Two additional user-specified parameters determine the number N and length trun of samplers. Available computational resources and competing research priorities will place a realistic upper bound on N, but larger values are generally better because more samplers means greater exploration of configuration space in the same amount of wall-clock time. The choice of trun requires some consideration. Individual trajectories need to be long enough to have a reasonable chance of capturing relevant events but short enough that one can perform multiple rounds of simulation, with the results of each guiding the next.

Clustering Ligand Positions

Raw simulation data specify the spatial coordinates of every atom for every frame in the trajectory. To select ligand coordinates for resampling in the next round, we must divide the system configurations sampled in simulations thus far into a discrete list of clusters, which can be used to construct a statistical model and then scored. In practice, each cluster will correspond roughly to a set of ligand locations, although clusters may also reflect the internal conformation of the ligand and protein. To cluster effectively, this high-dimensional trajectory data set must be reduced to a lower-dimensional representation that captures the features of interest, particularly the position of each ligand relative to the protein. We wish to assign these features without prior knowledge of the binding site or pose, disqualifying the use of metrics like root-mean-square deviation (RMSD) to bound pose or distance to binding pocket residues. Because multiple identical ligands may be in the system, we featurize each ligand independently. That is, for each frame of a simulation with 10 ligands, 10 feature vectors will be generated. We make the assumption that ligands do not interact substantially, which is moderately enforced at the system building phase by placing ligands far from each other. Ligands are featurized according to the log of the minimum distance of each ligand heavy atom to each protein residue. The dimensionality of the vector is reduced to the slowest evolving components in time with time-structure-based Independent Components Analysis (tICA).[22,23] Clustering on the tICA components produces “geometric” clusters because the only information present in the original feature vectors is spatial. Simple geometric clustering is insufficient to produce a good representation of the system. In particular, there are far more spatially distinct ligand positions in the bulk solvent than there are protein-interacting ones, but these solvent positions are not meaningful because ligands in solvent rapidly exchange between different geometric clusters and completely solvated ligands are not a priority for resampling. We therefore apply a second clustering step in which geometric clusters that quickly interconvert are “lumped” into macrostates. The resulting macrostate clusters are geometrically distinct but of varying size due to this second, kinetic clustering step (see the SI). Clustering is done on the entire data set, not just the new trajectories from each round, because cluster assignments change when more of the configurational space is explored. User-defined parameters in this step include the tICA lag time, the number of tICA components used for clustering, the number of geometric clusters, and the number of macrostates. Of these, the tICA lag time is of special importance because it determines the fastest dynamics that may be resolved in the final model, regardless of the frequency with which individual simulation frames are saved.

Model Construction

After clustering, observed transitions between macrostates are counted and a Markov state model (MSM) is constructed.[24] This model represents each macrostate cluster as a node in a directed graph, where edges between nodes correspond to possible transitions from one ligand macrostate to another, with a “weight” specifying a transition rate estimated from the frequency of transition observed in simulation. This model can be used to estimate many different properties of the molecular system, including pathways, kinetics, equilibrium populations, and free energies.[25] The construction of this model uses several user-defined parameters, including the number of macrostates and MSM lag time, which are further described in the SI. For a high-quality MSM suitable for further analysis, parameter selections are validated by observing convergence in the model’s implied time scales.[26,27] However, this validation requires a significant amount of simulation data, creating a chicken-and-egg problem for adaptive sampling runs. We instead choose the model parameters somewhat arbitrarily, using the MSM only to guide sampling with the assumption that it is of dubious quality. Indeed, we find that even for successful adaptive sampling runs, our parameter selections are not ideal (Supplementary Figure 1).

Scoring Functions

The choice of which clusters from the macrostate model to resample is made by selecting clusters for resampling with a probability inversely proportional to their score, as determined by some scoring function. We apply a scoring function new to adaptive sampling, hub scores, and compare it to two simpler ones, populations and counts. An ideal scoring function will be resistant to model uncertainty; that is, when the model is far from convergence, scores will still reflect useful states for resampling despite little data or an inaccurate picture of sampling. The scoring function should also be transferable between systems and should work regardless of the depth of the binding pathway, the presence or absence of a lipid bilayer, and so on. The simplest scoring function we investigated is the counts function, where a state is selected with probability inversely proportional to the number of times the state has been sampled in simulation. This rewards the exploration of new or sparsely sampled states regardless of their context. It has been used in several previous studies.[18,19,28] The populations scoring function selects states for resampling with probability inversely proportional to their predicted equilibrium population by the current MSM. This is equivalent to sampling states that the current model predicts have high free energy. This scoring function aims to both reduce uncertainty in the model, as poorly characterized states often have low predicted equilibrium population, and encourage resampling of higher energy transition states, such as those present in binding pathways. This scoring function has been previously implemented as the exploration phase of Free Energy Guided Sampling.[15] Finally, we consider the use of hub scores, again with states selected for resampling with probability inversely proportional to their score. The hub score, originally applied to quantify protein folding networks,[29] has not been used previously for adaptive sampling. This score is a measure of a state’s connectivity in the MSM, with a higher hub score specifying greater connectivity. More specifically, the hub score measures the probability that any given state will lie on a pathway between any two points in the configuration space, according to the MSM. In a typical protein–ligand MSM, the bulk solvent state has the highest hub score, and states more distant from it are lower. Although the hub scores function has not been previously used for protein–ligand adaptive sampling, it appears promising due to its scoring of states in the context of the full model. When searching for bound poses, it makes sense to look for states that are at the end of pathways. Conversely, transient protein–ligand interactions intuitively should have a higher rate of interconversion with solvent, resulting in a higher hub score than a bound state at the end of a longer pathway. In this study, we aimed to conduct a survey of broadly different scoring functions rather than a complete enumeration, and, as such, we omitted several previous implementations that have been either subsumed by or shown to be less useful than our selected three. For example, resampling uncertain transitions more[13] is approximately equivalent to resampling uncertain states more with the counts function.

System Building

The final step of each adaptive sampling iteration is the construction of simulation systems for the next set of MD runs. This step is complicated by the fact that we typically use multiple ligands in each simulation, and we wish to place each of them preferentially in clusters that received low (i.e., good) scores. We therefore do not simply restart simulations from frames of previous simulations. To ensure some diversity in cases where one cluster has a much lower score than others, the top-scoring N clusters are automatically sampled in the next round. A random simulation frame from each of these clusters is used to set the initial protein conformation and the position of a single ligand for one sampler. For each additional ligand, a cluster is selected at random with probability inversely proportional to its score. The ligand coordinates are set from a randomly chosen frame assigned to that cluster, provided that it is not too close to the protein or to other already-placed ligands, as determined by a user-configurable cutoff. Ligands that cannot be placed are positioned randomly in the bulk solvent. Once each system is built, simulation begins, kicking off the next round of the adaptive sampling process.

Standard Test System: Trypsin–Benzamidine

The binding of the small molecular inhibitor benzamidine to the protein trypsin is frequently used as a model system for evaluating methods involving protein–ligand binding.[5,30] The ligand-binding site is exposed to solvent, and little conformational change takes place upon binding. The high on-rate of benzamidine also enables easy sampling of multiple binding events, with an experimental association constant of 2.9 × 107 mol–1 s–1[31] and a computationally determined mean first passage time of 500 ns at a concentration of 0.0037 M.[17] We ran six trials of adaptive sampling with N = 10 samplers with individual simulation length trun = 10 ns and two trials with each of the hubs, counts, or populations scoring metrics. We arbitrarily chose a maximum run time of 80 sampling rounds because this allowed all conditions to sample at least one binding event. An initial 10 ns of equilibration was omitted from each trajectory to avoid biasing the model; see the SI. As a control, we ran six trials of traditional MD simulation, each with 10 samplers running for 800 ns each, which is equivalent to the total amount of simulation performed with adaptive sampling.

Realistic System: β2AR–Dihydroalprenolol

We also evaluated the performance of adaptive sampling on a more difficult system that represents a more typical real-world use case of the method. We selected the binding of the beta-blocker dihydroalprenolol to the β2 adrenergic receptor (β2AR) for this test because its binding process has previously been characterized by extensive traditional MD simulation.[7] β2AR is a membrane-bound G-protein-coupled receptor (GPCR) where the ligand-binding pocket is removed over 15 Å from bulk solvent. The binding pathway requires the ligand to first enter the “extracellular vestibule” of the protein, at which point it has a >50% chance of continuing on to the binding site.[7] This system is especially challenging because it contains a lipid bilayer, and the ligand is rather hydrophobic and will frequently partition into the bilayer. A previous unadaptive MD simulation study[7] of this system cut short runs where all 10 dihydroalprenolol molecules entered the membrane because dihydroalprenolol leaves the membrane quite slowly once having entered. We ran two trials of adaptive sampling using hub scores and two using population scores. In each case, we used N = 20 samplers with individual simulation length trun = 40 ns (with the initial 10 ns of equilibration omitted; see the SI) for 40 rounds of sampling. The counts metric was not benchmarked on this system because initial attempts resulted in all ligands partitioning into the lipid in various locations. These ligands fail to bind to the protein without first partitioning back into solvent, which would require prohibitive amounts of simulation time to sample. Ligand positions in the membrane are geometrically distinct and do not rapidly interconvert, resulting in a unique cluster assignment at each membrane location. Because there are many possible places where the ligand may partition into the plane of the lipid, the probability of seeing one location many times is low, resulting in a low count for each membrane-bound cluster. The counts metric therefore ends up preferentially sampling all possible ligand locations in the membrane and fails to progress along actual binding pathways. Although the hub scores function loosely correlates with observed counts, the proximity of membrane-partitioned locations to solvent results in these clusters having a higher hub score and therefore avoiding resampling. There is no correlation between clusters with low count and hub score (Supplementary Figure 2). As a control, we ran two trials of traditional MD simulation, with each replicate running for an equivalent simulation time of 1.6 μs. We also obtained trajectories used in the previous binding study[7] for additional validation. Because adaptive sampling involves running a group of simulations simultaneously, all comparisons between our implementation and traditional MD simulation will be done using groups of independent simulations with equal total amounts of simulation time.

Results

Adaptive Sampling Correctly Identifies Binding Pathways and Intermediate States

Questions of whether or not adaptive sampling is faster or better than traditional MD simulation are moot if the results of adaptive sampling are not interpretable or fail to capture necessary information to describe binding. It is also possible that the pathways predicted by adaptive sampling do not represent the true predominant binding pathway, as the method could repeatedly resample states in such a way as to force the system to move over high-energy barriers while a lower energy transition state is missed. For both test systems, we evaluate if our method sampled the known bound pose and pathway. We compared our β2AR adaptive sampling runs to the conclusions of a previous study where long MD simulations were used to characterize binding.[7] That paper notes two entry pathways: In 11 of the 12 simulations where a ligand bound, it first entered the “extracellular vestibule” (a region enclosed by extracellular loops (ECLs) 2 and 3 and helices 5–7) and then proceeded into the binding pocket. In the remaining simulation, the ligand entered between ECL 2 and helices 2 and 7 before proceeding into the binding pocket. All of our adaptive sampling trials captured both of these binding pathways (Figure ). Moreover, they did so without requiring any manual analysis―to obtain binding pathways from these adaptive sampling runs, we simply queried the MSM for the top pathways from the bulk solvent cluster to the cluster corresponding to the bound pose. Interestingly, both pathways were obtained even from an adaptive sampling trial that did not sample the crystallographic bound pose; in this case, we queried for a pathway that reached the cluster closest to the bound pose.

Figure 2

Primary (left) and alternative (right) pathways for ligand entry to the binding pocket of β2AR, as determined by 60 rounds of two adaptive sampling trials using the hub scores criteria. At the top are the predicted pathways from a trial where the crystallographic pose for the ligand was sampled, and at the bottom are those from a trial where the ligand entered the binding pocket but did not sample the crystallographic pose. In both cases, the model was queried for all paths from the most populous cluster (representing bulk solvent) to the cluster with the lowest mean RMSD to the crystallographic pose. The pathways shown were selected from the top ten highest flux paths based on the number of well-defined clusters visited. The crystallographic pose is shown as sticks, and ligand positions along the pathway are shown as pins, with the pinpoint at the nitrogen atom and the round end at the benzene ring center. The pins are colored by the cluster assignment within the pathway, with 20 pins shown per cluster. By contrast, when we performed the same automated MSM-based analysis for the trajectories obtained by the same amount of traditional MD simulation, we did not identify both binding pathways. For the first traditional MD simulation trial, only the less common binding pathway was clearly identified by the MSM. Manual analysis of the trajectories from this run shows that only one pathway was sampled in the simulation. For the second traditional MD run, neither binding pathway was clearly identified because the clusters did not have sufficient spatial resolution; in particular, the pathway intermediates were lumped together with bulk solvent. This marked difference in the quality of the predicted binding pathway suggests that by repeatedly placing the ligand in regions of interest, adaptive sampling may better characterize all available binding pathways. Traditional MD simulation may require significantly more simulation time to do this, because once the ligand is bound, it does not leave the pocket for some time. For the trypsin–benzamidine system, our results support those of previous work[17] in which adaptive sampling consistently samples the crystallographic pose of benzamidine. However, our featurization scheme, which neglects protein conformation, results in adaptive sampling’s failure to capture the protein’s metastable conformational states associated with binding, as the bound state, as reported by the final MSM, contains many different protein conformations (Supplementary Figure 3). The predicted binding pathway involves the ligand exploring the solvent and, once in the neighborhood of the binding site, quickly entering the pocket and binding. However, trypsin’s binding pocket is defined by three mobile protein loops, and the true binding pathway involves conformational changes that expose the binding site.[32] These results suggest that for systems where the protein undergoes a conformational change associated with ligand binding, our adaptive sampling implementation fails to capture sufficient conformational hints from which to reconstruct an accurate pathway, even when the bound pose is sampled in the simulation. However, this limitation is not inherent to adaptive sampling methods, in general, and a more sophisticated featurization scheme could produce macrostates that capture both protein and ligand conformation and is an interesting topic for future work.

Adaptive Sampling Does Not Typically Sample the Bound Pose More Quickly, Especially for Realistic Systems

For both systems, both adaptive sampling and traditional MD simulations were successfully able to sample the bound state. For adaptive sampling, all of the benchmarked scoring functions obtained binding events on at least one test system, although the time to binding varied considerably between trials (Supplementary Figures 4 and 5). The only scoring function that failed to obtain binding on both test systems was counts, which resulted in all ligands partitioning into the membrane in the β2AR–dihydroalprenolol system. This result indicates that there is a significant opportunity to further refine scoring functions for adaptive sampling because counts is by far the most common metric used in prior work. Surprisingly, adaptive sampling did not consistently sample the bound pose faster than traditional molecular dynamics simulation, especially for the β2AR system, where one traditional MD simulation samples a binding event in the first 200 ns of simulation (Figure b). By contrast, the fastest adaptive sampling run captured a binding event after 400 ns. Calculating the total simulation time per binding event shows that whereas adaptive sampling frequently makes more effective use of simulation time, high variance between runs prevents us from drawing stronger conclusions about the differences between adaptive sampling and traditional MD (Supplementary Figure 6).

Figure 3

Root-mean-square deviation (RMSD) to the crystallographic pose for (a) the single benzamidine ligand present in the trypsin system for two selected trials and (b) the 10 dihydroalprenolol ligands present in the β2AR system over time for all trials, including data from a previous study.[7] For adaptive sampling, vertical lines indicate resampling events every 10 (for trypsin–benzamidine) or 40 ns (for β2AR–dihydroalprenolol). All independent simulations are plotted simultaneously. Dark traces show RMSD values smoothed with a Savitsky–Golay filter with window size 5.8 ns, with unsmoothed RMSD traces shown behind in a lighter color. Binding events are defined as the ligand going from RMSD >3 Å to <2 Å and are indicated by arrows. A dotted horizontal line demarcates this 2 Å RMSD cutoff. (c) Cumulative number of binding events observed over all trials in this paper for the trypsin (top) and β2AR (bottom) systems. It is important to note that by repeatedly resampling areas of interest and ignoring others, adaptive sampling yields trajectories that are correlated, and the number and type of binding events obtained are therefore not directly equivalent to those present in independent simulations. Researchers should perform additional validation on pathways predicted by adaptive sampling to check that these correlations did not unduly bias the results. The performance difference between traditional MD simulation and adaptive sampling is not quite as extreme as these data suggest because traditional MD typically does not obtain a bound pose in 200 ns. Other simulations run obtained binding in anywhere from 800–2000 ns, with a total of 3/40 traditional MD runs and 5/20 previously published trajectories obtaining a ligand position with an RMSD <2 Å to the crystallographic pose in an equivalent amount of simulation time to our adaptive sampling trials. However, unlike traditional MD, where all trajectories are wholly independent, adaptive sampling typically requires multiple simultaneous samplers that then exchange information. If there are cluster resources available for running 20 simulations simultaneously, for example, the odds of obtaining binding in at least one traditional MD simulation on this cluster are generally good, with a reasonable probability that a binding event is observed within the first microsecond or so. Adaptive sampling may not be worth the resources required if the only desired information about the protein–ligand system is the bound pose of the ligand.

Adaptive Sampling Cannot Reliably Identify Bound Poses without Follow-Up Analysis

Given that adaptive sampling seems to be able to sample the pose in simulation, we now address the issue of whether adaptive sampling can reliably identify bound poses. Querying the final models after 60 rounds of sampling, the β2AR system finds that both runs using the hub scores criterion had the bound pose represented as one of the top three scoring clusters (of 50), whereas neither run with the populations criterion identified the pose in the top ten clusters despite repeatedly sampling the pose in the input trajectories. This agrees with previous suggestions[33] that MSM predictions are highly sensitive to the parameters used to construct the model. In this case, the choice of scoring function affects result quality significantly, and other model parameters such as lag time or number of macrostates may have even more dramatic effects. Our clustering scheme introduces an additional complication to identifying poses because the kinetic clustering step results in clusters that can be spread over large areas of the simulation box. Even if the bound pose is assigned to a given cluster, other frames assigned to that cluster could span the entire binding pocket, preventing pose identification. For example, of the three highest scoring clusters for one hub score run on β2AR, the top two have high spatial variance (Figure ). Despite not representing the crystallographic pose, these two clusters may occupy possible binding sites because an intracellular agonist has been crystallized in approximately the same region as the top-scoring cluster,[34] and the other cluster encompasses a location where cholesterol molecules are frequently present in crystal structures.

Figure 4

Twenty ligand positions from each of the top three clusters (blue, red, and yellow, respectively) are shown for two trials of adaptive sampling on the β2AR system using the hub scores criteria for a total of 2.4 μs aggregate simulation. For each trial, the model built from 60 sampling rounds was queried for the top three scoring clusters, and the displayed frames assigned to each cluster were randomly chosen. The crystal structure is shown with the protein as gray ribbons and the ligand as black sticks. The MSMs used in adaptive sampling are constructed primarily to inform subsequent rounds of sampling and are not intended to be high-quality models. Because the bound pose is indeed sampled in all of our adaptive sampling runs for the β2AR system, postrun analysis on the set of trajectories that uses geometric clustering only, incorporates knowledge of the expected binding site, or involves more sophisticated kinetic calculation would most likely be able to identify the pose. Future work could improve predictions by selecting clusters that have features likely to be characteristic of bound poses, such as low spatial variance or multiple hydrogen bonds.

Adaptive Sampling Better Characterizes Possible Allosteric Sites

Our benchmark simulations reveal another advantage of adaptive sampling: Binding sites away from the most ligand-accessible areas, such as the water-exposed regions of the β2AR system, are more rapidly sampled. Often, ligands may bind at locations on the protein other than the canonical, or orthosteric, binding site. These “allosteric” sites are of increasing interest as pharmaceutical targets, as differences in the orthosteric pocket between receptor subtypes can be subtle and targeting allosteric sites can thus allow for better subtype selectivity.[35,36] In assessing the effectiveness of adaptive sampling at identifying allosteric sites, we quantify how frequently the ligand samples the allosteric sites found in our test systems. Because all possible or even all common allosteric sites are not characterized for β2AR, we use information about allosteric sites present in the entire family of GPCRs, of which β2AR is a member with the locations of possible allosteric sites approximated by the locations of crystallographically resolved cholesterol (and cholesterol hemisuccinate) molecules. This is reasonable because cholesterol is found in multiple, conserved binding sites on multiple GPCRs and is a hydrophobic molecule that is chemically similar to dihydroalprenolol. Adaptive sampling using either population scores or hub scores achieves significantly greater sampling of all possible cholesterol sites compared with traditional MD simulation (Figure ). In an equivalent amount of simulation time, the ligands in adaptive sampling simulations also sample the entire space within 5 Å of the protein more completely than traditional MD (Supplementary Figure 7).

Figure 5

On the upper left, β2AR is shown as ribbons with sticks showing the locations of all resolved cholesterols (either as cholesterol or cholesterol hemisuccinate) in Class A GPCR structures deposited in the Protein Data Bank (PDB).[48] Plotted is a sampling of each of those cholesterol locations in terms of the probability of any ligand atom occupying the region in a given frame. (See the SI.) This improvement in sampling also holds true for the trypsin system. Again, the ligands in adaptive sampling simulations also sample the space within 5 Å of the protein more completely than an equivalent amount of traditional MD (Supplementary Figure 8).

Discussion

Thinking about Adaptive Sampling in the Context of the Exploration–Exploitation Dilemma

Adaptive sampling is most commonly used in the field of robotic remote sensing, where a robot or team of robots is tasked with finding a small number of interesting regions in a large, poorly characterized environment.[37] When solving this exploration problem, the robots have the choice of either visiting regions in the neighborhood of the currently known best region (exploitation) or minimizing uncertainty in unknown parts of the space (exploration).[38] A well known review by Thrun gives an overview of this trade-off in the context of reinforcement learning.[39] Exploring the configuration space of a protein–ligand system with the goal of identifying (or at the very least, sampling) bound poses is a similar problem on a much smaller physical scale. Adaptive sampling methods here face the same exploration–exploitation dilemma, where a trade-off must be made between searching for new binding sites and interactions (exploration) or resampling already seen locations (exploitation) that may provide more accurate bound poses or progress further along binding pathways. This trade-off has been previously recognized in the context of protein conformational sampling[16,40] but has not yet been discussed for the ligand-binding problem. Some computational methods for the ligand-binding problem choose to emphasize exploration to an extreme, mapping out the entire protein surface probabilistically in terms of where the ligand can bind.[41,42] Others employ a more exploitation-based approach using a knowledge-based metric such as RMSD to bound pose or ligand distance to binding pocket residues to determine which states should be resampled.[21,32,43,44] Human intuition can also be used to manually determine which states are of interest for resampling.[45] In cases where the binding site of the ligand is treated as an unknown, the scoring functions that have been used previously for adaptive sampling of protein–ligand binding tend to favor exploration over exploitation. This is also true of the hub score we have introduced, as newly discovered states frequently have the lowest hub score. To encourage exploitation over exploration, one needs a scoring function that can recognize when a state is likely to be on the binding pathway. For our purposes, an effective scoring function (1) does not require any a priori knowledge of binding pose or site, (2) should avoid biasing the simulations toward the sampling of unlikely, or worse, unphysical pathways, and (3) produces superior sampling for the quantity of interest relative to traditional MD simulation. The scoring function used is roughly equivalent to the progress coordinate used in path-sampling methods, but applications of these methods to protein–ligand systems[42,46] require knowledge of the ligand-binding site. Designing a scoring or progress metric that does meet this first criterion is complicated by the inefficiency of these methods in sampling high-dimensional spaces,[47] and reducing the dimensionality of this space requires the researcher to make some assumptions of the nature of the binding pathway. Designing such a function for either adaptive or path-sampling contexts without prior knowledge of where the ligand binds is nontrivial but not impossible. For example, one might use the ligand’s solvent-accessible surface area (SASA), the variance of all ligand positions assigned to a macrostate, or the interaction energy between the ligand and the protein. Indeed, prior work has balanced exploration–exploitation trade-offs for sampling protein conformations by incorporating physical properties such as protein SASA, free energy, or experimental measurements.[16,40] However, adaptive sampling algorithms that favor exploitation must have the ability to identify when exploitation has failed to know when more exploration is necessary,[39] a challenging thing for a problem where even expert medicinal chemists cannot readily determine what differentiates true bound poses from decoys. Developing a scoring function that better handles this dilemma is an interesting subject for future work.

Rationalizing Adaptive Sampling’s Performance Relative to Traditional MD Simulation

It is no surprise that a method that favors exploration better explores possible ligand locations compared with traditional MD simulation. For systems where getting the ligand in the vicinity of the binding pocket is sufficient for binding, adaptive sampling can be expected to outperform traditional MD because it especially excels at obtaining ligand sampling broadly across the protein surface. This is why adaptive sampling is a win compared with traditional MD for the trypsin–benzamidine system (as seen in both our implementation and that of others). This binding process does not require consistent sampling of some intermediate state (exploitation). Instead, the ligand binds quickly once in the neighborhood of the binding site (exploration). For β2AR and dihydroalprenolol, however, finding the exact bound pose requires considerable exploitation of an intermediate state in the binding pathway. In this system, the key intermediate in binding is in the extracellular vestibule,[7] where the ligand may dwell for quite some time before finally assuming the bound pose (see traditional MD RMSD traces in Figure b). This intermediate state must therefore be sampled repeatedly to have a reasonable chance of seeing a binding event. In addition to using scoring criteria that may fail to recognize relevant intermediate states, our adaptive sampling protocol enforces diversity among the samplers at the system building step when placing the first ligand. Although a perfect implementation might, in this case, have all samplers focusing on the intermediate state, in reality the models are quite underdetermined, and allowing all samplers to resample a single state results in a substantial chance of wasting sampling on irrelevant ligand positions. In the β2AR test case, enforcing diversity in the built systems results in the ligand being removed from the vestibule to sample some other location of interest. As a result, adaptive sampling results in a much lower ligand dwell time in the vestibule and thus fewer binding events for β2AR (Supplementary Figure 9). Although adaptive sampling is not consistently the best approach for quickly identifying bound states, it excels in sampling many binding events and characterizing pathways, even if the exact bound state is not found. Our adaptive sampling runs that obtained binding events sampled many more binding events than the traditional MD simulations. In traditional MD simulation, once a ligand has bound, it rarely unbinds due to the higher energy barrier for unbinding. This prevents sampling of multiple binding events in a single simulation. During adaptive sampling, the ligand is frequently repositioned and can be removed from the binding site to sample other locations of interest, resulting in many more binding events. This results in a better characterization of the binding process, with less uncertainty in the transitions between states along the pathway but less time spent in the bound state, especially for β2AR (Supplementary Figure 10). Additionally, adaptive sampling’s ability to better sample all ligand locations close to the protein means that it holds promise for identifying regions where the ligand has a reasonable dwell time. Docking, follow-up simulation, or other analyses may then be performed on those locations to produce possible bound poses of that ligand at that potential site.

Conclusions

Whereas it may be initially surprising that adaptive sampling may offer little benefit in terms of getting ligands to bind faster, putting the problem of ligand binding in the context of the exploration–exploitation dilemma provides an explanation. Getting ligands to bind in simulation is algorithmically equivalent to exploring some large, underdetermined space (in this case, the space consists of all possible protein–ligand interactions) to find a small number of regions of interest (ligand-binding sites), with a high cost of sampling (running simulations is slow). Adaptive sampling with the scoring functions typically used in the absence of a known binding site is an inherently exploration-focused method. This results in a longer time to obtain binding for systems such as β2AR with a key intermediate state that must be resampled. However, for scientific questions that require the broad exploration of protein–ligand configuration space, such as the identification of possible allosteric binding sites, adaptive sampling offers a substantial benefit to the researcher, as its focus on exploration results in much broader sampling of ligand locations near the protein, in general. Likewise, adaptive sampling methods excel at providing an accurate description of ligand-binding pathways. Scoring functions that favor exploitation represent a promising area of future work. Indeed, one might be able to combine the ability of adaptive sampling methods to rapidly identify potential ligand-binding sites with sampling approaches that are better able to exploit these sites to get true bound poses within them.

41 in total

1. Quantifying hub-like behavior in protein folding networks.

Authors: Alex Dickson; Charles L Brooks
Journal: J Chem Theory Comput Date: 2012 Impact factor: 6.006

2. Structural basis for modulation of a G-protein-coupled receptor by allosteric drugs.

Authors: Ron O Dror; Hillary F Green; Celine Valant; David W Borhani; James R Valcourt; Albert C Pan; Daniel H Arlow; Meritxell Canals; J Robert Lane; Raphaël Rahmani; Jonathan B Baell; Patrick M Sexton; Arthur Christopoulos; David E Shaw
Journal: Nature Date: 2013-10-13 Impact factor: 49.962

3. Markov models of molecular kinetics: generation and validation.

Authors: Jan-Hendrik Prinz; Hao Wu; Marco Sarich; Bettina Keller; Martin Senne; Martin Held; John D Chodera; Christof Schütte; Frank Noé
Journal: J Chem Phys Date: 2011-05-07 Impact factor: 3.488

Review 4. Kinetics for Drug Discovery: an industry-driven effort to target drug residence time.

Authors: Doris A Schuetz; Wilhelmus Egbertus Arnout de Witte; Yin Cheong Wong; Bernhard Knasmueller; Lars Richter; Daria B Kokh; S Kashif Sadiq; Reggie Bosma; Indira Nederpelt; Laura H Heitman; Elena Segala; Marta Amaral; Dong Guo; Dorothee Andres; Victoria Georgi; Leigh A Stoddart; Steve Hill; Robert M Cooke; Chris De Graaf; Rob Leurs; Matthias Frech; Rebecca C Wade; Elizabeth Cunera Maria de Lange; Adriaan P IJzerman; Anke Müller-Fahrnow; Gerhard F Ecker
Journal: Drug Discov Today Date: 2017-04-13 Impact factor: 7.851

5. A molecular simulation protocol to avoid sampling redundancy and discover new states.

Authors: Marco Bacci; Andreas Vitalis; Amedeo Caflisch
Journal: Biochim Biophys Acta Date: 2014-09-02

6. How does a drug molecule find its target binding site?

Authors: Yibing Shan; Eric T Kim; Michael P Eastwood; Ron O Dror; Markus A Seeliger; David E Shaw
Journal: J Am Chem Soc Date: 2011-05-13 Impact factor: 15.419

Review 7. Weighted Ensemble Simulation: Review of Methodology, Applications, and Software.

Authors: Daniel M Zuckerman; Lillian T Chong
Journal: Annu Rev Biophys Date: 2017-03-01 Impact factor: 12.981

8. Structural Diversity of Ligand-Binding Androgen Receptors Revealed by Microsecond Long Molecular Dynamics Simulations and Enhanced Sampling.

Authors: Mojie Duan; Na Liu; Wenfang Zhou; Dan Li; Minghui Yang; Tingjun Hou
Journal: J Chem Theory Comput Date: 2016-09-01 Impact factor: 6.006

9. Computational fragment-based binding site identification by ligand competitive saturation.

Authors: Olgun Guvench; Alexander D MacKerell
Journal: PLoS Comput Biol Date: 2009-07-10 Impact factor: 4.475

10. Mapping the Ligand Binding Landscape.

Authors: Alex Dickson
Journal: Biophys J Date: 2018-09-29 Impact factor: 4.033

9 in total

Review 1. Markov State Models to Elucidate Ligand Binding Mechanism.

Authors: Yunhui Ge; Vincent A Voelz
Journal: Methods Mol Biol Date: 2021

2. Markov state modeling reveals alternative unbinding pathways for peptide-MHC complexes.

Authors: Jayvee R Abella; Dinler Antunes; Kyle Jackson; Gregory Lizée; Cecilia Clementi; Lydia E Kavraki
Journal: Proc Natl Acad Sci U S A Date: 2020-11-12 Impact factor: 11.205

3. Can molecular dynamics simulations improve the structural accuracy and virtual screening performance of GPCR models?

Authors: Jon Kapla; Ismael Rodríguez-Espigares; Flavio Ballante; Jana Selent; Jens Carlsson
Journal: PLoS Comput Biol Date: 2021-05-13 Impact factor: 4.475

4. Protein-ligand binding with the coarse-grained Martini model.

Authors: Paulo C T Souza; Sebastian Thallmair; Paolo Conflitti; Carlos Ramírez-Palacios; Riccardo Alessandri; Stefano Raniolo; Vittorio Limongelli; Siewert J Marrink
Journal: Nat Commun Date: 2020-07-24 Impact factor: 14.919

5. Multisecond ligand dissociation dynamics from atomistic simulations.

Authors: Steffen Wolf; Benjamin Lickert; Simon Bray; Gerhard Stock
Journal: Nat Commun Date: 2020-06-10 Impact factor: 14.919

6. Changes in Protonation States of In-Pathway Residues can Alter Ligand Binding Pathways Obtained From Spontaneous Binding Molecular Dynamics Simulations.

Authors: Helena Girame; Marc Garcia-Borràs; Ferran Feixas
Journal: Front Mol Biosci Date: 2022-07-04

7. High-resolution mining of the SARS-CoV-2 main protease conformational space: supercomputer-driven unsupervised adaptive sampling.

Authors: Théo Jaffrelot Inizan; Frédéric Célerse; Olivier Adjoua; Dina El Ahdab; Luc-Henri Jolly; Chengwen Liu; Pengyu Ren; Matthieu Montes; Nathalie Lagarde; Louis Lagardère; Pierre Monmarché; Jean-Philip Piquemal
Journal: Chem Sci Date: 2021-02-02 Impact factor: 9.825

8. Enhancing Biomolecular Sampling with Reinforcement Learning: A Tree Search Molecular Dynamics Simulation Method.

Authors: Kento Shin; Duy Phuoc Tran; Kazuhiro Takemura; Akio Kitao; Kei Terayama; Koji Tsuda
Journal: ACS Omega Date: 2019-08-19

9. Spontaneous Ligand Access Events to Membrane-Bound Cytochrome P450 2D6 Sampled at Atomic Resolution.

Authors: André Fischer; Martin Smieško
Journal: Sci Rep Date: 2019-11-11 Impact factor: 4.379

9 in total