Literature DB >> 25247823

The adaptive biasing force method: everything you always wanted to know but were afraid to ask.

Jeffrey Comer¹, James C Gumbart, Jérôme Hénin, Tony Lelièvre, Andrew Pohorille, Christophe Chipot.

Abstract

In the host of numerical schemes devised to calculate free energy differences by way of geometric transformations, the adaptive biasing force algorithm has emerged as a promising route to map complex free-energy landscapes. It relies upon the simple concept that as a simulation progresses, a continuously updated biasing force is added to the equations of motion, such that in the long-time limit it yields a Hamiltonian devoid of an average force acting along the transition coordinate of interest. This means that sampling proceeds uniformly on a flat free-energy surface, thus providing reliable free-energy estimates. Much of the appeal of the algorithm to the practitioner is in its physically intuitive underlying ideas and the absence of any requirements for prior knowledge about free-energy landscapes. Since its inception in 2001, the adaptive biasing force scheme has been the subject of considerable attention, from in-depth mathematical analysis of convergence properties to novel developments and extensions. The method has also been successfully applied to many challenging problems in chemistry and biology. In this contribution, the method is presented in a comprehensive, self-contained fashion, discussing with a critical eye its properties, applicability, and inherent limitations, as well as introducing novel extensions. Through free-energy calculations of prototypical molecular systems, many methodological aspects are examined, from stratification strategies to overcoming the so-called hidden barriers in orthogonal space, relevant not only to the adaptive biasing force algorithm but also to other importance-sampling schemes. On the basis of the discussions in this paper, a number of good practices for improving the efficiency and reliability of the computed free-energy differences are proposed.

Entities: CellLine Chemical Disease Species

Year: 2014 PMID： 25247823 PMCID： PMC4306294 DOI： 10.1021/jp506633n

Source DB: PubMed Journal: J Phys Chem B ISSN： 1520-5207 Impact factor: 2.991

Introduction

Although the statistical-mechanical foundations of free-energy calculations were laid a long time ago,[1−3] their practical applications became possible only with the advent of modern computers. From the inception of computer-based free-energy calculations[4,5] it has been clear to theorists that direct Boltzmann sampling of rugged energy landscapes is inefficient. The subsequent development of the field is a history of efforts to remedy this problem. In free-energy calculations, the quantity of interest is almost always the free-energy difference between physical states of the system rather than the absolute free energy of a given state. From this standpoint, calculations can be categorized on the basis of variables used to transform the system between states of interest. Then, two main classes can be distinguished, namely alchemical and geometrical transformations.[6] They rely, respectively, on changes of a parameter in the Hamiltonian or a function of atomic coordinates. The first class encompasses structural modifications of chemical species that rest upon the remarkable malleability of the potential energy function in molecular-mechanics-based simulations,[7,8] reminiscent of the fabled ability of alchemists to transmute base metals into noble ones. Alchemical transformations are often associated with the free-energy perturbation method[2,3] on account of the progressive and perturbative nature of the change incurred by the system of interest, although, strictly speaking, alchemical free-energy calculations can be carried out by way of alternate approaches, such as thermodynamic integration.[1] The very first application of alchemical transformations to a nontrivial chemical problem was published nearly 30 years ago by William Jorgensen, to whom the present contribution is dedicated.[9] In noteworthy agreement with experiment, this pioneering simulation reproduced the relative hydration free energy of methanol with respect to ethane. The second class of transformations embraces virtually any geometric modification in a molecule or a collection of molecules by means of selected collective variables tailored to address the problem at hand, which could vary from changes in the internal degrees of freedom in a molecule to intricate recognition and association phenomena.[7] Such collective variables form the transition coordinate, a low-dimensional representation of a multidimensional mathematical object. The distinction between the two types of transformations is theoretically important. For geometric transformations, the transition coordinate is, in effect, a generalized coordinate, the evolution of which is usually described by Hamilton’s equations of motion. In contrast, for a parameter in the Hamiltonian, no equations of motion naturally exist, although it is possible to extend the formalism of dynamics to include such a parameter.[10] As a consequence, a number of methods for calculating free energy by way of geometric transformations cannot be applied to alchemical transformations without such extension. The adaptive biasing force method can serve as an example.[7] A considerable number of ingenious techniques have been developed to improve the efficiency of mapping free-energy landscapes associated with geometrical transformations along a transition coordinate.[11−25] A common feature of these methods is their reliance on importance-sampling techniques. The central idea of these techniques is to depart from sampling from the Boltzmann distribution defined by the original Hamiltonian and, instead, sample from another distribution that favors regions of phase space that would be visited only infrequently but are important to achieving reliable free-energy estimates. Because this procedure is clearly biased, it is essential to know how to correct, or unbias, it to recover the true underlying distribution. Importance sampling is commonly used not only in statistical mechanics of condensed phases but also in other fields of science, usually as a variance reduction technique most frequently combined with the Monte Carlo method. Probably the most popular, and also the oldest importance-sampling technique used in free-energy calculations is umbrella sampling.[11] It relies on introducing a bias in simulations that favors states corresponding to large values of the free energy along the transition coordinate. Local elevation,[15] conformational flooding,[16] metadynamics,[20,24] and the Wang–Landau algorithm[26,27] are examples of more recent importance-sampling algorithms, united by the common denominator that a memory-dependent potential disfavors regions of conformational space that have already been frequently visited. In a sense, the adaptive biasing force method[19] rewove the fabric of free-energy calculations of geometrical transformations, as it is characterized by both conceptual and practical simplicity and requires, at least in principle, little intervention from the end user. In spite of apparent similarities with the local-elevation and conformational-flooding strategies in its aim to sample efficiently all values of the transition coordinate, its theoretical underpinnings are quite different, as we will argue further in this paper. Furthermore, in contrast with seemingly similar strategies, the adaptive biasing force algorithm requires no prior knowledge of the free-energy landscape at hand. At its core, the adaptive biasing force method is an adaptive importance-sampling strategy in which the quantity being adjusted is the average force acting along the transition coordinate. It helps the system under study escape from kinetic traps in which it would otherwise have remained for a very long time. The method constitutes a highly efficient route to estimating free energies, which, since its inception, has been used to tackle a number of challenging problems of chemical and biological interest, such as mechanical proteins,[28] transport phenomena,[29−31] or protein–ligand and protein–protein recognition and association.[32] More generally, it is a versatile, adaptive, importance-sampling strategy that can be utilized in many fields, whenever sampling of a probability measure is thwarted by metastability of the sampling dynamics. In this self-contained contribution, the multiple facets of the adaptive biasing force algorithm are discussed in an exhaustive manner, tackling a number of issues that have not been addressed so far, or only rarely so. In the following section, we present the theoretical foundations of the method, discussed in the context of other free-energy approaches. Next, we briefly address a number of practical issues related to a proper choice of the transition coordinate. Then, an analysis of the convergence properties of the method and approaches to calculating and controlling statistical errors associated with the calculated free-energy values are presented. The discussion of convergence and errors continues with the focus on nonergodicity scenarios, and ways to identify and circumvent them. Subsequently, we examine for the first time how the adaptive biasing force algorithm is used in conjunction with geometrical restraints, which have to be enforced in many problems of interest. Finally, we discuss some new strategies for combining thermodynamics and kinetics in importance-sampling simulations, before closing with recommendations for ”good practices” in applying the adaptive biasing force method and an outlook toward further promising statistical mechanical and algorithmic developments.

The adaptive biasing force algorithm

In this section, the essential idea behind the adaptive biasing force algorithm is first explained in terms appealing to physical intuition, followed by the theoretical underpinnings of the method presented in a more formal language. Then, the reader is guided through the common expressions for the mean force, and the adaptive algorithm, both of which are at the core of the method. Finally, the method is compared with related importance-sampling schemes.

The Adaptive Biasing Force in Plain Language

The adaptive biasing force method is aimed at improving the efficiency of molecular dynamics simulations in which the potential energy surface is sampled ineffectively due to free-energy barriers. In practice, these barriers appear as bottlenecks in the dynamics of certain privileged coordinates that describe the transitions between physically important states (transition coordinates). They also cause the system to become trapped in some states for durations exceeding the time scale of the simulation, resulting in incomplete sampling. The free energy along a transition coordinate can be seen as a potential resulting from the average force acting along the coordinate (i.e., the negative of the gradient of this potential), hence the name potential of mean force. In the formalism of thermodynamic integration, on which the adaptive biasing force is based, the average force is the quantity that is calculated directly. Subsequently, this force is integrated to yield the potential. The instantaneous force acting along the coordinate may be decomposed into the sum of the average force (which depends only on the value of the transition coordinate) and a random force with zero average, reflecting fluctuations of all other degrees of freedom. Hence, in a low-dimensional view of the process, the transition coordinate evolves dynamically in its time-independent potential of mean force, and this evolution is driven by the random force. In many instances, the random force can be satisfactorily approximated as diffusive, leading to a simple physical picture in which the system diffuses along the transition coordinate in the potential of mean force. The idea behind the adaptive biasing force algorithm is to preserve most characteristics of this dynamics, including the random fluctuating force, while flattening the potential of mean force to remove free-energy barriers, and thus accelerate transitions between states. This is done adaptively, without any prior information about the potential of mean force. To accomplish this, the instantaneous force acting along the coordinate is calculated, and its running time average is recorded, thus providing an on-the-fly estimate of the derivative of the free energy at each point along the pathway. At the same time, an external biasing force is applied, exactly canceling the current estimate of the average force. Over time, as the estimate converges to the average force at equilibrium, the total, biased average force stabilizes at values very close to zero. Then, the system experiences a nearly flat potential of mean force and displays accelerated dynamics along the transition coordinate. The fact that the biasing force is exactly equal to the mean force is actually not crucial. What is important is that the biasing force yields sufficiently uniform sampling of the transition coordinate that the remaining barriers can be easily traversed in response to thermal fluctuations.

Theoretical Backdrop

Let us now define the adaptive biasing force algorithm in a formal way. The adaptive biasing force algorithm is not inherently tied to any specific type of dynamics but does rely on sampling of the canonical ensemble. In explicit-solvent simulations, Langevin dynamics with sufficiently soft damping and small stochastic forces becomes a mere perturbation of Hamiltonian dynamics and may be used as one simple way to achieve canonical sampling. For convenience, but without loss of generality, we will base our description below on Langevin dynamics. Langevin dynamics can be written aswhere (x, p) denotes the positions and momenta of the particles at time t, M is the mass tensor, V : is the potential energy function, γ is the friction coefficient, W is the Wiener process that underlies the random force (white noise), and β–1 = kBT, where kB is the Boltzmann constant and T is temperature. The dynamics of eq 1 is ergodic (under mild conditions on V) with respect to the canonical measure Z–1 exp[−βpTM–1p/2] dp μ(dx), where μ(dx) = Z–1 exp[−βV(x)] dx. Ergodicity means that long-time averages converge to canonical averages:and that the law at time t of the stochastic process converges to the canonical measure in the long-time limit:where is the expected value. The first limit in eq 2 is of particular interest for practical applications because it allows for computing canonical averages from trajectory averages. For a typical potential V, the associated Boltzmann measure μ is multimodal: high-probability regions are separated by low-probability regions. The former correspond, for instance, to the most likely conformations of a biological object, which are typically separated by transition regions of very low probability. For these reasons, estimating averages with respect to the probability measure μ is, in general, a difficult task. In particular, the ergodic properties of the dynamics of eq 1 are not sufficient to devise reliable numerical methods, because under these premises, the stochastic process remains trapped in large-probability regions, and, as a consequence, the long-time asymptotic regime t → ∞ is very difficult to reach in the ergodic limits in eqs 2 and 3. The fact that the system remains for a very long time in some region of phase space before hopping to another region is called metastability, and the corresponding states of the system are called metastable. The inability to reach the ergodic limit is often called quasi nonergodicity; the system appears nonergodic on the time scales of the simulations. A typical example of a metastable state is a local free-energy minimum in the conformational space of a protein. The adaptive biasing force method relies on modifying the potential V in such a way that the energy landscape is flattened along a given transition coordinate ξ: . Here, we restrict ourselves to a one-dimensional transition coordinate, leaving generalization to high-dimensional transition coordinates for the section Expressions for the Mean Force. More precisely, the potential V is changed to x → V(x) – A[ξ(x)] and A is updated in such a way that it converges to the free energy A, defined (up to an additive constant) bywhere the measure δξ( (dx) is supported by the subset {x, ξ(x) = z} and is such that δξ( (dx) dz = dx. In practice, the bias is only applied in a window [zmin, zmax] as explained in the section Justification of a Stratification Strategy. Notice that, by the definition of A, the canonical measure associated with the biased potential V(x) – A[ξ(x)] is such that ∫exp(−β{V(x) – A[ξ(x)]})δξ( (dx) = C, where C is a constant independent of z. Therefore, if the biased potential V(x) – A[ξ(x)]1ξ( is used, the marginal along ξ is a uniform law over [zmin, zmax]. Here, 1ξ( is an indicator function, which is one when ξ(x) ∈ [zmin, zmax] and zero otherwise. Let us recall that the marginal law is defined as follows: if x is distributed according to a probability distribution μ, then the law of ξ(x) is called the marginal of μ along ξ. If we knew the free energy A, it would be a good idea to use −A◦ξ as a biasing potential, because sampling along ξ would be easier. This is illustrated in simple two-dimensional examples in Figures 1 and 2. Notice in particular that the free energy seems to be a good biasing function for efficient sampling of both energetic barriers (Figure 1) and entropic barriers (Figure 2).

Figure 1

Figure 2

Upper row: (A) the original two-dimensional potential is zero inside the hourglass shape, and +∞ outside; (C) time trajectory of the first coordinate x in the stochastic process, showing oscillations between the two metastable wells. Lower row: (B) free energy along the transition coordinate ξ(x,y) = x, featuring a purely entropic barrier; (D) time trajectory of the transition coordinate x in the free-energy-biased dynamics, showing no metastability.

Upper row: (A) original two-dimensional, double-well potential displayed as level sets; (C) time trajectory of the first coordinate x in the stochastic process, showing oscillations between the two metastable wells. Lower row: (B) level sets of the same potential biased by the free energy associated with the transition coordinate ξ(x,y) = x; (D) time trajectory of the transition coordinate x in the adaptively biased dynamics, showing no metastability. Upper row: (A) the original two-dimensional potential is zero inside the hourglass shape, and +∞ outside; (C) time trajectory of the first coordinate x in the stochastic process, showing oscillations between the two metastable wells. Lower row: (B) free energy along the transition coordinate ξ(x,y) = x, featuring a purely entropic barrier; (D) time trajectory of the transition coordinate x in the free-energy-biased dynamics, showing no metastability. The main ingredient that we now need is an update rule for A such that limA = A. This is based on the following formula,[33−35] which defines the mean force, namely, the negative of the derivative of the free energy. More general formulas can be derived, which give rise to many variants of the adaptive biasing force method (see section Expressions for the Mean Force),where the instantaneous force is defined by Equation 5 is a consequence of the definition of the free energy in eq 4 and of the co-area formula (which is a generalization of the Fubini theorem); see for instance ref (8). From eq 5 it follows that A′ is the conditional average of the instantaneous force, Fξ, with respect to the canonical measure conditioned to a fixed value of the transition coordinate: A′(z) = [Fξ(x)ξ(x) = z]. An important observation is that eq 5 remains true if the potential V is changed to V – A◦ξ: for any function A In other words, a biasing potential A depending solely on ξ leaves averages conditioned by ξ unchanged. This is the intuition behind the adaptive biasing force dynamics, which can be written as Indeed, if (x, p) were at equilibrium with respect to the biased canonical measure Z–1 exp(−βpTM–1p/2) dpZ–1 exp[−β(V – A◦ξ)(x)] dx, then A′ would be equal to A′. In practice, of course, the sampling is not sufficiently fast for the process to be instantaneously at equilibrium with respect to the time-varying biased potential V – A◦ξ, and this is why this heuristic is not sufficient to fully understand the convergence of the adaptive biasing force dynamics (see the section Convergence and Error Analysis). However, this simple reasoning is sufficient to check that if A converges to some limit A∞, then necessarily, A∞′ = A′. From a practical viewpoint, the conditional expectation in eq 8 can be computed using two procedures: either time averages over a single long trajectory or averages over many replicas run in parallel. These procedures will be discussed in ample detail in the section Multiple-Walker Strategies below. Intuitively, the adaptive biasing force dynamics thus consists of adding a force A′[ξ(x)]∇ξ(x) that exactly compensates the average of the original force, −∇V(x), along a given transition coordinate. If ξ is well-chosen, the hope is to observe a fast convergence (at least compared to the original dynamics embodied in eq 1 at equilibrium) of A′ to the mean force A′.

Expressions for the Mean Force

Given a transition coordinate ξ(x), the mean force is a well-defined quantity, yet its expression as an ensemble average of an instantaneous force Fξ is not unique, as we will see below. In adaptive biasing force simulations, the choice of a convenient expression for Fξ is driven by practical considerations, notably ease of implementation and numerical behavior, such as variance. The classic expression for the instantaneous force involves an explicit coordinate transformation Ξ from Cartesian to generalized coordinates, which include the transition coordinate ξ. That is, Ξ : , with Ξ1 = ξ and components Ξ for i > 1 are generalized coordinates of no particular physical significance, but necessary to the mathematical framework. A valid expression for the instantaneous force is then[12]which in the physics literature is more commonly written aswhere |J| is the determinant of the Jacobian matrix (∂Ξ–1)(. From the arbitrary choice of Ξ is derived a (somewhat arbitrary) function Fξ, whose ξ-restricted ensemble average nevertheless yields the uniquely defined mean force (eq 5). Equation 10 provides an intuitive view that different choices of Ξ and Fξ correspond to different ways of projecting the Cartesian forces, −∇V, onto the transition coordinate. The direction along which the forces are projected in this expression is the vector ∂1Ξ–1, which we call “inverse gradient”.[36] As the gradient of ξ can be seen as the changes in ξ corresponding to infinitesimal changes in x, the inverse gradient is the vector along which a change in ξ is propagated in Cartesian coordinates, other generalized coordinates (Ξ) being constant, hence the dependence of the inverse gradient on the explicit coordinate transformation. The alternate notation for the inverse gradient, ∂x/∂ξ, has the drawback of hiding this dependence on the choice of Ξ. Numerically, eq 10 is impractical for two reasons. One is that defining Ξ explicitly can be exceedingly difficult, especially if ξ is a collective coordinate (e.g., the radius of gyration of a set of particles). Supposing that this step is done, the second difficulty comes with the numerical computation of the Jacobian derivative, as it involves second derivatives of Ξ–1 whose analytical derivation and numerical implementation may be cumbersome, again, depending on the nature of the transition coordinate. To circumvent this issue, the original adaptive biasing force method was introduced with an instantaneous force estimator based on a constraint force that is calculated iteratively, but never applied.[19] In the initial implementation of the adaptive biasing force algorithm[37] in the popular molecular dynamics program NAMD,[38] eq 10 was used because the small set of implemented coordinates made it practical. As this set was greatly extended in the framework of the collective variables module,[39] more versatile expressions of the instantaneous force were required.[36] Den Otter put forward the idea that the inverse gradient can be replaced with an arbitrary vector field (satisfying certain requirements).[40] In other words, changes in ξ may be propagated along an arbitrary direction in Cartesian coordinates, without explicitly defining a complete set of generalized coordinates. That idea was extended to a multidimensional case by Ciccotti et al.[41] Consider a vector transition coordinate (ξ), in the presence of a set of constraints of the form σ(x) = 0. For each coordinate ξ, let v be a vector field satisfying, for all j and k: The ith partial derivative of the free energy can then be calculated as the conditional average of the following instantaneous force:of which eq 6 is a special case, limited to a single coordinate and choosing v = ∇ξ/|∇ξ|2, which satisfies the condition (eq 11) above. Note that this estimator still requires the calculation of second derivatives, in the form of the divergence of the vector field v, although the relative freedom in choosing v can be taken advantage of to make the divergence calculation practical. The choice v = ∇ξ/|∇ξ|2 is always valid, and as such, convenient for theoretical purposes, but certainly not always optimal when implementing specific generalized coordinates. Expressions that were chosen in practice for those coordinates implemented in the collective variables module are listed in ref (39). Darve et al. described an estimator that does not require second derivatives, but rather first derivatives with respect to time and space, and is valid for multidimensional adaptive biasing force calculations.[42] This estimator resembles a statistical form of Newton’s equation of motion: instead of relating acceleration to the potential energy gradient, it relates the mean acceleration to the gradient of the free energy. In a considerable simplification, only the first derivative of the force with respect to ξ needs to be derived. The time derivative is calculated numerically in the same fashion and at the same level of accuracy as time derivatives of other quantities in molecular dynamics. Other ways of simplifying calculations of instantaneous forces will be discussed in the context of extended adaptive biasing force simulations, or eABF, in the section The Extended Adaptive Biasing Force Method.

Adaptive Algorithm

The final, essential ingredient of the theory underlying the adaptive force method is an algorithm for deriving the current estimate of the average force as simulations progress. In its generic, one-dimensional implementation, the transition coordinate, ξ, connecting two end points, is divided into M equally sized bins of width δξ in which forces are accrued in the course of the simulation. In a naïve approach, the approximation to the average force, F̅ξ(Nstep, k) in bin k after Nstep molecular dynamics steps is just the simple, unweighted average of all force samples in this binprovided that the bin has already been visited at least once. Nstep is the number of samples accrued in bin k after Nstep steps and Fμ abbreviates the μth force sample in this bin. This approach would work well for large Nstep. However, when only a few samples are available in a given bin k, the running average might be a poor estimate of the actual average force in this bin. Moreover, adding additional samples might markedly change F̅ξ(Nstep,k). Large fluctuations in the running estimate of the average force are undesirable, as they may drive the system away from equilibrium, thus slowing the convergence of the algorithm and reducing the efficiency of the method. To control these effects, a procedure is needed to reduce variations in early estimates of F̅ξ(Nstep,k). A number of schemes can be applied for this purpose. In current implementations, the biasing force in bin k at time t is applied in full only if the number of samples N is above a threshold, Nfull. It is ramped up smoothly as N varies from 0 to Nfull. In one implementation,[42] the ramp is linear and the force is proportional to N/Nfull; in another,[36,37] the biasing force is zero for N < Nfull/2 and is ramped linearly above that value, proportionally to 2N/Nfull – 1. Both implementations have proven to be efficient in a number of applications, but other, more advanced schemes are possible. So far, there have been no systematic studies on the efficiency of different adaptive algorithms, but it is anticipated that it may be strongly system-dependent. For a sufficiently large Nstep, F̅ξ(Nstep,k) approaches the correct average force in each bin. Then, the free-energy difference, ΔAξ, between the end point states can be estimated simply by way of summing the force estimates in individual bins.If the average force is a rapidly changing function of ξ, a more sophisticated integration algorithm may be required. It might be also possible to develop binless integration algorithms, similar to those proposed for some other free-energy calculation methods.[43,44] A common trait of importance-sampling algorithms is the discretization of the transition coordinate, ξ, in bins of width δξ in which statistical information is accrued in the course of the simulation.[7] In the umbrella-sampling scheme,[11] for instance, a histogram is constructed, corresponding to the biased probability of occurrence of the molecular assembly of interest at the different values of the transition coordinate. In the adaptive biasing force algorithm,[19] bins are utilized to store instantaneous values of the thermodynamic force that acts along the transition coordinate. As has been observed in practice previously for diffusive dynamics, the instantaneous force in any given bin obeys a normal distribution.[37] At thermodynamic equilibrium, by definition, its average is exactly equal to −A′(z), that is, the gradient of the free energy along the transition coordinate. Insisting upon being at thermodynamic equilibrium is pivotal here, as application of a poorly estimated time-dependent bias, i.e., from a distribution out of equilibrium, will not yield a Hamiltonian bereft of a mean force acting along the transition coordinate. The adaptive algorithms described above contain two adjustable parameters: bin width, δξ, and the number of samples, Ninit, below which R(Nstep) is not equal to 1 or, equivalently, averaging does not follow eq 14. The choice of these parameters should be done with some care. At constant δξ, large values of Ninit yield better estimates of the average force once the number of samples collected in a given bin reaches this threshold value and conventional averaging begins, at the price of delayed application of the full averaging and slow, initial progress along the transition coordinate. Small values of Ninit, in turn, tend to drive the system out of equilibrium. Typically, setting Ninit in the range between 200 and 500 appears to be a good compromise that allows for avoiding both types of problems. At constant Ninit, large values of δξ prevent capturing variations of the average force on short length scales. This may have adverse effects on the accuracy of integration in eq 15. On the other hand, small values of δξ require longer simulation times to collect sufficient force statistics in every bin. If the transition coordinate is a distance in an atomistic system, the choice of δξ as 0.1 or 0.2 Å usually represents a satisfactory trade-off.

Differences with Other Importance-Sampling Algorithms

In the last four decades a number of strategies have been developed for computing free-energy changes as a function of geometrical transformations, each endowed with advantages, as well as limitations.[11−25,45] To a large extent, umbrella sampling, whether in its original formulation or variants thereof,[11,18,46] remains one of the most widely utilized routes to address rare events in molecular simulations. In its original form, umbrella sampling referred to incorporating an external biasing potential in the simulation, i.e., an umbrella,[11] ideally the negative of the free energy, which would yield a broad, if not uniform exploration of the transition coordinate. In other words, in ideal circumstances, the system would evolve in the collective-variable space on a flat free-energy hyperplane, as is also the case for the adaptive biasing force method. Under most circumstances, however, the form of the optimal umbrella is unknown. Thus, for any qualitatively new problem, the end-user must resort to an educated guess regarding the shape of the biasing umbrella potential, usually on the basis of prior knowledge of this and related problems. This may constitute a daunting task.[47] Poorly predicted biasing potentials yield nonuniform probability distributions across the transition coordinate. This decreases the efficiency of umbrella sampling, which, in extreme cases, may reduce rather than improve the efficiency compared to results with unbiased calculations. This common shortcoming led to the development of an adaptive variant of the umbrella-sampling algorithm,[18] wherein the initial guess of the biasing potential is progressively refined in light of a series of short simulations. Adaptive umbrella sampling is one member of a broader family of techniques called adaptive biasing potential methods.[48] Local elevation,[15] conformational flooding,[16] or its more recent avatar, metadynamics,[20,49] adaptive biasing molecular dynamics,[23] and the Wang–Landau algorithm[26,27] belong to the same family. In the former methods, the idea is to penalize the already visited states by changing the potential V to V – A◦ξ, A(z) being related to the occupation time of the value z of the transition coordinate up to time t. The longer the time spent in a bin {x, ξ(x) ∈ [z0, z1]}, the larger the biasing potential A(z), z ∈ [z0, z1]. The local elevation technique and metadynamics are based on a similar idea. The biasing potential A is built as a sum of Gaussian kernels that are periodically added to the Hamiltonian along the ξ variable. This pushes the system away from states that have already been visited and, by doing so, improves sampling. It ought to be noted that the potential A, rather than its derivative, is computed in these two cases, hence, the name adaptive biasing potential methods. One important downside of the adaptive biasing force algorithm, compared to the class of adaptive biasing potential methods, lies in its inability to handle discrete transition coordinates, for instance, coordination numbers. This drawback can be understood by considering that the free energy is now a map from integers to reals, and, thus, has no derivative and, hence, no mean force. From a mathematical viewpoint, the adaptive biasing force method, just like adaptive biasing potential methods, is an adaptive importance-sampling procedure. There is, however, a salient difference between these two techniques. In the latter, the potential of mean force or, equivalently, the corresponding probability distribution along the transition coordinate is being adapted. In contrast, the former relies on biasing the force, i.e., the gradient of the potential. This difference is more important than it might appear at first sight, as potentials and probability distributions are global properties whereas gradients are defined locally. In terms of probability distributions, it means that the count of samples in the neighborhood of a given value of the transition coordinate is insufficient to estimate probability. Knowledge of the underlying probability distribution over a much broader range of ξ is required. This may considerably impede efficient adaptation. In contrast, all that is needed to estimate the gradient is the knowledge of local behavior of the potential of mean force. Other regions along the transition coordinate do not have to be visited. Thus, in many instances, adaptation proceeds markedly faster. Using a common metaphor, the difference between the adaptive biasing potential and adaptive biasing force methods can be compared to inundating the valleys of the free-energy landscape as opposed to plowing over its barriers to yield an approximately flat terrain, conducive to unhampered diffusion. There are also a number of important technical differences between these two methods. For example, in metadynamics and its ancestors, the widths and weights of Gaussian functions and the frequency with which the biasing potential is updated have to be carefully chosen, which often requires considerable experience. In the adaptive biasing force method, estimating the biasing gradient happens automatically by way of a simple algorithm, described in the previous subsection. Another technical concern about adaptive methods is to ensure that adaptation vanishes once A approaches the converged free energy.[50] There are a number of ways to fulfill this condition more-or-less automatically,[51,52] but adaptive biasing force and adaptive biasing potential techniques remain intrinsically different from this point of view. In the adaptive biasing force algorithm, if the correct free energy is given as an initial guess (namely if V is replaced by V – A◦ξ in eq 8), then the biasing force will not be updated (A′ in eq 8 will be constant over time). This is not the case for an adaptive biasing potential strategy. Moreover, if the derivative of the biasing potential is needed (for example to bias the Langevin dynamics as in eq 8), the advantage of the adaptive biasing force algorithm is that A′ is directly computed, whereas in adaptive biasing potential algorithms, one needs to differentiate the evaluated biasing potential A, which may lead to very noisy results because A is estimated along a stochastic trajectory. Because the basic quantity calculated, and subsequently integrated, in the adaptive biasing force method is the force, this approach belongs to the thermodynamic integration class of methods. However, in contrast to conventional implementations of thermodynamic integration and its generalizations, such as the blue-moon ensemble approach,[12] the adaptive biasing force algorithm does not rely on constrained molecular dynamics, but instead is based on unconstrained simulations; i.e., the free-energy difference is not determined at discrete values of the transition coordinate through solving constrained equations of motion. Sampling of the transition pathway proceeds in a continuous, unhampered fashion, guided by the diffusion properties of the system of interest, obviating the need for re-equilibration at fixed, predefined values of the transition coordinate, even in stratified simulations. As will be discussed further in this paper, this may improve ergodic behavior of the system. It is also of interest to compare the adaptive biasing force method with reconstructions of free-energy landscapes from nonequilibrium trajectories that represent repeated pulling experiments.[53−55] The latter are based on the groundbreaking Jarzynski identity,[56] or its extension to bidirectional transformations,[57] combined with steered molecular dynamics. Even though both approaches involve molecular dynamics trajectories that are initially away from equilibrium, there is a fundamental distinction between them. In the adaptive force method, only the average, or systematic force is removed to erase the ruggedness of the free-energy landscape, preserving the random force responsible for diffusion. Once a good estimate of the average force becomes available, the equilibrium behavior of the system is restored. In pulling experiments, the transformation always proceeds away from equilibrium at constant velocity and, therefore, the instantaneous force acting along the transition coordinate is nil. The random force actually appears in the formalism after averaging over the ensemble of pulling experiments. Moreover, achieving convergence in calculations based on Jarzynski’s identity usually requires large numbers of independent realizations,[21] which comes at a significant computational cost. Taken together, when a geometrical transformation can be undertaken at equilibrium, it is not clear whether there is any practical advantage of handling the problem at hand by means of nonequilibrium work experiments rather than the adaptive biasing force method. An important advantage of gradient-based methods[7,8] is the possibility of formally decomposing the free-energy change into physically meaningful contributions,[58,59] thereby helping to dissect qualitatively the nature of the intermolecular interactions at play. It is worth noting that different energy terms contribute to the mean force both explicitly through force terms and implicitly through the Boltzmann weights in the canonical average; contributions can only be separated numerically at the former level, not at the latter. Decomposition of the free energy is generally handled a posteriori through computing the thermodynamic force between, for instance, groups of atoms of interest. This force is then projected onto the transition coordinate determined for each stored configuration, prior to the construction of a histogram from which the average force is inferred. Integration of the latter yields the desired contribution to the total free-energy change across the entire transition pathway. Among many options for reconstructing free-energy landscapes along a transition coordinate, which one is the best? Considering that the efficiency of different methods strongly depends on their implementation in software packages and, very likely, on a system of interest, attempts to answer this question appear somewhat misguided and unproductive. That said, one aspect of the adaptive biasing force algorithm pleading in its favor is its simplicity.[36,37] How the algorithm operates is physically intuitive,[7,42] requiring, in principle, very little prior knowledge of the free-energy landscape, or input from the end-user, even for qualitatively new problems.

TRANSITION COORDINATE

Central to geometric transformations is the concept of a transition coordinate. In this section, this concept is illuminated in the context of free-energy calculations aimed at tackling rare events. Specifically, we will discuss how the transition coordinate is explored with the adaptive biasing force algorithm and delve into the practical aspects of defining this coordinate. Stratification,[60] a common technique for improving the efficiency of free-energy calculations by partitioning the reaction pathway into ranges of the transition coordinate, will be discussed with the focus on the justification for and limitations of this strategy.

Transition Coordinates and Rare Events

For any transition from a macrostate, A, to another macrostate, B, of the same system there exists an exact one-dimensional transition coordinate: the committor probability.[61−64] In most cases, this coordinate is difficult to calculate and usually offers very limited insight into the nature of the process of interest. For these reasons, it is often more useful to employ a transition coordinate that is only an approximation to the committor probability but is physically more meaningful and easier to handle. Sometimes it might be helpful to extend the reduced representation of the transition to a transition coordinate that extends beyond one dimension. Not only physical significance but also efficiency of sampling, given a transition coordinate, is of concern. These two factors are often closely related. As in any enhanced sampling method based on a reduced representation, adaptive biasing force sampling relies on stochastic exploration along the transition coordinate, ξ, enhanced by the adaptive bias, combined with equilibration of other, orthogonal degrees of freedom. Equilibration in the orthogonal space is critical in two respects: It affects the mobility along ξ (see Figure 1B for a diffusive example), and it determines the rate of convergence of A′(z) = ⟨−Fξ(x)⟩ξ, which is an average over the orthogonal degrees of freedom. In the ideal situation of time scale separation, all slow degrees of freedom are captured by the transition coordinates, and relaxation in the orthogonal space is comparatively fast. In other words, the adaptive biasing force algorithm removes metastability along the transition coordinates, provided that other degrees of freedom are not metastable. Complex systems such as biological macromolecules, however, possess many slow, coupled degrees of freedom, making time scale separation difficult or impossible to achieve. Fortunately, empirical results suggest that time scale separation is not an absolute requirement of adaptive biasing force sampling. One reason behind it might be that enhanced diffusion in the transition coordinate space reduces metastability in the orthogonal space, by letting the dynamics sidestep orthogonal barriers rather than cross them, making some “multichannel” cases (see the section Hidden Barriers and Other Challenges to Obtaining Accurate Results) tractable with standard adaptive biasing force simulations. More encouraging still, convergence in such multichannel cases can be markedly accelerated by multiple-walker formulations of the adaptive biasing force algorithm (see the section Multiple-Walker Strategies). Yet, not all intuitive choices of the transition are appropriate. As will be extensively discussed further in this paper, the end-to-end distance of the α-helical deca-alanine peptide provides a good example of an inadequate transition coordinate. Depending on the range of values, the coordinate exhibits completely different behavior. For values corresponding to the stable α-helix (14 Å) and larger, separation of time scales is obeyed and the adaptive biasing force converges well.[37] In contrast, smaller values of the end-to-end distance correspond to a rich set of metastable, compact states that are not resolved by the transition coordinate.[36] As a result, adaptive biasing force dynamics becomes trapped in these states, and the free-energy estimator does not converge in accessible simulation times.[65] Attempts to resolve these metastable states by two- and three-dimensional coordinates give improved results, allowing the exploration of all metastable basins, yet those basins are not all resolved even in three dimensions. Furthermore, the adaptive biasing force dynamics retains some metastability.[36] These difficulties are not due to specific deficiencies of the adaptive biasing force method, but rather, to shortcomings of the reduced representation, which would constitute an obstacle to any sampling method.

Practical Design of a Transition Coordinate

In practice, finding an effective reduced representation is still a process largely guided by physical intuition about the process of interest, as well as trial-and-error. More systematic and robust approaches for this dimension reduction step are an area of active research.[66] A limiting factor is often the availability of usable numerical implementations of the generalized coordinates of choice. The collective variables module[39] is an attempt to overcome this limitation, by providing a rich and flexible toolbox to define many types of coordinates, in particular those useful for the description of biological macromolecules. In this module, the adaptive biasing force is implemented, among other algorithms. Once an intuitive understanding of the relevant generalized coordinate has been obtained, some technical choices remain to be made to express this coordinate as a function ξ of atomic Cartesian coordinates. Though these decisions may seem ancillary, they have a strong influence on the accuracy, convergence, and computational performance of the adaptive biasing force algorithm. When objects of interest are composed of many atoms, there are often several nearly equivalent ways to define the transition coordinate. In sufficiently long simulations, different definitions produce nearly identical potentials of mean force, but the efficiency might vary considerably. For example, the distance between two proteins could be defined by selecting one central atom in each protein, or by selecting the centers of mass of large groups of atoms. The largest contribution to the instantaneous force on each atom in a molecule is due to rapidly oscillating bonded interactions; more generally, in all applications, forces on the particles will contain some background noise. If many atoms contribute to the projected force (e.g., the first right-hand-side term in eq 13), contributions from those noisy terms average out, which lowers the variance of the instantaneous force estimator. In the initial stage of an adaptive biasing force simulation, nonequilibrium effects occur if the biasing force applied to some degrees of freedom varies faster than coupled degrees of freedom can relax. This can be mitigated by defining “smooth” collective variables that involve many Cartesian coordinates with smoothly varying contributions to the gradient (hence, to the biasing force vector). At equilibrium, a smooth motion may be described by a nonsmooth variable and this may make the convergence of the adaptive bias more difficult. In the deca-alanine stretching toy example, the geometric process of interest involves all atoms in the peptide, yet our classic approach uses the end-to-end distance as a biasing coordinate, with the implicit assumption that biasing forces exerted on the terminal atoms propagate, and that the entire peptide relaxes rapidly, so that the biased trajectory remains close to equilibrium. A more robust approach is to replace that coordinate with the radius of gyration of the peptide. The gradient of the radius of gyration has components on each atom proportional to its distance from the center of the group, so that atoms close to the center also experience moderate biasing forces and do not lag behind the terminal atoms when the biasing force varies rapidly over time. A more elaborate discussion of this problem can be found in ref (42). In some cases, however, coordinates involving large collection of atoms will have less resolution than more local ones. A biophysical example is permeation through an interface, such as a lipid bilayer. The common choice of transition coordinate is the distance of the permeant molecule to the center of the bilayer center, projected onto the bilayer normal. In such a case, the physically relevant phenomenon is interaction of the permeant with the membrane surface, on a local scale. If the bilayer patch is large enough, it will experience fluctuations away from planarity, thus the local position of the interface will fluctuate with respect to the bilayer center. In turn, this will cause spurious fluctuations in the transition coordinate. A comparable situation arose in a study of glycerol permeation through the water channel protein GlpF.[30] Interaction of the permeant with the protein depended on distances to neighboring pore-lining residues, which fluctuated with respect to the bulk of the protein. Therefore, the global coordinate measured between the protein and glycerol molecule had insufficient resolution on a local scale, and a more local coordinate had to be defined to resolve the structure of the free-energy profile in the constricted region of the selectivity filter.

Performance

Depending on system size and implementation details, choices of coordinates may impact performance noticeably. A common application case is a biomolecular simulation performed with the NAMD package,[38] with the adaptive biasing force algorithm implemented[36] in the collective variables module.[39] NAMD is highly parallelized and can simulate large systems on supercomputers with nearly linear scaling. The current implementation of the adaptive biasing force algorithm is not parallelized and runs on one node, leading to two potential bottlenecks: (1) Poor serial performance on the master node: for the most expensive variables (e.g., those involving sums on atom pairs), the bias calculation on the master node might take longer than the force calculation on other nodes. (2) Scaling may suffer even for computationally simple coordinates, if too many atom coordinates and forces have to be communicated across nodes, increasing latency. This second case may affect any highly parallel application with coordinates defined on many atoms. In practice, one often has to find an acceptable trade-off between variance and performance by selecting a reasonable number of Cartesian coordinates that are most representative of the quantity of interest. To describe conformational fluctuations of a protein, for example, the root-mean-square deviation of α carbon coordinates is often a good compromise.

Justification of a Stratification Strategy

To increase the efficiency of exploring the transition coordinate in adaptive biasing force[19] or umbrella sampling,[11] it is common to break down the transition path into a series of sequential strata or windows. This idea[60] arises from the intuition that the time to convergence grows as the square of the range of the transition coordinate. Simple considerations provide the rationale for this strategy. Consider a transition path of length . Convergence of the free energy over the entire range is achieved after t0. Let us now divide the transition path into N nonoverlapping windows of lengths , for which convergence is attained after t1′, ..., t′. As shown in the Supporting Information, t0 > Σt′. We illustrate this result in a simple example of a tagged water molecule diffusing in a bulk environment over a stretch of 20 Å. The transition coordinate is the projection of the distance separating the centers of mass of the tagged water molecule and the simulation cell along a given direction of Cartesian space. Translational invariance due to the isotropic nature of the liquid imposes the condition that the free-energy change along the transition coordinate be zero. For this system, the potential of mean force was determined in a single, 20 Å long stratum, two 10 Å strata, four 5 Å strata, and eight 2.5 Å strata. The simulations continued until the root-mean-square deviation between the computed potential of mean force and the reference zero free-energy profile was less than 0.1 kcal/mol. As can be observed in Figure 3, t0, the time necessary to attain convergence within this preset tolerance without stratification is on the order of 100 ns. When the transition coordinate is divided into two strata, convergence in each 10 Å stratum is reached in approximately 40 ns, i.e., in 80 ns over the full 20 Å range. The effect of stratification increases further, as the reaction coordinate is decomposed into four and eight windows. In these cases, convergence is achieved in approximately 6 and 2 ns per window, respectively, which corresponds to 24 and 16 ns for the complete 20 Å range.

Figure 3

Stratification strategies for the translationally invariant toy model of a tagged water molecule diffusing in a bulk aqueous medium. The transition path spans 20 Å and is handled in a single window (A), in two 10 Å windows (B), in four 5 Å windows (C), and in eight 2.5 Å windows (D). For each stratification strategy, a potential of mean force calculation is carried out until the root-mean-square deviation with respect to the accurate zero free-energy profile is less than 0.1 kcal/mol. Both theoretical considerations and a simple example given above appear to suggest that extensive stratification should be always preferred. This, however, does not have to be the case. First, simulations in each window require initial equilibration, which may erase benefits gained from stratification into many windows. Perhaps more importantly, extensive stratification may impede ergodic sampling of the phase space. This behavior, shared with the standard thermodynamic integration, will be discussed in section Addressing Nonergodicity Scenarios. In contrast to umbrella sampling and its adaptive variants, the adaptive biasing force algorithm does not require that consecutive windows of a dissected transition coordinate overlap. Provided that convergence has been achieved in each window, the gradient, ∇A(ξ), can be reconstructed merely by joining the gradients from individual windows at the boundaries. This improves efficiency, as the requirement for overlap between windows may frequently add as much as 50% to the total simulation time. If the gradient between consecutive windows is not continuous to within statistical error, this is usually a sign of difficulties with ergodic sampling. Again, this problem will be considered in section Addressing Nonergodicity Scenarios.

Convergence and Error Analysis

In this section, we examine the convergence properties of the adaptive biasing force algorithm and the reliability of the computed free-energy estimates. First, the reader is invited to follow a demonstration that the numerical scheme formally converges. Then, we discuss how statistical errors associated with the reconstructed free-energy landscapes can be measured and managed.

Formal Convergence of the Adaptive Biasing Force Algorithm

The aim of this section is to explain convergence properties of the adaptive biasing force algorithm. For more details, the reader is referred to the Supporting Information. We restrict ourselves to the following simple setting. We consider the overdamped Langevin dynamics,where x is defined in the N-dimensional torus (namely, in [0, 1] with periodic boundary conditions) and ξ(x1,...,x) = x1. We direct the reader to refs (67) and (68) for extensions to more general situations of the results presented below. Starting from eq 16 and using the above choice of the transition coordinate ξ, the adaptive biasing force dynamics can be represented aswhere x1 denotes the first coordinate of the vector x, e1 is the vector with coordinates (1, 0, ..., 0), and ∂1V denotes the partial derivative of V(x1,...,x) with respect to x1. As explained in the section Theoretical Backdrop, it is clear at least formally that the only possible stationary state for A is the free energy (up to an additive constant). Indeed, if A converges to some stationary state A∞, then the law of x converges to Z∞–1 exp(−β{V(x) – A∞[ξ(x)]}) dx, which implies that (∂1V(x)x1=x1) is A′(x1). Yet, this does not provide a proof of the convergence of A′ to A′, and it does not explain why the adaptive biasing force dynamics of eq 17 indeed converges faster to equilibrium than the original, unbiased dynamics of eq 16. One way to understand this convergence is to look at the way the law of x evolves. Let us denote ψ(t,x) the density of x. For the original dynamics of eq 16, the density ψ satisfies the Fokker–Planck equation:For the adaptive biasing force dynamics, the density ψ satisfies It ought to be noted that eq 19 is a nonlinear partial differential equation (PDE), which makes the study of its long-time behavior much more complicated than for the linear Fokker–Planck PDE (18). Using appropriate mathematical tools (namely, entropy techniques, see the Supporting Information), one can show that if the transition coordinate is well chosen, the convergence to equilibrium for the adaptive biasing force dynamics of eq 19 is much faster than for the original unbiased dynamics of eq 18. Roughly speaking, the assumption on the transition coordinate is that the canonical measure Z–1 exp[−βV(x)] dx is “more multimodal” than the conditional measures at a fixed value x1 of the transition coordinate This is typically the case for the simple illustrative two-dimensional potentials in Figures 1 and 2 if ξ(x1,x2) = x1. This can be fully quantified, and it actually gives a way to measure the quality of the transition coordinate. In the analysis outlined above, it is assumed that the conditional expectation appearing in the adaptive biasing force dynamics is computed exactly. This analysis is therefore well adapted to discretizations that involve many replicas in parallel, which indeed converge to the adaptive biasing force dynamics with the exact conditional expectation; see ref (69). Analysis of the adaptive algorithms (adaptive biasing force or adaptive biasing potential) with estimates of the conditional expectations based on trajectory averages along a single path are much more complicated. See refs (70) and (71) for preliminary results for the Wang–Landau algorithm.

Distinguishing Sources of Error

Just like any experimental measurement, free-energy calculations, of either alchemical or geometrical nature, ought to be reported with the associated error bars. In the absence of an error estimate, a free-energy difference is generally of limited utility, making direct comparison with experiment difficult and speculative. Although much effort has been devoted in recent years to the characterization of errors that are associated with free-energy calculations,[7,8,72−81] estimating the reliability of such calculations remains an intricate task. This explains why it is not unusual that calculated free energies are still being published without error estimates. Ideally, any free-energy difference should be determined from a series of N independent simulations. If this were, indeed, the case, the best possible estimate of the target free-energy difference would be the expected value over the N simulations, i.e., , where ΔÂ denotes the estimate of the exact quantity, ΔA, inferred from one individual free-energy calculation. Then, the associated mean-square error can be written asThe first term of eq 20, σΔ2 = , is the variance, or the precision[82] of the free-energy calculation. In other words, it is a measure of its statistical error. The second term, , is the square of the bias of the free-energy estimator, i.e., the square of the difference between the expected value of the estimator and the actual free-energy change, ΔA. The bias, also referred to as the accuracy[82] of the free-energy calculation, is a measure of systematic error of the latter. How to estimate statistical error for the adaptive force method will be discussed in the next section. Unfortunately, a similar, formal treatment does not exist for systematic error. One important source of bias arises from incomplete sampling in finite-length simulations due to quasi nonergodic behavior of the system. Although this behavior cannot be quantified, in many instances it can be detected. Then, a number of remedial tools are at our disposal. How to recognize and remedy problems with quasi nonergodicity will be discussed in section Addressing Nonergodicity Scenarios. Other common sources of bias are inaccurate treatment of intermolecular interactions and algorithmic artifacts arising primarily from imprecise numerical integration of the equations of motion. These contributions to bias, which are common to all simulation methods, will not be discussed here.

Measure of the Statistical Error

The adaptive biasing force method is typically applied to nanoscale molecular processes under physiological conditions, a realm dominated by thermal noise. Thus, estimating free-energy differences using adaptive biasing force requires careful consideration of statistical error.[83] Within the adaptive biasing force framework, free-energy differences are determined by integrating the estimated mean force of the system exerted along the transition coordinate. Namely, a free-energy difference on the interval z ≤ ξ ≤ z can be expressed as[83]where ⟨F⟩ is the average of the instantaneous force on the transition coordinate ξ at the position ξ(x) = z. Thus, to determine the statistical error of the free-energy differences, we must delve into the statistics of the mean system force. We assume that the transition coordinate, ξ, is discretized and instantaneous forces calculated during the course of the simulation are collected in appropriate bins along ξ. As derived in the Supporting Information, one way to estimate the error of the mean force in bin i is given bywhere ΔFξ(x) = Fξ(x) – ⟨Fξ⟩ is the random component of the instantaneous force, n is the number of samples accrued in bin i, Δt is the time step of the simulation, and τ and ⟨ΔFξ2⟩ are the autocorrelation time and variance of ΔFξ(x) in bin i. Given a reliable estimate of the error of the mean force in each bin, we are prepared to analyze how these estimates are propagated to yield free-energy differences. On a discrete grid along the transition coordinate, the integral in eq 21 becomeswhere i and i are bin indices delimiting the ξ-interval [z, z]. Assuming independent behavior in each bin, the error of a sum of mean forces is approximated from from the Bienaymé formula as equal to the square root of the sum of squares of the errors of these mean forces,[84] A notable property of this formula is that the error increases with the size of the interval over which the free-energy difference is calculated.[83]

Application to the Toy-Model Deca-alanine

For concreteness, we now consider the error in free-energy differences for the reversible folding of deca-alanine in vacuum. We define the transition coordinate of interest ξ as the end-to-end distance of the peptide, specifically the distance between the carbonyl carbon atoms of the first and the tenth residue. Below we discuss the behavior of the three quantities that enter eq 22, namely. The number of samples in each bin n, the standard deviation of the random force (⟨Fξ2⟩)1/2, and the autocorrelation time of this force τ. See the Supporting Information for further comments on calculating these quantities. The black curve in Figure 4A is the number of samples in each bin of width δξ = 0.1 Å for the adaptive biasing force calculated in the range from 12 to 32 Å (the red curve will be discussed later in the paper). The number of samples, n, in this range varies from 17 000 to 36 000. Thus, as expected from the adaptive biasing force algorithm, the number of samples in each bin approaches uniformity. In this case, nonuniformity of sampling exceeds only slightly a factor of 2, which corresponds to the variations of the biased free energy not exceeding 0.5 kcal/mol. For comparison, the unbiased free energy changes in the same range by approximately 30 kcal/mol.

Figure 4

Coordinate dependence of sampling and the system force distribution in reversible folding of deca-alanine. (A) Samples in each bin for a 10 ns adaptive biasing force calculation on the domain ξ ∈ [4, 32] Å (black curve) and a 100 ns calculation on the domain ξ ∈ [4, 32] Å (red curve). Note the logarithmic scale on the vertical axis. (B) Distribution of the ξ-component of the instantaneous system force for different ranges of ξ. (C) Standard deviation of the ξ-component of the instantaneous system force as a function of ξ.

In Figure 4B, we show the distribution of instantaneous forces in four different bins. The forces in each bin are approximately normally distributed, with similar standard deviations of about 20 kcal mol–1 Å–1. This point is underscored in Figure 4C, which illustrates that, for deca-alanine in vacuum with Langevin dynamics emulating buffeting of the molecule by solvent, the standard deviation of the force acting along the transition coordinate changes very modestly, even though the peptide explores structures that are as disparate as can be imagined. The peptide courses through different compact forms for ξ < 10 Å, remains mostly α-helical for 10 < ξ < 16 Å, and forms extended structures with diminishing helical fractions as ξ increases beyond 16 Å.[85] The approximate uniformity of ⟨Fξ2⟩1/2 seen here is also characteristic of many other systems. For example, it has been previously found[86] that the standard deviation of the instantaneous force on the center of mass of a water molecule is about 2 kcal mol–1 Å–1, irrespective of whether the molecule lies in the bulk aqueous phase or in the hydrophobic core of a lipid bilayer. On the other hand, the transfer of a solute across a liquid–vapor interface is an example of a transition for which ⟨Fξ2⟩1/2 is expected to vary considerably with ξ. Coordinate dependence of sampling and the system force distribution in reversible folding of deca-alanine. (A) Samples in each bin for a 10 ns adaptive biasing force calculation on the domain ξ ∈ [4, 32] Å (black curve) and a 100 ns calculation on the domain ξ ∈ [4, 32] Å (red curve). Note the logarithmic scale on the vertical axis. (B) Distribution of the ξ-component of the instantaneous system force for different ranges of ξ. (C) Standard deviation of the ξ-component of the instantaneous system force as a function of ξ. Note that ⟨Fξ2⟩1/2 is considerable. Because this term enters prominently the expression for statistical error in eq 22, there are significant merits in reducing the dispersion of instantaneous forces. We have already pointed out two potential paths toward this goal. First, because the expression for the ensemble average of instantaneous force is not unique we can, in principle, choose one that reduces ⟨Fξ2⟩1/2. Second, variation of forces can be also reduced by a thoughtful choice of the transition coordinate. The third variable in eq 22, the correlation time of the instantaneous system force τ, is also the most difficult to calculate. The sampling in a single bin is rarely sufficient to obtain a converged autocorrelation function of the system force; thus, in Figure 5A we plot the autocorrelation function averaged over 40 bins along different regions of ξ. The correlation time for each region, τ, is determined by fitting an unscaled exponential function e– to the positive values of the autocorrelation function. Figure 5B reveals only modest variations in correlation time for different regions.

Figure 5

Coordinate dependence of correlations in the system force in reversible folding of deca-alanine. (A) Autocorrelation functions of the random component of the instantaneous system force for different ranges of ξ. The functions are normalized by the variance ⟨ΔFξ2⟩ so that correlation at t = 0 is unity. (B) Correlation time for ξ-ranges in panel A. Using the calculated values of n, ⟨Fξ2⟩ and τ, we now estimate the uncertainties of free-energy differences between deca-alanine structures with different end-to-end distances. The mean system force, with uncertainties calculated by eq 22, is shown in Figure 6A. Because only free-energy differences between two points along ξ, rather than free energies at single points, have a clear physical meaning, we focus on errors associated with these differences. To compute the error in the free-energy difference at points z and z, we must accumulate the uncertainties of the force between these two points, in accord with eq 24. If stratification is used and points z and z are in different windows, then the Bienaymé formula needs to be used across all windows separating these points.

Figure 6

Propagation of error in the mean force to the error of free-energy differences. (A) Mean system force on ξ for a 10 ns adaptive biasing force calculation, with error bars determined according to eq 22. (B) Error of free-energy differences A(ξ) – A(z) with the reference position z = 14 Å. This position is denoted by a violet dashed line. In Figure 6B we show the estimated error of ΔA between the α-helical minimum free-energy state (z = 14) and other states within the interval z ∈ [12, 32] Å. If we want to calculate, for instance, the difference in free-energy between the minimum and ξ = 16 Å, we obtain 3.6 ± 1.8 kcal/mol. Larger distances in ξ yield larger error: the free-energy difference between the minimum and the plateau at 25.5 Å is 19 ± 4 kcal/mol. It is expected that, for sufficiently long total simulation time, t, statistical errors of both the average forces and free-energy differences will decay proportionally as t1/2.[42] The same dependence on t should apply to deviations from nonuniform sampling. If force statistics is collected in M bins, then the root-mean-square deviation from the uniform distribution can be defined as the square root of the variance, Var(t),where Nstep and Nstep are the number of samples in bin k and the total number of samples at time t, respectively. For large t, the variance is expected to be proportional to 1/t or, equivalently to 1/Nstep. In other words, Nstep × Var(t) should be constant. An example of how Var(t) behaves in a typical simulation is shown in Figure 7. This is the expected behavior for diffusive motion. Errors that clearly deviate from such behavior are a sign of insufficient sampling. If the problem persists with increasing t, and especially if errors exhibit large fluctuations, then most likely problems with quasi nonergodic behavior have been encountered.

Figure 7

Nstep × Var(t) as a function of the number of molecular dynamics steps, Nstep. This quantity was calculated for the permeation of K+ through a transmembrane hexametric channel of a peptaibol, trichotoxin in a window along the normal to the membrane spanning the range between z equal to −15 and −9 Å. z = 0 is located in the middle of the membrane.[87] Note that for Nstep between 0.6 × 106 and 2.3 × 106, Nstep × Var(t) differs from its average value in this range by no more than 10%. The inset: the number of force samples along z in the window as a function of Nstep.

ADDRESSING NONERGODICITY SCENARIOS

A common manifestation of pathological free-energy calculations, in particular those of geometrical nature, is quasi nonergodicity, wherein sampling along the selected transition coordinate appears to be hampered. Here, we inspect closely the effects that impede accurate results to be obtained from free-energy methods, including the adaptive biasing force scheme (notably hidden barriers in the slow manifolds), and discuss how to identify these effects and outline possible remedies, by increasing the dimensionality of the transition coordinate, improved stratification, or sampling aided by multiple-replica strategies.

Hidden Barriers and Other Challenges to Obtaining Accurate Results

The primary objective of importance-sampling schemes is to facilitate exploration of the transition pathway with a uniform probability.[7,8] Among these schemes, as has been previously emphasized, the adaptive biasing force algorithm uses a local estimate of the gradient, A′, acting along the transition coordinate, to erase progressively the original ruggedness of the free-energy landscape. As has already been discussed in section Formal Convergence of the Adaptive Biasing Force Algorithm, this feature is valid from a theoretical standpoint. How true is this in practice? In most instances, satisfactorily uniform sampling is achieved quite efficiently. Occasionally, however, the adaptive biasing force algorithm does not perform as expected. The reminder of this section is devoted to explaining and identifying these special, yet important cases. Potential difficulties in applying the adaptive biasing force algorithm are intimately related to the choice of transition coordinate. A basic, yet seldom verified assumption that underlies this choice is the separation between time scales of motions along the transition coordinate and orthogonal degrees of freedom (see section Transition Coordinates and Rare Events). For complex, rugged free-energy landscapes, notably those formed by parallel valleys separated by considerable barriers in the direction orthogonal to the transition coordinate (Figure 8),[88] assuming time scale separation may turn out to be unwarranted. Because the adaptive biasing force algorithm exerts no direct action in the orthogonal space, it will not improve sampling at constant ξ. Returning to the foundational expression for the adaptive biasing force method, eq 8, which relates the gradient of the free energy to an ensemble average at constant value of the transition coordinate, the inability to cross hidden barriers in the orthogonal space is tantamount to incomplete ensemble averages and, hence, poor estimates of free-energy changes.

Figure 8

Common free-energy landscape featuring parallel valleys, collinear to the transition coordinate, ξ. These valleys are separated by substantial free-energy barriers in the direction ζ, orthogonal to ξ. ζ can be interpreted as a slow degree of freedom coupled to ξ and hampering progression along the latter direction. Excessively stratified reaction pathways preclude spontaneous crossing of high barriers, typically ΔA1. Wider windows should allow diffusion toward values of ξ, where the barrier separating valleys in the direction ζ is smaller, typically ΔA2. What are common symptoms of quasi nonergodic behavior? Several of them can be readily identified. Their presence is a guaranteed sign of flawed free-energy calculations, but their absence is not a sufficient condition to ensure that such calculations converged to the correct value. As has been discussed previously, sampling along the transition coordinate in well-behaved simulations should approach uniformity with time, and the statistical error associated with the biasing force or, equivalently, the free-energy differences should decrease in a predictable fashion. If this is not the case, difficulties in equilibrating the system along orthogonal degrees of freedom are, most likely, at play. To illustrate this point, we return to the toy model of deca-alanine reversibly folding in vacuum and to Figure 4A. In the range of ξ between 12 and 32 Å, sampling is fairly uniform even in 10-ns simulations. In contrast, in the [0, 12] Å range, sampling remains quite nonuniform, even after 100 ns (red curve). As has already been pointed out in the section Transition Coordinates and Rare Events, there are a number of metastable states in this region, all corresponding to similar values of ξ. Difficulties in properly averaging over these states markedly impedes equilibration along ξ. In the context of stratified simulations, the presence of hidden barriers along degrees of freedom orthogonal to ξ often leads to discontinuous biasing forces between adjacent windows. Another strategy for exposing apparent nonergodic behavior is to carry out bidirectional calculations, i.e., initiate the adaptive biasing force simulation from both end points along the transition coordinate. Just as in free-energy perturbation calculations,[81] the resulting hysteresis is a good indicator (although not necessarily a measure) of error. If the hysteresis markedly exceeds statistical error, the calculated free-energy values are, most likely, poorly converged. In such a case, simply combining data from both directions is not likely to improve accuracy significantly, as their proper weighting remains unknown. When quasi nonergodicity scenarios are encountered, additional simulation strategies, such as multiple walkers or multidimensional transition coordinates, should be brought to bear. Also, the chosen stratification scheme might require reevaluation. These issues are discussed in more detail below.

Balancing Ergodic Sampling and Efficiency in Window Sizes

Formally, the adaptive biasing force algorithm does not prescribe a particular window size nor does it require that consecutive windows along a stratified transition coordinate overlap. Provided that convergence has been achieved in each window, the gradient, ∇A(ξ), can be reconstructed merely by joining the gradients from individual windows at the boundaries. As we have already argued, from the efficiency point of view it might appear that using small windows is always beneficial. This is, however, not necessarily true. One concern about extensive stratification is that efficient sampling of rugged landscape along orthogonal degrees of freedom may require temporary excursions beyond the window’s boundaries, where the barrier separating adjacent valleys is smaller. This is illustrated in Figure 8. The barrier between the minima along the orthogonal degree of freedom, ζ, may be difficult to cross at ξ in the range of [0, 0.2], but not in the [0.8, 1] range. In this case, stratification will create kinetic traps, most likely reducing rather than improving efficiency. A similar problem was observed in simulations aimed at determining the potential of mean force for a small, proline-rich peptide, p41, bound to the SH3 domain of Abl kinase using the root-mean-square deviation (RMSD) with respect to the native conformation as the transition coordinate. There, a second minimum was only discovered through creation of a window that did not encompass the native state (see Figure 2A in ref (89)). This second minimum represents a shift in register of the peptide in its binding site, and although it would occur spontaneously given sufficiently long time, its probability can be enhanced by first driving the peptide to higher values of RMSD, which disrupts some of the bonds characteristic of the native state, and then letting it come back. Because it is often not possible to predict orthogonal barriers a priori, determining an appropriate windowing scheme typically requires an adaptive procedure. The basic criterion here is the continuity of the biasing force across consecutive windows. If one is concerned about quasi nonergodicity, a good strategy is to start with large windows. If the continuity of forces appears to be satisfactory, one might attempt to improve efficiency through further stratification. The advantage of this strategy is that the approximation to the biasing force acquired from the large-window simulation can be used in smaller windows. If no windowing scheme yields continuous forces, then other strategies, described in the next section, should be employed.

Multiple-Walker Strategies

Simulations involving multiple replicas are perhaps the most powerful strategies to accelerate ergodic sampling along degrees of freedom orthogonal to the transition coordinate. Beyond the “embarrassingly parallel” strategy of running independent simulations to obtain more sampling than is possible with a single simulation in the same real time, a number of schemes have been devised that significantly enhance sampling, at the cost of transferring information between replicas. In a prototype example of multiple-replica algorithms, originating with Monte Carlo simulations,[90−92] replicas are run at different temperatures and exchanges of system configurations are attempted periodically between replicas, so as to maintain a canonical ensemble at each temperature.[93] The advantage of this method is that replicas at higher temperatures cross energetic barriers more quickly and can pass the resulting configurations to replicas at lower temperatures, preventing the latter from remaining in metastable states. Exchange of replicas with different umbrella-sampling potentials, known as Hamiltonian exchange, has also been widely successful.[94−96] With the adaptive biasing force method, each replica can be thought of as a “walker” exploring the transition coordinate space. Multiple-walker strategies range from simply running similar independent adaptive biasing force calculations in parallel, to more complex ones, involving communication between replicas.[85,97] Here we consider two communication strategies that can be used in concert. In the first strategy, which we refer to as shared adaptive biasing force, the instantaneous forces sampled from each walker are collected in a single shared buffer as the simulations progress simultaneously. In the current implementation, the shared buffer is merely conceptual: each walker retains its own buffer, which is synchronized with all the others at fixed intervals. Regardless, shared adaptive biasing force can result in significantly faster exploration of the transition coordinate and improved convergence of the free energy, as compared to the case for independent walkers.[85,97] A second strategy, complementary to the first, is the application of so-called walker selection rules. These selection rules eliminate replicas with values of the transition coordinate that are already well sampled, while duplicating replicas in relatively unexplored regions, enforcing more uniform sampling. Selection rules may be implemented using a so-called resampling procedure and weights that are associated with each replica.[8] There are many ways to choose the weights and to implement this idea in practice. The basic requirement is that the selection mechanism automatically vanishes at equilibrium, namely when the biasing force is the mean force. Here, we calculate the weight of each walker in a somewhat simpler way than in previous studies,[48,97] while obtaining similar results. The weight assigned to each walker is the inverse of the number of samples accrued in a neighborhood of bins near the walker. Specifically,where j is the bin occupied by walker i at the time of application of the selection rules, n(t) – n(tlast) is the number of samples that have been accrued in bin j since the last selection and resampling step, and h defines the number of bins surrounding j that are included in the sum. The walkers are resampled on the basis of these weights, and the selection mechanism is switched off when the smallest and largest n(t) differ by less than 20%, i.e., when [max(1/w) – min(1/w)/min(1/w) < 0.2].

Application to the Toy-Model Deca-alanine

We consider reversible unfolding of deca-alanine as a model system for testing different multiple-walker strategies. In Figure 9A we compare the results of adaptive biasing force calculations of 8 ns total simulated time to a converged free-energy profile. A single, 8-ns long simulation gave the poorest results, whereas 16 short, independent calculations perform somewhat better, even though the region 31.4–32 Å has not been sampled at all. In contrast, shared adaptive biasing force yields complete and more uniform sampling, and a smoother, more reliable free-energy profile. The best free-energy profile, almost indistinguishable from the converged one, was obtained from shared adaptive biasing force combined with walker selection. Note that, as expected, the free energies converge to the same values for sufficiently long simulations, irrespective of the multiple-walker strategy, as demonstrated in Figure 9B. An implementation of the shared adaptive biasing force algorithm with the selection rules used here is expected to be available in future releases of the molecular dynamics program NAMD.

Figure 9

Improving the rate of convergence with multiple-walker strategies. (A) Potentials of mean force obtained for different multiple-walker strategies and total simulation times of 8 ns. For reference, the black curve shows the result for a total simulation time of 128 ns. “Single” refers to a single long 8 ns simulation, and “independent” denotes the result of combining force samples for 16 walkers after they ran independently for 0.5 ns. For the curves marked “shared” and “selection”, force samples were synchronized among walkers every 20 ps. “Selection” also included the application of walker selection, as described in the text, on the same interval. (B) Potentials of mean force obtained for different multiple-walker strategies and total simulation times of 128 ns.

Multidimensional Adaptive Biasing Force Simulations

Algorithmic Backdrop

There are several reasons to perform adaptive biasing force calculations with a transition coordinate of dimension greater than one. First, we may be simply interested in how the free energy varies as a function of more than one coordinate. For example, to study the interaction between two molecules, it may be of interest to obtain the free energy as a function of both their distance and relative orientation. A second reason to use multiple dimensions is practical. As described in section Hidden Barriers and Other Challenges to Obtaining Accurate Results, sampling along a single collective variable may be hampered by barriers in orthogonal dimensions. If one can identify these orthogonal dimensions, the application of adaptive biasing force along them will remove the barriers and improve sampling. By itself, obtaining a multidimensional free-energy landscape requires more sampling than is needed to calculate a one-dimensional profile. For example, if one uses N bins along collective variable ξ and M bins along collective variable χ, each component of the gradient over both collective variables, ∇A(ξ, χ), requires MN bins. With one sample obtained per time step, the time required to generate a statistically significant number of samples in all bins will increase approximately proportionally. Despite the additional, substantial computational effort, adding a second or third dimension may improve efficiency, as it accelerates convergence and increases uniformity in sampling by allowing the system to more rapidly cross between multiple parallel valleys.[98] Subsequently, one can recover the potential of mean force along a single dimension through integrating out the additional dimensions.[99] However, when one wishes to include a large number of dimensions, the generalized adaptive biasing force algorithm (see section The Generalized Adaptive Biasing Force Algorithm) may be more appropriate. If the transition coordinate involves angular degrees of freedom that span their full range (e.g., [0, 2π]), an additional complication develops. The exact forces and free energies are periodic in these variables, but this is not necessarily true for these quantities burdened with statistical errors. One way to restore the required periodicity is to approximate the free energy as a function of angular variables with spline functions, coefficients of which are computed to distribute errors smoothly across the whole hyperspace spanned by these variables. How to do this has been described by Darve et al.[42] As an example application of multidimensional adaptive biasing force, we consider the free-energy landscape of deca-alanine as a function of the root-mean-square displacement (RMSD) from three reference structures. Figure 10A shows these reference structures: an α-helix, a 310-helix, and a particular compact conformation that we refer to as the ω conformation due to its visual similarity to this Greek letter. In agreement with the one-dimensional calculations, extensively discussed above, the free-energy minimum occurs when the structure is close to the α-helix, i.e., when RMSDα = 0.3 Å. In Figure 10B the three-dimensional potential of mean force is represented by three isosurfaces. Two low-energy regions can be seen in this figure. The violet region on the left corresponds to the α-helix, whereas the region on the right is closer to the ω conformation than to the other two structures. The minimum in the latter region is only 1.2 kcal/mol higher than the minimum for α-helix and lies at (RMSDα RMSD310 RMSDω) = (4.3 5.0 2.7) Å. In Figure 10C, examples of representative structures in these two low-energy regions are shown. The multidimensional free-energy landscape thus yields more insight into the correspondence between the free energy and molecular conformations, revealing, in particular, a minimum for a compact structure that was not identified when the one-dimensional transition coordinate was used.

Figure 10

Three-dimensional adaptive biasing force calculation. (A) Three reference structures of deca-alanine. The root-mean-square deviation of selected atoms from their positions in each of the three reference structures defines each of the three transition coordinates. (B) Free-energy isosurfaces as a function of the three transition coordinates. The violet, pink, and gray surfaces contain all points for which the free energy is less than 5, 10, and 20 kcal/mol, respectively. In all cases the tick marks represent a root-mean-square deviation of 1 Å. The minimum value of the free energy, which occurs at (RMSDα RMSD310 RMSDω) = (0.3 2.6 2.7) Å, is defined to be zero. (C) Another view of the surface shown in panel B, with representative structures shown for the two energetically favorable regions.

The Generalized Adaptive Biasing Force Algorithm

Previously, we considered one particular approach to multidimensional adaptive biasing force. Here we discuss two further possibilities. First, it is easy to check that if the dimension of the transition coordinate is larger than one, the biasing force cannot be in general derived from a gradient (it is not conservative).[42] On the other hand, this is of course true of the expected long-time limit. A natural idea is, therefore, to project, using, for example, the classical Helmholtz–Hodge projection: Change the biasing force F to ∇φ where φ = arg minφ ∥F – ∇φ∥L. Because this is consistent with the expected stationary state, this should not alter the convergence properties. Moreover, the interest of this projection is that the variance of the force is reduced, because the nonconservative part is set to zero. This aspect is analyzed in ref (100). Second, it would be interesting to develop efficient adaptive biasing force techniques for higher dimensional transition coordinates. The standard implementation of the adaptive biasing force algorithm is currently limited to a dimension up to 3, because it relies on a Cartesian grid of the transition coordinate values, whose complexity is exponential with respect to the dimension of the transition coordinate. It would not be very useful, however, to develop a technique that yields a flat energy landscape along the transition coordinate if the dimension is high, as mapping a high-dimensional cube would be quite time-consuming. One alternative idea, explored in ref (101), is to use a bias of the form ∑A◦ξ, where (ξ1, ..., ξ) denotes an m-dimensional transition coordinate. Under appropriate assumptions, one can show the convergence of the method, and preliminary numerical results are encouraging; see ref (101). Another route would be to consider a bias of the form ∏A◦ξ, or even following greedy algorithms and tensor product approaches used in nonlinear approximation theory (see, for instance, ref (102)), ∑∏A◦ξ, where the functions (A1,, ..., A) would be iteratively computed. This method is currently under study.

Combining the adaptive biasing force algorithm with geometrical restraints

The objective of this section is to assess the influence of geometrical restraints on the free-energy landscape and how such restraints ought to be treated in the context of adaptive biasing force simulations. Toward this end, the nature of the forces at play, either thermodynamic forces or forces arising from external harmonic potentials, will be clarified, and the current strategy for handling the latter, using an extended-Lagrangian formalism, will be outlined.

Distinguishing the Thermodynamic Force from the Restraint Force

One subtlety of the adaptive biasing force algorithm that is often overlooked is its dependence on the measure of the thermodynamic force, which is not necessarily synonymous with the total force acting on the transition coordinate. For most biomolecular simulations, this is due to the imposition of various constraint and/or restraint forces. For example, hydrogen-bond lengths are often constrained in a simulation via the RATTLE algorithm.[103] If only the heavy atom of the bonded pair is involved in the transition coordinate, then the adaptive biasing force algorithm will include the constraint force emanating from the hydrogen in addition to the thermodynamic force, thus contaminating its estimate. A straightforward solution is to include both the hydrogen and its parent atom in the collective variable(s) defining the transition coordinate, causing the constraint forces to cancel each other. In other words, if the constraint/restraint force is zero in the collective variable’s center of mass, it will not contribute to the measured potential of mean force. Alternatively, if restraints nonuniformly affect atoms in the collective variable(s) but are not accounted for in the thermodynamic force used by the adaptive biasing force algorithm, then convergence is impossible; the adaptive biasing force scheme cannot remove forces that it does not measure. Though it may seem apparent that one will always want to calculate what the potential of mean force would be in the absence of artificial restraints, there are cases in which externally imposed restraints are meant to be included. A key example is in calculating protein–ligand binding free energies, which utilizes a staged procedure involving a series of geometrical restraints.[32,89] By design of the procedure, these restraints must involve the same atoms also being biased at each stage and their contributions are individually determined and tabulated at the end. In this case, to ensure that all necessary forces are included, a procedure such as extended adaptive biasing force is required.

The Extended Adaptive Biasing Force Method

As mentioned above, it may be cumbersome to write an analytical expression for the instantaneous force in dimensions larger than one or for complicated transition coordinates. One possible way around this issue is to extend configurational space with a fictitious degree of freedom λ and define an extended potential,where k > 0 is a (large) force constant that couples λ to the transition coordinate, and proceed utilizing the extended transition coordinate, ξext(x,λ) = λ, instead of ξ. It can be checked that the free energy associated with this extended system is the convolution of the original free energy with a Gaussian kernel, Aext(λ) = ∫dz χ(λ–z) A(z), where χ(z) = Z–1 exp[(−β/2k)|z|2] is the Gaussian kernel with variance kβ–1. The constant k should thus be chosen sufficiently large to ensure that Aext is a good approximation of A. This enables quick convergence to equilibrium. The adaptive biasing force algorithm can be applied to the extended system,[8] the potential energy being Vext and the transition coordinate being ξext. Notice that the instantaneous force on the extended degree of freedom, corresponding to free energy Aext, is trivial to compute, because it is equal to the harmonic spring force. The original free energy A can then be recovered by using either deconvolution procedures[104] or simple unbiasing techniques based on formulas such as eq 34. Provided that a good estimate of the biased marginal of the transition coordinate, ρ̃(z), has been collected, the following unbiased estimator of the free-energy derivative can be used,where ⟨λ⟩ is the conditional average value of λ for ξ(x) = z. Here, the biased marginal of ξ is used as a correction to the inaccurate biasing force. It should be noted that this is valid in a more general context, including the absence of biases (removing the second term of the right-hand side), in which case one recovers the trivial histogram-based estimator of the potential of mean force.

Combining thermodynamics and kinetics

Although the adaptive biasing force algorithm is often considered merely a free-energy calculation technique, it is, at its core, an importance-sampling scheme that can have applications beyond obtaining free energies. One such application is to calculate kinetic parameters, such as the diffusivity along a transition coordinate, where an importance-sampling scheme is often essential to obtain reliable estimates in all regions of the coordinate, particularly near all-important free-energy barriers. Together, the diffusivity and free energy as functions of the transition coordinate can be used to construct a kinetic model of a process of interest, which yields insight beyond the static picture of molecular phenomena given by free-energy calculations. In this section, we describe a recently developed scheme leveraging adaptive biasing force to compute simultaneously the free energy and diffusivity along a transition coordinate. This scheme is just an example, as a number of other approaches in which the adaptive biasing force is used to improve kinetic descriptions of different systems are being investigated. Also note that, although this scheme was designed with the adaptive biasing force method in mind, it can be straightforwardly applied to other importance-sampling techniques. Determining kinetic parameters can be more subtle than mapping the free-energy landscape, because kinetic descriptions are usually approximations and cannot be exactly derived from statistical mechanics.[105] A commonly invoked diffusive model is overdamped Langevin dynamics, in which the following equation of motion[106,107] is assumed for a set of M collective variables z = (z1, z2, ..., z),where ż is the time derivative of the ith collective variable at time t, D(z) is the (i, j) component of the diffusivity tensor for the configuration of the collective variables at time t, f(z,t) is the total force on collective variable j for the configuration at time t, ∇ = ∂/∂z, and ζ(t) is a stochastic variable with ⟨ζ(t) ζ(t′)⟩ = 2D(z) δ(t′–t). The total force on variable j, f(z,t), is the sum of the biasing force, fbias(t), and the system force, which can be expressed as the negative of the gradient of the potential of mean force F(z) = −∇A(z). Here, we will focus on diffusive models along a single collective variable, but note that multidimensional transitions have been analyzed by similar approaches.[86,108,109] In one dimension, the overdamped Langevin model has two free parameters, the free-energy profile A(z) and the position-dependent diffusivity D(z). The adaptive biasing force method, as well as other techniques, provides a route to A(z), whereas a number of methods have been applied to determining position-dependent diffusivity.[105,108−117] However, some of the most basic methods are not compatible with nonuniform free-energy landscapes,[112] and others are based on the assumption that the free energy can be approximated as a harmonic well.[105,110,112] In methods specifically designed for compatibility with the adaptive biasing force method, it is assumed that the dependence of diffusivity on position is weak.[87] To parametrize a diffusive model, A(z) and D(z) should be determined consistently, because, for example, coarsening A(z) results in a reduction of the associated D(z).[118] Furthermore, to save computational resources, it would be desirable to obtain both the free energy and the diffusivity in the same calculation. However, in many methods cited above, equilibrium or steady-state statistics is assumed, which is likely to yield erroneous results with time-dependent biasing forces. Solutions to the problem of calculating diffusivity in simulations with time-dependent biases take advantage of statistical tools such as maximum likelihood[114] or Bayesian inference.[117] Conceptually, these methods rely on optimizing the parameters of the diffusive model such that the observed trajectory has the greatest likelihood of occurring.

Optimizing the Diffusive Model

The Bayesian inference scheme described below[86,99,117] begins with assuming a particular dynamical model for the collective variable (or collective variables) of interest. In practice, we represent the functions D(z) and F(z) by piecewise cubic interpolation[117] from a discrete grid in M-dimensional z-space. We seek the optimal parameters comprising the values of the functions D(z) and F(z) at each grid node that best correspond to the simulated trajectory, denoted by T. For simplicity, we represent these optimal parameters as H*. We consider the trajectory as a set of discrete hops of duration Δt. Given trial parameters H, as well as the biasing forces fbias(tα), we compute for each hop p ({zα+1, tα+1}{zα, tα}, fbias(tα), H0), the conditional probability of arriving at transition-coordinate configuration zα+1 at time tα+1, given that the system occupied the configuration zα at time tα. The probability of the complete trajectory given the parameters is the product of the probabilities at each step[105,114,117]This equation yields the probability of the trajectory given an assumed set of parameters. Using Bayes’ theorem,[119,120] we can infer the probability of the parameters given the trajectory: An advantage of the Bayesian approach is that it permits inclusion of any prior knowledge about the form of the parameters in a consistent way by defining the prior probability pprior(H) of the parameters.[119,120] For example, one can assume scale invariance of the function values[121] or smoothness of the functions.[86,105,117] Finding the optimal parameters then becomes a problem of finding the set of parameters H* that maximizes posterior probability P(H*|T), which can be carried out by generating a Markov chain of states H using the Metropolis–Hastings algorithm.[122]

Trajectory Likelihood in One Dimension

In the one-dimensional case (or for independent motions along multiple dimensions), we can simplify and discretize eq 29 to yieldwhere g is a random variable with a standard normal distribution. From the properties of g, it is evident that eq 30 becomesBelow, using this formula and eq 31, we are able to reconstruct D(z) and F(z) for deca-alanine, yielding a complete diffusive model. One might note that we have assumed nothing about fbias(t). Indeed, we may even set fbias = 0 and construct the diffusive model from an equilibrium simulation. However, the accuracy of the results of a Bayesian scheme are wholly dependent on the quality of sampling near each point along z, necessitating importance sampling for rugged free-energy landscapes. In the Supporting Information, we discuss assessment of the precision and reliability of the results of the Bayesian scheme, which includes estimating the statistical error, as well as checking the consistency of the diffusive model with itself and the output of adaptive biasing force.

Application to the Toy-Model Deca-alanine

The z-dependent diffusivity for reversible unfolding of deca-alanine is shown in Figure 11B. The variation with z could not have been inferred in any simple way from knowledge of the system force or free-energy profiles. Diffusion along z, which can be thought of as the rate at which the end-to-end distance randomly changes, is highest near z = 16 Å, corrsponding to slightly unfolded α-helical structures. The ensemble of diverse compact structures in the range z ∈ [5, 10] Å seems to be associated with the lowest diffusivity values, whereas a secondary minimum appears near z = 22 Å. Given these data, one might hypothesize that the diffusivity is inversely related to the conformational degeneracy at z. Note that Figure 11B shows considerable dependence of the calculated diffusivity on Δt, the time over which the trajectory is discretized in eq 33, which implies that the motion on the observed times is not well modeled by overdamped Langevin dynamics, and that a more sophisticated model of motion along z may be needed.

Figure 11

Combining thermodynamics and kinetics for reversible unfolding of deca-alanine. (A) Comparison of the system force as determined by adaptive biasing force and that calculated from the Bayesian scheme on the same trajectory. (B) Diffusivity as a function of the end-to-end distance of deca-alanine for different observation time intervals Δt.

Summary and outlook

Free-energy calculations have become standard tools of statistical mechanics applied to a wide variety of problems in chemistry and biology. This has been possible due to significant theoretical advances in this area, their highly efficient implementations in software packages developed for modern, parallel computers, and remarkable improvements in computational power available for free-energy calculations. Yet, these calculations are still not sufficiently mature to be carried out without careful supervision of the end user. This may be fortunate, as there is no substitute for physical insight of the researcher. This also implies that a number of good practices should be followed in free-energy calculations in general, and in applying the adaptive biasing force in particular. These good practices are usually quite simple and involve careful design of simulations, monitoring their progress and postprocessing analysis. In favorable circumstances, they will simply increase confidence that the calculated free energies are reliable. In less favorable cases, they are even more important, as they allow for identifying and correcting shortcomings of the calculations that might adversely affect the results and their interpretation. Below, we recapitulate these good practices, which have been already discussed in a considerable detail in the preceding sections. Careful choice of the transition coordinate is a key step toward successful free-energy calculations. In making this choice, it is desirable not only to capture the physical nature of a problem of interest but also to ensure small variance and smoothness of the biasing force.[42] Ignoring the latter issues may adversely affect efficiency and even correctness of the calculation. Reducing the variance should be balanced with the cost of calculating the instantaneous value of the transition coordinate, as usually the former will decrease and the latter will increase with the number of atoms involved. This balance will depend on implementation details and, therefore, some familiarity with the underlying code might be required. A suitable stratification strategy should be planned in advance. If there are concerns about possible quasi nonergodic behavior of the system, and in particular the existence of parallel channels between the end point states, it is recommended to use large windows. If these concerns prove to be unjustified, it is always possible to switch to smaller windows without losing information about the biasing force that has already been accrued. In other instances it is usually more efficient to stratify the transition coordinate into smaller windows throughout a simulation. Free-energy calculations should always be accompanied by estimates of statistical errors. For the adaptive biasing force algorithm, a formula to do so exists and should be applied whenever possible. Without error estimates the reliability of the calculated free-energy values is questionable and the ability to compare them with experimental measurements is seriously hampered. In stratified simulations, lack of continuity in the biasing force across consecutive windows that clearly exceeds statistical errors is a sure sign of problematic free-energy calculations. This issue should not be ignored, and any attempt to circumvent it by way of some sort of averaging is unlikely to succeed. Instead, remedial steps aimed at improving ergodic behavior of the system, such as applying the multiple-walker strategy, should be taken. It is usually very useful to monitor the behavior of the total, biased force, as it should converge to zero in long simulations. Moreover, the rate of convergence at sufficiently long times t should become proportional to 1/√t. If the biased force along the transition coordinate or within a window exhibits large deviations from nonuniformity, which clearly does not decrease with time, or the convergence rate is erratic, we have, again, a likely indication of quasi nonergodicity in orthogonal degrees of freedom. Then, there is no basis for assuming that a reliable free-energy dependence on the transition coordinate can be extracted from the calculated forces. Instead, in such circumstances, techniques for removing quasi nonergodicities along orthogonal degrees of freedom should be brought to bear. Following these simple good practices guarantees the improved quality of free-energy calculations carried out by way of the adaptive biasing force method but does not ensure that all problems encountered in such calculations, especially those related to quasi nonergodicity, have been identified. To this end, the toughest challenges to practitioners of the adaptive biasing force algorithm and related methods are linked to multiple slow degrees of freedom leading to orthogonal barriers. One way to address these problems is through a multiple-walker strategy, which has proven effective in a test system that exhibits metastability in the orthogonal space, a significant challenge to the classic adaptive force method.[97] Along the same lines, the basic method can be integrated with other enhanced sampling techniques that involve multiple-copy schemes, such as parallel tempering and Hamiltonian exchange.[85] Another promising research direction is to increase the dimensionality of the bias, as is done in an implicit way in the generalized adaptive biasing force scheme.[101] “Real-life” applications would, however, require an accessible and well-documented implementation, now available as part of the independent collective variables module.[39] Although better sampling of orthogonal degrees of freedom is the most promising direction for increasing the efficiency of the adapting biasing force method, there is also room for improvement in achieving convergence along the transition coordinate. For example, it would be of interest to develop adaptive algorithms for the early stages of simulations that would converge faster than the currently used step or ramp functions. One possibility along these lines is to employ a kernel function along ξ. In other words, adaptation in a given bin would depend not only on samples accrued in this bin but also on samples in neighboring bins. Developing binless algorithms is also of interest. Another avenue to improve convergence along the transition coordinate, which has not been explored so far, is to exploit the freedom in defining the ensemble average of instantaneous force and identify choices that reduce the variance. Several steps can be taken to obtain better estimates of free energies. For example, the error in integrating force to calculate free energy that is due to binning of force values could be reduced by using improved, smooth interpolation schemes. Further, we observe that the best accuracy is obtained not when the free energy is flat but rather when the statistical error of the average force is the same everywhere along the transition coordinate. To achieve this, the number of samples, N, in bin k should be such that σ/(N)1/2 rather than N is constant for all bins, where σ is the standard deviation of estimated average force in bin k. This is realized if the biasing potential due to the average force used in the standard adaptive biasing force method is supplemented by the term 2kBT ln(σ/σ0), where σ0 is the standard deviation at a reference point. In practice, σ and σ0 are estimated from their running values or their approximations during the course of a simulation. The additional term becomes important whenever the friction coefficient changes markedly with ξ or parts of the system undergo large fluctuation, for example because they are near phase transition. A number of extensions to the adaptive biasing force method have not been explored yet. For example, the method could be straightforwardly extended to transformations along a parameter of the Hamiltonian if an equation of motion was associated with this parameter, as is done in metadynamics. It would be of interest to check whether such application of the adaptive biasing force method were more efficient and reliable than other, related methods currently used for this purpose, such as conventional thermodynamic integration, metadynamics, or the Wang–Landau algorithm. Another extension of the method that was outlined in one of the original studies on the adaptive biasing force[123] but has not been pursued so far, is to add to the biasing force another contribution that would depend only on ξ and would favor certain states of particular interest to the user or drive the dynamics in a specified direction. The latter would make the adaptive biasing force method a more efficient and better controlled alternative to steered dynamics. Finally, we emphasize that the adaptive biasing force is not just a free-energy calculation technique but can also be seen as a general adaptive biasing scheme (see, for instance, ref (124) for applications in Bayesian statistics). Indeed, once a correct sampling of the biased measure Z–1 exp(−βpTM–1p/2) dp Z–1 exp{−β[V – At◦ξ](x)} dx has been obtained (t0 being a fixed time, and the bias being fixed from time t0: ∀t ≥ t0, A = A, it is easy to recover canonical averages using standard unbiasing procedures: Although the adaptive biasing force dynamics has been applied so far almost exclusively to chemical and biological systems, it could also be very useful for sampling a multimodal measure (namely, a probability measure for which high-probability regions, called “modes” in this context, are separated by low-probability regions) in other fields, e.g., among others, free-energy computations in material sciences[125] or Markov chain Monte Carlo techniques for Bayesian inference.[124] In summary, the combination of simplicity, versatility, and strong mathematical underpinnings makes the adaptive biasing force method an attractive target for a wide variety of extensions in statistical mechanics and beyond.

61 in total

1. Escaping free-energy minima.

Authors: Alessandro Laio; Michele Parrinello
Journal: Proc Natl Acad Sci U S A Date: 2002-09-23 Impact factor: 11.205

2. Reaction coordinates and rates from transition paths.

Authors: Robert B Best; Gerhard Hummer
Journal: Proc Natl Acad Sci U S A Date: 2005-04-06 Impact factor: 11.205

3. Insights into the recognition and association of transmembrane alpha-helices. The free energy of alpha-helix dimerization in glycophorin A.

Authors: Jérôme Hénin; Andrew Pohorille; Christophe Chipot
Journal: J Am Chem Soc Date: 2005-06-15 Impact factor: 15.419

4. Diffusion constant of K+ inside Gramicidin A: a comparative study of four computational methods.

Authors: Artem B Mamonov; Maria G Kurnikova; Rob D Coalson
Journal: Biophys Chem Date: 2006-04-06 Impact factor: 2.352

5. Secondary and tertiary structure elasticity of titin Z1Z2 and a titin chain model.

Authors: Eric H Lee; Jen Hsin; Olga Mayans; Klaus Schulten
Journal: Biophys J Date: 2007-05-11 Impact factor: 4.033

6. Well-tempered metadynamics: a smoothly converging and tunable free-energy method.

Authors: Alessandro Barducci; Giovanni Bussi; Michele Parrinello
Journal: Phys Rev Lett Date: 2008-01-18 Impact factor: 9.161

7. Random walk in orthogonal space to achieve efficient free-energy simulation of complex systems.

Authors: Lianqing Zheng; Mengen Chen; Wei Yang
Journal: Proc Natl Acad Sci U S A Date: 2008-12-15 Impact factor: 11.205

8. Calculating free-energy profiles in biomolecular systems from fast nonequilibrium processes.

Authors: Michael W Forney; Lorant Janosi; Ioan Kosztin
Journal: Phys Rev E Stat Nonlin Soft Matter Phys Date: 2008-11-14

9. Calculating diffusion and permeability coefficients with the oscillating forward-reverse method.

Authors: Bryan W Holland; Chris G Gray; Bruno Tomberli
Journal: Phys Rev E Stat Nonlin Soft Matter Phys Date: 2012-09-27

10. Molecular dynamics simulation of the gramicidin channel in a phospholipid bilayer.

Authors: T B Woolf; B Roux
Journal: Proc Natl Acad Sci U S A Date: 1994-11-22 Impact factor: 11.205

66 in total

1. Data-guided Multi-Map variables for ensemble refinement of molecular movies.

Authors: John W Vant; Daipayan Sarkar; Ellen Streitwieser; Giacomo Fiorin; Robert Skeel; Josh V Vermaas; Abhishek Singharoy
Journal: J Chem Phys Date: 2020-12-07 Impact factor: 3.488

2. Estimating thermodynamic expectations and free energies in expanded ensemble simulations: Systematic variance reduction through conditioning.

Authors: Manuel Athènes; Pierre Terrier
Journal: J Chem Phys Date: 2017-05-21 Impact factor: 3.488

3. An intermediate along the recovery stroke of myosin VI revealed by X-ray crystallography and molecular dynamics.

Authors: Florian Blanc; Tatiana Isabet; Hannah Benisty; H Lee Sweeney; Marco Cecchini; Anne Houdusse
Journal: Proc Natl Acad Sci U S A Date: 2018-05-29 Impact factor: 11.205

4. Methodology for the Simulation of Molecular Motors at Different Scales.

Authors: Abhishek Singharoy; Christophe Chipot
Journal: J Phys Chem B Date: 2016-11-30 Impact factor: 2.991

5. Structural basis for substrate selection by the translocation and assembly module of the β-barrel assembly machinery.

Authors: Rebecca S Bamert; Karl Lundquist; Hyea Hwang; Chaille T Webb; Takoya Shiota; Christopher J Stubenrauch; Mathew J Belousoff; Robert J A Goode; Ralf B Schittenhelm; Richard Zimmerman; Martin Jung; James C Gumbart; Trevor Lithgow
Journal: Mol Microbiol Date: 2017-08-09 Impact factor: 3.501

6. Free Energy Landscape and Conformational Kinetics of Hoogsteen Base Pairing in DNA vs. RNA.

Authors: Dhiman Ray; Ioan Andricioaei
Journal: Biophys J Date: 2020-09-02 Impact factor: 4.033

7. Scalable molecular dynamics on CPU and GPU architectures with NAMD.

Authors: James C Phillips; David J Hardy; Julio D C Maia; John E Stone; João V Ribeiro; Rafael C Bernardi; Ronak Buch; Giacomo Fiorin; Jérôme Hénin; Wei Jiang; Ryan McGreevy; Marcelo C R Melo; Brian K Radak; Robert D Skeel; Abhishek Singharoy; Yi Wang; Benoît Roux; Aleksei Aksimentiev; Zaida Luthey-Schulten; Laxmikant V Kalé; Klaus Schulten; Christophe Chipot; Emad Tajkhorshid
Journal: J Chem Phys Date: 2020-07-28 Impact factor: 3.488