Rui Sun1, Olaseni Sode1, James F Dama1, Gregory A Voth1. 1. Department of Chemistry, James Franck Institute, and Institute for Biophysical Dynamics, The University of Chicago , 5735 South Ellis Avenue, Chicago, Illinois 60637, United States.
Abstract
The protein mediated hydrolysis of nucleoside triphosphates such as ATP or GTP is one of the most important and challenging biochemical reactions in nature. The chemical environment (water structure, catalytic metal, and amino acid residues) adjacent to the hydrolysis site contains hundreds of atoms, usually greatly limiting the amount of the free energy sampling that one can achieve from computationally demanding electronic structure calculations such as QM/MM simulations. Therefore, the combination of QM/MM molecular dynamics with the recently developed transition-tempered metadynamics (TTMetaD), an enhanced sampling method that can provide a high-quality free energy estimate at an early stage in a simulation, is an ideal approach to address the biomolecular nucleoside triphosphate hydrolysis problem. In this work the ATP hydrolysis process in monomeric and filamentous actin is studied as an example application of the combined methodology. The performance of TTMetaD in these demanding QM/MM simulations is compared with that of the more conventional well-tempered metadynamics (WTMetaD). Our results show that TTMetaD exhibits much better exploration of the hydrolysis reaction free energy surface in two key collective variables (CVs) during the early stages of the QM/MM simulation than does WTMetaD. The TTMetaD simulations also reveal that a key third degree of freedom, the O-H bond-breaking and proton transfer from the lytic water, must be biased for TTMetaD to converge fully. To perturb the NTP hydrolysis dynamics to the least extent and to properly focus the MetaD free energy sampling, we also adopt here the recently developed metabasin metadynamics (MBMetaD) to construct a self-limiting bias potential that only applies to the lytic water after its nucleophilic attack of the phosphate of ATP. With these new, state-of-the-art enhanced sampling metadynamics techniques, we present an effective and accurate computational strategy for combining QM/MM molecular dynamics simulation with free energy sampling methodology, including a means to analyze the convergence of the calculations through robust numerical criteria.
The protein mediated hydrolysis of nucleoside triphosphates such as ATP or GTP is one of the most important and challenging biochemical reactions in nature. The chemical environment (water structure, catalytic metal, and amino acid residues) adjacent to the hydrolysis site contains hundreds of atoms, usually greatly limiting the amount of the free energy sampling that one can achieve from computationally demanding electronic structure calculations such as QM/MM simulations. Therefore, the combination of QM/MM molecular dynamics with the recently developed transition-tempered metadynamics (TTMetaD), an enhanced sampling method that can provide a high-quality free energy estimate at an early stage in a simulation, is an ideal approach to address the biomolecular nucleoside triphosphate hydrolysis problem. In this work the ATP hydrolysis process in monomeric and filamentous actin is studied as an example application of the combined methodology. The performance of TTMetaD in these demanding QM/MM simulations is compared with that of the more conventional well-tempered metadynamics (WTMetaD). Our results show that TTMetaD exhibits much better exploration of the hydrolysis reaction free energy surface in two key collective variables (CVs) during the early stages of the QM/MM simulation than does WTMetaD. The TTMetaD simulations also reveal that a key third degree of freedom, the O-H bond-breaking and proton transfer from the lytic water, must be biased for TTMetaD to converge fully. To perturb the NTP hydrolysis dynamics to the least extent and to properly focus the MetaD free energy sampling, we also adopt here the recently developed metabasin metadynamics (MBMetaD) to construct a self-limiting bias potential that only applies to the lytic water after its nucleophilic attack of the phosphate of ATP. With these new, state-of-the-art enhanced sampling metadynamics techniques, we present an effective and accurate computational strategy for combining QM/MM molecular dynamics simulation with free energy sampling methodology, including a means to analyze the convergence of the calculations through robust numerical criteria.
Nucleoside triphosphates
(NTP) contain a nucleoside bound to three
phosphate groups. The hydrolysis of NTPs such as ATP or GTP by proteins
is one of the most important reactions in biology. It provides energy
to biochemical functions that are of great importance to life and
plays essential signaling functions. For example, adenosine triphosphate
(ATP), one of the most important NTPs, is often considered as a “molecular
unit of currency” in intracellular energy transfer[1] and plays essential roles in biological processes
such as synthesis of proteins and membranes, movement of the cell,
cellular division, etc.[2,3] Another important NTP molecule,
guanosine triphosphate (GTP), is vital to cell signaling, cell motility,
protein synthesis and translocation, and more.[4−6] Due to the importance
of NTP hydrolysis, great experimental effort has been devoted to study
the hydrolysis mechanism for many years. These efforts include but
are not limited to the structural determination of nucleotide-bound
proteins through X-ray crystallography and cryo-electron microscopy,[7] as well as the identification of essential residues
through mutagenesis and biochemical assays.[8] Despite the success of such experiments, the specific molecular
mechanism(s) of NTP hydrolysis often remains ambiguous because experiments
cannot directly observe chemical reactions in proteins. As a result,
theoretical studies, conducted through modeling and simulation, are
highly desirable to reveal the hydrolysis mechanism at an atomistic
level.Even though molecular dynamics (MD) simulations have
benefited
enormously from the rise of computational capabilities over the last
few decades,[9] the accurate computational
study of NTP hydrolysis remains very challenging for primarily two
reasons. First, the nature of the hydrolysis reaction requires the
chemical bonding to be modeled in a reactive manner (i.e., bonds will
be broken and new ones will be formed). Quantum mechanics/molecular
mechanics (QM/MM) simulation is one possible solution to this problem,
in which the NTP molecules and nearby protein region undergoing and
directly influencing the hydrolysis reaction are modeled with electronic
structure theory, and the other parts of the system are modeled with
classical molecular mechanics. However, the phosphate groups, catalytic
metals, and their nearby amino acids and water molecules are all relevant
to the hydrolysis reaction, usually making the reactive QM region
larger than 150 atoms. Simulating a system of this size with electronic
structure theory is extremely computationally expensive, even at the
level of density functional theory (DFT).[10] To overcome this challenge, researchers have developed various reactive
models. Warshel and co-workers have proposed the empirical valence
bond (EVB) approach,[11,12] which uses a diabatic representation
to describe the reactant, intermediate, and product states of the
reaction and empirical interaction, as one way of studying reactions
in complex systems. The EVB approach has been recently applied to
the study of NTP hydrolysis.[13] The multiscale
reactive molecular dynamics (MS-RMD) approach,[14−16] where the reactive
potential energy surface is defined by an “on the fly”
dynamically evolving linear combination of multiple diabatic potentials,
and the terms of the interaction matrix defining them are variationally
determined (at least in part) from QM/MM or ab initio MD forces, could in principle be applied to NTP reactions, as has
been done to simulate proton transport in proteins (see, e.g., ref (17) for a recent example).
The MS-RMD reactive potential energy surface enables the formation
and cleavage of molecular bonds in a deterministic classical MD simulation.
As another alternative, the self-consistent charge density functional
tight binding (SCC-DFTB)[18] might be utilized.
This approach is a semiempirical QM algorithm that is derived from
a Taylor expansion of the DFT Kohn–Sham energy for charge density
fluctuations in which the Hamiltonian matrix elements are evaluated
using a minimal basis set of pseudoatomic orbitals within a two-center
approximation.[19−21] SCC-DFTB has been applied to the ATP hydrolysis process
in myosin.[22] Another possible approach
is to utilize force matching along a reaction pathway determined from
less expensive semiempirical calculations such as SCC-DFTB and to
then correct them to higher level QM calculations.[23] In general, all of the aforementioned methods are implemented
in order to greatly increase the amount of statistical sampling in
the simulation by making evaluations of the forces in the QM region
of the hydrolysis reaction faster, ideally (but not commonly) with
a systematic and small trade-off in accuracy.The second challenge
is that the free energy barrier of NTP hydrolysis
is usually on the order of tens of kcal/mol, which makes conventional
MD and QM/MM simulations of no practical use in calculating a potential
of mean force (PMF) because the transition state region will barely
be sampled during a feasible simulation time. In a recent QM/MM study
of ATP hydrolysis in F1-ATPase,[24] Shigehiko
et al. optimized the geometry of the hydrolysis compounds along the
reaction path and constructed an approximate free energy surface with
vibrational frequencies from QM calculations. This approach only provides
free energy profiles for the stationary points (potential energy minima
and transition states) instead of the entire reaction path, and, more
importantly, the lack of accounting for configurational entropy from
nonlinear fluctuations may cause significant errors in the free energy
surface for enzymatic reactions.[25] Therefore,
enhanced sampling methods have been employed in the studies of NTP
hydrolysis to address this high energy barrier issue. For example,
Yang et al.[22] used umbrella sampling and
the weighted histogram analysis[26,27] in their SCC-DFTB/MM
simulations to encourage the attack of lytic water to the gamma phosphate
and the proton transfers between the lytic water and amino acid. More
recently, metadynamics (MetaD)[28−30] has been adopted in NTP hydrolysis
simulations.[31−33] MetaD accelerates the simulation by discouraging
it from revisiting configurations that have already been sampled through
constructing a history-dependent bias potential V(s) to help the system to overcome large free energy
barriers. An appealing feature of MetaD is its ability to explore
reactions without an a priori guess of the detailed
reaction path,[34] in contrast to umbrella
sampling where one has to prepare the geometry for each window according
to a hypothetical reaction path.In spite of the past application
of MetaD in assisting the PMF
calculation of NTP hydrolysis, the practical and robust convergence
of an acceptably accurate and therefore computationally expensive
QM/MM simulation with MetaD remains extremely difficult. Other than
the sampling limitations of the QM/MM simulation discussed earlier,
the complexity of NTP hydrolysis may demand more collective variables
(CVs) than those solely associated breaking the bond between the betaphosphate and the gamma oxygen, e.g., the proton transfer between
the lytic water and/or the amino acid through a water mediated “proton
relay”.[31] Additionally, different
sets of CVs are sometimes required to properly sample the forward
and backward reactions.[33] As a result,
the quality of the convergence in these early QM/MM MetaD simulations
could not be verified as rigorously as the theory suggests, e.g.,
by checking for the diffusive motion of the CVs and for agreement
between multiple replicas.It is noteworthy that these early
QM/MM MetaD simulations have
either utilized the original nontempered MetaD,[28] where the same Gaussian is added to the Hamiltonian along
the simulation, or the well-tempered MetaD (WTMetaD),[29] where the Gaussian is scaled with respect to the current
local bias potential. In nontempered MetaD, the ever-growing history
dependent bias potential does not converge to the PMF modulo a constant
but rather fluctuates around the desired final state.[30] A possible mitigation of this behavior is to take the time
average of the bias potential after observing the diffusive motion
of the system in the CV space. However, as pointed out earlier, the
identification of diffusive motion can be subjective and itself is
very difficult to achieve due to the limited sampling of NTP hydrolysis
in expensive QM/MM simulation, thus making an extensive enough MetaD
simulation to check for truly diffusive behavior is presently nearly
impossible. WTMetaD overcomes the convergence issue[30] but presents an obvious trade-off between the exploration
of the CV space in the early stages of the simulation and the asymptotic
convergence rate.[35,36] For example, WTMetaD that decreases
the Gaussian height slowly builds up a fast-growing but noisy bias
potential, hindering the convergence. In contrast, a fast decrease
of the Gaussian heights generates a smooth bias potential but prevents
it from escaping a minimum within a feasible QM/MM simulation time,
especially when the barrier in the CV space is very large, which is
usually the case in NTP (e.g., ATP) hydrolysis.As demonstrated
in our recent papers,[35,36] the newly developed transition-tempered
MetaD (TTMetaD)[35] has been shown to converge
rapidly and asymptotically
without sacrificing the exploration of the CV space in the early stages
of simulations, in contrast to current other convergent MetaD methods.
This aspect of TTMetaD obviously makes it very appealing to study
NTP hydrolysis in computationally expensive QM/MM simulations, as
the sampling of the latter will always be limited due to the high
cost of the QM force evaluations. Additionally, as pointed out by
McGrath et al.,[32] as well as in our own
experience, in a longer MetaD simulation the bias potential may introduce
irreversible changes in the system that dramatically delays the convergence
or pushes the system into unstable regions and may even crash the
QM/MM simulation. Therefore, it is also of interest to employ a self-limiting
bias potential to avoid these issues and to properly focus the QM/MM
free energy sample, by using a method such as our recently developed
metabasin MetaD (MBMetaD).[37]The
purpose of this paper is to demonstrate, through the specific
example of ATP hydrolysis in actin, that a combination of TTMetaD
and MBMetaD with QM/MM MD simulation can provide a systematic and
rigorous estimate of the free energy pathway of NTP hydrolysis in
proteins. We study here the ATP hydrolysis in monomeric and filamentous
actin and compare the features of the two reactions, building on our
earlier work[31] with this more powerful
and convergent combinations of simulation methods. ATP hydrolysis
in actin plays an important role in the actin-based cytoskeleton.[38,39] For example, ATP-bound monomeric globular actin (G-actin) is added
to the (+) end of an actin filament, undergoing conformational change
to a more flattened state in filamentous actin (F-actin), which then
catalyzes ATP hydrolysis, then releases the hydrolyzed phosphate group,
and finally dissociates at the (−) end, now as an ADP-bound
G-actin. The high selectivity for hydrolyzing ATP in F-actin, as opposed
to in G-actin, is important for modulating the filament’s physical
properties and the binding affinity of various actin-binding proteins.
The most recent experimental studies[2,40] have reported
the ATP hydrolysis rate to be 0.3 ± 0.1 s–1 and 7 × 10–6 s–1 for the
reaction in F-actin and G-actin, respectively. Assuming the same Arrhenius
prefactor in both forms of actin, this dramatic speedup suggests a
6.6 ± 0.7 kcal/mol hydrolysis free energy barrier difference
between F- and G-actin. McCullagh et al.[31] carried out a QM/MM MD computational study to reveal the origins
of the different rates of ATP hydrolysis in the different actin states.
Nontempered MetaD was employed in that work, and the simulations showed
an 8 kcal/mol hydrolysis barrier difference between F- and G-actin,
in good agreement with the experimental estimate. The water structure
of the “proton relay” between the lytic water in the
hydrolysis site and Asp154 was found to be to one difference between
F-actin and G-actin.In this paper, we revisit these same actin
systems and compare
the performance of TTMetaD simulation against WTMetaD and nontempered
MetaD simulation. Our simulation follows the setup of McCullagh et
al.,[31] except for using different MetaD
approaches. In particular, we show that reaching convergence requires
a comprehensive representation of the overall hydrolysis process by
also adding in an additional CV that biases proton transfer from the
nucleophilic water molecule. This additional concurrent metadynamics
benefits from the self-limiting feature of MBMetaD and dramatically
accelerates the convergence of the free energy surface. Most importantly,
our simulations provide a rigorous analysis of the convergence of
the bias potential, which has so far largely been overlooked in QM/MM
simulations of NTP hydrolysis. The focus and conclusions of this paper
reveal the computational advantages of utilizing TTMetaD and MBMetaD
with QM/MM MD when studying NTP hydrolysis in proteins. Future work
will focus on utilizing this approach to then study the hydrolysis
mechanism itself in key proteins.
Methodology
Transition-Tempered
Metadynamics
MetaD utilizes an
external history dependent bias potential, which has the form of a
summation of Gaussian functions centered at previously visited points
in a preselected CV space. The bias potential helps the chemical system
to overcome large free energy barriers by discouraging it from revisiting
configurations that have already been sampled. In the first implementation
of MetaD, the bias potential (VG) evolved
according to the equationwith w called the bias energy
addition rate, usually expressed as the height of individual Gaussian
functions added, w0, and divided by a
deposition stride τ. The function S(R) represents the ith CV for MetaD simulations, i.e., distance, angle, dihedral,
atomic coordinates, radius of gyration, local density, etc. The term f(S(R)) is defined asin which σ is the width of the Gaussian for the ith CV. This approach is usually referred to
as nontempered MetaD because
the bias energy addition rate w does not depend on
the progress of the simulation. The bias potential from nontempered
MetaD does not converge asymptotically,[30] and a practical solution is to take the time average of the bias
potential, after the time (t0) when the
motion of the system on the CV space becomes diffusive.[41] The PMF estimated from nontempered MetaD takes
the formwhere VG(t) is the instantaneous
bias potential.To address
the convergence issue, WTMetaD tempers the Gaussian heights exponentially
with respect to the local bias potential. The height of the Gaussian
function becomes time-dependentin which ΔT is a parameter
(with the unit of temperature) chosen prior to the simulation. Dama
et al. have provided a rigorous mathematical proof that WTMetaD converges
asymptotically to a linearly scaled inverse of the underlying PMF.[30]On the other hand, TTMetaD smoothly converges
the bias potential
asymptotically like WTMetaD, but it keeps track of the region that
the bias potential has explored and delays the tempering of the Gaussian
height until all the basin points are included in this region. Because
the basins are relatively filled prior to the tempering, TTMetaD can
be used with a quickly converging tempering (small ΔT) without a concern for being trapped in the first nearby
free energy minimum. The height of the Gaussian in TTMetaD is tempered
aswhere V*(s,t) is
the minimal bias on the maximally biased path among all the
continuous paths s(λ) that connects all of
a set of preselected basin points in the CV space. TTMetaD does not
require the preknowledge of the barrier height to tune ΔT. Instead, one needs to estimate the position of the basin
points in the CV space. This estimation is straightforward when it
is defined by the problem of interest, i.e., in this case the ATP
state and ADP + Pi state for the ATP hydrolysis reaction. Generally,
TTMetaD has been proven to be robust to the accuracy of the basin
points as long as they are separated by a major barrier.[36]
Metabasin Metadynamics
In a conventional
MetaD simulation,
the bias energy is constructed to flatten the underlying free energy
surface globally on the CV space. However, in many-dimensional systems,
the system may be pushed out from the low energy regions that are
of interest to some of the higher energy regions that are not physical
or as relevant. Worse, entering these high-energy regions can couple
with large, effectively irreversible conformation changes, introducing
large hysteresis and limiting reproducibility in the estimates of
the PMF. Therefore, it is highly desirable to use a self-limiting
biasing method that only fills the underlying free energy surface
up to a specified level and then converges that bias potential to
estimate the PMF. This forces a more physically relevant exploration
and greater reproducibility of the PMF estimates.MBMetaD achieves
this self-limiting behavior for the bias potential by only flattening
a finite domain (D) of the whole CV space. MBMetaD
therefore utilizes a Gaussian with two partsin which fInt(S(R)) is the
part to flatten the interior of the domain, and fExt(S(R)) raises the bias level of the exterior to match the bias
level of the domain boundary to counteract the force from fInt(S(R)) that slopes into the exterior of the
domain. The term fInt(S(R)) is rescaled from
the regular Gaussian (named f0(S(R))) following
a relative to the multiplicative McGovern-de Pablo rule[42]with I being the boundary-normalized
integral of f0(S(R)) over D, and f is a scaling function. The function f(x) evaluates as x for x ≥ 1 and x + 0.5*(1–x)2 for x < 1, so that these
hills are identical to the McGovern-de Pablo form inside the domain
but taper more smoothly to zero outside of the domain. At the same
time, MBMetaD constructs the exterior bias update, fExt(S(R)), as the function that would force the sampling into
the domain near the boundary exactly enough to cancel out how the
interior bias, fInt(S(R)), would push the
sampling out near the domain boundary if sampling inside the domain
were perfectly flat (or were the desired tempered distribution in
the case of WTMetaD). A detailed description of MBMetaD and additional
applications can be found in ref (37).
Initial Structures
The structures
of both G- and F-actin
were generated from equilibrated classical molecular dynamics simulations.[43] The crystal structure (pdb code 1NWK) was used for the
G-actin simulation and the Oda model (pdb code 2ZWH) after equilibration
of a 13-mer periodic filament was used for F-actin simulation. The
main change during the actin polymerization is a flattening of the
subdomain 2–1–3–4 dihedral, which is −0.55
degree in F-actin and −27.09 degree in G-actin. To prevent
F-actin from relaxing to G-actin during the MetaD simulation, a harmonic
bias force with a spring constant of 228.8 kcal/mol was applied to
restrain this dihedral based on a coarse-grained mapping of F-actin.
Further details of the structures in this simulation are provided
in McCullagh et al.[31] and its Supporting
Information.
QM/MM Setup
We adopted the same
QM region as in McCullagh
et al.,[31] including methyl triphosphate,
bound magnesium, surrounding waters, and ten amino acids (Asp11, Gly13,
Ser14, Gly15, Lys18, Gln137, Asp154, Gly156, Asp157, and His161).
The atoms involved in coordination with the catalytic magnesium, i.e.,
one betaoxygen, one gamma oxygen, and four water molecules, are included
in this QM region. Sequential amino acids backbone atoms were included
and hydrogen capped at the terminal nitrogen and carbon. Nonsequential
amino acids were truncated at the Cβ or Cγ carbon and capped with hydrogens.[31] The
PBE/TZV2P level of density functional theory[44,45] was employed for the QM region simulations, and the magnesium ion
was treated with GTH pseudopotentials.[46,47] The CHARMM27
force field[48,49] was used to model the MM region,
and periodic boundary conditions were applied in the NVT ensemble
with a smoothed particle mesh Ewald treatment of the long-range electrostatics.[50] The temperature was maintained at 310 K with
a Nosé–Hoover thermostat,[51] and the integration time step was set to 0.5 fs, totaling over 1.5
ns of simulation time. The total CPU time used in this project was
about 5 million hours. The QM/MM MetaD simulations were performed
with the CP2K software[52] coupled to the
enhanced sampling package PLUMED2.[53]
Metadynamics Setup
A set of appropriate CVs (denoted
as “s”) representing the chemical reaction
of interest must be chosen prior to MetaD simulation. As described
earlier, MetaD sequentially builds up a history dependent bias potential
of these variables to accelerate their dynamics. As suggested by McCullagh
et al.,[31] our TTMetaD and WTMetaD simulations
utilize two CVs in the ATP hydrolysis reaction (cf. Figure ). One CV is the coordination
number between the terminal (gamma) phosphorus (Pγ) and bridging (beta) oxygen (Oβ), shown in Figure . This coordination
number describes the forming and cleavage of the bond between Pγ and Oβ and allows the recombination
of Pγ with any of the Oβ atoms.
The other CV, also seen from Figure , is the coordination number between Pγ and both gamma oxygen (Oγ) and QM wateroxygens
(Ow). Therefore, all the QM water molecules (about 30)
can possibly be the lytic water that attacks Pγ and
triggers the hydrolysis reaction, in contrast to only biasing a particular
water molecule in the QM region to be the lytic water.[33] We anticipate this species-agnostic approach
to be important to prevent unphysical artifacts from breaking chemical
symmetries while modeling this reaction.
Figure 1
An illustration of the
atoms involved in the two collective variables
employed in the TTMetaD simulations of ATP hydrolysis in actin.
An illustration of the
atoms involved in the two collective variables
employed in the TTMetaD simulations of ATP hydrolysis in actin.It should be noted that the CVs
in our simulation prefer neither
an “associative” mechanism[54,55] nor a “dissociative” mechanism[56,57] for the ATP hydrolysis. The former suggests the lytic water molecule
forms a bond with Pγ before the bond between Pγ and Oβ breaks. The latter proposes
the Pγ and Oβ bond breaks before
the nucleophilic attack of lytic water. Importantly, an additional
CV that biases the hydrogen transfer from the lytic water was also
used in our QM/MM TTMetaD simulation with a self-limiting MetaD (MBMetaD)
to enhance convergence of the PMF. This feature of the simulation
will be further described later.The parameter ΔT (in eq )
that tunes the speed of decreasing Gaussian
height significantly affects WTMetaD convergence rate.[29,35,36,58] It is suggested in the original WTMetaD paper that ΔT should be chosen according to the free energy barrier,
as ΔT + T should be of the
order of magnitude of the barrier height.[29] Therefore, ΔT was chosen to be 12090 K (corresponding
to 24.0 kcal/mol) to maximize the efficiency of WTMetaD, based on
the previously determined hydrolysis barrier height (30 and 22 kcal/mol
for ATP hydrolysis in G- and F-actin, respectively).[31] However, it is important to note that this level of accuracy
of the estimates of the energy barrier is almost impossible for every
NPT hydrolysis reaction, especially if one is interested in a reaction
that has not been studied before. We note that TTMetaD is more robust
to the choice of ΔT as compared to WTMetaD,[35,36] and it is recommended to temper more aggressively (with a smaller
ΔT value), since the tempering is not applied
until the basins are already relatively filled. We therefore set ΔT to be 1240 K (corresponding to 2.5 kcal/mol) for the TTMetaD
simulations. Additionally, one needs to provide the estimates of the
basin positions, which are easy to do for ATP hydrolysis (one basin
is the ATP region and the other is the ADP + Pi region). The same
heights of Gaussian are used for both WTMetaD and TTMetaD simulations
as in the previous nontempered MetaD simulations.[31]
Results
TTMetaD vs WTMetaD
We carried out three sets of QM/MM
simulations to compare the performance between WTMetaD and TTMetaD
in the early stages of the simulation: F-actin starting from ATP,
F-actin starting from ADP + Pi, and G-actin starting from ATP. For
hydrolysis in F-actin, WTMetaD and TTMetaD simulations were initiated
separately from ATP and ADP basins. To illustrate the data, the bias
potentials from the ATP and ADP replicas were summed and plotted together
into one plot (top row in Figure ), following the practice of multiple walker MetaD.[59] This figure demonstrates the efficiency of TTMetaD
in exploring the CV space at the early stages of the simulation. It
is seen that after 50 ps, both ATP and ADP + Pi basins of the underlying
free energy surface are much better filled in TTMetaD simulations
than in WTMetaD simulations. In the TTMetaD simulations, both ATP
and ADP replicas have escaped from their origin basin and started
to explore the transition state region, in contrast to them being
trapped in its origin basin in WTMetaD simulations. The WTMetaD contour
lines are much smoother than those in the TTMetaD plot, reflecting
the fact that the WTMetaD is using smaller Gaussians before overcoming
the barrier. The maximum value of the bias energy, which can be identified
from the background color in each figure, is about 25 kcal/mol in
TTMetaD, significantly higher than the value of 18 kcal/mol in WTMetaD.
According to the tempering rules of WTMetaD (eq ), a continued WTMetaD simulation would use
as small as 47% of the original Gaussian hill height, in contrast
to TTMetaD that still uses a full Gaussian since the predefined basins
have not been connected by the bias potential yet. WTMetaD will eventually
converge with longer simulation times; however, due to tempering,
it will take even longer to add the same amount of bias energy to
the system than before, and this makes it relatively inefficient.
The G-actinATP simulations show a very similar story to the F-actin
simulations (bottom row in Figure ).
Figure 2
Inverse of the bias potential for ATP hydrolysis in F-actin
(top
row) and G-actin (bottom row). For F-actin, the bias potential is
the summation of ATP and ADP+Pi replicas (50 ps each). For G-actin,
the bias potential is from one of two ATP replicas (50 ps). Bias potentials
from WTMetaD are shown in the left column, and bias potentials from
TTMetaD are shown in the right column. The unit of energy is kcal/mol.
Inverse of the bias potential for ATP hydrolysis in F-actin
(top
row) and G-actin (bottom row). For F-actin, the bias potential is
the summation of ATP and ADP+Pi replicas (50 ps each). For G-actin,
the bias potential is from one of two ATP replicas (50 ps). Bias potentials
from WTMetaD are shown in the left column, and bias potentials from
TTMetaD are shown in the right column. The unit of energy is kcal/mol.At this point, the comparison
between TTMetaD and WTMetaD was clear,
so we halted the WTMetaD simulations at 50 ps. It is important to
emphasize that WTMetaD was set up in a way to maximize its efficiency
by taking advantage of an accurate prior knowledge of the hydrolysis
barrier height,[31] but TTMetaD’s
advantages seen early in the simulation were compelling. Indeed, the
early exploration efficiency of TTMetaD appears to be of considerable
value to the computationally demanding QM/MM NTP hydrolysis simulations.
Practical Convergence Requires Comprehensive CVs
We
next continued all three sets of TTMetaD simulations (F-actin starting
from ATP, F-actin starting from ADP + Pi, and G-actin starting from
ATP) beyond 50 ps to achieve better PMF convergence. We also replicated
each set, so that the hydrolysis in G-actin and in F-actin was both
modeled by 4 replicas each. After reading in the TTMetaD bias potential
as shown in Figure , the replicated trajectories were restarted from the configuration
at 50 ps with a random velocity sampled from a Boltzmann distribution
at 310 K. The reported PMF estimates are then averaged over those
replicas, because this ensemble-average approach has previously proven
to converge the MetaD simulations faster in both wall times and CPU
times.[36]As the TTMetaD simulation
goes on, the trajectories show diffusive motion between ATP and ADP
+ [H2O–PO3]−. However,
a complete hydrolysis reaction requires one of the protons from the
lytic water in [H2O–PO3]− to transfer to Oγ to form [H2PO4]− (see Figure );[24,31,33] but, without an additional bias, [H2O–PO3]− (seen in Figure ) remains a metastable state, and the transition between
[H2O–PO3]− and [H2PO4]− does not happen spontaneously.[33] The [H2O–PO3]− and [H2PO4]− correspond to a very similar value on the 2D CV space (with only
slight differences due to surrounding water structure), and, as a
result, the 2-CV biasing force from the previously described TTMetaD
simulations did not accelerate this proton transfer process. The neglect
of an important slow degree of freedom (i.e., rare event) in MetaD
simulations usually leads to significant hysteresis in the PMF estimates,[60] and we observed this in our simulation. Taking
the ATP hydrolysis in G-actin as an illustrative example, the longer
simulation shows the trajectories spend a relatively longer time between
the [H2O–PO3]− and
the transition state region, leading to an unphysical minimum near
that region in our PMF estimates (right panel in Figure ).
Figure 3
An illustration of the
collective variables employed in the MBMetaD
simulations to accelerate the transition between [H2O–PO3]− (bound water proton coordination number
is 2, shown as purple) and [H2PO4]− (bound water proton coordination number is 1, not shown).
Figure 4
Bias potential from simulations of ATP hydrolysis
in G-actin. Left
panel: the inverse of the bias potential from averaging over 4 TTMetaD
replicas with 2 CVs (180 ps for each replica). Right panel: the extended
(∼50 ps) simulation from the left panel. An unphysical hysteresis
builds up in the PMF (∼0.1, ∼3.8) because of the neglect
of a third key slow degree of freedom (proton transfer of the lytic
water). The unit of energy is kcal/mol.
An illustration of the
collective variables employed in the MBMetaD
simulations to accelerate the transition between [H2O–PO3]− (bound water proton coordination number
is 2, shown as purple) and [H2PO4]− (bound water proton coordination number is 1, not shown).Bias potential from simulations of ATP hydrolysis
in G-actin. Left
panel: the inverse of the bias potential from averaging over 4 TTMetaD
replicas with 2 CVs (180 ps for each replica). Right panel: the extended
(∼50 ps) simulation from the left panel. An unphysical hysteresis
builds up in the PMF (∼0.1, ∼3.8) because of the neglect
of a third key slow degree of freedom (proton transfer of the lytic
water). The unit of energy is kcal/mol.In general, MetaD simulations with more than 2 instantaneous
CVs
are difficult to converge even with tempering.[37,60,61] One solution to this higher-dimensional
problem is to simulate a series of MetaD simulations with different
concurrent CVs and let their bias energy and configuration exchange
between one another.[62] However, the application
of this approach to the ATP hydrolysis system seems less than optimal,
as the number of replicas needed to build “the ladder of replicas”
could be large and QM/MM simulation is costly.Nevertheless,
we decided to explore the idea of concurrent MetaD[62] and use it to explore just a third CV to accelerate
the hydrogen transfer of the lytic water: the coordination number
between lytic wateroxygen (Ol) and lytic waterhydrogen
(Hl). This CV was chosen to bias the transition between
[H2O–PO3]− (proton
coordination number of the bound water is 2) and [H2PO4]− (proton coordination number of the bound
water is 1). An illustration of this CV can be found in Figure . As shown in other research,
as well as in our own simulations, the lytic water, instead of an
already dissociated hydroxide,[32,33] nucleophilically attacks
the backside of Pγ and initiates the hydrolysis reactions.
Therefore, biasing the lytic water from the beginning of the simulation
has a risk of introducing artifacts into the simulation, i.e., forcing
hydroxide to be the attacking group, if the MetaD drives autoionization
too aggressively. Additionally, the lytic water could be any water
molecule adjacent to the Pγ, and biasing all of them
to the point of autoionization is clearly undesirable.Therefore,
we utilized MBMetaD to introduce a concurrent and self-limiting
bias potential that features the following properties: 1) it is only
in play after the formation of the [H2O–PO3]− intermediate; 2) it applies only to the lytic
water that forms the [H2O–PO3]− intermediate; and 3) the maximum bias value applied to the CV is
less than 6 kcal/mol (self-limiting). The value of 6 kcal/mol was
chosen as an upper-limit so that the Ol–Hl bond would not break unless there is a good Hl accepting
group close by, in contrast to conventional MetaD where the Ol–Hl bond would be driven to break regardless
of the local environment. As a result of this MBMetaD bias, the weakened
Ol–Hl bond had a much shorter lifetime
than in the 2-CV TTMetaD simulations, and the transfer between [H2O–PO3]− and [H2PO4]− became diffusive. Then, depending
on the chemical environment, the proton transfers from lytic wateroxygen to gamma oxygen followed different pathways.TTMetaD
combined with concurrent MBMetaD is able to converge the
simulation well, and the final PMF estimates from averaging over four
replicas are depicted in Figure . The minimum free energy path was then computed with
the string method[63] and is plotted in Figure . The difference
in activation free energy of ATP hydrolysis between G-actin and F-actin
is 7.1 kcal/mol, aligning well with the experimentally suggested difference
of 6.6 ± 0.7 kcal/mol. This agreement is also closer than the
previous result of 8 kcal/mol obtained from nontempered MetaD simulations.[31] The overall magnitude of the hydrolysis barrier
is almost certainly dominated by electrostatic effects in the active
site as well as the energy required to break the chemical bonds plus
differences in the efficiency of the proton relay from the lytic water
in F-actin versus G-actin.[31]
Figure 5
ATP hydrolysis
in F- (left) and G-actin (right) PMF from TTMetaD
+ concurrent MBMetaD simulations as described in the main text (averaged
over 4 replicas). The unit of energy is kcal/mol.
Figure 6
Minimum free energy path from the 2D free energy surfaces (Figure ) for ATP hydrolysis
in F- and G-actin. The reaction coordinate represents the hydrolysis
reaction progress along the minimum free energy path, i.e. “0”
being ATP and “1” being ADP + Pi.
ATP hydrolysis
in F- (left) and G-actin (right) PMF from TTMetaD
+ concurrent MBMetaD simulations as described in the main text (averaged
over 4 replicas). The unit of energy is kcal/mol.Minimum free energy path from the 2D free energy surfaces (Figure ) for ATP hydrolysis
in F- and G-actin. The reaction coordinate represents the hydrolysis
reaction progress along the minimum free energy path, i.e. “0”
being ATP and “1” being ADP + Pi.
Discussions of Convergence
The asymptotic convergence
for tempered metadynamics (WTMetaD and TTMetaD) cannot in principle
be reached with finite sampling.[30] Therefore,
it is important to set up practical convergence criteria that can
be verified for expensive QM/MM simulations such as the present ones
of ATP hydrolysis. As suggested in a number of papers,[28,36,60,64] the diffusive motions of the CVs and the differences between the
replicas can be employed as convergence criteria in MetaD simulations.
Conversely, the evolution of the bias potential is not used because
of a concern for false convergence, where the Gaussian height could
have been tempered so little that the bias potential barely changes
with longer simulation, but the calculation is in fact not converged.To verify the diffusive motion of the system in CV space, we restarted
trajectories from the ATP configuration with velocities randomly sampled
from a Boltzmann distribution (at 310 K). The “candidate”
PMF (Figure ) was
read in, and the Gaussians kept being added following the TTMetaD
tempering rule. The concurrent and self-limiting bias accelerating
the transfer between [H2O–PO3]− and [H2PO4]− was applied
as described before. If the previous simulation had indeed reached
a reasonable degree of convergence, one would expect the system to
escape from the ATP basin, cross the transition state region, form
ADP + [H2O–PO3]−, and
eventually reach ADP + [H2PO4]− relatively quickly, i.e., the whole process should be diffusive
without being trapped in any region. Taking F-actin as an example,
we tested eight such trajectories, and all of them show satisfactory
diffusive motion (i.e., by finishing the hydrolysis process within
15 ps, which includes constructing the self-limiting bias for the
hydrogen transfer CV). An example trajectory of ATP hydrolysis in
F-actin that demonstrates the proton transfer is shown in Figure . The evolution of
the CVs for another testing trajectory of hydrolysis in F-actin is
depicted in the Supporting Information (Figure
S1).
Figure 7
An example of a diffusive trajectory of ATP hydrolysis in F-actin.
An example of a diffusive trajectory of ATP hydrolysis in F-actin.We also compared the differences
between the bias potentials from
each replica because the robust convergence cannot be confirmed until
they agree reasonably well with every other. As mentioned earlier,
we simulated four replicated simulations for ATP hydrolysis in both
F- and G-actin. Figure depicts the minimum free energy paths (the ATP region is shifted
to zero) from the four replicas of ATP hydrolysis in G-actin. All
four minimum free energy paths show good agreements in the hydrolysis
barrier. However, differences in the ADP + Pi region indicate that
the convergence quality is not as ideal as in the ATP transition state
region. Nevertheless, in actin the forward ATP hydrolysis reaction
(forming ADP + Pi) is much more important than its synthesis (reverse
reaction), which is a very rare event.
Figure 8
Minimum free energy path
from 4 replicas of ATP hydrolysis in G-actin.
The reaction coordinate represents the hydrolysis reaction progress
along the minimum free energy path, i.e. “0” being ATP
and “1” being ADP + Pi.
Minimum free energy path
from 4 replicas of ATP hydrolysis in G-actin.
The reaction coordinate represents the hydrolysis reaction progress
along the minimum free energy path, i.e. “0” being ATP
and “1” being ADP + Pi.It is also of interest to compare the converged TTMetaD and
concurrent
MBMetaD simulation with the previous nontempered MetaD simulation
from McCullagh et al.[31] The PMF estimates
and the minimum free energy path from the nontempered simulation are
replotted in the Supporting Information (Figures S2 and S3). Even though there are differences in the PMF
estimates between the two studies, the key feature, the barrier height
in the minimum free energy paths from both simulations, agrees well.
As has been discussed in the Introduction,
a QM/MM study of NTP hydrolysis is extremely expensive and complicated;
therefore, McCullagh et al.[31] had chosen
to employ nontempered MetaD to ensure the exploration of the CV space
rather than to aim for a more meaningful PMF convergence with a method
like WTMetaD. Also, there was a lack of averaging of the nontempered
MetaD bias after observing the diffusive motion; hence the convergence
of the nontempered MetaD PMF estimate was not ideal. However, as shown
here with the recently developed TTMetaD and concurrent MBMetaD, QM/MM
simulations are now able to provide a rigorous and converged PMF estimate
for this system and presumably others similar to it.
Conclusions
Protein mediated NPT hydrolysis such as the actinATP hydrolysis
studied in this work remains one of the most important and challenging
biochemical reactions in computational biophysics. The size and complexity
of the protein environment adjacent to the hydrolysis site (water
structure, catalytic metal, and amino acid residues), as well as the
chemical complexity of the hydrolysis reaction itself, limits the
amount of sampling that one can achieve in a computationally demanding
QM/MM simulation. Therefore, an enhanced sampling method featuring
high-quality PMF estimates at the earliest possible stage of the simulation
is highly desirable. As shown in this work, TTMetaD is well suited
to this task in that it explores the CV space with full Gaussians
until the basins are roughly filled. This gives rapid estimates containing
all of the key features of the PMF. The bias is then aggressively
converged using quickly tempered Gaussians for the asymptotical convergence.
The TTMetaD QM/MM simulations also exhibit diffusive motion between
ATP and ADP + [H2O−PO3]−.We further introduced an additional bias potential
to accelerate
the transition between [H2O–PO3]− and [H2PO4]−. To perturb the hydrolysis dynamics to the least effective extent,
this bias was constructed from the self-limiting MBMetaD method. It
is important to emphasize that our MetaD simulations were not carried
out with 3 simultaneous CVs, where one must compute the PMF with respect
to any two CVs at the fixed value of the third CV. The third CV (biasing
proton transfer from lytic water) was only introduced to accelerate
one slow degree of freedom that prevented the TTMetaD with other two
hydrolysis CVs from converging. We therefore consider TTMetaD + concurrent
MBMetaD to be a much more efficient and physically motivated approach
than MetaD simulations with 3 simultaneous CVs. The reasons are 1)
The larger number of simultaneous CVs that MetaD uses, the longer
the simulation takes to converge; 2) Biasing the lytic water molecules
at the outset of the simulation can lead to cleaving the O–H
bond before it nucleophilically attacks ATP; and 3) Identifying in
advance a particular water molecule to be the lytic water from all
of the QM water molecules is nontrivial, so it is optimal to allow
one to initiate the reaction before adding the MBMetaD bias to it.We have also shown that running replicated tempered MetaD simulations
is advantageous, since not only do they provide confirmation of convergence,
but they can also reduce the simulation wall clock times, thus allowing
them to converge faster. Furthermore, a single tempered MetaD calculation
with limited sampling can be very untrustworthy.[36] In principal, nontempered MetaD can also provide converged
PMF estimates with two levels of averaging: time averaging over a
period of the simulation after it becomes diffusive and replica averaging
over the time-averaged bias potential. However, the diffusive motion
itself is difficult to realize due to the limited sampling in NTP
hydrolysis. To the best of our knowledge, this work is the first time
that diffusive MetaD behavior has been demonstrated for a QM/MM simulation
of an NTP hydrolysis reaction.The enhanced sampling methods
described in this paper, TTMetaD
and MBMetaD, have been implemented in a publicly available fork of
the latest version of PLUMED2 and are free to use. We are currently
working on implementing TTMetaD and MBMetaD into the public version
of the most recent PLUMED. Both TTMetaD and MBMetaD are easy to understand
and apply, and, compared to a conventional nontempered MetaD or WTMetaD
simulation, the extra input parameters that TTMetaD and MBMetaD require
have robust defaults and are straightforward to incorporate. A detailed
user manual for these two methods is available upon request. No specialized
or separate mathematical libraries are required by the new features
of these methods, thus any researcher already using MetaD and PLUMED2
can begin using TTMetaD and MBMetaD simply by reinstalling the new
fork of PLUMED2 with any compatible MD engine. An example MBMetaD
input file (used in our simulation) is provided in the Supporting Information.To conclude, in
this paper we have demonstrated the combined advantages
of TTMetaD and MBMetaD simulation for demanding QM/MM MD free energy
simulations of the ATP hydrolysis process in actin. In turn, the calculated
(and demonstrably reasonably well converged) difference in free energy
barriers for G-actin versus F-actin is found to be in good agreement
with the experimental value. Our work therefore serves as an example
for utilizing this approach in simulating complex NTP hydrolysis reactions
in proteins, which are some of the most challenging of all biochemical reactions. Additional
simulations along these lines will be the target of future research,
as well as additional mechanistic analysis.
Authors: Harshwardhan H Katkar; Aram Davtyan; Aleksander E P Durumeric; Glen M Hocky; Anthony C Schramm; Enrique M De La Cruz; Gregory A Voth Journal: Biophys J Date: 2018-09-01 Impact factor: 4.033
Authors: Vilmos Zsolnay; Harshwardhan H Katkar; Steven Z Chou; Thomas D Pollard; Gregory A Voth Journal: Proc Natl Acad Sci U S A Date: 2020-11-16 Impact factor: 11.205
Authors: Jaehyeok Jin; Alexander J Pak; Aleksander E P Durumeric; Timothy D Loose; Gregory A Voth Journal: J Chem Theory Comput Date: 2022-09-07 Impact factor: 6.578
Authors: Alexander Zlobin; Yuliana Mokrushina; Stanislav Terekhov; Arthur Zalevsky; Tatiana Bobik; Anastasiya Stepanova; Maria Aliseychik; Olga Kartseva; Sergey Panteleev; Andrey Golovin; Alexey Belogurov; Alexander Gabibov; Ivan Smirnov Journal: Front Pharmacol Date: 2018-08-03 Impact factor: 5.810
Authors: Clement P M Scipion; Umesh Ghoshdastider; Fernando J Ferrer; Tsz-Ying Yuen; Jantana Wongsantichon; Robert C Robinson Journal: Proc Natl Acad Sci U S A Date: 2018-09-25 Impact factor: 11.205