Accurately calculating a weak acid's pKa from simulations remains a challenging task. We report a multiscale theoretical approach to calculate the free energy profile for acid ionization, resulting in accurate absolute pKa values in addition to insights into the underlying mechanism. Importantly, our approach minimizes empiricism by mapping electronic structure data (QM/MM forces) into a reactive molecular dynamics model capable of extensive sampling. Consequently, the bulk property of interest (the absolute pKa) is the natural consequence of the model, not a parameter used to fit it. This approach is applied to create reactive models of aspartic and glutamic acids. We show that these models predict the correct pKa values and provide ample statistics to probe the molecular mechanism of dissociation. This analysis shows changes in the solvation structure and Zundel-dominated transitions between the protonated acid, contact ion pair, and bulk solvated excess proton.
Accurately calculating a weak acid's pKa from simulations remains a challenging task. We report a multiscale theoretical approach to calculate the free energy profile for acid ionization, resulting in accurate absolute pKa values in addition to insights into the underlying mechanism. Importantly, our approach minimizes empiricism by mapping electronic structure data (QM/MM forces) into a reactive molecular dynamics model capable of extensive sampling. Consequently, the bulk property of interest (the absolute pKa) is the natural consequence of the model, not a parameter used to fit it. This approach is applied to create reactive models of aspartic and glutamic acids. We show that these models predict the correct pKa values and provide ample statistics to probe the molecular mechanism of dissociation. This analysis shows changes in the solvation structure and Zundel-dominated transitions between the protonated acid, contact ion pair, and bulk solvated excess proton.
Weak acid ionization
plays a central role in many chemical, biological,
and industrial processes. Measuring and predicting the free energy
of this process (i.e., pKa values) have
become important experimental and theoretical tools, enabling control
over and insight into acid/base chemistry and pH-dependent biomolecular
transformations. Although a large number of theoretical methods have
been developed to predict pKa values (see
recent reviews (1) and (2) and references therein),
less attention has been given to understanding the molecular-scale
mechanism of acid dissociation. This is largely due to the complexity
and cost of analyzing the full free energy surface, which involves
not only a chemical reaction but also significant solvent and solute
reorganization. Instead, most pKa methods
rely on the use of thermodynamic cycles that break the complex process
of deprotonation into computationally tractable steps. For small molecules,
this cycle typically combines the gas phase dissociation energy, calculated
with high-level quantum mechanical (QM) calculations, with de/solvation
free energies of the reactant and product species, calculated with
continuum solvent approaches.[2−6] While the gas phase energies are quite accurate, the simple continuum
description of solvation free energies often requires parameter adjustment
to reproduce all of the entropic and enthalpic complexities that are
inherent in molecular solvation. They have been improved by hybrid
approaches that include some number of explicit water molecules in
the solvation free energy analysis.[7,8] In conjunction
with methodological refinements, these QM-continuum solvent models
have enabled pKa calculations of small
acids to within a few units of the experimentally determined values.[4−6,8]In contrast, the suite of
methods that have been developed to predict
biological pKa values have typically used
a thermodynamic cycle in which the solute remains in a solvated environment.
These approaches range from microscopic to continuum to completely
empirical. Each has advantages and drawbacks as well, discussed in
ref (2), however none
has yet demonstrated the ability to predict pKa values within a few pKa units
of the experimental value for all test cases. It has been suggested
that challenging moieties (e.g., buried residues that have large pKa shifts or those with strong coupling to conformational
changes or other titration sites) will require methods that better
capture the underlying physics.[1] These
conclusions highlight how interesting and nuanced a biomolecular system
can be. While it is clear that protonation states influence everything
from stability to binding and reactivity, the details of their influence
is still sometimes poorly understood.Few methods to date have
simulated the actual process of interest,
the exchange of an excess positive charge from an acid to bulk water.
To do so involves calculating the free energy surface for proton dissociation
in the bulk phase environment, from which one learns about the molecular
mechanism and rate. This requires three things: (1) a method that
describes the underlying physics of bond formation and cleavage in
the bulk phase environment, (2) a method that is efficient enough
that when combined with enhanced sampling approaches can capture the
distribution of phase space that defines a free energy in the condensed
phase (where entropy and energy are often a similar magnitude), and
(3) a reaction coordinate that is flexible enough to track the transferring
excess charge defect.The first of these requirements is most
obviously filled by QM
approaches, which have been used in a handful of helpful studies to
probe acid dissociation. Ivanov et al. have used Carr–Parrinello
molecular dynamics (CPMD) to calculate the free energy profile of
histidine dissociation.[9,10] Although the chosen reaction
coordinate limited their analysis to stopping just beyond the transition
state,[11] they were able to calculate the
correct relative pKa value by subtracting
the equivalent profile for water autodissociation.[9] Similarly, Park et al. used CPMD in combination with metadynamics
to sample the configurational space of acetic acid in bulk water.[12] While their trajectories were too short to sufficiently
converge the free energy surface, they were able to observe and characterize
multiple protonation and deprotonation events. These reactions occurred
along a well-characterized pathway where the excess proton is briefly
shared between the acid and solvating water. They also noted the existence
of a shallow minimum in the free energy surface corresponding to the
existence of a contact ion pair (CIP) that is separated from the protonated
state by a small energy barrier. In an alternative approach that still
describes the bond dissociation at the ab initio level, Uddin et al.
recently used QM/MM to calculate the free energy surfaces for several
small molecules.[13] Although they obtained
quite accurate pKa values, their method
cannot describe the mechanism of dissociation since they include only
one water molecule and the acid in the QM region. Hence, this method
changes the physical process from the dissociation of a delocalized
charge defect to proton transfer followed by separation of a localized
hydronium cation and acid anion in a solvating MM environment. Just
as for thermodynamic cycles, their results show that sampling an alternative
pathway between reactant and product states can yield accurate pKa values for small molecules.Another
approach that describes the bond dissociation process is
reactive MD (RMD).[14−17] The efficiency of these models allows them to sample reactive dynamics
on much longer length and time scales than ab initio MD. Since they
are parametrized to work with MM force fields, they also avoid the
boundary issues suffered by QM/MM calculations. Both of these factors
make RMD models ideal for simulating reactions in large, complex systems
with slow dynamics. Proteins and proton exchange membranes are two
important examples of such systems.[18−25] Without sufficient sampling in these types of systems, the final
result can be strongly influenced by the initial conditions. A common
criticism of RMD models is that they are only approximations to the
true quantum mechanical behavior of the system. While this is certainly
true, if the functional form is both flexible enough and parametrized
correctly, then the reactive model will retain the ability to accurately
reproduce the potential and free energy surfaces from which it was
parametrized.[26,27]The multistate empirical
valence bond (MS-EVB) method is a RMD
force field that has been used to characterize proton transport in
many condensed phase and biological environments.[11,28] It is especially well suited to the challenge of acid ionization
since it inherently defines a center of excess charge that can be
used to track reactions involving proton transport, thereby avoiding
issues with geometric reaction coordinates.[9,11] Although
MS-EVB has been used to study weak acids[29,30] and amino acid dissociation,[19,31] these efforts involved
a larger degree of empiricism since the pKa value was needed to fit the MSEVB potential. It has been a long
sought goal to find a procedure that would minimize the empiricism
in RMD models, such that the bulk property of interest was derived
entirely from quantum data in combination with extensive sampling.This work reports the achievement of that goal, a multiscale approach
to calculate the free energy profile for acid ionization that is based
entirely on QM data and yields an accurate absolute pKa value. An iterative procedure, in the spirit of adaptive
force matching,[32,33] is used to sample the reactive
phase space, obtain reference ab initio data (QM/MM forces), and fit
a RMD potential. Care has been taken to modify the functional form
such that it is sufficiently flexible to reproduce the QM/MM reference
data and, in principle, capture the essential physics. Our approach
is applied to aspartic acid (Asp) and to glutamic acid (Glu). Both
of these amino acids play crucial roles in proton and ion transport
in biological channels and pumps and are also commonly involved in
salt bridges that influence protein dynamics, structure, and binding.
We show that accurate estimates of the pKa values of both Asp and Glu are obtained from the resulting reactive
models, which is the first time a reactive model has been able to
accurately predict the correct pKa without
being explicitly parametrized to do so. Finally, we discuss the mechanistic
description our models provide of acid deprotonation in water.To emphasize that these models were fit entirely to QM/MM data
and without adjustments to match the experimental pKa, we will hereafter refer to them as multiscale reactive
molecular dynamics (MS-RMD) models.[26] It
should be noted, however, that the MS-RMD Asp and Glu models are parametrized
to work with a refined version of the MS-EVB3 hydronium model that
accurately describes a proton solvation and transport in water.[28]
Methodology
MS-RMD Description
MS-RMD is similar to the MS-EVB
framework that has been successfully used to model proton solvation
and transport in many aqueous systems.[11,15,28,31,34,35] The power in this approach lies
in its ability to describe electronic delocalization as a linear combination
of topologically distinct states. For a given configuration, states
|i⟩ and |j⟩ differ
only in their bonding topology. Each diabatic state describes a localized
protonated species. For example, state |i⟩
could be a protonated amino acid in a box of water (Figure 1a). State |j⟩ would have
the same coordinates but have a topology describing a CIP between
a solvated hydronium ion and a deprotonated amino acid (Figure 1b). At each step in the simulation, a state search
algorithm first determines all the possible states based on geometric
criteria and then calculates the energy of each state. The lowest
energy state is termed the pivot state and is the start of the state
search algorithm at the next step. Since the number of waters, and
therefore the number of possible states, grows exponentially with
distance from the pivot state, only states in the first three solvation
shells are included in the Hamiltonian. This has been shown to provide
sufficient accuracy and energy conservation.
Figure 1
Several different bonding
topologies are shown for a given set
of nuclear coordinates, r. Dashed lines surround the
protonated molecule, as defined by the bonding topology. (a) A protonated
Asp solvated by water. (b–d) Different possible locations for
the hydronium molecule.
Several different bonding
topologies are shown for a given set
of nuclear coordinates, r. Dashed lines surround the
protonated molecule, as defined by the bonding topology. (a) A protonated
Asp solvated by water. (b–d) Different possible locations for
the hydronium molecule.The reactive system is described by the following Hamiltonian:where r is the vector describing
the nuclear coordinates, h is the sum of MM potential terms for state i, and h is the coupling
between states |i⟩ and |j⟩. The details of these terms will be provided in the Force Field section. With the Hamiltonian defined
this way, the relative weight of each state is described by the components
of the eigenvector, c:Finally, forces are calculated using
the Hellmann–Feynman
theorem. These forces are then passed to a standard MD integrator
such as velocity Verlet.Free energy calculations require a
continuously varying reaction
coordinate (RC) to describe the process of interest. The EVB formalism
provides a convenient reference for the RC; the center of the excess
charge (CEC):In this equation, rCOC is the geometric center-of-charge
for the protonated molecule in state i, and c is the relative weight of
that state. We then use the distance between the CEC and the center
of mass of the Asp or Glu carboxyl group to describe the progress
of the deprotonation reaction.
Force Field
The
diagonal elements of the MS-RMD Hamiltonian
matrix are derived from the standard CHARMM MM force field.[36] The protonated form uses all of the standard
parameters with the exception of the acidic O–H bond. This
bond is replaced with a Morse potential to allow dissociation:where r is the bond length
and D, α, and r0 are parameters fit to reproduce the bond length and frequency of
the harmonic potential it replaces while still allowing for a reasonable
bond dissociation energy. The deprotonated form of amino acid uses
all of the standard CHARMM bonded and nonbonded terms and parameters.
Since the CHARMM force field was not originally intended for use in
reactive simulations, additional repulsive terms are used between
the carboxyl oxygens of the deprotonated model and the hydroniumoxygen
and hydrogens. These terms help to correct for the overattraction
at close distances due to the use of point charges.[37] The functional forms for the repulsions are the same as
those used in the MS-EVB3 hydronium model:[28]where B, b, b′, C, and c are fitted parameters. Following
the logic in previous
work, and to avoid parameter redundancy, dOO0 and dOH0 were
fixed the same values used in MS-EVB3. The sum of Gaussians term in
eq 5 is included to help the fitted potential
to reproduce the correct asymmetric solvation structure around the
hydronium, and the same switching function is used as given in eq
9 of ref (28).The Asp and Glu models are designed to work with a refined version
of the MS-EVB3 force field, herein referred to as MSEVB 3.1. This
refinement uses the same functional forms for the Hamiltonian but
revised parameters. Using a genetic algorithm, the fitting parameters
were optimized using the original ab initio data sets as well as potential
energy surface scans for the Eigen ion at the same level of the original
ab initio calculations. Furthermore, the oxygen partial charge of
the hydronium is an adjustable parameter (the charge for the hydrogens
is automatically determined to maintain the hydronium +1 total charge).
MS-EVB 3.1 better describes contact ion pairs. Parameters for this
force field are presented in the Supporting Information.In this study, the limiting species are the de/protonated
Asp or
Glu models and the MS-EVB 3.1 solvated proton in SPC-FW water. The MS-RMD force field is parametrized to smoothly switch
from one protonated species to another. The asymmetry of the reaction
necessitates several modifications to the original MS-EVB equations,
which were designed with water in mind. The off diagonal coupling
term, h, is designed
to reproduce the correct transition state geometry. Previous MS-EVB
models used a complex off-diagonal term to describe a symmetric reaction
pathway between two waters. However, the asymmetry of the Asp-water
system was better fit by a simple Gaussian potential:where rOH is the
distance between the deprotonated Aspoxygen and the excess proton.
The other parameters determine the shape and size of the Gaussian
and are fit to the QM/MM data. In practice, protonation reactions
occurring between different species require a constant energy term
() to be added to
the deprotonated state. This term accounts for the difference in energy
resulting from different MM force fields. Without it, the protonated
and deprotonated Asp potential energy surfaces would be offset by
hundreds of kilocalories per mole, and no reaction would ever take
place. This difference is due primarily to electrostatics and can
be estimated by subtracting the coulomb energy of the most favorable
hydronium state (containing the favorable electrostatic interactions
between the hydronium cation and amino acid anion) from the coulomb
energy of the protonated (and neutral) amino acid state. Figure 2 shows a plot of the average value of this term
as a function of RC. Since controls the relative stability of the protonated and deprotonated
states, it was used in past work to tune the behavior of the model
to reproduce experimental quantities such as pKa.[31] In the current work, an iterative
fitting approach is used to rigorously determine the value of along with all other MS-RMD
parameters.
Figure 2
Plot of the difference in coulomb energy between the protonated
Asp state and the lowest energy non-Asp state. At short distances,
this quantity is a good initial estimate of .
Plot of the difference in coulomb energy between the protonated
Asp state and the lowest energy non-Asp state. At short distances,
this quantity is a good initial estimate of .
Fitting Procedure
Our fitting procedure is described
graphically in Figure 3. It is similar in some
ways to that used in previous work in which MS-RMD models were force
matched from AIMD data.[26,38] There are, however,
a few important differences, including the use of QM/MM instead of
AIMD, the use of empirical functional forms instead of tabulated potentials,
and the use of an iterative scheme for sampling configuration space
starting from a guessed Hamiltonian. These choices were partially
motivated by the success that has been demonstrated with the adaptive
force matching method by Wang and co-workers.[32,39] The general fitting procedure is conceptually simple. First, umbrella
sampling simulations are run with a “best guess” reactive
Hamiltonian in order to generate a set of configurations along the
reactive pathway. Next, high-level QM/MM calculations are performed
to collect the atomic forces for those configurations. The new reactive
Hamiltonian is then used to generate new configurations via umbrella
sampling, and the process is repeated until the parameters reach the
desired convergence.
Figure 3
Schematic representation of the iterative fitting procedure
used
in this paper. Details of each step are discussed in the text.
Schematic representation of the iterative fitting procedure
used
in this paper. Details of each step are discussed in the text.Due to the nonlinearity of several
of the reactive force field
parameters, the reactive model is fit to the QM/MM reference data
using a genetic algorithm[40] to minimize
the residual:Here, NC and NA are the number of configurations and the number
of atoms in each configuration, respectively. w(r) is the weight for each atom
that, unless specified otherwise, is set to 1. F is the atomic force from the current MS-RMD
parameter set and FQM/MM is the reference force from
the QM/MM calculation.The fitting calculations were run in
parallel and used 4000 genomes
(parameter sets) per generation. To lessen the danger of overfitting,
a two-step minimization scheme was used. The first step followed the
uniformly distributed mutation scheme outlined in section 3.1 of ref (40). The range of parameters
was chosen to be as broad as possible without allowing unreasonable
values. For example, c3 in the off-diagonal
term would be unreasonable if it were greater than the AspO–OW distance. Once a converged parameter set was reached with
the uniform distribution, the mutation scheme was switched to the
normally distributed scheme described in section 3.2 of ref (40). This scheme is very good
at quickly finding a local minimum and was therefore used to find
the best genome in the vicinity of the result from the first step.
Since the second scheme is easily trapped in local minima, it was
not ideal for use on its own. The combination described here was found
to quickly and reliably converge on the optimum solution. The discrete
recombination scheme described in section 2.2 of ref (40) is used with both mutation
schemes.The nature of the reactive model introduces additional
complexity
into this otherwise simple scheme. , which directly controls the depth of the protonated well
relative to the deprotonated CIP, is a constant that is only included
for states in which the amino acid is protonated. Hence, its influence
is negligible beyond the transition state. The off-diagonal term acts
over a broader range spanning the protonated amino acid and CIP configurations
but also has a negligible effect once the CIP dissociates. In contrast,
the repulsive terms, which are only applied to the deprotonated amino acid, are significant at longer distances and negligible at
distances below the transition state. Furthermore, since both and the repulsive terms
are added to the diagonal matrix elements, they tend to have correlated
behavior if fit simultaneously. Thus, the value of V for the first iteration after generating new QM/MM data
is chosen to be the average value described in Figure 2. This value has been found to be within a few kilocalories
per mole of the final value of V and
is therefore a good initial approximation. Holding the value of V fixed, the off-diagonal and repulsive terms
can be fit to the full set of QM/MM reference forces. Once the off-diagonal
and repulsive terms reach their optimum value for the current value
of V, umbrella sampling is used to generate
a PMF with the model. Since V’s
effect is most important at and around the transition state, a range
of RC values immediately surrounding the transition state is chosen
from the PMF, and configurations from that range are then used to
fit a new value of V. Using this new
value of V, the off-diagonal and repulsive
terms are refit while V is again held
constant. This procedure fits an empirical model from solely QM/MM
data and results in models that correctly predict the bulk phase pKa values for both Asp and Glu.
Computational
Details
Solution simulations of Asp and
Glu include one solute molecule solvated by 486 and 467 waters, respectively.
To reduce interactions with the charged backbone protonatable sites,
the N-termini were capped with an acetyl group and the C-termini were
capped with methylamine. The systems were equilibrated in the NPT
ensemble at 1 atm and 310 K for 200 ps before changing to the NVT
ensemble with a cubic box of 24.55486 Å on a side for Asp and
24.24938 Å for Glu. All reactive simulations were run with a
modified version of the LAMMPS simulation package.[41,42] The RC was chosen to be the distance between the CEC and the center-of-mass
of the amino acid’s side chain carboxylic acid. Umbrella sampling
was used to evenly sample the RC. For the Asp system, 25 windows were
spaced every 0.36 Å starting at a RC value of 1.0 Å. Umbrella
restraints were chosen to be 20.0 kcal/mol. The Glu system used 36
windows spaced every 0.25 Å with 20.0 kcal/mol restraints. Each
production window was further equilibrated for 100 ps, and then statistics
were collected for 1.7 ns. The weighted histogram analysis method
was used to integrate all PMFs.[43,44] The pKa is calculated from the resulting PMF using the following
equation:where w(r) is the free energy from the PMF and β
is the product of the
simulation temperature and the Boltzmann constant. The integral is
calculated from zero to the transition state, as denoted by †. C0 is the standard state concentration whose
value is 1660 Å3/molecule and results from the entropic
freedom that is gained by the proton when it dissociates from the
acid.[45,46]Atomistic configurations for the QM/MM
calculations were selected from the umbrella sampling trajectories.
In order to ensure a uniform distribution along the RC, individual
configurations were sorted by their RC value into 60 0.067 Å
wide bins. Five configurations were then selected at random from each
bin, giving a total of 300 configurations for each iteration. QM/MM
was performed in CP2K[47] using B3LYP[48,49] with the double-zeta basis set. Unrestricted DFT was used to allow
the most accurate calculation of forces near the dissociation barrier.
B3LYP was chosen over MP2 because recent studies have shown comparable
accuracy with a reduced computational cost for water and the solvated
excess proton.[39,50,51] Calculations were run on 256 Intel Sandy Bridge cores and on average
completed within 10 min. The QM region included the entire amino acid
residue and all of the waters found by the MS-RMD state search. Because
proximity to classical point charges reduces the accuracy of forces
calculated on QM atoms, an additional 3 Å buffer layer of QM
water was added around the QM atoms. All together, this results in
a QM region containing an average of 208 atoms, or one Asp/Glu and
61 waters. In total, four iterations were needed for Asp parameters
to converge, while Glu parameters converged in three iterations.
Results and Discussion
MS-RMD Model Parameters
Parameters
for the new Asp
and Glu models are presented in Table 1. These
parameters were fit to QM/MM reference forces using the previously
described iterative scheme to ensure convergence. It must be noted
that the Asp and Glu models were parametrized using the same method,
but entirely independent data sets. The fact that the final parameters
for each model are so similar is not surprising, however, given the
molecular similarity and small difference in experimental pKa values. As the next section discusses, the
pKa predicted by each model is a very
good estimate of the experimental value. The fact that the models
are able to capture such subtle differences is evidence that the methodology
works well.
Table 1
Final Parameters for the Asp and Glu
Models
Asp
Glu
Asp
Glu
B
0.579 671
1.013 770
Vii
–150.505 417
–150.291 915
b
1.506 881
1.418 695
c1
–25.099 920
–25.013 330
b′
0.000 108
1.084 593
c2
2.799 262
3.018 957
dOO0
2.4
2.4
c3
1.299 994
1.280 649
C
0.579 841
0.987 714
D
143.003
143.003
c
0.931 593
1.146 188
α
1.8
1.8
dOH0
1.0
1.0
r0
0.975
0.975
rs
3.5
3.5
rs
4.0
4.0
Deprotonation
PMF and pKa
The potential of
mean force (PMF) of deprotonation describes the
free energy of the process as a function of the RC. To obtain a free
energy change between any two points on the PMF, one simply integrates
the PMF. Figure 4a shows the PMF for deprotonation
of Asp, and Figure 4b shows the PMF for Glu
(black lines). As depicted in Figure 5, the
RC gives the distance between the center-of-mass of the carboxylic
moiety and the CEC. RC values around 1.3 Å correspond to the
protonated amino acid, while values between 2.1 and 5 Å correspond
to the stable CIP. Values above 5.5 Å represent an excess proton
separated from the anionic acid by bulk solvent.
Figure 4
PMFs and cmax2 distributions
from the final (a) Asp and (b) Glu models. The PMF describes the free
energy as a function of the RC. The pKa is calculated from the PMF. The colormap in the background gives
the cmax2 probability distribution
normalized for each RC value. This distribution describes the extent
of charge delocalization and, by extension, the solvation structure.
Figure 5
Snapshots from umbrella sampling trajectories
illustrating the
types of structures suggested by the cmax2 distributions and the 2D-RDFs. Arrows indicate the region
of the PMF where the windows were run, while the distances in the
box are the center of the harmonic restraint. Atoms are colored according
to their type. The position of the CEC is shown as a yellow sphere.
In Eigen structures, the CEC is nearly aligned with the central water’s
oxygen. In Zundel structures, the CEC resides close to the midpoint
between neighboring oxygens.
PMFs and cmax2 distributions
from the final (a) Asp and (b) Glu models. The PMF describes the free
energy as a function of the RC. The pKa is calculated from the PMF. The colormap in the background gives
the cmax2 probability distribution
normalized for each RC value. This distribution describes the extent
of charge delocalization and, by extension, the solvation structure.Snapshots from umbrella sampling trajectories
illustrating the
types of structures suggested by the cmax2 distributions and the 2D-RDFs. Arrows indicate the region
of the PMF where the windows were run, while the distances in the
box are the center of the harmonic restraint. Atoms are colored according
to their type. The position of the CEC is shown as a yellow sphere.
In Eigen structures, the CEC is nearly aligned with the central water’s
oxygen. In Zundel structures, the CEC resides close to the midpoint
between neighboring oxygens.The protonated forms of Asp and Glu are much more stable
than the
deprotonated forms, which is consistent with their status as weak
acids. Evaluating the integral in eq 9 estimates
the pKa values for Asp and Glu to be 3.82
± 0.17 and 4.56 ± 0.18, respectively. These are both in
very good agreement with the experimental ranges of 3.71 to 3.90 for
Asp and 4.15 to 4.31 for Glu.[52,53]The CIP is a
significant feature of weakly acidic systems. It is
stabilized by the electrostatic attraction between oppositely charged
ions. The barrier that prevents protonation is due to the cost of
forming a Zundel-like arrangement between the acid and hydronium.
The depth and extent of the CIP varies with the system; however its
existence has been predicted in a variety of weak acid systems simulated
with different methods.[12,31,54,55] When preparing a simulation to
calculate a PMF of deprotonation and a pKa, care must be taken to ensure that the RC is sampled far enough
beyond the CIP to ensure bulk-like properties. If this is not done,
then the PMF from which the pKa is calculated
will be shifted and the resulting “pKa” will neither be accurate nor reflect the quality
of the model. Fortunately, sampling well beyond the CIP is not a problem
with MS-RMD models due to their efficiency. However, care needs to
be taken when running QM/MM simulations to ensure that a large enough
QM box and enough solvating waters are used to not only sample beyond
the CIP but also to avoid boundary issues.[13]
Solvation Structure
With the PMF in hand, one can additionally
probe the molecular mechanism governing weak acid dissociation. The
MS-RMD methodology conveniently provides a measure of the structure
of the solvated complex. It has been shown that the square of the
largest component of the ground state eigenvector (cmax2, from c in eq 3) describes the solvated structure.[28] A value of 1 indicates the limiting case of an entirely
localized excess charge, while values less than that quantify the
degree of charge delocalization. Eigen structures, where the excess
charge is shared among a central hydronium and three H-bound water
molecules, have a cmax2 value
around 0.65. Zundel structures, which share the excess charge more
equally between two water molecules, have a cmax2 value closer to 0.5. It should be emphasized
that there is always additional delocalization to other surrounding
water molecules, even for these limiting Eigen- and Zundel-like cations.
By plotting the cmax2 probability
density as a function of the RC, we gain insight into the structural
changes that accompany deprotonation. This probability density is
plotted as a colormap in Figure 4. At low RC
values, both Asp and Glu have an average cmax2 greater than 0.9, which indicates that the structure
and dynamics mostly resemble the limiting case of a protonated acid
in bulk water with little charge transfer to the surrounding water
molecules. This is notably different than the protonated water molecule,
which always involves a significant amount of charge delocalization.[56] As the proton begins to dissociate, the cmax2 decreases as the proton goes
through a clear Zundel-like transition between the acid and solvating
water. Beyond the transition state, the system begins to resemble
an Eigen cation where one of the solvating waters is replaced with
the acid anion. However, the Eigen structure seen in the CIP is distinctly
different from that seen in bulk water, as evidenced by a cmax2 of ∼0.73. The close proximity
of the anionic acid helps to stabilize the hydronium, leading to a
more localized charge. Further evidence of the uniqueness of the CIP
is that there is a well-defined Zundel transition around 4 Å
linking the CIP to the bulk-like structure at longer distances. While
proton transfers always occur via Eigen–Zundel–Eigen
transitions in bulk, the water around the anionic amino acid is more
structured, as evident in the distinct Zundel-like transition at a
RC value around 4 Å. Figure 5 shows snapshots
of the typical structures found at different RC values and characteristic
of significant regions in the PMF. A yellow sphere indicates the location
of the CEC. Its position relative to the atoms reinforces the analysis
presented herein. The bonding topology reflects O–H distances
less than or equal to 1.6 Å, which is commonly considered to
be the length of a hydrogen bond.It is also interesting to
look at the structural changes that occur directly around the CEC.
To do so, the two-dimensional radial distribution functions (2D-RDFs)
between the CEC and surrounding wateroxygens (OW) and
waterhydrogens (HW) are shown in Figure 6. These results corroborate the cmax2 analysis. A slice along the first dimension of the 2D-RDFs
gives a traditional 1D-RDF, which describes the probability of finding
the indicated pair a given distance apart. Scanning the second dimension
shows how the solvation structure changes as the reaction progresses
according to the RC. In combination with the cmax2 analysis, the 2D-RDFs show that the solvent
has well-defined rearrangements as the proton dissociates from Asp
and travels to bulk. Figure 6a shows the 2D-RDF
for Asp’s CEC–OW distance. Once Asp is deprotonated,
the first peak is near 2.5 Å, which is characteristic of the
Eigen structure. When the RC reaches 4 Å, the peak distance drops
to around 1.5 Å, which is characteristic of a slightly asymmetric
Zundel cation where the CEC resides close to the midpoint of the O–O
vector. The 2D-RDF showing Asp’s CEC–HW distribution
shows similar behavior and is displayed in Figure 6b. Figures 4b and 7 show that Glu behaves similarly to Asp.
Figure 6
2D-RDFs for (a) CEC–OW and (b) CEC–HW with Asp. These plots show
how the water structure changes
around the CEC as deprotonation occurs.
Figure 7
2D-RDFs for (a) CEC–OW and (b) CEC–HW with Glu.
2D-RDFs for (a) CEC–OW and (b) CEC–HW with Asp. These plots show
how the water structure changes
around the CEC as deprotonation occurs.2D-RDFs for (a) CEC–OW and (b) CEC–HW with Glu.
Conclusion
We
have presented a procedure for fitting the parameters of a MS-RMD
model to reproduce the forces from QM/MM calculations. Due to the
efficiency of MS-RMD, we are able to collect and analyze data from
large time and length scales that can reveal intricate behavior and
is essential for converged bulk phase thermodynamic properties. This
new methodology was used to parametrize new models of both Asp and
Glu that correctly predict the correct experimental pKa values. This is the first time that a MS-RMD model has
been able to reproduce the experimental pKa values without any fitting to experimental data. Multi-nanosecond
trajectories were analyzed to extract important information about
the mechanism underlying the deprotonation process. It is shown that
the deprotonation process follows a well-defined series of Zundel
to Eigen transitions as the excess proton travels from Asp to bulk
solution.This work is considered continued progress toward
the goal of having
flexible and systematic algorithms to reliably make a multiscale connection
between accurate electronic structure calculations and efficient empirical
models to describe complex reactive processes. As more accurate ab
initio methodologies (e.g., those explicitly accounting for electronic
correlation) become computationally tractable for QM/MM of condensed
phase systems, the procedure presented herein and related force matching
algorithms will be invaluable tools for mapping high level data into
efficient RMD models that can significantly extend the time and length
scales of reactive simulations to explore complex phenomena.While the results presented herein are important, the true test
of this procedure will be to apply it in biological systems where
the pKa of individual residues can be
shifted by several pKa units. Proton transport
in proteins is very sensitive to the underlying protonatable residue
models, and it is therefore crucial to have a procedure to ensure
that those models are physically accurate. Furthermore, biological
systems are orders of magnitude larger and have much slower dynamics
than the systems presented here. While it is conceivable that QM/MM
MD simulations could reach the same sort of time scales as the solution
simulations presented here, they would be prohibitively expensive
to run the length-and time-scales needed to account for the slow dynamics
commonly found in biological processes. The MS-RMD methodology has
already been shown to scale well to hundreds of thousands of atoms
and can realistically reach tens of nanosecond simulations with large
explicit biomembrane systems. Work is already underway to implement
these reactive models in protein environments and to use the presented
methodology to fine-tune these and other individual amino acid models
to behave properly in their specific protein environments.
Authors: Emil Alexov; Ernest L Mehler; Nathan Baker; António M Baptista; Yong Huang; Francesca Milletti; Jens Erik Nielsen; Damien Farrell; Tommy Carstensen; Mats H M Olsson; Jana K Shen; Jim Warwicker; Sarah Williams; J Michael Word Journal: Proteins Date: 2011-10-15
Authors: Ruibin Liang; Jessica M J Swanson; Jesper J Madsen; Mei Hong; William F DeGrado; Gregory A Voth Journal: Proc Natl Acad Sci U S A Date: 2016-10-24 Impact factor: 11.205
Authors: Laura C Watkins; Ruibin Liang; Jessica M J Swanson; William F DeGrado; Gregory A Voth Journal: J Am Chem Soc Date: 2019-07-12 Impact factor: 15.419