Robert B Best1, Wenwei Zheng1, Jeetain Mittal2. 1. Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health , Bethesda, Maryland 20892, United States. 2. Department of Chemical and Biomolecular Engineering, Lehigh University , Bethlehem, Pennsylvania 18015, United States.
Abstract
Some frequently encountered deficiencies in all-atom molecular simulations, such as nonspecific protein-protein interactions being too strong, and unfolded or disordered states being too collapsed, suggest that proteins are insufficiently well solvated in simulations using current state-of-the-art force fields. To address these issues, we make the simplest possible change, by modifying the short-range protein-water pair interactions, and leaving all the water-water and protein-protein parameters unchanged. We find that a modest strengthening of protein-water interactions is sufficient to recover the correct dimensions of intrinsically disordered or unfolded proteins, as determined by direct comparison with small-angle X-ray scattering (SAXS) and Förster resonance energy transfer (FRET) data. The modification also results in more realistic protein-protein affinities, and average solvation free energies of model compounds which are more consistent with experiment. Most importantly, we show that this scaling is small enough not to affect adversely the stability of the folded state, with only a modest effect on the stability of model peptides forming α-helix and β-sheet structures. The proposed adjustment opens the way to more accurate atomistic simulations of proteins, particularly for intrinsically disordered proteins, protein-protein association, and crowded cellular environments.
Some frequently encountered deficiencies in all-atom molecular simulations, such as nonspecific protein-protein interactions being too strong, and unfolded or disordered states being too collapsed, suggest that proteins are insufficiently well solvated in simulations using current state-of-the-art force fields. To address these issues, we make the simplest possible change, by modifying the short-range protein-water pair interactions, and leaving all the water-water and protein-protein parameters unchanged. We find that a modest strengthening of protein-water interactions is sufficient to recover the correct dimensions of intrinsically disordered or unfolded proteins, as determined by direct comparison with small-angle X-ray scattering (SAXS) and Förster resonance energy transfer (FRET) data. The modification also results in more realistic protein-protein affinities, and average solvation free energies of model compounds which are more consistent with experiment. Most importantly, we show that this scaling is small enough not to affect adversely the stability of the folded state, with only a modest effect on the stability of model peptides forming α-helix and β-sheet structures. The proposed adjustment opens the way to more accurate atomistic simulations of proteins, particularly for intrinsically disordered proteins, protein-protein association, and crowded cellular environments.
Atomistic models with
explicit solvent ought to provide the ultimate
accuracy for simulations of biomolecular folding, association, and
function.[1−3] Current energy functions have now been refined over
many years to the point where they may often be used in a predictive
manner. Nonetheless, progressively increasing computational power
is exposing their remaining deficiencies. For example, recent work
has shown that small errors in backbone[4−9] and side-chain[9,10] torsion parameters may accumulate
to have a detrimental effect on protein folding.Clearly, one
of the major shortcomings of current additive force
fields is the quality of the water model used, since current force
fields are generally developed in conjunction with an earlier generation
of water models, most commonly TIP3P.[11] However, more accurate additive water models have subsequently been
developed, carefully parametrized against available experimental and
quantum chemical data, the best examples being the four-site TIP4P-Ew[12] and TIP4P/2005[13] models,
and more recently the TIP3P-FB and TIP4P-FB models.[14] For example, the TIP4P/2005 water model can capture many
features of liquid water, including the temperature of maximum density.
Having a water model which is accurate for pure water may not appear
to offer a direct improvement in the properties of biomolecules. However,
the hydrophobic effect is strongly determined by the properties of
the solvent.[15−17] Since this effect is widely recognized to be of central
importance in describing biomolecular association and folding, an
improved water model should, in principle, enhance overall force field
accuracy. In addition, water-mediated interactions are believed to
be critical for protein–protein recognition.[18,19] At the very least, biomolecular force fields based on these improved
water models, should allow exploration of protein behavior over a
larger range of thermodynamic parameters, such as temperature and
pressure. Several groups have taken a step in this direction, by making
minimal changes to protein force fields in order to combine them with
optimized water models.[20,21] For example, the Amber
ff03w protein model, used with TIP4P/2005 water,[13] improves the cooperativity of helix formation, and results
in more expanded unfolded states,[20] when
compared with its parent Amber ff03* protein model,[6] used with TIP3P. This can be beneficial when seeking to
model the properties of intrinsically disordered proteins.[22−24]Nonetheless, there remain a number of outstanding discrepancies
with current force fields (including Amber ff03w), which consistently
suggest that proteins are too poorly solvated. The best examples are
(i) the dimensions of unfolded proteins in explicit solvent simulations
are still too small, relative to experiment,[25] (ii) solvation free energies of amino acid side-chain analogs are
generally slightly too unfavorable,[26] and
(iii) protein association is in general too favorable.[27] The poor solvation could arise from either indirect
effects due to an inadequate representation of solvent structure (affecting
the free energy cost of cavity formation in the solvent) or direct
effects due to protein–solvent interactions. Since we know
that recent water models accurately reproduce the structure of liquid
water,[12,13] the indirect mechanism is unlikely to play
a role; indeed, both the TIP4P-Ew and TIP4P/2005 water models capture
quite well the temperature-dependence of the solubility of hydrophobic
solutes.[28,29] However, Ashbaugh et al. showed that although
TIP4P/2005 best captures the temperature-dependence of the solvation
free energy for methane out of a range of water models, there was
still a systematic shift of the solvation free energies from the experimental
values.[29] This shift could be corrected
by changing the Lennard-Jones parameters of the methane, effectively
modifying the contribution of direct solute-water interactions to
the solvation free energy. Inspired by this result, we have sought
a similar adjustment of the protein–water interaction parameters
in protein force fields.In this Article, we show that a specific
modification of protein–water
Lennard-Jones parameters in Amber ff03w can address the shortcomings
described above, while not having detrimental effects on folded proteins,
or the stability of small, autonomously folding peptides. The resulting
parameters result in dimensions of unfolded proteins consistent with
FRET and SAXS measurements, realistic nonspecific protein-protein
association, solvation free energies in better agreement with experiment,
and stabilities of folded peptides consistent with experiment at room
temperature. The resulting protein force field, Amber ff03ws, should
facilitate accurate simulations of unfolded and disordered proteins,
protein association, and protein folding.
Methods
Molecular Simulation
Methods
Molecular dynamics was
propagated via the Langevin dynamics integrator in GROMACS (version
4.5.5 or 4.6.5),[30] using a time step of
2 fs, and a friction coefficient of 1 ps–1. Langevin
dynamics was used because it is a very effective thermostat and correctly
samples the canonical ensemble. The friction used here will have only
a small effect on the dynamics,[31] and no
effect on most of the observables we are concerned with, because these
are almost all equilibrium configurational averages. Lennard-Jones
pair interactions were cut off at 1.4 nm, electrostatic energies were
computed via particle-mesh Ewald[32] with
a grid spacing of ∼0.1 nm and a real-space cutoff of 0.9 nm.
The force field in all cases was a derivative of Amber ff03:[33] either Amber ff03*[6] in conjunction with the TIP3P water model[11] or Amber ff03w[20] in conjunction with
the TIP4P/2005 water model.[13] The force
field for the chromophores Alexa 488 and Alexa 594 will be described
in a future publication, and has been extensively validated against
experimental data.In certain cases, temperature replica exchange
simulations were performed (for Csp M34, ACTR, Ac-(AAQAA)3-NH2, GB1 hairpin, chignolin, and Trp cage). The protocol
for these is the same as above, with exchanges attempted every 1 ps.
Further details on the temperature ranges and simulation lengths for
each case are included in the Results section.
We eliminate from the analysis any configurations in which the proteins
make van der Waals contact with their periodic image, defined by a
closest approach of any atom with an image atom of less than 0.3 nm.[22]Nativeness of proteins and peptides was
assessed by computing the dRMS over native
contacts, defined as the mean-square
difference between the distances d0 between residue pairs (i,j) in contact in the reference (native)
state, and the corresponding distance d(x) in a given configuration xThe list of the N native contacts, {native}, is defined
as all pairs
of heavy atoms (i,j) within 4.5
Å in the native structure, excluding pairs for which |Res(i) – Res(j)| ≤ 2, where the
function Res(k) gives the residue number of atom k.
Protein Association
For the villin
headpiece HP36 fragment,[34] we calculate
association rates and dissociation
rates from the mean residence times in bound and unbound states. The
proteins are considered to bind when the minimum distance between
a pair of heavy atoms from each protein falls below 0.2 nm, and to
have dissociated when this minimum distance exceeds 1.0 nm. The unbinding
rate is koff = 1/toff where toff is the average residence
time before unbinding, while the average binding rate is given as kon = 1/(tonc0), where ton is
the average residence time before binding and c0 is the total protein concentration. The residence times in
unbound/bound are the maximum likelihood times assuming an exponential
distribution, that is, the total time in unbound/bound divided by
the number of binding/unbinding events. Then, the dissociation constant
is obtained as Kd = koff/kon.
Solvation Free
Energies
We compute solvation free energies
for amino acid side-chain analogues using the standard alchemical
perturbation, in which the interactions between the solute and the
solvent are turned off via a coupling parameter approach[35] (see below). Since these analogues are not part
of the standard force field, we have manually constructed them, keeping
all parameters as close as possible to those of the original force
field for the full residue. Specifically, we cut the bond between
Cα and Cβ, and add an extra hydrogen
atom to the Cβ with exactly the same parameters as
the other Hβ atoms. The charge on the Cβ is then adjusted to obtain an overall neutral charge, keeping all
other charges the same. A similar procedure has been used in previous
studies.[26,36]The solvation free energy is obtained
by switching between a system in which the solute and solvent are
fully interacting to one in which all solute–solvent interactions
are turned off. That is, the potential energy V(x) is parametrized, for configuration x, aswhere Vww(x) are all pairwise energy
terms involving solvent (water)
only, Vpp(x) are energy
terms involving solute (protein) only, and VLJpw(x) and Velecpw(x) are respectively the
Lennard-Jones and Coulomb pair interactions between protein and water
atoms. Separate coupling parameters μ and λ are used to
switch off the Lennard-Jones and Coulomb interactions. A soft-core
potential was used to smoothly switch off the Lennard-Jones term.
For all cases, a simple 9-window protocol was used with respective
values of μ = 0.0, 0.2, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0 and
λ = 0.0, 0.0, 0.0, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0. Note that the
charges are switched off fully before the LJ parameters are scaled
to avoid charge singularities. We have verified that this approach
gives the same results, within statistical error, as an alternative
protocol using 38 windows, and to published simulation data.[36] The results from all windows were combined using
the M-BAR estimator[37] to obtain final solvation
free energies.
Results
Parametrization Strategy
Our aim is to obtain a physics-based
transferable protein force field which can be used to study unfolded
and disordered proteins in solution, while still suitable for studying
folded proteins and ab initio protein folding. Therefore, as a starting
point, we use the Amber ff03w force field, which already comes close
to meeting these requirements,[20,22−24,38−40] although with
some deficiencies as noted in the introduction.Clearly, the
nonbonded parameters will have the most effect on the properties of
interest, that is, the atom-centered partial charges and the parameters
of the Lennard-Jones potential. Given the number of atoms and atom-types
involved, there is potentially a large number of degrees of freedom
in the parametrization. Moreover, these parameters, particularly those
for the Lennard-Jones potential, are coupled to the bonded parameters,
such as those for the torsion terms. Therefore, a reparametrization
of the nonbonded parameters really amounts to fitting a completely
new force field, while our goal is to address a specific deficiency.
Recently, a full parametrization of protein Lennard-Jones parameters
to reproduce solvation free energies showed some promising results,
but unfortunately resulted in very unstable folded proteins.[41] In this work, we seek only the minimal change
to an existing force field to obtain also reasonable properties for
unfolded proteins and protein-protein interactions, while maintaining
the integrity of folded proteins, and the folding stability of fast-folding
peptides.Although there is some degeneracy in the optimal choice
of charges
in an additive force field,[42] a thorough
study of solvation free energies of model compounds found relatively
little dependence of the accuracy of the solvation free energies on
the method used for deriving the charges,[43] and studies of amino acid side-chain analogs have also found little
dependence on the details of charge derivation.[26,36]Therefore, we focus on the Lennard-Jones (LJ) parameters.
We do
not want to alter the properties of the water model, which are already
in excellent agreement with experiment for the pure solvent. Furthermore,
it has been shown theoretically that changing solvent–solvent
interactions should not alter solvation free energies,[44] while ideally we would like to do so. We also
would prefer to avoid modifying the protein parameters, due to the
aforementioned coupling with the torsion parameters, and possible
detrimental effects on protein structure. We have therefore focused
on making a specific change to protein–water LJ interactions
only, rather than relying on the standard Lorentz–Berthelot
(LB) mixing rules employed by Amber force fields. Since there is not
a strong theoretical basis for these rules, and they have previously
been independently adjusted during force field development,[45] we chose to scale the Lennard-Jones ε, between the wateroxygen and all protein
atoms, that iswhere εO and ε are the Lennard-Jones
ε on the wateroxygen and atom i, εOLB is result generated
by the LB combination rule, and γ is the adjustable scaling
parameter. Note that the interactions between the water and ions are
not adjusted, and take their default values from the Lorentz–Berthelot
rules. Interactions between water and prosthetic groups such as chromophores
are rescaled as for the protein. We are thus left with only a single
tuning parameter γ to optimize the protein behavior. Our choice
of tuning an Amber force field is partly driven by the fact that the
original parametrization did not explicitly consider protein–water
interactions,[33] unlike, for example, the
CHARMM force fields,[9,46] and therefore, we have more freedom
to change these interactions.
Parameter Optimization:
Csp M34
Since we have only
a single parameter to optimize, there is no risk of overfitting. We,
therefore, use a single system to optimize the parameter, and then
validate this choice by testing against a variety of other systems.For computational convenience, we have chosen a relatively short
34-residue fragment of Cold-shock protein (Csp M34) for our initial
parametrization. Single molecule Förster resonance energy transfer
experiments on this peptide by Sorranno et al.[47] and Wuttke et al.[48] provide
an accurate source of distance information as a function of temperature.
Csp M34 is labeled at one terminus with the extrinsic chromophore
AlexaFluor 488 and at the other terminus with AlexaFluor 594, via
coupling of cysteine thiol groups to maleimide groups on the chromophores;
the version of the dyes with a 5-carbon linker is used. In the simulations,
we have placed AlexaFlour 488 at the N-terminus and AlexaFlour 594
at the C-terminus (in experiment, the order of attachment is not controlled,
but has been found not to affect the results when tested[49,50]). A complicating factor in such simulations is clearly the accuracy
of the force field for the dye molecules, which we have recently optimized
to reproduce the correct time-scales for chromophore dynamics and
interaction energies between the dye and protein.We have determined
the temperature dependence of the Csp M34 properties
via replica-exchange molecular dynamics (REMD) at constant temperature
and pressure, using 42 replicas spanning a temperature of 275–423
K and a 6.5 nm truncated octahedron periodic cell. Each run was for
100 ns, discarding the first 50 ns as “equilibration”.
In Figure 1 we show a comparison with the temperature-dependent
experimental data, both the average FRET efficiency measured (Figure 1C), as well as the radius of gyration (Rg) inferred from these FRET efficiency measurements (Figure 1A). From the simulation data, we have computed the
mean FRET efficiency via a simple average over the distance between
the chromophores, assuming that the distribution of chromophore orientations
is isotropic at all separations R (i.e., the orientational
factor κ2 in the FRET transfer rate is ∼2/3[51,52]), and that chromophore dynamics is fast relative to chain dynamics:The Rg is determined
from the simulations, using only the protein atoms (even though the
simulation includes the dye molecules and linkers).
Figure 1
Temperature-dependent collapse of the M34 fragment of CspTm. (A)
Radius of gyration (excluding chromophores) for ff03w (black symbols)
and ff03ws (red symbols). Blue line indicates values estimated from
FRET measurements.[48] Red dot-dashed and
dashed lines indicate the results obtained by reweighting at γ
= 1.05 and 1.15, respectively. (inset) Dependence of radius of gyration
on the water–protein scaling factor γ at 300 K; horizontal
blue line indicates the value from FRET and vertical red line the
“optimal” choice of γ ≡ 1.10. (B) Separation
of chromophores as given by the distance between the gamma sulfur
atom of the cysteines to which they are linked (empty symbols) or
by the distance between center of mass of the chromophores themselves
(solid symbols). Color code as in panel A. (C) FRET efficiency as
computed from the distances in panel B, meaning of symbols is the
same. Blue line indicates experimental FRET measurements or derived
quantities.[48]
Using the
Amber ff03w force field, the Rg values
are underestimated with respect to experiment at all temperatures,
even though the TIP4P/2005 water model is known to yield unfolded
states which are more expanded relative to the TIP3P water model[20]): the radius of gyration at 300 K is ∼1.3
nm, around 20% smaller than the experimental estimate. More importantly,
the FRET efficiency is >0.9, and thus clearly inconsistent with
experiment.
The average distance between the center of the chromophores is systematically
shorter than between the cysteine Sγ atoms, which
may be a consequence of the chromophores being too buried within the
unfolded chain.We, therefore, experimented with a range of
possible protein–water
scaling parameter values, and eventually settled on γ = 1.10
as a near-optimal value. Using this scaling, the agreement with experiment
is much improved, both for Rg and FRET
efficiency; the temperature-induced collapse noted in previous studies[25,48,53] is also qualitatively captured.
Interestingly, we obtain almost identical results, regardless of whether
we use the distance between a central atom (C1) of each chromophore,
or between the Sγ of the cysteine residue to which
they are attached.To confirm that the choice γ = 1.10
is close to optimal,
we have performed a simple reweighting, using values of γ ranging
from 1.05 to 1.15, with weights for each snapshot i given by w ∝
exp[−β(Vγ – V1.10)], where Vγ, V1.10, and β are respectively
the energy with the γ of interest, the energy with γ =
1.10, and the inverse temperature. In all cases, we determine the
effective fraction of snapshots contributing to the reweighted average
as fRW = exp[Sshannon]/N, where N is the total number
of snapshots and the Shannon entropy is defined as Sshannon = −∑w ln w, where the sum runs over the normalized
weights w of the N snapshots. We find that fRW is generally in the range of 10–40%, so we are not overemphasizing
individual trajectory frames in the reweighting. The reweighted radius
of gyration at 300 K, shown in Figure 1A, inset,
reveals the expected collapse transtion in the Rg with decreasing γ.[54] As
a consequence, the Rg are very sensitive
to the scaling, and only a rather narrow range of γ close to
γ = 1.10 could be considered acceptable. We therefore proceed
with this choice for the remainder of this work.
Parameter Validation
Having chosen γ = 1.10,
we then confirmed that this choice is reasonable, by applying it to
a variety of test cases probing different aspects of the force field.
We first present the results where this scaling leads to a substantial
improvement in relation to experiment: the dimensions of disordered
proteins (ACTR), the solvation free energies of amino acid analogues,
and protein self-association (villin HP36). We then demonstrate that
the modification does not detrimentally affect a range of other systems:
intrinsic structure propensity in short peptides, the helix–coil
transition, the folding of mini-proteins and the structure of folded
proteins. The force field with γ = 1.10 is referred to as Amber
ff03ws to distinguish it from the Amber ff03w, parametrized with unscaled
protein–water interactions (γ = 1).[20]Temperature-dependent collapse of the M34 fragment of CspTm. (A)
Radius of gyration (excluding chromophores) for ff03w (black symbols)
and ff03ws (red symbols). Blue line indicates values estimated from
FRET measurements.[48] Red dot-dashed and
dashed lines indicate the results obtained by reweighting at γ
= 1.05 and 1.15, respectively. (inset) Dependence of radius of gyration
on the water–protein scaling factor γ at 300 K; horizontal
blue line indicates the value from FRET and vertical red line the
“optimal” choice of γ ≡ 1.10. (B) Separation
of chromophores as given by the distance between the gamma sulfur
atom of the cysteines to which they are linked (empty symbols) or
by the distance between center of mass of the chromophores themselves
(solid symbols). Color code as in panel A. (C) FRET efficiency as
computed from the distances in panel B, meaning of symbols is the
same. Blue line indicates experimental FRET measurements or derived
quantities.[48]
Small-Angle X-ray Scattering (SAXS) of ACTR
While the
agreement for Csp M34 is good, it is a relatively short chain, and
a specific case. A possible concern could be that we optimized our
force field for a particular chain length or sequence composition,
while it may fail for others. This concern arises because of the fundamental
differences in the hydrophobic effect on different length scales,[17] and the known effects of sequence composition
and patterning on the dimensions of unfolded proteins.[55−58] Kjaergaard et al.[53] have measured the
dimensions of the intrinsically disordered protein ACTR by SAXS at
two temperatures. They found that the protein collapses slightly with
increasing temperature, with the average Rg decreasing from 2.63 nm at 278 K to 2.39 nm at 318 K. We have performed
replica-exchange molecular dynamics simulations on the same ACTR sequence,
using the same system size and replica spacing as for the labeled
Csp M34. The average radii of gyration from the replicas at 278 and
318 K are 2.13 (0.8) and 2.03 (0.4) nm, respectively using Amber ff03ws
and 1.56 (0.1) and 1.51 (0.2) nm at 278 and 318 K using Amber ff03w.
Since the experimental Rg is inferred
from a model, we have also directly calculated the scattering intensity
using CRYSOL,[59] and averaged over the trajectory.
In Figure 2, we show a comparison of the experimental
and calculated scattering intensity profiles at the two temperatures
studied in experiment, 278 and 318 K. The match to the experimental
data (black curve) is much improved with the new force field using
scaled protein–water interactions (green curve) relative to
the Amber ff03w force field (red curve), although the curves from
the simulation data still indicate that the structural ensemble is
slightly more compact relative to the experimental data. A likely
reason for the this remaining discrepancy may be the excluded volume
effects due to the limited system size utilized to make the replica
exchange simulations computationally feasible. A simple calculation
using a Gaussian chain model suggests that the magnitude of this effect
is comparable to the amount by which the simulation Rg is reduced, relative to the experimental estimate.
Figure 2
Properties of the intrinsically disordered protein ACTR. SAXS intensity
profiles from experiment (black)[53] and
from simulations using Amber ff03w (red) and Amber ff03ws (green)
are shown for (A) temperatures of 278 and (B) 318 K at an ionic strength
of ∼250 mM (close to the experimental conditions.[53] In panels C and D, we show the distributions
of radius of gyration at 278 and 318 K, respectively. REMD simulations
were performed for 100 ns per replica using a 6.5 nm truncated octahedron
box. Prior to this, equilibration runs of 10 and 30 ns were performed
for Amber ff03w and ff03ws, respectively.
To check the backbone sampling for ACTR, we have also performed chemical
shift calculations using SPARTA+[60] to compare
with the experimental shifts for this protein determined by Ebert
et al.[61] Within the error of the shift
prediction, the shifts calculated from either the Amber ff03w or ff03ws
simulations, using the replicas closest to the experimental temperature
(Figure 3), are in agreement with the experimental
data. One can conclude that the peptide is rather disordered, with
little persistent secondary structure. In this case, it appears that
the shifts are not particularly sensitive to degree of collapse of
the chain.
Figure 3
ACTR secondary chemical
shifts. Black line: experimental Cα secondary shifts
computed by subtracting the SPARTA+
reference shift[60] for a random-coil structure
(δrc)from the experimental data.[61] Blue symbols: simulated shifts computed using SPARTA+ from
Amber ff03w REMD simulations. Red symbols: corresponding results from
Amber ff03ws simulations. Shaded region lies outside of one σ
of the shift prediction from experiment. All data are at ∼304
K.
Properties of the intrinsically disordered protein ACTR. SAXS intensity
profiles from experiment (black)[53] and
from simulations using Amber ff03w (red) and Amber ff03ws (green)
are shown for (A) temperatures of 278 and (B) 318 K at an ionic strength
of ∼250 mM (close to the experimental conditions.[53] In panels C and D, we show the distributions
of radius of gyration at 278 and 318 K, respectively. REMD simulations
were performed for 100 ns per replica using a 6.5 nm truncated octahedron
box. Prior to this, equilibration runs of 10 and 30 ns were performed
for Amber ff03w and ff03ws, respectively.
Solvation Free Energies
Solvation free energies have
long been used as a force field benchmark, although only recently
have the calculations been sufficiently accurate to be discriminating.
For example, Shirts et al. carried out solvation free energy calculations
for all uncharged side-chain analogs, using a number of protein force
fields[26] and water models.[62] They found that all force fields underestimated the solubility
of most side-chain analogs. We have therefore determined solvation
free energies using a standard alchemical method, with 9 windows bridging
the uninserted and inserted states for a simulation time of 6 ns per
window. The results are shown in Table 1.
Table 1
Solvation Free Energies
of Uncharged
Amino Acid Side-Chain Anologs, with the Amber ff03 Force Field and
Different Water Models at 298 Ka
parent residue
PDB frequency
IDP frequency
μex expt (kJ mol–1)
μex ff03* (kJ mol–1)
μex ff03w (kJ mol–1)
μex ff03ws (kJ mol–1)
Ala
0.13
0.18
8.12
(0.08)
10.14
0.06
10.62
(0.07)
9.25
(0.07)
Asn
0.07
0.06
–40.5
–29.39
0.09
–29.96
(0.11)
–32.45
(0.11)
Cys
0.02
0.02
–5.19
(0.21)
–0.09
0.07
0.85
(0.09)
–1.49
(0.09)
Gln
0.06
0.08
–39.2
–44.86
0.11
–47.57
(0.14)
–50.46
(0.14)
Hid
0.04
0.03
–43.0
–49.99
0.11
–52.72
(0.15)
–56.25
(0.15)
Ile
0.09
0.05
9.00
(0.19)
10.54
0.12
11.30
(0.16)
6.96
(0.16)
Leu
0.14
0.12
9.54
(0.11)
9.88
0.10
10.99
(0.15)
6.52
(0.15)
Met
0.03
0.03
–6.19
0.69
0.10
1.87
(0.14)
–2.40
(0.14)
Phe
0.06
0.04
–3.18
(0.18)
2.17
0.12
3.69
(0.18)
–1.61
(0.18)
Ser
0.09
0.15
–21.2
(0.1)
–19.16
0.07
–19.56
(0.08)
–20.66
(0.08)
Thr
0.09
0.10
–20.4
(0.2)
–14.38
0.08
–14.91
(0.10)
–17.04
(0.10)
Tyr
0.02
0.01
–25.6
(0.1)
–14.90
0.15
–6.20
(0.16)
–10.81
(0.17)
Trp
0.05
0.03
–24.6
–7.87
0.12
–13.32
(0.20)
–20.33
(0.21)
Val
0.11
0.09
8.33
(0.10)
9.84
0.09
10.34
(0.12)
6.96
(0.12)
All values in
kJ mol–1. Experimental data were taken from Wolfenden.[63] Numbers in brackets are, for the simulation
data, the statistical
error, and for the experimental data, the standard deviation over
the experimental data sets collated by Shirts and Pande,[26] where available.
To gauge the overall impact of the solvation free energies on force
field properties, we would ideally perform a global comparison of
the solvation free energies of all side-chain analogues with their
experimental counterparts to determine the net difference. There are
a few complications to such a comparison; first, we have not determined
the solvation free energies of all side-chain analogues. Specifically,
we have neglected the charged analogues, whose solvation free energies
are an order of magnitude larger[48,64] and slightly
more complicated to determine by simulation.[65] Second, we do not consider the backbone contribution, due to the
ambiguity of how to parametrize model compounds for the backbone with
the Amber ff03 force field (all residues have different backbone charges).
Third, the values obtained should, in principle, be corrected for
the energetic cost of polarization, which is absent in a prepolarized
additive model.[66] As we discuss below,
there are many approximate ways to do this, and as in previous studies,
we will merely present the uncorrected results and discuss the qualitative
impact of corrections.[36]ACTR secondary chemical
shifts. Black line: experimental Cα secondary shifts
computed by subtracting the SPARTA+
reference shift[60] for a random-coil structure
(δrc)from the experimental data.[61] Blue symbols: simulated shifts computed using SPARTA+ from
Amber ff03w REMD simulations. Red symbols: corresponding results from
Amber ff03ws simulations. Shaded region lies outside of one σ
of the shift prediction from experiment. All data are at ∼304
K.All values in
kJ mol–1. Experimental data were taken from Wolfenden.[63] Numbers in brackets are, for the simulation
data, the statistical
error, and for the experimental data, the standard deviation over
the experimental data sets collated by Shirts and Pande,[26] where available.Nonetheless, we can derive some insights by comparing
the average
properties of the side-chain analogues which we have studied. In Table 2, we present various measures of the difference
from experiment for different combinations of force fields and water
models. In addition to the standard global root-mean-square error
from experiment, we also consider the mean signed error, which measures
the net deviation from experiment, as well as weighted variants MSEX defined viawhere wX(i) is the weight
of residue i for the set
of weights X, and μsimex(i) and μexptex(i) are respectively the excess chemical potential of residue i in simulation and experiment. We consider two sets of
weights, first the normalized frequency of each residue in the protein
data bank (PDB), and second the corresponding frequency of each residue
in loop regions of the PDB, as a proxy for intrinsically disordered
segments (residue frequncies taken from ref (67)).
Table 2
Global Measures of Deviation from
Experimental Solvation Free Energies (kJ mol–1)a
protein//water
force fields
RMSE
MSE
MSEPDB
MSEIDP
OPLS//TIP3Pb
3.11
2.5
2.11
2.21
Amber9x//TIP3Pb
3.63
2.42
1.86
1.69
Amber9x//TIP4Pewb
4.08
2.87
2.43
2.05
CHARMM//TIP3Pc
5.63
4.78
3.73
3.59
Amber03*//TIP3P
7.41
4.90
3.81
2.99
Amber03w//TIP4P/2005
8.25
5.32
4.16
3.11
Amber03ws//TIP4P/2005
6.12
1.81
0.95
0.33
Note that these comparisons exclude
histidine in all cases as it was not present in the data set of Hess.[36] MSE: Mean signed error. MSEPDB: Mean
signed error with PDB residue weighting. MSEIDP: Mean signed
error with residue weighting based on unstructured regions of the
PDB.
Data from Hess.[36]
Data from Shirts and Pande.[26]
Note that we have
used Amber9x to refer to Amber94, Amber99, Amber99SB,
etc., which are identical from the perspective of side-chain analogues,
although different parent force fields may be used to describe them
in the literature.[26,36]Of particular interest
for this study is the average signed error,
which is a measure of whether the simulation solvation free energies
are overall more or less favorable than the experimental data. This
is because the modification we make is not likely to have highly residue-specific
effects but should have a qualitatively similar effect on all residues.
The important deductions which can be made from Table 2 are that the RMS error of most force field/water combinations
is 1–2 kBT, with
a mean signed error of around +1 kBT. Thus, over the set of residues studied here, the solvation
free energies of these force fields are less favorable than in experiment.
Of course, we have not explicitly considered a polarization correction,
but this should make the simulation solvation free energies more unfavorable
(as it is an energy cost incurred upon dissolution).[42] Scaling the water interactions helps to improve the overall
RMSE of the Amber ff03ws model relative to Amber ff03w, but, more
importantly, it reduces the mean signed error (with or without weighting).
Thus, the average solvation free energy per side-chain is approximately
1–2 kBT more favorable
than the other force field models in Table 2. While this may be qualitatively expected, it helps to quantify
the effect of the chosen scaling of protein–water interactions.
When the differences are weighted by the observed frequency of residues
in the PDB (in either folded or disordered regions), the agreement
improves still further, indicating that the most abundant residues
also have the most accurate solvation free energies, so that the average
solvation of natural sequences will generally exceed expectations
based on a comparison of unweighted solvation free energies.Note that these comparisons exclude
histidine in all cases as it was not present in the data set of Hess.[36] MSE: Mean signed error. MSEPDB: Mean
signed error with PDB residue weighting. MSEIDP: Mean signed
error with residue weighting based on unstructured regions of the
PDB.Data from Hess.[36]Data from Shirts and Pande.[26]
Protein Association: Villin
HP36
Association of villin
HP36 is an appealing test for nonspecific protein-protein affinities,
because of its modest size (36 residues) relative to other systems
often used for studying weak protein association, such as Lysozyme.
Villin HP36 has been shown to associate in a diffusion-limited manner
in a recent computational study,[27] while
it is known to remain soluble up to ∼1.5 mM,[68] based on the invariance of NMR-determined diffusion coefficients
up to that concentration. Therefore, we expect a dissociation constant Kd > 1.5 mM. On the other hand, there is evidence
for weak protein association from a small change in chemical shifts
at a concentration of ∼32 mM.[69] To
test the self-interactions of villin, we ran simulations of two villin
HP36 molecules at a total protein concentration of ∼8.5 mM,
starting from the same initial configuration with the molecules spaced
∼1 nm apart. The simulations were run for 200 ns at 300 K.
We observed binding and unbinding events from long equilbrium simulations,
where we define a binding event as when the minimum distance between
the molecules falls below 0.2 nm, and unbinding when it exceeds 1.0
nm. These distances were chosen based on the trajectory of minimum
distances, shown in Figure 4. Using these cut-offs,
we can compute association and dissociation rates, and dissociation
constants Kd for each force field, shown
in Table 3.
Figure 4
Association of villin. The minimum distance
between two villin
HP36 molecules at 8.5 mM total protein concentration is shown for
(A) Amber ff03*, (B) Amber ff03w, and (C) Amber ff03ws. Broken red
lines indicate the boundaries used to define association and dissociation
events. The dRMS from the native state is shown for (D) Amber ff03*,
(E) Amber ff03w, and (F) Amber ff03ws (black, blue curves correspond
to the two chains in the system).
Table 3
Kinetic and Equilibrium Parameters
of Villin HP36 Association
force field
kon (M–1 ns–1)
koff (ns–1)
Kd (mM)
Amber ff03*
49.4 (34.9)
0.005 (0.005)
0.10 (0.07)
Amber ff03w
8.3 (2.7)
0.128 (0.043)
15.4 (1.7)
Amber ff03ws
3.5 (1.7)
0.077 (0.034)
21.9 (4.9)
It is clear from inspection
of the simulation with the Amber ff03* force field (TIP3P water) that
the association of the two proteins is extremely rapid, and, in spite
of an early short dissociation event, the proteins essentially remain
bound for the entire duration of the trajectory. Although the estimated Kd of 0.1 mM has a large associated error, it
is also clearly much too small to be consistent with the NMR diffusion
coefficient measurements. On the other hand, with the Amber ff03w
(TIP4P/2005 water) or Amber ff03ws (TIP4P/2005 water with scaled protein–water
interactions), we find dissociation constants of ∼15 and ∼22
mM respectively. These values are consistent with the lack of association
at ∼1.5 mM in the diffusion experiments and may also be consistent
with some weak association at the 32 mM concentration at which localized
chemical shift changes were observed.Association of villin. The minimum distance
between two villin
HP36 molecules at 8.5 mM total protein concentration is shown for
(A) Amber ff03*, (B) Amber ff03w, and (C) Amber ff03ws. Broken red
lines indicate the boundaries used to define association and dissociation
events. The dRMS from the native state is shown for (D) Amber ff03*,
(E) Amber ff03w, and (F) Amber ff03ws (black, blue curves correspond
to the two chains in the system).
Structure of Disordered Peptides: Ala53J Couplings
The determination
of a comprehensive
set of through-bond scalar couplings in short oligo-alanine peptides
by Schwalbe and co-workers[70] has provided
a reference data set for the intrinsic sampling of backbone conformations
in disordered peptides, which has been used to assess[71−73] and optimize[6,21] force fields. Here, our objective
is to confirm that altering the protein–water interactions
does not have detrimental effects on this sampling of backbone conformations.
The approach we use here is similar to that adopted earlier,[71] i.e., we measure the agreement with experiment
usingwhere Jexpt(i) is the ith experimental J-coupling and Jcalc(i) is the corresponding ith coupling computed using
a Karplus equation; σ is the estimated
error in the predictions made by the Karplus equation. However, we
have made some important revisions. First, we now use a more recent
Karplus equation, fitted to RDC-refined NMR structures[74] for the 3JHNHα, 3JHNC, and 3JHNCβ scalar couplings, which results in
a smaller σ for the predictions;
despite this, we still obtain a lower χ2 than with
the previous Karplus equation for those same couplings. Second, we
have now omitted the 3JCC measurements
from the test data set. In all previous studies, these have resulted
in a large contribution to the total χ2, and the
experimental values are inconsistent with data for other alanine-rich,
disordered sequences (Ad Bax, personal communication). Furthermore,
since the 3JCC couplings report
on the ϕ torsion angle, they are somewhat redundant as several
other J-couplings are available that report on the
same angle.The results, summarized in Table 4 show that overall, similar results are obtained with the
original Amber ff03w and with ff03ws, with χ2 smaller
than 1.0. If the force field were perfect, an average χ2 of 1.0 is expected, since the deviation from experiment is
then equal to the error in the Karplus prediction. Our χ2 < 1.0 most likely results from a slight overestimate of
the experimental σ which is obtained
from diverse sequences, not just oligo-alanine. This χ2 is also comparable to that for other state of the art protein force
fields such as CHARMM 36.[9]
Table 4
Summary of 3J Couplings (with New Karplus
Equation) for Residue 4a
Residue,
Coupling
Expt
σ
ff03*
CHARMM36
ff03w
ff03ws
A4 1JNCα (ψ4)
11.25
0.59
11.18
11.20
11.35
11.34
A4 2JNCα (ψ4)
8.40
0.50
8.17
8.06
8.26
8.22
A4 3JHαC (ϕ4)
1.89
0.38
1.83
1.94
1.85
1.85
A4 3JHNC (ϕ4)
1.15
0.31
1.10
1.25
1.13
1.20
A4 3JHNCβ (ϕ4)
2.14
0.25
2.13
2.11
2.03
1.98
A4 3JHNHα (ϕ4)
5.98
0.42
6.16
5.97
6.29
6.26
A4 3JHNCα (ϕ4, ψ3)
0.69
0.10
0.62
0.61
0.64
0.64
χ2
0.36
0.60
0.31
0.38
Note that the reported χ2 is for all residues; however, a full listing is given in
the Supporting Information.
Note that the reported χ2 is for all residues; however, a full listing is given in
the Supporting Information.
Helix Formation in Ac-(AAQAA)3-NH2
Good sampling of intrinsic backbone
propensity in unstructured peptides
is clearly desirable, but it is also critical that a modification
of the water–protein interaction does not destabilize folded
motifs such as helices, which may be weakly populated in unfolded
states. As a model system, we use the 15-residue helix-forming peptide
Ac-(AAQAA)3-NH2, which forms ∼30% helix
at 300 K, and for which the temperature-dependent helix propensity
has been determined by NMR.[75] We have previously
used this as a probe for helix propensity in force fields.[6,9,20] In Figure 5A, we show the temperature-dependent helix propensity for this peptide
using Amber03ws, compared with the closely related force-fields Amber03*
and Amber03w. It is clear that even with the modified water interactions,
stable helical structures are still formed. The overall helix stability
is reduced in Amber ff03ws, but since it was originally slightly too
high in Amber ff03w at low temperatures, the agreement with experiment
is somewhat improved at 300 K (see per-residue helix populations in
Figure 5A, inset).
Figure 5
Peptide folding equilibria. (A) Temperature-dependent
helix formation
in Ac-(AAQAA)3-NH2, (inset) per-residue fraction
helix at 300 K. (B) Folded population of Trp Cage. (C) Folded population
of GB1 hairpin. (D) Folded population of chignolin. Folded populations
are defined as those with dRMS from the
experimental structure of less than 0.2 nm. Green symbols indicate
Amber ff03ws, and where applicable, red symbols indicate Amber ff03w
and blue symbols Amber ff03* with TIP3P water. Experimental data are
indicated by black lines (taken from Shalongo et al.[75] for Ac-(AAQAA)3-NH2, from Muñoz
et al. for GB1, from Neidigh et al. for Trp cage[76] and from from Honda et al. for chignolin[77]). Up triangles and down triangles refer, respectively,
to REMD simulations initiated from unfolded or folded structures.
Simulation lengths were 150 ns (50 ns equilibration) for Ac-(AAQAA)3-NH2, 150 ns (75 ns equilibration) for Trp cage
initiated from folded structures, 300 ns (150 ns equilibration) for
Trp cage initiated from unfolded structures, 500 ns (200 ns equilibration)
for GB1 initiated from folded structures, 400 ns (200 ns equilibration)
for GB1 initiated from unfolded structures, and 100 ns (50 ns equilibration)
for chignolin.
Folding of Mini-Proteins:
Trp Cage, GB1 Hairpin, and Chignolin
In addition to the simple
helix-forming peptide above, we have
also investigated the stability of small cooperatively folding motifs,
or “mini-proteins”. These present the advantage that
their folding equilibria can be sampled relatively easily and, due
to their low stability, they are very sensitive to changes in force
field parameters.Peptide folding equilibria. (A) Temperature-dependent
helix formation
in Ac-(AAQAA)3-NH2, (inset) per-residue fraction
helix at 300 K. (B) Folded population of Trp Cage. (C) Folded population
of GB1 hairpin. (D) Folded population of chignolin. Folded populations
are defined as those with dRMS from the
experimental structure of less than 0.2 nm. Green symbols indicate
Amber ff03ws, and where applicable, red symbols indicate Amber ff03w
and blue symbols Amber ff03* with TIP3P water. Experimental data are
indicated by black lines (taken from Shalongo et al.[75] for Ac-(AAQAA)3-NH2, from Muñoz
et al. for GB1, from Neidigh et al. for Trp cage[76] and from from Honda et al. for chignolin[77]). Up triangles and down triangles refer, respectively,
to REMD simulations initiated from unfolded or folded structures.
Simulation lengths were 150 ns (50 ns equilibration) for Ac-(AAQAA)3-NH2, 150 ns (75 ns equilibration) for Trp cage
initiated from folded structures, 300 ns (150 ns equilibration) for
Trp cage initiated from unfolded structures, 500 ns (200 ns equilibration)
for GB1 initiated from folded structures, 400 ns (200 ns equilibration)
for GB1 initiated from unfolded structures, and 100 ns (50 ns equilibration)
for chignolin.The mini-proteins we
consider are the 20-residue Trp-cage,[76] representative of α-helical and 310-helical structure,
the 16-residue β-hairpin GB1,[78,79] and 10-residue
β-hairpin chignolin,[77] each representative
of turn and β structure. In Figure 5B,
we have determined melting curves for each of
these systems from replica-exchange molecular dynamics simulations.
Because obtaining fully converged equilibrium curves is challenging
for such systems, in each case, we run two independent calculations,
one starting from the native state, and another starting from an unfolded
state.The melting curves for Trp cage (Figure 5B) show that it is stably folded using Amber ff03ws, with
the results
of the two REMD simulations (from folded, unfolded) in near agreement
except at the lowest temperatures. The stability is slightly lower
than with the earlier Amber ff03* force field and TIP3P water, but
has similar stability to experiment at temperatures ∼300 K.It is more challenging to obtain converged melting curves for the
GB1 hairpin because of its slower folding rate,[80] but nonetheless the simulations starting from unfolded
and folded conformations provide bounds on the true equilibrium curve.
Different estimates for the fraction folded at 300 K have been obtained
in different experiments, ranging from ∼30%[81] to ∼50%.[79] Our estimate
of ∼10% is somewhat lower than these, however this still corresponds
to a folded state destabilization of only ∼1.3 kBT.As a second example of a β-hairpin,
we consider the designed
β-hairpin chignolin, derived by taking a consensus sequence
out of a large number of sequences which fold to a structure similar
to GB1.[77] In this case, because of the
smaller size of the peptide, we can obtain converged results starting
from folded or unfolded states (Figure 5D).
The folded population near 300 K is remarkably close to the experimental
estimate, but with a temperature dependence which is still too weak.
This lack of cooperativity is also evident for the other model systems
studied here, suggesting that the scaled water interactions do not
help to address this known deficiency of current additive force fields.[3]
Stability of Folded Proteins
Lastly,
we consider the
stability of native proteins: it might be anticipated that increasing
the protein-water interaction strength may destabilize folded states.
However, this is a difficult issue to address computationally as computing
the stability of native proteins is rather challenging. Instead, we
show that a more limited condition is satisfied, i.e. that folded
proteins stay close to the native state in simulations starting from
the experimental structures. We have chosen a set of four proteins
representative of different structural classes so as to obtain a good
coverage of protein structure space. These are the extensively characterized
ubiquitin (α/β), spectrin R15 (all-α), the cold
shock protein from Thermotoga maritima, CspTm (all-β), and humanlysozyme (α/β) –
the last being an example of a slightly larger protein with two distinct
α and β domains. For each protein, we have run 200 ns
simulations starting from the experimentally determined, folded structure.
In all cases, using the Amber03ws model results in all-atom dRMS deviations of ∼0.2 nm from the X-ray
structures. These deviations are comparable to, or in some cases (CspTm) slightly better than those obtained with the original
Amber03w. (Figure 6 A-D). To further quantify
the fluctuations on a per-residue basis, we have computed root-mean-square
fluctuations (RMSF) averaged over each residue, after aligning the
backbone to the experimental structure. The RMSF (Figure 6E–H) obtained with Amber03w and Amber03ws
is also comparable. Lastly, we have calculated all-atom contact maps
with a 0.6 nm cutoff (i.e., for each pair of residues, we calculate
the fraction of the time for which at least one pair of heavy atoms
from the two residues is within 0.6 nm). These contact maps, shown
in Figure 7, show that almost identical contacts
are formed with the original and modified force field; that is very
few native contacts are broken and few non-native contacts are formed.
We have also plotted the residue–residue contacts on the α-carbon
trace of each protein, highlighting the contacts formed in the Amber
ff03w simulations and not in the Amber ff03ws simulations, and vice
versa. As can be seen there are generally very few differences. The
largest changes are for CspTm, as already suggested
by the differences in the dRMS plots;
in this case, the ff03ws is in fact closer to the experimental structure.
Finally, we have directly calculated backbone amide order parameters
for comparison with those measured by NMR, for the proteins where
data is available (ubiquitin, humanlysozyme).[82] The results, shown in Figure 8 indicate
that the Amber ff03w and ff03ws force fields yield similar values
for the order parameters in regions of secondary structure, and are
in both cases in good agreement with experiment. For ubiquitin, the
loops are clearly more flexible in Amber ff03ws than in either Amber
ff03w or in experiment; in lysozyme, both ff03w and ff03ws exhibit
some excess flexibility in the loops over experiment. One caveat to
this interpretation is that the method we have used to compute order
parameters averages the internal dynamics over the entire simulation,[82] whereas the fit to experiment is sensitive only
to dynamics shorter than the reorientational correlation time of the
molecule. The calculated order parameters may therefore be an underestimate,
but much longer simulations would be needed to calculate them in the
analogous way to experiment.[83]
Figure 6
Stability of folded proteins.
We show native distance matrix RMS
(dRMS) calculated for all heavy atom pairs
in contact in the native crystal structure, for (A) ubiquitin, (B)
CspTm, (C) human lysozyme, and (D) spectrin R15 and
residue-averaged root-mean-square fluctuations (RMSF) for each of
the proteins (E) ubiquitin, (F) CspTm, (G) human
lysozyme, and (H) spectrin R15. Black and red curves correspond respectively
to results obtained with the Amber ff03w and Amber ff03ws force fields
in 200 ns simulations at 300 K. The experimental structures are presented
as insets in E–H.
Figure 7
Contact differences in folded proteins. For each of the proteins
shown in Figure 6, we present contact maps
(left panels) computed with Amber ff03w (above diagonal) and Amber
ff03ws (below diagonal): (A) ubiquitin, (B) CspTm, (C) human lysozyme, and (D) spectrin R15. In the right panels,
we show the contacts overlaid on the structures: cyan contacts are
common to both simulations, orange contacts are those for which P(ff03ws) – P(ff03w) < −0.5,
that is, the contact is formed with ff03w, but not ff03ws; similarly
dark blue contacts are those formed with ff03ws, but not ff03w P(ff03ws) – P(ff03w) > 0.5.
Figure 8
NMR order parameters for folded proteins: (A) ubiquitin
and (B)
lysozyme. Black lines are results using Amber ff03w. Red lines using
Amber ff03ws and blue symbols are experimental data taken from Tjandra
et al.[84] for ubiquitin and Mine et al.[85] for lysozyme.
Taken
together, these results all suggest that folded proteins are indeed
reasonably stable when using the ff03ws force field.Stability of folded proteins.
We show native distance matrix RMS
(dRMS) calculated for all heavy atom pairs
in contact in the native crystal structure, for (A) ubiquitin, (B)
CspTm, (C) humanlysozyme, and (D) spectrin R15 and
residue-averaged root-mean-square fluctuations (RMSF) for each of
the proteins (E) ubiquitin, (F) CspTm, (G) humanlysozyme, and (H) spectrin R15. Black and red curves correspond respectively
to results obtained with the Amber ff03w and Amber ff03ws force fields
in 200 ns simulations at 300 K. The experimental structures are presented
as insets in E–H.Contact differences in folded proteins. For each of the proteins
shown in Figure 6, we present contact maps
(left panels) computed with Amber ff03w (above diagonal) and Amber
ff03ws (below diagonal): (A) ubiquitin, (B) CspTm, (C) humanlysozyme, and (D) spectrin R15. In the right panels,
we show the contacts overlaid on the structures: cyan contacts are
common to both simulations, orange contacts are those for which P(ff03ws) – P(ff03w) < −0.5,
that is, the contact is formed with ff03w, but not ff03ws; similarly
dark blue contacts are those formed with ff03ws, but not ff03w P(ff03ws) – P(ff03w) > 0.5.NMR order parameters for folded proteins: (A) ubiquitin
and (B)
lysozyme. Black lines are results using Amber ff03w. Red lines using
Amber ff03ws and blue symbols are experimental data taken from Tjandra
et al.[84] for ubiquitin and Mine et al.[85] for lysozyme.
Discussion
Current force fields have achieved remarkable
successes in recent years, most notably the ab initio folding of a
number of small proteins starting from fully unfolded conformations.[86−91] Despite this success, however, these simulations did reveal some
unexpected features, for example, that the unfolded state was too
compact relative to experiment,[3] and often
contained highly structured states.[92−95] The collapse was clearly inconsistent
with SAXS and FRET measurements probing unfolded state dimensions,[3] while structure-forming propensity appears inconsistent
with the general finding of little persistent residual structure in
unfolded proteins, beyond the intrinsic structure of each residue.[96] It has also been suggested that such non-native
traps may be responsible for another artifact of all-atom folding
simulations; namely, the extreme sensitivity of the folding rate to
temperature[93] (most successful folding
simulations have been done at high temperatures ∼350 K). In
experiment, the folding rate is usually only weakly temperature-dependent.
The slow rates at low temperature may result from an energy landscape
which is too rugged as the result of non-native traps.Our work
suggests that, with current force fields, peptides are insufficiently
well solvated in water. This effect, while it may help to stabilize
the folded state, also clearly results in unfolded states which are
too collapsed, and may result in additional ruggedness via stabilization
of structured non-native states. In this paper, we have clearly shown
that the dimensions of unfolded and intrinsically disordered proteins
can be corrected by appropriately balancing protein–solvent
interactions, and that such a correction also appears to improve the
overall agreement of solvation free energies and nonspecific protein–protein
interactions with experiment. Notably, even though the optimal correction
was obtained from a specific set of data for one system (FRET data
from Csp M34), the same parameter results in similar improvements
for the other systems, that is, it is “transferable”.An obvious question, then, is to what extent the correction factor
obtained, γ = 1.10, would be transferable to different force
fields (given that other force fields also result in unfolded states
which are too collapsed[3,25]). Most likely, for completely
different protein force fields, such as CHARMM 36[9] or OPLS/AA,[97] a slightly different
factor may be needed since the Lennard-Jones parameters are independently
determined. However, there is a variety of Amber force fields sharing
the same Lennard-Jones parameters, notably Amber ff99SB[5] and its derivatives,[10] for which the correction may be similar. In fact, our preliminary
tests suggest that a scaling of factor of γ = 1.10 also appears
to be near optimal for a force field, Amber ff99SBws, based on Amber
ff99SB*-ILDN-Q (details in Supporting Information Text): that is the Amber ff99SB force field,[5] with a global backbone correction for helix propensity,[6] modified torsion parameters for certain side-chains,[10] and a uniform backbone charge model, found to
more accurately reproduce helix propensities for different residue
types.[40] In Supporting
Information Figures S1–S3, we show data for the dimensions
of Csp M34, villin association, and the helix fraction of AAQAA for
the ff99SBws analogue of ff99SB.We have focused on a narrowly
defined tuning of the force field
aimed at achieving the desired improvements; since current force fields
are very good for many purposes and represent many years of careful
tuning, it makes sense to keep as much as possible of the original.
Of course, the solvation free energy results suggest that more fine-grained
changes to the parameters may help to improve also the individual
solvation free energies of different side-chains, rather than just
the global signed error; ultimately, such an approach would be desirable,
but would represent a completely new force field parametrization.
Future more comprehensive global optimization efforts[98−100] may be able to include the type of data used here as part of the
optimization target, or for validation. The poor solvation of proteins
in current force fields may tend to have an stabilizing effect on
folded structures. We find that by improving the solvation, folded
structures are still generally stable, although with larger amplitude
dynamics in loop regions. Most likely, the properties of force fields
with a good average balance of protein water interactions will depend
more sensitively on the choice of other parameters in the protein
energy function.One aspect we have not really addressed is
the effect of these
changes on protein folding dynamics. As alluded to above, a poorly
solvated protein force field may result in deeper non-native “trap”
states, stabilized merely due to being collapsed. When protein–water
interactions are more finely balanced, this could significantly change
the local features of the folding energy landscape. In future work,
it will be interesting to test what effect these changes might have
on temperature-dependent protein folding dynamics.
Authors: Robert B Best; Xiao Zhu; Jihyun Shim; Pedro E M Lopes; Jeetain Mittal; Michael Feig; Alexander D Mackerell Journal: J Chem Theory Comput Date: 2012-07-18 Impact factor: 6.006