We present an atomistic level computational investigation of the dynamics of a signaling protein, monocyte chemoattractant protein-1 (MCP-1), that explores how simulation geometry and solution ionic strength affect the calculated diffusion coefficient. Using a simple extension of noncubic finite size diffusion correction expressions, it is possible to calculate experimentally comparable diffusion coefficients that are fully consistent with those determined from cubic box simulations. Additionally, increasing the concentration of salt in the solvent environment leads to changes in protein dynamics that are not explainable through changes in solvent viscosity alone. This work in accurate computational determination of protein diffusion coefficients led us to investigate molecular-weight-based predictors for biomolecular diffusion. By introducing protein volume- and protein surface-area-based extensions of traditional statistical relations connecting particle molecular weight to diffusion, we find that protein solvent-excluded surface area rather than volume works as a better geometric property for estimating biomolecule Stokes radii. This work highlights the considerations necessary for accurate computational determination of biomolecule diffusivity and presents insight into molecular weight relations for diffusion that could lead to new routes for estimating protein diffusion beyond the traditional approaches.
We present an atomistic level computational investigation of the dynamics of a signaling protein, monocyte chemoattractant protein-1 (MCP-1), that explores how simulation geometry and solution ionic strength affect the calculated diffusion coefficient. Using a simple extension of noncubic finite size diffusion correction expressions, it is possible to calculate experimentally comparable diffusion coefficients that are fully consistent with those determined from cubic box simulations. Additionally, increasing the concentration of salt in the solvent environment leads to changes in protein dynamics that are not explainable through changes in solvent viscosity alone. This work in accurate computational determination of protein diffusion coefficients led us to investigate molecular-weight-based predictors for biomolecular diffusion. By introducing protein volume- and protein surface-area-based extensions of traditional statistical relations connecting particle molecular weight to diffusion, we find that protein solvent-excluded surface area rather than volume works as a better geometric property for estimating biomolecule Stokes radii. This work highlights the considerations necessary for accurate computational determination of biomolecule diffusivity and presents insight into molecular weight relations for diffusion that could lead to new routes for estimating protein diffusion beyond the traditional approaches.
How proteins and other
biomolecules move
establishes a rate limit
for function in living systems. No single molecule can perform all
functions in and around cells, so diffusion of partner molecular species
is necessary for given particles to play a role in a complex system.
The primary role of signaling proteins is to do exactly this, translate
through their environment and interact with other larger biomolecular
structures in order to enact a more complex action.Most studies
on protein translation and function have been performed
in experimental laboratories rather than through computational methods.
Before the first structures of proteins were determined, diffusion
analysis via ultracentrifugation sedimentation studies was one of
the primary methods for assessing the shape and establishing the identity
of these macromolecules.[1−4] Early such analyses in protein dynamics motivated
the awarding of the 1926 Nobel Prize in Chemistry to Theodor Svedberg
for his pioneering efforts in the practical development and implementation
of experimental techniques in protein dynamics and separation. Protein
diffusion studies tend to be challenging from a computational perspective,
because the translational dynamics of multi-kDa mass particles that
are greater than an order-of-magnitude larger than the particles in
their surrounding solvent environment necessitate the use of reasonably
large system sizes simulated over relatively long times, often multiple
microseconds in length.Complicating matters, molecular simulations
are performed on finite-sized
systems. While use of periodic boundary conditions and extended interaction
correction techniques are mostly standardized for modeling condensed
phase molecular systems, some calculated properties need to be corrected
to account for the size of the simulation cell.[5] For example, while the shear viscosity is relatively insensitive
to simulated system size,[6,7] the apparent three-dimensional
diffusion coefficient will be slower in smaller systems and faster
in larger ones, and corrections have been developed to account for
these variations with system size.[6−13] We are interested in calculating diffusion coefficients that are
quantitatively comparable with experiments, and some of the more complex
models need to be simulated with noncubic simulation boxes. It would
be beneficial to evaluate the accuracy of current strategies for converting
apparent simulation calculated diffusion coefficients into viscosity
corrected infinitely dilute diffusion coefficients. Converting a single
simulation value would eliminate the need to perform multiple simulations
in order to project out to the infinitely dilute value.Here,
we explore protein diffusion from atomistic level molecular
dynamics simulations of a small signaling protein, monocyte chemoattractant
protein-1 (MCP-1). The goals of this effort are severalfold and include
(1) determining the steps and simulation scope necessary for calculation
of experimentally comparable diffusion coefficients, (2) assessing
how finite size corrections can be applied to single cubic and tetragonal
simulations to determine similarly accurate diffusion coefficients,
and (3) exploring how changes in the ionic strength of the surrounding
environment affect translational diffusion of a protein. Performing
this work led us to investigate protein-mass-based estimation of diffusion
coefficients to evaluate how one might potentially make accurate estimations
of protein diffusion without prior experimental knowledge of biomolecule
dynamics.
Methods
As a model system and for the purpose of comparison
with experimental
estimates, molecular dynamics simulations of pure water, water solvated
monocyte chemoattractant protein-1 (MCP-1: PDB code 1DOL), and MCP-1 in increasingly
concentrated aqueous NaCl solutions were performed at a target temperature
and pressure of 310.15 K and 1 atm. In addition to these molecular
dynamics simulations and companion analyses, we performed a surface
area and volumetric investigation of a set of proteins of increasing
mass in order to construct alternate mass-to-geometry relations for
general estimation of biomolecule diffusivity.In all molecular
dynamics simulation systems, cubic simulation
box geometries were used to form accurate finite size correction estimations
for the diffusion coefficient calculations.[6−8] In the pure
water and water solvated MCP-1 molecular simulation systems, additional
tetragonal simulation box geometries were chosen to explore simple
extensions of these finite size corrections beyond the typical cubic
simulation box constraint. We have an interest in simulating the dynamics
of proteins in the presence of collagen or other periodically replicated
molecules, and these tetragonal simulations were used to test how
accurate such extensions are in correcting diffusion calculations
for large singular biomolecules.[8−13]Figure details
the system geometry series used in calculating experimentally comparable
diffusion coefficients. The pure water systems ranged from 3 to 6
nm short edge-length simulation box sizes, approximately 2700 to 21 500
atoms, respectively, while the water solvated protein systems ranged
from 5 to 9 nm, corresponding to 12 400 to 72 500 atoms,
respectively. The smaller tetragonal systems were up to twice the
size of the corresponding cubic systems given the longer L edge-length. The pure water systems
were both a verification test of finite size correction parameters
detailed in previous studies[8,9] and necessary for determination
of solvent viscosity η at the target temperature and pressure,
needed for both finite size corrections and solvent model viscosity
corrections in making experimental comparisons.[7,8] Specifically,
for cubic systems with potentially charged molecules, Yeh and Hummer
proposed the adapted cubic system finite size correctionwhere kB is the
Boltzmann constant, T is the simulation temperature, L is the simulation box edge-length, η is the solvent
viscosity, ξEW ≈ 2.837298 is a unitless simple
cubic lattice self-term,[14] and α
is an empirical free fitting parameter introduced to account for deviations
from previous correction expressions, potentially introduced from
charged solute interactions.[8] This expression
gives a route to calculating the infinitely dilute diffusion coefficient, D0, from the apparent diffusion coefficient, Dapp, as calculated from a molecule’s
mean-square displacement over time. The resulting D0 is not necessarily yet an experimentally comparable
value, particularly if the solvent environment in the simulation does
not reproduce the experimental environment and properties well. Multiplying
the simulation D0 by the ratio of simulation
and experimental viscosities can correct for these differences.While insensitive
to simulation system sizes,
ηTIP3P will be strongly temperature and salt concentration
dependent, and it needs to be estimated for each given simulation
composition and state point.
Figure 1
Illustrated
series of simulation system geometries used in the
calculation of experimentally comparable diffusion coefficients. The
water-only simulations spanned cubic (top) and tetragonal (bottom)
boxes with a short edge-length (L or L) ranging
from approximately 3 to 6 nm in 0.5 nm increments, while the protein
simulations needed to be larger with a short edge ranging from 5 to
9 nm in 1 nm increments.
Illustrated
series of simulation system geometries used in the
calculation of experimentally comparable diffusion coefficients. The
water-only simulations spanned cubic (top) and tetragonal (bottom)
boxes with a short edge-length (L or L) ranging
from approximately 3 to 6 nm in 0.5 nm increments, while the protein
simulations needed to be larger with a short edge ranging from 5 to
9 nm in 1 nm increments.Calculating a D0 using eq is potentially convenient
assuming
the necessary parameters, η and α, are easily determined
or somewhat universal. In the case of α, this is not entirely
clear given its empirical origins. For a short net −2e RNA strand, Yeh and Hummer estimate this correction factor
α ≈ 0.76, which was considered reasonable given a value
somewhat close to unity.[8] Potentially more
concerning is the requirement of eq for perfectly cubic simulation geometries. When this
is not possible, determining experimentally comparable diffusion coefficients
becomes problematic. Recently, Kikugawa et al. developed fit functions
that can be used to correct diffusion coefficient estimations in the
case of tetragonal distortions of simulation box geometries. In the
case of an elongated tetragonal cell, where L ≥ L = LHere, the D value represents the calculated one-dimensional apparent
diffusion coefficient in the z-dimension, with similar D and D values for x and y, a = L/L, and a0 = 2.79336, which is presented as a universal constant determined
from the authors’ extensive study of Lennard–Jones particle
simulations.[9] In principle, the modification
in parentheses is simply a fit function that approaches −1
when the a aspect ratio approaches 1, restoring the unperturbed cubic
correction term. In the case of D, the function in parentheses is taken to equal −1.
This indicates that diffusion along the long x-axis
is dependent upon the smaller L = L according
to the standard cubic correction, something observed in their, and
other, Lennard–Jones particle simulations.[9,11−13]It should be noted that the tetragonal corrections
for Lennard–Jones
particles proposed by Kikugawa et al. have recently been independently
placed on more analytical footing by Vögele and Hummer,[12] versions of which are being further explored
by Simonnin et al.[13] The extended expressions
use box lengths rather than aspect ratios and slightly different numerical
fitting constants, these with a focus on retracted tetragonal cells
(L < L = L) that are more appropriate for membrane simulations. In this
particular work, we focus on the original numerical forms proposed
by Kikugawa et al.,[9] with any suggested
extensions being fully modular and applicable to alternate expressions.In the case of diffusion of charged proteins or similar macromolecular
particles, it would seem a combination of eqs and 3 would provide
a route to calculating experimentally comparable diffusion coefficients.
Such an expression array would take the formwhere
again L ≥ L = L, and all
the relevant variables are the same as described previously. This
set of D0 relations was tested first for
pure water, where α = 1, and then for MCP-1 using an α
derived from the cubic systems.The effect of salt concentration
on the dynamics of the MCP-1 protein
was also investigated by simulating it in the presence of additional
ions beyond the five Cl– ions needed to counter
the net charge of MCP-1 at pH 7. Cubic simulation box systems with
the same dimensions as the water solvated MCP-1 systems were prepared
by random insertion of ions and water to target 0.125, 0.25, and 0.5
M NaCl concentrations. In addition to these buffered protein simulations,
simulations of just NaCl in water at the same concentrations needed
to be performed in order to calculate the solvent viscosity of the
electrolyte solutions. Like the analogous pure water simulations,
determination of solvent viscosity and the apparent diffusion coefficients
of these salt solutions used 3 to 6 nm simulation box edge-lengths.
Simulation
Methods
Molecular dynamics simulations were
performed using GROMACS 4.5.5.[15−17] The Amber99SB-ILDN force field
was used for modeling of protein,[18,19] while water
and ions were modeled using the TIP3P water model with the associated
Amber force field ion types.[20−22] Protein setup involved PDB2GMX
processing of the 1DOL Protein Data Bank structure,[23] followed by charge neutralization with five Cl– ions. All of the previously described systems were individually
prepared using random insertion of protein, Na+ ions, and
Cl– ions, followed by a culled-overlap insertion
of equilibrated water boxes.For each system size and simulation
type, 10 unique starting configurations were prepared and seeded with
random velocities from a Maxwell–Boltzmann distribution at
a target temperature of 310.15 K following 1000 kJ mol–1 nm–1 force converged steepest descent minimization.
All were equilibrated for 150 ps of isotropic constant-pressure (1
atm) and constant-temperature (310.15 K) equilibration with the Parrinello–Rahman
barostat and v-rescale thermostat using time constants of 10 and 1
ps, respectively.[24,25] Integration of the equations
of motion was performed using the leapfrog algorithm with a time step
of 3 fs, LINCS to constrain protein bond vibrations, and SETTLE to
keep TIP3P water molecules rigid.[26,27] Smooth particle-mesh
Ewald[28] with a real-space cutoff of 1.0
nm, spline order of 4, and energy tolerance of 10–5 was used for long-ranged electrostatics corrections. Lennard–Jones
interactions were cut off at 1.0 nm with applied long-range energy
and pressure corrections.[29]Molecular
dynamics trajectories were recorded for each simulation
immediately following the NPT equilibration. Cubic system protein
containing simulations were each run for 300 ns, while those without
protein were run for 30 ns. In the case of tetragonal protein containing
simulations, simulations were extended to 450 ns to provide additional
sampling, as the Dapp needed to be decomposed
into x-, y-, and z-dimension components. Given that there were 10 independent simulations
for each system composition and size, each tetragonal protein, cubic
protein, and nonprotein simulation data point involved 4.5 μs,
3 μs, and 300 ns of respective aggregate sampling.Apparent
diffusion coefficients were calculated using the Einstein
mean-square displacement (MSD) as a function of time relationwhere d is the desired dimension
(1, 2, or 3), and the displacement, r(t) – r(0), is taken only over the dimensions
of interest. The Dapp is simply calculated
from the slope of the linear region of the MSD as a function of the
time interval. For the diffusion of water, the sheer number of water
molecules results in excellent averaging, leading to a linear trend
over most time intervals greater than the subpicosecond mean first
collision time. The regression interval for waterDapp values was taken over the 1-to-15 ns, and standard
errors of these values were determined over the 10 independent simulation
trajectories. For single molecules, like MCP-1 in this study, the
averaging is much poorer. The regressions to determine protein Dapp values were taken over the 2.5-to-25 ns,
the observed most linear region. This choice was validated by subsequent
iterative expansion of the regression range to the point where changes
in the calculated Dapp were not distinguishable
from the accumulated error, and the resulting values overlapped with
those from this selected time interval. In cubic systems, d = 3 in eq , while for tetragonal systems, d = 1 and separate D, D, and D values were determined from particle displacement
solely in the respective dimensions.
Mass Relation Analysis
In order to derive general insight
into protein molecular weight (MW) to diffusion coefficient expressions,
surface area and volume calculations were performed on a set of 40
protein structures of increasing mass, from 3.7 to 48 kDa.[30] All nonstandard components of the structures,
such as ligands, ions, hemes, and water particles, were removed from
the selected structures in the interest of placing a protein sequence
knowledge-based restriction on the explored series. Hydrogen atoms
were cleared and added in idealized positions to Protein Data Bank
structures for all proteins (see Table S1 in the Supporting Information for the list of proteins), and in the
case of NMR structures with multiple deposited conformers, only the
first conformer was used. MSMS 2.6.1 was used to calculate the solvent-accessible
surface area (SASA),[31] solvent-excluded
surface area (SESA),[32] and solvent-excluded
surface area volume, all using a probe radius of 0.14 nm. While the
same probe is used for both the SASA and SESA calculations, the surfaces
differ in that the SESA represents the mostly smooth protein contact
surface, while the SASA is the surface traced out by the center of
the probe sphere and is not smooth. As the SASA originates from the
solvent probe center, it is inflated relative to the SESA and typically
reports larger values (see the table in the Supporting Information). One could physically describe the SESA as the
protein surface and the SASA as the protein surface modulated by bound
solvent. Regression of all these quantities versus protein MW provided
a slope, and potentially nonzero intercept if desired, to form a function
for estimation of a hydrodynamic radius, RH, given the approximation that the SASA, SESA, and/or volume be applied
to a sphere. These regression results were applied as coefficient
inputs in the volume or surface area forms of the following related
extensions of the standard Stokes–Einstein relationfor the MW to RH approximation via general protein volumes, andfor the MW to RH approximation via general protein surface
areas, either SASAs or
SESAs. In both of these equations, C0 and C1 come from the slope and intercept of the protein
geometric quantity versus MW regression, respectively. If the regression
intercept passes through 0 nm2 or nm3 in the
surface area or volume regressions, respectively, the C1 parameter is eliminated, leaving only C0 as a protein geometry to MW connection parameter. The
performance of these geometry-only tethered diffusion relations was
assessed through comparisons to experimentally measured values.
Results and Discussion
Model viscosity and experimentally
comparable diffusion coefficients
for TIP3P water were initially determined from a series of increasingly
sized cubic system simulations. The viscosity of the model at 310.15
K is needed for correction of noncubic system diffusion, water solvated
protein diffusion, and all conversions of D0 to Dη values
for experimental comparisons. The TIP3P water solvent was used in
this study due to its general acceptance in biomolecular simulations
more so than its ability to accurately reproduce the experimental
properties of real water. The large downside to this is that TIP3P
is known to diffuse two to three times faster than real water, indicating
that it has a significantly lower viscosity than experiment. In agreement
with previous findings,[7] we observe this
reduced viscosity with respect to experiment, where ηexpt = 6. 88 × 10–4 kg m–1 s–1 at this temperature.[33] The top panel of Figure shows the system size dependence of ηTIP3P, and as seen in other studies, it is insensitive to changes in simulation
system size. The average value of 2.80 ± 0.02 × 10–4 kg m–1 s–1 is in agreement with
the projected temperature trend observed by Venable et al.,[34] and it is expectedly much lower than ηexpt, resulting in a viscosity correction ratio in eq of 0.408.
Figure 2
Viscosity (upper) and
calculated Dapp and corrected D0 values (lower) for
TIP3P water at 310.15 K as a function of inverse box size. The ηTIP3P is insensitive to changes in system size, while the Dapp is quite strongly dependent on the choice
of simulation size. The linear projection of Dapp to infinite system size and average corrected value using eq are in agreement at 7.11
± 0.09 and 7.120 ± 0.005 × 10–5 cm2 s–1.
Viscosity (upper) and
calculated Dapp and corrected D0 values (lower) for
TIP3P water at 310.15 K as a function of inverse box size. The ηTIP3P is insensitive to changes in system size, while the Dapp is quite strongly dependent on the choice
of simulation size. The linear projection of Dapp to infinite system size and average corrected value using eq are in agreement at 7.11
± 0.09 and 7.120 ± 0.005 × 10–5 cm2 s–1.From the lower panel of Figure , it is clear to see why it is important
to consider
correction of the Dapp from individual
molecular simulations. The Dapp values
increase linearly with inverse box edge-length up to a D0 value of 7.11 ± 0.09 × 10–5 cm2 s–1 at infinite box size. When
applying eq with a
correction factor α = 1, there is strong agreement with this
infinitely dilute value given that the average 7.120 ± 0.005
× 10–5 cm2 s–1 overlaps the Dapp regression value within
error. Both of these values are significantly larger than the Dexpt(310.15 K) value of 3.04 × 10–5 cm2 s–1.[33] Correcting the calculated D0 value with
the viscosity correction eq results in a similar Dη = 2.904 ± 0.003 × 10–5 cm2 s–1, which, while slightly low, is in significantly
better agreement with the experimental value.
Corrected Diffusion Coefficients
for Water from Noncubic Simulation
Boxes Are in Good Agreement with Cubic Box and Experimental Values
Moving beyond perfectly cubic simulation cells, the ability to
accurately predict the diffusion coefficient for a molecule from an
alternatively shaped simulation box would be advantageous for modeling
studies of systems where a constraint is applied along one of the
cell dimensions. To evaluate the accuracy of eq , we performed TIP3P water simulations in
elongated tetragonal simulation boxes, where the x-dimension box length L was kept fixed at the maximum cubic box length of 6 nm, and L = L was reduced following the same series of
smaller box size dimensions explored in the cubic box correction test
above. To determine an overall Dη value from such noncubic simulations, each dimension was corrected
independently, and the resulting independent values were averaged. Table shows a comparison
of resulting Dη values from these
simulations alongside the cubic system results. They agree well as
the calculated error bars just overlap.
Table 1
Calculated Dη Valuesa for
TIP3P Water
at 310.15 K from Tetragonal and Cubic Simulations of Increasing System
Size
tetragonal
cubic
1/⟨Lz⟩
Nwat
Dη
1/⟨Lz⟩
Nwat
Dη
(nm–1)
(10–5 cm2 s–1)
(nm–1)
(10–5 cm2 s–1)
0.332
1780
2.92(2)
0.333
895
2.91(2)
0.284
2447
2.91(1)
0.286
1410
2.90(1)
0.247
3257
2.90(1)
0.250
2180
2.90(1)
0.221
4017
2.916(8)
0.222
3009
2.912(9)
0.199
4993
2.912(9)
0.200
4142
2.902(8)
0.182
5935
2.904(9)
0.182
5439
2.897(5)
0.167
7161
2.905(4)
0.167
7161
2.905(5)
avg:
2.911(4)
2.904(3)
With standard error of the last
digits in parentheses. .
With standard error of the last
digits in parentheses. .These tetragonal system diffusion results used the a0 aspect ratio constant numerically computed from Lennard–Jones
simulations.[9] We found that the resulting Dη values were not strongly perturbed by
moderate changes to this parameter. In fact, rounding the original a0 = 2.79336 value to a0 = 3 gives an average Dη result that overlaps within error. While we used the recommended
constant in all tetragonal simulation correction calculations, it
is possible that this empirical correction could be further refined
or simplified.
Corrected MCP-1 Diffusion Coefficients Are
Equivalent to Those
Determined from Infinitely Dilute Projections
While the corrective
techniques worked well in pure water simulations, charged biomolecule
simulations potentially pose a greater challenge given the need to
introduce and determine a potentially system specific α correction
factor. Such a correction factor needs to be extracted from a linear
regression of cubic simulation box Dapp values as a function of increasing inverse box edge-length. A plot
of this trend for the water solvated MCP-1 signaling protein is shown
in Figure , with the
regression intercept of 0.447 ± 0.027 × 10–5 cm2 s–1. This intercept was used to
set the α correction factor in eq to α = 0.710, resulting in an identical average D0 value of 0.447 ± 0.005 × 10–5 cm2 s–1. Correcting
for the viscosity of the model solvent results in a Dη of 0.182 ± 0.002 × 10–5 cm2 s–1.
Figure 3
Dapp and corrected D0 values for
TIP3P solvated MCP-1 at 310.15 K as a function
of inverse simulation box size. The α correction factor in eq was determined to be 0.710
from the intercept of the Dapp regression,
resulting in an average value of D0 =
0.447 ± 0.005 × 10–5 cm2 s–1.
Dapp and corrected D0 values for
TIP3P solvated MCP-1 at 310.15 K as a function
of inverse simulation box size. The α correction factor in eq was determined to be 0.710
from the intercept of the Dapp regression,
resulting in an average value of D0 =
0.447 ± 0.005 × 10–5 cm2 s–1.With this α value,
we used eq to evaluate
the tetragonal simulation cell diffusion
coefficients. Each of the dimension D values were separately corrected, and the averaged Dη results over all dimensions are displayed
in Table . The averaged
noncubic and cubic results overlap within the accumulated error, indicating
that using this extended correction equation is a reasonable strategy
for correcting charged biomolecule diffusion coefficients from tetragonal
simulation cells. One other general thing to note comes from comparing
the error values in Table with those in Table . In the case of water diffusion, the statistical error gets
progressively smaller with increasing system size, while for MCP-1
diffusion, there is no such trend. The lack of trend in the MCP-1
results is expected, because as the system size increases, the number
of water molecules increases, but only a single protein is present.
Since there is no increase in MSD data, unlike in the case of the
pure water trajectories, the trend in statistical error should be
flat with random fluctuations.
Table 2
Calculated Dη Valuesa for
MCP-1 in TIP3P
Water at 310.15 K from Tetragonal and Cubic Simulations of Increasing
System Size
tetragonal
cubic
1/⟨Lz⟩
Dη
1/⟨Lz⟩
Dη
(nm–1)
(10–5 cm2 s–1)
(nm–1)
(10–5 cm2 s–1)
0.200
0.191(4)
0.200
0.183(3)
0.166
0.179(3)
0.166
0.185(5)
0.142
0.170(4)
0.142
0.175(6)
0.124
0.173(4)
0.124
0.182(3)
0.111
0.182(2)
0.111
0.186(7)
avg:
0.179(2)
0.182(2)
With standard error of the last
digits in parentheses.
With standard error of the last
digits in parentheses.
Effect
of Salt Concentration on Protein Diffusion Is Only Partly
Modeled by Solvent Viscosity Corrections
Salt or other cosolvents
change the viscosity of solution environments. Incorporation of salt
effects on protein dynamics is thus handled in an implicit fashion
through determination or estimation of a system specific solvent η
value. Given that we model salt ions explicitly in our molecular simulations,
we can test this implicit salt effect by calculating a protein’s
diffusion coefficient as a function of salt concentration. To do this,
TIP3P water simulations with increasing concentrations of NaCl were
performed in order to determine a trend in η for these electrolyte
solutions. While this trend is important for application in finite
size simulation corrections described above, it is critical for viscosity
correction of D0 if experimental comparisons
are expected. Table shows the calculated viscosities of electrolyte solutions as a function
of increasing salt concentration. In general, a linear trend is observed,
as in experiments,[33] though the reported
statistical error is generally somewhat large as these values are
averaged over only four system sizes, excepting the 0 M concentration
result which was calculated from the pure water simulations discussed
previously.
Table 3
Calculated η Values, Experimental
η Values,[33] α Correction Parameter,
and Calculated Dη for NaCl in TIP3P
Water at 310.15 K with Increasing NaCl Concentration
[NaCl]
ηsim
ηexpt
α
Dη
(M)
(10–4 kg m–1 s–1)
(10–4 kg m–1 s–1)
(10–5 cm2 s–1)
0.0
2.80(2)
6.88
0.710
0.182(2)
0.125
2.84(11)
6.95
0.662
0.166(1)
0.25
2.99(6)
7.04
0.642
0.159(2)
0.5
3.25(10)
7.20
0.606
0.157(1)
These viscosity
values were used in the construction of viscosity
correction ratios for computing Dη values from simulations of an MCP-1 protein monomer in aqueous environments
of increasing ionic strength. To account for possible changing of
the α correction factor in eq with changes in ionic strength, both α and Dη values were determined via Dapp regression as a function of system size, and a new
α value was determined for each concentration. The resulting
α values in Table follow a generally decreasing trend, indicating that this correction
factor may not be purely dependent upon the formal charge of the solute.[8] There appears to be some degree of coupling between
the solute protein and solvent charges at the given cosolvent concentrations.In hindsight, this change in α should not be that unexpected
given how solvent permittivity changes in response to increasing ionic
strength.[35] With the introduction of salt,
water is increasingly electrostricted in solvation shells about the
ions, decreasing the static dielectric constant of the environment.
For example, with a +5 net charge protein like MCP-1 here, the experimental
Bjerrum length λB, the distance when the Coulombic
potential energy between the solute and an external formal charge
is equal to kBT, is 3.5
nm at 298.15 K. If the static dielectric constant of the environment
decreases from 80 to 70, λB would increase to 4.0
nm. Even before considering nonuniform perturbations of the electrostatic
environment about the protein, there is an increasing interaction
range with the decreasing solution permittivity. This increased range
can further emphasize the need for the overly strong interaction correction
that the α parameter provides.Plotting the explicitly
calculated Dη values alongside projected Dη values
due solely to changes in viscosity as a function of increasing NaCl
concentration in Figure highlights a somewhat unexpected trend. Projecting the trend in
diffusion simply through changes in experimental viscosity with salt
concentration shows a modestly sloped decrease in Dη. Explicit calculation of the diffusion coefficient
shows a significantly more dramatic sudden drop with the addition
of salt ions and a somewhat similar smooth progression beyond this
point. Why is this Dη response so
different? Figure shows the normalized ion probability density function with respect
to distance from the protein surface for both Cl– and Na+ ions over these salt concentrations. Without
the introduction of NaCl, Cl– counterions distribute
preferentially near the surface of the protein, forming a negatively
charged ion cloud. With the introduction of 0.125 M NaCl, both Cl– and Na+ ions partition into this cloud
in a drive to neutralize the net protein charge (see illustrations
in Figure ). This
increases the effective concentration of ions in the immediate protein
environment, localizing ions at or near the protein surface to increase
both the protein hydrodynamic radius and the viscosity of the local
solvent environment. This effective cloud of ions moves with the protein,
acting to slow the diffusion of MCP-1 more than would be expected
in a uniform salt environment.
Figure 4
Simulation Dη for MCP-1 at 310.15
K as a function of NaCl concentration (black circles) alongside a
simple experimental viscosity-based prediction of the diffusion coefficient
(blue triangles). Of interest is the deviation of the actual diffusivity
from the predicted trend based solely on change in solvent viscosity.
The simulation Dη values show a
stronger dependence upon salt concentration, primarily through a sudden
drop with the initial introduction of NaCl.
Figure 5
Normalized probability density function
of ion occupancy as a function
of distance from the MCP-1 protein surface over the different simulation
salt concentrations. In the case of Cl– ions (left),
the counterion cloud at 0 M sees an enhanced occupancy probability
near the protein surface relative to further out in solution as illustrated
below. As the salt concentration increases, there is an influx of
both Cl– and Na+ ions into the inner
protein ion cloud in a drive to neutralize the net protein charge.
Simulation Dη for MCP-1 at 310.15
K as a function of NaCl concentration (black circles) alongside a
simple experimental viscosity-based prediction of the diffusion coefficient
(blue triangles). Of interest is the deviation of the actual diffusivity
from the predicted trend based solely on change in solvent viscosity.
The simulation Dη values show a
stronger dependence upon salt concentration, primarily through a sudden
drop with the initial introduction of NaCl.Normalized probability density function
of ion occupancy as a function
of distance from the MCP-1 protein surface over the different simulation
salt concentrations. In the case of Cl– ions (left),
the counterion cloud at 0 M sees an enhanced occupancy probability
near the protein surface relative to further out in solution as illustrated
below. As the salt concentration increases, there is an influx of
both Cl– and Na+ ions into the inner
protein ion cloud in a drive to neutralize the net protein charge.To better illustrate quantitatively
how the ions partition about
the MCP-1 signaling protein, we radially integrated populations of
both the Cl– and Na+ ions about the geometric
center of the flexible/dynamic protein. Figure shows the to-scale results of this ion population
analysis as a function of increasing salt concentration. As the protein
is flexible in the molecular dynamics simulations, it is difficult
to tightly resolve the preferred absolute location of ions relative
to protein features, so we integrated the protein atom density out
to the 90% occupancy level to use as an average protein contact surface.
This is shown in the panels of Figure as an orange background circle with radius rp90. The rs1 and rs2 come from integrating the Cl– counterion density out to the first and second ion occupancy values,
respectively, and these are taken to represent the first and second
ion cloud shells about the protein. As a metric of the nonuniformity
or nonideality of ion populations in the environment around the protein,
we calculate a nonideality ratio ϕNIwhere nions,s# is the integrated number of Cl– or Na+ ions out to either the first or second ion cloud shells as indicated,
and Vs# is the volume of the indicated
ion cloud shells. A ϕNI value is the ratio of ion
number densities in the inner (ρions,s1) to the outer
(ρions,s2) cloud shell, so a value of 1 indicates
a balanced ion density between the two ion cloud shells. Values greater
or less than 1 indicate increasingly biased ion populations either
near to or away from the protein’s surface, respectively.
Figure 6
Integrated
Cl– (left panels) and Na+ (right panels)
ion populations about the 90% protein particle occupancy
surface as a function of salt concentration. The bottom right of each
panel lists the ϕNI nonideality ratio for ion populations
in the ion cloud shells, while the bottom left of the Cl– panels lists the net charge within the outer ion cloud sphere. The
circled numbers indicate the net charge within each cloud shell, rounded
to the nearest formal charge, and the color of the Cl– shells indicates the asymmetry in absolute charge between them.
In all Cl– panels, the ϕNI is greater
than unity, indicating a neutralization driven pressure for populating
anions at the protein surface. This is further seen by the sudden
drop of the net charge within the rs2 sphere
upon addition of salt going from 0 to 0.125 M NaCl, with the protein
near fully neutralized by 0.5 M NaCl.
Integrated
Cl– (left panels) and Na+ (right panels)
ion populations about the 90% protein particle occupancy
surface as a function of salt concentration. The bottom right of each
panel lists the ϕNI nonideality ratio for ion populations
in the ion cloud shells, while the bottom left of the Cl– panels lists the net charge within the outer ion cloud sphere. The
circled numbers indicate the net charge within each cloud shell, rounded
to the nearest formal charge, and the color of the Cl– shells indicates the asymmetry in absolute charge between them.
In all Cl– panels, the ϕNI is greater
than unity, indicating a neutralization driven pressure for populating
anions at the protein surface. This is further seen by the sudden
drop of the net charge within the rs2 sphere
upon addition of salt going from 0 to 0.125 M NaCl, with the protein
near fully neutralized by 0.5 M NaCl.The most distinct aspects of the panels in Figure are the large ϕNI values
of the Cl– ion cloud populations versus the near
uniform spatial distribution of Na+ ions and the change
in net charge within the ion cloud environments. The MCP-1 protein
has a +5 net charge, so counteranions populate preferentially nearby
in order to neutralize this large formal charge. With no added salt,
the large ϕNI value indicates that this neutralization
pressure is a strong drive, though it is opposed by counterion distribution
entropy. With the addition of 0.125 M NaCl, there is a sudden drop
of the net charge within the ion cloud environment, indicating that
the introduction of salt makes it easier to locally neutralize the
protein charge, and the ϕNI greater than unity means
that this neutralization is preferentially localized at the immediate
protein surface. Increasing the salt concentration to 0.5 M leads
to near complete neutralization of the protein within this ion cloud
environment.This ion cloud population behavior has implications
for the protein
diffusion coefficient in that the protein does not diffuse independent
of its environment. The ion cloud has an outsized gain of Cl– ions with even a small addition of salt in the protein solution
environment. This leads to an increased drag, as the denser ion cloud
needs to be pulled with the protein as it translates. This condition
is consistent with the observed sudden drop in the diffusion coefficient
seen in Figure with
addition of 0.125 M NaCl.
Protein Surface Area Connects to Hydrodynamic
Radius More Strongly
than Protein Volume
The calculated diffusion coefficients
in the previous work are internally consistent, with cubic and noncubic
simulation results giving nearly identical values. In the case of
pure water, the Dη values compare
well with experimental values. How do the MCP-1 diffusion coefficients
compare with experimental results?In the absence of direct
experimental diffusion data, estimations of a given protein’s
diffusion coefficient can be crafted from molecular-weight-based relations.
One of the most common is[3]where A is a connection constant
that can be determined from known MW and diffusion data from single
or sets of proteins. For example, using both the MW and D20, of hemoglobin,[4,36] this
constant has been estimated to be 2.82 × 10–5 cm2 s–1 g1/3 mol–1/3.[37] Using this coefficient, the diffusion
coefficient of MCP-1 (mass: 8.144 kg mol–1) in pure
water at 20 °C would be Dexpt = 0.
140 × 10–5 cm2 s–1. This is ∼30% off the calculated value, and this difference
mostly seems to come from inflexibility in altering system temperature
and viscosity conditions in eq . If one considers the temperature change and resulting change
in viscosity moving to 37 °C in the standard Stokes–Einstein
relationthe estimated Dexpt would increase to 0.216 × 10–5 cm2 s–1. Instead of the calculated MCP-1 value being
30% faster than this experimental estimate, the calculated value now
appears to be ∼16% slower. A closer look at the data behind
the recommended A coefficient uncovered that the
MW of hemoglobin reported in the cited source is about 5.5% too large.[4] The mass comes from an early study on horse hemoglobin
reported by Svedberg and Pedersen,[2] while
the D20, comes from
a different study on human hemoglobin by Lamm and Polson.[38] Correcting this to use the known weight of human
hemoglobin lowers the experimental estimate to 0.210 × 10–5 cm2 s–1, making the
calculated MCP-1 Dη only 13% too
low. Polson actually recommended an A value of 2.74
× 10–5 cm2 s–1 g1/3 mol–1/3, built from a wider statistical
fit to accepted diffusion coefficients and MWs at that time.[3] Using this A and applying corrections
for differences in temperature and viscosity gives a Dexpt = 0. 194 × 10–5 cm2 s–1, which is much closer to the calculated value
of 0.182 × 10–5 cm2 s–1. Further corrections for molecular asymmetry and bound water could
potentially lead to even greater agreement,[4,34,39−41] though the statistical
nature of the fitting process behind the A value
likely incorporates these considerations to some degree.While
it is encouraging that eq can be readily adjusted to alter results and potentially
improve agreement with expected values, this variability and statistical
foundation also hides some of the reasoning for how and why it works.
This particular mass-to-diffusion relation, like many other similar
relations,[42−44] is based on two implicit assumptions: (1) proteins
and other such molecules have uniform densities, and (2) the hydrodynamic
radius, RH, is well approximated by the
radius of a sphere with a volume similar to that from the protein
mass divided by this uniform density. These approximating assumptions
came from before the first crystal structures of proteins were determined[45] and were actually adopted to estimate molecular
weights from measured diffusion coefficients.[3]Unlike when many of these relations where first devised, we
now
have many known protein structures whose volumes and surface areas
we can directly calculate, and this body of knowledge provides significantly
more detailed information from which to construct MW to diffusion
relations. Is it possible to form an estimate of the diffusion coefficient
from only a protein’s sequence without statistically building
in prior experimental knowledge of diffusion coefficients? Also, there
have been many studies exploring volume and surface area relations
for estimating the hydrodynamic radii of small molecules up through
protein-sized structures.[39,40,44,46−49] Is the volume or the surface
area a better geometric property to work through in estimating a hydrodynamic
radius? Answering these questions could potentially lead to some additional
insight into the driving forces behind diffusion in biomolecular systems.To test this assumption, we calculated
the SASA, SESA, and SESA
volume for a set of 40 proteins with MW ranging from 3.7 to 48 kDa.[30] In order to base this effort purely upon sequence
information, all nonstandard residue components, such as ligands,
ions, hemes, water, etc., were removed from this set of structures,
as mentioned in the Methods section. Figure shows the data trends
for these calculations as a function of MW. In all three cases, the
trends are linear, though there is slightly more scatter in the surface
area trends than the volume trend. Inserting the MW into linear fit
functions from these trends would be a quick way to directly estimate
a volume or surface area of a protein in the case that we only know
the protein’s sequence and thus MW. These values could then
be converted directly into the RH needed
in the Stokes–Einstein relation. This is the goal of the proposed
volume and surface area mass-to-diffusion relations in eqs and 7.
Figure 7
Calculated
SESA volume, SASA, and SESA for a series of 40 protein
structures, plotted as a function of molecular weight. All three are
linear trends, though with differing slopes and differing intercepts
if regression of the data is not set to pass through the origin.
Calculated
SESA volume, SASA, and SESA for a series of 40 protein
structures, plotted as a function of molecular weight. All three are
linear trends, though with differing slopes and differing intercepts
if regression of the data is not set to pass through the origin.The slopes and intercepts of the
linear functions in Figure were converted into a form
that could be directly inserted into the two mass-to-diffusion relations.
Two sets of parameters were determined for each of the SASA, SESA,
and SESA volume trends. In one set, the regression is fixed to pass
through the origin, so only the slope, and hence C0 parameter, is used. In the other set, both the slope
and intercept are used in the form of C0 and C1 parameters. In practice, the C1 parameter acts as a general mass correction,
providing some flexibility to the fitting trend if needed. Table shows the resulting
parameters for each of the fits. In these cases, the Vol0 and Vol1
sets are intended for the volume-based relation, eq . The remaining surface area sets are intended
for the surface area-based relation, eq .
Table 4
Protein MW to Geometry C0 and C1 Parameters Extracted
from Fits to Data in Figure for the Proposed Volumea or Surface
Areab Mass-to-Diffusion Relations, Both
alongside the Predicted Dη for MCP-1
at 310.15 K Using the Listed Parameters in Their Associated Equation
method
C0
C1
MCP-1 Dη
(10–4 cm K–1 kg4/3 mol–1/3)c
(kg mol–1)
(10–5 cm2 s–1)
(10–4 cm K–1 kg3/2 mol–1/2)d
Vol0
1.112
0.000
0.2466(7)
Vol1
1.107
–0.382
0.249(1)
SASA0
1.251
0.000
0.196(4)
SASA1
1.389
5.550
0.167(3)
SESA0
1.334
0.000
0.209(4)
SESA1
1.432
3.648
0.186(4)
Eq .
Eq .
Units for Vol0 and Vol1.
Units for SASA0, SASA1, SESA0, and
SESA1.
Eq .Eq .Units for Vol0 and Vol1.Units for SASA0, SASA1, SESA0, and
SESA1.Initial testing of
these relations was performed on the MCP-1 system
to see which of the approaches was able to best reproduce the explicit
simulation results. In them, T = 310.15 K, and the
η was set to the experimental viscosity of water at this temperature,
6.95 × 10–4 kg m–1 s–1. The primary trend in these results is that the surface-area-based
relations outperform the volume-based relation when it comes to agreeing
with the molecular simulation value of 0.182 × 10 cm2 s–1. Given that diffusion
in molecular systems is primarily affected by collisions between particles,
it is understandable that surface area would be a more well connected
geometric property to the RH. Also, in
these cases, the inclusion of the C1 parameter
seemed to make little if any difference. SESA1 seems to benefit from
this mass correction, but Vol1 and SASA1 show only minor to no improvement
over the Vol0 and SASA0 variants.It is important to note that
the only structurally dependent input
is the MW of MCP-1. The resulting predictions rely upon the assumption
that the MW is connected to volume or surface area in the same manner
as other proteins of similar MW, specifically the broad set used in
developing the MW-to-geometry relation as depicted in Figure . Without specific considerations
of protein flexibility,[50] cosolvent characteristics,
and dimerization propensity,[51] an MW-based
prediction will be intrinsically limited in accuracy. Of these specific
considerations, multimeric propensity would be the easiest to treat.
The predicted diffusion coefficient would simply come from an ensemble
average of the properly weighted predictions from the multimer state
MWs. In the case of MCP-1 here, the comparisons are to calculated
values that by design do not have any possibility for dimerization.The above result is only a single protein molecular weight data
point compared with our corrected computational diffusion coefficients,
making it difficult to fully evaluate the relations. Table shows the results for prediction
of D0 using only sequence MW for a series
of proteins provided by Tanford.[4] Again,
only the molecular weight of each of these proteins was used as an
input for estimating the diffusion coefficient. Across this series,
it appears that approximating the RH,
and thus the overall diffusion coefficient at a given T and solvent η, via a MW to volume relation tends to predict
slower diffusion coefficients than experiment, just as in the diffusion
predictions for MCP-1 in Table . In the case of the MW to SASA trend, the predicted diffusion
coefficients are faster than experiment, which also appears consistent
with previous results. SESA sits between these two extremes and better
reproduces the experimental D0 values.
It should be noted that while a 19–21% root-mean-squared deviation
is an improvement over volume and SASA, pinning the trend to a known
diffusion coefficient, as done using the traditional eq relation, will be about twice as
accurate. This is particularly true if the molecules used for determining
the A coefficient are part of the evaluation set.
The interest here is that no previously known diffusion information
was provided. The trend in predicted diffusion coefficients with increasing
MW seems to hold regardless of using volume or surface area. This
indicates that a better accounting of the solvent-excluded protein
contact surface would potentially lead to more accurate predictions
of protein diffusion than currently possible using knowledge-based
diffusion statistics.
Table 5
Volume-a and Surface-Area-BasedbD0 Predictions Using Only MW
Knowledge of a Protein Sequence
Compared with Experimental Protein D0 Values[4]
protein
MW
expt.
Vol0
Vol1
SASA0
SASA1
SESA0
SESA1
(kDa)
(10–5 cm2 s–1)
ribonuclease
13.7
0.1190
0.1360(4)
0.1366(5)
0.099(2)
0.093(2)
0.105(2)
0.101(2)
lysozyme
14.3
0.1040
0.1340(4)
0.1345(5)
0.097(2)
0.091(2)
0.103(2)
0.099(2)
chymotrypsinogen
23.4
0.0950
0.1137(3)
0.1137(5)
0.076(2)
0.075(2)
0.081(2)
0.080(2)
β-lactoglobulin
35.2
0.0782
0.0993(3)
0.0991(4)
0.062(1)
0.064(1)
0.066(1)
0.067(1)
ovalbumin
41.9
0.0776
0.0937(3)
0.0935(4)
0.057(1)
0.059(1)
0.060(1)
0.062(1)
hemoglobin
64.5
0.0690
0.0812(2)
0.0809(3)
0.0456(9)
0.049(1)
0.049(1)
0.051(1)
serum albumin
66.3
0.0594
0.0822(2)
0.0820(3)
0.0465(9)
0.049(1)
0.050(1)
0.052(1)
catalase
227.1
0.0410
0.0533(2)
0.0531(2)
0.0243(5)
0.0266(5)
0.0259(5)
0.0276(6)
MUE
0.25
0.24
0.24
0.22
0.19
0.18
RMSD
0.26
0.26
0.26
0.23
0.21
0.19
Eq .
Eq .
Eq .Eq .
Conclusions
Accurate computational determination of the diffusion coefficient
of biomolecules is a challenging problem. The finite sizes necessary
for performing molecular simulations of analogous systems pose a significant
computational burden, and even accounting for this, a series of increasingly
large simulations often needs to be performed in order to correct
the dynamical quantities to make them experimentally comparable. Here,
we show that it is possible to determine refined diffusion coefficients
for a small signaling protein, MCP-1, even in noncubic simulation
cells. The approaches used in this work were validated against experimental
values for the diffusion coefficient and viscosity of water as well
as experimentally based diffusion coefficient estimation from protein
molecular weight.In addition to pure water solvent simulations
of counterion neutralized
MCP-1, we explored the effect of salt concentration by performing
similar simulations in the presence of 0.125, 0.25, and 0.5 M NaCl.
We find that the ions populate the local environment around the protein
to better counter its net charge and ion induced solvent charge inhomogeneities.
This action, even at very low ionic strength, works to alter the local
solvent environment around the protein and cause it to slow down more
than expected from simple uniform viscosity changes due to the presence
of salt.Finally, this computational study of biomolecular diffusion
motivated
us to present a geometric investigation of molecular-weight-based
relations for prediction of diffusion coefficients. Using only the
connection between a protein’s mass and the volume, SASA, or
SESA, we show that it is possible to predict the diffusion coefficients
of proteins without previous input knowledge of their diffusivity.
While the predictions using the proposed relations are not as accurate
as those that use statistical fitting to known protein diffusion data,
this effort provides an initial foundation for exploring alternate
routes to general independent predictions of biomolecule dynamics.
Authors: David Van Der Spoel; Erik Lindahl; Berk Hess; Gerrit Groenhof; Alan E Mark; Herman J C Berendsen Journal: J Comput Chem Date: 2005-12 Impact factor: 3.376
Authors: David A Case; Thomas E Cheatham; Tom Darden; Holger Gohlke; Ray Luo; Kenneth M Merz; Alexey Onufriev; Carlos Simmerling; Bing Wang; Robert J Woods Journal: J Comput Chem Date: 2005-12 Impact factor: 3.376
Authors: Richard M Venable; Elizabeth Hatcher; Olgun Guvench; Alexander D Mackerell; Richard W Pastor Journal: J Phys Chem B Date: 2010-10-07 Impact factor: 2.991