Implicit solvation is a mean force approach to model solvent forces acting on a solute molecule. It is frequently used in molecular simulations to reduce the computational cost of solvent treatment. In the first instance, the free energy of solvation and the associated solvent-solute forces can be approximated by a function of the solvent-accessible surface area (SASA) of the solute and differentiated by an atom-specific solvation parameter σ(i) (SASA). A procedure for the determination of values for the σ(i) (SASA) parameters through matching of explicit and implicit solvation forces is proposed. Using the results of Molecular Dynamics simulations of 188 topologically diverse protein structures in water and in implicit solvent, values for the σ(i) (SASA) parameters for atom types i of the standard amino acids in the GROMOS force field have been determined. A simplified representation based on groups of atom types σ(g) (SASA) was obtained via partitioning of the atom-type σ(i) (SASA) distributions by dynamic programming. Three groups of atom types with well separated parameter ranges were obtained, and their performance in implicit versus explicit simulations was assessed. The solvent forces are available at http://mathbio.nimr.mrc.ac.uk/wiki/Solvent_Forces.
Implicit solvation is a mean force approach to model solvent forces acting on a solute molecule. It is frequently used in molecular simulations to reduce the computational cost of solvent treatment. In the first instance, the free energy of solvation and the associated solvent-solute forces can be approximated by a function of the solvent-accessible surface area (SASA) of the solute and differentiated by an atom-specific solvation parameter σ(i) (SASA). A procedure for the determination of values for the σ(i) (SASA) parameters through matching of explicit and implicit solvation forces is proposed. Using the results of Molecular Dynamics simulations of 188 topologically diverse protein structures in water and in implicit solvent, values for the σ(i) (SASA) parameters for atom types i of the standard amino acids in the GROMOS force field have been determined. A simplified representation based on groups of atom types σ(g) (SASA) was obtained via partitioning of the atom-type σ(i) (SASA) distributions by dynamic programming. Three groups of atom types with well separated parameter ranges were obtained, and their performance in implicit versus explicit simulations was assessed. The solvent forces are available at http://mathbio.nimr.mrc.ac.uk/wiki/Solvent_Forces.
Proteins have evolved to function within
the water–rich
environment of the cell. Adaptation to the particular solvation properties
of water, such as strong electrostatic shielding, hydrogen–bond
donor/acceptor saturation, and entropic effects, led to the known
segregation of predominantly hydrophobic residues in the core and
polar/charged amino acid residues on the protein surface. The distribution
of residue types on the protein surface determines its interaction
with the surrounding bulk solvent and with other solute molecules.[1] These interactions define to a large extent the
conformational equilibria and biological function of a protein. The
range of accessible conformations under physiological conditions is
the result of a delicate balance between competing forces: (i) highly anisotropic intraprotein interactions and (ii) approximately isotropic bulk–solvent interactions.
It is therefore not surprising that the presence of water has become
an integral part of protein folds by stabilizing secondary structure
elements and their association.[2−4]Biomolecular simulations
account for the presence of water in the
native environment either explicitly, by inclusion of water molecules,
or implicitly, by approximating the mean force exerted by the water
on the biomolecule. The latter is considerably faster to compute,
because the implicit solvent does not contribute any degrees of freedom
to the simulation, although it comes at the expense of a neglect of
specific features such as water dipole orientation and hydrogen bond
fluctuations at the surface of the solute. The extent to which a solvent
model (explicit or implicit) can realistically reproduce the dominant
physical forces in dynamic protein structures is therefore crucial
to its success in describing conformational equilibria. Implicit solvation
may be the best (or sole) choice for systems with a large number of
degrees of freedom, for systems whose reference experiment spans a
time scale inaccessible to current state-of-the-art explicit solvent
Molecular Dynamics (MD) and also for enhanced sampling simulations,
where conformational changes induced in the solute would lead to clashes
with explicit water molecules.[5,6]Because of their
computational efficiency, implicit solvent models
have been used in a large variety of computational studies, e.g. folding
simulations,[7] energy scoring of protein
structures,[8] structure prediction and design,[9,10] and membrane simulations.[11,12] Each model relies on
approximations of the mean force contribution of the solvent to the
overall energy of the solute molecule.The starting point of
most models is the use of a first-shell approximation
of the solvent effect, i.e. the assumption that the force on a solute
atom exerted by the solvent is on average proportional to the solvent-accessible
surface area (SASA) of the solute atom. This simple approximation
can be complemented by long-range electrostatic solvent contributions
based on the approximation that the bulk solvent behaves as a dielectric
continuum, leading to Poisson–Boltzmann (PB) or Generalized
Born (GB) energy terms to describe the electrostatic interactions
between the solvent and the partial charges of the solute.[13] Mixed models with GB/SASA terms are now widely
used[14−17] and are successful in predicting binding free energies. Comparisons
of the performance of different implicit solvent models[18−22] revealed the lack of an accurate implicit treatment of nonpolar
solvation in most models.[23] Complementing
the surface term with a volume term improves the description of long-range
solute–solvent interactions[24] and
nonpolar contributions.[25,26]In the past we
have presented an efficient implicit solvent model
for use in MD simulations based on a fast analytical approximation
to the SASA.[27] The energy of solvation
in this and other SASA-based models is expressed simply as V = ΣσA, where A denotes the SASA of atom i and σ an atom-specific solvation energy per surface
area parameter, which is empirical in nature. The analytical formula
used in the model for the fast evaluation of SASA is based on nearest-neighbor
distances and was published by Hasel et al.[28] This model was incorporated into the GROMOS simulation package[29,30] and appropriate atomic σ parameter
values compatible with the GROMOS force field parameter set 43A1 for
biomolecular solutes were proposed.[27] The
same model with a virtually identical parametrization was later used
in conjunction with the CHARMM force field,[31] showing the validity of the solvation parameters independent of
the solute force field employed.Here we describe the derivation
of values for the σ parameters for the
atom types of the GROMOS force
field 43A1 via force matching within the framework
of large-scale explicit solvent MD simulations, where the time-averaged
explicit forces on each solute atom type exerted by the explicit solvent
molecules are transformed into an implicit mean solvation force and
used to derive solvation parameters. The forces on the protein atoms
due to the explicit water molecules averaged over the trajectories
have been made available on our Web server to encourage further development
of the implicit solvation model and the refinement of parameters in
protein force fields other than the GROMOS one.
Methods
A Mean Plus Stochastic Force Representation of the Solvent
If the solvent degrees of freedom are not explicitly simulated,
one may approximate the force f exerted
by the solvent on atom i of the solute by a mean
force f, which can be derived
from a potential energy term, a potential of mean force V(r)which represents the averaged effect of the
omitted solvent degrees of freedom on the solute. A solute configuration
is represented by r ≡
(r1,r2,...,r), the Cartesian coordinates of all N solute atoms. A higher-order approximation of the force f exerted by the solvent
on the solute is obtained by considering not only its mean effect
but also its fluctuations in time and its frictional effect[32]The stochastic force is denoted by f, and the frictional force is proportional
to the solute atom velocity v with proportionality factor mγ, in which γ is the atomic friction coefficient and m is the mass of solute atom i. A stochastic force introduces energy into the system
and a frictional force removes energy from it. The condition of zero
energy loss on average will relate the two forces. If the stochastic
force f obeys a Gaussian probability
distribution with zero mean, if it is not correlated with prior velocities
or forces, and if the friction coefficient is independent of time,
this condition readswhere a (time) average is denoted by ⟨···⟩, k is Boltzmann’s constant,
and T is the reference
temperature of the system. Combination of eq 2 with Newton’s equation of motion leads to the stochastic
Langevin equation of motionin which f represents the intrasolute forces exerted by the atoms of
the solute on solute atom i. Numerical integration
of such a stochastic equation of motion is called stochastic dynamics
(SD) simulation.[33] More sophisticated forms
of SD than eqs 3 and 4 can be obtained by incorporating temporal and spatial correlations
in the description of the stochastic force f. f(r) are calculated from the solute
configuration r using the
GROMOS biomolecular force field. The mean force f(r)
due to the solvent is obtained from its energetic contribution to
solvation of a solute molecule, which is here treated as approximately
proportional to the molecular SASA, given by the sum of the atomic
SASA contributions AThe derivative of eq 5 with respect to r yields
the implicit solvent force f on
atom iThe implicit solvent force is proportional
to the atomic SASA change ∂A that is associated with a change ∂r. An analytical formula for the SASA computation
has been described elsewhere;[27,28] it is recapitulated
here for completeness of the methodological procedure. The SASA of
atom i is given by the analytical formulawhere the parameter S denotes the surface of the isolated atom i, and the terms p, p, and b are geometric parameters
that describe the reduction of SASA depending on the atom type i and the neighbor atom types j and r = |r–r|. The derivative (∂A)/(∂r) as
required for eq 6 is given in the Appendices
of Hasel et al.[28] and Allison et al.[24] Geometric parameters are reported in Hasel et
al.,[28] and specific p values for the GROMOS[29] atom types are reported in Fraternali and van Gunsteren.[27]The stochastic force f (t) and the atomic friction
coefficient γ will only be sizable
for solute atoms at the surface. Therefore, they are taken dependent
on the number of neighbor atoms[32]withwhere N(t) denotes the number of non-hydrogen neighbor
atoms of the solute within 0.3 nm radius, and N was defined as an upper limit of 6 neighbor
solute atoms at which solvent forces on solute atom i are assumed to vanish.
A Procedure To Determine the Implicit Solvation Parameters σ
Previously,[27] the parameters σ of the model were
derived by a calibration of the radius of gyration, and the hydrophobic
and hydrophilic SASA obtained in MD simulations with the implicit
solvent model against these quantities obtained in MD simulations
with explicit water molecules, for three proteins of different sizes.
Here we propose an alternative procedure in which the σ are determined such that the implicit solvation
forces f on the solute atoms i match as closely as possible the corresponding average
forces ⟨f⟩ that
are exerted on the solute atoms i by the solvent
molecules in an explicit solvent MD simulation.This matching
of f to ⟨f⟩ is not straightforward though.For a given solute configuration r, the direction of f is determined by eq 6, i.e. it is roughly perpendicular to the SASA A in the neighborhood of atom i, while the direction of ⟨f⟩ is determined by an average of the configurations
of many explicit solvent molecules (Figure 1), which means that f cannot faithfully
represent ⟨f⟩. In
order to minimize noise in the calibration data set we omit those
atoms for which the projection of ⟨f⟩ onto f is smaller than its component orthogonal to f.
Figure 1
Force matching. Projection of the explicit solvent force f on a solute atom onto
the unit vector of the implicit solvent force f̂ direction yields the implicit solvent
force f. The implicit force direction
is determined by the derivative −∂A/∂r.
The forces f (r)
and ⟨f⟩ are straightforwardly
matched only when the solute configuration r is the same for all the solvent configurations over
which the average in ⟨f⟩
is carried out. Therefore, we keep the solute configuration r fixed in the MD simulations
that were used to obtain the average forces due to the explicit solvent
molecules.The ⟨f⟩ are obtained from
MD simulations of proteins
in explicit solvent in which the solute force field is compatible
with the solvent one, in the present case the GROMOS force field[29,34] 43A1, which is compatible with the simple-point charge (SPC) water
model.[35] When omitting the solvent degrees
of freedom in an implicit solvent simulation, their dielectric screening
effect should somehow be retained. In the GROMOS 43B1 solute force
field for in vacuo simulations, this is achieved
by adapting the partial charges of groups of atoms that bear a total
charge of ±1e, such as in Asp, Glu, Arg, and
Lys side chains, such that their net charge becomes zero, while their
hydrogen-bonding capacity is retained.[29]
Determination of the σ Parameter
Values
The implicit solvation force f on each atom i should match
the strength of the average explicit force ⟨f⟩ exerted by the surrounding solvent along
the direction of fwhere the hat ∧ indicates
a vector of unit length, the ensemble average ⟨⟩ is
over the MD configurations from a simulation in explicit solvent,
and the index i denotes a particular atom in a given
molecule. This projection is illustrated in Figure 1. Solving eq 6 and eq 10 for σ yieldswhich means that the σ parameter values can be obtained from MD simulations: (i) the SASA derivative ∂Ai/∂ri from an implicit solvation simulation and (ii) the mean solvation force ⟨f⟩ from an explicit solvation simulation,
under the constraint that the protein conformation is identical in
both MD simulations.The angle between f and ⟨f⟩ is given by
Selection of the Reference Protein Set
The protein
set for the parametrization of σ should
contain a range of different folds to reflect the variation of protein
structures. To this end a topological alphabet was devised to capture
in simple terms the structural composition of protein folds, in loose
analogy to the methods of Martin[36] and
Kamat and Lesk.[37] The protein topology
was described as the sequence of pairs of secondary structure elements
(supersecondary structures), i.e. β–β, β–α,
α–β, or α–α. To increase the
resolution of the alphabet, the angle between the secondary structure
elements was included as a geometric parameter, and three angle ranges
0–60°, 60–120°, and 120–180° were
combined with the four topological states to yield 12 states of the
alphabet (Table S1 in the Supporting Information). Using this alphabet, we translated a selected set of 2559 well-resolved
protein domains of the SCOP ASTRAL40 database[38] to topological strings by assigning a topological alphabet character
to each supersecondary structure. The concatenated characters form
a topological string that characterizes basic features of the protein
fold.The selected SCOP set was reduced to less than 10% of
its size by applying the MinSet method,[39] so as to derive a database subset that was amenable to MD simulations
but maximally informative in terms of topological composition. Within
the framework of a genetic algorithm, random domain subsets were created,
their topological strings concatenated and assessed in terms of the
overall string entropy, where the entropic score takes the original
database composition into account. The subset with the highest entropy
was chosen. Among all the created random subsets, this subset was
the most informative with respect to the topological composition,
which does not exclude the presence of topologically similar proteins.
The final list of 188 protein domains is given in Supporting Information Table S2.
Simulations
MD simulations were performed using the
GROMOS biomolecular simulation software.[29,30] The employed
force fields were GROMOS 43A1 for simulations in explicit solvent
(water) and GROMOS 43B1 for implicit solvent. The integration time
step was set to 2 fs. The temperature was set to 298 K and controlled
by weak coupling to a temperature bath[40] with a coupling constant τ =
0.1 ps. Simulations in explicit water were kept at a pressure of 0.061020
kJ mol–1 nm–3 (1 atm) with a coupling
time of τ = 0.5 ps and an isothermal
compressibility of 5.575 × 10–4 (kJ mol–1 nm–3)−1.Bond lengths were constrained by the SHAKE algorithm.[41] Both simulation types (i.e., in explicit and implicit solvent)
were run for 2.5 × 105 steps (500 ps), and configurations
were saved at intervals of 500 steps (1 ps). Explicit solvent forces
and SASAs were saved with each configuration. Since water equilibration
around solutes occurs on the time scale of 10–20 ps, explicit
solvent simulations of 500 ps are sufficiently long to sample representative
force distributions.
Simulations in Explicit Solvent (Water)
Initial protein
structures (taken from the PDB database) were energy minimized using
100 steps of steepest descent. Energy minimized protein conformations
were solvated in a periodic water box of either rectangular or truncated-octahedral
shape, whichever was smaller and therefore matched the overall protein
shape better. The dimensions of all periodic boxes were larger than
twice the nonbonded cutoff radius of 1.4 nm (the shortest box axis
length was 5.7 nm), and the distance between solute and box was set
to 0.75 nm (rectangular box) or 0.85 nm (octahedral box). Systems
were electrostatically neutralized by replacing water molecules with
sodium or chloride ions to compensate the net charge of the protein
at neutral pH value. The neutralized systems were energy minimized
using 100 steps of steepest descent, while the solutes (protein domains)
were harmonically positionally restrained using a force constant of
2.5 × 104 kJ mol–1 nm–2. The systems were run for 2.5 × 105 steps of MD
while keeping the solute positionally constrained. Twin-range cutoff
radii of 0.8/1.4 nm were used to compute nonbonded interactions. The
nonbonded pair list was updated every time step for pairs within 0.8
nm and every fifth time step for the range 0.8–1.4 nm. Long-range
electrostatic interactions were approximated by a reaction-field force,
using a dielectric constant of 54.
Simulations in Implicit Solvent
The GROMOS 43B1 force
field is derived from the 43A1 force field by neutralizing the ±1e charges of charged side chains (Arg, Lys, Asp, Glu) and
the charged termini of the polypeptide backbone and by reducing their
repulsive van der Waals parameters.[29] Accordingly,
the dielectric constant was set to 1.0 and the electrostatic cutoff
to 100 nm. The POPS parametrization of Fraternali and Cavallo[42] was used for the implicit solvation. POPS parameters
were derived specifically for proteins and DNA/RNA molecules using
the SASA model given in eq 7 (see below). Initial
protein conformations were energy minimized using 500 steps of steepest
descent in the presence of the implicit solvent force field term.
The system was run for 2.5 × 105 steps of stochastic
dynamics using γ = 91 ps–1, and ω(t) was updated every 100 steps during the
simulation.
Trajectory Analysis and Force Matching
Values for the
atomic σ parameters were derived
using eq 11. Explicit solvent forces on each
atom along the trajectory were averaged over 10 configurations (10
ps), yielding 50 averaged σ parameters
per atom per 500 ps trajectory. Area derivatives (∂A)/(∂r) were calculated for the solute configuration
of the explicit water simulation, which guarantees that the implicit
and explicit forces are referring to identical atom positions. For
atoms with covalently bound polar hydrogen atom(s) (e.g., the OH group),
the explicit force on the hydrogen atom(s) was added to that on the
main atom, because the hydrogen atoms are not considered in the calculation
of the SASA. The resulting force represents the cumulative solvent
force on the atom group. Several subsets of atoms were created to
select atoms for fitting the parameters of the SASA model: (i) ⟨ALL⟩, all atoms; (ii) ⟨SA⟩, atoms with Ai within the range of
0.2–0.5 nm2; (iii) ⟨SA
& θ⟩, with the additional criterion that the implicit-explicit
solvent force angle θ lies between 0–45° (hydrophilic)
or 135–180° (hydrophobic); (iv) ⟨SA
& θ+⟩, as before, but exclusively only one of the
two angle ranges, i.e. either hydrophilic or hydrophobic. The angle
range 0–45° corresponds to ‘outward’ and
‘hydrophilic’, and accordingly, the 135–180°
range corresponds to ‘inward’ or ‘hydrophobic’.Data analysis was performed using the R-project software (R Development
Core Team[43]). Maximum-likelihood fitting
was performed with the function ‘fitdistr’ of the ‘MASS’[44] package. Subsamples of observed force distributions
were obtained with the ‘sample’ function; subsamples
of theoretical distributions were generated using the ‘rlnorm’
function.Resampling was undertaken to transform a source distribution
into
the form of a target distribution, specifically the ⟨SA &
θ+⟩ distribution into the form of the ⟨SA⟩
distribution. This was achieved by resampling points from the source
distribution (⟨SA & θ+⟩) with a suitable probability
to reconstruct the density of the target distribution (⟨SA⟩).
The probability densities of each data point of ⟨SA & θ+⟩
to be found (i) in the ⟨SA⟩ distribution
and (ii) in the ⟨SA & θ+⟩
distribution were computed using the ‘dlnorm’ function.
The ratio of these probabilities (per data point) was used as a probability
vector to resample the source distribution (⟨SA & θ+⟩),
which generated the data points of the resampled distribution with
a density matching that of the target ⟨SA⟩ distribution.
Partitioning of the Range of σ Values
into Groups via Dynamic Programming
Atom
grouping according to the distribution of σ parameters of all atom types is a partitioning problem: considering
the entire range of σ values divided
into n bins, one seeks to find the best partition
of the bins into k groups, which is equivalent to
finding the best locations for k–1 dividers.
An optimal partitioning for a given number of bins n can be found via dynamic programming[45] as sketched below.The first requirement
is the definition of a score by which the obtained partition is judged.
We used the Mutual Information between two sets of variables: (i) the GROMOS atom types (here ‘1’, ‘2’,
..., ‘16’) and (ii) the atom groups
that are to be defined (for example ‘charged’, ‘polar’,
and ‘hydrophobic’). The Mutual Information is generally
defined as[46]X denotes the set of atom
types and x a single atom type, while Y denotes the set of atom groups and y a single atom
group. Since each σ value is assigned
to an atom type, depending on the partitioning to an atom group, the
probability p(x,y) is the relative frequency of observing the combination x and y for the given σ values, while p(x) and p(y) are the marginal probabilities
of these variables.In the partitioning process, by placing
a new divider, the σ range is separated
into a left and a right part.
Assuming that the left part already contains dividers (partitions),
the score for placing the new divider is the sum of the Mutual Information
of the partitions on the left side (including that of the already
existing partitions) and the right side. Systematic variation of the
divider position yields the one with the maximal Mutual Information.
This procedure is iterated until the positions of all k–1 dividers are found.Without knowing a priori the best number of groups k, a safe procedure is
to run the partitioning for a range
of k-values, here 2 to 20, and to evaluate the results
of all runs. The other free parameter is the width of the bins. A
width of 1 kJ mol–1 nm–2 was chosen
here. To compare the Mutual Information between the partitionings
resulting from different k-values (and therefore
from a different number of free parameters), the Mutual Information
of each partitioning was normalized by the Joint Entropy, which is
defined as[46]The Joint Entropy is the maximal Mutual Information
that could be theoretically achieved given the number of discrete
states in the variables X and Y.
The ratio I = I/H is therefore a measure
of the actual versus the maximal information gain.
Error Estimation by Bootstrapping
The statistical basis
for the estimation of the σ parameters
in the previous section is solid with about 103–104 data points per parameter. However, some σ parameter distributions are skewed, and it is
instructive to evaluate the variability of the distribution measures
(median σ, interquartile
range iqr(σ)) with respect to variation
of the input data. This was performed here by bootstrapping, i.e.
computation of distribution measures on artificial subsamples of the
input data. The ‘boot’ package[47] of the R-project (R Development Core Team[43]) was used to compute the median and iqr value on 1000 subsamples
of the input data of each atom type and atom group. Sampling was performed
with replacement. The median and iqr values obtained from the bootstrap
procedure were indistinguishable from the ones reported in the σ result tables, confirming the robustness of the
obtained parameters from variation of the input data; therefore, only
standard deviations of the bootstrap values are reported (this refers
to Table 1, Table 2,
and Supplementary Table S4).
Table 1
Solvation Parameters σ⟨ for Each GROMOS Atom Typea
atom
type
solvation
parameter
id.
type
description
σ⟨SA & θ+⟩SASA (iqr) (kJ mol–1 nm–2)
sdbs
n(σ⟨SA & θ+⟩SASA)
resampled σr⟨SA & θ+⟩SASA (iqr) (kJ mol–1 nm–2)
1
O
carbonyl oxygen (C=O)
–7.2 (5.1)
0.04
11246 [1184750]
–7.5 (4.0)
2
OM
carboxyl oxygen (CO–)
–21.7 (14.4)
0.1
17346 [316400]
–21.7 (16.6)
3
OA
hydroxyl oxygen (OH)
–7.0 (5.0)
0.1
5942 [156050]
–7.1 (4.9)
5
N
peptide nitrogen (NH)
– (−)
– [1105150]
– (−)
6
NT
terminal nitrogen (NH2)
–4.0 (3.0)
0.05
3803 [89000]
–3.8 (3.4)
7
NL
terminal
nitrogen (NH3+)
–26.1 (22.5)
0.1
20384 [79450]
–26.0 (21.6)
8
NR
aromatic nitrogen (−N=)
–4.5 (4.5)
0.1
1589 [64150]
–4.4 (4.3)
9
NZ
Arg amino nitrogen (NH2+)
–13.3 (12.9)
0.2
1908 [111500]
–13.5 (13.4)
10
NE
Arg imino nitrogen (NH)
– (−)
– [55750]
– (−)
11
C
bare carbon (C)
– (−)
– [1360950]
– (−)
12
CH1
methine carbon
(CH)
3.8 (3.0)
0.03
11347 [1473500]
4.0 (3.1)
13
CH2
methylene carbon (CH2)
5.0 (4.3)
0.02
55292 [1408200]
4.3 (3.2)
14
CH3
methyl
carbon (CH3)
3.3 (2.9)
0.01
57976 [664450]
3.7 (3.2)
16
CR1
aromatic carbon (−CH=)
4.5 (4.5)
0.04
10793 [644900]
5.1 (5.6)
Atom types 4 (water oxygen) and
15 (CH4) were not included in this parametrization. Atom types with
unassigned data (−) were under-represented in the data subset
because of their small SASA values. Numbers in square brackets show
the total number of atoms (per atom type) in the reference proteins.
The ‘resampled’ columns show the σ parameters derived from the resampled force distribution (see
text). σ, median value;
(iqr), interquartile range; sd, standard
deviation of the property in 1000 bootstrap (with replacement) samples;
n(σ⟨), number of data points.
Table 2
Solvation Parameters σ of the Three Atom Groups Derived by Partitioning via Dynamic Programming, As Shown in Figure 7a
group
atom
solvation
parameter
id.
description
id.
type
σgSASA (iqr) (kJ mol–1 nm–2)
sdbs
n(σgSASA)
1
charged
2, 7, 9
OM, NL, NZ
–23.3 (19.0)
0.1
38325 [507350]
2
polar
1, 3, 6, 8
O, OA, NT, NR
–7.3 (5.9)
0.05
20090 [1493950]
3
hydrophobic
12, 13, 14, 16
CH1, CH2, CH3, CR1
4.1 (3.6)
0.01
143152 [4191050]
σ, median value; (iqr): interquartile
range; sd, standard deviation of the
property in 1000 bootstrap (with replacement) samples; n: number of
data points. See Table 1 for a description
of the atom types.
Atom types 4 (wateroxygen) and
15 (CH4) were not included in this parametrization. Atom types with
unassigned data (−) were under-represented in the data subset
because of their small SASA values. Numbers in square brackets show
the total number of atoms (per atom type) in the reference proteins.
The ‘resampled’ columns show the σ parameters derived from the resampled force distribution (see
text). σ, median value;
(iqr), interquartile range; sd, standard
deviation of the property in 1000 bootstrap (with replacement) samples;
n(σ⟨), number of data points.σ, median value; (iqr): interquartile
range; sd, standard deviation of the
property in 1000 bootstrap (with replacement) samples; n: number of
data points. See Table 1 for a description
of the atom types.Force matching. Projection of the explicit solvent force f on a solute atom onto
the unit vector of the implicit solvent force f̂ direction yields the implicit solvent
force f. The implicit force direction
is determined by the derivative −∂A/∂r.
Results
The parametrization of σ by force matching
is based on a projection of the explicit solvent force onto the implicit
force direction (Figure 1 and eq 11). To obtain meaningful solvation parameter estimates, the
direction of the explicit force and the implicit force should not
be too different and the size of the explicit force should not be
very small. Moreover, the solvent-accessible area A of an atom should not be very small.
These conditions are not always met as the data analysis presented
below reveals.Atoms having small SASAs or near-orthogonal projection
angles behave
like ‘noise’ when deriving σ values because they hardly contribute to the implicit force
vector. Therefore, we select data that match the characteristics of
the SASA model to be included in the calibration set of atoms, i.e.
atoms showing a partial exposure to solvent and explicit solvent forces
roughly aligned with the direction of the implicit solvation force.
Starting from the set of all data ⟨ALL⟩, three data
subsets with increasing degree of selectivity were created and tested
for the determination of σ: ⟨SA⟩
(selection on area range 0.2–0.5 nm2), ⟨SA
& θ⟩ (selection on area range 0.2–0.5 nm2 and two angle ranges 0–45° (hydrophilic) and
135–180° (hydrophobic)), and ⟨SA & θ+⟩
(selection on area range 0.2–0.5 nm2 and on angle
range 0–45° (hydrophilic) or 135–180° (hydrophobic)).
The subset short names will be given in the following as subscripts
to indicate the underlying data set. The final parametrization was
performed with the ⟨SA & θ+⟩ subset.
Distribution of SASA for Different Atom Types
Before
embarking on the examination of solvent forces, it is instructive
to view the distribution of SASA per atom type (Figure S1). Charged atoms (NH3+ (a), CO– (b)) show a median
exposure of 0.2–0.3 nm2, while polar atoms (OH (c),
C=O (d)) are less exposed, but usually more than the hydrophobic
carbon atoms (CH2 (e), −CH= (f)). The low
exposure of the carbonyl oxygenC=O is caused by its presence
in the peptide backbone. More exposed C=O atoms are located
in the amido groups of the Asn and Gln side chains (data not shown).
Distribution of the Force on the Solute Atoms Due to the Solvent
The explicit water force density distributions depend on the SASA
and polarity of the respective atom type (Figure 2). This and the following plots show typical atoms of the
set of GROMOS atom types, with two examples of each charged, polar,
and hydrophobic atom types. The color coding illustrates the data
subsets: gray denotes the entire data set (), orange the
selection
of atoms within the SASA range 0.2–0.5 nm2 (⟨SA⟩),
and blue the additional selection of force angle in the ranges 0–45°
or 135–180° (⟨SA & θ+⟩), where
the angle is measured between the direction of the implicit force
and the direction of the explicit force.
Figure 2
Distribution of the size of the explicit solvent (water) forces f for selected GROMOS atom
types and for different data sets. The gray distributions show the
forces on ⟨ALL⟩ atoms. The overlaid colored distributions
show subsets: the ⟨SA⟩ forces in orange, the ⟨SA
& θ+⟩ forces in blue. Insets show the ⟨SA
& θ+⟩ data in a uniform scale of 104 data
points for a quantitative comparison of the atom type frequency. The
selected atom types are the same as in Figure
S1.
It is apparent that
most atoms are excluded from the selected parametrization sets because
of their small SASA. The insets show the finally selected data set
⟨SA & θ+⟩ scaled to 104 data points.
The shape of all force density distributions is log–normal,
as demonstrated by quantile–quantile (Q–Q) plots of
a data sample over a random sample from a theoretical log–normal
density function with identical mean and sd values as the fitted data
(Supplementary Figure S6 – Figure S8). The forces on charged atoms show a median at about 200–300
kJ mol–1 nm–1; those on polar
atoms are about half as strong, and most hydrophobic atoms experience
forces of about 10 kJ mol–1 nm–1.Distribution of the size of the explicit solvent (water) forces f for selected GROMOS atom
types and for different data sets. The gray distributions show the
forces on ⟨ALL⟩ atoms. The overlaid colored distributions
show subsets: the ⟨SA⟩ forces in orange, the ⟨SA
& θ+⟩ forces in blue. Insets show the ⟨SA
& θ+⟩ data in a uniform scale of 104 data
points for a quantitative comparison of the atom type frequency. The
selected atom types are the same as in Figure
S1.The distributions of angles between the explicit
and implicit force
are shown in Figure 3. Small angles (close
to 0°) indicate hydrophilicity, because the solvent force points
in the direction of the area derivative of the implicit force, which
points generally toward the solvent, and large angles (close to 180°)
are accordingly associated with hydrophobicity. This is clearly visible
for the most hydrophilic NH3+ (a) and the most hydrophobic CH2 (e) atom types.
Figure 3
Distribution of the angle θ between
the explicit and implicit force vectors for selected GROMOS atom types.
The atom type selection, data subsets, and color scheme are the same
as in Figure 2.
The blue distributions of ⟨SA &
θ+⟩ reflect
the angle restrictions required in order to fit a SASA-based implicit
solvation model to explicit forces.Distribution of the angle θ between
the explicit and implicit force vectors for selected GROMOS atom types.
The atom type selection, data subsets, and color scheme are the same
as in Figure 2.
Distribution of σ Values
The distributions of values of the σ parameters
derived from the explicit forces in the area range 0.2–0.5
nm2 (⟨SA⟩) is reasonably constant as a function
of the SASA (Figure 4). This is reassuring,
because σ is assumed to be a constant,
independent of the SASA. However, one can observe an increased scatter
of σ toward low SASA values,
as mentioned above. This scatter can be understood from eq 11: as the atom is near complete burial or exposure,
small fluctuations in its position lead to large fluctuations in the
area derivatives and the derived σ values.
Figure 4
Distribution
of the subset σ⟨ values
as function of the atomic SASA value A for selected GROMOS atom types.
Distribution
of the subset σ⟨ values
as function of the atomicSASA value A for selected GROMOS atom types.
σ⟨ Parameters from Data Selected To Be Relevant to the SASA Model
The distributions of σ⟨ parameters (denoting
σ parameters derived from the ⟨SA
& θ+⟩ subset) are illustrated in Figure 5. Charged atoms show a larger spread of the σ⟨ distribution than polar and hydrophobic atoms.
Figure 5
Distribution of the subset
σ⟨ values as a function
of the atomic SASA value A for selected GROMOS atom types.
Distribution of the subset
σ⟨ values as a function
of the atomicSASA value A for selected GROMOS atom types.The derived σ⟨ values are given
in Table 1. The median and interquartile range
(iqr) were
chosen instead of mean and standard deviation (sd) because of the
asymmetry of the σ⟨ distributions (assuming
normality, the conversion between iqr and sd is sd = iqr/1.349). Despite the relatively large iqr
values, the σ⟨ estimates are robust as
shown by the small bootstrap ‘sd’ values, which is due
to the sound statistical basis of the data. A graphical illustration
of the σ⟨ value distributions is provided by the box plots in Figure 6.
Figure 6
Box plots of the σ⟨ value distributions
of
the atom types. Distribution features are characterized by symbols;
box: 50%, whiskers: 99%, circles: outliers.
Box plots of the σ⟨ value distributions
of
the atom types. Distribution features are characterized by symbols;
box: 50%, whiskers: 99%, circles: outliers.Distribution of σ⟨ parameters of GROMOS
atom
types, color coded with blue for oxygen atoms, yellow-red for nitrogen
atoms, and gray for carbon atoms. The data set contains 5000 sampled
(with replacement) data of the σ⟨ distribution of
each atom type. This data set served as input for the partitioning
into atom groups. The two dashed vertical lines show the optimal partitioning
obtained via dynamic programming.Box plots of the σ value distributions
of groups of atom types derived by dynamic programming partitioning.
Symbols are as in Figure 6.
Resampling According to the ⟨SA⟩ Distribution
The intention of the determination of the σ parameters via eq 11 is a direct transformation of observed solvent forces into a mean
force. The selection of forces relevant to the SASA model by requiring
a minimum size and projection on the area derivative direction biases
the original force distribution. To provide a quantitative estimate
of this bias, the distributions of the raw and selected data were
approximated with a maximum-likelihood fit using a ‘log–normal’
density function. We find that the original forces and all subsets
closely follow log–normal distributions as shown by the Q–Q-plots
in Supplementary Figure S6 – Figure S8, albeit with different mean and spread (Supplementary
Table S3).The σ⟨ values in the ‘solvation
parameter’ columns of Table 1 were derived
from the ⟨SA & θ+⟩ subset of explicit water
forces of minimal size and with implicit force direction, and it is
not clear at this point how this selection on projection angle influences
the force distribution and the derived parameters. To reconstruct
the ⟨SA⟩ force distribution without compromising the
selection on projection angle, the ⟨SA & θ+⟩
distribution was resampled with a probability vector that generated
the mean and spread of the ⟨SA⟩ distribution (see Supplementary Figure S9 and Methods).The
‘resampled’ values in the right columns of Table 1 were computed from this distribution. The results
are very similar for most atom types, the largest deviations being
0.7 (CH2) and 0.6 (CR1). In comparison to the iqr ranges, these deviations
are small enough to justify the selection on projection angle as originally
performed.
Atom Grouping via Dynamic Programming Partitioning
The similarity of the σ⟨ parameters between
several atom types suggests a simplified parametrization by partitioning
the atom types into atom groups. A good partition should preserve
a maximal amount of the information contained in the input data (i.e.,
the pairing of atom types and σ⟨ values) at a lower
number of parameters (i.e., groups of atom types instead of atom types).
In Figure 6, one can easily discern the difference
between atom types that are charged (OM, NL, and NZ), polar (O, OA,
NT, and NR), or hydrophobic (CH1, CH2, CH3, and CR1), and intuitively
one might choose to combine these into groups. A more systematic and
quantitative approach is provided by information theory. An estimate
of the fraction of preserved information can be obtained from the
ratio of the Mutual Information I(type; group) and the Joint Entropy
H(type, group).[48] The Mutual Information
increases with the number of groups up to a maximum, at which the
two variables (type, group) are equivalent with respect to the attributed
σ values. The Joint Entropy is the maximally
achievable Mutual Information, and it is dependent on the number of
groups. Therefore it can be used as a normalization term to compare
the Mutual Information between partitions based on different group
numbers k. In Supplementary Figure
S10, the normalized Mutual Information I = I/H is plotted over the number k of groups
for partitioning into σ⟨ value bins of width
1 kJ mol–1 nm–2. The corresponding
entropy values are given in Supplementary Table
S5. The curve shows a maximum at which the gain of information
is highest in comparison to the theoretically achievable information.
Details of the partitioning algorithm and the associated information
measures are given in the Methods section.The results show a maximal I for
three groups in accordance with the intuitive grouping described above
(Figure 7). Table 2 contains the statistics for the resulting σ values given to this grouping for force angles
0–45°, and box plots of the distributions are shown in
Figure 8.
Figure 7
Distribution of σ⟨ parameters of GROMOS
atom
types, color coded with blue for oxygen atoms, yellow-red for nitrogen
atoms, and gray for carbon atoms. The data set contains 5000 sampled
(with replacement) data of the σ⟨ distribution of
each atom type. This data set served as input for the partitioning
into atom groups. The two dashed vertical lines show the optimal partitioning
obtained via dynamic programming.
Figure 8
Box plots of the σ value distributions
of groups of atom types derived by dynamic programming partitioning.
Symbols are as in Figure 6.
The hydrophilicity decreases
from group 1 to 3, and the spread
of the σ values is comparable to
that for the atom types in Table 2. Overall,
the σ values of the groups are
well separated.
Performance of the Implicit Solvation Model
The performance
of the new implicit solvent parametrization described above was evaluated
on a test set of six proteins[24] that were
not included in the parametrization set. The size of the test proteins
ranges from 20 to 189 residues. Simulations of 10 ns length were performed
in implicit solvent using the old (Supplementary
Table S6) and new (Table 1) parametrization.
Reference values were taken from simulations in explicit solvent as
described by Allison et al.[24] A comparative
summary of relevant protein properties is given in Table 3. The most sensitive and important parameters for the current
work are the SASA values. The difference between the parametrizations
is discernible in Figure 9. The hydrophilic
SASA of the new parametrization (medium-blue solid) is closer to the
reference values (dark-blue solid) than the old parametrization (light-blue
dashed). The old parametrization leads to an overexposure of hydrophilic
atoms. The average deviation from the reference values improved from
19.3% error to 8.5% error. The hydrophobic SASA is nearly unchanged,
which is expected from the small difference between the old and new
σ values. The new parametrization underestimates
the hydrophobic SASA of ubq, lys, and talin by about 7%. This difference
could become smaller with the addition of a volume term that represents
solute–solvent interactions for mostly buried atoms.[24]
Table 3
Comparison between 10 ns Simulations
in Explicit Solvent (expl), Implicit Solvent Using the New Parametrization
Derived Here (impl.n), and Implicit Solvent Using the Old Parametrization
(impl.o)a
protein
SASAtotal (nm2)
SASAphob (nm2)
SASAphil (nm2)
Rgyr (nm)
rmsd (nm)
rmsf (nm)
trpexpl
14.37
8.56
5.81
0.75*
0.34*
0.17*
trpimpl.n
16.97
10.09
6.88
0.73
0.23
0.12
trpimpl.o
18.24
10.52
7.73
0.77
0.31
0.12
drkexpl
42.55
21.95
20.61
1.11*
0.31*
0.13*
drkimpl.n
42.25
22.67
19.58
1.06
0.27
0.11
drkimpl.o
45.88
23.14
22.74
1.07
0.23
0.08
ubqexpl
53.27
29.96
23.31
1.21*
0.26*
0.12*
ubqimpl.n
51.00
27.63
23.37
1.15
0.38
0.17
ubqimpl.o
55.90
29.26
26.64
1.19
0.28
0.12
if3cexpl
61.64
35.67
25.97
1.34*
0.18*
0.11*
if3cimpl.n
64.47
34.34
30.14
1.28
0.27
0.11
if3cimpl.o
67.87
35.53
32.34
1.30
0.29
0.15
lysexpl
68.00
37.18
30.82
1.41*
0.23*
0.15*
lysimpl.n
68.96
35.50
33.46
1.35
0.29
0.12
lysimpl.o
77.93
38.73
39.20
1.40
0.33
0.12
talinexpl
115.24
67.45
47.80
1.92*
0.48*
0.19*
talinimpl.n
107.53
61.08
46.44
1.90
0.39
0.14
talinimpl.o
115.04
63.77
51.27
1.91
0.38
0.12
The test proteins (PDB code)
are trp (1l2y), drk (2a36), ubq (1ubq), if3c (1tig), lys (1aki), and talin (2jsw). Values marked with an asterisk and trajectories underlying the
expl SASA values were taken from Allison et al.[24] R: radius of gyration; rmsd:
root mean square deviation from the X-ray or NMR model structure;
rmsf: root mean square fluctuation.
Figure 9
Comparison between the SASA values of six test proteins (abbreviation
(PDB code) below figure) obtained under different simulation conditions:
explicit water (solid lines, dark colors), implicit solvent using
the parametrization derived here (solid lines, medium colors), and
implicit solvent using the previous parametrization (dashed lines,
light colors). The color code is blue for hydrophilic SASA, red for
hydrophobic SASA, and black for total SASA. Conformations were plotted
every 25 ps.
The test proteins (PDB code)
are trp (1l2y), drk (2a36), ubq (1ubq), if3c (1tig), lys (1aki), and talin (2jsw). Values marked with an asterisk and trajectories underlying the
expl SASA values were taken from Allison et al.[24] R: radius of gyration; rmsd:
root mean square deviation from the X-ray or NMR model structure;
rmsf: root mean square fluctuation.Comparison between the SASA values of six test proteins (abbreviation
(PDB code) below figure) obtained under different simulation conditions:
explicit water (solid lines, dark colors), implicit solvent using
the parametrization derived here (solid lines, medium colors), and
implicit solvent using the previous parametrization (dashed lines,
light colors). The color code is blue for hydrophilic SASA, red for
hydrophobic SASA, and black for total SASA. Conformations were plotted
every 25 ps.A comparison of local structural properties along
the simulations
in implicit solvent and in water is shown in Figure 10 for two exemplary test molecules. The data of the remaining
four test molecules are shown in the Supporting
Information (Figure S11). Local conformations are represented
by a structural fragment alphabet (M32K25). This alphabet was derived
to describe local conformational states in protein dynamics, and it
provides a more comprehensive set of loop and turn states than the
conventional secondary structure tools.[49,50] The implicit
solvent reproduces the structural properties of the water simulations.
Proteins with low conformational fluctuations (Figure 10a,b) show virtually the same profile of structural states.
Proteins containing regions with significant conformational fluctuations
(Figure 10c,d) show slight shifts in the conformational
profiles between implicit and explicit solvation, probably as a result
of the approximations in the implicit solvation model. However, the
range of fluctuations is similar, and the sampled conformations are
closely related.
Figure 10
Local structural properties of the test proteins if3c
(a,b) and
talin (c,d) in implicit solvent (a,c) and water (b,d). Coloring scheme
of conformational states: red, helical (including α-helix);
blue: extended (including β-strand); green-yellow: turns and
loops. Conformations were plotted every 25 ps.
Local structural properties of the test proteins if3c
(a,b) and
talin (c,d) in implicit solvent (a,c) and water (b,d). Coloring scheme
of conformational states: red, helical (including α-helix);
blue: extended (including β-strand); green-yellow: turns and
loops. Conformations were plotted every 25 ps.
Scope and Limitation of the Parametrization
The SASA
model and its parametrization described here is based on several assumptions
that have been mentioned throughout the text. Here we summarize these
assumptions and discuss their implications for the scope and limitation
of the parametrization. From a theoretical point of view, the force
matching formula eq 11 is applicable to all
recorded data pairs ∂A/∂r and ⟨f⟩ and the determined
σ parameters would represent all cumulative
solvent forces on atom i. In practice, forces on
atoms with less than 0.2 nm2 SASA are very noisy, and therefore
they were excluded. Using all remaining explicit forces leads to unphysical
σ parameters, as shown in the last column
of Table 4 for θ = 180°. At this
angle, polar atom types 1, 3, and 8 appear to be barely hydrophilic;
parameters of atom types 6 and 9 become incorrectly hydrophobic. Therefore,
as described in Methods, the parametrization
in Table 1 was performed using a data subset
of the explicit forces that matches the SASA model. In this subset,
explicit forces point in the same direction as the implicit forces.
This selection on the force angle introduces a bias on the resulting
σ parameters as shown in Table 4 for the angle range 15–90° in 15°
increments. As expected, σ values decrease
toward larger angles, but from 15–60° the variation lies
within the iqr range. The σ values of
the angle range 75–90° become progressively similar to
the unphysical values of the last column (180°). We chose the
σ values resulting from θ = 45°
as the preferred parametrization, because they represent a reasonable
middle course between including all data and selecting data that match
best the framework of the SASA model.
Table 4
Solvation Parameters σ for Each GROMOS Atom Type Derived for Angle Ranges
[0°, θ° ] and [(180-θ)°, 180°]a
atom
type
solvation
parameter
id.
type
σ15
σ30
σ45
σ60
σ75
σ90
σ180
1
O
–7.7 (4.9)
–7.7 (5.3)
–7.2 (5.1)
–6.4 (4.7)
–5.2 (4.4)
–3.8 (4.7)
–0.5 (7.5)
2
OM
–22.4 (15.4)
–23.4 (15.7)
–21.7 (14.4)
–18.4 (13.5)
–14.1 (12.6)
–10.3 (13.1)
–4.6 (16.7)
3
OA
–7.9 (5.2)
–7.6 (5.2)
–7.0 (5.0)
–6.1 (4.6)
–5.0 (4.3)
–3.8 (4.7)
–0.8 (7.0)
5
N
– (−)
– (−)
– (−)
– (−)
– (−)
– (−)
– (−)
6
NT
–4.0 (2.8)
–4.1 (2.9)
–4.0 (3.0)
–3.6 (2.9)
–2.9 (2.6)
–2.0 (2.7)
4.6 (4.2)
7
NL
–28.6 (27.2)
–28.7 (24.9)
–26.1 (22.5)
–22.4 (20.7)
–19.1 (20.0)
–16.9 (20.6)
–12.6 (23.0)
8
NR
–5.1 (4.9)
–4.7 (4.8)
–4.5 (4.5)
–4.1 (4.5)
–3.4 (4.1)
–2.7 (3.9)
–0.5 (5.3)
9
NZ
–17.7 (15.0)
–18.9 (20.9)
–13.3 (12.9)
–10.3 (7.5)
–7.1 (6.0)
–4.0 (6.0)
2.1 (8.5)
10
NE
– (−)
– (−)
– (−)
– (−)
– (−)
– (−)
– (−)
11
C
– (−)
– (−)
– (−)
– (−)
– (−)
– (−)
– (−)
12
CH1
4.6 (3.3)
4.2 (3.1)
3.8 (3.0)
3.3 (2.8)
2.8 (2.8)
2.5 (2.9)
2.1 (3.1)
13
CH2
5.8 (4.8)
5.4 (4.5)
5.0 (4.3)
4.3 (3.9)
3.5 (3.7)
2.7 (3.7)
1.2 (4.3)
14
CH3
3.9 (3.2)
3.0 (3.0)
3.3 (2.9)
2.9 (2.6)
2.5 (2.5)
2.2 (2.6)
1.9 (2.7)
16
CR1
5.0 (4.5)
4.9 (4.6)
4.5 (4.5)
4.1 (4.3)
3.5 (4.0)
2.8 (3.8)
1.6 (4.4)
Atom types 4 (water oxygen) and
15 (CH4) were not included in this parameterization. Atom types with
unassigned data (−) are under-represented in the selected data.
σ̅, median value of σ⟨ in units kJ mol–1 nm–2, θ value as subscript;
the error is given as interquartile range in parentheses (assuming
normality, the conversion between iqr and sd is sd = iqr/1.349).
Atom types 4 (wateroxygen) and
15 (CH4) were not included in this parameterization. Atom types with
unassigned data (−) are under-represented in the selected data.
σ̅, median value of σ⟨ in units kJ mol–1 nm–2, θ value as subscript;
the error is given as interquartile range in parentheses (assuming
normality, the conversion between iqr and sd is sd = iqr/1.349).A feature of the SASA model is the independence of
σ from the atomicSASA value (eq 6). However, charged atoms (Figure 4a,b) show a positive correlation betweeen σ and SASA values. This correlation became apparent
in this
study through the use of explicit solvent forces on individual atom
types. Since this study is based on the eq 6, this correlation is necessarily neglected, and the range of σ values is represented by an averaged σ value. This approximation could be remedied by
an additional term that complements the SASA derivative of eq 6 with a term that is proportional to the atomicSASA.
Exploration of such an extended SASA model will be performed in future
studies.
Discussion
A method for the determination of the parameters
of an implicit
solvation model has been proposed. It is based on a particular form
of force matching: the mean solvation force on an atom in explicit
solvent is projected onto the direction of the implicit force, as
given by the derivative of the solvent-accessible surface area term
of the solute potential energy function. This allows for the extraction
of the solvation parameters σ of the implicit
solvation model directly from the observed solvation forces in MD
simulations using explicit solvation for a set of proteins in their
native structure.In the original description of the implicit
SASA model,[27] the implicit solvation parameters
σ were estimated by comparing the preservation
of characteristic geometric properties of proteins between MD simulations
with explicit and implicit solvation. The resulting values for σ were −25 kJ mol–1 nm–2 for hydrophilic atoms and 5 kJ mol–1 nm–2 for hydrophobic atoms (see Supplementary Table S6). Here we used MD simulations of 188
protein domains with diverse topology and applied a force matching
formula to derive the σ parameter
values directly from the simulations of the rigid proteins in implicit
and explicit water. Analysis of the solvent forces revealed that only
a fraction of the explicit solvent forces are suitable for the parametrization
of the SASA model: those that act on atoms having a surface exposure
over 0.2 nm2 and that are roughly aligned with the implicit
force. Therefore it was imperative to start from a large data set
to arrive at a final data set of sufficient statistical weight. While
the selection based on a sizable exposed area can be viewed as an
intrinsic part of a SASA-based model, it is less obvious how selection
based on a small angle between explicit and implicit forces modifies
the explicit solvent force distribution. We showed that the latter
follows a log–normal function like all other solvent force
distributions explored here and that the derived σ parameters would be the same if the distribution
was that of the SASA-only selection.The derived σ values show a good
correspondence between the original and current parametrization. Charged
atoms adopt σ values of −26.1
(NH3+) and −13.3
(Arg NH2+) kJ
mol–1 nm–2, hydrophobic atoms
values between 3.3 (CH3) and 5.0 (CH2) kJ mol–1 nm–2. Additionally we observed
a group of polar atom types (e.g., OH and C=O) that adopt intermediate
values between −7.0 (OH) and −4.0 (NH2) kJ
mol–1 nm–2. We partitioned the
resulting σ distributions of the 16
GROMOS atom types into three groups using dynamic programming. The
partitioning maximizes the Mutual Information between the (discretized)
σ distributions and the assigned group
labels. The resulting three groups can be labeled as ‘charged’
(σ = −23.3 kJ mol–1 nm–2), ‘polar’ (σ = −7.3 kJ mol–1 nm–2), and ‘hydrophobic’ (σ = 4.1 kJ mol–1 nm–2).The current parameter values were tested on a set of 6 proteins
for which data of long and independent (from this study) simulations
were available. The new parameters improve the hydrophilic SASA, indicating
that the average solvent forces on hydrophilic atoms are well reproduced
by the implicit solvent model.
Authors: Markus Christen; Philippe H Hünenberger; Dirk Bakowies; Riccardo Baron; Roland Bürgi; Daan P Geerke; Tim N Heinz; Mika A Kastenholz; Vincent Kräutler; Chris Oostenbrink; Christine Peter; Daniel Trzesniak; Wilfred F van Gunsteren Journal: J Comput Chem Date: 2005-12 Impact factor: 3.376
Authors: Jane R Allison; Katharina Boguslawski; Franca Fraternali; Wilfred F van Gunsteren Journal: J Phys Chem B Date: 2011-03-24 Impact factor: 2.991