Alfredo Jost Lopez1, Patrick K Quoika1, Max Linke1, Gerhard Hummer1,2, Jürgen Köfinger1. 1. Department of Theoretical Biophysics, Max Planck Institute of Biophysics, Max-von-Laue-Straße 3, 60438 Frankfurt am Main, Germany. 2. Institute for Biophysics, Goethe University, Max-von-Laue-Straße 9, 60438 Frankfurt am Main, Germany.
Abstract
Interactions among proteins, nucleic acids, and other macromolecules are essential for their biological functions and shape the physicochemcial properties of the crowded environments inside living cells. Binding interactions are commonly quantified by dissociation constants Kd, and both binding and nonbinding interactions are quantified by second osmotic virial coefficients B2. As a measure of nonspecific binding and stickiness, B2 is receiving renewed attention in the context of so-called liquid-liquid phase separation in protein and nucleic acid solutions. We show that Kd is fully determined by B2 and the fraction of the dimer observed in molecular simulations of two proteins in a box. We derive two methods to calculate B2. From molecular dynamics or Monte Carlo simulations using implicit solvents, we can determine B2 from insertion and removal energies by applying Bennett's acceptance ratio (BAR) method or the (binless) weighted histogram analysis method (WHAM). From simulations using implicit or explicit solvents, one can estimate B2 from the probability that the two molecules are within a volume large enough to cover their range of interactions. We validate these methods for coarse-grained Monte Carlo simulations of three weakly binding proteins. Our estimates for Kd and B2 allow us to separate out the contributions of nonbinding interactions to B2. Comparison of calculated and measured values of Kd and B2 can be used to (re-)parameterize and improve molecular force fields by calibrating specific affinities, overall stickiness, and nonbinding interactions. The accuracy and efficiency of Kd and B2 calculations make them well suited for high-throughput studies of large interactomes.
Interactions among proteins, nucleic acids, and other macromolecules are essential for their biological functions and shape the physicochemcial properties of the crowded environments inside living cells. Binding interactions are commonly quantified by dissociation constants Kd, and both binding and nonbinding interactions are quantified by second osmotic virial coefficients B2. As a measure of nonspecific binding and stickiness, B2 is receiving renewed attention in the context of so-called liquid-liquid phase separation in protein and nucleic acid solutions. We show that Kd is fully determined by B2 and the fraction of the dimer observed in molecular simulations of two proteins in a box. We derive two methods to calculate B2. From molecular dynamics or Monte Carlo simulations using implicit solvents, we can determine B2 from insertion and removal energies by applying Bennett's acceptance ratio (BAR) method or the (binless) weighted histogram analysis method (WHAM). From simulations using implicit or explicit solvents, one can estimate B2 from the probability that the two molecules are within a volume large enough to cover their range of interactions. We validate these methods for coarse-grained Monte Carlo simulations of three weakly binding proteins. Our estimates for Kd and B2 allow us to separate out the contributions of nonbinding interactions to B2. Comparison of calculated and measured values of Kd and B2 can be used to (re-)parameterize and improve molecular force fields by calibrating specific affinities, overall stickiness, and nonbinding interactions. The accuracy and efficiency of Kd and B2 calculations make them well suited for high-throughput studies of large interactomes.
In biological cells, most
protein, DNA, and RNA molecules have
to bind to specific binding partners to perform their biological functions.
These specific interactions compete with nonspecific interactions,
and cells have evolved various mechanisms to minimize wasteful nonspecific
binding.[1,2] However, nonspecific interactions shape
the physicochemical properties of the crowded environments inside
cells.[3] The quantification of binding affinities
and interaction strengths of biological macromolecules is thus crucial
for the understanding and modeling of cellular processes. In the following,
we focus on protein–protein interactions, but all of our results
are generally applicable to other specific and nonspecific binding
interactions.Experimentally, protein interactions are quantified
by the dissociation
constants Kd and the second osmotic virial
coefficient B of protein
species i and j. We follow the common
convention and use B22 for self-interactions
and B23 for cross-interactions. The dissociation
constant Kd quantifies the amount of bound
proteins and can be measured in isothermal titration calorimetry,
surface plasmon resonance, or analytical ultracentrifugation experiments,
for example.[4] The interaction strength
of pairs of proteins in binding and nonbinding configurations can
be quantified by measuring the second osmotic virial coefficient B, which relates the microscopic
protein interactions to the macroscopic osmotic pressure.[5−7] Moreover, the second osmotic virial coefficient is related to solubility
and used as a predictor for protein crystallization conditions.[8,9] In experiments, B is measured by sedimentation[10−12] and size-exclusion chromatography.[8] Scattering experiments, such as static light
scattering (SLS) and small-angle X-ray scattering (SAXS) experiments,
can provide approximate estimates for B.[13,14]Kd and B are
crucial quantities to relate molecular simulations
of interacting proteins to the experiment. Such comparisons become
increasingly important as molecular simulations of crowded cell-like
environments have become computationally feasible, even in full atomic
detail.[15,16] In simulations of strong binders, Kd is usually determined by calculating the binding
free energy to specific binding interfaces.[17] If binding interfaces are unknown, Kd values are often calculated from the ratio of bound and unbound
populations,[18] as recently applied to RNA–RNA
binding.[19] As we will discuss here, this
approximation is accurate only for special cases. B can be estimated by integration over
the configuration space,[20−22] by Mayer sampling,[23,24] from molecular simulations using radial distribution functions or
potentials of mean force,[25−28] and by simply counting all configurations in which
proteins do not interact.[29,30]Here, we show
that Kd is fully determined
by B and the fraction pb(V) of bound proteins estimated
from molecular simulations of two proteins in a box with volume V, i.e.Here NA is Avogadro’s
constant. In the derivation of this equation, we do not make any assumptions
about the interaction strength or about the degrees of freedom of
proteins or the solvent. Thus, it is generally applicable and valid
not only for coarse-grained simulations using implicit solvents but
also for fully atomistic molecular dynamics simulations using explicit
solvents. We present two different routes to calculate B and thus Kd.For simulations using implicit solvents, we can apply protein
insertion
and removal moves to estimate the free energy that corresponds to
the two-particle partition function determining B. The insertion ensemble can be generated
with any Monte Carlo or molecular dynamics code to sample from the
canonical ensemble without modification. We estimate the partition
function by combining the insertion and removal ensemble using either
Bennett’s acceptance ratio (BAR) method[31] or the binless weighted histogram analysis method (WHAM).[32−34] In contrast to the Mayer sampling method,[23,24] which uses molecular Monte Carlo integration to calculate virial
coefficients even of higher orders, here, we use exactly the same
simulation system for the calculation of B as we use to sample from the canonical ensemble.For simulations using either implicit or explicit solvents, B can be calculated accurately
by estimating the probability that the two proteins are outside of
their interaction range.[29] We present mathematically
simple expressions for B and Kd in terms of this probability,
which provide insights into their physical interpretations complementary
to more common formulations based on radial distribution functions
or potentials of mean force.We quantify the interactions of
the two proteins when they are
not bound using Kd and B. Previously, theoretical models for
excluded volumes have been used to extract nonbinding interactions
from experimentally measured B values.[35] Here, we use the fact
that the contributions of bound configurations to B are completely determined by Kd and show that the remaining contributions
have a simple and clear interpretation. Moreover, we propose that
these contributions of nonbinding interactions can be estimated in
experiments.The article is organized as follows. In Section , we derive expressions
to calculate the
dissociation constant and the second osmotic virial coefficient from
simulations. We present the details of our methods in Section and a validation of our methods
and results for three weekly binding proteins using coarse-grained
simulations in Section . We end with conclusions in Section .
Theory
For simulations
of two proteins in a box, we show that the dissociation
constant Kd is determined by the binding
probability and the second osmotic virial coefficient B of protein species i and j. The latter is determined by the two-particle
partition function, which in general can be estimated from the fraction
of states where proteins are outside of their interaction range[29] or, for implicit solvents, by performing a free
energy calculation using insertion and removal moves.
Preliminaries
McMillan and Mayer[5] have shown how we can apply results of statistical
mechanics to describe osmotic properties of solutions. Integrating
out solvent degrees of freedom, only solute degrees of freedom remain
and solutes interact with each other via effective potentials. For
such a system with m solute species, the virial equation
of state[36,37] becomes the osmotic virial equation of state,
i.e.where Π is the osmotic pressure, Vm is the molar volume, R is
the gas constant, T is the temperature, x is the mole fraction of species i, and B is
the osmotic second virial coefficient of proteins of species i and j.We can express the second
virial coefficients B of an arbitrarily shaped particle of species i and
an arbitrarily shaped particle of species j, via
one- and two-particle configurational partition functions. To do so,
we extend the derivation by McQuarrie to nonspherical particles[7] and start from the grand canonical partition
functionwhere V is the volume, N is the number of molecules
of species i containing n atoms each, and z = exp(βμ) is the
fugacity determined by the chemical potential μ of species i and the inverse temperature
β = 1/(kbT). kb is Boltzmann’s constant. The osmotic
pressure is a function of the fugacities and given by βΠV = ln Ξ(T,V, z1,...,z).[38,39] Here, we write the canonical partition function Q(N1,...,N) of m species of arbitrarily
shaped particles aswhere is the
corresponding configurational partition
functionwhere the potential
energy U(X) depends on the set X of all |X| = ∏Nn atom positions.
In eq , we introduced for the
single-particle canonical partition
function, e.g., . For spherically symmetric particles, and we recover McQuarrie’s
expression[7] for Q(N1, ..., N). For
rigid cylindrically symmetric and asymmetric particles, and , respectively.
Note that in the following,
we use “Z” instead of “ for
these expressions for rigid
molecules to distinguish them from the full configurational partition
function of flexible molecules written as calligraphic “. We
obtain for the second osmotic
virial coefficientswhere we introduced for the two-particle
partition function,
e.g., for a pair of particles of species 1 and
2 or for a pair of particles of species 1.
Estimating
the Dissociation Constant
We show how to obtain a box-size-independent
estimate of the dissociation
constant Kd from simulations of two proteins
in a box. Kd is related to the Gibbs binding
free energy ΔG viawhere c0 = 1M is the standard concentration
and Δp is the pressure difference between bound
and unbound states.[40] The last term is
usually small and can be neglected.For large enough box volumes V, one would be tempted
to estimate the dissociation constant of two proteins A and B directly from the binding probability pb(V). For a discussion of suitable
definitions of bound states, see Section . Using the concentrations of free proteins
[A] = [B] = (1 – pb(V))/(NAV) and the concentration of bound protein
[AB] = pb(V)/(NAV), a first rough
estimate of the dissociation constant is given byFor small box sizes typically used in simulations,
this estimate suffers from finite-size effects. Accurate estimates
using eq would require
unusually large boxes, as we show in Section , which makes sampling highly inefficient.To overcome this finite-size effect, we effectively extend the
box volume analytically and calculate Kd in the limit of infinite volume (Figure ). We emphasize here that in the following
derivation, we consider fully flexible proteins without any restrictions
on their internal degrees of freedom. We remove the translational
and rotational degrees of freedom of the protein of species i, which correspond to a factor Z(V) = 8π2V in the partition function for asymmetric proteins. That
is, we fix the position and orientation of the protein of species i, which leaves the internal degrees of freedom due to the
flexibility of the protein unchanged. The corresponding partition
function of j in the presence of i, with i fixed in position and orientation but internally
flexible, is given by We extend
this system with a fixed position
and orientation of the flexible protein i by an additional
volume ΔV accessible to the second protein.
The contribution to the partition function of a protein of species j being in this additional volume ΔV is given bywhere Z(ΔV) = 8π2ΔV gives the contribution due to the
translational and rotational
degrees of freedom of an asymmetric protein to the partition function. and are the partition functions of individual
proteins i and j, whose positions
and orientations are fixed in space. That is, and contain only contributions
due to the respective
internal degrees of freedom of free proteins and due to the degrees
of freedom of solvent molecules in the vicinity of the proteins, which
differ from the bulk due to the presence of the protein. For rigid
protein models in implicit solvents, .
Figure 1
Calculating the dissociation constant Kd and the second osmotic virial coefficient B from simulations of two proteins in a box
of volume V. The red protein has a single specific
wedge-shaped binding site for the triangular blue protein. The light-blue
protein configurations illustrate different interaction modes of the
two proteins considered in the derivation of eqs and 28. To obtain a Kd estimate independent of box size, we analytically
extend the two-particle partition function for the simulation box
by the contributions of an extension volume ΔV (gray shaded area) and perform the limit ΔV → ∞. We calculate B from the probability p(V) that the two proteins are within a subvolume v (green), which is at least large enough to cover all protein–protein
interactions (yellow shaded area).
Calculating the dissociation constant Kd and the second osmotic virial coefficient B from simulations of two proteins in a box
of volume V. The red protein has a single specific
wedge-shaped binding site for the triangular blue protein. The light-blue
protein configurations illustrate different interaction modes of the
two proteins considered in the derivation of eqs and 28. To obtain a Kd estimate independent of box size, we analytically
extend the two-particle partition function for the simulation box
by the contributions of an extension volume ΔV (gray shaded area) and perform the limit ΔV → ∞. We calculate B from the probability p(V) that the two proteins are within a subvolume v (green), which is at least large enough to cover all protein–protein
interactions (yellow shaded area).The probability pb(Vex) that the two proteins are bound in the extended volume Vex = V + ΔV is now given by the ratio of the partition function of the bound
proteins to the partition
function of the extended system . With the position of
protein i fixed, is independent of the size of the volume V and
thus the same for the simulation box and for the extended
system, i.e., . ConsequentlyTo calculate a Kd value
unaffected by the finite size of the simulation box, we now
substitute eq into eq . We then take the limit
ΔV → ∞ and use that Z(V)/V = 8π2 to obtainWe can rewrite
this equation realizing that
the partition function of all bound states of the system, where also
protein i can move and rotate, is given by . Note that is proportional to V. Equation becomesExpressing by the second osmotic virial coefficient
defined in eq and inserting the resulting expression
in eq , we obtain
the relationship
between Kd and B given in eq As a corollary, the volume dependence of the
fraction of bound proteinsis parameterized by Kd and B.As we derive in the following, Kd and B fulfill the approximate
relationThis approximate relationship becomes an exact
relationship if we define all interacting states as bound states[41,42] or for proteins that do not interact when they are not bound (see Section ). We write as a sum of the partition functions for the bound and for the unbound states, i.e., , and insert this expression
in eq . We then obtainIf unbound interactions are weak,
thensuch thatwhere we used eq . Rearranging this equation, we arrive at eq . We can now insert this
expression into eq and obtainfrom which we can express Kd to obtain an approximate estimate for Kd, which we call Kd′, i.e.Here, we introduced the fraction of unbound
protein configurations as pu(V) = 1 – pb(V).
Note that eq corresponds
to eq of de Jong
et al.[18] For the exact relationship between Kd and Kd′, see Section .
Estimating
the Second Osmotic Virial Coefficient
As we have shown above,
we have to estimate B to accurately estimate Kd. To do so,
we apply the same concepts as we have used
for the calculation of Kd. We first remove
contributions to the partition function due to the translational and
rotational freedom of the whole system by keeping the position and
the orientation of the otherwise flexible protein i fixed (eq ). Around
this protein, we define a subvolume v < V, which has to be big enough such that it captures all
protein–protein interactions (Figure ). Outside this subvolume, protein–protein
interactions can be neglected. That is, the flexible protein j moves freely when it is in volume δv = V – v.The probability p(V) that
protein j is in subvolume v is given
bywhere is given analogous to eq and Z(δv) = 8π2δv for asymmetric proteins. We can express from eq asUsiing eq for , it follows that1 – p(V) is the probability that protein j is
in volume δv. ConsequentlyUsing that Z(v) and Z(δv) are proportional to their
arguments with the same prefactor (see Section ) and that δv = V – v, where V is
the box volume, we obtainSolving for p(V), we obtainwhich describes
the dependence of p(V) on the
box volume V and the subvolume v.We emphasize that eq is generally valid for arbitrary binding partners, without
making
any assumptions about symmetry or the number of internal degrees of
freedom of the binding partners or of the solvent. The only condition
is that interactions between binding partners are negligible outside
of the volume v. We can introduce correction terms
based on an effective pairwise potential acting between the binding
partners if this condition is not fulfilled (see Section ).To motivate the
interpretation of eq , we rewrite it asNote
that the prefactor in eq contains the box volume V, whereas the prefactor
in eq contains the
subvolume v. The first term in the brackets, determining
the two-particle partition
function, is the ratio of the probability of finding one protein outside
of the subvolume v for the ideal system, 1 – v/V, to the corresponding probability for
the interacting proteins, 1 – p(V). This ratio, which is the inverse
of the quantity f2(V)
of Ashton and Wilding,[29] is independent
of the subvolume v, chosen to be just large enough
to cover the interaction range. Consequently, the first term in the
brackets in eq can
be written as 1/exp[−βFo(ex)(V)], where we introduced the excess free
energy of finding the two proteins outside of their interaction range
in the box of volume V asWe express Kd as
a function of p(V) by inserting eq into eq and
obtainWe next establish the commonly used relationship
of B to the partial
radial distribution function g(r).[7] The ratio of p(V)/(1 – p(V)) can
be estimated from the probability density of center-of-mass distances p(r) of two proteins in a box, for instance,
which is itself related to the radial distribution function g(r). To do so, we define a spherical volume v = 4πR3/3 and a spherical
shell around this sphere with volume δv = 4π[(δR + R)3 – R3]/3. The ratio is then given byWe define a radial distribution function g(r) throughWe can choose the proportionality constant
such that g(r) = 1 for r > R, where p(r) ∝ r2. Then, 4π ∫g(r)r2 dr = δv and we may writeInserting this expression in eq and using that ∫0 4πr2dr = v, we obtainBy introducing an effective
interaction potential
βw(r) = −ln g(r), we can write eq as it is commonly presentedUsing eq instead of eq or 35, we can avoid
the computation
of distance distribution functions and potentials of mean force, respectively,
and the subsequent integration. Importantly, we also do not have to
estimate the plateau value of g(r), which in simulations is different from one and which depends on
system size and the thermodynamic ensemble.[29,43] Although these differences might be viewed only as a minor simplification, eq emphasizes that B is independent of the detailed
shapes of g(r) and w(r) and determined by the excess free energy Fo(ex)(V) of finding the two proteins outside of their
interaction range. Note that our results also apply to the infinite
dilution limits of the Kirkwood–Buff integrals G = 4π ∫∞[g(r) – 1]r2 dr = 2B.[13,44,45]
Contribution of Nonbinding Interactions to B
We can use Kd and B to quantify the nonbinding interactions of two proteins. Let
us first consider two nonbinding proteins for which (V) = 0. Consequently, eq becomeswhere we use the superscript “(u)”
to indicate contributions of the unbound states. For binding proteins, B(u) is given by the difference between B = B(u) + B(b) and the contributions to B(b) due to bindingi.e., we can quantify the nonbinding interactions
for two binding proteins viawhich becomesFor hard spheres, pu(V) = 1 and p(V) = (v – vexc)/(V – vexc), where vexc is the excluded
volume, such that B(u) = vexc/2. For attractive nonbinding interactions B(u) < vexc/2, and for repulsive nonbinding
interactions B(u) > vexc/2. Note that for asymmetric proteins, vexc corresponds to an excluded region in the configuration
space, which,
for instance, is spanned by Cartesian coordinates and Euler angles
in the case of rigid proteins. Thus, in general, vexc should be viewed as an effective volume corresponding
to a thermodynamic free energy.We now show that B(u) quantifies the difference between the approximate expression
for Kd in eq and the box-size-independent expression
for Kd in eq . Inserting eq into eq ,
we obtainsuch that the relative difference is given
byConsequently, the approximate estimate Kd′ deviates systematically from the true value Kd, with deviations proportional to B(u), but converges to the true value with increasing box volume as 1/V.
Indistinguishable Binding
Partners (Homodimers)
So far, we have assumed that the proteins
are distinguishable,
i.e., that they form heterodimers, but all expressions derived here
are also valid for indistinguishable binding partners forming homodimers.
To consider the case of two identical binding partners, we rewrite eq aswhere
we introduced for the
partition function of two free
proteins, which is determined by the product of two single-protein
partition functions. For indistinguishable binding partners forming
homodimers, both and would have to be multiplied by a factor
1/2 to account for the indistinguishablity of the proteins. However,
these factors then cancel in the ratio in eq .
Kd and B from
a Single Simulation
We can estimate Kd and B from
the fraction of bound protein pb(V) and the probability p(V) of one
protein being located in a subvolume v around the
other. The latter determines B according to eq , which we then insert into eq to obtain the finite-size corrected estimate of Kd. We call this method the subvolume method. To calculate B, we can also estimate the
two-particle partition function Z(V), now for simplicity but without loss of
generality only considering rigid molecules, using free energy methods.[46] For implicit solvents, we can use insertion
and removal moves of the proteins to efficiently estimate Z(V), as
explained in the following. We call this method the insertion/removal
method.
Estimating Two-Particle Configurational
Partition Functions for Implicit Solvents
A simulation of
a pair of proteins in a box of volume V at reciprocal
temperature β gives us immediately the particle-removal energy
distribution as the normalized distribution of potential energies.
We define x = (r,Ω), where r are the
Cartesian coordinates of the geometric center of protein i and Ω are its Euler angles defining
its orientation. We denote the configuration space as W = V × Ω to simplify the notation. The
particle-removal energy distribution is then given bywhere Z23(β)
= ∫ dx2dx3e–β and δ[·] is Dirac’s delta function.The particle-insertion energy distribution pins(E) is formally given bywhere Z(β = 0) = ∫ dx. Sampling the particle-insertion
energy distribution pins(E) for a given box size is straightforward. All one needs is a replica
with reciprocal temperature β = 0 exactly. All moves will then
be accepted, and the energies saved are those of random insertions.
Alternatively, one could make trial moves of the two proteins with
Monte Carlo move widths ±L/2, where L is the box length, and the orientation changes about random
axes by ±π, and to write out the absolute trial (!) energies
(not the energy differences or the accepted energies). With such a
move protocol, it would not matter if one or both particles were moved
and if moves are accepted or not. It also does not matter what the
“acceptance rate” is (i.e., it can be zero!). What is
important, though, is that the box volumes in insertion and removal
runs are the same.The normalized removal and insertion energy
distributions are related
to each other bywhich follows fromThe
ratio of partition functions
defines the
free energy of going from a system of two noninteracting particles
to a system in which they interactNote that F = −Fo(ex)(V) (see eq ). An efficient way of determining
this free energy is to
use the Bennett acceptance ratio (BAR) estimator[31]where E are the uncorrelated (by
construction) insertion energies
and are the uncorrelated removal energies. However, it is clear
that this is problematic in cases where the proteins are strongly
bound (forming a dimer!) because then one would have very little information
about higher energies.This problem can be remedied using all
of the data in a temperature
replica exchange simulation. In effect, the high-temperature runs
allow us to estimate an accurate density of states to a pretty high
energy. The particle-insertion energies complement this density of
states on the high-energy side. All of the runs at different temperatures
can be combined with the list of insertion energies using binless
WHAM. As a reference, we take the temperature of interest (β
= β1 without loss of generality). The bias energies
at replicas with reciprocal temperature β are then ΔU = (β/β – 1)U. This formula
works also for the insertion energies coming from a run with β = 0. The insertion energies can be thought
of as coming from a run with the bias potential ΔU = −U, i.e., on potential zero. A binless-WHAM
analysis using these bias energies as input will produce the required
free energy F as the difference between the reference
state and the insertion run.
Practical
Considerations
In the derivation
of Kd and B, we have assumed that the volume is large enough
such that interactions between the protein with a fixed position and
orientation and the protein in the extended volume can be neglected.
If this condition is not fulfilled, then we can correct for residual
interaction energies using a simple distance-dependent interaction
potential ϕ(r) in the calculation of , where denotes the
Cartesian space defining ΔV. For example, at
large distances, the interaction of charged
proteins can be approximated by (screened) Coulomb interactions of
the total charges located at the centers of charge. In such a case,
we would include for the calculation of the fraction bound only configurations
of the simulation where the two proteins are separated less than a
cutoff distance, usually given by half the shortest box length. Such
a system corresponds to a spherical volume with one protein at its
center and the other one moving unrestrained. Doing so, we assume
that the residual interaction modeled as a simple pair-potential has
a negligible effect on the internal degrees of freedom and the degrees
of freedom of the surrounding solvent, i.e., and are unchanged.Suitable definitions
of bound states will depend on the molecular model we use for simulations.
In our simulations of rigid proteins in implicit solvents, we consider
a state as bound if the interaction energy of the two proteins is
smaller than −2kbT. Additionally demanding that two proteins have to have a minimum Cα distance smaller than 0.8 nm to be counted
as bound does not have a noticeable effect on the binding probability.
For molecular dynamics simulations using explicit solvents, a combination
of distance- and energy-based criteria and using transition-based-assignment
of states[47] might be necessary to reliably
distinguish bound states from spurious contacts.In simulations
of two proteins in a box, we can estimate p(V) using
a distance-based criterion as has been introduced by Ashton and Wilding.[29] We define a distance between the two proteins,
e.g., the center-of-mass distance r. We introduce
a distance R such that interactions between proteins
are negligible for distances r > R. For an ensemble of N structures, we count the
number of structures N for which r ≤ R. In these
structures, the center-of-mass of protein 2 lies within a spherical
volume v = 4πR3/3 centered at the center-of-mass of protein 1. We then estimate p(V) = N/N.For strong binders and in boxes of typical size, p(V) is close to one.
For p(V) = 1, (1 – v/V)/(1 – p(V)) in eq diverges. Consequently, p(V) has to
be determined with sufficient numerical precision to obtain accurate
estimates. For example, if we sample 10 000 configurations,
then the numerical precision of p(V) is limited to 1/10 000. The precision
can be increased by sampling more configurations or, in the case of
replica simulation, by including additional replicas using WHAM when
calculating p(V). For weak binders with Kd ≳ 100 μM, 10 000 configurations are sufficient
to estimate Kd and B even without applying WHAM.
Methods
We chose three weakly binding protein pairs
with experimental Kd values covering 3
orders of magnitude from
∼μM to ∼mM. The lysozyme homodimer has an experimental Kd value of Kd ≈
2710 ± 240 μM[48] (PDB 6LYZ[49]), the ubiquitin/CUE dimer (PDB 1OTR[50]) has a Kd ≈
155 ± 9 μM,[50] and the dimer
of the uracil-DNA glycosylase UDG and its uracil-DNA glycosylase inhibitor
protein (Ugi) has a Kd ≈ 1. 3 ±
0.3 μM[51] (PDB 1UUG[52]).To simulate these protein pairs, we use the amino-acid-level
coarse-grained
model developed by Kim and Hummer for weakly binding proteins[53] implemented in the Complexes++ software (https://www.github.com/bio-phys/complexespp). We treat all proteins as rigid bodies. In contrast to the original
model, which is called the KH-model, we shift the original Miyazawa
and Jernigan parameters[54,55] by e0 = −1.875 kbT, where T = 300 K, to account for the solvation energy and we scale
the resulting parameters by λ = 0.1243 to balance them with
the electrostatic interactions. In the original model, e0 = −2.27 kbT and λ
= 0.159. The new values have been chosen to better reproduce the B22 value of lysozyme and the Kd value of the ubiquitin/UIM1 complex. We chose residue
charges of −1.0e for Asp and Glu, +1.0e for Arg and Lys, and +0.5e for His because
its isoelectric point is at pH 7. e is the elementary
charge. Consequently, the total charges of the proteins are +8.5e for lysozyme, +0.5e for ubiquitin, −4.5e for CUE, +7.5e for UDG, and −11.5e for Ugi. We set the dielectric constant to 80 and the
Debye length to 1 nm, corresponding to the conditions in an aqueous
solution of 100 mM NaCl.To generate Boltzmann ensembles of
configurations, which also provide
the removal energy distributions defined in eq , we perform temperature replica exchange
Monte Carlo (REMC) simulations using 24 replicas. Temperatures were
equally spaced between 300 and 530 K. In a Monte Carlo sweep, each
protein performs one trial move on average, which can be translation
or rotation. Replica exchanges are attempted every 10 sweeps. For
the rotation move, a rotation axis is randomly generated by drawing
a point from a sphere. Then, we rotate around this axis by an angle,
which we draw from a box distribution with a width given by twice
the maximum angle. This maximum angle is set to 0.1 rad for the coolest
replica and to 1.25 rad for the hottest replica and spaced equidistantly
in between. Similarly, we set the maximum displacement for the translation
move to 0.2 nm in the coolest replica and to 1.35 nm in the hottest
replica, with equal spacing in between. In our simulations, we use
a cutoff radius of 3 nm to truncate our interaction potentials.To sample the insertion energy distribution defined in eq in simulations, we switch
off all interactions by setting all interaction parameters and residue
charges to zero. We use a maximum displacement of half the box length
and a maximum rotation angle of π. We accept and sample all
configurations to generate the insertion ensembles, for which we then
recalculate all energies for switched-on potentials.To estimate
the two-particle partition function, we combine results
from REMC simulations (removal ensemble) and the energies calculated
for the ensemble of noninteracting proteins (insertion ensemble) using
binless WHAM.[32−34] To avoid numerical problems, we clip interaction
energies at 100 kbT.
We define two proteins as being bound if their total interaction energy
is below −2kbT.For equilibration, we performed 106 Monte Carlo
sweeps
in each replica. For production, we performed 107 sweeps
and we sampled every 100th sweep, yielding 105 structures
for each protein pair per replica. We also performed 106 insertion moves for each pair, which by design creates uncorrelated
configurations.To study the box volume dependence of the fraction
bound pb(V), we calculated
for the
coolest replica pb = N/N. N is the number of structures with energies E ≤ −2kbT, and N = 105 is the total
number of structures. To study the box volume dependence of the subvolume
probability p(V), we calculated for the coolest replica p = N/N, where N is the number of structures within the subvolume v. We defined this volume as a spherical volume with a radius given
by the sum of (D + D)/2, where D and D are the largest diameters of proteins of species i and j, respectively, and our cutoff radius
of 3 nm. The resulting radii are between ∼6.7 and ∼7.4
nm for the three proteins. For each protein pair, we performed simulations
for 17 box sizes with volumes ranging from 3375 to 106 nm3. We calculated the standard errors of the mean by block averaging.[56,57]We validate the insertion/removal method and the subvolume
method
for the smallest boxes used here with volume = 153 nm3 = 3375
nm3. With uniform probability, we selected at random 10 000
of the N = 105 samples and chose for each
replica the configurations corresponding to the same 10 000
indices. We also drew 10 000 configurations of the 106 configurations in the insertion ensemble with uniform probability.
In the insertion/removal method, we then applied WHAM using these
250 000 configurations in total to calculate pb() and , from which we then estimated Kd and B. We repeated this procedure 1000 times and calculated the averages
of Kd and B and their covariance matrices. We confirmed visually
that the distributions of the estimates of Kd and B are
distributed according to two-dimensional Gaussians with the estimated
covariance matrices. We use the same protocol to obtain estimates
and uncertainties from resampling for the subvolume method, in which
we do not use the insertion ensemble.
Results
We calculated Kd and B using the insertion/removal method
and the subvolume method for three protein pairs, i.e., the lysozyme
homodimer and the heterodimers ubiquitin/CUE and UDG/Ugi. As we will
show, these estimates allow us to quantify the contributions due to
binding and nonbinding interactions to B.In the insertion/removal method, we determine Kd and B from replica exchange simulations at a box volume and from insertion ensembles. We first
estimated pb() and Z() by combining the insertion ensemble and
the replicas of our temperature REMC simulations using WHAM. We then
evaluated eq to obtain B and used this value together
with our estimate for pb() in eq to obtain Kd. By resampling,
we estimated the covariance matrix.In the subvolume method,
we first estimated p() and pb() from all replicas using WHAM. We used eq to calculate B from p() and used this estimate together with pb() to estimate Kd using eq .We
find that the estimates for Kd and B from the insertion/removal
method and the subvolume method agree excellently with each other
(Figure and Table ). Moreover, the estimates
have similar uncertainties. Kd values
and B values calculated
by resampling are correlated for both methods (Figure ). A smaller value of B, i.e., a more negative value, leads to
a smaller value of Kd according to eq .
Figure 2
Comparison of the accuracy
of the insertion/removal method (ins/rem,
black, solid lines) and the subvolume method (subvol, red, dashed
lines) to estimate Kd and B for three different protein pairs
(top to bottom). The most likely estimates are indicated by horizontal
and vertical dashed lines. The contour lines indicate the limits of
the 25, 50, 75, and 95% confidence regions. The insertion/removal
method (eqs and 1 and the two-particle partition function from WHAM
(Section ),
black) and the subvolume method (eqs and 1, red) agree excellently
with each other, and they have similar uncertainties. For UDG/Ugi,
contour lines collapse on to a single line due to the strong correlation
between the estimates for Kd and B.
Table 1
Kd, B, and the Contributions of
Binding Interactions, B(b), and Nonbinding Interactions, B(u), to B for Three Protein Complexes (PDB codes 6LYZ, 1OTR, 1UUG) for
the Insertion/Removal Method (“ins/rem”) and the Subvolume
Method (“subvol”)a
lysozyme
method
Kd [μM]
B22 [nm3]
B22(b) [nm3]
B22(u) [nm3]
ins/rem
5191 ± 63
–77 ± 4
–160 ± 2
83 ± 4
subvol
5188 ± 68
–78 ± 4
–160 ± 2
82 ± 3
Errors are standard
errors of the
mean.
Comparison of the accuracy
of the insertion/removal method (ins/rem,
black, solid lines) and the subvolume method (subvol, red, dashed
lines) to estimate Kd and B for three different protein pairs
(top to bottom). The most likely estimates are indicated by horizontal
and vertical dashed lines. The contour lines indicate the limits of
the 25, 50, 75, and 95% confidence regions. The insertion/removal
method (eqs and 1 and the two-particle partition function from WHAM
(Section ),
black) and the subvolume method (eqs and 1, red) agree excellently
with each other, and they have similar uncertainties. For UDG/Ugi,
contour lines collapse on to a single line due to the strong correlation
between the estimates for Kd and B.Errors are standard
errors of the
mean.For additional validation,
we use the results for Kd and B obtained at the box volume to predict the box-size dependence of the
fraction bound pb(V)
and the subvolume probability p(V). We use eq and our estimates for Kd and B obtained at
a box volume to calculate pb(V) (Figure ). We use eq and our estimates for B obtained at a box volume to calculate p(V) (Figure ). The resulting
curves reproduce the box
volume dependencies of pb(V) and p(V) observed in the entire range of simulations, covering nearly 3
orders of magnitude in volume.
Figure 3
Box-size dependence of the binding probability pb(V) is determined by B and Kd via eq .
We show
simulation results (blue) for three protein pairs (top to bottom).
Error bars indicate the blocked standard errors of the mean. The lines
are predictions using eq and estimates for Kd and B obtained at a box volume = 3375 nm3 (magenta vertical
line) using the insertion/removal method (black, solid lines) and
the subvolume method (red, dashed lines).
Figure 4
Box-size
dependence of the subvolume probability p(V) is determined by B via eq . We show simulation results (blue) for three
protein pairs (top to bottom). Error bars indicate the blocked standard
errors of the mean. The lines are predictions using eq and estimates of B obtained at a box volume = 3375 nm3 (magenta vertical
line) using the insertion/removal method (black, solid lines) and
the subvolume method (red, dashed lines).
Box-size dependence of the binding probability pb(V) is determined by B and Kd via eq .
We show
simulation results (blue) for three protein pairs (top to bottom).
Error bars indicate the blocked standard errors of the mean. The lines
are predictions using eq and estimates for Kd and B obtained at a box volume = 3375 nm3 (magenta vertical
line) using the insertion/removal method (black, solid lines) and
the subvolume method (red, dashed lines).Box-size
dependence of the subvolume probability p(V) is determined by B via eq . We show simulation results (blue) for three
protein pairs (top to bottom). Error bars indicate the blocked standard
errors of the mean. The lines are predictions using eq and estimates of B obtained at a box volume = 3375 nm3 (magenta vertical
line) using the insertion/removal method (black, solid lines) and
the subvolume method (red, dashed lines).For strong binders, the fraction bound pb(V) and the subvolume probability p(V) take on similar
values (compare Figures and 4). In these cases, p(V) is dominated by
binding. For small boxes, pb(V) is close to one and consequently so is p(V). For box sizes large enough
such that pb(V) is significantly
below one, the contribution of the size of the subvolume v to p(V) is small. For UDG/Ugi, the strongest binding complex considered
here, the fraction bound dominates p(V) such that the p(V) curve in Figure looks nearly identical
to the corresponding pb(V) curve in Figure . However, the differences in these curves are significant as they
are not only determined by the size of the subvolume v but also by the nonbinding interactions.We can extract the
contributions B(u), eq , of nonbinding
interactions to B.
We can do so even in the case of strong binders for which the Kd value is close to B(b) = −1/(2NAKd) according to eq (Figure ,
top). With the estimates provided by either the insertion/removal
method or the subvolume method, we can resolve the small difference B(u) = B – B(b) (Figure , center). Focusing on the results from the
insertion/removal methods, we find that for lysozyme B(u) ≈ 83 ± 4 nm3 > 0. This value
is
close to what one would expect for hard spheres of equal volume, i.e., B(u) = vexc/2 ≈
70 nm3. For ubiquitin/CUE, the interactions are clearly
attractive, but B(u) ≈ −9 ± 3
nm3 nearly vanishes. For UDG/Ugi, B(u) ≈ −94 ± 5 nm3 indicates attractive
interactions (Figure and Table ).
Figure 5
Contributions
of binding and nonbinding interactions to B = B(b) + B(u) for three protein pairs. We show estimates
from the insertion/removal method in color and estimates from the
subvolume method using larger symbols in gray. B of the strongest binders is dominated by
contributions of binding B(b) = −1/(2NAKd) such that the
ratio of |B(b)/B| is close to one (top). In these cases, nonbinding
contributions to B are
relatively small, i.e., |B(u)/B| ≪ 1 (center).
Contributions
of binding and nonbinding interactions to B = B(b) + B(u) for three protein pairs. We show estimates
from the insertion/removal method in color and estimates from the
subvolume method using larger symbols in gray. B of the strongest binders is dominated by
contributions of binding B(b) = −1/(2NAKd) such that the
ratio of |B(b)/B| is close to one (top). In these cases, nonbinding
contributions to B are
relatively small, i.e., |B(u)/B| ≪ 1 (center).Note that for Ubi/CUE and UDG/Ugi, the estimates for B23(u) = B23 – B23(b) are much smaller
than the
individual errors of B23 and B23(b) (∼27 000
nm3 for UDG/Ugi and ∼40 nm3 for Ubi/CUE; Table ). Naively, one would
think that these large uncertainties preclude reliable estimates for
the comparably small difference B23(u) in such a situation. However,
the estimates for B23 and B23(b) from
resampling are highly correlated because of the strong correlation
of B23 and Kd (Figure ). That
is, the individual errors of B23 and B23(b) do not determine the errors of their difference.Next, we
show that the naive estimate of Kd from
concentrations using eq actually suffers from a finite-size effect and that
it converges to the estimates obtained with the insertion/removal
and subvolume methods for large system sizes (Figure ). For comparison only, we evaluate eq for our predictions of pb(V) obtained at a volume (Figure ) and extrapolate the naive estimates for Kd until convergence is reached. For typical box sizes
used in simulations, Kd is underestimated
by about 10% for the lysozyme homodimer, the weakest binder considered
here, and by 3 orders of magnitude for UDB/Ugi, the strongest binder
considered here. To reach convergence when using eq , the box volumes have to be increased by
a factor ∼100 for the weakest binder and by a factor ∼100 000
for the strongest binder compared to typical box sizes.
Figure 6
Finite-size
correction gives box-size-independent dissociation
constants Kd (eq , red symbols). The naive estimate of Kd (eq , blue symbols) suffers from finite-size effects and converges
to the true value (gray horizontal line) for increasing box size.
We illustrate this convergence by evaluating eq for the predictions for pb(V) from the insertion/removal method
(black solid line) and the subvolume method (red dashed line). Approximately
corrected estimates (eq , eq of de Jong
et al.,[18] green symbols) suffer from finite-size
effects and converge to the true value for increasing box volume.
Error bars have been obtained by resampling.
Finite-size
correction gives box-size-independent dissociation
constants Kd (eq , red symbols). The naive estimate of Kd (eq , blue symbols) suffers from finite-size effects and converges
to the true value (gray horizontal line) for increasing box size.
We illustrate this convergence by evaluating eq for the predictions for pb(V) from the insertion/removal method
(black solid line) and the subvolume method (red dashed line). Approximately
corrected estimates (eq , eq of de Jong
et al.,[18] green symbols) suffer from finite-size
effects and converge to the true value for increasing box volume.
Error bars have been obtained by resampling.Using eq , we obtain
finite-size effect-free estimates for Kd at all box volumes (see Figure ). In contrast, the estimates obtained using the approximate
relation given by eq (eq of de Jong
et al.[18]) show small but systematic deviations
determined by B(u) (eq ); (see Figure ). These systematic deviations decrease with
increasing box volume as 1/V (Figure ). For the three dimers considered here,
these differences are in the range of ±5% for the smallest box
sizes used here.
Figure 7
Relative difference between the approximate estimates Kd′ (eq , eq of de Jong et al.[18]) and the box-size-independent estimates Kd (eq )
for the dissociation
constant as shown in Figure (discs) as functions of the inverse box volume 1/V. This difference is proportional to 1/V and to the contribution of unbound states to the second osmotic
virial coefficient, B(u) (eq , lines).
Relative difference between the approximate estimates Kd′ (eq , eq of de Jong et al.[18]) and the box-size-independent estimates Kd (eq )
for the dissociation
constant as shown in Figure (discs) as functions of the inverse box volume 1/V. This difference is proportional to 1/V and to the contribution of unbound states to the second osmotic
virial coefficient, B(u) (eq , lines).
Conclusions
We have shown how to calculate the dissociation
constant Kd of two proteins in a box from
the fraction
of protein dimers and the second osmotic virial coefficient B. We derived and validated
two methods to calculate B: For implicit solvents, we can use standard Monte Carlo or
molecular dynamics simulations of two proteins in a box and determine
insertion and removal energy distribution functions. From the latter,
we determine the two-particle partition function and thus B using BAR/WHAM. For implicit
and explicit solvents, we can calculate the probability that the two
proteins are within a volume at least covering the interaction range
of the two proteins.[29] Calculating B from the radial distribution
function or equally the potential of mean force via an integral is
equivalent to this method. For the coarse-grained simulations performed
here, both methods provide accurate results with comparable uncertainties.The relationship between Kd and B given by eq is also well suited for the quantification
of protein interactions in molecular dynamics simulations using explicit
solvents. Fully atomistic simulations of concentrated protein solutions
in explicit solvents have become computationally feasible on the microsecond
scale.[15,16] These studies have been facilitated by recent
improvements in molecular force fields, which correct, among other
things, for an increased stickiness of protein surfaces.[58−61] These parameterization efforts can benefit from comparisons of Kd and B to the experiment.Fully atomistic simulations are within
reach for the protein pairs
considered here. The box volume used here corresponds to about 300 000
particles in fully atomistic simulations using explicit solvents.
The binding and unbinding of weakly binding proteins like lysozyme
can be simulated atomistically without bias.[15] For more strongly binding proteins, enhanced sampling techniques
have to be applied.[62] Binding and unbinding
events of proteins and other molecules can be simulated efficiently
without bias also in molecular dynamics simulations using explicit
solvents using the MARTINI model, for example.[63−65]The sampling
strategy used here for weak binders is different from
the sampling strategy commonly used for strong binders. Strong binders
usually have specific interfaces, and the dissociation constant is
determined by the binding free energy to these specific interfaces.
If these interfaces are known, then we only have to calculate the
binding free energy for these specific binding poses dominating Kd.[17] For weak binders,
also nonspecific binding can contribute significantly to Kd and thus has to be sampled.B also plays an
important role in understanding phase separation by which liquid droplets
are formed within cells.[66] Specifically,
the Flory–Huggins solution theory is used to model liquid–liquid
phase separations.[67,68] In this framework, the Flory
interaction parameter χ is determined by Kd and B.[69]B also determines the “effective solvation volume” up
to a proportionality constant, a quantity commonly used in polymer
science.[70]The interactions of proteins
in nonbinding configurations can be
quantified by B(u), which is fully determined by Kd and B and which is thus a well-defined thermodynamic quantity. These
interactions shape the physicochemical properties of the crowded environments
inside cells. For example, nonbinding interactions can lead to demixing
and therefore to colocalization of binding partners. This colocalization
effectively increases the binding probability.In principle,
the contributions B(u) of nonbinding interactions to B can be determined experimentally.
SAXS experiments provide
information about B in the forward scattering intensities as well as information about
dimerization, and thus Kd, encoded in
the radius of gyration. Varying protein concentrations in equilibrium
sedimentation experiments can provide estimates for Kd and B.[10] The latter is used to correct for
the nonideality of the protein solution. Equation can be viewed as such a correction for nonideality.
Especially for weak binders, we expect that Kd and B can
be estimated accurately enough such that the contributions B(u) of nonbinding conformations to B can be determined. Similar to the
calculations performed here, we expect that in sedimentation experiments,
the uncertainties in the estimates for B(u) will be much smaller than the individual uncertainties in the estimates
for Kd and B.Complexes++ simulation software and the
binless-WHAM code can be
downloaded free of charge at https://www.github.com/bio-phys/complexespp and at https://github.com/bio-phys/binless-wham, respectively.
Authors: Djurre H De Jong; Lars V Schäfer; Alex H De Vries; Siewert J Marrink; Herman J C Berendsen; Helmut Grubmüller Journal: J Comput Chem Date: 2011-04-05 Impact factor: 3.376
Authors: Adip Jhaveri; Dhruw Maisuria; Matthew Varga; Dariush Mohammadyani; Margaret E Johnson Journal: J Phys Chem B Date: 2021-04-07 Impact factor: 2.991
Authors: Monica L Fernández-Quintero; Patrick K Quoika; Florian S Wedl; Clarissa A Seidler; Katharina B Kroell; Johannes R Loeffler; Nancy D Pomarici; Valentin J Hoerschinger; Alexander Bujotzek; Guy Georges; Hubert Kettenberger; Klaus R Liedl Journal: Front Mol Biosci Date: 2022-01-26