Literature DB >> 32379446

Quantifying Protein-Protein Interactions in Molecular Simulations.

Alfredo Jost Lopez¹, Patrick K Quoika¹, Max Linke¹, Gerhard Hummer^1,2, Jürgen Köfinger¹.

Abstract

Interactions among proteins, nucleic acids, and other macromolecules are essential for their biological functions and shape the physicochemcial properties of the crowded environments inside living cells. Binding interactions are commonly quantified by dissociation constants Kd, and both binding and nonbinding interactions are quantified by second osmotic virial coefficients B2. As a measure of nonspecific binding and stickiness, B2 is receiving renewed attention in the context of so-called liquid-liquid phase separation in protein and nucleic acid solutions. We show that Kd is fully determined by B2 and the fraction of the dimer observed in molecular simulations of two proteins in a box. We derive two methods to calculate B2. From molecular dynamics or Monte Carlo simulations using implicit solvents, we can determine B2 from insertion and removal energies by applying Bennett's acceptance ratio (BAR) method or the (binless) weighted histogram analysis method (WHAM). From simulations using implicit or explicit solvents, one can estimate B2 from the probability that the two molecules are within a volume large enough to cover their range of interactions. We validate these methods for coarse-grained Monte Carlo simulations of three weakly binding proteins. Our estimates for Kd and B2 allow us to separate out the contributions of nonbinding interactions to B2. Comparison of calculated and measured values of Kd and B2 can be used to (re-)parameterize and improve molecular force fields by calibrating specific affinities, overall stickiness, and nonbinding interactions. The accuracy and efficiency of Kd and B2 calculations make them well suited for high-throughput studies of large interactomes.

Entities: Chemical Disease Species

Mesh：

Substances：
Proteins
Solvents

Year: 2020 PMID： 32379446 PMCID： PMC7294537 DOI： 10.1021/acs.jpcb.9b11802

Source DB: PubMed Journal: J Phys Chem B ISSN： 1520-5207 Impact factor: 2.991

Introduction

In biological cells, most protein, DNA, and RNA molecules have to bind to specific binding partners to perform their biological functions. These specific interactions compete with nonspecific interactions, and cells have evolved various mechanisms to minimize wasteful nonspecific binding.[1,2] However, nonspecific interactions shape the physicochemical properties of the crowded environments inside cells.[3] The quantification of binding affinities and interaction strengths of biological macromolecules is thus crucial for the understanding and modeling of cellular processes. In the following, we focus on protein–protein interactions, but all of our results are generally applicable to other specific and nonspecific binding interactions. Experimentally, protein interactions are quantified by the dissociation constants Kd and the second osmotic virial coefficient B of protein species i and j. We follow the common convention and use B22 for self-interactions and B23 for cross-interactions. The dissociation constant Kd quantifies the amount of bound proteins and can be measured in isothermal titration calorimetry, surface plasmon resonance, or analytical ultracentrifugation experiments, for example.[4] The interaction strength of pairs of proteins in binding and nonbinding configurations can be quantified by measuring the second osmotic virial coefficient B, which relates the microscopic protein interactions to the macroscopic osmotic pressure.[5−7] Moreover, the second osmotic virial coefficient is related to solubility and used as a predictor for protein crystallization conditions.[8,9] In experiments, B is measured by sedimentation[10−12] and size-exclusion chromatography.[8] Scattering experiments, such as static light scattering (SLS) and small-angle X-ray scattering (SAXS) experiments, can provide approximate estimates for B.[13,14] Kd and B are crucial quantities to relate molecular simulations of interacting proteins to the experiment. Such comparisons become increasingly important as molecular simulations of crowded cell-like environments have become computationally feasible, even in full atomic detail.[15,16] In simulations of strong binders, Kd is usually determined by calculating the binding free energy to specific binding interfaces.[17] If binding interfaces are unknown, Kd values are often calculated from the ratio of bound and unbound populations,[18] as recently applied to RNA–RNA binding.[19] As we will discuss here, this approximation is accurate only for special cases. B can be estimated by integration over the configuration space,[20−22] by Mayer sampling,[23,24] from molecular simulations using radial distribution functions or potentials of mean force,[25−28] and by simply counting all configurations in which proteins do not interact.[29,30] Here, we show that Kd is fully determined by B and the fraction pb(V) of bound proteins estimated from molecular simulations of two proteins in a box with volume V, i.e.Here NA is Avogadro’s constant. In the derivation of this equation, we do not make any assumptions about the interaction strength or about the degrees of freedom of proteins or the solvent. Thus, it is generally applicable and valid not only for coarse-grained simulations using implicit solvents but also for fully atomistic molecular dynamics simulations using explicit solvents. We present two different routes to calculate B and thus Kd. For simulations using implicit solvents, we can apply protein insertion and removal moves to estimate the free energy that corresponds to the two-particle partition function determining B. The insertion ensemble can be generated with any Monte Carlo or molecular dynamics code to sample from the canonical ensemble without modification. We estimate the partition function by combining the insertion and removal ensemble using either Bennett’s acceptance ratio (BAR) method[31] or the binless weighted histogram analysis method (WHAM).[32−34] In contrast to the Mayer sampling method,[23,24] which uses molecular Monte Carlo integration to calculate virial coefficients even of higher orders, here, we use exactly the same simulation system for the calculation of B as we use to sample from the canonical ensemble. For simulations using either implicit or explicit solvents, B can be calculated accurately by estimating the probability that the two proteins are outside of their interaction range.[29] We present mathematically simple expressions for B and Kd in terms of this probability, which provide insights into their physical interpretations complementary to more common formulations based on radial distribution functions or potentials of mean force. We quantify the interactions of the two proteins when they are not bound using Kd and B. Previously, theoretical models for excluded volumes have been used to extract nonbinding interactions from experimentally measured B values.[35] Here, we use the fact that the contributions of bound configurations to B are completely determined by Kd and show that the remaining contributions have a simple and clear interpretation. Moreover, we propose that these contributions of nonbinding interactions can be estimated in experiments. The article is organized as follows. In Section , we derive expressions to calculate the dissociation constant and the second osmotic virial coefficient from simulations. We present the details of our methods in Section and a validation of our methods and results for three weekly binding proteins using coarse-grained simulations in Section . We end with conclusions in Section .

Theory

For simulations of two proteins in a box, we show that the dissociation constant Kd is determined by the binding probability and the second osmotic virial coefficient B of protein species i and j. The latter is determined by the two-particle partition function, which in general can be estimated from the fraction of states where proteins are outside of their interaction range[29] or, for implicit solvents, by performing a free energy calculation using insertion and removal moves.

Preliminaries

McMillan and Mayer[5] have shown how we can apply results of statistical mechanics to describe osmotic properties of solutions. Integrating out solvent degrees of freedom, only solute degrees of freedom remain and solutes interact with each other via effective potentials. For such a system with m solute species, the virial equation of state[36,37] becomes the osmotic virial equation of state, i.e.where Π is the osmotic pressure, Vm is the molar volume, R is the gas constant, T is the temperature, x is the mole fraction of species i, and B is the osmotic second virial coefficient of proteins of species i and j. We can express the second virial coefficients B of an arbitrarily shaped particle of species i and an arbitrarily shaped particle of species j, via one- and two-particle configurational partition functions. To do so, we extend the derivation by McQuarrie to nonspherical particles[7] and start from the grand canonical partition functionwhere V is the volume, N is the number of molecules of species i containing n atoms each, and z = exp(βμ) is the fugacity determined by the chemical potential μ of species i and the inverse temperature β = 1/(kbT). kb is Boltzmann’s constant. The osmotic pressure is a function of the fugacities and given by βΠV = ln Ξ(T,V, z1,...,z).[38,39] Here, we write the canonical partition function Q(N1,...,N) of m species of arbitrarily shaped particles aswhere is the corresponding configurational partition functionwhere the potential energy U(X) depends on the set X of all |X| = ∏Nn atom positions. In eq , we introduced for the single-particle canonical partition function, e.g., . For spherically symmetric particles, and we recover McQuarrie’s expression[7] for Q(N1, ..., N). For rigid cylindrically symmetric and asymmetric particles, and , respectively. Note that in the following, we use “Z” instead of “ for these expressions for rigid molecules to distinguish them from the full configurational partition function of flexible molecules written as calligraphic “. We obtain for the second osmotic virial coefficientswhere we introduced for the two-particle partition function, e.g., for a pair of particles of species 1 and 2 or for a pair of particles of species 1.

Estimating the Dissociation Constant

We show how to obtain a box-size-independent estimate of the dissociation constant Kd from simulations of two proteins in a box. Kd is related to the Gibbs binding free energy ΔG viawhere c0 = 1M is the standard concentration and Δp is the pressure difference between bound and unbound states.[40] The last term is usually small and can be neglected. For large enough box volumes V, one would be tempted to estimate the dissociation constant of two proteins A and B directly from the binding probability pb(V). For a discussion of suitable definitions of bound states, see Section . Using the concentrations of free proteins [A] = [B] = (1 – pb(V))/(NAV) and the concentration of bound protein [AB] = pb(V)/(NAV), a first rough estimate of the dissociation constant is given byFor small box sizes typically used in simulations, this estimate suffers from finite-size effects. Accurate estimates using eq would require unusually large boxes, as we show in Section , which makes sampling highly inefficient. To overcome this finite-size effect, we effectively extend the box volume analytically and calculate Kd in the limit of infinite volume (Figure ). We emphasize here that in the following derivation, we consider fully flexible proteins without any restrictions on their internal degrees of freedom. We remove the translational and rotational degrees of freedom of the protein of species i, which correspond to a factor Z(V) = 8π2V in the partition function for asymmetric proteins. That is, we fix the position and orientation of the protein of species i, which leaves the internal degrees of freedom due to the flexibility of the protein unchanged. The corresponding partition function of j in the presence of i, with i fixed in position and orientation but internally flexible, is given by We extend this system with a fixed position and orientation of the flexible protein i by an additional volume ΔV accessible to the second protein. The contribution to the partition function of a protein of species j being in this additional volume ΔV is given bywhere Z(ΔV) = 8π2ΔV gives the contribution due to the translational and rotational degrees of freedom of an asymmetric protein to the partition function. and are the partition functions of individual proteins i and j, whose positions and orientations are fixed in space. That is, and contain only contributions due to the respective internal degrees of freedom of free proteins and due to the degrees of freedom of solvent molecules in the vicinity of the proteins, which differ from the bulk due to the presence of the protein. For rigid protein models in implicit solvents, .

Figure 1

Calculating the dissociation constant Kd and the second osmotic virial coefficient B from simulations of two proteins in a box of volume V. The red protein has a single specific wedge-shaped binding site for the triangular blue protein. The light-blue protein configurations illustrate different interaction modes of the two proteins considered in the derivation of eqs and 28. To obtain a Kd estimate independent of box size, we analytically extend the two-particle partition function for the simulation box by the contributions of an extension volume ΔV (gray shaded area) and perform the limit ΔV → ∞. We calculate B from the probability p(V) that the two proteins are within a subvolume v (green), which is at least large enough to cover all protein–protein interactions (yellow shaded area). The probability pb(Vex) that the two proteins are bound in the extended volume Vex = V + ΔV is now given by the ratio of the partition function of the bound proteins to the partition function of the extended system . With the position of protein i fixed, is independent of the size of the volume V and thus the same for the simulation box and for the extended system, i.e., . ConsequentlyTo calculate a Kd value unaffected by the finite size of the simulation box, we now substitute eq into eq . We then take the limit ΔV → ∞ and use that Z(V)/V = 8π2 to obtainWe can rewrite this equation realizing that the partition function of all bound states of the system, where also protein i can move and rotate, is given by . Note that is proportional to V. Equation becomesExpressing by the second osmotic virial coefficient defined in eq and inserting the resulting expression in eq , we obtain the relationship between Kd and B given in eq As a corollary, the volume dependence of the fraction of bound proteinsis parameterized by Kd and B. As we derive in the following, Kd and B fulfill the approximate relationThis approximate relationship becomes an exact relationship if we define all interacting states as bound states[41,42] or for proteins that do not interact when they are not bound (see Section ). We write as a sum of the partition functions for the bound and for the unbound states, i.e., , and insert this expression in eq . We then obtainIf unbound interactions are weak, thensuch thatwhere we used eq . Rearranging this equation, we arrive at eq . We can now insert this expression into eq and obtainfrom which we can express Kd to obtain an approximate estimate for Kd, which we call Kd′, i.e.Here, we introduced the fraction of unbound protein configurations as pu(V) = 1 – pb(V). Note that eq corresponds to eq of de Jong et al.[18] For the exact relationship between Kd and Kd′, see Section .

Estimating the Second Osmotic Virial Coefficient

As we have shown above, we have to estimate B to accurately estimate Kd. To do so, we apply the same concepts as we have used for the calculation of Kd. We first remove contributions to the partition function due to the translational and rotational freedom of the whole system by keeping the position and the orientation of the otherwise flexible protein i fixed (eq ). Around this protein, we define a subvolume v < V, which has to be big enough such that it captures all protein–protein interactions (Figure ). Outside this subvolume, protein–protein interactions can be neglected. That is, the flexible protein j moves freely when it is in volume δv = V – v. The probability p(V) that protein j is in subvolume v is given bywhere is given analogous to eq and Z(δv) = 8π2δv for asymmetric proteins. We can express from eq asUsiing eq for , it follows that1 – p(V) is the probability that protein j is in volume δv. ConsequentlyUsing that Z(v) and Z(δv) are proportional to their arguments with the same prefactor (see Section ) and that δv = V – v, where V is the box volume, we obtainSolving for p(V), we obtainwhich describes the dependence of p(V) on the box volume V and the subvolume v. We emphasize that eq is generally valid for arbitrary binding partners, without making any assumptions about symmetry or the number of internal degrees of freedom of the binding partners or of the solvent. The only condition is that interactions between binding partners are negligible outside of the volume v. We can introduce correction terms based on an effective pairwise potential acting between the binding partners if this condition is not fulfilled (see Section ). To motivate the interpretation of eq , we rewrite it asNote that the prefactor in eq contains the box volume V, whereas the prefactor in eq contains the subvolume v. The first term in the brackets, determining the two-particle partition function, is the ratio of the probability of finding one protein outside of the subvolume v for the ideal system, 1 – v/V, to the corresponding probability for the interacting proteins, 1 – p(V). This ratio, which is the inverse of the quantity f2(V) of Ashton and Wilding,[29] is independent of the subvolume v, chosen to be just large enough to cover the interaction range. Consequently, the first term in the brackets in eq can be written as 1/exp[−βFo(ex)(V)], where we introduced the excess free energy of finding the two proteins outside of their interaction range in the box of volume V asWe express Kd as a function of p(V) by inserting eq into eq and obtainWe next establish the commonly used relationship of B to the partial radial distribution function g(r).[7] The ratio of p(V)/(1 – p(V)) can be estimated from the probability density of center-of-mass distances p(r) of two proteins in a box, for instance, which is itself related to the radial distribution function g(r). To do so, we define a spherical volume v = 4πR3/3 and a spherical shell around this sphere with volume δv = 4π[(δR + R)3 – R3]/3. The ratio is then given byWe define a radial distribution function g(r) throughWe can choose the proportionality constant such that g(r) = 1 for r > R, where p(r) ∝ r2. Then, 4π ∫g(r)r2 dr = δv and we may writeInserting this expression in eq and using that ∫0 4πr2dr = v, we obtainBy introducing an effective interaction potential βw(r) = −ln g(r), we can write eq as it is commonly presentedUsing eq instead of eq or 35, we can avoid the computation of distance distribution functions and potentials of mean force, respectively, and the subsequent integration. Importantly, we also do not have to estimate the plateau value of g(r), which in simulations is different from one and which depends on system size and the thermodynamic ensemble.[29,43] Although these differences might be viewed only as a minor simplification, eq emphasizes that B is independent of the detailed shapes of g(r) and w(r) and determined by the excess free energy Fo(ex)(V) of finding the two proteins outside of their interaction range. Note that our results also apply to the infinite dilution limits of the Kirkwood–Buff integrals G = 4π ∫∞[g(r) – 1]r2 dr = 2B.[13,44,45]

Contribution of Nonbinding Interactions to B

We can use Kd and B to quantify the nonbinding interactions of two proteins. Let us first consider two nonbinding proteins for which (V) = 0. Consequently, eq becomeswhere we use the superscript “(u)” to indicate contributions of the unbound states. For binding proteins, B(u) is given by the difference between B = B(u) + B(b) and the contributions to B(b) due to bindingi.e., we can quantify the nonbinding interactions for two binding proteins viawhich becomesFor hard spheres, pu(V) = 1 and p(V) = (v – vexc)/(V – vexc), where vexc is the excluded volume, such that B(u) = vexc/2. For attractive nonbinding interactions B(u) < vexc/2, and for repulsive nonbinding interactions B(u) > vexc/2. Note that for asymmetric proteins, vexc corresponds to an excluded region in the configuration space, which, for instance, is spanned by Cartesian coordinates and Euler angles in the case of rigid proteins. Thus, in general, vexc should be viewed as an effective volume corresponding to a thermodynamic free energy. We now show that B(u) quantifies the difference between the approximate expression for Kd in eq and the box-size-independent expression for Kd in eq . Inserting eq into eq , we obtainsuch that the relative difference is given byConsequently, the approximate estimate Kd′ deviates systematically from the true value Kd, with deviations proportional to B(u), but converges to the true value with increasing box volume as 1/V.

Indistinguishable Binding Partners (Homodimers)

So far, we have assumed that the proteins are distinguishable, i.e., that they form heterodimers, but all expressions derived here are also valid for indistinguishable binding partners forming homodimers. To consider the case of two identical binding partners, we rewrite eq aswhere we introduced for the partition function of two free proteins, which is determined by the product of two single-protein partition functions. For indistinguishable binding partners forming homodimers, both and would have to be multiplied by a factor 1/2 to account for the indistinguishablity of the proteins. However, these factors then cancel in the ratio in eq .

Kd and B from a Single Simulation

We can estimate Kd and B from the fraction of bound protein pb(V) and the probability p(V) of one protein being located in a subvolume v around the other. The latter determines B according to eq , which we then insert into eq to obtain the finite-size corrected estimate of Kd. We call this method the subvolume method. To calculate B, we can also estimate the two-particle partition function Z(V), now for simplicity but without loss of generality only considering rigid molecules, using free energy methods.[46] For implicit solvents, we can use insertion and removal moves of the proteins to efficiently estimate Z(V), as explained in the following. We call this method the insertion/removal method.

Estimating Two-Particle Configurational Partition Functions for Implicit Solvents

A simulation of a pair of proteins in a box of volume V at reciprocal temperature β gives us immediately the particle-removal energy distribution as the normalized distribution of potential energies. We define x = (r,Ω), where r are the Cartesian coordinates of the geometric center of protein i and Ω are its Euler angles defining its orientation. We denote the configuration space as W = V × Ω to simplify the notation. The particle-removal energy distribution is then given bywhere Z23(β) = ∫ dx2dx3e–β and δ[·] is Dirac’s delta function. The particle-insertion energy distribution pins(E) is formally given bywhere Z(β = 0) = ∫  dx. Sampling the particle-insertion energy distribution pins(E) for a given box size is straightforward. All one needs is a replica with reciprocal temperature β = 0 exactly. All moves will then be accepted, and the energies saved are those of random insertions. Alternatively, one could make trial moves of the two proteins with Monte Carlo move widths ±L/2, where L is the box length, and the orientation changes about random axes by ±π, and to write out the absolute trial (!) energies (not the energy differences or the accepted energies). With such a move protocol, it would not matter if one or both particles were moved and if moves are accepted or not. It also does not matter what the “acceptance rate” is (i.e., it can be zero!). What is important, though, is that the box volumes in insertion and removal runs are the same. The normalized removal and insertion energy distributions are related to each other bywhich follows fromThe ratio of partition functions defines the free energy of going from a system of two noninteracting particles to a system in which they interactNote that F = −Fo(ex)(V) (see eq ). An efficient way of determining this free energy is to use the Bennett acceptance ratio (BAR) estimator[31]where E are the uncorrelated (by construction) insertion energies and are the uncorrelated removal energies. However, it is clear that this is problematic in cases where the proteins are strongly bound (forming a dimer!) because then one would have very little information about higher energies. This problem can be remedied using all of the data in a temperature replica exchange simulation. In effect, the high-temperature runs allow us to estimate an accurate density of states to a pretty high energy. The particle-insertion energies complement this density of states on the high-energy side. All of the runs at different temperatures can be combined with the list of insertion energies using binless WHAM. As a reference, we take the temperature of interest (β = β1 without loss of generality). The bias energies at replicas with reciprocal temperature β are then ΔU = (β/β – 1)U. This formula works also for the insertion energies coming from a run with β = 0. The insertion energies can be thought of as coming from a run with the bias potential ΔU = −U, i.e., on potential zero. A binless-WHAM analysis using these bias energies as input will produce the required free energy F as the difference between the reference state and the insertion run.

Practical Considerations

In the derivation of Kd and B, we have assumed that the volume is large enough such that interactions between the protein with a fixed position and orientation and the protein in the extended volume can be neglected. If this condition is not fulfilled, then we can correct for residual interaction energies using a simple distance-dependent interaction potential ϕ(r) in the calculation of , where denotes the Cartesian space defining ΔV. For example, at large distances, the interaction of charged proteins can be approximated by (screened) Coulomb interactions of the total charges located at the centers of charge. In such a case, we would include for the calculation of the fraction bound only configurations of the simulation where the two proteins are separated less than a cutoff distance, usually given by half the shortest box length. Such a system corresponds to a spherical volume with one protein at its center and the other one moving unrestrained. Doing so, we assume that the residual interaction modeled as a simple pair-potential has a negligible effect on the internal degrees of freedom and the degrees of freedom of the surrounding solvent, i.e., and are unchanged. Suitable definitions of bound states will depend on the molecular model we use for simulations. In our simulations of rigid proteins in implicit solvents, we consider a state as bound if the interaction energy of the two proteins is smaller than −2kbT. Additionally demanding that two proteins have to have a minimum Cα distance smaller than 0.8 nm to be counted as bound does not have a noticeable effect on the binding probability. For molecular dynamics simulations using explicit solvents, a combination of distance- and energy-based criteria and using transition-based-assignment of states[47] might be necessary to reliably distinguish bound states from spurious contacts. In simulations of two proteins in a box, we can estimate p(V) using a distance-based criterion as has been introduced by Ashton and Wilding.[29] We define a distance between the two proteins, e.g., the center-of-mass distance r. We introduce a distance R such that interactions between proteins are negligible for distances r > R. For an ensemble of N structures, we count the number of structures N for which r ≤ R. In these structures, the center-of-mass of protein 2 lies within a spherical volume v = 4πR3/3 centered at the center-of-mass of protein 1. We then estimate p(V) = N/N. For strong binders and in boxes of typical size, p(V) is close to one. For p(V) = 1, (1 – v/V)/(1 – p(V)) in eq diverges. Consequently, p(V) has to be determined with sufficient numerical precision to obtain accurate estimates. For example, if we sample 10 000 configurations, then the numerical precision of p(V) is limited to 1/10 000. The precision can be increased by sampling more configurations or, in the case of replica simulation, by including additional replicas using WHAM when calculating p(V). For weak binders with Kd ≳ 100 μM, 10 000 configurations are sufficient to estimate Kd and B even without applying WHAM.

Methods

We chose three weakly binding protein pairs with experimental Kd values covering 3 orders of magnitude from ∼μM to ∼mM. The lysozyme homodimer has an experimental Kd value of Kd ≈ 2710 ± 240 μM[48] (PDB 6LYZ[49]), the ubiquitin/CUE dimer (PDB 1OTR[50]) has a Kd ≈ 155 ± 9 μM,[50] and the dimer of the uracil-DNA glycosylase UDG and its uracil-DNA glycosylase inhibitor protein (Ugi) has a Kd ≈ 1. 3 ± 0.3 μM[51] (PDB 1UUG[52]). To simulate these protein pairs, we use the amino-acid-level coarse-grained model developed by Kim and Hummer for weakly binding proteins[53] implemented in the Complexes++ software (https://www.github.com/bio-phys/complexespp). We treat all proteins as rigid bodies. In contrast to the original model, which is called the KH-model, we shift the original Miyazawa and Jernigan parameters[54,55] by e0 = −1.875 kbT, where T = 300 K, to account for the solvation energy and we scale the resulting parameters by λ = 0.1243 to balance them with the electrostatic interactions. In the original model, e0 = −2.27 kbT and λ = 0.159. The new values have been chosen to better reproduce the B22 value of lysozyme and the Kd value of the ubiquitin/UIM1 complex. We chose residue charges of −1.0e for Asp and Glu, +1.0e for Arg and Lys, and +0.5e for His because its isoelectric point is at pH 7. e is the elementary charge. Consequently, the total charges of the proteins are +8.5e for lysozyme, +0.5e for ubiquitin, −4.5e for CUE, +7.5e for UDG, and −11.5e for Ugi. We set the dielectric constant to 80 and the Debye length to 1 nm, corresponding to the conditions in an aqueous solution of 100 mM NaCl. To generate Boltzmann ensembles of configurations, which also provide the removal energy distributions defined in eq , we perform temperature replica exchange Monte Carlo (REMC) simulations using 24 replicas. Temperatures were equally spaced between 300 and 530 K. In a Monte Carlo sweep, each protein performs one trial move on average, which can be translation or rotation. Replica exchanges are attempted every 10 sweeps. For the rotation move, a rotation axis is randomly generated by drawing a point from a sphere. Then, we rotate around this axis by an angle, which we draw from a box distribution with a width given by twice the maximum angle. This maximum angle is set to 0.1 rad for the coolest replica and to 1.25 rad for the hottest replica and spaced equidistantly in between. Similarly, we set the maximum displacement for the translation move to 0.2 nm in the coolest replica and to 1.35 nm in the hottest replica, with equal spacing in between. In our simulations, we use a cutoff radius of 3 nm to truncate our interaction potentials. To sample the insertion energy distribution defined in eq in simulations, we switch off all interactions by setting all interaction parameters and residue charges to zero. We use a maximum displacement of half the box length and a maximum rotation angle of π. We accept and sample all configurations to generate the insertion ensembles, for which we then recalculate all energies for switched-on potentials. To estimate the two-particle partition function, we combine results from REMC simulations (removal ensemble) and the energies calculated for the ensemble of noninteracting proteins (insertion ensemble) using binless WHAM.[32−34] To avoid numerical problems, we clip interaction energies at 100 kbT. We define two proteins as being bound if their total interaction energy is below −2kbT. For equilibration, we performed 106 Monte Carlo sweeps in each replica. For production, we performed 107 sweeps and we sampled every 100th sweep, yielding 105 structures for each protein pair per replica. We also performed 106 insertion moves for each pair, which by design creates uncorrelated configurations. To study the box volume dependence of the fraction bound pb(V), we calculated for the coolest replica pb = N/N. N is the number of structures with energies E ≤ −2kbT, and N = 105 is the total number of structures. To study the box volume dependence of the subvolume probability p(V), we calculated for the coolest replica p = N/N, where N is the number of structures within the subvolume v. We defined this volume as a spherical volume with a radius given by the sum of (D + D)/2, where D and D are the largest diameters of proteins of species i and j, respectively, and our cutoff radius of 3 nm. The resulting radii are between ∼6.7 and ∼7.4 nm for the three proteins. For each protein pair, we performed simulations for 17 box sizes with volumes ranging from 3375 to 106 nm3. We calculated the standard errors of the mean by block averaging.[56,57] We validate the insertion/removal method and the subvolume method for the smallest boxes used here with volume = 153 nm3 = 3375 nm3. With uniform probability, we selected at random 10 000 of the N = 105 samples and chose for each replica the configurations corresponding to the same 10 000 indices. We also drew 10 000 configurations of the 106 configurations in the insertion ensemble with uniform probability. In the insertion/removal method, we then applied WHAM using these 250 000 configurations in total to calculate pb() and , from which we then estimated Kd and B. We repeated this procedure 1000 times and calculated the averages of Kd and B and their covariance matrices. We confirmed visually that the distributions of the estimates of Kd and B are distributed according to two-dimensional Gaussians with the estimated covariance matrices. We use the same protocol to obtain estimates and uncertainties from resampling for the subvolume method, in which we do not use the insertion ensemble.

Results

We calculated Kd and B using the insertion/removal method and the subvolume method for three protein pairs, i.e., the lysozyme homodimer and the heterodimers ubiquitin/CUE and UDG/Ugi. As we will show, these estimates allow us to quantify the contributions due to binding and nonbinding interactions to B. In the insertion/removal method, we determine Kd and B from replica exchange simulations at a box volume and from insertion ensembles. We first estimated pb() and Z() by combining the insertion ensemble and the replicas of our temperature REMC simulations using WHAM. We then evaluated eq to obtain B and used this value together with our estimate for pb() in eq to obtain Kd. By resampling, we estimated the covariance matrix. In the subvolume method, we first estimated p() and pb() from all replicas using WHAM. We used eq to calculate B from p() and used this estimate together with pb() to estimate Kd using eq . We find that the estimates for Kd and B from the insertion/removal method and the subvolume method agree excellently with each other (Figure and Table ). Moreover, the estimates have similar uncertainties. Kd values and B values calculated by resampling are correlated for both methods (Figure ). A smaller value of B, i.e., a more negative value, leads to a smaller value of Kd according to eq .

Figure 2

Table 1

Kd, B, and the Contributions of Binding Interactions, B(b), and Nonbinding Interactions, B(u), to B for Three Protein Complexes (PDB codes 6LYZ, 1OTR, 1UUG) for the Insertion/Removal Method (“ins/rem”) and the Subvolume Method (“subvol”)a

lysozyme	method	K_d [μM]	B₂₂ [nm³]	B₂₂^(b) [nm³]	B₂₂^(u) [nm³]
	ins/rem	5191 ± 63	–77 ± 4	–160 ± 2	83 ± 4
	subvol	5188 ± 68	–78 ± 4	–160 ± 2	82 ± 3

Errors are standard errors of the mean.

Comparison of the accuracy of the insertion/removal method (ins/rem, black, solid lines) and the subvolume method (subvol, red, dashed lines) to estimate Kd and B for three different protein pairs (top to bottom). The most likely estimates are indicated by horizontal and vertical dashed lines. The contour lines indicate the limits of the 25, 50, 75, and 95% confidence regions. The insertion/removal method (eqs and 1 and the two-particle partition function from WHAM (Section ), black) and the subvolume method (eqs and 1, red) agree excellently with each other, and they have similar uncertainties. For UDG/Ugi, contour lines collapse on to a single line due to the strong correlation between the estimates for Kd and B. Errors are standard errors of the mean. For additional validation, we use the results for Kd and B obtained at the box volume to predict the box-size dependence of the fraction bound pb(V) and the subvolume probability p(V). We use eq and our estimates for Kd and B obtained at a box volume to calculate pb(V) (Figure ). We use eq and our estimates for B obtained at a box volume to calculate p(V) (Figure ). The resulting curves reproduce the box volume dependencies of pb(V) and p(V) observed in the entire range of simulations, covering nearly 3 orders of magnitude in volume.

Figure 3

Figure 4

Box-size dependence of the subvolume probability p(V) is determined by B via eq . We show simulation results (blue) for three protein pairs (top to bottom). Error bars indicate the blocked standard errors of the mean. The lines are predictions using eq and estimates of B obtained at a box volume = 3375 nm3 (magenta vertical line) using the insertion/removal method (black, solid lines) and the subvolume method (red, dashed lines).

Box-size dependence of the binding probability pb(V) is determined by B and Kd via eq . We show simulation results (blue) for three protein pairs (top to bottom). Error bars indicate the blocked standard errors of the mean. The lines are predictions using eq and estimates for Kd and B obtained at a box volume = 3375 nm3 (magenta vertical line) using the insertion/removal method (black, solid lines) and the subvolume method (red, dashed lines). Box-size dependence of the subvolume probability p(V) is determined by B via eq . We show simulation results (blue) for three protein pairs (top to bottom). Error bars indicate the blocked standard errors of the mean. The lines are predictions using eq and estimates of B obtained at a box volume = 3375 nm3 (magenta vertical line) using the insertion/removal method (black, solid lines) and the subvolume method (red, dashed lines). For strong binders, the fraction bound pb(V) and the subvolume probability p(V) take on similar values (compare Figures and 4). In these cases, p(V) is dominated by binding. For small boxes, pb(V) is close to one and consequently so is p(V). For box sizes large enough such that pb(V) is significantly below one, the contribution of the size of the subvolume v to p(V) is small. For UDG/Ugi, the strongest binding complex considered here, the fraction bound dominates p(V) such that the p(V) curve in Figure looks nearly identical to the corresponding pb(V) curve in Figure . However, the differences in these curves are significant as they are not only determined by the size of the subvolume v but also by the nonbinding interactions. We can extract the contributions B(u), eq , of nonbinding interactions to B. We can do so even in the case of strong binders for which the Kd value is close to B(b) = −1/(2NAKd) according to eq (Figure , top). With the estimates provided by either the insertion/removal method or the subvolume method, we can resolve the small difference B(u) = B – B(b) (Figure , center). Focusing on the results from the insertion/removal methods, we find that for lysozyme B(u) ≈ 83 ± 4 nm3 > 0. This value is close to what one would expect for hard spheres of equal volume, i.e., B(u) = vexc/2 ≈ 70 nm3. For ubiquitin/CUE, the interactions are clearly attractive, but B(u) ≈ −9 ± 3 nm3 nearly vanishes. For UDG/Ugi, B(u) ≈ −94 ± 5 nm3 indicates attractive interactions (Figure and Table ).

Figure 5

Contributions of binding and nonbinding interactions to B = B(b) + B(u) for three protein pairs. We show estimates from the insertion/removal method in color and estimates from the subvolume method using larger symbols in gray. B of the strongest binders is dominated by contributions of binding B(b) = −1/(2NAKd) such that the ratio of |B(b)/B| is close to one (top). In these cases, nonbinding contributions to B are relatively small, i.e., |B(u)/B| ≪ 1 (center). Note that for Ubi/CUE and UDG/Ugi, the estimates for B23(u) = B23 – B23(b) are much smaller than the individual errors of B23 and B23(b) (∼27 000 nm3 for UDG/Ugi and ∼40 nm3 for Ubi/CUE; Table ). Naively, one would think that these large uncertainties preclude reliable estimates for the comparably small difference B23(u) in such a situation. However, the estimates for B23 and B23(b) from resampling are highly correlated because of the strong correlation of B23 and Kd (Figure ). That is, the individual errors of B23 and B23(b) do not determine the errors of their difference. Next, we show that the naive estimate of Kd from concentrations using eq actually suffers from a finite-size effect and that it converges to the estimates obtained with the insertion/removal and subvolume methods for large system sizes (Figure ). For comparison only, we evaluate eq for our predictions of pb(V) obtained at a volume (Figure ) and extrapolate the naive estimates for Kd until convergence is reached. For typical box sizes used in simulations, Kd is underestimated by about 10% for the lysozyme homodimer, the weakest binder considered here, and by 3 orders of magnitude for UDB/Ugi, the strongest binder considered here. To reach convergence when using eq , the box volumes have to be increased by a factor ∼100 for the weakest binder and by a factor ∼100 000 for the strongest binder compared to typical box sizes.

Figure 6

Finite-size correction gives box-size-independent dissociation constants Kd (eq , red symbols). The naive estimate of Kd (eq , blue symbols) suffers from finite-size effects and converges to the true value (gray horizontal line) for increasing box size. We illustrate this convergence by evaluating eq for the predictions for pb(V) from the insertion/removal method (black solid line) and the subvolume method (red dashed line). Approximately corrected estimates (eq , eq of de Jong et al.,[18] green symbols) suffer from finite-size effects and converge to the true value for increasing box volume. Error bars have been obtained by resampling. Using eq , we obtain finite-size effect-free estimates for Kd at all box volumes (see Figure ). In contrast, the estimates obtained using the approximate relation given by eq (eq of de Jong et al.[18]) show small but systematic deviations determined by B(u) (eq ); (see Figure ). These systematic deviations decrease with increasing box volume as 1/V (Figure ). For the three dimers considered here, these differences are in the range of ±5% for the smallest box sizes used here.

Figure 7

Relative difference between the approximate estimates Kd′ (eq , eq of de Jong et al.[18]) and the box-size-independent estimates Kd (eq ) for the dissociation constant as shown in Figure (discs) as functions of the inverse box volume 1/V. This difference is proportional to 1/V and to the contribution of unbound states to the second osmotic virial coefficient, B(u) (eq , lines).

Conclusions

We have shown how to calculate the dissociation constant Kd of two proteins in a box from the fraction of protein dimers and the second osmotic virial coefficient B. We derived and validated two methods to calculate B: For implicit solvents, we can use standard Monte Carlo or molecular dynamics simulations of two proteins in a box and determine insertion and removal energy distribution functions. From the latter, we determine the two-particle partition function and thus B using BAR/WHAM. For implicit and explicit solvents, we can calculate the probability that the two proteins are within a volume at least covering the interaction range of the two proteins.[29] Calculating B from the radial distribution function or equally the potential of mean force via an integral is equivalent to this method. For the coarse-grained simulations performed here, both methods provide accurate results with comparable uncertainties. The relationship between Kd and B given by eq is also well suited for the quantification of protein interactions in molecular dynamics simulations using explicit solvents. Fully atomistic simulations of concentrated protein solutions in explicit solvents have become computationally feasible on the microsecond scale.[15,16] These studies have been facilitated by recent improvements in molecular force fields, which correct, among other things, for an increased stickiness of protein surfaces.[58−61] These parameterization efforts can benefit from comparisons of Kd and B to the experiment. Fully atomistic simulations are within reach for the protein pairs considered here. The box volume used here corresponds to about 300 000 particles in fully atomistic simulations using explicit solvents. The binding and unbinding of weakly binding proteins like lysozyme can be simulated atomistically without bias.[15] For more strongly binding proteins, enhanced sampling techniques have to be applied.[62] Binding and unbinding events of proteins and other molecules can be simulated efficiently without bias also in molecular dynamics simulations using explicit solvents using the MARTINI model, for example.[63−65] The sampling strategy used here for weak binders is different from the sampling strategy commonly used for strong binders. Strong binders usually have specific interfaces, and the dissociation constant is determined by the binding free energy to these specific interfaces. If these interfaces are known, then we only have to calculate the binding free energy for these specific binding poses dominating Kd.[17] For weak binders, also nonspecific binding can contribute significantly to Kd and thus has to be sampled. B also plays an important role in understanding phase separation by which liquid droplets are formed within cells.[66] Specifically, the Flory–Huggins solution theory is used to model liquid–liquid phase separations.[67,68] In this framework, the Flory interaction parameter χ is determined by Kd and B.[69]B also determines the “effective solvation volume” up to a proportionality constant, a quantity commonly used in polymer science.[70] The interactions of proteins in nonbinding configurations can be quantified by B(u), which is fully determined by Kd and B and which is thus a well-defined thermodynamic quantity. These interactions shape the physicochemical properties of the crowded environments inside cells. For example, nonbinding interactions can lead to demixing and therefore to colocalization of binding partners. This colocalization effectively increases the binding probability. In principle, the contributions B(u) of nonbinding interactions to B can be determined experimentally. SAXS experiments provide information about B in the forward scattering intensities as well as information about dimerization, and thus Kd, encoded in the radius of gyration. Varying protein concentrations in equilibrium sedimentation experiments can provide estimates for Kd and B.[10] The latter is used to correct for the nonideality of the protein solution. Equation can be viewed as such a correction for nonideality. Especially for weak binders, we expect that Kd and B can be estimated accurately enough such that the contributions B(u) of nonbinding conformations to B can be determined. Similar to the calculations performed here, we expect that in sedimentation experiments, the uncertainties in the estimates for B(u) will be much smaller than the individual uncertainties in the estimates for Kd and B. Complexes++ simulation software and the binless-WHAM code can be downloaded free of charge at https://www.github.com/bio-phys/complexespp and at https://github.com/bio-phys/binless-wham, respectively.

50 in total

1. COVOL: an interactive program for evaluating second virial coefficients from the triaxial shape or dimensions of rigid macromolecules.

Authors: S E Harding; J C Horton; S Jones; J M Thornton; D J Winzor
Journal: Biophys J Date: 1999-05 Impact factor: 4.033

2. Nonspecific binding limits the number of proteins in a cell and shapes their interaction networks.

Authors: Margaret E Johnson; Gerhard Hummer
Journal: Proc Natl Acad Sci U S A Date: 2010-12-27 Impact factor: 11.205

3. Coarse master equations for peptide folding dynamics.

Authors: Nicolae-Viorel Buchete; Gerhard Hummer
Journal: J Phys Chem B Date: 2008-01-31 Impact factor: 2.991

4. Determining equilibrium constants for dimerization reactions from molecular dynamics simulations.

Authors: Djurre H De Jong; Lars V Schäfer; Alex H De Vries; Siewert J Marrink; Herman J C Berendsen; Helmut Grubmüller
Journal: J Comput Chem Date: 2011-04-05 Impact factor: 3.376

5. The statistical-thermodynamic basis for computation of binding affinities: a critical review.

Authors: M K Gilson; J A Given; B L Bush; J A McCammon
Journal: Biophys J Date: 1997-03 Impact factor: 4.033

6. Quantifying the effects of neglecting many-body interactions in coarse-grained models of complex fluids.

Authors: Douglas J Ashton; Nigel B Wilding
Journal: Phys Rev E Stat Nonlin Soft Matter Phys Date: 2014-03-10

7. Association sites of lysozyme in solution. I. The active site.

Authors: A J Sophianopoulos
Journal: J Biol Chem Date: 1969-06-25 Impact factor: 5.157

8. Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading.

Authors: S Miyazawa; R L Jernigan
Journal: J Mol Biol Date: 1996-03-01 Impact factor: 5.469

9. Calculation of Second Virial Coefficients of Atomistic Proteins Using Fast Fourier Transform.

Authors: Sanbo Qin; Huan-Xiang Zhou
Journal: J Phys Chem B Date: 2019-09-19 Impact factor: 2.991

10. Negative second virial coefficients as predictors of protein crystal growth: evidence from sedimentation equilibrium studies that refutes the designation of those light scattering parameters as osmotic virial coefficients.

Authors: Marcin Deszczynski; Stephen E Harding; Donald J Winzor
Journal: Biophys Chem Date: 2005-11-21 Impact factor: 2.352

7 in total

1. Accurate model of liquid-liquid phase behavior of intrinsically disordered proteins from optimization of single-chain properties.

Authors: Giulio Tesei; Thea K Schulze; Ramon Crehuet; Kresten Lindorff-Larsen
Journal: Proc Natl Acad Sci U S A Date: 2021-11-02 Impact factor: 11.205

2. Thermodynamics and Free Energy Landscape of BAR-Domain Dimerization from Molecular Simulations.

Authors: Adip Jhaveri; Dhruw Maisuria; Matthew Varga; Dariush Mohammadyani; Margaret E Johnson
Journal: J Phys Chem B Date: 2021-04-07 Impact factor: 2.991

3. Physical modeling of multivalent interactions in the nuclear pore complex.

Authors: Luke K Davis; Anđela Šarić; Bart W Hoogenboom; Anton Zilman
Journal: Biophys J Date: 2021-02-20 Impact factor: 4.033

4. Simulation of FUS Protein Condensates with an Adapted Coarse-Grained Model.

Authors: Zakarya Benayad; Sören von Bülow; Lukas S Stelzl; Gerhard Hummer
Journal: J Chem Theory Comput Date: 2020-12-13 Impact factor: 6.006

5. Calculation of therapeutic antibody viscosity with coarse-grained models, hydrodynamic calculations and machine learning-based parameters.

Authors: Pin-Kuang Lai; James W Swan; Bernhardt L Trout
Journal: MAbs Date: 2021 Jan-Dec Impact factor: 5.857

Review 6. Computational models for studying physical instabilities in high concentration biotherapeutic formulations.

Authors: Marco A Blanco
Journal: MAbs Date: 2022 Jan-Dec Impact factor: 5.857

7. Comparing Antibody Interfaces to Inform Rational Design of New Antibody Formats.

Authors: Monica L Fernández-Quintero; Patrick K Quoika; Florian S Wedl; Clarissa A Seidler; Katharina B Kroell; Johannes R Loeffler; Nancy D Pomarici; Valentin J Hoerschinger; Alexander Bujotzek; Guy Georges; Hubert Kettenberger; Klaus R Liedl
Journal: Front Mol Biosci Date: 2022-01-26

7 in total