Jan Domański1,2, George Hedger1, Robert B Best2, Phillip J Stansfeld1, Mark S P Sansom1. 1. Department of Biochemistry, University of Oxford , South Parks Road, Oxford OX1 3QU, U.K. 2. Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health , Bethesda, Maryland 20892-0520, United States.
Abstract
Potential of mean force (PMF) calculations are used to characterize the free energy landscape of protein-lipid and protein-protein association within membranes. Coarse-grained simulations allow binding free energies to be determined with reasonable statistical error. This accuracy relies on defining a good collective variable to describe the binding and unbinding transitions, and upon criteria for assessing the convergence of the simulation toward representative equilibrium sampling. As examples, we calculate protein-lipid binding PMFs for ANT/cardiolipin and Kir2.2/PIP2, using umbrella sampling on a distance coordinate. These highlight the importance of replica exchange between windows for convergence. The use of two independent sets of simulations, initiated from bound and unbound states, provide strong evidence for simulation convergence. For a model protein-protein interaction within a membrane, center-of-mass distance is shown to be a poor collective variable for describing transmembrane helix-helix dimerization. Instead, we employ an alternative intermolecular distance matrix RMS (DRMS) coordinate to obtain converged PMFs for the association of the glycophorin transmembrane domain. While the coarse-grained force field gives a reasonable Kd for dimerization, the majority of the bound population is revealed to be in a near-native conformation. Thus, the combination of a refined reaction coordinate with improved sampling reveals previously unnoticed complexities of the dimerization free energy landscape. We propose the use of replica-exchange umbrella sampling starting from different initial conditions as a robust approach for calculation of the binding energies in membrane simulations.
Potential of mean force (PMF) calculations are used to characterize the free energy landscape of protein-lipid and protein-protein association within membranes. Coarse-grained simulations allow binding free energies to be determined with reasonable statistical error. This accuracy relies on defining a good collective variable to describe the binding and unbinding transitions, and upon criteria for assessing the convergence of the simulation toward representative equilibrium sampling. As examples, we calculate protein-lipid binding PMFs for ANT/cardiolipin and Kir2.2/PIP2, using umbrella sampling on a distance coordinate. These highlight the importance of replica exchange between windows for convergence. The use of two independent sets of simulations, initiated from bound and unbound states, provide strong evidence for simulation convergence. For a model protein-protein interaction within a membrane, center-of-mass distance is shown to be a poor collective variable for describing transmembrane helix-helix dimerization. Instead, we employ an alternative intermolecular distance matrix RMS (DRMS) coordinate to obtain converged PMFs for the association of the glycophorin transmembrane domain. While the coarse-grained force field gives a reasonable Kd for dimerization, the majority of the bound population is revealed to be in a near-native conformation. Thus, the combination of a refined reaction coordinate with improved sampling reveals previously unnoticed complexities of the dimerization free energy landscape. We propose the use of replica-exchange umbrella sampling starting from different initial conditions as a robust approach for calculation of the binding energies in membrane simulations.
Biological
membranes are a complex mixture of multiple species
of lipids and proteins, the interactions of which play a key role
in cell function. The balance of interactions between different components
within the membrane also determines its longer range (nano to microscale)
structure. Of particular interest are the intramembrane interactions
of proteins with lipids[1] and adjacent proteins,
the latter playing key roles in membrane protein folding[2] and function.[3] Molecular
simulations have the potential to provide a detailed and quantitative
understanding of protein and lipid interactions in membranes. In particular,
simulations permit the computation of free energy landscapes or potentials
of mean force (PMF) of interactions between the different membrane
components (e.g., refs (4) and (5)). By enabling
binding free energies to be computed, PMFs can also provide mechanistic
insights into biologically relevant intermolecular interactions, which
in turn can be compared to available experimental measurements. However,
determining free energies for molecules embedded in lipid membranes
is particularly challenging. For example, the lateral diffusion coefficient
of common lipids in lipid bilayers is two to 3 orders of magnitude
smaller than that of water.[6] Therefore,
sampling methods based on molecular dynamics will be impeded by the
resulting slow diffusion of any molecule embedded in the membrane,
a situation which is exacerbated in complex and crowded membrane environments.[7]To calculate the binding free energy surface
between two species,
a sufficiently representative sample of the relevant configurations
of the system have to be visited such that the relative weights of
different free energy basins can to be established. Determining such
a sample, and when it is sufficient (or “converged”),
form the central task of any method for computing free energies. While
incompletely converged simulations may provide insights into the location
of stable states on a free energy surface, it is still possible that
the relative free energies of different states on that surface may
even be qualitatively incorrect. These challenges are starting to
be recognized in simulations of membrane-systems.[8] Therefore, for quantitative interpretation, and for comparison
with experiment, it is essential to obtain converged and therefore
reproducible results.Enhanced sampling methods have been widely
used to address the
sampling challenge in membrane simulations. Here we focus in particular
on those based on a reaction coordinate, or collective variable (CV),
which include, e.g., umbrella sampling,[9] metadynamics,[10] and adaptive bias force.[11] Replica exchange (RE) umbrella sampling (US)
is particularly advantageous due to straightforward setup and faster
convergence (relative to standard umbrella sampling) due to exchanges
between replicas.[12] This sampling method
has been applied to calculate protein–lipid and protein–protein
binding free energies in both all-atom (AA) and coarse-grained (CG)
simulations.[13−15] Umbrella sampling requires a collective variable
(CV) that defines the binding/unbinding transition of the interaction.
The most commonly used CV is a center-of-mass distance between two
species (protein–lipid or protein–protein) within the
membrane, which is used to define the umbrella windows. The initial
configurations of the simulation windows are generally selected by
translation along this reaction coordinate. Discarding an initial
fraction of a simulation sufficient to allow for equilibration, a
reweighting can be performed, using tools such as WHAM[16,17] or, equivalently, MBAR,[18] recovering
the unbiased potential of mean force after subtracting the effect
of the umbrella potentials. It is often assumed that if this PMF fails
to change with increased simulation time, then the PMF calculation
is converged. Of course, this is a necessary but insufficient test
when calculating PMFs for complex membrane systems, as we will illustrate.Here we present three examples of typical membrane PMF calculations,
two for protein–lipid (one simple, one more complex) and one
for protein–protein interactions within a lipid bilayer (Figure ). We use the coarse-grained
(CG) Martini force field which has been employed in a number of such
studies recently (recently reviewed in ref (19)). Our examples are the mitochondrial transport
protein ANT binding to cardiolipin (CL),[20] the potassium channel Kir 2.2 binding to phosphatidyl inositol 4,5-bisphosphate
(PIP2),[21] and dimerization of
the transmembrane (TM) helix of glycophorin A (GpA),[22] the latter being the prototypical system for studying protein–protein
interactions within a membrane. We study the dimerization of GpA and
its mutants, using a new collective variable, based on the intermolecular
contact matrix in the GpA dimer, to obtain a well-converged PMF. We
provide strong evidence of convergence in both cases: starting the
simulations with all replicas in a bound or all in an unbound initial
configuration has no effect on the result. The importance of exchanges
between windows is also highlighted, with absence of exchanges resulting
in slower convergence. We compare our simulation results with experimental
data, allowing us to explore limitations of the current CG force field.
Figure 1
(A) Possible
choices for initialization of umbrella sampling simulations:
“default” (cyan) corresponds to replicas starting close
to the bias center for each window; “bound” (blue) corresponds
to all the replicas close to one end of the collective variable (CV)
range; and “unbound” (red) corresponds to all replicas
close to the other end of the CV range. (B) The three simulation systems
explored in this article: ANT, the adenine nucleotide transporter
(in blue) interacting with a cardiolipin (CL; red) molecules; Kir2.2,
an inward rectifier potassium channel (gray) with a subunit (blue)
interacting with a PIP2 (red) molecule; and GpA, the glycophorin
TM transmembrane (TM) helix dimer, with the two TM helices (i.e.,
monomers) in red and blue. In each case the lipid bilayer is indicated
by its phosphate particles (in green) with the remainder of the lipid
molecules and the water on either side of the bilayer are omitted
for clarity. For each system bound and unbound system snapshots are
shown, in the upper and lower panel, respectively.
(A) Possible
choices for initialization of umbrella sampling simulations:
“default” (cyan) corresponds to replicas starting close
to the bias center for each window; “bound” (blue) corresponds
to all the replicas close to one end of the collective variable (CV)
range; and “unbound” (red) corresponds to all replicas
close to the other end of the CV range. (B) The three simulation systems
explored in this article: ANT, the adenine nucleotide transporter
(in blue) interacting with a cardiolipin (CL; red) molecules; Kir2.2,
an inward rectifier potassium channel (gray) with a subunit (blue)
interacting with a PIP2 (red) molecule; and GpA, the glycophorin
TM transmembrane (TM) helix dimer, with the two TM helices (i.e.,
monomers) in red and blue. In each case the lipid bilayer is indicated
by its phosphate particles (in green) with the remainder of the lipid
molecules and the water on either side of the bilayer are omitted
for clarity. For each system bound and unbound system snapshots are
shown, in the upper and lower panel, respectively.
Methods
Molecular Simulation Methods
Molecular dynamics was
performed in GROMACS (version 4.6.7; www.gromacs.org),[23] using a time
step of 20 fs. Semi-isotropic Berendsen pressure coupling[24] was used for equilibration with coupling constant
of 12 ps and Parrinello–Rahman[25] coupling with coupling constant of 12 ps for production simulations
with a reference pressure of 1 bar in both XY and Z directions. Similarly,
a Berendsen thermostat was used for equilibration with coupling constant
of 1 ps and a stochastic velocity rescaling temperature control[26] for production with characteristic time of 1
ps, keeping the temperature constant around 310 K. A cutoff of 1.2
nm was used for Lennard-Jones (LJ) interactions, with the potential
switched off smoothly between 0.9 and 1.2 nm. Coulombic interactions
were calculated using the Particle Mesh Ewald (PME) method,[27] with a grid spacing of 0.12 nm and a real-space
cutoff of 1.2 nm. These settings were consistently used in the ANT/CL,
KIK/PIP2, and glycophorin simulations.
Force Field
The CG Martini force field was used for
the simulations: for ANT/CL and GpA, version 2.1 of Martini was used[28−30] with a 4-tail bead POPC model. For Kir2.2/PIP2 version
2.2 was used[31] with the older 5-tail bead
POPC model.[32] PIP2 lipid parameters
taken from ref (33) and cardiolipin parameters were obtained from ref (34).
Umbrella Sampling
The PLUMED2 package[35,36] (2.2-hrex, compiled from https://github.com/GiovanniBussi/plumed2/tree/v2.2-hrex) was used to patch GROMACS 4.6.7 and define the collective variables
and perform the biasing.Replica exchanges are evaluated using
the Boltzmann criterion.where the probability of exchange between
two replica windows 1 and 2 with umbrella potentials U1 and U2 is dependent on the
difference Δ between the energies of the protein configurations X1 and X2 (those
in windows 1 and 2 before exchange) in these potentials before and
after the trial exchange (, where kB is
Boltzmann’s constant and T is the absolute
temperature). Exchanges are determined according to a Metropolis criterion,
always being accepted if e–Δ ≥ 1 and if e–Δ <
1 being accepted if a random number r drawn from
a uniform distribution on [0,1) is less than e–Δ.In this work, we have applied the final
umbrella potentials to
each window. In some cases, this may generate large forces due to
the system being out of equilibrium. This can readily be overcome
by including an appropriate initialization protocol (e.g., equilibrating
the umbrella closest to the initial condition first, then using this
to equilibrate the next umbrella and so on). If the landscape is rougher
(as in an all atom simulation), it will be harder to converge the
two initial conditions, although “slow growth” methods,
e.g., ref (37), may
be helpful in the situation. Even if convergence is not achieved,
the comparison will expose this fact, which otherwise would be overlooked.
Analysis and Visualization
MDAnalysis[38] was used to automate plumed input generation and GromacsWrapper[39] was used for topology file manipulation. VMD[40] was used to produce the molecular renderings.
ANT/Cardiolipin
The system was setup as described in
ref (20) with a simple
POPC bilayer assembled around the position restrained protein (with
force constant of 1000 kJ/mol/nm2). The final system contained
279 POPC molecules, a single cardiolipin molecule in the upper leaflet,
and over 4200 coarse-grained water particles. Randomly selected water
particles were converted into sodium and chloride ions to neutralize
the system and mimic the experimentally used buffer. Initial configurations
for the umbrella sampling windows were generated via a steered MD
(SMD) molecular dynamics simulation. A collective variable (CV) was
defined to permit sampling along a path orthogonal to the binding
site surface (Figure A). This CV corresponded to the distance between the CG glycerol
particle (GL0) of cardiolipin and the backbone bead of the Proline
in the conserved Px[D/E]xx[K/R] motif, found on the odd-numbered TM
helices and present in all members of this protein family (Figure A). We have chosen
to sample along this linear coordinate because it is most relevant
to binding to the known native binding sites, whereas using a radial
coordinate would also average over non-native binding. In addition,
a linear coordinate greatly facilitates sampling. To prevent the protein
from rotating within the bilayer, the 3 Proline backbone particles
of the three equivalent sites on ANT were position restrained (400
kJ/mol/nm2 on the X and Y dimensions). To keep the CL lipid
molecule from diffusing away from the line coordinate, an additional
harmonic restraint is defined with a force constant of 1000 kJ/mol/nm2, keeping the Y-component of the CV close to a fixed value.
Umbrella sampling was set up with 32 windows, spaced linearly between
1.6 and 4.5 nm on the cardolipin headgroup–protein center-of-mass
distance, with a force constant of 1000 kJ/mol/nm2. Exchanges
were attempted every 10000 steps.
Figure 2
Cardiolipin binding to ANT, providing
an example of calculating
a PMF for a lipid interacting with a specific binding site on an integral
membrane protein. (A) Representative structures of the closest (protein
lipid COM distance =1.5 nm) and furthest (4.5 nm) replicas in the
cardiolipin-ANT binding simulations. (B) PMFs starting from 3 different
initial configurations (default in cyan, unbound in orange, bound
in blue; black crosses indicate the final PMF for comparison; see Figure for definitions),
computed after 3, 5, and 7 μs of the simulation (within each
case the first half of the simulation discarded as equilibration).
Thus, the final PMF contains data pooled from 2 replica exchange simulations,
corresponding to the starting bound and unbound initial conditions.
(C) Root mean square difference between the two PMFs started from
“all bound” and “all unbound” configurations
as a function of time. (D) PMF RMSD to the final PMF of simulations
started from the default initial condition and either allowing for
replica exchange (labeled replex) or without exchanges (labeled no
replex).
Cardiolipin binding to ANT, providing
an example of calculating
a PMF for a lipid interacting with a specific binding site on an integral
membrane protein. (A) Representative structures of the closest (protein
lipid COM distance =1.5 nm) and furthest (4.5 nm) replicas in the
cardiolipin-ANT binding simulations. (B) PMFs starting from 3 different
initial configurations (default in cyan, unbound in orange, bound
in blue; black crosses indicate the final PMF for comparison; see Figure for definitions),
computed after 3, 5, and 7 μs of the simulation (within each
case the first half of the simulation discarded as equilibration).
Thus, the final PMF contains data pooled from 2 replica exchange simulations,
corresponding to the starting bound and unbound initial conditions.
(C) Root mean square difference between the two PMFs started from
“all bound” and “all unbound” configurations
as a function of time. (D) PMF RMSD to the final PMF of simulations
started from the default initial condition and either allowing for
replica exchange (labeled replex) or without exchanges (labeled no
replex).
Kir2.2
For computational
efficiency, the wild-type
Kir structure PDB 3SPI(21) was truncated in PyMol,[41] removing residues before Asp-61 and after Thr-191,
preserving Arg-186, which coordinates the 5′ phosphate in this
structure. This residue was then mutated to Ala in the mutant system.
The martinize script was used to obtain a coarse-grained topology
in each case and an elastic network was generated to stabilized the
protein structure in simulation.The protein was inserted into
a POPC bilayer, solvated, and equilibrated using the MemProtMD pipeline.[42] Position restraints with a force constant of
1000 kJ/mol/nm2 were used in equilibration to keep the
protein from moving. The short-tail PIP2 molecules from
the 3SPI crystal structure were used as the starting conformation
for the bound conformation of PIP2. Sodium and chloride
ions were added to 0.15 M to mimic the buffer generally used experimentally.
The final system contained over 12 000 particles with 4 PIP2 lipids, 360 POPC lipids, and over 6000 water particles.The collective variable was chosen to be the distance between the
PIP2 headgroup and the center of mass of the nearest Kir2.2
chain. Sixteen umbrella sampling windows were setup, with the CV linearly
spaced between 2.6 and 4.5 nm with a force constant of 500 kJ/mol/nm2 (Figure A).
Exchanges between replicas were attempted every 10 000 steps.
The initial structure for all windows of the “start bound”
simulation was 3SPI. After 50 ns of umbrella sampling, a frame from
the last window was used to initialize all windows of the “start
unbound” simulation. The distance is roughly parallel to the X-axis of the simulation box and so an additional bias is
applied to prevent the lipid from diffusing in Y, with a force constant
of 100 kJ/mol/nm2. To prevent the Kir from rotating in
the bilayer, a position restraint was applied to the backbone particles
of the protein, with a force constant of 100 kJ/mol/nm2 on X- and Y-coordinates.
Figure 3
PIP2 binding to Kir2.2, providing
an example of a PMF
for a charged lipid interacting with a specific binding site on an
integral membrane protein. (A) Representative structures of the PIP2 bound (PIP–Kir2.2 distance 2.7 nm) and PIP2 unbound (3.4 nm) replicas in the PIP2–Kir2.2 binding
simulations. (B,C) PMFs along a CV defined by the PIP2 headgroup
distance to the Kir monomer center of mass. PMFs are shown starting
from two initial configurations: with lipid bound in all replicas
(blue) or unbound (orange) PMFs are shown for (B) the wild-type protein
and (C) a mutant protein (R186A) with reduced PIP2-binding
affinity mutant.
PIP2 binding to Kir2.2, providing
an example of a PMF
for a charged lipid interacting with a specific binding site on an
integral membrane protein. (A) Representative structures of the PIP2 bound (PIP–Kir2.2 distance 2.7 nm) and PIP2 unbound (3.4 nm) replicas in the PIP2–Kir2.2 binding
simulations. (B,C) PMFs along a CV defined by the PIP2 headgroup
distance to the Kir monomer center of mass. PMFs are shown starting
from two initial configurations: with lipid bound in all replicas
(blue) or unbound (orange) PMFs are shown for (B) the wild-type protein
and (C) a mutant protein (R186A) with reduced PIP2-binding
affinity mutant.Instead of using an initial
steered MD simulation to generate the
unbound configurations (as was done for the ANT simulations) we relied
on a short umbrella sampling run starting from bound with the umbrella
centered far away from the protein. While this resulted in large initial
forces, these quickly taper off as the lipid moves closer to the target
window distance. Because the coarse-grained energy landscape is relatively
smooth this has no adverse effect.
GpA
The solution
NMR structure of GpA in a detergent
micelle (PDB id 1AFO(22)) was used as a starting structure for
the simulation. The protein was inserted into a POPC bilayer using
g_membed.[43] The system contained 125 POPC
lipids and over 1400 coarse-grained water particles.Umbrella
sampling simulations were setup with 16 windows, spaced linearly between
DRMS = 0 and 2.5 nm relative to the native, with a force
constant of 100 kJ/mol/nm2. Exchanges were attempted every
1000 steps.All the windows of start bound simulations were
initialized from
the 1AFO structure. After 50 ns, frames from each window were used
to initialize the “start default” simulation, and the
last window frame was used to initialize the “start unbound”
simulation. This is similar to what was done in Kir2.2 simulations,
as described in the previous section.
Definition of DRMS
Distance
root-mean-square displacement DRMS between
two configurations XA and XB is defined below.where d(xia,xja) is the distance between the coordinates xi and xj of atoms
i and j in configuration XA and the sum
runs over all atom pairs (i,j)
in a specified contact list. Because all-to-all distances have to
be considered this becomes computationally expensive with increasing
number of particles. Therefore, pairs of atoms are only included in
the contact list if those atoms are within some cut off distance in
the “native” bound state. Here we use a lower and upper
cut offs of 0.1 and 0.6 nm, respectively. We then further restrict
the atoms used in the calculating of the collective variable by taking
only pairs involving one atom from each molecule (for glycophorin
this becomes interhelical DRMS). Only
the transmembrane region between Glu-72 and Ile-95 was used to calculate
the interhelical DRMS (both backbone and
side-chain beads were included).
Calculation of the Dimerization Kd from the Potential of Mean Force
The dissociation constant
(Kd) for the GpA dimer can be defined
aswhere [A] is monomer
concentration
(per lipid area) when the system is in monomers and [AA] is the dimer concentration when the helices are dimerized and y is the fraction of time it is bound. Eq can be derived by requiring either (i) that
the time- or ensemble-averaged chemical potential of the monomer state
and dimer state should be equal at equilibrium (and neglecting pressure
contributions to free energy) or (ii) that the time-averaged forward
and reverse fluxes for dimerization should be equal at equilibrium.
Since in this case, , where σ is the
lipid area (defined
as the circle whose radius (L) is the mean helix–helix
distance in the last window of the umbrella sampling simulation),A reference concentration of 1 molecule
per nm2 is used. The standard free energy of dissociation
is then ΔG0 = −RTln Kd. An analogous expression was derived
by De Jong et al.[44] (see also[45]) for heterodimers.To determine the fraction
bound (y), one has to
integrate the PMF using some suitable dividing surface.where β = 1/(kBT), kB is Boltzmann’s
constant, and T is the absolute temperature. dc is the dividing value of DRMS separating bound and unbound state, chosen as 2.0
nm, and where the free energy value starts decreasing toward bound;
the value of y is not highly sensitive to this choice.
Results and Discussion
Protein–Lipid Interactions:
A Biologically Relevant Lipid
Protein–lipid interactions
play a key role in membrane protein
stability and function[1] and so it is important
to be able to characterize the strength of such interactions via PMF
calculations. A well-defined interaction between a lipid molecule
and a protein is provided by that between cardiolipin (CL) and ANT
(Figure B). The ADP/ATP
carrier (AAC1/ANT) is located in the inner membrane of the mitochondria.
It possesses a 3-fold pseudosymmetric structure with 3 CL binding
sites which have previously been characterized structurally.[46,47] From a sampling perspective, this is an interesting system because
CL is an anionic 4-tailed lipid, and thus might be anticipated to
be more challenging to sample in terms of its interactions with a
membrane protein than a neutral 2-tailed lipid.A collective
variable (CV) was defined to permit sampling along a path orthogonal
to the binding site surface (Figure A; see Methods Section for
details). Two independent sets of simulations were performed, with
the CL molecule either initially bound (d = 1.7 nm)
to site 1 on ANT, or unbound (d = 4.5 nm). The bound
configuration was the most representative structure of an ANT/CL complex
from a long equilibrium, unbiased MD simulation in which the cardiolipin
spontaneously bound to the site, which corresponds closely to the
crystallographically observed configuration of CL at this site.[20] An additional independent simulation was prepared
in which the CL molecules were initially positioned with uniform spacing
along the umbrella coordinate (see Figure A for a schematic representation of these
three initial starting conditions). The unbound configurations were
generated using accelerated MD, starting from the bound configuration,
and applying a force pushing the CL out and away from its binding
site.The umbrella sampling simulation was unbiased using WHAM,[16,17] yielding a PMF with 2 distinct minima, at d = 1.5
nm and d = 2.0 nm, with well-depths of 20 and 10
kJ/mol, respectively (Figure B). The PMFs are well-converged, within 1 kJ/mol of each other
after 7 μs. The progress of convergence can be monitored via
calculating the root-mean-square difference between a PMF derived
from simulations started with all replicas bound and a PMF starting
from all unbound. This PMF RMSD becomes <1 kJ/mol after 5 μs
(Figure C, upper plot).
Up to this difference, the PMF is thus insensitive to the initial
conditions. An overlap of distance histograms from the different umbrella
windows is shown in SI Figure 1A. Multiple
binding and unbinding transitions are observed in the continuous trajectories
obtained by following the replicas through exchanges (SI Figure 1B). Convergence is aided by allowing
for exchange between replicas, which is attempted every 10000 steps
and accepted via an analogous criterion to that for temperature replica
exchange.[12] In the absence of such exchanges,
the PMF RMSD remains ∼2 kT even after 7 μs, with no sign
of convergence (Figure C, lower plot).The two minima (A and B in Figure B) correspond to configurations
of ANT with bound or
partially bound CL, respectively. Representative frames show the headgroup
particles of CL bound tightly into the ANT in minimum A, while in
minimum B the phospholipid headgroup is about 0.5 nm further away
from the binding pocket surface. While minimum A is considerably deeper
than B, it can only be accessed by crossing a ∼ 5 kJ/mol barrier.
Assessing the relative depth of these two minima would be very difficult
using standard unbiased MD simulation, and presents a sampling challenge
to the replica exchange umbrella sampling approach.We are therefore
confident that, from the umbrella sampling with
replica exchange, we have a well converged PMF for the interaction
of a lipid with a membrane protein. We note that the duration of our
simulations (5–7 μs) is longer than often used for determining
PMFs in a membrane environment (e.g., 1 μs[5]). While we find that the well-depth for the deeper minimum
is within 2.5–5 kJ/mol of the final value, had the simulation
been run for 1 μs, the rest of the PMF may not be well converged.
We also note that a commonly used criterion, the PMF well-depth not
changing over time is a necessary but not sufficient to demonstrate
that a simulation is converged. This is especially the case in membrane
protein/lipid simulations where “memory effects” of
the initial conditions are likely to play a key role in determining
convergence. Furthermore, it appears that the simulations started
from the “bound” state converge much more quickly to
the final PMF than those started from “unbound”. This
is most likely related to the chosen coordinate not uniquely defining
the bound state: thus, applying a force to separate the protein and
lipid is effective in producing representative unbound states (since
distance is sufficient to define fully unbound states). However, pulling
the molecules together can create non-native states at the desired
intermolecular separation which evolve more slowly to the correctly
bound state as the bias force is not pushing non-native bound states
toward the native-like bound or partially bound states. Note that
this faster convergence starting from bound would only be expected
if the most stable bound state (in the context of the force field)
was used to initialize the runs.
Protein–Lipid Interactions:
A More Complex Case, Kir/PIP2
PIP2 is a key lipid molecule involved
in regulation and/or
recognition in many membrane processes,[48] including the activity of ion channels.[49] It has a somewhat more complex polyanionic headgroup than CL. Here
we explore the PMF of PIP2 interactions with the TM domain
of the Kir2.2 potassium channel. Both structural and functional experimental
data are available for this interaction,[49] as well as simulation studies which demonstrate that extended equilibrium
CG simulations will permit PIP2 to find its experimentally
observed binding site (Figure B), in which the anionic headgroup binds to basic residues
on the protein surface close to bilayer/water interface.[33,50] A number of binding site mutants exists, with reduced apparent affinity
for PIP2.[51,52] Thus, this system provides both
a further opportunity to explore convergence of protein/lipid PMFs,
and to explore the sensitivity of such (CG) PMFs to mutations in key
lipid-binding side chains.As detailed above, the CV was defined
as a distance between the PIP2 headgroup and the center
of mass of the Kir2.2 subunit containing the binding site (Figure A). For wild-type
(WT) Kir2.2, the potential of mean force shows a single minimum, 45
kJ/mol deep, with the PIP2 bound at the native site observed
in the crystal structure (Figure B). Comparison of the bound/unbound profiles demonstrates
convergence of the PMF to within an RMSD of 2.5–5 kJ/mol after
4–5 μs umbrella sampling with replica exchange. The depth
of the energy minimum, corresponding to interaction of PIP2 with a binding site containing 6 basic side chains, is comparable
to that of PIP2 interacting with a juxtamembrane binding
site of the EGFR (−42 kJ/mol), a (less structured) binding
site which contains 5 basic side chains.[5]The Kir2.2 mutant R186A (which has a reduced apparent affinity
for PIP2) has one fewer arginine in the PIP2 binding site, but still can be crystallized with a short-chain PIP2 molecule bound.[21] The PMF shows
a rather broad well, with depth of about 32 kJ/mol (Figure C). Representative structures
reveal that partially unbound and fully bound configurations of the
PIP2 headgroup at the mutant binding site have about the
same free energy of interaction (Figure C, inset), which is ca. 70% of that of PIP2 bound to the WT channel. Thus, the PMF suggests that the
effect of the R186A mutation may be more complex than simply a reduction
in affinity of the site for PIP2. To better understand
the role of electrostatics in the Kir2.2–PIP2 interaction,
we repeated the calculation with a PIP2 molecule in which
the phosphate particle charges had been neutralized (SI Figure 2). This reduces the well-depth by about 2-fold,
but interestingly the energy minimum remains at about the same position
on the CV.
Protein–Protein Interactions in a
Membrane
Protein–protein
interactions, and in particular interactions between TM helices within
a bilayer, play key roles in folding,[53] stability,[54] and function[3] of membrane proteins. Thus, it is of considerable interest
to understand the free energy landscapes of membrane protein–protein
interactions. It is also anticipated to be more challenging both in
terms of selection of a suitable CV and of establishing convergence
than is the case for even protein/lipid interactions in a membrane.One of the best studied examples of protein–protein interactions
in a membrane, both experimentally[22,55−62] and computationally[4,45,63−68] is that of dimerization of the glycophorin (GpA) TM domain (Figure B). Both NMR[22] and more recent crystallographic[62] data reveal the GpA TM domain to form a parallel
TM helix dimer with an interhelix crossing angle of ca. −20°
and a helix/helix interface containing a key glycine-rich sequence
motif. Both the wild-type structure and dimerization-disrupting mutants
have been characterized via a range of biochemical methods[55−60] and also in silico, the latter both at the coarse-grained[4,66,67] and all-atom[45,68] resolution. It is therefore an ideal system to probe protein–protein
interactions within a membrane.A “good” CV describing
TM helix dimerization in a
membrane should allow the dimer to be separated, and also allow us
to readily distinguish native from near-native dimer configurations.
We defined a set of intermolecular (i.e., between the TM helix monomers)
native distances in the GpA TM dimer structure. Using those distances
as a reference we then define a distance root-mean-square displacement
(DRSM) relative to the native TM dimer
configuration. As previously (see above) three independent sets of
simulations were run: one starting with all windows in the bound configuration;
one starting with all the windows in unbound configuration; and one
with the windows spaced at uniform intervals along the CV. The initial
“bound” structure was a CG model derived directly from
the PDB 1AFO solution NMR structure of GpA in a detergent micelle.[22]The PMF thus obtained (Figure A) could be shown to be independent
of the initial
configuration and converges around 6 μs (as judged from the
PMF RMSD plot; SI Figure 3) to within 2.5
kJ/mol between the simulations. The dimerization landscape has 4 minima,
with a well-depth of about 35 kJ/mol relative to the unbound state.
Notably, the native minimum (labeled 1 in Figure A) is not the lowest free energy configuration,
but rather a near-native minimum (labeled 2 in Figure A) is more stable. Projecting the PMF from
sampling along the DRMS CV onto the helix–helix
center of mass distance (D), we find that a non-native
dimerized state, with a helix–helix separation of 0.9 nm (versus
0.6 nm for the native) is the free energy minimum (Figure B). Minima 1 and 2 correspond
to native and near-native configurations (characterized by helix–helix
crossing angles of −25° and −20° compared
to −23° and −20° in the 1AFO NMR structure
and the 5EH4 crystal structure, respectively). Minima 3 and 4 represent
non-native configurations, with the helices more closely parallel
to one other (Figure C), and crossing angles of −15° and −5°,
respectively. The simulations are well converged, with multiple dimerization
and dissociation events (SI Figure 3).
Figure 4
Dimerization
of the TM helix of glycophorin A (GpA), providing
an example of a protein–protein interaction within a membrane.
(A) GpA dimerization showing a PMF along a reaction coordinate (CV)
defined by the intermolecular DRMS to
the native dimer structure (PDB id 1AFO). Three initial conditions used, which
had either all replicas in the native configuration (“bound”,
blue), all replicas initially dissociated (“unbound”,
orange), or replicas linearly spaced apart (“default”,
cyan). Observed 4 energy minima are labeled 1–4, corresponding
to DRMS values of about 0.1, 0.3, 0.55,
and 0.75 nm, respectively. (B) Reweighted PMF along a CV corresponding
to the interhelical distance (the native structure distance is in
red at ∼0.6 nm). (C) Representative structures (in gray; wireframe
backbone) from each of the four minima labeled in the PMF in (A).
The two subunits of the experimental (PDB id 1AFO) are shown in blue
and in red. (D) Crossing angle distributions for the structures seen
in PMF minima labeled in (A), with crossing angle defined as a torsion
angle between flanking backbone atoms of residues number 78–88.
Dimerization
of the TM helix of glycophorin A (GpA), providing
an example of a protein–protein interaction within a membrane.
(A) GpA dimerization showing a PMF along a reaction coordinate (CV)
defined by the intermolecular DRMS to
the native dimer structure (PDB id 1AFO). Three initial conditions used, which
had either all replicas in the native configuration (“bound”,
blue), all replicas initially dissociated (“unbound”,
orange), or replicas linearly spaced apart (“default”,
cyan). Observed 4 energy minima are labeled 1–4, corresponding
to DRMS values of about 0.1, 0.3, 0.55,
and 0.75 nm, respectively. (B) Reweighted PMF along a CV corresponding
to the interhelical distance (the native structure distance is in
red at ∼0.6 nm). (C) Representative structures (in gray; wireframe
backbone) from each of the four minima labeled in the PMF in (A).
The two subunits of the experimental (PDB id 1AFO) are shown in blue
and in red. (D) Crossing angle distributions for the structures seen
in PMF minima labeled in (A), with crossing angle defined as a torsion
angle between flanking backbone atoms of residues number 78–88.This finding was somewhat unexpected.
Previous CG PMFs for GpA
dimerization, whether by umbrella sampling MD[4] or by a Monte Carlo approach[67] were calculated
along helix–helix distance. While the DRMS and D behave similarly at large (>1.5
nm, Figure B, SI Figure 4) separations, at short distances D is unable to discriminate between closely related native
and non-native dimer configurations. Thus, D is a
relatively poor collective variable since these native, near native,
and non-native states are clearly separated by a free energy barrier
when projected onto DRMS (Figure A). Projecting our PMF onto
a 2D surface defined by the DRMS and D plane confirms that D collapses together
a number of states which are separated by DRMS (Figure B). A key
question is whether the observed non-native states are in fact experimentally
relevant, or simply due to the limited resolution that can be achieved
with the coarse-grained model. While the existing experimental structures
all favor a very tightly defined native structure,[62] that does not rule out the possibility that there may be
an appreciable population of other states at equilibrium (although
it seems unlikely). A second method of validating the results is compared
with dissociation constants determined from recent steric trap methods,
which allow an estimate of the dimer stability in a lipid bilayer.[60] In order to compute Kd from the PMF, we need to define what is meant by the bound state.
If we provisionally define all configurations in which the TM helices
are separated by a distance of less than 2.3 nm (corresponds to DRMS of ∼2.0 nm) (since the steric trap
method would not distinguish native from non-native states if the
helices have a native separation), we obtain a Kd of 1.1 × 10–6 molecules per nm2, compared with the experimental estimate of 1.3 × 10–9 molecules nm2,[60] an acceptable difference considering the simplicity of the simulation
model, which lacks detailed specific interactions, such as hydrogen
bonds. The PMF also shows that the non-native bound states constitute
the vast majority of the bound population, which might suggest that
they are in fact relevant.
Figure 5
(A) Schematic/cartoon of D vs DRMS as “poor” and “good”
CVs.
While at far separation both CVs describe the system similarly, at
close separation, D collapses the phase space and
can correspond to a number of very different configurations, while DRMS is able to distinguish native from near-native
helix–helix packing arrangements. (B) 2D PMF along D and DRMS, obtained by reweighing
the original 1D umbrella sampling simulation along DRMS. Inset (C; magnified) shows the minima 1–4
which have different DRMS values and essentially
the same D.
(A) Schematic/cartoon of D vs DRMS as “poor” and “good”
CVs.
While at far separation both CVs describe the system similarly, at
close separation, D collapses the phase space and
can correspond to a number of very different configurations, while DRMS is able to distinguish native from near-native
helix–helix packing arrangements. (B) 2D PMF along D and DRMS, obtained by reweighing
the original 1D umbrella sampling simulation along DRMS. Inset (C; magnified) shows the minima 1–4
which have different DRMS values and essentially
the same D.As an alternative, more stringent, experimental test of our
free
energy landscape, we therefore consider the effects of mutations.
The tight packing of GpA TM helices is mediated via a conserved GXXXG
motif.[69] Mutations that introduce a bulky
side-chain residue in place of one of the Gly residues are known to
disrupt dimerization.[63] Thus, both G83L
and G83I prevent dimerization.[55,57,58,60] We have simulated the G83L mutant
dimer to obtain PMFs as before starting either from bound or unbound
configurations; because isoleucine and leucine are represented in
an almost identical way in the Martini force field, the results for
G83I are expected to be very similar.The Kd computed from these PMFs is
indistinguishable from that of wild-type, strongly suggesting that
the populated binding modes are not representative of the experimental
distribution. We can obtain further insight from the detailed differences
between the PMFs. The G83L PMF shows that in fact the most native-like
minimum (minimum number 1 in Figure ) is destabilized, albeit by 5 ± 2.5 kJ/mol relative
to the WT PMF (Figure A), compared with 19 kJ/mol in experiment.[60] The quantitative discrepancy may be due to the coarse-grained nature
of the model. In addition, the range of experimental constructs (based
largely on a chimeric protein formed by fusion of the GpA TM domain
with staphylococcal nuclease) and environments (detergent micelles
vs in vitro lipid bilayers vs bacterial cell membranes)
makes quantitative comparisons difficult. Nonetheless, the reproduction
of a destabilizing effect is consistent with experiment and intuition,
based on the published structures.[22,62] There is also
a similar destabilization of the native-like states with lower DRMS to the experimental structure. On the other
hand, there is virtually no change in free energy for the minimum
between 0.7 and 0.8 nm. This is because residue 83 has a smaller role
in the binding interface in this case. Because this minimum (number
4 for the mutant, in which the helix crossing angles are bimodally
distributed with peaks at −20 and 20 degrees (SI Figure 5), is the most stable bound species in the mutant,
this allows the Kd to remain essentially
unchanged in the G83L mutant simulations. Overall these results suggest
that, at the least, the population of stable non-native dimer states
is too large in the simulation, particularly for those with largest DRMS from native.
Figure 6
Effects of a “disruptive”
mutation on the dimerization
of the TM helix of GpA. (A) PMF for dimerization of a mutant (G83L)
of GpA (compare with the wild type PMF in Figure A). The three energy minima labeled correspond
to the structures in Figure C. Two initial conditions were used, which had either all
replicas in the native configuration (“bound”, blue),
or all replicas initially dissociated (“unbound”, orange).
(B) Comparison of the population distributions derived from the wild-type
and G83L GpA PMFs. From these one may estimate a ΔΔG of 0 or 4 kJ/mol (with unbound state defined at 2.0 nm
or 0.2 DRMS respectively) for destabilization
of the native dimer by the G83L mutation.
Effects of a “disruptive”
mutation on the dimerization
of the TM helix of GpA. (A) PMF for dimerization of a mutant (G83L)
of GpA (compare with the wild type PMF in Figure A). The three energy minima labeled correspond
to the structures in Figure C. Two initial conditions were used, which had either all
replicas in the native configuration (“bound”, blue),
or all replicas initially dissociated (“unbound”, orange).
(B) Comparison of the population distributions derived from the wild-type
and G83L GpA PMFs. From these one may estimate a ΔΔG of 0 or 4 kJ/mol (with unbound state defined at 2.0 nm
or 0.2 DRMS respectively) for destabilization
of the native dimer by the G83L mutation.We also explored the robustness of the GpA TM helix dimer
PMF to
variations in the CG model applied. Thus, the results discussed above
used the Martini 2.1 force field; recalculation using the more recent
version, Martini 2.2,[31] does not seem to
change the PMF appreciably (SI Figure 6). We also explored whether the use of PME vs shifted electrostatics
within Martini, and the effect of different restraints on secondary
structure (dihedral vs elastic network). The resultant PMFs (SI Figure 7) suggest that the results are relatively
robust to the restraints, and that the use of PME leads to some relative
stabilization of the native dimer structure. Thus, we are confident
we have achieved a well sampled and converged PMF within the limitations
of the Martini force field.
Conclusions
In summary, we show how PMF calculations for membrane systems,
for both protein–lipid and protein–protein interactions,
benefit from stronger tests of convergence than is sometimes the case.
Exchanges between umbrella sampling windows speed-up convergence and
independent initial conditions provide an easily interpretable condition
for convergence. Additionally, care must be taken in selection of
suitable CV. For certain systems, such as lipid–protein interactions,
a simple center of mass distance appears sufficient to achieve good
sampling. However, for protein–protein interactions in a membrane,
we implemented a different collective variable, namely an intermolecular
distance matrix RMSD.We have illustrated our case with an example
of the interaction
of an integral membrane protein with cardiolipin, a topic of general
importance as the energetics of such interactions have been explored
by CG MD for a number of proteins from mitochondrial membranes, e.g.,
refs (70), (71), (20), and (72). We have extended this
analysis to a well-characterized interaction of PIP2 with
an inward rectifier potassium channel. This is also of broad relevance
given recent evidence for a role on PIP2 interactions in
regulating a number of ion channel proteins.[49] In particular, we show that with replica exchange and CG simulations
of >5 μs, one can obtain demonstrably converged protein/single
lipid PMFs, thus allowing exploration of the effects of binding site
mutations and/or changes in lipid species on the strength of lipid/protein
interactions. These calculations use the Martini CG force field: it
will be of interest in the future to attempt such free energy calculations
with an all-atom force field. We anticipate that in this case establishing
convergence will be more demanding, as the free energy landscape will
likely be more rugged.Using our DRMS collective variable,
we find that ΔG for GpA dissociation in Martini
2.1 is in agreement with experiment (values ranging from −24
kJ/mol[73] to −51 kJ/mol[60]). For the native state the well-depth observed
is −28 kJ/mol, which compares well with other PMF-based estimates
of −30 kJ/mol[67] and −40 kJ/mol[4] from CG simulations, and of −44 kJ/mol[45] and −60 kJ/mol[68] from all-atom simulations. However, while the CG force field can
qualitatively reproduce the effect of a dimer-disrupting mutation
on the native bound conformation, it does not appreciably alter the
dimer stability. The small effect of the mutation on stability is
due to the large population of non-native bound states that are not
much affected by the mutation. That the CG force field is not quantitatively
successful in capturing the effects of such mutations is perhaps not
surprising given the role of interactions which are not explicitly
present in the CG energy function, e.g., Cα-H···O
H-bonds, in stabilizing the GpA native helix dimer[62,74] over non-native bound states. The use of the DRMS reveals a more complex energy landscape than was previously
seen for GpA dimerization, such that a near-native dimer is the most
stable configuration. Comparable complexities of an apparently simple
free energy landscape for TM helix interactions have recently been
seen for CG metadynamics studies of dimerization of the EGFR TM domain.[75]A clear future extension of this study,
especially in terms of
TM helix–helix interactions, will be to obtain demonstrably
converged PMFs from all-atom simulations. Two all-atom PMFs of GpA
TM helix dimerization are available[45,68] but would
merit revisiting in the context of the greater complexity of the (CG)
free energy landscape revealed via the use of the DRMS as a CV. This will doubtless prove challenging due
to much larger number of degrees of freedom in all-atom simulations
and the high viscosity of the lipid bilayer.
Authors: Karen G Fleming; Cha-Chi Ren; Abigail K Doura; Matthew E Eisley; Felix J Kobus; Ann Marie Stanley Journal: Biophys Chem Date: 2004-03-01 Impact factor: 2.352
Authors: Djurre H de Jong; Gurpreet Singh; W F Drew Bennett; Clement Arnarez; Tsjerk A Wassenaar; Lars V Schäfer; Xavier Periole; D Peter Tieleman; Siewert J Marrink Journal: J Chem Theory Comput Date: 2012-11-28 Impact factor: 6.006
Authors: Sander Pronk; Szilárd Páll; Roland Schulz; Per Larsson; Pär Bjelkmar; Rossen Apostolov; Michael R Shirts; Jeremy C Smith; Peter M Kasson; David van der Spoel; Berk Hess; Erik Lindahl Journal: Bioinformatics Date: 2013-02-13 Impact factor: 6.937
Authors: Elizabeth Jefferys; Zara A Sands; Jiye Shi; Mark S P Sansom; Philip W Fowler Journal: J Chem Theory Comput Date: 2015-05-14 Impact factor: 6.006
Authors: Curtis Balusek; Hyea Hwang; Chun Hon Lau; Karl Lundquist; Anthony Hazel; Anna Pavlova; Diane L Lynch; Patricia H Reggio; Yi Wang; James C Gumbart Journal: J Chem Theory Comput Date: 2019-07-02 Impact factor: 6.006
Authors: Kyungreem Han; Phillip S Hudson; Michael R Jones; Naohiro Nishikawa; Florentina Tofoleanu; Bernard R Brooks Journal: J Comput Aided Mol Des Date: 2018-08-06 Impact factor: 3.686
Authors: Christophe Chipot; François Dehez; Jason R Schnell; Nicole Zitzmann; Eva Pebay-Peyroula; Laurent J Catoire; Bruno Miroux; Edmund R S Kunji; Gianluigi Veglia; Timothy A Cross; Paul Schanda Journal: Chem Rev Date: 2018-02-28 Impact factor: 60.622
Authors: Siewert J Marrink; Valentina Corradi; Paulo C T Souza; Helgi I Ingólfsson; D Peter Tieleman; Mark S P Sansom Journal: Chem Rev Date: 2019-01-09 Impact factor: 72.087