Hemagglutinin (HA) mediates membrane fusion, a crucial step during influenza virus cell entry. How many HAs are needed for this process is still subject to debate. To aid in this discussion, the confinement free energy method was used to calculate the conformational free energy difference between the extended intermediate and postfusion state of HA. Special care was taken to comply with the general guidelines for free energy calculations, thereby obtaining convergence and demonstrating reliability of the results. The energy that one HA trimer contributes to fusion was found to be 34.2 ± 3.4kBT, similar to the known contributions from other fusion proteins. Although computationally expensive, the technique used is a promising tool for the further energetic characterization of fusion protein mechanisms. Knowledge of the energetic contributions per protein, and of conserved residues that are crucial for fusion, aids in the development of fusion inhibitors for antiviral drugs.
Hemagglutinin (HA) mediates membrane fusion, a crucial step during influenza virus cell entry. How many HAs are needed for this process is still subject to debate. To aid in this discussion, the confinement free energy method was used to calculate the conformational free energy difference between the extended intermediate and postfusion state of HA. Special care was taken to comply with the general guidelines for free energy calculations, thereby obtaining convergence and demonstrating reliability of the results. The energy that one HA trimer contributes to fusion was found to be 34.2 ± 3.4kBT, similar to the known contributions from other fusion proteins. Although computationally expensive, the technique used is a promising tool for the further energetic characterization of fusion protein mechanisms. Knowledge of the energetic contributions per protein, and of conserved residues that are crucial for fusion, aids in the development of fusion inhibitors for antiviral drugs.
The glycoprotein hemagglutinin
(HA) catalyzes membrane fusion during
the invasion of cells by influenza virus particles.[1] HA is a homotrimeric class I fusion protein consisting
of two subunits, a mostly globular binding domain (HA1) and a fusion-active
domain (HA2) that is anchored in the viral membrane at its C-terminal.
HA1 is disulfide-bonded to HA2, whose central α-helical coiled
coil forms the core of the trimer. HA1 facilitates binding to receptors
on the target cell membrane and holds the protein in a metastable
configuration. After endocytosis, acidification of the endosome triggers
dissociation of HA1 and release of fusion peptides at the N-terminal
of HA2.[2,3] A large conformational change in HA2 ensues,
as is evident from the comparison of the prefusion and postfusion
(PF) conformations of HA.[4,5] In fact, a pathway of
HA rearrangements has been deduced, which involves two large conformational
changes.[6] First, a loop-to-helix transition
extends the existing coiled coil and projects the fusion peptides
toward the target membrane. In this hypothesized “extended
intermediate” (EI) state, the fusion peptides can insert into
the target membrane, thereby establishing HA2 as a bridge between
the viral and endosomal membrane. Subsequently, the globular C-terminal
domain of HA2 collapses and zippers up along distinct hydrophobic
patches on the extended coiled coil, so as to bring the two membranes
together for fusion.[7] Single-particle fusion
kinetics experiments have shown that HA triggering is the rate-limiting
step, followed by fast HA extension and fusion peptide insertion.
The resulting intermediate state of HA remains alive until a local
cluster of sufficiently many HAs can jointly provide enough energy
to fuse the membranes.[6,8] However, the amount of energy
that each HA can contribute to this process has not yet been determined.The free energy available per HA is valuable information in the
development of antiviral therapeutics. It is directly related to the
number of HAs needed to surmount the appreciable membrane fusion energy
barrier, estimated at 30–90kBT.[9−12] Both the number of HAs and the free energy per HA influence the
kinetics of fusion, which has to be tightly regulated to ensure release
of the viral RNA close to the target cell nucleus, while avoiding
degradation by the host immune system.[13] Therefore, these two quantities influence virus infectivity, dictate
the minimal receptor density on the target membrane, and define the
number of HA-targeting antibodies that will be needed to neutralize
a virus particle.[14,15]To our knowledge, HIV-1
gp41 and the SNARE complex are the only
fusion catalysts for which the energy contribution to membrane fusion
is known: 71kBT for gp41
and 65kBT for SNARE,
both determined using optical tweezers.[16,17] Depending
on the virus strain, HIV-1 entry requires one to seven gp41 trimers,[18] whereas one to three SNARE complexes can mediate
synaptic vesicle fusion.[19] For HA-mediated
fusion, the required number of trimers has been estimated to be between
three and six.[8,20,21] However, an experimental determination of the energy that HA contributes
is still lacking. Alternatively, computational methods may be used
to compute this energy, but they bring a number of challenges for
systems as large as HA. Computational difficulties arise mainly because
of the vast conformational space that needs to be sampled and many
local energy minima in which the system can get stuck.[22] For example, targeted molecular dynamics (MD)
is able to give insight into the pre- to postfusion pathway,[23] but an accurate free energy profile along this
path would require further sampling with more sophisticated (and laborious)
methods.[24] Some researchers have computed
the conformational free energy gain of the loop-to-helix transition
that extends HA2, thereby reducing the size of the system to only
tens of residues.[25,26] However, in view of the described
pathway of HA rearrangements, it is the entire free energy that is
released during the transition from the extended intermediate to
the postfusion state that needs to be calculated. To do so, a method
is needed that makes the computation of conformational free energy
differences for such a large system feasible.There is no generally
accepted method for computing the free energy
difference between the conformational states of a macromolecule, which
is a particularly difficult task when the two conformations differ
significantly from each other.[22] As the
two states of interest do not overlap, the path-dependent approach
requires integration over overlapping intermediate states along some
reaction coordinate to calculate the change in free energy during
the transformation from one state into the other. However, even though
the intermediate states do not need to be physically realistic, the
establishment of a suitable reaction coordinate and a reaction path
is challenging, and some paths converge much slower than others.[27] Moreover, the system can still become trapped
in an energy basin, even within the presence of a biasing potential.
Additional reaction coordinates that bias the phase space perpendicular
to the original reaction coordinate might be necessary, especially
for larger systems, at the cost of even more complexity.[28]Instead, we here apply the confinement
free energy method, proposed
a few years ago by Karplus and co-workers,[29] to determine the conformational free energy difference between the
extended intermediate and the postfusion state of HA. This method
is path-independent and has been used to estimate the free energy
difference between two conformational states by approximating the
internal motions of a protein as a superposition of harmonic oscillators.[29−31] By applying enhanced sampling techniques and a number of optimization
strategies, in compliance with general guidelines for free energy
calculations, the result of our calculation shows convergence and
consistency between two different analysis techniques.
Methods
Confinement
Free Energy Method
The confinement approach[29,33−35] is a path-independent method that can be used to
estimate the conformational free energy difference ΔGAB between two states, A and B, without the
need of a transition path. Its central idea is illustrated in Figure . Macrostate A is
transformed into a set of noninteracting harmonic oscillators, a,
for which the free energy Ga can be evaluated
analytically. The work done to perform this transformation is the
confinement free energy ΔGconfA. So, going from a to A in Figure , the conformational
free energy of state A is evaluated asA similar operation is performed for state
B. Completing the cycle, the conformational free energy difference
between the states can be computed asFocusing on the confinement
procedure to calculate
ΔGconfA first, the transformation A → a is
accomplished by a series of molecular dynamics simulations, in which
the system of interest is confined to state a. To this end, the Hamiltonian
is extended with a harmonic restraint, centered at the positions of
a reference structure X0A ∈ A. This restraint increases
stepwise with each new simulation, proportionally with the parameter
λ = 0, ..., 1, until the system can be considered purely harmonic
at λ = 1. This corresponds to the Hamiltonian[29]which includes the potential and kinetic energy
terms E(X) and K(P), depending on the system configuration X and the particle momenta P, respectively.
The particle masses m are input for the restraint over all N particles,
while ∥·∥ denotes the Euclidean norm and is evaluated
after a mass-weighted best-fit of the instantaneous particle coordinates x with the reference coordinates x0. This best-fit alignment improves the convergence
of the procedure by precluding work done on restraining rigid body
motions. The mass-weighting ensures the transformation into a set
of noninteracting harmonic oscillators for which the free energy (Ga) is known analytically, thereby obviating
the need for a normal-mode analysis to determine ΔGab in a previous version of the method.[35] The total work performed over all of the values of the
restraint parameter λ is the confinement free energy ΔGconf, which can be computed using[36]Here, M is the total mass
of the system, ⟨ρm(X, X0)⟩ is the mass-weighted root-mean-square
deviation (RMSD) of the sampled configuration X with
respect to the reference structure X0,
and the ensemble average denoted by ⟨·⟩ is obtained
from a simulation using the Hamiltonian Hλ(λ). Calculating ⟨ρm⟩ this way
in postprocessing, Uconf effectively gives
the average work done by the restraint term during the simulation. Equation is valid for ν
chosen sufficiently large, as evaluated by the convergence criterion[29]where NDOF is
the number of degrees of freedom in the system and β = 1/kBT. In this limit, the system
behaves as a set of noninteracting harmonic oscillators, of which
virtually all of the energy is due to the restraints. Reaching that
point, ΔGconfA can be interpreted as the “anharmonic”
part of the free energy of state A. By subtracting this from the “harmonic”
free energy of state a and changing the variable of integration in eq to ζ = λν2, the total conformational free energy of state A is computed
aswhere h is Planck’s
constant. E(X0A) is the potential energy of the
reference structure for state A, which is obtained by taking the ensemble
average of the potential energy at the highest restraint frequency.
The conformational free energy of state B can be obtained similarly
using X0B as the reference structure. The second term on the right-hand
side of eq is the free
energy of the set of noninteracting harmonic oscillators representing
a state, which is identical for two confined states with the same
number of degrees of freedom. Hence, if one is only interested in
the difference in free energy between two states, this term vanishes
and
Figure 1
Thermodynamic
cycle illustrating the confinement free energy method.
The free energy of macrostate A is estimated from the free energy Ga of the set of harmonic oscillators a, oscillating
around the reference structure X0A, and the work ΔGconfA needed to harmonically confine A to a as computed by thermodynamic
integration. The conformational free energy difference between states
A and B is then found by combining the confinement free energies ΔGconfB – ΔGconfA with the analytically obtained harmonic
free energies Ga and Gb (eq ).
On the right, the reference structures are shown in cartoon and licorice
representation for the EI (orange, state a) and PF (purple, state
b) states. The overlaid cartoon representations on the left depict
representative conformations from the simulation with the lowest restraint
strength for each of the states. Rendered with visual molecular dynamics
(VMD).[32]
Thermodynamic
cycle illustrating the confinement free energy method.
The free energy of macrostate A is estimated from the free energy Ga of the set of harmonic oscillators a, oscillating
around the reference structure X0A, and the work ΔGconfA needed to harmonically confine A to a as computed by thermodynamic
integration. The conformational free energy difference between states
A and B is then found by combining the confinement free energies ΔGconfB – ΔGconfA with the analytically obtained harmonic
free energies Ga and Gb (eq ).
On the right, the reference structures are shown in cartoon and licorice
representation for the EI (orange, state a) and PF (purple, state
b) states. The overlaid cartoon representations on the left depict
representative conformations from the simulation with the lowest restraint
strength for each of the states. Rendered with visual molecular dynamics
(VMD).[32]
Thermodynamic Integration (TI) and Bennett Acceptance Ratio
Estimator (MBAR)
The confinement free energy (eq ) can be calculated either directly
using thermodynamic integration (TI)[37] or
via the multistate Bennett acceptance ratio estimator (MBAR).[38] When using TI with numerical integration, the
trapezoid rule with linear interpolation is not suitable due to the
strong nonlinearity of ⟨ρm(X, X0)2⟩ as a function
of ν.[34] Instead, the integration
is performed using linear interpolation in logarithmic space as follows.[35] With a function y(k) that has been evaluated at the discrete points {a = k0, k1, k2, ..., k = b}, the area under the curve of this function
between the points i and j = i + 1 isHence, the integral of y(k) is calculated numerically usingThe statistical uncertainty of the confinement
free energy method arises mainly from the sampling error in obtaining
⟨ρm⟩. To compute the error in the conformational
free-energy difference G, the error in ρm is propagated through eq . When numerical integration is used, the error propagates
through eqs and 12, as detailed in Section S.1 in the Supporting Information.To keep the uncertainty
in the estimated free energy low, consecutive values of λ for
neighboring windows should be spaced sufficiently close.[39] The resulting spatial overlap between the conformations
obtained at each window can be exploited by using MBAR,[38] which calculates the free energy by minimizing
the uncertainties in the free energy differences between all of the
states simultaneously. The benefit with respect to TI is that samples
from multiple simulations are combined to compute the overall free
energy difference, which increases the accuracy of the calculation.
To do so, MBAR needs the difference in potential energy ΔU between
all of the pairs of windows p and q. These can be obtained from the windows that have been simulated
(p = q) using Boltzmann reweighting.
Since only the restraint term differs between the potential energies
of two states, ΔU can be calculated from ΔUconf for the set of k = 0, ..., K sampled configurations X at restraint frequency ν according to[36]When using MBAR, an estimate of the statistical
uncertainty that propagates from taking the integral of Uconf into ΔGconf is
provided by MBAR itself. While TI is sensitive to the average curvature
of the observable between the windows, the uncertainty in the result
from MBAR depends on the overlap between the windows. Therefore, an
added advantage of using MBAR is that the consistency of the free
energy calculation can be verified by comparing the answers from two
different analysis techniques (TI in logarithmic space and MBAR).
Consistent answers between the two methods ensures sufficient sampling
in each window, as well as sufficient overlap between the windows.
Guidelines for Free Energy Calculations
In an effort
to bring more standardization in free energy calculations, a number
of best practices have been proposed as general guidelines for the
analysis of free energy calculations:[40]Considering that the
method used in the present article has
been proposed only recently, and to ensure that the results presented
are as accurate as possible, we will follow these guidelines in the
analysis of our results and refer to them when relevant in the remainder
of this article.Use uncorrelated
samples.Compare results
from different analysis
techniques.Ensure
overlapping distributions between
the sampling windows.Confirm sufficient equilibration and
sampling in each window.
Equilibration and Decorrelation of Time Series
In the
calculation of the confinement free energy and its uncertainty, ⟨ρm⟩ should be derived from uncorrelated data sets (guideline
1), obtained from systems simulated at equilibrium (guideline 4).
Hence, to acquire uncorrelated samples, the data were subsampled with
a spacing of at least 2 times the autocorrelation time for each window.[41] Furthermore, the simulations are started from
a configuration that is not necessarily an equilibrium sample for
the Hamiltonian corresponding to that window. From this configuration,
it will take time for the system to reach equilibrium, especially
when the autocorrelation time is long. Therefore, the data need to
be equilibrated before they are used in the free energy calculations.
To do so, a method was used based on maximizing the number of uncorrelated
samples that remains after discarding the equilibration period.[42] The MBAR implementation of these decorrelation
and equilibration strategies was used. All of the free energy calculations
reported here were performed using equilibrated and decorrelated time
series, using the autocorrelation of the ρm time
series.
Crystal Structures and Simulation Setup
We carried
out confinement simulations of the extended intermediate (EI) and
the postfusion (PF) state of influenza hemagglutinin from the A/Aichi/68-X31
H3N2 strain. A structural model of the extended intermediate was obtained
by aligning the coiled coils of the prefusion (protein data bank (PDB)
code 1HGF(43)) and postfusion (PDB code 1QU1(44)) crystal structures using a root-mean-square fit of residues
97–101 in VMD.[32] Using this short
stretch within the trimeric coiled coil resulted in a fit that enabled
a seamless combination of the two structures. For both states, all
of the residues belonging to HA1 were removed and only residues 33–175
of the three monomers of HA2 were used because only these residues
are present in the postfusion crystal structure. The amino acid sequence
of the extended intermediate was matched with that of the postfusion
structure by a C137S mutation. Residues were protonated if their pKa in the postfusion structure was above 5, as
calculated by PROPKA[45,46] using the PDB2PQR web server.[47] Because the EI state occurs at the same pH as
the postfusion state, it was protonated identically. To limit the
computational burden of dynamic protonation, the protonation state
of a residue was kept fixed throughout the simulation.The confinement
simulations were performed with the nanoscale molecular dynamics (NAMD)
program,[47] using the Amber ff14SB all-atom
force field[48] and generalized Born implicit
solvent.[49,50] These parameters were chosen after a comparison
of the stability of the protein (in terms of RMSD values) and the
simulation speed using different MD packages, force fields, and water
models (see Section S.2 in the Supporting Information for details). A short-range interaction cutoff of 1.4 nm was used
without a switching function, with a pair list distance of 1.6 nm.
Nonbonded and electrostatic interactions were calculated at every
time step, while pair lists were updated every 10 time steps.The temperature was maintained at 300 K using Langevin dynamics,
with friction coefficients of 1, 5, 20, and 100 ps–1 in the ranges of ν = 0–0.3, 0.3–1.5, 1.5–5,
and >5 ps–1, respectively. Doing so, the autocorrelation
time at each of the oscillator frequencies is optimized, as proposed
by Villemot et al.[36] Following the recommendation
of the same authors, the simulation time step was taken to depend
on the restraint strength to sufficiently sample even the highest
frequencies. As such, a time step ranging from 1 to 0.01 fs was used
for ν > 1 ps–1, ensuring that each harmonic
oscillator period contained at least 80 time steps. Simulations ran
for a minimum of 2.5 × 106 time steps, depending on
the convergence of each individual simulation, resulting in a minimum
simulation length of 100 ps for the high-restraint range. Configurations
were saved for analysis every 1000 time steps, but at most every picosecond.Even with these specific measures to eliminate errors due to the
size of the integration time step, the mass-weighted RMSD occasionally
showed distinct, short peaks at high restraint strengths. These occurred
when one or more atoms would deviate so far from their reference positions,
presumably due to the random (Langevin) force, that a couple of time
steps were needed to relax them back to their equilibrium positions.
Such systematic errors have previously been shown to cancel out when
calculating the conformational free energy difference between two
states of a smaller system,[35] but led to
a number of rare but obvious shifts in the confinement free energy
in our results. Therefore, data points within a sampling window that
deviated more than four standard deviations from the ensemble average,
thereby clearly identified as out-of-equilibrium outliers, were discarded
before subsampling the data.A time step of 2 fs was used to
speed up the simulations at low
restraint strengths, where the conformational space of the protein
is still relatively large and autocorrelation times are long. For
these simulations, all of the bonds involving hydrogen atoms were
constrained using SHAKE.[51] In an alanine-dipeptide
test case, the reduction in degrees of freedom accompanying these
constraints did not influence the confinement free energy difference
below ν = 6 ps–1, as shown in Figure S1 in the Supporting Information. Apparently,
for these values of ν, the influence of constraining the small
oscillations of hydrogens is negligible compared to the relatively
large RMSD values, especially when only the free energy difference
between the two states is considered. Nevertheless, SHAKE was applied
conservatively for ν < 1 ps–1 only.Replica exchange molecular dynamics (REMD) was used for ν
< 1.76 ps–1. REMD has been shown to decrease
the autocorrelation time below ν ≈ 2 ps–1, but has no effect on the autocorrelation time at higher restraint
strengths.[36] Four replicas were used at
the temperatures 300.00, 308.11, 316.44, and 325.00 K, with exchange
attempts every 1 ps. This resulted in configuration exchanges between
replicas every 5 ps on average.
Window Spacing and Overlap
Coefficient
The windows
in TI calculations are usually equispaced in logarithmic space. However,
Villemot et al.[36] have shown that an increasingly
narrow spacing is necessary with increasing harmonic oscillator frequency
ν to limit the free energy spacing between the windows. Doing
so, enough overlap between the windows can be maintained. To get an
estimate of the free energy spacing, we initially used 21 frequencies,
equally spaced in logarithmic space according to the formula ν = 2.045 × 10–2 ×
1.9 ps–1, i = 0, 1, 2, ..., 20.[29] Because the sampled
conformations still appeared to be too constrained at the lowest restraint
strength, this range was subsequently expanded by adding the lowest
frequency at ν = 0.001 ps–1. To interpolate
between these data points, the most progressive relationship between
the free energy difference and oscillator frequency proposed by Villemot
et al. was used.[36] This meant that the
data points were added in such a way that the maximal free energy
difference between each of the neighboring simulations i and j was determined using[36]with a = 5 kcal/mol and b = 1.55 kcal/mol. In the low-frequency range (ν <
2 ps–1), MBAR was used to estimate the free energy
difference between the data points that were not yet sampled. In the
mid-range frequencies (2 < ν < 163 ps–1), the number of added data points was based on the free energy difference
calculated by TI between the two existing ones, and were equispaced
in logarithmic space. To minimize the number of data points that were
needed for sufficient overlap in the high-range frequencies (ν
> 163 ps–1), data points were added and calculated
iteratively, mid-way in logarithmic space between the already existing
ones. This led to a total of 2178 windows for each of the states.The overlap between the sampling windows was characterized using
the overlap coefficient, which can be determined from the overlap
matrix.[40] The overlap coefficient was calculated
for each row of the overlap matrix, by taking the minimum value of
the sum of the nondiagonal elements in the same row, either to the
right or to the left of the diagonal.
Results
We present
the results of the confinement simulations and compliance
to each of the guidelines for free energy calculations, as discussed
in the “Guidelines for Free Energy Calculations” section. The first of these guidelines is already met by
subsampling the data. In this section, the preparation of the reference
structures is described first. Next, completion of the confinement
to a set of noninteracting harmonic oscillators will be checked. Subsequently,
the confinement free energy is calculated using both the TI and MBAR
analysis techniques (guideline 2). Then, to corroborate these results,
the use of a sufficient number of windows and corresponding overlap
between them is shown (guideline 3). Following the final guideline,
sufficient equilibration and sampling in each window then demonstrates
convergence of the results. Finally, the conformational free energy
difference between the EI and PF state is calculated from these results.
Energy
Minimization
Reference structures for the confinement
simulations were obtained by thorough energy minimization of the structures
for the EI and PF state. The minimum potential energy during the energy
minimization is shown in Figure . Although neither of the states have converged to
an absolute energy minimum, the energy difference remains more or
less constant between the two states. The potential energy difference
ΔGab between the configurations
at the end of these 107 steps of energy minimization was
−222.9 kcal/mol. These final configurations were used as reference
structures X0A and X0B for the confinement free energy
simulations of the extended intermediate and postfusion state, respectively.
Figure 2
Minimum
potential energy during energy minimization of the EI (orange)
and PF (purple) crystal structures. The final configurations, with
a potential energy difference of −222.9 kcal/mol, were used
as reference structures for each respective state in the confinement
simulations.
Minimum
potential energy during energy minimization of the EI (orange)
and PF (purple) crystal structures. The final configurations, with
a potential energy difference of −222.9 kcal/mol, were used
as reference structures for each respective state in the confinement
simulations.
Confinement to the Harmonic
Regime
The results of the
confinement simulations are shown in Figure a as the direct observable ⟨ρm⟩ at all of the sampled values of ν. The conformational
space of the EI state is larger than that of the PF state, as seen
from the difference in ⟨ρm⟩ at low
restraint strengths. The upper inset in Figure a shows the same data in linear space, from
which the large curvature in ⟨ρm⟩ with
respect to ν becomes more apparent. The bottom inset in the
same figure shows the locally linear behavior of the data in logarithmic
space, warranting the numerical integration scheme of eqs and 12.
Figure 3
Results
of the confinement simulations of the EI (orange) and PF
(purple) state of HA. The expectation values from the mass-weighted
RMSD distributions are shown in (a), with the standard deviation of
the distribution indicated by the error bars. The convergence threshold
from the right-hand side of eq is shown as the dashed line in (b), together with the left-hand
side of that equation for each of the states.
Results
of the confinement simulations of the EI (orange) and PF
(purple) state of HA. The expectation values from the mass-weighted
RMSD distributions are shown in (a), with the standard deviation of
the distribution indicated by the error bars. The convergence threshold
from the right-hand side of eq is shown as the dashed line in (b), together with the left-hand
side of that equation for each of the states.Completion of the confinement procedure, by reaching a set
of noninteracting
harmonic oscillators in accordance with eq , is shown in Figure b. The curve for the EI state is similar
to that of the PF state. At the maximum harmonic oscillator frequency
that was sampled (ν = 1121 ps–1), the equality
in eq was reached to
within 99.9%.
Confinement Free Energies
The confinement
free energies
of individual states with respect to the harmonic oscillator frequency
ν are shown in Figure a. The curves shown here are calculated using MBAR, but are
indistinguishable from those calculated by TI in this graphical representation.
The difference between the two confinement free energies (ΔΔGconf, see Figure b) is relatively small compared to the absolute confinement
free energies at the highest restraint strength, emphasizing the importance
of accurate sampling and integration for both states. The confinement
of the EI state takes more energy than that of the PF state in the
range 10–3 < ν < 40 ps–1, causing a drop in ΔΔGconf until −206.2 ± 1.1 kcal/mol, as calculated by MBAR.
At higher restraint strengths, the more compactly folded PF state
takes more energy to confine, and the energy difference goes back
up, until it converges to −202.5 ± 2.0 kcal/mol at ν
= 1120 ps–1. The confinement free energy difference
calculated using TI converges to −202.1 ± 0.4 kcal/mol,
which is well within the uncertainty given by MBAR. The uncertainty
estimated by TI is almost an order of magnitude lower than that calculated
by MBAR because TI does not take into account the amount of overlap
between the windows in this estimate. Hence, the result is consistent
between the two families of analysis techniques, thereby satisfying
guideline 2.
Figure 4
Confinement free energies for the EI (orange) and PF (purple)
state
(a) and the difference between them (b). The results in (b) were calculated
by MBAR (black) and TI (green), with their uncertainties indicated
by the bands in gray and light green, respectively. The inset in (b)
zooms in on the tail region of the graph in the main frame.
Confinement free energies for the EI (orange) and PF (purple)
state
(a) and the difference between them (b). The results in (b) were calculated
by MBAR (black) and TI (green), with their uncertainties indicated
by the bands in gray and light green, respectively. The inset in (b)
zooms in on the tail region of the graph in the main frame.Before calculating the conformational
free energy difference between
the states resulting from these confinement free energies, the validity
of the estimate with respect to the remaining two guidelines for free
energy calculations will be confirmed first. To start with, the number
of sampling windows and the overlap between them will be discussed.
Convergence: Overlapping Distributions
Convergence
of the results for ΔGconf in terms
of a sufficient number of sampling windows is shown in Figure a. For both the EI and PF state,
the result calculated using TI does not change anymore between using
half or all of the windows (within the error bars of the result calculated
over all of the windows). This, however, does not yet guarantee sufficient
overlap between the distributions of neighboring sampling windows,
which is an essential requirement for accurate free energy calculations
(guideline 3). Such an overlap cannot only be accomplished by using
enough intermediate states at different values of ν, they should
also have a judiciously chosen spacing.
Figure 5
(a) Resulting
confinement free energy using TI over the whole frequency
range, with errors σΔ indicated by the error bars. The gray shaded area represents
the uncertainty of the end result to help assess convergence. (b)
The amount of overlap that each of the sampling windows has with its
neighboring windows, as calculated by the overlap coefficient.
The overlap coefficient
quantifies the amount of overlap between the sampled distributions.
An overlap coefficient of 0.25 would mean that half of the sampled
configurations could also be found in one of the two neighboring windows.
Any value above 0.03 is considered sufficient, whereas a smaller value
would cause incorrect free energy estimates and underestimation of
the uncertainty by MBAR.[40] So the optimum,
balancing overlap between and sampling within the windows lies somewhere
in between 0.03 and 0.25.Figure b shows
that there is more than sufficient overlap between the distributions
over the whole frequency range. In fact, the window spacing strategy
described in the “Window Spacing and Overlap
Coefficient” section turns out to be rather conservative
for a large part of the frequency range, especially in the mid-range
frequencies (2 < ν < 163 ps–1) with
overlap coefficients above 0.4. Such a nonoptimal overlap requires
more sampling windows than is necessary, but in turn provides the
advantage that per window, fewer decorrelated samples are required
for an accurate free energy estimate. As can be seen in Figure a, about 2500–4000 samples
per window are acquired in this frequency range. In contrast, for
frequencies above ν = 163 ps–1, where a more
progressive window spacing strategy is used, the overlap coefficient
remains around 0.1 and about 10 000 samples per window were
acquired. So, for these high-range frequencies, a 4-fold increase
in sampling is needed to maintain the same error contribution per
window as in the mid-range frequencies.
Figure 6
(a) Total number of samples obtained at 300
K in each window (black)
and the number of samples that remain after equilibration (red) and
subsequent decorrelation (blue). (b) Using MBAR for an increasing
number of windows over a selected frequency range, the deviation in
ΔGconf and σΔ is shown with respect to those calculated
over all of the windows.
In contrast to TI, MBAR
is particularly sensitive to the window
spacing and the resulting overlap between the observed distributions.
Without an overlap, MBAR either gives highly diverging results with
huge uncertainties if calculating a small number of windows, or, if
the number of windows is too high, it does not converge at all. In
our case, this applied to both the low-range (ν < 2 ps–1) and high-range (ν > 163 ps–1) frequencies as soon as more than half of the sampled windows was
left out of the calculation. The occurrence of overlap coefficients
around 0.25 in these frequency ranges (Figure b) is already an indication for such a behavior.
Nonetheless, the window spacing is sufficiently small in the mid-range
frequencies (2 < ν < 163 ps–1), such
that MBAR calculations converge with reasonable uncertainties while
using only portions of the total number of windows in this range.
The difference in the resulting ΔGconf between using all or a subset of the windows is shown in Figure b. For both states,
the calculated confinement free energy changes at most 0.5 kcal/mol
between using all, or just a 10th, of the total number of windows
in this frequency range. This is a relatively small deviation compared
to the absolute confinement free energies (Figure a), which are 5 orders of magnitude higher.
However, the uncertainty in the result increases considerably with
decreasing number of windows, to 1.5 kcal/mol of added uncertainty
when using a tenth of the original number of windows. This highlights
the importance of using a sufficient number of judiciously spaced
sampling windows for an accurate free energy estimate with low uncertainty.(a) Resulting
confinement free energy using TI over the whole frequency
range, with errors σΔ indicated by the error bars. The gray shaded area represents
the uncertainty of the end result to help assess convergence. (b)
The amount of overlap that each of the sampling windows has with its
neighboring windows, as calculated by the overlap coefficient.(a) Total number of samples obtained at 300
K in each window (black)
and the number of samples that remain after equilibration (red) and
subsequent decorrelation (blue). (b) Using MBAR for an increasing
number of windows over a selected frequency range, the deviation in
ΔGconf and σΔ is shown with respect to those calculated
over all of the windows.An adjustment of the window spacing strategy might prevent
the
occurrence of excessive overlap that we see here. We suggest to combine
the two strategies suggested by Villemot et al.,[36] which are: sequentially adding intermediate frequencies
over the whole range or determining the maximal frequency spacing
using a precalibrated function. Recognizing that in their and our
results a considerably larger window spacing can be used for frequencies
above 2 ps–1, the idea is to sequentially add intermediate
frequencies in three small frequency ranges (“calibration”
ranges) to determine the maximal frequency spacing function below
and above 2 ps–1 individually. First, we propose
that the whole frequency range is sampled using about 20 equidistant
frequencies (ν1, ..., ν20) in logarithmic
space. By making sure that the highest frequency enters the harmonic
regime, these simulations already provide a good indication of the
total range of the confinement free energy, given that they are well
equilibrated and converged. Although the window spacing would generally
be too large for MBAR at this point, such a convergence can be monitored
using TI. Then, simulations are added at frequencies in between the
first two, center two (around ν = 2 ps–1),
and last two windows, until there is sufficient overlap between at
least two of the windows in each of these three calibration ranges.
If the overlap coefficient is much higher than 0.03, slightly larger
spacings could be tried for further optimization. The maximal free
energy difference between two subsequent windows can then be found
by fitting eq on
the free energy differences obtained in the three calibration ranges
[ΔG(νlow), ΔG(νcenter) and
ΔG(νhigh)].
Convergence: Equilibration
and Sampling
Regardless
of the number of sampling windows, insufficient equilibration within
each of the sampling windows can skew the ensemble averages considerably
with respect to the ensemble average in equilibrium. Additionally,
the ensemble averages can be skewed if, in equilibrium, not enough
samples have been gathered. Both types of skewness will be reflected
in an inaccuracy in the free energy that is not taken into account
by the estimated uncertainty. Therefore, in line with guideline 4,
proper equilibration and convergence should always be confirmed. The
conventional way of showing convergence is to calculate the free energy
based on incremental fractions of the available data in the forward
direction of each of the time series. For converged time series, the
ensemble averages should not change when samples are added, so the
resulting plot should settle rapidly (between using 40–60%
of the available data) to within the uncertainty of the final value.
An effective measure to uncover possible nonequilibrated windows is
to include the time-reversed convergence plot, which should also converge
rapidly.[40] As shown in Figure a, this is the case for the
confinement free energies of both the EI and PF states. Consequently,
the confinement free energy difference is well sampled and equilibrated,
as evident from the rapidly converging forward and reverse plots in Figure b. With this, adherence
to all of the guidelines for free energy calculations has been shown,
so the final result will be calculated next.
Figure 7
Overview of the confinement
free energies and uncertainties for
the EI and PF state calculated by MBAR (a) and the differences between
them (b), depending on the number of sampled data points. In the analysis,
increasing fractions of data are taken from the time series in both
the forward (▶) and reverse (◀) direction, as indicated.
The gray shaded area represents the uncertainty of the end result
to help assess convergence.
Overview of the confinement
free energies and uncertainties for
the EI and PF state calculated by MBAR (a) and the differences between
them (b), depending on the number of sampled data points. In the analysis,
increasing fractions of data are taken from the time series in both
the forward (▶) and reverse (◀) direction, as indicated.
The gray shaded area represents the uncertainty of the end result
to help assess convergence.
Conformational Free Energy Difference
The conformational
free energy difference between the extended intermediate and postfusion
state can be calculated from the potential energy difference between
the reference structures and the confinement free energy difference
at the highest restraint frequency. Using eq This result shows that the transformation
from the extended intermediate to the postfusion state of hemagglutinin
is an exergonic reaction that is thermodynamically favorable and that
can supply an estimated free energy of −34.2 ± 3.4kBT per protein to the membrane
fusion process.
Discussion
The extensive conformational
changes in hemagglutinin serve to
overcome the kinetic barriers in membrane fusion. The amount of free
energy that is available to accomplish this task is unknown. Computation
of this quantity is inherently challenging because of the size of
the system and the extent of the conformational changes. We tackled
the latter obstacle by using the confinement method, thereby avoiding
the need to sample all of the conformational space along the path
of the structural changes and only focusing on sampling the two end
states. Before going into the implications of our findings in the
biological context, we first discuss our experiences with this approach
in terms of its efficiency and the possible sources of errors.During the confinement free energy calculations, convergence is
monitored to ensure sufficient sampling, adhering to the guidelines
for free energy calculations.[40] Although
such a careful assessment of convergence and propagation of the sampling
error already provides an estimate of the statistical uncertainty,
the calculations also suffer from systematic errors. These arise from
inaccuracies in the determination of the ensemble averages, due to,
e.g., the integration over too large a time step, or an inaccurate
force field. Others have already investigated the systematic error
due to the size of the integration time step, which was shown to cancel
out when taking the difference between the confinement free energies
of the two states.[35] In our results, however,
some rarely occurring, larger inaccuracies caused more obvious shifts
in the confinement free energy that did not cancel out. The RMSD at
these data points clearly deviated from the ensemble distribution
and were discarded as outliers.The errors introduced by inaccuracies
in the parameters of the
force field or water model are more difficult to evaluate. For example,
the confinement free energy method has recently been extended for
use with explicit water, in which the protein might behave differently.[52] However, the current study was feasible only
by the efficient acceleration of the implicit solvent model on graphical
processing units (GPUs), combined with the advantage of sampling with
a relatively low friction coefficient. Obtaining converged results
in the relatively viscous environment of explicit water currently
seems to be out of reach. In a recent study, however, the combination
of generalized Born implicit solvent with the Amber ff14SB force field
as used here yielded accurate folding to the native conformation for
14 out of 17 proteins with varying properties.[53] Such a score indicates that the parameters used are accurate
for the determination of free energy minima and the corresponding
protein conformations, as in our calculations.In addition to
systematic errors, insufficient equilibration is
a source of statistical uncertainty that is also hard to quantify.
Despite monitoring both the forward and backward averages, there could
always be unsampled events that reveal a longer correlation time,
requiring extension of the simulation time.[54] Inclusion of additional restraints has been proposed to bypass this
problem, speeding up equilibration by lowering the correlation times.[35,36] This might however change the definition of the macrostate and thereby
affect the resulting free energy by an unknown amount. Related to
this, the fact that the definition of the macrostate may be somewhat
arbitrary is a problem for free energy calculations in general.[22] Moreover, we used a hypothesized model for the
reference structure of the EI state. We therefore chose not to introduce
any additional constraints. Because the results show that only the
PF state is stable (has a constant RMSD) at low restraint strengths,
additional constraints are expected to decrease the confinement free
energy of the EI state (ΔGconfA) the most, decreasing the
difference between the two (ΔΔGconfAB). This would
in turn enhance ΔGAB, which means
that the current calculation is a lower bound for the absolute conformational
free energy difference between the two states.To reach convergence
and limit the uncertainty, we followed the
recommendations to balance accuracy and efficiency in the confinement
free energy method by Villemot et al.[36] However, simulation of 19.4 × 109 time steps altogether
(including all replicas) still is a huge computational effort, despite
acceleration of the simulations on GPUs,a following
recommendations in terms of window spacing, monitoring correlation
times, and the use of REMD. Presumably, convergence to the same accuracy
could have been accomplished more efficiently by an improved window
spacing strategy that avoids excessive overlap. Especially in the
low-frequency range (ν < 2 ps–1), where
equilibration takes most of the simulation time and the use of REMD
quadruples the amount of required resources, the window spacing should
be optimal from the start. To improve the window spacing in future
applications, we propose to find the optimal intermediate frequencies
by combining two approaches that have been suggested by Villemot et
al.[36] By calibrating the desired free energy
difference between windows by sequentially adding intermediate frequencies
(first approach) in three small frequency calibration ranges (low,
intermediate, and high), the maximal free energy difference function
(second approach) can be fitted between these calibration ranges.
Doing so should prevent the excessive overlap between windows in the
intermediate frequency range that was seen in our results. However,
to verify this would require more simulations, so this can better
be done on a smaller system first. Although a number of simulations
have to be carried out consecutively in the proposed strategy, we
expect that a more efficient spacing of the subsequent parallel simulations
will supersede this disadvantage.There is a final subtle aspect
in the confinement free energy method
that is easily overlooked. Because rigid-body motions have been removed
by the best-fit alignment, the translational (Gtrans) and rotational (Grot) degrees
of freedom should be taken into account separately. Of these, the
translational components of the free energy are the same for both
states and cancel out. For a protein anchored in the membrane, the
only rotational degree of freedom (around the longest axis) is negligible
and ΔGAB remains unchanged. When
a protein is free in solution, the rotational components are generally
not the same for both states. Although the EI state is significantly
more extended than the PF state, we find the difference ΔGrotAB to be only 0.5 kcal/mol. This results in a conformational free energy
difference of ΔGAB = −33.4
± 3.4kBT for the
protein free in solution.The conformational free energy of
the postfusion state of HA was
found to be 34.2 ± 3.4kBT lower than that of the extended intermediate. This amount can be
interpreted as the contribution of one membrane-embedded HA to membrane
fusion in the biological context of influenza viruses entering a cell.
Although a direct experimental validation for this result is not yet
available, circumstantial evidence suggests that it is close to what
one would expect. For example, it does not exceed the membrane-binding
affinity of the three HA fusion peptides together (about 43kBT; 3 times the binding free
energy of the P20H7 fusion peptide reported by Li et al.[55]), for otherwise, the fusion peptides would be
pulled out of the target membrane upon HA collapse.[6] Moreover, the estimated energy is close to the energy from
partial or full SNARE complex formation, respectively 35 or 65kBT, with one to three SNARE
complexes mediating fusion.[19] Our result
indicates that influenza fusion is catalyzed by one to three fusion
proteins as well, considering that the hydration barrier for hemifusion
has been estimated at 30–90kBT.[9−12] Combining this with the three to six HAs involved in fusion that
others have suggested,[8,20,21] three neighboring (X31) HAs currently seem the best estimate.[8,56] For further comparison, the free energy that is available from an
extended state of a single trimeric HIV-1 gp41 fusion protein to the
postfusion state is about 70kBT.[16] Although much higher than
our estimate for HA, it is consistent with a picture of fusion mediation
by only one gp41 for HIV-1,[57] which takes
considerably more time than for influenza,[58] presumably because of the remaining ∼20kBT barrier.[6]Application of the confinement method on the known or modeled
conformations
of other fusion proteins, specifically gp41[59−61] and the SNARE
complex,[62] would enable a direct comparison
with free energies obtained in the experiments. However, the prefusion
state of the SNARE complex constitutes a separate monomer and dimer,
which are both mainly disordered. This makes the definition of this
state for the confinement free energy method rather arbitrary, and
the result would therefore be unreliable. For the gp41 fusion protein,
a structural model of the extended intermediate was found to be highly
dynamic in our simulations, which would mean high RMSD values and
unfeasibly long computation times. Hence, a direct comparison with
experiments can only proceed once these problems have been alleviated.As a possible step in the future, the trajectories of the current
results could be analyzed to find the per-residue contribution to
the free energy difference.[30] From this
refinement, the amino acids that contribute most could be compared
with the documented effects of certain mutations on hemifusion.[7,63] Alternatively, the effects of certain mutations on the conformational
free energy of the postfusion state could further be investigated
using a new series of confinement simulations.[31] It would also be interesting to investigate HA from an
H1N1 strain, which we expect to release less energy because it shows
a marked difference in fusion efficiency with H3 HA.[14] If so, comparing the per-residue contributions between
the two strains would give valuable information about the conservation
of highly contributing amino acids.
Conclusions
The
confinement free energy method was used to calculate the energy
that hemagglutinin has available for membrane fusion. This energy
is assumed to be equivalent to the conformational free energy difference
between a model of the extended intermediate and the postfusion crystal
structure. Convergence of the results was achieved by following the
specific recommendations for the confinement method given in the literature,
and at the same time complying to the general guidelines for free
energy calculations. One membrane-anchored HA trimer was found to
contribute a free energy of 34.2 ± 3.4kBT to the membrane fusion process, consistent
with current estimates of the number of HAs and the free energy barriers
in membrane fusion, as well as with the free energy contributions
that have been obtained for other fusion proteins. Although computationally
expensive, the used method has potential in the search for residues
that contribute most to the energy for fusion. Knowledge of specific
conserved residues that are key to the fusion mechanism may lead to
the design of a physics-based drug that targets these residues.
Authors: N K Sauter; J E Hanson; G D Glick; J H Brown; R L Crowther; S J Park; J J Skehel; D C Wiley Journal: Biochemistry Date: 1992-10-13 Impact factor: 3.162
Authors: Riccardo Capelli; François Villemot; Elisabetta Moroni; Guido Tiana; Arjan van der Vaart; Giorgio Colombo Journal: J Phys Chem Lett Date: 2015-12-22 Impact factor: 6.475
Authors: Jason J Otterstrom; Boerries Brandenburg; Martin H Koldijk; Jarek Juraszek; Chan Tang; Samaneh Mashaghi; Ted Kwaks; Jaap Goudsmit; Ronald Vogels; Robert H E Friesen; Antoine M van Oijen Journal: Proc Natl Acad Sci U S A Date: 2014-11-17 Impact factor: 11.205