Literature DB >> 29151344

Computation of Hemagglutinin Free Energy Difference by the Confinement Method.

Sander Boonstra¹, Patrick R Onck¹, Erik van der Giessen¹.

Abstract

Hemagglutinin (HA) mediates membrane fusion, a crucial step during influenza virus cell entry. How many HAs are needed for this process is still subject to debate. To aid in this discussion, the confinement free energy method was used to calculate the conformational free energy difference between the extended intermediate and postfusion state of HA. Special care was taken to comply with the general guidelines for free energy calculations, thereby obtaining convergence and demonstrating reliability of the results. The energy that one HA trimer contributes to fusion was found to be 34.2 ± 3.4kBT, similar to the known contributions from other fusion proteins. Although computationally expensive, the technique used is a promising tool for the further energetic characterization of fusion protein mechanisms. Knowledge of the energetic contributions per protein, and of conserved residues that are crucial for fusion, aids in the development of fusion inhibitors for antiviral drugs.

Entities: Chemical Disease Gene Mutation Species

Mesh：

Substances：
Hemagglutinins

Year: 2017 PMID： 29151344 PMCID： PMC5742479 DOI： 10.1021/acs.jpcb.7b09699

Source DB: PubMed Journal: J Phys Chem B ISSN： 1520-5207 Impact factor: 2.991

Introduction

The glycoprotein hemagglutinin (HA) catalyzes membrane fusion during the invasion of cells by influenza virus particles.[1] HA is a homotrimeric class I fusion protein consisting of two subunits, a mostly globular binding domain (HA1) and a fusion-active domain (HA2) that is anchored in the viral membrane at its C-terminal. HA1 is disulfide-bonded to HA2, whose central α-helical coiled coil forms the core of the trimer. HA1 facilitates binding to receptors on the target cell membrane and holds the protein in a metastable configuration. After endocytosis, acidification of the endosome triggers dissociation of HA1 and release of fusion peptides at the N-terminal of HA2.[2,3] A large conformational change in HA2 ensues, as is evident from the comparison of the prefusion and postfusion (PF) conformations of HA.[4,5] In fact, a pathway of HA rearrangements has been deduced, which involves two large conformational changes.[6] First, a loop-to-helix transition extends the existing coiled coil and projects the fusion peptides toward the target membrane. In this hypothesized “extended intermediate” (EI) state, the fusion peptides can insert into the target membrane, thereby establishing HA2 as a bridge between the viral and endosomal membrane. Subsequently, the globular C-terminal domain of HA2 collapses and zippers up along distinct hydrophobic patches on the extended coiled coil, so as to bring the two membranes together for fusion.[7] Single-particle fusion kinetics experiments have shown that HA triggering is the rate-limiting step, followed by fast HA extension and fusion peptide insertion. The resulting intermediate state of HA remains alive until a local cluster of sufficiently many HAs can jointly provide enough energy to fuse the membranes.[6,8] However, the amount of energy that each HA can contribute to this process has not yet been determined. The free energy available per HA is valuable information in the development of antiviral therapeutics. It is directly related to the number of HAs needed to surmount the appreciable membrane fusion energy barrier, estimated at 30–90kBT.[9−12] Both the number of HAs and the free energy per HA influence the kinetics of fusion, which has to be tightly regulated to ensure release of the viral RNA close to the target cell nucleus, while avoiding degradation by the host immune system.[13] Therefore, these two quantities influence virus infectivity, dictate the minimal receptor density on the target membrane, and define the number of HA-targeting antibodies that will be needed to neutralize a virus particle.[14,15] To our knowledge, HIV-1 gp41 and the SNARE complex are the only fusion catalysts for which the energy contribution to membrane fusion is known: 71kBT for gp41 and 65kBT for SNARE, both determined using optical tweezers.[16,17] Depending on the virus strain, HIV-1 entry requires one to seven gp41 trimers,[18] whereas one to three SNARE complexes can mediate synaptic vesicle fusion.[19] For HA-mediated fusion, the required number of trimers has been estimated to be between three and six.[8,20,21] However, an experimental determination of the energy that HA contributes is still lacking. Alternatively, computational methods may be used to compute this energy, but they bring a number of challenges for systems as large as HA. Computational difficulties arise mainly because of the vast conformational space that needs to be sampled and many local energy minima in which the system can get stuck.[22] For example, targeted molecular dynamics (MD) is able to give insight into the pre- to postfusion pathway,[23] but an accurate free energy profile along this path would require further sampling with more sophisticated (and laborious) methods.[24] Some researchers have computed the conformational free energy gain of the loop-to-helix transition that extends HA2, thereby reducing the size of the system to only tens of residues.[25,26] However, in view of the described pathway of HA rearrangements, it is the entire free energy that is released during the transition from the extended intermediate to the postfusion state that needs to be calculated. To do so, a method is needed that makes the computation of conformational free energy differences for such a large system feasible. There is no generally accepted method for computing the free energy difference between the conformational states of a macromolecule, which is a particularly difficult task when the two conformations differ significantly from each other.[22] As the two states of interest do not overlap, the path-dependent approach requires integration over overlapping intermediate states along some reaction coordinate to calculate the change in free energy during the transformation from one state into the other. However, even though the intermediate states do not need to be physically realistic, the establishment of a suitable reaction coordinate and a reaction path is challenging, and some paths converge much slower than others.[27] Moreover, the system can still become trapped in an energy basin, even within the presence of a biasing potential. Additional reaction coordinates that bias the phase space perpendicular to the original reaction coordinate might be necessary, especially for larger systems, at the cost of even more complexity.[28] Instead, we here apply the confinement free energy method, proposed a few years ago by Karplus and co-workers,[29] to determine the conformational free energy difference between the extended intermediate and the postfusion state of HA. This method is path-independent and has been used to estimate the free energy difference between two conformational states by approximating the internal motions of a protein as a superposition of harmonic oscillators.[29−31] By applying enhanced sampling techniques and a number of optimization strategies, in compliance with general guidelines for free energy calculations, the result of our calculation shows convergence and consistency between two different analysis techniques.

Methods

Confinement Free Energy Method

The confinement approach[29,33−35] is a path-independent method that can be used to estimate the conformational free energy difference ΔGAB between two states, A and B, without the need of a transition path. Its central idea is illustrated in Figure . Macrostate A is transformed into a set of noninteracting harmonic oscillators, a, for which the free energy Ga can be evaluated analytically. The work done to perform this transformation is the confinement free energy ΔGconfA. So, going from a to A in Figure , the conformational free energy of state A is evaluated asA similar operation is performed for state B. Completing the cycle, the conformational free energy difference between the states can be computed asFocusing on the confinement procedure to calculate ΔGconfA first, the transformation A → a is accomplished by a series of molecular dynamics simulations, in which the system of interest is confined to state a. To this end, the Hamiltonian is extended with a harmonic restraint, centered at the positions of a reference structure X0A ∈ A. This restraint increases stepwise with each new simulation, proportionally with the parameter λ = 0, ..., 1, until the system can be considered purely harmonic at λ = 1. This corresponds to the Hamiltonian[29]which includes the potential and kinetic energy terms E(X) and K(P), depending on the system configuration X and the particle momenta P, respectively. The particle masses m are input for the restraint over all N particles, while ∥·∥ denotes the Euclidean norm and is evaluated after a mass-weighted best-fit of the instantaneous particle coordinates x with the reference coordinates x0. This best-fit alignment improves the convergence of the procedure by precluding work done on restraining rigid body motions. The mass-weighting ensures the transformation into a set of noninteracting harmonic oscillators for which the free energy (Ga) is known analytically, thereby obviating the need for a normal-mode analysis to determine ΔGab in a previous version of the method.[35] The total work performed over all of the values of the restraint parameter λ is the confinement free energy ΔGconf, which can be computed using[36]Here, M is the total mass of the system, ⟨ρm(X, X0)⟩ is the mass-weighted root-mean-square deviation (RMSD) of the sampled configuration X with respect to the reference structure X0, and the ensemble average denoted by ⟨·⟩ is obtained from a simulation using the Hamiltonian Hλ(λ). Calculating ⟨ρm⟩ this way in postprocessing, Uconf effectively gives the average work done by the restraint term during the simulation. Equation is valid for ν chosen sufficiently large, as evaluated by the convergence criterion[29]where NDOF is the number of degrees of freedom in the system and β = 1/kBT. In this limit, the system behaves as a set of noninteracting harmonic oscillators, of which virtually all of the energy is due to the restraints. Reaching that point, ΔGconfA can be interpreted as the “anharmonic” part of the free energy of state A. By subtracting this from the “harmonic” free energy of state a and changing the variable of integration in eq to ζ = λν2, the total conformational free energy of state A is computed aswhere h is Planck’s constant. E(X0A) is the potential energy of the reference structure for state A, which is obtained by taking the ensemble average of the potential energy at the highest restraint frequency. The conformational free energy of state B can be obtained similarly using X0B as the reference structure. The second term on the right-hand side of eq is the free energy of the set of noninteracting harmonic oscillators representing a state, which is identical for two confined states with the same number of degrees of freedom. Hence, if one is only interested in the difference in free energy between two states, this term vanishes and

Figure 1

Thermodynamic cycle illustrating the confinement free energy method. The free energy of macrostate A is estimated from the free energy Ga of the set of harmonic oscillators a, oscillating around the reference structure X0A, and the work ΔGconfA needed to harmonically confine A to a as computed by thermodynamic integration. The conformational free energy difference between states A and B is then found by combining the confinement free energies ΔGconfB – ΔGconfA with the analytically obtained harmonic free energies Ga and Gb (eq ). On the right, the reference structures are shown in cartoon and licorice representation for the EI (orange, state a) and PF (purple, state b) states. The overlaid cartoon representations on the left depict representative conformations from the simulation with the lowest restraint strength for each of the states. Rendered with visual molecular dynamics (VMD).[32]

Thermodynamic Integration (TI) and Bennett Acceptance Ratio Estimator (MBAR)

The confinement free energy (eq ) can be calculated either directly using thermodynamic integration (TI)[37] or via the multistate Bennett acceptance ratio estimator (MBAR).[38] When using TI with numerical integration, the trapezoid rule with linear interpolation is not suitable due to the strong nonlinearity of ⟨ρm(X, X0)2⟩ as a function of ν.[34] Instead, the integration is performed using linear interpolation in logarithmic space as follows.[35] With a function y(k) that has been evaluated at the discrete points {a = k0, k1, k2, ..., k = b}, the area under the curve of this function between the points i and j = i + 1 isHence, the integral of y(k) is calculated numerically usingThe statistical uncertainty of the confinement free energy method arises mainly from the sampling error in obtaining ⟨ρm⟩. To compute the error in the conformational free-energy difference G, the error in ρm is propagated through eq . When numerical integration is used, the error propagates through eqs and 12, as detailed in Section S.1 in the Supporting Information. To keep the uncertainty in the estimated free energy low, consecutive values of λ for neighboring windows should be spaced sufficiently close.[39] The resulting spatial overlap between the conformations obtained at each window can be exploited by using MBAR,[38] which calculates the free energy by minimizing the uncertainties in the free energy differences between all of the states simultaneously. The benefit with respect to TI is that samples from multiple simulations are combined to compute the overall free energy difference, which increases the accuracy of the calculation. To do so, MBAR needs the difference in potential energy ΔU between all of the pairs of windows p and q. These can be obtained from the windows that have been simulated (p = q) using Boltzmann reweighting. Since only the restraint term differs between the potential energies of two states, ΔU can be calculated from ΔUconf for the set of k = 0, ..., K sampled configurations X at restraint frequency ν according to[36]When using MBAR, an estimate of the statistical uncertainty that propagates from taking the integral of Uconf into ΔGconf is provided by MBAR itself. While TI is sensitive to the average curvature of the observable between the windows, the uncertainty in the result from MBAR depends on the overlap between the windows. Therefore, an added advantage of using MBAR is that the consistency of the free energy calculation can be verified by comparing the answers from two different analysis techniques (TI in logarithmic space and MBAR). Consistent answers between the two methods ensures sufficient sampling in each window, as well as sufficient overlap between the windows.

Guidelines for Free Energy Calculations

In an effort to bring more standardization in free energy calculations, a number of best practices have been proposed as general guidelines for the analysis of free energy calculations:[40]Considering that the method used in the present article has been proposed only recently, and to ensure that the results presented are as accurate as possible, we will follow these guidelines in the analysis of our results and refer to them when relevant in the remainder of this article. Use uncorrelated samples. Compare results from different analysis techniques. Ensure overlapping distributions between the sampling windows. Confirm sufficient equilibration and sampling in each window.

Equilibration and Decorrelation of Time Series

In the calculation of the confinement free energy and its uncertainty, ⟨ρm⟩ should be derived from uncorrelated data sets (guideline 1), obtained from systems simulated at equilibrium (guideline 4). Hence, to acquire uncorrelated samples, the data were subsampled with a spacing of at least 2 times the autocorrelation time for each window.[41] Furthermore, the simulations are started from a configuration that is not necessarily an equilibrium sample for the Hamiltonian corresponding to that window. From this configuration, it will take time for the system to reach equilibrium, especially when the autocorrelation time is long. Therefore, the data need to be equilibrated before they are used in the free energy calculations. To do so, a method was used based on maximizing the number of uncorrelated samples that remains after discarding the equilibration period.[42] The MBAR implementation of these decorrelation and equilibration strategies was used. All of the free energy calculations reported here were performed using equilibrated and decorrelated time series, using the autocorrelation of the ρm time series.

Crystal Structures and Simulation Setup

We carried out confinement simulations of the extended intermediate (EI) and the postfusion (PF) state of influenza hemagglutinin from the A/Aichi/68-X31 H3N2 strain. A structural model of the extended intermediate was obtained by aligning the coiled coils of the prefusion (protein data bank (PDB) code 1HGF(43)) and postfusion (PDB code 1QU1(44)) crystal structures using a root-mean-square fit of residues 97–101 in VMD.[32] Using this short stretch within the trimeric coiled coil resulted in a fit that enabled a seamless combination of the two structures. For both states, all of the residues belonging to HA1 were removed and only residues 33–175 of the three monomers of HA2 were used because only these residues are present in the postfusion crystal structure. The amino acid sequence of the extended intermediate was matched with that of the postfusion structure by a C137S mutation. Residues were protonated if their pKa in the postfusion structure was above 5, as calculated by PROPKA[45,46] using the PDB2PQR web server.[47] Because the EI state occurs at the same pH as the postfusion state, it was protonated identically. To limit the computational burden of dynamic protonation, the protonation state of a residue was kept fixed throughout the simulation. The confinement simulations were performed with the nanoscale molecular dynamics (NAMD) program,[47] using the Amber ff14SB all-atom force field[48] and generalized Born implicit solvent.[49,50] These parameters were chosen after a comparison of the stability of the protein (in terms of RMSD values) and the simulation speed using different MD packages, force fields, and water models (see Section S.2 in the Supporting Information for details). A short-range interaction cutoff of 1.4 nm was used without a switching function, with a pair list distance of 1.6 nm. Nonbonded and electrostatic interactions were calculated at every time step, while pair lists were updated every 10 time steps. The temperature was maintained at 300 K using Langevin dynamics, with friction coefficients of 1, 5, 20, and 100 ps–1 in the ranges of ν = 0–0.3, 0.3–1.5, 1.5–5, and >5 ps–1, respectively. Doing so, the autocorrelation time at each of the oscillator frequencies is optimized, as proposed by Villemot et al.[36] Following the recommendation of the same authors, the simulation time step was taken to depend on the restraint strength to sufficiently sample even the highest frequencies. As such, a time step ranging from 1 to 0.01 fs was used for ν > 1 ps–1, ensuring that each harmonic oscillator period contained at least 80 time steps. Simulations ran for a minimum of 2.5 × 106 time steps, depending on the convergence of each individual simulation, resulting in a minimum simulation length of 100 ps for the high-restraint range. Configurations were saved for analysis every 1000 time steps, but at most every picosecond. Even with these specific measures to eliminate errors due to the size of the integration time step, the mass-weighted RMSD occasionally showed distinct, short peaks at high restraint strengths. These occurred when one or more atoms would deviate so far from their reference positions, presumably due to the random (Langevin) force, that a couple of time steps were needed to relax them back to their equilibrium positions. Such systematic errors have previously been shown to cancel out when calculating the conformational free energy difference between two states of a smaller system,[35] but led to a number of rare but obvious shifts in the confinement free energy in our results. Therefore, data points within a sampling window that deviated more than four standard deviations from the ensemble average, thereby clearly identified as out-of-equilibrium outliers, were discarded before subsampling the data. A time step of 2 fs was used to speed up the simulations at low restraint strengths, where the conformational space of the protein is still relatively large and autocorrelation times are long. For these simulations, all of the bonds involving hydrogen atoms were constrained using SHAKE.[51] In an alanine-dipeptide test case, the reduction in degrees of freedom accompanying these constraints did not influence the confinement free energy difference below ν = 6 ps–1, as shown in Figure S1 in the Supporting Information. Apparently, for these values of ν, the influence of constraining the small oscillations of hydrogens is negligible compared to the relatively large RMSD values, especially when only the free energy difference between the two states is considered. Nevertheless, SHAKE was applied conservatively for ν < 1 ps–1 only. Replica exchange molecular dynamics (REMD) was used for ν < 1.76 ps–1. REMD has been shown to decrease the autocorrelation time below ν ≈ 2 ps–1, but has no effect on the autocorrelation time at higher restraint strengths.[36] Four replicas were used at the temperatures 300.00, 308.11, 316.44, and 325.00 K, with exchange attempts every 1 ps. This resulted in configuration exchanges between replicas every 5 ps on average.

Window Spacing and Overlap Coefficient

The windows in TI calculations are usually equispaced in logarithmic space. However, Villemot et al.[36] have shown that an increasingly narrow spacing is necessary with increasing harmonic oscillator frequency ν to limit the free energy spacing between the windows. Doing so, enough overlap between the windows can be maintained. To get an estimate of the free energy spacing, we initially used 21 frequencies, equally spaced in logarithmic space according to the formula ν = 2.045 × 10–2 × 1.9 ps–1, i = 0, 1, 2, ..., 20.[29] Because the sampled conformations still appeared to be too constrained at the lowest restraint strength, this range was subsequently expanded by adding the lowest frequency at ν = 0.001 ps–1. To interpolate between these data points, the most progressive relationship between the free energy difference and oscillator frequency proposed by Villemot et al. was used.[36] This meant that the data points were added in such a way that the maximal free energy difference between each of the neighboring simulations i and j was determined using[36]with a = 5 kcal/mol and b = 1.55 kcal/mol. In the low-frequency range (ν < 2 ps–1), MBAR was used to estimate the free energy difference between the data points that were not yet sampled. In the mid-range frequencies (2 < ν < 163 ps–1), the number of added data points was based on the free energy difference calculated by TI between the two existing ones, and were equispaced in logarithmic space. To minimize the number of data points that were needed for sufficient overlap in the high-range frequencies (ν > 163 ps–1), data points were added and calculated iteratively, mid-way in logarithmic space between the already existing ones. This led to a total of 2178 windows for each of the states. The overlap between the sampling windows was characterized using the overlap coefficient, which can be determined from the overlap matrix.[40] The overlap coefficient was calculated for each row of the overlap matrix, by taking the minimum value of the sum of the nondiagonal elements in the same row, either to the right or to the left of the diagonal.

Results

We present the results of the confinement simulations and compliance to each of the guidelines for free energy calculations, as discussed in the “Guidelines for Free Energy Calculations” section. The first of these guidelines is already met by subsampling the data. In this section, the preparation of the reference structures is described first. Next, completion of the confinement to a set of noninteracting harmonic oscillators will be checked. Subsequently, the confinement free energy is calculated using both the TI and MBAR analysis techniques (guideline 2). Then, to corroborate these results, the use of a sufficient number of windows and corresponding overlap between them is shown (guideline 3). Following the final guideline, sufficient equilibration and sampling in each window then demonstrates convergence of the results. Finally, the conformational free energy difference between the EI and PF state is calculated from these results.

Energy Minimization

Reference structures for the confinement simulations were obtained by thorough energy minimization of the structures for the EI and PF state. The minimum potential energy during the energy minimization is shown in Figure . Although neither of the states have converged to an absolute energy minimum, the energy difference remains more or less constant between the two states. The potential energy difference ΔGab between the configurations at the end of these 107 steps of energy minimization was −222.9 kcal/mol. These final configurations were used as reference structures X0A and X0B for the confinement free energy simulations of the extended intermediate and postfusion state, respectively.

Figure 2

Minimum potential energy during energy minimization of the EI (orange) and PF (purple) crystal structures. The final configurations, with a potential energy difference of −222.9 kcal/mol, were used as reference structures for each respective state in the confinement simulations.

Confinement to the Harmonic Regime

The results of the confinement simulations are shown in Figure a as the direct observable ⟨ρm⟩ at all of the sampled values of ν. The conformational space of the EI state is larger than that of the PF state, as seen from the difference in ⟨ρm⟩ at low restraint strengths. The upper inset in Figure a shows the same data in linear space, from which the large curvature in ⟨ρm⟩ with respect to ν becomes more apparent. The bottom inset in the same figure shows the locally linear behavior of the data in logarithmic space, warranting the numerical integration scheme of eqs and 12.

Figure 3

Results of the confinement simulations of the EI (orange) and PF (purple) state of HA. The expectation values from the mass-weighted RMSD distributions are shown in (a), with the standard deviation of the distribution indicated by the error bars. The convergence threshold from the right-hand side of eq is shown as the dashed line in (b), together with the left-hand side of that equation for each of the states. Completion of the confinement procedure, by reaching a set of noninteracting harmonic oscillators in accordance with eq , is shown in Figure b. The curve for the EI state is similar to that of the PF state. At the maximum harmonic oscillator frequency that was sampled (ν = 1121 ps–1), the equality in eq was reached to within 99.9%.

Confinement Free Energies

The confinement free energies of individual states with respect to the harmonic oscillator frequency ν are shown in Figure a. The curves shown here are calculated using MBAR, but are indistinguishable from those calculated by TI in this graphical representation. The difference between the two confinement free energies (ΔΔGconf, see Figure b) is relatively small compared to the absolute confinement free energies at the highest restraint strength, emphasizing the importance of accurate sampling and integration for both states. The confinement of the EI state takes more energy than that of the PF state in the range 10–3 < ν < 40 ps–1, causing a drop in ΔΔGconf until −206.2 ± 1.1 kcal/mol, as calculated by MBAR. At higher restraint strengths, the more compactly folded PF state takes more energy to confine, and the energy difference goes back up, until it converges to −202.5 ± 2.0 kcal/mol at ν = 1120 ps–1. The confinement free energy difference calculated using TI converges to −202.1 ± 0.4 kcal/mol, which is well within the uncertainty given by MBAR. The uncertainty estimated by TI is almost an order of magnitude lower than that calculated by MBAR because TI does not take into account the amount of overlap between the windows in this estimate. Hence, the result is consistent between the two families of analysis techniques, thereby satisfying guideline 2.

Figure 4

Confinement free energies for the EI (orange) and PF (purple) state (a) and the difference between them (b). The results in (b) were calculated by MBAR (black) and TI (green), with their uncertainties indicated by the bands in gray and light green, respectively. The inset in (b) zooms in on the tail region of the graph in the main frame. Before calculating the conformational free energy difference between the states resulting from these confinement free energies, the validity of the estimate with respect to the remaining two guidelines for free energy calculations will be confirmed first. To start with, the number of sampling windows and the overlap between them will be discussed.

Convergence: Overlapping Distributions

Convergence of the results for ΔGconf in terms of a sufficient number of sampling windows is shown in Figure a. For both the EI and PF state, the result calculated using TI does not change anymore between using half or all of the windows (within the error bars of the result calculated over all of the windows). This, however, does not yet guarantee sufficient overlap between the distributions of neighboring sampling windows, which is an essential requirement for accurate free energy calculations (guideline 3). Such an overlap cannot only be accomplished by using enough intermediate states at different values of ν, they should also have a judiciously chosen spacing.

Figure 5

(a) Resulting confinement free energy using TI over the whole frequency range, with errors σΔ indicated by the error bars. The gray shaded area represents the uncertainty of the end result to help assess convergence. (b) The amount of overlap that each of the sampling windows has with its neighboring windows, as calculated by the overlap coefficient.

The overlap coefficient quantifies the amount of overlap between the sampled distributions. An overlap coefficient of 0.25 would mean that half of the sampled configurations could also be found in one of the two neighboring windows. Any value above 0.03 is considered sufficient, whereas a smaller value would cause incorrect free energy estimates and underestimation of the uncertainty by MBAR.[40] So the optimum, balancing overlap between and sampling within the windows lies somewhere in between 0.03 and 0.25. Figure b shows that there is more than sufficient overlap between the distributions over the whole frequency range. In fact, the window spacing strategy described in the “Window Spacing and Overlap Coefficient” section turns out to be rather conservative for a large part of the frequency range, especially in the mid-range frequencies (2 < ν < 163 ps–1) with overlap coefficients above 0.4. Such a nonoptimal overlap requires more sampling windows than is necessary, but in turn provides the advantage that per window, fewer decorrelated samples are required for an accurate free energy estimate. As can be seen in Figure a, about 2500–4000 samples per window are acquired in this frequency range. In contrast, for frequencies above ν = 163 ps–1, where a more progressive window spacing strategy is used, the overlap coefficient remains around 0.1 and about 10 000 samples per window were acquired. So, for these high-range frequencies, a 4-fold increase in sampling is needed to maintain the same error contribution per window as in the mid-range frequencies.

Figure 6

(a) Total number of samples obtained at 300 K in each window (black) and the number of samples that remain after equilibration (red) and subsequent decorrelation (blue). (b) Using MBAR for an increasing number of windows over a selected frequency range, the deviation in ΔGconf and σΔ is shown with respect to those calculated over all of the windows.

In contrast to TI, MBAR is particularly sensitive to the window spacing and the resulting overlap between the observed distributions. Without an overlap, MBAR either gives highly diverging results with huge uncertainties if calculating a small number of windows, or, if the number of windows is too high, it does not converge at all. In our case, this applied to both the low-range (ν < 2 ps–1) and high-range (ν > 163 ps–1) frequencies as soon as more than half of the sampled windows was left out of the calculation. The occurrence of overlap coefficients around 0.25 in these frequency ranges (Figure b) is already an indication for such a behavior. Nonetheless, the window spacing is sufficiently small in the mid-range frequencies (2 < ν < 163 ps–1), such that MBAR calculations converge with reasonable uncertainties while using only portions of the total number of windows in this range. The difference in the resulting ΔGconf between using all or a subset of the windows is shown in Figure b. For both states, the calculated confinement free energy changes at most 0.5 kcal/mol between using all, or just a 10th, of the total number of windows in this frequency range. This is a relatively small deviation compared to the absolute confinement free energies (Figure a), which are 5 orders of magnitude higher. However, the uncertainty in the result increases considerably with decreasing number of windows, to 1.5 kcal/mol of added uncertainty when using a tenth of the original number of windows. This highlights the importance of using a sufficient number of judiciously spaced sampling windows for an accurate free energy estimate with low uncertainty. (a) Resulting confinement free energy using TI over the whole frequency range, with errors σΔ indicated by the error bars. The gray shaded area represents the uncertainty of the end result to help assess convergence. (b) The amount of overlap that each of the sampling windows has with its neighboring windows, as calculated by the overlap coefficient. (a) Total number of samples obtained at 300 K in each window (black) and the number of samples that remain after equilibration (red) and subsequent decorrelation (blue). (b) Using MBAR for an increasing number of windows over a selected frequency range, the deviation in ΔGconf and σΔ is shown with respect to those calculated over all of the windows. An adjustment of the window spacing strategy might prevent the occurrence of excessive overlap that we see here. We suggest to combine the two strategies suggested by Villemot et al.,[36] which are: sequentially adding intermediate frequencies over the whole range or determining the maximal frequency spacing using a precalibrated function. Recognizing that in their and our results a considerably larger window spacing can be used for frequencies above 2 ps–1, the idea is to sequentially add intermediate frequencies in three small frequency ranges (“calibration” ranges) to determine the maximal frequency spacing function below and above 2 ps–1 individually. First, we propose that the whole frequency range is sampled using about 20 equidistant frequencies (ν1, ..., ν20) in logarithmic space. By making sure that the highest frequency enters the harmonic regime, these simulations already provide a good indication of the total range of the confinement free energy, given that they are well equilibrated and converged. Although the window spacing would generally be too large for MBAR at this point, such a convergence can be monitored using TI. Then, simulations are added at frequencies in between the first two, center two (around ν = 2 ps–1), and last two windows, until there is sufficient overlap between at least two of the windows in each of these three calibration ranges. If the overlap coefficient is much higher than 0.03, slightly larger spacings could be tried for further optimization. The maximal free energy difference between two subsequent windows can then be found by fitting eq on the free energy differences obtained in the three calibration ranges [ΔG(νlow), ΔG(νcenter) and ΔG(νhigh)].

Convergence: Equilibration and Sampling

Regardless of the number of sampling windows, insufficient equilibration within each of the sampling windows can skew the ensemble averages considerably with respect to the ensemble average in equilibrium. Additionally, the ensemble averages can be skewed if, in equilibrium, not enough samples have been gathered. Both types of skewness will be reflected in an inaccuracy in the free energy that is not taken into account by the estimated uncertainty. Therefore, in line with guideline 4, proper equilibration and convergence should always be confirmed. The conventional way of showing convergence is to calculate the free energy based on incremental fractions of the available data in the forward direction of each of the time series. For converged time series, the ensemble averages should not change when samples are added, so the resulting plot should settle rapidly (between using 40–60% of the available data) to within the uncertainty of the final value. An effective measure to uncover possible nonequilibrated windows is to include the time-reversed convergence plot, which should also converge rapidly.[40] As shown in Figure a, this is the case for the confinement free energies of both the EI and PF states. Consequently, the confinement free energy difference is well sampled and equilibrated, as evident from the rapidly converging forward and reverse plots in Figure b. With this, adherence to all of the guidelines for free energy calculations has been shown, so the final result will be calculated next.

Figure 7

Overview of the confinement free energies and uncertainties for the EI and PF state calculated by MBAR (a) and the differences between them (b), depending on the number of sampled data points. In the analysis, increasing fractions of data are taken from the time series in both the forward (▶) and reverse (◀) direction, as indicated. The gray shaded area represents the uncertainty of the end result to help assess convergence.

Conformational Free Energy Difference

The conformational free energy difference between the extended intermediate and postfusion state can be calculated from the potential energy difference between the reference structures and the confinement free energy difference at the highest restraint frequency. Using eq This result shows that the transformation from the extended intermediate to the postfusion state of hemagglutinin is an exergonic reaction that is thermodynamically favorable and that can supply an estimated free energy of −34.2 ± 3.4kBT per protein to the membrane fusion process.

Discussion

The extensive conformational changes in hemagglutinin serve to overcome the kinetic barriers in membrane fusion. The amount of free energy that is available to accomplish this task is unknown. Computation of this quantity is inherently challenging because of the size of the system and the extent of the conformational changes. We tackled the latter obstacle by using the confinement method, thereby avoiding the need to sample all of the conformational space along the path of the structural changes and only focusing on sampling the two end states. Before going into the implications of our findings in the biological context, we first discuss our experiences with this approach in terms of its efficiency and the possible sources of errors. During the confinement free energy calculations, convergence is monitored to ensure sufficient sampling, adhering to the guidelines for free energy calculations.[40] Although such a careful assessment of convergence and propagation of the sampling error already provides an estimate of the statistical uncertainty, the calculations also suffer from systematic errors. These arise from inaccuracies in the determination of the ensemble averages, due to, e.g., the integration over too large a time step, or an inaccurate force field. Others have already investigated the systematic error due to the size of the integration time step, which was shown to cancel out when taking the difference between the confinement free energies of the two states.[35] In our results, however, some rarely occurring, larger inaccuracies caused more obvious shifts in the confinement free energy that did not cancel out. The RMSD at these data points clearly deviated from the ensemble distribution and were discarded as outliers. The errors introduced by inaccuracies in the parameters of the force field or water model are more difficult to evaluate. For example, the confinement free energy method has recently been extended for use with explicit water, in which the protein might behave differently.[52] However, the current study was feasible only by the efficient acceleration of the implicit solvent model on graphical processing units (GPUs), combined with the advantage of sampling with a relatively low friction coefficient. Obtaining converged results in the relatively viscous environment of explicit water currently seems to be out of reach. In a recent study, however, the combination of generalized Born implicit solvent with the Amber ff14SB force field as used here yielded accurate folding to the native conformation for 14 out of 17 proteins with varying properties.[53] Such a score indicates that the parameters used are accurate for the determination of free energy minima and the corresponding protein conformations, as in our calculations. In addition to systematic errors, insufficient equilibration is a source of statistical uncertainty that is also hard to quantify. Despite monitoring both the forward and backward averages, there could always be unsampled events that reveal a longer correlation time, requiring extension of the simulation time.[54] Inclusion of additional restraints has been proposed to bypass this problem, speeding up equilibration by lowering the correlation times.[35,36] This might however change the definition of the macrostate and thereby affect the resulting free energy by an unknown amount. Related to this, the fact that the definition of the macrostate may be somewhat arbitrary is a problem for free energy calculations in general.[22] Moreover, we used a hypothesized model for the reference structure of the EI state. We therefore chose not to introduce any additional constraints. Because the results show that only the PF state is stable (has a constant RMSD) at low restraint strengths, additional constraints are expected to decrease the confinement free energy of the EI state (ΔGconfA) the most, decreasing the difference between the two (ΔΔGconfAB). This would in turn enhance ΔGAB, which means that the current calculation is a lower bound for the absolute conformational free energy difference between the two states. To reach convergence and limit the uncertainty, we followed the recommendations to balance accuracy and efficiency in the confinement free energy method by Villemot et al.[36] However, simulation of 19.4 × 109 time steps altogether (including all replicas) still is a huge computational effort, despite acceleration of the simulations on GPUs,a following recommendations in terms of window spacing, monitoring correlation times, and the use of REMD. Presumably, convergence to the same accuracy could have been accomplished more efficiently by an improved window spacing strategy that avoids excessive overlap. Especially in the low-frequency range (ν < 2 ps–1), where equilibration takes most of the simulation time and the use of REMD quadruples the amount of required resources, the window spacing should be optimal from the start. To improve the window spacing in future applications, we propose to find the optimal intermediate frequencies by combining two approaches that have been suggested by Villemot et al.[36] By calibrating the desired free energy difference between windows by sequentially adding intermediate frequencies (first approach) in three small frequency calibration ranges (low, intermediate, and high), the maximal free energy difference function (second approach) can be fitted between these calibration ranges. Doing so should prevent the excessive overlap between windows in the intermediate frequency range that was seen in our results. However, to verify this would require more simulations, so this can better be done on a smaller system first. Although a number of simulations have to be carried out consecutively in the proposed strategy, we expect that a more efficient spacing of the subsequent parallel simulations will supersede this disadvantage. There is a final subtle aspect in the confinement free energy method that is easily overlooked. Because rigid-body motions have been removed by the best-fit alignment, the translational (Gtrans) and rotational (Grot) degrees of freedom should be taken into account separately. Of these, the translational components of the free energy are the same for both states and cancel out. For a protein anchored in the membrane, the only rotational degree of freedom (around the longest axis) is negligible and ΔGAB remains unchanged. When a protein is free in solution, the rotational components are generally not the same for both states. Although the EI state is significantly more extended than the PF state, we find the difference ΔGrotAB to be only 0.5 kcal/mol. This results in a conformational free energy difference of ΔGAB = −33.4 ± 3.4kBT for the protein free in solution. The conformational free energy of the postfusion state of HA was found to be 34.2 ± 3.4kBT lower than that of the extended intermediate. This amount can be interpreted as the contribution of one membrane-embedded HA to membrane fusion in the biological context of influenza viruses entering a cell. Although a direct experimental validation for this result is not yet available, circumstantial evidence suggests that it is close to what one would expect. For example, it does not exceed the membrane-binding affinity of the three HA fusion peptides together (about 43kBT; 3 times the binding free energy of the P20H7 fusion peptide reported by Li et al.[55]), for otherwise, the fusion peptides would be pulled out of the target membrane upon HA collapse.[6] Moreover, the estimated energy is close to the energy from partial or full SNARE complex formation, respectively 35 or 65kBT, with one to three SNARE complexes mediating fusion.[19] Our result indicates that influenza fusion is catalyzed by one to three fusion proteins as well, considering that the hydration barrier for hemifusion has been estimated at 30–90kBT.[9−12] Combining this with the three to six HAs involved in fusion that others have suggested,[8,20,21] three neighboring (X31) HAs currently seem the best estimate.[8,56] For further comparison, the free energy that is available from an extended state of a single trimeric HIV-1 gp41 fusion protein to the postfusion state is about 70kBT.[16] Although much higher than our estimate for HA, it is consistent with a picture of fusion mediation by only one gp41 for HIV-1,[57] which takes considerably more time than for influenza,[58] presumably because of the remaining ∼20kBT barrier.[6] Application of the confinement method on the known or modeled conformations of other fusion proteins, specifically gp41[59−61] and the SNARE complex,[62] would enable a direct comparison with free energies obtained in the experiments. However, the prefusion state of the SNARE complex constitutes a separate monomer and dimer, which are both mainly disordered. This makes the definition of this state for the confinement free energy method rather arbitrary, and the result would therefore be unreliable. For the gp41 fusion protein, a structural model of the extended intermediate was found to be highly dynamic in our simulations, which would mean high RMSD values and unfeasibly long computation times. Hence, a direct comparison with experiments can only proceed once these problems have been alleviated. As a possible step in the future, the trajectories of the current results could be analyzed to find the per-residue contribution to the free energy difference.[30] From this refinement, the amino acids that contribute most could be compared with the documented effects of certain mutations on hemifusion.[7,63] Alternatively, the effects of certain mutations on the conformational free energy of the postfusion state could further be investigated using a new series of confinement simulations.[31] It would also be interesting to investigate HA from an H1N1 strain, which we expect to release less energy because it shows a marked difference in fusion efficiency with H3 HA.[14] If so, comparing the per-residue contributions between the two strains would give valuable information about the conservation of highly contributing amino acids.

Conclusions

The confinement free energy method was used to calculate the energy that hemagglutinin has available for membrane fusion. This energy is assumed to be equivalent to the conformational free energy difference between a model of the extended intermediate and the postfusion crystal structure. Convergence of the results was achieved by following the specific recommendations for the confinement method given in the literature, and at the same time complying to the general guidelines for free energy calculations. One membrane-anchored HA trimer was found to contribute a free energy of 34.2 ± 3.4kBT to the membrane fusion process, consistent with current estimates of the number of HAs and the free energy barriers in membrane fusion, as well as with the free energy contributions that have been obtained for other fusion proteins. Although computationally expensive, the used method has potential in the search for residues that contribute most to the energy for fusion. Knowledge of specific conserved residues that are key to the fusion mechanism may lead to the design of a physics-based drug that targets these residues.

60 in total

1. N- and C-terminal residues combine in the fusion-pH influenza hemagglutinin HA(2) subunit to form an N cap that terminates the triple-stranded coiled coil.

Authors: J Chen; J J Skehel; D C Wiley
Journal: Proc Natl Acad Sci U S A Date: 1999-08-03 Impact factor: 11.205

2. Analysis and elimination of a bias in targeted molecular dynamics simulations of conformational transitions: application to calmodulin.

Authors: Victor Ovchinnikov; Martin Karplus
Journal: J Phys Chem B Date: 2012-03-28 Impact factor: 2.991

3. Binding of influenza virus hemagglutinin to analogs of its cell-surface receptor, sialic acid: analysis by proton nuclear magnetic resonance spectroscopy and X-ray crystallography.

Authors: N K Sauter; J E Hanson; G D Glick; J H Brown; R L Crowther; S J Park; J J Skehel; D C Wiley
Journal: Biochemistry Date: 1992-10-13 Impact factor: 3.162

4. Leash in the groove mechanism of membrane fusion.

Authors: Heather E Park; Jennifer A Gruenke; Judith M White
Journal: Nat Struct Biol Date: 2003-11-02

5. Assessment of Mutational Effects on Peptide Stability through Confinement Simulations.

Authors: Riccardo Capelli; François Villemot; Elisabetta Moroni; Guido Tiana; Arjan van der Vaart; Giorgio Colombo
Journal: J Phys Chem Lett Date: 2015-12-22 Impact factor: 6.475

6. How many trimers? Modeling influenza virus fusion yields a minimum aggregate size of six trimers, three of which are fusogenic.

Authors: Maria Pamela Dobay; Akos Dobay; Johnrob Bantang; Eduardo Mendoza
Journal: Mol Biosyst Date: 2011-07-08

7. Enhancing Important Fluctuations: Rare Events and Metadynamics from a Conceptual Viewpoint.

Authors: Omar Valsson; Pratyush Tiwary; Michele Parrinello
Journal: Annu Rev Phys Chem Date: 2016-03-10 Impact factor: 12.703

8. Relating influenza virus membrane fusion kinetics to stoichiometry of neutralizing antibodies at the single-particle level.

Authors: Jason J Otterstrom; Boerries Brandenburg; Martin H Koldijk; Jarek Juraszek; Chan Tang; Samaneh Mashaghi; Ted Kwaks; Jaap Goudsmit; Ronald Vogels; Robert H E Friesen; Antoine M van Oijen
Journal: Proc Natl Acad Sci U S A Date: 2014-11-17 Impact factor: 11.205

9. GPU/CPU Algorithm for Generalized Born/Solvent-Accessible Surface Area Implicit Solvent Calculations.

Authors: David E Tanner; James C Phillips; Klaus Schulten
Journal: J Chem Theory Comput Date: 2012-06-15 Impact factor: 6.006

Review 10. Viral membrane fusion.

Authors: Stephen C Harrison
Journal: Virology Date: 2015-04-10 Impact factor: 3.616