Literature DB >> 29262673

Saturation Mutagenesis by Efficient Free-Energy Calculation.

Zuzana Jandova¹, Daniel Fast¹, Martina Setz¹, Maria Pechlaner¹, Chris Oostenbrink¹.

Abstract

Single-point mutations in proteins can greatly influence protein stability, binding affinity, protein function or its expression per se. Here, we present accurate and efficient predictions of the free energy of mutation of amino acids. We divided the complete mutational free energy into an uncharging step, which we approximate by a third-power fitting (TPF) approach, and an annihilation step, which we approximate using the one-step perturbation (OSP) method. As a diverse set of test systems, we computed the solvation free energy of all amino acid side chain analogues and obtained an excellent agreement with thermodynamic integration (TI) data. Moreover, we calculated mutational free energies in model tripeptides and established an efficient protocol involving a single reference state. Again, the approximate methods agreed excellently with the TI references, with a root-mean-square error of only 3.6 kJ/mol over 17 mutations. Our combined TPF+OSP approach does show not only a very good agreement but also a 2-fold higher efficiency than full blown TI calculations.

Entities: Chemical

Mesh：

Substances：
Amino Acids

Year: 2018 PMID： 29262673 PMCID： PMC5813279 DOI： 10.1021/acs.jctc.7b01099

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

One-point mutation in a carefully chosen position in a protein may have a huge impact on a number of various properties, such as protein stability,[1] protein secondary structure,[2] catalytic function,[3] oligomerization,[4] binding of small ligands,[5] DNA,[6] or protein–protein interactions.[7] It is of a great interest to understand and be able to predict these effects, for which it is highly relevant to compute the associated free energy of mutation. Plentiful methods to calculate the change of the free energy in different systems have been described. By far the fastest are the empirical scoring functions. They gain their speed by working with rigid molecules, thereby largely neglecting the entropic term.[8] A class of relatively fast free-energy methods is formed by the so-called end-point methods. As the name suggests, end-point methods save time by omitting intermediates and simulating solely two end states (e.g., bound and unbound, folded and unfolded, or charged and neutral states). Nonetheless, proper sampling in the end states is necessary for convergence of the free energies and accurate results. Sharir-Ivry et al.[9] were able to correctly predict the trend of the free energy of mutation for variants of haloalkane dehalogenase via the linear response approximation (LRA) and Almlöf et al.[10] accurately predicted the relative binding free energy of two proteins upon mutations using a similar approach. Other methods, such as molecular mechanics/Poisson–Boltzmann surface area (MM-PBSA)[11,12] and molecular mechanics/generalized Born surface area (MM-GBSA)[13] decompose the free energy term into various thermodynamic contributions and have been applied for a quick alanine scan with a good agreement with alchemical methods or experiments.[14,15] Somewhat less efficient, yet theoretically robust, are alchemical free-energy calculation methods. They work, in contrast to the end-point methods, with physical or unphysical intermediates, often employing noninteracting dummy atoms. Thus, computationally more accessible processes can be simulated and, because free energy is a state function, relevant free-energy differences can be calculated from thermodynamic cycles. As the value around the cycle is equal to zero, the nature of the intermediate does not influence the final free energy value but can significantly affect the efficiency of the calculation.[16−19] In thermodynamic integration (TI)[20] the free energy is computed along a continuous path connecting state A and state B. TI has been proven to be accurate at calculating pKa values,[21] relative[22,23] or absolute binding free energies[24] of protein–ligand or protein–protein complexes, and the free energy of mutation in DNA.[25] Various methods that are still derived from robust statistical mechanics but can lead to more efficient free-energy calculations have been reported in the past. One prominent example is an one-step perturbation (OSP) approach in which a judiciously chosen reference state is designed, such that the free-energy difference to multiple relevant end states can be computed in a single step.[26−28] The reference state itself does not have to represent a physical molecule, but the molecular sampling can be enhanced by using soft-core potential energy functions[18,29] or floppy molecules.[30] With OSP one can compare the free energy difference of two end states A and B from the differences to the reference state. OSP has found applications in design of BACE-1 inhibitors,[31] in binding free energy calculations of α-thrombin and p38-MAP kinase ligands,[32] in combination with quantum mechanics,[33] and in a number of automated programs.[34,35] Chiang and Wang used OSP to predict the free energy changes of benzene to its derivatives with different substituents on the ring.[36] Without using any soft-core potential, however, only predictions for small neutral substituents meet accuracy requirements (<2.5 kJ/mol), whereas larger changes may be accommodated by using more elaborate reference states.[28,37] Still, OSP was repeatedly shown to perform best for nonpolar changes, whereas large changes in polarity lead to poor overlap in the conformational ensembles of the reference state and the end states.[18,37−39] The free energy of charging a neutral compound may be more appropriately approximated by the end-state methods, e.g., based on linear response or the linear interaction energies method.[40−42] Some years ago, we introduced the third-power fitting method, which approximates the thermodynamic integration profile for charging or uncharging from simulations at the end states by using a cumulant expansion to determine second derivatives of the free energy.[26,43] De Ruiter accurately calculated charging free energies of benzamidines in water and trypsin using third power fitting (TPF) method.[43] As TPF only takes the two end points of a TI trajectory, it still significantly decreases the necessary simulation time while maintaining a comparable accuracy. The aim of the current work is 2-fold. On the one hand, we want to systematically investigate the use of OSP and TPF on a chemically diverse set of compounds, calculating the solvation free energy of the amino acid side chain analogues. On the other hand, we aim to efficiently compute the free energy of mutation in a model peptide, with an eye to computational saturation mutagenesis in future work. We will use the fast and inexpensive OSP approach to calculate the van der Waals contributions to the free energy (nonpolar free energy, ΔGR>NOSP) and the third power fitting method (TPF) to calculate the Coulombic contributions to the free energy (uncharging free energy, ΔGQ>NTPF). Our work can be divided into two parts. In the first part, we calculate the free energy of solvation of amino acid side chains. The free energy of desolvation (negative solvation free energy) can be computed by transforming an amino acid side chain in solution into a noninteracting dummy particle. Though this can be done using an alchemical method using 10–20 intermediate states, we here propose to use the scheme in Figure . First, the free energy of annihilation of the reference state into dummy atoms (ΔGR>DTI) is calculated via TI, so that the entire desolvation free energy of side chain analogues is calculated as the sum of the annihilation free energy of the reference state ΔGR>DTI, the van der Waals contribution to the free energy, ΔGN>ROSP, and the Coulombic contributions to the free energy, ΔGQ>NTPF. The reference states for OSP calculation of the solvation free energies are formed by one or two soft spheres, noted as R1 and R2, respectively (Figure ). As indicated in Figure , the uncharging and annihilation of all compounds is also calculated using TI, yielding ΔGQ>NTI and ΔGN>DTI, respectively.

Figure 1

Figure 2

Chemical structure of the reference states used for the OSP approach. Atom A represents a soft-core particle as defined in ref (18).

Solvation free energy is calculated by turning the electrostatic and van der Waals interactions of the amino acid side chain analogues off. This is done in two steps: (1) the uncharging free energy (Q > N) is computed by TI and TPF and (2) the decoupling free energy (N > D), is computed by TI and OSP, using R1 or R2 as reference states. Chemical structure of the reference states used for the OSP approach. Atom A represents a soft-core particle as defined in ref (18). In the second part of this work, we calculate the free energy of mutation of the tripeptide Ala–ref–Ala, where ref represents the reference state. To compare calculations from the tripeptides with the ones of side chains, the first two reference states consist of one (R3) and two soft spheres (R4) connected to the peptide backbone. However, the conformational sampling of the side chains, relative to the backbone appears to play a large role, for which we have designed an additional reference state with one noninteracting dummy atom and one soft atom (R5) and one with the same construct but with three additional dummy atoms, R6. The additional dummy atoms in R6 are remnants from previous simulations but should not influence the free energies, such that these simulations can be considered independent repeats. Furthermore, a more elaborate thermodynamic cycle (Figure ) is devised, employing a conformational library of amino acid side chains, which was developed in the GROMOS force field and can be further used for future applications.

Figure 3

Free energy of mutation is computed using TI and the TPF and OSP approaches. In the OSP approach, multiple side chain conformations are considered, which are subsequently averaged appropriately (see Methods).

Methods

MD Simulations of Reference States

All MD simulations were carried out using the GROMOS11 software simulation package,[44] employing the GROMOS 54a8 force field.[45] The molecular topology building blocks for the reference states (Figure ) are provided in the Supporting Information. The van der Waals parameters for soft reference atoms, noted as A, were modified according to Schafer[18] to (C6)1/2 = 0.27322 (kJ mol–1 nm6)1/2 and (C12)1/2 = 0.056143 (kJ mol–1 nm12)1/2. This amounts to an interactions between water and the soft reference atoms with a radius, σ = 0.43 nm and a well depth, ε, of 0.536 kJ/mol. In the reference state R3 a bond of 0.252 nm was used between the Cα and A, which corresponds to the average distance between Cα and A in R5, in which the side chain bonds have a length of 0.153 nm and the Cα–D–A angle has an optimal value of 111°. For the reference states R2 and R4 a bond of length of 0.351 nm was used between the soft atoms. This distance was calculated as the regular CH–CH bond (0.153 nm) subtracted from twice the distance of Cα and A in R3 (2 × 0.252 nm). To enhance the sampling of the reference, a Lennard-Jones soft core parameter of 1.51 was used for the reference atoms.[18,29] Initial structures of the reference states were modeled in MOE.[46] Reference states were energy-minimized in a vacuum using the steepest-descent algorithm and subsequently solvated in a rectangular, periodic and pre-equilibrated box of simple point charge (SPC) water.[47] Minimum solute to wall distances were set to 1.8 nm and the minimum solute–solvent distance to 0.23 nm. This led to systems containing around 6000 atoms. During equilibration, initial velocities were randomly assigned according to a Maxwell–Boltzmann distribution at 60 K. All solute atoms were positionally restrained with a harmonic potential using a force constant of 2.5 × 104 kJ mol–1 nm–2. In each of the four subsequent 20 ps MD simulations, the force constant of the positional restraints was reduced by 1 order of magnitude and the temperature was increased by 60 K. Subsequently, the positional restraints were removed and rototranslational constraints were introduced on all solute atoms.[48] To ensure that the systems are equilibrated, the simulations at 300 K were prolonged for another 10 ns while keeping the temperature (300 K) as well as the pressure constant (1 atm). To sustain a constant temperature, we used the weak-coupling thermostat[49] with a coupling time of 0.1 ps. The pressure was maintained using a weak coupling barostat with a coupling time of 0.5 ps and an isothermal compressibility of 7.627 × 10–4 kJ–1 mol nm3 for reference states R1 and R2 and 4.575 × 10–4 kJ–1 mol nm3 for reference states R3–R6. Solute and solvent were coupled to separate temperature baths. Implementation of the SHAKE algorithm[50] to constrain bond lengths of solute and solvent to their optimal values allowed for a 2 fs time-step. Nonbonded interactions were calculated using a triple range scheme. Interactions within a short-range cutoff of 0.8 nm were calculated at every time step from a pair list that was updated every fifth step. At these points, interactions between 0.8 and 1.4 nm were also calculated explicitly and kept constant between updates. A reaction field[51] contribution was added to the electrostatic interactions and forces to account for a homogeneous medium outside the long-range cutoff using a relative dielectric constant of 61 as appropriate for the SPC water model.[52] Coordinate and energy trajectories were stored every 0.5 ps for subsequent analysis. Production runs of reference states R1 and R2 in water were subsequently performed for 10 ns, whereas the production runs of tripeptides (reference states R3–R6) were performed for 50 ns.

MD Simulations of Tripeptides for Conformational Library

To obtain a conformational library of the real amino acids, simulations of all Ala–X–Ala tripeptides except for glycine and proline were performed. We followed the simulation protocol of the reference states, with production simulations of 100 ns. These trajectories were subsequently clustered using the algorithm by Daura[53] according to the conformation of the side chain of the second residue after a rotational fit on the backbone atoms. The cutoff for clustering varied depending on the size of the side chain, so that the first five clusters cover ∼90% of trajectories. This clustering led to five central member structures (CMSs) for each amino acid which were subsequently used for fitting in OSP.

Thermodynamic Integration

Solvation free energies of amino acid side chain analogues and mutation free energies for tripeptides were calculated in two steps. In the first step the uncharging of the molecules was performed (ΔGQ>NTI), which was followed by disappearing of the van der Waals radii, i.e., turning the neutral side chains into dummy atoms (ΔGN>DTI) or to the alanine residue (ΔGN>ATI). The side chain analogues were prepared by breaking the Cα–Cβ bond and increasing the number of (united atom) hydrogen atoms on Cβ. Thermodynamic integration allows us to calculate the free energy difference between states A and B via multiple discrete intermediate steps using a coupling parameter λ.In state A, represented by Hamiltonian HA, λ is equal to 0, and in state B, represented by HB, λ is equal to 1. These two states are linked via H(λ) and the free energy difference between them is calculated from the derivative of the free energy with respect to λ. The uncharging simulations were performed at 11 evenly spaced λ points, using a linear coupling of the states, without soft-core potentials,such thatHQ and HN indicate the Hamiltonians from the charged and neutral states, λ denotes an ensemble average obtained from a simulation at λ, and Vlsel is the electrostatic energy of the molecule with its surroundings. The intramolecular interactions are not included in the equation, as they are not expected to change significantly with the surrounding and cancel in the thermodynamic cycle to compute, e.g., the solvation free energy or the relative free energy of mutation in the folded and unfolded state. Twenty picoseconds of equilibration at each λ value were followed by 1 ns of production run. To reach an estimated error smaller than 1.0 kJ/mol in the TI-calculations for every amino acid analogue, two more λ points (0.05 and 0.95) were added to the systems where the solutes carry either a positive or negative net charge. No counter charge was introduced to neutralize these systems upon creation or annihilation of the full charge. The simulations at the end states were prolonged to 10 ns and used to estimate the uncharging free energies with LRA (ΔGQ>NLRA) and TPF (ΔGQ>NTPF) as well (see below). The calculations of the free energies of annihilation were performed using a similar scheme. Here, soft-core parameters of 0.5 for the van der Waals[29] were used for the perturbed atoms, such that a nonlinear path in λ is taken.

Linear Response Approximation

To approximate the electrostatic contribution to the free energy, ΔGQ>N, we applied the linear response approximation (LRA).[54,55]The neutral and charged side chains are simulated explicitly to obtain the appropriate ensemble averages in eq . According to the linear response theory, the parameter β takes a theoretical value of .[54,55] The term ⟨−Vlsel⟩N is also referred to as the electrostatic preorganization energy.[56] Note that eq with β = gives exactly the same result as eq with the parametrization of H(λ) in eqs and 3 if ⟨−Vlsel⟩λ is linear in λ. The LRA estimates were based on 10 ns of simulations in the charged and neutral states, which were the prolonged end states of the TI calculations.

Third Power Fitting

In the work of de Ruiter[43] we observed that the derivative of the electrostatic contribution to the free energy with respect to λ does not always correlate linearly with λ. Rather, it shows a slight curvature, which can be described with a third order polynomial. In the third-power fitting approach, we approximate the integration in eq bywithand the parameters are fitted to the first and second derivative of the free energy with respect to λ in the end states (λ = 0 and λ = 1). Using the Hamiltonian of eq , the first derivative is given by eq and through a cumulant expansion,[57] the second derivative of the free energy, is given byHere, kB is Bolzmann’s constant, and T is the temperature in K. The second derivative of the free energy with respect to λ is equal to the negative of the fluctuations of the derivative of the Hamiltonian with respect to λ. The TPF estimates were based on 10 ns of simulations in the charged (λ = 0) and neutral states (λ = 1), which were the prolonged end states of the TI calculations.

One-Step Perturbation

The one-step perturbation is based on Zwanzig’s free-energy perturbation equation:[58]where HN refers to the Hamiltonian of an end state, HR to the Hamiltonian of a reference state, ⟨⟩R to the ensemble average from a simulation of the reference state. Here, the relevant end state is the neutral amino acid side chain N. The efficiency of this method results from the fact that there is only one reference-state simulation needed to compute the free-energy differences to multiple end states. However, the appropriate reference state has to be chosen very carefully. One precondition for accurate free-energy estimates is that all of the reference-state Hamiltonian singularities should be shared by the Hamiltonians of all end states,[59] i.e., that all the relevant conformations of the end states should be covered by the reference state. To avoid intermolecular singularities between the end-state atoms and their surroundings, we use the soft-core interaction.[29] Liu et al.[26,60] and Schäfer[18] showed that atoms with a soft-core potential in reference-state simulation help to accurately predict the solvation free energies of nonpolar molecules in water. In the current work 10 ns MD trajectories of the R1 or R2 reference state were used for the OSP calculation of the amino acid side chain analogues. In the case of R1, the center of geometry of the side chain was fitted onto the center of geometry of the reference-state atom. In the case of R2, the line between the two atoms with the longest intramolecular distance in the side chain analogue was aligned with the bond between the two atoms of the reference state. This way, we ensured that the amino acid side chain would fit into both atoms of the reference state. Additionally, to improve statistics, this fit was repeated every 36°, leading to 10 rotations for every side chain.[38] For the tripeptides, 50 ns of simulations of the reference states R3–R6 was used. Because of the common backbone, the fitting was not that straightforward. In a first step, the Cα atom of the fitted amino acid was aligned with the Cα atom of the reference state. In the second step, the center of geometry of the side chain was aligned to the center of geometry of the soft atoms in the reference state. Next, the interaction energies were calculated for the side chain conformation attached to the backbone of the reference state. In the tripeptides, the fitting procedure was repeated for each of the central member structures obtained from separate plain MD simulations, to include multiple conformations of the side chains. This procedure leads to five separate ΔGR>NOSP values per side chain. The relative free energies between the conformations i, was computed usingwhere P is the relative occurrence of this conformation, as obtained from the conformational clustering. ΔGconf is added to ΔGR>NOSP and the resulting sum was exponentially averaged to obtain a single estimate of ΔGR>NOSPThis equation represents the proper averaging of the multiple paths from the reference state to the real, neutral state of the amino acid in Figure .

Results and Discussion

Solvation Free Energies

Uncharging Free Energies

In the first step, we calculated the solvation free energies of amino acid side chain analogues in water. As outlined in Figure , this process is split up into an uncharging and a cavity formation, or decoupling, step. For the uncharging process, we compare two end-point methods, LRA and TPF, with TI (Table ). Table S1 shows the individual ligand surrounding energies and their fluctuations in the neutral and charged states, which are used to compute the TPF free-energy differences. The convergence of the fluctuations is represented graphically in Figure S1. All fluctuations seem well converged within about 5 ns. For all molecules studied, the agreement between TPF and TI is significantly better than between LRA and TI. As a representative example, Figure shows the free-energy profiles for the uncharging of methionine. It can be seen that the curve of dG/dλ vs λ (in black) is not strictly linear but shows some curvature. Although LRA assumes linearity, with TPF we approximate the TI profile better, leading to a better free-energy estimate from the end-state simulations. The total root-mean-square error (RMSE) over all compounds in Table between ΔGQ>NLRA and ΔGQ>NTI amounts to 11.8 kJ/mol, whereas between ΔGQ>NTPF and ΔGQ>NTI this is reduced to 3.3 kJ/mol. The only two cases where TPF deviates from TI by more than 4 kJ/mol are the compounds with a full positive charge (Arg and Lys side chain analogues). TPF outperforms LRA especially in the case of compounds bearing a full negative charge (Asp and Glu side chain analogues), where LRA deviates from TI by around 20 kJ/mol.

Table 1

Calculated Uncharging Free Energies of Side Chain Analogues (kJ/mol) via TI, LRA, and TPF and Their Absolute Differences (ΔΔGQ>NLRA–TI and ΔΔGQ>NTPF–TI)a

a.a.	ΔG_Q>N^TI			ΔG_Q>N^LRA			ΔΔG_Q>N^LRA–TI	ΔG_Q>N^TPF			ΔΔG_Q>N^TPF–TI
arg	138.9	±	0.3	152.6	±	0.1	13.8	144.0	±	0.2	5.1
asn	52.1	±	0.2	64.8	±	0.04	12.7	54.0	±	0.1	1.9
asp	338.6	±	0.4	359.2	±	0.1	20.6	339.5	±	0.3	0.9
cys	10.2	±	0.1	12.7	±	0.02	2.5	10.4	±	0.03	0.2
gln	51.8	±	0.2	64.9	±	0.04	13.1	54.3	±	0.1	2.5
glu	337.7	±	0.4	356.9	±	0.1	19.2	339.2	±	0.3	1.5
hisa	51.5	±	0.2	61.9	±	0.1	10.4	54.2	±	0.3	2.8
hisb	66.3	±	0.2	76.0	±	0.04	9.7	68.6	±	0.1	2.3
lys	178.4	±	0.3	192.0	±	0.1	13.7	187.3	±	0.3	8.9
met	14.7	±	0.1	17.4	±	0.02	2.8	14.9	±	0.03	0.2
phe	9.0	±	0.1	10.7	±	0.02	1.8	9.3	±	0.02	0.3
ser	29.5	±	0.1	40.4	±	0.03	10.8	32.5	±	0.1	3.0
thr	28.6	±	0.1	39.3	±	0.03	10.7	31.7	±	0.1	3.1
trp	32.1	±	0.1	35.9	±	0.03	3.8	32.7	±	0.1	0.6
tyr	32.5	±	0.1	43.0	±	0.03	10.5	35.9	±	0.1	3.4
RMSE							11.8				3.3

Statistical error estimates are obtained from 1000 bootstrap replicates on the original data. The root mean square error (RMSE) is computed between TI and LRA and between TI and TPF.

Figure 4

Example of the uncharging free energy computed by TI (black curve), LRA (blue curve), and TPF (red curve) for the side chain analogue of Met.

Example of the uncharging free energy computed by TI (black curve), LRA (blue curve), and TPF (red curve) for the side chain analogue of Met. Statistical error estimates are obtained from 1000 bootstrap replicates on the original data. The root mean square error (RMSE) is computed between TI and LRA and between TI and TPF.

Decoupling Free Energies

The second step toward the solvation free energy involves the cavity formation step, here computed as the decoupling of the neutral side chain analogues by turning off the van der Waals interactions of all atoms with their surroundings. Following Figure , this process was performed by the TI approach, yielding ΔGN>DTI and a combination of TI and OSP, which is computed as ΔGN>DOSP = ΔGN>ROSP + ΔGR>DTI. The value of ΔGR>DTI was −11.5 ± 0.1 kJ/mol for R1 and −16.5 ± 0.2 kJ/mol for R2. Note that these values only need to be computed once to obtain the complete solvation free energy, whereas it is not needed to represent relative solvation free energies. The annihilation free energies are collected in Table and compared between TI and OSP with reference states R1 and R2. For reference state R1 it is clear that ΔGN>ROSP amounts to extremely high values for larger molecules, such as the side chain analogues of Arg, Gln, Met, Tyr, and Trp. This is a clear indication that no relevant conformations for these compounds are observed in the simulation of R1. Table S3 in the Supporting Information shows the percentage of snapshots in the reference-state simulations that contribute significantly to the free-energy estimates. This value is obtained by counting the number of configurations for which (HN – HR) in eq is less than ΔGN>ROSP + kBT. For the indicated compounds, no significant number of contributing snapshots is obtained, suggesting that the extremely high free-energy estimates.

Table 2

Decoupling Free Energies (kJ/mol) of the Neutral Amino Acid Side Chain Analogues via TI and OSP (ΔGN>DOSP = −ΔGR>NOSP + ΔGR>DTI) Using Reference States R1 and R2, Together with Their Absolute Differences (ΔΔGN>DOSP–TI)

	R1						R2
a.a.	ΔG_N>D^TI			ΔG_N>D^OSP			ΔΔG_N>D^OSP–TI	ΔG_N>D^OSP			ΔΔG_N>D^OSP–TI
ala	–9.1	±	0.3	–9.3	±	0.1	0.2
arg	–15.0	±	0.8	–1408	±	3	1393	–21.2	±	0.3	6.2
asn	–9.5	±	0.5	–9.8	±	0.4	0.3	–20.7	±	0.2	11.2
asp	–11.0	±	0.6	–10.9	±	0.4	0.1	–19.0	±	0.3	8.1
cys	–2.9	±	0.4	–5.6	±	0.4	2.7	–21.3	±	0.2	18.4
gln	–11.1	±	0.6	–1325	±	3	1314	–16.2	±	0.2	5.1
glu	–12.2	±	0.6	–9.4	±	0.7	2.8	–13.2	±	0.2	1.1
hisa	–7.5	±	0.7	–11.8	±	1.2	4.3	–13.1	±	0.2	5.6
hisb	–7.2	±	0.6	–9.2	±	1.3	2.0	–14.0	±	0.2	6.8
ile	–9.2	±	0.6	–6.5	±	1.4	2.7	–9.4	±	0.2	0.2
leu	–10.3	±	0.7	–10.7	±	1.1	0.4	–12.5	±	0.3	2.2
lys	–12.4	±	0.6	–16.9	±	2.0	4.5	–16.8	±	0.3	4.4
met	–6.6	±	0.6	–438	±	3	431	–7.2	±	0.2	0.5
phe	–7.6	±	0.8	–12.7	±	2.4	5.1	–10.0	±	0.3	2.4
ser	–5.4	±	0.3	–7.1	±	0.4	1.7	–25.4	±	0.25	20.0
thr	–6.3	±	0.4	–8.6	±	0.3	2.3	–10.5	±	0.57	4.2
trp	–7.0	±	1.0	–149	±	3	142	–1.3	±	0.51	5.7
tyr	–6.9	±	0.9	–8583	±	3	8576	–6.5	±	0.62	0.5
val	–8.5	±	0.5	–9.3	±	0.4	0.8	–15.0	±	0.24	6.5
RMSE							2018				8.1

To accommodate larger compounds better, we introduced reference state R2 and included 10 rotational states of the molecules in the calculations, as outlined in the Methods. Table shows that the agreement between ΔGN>RTI and ΔGN>ROSP is much better for the large compounds, giving a RMS error over the indicated five compounds of 4.4 kJ/mol. However, some of the smaller compounds (e.g., the side chain anlogues of Cys and Ser) are not so well accommodated in the larger reference state, yielding deviations of roughly 20 kJ/mol. A viable strategy to choose the optimal reference state for a compound seems to be to pick the larger reference state, only if the percentage of contributing frames from Table S3 is less than 0.1% for simulation R1. The RMSE between the TI and OSP data is reduced to 3.1 kJ/mol when this rule of thumb is followed. Combining the uncharging and the decoupling simulations into the full transfer of the side chain analogues from the hydrated to the vacuum state, we can compare the solvation free energies from experiment[61] to the value obtained with TI and using the OSP-TPF approach. Table collects these data, using the reference states R1 and R2 in the OSP approach. Note that for the side chains of Ala, Leu, Ile, and Val no TPF contribution was added, as these side chains are completely neutral in the united atom GROMOS 54a8 force field. No experimental data are given for the charged compounds (Arg, Asp, Glu, and Lys), as both the experimental and the computational ones require extensive processing before they can be compared directly.[62,63] In this work, we did not attempt to compensate for the artifacts that arise in free-energy calculations involving a full charge change. These artifacts are identical between the various approaches and are hence irrelevant for a comparison between the TI and TPF+OSP data. The large deviations between the TI data and the TPF+OSP approach are due to the corresponding deviations in the decoupling states. Figure compares the solvation free energies calculated via TI and TPF+OSP to the experimental values. Except for the bulky outliers, which could not be shown in the figure, the calculations using the R1 reference state agree better with the experimental as well as with the TI results. Using the same rule of thumb as above, applying reference state R2 only when the percentage of contributing frames drops below 0.1% for R1, the RMSE between TI and TPF+OSP is reduced to 2.6 kJ/mol, due to a fortuitous cancellation of the small errors in OSP (RMSE 3.1 kJ/mol) and TPF (RMSE 3.3 kJ/mol). The largest remaining deviations are for Trp (using R2; 6.4 kJ/mol), Lys (using R1; 4.4 kJ/mol), and Glu (using R1; 4.3 kJ/mol). For Trp the deviation still largely comes from the OSP calculation, as this compound is still relatively large for the (larger) reference state as reflected by the lowest percentage of contributing frames for R2 in Table S3. For Lys, however, the deviation largely comes from the TPF calculation (8.9 kJ/mol), one of the two compounds with a full positive charge, for which TPF was seen to perform worst (even though still better than LIE or LRA). For Glu, finally, the deviation of 4.3 kJ/mol is the result of errors of 2.8 kJ/mol for the charging free energy, which is quite reasonable for generating a full charge, and 1.5 kJ/mol for the decoupling free energy.

Table 3

Solvation Free Energies of Amino Acid Side Chains (kJ/mol)a

			R1		R2
a.a.	ΔG_solv^exp	ΔG_Q>D^TI	ΔG_Q>D^TPF+OSP	ΔΔG_Q>D^T+O–TI	ΔG_Q>D^TPF+OSP	ΔΔG_Q>D^T+O–TI
ala	8.1	9.1	9.3	0.2
ile	9.0	9.2	6.5	2.7	9.3	0.2
leu	9.5	10.3	10.7	0.4	12.5	2.2
val	8.3	8.5	9.3	0.8	14.9	6.4
arg		–123.9	1264	1388	–122.8	1.1
asn	–40.5	–42.6	–44.2	1.6	–33.3	9.2
asp		–327.6	–328.6	1.0	–320.5	7.1
cys	–5.2	–7.3	–4.8	2.5	10.8	18.1
gln	–39.3	–40.6	1271	1312	–38.1	2.5
glu		–325.5	–329.8	4.3	–326.0	0.5
hisa	–43.0	–44.0	–42.4	1.6	–41.2	2.8
hisb		–59.1	–59.5	0.4	–54.6	4.5
lys		–166.0	–170.4	4.4	–170.5	4.5
met	–6.2	–8.0	423	432	–7.8	0.3
phe	–3.2	–1.4	3.4	4.8	0.7	2.1
ser	–21.2	–24.1	–25.4	1.3	–7.1	17.0
thr	–20.4	–22.3	–23.2	0.9	–21.2	1.1
trp	–24.6	–25.1	116	142	–31.4	6.4
tyr	–25.6	–25.5	8547	8572	–29.5	3.9
RMSE				2018		7.2

ΔGsolvexp are experimentally measured solvation energies,[61] ΔGQ>DTI are obtained from thermodynamic integration, and ΔGQ>DTPF+OSP are obtained with the combined TPF+OSP approach. ΔΔGQ>DT+O–TI are the absolute differences between ΔGQ>DTPF+OSP and ΔGQ>DTI.

Figure 5

Comparison of experimental (ΔGsolvexp) and calculated (ΔGQ>D) solvation free energies of amino acid side chain analogues that do not carry a net charge. Calculations using TI are compared to using the TPF+OSP approach with reference states R1 and R2. Outliers Gln, Met, Trp, and Tyr are not shown for the calculations using R1 (Table ). ΔGsolvexp are experimentally measured solvation energies,[61] ΔGQ>DTI are obtained from thermodynamic integration, and ΔGQ>DTPF+OSP are obtained with the combined TPF+OSP approach. ΔΔGQ>DT+O–TI are the absolute differences between ΔGQ>DTPF+OSP and ΔGQ>DTI.

Mutation Free Energies

Eventually, we aim to calculate the free energies of mutation in a protein; therefore, we here validate the TPF+OSP method using a tripeptide of Ala–ref–Ala. All of the amino acids except for proline and glycine were mutated into Ala via TI or the combination of TPF+OSP, following the scheme in Figure . TI was performed in the tripeptide Ala–X–Ala, X being the mutated amino acid, in two separate steps. First, the partial charges of amino acid side chains were removed followed by the changing of Cß atom of the side chain into a united CH3 atom and annihilation of all remaining side chain atoms into dummy atoms. End points of the uncharging TI, i.e., λ = 0 and λ = 1 were simulated for 10 ns and used for the LRA and TPF calculation. Simulations of the tripeptide with four reference states, R3–R6 were carried out for 50 ns in water. Seeing that in a tripeptide the conformation of a side chain can, through interactions with the backbone, significantly influence the mutation free energies, we created and used a library of the most relevant side chain conformations. A simulation of 100 ns of each tripeptide Ala–X–Ala was performed and subsequently clustered to obtain the five most relevant side chain conformations for each amino acid. CMSs from the five clusters were then fitted into the reference states to apply the OSP. The contributions for the individual conformations and a contribution due to the size of the cluster were exponentially averaged using eq to obtain an overall estimate. An example of the conformational diversity of the five CMSs of Arg is shown in Figure .

Figure 6

Overlay of central member structures of arginine side chains in the tripeptide. The backbone of the first CMS of the tripeptide Ala–Arg–Ala shown in orange and five central member structures of the side chain of Arg in differently colored sticks.

Uncharging Free Energies

Table shows the comparison of uncharging free energies using TI, LRA, and TPF and Table S2 shows the individual ligand surrounding energies and their fluctuations in the neutral and charged states, which are used to compute the TPF free-energy differences. The results are very comparable to the values obtained for the individual side chain analogues. As before, the LRA values agree poorly, with an overall RMSE value of 12 kJ/mol. Similarly, we observe a good agreement between TPF and TI uncharging free energies with an overall RMSE of 3.4 kJ/mol. The biggest differences, of more than 5 kJ/mol, are for Arg, Gln, and Lys. Amino acids with a full charge show large uncharging free energies of more than 100 kJ/mol. Surprisingly, we found comparable values for the neutral residues Asn and Gln, for which uncharging free energies reach more than 250 kJ/mol. This could be traced to very strong intramolecular interactions of the acetamide group.

Table 4

Calculated Uncharging Free Energies of Side Chains in the Tripeptides (kJ/mol) via TI, LRA, and TPF and Their Absolute Differences (ΔΔGQ>NLRA–TI and ΔΔGQ>NTPF–TI)a

a.a.	ΔG_Q>N^TI			ΔG_Q>N^IRA			ΔΔG_Q>N^LRA–TI	ΔG_Q>N^TPF			ΔΔG_Q>N^TPF–TI
arg	217.0	±	0.1	230.8	±	0.1	13.8	222.5	±	0.3	5.5
asn	252.0	±	0.1	263.8	±	0.1	11.9	253.9	±	0.1	1.9
asp	415.9	±	0.2	437.3	±	0.1	21.5	416.9	±	0.5	1.1
cys	18.3	±	0.03	21.0	±	0.03	2.7	18.7	±	0.04	0.4
gln	268.2	±	0.1	284.6	±	0.1	16.5	274.3	±	0.1	6.2
glu	406.1	±	0.2	425.8	±	0.1	19.7	407.1	±	0.5	1.0
hisa	29.3	±	0.1	40.0	±	0.1	10.7	32.2	±	0.1	2.8
hisb	60.9	±	0.1	71.8	±	0.1	10.9	64.0	±	0.1	3.1
lysh	250.4	±	0.1	264.4	±	0.1	14.0	258.5	±	0.4	8.1
met	12.6	±	0.04	15.1	±	0.03	2.6	12.7	±	0.04	0.1
phe	0.6	±	0.03	2.4	±	0.02	1.7	1.0	±	0.03	0.3
ser	44.4	±	0.1	53.5	±	0.04	9.1	46.2	±	0.1	1.7
thr	44.0	±	0.1	51.9	±	0.04	7.9	45.4	±	0.1	1.4
trp	53.0	±	0.1	57.6	±	0.04	4.6	54.1	±	0.1	1.1
tyr	97.9	±	0.1	108.0	±	0.05	10.1	100.8	±	0.1	2.9
RMSE							12.0				3.4

Statistical error estimates are obtained from 1000 bootstrap replicates on the original data. The root mean square error (RMSE) is computed between TI and LRA and between TI and TPF.

Free Energies of Annihilation

In the calculation of the free energies of annihilation, we compare the value obtained by TI, ΔGN>ATI to the corresponding free energy obtained from OSP as ΔΔGN>AOSP = ΔGR>AOSP – ΔGR>NOSP in Table . Following up on the OSP calculation for the solvation free energies in the previous section, we here included reference states R3 and R4. Considering that the smaller R1 gave reasonable results for most of the side chains, we furthermore introduced two additional reference states, R5 and R6. Both of them contain one dummy atom between the soft sphere and Cα, for the sake of a broader sampling of the soft sphere with respect to the backbone. As noted before, R6 is identical to R5, except for some additional dummy particles that may influence the dynamics, but not the conformational ensemble. All five CMSs were fitted into the configurations collected for the four reference states in the Ala–ref–Ala tripeptide and interaction energies were recalculated. Subsequently, the results from the individual CMS were averaged into one final value, using eq . This approach worked reasonably well for most of the side chains, but in Ile we could see a discrepancy between TI and OSP approach by more than 4 kJ/mol in all reference states. This could be avoided by including eight CMSs instead of five, better representing the conformational ensemble of Ile. Also the largest side chain of Trp seems most difficult to accommodate in all of the reference states, with deviations from the TI data of 5.7–16 kJ/mol.

Table 5

Absolute Difference between the Free Energy of Annihilation Calculated via TI and OSP, ΔΔGN>AOSP–TI = |ΔΔGN>AOSP – ΔΔGN>ATI|, for the Different Reference Statesa

	ΔΔG_N>A^OSP–TI [kJ/mol]
tripeptides	R3	R4	R5	R6
arg	2.2	4.3	1.8	2.7
asn	2.8	2.3	1.2	1.1
asp	6.2	3.6	0.8	1.3
cys	1.7	1.5	2.6	2.3
gln	1.6	4.8	0.3	0.1
glu	1.8	2.8	1.5	0.7
hisa	4.3	4.4	0.0	0.6
hisb	1.0	0.3	2.3	1.3
ilea	6.8	8.6	3.1	6.6
leu	3.5	7.4	0.3	0.3
lysh	4.3	3.8	1.7	3.6
met	7.0	6.0	0.6	1.3
phe	6.3	4.7	1.2	1.1
ser	2.6	3.3	2.8	2.5
thr	4.8	3.2	2.1	1.8
trp	16.0	6.3	5.7	8.0
tyr	4.5	2.8	0.3	1.6
val	2.7	4.5	2.2	2.4
RMSE	5.6	4.6	2.2	3.0

Eight CMSs were used instead of five.

Eight CMSs were used instead of five. Overall, Table shows that reference state R3 shows the worst agreement between TI and OSP with RMSE of 5.6 kJ/mol, followed by R4 with RMSE of 4.6 kJ/mol. R5 and R6, however, show very good agreement between the OSP and TPF values. R5 shows excellent agreement with TI, with RMSE of 2.2 kJ/mol, which is below kBT at 300 K.

Free Energies of Mutation

Combing the data from the uncharging and the annihilation of nonpolar side chains, the total free energy of mutation to Ala was calculated and correlated in Figure . The difference from the TI data (ΔΔGQ>A(TPF+OSP)–TI) is given in Table . Following the trend observed for the annihilation of nonpolar particles with OSP, the reference states showing the best agreement with TI are R5 and R6, with RMSE of 3.2 and 4.1 kJ/mol, respectively. R3 and R4 show RMSE of 6.7 and 6.4 kJ/mol, respectively. In all cases the biggest outlier is Lys, with a deviation of more than 9 kJ/mol, Gln, with a deviation of at least 6 kJ/mol and Trp, with a difference of at least 5 kJ/mol. The reasons for these deviations are different. As for the amino acid side chain analogues, the full positive charge hampers the accuracy of the TPF approach, whereas for Trp the large size limits the accuracy of the OSP approach. Not surprisingly, the large TPF estimates for Gln described above, lead to a large remaining deviation in the mutation free energy. An average overall deviation that is on the order of 4 kJ/mol (∼1 kcal/mol) is quite acceptable considering the wide range of the mutational free energies in Figure .

Figure 7

Comparison of the free energies of mutation of amino acids into alanine in the tripeptide calculated via TI (ΔGQ>ATI) and TPF+OSP approach with all reference states (ΔΔGQ>AOSP+TPF). The right panel is a zoom of the indicated square in the left panel.

Table 6

Absolute Difference between the Mutational Free Energy Calculated via TI and OSP, ΔΔGQ>A(TPF+OSP)–TI = |ΔΔGQ>ATPF+OSP – ΔΔGQ>ATI|, for the Different Reference Statesa

	ΔΔG_Q>A^{(TPF+OSP)–TI} [kJ/mol]
tripeptides	R3	R4	R5	R6
ilea	6.8	8.6	3.1	6.6
leu	3.5	7.4	0.3	0.3
val	2.8	4.5	2.2	2.4
arg	3.2	9.7	3.6	2.7
asn	4.7	4.2	0.7	0.8
asp	7.1	4.5	1.7	2.2
cys	1.3	1.1	2.2	1.9
gln	7.9	11.2	6.0	6.4
glu	2.8	3.8	0.5	0.3
hisa	4.6	4.7	0.3	0.9
hisb	5.0	4.3	1.6	2.7
lys	12.5	12.0	9.9	11.8
met	7.1	6.1	0.7	1.4
phe	6.7	5.1	0.8	1.4
ser	0.8	1.5	1.0	0.7
thr	3.4	1.9	0.8	0.4
trp	14.9	5.2	6.8	6.9
tyr	7.9	6.2	3.7	1.8
RMSE	6.7	6.4	3.6	4.1

Eight CMSs were used instead of five.

Efficiency of the Method

The combined TPF+OSP approach has the potential to significantly reduce the amount of computational time needed for calculation of the free energy of mutation. In TI, 10–20 simulations at different λ-values are needed for every pair of mutants that is studied. Most of the simulation time is spent on unphysical intermediate states. In the TPF+OSP approach, the charging free energy is collected from only two simulations in the end states, of which the charged state is a simulation of the actual mutants one is interested in, giving access to structural properties of the mutant. Furthermore, the OSP contribution is calculated from a single (longer) simulation of the reference state, which is efficiently reused for all mutants. Taken together, the TPF+OSP approach requires less overall simulation time, and a larger fraction of the simulation time is spent on simulations that have physical relevance and can be further used to study, e.g., the interactions of the amino acids. Without an optimized simulation time for either of the approaches, Table S4 summarizes the overall simulation time used in the TI calculations and the TPF+OSP calculations of the tripeptides. The latter approach was based on approximately half the simulation length as the TI data. It is important to mention that the computational resources spent on creating of the side chain conformation library amounted to 100 ns for each amino acid. This library is, however, already created and can be further used in future calculations. Alternatively, if necessary, the end-point trajectories of TPF could be used to create additional libraries in future calculations of the free energy of mutation in very different systems.

Conclusion

In the current work, we systematically studied the performance of an approach combining third-power fitting (TPF) and one-step perturbation (OSP) approaches, as compared to more robust thermodynamic integration (TI) data. The solvation free energies of a large range of compounds (charged, polar, nonpolar, small aliphatic, aromatic, ...) were computed and compare excellently to the TI data. When two reference states are considered, depending on the size of the solute, a root-mean-square deviation of only 2.6 kJ/mol was obtained. Furthermore, we extended this approach to the calculation of mutational free energies in model tripeptides and could establish an efficient protocol involving a single reference state. The overall deviation between the TI data and the TPF+OSP approach amounts to a very good 3–4 kJ/mol, which is still very acceptable considering the large range of free energies considered. Without explicit optimization of the method, the TPF+OSP approach can be about 2 times more efficient than full TI calculations.

44 in total

1. Free energies of binding of polychlorinated biphenyls to the estrogen receptor from a single simulation.

Authors: Chris Oostenbrink; Wilfred F van Gunsteren
Journal: Proteins Date: 2004-02-01

2. Single-step perturbations to calculate free energy differences from unphysical reference states: limits on size, flexibility, and character.

Authors: Chris Oostenbrink; Wilfred F van Gunsteren
Journal: J Comput Chem Date: 2003-11-15 Impact factor: 3.376

3. Probing the effect of point mutations at protein-protein interfaces with free energy calculations.

Authors: Martin Almlöf; Johan Aqvist; Arne O Smalås; Bjørn O Brandsdal
Journal: Biophys J Date: 2005-11-04 Impact factor: 4.033

Review 4. Alchemical free energy methods for drug discovery: progress and challenges.

Authors: John D Chodera; David L Mobley; Michael R Shirts; Richard W Dixon; Kim Branson; Vijay S Pande
Journal: Curr Opin Struct Biol Date: 2011-02-23 Impact factor: 6.809

5. A new method for predicting binding affinity in computer-aided drug design.

Authors: J Aqvist; C Medina; J E Samuelsson
Journal: Protein Eng Date: 1994-03

6. Exploring protein native states and large-scale conformational changes with a modified generalized born model.

Authors: Alexey Onufriev; Donald Bashford; David A Case
Journal: Proteins Date: 2004-05-01

7. Computational Alanine Scanning Mutagenesis: MM-PBSA vs TI.

Authors: Sílvia A Martins; Marta A S Perez; Irina S Moreira; Sérgio F Sousa; M J Ramos; P A Fernandes
Journal: J Chem Theory Comput Date: 2013-02-25 Impact factor: 6.006

8. Testing of the GROMOS Force-Field Parameter Set 54A8: Structural Properties of Electrolyte Solutions, Lipid Bilayers, and Proteins.

Authors: Maria M Reif; Moritz Winger; Chris Oostenbrink
Journal: J Chem Theory Comput Date: 2013-01-02 Impact factor: 6.006

9. pmx: Automated protein structure and topology generation for alchemical perturbations.

Authors: Vytautas Gapsys; Servaas Michielssens; Daniel Seeliger; Bert L de Groot
Journal: J Comput Chem Date: 2014-12-08 Impact factor: 3.376

10. Pyranose dehydrogenase ligand promiscuity: a generalized approach to simulate monosaccharide solvation, binding, and product formation.

Authors: Michael M H Graf; Lin Zhixiong; Urban Bren; Dietmar Haltrich; Wilfred F van Gunsteren; Chris Oostenbrink
Journal: PLoS Comput Biol Date: 2014-12-11 Impact factor: 4.475

5 in total

1. Alchemical Binding Free Energy Calculations in AMBER20: Advances and Best Practices for Drug Discovery.

Authors: Tai-Sung Lee; Bryce K Allen; Timothy J Giese; Zhenyu Guo; Pengfei Li; Charles Lin; T Dwight McGee; David A Pearlman; Brian K Radak; Yujun Tao; Hsu-Chun Tsai; Huafeng Xu; Woody Sherman; Darrin M York
Journal: J Chem Inf Model Date: 2020-09-16 Impact factor: 4.956

Saturation Mutagenesis by Efficient Free-Energy Calculation.

Introduction

Methods

MD Simulations of Reference States

MD Simulations of Tripeptides for Conformational Library

Thermodynamic Integration

Linear Response Approximation

Third Power Fitting

One-Step Perturbation

Results and Discussion

Solvation Free Energies

Uncharging Free Energies

Decoupling Free Energies

Mutation Free Energies

Uncharging Free Energies

Free Energies of Annihilation

Free Energies of Mutation

Efficiency of the Method

Conclusion

1. Free energies of binding of polychlorinated biphenyls to the estrogen receptor from a single simulation.

2. Single-step perturbations to calculate free energy differences from unphysical reference states: limits on size, flexibility, and character.

3. Probing the effect of point mutations at protein-protein interfaces with free energy calculations.

Review 4. Alchemical free energy methods for drug discovery: progress and challenges.

5. A new method for predicting binding affinity in computer-aided drug design.

6. Exploring protein native states and large-scale conformational changes with a modified generalized born model.

7. Computational Alanine Scanning Mutagenesis: MM-PBSA vs TI.

8. Testing of the GROMOS Force-Field Parameter Set 54A8: Structural Properties of Electrolyte Solutions, Lipid Bilayers, and Proteins.

9. pmx: Automated protein structure and topology generation for alchemical perturbations.

10. Pyranose dehydrogenase ligand promiscuity: a generalized approach to simulate monosaccharide solvation, binding, and product formation.

1. Alchemical Binding Free Energy Calculations in AMBER20: Advances and Best Practices for Drug Discovery.

2. Implicit ligand theory for relative binding free energies.

3. Toward Automated Free Energy Calculation with Accelerated Enveloping Distribution Sampling (A-EDS).

4. Enhancing the promiscuity of a member of the Caspase protease family by rational design.

5. Efficient In Silico Saturation Mutagenesis of a Member of the Caspase Protease Family.