Literature DB >> 31476127

Insights into Membrane Protein-Lipid Interactions from Free Energy Calculations.

Robin A Corey¹, Owen N Vickery¹, Mark S P Sansom¹, Phillip J Stansfeld¹.

Abstract

Integral membrane proteins are regulated by specific interactions with lipids from the surrounding bilayer. The structures of protein-lipid complexes can be determined through a combination of experimental and computational approaches, but the energetic basis of these interactions is difficult to resolve. Molecular dynamics simulations provide the primary computational technique to estimate the free energies of these interactions. We demonstrate that the energetics of protein-lipid interactions may be reliably and reproducibly calculated using three simulation-based approaches: potential of mean force calculations, alchemical free energy perturbation, and well-tempered metadynamics. We employ these techniques within the framework of a coarse-grained force field and apply them to both bacterial and mammalian membrane protein-lipid systems. We demonstrate good agreement between the different techniques, providing a robust framework for their automated implementation within a pipeline for annotation of newly determined membrane protein structures.

Entities: Chemical

Mesh：

Substances：
Lipids
Membrane Proteins

Year: 2019 PMID： 31476127 PMCID： PMC6785801 DOI： 10.1021/acs.jctc.9b00548

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.578

Introduction

Integral transmembrane proteins have diverse functions within cells, and as such are key targets for many drugs, ranging from antibiotics to anticancer agents. Structurally, they are unified by the presence of a hydrophobic span of residues that both anchors the protein within the core of a lipid bilayer membrane and presents the flanking residues to the surrounding polar lipid head groups. The resulting protein–lipid interactions are important for function, with many membrane proteins, including, for example ion channels, transporters, and receptors, regulated by specific lipid interactions.[1] Lipid-binding sites thus provide potential druggable allosteric sites on many biologically important membrane proteins. Structural studies of membrane proteins often rely on their extraction from their native bilayer environment through use of detergents. As a consequence of this, lipids which bind to the protein are often lost before structural (X-ray diffraction or cryoelectron microscopy) data are gathered. Although there are cases where X-ray or electron scattering density may be observed for lipids bound to membrane proteins (for examples, see refs[2−4]), the often modest resolution of such data presents challenges to the unambiguous assignment of the molecular identity of the bound lipid species. Molecular simulations provide high resolution insights into the interactions of lipids with membrane proteins. They can both predict the location of lipid-binding sites in advance of structural studies[5−7] and can extend structural observations on the lipid interactions of a given membrane protein to other members of a protein family.[8] In addition to identification of potential lipid interaction sites, for example, from estimates of lipid–protein “fingerprints”,[9] molecular simulations can provide estimates of the residence times of lipids at binding sites on a membrane protein[10] and of free energies of interaction of specific lipids.[11,12] Validation of computational predictions of specific lipid interactions can be achieved via a number of biophysical approaches, including, for example, native mass spectrometry (nMS)[13] which can be employed in tandem with molecular simulation.[14] The relatively slow throughput of these techniques, however, means that only a tiny fraction of the possible interactions has so far been identified. Moreover, experimental quantification of the strength and specificity of protein–lipid interactions remains more challenging, with notable recent attempts using nMS[15] and surface plasmon resonance (SPR)-based methods.[16] Molecular simulations can also be used to quantify the strength of protein–lipid interactions, via free energy calculations (Figure A). Several free energy techniques have been developed for the calculation of binding free energies between ligands and (water soluble) proteins,[17] and these can be modified for analysis of protein–lipid interactions. Membrane proteins and lipids pose particular challenges of sampling and convergence for accurate free energy estimation,[18] arising from the relatively slow rates of lipid diffusion and from the diversity of lipid species present in complex biological membranes.[19] To date, most studies[5,11,18,20,21] have combined umbrella sampling with a potential of mean force (PMF) calculation along a one-dimensional reaction coordinate connecting the binding site with the surrounding membrane[18] (Figure B). Convergence of such calculations (i.e. the point at which additional sampling via additional simulation does not substantially change the outcome) is often achieved through use of a coarse-grained (CG) biomolecular force field, such as Martini,[22,23] which allows for efficient sampling of molecular systems. While a powerful technique, the difficulty in demonstrating convergence makes this approach challenging to implement in a high throughout fashion. Furthermore, it is computationally demanding, requiring >50 μs of simulation per protein–lipid interaction, currently equivalent to ∼1−2 weeks on a typical GPU-node. It is therefore important that we explore additional approaches in order to extend the reach and to evaluate the robustness of PMF-based estimates of free energies of protein–lipid interactions.

Figure 1

Introduction to free energy calculations. (A) Overview of free energy calculations. A membrane protein, as viewed from above the membrane, is shown in cyan cartoon, and a lipid in yellow, orange and red spheres. Two states are modelled: the left state is the protein bound to the target lipid, the right is the protein bound to a generic lipid (not shown), with the target lipid unbound. Free energy calculations aim to compute the difference in free energy between these states (ΔΔGbind). (B) PMF calculations create a reaction coordinate in physical space by pulling the lipid away from or towards the binding site. This coordinate can then be sampled, for example with umbrella sampling, to provide a 1D energetic landscape, allowing calculation of ΔΔGbind between the target and a generic lipid. (C) Chemical structures of PIP2, CDL and cholesterol. (D) FEP and ABFE calculations build alchemical pathways which either change the bound lipid into a different species, in this study to that of the bulk membrane, or fully remove the lipid from the simulation box. This provides the binding free energy difference between the target lipid and a generic lipid, ΔΔGbind. (E) WTMetaD biases the diffusion of a target lipid around the protein through addition of a time-dependent Gaussian of energy to the CV. These Gaussians can then be reconstructed into a full 2D energy landscape, with comparison of binding regions and the bulk membrane giving ΔΔGbind. Here, we present an analysis of the determination of protein–lipid binding interactions using PMFs alongside two other powerful free energy approaches, adapted here for investigation of protein–lipid interactions. These are free energy perturbation (FEP[24]) or absolute binding free energy (ABFE[25]) calculations, whereby a molecule is partially or fully perturbed via non-natural (i.e., alchemical) chemical space (Figure D), and well-tempered metadynamics (WTMetaD[26]), where a history-dependent bias is added to a free energy surface (FES) (Figure E) to reduce simulation time spent sampling local energy minima. We compare PMF, FEP/ABFE, and WTMetaD in terms of ease of accuracy and computational cost. We use all three methods on a panel of experimentally well-characterized proteins which are representative of bacterial, mitochondrial, and mammalian cell membranes. Through comparison of the methods applied here we outline a mechanism for producing robust and reproducible estimates of protein–lipid interactions from molecular simulations.

Methods

Equilibrium CG Simulations

Simulations were run using the CG Martini v2 biomolecular force field.[22,23] In this forcefield, molecules are coarse grained through the representation of approximately 4 heavy atoms and associated hydrogens as a single bead or particle. While this simplification provides the force field with reduced resolution,[27] it has repeatedly been shown as highly proficient in the identification of specific interactions between proteins and membrane lipids, including cardiolipin (CDL),[11,21,28,71] phosphatidylinositol (4,5) bisphosphate (PIP2),[5,14,20] cholesterol[20,30] (Figure C), and others. The Martini force field also restricts internal protein dynamics, reducing concerns about potential conformational changes caused by, for example, ion binding or residue protonation. We follow the MemProtMD protocol for setting up CG simulations of integral membrane proteins in an equilibrated bilayer.[31,32] Briefly, the input proteins are aligned accordingly on the xy plane, and lipids (1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine; POPC or 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphoethanolamine; POPE) are placed randomly around the transmembrane region of protein, in a z range of 8 nm. The starting protein coordinates were used as follows: chicken (Gallus gallus) Kir2.2 ion channel, PDB code 3SPC;[33] bovine (Bos taurus) AAC transport protein, PDB code 1OKC;[4] bacterial (Aquifex aeolicus) leucine transporter (LeuT) transport protein, PDB code 2A65(34) (simulated here as a dimer); and human (Homo sapiens) GPCR adenosine 2A receptor (A2AR), PDB code 5IU4(35) with the bRIL subunit removed. In each case, nonprotein atoms were removed and any missing loops were added using MODELLER[36] or SWISS-MODEL[37] prior to CG conversion. Phospholipids were modeled with palmitoyl (4-beads) and oleoyl (5-beads) tails. Cholesterol was modeled using the virtual-site description.[38] Full details of the simulations are given in the Supporting Methods.

Potential of Mean Force Calculations

PMF calculations were set up and run as described previously.[18] Calculations start from the complex formed between the protein and target lipid, which is then inserted into a generic membrane (i.e., POPE or POPC). The lipid is then removed from the protein through application of steered MD, in which a distance-dependent pulling force is applied between the lipid head group and the protein. This trajectory then forms the collective variable (CV) to be analyzed. The specific CVs used here are outlined in the Supporting Methods. Snapshots were taken of the system in which the lipid is at specific window along the CV (using a 0.05 nm spacing for optimal histogram overlap, see e.g. Supporting Figure 1), with each snapshot used to seed an independent simulation. For these, an umbrella potential with a force of 1000 kJ mol–1 nm−2 was used to keep the lipid in place along the reaction coordinate, with 100 kJ mol–1 nm–2xy positional restraints on 3–4 protein backbone beads to prevent the protein from rotating during the simulations—details provided in the Supporting Methods. Simulations were run for 1 μs to allow convergence (e.g., Supporting Figure 1). The first 200 ns were removed from each simulation as equilibration, and the final 800 ns were combined into a 1D energy profile using the weighted histogram analysis method,[39] as implemented in gmx wham,[40] and employing 200 rounds of Bayesian bootstrapping to report on statistical accuracy. When plotting, the bulk region of the membrane is considered to have a free energy of 0 kJ mol–1, and the binding energy well is set to 0 nm on the x-axis. Note that, as the lipid binding site will either be occupied by the target lipid or a generic bulk lipid, this analysis will not provide us with ΔGbind of our target lipid to the site, but instead ΔΔGbind between the target and a generic lipid. This is the biologically appropriate term, as protein–lipid binding will always occur in competition with other lipids, and the effective affinity of the interaction will be dependent on the nature of the other lipids present. Accordingly, if you carry out PMF calculations of, for example, POPC in a POPC membrane, the value reported should be 0 kJ mol–1 (e.g. see ref (21)).

Free Energy Perturbation

The bound PMF systems were used as the input for the FEP calculations. The target lipid was alchemically transformed into the generic lipid, along a coordinate in chemical space, termed λ. Additional simulations were run perturbing the target lipid in the bulk membrane (“free”), that is, with no protein visible to the lipid. ΔΔGbind can then be calculated as described in Figure . This value should be equivalent to that obtained using PMF calculations.

Figure 2

Thermodynamic cycle for FEP shown is a representative thermodynamic cycle used for the FEP calculations in this study. The protein receptor (here Kir2.2) is shown as white surface, with the membrane as white sticks. The target lipid is shown as coloured spheres: on the left the native PIP2 molecule has lipid tails in yellow, glycerol beads in purple, phosphate beads in orange and sugar beads in cyan. On the right, the perturbed POPC lipid is coloured as PIP2, but with a blue choline bead, and transparent beads for the beads which have now been decoupled from the system. The horizontal vertices represent the two FEP calculations, where the target lipid is alchemically perturbed into the bulk lipid either free in membrane (ΔGPIP2–POPC(free)) or bound to the receptor (ΔGPIP2–POPC(bound)). These reactions are represented by a λ coordinate from 0 to 1. ΔΔGbind can be calculated as ΔGPIP2–POPC(free) – ΔGPIP2–POPC(bound), which will represent the same value as calculated in the PMF calculations (Figure ). Note that the right vertical vertex (ΔGPOPC-bind) is the free energy required for the bulk lipid binding the receptor—in this special case of protein–lipid binding, where the ligand and the solvent are the same molecule, this value should be 0 kJ mol–1. Charges and Lennard Jones interactions were turned off separately, with a softcore parameter (sigma = 0.3) used for the Lennard Jones interactions.[41,42] Bonded interactions were not perturbed as these cancel out between the bound and free states. For each system, we used a single topology method, where molecule A is converted into molecule B in a single transformation. Details of the transformation are provided in the Supporting Methods. In all cases, we used 10 windows to perturb the Columbic interactions, and either 10 (PIP2) or 20 (CDL) windows were used to perturb the Lennard-Jones interactions. Each λ window was minimized using the steepest descent method, followed by 5 ns of NPT equilibrium. Five independent production simulations of 250 ns were then run with randomized initial velocities, using a leap-frog stochastic dynamics integrator. The free energy pathway was then constructed from the individual λ windows using the Alchemical Analysis package.[43] Energy values were calculated on the final 225 ns of simulation data, with this simulation length showing good conversion (e.g., Supporting Figure 2). Analysis was run using the MBAR method,[44] although we observed good agreement between multiple analysis methods (e.g., Supporting Figure 2). Data from 5 repeats were averaged, and the standard deviations calculated.

Absolute Binding Free Energy Calculations

A2AR-cholesterol simulations followed a similar setup as described above, but with the target lipid fully, rather than partially, decoupled from the simulation box (Supporting Figure 3). To keep the cholesterol molecule in the binding site and within the correct plane and orientation of the membrane at high values of λ, we followed a cholesterol-restraining scheme described previously,[45] and outlined in the Supporting Methods. ΔΔGbind was calculated from the energy required to decouple the cholesterol molecule in the bound and free state, in each case perturbing the Lennard-Jones interactions over 29 λ windows (with 0.05 spacing from 0 to 0.6, and 0.025 spacing from 0.6 to 1). The energetic input of the restraints was accounted for through two analytical terms (see Supporting eqs 1–3), and additional restraint FEP in the gas phase (see Supporting Figure 3). Simulations were run in Gromacs 2016 with restraints imposed using the Plumed v2.4 plugin.[46,47]

Metadynamics

In classical MetaD, the evolution of the system along the CV is biased by a history-dependent potential, which is the sum of the Gaussians deposited along the relevant CV.[48] After a defined period of time, the biasing potential compensates the underlying FES, allowing the real FES to be estimated. Use of a fixed Gaussian height gives classical MetaD several limitations, particularly concerning convergence, as the system can be pushed into regions of configurational space which are not physically or physiologically relevant. Here, we use WTMetaD[26,49] in which the Gaussian height is rescaled based upon an adaptive bias (ΔT), which is dependent on the system simulated.[26] The WTMetaD simulations were made using the protein coordinates built into a simple (POPC or POPE) membrane as described above, with xyz positional restraints on select backbone particles—details provided in the Supporting Methods. The WTMetaD simulations were run using 20 walkers placed randomly within 2.5 nm of the protein. In the A2AR, AAC, and Kir2.2 systems, the walkers were constrained from moving greater than 3.5, 4, and 4.2 nm, respectively, from the geometric center of the protein. For LeuT, the walkers were constrained using a minimum distance of 4 nm between the lipid and the geometric center of each monomer. To prevent the cholesterol from flip-flopping between leaflets, two additional flat-bottomed restraints were applied: an angle restraint of one radian was applied between the ROH, C2 beads, and the z-axis and a wall parallel to the midpoint of the membrane offset toward the extracellular leaflet by 1 nm was applied to the ROH bead to prevent the cholesterol from moving vertically between leaflets. In all simulations, the flat bottom restraints were applied using the Plumed UPPERWALLS of 1000 kJ mol–1 nm–2. The bias was added along the CV for each system, as defined in the Supporting Methods. The bias-factor was tuned to each lipid with cholesterol, CDL, and PIP2 being 6, 8, and 20, respectively. The following WTMetaD parameters were applied to all systems: Gaussian width of 0.01 nm, height of 1 kJ mol–1, and a 1 ps deposition rate. All WTMetaD simulations were performed at 310 K using Gromacs 2016 with the Plumed v2.4 plugin. When plotting, the bulk region of the membrane is considered to have an energy of 0 kJ mol–1. The bulk Gaussian height of 5% of the maximum value was used as a metric for the WTMetaD reaching a steady state (e.g., Supporting Figure 4), at which we see multiple association/disassociation events to the binding sites. The FES was sampled every 250 ps, whereupon the final 2D FES depicted in the text was recovered by averaging over the steady state.

Results

We selected a panel of four experimentally well-characterized membrane proteins to represent bacterial, mitochondrial, and mammalian cell membranes. These were chosen to include examples for which experimental data for protein–lipid interactions are available, and which have been the subject of previous computational studies. They also represent three species of lipid (Figure C) which frequently form interactions with membrane proteins, namely: PIP2, a negatively charged phospholipid which interacts with many ion channels and receptors in mammalian cell membranes; CDL, also negatively charged, which is present in mitochondrial and bacterial inner membranes, and cholesterol, which has been observed to bind to many G-protein-coupled receptors (GPCRs) and ion channels. Thus, we probe the following interactions: of PIP2 with the mammalian inward rectifying potassium channel, Kir2.2; of CDL with the mitochondrial inner membrane ADP/ATP carrier protein, AAC,[21] and with the bacterial inner membrane LeuT;[34,50] and of cholesterol with a GPCR, the A2AR.

Kir2.2–PIP2 Binding Interactions

Kir2.2 is a member of the inwardly rectifying potassium channel family, found in neuronal cell membranes, which play a key role in the regulation of plasticity and neuronal excitation.[51] As with other members of this family, opening of Kir2.2 channels can be activated through interaction with PIP2.[52] Structural (X-ray) studies show PIP2 binds at the interface between the transmembrane and cytoplasmic domains (Figure A) to bring about channel opening.[33]

Figure 3

Calculating Kir2.2–PIP2 binding energetics (A) view of Kir2.2 in cyan cartoon, with a bound PIP2 molecule in yellow, orange and red spheres, as sampled with CG MD. PIP2 interactions map to residues Arg 78, Arg 80, Lys 183, Arg 186, Lys 188, Lys 189 (Gallus gallus numbering). The approximate position of the membrane is shown with black lines. On the left is a view from the side, and on the right is a cytoplasmic view of the transmembrane region alone, with the intracellular domain removed for clarity. Note that only one PIP2 binding pose is shown, but four are present around the homotetrameric Kir2.2. (B) PMF data for Kir2.2–PIP2 binding. The y-axis is set to 0 for the bulk membrane, and the difference between this and the energy well (set to 0 nm on the x-axis) is ΔΔGbind, here −45 ± 2, with errors from 200 rounds of bootstrap analysis. (C) FEP data for Kir2.2–PIP2 binding, showing the energy cost for perturbing PIP2 to POPC whilst bound to Kir2.2 (red) and whilst free in a POPC membrane (blue). ΔΔGbind can be calculated from the free data minus the bound (see Figure ), giving a value of −48 ± 2, with the error the standard deviation from five repeats. (D) 2D energy landscape for Kir2.2 and PIP2 as computed using WTMetaD. The protein is shown as surface behind the data, with the large intracellular domain removed for clarity. The energetic landscape for a PIP2 molecule around the protein has been computed and is shown as a red-blue contour map. Four binding regions in red can be seen around the protein, with reported ΔΔGbind values as follow: S1 = −55 ± 7, S2i = −49 ± 4, S2ii = −45 ± 5, S3 = −45 ± 6 and S4 = −36 ± 7 kJ mol–1. Simulations were run to probe the free energy of Kir2.2–PIP2 interactions in a simple phospholipid (POPC) membrane. Binding sites have previously been identified both structurally[33] and computationally.[5,18] We, therefore, carried out PMF and FEP simulations based on the crystallographic binding pose of PIP2 in order to estimate the binding free energy. We note that these PMFs provide an interaction free energy for Kir2.2 with PIP2 relative to that with POPC, that is the well depth in the PMF can be equated as ΔGPIP2-bind – ΔGPOPC-bind, modeling the vertical axes of the thermodynamic cycle in Figure . We shorten the product of this, ΔΔGbind(PIP2–POPC), to ΔΔGbind. In the FEP analysis, the calculations directly compute ΔGPIP2–POPC for the free and bound states of the lipid, modeling the horizontal axes of the thermodynamic cycle in Figure . Therefore, FEP analysis provides us with the same ΔΔGbind as from the PMF calculations, but calculated in a different manner. The PMF and FEP data agree well, giving similar estimates for Kir2.2–PIP2 interactions (Figure B,C; −45 ± 2 kJ mol–1 for PMF and −48 ± 2 kJ mol–1 for FEP). The PMF data reveal the presence of a second binding site, ∼0.5 nm from the main site, with a free energy of interaction of −36 ± 2 kJ mol–1. This alternative pose of the lipid involves interactions between PIP2 and residue Lys 220, and corresponds to a secondary binding mode proposed for the closely related Kir2.1 channel.[53] Next, we carried out WTMetaD simulations to explore the multiple PIP2 binding sites on the tetrameric channel structure. Through construction of a 2D CV in physical space between the geometric center of the protein to the head group (RP1, RP2, RP3, PO1, PO2, and PO3 beads) of PIP2, WTMetaD allows us to calculate the same quantity as the PMF and FEP analyses, that is, the ΔΔGbind of PIP2 in relation to POPC. The data identified the primary binding modes for three of the channel subunits, with interaction free energies of −55 ± 7, −49 ± 4, and −45 ± 6 kJ mol–1 (Figure D). The latter of these values corresponds to the site probed using PMF and FEP, revealing excellent agreement between the techniques. For the fourth subunit, the secondary binding mode was recovered (−36 ± 7 kJ mol–1), which again corresponds well with value from the PMF analysis. We note that in the WTMetaD simulations of Kir2.2, as for all the systems, the protein is necessarily xyz restrained to keep the protein in a consistent position with relation to the CV (see Supplementary Methods for more detail). For Kir2.2, because of our input coordinates the channel is tilted (by ca. 5°) relative to the bilayer normal. We expect that this small tilt relative to the bilayer induces the shift of PIP2 from the primary to the secondary binding site for the fourth channel subunit, with the primary binding mode only accessible to three of the four binding sites because of the orientation of the channel.

CDL Interactions with Two Transport Proteins

The ADP/ATP carrier (AAC), also known as the adenine nucleotide translocator, is present in the inner mitochondrial membrane where it accounts for 10% of the total protein content.[54] It functions as a regulator of mitochondrial adenine nucleotide concentration, allowing flux of ATP/ADP across the mitochondrial membrane.[55] CDL is known to bind to AAC[4,56] (Figure A), where it results in activation of the transporter.[57,58] The protein has an approximate three-fold symmetry with three homologous, but potentially nonidentical, binding sites for CDL.

Figure 4

Calculating AAC–CDL binding energetics (A) AAC bound to CDL. Colours and views as in Figure A. Note that three CDL binding sites are present around AAC, which is a homotrimer. (B) PMF data for each of the AAC–CDL binding sites, as per Figure B. Site 1 is at the top, site 2 in the middle and site 3 at the bottom. ΔΔGbind is −7 ± 2, −8 ± 4 and −5 ± 2 for each site respectively. (C) FEP data for AAC–CDL binding, showing the energy cost for perturbing CDL to POPC whilst bound to AAC in each of the three binding sites (red: site 1 on the left) and whilst free in a POPC membrane (blue). ΔΔGbind is −9 ± 3, −11 ± 3 and −8 ± 2 for each site respectively. (D) 2D energy landscape for AAC and CDL, as per Figure D. Three binding regions in red can be seen around the protein, reporting ΔΔGbind values as follow: S1 = −14 ± 3, S2 = −13 ± 3 and S3 = −11 ± 6 kJ mol–1. We probed the energetics of the AAC–CDL interaction at all three of these sites with the protein embedded in a POPC membrane using PMF and FEP calculations (Figure A). PMF analyses yielded ΔΔGbind values of −7 ± 2, −8 ± 4 and −5 ± 2 kJ mol–1 for CDL binding at these sites (Figure B). FEP produces good agreement with the PMF data, giving ΔΔGbind values of −9 ± 3, −11 ± 3, and −8 ± 2 kJ mol–1 (Figure C), with each technique ranking the sites the same in terms of CDL binding energies. Next, we explored the system using WTMetaD with a CV constructed between the protein and the GL0 of CDL. This revealed clear energy wells at all three binding sites (Figure D). The ΔΔGbind values for sites 1, 2, and 3 were −14 ± 3, −13 ± 3, and −11 ± 6 kJ mol–1, respectively, showing reasonable agreement with the PMF and, to a greater degree, FEP estimates. Reassuringly, the sites are again similarly ranked, with the site 3 lower in energy than sites 1 or 2. We also explored CDL interactions with LeuT, a bacterial homologue of the solute carrier family 6 (SLC6) class of proteins, which includes the human serotonin transporter.[59] LeuT catalyzes sodium-driven small hydrophobic amino acid transport across the bacterial inner membrane,[60] and has been shown to bind to CDL (Figure A), which stabilizes the dimeric form of the transporter.[50] We probed the energetics of the CDL–LeuT interaction in a POPE membrane using both PMF and FEP calculations. The initial CDL pose was based on a previously identified likely binding site at the dimer interface (see ref (50) for more details). PMF analysis yielded a ΔΔGbind value of −6 ± 3 kJ mol–1 for CDL at this site (Figure B), in agreement with the FEP ΔΔGbind value of −9 ± 2 kJ mol–1 (Figure C). WTMetaD not only identifies the dimer interface binding site (with a binding free energy of −7 ± 3 and −10 ± 6 kJ mol–1 for the equivalent and opposing sites, respectively) but also additional sites around the complex (Figure D). For example, sites 2 and 4 appear to bind CDL with a lower energy than the dimer-interface site (−3 ± 2 kJ mol–1 for each site). Nonspecific of CDL with a low free energy of ca. −3 kJ mol–1 is also seen over a 3 nm area of each monomers (Figure D; dotted lines).

Figure 5

Calculating LeuT–CDL binding energetics (A) view of LeuT bound to CDL. Colours and views as in Figure A. Note that a second equivalent CDL binding site is present on the other side of the homodimeric LeuT. (B) PMF data for AAC–LeuT binding, as per Figure B. ΔΔGbind is −6 ± 3. (C) FEP data for AAC–LeuT binding, perturbing CDL to POPE whilst bound to LeuT (red) and whilst free in a POPE membrane (blue). ΔΔGbind is −9 ± 2. (D) 2D energy landscape for LeuT and CDL, as per Figure D. Two binding regions in red can be seen around the protein, reporting ΔΔGbind values as follow: S1 = −7 ± 3, S2 = −3 ± 2, S3 = −10 ± 6 and S4 = −3 ± 2 kJ mol–1. Two indiscriminate binding regions of ca. −3 kJ mol–1 are highlighted with dotted lines.

Cholesterol Interactions with a GPCR

The GPCR class of proteins constitutes the largest family of membrane proteins and account for 35% of all drug targets.[61] Many members are thought to be functionally modulated by cholesterol binding,[62] including the highly studied A2AR. This provides an example of protein–lipid interactions with an uncharged and relatively rigid lipid molecule. Cholesterol binding to A2AR has been previously identified structurally[63] and explored using molecular dynamics.[7,20,64] To probe the energetics of this process, we applied WTMetaD to human A2AR in a POPC membrane to produce a full 2D map of cholesterol binding on the extracellular leaflet around the receptor, using the protein and ROH bead of cholesterol as a CV. We identified three binding sites between TM 1–2, 1–7, and 6–7 (Figure A,B), with ΔΔGbind of −8 ± 3, −9 ± 3, and −5 ± 2 kJ mol–1. Of these, only the TM 6–7 cholesterol binding site has been captured in structural data for A2AR,[35,65,66] although the other two binding sites correspond to electron density identified as acyl tails in several A2AR structures.[35,65,66]

Figure 6

Calculating A2AR–cholesterol binding energetics (A) view of A2AR bound to cholesterol, in one of several possible binding sites. Colours as in Figure A. (B) WTMetaD 2D energy landscape for A2AR and cholesterol, for the extracellular leaflet only. Three binding regions in red can be seen around the protein, reporting ΔΔGbind values as follow: S1 = −8 ± 3, S2 = −9 ± 3 and S3 = −5 ± 2 kJ mol–1. (C) ABFE analysis the sites from panel B. Here, the bound cholesterol is fully decoupled from a POPC membrane whilst bound to A2AR or whilst free in the membrane. The cycle represented here is a simplified version of the cycle in Supporting Figure 3. The final ΔΔGbind values are −5 ± 1, −8 ± 1 and −2 ± 1 kJ mol–1. We then probed these sites using ABFE, using restraints and a thermodynamic cycle outlined in a recent study by Salari et al.[45] (see Supporting Methods and Supporting Figure 3). We obtained ΔΔGbind values of −5 ± 2, −8 ± 1, and −2 ± 1 kJ mol–1 for the three sites (Figure C), in reasonable agreement with the WTMetaD data. Importantly, we get an identical ranking of the sites between the techniques, with site 2 the highest energy and site 3 the lowest. Note that we were unable to probe these sites using PMF calculations, as the energies become swamped by background thermal fluctuations. We can compare our estimates of the strength of A2AR cholesterol interactions with those from other simulation studies of GPCRs. Genheden et al.[67] estimated free energies of interaction with b2AR and A2AR of the order of −10 to −15 kJ mol–1 from CG simulations. Lee and Lyman[7] estimated free energies for the A2AR of −3 to −5 kJ mol–1 from atomistic MD simulations. Thus, our estimates are in broad agreement with those from previous studies of the A2AR, both of which estimated free energies directly from cholesterol occupancies following extended (but possibly under sampled) equilibrium MD simulations.

Discussion

Integral membrane proteins are strongly affected by the lipid environment in which they reside (for a recent review, see ref (1)). Information on the structural basis of these interactions can be determined by X-ray crystallography, cryoelectron microscopy, and NMR. In contrast, determining the free energies of these interactions remains very challenging either experimentally[15,16,29] or computationally.[5,11,18,20,21] One of the main challenges facing computational estimation of free energies of protein–lipid interactions is presented by the slow timescale of relaxation of lipid molecules in a bilayer. This means that for free energy calculations, we need to be confident that we have adequately sampled those interactions, that is, the simulations have converged. The agreement between all three techniques for lipid binding energetics gives us confidence that this is indeed the case (Table and Figure ). To our surprise, the analyses here provide lower energies for CDL binding to AAC[21] and other proteins[11,18,68] than previously reported; this discrepancy most likely reflects a different handling of long-range electrostatic interactions. Nonetheless, when comparing like-for-like systems, as here, it is evident that all three techniques converge to a common value and provide a similar degree of accuracy.

Table 1

Summary of Binding Energies for the Systems Tested Herea

system	site	lipid	membrane	PMF	FEP/ABFE	WTMetaD
Kir2.2	3	PIP₂	POPC	–45 ± 2	–48 ± 2	–45 ± 6
AAC	1	CDL	POPC	–7 ± 2	–10 ± 4	–14 ± 3
	2			–8 ± 4	–11 ± 2	–13 ± 3
	3			–7 ± 4	–9 ± 4	–11 ± 6
LeuT	1	CDL	POPE	–6 ± 3	–9 ± 2	–7 ± 3
A2AR	1	chol	POPC		–5 ± 1	–8 ± 3
	2				–8 ± 1	–9 ± 3
	3				–2 ± 1	–5 ± 2

Reported are the measured ΔΔGbind values, with the errors calculated as defined in the text and figure legends. These data have been used to populate the chart in Figure .

Figure 7

Comparison of different techniques. Bar chart showing ΔΔGbind for the different protein–lipid systems described above. Shown are energies calculated with PMF (blue), FEP/ABFE (maroon) and WTMetaD (green). The error bars shown here are from the 200 rounds of bootstrap analysis for the PMF, standard deviations of 5 repeats for the FEP, or standard deviations of the energies for each site for the WTMetaD (green). Reported are the measured ΔΔGbind values, with the errors calculated as defined in the text and figure legends. These data have been used to populate the chart in Figure . In addition to accuracy, it is of interest to compare the computational efficiency of the three methods, especially if they are to be employed in an automated pipeline (see e.g. ref (31)) to characterize and compare protein–lipid interactions across a wide range of membrane protein structures. As can be seen from Table , the least computationally demanding technique is FEP/ABFE. This is because each alchemical pathway can be adequately described in 21 (PIP2), 29 (cholesterol) or 31 (CDL) windows, and 250 ns of simulation time per window is sufficient for good convergence (see Supporting Figure 2, 6 and 9). Running 5 independent repeats (resulting in ∼25–40 μs simulation time in total) permits statistical analysis of the data. In fact, the data suggest that this cost could be further reduced to 150 ns per window with only 3 repeats (∼9.5 μs in total; achievable in about 1−2 days with a mid to high range GPU), as this agrees well with the more extensively sampled data (e.g., −48 ± 2 kJ mol–1 vs −51 ± 1 kJ mol–1; Supporting Figure 13A). Note that these estimates do not account for the initial set of simulations that are required for the perturbation of each lipid species in the bulk membrane, as this benchmark may be used for subsequent calculations.

Table 2

Computational Cost of Each Technique as Applied Herea

technique	system applied to here	required simulation time (μs)
PMF	Kir2.2, AAC, LeuT	50–75
FEP	Kir2.2, AAC, LeuT	25–40
ABFE	A_2AR	35
WTMetaD	Kir2.2, AAC, LeuT, A_2AR	120–190

Reporting the simulation time used in each of the analysis measures here. Note that, as described in the Discussion, these values may overestimate the time required for convergence.

Reporting the simulation time used in each of the analysis measures here. Note that, as described in the Discussion, these values may overestimate the time required for convergence. PMFs are generally less cost-efficient than FEP, taking at least 50 μs to converge. This is largely because equilibration of a lipid as it diffuses within a bilayer is relatively slow, meaning that each window needs to be simulated for 1 μs (see Supporting Figures 1, 5 and 8). In addition, a 0.05 nm window spacing from the bound state to bulk membrane usually requires 60 or more windows to get from the bound state to bulk. It should be noted, however, that a less frequent spatial sampling of 0.1 nm intervals, while resulting in a lower statistical certainty across the PMF, produces a similar estimation of ΔΔGbind (for instance −45 ± 2 kJ mol–1 vs −48 ± 2 kJ mol–1; Supporting Figure 13B). In contrast to FEPs, however, PMFs offer a reaction coordinate in 1D space, which may be useful in certain cases. An example of this is for Kir2.2, where the PMF was able to detect the second, previously identified,[53] binding site (Figure B). Generally, WTMetaD is the most expensive technique, taking >100 μs to converge (see Supporting Figures 4, 7, 10 and 12), which is compounded by the requirement to read/write the Gaussian depositions to file, together taking ∼3× longer to sample the same timescales as a standard MD simulation. However, in instances where a binding site is known, the simulations could be scaled back to focus upon the single binding site, saving considerable resources in the process. In addition, WTMetaD allows the mapping of the full 2D energy landscape around the target protein. This proved to be especially powerful for the A2AR and LeuT systems, where multiple asymmetric binding sites were revealed. In addition to this, the WTMetaD is not restricted by needing to know the highest energy pose before analysis, which was of particular use for the study of A2AR here. For each system, the 2D landscapes were additionally compared to the PMF data through extraction of 1D coordinates from the data (see the Supporting Methods for details, and e.g. Supporting Figure 4F,G for data). This is potentially very powerful as it allows the construction of pathways from the bulk membrane to the binding site, providing a greater degree of insight into the dynamics of binding. At present, we have compared these methods for estimation of lipid interaction free energies using the Martini[22,23] force field. We are aware that the free energies estimated are therefore an approximation because of smoothing of the free energy landscape, an inevitable consequence of coarse-graining. In the future, comparison of these data with corresponding atomistic simulations will be necessary, although achieving convergence of equivalent atomistic simulations is currently extremely challenging for systems of the size and complexity described here. These analyses would likely benefit from the recently described use of a funnel potential for WTMetaD simulations,[69] allowing a much more focused CV. Finally, comparison of these methods to experimental analyses will be of particular importance. Currently, there have been only a few examples of experimentally determined protein–lipid binding affinities: these include using nMS,[15] SPR,[16] and FRET.[29] The majority of these approaches, however, involve both the protein and lipid being solubilized beforehand in detergent micelles. This will likely provide an inaccurate picture of the true energetics of lipid binding/unbinding in a lipid membrane, as the acyl tails will likely contribute substantially to the binding energies. Our work therefore highlights a pressing need for accurate measurement of lipid-binding affinities to be applied to a number of well characterized membrane protein systems. In summary, in this study we have shown that three distinct methods for CG free energy calculations are able to provide robust estimates of the strength and specificity of lipid-binding sites on membrane proteins. We envisage that these simulations can be readily performed, taking protein–lipid binding sites identified from structures, long equilibrium simulations (“fingerprinting”)[9,70] or WTMetaD, and enumerating energies for specific lipid association in a semiautomated manner. The characterization of lipid-binding sites on membrane proteins offers the prospect of discovery of potential druggable allosteric sites on a wide range of membrane proteins, including almost all potential target species within the current membrane protein structural proteome of ca. 4000 structures.[32]

63 in total

Review 1. Interaction of G protein coupled receptors and cholesterol.

Authors: Gerald Gimpl
Journal: Chem Phys Lipids Date: 2016-04-20 Impact factor: 3.329

2. Crystal structure of a bacterial homologue of Na+/Cl--dependent neurotransmitter transporters.

Authors: Atsuko Yamashita; Satinder K Singh; Toshimitsu Kawate; Yan Jin; Eric Gouaux
Journal: Nature Date: 2005-07-24 Impact factor: 49.962

3. Parameters for Martini sterols and hopanoids based on a virtual-site description.

Authors: M N Melo; H I Ingólfsson; S J Marrink
Journal: J Chem Phys Date: 2015-12-28 Impact factor: 3.488

4. The MARTINI Coarse-Grained Force Field: Extension to Proteins.

Authors: Luca Monticelli; Senthil K Kandasamy; Xavier Periole; Ronald G Larson; D Peter Tieleman; Siewert-Jan Marrink
Journal: J Chem Theory Comput Date: 2008-05 Impact factor: 6.006

5. Guidelines for the analysis of free energy calculations.

Authors: Pavel V Klimovich; Michael R Shirts; David L Mobley
Journal: J Comput Aided Mol Des Date: 2015-03-26 Impact factor: 3.686

6. A competitive inhibitor traps LeuT in an open-to-out conformation.

Authors: Satinder K Singh; Chayne L Piscitelli; Atsuko Yamashita; Eric Gouaux
Journal: Science Date: 2008-12-12 Impact factor: 47.728

7. Structural basis for allosteric regulation of GPCRs by sodium ions.

Authors: Wei Liu; Eugene Chun; Aaron A Thompson; Pavel Chubukov; Fei Xu; Vsevolod Katritch; Gye Won Han; Christopher B Roth; Laura H Heitman; Adriaan P IJzerman; Vadim Cherezov; Raymond C Stevens
Journal: Science Date: 2012-07-13 Impact factor: 47.728

Insights into Membrane Protein-Lipid Interactions from Free Energy Calculations.

Introduction

Methods

Equilibrium CG Simulations

Potential of Mean Force Calculations

Free Energy Perturbation

Absolute Binding Free Energy Calculations

Metadynamics

Results

Kir2.2–PIP2 Binding Interactions

CDL Interactions with Two Transport Proteins

Cholesterol Interactions with a GPCR

Discussion

Review 1. Interaction of G protein coupled receptors and cholesterol.

2. Crystal structure of a bacterial homologue of Na+/Cl--dependent neurotransmitter transporters.

3. Parameters for Martini sterols and hopanoids based on a virtual-site description.

4. The MARTINI Coarse-Grained Force Field: Extension to Proteins.

5. Guidelines for the analysis of free energy calculations.

6. A competitive inhibitor traps LeuT in an open-to-out conformation.

7. Structural basis for allosteric regulation of GPCRs by sodium ions.

8. Structural basis of PIP2 activation of the classical inward rectifier K+ channel Kir2.2.

9. The role of interfacial lipids in stabilizing membrane protein oligomers.

10. The MemProtMD database: a resource for membrane-embedded protein structures and their lipid interactions.

1. Entropy in the Molecular Recognition of Membrane Protein-Lipid Interactions.

2. Molecular mechanism of CD44 homodimerization modulated by palmitoylation and membrane environments.

3. Transition between conformational states of the TREK-1 K2P channel promoted by interaction with PIP₂.

4. Probing effects of the SARS-CoV-2 E protein on membrane curvature and intracellular calcium.

Review 5. Monolysocardiolipin (MLCL) interactions with mitochondrial membrane proteins.

Review 6. How Do Molecular Dynamics Data Complement Static Structural Data of GPCRs.

7. Lipid Interactions of a Ciliary Membrane TRP Channel: Simulation and Structural Studies of Polycystin-2.

8. Identification and Characterization of Specific Protein-Lipid Interactions Using Molecular Simulation.

9. Structure and lipid dynamics in the maintenance of lipid asymmetry inner membrane complex of A. baumannii.

10. Defining how multiple lipid species interact with inward rectifier potassium (Kir2) channels.