Literature DB >> 24250277

Efficient Computation of Small-Molecule Configurational Binding Entropy and Free Energy Changes by Ensemble Enumeration.

Nathaniel W Silver¹, Bracken M King, Madhavi N L Nalam, Hong Cao, Akbar Ali, G S Kiran Kumar Reddy, Tariq M Rana, Celia A Schiffer, Bruce Tidor.

Abstract

Here we present a novel, end-point method using the dead-end-elimination and A* algorithms to efficiently and accurately calculate the change in free energy, enthalpy, and configurational entropy of binding for ligand-receptor association reactions. We apply the new approach to the binding of a series of human immunodeficiency virus (HIV-1) protease inhibitors to examine the effect ensemble reranking has on relative accuracy as well as to evaluate the role of the absolute and relative ligand configurational entropy losses upon binding in affinity differences for structurally related inhibitors. Our results suggest that most thermodynamic parameters can be estimated using only a small fraction of the full configurational space, and we see significant improvement in relative accuracy when using an ensemble versus single-conformer approach to ligand ranking. We also find that using approximate metrics based on the single-conformation enthalpy differences between the global minimum energy configuration in the bound as well as unbound states also correlates well with experiment. Using a novel, additive entropy expansion based on conditional mutual information, we also analyze the source of ligand configurational entropy loss upon binding in terms of both uncoupled per degree of freedom losses as well as changes in coupling between inhibitor degrees of freedom. We estimate entropic free energy losses of approximately +24 kcal/mol, 12 kcal/mol of which stems from loss of translational and rotational entropy. Coupling effects contribute only a small fraction to the overall entropy change (1-2 kcal/mol) but suggest differences in how inhibitor dihedral angles couple to each other in the bound versus unbound states. The importance of accounting for flexibility in drug optimization and design is also discussed.

Entities: Chemical Disease Gene Species

Year: 2013 PMID： 24250277 PMCID： PMC3827837 DOI： 10.1021/ct400383v

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

One of the goals of rational, structure-based drug design is to understand the thermodynamics of small-molecule-receptor binding in order to design effective, high-affinity therapeutics. Lead compound development is expensive and requires a great deal of experimental effort to explore the large combinatorial space of chemical functionality. To expedite the process, computational methods are often used to optimize the search and examine the binding thermodynamics of lead compounds. It is difficult, however, to compute accurately both the enthalpy (ΔHbind) and entropy changes (ΔSbind) upon binding and to rank compounds based on a true free energy of binding (ΔG = ΔH – TΔS). Most approaches based on physical force-fields include enthalpic binding contributions and perhaps solvent entropy contributions, but both are estimated from single bound and unbound conformations. The neglect of configurational entropy changes for binding partners is a clear omission in common applications. Nonetheless, such calculations are valuable as they can provide a more accurate and detailed breakdown of the thermodynamic changes. Experimental methods such as isothermal titration calorimetry (ITC)[1] can only report on ensemble averaged binding enthalpies and entropies, and they cannot determine the source of the change (e.g., ligand, receptor, or solvent). Computational approaches, in principle, account for the contributions to the free energy of binding from ligand, receptor, and solvent degrees of freedom. Standard molecular mechanical treatments of ligand binding separate the enthalpic changes into separate terms for internal, van der Waals, Coulombic, and solvation interactions.[2,3] Similarly, binding entropies are often decomposed into conformational entropy terms for the ligand, receptor, and solvent, but compared to the enthalpic terms, they are much more time-consuming to calculate. To accurately compute an ensemble free energy or entropy change upon binding, one must fully explore and integrate over the conformational space of the solvent, the ligand, its receptor, and the complex,[4] which is a formidable task even for small systems. Given this difficulty, the configurational entropy change is often assumed to be the same for different ligands in a series or is approximated with an empirical term that assumes a constant change in entropy per frozen rotatable bond.[5−9] However, these approaches often fail to account for both ligand and receptor topology and lack theoretical support. Chang et al. calculated the change in configurational entropy of the clinically approved inhibitor amprenavir binding to HIV-1 protease (as −TΔS) and found it to oppose binding by ∼25 kcal/mol.[10] It is still unknown, however, whether the configurational binding entropy change is similar for different, related protease inhibitors. Modern computational methods that are used to compute free energies of binding, and their component enthalpies and entropies generally fall into one of two categories, perturbation and end-point methods. The former includes free energy perturbation (FEP)[11,12] and thermodynamic integration (TI),[13−15] which often rely on molecular dynamics (MD) or Monte Carlo (MC) simulations to perturb a system from one state to another (e.g., wild type to mutant, one ligand to another, or even unbound to bound). The total free energy change can then be computed as a function of the perturbation coordinate. These methods provide a well-defined thermodynamic path between each state but can be slow to converge, as there are significant challenges.[16] End-point methods determine free energy changes by calculating absolute free energies of the final and initial states of the system and taking the difference.[4] These absolute free energies can also be found via MD or MC simulations and have been successfully used to study ligand binding in a variety of molecular systems.[17,18] Recent, alternative formulations make use of the single- or predominant-state approximation, in which a single or multiple low-energy structures are identified, and the local configurational space about each initial structure is sampled.[10,19,20] Implicit in these methods is the assumption that high-energy conformations contribute negligibly to the ensemble entropy and enthalpy averages and that the potential energy surface is well described using a single or set of local minima. Further approximations are often made to analytically integrate over local minima using the harmonic or quasiharmonic approximation.[17,21,22] The former assumes the potential energy surface about the initial structure can be modeled using a multidimensional harmonic potential, while the latter also assumes that conformational fluctuations are governed by a multivariate Gaussian probability distribution. While these methods are efficient, they are not guaranteed to search all of phase space and exaggerate favorable free energy changes.[23] This study seeks to evaluate a number of these assumptions using an ensemble, configurational free energy of binding to accurately rank computationally designed human immunodeficiency virus (HIV-1) protease inhibitors. Previous studies of the examined inhibitors have shown that using a single, low-energy configuration to evaluate each inhibitor can successfully predict binding geometries but often fails to correctly rank inhibitors with binding free energies within 2–3 kcal/mol of each other.[24] We sought to improve upon this static, predominantly enthalpic treatment by accounting for ensemble effects of the ligand both in the bound and the unbound state. To this end, we developed a novel, deterministic, end-point method for computing the free energy of binding of ligand–receptor complexes that uniformly searches conformational space and explicitly accounts for both enthalpic and configurational entropic effects. This approach fundamentally differs from the aforementioned methods in that it does not sample from a Boltzmann distribution of configurations to collect an average but instead uses uniform, rotameric enumeration[25] of ligand torsional degrees of freedom to map out and explicitly integrate over the potential energy landscape. While normally an intractable problem, searching through this high-dimensional space is enabled through the use of the dead-end elimination (DEE)[26−29] and the A* algorithms.[30,31] DEE is used to prune high-energy rotamers, which excludes low probability configurations from the search space, while A* is used to rapidly enumerate the accessible configurational states of the structure. Both of these algorithms are global optimizers and, when used in conjunction, are guaranteed to both find the global minimum energy configuration (GMEC) and eliminate all those configurations with energies greater than a user supplied energy cutoff above the GME. Using this method, we were able to generate an energy-ranked, gapless list of low-energy ligand configurations in a computationally tractable amount of time, and evaluate the bound and unbound state partition functions to compute the free energy, enthalpy, and configurational entropy of binding in the context of a rigid receptor. The configurational entropy changes of all the protease inhibitors explored in this study were further analyzed using a novel, additive entropy expansion. By decomposing the entropy into a series of marginal entropy and mutual information (coupling) terms, we were able to extract the entropic contribution of each degree of freedom as well as the contributions from entropic coupling between pairs, triplets, and higher-order combinations. Similar entropy expansions have been described in the literature to examine the configurational entropies of liquids,[32,33] spin frustrated systems,[34] as well as biological systems.[35,36] However, given the aforementioned difficulty associated with effectively sampling the potential energy landscape of complex biological systems, previous applications to such systems have been limited to approximating the full entropy of the system. These methods assume that only a low-order subset of the entropy terms contribute significantly, as they are unable to accurately evaluate the remaining high-dimensional terms. Additionally, while these approximations have been reasonably successful at describing the larger distribution, the individual terms are often difficult to physically interpret as they contain overlapping entropic contributions that are successively added and removed as the level of approximation improves.[35] The expansion used in this study is similar to that used in the MIST expansion,[37,38] as well as those presented by Killian et al. and Matsuda.[34,35] It is based on the generalized Kirkwood superposition approximation,[35,39,40] which approximates a high-order probability distribution using a series of successively lower-order distributions. It differs, however, in that each entropy term in the expansion is conditioned on the remaining degrees of freedom of the system, which aids in the physical interpretation of these terms by separating their contributions into nonoverlapping pieces. Each term describes either the conditional marginal entropy of each degree of freedom or the conditional mutual information (coupling) between sets of degrees of freedom. By appropriately conditioning each term, these conditional couplings are measures of the coupling between degrees of freedom that are not mediated by another degree of freedom of the system, which avoids the layered, compensating additions and subtractions of the same physical effect present in other methods. Applying this novel, conditional mutual information expansion (CMIE) and DEE/A* enumeration method to a series of protease inhibitors, we have been able to interpret configurational variation both within a given ensemble as well as between equilibrium ensembles in terms of specific thermodynamic changes. We can accurately evaluate the contribution of each marginal and coupling term to the full entropy as well as provide some insight into how physical coupling of degrees of freedom affects configurational entropy of binding. Our results analyze the efficacy of our approach by exploring thermodynamic convergence, comparisons with experimental measurements of binding affinity, and the role configurational entropy plays in binding. We find that our computed free energies correlate strongly with experiment and that most thermodynamic averages are well-defined by only a small portion of configurational space. Compared to previous computational studies of the inhibitors examined here,[24] the enhanced sampling methods employed in this study provide better single-conformation and ensemble estimates that correlate with experimental free energy measurements. We also observe that each inhibitor loses a significant amount of configurational entropy upon binding, and that relative entropy differences among related inhibitors are significant (1–3 kcal/mol). Analysis using the CMIE entropy expansion shows that the majority of both the absolute and relative entropic losses can be traced to changes in marginal conditional entropy and that changes in entropic coupling play a more subtle role in the thermodynamics of inhibitor binding.

Methods

Binding Theory

The theoretical framework for binding thermodynamics has been presented in recent literature;[4,41] here, we summarize the relevant portions to place our work in context. The standard free energy of binding for a ligand (L) and receptor (R) in solution can be evaluated using the standard chemical potential for each species,The standard chemical potential for a dilute solution of ligand is defined as[42]Here, VN,L is the volume of the system containing N solvent molecules and one ligand molecule. C° is the standard state concentration, taken as 1 M, which is equivalent to 1000 NA m–3, where is Avogadro’s constant. QN,L and QN are the partition functions for systems containing N solvent molecules and one ligand molecule, and only N solvent molecules, respectively. The last term, P°VL, corresponds to the work associated with moving the ligand from the gas phase to a solvated state at constant pressure, where VL is the volume of a single ligand and P° is the standard state pressure. This last term will be very small except at very high pressure, and in the present analysis, it is assumed that binding occurs at 1 atm where this pressure–volume term will be negligible. The ratio of partition functions is expanded as follows,where β = 1/(kBT), σL is the symmetry number of the ligand, qS/L and pS/L refer to the set of all position and momentum degrees of freedom of the solvent and ligand, respectively, and MS and ML define the total number of solvent and ligand atoms, respectively. U is the internal energy, T is the absolute temperature, h is Planck’s constant, and kB is Boltzmann’s constant. This expression can be simplified by analytically integrating over the momentum portion of phase space (from −∞ to +∞) for each atom i of both the solvent and ligand (pS, pL) and canceling the resulting expressions for the solvent momentum.Further simplification is possible by defining a potential of mean force W(qL) to make use of an implicit solvent treatment and avoid explicit integration over solvent degrees of freedom. This is done by defining the interaction potential between the ligand and the solvent for a fixed configuration of the system and averaging the Boltzmann factor of this potential over all solvent degrees of freedom.Substituting eq 6 into eq 4 yields a reduced expression in which the energy of the ligand no longer depends upon the exact configuration of the solvent, The integral over the coordinates of the ligand (qL) can also be simplified by defining an internal reference frame that does not depend on absolute external coordinates (i.e., translational and rotational coordinates) of the ligand. This coordinate frame is defined using a set of three bonded atoms in the ligand to specify the six external degrees of freedom and a set of 3N–6 bond length (rL), bond angle (θL), and torsional angle (ϕL) (BAT) coordinates to recursively specify the position of each subsequent atom relative to the position of the first three atoms. This coordinate change allows the integral over ligand configurational space to be separated into external and internal pieces, where the potential of the solvated ligand (U(qL)) is now independent of the external degrees of freedom. Analytically integrating over these external degrees of freedom of the ligand yields a constant factor of 8π2VN,L.[4] The remaining integral over internal degrees of freedom can be computed numerically (often approximately and on a coarse grid due to size), and doing so in a BAT coordinate system often results in improved accuracy, as BAT sampling corresponds to natural motions of the molecule and a smoother exploration of the potential energy surface compared to a Cartesian coordinate system.[43] After simplification, the resulting expression for μsol,° iswhere ZL is a configurational integral over the solvated ligand, internal degrees of freedom, and JL = rL2 sin θL is the Jacobian weight for sampling in a BAT space.[44,45] Note that in this study only torsional degrees of freedom of the ligand were explored. Bond lengths and bond angles were held fixed at their equilibrium values, as it has been suggested that these degrees of freedom experience only small changes in configurational freedom upon binding and contribute negligibly to the free energy change.[10] Receptor degrees of freedom were held fixed due to issues of computational tractability and the large number of receptor degrees of freedom. The derivations for the standard chemical potential of the receptor and complex are similar and will not be repeated here. It should be noted, however, that in the complex the six external (i.e., translational and rotational) degrees of freedom of the bound ligand become internal degrees of freedom of the complex and integration over these new internal degrees of freedom is limited to only those conformations in which the ligand is actually bound and contained entirely within the receptor’s active site cavity. Combining eq 1 with eq 8 for the ligand, receptor, and complex, the following expression for the standard free energy change is obtainedNote that this expression is only dependent upon the configurational degrees of freedom of the complex, unbound receptor, and unbound ligand; all factors resulting from integration over the momentum portion of phase space exactly cancel when taking the difference between the bound and unbound states. Once the partition functions in the bound and unbound states have been found, the enthalpy change (excluding negligible pressure–volume terms) can be found by calculating the appropriate averages over solute configurational spacewhere ⟨⟩ defines the configurational ensemble average over ligand and receptor degrees of freedom, respectively. The configurational entropy change upon binding can be found through the canonical equation[46]which results in the following expressionThe final three terms that appear in the above expression for the entropy change result from the introduction of a potential of mean force (eq 6) to implicitly deal with solvent degrees of freedom. This formulation partitions the entropy change into additive solute and conditional solvent components in a mathematically and thermodynamically rigorous fashion.[41,47] The first two terms in eq 13 correspond to the configurational entropy change of the solute, while the remaining terms correspond to the change in solvent entropy conditioned on the configurational state of the solute, averaged over all solute configurational degrees of freedom.In the present study, all reported free energy differences include enthalpic and entropic contributions from both solute and solvent degrees of freedom, while all reported entropic free energies include only contributions from the solute degrees of freedom.

Conditional Mutual Information Expansion

The configurational entropy of each ligand was decomposed into individual, per degree of freedom entropy and higher-order coupling terms using a conditional mutual information expansion (CMIE). Similar to the mutual information expansion presented by Matsuda[34] and Killian et al.,[35] this expansion divides the full entropy into a sum of sequentially higher-order mutual information terms. However, rather than partition the total entropy into a set of overlapping entropic contributions that are added and subtracted with successive terms, we partition the space into a set of mutually exclusive terms, each of which captures the entropy content of either a single degree of freedom or the coupling between a group of degrees of freedom. This is done by adding up the mutual information of all possible combinations of degrees of freedom, given that the distributions of the remaining variables are known. This can be expressed aswhere N is the total number of degrees of freedom of the system, {x} is the complement of {x}, and I({x}|{x}) is the mutual information of a set of variables {x} conditioned on the complementary set {x} or simply the conditional entropy when |{x}| = 1. The semicolon used here denotes mutual information between degrees of freedom (e.g., I(x;x;x) denotes the mutual information shared between the three probability distributions x, x, and x). This decomposition follows from the set measure-theoretic definition of multivariate mutual information,[48,49] where each conditional information term corresponds to a nonoverlapping subset of an information diagram. As an example, consider a system with three degrees of freedom {x,y,z}. The CMIE for this system iswhere the first three terms are of first order, the second three terms are of second order, and the last term is of third order. As illustrated in Figure 1, the first-order terms define the conditional entropy due solely to each individual degree of freedom; this corresponds to the average entropy due to a single degree of freedom, given that the remaining degrees of freedom are known. That is, first-order measures define the entropy due to each degree of freedom that is not mediated by any other degrees of freedom through coupling. Similarly, the second-order terms define the conditional mutual information between each pair of degrees of freedom, which correspond to measures of the coupling present between pairs of variables that is not mediated by higher-order coupling. The third-order term defines the higher-order coupling present among all variables. It is important to note that while this expansion partitions the entropy into nonoverlapping pieces, only first- and second-order terms are guaranteed to be positive.[47] As such, higher-order mutual information terms can either increase or decrease the total entropy of the system. Additionally, as with any entropy expansion, all of these terms are fundamentally dependent upon the choice of the reference frame and thus represent a potentially nonunique but still useful interpretation.[4,41]

Figure 1

Three body conditional mutual information expansion. The entropy of a three body system with degrees of freedom x, y, and z corresponds to the union of all three circles. This total entropy is decomposed according to eq 16 into marginal entropies (blue, green, and red areas), pairwise coupling entropies (purple, orange, and brown areas), and a single three-body or third-order entropy (yellow area).

Ensemble Enumeration and Partition Function Determination

The bound and unbound state configurational integrals (eq 9) for five HIV-1 protease inhibitors (Figure 2) were evaluated via a three-step, rotamer based, enumerative configurational search. All internal torsions as well as the six ligand–receptor intermolecular BAT degrees of freedom were rotamerized using uniform step sizes to exhaustively explore configurational space at different levels of discretization. All examined ligands were comprised of a common chemical scaffold with potentially variable functional groups at five positions (R1-R5). The first step of the search involved generating separate discretized libraries of scaffold positions and orientations as well as rotamer libraries of all possible functional group configurations relative to the scaffold. The second step employed the guaranteed DEE/A* search algorithms to explore all possible combinations of the rotamer libraries found in the first step and generate an energy-ordered list of all possible low-energy configurations using a pairwise additive energy function (termed low-resolution). The third phase of the calculation used a tiered energy function strategy to re-evaluate the energies of the collected low-energy configurations using a high-resolution energy function and numerically integrate over the explored configurational space.

Figure 2

Selected HIV-1 protease inhibitor structures. These five inhibitors were originally designed by Altman et al. to test the substrate envelope hypothesis.[24] They are derived from the darunavir/amprenavir scaffold and all exhibit nanomolar binding affinity. The ensemble of low-energy scaffold conformations was generated using an enumerative, Metropolis Monte Carlo (MC) search.[50] The goal of this step was not to collect a Boltzmann ensemble via sampling, as is traditionally done using MC, but to mine for an ensemble of low-energy scaffold configurations whose relative probabilities will be explicitly computed after exploring the remaining configurational space. For all simulations, a thermodynamic temperature of 298 K was used, as was a continuous move set of all torsional rotations, excluding methyl and amide bond rotations, and overall translations and rotations in the bound state. The upper bounds on step sizes for overall translations and rotations were set to 0.5 Å and 30°, and individual torsional moves were capped at 15° and 180° in the bound and unbound state, respectively, with an equal weight applied to all moves. Ten independent simulations of 50 000 steps each were performed for each ligand in both the bound and unbound states, and the scaffold pose and internal, torsional degrees of freedom of all collected configurations were snapped to a uniform rotamer grid. In the bound state, the six atoms used to define the scaffold pose reference frame in BAT coordinates were receptor atoms Cβ, Cγ, and Oδ2 of residue D25 (chain B) and atoms C1, C2, and C3 of the scaffold (Figure 2). The six ligand–receptor BAT degrees of freedom were bond length Oδ2–C1, pseudotorsions Cβ–Cγ–Oδ2–C1, Cγ–Oδ2–C1–C2, and Oδ2–C1–C2–C3, and pseudobond angles Cγ–Oδ2–C1 and Oδ2–C1–C2. The bound-state grid for the scaffold pose and internal, torsional degrees of freedom was defined using a resolution of 0.1 Å and 10° for all bond lengths and bond angles and torsions. The unbound state grid was defined only for internal, torsional degrees of freedom using a resolution of 20°. All simulations were performed using the CHARMM computer program[2,51] with the CHARMm22 force field[52] and a distance-dependent dielectric constant of 4. The remaining internal torsional degrees of freedom (i.e., functional group torsions) were explored using an in-house implementation of the DEE and A* algorithms.[24] Using uniformly sampled, complete scaffold and functional group rotamer libraries, these algorithms are guaranteed to find the GMEC as well as all other near-optimal structures up to a user supplied energy cutoff. Using the list of collected energies produced by this guaranteed search, Boltzmann factors (e–β) were calculated for each configuration i and used to compute a low-resolution estimate of the configurational integral (eq 9) by numerical quadrature. Multiple functional group rotamer libraries were used in this study, including those that contained all possible combinations of internal torsional degrees of freedom sampled uniformly every 120°, 60°, 30°, and 15° in the bound state, and every 120°, 60°, and 30° in the unbound state. For all high-throughput energy evaluations, a pairwise decomposable energy function was used that included all pairwise van der Waals and Coulombic, intra- and intermolecular interactions, computed with the CHARMm22 parameter set[52] but ignored internal energy differences for computational efficiency. The energy cutoff used in this low-resolution estimate was always ≤10 kcal/mol, as this provided enough coverage of the potential energy landscape to guarantee partition function convergence (vide infra). The final step of the search included the energetic re-evaluation of the collected ensemble using a higher resolution (more detailed) energy function to account for solvation effects and to obtain a more accurate measure of the free energy change upon binding. The improved energy function included all pairwise van der Waals interactions, continuum electrostatic solvation energies collected from a converged linearized Poisson–Boltzmann calculation using the DelPhi computer program,[53,54] as well as solvent accessible surface area energies to model the hydrophobic effect.[55] Solvation energies were calculated using an internal dielectric of 4 and a solvent dielectric of 80. A grid resolution of 129 × 129 × 129 with focusing boundary conditions[56] was used, along with a Stern layer of 2.0 Å and an ionic strength of 0.145 M. The high-resolution (HRes) free energy was computed by reassessing the 50 000 lowest energy configurations using the HRes energy function and re-evaluating the configurational integral. This truncated integral was then corrected using an approximate term that accounted for the contributions of the remaining low-energy configurational ensemble. This correction corresponds to integrating over the HRes energy levels of the unaccounted portion of the ensemble using a probabilistic formalism similar to those hierarchical evaluation methods used in molecular design.[57] Formally, this method breaks up the ligand low-energy configurational space into two regions: one described in terms of explicit configurational states evaluated using the HRes energy function, and the other in terms of distributions of HRes energy levels (EHR) inferred from the low resolution (LRes) energy level (ELR) distribution:Here, g(EHR) is the degeneracy of the HRes energy levels, and A and B define complementary regions of configurational space that together cover the entire space. Note that this approximation of the HRes energy space was made for computational efficiency, as it is currently computationally intractable to explicitly re-evaluate a sufficiently large number of the millions of configurations collected from the LRes DEE/A* search necessary to ensure convergence. As outlined above, the first term of eq 17 corresponds to the HRes partition function defined by some fraction of the total number of low-energy configurations, and it is calculated by explicitly re-evaluating the energies of the top 50 000 configurations and integrating over these states. The second term estimates the contributions made by the remaining, higher-energy members of the full ensemble to the HRes partition function and can be viewed as a correction to the first term. It was computed using an approximate distribution of HRes energy levels, inferred from the known LRes energy-level distribution, to estimate the degeneracy (i.e., number density of configurations) at a given HRes energy level. Specifically, the collected LRes energy space (minus the top 50 000 structures) was divided into 0.1 kcal/mol bins, and a randomly selected set of 1000 configurations was re-evaluated in each bin. Each 1000-configuration sample was used to approximate the distribution of HRes energy levels observed in each LRes bin i. The resulting set of normalized high-resolution energy-level probability distributions, P(EHR), was fit to either single- or double-skew normal distributions,[58] as the HRes energy distributions bore a strong resemblance to these functional forms, to determine the approximate shape of the distribution. Each was then weighted by the number of configurations in that particular bin ρ, which yielded the HRes energy level degeneracy in each bin, g(EHR).The total contribution of all bins to the high-resolution partition function was then calculated by integrating over the HRes energy levels in each bin via numerical quadrature and then summing over each LRes bin

Structure Preparation

The receptor structure used in this study was a darunavir bound X-ray crystal structure obtained from the Protein Data Bank (PDB; Accession code 1T3R),[59,60] prepared using methods and structural modifications described by Altman et al.[24] Partial atomic charges for each inhibitor were determined by fitting to the electrostatic potential of an optimized ground-state structure using the restrained fitting method of Bayly et al.[61] Geometry optimizations as well as electrostatic potential calculations were performed with the GAUSSIAN 03 computer program[62] using the Restricted Hartree–Fock method with the 3-21G and 6-31G* basis sets, respectively.

Experimental Determination of Inhibitor Dissociation Constants

HIV-1 protease inhibitor activities were determined by a fluorescence resonance energy transfer (FRET) method. Protease substrate 1 [Arg-Glu(EDANS)-Ser-Gln-Asn-Tyr-Pro-Ile-Val-Gln-Lys(DABCYL)-Arg] was labeled with the energy transfer donor, 5-((2-Aminoethyl)amino)naphthalene-1-sulfonic acid (EDANS), and acceptor, 4-((4-(dimethylamino)phenyl)azo)benzoic acid (DABCYL), dyes at its two ends to perform FRET.[63] Fluorescence measurements were carried out on Victor2V plate reader (PerkinElmer). Excitation and emission filters were 340 and 490 nm, respectively. All inhibitors were dissolved in dimethyl sulfoxide (DMSO) and each of them was diluted to 12 appropriate concentrations. Protease (2 μL) and inhibitor or DMSO (2 μL) were added into a 96-well assay plate containing 76 μL of HIV-1 protease substrate buffer. All pipettings were performed by a liquid handling system (Tecan Genesis). 80 μL of protease substrate (2 μM) was quickly added into the well with an injector to initiate the cleavage reaction. Each reaction was recorded for 5 min. Inhibitor dissociation constants (K) were obtained by nonlinear regression fitting (GraFit 5) of the plot of the initial velocity as a function of inhibitor concentrations based on the Morrison equation.[64,65] The reported K values were an average of at least 3 individual measurements.

Results and Discussion

Using the enumerative, configurational search method described herein, we explored the configurational space of five HIV-1 protease inhibitors. We assessed the precision and accuracy of our method by examining its thermodynamic convergence properties as a function of the granularity of the rotamer libraries used and the size of the ensemble collected as well as by comparing the converged, calculated inhibitor binding free energies to experimental measurements. We subsequently analyzed the source of the changes in the entropic free energy upon binding in terms of both marginal entropy changes of individual ligand degrees of freedom as well as coupling entropy changes between multiple ligand degrees of freedom.

Rotamer Grid Resolution and Thermodynamic Convergence

Conformational space for inhibitor degrees of freedom was explored at multiple resolutions. Ultimately, the grid resolution used to compute all thermodynamic parameters was selected based on the rate of exploration of scaffold degrees of freedom and the numerical convergence observed in the computed free energy. As the configurational search was performed in multiple steps, we examined the convergence of each step separately. In step one, the external and internal scaffold degrees of freedom were explored via an enumerative MC simulation in which collected configurations were snapped onto a uniform rotamer grid of user defined resolution. The coverage of low-energy space was measured at multiple grid resolutions to find the highest resolution grid possible while maximizing coverage (Figure 3). Simulation convergence was quantified via the number of unique, grid-snapped configurations found per MC step, as this growth rate should approach zero as the simulation length increases and more configurational space is explored. A 0.1 Å and 10° grid was used in the bound state and a 20° grid in the unbound, as the final growth rate at these resolutions was 0.03 unique configurations per step or less for all bound and unbound inhibitors. Increasing the grid resolution to 5° and 10° increased the number of unique configurations that required collection to a computationally intractable number of scaffold positions in order to ensure the configurational space was adequately sampled. Note that in all the computed inhibitor ensembles, the unbound state required a coarser grid in order to obtain a comparable rate of convergence using a similar number of overall configurations. This indicates that there is a larger volume of configurational space accessible in the unbound compared to the bound state. Without the receptor present to constrain the torsional motions of the inhibitor, the scaffold is able to adopt a much wider variety of conformations without paying large energetic penalties. It is important to note, however, that while the unbound state has a larger accessible volume of configurations compared to the bound state, the accessible configurational space is smoother, and capturing these features should require less resolution.

Figure 3

KB-98 enumerative Monte Carlo scaffold grid resolution convergence. (A, B) The unique scaffold configuration and pose growth rates measured as a function of grid resolution and Monte Carlo simulation length in the bound and unbound states. (C, D) Final unique configuration growth rates for bound and unbound state. The dotted line indicates the 0.03 configurations per step acceptance cutoff used. In the second configurational search step, the remaining functional group degrees of freedom were enumerated using a user specified rotamer step size, yielding a low-resolution measure of free energy. Convergence of this absolute free energy was measured using the rate of change of the free energy as a function of the grid resolution and offset calculations (Figure 4). Comparing sampling resolutions of 15° versus 30° in the bound state and 30° versus 60° in the unbound state, we observed free energy changes of only ≈3% and ≈1%, respectively. Offset calculations in which the grid was translated by half of the grid resolution also yielded small offsets of less than ±1 kcal/mol. Both of these measures of free energy uncertainty suggest that converged measures of the free energy of binding can be found with moderate functional group grid resolutions. These data also suggest that the unbound state has a more degenerate low-energy configurational space than the bound state with wider, less rugged potential energy wells (i.e., vast regions of unbound configurational space are well described by coarser sampling with limited grid error).

Figure 4

KB-98 functional group grid resolution convergence. (A, B) Low-resolution free energy convergence of KB-98 in the bound and unbound state as a function of the rotamer step size used when searching the functional group degrees of freedom. Error bars indicate the standard deviation computed from the two offset calculations performed at each grid resolution. All calculations use a starting geometry collected from the lowest energy configuration found during the bound state scaffold search. The offset calculations begin from this configuration with each degree of freedom offset by half the step size so as to escape from the initial, low energy well. (C, D) Final rate of free energy change for each examined inhibitor in the bound and unbound state.

Ensemble Size and Thermodynamic Convergence

The Boltzmann distributions computed for each inhibitor ensemble were truncated at a range of energy cutoffs to explore the effect of collected ensemble size on free energy convergence. These cutoffs define the ensemble of all configurations with energies within a particular energy range above global minimum energy (Figure 5). Our results suggest that only a very small portion of configurational space is necessary to achieve a very high level of convergence for the free energy so long as it covers the low-energy conformations in the space. The lowest 1 kcal/mol of the ensemble brings the computed free energy within 2% (≈1.5 kcal/mol) of the converged free energy value in the bound state and within 5% (≈2.5 kcal/mol) in the unbound state. At the highest degree of rotamerization, this corresponds to less than 200 and 700 configurations in the bound and unbound states, respectively. This behavior was observed in all configurational ensembles, and all thermodynamic averages were converged to within less than 1 kcal/mol when the full set of configurations was included. This rapid convergence suggests that the most relevant portions of configurational space are low-energy wells and that the average thermodynamic properties of these systems are well described by low-energy configurational ensembles, supportive of the predominant state hypothesis.[10]

Figure 5

Configurational free energy convergence. (A–C) The convergence of the free energy, enthalpy, and entropy of KB-98 in the bound state, using a 10° scaffold and 15° functional group grid, as a function of ensemble size, measured in kcal/mol above the global minimum energy. The red portion of curve shows the contribution to the average from the top 50 000 configurations (computed by explicit re-evaluation), and the blue portion shows the contribution from the remaining millions of configurations (computed via high resolution energy level inference). (D) The final rate of convergence of each thermodynamic parameter for each inhibitor in the bound state measured as a function of change in ensemble cutoff (kcal/mol per kcal/mol of the ensemble). (E–H) Measures of convergence for the unbound state using a 20° scaffold and 60° functional group grid. One should note, however, that given a fixed ensemble size, not all averages reach the same level of convergence. We observe that ensemble enthalpies and entropies show slower rates of convergence compared to free energies, consistent with their derivative relationship to the partition function.[66] Using a 15° grid in the bound state and a 60° grid in the unbound state, the ensemble free energies exhibited an average final rate of convergence of 0.001 and 0.1 kcal/mol per kcal/mol of the ensemble, respectively. By comparison, both the enthalpy and entropy averages showed average final rates of 0.02/0.2 and 0.02/0.3 (bound/unbound). These data suggest that larger fractions of configurational space are required to accurately gauge enthalpic or entropic contributions to binding compared to the full free energy and that the sampling error associated with these contributions partially cancels when computing the free energy. In particular, accurate estimation of entropic changes requires elaboration of low-probability (i.e., high-energy) regions of the distribution, which becomes increasingly difficult to explore as the degeneracy of configurational space increases. Nonetheless, the required low-probability regions identified here fall within the lowest 10 kcal/mol of the ensemble. Finally, in all cases, a significantly smaller portion of the less degenerate bound-state space was required to obtain comparable levels of convergence compared to the unbound state.

Experimental versus Calculated Inhibitor Affinities

One of the primary motivations in this study was the lack of strong correlation between the originally computed design energies for these inhibitors and experiment (Figure 6A). We sought to improve upon our original relative affinity predictions and examine which methodological enhancements brought about the most significant correlative improvement of calculated affinity with experiment. Examining the r2 correlation coefficients for calculated vs experimental affinity, the previous static enthalpy metric shows almost no correlation (0.05). Note that the single conformations used to evaluate binding in this study were found using a coarser rotamer library, a constrained scaffold set, and the rigid binding approximation. Using our more exhaustive search method, we first examined the effect of searching both the bound and unbound states in tandem for low-energy structures with a higher resolution rotamer set (Figure 6B). By selecting the lowest energy structure in both the bound and unbound ensembles, we computed a static enthalpy of binding that shows marked correlation with experiment (r2 ≈ 0.7) and significant reduction in the variance weighted sum of squared error (χ2 ≈ 47 vs 230). While the relative affinity ordering is still not correct, we have effectively separated the cluster of drugs that were previously all predicted to bind with the same affinity. This large improvement in correlation likely stems from two factors: searching conformational space more finely and appropriately accounting for the significant conformational change that each inhibitor undergoes upon binding with independent bound and unbound state searches. Heavy atom, least-squares alignments of the best unbound to bound conformations show root-mean-square deviations of greater than 3.4 Å for each of the five inhibitors. Additionally, when bound inside the protease, each inhibitor takes on an extended shape to fit within the active site, and when unbound, each inhibitor undergoes a structural collapse in order to maximize solvation as well as intramolecular interactions. We find that the solvent accessible surface area of the best inhibitor configurations increased on average by 80 ± 20 Å2 upon binding.

Figure 6

Correlation between calculated and experimental binding affinity. Effect of approximation on the correlation of the calculated inhibitor affinities with the experimental binding free energies. (A) Correlation between previously calculated static enthalpy metric (single conformation energy assuming rigid binding) and experiment. (B) Correlation between updated, static enthalpy metric (single conformation energy difference between bound and unbound states) and experiment. (C) Correlation between ensemble enthalpy and experiment. (D) Correlation between updated static enthalpy with ensemble entropy change and experiment. (E) Correlation between ensemble free energy change and experiment. We next examined the correlative effect of ensemble averaging by computing the average enthalpy change upon binding for each of the five inhibitors (Figure 6C). This further improved correlation with experiment, yielding an r2 value of 0.85, an additional 2-fold reduction in χ2 error, and the correct relative ranking of each inhibitor. Incorporating the ensemble average into the calculation of the enthalpy change had a significant effect on both the bound and unbound state and moved the average enthalpy up by 2.3 ± 0.2 kcal/mol and 2.8 ± 0.7 kcal/mol in the bound and unbound state, respectively. Examining the net difference, we find that KB-98 and AD-86 experienced the largest change relative to their static enthalpy evaluation (≈1 kcal/mol). This is due to the fact that for these two inhibitors, the average unbound state enthalpy was pushed up by 1 kcal/mol more than in the bound state, which suggests that the unbound low-energy state space of these two compounds is larger compared to the remaining inhibitors. These changes result in residual improvement for all points except AD-93, with KB-98, KB-92, and AD-94 showing the largest improvement. We also examined the correlative effect of ensemble averaging by correcting the new static enthalpy estimates with our computed entropy penalties (Figure 6D). Interestingly, this also introduced a clear separation between the high-affinity inhibitors (KB-98 and AD-93) and the less effective ones (AD-94, KB-92, and AD-86), and significantly improved correlation with experiment, giving a correlation coefficient of 0.84. Comparing this correlation with that of the new static enthalpy change, the observed improvement is primarily the result of bringing KB-98 and AD-94 closer to the best fit line. KB-98 shifted because its computed entropic free energy loss was much smaller relative to the other computed entropic penalties, while AD-94 shifted because it experienced the largest entropic free energy loss. Note that attempts to similarly correct the new static enthalpy with a constant, entropic penalty per rotatable bond fail to significantly improve correlation. We explored possible constant corrections up to 2 kcal/mol/bond, and the most effective (0.5 kcal/mol/bond) only yielded an r2 value of 0.74. The final effect we explored was accounting for both the configurational entropy change of the ligand (ΔSbind) as well as the average enthalpy change (ΔHbind) upon binding, which together correspond to the full configurational free energy of binding, ΔGbind (Figure 6E). Surprisingly, while separately including either ensemble measure significantly improves correlation, together there is only a slight improvement over previously examined metrics (r2 ≈ 0.87). Each ensemble measure captures similar information such that together they have only a small, coupled effect. In total, our results show that finding a better estimate of the global minimum energy conformation (GMEC) in both the bound and unbound states can substantially improve relative inhibitor rankings but that further improvement necessitates an ensemble treatment. We find that including information about the shape of the minimum energy potential well and surrounding wells, in addition to its relative position, is required to resolve more subtle differences between inhibitors. We note that while the slopes of these lines of best fit are larger than 1, our primary design objective was to improve the relative ranking of these inhibitors, which does not depend on their absolute correlation with experimental free energies. However, we suspect that this discrepancy may be due to the assumption that the protein remains rigid for all bound states. A similar quantitative picture emerges when examining the calculated enthalpy–entropy breakdown of these inhibitors (Table 1). We observed that ΔH is favorable and nearly twice the size of the unfavorable, configurational entropy loss. Given this large relative difference and the functional form of the Boltzmann distribution, the importance of the GMEC in ranking can be rationalized, as this distribution is strongly biased toward and peaked about low-energy configurations. It is interesting to note that the configurational entropy changes of the ligand (−TΔS) are quite large (+22–26 kcal/mol), and they are very similar to previous estimates of the configurational entropy loss of chemically similar HIV-1 protease inhibitors calculated using different methodologies.[10] In contrast to cheaper, empirical measures of configurational entropy loss, these entropies show only marginal correlation with the number of rotatable bonds explored (r2 ≈ 0.5). In particular, the entropy losses of AD-94 and AD-86 deviate significantly from the trend exhibited by KB-98, AD-93, and KB-92, which show a consistent loss of ≈1 kcal/mol per rotatable bond. Both KB-92 and AD-94 have 15 rotatable bonds, yet AD-94 loses nearly 1.5 kcal/mol more in entropic free energy upon binding. Structurally, these two inhibitors are also very similar and differ only in the identity and flexibility of their R2 and R3 functional groups. In AD-94, R2 is more flexible than R3, while the reverse is true for KB-92. In the case of AD-86, we find that it lost nearly 2 kcal/mol less entropic free energy than would be expected assuming a constant entropic penalty. It is more flexible than KB-92 at the R1 and R2 positions and AD-94 at the R1 and R3 positions, yet it lost much less entropy than expected given its flexibility. These deviations highlight the fact that both the number of rotatable bonds as well as their location influence configurational entropy losses.

Table 1

Calculated Thermodynamic Changes (kcal/mol) upon Binding for the Five Tested HIV-1 Protease Inhibitors

	KB-98	AD-93	KB-92	AD-94	AD-86
ΔG	–24.3	–26.0	–19.8	–19.5	–20.1
ΔH	–46.9	–49.7	–44.5	–45.6	–45.1
–TΔS	22.6	23.7	24.8	26.1	24.96
no. rot. bonds	13	14	15	15	17

Analysis of Marginal Configurational Entropy Changes

In order to understand the subtle entropic differences between these inhibitors and identify the major contributions to the absolute entropy loss, we decomposed inhibitor entropy changes using a conditional mutual information expansion (CMIE). The decomposition separated the entropy change into additive contributions that quantified the marginal conditional entropy losses due to each individual degree of freedom as well as all higher-order changes in coupling entropy (pairs, triplets, etc.). Unlike similar information theoretic entropy decompositions such as MIE[35,36] or MIST,[37,38] this decomposition was not a means of approximating the true entropy of the bound or unbound state or any high-order coupling entropy terms from lower-order ones. It is simply an alternative expansion whose individual terms provide detail about direct coupling contributions of all subsets of degrees of freedom. We note that if one were not able to enumerate the configurational ensemble as was done here, the CMIE would not be a useful approximation method, as the conditioning of each term (first-order, second-order, etc.) requires knowledge of the full probability distribution of each degree of freedom. We find that the vast majority (95%) of the total ligand conformational entropy change was due to first-order, uncoupled degrees of freedom, and coupling of degrees of freedom accounted for the remaining 5% of contributions. As can be seen in Figure 7, all inhibitors experience large losses in external entropy (≈12.3 kcal/mol), due to constraints limiting motion corresponding to what had been the unbound translational and rotational degrees of freedom. For comparison, assuming a standard state concentration of 1 M, corresponding to a volume of 13280π2 Å3 per molecule, the unbound ligand state has an external, standard state entropic free energy (−TS°) of approximately −7 kcal/mol, with −4.4 kcal/mol from translational entropy and −2.6 kcal/mol from rotational entropy. Note that while this implies that the bound state entropy of external ligand degrees of freedom, which are now internal to the complex, is negative, it comprises only a fraction of the total entropy, which need not be positive. Additionally, these numbers do not account for the entropic contribution of the momentum portion of phase space, which exactly cancels when taking the difference upon binding (see eq 10).[67] This estimate of the external entropy loss compares very well with a variety of alternative formulations. In particular, Chang et al. estimate an external entropy loss of 12.3 kcal/mol for a chemically similar inhibitor, amprenavir, binding to HIV-1 protease using the second generation mining minima algorithm, and a loss of 11.6 kcal/mol using the quasiharmonic approximation.[10] Using molecular dynamics in conjunction with the quasiharmonic approximation as well as Schlitter’s entropy formula, Carlsson and Aqvist estimate the combined translational and rotational entropy loss of benzene binding to rigid T4 lysozyme to be ≈11 kcal/mol.[68]

Figure 7

First order conditional entropy losses. The reported entropy loss for each group is defined as the sum of the first-order conditional entropy losses for each degree of freedom contained within the structure group. The scaffold contains 5 torsions, R1 contains up to 5 (KB-98/AD-93, 3; AD-94/KB-92, 4; AD-86, 5), R2 contains up to 2 (KB-98/KB-92, 1; AD-93/AD-94/AD-86, 2), R3 contains up to 2 (KB-98/AD-93/AD-94, 1; KB-92/AD-86, 2), and both R4 and R5 contain 1 torsional degree of freedom. The remaining internal, inhibitor degrees of freedom contributed net losses of ≈9–11 kcal/mol (Table 2; −TΔSmarginal), with a coupling independent, average loss of 0.7 ± 0.3 kcal/(mol·torsion) of configurational entropic free energy. This average loss is in agreement with previously reported values of 0.4 to 0.9 kcal/(mol·torsion), which were estimated from the experimentally measured thermodynamics of fusion of small hydrocarbons.[69] Page and Jencks reported losses of 1–1.4 kcal/(mol·torsion), which were estimated from the entropy loss measured upon hydrocarbon cyclization.[70] Note that the former estimate is derived from the entropy loss as a molecule is captured inside a crystal lattice, while the latter is a measure of the entropic cost of freezing out a degree of freedom into a constrained ring structure. The difference between our estimate and that of Page and Jencks could be due to the fact that individual torsional angles are only partially frozen upon ligand binding. As we can see from examining the marginal probability distributions of individual degrees of freedom, many torsional angles retain a considerable amount of conformational freedom upon binding (Figure 8).

Table 2

Calculated Enthalpy and Entropy Component Changes (kcal/mol) upon Binding for the Five Tested HIV-1 Protease Inhibitors as Found from the Top 50 000 Lowest Energy Structures

	KB-98	AD-93	KB-92	AD-94	AD-86
ΔH_elec	–11.66	–11.35	–10.15	–9.90	–7.87
ΔH_vdw	–64.74	–67.84	–60.17	–61.93	–66.63
ΔH_sasa	–6.23	–6.45	–6.23	–6.22	–6.29
ΔH_solvation	35.26	35.54	33.23	33.50	33.84
–TΔS_ext	12.25	12.37	12.45	12.30	12.03
–TΔS_marginal	8.94	9.58	10.79	10.35	10.62
–TΔS_coupling	0.19	0.86	1.28	0.96	0.18

Figure 8

Selected marginal distributions of KB-98 (A) Marginal distributions for selected torsional degrees of freedom in the bound and unbound states of KB-98. (B) Structure of KB-98 marked with degrees of freedom 1–5. Further exploration of these distributions showed unique differences in how each distribution changes upon binding. We observed two major trends among all the inhibitors, which we illustrate using KB-98 as an example. First, moving down either column in Figure 8, we see the distributions becoming increasingly spread out, indicating that each degree of freedom is increasingly more mobile. Small-angle rotations about scaffold torsions (degrees of freedom one and two) swing large lever arms, which result in large displacements. Comparatively, rotations around terminal dihedral angles (e.g., degree of freedom five) swing small lever arms and, as a result, tolerate much larger changes. Second, loss of configurational freedom upon binding for these inhibitors was due both to the disappearance of populated wells and to well contraction. Comparing the unbound- and bound-state distributions for degrees of freedom one, two, and four, we observed a collapse from a multimodal distribution to a unimodal one. Additionally, making the same comparison for degrees of freedom one, three, and four, the corresponding wells contracted from a width of 80°, 120°, and 90° in the unbound state to 10°, 60°, and 60° in the bound. The marginal distribution changes for scaffold torsions were particularly stark, with unimodal collapse and contraction upon binding for almost all core degrees of freedom across all the examined inhibitors. The window of occupied configurational states for these motions is always less than 20° in the bound state, which implies that accurate sampling of these highly constrained degrees of freedom requires very small step sizes. Comparatively, the most free motions correspond to hydroxyl rotations, as they have the shortest associated lever arm. All such groups exhibit broad, nearly uniform distributions in the unbound state, which become more biased (widths of ≈180°) and centered around an ideal hydrogen bond position upon binding. Examining these marginal distributions for the variable functional groups R1, R2, and R3 across all of the studied inhibitors and their marginal entropy losses, we noted a spatial dependence of the marginal entropy loss, with differential losses per degree of freedom depending on where the structural group interacts within the active site. Table 3 shows the average entropy loss upon binding per degree of freedom for the external, scaffold, and functional groups. Averaging over all five inhibitors, we find that for this scaffold the R3 group, which binds in the P2′ pocket, experienced the largest entropic loss per degree of freedom (1.2 ± 0.5 kcal/(mol·torsion)). By comparison, the R1, R2, and R4 groups, which sit in the P2, P1′, and P1 pockets, lost 0.6, 0.8, and 0.7 kcal/(mol·torsion), respectively. This suggests that rigid functional groups might be especially important at this site because groups that bind to the P2′ pocket experience greater losses in entropy per flexible torsion upon binding than other sites. Interestingly, when examining the experimentally measured affinities of the larger MIT-2 inhibitor library, we see a similar trend in which the binding free energy became more unfavorable as the functional groups became more flexible at position R3.[24]

Table 3

Entropy Loss Per Rotatable Bond (kcal/mol)

	–TΔS (kcal/mol)
external	2.05 ± 0.03
scaffold	0.65 ± 0.05
R1	0.6 ± 0.1
R2	0.8 ± 0.3
R3	1.2 ± 0.5
R4	0.7 ± 0.1
R5	0.49 ± 0.06
avg. internal	0.7 ± 0.3

Analysis of Coupled Configurational Entropy Changes

The remaining contributions to the change in configurational entropy upon binding were involved in higher-order coupling terms. Individually, these terms were often much smaller than the first-order terms, but their net effect was significant, accounting for 1–2 kcal/mol of entropy in the bound and unbound states as well as being informative of gross intramolecular coupling behavior. By examining the cumulative differences between the net entropy of the bound or unbound states and the entropy as a function of the number of higher-order terms accounted for (i.e., the error), we observed that the source of the coupling entropy differs between the bound and unbound states (Figure 9A–E). In the bound state, the largest source of coupling appears in second-order terms (as the error drops precipitously upon the addition of second-order coupling terms), but that the higher-order effects only become negligible after the addition of ninth- or tenth-order terms, at which point the cumulative error reaches 0. In the unbound state, we again see that the largest source of coupling arises from second-order terms, but note that higher-order effects become negligible by the addition of fourth- or fifth-order terms. The size of the relative drop in the error between the bound and unbound states upon addition of second-order terms suggests that there was more second-order coupling in the unbound versus bound state, whose contribution translates to a loss of entropic free energy upon binding. The relative importance of higher-order coupling terms in the bound state suggests that there was more significant higher-order coupling in the bound versus unbound state, whose contribution translates to a net gain in entropic free energy upon binding. Averaging over all inhibitors, we find that the net change in entropic free energy upon binding was unfavorable for all coupling interactions involving five or fewer degrees of freedom (−TΔS ≈ + 1 kcal/mol total summed across all terms) and generally favorable for all higher-order coupling interactions (−TΔS ≈ – 0.3 kcal/mol total; Figure 9F). This suggests that, upon binding, the receptor restricted not only the independent motions of individual inhibitor rotatable bonds but many of the pairwise, three-, four-, and five-body coupling interactions present in the unbound state as well. Moreover, this suggests that, in the bound state, higher-order coupling between inhibitor torsional degrees of freedom arose as inhibitors adopted specific conformations to adapt to the constrained receptor binding site.

Figure 9

Cumulative CMIE summation errors. (A–E) Error associated with the cumulative summation of all conditional mutual information terms as a function of term order for each inhibitor in the bound and unbound states. (F) Average, net change in entropic free energy as a function of term order. We examined the large number of individual coupling interactions in both the bound and unbound ensembles and found that the majority of specific coupling terms each contributed less than 0.05 kcal/mol, and that the largest individual coupling terms never contributed more than ≈0.3 kcal/mol. Consistent with our analysis of the overall changes in coupling, we saw that most of the large magnitude couplings terms appeared in the unbound state and stem from second-order coupling between scaffold degrees of freedom. In particular, we observed strong coupling between adjacent scaffold torsions or between torsions one bond apart. Interestingly, these unique pairs of torsions can modulate the van der Waals packing of the functional groups with each other, and Figure 10A shows the two-dimensional probability distribution for two such coupled torsions in KB-98. These two dihedral angles can manipulate the position of R3 relative to the rest of the inhibitor and cooperatively interact to maximize intramolecular van der Waals interactions between the R3 ring and either the scaffold backbone or the R4 phenyl ring (Figure 10 moving from left to right, top to bottom). These data support the notion that unbound state coupling arose as a result of cooperative motions that maintained intramolecular hydrophobic and ring stacking interactions. Note, however, that none of these couplings individually contributed more than 0.15 kcal/mol to the overall entropy change. By comparison, we observed far fewer, large coupling terms in the bound state, and the most significant (≈0.3 kcal/mol) arose between the two dihedral angles surrounding the amide moiety in the R1 functional group. These two torsions seem to be coupled in order to modulate the position of the distal R1 hydroxyl group to maintain its position relative to its hydrogen bonding partners. As in the unbound case, these two torsions compensated for each other, although here they affected intermolecular interactions with the receptor. The higher-order, bound state coupling terms are predicted to couple external degrees of freedom to core, scaffold torsions but individually rarely contribute more than 0.1 kcal/mol.

Figure 10

KB-98 unbound state coupling. (A) Pairwise probability distribution of two coupled scaffold dihedral angles in the unbound ensemble of KB-98. The inset rectangle and arrows highlight the sequence of configurations shown in part C. (B) Unbound KB-98 labeled with dihedrals 1 and 2. Arrows show direction of rotation moving from left to right (dihedral 2) and top to bottom (dihedral 1). (C) Sequence of most probable unbound configurations where dihedrals 1 and 2 have the values indicated by the inset rectangle in part A. Moving from right to left corresponds to clockwise rotation of dihedral 2 with a dihedral 1 fixed, and moving from top to bottom corresponds to counter-clockwise rotation of dihedral 1 with dihedral 2 fixed. Configurations are colored based upon their relative probabilities, with red indicating high probability and the yellow arrows indicate the normal vector of the aromatic rings present in functional groups R3 and R4.

Conclusions

We have presented here a novel method to enumerate and study changes in the potential energy landscape of inhibitors upon binding. Using this enumerative, rotamer based approach, we obtained converged binding free energies, enthalpies, and entropies for flexible HIV protease inhibitors that accurately ranked inhibitor affinities relative to one another. We found that using a fine grain configurational search to refine the global minimum energy conformation in the bound and unbound states to rank inhibitors correlated well with experiment but that ensemble effects were critical for more accurate resolution of affinity differences. Breaking the free energy change into components, we observed that average enthalpies and entropies of binding were highly sensitive to the shapes of the global minimum energy well and surrounding wells but that the sampling errors associated with these sensitivities partially canceled when computing the free energy. Additionally, we found that the predominant state assumption appeared valid for these high-affinity inhibitors in both the bound and unbound states. The majority of configurational space contributed only marginally to the ensemble free energy, and converged free energy, enthalpy, and entropy values were obtained when truncating the configurational integral to include only those configurations within 10 kcal/mol of the GMEC. Compared to the free energy, however, computing accurate entropy and enthalpy changes required larger low-energy ensembles that accounted for lower probability regions of phase space. Analysis of the low-energy thermodynamic ensembles collected in this study revealed both how the shape of this landscape changed upon binding and how these differences translated into changes in the thermodynamic properties of the system. By decomposing the entropy change using an additive, conditional mutual information expansion, we saw that the large computed differences in configurational entropy upon binding originated primarily from losses in external and uncoupled internal entropy, with average losses consistent with previously reported experimental and computational estimates. From a potential energy landscape perspective, these changes arose from both well contraction and well disappearance. Changes in coupling entropy played a more subtle, less pronounced role, and while their net effect was significant and critical to the rank ordering of inhibitor affinity, the entropy present in individual coupled motions was small. We found that most significant coupling interactions were of low (second or third) order and most often appeared between neighboring dihedral angles that could cooperatively modulate intermolecular or intramolecular interactions in the bound and unbound state, respectively. Examining the change in coupling between the bound and unbound state, we observed a net loss of low-order coupling interactions present in the unbound state between core degrees of freedom, and a net gain of high-order coupling interactions that appeared only in the bound state. It is interesting to note that this entropy decomposition could be used to inform the optimization of future inhibitors, as it provides a way to estimate the spacial dependence of entropy loss for a given scaffold and determine the ideal positions to include either flexible or rigid chemical groups. Overall, these results suggest that inhibitor flexibility plays an important role in binding but that the thermodynamic properties of these high affinity inhibitors are fundamentally determined by a small fraction of the full configurational ensemble. Low-energy configurations dominate the ensemble averages and coupling between inhibitor degrees of freedom has only a small but potentially important effect. It is interesting to note that all of these results and conclusions arise without approximating the geometry of the potential energy landscape or inordinate sampling times. Our method is structured around the use the DEE/A* algorithm, which sorts configurations by their energies and explicitly computes their contribution to the Boltzmann distribution. As a result, the free energy is computed from the bottom up without having to approximate or account explicitly for landscape and well geometry. This ensures that the enthalpic and entropic contributions of all spatially distinct, low-energy minima are included according to their level of import and constructs a convergent, minimal configurational ensemble. By comparison, perturbative methods (FEP, TI) are slow to converge, and alternative end-point approaches focus on computing free energies on a well-by-well basis using Boltzmann sampling to explore configurational space and map out low-energy wells. The often used harmonic and quasiharmonic approximations assume the shape of potential energy well(s) can be accurately modeled as a collection of harmonic oscillators, which provides an analytical expression for the free energy contribution of each well. The former is the basis for normal-mode analysis, which has been widely used to estimate entropy changes in biological systems,[71] and the latter is used in the mining minima approach to similar effect.[20] The strong agreement found between ours and more approximate methods speaks to the accuracy of the predominant state and harmonic assumptions made for this system, but this may not be true for all systems. Finally, it is also interesting to note that all the inhibitors examined in this study were originally developed to test the substrate envelope hypothesis and were designed to bind inside the substrate envelope. Four of these inhibitors (KB-98, AD-93, AD-94, and AD-86) were experimentally shown to exhibit relatively flat binding profiles to a variety of HIV protease mutants.[24,72] Considering just the top 50 000 configurations in each ensemble, we find that the vast majority of configurations in each of the respective ensembles also fit inside the substrate envelope, suggesting that the envelope hypothesis may be applicable in a more dynamic context. For an inhibitor to be insensitive to mutations in its target, the low-energy ensemble of ligand configurations must fit inside the substrate envelope or substrate envelope ensemble. The methods outlined here offer a flexible framework in which to study ensemble binding effects, and while the current study only explored ligand configurational freedom, receptor flexibility can be incorporated into this rotamer, DEE/A* based search scheme, given enough computational power. Nonetheless, there are clear limitations to the study presented here, which only considers flexibility in the ligand, binding to a rigid receptor. Proteins have significant numbers of degrees of freedom with local and global motions that will be affected, perhaps differentially, by ligand binding. Moreover, if ligand degrees of freedom couple effectively with receptor ones, then the ligand configurational entropy losses computed here will be overestimates. It has been shown that upon ligand binding the change in protein configurational entropy can be affected both positively and negatively depending upon the specific molecular system.[73] In wild-type HIV protease, ligand binding likely causes a decrease in the configurational freedom of many receptor degrees of freedom, as flap mobility is markedly decreased upon ligand binding and the side chains in direct contact with these tight binding ligands likely experience a reduction in configurational freedom due to enthalpy–entropy compensation;[74,75] nonetheless, it is unclear how strongly backbone motions and/or residues not in direct contact with these ligand are affected. Despite this limitation, in this study, we were primarily concerned with thermodynamic differences between structurally similar ligands, and our ability to correctly rank them. As such, while the changes in the protein configurational entropy will certainly affect the absolute, total change in entropy upon binding, it may have a much smaller effect between related ligands. Extending this methodology to account for receptor degrees of freedom will help address how important this effect may be. We also note that the lack of treatment of internal energies in this study may differentially affect inhibitors upon binding. They were not accounted for at the high-resolution re-evaluation as they were not originally calculated in the low-resolution energy screen used here or in the previous study from which the inhibitors used here were designed.[24] Attempts to introduce their effects at the high-resolution re-evaluation resulted in very poor correlation between the low-resolution and high-resolution energy functions and lack of partition function convergence using the tiered approach applied here (data not shown). We expect that appropriately accounting for the internal energies at both stages of evaluation will result in low/high-resolution energy function correlation comparable to that observed here and further improve the correlation of calculated inhibitors with experiment, both on a relative and absolute scale. Finally, we acknowledge that while we account for the presence of highly conserved ordered water molecules by their explicit inclusion in the bound complex and the electrostatic effects of bulk solvent with our implicit solvation calculations, we likely do not capture the binding thermodynamic effects of semiordered water networks, as such water molecules are neither directly engaged with the complex (and thus not conserved in crystal structures) nor part of the bulk solvent. Water clusters indirectly connected to the ligand or receptor have been observed within the active site in crystal structures of substrate and inhibitor structures,[60,76,77] but their specific role in substrate or inhibitor binding is unclear. It is possible that displacement of these water molecules by conformations in the bound state ligand ensemble is thermodynamically unfavorable, further restricting the configurational space of bound ligands, but further study is needed. Given the structural similarity of these inhibitors and their common mode of binding, however, we suspect even if water clusters do further reduce the configurational entropy of each bound ligand, the differential effect between inhibitors is likely small.

48 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. On the calculation of absolute macromolecular binding free energies.

Authors: Hengbin Luo; Kim Sharp
Journal: Proc Natl Acad Sci U S A Date: 2002-07-29 Impact factor: 11.205

3. Free energy, entropy, and induced fit in host-guest recognition: calculations with the second-generation mining minima algorithm.

Authors: Chia-En Chang; Michael K Gilson
Journal: J Am Chem Soc Date: 2004-10-13 Impact factor: 15.419

4. Calculations of solute and solvent entropies from molecular dynamics simulations.

Authors: Jens Carlsson; Johan Aqvist
Journal: Phys Chem Chem Phys Date: 2006-09-08 Impact factor: 3.676

5. Random walk in orthogonal space to achieve efficient free-energy simulation of complex systems.

Authors: Lianqing Zheng; Mengen Chen; Wei Yang
Journal: Proc Natl Acad Sci U S A Date: 2008-12-15 Impact factor: 11.205

6. The contribution of vibrational entropy to molecular association. The dimerization of insulin.

Authors: B Tidor; M Karplus
Journal: J Mol Biol Date: 1994-05-06 Impact factor: 5.469

7. Empirical scale of side-chain conformational entropy in protein folding.

Authors: S D Pickett; M J Sternberg
Journal: J Mol Biol Date: 1993-06-05 Impact factor: 5.469

8. Evaluation of methods for estimating the dissociation constant of tight binding enzyme inhibitors.

Authors: W R Greco; M T Hakala
Journal: J Biol Chem Date: 1979-12-10 Impact factor: 5.157

Review 9. Theory of free energy and entropy in noncovalent binding.

Authors: Huan-Xiang Zhou; Michael K Gilson
Journal: Chem Rev Date: 2009-09 Impact factor: 60.622

10. Thermodynamic and Differential Entropy under a Change of Variables.

Authors: Vladimir Hnizdo; Michael K Gilson
Journal: Entropy (Basel) Date: 2010-03-16 Impact factor: 2.524

11 in total

1. BWM*: A Novel, Provable, Ensemble-based Dynamic Programming Algorithm for Sparse Approximations of Computational Protein Design.

Authors: Jonathan D Jou; Swati Jain; Ivelin S Georgiev; Bruce R Donald
Journal: J Comput Biol Date: 2016-01-08 Impact factor: 1.479

2. Binding Thermodynamics and Kinetics Calculations Using Chemical Host and Guest: A Comprehensive Picture of Molecular Recognition.

Authors: Zhiye Tang; Chia-En A Chang
Journal: J Chem Theory Comput Date: 2017-12-14 Impact factor: 6.006

3. Computational Analysis of Energy Landscapes Reveals Dynamic Features That Contribute to Binding of Inhibitors to CFTR-Associated Ligand.

Authors: Graham T Holt; Jonathan D Jou; Nicholas P Gill; Anna U Lowegard; Jeffrey W Martin; Dean R Madden; Bruce R Donald
Journal: J Phys Chem B Date: 2019-11-27 Impact factor: 2.991

4. BBK* (Branch and Bound Over K*): A Provable and Efficient Ensemble-Based Protein Design Algorithm to Optimize Stability and Binding Affinity Over Large Sequence Spaces.

Authors: Adegoke A Ojewole; Jonathan D Jou; Vance G Fowler; Bruce R Donald
Journal: J Comput Biol Date: 2018-03-13 Impact factor: 1.479

5. Minimization-Aware Recursive K:* A Novel, Provable Algorithm that Accelerates Ensemble-Based Protein Design and Provably Approximates the Energy Landscape.

Authors: Jonathan D Jou; Graham T Holt; Anna U Lowegard; Bruce R Donald
Journal: J Comput Biol Date: 2019-12-06 Impact factor: 1.479

Review 6. Algorithms for protein design.

Authors: Pablo Gainza; Hunter M Nisonoff; Bruce R Donald
Journal: Curr Opin Struct Biol Date: 2016-04-14 Impact factor: 6.809

7. Effects of Hinge-region Natural Polymorphisms on Human Immunodeficiency Virus-Type 1 Protease Structure, Dynamics, and Drug Pressure Evolution.

Authors: Zhanglong Liu; Xi Huang; Lingna Hu; Linh Pham; Katye M Poole; Yan Tang; Brian P Mahon; Wenxing Tang; Kunhua Li; Nathan E Goldfarb; Ben M Dunn; Robert McKenna; Gail E Fanucci
Journal: J Biol Chem Date: 2016-08-30 Impact factor: 5.157

8. Fast gap-free enumeration of conformations and sequences for protein design.

Authors: Kyle E Roberts; Pablo Gainza; Mark A Hallen; Bruce R Donald
Journal: Proteins Date: 2015-08-24

Review 9. On the energy components governing molecular recognition in the framework of continuum approaches.

Authors: Lin Li; Lin Wang; Emil Alexov
Journal: Front Mol Biosci Date: 2015-03-06

10. Structural and energetic analysis of 2-aminobenzimidazole inhibitors in complex with the hepatitis C virus IRES RNA using molecular dynamics simulations.

Authors: Niel M Henriksen; Hamed S Hayatshahi; Darrell R Davis; Thomas E Cheatham
Journal: J Chem Inf Model Date: 2014-06-03 Impact factor: 4.956