Literature DB >> 31894691

Combined Linear Interaction Energy and Alchemical Solvation Free-Energy Approach for Protein-Binding Affinity Computation.

Eko Aditya Rifai¹, Valerio Ferrario², Jürgen Pleiss², Daan P Geerke¹.

Abstract

Calculating free energies of binding (ΔGbind) between ligands and their target protein is of major interest to drug discovery and safety, yet it is still associated with several challenges and difficulties. Linear interaction energy (LIE) is an efficient in silico method for ΔGbind computation. LIE models can be trained and used to directly calculate binding affinities from interaction energies involving ligands in the bound and unbound states only, and LIE can be combined with statistical weighting to calculate ΔGbind for flexible proteins that may bind their ligands in multiple orientations. Here, we investigate if LIE predictions can be effectively improved by explicitly including the entropy of (de)solvation into our free-energy calculations. For that purpose, we combine LIE calculations for the protein-ligand-bound state with explicit free-energy perturbation to rigorously compute the unbound ligand's solvation free energy. We show that for 28 Cytochrome P450 2A6 (CYP2A6) ligands, coupling LIE with alchemical solvation free-energy calculation helps to improve obtained correlation between computed and reference (experimental) binding data.

Entities: Chemical Disease Gene Species

Mesh：

Substances：

Year: 2020 PMID： 31894691 PMCID： PMC7017367 DOI： 10.1021/acs.jctc.9b00890

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

Protein–ligand binding can lead to (enhanced) activity or inhibition of a protein,[1−4] and the binding affinity of a ligand to its target or off-target can be quantitatively expressed in terms of the corresponding binding free energy (ΔGbind).[5] Therefore, free-energy calculations have played a prominent role from the hit-identification to lead-optimization stages of drug design and discovery[6,7] and in the screening of inhibitory potentials of molecules that can interact specifically with pharmacologically or toxicologically relevant proteins.[8−10] Due to the increased availability of biomolecular off-targets and computational resources, protein-structure-based computational techniques increasingly become attractive for the pharmaceutical industry for the prediction of ΔGbind values.[11−14] In silico approaches to calculate protein–ligand ΔGbind values range from combined docking and scoring[15] to molecular-simulation-based end-point[16,17] and alchemical free-energy methods.[18,19] Selection of the most suitable method depends on the number of compounds involved, the studied system, and the preferred levels of accuracy and efficiency.[20,21] Docking and scoring functions usually treat protein structure rigidly and are typically used for binding pose prediction and to (qualitatively) distinguish binders from nonbinders within large databases, thus filtering them into smaller sets of compounds for further processing and/or experimental validation.[22] On the other hand, rigorous alchemical methods such as free-energy perturbation (FEP)[18] and thermodynamic integration (TI)[19] can lead to accurate free-energy estimates when sufficient sampling of the relevant conformational states is obtained. This is typically compute-intensive as it requires extensive molecular dynamics (MD) simulations of the system including MD runs of nonphysical states,[23] making these methods less suitable for use in (semi)high-throughput settings. Alternatively, end-point methods such as linear interaction energy (LIE)[17] and molecular mechanics combined with Poisson–Boltzmann or generalized Born and surface area continuum solvation (MM/PBSA or MM/GBSA)[16] lie between alchemical and docking/scoring-based approaches and offer a trade-off between computational efficiency and accuracy, by including protein and ligand conformational sampling but only performing MD simulation of the end states for protein–ligand (un)binding.[24] A plausible application of end-point methods is to reduce compounds in a large set of potential binders to a subset of promising molecules.[20] An advantage of LIE is that it allows for a straightforward incorporation of the contribution of multiple and/or different binding modes of ligands in its computation of (direct or “absolute”) binding free energies to the protein of interest.[25] This is especially relevant when quantifying affinities of binding to target receptors or off-target enzymes with large active-site conformational flexibility, including, e.g., pharmaceutically relevant proteins such as the families of Cytochrome P450s (CYPs) and kinases. In such cases, it can be especially difficult to compute binding free energies from docking/scoring or alchemical calculations, because different protein and/or ligand binding conformations may contribute to the binding free energy. As an attractive alternative, we have in recent years extended and automated a statistical-weighting-based approach to efficiently compute ΔGbind by combining results from (short) MD simulations starting from different relevant parts of protein–ligand conformations. For that purpose, MD simulations are run per training or query compound free in solvent (free) and bound to the protein of interest in N starting configurations comprising different binding orientations and/or protein conformations (bound). From every individual MD simulation, average van der Waals (vdw) and electrostatic (ele) interaction energies of the ligand (lig) with its surrounding (surr) are used to obtain the calculated binding free energy ΔGbind,calc, by weighting results from the N individual simulations of the protein–ligand complex in a Boltzmann-like manner[25,26]withandThe weights W of the individual simulations i in eq are calculated as[26]in which kB is Boltzmann’s constant and T is the temperature of the simulated system. Both α and β are parameterized based on experimentally curated values of ΔGbind, and this needs to be done in an iterative fashion when N > 1.[25] Note that the use of the offset parameter (γ) in eq is optional, which would need to be trained along with α and β as well (albeit in an iterative fashion). In the LIE models presented in this work, γ was set to zero unless noted otherwise. While the above-described (iterative) weighting approach can improve LIE predictions by enhancing protein–ligand sampling, we investigate here if we can further improve LIE by calculating the free energy of ligand desolvation alchemically and by including this contribution to ΔGbind explicitly into our LIE approach. In this way, we can rigorously evaluate both the energetic and entropic contributions of desolvation to ΔGbind, and we only have to calibrate α and β parameters associated with ligand–environment interaction energies in the simulation of the protein-bound system. We evaluate if replacing the contributions from the unbound state in LIE by alchemically computed (de)solvation free energies will improve ΔGbind calculations when compared to experiment, by merely performing short free-energy perturbations of the ligands free in solvent. Thus, we introduce here a modified framework of LIE computation by including ligand solvation by means of FEP instead of using the second terms on the right-hand sides of eqs and 3. In particular, we present combined LIE/FEP models to compute binding to human Cytochrome P450 2A6 (CYP2A6; EC 1.14.13.−) and compare it to traditional LIE models. Cytochrome P450s (CYPs) are malleable and promiscuous heme-containing redox proteins comprising a ubiquitous superfamily of mono-oxygenases, and they act as the major drug-metabolizing enzyme in phase I of xenobiotics metabolism, converting ca. 75% of available marketed drugs.[27,28] A variety of CYPs are known to be able to accommodate substrates in different binding poses. CYP2A6 is a hepatic enzyme comprising 1–10% of total CYPs in human body[29] but responsible for 70–80% of nicotine metabolism,[30] and inhibiting nicotine metabolism may diminish the desire for smoking; hence, CYP2A6 inhibitors may help smoking cessation. Here, we calibrate LIE models for CYP2A6 based on experimental data for 28 inhibitors and take advantage of the weighted LIE scheme of eqs –4 to enable the inclusion of multiple binding poses in the model. We present several adapted LIE models in which (de)solvation thermodynamics is included as described above, and we show that combining LIE with FEP for that purpose can improve correlations between calculated and experimental CYP2A6–inhibitor binding affinities. In addition, we study if this conclusion can be extended toward another CYP (CYP2E1).

Methods

Protein and Ligand Structure Preparation

The crystal structure of CYP2A6 was downloaded from the Protein Data Bank (PDB)[31] with ID 2FDV.[32] After retaining the protein and heme-group atomic positions of chain A, MolProbity[33] was used to add missing hydrogen atoms, optimize hydrogen bond networks, and check for flipped Asn, Gln, and His residues. No residue was flipped. Chimera version 1.10.2[34] was used to define AMBER ff14SB[35] force field parameters for the protein and AM1-BCC charges[36] for the heme group and to energy-minimize the structure by 100 steps steepest descent and 10 steps conjugate gradient, and correct protonation states of His residues. We collected a data set of experimental CYP2A6 (as well as CYP2E1) IC50 or K values for 28 pyridine analogues of nicotine from the literature[32,37] via ChEMBL,[38] which was used to derive either the Cheng–Prusoff estimates of K and/or the observed binding free energies ΔGbind,exp (ranging from −40.9 to −28.0 kJ mol–1; Table ) using[39]

Table 1

Molecular Structures and Experimental Estimates for the CYP2A6 Binding Free Energy (ΔGbind,exp in kJ mol–1) of the 28 Compounds Considered in the Current Work

LIE Model Parameterization

We used our eTOX ALLIES[40] pipeline to automatically perform docking and MD simulations and to employ the Boltzmann-weighted LIE method (Figure ). Open Babel 2.3.9[41] was used to prepare ligand structures (i.e., generation of three-dimensional (3D) coordinates if necessary and protonation based on pH 7.4 or neutralization, depending on model settings). Ligand topologies were created using AmberTools15[42] based on the general Amber force field (GAFF)[43] and the AM1-BCC approach,[36] and this topology generation was automatically executed and converted to GROMACS format using ACPYPE[44] (Rev: 7828). Ligands were docked into the CYP2A6 active site by ParaDockS 1.0.1[45] with a sphere radius of 1.0 nm from the center, which was automatically determined on the basis of the heme domain atom coordinates. Docking solutions were subsequently subjected to principal component analysis (PCA) and k-means clustering to generate medoids as representative docking conformations for MD inputs, as described in our previous works.[40,46]

Figure 1

Schematic overview of LIE and ALIE workflows used in this study. The pipeline starts from the ligand and protein structure preparation, followed by docking, clustering, and MD simulations. The LIE model is obtained using eTOX ALLIES,[40] while the ALIE models are generated using in-house scripts. The medoids (2–3 per ligand) were afterward used in combination with the protein structure as input for steepest-descent energy minimization in GROMACS 4.5.5. The minimized complex structures were in a next step solvated in approximately 16180 TIP3P water molecules,[47] and the systems were neutralized by the addition of 6 Cl– counterions. All subsequent MD stages, including the 1 ns production runs starting from the 2–3 selected binding poses of the relatively rigid ligands, were carried out using GROMACS 4.5.5[48] and the Amber14SB force field,[35] in which energy and coordinate trajectories were written out to disk every 10 ps. Thermal pre-equilibration, temperature and pressure coupling (with time constants of 0.1 and 0.5 ps, respectively), grid-based pair-list update frequency, and long-range treatment of nonbonded interactions during MD were performed as described previously.[49] For the heme cofactor and coordinating cysteine residue in CYP2A6, special force field parameters were used.[40] Every ligand pose was run in duplicated simulations[50] and weighted based on Boltzmann-like statistics (eqs –4).[25] Separate duplicated 1 ns production runs for ligands free in solvent using identical MD settings as for protein–ligand complexes were conducted to obtain average interaction energies for unbound ligands for use in eq . Time-averaged energetic terms between the ligand and environment as obtained from MD were subsequently subjected to ordinary least-squares (OLS) fitting to train LIE models based on curated experimental ΔGbind values, which was performed using the Python scikit-learn 0.17 package.[51] Fitted LIE parameters (i.e., α and β (and γ if required)) from model creation (see Results and Discussion) were obtained in an iterative way.[25] All of the works from ligand preparation until LIE model generation were performed within the eTOX ALLIES pipeline.[40]

Adapted Model 1: Including (De)Solvation Free-Energy Calculations

In adapted LIE model 1 (ALIE1), we separately consider the solvation free energy ΔGsolv (using rigorous alchemical computation) and the free energy of transferring a ligand from the gas phase into the bound state (ΔGprot, Figure ). In this way, we only need LIE parameters to scale ligand–environment interaction energies in the protein-bound state, in which ligand sampling is already enhanced compared to traditional LIE by weighting results from MD simulations starting from different ligand binding poses using eq . Thus, the calculation of “binding free energies” reduces in this ALIE1 model toAs reference data ΔGprot,ref for the calibration of scaling parameters α and β (and optionally, offset parameter γ), we use the sum of ΔGbind,exp and ΔGsolv, cf. Figure To rigorously evaluate ΔGsolv from free-energy perturbation, 21 independent MD simulations were performed of the individual ligands in water. Solute–solvent (Coulomb and van der Waals) interactions were gradually activated in these simulations using equispaced λ points (ranging from 0 to 1) with λ as perturbation parameter for the Hamiltonian.[19,52,53] At every λ point, MD simulations were performed under minimum image periodic boundary conditions based on cubic computational boxes. The equations of motion were integrated using the leap-frog scheme[54] with a time step of 2 fs. All simulations were performed at a temperature of 300 K and a pressure of 1 bar; water was simulated as the TIP3P model. MD simulations were performed using the software GROMACS version 5.[55] All of the systems were previously equilibrated by performing a first minimization step (50 000 steps steepest descent algorithm) followed by 30 ns of equilibration in NPT ensemble at 300 K and at 1 bar of pressure using the v-rescale algorithm and Parrinello–Rahman algorithm for temperature and pressure coupling, respectively.[56,57] Electrostatic interactions were calculated using the fast smooth particle-mesh Ewald (SPME) algorithm.[58] For each λ point, the system was minimized (10 000 steps, steepest descent) and equilibrated for 200 ps (100 ps in NVT ensemble followed by 100 ps in NPT ensemble) before the production of MD simulation (15 ns). The final solvation free energies relied on multistate Bennett acceptance ratio (MBAR).[59] For every free energy calculated, statistical error estimation is provided as indicated by Klimovich et al.[52] Values for and errors in the calculated free energy were obtained using an automatic analysis workflow (http://github.com/choderalab/pymbar) that systematically evaluates free-energy differences for each pair of adjacent states, as a function of simulation time in both the forward and reverse directions and from overlapping distributions.[52,59]

Figure 2

Thermodynamic cycle illustrating the calculated and reference free-energy differences considered in this study. The solvation free energy ΔGsolv is associated with the transfer of the ligand in the gas phase (top) into the water solvent (bottom left). Binding of the solvated ligand to the target protein is associated with the free energy of binding ΔGbind. The sum of these two free energies is equal to the free-energy difference ΔGprot for transferring the ligand from the gas phase directly to its protein-bound state in solution (bottom right).

Adapted Model 2: Extension to the Linear Response Approximation (LRA) Approach

Equation does not account for possible inherent ligand interaction energies with the protein in its unbound conformation, but depending on the system of interest, such preorganization energy may be significant.[60] In that case, the assumption of the linear response in the electrostatic interactions as taken in eq can break down. Here, we examined the size of the preorganization energy and if our calculations could be further improved by replacing the second term on the right-hand side of eq with the original linear response approximation (LRA) of Warshel and co-workers[61] to retrieve adapted model 2 (ALIE2), whereand⟨Vlig–surrele⟩off is the preorganization energy, which can be directly calculated as average ligand–surrounding interaction energy from MD simulations of the noninteracting protein–ligand complex. Thus, we only need to calibrate α (and optionally γ) in the ALIE2 models, for which values of the difference ΔGprot,ref – ΔGLRA for the set of compounds in Table serve as reference calibration data, while the first term on the right-hand side of eq was the actually calculated value during calibration. To generate the ALIE2 model, we only performed MD simulations starting from a single binding pose per ligand. This was per ligand taken to be the (thermally equilibrated) MD starting pose of the simulation used in the ALIE1 model with highest weight W (see Table S1 of the Supporting Information for the separate W values), from which ⟨Vlig–surrele⟩bound was directly obtained. Figure S1 compares selected ligand binding poses used as MD starting poses in the ALIE1 model to illustrate observed differences in these poses both among ligands for which a single simulation showed a contribution (W) of more than 80% in the calculated binding free energy and among poses with significant contribution to the calculated affinity of a single compound. ⟨Vlig–surrele⟩off was calculated by averaging ligand–protein electrostatic interaction energies over configurations from additional duplicated 1 ns simulations in which protein–ligand interactions were turned off while otherwise utilizing identical MD settings (and starting poses) used to obtain the ⟨Vlig–surrele⟩bound values.

Adapted Model 3: Using Free-Energy Perturbation To Estimate Protein–Ligand Electrostatic Interactions

Previously, de Ruiter and Oostenbrink showed for different cases that LRA or LIE approaches for binding free-energy computation can be improved by allowing nonlinear fits to the change in electrostatic energy upon (un)binding.[60] Here, we investigate in a last step if explicitly running free-energy perturbations using a coupling λ parameter for the electrostatic ligand–protein interactions can lead to improved LRA or LIE predictions. For this purpose, 20 ps equilibration and 10 ns production MD runs were performed in one replicate, starting from the same protein–ligand complex structure as used in the additional ALIE2 simulations, at five λ points in which electrostatic interactions between the ligand and surrounding were gradually decoupled. Other MD settings were identical as those in the simulations of the bound state used for the other (A)LIE models. From these simulations, the electrostatic contribution to ΔGprot (denoted by the subscript ele in eq ) was obtained using the Bennett acceptance ratio (BAR) method.[52] Similarly as in the ALIE2 model, only α had to be fitted, which was in this case done based on the difference ΔGprot,ref – ΔGprot,ele, as follows

Results and Discussion

LIE Model

The parameterization and performance of the calibrated LIE model are summarized in Table . The quality of the model was evaluated by two kinds of metrics, i.e., correlation coefficients r2, Pearson’s r, and Spearman’s ρ, and deviations from experimental values estimated as root-mean-square error (RMSE). The obtained correlation between observed experimental and calculated ΔGbind values is also illustrated by the regression and kernel density plots in Figure A, in which the 95% confidence interval is indicated by the shaded area. From Table and Figure A, it can be seen that ΔGbind values as calculated with our LIE model show deviations from ΔGbind,exp that are close to 1 kcal mol–1 (RMSE = 4.3 kJ mol–1), but the obtained correlation plots and coefficients indicate poor predictions of trends in binding for the considered set of compounds (r = 0.18, ρ = 0.16). This may be a result of the small spread within the experimental reference values for the employed data set, cf. Table and Figure A.[62]

Table 2

Model Parameters for the LIE and ALIE Models of CYP2A6, Together with Respective Root-Mean-Square Errors (RMSEs, with Respect to Experiment) and Correlation Metrics for the Set of Compounds Considered in This Work

	LIE	ALIE1	ALIE2	ALIE3
α	0.67	0.34	0.38	0.29
β	0.13	0.66	0.50
RMSE (kJ mol^–1)	4.34	8.39	8.93	8.64
r²	0.03	0.59	0.53	0.59
Pearson’s r	0.18	0.77	0.73	0.77
Spearman’s ρ	0.16	0.76	0.76	0.74

Figure 3

Scatter and kernel density plots of the LIE (A), ALIE1 (B), ALIE2 (C), and ALIE3 (D) models for CYP2A6 binding, illustrating obtained correlations between calculated free energies ΔGbind,calc or ΔGprot,calc and their respective reference values. Dashed lines indicate an ideal correlation between the calculated and reference data, solid lines indicate the actually obtained correlations, and shaded areas indicate 95% confidence intervals.

Adapted LIE Models

The solvation free energies of the set of 28 CYP2A6 binders were calculated from MD simulations and free-energy perturbation. The results of these calculations, as well as their error estimates, are presented in Table S2 of the Supporting Information. The model performance and parameters of the ALIE models are summarized in Table and plotted in Figure B–3D. Interestingly, explicitly evaluating the solvation free energy via alchemical perturbation and only parameterizing α and β parameters to compute ΔGprot greatly improve predictions in trends in binding affinity. This is illustrated by the relatively high correlation coefficients for the ALIE1 model, especially when compared to those for the traditional LIE model (Table ). As recently argued by one of us,[63] computing ΔGprot can aid in predicting intrinsic preferences in protein–ligand interactions, regardless of the desolvation contribution as included in the experimental binding free energy. This can be of help in studying, e.g., possible effects on these intrinsic protein–ligand affinities due to protein mutations (which may, in turn, be relevant to bioengineering[63] or studying polymorphisms), but it should be noted that in our ALIE1 model, relatively large errors in calculated ΔGprot values were obtained (RMSE = 8.4 kJ mol–1). Probably, an important contribution to the approximate doubling of the RMSE when going from the LIE to the ALIE1 model is the accompanying 2- to 3-fold increase in the absolute reference values to be predicted, cf. Figure S2. In addition, the increase in RMSE in going from the LIE to the ALIE1 model may be due to a contribution from possible errors in the computed ΔGsolv value to the calculated complexation free energies in the ALIE1 model. Besides the larger spread in ΔGprot,ref values than for ΔGbind,exp, the higher ALIE1 correlation coefficients (when compared to LIE) can be understood in terms of the different α and β values that we obtained when fitting separate LIE-like models for the prediction of ΔGsolv and ΔGprot (which together constitute ΔGbind; Figure ). To obtain an LIE model for ΔGsolv computation, we used our FEP/MBAR values as reference data to fit parameters that directly relate the ligand–solvent van der Waals and electrostatic interaction energies in the unbound state to the solvation free energy[64]In this way, a model was obtained with α and β close to 0 and 0.5, respectively (Table ). Thus, to compute ΔGsolv values, we can use a linear response model in terms of electrostatic interactions only, Figure . In contrast, our fitted ALIE1 model for ΔGprot calculation has values of α = 0.34 and β = 0.66, which is also significantly closer to the theoretical value of 0.5 than obtained for our LIE model for ΔGbind calculation, Table . In conclusion, by splitting up the calculation of ΔGbind into the upper steps in the thermodynamic cycle depicted in Figure , an improved and significant correlation could be obtained between the computed and reference data, as illustrated by the values of Pearson’s r and Spearman’s ρ that are close to 0.8 for the ALIE1 model.

Table 3

LIE Parameters for and Performance of a Model That Linearly Relates Average Ligand–Solvent van der Waals (via α) and Electrostatic Interaction Energies (via β) As Obtained from Independent Simulations of the 28 Unbound Ligands in Water Solvent, Fitted Based on ΔGsolv Values Computed from Alchemical Perturbationa

	value
α	–0.04
β	0.52
RMSE (kJ mol^–1)	3.40
r²	0.92
Pearson’s r	0.96
Spearman’s ρ	0.97

Deviations and correlation between the LIE and alchemical calculations are expressed in terms of the root-mean-square error (RMSE), and the r2, Pearson’s r, and Spearman’s ρ, respectively.

Figure 4

Scatter and kernel density plots of an LIE model to predict ligand solvation free energies ΔGsolv, which are linearly related to average ligand–solvent van der Waals (via LIE parameter α) and electrostatic interaction energies (via LIE parameter β) as obtained from independent simulations of the 28 unbound ligands in water solvent. Values for α and β are fitted based on ΔGsolv values computed from alchemical perturbation. The dashed line indicates an ideal correlation between the calculated and reference data, the solid line indicates the actually obtained correlations, and the shaded area indicates the 95% confidence interval. Deviations and correlation between the LIE and alchemical calculations are expressed in terms of the root-mean-square error (RMSE), and the r2, Pearson’s r, and Spearman’s ρ, respectively. Subsequently, we investigated if we could further improve our model by including the protein–ligand preorganization energy via the linear response approximation in the ALIE2 model. Table shows that this did not lead to a further improvement in model performance when compared to the ALIE1 results, indicating that the preorganization term does not contribute significantly to trends in CYP2A6 binding for the considered compounds. This is confirmed by the relatively low values obtained for ⟨Vlig–surrele⟩off (Table S3). Accounting for a possible nonlinear contribution of the electrostatic protein–ligand interaction energy to ΔGprot (by evaluating ΔGprot,ele via alchemical free-energy calculation in the ALIE3 model) did not lead to improved model performance compared to that of the ALIE1 model either, Table . We also assessed the effect of including more λ points into the calculation of ΔGprot,ele (i.e., by going from 5 to 9 λ points with regular intervals) for three randomly chosen ligands in the data set (compounds 16, 20, and 23). By introducing more λ points in this way, we observed changes in ΔGprot,calc ranging between 0.04 and 1.74 kJ mol–1 only. In conclusion, the possible nonlinear change in electrostatic energy upon (de)coupling protein–ligand interactions (as we observed for several compounds, data not shown) could, in this case, still effectively be accounted by using β as a free parameter in the ALIE1 model. This is also illustrated by comparing the standard deviation error in the prediction (SDEP) of 9.3 kJ mol–1 as obtained from leave-one-out cross-validation (LOO-CV) for ALIE1, with the corresponding SDEP values for the ALIE2 and ALIE3 models, which were with values of 9.2 and 9.0 kJ mol–1 only 0.1 and 0.3 kJ mol–1 lower, respectively. For these reasons, we focus below on the ALIE1 model, for which we included the SDEP and the associated LOO-CV correlation coefficient q2 in Table S4 of the Supporting Information.

A Closer Look at the ALIE1 Model

To exclude a possible large effect on model performance of the data points that are outliers in the ALIE1 model in terms of their reference (compound 16) or calculated value (compound 22, Figure B), we generated ALIE1-like models, in which either of these two compounds was excluded from the training set (Table S4 and Figure S3). Our finding that compound 22 is an outlier in terms of ΔGprot prediction (Figure B) may be due to the fact that this is the only hydroxyl-containing compound in our ligand set, and previously Åqvist and co-workers argued to calibrate different values for β for compounds with a different number of −OH groups.[65] We found that the quality of the ALIE1 model only moderately improves when excluding compound 22 from our data set: the RMSE decreases by 0.7 kJ mol–1, whereas Pearson’s r and especially Spearman’s ρ do not change significantly (Table S4). On the other hand, LOO-CV shows a more substantial decrease of the SDEP and an increase in q2 upon excluding compound 22 from the data set, Table S4. Encouragingly, when compound 16 (instead of 22) is excluded, the quality of the ALIE1 model only changes slightly. The RMSE and SDEP increase by 0.1 and 0.2 kJ mol–1, respectively, and Pearson’s r and Spearman’s ρ are still above 0.7, while compound 16 is responsible for approximately 20% of the spread in reference data for ALIE1 model calibration (cf. Figure B). This implies that the model does not necessarily rely on compound 16 to show a significant correlation. In a next step, we investigated if we could improve the ALIE1 model by including the γ offset parameter in eq or by including effective corrections for the loss in ligand-configurational freedom and possible release of water molecules from the active site upon binding. We found that including γ as an additional free parameter in eq does not significantly improve the model quality, which is especially apparent when comparing the obtained correlation with the ALIE1 performance, Table S4. To account for possible differences among the considered compounds in their loss in configurational entropy upon protein binding, we added a correction term ΔGrot to our binding free-energy calculations defined aswith N the number of rotatable bonds, kB Boltzmann’s constant, and T the temperature.[66] In this way, we add a penalty per rotatable bond to the ligand binding free energies that is comparable to the corresponding estimates as reported in the literature, which range between 1.5 and 4 kJ mol–1. In an attempt to also account for possible differences in the replacement of water molecules by the different ligands, we computed a free-energy correction ΔGSASA for this based on the ratio of the ligand’s SASA (SASAligand) and the SASA of a water molecule (SASAwater), multiplied by a factor TΔS representing the entropy of a water molecule and set to 20 kJ mol–1, as given byTo include ΔGrot and ΔGSASA into our ALIE1 model, we calibrated α and β in eq based on the sum (ΔGbind,exp + ΔGsolv + ΔGrot + ΔGSASA). The resulting model is presented in Table S4 and Figure S3D. Table S4 and Figure S3 also present the results for a similar model in which a value of TΔS = 6 kJ mol–1 is used. From a comparison with Table and Figure B, we find that incorporating the SASA term and number of rotatable bonds in this adapted model does not significantly change model performance compared to ALIE1, with (for different choices of TΔS in eq ) maximal changes in correlation coefficients and RMSE (or SDEP) of only 0.07 and 0.25 kJ mol–1, respectively.

LIE and ALIE1 Models for Cytochrome P450 2E1

As another test case, we calibrated and compared an LIE and ALIE1 model for a different CYP, i.e., Cytochrome P450 2E1 (CYP2E1), a hepatic enzyme responsible for alcohol metabolism.[67] We chose this protein because for the compounds of our CYP2A6 data set, we also have experimentally estimated ΔGbind values available for CYP2E1 (ranging from −42.6 to −21.5 kJ mol–1; Table S5). As for CYP2A6, these values were obtained using eq (39) and using inhibition data from Cashman and co-workers.[32,37] Obviously, for these compounds, we already have the rigorously calculated set of ΔGsolv values available from our alchemical free-energy calculations described above. The initial protein structure (with ID 3E6I(68)) was obtained from the PDB database. Missing loops were modeled with ModLoop.[69] Subsequent steps for structure preparation, docking, MD, and model calibration were identical as for our CYP2A6 modeling. The performance and parameters of our LIE and ALIE1 models for CYP2E1 binding are summarized in Table and Figure . As observed for CYP2A6, the ALIE1 approach led to a clearly enhanced correlation between the calculated and reference data. For our ALIE1 model for CYP2E1, Pearson’s r = 0.75 and Spearman’s ρ = 0.73, which are to be compared to LIE values of 0.36 and 0.23, respectively. Table additionally shows that β is in the ALIE1 model again closer to its theoretical value of 0.5 than when using traditional LIE (0.57 vs 0.20), and we find the same ligand (compound 22) as a predicted outlier. We also find a higher RMSE value for the ALIE1 model of CYP2E1 than for its LIE model, which is, as in the case of CYP2A6, in line with the accompanying increase in the spread of absolute values for the reference calibration data, Figure S4. We note that different α and β values are obtained for the ALIE1 (and for the LIE) models for CYP2A6 and CYP2E1, cf. Tables and 4. This prevented us from using a single model for CYP2E1 vs CYP2A6 binding selectivity prediction for the compounds of interest.

Table 4

Model Parameters for the LIE and ALIE1 Models of CYP2E1, Together with Respective Root-Mean-Square Errors (RMSEs, with Respect to Experiment) and Correlation Metrics for the Set of Compounds Considered in This Work

	LIE	ALIE1
α	0.70	0.40
β	0.20	0.57
RMSE (kJ mol^–1)	5.14	8.81
r²	0.13	0.56
Pearson’s r	0.36	0.75
Spearman’s ρ	0.23	0.73

Figure 5

Scatter and kernel density plots of the LIE (A) and ALIE1 (B) models for CYP2E1 binding free-energy calculation, illustrating the obtained correlations between calculated free energies ΔGbind,calc or ΔGprot,calc and their respective reference values. Dashed lines indicate an ideal correlation between the calculated and reference data, solid lines indicate the actually obtained correlations, and shaded areas indicate 95% confidence intervals.

Conclusions

We showcase here several adapted LIE models based on modifications of a Boltzmann-like weighted LIE approach for calculating protein–ligand binding free energies. By separately computing ligand solvation free energies (using alchemical perturbation) and evaluating protein–ligand interactions from MD of the bound state, we were able to obtain an adapted LIE model with a high correlation between reference binding data and the calculated results. For that purpose, we solely needed to run short (1 ns production) MD simulations of the protein-bound state in combination with alchemical free-energy calculations of the unbound state only. Pearson’s r and Spearman’s ρ values increased from 0.18 and 0.16 for the traditional LIE model to 0.77 and 0.76, respectively. By including alchemical ligand solvation free-energy calculations into an LIE model for the same set of compounds binding to a different Cytochrome P450 (CYP2E1), a similar increase in correlation was obtained.

58 in total

Review 1. Structure-based virtual screening: an overview.

Authors: Paul D Lyne
Journal: Drug Discov Today Date: 2002-10-15 Impact factor: 7.851

2. Calculations of antibody-antigen interactions: microscopic and semi-microscopic evaluation of the free energies of binding of phosphorylcholine analogs to McPC603.

Authors: F S Lee; Z T Chu; M B Bolger; A Warshel
Journal: Protein Eng Date: 1992-04

2. Simulations on the dual effects of flavonoids as suppressors of Aβ42 fibrillogenesis and destabilizers of mature fibrils.

Authors: Sahar Andarzi Gargari; Abolfazl Barzegar
Journal: Sci Rep Date: 2020-10-06 Impact factor: 4.379

2 in total

Combined Linear Interaction Energy and Alchemical Solvation Free-Energy Approach for Protein-Binding Affinity Computation.

Introduction

Methods

Protein and Ligand Structure Preparation

LIE Model Parameterization

Adapted Model 1: Including (De)Solvation Free-Energy Calculations

Adapted Model 2: Extension to the Linear Response Approximation (LRA) Approach

Adapted Model 3: Using Free-Energy Perturbation To Estimate Protein–Ligand Electrostatic Interactions

Results and Discussion

LIE Model

Adapted LIE Models

A Closer Look at the ALIE1 Model

LIE and ALIE1 Models for Cytochrome P450 2E1

Conclusions

Review 1. Structure-based virtual screening: an overview.

2. Calculations of antibody-antigen interactions: microscopic and semi-microscopic evaluation of the free energies of binding of phosphorylcholine analogs to McPC603.

3. Improved ligand-protein binding affinity predictions using multiple binding modes.

Review 4. Cytochrome P450s and other enzymes in drug metabolism and toxicity.

Review 5. Calculation of protein-ligand binding affinities.

6. Canonical sampling through velocity rescaling.

7. Thermodynamic Activity-Based Interpretation of Enzyme Kinetics.

8. A new method for predicting binding affinity in computer-aided drug design.

9. A dual-ligand approach for enhancing targeting selectivity of therapeutic nanocarriers.

10. An improved relaxed complex scheme for receptor flexibility in computer-aided drug design.

1. Computing Cellulase Kinetics with a Two-Domain Linear Interaction Energy Approach.

2. Simulations on the dual effects of flavonoids as suppressors of Aβ42 fibrillogenesis and destabilizers of mature fibrils.