Literature DB >> 31461271

A Comparative Linear Interaction Energy and MM/PBSA Study on SIRT1-Ligand Binding Free Energy Calculation.

Eko Aditya Rifai¹, Marc van Dijk¹, Nico P E Vermeulen¹, Arry Yanuar², Daan P Geerke¹.

Abstract

Binding free energy (ΔGbind) computation can play an important role in prioritizing compounds to be evaluated experimentally on their affinity for target proteins, yet fast and accurate ΔGbind calculation remains an elusive task. In this study, we compare the performance of two popular end-point methods, i.e., linear interaction energy (LIE) and molecular mechanics/Poisson-Boltzmann surface area (MM/PBSA), with respect to their ability to correlate calculated binding affinities of 27 thieno[3,2-d]pyrimidine-6-carboxamide-derived sirtuin 1 (SIRT1) inhibitors with experimental data. Compared with the standard single-trajectory setup of MM/PBSA, our study elucidates that LIE allows to obtain direct ("absolute") values for SIRT1 binding free energies with lower compute requirements, while the accuracy in calculating relative values for ΔGbind is comparable (Pearson's r = 0.72 and 0.64 for LIE and MM/PBSA, respectively). We also investigate the potential of combining multiple docking poses in iterative LIE models and find that Boltzmann-like weighting of outcomes of simulations starting from different poses can retrieve appropriate binding orientations. In addition, we find that in this particular case study the LIE and MM/PBSA models can be optimized by neglecting the contributions from electrostatic and polar interactions to the ΔGbind calculations.

Entities: Chemical Disease Gene Species

Year: 2019 PMID： 31461271 PMCID： PMC6759767 DOI： 10.1021/acs.jcim.9b00609

Source DB: PubMed Journal: J Chem Inf Model ISSN： 1549-9596 Impact factor: 4.956

Introduction

A quantitative knowledge of protein–ligand binding affinities is essential in understanding molecular recognition; hence, efficient and accurate binding free energy (ΔG) calculations are a central goal in computer-aided drug design.[1] An arsenal of ΔG calculation approaches is available, ranging from rigorous alchemical methods such as free energy perturbation (FEP)[2] and thermodynamic integration (TI)[3] to fast empirical scoring functions featured in molecular docking.[4] The latter are prominent in predicting protein–ligand binding poses and in discriminating binders and nonbinders within large chemical databases,[5] but they typically lack accuracy in quantitatively ranking and predicting ΔG values.[6] In contrast, rigorous alchemical methods may provide reliable estimates for ΔG but require extensive sampling of multiple intermediate nonphysical states and are thus computationally more expensive and still impractical for use in high-throughput scenarios.[7] Compared to these counterparts, the alternate end-point methods offer an intermediate in terms of effectiveness and efficiency in ΔG computation by allowing one to explicitly explore ligand, protein–ligand, and solvent configurational space in the protein–ligand bound and unbound states only.[8] This provides advantages both over rigorous free energy calculations (in terms of efficiency) and empirical scoring functions (in terms of potential accuracy). Frequently applied end-point methods make use of linear interaction energy (LIE) theory[9] and the molecular mechanics/Poisson–Boltzmann surface area (MM/PBSA) approach.[10] In LIE, ΔG is assumed to be linearly proportional with the differences in van der Waals and electrostatic interactions involving the ligand and its environment, as obtained from simulations of its protein-bound and unbound states in solvent. Differences in these interactions (modeled using Lennard-Jones (LJ) and Coulomb potential-energy functions) are scaled by LIE parameters α and β, respectively.[9] Originally, β was set to several fixed values according to a series of studied systems.[9,11−15] Later, it turned out that fixed values for β are usually only suitable for particular systems of interest and not generally transferable between different systems.[16] To mitigate this, a proposal was made to treat both α and β as freely adjustable parameters that can be fitted based on a set of experimentally determined ΔG values.[17,18] The fitted values of α and β (and optionally offset parameter γ) determine the LIE scoring function to be used for predicting ΔG for ligands with unknown affinity, which is given by[9]with γ set to zero in this work (unless noted otherwise), andandThe terms on the right-hand side in eqs and 3 are the MD-averaged van der Waals (vdw) and electrostatic (ele) interaction energies, respectively, between the ligand (lig) and its surroundings (surr) as obtained in complex with the protein bound or unbound in solvent. Another popular end-point method is molecular mechanics combined with Poisson–Boltzmann and surface area (MM/PBSA). MM/PBSA calculations can be performed using either results from protein–ligand complex simulations in a single-trajectory (one-average) setup or from three separate simulations per compound (i.e., of the complex, the protein, and the unbound ligand) in a multi-trajectory or three-average setup.[19] Use of the single-trajectory approach is more widespread owing to its simplicity, efficiency, precision, and accuracy compared to the multi-trajectory setup.[6,20,21] This single-trajectory approach of MM/PBSA resembles LIE in a way that they both do not account explicitly for changes in internal energy and configurational entropy of the ligand and protein upon binding. A difference is that single-trajectory MM/PBSA assumes the conformational distribution for the bound and unbound ligand to be the same, while LIE does not.[22] Molecular dynamics (MD) trajectories of the protein–ligand complex as obtained in the one-average strategy can be used to evaluate each free energy term on the right-hand side of ΔG within the following equation[10,23] In eq , G represents the free energy of the complex, and G and G represent the free energies of the unbound protein and ligand, respectively. The separate free energy terms G for the protein, ligand, or protein–ligand complex are each quantified as[23]where V comprises bond-stretch, angle-bend, torsion, and improper-dihedral energies, and V and V are the van der Waals and electrostatic nonbonded interaction energies, respectively. Together, the sum of these terms make up the vacuum MM energy terms, while G and G constitute the solvation free energies calculated using a continuum (implicit) solvation model, representing the free energy change due to converting a solute in vacuum into its solvated state. In MM/PBSA, G is obtained by solving the Poisson–Boltzmann equation, while the G term is often approximated by a solvent accessible surface area (SASA) term.[24] Replacing Poisson–Boltzmann with Generalized Born[25] for calculating G is on the basis of the alternative MM/GBSA approach.[10] Here, T and S denote the temperature of the system and the entropy of the solute in vacuum, respectively. The TS terms entering eq via G, G, and G account for the change in conformational entropy of the protein–ligand complex upon ligand binding. This can be approximated, for example, via normal-mode analysis of the protein–ligand coordinates obtained from the simulation of the complex. Often this term is neglected,[26] which is justified, for example, by the typical interest in relative binding free energies of similar compounds. As previously reported by Genheden and Ryde[27] among others, LIE can be more compute efficient up to almost 1 order of magnitude when compared to MM/GBSA (which is due to the entropy[27] and polar term[28] in eq ), while the relative accuracy of both methods in predicting experimental data can be system dependent, and this conclusion also applies to MM/PBSA.[27] In the current work, we evaluate if LIE can achieve similar (or higher) accuracy compared to MM/PBSA in calculating binding affinities of 27 thieno[3,2-d]pyrimidine-6-carboxamide analogs for the sirtuin 1 (SIRT1) receptor. SIRT1 is a member of the sirtuin family which is responsible for regulating and/or deacetylating stress resistance and metabolism processes,[29] and inhibition of SIRT1 has been proposed for treating cancer, HIV-1 infections, mental retardation syndrome, and parasitic diseases.[30] Results will also be compared to a MM/GBSA model of Karaman and Sippl[31] for these ligands binding to SIRT1. Although LIE needs experimental data to train the model parameters in eq , their calibration gives access to direct estimates of ΔG for individual (query) compounds. This makes it straightforward to use a Boltzmann-like weighting scheme to combine results of multiple MD simulations starting from different protein conformations and/or ligand binding modes into a single calculation per compound.[32,33] Previously, we successfully used this strategy to compute binding affinities for flexible proteins such as Cytochrome P450s that can bind substrates and/or inhibitors in multiple ways.[33−36] In the current work, we evaluate the potential of this strategy to compute SIRT1 binding free energies by concatenating results of MD simulations starting with binding poses obtained from unsupervised docking and clustering. In addition, we investigate if the Boltzmann-like weighting scheme can identify binding poses as those manually selected by Karaman and Sippl[31] in their MM/GBSA study as the most probable ones. In a final step, we monitor the contributions from the different terms in the central LIE and MM/PBSA equations to the calculations in (trends in) binding affinity, in order to evaluate the possibility to further increase efficiency and accuracy in computing SIRT1 affinities.

Methods

Reference Data and Input File Preparation

Reference data for the SIRT1 ligand set (i.e., experimentally determined values ΔG for protein–ligand binding free energies) were taken from Disch et al.[37] (Table ). All 27 compounds have a thieno[3,2-d]pyrimidine-6-carboxamide scaffold, and ΔG ranges between −29.3 and −50.3 kJ mol–1. Values for ΔG were consistently obtained from the set of IC50 inhibition constants reported by Disch et al., using the Cheng–Prusoff relation[38,39]where R is the gas constant, T is the temperature, and the concentration of agonist used to measure the IC50 data is assumed to be equal to its EC50 value.[39]

Table 1

Compound ID, Molecular Structure, and Experimental Values for Binding Free Energies (ΔG) of SIRT1 Inhibitors Used in the Current Study, with ΔG Values Derived from Experimental Data Obtained by Disch et al.[37],a

The compound numbering (IDs) as listed below is the same as used by Disch et al.[37]

The compound numbering (IDs) as listed below is the same as used by Disch et al.[37] Analogous to previous MD studies,[31,40] the SIRT1 crystal structure with PDB ID 4I5I(41) was used as the protein template structure for docking and subsequent MD. Along with the protein atom coordinates, the positions of the zinc ion and one of the water molecules in the binding pocket[31] were retained to obtain the template. Protein structure preparation was performed using the QuickPrep function of Molecular Operating Environment (MOE) version 2015.10[42] to add hydrogens, assign protonation states and partial charges, and perform energy minimization prior to docking. During minimization, the protein and zinc ion were described using the Amber14SB force field.[43] To obtain starting structures for ligand binding poses, two alternative strategies were used: ligand–binding poses were either taken directly from Karaman and Sippl[31] or obtained from our docking. An advantage of using the Karaman poses can be that these were carefully selected for all ligands such that their scaffold showed maximal structural overlap with the scaffold of ligands 11c, 28, and 31, which were co-crystallized in PDB structures 4JSR, 4JT8, and 4JT9 of the homologous SIRT3 protein, respectively,[37] after fitting these onto the 4I5I SIRT1 structure. In this way, the superimposed poses of all ligands resemble the poses of the co-crystallized inhibitors, which suggests that they occupy both the acetyl lysine substrate channel and the nicotinamide C-pocket of SIRT1 that overlaps with the NAD+ binding channel.[31,37] Ligand-binding poses were also generated using the PLANTS (Protein–Ligand Ant System) docking software version 1.2[44] with the speed 1 setting and evaluated using the ChemPLP scoring function.[45] Coordinates for the docking center were defined as the average of the center-of-mass of ligands in the Karaman binding poses, and a docking radius of 0.8 nm was used. For each ligand, 100 docking poses were generated and subjected to hierarchical clustering with the maximum number of clusters set to 3. For each cluster, a representative pose with the smallest deviation to the average cluster coordinate set was selected as an initial structure for MD simulations. Ligand topologies for MD were generated according to the general Amber force field (GAFF)[46] at the AM1-BCC[47] level of semi-empirical theory by using the antechamber(48) and sqm packages of AmberTools15.[49] This topology generation was performed and converted to GROMACS format by using ACPYPE (Rev: 7268).[50]

MD Simulations

Prior to MD, the protein–ligand complex was immersed in a dodecahedral periodic box and solvated in approximately 15,300 TIP3P[51] water molecules. Nine Na+ counterions were added to neutralize the net charge of the system, and the system was energy minimized using a steepest descent algorithm. Thermal pre-equilibration and (grid-based) pair-list updates during MD were performed as described previously.[34] The equations of motion were integrated using a leap-frog algorithm.[52] Particle mesh Ewald (PME)[53] was used to treat the long-range electrostatic interactions during MD. The cutoff for nonbonded interactions was set to 0.9 nm. Hydrogen atoms were assigned a mass of 4.032 amu,[54] and all bonds were constrained using the LINCS algorithm,[55] allowing a time-step of 4 fs. Coordinative bonds with a length of approximately 0.2 nm were added between the Zn2+ ion and sulfur atoms of its coordinating residues (Cys131, Cys134, Cys155, Cys158) to ensure coordination during simulation and kept constant using the LINCS algorithm as well. A Berendsen thermostat and barostat[56] were used to maintain the system temperature and pressure close to their reference values of 300 K and 1.01325 bar, using coupling constants of 0.1 and 0.5 ps, respectively. All MD stages, including 20 ns production runs in explicit solvent were carried out using GROMACS 4.5.5,[57] in which energy and coordinate trajectories were written out to disk every 10 ps, and center-of-mass motion was removed every 10 ps. Every ligand pose was used twice as the starting structure of replicated simulations[33] starting from different Maxwell–Boltzmann distributions of randomly assigned atomic velocities. Separate duplicated 20 ns production simulations were performed of the unbound ligands in water in periodic dodecahedral boxes with minimal solute atom-box edge distances of 1.8 nm, using identical settings as in the protein–ligand complex simulations described above. Energy minimization and MD simulations were run in an automated fashion as implemented in eTOX ALLIES,[58] and the trajectories obtained from the production runs were used for subsequent LIE and MM/PBSA calculations as described below.

LIE Calculations

Average electrostatic and van der Waals ligand–environment interaction energies were obtained by averaging over concatenated duplicated simulations performed per protein–ligand binding pose or per ligand free in solvent. LIE model parameters were calibrated based on experimental ΔG values (Table ) in an ordinary least squares (OLS) fitting procedure using the Python scikit-learn 0.17 package.[35,59] Inspired by the ability to deal with flexible proteins and/or possible multiple ligand-binding modes in LIE binding affinity computation for, for example, Cytochrome P450s (CYPs)[32,34,35] or nuclear receptors,[60,61] we also report here SIRT1 LIE models in which results from MD simulations starting from different binding poses are combined into a single ΔG calculation per ligand. For this purpose, the weight W of the contribution from simulations starting from an individual binding pose i to the binding free energy is calculated as[32,62]where k is Boltzmann’s constant, T is the temperature of the simulated system, and ΔG is the calculated binding free energy for a given binding pose according to eq . The W’s can be used to combine results from simulations starting from N different binding poses by using the following equationwhere α and β (and the optional offset parameter γ) are again parametrized based on curated experimental values for ΔG, but they need to be trained in an iterative fashion (in which simulation data for multiple binding modes are combined into individual computations) when N > 1[32] because the W’s depend on the LIE model parameter values.

MM/PBSA Calculations

MM/PBSA models were generated using g_mmpbsa(24,63) by first PBC correcting the MD trajectories. The Zn2+ ion was included as part of the protein in the MM/PBSA calculations. We used a single-trajectory MM/PBSA protocol which makes use of protein–ligand complex simulations only and assumes the protein and ligand conformation in the bound and unbound states to be identical. Thus, in eq , the V terms as well as the intramolecular nonbonded interaction energies cancel out,[64] resulting in the final equationwith V and V being the protein–ligand van der Waals and electrostatic interaction energies, respectively, whileand The Poisson–Boltzmann equation for the polar term is evaluated using the following equation[10,63,65]where φ(r) is the solute’s electrostatic potential, ε(r) is the dielectric constant, and ρf(r) is the fixed charge density. As the calculation of the polar solvation energy is known to depend on the chosen value for the dielectric constant ε of the complex,[66] its value was varied in order to inspect its effect on predicted ΔG values. By default, ε was set to 2 and tuned to 1, 4, or 8 to examine the sensitivity of calculated binding free energies to the solute dielectric constant. Other default settings in g_mmpbsa were left unaltered (i.e., use of atomic radii as proposed by Bondi,[67] linear PB equation solver, 0.05 nm grid resolution, and smoothed van der Waals surface).[24] To estimate nonpolar solvation energies, g_mmpbsa assumes a linear dependence of G on the SASA, expressed as follows[68]where γ is related to the surface tension of the solvent and is set to its default value of 2.26778 kJ mol–1 nm–2, while a default value (3.84982 kJ mol–1) was also used for fitting parameter b. A solvent probe radius of 0.14 nm was employed to determine the SASA.[24] Due to its high computational demand and large standard error compared to other contributing terms,[6,20,31,66,69−72] the entropic term was not included in the MM/PBSA models presented in this work.

Protein–ligand Interaction Profiling

To analyze protein–ligand interactions that occurred during the MD simulations, we built a dedicated Python-based biomolecular analysis library MDInteract, which we made freely available at https://github.com/MD-Studio/MDInteract. MDInteract adds structure analysis methods to the API of the popular pandas data analysis toolkit.[73] The MD trajectory and structure file formats are imported using the MDTraj library.[74] MD trajectories were analyzed for the following biomolecular interaction types at a 100 ps resolution similar to previous studies performed by us:[35,61,75] Hydrophobic interactions: identified as all contacts between carbon atoms within 0.4 nm having only C or H atoms as neighbors. Interacting pairs from the same source atom to multiple target atoms in the same residue, in both directions, were filtered to retain only the pair with the smallest distance. Note that atoms involved in π- or T-stacking were separately considered (see below). π- and T-stacking: identified as the geometric centers of two aromatic rings within 0.55 nm, with an offset of no more than 0.2 nm after projection of the geometric center of one ring onto the other and an angle between the normals of both rings of 180° ± 30° for π-stacking or 90° ± 30° for T-stacking.[76] Aromatic rings were detected using a SSSR algorithm (Smallest Set of Smallest Rings) filtering for aromatic planar rings. Hydrogen bonds (H-bond): identified as all interactions between H-bond donor–acceptor pairs within 0.41 nm where the donor–H–acceptor angle was within 50° from the ideal in-plane 180° orientation and the heavy atom acceptor neighbor–acceptor–H angle did not deviate more than 90°.[77] Donor atom SYBYL types: N.3, N.2, N.acid, N.ar, N.am, N.4, N.pl3, N.plc, and O.3 with at least one attached hydrogen. Acceptor atom SYBYL types: N.3, N.2, N.1, N.acid, N.ar, O.3, O.co2, O.2, S.m, and S.a not of a donor character. H-bonds part of a salt bridge were labeled as such by the salt-bridge detection method. Salt bridges: identified as all interactions between the heavy atoms representing the geometric centers of two charged groups of opposite signs within 0.55 nm distance.[78] The H-bond component of the salt bridge was added by the H-bond detection method. Cation−π interactions: identified as all interactions between the center of an aromatic ring and a center of positive charge if the distance was within 0.6 nm and the offset between charge center and ring center after projection on the same plane was less than 0.2 nm. In case the charge center was a tertiary amine, the angle between charge center and ring plane should be within 90° ± 30°.[79] Halogen bonds: identified as pairs of halogen bond donor (I, Br, Cl, F) and acceptor atoms (O, N, and S having a single atom neighbor of type P, C, or S) within 0.41 nm where the donor angle (C–X–O) was within 165° ± 30° and the acceptor angle (Y–O–X) was within 120° ± 30°.[80] The frequency of interactions per ligand pose to interacting amino acid residues was normalized by the number of replicates (i.e., by dividing interaction counts by two) and was represented as heat maps using the seaborn Python package version 0.8.[81] Ligand–residue interaction counts that are less than 20 were discarded for further analysis.

Results and Discussion

LIE and MM/PBSA Models

Tables and 3 summarize the predictive ability of our LIE and MM/PBSA models, which was evaluated in terms of obtained deviations from and correlation with experimental data. The latter was assessed by calculating r2, Pearson’s r (r), and Spearman’s ρ (ρ) coefficients.[82] For the calibrated LIE models, we also report correlation coefficients q2 obtained from leave-one-out cross validation (LOO-CV). Deviations of computed ΔG values from experiments were analyzed in terms of root-mean-square errors (RMSEs) and standard deviations of errors in predictions based on LOO-CV (SDEPs). In Tables and 3, RMSEs and SDEPs are only reported for our LIE models because the MM/PBSA calculations do not allow one to directly compare predicted and experimental ΔG values for individual ligands.[66,83−86] This is exemplified by the range of predicted values obtained from one of the MM/PBSA models discussed below, as presented in Figure (right panel).

Table 2

Model Parameters α and β of LIE Models for SIRT1 and Their Respective Root-Mean-Square Error (RMSE), Standard Deviation of Error in Prediction (SDEP, based on leave-one-out cross validation), and Correlation Metrics Represented by r2, q2, Pearson’s r (r), and Spearman’s ρ (ρ) for the Data Set of 27 Compoundsa

	α	β	RMSE	SDEP	r²	q²	r_P	ρ_S
LIE_single	0.40	–0.04	4.21	4.66	0.52	0.35	0.72	0.72
LIE_γ=–10.24	0.30	–0.05	4.00	4.54	0.52	0.38	0.72	0.73
LIE_mult	0.42	–0.19	6.16	6.59	0.16	–0.30	0.40	0.43
LIE_comb	0.40	–0.05	4.30	4.88	0.49	0.29	0.70	0.70
LIE_Boltzmann4	0.40	–0.02	4.21	4.67	0.49	0.35	0.70	0.69
LIE_first1ns	0.41	–0.04	3.96	4.39	0.54	0.42	0.74	0.70
LIE_first5ns	0.41	–0.02	4.41	4.91	0.46	0.28	0.68	0.68
LIE_first10ns	0.40	–0.04	4.27	4.73	0.51	0.33	0.71	0.72
LIE_first15ns	0.40	–0.04	4.27	4.71	0.51	0.34	0.71	0.71
LIE_last1ns	0.41	0.03	4.58	5.01	0.48	0.25	0.69	0.65
LIE_last5ns	0.40	–0.03	4.28	4.75	0.52	0.32	0.72	0.69
LIE_last10ns	0.40	–0.05	4.41	4.83	0.50	0.30	0.70	0.70
LIE_last15ns	0.40	–0.05	4.30	4.74	0.52	0.33	0.72	0.72
LIE_equil	0.41	–0.02	4.50	5.30	0.45	0.16	0.67	0.66
LIE_β=0	0.41	–	4.24	4.40	0.51	0.42	0.71	0.67
LIE_α=0	–	–1.67	31.3	32.1	0.05	–29.9	0.23	0.18
vdw_bound	0.19	–	3.79	3.94	0.58	0.54	0.76	0.76
vdw_unbound	0.35	–	4.39	4.56	0.49	0.38	0.70	0.67

RMSEs and SDEPs are given in kJ mol–1.

Table 3

Performance of MM/PBSA Models for SIRT1 in Terms of Their Correlation Metrics, Represented by r2, q2, Pearson’s r (r), and Spearman’s ρ (ρ) for the Data Set of 27 Compounds

	r²	r_P	ρ_S
MM/PBSA_ε=2	0.41	0.64	0.61
MM/PBSA_ε=1	0.39	0.63	0.60
MM/PBSA_ε=4	0.22	0.47	0.45
MM/PBSA_ε=8	0.11	0.33	0.31
MM/PBSA_first1ns	0.28	0.53	0.46
MM/PBSA_first5ns	0.41	0.64	0.64
MM/PBSA_first10ns	0.42	0.65	0.63
MM/PBSA_first15ns	0.41	0.64	0.63
MM/PBSA_last1ns	0.36	0.60	0.56
MM/PBSA_last5ns	0.35	0.59	0.56
MM/PBSA_last10ns	0.36	0.60	0.58
MM/PBSA_last15ns	0.39	0.62	0.60
MM/PBSA_ele	0.19	0.43	0.08
MM/PBSA_vdW	0.60	0.77	0.76
MM/PBSA_polar	0.36	–0.60	–0.57
MM/PBSA_SASA	0.57	0.76	0.75
MM/PBSA_noele	0.04	0.20	0.14
MM/PBSA_nopolar	0.49	0.70	0.71

Figure 1

Scatter plots for experimental (exp) and calculated (calc) free energies ΔG of ligand–SIRT1 binding. Calculated values are obtained using the LIE (left panel) or MM/PBSAε=2 model (right panel). The solid blue lines represent linear fits to the data, while the distributions of data are represented by kernel density plots.

RMSEs and SDEPs are given in kJ mol–1. Scatter plots for experimental (exp) and calculated (calc) free energies ΔG of ligand–SIRT1 binding. Calculated values are obtained using the LIE (left panel) or MM/PBSAε=2 model (right panel). The solid blue lines represent linear fits to the data, while the distributions of data are represented by kernel density plots. When calibrating a LIE model using per ligand the binding pose proposed by Karaman and Sippl[31] as a (single) starting pose for MD, we obtain a model (LIE in Table ) with RMSE = 4.21 kJ mol–1, which is within the typical experimental uncertainty of 4–5 kJ mol–1.[87,88] Overall, this suggests satisfactory performance of the model parameters used. Including the optional offset γ parameter did not improve model performance further: The resulting LIEγ=–10.24 model (with γ = −10.24 kJ mol–1) showed a slight reduction in RMSE and SDEP of 0.21 and 0.12 kJ mol–1, respectively, but it did not show an increase in correlation coefficients compared to LIE (Table ). Figure illustrates that LIE shows a slightly better correlation between calculated and experimental ΔG values than the MM/PBSA model in which ε is set to its default g_mmpbsa value of 2 (MM/PBSAε=2 in Table ). This is confirmed by the slightly higher r2 (0.52 vs 0.41) and Pearson’s and Spearman’s coefficients (0.72 vs 0.64 or 0.61) of LIE compared to MM/PBSAε=2 (Tables and 3). It should be kept in mind however that α and β in the LIE equations are fitted parameters. Here, q2 of LIE (0.35) is in fact slightly lower than r2 for MM/PBSAε=2 (0.41), and the value of −0.04 for β hints at possible overfitting. As discussed in more detail below, this could be resolved by constraining β to 0 in a separate LIEβ=0 model, which has a q2 value of 0.42 (Table ). In accordance with previous findings from MM/PBSA studies on other protein systems,[66] we found a clear effect of ε on obtained correlations in the MM/PBSA models. Setting its value to 1 or 4 led to slightly and significantly lower correlation coefficients, respectively, while using ε = 8 further decreased the quality of the model when compared to MM/PBSAε=2 (Table ). Even though, values of 6–7 or higher have been reported as average estimates for dielectric permittivities of protein binding pockets,[89] ε is often set to 1 or 2.[24,90] Our finding that the most successful models of MM/PBSA in predicting SIRT1–ligand binding use a low value of ε is in line with the hydrophobic acetyl–lysine binding interface in SIRT1;[91] as discussed by Karaman and Sippl,[31] the binding site of all inhibitors is characterized by six hydrophobic (Ala22, Ile30, Phe33, Phe57, Ile107, Phe174) and two charged side chains (Asp108, His123). Our MM/PBSAε=2 model performs similarly as one of the MM/GBSA models presented by Karaman and Sippl,[31] who used a ε value of 1 and reported r2 values ranging between 0.4 and 0.6 in their MM/GBSA study on SIRT binding. In addition, the similar performance of LIE and MM/PBSAε=2 is in line with the earlier work of Uciechowska et al., who found comparable performance of LIE and MM/PBSA in reproducing experimental data for a smaller set of SIRT2 binders.[72]

Inclusion of Multiple Binding Poses

In a next step, we investigated if we could use the iterative LIE scheme to improve calculations and to develop a binding affinity model in an unsupervised manner. For that purpose, multiple binding poses were directly obtained from docking and clustering. These were used as starting structures in MD simulations i in eq . The resulting LIE model showed significantly higher RMSE and lower correlation coefficients than the LIE model (Table ). We analyzed if this decrease in predictive quality was due to either a lack of correct MD starting poses in the LIE model or the fact that the protein does not bind its ligands in multiple orientations (or due to a combination of both). We first explored the interactions between the ligands and SIRT1 residues during simulations used in the LIE and LIE models. The obtained interaction heat maps are depicted in Figure . They show high frequencies in three ligand–SIRT1 hot-spot interactions that were present in the starting poses[31] and persistent during the simulations used in the LIE and MM/PBSA models (i.e., with Phe33, Ile107, and Asp108; Figure ). Figure shows that these frequencies are significantly lower for most of the LIE simulations, especially those involving interactions with Asp108. This finding indicates that the unsupervised selection of starting poses for MD leads to a larger spread in protein–ligand interactions during simulation than in the LIE model. For most ligands, we found a relatively large root-mean-square deviation (RMSD) in atomic positions between the MD starting poses used in LIE and the LIE pose; for only eight of the ligands, at least one of the three docked starting poses showed an RMSD of less than 0.1 nm (Table S1). Only when combining results of the MD simulations initiated from the starting poses of LIE with those from LIE, we could train a combined model LIE with similar accuracy and correlation coefficients (and α and β) as LIE (Table ). These findings indicate that docking and clustering for the purpose of LIE training did not lead to sufficiently appropriate binding poses, while the retained predictive quality of the LIE model (compared to LIE) implies that incorporating too many binding poses does not necessarily lead to lower predictive quality. Thus, when being able to generate and select appropriate binding poses from docking and/or experimental information on protein–ligand interactions, iterative LIE training can subsequently be performed in an unsupervised manner.

Figure 2

Figure 3

Residues of SIRT1 (Phe33, Ile107, Asp108) that show hot-spot interactions with the ligands during the simulations used, for example, in the LIE model. All ligands included in the model are depicted in stick representation using different colors (in their starting structure used for MD), while the interacting residues are labeled and shown in ball-and-stick representation.

Ligand–residue interaction profiles for MD simulations used in the LIE and MM/PBSA models (upper panel) and in the LIE model (lower panel). Interaction counts are presented for individual SIRT1 residues (x-axis labels) and averaged over duplicated 20-ns runs per ligand (y-axis labels in upper panel) or per combination of ligand and starting pose used for MD (y-axis labels in lower panel). Residues of SIRT1 (Phe33, Ile107, Asp108) that show hot-spot interactions with the ligands during the simulations used, for example, in the LIE model. All ligands included in the model are depicted in stick representation using different colors (in their starting structure used for MD), while the interacting residues are labeled and shown in ball-and-stick representation. Here, we evaluated if Boltzmann-like weighting (eq ) within the LIE model predicted the manually prepared binding poses of Karaman and Sippl as the most probable ones by inspecting per ligand the starting pose used in the MD simulation with the highest weight W (Figure ). We found that for 22 of the 27 ligands, the simulation starting from the binding pose used in the LIE model shows the largest weight W. For the five ligands for which the largest weight comes from a simulation starting from one of the docked poses used in LIE (19, 20, 21, 27, 38), we inspected the corresponding pose (Figure ) and discovered that the scaffolds of three out of five ligands (19, 20, 21) show clear structural overlap with the ligand pose used for LIE (cf. Figure and the relatively low RMSDs in Table S1). Interestingly, docked starting poses of ligands 27 (pose 1) and 38 (pose 1) for the simulations with the highest weight W in the LIE model show less overlap (Figure ) and higher RMSDs (Table S1), but they still share similar interaction profiles with the simulations used for these ligands in LIE (Figure ). This highlights that similarity in protein–ligand interactions may well be a better measure for comparable modes and/or affinities of binding than structural overlap alone, in agreement with results from previous studies in which we used iterative LIE to evaluate binding to flexible proteins such as CYPs or nuclear receptor FXR.[34,35,61] This is further exemplified by MD simulations starting from poses which show relatively high contributions to LIE calculations such as in the case of pose 1 of compound 26, which does not have the lowest RMSD of poses 1–3 when compared to the LIE starting pose (Table S1), but shows a similar pattern of protein–ligand interactions (Figure ). Subsequently, we trained a LIE model using per ligand the simulation with the highest weight W from LIE, and we found that this LIE model performs similarly as LIE and LIE (Table ), confirming that the latter does not utilize too many different MD starting poses.

Figure 4

Figure 5

Comparison of ligand starting poses for the MD simulations that have the highest weights W in the combined LIE model (solid cyan) with the LIE pose (transparent green).

Boltzmann weights W for the simulations (x-axis) used per ligand (y-axis) in the combined LIE model. Starting poses 0 and 1–3 were starting poses for the simulations used in the LIE and LIE model, respectively, and the intensity of the color indicates the size of W per simulation. Comparison of ligand starting poses for the MD simulations that have the highest weights W in the combined LIE model (solid cyan) with the LIE pose (transparent green).

Impact of Simulation Length

We also explored the effect of simulation length on ΔG computations by the LIE and MM/PBSAε=2 models. For that purpose, models were developed based on averages for the interaction energy and solvation terms over either the first or the last 1, 5, 10, or 15 ns of the production simulations starting from the Karaman and Sippl poses (and using ε = 2 in the MM/PBSA models). The performance of these models is listed in Tables and 3 (using subscripts firstXns or lastXns) and was compared to the models based on 20-ns averages as described above. MM/PBSA studies in the literature often report simulations of tens of nanoseconds from which the last few nanoseconds are used to compute averages for the different terms in eq or 9.[64] Our analysis shows that for the current system of interest a simulation length of 5 ns (in the MM/PBSA model) is sufficient to obtain comparable correlation coefficients for MM/PBSAε=2. However, by running longer simulation times and taking the last 1 or 5 ns to average over gives a slight deterioration in model performance (cf. MM/PBSA and MM/PBSA in Table , respectively). It is also worth mentioning that 1-ns LIE simulations are sufficient to achieve a model (LIE) that shows even slightly better performance compared to LIE (Table ). The finding that shorter simulation lengths can improve LIE calculations is in line with previous results in iterative models that showed improved performance when restricting sampling to local parts of conformational space,[62,92] and it can be seen as a clear advantage of LIE in terms of its efficiency. It prompted us to generate a LIE model (LIE) using the output of our thermal equilibration simulation at 300 K only, but this in turn showed a slightly higher RMSE (4.50 kJ mol–1) and lower r (0.67) than those for LIE and LIE (Table ). The reason that longer equilibration may not necessarily lead to improved correlation for the MM/PBSA model (which is in line with previous findings in literatures[66,93]) and that longer simulations are more required for MM/PBSA than for LIE is due to the relatively slow convergence of the MM/PBSA polar term.[90,94] This is illustrated in Figure , which shows an example of a typically observed time series of the different MM/PBSA and LIE terms. In conclusion, longer equilibration times do not necessarily lead to an improved agreement with experimental data which makes the selection of the optimal simulation time to be an effective model parameter.

Figure 6

Time series of different ligand–environment interaction energies and solvation free energy terms used in the LIE and/or MM/PBSA models, as obtained in one of the MD simulations of ligand 11a either bound to SIRT1 or free in water (red, ΔG; orange, ΔG; light green, ⟨V⟩; green, ⟨V⟩; dark green, V; light blue, ⟨V⟩; blue, ⟨V⟩; dark blue, V). Plotted values are averages over 10 consecutive data points.

Contributions of Different LIE and MM/PBSA Terms to ΔG Calculations

In the LIE models discussed above, β is close or practically equal to zero (Table ). This indicates that electrostatic interactions do not significantly contribute to trends in binding affinity, which can be in line with the hydrophobic character of the SIRT1 binding pocket.[91] In a previous LIE study on a protein with a hydrophobic active site (i.e., Cytochrome P450 1A2 or CYP1A2), fitted values for β were similarly small and constrained to zero in the final presented model.[95] As for CYP1A2, a SIRT1 LIEβ=0 model with β set to zero shows nearly identical performance as LIE when considering RMSEs and correlation metrics and a slightly lower SDEP (and higher q2) (Table ). The low contribution of electrostatic interactions to predicted trends in binding affinity can be understood from the relatively low absolute values for ΔV obtained from MD (Table S2), suggesting that hot-spot and other hydrogen-bonding interactions with SIRT1 residues compensate for the ligand–solvent interactions in the unbound state. As a result, training a LIE model with α set to zero (and using only β as fitting parameter) is of no meaning at all (cf. LIEα=0 in Table ). Similarly as for LIE, we tried to simplify the MM/PBSA models by only taking van der Waals interactions or the nonpolar/SASA term into account. Table shows that models that only include the van der Waals V interactions (MM/PBSA, r = 0.77) or the SASA ΔG term in eq (MM/PBSA, r = 0.76) show higher predictivity than models based on electrostatic V interactions (r = 0.43) or the polar ΔG term only (r = −0.60) and even than the overall MM/PBSAε=2 model. Our finding that LIE and MM/PBSA models with a reduced number of model terms can give similar or slightly improved performance would make it of interest to explore the use of extended LIE approaches such as the one recently proposed by He and co-workers,[96] in which MM/PBSA-like terms are included together with six fitting parameters. Based on our current findings, this number may also be effectively reduced in a model for ligand binding to SIRT1. In addition to the small electrostatic contribution to trends in binding to the hydrophobic SIRT1 pocket (Table S2), the reason that neglecting electrostatics and polar contributions can improve our MM/PBSA predictions can be explained by the relatively large uncertainty in determining V and especially ΔG (cf. Figure ). The difficulty to reach convergence in the latter can be due to fluctuations in contributions not only from the protein binding site but also from distant protein residues that may be less relevant for ligand binding.[22] The uncertainty in determining ΔG was also highlighted in a previous benchmarking study on g_mmpbsa,[24] in which it was observed that ΔG is more prone to be different (by more than 20 kJ mol–1) between results of g_mmpbsa and AMBER simulation tools[97,98] than the other MM/PBSA terms. This was attributed to the different algorithms implemented in APBS[63] (used in g_mmpbsa) and the PBSA package of AMBER and/or subtle differences in the implementation of the AMBER force field used in AMBER and GROMACS. As the MM/PBSA method only considers protein–ligand interactions and MM/PBSA was found to be the best performing MM/PBSA model in Table , we conducted further investigation on a LIE model with β set to zero and in which ΔV in eq is replaced by ⟨V⟩. This LIE model is annotated as vdw in Table and thus considers van der Waals interaction energies in the protein–ligand bound state only. It shows the highest accuracy of the LIE models presented in Table , with RMSE = 3.79 kJ mol–1 and r = 0.76. Interestingly, when calibrating a LIE model vdw with β = 0 and ΔV in eq replaced by ⟨V⟩, comparable performance was obtained as for, for example, LIE (Table ). Comparable correlation coefficients could even be obtained by considering ligand SASAs only (for which r2 = 0.43 and r = 0.66). In this particular case, binding prediction can apparently be made highly efficient by considering ligand simulations or properties only. This is probably again due to the dominance of nonpolar protein–ligand contacts in determining trends in binding affinity, but we note that the most predictive LIE and MM/PBSA models are the ones based on protein–ligand van der Waals interactions (Tables and 3), as exemplified, for example, by the high q2 value for the vdw LIE model.

Conclusions

By comparing fitted LIE models for a set of 27 compounds binding to SIRT1 with single-trajectory-based MM/PBSA models, we find calculated ΔG values from our LIE models with a RMSE of 4.2 kJ mol–1, while the obtained correlations are comparable (Pearson’s r = 0.72 and 0.64 for LIE and MM/PBSAε=2, respectively). This indicates that if agreement with experimental affinities in absolute terms is not of interest, both methods have a comparable predictive quality. We also found that longer equilibration may not necessarily lead to improved correlation with experiments and that longer simulations are more required for MM/PBSA than for LIE due to the relatively slow convergence of the MM/PBSA polar term. In addition to the higher cost in computing this term when compared to the terms in the central LIE equation, this makes LIE more efficient. For the studied protein and ligands, calibration of an iterative LIE model (that uses weighted contributions from multiple simulations starting from different ligand-binding poses) did not lead to satisfactory computations when generating starting poses for MD simulations in unsupervised docking and clustering. During these simulations, we found relatively low frequencies in ligand interactions with hot-spot protein residues, as analyzed using our Python library MDInteract. In other words, careful selection of proper MD starting poses can be crucial in generating iterative LIE models. Encouragingly, when combining results from simulations starting from the docked poses and from poses based on homologous crystal structure data (in the LIE model), Boltzmann-like weighting of the multiple simulations was able to identify appropriate starting configurations. For our simulations with starting structures obtained in a supervised way, we found relatively small values for the difference ΔV in electrostatic interactions in the bound and unbound state. Apparently, ligand interactions with hot-spot residues can largely compensate for the loss in polar interactions with the solvent upon binding. As a result, LIE parameter β was fitted to values close to zero. By constraining it to zero, we could obtain a model with reduced overfitting compared to models in which β was used as a free parameter. This was illustrated by a lower SDEP and higher q2 for the former model in leave-one-out cross validation. We found that the correlation between calculated and experimental ΔG values could be maximized by including protein–ligand van der Waals interactions only (i.e., either in a MM/PBSA model with V in eq as the only term or in a LIE model with α as the only free parameter and which only includes results from the bound-complex simulations). On the other hand, our models could be made most efficient by only including ligand properties, e.g., in a SASA-only-based model or in a LIE model that only includes van der Waals interaction energies involving the unbound ligand. The finding that predictive models could be obtained in terms of van der Waals interactions only for this specific protein is in line with the hydrophobic character of its binding site and with the uncertainty in determining the ΔG MM/PBSA term.

70 in total

1. Calculating structures and free energies of complex molecules: combining molecular mechanics and continuum models.

Authors: P A Kollman; I Massova; C Reyes; B Kuhn; S Huo; L Chong; M Lee; T Lee; Y Duan; W Wang; O Donini; P Cieplak; J Srinivasan; D A Case; T E Cheatham
Journal: Acc Chem Res Date: 2000-12 Impact factor: 22.384

2. Binding constants of neuraminidase inhibitors: An investigation of the linear interaction energy method.

Authors: I D Wall; A R Leach; D W Salt; M G Ford; J W Essex
Journal: J Med Chem Date: 1999-12-16 Impact factor: 7.446

3. What determines the van der Waals coefficient beta in the LIE (linear interaction energy) method to estimate binding free energies using molecular dynamics simulations?

Authors: W Wang; J Wang; P A Kollman
Journal: Proteins Date: 1999-02-15

4. Electrostatics of nanosystems: application to microtubules and the ribosome.

Authors: N A Baker; D Sept; S Joseph; M J Holst; J A McCammon
Journal: Proc Natl Acad Sci U S A Date: 2001-08-21 Impact factor: 11.205

5. Performance comparison of generalized born and Poisson methods in the calculation of electrostatic solvation energies for protein structures.

Authors: Michael Feig; Alexey Onufriev; Michael S Lee; Wonpil Im; David A Case; Charles L Brooks
Journal: J Comput Chem Date: 2004-01-30 Impact factor: 3.376

Review 6. Ligand binding affinities from MD simulations.

Authors: Johan Aqvist; Victor B Luzhkov; Bjørn O Brandsdal
Journal: Acc Chem Res Date: 2002-06 Impact factor: 22.384

Review 7. Lead discovery using molecular docking.

Authors: Brian K Shoichet; Susan L McGovern; Binqing Wei; John J Irwin
Journal: Curr Opin Chem Biol Date: 2002-08 Impact factor: 8.822

8. Fast, efficient generation of high-quality atomic charges. AM1-BCC model: II. Parameterization and validation.

Authors: Araz Jakalian; David B Jack; Christopher I Bayly
Journal: J Comput Chem Date: 2002-12 Impact factor: 3.376

9. Cation-pi interactions in structural biology.

Authors: J P Gallivan; D A Dougherty
Journal: Proc Natl Acad Sci U S A Date: 1999-08-17 Impact factor: 11.205

10. Molecular dynamics simulations and thermodynamics analysis of DNA-drug complexes. Minor groove binding between 4',6-diamidino-2-phenylindole and DNA duplexes in solution.

Authors: Nad'a Spacková; Thomas E Cheatham; Filip Ryjácek; Filip Lankas; Luc Van Meervelt; Pavel Hobza; Jirí Sponer
Journal: J Am Chem Soc Date: 2003-02-19 Impact factor: 15.419

11 in total

1. Remdesivir and Ledipasvir among the FDA-Approved Antiviral Drugs Have Potential to Inhibit SARS-CoV-2 Replication.

Authors: Rameez Hassan Pirzada; Muhammad Haseeb; Maria Batool; MoonSuk Kim; Sangdun Choi
Journal: Cells Date: 2021-04-29 Impact factor: 6.600

2. The interactions of folate with the enzyme furin: a computational study.

Authors: Zahra Sheybani; Maryam Heydari Dokoohaki; Manica Negahdaripour; Mehdi Dehdashti; Hassan Zolghadr; Mohsen Moghadami; Seyed Masoom Masoompour; Amin Reza Zolghadr
Journal: RSC Adv Date: 2021-07-06 Impact factor: 4.036

3. A Modified Arrhenius Approach to Thermodynamically Study Regioselectivity in Cytochrome P450-Catalyzed Substrate Conversion.

Authors: Rosa A Luirink; Marlies C A Verkade-Vreeker; Jan N M Commandeur; Daan P Geerke
Journal: Chembiochem Date: 2020-02-25 Impact factor: 3.164

4. Computational Simulations Identified Marine-Derived Natural Bioactive Compounds as Replication Inhibitors of SARS-CoV-2.

Authors: Vikas Kumar; Shraddha Parate; Sanghwa Yoon; Gihwan Lee; Keun Woo Lee
Journal: Front Microbiol Date: 2021-04-21 Impact factor: 5.640

5. Computing Cellulase Kinetics with a Two-Domain Linear Interaction Energy Approach.

Authors: Kay S Schaller; Jeppe Kari; Gustavo A Molina; Kasper D Tidemand; Kim Borch; Günther H J Peters; Peter Westh
Journal: ACS Omega Date: 2021-01-06

6. Molecular Modeling Studies of N-phenylpyrimidine-4-amine Derivatives for Inhibiting FMS-like Tyrosine Kinase-3.

Authors: Suparna Ghosh; Seketoulie Keretsu; Seung Joo Cho
Journal: Int J Mol Sci Date: 2021-11-19 Impact factor: 5.923

7. Accurate Binding Free Energy Method from End-State MD Simulations.

Authors: Ebru Akkus; Omer Tayfuroglu; Muslum Yildiz; Abdulkadir Kocak
Journal: J Chem Inf Model Date: 2022-08-16 Impact factor: 6.162

8. Exploring the dynamic mechanism of allosteric drug SHP099 inhibiting SHP2^E69K.

Authors: Xin-Hua Lu; Wei-Ya Li; Shan Du; Li-Peng Li; Yang-Chun Ma; Liang Zhou; Jing-Wei Wu; Ying Ma; Run-Ling Wang
Journal: Mol Divers Date: 2021-01-03 Impact factor: 2.943

9. Computational Investigation Identified Potential Chemical Scaffolds for Heparanase as Anticancer Therapeutics.

Authors: Shraddha Parate; Vikas Kumar; Jong Chan Hong; Keun Woo Lee
Journal: Int J Mol Sci Date: 2021-05-18 Impact factor: 5.923

10. V367F Mutation in SARS-CoV-2 Spike RBD Emerging during the Early Transmission Phase Enhances Viral Infectivity through Increased Human ACE2 Receptor Binding Affinity.

Authors: Junxian Ou; Zhonghua Zhou; Ruixue Dai; Jing Zhang; Shan Zhao; Xiaowei Wu; Wendong Lan; Yi Ren; Lilian Cui; Qiaoshuai Lan; Lu Lu; Donald Seto; James Chodosh; Jianguo Wu; Gong Zhang; Qiwei Zhang
Journal: J Virol Date: 2021-07-26 Impact factor: 5.103