Literature DB >> 35415339

Protein-Protein Binding Free Energy Predictions with the MM/PBSA Approach Complemented with the Gaussian-Based Method for Entropy Estimation.

Abstract

Here, we present a Gaussian-based method for estimation of protein-protein binding entropy to augment the molecular mechanics Poisson-Boltzmann surface area (MM/PBSA) method for computational prediction of binding free energy (ΔG). The method is termed f5-MM/PBSA/E, where "E" stands for entropy and f5 for five adjustable parameters. The enthalpy components of ΔG (molecular mechanics, polar and non-polar solvation energies) are computed from a single implicit solvent generalized Born (GB) energy minimized structure of a protein-protein complex, while the binding entropy is computed using independently GB energy minimized unbound and bound structures. It should be emphasized that the f5-MM/PBSA/E method does not use snapshots, just energy minimized structures, and is thus very fast and computationally efficient. The method is trained and benchmarked in 5-fold validation test over a data set consisting of 46 protein-protein binding cases with experimentally determined dissociation constant K d values. This data set has been used for benchmarking in recently published protein-protein binding studies that apply conventional MM/PBSA and MM/PBSA with an enhanced sampling method. The f5-MM/PBSA/E tested on the same data set achieves similar or better performance than these computationally demanding approaches, making it an excellent choice for high throughput protein-protein binding affinity prediction studies.

Entities: Chemical

Year: 2022 PMID： 35415339 PMCID： PMC8991903 DOI： 10.1021/acsomega.1c07037

Source DB: PubMed Journal: ACS Omega ISSN： 2470-1343

Introduction

Protein–protein interactions (PPIs) are involved in diverse kinds of cellular processes, and any deviation from the wild-type PPIs may be deleterious. Thus, aberrant PPIs have been associated with various diseases, including cancer, neurodegenerative, and infectious diseases.[1−3] Understanding the nature of PPIs and responsible features lays a foundation for studying their association with diseases and reveals the molecular mechanism causing it.[4−7] Revealing these assists development of pharmaceutical interventions to modulate disease-associated dysfunctional PPIs and restores the wild-type function.[1,8−15] However, only a tiny fraction of existing PPIs has been experimentally explored mainly because of sophisticated experimental setup, high cost, and labor-intensive requirements.[16−18] Cost-effective and less resource-demanding computational methods provide an alternative by predicting binding affinity (or binding free energy ΔG).[19−23] Despite the efforts and advancement in the computational methodologies, prediction of absolute protein–protein ΔG is a very challenging venture; protein–protein ΔG predictions show low correlation with experimentally determined ΔG.[21−29] The poor correlation between predicted and experimental ΔG stems from two factors: quality of experimental data[30] and accuracy of computational methods.[31] One can frequently observe that experimental ΔG values reported for the same PPI by different researchers do not agree.[32−35] Typically, this is due to different experimental conditions or experimental techniques[32−36] which are not clearly reported in the corresponding publication. On the other hand, computational methods suffer from structural imperfections, insufficient sampling, an inability to incorporate adequate experimental conditions,[31,37] imperfections/shortcomings in the energy functions,[38] and approximations and idealizations made in the statistical mechanics treatment.[39] The protein–protein ΔG prediction methods can be broadly grouped into two categories depending on the information used for making the predictions: (i) sequence-based methods[21,25] and (ii) structure-based methods.[22,23,27] The sequence-based methods for predicting PPI affinity utilize the sequence information of the binding proteins by extracting sequence features/patterns, evolutionary information, physicochemical properties of amino acids, and so on, for the proteins in the benchmarking data set. A subset of the benchmarking data set called the training set is used to learn the relationships between various features and ΔG. This learning phase of model generation is called training. In the next phase, the learned associations of features with ΔG are used for benchmarking the predictions for the test set examples in the data set.[21,25] The second class of methods utilizes the protein structure information for developing a model for ΔG predictions, which can further be divided into two classes: empirical methods[22,27] and physics-based methods which vary in physical plausibility, computational cost, and accuracy.[23,26] Among the physics-based methods, the thermodynamic integration (TI) and free energy perturbation (FEP) are expected to have high accuracy but are computationally costly[40] and have thus been mostly used for receptor–ligand binding free energy calculations.[41] Alternatively, methods like molecular mechanics Poisson–Boltzmann surface areas (MM/PBSA),[42] molecular mechanics generalized Born surface areas (MM/GBSA),[43] and linear interaction energy (LIE)[44] are computationally less demanding but are expected to be less accurate.[45] This inaccuracy stems from the traditional protocol that (a) does not account for entropy and (b) neglects the effect of explicit waters. With regard to entropy, the failures of the MM/PBSA or MM/GBSA methods are considered to be due to approximations made in the statistical mechanical treatment,[46,47] approximations and limitations of the entropy methods,[48] inability to do statistically converged sampling of all of the relevant conformations of the systems, and so on.[45,49] Correction of the effects of explicit waters can be partially done via appropriate modeling of the dielectric function,[50−53] although this will not correct for specific water molecules’ interactions with the macromolecules. To partially address these issues of the traditional MM/PBSA protocol, here we report a single frame f5-MM/PBSA method complemented with an estimator of entropy. It is important to note that the method does not use snapshots taken from MD simulations; rather, it uses only the energy minimized structures of the complex and monomers. The goal of this work is to develop a fast and accurate protocol for computing the absolute ΔG via calculating the average enthalpy and entropy associated with the binding. The Boltzmann averaged enthalpy in the traditional MM/PBSA method is calculated as the average of enthalpy over the frames (snapshots) taken from the corresponding MD simulations. This requires long MD simulations and sequential intensive energy modeling. However, we have shown in several works that energy minimized structures provide solvation energy which is very similar to the solvation energy obtained over an ensemble of MD snapshots.[51,54,55] This is true for both traditional two-dielectric PB and Gaussian-based PB.[50−52] We will use this observation in the current protocol and will model the enthalpy using a single frame, the energy minimized X-ray structure. The second component of the method is the evaluation of entropy change caused by the binding. This is important, since proteins are not static molecules and, frequently, they experience significant conformational changes upon the binding. Thus, neglecting entropy in the protocol that predicts ΔG could have a large effect on the accuracy of the method.[26] However, the evaluation of the change of entropy due to binding (or any other process) is not a trivial task. Most often, the entropy estimation requires extensive phase space sampling, through molecular dynamics (MD) simulations for the complex and the unbound monomers with simulation times often up to several microseconds.[56−58] The application of parametric configurational entropy methods, e.g., normal mode analysis[59] (NMA), quasi-harmonic analysis[60] (QHA), and multiscale cell correlation[61] (MCC), requires comparably lesser conformational sampling (usually several 100 ns to microseconds MD simulations); however, they fail to account for anharmonicity and multimodality of atomic fluctuations. Furthermore, these methods are still very computationally intensive to be applicable for large-scale calculations. Recently, a method called interaction entropy[62] has also been reported which computes entropy from the fluctuations of interaction energies, bypassing the diagonalization of the Hessian matrix (in NMA) or coordinate covariance matrix (in QHA). However, it also requires the MD simulations to find the distributions of interaction energies. In addition to the above-mentioned methods which utilize the forces and fluctuation of atomic positions, a molecular-geometry-based method, the “solvent-accessible-surface-area-based method”, for estimating conformational entropy was also reported.[63] However, this method also requires conformational sampling via MD simulations. The list of entropy estimation methods can be further extended, but practically all existing methods require significant simulation time. Here, we present a method that does not require extensive conformational sampling and thus is very fast. The proposed method uses the energy minimized structure of the protein–protein complex and corresponding unbound monomers. It is based on the Gaussian-based PB approach, where the density of protein molecules is modeled as a function of the atomic packing. Thus, in the core of the solute, the density is high and the ability of side chains to sample different conformations is highly restricted. In contrast, in low density regions, the residues are capable of sampling different conformations, since there is room for side chain reorientation. Thus, in this work, the entropy change upon complex formation is then estimated via the change of accessible side chain rotamers evaluated with Gaussian-based density calculations from unbound to bound states. The method, f5-MM/PBSA/E (where E stands for entropy and f5 stands for five adjustable parameters), is benchmarked against a set of experimental ΔG values frequently used to assess the performance of ΔG predictors,[23,26] and it is shown that the inclusion of entropy greatly improves the accuracy of predictions.

Results and Discussion

To check the sensitivity of results with respect to enthalpic and entropic contributions, we explored three energy formulas, as described by eqs –3. This was done to see the sensitivity of the results with respect to the different energy components, emphasizing the entropy component. Furthermore, the sensitivity of the results was tested for the solvation models, and three generalized Born (GB) models were utilized. The predictions done with each energy formula and GB model were tested against the data set PPI-46 (see the Materials and Methods section). Furthermore, the effects of other parameters, the value of the internal dielectric constant and the variance of the Gaussian distribution, were also tested by systematically varying them and predicting ΔG compared with experimental ones. Here, we assess the performance via comparing the performance of the three energy formulas via multiple linear regression (MLR). In contrast to the standard MM/PBSA model, these models are three-, four-, and five-parameter fitted models as expressed in eqs –3. To reflect it, we call these models f3-MM/PB (eq : model 1), f4-MM/PBSA (eq : model 2), f5-MM/PBSA – TΔSGaussianEntropy, or f5-MM/PBSA/E (eq : model 3).Additionally, we used all three energy models over the PPI-46 data set for all three GB models for 5-fold repeated cross-validation, where 80% of the cases are randomly selected as a training set and the remaining are held for testing the model performance. The process is repeated 25 times, and the analysis results are discussed.

Benchmarks for the PPI-46 Data Set

We have evaluated the three energy models, f3-MM/PB, f4-MM/PBSA, and f5-MM/PBSA/E, over the PPI-46 benchmarking data set. The GBneck2 minimized structures performed best (for modeling enthalpy components) in terms of the Pearson correlation coefficient (PCC) and root mean squared error (RMSE) for all three models shown in Figure . Therefore, here we will discuss results only for the GBneck2 energy minimized structures set in the data set in detail for modeling enthalpy. At the same time, the effect of all three GB models will be presented in the case of entropy.

Figure 1

Summary of the performance of the three energy models (model 1, f3-MM/PB, i.e., eq ; model 2, f4-MM/PBSA, i.e., eq ; model 3, f5-MM/PBSA–TΔSGE, i.e., eq ) with varying internal dielectric constant for the GBneck2 energy minimized set of structures of the PPI-46 benchmarking data set. (a) PCC vs internal dielectric constant value; (b) RMSE vs internal dielectric constant value. We observe that model 1, which considers only ΔGMM and ΔGPB energy terms shows the lowest PCC = 0.429 and largest RMSE = 3.044 kcal/mol for solute dielectric constant ϵin = 1. Here slight improvement in the performance is observed when ϵin is varied from 1 to 10, as PCC goes to 0.436 from 0.429 and RMSE comes down to 3.034 kcal/mol from 3.044 kcal/mol. However, when the non-polar solvation energy term ΔGnon-polar is also included in it, i.e., model 2, not only does the performance of the model improve (increases the PCC to 0.51 and decreases the RMSE to 2.934 kcal/mol), but the effect of variation of solute dielectric constant is also absorbed. After that, we investigated the performance of model 3 which includes Gaussian-based binding entropy along with ΔGMM, ΔGPB, and ΔGnon-polar. We found that after the inclusion of entropy the prediction accuracy improves significantly, as PCC increases to 0.658 kcal/mol and RMSE decreases to 2.60 kcal/mol (Table ). The constant coefficients in the models are summarized in Table . This implies that Gaussian-based entropy captures important information about the protein–protein binding which is missing in the conventional MM/PBSA. In this data set, we did not find any influence of variation of salt concentration on the performance of models evaluated in terms of PCC and RMSE (we varied the salt concentration from 0 to 0.3 M in increments of 0.02 M).

Table 1

Performance of Tested Energy Models and GB Model Combinations over the PPI-46 Data Seta

	model 1b		model 2c		model 3d
GB modele	PCCf	RMSEg	PCC	RMSE	PCC	RMSE
OBC-II	0.423	3.054	0.498	2.957	0.593	2.779
GBneck	0.425	3.051	0.503	2.948	0.631	2.678
GBneck2	0.429	3.044	0.510	2.934	0.658	2.600

f3-MM/PB model (eq ).

f4-MM/PBSA model (eq ).

f5-MM/PBSA/E model (eq ) with parameters σ = 1.20, r = 0.8, and cutoff radius 4.0 Å.

Generalized Born model used in energy minimization of the structures.

Pearson correlation coefficient.

Root mean squared error.

Table 2

Parameter Constant Coefficients in Models 1–3a

		constant coefficient parameter
GBb	EMc	c₁	c₂	c₃	c₄	c₅
OBC-II	model 1d	–9.515405	0.000148	–0.001327
	model 2e	–7.151102	0.000159	–0.000811	0.283298
	model 3f	–12.505926	0.000182	–0.001518	0.971501	0.738526
GBneck	model 1	–9.525939	0.000155	–0.001369
	model 2	–7.129749	0.00016	–0.000882	0.288981
	model 3	–11.449473	0.000184	–0.001839	0.897985	0.680234
GBneck2	model 1	–9.511868	0.000153	–0.0013835
	model 2	–7.038626	0.000161	–0.000852	0.296255
	model 3	–12.070194	0.000147	–0.001632	1.049871	0.777386

Constant coefficient parameters of three energy models (eqs –3) and the GB model used in energy minimizations are summarized for the PPI-46 data set with ϵin = 1.

GB model.

Energy model.

MM/PB model (eq ).

MM/PBSA model (eq ).

MM/PBSA/E model (eq ) with parameters σ = 1.20, r = 0.8, and cutoff radius 4.0 Å.

Results of benchmarking the performance of three energy models (eqs –3) and the GB model used in energy minimizations are summarized for the PPI-46 data set with ϵin = 1; all of the RMSEs are reported in kcal/mol. f3-MM/PB model (eq ). f4-MM/PBSA model (eq ). f5-MM/PBSA/E model (eq ) with parameters σ = 1.20, r = 0.8, and cutoff radius 4.0 Å. Generalized Born model used in energy minimization of the structures. Pearson correlation coefficient. Root mean squared error. Constant coefficient parameters of three energy models (eqs –3) and the GB model used in energy minimizations are summarized for the PPI-46 data set with ϵin = 1. GB model. Energy model. MM/PB model (eq ). MM/PBSA model (eq ). MM/PBSA/E model (eq ) with parameters σ = 1.20, r = 0.8, and cutoff radius 4.0 Å. We compare our results with other studies using the MM/PBSA protocol for the same data set[64] to assess the performance of our method. Fu Chen et al. reported a MM/PB(GB)SA study over the same data set (PPI-46) using ff99, ff02, ff03, and ff14SB force fields, various ϵin values of 1, 2, 4, and 6, and implicit water and explicit water minimization and MD simulations and assessing the performance using a linear regression where the total binding energy predicted from their method is regressed against the experimental binding energy. This study found that MM/PBSA gives the best PCC value of 0.523, when the force field is ff03 and minimization is done in implicit water.[64] We achieved a higher value of PCC = 0.658 over the same data set from single energy minimized structures using MM/PBSA combined with the Gaussian-based binding entropy (see Figure ). However, here we used a different variation of standard MM/PBSA, which we denote as f5-MM/PBSA/E. All benchmarking results are summarized in Table .

Figure 2

Summary of the performance of the three energy models at ϵin = 1 for the GBneck2 energy minimized set of structures of the PPI-46 benchmarking data set. Scatter plots are shown of predicted vs experimental for (left panel) model 1, (middle panel) model 2, and (right panel) model 3. It should be noted that Gaussian entropy does not depend on solute dielectric and salt concentration, as it depends only on the local mean Gaussian densities. However, it depends on the Gaussian variance σ of the Gaussian dielectric model which is implemented in the Poisson–Boltzmann equation (PBE) solver DelPhi,[65] the cutoff radius used to define the local region around the atoms, and the decay rate parameter r in exponential interpolation which controls the curvature of the interpolation curve (Figure a). The σ and cutoff radius parameters affect the mean Gaussian density computation, while r influences the number of effective conformations during the interpolation step of the method. Therefore, we also investigated the influence of parameters related to the Gaussian-density-based method of binding entropy estimation. The three related parameters—σ, the cutoff radius for defining the local region around the atoms, and the decay rate parameter r—are varied. The results of the variation of these parameters on the performance of model 3 are presented in Figure .

Figure 7

Illustration of the interpolation scheme. (a) The effect of the value of the decay rate parameter, r, on the curvature of the exponential decay curve. (b) Illustration of obtaining the number of effective conformations for an example case where the maximum conformations is 3 at a minimum mean Gaussian density of 0.5, with the number of effective conformations (≈2) corresponding to a mean Gaussian density of 0.7 shown.

Figure 3

Influence of variation of the parameters Gaussian variance σ (a and b), cutoff radius for atom surrounding (c and d), and decay rate r (e and f) for Gaussian binding entropy on the PCC (a, c, and e) and RMSE (b, d, and f) of model 3 for the PPI-46 data set. In the PPI-46 data set, the performance of model 3 improves with increasing values of the Gaussian variance parameter, which was varied from 1.0 to 1.30 in increments of 0.05. We obtained best results when σ = 1.20 (Figure a,b). Similarly, on varying the cutoff radius from 3.0 to 8.0 Å in steps of 0.5 Å, we observe improvement in the performance of model 3 which increases initially and after 3.5–4.0 Å starts decreasing with increasing cutoff radius (Figure c,d). The decay rate parameter for the exponential interpolation curve r used to infer the number of effective conformations as a function of the mean Gaussian density of the relevant atoms of a given side chain torsion χ of some residue j which is an amino acid AA is also systematically varied and the performance is tested. We obtained the best correlation at r = 0.8, when we varied it from 0.5 to 1.2 in increments of 0.1 (Figure e,f). In summary, we reached the optimal parameters for Gaussian-density-based binding entropy for the PPI-46 data set which are σ = 1.20, cutoff radius = 4.0 Å, and decay rate of interpolation curve r = 0.8. The best GB model for the PPI-46 data set is GBneck2; however, the effect on performance is small and PCC and RMSE are comparable among the GB models.

5-Fold Cross-Validation of Models

After discussing the performance of the protocol on the whole PPI-46 data sets, we would like to discuss the validation of the protocol. The whole data set must be split into disjoint training and testing data sets for performance assessment. Considering the small size of the PPI-46 data set, we start by splitting the PPI-46 data set into the training set (80% randomly selected) and the remaining as the testing set. The process is repeated 25 times to give 25 different training and associated testing sets. The cases in the training set are used to find the parameter values for the f3-MM/PB, f4-MM/PBSA, and f5-MM/PBSA/E models for all three sets of GB energy minimized structures (using OBC-II, GBneck, and GBneck2), and these parameter values are used for making predictions for testing set cases. The PCC and RMSE values over the training and testing sets are recorded for each repetition, and average and standard deviations of values are tabulated (Table ) and discussed hereafter.

Table 3

Summary of Cross-Validation Results of the Combinations of Three Energy Models and the GB Model Used for Energy Minimizationa

		training		testing
GBb	EMc	PCC	RMSE	PCC	RMSE
OBC-II	model 1d	0.440 ± 0.045	3.065 ± 0.104	0.496 ± 0.241	3.245 ± 0.672
	model 2e	0.513 ± 0.044	2.968 ± 0.094	0.519 ± 0.178	3.295 ± 0.832
	model 3f	0.596 ± 0.040	2.818 ± 0.098	0.651 ± 0.151	2.958 ± 0.894
GBneck	model 1	0.442 ± 0.045	3.061 ± 0.106	0.495 ± 0.242	3.246 ± 0.678
	model 2	0.520 ± 0.044	2.955 ± 0.098	0.522 ± 0.182	3.313 ± 0.867
	model 3	0.653 ± 0.042	2.657 ± 0.151	0.609 ± 0.205	3.175 ± 1.130
GBneck2	model 1	0.447 ± 0.045	3.053 ± 0.106	0.503 ± 0.241	3.255 ± 0.694
	model 2	0.527 ± 0.044	2.941 ± 0.010	0.529 ± 0.180	3.306 ± 0.884
	model 3	0.675 ± 0.040	2.586 ± 0.135	0.653 ± 0.151	3.091 ± 1.103

GB model.

Energy model.

MM/PB model (eq ).

MM/PBSA model (eq ).

MM/PBSA/E model (eq ) with parameters σ = 1.20, r = 0.8, and cutoff radius 4.0 Å.

Summary of PCC and RMSE over training (randomly selected 80%) and testing (remaining 20%) of the PPI-46 data set for the three energy models (eqs –3) and the GB model used in energy minimizations with ϵin = 1, repeated 25 times; mean and standard deviation are provided. GB model. Energy model. MM/PB model (eq ). MM/PBSA model (eq ). MM/PBSA/E model (eq ) with parameters σ = 1.20, r = 0.8, and cutoff radius 4.0 Å. As shown in Table , the f3-MM/PB (model 1) shows the mean PCC between 0.440 and 0.447 for all three GB minimized structure sets for the training sets, RMSE is between 3.065 and 3.053 kcal/mol, and there are similar average values of PCC and RMSE also for the testing sets (Table ). The four-parameter model f4-MM/PBSA (model 2) shows better performance in terms of PCC (higher) and RMSE (smaller). After inclusion of entropy as in the f5-MM/PBSA/E (model 3), we consistently observe a significant increase in PCC and decrease in RMSE for all three sets of GB minimized structures across the training and testing sets (Table ). The improvement for both the training and testing set performance implies that Gaussian entropy provides important information about the binding which is missing in f4-MM/PBSA, and thus, the improvement is not merely due to the increased number of parameters used in the model.

Computation Time

To analyze the computational requirements of the Gaussian-based method of entropy reported here, we recorded the execution time for all of the protein–protein cases in the PPI-46 data set. The computations are performed on compute nodes of the Palmetto Cluster of Clemson University. The nodes used for computation have Intel Xeon E5520 processors which have 8 MB cache and 2.26 GHz frequency. All of the jobs are run on a single core of a node, and the maximum memory reserved for a job was 24 GB. The computations were run three times for each case, and the average times taken in minutes vs the number of residues in the protein–protein complex are shown in Figure . Each of these involves running three DelPhi runs and associated Gaussian-based entropy computation for the complex, and the corresponding two unbound proteins.

Figure 4

Number of residues in the protein–protein complex vs total computation time for running DelPhi and computing entropy from the Gaussian-density map data.

Number of residues in the protein–protein complex vs total computation time for running DelPhi and computing entropy from the Gaussian-density map data. As shown in Figure , the data set contains a very wide range of sizes of protein–protein complexes varying from 116 to 1107 residues. For all of the cases, the total computation time taken in minutes is linearly related to the size of the complex. These computations do not require parallel processing and are very computation time and memory efficient. Thus, these can be performed even on standard desktops, in contrast to other entropy methods which require significantly higher computation resources and time.

Materials and Methods

Overview of the Method

The proposed method combines MM/PBSA with an estimator of entropy. It does not use multiple structures obtained via MD simulations, but rather, it deals with a single structure (Figure ). Thus, the enthalpy components, the MM/PBSA energies, were computed using the energy minimized 3D structure of the protein–protein complex and utilizing a rigid-body approach; i.e., the structures of unbound monomers were taken from the energy minimized 3D structure of the complex. Thus, the bonded energy cancels out and is not calculated (Figure a). In parallel, we tested a protocol that does independent energy minimization of separated monomers. However, the results were found to be less accurate (data not shown) than the rigid-body approach, and thus, in the rest of the paper, we present only the results obtained with enthalpy components modeled with the rigid-body protocol. In the case of entropy estimation, it was found that the rigid-body protocol does not perform well, and thus, the structures of the monomers were energy minimized independently from the energy minimization of the complex and used for entropy change calculations (Figure b). Further discussion is provided in the conclusions section of the manuscript.

Figure 5

Schematic representation of protocols used for computing enthalpy and entropy components of protein–protein binding free energy. (a) The enthalpy energy components are computed over the energy minimized complex structure, and monomers are extracted from it. (b) The entropy estimation is done in a protocol that the complex and two monomers are energy minimized independently.

Benckmarking Data Set

In the present work, we will be using a binding affinity benchmarking data set of 46 protein–protein experimental ΔG values published by Kastritis et al.[28,66] This data set lists the PDB ID of the structure, chains of protein 1 and protein 2, equilibrium dissociation constant Kd, experimental temperature, and pH for most of the cases. The PDB IDs, chains of proteins, and pKd values are provided in the Supporting Information (Table S1). This data set has been used for benchmarking MM/PBSA and MM/GBSA methods in combination with several force fields.[64] We will refer to this data set as PPI-46 from now on. This data set covers a broad range (10 orders of magnitude) of experimental binding affinities.

Structure Preparation and Minimization

The structures of the protein–protein complexes in the PPI-46 data set were downloaded from the RCSB Protein Data Bank,[67,68] and chains mentioned in the data set were extracted. All of the water and hetero atoms were deleted from the structures. All titratable residues were protonated according to the neutral pH state, and all of the histidine residues were kept neutral by placing a proton at the epsilon position. The charges and parameters were kept consistent with the AMBER ff14SB[69] force field. For structure minimizations, we have used three OBC-II,[70] GB-neck,[71] and GB-neck2[72] implicit solvent generalized Born (GB) models implemented in AMBER.[73] The parameter+topology (.prmtop) and starting coordinate (.inpcrd) files were created using the LEaP program in AmberTools18.[73] The starting structures were energy minimized using each of the three GB models to yield three sets of structures for protein–protein complexes. The energy minimization is performed in two stages first while restraining all of the heavy atoms using a 10 kcal·mol–1·Å–2 harmonic potential for 8000 steps of steepest descent (SD) and 2000 steps of conjugate gradient (CG) followed by 8000 steps of SD and 2000 CG without restraint. We have used the rigid-body protocol for the computation of enthalpy components of the MM/PBSA approach, and the unbound protein structures were extracted from the energy minimized complex structures. However, the entropy component of the binding energy which has great sensitivity to conformational changes upon complex formation[24,74,75] was estimated from separately minimized unbound proteins and complexes. The unbound protein structures were extracted from each complex in the data set prior to energy minimization of the complex. These structures were also prepared and energy minimized using the same protocol to yield three sets corresponding to the above-mentioned three GB models.

Binding Free Energy Computation (Enthalpy Component)

Here, the molecular mechanics (MM) part of the binding energy (ΔGMM) is computed using the rigid-body protocol so ΔGbonded = 0. The unbound protein structures are extracted from the energy minimized structure of the complex by removing the partner protein. The nonbonded terms of the binding energy (ΔGnonbonded) and polar solvation energy (ΔGpolar) are computed using the popular numerical Poisson–Boltzmann equation solver DelPhi[65,76] employing the traditional two-dielectric model using charge and radii from the AMBER ff14SB[69] force field, a grid spacing of 0.5 Å, the longest solute dimension filling of 70% of the grid box, and a solvent dielectric constant of 80; hence, it will be referred to as ΔGPB hereafter. The traditional two-dielectric model was applied instead of the Gaussian-based model[50,52,77] because the Gaussian-based atomic density model is used to estimate the entropy as outlined below. The solute dielectric constant and the salt concentration are varied to study the influence of these parameters on the prediction accuracy of the method. The non-polar solvation energy (ΔGnon-polar) is estimated from the change of the solvent accessible surface area (SASA) (eq ).The change in SASA (ΔSASA) is computed as the difference of the SASA of unbound proteins from complex in Å2; the surface tension γ = 0.00542 kcal·mol–1·Å–2 and the correction term b = 0.92 kcal/mol are used. The SASA with a solvent probe radius of 1.4 Å is computed using Visual Molecular Dynamics.[78]

Estimation of Entropy Change upon Binding

The basic idea is to estimate the change of the side chain entropy change upon the binding by evaluating the accessible rotamers of the corresponding amino acids in monomeric versus bound states. To avoid the complexity associated with continuum conformation space, each amino acid side chain is considered to have a finite number of rotamers taken from Dunbrack library of rotamers.[79] When an amino acid is free in the water phase, it is considered that its side chain can sample all rotamers provided in the Dunbrack library[79] (see the left panel in Figure ). When it is part of the bound structure (protein–protein complex) or unbound structure (protein structure obtained after removing the binding partner structure from the complex), not all rotamers are accessible because of the presence of atoms of neighboring residues (right panel in Figure b). Thus, the change of accessible rotamers from unbound to bound states for each amino acid of the proteins forming a complex is used to estimate the entropy change upon the binding. Below, we outline the details about (i) modeling atomic density (which will be used to decide if a rotamer is accessible or not), (ii) building a reference library of atomic densities for free amino acids, and (iii) estimation of accessible rotamers.

Figure 6

A schematic representation of the idea of Gaussian-based entropy. An ILE residue of a protein is shown in black and white ball and stick. Neighboring atoms in radius 4 Å are shown with semitransparent cyan spheres. Left panel: all possible side chain conformations of ILE in unbound protein. Right panel: only one rotamer of the same ILE is accessible due to the presence of neighboring atoms of the binding partner. (i) Modeling Atomic Density: Here we build upon our previously proposed Gaussian-density-based model of atoms.[50,52] In the Gaussian-based model of an atom, an atom is represented as a probability density ρ(r⃗) at any arbitrary point r⃗ in space due to the ith atom, the ρ(r⃗) is maximum, i.e., 1 at its center, and it decreases according to a Gaussian distribution as we move away from it (eq and Figure S1). In a multiatomic molecule, the Gaussian density at any point in space r⃗ is resultant of atomic densities due to all of the atoms (eq )where R is the van der Waals radius of the ith atom and σ is the variance of the Gaussian distribution.The Gaussian density varies from 0 to 1 in space and expresses the extent of atomic packing; a value of 0 corresponds to a point where there are no atoms of the molecule and 1 corresponds to centers of atoms of the molecule, with any other value in the range corresponding to a higher density of atoms for a higher Gaussian density. (ii) Building a Reference Gaussian Density Library: The ability to occupy different conformational states for each side chain torsion angle of each amino acid (AA) is hindered due to spatial packing around the corresponding atoms. Thus, the first step is to identify atoms that participate in the side chain torsion angles of each of 20 standard amino acids (Table S2 in the Supporting Information). Then, the average Gaussian density is computed using the 3D structure of the isolated amino acid and applying the Gaussian subroutine implemented in the PBE solver DelPhi.[65] The mean Gaussian density is computed as the average of Gaussian densities on all of the grid points in a cutoff radius (say 5 Å) from the center of all relevant atoms (as described above) to χ of a given AA and averaged to obtain the mean Gaussian density (ρ̅χ). The ρ̅χ computed from the isolated amino acid structure is called the minimum mean Gaussian density minρ̅χ. The maximum number of conformations to a given χ of amino acid AA, i.e., nConfχ, is obtained from the Dunbrack rotamer library.[79] The pair of minρ̅χ and nConfχ for each χ of each amino acid AA is saved in the library for later use for obtaining the effective number of conformations via interpolation. (iii) Computation of Effective Number of Conformations: The corresponding protein structure (the bound and unbound structures/structure of protein obtained after removing the binding partner structure from the complex; see Figure b) is energy minimized using the protocol described above. The mean Gaussian density ρ̅ for each χ of each residue j is computed. Then, an effective number of conformations is obtained by the exponential interpolation scheme (Figure b) having two boundary points: (a) the maximum number of conformations (maxnConfχ) that are available in isolated residue AA for side chain torsion χ and the yield associated mean Gaussian density termed “minimum mean Gaussian density” and (b) the minimum number of conformations (minnConfχ), i.e., 1 when the mean Gaussian density is the maximum possible value of 1. The increase in ρ̅ in protein w.r.t. minρ̅χ due to more compact “dense” surrounding relevant atoms causes a decrease in the effective number of conformations as expressed in eq where d = (ρ̅ – minρχ)/(1 – minρχ), k = log(maxnConfχ/minnConfχ), a = 1/exp(−k), and r is the decay rate parameter of the interpolation curve (Figure a). Illustration of the interpolation scheme. (a) The effect of the value of the decay rate parameter, r, on the curvature of the exponential decay curve. (b) Illustration of obtaining the number of effective conformations for an example case where the maximum conformations is 3 at a minimum mean Gaussian density of 0.5, with the number of effective conformations (≈2) corresponding to a mean Gaussian density of 0.7 shown. The effective number of conformations of every side chain torsion χ of the residue j which is amino acid AA are multiplied to obtain the effective number of conformations of the residue in the protein (eq ).Finally, we take the logarithm of the effective number of conformations nConf of the residue j to get its entropy S in the protein (eq ). The sum of the entropy of all of the residues in the protein yields the entropy of the protein (eq ).Thus, the entropies of the protein–protein complex Scomplex and unbound proteins Sprotein1 and Sprotein2 are computed and then the binding entropy (ΔSbind) is obtained via subtracting the sum of entropy of unbound proteins from that of the complex (eq ).

Conclusions

In this work, we presented a Gaussian-density-based method for estimation of the entropy change caused by protein–protein binding. This method hypothesizes that isolated amino acid side chains are free to rotate and occupy all of the accessible conformations equiprobably; however, this ability is restricted to a certain extent due to increased atomic packing (estimated via Gaussian density) when the amino acid is a part of a protein or a protein–protein complex. Thus, the change of accessible conformers from unbound to bound states is used to estimate the entropy change induced by the binding. Combining the entropy change with MM/PBSA enthalpic components resulted in the f5-MM/PBSA/E method which was tested on a popular data set PPI-46. It is important to mention that the MM/PBSA/E method uses energy minimized structures of the complex and unbound monomers only, not snapshots obtained via MD simulations, and thus, it is fast and less computationally demanding than traditional MM/PBSA methods. Despite that, the f5-MM/PBSA/E method achieves a similar or better performance than other methods, as benchmarked against experimentally determined ΔG values from the PPI-46 data set. The protocol, the f5-MM/PBSA/E, considers both enthalpic and entropic contributions to the free energy of binding. However, the optimal conditions for modeling enthalpic and entropic components were found to be different: the best performance was obtained with the rigid-body protocol for enthalpic component calculations, while the optimal performance for the entropic component was achieved when bound and unbound structures were independently energy minimized. The main reason why the rigid-body approach worked better for enthalpic components modeling was the cancellation of bonded interactions. Our attempt to use independently minimized bound and unbound structures for enthalpic calculations resulted in large “bonded interaction” energies which dominated all other components. In contrast, the best performance in estimating the entropy change caused by the binding was found when one uses independently minimized bound and unbound structures. This observation reflects the nature of the Gaussian-based method for entropy estimation, which is geometry-based. Thus, small structural changes caused by independently minimizing bound and unbound structures have a significant effect on the entropy change calculations. This indicates that further improvement may be expected if one extends the Gaussian-based entropy estimator method to include backbone changes from unbound to bound states.

68 in total

1. Extraction of configurational entropy from molecular simulations via an expansion approximation.

Authors: Benjamin J Killian; Joslyn Yundenfreund Kravitz; Michael K Gilson
Journal: J Chem Phys Date: 2007-07-14 Impact factor: 3.488

2. Proteins feel more than they see: fine-tuning of binding affinity by properties of the non-interacting surface.

Authors: Panagiotis L Kastritis; João P G L M Rodrigues; Gert E Folkers; Rolf Boelens; Alexandre M J J Bonvin
Journal: J Mol Biol Date: 2014-04-25 Impact factor: 5.469

Review 3. Hot-spot analysis for drug discovery targeting protein-protein interactions.

Authors: Mireia Rosell; Juan Fernández-Recio
Journal: Expert Opin Drug Discov Date: 2018-01-29 Impact factor: 6.098

4. Mutational and structural analysis of the binding interface between type I interferons and their receptor Ifnar2.

Authors: J Piehler; G Schreiber
Journal: J Mol Biol Date: 1999-11-19 Impact factor: 5.469

5. Develop and test a solvent accessible surface area-based model in conformational entropy calculations.

Authors: Junmei Wang; Tingjun Hou
Journal: J Chem Inf Model Date: 2012-04-24 Impact factor: 4.956

6. Pembrolizumab versus Chemotherapy for PD-L1-Positive Non-Small-Cell Lung Cancer.

Authors: Martin Reck; Delvys Rodríguez-Abreu; Andrew G Robinson; Rina Hui; Tibor Csőszi; Andrea Fülöp; Maya Gottfried; Nir Peled; Ali Tafreshi; Sinead Cuffe; Mary O'Brien; Suman Rao; Katsuyuki Hotta; Melanie A Leiby; Gregory M Lubiniecki; Yue Shentu; Reshma Rangwala; Julie R Brahmer
Journal: N Engl J Med Date: 2016-10-08 Impact factor: 91.245

7. Exploring protein native states and large-scale conformational changes with a modified generalized born model.

Authors: Alexey Onufriev; Donald Bashford; David A Case
Journal: Proteins Date: 2004-05-01

8. Atezolizumab for First-Line Treatment of Metastatic Nonsquamous NSCLC.

Authors: Mark A Socinski; Robert M Jotte; Federico Cappuzzo; Francisco Orlandi; Daniil Stroyakovskiy; Naoyuki Nogami; Delvys Rodríguez-Abreu; Denis Moro-Sibilot; Christian A Thomas; Fabrice Barlesi; Gene Finley; Claudia Kelsch; Anthony Lee; Shelley Coleman; Yu Deng; Yijing Shen; Marcin Kowanetz; Ariel Lopez-Chavez; Alan Sandler; Martin Reck
Journal: N Engl J Med Date: 2018-06-04 Impact factor: 91.245

Review 9. Recent advances in the development of protein-protein interactions modulators: mechanisms and clinical trials.

Authors: Haiying Lu; Qiaodan Zhou; Jun He; Zhongliang Jiang; Cheng Peng; Rongsheng Tong; Jianyou Shi
Journal: Signal Transduct Target Ther Date: 2020-09-23