Literature DB >> 31589427

Solvation Free Energy as a Measure of Hydrophobicity: Application to Serine Protease Binding Interfaces.

Johannes Kraml¹, Anna S Kamenik¹, Franz Waibl¹, Michael Schauperl², Klaus R Liedl¹.

Abstract

Solvation and hydrophobicity play a key role in a variety of biological mechanisms. In substrate binding, but also in structure-based drug design, the thermodynamic properties of water molecules surrounding a given protein are of high interest. One of the main algorithms devised in recent years to quantify thermodynamic properties of water is the grid inhomogeneous solvation theory (GIST), which calculates these features on a grid surrounding the protein. Despite the inherent advantages of GIST, the computational demand is a major drawback, as calculations for larger systems can take days or even weeks. Here, we present a GPU accelerated version of the GIST algorithm, which facilitates efficient estimates of solvation free energy even of large biomolecular interfaces. Furthermore, we show that GIST can be used as a reliable tool to evaluate protein surface hydrophobicity. We apply the approach on a set of nine different proteases calculating localized solvation free energies on the surface of the binding interfaces as a measure of their hydrophobicity. We find a compelling agreement with the hydrophobicity of their substrates, i.e., peptides, binding into the binding cleft, and thus our approach provides a reliable description of hydrophobicity characteristics of these biological interfaces.

Entities: Chemical Disease Gene Species

Mesh：

Substances：
Solvents
Serine Proteases

Year: 2019 PMID： 31589427 PMCID： PMC7032847 DOI： 10.1021/acs.jctc.9b00742

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

The hydrophobic effect is a major driving force for chemical interactions in aqueous solution and thus a heavily studied phenomenon.[1−4] However, it is discussed to be the least understood intermolecular interaction,[5] which is particularly true in the complex environment of biomolecules.[6,7] In spite of the conceptual challenges, a detailed profile of the hydrophobic surface area is critical in state-of-the-art drug design.[8,9] Hydrophobic areas, where water is easily replaced and released into the bulk, indicate possible interaction sites at the protein surface. On the one hand, this information can be exploited to optimize binding properties of small molecules and biopharmaceuticals alike.[10−12] On the other hand, surface hydrophobicity is believed to be linked to protein aggregation, a crucial factor in the developability of biopharmaceutical products.[13,14] Hence, there is an acute demand for methods to estimate the hydrophobicity of biomolecular interfaces. A broad range of concepts and algorithms have already been devised to calculate the surface hydrophobicity of proteins. Most common approaches describe hydrophobicity residue-wise, relying on an average hydrophobicity for each amino acid, which is extracted from premeasured or precalculated hydrophobicity scales.[15−20] These traditional hydrophobicity scales found a broad spectrum of applications, ranging from purely sequence-based secondary structure prediction tools to identification of hydrophobic patches on the protein surfaces.[13−15,21−23] More fine-grained scales have been developed, where distinctions are made between different atom types in the amino acids.[8,24] Despite their applicability for a broad range of problems, hydrophobicity scales share a common limitation—they only consider the average hydrophobicity of each building block independently. Yet, in the context of a protein, the solvent interactions of each residue strongly depend on its surrounding, i.e., on nonadditive effects. Furthermore, it is interesting to study hydrophobicity in terms of entropic and enthalpic contributions to gain insight into the underlying mechanisms of hydrophobicity and protein hydration. However, none of the aforementioned methods can quantify the entropic contribution to hydrophobicity, or hydration, directly. Free energy perturbation and thermodynamic integration are often used to calculate the hydration free energy of various molecules.[25,26] But, to calculate the entropic contribution to hydration these models use either the temperature dependence of the free energy or the difference of the free energy and enthalpy.[27,28] As hydration is a key feature for a variety of different processes, such as biomolecular recognition, a plethora of methods has been developed in recent years to model and analyze water molecules and interactions thereof (AcquaAlta,[29] AQUARIUS2,[30,31] Biki Hydra,[32] Consolv,[33] Dowser++,[34] GRID,[35−37] HINT toolkit,[38] MCSS,[39−41] PyWATER,[42] SPAM,[43] SZMAP,[44] WatCH,[45] WATCLUST,[46] WaterDock,[47] WaterFLAP,[48] WaterScore,[49] WATGEN,[50] wPMF[51]). Methods like AQUARIUS2, MCSS, PyWATER, or WatCH were developed to suggest the placement of water molecules, which can be extremely useful either for further molecular dynamics simulations or molecular docking studies. Another group of methods, such as GRID, WaterDock, or SPAM, calculate the interaction energies of water molecules around a protein. But, most of these cannot calculate the entropic contributions directly or can only analyze certain parts of the protein. Another approach to estimate the solvation free energy is presented by methods based on the Inhomogeneous Solvation Theory (IST). Multiple implementations exist, some of which only analyze the free energy of water on specific, high density water sites, i.e., solvation thermodynamic of ordered water (STOW)[52] or WaterMap.[53,54] Grid inhomogeneous solvation theory (GIST),[12,55−57] on the other hand, calculates the thermodynamic properties of the water molecules for each voxel on a three-dimensional grid. A highly versatile alternative to GIST is the 3D reference site interaction model (3D-RISM).[58,59] This approach relies on the Ornstein–Zernike equation to calculate the enthalpy and entropy directly. However, due to the estimations necessary to calculate the closure relationship, the values obtained by the 3D-RISM theory are approximate. In a recent paper, Nguyen et al.[60] analyzed the differences of GIST and 3D-RISM and have shown that, in general, 3D-RISM yields lower enthalpic and entropic penalties for moving a water molecule from the bulk to the protein’s surface. This leads to lower free energies than are calculated by GIST. Another method similar to GIST is the grid cell theory (GCT),[61] in which the thermodynamic values of the water molecules are calculated on a grid according to the cell theory. However, compared to GIST this approach is extremely computationally expensive. We recently suggested an approach based on GIST to calculate hydrophobicity as solvation free energy.[62] The strengths of this approach include the possibility to dissect solvation free energy into its contributions—enthalpy and entropy. In this method, the individual contributions to the free energy of water surrounding a solute are analyzed on a grid. This allows a detailed analysis of the different contributions to the solvation free energy. Furthermore, GIST allows areas contributing to the solvation free energy of the solute to be identified, therefore providing a method to estimate local contributions to the ensemble property. However, a major drawback we encountered is the extensive computational demand of the approach, especially when it comes to large biomolecular systems. In recent years, the GPU has been harnessed more and more to accelerate simulations and analyses thereof. Here, we reimplement the GIST algorithm to make use of the GPU’s calculation power. Thereby, the approach becomes feasible for the analysis of extensive biomolecular interfaces. We apply our algorithm to characterize the hydrophobicity of the biomolecular interfaces of nine Chymotrypsin-like serine proteases. Proteases in general accelerate the hydrolysis of peptide bonds, via nucleophilic attack on the otherwise kinetically stable amide group.[63] Due to the versatility of this reaction, proteases fulfill a broad range of functions in the human body.[64] Thus, they are highly appealing drug targets and a detailed understanding of their substrate recognition mechanism is paramount.[65,66] For numerous proteases, their substrate preference has been extremely well characterized by massive amounts of cleavage data accumulated in the public MEROPS database.[67] A key feature of protease binding sites captured by the MEROPS is that binding data can be localized to so-called subpockets.[68,69] The concept of subpockets goes back to Schechter and Berger,[70] who suggested to number the binding interface of proteases with S4–S4′ and to do the same on the substrate side, where the substrate positions are named P4–P4′. This partitioning of the binding interface facilitates a localized analysis of the contributions to biomolecular recognition. Due to their role in various diseases, the Chymotrypsin-like serine protease family has been extensively studied experimentally as well as computationally.[69,71,72] In one of our recent studies, we exploited the experimental data on local substrate preferences to show that similarity of Chymotrypsin-like proteases in terms of their electrostatic substrate recognition correlates with the similarity of their electrostatic molecular interaction fields.[73] This finding suggests that electrostatic recognition and shape-based recognition are orthogonal aspects of substrate binding. Our results so far are thus well in-line with a two-step mechanism of protein–protein binding, where in a first step, an initial preordering of the substrate takes place, which relies on long-range electrostatic interactions between the binding partners.[74−76] To allow for the second step, where the formation of the complex structure is partly driven by shape complementarity, water in the binding interface needs to be displaced. Displacement of energetically unfavorable water molecules in the binding interface contributes beneficially to the binding process. This effect is strongest when both the protease binding interface and substrate are strongly hydrophobic, as water molecules are then more easily stripped from both surfaces. We thus surmise that, especially during this last step of protein–protein complex formation, the hydrophobicity of the binding interface plays a key role. In this study, we implement and employ the GIST algorithm on the GPU, for an accelerated estimation of solvation free energy from molecular dynamics simulations. Focusing on a set of nine Chymotrypsin-like serine proteases, we use the GIST results to characterize and localize the hydrophobicity in the respective binding interfaces. Furthermore, we compare this analysis to the proteases’ substrate data, by introducing a straightforward methodology to calculate the average hydrophobicity of the substrate binding to each of the proteases’ subpockets. Combining both analyses we then show that the substrate hydrophobicity correlates well with the local solvation patterns of the binding site.

Methods

Grid Inhomogeneous Solvation Theory

In this section, a brief overview of the grid inhomogeneous solvation theory, devised by Gilson and co-workers, will be given. For a more detailed introduction, the reader is referred to their original studies.[12,55−57] GIST uses a grid to replace the spatial integrals from IST with discrete sums over the voxels on the constructed grid, and only the discrete case will be explained here. For a more general discussion of IST itself, we again refer the reader to the original publication.[77] In GIST, the contributions to a localized, density-weighted free energy of solvation is split into energy and entropy contributions according to eq . The energy contribution is then split up further into the solvent–solvent interaction energy and the solute–solvent interaction energy, as shown in eq . For the entropy, only the two-body term is considered and split into translational and orientational entropy (eq ).The total equation for the localized free energy of solvation can be derived by inserting eqs and 3 into eq . The calculation of E and E is straightforward, as this is the averaged interaction energy between the solvent molecules in the current voxel with the solute (E) or the other solvent molecules (E). Algorithmically, these interaction energies directly derive from the applied force field and are simply averaged over all frames of the trajectory. To avoid double counting of the solvent–solvent interactions, this contribution is divided by 2. For the entropic contribution to solvation free energy, solvent–solvent correlations are neglected. The entropy is then calculated via the distribution of the solvent molecules in each voxel. This distribution is a function of the position of the solvent molecule in three-dimensional space and of the orientation of the solvent molecule. This can be partitioned into two different functions, one considering only the position of the solvent molecule, which is calculated as the translational entropy, and a second one depending on the orientation of the solvent molecule in a certain grid voxel, which is the orientational entropy. The translational entropy is then calculated as a Shannon entropy of the distribution of the position of the solvent molecules, as can be seen in eq .where k is Boltzmann’s constant, ρ0 is the reference density of the solvent model, V is the volume of a voxel, and g() is the positional distribution of the solvent molecule. Similarly, the orientational entropy is also calculated as a Shannon entropy of the orientational distribution of the solvent molecules at a certain position, following eq .As proposed by the original GIST implementation,[57] we use a nearest neighbor method[78] (eqs and 7) to compute the orientational and translational entropy for voxel k, from the simulation data.where N is the total number of solvent molecules in voxel k and γ is Euler’s constant that corrects for an asymptotic bias, d is the closest distance between two solvent molecules, Δω is the closest angular distance between two solvent molecules, and N is the total number of frames. Algorithmically, d is calculated as the Euclidean distance between two solvent molecules. Δω is calculated as the distance between two quaternions, which represent the orientation of each solvent molecule in space (eq ).[79,80]where |.| denotes the absolute value and ⟨q1, q2⟩ denotes the dot product between the two quaternions. Both contributions to the entropy can also be estimated together via the same nearest neighbor method, estimating the integral of sixth order.[57] For all discussed metrics, the reference state is a pure solvent box without solute. Since the solvent–solvent correlations are neglected for the entropic contributions, the reference value for both entropic contributions is zero, as there is no preferred position or orientation resulting in a uniform distribution. Furthermore, the reference for the solute–solvent interaction energy is also zero, with only the solvent–solvent interaction energy differing from zero. In the case of TIP3P water this reference energy is −9.533 kcal/mol for the water–water interaction energy per water molecule.

GIST Implementation

The GPU-accelerated GIST algorithm was implemented using C++ and CUDA. Cpptraj,[57,81] the analysis tool chain included in the AMBER software package, served as backbone for the code. The presented GPU-accelerated GIST algorithm will be provided to be included to the official version of cpptraj. The calculation of the interaction energies was reimplemented on the GPU, since this is the most time-consuming part. The calculation of the entropic contributions is still performed on the CPU, as this calculation is very fast in comparison. The GPU implementation of GIST was analyzed with respect to speedup by performing GIST calculations on pure truncated octahedron water boxes of various sizes. The entire GIST calculation was considered for the analysis of the speedup. These water boxes ranged from a minimum wall distance from the origin of 20 Å to 50 Å, which we will denote as a box radius for readability purposes. For each water box 5000 frames were analyzed, and the grid was created in a way to cover the entire simulation box, using a grid size of 125 Å × 125 Å × 125 Å and a grid spacing of 0.5 Å, resulting in a total number of 250 × 250 × 250 or 15,625,000 grid points.

Simulation Setup

We performed MD simulations starting from the crystal structures of Chymotrypsin (4CHA),[82] Elastase (1QNJ),[83] Factor VIIa (1KLI),[84] Factor Xa (1C5M),[85] Granzyme B (1IAU),[86] Granzyme M (2ZGH),[87] Kallikrein-1 (1SPJ),[88] Thrombin (4AYY),[89] and Trypsin (1PQ7)[90] for subsequent GIST calculations. The structures were prepared using the MOE software package (2019.01),[91] and ligands were deleted where present. We used the implemented Protonate3D tool[92] to resemble a pH of 7.0. Special attention was devoted to the catalytic histidine, to make sure it was not protonated. A TIP3P[93] water box was added with a minimal wall distance of 12 Å. All simulations were performed using the AMBER simulation engine,[94] with the GPU implementation of the particle mesh Ewald algorithm.[95] We employed the Langevin thermostat for temperature control of the system and simulated at a temperature of 300 K. Since simulations were performed in the NpT ensemble, the Berendsen barostat was used with a relaxation time of 2.0 ps. Each structure was simulated for 100 ns, and 5000 frames were used, to ensure convergence. The solute was restrained using a restraint weight of 50 kcal/(mol Å2). The GIST analysis for the proteases was carried out, using the newly implemented GPU version of the GIST algorithm. A grid of size 100 Å × 100 Å × 100 Å was used for all protease structures, with grid spacing of 0.5 Å, resulting in a grid of size 200 × 200 × 200, or 8 × 106 grid points. The temperature was set to 300 K, and the reference density of water was chosen according to the AMBER manual (TIP3P: 0.0329 molecules/Å3). For the analysis, 5000 frames of the simulation were used. The reference water–water interaction energy was set to the same as stated in the AMBER manual (TIP3P: −9.533 kcal/mol).

Amino Acid Solvation Free Energy

To calculate the solvation free energy of each amino acid in the substrates, the same procedure was followed as published previously.[62] Three conformations of each of the naturally occurring 20 amino acids were considered to calculate their solvation free energy. In a distance of 5 Å, integration of the solvation free energy was performed to calculate the total solvation free energy for each conformation of each amino acid. These values were then averaged, and for histidine, the average of the two possible uncharged tautomeric states was used.

Substrate Hydrophobicity

The substrate hydrophobicity is calculated as the weighted average hydrophobicity at each substrate position, based on the MEROPS cleavage data. Here, we used the hydrophobicity scale from Eisenberg[15] as a reference to calculate the average hydrophobicity of an amino acid at each substrate position. We normalized the values of that scale by shifting the hydrophobicity value of the most hydrophobic amino acid to zero. This enables an easy comparison between different methods. Additionally, we calculated the solvation free energy profile of each substrate position using GIST. Also, for this scale, the most hydrophobic amino acid was shifted to have a score of 0. Here, we used the solvation free energy of the different amino acids as a scale, as suggested previously.[62] Then, the hydrophobicity for each substrate position was calculated via eq , which is the weighted average of the hydrophobicity score of each individual amino acid. This step was performed for both scales to compare the solvation free energy ansatz with the experimental concept of Eisenberg’s scale.where P is the substrate position i, AA is the set of all 20 naturally occurring amino acids, n is the number of substrates containing amino acid a in substrate position i, and ΔG is the solvation free energy of amino acid a. The substrate binding data needed for each protease was gathered from the MEROPS database.

Protein Surface Hydrophobicity Mapping

The surface hydrophobicity mapping projects the values of the solvation free energy around a molecule onto its surface. In this case, the mapping was performed on an atomistic level, but also more coarse- and more fine-grained levels could be used, e.g., considering the entire residue or calculating points on the surface and projecting the values there. The hydrophobicity was calculated by a density-weighted average around each atom. Additionally, a Gaussian cutoff function with a σ of 1 Å was applied between 4 and 7 Å.

Localized Subpocket Hydrophobicity

The localized subpocket hydrophobicity is a metric to calculate the solvation free energy that would need to be replaced by a ligand binding into a certain pocket of a protease. The localized subpocket hydrophobicity was calculated, using four different ligands to define the binding interface of each protease. The four different ligands where chosen from different X-ray structures to cover the binding interface of the different proteases from the S4 subpocket to the S4′ subpocket (PDB Codes: 1DE7,[96]3LU9,[97]2ZGH,[87] and 3TJV(98)). The ligands were positioned by aligning the protease crystal structures to the simulated structures. Grid voxels were assigned to each of the amino acids of these ligands to calculate the solvation free energy of the water molecules replaced by the ligands. The grid points were weighted by a Gaussian function with respect to the distance of the atoms of the ligand residue. For the distance in the Gaussian function, the nearest grid point to any side chain atom of any ligand amino acid at the specific substrate position was used. A σ of 1 Å was used to define the Gaussian, and a hard cutoff was defined after 3 σ (3 Å away from any of the ligand’s atoms).

Results

The solvation free energy was calculated for all 20 naturally occurring amino acids as described in the Methods section. The resulting values for these calculations are reported in the SI (Table S1). We then calculated the hydrophobicity in substrate space via eq . For each substrate position, a value was assigned according to the average hydrophobicity of an amino acid in the substrate at that position. In this metric, small values (less negative) imply that the pocket is prone to bind hydrophobic residues. Large values, i.e., favorable solvation free energies, indicate that mostly hydrophilic residues were determined to bind in this pocket. As a reference for our GIST-based approach, we calculated the same values based on the widely established Eisenberg scale. This scale has been created as a consensus scale of five different hydrophobicity scales. The scales considered to generate this consensus scale ranged from scales based on the free energies for the transition from water to ethanol[99,100] and for the transfer from water to vapor,[101] as well as two scales based on calculated free energies for the transfer of amino acid side chains from the surface into the interior of a protein[102,103] and a last semiempirical scale.[104]Figure compares the substrate hydrophobicity from GIST and the Eisenberg scale for four representative proteases. The results for the remaining five Chymotrypsin-like serine proteases can be found in the SI (Figure S1). Comparing the two metrices for all studied proteases and subpockets, we find a striking Pearson correlation of 0.88.

Figure 1

Average hydrophobicity at each substrate position (P4 to P4′). Based on the MEROPS cleavage data, we calculate substrate hydrophobicity using GIST (left) and the Eisenberg scale (right). Substrate positions range from more hydrophobic, i.e., higher score on the hydrophobicity scale (white), to more hydrophilic, i.e., lower score on the hydrophobicity scale (blue). Furthermore, we used GIST to quantify and localize the hydrophobicity within the binding site of each studied protease. To illustrate the resulting free energy grid, we mapped it onto the protein surfaces. Figure compares localized hydrophobicity within the protease binding interface with the respective substrate hydrophobicity. The same analysis was carried out for the remaining five proteases and can be found in the Supporting Information (Figure S2).

Figure 2

Comparison of the substrate hydrophobicity (left), the localized subpocket hydrophobicity (middle), and the protease binding interface colored by the localized hydrophobicity (right). The surface is colored from white—more hydrophobic, higher solvation free energy (cutoff: 0 kcal/mol)—to blue—more hydrophilic, more negative solvation free energy (−6 kcal/mol). The substrate hydrophobicity from Figure is depicted on the far left, and the localized subpocket hydrophobicity is depicted in the middle. For both, the values range from more hydrophobic (white) to more hydrophilic (blue). To quantify the local hydrophobicity in each subpocket, we calculated the localized subpocket hydrophobicity using four different ligands, as described in the Methods section. This approach allowed us to calculate the correlation between the localized subpocket hydrophobicity and the substrate hydrophobicity. Including all nine proteases in our set, we find a Pearson correlation coefficient of 0.70. For a more qualitative visualization we also mapped the values for the hydrophobicity onto the surface, as mentioned in the Methods section, and for visualization we used PyMOL.[105] Here again, a good agreement between the substrate hydrophobicity and our projection becomes apparent. In this way, Figure intuitively reflects that the studied proteases are not only distinct in their substrate preferences but also in their respective surface hydrophobicities. Elastase, for example, is known to bind a broad range of hydrophobic substrates. First, the substrate hydrophobicity of Elastase is significantly higher than for the remaining proteases, with only minor signals in the hydrophilic range. Second, also the solvation free energy within the binding site of Elastase clearly points toward an overall hydrophobic site. In particular, its S1 subpocket is considerably more hydrophobic than in other proteases, but also the outer subpockets of Elastase show barely any indication of hydrophilicity. Kallikrein is a protease that recognizes a higher variety of amino acids at the P1 substrate position, is more hydrophilic than, for example, Elastase, but is rather hydrophobic compared to the strongly hydrophilic proteases, e.g., Trypsin. Using the described metrics, we can clearly identify Kallikrein as amphiphilic in the S1 subpocket. This is interesting as also the surface mapping shows different parts in the S1 subpocket, where a very hydrophilic and a more hydrophobic part can be seen. Granzyme B shows a fairly different pattern as it accepts almost exclusively hydrophilic substrates at the S1 position, recognizing primarily aspartate and glutamate. This is well in line with the surface mapping of solvation free energy, where we find a pronounced hydrophilic signal at that position. Interestingly, Granzyme B is rather hydrophobic at the P4 substrate position, which also agrees well with the binding site’s solvation free energy at the S4 subpocket. Trypsin, compared to the other proteases, is strongly hydrophilic at the P1 substrate position but rather hydrophobic in the remaining substrate positions. We observe the same trend in the surface mapping, where the subpocket S1 is depicted as extraordinarily hydrophilic while the other subpockets are much more hydrophobic. Also, our localized substrate hydrophobicity metric clearly quantifies that the S1 subpocket is significantly more hydrophilic than the other subpockets.

GPU Implementation

We tested the efficiency of our parallelization efforts on water boxes of increasing size and on the example of Chymotrypsin. The computational time of the calculations for the different systems is shown in Figure . For the water boxes an almost linear increase in the acceleration can be seen even to the 50 Å radius. When looking at the speedup with respect to the number of atoms, we still observe a significant acceleration up to our biggest system with almost 90 000 atoms. For the considered systems, we achieved four times faster calculations on the smaller systems, which increase to even 45 times faster calculations for larger systems. So far, these trends are not really showing indications of leveling off. We did not test the speedup for systems larger than 50 Å, because of the substantial computational demand of the CPU implemented version of GIST. All benchmarks were carried out using a machine consisting of an AMD Ryzen 7 1800X and a NVIDIA GeForce GTX 1080. The exact values are given in the SI (Tables S2 and S3) both for the speedup of the entire cpptraj run and for only the parallelized part of the code.

Figure 3

Calculation time of the GIST calculation on the GPU (green) compared to the original implementation on the CPU (orange) and the respective speedup, for the entire analysis. The same is shown as the inlet, using a logarithmic scale for the time. Calculations were performed on the same machine, containing an AMD Ryzen 7 1800X CPU and a NVIDIA GeForce GTX 1080 GPU. GIST calculations were performed on a truncated octahedron water box, with increasing radius.

Discussion

We use the physics-based method GIST to profile the hydrophobicity of protein–protein interaction sites. We test the reliability of the approach on the thoroughly studied family of Chymotrypsin-like serine proteases. The virtues of this set of model systems include structurally well-characterized binding sites as well as compelling statistics on their substrate preferences. Furthermore, protease binding interfaces are designed to allow significant differences in their local physicochemical properties. To evaluate the binding preferences of these nine different proteases, we performed a simple analysis of binding data found in the MEROPS database. We benchmark the plausibility of the GIST-based results against the well-established Eisenberg hydrophobicity scale. Reassuringly, we find that both methods perform very similar in respect to the calculation of substrate hydrophobicity, despite major differences in their underlying concepts. This consistency between the two metrics is further highlighted by a Pearson correlation coefficient of 0.88 over all substrate positions in all nine proteases. Interestingly, the scale devised by Eisenberg rates the hydrophilic substrate positions less strongly than our GIST-based approach. This trend probably originates from the generally broader spread of the data calculated via GIST. The values calculated by GIST are more negative for the hydrophilic residues than is the case in the Eisenberg scale. Additionally, the GIST calculation typically predicts a lower hydrophobicity for the negatively charged amino acids than other hydrophobicity scales. This effect has been observed before[62] and is again clearly visible in Figure , when comparing substrate hydrophilicities of Granzyme B, which mainly binds aspartate and glutamate in the S1. In the GIST calculation, the P1 substrate position of Granzyme B is depicted as strongly hydrophilic. However, using the Eisenberg scale, this position is substantially more hydrophobic than in the GIST approach. This discrepancy stems from the fact that the GIST based method ranks the two negatively charged amino acids, i.e., aspartate and glutamate, more hydrophilic than the Eisenberg scale. In the Eisenberg scale these amino acids are considered even less hydrophilic than some of the uncharged amino acids. However, in our scale, these amino acids are ranked at the hydrophilic end of the scale, with aspartate and glutamate being the most hydrophilic amino acids. In addition to the hydrophobicity of each substrate position we use GIST to establish a localized quantification of hydrophobicity within the protease binding sites. Already visually, the qualitative agreement between hydrophobicity mapped onto the protease surface and the substrate hydrophobicity is evident. We further calculate the localized subpocket hydrophobicity to quantitively compare substrate and binding site hydrophobicity. The resulting Pearson correlation coefficient of 0.70 indicates that a hydrophilic subpocket prefers hydrophilic substrates. This is quite intuitive, as the favorable interactions between the water and the hydrophilic surface need to be replaced by other favorable interactions, e.g., with hydrophilic residues. Hydrophobic surfaces, on the other hand, are drawn to each other as thereby unfavorable interactions of water molecules are then removed and replaced. Water molecules can be released into the bulk, where it has more favorable interactions than with the biomolecule. Furthermore, a high gain in free energy can be achieved when releasing fixed, i.e., entropically unfavorable, water molecules into the bulk, where these molecules can move freely. Thus, a high gain in entropy can compensate for a smaller loss in enthalpy. This is also understood as enthalpy–entropy compensation, where a favorable enthalpic interaction comes at the price of an unfavorable entropic interaction.[106,107] The projection of the hydrophobicity onto the surface shows quite interesting features on the protease binding sites. A generally reassuring observation we made is that more hydrophilic substrates correspond to more hydrophilic subpockets. This can be visualized for the two proteases that accept primarily charged amino acids, i.e., Trypsin and Granzyme B. However, for Elastase, which favors hydrophobic amino acids, all subpockets are hydrophobic. In this protease, the S1 subpocket seems to be the most hydrophobic subpocket of the entire set. Kallikrein binds both hydrophilic and hydrophobic residues, and strikingly its S1 subpocket is split into two distinct sections, which could explain its amphipathic binding behavior. Interestingly, such a partitioning into different areas is also found for proteases favoring solely hydrophilic amino acids in the S1. In most of the cases where a protease favors a positively charged amino acid in the S1 subpocket, this subpocket is hydrophilic at the bottom but is rather hydrophobic around its edges. At first glance, this seems a little counterintuitive, but the positively charged amino acids lysine and arginine consist of a charged group on a long hydrophobic chain. The hydrophobic area at the S1 pocket outlines might thus facilitate replacing water molecules at these positions. This hypothesis is further strengthened by the surface hydrophobicity of Granzyme B’s S1 subpocket. Neither residue favored, i.e., aspartate and glutamate, contains a long hydrophobic chain but is primarily characterized by its negatively charged group. A coloring based solely on the existing hydrophobicity scales is not able to capture the differences in hydrophobicity present in the different areas of the binding interface. When using the Eisenberg scale to analyze the binding interfaces in terms of hydrophobicity, the projected values in most cases do not fit to the values calculated from the substrate distribution (Figure S3). This is not surprising, as this method uses predefined values that cannot consider the surroundings of the different amino acids, as discussed previously. Interestingly, for many different proteases, the Eisenberg scale actually suggests the inverse trend, for example, Chymotrypsin, which is very hydrophobic in the S1 and is colored as slightly hydrophilic. However, Thrombin and Factor Xa, which both heavily favor hydrophilic substrates in the S1 pocket, are colored as almost completely hydrophobic. But both are considered hydrophilic in our presented method. This finding emphasizes one of the most intriguing qualities of the presented metric, which is that it captures nonadditive effects to hydrophobicity, and it can also analyze cooperative contributions to hydrophobicity. Cooperative effects might play a significant role in different aspects of hydrophobicity. A large amount of hydrophobic amino acids, building a cavity, induce order on the encapsulated water molecules, thus leading to an unfavorable entropic contribution. This fixed water then induces order to the other water molecules in its surrounding. This can even go so far that the encapsulated water molecules are forced into a fixed position due to the arrangement of the other water molecules in that cavity. Most prominently, this has been shown to be the case in the streptavidin system by Young et al.,[53] where a network of water molecules is constrained into a similar pattern as the ligand, meaning that the hydration of amino acids is quite different in the complex environment of a protein compared to the free amino acids. When considering only the free amino acids, it is not possible to capture these cooperative effects of the solvent molecules. However, the GIST-based approach allows for a localized analysis of hydrophobicity while considering the surrounding of each residue. This facilitates a reliable and in-depth analysis of the protein surface. Another quite interesting feature of this metric is the potential to split the hydrophobicity into enthalpic and entropic contributions (Figures S3 and S4). Hydration of proteins is still a field of ongoing research with a lot of controversy about the actual mechanism. We believe that molecular dynamics simulation and the analysis of the hydration on an atomistic level is of high interest. Our method can directly measure the entropic and enthalpic contributions to hydration around the protein’s surface. This is particularly interesting for the entropic contributions, as these are directly calculated from the phase space of the water molecules. Thus, from one simulation all major contributions to hydrophobicity of a particular conformation can be studied separately. In their recent work, Nguyen et al.[60] compared the thermodynamic quantities generated by GIST with 3D-RISM. In their study, they used Factor Xa, a protein also investigated here. As mentioned previously, they found that 3D-RISM generally underestimates the enthalpic penalty for moving a water molecule out of the bulk onto a surface. The entropic penalty showed the same behavior when compared to GIST. However, both methods use different approximations for the calculation of the entropy and thus no conclusions were drawn in that regard. For this study, their findings suggest that 3D-RISM will generally yield more hydrophilic properties on the surface. A key aspect of this work was the reimplementation of the GIST algorithm on the GPU. We already briefly discussed the speedup of the GPU implemented GIST calculation, showing a speedup of up to 30 for the protease systems and up to 45 for the simulated water boxes. We believe that the higher amount of speedup for the largest system is an indication that the significant gain in speed still has not started to level off with system size. A few points limit the speedup in smaller systems; first the copying of the data to and from the GPU’s memory might be fast but is, for very small systems, also a significant factor slowing down the calculation on the GPU. Furthermore, the limited number of atoms in small systems might not fully utilize the GPU’s potential, thus not allowing for very high speedup, compared to the larger systems. For our largest system, we reached a speed of about 1 frame per second, while the CPU implementation of GIST required almost 1 min for each frame, showing the highest speedup in our test systems of nearly 45. This significant acceleration of GIST, coupled with cpptraj, allowed us to analyze our test systems (ca. 30000 atoms) in roughly 10 min, compared to the hours it would have taken otherwise. This is most apparent when looking at the calculation of the largest water box, where the calculation with our GPU implementation took about 1.5 h. However, when calculating it on the CPU, the identical analysis took over three full days. We surmise that the solvation free energy as a metric for protein surface hydrophobicity will be particularly valuable for drug discovery and optimization. The GPU implementation facilitates a fast detection of trapped water molecules on the surface, which are easily replaced and thus increase binding affinity of putative ligands. Furthermore, surface hydrophobicity is discussed as a determinant for protein aggregation. The presented metric is fast enough to reliably identify hydrophobic areas on extensive biomolecular surfaces and thus estimate the developability of biopharmaceuticals.

Conclusion

Hydrophobicity scales have been around for over 50 years now and are still an integral part in an enormous amount of theoretical methods. They are of crucial importance for analyzing membrane-bound proteins and in various other fields. The major drawback of these methods is that hydrophobicity is considered residue-wise, neglecting all cooperative effects, which is particularly problematic in the complex environment of biomolecules. The presented method is based on the solvation free energy of a solute and inherently includes cooperative effects for protein surfaces. We show that the approach can localize hydrophobicity in the binding interfaces of nine Chymotrypsin-like serine proteases. The resulting solvation patterns are well in line with the respective substrate preferences deposited in the MEROPS database. Thus, our physics-based approach to analyze hydrophobicity is able to reliably characterize the binding interface of the studied proteases. Reimplementation of the GIST algorithm on the GPU led to a significant speedup. For our tested systems, we reached a maximum speedup of almost 45 times faster than calculation on the CPU. With the presented GPU implementation of GIST, solvation free energy analysis of large biomolecular systems at reasonable calculation times is now publicly accessible.

91 in total

1. Estimating entropies from molecular dynamics simulations.

Authors: Christine Peter; Chris Oostenbrink; Arthur van Dorp; Wilfred F van Gunsteren
Journal: J Chem Phys Date: 2004-02-08 Impact factor: 3.488

2. Development of hydrophobicity parameters to analyze proteins which bear post- or cotranslational modifications.

Authors: S D Black; D R Mould
Journal: Anal Biochem Date: 1991-02-15 Impact factor: 3.365

3. PTRAJ and CPPTRAJ: Software for Processing and Analysis of Molecular Dynamics Trajectory Data.

Authors: Daniel R Roe; Thomas E Cheatham
Journal: J Chem Theory Comput Date: 2013-06-25 Impact factor: 6.006

4. Determination of intrinsic hydrophilicity/hydrophobicity of amino acid side chains in peptides in the absence of nearest-neighbor or conformational effects.

Authors: James M Kovacs; Colin T Mant; Robert S Hodges
Journal: Biopolymers Date: 2006 Impact factor: 2.505

5. Dowser++, a new method of hydrating protein structures.

Authors: A Morozenko; A A Stuchebrukhov
Journal: Proteins Date: 2016-07-05

6. Electrostatic steering and ionic tethering in enzyme-ligand binding: insights from simulations.

Authors: R C Wade; R R Gabdoulline; S K Lüdemann; V Lounnas
Journal: Proc Natl Acad Sci U S A Date: 1998-05-26 Impact factor: 11.205

7. Solvent effects on protein motion and protein effects on solvent motion. Dynamics of the active site region of lysozyme.

Authors: C L Brooks; M Karplus
Journal: J Mol Biol Date: 1989-07-05 Impact factor: 5.469

8. On the size of the active site in proteases. I. Papain.

Authors: I Schechter; A Berger
Journal: Biochem Biophys Res Commun Date: 1967-04-20 Impact factor: 3.575

9. Structural insights into the substrate specificity of human granzyme H: the functional roles of a novel RKR motif.

Authors: Li Wang; Kai Zhang; Lianfeng Wu; Shengwu Liu; Honglian Zhang; Qiangjun Zhou; Liang Tong; Fei Sun; Zusen Fan
Journal: J Immunol Date: 2011-12-09 Impact factor: 5.422

10. Probing Hydration Patterns in Class-A GPCRs via Biased MD: The A_2A Receptor.

Authors: Syeda Rehana Zia; Roberto Gaspari; Sergio Decherchi; Walter Rocchia
Journal: J Chem Theory Comput Date: 2016-11-10 Impact factor: 6.006

12 in total

1. Thermodynamic Decomposition of Solvation Free Energies with Particle Mesh Ewald and Long-Range Lennard-Jones Interactions in Grid Inhomogeneous Solvation Theory.

Authors: Lieyang Chen; Anthony Cruz; Daniel R Roe; Andrew C Simmonett; Lauren Wickstrom; Nanjie Deng; Tom Kurtzman
Journal: J Chem Theory Comput Date: 2021-04-08 Impact factor: 6.006

2. Strength, deformability and toughness of uncrosslinked fibrin fibers from theoretical reconstruction of stress-strain curves.

Authors: Farkhad Maksudov; Ali Daraei; Anuj Sesha; Kenneth A Marx; Martin Guthold; Valeri Barsegov
Journal: Acta Biomater Date: 2021-10-02 Impact factor: 8.947

3. Ground-State Destabilization by Active-Site Hydrophobicity Controls the Selectivity of a Cofactor-Free Decarboxylase.

Authors: Michal Biler; Rory M Crean; Anna K Schweiger; Robert Kourist; Shina Caroline Lynn Kamerlin
Journal: J Am Chem Soc Date: 2020-11-12 Impact factor: 15.419

4. Macrocycle Cell Permeability Measured by Solvation Free Energies in Polar and Apolar Environments.

Authors: Anna S Kamenik; Johannes Kraml; Florian Hofer; Franz Waibl; Patrick K Quoika; Ursula Kahler; Michael Schauperl; Klaus R Liedl
Journal: J Chem Inf Model Date: 2020-06-29 Impact factor: 6.162

5. Solvation Thermodynamics in Different Solvents: Water-Chloroform Partition Coefficients from Grid Inhomogeneous Solvation Theory.

Authors: Johannes Kraml; Florian Hofer; Anna S Kamenik; Franz Waibl; Ursula Kahler; Michael Schauperl; Klaus R Liedl
Journal: J Chem Inf Model Date: 2020-07-20 Impact factor: 6.162

6. An Online Repository of Solvation Thermodynamic and Structural Maps of SARS-CoV-2 Targets.

Authors: Brian Olson; Anthony Cruz; Lieyang Chen; Mossa Ghattas; Yeonji Ji; Kunhui Huang; Daniel J McKay; Tom Kurtzman
Journal: ChemRxiv Date: 2020-05-13

7. Conformational Ensembles of Antibodies Determine Their Hydrophobicity.

Authors: Franz Waibl; Monica L Fernández-Quintero; Anna S Kamenik; Johannes Kraml; Florian Hofer; Hubert Kettenberger; Guy Georges; Klaus R Liedl
Journal: Biophys J Date: 2020-11-18 Impact factor: 4.033

8. Quantum Chemical Microsolvation by Automated Water Placement.

Authors: Miguel Steiner; Tanja Holzknecht; Michael Schauperl; Maren Podewitz
Journal: Molecules Date: 2021-03-23 Impact factor: 4.411

9. Protein-Protein Binding as a Two-Step Mechanism: Preselection of Encounter Poses during the Binding of BPTI and Trypsin.

Authors: Ursula Kahler; Anna S Kamenik; Franz Waibl; Johannes Kraml; Klaus R Liedl
Journal: Biophys J Date: 2020-07-10 Impact factor: 3.699

10. An online repository of solvation thermodynamic and structural maps of SARS-CoV-2 targets.

Authors: Brian Olson; Anthony Cruz; Lieyang Chen; Mossa Ghattas; Yeonji Ji; Kunhui Huang; Steven Ayoub; Tyler Luchko; Daniel J McKay; Tom Kurtzman
Journal: J Comput Aided Mol Des Date: 2020-09-12 Impact factor: 4.179