Johannes Kraml1, Anna S Kamenik1, Franz Waibl1, Michael Schauperl2, Klaus R Liedl1. 1. Institute of General, Inorganic and Theoretical Chemistry and Center for Molecular Biosciences Innsbruck (CMBI) , University of Innsbruck , Innsbruck 6020 , Austria. 2. Skaggs School of Pharmacy and Pharmaceutical Sciences , University of California, San Diego , La Jolla , California 92039-0736 , United States.
Abstract
Solvation and hydrophobicity play a key role in a variety of biological mechanisms. In substrate binding, but also in structure-based drug design, the thermodynamic properties of water molecules surrounding a given protein are of high interest. One of the main algorithms devised in recent years to quantify thermodynamic properties of water is the grid inhomogeneous solvation theory (GIST), which calculates these features on a grid surrounding the protein. Despite the inherent advantages of GIST, the computational demand is a major drawback, as calculations for larger systems can take days or even weeks. Here, we present a GPU accelerated version of the GIST algorithm, which facilitates efficient estimates of solvation free energy even of large biomolecular interfaces. Furthermore, we show that GIST can be used as a reliable tool to evaluate protein surface hydrophobicity. We apply the approach on a set of nine different proteases calculating localized solvation free energies on the surface of the binding interfaces as a measure of their hydrophobicity. We find a compelling agreement with the hydrophobicity of their substrates, i.e., peptides, binding into the binding cleft, and thus our approach provides a reliable description of hydrophobicity characteristics of these biological interfaces.
Solvation and hydrophobicity play a key role in a variety of biological mechanisms. In substrate binding, but also in structure-based drug design, the thermodynamic properties of water molecules surrounding a given protein are of high interest. One of the main algorithms devised in recent years to quantify thermodynamic properties of water is the grid inhomogeneous solvation theory (GIST), which calculates these features on a grid surrounding the protein. Despite the inherent advantages of GIST, the computational demand is a major drawback, as calculations for larger systems can take days or even weeks. Here, we present a GPU accelerated version of the GIST algorithm, which facilitates efficient estimates of solvation free energy even of large biomolecular interfaces. Furthermore, we show that GIST can be used as a reliable tool to evaluate protein surface hydrophobicity. We apply the approach on a set of nine different proteases calculating localized solvation free energies on the surface of the binding interfaces as a measure of their hydrophobicity. We find a compelling agreement with the hydrophobicity of their substrates, i.e., peptides, binding into the binding cleft, and thus our approach provides a reliable description of hydrophobicity characteristics of these biological interfaces.
The hydrophobic effect
is a major driving force for chemical interactions
in aqueous solution and thus a heavily studied phenomenon.[1−4] However, it is discussed to be the least understood intermolecular
interaction,[5] which is particularly true
in the complex environment of biomolecules.[6,7] In
spite of the conceptual challenges, a detailed profile of the hydrophobic
surface area is critical in state-of-the-art drug design.[8,9] Hydrophobic areas, where water is easily replaced and released into
the bulk, indicate possible interaction sites at the protein surface.
On the one hand, this information can be exploited to optimize binding
properties of small molecules and biopharmaceuticals alike.[10−12] On the other hand, surface hydrophobicity is believed to be linked
to protein aggregation, a crucial factor in the developability of
biopharmaceutical products.[13,14] Hence, there is an
acute demand for methods to estimate the hydrophobicity of biomolecular
interfaces.A broad range of concepts and algorithms have already
been devised
to calculate the surface hydrophobicity of proteins. Most common approaches
describe hydrophobicity residue-wise, relying on an average hydrophobicity
for each amino acid, which is extracted from premeasured or precalculated
hydrophobicity scales.[15−20] These traditional hydrophobicity scales found a broad spectrum of
applications, ranging from purely sequence-based secondary structure
prediction tools to identification of hydrophobic patches on the protein
surfaces.[13−15,21−23] More fine-grained scales have been developed, where distinctions
are made between different atom types in the amino acids.[8,24] Despite their applicability for a broad range of problems, hydrophobicity
scales share a common limitation—they only consider the average
hydrophobicity of each building block independently. Yet, in the context
of a protein, the solvent interactions of each residue strongly depend
on its surrounding, i.e., on nonadditive effects. Furthermore, it
is interesting to study hydrophobicity in terms of entropic and enthalpic
contributions to gain insight into the underlying mechanisms of hydrophobicity
and protein hydration. However, none of the aforementioned methods
can quantify the entropic contribution to hydrophobicity, or hydration,
directly. Free energy perturbation and thermodynamic integration are
often used to calculate the hydration free energy of various molecules.[25,26] But, to calculate the entropic contribution to hydration these models
use either the temperature dependence of the free energy or the difference
of the free energy and enthalpy.[27,28]As hydration
is a key feature for a variety of different processes,
such as biomolecular recognition, a plethora of methods has been developed
in recent years to model and analyze water molecules and interactions
thereof (AcquaAlta,[29] AQUARIUS2,[30,31] Biki Hydra,[32] Consolv,[33] Dowser++,[34] GRID,[35−37] HINT toolkit,[38] MCSS,[39−41] PyWATER,[42] SPAM,[43] SZMAP,[44] WatCH,[45] WATCLUST,[46] WaterDock,[47] WaterFLAP,[48] WaterScore,[49] WATGEN,[50] wPMF[51]). Methods
like AQUARIUS2, MCSS, PyWATER, or WatCH were developed to suggest
the placement of water molecules, which can be extremely useful either
for further molecular dynamics simulations or molecular docking studies.
Another group of methods, such as GRID, WaterDock, or SPAM, calculate
the interaction energies of water molecules around a protein. But,
most of these cannot calculate the entropic contributions directly
or can only analyze certain parts of the protein. Another approach
to estimate the solvation free energy is presented by methods based
on the Inhomogeneous Solvation Theory (IST). Multiple implementations
exist, some of which only analyze the free energy of water on specific,
high density water sites, i.e., solvation thermodynamic of ordered
water (STOW)[52] or WaterMap.[53,54] Grid inhomogeneous solvation theory (GIST),[12,55−57] on the other hand, calculates the thermodynamic properties
of the water molecules for each voxel on a three-dimensional grid.
A highly versatile alternative to GIST is the 3D reference site interaction
model (3D-RISM).[58,59] This approach relies on the Ornstein–Zernike
equation to calculate the enthalpy and entropy directly. However,
due to the estimations necessary to calculate the closure relationship,
the values obtained by the 3D-RISM theory are approximate. In a recent
paper, Nguyen et al.[60] analyzed the differences
of GIST and 3D-RISM and have shown that, in general, 3D-RISM yields
lower enthalpic and entropic penalties for moving a water molecule
from the bulk to the protein’s surface. This leads to lower
free energies than are calculated by GIST. Another method similar
to GIST is the grid cell theory (GCT),[61] in which the thermodynamic values of the water molecules are calculated
on a grid according to the cell theory. However, compared to GIST
this approach is extremely computationally expensive.We recently
suggested an approach based on GIST to calculate hydrophobicity
as solvation free energy.[62] The strengths
of this approach include the possibility to dissect solvation free
energy into its contributions—enthalpy and entropy. In this
method, the individual contributions to the free energy of water surrounding
a solute are analyzed on a grid. This allows a detailed analysis of
the different contributions to the solvation free energy. Furthermore,
GIST allows areas contributing to the solvation free energy of the
solute to be identified, therefore providing a method to estimate
local contributions to the ensemble property. However, a major drawback
we encountered is the extensive computational demand of the approach,
especially when it comes to large biomolecular systems. In recent
years, the GPU has been harnessed more and more to accelerate simulations
and analyses thereof. Here, we reimplement the GIST algorithm to make
use of the GPU’s calculation power. Thereby, the approach becomes
feasible for the analysis of extensive biomolecular interfaces.We apply our algorithm to characterize the hydrophobicity of the
biomolecular interfaces of nine Chymotrypsin-like serine proteases.
Proteases in general accelerate the hydrolysis of peptide bonds, via
nucleophilic attack on the otherwise kinetically stable amide group.[63] Due to the versatility of this reaction, proteases
fulfill a broad range of functions in the human body.[64] Thus, they are highly appealing drug targets and a detailed
understanding of their substrate recognition mechanism is paramount.[65,66] For numerous proteases, their substrate preference has been extremely
well characterized by massive amounts of cleavage data accumulated
in the public MEROPS database.[67] A key
feature of protease binding sites captured by the MEROPS is that binding
data can be localized to so-called subpockets.[68,69] The concept of subpockets goes back to Schechter and Berger,[70] who suggested to number the binding interface
of proteases with S4–S4′ and to do the same on the substrate
side, where the substrate positions are named P4–P4′.
This partitioning of the binding interface facilitates a localized
analysis of the contributions to biomolecular recognition.Due
to their role in various diseases, the Chymotrypsin-like serine
protease family has been extensively studied experimentally as well
as computationally.[69,71,72] In one of our recent studies, we exploited the experimental data
on local substrate preferences to show that similarity of Chymotrypsin-like
proteases in terms of their electrostatic substrate recognition correlates
with the similarity of their electrostatic molecular interaction fields.[73] This finding suggests that electrostatic recognition
and shape-based recognition are orthogonal aspects of substrate binding.
Our results so far are thus well in-line with a two-step mechanism
of protein–protein binding, where in a first step, an initial
preordering of the substrate takes place, which relies on long-range
electrostatic interactions between the binding partners.[74−76] To allow for the second step, where the formation of the complex
structure is partly driven by shape complementarity, water in the
binding interface needs to be displaced. Displacement of energetically
unfavorable water molecules in the binding interface contributes beneficially
to the binding process. This effect is strongest when both the protease
binding interface and substrate are strongly hydrophobic, as water
molecules are then more easily stripped from both surfaces. We thus
surmise that, especially during this last step of protein–protein
complex formation, the hydrophobicity of the binding interface plays
a key role.In this study, we implement and employ the GIST
algorithm on the
GPU, for an accelerated estimation of solvation free energy from molecular
dynamics simulations. Focusing on a set of nine Chymotrypsin-like
serine proteases, we use the GIST results to characterize and localize
the hydrophobicity in the respective binding interfaces. Furthermore,
we compare this analysis to the proteases’ substrate data,
by introducing a straightforward methodology to calculate the average
hydrophobicity of the substrate binding to each of the proteases’
subpockets. Combining both analyses we then show that the substrate
hydrophobicity correlates well with the local solvation patterns of
the binding site.
Methods
Grid Inhomogeneous Solvation
Theory
In this section,
a brief overview of the grid inhomogeneous solvation theory, devised
by Gilson and co-workers, will be given. For a more detailed introduction,
the reader is referred to their original studies.[12,55−57] GIST uses a grid to replace the spatial integrals
from IST with discrete sums over the voxels on the constructed grid,
and only the discrete case will be explained here. For a more general
discussion of IST itself, we again refer the reader to the original
publication.[77] In GIST, the contributions
to a localized, density-weighted free energy of solvation is split
into energy and entropy contributions according to eq . The energy contribution is then
split up further into the solvent–solvent interaction energy
and the solute–solvent interaction energy, as shown in eq . For the entropy, only
the two-body term is considered and split into translational and orientational
entropy (eq ).The total equation for the localized free
energy of solvation can be derived by inserting eqs and 3 into eq . The calculation of E and E is straightforward, as this is the averaged
interaction energy between the solvent molecules in the current voxel
with the solute (E)
or the other solvent molecules (E). Algorithmically, these interaction energies directly derive
from the applied force field and are simply averaged over all frames
of the trajectory. To avoid double counting of the solvent–solvent
interactions, this contribution is divided by 2. For the entropic
contribution to solvation free energy, solvent–solvent correlations
are neglected. The entropy is then calculated via the distribution
of the solvent molecules in each voxel. This distribution is a function
of the position of the solvent molecule in three-dimensional space
and of the orientation of the solvent molecule. This can be partitioned
into two different functions, one considering only the position of
the solvent molecule, which is calculated as the translational entropy,
and a second one depending on the orientation of the solvent molecule
in a certain grid voxel, which is the orientational entropy. The translational
entropy is then calculated as a Shannon entropy of the distribution
of the position of the solvent molecules, as can be seen in eq .where k is Boltzmann’s
constant, ρ0 is the reference density
of the solvent model, V is the volume of a voxel, and g() is the positional distribution
of the solvent molecule.Similarly, the orientational entropy
is also calculated as a Shannon
entropy of the orientational distribution of the solvent molecules
at a certain position, following eq .As proposed by the original GIST implementation,[57] we use a nearest neighbor method[78] (eqs and 7) to compute the orientational
and translational entropy for voxel k, from the simulation
data.where N is the total
number of solvent molecules in voxel k and γ
is Euler’s constant that corrects for
an asymptotic bias, d is the closest distance between two solvent molecules, Δω is the closest angular
distance between two solvent molecules, and N is the total number of frames. Algorithmically, d is calculated as the Euclidean distance
between two solvent molecules. Δω is calculated as the distance between two quaternions, which represent
the orientation of each solvent molecule in space (eq ).[79,80]where |.| denotes the absolute value and ⟨q1, q2⟩ denotes
the dot product between the two quaternions.Both contributions
to the entropy can also be estimated together
via the same nearest neighbor method, estimating the integral of sixth
order.[57]For all discussed metrics,
the reference state is a pure solvent
box without solute. Since the solvent–solvent correlations
are neglected for the entropic contributions, the reference value
for both entropic contributions is zero, as there is no preferred
position or orientation resulting in a uniform distribution. Furthermore,
the reference for the solute–solvent interaction energy is
also zero, with only the solvent–solvent interaction energy
differing from zero. In the case of TIP3P water this reference energy
is −9.533 kcal/mol for the water–water interaction energy
per water molecule.
GIST Implementation
The GPU-accelerated
GIST algorithm
was implemented using C++ and CUDA. Cpptraj,[57,81] the analysis tool chain included in the AMBER software package,
served as backbone for the code. The presented GPU-accelerated GIST
algorithm will be provided to be included to the official version
of cpptraj. The calculation of the interaction energies was reimplemented
on the GPU, since this is the most time-consuming part. The calculation
of the entropic contributions is still performed on the CPU, as this
calculation is very fast in comparison. The GPU implementation of
GIST was analyzed with respect to speedup by performing GIST calculations
on pure truncated octahedron water boxes of various sizes. The entire
GIST calculation was considered for the analysis of the speedup. These
water boxes ranged from a minimum wall distance from the origin of
20 Å to 50 Å, which we will denote as a box radius for readability
purposes. For each water box 5000 frames were analyzed, and the grid
was created in a way to cover the entire simulation box, using a grid
size of 125 Å × 125 Å × 125 Å and a grid
spacing of 0.5 Å, resulting in a total number of 250 × 250
× 250 or 15,625,000 grid points.
Simulation Setup
We performed MD simulations starting
from the crystal structures of Chymotrypsin (4CHA),[82] Elastase (1QNJ),[83] Factor VIIa (1KLI),[84] Factor Xa (1C5M),[85] Granzyme B (1IAU),[86] Granzyme M (2ZGH),[87] Kallikrein-1 (1SPJ),[88] Thrombin (4AYY),[89] and Trypsin (1PQ7)[90] for subsequent GIST calculations. The structures were prepared
using the MOE software package (2019.01),[91] and ligands were deleted where present. We used the implemented
Protonate3D tool[92] to resemble a pH of
7.0. Special attention was devoted to the catalytic histidine, to
make sure it was not protonated. A TIP3P[93] water box was added with a minimal wall distance of 12 Å. All
simulations were performed using the AMBER simulation engine,[94] with the GPU implementation of the particle
mesh Ewald algorithm.[95]We employed
the Langevin thermostat for temperature control of the system and
simulated at a temperature of 300 K. Since simulations were performed
in the NpT ensemble, the Berendsen barostat was used with a relaxation
time of 2.0 ps. Each structure was simulated for 100 ns, and 5000
frames were used, to ensure convergence. The solute was restrained
using a restraint weight of 50 kcal/(mol Å2).The GIST analysis for the proteases was carried out, using the
newly implemented GPU version of the GIST algorithm. A grid of size
100 Å × 100 Å × 100 Å was used for all protease
structures, with grid spacing of 0.5 Å, resulting in a grid of
size 200 × 200 × 200, or 8 × 106 grid points.
The temperature was set to 300 K, and the reference density of water
was chosen according to the AMBER manual (TIP3P: 0.0329 molecules/Å3). For the analysis, 5000 frames of the simulation were used.
The reference water–water interaction energy was set to the
same as stated in the AMBER manual (TIP3P: −9.533 kcal/mol).
Amino Acid Solvation Free Energy
To calculate the solvation
free energy of each amino acid in the substrates, the same procedure
was followed as published previously.[62] Three conformations of each of the naturally occurring 20 amino
acids were considered to calculate their solvation free energy. In
a distance of 5 Å, integration of the solvation free energy was
performed to calculate the total solvation free energy for each conformation
of each amino acid. These values were then averaged, and for histidine,
the average of the two possible uncharged tautomeric states was used.
Substrate Hydrophobicity
The substrate hydrophobicity
is calculated as the weighted average hydrophobicity at each substrate
position, based on the MEROPS cleavage data.Here, we used the
hydrophobicity scale from Eisenberg[15] as
a reference to calculate the average hydrophobicity of an amino acid
at each substrate position. We normalized the values of that scale
by shifting the hydrophobicity value of the most hydrophobic amino
acid to zero. This enables an easy comparison between different methods.Additionally, we calculated the solvation free energy profile of
each substrate position using GIST. Also, for this scale, the most
hydrophobic amino acid was shifted to have a score of 0. Here, we
used the solvation free energy of the different amino acids as a scale,
as suggested previously.[62] Then, the hydrophobicity
for each substrate position was calculated via eq , which is the weighted average of the hydrophobicity
score of each individual amino acid. This step was performed for both
scales to compare the solvation free energy ansatz with the experimental
concept of Eisenberg’s scale.where P is the substrate position i, AA is the set of all 20 naturally occurring
amino acids, n is the number of
substrates containing amino acid a in substrate position i, and ΔG is the solvation free energy of amino acid a. The
substrate binding data needed for each protease was gathered from
the MEROPS database.
Protein Surface Hydrophobicity Mapping
The surface
hydrophobicity mapping projects the values of the solvation free energy
around a molecule onto its surface. In this case, the mapping was
performed on an atomistic level, but also more coarse- and more fine-grained
levels could be used, e.g., considering the entire residue or calculating
points on the surface and projecting the values there.The hydrophobicity
was calculated by a density-weighted average around each atom. Additionally,
a Gaussian cutoff function with a σ of 1 Å was applied
between 4 and 7 Å.
Localized Subpocket Hydrophobicity
The localized subpocket
hydrophobicity is a metric to calculate the solvation free energy
that would need to be replaced by a ligand binding into a certain
pocket of a protease.The localized subpocket hydrophobicity
was calculated, using four different ligands to define the binding
interface of each protease. The four different ligands where chosen
from different X-ray structures to cover the binding interface of
the different proteases from the S4 subpocket to the S4′ subpocket
(PDB Codes: 1DE7,[96]3LU9,[97]2ZGH,[87] and 3TJV(98)). The ligands were positioned by aligning
the protease crystal structures to the simulated structures.Grid voxels were assigned to each of the amino acids of these ligands
to calculate the solvation free energy of the water molecules replaced
by the ligands. The grid points were weighted by a Gaussian function
with respect to the distance of the atoms of the ligand residue. For
the distance in the Gaussian function, the nearest grid point to any
side chain atom of any ligand amino acid at the specific substrate
position was used. A σ of 1 Å was used to define the Gaussian,
and a hard cutoff was defined after 3 σ (3 Å away from
any of the ligand’s atoms).
Results
The solvation
free energy was calculated for all 20 naturally occurring
amino acids as described in the Methods section.
The resulting values for these calculations are reported in the SI
(Table S1).We then calculated the
hydrophobicity in substrate space via eq . For each substrate position,
a value was assigned according to the average hydrophobicity of an
amino acid in the substrate at that position. In this metric, small
values (less negative) imply that the pocket is prone to bind hydrophobic
residues. Large values, i.e., favorable solvation free energies, indicate
that mostly hydrophilic residues were determined to bind in this pocket.
As a reference for our GIST-based approach, we calculated the same
values based on the widely established Eisenberg scale. This scale
has been created as a consensus scale of five different hydrophobicity
scales. The scales considered to generate this consensus scale ranged
from scales based on the free energies for the transition from water
to ethanol[99,100] and for the transfer from water
to vapor,[101] as well as two scales based
on calculated free energies for the transfer of amino acid side chains
from the surface into the interior of a protein[102,103] and a last semiempirical scale.[104]Figure compares the substrate
hydrophobicity from GIST and the Eisenberg scale for four representative
proteases. The results for the remaining five Chymotrypsin-like serine
proteases can be found in the SI (Figure S1). Comparing the two metrices for all studied proteases and subpockets,
we find a striking Pearson correlation of 0.88.
Figure 1
Average hydrophobicity
at each substrate position (P4 to P4′).
Based on the MEROPS cleavage data, we calculate substrate hydrophobicity
using GIST (left) and the Eisenberg scale (right). Substrate positions
range from more hydrophobic, i.e., higher score on the hydrophobicity
scale (white), to more hydrophilic, i.e., lower score on the hydrophobicity
scale (blue).
Average hydrophobicity
at each substrate position (P4 to P4′).
Based on the MEROPS cleavage data, we calculate substrate hydrophobicity
using GIST (left) and the Eisenberg scale (right). Substrate positions
range from more hydrophobic, i.e., higher score on the hydrophobicity
scale (white), to more hydrophilic, i.e., lower score on the hydrophobicity
scale (blue).Furthermore, we used GIST to quantify
and localize the hydrophobicity
within the binding site of each studied protease. To illustrate the
resulting free energy grid, we mapped it onto the protein surfaces. Figure compares localized
hydrophobicity within the protease binding interface with the respective
substrate hydrophobicity. The same analysis was carried out for the
remaining five proteases and can be found in the Supporting Information
(Figure S2).
Figure 2
Comparison of the substrate
hydrophobicity (left), the localized
subpocket hydrophobicity (middle), and the protease binding interface
colored by the localized hydrophobicity (right). The surface is colored
from white—more hydrophobic, higher solvation free energy (cutoff:
0 kcal/mol)—to blue—more hydrophilic, more negative
solvation free energy (−6 kcal/mol). The substrate hydrophobicity
from Figure is depicted
on the far left, and the localized subpocket hydrophobicity is depicted
in the middle. For both, the values range from more hydrophobic (white)
to more hydrophilic (blue).
Comparison of the substrate
hydrophobicity (left), the localized
subpocket hydrophobicity (middle), and the protease binding interface
colored by the localized hydrophobicity (right). The surface is colored
from white—more hydrophobic, higher solvation free energy (cutoff:
0 kcal/mol)—to blue—more hydrophilic, more negative
solvation free energy (−6 kcal/mol). The substrate hydrophobicity
from Figure is depicted
on the far left, and the localized subpocket hydrophobicity is depicted
in the middle. For both, the values range from more hydrophobic (white)
to more hydrophilic (blue).To quantify the local hydrophobicity in each subpocket, we calculated
the localized subpocket hydrophobicity using four different ligands,
as described in the Methods section. This
approach allowed us to calculate the correlation between the localized
subpocket hydrophobicity and the substrate hydrophobicity. Including
all nine proteases in our set, we find a Pearson correlation coefficient
of 0.70.For a more qualitative visualization we also mapped
the values
for the hydrophobicity onto the surface, as mentioned in the Methods section, and for visualization we used PyMOL.[105] Here again, a good agreement between the substrate
hydrophobicity and our projection becomes apparent. In this way, Figure intuitively reflects
that the studied proteases are not only distinct in their substrate
preferences but also in their respective surface hydrophobicities.Elastase, for example, is known to bind a broad range of hydrophobic
substrates. First, the substrate hydrophobicity of Elastase is significantly
higher than for the remaining proteases, with only minor signals in
the hydrophilic range. Second, also the solvation free energy within
the binding site of Elastase clearly points toward an overall hydrophobic
site. In particular, its S1 subpocket is considerably more hydrophobic
than in other proteases, but also the outer subpockets of Elastase
show barely any indication of hydrophilicity.Kallikrein is
a protease that recognizes a higher variety of amino
acids at the P1 substrate position, is more hydrophilic than, for
example, Elastase, but is rather hydrophobic compared to the strongly
hydrophilic proteases, e.g., Trypsin. Using the described metrics,
we can clearly identify Kallikrein as amphiphilic in the S1 subpocket.
This is interesting as also the surface mapping shows different parts
in the S1 subpocket, where a very hydrophilic and a more hydrophobic
part can be seen.Granzyme B shows a fairly different pattern
as it accepts almost
exclusively hydrophilic substrates at the S1 position, recognizing
primarily aspartate and glutamate. This is well in line with the surface
mapping of solvation free energy, where we find a pronounced hydrophilic
signal at that position. Interestingly, Granzyme B is rather hydrophobic
at the P4 substrate position, which also agrees well with the binding
site’s solvation free energy at the S4 subpocket.Trypsin,
compared to the other proteases, is strongly hydrophilic
at the P1 substrate position but rather hydrophobic in the remaining
substrate positions. We observe the same trend in the surface mapping,
where the subpocket S1 is depicted as extraordinarily hydrophilic
while the other subpockets are much more hydrophobic. Also, our localized
substrate hydrophobicity metric clearly quantifies that the S1 subpocket
is significantly more hydrophilic than the other subpockets.
GPU Implementation
We tested the efficiency of our
parallelization efforts on water boxes of increasing size and on the
example of Chymotrypsin. The computational time of the calculations
for the different systems is shown in Figure . For the water boxes an almost linear increase
in the acceleration can be seen even to the 50 Å radius. When
looking at the speedup with respect to the number of atoms, we still
observe a significant acceleration up to our biggest system with almost
90 000 atoms. For the considered systems, we achieved four
times faster calculations on the smaller systems, which increase to
even 45 times faster calculations for larger systems. So far, these
trends are not really showing indications of leveling off. We did
not test the speedup for systems larger than 50 Å, because of
the substantial computational demand of the CPU implemented version
of GIST. All benchmarks were carried out using a machine consisting
of an AMD Ryzen 7 1800X and a NVIDIA GeForce GTX 1080. The exact values
are given in the SI (Tables S2 and S3)
both for the speedup of the entire cpptraj run and for only the parallelized
part of the code.
Figure 3
Calculation time of the GIST calculation on the GPU (green)
compared
to the original implementation on the CPU (orange) and the respective
speedup, for the entire analysis. The same is shown as the inlet,
using a logarithmic scale for the time. Calculations were performed
on the same machine, containing an AMD Ryzen 7 1800X CPU and a NVIDIA
GeForce GTX 1080 GPU. GIST calculations were performed on a truncated
octahedron water box, with increasing radius.
Calculation time of the GIST calculation on the GPU (green)
compared
to the original implementation on the CPU (orange) and the respective
speedup, for the entire analysis. The same is shown as the inlet,
using a logarithmic scale for the time. Calculations were performed
on the same machine, containing an AMD Ryzen 7 1800X CPU and a NVIDIA
GeForce GTX 1080 GPU. GIST calculations were performed on a truncated
octahedron water box, with increasing radius.
Discussion
We use the physics-based method GIST to profile
the hydrophobicity
of protein–protein interaction sites. We test the reliability
of the approach on the thoroughly studied family of Chymotrypsin-like
serine proteases. The virtues of this set of model systems include
structurally well-characterized binding sites as well as compelling
statistics on their substrate preferences. Furthermore, protease binding
interfaces are designed to allow significant differences in their
local physicochemical properties.To evaluate the binding preferences
of these nine different proteases,
we performed a simple analysis of binding data found in the MEROPS
database. We benchmark the plausibility of the GIST-based results
against the well-established Eisenberg hydrophobicity scale. Reassuringly,
we find that both methods perform very similar in respect to the calculation
of substrate hydrophobicity, despite major differences in their underlying
concepts. This consistency between the two metrics is further highlighted
by a Pearson correlation coefficient of 0.88 over all substrate positions
in all nine proteases.Interestingly, the scale devised by Eisenberg
rates the hydrophilic
substrate positions less strongly than our GIST-based approach. This
trend probably originates from the generally broader spread of the
data calculated via GIST. The values calculated by GIST are more negative
for the hydrophilic residues than is the case in the Eisenberg scale.
Additionally, the GIST calculation typically predicts a lower hydrophobicity
for the negatively charged amino acids than other hydrophobicity scales.
This effect has been observed before[62] and
is again clearly visible in Figure , when comparing substrate hydrophilicities of Granzyme
B, which mainly binds aspartate and glutamate in the S1. In the GIST
calculation, the P1 substrate position of Granzyme B is depicted as
strongly hydrophilic. However, using the Eisenberg scale, this position
is substantially more hydrophobic than in the GIST approach. This discrepancy stems
from the fact that the GIST based method ranks the two negatively
charged amino acids, i.e., aspartate and glutamate, more hydrophilic
than the Eisenberg scale. In the Eisenberg scale these amino acids
are considered even less hydrophilic than some of the uncharged amino
acids. However, in our scale, these amino acids are ranked at the
hydrophilic end of the scale, with aspartate and glutamate being the
most hydrophilic amino acids.In addition to the hydrophobicity
of each substrate position we
use GIST to establish a localized quantification of hydrophobicity
within the protease binding sites. Already visually, the qualitative
agreement between hydrophobicity mapped onto the protease surface
and the substrate hydrophobicity is evident. We further calculate
the localized subpocket hydrophobicity to quantitively compare substrate
and binding site hydrophobicity. The resulting Pearson correlation
coefficient of 0.70 indicates that a hydrophilic subpocket prefers
hydrophilic substrates. This is quite intuitive, as the favorable
interactions between the water and the hydrophilic surface need to
be replaced by other favorable interactions, e.g., with hydrophilic
residues. Hydrophobic surfaces, on the other hand, are drawn to each
other as thereby unfavorable interactions of water molecules are then
removed and replaced. Water molecules can be released into the bulk,
where it has more favorable interactions than with the biomolecule.
Furthermore, a high gain in free energy can be achieved when releasing
fixed, i.e., entropically unfavorable, water molecules into the bulk,
where these molecules can move freely. Thus, a high gain in entropy
can compensate for a smaller loss in enthalpy. This is also understood
as enthalpy–entropy compensation, where a favorable enthalpic
interaction comes at the price of an unfavorable entropic interaction.[106,107]The projection of the hydrophobicity onto the surface shows
quite
interesting features on the protease binding sites. A generally reassuring
observation we made is that more hydrophilic substrates correspond
to more hydrophilic subpockets. This can be visualized for the two
proteases that accept primarily charged amino acids, i.e., Trypsin
and Granzyme B. However, for Elastase, which favors hydrophobic amino
acids, all subpockets are hydrophobic. In this protease, the S1 subpocket
seems to be the most hydrophobic subpocket of the entire set. Kallikrein
binds both hydrophilic and hydrophobic residues, and strikingly its
S1 subpocket is split into two distinct sections, which could explain
its amphipathic binding behavior.Interestingly, such a partitioning
into different areas is also
found for proteases favoring solely hydrophilic amino acids in the
S1. In most of the cases where a protease favors a positively charged
amino acid in the S1 subpocket, this subpocket is hydrophilic at the
bottom but is rather hydrophobic around its edges. At first glance,
this seems a little counterintuitive, but the positively charged amino
acids lysine and arginine consist of a charged group on a long hydrophobic
chain. The hydrophobic area at the S1 pocket outlines might thus facilitate
replacing water molecules at these positions. This hypothesis is further
strengthened by the surface hydrophobicity of Granzyme B’s
S1 subpocket. Neither residue favored, i.e., aspartate and glutamate,
contains a long hydrophobic chain but is primarily characterized by
its negatively charged group.A coloring based solely on the
existing hydrophobicity scales is
not able to capture the differences in hydrophobicity present in the
different areas of the binding interface. When using the Eisenberg
scale to analyze the binding interfaces in terms of hydrophobicity,
the projected values in most cases do not fit to the values calculated
from the substrate distribution (Figure S3). This is not surprising, as this method uses predefined values
that cannot consider the surroundings of the different amino acids,
as discussed previously. Interestingly, for many different proteases,
the Eisenberg scale actually suggests the inverse trend, for example,
Chymotrypsin, which is very hydrophobic in the S1 and is colored as
slightly hydrophilic. However, Thrombin and Factor Xa, which both
heavily favor hydrophilic substrates in the S1 pocket, are colored
as almost completely hydrophobic. But both are considered hydrophilic
in our presented method.This finding emphasizes one of the
most intriguing qualities of
the presented metric, which is that it captures nonadditive effects
to hydrophobicity, and it can also analyze cooperative contributions
to hydrophobicity. Cooperative effects might play a significant role
in different aspects of hydrophobicity. A large amount of hydrophobic
amino acids, building a cavity, induce order on the encapsulated water
molecules, thus leading to an unfavorable entropic contribution. This
fixed water then induces order to the other water molecules in its
surrounding. This can even go so far that the encapsulated water molecules
are forced into a fixed position due to the arrangement of the other
water molecules in that cavity. Most prominently, this has been shown
to be the case in the streptavidin system by Young et al.,[53] where a network of water molecules is constrained
into a similar pattern as the ligand, meaning that the hydration of
amino acids is quite different in the complex environment of a protein
compared to the free amino acids. When considering only the free amino
acids, it is not possible to capture these cooperative effects of
the solvent molecules. However, the GIST-based approach allows for
a localized analysis of hydrophobicity while considering the surrounding
of each residue. This facilitates a reliable and in-depth analysis
of the protein surface.Another quite interesting feature of
this metric is the potential
to split the hydrophobicity into enthalpic and entropic contributions
(Figures S3 and S4). Hydration of proteins
is still a field of ongoing research with a lot of controversy about
the actual mechanism. We believe that molecular dynamics simulation
and the analysis of the hydration on an atomistic level is of high
interest. Our method can directly measure the entropic and enthalpic
contributions to hydration around the protein’s surface. This
is particularly interesting for the entropic contributions, as these
are directly calculated from the phase space of the water molecules.
Thus, from one simulation all major contributions to hydrophobicity
of a particular conformation can be studied separately.In their
recent work, Nguyen et al.[60] compared the
thermodynamic quantities generated by GIST with 3D-RISM.
In their study, they used Factor Xa, a protein also investigated here.
As mentioned previously, they found that 3D-RISM generally underestimates
the enthalpic penalty for moving a water molecule out of the bulk
onto a surface. The entropic penalty showed the same behavior when
compared to GIST. However, both methods use different approximations
for the calculation of the entropy and thus no conclusions were drawn
in that regard. For this study, their findings suggest that 3D-RISM
will generally yield more hydrophilic properties on the surface.A key aspect of this work was the reimplementation of the GIST
algorithm on the GPU. We already briefly discussed the speedup of
the GPU implemented GIST calculation, showing a speedup of up to 30
for the protease systems and up to 45 for the simulated water boxes.
We believe that the higher amount of speedup for the largest system
is an indication that the significant gain in speed still has not
started to level off with system size. A few points limit the speedup
in smaller systems; first the copying of the data to and from the
GPU’s memory might be fast but is, for very small systems,
also a significant factor slowing down the calculation on the GPU.
Furthermore, the limited number of atoms in small systems might not
fully utilize the GPU’s potential, thus not allowing for very
high speedup, compared to the larger systems. For our largest system,
we reached a speed of about 1 frame per second, while the CPU implementation
of GIST required almost 1 min for each frame, showing the highest
speedup in our test systems of nearly 45. This significant acceleration
of GIST, coupled with cpptraj, allowed us to analyze our test systems
(ca. 30000 atoms) in roughly 10 min, compared to the hours it would
have taken otherwise. This is most apparent when looking at the calculation
of the largest water box, where the calculation with our GPU implementation
took about 1.5 h. However, when calculating it on the CPU, the identical
analysis took over three full days.We surmise that the solvation
free energy as a metric for protein
surface hydrophobicity will be particularly valuable for drug discovery
and optimization. The GPU implementation facilitates a fast detection
of trapped water molecules on the surface, which are easily replaced
and thus increase binding affinity of putative ligands. Furthermore,
surface hydrophobicity is discussed as a determinant for protein aggregation.
The presented metric is fast enough to reliably identify hydrophobic
areas on extensive biomolecular surfaces and thus estimate the developability
of biopharmaceuticals.
Conclusion
Hydrophobicity scales
have been around for over 50 years now and
are still an integral part in an enormous amount of theoretical methods.
They are of crucial importance for analyzing membrane-bound proteins
and in various other fields. The major drawback of these methods is
that hydrophobicity is considered residue-wise, neglecting all cooperative
effects, which is particularly problematic in the complex environment
of biomolecules. The presented method is based on the solvation free
energy of a solute and inherently includes cooperative effects for
protein surfaces.We show that the approach can localize hydrophobicity
in the binding
interfaces of nine Chymotrypsin-like serine proteases. The resulting
solvation patterns are well in line with the respective substrate
preferences deposited in the MEROPS database. Thus, our physics-based
approach to analyze hydrophobicity is able to reliably characterize
the binding interface of the studied proteases.Reimplementation
of the GIST algorithm on the GPU led to a significant
speedup. For our tested systems, we reached a maximum speedup of almost
45 times faster than calculation on the CPU. With the presented GPU
implementation of GIST, solvation free energy analysis of large biomolecular
systems at reasonable calculation times is now publicly accessible.
Authors: Lieyang Chen; Anthony Cruz; Daniel R Roe; Andrew C Simmonett; Lauren Wickstrom; Nanjie Deng; Tom Kurtzman Journal: J Chem Theory Comput Date: 2021-04-08 Impact factor: 6.006
Authors: Michal Biler; Rory M Crean; Anna K Schweiger; Robert Kourist; Shina Caroline Lynn Kamerlin Journal: J Am Chem Soc Date: 2020-11-12 Impact factor: 15.419
Authors: Anna S Kamenik; Johannes Kraml; Florian Hofer; Franz Waibl; Patrick K Quoika; Ursula Kahler; Michael Schauperl; Klaus R Liedl Journal: J Chem Inf Model Date: 2020-06-29 Impact factor: 6.162
Authors: Johannes Kraml; Florian Hofer; Anna S Kamenik; Franz Waibl; Ursula Kahler; Michael Schauperl; Klaus R Liedl Journal: J Chem Inf Model Date: 2020-07-20 Impact factor: 6.162
Authors: Franz Waibl; Monica L Fernández-Quintero; Anna S Kamenik; Johannes Kraml; Florian Hofer; Hubert Kettenberger; Guy Georges; Klaus R Liedl Journal: Biophys J Date: 2020-11-18 Impact factor: 4.033