Enzymatic function and activity of proteases is closely controlled by the pH value. The protonation states of titratable residues in the active site react to changes in the pH value, according to their pKa, and thereby determine the functionality of the enzyme. Knowledge of the titration behavior of these residues is crucial for the development of drugs targeting the active site residues. However, experimental pKa data are scarce, since the systems' size and complexity make determination of these pKa values inherently difficult. In this study, we use single pH constant pH MD simulations as a fast and robust tool to estimate the active site pKa values of a set of aspartic, cysteine, and serine proteases. We capture characteristic pKa shifts of the active site residues, which dictate the experimentally determined activity profiles of the respective protease family. We find clear differences of active site pKa values within the respective families, which closely match the experimentally determined pH preferences of the respective proteases. These shifts are caused by a distinct network of electrostatic interactions characteristic for each protease family. While we find convincing agreement with experimental data for serine and aspartic proteases, we observe clear deficiencies in the description of the titration behavior of cysteines within the constant pH MD framework and highlight opportunities for improvement. Consequently, with this work, we provide a concise set of active site pKa values of aspartic and serine proteases, which could serve as reference for future theoretical as well as experimental studies.
Enzymatic function and activity of proteases is closely controlled by the pH value. The protonation states of titratable residues in the active site react to changes in the pH value, according to their pKa, and thereby determine the functionality of the enzyme. Knowledge of the titration behavior of these residues is crucial for the development of drugs targeting the active site residues. However, experimental pKa data are scarce, since the systems' size and complexity make determination of these pKa values inherently difficult. In this study, we use single pH constant pH MD simulations as a fast and robust tool to estimate the active site pKa values of a set of aspartic, cysteine, and serine proteases. We capture characteristic pKa shifts of the active site residues, which dictate the experimentally determined activity profiles of the respective protease family. We find clear differences of active site pKa values within the respective families, which closely match the experimentally determined pH preferences of the respective proteases. These shifts are caused by a distinct network of electrostatic interactions characteristic for each protease family. While we find convincing agreement with experimental data for serine and aspartic proteases, we observe clear deficiencies in the description of the titration behavior of cysteines within the constant pH MD framework and highlight opportunities for improvement. Consequently, with this work, we provide a concise set of active site pKa values of aspartic and serine proteases, which could serve as reference for future theoretical as well as experimental studies.
Proteases catalyze the cleavage of peptide
bonds, a ubiquitous
reaction in the whole biosphere. Indeed, 2–3% of all human
genes code for proteases or protease inhibitors.[1] The function of the proteases is manifold. Processes from
signaling cascades over digestion to programmed cell death are based
on proteolytic processing.[2] Consequently,
the physiological environments where proteases need to operate are
very diverse as well, including vastly different ranges of acidity.
For example, digestive proteases in the stomach at a pH of 2.0 have
to catalyze the same reaction as proteases of the blood coagulation
cascade at a pH of 7.4 and proteases in the gut at basic conditions.[3,4] An overview of the various activity profiles of aspartic, cysteine,
and serine proteases is shown in Figure .[5,6] Taken together, these
three families cover a broad pH range in terms of activity. While
aspartic proteases are active in the acidic range, cysteine proteases
cover the mild acidic to neutral range and finally serine proteases
are mostly found active at neutral to slightly alkaline conditions.[3−7]
Figure 1
Overview
of the pH-dependent activity profiles of aspartic (red),
cysteine (yellow), and serine (blue) proteases. Together these three
families cover a broad pH span, ranging from very acidic, over neutral,
to mildly basic pH values. The ranges of major subfamilies are highlighted
in a darker shade of the respective color.
Overview
of the pH-dependent activity profiles of aspartic (red),
cysteine (yellow), and serine (blue) proteases. Together these three
families cover a broad pH span, ranging from very acidic, over neutral,
to mildly basic pH values. The ranges of major subfamilies are highlighted
in a darker shade of the respective color.The major distinctions between these three families in terms of
catalysis can be found in their active site architecture. The catalytic
center of aspartate proteases consists of an aspartic dyad, of which
one aspartate acts as a base and the other one as an acid during catalysis.[3,8,9] For this purpose, it is imperative
that the dyad is in a monoprotonated state when the protease is active.
In cysteine proteases on the other hand, a cysteine and a histidine
constitute the active site, which form an ion pair, i.e., the cysteine
is in its thiolate form, while the imidazole side chain of the histidine
is protonated and therefore positively charged.[7,9,10] In contrast, serine proteases show a catalytic
triad motif, consisting of an aspartate, a histidine, and a serine,
of which only the aspartate is negatively charged, while the histidine
and serine are ionized intermediately during catalysis.[4,9] In summary, the nature and arrangement of the active site residues
are decisive for the pH-dependent activity ranges of the different
protease families shown in Figure .Different pH values or changes thereof lead
to changes in the protonation
states of titratable residues within a protein.[11−14] How a titratable residue reacts
to different pH values is determined by its pKa value.[9,15] The so-called intrinsic pKa value of a titratable residue, i.e., the pKa of the isolated amino acid, is perturbed by
the complex electrostatic environment formed by its surrounding residues
within a protein to the so-called macroscopic or apparent pKa value.[9,16] Intrinsic pKa values of the various titratable amino acids
commonly found in proteins can be rigorously approximated by small
peptides (e.g., of the form acetyl-GXG-amide), which are easy to measure
directly (commonly by NMR) and readily available in the literature.[16] However, within the complex environment of a
protein, the direct determination of pKa values can be very challenging or even impossible.[16]All of the aforementioned active site residues, except
serine,
are titratable in the pH range of 0 to 10, in which also the discussed
proteases are active.[9] Thus, the titration
states of the active site residues depend directly on the pH. In consequence,
the pH determines whether or not the enzyme is active, since a well-defined
protonation state configuration is imperative for activity.As discussed above, the macroscopic pKa values determine how the titratable residues in the active site
react to a specific pH value, which in turn depends on the electrostatic
environment they encounter and therefore on the structure of the active
site itself. The pKa values of the active
site residues are thus decisive for inhibitor design and mechanistic
investigations. However, the experimental determination of these pKa values is extremely difficult, which is reflected
by the low number of available pKa values
in the literature. In consequence, computational tools, which can
reliably predict such pKa values are of
utmost importance. Over the last decades a multitude of such prediction
tools have emerged, most of which can be generally divided into two
groups.[17] The group of static methods,
e.g., PROPKA[18] or H++,[19] predicts pKa values based on
single or multiple static structures of a protein. In contrast, dynamic
methods such as the family of constant pH molecular dynamics (cpHMD)
methods use an ensemble of structures for protonation state predictions.[20] It is well-established that proteins in solution
are inherently flexible, meaning they relentlessly fluctuate between
diverse conformational states of varying probabilities.[21−25] Consequently, the structural environment around titratable residues
is continuously changing and the protonation state ensemble is inevitably
linked to conformational rearrangements. The cpHMD approach offers
the unique opportunity to account for this intricate interplay of
conformation and protonation. The approach not only incorporates a
diverse set of conformations into the pKa prediction itself, but also allows capturing how a protein structurally
adapts to different pH values.[20]Most of the different cpHMD approaches can be attributed to two
main groups, based on the treatment of the protonation states. On
the one hand, protonation states can be treated discretely, and all
titratable protons are explicitly defined at each titratable group
and if not active are only present as ghost particles. The simulation
is periodically interrupted, and the protonation state changes are
attempted based on a Metropolis criterion.[26−30] On the other hand, protonation states can be sampled
along a continuous titration coordinate λ.[31,32] Similar to the concept of thermodynamic integration,[33] if λ is 0, the respective residue is protonated
and if λ is 1, it is deprotonated; all states in between
are unphysical. As in typical simulations only a small number of frames
would meet this criterion, usually a cutoff is employed to maximize
the number of analyzable frames. Recently, Radak et al. presented
a hybrid nonequilibrium MD/Monte Carlo approach,[34] based on the works of Roux[35] and Stern.[28] Here, equilibrium MD is
performed with fixed protonation states. Periodically, a nonequilibrium
switch is attempted, sampling in the protonation and conformation
space. Whether or not the switch is accepted is determined via a Metropolis
Criterion. If the switch is indeed accepted, equilibrium MD continues
with the new protonation state from the final conformation of the
switch. If not, the simulation reverts back to the conformation before
the switch attempt. This approach is implemented in the NAMD package.[36] For a more in-depth discussion of the various
techniques, we point the reader to the respective works.[26−32,34,35]In this work we use cpHMD simulations to titrate the active
site
residues of selected proteases of the aspartic, cysteine, and serine
protease families. We focus on the methods implemented in the AMBER
software package.[37] We use primarily the
Monte Carlo[38] (MC)-based cpH approach,
as implemented in AMBER, with discrete protonation states, specifically
the most recent variant by Roitberg and co-workers utilizing
explicit solvent.[30] On the other hand,
we also make use of the continuous cpHMD approach, which was also
recently implemented in AMBER by Shen and co-workers.[32]Both aforementioned approaches of cpHMD have been
combined with
enhanced sampling techniques like replica exchange MD (REMD[39]) and accelerated MD (aMD[40]) in order to achieve efficient sampling of conformations
and protonation states.[29,30,32,41−44] The recent implementations of
cpHMD on graphics processing units (GPUs) dramatically increased calculation
speed. Hence, it is possible to capture dynamics at slower time scales
with continuous trajectories at feasible computational costs. Here
we use single pH cpHMD simulations, as they can be run easily in parallel
with an arbitrary number of GPUs and show acceptable convergence behavior.[45]We apply this workflow to a set of 9 representative
proteases from
three of the four main largest protease classes distinguished by the
catalytic mechanism.[46] On the basis of
relevance in drug discovery and differences in pH-dependent activity
profiles, we selected representative proteases from the aspartate,
cysteine, and serine protease families. We excluded the family of
metalloproteases, as for this family, the protonation/deprotonation
events in the active site are closely linked to the coordinating ion.
In order to capture this effect, a sophisticated description of the
electrostatics and polarizability of the ion would be necessary, which
is not possible for the force fields used within the cpHMD framework.For the selected proteases we efficiently capture reliable protonation
state ensembles. In addition to reference pKa values, we provide atomistic insights to rationalize the
origin of the strongly varying activity profiles.
Methods
System and
Simulation Setup
All systems were prepared
with the program MOE (molecular operating environment)[47] from X-ray structures, which are available in
the PDB. The respective PDB codes are summarized in Table S1 in the Supporting Information. All crystal waters,
agents, and ligands were removed if any were present. If multiple
chains were present in the entry, the chain with the highest quality
and sequence coverage was chosen based on the full PDB validation
report.The LEaP module of AmberTools 19[37] was used to add missing hydrogens and create topology and
starting coordinate files. The AMBER ff99SB[48] force field coupled with the necessary modifications for constant
pH MD simulations was used.[27,30] The GB radii of the
titratable oxygens of aspartate were reduced to 1.3 Å as suggested
by Swails et al.[30] All systems were placed
in a truncated octahedral TIP3P water box with a minimum wall distance
of 10 Å.[49]Furthermore, the
cysteine protease papain was simulated using the
GB-Neck2 implicit solvent model with the appropriate GB radii.[50] As there were no reference energies available
for cysteine for this implicit solvent model, reference energies were
derived as suggested in the AMBER manual.[37] For the derivation of partial charges and force field parameters
of deprotonated, i.e., negatively charged serine, the structure was
prepared with MOE and the needed parameters subsequently derived with
Gaussian 16[51] and the antechamber framework
of AmberTools19.[37] Partial charges were
derived with the RESP[52] procedure.Before production simulations, all systems were equilibrated with
an elaborate protocol developed in our group.[53]All simulations were carried out with the pmemd module of
AMBER
18, making use of both the CPU and the GPU implementation.[37] Calculations were carried out on the Vienna
Scientific Cluster (VSC3 and VSC4) and on our in-house GPU cluster.The Langevin thermostat[54] with a collision
frequency of 5 ps–1 was used to keep a constant
temperature of 310 K, as was the Berendsen barostat[55] with a relaxation time of 2 ps to keep atmospheric pressure.
The SHAKE[56] algorithm was used to restrain
all bonds involving hydrogens, enabling the use of a 2 fs time step.
Long range electrostatics were treated with the particle-mesh Ewald
method[57] (PME), and a nonbonded cutoff
of 8 Å was used. All systems were simulated at pH values from
0.0 to 10.0 (0.0 to 14.0 for papain) with a 0.5 spacing. For all MC-based
cpHMD simulations, protonation state changes were attempted every
200 steps, followed by 200 steps of solvent relaxation after a successful
attempt. For the GB calculations within the cpHMD framework, a salt
concentration of 0.1 was used. For aspartic proteases, the two aspartates
comprising the catalytic dyad were selected to titrate. For serine
proteases, the aspartate and the histidine of the catalytic triad
were selected to titrate. For the cysteine protease papain, two approaches
were tested. On the one hand, both the cysteine and the histidine
of the catalytic center were selected to titrate, and on the other
hand, only the cysteine was titrated, while keeping the histidine
in its doubly protonated, i.e., positively charged form. Frames were
collected every 1000 frames. All simulations were run for 100 ns per
pH value, resulting in 2.1 μs of aggregate simulation time per
system.For papain, the system was prepared following the procedure
described
by Shen and co-workers[50] and was simulated
using the recent implementation of continuous cpHMD in AMBER 18.[32,45,50] In brief, CHARMM[58] with the CHARMM22 all-hydrogen force field[59] was used to add missing hydrogens, terminal cappings, set
up the titratable groups, and perform initial minimizations. Hereafter,
the minimized structure was prepared with LEaP using the AMBER ff14SB[60] force field with the necessary modifications
for continuous cpHMD.[32] The GB radius of
the titratable sulfur was set to 2.0 Å as suggested by Shen and
co-workers.[50] The subsequent simulations
were carried out with the same settings as described above. For the
continuous cpHMD specific settings, a mass of 10 amu was used for
the lambda particles, a friction coefficient of 5 ps–1 was used for the titration integrator, and the forces of the lambda
particles were updated every step.
Analysis
All analyses
were performed using cpptraj[61] and pytraj
from AmberTools 19,[37] combined with in-house
python scripts. Analysis of the
continuous cpHMD data was done with a python script provided by Shen
and co-workers.[62] Structural representations
were created with PyMol.[63]Titration
data from MC-based cpHMD simulations was analyzed with the cphstats
program from AmberTools 19.[37] As the titrations
of the catalytic residues were strongly coupled, titration curves
were obtained by fitting the average number of total protons as was
shown previously by Roitberg and co-workers for the HIV-1 protease
(eq ).[64] Setups in which only one residue was titrated were fitted
to the modified Hill equation (eq ).Shifts in pKa values
were evaluated using capped tripeptide (acetyl-GXG-amide) pKa values as published by Platzer and McIntosh
as reference.[16] Convergence of pKa values was evaluated by monitoring the cumulative
averages of the pKa predictions.To profile protonation state transition probabilities between the
strongly active site residues in aspartate proteases, we set up a
4-state model based on the possible protonation state combinations
of the respective titrated residues and calculated transition matrices
based on these models, as we previously showed.[65] The matrices were then visualized as network plots, in
which circle sizes denote state probabilities and arrow sizes transition
probabilities.
Results
Chymotrypsin in Apo and
Bound Form
In order to benchmark
the robustness of the applied constant pH MD simulation approach,
we aimed to reproduce the pKa change of
the catalytic histidine associated with the activation of chymotrypsin
described by Lin et al.[66]We used
the apo enzyme as a model for the encounter complex of the protease
and the peptide-compound as shown in Figure A. With a predicted pKa value of 7.16, we closely reproduce the literature pKa value of 7.5 (Figure B). We modeled the negatively charged complex
of protease and peptide by simply deprotonating the catalytic serine
residue, thereby introducing an additional negative charge (see Figure A). For this system,
we find a pKa value of 10.78, which is
in line with the experimentally determined pKa range of 10–12.
Figure 2
Prediction of the pKa shift of the
catalytic HIS57 of chymotrypsin upon formation of the negatively charged
complex. Complex structure and schematic representation are shown
in the upper panel, titration curves obtained for the apo enzyme (left)
and the complex (right) are shown in the lower panel. The light blue
area denotes the active region of the enzyme, while experimental
(blue), and predicted (black) pKa values
are shown as lines.
Prediction of the pKa shift of the
catalytic HIS57 of chymotrypsin upon formation of the negatively charged
complex. Complex structure and schematic representation are shown
in the upper panel, titration curves obtained for the apo enzyme (left)
and the complex (right) are shown in the lower panel. The light blue
area denotes the active region of the enzyme, while experimental
(blue), and predicted (black) pKa values
are shown as lines.
Serine Proteases
For the serine protease family, elastase,
trypsin, granzyme B and chymotrypsin were considered. Reported activity
ranges and experimental pKa values (only
available for chymotrypsin) are summarized in Table . Side chain pKa values of the catalytic aspartate and histidine residues were determined
with single pH constant pH MD simulation as described in the method section. Reported activity profiles and predicted
pKa values are summarized in Figure .
Table 1
Summary of Serine Proteases Which
Were Considered in This Studya
protease
pH activity range
experimental
pKa values
elastase
7–7.5
ND
trypsin
7–8
ND
granzyme B
7–8
ND
chymotrypsin
8–9
7.5
Activity ranges reported in literature
and available experimental pKa values
are given.
Figure 3
Titration curves and
predicted pKa values
of the serine proteases elastase (A), trypsin (B), granzyme B (C),
and chymotrypsin (D), which were considered in this study. Activity
ranges reported in the literature are shown as colored boxes (Table ). Experimental (only
available for chymotrypsin) and predicted pKa values are shown as blue and black lines, respectively.
Activity ranges reported in literature
and available experimental pKa values
are given.Titration curves and
predicted pKa values
of the serine proteases elastase (A), trypsin (B), granzyme B (C),
and chymotrypsin (D), which were considered in this study. Activity
ranges reported in the literature are shown as colored boxes (Table ). Experimental (only
available for chymotrypsin) and predicted pKa values are shown as blue and black lines, respectively.As can be seen from Figure , all four systems show a similar titration
behavior. In each
system an acidic and a considerably higher, near neutral pKa value were captured, which span a broad pH
range in which a monoprotonated state is stable. While the upper pKa value is at or near 6.0 for all systems, clear
differences can be seen for the lower pKa value. While for trypsin and chymotrypsin the lower pKa is very acidic (below 1), this is less pronounced in
elastase and granzyme B. For the latter, the titrations of the two
active site residues appear to be coupled stronger and the pKa differences are smaller compared to the other
two systems (5.8 and 6.1 vs 3.1 and 3.5).Furthermore, the upper
pKa value of
6.2 found here for chymotrypsin deviates more from the reported pKa of 7.5 than the one reported for the isolated
titration of the active site histidine described above (7.2).The convergence analysis shows that all upper pKa values converged after 50–60 ns. The lower pKa values show a slower convergence, especially
for granzyme B and chymotrypsin (see Figure S2 in the Supporting Information). This is in line with the titration
curves in Figure ,
which show that the predictions are more noisy at lower pH values.In relation to the respective active pH ranges, we find that for
all systems the active range is located at pH values higher than both
pKa values, i.e., in a range where both
residues are unprotonated.
Aspartic Proteases
We selected a
set of 4 aspartic
proteases with varying pH activity ranges and experimental titration
information, as summarized in Table . Side-chain pKa values
of the active site aspartate residues were predicted with single pH
constant pH simulations as described in the Methods section. The calculated titration curves and respective predicted
pKa values are summarized in Figure .
Table 2
Summary of Aspartic Proteases Which
Were Considered in This Studya
protease
pH activity range
experimental
pKa values
chymosin
<3.5
ND
pepsin[5]
1–6
(optimum at 3.5)
1.57; 5.02
cathepsin D[67]
2.5–6.0
ND
HIV-protease 1[68−70]
4.0–6.0
3.1–3.7; 4.9–6.8
Experimental activity ranges
and available pKa values are given.
Figure 4
Titration curves and predicted pKa values
of the aspartate proteases chymosin (A), pepsin (B), cathepsin D (C),
and HIV-1 protease (D), which were considered in this study. Activity
ranges reported in the literature are shown as colored boxes (Table ). Experimental (if
available) and predicted pKa values are
shown as red and black lines, respectively.
Experimental activity ranges
and available pKa values are given.Titration curves and predicted pKa values
of the aspartate proteases chymosin (A), pepsin (B), cathepsin D (C),
and HIV-1 protease (D), which were considered in this study. Activity
ranges reported in the literature are shown as colored boxes (Table ). Experimental (if
available) and predicted pKa values are
shown as red and black lines, respectively.As can be seen from Figure , our approach closely reproduces the available experimental
pKa values of pepsin and the HIV-protease
I. Furthermore, the calculated pKa values
envelop the experimentally determined active range of the respective
protease (shown as colored boxes in Figure ).Both systems show notable pKa shifts
for both aspartates away from the free amino acid pKa value (3.86 for aspartate[16]), with the effect being more pronounced in the HIV-protease. In
both systems, one pKa value is shifted
more toward acidic and one toward more basic pKa values compared to the free amino acid. Especially in pepsin,
the titration curves appear to be strongly coupled, with practically
no gap between the titrating regions, whereas in the HIV-protease,
a monoprotonated state is stable for a broad pH range (pH 2.0 to pH
6.0). Consequently, the gap between the pKa values is much smaller in pepsin (ΔpKa = 3.4), compared to the HIV-protease (ΔpKa = 7.7). In both cases, the experimentally determined
active range of the respective protease is located between the two
pKa values, i.e., in the monoprotonated
region. While for pepsin the reported active region somewhat exceeds
both experimental and predicted pKa values,
the reported pH optimum of 3.5 is indeed located at the very center
of the calculated titration curve.Also for chymosin and cathepsin
D, for which no experimental pKa information
is available, titration curves
and pKa values could be estimated. The
calculated pKa values show the same trend
as already observed for pepsin and the HIV-protease I. Chymosin shows
a strongly coupled titration behavior, similar to that of pepsin,
with a small pKa gap of 3.1. The activity
maximum is reported to be below pH 3.5, which again lies at the very
center of the monoprotonated region of the titration curve. Cathepsin
D, on the other hand, shows a titration behavior similar to the HIV-protease
I, in that the gap between the pKa values
is larger (4.8) and a monoprotonated state is stable over a longer
pH range (pH 2 to 4). The reported activity range again lies in this
pH region, below the upper pKa value.The convergence analysis of the predicted pKa values shows that again all pKa values converge within the 100 ns of simulation time. Most of the
upper pKa values again converge faster
than their lower counterparts (see Figure S2 in the Supporting Information).
State and Transition Analysis
To characterize the transition
paths and state distributions of the strongly coupled titrations seen
for the aspartic proteases, we performed protonation state transition
analyses. The active site titrations are modeled as a 4-state-system,
based on the protonation states of the two aspartic residues (see Table ). States 0 and 3
represent the fully deprotonated and fully protonated states, respectively,
whereas states 1 and 2 both represent a monoprotonated state but distinguish
which aspartate is protonated. Hereby all 4 variants of a protonated
aspartate defined in the cpHMD framework are condensed into one state.
The resulting transition matrices are visualized as network plots
with circle and arrow sizes corresponding to state and transition
probabilities, respectively.
Table 3
State Definitions
Used for the Protonation
State Transition Analyses of the Aspartic Proteasesa
state number
ASP A
ASP B
0
0
0
1
0
1
2
1
0
3
1
1
Protonation
states are denoted
as 0 (deprotonated) or 1 (protonated).
Protonation
states are denoted
as 0 (deprotonated) or 1 (protonated).We find that all proteases follow a similar, pH-dependent
pattern
in terms of state populations and transition probabilities. In Figure , the results for
selected pH values of the pepsin simulations are shown exemplary.
The analysis for all pH values can be found in the Supporting Information
(Figure S1). We find that at very low or
very high pH values, the fully protonated (state 3) or deprotonated
(state 0) states are dominantly populated, respectively. Transitions
to other, very sparsely, populated states do occur but are rare. At
moderately acidic pH values, after the first titration has occurred,
states 1 and 2, i.e., the monoprotonated states, increase in population
until a near uniform distribution of all four states is reached. As
the pH further increases, first state 3 and consecutively also states
1 and 2 diminish as state 0 becomes more and more populated. Furthermore,
we note that primarily single state transitions occur, which correspond
to transitions over the edges in the network plots in Figure . However, also transitions
over the diagonal are visible, which correspond to both aspartates
changing their protonation state at the same time. However, these
transitions are very rare.
Figure 5
Protonation state transition analysis performed
for pepsin shown
as network plots at selected pH values. Circle sizes and arrow thickness
corresponds to state populations and transition probabilities, respectively.
Protonation state transition analysis performed
for pepsin shown
as network plots at selected pH values. Circle sizes and arrow thickness
corresponds to state populations and transition probabilities, respectively.
Cysteine Proteases
For the family
of cysteine proteases,
experimental pKa values are available
for a number of systems,[9] all of which
show a very strong acidic shift of the active site cysteine residue
away from its tripeptide pKa value of
8.5. The pKa values of active site cysteine
residues have been reported to be extremely challenging to predict,
with most of the available prediction tools failing to predict experimentally
determined pKa shifts and even predicting
shifts into the wrong direction.[71] Here,
we selected papain as a test system, which shows a strong acidic shift
of the active site cysteine of −5.2 pKa units (from 8.5 down to 3.3).[9]We used single pH constant pH MD simulations to predict the
active site pKa value, as described in
the Methods section. The resulting titration
curves and pKa values are shown in Figure A. Clearly, our approach
not only mispredicts the pKa value of
CYS25 but also does not capture the acidic pKa shift at all. As can be seen from the titration curve in Figure A, CYS25 does not
titrate at all in the pH range, which was used for the aspartic and
serine proteases and only starts to titrate at a pH as high as 12.0.
Figure 6
Titration
of papain active site residues CYS25 and HIS159 with
MC based constant pH MD with implicit solvent models GBOBC (A) and GB-Neck2 (B), cpH REMD with the GBOBC model (C)
and continuous cpHMD (D). Experimental pKa value of CYS25 of 3.3 could not be reproduced with any approach,
with the simulations using the GBOBC model (A and C) showing
no titration at all.
Titration
of papain active site residues CYS25 and HIS159 with
MC based constant pH MD with implicit solvent models GBOBC (A) and GB-Neck2 (B), cpH REMD with the GBOBC model (C)
and continuous cpHMD (D). Experimental pKa value of CYS25 of 3.3 could not be reproduced with any approach,
with the simulations using the GBOBC model (A and C) showing
no titration at all.In order to analyze the
source of these erroneous predictions,
we repeated the simulations utilizing different implicit solvent models
and GB-radii for the sulfur, as was suggested recently by Shen and
co-workers.[32,50] To do this, we had to derive
the reference energies for cysteines, which are necessary for the
cpHMD workflow, since they were not yet available for the GB-neck
2 model and different GB radii (see Figure B). Furthermore, we employed the constant
pH replica exchange (cpH REMD) technique, which is implemented in
AMBER. This approach was shown in multiple works to increase both
protonation state and conformational sampling (see Figure C). Finally, we also repeated
the simulations utilizing the recent implementation and setup of continuous
cpHMD in AMBER by Shen and co-workers (see Figure D).While the titration curves in both Figure B (GB-Neck2 implicit
solvent model) and Figure D (continuous cpHMD)
show notable improvements in the titration prediction of the active
site cysteine, the predicted pKa values
(8.75 and 9.80, respectively) are close to the pKa value of free cysteine (8.5) and the strong acidic shift,
which was observed in experiments, could not be captured. In contrast
to this, no benefit in terms of pKa prediction
could be achieved with the cpH REMD setup compared to single pH simulations.
Discussion
We use single constant pH MD simulations to predict
the active
site pKa values of various members of
the aspartic, serine, and cysteine protease families. We further investigate
the molecular origins which could explain the observed differences
in the pH activity ranges within the individual families.
Chymotrypsin
Apo and Bound
As structural data of substrate-bound
complexes are limited, we simulated all proteases in their apo-state.
Nevertheless, we recognize that the presence of a substrate, especially
with charged residues, in the active site might influence the pKa values of titratable active site residues.
For chymotrypsin, experimental pKa values
for both the apo enzyme as well as for various trifluoro-peptidyl
complexes are available (see Figure A).[9,66] To assess, how well our approach
can capture such changes in the active site, we predicted the pKa value of the apo enzyme as well as for the
modified enzyme. We approximated the peptide-bound form with a simple
negatively charged serine (see Figure A). While this is a drastic simplification, we presume
that in terms of pKa shift potential the
additional negative charge in the active site represents the most
decisive aspect. As there is no high-quality structural data of these
complexes available, we assume that the error introduced by this simplification
is smaller, compared to the inaccuracies resulting from modeling the
rather large complex into the binding site. The validity of our approximation
is supported by the pKa values we obtain
from our simulations as shown in Figure B. With an unsigned error of 0.32 pKa units, we closely reproduce the reported pKa value of the free enzyme. For the complexes,
we find a strong basic shift for the active site histidine. This shift
can be directly attributed to the additional charge on the serine
residue. The experimental pKa values for
the bound enzyme range from 10 to 13 depending on the complexed peptide.
Our predicted pKa of 10.8 is perfectly
in line with these results. This indicates that despite the simplified
representation of the bound state, we still capture the strongest
perturbation driving the pKa shift, which
is indeed the additional negative charge.
Aspartate Proteases
In Figure , we summarize
predictions and experimental
references for the family of aspartate proteases. As can be seen from Figure B,D, our approach
closely reproduces the available experimental pKa values of pepsin and the HIV-1 protease.[5,68−70] Notably, in both systems one pKa value is predicted to be shifted into the acidic and the
other one into the basic direction, compared to the tripeptide reference
values of aspartate. Our calculations reproduce these shifts for both
proteases. Furthermore, we find that the experimentally reported activity
range lies between the two respective pKa values.These findings are consistent with the mechanistic
picture of a monoprotonated catalytic dyad in active aspartic proteases.
To make an example with our predicted pKa values, following this argument, pepsin should quickly become inactive
if the pH falls below 2.1 or rises above 5.5. Indeed, pepsin is reported
to be active between pH 1 and pH 6, i.e., in the pH range where a
monoprotonated state is predicted to be stable (see Figure B). A similar picture can be
found for the HIV-1 protease (Figure D). Also here the experimental pH range from 4 to 6,
in which the enzyme is found active, lies between the predicted pKa values of −0.5 and 7.2. In contrast
to pepsin, the lower pKa value was predicted
to be extremely acidic in our simulations compared to the experimental
pKa values (3.1 or 4.9 depending on the
reference). Roitberg and co-workers studied the influence of ligand
binding on the active site pKa values
of the HIV-1 protease and predicted pKa values of 1.29 and 7.32 for the apo enzyme.[64] While their upper pKa value is very
close to ours (7.32 to 7.2), the lower pKa value they find is still less acidic than the one we find (1.29
to −0.5). However, both times the lower pKa is found to be clearly more acidic than in the experiment.
With a pKa difference of 7.7 pKa units, the pH range, in which a monoprotonated
catalytic dyad is stable, is very broad in our simulations. Interestingly,
the enzyme is reported to be active in only a small pH window at mildly
acidic conditions (pH 4 to 6), thereby using only a portion of the
pH range which would be possible from a mechanistic point of view.
However, the activity itself is a very complex parameter, which depends
not only on the pH value and protonation states, but also on the respective
substrate, the exact assay conditions, and the fold stability of the
enzyme toward extreme pH values. The fact, that the reported activity
range of pepsin somewhat exceeds the margins given by the pKa values (regardless if predicted or experimental)
can also be explained by this argument.Our findings for chymosin
and cathepsin D (see Figures A,C, respectively) are in
line with the arguments made for pepsin and the HIV-1 protease. At
the time of writing this manuscript, no experimental pKa values from direct titration experiments but only activity
profiles were available in the literature for chymosin and cathepsin
D. Chymosin was reported to be most active at and slightly below pH
3.5, indicated by the fading color of the panel in Figure A. The titration curve we obtain
shows a lower pKa value at 1.8, followed
by a small plateau at pH 3.5, which means that a monoprotonated state
is predicted to be stable right at the reported most active pH. The
loss of activity at higher pH values can be attributed to the second
aspartate starting to titrate around pH 4.0 (predicted pKa of 4.9), thereby inactivating the enzyme. Cathepsin
D on the other hand, shows a titration behavior similar to the HIV-1
protease, in that the difference between the two pKa values is more than 1 pKa unit larger than in chymosin or pepsin. In consequence, also the
plateau between the two pKa values is
broader and a monoprotonated configuration is stable over a broader
pH range. Cathepsin is reported to be mostly active at pH values from
2.5 to 6.0, depending on the assay conditions and the substrate. This
fits very well with the pKa values we
find for the catalytic dyad (0.8 and 5.6, respectively), as they suggest
a stable monoprotonated state over the reported active pH range. Shen
and co-workers reported calculated pKa values of cathepsin D of 2.9 and 4.7, which in turn narrows the
range in which a monoprotonated state is predicted to be stable.[67] Intriguingly, the only available experimental
pKa values of 4.1 and >5 are significantly
higher than the predictions of Shen and co-workers and this study.
Furthermore, this would suggest that a monoprotonated form is only
stable at pH values above 4.1, which stands in contrast to the reported
active ranges. However, as the reported experimental pKa values do not stem from a dedicated titration study,
but were estimated from kinetic profiles, it is possible that they
are limited by the employed assay conditions.To evaluate the
potential errors of our pKa values stemming
from the discontinuities of the titration
curves (see Figure ), we reran the simulations for cathepsin D and pepsin using the
pH-REMD approach implemented in AMBER. As can be seen from Figure S3 in the Supporting Information, for
pepsin no notable change was found for the lower pKa value, while the error of the upper pKa value compared to the experiment increased by a small
margin. On the other hand, for cathepsin D, both pKa values come closer together, with especially lower pKa showing a higher value than in our single
pH simulations. As expected, with the REMD approach, the discontinuities
disappear for both systems. However, as the overall picture does not
change and the resulting pKa values are
very similar for both methods, we observe no indication that the discontinuities
significantly contribute to the deviation from the experimental values.While the differences in upper pKa value
are small for chymosin, pepsin, and cathepsin D (4.9, 5.5 and 5.6),
they differ significantly from the upper pKa of the HIV-1 protease (7.2). As all proteases were simulated in
their apo form, we conclude that structural differences of the enzymes
themselves must be a source of this difference. We have previously
shown, that within the constant pH framework and the used force fields,
proximal charges and to a lesser degree H-bonds have the biggest potential
of perturbing pKa values.[65] However, there are no positively charged residues close
enough to the active site aspartates to form ion pairs in any of the
studied enzymes. Hence, we calculated the average number of H-bonds
formed by the catalytic dyad to proximal residues with polar side
chains for each pH value. As can be seen from Figure , the HIV-1 protease forms around 1 H-bond
with neighboring residues before the first titration, after which
this number increases to around 4 H-bonds. In contrast to that, pepsin
and cathepsin D can form 2.5 H-bonds on average in their doubly negative
form and chymosin fluctuates around 3 H-bonds on average. This suggests,
that the negative charges in chymosin, pepsin, and cathepsin D are
stabilized by an H-bond network with neighboring residues, which is
not present in the HIV-1 protease. This in turn could explain the
notable shift of the upper pKa value for
the HIV-1 protease, as the number of stabilizing H-bonds drastically
increases to 4 H-bonds on average, as soon as the dyad is monoprotonated.
The increase in the average number of H-bonds after the first titration
is visible for all studied systems, albeit less pronounced compared
to the HIV-1 protease. This can be attributed to the special structural
arrangement of the catalytic aspartates, which enables the formation
of H-bonds between the aspartates when at least one is protonated
and thus further stabilizes the monoprotonated state. This is in line
with the discussion above and supported by the experimental activity
ranges, which report maximum activity of the respective proteases
in these regions.
Figure 7
Average number of hydrogen bonds formed by the catalytic
dyads
of chymosin (A), pepsin (B), cathepsin D (C), and HIV 1 protease (D).
Average number of hydrogen bonds formed by the catalytic
dyads
of chymosin (A), pepsin (B), cathepsin D (C), and HIV 1 protease (D).Due to the spatial vicinity, the titration behavior
of the two
active site aspartates is expected to be strongly coupled. We thus
profile the effect of this coupling by performing a transition analysis
based on the possible protonation state combinations of the catalytic
dyad. We illustrate this behavior for pepsin in Figure , with a focus on the pH region in between
the two apparent pKa values. Here, the
state populations indeed suggest primarily a monoprotonated form of
the dyad, as is expected from the titration curves. At pH 4.0, all
states are almost equally populated with a high number of edge (i.e.,
single proton) transition between all states. However, at the flanking
pH values of 3.0 and 5.0, we note a certain preference in terms of
which aspartate is protonated. On the one hand, this could point to
a simple convergence issue and could be resolved by extending the
simulations; on the other hand, this could mean, that protonation
on one aspartate is indeed more stabilized than on the other. To exclude
a convergence issue, we extended the simulations to 200 ns per pH,
i.e., doubling the simulation time per pH value. However, the state
distributions are remarkably stable and do not change significantly
with longer simulation time. Furthermore, it is intriguing that single
state transitions (transitions over the edges in Figure ) are far more frequent than
both residues changing their protonation state in the same step (diagonal
transitions in Figure ). This is especially interesting for the transition between states
1 and 2 which corresponds to both residues swapping the proton. As
both residues are directly interacting with each other, the overall
change for the system would be very small; however, the deprotonation
of one residue and the protonation of the other in the next step is
clearly favored.For serine proteases
except chymotrypsin,
no reliable active site pKa values could
be found in the literature at the time of the writing of this manuscript.
Therefore, the quality of the pKa prediction
is evaluated based on the reported activity profiles of the respective
protease. As already stated above, for these systems we did not titrate
the whole catalytic triad but only the catalytic aspartate and histidine.
Serine is generally not considered titratable in the investigated
pH range from 0 to 10.As can be seen in Figure , all studied systems show a similar titration
behavior, in that a quite acidic pKa value
below 1.0 and a near neutral pKa value
is predicted for the titrated residues. In relation to the reported
active pH ranges, we find for all studied systems that both pKa values are below the reported active ranges.
This means that both residues are in their deprotonated form, i.e.,
aspartate is negatively charged, while the histidine is neutral. This
is well in line with the mechanism of serine proteases, in which a
neutral histidine is strongly polarized by the neighboring aspartate
and in consequence abstracts a proton from the catalytic serine, which
in turn enacts the nucleophile attack on the substrate. It is therefore
imperative for activity, that the histidine is in its neutral form
and the aspartate is negatively charged. This is reflected in our
predicted pKa values for all studied systems.While the upper pKa values for all
studied systems are relatively similar and all lie within the error
margin of our cpHMD approach of ±1 pKa units, significant differences in the lower pKa values can be identified. In the case of trypsin and chymotrypsin
(Figure B,D), the
respective lower pKa values corresponding
to the titration of the catalytic aspartate are located below 1, clearly
separating them from the upper pKa values
which are around 6. In contrast, the titrations of elastase and granzyme
B (Figure A,C) appear
to be much more coupled, with a separation of around 3 pKa units. This difference can be attributed to differences
in the H-bond network, which the catalytic aspartate can form with
neighboring residues. In detail, a tyrosine residue (Y94, chymotrypsin
numbering), which is conserved in trypsin and chymotrypsin and represents
a potential H-bond partner for the catalytic aspartate, is mutated
to tryptophan in elastase. This loss of interaction could potentially
destabilize the deprotonated, i.e., negatively charged form of the
catalytic aspartate and in turn lead to an elevated apparent pKa value. This hypothesis is supported by an
H-bond analysis, shown in Figure . Clearly, the catalytic aspartate in trypsin (Figure C) forms 1 H-bond
more on average than the respective aspartates in elastase (Figure A) and granzyme B
(Figure B). Interestingly,
the same shift cannot be seen so clearly for chymotrypsin (Figure D). We surmise, that
while the interaction is not recognized as such by the employed metric,
the polar interaction of the side chain is still present and will
perturb the apparent pKa value of the
catalytic aspartate.
Figure 8
Average number of hydrogen bonds formed by the catalytic
aspartate
of elastase (A), granzyme B (B), trypsin (C), and chymotrypsin (D).
Average number of hydrogen bonds formed by the catalytic
aspartate
of elastase (A), granzyme B (B), trypsin (C), and chymotrypsin (D).Compared to serine
proteases, in
which the catalytically active serine gets deprotonated intermediately
during the reaction and directly attacks the substrate, the catalytic
cysteine in papain and other cysteine proteases forms an ion pair
with a neighboring histidine residue even in the apo state of the
enzyme.[9,72] Since the tripeptide pKa value of cysteine of 8.5[16] suggests a protonated form at physiological and especially at acidic
conditions, a strong perturbation of the cysteine pKa value is necessary in order to facilitate the ion pair
formation. Indeed, for papain and papain-like proteases like caricain
and ficin strongly perturbed pKa values
as low as 3.3, 2.9, and 2.5, respectively, have been reported in the
literature.[9,72] This strong shift of more than
5 pKa units is generally attributed to
the aforementioned ionic interaction with the neighboring histidine.
However, common prediction tools are reportedly unable to capture
these strong shifts in the aforementioned systems and strongly mispredict
the respective cysteine pKa values.[71]As can be seen from Figure , unfortunately also our approach falls short
in predicting the pKa shift of the catalytic
cysteine of papain (experimental pKa of
3.3). Indeed, with the implicit solvation model, which was successfully
used for the other families, no clear titration of cysteine could
be observed in the pH range from 0.0 to 14.0 (see Figure A). In an effort to pinpoint
the source of this erroneous behavior, we switched the used implicit
solvent to the most recent GB-Neck 2 model, coupled with the increase
of the GB radius of sulfur to 2.0 Å as suggested recently by
Shen and co-workers.[50] This had the notable
effect that we now capture a titration of cysteine, resulting in a
pKa value of 8.7 (see Figure B). However, the strong acidic
shift is still not captured. Furthermore, we repeated the simulations
using a replica exchange protocol in order to allow for a coupling
of the pH values, but also this did not improve the prediction (Figure C). To exclude a
deficiency of the MC-based constant pH framework, we reran the simulation
with the recent implementation of the continuous constant pH approach
in AMBER by Shen and co-workers, following their suggested setup for
the treatment of cysteines.[50] However,
as can be seen from Figure D, while we capture a titration of the cysteine, we still
are not able to predict the strong acidic shift. We would like to
note here, that while this manuscript was under revision, Shen and
co-workers published a broad benchmark study predicting cysteine pKa
values against experimental reference. With refined parameters, they
were able to very accurately reproduce even strong pKa shifts (i.e.,
in papain). We would like to refer the interested reader to their
publication.[74]We see a few possible
reasons why the correct pKa values or
at least an acidic shift could not be predicted.
First, the predicted pKa values might
correspond to a limited and strongly biased protonation state ensemble,
stemming from insufficient conformational sampling. As the time scales,
which can be covered in standard MD simulations, are generally several
orders of magnitude below the time scales on which the experimental
reference values are measured on, it could be possible that we only
observe a very small fraction of the experimental conformational space.
As the observed protonation state ensemble is closely linked to the
conformational ensemble, also the apparent pKa values we obtain will in turn only correspond to this small
sub-ensemble. Large conformational changes, which happen at much slower
time scales, might severely alter the underlying conformational ensemble
and in turn also the predicted pKa values.
However, we deem this scenario to be very unlikely, as the prediction
errors are extensive both in terms of the actual value as well as
in the shift direction. Furthermore, this would also mean that the
crystal structure and conformations close to it would represent an
almost negligible part of the conformational ensemble at slower time
scales. Thus, we surmise that a systematic error in the titration
prediction is the source of the erroneous cysteine pKa predictions.Second, failing to capture the perturbation-effect
itself could
lead to a complete misprediction of the pKa values. However, as discussed above, the main perturbation of the
pKa of CYS25 in papain comes from the
ionic interaction with HIS159, an interaction that is generally well
captured within the cpHMD framework.[65] To
rule out a possible effect of the titration of HIS159, we reran the
simulation, not allowing HIS159 to titrate and keeping it in its positively
charged form (data not shown). As this did not change the pKa prediction of CYS25, we presume that also
this scenario is not the definitive error source.Third, the
description of the sulfur and its titration in the context
of partial charges could be problematic. Since the titration of the
reference compounds works without any issues, we presume that the
description of the sulfur or the titration itself is not a problem
when an isolated cysteine is considered but rather arises when the
cysteine is located in a complex, i.e., protein environment. Shen
and co-workers recently used the continuous constant pH MD implementation
to successfully reproduce the pKa value
of the creatin kinase.[50] As the cysteine
pKa values in kinases are generally perturbed
less than the ones found in proteases like papain,[9] this could mean that only very strong perturbations are
not captured correctly. This could be linked to the strong polarizability
of sulfur, an effect that is neglected in all tested cpHMD approaches,
as no polarizable force fields are used.[73] We therefore presume that either a more sophisticated description
of the electrostatics of sulfur or the incorporation of polarizable
force fields would significantly improve the prediction. The aforementioned
approach by Radak et al. as implemented in the program NAMD holds
great promise in this regard due to its modular implementation with
generally no prior assumption of the used force field.[34]
Conclusion
We apply constant pH
MD simulations to provide pKa estimations
for active site residues of a set of 9 different
proteases. While the constant pH MD framework has been successfully
applied to protease systems before, to our knowledge no study was
published yet, which systematically predicts and summarizes active
site pKa values of multiple protease families.We find that our predictions are consistent with the available
experimental pKa values and are in sound
agreement with the strongly varying pH activity profiles of aspartic
and serine proteases. All titrated active site residues show substantial
shifts away from the tripeptide pKa values.
The applied sampling strategy successfully captures this behavior,
highlighting the benefits of dynamic pKa prediction tools compared to static algorithms. The approach also
allows us to depict the strongly coupled titration behavior found
for some of the studied systems, which we show in detail for pepsin.
Furthermore, we find pH-dependent H-bond networks which could explain
the varying protonation and thus pH activity profiles. We presume
the discussed residues as promising starting points, e.g., for protein
engineering efforts toward tailored pH activities. However, we also
clearly identify limitations of the methodology in terms of treating
the strongly polarizable sulfur. Nevertheless, as the field of cpHMD
simulations is rapidly progressing, we see these findings as an opportunity
to enhance the reliability of this method even further.
Authors: Emil Alexov; Ernest L Mehler; Nathan Baker; António M Baptista; Yong Huang; Francesca Milletti; Jens Erik Nielsen; Damien Farrell; Tommy Carstensen; Mats H M Olsson; Jana K Shen; Jim Warwicker; Sarah Williams; J Michael Word Journal: Proteins Date: 2011-10-15
Authors: James C Phillips; Rosemary Braun; Wei Wang; James Gumbart; Emad Tajkhorshid; Elizabeth Villa; Christophe Chipot; Robert D Skeel; Laxmikant Kalé; Klaus Schulten Journal: J Comput Chem Date: 2005-12 Impact factor: 3.376
Authors: Diego Garrido Ruiz; Angelica Sandoval-Perez; Amith Vikram Rangarajan; Emma L Gunderson; Matthew P Jacobson Journal: Biochemistry Date: 2022-09-26 Impact factor: 3.321
Authors: Huanchen Wang; Lalith Perera; Nikolaus Jork; Guangning Zong; Andrew M Riley; Barry V L Potter; Henning J Jessen; Stephen B Shears Journal: Nat Commun Date: 2022-04-25 Impact factor: 17.694
Authors: Lewis D Turner; Alexander L Nielsen; Lucy Lin; Sabine Pellett; Takashi Sugane; Margaret E Olson; Eric A Johnson; Kim D Janda Journal: RSC Med Chem Date: 2021-05-19