Literature DB >> 32348143

Catalytic Site pK_a Values of Aspartic, Cysteine, and Serine Proteases: Constant pH MD Simulations.

Florian Hofer¹, Johannes Kraml¹, Ursula Kahler¹, Anna S Kamenik¹, Klaus R Liedl¹.

Abstract

Enzymatic function and activity of proteases is closely controlled by the pH value. The protonation states of titratable residues in the active site react to changes in the pH value, according to their pKa, and thereby determine the functionality of the enzyme. Knowledge of the titration behavior of these residues is crucial for the development of drugs targeting the active site residues. However, experimental pKa data are scarce, since the systems' size and complexity make determination of these pKa values inherently difficult. In this study, we use single pH constant pH MD simulations as a fast and robust tool to estimate the active site pKa values of a set of aspartic, cysteine, and serine proteases. We capture characteristic pKa shifts of the active site residues, which dictate the experimentally determined activity profiles of the respective protease family. We find clear differences of active site pKa values within the respective families, which closely match the experimentally determined pH preferences of the respective proteases. These shifts are caused by a distinct network of electrostatic interactions characteristic for each protease family. While we find convincing agreement with experimental data for serine and aspartic proteases, we observe clear deficiencies in the description of the titration behavior of cysteines within the constant pH MD framework and highlight opportunities for improvement. Consequently, with this work, we provide a concise set of active site pKa values of aspartic and serine proteases, which could serve as reference for future theoretical as well as experimental studies.

Entities: Chemical

Mesh：

Substances：
Serine Proteases
Cysteine

Year: 2020 PMID： 32348143 PMCID： PMC7312390 DOI： 10.1021/acs.jcim.0c00190

Source DB: PubMed Journal: J Chem Inf Model ISSN： 1549-9596 Impact factor: 6.162

Introduction

Proteases catalyze the cleavage of peptide bonds, a ubiquitous reaction in the whole biosphere. Indeed, 2–3% of all human genes code for proteases or protease inhibitors.[1] The function of the proteases is manifold. Processes from signaling cascades over digestion to programmed cell death are based on proteolytic processing.[2] Consequently, the physiological environments where proteases need to operate are very diverse as well, including vastly different ranges of acidity. For example, digestive proteases in the stomach at a pH of 2.0 have to catalyze the same reaction as proteases of the blood coagulation cascade at a pH of 7.4 and proteases in the gut at basic conditions.[3,4] An overview of the various activity profiles of aspartic, cysteine, and serine proteases is shown in Figure .[5,6] Taken together, these three families cover a broad pH range in terms of activity. While aspartic proteases are active in the acidic range, cysteine proteases cover the mild acidic to neutral range and finally serine proteases are mostly found active at neutral to slightly alkaline conditions.[3−7]

Figure 1

Overview of the pH-dependent activity profiles of aspartic (red), cysteine (yellow), and serine (blue) proteases. Together these three families cover a broad pH span, ranging from very acidic, over neutral, to mildly basic pH values. The ranges of major subfamilies are highlighted in a darker shade of the respective color. The major distinctions between these three families in terms of catalysis can be found in their active site architecture. The catalytic center of aspartate proteases consists of an aspartic dyad, of which one aspartate acts as a base and the other one as an acid during catalysis.[3,8,9] For this purpose, it is imperative that the dyad is in a monoprotonated state when the protease is active. In cysteine proteases on the other hand, a cysteine and a histidine constitute the active site, which form an ion pair, i.e., the cysteine is in its thiolate form, while the imidazole side chain of the histidine is protonated and therefore positively charged.[7,9,10] In contrast, serine proteases show a catalytic triad motif, consisting of an aspartate, a histidine, and a serine, of which only the aspartate is negatively charged, while the histidine and serine are ionized intermediately during catalysis.[4,9] In summary, the nature and arrangement of the active site residues are decisive for the pH-dependent activity ranges of the different protease families shown in Figure . Different pH values or changes thereof lead to changes in the protonation states of titratable residues within a protein.[11−14] How a titratable residue reacts to different pH values is determined by its pKa value.[9,15] The so-called intrinsic pKa value of a titratable residue, i.e., the pKa of the isolated amino acid, is perturbed by the complex electrostatic environment formed by its surrounding residues within a protein to the so-called macroscopic or apparent pKa value.[9,16] Intrinsic pKa values of the various titratable amino acids commonly found in proteins can be rigorously approximated by small peptides (e.g., of the form acetyl-GXG-amide), which are easy to measure directly (commonly by NMR) and readily available in the literature.[16] However, within the complex environment of a protein, the direct determination of pKa values can be very challenging or even impossible.[16] All of the aforementioned active site residues, except serine, are titratable in the pH range of 0 to 10, in which also the discussed proteases are active.[9] Thus, the titration states of the active site residues depend directly on the pH. In consequence, the pH determines whether or not the enzyme is active, since a well-defined protonation state configuration is imperative for activity. As discussed above, the macroscopic pKa values determine how the titratable residues in the active site react to a specific pH value, which in turn depends on the electrostatic environment they encounter and therefore on the structure of the active site itself. The pKa values of the active site residues are thus decisive for inhibitor design and mechanistic investigations. However, the experimental determination of these pKa values is extremely difficult, which is reflected by the low number of available pKa values in the literature. In consequence, computational tools, which can reliably predict such pKa values are of utmost importance. Over the last decades a multitude of such prediction tools have emerged, most of which can be generally divided into two groups.[17] The group of static methods, e.g., PROPKA[18] or H++,[19] predicts pKa values based on single or multiple static structures of a protein. In contrast, dynamic methods such as the family of constant pH molecular dynamics (cpHMD) methods use an ensemble of structures for protonation state predictions.[20] It is well-established that proteins in solution are inherently flexible, meaning they relentlessly fluctuate between diverse conformational states of varying probabilities.[21−25] Consequently, the structural environment around titratable residues is continuously changing and the protonation state ensemble is inevitably linked to conformational rearrangements. The cpHMD approach offers the unique opportunity to account for this intricate interplay of conformation and protonation. The approach not only incorporates a diverse set of conformations into the pKa prediction itself, but also allows capturing how a protein structurally adapts to different pH values.[20] Most of the different cpHMD approaches can be attributed to two main groups, based on the treatment of the protonation states. On the one hand, protonation states can be treated discretely, and all titratable protons are explicitly defined at each titratable group and if not active are only present as ghost particles. The simulation is periodically interrupted, and the protonation state changes are attempted based on a Metropolis criterion.[26−30] On the other hand, protonation states can be sampled along a continuous titration coordinate λ.[31,32] Similar to the concept of thermodynamic integration,[33] if λ is 0, the respective residue is protonated and if λ is 1, it is deprotonated; all states in between are unphysical. As in typical simulations only a small number of frames would meet this criterion, usually a cutoff is employed to maximize the number of analyzable frames. Recently, Radak et al. presented a hybrid nonequilibrium MD/Monte Carlo approach,[34] based on the works of Roux[35] and Stern.[28] Here, equilibrium MD is performed with fixed protonation states. Periodically, a nonequilibrium switch is attempted, sampling in the protonation and conformation space. Whether or not the switch is accepted is determined via a Metropolis Criterion. If the switch is indeed accepted, equilibrium MD continues with the new protonation state from the final conformation of the switch. If not, the simulation reverts back to the conformation before the switch attempt. This approach is implemented in the NAMD package.[36] For a more in-depth discussion of the various techniques, we point the reader to the respective works.[26−32,34,35] In this work we use cpHMD simulations to titrate the active site residues of selected proteases of the aspartic, cysteine, and serine protease families. We focus on the methods implemented in the AMBER software package.[37] We use primarily the Monte Carlo[38] (MC)-based cpH approach, as implemented in AMBER, with discrete protonation states, specifically the most recent variant by Roitberg and co-workers utilizing explicit solvent.[30] On the other hand, we also make use of the continuous cpHMD approach, which was also recently implemented in AMBER by Shen and co-workers.[32] Both aforementioned approaches of cpHMD have been combined with enhanced sampling techniques like replica exchange MD (REMD[39]) and accelerated MD (aMD[40]) in order to achieve efficient sampling of conformations and protonation states.[29,30,32,41−44] The recent implementations of cpHMD on graphics processing units (GPUs) dramatically increased calculation speed. Hence, it is possible to capture dynamics at slower time scales with continuous trajectories at feasible computational costs. Here we use single pH cpHMD simulations, as they can be run easily in parallel with an arbitrary number of GPUs and show acceptable convergence behavior.[45] We apply this workflow to a set of 9 representative proteases from three of the four main largest protease classes distinguished by the catalytic mechanism.[46] On the basis of relevance in drug discovery and differences in pH-dependent activity profiles, we selected representative proteases from the aspartate, cysteine, and serine protease families. We excluded the family of metalloproteases, as for this family, the protonation/deprotonation events in the active site are closely linked to the coordinating ion. In order to capture this effect, a sophisticated description of the electrostatics and polarizability of the ion would be necessary, which is not possible for the force fields used within the cpHMD framework. For the selected proteases we efficiently capture reliable protonation state ensembles. In addition to reference pKa values, we provide atomistic insights to rationalize the origin of the strongly varying activity profiles.

Methods

System and Simulation Setup

All systems were prepared with the program MOE (molecular operating environment)[47] from X-ray structures, which are available in the PDB. The respective PDB codes are summarized in Table S1 in the Supporting Information. All crystal waters, agents, and ligands were removed if any were present. If multiple chains were present in the entry, the chain with the highest quality and sequence coverage was chosen based on the full PDB validation report. The LEaP module of AmberTools 19[37] was used to add missing hydrogens and create topology and starting coordinate files. The AMBER ff99SB[48] force field coupled with the necessary modifications for constant pH MD simulations was used.[27,30] The GB radii of the titratable oxygens of aspartate were reduced to 1.3 Å as suggested by Swails et al.[30] All systems were placed in a truncated octahedral TIP3P water box with a minimum wall distance of 10 Å.[49] Furthermore, the cysteine protease papain was simulated using the GB-Neck2 implicit solvent model with the appropriate GB radii.[50] As there were no reference energies available for cysteine for this implicit solvent model, reference energies were derived as suggested in the AMBER manual.[37] For the derivation of partial charges and force field parameters of deprotonated, i.e., negatively charged serine, the structure was prepared with MOE and the needed parameters subsequently derived with Gaussian 16[51] and the antechamber framework of AmberTools19.[37] Partial charges were derived with the RESP[52] procedure. Before production simulations, all systems were equilibrated with an elaborate protocol developed in our group.[53] All simulations were carried out with the pmemd module of AMBER 18, making use of both the CPU and the GPU implementation.[37] Calculations were carried out on the Vienna Scientific Cluster (VSC3 and VSC4) and on our in-house GPU cluster. The Langevin thermostat[54] with a collision frequency of 5 ps–1 was used to keep a constant temperature of 310 K, as was the Berendsen barostat[55] with a relaxation time of 2 ps to keep atmospheric pressure. The SHAKE[56] algorithm was used to restrain all bonds involving hydrogens, enabling the use of a 2 fs time step. Long range electrostatics were treated with the particle-mesh Ewald method[57] (PME), and a nonbonded cutoff of 8 Å was used. All systems were simulated at pH values from 0.0 to 10.0 (0.0 to 14.0 for papain) with a 0.5 spacing. For all MC-based cpHMD simulations, protonation state changes were attempted every 200 steps, followed by 200 steps of solvent relaxation after a successful attempt. For the GB calculations within the cpHMD framework, a salt concentration of 0.1 was used. For aspartic proteases, the two aspartates comprising the catalytic dyad were selected to titrate. For serine proteases, the aspartate and the histidine of the catalytic triad were selected to titrate. For the cysteine protease papain, two approaches were tested. On the one hand, both the cysteine and the histidine of the catalytic center were selected to titrate, and on the other hand, only the cysteine was titrated, while keeping the histidine in its doubly protonated, i.e., positively charged form. Frames were collected every 1000 frames. All simulations were run for 100 ns per pH value, resulting in 2.1 μs of aggregate simulation time per system. For papain, the system was prepared following the procedure described by Shen and co-workers[50] and was simulated using the recent implementation of continuous cpHMD in AMBER 18.[32,45,50] In brief, CHARMM[58] with the CHARMM22 all-hydrogen force field[59] was used to add missing hydrogens, terminal cappings, set up the titratable groups, and perform initial minimizations. Hereafter, the minimized structure was prepared with LEaP using the AMBER ff14SB[60] force field with the necessary modifications for continuous cpHMD.[32] The GB radius of the titratable sulfur was set to 2.0 Å as suggested by Shen and co-workers.[50] The subsequent simulations were carried out with the same settings as described above. For the continuous cpHMD specific settings, a mass of 10 amu was used for the lambda particles, a friction coefficient of 5 ps–1 was used for the titration integrator, and the forces of the lambda particles were updated every step.

Analysis

All analyses were performed using cpptraj[61] and pytraj from AmberTools 19,[37] combined with in-house python scripts. Analysis of the continuous cpHMD data was done with a python script provided by Shen and co-workers.[62] Structural representations were created with PyMol.[63] Titration data from MC-based cpHMD simulations was analyzed with the cphstats program from AmberTools 19.[37] As the titrations of the catalytic residues were strongly coupled, titration curves were obtained by fitting the average number of total protons as was shown previously by Roitberg and co-workers for the HIV-1 protease (eq ).[64] Setups in which only one residue was titrated were fitted to the modified Hill equation (eq ).Shifts in pKa values were evaluated using capped tripeptide (acetyl-GXG-amide) pKa values as published by Platzer and McIntosh as reference.[16] Convergence of pKa values was evaluated by monitoring the cumulative averages of the pKa predictions. To profile protonation state transition probabilities between the strongly active site residues in aspartate proteases, we set up a 4-state model based on the possible protonation state combinations of the respective titrated residues and calculated transition matrices based on these models, as we previously showed.[65] The matrices were then visualized as network plots, in which circle sizes denote state probabilities and arrow sizes transition probabilities.

Results

Chymotrypsin in Apo and Bound Form

In order to benchmark the robustness of the applied constant pH MD simulation approach, we aimed to reproduce the pKa change of the catalytic histidine associated with the activation of chymotrypsin described by Lin et al.[66] We used the apo enzyme as a model for the encounter complex of the protease and the peptide-compound as shown in Figure A. With a predicted pKa value of 7.16, we closely reproduce the literature pKa value of 7.5 (Figure B). We modeled the negatively charged complex of protease and peptide by simply deprotonating the catalytic serine residue, thereby introducing an additional negative charge (see Figure A). For this system, we find a pKa value of 10.78, which is in line with the experimentally determined pKa range of 10–12.

Figure 2

Prediction of the pKa shift of the catalytic HIS57 of chymotrypsin upon formation of the negatively charged complex. Complex structure and schematic representation are shown in the upper panel, titration curves obtained for the apo enzyme (left) and the complex (right) are shown in the lower panel. The light blue area denotes the active region of the enzyme, while experimental (blue), and predicted (black) pKa values are shown as lines.

Serine Proteases

For the serine protease family, elastase, trypsin, granzyme B and chymotrypsin were considered. Reported activity ranges and experimental pKa values (only available for chymotrypsin) are summarized in Table . Side chain pKa values of the catalytic aspartate and histidine residues were determined with single pH constant pH MD simulation as described in the method section. Reported activity profiles and predicted pKa values are summarized in Figure .

Table 1

Summary of Serine Proteases Which Were Considered in This Studya

protease	pH activity range	experimental pK_a values
elastase	7–7.5	ND
trypsin	7–8	ND
granzyme B	7–8	ND
chymotrypsin	8–9	7.5

Activity ranges reported in literature and available experimental pKa values are given.

Figure 3

Titration curves and predicted pKa values of the serine proteases elastase (A), trypsin (B), granzyme B (C), and chymotrypsin (D), which were considered in this study. Activity ranges reported in the literature are shown as colored boxes (Table ). Experimental (only available for chymotrypsin) and predicted pKa values are shown as blue and black lines, respectively.

Activity ranges reported in literature and available experimental pKa values are given. Titration curves and predicted pKa values of the serine proteases elastase (A), trypsin (B), granzyme B (C), and chymotrypsin (D), which were considered in this study. Activity ranges reported in the literature are shown as colored boxes (Table ). Experimental (only available for chymotrypsin) and predicted pKa values are shown as blue and black lines, respectively. As can be seen from Figure , all four systems show a similar titration behavior. In each system an acidic and a considerably higher, near neutral pKa value were captured, which span a broad pH range in which a monoprotonated state is stable. While the upper pKa value is at or near 6.0 for all systems, clear differences can be seen for the lower pKa value. While for trypsin and chymotrypsin the lower pKa is very acidic (below 1), this is less pronounced in elastase and granzyme B. For the latter, the titrations of the two active site residues appear to be coupled stronger and the pKa differences are smaller compared to the other two systems (5.8 and 6.1 vs 3.1 and 3.5). Furthermore, the upper pKa value of 6.2 found here for chymotrypsin deviates more from the reported pKa of 7.5 than the one reported for the isolated titration of the active site histidine described above (7.2). The convergence analysis shows that all upper pKa values converged after 50–60 ns. The lower pKa values show a slower convergence, especially for granzyme B and chymotrypsin (see Figure S2 in the Supporting Information). This is in line with the titration curves in Figure , which show that the predictions are more noisy at lower pH values. In relation to the respective active pH ranges, we find that for all systems the active range is located at pH values higher than both pKa values, i.e., in a range where both residues are unprotonated.

Aspartic Proteases

We selected a set of 4 aspartic proteases with varying pH activity ranges and experimental titration information, as summarized in Table . Side-chain pKa values of the active site aspartate residues were predicted with single pH constant pH simulations as described in the Methods section. The calculated titration curves and respective predicted pKa values are summarized in Figure .

Table 2

Summary of Aspartic Proteases Which Were Considered in This Studya

protease	pH activity range	experimental pK_a values
chymosin	<3.5	ND
pepsin[5]	1–6 (optimum at 3.5)	1.57; 5.02
cathepsin D[67]	2.5–6.0	ND
HIV-protease 1[68−70]	4.0–6.0	3.1–3.7; 4.9–6.8

Experimental activity ranges and available pKa values are given.

Figure 4

Titration curves and predicted pKa values of the aspartate proteases chymosin (A), pepsin (B), cathepsin D (C), and HIV-1 protease (D), which were considered in this study. Activity ranges reported in the literature are shown as colored boxes (Table ). Experimental (if available) and predicted pKa values are shown as red and black lines, respectively.

Experimental activity ranges and available pKa values are given. Titration curves and predicted pKa values of the aspartate proteases chymosin (A), pepsin (B), cathepsin D (C), and HIV-1 protease (D), which were considered in this study. Activity ranges reported in the literature are shown as colored boxes (Table ). Experimental (if available) and predicted pKa values are shown as red and black lines, respectively. As can be seen from Figure , our approach closely reproduces the available experimental pKa values of pepsin and the HIV-protease I. Furthermore, the calculated pKa values envelop the experimentally determined active range of the respective protease (shown as colored boxes in Figure ). Both systems show notable pKa shifts for both aspartates away from the free amino acid pKa value (3.86 for aspartate[16]), with the effect being more pronounced in the HIV-protease. In both systems, one pKa value is shifted more toward acidic and one toward more basic pKa values compared to the free amino acid. Especially in pepsin, the titration curves appear to be strongly coupled, with practically no gap between the titrating regions, whereas in the HIV-protease, a monoprotonated state is stable for a broad pH range (pH 2.0 to pH 6.0). Consequently, the gap between the pKa values is much smaller in pepsin (ΔpKa = 3.4), compared to the HIV-protease (ΔpKa = 7.7). In both cases, the experimentally determined active range of the respective protease is located between the two pKa values, i.e., in the monoprotonated region. While for pepsin the reported active region somewhat exceeds both experimental and predicted pKa values, the reported pH optimum of 3.5 is indeed located at the very center of the calculated titration curve. Also for chymosin and cathepsin D, for which no experimental pKa information is available, titration curves and pKa values could be estimated. The calculated pKa values show the same trend as already observed for pepsin and the HIV-protease I. Chymosin shows a strongly coupled titration behavior, similar to that of pepsin, with a small pKa gap of 3.1. The activity maximum is reported to be below pH 3.5, which again lies at the very center of the monoprotonated region of the titration curve. Cathepsin D, on the other hand, shows a titration behavior similar to the HIV-protease I, in that the gap between the pKa values is larger (4.8) and a monoprotonated state is stable over a longer pH range (pH 2 to 4). The reported activity range again lies in this pH region, below the upper pKa value. The convergence analysis of the predicted pKa values shows that again all pKa values converge within the 100 ns of simulation time. Most of the upper pKa values again converge faster than their lower counterparts (see Figure S2 in the Supporting Information).

State and Transition Analysis

To characterize the transition paths and state distributions of the strongly coupled titrations seen for the aspartic proteases, we performed protonation state transition analyses. The active site titrations are modeled as a 4-state-system, based on the protonation states of the two aspartic residues (see Table ). States 0 and 3 represent the fully deprotonated and fully protonated states, respectively, whereas states 1 and 2 both represent a monoprotonated state but distinguish which aspartate is protonated. Hereby all 4 variants of a protonated aspartate defined in the cpHMD framework are condensed into one state. The resulting transition matrices are visualized as network plots with circle and arrow sizes corresponding to state and transition probabilities, respectively.

Table 3

State Definitions Used for the Protonation State Transition Analyses of the Aspartic Proteasesa

state number	ASP A	ASP B
0	0	0
1	0	1
2	1	0
3	1	1

Protonation states are denoted as 0 (deprotonated) or 1 (protonated).

Protonation states are denoted as 0 (deprotonated) or 1 (protonated). We find that all proteases follow a similar, pH-dependent pattern in terms of state populations and transition probabilities. In Figure , the results for selected pH values of the pepsin simulations are shown exemplary. The analysis for all pH values can be found in the Supporting Information (Figure S1). We find that at very low or very high pH values, the fully protonated (state 3) or deprotonated (state 0) states are dominantly populated, respectively. Transitions to other, very sparsely, populated states do occur but are rare. At moderately acidic pH values, after the first titration has occurred, states 1 and 2, i.e., the monoprotonated states, increase in population until a near uniform distribution of all four states is reached. As the pH further increases, first state 3 and consecutively also states 1 and 2 diminish as state 0 becomes more and more populated. Furthermore, we note that primarily single state transitions occur, which correspond to transitions over the edges in the network plots in Figure . However, also transitions over the diagonal are visible, which correspond to both aspartates changing their protonation state at the same time. However, these transitions are very rare.

Figure 5

Protonation state transition analysis performed for pepsin shown as network plots at selected pH values. Circle sizes and arrow thickness corresponds to state populations and transition probabilities, respectively.

Cysteine Proteases

For the family of cysteine proteases, experimental pKa values are available for a number of systems,[9] all of which show a very strong acidic shift of the active site cysteine residue away from its tripeptide pKa value of 8.5. The pKa values of active site cysteine residues have been reported to be extremely challenging to predict, with most of the available prediction tools failing to predict experimentally determined pKa shifts and even predicting shifts into the wrong direction.[71] Here, we selected papain as a test system, which shows a strong acidic shift of the active site cysteine of −5.2 pKa units (from 8.5 down to 3.3).[9] We used single pH constant pH MD simulations to predict the active site pKa value, as described in the Methods section. The resulting titration curves and pKa values are shown in Figure A. Clearly, our approach not only mispredicts the pKa value of CYS25 but also does not capture the acidic pKa shift at all. As can be seen from the titration curve in Figure A, CYS25 does not titrate at all in the pH range, which was used for the aspartic and serine proteases and only starts to titrate at a pH as high as 12.0.

Figure 6

Titration of papain active site residues CYS25 and HIS159 with MC based constant pH MD with implicit solvent models GBOBC (A) and GB-Neck2 (B), cpH REMD with the GBOBC model (C) and continuous cpHMD (D). Experimental pKa value of CYS25 of 3.3 could not be reproduced with any approach, with the simulations using the GBOBC model (A and C) showing no titration at all. In order to analyze the source of these erroneous predictions, we repeated the simulations utilizing different implicit solvent models and GB-radii for the sulfur, as was suggested recently by Shen and co-workers.[32,50] To do this, we had to derive the reference energies for cysteines, which are necessary for the cpHMD workflow, since they were not yet available for the GB-neck 2 model and different GB radii (see Figure B). Furthermore, we employed the constant pH replica exchange (cpH REMD) technique, which is implemented in AMBER. This approach was shown in multiple works to increase both protonation state and conformational sampling (see Figure C). Finally, we also repeated the simulations utilizing the recent implementation and setup of continuous cpHMD in AMBER by Shen and co-workers (see Figure D). While the titration curves in both Figure B (GB-Neck2 implicit solvent model) and Figure D (continuous cpHMD) show notable improvements in the titration prediction of the active site cysteine, the predicted pKa values (8.75 and 9.80, respectively) are close to the pKa value of free cysteine (8.5) and the strong acidic shift, which was observed in experiments, could not be captured. In contrast to this, no benefit in terms of pKa prediction could be achieved with the cpH REMD setup compared to single pH simulations.

Discussion

We use single constant pH MD simulations to predict the active site pKa values of various members of the aspartic, serine, and cysteine protease families. We further investigate the molecular origins which could explain the observed differences in the pH activity ranges within the individual families.

Chymotrypsin Apo and Bound

As structural data of substrate-bound complexes are limited, we simulated all proteases in their apo-state. Nevertheless, we recognize that the presence of a substrate, especially with charged residues, in the active site might influence the pKa values of titratable active site residues. For chymotrypsin, experimental pKa values for both the apo enzyme as well as for various trifluoro-peptidyl complexes are available (see Figure A).[9,66] To assess, how well our approach can capture such changes in the active site, we predicted the pKa value of the apo enzyme as well as for the modified enzyme. We approximated the peptide-bound form with a simple negatively charged serine (see Figure A). While this is a drastic simplification, we presume that in terms of pKa shift potential the additional negative charge in the active site represents the most decisive aspect. As there is no high-quality structural data of these complexes available, we assume that the error introduced by this simplification is smaller, compared to the inaccuracies resulting from modeling the rather large complex into the binding site. The validity of our approximation is supported by the pKa values we obtain from our simulations as shown in Figure B. With an unsigned error of 0.32 pKa units, we closely reproduce the reported pKa value of the free enzyme. For the complexes, we find a strong basic shift for the active site histidine. This shift can be directly attributed to the additional charge on the serine residue. The experimental pKa values for the bound enzyme range from 10 to 13 depending on the complexed peptide. Our predicted pKa of 10.8 is perfectly in line with these results. This indicates that despite the simplified representation of the bound state, we still capture the strongest perturbation driving the pKa shift, which is indeed the additional negative charge.

Aspartate Proteases

In Figure , we summarize predictions and experimental references for the family of aspartate proteases. As can be seen from Figure B,D, our approach closely reproduces the available experimental pKa values of pepsin and the HIV-1 protease.[5,68−70] Notably, in both systems one pKa value is predicted to be shifted into the acidic and the other one into the basic direction, compared to the tripeptide reference values of aspartate. Our calculations reproduce these shifts for both proteases. Furthermore, we find that the experimentally reported activity range lies between the two respective pKa values. These findings are consistent with the mechanistic picture of a monoprotonated catalytic dyad in active aspartic proteases. To make an example with our predicted pKa values, following this argument, pepsin should quickly become inactive if the pH falls below 2.1 or rises above 5.5. Indeed, pepsin is reported to be active between pH 1 and pH 6, i.e., in the pH range where a monoprotonated state is predicted to be stable (see Figure B). A similar picture can be found for the HIV-1 protease (Figure D). Also here the experimental pH range from 4 to 6, in which the enzyme is found active, lies between the predicted pKa values of −0.5 and 7.2. In contrast to pepsin, the lower pKa value was predicted to be extremely acidic in our simulations compared to the experimental pKa values (3.1 or 4.9 depending on the reference). Roitberg and co-workers studied the influence of ligand binding on the active site pKa values of the HIV-1 protease and predicted pKa values of 1.29 and 7.32 for the apo enzyme.[64] While their upper pKa value is very close to ours (7.32 to 7.2), the lower pKa value they find is still less acidic than the one we find (1.29 to −0.5). However, both times the lower pKa is found to be clearly more acidic than in the experiment. With a pKa difference of 7.7 pKa units, the pH range, in which a monoprotonated catalytic dyad is stable, is very broad in our simulations. Interestingly, the enzyme is reported to be active in only a small pH window at mildly acidic conditions (pH 4 to 6), thereby using only a portion of the pH range which would be possible from a mechanistic point of view. However, the activity itself is a very complex parameter, which depends not only on the pH value and protonation states, but also on the respective substrate, the exact assay conditions, and the fold stability of the enzyme toward extreme pH values. The fact, that the reported activity range of pepsin somewhat exceeds the margins given by the pKa values (regardless if predicted or experimental) can also be explained by this argument. Our findings for chymosin and cathepsin D (see Figures A,C, respectively) are in line with the arguments made for pepsin and the HIV-1 protease. At the time of writing this manuscript, no experimental pKa values from direct titration experiments but only activity profiles were available in the literature for chymosin and cathepsin D. Chymosin was reported to be most active at and slightly below pH 3.5, indicated by the fading color of the panel in Figure A. The titration curve we obtain shows a lower pKa value at 1.8, followed by a small plateau at pH 3.5, which means that a monoprotonated state is predicted to be stable right at the reported most active pH. The loss of activity at higher pH values can be attributed to the second aspartate starting to titrate around pH 4.0 (predicted pKa of 4.9), thereby inactivating the enzyme. Cathepsin D on the other hand, shows a titration behavior similar to the HIV-1 protease, in that the difference between the two pKa values is more than 1 pKa unit larger than in chymosin or pepsin. In consequence, also the plateau between the two pKa values is broader and a monoprotonated configuration is stable over a broader pH range. Cathepsin is reported to be mostly active at pH values from 2.5 to 6.0, depending on the assay conditions and the substrate. This fits very well with the pKa values we find for the catalytic dyad (0.8 and 5.6, respectively), as they suggest a stable monoprotonated state over the reported active pH range. Shen and co-workers reported calculated pKa values of cathepsin D of 2.9 and 4.7, which in turn narrows the range in which a monoprotonated state is predicted to be stable.[67] Intriguingly, the only available experimental pKa values of 4.1 and >5 are significantly higher than the predictions of Shen and co-workers and this study. Furthermore, this would suggest that a monoprotonated form is only stable at pH values above 4.1, which stands in contrast to the reported active ranges. However, as the reported experimental pKa values do not stem from a dedicated titration study, but were estimated from kinetic profiles, it is possible that they are limited by the employed assay conditions. To evaluate the potential errors of our pKa values stemming from the discontinuities of the titration curves (see Figure ), we reran the simulations for cathepsin D and pepsin using the pH-REMD approach implemented in AMBER. As can be seen from Figure S3 in the Supporting Information, for pepsin no notable change was found for the lower pKa value, while the error of the upper pKa value compared to the experiment increased by a small margin. On the other hand, for cathepsin D, both pKa values come closer together, with especially lower pKa showing a higher value than in our single pH simulations. As expected, with the REMD approach, the discontinuities disappear for both systems. However, as the overall picture does not change and the resulting pKa values are very similar for both methods, we observe no indication that the discontinuities significantly contribute to the deviation from the experimental values. While the differences in upper pKa value are small for chymosin, pepsin, and cathepsin D (4.9, 5.5 and 5.6), they differ significantly from the upper pKa of the HIV-1 protease (7.2). As all proteases were simulated in their apo form, we conclude that structural differences of the enzymes themselves must be a source of this difference. We have previously shown, that within the constant pH framework and the used force fields, proximal charges and to a lesser degree H-bonds have the biggest potential of perturbing pKa values.[65] However, there are no positively charged residues close enough to the active site aspartates to form ion pairs in any of the studied enzymes. Hence, we calculated the average number of H-bonds formed by the catalytic dyad to proximal residues with polar side chains for each pH value. As can be seen from Figure , the HIV-1 protease forms around 1 H-bond with neighboring residues before the first titration, after which this number increases to around 4 H-bonds. In contrast to that, pepsin and cathepsin D can form 2.5 H-bonds on average in their doubly negative form and chymosin fluctuates around 3 H-bonds on average. This suggests, that the negative charges in chymosin, pepsin, and cathepsin D are stabilized by an H-bond network with neighboring residues, which is not present in the HIV-1 protease. This in turn could explain the notable shift of the upper pKa value for the HIV-1 protease, as the number of stabilizing H-bonds drastically increases to 4 H-bonds on average, as soon as the dyad is monoprotonated. The increase in the average number of H-bonds after the first titration is visible for all studied systems, albeit less pronounced compared to the HIV-1 protease. This can be attributed to the special structural arrangement of the catalytic aspartates, which enables the formation of H-bonds between the aspartates when at least one is protonated and thus further stabilizes the monoprotonated state. This is in line with the discussion above and supported by the experimental activity ranges, which report maximum activity of the respective proteases in these regions.

Figure 7

Average number of hydrogen bonds formed by the catalytic dyads of chymosin (A), pepsin (B), cathepsin D (C), and HIV 1 protease (D).

Average number of hydrogen bonds formed by the catalytic dyads of chymosin (A), pepsin (B), cathepsin D (C), and HIV 1 protease (D). Due to the spatial vicinity, the titration behavior of the two active site aspartates is expected to be strongly coupled. We thus profile the effect of this coupling by performing a transition analysis based on the possible protonation state combinations of the catalytic dyad. We illustrate this behavior for pepsin in Figure , with a focus on the pH region in between the two apparent pKa values. Here, the state populations indeed suggest primarily a monoprotonated form of the dyad, as is expected from the titration curves. At pH 4.0, all states are almost equally populated with a high number of edge (i.e., single proton) transition between all states. However, at the flanking pH values of 3.0 and 5.0, we note a certain preference in terms of which aspartate is protonated. On the one hand, this could point to a simple convergence issue and could be resolved by extending the simulations; on the other hand, this could mean, that protonation on one aspartate is indeed more stabilized than on the other. To exclude a convergence issue, we extended the simulations to 200 ns per pH, i.e., doubling the simulation time per pH value. However, the state distributions are remarkably stable and do not change significantly with longer simulation time. Furthermore, it is intriguing that single state transitions (transitions over the edges in Figure ) are far more frequent than both residues changing their protonation state in the same step (diagonal transitions in Figure ). This is especially interesting for the transition between states 1 and 2 which corresponds to both residues swapping the proton. As both residues are directly interacting with each other, the overall change for the system would be very small; however, the deprotonation of one residue and the protonation of the other in the next step is clearly favored. For serine proteases except chymotrypsin, no reliable active site pKa values could be found in the literature at the time of the writing of this manuscript. Therefore, the quality of the pKa prediction is evaluated based on the reported activity profiles of the respective protease. As already stated above, for these systems we did not titrate the whole catalytic triad but only the catalytic aspartate and histidine. Serine is generally not considered titratable in the investigated pH range from 0 to 10. As can be seen in Figure , all studied systems show a similar titration behavior, in that a quite acidic pKa value below 1.0 and a near neutral pKa value is predicted for the titrated residues. In relation to the reported active pH ranges, we find for all studied systems that both pKa values are below the reported active ranges. This means that both residues are in their deprotonated form, i.e., aspartate is negatively charged, while the histidine is neutral. This is well in line with the mechanism of serine proteases, in which a neutral histidine is strongly polarized by the neighboring aspartate and in consequence abstracts a proton from the catalytic serine, which in turn enacts the nucleophile attack on the substrate. It is therefore imperative for activity, that the histidine is in its neutral form and the aspartate is negatively charged. This is reflected in our predicted pKa values for all studied systems. While the upper pKa values for all studied systems are relatively similar and all lie within the error margin of our cpHMD approach of ±1 pKa units, significant differences in the lower pKa values can be identified. In the case of trypsin and chymotrypsin (Figure B,D), the respective lower pKa values corresponding to the titration of the catalytic aspartate are located below 1, clearly separating them from the upper pKa values which are around 6. In contrast, the titrations of elastase and granzyme B (Figure A,C) appear to be much more coupled, with a separation of around 3 pKa units. This difference can be attributed to differences in the H-bond network, which the catalytic aspartate can form with neighboring residues. In detail, a tyrosine residue (Y94, chymotrypsin numbering), which is conserved in trypsin and chymotrypsin and represents a potential H-bond partner for the catalytic aspartate, is mutated to tryptophan in elastase. This loss of interaction could potentially destabilize the deprotonated, i.e., negatively charged form of the catalytic aspartate and in turn lead to an elevated apparent pKa value. This hypothesis is supported by an H-bond analysis, shown in Figure . Clearly, the catalytic aspartate in trypsin (Figure C) forms 1 H-bond more on average than the respective aspartates in elastase (Figure A) and granzyme B (Figure B). Interestingly, the same shift cannot be seen so clearly for chymotrypsin (Figure D). We surmise, that while the interaction is not recognized as such by the employed metric, the polar interaction of the side chain is still present and will perturb the apparent pKa value of the catalytic aspartate.

Figure 8

Average number of hydrogen bonds formed by the catalytic aspartate of elastase (A), granzyme B (B), trypsin (C), and chymotrypsin (D).

Average number of hydrogen bonds formed by the catalytic aspartate of elastase (A), granzyme B (B), trypsin (C), and chymotrypsin (D). Compared to serine proteases, in which the catalytically active serine gets deprotonated intermediately during the reaction and directly attacks the substrate, the catalytic cysteine in papain and other cysteine proteases forms an ion pair with a neighboring histidine residue even in the apo state of the enzyme.[9,72] Since the tripeptide pKa value of cysteine of 8.5[16] suggests a protonated form at physiological and especially at acidic conditions, a strong perturbation of the cysteine pKa value is necessary in order to facilitate the ion pair formation. Indeed, for papain and papain-like proteases like caricain and ficin strongly perturbed pKa values as low as 3.3, 2.9, and 2.5, respectively, have been reported in the literature.[9,72] This strong shift of more than 5 pKa units is generally attributed to the aforementioned ionic interaction with the neighboring histidine. However, common prediction tools are reportedly unable to capture these strong shifts in the aforementioned systems and strongly mispredict the respective cysteine pKa values.[71] As can be seen from Figure , unfortunately also our approach falls short in predicting the pKa shift of the catalytic cysteine of papain (experimental pKa of 3.3). Indeed, with the implicit solvation model, which was successfully used for the other families, no clear titration of cysteine could be observed in the pH range from 0.0 to 14.0 (see Figure A). In an effort to pinpoint the source of this erroneous behavior, we switched the used implicit solvent to the most recent GB-Neck 2 model, coupled with the increase of the GB radius of sulfur to 2.0 Å as suggested recently by Shen and co-workers.[50] This had the notable effect that we now capture a titration of cysteine, resulting in a pKa value of 8.7 (see Figure B). However, the strong acidic shift is still not captured. Furthermore, we repeated the simulations using a replica exchange protocol in order to allow for a coupling of the pH values, but also this did not improve the prediction (Figure C). To exclude a deficiency of the MC-based constant pH framework, we reran the simulation with the recent implementation of the continuous constant pH approach in AMBER by Shen and co-workers, following their suggested setup for the treatment of cysteines.[50] However, as can be seen from Figure D, while we capture a titration of the cysteine, we still are not able to predict the strong acidic shift. We would like to note here, that while this manuscript was under revision, Shen and co-workers published a broad benchmark study predicting cysteine pKa values against experimental reference. With refined parameters, they were able to very accurately reproduce even strong pKa shifts (i.e., in papain). We would like to refer the interested reader to their publication.[74] We see a few possible reasons why the correct pKa values or at least an acidic shift could not be predicted. First, the predicted pKa values might correspond to a limited and strongly biased protonation state ensemble, stemming from insufficient conformational sampling. As the time scales, which can be covered in standard MD simulations, are generally several orders of magnitude below the time scales on which the experimental reference values are measured on, it could be possible that we only observe a very small fraction of the experimental conformational space. As the observed protonation state ensemble is closely linked to the conformational ensemble, also the apparent pKa values we obtain will in turn only correspond to this small sub-ensemble. Large conformational changes, which happen at much slower time scales, might severely alter the underlying conformational ensemble and in turn also the predicted pKa values. However, we deem this scenario to be very unlikely, as the prediction errors are extensive both in terms of the actual value as well as in the shift direction. Furthermore, this would also mean that the crystal structure and conformations close to it would represent an almost negligible part of the conformational ensemble at slower time scales. Thus, we surmise that a systematic error in the titration prediction is the source of the erroneous cysteine pKa predictions. Second, failing to capture the perturbation-effect itself could lead to a complete misprediction of the pKa values. However, as discussed above, the main perturbation of the pKa of CYS25 in papain comes from the ionic interaction with HIS159, an interaction that is generally well captured within the cpHMD framework.[65] To rule out a possible effect of the titration of HIS159, we reran the simulation, not allowing HIS159 to titrate and keeping it in its positively charged form (data not shown). As this did not change the pKa prediction of CYS25, we presume that also this scenario is not the definitive error source. Third, the description of the sulfur and its titration in the context of partial charges could be problematic. Since the titration of the reference compounds works without any issues, we presume that the description of the sulfur or the titration itself is not a problem when an isolated cysteine is considered but rather arises when the cysteine is located in a complex, i.e., protein environment. Shen and co-workers recently used the continuous constant pH MD implementation to successfully reproduce the pKa value of the creatin kinase.[50] As the cysteine pKa values in kinases are generally perturbed less than the ones found in proteases like papain,[9] this could mean that only very strong perturbations are not captured correctly. This could be linked to the strong polarizability of sulfur, an effect that is neglected in all tested cpHMD approaches, as no polarizable force fields are used.[73] We therefore presume that either a more sophisticated description of the electrostatics of sulfur or the incorporation of polarizable force fields would significantly improve the prediction. The aforementioned approach by Radak et al. as implemented in the program NAMD holds great promise in this regard due to its modular implementation with generally no prior assumption of the used force field.[34]

Conclusion

We apply constant pH MD simulations to provide pKa estimations for active site residues of a set of 9 different proteases. While the constant pH MD framework has been successfully applied to protease systems before, to our knowledge no study was published yet, which systematically predicts and summarizes active site pKa values of multiple protease families. We find that our predictions are consistent with the available experimental pKa values and are in sound agreement with the strongly varying pH activity profiles of aspartic and serine proteases. All titrated active site residues show substantial shifts away from the tripeptide pKa values. The applied sampling strategy successfully captures this behavior, highlighting the benefits of dynamic pKa prediction tools compared to static algorithms. The approach also allows us to depict the strongly coupled titration behavior found for some of the studied systems, which we show in detail for pepsin. Furthermore, we find pH-dependent H-bond networks which could explain the varying protonation and thus pH activity profiles. We presume the discussed residues as promising starting points, e.g., for protein engineering efforts toward tailored pH activities. However, we also clearly identify limitations of the methodology in terms of treating the strongly polarizable sulfur. Nevertheless, as the field of cpHMD simulations is rapidly progressing, we see these findings as an opportunity to enhance the reliability of this method even further.

54 in total

Review 1. Aspartic proteinases in disease: a structural perspective.

Authors: J B Cooper
Journal: Curr Drug Targets Date: 2002-04 Impact factor: 3.465

Review 2. Serine protease mechanism and specificity.

Authors: Lizbeth Hedstrom
Journal: Chem Rev Date: 2002-12 Impact factor: 60.622

Review 3. Human and parasitic papain-like cysteine proteases: their role in physiology and pathology and recent developments in inhibitor design.

Authors: Fabien Lecaille; Jadwiga Kaleta; Dieter Brömme
Journal: Chem Rev Date: 2002-12 Impact factor: 60.622

Review 4. Progress in the prediction of pKa values in proteins.

Authors: Emil Alexov; Ernest L Mehler; Nathan Baker; António M Baptista; Yong Huang; Francesca Milletti; Jens Erik Nielsen; Damien Farrell; Tommy Carstensen; Mats H M Olsson; Jana K Shen; Jim Warwicker; Sarah Williams; J Michael Word
Journal: Proteins Date: 2011-10-15

5. pH replica-exchange method based on discrete protonation states.

Authors: Satoru G Itoh; Ana Damjanović; Bernard R Brooks
Journal: Proteins Date: 2011-10-15

6. Scalable molecular dynamics with NAMD.

Authors: James C Phillips; Rosemary Braun; Wei Wang; James Gumbart; Emad Tajkhorshid; Elizabeth Villa; Christophe Chipot; Robert D Skeel; Laxmikant Kalé; Klaus Schulten
Journal: J Comput Chem Date: 2005-12 Impact factor: 3.376

7. Electrostatic effects in proteins.

Authors: M F Perutz
Journal: Science Date: 1978-09-29 Impact factor: 47.728

Review 8. Current problems in mechanistic studies of serine and cysteine proteinases.

Authors: L Polgár; P Halász
Journal: Biochem J Date: 1982-10-01 Impact factor: 3.857

9. Conformational dynamics of cathepsin D and binding to a small-molecule BACE1 inhibitor.

Authors: Christopher R Ellis; Cheng-Chieh Tsai; Fang-Yu Lin; Jana Shen
Journal: J Comput Chem Date: 2017-04-02 Impact factor: 3.376

10. Measuring the successes and deficiencies of constant pH molecular dynamics: a blind prediction study.

Authors: Sarah L Williams; Patrick G Blachly; J Andrew McCammon
Journal: Proteins Date: 2011-08-30

6 in total

1. Quantifying charge state heterogeneity for proteins with multiple ionizable residues.

Authors: Martin J Fossat; Ammon E Posey; Rohit V Pappu
Journal: Biophys J Date: 2021-11-23 Impact factor: 4.033

Review 2. Cysteine Oxidation in Proteins: Structure, Biophysics, and Simulation.

Authors: Diego Garrido Ruiz; Angelica Sandoval-Perez; Amith Vikram Rangarajan; Emma L Gunderson; Matthew P Jacobson
Journal: Biochemistry Date: 2022-09-26 Impact factor: 3.321

3. Comparative study of the unbinding process of some HTLV-1 protease inhibitors using unbiased molecular dynamics simulations.

Authors: Fereshteh Noroozi Tiyoula; Hassan Aryapour; Mostafa Javaheri Moghadam
Journal: PLoS One Date: 2022-07-14 Impact factor: 3.752

4. A structural exposé of noncanonical molecular reactivity within the protein tyrosine phosphatase WPD loop.

Authors: Huanchen Wang; Lalith Perera; Nikolaus Jork; Guangning Zong; Andrew M Riley; Barry V L Potter; Henning J Jessen; Stephen B Shears
Journal: Nat Commun Date: 2022-04-25 Impact factor: 17.694

Review 5. Why All the Fury over Furin?

Authors: Essam Eldin A Osman; Alnawaz Rehemtulla; Nouri Neamati
Journal: J Med Chem Date: 2021-08-02 Impact factor: 7.446

6. Irreversible inhibition of BoNT/A protease: proximity-driven reactivity contingent upon a bifunctional approach.

Authors: Lewis D Turner; Alexander L Nielsen; Lucy Lin; Sabine Pellett; Takashi Sugane; Margaret E Olson; Eric A Johnson; Kim D Janda
Journal: RSC Med Chem Date: 2021-05-19

6 in total