We present a detailed theoretical analysis of the reaction mechanism of proteolysis catalyzed by the main protease of SARS-CoV-2. Using multiscale simulation methods, we have characterized the interactions established by a peptidic substrate in the active site, and then we have explored the free energy landscape associated with the acylation and deacylation steps of the proteolysis reaction, characterizing the transition states of the process. Our mechanistic proposals can explain most of the experimental observations made on the highly similar ortholog protease of SARS-CoV. We point to some key interactions that may facilitate the acylation process and thus can be crucial in the design of more specific and efficient inhibitors of the main protease activity. In particular, from our results, the P1' residue can be a key factor to improve the thermodynamics and kinetics of the inhibition process.
We present a detailed theoretical analysis of the reaction mechanism of proteolysis catalyzed by the main protease of SARS-CoV-2. Using multiscale simulation methods, we have characterized the interactions established by a peptidic substrate in the active site, and then we have explored the free energy landscape associated with the acylation and deacylation steps of the proteolysis reaction, characterizing the transition states of the process. Our mechanistic proposals can explain most of the experimental observations made on the highly similar ortholog protease of SARS-CoV. We point to some key interactions that may facilitate the acylation process and thus can be crucial in the design of more specific and efficient inhibitors of the main protease activity. In particular, from our results, the P1' residue can be a key factor to improve the thermodynamics and kinetics of the inhibition process.
The recent outbreak of COVID-19, a pneumonia-like illness caused by a
coronavirus named SARS-CoV-2,[1] has rapidly evolved into a
pandemic as recognized by the World Health Organization. SARS-CoV-2 has been
shown to be highly contagious, causing a large number of infections around
the world. The absence of vaccines and specific treatments has contributed
to a rapid spread of this disease and a fatal outcome in many cases.
Furthermore, the existence of other similar viruses detected in animals
opens the possibility of future similar diseases.[2,3] Thus, finding
effective strategies for the identification of potential targets for new
drugs to fight against SARS-CoV-2 and other CoV-like viruses is an urgent
need. One of these strategies is based on the disruption of the activity of
those enzymes that are crucial in the replication cycle of the virus using
adequate compounds. In this sense, knowledge of the catalytic activity of
the enzyme in atomistic detail is one of the more powerful tools for
efficient and specific new drugs design. In particular, the characterization
and analysis of the geometry and electronic properties of the reaction
transition state (TS) can be used as a guide for the design of active site
inhibitors. In this work we analyze the reaction mechanisms for the main
protease of SARS-CoV-2, also referred to as 3C-like protease
(3CLpro), using density functional theory (DFT)-based
multiscale methods. This enzyme plays an essential role during the
replication of the virus and has no closely related homologues in human
beings, making it an attractive drug target.[4]The 3CLpro enzyme of SARS-CoV-2 exists as a functional homodimer
with two active sites in charge of cleaving the translated polyproteins into
individual fragments to be used by the coronavirus.[5] As
with other cysteine proteases, each of the active sites contains a Cys-His
catalytic dyad in charge of the hydrolysis of the peptide bond at specific
sites of a polypeptide chain. Several structures of thisprotease have been
already resolved by means of X-ray crystallography and deposited in the
Protein Data Bank (PDB), including the free protease (PDB codes 6Y2E(2)
and 6Y84(6)) and inhibitor-bound proteases (PDB codes 6LU7,[6]6Y2F,[7] and 6LZE(8)). The SARS-CoV-2 main protease has
a structure that is virtually identical to the ortholog from SARS-CoV (96%
identity). Even more, the main residues involved in catalysis, binding, and
dimerization processes are fully conserved.[9]
Consequently, these two ortholog enzymes display highly similar substrate
preferences.[10] The substrate cleavage by the
3CLpro takes place between Gln at the P1 position and a
Gly/Ala/Ser at the P1′ position (P and P′ identify the
residues placed before and after the scissile bond, respectively), making
the presence of Gln an essential requirement.[11]In principle, the reaction mechanism of cysteine proteases involves two basic
steps (see Scheme ).[12] In the first step, acylation, the peptide bond is
broken, releasing the P′ fragment of the peptidic substrate and
forming an acyl–enzyme complex where the catalytic cysteine (Cys145
in the protease of SARS-CoV-2) is covalently bound to the carbon atom of the
P1 residue of the target peptide. In a second step, deacylation, the
acyl–enzyme is hydrolyzed, releasing the P fragment and recovering
the enzymatic active site for another catalytic cycle. Covalent inhibitors
of the protease activity form acyl–enzyme complexes that cannot be
hydrolyzed and thus remain bonded into the active site.[6,13] In this work
we take advantage of the similarities between the proteases of SARS-CoV and
SARS-CoV-2 viruses and the existence of ligand-bound structures to build a
structural model of a peptide substrate–enzyme Michaelis complex and
to study the reaction mechanism using computational simulations.
Scheme 1
Reaction Mechanism in Cysteine Proteases
Methods
Classical Molecular Dynamics Simulations
The crystal structures with PDB codes 6Y2F(7) and 3AW0(14) were used as starting points to build the
Michaelis complex. The former corresponds to the holoprotease of
SARS-CoV-2, and the latter is the crystallographic structure of the
SARS-CoV ortholog cocrystallized with the peptidic aldehyde inhibitor
Ac-Ser-Ala-Val-Leu-His-aldehyde. To build the Michaelis complex
corresponding to the 3CLproprotease of SARS-CoV-2, the two
protein structures were aligned and then the cocrystallized ligand in
6Y2F was
replaced with the crystallized ligand in 3AW0 in the two active sites of the
homodimer (protomers A and B). The peptidomimetic
Ac-Ser-Ala-Val-Leu-His-aldehyde inhibitor was elongated using the
Maestro tool[15] until we built the substrate-like
sequence Ac-Ser-Ala-Val-Leu-Gln-Ser-Gly-Phe-NMe. We placed the
substrate in the two active sites, in agreement with the observation
that the X-ray structures (6LZE, 6M0K, 6WNP, 6LU7, and 7BQY) contain inhibitor molecules in both of them.
The absent hydrogen atoms were added using the Protein Preparation
Wizard tool of Maestro, and PROPKA3.0[16] was used to
calculate the protonation states of titratable residues at pH 7.4.The tleap tool from AmberTools18[17] was used to prepare
the simulation systems. The Michaelis complexes, described with the
ff14SB force field,[18] were solvated into a box with
a buffer region of at least 12 Å from any protein/substrate atom
to the limits of the simulation box. TIP3P water molecules[19] were used. Na+ atoms were added to
neutralize the charge. The resulting system was minimized using 500
steps of the steepest descent method followed by the
conjugate-gradient method, until the root-mean-square of the gradient
was below 10–3 kcal mol–1
Å–1. The system was then heated from 0 to
300 K using a heating rate of 1.7 K·ps–1. The
backbone heavy atoms were restrained in a Cartesian space using a
harmonic potential with a force constant of 20 kcal
mol–1 Å–2. Along the
equilibration step, the positional restraint force constant was
changed from 15 to 3 kcal mol–1
Å–2, decreasing by 3 units every 1.25 ns,
and after 6.25 ns the positional restraints were removed and the
systems continued their equilibration until 7.5 ns of NPT (300 K and 1
bar) simulation was completed. Then, 8 μs of NVT simulation at
300 K was performed with a 2 fs time step using SHAKE.[20] The particle mesh Ewald method was employed to
describe the long-range electrostatic
interactions;[21,22] for the short-range
interactions, a cutoff of 10 Å was used. The pressure was
controlled by a Berendsen barostat, and the temperature was controlled
by a Langevin thermostat. For all the simulations, periodic boundary
conditions were employed. The AMBER19 GPU version of
pmemd[23,24] was used to run the classical
molecular dynamic simulations. To sample a reasonable configurational
space of the protein, two additional replicas of the Michaelis complex
model with different initial velocities were run during 1
μs.
QM/MM Calculations
Exploration of the free energy surfaces associated with the peptide
bond-breaking process have been carried out using quantum
mechanics/molecular mechanics (QM/MM) simulations. In most simulations
(see Table S1 for details), the side chains of the
catalytic dyad (Cys145 and His41) and a fragment of the peptide
substrate of protomer A were included in the QM region, while the rest
of the system was described at the MM level as explained earlier. The
interaction of Asp187 with His41 is mediated by a water molecule, as
observed in the X-ray structures of the inhibited enzyme (6LZE, 6M0K, 6WNP, 6LU7, and 7BQY) and the apo
form (1UJ1). The
presence of thiswater molecule in between His41 and Asp187 limits
possible polarization and charge-transfer effects, and the latter
residue was not included in the QM region. The part of the substrate
described at the QM level includes the two residues involved in the
peptide bond to be broken (Gln-P1 and Ser-P1′) and the previous
and next peptide bonds up to the Cα atoms of Leu-P2
and Gly-P2′. In the exploration of the hydrolysis of the
acyl–enzyme, we also included a water molecule in the QM
region. To describe the QM subsystem, we used the B3LYP
functional[25,26] and D3 dispersion
corrections.[27] Calculations were performed
with the 6-31+G* basis set unless otherwise indicated (we also
employed the 6-31G* basis set). This level of theory has been shown to
be one of the best combinations to describe enzymatic
reactions.[28] In addition, this computational
description for the QM region (B3LYPD3/6-31+G(d)) provides a gas-phase
enthalpic change for the proton-transfer reaction between imidazole
and methanethiol (the motifs of Cys and His side chains), in agreement
with the experimentally derived value (see the Supporting Information). A systematic study for the
cysteine–histidineproton transfer also pointed to the B3LYP
functional as the most robust one to obtain an electronic description
in agreement with higher-level methods.[29] Finally,
a recent QM/MM analysis of the electronic properties of
enzyme–substrate complexes in SARS-CoV-2protease emphasizes
the importance of choosing an adequate functional for a proper
description of the process.[30] All calculations were
run with a modified version of Amber18[17,31] coupled
to Gaussian16[32] for DFT calculations. A cutoff
radius of 15 Å was used for all QM/MM interactions, and the
temperature was 300 K.To explore the free energy landscape associated with the chemical
reaction, we used our implementation of the string method, the
adaptive string method (ASM).[33] In this method
N replicas of the system (the nodes of the
string) are evolved according to the averaged forces and kept
equidistant, converging in such a way to the minimum free energy path
(MFEP) in a space of arbitrary dimensionality defined by the
collective variables (CVs). Once the string has converged (see an
example in Figure S1), we define a single path-CV (a collective
variable called s) that measures the advance of the
system along the MFEP. This path-CV is used as the reaction coordinate
to trace the free energy profile associated with the chemical
transformations under analysis. The MFEPs were explored on a free
energy hypersurface defined by a set of CVs formed by those distances
showing relevant changes during the process under study:
H-Sγ(Cys145), H-Nε(His41), H-N(P1′),
Sγ(Cys145)-C(P1), and C(P1)-N(P1′) for the acylation step
(see Scheme a) and
Hw1-Ow, Hw1-N(P1′),
Ow-C(P1), Sγ(Cys145)-C(P1),
Hw2-Ow, N(P1′)-Nε(His41), and
Hw2-Sγ(Cys145) for the deacylation step (see
Scheme b). Different
initial guesses, corresponding to different mechanistic proposals,
were explored at the B3LYPD3/MM level using first the 6-31G* basis
set, and then the best candidates were recalculated using the 6-31+G*
basis set. Each of the strings was composed of at least 96 nodes (see
Table S1) that were propagated with a time step of 1
fs until the RMSD of the string fell below 0.1
amu1/2·Å for at least 2 ps. The converged MFEPs
are then averaged to define the s path-CV
corresponding to each string. The free energy profiles along the
path-CVs were obtained using an umbrella sampling algorithm[34] running simulations for at least 10 ps and were
integrated using the WHAM technique.[35] The values
of the force constants employed to bias the ASM simulations were
determined on-the-fly to ensure a probability density distribution of
the reaction coordinate that was as homogeneous as possible.[33] Replica exchange between neighboring string nodes
was attempted every 50 steps to improve convergence.
Scheme 2
Collective Variables Employed To Study the Acylation (a)
and Deacylation (b) Reaction Mechanisms
For the proton transfer between Cys145 and His41, considering the
proximity of the proton donor and acceptor atoms and the geometrical
simplicity of the process, we traced the free energy profile using
umbrella sampling[34] along a simple proton-transfer
coordinate defined as the antisymmetric combination of the distances
of the proton to the donor and the acceptor atoms
(d(Nε-H)–d(Sγ-H)).
For thisprofile only the side chains of the two involved residues
were included in the QM region (using the B3LYPD3/6-31+G* level of
theory). A total of 40 windows were used, corresponding to an
increment of the reaction coordinate of 0.06 Å, with each of them
composed of 10 ps of equilibration and 20 ps of data collection. The
force constant employed to drive the reaction coordinate change was
600 kcal·mol–1·Å–2.
All the rest of the details of the simulations were as described
earlier. We also used umbrella sampling along distinguished
coordinates to explore the free energy landscape associated with the
separation between the first peptide fragment and the
acyl–enzyme complex and between the two peptide fragments at
the end of the deacylation process. Details of all the free energy
simulations performed in this work are given in Table S1.
Results
Enzyme–Substrate Complex
The time evolutions of the root-mean-square deviation (RMSD) values for
each replica (2 protomers and 2 substrates in each replica) are shown
in Figure S2. These values show that the protein
structure is well-equilibrated and there are no large differences with
respect to the initial structures, as prepared from the X-ray data.
The observations made on the three replicas (one of 8 μs and two
of 1 μs) are very similar in all cases. The substrate-binding
pocket is divided into a series of subsites (denoted as S and
S′), each accommodating a single residue of the substrate
placed before (P) or after (P′) the scissile peptide bond. The
map of hydrogen-bond interactions observed during our molecular
dynamics (MD) simulations of the Michaelis complex is given in Figure a, a general view of
the substrate in the two active sites of the dimer formed by protomers
A and B is shown in Figure b, and an insight into the active site of protomer A is
provided in Figure c. The
3CLpro of SARS-CoV-2 (as is also the case of the
SARS-CoV ortholog) presents a high specificity for Gln at the P1
position.[11,36] As seen in Figure a, the P1 residue is the one
establishing more hydrogen-bond interactions with the enzyme. The O
and N main-chain atoms of Gln-P1 are found to make hydrogen bonds with
main-chain atoms of Gly143, Ser144, and His164 (unless indicated, all
the residues of the protein belong to the same protomer A). The side
chain of Gln-P1 is accommodated into the S1 subsite through
hydrogen-bond contacts with the main-chain atoms of Phe140 and Leu141,
with the Nε atom of His163 and the Oε atoms of Glu166.
This last residue is in turn hydrogen-bonded to the terminal NH group
of Ser1 from protomer B. In fact, the N-terminal fragment (N-finger)
of protomer B plays an active role in preorganizing the active site of
protomer A for catalysis.[37] Dimerization is an
essential condition for catalysis in the protease of a related
coronavirus,[36,38,39] and
consequently, those mutants lacking the N-finger fragment are almost
completely inactive.[40] The side chain of Leu-P2 is
surrounded by the side chains of His41, Met49, His164, Met165, and
Asp187, while the main-chain amide group is hydrogen-bonded to the
Oε atom of Gln189. The side chain of Val-P3 is solvent-exposed,
while main-chain N and O atoms are hydrogen-bonded to main-chain atoms
of Glu166.
Figure 1
Results of the molecular dynamics simulation of
3CLpro of SARS-CoV-2 in complex with the
substrate with sequence
Ac-Ser-Ala-Val-Leu-Gln-Ser-Gly-Phe-NMe. (a) Fraction of
hydrogen-bond contacts between the residues of the
substrate and those of the protease found during the
trajectory of the Michaelis complex. A hydrogen-bond
contact is counted when the donor–acceptor distance
is <3.8 Å and the hydrogen-bond angle is
>120°. (b) General overview of the
substrate–enzyme complex, showing the dimeric
nature of the protease with two identical active sites
occupied by the substrate. Note that the N-finger of each
protomer is close to the active site of the neighbor
protomer. (c) Insight into the binding pose of the peptide
substrate into the active site, showing the most important
active-site residues and the positions occupied by the P
and P′ residues of the substrate. (d) Probability
densities of the distances from the Cys145-Sγ atom
to the carbonyl carbon atom of the substrate (C(P1)), in
red, and to the Nε atom of His41, in blue. The
bimodal distribution of Sγ-C(P1) distances
correspond to the trans (shorter distances) and gauche
(longer distances) conformations of Cys145. (e)
Disposition of the substrate in the vicinity of the
catalytic dyad when Cys145 is present in the trans
conformation. Note the proximity between Sγ and
C(P1) atoms and the orientation of the sulfhydryl proton
toward the Nε atom of His41.
Results of the molecular dynamics simulation of
3CLpro of SARS-CoV-2 in complex with the
substrate with sequence
Ac-Ser-Ala-Val-Leu-Gln-Ser-Gly-Phe-NMe. (a) Fraction of
hydrogen-bond contacts between the residues of the
substrate and those of the protease found during the
trajectory of the Michaelis complex. A hydrogen-bond
contact is counted when the donor–acceptor distance
is <3.8 Å and the hydrogen-bond angle is
>120°. (b) General overview of the
substrate–enzyme complex, showing the dimeric
nature of the protease with two identical active sites
occupied by the substrate. Note that the N-finger of each
protomer is close to the active site of the neighbor
protomer. (c) Insight into the binding pose of the peptide
substrate into the active site, showing the most important
active-site residues and the positions occupied by the P
and P′ residues of the substrate. (d) Probability
densities of the distances from the Cys145-Sγ atom
to the carbonyl carbon atom of the substrate (C(P1)), in
red, and to the Nε atom of His41, in blue. The
bimodal distribution of Sγ-C(P1) distances
correspond to the trans (shorter distances) and gauche
(longer distances) conformations of Cys145. (e)
Disposition of the substrate in the vicinity of the
catalytic dyad when Cys145 is present in the trans
conformation. Note the proximity between Sγ and
C(P1) atoms and the orientation of the sulfhydryl proton
toward the Nε atom of His41.The binding subsite for Ala-P4 is constituted of main-chain interactions
with Gln189 and Thr190, while the side chain of this residue is placed
between the side chains of Met165, Leu167, and Pro168. The S5 subsite
is formed by the side chain of Pro168 and by main-chain atoms of
Thr190. This description of the S1–S5 interaction subsites
observed in our simulations agrees with the characteristics found in
the X-ray structures of SARS-CoV-2 3CL protease with inhibitors bound
in the active site.[6,8] Our MD simulations of the
substrate–enzyme complex offer, in addition, a detailed
description of the S′ subsites, those that accommodate the
P′ residues placed after the scissile peptide bond. The
main-chain O atom of Ser-P1′ establishes a hydrogen bond with
the amide group of Gly143 and the side chain of Asn142. The hydroxyl
group of the P1′ side chain can contact with the catalytic dyad
(Cys145 and His41), while the CH2 group is packed between
Thr25, Thr26, and Leu27. Gly-P2′ is stabilized through
main-chain contacts with Thr25 and Thr26. Finally, the side chain of
the Phe-P3′ residue is packed against the side chain of Thr24.
This structural information can be useful in order to improve the
binding and specificity of potential inhibitors of the protease
activity because these structural findings are lost in the X-ray
structures obtained from those inhibitors in which the fragment
corresponding to P′ residues either is released during the
formation of the acyl–enzyme complex[13] or is
smaller than in our substrate.[6,8]We have also analyzed the substrate–enzyme interactions in
protomer B (see Figure S3). The interactions averaged over the
μs-time-scale simulations show only small differences between
the two protomers, indicating that the poses found in the two active
sites are consistent with the reported data.
Catalytic Dyad
The reaction mechanism of cysteine proteases involves the nucleophilic
attack of the Sγ atom of a cysteine (Cys145 in our case) to the
C(P1) atom of the peptide bond. Figure d shows the probability distribution of
Sγ-C(P1) distances found during our MD simulations. The
distribution is clearly bimodal, with two peaks centered at 3.4 and
4.7 Å. These two peaks correspond to two different conformations
of the side chain of Cys145, which can be present in trans and gauche
conformations. This is in agreement with the observations made on the
X-ray structure of the ortholog protease of SARS-CoV.[37] The most probable conformation corresponds to the
trans conformer in which the Sγ sulfur atom is closer to the
substrate (see Figure e). In
both conformations the catalytic dyad remains hydrogen-bonded, with
the most probable distance between Cys145-Sγ and His41-Nε
being about 3.3 Å (see Figure d). Interestingly, His41 is, in turn, hydrogen-bonded
through a highly conserved crystallographic water molecule to Asp187.
This interaction can raise the pKa of the
histidine, increasing its ability to work as a base and abstract the
proton from Cys145.Considering that the 3CL protease can be asymmetrical,[38] we also analyzed a 1 μs MD simulation of the dimer with only
one of the protomers occupied by the substrate. Figure S4 shows that the distribution of
Sγ-C(P1) distances is very similar to that obtained when the
substrate is present in the two active sites. A small decrease in the
population of the unproductive Cys145 gauche conformer is observed
when only one protomer is occupied, which could contribute to the
activity of the enzyme. In our simulations of the reaction mechanism,
we always selected the Cys145 trans conformation as the initial
structure.Taking into account the short distance observed between Cys145 and His41
and the possible activation of this last residue by the nearby Asp187,
we explored the possibility to find the catalytic dyad forming an ion
pair (Cys–/HisH+ or IP in Figure ) instead of the
neutral form modeled in the Michaelis complex (CysH/His or MC in Figure ). We evaluated the
free energy difference between these two forms of the dyad by means of
free energy profiles associated with the proton-transfer coordinate
from Cys145 to His41 obtained at the B3LYPD3/MM level (see Methods). The proton-transfer free energy
profiles (Figure a) were
obtained for the holo and apo forms of 3CLpro. According to
Figure a the catalytic
dyad is more stable in its neutral form, for both the apo and holo
forms. The IPs are 2.9 and 4.8 kcal·mol–1 above
the neutral form in the apo and holo enzymes, respectively. The
anionic Cys145 can be stabilized by the presence of water molecules
and by the hydroxyl group of Ser-P1′ (see Figure b). In an experimental analysis
of the substrate specificity of the ortholog SARS-CoVprotease, it was
found that mutation of Ser-P1′ to another small residue (such
as Gly or Ala) diminishes the rate constant by about 1 order of
magnitude.[36] A similar effect on the rate
constant was observed for SARS-CoV-2protease when the leaving group
was changed to p-nitroaniline.[41]
These variations in the rate constant could be attributed, in part, to
a stabilizing effect of the hydroxyl group of Ser-P1′ on the
ion pair of about 1.4 kcal·mol–1. Note that the
requirement of a small residue at the P1′ position by SARS-CoV
and SARS-CoV-2 main proteases could be due in part to the fact that in
these cases water molecules can access the active site, contributing
also to the stabilization of the ion pair. Bulkier residues at the
P1′ position could hinder the access of water molecules and
thus destabilize the ionized form of the catalytic dyad. According to
our free energy profiles shown in Figure a, the IP form is better stabilized
with respect to the neutral dyad in the apo enzyme than in the holo
one (by almost 2 kcal·mol–1), which can be
related to the better solvation of the catalytic dyad in the former.
In both the holo and apo enzymes, the free energy barrier associated
with the transfer of the proton between His41 to Cys145 is small,
revealing a fast equilibrium between the ion pair and neutral versions
of the dyad, with the latter being the predominant form. The existence
of a low-lying IP dyad is compatible with the experimental
observations made in the kinetic characterization of the homologue
3CLpro of SARS-CoV, in which an ion pair mechanism
for the proteolysis was proposed on the basis of the pH-inactivation
profile with iodoacetamide and the analysis of solvent isotope
effects.[41]
Figure 2
Analysis of the formation of an ion pair (IP) catalytic dyad
(Cys145–···His41H+)
from the neutral form
(Cys145H····His41) found in the
Michaelis complex (MC). (a) B3LYPD3/6-31+G*/MM free energy
profile associated with the proton-transfer coordinate
from the Sγ atom of Cys145 to the Nε atom of
His41
(d(Nε-H)–d(Sγ-H))
in the apo (blue line) and holo (red line) enzymes. (b)
Representation of the ion pair in the holo enzyme, showing
those interactions that stabilize the charged states of
the catalytic dyad, in particular the hydroxyl group of
Ser(P1′). (c) Representation of the ion pair in the
apo enzyme, showing the presence of water molecules that
stabilize the charged catalytic dyad when the substrate is
absent.
Analysis of the formation of an ion pair (IP) catalytic dyad
(Cys145–···His41H+)
from the neutral form
(Cys145H····His41) found in the
Michaelis complex (MC). (a) B3LYPD3/6-31+G*/MM free energy
profile associated with the proton-transfer coordinate
from the Sγ atom of Cys145 to the Nε atom of
His41
(d(Nε-H)–d(Sγ-H))
in the apo (blue line) and holo (red line) enzymes. (b)
Representation of the ion pair in the holo enzyme, showing
those interactions that stabilize the charged states of
the catalytic dyad, in particular the hydroxyl group of
Ser(P1′). (c) Representation of the ion pair in the
apo enzyme, showing the presence of water molecules that
stabilize the charged catalytic dyad when the substrate is
absent.The picture derived from our QM/MM free energy profiles contrasts with
the results obtained from the exploration of the proton transfer
potential energy surface in the SARS-CoV 3CLpro.[42] This study suggests a scenario where substrate
binding could reduce the energy cost of forming the IP. However, the
reported energy difference between the IP and neutral complexes was
significantly higher than our free energy estimation, about 11
kcal·mol–1, and difficult to reconciliate
with the observed rate constants (see below) and
pKa determinations.[43]
Acylation Step
We explored the free energy landscape for the formation of the
acyl–enzyme starting from the catalytic dyad IP at the
B3LYPD3/MM level. The converged MFEP is shown in Figure a and b. According to our
simulations, after IP formation the acylation proceeds by means of a
proton transfer from His41 to the N(P1′) atom followed by the
nucleophilic attack of Cys145-Sγ on the C(P1) atom and the
simultaneous breaking of the C(P1)-N(P1′) peptide bond. These
elementary events take place in a concerted but asynchronous way. The
transition state (TS) found for this mechanism (see Figure c) is associated with the
proton transfer from His41 to the amidenitrogen atom of the peptide
bond N(P1′). At the TS the Sγ atom of Cys145 approaches
the C(P1) atom, reducing the interatomic distance from 3.11 to 2.34
Å. This approach is accompanied by a moderate lengthening of the
peptide bond (the C(P1)-N(P1′) distance being lengthened from
1.36 to 1.54 Å). According to the free energy path shown in Figure a, the total free
energy barrier associated with the acylation process, including the
free energy cost of the ion pair formation, is 14.6
kcal·mol–1. This value is compatible with
the activation free energies derived from the steady-state rate
constants measured at 25 °C for peptides cleaved at the Gln-Ser
bond by the highly similar ortholog 3CLpro of SARS-CoV
(between 16.2 and 17.2 kcal·mol–1).[41] It must be noticed that in thisproteolysis the
acylation step is not considered to be the rate-limiting one and then
these experimental activation free energies provide an upper limit for
the acylation barrier.[41]Figure c shows that this TS
is stabilized by means of a hydrogen-bond interaction with the
hydroxyl group of Ser-P1′, indicating, also in agreement with
the experimental observations, that the leaving group plays an
important role in catalysis. The TS stabilization provided by the
hydroxyl group could also contribute to the larger rate constant
observed for the scission of the Gln-Ser bond.[36]
Remarkably, the proposed reaction mechanism is also in good agreement
with the experimental proton inventory results, which indicate that
there are two protons in flight during the acylation, one at the TS
and another one at earlier stages.[41] In our picture
these two proton-transfer events correspond to the proton transferred
from His41 to the N(P1′) atom of the substrate at the TS and
the proton transfer from Cys145 to His41 during IP formation.
Regarding the acyl–enzyme product (Figure
d), our free energy profile shows
two possible conformations that differ in the presence of a water
molecule that plays a key role during deacylation (see Figure S5). Finally, in our free energy profile the
formation of the acyl–enzyme (see Figure d) is almost thermoneutral, with a
reaction free energy of about −1
kcal·mol–1.
Figure 3
Simulation of the acylation reaction taking place through the
formation of the ion pair. (a) B3LYPD3/6-31+G*/MM free
energy profile along the path-CV for the acylation
reaction after the formation of the ion pair (IP) from the
Michaelis complex (MC). The reaction takes place with a
single transition state (TS) that yields the
acyl–enzyme (ACE), which can be present in two
conformations. (b) Evolution of the distances selected as
collective variables (CVs) along the minimum free energy
path (MFEP). Sγ-H is in yellow, H-Nε is in
red, C(P1)-N(P1′) (the scissile peptide bond) is in
green, H-N(P1′) is in black, and Sγ-C(P1) is
in blue. (c) Representation of the TS for the acylation
process. This TS corresponds to the proton transfer from
His41 in the ion pair catalytic dyad to the nitrogen atom
of the peptide bond (N(P1′) with the approach of
the Sγ atom of Cys145 to the carbonyl carbon atom
(C(P1)) and the lengthening of the peptide bond. The
values of the distances correspond to the coordinates of
the MFEP at the TS, except the intramolecular distance
between the hydroxyl and NH group of Ser(P1′) that
has been averaged over the trajectory of the corresponding
string node. (d) Representation of the acyl–enzyme
complex formed between the enzyme and the P fragment of
the peptide, with a water molecule hydrogen-bonded to the
N-terminal group of the P′ fragment. The free
energy profile shows two minima for the acyl–enzyme
complex, differing in the distance between the P and
P′ fragments, as shown in Figure S5.
Simulation of the acylation reaction taking place through the
formation of the ion pair. (a) B3LYPD3/6-31+G*/MM free
energy profile along the path-CV for the acylation
reaction after the formation of the ion pair (IP) from the
Michaelis complex (MC). The reaction takes place with a
single transition state (TS) that yields the
acyl–enzyme (ACE), which can be present in two
conformations. (b) Evolution of the distances selected as
collective variables (CVs) along the minimum free energy
path (MFEP). Sγ-H is in yellow, H-Nε is in
red, C(P1)-N(P1′) (the scissile peptide bond) is in
green, H-N(P1′) is in black, and Sγ-C(P1) is
in blue. (c) Representation of the TS for the acylation
process. This TS corresponds to the proton transfer from
His41 in the ion pair catalytic dyad to the nitrogen atom
of the peptide bond (N(P1′) with the approach of
the Sγ atom of Cys145 to the carbonyl carbon atom
(C(P1)) and the lengthening of the peptide bond. The
values of the distances correspond to the coordinates of
the MFEP at the TS, except the intramolecular distance
between the hydroxyl and NH group of Ser(P1′) that
has been averaged over the trajectory of the corresponding
string node. (d) Representation of the acyl–enzyme
complex formed between the enzyme and the P fragment of
the peptide, with a water molecule hydrogen-bonded to the
N-terminal group of the P′ fragment. The free
energy profile shows two minima for the acyl–enzyme
complex, differing in the distance between the P and
P′ fragments, as shown in Figure S5.Very recently an interesting QM/MM study of the same protease with a
different substrate (where the leaving group was not a peptide
fragment but a fluorescent tag, 7-amino-4-carbamoylmethylcoumarin) has
been reported.[44] In that work the proton transfer
from Cys145 to His41 was found to be concomitant with the nucleophilic
attack of the Sγ atom on the carbonyl carbon atom, forming a
thiohemiketal intermediate, and the cleavage of the peptide bond takes
place in a subsequent step assisted by the proton transfer from His41.
The differences with respect to our results could be due to the use of
a different substrate and/or the use of different theoretical
descriptions (that work used a combination of semiempirical and DFT
methods with the M06-2X functional). The description of the acylation
step can be highly dependent on the selection of the QM level, as
demonstrated in Figure S6 where we compare the free energy profiles
obtained with different Hamiltonians and basis sets. In any case, that
QM/MM study obtained an activation free energy of 19.9
kcal·mol–1, in excellent agreement
(within 0.5 kcal·mol–1) with the value derived
from the experimental rate constant obtained for that
substrate.[10] Interestingly, this rate
constant (0.050 s–1)[10] is
significantly smaller than the value reported for the hydrolysis of
the Gln-Ser bond by the ortholog enzyme of SARS-CoV (1.5–8.5
s–1).[41] The gap between
these two experimental rate constant values could be due to
differences in the preparation and purification of the enzyme or to
genuine mechanistic differences between substrates in the main
protease.[41,45] In this sense, as discussed
earlier, the presence of a hydroxyl group at the P1′ position
could play an important role in the acylation process, improving the
binding and kinetics of a hypothetical inhibitor. To explore other
possible mechanisms,[46] we also studied a reaction
path that does not involve the formation of an ion pair, although the
associated free energy barrier is considerably higher and incompatible
with the experimental rate constant (see Figure S7).
Deacylation Step
Regarding the deacylation step, the standard mechanistic proposal
suggested for related enzymes[47] assumes that, once
the neutral P′-NH2 peptide fragment has left the
active site, a water molecule activated by His41 attacks the
C(P1)-Sγ bond, releasing the P-COOHpeptide (also with a neutral
terminal group) and, finally, regenerating the enzyme after a proton
transfer from His41 to Cys145 (see Scheme ). In our simulations of the acylation
product, we found that a water molecule can be placed in between the
P′-NH2 leaving fragment and the
acyl–enzyme complex, being correctly oriented to perform the
hydrolysis of the acyl–enzyme (see Figures
d and S5). This configuration suggests an alternative
reaction mechanism that can yield the two peptide fragments with
correct protonation states in the terminal groups and regenerate the
enzymatic active site in its most stable state (the neutral catalytic
dyad). An additional advantage of this novel mechanism is that it
involves water activation by means of the N-terminus of the P′
fragment, which is known to be a better base than histidine side
chains (the average pKa values are about
7.7 and 6.6, respectively).[48] Finally, the proposed
mechanism can be at the origin of the mechanistic differences observed
when the scissile bond is an amide instead of an ester (in that case a
basic N-terminal group is not formed after the acylation).[41] We obtained the MFEP corresponding to such a
mechanism at the B3LYPD3/MM level (see Figure ). Thisprocess is stepwise, presenting
two TSs (see Figure a). The
first TS (TS1 in Figure c)
corresponds to the proton transfer from the water molecule to the
N(P1′) atom, resulting in the formation of the
P′-NH3+ peptide fragment. The free
energy barrier associated with this step is 15.6
kcal·mol–1. Thisproton transfer is
concomitant to the attack of the hydroxyl group on the C(P1) carbonyl
carbon atom, resulting in the formation of an intermediate thiodiolate
(Figure d). After
rotation of the hydroxyl group to orient the proton toward the sulfur
atom, the reaction proceeds with the cleavage of the C(P1)-Sγ
bond. The second TS observed during the deacylation (TS2, Figure e) corresponds to the
separation of the Sγ atom (with the C(P1)-Sγ distance
being 2.67 Å). The free energy barrier associated with this
second step from the acyl–enzyme complex is very close to the
first one, 15.8 kcal·mol–1 (17.5
kcal·mol–1 from the most stable
acyl–enzyme conformer; see Figure ). This value is in excellent agreement
with the values derived from the reaction rate constants for the
ortholog protease of SARS-CoV (from 16.2 to 17.2
kcal·mol–1).[41]
Afterward, the leaving cysteine is stabilized by means of a proton
transfer from the C-terminal group to the Sγ atom, regenerating
the enzyme in its more stable protonation state (a neutral catalytic
dyad) and yielding the P peptide fragment with a terminal unprotonated
carboxylate (the product is represented in Figure
f). The proposed mechanism, in
which the general base is a N-terminal group, displays a smaller
barrier than the standard mechanism in which His41 acts as the general
base activating the water molecule, as expected from the relative
pKa values (see Figure S8).
Figure 4
Simulation of the deacylation reaction with the N-terminal
group of the P′ fragment acting as a general base
in charge of water activation. (a) B3LYPD3/6-31+G*/MM free
energy profile along the path-CV for the deacylation
reaction. The process takes place in two steps. In the
first one the water activated by the N-terminal group
attacks the acyl–enzyme complex (ACE) to form a
thiodiolate intermediate (I) through the first TS (TS1).
In the second one the reaction product (Pr) is obtained
after breaking the acyl–enzyme bond in the second
TS (TS2). (b) Evolution of the distances selected as
collective variables along the minimum free energy path.
Sγ-Hw2 is in green,
Nε-N(P1′) is in black, Ow-C(P1) is
in gray, N(P1′)-Hw1 is in red,
Sγ-C(P1) is in yellow, Ow-Hw1
is in purple, and Ow-Hw2 is in blue.
(c) Representation of TS1 where the water molecule is
transferring a proton to the N-terminal group of the
P′ fragment and the resulting hydroxyl anion
attacks the carbonyl carbon atom of the P fragment (C(P)).
The values of the distances correspond to the coordinates
of the MFEP at TS1. (d) Representation of the thiodiolate
intermediate (I). (e) Representation of TS2 corresponding
to the breaking of the Sγ-C(P1) bond and the proton
transfer from the carboxylic terminal group to the leaving
sulfur atom. (f) Representation of the reaction products
(Pr) with the P-COO– and
P′-NH3+ peptide
fragments in the active site of the protease.
Figure 5
Proteolysis mechanism in 3CLpro of SARS-CoV-2 as deduced
from our simulations. (a) Schematic representation of the
reaction mechanism. (b) Free energy profile associated with the
reaction mechanism.
Simulation of the deacylation reaction with the N-terminal
group of the P′ fragment acting as a general base
in charge of water activation. (a) B3LYPD3/6-31+G*/MM free
energy profile along the path-CV for the deacylation
reaction. The process takes place in two steps. In the
first one the water activated by the N-terminal group
attacks the acyl–enzyme complex (ACE) to form a
thiodiolate intermediate (I) through the first TS (TS1).
In the second one the reaction product (Pr) is obtained
after breaking the acyl–enzyme bond in the second
TS (TS2). (b) Evolution of the distances selected as
collective variables along the minimum free energy path.
Sγ-Hw2 is in green,
Nε-N(P1′) is in black, Ow-C(P1) is
in gray, N(P1′)-Hw1 is in red,
Sγ-C(P1) is in yellow, Ow-Hw1
is in purple, and Ow-Hw2 is in blue.
(c) Representation of TS1 where the water molecule is
transferring a proton to the N-terminal group of the
P′ fragment and the resulting hydroxyl anion
attacks the carbonyl carbon atom of the P fragment (C(P)).
The values of the distances correspond to the coordinates
of the MFEP at TS1. (d) Representation of the thiodiolate
intermediate (I). (e) Representation of TS2 corresponding
to the breaking of the Sγ-C(P1) bond and the proton
transfer from the carboxylic terminal group to the leaving
sulfur atom. (f) Representation of the reaction products
(Pr) with the P-COO– and
P′-NH3+ peptide
fragments in the active site of the protease.The deacylation step presents a free energy change of about −5
kcal·mol–1. This exergonic character can
be increased with the release of the reaction products to the bulk. It
is worth noticing that the peptide fragments obtained from this
mechanistic proposal present a salt bridge between the charged
C-terminal and N-terminal groups that must be broken during products
release. The separation of the two terminal groups implies that water
molecules must be placed, tightly bounded, between these two charged
groups. Those configurations could contribute to an inverse solvent
isotope effect observed under steady-state conditions only when the
scissile bond is an amide and not an ester (this is only when a
N-terminal group is available to act as the general base).[41]
Conclusions
We have presented a detailed analysis of the Michaelis complex and the
proteolysis mechanism in the 3CLpro of SARS-CoV-2 using DFT/MM
computational simulations. Our study has identified key interactions
established between the protein and a peptide substrate and the detailed
reaction mechanism. The complete reaction cycle is shown in Figure a, while the associated free energy profile is given in Figure b. The reaction process
involves the formation of a catalytic dyad ion pair from which the acylation
step can proceed (see Video 1 in the Supporting Information). In this acylation,
the TS involves the proton transfer from His41 to the nitrogen atom followed
by the nucleophilic displacement of the peptide bond by the Sγ atom of
Cys145. This finding can be relevant for the design of inhibitors with
improved kinetics, stabilizing thisproton transfer through, for example, an
adequate choice of the P1′ group. For the deacylation step (see
Video 2 in the Supporting Information), we have proposed a
completely novel mechanism where the N-terminal group of the first peptide
fragment acts as the general base catalyzing the hydrolysis of the
acyl–enzyme complex.Proteolysis mechanism in 3CLpro of SARS-CoV-2 as deduced
from our simulations. (a) Schematic representation of the
reaction mechanism. (b) Free energy profile associated with the
reaction mechanism.The free energy profile associated with the proposed reaction mechanism is
shown in Figure b. According to
thisprofile, the acylation step is almost thermoneutral, while the
deacylation is more exergonic, in agreement with the observed
irreversibility of the process. Notably, our mechanistic proposal for the
deacylation step can explain the experimental differences observed between
the hydrolysis of amide and ester substrates in the highly similar protease
of SARS-CoV, which had remained unexplained until now. The predicted global
free energy barrier, 17.5 kcal·mol–1, is in excellent
agreement with the values derived from the observed rate constants for the
hydrolysis of the Gln-Serpeptide bond in the highly similar SARS-CoVprotease (between 16.2 and 17.2 kcal·mol–1).[41] Note that the rate constants for the hydrolysis of
different substrates with a fluorescent tag as the leaving group have been
also measured for the SARS-CoV-2protease.[10] The observed
rate constants were smaller than those measured for the Gln-Ser hydrolysis
in the SARS-CoVprotease,[41] and the activation free
energies derived from these rate constants measured for the SARS-CoV-2protease, between 18.6 and 19.4 kcal·mol–1, are
somewhat larger than our calculated value. However, because of the
differences between experimental procedures, it is difficult to assess if
the observed disparity among rate constants in the two proteases is due to
intrinsic properties of the two enzymes (which is unlikely due to the
identity between active sites) or, more likely, due to differences in the
preparation of the enzyme (which can affect the measured rate constants) or
to differences in the substrate (which can change the rate constant by 1
order of magnitude or more). We thus preferred to use as a reference for our
study the detailed kinetic characterization made on the SARS-CoVprotease
with the same substrate as that employed in our simulations.[41]Our study can also be useful to assist in the design of new and more potent
inhibitors of SARS-CoV-2 main protease. Apart from the well-known
interactions established by the P1–P4 groups in the active site, our
MD simulations of the enzyme–substrate complex stress the role of the
interactions established by the P1′ and P2′ groups. In
particular, the hydroxyl group at the P1′ position could be important
not only for binding but also during the reaction process because it could
assist the formation of the catalytic dyad ion par and the acylation
transition state. Interestingly, during the preparation of this manuscript a
preprint has shown that a hydroxymethylketone derivative can be a potent
inhibitor of SARS-CoV-2 3CL protease.[49] In this compound
the hydroxyl group can establish a direct contact with the catalytic
dyad.[50]