Dimas Suárez1, Natalia Díaz1. 1. Departamento de Quı́mica Fı́sica y Analı́tica, Universidad de Oviedo, Avda. Julián Clavería 8, 33006 Oviedo, Asturias, Spain.
Abstract
Herein, we investigate the structure and flexibility of the hydrated SARS-CoV-2 main protease by means of 2.0 μs molecular dynamics (MD) simulations in explicit solvent. After having performed electrostatic pKa calculations on several X-ray structures, we consider both the native (unbound) configuration of the enzyme and its noncovalent complex with a model peptide, Ace-Ala-Val-Leu-Gln∼Ser-Nme, which mimics the polyprotein sequence recognized at the active site. For each configuration, we also study their monomeric and homodimeric forms. The simulations of the unbound systems show that the relative orientation of domain III is not stable in the monomeric form and provide further details about interdomain motions, protomer-protomer interactions, inter-residue contacts, accessibility at the catalytic site, etc. In the presence of the peptide substrate, the monomeric protease exhibits a stable interdomain arrangement, but the relative orientation between the scissile peptide bond and the catalytic dyad is not favorable for catalysis. By means of comparative analysis, we further assess the catalytic impact of the enzyme dimerization, the actual flexibility of the active site region, and other structural effects induced by substrate binding. Overall, our computational results complement previous crystallographic studies on the SARS-CoV-2 enzyme and, together with other simulation studies, should contribute to outline useful structure-activity relationships.
Herein, we investigate the structure and flexibility of the hydrated SARS-CoV-2 main protease by means of 2.0 μs molecular dynamics (MD) simulations in explicit solvent. After having performed electrostatic pKa calculations on several X-ray structures, we consider both the native (unbound) configuration of the enzyme and its noncovalent complex with a model peptide, Ace-Ala-Val-Leu-Gln∼Ser-Nme, which mimics the polyprotein sequence recognized at the active site. For each configuration, we also study their monomeric and homodimeric forms. The simulations of the unbound systems show that the relative orientation of domain III is not stable in the monomeric form and provide further details about interdomain motions, protomer-protomer interactions, inter-residue contacts, accessibility at the catalytic site, etc. In the presence of the peptide substrate, the monomeric protease exhibits a stable interdomain arrangement, but the relative orientation between the scissile peptide bond and the catalytic dyad is not favorable for catalysis. By means of comparative analysis, we further assess the catalytic impact of the enzyme dimerization, the actual flexibility of the active site region, and other structural effects induced by substrate binding. Overall, our computational results complement previous crystallographic studies on the SARS-CoV-2 enzyme and, together with other simulation studies, should contribute to outline useful structure-activity relationships.
At the end of 2019, a novel coronavirus was identified in the airway epithelial
cells of three patients with pneumonia of unknown cause.[1]
The new pathogen, named severe acute respiratory syndrome coronavirus 2
(SARS-CoV-2),[2] spread rapidly around the globe, and
the associated COVID-19 disease become a worldwide pandemic. Social
distancing and quarantine were imposed in numerous countries to stop
dissemination, and the development of effective drugs and/or vaccines
against the virus was urged.Similar to other coronaviruses, two overlapping polyproteins (replicase
polyproteins 1a and 1ab) are produced and processed after SARS-CoV-2
infection to generate multiple functional subunits required for viral
replication.[3] The proteolytic processing is
accomplished by two internally encoded proteases that hydrolyze the
polyproteins at specific sites. One of these proteolytic enzymes, the
so-called main protease or 3C-like protease (3CLpro), catalyzes
most maturation cleavage events. Thus, 3CLpro is an essential
enzyme for viral replication and constitutes one of the best characterized
drug targets among coronaviruses.The crystal structure of the 3CLpro protein from SARS-CoV-2 is
highly similar to that of other coronaviruses.[4,5] The protein is
formed by three domains: domains I (residues 10–99) and II (residues
100–182) have an antiparallel β-barrel structure, while domain
III (residues 198–303) forms a compact α-helical domain
connected to domain II by a long linker loop. The active site is located in
a cleft between domains I and II, and it holds a histidine/cysteine
catalytic dyad. In the SARS-CoV-2 main protease, Cys145 acts as a
nucleophile during the first step of the hydrolysis reaction assisted by
His41 as a base catalyst. The oxyanion hole, which
stabilizes the partial negative charge developed at the
P1 carbonyl group of the peptide substrate
during the hydrolysis of the
P1–P1′
bond, is formed by the backbone amido groups of Gly143 and
Cys145, both placed in the so-termed oxyanion loop
(residues 138–146). The catalytic machinery also includes a number of
binding sites, with the S1 site defining the
enzyme specificity for a glutamine at the P1
position of the peptide substrate. Domain III is involved in the
dimerization of the protease (see below), and the resulting homodimer is
supposed to be the active form of the enzyme.[6]The high homology shown by the 3CLpro enzyme in different
coronaviruses suggests that this enzyme family exhibits almost identical
biochemical and biophysical properties. Particularly, the main proteases
from SARS-CoV-2 and SARS-CoV share 96% of sequence identity so that much of
the experimental results obtained for the later enzyme could be relevant for
the present case.[4] Hence, before outlining our goals in
this work, we first summarize some biochemical and structural studies on the
SARS-CoV main protease.[7]The critical role of domain III in the 3CLpro dimerization has been
settled by fragment deletion experiments performed for the SARS-CoV
protease, showing that a truncated enzyme lacking domain III remains as a
monomer.[8] The catalytic activity of this truncated
form, which has been assayed against a 14-mer peptide substrate, is very
weak as only ∼20% of the substrate was cleaved after 20 h.[8] Furthermore, serial truncation experiments have confirmed
that the last C-terminal helix in domain III is essential
for dimerization and enzyme activity, and more specifically, deletion of
residues Gln299 and Arg298 significantly reduces both
dimerization and enzyme activity
(∼1%–2%).[9,10]Another structural element of 3CLpro that seems essential for the
activity of the SARS-CoV main protease is the N-finger
constituted by residues 1–7 in domain I. According to Chen et
al.,[11] the complete deletion of the
N-finger has a minor effect on dimerization, but it
abolishes enzymatic activity (<1%) when reacting with a 12-mer peptide.
On the other hand, Hsu et al.[9] have found that the first
three N-finger residues have only a small influence both on
the dimer stability and on the activity of the enzyme
(Kd for dimer dissociation changes from
0.28 to 3.4 μM upon deletion while 76% of catalytic activity is
retained). However, in contrast with Chen et al., the serial truncation
experiments of Hsu et al. have revealed a dramatic effect in the structure
and activity of the enzyme after removing Arg4
(Kd = 57.5 μM and 1.3% of activity
in the truncated enzyme) and the next residues.[9]Crystallographic structures have shed light onto the relationship between
dimerization and enzyme activity. In the homodimer structures, the
N-finger of one protomer is squeezed between domains
II and III of another protomer. This allows Ser1 and
Arg4 from one protomer to shape the substrate-specificity
pocket (i.e., the S1 site) and the oxyanion loop
in the other protomer.[12] Curiously, in the crystal
structures for some monomers (mutants G11A, S139A, and
R298A),[13−15] the oxyanion loop is partially folded as a
310-helix that hinders the access to the active site,
particularly to the S1 subsite. Therefore, it
seems that both protein dimerization and the right placement of the
N-finger residues are indispensable for maintaining
an open active site. This interpretation has been supported by surface
plasmon resonance experiments showing that the full-length enzyme binds a
P6–P1
hexapeptide model (Kd ∼ 152 μM),
but almost no binding exists to the N-finger deleted
protease.[11]From the structural and activity results enumerated above, it seems that the
monomeric form of the 3CLpro enzymes has little or no affinity
for the peptide substrate, presumably due to the blocking conformation of
the oxyanion loop in the active site. However, this view has been challenged
by isothermal titration calorimetry (ITC) assays showing that the wild type
and several monomeric mutants of SARS-CoV (R298A, R298L, and R298A/Q299A)
present comparable binding affinities for a 6-mer peptide substrate.[16] Moreover, substrate binding to the single mutants
induces the dimerization of the enzyme, and kinetic assays provided similar
kcat values for the single mutants and the
wild-type protein. In contrast, dimer stabilization upon substrate binding
does not occur for the double mutant, which exhibits null proteolytic
activity.[16] Another intriguing fact is that some of
the crystal structures obtained for the 3CLpro enzyme present a
dimer with one of the monomers displaying a partially collapsed or
disordered oxyanion loop, showing thus that the blocking conformation is not
exclusive of the monomeric state.[12,15] Moreover, it has been reported that
the N214A mutant remains mainly as a dimer without enzymatic activity but
structurally very close to the wild-type enzyme (i.e., no collapsed active
site is observed in the crystal structure).[17] Therefore,
it seems most likely that a complex interplay exists among enzyme
dimerization, active site structure, substrate binding, and catalysis.Clearly, some of the questions regarding the actual role of the protease
dimerization on the architecture and activity of the SARS-CoV-2 catalytic
site are prone to computational examination by means of molecular dynamics
(MD) simulations in explicit solvent. Similarly, computational studies can
reveal important molecular details of the Michaelis complexes with peptide
substrates. Hence, in this work, we examine various configurations of the
SARS-CoV-2 3CLpro enzyme differing in the monomer or homodimer
state and in the presence or not of a short peptide substrate. All the
models are subject to a 2.0 μs MD simulation followed by intensive
analysis in order to characterize many structural and dynamic features
ranging from the tertiary and quaternary structures to specific
inter-residue contacts at the protomer interface or in the active site. By
comparing the results obtained for the various configurations, the
differences and similarities between the monomeric and dimer forms are
discussed in detail, revealing also some effects induced by substrate
binding on the interdomain arrangements and the prereactive organization of
the catalytic site. In this way, our computational results may complement
previous crystallographic studies on the SARS-CoV and SARS-CoV-2 enzymes.
Considering also that the spread of the COVID-19 disease has undoubtedly
sparked a flurry of computational research on this system, the present
results may also contribute to reach a consensus view about the actual
flexibility and structure of the active site supported by independent
simulation studies. Eventually, the representative structures produced by
our simulations could be of interest to undertake further computational
work.
Methods
pKa Calculations
The protonation state for the titratable residues at pH 7 were assigned
according to structure-based pKa
calculations performed with the H++ web server (version 3.1).[18] In these calculations, four high-resolution X-ray
structures recently deposited at the Protein Data Bank (PDB) were
considered: two apoenzyme structures 6Y84 (1.39 Å)[19]and 6M03 (2.0
Å)[20] and two enzyme–inhibitor
complexes 6LU7
(2.16 Å)[5] and 5R82 (1.31 Å)[21]
(coordinates of the inhibitor atoms were removed). The
pKa calculations were performed for
the protease monomers and for the catalytically competent dimeric
forms of the apoenzyme structures. In addition, three different
dielectric constants were selected for the interior of the protein
(εint = 4, 10, and 20) to evaluate the
consistency of the results.
Initial Structures and Building of the Enzyme/Peptide Complex
Starting coordinates for the SARS-CoV-2 main protease were taken from the
6LU7 (pH
6.0) PDB structure. The coordinates of the inhibitor were removed
prior to the edition and simulation of the native form of the enzyme,
and all the crystallographic water molecules were maintained in the
models. We also simulated an enzyme/substrate complex as the
SARS-CoV-2 main protease performs multiple cleavages at viral
polyproteins by recognizing several peptide sequences with an
absolutely conserved glutamine at the P1
site (i.e., cleavage occurs at the
P1–P1′
peptide bond). Hence, to generate a dynamic model for substrate
binding within the 3CLpro active site, we selected the
sequence of the cleavage site between the so-called nonstructural
protein 4 and the 3CLpro enzyme that are successively
encoded in the viral polyproteins (i.e., scissile peptide bond
3263–3264). According to the information prereleased in the
UniProt database (codes P0DTC1 and P0DTD1),[22] the
-P4-P3-P2-P1–P1′-
sequence of this recognition site is -Ala-Val-Leu-Gln–Ser-. We
built this short peptide sequence within the protease active site by
using the 6LU7
crystal structure as a template. Thus, the peptidomimetic inhibitor in
6LU7
includes a peptide moiety with the sequence -Ala-Val-Leu- located
within the
S4-S3-S2
binding sites and a consecutive side chain bound within
S1 that resembles a glutamine side
chain (see Scheme ). Hence,
we kept this part of the inhibitor, and we merely built the Ser
backbone chain at the S1′ site
using the Chimera program.[23] During the molecular
edition of the noncovalent enzyme/pentapeptide complex, the
Ser(P1′) side chain, the
N-terminal acetyl- and
C-terminal N-methyl amide capping
groups, and all H atoms in the peptide substrate were added by the
tLEaP program included in the AMBER18 suite of
programs.[24,25]
Scheme 1
Comparison between Molecular Structures of
Peptido-Mimetic Inhibitor and Peptide Substrate Selected
for This Work
Molecular Dynamics Settings
Molecular Dynamics (MD) simulations in explicit solvent were run for the
monomeric and dimeric forms of the enzyme, either in their free
(unbound) state or in complex with a pentapeptide substrate. The
coordinates of the solute atoms from the selected X-ray structure were
processed with the tLEaP program in order to add the missing H atoms
and to assign the molecular mechanics parameters. The systems, which
were represented with the ff14SB version[26] of the
all-atom AMBER force field,[27] were immersed in an
octahedral water box that extended 18 Å (monomer) or 20 Å
(dimer) from the protein atoms. The TIP3P potential[28] was used to represent the water molecules, and Na+
counterions[29] were added with the tLEaP
program to neutralize the systems.The solvent molecules and counterions were initially relaxed by means of
energy minimizations and 100 ps of molecular dynamics (MD) using the
SANDER program.[25] Then, the full systems were
minimized and heated gradually to 300 K using 60 ps of constant volume
(NVT) MD with a 1 fs time step and using the PMEMD program in AMBER18.
Subsequently, the density was adjusted by means of 2.0 ns of constant
pressure (NPT) MD with a 2 fs time step and using the Monte Carlo
barostat as implemented in PMEMD. Langevin dynamics was employed to
control the temperature (300 K) with a collision frequency of 2
ps–1. The SHAKE algorithm[30]
was selected to constraint all R–H bonds, and periodic boundary
conditions were applied to simulate a continuous system at constant
pressure (NPT). A nonbonded cutoff of 9.0 Å was used, and the
Particle-Mesh-Ewald method[31] was employed to
include the contributions of long-range interactions. The production
phase of the simulations at the NPT conditions extended up to 2.0
μs, and coordinates were saved every 2.5 ps for analysis. The MD
runs with a time step of 2.0 fs employed the GPU accelerated version
of the PMEMD code included in AMBER18.[25,32]
Structural Analysis of MD Simulations
The CPPTRAJ software[33] in AMBER18 was used to compute
the root mean squared deviation (RMSD) of the protein coordinates with
respect to the reference X-ray structure along the MD trajectories.
The coordinates of the models were also clustered using CPPTRAJ with
the average-linkage clustering algorithm and a sieve of 250 frames.
The distance metric between frames was calculated via the best-fit
coordinate RMSD using the coordinates of heavy atoms. The molecular
surfaces of the whole systems and of the domain/monomer components
were computed using the linear combination of pairwise overlaps (LCPO)
method[34] as implemented in CPPTRAJ. Secondary
structure assignment of the oxyanion loop was done using the 2002 CMBI
version of the DSSP program.[35]H-bond and vdW contacts were characterized using a specific software
developed locally. H-bonds were identified based on a geometrical
criterion (X···Y distance <3.5 Å and
X–H···Y angle >120°), whereas
hydrophobic interactions were scored by evaluating a dispersion
attraction term[36] between pairs of atoms belonging
to different hydrophobic groups. The criteria for assessing the
occurrence of dispersion interactions between two groups were as
follows: (a) The total pairwise dispersion energy is larger than 0.5
kcal/mol in absolute value. (b) The distance between the centers of
mass of the two interacting groups is below 12.0 Å. The Chimera
visualization system[23] was employed to draw the
ribbon/stick models of the systems.Using a locally developed FORTRAN code, the relative orientation of the
protein domains and/or of the two protomers in the dimeric form were
monitored in terms of the Euler angles (xyx
convention) that characterize the relative orientation of two rigid
coordinate systems, which are placed at the center of mass of the
considered fragments. Each coordinate system is defined by the
principal inertia axes, which, in turn, are computed considering the
coordinates of the backbone atoms located in the α-helical or
β-strand elements within the selected region.To further characterize the structure and shape of the active site
region, we computed the radius of accessibility
(racc) of various residue side
chains and/or H-bond sites following a computational protocol that has
been described in previous work.[37] For each atom or
group of atoms, racc is defined as the
maximum radius of a spherical ligand that can touch the desired
target. The MSMS program[38] was used to carry out
fast computations of molecular surfaces considering only the protein
heavy atoms and probe spheres of varying radius. The
racc values were calculated for a
subset of 2000 snapshots for each simulation.
Electrostatic Calculations
The electrostatic potentials of the models in solution were computed on
selected cluster representatives from the MD trajectories using the
APBS software.[39] In the electrostatic
Poisson–Boltzmann calculations, only the coordinates of the
solute atoms were used, with the atomic charges and radii taken from
the ff14SB AMBER representation. The nonlinear PB equation was solved
on a cubic lattice by using an iterative finite-difference method. The
cubic lattice had a grid spacing of 0.33 Å, and the points at the
boundary of the grid were set to the sum of Debye–Hückel
potentials. The dielectric boundary was the contact surface between
the radii of the solute and the radius (1.4 Å) of a water probe
molecule. The electrostatic potential was plotted onto the molecular
surface computed by the MSMS program[38] using the
Chimera program.
Conformational Entropy
We estimated the conformational entropy
(Sconform) of the backbone ϕ
and ψ dihedral angles for residues located nearby the active
site using the CENCALC program.[40] The entropy
calculation relies on the discretization of the time evolution of the
selected dihedral angles. To this end, the continuous probability
density function (PDF) of each dihedral angle is represented by a von
Mises kernel density estimator, which depends on a concentration
parameter κ (a κ = 0.50 value was chosen here). By finding
the maxima and minima of the PDF, the time series containing the
values of the corresponding dihedral angle during the MD simulation is
transformed into an array of integer numbers labeling the accessible
conformational states. This allows the estimation of the probability
mass functions (Pi) of the individual
dihedral angles from which the marginal (first-order) conformational
entropy of the each dihedral is computed
asTypically, μs-length MD simulations are required in order to obtain
reasonably converged Sconform
values.[41]
Results and Discussion
Table S1 in the Supporting Information collects the
computed pKa values for all the titratable
residues of the SARS-CoV-2 main protease. The crystal structures used
in the calculations were selected according to their state (apoenzyme
or inhibited enzyme), their atomic resolution (1.3–2.2 Å),
and the experimental pH value (6.0–8.1). The results
consistently predict that all lysines, arginines, and aspartic and
glutamic acids should be modeled in their charged state at pH 7. In
contrast, all cysteines and tyrosines should remain neutral. For the
imidazole histidine groups, their pKa
values are well below 7.0, suggesting thus that they will be mainly
neutral at pH 7. Only the results obtained for His80 at
high dielectric constants, more reliable for this solvent-accessible
residue, suggest the coexistence of neutral and protonated forms.
However, the computed pKa values below 7.0
and the absence of negatively charged residues close to
His80 prompted us to consider its neutral state.
Then, the most probable neutral state for the six histidines, with the
side chain protonated at Nδ or Nε atoms, was selected
after visual inspection of their contacts in the crystal structures.
This resulted in His41 and His80 being
protonated at Nδ, whereas His64, His163,
His164, His172, and His246 were
protonated at Nε. We also note in passing that
pKa calculations on the dimer
3CLpro structures do not introduce any changes in the
selected protonation states. In addition, the
pKa values obtained for the active
site residues, particularly for the catalytic dyad
Cys145/His41, support their neutral state
within the pH range 6–8 corresponding to the selected crystal
structures.
MD Simulations
We examined various protease configurations that involve either the
monomeric or the dimeric forms, which, in turn, simulate either the
native (unbound) state or the bound state with the
Ace-Ala-Val-Leu-Gln∼Ser-Nme peptide substrate (see Table S2 in the Supporting Information). In
particular, we run two 2 μs MD trajectories representing the
native state that are labeled as M (monomer) and
D (dimer). The enzyme–peptide complex was
sampled by two MD trajectories labeled as M/Pep and
D/2Pep, corresponding to the monomeric and dimeric
forms, respectively. In D/2Pep the active site of each
protomer A and B accommodates one peptide molecule. The MD results of
the various simulations are presented and analyzed in a comparative
manner in order to better characterize the structure and flexibility
of the protein domains, the amplitude of the interdomain relative
motions, the nature and stability of the protomer–protomer
contacts, the shape and flexibility of the active site, and the
substrate mode of binding. All these results can be also useful to
outline more clearly the differences and similarities between the
monomeric and dimeric forms of the enzyme in aqueous solution.
Intradomain Structure and Relative Domain Orientation
Changes in the internal geometry and in the relative position of the
various protein domains were first analyzed by monitoring the time
evolution of the RMSD of the coordinates of the heavy atoms with
respect to the crystallographic structures (Figure
; Table
S3 presents mean RMSD values). In addition, the
superposition of the solid-state and the MD-averaged ribbon models
(Figures and 3) reveals some differences in the positioning of
specific loops and secondary structure elements.
Figure 1
(a) Time evolution along the M (in red) and
M/pep (in blue) trajectories of the
root mean squared deviation (RMSD) computed for selected
backbone heavy atoms (in Å). (b) Time evolution along
the D (in red) and D/2Pep (in
blue) trajectories of the RMSD data (in Å).
Figure 2
Different views (90° turned) for a ribbon representation
of the average structure obtained from the last 50 ns of
the M and M/pep simulations. The
average structure was superposed over the 6LU7
X-ray structure (in lighter color) using the backbone
coordinates of domain II.
Figure 3
Different views for a ribbon representation of the average
structure obtained from the last 50 ns of the
D and D/2Pep simulations.
The average structures were superposed over the 6LU7
X-ray structure (in lighter color) using the backbone
coordinates of domain II.
(a) Time evolution along the M (in red) and
M/pep (in blue) trajectories of the
root mean squared deviation (RMSD) computed for selected
backbone heavy atoms (in Å). (b) Time evolution along
the D (in red) and D/2Pep (in
blue) trajectories of the RMSD data (in Å).Different views (90° turned) for a ribbon representation
of the average structure obtained from the last 50 ns of
the M and M/pep simulations. The
average structure was superposed over the 6LU7
X-ray structure (in lighter color) using the backbone
coordinates of domain II.Different views for a ribbon representation of the average
structure obtained from the last 50 ns of the
D and D/2Pep simulations.
The average structures were superposed over the 6LU7
X-ray structure (in lighter color) using the backbone
coordinates of domain II.The rotational motion of the helical domain III with respect to the
central β-strand domain II along the M simulation
is a remarkable result concerning the overall protein architecture
(Figure ). In the
coordinates of protomer A in the 6LU7 structure, domain III establishes
only a few H-bond contacts with domain II (e.g.,
Th111···Asp295 H-bond,
Arg131···Asp289 salt
bridge) and domain I (e.g.,
Arg4···Gln299 H-bonds), and
domains II/III are connected through a 14-long peptide linker
(Gly183–Asp197). In contrast, the
connecting loop between domains I and II comprises only seven residues
(Asp92-Pro99) including a central
Pro96 residue, and abundant interdomain contacts
contribute to define the active site region. In fact, the domain I/II
arrangement was quite stable in all of our MD simulations.The relative motion of domain III is well described by the time evolution
of the RMSD values for all the backbone atoms and by the center of
mass (COM) distance between domains II and III (Figure and Figure S2). To characterize the interdomain
orientation regardless of the COM separation, we also computed the
Euler angles defined by a reference inertial system placed at domain
II and an analogous coordinate system located at domain III, with the
resulting ϕ, θ, and ψ angles shown in the form of
polar plots in Figure S1. In the M simulation, the two
domains slightly depart from each other (COM distance elongates from
∼26 to ∼29 Å) at 200 ns without significant
reorientation. During the ∼200–300 ns interval, an ample
reorientation occurred in less than 15 ns facilitated by a torsional
change at Gly195, and the relative position remained stable
during more than 1 μs. However, during the last 500 ns of the
M simulation, domain III reorients again with
respect to domain II adopting an intermediate pose in terms of the
Euler angles (Figure S1), which also departs
significantly from the X-ray structure as shown in Figure .To further assess whether or not the varying location of domain III is an
intrinsic feature of the monomeric state in aqueous solution, we
decided to run a second MD simulation started from a different X-ray
structure (6Y84)
with the same settings as those employed in the M
simulation. For the sake of brevity, the structural details of this
simulation, which was run for 1.0 μs, are not reported here. We
just comment that a two-step interdomain reorientation occurred again
within the 300–500 ns time interval, leading to an average MD
structure that resembles the first configuration observed in the
M trajectory (Figure S4). Therefore, the independent MD trajectory
added support to the results produced by the M
simulation.Interestingly, the wide domain II/III rearrangement was not observed in
the presence of the peptide substrate (M/Pep simulation),
although the COM separation and the reference Euler angles present
significant fluctuations, especially in the first half of the
simulation, indicating loosened interdomain contacts. The peptide
substrate gives direct H-bond contacts with linker residues (e.g.,
Gln189 and Gln192), which could contribute
to reduce the mobility of the II–III linker. On the other hand,
the interdomain motions are largely dampened out in the dimer
simulations (D and D/2Pep), which are
characterized by a persistent interdomain disposition and stable
interprotomer contacts (see below). Thus, it turns out that the
destabilization of the interdomain orientation would occur only in the
monomeric form.Concerning the internal structure of domains I–III, the most
remarkable feature is the small structural deviations and low
flexibility of the central domain II (Figure and Table S3). Thus, the mean RMSD values are only
∼1.1 ± 0.1 Å for the monomeric simulations
(M and M/Pep) and even smaller for the
dimer models (∼0.8 ± 0.1 Å) with the only exception
being domain IIA in the D simulation (1.52 ± 0.21
Å; see below). The internal stability of domain II is in
consonance with a certain buried character as only
∼55%–60% (monomer simulations) or ∼40%–45%
(dimers) of its molecular surface is solvent accessible (Table S4). The terminal domains I and III are more
hydrophilic, having around 70% and 85% of exposed surfaces,
respectively, all along the MD simulations. They exhibit wider
internal motions involving their helical and loop elements, which,
however, do not induce dramatic changes in the domain structures, with
the largest RMSD values being lower than 2.5 Å. In general, the
largest intradomain deviations arise in the native trajectories
(M and D) as, for example, in domain
IIIB in D (RMSD = 2.30 ± 0.16 Å), whose
C-terminal helix is largely displaced (Figure ). In this respect,
it turns out that the local internal displacements and the actual
flexibility in domains I/III are quite variable when comparing among
the data of the various simulations and/or of the A/B protomers in the
D simulation. This variability suggests that the
protein domains may access different conformational states on the
μs time scale. Finally, we note that the D/2Pep
simulation consistently shows the smaller structural deviations and
the lower fluctuations, both in terms of the intradomain RMSD data and
the interdomain COM distances/Euler angles, revealing thus a
rigidifying effect exerted by substrate binding.
Protomer–Protomer Interactions
As in other 3CLpro enzymes, only the SARS-CoV-2 main protease
homodimer is considered to be catalytically active, and accordingly,
targeting the interface between both protomers (A and B) might be an
alternative therapeutic strategy for the treatment of COVID-19. Hence,
we decided to analyze in detail the overall dimer architecture and the
inter-residue contacts between the two protomers. According to the
protomer–protomer distance in Figure S2, the dimer remains stable during the
D and D/2Pep simulations. By means of
molecular surface calculations (Table S4), we found that the A···B
contact area consists of ∼11% (∼1400 Å2)
of the LCPO surface of the separated protomers and also remains quite
stable along the D simulation. This contact area in
D is similar to that in the X-ray structure
(∼1400 Å2 in 6LU7), albeit lower than in the
D/2Pep simulation (∼1600
Å2). The two protomers are oriented perpendicular to
one another. Their relative orientation, as measured by the Euler
angles between the corresponding inertial reference systems, is highly
stable according to the MD simulations (Figure S1), which is compatible with internal
motions within the respective protomer domains. The stability of the
dimer architecture is also evident in the good overall match between
the average MD structures and their parent X-ray structures (Figure ). The global RMSD
mean values further confirm this agreement as they have moderate
values of 2.91 ± 0.12 Å (D) and 1.78 ±
0.16 Å (D/2Pep), which, in turn, point out again the
substrate binding effect.In the crystallographic structures, the two protomers give symmetrical
contacts (e.g., the
Ser1@O···HN@Phe140* and
Phe140@NH···O@Ser1* H-bonds
have the same interatomic distances; see Table
S6). The A···B contacts mainly involve
residues Ser1-Gly11 in one protomer and several
residues distributed in the three domains of the other protomer (e.g.,
Ser10···Ser10,
Arg4···Lys137,
Met6···Val125, and
Arg4···Glu290). There are
also a few nonpolar contacts between residues in domain II (e.g.,
Val125···Val125) and
between residues in domain III (e.g.,
Ala285···Leu286). Most of
these contacts are well maintained during the D and
D/2pep simulations (Table S6). For instance, the
Gly11@NH···Oε@Glu14
H-bond presents 100% of abundance and a short interatomic distance
(2.9 Å) in the two dimer trajectories. But the percentage of
occurrence of other polar interactions, mainly involving
Ser1 and Arg4 residues in the
N-finger, varies across the simulations and
with respect to the crystal structure. Thus, the initial
Ser1···Glu166 H-bonds are
missing for most part of the simulations (i.e., the highest percentage
of occurrence is below 50%). Similarly, the
Ser1···His172 contact is
weakened in the simulated dimers, and the
Ser1···Phe140 interaction
is not abundant for the D trajectory. With respect to
Arg4, the salt bridge with Glu290 is well
preserved but the
Lys137(A)···Arg4(B)
interaction is clearly destabilized (Table S6A). On the other hand, the simulations also
assess the relevance of nonpolar interactions for the stability of the
dimer. Thus, the Met6/Tyr126,
Pro9/Pro122,
Pro9/Val125, and
Pro9/Leu115 contacts, which connect the
N-finger in one protomer to domain II in the
other protomer, have a high abundance and a large scoring energy.
Similarly, the Val125/Val125 contact glues
domain IIA and IIB in all the simulations, while the hydrophobic
packaging of the Ala285 and Leu286 residues in
domains IIIA and IIIB also contributes to fix the highly mobile
C-terminal domains. This hydrophobic cluster
involving residues Ala285/Leu286 is a
particularly important interprotomer contact spot. The
Ala285···Leu286 average MD
distances are clearly shorter for the D/2Pep trajectory
(Table S6B), which indicates that the
long linker in domain III containing Ala285 and
Leu286 is more packed in the dimer in the presence of
the peptide substrate. This would explain the increase in the contact
area between the protomers along D/2Pep.Domain II in one protomer is also linked to domain III in the other
protomer thanks to the
Ser123···Arg298 and
Ser139···Gln299 H-bonds.
The Ser123···Arg298 contact is
mainly mediated by a water molecule during the simulations, and it is
clearly desestabilized along D/2Pep. In contrast, the
presence of the substrate contributes to stabilize the
Ser···Gln299 interactions (Table S6A) that connect the oxyanion loop in one
protomer to the C-terminal helix in the other
protomer.
Structure and Dynamics of Active Site in Unbound Form of Enzyme
The 3CLpro active site is located at a shallow crevice in the
I/II interdomain region. The nucleophilic Cys145 belongs to
the oxyanion loop, which is an S-shaped loop
(Gly138-Gly146) in domain II. The amidenitrogens of Cys145 and Gly143 define the
“oxyanion hole”, which binds the carbonyl group of the
scissile peptide bond in the substrates. Adjacent to the oxyanion
loop, a β-strand segment (His163-Pro168)
comprises other residues that play an important role in substrate
binding (e.g., His163, Glu166), which, in turn,
are close to a loop segment (Gly183-Ala193)
placed at the beginning of the domain II/III linker that contributes
to border the active site region.Besides Cys145, the catalytic dyad also includes
His41 that is supposed to act as a base during the
activation of the nucleophile. His41 belongs to a small
helix that is placed within a long and highly helical connection loop
in domain I. The positioning of His41 is assisted by a
network of H-bonds including a
His41···Asp187 interaction
mediated by a conserved water molecule and a salt bridge between the
nearby Arg40 guanidinium and the Asp187carboxylate at the domain II/III linker. The active site is completed
by several interdomain nonpolar interactions among Pro39,
Met49, and Leu27 in domain I and
His164 and Met165 in domain II that
constitute a hydrophobic pocket (site S2) for accommodating
the peptide substrate.To describe the structure and flexibility of the active site in the MD
simulations of the unbound configurations, we performed first
clustering calculations that yield the population and representative
structures, which allow us to visualize structural deviations with
respect to the X-ray geometries. A more detailed description is
provided by the statistical analysis of selected H-bond/vdW contacts
and the secondary structure analysis of the oxyanion loop. We also
measured the accessibility of the catalytic residues and binding sites
in terms of the average accessibility radii (Table
). The main results are displayed
in Figures –7,
while Tables S7 and S8 collect statistical
data on inter-residue contacts. Some details of the clustering
calculations and of the Sconform
calculations are given in Table S5 and Figure S3.
Table 1
Average Values (in Å) of Radii of Accessibility
(racc) at Different
Sites from the Last 1.5 μs of MD Simulationsa
6LU7
M
M/Pep
DA
DB
DA/2Pep
DB/2Pep
Cys145 (Sγ)
3.30
2.68
(1.16)
2.63
(1.21)
2.50
(1.28)
3.60
(0.88)
3.45
(1.03)
2.99
(1.16)
His41 (imidazol)
3.45
3.95
(1.24)
3.43
(0.78)
4.37
(0.99)
3.78
(0.84)
3.90
(0.67)
3.62
(0.71)
Cys145 (NH)
1.28
1.21
(0.44)
1.83
(0.43)
2.44
(0.60)
1.32
(0.28)
1.22
(0.31)
1.35
(0.36)
Gly143 (NH)
3.10
3.94
(1.46)
2.76
(0.79)
5.18
(1.38)
3.47
(0.96)
3.11
(0.82)
2.99
(0.83)
Glu166 (NH)
3.35
2.16
(1.17)
3.48
(0.71)
3.59
(0.76)
3.34
(0.83)
2.67
(0.62)
2.86
(0.73)
Glu166 (C=O)
4.03
4.51
(1.08)
5.59
(0.60)
5.55
(0.74)
5.30
(0.77)
5.07
(0.67)
5.00
(0.84)
His164 (C=O)
3.28
2.77
(0.67)
3.87
(0.65)
3.24
(0.77)
3.31
(0.80)
4.20
(0.49)
3.92
(0.57)
His163 (imidazol)
2.28
1.63
(0.68)
2.55
(0.31)
2.02
(0.56)
2.26
(0.33)
2.37
(0.17)
2.38
(0.18)
Met49 (side chain)
3.45
3.82
(1.14)
5.67
(0.52)
5.94
(0.26)
5.74
(0.61)
5.89
(0.34)
5.57
(0.77)
His164 (side chain)
0.68
1.42
(0.52)
1.43
(0.28)
1.26
(0.72)
1.09
(0.21)
1.33
(0.16)
1.33
(0.21)
Met165 (side chain)
3.35
3.51
(0.83)
3.69
(0.97)
4.12
(1.11)
3.83
(0.83)
4.52
(0.57)
3.96
(0.89)
Leu167 (side chain)
1.90
1.98
(0.50)
2.38
(0.47)
1.74
(0.56)
2.14
(0.59)
2.47
(0.37)
2.41
(0.42)
Standard deviations are given in parentheses. The
racc data for the
X-ray structure are also included.
Figure 4
(a, b) View of the active site in the M
simulation as shown by the superposition of the two most
important cluster onto the X-ray structure (shown in
lighter colors). (c) Superposition of top five cluster
representatives (coil thickness and color intensity are
proportional to cluster abundance). (d) Electrostatic
potential mapped onto the surface of the two most
populated clusters.
Figure 7
(a) View of the active site as shown by the superposition of
the most important D/2Pep cluster
representative on the X-ray structure (shown in light
colors). (b) Closer view of the active site region showing
the peptide substrate and the catalytic dyad. (c)
Superposition of the cluster representatives.
(a, b) View of the active site in the M
simulation as shown by the superposition of the two most
important cluster onto the X-ray structure (shown in
lighter colors). (c) Superposition of top five cluster
representatives (coil thickness and color intensity are
proportional to cluster abundance). (d) Electrostatic
potential mapped onto the surface of the two most
populated clusters.(a) View of the active site region in each protomer (A, B)
along the D simulation as shown by the
superposition of the top one clusters onto the X-ray
structure (shown in lighter colors). (b) Superposition of
the cluster representatives (coil thickness and color
intensity are proportional to cluster abundance). (c)
Electrostatic potential mapped onto the solvent-accessible
protein surfaces.Standard deviations are given in parentheses. The
racc data for the
X-ray structure are also included.(a) View of the active site as shown by the superposition of
the most important M/Pep cluster
representative on the X-ray structure (shown in light
colors). (b) Closer view of the active site region showing
the peptide substrate and the catalytic dyad. (c)
Superposition of the cluster representatives. (d)
Electrostatic potential mapped onto the solvent-accessible
protein surfaces.(a) View of the active site as shown by the superposition of
the most important D/2Pep cluster
representative on the X-ray structure (shown in light
colors). (b) Closer view of the active site region showing
the peptide substrate and the catalytic dyad. (c)
Superposition of the cluster representatives.The active site region remains quite accessible during the MD simulations
as expressed in terms of the mean racc
values collected in Table .
According to this structural index, which measures the size of the
largest spherical probes contacting a particular group of protein
atoms, the catalytic Cys145thiol group, the
His41imidazol, and the Gly143amide group
are not sterically blocked during the M/D simulations
with racc values above 2.7, 3.7, and 3.5
Å, respectively, while the backbone Cys145 position is
partially buried with smaller racc values
between 1.3 and 2.0 Å. The majority of the binding spots located
at the hydrophobic shallow pocket are well solvent exposed and have
racc values quite similar to those
in the X-ray structures with the exception of the Glu166
backbone and His163 side chain in the M
simulation (see below). Such large accessibility is observed again in
the molecular surface drawings (Figure ), which in addition reveal that a
local concentration of negative electrostatic potential is another
feature of the active-site surface patch.Although the global form of the active region is comparable during the
M and D simulations, there are
significant differences concerning the amplitude of the fluctuations
and deviations with respect to the initial coordinates. For example,
clustering analysis considering both backbone and side chain atoms
results in 80 different clusters for the M trajectory,
the top two clusters having moderate abundances of 16% and 14%,
respectively. The same clustering settings yield 33/32 clusters for
protomers A/B in the dimer, with the top
D/D
clusters being more populated (33% for the first cluster and 14/17%
for the second cluster; Table S5). When the average-linkage clustering is
based on the backbone RMSD values, there are fewer clusters with
higher abundances as expected, but the flexibility measurement is
somehow modified. Thus, the monomeric M simulation still
has many clusters (25), with the two most populated ones accounting
for 39% and 16% of the analyzed MD frames. For each protomer in the
D simulation, the backbone-only clustering results
in seven (D) and four
(D) clusters, but the
population distribution is asymmetrical given that the top two
D clusters have 37% and
33%, whereas the backbone D is quite rigid,
with the top one cluster having 89% abundance. In terms of the
T-weighed conformational entropy values
(Figure S3), the flexibility of the backbone dihedral
angles in the vicinity of the active site amounts to −11,
−13, and −7 kcal/mol for the M,
D, and
D catalytic sites,
respectively, which seem in consonance with the backbone clustering
analysis. Therefore, it is clear that the active site in the monomeric
state exhibits a significant flexibility both in terms of backbone and
side chain atoms. However, the active site region may also retain some
plasticity in the dimer state as shown in the
D active site.Inspection of the cluster representatives confirms that the catalytic
site in the isolated monomer (Figure c) is more ductile than that in protomers A and B of
the dimer (Figure b).
Moreover, a close examination of the top two M clusters
(Figure ) reveals that
the oxyanion loop, the domain II/III linker segment, and the
interhelical connecting loop in domain I tend all to be distorted with
regard to the X-ray structure. For the oxyanion loop, a short
310 helix conformation is detected in residues
Asn142-Ser144, which, in turn, is
associated with the placement of the Asn142 side chain over
the shallow S1 subsite, blocking access to
the Glu166 NH and CO groups. However, the time evolution of
the racc indexes indicates that the
S1 “collapse” in the
M simulation is reversible (Figure S5) given that access to the
Glu166@NH and His163 sites fluctuates on
the ns time scale between blocked phases with
racc around ∼1.0 Å and
solvent-exposed phases with racc above 2
Å. The racc plots are indeed
correlated with the placement of the Asn142 side chain and
the 310 helical distortion at the oxyanion loop. According
to secondary structure analyses using the DSSP method, the central
residues Asn142-Gly143-Ser144 show
310 helical conformation in ∼43% of the
analyzed snapshots, with the backbone chain interchanging between the
helical and turn/coil conformations along the M
trajectory (Figure S6), which is in consonance with the
structural flexibility of the oxyanion loop in this state.
Figure 5
(a) View of the active site region in each protomer (A, B)
along the D simulation as shown by the
superposition of the top one clusters onto the X-ray
structure (shown in lighter colors). (b) Superposition of
the cluster representatives (coil thickness and color
intensity are proportional to cluster abundance). (c)
Electrostatic potential mapped onto the solvent-accessible
protein surfaces.
The conformation of the oxyanion loop during the D
simulation deserves also particular attention due to its role in
substrate binding and catalysis. A dissimilar behavior was observed
between the two protomers so that the
D active site
maintains an overall conformation that is closer to the
crystallographic structure and clearly more rigid than that of the
other protomer D (Figure ). The major
difference arises in the oxyanion loop because it adopts a
β-strand conformation around the Phe140 residue
during the central part of the D simulation (protomer A;
Figure S6) without compromising the accessibility to
the important binding sites. Therefore, the secondary structure
analysis further confirms the larger flexibility of the oxyanion loop
in D and reveals its complex
conformational properties, which oscillate between flexible and
quasi-static states in the absence of substrate molecules. However,
the implications (if any) of this behavior for substrate binding are
not clear.The simulations allow us to establish unambiguously the relationship
between the oxyanion loop dynamics and specific inter-residue
contacts. Among such contacts, the catalytic Cys145 forms a
persistent (94%–98%)
Cys145@CO···Asn28@NδH
interaction in all the simulations except
D, in which such
backbone···side chain contact is less abundant (54%).
Thus, the Asn28···Cys145
interaction is involved in the orientation and flexibility of the
C-terminal portion of the oxyanion loop
(Asn28 seems also important because of its additional
contacts with Cys117 and Gly120 at the domain
I/II interface). To analyze the contacts of the Cys145thiol group, we selected geometric criteria
(Sγ···:X distance < 4.0 Å and
90 < SγH···:X angle < 180°) that take
into account the larger size and more diffuse electron cloud of the
sulfur atom.[42] The most abundant interaction of
Cys145@SγH occurs with the backbone carbonyl
group of His164, especially in the monomeric state (84% for
M, 81% for protomer A and 65% for B in
D). When looking at the central region of the
oxyanion loop, its conformation is mainly held by means of a
Ser144@OγH···Leu141@O
H-bond so that its rupture (e.g., in
D) leads to a different
loop arrangement. Considering the first part of the oxyanion loop, it
turns out that the Phe140 side chain largely determines its
conformation. This group gives several nonpolar contacts with
Tyr126, His163, His172, and
Val114 both in the X-ray structures and in the
majority of the simulations. However, the destabilization of this
hydrophobic clustering in M and
D, particularly of
the
Phe140(phenyl)···His163(imidazol)
π–π stacking, would trigger the Phe140
rearrangement that results in the conformational change of the whole
oxyanion loop.The network of H-bond contacts around the catalytic His41
residue is also of particular interest. The simulations confirm the
presence of water-mediated
His41@NH···Wat···Asp187@Oδ
and/or
His41@NδH···Wat···Asp187@Oδ
contacts. Furthermore, the same water molecule connects the
His41 and His164 side chains
(His41@NδH···Wat···His164@Nδσ).
We note, however, that the bridging water molecule exchanges
frequently with bulk solvent on the ns time scale without disrupting
its structural role. The His41 side chain also forms
nonpolar weak contacts with Leu27, Pro39,
Met49, and His164 (Tables S7B and S8B). The salt bridge
Arg40···Asp187 interaction
is also observed in all the simulations and contributes to maintain in
the right place the short helix containing His41 and the
first part of the II/III connecting loop (residues
189–191).
Substrate Binding
As previously mentioned, the binding of the peptide substrate within the
active site exerts a certain rigidifying effect in the interdomain
dynamics, with the global structure of the enzyme becoming more
compact. Peptide binding can also induce specific effects on the
oxyanion loop conformation with respect to the unbound form of the
enzyme. Thus, the structural analysis, clustering calculations, and
secondary structure assignments point out clearly that the oxyanion
loop conformation is nearly frozen in the presence of the substrate
and remains relatively close to the X-ray conformation. Hence, the MD
simulations suggest that a stable positioning of the oxyanion loop
partially induced by the substrate would be required for optimal
binding and catalysis.Concerning the enzyme–peptide binding determinants, we note first
that the short peptide
Ace-Ala(P)-Val(P)-Leu(P)-Gln(P)∼Ser(P1′)-Nme
aligns antiparallel to the terminal part of the long II β-strand
(Scheme and Figures and 7). The analysis of the M/Pep and
D/2Pep simulations shows that the alignment occurs
from P4 to P2
thanks to two highly stable (i.e., 90%–100%) H-bond contacts
involving the backbone groups of Glu166 in the
β-strand and that of the substrate
Val(P3) residue. At the
N-terminal end of the peptide, some H-bonds
also occur with residue Gln192 in the II–III
connection loop, although these contacts are less abundant
(∼60%). The cleavage sites selected by the SARS-CoV-2 main
proteinase includes Leu, Phe, Val, or Met at the
P site, all of them well
suited for being accommodated at the hydrophobic
S subsite of
the enzyme. According to the calculated dispersion energy scorings,
the Leu(P2) side chain placed at
S2 mainly interacts with the side
chains of His41, Met49, and Tyr54 in
domain I and Met165 in domain II (Table S9).
Scheme 2
Schematic Representation of Enzyme–Substrate
Interactions
Average values of heavy-atom separation (Å) and %
of abundances are indicated for selected contacts.
Some abundances are segregated into protomer A and B
(in italics).
Figure 6
(a) View of the active site as shown by the superposition of
the most important M/Pep cluster
representative on the X-ray structure (shown in light
colors). (b) Closer view of the active site region showing
the peptide substrate and the catalytic dyad. (c)
Superposition of the cluster representatives. (d)
Electrostatic potential mapped onto the solvent-accessible
protein surfaces.
Schematic Representation of Enzyme–Substrate
Interactions
Average values of heavy-atom separation (Å) and %
of abundances are indicated for selected contacts.
Some abundances are segregated into protomer A and B
(in italics).Coronavirus main protease invariably recognizes peptide cleavage sites
within polyproteins with a strictly conserved glutamine residue in
P1. Taking into account that the
hydrolysis reaction catalyzed by this enzyme occurs at the
P1–P1′
amide bond, it is clear that the Gln(P1)
residue should be correctly placed within the active site. According
to our simulations, the backbone amido group of
Gln(P1) interacts with the backbone
carbonyl group of His164 (60% for M/Pep and
∼96% for D/2Pep), whereas the
Gln(P1) backbone carbonyl group is
placed within the oxyanion hole defined by the amido groups of
Gly143 and Cys145. At this point, we note
a remarkable difference between the M/Pep and
D/2Pep simulations. For the monomer, both
Gly143@NH···Gln(P1)@O and
Cys145@NH···Gln(P1)@O
H-bonds present a low abundance (i.e., <50%), which suggests that
the scissile peptide bond is not well positioned within the active
site. In contrast, the
Gly143@NH···Gln(P1)@O
H-bond is well maintained during the whole D/2Pep
trajectory in the active site of protomers A and B. Some differences
also arise at other contacts formed by the
P1 side chain that build up the
specificity of the S1 site. Our
simulations confirm that residues Phe140 and
His163 are important in defining the
S1 subsite via
His163@NεH···Gln(P1)@Oε
and
Phe140@O···Gln(P1)@NεH
H-bonds. However, these contacts are clearly more abundant in the two
active sites of the dimer than in the monomer state, pointing out that
the N-finger of the second protomer contributes to
organize the S1 subsite (Scheme and Table S9). Glu166 also participates in
the binding of the Gln(P1) side chain but,
according to the M/Pep and D/2Pep
simulations, the
Glu166@Oε···Gln(P1)@NεH
H-bond contact is preferentially mediated by a solvent molecule.In the most likely hydrolysis mechanism assisted by the coronavirus main
protease, the reactive events would involve the proton transfer from
Cys145@SγH to His41@Nε and the
nucleophilic attack of the Sγ atom to the carbonyl group of
Gln(P1). In this respect, the
simulations show that, compared to the native form, the
Cys145 side chain shifts to preferentially interact
with the His41 side chain when the peptide substrate is
bound within the active site. Thus, a catalytically relevant
Cys145@SγH···Nε@His41
contact is present along the D/2Pep simulation (80% for A
and 81% for B), although it is less abundant for M/Pep
(49%). The H-bond network involving His41,
His164, and Asp187 and the structural
water molecule is well preserved in the presence of the peptide
substrate, with the water exchange with bulk solvent being reduced to
just one (M/Pep and
D/Pep) or no
(D/Pep)
events. Hence, the relocation of the Cys145thiol group and
the stable positioning of His41 result in an average
Sγ···Nε distance and average
SγH···Nε angle of 3.9 ± 0.5 Å and
95 ± 43° for M/Pep and 3.5 ± 0.3 Å
and 125 ± 39° for D/Pep (equal values for
protomers A and B). These values suggest that the homodimer state
favors geometrically the activation of the nucleophile by
His41. We also measured the average
Cys145@Sγ···C@Gln(P1)
distance/Sγ···C···O angle, which
have values of 5.1 ± 0.8 Å/84 ± 32° in
M/pep and 3.7 ± 0.6 Å/83 ± 13°
in D/2Pep for protomer A and 4.0 ± 0.7 Å/82
± 11° in D/2Pep for protomer B. In this case,
the shorter
Cys145@Sγ···C@Gln(P)
distances in the D/2Pep simulation suggest again that the
dimer state exhibits an enzyme–substrate orientation favorable
for catalysis.
Discussion
Most of the crystallographic and biochemical studies reported to date about the
3CLpro proteases have been devoted to the SARS-CoV enzyme.
Although their major conclusions are reasonably expected to be valid for the
SARS-CoV-2 protease due to their high degree of homology, there is, of
course, a growing research activity aimed specifically at SARS-CoV-2.
Nevertheless, detailed molecular descriptions of the structure and dynamics
of the native 3CLpro enzyme in aqueous solution and of its
binding determinants are still lacking, which is what prompted us (and other
researchers) to computationally examine the SARS-CoV-2 protein by means of
extensive MD simulations. In this scenario, our results may complement
currently available structural data given that several protein
configurations were simulated and subject to comparative analysis in order
to clarify some questions regarding the monomer structure, the dimer
stability, and so on. It must be noted, however, that the significance of
the present 2.0 μs MD simulations will depend on their critical
comparison with results produced by other theoretical studies.With regard to the tertiary structure of the monomeric 3CLpro
protein, a remarkable observation is the domain III rearrangement occurring
during the M simulation. Thus, the monomer enzyme can adopt
alternative conformations in aqueous solution. Interestingly, the
crystallographic structures of three mutants of the SARS-CoV protein (G11A,
S139A, and R298A) reveal a similar domain II/domain III poses.[13] These mutations cause the dimer dissociation by
disrupting critical contacts at the protomer interface, and thereby, the
rotation of domain III has been considered a structural feature of the
monomeric enzyme. In the most abundant domain III orientation observed in
the M simulation, the alignment of domains II and III is
stabilized by a few salt bridges that also involve N-finger
residues (Arg4···Asp216,
Lys5···Glu290, and
Arg131···Asp289). In the SARS-CoV
mutants (2PWX
structure),[13] the
Lys5···Glu290 salt bridge is
again observed, while domain III is slightly rotated and establishes
different salt bridges (Arg131···Glu240
and Lys137···Asp289). The simulations
also gave some clues about the domain III rotation event. At the beginning
of the monomer MD simulation, the N-finger, which lacks the
Arg4···Glu290* contact
characteristic of the dimer state, is anchored at the last α-helix of
domain III both by polar
(Arg4···Gln299) and nonpolar
(Phe3···Phe291) interactions. It
turned out that the weakening of the π–π contact preceded
the rupture of the Arg4···Gln299
contact, loosening then the N-finger hold onto domain III
and enabling the torsional changes at the hinge Gly195 residue
for the wide rotation to occur. Hence, it seems reasonable to consider that
the N-finger structural instability induces the domain III
conformational change. We also note in passing that the domain III
rearrangement may be especially unfavorable for the recognition and
organization of the long polyprotein sequences processed in
vivo by the main SARS-CoV proteases.The MD simulations probe possible effects associated with the binding of the
relatively small pentapeptide substrate. In general, we found that the
amplitude of intradomain and interdomain motions as well as the global
flexibility are reduced in the presence of the substrate as measured in
terms of various descriptors (RMSDs, molecular surface, cluster
representatives, etc.). The dampening of the domain I and II mobility is
well understood because the peptide molecule, which remains perfectly bound
to the catalytic site, further interconnects domains I and II by forming a
stable network of H-bonds and nonpolar contacts. Similarly, the direct
influence of the Gln(P1) residue on the oxyanion
loop conformation is equally clear. However, the allosteric mechanism(s)
through which substrate binding can affect the conformation of distal enzyme
regions and/or enhance dimerization are not obvious. This is apparently the
case for the outer wall of a channel[43] passing through
the central region of the dimer constituted by the side chains of
Ala285 and Leu286 from each of the protomers.
The inter-residue contact analysis and the surface calculations indicate
that this hydrophobic cluster is significantly more compact in the
D/2Pep simulation than in D, which is likely
connected with the lower mobility of domain III upon substrate binding. On
the other hand, the fact that the domain III rotation did not occur during
the monomeric M/Pep simulation confirms again the rigidifying
role played by the peptide molecule. More specifically, the residues of the
domain II/III linker loop that border the catalytic site
(-Gln189-Thr190-Ala191-Gln192-)
become less flexible in the presence of the peptide molecule, which, in
turn, may hamper the torsional change at the nearby Gly195 acting
as the hinge residue for the domain III rearrangement. Therefore, in this
way, substrate binding to monomeric proteases could favor the interdomain
orientation that is adequate for dimerization, helping thus clarify the
cooperative effects between substrate binding and dimerization
experimentally observed[16] in the monomeric mutants of
SARS-CoV.Clearly, the small impact on dimer stability and catalysis due to the
truncation of residues 1–3 in the N-finger of
SARS-CoV[9] agrees nicely with our MD simulations in
aqueous solution, which reveal how the contacts involving the
N-terminal Ser1 in the starting
crystallographic structures are largely weakened by water molecules (e.g.,
Ser1···Glu166 salt bridge and
Ser1···Phe140 backbone contact).
Hence, we propose that Ser1 plays only a secondary role in
controlling the conformation of the oxyanion loop, which contrasts with the
critically important Arg4 residue. Experimentally, residue
deletion up to Arg4 results in a monomeric SARS-CoV protein with
null or very weak enzymatic activity.[9] In fact, our
simulations emphasize that Arg4 has a two-fold role. On one hand,
it provides the Arg4···Glu290* salt
bridge between the two protomers and helps fix the position of domain III in
the same protomer through a double H-bond between the
N-finger backbone and the Gln299 side chain. On
the other hand, its guanidinium group contributes to clutch the oxyanion
loop by H-bonding the Lys137* amide group in the other protomer.
However, in the native form of the enzyme, the latter interaction and the
Ser139···Gln299* contact with the
C-terminal helix may be perturbed as seen in the D simulation,
conferring some flexibility to the oxyanion loop. Other residues located in
the N-coil also contribute to dimer stability, with the
Gly11@NH····Oε@Glu14*
contacts being especially important in terms of their abundance (100%) and
steady interatomic distance (2.9 Å) all along the dimer simulations,
which is in consonance with the inability of the Gly11Ala
SARS-CoV mutant to dimerize.[13]As mentioned in the Introduction, the absence of
catalytic activity in the monomeric SARS-CoV mutants has been ascribed to a
presumably collapsed form of the active site, in which a partially
310 helical oxyanion loop would impede access to the
binding sites. To our knowledge, such collapsed conformation of residues
Ser139-Phe140-Leu141 has been
observed both in the monomeric X-ray structures of the SARS-CoV enzyme and
in some of its dimer states. In our simulations, the 310 helical
twist occurs exclusively during the monomeric M simulation at
the Asn142-Ser144 residues adjacent to
Cys145. Furthermore, this transition turns out to be
reversible (twist and untwist events are observed)
and only reduces the accessibility to the Glu166@NH group and
His163 side chain. Therefore, the simulations suggest that,
in the liquid phase, the active site of the monomeric state would be
transiently, but not permanently, in the partially collapsed conformation.
The MD analysis also shows that the S1
structural fluctuations associated with the oxyanion loop motions would be
the direct consequence of the missing contacts with the
N-finger of the protomer. Concerning the dynamics of the
dimer unbound state, we find that all the binding sites remain perfectly
solvent exposed, and the active site regions tend to be more rigid than in
the monomer form. The terminal residues (1–3) of the
N-finger still have some conformational freedom, and
the oxyanion loop in D shows
flexibility around the Phe140 position. Hence, we propose that
the S1 site of the 3CLpro enzymes may
retain some plasticity in the dimer form, which would be more accentuated in
the monomer state reducing, but not abolishing, its peptide binding ability.
On the contrary, we expect that the monomer SARS-CoV-2 protein would have a
significant affinity for binding the examined peptide as indicated by some
highly stable enzyme–substrate contacts along the M/Pep
MD trajectory. Therefore, although the M/Pep complex may be not
optimal for catalysis (see below) and the M active site
exhibits varying oxyanion loop conformations, our simulations give no
support to the hypothesis of a collapsed active site to explain the
noncatalytic behavior of the monomer state. In fact, they seem more
compatible with the report of calorimetric Kd
values for the enzyme–peptide complexes involving monomeric mutants
of SARS-CoV that are close to that of the wild-type dimer.[16]The present MD simulations allow us to investigate the noncovalent binding
between the SARS-CoV-2 enzyme and a model peptide,
Ace-Ala-Val-Leu-Gln∼Ser-Nme, reproducing the sequence of the first
cleavage site catalyzed by 3CLpro at the polyproteins. The
construction of the initial complex was feasible thanks to the
crystallographic adduct (6LU7) between SARS-CoV-2 and a peptido-mimetic inhibitor.
The remarkable stability of the enzyme–peptide mode of binding all
along the (unconstrained) simulations confirms that the active site is
readily adapted to accommodate this sequence. As above-discussed, the
presence of the peptide molecule influences both globally and locally the
structure and dynamics of the systems. In this respect, the alignment of the
substrate chain along the binding crevice, together with the accommodation
of Gln(P1) and
Leu(P2) in the
S1 and S2
pockets, firmly locks the scissile peptide bond into the
Cys145/Gly143 oxyanion hole. Furthermore, we
found that substrate is essential for the side chain of the nucleophilic
Cys145 to adopt a suitable orientation for proton transfer
toward the His41imidazol and nucleophilic attack toward the
Gln(P1) carbonyl group. Therefore, we note
that the catalytic competency of the enzyme should not be judged on the
basis of the Cys145···His41 contacts in
the native form of the enzyme. The average interatomic distances and angles
relating Cys145@Sγ, Gln(P1)@C,
and His41@Nε corroborate the nearly prereactive character
of the Michaelis complexes in the D/2Pep simulation (e.g.,
Sγ···C ∼ 3.7–4.0 Å;
Sγ···Nε ∼ 3.5 Å). For the monomeric
Michaelis complex, however, the measured prereactive contacts are much less
favorable (e.g., Sγ···C ∼ 5.1 Å;
Sγ···Nε ∼ 3.9 Å).Finally, the question arises about why the monomeric SARS-CoV-2 and SARS-CoV
proteases have minimal or null activity against short peptide molecules. In
this respect, our results reveal significant differences between the
M/Pep and D/2Pep complexes that would
undoubtedly favor the catalytic efficiency in the dimer state. On one hand,
the larger flexibility of the monomeric active site could result in a
certain entropic penalty as peptide binding would stifle the oxyanion loop
and the domain II/III linker residues delimiting the binding sites. On the
other hand, the relative position between the nucleophile and the
Ser(P1′)–Gln(P1)
peptide bond seems not adequate for catalysis in the monomeric Michaelis
complex. This is most likely due to the above-discussed changes in the
S1 site regarding the oxyanion loop
position and the lack of the N-finger residues. In
addition, other possible effects could positively affect catalysis in the
dimer complexes. For example, the proximity of the whole domain III of the
other protomer to the active site region could well provide a favorable
environment for the electrostatic stabilization of transition states and
reactive intermediates. More conclusive evidence about the catalytic
impairment of the monomeric proteases could be gained by means of further
computational work aimed to determine the full catalytic pathway and free
energy profiles.
Authors: Peter Eugene Jones; Carolina Pérez-Segura; Alexander J Bryer; Juan R Perilla; Jodi A Hadden-Perilla Journal: Curr Opin Virol Date: 2021-08-28 Impact factor: 7.121