Literature DB >> 34192089

Unraveling the SARS-CoV-2 Main Protease Mechanism Using Multiscale Methods.

Carlos A Ramos-Guzmán¹, J Javier Ruiz-Pernía¹, Iñaki Tuñón¹.

Abstract

We present a detailed theoretical analysis of the reaction mechanism of proteolysis catalyzed by the main protease of SARS-CoV-2. Using multiscale simulation methods, we have characterized the interactions established by a peptidic substrate in the active site, and then we have explored the free energy landscape associated with the acylation and deacylation steps of the proteolysis reaction, characterizing the transition states of the process. Our mechanistic proposals can explain most of the experimental observations made on the highly similar ortholog protease of SARS-CoV. We point to some key interactions that may facilitate the acylation process and thus can be crucial in the design of more specific and efficient inhibitors of the main protease activity. In particular, from our results, the P1' residue can be a key factor to improve the thermodynamics and kinetics of the inhibition process.

Entities: Chemical Disease Gene Species

Year: 2020 PMID： 34192089 PMCID： PMC7556163 DOI： 10.1021/acscatal.0c03420

Source DB: PubMed Journal: ACS Catal Impact factor: 13.084

Introduction

The recent outbreak of COVID-19, a pneumonia-like illness caused by a coronavirus named SARS-CoV-2,[1] has rapidly evolved into a pandemic as recognized by the World Health Organization. SARS-CoV-2 has been shown to be highly contagious, causing a large number of infections around the world. The absence of vaccines and specific treatments has contributed to a rapid spread of this disease and a fatal outcome in many cases. Furthermore, the existence of other similar viruses detected in animals opens the possibility of future similar diseases.[2,3] Thus, finding effective strategies for the identification of potential targets for new drugs to fight against SARS-CoV-2 and other CoV-like viruses is an urgent need. One of these strategies is based on the disruption of the activity of those enzymes that are crucial in the replication cycle of the virus using adequate compounds. In this sense, knowledge of the catalytic activity of the enzyme in atomistic detail is one of the more powerful tools for efficient and specific new drugs design. In particular, the characterization and analysis of the geometry and electronic properties of the reaction transition state (TS) can be used as a guide for the design of active site inhibitors. In this work we analyze the reaction mechanisms for the main protease of SARS-CoV-2, also referred to as 3C-like protease (3CLpro), using density functional theory (DFT)-based multiscale methods. This enzyme plays an essential role during the replication of the virus and has no closely related homologues in human beings, making it an attractive drug target.[4] The 3CLpro enzyme of SARS-CoV-2 exists as a functional homodimer with two active sites in charge of cleaving the translated polyproteins into individual fragments to be used by the coronavirus.[5] As with other cysteine proteases, each of the active sites contains a Cys-His catalytic dyad in charge of the hydrolysis of the peptide bond at specific sites of a polypeptide chain. Several structures of this protease have been already resolved by means of X-ray crystallography and deposited in the Protein Data Bank (PDB), including the free protease (PDB codes 6Y2E(2) and 6Y84(6)) and inhibitor-bound proteases (PDB codes 6LU7,[6]6Y2F,[7] and 6LZE(8)). The SARS-CoV-2 main protease has a structure that is virtually identical to the ortholog from SARS-CoV (96% identity). Even more, the main residues involved in catalysis, binding, and dimerization processes are fully conserved.[9] Consequently, these two ortholog enzymes display highly similar substrate preferences.[10] The substrate cleavage by the 3CLpro takes place between Gln at the P1 position and a Gly/Ala/Ser at the P1′ position (P and P′ identify the residues placed before and after the scissile bond, respectively), making the presence of Gln an essential requirement.[11] In principle, the reaction mechanism of cysteine proteases involves two basic steps (see Scheme ).[12] In the first step, acylation, the peptide bond is broken, releasing the P′ fragment of the peptidic substrate and forming an acyl–enzyme complex where the catalytic cysteine (Cys145 in the protease of SARS-CoV-2) is covalently bound to the carbon atom of the P1 residue of the target peptide. In a second step, deacylation, the acyl–enzyme is hydrolyzed, releasing the P fragment and recovering the enzymatic active site for another catalytic cycle. Covalent inhibitors of the protease activity form acyl–enzyme complexes that cannot be hydrolyzed and thus remain bonded into the active site.[6,13] In this work we take advantage of the similarities between the proteases of SARS-CoV and SARS-CoV-2 viruses and the existence of ligand-bound structures to build a structural model of a peptide substrate–enzyme Michaelis complex and to study the reaction mechanism using computational simulations.

Scheme 1

Reaction Mechanism in Cysteine Proteases

Methods

Classical Molecular Dynamics Simulations

The crystal structures with PDB codes 6Y2F(7) and 3AW0(14) were used as starting points to build the Michaelis complex. The former corresponds to the holoprotease of SARS-CoV-2, and the latter is the crystallographic structure of the SARS-CoV ortholog cocrystallized with the peptidic aldehyde inhibitor Ac-Ser-Ala-Val-Leu-His-aldehyde. To build the Michaelis complex corresponding to the 3CLpro protease of SARS-CoV-2, the two protein structures were aligned and then the cocrystallized ligand in 6Y2F was replaced with the crystallized ligand in 3AW0 in the two active sites of the homodimer (protomers A and B). The peptidomimetic Ac-Ser-Ala-Val-Leu-His-aldehyde inhibitor was elongated using the Maestro tool[15] until we built the substrate-like sequence Ac-Ser-Ala-Val-Leu-Gln-Ser-Gly-Phe-NMe. We placed the substrate in the two active sites, in agreement with the observation that the X-ray structures (6LZE, 6M0K, 6WNP, 6LU7, and 7BQY) contain inhibitor molecules in both of them. The absent hydrogen atoms were added using the Protein Preparation Wizard tool of Maestro, and PROPKA3.0[16] was used to calculate the protonation states of titratable residues at pH 7.4. The tleap tool from AmberTools18[17] was used to prepare the simulation systems. The Michaelis complexes, described with the ff14SB force field,[18] were solvated into a box with a buffer region of at least 12 Å from any protein/substrate atom to the limits of the simulation box. TIP3P water molecules[19] were used. Na+ atoms were added to neutralize the charge. The resulting system was minimized using 500 steps of the steepest descent method followed by the conjugate-gradient method, until the root-mean-square of the gradient was below 10–3 kcal mol–1 Å–1. The system was then heated from 0 to 300 K using a heating rate of 1.7 K·ps–1. The backbone heavy atoms were restrained in a Cartesian space using a harmonic potential with a force constant of 20 kcal mol–1 Å–2. Along the equilibration step, the positional restraint force constant was changed from 15 to 3 kcal mol–1 Å–2, decreasing by 3 units every 1.25 ns, and after 6.25 ns the positional restraints were removed and the systems continued their equilibration until 7.5 ns of NPT (300 K and 1 bar) simulation was completed. Then, 8 μs of NVT simulation at 300 K was performed with a 2 fs time step using SHAKE.[20] The particle mesh Ewald method was employed to describe the long-range electrostatic interactions;[21,22] for the short-range interactions, a cutoff of 10 Å was used. The pressure was controlled by a Berendsen barostat, and the temperature was controlled by a Langevin thermostat. For all the simulations, periodic boundary conditions were employed. The AMBER19 GPU version of pmemd[23,24] was used to run the classical molecular dynamic simulations. To sample a reasonable configurational space of the protein, two additional replicas of the Michaelis complex model with different initial velocities were run during 1 μs.

QM/MM Calculations

Exploration of the free energy surfaces associated with the peptide bond-breaking process have been carried out using quantum mechanics/molecular mechanics (QM/MM) simulations. In most simulations (see Table S1 for details), the side chains of the catalytic dyad (Cys145 and His41) and a fragment of the peptide substrate of protomer A were included in the QM region, while the rest of the system was described at the MM level as explained earlier. The interaction of Asp187 with His41 is mediated by a water molecule, as observed in the X-ray structures of the inhibited enzyme (6LZE, 6M0K, 6WNP, 6LU7, and 7BQY) and the apo form (1UJ1). The presence of this water molecule in between His41 and Asp187 limits possible polarization and charge-transfer effects, and the latter residue was not included in the QM region. The part of the substrate described at the QM level includes the two residues involved in the peptide bond to be broken (Gln-P1 and Ser-P1′) and the previous and next peptide bonds up to the Cα atoms of Leu-P2 and Gly-P2′. In the exploration of the hydrolysis of the acyl–enzyme, we also included a water molecule in the QM region. To describe the QM subsystem, we used the B3LYP functional[25,26] and D3 dispersion corrections.[27] Calculations were performed with the 6-31+G* basis set unless otherwise indicated (we also employed the 6-31G* basis set). This level of theory has been shown to be one of the best combinations to describe enzymatic reactions.[28] In addition, this computational description for the QM region (B3LYPD3/6-31+G(d)) provides a gas-phase enthalpic change for the proton-transfer reaction between imidazole and methanethiol (the motifs of Cys and His side chains), in agreement with the experimentally derived value (see the Supporting Information). A systematic study for the cysteine–histidine proton transfer also pointed to the B3LYP functional as the most robust one to obtain an electronic description in agreement with higher-level methods.[29] Finally, a recent QM/MM analysis of the electronic properties of enzyme–substrate complexes in SARS-CoV-2 protease emphasizes the importance of choosing an adequate functional for a proper description of the process.[30] All calculations were run with a modified version of Amber18[17,31] coupled to Gaussian16[32] for DFT calculations. A cutoff radius of 15 Å was used for all QM/MM interactions, and the temperature was 300 K. To explore the free energy landscape associated with the chemical reaction, we used our implementation of the string method, the adaptive string method (ASM).[33] In this method N replicas of the system (the nodes of the string) are evolved according to the averaged forces and kept equidistant, converging in such a way to the minimum free energy path (MFEP) in a space of arbitrary dimensionality defined by the collective variables (CVs). Once the string has converged (see an example in Figure S1), we define a single path-CV (a collective variable called s) that measures the advance of the system along the MFEP. This path-CV is used as the reaction coordinate to trace the free energy profile associated with the chemical transformations under analysis. The MFEPs were explored on a free energy hypersurface defined by a set of CVs formed by those distances showing relevant changes during the process under study: H-Sγ(Cys145), H-Nε(His41), H-N(P1′), Sγ(Cys145)-C(P1), and C(P1)-N(P1′) for the acylation step (see Scheme a) and Hw1-Ow, Hw1-N(P1′), Ow-C(P1), Sγ(Cys145)-C(P1), Hw2-Ow, N(P1′)-Nε(His41), and Hw2-Sγ(Cys145) for the deacylation step (see Scheme b). Different initial guesses, corresponding to different mechanistic proposals, were explored at the B3LYPD3/MM level using first the 6-31G* basis set, and then the best candidates were recalculated using the 6-31+G* basis set. Each of the strings was composed of at least 96 nodes (see Table S1) that were propagated with a time step of 1 fs until the RMSD of the string fell below 0.1 amu1/2·Å for at least 2 ps. The converged MFEPs are then averaged to define the s path-CV corresponding to each string. The free energy profiles along the path-CVs were obtained using an umbrella sampling algorithm[34] running simulations for at least 10 ps and were integrated using the WHAM technique.[35] The values of the force constants employed to bias the ASM simulations were determined on-the-fly to ensure a probability density distribution of the reaction coordinate that was as homogeneous as possible.[33] Replica exchange between neighboring string nodes was attempted every 50 steps to improve convergence.

Scheme 2

Collective Variables Employed To Study the Acylation (a) and Deacylation (b) Reaction Mechanisms

For the proton transfer between Cys145 and His41, considering the proximity of the proton donor and acceptor atoms and the geometrical simplicity of the process, we traced the free energy profile using umbrella sampling[34] along a simple proton-transfer coordinate defined as the antisymmetric combination of the distances of the proton to the donor and the acceptor atoms (d(Nε-H)–d(Sγ-H)). For this profile only the side chains of the two involved residues were included in the QM region (using the B3LYPD3/6-31+G* level of theory). A total of 40 windows were used, corresponding to an increment of the reaction coordinate of 0.06 Å, with each of them composed of 10 ps of equilibration and 20 ps of data collection. The force constant employed to drive the reaction coordinate change was 600 kcal·mol–1·Å–2. All the rest of the details of the simulations were as described earlier. We also used umbrella sampling along distinguished coordinates to explore the free energy landscape associated with the separation between the first peptide fragment and the acyl–enzyme complex and between the two peptide fragments at the end of the deacylation process. Details of all the free energy simulations performed in this work are given in Table S1.

Results

Enzyme–Substrate Complex

The time evolutions of the root-mean-square deviation (RMSD) values for each replica (2 protomers and 2 substrates in each replica) are shown in Figure S2. These values show that the protein structure is well-equilibrated and there are no large differences with respect to the initial structures, as prepared from the X-ray data. The observations made on the three replicas (one of 8 μs and two of 1 μs) are very similar in all cases. The substrate-binding pocket is divided into a series of subsites (denoted as S and S′), each accommodating a single residue of the substrate placed before (P) or after (P′) the scissile peptide bond. The map of hydrogen-bond interactions observed during our molecular dynamics (MD) simulations of the Michaelis complex is given in Figure a, a general view of the substrate in the two active sites of the dimer formed by protomers A and B is shown in Figure b, and an insight into the active site of protomer A is provided in Figure c. The 3CLpro of SARS-CoV-2 (as is also the case of the SARS-CoV ortholog) presents a high specificity for Gln at the P1 position.[11,36] As seen in Figure a, the P1 residue is the one establishing more hydrogen-bond interactions with the enzyme. The O and N main-chain atoms of Gln-P1 are found to make hydrogen bonds with main-chain atoms of Gly143, Ser144, and His164 (unless indicated, all the residues of the protein belong to the same protomer A). The side chain of Gln-P1 is accommodated into the S1 subsite through hydrogen-bond contacts with the main-chain atoms of Phe140 and Leu141, with the Nε atom of His163 and the Oε atoms of Glu166. This last residue is in turn hydrogen-bonded to the terminal NH group of Ser1 from protomer B. In fact, the N-terminal fragment (N-finger) of protomer B plays an active role in preorganizing the active site of protomer A for catalysis.[37] Dimerization is an essential condition for catalysis in the protease of a related coronavirus,[36,38,39] and consequently, those mutants lacking the N-finger fragment are almost completely inactive.[40] The side chain of Leu-P2 is surrounded by the side chains of His41, Met49, His164, Met165, and Asp187, while the main-chain amide group is hydrogen-bonded to the Oε atom of Gln189. The side chain of Val-P3 is solvent-exposed, while main-chain N and O atoms are hydrogen-bonded to main-chain atoms of Glu166.

Figure 1

Results of the molecular dynamics simulation of 3CLpro of SARS-CoV-2 in complex with the substrate with sequence Ac-Ser-Ala-Val-Leu-Gln-Ser-Gly-Phe-NMe. (a) Fraction of hydrogen-bond contacts between the residues of the substrate and those of the protease found during the trajectory of the Michaelis complex. A hydrogen-bond contact is counted when the donor–acceptor distance is <3.8 Å and the hydrogen-bond angle is >120°. (b) General overview of the substrate–enzyme complex, showing the dimeric nature of the protease with two identical active sites occupied by the substrate. Note that the N-finger of each protomer is close to the active site of the neighbor protomer. (c) Insight into the binding pose of the peptide substrate into the active site, showing the most important active-site residues and the positions occupied by the P and P′ residues of the substrate. (d) Probability densities of the distances from the Cys145-Sγ atom to the carbonyl carbon atom of the substrate (C(P1)), in red, and to the Nε atom of His41, in blue. The bimodal distribution of Sγ-C(P1) distances correspond to the trans (shorter distances) and gauche (longer distances) conformations of Cys145. (e) Disposition of the substrate in the vicinity of the catalytic dyad when Cys145 is present in the trans conformation. Note the proximity between Sγ and C(P1) atoms and the orientation of the sulfhydryl proton toward the Nε atom of His41. The binding subsite for Ala-P4 is constituted of main-chain interactions with Gln189 and Thr190, while the side chain of this residue is placed between the side chains of Met165, Leu167, and Pro168. The S5 subsite is formed by the side chain of Pro168 and by main-chain atoms of Thr190. This description of the S1–S5 interaction subsites observed in our simulations agrees with the characteristics found in the X-ray structures of SARS-CoV-2 3CL protease with inhibitors bound in the active site.[6,8] Our MD simulations of the substrate–enzyme complex offer, in addition, a detailed description of the S′ subsites, those that accommodate the P′ residues placed after the scissile peptide bond. The main-chain O atom of Ser-P1′ establishes a hydrogen bond with the amide group of Gly143 and the side chain of Asn142. The hydroxyl group of the P1′ side chain can contact with the catalytic dyad (Cys145 and His41), while the CH2 group is packed between Thr25, Thr26, and Leu27. Gly-P2′ is stabilized through main-chain contacts with Thr25 and Thr26. Finally, the side chain of the Phe-P3′ residue is packed against the side chain of Thr24. This structural information can be useful in order to improve the binding and specificity of potential inhibitors of the protease activity because these structural findings are lost in the X-ray structures obtained from those inhibitors in which the fragment corresponding to P′ residues either is released during the formation of the acyl–enzyme complex[13] or is smaller than in our substrate.[6,8] We have also analyzed the substrate–enzyme interactions in protomer B (see Figure S3). The interactions averaged over the μs-time-scale simulations show only small differences between the two protomers, indicating that the poses found in the two active sites are consistent with the reported data.

Catalytic Dyad

The reaction mechanism of cysteine proteases involves the nucleophilic attack of the Sγ atom of a cysteine (Cys145 in our case) to the C(P1) atom of the peptide bond. Figure d shows the probability distribution of Sγ-C(P1) distances found during our MD simulations. The distribution is clearly bimodal, with two peaks centered at 3.4 and 4.7 Å. These two peaks correspond to two different conformations of the side chain of Cys145, which can be present in trans and gauche conformations. This is in agreement with the observations made on the X-ray structure of the ortholog protease of SARS-CoV.[37] The most probable conformation corresponds to the trans conformer in which the Sγ sulfur atom is closer to the substrate (see Figure e). In both conformations the catalytic dyad remains hydrogen-bonded, with the most probable distance between Cys145-Sγ and His41-Nε being about 3.3 Å (see Figure d). Interestingly, His41 is, in turn, hydrogen-bonded through a highly conserved crystallographic water molecule to Asp187. This interaction can raise the pKa of the histidine, increasing its ability to work as a base and abstract the proton from Cys145. Considering that the 3CL protease can be asymmetrical,[38] we also analyzed a 1 μs MD simulation of the dimer with only one of the protomers occupied by the substrate. Figure S4 shows that the distribution of Sγ-C(P1) distances is very similar to that obtained when the substrate is present in the two active sites. A small decrease in the population of the unproductive Cys145 gauche conformer is observed when only one protomer is occupied, which could contribute to the activity of the enzyme. In our simulations of the reaction mechanism, we always selected the Cys145 trans conformation as the initial structure. Taking into account the short distance observed between Cys145 and His41 and the possible activation of this last residue by the nearby Asp187, we explored the possibility to find the catalytic dyad forming an ion pair (Cys–/HisH+ or IP in Figure ) instead of the neutral form modeled in the Michaelis complex (CysH/His or MC in Figure ). We evaluated the free energy difference between these two forms of the dyad by means of free energy profiles associated with the proton-transfer coordinate from Cys145 to His41 obtained at the B3LYPD3/MM level (see Methods). The proton-transfer free energy profiles (Figure a) were obtained for the holo and apo forms of 3CLpro. According to Figure a the catalytic dyad is more stable in its neutral form, for both the apo and holo forms. The IPs are 2.9 and 4.8 kcal·mol–1 above the neutral form in the apo and holo enzymes, respectively. The anionic Cys145 can be stabilized by the presence of water molecules and by the hydroxyl group of Ser-P1′ (see Figure b). In an experimental analysis of the substrate specificity of the ortholog SARS-CoV protease, it was found that mutation of Ser-P1′ to another small residue (such as Gly or Ala) diminishes the rate constant by about 1 order of magnitude.[36] A similar effect on the rate constant was observed for SARS-CoV-2 protease when the leaving group was changed to p-nitroaniline.[41] These variations in the rate constant could be attributed, in part, to a stabilizing effect of the hydroxyl group of Ser-P1′ on the ion pair of about 1.4 kcal·mol–1. Note that the requirement of a small residue at the P1′ position by SARS-CoV and SARS-CoV-2 main proteases could be due in part to the fact that in these cases water molecules can access the active site, contributing also to the stabilization of the ion pair. Bulkier residues at the P1′ position could hinder the access of water molecules and thus destabilize the ionized form of the catalytic dyad. According to our free energy profiles shown in Figure a, the IP form is better stabilized with respect to the neutral dyad in the apo enzyme than in the holo one (by almost 2 kcal·mol–1), which can be related to the better solvation of the catalytic dyad in the former. In both the holo and apo enzymes, the free energy barrier associated with the transfer of the proton between His41 to Cys145 is small, revealing a fast equilibrium between the ion pair and neutral versions of the dyad, with the latter being the predominant form. The existence of a low-lying IP dyad is compatible with the experimental observations made in the kinetic characterization of the homologue 3CLpro of SARS-CoV, in which an ion pair mechanism for the proteolysis was proposed on the basis of the pH-inactivation profile with iodoacetamide and the analysis of solvent isotope effects.[41]

Figure 2

Analysis of the formation of an ion pair (IP) catalytic dyad (Cys145–···His41H+) from the neutral form (Cys145H····His41) found in the Michaelis complex (MC). (a) B3LYPD3/6-31+G*/MM free energy profile associated with the proton-transfer coordinate from the Sγ atom of Cys145 to the Nε atom of His41 (d(Nε-H)–d(Sγ-H)) in the apo (blue line) and holo (red line) enzymes. (b) Representation of the ion pair in the holo enzyme, showing those interactions that stabilize the charged states of the catalytic dyad, in particular the hydroxyl group of Ser(P1′). (c) Representation of the ion pair in the apo enzyme, showing the presence of water molecules that stabilize the charged catalytic dyad when the substrate is absent. The picture derived from our QM/MM free energy profiles contrasts with the results obtained from the exploration of the proton transfer potential energy surface in the SARS-CoV 3CLpro.[42] This study suggests a scenario where substrate binding could reduce the energy cost of forming the IP. However, the reported energy difference between the IP and neutral complexes was significantly higher than our free energy estimation, about 11 kcal·mol–1, and difficult to reconciliate with the observed rate constants (see below) and pKa determinations.[43]

Acylation Step

We explored the free energy landscape for the formation of the acyl–enzyme starting from the catalytic dyad IP at the B3LYPD3/MM level. The converged MFEP is shown in Figure a and b. According to our simulations, after IP formation the acylation proceeds by means of a proton transfer from His41 to the N(P1′) atom followed by the nucleophilic attack of Cys145-Sγ on the C(P1) atom and the simultaneous breaking of the C(P1)-N(P1′) peptide bond. These elementary events take place in a concerted but asynchronous way. The transition state (TS) found for this mechanism (see Figure c) is associated with the proton transfer from His41 to the amide nitrogen atom of the peptide bond N(P1′). At the TS the Sγ atom of Cys145 approaches the C(P1) atom, reducing the interatomic distance from 3.11 to 2.34 Å. This approach is accompanied by a moderate lengthening of the peptide bond (the C(P1)-N(P1′) distance being lengthened from 1.36 to 1.54 Å). According to the free energy path shown in Figure a, the total free energy barrier associated with the acylation process, including the free energy cost of the ion pair formation, is 14.6 kcal·mol–1. This value is compatible with the activation free energies derived from the steady-state rate constants measured at 25 °C for peptides cleaved at the Gln-Ser bond by the highly similar ortholog 3CLpro of SARS-CoV (between 16.2 and 17.2 kcal·mol–1).[41] It must be noticed that in this proteolysis the acylation step is not considered to be the rate-limiting one and then these experimental activation free energies provide an upper limit for the acylation barrier.[41]Figure c shows that this TS is stabilized by means of a hydrogen-bond interaction with the hydroxyl group of Ser-P1′, indicating, also in agreement with the experimental observations, that the leaving group plays an important role in catalysis. The TS stabilization provided by the hydroxyl group could also contribute to the larger rate constant observed for the scission of the Gln-Ser bond.[36] Remarkably, the proposed reaction mechanism is also in good agreement with the experimental proton inventory results, which indicate that there are two protons in flight during the acylation, one at the TS and another one at earlier stages.[41] In our picture these two proton-transfer events correspond to the proton transferred from His41 to the N(P1′) atom of the substrate at the TS and the proton transfer from Cys145 to His41 during IP formation. Regarding the acyl–enzyme product (Figure d), our free energy profile shows two possible conformations that differ in the presence of a water molecule that plays a key role during deacylation (see Figure S5). Finally, in our free energy profile the formation of the acyl–enzyme (see Figure d) is almost thermoneutral, with a reaction free energy of about −1 kcal·mol–1.

Figure 3

Simulation of the acylation reaction taking place through the formation of the ion pair. (a) B3LYPD3/6-31+G*/MM free energy profile along the path-CV for the acylation reaction after the formation of the ion pair (IP) from the Michaelis complex (MC). The reaction takes place with a single transition state (TS) that yields the acyl–enzyme (ACE), which can be present in two conformations. (b) Evolution of the distances selected as collective variables (CVs) along the minimum free energy path (MFEP). Sγ-H is in yellow, H-Nε is in red, C(P1)-N(P1′) (the scissile peptide bond) is in green, H-N(P1′) is in black, and Sγ-C(P1) is in blue. (c) Representation of the TS for the acylation process. This TS corresponds to the proton transfer from His41 in the ion pair catalytic dyad to the nitrogen atom of the peptide bond (N(P1′) with the approach of the Sγ atom of Cys145 to the carbonyl carbon atom (C(P1)) and the lengthening of the peptide bond. The values of the distances correspond to the coordinates of the MFEP at the TS, except the intramolecular distance between the hydroxyl and NH group of Ser(P1′) that has been averaged over the trajectory of the corresponding string node. (d) Representation of the acyl–enzyme complex formed between the enzyme and the P fragment of the peptide, with a water molecule hydrogen-bonded to the N-terminal group of the P′ fragment. The free energy profile shows two minima for the acyl–enzyme complex, differing in the distance between the P and P′ fragments, as shown in Figure S5. Very recently an interesting QM/MM study of the same protease with a different substrate (where the leaving group was not a peptide fragment but a fluorescent tag, 7-amino-4-carbamoylmethylcoumarin) has been reported.[44] In that work the proton transfer from Cys145 to His41 was found to be concomitant with the nucleophilic attack of the Sγ atom on the carbonyl carbon atom, forming a thiohemiketal intermediate, and the cleavage of the peptide bond takes place in a subsequent step assisted by the proton transfer from His41. The differences with respect to our results could be due to the use of a different substrate and/or the use of different theoretical descriptions (that work used a combination of semiempirical and DFT methods with the M06-2X functional). The description of the acylation step can be highly dependent on the selection of the QM level, as demonstrated in Figure S6 where we compare the free energy profiles obtained with different Hamiltonians and basis sets. In any case, that QM/MM study obtained an activation free energy of 19.9 kcal·mol–1, in excellent agreement (within 0.5 kcal·mol–1) with the value derived from the experimental rate constant obtained for that substrate.[10] Interestingly, this rate constant (0.050 s–1)[10] is significantly smaller than the value reported for the hydrolysis of the Gln-Ser bond by the ortholog enzyme of SARS-CoV (1.5–8.5 s–1).[41] The gap between these two experimental rate constant values could be due to differences in the preparation and purification of the enzyme or to genuine mechanistic differences between substrates in the main protease.[41,45] In this sense, as discussed earlier, the presence of a hydroxyl group at the P1′ position could play an important role in the acylation process, improving the binding and kinetics of a hypothetical inhibitor. To explore other possible mechanisms,[46] we also studied a reaction path that does not involve the formation of an ion pair, although the associated free energy barrier is considerably higher and incompatible with the experimental rate constant (see Figure S7).

Deacylation Step

Regarding the deacylation step, the standard mechanistic proposal suggested for related enzymes[47] assumes that, once the neutral P′-NH2 peptide fragment has left the active site, a water molecule activated by His41 attacks the C(P1)-Sγ bond, releasing the P-COOH peptide (also with a neutral terminal group) and, finally, regenerating the enzyme after a proton transfer from His41 to Cys145 (see Scheme ). In our simulations of the acylation product, we found that a water molecule can be placed in between the P′-NH2 leaving fragment and the acyl–enzyme complex, being correctly oriented to perform the hydrolysis of the acyl–enzyme (see Figures d and S5). This configuration suggests an alternative reaction mechanism that can yield the two peptide fragments with correct protonation states in the terminal groups and regenerate the enzymatic active site in its most stable state (the neutral catalytic dyad). An additional advantage of this novel mechanism is that it involves water activation by means of the N-terminus of the P′ fragment, which is known to be a better base than histidine side chains (the average pKa values are about 7.7 and 6.6, respectively).[48] Finally, the proposed mechanism can be at the origin of the mechanistic differences observed when the scissile bond is an amide instead of an ester (in that case a basic N-terminal group is not formed after the acylation).[41] We obtained the MFEP corresponding to such a mechanism at the B3LYPD3/MM level (see Figure ). This process is stepwise, presenting two TSs (see Figure a). The first TS (TS1 in Figure c) corresponds to the proton transfer from the water molecule to the N(P1′) atom, resulting in the formation of the P′-NH3+ peptide fragment. The free energy barrier associated with this step is 15.6 kcal·mol–1. This proton transfer is concomitant to the attack of the hydroxyl group on the C(P1) carbonyl carbon atom, resulting in the formation of an intermediate thiodiolate (Figure d). After rotation of the hydroxyl group to orient the proton toward the sulfur atom, the reaction proceeds with the cleavage of the C(P1)-Sγ bond. The second TS observed during the deacylation (TS2, Figure e) corresponds to the separation of the Sγ atom (with the C(P1)-Sγ distance being 2.67 Å). The free energy barrier associated with this second step from the acyl–enzyme complex is very close to the first one, 15.8 kcal·mol–1 (17.5 kcal·mol–1 from the most stable acyl–enzyme conformer; see Figure ). This value is in excellent agreement with the values derived from the reaction rate constants for the ortholog protease of SARS-CoV (from 16.2 to 17.2 kcal·mol–1).[41] Afterward, the leaving cysteine is stabilized by means of a proton transfer from the C-terminal group to the Sγ atom, regenerating the enzyme in its more stable protonation state (a neutral catalytic dyad) and yielding the P peptide fragment with a terminal unprotonated carboxylate (the product is represented in Figure f). The proposed mechanism, in which the general base is a N-terminal group, displays a smaller barrier than the standard mechanism in which His41 acts as the general base activating the water molecule, as expected from the relative pKa values (see Figure S8).

Figure 4

Figure 5

Proteolysis mechanism in 3CLpro of SARS-CoV-2 as deduced from our simulations. (a) Schematic representation of the reaction mechanism. (b) Free energy profile associated with the reaction mechanism.

Simulation of the deacylation reaction with the N-terminal group of the P′ fragment acting as a general base in charge of water activation. (a) B3LYPD3/6-31+G*/MM free energy profile along the path-CV for the deacylation reaction. The process takes place in two steps. In the first one the water activated by the N-terminal group attacks the acyl–enzyme complex (ACE) to form a thiodiolate intermediate (I) through the first TS (TS1). In the second one the reaction product (Pr) is obtained after breaking the acyl–enzyme bond in the second TS (TS2). (b) Evolution of the distances selected as collective variables along the minimum free energy path. Sγ-Hw2 is in green, Nε-N(P1′) is in black, Ow-C(P1) is in gray, N(P1′)-Hw1 is in red, Sγ-C(P1) is in yellow, Ow-Hw1 is in purple, and Ow-Hw2 is in blue. (c) Representation of TS1 where the water molecule is transferring a proton to the N-terminal group of the P′ fragment and the resulting hydroxyl anion attacks the carbonyl carbon atom of the P fragment (C(P)). The values of the distances correspond to the coordinates of the MFEP at TS1. (d) Representation of the thiodiolate intermediate (I). (e) Representation of TS2 corresponding to the breaking of the Sγ-C(P1) bond and the proton transfer from the carboxylic terminal group to the leaving sulfur atom. (f) Representation of the reaction products (Pr) with the P-COO– and P′-NH3+ peptide fragments in the active site of the protease. The deacylation step presents a free energy change of about −5 kcal·mol–1. This exergonic character can be increased with the release of the reaction products to the bulk. It is worth noticing that the peptide fragments obtained from this mechanistic proposal present a salt bridge between the charged C-terminal and N-terminal groups that must be broken during products release. The separation of the two terminal groups implies that water molecules must be placed, tightly bounded, between these two charged groups. Those configurations could contribute to an inverse solvent isotope effect observed under steady-state conditions only when the scissile bond is an amide and not an ester (this is only when a N-terminal group is available to act as the general base).[41]

Conclusions

We have presented a detailed analysis of the Michaelis complex and the proteolysis mechanism in the 3CLpro of SARS-CoV-2 using DFT/MM computational simulations. Our study has identified key interactions established between the protein and a peptide substrate and the detailed reaction mechanism. The complete reaction cycle is shown in Figure a, while the associated free energy profile is given in Figure b. The reaction process involves the formation of a catalytic dyad ion pair from which the acylation step can proceed (see Video 1 in the Supporting Information). In this acylation, the TS involves the proton transfer from His41 to the nitrogen atom followed by the nucleophilic displacement of the peptide bond by the Sγ atom of Cys145. This finding can be relevant for the design of inhibitors with improved kinetics, stabilizing this proton transfer through, for example, an adequate choice of the P1′ group. For the deacylation step (see Video 2 in the Supporting Information), we have proposed a completely novel mechanism where the N-terminal group of the first peptide fragment acts as the general base catalyzing the hydrolysis of the acyl–enzyme complex. Proteolysis mechanism in 3CLpro of SARS-CoV-2 as deduced from our simulations. (a) Schematic representation of the reaction mechanism. (b) Free energy profile associated with the reaction mechanism. The free energy profile associated with the proposed reaction mechanism is shown in Figure b. According to this profile, the acylation step is almost thermoneutral, while the deacylation is more exergonic, in agreement with the observed irreversibility of the process. Notably, our mechanistic proposal for the deacylation step can explain the experimental differences observed between the hydrolysis of amide and ester substrates in the highly similar protease of SARS-CoV, which had remained unexplained until now. The predicted global free energy barrier, 17.5 kcal·mol–1, is in excellent agreement with the values derived from the observed rate constants for the hydrolysis of the Gln-Ser peptide bond in the highly similar SARS-CoV protease (between 16.2 and 17.2 kcal·mol–1).[41] Note that the rate constants for the hydrolysis of different substrates with a fluorescent tag as the leaving group have been also measured for the SARS-CoV-2 protease.[10] The observed rate constants were smaller than those measured for the Gln-Ser hydrolysis in the SARS-CoV protease,[41] and the activation free energies derived from these rate constants measured for the SARS-CoV-2 protease, between 18.6 and 19.4 kcal·mol–1, are somewhat larger than our calculated value. However, because of the differences between experimental procedures, it is difficult to assess if the observed disparity among rate constants in the two proteases is due to intrinsic properties of the two enzymes (which is unlikely due to the identity between active sites) or, more likely, due to differences in the preparation of the enzyme (which can affect the measured rate constants) or to differences in the substrate (which can change the rate constant by 1 order of magnitude or more). We thus preferred to use as a reference for our study the detailed kinetic characterization made on the SARS-CoV protease with the same substrate as that employed in our simulations.[41] Our study can also be useful to assist in the design of new and more potent inhibitors of SARS-CoV-2 main protease. Apart from the well-known interactions established by the P1–P4 groups in the active site, our MD simulations of the enzyme–substrate complex stress the role of the interactions established by the P1′ and P2′ groups. In particular, the hydroxyl group at the P1′ position could be important not only for binding but also during the reaction process because it could assist the formation of the catalytic dyad ion par and the acylation transition state. Interestingly, during the preparation of this manuscript a preprint has shown that a hydroxymethylketone derivative can be a potent inhibitor of SARS-CoV-2 3CL protease.[49] In this compound the hydroxyl group can establish a direct contact with the catalytic dyad.[50]

21 in total

Review 1. Methodology-Centered Review of Molecular Modeling, Simulation, and Prediction of SARS-CoV-2.

Authors: Kaifu Gao; Rui Wang; Jiahui Chen; Limei Cheng; Jaclyn Frishcosy; Yuta Huzumi; Yuchi Qiu; Tom Schluckbier; Xiaoqi Wei; Guo-Wei Wei
Journal: Chem Rev Date: 2022-05-20 Impact factor: 72.087

2. Impact of Warhead Modulations on the Covalent Inhibition of SARS-CoV-2 M^pro Explored by QM/MM Simulations.

Authors: Sergio Martí; Kemel Arafet; Alessio Lodola; Adrian J Mulholland; Katarzyna Świderek; Vicent Moliner
Journal: ACS Catal Date: 2021-12-26 Impact factor: 13.084

3. Novel dynamic residue network analysis approaches to study allosteric modulation: SARS-CoV-2 M^pro and its evolutionary mutations as a case study.

Authors: Olivier Sheik Amamuddy; Rita Afriyie Boateng; Victor Barozi; Dorothy Wavinya Nyamai; Özlem Tastan Bishop
Journal: Comput Struct Biotechnol J Date: 2021-11-25 Impact factor: 7.271

4. Protein Posttranslational Signatures Identified in COVID-19 Patient Plasma.

Authors: Pavan Vedula; Hsin-Yao Tang; David W Speicher; Anna Kashina
Journal: Front Cell Dev Biol Date: 2022-02-11

5. Insights into the binding and covalent inhibition mechanism of PF-07321332 to SARS-CoV-2 M^pro.

Authors: Son Tung Ngo; Trung Hai Nguyen; Nguyen Thanh Tung; Binh Khanh Mai
Journal: RSC Adv Date: 2022-01-28 Impact factor: 3.361

6. Multiple protonation states in ligand-free SARS-CoV-2 main protease revealed by large-scale quantum molecular dynamics simulations.

Authors: Junichi Ono; Uika Koshimizu; Yoshifumi Fukunishi; Hiromi Nakai
Journal: Chem Phys Lett Date: 2022-02-22 Impact factor: 2.328

7. Mechanism of inhibition of SARS-CoV-2 M^pro by N3 peptidyl Michael acceptor explained by QM/MM simulations and design of new derivatives with tunable chemical reactivity.

Authors: Kemel Arafet; Natalia Serrano-Aparicio; Alessio Lodola; Adrian J Mulholland; Florenci V González; Katarzyna Świderek; Vicent Moliner
Journal: Chem Sci Date: 2020-11-27 Impact factor: 9.825

8. New insights into the catalytic mechanism of the SARS-CoV-2 main protease: an ONIOM QM/MM approach.

Authors: Henrique S Fernandes; Sérgio F Sousa; Nuno M F S A Cerqueira
Journal: Mol Divers Date: 2021-06-24 Impact factor: 3.364

9. Inhibition Mechanism of SARS-CoV-2 Main Protease with Ketone-Based Inhibitors Unveiled by Multiscale Simulations: Insights for Improved Designs*.

Authors: Carlos A Ramos-Guzmán; J Javier Ruiz-Pernía; Iñaki Tuñón
Journal: Angew Chem Int Ed Engl Date: 2021-11-03 Impact factor: 15.336

10. Fullerenes against COVID-19: Repurposing C₆₀ and C₇₀ to Clog the Active Site of SARS-CoV-2 Protease.

Authors: Tainah Dorina Marforio; Edoardo Jun Mattioli; Francesco Zerbetto; Matteo Calvaresi
Journal: Molecules Date: 2022-03-16 Impact factor: 4.411