Literature DB >> 33939913

Elucidation of Cryptic and Allosteric Pockets within the SARS-CoV-2 Main Protease.

Terra Sztain¹, Rommie Amaro¹, J Andrew McCammon^1,2.

Abstract

The SARS-CoV-2 pandemic has rapidly spread across the globe, posing an urgent health concern. Many quests to computationally identify treatments against the virus rely on in silico small molecule docking to experimentally determined structures of viral proteins. One limit to these approaches is that protein dynamics are often unaccounted for, leading to overlooking transient, druggable conformational states. Using Gaussian accelerated molecular dynamics to enhance sampling of conformational space, we identified cryptic pockets within the SARS-CoV-2 main protease, including some within regions far from the active site. These simulations sampled comparable dynamics and pocket volumes to conventional brute force simulations carried out on two orders of magnitude greater timescales.

Entities: Chemical

Mesh：

Substances：

Year: 2021 PMID： 33939913 PMCID： PMC8117783 DOI： 10.1021/acs.jcim.1c00140

Source DB: PubMed Journal: J Chem Inf Model ISSN： 1549-9596 Impact factor: 4.956

Introduction

In December 2019, the World Health Organization learned of a novel coronavirus that has since rapidly spread, leading to a global pandemic. The SARS-CoV-2 virus, causing the disease named COVID-19, has exceeded 100 million cases worldwide, taking over 2 million lives as of January 30, 2021.[1] The genetic and structural similarity of the SARS-CoV-2 virus to the agents of the severe acute respiratory syndrome coronavirus (SARS-CoV) epidemic in 2003 and Middle East respiratory syndrome coronavirus (MERS-CoV) epidemic in 2012 provide a basis of information for understanding and ultimately treating or preventing this disease; however, no highly effective antiviral drug exists for any human-infecting coronavirus. Proteases are responsible for activating viral proteins for particle assembly and have proved successful targets for antiviral agents; most notable are the protease inhibitors used to treat HIV and Hepatitis C.[2,3] The main protease of SARS-CoV-2, called Mpro or 3CLpro encoded by the nsp5 gene,[4] was the first SARS-CoV-2 protein deposited to the protein databank (PDB) on January 26th 2020.[5] This structure was crystalized with a covalent inhibitor (N3) identified from computer-aided drug design and validated biochemically. Currently, over one hundred Mpro structures exist in the PDB and massive efforts to discover a successful inhibitor are underway.[5−8] Mpro is highly similar to 3CL proteases from other coronaviruses in sequence, structure, and function.[9,10] First, Mpro autocatalytically cleaves itself from the SARS-CoV-2 polypeptide and then forms a homodimer to subsequently cleave at 11 distinct sites, while the papain-like protease, PLpro, cleaves at three sites, allowing the viral proteins to fold and perform their functions.[11−13] Proteolytic cleavage is catalyzed by residues H41 and C145 of the catalytic cysteine dyad. These active site residues are located at the N-terminal globular domain of Mpro, which contains an anti-parallel beta barrel reminiscent of trypsin-like serine proteases, while the C-terminal domain consists of five alpha helices.[13] Upon dimerization, Mpro resembles the shape of a heart symbol (Figure A).

Figure 1

Pocket analysis of GaMD simulations of Mpro. (A) Active site, dimer interface, and distal site definitions for pocket calculations. The inclusion sphere was centered on the center of mass (COM) of the top and bottom protein domains for active and distal sites with 12 and 10 Å radii, respectively. A 10 Å-radius sphere was used for the dimer interface calculation. (B–D) Histograms showing the distribution of pocket volumes calculated from the aggregate 6 μs GaMD simulations. Many efforts to identify inhibitors involve high throughput virtual screening, which exploits the power of computational docking to screen millions of molecules in silico to narrow down a few “hits” for lead optimization. Incorporation of molecular dynamics (MD) has significantly improved the ability to identify promising protein inhibitors.[14] Docking to protein ensembles obtained from MD simulations is often employed to consider multiple target states that remain elusive in static crystal structures.[15] Conventional MD simulations are however limited in the amount of conformational space that can be sampled due to the amount of time required to traverse energy barriers between stable conformational states.[16] In the present study, we used the enhanced sampling technique, Gaussian accelerated MD (GaMD),[17] to overcome such barriers. GaMD adds a harmonic boost to the potential energy up to a threshold, effectively “filling in the wells” creating a smoother potential energy surface.[17,18] This method allowed us to extensively sample conformations of the SARS-CoV-2 main protease at minimal computational expense and provide detailed characterization of several potentially druggable pockets, which can serve as a basis for identifying an Mpro inhibitor for COVID-19 treatment.

Results

GaMD simulations were performed on the first published Mpro structure PDB 6LU7.[5] Six systems were simulated, including monomer and dimer simulation in the presence and absence of the co-crystalized N3 inhibitor either covalently or non-covalently bound (Figure S1). Five independent simulations of 200 ns for an aggregate of 1 μs were carried out for each system. Three regions were defined to investigate potentially druggable pockets: the active site, which lies in the N-terminal top lobe of the heart-shaped protease; the C-terminal region, as a potential allosteric site; and the dimer interface region (Figure A).

Characterization of Mpro Pockets

A variety of pocket volumes were sampled between 16 and 277 Å3 for the active site pocket, starting from 170 Å3 in the crystal structure. The interface spanned from 36 to 429 Å3 starting from 209 Å3 and the distal site from 0 to 78 Å3 starting from 18 Å3 (Figure B–D). Simulations with the N3 ligand present sampled greater mean volumes for the active site and distal region than the apo simulations without N3; however, the average dimer interface region was greater in the apo simulation (Figure S2). Covalently bound N3 sampled greater mean volumes for the active site and dimer interface than non-covalent, which sampled greater mean volumes for the distal site (Figure S3). In a few simulations, the non-covalent ligand escaped the active site pocket (Figure S4). In all simulations, the ligand re-arranges compared to the crystal structure. The non-covalent ligands quickly decrease in total number of native contacts and increase in non-native contacts. The covalent ligands slightly decrease in native contacts but have a similar increase in non-native contacts. (Figures S5 and S6). These simulations revealed significant loop dynamics (Figure A–C) leading to the distinct pocket conformations (Figure D–F and Movie S1) with varying active site water occupancy (Figure S7).

Figure 2

Cryptic pockets identified from GaMD simulations. (A–C) Overlay of several frames highlighting loop dynamics leading to various pocket conformations for each site. (D–F) Overlay of several pockets shown outlined in mesh, colored in varying shades of pink for the active site, blue for the distal site, and orange for the interface. (G–I) Computational solvent mapping of “hot spots” to pockets at each site. Several metrics were used to assess the properties of these pockets. The PockDrug[19] webserver predicts druggability probability based on a variety of factors including geometry, hydrophobicity, and aromaticity, among others. The FTMap[20] computational solvent mapping webserver helps determine “hot spots” and can aid in fragment-based drug design. For each site, the sampled pocket volumes were clustered into 10 populations, and a frame corresponding to that volume was analyzed. The most populated volumes for the active site, dimer interface, and distal site were 114, 220, and 10 Å3, respectively. The 10 active site pockets gave predicted druggabilities between 0.18 and 0.86 and solvents mapped to seven of the pockets (Figure S8). The dimer interface pockets gave higher druggability prediction values, between 0.5 and 0.93, with solvent mapping to all 10 structures (Figure S9). The distal site pockets gave the highest predicted druggability, between 0.37 and 1.00 with six out of ten pockets giving predicted druggability values greater than 0.90. Solvents mapped to seven out of the ten pockets (Figure S10). Notably, none of the solvent mapping to any of the sites showed overlapping results, indicating that each structure may provide distinct opportunities for designing inhibitors. The pocket volume, predicted druggability, and solvent mapping were not directly correlated, as expected, since each calculation considers different metrics. Overall, these analyses indicate that the simulations reveal a diverse set of conformations that are likely to serve as viable targets for in silico drug development. Dimer association is necessary for catalytic activity of the protease;[5] therefore, it is reasonable to assume that binding of a molecule, which prevents this association, would inactivate the protease. The significance of the distal regions is less well understood, though computational investigations support a potential allosteric role.[21,22] To study this further, we examined the correlated motions of each residue throughout the simulations. In the monomer simulation, the active site region was positively correlated to the distal site region and negatively correlated to the region that connects the two, whereas the inverse was observed in the dimer simulation (Figure and Figures S11 and S12). These correlations resulted in primary motions, detected by principal component analysis, comprising hinge-like in the monomer versus twisting in the dimer (Figure S13). These motions were similar in both the apo and N3 bound simulations. In the dimer simulations, the active site of chain A was slightly correlated to the active site of chain B, and the distal regions were highly anticorrelated, twisting in opposite directions. (Figure S13). Regardless of positive or negative correlation values, the distal and active sites show a dynamic interdependence on each other in both the apo and dimer forms.

Figure 3

Correlated motions of each residue. (A) Correlation matrix calculated from aggregate 1 μs simulation of apo Mpro. Calculation from the monomer simulation is shown as an inset in the upper right corner of the calculation from the dimer simulation. (B) Structural representation of correlated motions, with each residue colored based on correlation to the catalytic histidine H41.

GaMD Enhancement of Mpro Conformational Space

Previous benchmarks have demonstrated that GaMD can enhance sampling by multiple orders of magnitude.[23] To test the sampling enhancement compared to conventional MD simulations, we compared the 1 μs aggregate GaMD simulation from five replicates of 200 ns to the 10 μs simulations carried out at Riken[24] and 100 μs simulations D.E. Shaw Research (DESRES).[25] We additionally performed and compared those to 1 μs of GaMD continuous simulation and compared 200 ns cMD to 200 ns GaMD. The ranges of pocket volumes for each site were highly similar, as detailed in Figure S14. Even the short conventional simulations sampled comparable pocket volumes, indicating that this metric may not be the most effective measure of conformational sampling. Principal component analysis (PCA) is frequently used as a metric for conformational diversity as it decreases the dimensionality to maximize the dataset variance.[26,27] PCA was indeed more consistent with expectations where increasing simulation time corresponded to larger regions of space covered by plotting the first two principal components (Figure ).

Figure 4

Conformational sampling of GaMD and conventional MD. Each plot shows the first and second principal components, projected from a mass weighted matrix of backbone atom positions of the apo dimer, colored by probability density. (A) 200 ns of GaMD simulation. (B) 1 μs of GaMD aggregated from five independent 200 μs simulation. (C) 1 μs of continuous GaMD simulation. (D) 200 ns of cMD. (E) 10 μs of cMD performed by Riken. (F) 100 μs of cMD performed by D.E. Shaw Research (DESRES). The area of a 2D histogram of the PC1 vs PC2 datapoints from each simulation is labeled as a percentage relative to the 100 μs DE. Shaw Research (DESRES) conventional MD simulation. While the GaMD simulations run continuously for 1 μs only covered 54% of the conformational space covered by the 100 μs DESRES, the 1 μs aggregate GaMD from five simulations run in parallel covered 80% of the DESRES space. The 10 μs Riken simulations covered 94% of that space (Figure ). It is worth noting that each of the simulations with the most extensive sampling (DESRES, Riken, GaMD 5 × 200 ns) reached areas in the PC1-PC2 subspace that neither of the other two reached. This just confirms that none of these simulations achieve comprehensive sampling, which, in the ultimate case, would include extensively unfolded conformations. Each of these simulations reveals cryptic binding sites not seen in shorter, conventional MD simulations, let alone in the experimental structures, and some sites are seen in one but not the other simulations. Though perhaps not intuitive at first glance, it is reasonable that many simulations in parallel would have a greater chance of sampling diverse space than one long simulation, which may settle into a favorable low energy conformation, even given the boost added by the GaMD, which is restricted by a threshold.[17] Representative structures from dense populations of the PCA plots were further analyzed structurally and through computational solvent mapping (Figure S15). Additionally, RMSD and RMSF comparison revealed that the majority of the conformational deviations occur within the helices and loops lining the active site pocket (Figures S16,S17, and S21–S23), which was also visualized in the representative structures from populations determined from PCA (Figure S15). Taken together, the range of pocket volumes, structural conformations, and dynamics were similar between short GaMD simulations and brute force conventional simulations. The plethora of available structures from all of these studies is expected to be useful for in silico inhibitor development for targeting Mpro.

Conclusions

GaMD simulations of the SARS-CoV-2 main protease, Mpro, revealed cryptic pockets not detectable from the crystal structure. In addition to characterizing the vast conformational landscape of the active site and dimer interface, a distal region was explored as a potential allosteric site. GaMD sampled conformational space more efficiently than brute force simulations, with 1 μs simulation data providing comparable results to 10 μs simulations carried out by Riken[24] and 100 μs simulations carried out by DE Shaw Research,[25] offering a more widely accessible simulation strategy. These structures, along with the millisecond simulations from Folding@home,[28] can serve as a basis for in silico docking to identify Mpro inhibitors for COVID-19 treatment.

Methods

Simulation Preparation

The first crystal structure of SARS-CoV-2 main protease, PDB ID 6LU7,[5] was used as a starting point for all simulations. Protonation states of titratable residues were determined using the H++ webserver[29] with 0.15 M salinity, 10 internal dielectric constant, 80 external dielectric constant, and pH 7.4. Histidines 64, 80, and 164 were thus named HID to indicate delta nitrogen protonation before protonation of entire protein using tleap from AmberTools 18.[30] Six systems were simulated as shown in Figure S1. For apo simulations, the inhibitor was deleted from the PDB file. For dimer simulations, the PyMOL[31] symexp command was used to generate initial coordinates for the second chain based on symmetry. The ff14SB[32] forcefield was used for proteogenic residues. For simulations with the N3 bound, the antechamber package of AmberTools 18 was used to generate GAFF[33] forcefield parameters for N3. Gaussian 09 was used to calculate partial charges of atoms according to the RESP[34] method with the HF/6-31G* level of theory. For covalent simulations, the cysteine 145 was treated as an unnatural amino acid and capped with N-methyl and C-acetyl groups for charge calculations. A TIP3P isometric water box was added with at least 10 Å buffer between solute and edges of the box. Enough Na+ was added to neutralize the system, and then Na+ and Cl– ions were added to a final concentration of 150 mM. Minimization was carried out in two steps. First, the solute was restrained with a 500 kcal/mol restraint force to minimize the solvent for 10,000 cycles followed by unrestrained minimization for 300,000 cycles. Next, the system was heated to 310 K over 350 ps using an isothermal-isovolumetric (NVT) ensemble followed by isothermal-isobaric (NPT) equilibration for 1 ns.

Gaussian Accelerated Molecular Dynamics

Gaussian accelerated molecular dynamics (GaMD)[17] pmemd.cuda implementation of Amber 18 was used to generate five independent trajectories of 200 ns, an aggregate of 1 μs for each system. The dual boost method was employed, adding a biasing force to both the total and dihedral potential energy. The threshold energy was set to the upper bound. GaMD production of equilibrated systems was carried out in five stages. First, 200 ps of preparatory conventional MD simulation was carried out, without any statistics collected. Second, 1 ns of conventional MD was carried out to collect potential statistics Vmax, Vmin, Vavg, and σV. Next, 800 ps of GaMD was carried out with a boost potential applied with fixed parameters. Then, 50 ns of GaMD was carried out with updated boost parameters, and finally, 150 ns of GaMD was carried out with fixed boost parameters. All production simulations were carried out using NVT, with periodic boundary conditions and 2 fs timesteps. The SHAKE[35] algorithm was used to restrain nonpolar hydrogen bonds and TIP3P water molecules. The Particle Mesh Ewald[36] method was used for electrostatic interactions with a 10 Å cutoff for nonbonded interactions. A Langevin thermostat was used for temperature regulation with a collision frequency of 5 ps–1.

Pocket Analysis

For each system, the aggregated simulation was clustered based on the RMSD from the first frame using the cpptraj[37] hierarchical algorithm to obtain 100 distinct structures for pocket calculation. Three regions were defined for individual calculations: the active site pocket, distal site pocket, and dimer interface. Pocket volume was calculated using POVME 2[38] with a 1 Å grid spacing. The point inclusion sphere was determined based on the average center of mass of residues 7–198 with a 12 Å radius for the active site pocket, the average center of mass of residues 198–306 with a 10 Å radius for the distal site pocket, and the average center of mass between residues within 3.5 Å of the other dimer with a 10 Å radius for the dimer interface. The pocket volumes were chosen to be underestimated as opposed to overestimated and were calculated with a minimal radius to avoid extensively calculating known solvent exposed regions. As shown in the FTMap solvent screening (Figures S10 and S15), fragments bound to various regions around the distal site crevices. Increasing the radius of the pocket calculation space would incorporate solvent exposed regions from many directions that would not be useful and would involve other regions of the protein. Due to this volume dependence on chosen radius, these volumes are intended as an initial comparative metrics for identifying structures to analyze further, rather than concrete final values. For analysis of water occupancy, the GIST (Grid Inhomogeneous Solvation Theory Method)[39] function of cpptraj was used for the active site region, and the density of oxygen centers map was used to visualize results in VMD.[40] To quantify druggability, we have used the PockDrug[19] webserver, which predicts druggability probability based on a variety of factors including geometry, hydrophobicity, and aromaticity, among others. Additionally, we have used the FTMap[20] computational solvent mapping webserver to determine druggable “hot spots.” Pocket volumes calculated by POVME[38] from each of the three regions were clustered into 10 populations, and representative frames for each were selected for analysis. The fpocket estimation method was used for PockDrug, and the druggability probability of the pocket that aligned best to the POVME[38] calculation sphere was reported.

Principal Component Analysis

Principal motions were calculated using cpptraj, as described previously,[41] from projection of displacement vectors of each of the backbone atoms onto a diagonalized mass-weighted covariance matrix after rms fitting of every atom except protons to the first frame. Residue correlation matrices were also generated this way. For the graphical abstract, reweighting of the potential energy was carried out on one 200 ns simulation of the one N3 non-covalently bound dimer using the PyReweighting toolkit.[42] Reweighting was carried out using the Maclaurin series expansion to the 10th order with a bin size of 6 and maximum energy of 100 based on the first two principal components. One conventional 200 ns MD simulation was performed for comparison and was outlined in the figure. For the conformational sampling comparison shown in Figure , the dimer–apo system was used for consistency with Riken and DESRES simulations. The colors of the histograms are normalized so the same number of datapoints (frames) is considered. Therefore, 40,000 frames were used for all of the in house simulations, and 50,000 frames were used for the Riken and DESRES simulations. The color bars were set to maximum values of 0.0008 and 0.0010, respectively. This results in densely localized (yellow color) in the short simulations versus less density (blue color) and more spread out in the longer and enhanced simulations. Visualization of the first normal mode was carried out using the Normal Mode Wizard plugin of VMD.

Data and Software Availability

In accordance with the community principles around open sharing of COVID19 simulation data,[43] all simulation input files and data are available at https://covid19.molssi.org through the NSF MolSSI COVID19 Molecular Structure and Therapeutics Hub.

10 in total

Review 1. Methodology-Centered Review of Molecular Modeling, Simulation, and Prediction of SARS-CoV-2.

Authors: Kaifu Gao; Rui Wang; Jiahui Chen; Limei Cheng; Jaclyn Frishcosy; Yuta Huzumi; Yuchi Qiu; Tom Schluckbier; Xiaoqi Wei; Guo-Wei Wei
Journal: Chem Rev Date: 2022-05-20 Impact factor: 72.087

2. Allosteric Hotspots in the Main Protease of SARS-CoV-2.

Authors: Léonie Strömich; Nan Wu; Mauricio Barahona; Sophia N Yaliraki
Journal: J Mol Biol Date: 2022-07-16 Impact factor: 6.151

3. CROMATIC: Cross-Relationship Map of Cavities from Coronaviruses.

Authors: Lydia Siragusa; Gabriele Menna; Fabrizio Buratta; Massimo Baroni; Jenny Desantis; Gabriele Cruciani; Laura Goracci
Journal: J Chem Inf Model Date: 2022-06-13 Impact factor: 6.162

4. Gaussian accelerated molecular dynamics (GaMD): principles and applications.

Authors: Jinan Wang; Pablo R Arantes; Apurba Bhattarai; Rohaine V Hsu; Shristi Pawnikar; Yu-Ming M Huang; Giulia Palermo; Yinglong Miao
Journal: Wiley Interdiscip Rev Comput Mol Sci Date: 2021-03-01

5. Dynamic allostery highlights the evolutionary differences between the CoV-1 and CoV-2 main proteases.

Authors: Paul Campitelli; Jin Lu; S Banu Ozkan
Journal: Biophys J Date: 2022-03-15 Impact factor: 3.699

Review 6. Systematic review on role of structure based drug design (SBDD) in the identification of anti-viral leads against SARS-Cov-2.

Authors: Nilesh Gajanan Bajad; Swetha Rayala; Gopichand Gutti; Anjali Sharma; Meenakshi Singh; Ashok Kumar; Sushil Kumar Singh
Journal: Curr Res Pharmacol Drug Discov Date: 2021-05-14

Review 7. Enhanced sampling without borders: on global biasing functions and how to reweight them.

Authors: Anna S Kamenik; Stephanie M Linker; Sereina Riniker
Journal: Phys Chem Chem Phys Date: 2022-01-19 Impact factor: 3.676

Review 8. Allosteric Binding Sites of the SARS-CoV-2 Main Protease: Potential Targets for Broad-Spectrum Anti-Coronavirus Agents.

Authors: Lara Alzyoud; Mohammad A Ghattas; Noor Atatreh
Journal: Drug Des Devel Ther Date: 2022-08-02 Impact factor: 4.319

9. The Discovery of Small Allosteric and Active Site Inhibitors of the SARS-CoV-2 Main Protease via Structure-Based Virtual Screening and Biological Evaluation.

Authors: Radwa E Mahgoub; Feda E Mohamed; Lara Alzyoud; Bassam R Ali; Juliana Ferreira; Wael M Rabeh; Shaikha S AlNeyadi; Noor Atatreh; Mohammad A Ghattas
Journal: Molecules Date: 2022-10-09 Impact factor: 4.927

10. Identifying the Hot Spot Residues of the SARS-CoV-2 Main Protease Using MM-PBSA and Multiple Force Fields.

Authors: Jinyoung Byun; Juyong Lee
Journal: Life (Basel) Date: 2021-12-31

10 in total