Olivier Sheik Amamuddy1, Gennady M Verkhivker2,3,4, Özlem Tastan Bishop1. 1. Research Unit in Bioinformatics, Department of Microbiology and Biochemistry, Rhodes University, Grahamstown 6140, South Africa. 2. Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States. 3. Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, California 92618, United States. 4. Department of Pharmacology, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, 9500 Gilman Drive, La Jolla, California 92093, United States.
Abstract
A new coronavirus (SARS-CoV-2) is a global threat to world health and economy. Its dimeric main protease (Mpro), which is required for the proteolytic cleavage of viral precursor proteins, is a good candidate for drug development owing to its conservation and the absence of a human homolog. Improving our understanding of Mpro behavior can accelerate the discovery of effective therapies to reduce mortality. All-atom molecular dynamics (MD) simulations (100 ns) of 50 mutant Mpro dimers obtained from filtered sequences from the GISAID database were analyzed using root-mean-square deviation, root-mean-square fluctuation, Rg, averaged betweenness centrality, and geometry calculations. The results showed that SARS-CoV-2 Mpro essentially behaves in a similar manner to its SAR-CoV homolog. However, we report the following new findings from the variants: (1) Residues GLY15, VAL157, and PRO184 have mutated more than once in SARS CoV-2; (2) the D48E variant has lead to a novel "TSEEMLN"" loop at the binding pocket; (3) inactive apo Mpro does not show signs of dissociation in 100 ns MD; (4) a non-canonical pose for PHE140 widens the substrate binding surface; (5) dual allosteric pockets coinciding with various stabilizing and functional components of the substrate binding pocket were found to display correlated compaction dynamics; (6) high betweenness centrality values for residues 17 and 128 in all Mpro samples suggest their high importance in dimer stability-one such consequence has been observed for the M17I mutation whereby one of the N-fingers was highly unstable. (7) Independent coarse-grained Monte Carlo simulations suggest a relationship between the rigidity/mutability and enzymatic function. Our entire approach combining database preparation, variant retrieval, homology modeling, dynamic residue network (DRN), relevant conformation retrieval from 1-D kernel density estimates from reaction coordinates to other existing approaches of structural analysis, and data visualization within the coronaviral Mpro is also novel and is applicable to other coronaviral proteins.
A new coronavirus (SARS-CoV-2) is a global threat to world health and economy. Its dimeric main protease (Mpro), which is required for the proteolytic cleavage of viral precursor proteins, is a good candidate for drug development owing to its conservation and the absence of a human homolog. Improving our understanding of Mpro behavior can accelerate the discovery of effective therapies to reduce mortality. All-atom molecular dynamics (MD) simulations (100 ns) of 50 mutant Mpro dimers obtained from filtered sequences from the GISAID database were analyzed using root-mean-square deviation, root-mean-square fluctuation, Rg, averaged betweenness centrality, and geometry calculations. The results showed that SARS-CoV-2Mpro essentially behaves in a similar manner to its SAR-CoV homolog. However, we report the following new findings from the variants: (1) Residues GLY15, VAL157, and PRO184 have mutated more than once in SARS CoV-2; (2) the D48E variant has lead to a novel "TSEEMLN"" loop at the binding pocket; (3) inactive apo Mpro does not show signs of dissociation in 100 ns MD; (4) a non-canonical pose for PHE140 widens the substrate binding surface; (5) dual allosteric pockets coinciding with various stabilizing and functional components of the substrate binding pocket were found to display correlated compaction dynamics; (6) high betweenness centrality values for residues 17 and 128 in all Mpro samples suggest their high importance in dimer stability-one such consequence has been observed for the M17I mutation whereby one of the N-fingers was highly unstable. (7) Independent coarse-grained Monte Carlo simulations suggest a relationship between the rigidity/mutability and enzymatic function. Our entire approach combining database preparation, variant retrieval, homology modeling, dynamic residue network (DRN), relevant conformation retrieval from 1-D kernel density estimates from reaction coordinates to other existing approaches of structural analysis, and data visualization within the coronaviralMpro is also novel and is applicable to other coronaviral proteins.
The humansevere acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strain
is the causative agent of the COVID-19 pandemic.[1] After
being first reported from the Wuhan seafood and animal market in late
December 2019,[2] the total number of reported cases
worldwide has reached over 21 million, with observed case fatality rates
ranging between 0.8 and 14.5% among the most affected countries.[3] The disease is, however, more severe among the elderly
and those living with comorbidities that involve an endothelial
dysfunction,[4] such as hypertension, obesity, and
diabetes.[5] While there is currently no
cure,[2,6] the drugs dexamethasone and remdesivir were both
recently reported to have significantly positive effects in clinical
trials.[7,8] The World Health Organization also reports 160
candidate vaccines being at different stages of clinical evaluation.[9] While this is very encouraging, there is a lot of
uncertainty around the behavior of the pathogen. Drastic measures designed
to limit the rate of new infections[10] have resulted in
global economic problems, which have affected many livelihoods, even
exacerbating food insecurity.[11]Owing to the rapid generation of genomic sequence data[12,13] and the
timely availability of 3D structural data, research into potential drugs is
greatly accelerated. Fundamental research is a key to understanding the
pathogen’s strategies such that more informed decisions can be made
about clinical interventions. As seen in other pathogens, mutations occur
through the normal process of evolution, and certain advantageous variations
can be selected over time. The SARS-CoV-2 genome is RNA-based, and viruses
from this category have been reported to have increased rates of
mutation.[14] For instance, in HIV, this has lead to
several levels of classification of the virus, in which certain strains can
manifest different transmissibility patterns and show differing responses to
existing therapies.[15,16] Nevertheless, coronaviruses comprise an
RNA-dependent RNA polymerase genetic proofreading
machinery,[17,18] which may be responsible for their
reduced genetic diversity, even though the persistence of the pandemic may
allow for the accumulation of immunologically important mutations.[3] From the data gathered from the GISAID database[12] and real-time subsample estimates of genetic relatedness
from the Nextstrain web resource,[19] it is clear that the
virus is evolving within the human host. More recently, a SAR-CoV-2spike
variant was reported to occur asynchronously in multiple geographic
locations with the capacity for expressing higher viral concentrations in
the upper respiratory tract,[3] hinting to some form of
selection.Although progress is being gradually made in understanding the viral structural
biology and symptomatology of the disease, current knowledge is still
fragmentary.[20,21] The death toll and the number of
infections keep on rising albeit at variable rates in different parts of the
world. Thus, time is of the essence for the discovery of effective
therapies. It is imperative to better characterize parts of the viral
mechanisms to better understand the behavior of the new coronavirus.
Already, with the help of experimentally determined structures, genomic
data, and annotations, a growing number of in silico work suggests potential
solutions to the COVID-19 pandemic using various techniques, including the
use of molecular docking[22−27]
and dynamics (MD),[28−36]
network analysis,[37−43]
and machine learning.[2,44,45] Collectively, these may
pave the way to a potential solution by prioritizing inhibitors or drug
targets against COVID-19. Most of the mentioned MD research has been focused
on the application of the technique to filter and prioritize hit compounds
from small compound docking. Suárez and Díaz, however, have
investigated apo and substrate-bound Mpro using 2 μs MD
simulations in monomeric and dimeric forms, to reveal instability of domain
III in the former, using 4 protease crystal structures from
SARS-CoV-2.[36] In this study, we examine some of the
early mutations of SARS-CoV-2Mpro and investigate different
aspects of their dynamics using MD.The Mpro enzyme, also known as the 3C-like protease, is one of the
best studied drug targets among the coronaviruses.[46] This
is mainly due to the similarities in their active site and their mechanisms
with the related pathogenic betacoronaviruses from previous epidemics of
SARS-CoV and MERS-CoV (Middle East respiratory syndrome coronavirus).[47] Mpro is a conserved drug target present in
all members of the Coronavirinae
subfamily[48,49] and is highly similar to its SARS-CoV-2
counterpart.[47] SARS-CoV-2Mpro does not
have a human homolog,[39] which reduces the chances of
accidentally targeting host proteins. Alongside the papain-like protease
(PLP) enzyme, Mpro plays an essential role in the process of
viral maturation,[2] cleaving the large precursor replicase
polyprotein 1ab to produce 16 non-structural proteins.[2,50] The cysteine
protease functions as a homodimer and mainly comprises three domains
(I–III).[2] Homodimerization plays an
important role in the catalytic activity of Mpro, as reported in
the case of the SARS-CoVMpro homolog, where the G11A mutation
completely abolished its activity by interfering with the insertion of the
N-finger region (residues 1–9).[51] Additionally,
only one of the dimers was shown to be active at a time in[52] SARS-CoVMpro. At the N-terminus, the
chymotrypsin-like domain I (residues 10–99) is connected to the
picornavirus 3C-protease-like domain II (residues 100–182), which
together form a hydrophobic substrate binding site, with catalytic residues
HIS41 and CYS145.[50,53] Domain III (residues 198–303), also referred
to as the helical domain, is connected to domain II[54] by
a 15-residue linker loop and was shown to regulate enzymatic activity in
SARS-CoV.[55] While each domain minimally makes
contact with its equivalent domain from the alternate chain, the majority of
the dimer contact interface is a result of interactions present between
domain II (chain A) and the N-finger (chain B).[50] In the
same manner, the N-finger from chain A makes contact with domain II from
chain B. Each chain is referred to as a protomer.[34,42,50]In this work, we study the collective effects of various Mpro
mutations from a filtered sample of 50 SARS-CoV-2 isolates by first mapping
them on 3D structures and performing all-atom MD for each of the mutants, in
addition to the reference protein. All-atom simulations were carried out at
a constant protonation state initially corresponding to a pH of 7. Multiple
aspects of the protein dynamics were analyzed using a battery of techniques,
including the averaged betweenness centrality (BC)—a
metric of dynamic residue network (DRN) analysis, dynamic cross-correlation
(DCC),[56] geometry calculations (interdomain angles
and interprotomer distances) based on the center of mass (COM), cavity
compaction analyses, the anisotropic network model (ANM), and the analysis
of residue and backbone fluctuations. A new metric based on Jaccard’s
coefficient was developed to sort non-Gaussian distributions to facilitate
the differentiation of large numbers of such curves obtained from MD
reaction coordinates. Coarse-grained Monte Carlo simulations were also
independently investigated.
Materials and Methods
Sequence and Template Retrieval
A high-resolution (1.48 Å) biological unit for crystal structure of
the SARS-CoV-2 main protease (PDB ID 5RFV(57)) was retrieved
from the Protein Data Bank (PDB)[58] to be used as a
template for homology modeling. Its sequence was used as a reference
throughout this work. PyMOL (version 2.4)[59] was
used to remove any non-protein molecules and to reconstitute the
biological unit as chains A and B. SARS-CoV-2 genomes of any length
were acquired from the GISAID website as a FASTA-formatted
file.[12]
Building the Mutation Data Set
Low coverage sequences (entries with >5% unknown nucleotides) from
GISAID were not selected. A local BLAST database was then set up for
these sequences using the makeblastdb command
available from the BLAST+ application (version 2.8.1).[60] Protease mutants were subsequently retrieved using
the reference sequence as a query parameter for the
tblastn command with default parameters, except
for the maximum number of target sequences, which was set at 10000.
Identical sequences were then filtered out before selecting BLAST hits
that had a 100% sequence coverage and a percentage sequence identity
of <100%. Sequences coming from non-human hosts were discarded. In
order to retain as much data as possible and minimize the
incorporation of sequencing errors, fold coverage was used where
provided, while quality was imputed on the basis of an identical
Mpro sequence being present more than once in the
filtered data set for samples not possessing sequencing coverage
information. This resulted in 50 mutant sequences with either high
coverage, where available, or with additional sources of support
otherwise. Further details about the samples are given in the
Supporting Information acknowledgment, Table S1.
Homology Modeling, pH Adjustment, and Analysis of Residue
Interactions
PIR-formatted target-template sequence alignment files were generated for
each mutant using the BioPython library (version 1.76)[61] within ad hoc Python scripts for use in MODELLER
(version 9.22).[62] The automodel
class was used with slow refinement and a deviation of 2 Å to
generate 12 models in parallel for each mutant, after which the ones
with the lowest z-DOPE scores were retained. The protein was then
adjusted to a pH of 7 using the PROPKA algorithm from the PDB2PQR tool
(version 2.1.1).[63] For visualizing the overall
interactions at given residue positions, the Arpeggio tool[64] was used to programmatically generate the
inter-residue interactions before computing their sums using an
in-house Python script. More generally, the Discovery Studio
Visualizer (version 19.1) was used for describing the non-bonded
interactions.[65]
Molecular Dynamics Simulations
All-atom protein MD simulations (using the AMBER03 force field) were run
for the protonated dimers using GROMACS (version 2016.1)[66] at the Center for High Performance Computing
(CHPC). Proteins were placed in a triclinic box containing 0.15 M NaCl
in SPC-modeled water. A minimum image distance of 1.5 nm between the
solute and the box was used. The system was then energy minimized
using the steepest descent algorithm with an initial step size of 0.01
nm for a maximum force of 1000 kJ/mol/nm and a maximum of 50,000
steps. Temperature was subsequently equilibrated at 310 K for 50 ps,
according to the NVT ensemble. Pressure was then
equilibrated at 1 bar for 50 ps, using the Berendsen algorithm
according to the NPT ensemble. During both
NVT and NPT, the protein was
position restrained, and constraints were applied on all bonds.
Unrestrained production runs (100 ns) were then performed, where
constraints were applied only on H-bonds, and the
Parrinello–Rahman algorithm was used for pressure coupling. In
all cases, a time step of 2 fs was used, with a short-range non-bonded
cutoff distance of 1.1 nm and the PME algorithm for long-range
electrostatic interaction calculations. Prior to analysis, the
periodic boundary conditions were removed, and the trajectories were
corrected for rotational and translational motions.
Coarse-Grained Simulations
Coarse-grained (CG) models enable simulations of long timescales for
protein systems and assemblies and represent a computationally
effective strategy for adequate sampling of the conformational space
while maintaining physical rigor. The CABS model was employed for
multiple CG simulations[67−71] of the SARS-CoV-2 main protease dimer
structures (PDB IDs 5RFV, 6Y2E, and 6Y2F(50)). In this model, the CG
representation of protein residues is reduced to four united atoms.
The residues are represented by main-chain α-carbons
(Cα), β-carbons (Cβ),
the COM of side chains, and another pseudoatom placed in the center of
the Cα–Cα pseudo-bond. The
sampling scheme involved Monte Carlo (MC) dynamics moves including
local moves of individual residues and moves of small fragments
composed of 3 protein residues. One hundred independent CG simulations
were carried out for each studied system with the CABS-flex standalone
Python package for fast simulations of protein dynamics, which is
implemented as a Python 2.7 object-oriented package.[71] In each simulation, the total number of cycles was
set to 1000, and the number of cycles between trajectory frames was
100. Accordingly, the total number of generated models was 2,000,000,
and the total number of saved models in the trajectory used for
analysis was 20,000. It was previously shown that the CABS-flex
approach can accurately recapitulate all-atom MD simulations on a long
timescale.[67−71] The results of 100 independent CG-CABS
simulations for each system were averaged to obtain adequate sampling
and ensure convergence of simulation runs.
Dynamic Residue Network Analysis, Dynamic Cross-Correlation, and
Normal Mode Analysis
The MD-TASK tool kit[56] was used to calculate the
residue averaged BC values over the last 50 ns of
simulation for each protein using a cutoff distance of 6.70 Å and
a step size of 25, generating a total of 10,001 frames. DCC was
calculated for each of the proteins using the same frames and time
steps before linearizing each matrix. Pairwise Pearson correlations
were then performed for all linearized matrices before performing
hierarchical clustering. In all cases, the Cβ and
glycine Cα atoms were used. The GROMACS commands
trjconv and make_ndx were
utilized to reduce the trajectory sizes, to only keep
Cα and Cβ atoms prior to
computation. Normal modes were obtained from Cα using
ProDy,[72] with default parameters for the
ANM.
Pocket Detection and Dynamic Analysis
The reconstituted biological unit for the reference structure was
submitted to the FTMap web server[73] using default
parameters. The PyMOL plugin PyVOL[74] was then used
to identify the surfaces of any potential cavity, specifying the
protein as selection, with a default minimum volume of 200
Å3. Predictions from both tools were combined.
Residues for the interprotomer subpocket were defined by visually
inspecting residues in proximity to the cavity surfaces detected by
PyVOL that overlapped with part of the FTMap probe binding
predictions. The pocket was selected due to its location and
accessibility to the outside. The substrate binding residues
(identified using PyVOL) from both protomers of Mpro were
also investigated due to their functional importance in catalysis,
even though FTMap did not identify a binding hot spot at that
location. The radius of gyration (Rg) for
the subpocket and the binding pockets was then computed from the
entirety of the simulated MD data in each case, using the GROMACS
gyrate command. The generated data was then
visualized and analyzed using various open source Python libraries,
such as matplotlib,[75] Seaborn, Pandas,[76] NumPy,[77] SciPy,[78] MDTraj,[79] and NGLview.[80]
Coupling Peak Finding to an Adapted Jaccard Distance Metric to Rank
and Extract Equilibrium Conformations from KDE Distributions of MD
Measurements
In order to facilitate the comparison of distributions of kernel density
estimation (KDE) from a large number of simulations, a new metric
based the Jaccard distance (dJ) was
devised and applied to rank both Gaussian and non-Gaussian
distributions obtained from MD measurements based on their distance
from a comparator. Non-Gaussian curves can generally be observed from
a collective variable when multiple equilibrium states are sampled
(leading to multimodal distributions) or when there is a skewed
distribution. The metric assumes no similarity of variance or
distribution shapes but requires a range
[xmin,
xmax] common to both samples to be
set for the KDE fitting calculations and a common area under the
curve, here having a value of one. KDE dJ
has a minimum of zero and a maximum of one, corresponding to complete
dissimilarity and identity, respectively. Scott’s rule, defined
by the equation , was used for
Gaussian kernel bandwidth selection, where n and
d are the sample size and the number of
dimensions.[78,81] The fitting range was set to
the global minimum and maximum values of the complete data set to be
compared. The equation used, which is based on numerical approximation
of the integrals, is defined
aswhere
f(x) and
g(x) are the kernel density
estimates of the reference and test samples. The metric was applied to
rank the mutant distributions according to their distance from the
reference Mpro sample in the case of Cα
root-mean-square deviation (RMSD) and COM distance distributions.Thereafter, a straightforward 1D peak finding algorithm was applied to
detect the modes from the ranked samples. The KDE modes correspond to
zones in a conformational landscape about which structures tend to
visit more frequently. Only peaks above a set minimum KDE estimate
(ε = 0.05) were used to limit the detection of insignificantly
low peaks. Using this approach, equilibrium conformations
corresponding to various energy minima along given reaction
coordinates were extracted.
Results and Discussion
Analysis of Residue Mutations and Their Distribution in the 3D
Structure
As a preliminary investigation of the propensity of the sequences to
acquire novel mutations, unique residue mutations were determined
across our set of 50 protein sequences, which were filtered from the
GISAID database.[12] While we cannot infer population
frequencies (across the world) from our relatively small sample, we
show from our estimate that multiple missense mutations have already
occurred on each domain of the Mpro (Figure ). These mutations are shown in
Table , and further
experimental support is reported in the Supporting Information, Table S1. From these, it can
be observed that many mutations are interconversions of the
hydrophobic side-chain residues alanine and valine. As seen in Figure , most residue
mutations have occurred in solvent-accessible surfaces, with the
exception of A7V, V20L, L89F, A116V, A129V, T135I, I136V, V157I/L,
C160S, A173V, T201A, A234V, and A266V, which were predicted to be
buried by the PyMOL script findSurfaceResidues, using
a default cutoff of 2.5 Å2.
Figure 1
Mapping of the positions showing unique mutations from the
reference Mpro sequence. For clarity, domains
(I–III) are colored (red, blue, and orange,
respectively) only for one of the monomers, while the
other is represented as a gray surface. The domain linker
region is in green, and the N-finger is in cyan. The size
of the labels denotes the number of unique mutations
recorded at that position.
Table 1
List of Missense Mutations in SARS-CoV-2
Mproa
GISAID ID
mutation
GISAID ID
mutation
GISAID ID
mutation
EPI_ISL_425319*
A7V
EPI_ISL_425284*
A116V
EPI_ISL_424470*
T196M
EPI_ISL_420422
G15D
EPI_ISL_421005*
P108S
EPI_ISL_423642
T201A
EPI_ISL_420181
G15S
EPI_ISL_422860*
A129V
EPI_ISL_419256*
L220F
EPI_ISL_425242
G15S, D48E
EPI_ISL_420579
P132L
EPI_ISL_421506
L232F
EPI_ISL_423772*
M17I
EPI_ISL_425655
T135I
EPI_ISL_425235
A234V
EPI_ISL_425342
V20L
EPI_ISL_420182
I136V
EPI_ISL_426097*
K236R
EPI_ISL_421312*
T45I
EPI_ISL_420510*
N151D
EPI_ISL_416720*
Y237H
EPI_ISL_425839*
M49I
EPI_ISL_415503
V157I
EPI_ISL_425886*
D248E
EPI_ISL_418269*
R60C
EPI_ISL_426028
V157L
EPI_ISL_418075
A255V
EPI_ISL_420306
K61R
EPI_ISL_417413
C160S
EPI_ISL_422919
I259T
EPI_ISL_421763*
A70T
EPI_ISL_418082
A173V
EPI_ISL_423725*
A260V
EPI_ISL_413021
G71S
EPI_ISL_423288
P184S
EPI_ISL_425498*
V261A
EPI_ISL_415643
L89F
EPI_ISL_420241*
P184L
EPI_ISL_421380*
A266V
EPI_ISL_420059
K90R
EPI_ISL_419710*
A191V, L220F
EPI_ISL_420610*
N274D
EPI_ISL_419756
P99L
EPI_ISL_415610*
A193V
EPI_ISL_425643
R279C
EPI_ISL_425132
Y101C
EPI_ISL_421515*
T198I
EPI_ISL_422184*
S301L
EPI_ISL_419984*
R105H
EPI_ISL_423007
T190I
Samples with noticeably large differences in
Cα RMSD from the reference
protease are marked with an asterisk.
Mapping of the positions showing unique mutations from the
reference Mpro sequence. For clarity, domains
(I–III) are colored (red, blue, and orange,
respectively) only for one of the monomers, while the
other is represented as a gray surface. The domain linker
region is in green, and the N-finger is in cyan. The size
of the labels denotes the number of unique mutations
recorded at that position.Samples with noticeably large differences in
Cα RMSD from the reference
protease are marked with an asterisk.Current genomic sequence data suggests a generally low rate of mutation
in SARS-CoV-2.[82,83] This being said, in the case of the
Mpro enzyme, a relatively higher rate of
non-synonymous mutations has occurred at residue position 15 (G15D/S)
in domain I, residue position 157 (V157I/L) in domain II, and at
position 184 (P184L/S) within the interdomain linker region. On the 3D
structure, it can be seen that these mutable areas occur away from the
core areas of domains I and II, hence the probable lower selective
pressure for these regions. For this reason, we posit that
individually, these loci may be less important for basic enzymatic
functions. Additionally, samples EPI_ISL_425242 and EPI_ISL_419710
have accumulated two mutations each. Of notable interest is the
mutation of the active site flap residue 48 from the
“TSEDMLN” loop described by Ma and coworkers[84] to “TSEEMLN” in sample
EPI_ISL_425242. Earlier, the homologous region displayed the motif
“TAEDMLN” in SARS-CoV. It is possible that this may
affect the affinity of the enzyme for its substrate and play a role in
the virus’ level of fitness. Residue 15 mutations are examined
in section .Mutation rates of RNA viruses are known to be generally high, endowing
them with the ability to escape host immune responses, improve their
virulence, and even change tissue tropism.[14,85]
Extinction events are not uncommon among the RNA viruses, as seen in
the influenza A H1N1 strains[86] and the previous
SARS-CoV strain,[87] and may be associated with the
gradual accumulation of non-synonymous mutations. On the other hand, a
reduction in replicative speed and genetic diversity was independently
observed in the poliovirus 3DG64S mutant compared to its
wild-type (WT) when rates of mutation were artificially increased by
exposure to a mutagen.[14,88,89] In the
case of the HIV, multiple mutations of minor effect are known to
collaboratively modulate the effect of other advantageous mutations
already present in the protease.[90,91] As a newly emerged
pathogen with a relatively long incubation period and an incompletely
understood biology, these facts from related viruses presuppose a
potentially complex mechanism of viral evolution and adaptation, which
suggests that mutations have to be closely monitored for global health
and security. It is interesting to note that among the buried residue
mutations, domain II has accumulated the highest number of these
mutations in such little time, which may suggest a certain degree of
tolerance to mutations in that region, despite their presence within
beta strands. In the same domain, the A116V mutation occurs on a beta
strand, which is supported by a rich network of hydrogen bonds. The
local impacts of the A116V mutation are discussed further in section .While several mutations are present in domain III, the current data
suggests that these have not yet (at the time of writing) further
evolved at these positions and probably suggests that there might have
a higher fitness cost involved with mutating residues from this
domain. This may be due to its major role in regulating enzyme
activity, together with the stabilizing N-finger region.[53] It is also possible that undersampling may have
lead to a similar observation. Most of the mutations have occurred in
solvent-exposed surfaces of the protein, which may be under reduced
selective pressure, especially if these are in loop regions. On the
other hand, interfacial residue mutations (particularly at the chain
interface) may exert their effects in a more exacerbated manner by
either overstabilizing or destabilizing protein–protein
interactions, as reported in works on other
proteins.[92,93] One such mutation has already
happened at position 7 (A7V) in the N-finger. The local impacts of the
A7V mutation are discussed in section .
Homology Modeling and Inspection of Residue Protonation States
After the preliminary analysis of Mpro at the sequence level,
their 3D structures were built for further investigation. The z-DOPE
scores for the best homology models obtained for each sample were all
below −1, indicating that they were all
native-like.[62,94] Overall, z-DOPE values had a
minimum of −1.48, a maximum of −1.38, with a median of
−1.42. While it is not possible to simulate changes in the
protonation state using classical MD (especially when there are many
titratable residues), an initial approximation of the most prevalent
residue protonation states (at pH 7) was used for each of the
Mpro samples. As there was a relatively higher level
of variation in residue protonation states among each of the seven
histidine residues found in each Mpro protomer, only the
catalytic residue and the missense mutations are described herein,
post homology modeling. We suspect that the high number of titratable
amino acids may play a role in influencing protein behavior at varying
pH levels. Previous work in SARS-CoV indeed reported a pH-dependent
activity switch of the main protease.[95] In our
case, the catalytic residue HIS41 was generally protonated at the
delta nitrogen (HID) atom but also occurred in its fully protonated
state (HIP) in one of the protomers for samples EPI_ISL_419710 and
EPI_ISL_425655. Protonated aspartic acid (ASH) was found in both
protomers of sample EPI_ISL_420510, which was the only isolate to
contain the N151D mutation. ASH was also found in sample
EPI_ISL_421312, for only one of its protomers at residue position 289,
which otherwise occurs in its deprotonated form in all other samples.
The R105H mutation (present only in sample EPI_ISL_419984) occurs as
an HID in each protomer. Similarly, the Y237H mutation, present only
in sample EPI_ISL_416720, occurs as an HID in both of its protomers.
The local residue interactions around residue mutations of interest
were also investigated, by comparing them with their equivalent
position in the Mpro reference, using their modeled
structure. These mutations comprised residues that underwent a higher
number of mutations (G15D/S, V157I/L, and P184L/S) in addition to
mutations that occurred at or close to the dimer interface (A7V and
A116V). A7V significantly increased the number of proximal
interactions to neighboring residues when compared to the reference
protein (Figure ) and gained
in hydrophobic interactions, although a clash in the van der Waals
radius is additionally present.
Figure 2
Differences in the sum of each interaction type for the
Mpro, only for the residue locations that
either accumulated more than one non-synonymous mutation
or the ones occurring close to protein interfaces. The
differences were obtained by subtracting the reference
values from the matching residue loci in the mutant.
Sample names and the selected residue mutations are shown
along the y-axis.
Differences in the sum of each interaction type for the
Mpro, only for the residue locations that
either accumulated more than one non-synonymous mutation
or the ones occurring close to protein interfaces. The
differences were obtained by subtracting the reference
values from the matching residue loci in the mutant.
Sample names and the selected residue mutations are shown
along the y-axis.G15D is found to increase the number of proximal contacts by the largest
extent while also modestly increasing the number of hydrophobic
interactions. By replacing alanine with valine at position 116, an
increased amount of proximal interactions is gained at V116 in sample
EPI_ISL_425284, with a higher amount of hydrophobic contacts to the
residue. Mutations V157I/L both reduced the number of local
hydrophobic contacts and resulted in a reduced van der Waals clash
compared to the reference. P184L had a reduced number of proximal
contacts compared to the reference, while P184S was very similar to
its equivalent position in the reference. These observations only give
a general indication of the local changes present in the static
starting structures. In the sections that follow, the dynamic aspects
of Mpro are investigated.
Estimation of the Protein Backbone Flexibility from MD Using
Cα RMSD
The Cα RMSD values obtained after frame fitting to the
initial frames and periodic image correction (Figure
A) revealed a relatively high
range of divergence from the reference Mpro among the 50
isolates, ranging from 15 to 94%. From the shapes of the kernel
density plots, it can be seen that some mutants may equilibrate around
single energy minima (for unimodal distributions), while others select
for multiple major conformations, as depicted by the multimodal
shapes. In all, these samples involve mutations A7V, M17I, A70T,
A116V, K236R, Y237H, D248E, A266V, and N274D, in which the last five
are exclusive to domain III. A7V occurs on the N-finger, which is a
critical region for Mpro dimer stability. M17I occurs on an
internal loop that connects a β strand to a helix in domain I,
while A70T occurs on a solvent-exposed loop in the same domain.
Mutation A116V occurs in a buried β strand within domain II.
From their visibly positively shifted upper quartiles (and based on
the relatively large KDE dJ and the
visibly higher modes), we may infer that more mobile backbones were
sampled for mutants EPI_ISL_421380, EPI_ISL_423772, and EPI_ISL_425886
compared to the reference. Upon visualizing the trajectories, a
slight, global twisting motion was observed between their protomers,
whereby each protomer moved in opposite directions, while being
tethered at the center. However, a similar motion was also generally
observed in all other samples and is summarized using the first
non-trivial mode (number 7) from the ANM using the highest probability
equilibrium pose from the reference (Figure B). This may indicate that the global
twisting motions are a normal behavior of dimeric Mpro, at
least under our simulated conditions for the apo state. On the other
hand, the second non-trivial ANM mode (number 8) showed the domain
rotational motions—they are similar overall for both the
reference (Figure C) and in
EPI_ISL_425886 where the highest equilibrium backbone RMSD was
observed (Figure D), with
the exception that protomer A behaved as protomer B, and vice versa in
each case. The predictions were obtained from ProDy in VMD.[96]
Figure 3
(A) Violin plots of Cα RMSD values for the
reference (in gray) and the mutant (colored in blue)
Mpro, showing the 25th, 50th, and 75th
percentiles in dotted lines inside the kernel density
plots. Distributions are scaled by area and have been
sorted by the KDE dJ distance
(shown above each distribution) computed between each
sample and the reference protease. (B) General twisting
motion of the protomers observed across Mpro
samples, inferred from the highest probability density
reference protease pose using the ANM mode 7. (C) Mode 8
shown for the same reference pose and for (D) highest
recorded distribution mode in EPI_ISL_425886 at 77 ns.
(A) Violin plots of Cα RMSD values for the
reference (in gray) and the mutant (colored in blue)
Mpro, showing the 25th, 50th, and 75th
percentiles in dotted lines inside the kernel density
plots. Distributions are scaled by area and have been
sorted by the KDE dJ distance
(shown above each distribution) computed between each
sample and the reference protease. (B) General twisting
motion of the protomers observed across Mpro
samples, inferred from the highest probability density
reference protease pose using the ANM mode 7. (C) Mode 8
shown for the same reference pose and for (D) highest
recorded distribution mode in EPI_ISL_425886 at 77 ns.We further examined these samples by collecting single conformations
found at each of the local peaks (local energy minima) from their
individual KDE distributions. A cutoff KDE
dJ value of 60% was used as a prior
filter to limit the number of samples to be assessed visually. Using
this criterion, modes from the KDE distributions of 25 mutant samples
(marked by an asterisk symbol in Table ), in addition to the reference
protease, were used to extract conformations (Figure
) corresponding to different
local energy minima.
Figure 4
(a–r) 3D visualization of Mpro
conformations extracted from KDE distributions of
Cα RMSD. All panels share a common
color scale (ranging from blue through yellow to red)
depicting the Cα distances calculated
from an aligned high probability reference conformation
obtained at 67.84 ns (in white cartoon representation).
Regions of appreciably higher distances are circled in red
(domain III) and black (domain I) dotted lines.
(a–r) 3D visualization of Mpro
conformations extracted from KDE distributions of
Cα RMSD. All panels share a common
color scale (ranging from blue through yellow to red)
depicting the Cα distances calculated
from an aligned high probability reference conformation
obtained at 67.84 ns (in white cartoon representation).
Regions of appreciably higher distances are circled in red
(domain III) and black (domain I) dotted lines.For a fair comparison, the reference protease conformation corresponding
to the highest KDE peak was used as a comparator for all samples.
Thereafter, conformations displaying the highest displacements were
visually selected. From a bird’s eye view of the sampled
backbone RMSD in Figure , it
can be seen that the regions of highest divergence include part of
domain III and to a lesser extent domain I. Only one subunit is seen
to display more movement in each of the extracted high probability
density poses. Due to the homodimeric nature of the protein, a
permutation of structural superpositions of the mutant samples was
performed against the reference protease chains to determine the best
reference position, based on the lowest RMSD to domain II, which was
the least mobile protease domain during MD. Domain I also showed
notable movement, albeit of lesser magnitude than domain III. This
region mainly comprised the active site components. The fact that
motion happens about a more stationary center is reminiscent of a
fulcrum. The overall backbone dynamics and asymmetries are similar to
what is described in SARS-CoVMpro.
Estimation of the N-Finger Flexibility from MD Using All-Atom
RMSD
The N-finger region is an important structural foundation required for
the stabilization of the functional Mpro dimer, as reported
by a large body of work done in SARS-CoV.[97] The
N-finger from one protomer controls the activity of the dimer’s
alternate subunit.[98] Early work by Chou and
coworkers demonstrated the importance of the salt bridge found between
ARG4 (N-finger) and GLU290 (domain III) from alternate protomers in
subunit association.[99] Additional studies involving
deletion[53] and missense
mutations[51,100] have highlighted the
importance of the coronaviralMpro N-finger region in
enzymatic activity. Multiple studies have also reported the dimer to
be an active form of the enzyme,[51,100] with only one
protomer being active at a time.[52] Chen and
colleagues suggested that the N-terminus only stabilizes the active
form of the enzyme and is not a prerequisite for dimerization[101] of the SARS-CovMpro. More
specifically, an N-finger from each protomer interacts with the GLU166
residue from the alternate protomer to stabilize the catalytic
pocket.[102] More recently, Li and coworkers
showed that enzymatic activity is in fact coupled to the flexibility
of a short loop (residues 139–141) near the active site by
engineering an active tetramutant Mpro monomer.[103] Their work, which was focused on understanding
the stability of SARS-CoVMpro, obviates the need for a
similar inquisition in its SARS-CoV-2 homolog. Therefore, we started
our investigation of the N-finger stability within the dimeric protein
across all Mpro samples, as a means to detect any hint of
reduced enzymatic activity via a reduction of dimer stability.While the observation of dissociation would give a clear verdict about
inactivation (as experimentally reported in
SARS-CoV[98,100]), its absence during an MD
simulation may not inform us about inactivation. Independent
simulations performed in duplicate (not shown) of dimeric
Mpro containing the mutants G11A, E14A, and R298A and
even of the triple mutant (G11A/R298A/Q299A) did not dissociate over a
period of 100 ns, even though these were experimentally shown to be
inactive monomers in SARS-CoV, suggesting that particular equilibrium
states may need to be traversed before observing such events. Another
possibility may lie behind an incompletely understood mechanism of
dimer formation, for which two models have been proposed using
SARS-CoV, both suggesting that Mpro dimerization may be
substrate-induced.[102,104] We nevertheless
examined the 100 ns dynamics for any notable differences between the
SARS-CoV-2 variants.After fitting the proteins globally, no further fitting was done for the
N-finger RMSD calculation in order to better represent the N-finger
motions. As seen in the KDE plots (Figure ), the reference protein has two very
stable N-finger conformations, characterized by unimodal distributions
although the protomer equilibria are asymmetric. This tendency toward
asymmetry is seen in most of the cases, with the exception of samples
EPI_ISL_421506, EPI_ISL_417413, and to some extent EPI_ISL_425235. Due
to the existence of both forms of symmetry, with asymmetry largely
dominating the dynamics, it is probable that some form of dynamically
associated constraint may drive one N-finger into an equilibrium state
while the other is forced into another. While one may speculate that
EPI_ISL_421506 may be in an inactive state due to the perceived
symmetry, other parts of the enzyme are not asymmetric. Multimodal
distributions were observed in several cases, which clearly suggest
the presence of multiple local minima, with transitory states
occurring in between these modes. On the far right of Figure , samples EPI_ISL_423288,
EPI_ISL_422191, EPI_ISL_420241, EPI_ISL_425643, EPI_ISL_419256,
EPI_ISL_423007, and EPI_ISL_420306 show median differences above 1.1
Å between the protomers. EPI_ISL_423772 showed the most prominent
difference, which corresponds to a larger amount of time spent away
from a stable conformational equilibrium, even though a stable
equilibrium was also visited in chain A (the lowest mode for the same
sample).
Figure 5
Kernel density distributions of RMSD values for the N-finger
region across the mutant and reference protease complexes.
The violin plots are split into two for each protein
sample, showing the RMSD values for chains A (in blue) and
B (in red). The tips of the distributions mark the minimum
and maximum values for both chains combined in each
complex. Samples have been sorted by increasing median
distance between the chains, also shown (in Å) at the
top of each sample distribution.
Kernel density distributions of RMSD values for the N-finger
region across the mutant and reference protease complexes.
The violin plots are split into two for each protein
sample, showing the RMSD values for chains A (in blue) and
B (in red). The tips of the distributions mark the minimum
and maximum values for both chains combined in each
complex. Samples have been sorted by increasing median
distance between the chains, also shown (in Å) at the
top of each sample distribution.N-finger poses corresponding to the RMSD KDE peaks for EPI_ISL_423772
(M17I mutant) and the reference protease are shown in Figure . From the N-finger
visualization, we can observe that its distance from the high
probability reference protease equilibrium pose varies from 1.0 to
13.5 Å. While a subfigure (g) shows the highest distance reached
by the N-finger Cα atom, all other poses (a–f
and h) in Figure correspond
to multiple modes from the KDE distribution. Additionally, from the
chain labels, the asymmetry can also be observed. It is also
noteworthy to observe the transient nature of the α-helical
region (residues 10–15) to which the N-finger is connected,
both in the mutant and the reference. The STRIDE[105]
functionality within VMD (version 1.9.3) predicts the helical region
to be shorter (residues 14–16) in the reference protease. This
lengthening and shortening are likely related to interprotomer
stability and may potentially be associated with enzymatic activity.
It may therefore be a generalization of a similar functionality
observed in the chameleon loop that is directly associated with
enzymatic activity switching. The same protein displayed discernible
backbone motions in domains I and III (Figures and 4). While the
mutation occurs on a beta strand located in a core area of the
protein, the non-bonded interactions are similar for both M17 and
I17.
Figure 6
(a–h) 3D visualization of equillibrium N-finger
conformations from different Mpro samples. Each
panel shows equilibrium poses of the N-finger using
all-atom RMSD as the collective variable. The mutant chain
labels are shown, next to the sampled times. The reference
pose (chain A, 46.34 ns in gray) is used as a comparator
in each panel, while the samples are colored according to
their Cα atom displacements from the
reference, starting with blue (lowest) through yellow to
red (highest). The distance values are colored from blue
(low values) through yellow to red (highest values). The
distance between the first Cα atom (in
Å) of each sample from the equivalent position in the
reference protein is also shown.
(a–h) 3D visualization of equillibrium N-finger
conformations from different Mpro samples. Each
panel shows equilibrium poses of the N-finger using
all-atom RMSD as the collective variable. The mutant chain
labels are shown, next to the sampled times. The reference
pose (chain A, 46.34 ns in gray) is used as a comparator
in each panel, while the samples are colored according to
their Cα atom displacements from the
reference, starting with blue (lowest) through yellow to
red (highest). The distance values are colored from blue
(low values) through yellow to red (highest values). The
distance between the first Cα atom (in
Å) of each sample from the equivalent position in the
reference protein is also shown.It is clear that the N-finger can adopt a range of equilibrium
conformations, with some showing protomer symmetries and others not.
Above all, it can safely be assumed that all samples are enzymatically
active; otherwise, the SARS-CoV-2 would not be able to perform
catalysis and thrive in the human host. The distribution shapes may
therefore be different parts of a continuum along the potential energy
surface of active Mpro. It is also possible that a
different energy surface may manifest itself in the presence of a
substrate to emphasize the differences in distribution shapes. There
are hypotheses that support this claim, and they are elaborated
further in section .
Analysis of Residue Fluctuations and BC across
Mpro Samples
The analysis of residue fluctuations shows that there are generally
interspersed areas of low and high flexibility along Mpro
(Supporting Information, Figure S1). Some local areas
of higher flexibility were also seen, as is generally observed in
unstructured secondary structures.[106] Additionally,
from the lack of clustering between chains belonging to the same
sample for each segment of Mpro (Supporting Information, Figure S1; a cluster tree
was removed for figure visibility), we conclude that the residue
dynamics for the same domain between alternate chains of
Mpro are asymmetric in the dimeric apo form. Focusing
onto the specific regions of Mpro, it can be seen that the
N-terminal region of the N-finger generally displayed moderate residue
fluctuations despite being sandwiched at the interface of two
Mpro chains, thus agreeing with the N-finger RMSD
data. Domain I is generally moderately flexible at residue positions
22–25, 33, 34, 92, and 93. Higher flexibility was globally
observed along the intervals 45–65 and 71–74 and at
position 76. Very high fluctuations were recorded for a subset of
mutants at positions 46–54, most particularly for only one
chain among the mutants EPI_ISL_416720, EPI_ISL_423642,
EPI_ISL_415503, and EPI_ISL_425886, thus reinforcing the observation
of asymmetry between chains. Mutations from samples EPI_ISL_416720 and
EPI_ISL_425886 were already found to lead to increased backbone
fluctuation using Cα RMSD. The Y237H mutation in
EPI_ISL_416720 introduces two carbon H-bonds (one with L272 and
another with V233), in addition to the pi-alkyl interaction that is
present in both the reference and this mutant, which seem to hold the
solvent-exposed helices together in domain III. In a review by
Horowitz and Trievel, the carbon H-bond was highlighted as an
underappreciated interaction that is otherwise widespread in proteins,
with the ability to form interactions as strong as conventional
H-bonds via polarization.[107] It is possible that
such an increase in interaction may improve the stability around this
region in domain III for the Y237H mutation. In the case of the D248E
mutation in EPI_ISL_425886, the D248 side chain is H-bonded to Q244 in
the reference structure, possibly stabilizing the helical structure.
This interaction is absent upon mutating to an E248. In the last frame
from MD, the side-chain epsilon oxygen was found to interact with its
backbone hydrogen atom, indicating a decreased stabilization of the
helical region of domain III. T201A in EPI_ISL_423642 abolishes the
H-bond that is otherwise present between T201 and the backbone oxygen
atom of E240, very likely weakening their interaction. The V157I
mutation in EPI_ISL_415503 does not significantly alter the non-bonded
interactions but occurs on a beta strand on domain II. A unique
behavior was observed in the case of chain B of EPI_ISL_423772, where
the leading residues of the N-finger were the most flexible while the
rest of the protein was the least flexible across all samples.
However, the M17I mutation does not significantly change the
non-bonded interaction in EPI_ISL_423772 but occurs on a beta strand
in domain I. Domain II is most flexible across all samples at residue
position 153–155. Moderate flexibility is generally observed at
residue positions 100, 107, 119, 137, 141, 142, 165–171, 178,
and 180. The linker region was generally highly flexible in all cases
within the region spanned by residues 188–197. Notably higher
fluctuations were observed at position 185 in samples EPI_ISL_420610
(chain B) and EPI_ISL_423007 (chain A). Across all domains, parts of
domain III contain the most flexible residues within Mpro.
It is highly flexible at residue positions 222–224,
231–233, 235, 236, 244, 245, and 273–279. The high
fluctuations observed at C-terminus residues may not be very
informative in our case as they freely interact with the solvent and
do not have a strong enough network of non-bonded interactions with
the Mpro domains. As a whole, from the empirical cumulative
distribution (ECD) of averaged root-mean-square fluctuation (RMSF)
values across all Mpro samples, the top 5% positions (most
variable regions) comprise residues 222, 277, 223, 154, 47, 72, 50,
224, 232, 64, 279, 236, 235, 244, and 51, with values ranging from
0.386 to 0.245 nm. The bottom 5% (most stable regions) of the
distribution comprise positions 149, 146, 147, 174, 29, 113, 163, 39,
124, 150, 7, 175, 38, 161, and 173, with RMSF values ranging from
0.057 to 0.077 nm. On the 3D mapping (Figure A), it can be seen that the regions
with the highest flexibility are solvent-exposed surfaces, comprising
loops or parts of helices connected by loops. The central core of the
enzyme has the lowest flexibility, most likely to provide structural
stability to the functional dimer. Catalytic residues (HIS41 and
CYS145) are connected to these stable core residues on one side.
However, HIS41 is connected on the other side to a more mobile
structure composed of a 310 helix connected by loops on
each end, which forms a lid structure, similar to what is described
earlier for previous human coronavirus strains.[108,109]
Further discussions around the substrate binding site are given in
section .
Figure 7
3D mapping of averaged values for (A) RMSF and (B) average
BC, computed across all
Mpro samples. Only the extremes (top and
bottom 5% of the ECD values across all samples) of
averaged values are labeled for each metric. The lowest
averaged values are in yellow, while the highest ones are
in red. The last three C-terminus residues (in white) were
not mapped by RMSF as their high values would mask other
values. While only one protomer is detailed, the data is
applicable to both protomers. The other protomer is
represented as a gray surface.
3D mapping of averaged values for (A) RMSF and (B) average
BC, computed across all
Mpro samples. Only the extremes (top and
bottom 5% of the ECD values across all samples) of
averaged values are labeled for each metric. The lowest
averaged values are in yellow, while the highest ones are
in red. The last three C-terminus residues (in white) were
not mapped by RMSF as their high values would mask other
values. While only one protomer is detailed, the data is
applicable to both protomers. The other protomer is
represented as a gray surface.BC is a network centrality metric that is maximized when
a large number of nodes (Cβ or GLY
Cα atoms) traverse a single node along geodesic
paths to reach a large number of other nodes within a network. When
averaged from MD frames, this metric has been shown to be
approximately inversely related to RMSF.[110] These
values were very similar across all samples (Supporting Information, Figure S2). For this reason,
the discussion of our findings will be around the overall
BC behavior recorded across all samples. High
and low BC values are present within all domains
(Supporting Information, Figure S2). The N-finger was
found to be generally composed of high BC residues,
most likely due to its high degree of non-bonded interactions between
the protease chains. A large portion of domains I and III have low
BC values, most likely due to the comparatively
lower amount of contact with protein surfaces, which allows for a
relatively higher mobility compared to domain II. The top 5% highest
averaged BC values across all Mpro samples
were either found in the monomer core regions or at the dimer
interface, as seen in Figure B. These comprised residue positions 17, 128, 115, 111,
112, 14, 18, 39, 11, 13, 146, 148, 147, 139, 6, and 140 in a
descending order of overall averaged BC values,
ranging from 10480.987 to 6064.650. The residue positions within the
lowest 5% overall averaged BC values comprise the
positions 96, 72, 92, 258, 64, 244, 47, 59, 228, 56, 76, 60, 255, 78,
and 232, ranging from 8.095 to 46.251. A complete listing of the top
BC residues for each protomer is given in
Table S2. Compared to the high BC
areas that correspond to very well maintained communication residues
(which may well be actual functional sites[110]), the
low BC values represent residues that are least
important for maintaining the flow of communication across the protein
due to the transient nature of their medium- to long-ranged path
lengths. From the computed sample average BC values
and as seen in the Supporting Information, Table S2, residue positions
17 and 128 were found to occur as the most common first two residues
in all cases. In the previous work done in human heat shock protein,
high BC residues found within cavities were found to
correspond to allosteric hot spots, as these had been independently
verified by the sequential application of external forces on protein
residues using the perturbation response scanning (PRS) method. In our
case, however, these two positions are not found in cavities and are
rather buried structural units that are not very mobile in the short
to medium range. The high BC at M17 is possibly due
to the increased stability imparted by their presence at a dimer
interface. Due to its presence between the N-finger and the beginning
of domain I, it is likely a pivot point for the N-finger. Due to the
high centrality of these residues, it is possible that mutations
leading to the alteration of their physicochemical activity may be
accompanied by a decrease in dimer stability. EPI_ISL_423772, which
contains the M17I mutation displayed a relatively high backbone
mobility, more particularly along one of the α-helices of domain
III. Even though both residues are hydrophobic, an increase in
backbone motion was observed. It is possible that by the replacing of
the unbranched methionine side chain[111] by the
branched amino acid isoleucine, the potential for local mobility is
reduced, possibly leading to far-ranging compensatory effects.
Estimation of Interprotomer Distances Using COM Distance
The COM distance distributions estimated from all atoms between the
Mpro protomers (Figure ) indicate that the distance between
them can vary from over about 2.4 to 2.8 nm globally, and the presence
of modes of differing height suggests at least two different
equilibrium chain COM distances in these cases. About half of the
samples displayed KDE dJ distances that
were very close to that of the reference protease, corresponding to a
distance value of about 0.5 nm. More specifically, in isolates
EPI_ISL_421763 and EPI_ISL_419984, a significant proportion of the
sampled conformations depicts the presence of closer chain COM values,
though the percentage of such conformations is lower in
EPI_ISL_419984. 3D visualization showed that the THR70 side chain
formed an additional (but transient) H-bond with either VAL73 or with
the MET17 backbone oxygen in EPI_ISL_421763. In addition to increasing
interconnectivity within the β hairpin, temporary H-bonding
interactions between MET17 and THR70 connecting the N and C-termini
from domain I may be the main reason for the lowest interprotomer COM
distance. We note that ALA70 cannot form such a bridge with MET17 due
to its hydrophobic side chain. From Figure , it can be seen that domain III is a
key player associated with an increased COM distance, given that the
largest displacements are experienced there. Therefore, it is not
impossible that THR70 may have remotely interfered with the motion of
domain III. Longer MD simulations may be needed to ascertain this
observation. Isolate EPI_ISL_415610 (bearing mutation A193V) displayed
the highest median interprotomer COM distance. The main
physicochemical difference in A193V is the higher molecular weight and
volume of the valine side chain; however, the mutation occurs in the
solvent-exposed linker that connects domains II and III. Weighted
residue contact mapping showed that the main difference was a six-fold
increase in the residue contact frequency (0.72 in the mutant versus
0.17 in the reference) between residue 193 and THR196, in chain A.
Samples starting from EPI_ISL_425655 onward were sampled for modes,
aligned to the reference protease, and colored by their
Cα–Cα distance in Figure . Only samples
showing the most marked differences from the reference are shown.
Superposition adjustment was made as previously explained to obtain
the most probable structural alignment for the dimer. Due to the
observation of a generally higher deviation of domain III from the
rest of the protein, a correlation was calculated at a 95%
significance level for all short-listed equilibrium conformations
against the interprotomer COM distance, and as intuited, a
non-negligible correlation of 0.59 with a p-value of
0.003 was obtained. It is very likely that the domain III COM motions
could drive larger conformational changes due to their size and
flexibility, should their stabilizing interprotomer interactions
weaken. This is in good agreement with previous studies suggesting its
involvement in regulating of dimerization, more specifically with the
proposition by Hu and colleagues that domain III might act as a
“motivator” controlling N-finger movement in
SARS-CoV.[112] Another notable observation is
the relatively higher mobility of one of the domain III
α-helices and one of the 310 helices in some mutant
equilibrium conformations, when compared to the homologous positions
in the reference equilibrium conformation. COM distance can be
influenced to a certain extent by the shape of domain arrangements as
it can modulate the COM. As such, this may not represent other changes
to the protein geometry. Therefore, we proceeded to quantify
interdomain angles as a next step.
Figure 8
Distributions of interprotomer COM distances across samples,
arranged in an ascending order of the KDE
dJ. The reference
protease is in gray, while the mutants are colored from
yellow to red in an increasing order of distance from the
reference KDE.
Figure 9
(a–j) 3D visualization (top view) of the interprotomer
COM distances across samples. Domain III from each
protomer is circled in black, while the chains are colored
white and salmon in the reference (a). All other samples
are colored according to their Cα atom
displacements from the aligned reference, starting with
blue (lowest) through yellow to red (highest). The
interprotomer COM distance is depicted by horizontal
yellow dashes (and yellow font). Domain III COM distance
is also shown, in red dashes and font.
Distributions of interprotomer COM distances across samples,
arranged in an ascending order of the KDE
dJ. The reference
protease is in gray, while the mutants are colored from
yellow to red in an increasing order of distance from the
reference KDE.(a–j) 3D visualization (top view) of the interprotomer
COM distances across samples. Domain III from each
protomer is circled in black, while the chains are colored
white and salmon in the reference (a). All other samples
are colored according to their Cα atom
displacements from the aligned reference, starting with
blue (lowest) through yellow to red (highest). The
interprotomer COM distance is depicted by horizontal
yellow dashes (and yellow font). Domain III COM distance
is also shown, in red dashes and font.
Estimation of Interdomain Angles in Each Protomer of
Mpro
The angle formed between domains I, II, and II has a relatively high
information content for Mpro dynamics due to the
variability that it captures. While both chains A and B behave
similarly in the reference, a higher degree of interdomain angle
variation is seen across the mutants, comprising skewed distributions
and angle asymmetries between protomers (Figure
). The angle distributions are
similar to those of the reference in some cases but are also more
variable in others. The difference between interdomain angles was
computed for each protomer pair and used to sort the samples in an
increasing order of divergence, such that the most asymmetric
distributions occur on the right. Among the divergent samples, using a
median difference of 2 degrees as cutoff, chains A and B were found to
be most divergent in samples EPI_ISL_423280, EPI_ISL_424470,
EPI_ISL_418075, EPI_ISL_425319, EPI_ISL_420510, EPI_ISL_425284,
EPI_ISL_423642, EPI_ISL_425342, EPI_ISL_425643, EPI_ISL_421312,
EPI_ISL_421515, EPI_ISL_419710, EPI_ISL_425886, EPI_ISL_418269,
EPI_ISL_413021, EPI_ISL_420181, EPI_ISL_417413, EPI_ISL_423007,
EPI_ISL_422860, EPI_ISL_419256, EPI_ISL_425132, EPI_ISL_425839,
EPI_ISL_419756, EPI_ISL_420241, and EPI_ISL_421005 in an ascending
order of difference. This provides additional support behind our
general observation of the protomers generally behaving in an
independent manner in SARS-CoV-2Mpro, as similarly
described in SARS-CoV. As explained earlier, symmetric behavior may be
part of the protease mechanics in the absence of a substrate. Due to
the richness of information retrieved from the measurement of
interdomain angles, we propose that this metric may also help assist
the in silico characterization (or differentiation) of Mpro
variants from future strains of SARS-CoV-2 should any particular
virus-associated phenotype becomes available.
Figure 10
Kernel density distributions of interdomain angles (domains
I–II–III) across the mutant and reference
protease complexes. The violin plots are split into two
for each protein, showing the interdomain angles for
chains A (in blue) and B (in red). The tips of the
distributions mark the minimum and maximum values for both
chains combined in each protein complex.
Kernel density distributions of interdomain angles (domains
I–II–III) across the mutant and reference
protease complexes. The violin plots are split into two
for each protein, showing the interdomain angles for
chains A (in blue) and B (in red). The tips of the
distributions mark the minimum and maximum values for both
chains combined in each protein complex.Samples of the most divergent conformations from the reference protease
are shown in Figure .
Using domain II as a fulcrum, it can be seen that interdomain angles
are not necessarily planar. On the basis of divergence from the
reference protease, it can be observed that domain III moves more than
domain I around the fulcrum. While the equilibrium for EPI_ISL_418075
is quasi-planar (even though it displayed the widest range of
interdomain angles), the equilibrium displaying the largest angle
shown (183.2 degrees) was recorded from EPI_ISL_421005 (from the
explement of the shown angle in Figure c).
Figure 11
(a–f) Visualization of interdomain angles (domains
I–II–III) across the mutant and reference
protease complexes. The reference is in white, while the
mutant samples are colored according to their
Cα atom displacements from the
reference protease, starting with blue (lowest) through
yellow to red (highest). Angles are shown in degrees. Only
one of the protomers is shown in each case.
(a–f) Visualization of interdomain angles (domains
I–II–III) across the mutant and reference
protease complexes. The reference is in white, while the
mutant samples are colored according to their
Cα atom displacements from the
reference protease, starting with blue (lowest) through
yellow to red (highest). Angles are shown in degrees. Only
one of the protomers is shown in each case.
Analysis of Pocket Compaction and Proposition of a Dual Allosteric
Site in SARS-CoV-2
The substrate binding pocket, nested in the cleft between domains I and
II,[112] is composed of the very flexible loops
intertwined with the catalytic dyad residues HIS41 and CYS145. With
the assistance of HIS41, residue CYS45 acts as an electrophile during
the first step of hydrolysis.[36] Previously, studies
in SARS-CoV have reported that components of this pocket—the S1
subsite (residues PHE140, HIS163, MET165, GLU166, and HIS172), an
oxyanion hole (main-chain amides of residues GLY143, SER144, and
CYS145), and an oxyanion loop (residues SER139-LEU141)—play an
important role in stabilization of the binding site in the active form
of Mpro in SARS-CoV.[95,102,112] Conversely, inactive Mpro is
characterized by the collapse of both the S1 subsite and the oxyanion
hole,[95] whereby the loop moves inward leaving
no space for the tetrahedral intermediate product, which normally
results from proteolytic cleavage of a substrate.[112,113] A
more recent study by Li et al. positioned the same loop (also known as
the “chameleon loop”) as a switch by producing an active
monomer,[103] whereby its stabilization into a
310 helix was associated with the inactive enzyme.
The stacking of the PHE140 phenyl ring against the HIS163imidazole
ring (found in the S1 subsite) is a key interaction that maintains the
latter uncharged over a range of pH changes.[100]
Another important feature of the binding site is the presence of a
very flexible loop at residue positions 45–51, which form
subsites S2 and S3′, termed the “TAEDMLN loop” in
SARS-CoV. In SARS-CoV-2, this loop has mutated to
“TSEDMLN”.[84] The loop has
further mutated to “TSEEMLN” in sample EPI_ISL_425242,
which we speculate may have an impact on substrate binding affinity or
even specificity. Different cavity calculation methods were then used
to scan the reference protease for any potentially unknown functional
sites, namely, FTMap and PyVOL. Their predictions were concordant for
all cases, except for the substrate binding site, which FTMap did not
detect (Figure ). However,
a very good coverage of interprotomer cavities was obtained by
combining both methods, but more interestingly, a potential allosteric
site was found at the interprotomer interface, next to the substrate
binding site. It is mirrored across each side of the interacting
protomers. For this reason, the dynamics of both pockets were
examined, as this could play an important role in the dimerization
properties of the protomers. The substrate binding site from each
protomer was also examined due to its already known functional
importance in catalysis. While the substrate binding pockets are
easily defined as belonging to a given protomer, the interfacial
pocket is composed of residues from each chain, namely, residues 116,
118, 123, 124, 139, and 141 on chain A and residues 5–8, 111,
127, 291, 295, 298, 299, 302, and 303 on chain B. The mirrored
interfacial pocket comprised the same residue positions but for the
opposite chain labels. When compared to available literature for the
SARS-CoV, multiple residues corresponded to stabilizing elements of
the catalytic pocket. For instance, residues 139 and 141 form part of
the “chameleon” switch, which assumes a 310
helix form when inactive, while residues 5–8 are part of the
N-finger, which is essential for the stability and activity of the
alternate protomer. The FTMap probes formed identical cross-clusters
composed of probe IDs 1amn, 1ady, 1eth, and 1acn at each site, which
respectively correspond to the compounds methylamine, acetaldehyde,
ethane, and acetonitrile. As noted from the probe chemical
compositions, this dual site has the potential to accommodate
compounds of low molecular weight, with no (or probably limited)
rotatable bonds and low octanol–water partition
coefficients.[114] The substrate binding site,
as determined by PyVOL, comprised residues 25, 26, 27, 41, 44, 49,
140, 141, 142, 143, 144, 145, 163, and 166 in each protomer.
Figure 12
Pocket detection using combined predictions from FTMap and
PyVOL. FTMap probes are shown as stick figure
representations, while those from PyVOL are shown as
surfaces. The protomers are depicted as cartoon
representations, in gray and light orange.
Pocket detection using combined predictions from FTMap and
PyVOL. FTMap probes are shown as stick figure
representations, while those from PyVOL are shown as
surfaces. The protomers are depicted as cartoon
representations, in gray and light orange.From the distributions of Rg values for the
catalytic site (Figure ),
we promptly observe the asymmetry in compaction between substrate
binding pockets coming from each protomer in each sample. Partial
symmetry is seen in few cases, such as EPI_ISL_419710, EPI_ISL_425242,
EPI_ISL_423007, 416720, EPI_ISL_421380, EPI_ISL_425284, and
EPI_ISL_419256. In the reference sample, we observed a shift in
equilibrium Rg, where one protomer
oscillates around a value while its counterpart also explores two
others. Multimodal distributions indicate the presence of more than
one equilibrium, which hints at cavity expansion and compaction
movements. The most compact substrate binding cavities are observed in
one of the protomers from sample EPI_ISL_425489 and to some extent
samples EPI_ISL_418082 and EPI_ISL_423642.
Figure 13
Kernel density distributions of
Rg values for the
substrate binding site from each protomer of
Mpro, arranged in an ascending order of
the difference in median from each chain. The differences,
shown at the top, are in nanometers. Chain A values are in
red, while chain B values are in blue. The maxima and
minima are across protomers. Quartiles for each binding
site are shown as dotted lines.
Kernel density distributions of
Rg values for the
substrate binding site from each protomer of
Mpro, arranged in an ascending order of
the difference in median from each chain. The differences,
shown at the top, are in nanometers. Chain A values are in
red, while chain B values are in blue. The maxima and
minima are across protomers. Quartiles for each binding
site are shown as dotted lines.Some substrate binding cavities of interest are shown in Figure . Shi and
colleagues described several features surrounding the substrate
binding pocket in SARS-CoV.[100] One of them is the
requirement of a key interaction between residues PHE140 and HIS163
for the stability and activity of the protease. This same region is
shown in Figure , using
poses identified from local energy minima for (a) the reference
protease, (b) EPI_ISL_420610, and (c) EPI_ISL_423642. The highest
probability density pose (in white) is used as a comparator. In Figure a, the stabilizing
PHE140 and HIS163 display their stacked rings and the stabilizing
H-bonds linking the N-finger SER1 to GLU166 and HIE172. ASN28 is also
shown to stabilize the positioning of CYS145 in all cases. In
EPI_ISL_420610 (N274D), however, we observed a yet undescribed
relatively stable pose for PHE140. It basically loses its described
interaction and flips to hydrophopically interact with the C-terminal
helix, via ARG298 (pi-alkyl interaction), GLN299 (amide-pi stacked
interaction and a carbon H-bond), and MET6 (pi-sulfur
interaction).[65] This has the effect of
pulling the oxyanion loop to face away from the catalytic residues,
thus making the pocket wider. Also shown is the coupled movement of an
opposite loop (containing ARG188) by a distance of 8.9 Å,
compared to the reference protease conformation. In EPI_ISL_423642,
the oxyanion loop showed signs of stabilization, which may correlate
with a of lack enzymatic activity for that protomer. As this was
observed for only one of the protomers and given that the isolate was
obtained from an infected individual, for this sequence, we cannot
infer global inactivation.
Figure 14
3D visualization of the Mpro substrate binding
site poses of interest. (a) Local minimum reference from
the reference protease conformation. (b) Detection of a
non-canonical pose for residue 140 (in purple) together
with a significant loop opening (dashed red line) in
EPI_ISL_420610. (c) Early stages of a 310 helix
formation in EPI_ISL_423642. Another local minimum
(highest probability density) pose from the reference
protease (in white) is used as a comparator in all panels.
Chain A is in pale blue, while chain B is in green. The
key interacting residue for PHE140–HIS163 is
depicted as yellow sticks.
3D visualization of the Mpro substrate binding
site poses of interest. (a) Local minimum reference from
the reference protease conformation. (b) Detection of a
non-canonical pose for residue 140 (in purple) together
with a significant loop opening (dashed red line) in
EPI_ISL_420610. (c) Early stages of a 310 helix
formation in EPI_ISL_423642. Another local minimum
(highest probability density) pose from the reference
protease (in white) is used as a comparator in all panels.
Chain A is in pale blue, while chain B is in green. The
key interacting residue for PHE140–HIS163 is
depicted as yellow sticks.As done for the substrate binding cavity, the degree of compaction of the
interprotomer cavities was also measured. Here as well, the
distributions are asymmetric (Figure ). It is tempting to do a comparison
between the substrate binding cavity and the interprotomer pockets.
However, when the median Rg values
obtained for the substrate binding pocket are correlated (using
Spearman’s correlation) against the corresponding medians for
the interprotomer pocket across all samples, no significant
correlation was obtained, even though in our case, residue 141 was
shared between both pockets. On an individual level, however, there is
a significant degree of correlation between the interprotomer pocket
and the substrate binding site, ranging from −0.68 to 0.54
(using Spearman’s correlation), with p-values
< 0.01 and absolute correlations > 0.1 for 64.9% of the sample
comparisons (two substrate binding sites vs. two interprotomer pockets
for each sample). As the interprotomer pocket is mirrored between
chains, this may explain the negative correlations. From this finding,
we propose that the interprotomer pockets may play an important role
in affecting the degree of compaction of the binding cavity and vice
versa. This suggests the possibility of a potentially bivalent
modulation of the interprotomer pocket and the substrate binding site.
As Rg only captures the overall degree of
compaction, it may not entirely inform us about the cavity volume
accessible to an allosteric modulator; however, this may be an
interesting lead for allosteric modulator targeting. Snapshots where
large deviations were obtained (compared to a high probability density
reference conformation) are shown in Figure . The Rg
values are used as radii for the spheres shown in Figure , centered at the COM of the
allosteric pockets. Different distribution modes for EPI_ISL_423772
(a–c) point to a greater movement of an α-helical region
of domain III, from only one protomer at a time, as seen previously.
The equilibrium from EPI_ISL_425886 and to some extent that of
EPI_ISL_420510 behaved similarly. The second equilibrium from the
reference sample (d) performed very similarly to the reference of
highest probability density used as a comparator (in white) for all
the shown cases. From all the equilibrium poses, it can be seen that
the differences in subpocket Rg between
each protomer pair are not associated with similar displacements of
the domain III helix.
Figure 15
Kernel density distributions of
Rg values across samples
for the mirrored interfacial (and potentially allosteric)
pockets. The differences, shown at the top, are in
nanometers.
Figure 16
(a–f) Visualizations of the
Rg values (Å) of
the mirrored interfacial pockets in various samples. The
reference (obtained after 63 ns) is in white, while the
mutant samples are colored according to their
Cα atom displacements from the
reference protease, starting with blue (lowest) through
yellow to red (highest). Regions of high divergence are
circled. The spheres are centered at the COM of the pocket
residues, while their radii are the
Rg values in Å.
Kernel density distributions of
Rg values across samples
for the mirrored interfacial (and potentially allosteric)
pockets. The differences, shown at the top, are in
nanometers.(a–f) Visualizations of the
Rg values (Å) of
the mirrored interfacial pockets in various samples. The
reference (obtained after 63 ns) is in white, while the
mutant samples are colored according to their
Cα atom displacements from the
reference protease, starting with blue (lowest) through
yellow to red (highest). Regions of high divergence are
circled. The spheres are centered at the COM of the pocket
residues, while their radii are the
Rg values in Å.
Investigating Correlated Residue Motions Using DCC
In order to compare the DCC across all samples, the DCC matrices were
linearized before calculating pairwise correlations and clustering the
final matrix (Figure ).
While abstracting out the intricate details of residue correlation,
clusters of correlated samples inform us on the global similarity of
correlated protein motion across each pair of samples. From Figure , it can be seen
that the samples are generally highly correlated, with the exception
of samples EPI_ISL_415503, EPI_ISL_416720, EPI_ISL_419984,
EPI_ISL_420241, EPI_ISL_422184, and EPI_ISL_418075, which form a
subcluster of moderately correlated samples. Both EPI_ISL_415503 and
EPI_ISL_416720 have been described in section as having higher RMSF
values at positions 46–54 due to the mutations V157I and Y237H.
Mutation R105H (occurring on a loop region) in EPI_ISL_419984 leads to
the loss of an H-bond with F181 but forms a pi-pi stacking interaction
with Y182. The most likely explanation for a difference in this case
may be related to the sampling of at least two conformational
equilibria using protomer COM distance, as described in section . In
sample EPI_ISL_420241, the P184L mutation occurs in a solvent-exposed
loop, and no major non-bonded interactions were detected from the side
chains or backbone atoms. The main reason for the observed difference
is due to the increased divergence in interdomain angles sampled from
MD. In the case of EPI_ISL_422184, as explained in section , it was found that
the N-finger from one chain had moved by a larger extent at the end of
the simulation (at about 93 ns), diminishing its contacts with the
alternate protomer, to interact more with its own protomer. This may
be attributable to the S301L mutation, which reduces H-bonding at the
end of the C-terminal helical structure. In the reference, four
H-bonds are formed between S301 and the backbone oxygen atoms of V297
and R298, whereas two H-bonds are formed by L301. Sample
EPI_ISL_418075 had the lowest positive correlation to all samples.
Upon closer examination, it was found that the terminal α-helix
for each protomer was getting gradually destabilized toward the end of
the simulation. This is very likely an artifact linked to the absence
of the two C-terminal residues, as the residue is a solvent-exposed
protein and displays very similar residue interactions at position 255
in both the reference and the mutant, even though A255V occurs on a
helix.
Figure 17
Heat map of correlations obtained from the linearized DCC
matrices for the mutants and the reference Mpro
clustered using the Euclidean distance.
Heat map of correlations obtained from the linearized DCC
matrices for the mutants and the reference Mpro
clustered using the Euclidean distance.The main observation across all samples is that residues within their
individual domains are highly positively correlated within their
respective domains. Domains I and II are seen to share a high degree
of correlation with each other, behaving as a single unit most likely
to maintain the integrity of the catalytic surface. From the MD
simulations, we find that domain III is generally not positively
correlated with any of the other two domains, can even be negatively
correlated with itself on the alternate chain, and may suggest a
degree of independence for domain III in terms of dynamics and
possibly function as well, at least in its dimeric apo form.
Coarse-Grained Simulations of SARS-CoV-2 Mpro Structures
Reveal Relationships between Mutational Patterns and Functional
Motions
To complement all-atom MD simulations and obtain a more granular
description of structural dynamics in studied systems, we performed
coarse-grained (CG) simulations of the SARS-CoV-2 main protease
structures in the free and ligand-bound forms using the CABS
approach[67−71] (Figure ). By using a large number of independent CG
simulations, we obtained conformational dynamics profiles for the
studied systems and analyzed these distributions in the context of
examined mutations.
Figure 18
Conformational dynamics and collective motion slow-mode
profiles of the SARS-CoV-2 main protease structures. (A)
Computed root-mean-square fluctuations (RMSF) from CG MC
dynamics simulations of the free enzyme of the SARS-CoV-2
main protease (PDB IDs 5RVF and 6Y2E).
The profile is shown in magenta lines. The positions of
the residues undergoing mutations are shown in red filled
circles (A7, G15, M17, V20, T45, D48, M49, R60, K61, A70,
G71, L89, K90, P99, Y101, R105, P108, A116, A129, P132,
T135, I136, N151, V157, C160, A173, P184, T190, A191,
A193, T196, T198, T201, L220, L232, A234, K236, Y237,
D248, A255, T259, A260, V261, A266, N274, R279, and S301).
(B) PCA analysis of functional dynamics SARS-CoV-2 main
protease structures. The slow-mode shapes are shown as
mean square fluctuations averaged over the first five
lowest frequency modes (in magenta lines). The residues
undergoing mutations are shown in red filled circles as in
panel A. (C) Structural map of the functional dynamics
profiles derived from CG-MD simulations in the SARS-CoV-2
main protease (PDB IDs 5RVF and 6Y2E).
The color gradient from blue to red indicates the
decreasing structural rigidity and increasing flexibility
as averaged over the first five low frequency modes. The
positions of the residues undergoing mutations are shown
in spheres colored according to their level of
rigidity/flexibility in slow modes (blue-rigid,
red-flexible). The locations of the protease domains I,
II, and III are indicated. The catalytic residues HIS41
and CYS145 are shown in sticks. (D) Rotated view for the
structural map of functional dynamics profiles in the
SARS-CoV-2 main protease with sites of mutations in
spheres.
Conformational dynamics and collective motion slow-mode
profiles of the SARS-CoV-2 main protease structures. (A)
Computed root-mean-square fluctuations (RMSF) from CG MC
dynamics simulations of the free enzyme of the SARS-CoV-2
main protease (PDB IDs 5RVF and 6Y2E).
The profile is shown in magenta lines. The positions of
the residues undergoing mutations are shown in red filled
circles (A7, G15, M17, V20, T45, D48, M49, R60, K61, A70,
G71, L89, K90, P99, Y101, R105, P108, A116, A129, P132,
T135, I136, N151, V157, C160, A173, P184, T190, A191,
A193, T196, T198, T201, L220, L232, A234, K236, Y237,
D248, A255, T259, A260, V261, A266, N274, R279, and S301).
(B) PCA analysis of functional dynamics SARS-CoV-2 main
protease structures. The slow-mode shapes are shown as
mean square fluctuations averaged over the first five
lowest frequency modes (in magenta lines). The residues
undergoing mutations are shown in red filled circles as in
panel A. (C) Structural map of the functional dynamics
profiles derived from CG-MD simulations in the SARS-CoV-2
main protease (PDB IDs 5RVF and 6Y2E).
The color gradient from blue to red indicates the
decreasing structural rigidity and increasing flexibility
as averaged over the first five low frequency modes. The
positions of the residues undergoing mutations are shown
in spheres colored according to their level of
rigidity/flexibility in slow modes (blue-rigid,
red-flexible). The locations of the protease domains I,
II, and III are indicated. The catalytic residues HIS41
and CYS145 are shown in sticks. (D) Rotated view for the
structural map of functional dynamics profiles in the
SARS-CoV-2 main protease with sites of mutations in
spheres.RMSF of protein residues revealed the distribution of stable and flexible
regions, thereby allowing the assessment of the extent of mobility for
mutational sites (Figure ,
panel A). In domain I, the most flexible residues are in the region
45–75, while for domain II, the L2 loop (residues
165–172) and L3 (residues 185–200) located around the
substrate binding pocket also harbor a significant number of flexible
positions. In addition, we observed significant fluctuations for
surface residues 153–155 and 274–277 (Figure , panel A). Of particular
interest is the distribution of stable and flexible residues in the
substrate binding pocket. Some pocket residues from domain I (T24,
T25, T26, and L27) experience moderate fluctuations, while several
other sites (M49 and Y54) displayed considerably higher mobility.
Another group of the substrate binding site residues from domain II
(S139, F140, and Q189) also exhibited appreciable fluctuations, while
residues G143, S144, H164, H163, E166, P168, and C145 showed only
moderate changes and remained stable in CG simulations (Figure , panel A). Notably
and as expected, the catalytic residues C145 and H41 from the
substrate binding site also remained stable. The analysis generally
showed that domain II residues were stable, while domain III (residues
198–303) showed more flexibility, especially in the peripheral
solvent-exposed regions. This domain is involved in regulation of the
dimerization through a salt bridge interaction between GLU290 of one
protomer and ARG4 of the other protomer. Importantly, we found that
these residues remained extremely stable in simulations.
Interestingly, buried positions subjected to mutations exhibited
different levels of flexibility. While positions A7, V20, A116, A129,
T135, I136, V157, C160, A173, and T201 were very stable, other buried
sites with registered mutations in domain III (A234 and A266) showed
larger fluctuations. It is worth mentioning that the interfacial
residue A7 in the N-finger region important for enzymatic activity
showed an extreme level of rigidity. Simulation-derived
residue–residue couplings were evaluated using principal
component analysis (PCA). By comparing slow-mode profiles, we found
that functionally significant patterns can be yielded with up to the
five slowest eigenvectors that account for 90% of the total variance
of the dynamic fluctuations. The functional dynamics profile averaged
over the five slow modes showed that domain I (residues 10–99)
and domain III (residues 198–303) are mostly mobile in
functional motions and can undergo large structural changes (Figure , panel B). At the
same time, domain II (residues 100–182) is mostly stable during
functional dynamics. The distribution of mutational sites clearly
indicated the existence of two major clusters. One cluster of
mutations is located in highly mobile regions of domain III (T198I,
T201A, L220F, L232F, A234V, K236R, Y237H, D248E, A255V, T259I, A260V,
V261A, A266V, N274D, R279C, and S301L). These residues involved in
protein motion are likely under different evolutionary constraints
compared to other functional sites. Another cluster of mutations is
distributed in domain II and includes 3 subgroups: a group of fully
immobilized positions (A116V, A129V, P132L, T135I, and I136V), a group
of bridging (hinge-like) sites that connect rigid and flexible regions
(Y101C, R105H, P108S, N151D, V157I/L, C160S, A173V, and P184L/S), and
a group of mostly mobile residues (T190I, A191V, and A193V) (Figure , panel B). The
group of potential hinge sites may be important for controlling
regulatory motions, and mutations in these regions (such as V157I/L
and P184L/S) may affect global movements in the protease and its
enzymatic activity. It is particularly important to dissect the
connection between the function of some key residues and their
contribution in collective movements. The dimerization residues (R4,
M6, S10, G11, E14, N28, S139, F140, S147, E166, E290, and R298) are
characterized by different local flexibilities but tend to correspond
to low moving regions of the protein in collective motions (Figure , panels C and D).
The key substrate binding residues (H163, H164, M165, E166, and L167)
are located at the very border of structurally immobilized and more
flexible regions, and as such may constitute a hinge region that
controls cooperative movements. Notably, some other binding site
residues D187, R188, Q189, T190, and A191 are more flexible in slow
modes and may undergo functional motions. Substrate recognition sites
tend to exhibit structural flexibility and sequence variations so as
to enable specific recognition required for mediating substrate
specificity. We also explored the functional dynamics profile of the
ligand-bound protease complex (Figure ). This structural map clearly
illustrated that the ligand binding site comprised both rigid and
flexible residues and was located in the region that bridges an area
of high and low structural stability. In particular, we highlighted
that residues S46, M49, T190, and A191 in the substrate recognition
site and in the ligand proximity may belong to moving regions in the
global motions (Figure ,
panel B).
Figure 19
Structural map of functional motion profiles of the
SARS-CoV-2 main protease structure complex with a ligand.
(A) Structural map of the functional dynamics profiles
derived from CG-MD simulations in the SARS-CoV-2 main
protease in the complex with the α-ketoamide ligand
(PDB ID 6Y2F). The slow-mode shapes are averaged
over the first five lowest frequency modes. The color
gradient from blue to red indicates the decreasing
structural rigidity and increasing flexibility as averaged
over the first five low frequency modes. The positions of
the residues undergoing mutations (A7, G15, M17, V20, T45,
D48, M49, R60, K61, A70, G71, L89, K90, P99, Y101, R105,
P108, A116, A129, P132, T135, I136, N151, V157, C160,
A173, P184, T190, A191, A193, T196, T198, T201, L220,
L232, A234, K236, Y237, D248, A255, T259, A260, V261,
A266, N274, R279, and S301) are shown in spheres colored
according to their level of rigidity and flexibility in
the low frequency modes (blue-rigid, red-flexible). The
locations of the protease domains I, II, and III are
indicated. The α-ketoamide ligands are shown in
sticks in both protomers. (B) Rotated view for the
structural map of functional dynamics profiles in the
SARS-CoV-2 main protease with sites of mutations in
spheres. The position of the α-ketoamide ligand is
shown in sticks. The mobile residues in the slow modes
from the substrate binding site that form interactions
with the ligand (S46, M49, T190, and A191) are indicated
by arrows and annotations.
Structural map of functional motion profiles of the
SARS-CoV-2 main protease structure complex with a ligand.
(A) Structural map of the functional dynamics profiles
derived from CG-MD simulations in the SARS-CoV-2 main
protease in the complex with the α-ketoamide ligand
(PDB ID 6Y2F). The slow-mode shapes are averaged
over the first five lowest frequency modes. The color
gradient from blue to red indicates the decreasing
structural rigidity and increasing flexibility as averaged
over the first five low frequency modes. The positions of
the residues undergoing mutations (A7, G15, M17, V20, T45,
D48, M49, R60, K61, A70, G71, L89, K90, P99, Y101, R105,
P108, A116, A129, P132, T135, I136, N151, V157, C160,
A173, P184, T190, A191, A193, T196, T198, T201, L220,
L232, A234, K236, Y237, D248, A255, T259, A260, V261,
A266, N274, R279, and S301) are shown in spheres colored
according to their level of rigidity and flexibility in
the low frequency modes (blue-rigid, red-flexible). The
locations of the protease domains I, II, and III are
indicated. The α-ketoamide ligands are shown in
sticks in both protomers. (B) Rotated view for the
structural map of functional dynamics profiles in the
SARS-CoV-2 main protease with sites of mutations in
spheres. The position of the α-ketoamide ligand is
shown in sticks. The mobile residues in the slow modes
from the substrate binding site that form interactions
with the ligand (S46, M49, T190, and A191) are indicated
by arrows and annotations.Our analysis shows that structural clusters of mutations may be
distinguished by their evolution propensity and global mobility in
slow-mode regions. The mobile residues may be predisposed to serve as
substrate recognition sites, whereas residues acting as global hinges
during collective dynamics are often supported by conserved residues.
The observed conservation and mutational patterns may thus be
determined by functional catalytic requirements, structural stability
and geometrical constraints, and functional dynamics patterns. We
found that some sites and corresponding mutations may be associated
with dynamic hinge function. The mutability of hinge sites (Y101C,
R105H, P108S, V157I/L, C160S, A173V, and P184L/S) and nearby sites
(T190I, A191V, and A193V) may be related with their structural and
dynamic signatures to reside in the exposed protein regions rather
than in the more conserved protein core. We could also conclude that
these sites are located near the active site and control the bending
motions needed for catalysis; so, their mutability may have an
important functional role for enzymatic activity especially when
combined with mutations in adjacent regions.
Conclusions
COVID-19 represents a significant global threat for which no effective solution
currently exists. Physical distancing[115] has
significantly contributed in slowing down the viral progression. However,
time is of the essence to find a cure that will counter current and future
impacts of the virus, especially for those at a higher risk and for the
world economies. Analyzing the structural and dynamic properties of the
novel mutants of the SARS-CoV-2Mpro and contrasts with known
SARS-CoV homolog facts gives important pointers and details about its
dynamic behavior.From this work, several missense mutations were found across all domains of the
SARS-CoV-2Mpro, with various levels of support. There are
various single residue substitutions, among which several are substitutions
of alanine and valine. These mutations have occurred both in buried and
solvent-accessible surfaces. From our filtered data set, residue positions
15, 157, and 184 appear to have mutated more than once. A relatively high
number of titratable amino acids present in Mpro, which are
similar to SARS-CoV, may play an important role in influencing its behavior
at various pH levels. Higher backbone flexibility was observed for the
isolates EPI_ISL_416720, EPI_ISL_426097, EPI_ISL_421763, EPI_ISL_420610,
EPI_ISL_425284, EPI_ISL_421380, EPI_ISL_423772, and EPI_ISL_425886 mainly
due to domain III motions and to some extent part of domain I. More
generally, regions of lowest flexibility (and high BC) were
core residues, while solvent-exposed loops were most flexible. All samples
displayed a slight interprotomer twisting motion and domain rotational
motions. A high degree of variation was observed in (1) the angle formed
between domains I, II, and II, (2) the substrate binding pocket
Rg, (3) the interprotomer pocket
Rg, and (4) the N-finger flexibility,
which may all be good descriptors for characterizing Mpro
dynamics. BC values were very similar across all samples,
with extreme values being essentially anti-correlated to RMSF. Residues 17
and 128 appear to be very central residues based on the BC
network metric, and it is likely that mutations altering their
physicochemical property may have the potential to alter dimer stability.
The D48E variant (from sample EPI_ISL_425242) leads to a novel
“TSEEMLN” motif at the substrate binding flap, which may have
repercussions on the efficiency and specificity of substrate binding.
Inactive apo Mpro did not show signs of dissociation. A
non-canonical pose for the PHE140 widens the accessible surface of the
substrate binding pocket by pulling on the oxyanion loop. We propose the
presence of a mirrored allosteric interprotomer pocket, supported by two
cavity detection approaches, and correlations in compaction dynamics between
the interprotomer pocket and the substrate binding pocket. The mirrored
pocket may have the potential to accommodate compounds of low molecular
weight and polarity. Asymmetries to partial symmetries in
Rg distributions were seen for each of the
substrate binding pockets and the interprotomer cavities for each isolate.
The compactions of the allosteric subpockets do not seem to noticeably
affect the movement of domain III. However, a large portion of the samples
displayed overall positive correlations according to DCC, indicating that
the mutants generally behave similarly. In each individual DCC plot, domains
I and II are found to behave as a single unit, while domain III is generally
more independent. Thorough, independent CG simulations of the apo and the
ligand-bound Mpro further revealed a connection between regions
accumulating clusters of mutations and their degree of residue fluctuation
from the slowest modes. Additionally, we report of a possible set of dynamic
hinging residues and their tendency to acquire mutations in exposed protein
regions while being grounded by less mutable core residues.As a final note, it is important to be aware that there is an inherent lack of
sampling depth in this current analysis of COVID-19 sequences due to the
existence of undiagnosed mutations that may be present among infected
individuals[116] at the time of writing, and in our
case, we might have missed certain mutations using our filtering criteria,
in our effort to balance accuracy and the number of high confidence
representative samples. Therefore, frequencies should be handled with
caution as the sampling was done at an early stage of the pandemic.
Authors: Peter Eugene Jones; Carolina Pérez-Segura; Alexander J Bryer; Juan R Perilla; Jodi A Hadden-Perilla Journal: Curr Opin Virol Date: 2021-08-28 Impact factor: 7.121
Authors: Wenchun Fan; Katrina B Mar; Levent Sari; Ilona K Gaszek; Qiang Cheng; Bret M Evers; John M Shelton; Mary Wight-Carter; Daniel J Siegwart; Milo M Lin; John W Schoggins Journal: Cell Date: 2021-05-31 Impact factor: 66.850