Literature DB >> 32931703

Sequence Characterization and Molecular Modeling of Clinically Relevant Variants of the SARS-CoV-2 Main Protease.

Thomas J Cross1, Gemma R Takahashi2, Elizabeth M Diessner1,3, Marquise G Crosby2, Vesta Farahmand1, Shannon Zhuang1, Carter T Butts3,4, Rachel W Martin1,2.   

Abstract

The SARS-CoV-2 main protease (Mpro) is essential to viral replication and cleaves highly specific substrate sequences, making it an obvious target for inhibitor design. However, as for any virus, SARS-CoV-2 is subject to constant neutral drift and selection pressure, with new Mpro mutations arising over time. Identification and structural characterization of Mpro variants is thus critical for robust inhibitor design. Here we report sequence analysis, structure predictions, and molecular modeling for seventy-nine Mpro variants, constituting all clinically observed mutations in this protein as of April 29, 2020. Residue substitution is widely distributed, with some tendency toward larger and more hydrophobic residues. Modeling and protein structure network analysis suggest differences in cohesion and active site flexibility, revealing patterns in viral evolution that have relevance for drug discovery.

Entities:  

Mesh:

Substances:

Year:  2020        PMID: 32931703      PMCID: PMC7518256          DOI: 10.1021/acs.biochem.0c00462

Source DB:  PubMed          Journal:  Biochemistry        ISSN: 0006-2960            Impact factor:   3.162


Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged in late 2019[1] and rapidly spread worldwide, causing an ongoing pandemic. Although the sequence of its RNA genome is highly similar to that of SARS-CoV-1, SARS-CoV-2 is believed to have arisen independently from a bat coronavirus,[2] to which it shares 96% similarity.[3] The emerging SARS-CoV-2 subsequently gained a modified spike protein due to recombination in an intermediate host, possibly the pangolin,[4,5] followed by purifying selection for binding to the human ACE2 protein.[6] No therapeutic agents able to reduce SARS-CoV-2 mortality in clinical settings are yet known, although extensive efforts are underway to discover new drugs or repurpose existing ones to inhibit key viral proteins. Here we focus on the main protease (M), which plays a critical role in viral replication. Like other betacoronaviruses, SARS-CoV-2 is a positive-sense RNA virus that expresses all of its proteins as a single polypeptide chain, which is cleaved by M and the papain-like protease PL to yield the mature proteins.[7] Inhibiting this key enzyme would prevent viral replication, reducing viral load and thus symptom intensity. A similar approach was instrumental in making HIV a manageable disease.[8−10] However, the proteins in question differ markedly, rendering HIV protease inhibitors ineffective against SARS-CoV-2; indeed, a standard HIV protease inhibitor combination did not prove effective against COVID-19 in a recent clinical trial.[11] Specifically, HIV protease is an aspartic protease, whereas M is a 3CL cysteine protease. The 3CL cysteine proteases are characterized by a chymotrypsin-like fold and a cysteine-histidine catalytic dyad in the active site, implying both different structures and distinct chemical mechanisms. Although the general strategy of seeking protease inhibitors is hence viable for both SARS-CoV-2 and HIV, drug development for the former depends on characterizing this novel enzyme. The M sequence and three-dimensional structure are highly conserved among coronaviruses, with a characteristic three-domain fold.[12] Domains I and II make up the catalytic region, which has a chymotrypsin-like structure, while domain III is an α-helical domain that is primarily responsible for dimerization.[13,14] Although a mix of monomers and dimers is found in solution,[15] studies of the SARS-CoV-1 M have shown that the dimer is the functional state in the mature protein, whereas the monomer has dramatically reduced trans-enzymatic activity against the substrate.[13] Removing key residues involved in dimerization by mutation or truncation of the C-terminus results in lower enzymatic activity concomitant with an increased population of monomers.[16] The N-finger (residues 1–7) is not required for dimerization[17] but is necessary for activity, apparently due to highly specific interactions between the N-terminus of the inactive monomer and domains II and III of the active one.[18−20] In the G11A variant, the N-finger is unable to occupy the correct binding site, leading to dissociation of the dimer, loss of activity, and collapse of the oxyanion hole that is an essential part of the substrate-binding pocket.[21] The global efforts to document clinically isolated SARS-CoV-2 sequences since the beginning of the pandemic provides an opportunity to examine the impact of mutations on the protein structure and dynamics, and as more sequences are collected going forward, to discover which residues and functional features are absolutely conserved. Molecular modeling is an important tool for guiding inhibitor discovery, making it possible to evaluate large numbers of candidate drugs in silico to select experimental targets; however, standard approaches screen against only one version of the protein, typically the reference or wild-type (WT) sequence. In a host population, mutations accumulate with each viral passage, generating a mutational landscape rather than a single protein. The design of robust inhibitors that can protect against the multiple strains encountered in clinical settings requires the characterization of this sequence space and the populations of conformations it engenders. Furthermore, effective and rapid response to future emerging coronavirus diseases requires both in silico screening and experimental testing of antiviral agents and a validated library of relatively general inhibitors that can be used as a basis for the development of specialized therapeutics. Central to the success of that effort will be developing an understanding of structural and functional variation in SARS-CoV-2 proteins, particularly as mutations accumulate and new strains emerge. In general, SARS-CoV-2 is mutating more slowly than would be expected under neutral drift, suggesting that most of the genome is subject to purifying selection,[22] although this does not rule out adaptive mutations in specific genes or other sequence regions.[23] Especially with a newly emerged zoonotic disease, there is no reason to assume that any viral protein is currently at a global optimum with respect to function, and function-enhancing variants may appear at any time. For example, mutation of the SARS-CoV-2 spike protein is a topic of intense current interest, as its D614G variant appears to confer enhanced infectivity,[24] has spread rapidly through the global population,[25] and correlates with a higher mortality rate.[26] M appears to be relatively tolerant of mutations near the active site,[27] underscoring the importance of mapping the mutational landscape of active variants so that inhibitors whose binding depends on highly specific interactions with mutation-prone residues can be eliminated early in the screening process. Here we characterize all 79 known variants of M as of 29 April, 2020, and analyze trends in amino acid substitutions and the resulting structural changes for both monomers and dimers, using molecular modeling and moiety-level protein structure network analysis. Our analysis shows a trend toward substitution for larger and more hydrophobic residues versus the WT protein. Analysis of active site networks (ASNs) from M variants suggests differences in active site flexibility and cohesion that may serve to guide the design of robust, mutation-resistant inhibitors. Intervariant differences are also observed among the dimer interfaces, which is another potential inhibitor target.

Materials and Methods

Sequence Analysis and Clustering

SARS-CoV-2 genome sequences were found by searching the GISAID (https://www.gisaid.org/)[28] EpiCoV database on May 3, 2020, using the host keyword “human” and a submission cutoff date of April 29, 2020, yielding a total of 15 432 SARS-CoV-2 genomes. Genomes outside the range of ±3% reference (RefSeq: NC 045512.2) length (29 006 bp–30 800 bp inclusive) or ≥1% N content were removed, leaving 10 644 “high-quality” sequences. Open reading frames in these high-quality full genomes were compared with a reference M nucleotide sequence (WT, RefSeq, NC 045512.2; loc, 10 055–10 972) to extract M sequences of at least 80% similarity using a script written in Python v3.7.0.[29] Genomes with gaps or ambiguous nucleotides (e.g., N, S, D, per International Union of Pure and Applied Chemistry (IUPAC) nomenclature[30]) in the M sequence were excluded from this data set, leaving a total of 10 578 sequences from high-quality genomes. Nucleotide sequences were converted into amino acid sequences and screened for nonsynonymous mutations against the WT M using code written in Wolfram Mathematica 12.1,[31] yielding 511 nonsynonymous mutations in M, 77 of which were unique. A single unique M variant, found in an April 24, 2020 data set, but no longer available in the GISAID database, was also used in our analyses. M variants were used in phylogenetic analyses along with reference human, bat, and pangolin viruses. Full genome alignments were performed using MUSCLE (v3.8.1551, max 8 iterations, enable find diagonals)[32] on the complete set of nonsynonymous M mutants as well as reference WT, bat, and pangolin sequences. Trees were generated in MEGA X,[33] using the Neighbor-Joining method;[34] a bootstrap test[35] of 1000 replicates was performed, and distances were calculated using the Maximum Composite Likelihood model.[35] In all, 515 full genomes were used in phylogenetic analyses; 78 unique M mutants and a reference WT sequence (79 total) were used for molecular modeling.

Molecular Modeling of Wild-Type and Variant Protein Structures

Initial conditions for the WT trajectories used here are based on the PDB structure 6Y2E,[36] representing a mature (i.e., cleaved pro-sequence) protein. For monomer trajectories, the A chain of 6Y2E was employed. Initial variant monomer and dimer structures were, respectively, predicted using MODELLER 9.23,[37] using the 6Y2E structure as a template; three rounds of annealing and MD refinement were performed using the “slow” optimization level for each (final objective function values are provided in Table S7). Initial structures were then processed to correct protonation states to reflect their predicted cellular environment (with protonation states predicted using PROPKA 3.1[38]). Each corrected model structure was then minimized and equilibrated in explicit solvent; simulations were performed using NAMD[39] with the CHARMM36 force field[40] in TIP3P water[41] at 310 K under periodic boundary conditions (with a 10 Å margin water box). Solvated protein models were energy-minimized for 10 000 iterations before being simulated for 0.5 ns, followed by a water box size adjustment, after which a 10 ns trajectory was simulated with conformations being sampled every 20 ps; an NpT ensemble was used, with temperature controlled via Langevin dynamics with a damping coefficient of 1/ps and Nosé–Hoover Langevin piston pressure control set to 1 atm.[42,43] Final conformations from each trajectory were used to generate figures in the main text and Supporting Information.

Network Analysis

A protein structure network (PSN) was calculated for each modeled conformation of each variant via scripts employing the statnet,[44−46] Rpdb,[47] and bio3d[48] libraries for R.[49] Vertices were defined using the method used in ref (50), where each node represents a chemical moiety, with edges being defined by interatomic contacts. Specifically, two nodes i and j are considered adjacent if i contains atom g and j contains atom h such that the g and h distance is less than 1.1 times the sum of their respective van der Waals radii (using values from ref (51)). The node definitions are illustrated in Figure S1A, and a small-moiety PSN of this type for WT M is shown in Figure S1B. Active site networks (ASNs) were constructed from each PSN as described in ref (52). Briefly, all vertices belonging to the catalytic Cys and His residues were identified, along with all vertices adjacent to these vertices within the PSN. The ASN was then defined as the subgraph of the corresponding PSN induced by this combined vertex set (Figure S1C). In the case of dimer models, one ASN was constructed for each chain. To assess overall cohesion, degree k-core values[53] were calculated for each vertex in each PSN, and the average core number was computed for the entire protein and for the vertices in each domain, respectively. All calculations were performed using the sna library[46] for R. For dimer PSNs, interfacial moieties were identified by selecting all vertices adjacent to at least one vertex in the opposing chain, and average core numbers were computed for these interfacial vertices to assess cohesion in the dimerization interface; raw counts of edges spanning the two chains (interfacial tie volumes) were also calculated. For each vertex associated with a moiety in the active site, three measures identified as associated with active site constraint by ref (52) were computed: the degree, or number of ties to other vertices; the triangle degree, or number of triangles (3-cliques) to which the vertex belongs; and core number, or number of the highest degree k-core[54] to which the vertex belongs. Physically, these, respectively, indicate the total number of contacts associated with the chemical group (potentially impeding its motion), the number of truss-like, triangular structures in which the group is embedded (again, restricting mobility), and the extent of local cohesion around the chemical group, which is found to distinguish “tighter” and “looser” packing regimes.[55] To summarize the impact of each measure over the active site as a whole, values were averaged across active site vertices. As an additional constraint measure, the number of paths between each pair of active site vertices through neighboring (i.e., nonactive site) vertices was computed, and the log of the minimum of this value over the set of active site vertex pairs was employed as a measure of site cohesion. Intuitively, high values of site cohesion indicate that all active site chemical groups are connected by a large number of indirect contacts, while low values suggest that at least one pair of active site moieties has few local pathways holding them together. These four indices (mean active site degree, mean active site triangle degree, mean active site core number, and site cohesion) were used to produce an omnibus index of site constraint via principal component analysis (PCA) of the standardized network measures over all modeled conformations, as described in ref (52). This first principal component (the constraint score) accounted for approximately 71% of the variance in all four measures in the case of monomer ASNs, and the ratio of its associated eigenvalue to the next largest was approximately 4.7 (confirming the dominance of the principal eigenvector). This process was repeated for the dimer ASNs, resulting in a constraint score vector accounting for approximately 69% of the total index variance, with an eigenvalue ratio of approximately 4.3.

Comparing Mean Cohesion and Constraint Scores Across Variants

Because cohesion and constraint scores are heavily autocorrelated within trajectories, we employ a parametric bootstrap strategy to obtain autocorrelation-corrected standard errors and confidence intervals.[56] For each time series of scores for each trajectory, an autoregressive (AR) model with AIC-selected order was fit, and the estimated series mean was obtained (estimation performed by maximum likelihood estimation using the ar function in R[49]). The whitened residuals from the time series model were then used to construct 5000 parametric bootstrap replicate series, which were then refit to obtain bootstrap replicate means. Mean estimates from the bootstrap replicates were used to construct 95% bootstrap confidence intervals and standard errors for the series mean. This procedure was applied to the MD trajectory for each variant. For cohesion scores, mean and bootstrap standard errors are provided for the full protein and each domain in Table S3.

Kernel PCA of Active Site Networks

To identify key features differentiating active site conformations observed throughout the entire sample of trajectories, we employ a kernelized principal component analysis (kernel PCA[57]) of a stratified sample of monomer and dimer ASNs from all M variants. The ASN sample consists of 25 evenly spaced conformations from each trajectory (monomer, dimer A chain, and dimer B chain), for a total of 5925 networks. For comparative purposes, each ASN was mapped to the set of all unique vertices appearing in any ASN in the sample, and the upper triangle of the associated adjacency matrix was vectorized, yielding a binary vector of fixed dimension encoding each network. Analysis of the vectorized ASNs was performed using a disjunctive normal form (DNF) kernel,[58] defined bywhere x and y are respective ASN vectors, and ϵ is a free parameter controlling regularization. In a graph-theoretic context, the feature space of the DNF kernel consists of the set of all labeled subgraphs of the inputs; thus, kernel PCA on this space identifies combinations of labeled subgraphs that efficiently discriminate among ASNs. Hyperparameter tuning was performed via grid search, with the performance assessed by reconstruction of a held-out sample of 100 randomly selected ASNs, following training on a separate sample of 1000 ASNs. Reconstruction was performed using the preimage method from ref (59) with a local sample of 10 neighbors; the optimal hyperparameter obtained was ϵ = 0.0739. Following hyperparameter tuning, principal component scores were calculated for use in subsequent analysis. Analyses were performed using R[49] with portions implemented using Rcpp.[60]

Results and Discussion

Mutations in M Are Geographically Distributed

From the GISAID (https://www.gisaid.org/)[28] EpiCoV database (through April 29, 2020), 78 unique nonsynonymous mutations to M were found in addition to the WT sequence, including 73 single point variants and 5 double variants. For genome sequences containing these M variants, full genome alignments were performed using MUSCLE,[32] and neighbor-joining trees were generated using MEGA X.[33] Overall, the variation in SARS-CoV-2 sequences observed so far is relatively low, with mutation hotspots not evenly distributed throughout the genome, but localized to specific sequence regions.[61] Because M is critical for viral replication, mutations that have a large deleterious effect on virus replication are unlikely to be observed in clinical isolates; all M variants investigated here are therefore assumed to be enzymatically competent. In general, codon usage and amino acid frequency in viruses of eukaryotes are essentially identical to those of their eukaryotic hosts, reflecting the viruses’ use of the host translation machinery.[62] The known mutations in M are summarized in Figure . The tree was generated based on overall genome similarity; however, only sequences containing at least one nonsynonymous mutation in M were included in the analysis, along with the WT human sequence and two nonhuman reference sequences. The accession numbers and geographical sources are listed in Table S1. The solid arcs around the outside of the diagram indicate M mutations; color coding corresponds to the geographical source. Several mutations appear to have arisen more than once in the virus’s evolutionary history so far. Notably, K90R variants appear in multiple distantly related subtrees; five of these unique evolutionary events can be verified in Nextstrain’s SARS-CoV-2 phylogenetic tree.[63] Further, L89F, P108S, and N274D arise at least twice in both trees.
Figure 1

Optimal tree generated using 512 full mutant genomes and three reference genomes: human wild-type (WT),[64] bat,[3] and pangolin.[65] Only topology is shown; branch lengths are not to scale (average branch length = 1.432161 × 10–4 base substitutions per site). Each continuous arc corresponds to a variant label; these represent only adjacent branches with the same mutation in M and do not necessarily indicate shared ancestry. Branches and arcs from human clinical samples are color coded by location, which includes the following subregions: Africa, light blue (Democratic Republic of the Congo); Asia, green (Beijing, Fujian, Malaysia, Shanghai, Vietnam, and Wuhan); Australia, gold; Central Eurasia, pink (Georgia, Jordan, Russia, and Turkey); Europe, red (Belgium, Denmark, England, Finland, France, Germany, Iceland, Luxembourg, Netherlands, Scotland, Spain, Sweden, Switzerland, and Wales); North America, purple (Costa Rica and United States of America); South America, yellow (Argentina and Brazil). Subtrees that contained identical subregions and mutations have been condensed into a single branch; all subtrees and their constituent accessions can be found in Table S1.

Optimal tree generated using 512 full mutant genomes and three reference genomes: human wild-type (WT),[64] bat,[3] and pangolin.[65] Only topology is shown; branch lengths are not to scale (average branch length = 1.432161 × 10–4 base substitutions per site). Each continuous arc corresponds to a variant label; these represent only adjacent branches with the same mutation in M and do not necessarily indicate shared ancestry. Branches and arcs from human clinical samples are color coded by location, which includes the following subregions: Africa, light blue (Democratic Republic of the Congo); Asia, green (Beijing, Fujian, Malaysia, Shanghai, Vietnam, and Wuhan); Australia, gold; Central Eurasia, pink (Georgia, Jordan, Russia, and Turkey); Europe, red (Belgium, Denmark, England, Finland, France, Germany, Iceland, Luxembourg, Netherlands, Scotland, Spain, Sweden, Switzerland, and Wales); North America, purple (Costa Rica and United States of America); South America, yellow (Argentina and Brazil). Subtrees that contained identical subregions and mutations have been condensed into a single branch; all subtrees and their constituent accessions can be found in Table S1. These phylogenetic comparisons appear to support a multiple event hypothesis but are subject to errors resulting from the sparsity of testing. The repeated occurrence of the same mutation in seemingly unrelated subtrees may be due to missing data that would show their evolutionary connectedness. The average branch length of Figure , which shows only topology, is 1.432161 × 10–4 base substitutions per site (including those from the bat[3] and pangolin[65]); 32.2% of the 1028 branches have, to ten significant figures, 0 base substitutions per site. For a genome of roughly 30 000 base pairs, this amounts to an average of only 4 substitutions per branch. All of these unique mutants therefore effectively belong to the same strain, making them difficult to place in an evolutionary context. For more diverged mutants, unfortunately placed ambiguous nucleotides[30] could push them from one subtree to another. With the exception of five double variants, the sequences in Figure arise from single point mutations. Whether and how M mutations have affected viral fitness is not yet known, but at least three mutants have remained in the population long enough to accumulate another mutation: L220F to A191V/L220F, G15S to G15S/D48E and G15S/V35L, and K90R to V77A/K90R. It is worth noting, that although a single variant A191V exists, the A191V/L220F double variant likely stemmed from an L220F ancestor due to its shared lineage with L220F single variants. A fifth double variant, A193T/R279C, was found but did not stem from any single mutation in our data set; its origins remain unclear. Although a mutation’s prevalence and evolution in a population may be interpreted as a sign of robust viral function, the opposite does not necessarily indicate reduced infectivity. Testing rates, social behavior, and time of first infection in each region are all factors that contribute to the spread of the disease and the availability of sequencing data. For instance, a large number of K90R mutants were collected in Iceland, where the number of tests per 1000 people is nearly twice as many as the next leading country’s and more than seven times as many as the United States’ (Iceland, 141.75; USA, 18.21, as of April 29, 2020).[66] Consequently, further investigation is needed to determine whether M mutations affect viral fitness on a global scale. As such, without greater divergence and more sequences, it is difficult to tell if the presence of an M mutation in unrelated subtrees is evidence of multiple evolutionary events, or an artifact of sparse testing. Despite these caveats, it is clear that these 78 variants are functional enough to infect people within their local populations. In the structural analyses that follow, we focus on the differences in protein properties of the clinically observed M variants relative to WT. Because only sequences harboring M mutations were retained for analysis, certain geographical areas appear to be underrepresented. It is likely that the strains that had spread to underrepresented regions prior to our data collection simply did not have M mutations. Different regions tend to be dominated by different mutants, a feature that might be explained by the timing at which these mutations arose or arrived. For instance, 83 of the 100 M mutants from Iceland were K90R, and most stemmed from a single shared ancestor (see Supporting Information). Further, it is likely that heterogeneity in sequencing rates have resulted in a less-than-complete data set. As of April 29th, the only North American, South American, and African M mutants reported in the GISAID database that passed our filtering parameters were from Costa Rica and the USA, Argentina and Brazil, and the DRC respectively. This does not necessarily indicate a lack of M mutations in other subregions and may instead reflect differences in sequencing rates. It is worth noting that some nodes in expanded Figure (see Supporting Information) exhibit bootstrap values less than 10 (i.e., their branch relationships were shown in fewer than 10% of replicates). The majority of these extremely low values can be found on large subtrees that share an M mutation, like D248E or K90R, many of which likely spread from a common source. Their branches can often be interchanged with no reduction in accuracy due to high sequence identity. Low bootstrap values here primarily reflect genotype similarity within subtrees rather than low accuracy. This is a known property of the Felsenstein procedure,[35] which bootstraps loci instead of sequences; low bootstrap values indicate groupings that are sensitively dependent upon differences in small numbers of loci but do not necessarily reflect uncertainty due to sequence sampling.

M Mutations to Date Suggest Selection for Larger, More Massive, and More Hydrophobic Residues

To reveal the global pattern of substitutions, we visualize mutations in M independent of sequence position or location in the three-dimensional structure, by a network where the nodes, or vertices, are amino acid types and the edges (represented by arrows pointing in the direction of substitution) are directional indicators of how often one amino acid was observed to substitute for another (Figure ). The weights of the edges indicate the frequency of the mutation across known M variants, while node color reflects residue hydrophobicity on the scale of Kyte and Doolittle.[67] The most obvious trend observed in the pattern of mutation so far is the preferential substitution of larger, more hydrophobic amino acids in place of smaller, less hydrophobic ones. The pattern is consistent with increased incidence of amino acid types that are more likely to be present in folded domains rather than those found more often in less-structured linker regions.[68]
Figure 2

Amino acid substitutions in SARS-CoV-2 M observed up to April 29, 2020. Arrows indicate the direction of substitution: an arrow from i to j indicates at least one clinically observed substitution of residue type i to residue type j; heavier lines indicate larger numbers of observed substitutions. Color indicates hydrophobicity, using the scale of Kyte and Doolittle.[67] In general, substitution has been toward larger and more hydrophobic residues.

Amino acid substitutions in SARS-CoV-2 M observed up to April 29, 2020. Arrows indicate the direction of substitution: an arrow from i to j indicates at least one clinically observed substitution of residue type i to residue type j; heavier lines indicate larger numbers of observed substitutions. Color indicates hydrophobicity, using the scale of Kyte and Doolittle.[67] In general, substitution has been toward larger and more hydrophobic residues. In particular, it is notable that alanine has very few incoming ties and a large number of outgoing ties, mostly to valine, which has a larger and more hydrophobic side chain. Alanine is at the same time one of the most common amino acids and one of those with the most variable prevalence in the human genome.[69] Similarly, observed ties to isoleucine are mostly incoming from smaller residues; leucine likewise has more incoming than outgoing ties, with the bulk of its outgoing ties going to phenylalanine, which is also large and hydrophobic. However, aromatic residues per se do not appear to be selected. For example, tyrosine has mostly outgoing ties. Also notable are the selection away from the secondary structure breakers proline and glycine, both of which have only outgoing ties, and the propensity for lysine to be replaced by arginine even though both side chains are positively charged. Arginine is both larger and capable of making more and stronger hydrogen bonds, as well as cation-π interactions not available to lysine, leading to its known overrepresentation in interdomain and intermonomer interfaces.[70−73] The mean differences in side chain properties for observed M mutations are summarized in Table . As observed in the network representation (Figure ), on average, the mutated residues are larger and more hydrophobic than the ones they replace. Although substituted residues are on average larger and more massive, we do not see strong evidence favoring bulky residues over more compact ones, independent of mass: residue bulk (measured as volume/mass) for substituted residues did not differ significantly from WT (mean difference = 0.02 Å3/Da, t = 1.87, p = 0.0650). The variant sequences are not significantly different from WT in charge or aromatic content.
Table 1

Mean Differences in Side Chain Properties for Substituted Residues versus WT (N = 83; Substitutions from Double Mutants Considered Separately)a

 mean differencestd errort valuep value
polar (1 = true)0.080.071.220.2251
hydrophobicity1.030.303.470.0008d
charge–0.050.04–1.270.2078
aromatic (1 = true)0.070.041.620.1093
mass (Da)9.973.492.850.0055c
volume (Å3)11.583.653.170.0021c
bulk (Å3/Da)0.020.011.870.0650

On average, substituted residues are significantly more hydrophobic, more massive, and larger in volume than those they replace (all p values for two-tailed t-tests versus no difference).

P < 0.05

P < 0.01

P < 0.001

On average, substituted residues are significantly more hydrophobic, more massive, and larger in volume than those they replace (all p values for two-tailed t-tests versus no difference). P < 0.05 P < 0.01 P < 0.001

M Mutations Occur Throughout the Protein

In order to provide context for the discussion of conformational changes and the locations of mutation sites, a ribbon diagram of the SARS-CoV-2 M monomer is shown in Figure A. The three domains, the active site residues, and other structural features are labeled. The dimer is shown in Figure B; the dashed box shows the approximate location of the interchain salt bridge shown in Figure C. In SARS-CoV-1 M, this salt bridge between Arg4 of Chain A and Glu290 of Chain B is an important component of the dimer interface,[74] and substitution of either or both of these residues with alanine resulted in shifting the equilibrium from dimer to monomer, although dimerization was not abolished completely.[14] Panels D and E show two different views of the dimer interface, with key residues labeled. In addition to important interfacial residues, we also highlight key residues near the oxyanion hole, a structural feature found in many serine and cysteine proteases that stabilizes negatively charged, tetrahedral transition states.[75]
Figure 3

(A) Ribbon diagram for the M monomer, with key structural features labeled, including the three domains, the N- and C-termini, and the active site residues, H41 and C145. (B) Ribbon diagram for the M dimer, with separate chains shown in light gray and black. The N- and C-termini, both of which mediate interactions at the dimer interface, are labeled for each monomer. The dashed square shows the approximate location of the salt bridge shown in panel C. (C) A salt bridge between Arg4 of one monomer and Glu290 of the other is shown. This salt bridge is hypothesized to be important for stabilizing the dimer interface and maintaining the active conformation. (D, E) The dimer interface is shown from two different angles, with key residues labeled.

(A) Ribbon diagram for the M monomer, with key structural features labeled, including the three domains, the N- and C-termini, and the active site residues, H41 and C145. (B) Ribbon diagram for the M dimer, with separate chains shown in light gray and black. The N- and C-termini, both of which mediate interactions at the dimer interface, are labeled for each monomer. The dashed square shows the approximate location of the salt bridge shown in panel C. (C) A salt bridge between Arg4 of one monomer and Glu290 of the other is shown. This salt bridge is hypothesized to be important for stabilizing the dimer interface and maintaining the active conformation. (D, E) The dimer interface is shown from two different angles, with key residues labeled. In SARS-CoV-2 M, several interactions at or near the dimer interface play a role in keeping the oxyanion hole in the active conformation. An intrachain hydrogen bond between the side chain NH2 of Arg298 and the backbone carbonyl of Met6 appears to help hold the N-terminus (also called the N-finger) in place to make the appropriate contacts with the other chain.[76] In the S1 specificity pocket, a ring-stacking interaction between the side chains of His163 and Phe140 is stabilized by a network of hydrogen bonds comprising residues from both monomers. Hydrogen bonding interactions between Glu166 and Ser139 of Chain A with the N-terminus of Chain B (Ser1) are also important for stabilizing the dimer interface, along with interactions between the Ser10 residues of both chains. Mutating the adjacent Gly11 to Ala completely abolishes dimerization due to a dramatic conformational change to the N-finger that causes disruption of the interface.[21] Although the mutational studies discussed here were performed using SARS-CoV-1 M, many of the key residues are also found in SARS-CoV-2 M, and we hypothesize that they play similar roles. Figure A shows the M dimer with mutation sites indicated by spheres. The mutations observed so far are relatively evenly distributed throughout the protein. Notably, mutations are tolerated close to the active site residues, near the oxyanion hole, and at the termini, both of which are involved in modulating the dimer conformation.[17] Two different views of the dimer interface are shown in Figure B, with interfacial residues shown as space-filling models and with mutated residues highlighted in light blue (Chain A) or dark blue (Chain B). Despite the importance of the dimer interface, several mutations are observed in this region: M6L, A7V, A116V, S121L, C300S, S301L, and G302C. Particularly notable are M6L and A7V, which are part of the N-finger that participates in interactions with the C-terminal domain of the second monomer. In both cases, the hydrophobic character of the residue is preserved, with a slight increase in the size of the residue. With the exception of A116V, the other mutations represent qualitative changes in side chain properties. These changes in chemical properties are apparently well tolerated, as all of these mutations were found in clinical isolates.
Figure 4

(A) Ribbon diagram for the M dimer, with separate chains shown in light gray/light blue and purple/dark blue. The locations of all observed mutations are indicated by spheres centered on the α-carbons. (B) Two different views of the dimer are shown, with all residues at the dimer interface rendered as space-filling models, providing a visualization of the size of the dimer interface and its spatial relationship to the active site residues (blue, His41; yellow, Cys145) Residues having at least one mutation in this data set are highlighted in light blue (Chain A) or dark blue (Chain B.).

(A) Ribbon diagram for the M dimer, with separate chains shown in light gray/light blue and purple/dark blue. The locations of all observed mutations are indicated by spheres centered on the α-carbons. (B) Two different views of the dimer are shown, with all residues at the dimer interface rendered as space-filling models, providing a visualization of the size of the dimer interface and its spatial relationship to the active site residues (blue, His41; yellow, Cys145) Residues having at least one mutation in this data set are highlighted in light blue (Chain A) or dark blue (Chain B.).

Molecular Modeling Suggests Regionally Specific Differences in M Variant Structure

For WT M and each variant, molecular models of the monomer and dimer structures were constructed using MODELLER 9.23,[37] based on PDB structure 6Y2E,[36] followed by annealing, correction of protonation states, and all-atom molecular dynamics simulation in explicit solvent (see Materials and Methods). Examples of representative models are shown in Figure S2, intramolecular contacts for residue 225 in WT and the T225I variant are shown in Figure S3 and listed in Table S2, and selected double mutants are shown in Figure S4. The positions of all observed K to R and R to C mutations, which are commonly observed in interfaces, are shown in Figure S5. In M, two such mutations, K236R and R279C, are located in domain III, near the dimer interface. The positions of all mutated residues are mapped onto the WT structure in Figure S6, with color coding indicating the change (if any) in side chain chemical properties. We do not observe gross differences in structure or dynamics across variants, as expected given that all variants were found in clinical isolates and are therefore necessarily functional; mutations leading to radically altered or misfolded structures would likely be strongly selected against. However, analysis of MD trajectories does suggest more subtle differences across variants, providing insight into function-preserving changes. To assess the overall degree to which local structure is conserved across M variants, we compute the cross-variant variance in average ϕ,ψ backbone torsion angles by residue within free monomers. In order to control for overall flexibility, we normalize this by the estimated variance in torsion angles within each trajectory. For arbitrary angle α at residue i, this leads to the local variation indexwhere α is the vector of angles of type α over the trajectory of variant j with corresponding angular mean , is the vector of such means across variants, VarB is the “between variant” angular variance in mean angles, and VarW is the “within variant” angular variance in α. Intuitively, high values of v(α) indicate relatively large between-variant variation in α relative to angular variation seen within the trajectories themselves. For v(ϕ) and v(ψ), such values correspond to systematic changes in local conformation associated with M mutations. By turns, low values of v(ϕ) and v(ψ) indicate residues whose local structure does not vary meaningfully across variants. It should be noted that such regions can be either flexible or rigid. Figure shows the mean local variation indices for ϕ,ψ by the residue for the 79 M variants, indicated by color on the structure of M WT. (Separate values for ϕ and ψ are shown in Figure S7.) It is immediately noteworthy that, with the minor exception of two small loop regions around N277 and F223 (respectively), Domain III shows little systematic variation across variants. The β-sheet-rich structure around the active site is also relatively well-conserved. By contrast, we see relatively high levels of between-variant difference in the interdomain region involving the termini (residues G2-A7 and S301–F305) and the double loop “active site gateway” region involving (respectively) L50-Y54 and D187-A191. The former is potentially significant in influencing large-scale flexibility (possibly relevant to dimerization), whereas the latter is of obvious relevance to substrate processing and specificity. This motivates a more detailed examination of variation in the active site, to which we return below.
Figure 5

Local variation indices for M monomer backbone torsion angles (front/back views). Blue residues show higher levels of cross-variant ϕ,ψ differences relative to baseline variation; red residues show little evidence of structural difference across variants. Domain III is substantially conserved, while greater change is seen in the interdomain regions and loop regions adjacent to the active site.

Local variation indices for M monomer backbone torsion angles (front/back views). Blue residues show higher levels of cross-variant ϕ,ψ differences relative to baseline variation; red residues show little evidence of structural difference across variants. Domain III is substantially conserved, while greater change is seen in the interdomain regions and loop regions adjacent to the active site. The relatively high levels of conformational variation in the interdomain regions suggest functionally relevant differences in global cohesion across variants. To assess this, we employ protein structure networks (PSNs), which are well-suited for assessing the looseness or cohesiveness of contacts among chemical groups. For all M variants, moiety-level PSNs were constructed for each frame within each variant trajectory, using the definitions found in ref (50) (Figure S1) The assessment of global cohesion was performed by computing the mean degree k-core number for all moieties in each structure; to allow comparison of global cohesion within domains, we also compute mean core numbers within each of the three domains. The mean core number can be considered an index of structural cohesion, with higher values indicating greater numbers of redundant contacts among chemical groups.[55] To account for within-trajectory autocorrelation in comparing mean core numbers, autocorrelation-corrected parametric bootstrap confidence intervals and standard errors were employed. Figure shows global and domain-specific cohesion levels (i.e., mean core numbers) for all variant monomers, sorted in descending order of mean cohesion. Means and standard errors for each variant can be found in Table S3. As suggested from the torsion angle analysis, cohesion differs significantly among variants, both globally and within domains. On average, the majority of variants are estimated to be less cohesively structured than WT, with the exception of Domain III (in which WT does not differ significantly from the mean). It is possible that these differences indicate selection for more globally flexible structures (again, with the exception of Domain III). Whether or not this is the case, however, it appears clear that less cohesive structures are not strongly selected against. Such flexibility may affect dimerization kinetics, which is relevant to protease function and the development of robust dimerization inhibitors.
Figure 6

Mean core numbers for M monomer PSNs, by variant (ordering is by mean value in each panel). Points indicate trajectory means, with segments showing autocorrelation-corrected 95% bootstrap confidence intervals; red/blue intervals have t values versus WT (green) of at least ±2, indicating significant variation in structural cohesion across variants. Overall, the majority of variants are less cohesive than WT globally and in Domains I and II, while Domain III cohesion in WT is typical of the variant set.

Mean core numbers for M monomer PSNs, by variant (ordering is by mean value in each panel). Points indicate trajectory means, with segments showing autocorrelation-corrected 95% bootstrap confidence intervals; red/blue intervals have t values versus WT (green) of at least ±2, indicating significant variation in structural cohesion across variants. Overall, the majority of variants are less cohesive than WT globally and in Domains I and II, while Domain III cohesion in WT is typical of the variant set. While cohesion and flexibility within M monomers may provide clues to how mutations may impact dimerization, examination of the interfacial region within M dimers suggests indications of the impact of mutations on the behavior and stability of the dimers themselves. Figure A shows mean cohesion scores (i.e., mean degree k-core numbers) for interfacial moieties in dimer trajectories for all M variants, together with autocorrelation-corrected 95% bootstrap confidence intervals; values are also provided in Table S4. (We here define a moiety to be interfacial if it is adjacent to at least one chemical group in the opposite chain (Figure C). This is assessed on a frame-by-frame basis, allowing us to account for changes in contact patterns within trajectories.) Although there is some variation in interfacial cohesion, relatively few trajectories differ significantly from the mean cohesion level, and very few are two standard errors away from wild-type (12 being more cohesive and 1 being less cohesive). This is consistent with the hypothesis that interfacial cohesion is strongly conserved in M, suggesting that a looser or more dynamic dimer core may impede function. (The fact that, of those variants differing significantly from WT, all but one show greater cohesion is also consistent with this determination, albeit not determinative.)
Figure 7

(A) Mean core number for moieties at the dimer interface across all variants, with 95% autocorelation-corected bootstrap confidence intervals. Higher values indicate a higher core number (greater cohesion); variants indicated in red/blue have significant variation in core number relative to WT (green). The vertical dotted line indicates grand mean. (B) Mean tie volume across the dimer interface, ordered per panel A. Variants indicated in red/blue have significantly lower/higher tie volumes across the interface relative to WT (green). (C) Illustrative moiety-level PSN for the WT M dimer. Moieties associated with chains A and B are indicated in gray and dark purple, respectively. Moieties making up the interfacial residues for chains A and B are indicated in light gray and light purple, respectively. Also shown are moieties associated with the active site His (blue) and Cys (yellow.) (D) Induced subgraph of the PSN in panel C based on interfacial moieties. Interfacial tie volume is the count of ties from A (gray) to B (purple) interfacial nodes, while embeddedness in locally cohesive structures contributes to the mean k-core number.

(A) Mean core number for moieties at the dimer interface across all variants, with 95% autocorelation-corected bootstrap confidence intervals. Higher values indicate a higher core number (greater cohesion); variants indicated in red/blue have significant variation in core number relative to WT (green). The vertical dotted line indicates grand mean. (B) Mean tie volume across the dimer interface, ordered per panel A. Variants indicated in red/blue have significantly lower/higher tie volumes across the interface relative to WT (green). (C) Illustrative moiety-level PSN for the WT M dimer. Moieties associated with chains A and B are indicated in gray and dark purple, respectively. Moieties making up the interfacial residues for chains A and B are indicated in light gray and light purple, respectively. Also shown are moieties associated with the active site His (blue) and Cys (yellow.) (D) Induced subgraph of the PSN in panel C based on interfacial moieties. Interfacial tie volume is the count of ties from A (gray) to B (purple) interfacial nodes, while embeddedness in locally cohesive structures contributes to the mean k-core number. For a different look at interaction across the interfacial region, we also consider the total number of contacts between the A and B chains over time (the tie volume across the dimer interface, e.g., the interfacial ties in Figure D). Figure B shows mean dimer interface tie volumes and associated confidence intervals for the M variants, with variants sorted in the order of the mean core number to facilitate comparison (panel A). Whereas the mean core number provides a measurement of overall structural cohesion at the interface (including both interactions within and between chains), the tie volume directly measures the number of cross-chain contacts and is thus a potential proxy for interaction strength. As with cohesion, tie volume is fairly well-conserved, and few trajectories show significant deviation from wild-type (with seven trajectories having higher tie volume than WT, and two lower). Interestingly, interfacial tie volume is not strongly related to interfacial cohesion, indicating that the tightness or looseness of the structure around the interface is not simply a function of the raw number of interfacial contacts. Both, however, suggest a relatively high level of conservation in interfacial structure in the M dimer across variants observed to date. While the first version of this work, which addressed moiety-level networks of M monomers[77] was under review, a preprint[78] was released, showing similar results using residue-level network analysis on MD trajectories of M dimers, and subsequently published.[79] Although different methodology is used, the conclusions are in broad agreement, with significant flexibility observed in the N-finger region that is important for dimerization, as well as the loops near the active site. The authors also describe key low-frequency motions of the dimer that may be impacted by mutation and identify a pocket near the dimer interface that could serve as a potential target for allosteric inhibitors, underscoring the importance of mutational analysis as a part of effective inhibitor design. Moiety-level network analysis of M dimers was then added to this paper in response to reviewers’ comments, again finding substantial agreement with the residue-level results.[78]

Active Site Networks Suggest Potential Activity Differences across M Variants

The observation of structural variation in loop regions associated with the binding pocket motivates closer examination of variation in the M active site. To this end, subgraphs of the full protein structure networks comprising moieties belonging to the active site residues and their neighbors were constructed to produce active site networks (ASNs)[52] for all conformations. A protein’s ASN describes physical interactions among active site moieties and other groups that are immediately adjacent in the 3D structure, irrespective of their positions in the amino acid sequence. Per ref (52), we compute for each ASN a constraint score, a general measure of active site flexibility that is associated with substrate specificity. The constraint score is the first principal component of a set of several network metrics (see Materials and Methods), with higher values indicating a greater tendency for the catalytic residues to be constrained by cohesive contacts with other residues, and lower values indicating fewer such constraints. Examples of ASNs corresponding to the maximum, minimum, and mean observed constraint values over all observed M monomer conformations are shown in Figure . For the dimer structures, we repeat this process for the active site in each chain, yielding two constraint values per variant; these scores, along with corresponding maximum, mean, and minimum constraint ASNs, are shown in Figure . Values for both monomers and dimers are also provided in Tables S5 and S6.
Figure 8

(A) Mean monomer active site constraint scores and 95% autocorrelation-corrected parametric bootstrap confidence intervals, by variant. Higher values indicate greater constraints on active site residues; red/blue intervals have t values versus WT (green) of at least ±2, indicating significant variation in average constraint across variants. (B) Minimum, (C) mean, and (D) maximum constraint ASNs over all frames. Low constraint conformations are characterized by no shared partners between the catalytic residues (colored nodes), while highly constrained conformations show cohesively reinforced contacts between them. (E, F, G) Active site residues and the surrounding residues making up the ASN for the proteins described in panels B, C, and D, respectively. (H, I, J) Full protein models for the same examples, with the active site regions indicated by white boxes.

Figure 9

(A) Mean active site constraint scores and 95% autocorrelation-corrected parametric bootstrap confidence intervals for dimer A and B chain active sites, by variant. Red/blue intervals have t values versus WT (green) of at least ±2, indicating significant variation in average constraint across variants (comparisons made within the chain). (B) Minimum, (C) mean, and (D) maximum constraint ASNs over all structures. (E, F, G) Active site residues and the surrounding residues making up the ASN for the proteins described in panels B, C, and D, respectively. (H, I, J) Full protein models for the same examples, with the active site regions indicated by white boxes.

(A) Mean monomer active site constraint scores and 95% autocorrelation-corrected parametric bootstrap confidence intervals, by variant. Higher values indicate greater constraints on active site residues; red/blue intervals have t values versus WT (green) of at least ±2, indicating significant variation in average constraint across variants. (B) Minimum, (C) mean, and (D) maximum constraint ASNs over all frames. Low constraint conformations are characterized by no shared partners between the catalytic residues (colored nodes), while highly constrained conformations show cohesively reinforced contacts between them. (E, F, G) Active site residues and the surrounding residues making up the ASN for the proteins described in panels B, C, and D, respectively. (H, I, J) Full protein models for the same examples, with the active site regions indicated by white boxes. (A) Mean active site constraint scores and 95% autocorrelation-corrected parametric bootstrap confidence intervals for dimer A and B chain active sites, by variant. Red/blue intervals have t values versus WT (green) of at least ±2, indicating significant variation in average constraint across variants (comparisons made within the chain). (B) Minimum, (C) mean, and (D) maximum constraint ASNs over all structures. (E, F, G) Active site residues and the surrounding residues making up the ASN for the proteins described in panels B, C, and D, respectively. (H, I, J) Full protein models for the same examples, with the active site regions indicated by white boxes. Examination of the mean constraint scores for each variant trajectory suggests potential activity differences across M variants. Figure A shows mean constraint scores for each variant monomer, with autocorrelation-corrected parametric bootstrap confidence intervals. Of the 79 trajectories examined, 22 (28%) were significantly below the grand mean (dotted vertical line) and 28 (35%) were significantly above it; similarly, when directly compared to WT, 12 variants were observed to be significantly less constrained, while 17 were significantly more constrained (i.e., bootstrap t-scores less than −2 or greater than 2, respectively). A total of 43 out of 79 M sequences (55%) showed nominally higher levels of mean constraint than WT (discounting significance), suggesting a lack of uniform selection pressure for active sites that are more or less constrained than WT (the fraction greater does not differ significantly from random deviation, p = 0.16, exact binomial test). Thus, although we do not see evidence here of systematic selection for net changes in active site constraint in the monomeric state, we do see evidence that variants differ from each other and from WT in their average active site properties. Turning to the dimer trajectories, Figure A shows mean constraint scores for A and B chain active sites, with autocorrelation-corrected parametric bootstrap confidence intervals for each variant. Consistent with the hypothesis that dimerization can result in asymmetric modification of active site conformations,[20] we frequently observe large differences in the active site constraint between chains within the same variant; indeed, the root-mean-square difference in mean constraints within the variant is significantly larger than would be expected from the variation across sites alone (p = 0.048, permutation test), being approximately 1.65 times the standard deviation in mean constraint across variants. That said, such differentiation is not always present, with many trajectories (including WT) showing similar levels of constraint for both active sites, and it may not be necessary for function. (We observe that symmetric active site conformations were obtained in the WT dimer structure of Zhang et al.,[36] for instance.) Unlike the monomeric case, we see a greater trend here toward reduction in dimer active constraint scores versus WT, with 76% of sequences showing constraint levels lower than WT (p < 0.0001, exact binomial test). It is possible that this reflects a higher level of selection for looser active sites in the dimeric state per se. These differences should be considered when designing inhibitors that are tolerant of mutational change in M over time. In particular, it is clear that the population of extant M variants already possesses some phenotypic diversity in active site flexibility, potentially facilitating its ability to evolve around some types of inhibitors. The ability of M dimers to sustain active sites with different levels of flexibility may also complicate inhibitor design and suggests the importance of inhibitors that are robust to variation in active site constraint.

Overall Variation in M Active Site Conformations Is Related to Cys/His Contact

Given the diversity of active site conformations across monomeric/dimeric states, variants, and chains, it is useful to seek specific features that can be used to characterize those conformations. To examine this, we employ a kernel principal component analysis (kPCA) of a stratified sample of ASNs from the set of monomer and dimer trajectories, taking 25 evenly spaced frames from each trajectory (including both A and B ASNs in the dimeric case) for a total of 5925 conformations from the entire data set. The feature space for our kernel is the set of all labeled subgraphs on the set of potential ASNs, allowing us to detect specific patterns of contacts among moieties that characterize subpopulations of conformations. As shown in Figure , the combined set of active site conformations falls into two fairly distinct clusters obliquely aligned with the first two principal components. These respective clusters correspond to a single feature, specifically the presence or absence of at least one contact between the catalytic cysteine and histidine residues; the cluster aligned with the first principal component consists of “open” conformations lacking such a contact, while the cluster aligned with the second principal component consists of “closed” conformations in which the contact is present.
Figure 10

Kernel PCA solution for M active site networks. Conformations fall into two clusters, corresponding to whether the Cys-His tie is present (“closed,” circles) or absent (“open,” triangles). The position within each cluster is strongly associated with mean degree (see point color) and secondarily with network size (gradient shown via contour lines). Inset networks illustrate centroids of each cluster; note the presence of the Cys/His interaction (highlighted) in the selected A226 V chain B conformation, which is absent in the selected G278R chain A conformation.

Kernel PCA solution for M active site networks. Conformations fall into two clusters, corresponding to whether the Cys-His tie is present (“closed,” circles) or absent (“open,” triangles). The position within each cluster is strongly associated with mean degree (see point color) and secondarily with network size (gradient shown via contour lines). Inset networks illustrate centroids of each cluster; note the presence of the Cys/His interaction (highlighted) in the selected A226 V chain B conformation, which is absent in the selected G278R chain A conformation. Both clusters have an ellipsoidal form, with distance along the ellipsoids away from the origin corresponding to the average number of contacts per chemical group (mean degree). The cluster ellipsoids are also slightly oblique to one another, an orientation that arises from their joint respective correlation with the total number of moieties interacting with the active site residues (ASN size); this is shown in Figure via the gradient of the size distribution, indicated as contour lines. These three properties (Cys/His contact, mean degree, ASN size) thus provide a parsimonious description of the most important axes of variation in active site structure and may be useful as targets for experimental investigation (e.g., via NMR). Interestingly, we observe very little association between conformation and monomer/dimer status (R2 < 0.6% on the first PC, and no significant difference on the second-highest R2 in the first 10 dimensions of 1.4%), indicating that the full range of active site conformations is observed in both monomer and dimer structures and at similar frequencies. Conformation is more strongly associated with the variant, the latter accounting for approximately 14% of the variance in the first PC and 7% in the second; adding interaction effects with chain and monomer/dimer status increases this by 30% and 16%, respectively, suggesting that M mutations tend to affect monomer and dimer structures differently. While the mutations observed to date do not radically alter active site conformation (this being a priori unlikely for functional mutations in any event), they do appear to exert a nontrivial influence on the distribution of conformational states.

Conclusion

For clinically relevant variants of M, the observed variation so far is toward larger and more hydrophobic amino acids, leading to reduced structural cohesion on average in the variants relative to wild-type. Although mutations occur throughout the protein, the structural effects of those mutations appear to be far more localized, impacting primarily the protein’s interdomain interfaces and several key loop regions near the substrate-binding pocket. These mutations appear to result in systematic variations in both global flexibility and in the extent to which the catalytic residues of the active site are constrained (a factor previously found to be related to substrate specificity in related systems). Our results suggest that M may be currently subject to selection for enhanced global flexibility and that currently circulating M variants represent a reservoir of phenotypic diversity in active site structure and dynamics that could facilitate an evolutionary response to certain classes of protease inhibitors. These findings are relevant to dimerization kinetics, substrate capture, and the development of resistance to inhibitors, as well as our understanding of SARS-CoV-2 more generally. On a more methodological note, these results underscore the potential of comparative in silico studies to rapidly probe structural and functional consequences of genotypic variation in emerging diseases. Advances in both GPU-enabled hardware and molecular dynamics have made high-volume simulation studies feasible over short time horizons, giving us a powerful tool for selection of experimental targets. At the same time, comparative analysis of large volumes of trajectory data created by such simulation studies remains a challenge. Here, we have used both network analytic and machine learning techniques to identify potentially important sources of variation across trajectories. Although network analytic ideas have been used to study protein structures at least since 1993,[80] systematic use of combined network analytic and MD trajectories for comparative analysis of protein variants is more recent.[81−84] It is hoped that applications such as this one will inspire further development of this promising approach.
  13 in total

Review 1.  Inhibition of the main protease of SARS-CoV-2 (Mpro) by repurposing/designing drug-like substances and utilizing nature's toolbox of bioactive compounds.

Authors:  Io Antonopoulou; Eleftheria Sapountzaki; Ulrika Rova; Paul Christakopoulos
Journal:  Comput Struct Biotechnol J       Date:  2022-03-14       Impact factor: 7.271

2.  Withasomniferol C, a new potential SARS-CoV-2 main protease inhibitor from the Withania somnifera plant proposed by in silico approaches.

Authors:  Shivananada Kandagalla; Hrvoje Rimac; Krishnamoorthy Gurushankar; Jurica Novak; Maria Grishina; Vladimir Potemkin
Journal:  PeerJ       Date:  2022-06-02       Impact factor: 3.061

3.  Peptidomimetic nitrile warheads as SARS-CoV-2 3CL protease inhibitors.

Authors:  Bing Bai; Elena Arutyunova; Muhammad Bashir Khan; Jimmy Lu; Michael A Joyce; Holly A Saffran; Justin A Shields; Appan Srinivas Kandadai; Alexandr Belovodskiy; Mostofa Hena; Wayne Vuong; Tess Lamer; Howard S Young; John C Vederas; D Lorne Tyrrell; M Joanne Lemieux; James A Nieman
Journal:  RSC Med Chem       Date:  2021-08-20

4.  Screening of potent phytochemical inhibitors against SARS-CoV-2 protease and its two Asian mutants.

Authors:  Ijaz Muhammad; Noor Rahman; Sadaf Niaz; Zarrin Basharat; Luca Rastrelli; Sivaraman Jayanthi; Thomas Efferth; Haroon Khan
Journal:  Comput Biol Med       Date:  2021-04-16       Impact factor: 6.698

5.  Design of modular autoproteolytic gene switches responsive to anti-coronavirus drug candidates.

Authors:  Nik Franko; Ana Palma Teixeira; Shuai Xue; Ghislaine Charpin-El Hamri; Martin Fussenegger
Journal:  Nat Commun       Date:  2021-11-22       Impact factor: 14.919

6.  Novel dynamic residue network analysis approaches to study allosteric modulation: SARS-CoV-2 Mpro and its evolutionary mutations as a case study.

Authors:  Olivier Sheik Amamuddy; Rita Afriyie Boateng; Victor Barozi; Dorothy Wavinya Nyamai; Özlem Tastan Bishop
Journal:  Comput Struct Biotechnol J       Date:  2021-11-25       Impact factor: 7.271

Review 7.  Targeting SARS-CoV-2 Proteases for COVID-19 Antiviral Development.

Authors:  Zongyang Lv; Kristin E Cano; Lijia Jia; Marcin Drag; Tony T Huang; Shaun K Olsen
Journal:  Front Chem       Date:  2022-02-03       Impact factor: 5.545

Review 8.  Potential Resistance of SARS-CoV-2 Main Protease (Mpro) against Protease Inhibitors: Lessons Learned from HIV-1 Protease.

Authors:  János András Mótyán; Mohamed Mahdi; Gyula Hoffka; József Tőzsér
Journal:  Int J Mol Sci       Date:  2022-03-23       Impact factor: 5.923

9.  Large-Scale Recombinant Production of the SARS-CoV-2 Proteome for High-Throughput and Structural Biology Applications.

Authors:  Nadide Altincekic; Sophie Marianne Korn; Nusrat Shahin Qureshi; Marie Dujardin; Martí Ninot-Pedrosa; Rupert Abele; Marie Jose Abi Saad; Caterina Alfano; Fabio C L Almeida; Islam Alshamleh; Gisele Cardoso de Amorim; Thomas K Anderson; Cristiane D Anobom; Chelsea Anorma; Jasleen Kaur Bains; Adriaan Bax; Martin Blackledge; Julius Blechar; Anja Böckmann; Louis Brigandat; Anna Bula; Matthias Bütikofer; Aldo R Camacho-Zarco; Teresa Carlomagno; Icaro Putinhon Caruso; Betül Ceylan; Apirat Chaikuad; Feixia Chu; Laura Cole; Marquise G Crosby; Vanessa de Jesus; Karthikeyan Dhamotharan; Isabella C Felli; Jan Ferner; Yanick Fleischmann; Marie-Laure Fogeron; Nikolaos K Fourkiotis; Christin Fuks; Boris Fürtig; Angelo Gallo; Santosh L Gande; Juan Atilio Gerez; Dhiman Ghosh; Francisco Gomes-Neto; Oksana Gorbatyuk; Serafima Guseva; Carolin Hacker; Sabine Häfner; Bing Hao; Bruno Hargittay; K Henzler-Wildman; Jeffrey C Hoch; Katharina F Hohmann; Marie T Hutchison; Kristaps Jaudzems; Katarina Jović; Janina Kaderli; Gints Kalniņš; Iveta Kaņepe; Robert N Kirchdoerfer; John Kirkpatrick; Stefan Knapp; Robin Krishnathas; Felicitas Kutz; Susanne Zur Lage; Roderick Lambertz; Andras Lang; Douglas Laurents; Lauriane Lecoq; Verena Linhard; Frank Löhr; Anas Malki; Luiza Mamigonian Bessa; Rachel W Martin; Tobias Matzel; Damien Maurin; Seth W McNutt; Nathane Cunha Mebus-Antunes; Beat H Meier; Nathalie Meiser; Miguel Mompeán; Elisa Monaca; Roland Montserret; Laura Mariño Perez; Celine Moser; Claudia Muhle-Goll; Thais Cristtina Neves-Martins; Xiamonin Ni; Brenna Norton-Baker; Roberta Pierattelli; Letizia Pontoriero; Yulia Pustovalova; Oliver Ohlenschläger; Julien Orts; Andrea T Da Poian; Dennis J Pyper; Christian Richter; Roland Riek; Chad M Rienstra; Angus Robertson; Anderson S Pinheiro; Raffaele Sabbatella; Nicola Salvi; Krishna Saxena; Linda Schulte; Marco Schiavina; Harald Schwalbe; Mara Silber; Marcius da Silva Almeida; Marc A Sprague-Piercy; Georgios A Spyroulias; Sridhar Sreeramulu; Jan-Niklas Tants; Kaspars Tārs; Felix Torres; Sabrina Töws; Miguel Á Treviño; Sven Trucks; Aikaterini C Tsika; Krisztina Varga; Ying Wang; Marco E Weber; Julia E Weigand; Christoph Wiedemann; Julia Wirmer-Bartoschek; Maria Alexandra Wirtz Martin; Johannes Zehnder; Martin Hengesbach; Andreas Schlundt
Journal:  Front Mol Biosci       Date:  2021-05-10

10.  Neural Upscaling from Residue-Level Protein Structure Networks to Atomistic Structures.

Authors:  Vy T Duong; Elizabeth M Diessner; Gianmarc Grazioli; Rachel W Martin; Carter T Butts
Journal:  Biomolecules       Date:  2021-11-30
View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.