
Structural and functional significance of the amino acid differences Val35Thr, Ser46Ala, Asn65Ser, and Ala94Ser in 3C-like proteinases from SARS-CoV-2 and SARS-CoV.

Alexander I Denesyuk1, Eugene A Permyakov2, Mark S Johnson3, Sergei E Permyakov2, Konstantin Denessiouk3, Vladimir N Uversky4.   

Abstract

Three dimensional structures of (chymo)trypsin-like proteinase (3CLpro) from SARS-CoV-2 and SARS-CoV differ at 8 positions. We previously found that the Val86Leu, Lys88Arg, Phe134His, and Asn180Lys mutations in these enzymes can change the orientation of the N- and C-terminal domains of 3CLpro relative to each other, which leads to a change in catalytic activity. This conclusion was derived from the comparison of the structural catalytic core in 169 (chymo)trypsin-like proteinases with the serine/cysteine fold. Val35Thr, Ser46Ala, Asn65Ser, Ala94Ser mutations were not included in that analysis, since they are located far from the catalytic tetrad. In the present work, the structural and functional roles of these variable amino acids at positions 35, 46, 65, and 94 in the 3CLpro sequences of SARS-CoV-2 and SARS-CoV have been established using a comparison of the same set of proteinases leading to the identification of new conservative elements. Comparative analysis showed that, in addition to interdomain mobility, which could modulate catalytic activity, the 3CLpro(s) can use for functional regulation an autolytic loop and the unique Asp33-Asn95 region (the Asp33-Asn95 Zone) in the N-terminal domain. Therefore, all 4 analyzed mutation sites are associated with the unique structure-functional features of the 3CLpro from SARS-CoV-2 and SARS-CoV. Strictly speaking, the presented structural results are hypothetical, since at present there is not a single experimental work on the identification and characterization of autolysis sites in these proteases.
Copyright © 2021 Elsevier B.V. All rights reserved.

Keywords:  (Chymo)trypsin-like proteinases; Autolysis; COVID-19; Catalytic tetrad; Interdomain loop; SARS-CoV-2; Structural catalytic core

Year:  2021        PMID: 34774600      PMCID: PMC8580570          DOI: 10.1016/j.ijbiomac.2021.11.043

Source DB:  PubMed          Journal:  Int J Biol Macromol        ISSN: 0141-8130            Impact factor:   6.953


Introduction

The coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has raised many important issues for the international scientific community, especially regarding the molecular mechanisms involved in the viral infection process and SARS-CoV-2 replication. In coronaviruses, there are two functionally important proteinases, the papain-like proteinase (PLpro) and the (chymo)trypsin-like cysteine proteinase (3CLpro, also known as the viral main proteinase, Mpro), both belonging to the family of cysteine proteinases (https://swissmodel.expasy.org/repository/species/2697049) [1], [2]. The main protease 3CLpro, which corresponds to the coronavirus nonstructural protein 5 (NSP5), cleaves the central and C-terminal regions of the polyprotein at 11 conserved sites, generating 11 mature viral NSPs required for viral replication and infection [3], [4]. The coordinates of the three-dimensional (3D) structures of the 3CLpro(s) from SARS-CoV (PDB ID 1UJ1) [5] and SARS-CoV-2 (PDB ID 6LU7) [6] first appeared in the Protein Data Bank (PDB [7], [8]) in 2003 and 2020, respectively. These two viral proteases differ in their amino acid sequence at 12 positions (written here as SARS-CoV residue, position, SARS-CoV-2 residue): Thr35Val, Ala46Ser, Ser65Asn, Leu86Val, Arg88Lys, Ser94Ala, His134Phe, Lys180Asn, Leu202Val, Ala267Ser, Thr285Ala and Ile286Leu [9], [10]. Only the first 8 of the 12 variable amino acids are resolved in the (chymo)trypsin-like 3D structures, whereas the last 4 amino acid positions (202, 267, 285, and 286) belong to an additional C-terminal extension that forms a domain but lies outside the solved 3D structure of the 3CLpro. The sequence differences at these 12 amino acid positions are not expected to significantly affect the polarity and hydrophobicity of SARS-CoV-2 3CLpro compared to 3CLpro from SARS-CoV [9]. Importantly, all 12 variable amino acids are located outside the catalytic and substrate-binding regions of the enzyme.
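The substitution labels above can be handled programmatically. The following is a minimal sketch, assuming the labels read as SARS-CoV residue, position, SARS-CoV-2 residue; the helper names (`parse_substitution`, `reverse_label`) are hypothetical and introduced only for illustration. It splits the 12 differences into the 8 positions resolved in the (chymo)trypsin-like structure and the 4 positions in the C-terminal extension, and shows how a label is rewritten in the opposite direction (the form used in the title, e.g. Val35Thr).

```python
import re

# The 12 substitutions as listed in the text
# (assumed order: SARS-CoV residue, position, SARS-CoV-2 residue).
SUBSTITUTIONS = [
    "Thr35Val", "Ala46Ser", "Ser65Asn", "Leu86Val", "Arg88Lys", "Ser94Ala",
    "His134Phe", "Lys180Asn", "Leu202Val", "Ala267Ser", "Thr285Ala", "Ile286Leu",
]

# Positions that the text assigns to the additional C-terminal extension.
EXTENSION_POSITIONS = {202, 267, 285, 286}


def parse_substitution(label):
    """Split a label such as 'Thr35Val' into ('Thr', 35, 'Val')."""
    match = re.fullmatch(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", label)
    if match is None:
        raise ValueError(f"not a three-letter substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def reverse_label(label):
    """Rewrite a label in the opposite direction, e.g. 'Thr35Val' -> 'Val35Thr'."""
    first, position, second = parse_substitution(label)
    return f"{second}{position}{first}"


parsed = [parse_substitution(label) for label in SUBSTITUTIONS]
core_positions = [pos for _, pos, _ in parsed if pos not in EXTENSION_POSITIONS]
extension_positions = [pos for _, pos, _ in parsed if pos in EXTENSION_POSITIONS]

print(core_positions)       # positions resolved in the (chymo)trypsin-like structure
print(extension_positions)  # positions in the C-terminal extension
print(reverse_label("Thr35Val"))
```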
During 2020–2021, structural information from X-ray structures of more than 300 SARS-CoV-2 3CLpro complexes with various inhibitor molecules was reported (https://swissmodel.expasy.org/repository/species/2697049). In addition, the 3D structures of complexes of SARS-CoV-2 3CLpro with 33 known and potential inhibitor molecules have been studied using computational methods in order to discover potential inhibitors that can be used as antiviral therapeutic agents targeting the (chymo)trypsin-like cysteine proteinase (see recent review [11]). As a result, numerous ligand-binding amino acids of SARS-CoV-2 3CLpro have been identified. However, only 2 of the above-mentioned 12 variable residue positions (Ser46 and Leu286) are mentioned in the review [11]. Motivated by the lack of insight into the structural and functional consequences of the amino acid differences at these 12 positions in the sequences of the 3CLpro of SARS-CoV and SARS-CoV-2, we first identified the Structural Catalytic Core (SCC) in 169 (chymo)trypsin-like proteinases with the serine/cysteine fold [12], [13]. Next, we compared the Nucleophile-Base Catalytic Zones (NBCZones) of the 3CLpro of SARS-CoV [13] and SARS-CoV-2 [12], and found that the NBCZones of both viral proteinases form compact structures around the catalytic nucleophile and base that consist of 11 conserved amino acids: Leu27, Asn28, Cys38, Pro39, Arg40, His41 (catalytic base), Val42, Cys145 (catalytic nucleophile), Gly146, Ser147 and His163. Furthermore, it turned out that the NBCZones of the 3CLpro of SARS-CoV-2 (PDB ID 7BQY) [6] and SARS-CoV (PDB ID 6XHN) [14] are identical to each other [12]. The NBCZone is only a part of the SCC. Therefore, the structural regions around the third (Cys85) and fourth (His164) members of the structural catalytic tetrad have been analyzed as well [12]. The compact complex of four amino acids around the catalytic acid analogue Cys85 is referred to as the 102T-Core.
“T” indicates that the canonical residue numbering based on the trypsin sequence is used; in this case it refers to the third member of the structural catalytic tetrad, the catalytic Asp102 in trypsin (PDB ID 4I8H) [15]. In our work, for each protease we used both the original numbering of the amino acid sequence and the canonical numbering based on trypsin. Consequently, the 102T-Core (85-Core in the original numbering) of the 3CLpro of SARS-CoV-2 consists of Gln83, Cys85, Val86 and Leu87. In SARS-CoV 3CLpro, leucine replaces valine at position 86. The inclusion of the four amino acids of the 102T-Core in the SCC composition made it possible to reveal one more important difference between the 3CLpro of SARS-CoV-2 and SARS-CoV: Lys88Arg. This amino acid position, 88 (105T), is located at a conserved position of the β-sheet of the N-terminal β-barrel [16]. A set of six amino acids at positions 134, 135, 136, 180, 181, and 182, located spatially next to the fourth member of the structural catalytic tetrad, His164, is called the S-Core [12]. The S-Cores of the 3CLpros of SARS-CoV-2 and SARS-CoV are characterized by two amino acid differences: Phe134His and Asn180Lys. These amino acids form a part of the SCC, but they are located on its periphery. The tertiary structures of (chymo)trypsin-like proteinases with the serine/cysteine fold are separable into groups on the basis of the super-secondary structure differences within this region [12]. Amino acids at positions 86 and 180 are involved in the contacts between the N- and C-terminal β-barrels of the 3CLpro. The sequence differences between the SARS-CoV-2 and SARS-CoV 3CLpros at positions 86 and 180 seem to affect the nature of the interaction between the N- and C-terminal β-barrels, which ultimately leads to the modulation of enzymatic activity [12].
These results made it possible to explain the structural and functional significance of 4 (positions 86, 88, 134 and 180) out of the 8 observed amino acid differences in the SARS-CoV-2 and SARS-CoV 3CLpro sequences [12]. In the present work, we compared distinct 3D structures of 170 (chymo)trypsin-like proteinases with the serine/cysteine fold, identified new conserved elements, and established the structural and functional roles of the remaining four variable amino acids at positions 35, 46, 65 and 94 in the amino acid sequences of SARS-CoV-2 and SARS-CoV 3CLpros, for which the structural context has been reported. It has been suggested that these 4 positions are associated with the autolysis process in two loops of the 3CLpro of SARS-CoV-2, as shown for trypsin and chymotrypsin proteinases [17], [18], [19]. Currently, a strategy for inhibiting serine/cysteine proteases by targeting their autolysis loops is being actively developed [20]. Therefore, the goal of this study was to find important regularities in the 3D structures of the family of (chymo)trypsin-like serine/cysteine proteases (including the 3CLpros from SARS-CoV-2 and SARS-CoV) that are currently missing from the structural description of these proteins and that can be used to answer some functional questions. To this end, we utilized a structural biology approach based on multiple structural comparison and subsequent analysis. Earlier, application of this analysis revealed the presence in the alpha/beta-hydrolases of unique structural motifs, termed the structural catalytic zones (SCZs) and the SCCs, that serve to properly position the catalytic machinery and coordinate function. The advantage of using SCZs and SCCs for comparative analysis is that this approach can compare and group proteins without superposing their entire tertiary structures.
Therefore, this approach provides useful means to classify proteins into various groups on the basis of such local structural similarities. Earlier, this analysis revealed that all proteases with the (chymo)trypsin-like serine/cysteine fold contain a universal 3D structural motif in their structural catalytic cores, the Nucleophile-Base Catalytic Zone (NBCZone), that includes eleven amino acids near the catalytic nucleophile and base [12]. We also analyzed in detail the peculiarities of the amino acid content of the SCCs in 169 proteinases with the (chymo)trypsin-like serine/cysteine fold [12], [13]. This analysis revealed that based on the differences in their SCCs, these proteinases can be divided into two classes and four groups, with the proteinases belonging to different classes and groups differing from each other by the nature of the interaction between their N- and C-terminal β-barrels. The utility of this approach for gaining important information of the functional peculiarities of proteins was proven by the comparative analysis of the 3CLpro(s) from SARS-CoV-2 and SARS-CoV, which showed that amino acids at positions 103T and 179T affect the nature of the interaction of the “catalytic acid” core (102T-Core, N-terminal β-barrel) with the “supplementary” core (S-Core, C-terminal β-barrel), which ultimately results in the modulation of an enzymatic activity. It was also found that the Val86Leu, Lys88Arg, Phe134His, and Asn180Lys mutations in these enzymes can change the orientation of the N- and C-terminal domains of 3CLpro relative to each other, which leads to a change in catalytic activity [13]. However, Val35Thr, Ser46Ala, Asn65Ser, Ala94Ser mutations were not included in the previous analysis, since they are located far from the catalytic tetrad. 
In the present work, we are filling this gap and are using the aforementioned structural biology approach to establish the structural and functional roles of these variable amino acids at positions 35, 46, 65, and 94 in the 3CLpro sequences of SARS-CoV-2 and SARS-CoV. A comparison of the same set of 169 proteinases with the (chymo)trypsin-like serine/cysteine fold allowed us to identify new conservative elements. We found that, in addition to interdomain mobility, which could modulate catalytic activity, the 3CLpro(s) can use an autolytic loop and the unique Asp33-Asn95 region (the Asp33-Asn95 Zone) in the N-terminal domain. Therefore, all 4 analyzed mutation sites are associated with the unique structure-functional features of the 3CLpro from SARS-CoV-2 and SARS-CoV.

Results and discussion

Selection of residue Asn28 of the 3CLpro SARS-CoV-2 as the starting amino acid for structural analysis

Earlier, a comparative structural analysis of 169 (chymo)trypsin-like proteases with the serine/cysteine fold was carried out. The analysis was based on the identification in each protein of an SCC near the catalytic tetrad and their subsequent comparison with each other [12], [13]. However, in those studies we found only 4 amino acid sequence positions, in which 3CLpros from SARS-CoV-2 and SARS-CoV sequences differed from each other. Later, it became clear that coronavirus proteases modulate their enzymatic activity using amino acids that also are not located in structural proximity of the catalytic tetrad (this work). The NBCZone of the SARS-CoV-2 3CLpro (PDB ID 7BQY) [6], which is a part of the SCC around the catalytic nucleophile Cys145 and the catalytic base His41, includes the amino acids Asn28 and His163 (Fig. 1A). Structural analysis of the region around His163 and adjacent fourth member of the structural catalytic tetrad, His164, made it possible to elucidate the functional role of the amino acid differences at positions 86 and 180 [13].
Fig. 1

(A) and (B) show the “Val42-Leu27 Zone” of the SARS-CoV-2 3CLpro and the “Cys58-Cys42 Zone” of trypsin from Bos taurus, respectively. The 58T–42T Zone is the representative zone for 148 (chymo)trypsin-like proteinases with the serine/cysteine fold. The 58T–42T Zone consists of two parts: a well-structured part (four hydrogen bonds), which consists of the residues at positions 42T, 33T, 34T and 64T, and the 58T–64T loop, which is variable in length. The positions of the Cα-atoms of the amino acids Ser46 and Asn65 of the SARS-CoV-2 3CLpro (A), which have changed in comparison with the SARS-CoV 3CLpro, are marked with large green circles. The location of the 58T–42T Zone in relation to the NBCZone formed by the amino acids 42T, 43T, 57T (catalytic base), 58T, 195T (catalytic nucleophile), 196T, 197T and 213T is also shown. Some variations in the organization of the well-structured part of the 58T–42T Zone: (C) shows the 3D complex of the NS3 protease and the NS4A cofactor (brown); (D) the 3C-like viral cysteine protease shows another variant of the structural organization of the well-structured part of the 58T–42T Zone, in which the second and third β-strands are connected by aromatic residues; (E) instead of four hydrogen bonds, only two remain in this variant of the 58T–42T Zone; and (F) the 2A proteinase has no structural analogue of the 58T–42T Zone at all. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

The tertiary structure in the vicinity of Asn28 has previously been subjected to rigorous structural analysis as well [12]. It has been shown that due to the presence of the Asn28 side chain (position 43T), prokaryotic and viral proteases cannot undergo a structural transition from zymogen to zyme type, which is observed in eukaryotic proteases [12]. Asn28 is not directly involved in interactions with inhibitors [6].
In spite of this, mutation of Asn28 to alanine disrupts dimerization (the dimer is the active form of the enzyme) and completely inactivates the SARS-CoV 3CLpro [21]. Moreover, it has been shown that 8 out of 33 promising and potential SARS-CoV-2 3CLpro inhibitor molecules are in contact with the adjacent Leu27 [11]. A hydrophobic contact between Val42, which follows the catalytic base His41, and Leu27 can maintain the conformation of the polypeptide chain near the active site (Fig. 1A). We note that this statement is a hypothesis.

Val42-Leu27 zone of 3CLpro SARS-CoV-2

In view of the importance of the Leu27-Asn28 dipeptide and Val42 for the catalytic activity of the SARS-CoV-2 3CLpro, this region of the tertiary structure, located “above” the NBCZone in Fig. 1A, has been analyzed using the Discovery Studio Modeling Environment (Dassault Systèmes BIOVIA, Discovery Studio Modeling Environment, Release 2017, San Diego: Dassault Systèmes, 2016) and the Ligand-Protein Contacts (LPC) software [22]. Visual analysis of the tertiary structure of the SARS-CoV-2 3CLpro in the vicinity of Leu27, Asn28 and Val42 allowed identification of what we refer to as the “Val42-Leu27 Zone” (Fig. 1A). A characteristic structural feature of the Val42-Leu27 Zone is the presence of four main-chain hydrogen bonds between two pairs of amino acids: Leu27 and Val20, as well as Thr21 and Leu67 (Table S1, row numbered 1, columns 6 and 7). These four amino acids are located at the ends of three antiparallel β-strands and form the first, well-structured part of the Val42-Leu27 Zone. The second part of the Val42-Leu27 Zone is the Val42-Leu67 loop, which is also called loop B [23] or the 60-loop [24]. If Leu27 belongs to the “first” β-strand in Fig. 1A, then Leu67 belongs to the third β-strand, which is parallel to the first β-strand in the β-sheet. As noted above, the hydrophobic contact between Val42 and Leu27 unites the two structurally dissimilar parts of the Val42-Leu27 Zone into a single compact structure. Note that Val42 and Leu27 of the Val42-Leu27 Zone are also components of the NBCZone.

58T–42T zone of (chymo)trypsin-like proteinases with the serine/cysteine fold

The SARS-CoV-2 3CLpro structure is one of 170 structures studied by us that have the same b.47 fold classification within the Structural Classification of Proteins–extended (SCOPe) (https://scop.berkeley.edu/ [25]). The tertiary structure of trypsin (PDB ID 4I8H) [15] is a representative example of this fold. Val42 and Leu27 of the SARS-CoV-2 3CLpro structurally correspond to Cys58 and Cys42 of trypsin, respectively (Fig. 1B). Visual analysis of the tertiary structure of trypsin in the vicinity of Cys42, Gly43, and Cys58, allowed identification of the “Cys58-Cys42 Zone”, whose characteristic structural feature is the presence of four main-chain hydrogen bonds between two pairs of amino acids: Cys42 and Leu33, as well as Asn34 and Gln64 (Table S1, row numbered 13, columns 6 and 7). At first glance, the Val42-Leu27 and Cys58-Cys42 Zones are structurally similar (see Fig. 1A and B). However, one can see two fundamental differences. The hydrophobic contact between Val42 and Leu27 that closes the Val42-Leu27 Zone is replaced by the Cys58-Cys42 disulfide bond, which is conserved in eukaryotes [12]. The second significant difference between the two zones is the different lengths of the Val42-Leu67 and Cys58-Gln64 loops: 26 and 7 residues, respectively (Table S1, rows numbered 1 and 13, column 9). The Val42-Leu27 Zone for the 3CLpro SARS-CoV-2 can be also written in a universal form as the 58T–42T Zone. Unlike prokaryotic and viral proteases, most eukaryotic proteases have disulfide-linked cysteines at positions 42T and 58T [12]. The need for a strong side-chain interaction forming the single ring-shaped structure (Zone) is apparently directly related to the functioning of eukaryotic proteases. In fact, it has been shown that the presence or absence of the Cys42T-Cys58T disulfide bond affects the overall thermal stability of trypsin [26]. The results of the structural analysis of the 58T–42T Zone for all 170 structures are presented in Table S1. 
148 structures (Table S1, rows numbered 1–148) have an identical structural organization of the first well-structured part of the 58T–42T Zone. This large group of proteins includes all eukaryotic and prokaryotic proteases, TA and [KR]P groups of the viral serine proteases, [TA]N and [ΨC][PQ] groups of the viral cysteine proteases and inactive proteases. The names of the groups and the list of the corresponding proteases are taken from the study on the characterization of the NBCZones in (chymo)trypsin-like proteinases with the serine/cysteine fold [12]. The [ΨC][PQ] group includes twelve viral cysteine proteases, which lack the catalytic acid [13]; i.e., instead of the catalytic triad, they have a catalytic dyad in the active site (Table S1, rows numbered 1–12). Eleven out of twelve proteins are coronavirus proteases. It is important to note that, despite the structural identity of the well-structured part of the 58T–42T Zone of this group of 148 proteases, the 58T–64T loop varies significantly in length from 4 to 37 residues (Table S1, column 9). Some variation in the organization of the well-structured part of the 58T–42T Zone is demonstrated by the proteases belonging to the viral serine proteases, [ST]Ψ group (Table S1, rows numbered 149–156). Unlike the 148 proteases considered earlier, the PDB files of 8 proteases belonging to this group contain non-covalent, heterodimer complexes formed by two proteins, the N-terminal serine protease domain of NS3 (catalytic subunit) and the NS4A cofactor (activation subunit). Fig. 1C illustrates the 58T–42T (Gly1058-Phe1043) Zone for the complex of the NS3 protease and NS4A protein from the hepatitis C virus (PDB ID 3SU6) [27]. The formation of a heterodimeric complex results in a structure, where a β-strand from NS4A (brown in Fig. 1C) is located in the Gly1058-Phe1043 Zone structure, replacing the third β-strand of NS3 that begins with Ala1065. 
The inclusion of the NS4A cofactor in the NS3 protease structure increases the well-structured part of the 58T–42T Zone by one amino acid: Val993. Therefore, in this case, the length of the Gly1058-Ala1065 loop is 8 residues (Table S1, row numbered 154, column 9). The viral cysteine proteases, T[TSA] group (Table S1, rows numbered 157–161), show another variant of the organization of the well-structured part of the 58T–42T Zone (Fig. 1D). Instead of two hydrogen bonds formed between the main-chain atoms that connect the second and third β-strands, the role of such an interchain clamp in the 3C-like protease from the Norwalk virus (PDB ID 5E0G) [28] is performed by aromatic contacts among Phe12, Phe39, and Phe40. The length of the Val31-Phe39 loop is 9 residues (Table S1, row numbered 159, column 9). The distinctive structural characteristics of the adhesion and penetration protein autotransporter (Fig. 1E, PDB ID 3SYJ, [29]), which belongs to the SPATE family [30], [31], were previously studied [12]. Six compared proteins belonging to this family (Table S1, rows numbered 162–167) have a main-chain conformation at position 43T (position 84 in Fig. 1E) that is different from that found in all other proteins analyzed so far. As a result, instead of four hydrogen bonds, only two remained in the Asn99-Val84 Zone of these proteins. Furthermore, two fragments of the polypeptide chain have a loop-like conformation, instead of the β-structural one. The length of Asn99-Asp106 loop is 8 residues (Table S1, row numbered 162, column 9). Three 2A proteinases (Table S1, rows numbered 168–170) have the same structural characteristics at position 43 as the proteases of the SPATE family. However, the N-terminal domain in these proteinases is not a β-barrel, but a four-stranded antiparallel β-sheet [32]. As a result, the 2A proteinase has no structural analogy to the 58T–42T Zone (Fig. 1F, PDB ID 4MG3, [33]). 
Summarizing everything said in this section, we can conclude that amino acid residues at positions 33T, 34T, and 43T (Val20, Thr21, and Asn28 in the 3CLpro SARS-CoV-2 (Fig. 1A)) should also be included in the conserved structural core of the (chymo)trypsin-like proteinases with the serine/cysteine fold. This conclusion is fully consistent with the results of the structural analysis of 13 widely divergent serine proteinases (see Fig. 10 in [16]). The most frequently observed changes in the local secondary structure are found near position 64T. These structural rearrangements change the nature of the proteolytic activity of the corresponding proteins, but do not eliminate it.
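The dual numbering used throughout this section can be kept straight with a small lookup table. The sketch below is illustrative only: it contains just the SARS-CoV-2 3CLpro positions whose trypsin-based (T) equivalents are stated explicitly in the text, and the names `SARS2_TO_TRYPSIN`, `canonical` and `zone_label` are hypothetical. Zone labels are written with a plain hyphen for simplicity.

```python
# SARS-CoV-2 3CLpro position -> (residue, canonical trypsin-based position).
# Only pairs stated explicitly in the text are included; no other positions
# are inferred here.
SARS2_TO_TRYPSIN = {
    20: ("Val", "33T"),
    21: ("Thr", "34T"),
    27: ("Leu", "42T"),
    28: ("Asn", "43T"),
    41: ("His", "57T"),    # catalytic base
    42: ("Val", "58T"),
    67: ("Leu", "64T"),
    85: ("Cys", "102T"),   # catalytic acid analogue
    88: ("Lys", "105T"),
    145: ("Cys", "195T"),  # catalytic nucleophile
}


def canonical(position):
    """Return a label such as 'Asn28 (43T)' for a tabulated position."""
    residue, t_position = SARS2_TO_TRYPSIN[position]
    return f"{residue}{position} ({t_position})"


def zone_label(first, second):
    """Express a zone or loop given in 3CLpro numbering in T numbering."""
    return f"{SARS2_TO_TRYPSIN[first][1]}-{SARS2_TO_TRYPSIN[second][1]}"


print(canonical(28))
print(zone_label(42, 27))  # the Val42-Leu27 Zone
print(zone_label(42, 67))  # the Val42-Leu67 loop
```

Positions that are absent from the table raise a `KeyError`, which is deliberate: the mapping comes from structural alignment and cannot be obtained by adding a constant offset.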

Val42-Leu67 loop of 3CLpro SARS-CoV-2

The Val42-Leu67 loop of the SARS-CoV-2 3CLpro contains 26 amino acids (Table S1, row numbered 1, columns 8 and 9), and the residues at positions 46 and 65 (Figs. 1A and 2A) were the subject of our structural analysis. This loop is long and contains 2 of the 8 positions at which the amino acid residues differ between the 3CLpros from SARS-CoV-2 and SARS-CoV. Since the 58T–64T loop length in the 170 proteases varies significantly, from 4 to 37 residues, it was impossible to compare these loops structurally. For this reason, we analyzed the structural and functional role of the Val42-Leu67 (58T–64T) loop of the SARS-CoV-2 3CLpro only.
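The loop lengths quoted here and in the preceding section are consistent with counting both terminal residues. As a minimal check, assuming contiguous residue numbering with no gaps or insertion codes inside each loop (an assumption, not something stated in Table S1), the inclusive length is simply last minus first plus one. The function name `loop_length` is hypothetical.

```python
def loop_length(first, last):
    """Inclusive residue count of a loop, assuming contiguous numbering."""
    if last < first:
        raise ValueError("last residue must not precede first residue")
    return last - first + 1


# (first residue, last residue, length reported in the text)
REPORTED_LOOPS = {
    "Val42-Leu67, SARS-CoV-2 3CLpro": (42, 67, 26),
    "Cys58-Gln64, trypsin": (58, 64, 7),
    "Gly1058-Ala1065, NS3/NS4A complex": (1058, 1065, 8),
    "Val31-Phe39, Norwalk virus 3C-like protease": (31, 39, 9),
    "Asn99-Asp106, SPATE autotransporter": (99, 106, 8),
}

for name, (first, last, reported) in REPORTED_LOOPS.items():
    computed = loop_length(first, last)
    status = "matches" if computed == reported else "differs from"
    print(f"{name}: computed {computed}, {status} reported {reported}")
```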
Fig. 2

(A) Hydrophobic interactions (small green and orange circles) between Val42-Leu67 loop and Cys85-Zone in the SARS-CoV-2 3CLpro. The positions of the Cα-atoms of the amino acids Ser46 and Asn65 of the SARS-CoV-2 3CLpro (A), which have changed in comparison with the SARS-CoV 3CLpro, are marked with large green circles. Polar contacts (small blue and green circles) between the conserved parts of the interdomain loop (IDL): Val186-Gln192, Horse Shoe-Shaped Region (HSSR) and Val42-Leu67 loop. (B) In the complex between ligand and SARS-CoV-2 3CLpro hydrophobic amino acids Cys44, Met49 and Tyr54 of the Val42-Leu67 loop and the Leu4 residue of the ligand interact with each other and maintain the 3D position of the catalytic histidine in position 41. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Hydrophobic interactions between the Val42-Leu67 loop and Cys85-zone

The tertiary conformation of the Val42-Leu67 loop is stabilized by numerous hydrophobic contacts of the amino acids Val42, Ile43, Tyr54, Leu58, and Phe66 with the N- and C-terminal hydrophobic amino acids Met82 and Leu87 of the Cys85-Zone (Fig. 2A). The Cys85-Zone is actually a structural analogue of the catalytic acid (position 102T) zone [13]. With a few exceptions, a similar type of interactions between the 58T–64T loop and 102T-Zone is typical for all (chymo)trypsin-like proteinases with the serine/cysteine fold (data not shown). The mutation Asn65Ser of the SARS-CoV-2 3CLpro is located near the C-terminal end of the Val42-Leu67 loop. Positions 65 and 66 are adjacent in the amino acid sequence, therefore, it can be reasonably assumed that Asn65Ser sequence difference would affect the contact between the Val42-Leu67 loop and the Cys85-Zone through the amino acid Phe66. The change in the nature of this contact compared to homologous contact in the 3CLpro of the SARS-CoV is caused both by a change in the size of the side chain group of the amino acid at position 65 and by the appearance of an additional positively charged group of NH2 atoms.

The N-end half of the Val42-Leu67 loop interacts with ligand

An analysis of contacts in the complex (PDB ID 7BQY) between the PRD_002214 ligand and SARS-CoV-2 3CLpro, performed by means of the LPC software [22], showed that Cys44, Met49, and Tyr54 of the Val42-Leu67 loop and residue Leu4 of the ligand interact with each other and maintain the 3D positioning of the catalytic histidine at sequence position 41, mainly via hydrophobic interactions (Fig. 2B). These results are consistent with the data presented in the recent review of the application of various computational methods (see Table 1 in [11]) to the identification of amino acids that are involved in contacts between 33 antiviral ligands and the SARS-CoV-2 3CLpro. Among the residues of the Val42-Leu67 loop, Met49 (17 of 33 cases) and Tyr54 (8 of 33 cases) are the ones most frequently observed to participate in contacts. Therefore, the N-terminal half of the Val42-Leu67 loop is responsible for modulating the catalytic activity of the SARS-CoV-2 3CLpro. We note that this statement is a hypothesis. It is possible that the replacement of the hydrophobic alanine of the SARS-CoV 3CLpro by the polar serine at position 46 of the SARS-CoV-2 3CLpro affects this modulation. Ser46 and Met49 are located on a short, one-turn α-helix, Thr45-Met49. The side-chain groups of Ser46 and Met49 are in tight contact with each other. Apparently, Met49 is a key intermediate amino acid through which a change in the residue at position 46 affects the catalytic base. Fig. 3 shows the structural alignment of the Val42-Leu67 loop from the SARS-CoV-2 3CLpro with the loops from ten coronavirus proteases and the loop of one 3CL protease from the Cavally virus (Table S1, rows numbered 1–12). The alignment was built using the Protein Structure Comparison Server Dali [34]. The length of the loops in coronavirus proteases corresponding to the Val42-Leu67 loop of the SARS-CoV-2 3CLpro is fairly uniform and varies from 23 to 26 amino acids (Table S1, columns 8 and 9).
It is important to note that the alignment contains deletions at the N-terminus within a narrow range of positions 45–48 (SARS-CoV-2 3CLpro numbering). Therefore, the use of the position 46 to modulate the catalytic activity of coronavirus proteases is quite possible.
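To put the contact counts quoted above on a common scale, they can be expressed as percentages of the 33 ligands surveyed in the review [11]. This is plain arithmetic on the two counts given in the text; the variable and function names are hypothetical.

```python
TOTAL_LIGANDS = 33

# Number of ligands (out of 33) reported to contact each loop residue.
CONTACT_COUNTS = {"Met49": 17, "Tyr54": 8}


def contact_percent(residue):
    """Share of the surveyed ligands that contact the residue, in percent."""
    return round(100 * CONTACT_COUNTS[residue] / TOTAL_LIGANDS, 1)


for residue in CONTACT_COUNTS:
    print(f"{residue}: {contact_percent(residue)}% of ligands")
```

Met49 therefore appears in roughly half of the surveyed complexes and Tyr54 in roughly a quarter.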
Fig. 3

The structure-based multiple sequence alignment of the Val42-Leu67 loop of the SARS-CoV-2 3CLpro with the corresponding loops of ten coronavirus proteases and one 3CL protease from the Cavally virus.

The structure-based multiple sequence alignment of Val42-Leu67 loop of SARS-CoV-2 3CLpro, ten corresponding coronavirus proteases and one 3Cl protease from Cavally virus loops.

The N-terminal half of the Val42-Leu67 loop interacts with the N-terminus of the interdomain loop (IDL)

The SARS-CoV-2 3CLpro structure comprises two β-barrels (domains I and II), which bring the catalytic residues together at their interface (canonical (chymo)trypsin-like structure, residues Ser1-Asp176), and an additional C-terminal extension [6]. Asp176 is the last amino acid of the His164-Core [13]. The C-terminal extension, which starts with Leu177, contains the interdomain loop (IDL, residues 184–199) (Fig. 2A). The conserved Val186-Gln192 Horse-Shoe-Shaped Region (HSSR) is a part of the IDL [35]. Asp187 and the water molecule HOH570 compensate for the absence of a catalytic acid in the SARS-CoV-2 3CLpro (Fig. 2A). The residues Ser46, Met49, Leu50, Asn51, Pro52 and Tyr54 of the Val42-Leu67 loop are in contact with Asp187, Arg188, Gln189 and Thr190 of the IDL. It has been suggested that the IDL may represent a regulatory site, since it does not contain the amino acids of the catalytic triad [35]. Therefore, the Ser46Ala sequence difference between the SARS-CoV-2 and SARS-CoV 3CLpros, located near position 58T, can affect the catalytic activity of the SARS-CoV-2 3CLpro not only by itself, but also indirectly through the IDL of the C-terminal extension. As noted above, the residue Ser46 is an essential functional element affecting the binding of the ligand [11], [36]. It was also suggested that the Ser46Ala mutation may increase the contribution of other hydrophilic amino acids to the structure of the active site [37].

Ser46Ala and Asn65Ser, and probable autolysis regulation in the SARS-CoV-2 3CLpro

There is another possibility related to the functional consequences of the Ser46Ala and Asn65Ser sequence differences between the SARS-CoV-2 and SARS-CoV 3CLpro proteins. Dissolved trypsin is known to undergo autolytic degradation [17]. Three important autolysis sites have been reported for bovine trypsin: Lys60-Ser61, Arg117-Val118 and Lys145-Ser146. In rat (chymo)trypsin, the autolytic site Phe114 (Leu114 in trypsin) was identified [19]. The Phe114Ile mutant has the same enzymatic activity and molecular stability as the wild-type enzyme, but exhibits significantly slower autolytic inactivation. In rat trypsin, Lys61 (Lys60 in bovine trypsin) and Arg117 were both replaced by Asn [18]. The kinetic parameters of the mutants did not change, but the autolysis rate slowed down significantly. Lys60 of bovine trypsin belongs to the 58T–64T loop. Since the 58T–64T loop of the SARS-CoV-2 3CLpro differs significantly in amino acid sequence from the 58T–64T loop of bovine trypsin, it is not known whether there are autolytic sites in the 58T–64T loop of the coronavirus proteases. If there are any, then the Ser46Ala and/or Asn65Ser sequence differences could alter the functional lifetime of the SARS-CoV-2 3CLpro without changing its activity. This structural assumption is hypothetical and should be verified by appropriate experiments.

Structural relationship of amino acids Val35 and Ala94 of the SARS-CoV-2 3CLpro

The two remaining positions, 35 and 94, of the SARS-CoV-2 3CLpro (PDB ID 7BQY), at which amino acid changes are observed compared to the SARS-CoV 3CLpro (PDB ID 6XHN), are located respectively at the N- and C-terminal ends of the functionally important β-strands Val35-Pro39 and Val86-Lys90 (Fig. 4A). The first of these two β-strands has the catalytic histidine His41 near its C-terminal end, and the second β-strand has, at its N-terminus, Cys85, which replaces the catalytic acid. The second β-strand also contains positions 86 and 88, where the SARS-CoV-2 and SARS-CoV sequences have different amino acids. In the SARS-CoV-2 3CLpro 3D structure, these two β-strands form an antiparallel β-hairpin, held together by six interchain hydrogen bonds, starting from the bond N/Arg40-O/Cys85 (3.0 Å) and ending with the O/Asp34-N/Val91 bond (3.0 Å) (Fig. 4A). Although the adjoining fragments are separated in the amino acid sequence, they are in contact in the 3D structure, as two more hydrogen bonds are observed: O/Asp33-CA/Ala94 and O/Asp33-N/Asn95. In addition, the tetrapeptide Leu32-Val35 forms a β-turn. As a result, the two fragments Asp33-Asp34 and Val91-Asn95 form an Asp33-Asn95 Zone. This zone directly contains one of the positions of interest, Ala94. Val35 lies at the border between the Asp33-Asn95 Zone and the antiparallel β-hairpin.
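The hydrogen-bond lengths quoted above (for example, 3.0 Å for N/Arg40-O/Cys85) are interatomic distances that can be recomputed from a coordinate file. The following is a minimal Python sketch, not the procedure used in this work (which relied on Discovery Studio and LPC). The helper names `pdb_line`, `parse_atoms` and `atom_distance` are hypothetical, and the two coordinates are synthetic placeholders chosen to lie 3.0 Å apart; they are not values taken from 7BQY.

```python
import math


def pdb_line(serial, name, resname, chain, resseq, x, y, z):
    """Build a fixed-column PDB ATOM record (hypothetical helper for the demo)."""
    return (f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atoms(pdb_text):
    """Map (chain, residue number, atom name) -> (x, y, z) for ATOM/HETATM records.

    Columns follow the fixed-width PDB format. If an atom has alternate
    locations, the last one read overwrites the earlier ones.
    """
    atoms = {}
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            name = line[12:16].strip()
            chain = line[21]
            resseq = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms[(chain, resseq, name)] = xyz
    return atoms


def atom_distance(atoms, key_a, key_b):
    """Euclidean distance in angstroms between two parsed atoms."""
    return math.dist(atoms[key_a], atoms[key_b])


# Synthetic records (NOT real 7BQY coordinates): donor N of Arg40, acceptor O of Cys85.
sample = "\n".join([
    pdb_line(1, "N", "ARG", "A", 40, 0.0, 0.0, 0.0),
    pdb_line(2, "O", "CYS", "A", 85, 3.0, 0.0, 0.0),
])
atoms = parse_atoms(sample)
d = atom_distance(atoms, ("A", 40, "N"), ("A", 85, "O"))
print(f"N/Arg40-O/Cys85: {d:.1f} A")
```

With a real entry, the text of the downloaded PDB file would be passed to `parse_atoms` in place of the synthetic records.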
Fig. 4

(A) and (B) show the “Asp33-Asn95 Zone” and “Ser49-Ala112 Zone” of the SARS-CoV-2 3CLpro and trypsin from Bos taurus, respectively. The 49T–112T Zone is the representative zone for 104 (chymo)trypsin-like proteinases with the serine/cysteine fold: the code name of this zone is “(5 + 2)”. The positions of the Cα-atoms of the amino acids Val35 and Ala94 of the SARS-CoV-2 3CLpro (A), which have changed in comparison with the SARS-CoV 3CLpro, are marked with large green circles. The location of the 49T–112T Zone in relation to the catalytic base and catalytic acid is also shown. (C) a zone with the code name “(8 + 2)”; (D) a zone with the code name “(9 + 2)” and the disulfide bond Cys111-Cys50; (E) a zone with the code name “(11 + 2)”, which lacks the 49T–111T contact or the disulfide bond Cys50-Cys111; and (F) a zone with the code name “(21 + 3)”, a special version of the zone in which the second fragment has 3 amino acid residues instead of two. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Because positions 35 and 94 are spatially close to each other in the structure of the Asp33-Asn95 Zone of the SARS-CoV-2 3CLpro, it was of interest to analyze the equivalent region in all the 3D structures included in the superfamily of (chymo)trypsin-like proteinases with the serine/cysteine fold. The results of the analysis of 128 proteinases (75% of the total set) are collected in Table S2. In the structure of trypsin, the amino acids Ser49 and Ala112 correspond respectively to Asp33 and Asn95 of the SARS-CoV-2 3CLpro (Fig. 4B). Fig. 4B shows the 3D organization of the Ser49-Ala112 Zone in trypsin. The first fragment, 108T–112T, consists of 5 residues, and the second fragment, 49T–50T, is formed by two amino acids. Therefore, the SARS-CoV-2 3CLpro and trypsin have structurally identical Asp33-Asn95 and Ser49-Ala112 Zones, which are located far from the catalytic tetrad. 
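The zone code names count the residues in each of the two fragments that make up a zone. As a minimal illustration in Python, assuming contiguous residue numbering without insertion codes within each fragment (the function names `fragment_length` and `zone_code` are hypothetical), the code name can be derived from the fragment boundaries given in the text:

```python
def fragment_length(first, last):
    """Number of residues in a fragment given inclusive first/last residue numbers."""
    return last - first + 1


def zone_code(first_fragment, second_fragment):
    """Format a zone code name such as "(5 + 2)" from two (first, last) fragments."""
    n1 = fragment_length(*first_fragment)
    n2 = fragment_length(*second_fragment)
    return f"({n1} + {n2})"


# Bovine trypsin: fragments 108T-112T and 49T-50T.
trypsin_code = zone_code((108, 112), (49, 50))
# SARS-CoV-2 3CLpro: fragments Val91-Asn95 and Asp33-Asp34.
clpro_code = zone_code((91, 95), (33, 34))

print(trypsin_code, clpro_code)
```

Both calls yield the same code name, which is the sense in which the two zones are described as having the same "(5 + 2)" organization.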
In the course of the structural analysis, we identified 104 examples of such 3D sub-structures (Table S2, rows numbered 1–11, 13–79, 88–91, 104, 141–147 and 157–170). In addition to the identity of the 49T–112T Zones, these 104 proteases have an identical 3D arrangement of the 49T–112T Zone in relation to the catalytic base at position 57T and the catalytic acid at position 102T (or its structural analog). This group of proteases was given the code name “(5 + 2)”. Eight other proteases form an (8 + 2) group (Table S2, rows numbered 133–140). Fig. 4C shows an example of the 49T–112T Zone from the (8 + 2) group, that of the nuclear inclusion protein A from the Tobacco vein mottling virus. Apparently, 8 amino acids is the maximum possible length of the first fragment of the 49T–112T Zone. This conclusion is derived from the structural observation that extending the first fragment by just one amino acid makes it impossible to form the contact between the Cα-atom of the residue at position 111T and the main-chain oxygen atom of the residue at position 49T. However, four proteases (Table S2, rows numbered 27, 53, 67, and 76) overcome this restriction, as their 49T–112T Zone is formed through the use of a disulfide bond Cys111-Cys50, thereby defining the (9 + 2) group (Fig. 4D). Further lengthening of the first fragment of the 49T–112T Zone (Fig. 4E) to an (11 + 2) group results in a situation where the disulfide bond Cys111-Cys50 is no longer used for mutual structural stabilization of the ends of the 49T–112T Zone (Table S2, rows numbered 117–121). The last seven examples of proteases in Table S2 show that the 49T–112T Zone can be modified not only by changing the length of the first fragment but also by extending the second fragment by one residue. Fig. 4F shows the serine protease HTRA2 from Homo sapiens as an example of the Ala189-Leu240 Zone from such a (7 + 3) group. 
Four proteases have been found with a similar zone arrangement (Table S2, rows numbered 80–82, and 148). The last three prokaryotic proteases form a (21 + 3) group (see Table S2, rows numbered 101–103) and demonstrate the variant of the zone in which the second fragment has 3 amino acids, allowing for significantly wider variations in the length of the first fragment than in the zone in which the second fragment has 2 amino acids. In Section 2.4.4, we cited work in which the Arg117-Val118 dipeptide of trypsin was mentioned as an autolysis site [18]. There appears to be a close relationship between the 49T–112T Zone and the autolysis process. At positions 35 and 94, the SARS-CoV-2 3CLpro has two small hydrophobic amino acids, Val35 and Ala94, instead of the two small polar residues Thr35 and Ser94 of the SARS-CoV 3CLpro. It is possible that these changes give the SARS-CoV-2 3CLpro structure additional hydrophobic resistance to autolysis; that is, the amino acid replacements at positions 35 and 94 of the SARS-CoV-2 3CLpro can change the autolysis rate. This structural assumption is hypothetical and should be verified by appropriate experiments.
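The group sizes quoted in this section can be cross-checked by expanding the Table S2 row ranges given in the text. The Python sketch below is only an arithmetic check under stated assumptions: the row ranges are transcribed from the text, the dataset size of 170 is the one given in Materials and methods, and the names `expand_rows` and `GROUP_ROWS` are hypothetical.

```python
def expand_rows(spec):
    """Expand a string such as "1-11, 13-79, 104" into a set of row numbers."""
    rows = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = map(int, part.split("-"))
            rows.update(range(low, high + 1))
        else:
            rows.add(int(part))
    return rows


# Table S2 row ranges as transcribed from the text, keyed by zone code name.
GROUP_ROWS = {
    "(5 + 2)": "1-11, 13-79, 88-91, 104, 141-147, 157-170",
    "(8 + 2)": "133-140",
    "(9 + 2)": "27, 53, 67, 76",
    "(11 + 2)": "117-121",
    "(7 + 3)": "80-82, 148",
    "(21 + 3)": "101-103",
}
DATASET_SIZE = 170  # number of structures stated in Materials and methods

sizes = {code: len(expand_rows(spec)) for code, spec in GROUP_ROWS.items()}
total = sum(sizes.values())
percent = round(100 * total / DATASET_SIZE)

for code, n in sizes.items():
    print(f"{code}: {n}")
print(f"total: {total} ({percent}% of {DATASET_SIZE})")
```

The tally sums the group sizes as listed; it does not check whether a row number appears under more than one group.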

Conclusions

The 3D structures of the (chymo)trypsin-like 3CLpro from SARS-CoV-2 and SARS-CoV have different amino acid residues at 8 positions of their amino acid sequences: Val35Thr, Ser46Ala, Asn65Ser, Val86Leu, Lys88Arg, Ala94Ser, Phe134His, and Asn180Lys (Fig. 5). These residues can be divided into two structural groups. The first group includes 4 amino acids at positions 86, 88, 134, and 180 that form the structural catalytic core. The second group includes 3 amino acids at positions 46, 65, and 94 located in the loop regions (Fig. 5A and B). This group is also adjoined by the amino acid at position 35 (Fig. 5A and B). The first group of residues modulates the catalytic activity of the 3CLpro by changing the nature of the interaction between the N- and C-terminal β-barrels. The second group of residues can be involved in modulation of the activity through unique features of the tertiary structure such as the C-terminal extension (IDL loop). In addition, the amino acids of the second group and the sites of possible autolysis of the 3CLpro intersect in the amino acid sequence, which suggests that autolysis plays an essential role in modulating the catalytic activity of this important viral protease. This result opens up a new field of research for those involved in protein characterization and inhibition of the 3CLpro from SARS-CoV-2.
Fig. 5

Superposition of three-dimensional (3D) structures of the 3CLpro(s) from SARS-CoV (PDB ID 1UJ1; shown in gray) and SARS-CoV-2 (PDB ID 6LU7; shown in blue). (A) Shows the location within the entire structure of the V42-L27 Zone (for reference, see Fig. 1A), the catalytic triad (H41, C85 and C145 in SARS-CoV-2), and the variable amino acids at the positions 35, 46, 65 and 94. (B) Shows the location of the D33-N95 Zone (for reference, see Fig. 4A), the catalytic triad (H41, C85 and C145 in SARS-CoV-2), and the variable amino acids at the positions 35, 46, 65 and 94. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


Materials and methods

The choice of structures to be analyzed

In this work, as in the previous publications [12], [13], the same dataset of 170 3D structures of the (chymo)trypsin-like proteinases with the serine/cysteine fold was used. These two publications describe in detail how the set of 3D structures was compiled.

Modeling software for structural analysis

Structure visualization and structural analysis of interactions between amino acids in proteins (hydrogen bonds, hydrophobic and other types of weak interactions) were carried out using the Discovery Studio Modeling Environment (Dassault Systèmes BIOVIA, Discovery Studio Modeling Environment, Release 2017, San Diego: Dassault Systèmes, 2016) and the Ligand-Protein Contacts (LPC) software [22]. Figures were drawn with MOLSCRIPT [38].

Funding

The project was supported by the and Joe, Pentti and Tor Borg (A.I.D. and M.S.J).

CRediT authorship contribution statement

Alexander I. Denesyuk: Study design, Formal analysis, Methodology, Visualization, Writing – Original Draft, Writing – Review & Editing; Eugene A. Permyakov: Formal analysis, Writing – Review & Editing; Mark S. Johnson: Formal analysis, Methodology, Writing – Original Draft; Sergei E. Permyakov: Formal analysis, Writing – Review & Editing; Konstantin Denessiouk: Formal analysis, Methodology, Visualization, Writing – Original Draft, Writing – Review & Editing; Vladimir N. Uversky: Study design, Formal analysis, Methodology, Visualization, Investigation, Writing – Original Draft, Writing – Review & Editing.

Declaration of competing interest

The authors declare no conflict of interest.