Since early 2020, the world has been suffering from a new beta-coronavirus, SARS-CoV-2, which has had devastating effects globally through its associated disease, Covid-19. To date, Covid-19, which not only causes life-threatening lung infections but also impairs various other organs and tissues, has killed hundreds of thousands of people and caused irreparable damage to many others. Since the very onset of the pandemic, huge efforts have been made worldwide to fully understand this virus, and numerous studies have been, and still are being, published. Many of these deal with structural analyses of the viral spike glycoprotein and with the development of vaccines, antibodies and antiviral molecules or immunomodulators that are expected to become essential tools in the struggle against the virus. This paper summarizes current knowledge on the properties of the four structural proteins (spike protein S, membrane protein M, envelope protein E and nucleocapsid protein N) of SARS-CoV-2 and of its relatives SARS-CoV and MERS-CoV, which emerged a few years earlier. Moreover, attention is paid to ways of analyzing such proteins with freely available bioinformatic tools and, more importantly, of bringing these proteins to life by viewing them on a computer or laptop screen with the easy-to-use, yet powerful and interactive, molecular graphics program DeepView. We hope that this paper will stimulate non-bioinformaticians and non-specialists in structural biology to scrutinize these and other macromolecules and will thereby contribute to establishing procedures to fight these and possibly other future viruses.
The year 2020 will always be remembered as “the year of the pandemic.” A new type of virus causing severe respiratory illness emerged in December 2019 in Wuhan, China. Since then, it has rapidly spread throughout the entire world, leaving a trail of destruction with high mortality. The pathogen was soon identified as a member of the Coronaviridae family, subfamily Coronavirinae, which is further subdivided into four genera called alpha, beta, gamma and delta (Belouzard et al., 2012; Fehr and Perlman, 2015; Fung and Liu, 2019; Li et al., 2020a). The new virus was classified as a beta-coronavirus and found to be closely related to other human beta-coronaviruses that emerged early in the 21st century, i.e., SARS-CoV, which died out after about one year, and MERS-CoV, which is still lingering. The new virus was officially named SARS-CoV-2, and the disease it causes is known as Covid-19 (Gorbalenya et al., 2020). The original SARS-CoV-2 virus, which evolved in bats (Andersen et al., 2020; Shereen et al., 2020) like many other alpha- and beta-coronaviruses (Li et al., 2005; Chan et al., 2013; Wang and Anderson, 2019), is easily transmitted from human to human, has an appreciably high reproduction number when no containment measures are taken (R = 2–4) and a high infection fatality rate (IFR = 0.3–1.3%), and remains infective for extended periods of time outside the human body (Bar-On et al., 2020; Rabi et al., 2020). Moreover, it can already spread for several days before an infected person notices the first symptoms of disease, because the virus has developed several ways to thwart the immune system’s response (Astuti and Ysrafil, 2020; Banerjee et al., 2020; Kikkert, 2020). SARS-CoV-2, together with SARS-CoV and MERS-CoV (hereafter referred to as the SARS-CoV-s), remains of great concern because of the worldwide health threat these viruses pose to humans.
For this reason, since the first appearance of SARS-CoV-2, numerous studies have been conducted to understand its structure, organization, modes of infection, multiplication and pathogenesis. These studies are expected to keep guiding the development of strategies that use antivirals and/or immunomodulators to attenuate the severity of illness in case of infection, and/or that prevent infection through vaccines (Abd Ellah et al., 2020; Capell et al., 2020; Callaway, 2020a; Dai et al., 2020; Dong et al., 2020; Graham, 2020; Hu et al., 2020a; Kaslow, 2020; Krammer, 2020; Li et al., 2020c; Liu et al., 2020; Poland et al., 2020; Riva et al., 2020; Tay et al., 2020), while trying to avoid serious complications such as cytokine storm development, suboptimal antibody responses or immune enhancement (Tisoncik et al., 2012; de Alwis et al., 2020; Hotez et al., 2020; Iwasaki and Yang, 2020; Moore and June, 2020; Pedersen and Ho, 2020). Another point of attention is the prevention of mutational escape of viral proteins, which appears to occur following administration of single antibody species (Baum et al., 2020). Such studies are all the more important because many more SARS-CoV/MERS-CoV-like coronaviruses may be lurking around the corner, ready to jump to humans through interspecies transmission and to spread amongst them in the years or decades to come (Wang and Anderson, 2019; Valitutto et al., 2020).
Aim of the Paper
This paper presents an overview of current knowledge of the SARS-CoV-s’ structural proteins, of their spatial organization and of their functional properties, with emphasis on the spike protein. The involvement of the host’s own proteins in the development of Covid-19 is also considered. Moreover, attention is paid to how antibodies and peptides may help to overcome infections. In the Supplementary Material to this paper, we explore and demonstrate how bioinformatic tools that are freely available on the internet may help students and researchers who are neither trained bioinformaticians nor structural biologists to understand and visualize macromolecules such as those of the beta-coronaviruses. In view of the overwhelming number of structural studies and the continuously increasing amount of data in the Protein Data Bank (wwPDB consortium, 2019; Berman et al., 2000), it is a definite asset to be able to visualize (macro)molecules on a personal computer screen. Although authors do their utmost to present the structures in their publications in optimal orientations and with the most instructive coloring, it is essential to be able to walk around in these structures yourself to gain a much better understanding of these molecules and to appreciate their 3D structure and flexibility. Therefore, the Supplementary Material guides the reader through this exciting area, which continues to grow steadily in importance. Exploring the structural data that are amply available nowadays is becoming a prerequisite for understanding complex particles such as the SARS-CoV-s and helps us deal with them. In fact, looking in detail at the structures of the respective viral components allows us to understand the whole sequence of events that occur during infection and pathogenesis.
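As a concrete starting point for this kind of hands-on exploration, the short Python sketch below retrieves coordinate files such as 6VXX directly from the Protein Data Bank. It is a minimal example under stated assumptions: it relies on the public RCSB PDB download URL pattern `https://files.rcsb.org/download/<ID>.pdb` (or `.cif`) and uses only the standard library; the helper names `pdb_url` and `fetch_pdb` are our own, hypothetical choices. Very large entries are distributed only as mmCIF, so a `.pdb` request may fail for them.

```python
"""Download PDB coordinate files from the RCSB PDB (sketch, stdlib only)."""
import re
import urllib.request

# Assumed public download pattern of the RCSB PDB file server.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{code}.{ext}"


def pdb_url(pdb_id: str, fmt: str = "pdb") -> str:
    """Build the download URL for a classic four-character PDB identifier."""
    code = pdb_id.strip().upper()
    if not re.fullmatch(r"[1-9][A-Z0-9]{3}", code):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    if fmt not in ("pdb", "cif"):
        raise ValueError("fmt must be 'pdb' or 'cif'")
    return RCSB_DOWNLOAD.format(code=code, ext=fmt)


def fetch_pdb(pdb_id: str, fmt: str = "pdb", timeout: float = 30.0) -> str:
    """Return the coordinate file as text (needs an internet connection)."""
    with urllib.request.urlopen(pdb_url(pdb_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Closed (6VXX) and one-RBD-open (6VYB) SARS-CoV-2 spike models.
    for entry in ("6VXX", "6VYB"):
        print(pdb_url(entry))
```

The text returned by `fetch_pdb("6VXX")` can be saved to a file and opened in DeepView or any other molecular graphics program.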
The Genome of the SARS-CoV-s
Coronaviruses are (+)ssRNA (positive-sense single-stranded RNA) viruses with a very large RNA genome, typically around 30 kb. The viral RNA is packaged inside a spherical membrane (roughly 100–125 nm in diameter) with the help of soluble nucleocapsid proteins (N). The other structural proteins are three membrane proteins, i.e., the spike protein (S), the membrane protein (M) (occasionally also called matrix protein) and the envelope protein (E). These four structural proteins occur at a ratio of roughly E:S:N:M = 1:5:50:100, according to estimates for SARS-CoV (Bar-On et al., 2020). At the 5′ end, the viral RNA contains two large so-called replicase genes (rep1a, rep1b) organized as two extended open reading frames (5′-ORF-1a/1b), followed by genes that code for the four structural proteins and some accessory proteins (Fehr and Perlman, 2015; Tang et al., 2020; Figure 1). As soon as the viral RNA enters a host cell, it acts as an mRNA molecule and hijacks not only the host translational machinery to make all its encoded proteins, but also the host’s intricate post-translational modification systems. The open reading frames 5′-ORF-1a/1b encode a series of non-structural proteins (Nsps), some of which are enzymes, while others have as yet unknown functions (Fehr and Perlman, 2015; Chen et al., 2020). Translation of these two ORFs in the cytoplasm of the host cell results in the synthesis of two long polyproteins (pp1a, pp1b), which are autoproteolytically cleaved by viral proteases. One of the enzymes (Nsp12) is a unique RNA-dependent RNA polymerase (RdRp), responsible for replication of the viral RNA genome (Jiang et al., 2020). Another one (Nsp14) also has an essential role in replication and transcription: it is a bifunctional enzyme with an exoribonuclease domain (ExoN) that, unusually for an RNA virus, has proofreading activity and thus limits the occurrence of lethal mutations in the viral RNA (Ferron et al., 2017; Romano et al., 2020).
Moreover, two important viral proteases are produced from the viral RNA, i.e., the main protease (Mpro, also called 3CLpro) and a papain-like protease (PLpro), both implicated in processing of the polyproteins (Zhang et al., 2020a). Among other components, numerous copies of the nucleocapsid protein N are also made in the cytoplasm.
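The E:S:N:M stoichiometry quoted above is easier to grasp as proportions. The minimal worked example below converts the ratio into the fraction of all structural-protein copies contributed by each protein and, using the fact that spike proteins are homotrimers (discussed below), into copies of a partner protein per spike trimer. Only the ratio itself comes from the text; the names `RATIO`, `fractions_of_total` and `copies_per_trimer` are illustrative.

```python
from fractions import Fraction

# Relative copy numbers E:S:N:M estimated for SARS-CoV (Bar-On et al., 2020).
RATIO = {"E": 1, "S": 5, "N": 50, "M": 100}


def fractions_of_total(ratio: dict[str, int]) -> dict[str, Fraction]:
    """Fraction of all structural-protein copies contributed by each protein."""
    total = sum(ratio.values())
    return {name: Fraction(count, total) for name, count in ratio.items()}


def copies_per_trimer(ratio: dict[str, int], partner: str, trimer: str = "S") -> Fraction:
    """Copies of `partner` per trimer of the homotrimeric `trimer` protein."""
    return Fraction(3 * ratio[partner], ratio[trimer])


if __name__ == "__main__":
    for name, frac in fractions_of_total(RATIO).items():
        print(f"{name}: {float(frac):.1%}")
    print("M monomers per S trimer:", copies_per_trimer(RATIO, "M"))
    print("N copies per S trimer:", copies_per_trimer(RATIO, "N"))
```

Run as a script, this prints 0.6%, 3.2%, 32.1% and 64.1% for E, S, N and M, followed by 60 M monomers and 30 N copies per spike trimer: spikes are a small minority of the structural-protein copies, even though they dominate the virion's appearance.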
FIGURE 1
(A) The organization of the viral RNA, coding for, amongst others, the four structural proteins and additional non-structural proteins, e.g., enzymes mentioned in the text. (B) Schematic representation of a SARS-coronavirion with its four structural proteins S, M, E, and N. The spike proteins S are present as a mixture of intact proteins (S1 plus S2) and proteins that have already lost their S1 portion. (C) Left: schematic view of a SARS-CoV spike protein. S proteins are homotrimers; each subunit consists of two portions, S1 (blue) and S2 (purple), followed by a transmembrane helix (olive) and a short cytoplasmic tail (brown). Right: the spike protein trimer ectodomains of (from left to right) SARS-CoV-2 (from 6VXX.pdb), SARS-CoV (from 5XLR.pdb) and MERS-CoV (from 5X5C.pdb). Peptide segments S1 and S2 are colored as in the schematic view. Literature references for structural codes: 6VXX.pdb (Walls et al., 2020); 5XLR.pdb (Gui et al., 2017); 5X5C.pdb (Yuan et al., 2017).
From RNA to Mature Virions
Upon infection, (+)ssRNA viruses usurp host cell membranes from certain organelles. In the SARS-CoV-s, ER membranes are captured, from which a complex reticulovesicular network forms that contains double-membrane vesicles (DMVs, 200–300 nm) and remains connected to the ER (Snijder et al., 2020). DMVs act as little factories in which the viral (+)ssRNA is first copied into dsRNA (a replication intermediate), from which new viral (+)ssRNA molecules are generated. In this way, viral RNA is isolated and shielded from innate immune sensing. This entire process also relies on several viral Nsps that form complexes, many of which have yet to be characterized. Some Nsps assemble to form pores in the DMV membranes, through which newly made ssRNA molecules are exported into the host cytoplasm (Wolff et al., 2020). During this process, the RNA molecules collect N proteins that bind in a “beads-on-a-string” fashion to stabilize the RNA (Yao et al., 2020). These RNP complexes then travel to virus assembly sites at the ERGIC and/or Golgi complex. Meanwhile, the viral structural membrane proteins (S, M, E) follow the export route. Only protein S is equipped with a signal peptide to access the ER in the classical way. How the other two reach the ER is not yet clear, although some membrane proteins in various organisms are known to face the same problem (Ott and Lingappa, 2002). The three viral membrane proteins follow the normal flow from the ER towards the Golgi apparatus. In this process they are decorated with N-linked glycans, which are essential for proper folding and maturation of the molecules (Zhao et al., 2015). The three proteins assemble in the ERGIC/Golgi membrane and induce its invagination. RNA–(protein N) complexes enter the pro-virions (Wolff et al., 2020), driven by N–M protein interactions, after which new virions containing S, M and E proteins, as well as RNA–N complexes, pinch off by ERGIC or Golgi membrane fission.
Finally, mature virions are released from the host cell in a non-classical manner. Instead of using the secretory exocytosis pathway, the virus uses lysosomes that are deacidified (possibly through the action of the viral protein ORF3a), concomitantly inactivating lysosomal degradative enzymes and disturbing cellular processes including autophagy, pathogen degradation and antigen presentation (Ghosh et al., 2020). During trafficking, the virions are continuously accompanied by the KDEL-containing ER chaperones GRP78/BiP and calreticulin and by the KDEL receptor, which are co-released. Hundreds of new virions may be released from an infected lung cell, which dies from exhaustion or is eliminated by the host’s immune system. The new virions can then infect other cells of the same host or be expelled into the air in droplets or aerosols that may reach another human host.
The Spike Protein (S) Attaches the Virus to Host Cells and Mediates Internalization
The spike protein is an integral single-pass type-I membrane protein that protrudes in many copies from the outer surface of the virus, giving the virus its characteristic appearance. Spikes are responsible for binding the virus to a human or animal cell by recognizing specific receptors and, thereafter, for entry of the virus into the host cell. Spike proteins are the major antigenic determinants of the virus and the main targets in numerous active and passive immunization studies (Amanat and Krammer, 2020; de Alwis et al., 2020; Wrapp et al., 2020a). Indeed, there is an urgent need to generate neutralizing antibodies to fight Covid-19 (as Klasse (2014) explains, “neutralization” is defined as “the reduction in viral infectivity by the binding of antibodies to the surface of virions, thereby blocking a step in the viral replication cycle that precedes virally encoded transcription or synthesis”). Detailed structural studies on the SARS-CoV-s have paved the way to understanding the complexity of the spike proteins and their mode of action. The build-up of the spikes is shown schematically in Figure 1. Spike proteins are homotrimers. Each monomer has a large ectodomain that consists of several subdomains, followed by a transmembrane domain and an endodomain that contains a series of cysteine residues with attached palmitoyl chains (Petit et al., 2007; McBride and Machamer, 2010; Veit, 2012). S-palmitoylation is a well-known reversible post-translational protein modification (Charollais and van der Goot, 2009; Blaskovic et al., 2013). The ectodomain also undergoes extensive post-translational modification and becomes heavily glycosylated by the host’s N- and O-glycosylation machinery (Fung and Liu, 2018; Watanabe et al., 2020a, b; Yao et al., 2020; Shajahan et al., 2020). The overall architecture of the SARS-CoV-s’ spike proteins is well understood and has been conserved during evolution of the viruses. It is summarized in Figure 2.
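A simple, commonly used way to screen a sequence for potential N-glycosylation sites is to search for the sequon N-X-S/T, where X is any residue except proline. The sketch below is a minimal illustration of that textbook rule, not a glycosylation predictor: a sequon is required for N-glycosylation but does not guarantee that the site is occupied, and O-glycosylation has no comparably simple motif. The function name `find_sequons` and the toy peptide are hypothetical.

```python
import re

# Textbook N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The zero-width lookahead lets overlapping sequons (e.g. "NNST") both be found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq: str, first_residue: int = 1) -> list[int]:
    """Return the residue numbers of Asn residues that start an N-X-S/T sequon."""
    seq = seq.upper()
    return [match.start() + first_residue for match in SEQUON.finditer(seq)]


if __name__ == "__main__":
    toy = "ANVTANPSQNGS"  # hypothetical peptide, not a viral sequence
    print(find_sequons(toy))  # [2, 10]: NVT and NGS; N6 is skipped (NPS)
```

Passing the precursor sequence together with `first_residue=1` returns positions in the same numbering as used in Figure 2, so candidate sites can be compared directly with the domain boundaries.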
Although the spike glycoproteins of the three SARS-CoV-s are structurally quite similar, their primary structures differ substantially (Table 1, and Supplementary Material: Supplementary Table 1, Supplementary Figure 2). Both the S1 and the S2 halves of the spike protein consist of a series of subdomains, each with a well-defined function. The N-terminal S1 is responsible for receptor binding, while the C-terminal S2 mediates membrane fusion to allow entry of the virus into a host cell.
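The identity (Id) and similarity (Si) percentages in Table 1 summarize such sequence differences. The sketch below shows one common way to compute them from a pairwise alignment. Two choices are assumptions, because the text does not state how its values were obtained: "similar" means a non-identical pair belonging to the same Clustal "strong" conservation group, and percentages are taken over the full alignment length, gaps included. Other tools use other groups or denominators, so this sketch should not be expected to reproduce Table 1 exactly.

```python
# Clustal "strong" conservation groups; treating these as "similar" is an assumption.
STRONG_GROUPS = ("STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW")


def is_similar(a: str, b: str) -> bool:
    """True if residues a and b share at least one strong conservation group."""
    return any(a in group and b in group for group in STRONG_GROUPS)


def identity_similarity(aln1: str, aln2: str, gap: str = "-") -> tuple[float, float]:
    """Percent identical and percent similar-but-not-identical columns.

    Both percentages use the full alignment length (gap columns included).
    """
    if len(aln1) != len(aln2) or not aln1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = similar = 0
    for a, b in zip(aln1.upper(), aln2.upper()):
        if gap in (a, b):
            continue  # gap columns count in the denominator only
        if a == b:
            identical += 1
        elif is_similar(a, b):
            similar += 1
    n = len(aln1)
    return 100.0 * identical / n, 100.0 * similar / n


if __name__ == "__main__":
    print(identity_similarity("ACDEFGHIK-", "ACDQYG-LKR"))  # (50.0, 30.0)
```

In the toy alignment, five columns are identical and three (E/Q, F/Y, I/L) are similar, out of ten columns in total.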
FIGURE 2
Architecture of SARS-CoV-s’ spike proteins. On top, the two S protein ectodomain halves (S1 and S2) with their different subdomains are depicted. Below, the amino acid sequence of the most important domains is shown for the most recent SARS-CoV-2 virus (sequence taken from the NCBI protein database, accession number YP_009724390). The subdomains whose structure was resolved (PDB database, accession number 6VXX) are colored; others are shown in white (or in gray, for the signal peptide). A white scissors symbol indicates cleavage of the signal peptide in the ER of the host cell during biosynthesis of the protein. The two black scissors symbols indicate the positions of the consecutive cleavage steps occurring during viral infection. SS, signal sequence; NTD, N-terminal domain A27–S305; RBD, receptor-binding domain P330–P521; SD1, SD2, structural subdomains 1 and 2; S1/S2, cleavage site R682–R685; FP, fusion peptide S816–F833; HR1, heptad repeat 1 G908–D985; CH, central helix E988–G1035; CD, connector domain T1076–L1141; HR2, heptad repeat 2 D1163–E1202; TM, transmembrane domain W1214–L1234; CT, cytoplasmic tail C1235–T1273. Some residues are not resolved in model 6VXX. In the NTD, the positions of 71 residues (in 5 stretches, i.e., V16–P26, V70–F79, Y144–N164, Q173–N185 and R246–A262) are missing, and in the RBD, 30 residues remain undetermined (i.e., V445–G446, L455–L461, S469–C488 and residue G502). In the SD subdomain, residues P621–S640 are not resolved. The peptide in which the S1/S2 cleavage occurs (containing the furin cleavage sequence RRAR) is also missing, from Q677 to A688. Of the short fusion peptide, the first twelve residues are present in the structure (S816–T827), but the end of the peptide (L828ADAGF833) is missing, as are the following residues up to Q853. The last residue in the structure is S1147, shortly before the HR2 subdomain.
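The domain boundaries and unresolved stretches listed above can be encoded directly, which is handy for annotating residue numbers or checking coverage of a model. The sketch below copies the ranges from the legend (precursor numbering of YP_009724390); SD1/SD2 boundaries are not given there and are therefore absent, and the names `DOMAINS`, `MISSING_6VXX`, `domain_of` and `count_missing` are hypothetical. Read as inclusive ranges, the RBD stretches add up to the 30 residues stated and the S1/S2 loop to 12 residues, whereas the five NTD stretches sum to 72 rather than 71, so one boundary is worth checking against the deposited coordinates.

```python
from typing import Optional

# Domain boundaries of the SARS-CoV-2 spike (YP_009724390), precursor numbering,
# as listed in the Figure 2 legend. SD1/SD2 boundaries are not given there.
DOMAINS = [
    ("NTD", 27, 305), ("RBD", 330, 521), ("S1/S2 site", 682, 685),
    ("FP", 816, 833), ("HR1", 908, 985), ("CH", 988, 1035),
    ("CD", 1076, 1141), ("HR2", 1163, 1202), ("TM", 1214, 1234),
    ("CT", 1235, 1273),
]

# Stretches reported as unresolved in model 6VXX (inclusive ranges).
MISSING_6VXX = {
    "NTD": [(16, 26), (70, 79), (144, 164), (173, 185), (246, 262)],
    "RBD": [(445, 446), (455, 461), (469, 488), (502, 502)],
    "SD": [(621, 640)],
    "S1/S2 loop": [(677, 688)],
}


def domain_of(resnum: int) -> Optional[str]:
    """Name of the annotated domain containing `resnum`, or None if unannotated."""
    for name, start, end in DOMAINS:
        if start <= resnum <= end:
            return name
    return None


def count_missing(stretches: list[tuple[int, int]]) -> int:
    """Total number of residues in a list of inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in stretches)


if __name__ == "__main__":
    for region, stretches in MISSING_6VXX.items():
        print(region, count_missing(stretches))  # NTD 72, RBD 30, SD 20, S1/S2 loop 12
    print(domain_of(500), domain_of(614))  # RBD None (D614 lies outside the listed domains)
```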
TABLE 1
Properties of the three human SARS-CoV-s’ structural proteins S (ectodomain), M, E and N.

| Protein / property | SARS-CoV-2 | SARS-CoV | MERS-CoV |
|---|---|---|---|
| Membrane spike protein S | | | |
| Accession number (NCBI) | YP_009724390 | P59594 | ASY99778 |
| Ectodomain | V16–P1213 | S14–V1198 | Y18–W1300 |
| Molecular mass (kDa) | 132.924 | 131.577 | 141.544 |
| Theoretical pI | 6.30 | 5.56 | 5.65 |
| Aliphatic index | 83.23 | 82.63 | 81.61 |
| Sequence vs. SARS-CoV-2 | – | 76.0% Id, 17.0% Si | 26.7% Id, 34.3% Si |
| Sequence vs. SARS-CoV | 76.0% Id, 17.0% Si | – | 27.3% Id, 33.9% Si |
| Sequence vs. MERS-CoV | 26.7% Id, 34.3% Si | 27.3% Id, 33.9% Si | – |
| Membrane protein M | | | |
| Accession number (NCBI) | QIC53216 | AAP13444 | AGH58718 |
| Molecular mass (kDa) | 25.146 | 25.070 | 24.552 |
| Theoretical pI | 9.51 | 9.63 | 9.27 |
| Aliphatic index | 120.86 | 115.06 | 103.70 |
| Sequence vs. SARS-CoV-2 | – | 89.2% Id, 8.1% Si | 39.1% Id, 31.8% Si |
| Sequence vs. SARS-CoV | 89.2% Id, 8.1% Si | – | 41.8% Id, 30.9% Si |
| Sequence vs. MERS-CoV | 39.1% Id, 31.8% Si | 41.8% Id, 30.9% Si | – |
| Membrane envelope protein E | | | |
| Accession number (NCBI) | P0DTC4 | AAP13443 | AGH58723 |
| Molecular mass (kDa) | 8.365 | 8.361 | 9.354 |
| Theoretical pI | 8.57 | 7.01 | 7.64 |
| Aliphatic index | 144.00 | 145.92 | 111.59 |
| Sequence vs. SARS-CoV-2 | – | 96.0% Id, 4.0% Si | 34.1% Id, 30.5% Si |
| Sequence vs. SARS-CoV | 96.0% Id, 4.0% Si | – | 34.1% Id, 32.9% Si |
| Sequence vs. MERS-CoV | 34.1% Id, 30.5% Si | 34.1% Id, 32.9% Si | – |
| Soluble nucleocapsid protein N | | | |
| Accession number (NCBI) | P0DTC9 | AAP13445 | AGG22549 |
| Molecular mass (kDa) | 45.625 | 46.025 | 44.857 |
| Theoretical pI | 10.07 | 10.11 | 10.05 |
| Aliphatic index | 53.52 | 49.81 | 56.08 |
| Sequence vs. SARS-CoV-2 | – | 89.3% Id, 8.1% Si | 46.1% Id, 26.3% Si |
| Sequence vs. SARS-CoV | 89.3% Id, 8.1% Si | – | 44.8% Id, 25.8% Si |
| Sequence vs. MERS-CoV | 46.1% Id, 26.3% Si | 44.8% Id, 25.8% Si | – |

Id, identity; Si, similarity.
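The molecular mass, theoretical pI and aliphatic index in Table 1 are typical outputs of sequence-based calculators such as ExPASy ProtParam. The sketch below shows the underlying calculations in plain Python, under explicit assumptions: average residue masses of the 20 standard amino acids (no glycans or other modifications), the aliphatic index formula of Ikai (1980), and a pI found by bisection on the Henderson-Hasselbalch net charge with an assumed simple pKa set. ProtParam uses the Bjellqvist et al. pKa scale instead, so the pI values from this sketch will generally differ somewhat from those in the table; mass and aliphatic index should come close if the same sequence ranges are used. All function names are hypothetical.

```python
# Average residue masses (Da) of the 20 standard amino acids (residue = amino acid - H2O).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524

# Assumed simple pKa set; ProtParam uses the Bjellqvist et al. scale instead.
PKA = {"Nterm": 8.6, "Cterm": 3.6, "K": 10.8, "R": 12.5, "H": 6.5,
       "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_mass_kda(seq: str) -> float:
    """Average mass of the unmodified chain; raises KeyError on non-standard letters."""
    seq = seq.upper()
    return (sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER) / 1000.0


def aliphatic_index(seq: str) -> float:
    """Ikai (1980): X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)], X in mole percent."""
    seq = seq.upper()
    pct = {aa: 100.0 * seq.count(aa) / len(seq) for aa in "AVIL"}
    return pct["A"] + 2.9 * pct["V"] + 3.9 * (pct["I"] + pct["L"])


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of the chain at a given pH."""
    seq = seq.upper()
    positive = 1.0 / (1.0 + 10 ** (ph - PKA["Nterm"]))
    positive += sum(seq.count(aa) / (1.0 + 10 ** (ph - PKA[aa])) for aa in "KRH")
    negative = 1.0 / (1.0 + 10 ** (PKA["Cterm"] - ph))
    negative += sum(seq.count(aa) / (1.0 + 10 ** (PKA[aa] - ph)) for aa in "DECY")
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection for the pH at which the (monotonically falling) net charge is zero."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:  # still positive: the pI lies above mid
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

A high aliphatic index, as seen for the M and E proteins, reflects a large share of Ala, Val, Ile and Leu, which fits their membrane-embedded character; the high pI of N fits its role in binding the negatively charged RNA.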
Overall Appearance of the SARS-CoV-2 Spike Glycoprotein Trimer: The “Pre-Fusion State”
A model of the complete SARS-CoV-2 spike glycoprotein trimer, built with the coordinates from 6VXX.pdb, is shown in Figure 3. This structure starts at the twelfth residue of the mature polypeptide chain (A27) and ends at residue S1147. A spike protein subunit comprises 1273 residues (including the signal peptide of 15 residues). In the S1 half of the spike protein ectodomain we mostly find beta-strands, whereas the S2 part mainly consists of long alpha-helices. Figure 3 also indicates the end with which a spike protein trimer is attached to the virion (drawn schematically on top and decorated with multiple spike proteins). The organization of each spike protein subunit into different subdomains can be appreciated in Figure 4. In this figure, the B-subunit in model 6VXX is colored following the color code used in Figure 2. The last residue seen in the structure (S1147) lies just before the second heptad repeat (HR2). The position of an aspartate residue (D614) that spontaneously mutated to glycine very early on (see section “The Much-Debated Lucrative Spike Protein Mutant D614G”) is also indicated in chain B.
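Residue numbers such as D614 refer to the precursor numbering of YP_009724390, signal peptide included; with the 15-residue signal peptide assumed here, A27 is residue 27 − 15 = 12 of the mature chain. The sketch below applies a point mutation written in this notation to a sequence and first checks that the stated wild-type residue is actually present, which catches numbering mix-ups (precursor versus mature chain) early. The function name `apply_point_mutation` is hypothetical.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Point-mutation notation: wild-type residue, position, new residue (e.g. D614G).
MUTATION = re.compile(rf"([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])")


def apply_point_mutation(seq: str, mutation: str, first_residue: int = 1) -> str:
    """Return `seq` carrying `mutation`, after checking the wild-type residue.

    `first_residue` is the number of seq[0] (1 for a full precursor sequence).
    """
    match = MUTATION.fullmatch(mutation.strip().upper())
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
    index = position - first_residue
    if not 0 <= index < len(seq):
        raise IndexError(f"position {position} lies outside the sequence")
    if seq[index].upper() != wild_type:
        raise ValueError(f"expected {wild_type}{position}, found {seq[index]}")
    return seq[:index] + new + seq[index + 1:]
```

With `spike` holding the full YP_009724390 sequence (a hypothetical variable), `apply_point_mutation(spike, "D614G")` returns the mutant; had the mature-chain numbering been used by mistake, the wild-type check would raise an error instead of silently mutating the wrong residue.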
FIGURE 3
Side view (left) and bottom-to-top view, with the virus particle behind the spike protein (right), of the complete SARS-CoV-2 spike glycoprotein trimer, using coordinates from 6VXX.pdb. All subunits are in the closed (down) position. The B-chain is represented as ribbons, while the A- and C-chains are shown as backbone with side chains, in CPK colors (explained in the Supplementary Material). The B-chain ribbons are colored blue, with the NTD light blue and the RBD green. The position where the activating proteolytic cleavage of the spike protein occurs is indicated for subunit B (left, in blue) and for subunit A (right). The peptide of 12 residues in which cleavage takes place to generate the spike protein fragments S1 and S2 (i.e., Q677TNSPRRARSVA688) is missing from the structure, but the flanking residues T676 and S689 are shown with their side chains and (manually) labeled. In the side view, the virion is at the top. It is represented as a sphere from which other spike proteins emerge. These are represented with their surface, either in the pre-fusion state with all subunits closed (c) or with one subunit in the open configuration (o), each subunit colored differently (chain A red, chain B blue, chain C green), or in the post-fusion state (pf; colored in the same way). Approximate spike dimensions were measured on the model and are indicated. In the open state, the spike protein length increases from 160 to about 175 Å. Literature reference for structural codes: 6VXX.pdb (Walls et al., 2020).
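The S1/S2 cleavage peptide named above contains the furin motif RRAR. A quick way to locate such candidate sites in a sequence is a motif search; the sketch below looks for the minimal furin consensus R-X-X-R (R-X-[K/R]-R being the canonical form) and reports after which residue cleavage would occur. A motif match alone does not prove cleavage, which also depends on accessibility of the loop. The function name `furin_candidates` is hypothetical.

```python
import re

# Minimal furin motif R-X-X-R (cleavage C-terminal to the last Arg); the
# lookahead also reports overlapping candidates.
FURIN_MINIMAL = re.compile(r"(?=(R..R))")


def furin_candidates(seq: str, first_residue: int = 1) -> list[tuple[int, int, str]]:
    """Return (first motif residue, residue after which cleavage would occur, motif)."""
    seq = seq.upper()
    hits = []
    for match in FURIN_MINIMAL.finditer(seq):
        start = match.start() + first_residue
        hits.append((start, start + 3, match.group(1)))
    return hits


if __name__ == "__main__":
    # The 12-residue S1/S2 peptide missing from 6VXX (Q677 to A688).
    print(furin_candidates("QTNSPRRARSVA", first_residue=677))
    # [(682, 685, 'RRAR')]
```

The single hit reproduces the S1/S2 cleavage site R682–R685 given in the Figure 2 legend.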
FIGURE 4
(A) The different subdomains of the SARS-CoV-2 spike protein, with their function. Chain B in model 6VXX.pdb is represented as ribbons colored according to the code used in Figure 2, i.e., the N-terminal domain (NTD) is blue, the receptor-binding domain (RBD) is green, the structural subdomains SD1 and SD2 are ivory, and S2 is red, except for the fusion peptide (FP), which is turquoise, the first heptad repeat (HR1), which is ochre and is immediately followed by the central helix (CH) in orange, and the connector domain (CD), which is purple. The position where the S1/S2 cleavage occurs is marked with a red arrow, and the positions of the N-terminal residue (A27) and of the last residue of the RBD (P521) are indicated. The position of residue D614 is also indicated. (B) The same B-chain incorporated in the complete spike protein trimer. Chains A and C are shown as ribbons colored dark and light gray, respectively. Literature reference for structural codes: 6VXX.pdb (Walls et al., 2020).
The Closed (Down) and Open (Up) Conformation of the Spike Protein
The viral spike protein responsible for binding to the host cell is initially in a “pre-fusion” conformation, in search of a receptor on a host cell. In this state, each of the three subunits spends part of the time in a closed (or “down”) configuration, which is more stable but unable to bind the receptor (see section “A Viral Spike Protein Cannot Bind to the Ace2 Receptor When All Its Subunits Are in Closed Conformation”), and part of the time in a less stable open (or “up”) configuration, which is receptor-accessible. This hinge-like open ↔ closed transition is described by Wrapp et al. (2020b), and two very instructive videos of the internal movements in the SARS-CoV-2 spike trimer accompany the online version of that publication. When S1 successfully binds to its host cell receptor, the S protein structure becomes unstable and proteolysis may easily occur, resulting in shedding of the N-terminal S1 half of the molecule (S1/S2 cleavage). A second proteolytic cleavage (S2′) then takes place in the SARS-CoV-s, which removes a further long peptide up to just before the fusion peptide (FP), thereby fully exposing this short peptide. The virus is now ready to fuse its own membrane with the host cell membrane (as will be described in section “Events Causing Virus Entry Into Host Cells: The Spike Protein “Post-Fusion” State”). In Figures 3 and 4, all spike protein monomers are in the closed conformation. For the SARS-CoV-2 spike protein, a structure is also available in which chain B is in the open form. The two models 6VXX.pdb (all chains closed) and 6VYB.pdb (B-chain open) were deposited in the database in exactly the same orientation, allowing direct comparison of both structures. Such a superposition of the B-chain in both models is shown in the Supplementary Material (Supplementary Figure 3). Similar events occur in the SARS-CoV and MERS-CoV spike proteins; for MERS-CoV, coordinates are also available (5X5C.pdb: all chains down; 5X5F.pdb: B-chain in up conformation).
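Because 6VXX and 6VYB share the same coordinate frame, the movement of each residue between the closed and open B-chain can be measured without any fitting. The sketch below reads C-alpha coordinates for one chain from PDB-format text (standard fixed-column ATOM records), computes per-residue shifts for residues present in both models, and their root-mean-square value. It is only meaningful because the two entries are stated to be deposited in the same orientation; entries in different orientations must be superposed first. All function names are hypothetical.

```python
import math


def ca_coordinates(pdb_text: str, chain: str) -> dict[int, tuple[float, float, float]]:
    """C-alpha coordinates per residue number for one chain of a PDB-format file."""
    coords = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if line[21] != chain or line[16] not in (" ", "A"):  # keep first altloc only
            continue
        resnum = int(line[22:26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        coords.setdefault(resnum, xyz)
    return coords


def displacements(closed: dict, opened: dict) -> dict[int, float]:
    """Per-residue C-alpha shift (Å) for residues present in both models; no fitting."""
    common = sorted(closed.keys() & opened.keys())
    return {r: math.dist(closed[r], opened[r]) for r in common}


def rmsd_without_fit(shifts: dict[int, float]) -> float:
    """Root-mean-square of the shifts; only meaningful if the frames already coincide."""
    return math.sqrt(sum(d * d for d in shifts.values()) / len(shifts))
```

With `text_6vxx` and `text_6vyb` holding the two downloaded files (hypothetical variables), `displacements(ca_coordinates(text_6vxx, "B"), ca_coordinates(text_6vyb, "B"))` gives the shift per residue of chain B, and filtering for, say, shifts above 5 Å (an arbitrary threshold) should single out the moving RBD region.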
Comparison of the SARS-CoV-2, SARS-CoV and MERS-CoV Spike Protein Structures
High-quality structural details are available for the spike proteins of all three SARS-CoV-s. However, each structure was deposited in the PDB database in a different orientation, so they first need to be superposed for comparison. After superposition of the individual RBDs, we see that the structures are very similar, especially for SARS-CoV-2 and SARS-CoV (Figure 5B). The RBDs of SARS-CoV-2 and MERS-CoV can also be superposed well over an extended part of the domain (Figure 5C). Towards the end of the RBD, however, the protein fold in MERS-CoV starts to diverge from the structure seen in SARS-CoV-2. This is understandable from a multiple sequence alignment of the three RBDs (Figure 5D). In the first part of the alignment (S364 to V484, MERS-CoV numbering), identities and similarities in the sequences are 19.5% and 37.4%, respectively, while in the second part (P485 to M569) the values drop to 9.4% and 27.2%. Moreover, long gaps had to be introduced to optimize the alignment. Nevertheless, most of the disulfide bonds in the RBD are conserved in all three SARS-CoV-s (Figures 5B,C). In the SARS-CoV-2 structure (6VXX), the stretch between I468 and Y489 is missing, so disulfide bond 4 is not evident from this structure. However, it is visible in other structures of the SARS-CoV-2 RBD (see Supplementary Material, Supplementary Figures 4, 5). In MERS-CoV, the corresponding cysteine residue (between T533 and V534) is absent from the sequence; in this spike protein RBD, C526 instead forms an alternative disulfide bond with C503.
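Disulfide bonds such as those compared above can be listed directly from a coordinate file by looking for pairs of cysteine SG atoms that lie within bonding distance. The sketch below does this for PDB-format text; the 2.5 Å cutoff is an assumption chosen to sit comfortably above the typical S–S bond length of about 2.05 Å. As the 6VXX case shows, a bond whose cysteines fall in an unresolved stretch cannot be found this way. The function names are hypothetical.

```python
import itertools
import math


def cys_sg_atoms(pdb_text: str) -> list[tuple[tuple[str, int], tuple[float, float, float]]]:
    """(chain, residue number) and coordinates of every cysteine SG atom."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[17:20] != "CYS" or line[12:16].strip() != "SG":
            continue
        if line[16] not in (" ", "A"):  # keep first alternate location only
            continue
        ident = (line[21], int(line[22:26]))
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((ident, xyz))
    return atoms


def disulfide_bonds(pdb_text: str, cutoff: float = 2.5) -> list[tuple[tuple[str, int], tuple[str, int]]]:
    """Pairs of cysteines whose SG atoms are at most `cutoff` Å apart."""
    bonds = []
    for (id1, p1), (id2, p2) in itertools.combinations(cys_sg_atoms(pdb_text), 2):
        if math.dist(p1, p2) <= cutoff:
            bonds.append((id1, id2))
    return bonds
```

Applied to the RBD chains of 6VXX, 5XLR and 5X5C, the resulting lists can be compared with the cysteine pairs highlighted in Figure 5D.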
FIGURE 5
(A) A single chain of each of the spike protein trimer ectodomains from SARS-CoV-2 (6VXX.pdb), SARS-CoV (5XLR.pdb) and MERS-CoV (5X5C.pdb) is displayed, after superposition of the three models (as explained in the Supplementary Material). The chains are colored red (SARS-CoV-2), blue (SARS-CoV) and gray (MERS-CoV). The upper left part shows the S2 halves of the chains, the lower right part the S1 halves. The red arrow points to the S1/S2 cleavage site. (B) The RBDs of one subunit of the SARS-CoV-2 (red) and SARS-CoV (blue) spike proteins were superposed, starting from the previous picture (Figure 5A). Side chains of residues C336 and C361 in SARS-CoV-2 (forming a disulfide bridge) and of residue D405 are also shown. They correspond to residues C323 and C348 (also forming a disulfide bridge) and D392 in SARS-CoV. These residues are labeled in green. (C) The RBDs of one subunit of the SARS-CoV-2 (red) and MERS-CoV (gray) spike proteins were superposed. Side chains of residues C336 and C361 in SARS-CoV-2 (forming a disulfide bridge) and of residue Y423 are shown as well. They correspond to residues C383 and C407 (also forming a disulfide bridge) and Y469 in MERS-CoV. These residues are labeled in green. The ribbons of the RBD stretch where major structural differences from the SARS-CoV-2 protein are observed (i.e., from V484 till M569 in the MERS-CoV sequence) are colored pink. The SARS-CoV-2 RBD is shown in roughly the same orientation in panels B and C, from residue S316 till N544 in both panels; the SARS-CoV RBD is shown from N304 till N528, and the MERS-CoV RBD from S364 till C585. A limited number of residues is missing in the three structures. (D) Multiple sequence alignment of the RBDs of the three spike protein sequences using Clustal Omega (Sievers et al., 2011). The sequence where the structure of the MERS-CoV RBD diverges from that of the SARS-CoV and SARS-CoV-2 RBDs is shown in pink letters. The tyrosine that is conserved in the three RBDs, as well as the aspartate residue conserved in SARS-CoV and SARS-CoV-2, are highlighted in green. Disulfide bond cysteine residues are highlighted in yellow, and the numbers above them indicate which cysteines are covalently linked; 4a and 4b refer to the different disulfide bonds formed in SARS-CoV/SARS-CoV-2 and in MERS-CoV, respectively. Residues that are missing in the structures are in gray and underlined. Literature references for structural codes: 6VXX.pdb (Walls et al., 2020); 5XLR.pdb (Gui et al., 2017); 5X5C.pdb (Yuan et al., 2017).
Spike Protein Glycosylation
N-glycosylation is a very ancient process that is conserved in all eukaryotes. It starts in the ER, during biosynthesis of the protein and its co-translational import into the ER lumen, with the covalent attachment of a pre-synthesized GlcNAc2-Man9-Glc3 precursor chain (Marth and Grewal, 2008; Aebi et al., 2009; Aebi, 2013; Stanley et al., 2017; Watanabe et al., 2019). After an elaborate quality control procedure in the ER (Ruddock and Molinari, 2006; Słomińska-Wojewódzka and Sandvig, 2015), during which the three glucose residues are removed, the protein, now decorated with high-mannose chains, is transported in COPII vesicles via the ER-Golgi intermediate compartment (ERGIC) to the Golgi apparatus. In the subsequent Golgi stacks, which move on by cisternal progression (Luini, 2011), all or some of the high-mannose chains may be enzymatically modified into complex-type or hybrid-type glycans (Strasser et al., 2014; Stanley et al., 2017; Wang et al., 2017).
For SARS-CoV-2 (Watanabe et al., 2020a), as well as for SARS-CoV and MERS-CoV (Watanabe et al., 2020b), the extent of N-glycosylation and the maturation of the glycans at each of the NxT/S glycosylation sites were thoroughly investigated. The spike proteins of all three SARS-CoV-s are heavily glycosylated over the whole length of their subunits. A SARS-CoV-2 or SARS-CoV subunit has 22 potential N-glycosylation sites, a MERS-CoV subunit 23. All potential sites are effectively glycosylated, although some are not fully occupied (Watanabe et al., 2020a, b). In these two papers, the authors investigated the extent of glycan maturation at all these sites by analyzing glycopeptides with mass spectrometry. At each glycosylation position, the glycan chains turned out to have been enzymatically modified in the Golgi apparatus to different extents, leading to very heterogeneous combinations of high-mannose-type (Man9GlcNAc2 to Man5GlcNAc2) and hybrid- and complex-type glycan chains with different numbers of antennae (A1 to A4), some of which are sialylated, with or without core fucosylation. It is worth mentioning that most of the structural data available on the SARS-CoV-s’ spike trimers were obtained with proteins expressed in insect cells (details in Supplementary Material, Supplementary Table 3), which produce glycan structures that differ from those in mammalian cells (Marth and Grewal, 2008). However, none of the pdb files shows any trace of covalently linked glycans. Besides being very heterogeneous, N-glycans are also extremely flexible structures, so that, except in very few cases, they can only be modeled into a protein structure. Such modeling studies have been done for the SARS-CoV-2, SARS-CoV and MERS-CoV spike glycoproteins (Walls et al., 2019, 2020; Casalino et al., 2020; Grant et al., 2020; Vankadari and Wilce, 2020; Watanabe et al., 2020a, b; Zhou et al., 2020).
To at least visualize where the N-glycans are attached to the spike proteins, we used the same superposed SARS-CoV-2, SARS-CoV and MERS-CoV subunits displayed in Figure 5A, so that the three subunits are viewed in the same orientation for comparison. Figure 6 shows that the glycosylation sites are spread over the whole surface of the three SARS-CoV-s’ spike proteins. Keeping in mind that N-glycans are very voluminous but also very flexible structures, it is clear that the spike glycoproteins will be extremely well covered by an extensive glycan coat that may act as a real shield. Besides being important for protein folding and/or stability, this glycan coat might also hide epitopes and prevent antibodies from binding, and therefore interfere with the host’s immune defense mechanisms. Finally, the observation (Watanabe et al., 2020a, b) that many of the N-glycan chains are of the complex type indicates that the spike glycoproteins must have passed through the different Golgi stacks, where the glycan-modifying enzymes are located in the correct order.
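Potential N-glycosylation sites, such as the 22 or 23 per spike subunit mentioned above, are conventionally located by scanning the sequence for the N-X-S/T sequon, where X is any residue except proline. The minimal sketch below does only that; it says nothing about site occupancy or glycan maturation, which, as described above, have to be determined experimentally. The regular expression and the function name are our own illustration.

```python
import re

# N, then any residue except Pro, then Ser or Thr. The pattern is a pure
# lookahead, so overlapping sequons (e.g. in "NNST") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """1-based positions of Asn residues in N-X-S/T sequons (X is not Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]
```

Applied to a complete spike protein sequence, the returned positions can be compared with the sites characterized by Watanabe et al. (2020a, b); the count may differ with the exact construct that is scanned.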
FIGURE 6
In the same spike protein ectodomains from SARS-CoV-2 (A), SARS-CoV (B) and MERS-CoV (C), after superposition of the three models as shown in Figure 5, the asparagine residues carrying N-glycans are highlighted by showing their van der Waals surfaces. They are colored according to the kind of glycan chain attached, following the color code introduced by Watanabe et al. (2020a, b): green, orange or magenta when the sugars are of high-mannose type in 80–100%, 30–79% or 0–29% of the cases, respectively. The asparagine residues were numbered manually. A blue and a green arrow indicate the beginning and the end of the RBD, respectively.
SARS-CoV/SARS-CoV-2 and MERS-CoV Bind to Different Host Cell Receptors
Upon infection with SARS-CoV-s, the skin and mucosal membranes form the first layer of defense. Through the nose, eyes or mouth, the virus reaches the respiratory system, where it may recognize a receptor on the surface of lung cells. The spike protein RBD (receptor-binding domain) is responsible for recognition of and attachment to a host cell. The receptor for both the SARS-CoV-2 and the SARS-CoV spike proteins was identified as ACE2 (angiotensin-converting enzyme-2), while the receptor for MERS-CoV is DPP4 (dipeptidyl-peptidase-4) (Belouzard et al., 2012; Fehr and Perlman, 2015; Li, 2016; Skariyachan et al., 2019; Hoffmann et al., 2020; Letko et al., 2020; Matheson and Lehner, 2020; Walls et al., 2020). DPP4, also known as CD26, is a single-pass type-II transmembrane protein of 766 residues with a very extended (738 residues), N-glycosylated C-terminal ectodomain. Through its dipeptidyl-peptidase activity, it acts as a regulator of numerous physiological processes.
Angiotensin-converting enzyme-2 (ACE2), on the other hand, is a proteolytic enzyme acting on angiotensin-I and -II, as well as on some other vasoactive peptides, and is a regulator of blood pressure. It is a single-pass type-I membrane protein (805 residues) with an extended N-terminal ectodomain of 723 residues. Human ACE2 has six potential N-glycosylation sites and is heavily N-glycosylated (Warner et al., 2004). ACE2 is expressed in many different organs (Xu et al., 2020a), which might contribute to the damage to organs other than the lungs in some Covid-19 patients. For SARS-CoV-s, it was observed that co-expression of a plasma membrane-anchored surface protease, TMPRSS2 (transmembrane protease, serine 2), strongly facilitates cellular uptake of the virus (Hofmann and Pöhlmann, 2004; Heurich et al., 2014). Co-expression of ACE2 and TMPRSS2 occurs not only in lung cells but in many other cell types as well (Sungnak et al., 2020; Ziegler et al., 2020). TMPRSS2 is a single-pass type-II membrane protein of 492 residues with a C-terminal serine protease domain and seems to be involved in various physiological and pathological processes (Thunders and Delahunt, 2020). Its expression is developmentally regulated and increases with aging, which may contribute to the enhanced susceptibility of the elderly to SARS-CoV-2.
For SARS-CoV-s, a correlation has been observed between the severity of infection and the virus load received (Magleby et al., 2020), but also with the affinity of the viral RBD for the host receptor (Ou et al., 2020). For SARS-CoV-2, the affinity was shown to be very high (with KD values in the nM range), which contributes to the severity of the symptoms (Wrapp et al., 2020b). These affinities cannot be assessed from structural data but need to be measured by other experimental techniques, mostly based on ELISA or biosensor technologies (e.g., SPR or BLI) (for some examples: Huo et al., 2020; Ju et al., 2020; Pinto et al., 2020; Shang et al., 2020b; Shi et al., 2020; Walls et al., 2020; Wrapp et al., 2020a).
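To give a feel for what a KD "in the nM range" means, the minimal sketch below uses the textbook single-site (1:1) binding isotherm, θ = [L] / ([L] + KD), which assumes that the free ligand concentration is essentially equal to the added concentration. It ignores avidity and all other complications, and the concentrations in the example are hypothetical.

```python
def fraction_bound(ligand_molar: float, kd_molar: float) -> float:
    """Fraction of receptor sites occupied at equilibrium for simple 1:1 binding.

    theta = [L] / ([L] + K_D), with [L] the free ligand concentration, here
    approximated by the added concentration (ligand in excess). Both in mol/L.
    """
    if ligand_molar < 0 or kd_molar <= 0:
        raise ValueError("need [L] >= 0 and K_D > 0")
    return ligand_molar / (ligand_molar + kd_molar)
```

At a ligand concentration equal to KD, half of the sites are occupied (`fraction_bound(10e-9, 10e-9)` returns 0.5), whereas a ligand with a millimolar KD occupies only about one site in 100,000 at the same 10 nM concentration (`fraction_bound(10e-9, 1e-3)`).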
Binding of SARS-CoV-2 to Its ACE2 Receptor
A picture of the SARS-CoV-2 RBD bound to its human receptor ACE2 can be made using the structure coordinates in model 6VW1.pdb. Figure 7A shows the SARS-CoV-2 RBD on top, with the ACE2 receptor molecule underneath. Both chains are shown as ribbons colored for secondary structure succession, which makes it easy to follow the progression of both chains from N- to C-terminus (alpha-helices and beta-strands are colored in rainbow order from blue to red, while loops are left gray).
FIGURE 7
(A) The SARS-CoV-2 receptor-binding domain from 6VW1.pdb (chain E) on top (from N334 till P521; the side chains of both residues are added to the figure), with the ACE2 receptor protein (chain A) underneath. Ribbons are colored for secondary structure succession. (B) Interface between the A-chain (ACE2 receptor) and the E-chain (SARS-CoV-2 RBD) in model 6VW1.pdb. A-chain and E-chain residues within 3.2 Å of the opposite chain were selected. Most of the selected A-chain residues belong to a long ACE2 helix (i.e., residues S19, Q24, K31, H34, E35, E37, D38, Y41, Q42), plus Y83 and K353. The E-chain residues are all located in the region between residues 449 and 505 (i.e., residues Y449, Y453, N487, Y489, Q493, G496, Q498, T500, G502 and Y505). Residues from the receptor are labeled in gray, those from the viral RBD in red. Hydrogen bonds are shown as green dashed lines. The picture is shown in the same orientation as in A, but for clarity the width of the α-helical structures was reduced to 1 Å in this figure. Literature reference for structural codes: 6VW1.pdb (Shang et al., 2020b).
A closer look at the interface between the two molecules (Figure 7B) shows that the RBD residues making contact with ACE2 are all located in the region from Y449 till Y505. This is precisely the part of the RBD that is very different in MERS-CoV (Figure 5), which explains why MERS-CoV does not use ACE2 as its receptor. All residues lining the contact surface between SARS-CoV-2 and its ACE2 receptor are shown in Figure 7B (and in Supplementary Figure 6, see Supplementary Material), together with the hydrogen bonds formed between them. Strikingly, the RBD residue Y489, which forms a hydrogen bond with Y83 of the ACE2 receptor, is precisely the residue that was tentatively identified as one of the most mobile residues in the viral spike protein (see section “Flexibility in the Spike Glycoprotein”).
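The selection of interface residues within 3.2 Å of the opposite chain, as in Figure 7B, was made interactively in DeepView. A minimal script-based equivalent, assuming NumPy, a PDB-format file, only ATOM records (waters and ligands in HETATM records are ignored) and a plain atom-to-atom distance cutoff, could look like the sketch below. DeepView's own neighbor selection may treat hydrogens or alternate locations differently, and `parse_atoms` and `interface_residues` are hypothetical helper names.

```python
import numpy as np


def parse_atoms(pdb_text: str) -> list[tuple[str, str, tuple[float, float, float]]]:
    """Read ATOM records as (chain ID, residue label such as 'TYR489', xyz)."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        chain = line[21]  # column 22
        # Residue name (columns 18-20) + residue number and insertion code (23-27).
        label = line[17:20].strip() + line[22:27].strip()
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((chain, label, xyz))
    return atoms


def interface_residues(pdb_text: str, chain_a: str, chain_b: str, cutoff: float = 3.2):
    """Residues of chain_a and chain_b with any atom within `cutoff` Å of the other chain."""
    atoms = parse_atoms(pdb_text)
    side_a = [(label, xyz) for chain, label, xyz in atoms if chain == chain_a]
    side_b = [(label, xyz) for chain, label, xyz in atoms if chain == chain_b]
    if not side_a or not side_b:
        return set(), set()
    coords_b = np.array([xyz for _, xyz in side_b])
    hits_a, hits_b = set(), set()
    # Brute force: one vectorized distance calculation per chain_a atom.
    for label_a, xyz_a in side_a:
        dist = np.linalg.norm(coords_b - np.array(xyz_a), axis=1)
        close = np.nonzero(dist <= cutoff)[0]
        if close.size:
            hits_a.add(label_a)
            hits_b.update(side_b[j][0] for j in close)
    return hits_a, hits_b
```

For 6VW1.pdb, a call such as `interface_residues(text, "A", "E")` returns two sets of residue labels that can be compared with the residues listed in the Figure 7 legend.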
A Viral Spike Protein Cannot Bind to the ACE2 Receptor When All Its Subunits Are in Closed Conformation
To visualize that a spike protein subunit needs to be in the open conformation before it can bind the ACE2 receptor, an overlay was made between the spike protein RBD in model 6VW1.pdb and the RBD of one of the subunits of the spike protein trimer in model 6VYB.pdb. Figure 8A shows the full spike protein trimer with one of its subunits in the open state (chain B, with ribbons colored blue) and an ACE2 receptor molecule (colored orange) bound to it. No clashes occur between the receptor and any of the spike protein subunits in this conformation. However, when the same exercise was performed using model 6VXX.pdb, which has all its subunits in the closed state, many clashes occur between the receptor molecule and the spike protein subunits, as evidenced in Figure 8B. The clashing residues are highlighted in more detail in Figure 8C, where the clashes appear as pink dashed lines.
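The overlay-and-clash procedure above can also be reproduced outside DeepView. A minimal sketch, assuming NumPy, Cα coordinates of RBD residues that have already been paired one-to-one between the two models, and an arbitrary 2.5 Å atom-to-atom threshold for a "clash" (DeepView's own criterion may differ), is the Kabsch least-squares superposition followed by a simple distance check. The transformation obtained from the RBDs is then applied to the receptor coordinates that travel with the 6VW1 RBD. All function names are our own.

```python
import numpy as np


def kabsch(mobile: np.ndarray, target: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Rotation R and translation t that best map `mobile` onto `target` (least squares).

    Both arrays have shape (n, 3) and rows must correspond one-to-one
    (e.g. C-alpha atoms of the same RBD residues in two models).
    """
    mob_center = mobile.mean(axis=0)
    tgt_center = target.mean(axis=0)
    h = (mobile - mob_center).T @ (target - tgt_center)  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = tgt_center - r @ mob_center
    return r, t


def apply_transform(coords: np.ndarray, r: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Apply x -> R x + t to every row of an (n, 3) coordinate array."""
    return coords @ r.T + t


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square deviation between two matched (n, 3) coordinate sets."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


def find_clashes(coords_a: np.ndarray, coords_b: np.ndarray, cutoff: float = 2.5):
    """Index pairs (i, j) of atoms from two sets that lie closer than `cutoff` Å."""
    pairs = []
    for i, p in enumerate(coords_a):
        d = np.linalg.norm(coords_b - p, axis=1)
        pairs.extend((i, int(j)) for j in np.nonzero(d < cutoff)[0])
    return pairs
```

In practice one would call `kabsch` with the matched Cα atoms of the 6VW1 RBD (mobile) and of the chosen 6VXX or 6VYB RBD (target), move all ACE2 atoms with `apply_transform`, and pass them, together with the atoms of the spike trimer, to `find_clashes`.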
FIGURE 8
(A) An overlay was made between the SARS-CoV-2 RBD in model 6VW1.pdb (which shows binding of the virus to its ACE2 receptor) and the viral RBD of subunit B in model 6VYB.pdb (showing the spike protein trimer with subunits A and C in closed conformation and subunit B in open conformation). This figure shows the complete spike protein trimer (chains A, B and C with ribbons colored red, blue and green, respectively). The ribbons of the RBD in model 6VW1 are colored light blue, to demonstrate the overlay with the RBD from chain B in 6VYB. The ACE2 receptor protein is colored orange. There are obviously no clashes here. (B) An overlay was made between the SARS-CoV-2 RBD in model 6VW1.pdb and the viral RBD of subunit A in model 6VXX.pdb (showing the spike protein trimer with all subunits in closed conformation; ribbons of chains A, B and C colored red, blue and green, respectively). Extensive clashes occur between the ACE2 receptor (colored orange) and the spike protein, especially the RBD of chain B. (C) A more detailed view of the clashes between the ACE2 receptor and the spike protein. The clashing residues are shown with their backbones and side chains, colored blue when they belong to chain B and green when they belong to chain C of the spike protein (only one residue: N440, labeled), and orange when they belong to the ACE2 receptor. Clashes appear as pink dashed lines. How to make figures demonstrating the occurrence of clashes is explained in the Supplementary Material (Supplementary Figure 13). Literature references for structural codes: 6VW1.pdb (Shang et al., 2020b); 6VYB.pdb (Walls et al., 2020); 6VXX.pdb (Walls et al., 2020).
Flexibility in the Spike Glycoprotein
Another way of representing the spike protein is by coloring the model for “B-factor”. As already discussed by T.E. Creighton in 1993, B-factors (also called temperature factors, atomic displacement parameters or Debye-Waller factors) describe the displacement of an atom from its average or mean position (Sun et al., 2019). B-factors (expressed in Å²) tell us, for each atom in the model, how well determined and stable its position is. Several studies suggested that, in high-quality models, B-factors might be used to identify flexibility and mobility in proteins, high B-factors indicating higher than average flexibility, as opposed to low B-factors, which are believed to occur at more rigid positions.
Supplementary Figure 7 (see Supplementary Material) shows the SARS-CoV-2 spike protein trimer colored for B-factor. The residues with the highest B-factors are found in the receptor-binding domain, i.e., from residue P330 till P521 (from light green up to orange). The residue with the highest B-factor in each subunit is Y489. This high flexibility in the RBD may greatly assist the spike protein in finding a receptor on a host cell.
Spectacular flexibility in the spike protein is indeed described in two papers. One study (Ke et al., 2020) used cryo-EM and tomography to investigate the distribution of spike protein trimers and their flexibility with virus-infected VeroE6 and Calu-3 cells. Roughly 24 spike trimers were seen per virion, 97% of them in the pre-fusion state (about 31% of these with all monomers closed) and only 3% in the post-fusion state (this state is described in section “Events Causing Virus Entry Into Host Cells: The Spike Protein “Post-Fusion” State”). The study showed that protruding spike proteins can be tilted extensively (up to 90°) towards the viral membrane. They seem to be rather sparsely but evenly distributed, without clustering, at a density of about 1 trimer per 1,000 nm² of membrane surface. Based on these calculations, it was hypothesized that multivalent binding to several ACE2 receptors, leading to avidity, will be the exception rather than the rule.
A second study (Turoňová et al., 2020) combined cryo-electron tomography with molecular dynamics simulation. It shows that, in the pre-fusion state, the spike protein is extremely mobile and its stalk contains three hinges, coined hip, knee and ankle (with estimated flexibilities of 16.5° ± 8.8°, 23° ± 11.7° and 28° ± 10.2°, respectively). This is assumed to give the head of the spike protein considerable freedom and to help it scan the host cell accurately for ACE2 receptors. In contrast, the post-fusion structure is apparently inflexible. A video linked to this publication demonstrates the pronounced flexibility in the pre-fusion state[1]. The extreme structural adaptability of the SARS-CoV-2 spike protein is also visualized in a publication presenting at least ten structures and transition phases occurring over the course of ACE2 binding and priming of the protein for membrane fusion (Benton et al., 2020).
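Coloring by B-factor, as described at the beginning of this section, can be complemented by a quick numerical check. The minimal sketch below assumes a standard PDB-format file whose ATOM records carry isotropic B-factors in columns 61-66; it averages B over the atoms of each residue and reports the residue with the highest mean per chain. It also converts a B-factor into a root-mean-square displacement along one direction, using the standard relation B = 8π²⟨u²⟩. The function names are ours, and because B values depend on how a model was refined, they are best compared within one model rather than between models.

```python
import math
from collections import defaultdict


def rms_displacement(b_factor: float) -> float:
    """Root-mean-square displacement (Å) along one direction from an isotropic B (Å²).

    Uses B = 8 * pi**2 * <u**2>, i.e. sqrt(<u**2>) = sqrt(B / (8 * pi**2)).
    """
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


def highest_b_residue_per_chain(pdb_text: str) -> dict[str, tuple[str, float]]:
    """For each chain, the residue with the highest mean B-factor over its atoms."""
    totals = defaultdict(lambda: [0.0, 0])  # (chain, residue label) -> [sum of B, atom count]
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        key = (line[21], line[17:20].strip() + line[22:27].strip())
        totals[key][0] += float(line[60:66])  # B-factor, columns 61-66
        totals[key][1] += 1
    best: dict[str, tuple[str, float]] = {}
    for (chain, label), (b_sum, count) in totals.items():
        mean_b = b_sum / count
        if chain not in best or mean_b > best[chain][1]:
            best[chain] = (label, mean_b)
    return best
```

Run on 6VXX.pdb, such a script offers a way to check the observation that Y489 carries the highest B-factor in each subunit, keeping in mind that a per-residue average and a per-atom maximum need not single out the same residue.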
Endocytosis Is an Alternative Way to Enter Host Cells
SARS-CoV-s (but also other coronaviruses) may also invade a host cell by an alternative mechanism based on clathrin-mediated endocytosis. In this case, not only the virus but also the ACE2 receptor protein is internalized, which may lead to serious secondary effects due to reduced ACE2 activity (Delpino and Quarleri, 2020; Gheblawi et al., 2020; Lanza et al., 2020; Magalhaes et al., 2020; Ni et al., 2020; Samavati and Uhal, 2020; Saponaro et al., 2020). This occurs when no protease is available nearby at the cell surface to perform the necessary proteolytic cleavage into S1 and S2 (Heurich et al., 2014; Fung and Liu, 2019). Endocytosis is then followed by delivery of the virus into an early endosome, which evolves into a late endosome and finally into a lysosome (Neefjes et al., 2017). The required proteolysis of the spike proteins then takes place in one of these organelles, depending on which protease is able to perform the cleavage. Proteolysis leads to fusion of the viral envelope (the virion now being inside the organelle) with the membrane of the respective organelle, followed by delivery of the viral RNA into the host cell’s cytoplasm by the same mechanism, involving the HR1 and HR2 regions together with the FP, as explained in section “Events Causing Virus Entry Into Host Cells: The Spike Protein “Post-Fusion” State” (Burkard et al., 2014). Host proteolytic enzymes of the cathepsin family, comprising aspartic as well as cysteine and serine proteases with broad substrate specificity, are occasionally mentioned as helping to prime SARS-CoV-s for membrane fusion, although this is sometimes disputed (Turk et al., 2012; Patel et al., 2018). Other viruses have also been reported to rely on cathepsins at some stages of their life cycle (Brix, 2018).
The Role of Spike Protein Cleavage by Furin and of Neuropilin-1 for SARS-CoV-2 Cellular Uptake
Unlike SARS-CoV and the other SARS-related coronaviruses known so far, SARS-CoV-2 acquired a furin-cleavage sequence, right within the peptide where cleavage occurs in the spike protein to remove the S1 half of the subunits (Figure 2). Furin is a single-pass type-I membrane protein that is ubiquitously expressed in vertebrates and has serine endoprotease activity. It cleaves at doublets or clusters of basic amino acids (e.g., KR↓ and RR↓), Rx(K/R)R↓ being the canonical cleavage sequence (Thomas, 2002). Furin is found in the trans-Golgi network (where it cycles between sorting compartments), but also at the cell surface and in early endosomes, i.e., at all locations the virus might pass at the onset of infection.
Proteolytic cleavage of the SARS-CoV-2 spike by furin exposes the R682RAR685 sequence at the C-terminus of S1, thereby converting S1 into a so-called C-end rule (CendR) peptide. CendR peptides, which carry an R/KxxR/K motif at their C-terminus (the spacing of the basic residues being important), are known to bind to neuropilin-1 (NRP1), whereas peptides carrying the same motif in a cryptic, internal position do not; after binding, CendR peptides are internalized, together with any molecular structures attached to them, by a mechanism that resembles, but differs from, endocytosis (Teesalu et al., 2009). NRP1 is an essential pleiotropic surface receptor, present on endothelial and epithelial cells, acting as a co-receptor molecule (Parker et al., 2012; Wild et al., 2012; Kumanogoh and Kikutani, 2013; Guo and Vander Kooi, 2015). It is a single-pass type-I membrane protein with five extracellular domains essential for ligand binding (two CUB domains; two coagulation factor-homology domains, the first of which has a binding pocket for peptides with a C-terminal arginine; and one MAM domain), and its short cytoplasmic domain interacts with PDZ-domain proteins. It is believed that, after furin cleavage but before further priming of the spike by the secondary S2’ proteolytic step, S1 and S2 temporarily remain associated, giving the cleaved spike subunit time to bind to NRP1. Two publications describe experiments showing that mAbs directed against NRP1, as well as a small molecule binding in the CendR pocket of NRP1, reduce SARS-CoV-2 infectivity, and that a mutant lacking the furin cleavage site is less infective (Cantuti-Castelvetri et al., 2020; Daly et al., 2020). NRP1 can thus be considered an important host factor that facilitates cell entry of SARS-CoV-2 and helps explain its enhanced infectivity compared to SARS-CoV and MERS-CoV.
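The sequence rules discussed in this section can be expressed as simple patterns. The minimal sketch below is an illustration, not a cleavage predictor: it scans a sequence for the minimal furin motif R-X-X-R (a widely used permissive definition, of which the canonical Rx(K/R)R is a subset), reports whether each hit is canonical, and checks whether a peptide ends in a C-terminal R/K-X-X-R/K CendR motif. Whether cleavage really occurs depends on factors, such as loop accessibility, that a pattern cannot capture; the pattern and function names are ours.

```python
import re

# Minimal furin motif R-X-X-R (cleavage after the last Arg); lookahead allows overlaps.
FURIN_MINIMAL = re.compile(r"(?=(R..R))")
# Canonical furin motif R-X-(K/R)-R.
FURIN_CANONICAL = re.compile(r"R.[KR]R")
# C-end rule motif R/K-X-X-R/K, which must sit at the very C-terminus.
CENDR_C_TERMINAL = re.compile(r"[RK]..[RK]$")


def furin_sites(sequence: str) -> list[tuple[int, str, bool]]:
    """(1-based position of the P1 Arg, motif, is_canonical) for each R-X-X-R hit."""
    seq = sequence.upper()
    sites = []
    for m in FURIN_MINIMAL.finditer(seq):
        motif = m.group(1)
        p1_position = m.start() + 4  # m.start() is the 0-based index of the P4 Arg
        sites.append((p1_position, motif, FURIN_CANONICAL.fullmatch(motif) is not None))
    return sites


def is_cendr(peptide: str) -> bool:
    """True if the peptide ends in an R/K-X-X-R/K motif (an exposed CendR motif)."""
    return CENDR_C_TERMINAL.search(peptide.upper()) is not None
```

For the SARS-CoV-2 stretch NSPRRARSVA (N679 to A688), `furin_sites` reports a single hit, `(7, "RRAR", False)`: position 7 of the peptide is R685, and the motif is not canonical because its P2 residue is alanine. The S1 fragment ending in ...NSPRRAR passes `is_cendr`, whereas the intact stretch does not, which is why the motif only becomes available for NRP1 after cleavage.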
The Potential Implication of the Glycocalyx and/or Host Lectins at the Onset of a Coronavirus Infection
When a virus invades a host, the first structure it encounters is the glycocalyx, a 50–200 nm thick layer made up of an intricate network of glycoproteins (N- and/or O-glycosylated) and proteoglycans (containing glycosaminoglycans, or GAGs) that are anchored to the outer surface of the plasma membrane, either by transmembrane domains or through GPI anchors (Koehler et al., 2020). The glycan chains of these glycoproteins are often decorated with terminal sialic acids, while the GAGs contain extended chains of heparan, chondroitin or keratan sulfate, these building blocks being heavily negatively charged. Numerous viruses are known to interact with these charged glycans; the individual interactions generally have low affinity (KD values in the mM range), but multivalency leads to substantial overall binding strength. It is assumed that these initial interactions, mostly electrostatic in nature, bring the virions close to the cell surface and concentrate them there, increasing their chance of finding their true receptors (Cagno et al., 2019; Koehler et al., 2020).
In some coronaviruses, the NTD, preceding the RBD, displays lectin activity and recognizes glycan ligands (Belouzard et al., 2012; Li, 2016). An often overlooked finding is that the MERS-CoV spike trimer also binds to sialoglycans, the NTD having been shown to be responsible (Li et al., 2017). This binding is highly selective but of low affinity, and a multivalent sialoglycan presentation is required for interaction. It was argued that sialoglycans may guide MERS-CoV in its search for its true receptor, DPP4, on the host cell surface.
A similar mechanism of sialic acid recognition acting as an infection facilitator was proposed for other coronaviruses (Qing et al., 2020), including SARS-CoV and SARS-CoV-2 (Morniroli et al., 2020).
Clausen et al. (2020) observed that the SARS-CoV-2 spike protein also binds heparan sulfate (HS), which consists of linear chains of disaccharide building blocks comprising D-glucuronic acid (some of which is modified to L-iduronic acid) and N-acetyl-D-glucosamine, extensively substituted with sulfate groups. Through modeling, HS binding was pinpointed to the RBD, close to the ACE2 binding site, and the two molecules bind independently of each other. HS binding is hypothesized to result from electrostatic interactions between the highly negative HS molecule and the overall positively charged RBD surface (see Supplementary Material, Supplementary Figure 8), which can accommodate an HS chain of up to 20 monosaccharides. Importantly, HS binding promotes the open RBD conformation, thereby stimulating binding to ACE2 (Clausen et al., 2020). It was further argued that the SARS-CoV-2 RBD surface is more electropositive than that of SARS-CoV, mainly as the result of two substitutions, T431→K444 and E341→N354 (SARS-CoV residue → corresponding SARS-CoV-2 residue).
Since not only do the SARS-CoV-s’ receptor molecules carry covalently attached N-glycans, but the SARS-CoV-s’ spike proteins are also heavily glycosylated, it is conceivable that the host’s own cell-surface lectins are involved in capturing virions, or at least act as binding facilitators. C-type lectin receptors (CLRs) are important receptors on patrolling myeloid cells that recognize glycans at the surface of foreign invaders, leading to the induction of immune responses. However, certain viruses have “learnt” how to modulate the response of macrophages and dendritic cells, and how to (mis)use them to promote infection instead.
Although detailed knowledge of the mechanisms is still missing, it was hypothesized that capture by host lectins on myeloid cells does not always lead to normal antigen processing in the lysosomes followed by peptide presentation at the cell surface. Instead, the virions are temporarily retained and released at a later stage, followed by trans-infection of other susceptible target cells expressing the genuine ACE2 receptors (Geijtenbeek and van Kooyk, 2003). In the case of SARS-CoV-2, preliminary reports using pseudovirus particles show that both DC/L-SIGN and MGL on antigen-presenting cells bind to spike protein glycan chains and promote virus transfer to permissive ACE2-expressing cells (Thépaut et al., 2020). Moreover, the sialic acid-binding immunoglobulin-type lectins Siglec-3, -9 and -10, present on myeloid and/or B cells, were also found to bind to spike glycans (Chiodo et al., 2020).
Binding of the SARS-CoV-2 RBD to the ACE2 Receptor Does Not Extensively Affect the RBD Conformation
To compare the structure of the SARS-CoV-2 RBD in the absence (model 6VXX, chain B) and the presence of the ACE2 receptor molecule (model 6VW1, chain E), the RBDs in both models first need to be superposed. The result is shown in Figure 9A, in which the RBD of model 6VW1 is colored orange, while the ACE2 receptor is colored blue. Superposed is the RBD of model 6VXX, whose ribbons are colored for RMS (RMS coloring, so called because root-mean-square distances between corresponding backbone atoms are calculated to assign a color to each group). RMS coloring of a model means that groups (backbones/side chains/atoms) in that model are colored according to how far they lie from the corresponding groups in the other model, which is considered the reference model (in this example, 6VW1 is taken as the reference). Regions that superimpose exactly are colored dark blue, with colors farther up the visible spectrum assigned to greater distances from the corresponding atoms in the reference model. Figure 9A shows that the overall conformation does not change dramatically between the two models.
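DeepView performs the superposition and RMS coloring interactively. For readers who want to reproduce the underlying arithmetic outside the program, the following is a minimal sketch, assuming the corresponding atoms (e.g., the backbone atoms of the 6VXX chain B and 6VW1 chain E RBDs) have already been extracted and paired row by row. It uses the standard Kabsch least-squares superposition, which is not necessarily the exact procedure DeepView uses, and the color bins are hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Least-squares superposition (Kabsch) of `mobile` onto `reference`.

    Both arrays are N x 3 with row i of one paired to row i of the other;
    the transformed copy of `mobile` is returned.
    """
    mob_centre = mobile.mean(axis=0)
    ref_centre = reference.mean(axis=0)
    P = mobile - mob_centre
    Q = reference - ref_centre
    H = P.T @ Q                                          # 3 x 3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0   # avoid a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T              # optimal proper rotation
    return P @ R.T + ref_centre


def group_rms(deviation: np.ndarray, groups: list[str]) -> dict[str, float]:
    """Root-mean-square of per-atom deviations within each group (e.g. one residue's backbone)."""
    labels = np.array(groups)
    return {g: float(np.sqrt(np.mean(deviation[labels == g] ** 2)))
            for g in dict.fromkeys(groups)}


# Hypothetical colour bins (Å); DeepView's own thresholds and palette may differ.
BINS = [(0.5, "blue"), (1.0, "cyan"), (2.0, "green"), (3.0, "yellow")]


def rms_colour(rms: float) -> str:
    """Move up the visible spectrum as the deviation grows."""
    for upper, colour in BINS:
        if rms < upper:
            return colour
    return "red"


# Toy data: a rotated and shifted copy of the reference superposes back exactly.
rng = np.random.default_rng(0)
ref = rng.normal(scale=5.0, size=(12, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
mob = ref @ Rz.T + np.array([3.0, -2.0, 1.0])
fitted = kabsch_superpose(mob, ref)
dev = np.linalg.norm(fitted - ref, axis=1)       # per-atom distance to the reference
groups = [f"res{i // 4}" for i in range(12)]      # 4 backbone atoms per toy residue
print({g: rms_colour(v) for g, v in group_rms(dev, groups).items()})  # all 'blue'
```

In practice the coordinates would be read with a PDB parser and paired by residue number, leaving out residues absent from either model (as noted in the text, 6VXX lacks some stretches, such as the peptide between T676 and S689).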
FIGURE 9
(A) The SARS-CoV-2 RBD (orange) in 6VW1.pdb (which was chosen as the reference model) is shown, together with the ACE2 receptor (blue). After superposing the RBD from 6VXX.pdb onto the former, this domain is colored for RMS to analyze how well both RBD structures coincide. (B) A more detailed view of some side chains in the vicinity of the ACE2 receptor. Side chains of the residues Y449, Y451, Y453, Y489, F490, Q493, Y495, F497, Y505, and Q506 are shown and labeled. They were given CPK colors in the reference model 6VW1, while they were colored for RMS in model 6VXX. Ribbons were colored orange in model 6VW1 and for RMS in model 6VXX. Literature references for structural codes: 6VW1.pdb (Shang et al., 2020b); 6VXX.pdb (Walls et al., 2020).
In Figure 9B, we take a more detailed look at some amino acid side chains. In model 6VXX.pdb, the stretches N450-L455 and Y489-Q506 from chain A are displayed (all other residues are hidden here). These peptides are very close to the ACE2 receptor (see above, Figure 7). In model 6VW1.pdb, the same stretches are added to the picture. The backbone plus side chains of these residues are shown in both models, with the aromatic residues and the two glutamines labeled. Finally, ribbons were added for the stretches on display. We may conclude that, locally, the orientation of some side chains is slightly modified in the model where the RBD is bound to the ACE2 receptor, which is not surprising, but that the backbone is hardly affected.
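Figure 9B shows the side-chain changes visually. One way to put a number on a "slightly modified orientation" is to compare a side-chain dihedral angle, such as chi1 (defined by the N, CA, CB and CG atoms), between the two models. The sketch below assumes the four atom positions of a residue have already been read from each model; it is an illustration of the calculation, not an analysis reported in this paper.

```python
import numpy as np


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle (degrees) defined by four atom positions.

    For chi1, pass the N, CA, CB and CG (or equivalent gamma atom) coordinates.
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1      # part of b0 perpendicular to the central bond
    w = b2 - np.dot(b2, b1) * b1      # part of b2 perpendicular to the central bond
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def angle_difference(a: float, b: float) -> float:
    """Signed difference a - b, wrapped into [-180, 180) degrees."""
    return (a - b + 180.0) % 360.0 - 180.0


# Toy geometry: rotating the last point about the central (z) bond changes the angle.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # 0.0
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
print(angle_difference(170.0, -170.0))                         # -20.0
```

The wrapping in `angle_difference` matters: a chi1 of 170° in one model and -170° in the other differs by only 20°, not 340°.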
Only ACE2 but Not ACE Can Act as a Receptor for SARS-CoV-2
A homolog of ACE2 exists in humans, namely angiotensin-I-converting enzyme, or ACE. This enzyme is a peptidyl-dipeptidase that cleaves a dipeptide from the C-terminus of angiotensin I, and it is ubiquitously expressed throughout the human body (Riordan, 2003). Like ACE2, ACE is a membrane-bound protein. It consists of two very similar domains that arose by duplication and whose amino acid sequences can also easily be aligned with that of ACE2 (see Supplementary Figure 9 in Supplementary Material). One could therefore speculate that ACE might also act as a receptor for SARS-CoV and SARS-CoV-2.
In Figure 10, the overall structural similarities between the two ACE domains, and between these and ACE2, are visualized. The two domains of ACE are strikingly similar in structure, and the ACE2 structure is also very similar to that of an ACE domain. However, although the three domains are structurally very similar, there are important differences in their primary structure (see Supplementary Material, Supplementary Figure 9). Particularly in the region of the contact surface of the ACE2 receptor protein with the SARS-CoV and SARS-CoV-2 RBDs, the sequences differ greatly from each other. Therefore, ACE is very unlikely to act as a receptor molecule for either virus.
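The argument that ACE and ACE2 are structurally similar yet differ in sequence precisely at the RBD contact surface rests on comparing aligned sequences column by column. A minimal sketch of percent identity computed from an existing pairwise alignment is shown below; it assumes the alignment (as in Supplementary Figure 9) has already been made, and the `columns` argument, a hypothetical convenience parameter, restricts the count to selected alignment columns such as those lining the interface. The toy sequences are illustrative only, not ACE or ACE2 fragments.

```python
def percent_identity(aln_a: str, aln_b: str, columns=None) -> float:
    """Percent identity between two rows of a pairwise alignment.

    Columns where either sequence has a gap ('-') are skipped. `columns`
    optionally restricts the count to selected 0-based alignment columns,
    e.g. those lining a binding interface.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    indices = range(len(aln_a)) if columns is None else columns
    compared = identical = 0
    for i in indices:
        a, b = aln_a[i].upper(), aln_b[i].upper()
        if a == "-" or b == "-":
            continue
        compared += 1
        identical += a == b
    return 100.0 * identical / compared if compared else 0.0


# Toy alignment: identical first half, divergent second ("interface") half.
seq_1 = "ACDEFG"
seq_2 = "ACDKLM"
print(percent_identity(seq_1, seq_2))                        # 50.0
print(percent_identity(seq_1, seq_2, columns=range(3, 6)))   # 0.0
```

Identity values depend on the convention used (here, gapped columns are skipped), so numbers from different tools are not always directly comparable.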
FIGURE 10
(A) Structural similarities between the two domains of ACE. The N-terminal domain of model 4BXK.pdb was used, together with the C-terminal domain of model 4APH.pdb, and an overlay was made between both. Model 4APH was used as reference layer and ribbons in this model were colored light green. The ribbons of model 4BXK were colored for RMS. (B) Structural similarities between ACE2 (model 1R42.pdb) and the C-terminal ACE domain (model 4APH.pdb), for which an overlay was made as well. Model 4APH was used as reference layer and ribbons in this model were colored light gray. The ribbons of model 1R42 were colored for RMS. Literature references for structural codes: 4BXK.pdb and 4APH.pdb (Masuyer et al., 2012; Douglas et al., 2014), 1R42.pdb (Towler et al., 2004).
Proteolytic Events Occurring in the Spike Glycoprotein Upon Binding to Its Receptor
Soon after binding of the SARS-CoV-2 virus to its ACE2 receptor, a first proteolytic step occurs in the spike glycoprotein that separates S1 (the N-terminal portion of the spike protein, containing the RBD) from S2 (the central and more rigid portion of the spike protein). The 12-residue peptide in which cleavage takes place (QTNSPRRARSVA, between T676 and S689) is missing in the structure, but the flanking residues T676 and S689 are clearly located at the outer surface of the spike protein trimer (Figures 3, 11). It is easy to imagine that this very hydrophilic peptide is exposed and readily available for proteolysis.
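The statement that the missing peptide is very hydrophilic can be checked quickly with the grand average of hydropathy (GRAVY), i.e., the mean Kyte-Doolittle hydropathy value over all residues; negative values indicate a sequence that is hydrophilic on average. Freely available tools such as ExPASy ProtParam report this index; the sketch below simply recomputes it for the 12-residue peptide QTNSPRRARSVA (residues 677-688, between T676 and S689).

```python
# Kyte-Doolittle hydropathy values per residue (one-letter codes)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return sum(values) / len(values)


missing_peptide = "QTNSPRRARSVA"   # residues 677-688, between T676 and S689
print(len(missing_peptide), round(gravy(missing_peptide), 2))  # 12 -1.38
```

The result, about -1.38, is clearly negative, consistent with an exposed, hydrophilic loop. A single average can hide local patches, so a sliding-window hydropathy profile is often inspected as well.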
FIGURE 11
The consecutive proteolytic steps that occur upon binding of the spike glycoprotein to its ACE2 receptor. Pictures are made from model 6VXX.pdb. (A) The complete spike protein subunit (chain B), with the S1 half that is cleaved first shown as ribbons, colored for secondary structure succession, except for the RBD, which is colored yellow. The S2 half of the protein is shown as backbone and side chains, except for the fusion peptide, which is shown as ribbons and colored red. The peptide in which the cleavage occurs is missing in the structure, but the two flanking residues (T676 and S689) are labeled (manually, in gray). The enlargement shows the two peptides R646-T676 and S689-N709, colored for accessibility, with their start and end residues labeled (manually). The site of the S1/S2 cleavage is indicated with a blue arrow. (B) The middle figure shows the peptide that is removed by the second cleavage (proteolytic reaction S2’) as green ribbons. The cleavage site is indicated with a blue arrow. (C) The bottom figure shows what is left, with the HR1 domain (heptad repeat 1, i.e., peptide G908-D985) now shown as ribbons (no backbone and side chains) and colored yellow. The HR2 domain is not part of the structure. After both cleavage reactions (S1/S2, followed by S2’), the remainder of the spike protein undergoes dramatic conformational changes highlighted in Figure 12. Literature reference for structural codes: 6VXX.pdb (Walls et al., 2020).
FIGURE 12
(A) Formation of the 6-helix bundle structure by the remaining heptad repeats HR1 and HR2. Pictures were made from model 6LXT.pdb. The structure is seen as a side view (right) and as a top view (left) with all side chains displayed. All backbones and side chains are displayed in CPK colors, plus ribbons colored for secondary structure succession. In the side view, the terminal residues of the long helices are labeled for both heptad repeats of the first spike protein subunit (i.e., HR1: T912-E988; HR2: V1164-E1202). (B) The HR1 and HR2 regions from the first subunit are shown with only the hydrophobic side chains, labeled in red for HR1 and in blue for HR2. The width of the ribbons was reduced to 1 Å to make the displayed side chains more visible. (C) The post-fusion S2 trimer from model 6XRA.pdb. Residues available in this model are N703 to I770, T912 to N1173 (comprising HR1: T912-E988) and Q1180 to L1197 (the latter two stretches comprising part of HR2: V1164-E1202). The model is shown as ribbons: chain A colored green; chain B colored red, except for the regions HR1 (yellow) and HR2 (orange); and chain C colored for secondary structure succession. The virion is at the right. The positions of residues N703, I770, T912 and L1197 in chain B are shown. Approximate spike dimensions were measured on the model and are indicated. Literature references for structural codes: 6LXT.pdb (Xia et al., 2020); 6XRA.pdb (Cai et al., 2020).
In Figure 11A, the domains that are removed by the first cleavage are shown as ribbons, colored for secondary structure succession, with the remainder of chain B in model 6VXX shown as backbone with side chains, the fusion peptide (as far as its structure is available in the model) overlaid as red ribbons, and the RBD at the top right (colored yellow). After the first proteolytic cleavage, a second cleavage step (indicated as S2’) occurs just before the fusion peptide. In this step, the peptide shown as green ribbons in Figure 11B is removed. Finally, what is left of the spike protein subunit is shown in Figure 11C. The heptad repeat HR1 is important for the next events, which lead to fusion of the viral and host membranes and allow entry of the viral RNA into the host cell.
Events Causing Virus Entry Into Host Cells: The Spike Protein “Post-Fusion” State
After both proteolytic cleavage events (S1/S2, followed by S2’), the remaining S2 domain undergoes an instantaneous and dramatic change in conformation to adopt the “post-fusion” state. In this state, the coiled coil-forming heptad repeats (HR1 and HR2) of each of the three subunits in the trimeric S protein form a strong and extended six-helix bundle, which prepares the virion for membrane fusion with the host cell plasma membrane (Cai et al., 2020). Towards one end of this bundle, the three fusion peptides (FP), one in each subunit, are now brought close to the host cell membrane and catapulted into it, after which the HR2 domains fold back to bring the FP and the TM domain segments together, leading to fusion of the viral and host membranes. This results in the release of the viral RNA, decorated with N proteins, into the host cell (Shulla and Gallagher, 2009; Li, 2016; Cai et al., 2020; Shang et al., 2020a; Tang et al., 2020). It was shown that, due to differences in the HR1 domain sequences, SARS-CoV-2 has a significantly higher capacity for membrane fusion than SARS-CoV, which might also contribute to its higher infectivity (Xia et al., 2020).
Figure 12A shows the 6-helix bundle structure obtained for SARS-CoV-2 peptides, as a side and a top view. Hydrophobic interactions are the major force driving the formation of this helix bundle. Figure 12B shows HR1 and HR2 with only the side chains of the hydrophobic residues. In the peptide T912-E988, 34 out of 77 residues are hydrophobic, and in the peptide V1164-E1202, 18 out of 39 are (44% and 46%, respectively). They form two lines of hydrophobicity on these peptides that slowly twist around the long helices, which results in both HR regions wrapping around each other through the formation of an antiparallel coiled coil in each of the spike protein subunits. These regions are further assembled to form the 6-helix bundle structure.
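Two quantitative points in the paragraph above can be made explicit. First, the percentages follow from the residue counts given for the inclusive ranges T912-E988 (77 residues) and V1164-E1202 (39 residues). Second, the slow twist of the hydrophobic lines can be rationalized with idealized α-helix geometry: with 3.6 residues per turn, each residue advances 100° around the helix axis, so residues seven apart (one heptad, which in coiled coils typically carries hydrophobic residues at its a and d positions) are offset by 700°, i.e., -20° per heptad. This is a textbook approximation; the actual heptad assignment of the HR1 and HR2 residues is not given here.

```python
def hydrophobic_fraction(n_hydrophobic: int, first: int, last: int) -> float:
    """Fraction of hydrophobic residues in the inclusive residue range first..last."""
    return n_hydrophobic / (last - first + 1)


DEG_PER_RESIDUE = 100.0  # ideal alpha helix: 3.6 residues per 360-degree turn


def wheel_angle(i: int) -> float:
    """Helical-wheel angle (degrees, wrapped to [-180, 180)) of residue i; residue 0 sits at 0."""
    return (i * DEG_PER_RESIDUE + 180.0) % 360.0 - 180.0


print(f"HR1 T912-E988:   {hydrophobic_fraction(34, 912, 988):.0%}")    # 44%
print(f"HR2 V1164-E1202: {hydrophobic_fraction(18, 1164, 1202):.0%}")  # 46%

# Angular position of the 'a' residue in four consecutive heptads:
# the hydrophobic stripe drifts by -20 degrees per heptad instead of staying put.
a_positions = [wheel_angle(7 * k) for k in range(4)]
print(a_positions)  # [0.0, -20.0, -40.0, -60.0]
```

Left uncorrected, this drift makes the hydrophobic stripe wind slowly around each helix; in a coiled coil, the helices supercoil around each other so that the stripe stays at the helix-helix interface.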
Figure 12C shows the post-fusion state, after removal of the S1 half of the spike protein. The right half of this structure, from residue T912 to the red arrow, is very similar to the structure of Figure 12A (side view). At one end, this structure is still attached to the virion (which is to the right), and somewhere between I770 and T912 lie the fusion peptides (residues S816 to F833, missing in this structure) that will integrate into the host membrane, resulting in fusion.
The Much-Debated Lucrative Spike Protein Mutant D614G
From February 2020 onwards, a point mutation (D614G) in the SARS-CoV-2 spike protein emerged (Korber et al., 2020). Within a short time, it supplanted the original protein worldwide, and it was, and still is, unclear why this mutation spread at such an incredible speed. The apparently successful mutation was said to confer not only increased transmissibility on the virus, but also increased mortality. Studies using different kinds of pseudoviruses equipped with SARS-CoV-2 spike proteins indicated that spikes carrying the G614 mutation infect cells far more efficiently than the original D614 ones (summarized in Callaway, 2020b). Whether this is also the case for the real virus in humans has yet to be confirmed, though it was further observed that genuine SARS-CoV-2 viruses carrying G614 were also more infectious in lab experiments on human lung cell lines and were present in increased concentrations in the upper airways of infected hamsters (Plante et al., 2020). These puzzling observations could be attributed neither to a difference in the number of virions produced nor to an increased affinity of the variant for the ACE2 receptor (Daniloski et al., 2020). Certain studies ascribe the increased effectiveness of the mutant to a decrease in premature S1/S2 cleavage of the G614 variant during assembly of new virions in the host (Daniloski et al., 2020; Zhang et al., 2020b), though other experiments suggested an increased susceptibility to proteases (Eaaswarkhanth et al., 2020; Hu et al., 2020b). The reason behind an alleged greater or lesser susceptibility to proteolysis remains unclear. Nevertheless, it needs to be kept in mind that mutations such as this one could influence the antigenic properties of the protein and might reduce the efficacy of vaccines that are currently under development using the original spike protein as it was isolated in Wuhan.
Additionally, simple and seemingly harmless mutations may also have a pronounced effect on how the host immune system recognizes and reacts to the virus. It was inferred from molecular modeling that the G614 mutation would destabilize the open conformation, thus promoting the closed state, in which the RBD is unable to bind to the ACE2 receptor (Becerra-Flores and Cardozo, 2020). Why then would the G614 mutant display a much higher fatality rate than the original D614 protein? The authors suggested two possible hypotheses for these seemingly contradictory observations, i.e., the now more prevalent closed form might (i) be better shielded from attack by the host immune system, and/or (ii) elicit a harmful immune response, e.g., through the production of detrimental antibodies. Other studies, on the contrary, suggest that the D614G mutant rather loosens the spike protein and brings its subunits more easily into the open state, which should facilitate binding to the ACE2 receptor (Mansbach et al., 2020). Either way, many contradictory conclusions are still circulating and the final word on this mutation has clearly not yet been said. A paper expressed the state of affairs (August 2020) in its title as follows: “Making sense of mutation: what D” (Grubaugh et al., 2020), and that statement is surely still true today. Supplementary Figure 10 (see Supplementary Material) gives an impression of the surroundings of residue D614.
It needs to be stressed that mutations in the spike protein continue to emerge; by early May 2020, 329 naturally occurring variants had already been reported (Li et al., 2020b), some of which make the virus resistant to certain monoclonal antibodies. Moreover, certain glycosylation deletions were found to reduce viral infectivity.
The Problem of Antibody-Dependent Enhancement
There are clear indications that SARS-CoVs may also infect certain cell types among the PBMCs (peripheral blood mononuclear cells) that do not express the ACE2 receptor, i.e., cells of the immune system such as monocytes and macrophages (the former potentially also supporting productive virus replication). The immune response, which is designed to clear infections, is sometimes dysregulated, leading to the opposite outcome (Taylor et al., 2015). This kind of response, due to the presence of anti-spike protein antibodies, is known as antibody-dependent enhancement (ADE). This immunopathological situation is mediated by antibody Fc domains and occurs when virus-antibody immune complexes interact with cells carrying Fc receptors. The ADE pathway is very complex, with both virus- and host-dependent factors, and not all details are fully understood. Nevertheless, it is an important issue to be taken into consideration during the development of vaccination strategies.
Strategies to Prevent Binding of the Virus to Its Receptor
Soluble Mutated ACE2 Analogs as a Decoy Receptor
The possibility of fooling the SARS-CoV-2 virus by administering high-affinity soluble ACE2 analogs as decoy receptors, thereby preventing virus binding to and entry into host cells by competition, was put forward as an interesting idea to combat Covid-19 (Chan et al., 2020). The authors created an extensive library of 2,340 human sACE2 (soluble receptor) coding mutants that were expressed in human Expi293F cells (each cell expressing only one type of single mutant), which were then tested for SARS-CoV-2 RBD binding using in vitro assays. Based on the results obtained, single mutants were combined, and a series of sACE2 molecules carrying three up to seven mutations were generated for more detailed analysis. Several particularly interesting findings resulted from this study: (i) a mutation modifying residue T92, which yields an sACE2 mutant that is no longer glycosylated at position N90, favors RBD binding, suggesting that the glycan at N90 hinders (but does not prevent) RBD binding; (ii) a number of sACE2 mutations at the interface with the RBD enhance binding, which opens perspectives for the aforementioned type of approach in fighting Covid-19; (iii) the variant called sACE2.v2.4 (carrying mutations T27Y, L79T and N330Y, thus leaving the N-glycosylation site at N90 intact, and which is very well expressed and shows enzymatic activity on angiotensin II, albeit reduced) was purified and extensively analyzed: it was found to display a 65-fold higher affinity for immobilized SARS-CoV-2 RBD than the soluble wild type (using biosensor and ELISA technology) and also competes efficiently with antibodies from the serum of Covid-19 patients for binding to the RBD.
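Fold changes in affinity, such as the 65-fold improvement reported for sACE2.v2.4 (or the 170-fold improvement reported by Glasgow et al., 2020), can be translated into binding free energies, which puts them on a common energetic scale. Since ΔG = RT ln KD, an f-fold lower KD corresponds to a gain of RT ln f. The sketch assumes the fold change refers to a ratio of KD values at 25 °C (298.15 K); the actual measurement temperature may differ.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold(fold: float, temperature_k: float = 298.15) -> float:
    """Binding free-energy gain (kcal/mol) for a `fold`-fold decrease in KD.

    From dG = RT ln(KD): dividing KD by `fold` lowers dG by RT ln(fold).
    """
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL * temperature_k * math.log(fold)


print(round(ddg_from_fold(65), 2))   # 2.47
print(round(ddg_from_fold(170), 2))  # 3.04
```

Under these assumptions, the 65-fold improvement corresponds to roughly 2.5 kcal/mol of additional binding free energy, and the 170-fold one to roughly 3 kcal/mol.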
In Supplementary Figure 11 (see Supplementary Material), we look at the result of the proposed triple mutation T27Y, L79T, N330Y.
In another study, part of the sACE2 receptor (Q18-A614) was engineered by computational design and experimental affinity maturation, fused to the ACE2 collectrin domain (D615-S740) and dimerized by adding a human antibody Fc domain, which conferred avidity as well as a long in vivo half-life. A variant with seven amino acid changes (Q18R/K31F/N33D/H34S/E35Q/W69R/Q76R), in which ACE2 enzymatic activity was abolished by an H345L mutation, was found to bind the spike RBD 170-fold more tightly than wild-type ACE2. This (or some alternative) construct was proposed to be potentially useful as a “trap” to neutralize SARS-CoV-2 and prevent viral entry into host cells (Glasgow et al., 2020).
In small-scale clinical studies, a human recombinant sACE2 molecule has already been used as a potential drug candidate with promising results (Zoufaly et al., 2020). These studies were based on earlier research using non-mutated hrsACE2 (human recombinant soluble ACE2) (Monteil et al., 2020).
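Engineered variants such as those above are written in a compact slash-separated notation (e.g., Q18R/K31F/N33D/H34S/E35Q/W69R/Q76R). When reproducing such constructs in silico, for example before modeling them in DeepView, it helps to parse the notation and check each wild-type residue against the sequence, which catches numbering offsets early. The following is a minimal sketch; the function names and the `first_residue` parameter are hypothetical, and the toy sequence stands in for a real ACE2 sequence.

```python
import re

# One substitution: wild-type letter, residue number, mutant letter (e.g. "Q18R").
MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_mutations(spec: str) -> list[tuple[str, int, str]]:
    """Split slash-separated notation such as 'Q18R/K31F' into (wt, position, mut) tuples."""
    parsed = []
    for token in spec.split("/"):
        m = MUTATION.fullmatch(token.strip())
        if m is None:
            raise ValueError(f"cannot parse mutation {token!r}")
        parsed.append((m.group(1), int(m.group(2)), m.group(3)))
    return parsed


def apply_mutations(seq: str, spec: str, first_residue: int = 1) -> str:
    """Return `seq` with the mutations applied, checking every wild-type residue.

    `first_residue` is the residue number of seq[0]; a mismatch usually means the
    numbering of the sequence and of the mutation list are offset.
    """
    chars = list(seq)
    for wt, pos, mut in parse_mutations(spec):
        i = pos - first_residue
        if not 0 <= i < len(chars) or chars[i] != wt:
            raise ValueError(f"{wt}{pos}{mut}: wild-type residue not found at that position")
        chars[i] = mut
    return "".join(chars)


variant = "Q18R/K31F/N33D/H34S/E35Q/W69R/Q76R"   # seven-mutation variant from the text
print(len(parse_mutations(variant)))                                # 7
print(apply_mutations("ACDEFGHIK", "A10V/K18R", first_residue=10))  # VCDEFGHIR (toy sequence)
```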
‘Mini-Protein Inhibitors’ as Prophylactic Molecules and/or for Use in Therapeutic Treatments
Another interesting avenue in the search for prophylactic and/or therapeutic treatments of Covid-19 was published by Cao et al. (2020). In this study, researchers aimed to find high-affinity, thermostable mini-binders to the SARS-CoV-2 spike RBD that would compete with ACE2 receptor binding. Such molecules were devised both by incorporating the long ACE2 α-helix that interacts with the RBD (see above, Figure 7) into small proteins that were further designed to make additional interactions with the spike protein to enhance the affinity, and by designing them from scratch. Such molecules would (i) not require storage at low temperatures, (ii) circumvent possible side effects inherent to using antibodies (e.g., ADE: see above), (iii) have a much higher binding site density per weight, being 20-fold smaller than antibodies, (iv) potentially be applicable for intranasal administration, e.g., as a gel or an aerosol, and (v) make viral mutational escape very unlikely when used in combinations. Promising peptides (56-64 amino acid residues long) were created that display excellent stability as well as high affinity for the spike protein (KD values ranging from 100 pM to 10 nM), and they were found to prevent infection of Vero cells with an IC50 between 24 pM and 35 nM (Cao et al., 2020).
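The third advantage listed, a higher binding-site density per weight, follows from simple arithmetic. The sketch below assumes a full IgG of about 150 kDa with two antigen-binding sites, a monovalent mini-binder, and an average residue mass of about 110 Da to convert the reported lengths (56-64 residues) into masses; all three numbers are approximations introduced here, not values from the study.

```python
IGG_KDA = 150.0           # assumed mass of a full IgG antibody
AVG_RESIDUE_KDA = 0.110   # assumed average mass of one amino acid residue


def nmol_sites_per_mg(mass_kda: float, sites: int) -> float:
    """Nanomoles of binding sites in 1 mg of a protein of the given mass."""
    # 1 mg / (mass_kda * 1000 g/mol) = 1000 / mass_kda nmol of protein
    return sites * 1000.0 / mass_kda


igg = nmol_sites_per_mg(IGG_KDA, sites=2)
for n_residues in (56, 64):
    mass = n_residues * AVG_RESIDUE_KDA
    ratio = nmol_sites_per_mg(mass, sites=1) / igg
    print(f"{n_residues} aa: ~{mass:.1f} kDa, ~{ratio:.0f}x more sites per mg than IgG")
```

Even though a mini-binder is monovalent and an IgG bivalent, the roughly 20-fold difference in mass leaves the mini-binder with about ten times more binding sites per milligram under these assumptions.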
Binding of Antibodies to the Spike Glycoprotein
Ever since the onset of the pandemic, numerous efforts have been made to track down neutralizing antibodies against SARS-CoV-2 that could help combat the infection through passive immunization. Initially, antibodies already available against SARS-CoV were tested for their potency against SARS-CoV-2; later on, new antibodies were specifically generated and analyzed. A number of structures have been made available in the PDB database. The antibodies in the pipeline are either of the conventional IgG type (or their Fab fragments) or of the camelid type (heavy chain-only, or VHH, alternatively called single-domain antibodies). Because their architectures are very different, the two types of antibodies are conceived by Nature to recognize different types of epitopes: while classical antibodies are designed to grasp smaller groups or peptides sticking out from protein surfaces by using their two extended antigen-binding regions as two scoops, camelid antibodies (from which “nanobodies” are derived) form rather “finger-like” structures that penetrate into cavities of the antigen (Romão et al., 2016; Jovčevska and Muyldermans, 2020). Nanobodies have several advantages, one being that, because of their limited size (only 15 kDa, ten times smaller than classical H2L2 antibodies), they can be administered as inhalable drugs, which for Covid-19 is an indisputable asset.
Structures of the SARS-CoV-2 RBD with various Fab fragments are available in the PDB database, and they were used for making overlays. In Figure 13, six such Fab fragments and one nanobody are seen bound to the SARS-CoV-2 RBD. This figure was made after superposing all structures using the RBD available in model 6VW1.pdb as the reference chain.
FIGURE 13
(A) An overlay was made of the SARS-CoV-2 spike protein RBD (residues N334 to P527, colored dark green) with six antibody Fab fragments (all H- and L-chains are colored bluish and reddish, respectively) and one nanobody (colored orange). The Fab fragments in models 6XC2, 6XC4, 7BZ5, and 7C01 point down, while the Fab from model 6W41 (antibody CR3022) points left, and the Fab from model 7BWJ points right, overlapping with the nanobody from model 6Z2M.pdb. All these SARS-CoV-2-binding H2L2 antibodies, having neutralizing capacity, were cloned and expressed from memory B cells present in PBMCs isolated either from recovered Covid-19 patients [mAb cc12.1 (6XC2.pdb) and mAb cc12.3 (6XC4.pdb): Rogers et al., 2020; mAb B38 (7BZ5.pdb): Wu et al., 2020; mAb CB6 (7C01.pdb): Shi et al., 2020; mAb P2B-2F6 (7BWJ.pdb): Ju et al., 2020] or from a convalescent SARS-CoV patient [mAb CR3022 (6W41.pdb): Yuan et al., 2020b]. The nanobody (H11-D4) was developed earlier against SARS-CoV and analyzed for its SARS-CoV-2 binding capacity (Huo et al., 2020). (B) Binding of antibody Fab fragment 7BWJ to the SARS-CoV-2 full spike trimer with chain B in the open state (model 6VYB.pdb, left) and with all subunits in the closed state (model 6VXX.pdb, right). An overlay was made between the structures (using the RBD of chain B) and no clashes were detected. (C) Binding of antibody Fab fragment 6XC4 to the SARS-CoV-2 full spike trimer with chain B in the open state (model 6VYB.pdb). An overlay was made between both structures (using the RBD of chain B) and no clashes were found. However, when chain B is also in the closed state, extensive clashes are seen with chain C residues (figure not shown; see Supplementary Material about how to detect clashes). In B and C, the ribbons of the spike protein subunits are colored yellow, blue and green for chains A, B and C, respectively, and red and gray for the Fab H- and L-chains, respectively.
Most of the antibodies analyzed compete for binding to the ACE2 receptor, as can be seen from Supplementary Figures 12 and 13 (see Supplementary Material): when such an antibody is bound to the RBD, several clashes with the ACE2 receptor are seen. This is true for the Fab fragments in models 6XC2, 6XC4, 7BZ5, and 7C01, which all bind to the same region of the RBD. It is also the case for the Fab fragment in model 7BWJ, where some, though less prominent, clashes are observed. These five antibodies were described in the literature as neutralizing (Ju et al., 2020; Shi et al., 2020; Wu et al., 2020; Yuan et al., 2020b). Of course, to really compete with the ACE2 receptor for binding, the affinity of such an antibody for the spike protein is of utmost importance: when the affinity of the antibody is too low, the spike glycoprotein might nevertheless preferentially bind to the ACE2 receptor, leading to delivery of the viral RNA into the host cell's cytoplasm.
Some structures are also available of a complete SARS-CoV-2 spike protein trimer with antibody Fab fragments. The Fab fragment of antibody S309 was shown to potently neutralize both SARS-CoV and SARS-CoV-2 (Pinto et al., 2020). Supplementary Figures 14A,B (see Supplementary Material) show the binding of three Fab molecules, one to the RBD of each subunit of the SARS-CoV-2 spike protein trimer.
Another interesting antibody is CR3022, which was previously isolated from a SARS-CoV patient. It is directed against the RBD, but the epitope to which it binds differs from those of the other antibodies (Figure 13). In an in vitro assay, CR3022 proved to be neutralizing for SARS-CoV but not for SARS-CoV-2, although it is able to bind to the SARS-CoV-2 RBD, albeit with 100-fold lower affinity (Yuan et al., 2020a).
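Clash checks like those behind Supplementary Figures 12 and 13 come down to finding atom pairs from two superposed structures (e.g., an antibody Fab and ACE2, both fitted onto the same RBD) that lie implausibly close together. A minimal sketch is given below, assuming heavy-atom coordinates have already been superposed; the 2.5 Å cutoff is chosen for illustration only, and DeepView's own clash criterion may differ.

```python
import numpy as np


def find_clashes(coords_a, coords_b, cutoff: float = 2.5) -> list[tuple[int, int]]:
    """Index pairs (i, j) of atoms in structures A and B closer than `cutoff` (Å).

    Brute force: builds the full Na x Nb distance matrix, which is fine for an
    antibody-receptor pair but memory-hungry for whole spike trimers.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    i_idx, j_idx = np.nonzero(dist < cutoff)
    return list(zip(i_idx.tolist(), j_idx.tolist()))


fab_atoms = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]   # toy coordinates
ace2_atoms = [[1.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
print(find_clashes(fab_atoms, ace2_atoms))  # [(0, 0)]
```

For larger systems, a spatial index such as SciPy's `cKDTree` (its `query_ball_tree` method) avoids building the full distance matrix.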
The neutralizing effect of this antibody on SARS-CoV was explained through structural modeling: the epitope to which CR3022 binds was envisaged to be accessible to the antibody only when at least two RBDs are in the open conformation and, moreover, slightly rotated (Yuan et al., 2020a). Otherwise, there would be clashes with other parts (e.g., the NTDs) of the spike protein trimer (Supplementary Figure 16D, see Supplementary Material). The same paper further noted that, enigmatically, antibodies without an in vitro neutralizing effect may nevertheless confer in vivo protection, for reasons that remain to be explored (Yuan et al., 2020a). Figure 13 also shows that some Fabs are only able to bind to a ‘one-up’ spike trimer (6XC4), while another binds both a ‘one-up’ and a ‘none-up’ trimer (7BWJ), and yet another needs more than one subunit in the open state (6W41).
Finally, when comparing binding of the Fab fragment to the SARS-CoV-2 RBD in model 7BWJ.pdb with that of a nanobody in model 6YZ5.pdb, the difference in the principle of antigen recognition between both antibodies catches the eye. As shown in Figure 13, both bind to the same region of the antigen. A more detailed picture of the binding is shown in Figure 14. Binding by the Fab fragment is due to residues belonging to two RBD loops, i.e., K444 till N450 and V483 till F490, which are grasped by the binding sites formed by the antibody H- and L-chains, respectively (Figure 14A). Binding of the nanobody, on the other hand, occurs because essentially two VHH loops, i.e., R27 till S30 and E100 till L106, fit into a shallow depression formed on the RBD between residues K444 till F456 and E484 till Y495 (Figure 14B). In the Fab complex, seven hydrogen bonds are formed between the RBD and the antibody (five with the H- and two with the L-chain), whereas in the nanobody complex the interaction is stabilized by eleven hydrogen bonds.
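The overlay-and-clash procedure behind Figure 13 (superposing two models on the RBD of one chain, then looking for atoms of the carried-along Fab that come too close to the rest of the spike) can be sketched outside DeepView. This is a minimal sketch using the standard Kabsch (SVD) superposition with NumPy, not DeepView’s own fitting routine; it assumes the RBD atoms of both models have already been paired one-to-one (same residues, same atom names), and the 2.5 Å clash cutoff, the function names and the toy coordinates are illustrative assumptions.

```python
import numpy as np


def kabsch(mobile: np.ndarray, target: np.ndarray):
    """Rotation r and translation t that best map paired `mobile` points onto `target`.

    Both arrays have shape (n, 3), row i of one paired with row i of the other.
    Fitted coordinates are obtained as mobile @ r.T + t.
    """
    mob_c = mobile.mean(axis=0)
    tgt_c = target.mean(axis=0)
    h = (mobile - mob_c).T @ (target - tgt_c)   # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))      # avoid returning a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = tgt_c - r @ mob_c
    return r, t


def transform(coords: np.ndarray, r: np.ndarray, t: np.ndarray) -> np.ndarray:
    return coords @ r.T + t


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def count_clashes(atoms_a: np.ndarray, atoms_b: np.ndarray, cutoff: float = 2.5) -> int:
    """Number of atom pairs (one from each set) closer than `cutoff` angstroms (assumed value)."""
    dists = np.linalg.norm(atoms_a[:, None, :] - atoms_b[None, :, :], axis=-1)
    return int((dists < cutoff).sum())


if __name__ == "__main__":
    # Toy stand-ins: paired RBD atoms in a Fab-RBD complex and in a spike model.
    rbd_complex = np.array([[0, 0, 0], [1.5, 0, 0], [0, 1.5, 0], [0, 0, 1.5], [1.5, 1.5, 1.5]], float)
    c, s = np.cos(np.pi / 2), np.sin(np.pi / 2)
    rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    rbd_spike = rbd_complex @ rot_z.T + np.array([10.0, 0.0, 0.0])
    fab_atoms = np.array([[0.0, 0.0, 4.0], [3.0, 0.0, 0.0]])            # Fab atoms, complex frame
    other_chain = np.array([[10.0, 3.0, 0.5], [30.0, 0.0, 0.0]])        # e.g. a neighbouring subunit

    r, t = kabsch(rbd_complex, rbd_spike)
    print("RMSD on RBD:", round(rmsd(transform(rbd_complex, r, t), rbd_spike), 6))
    print("clashes:", count_clashes(transform(fab_atoms, r, t), other_chain))
```

Fitting only on the RBD and then moving the Fab with the same transformation mirrors the “overlay using the RBD of chain B” described in the Figure 13 caption; real use would read paired atoms from the PDB files instead of toy arrays.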
FIGURE 14
(A) Contact surface between the SARS-CoV-2 RBD (chain E, colored green) and an antibody Fab fragment from model 7BWJ.pdb, with the H- and L-chains colored purple and yellow, respectively. Amino acid residues that are within a distance of 3.5 Å from the opposing protein are shown with their backbone and side chains and were manually labeled. Hydrogen bonds are shown as green dashed lines; one hydrogen bond is colored gray because the distance between hydrogen donor and acceptor (3.32 Å) is slightly above the default maximal value of 3.20 Å. (B) Contact surface between the SARS-CoV-2 RBD (chain E, colored green) and a nanobody (chain F, colored orange) from model 6YZ5.pdb. Amino acid residues that are within a distance of 3.5 Å from the opposing protein are shown with their backbone and side chains and were manually labeled. Hydrogen bonds are shown as green dashed lines; one hydrogen bond is colored gray because the distance between hydrogen donor and acceptor (3.33 Å) is again slightly above the default maximal value of 3.20 Å. Literature references for structural codes: 7BWJ.pdb (Ju et al., 2020); 6YZ5.pdb (Huo et al., 2020).
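The two criteria used in Figure 14 (residues within 3.5 Å of the opposing protein; hydrogen bonds flagged when the donor-acceptor distance exceeds the 3.20 Å default) can be illustrated with a distance-only sketch. It is a simplification and not DeepView’s algorithm: it treats every N/O pair within the contact cutoff as a potential donor-acceptor pair without checking hydrogen positions or angles. The `Atom` record, the `interface` function and the coordinates are hypothetical.

```python
import math
from typing import List, NamedTuple, Set, Tuple


class Atom(NamedTuple):
    chain: str
    resname: str
    resnum: int
    name: str
    element: str
    xyz: Tuple[float, float, float]


CONTACT_CUTOFF = 3.5   # residues closer than this are displayed in Figure 14
HBOND_MAX = 3.20       # default maximal donor-acceptor distance quoted in the caption
POLAR = {"N", "O"}     # crude donor/acceptor proxy: element only, no geometry check


def interface(side_a: List[Atom], side_b: List[Atom],
              contact_cutoff: float = CONTACT_CUTOFF, hbond_max: float = HBOND_MAX):
    """Return interface residues and polar atom pairs within the contact cutoff.

    Polar pairs at or below `hbond_max` are labelled "hbond"; pairs between
    `hbond_max` and `contact_cutoff` are labelled "marginal" (cf. the gray bonds).
    """
    residues: Set[Tuple[str, str, int]] = set()
    polar_pairs = []
    for a in side_a:
        for b in side_b:
            d = math.dist(a.xyz, b.xyz)
            if d > contact_cutoff:
                continue
            residues.add((a.chain, a.resname, a.resnum))
            residues.add((b.chain, b.resname, b.resnum))
            if a.element in POLAR and b.element in POLAR:
                label = "hbond" if d <= hbond_max else "marginal"
                polar_pairs.append((a, b, round(d, 2), label))
    return residues, polar_pairs


if __name__ == "__main__":
    # Hypothetical coordinates; the RBD residue numbers echo the text, everything else is invented.
    rbd = [Atom("E", "LYS", 444, "NZ", "N", (0.0, 0.0, 0.0)),
           Atom("E", "ASN", 450, "ND2", "N", (10.0, 0.0, 0.0))]
    fab = [Atom("H", "TYR", 33, "OH", "O", (3.0, 0.0, 0.0)),
           Atom("H", "SER", 52, "OG", "O", (13.32, 0.0, 0.0)),
           Atom("L", "GLY", 91, "CA", "C", (20.0, 0.0, 0.0))]
    residues, pairs = interface(rbd, fab)
    for a, b, d, label in pairs:
        print(f"{a.resname}{a.resnum}:{a.name} - {b.chain}/{b.resname}{b.resnum}:{b.name} {d} Å {label}")
```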
Neutralizing Antibodies That Bind to the NTD Prevent Required Conformational Changes in the Spike Protein Trimer
Monoclonal antibodies with neutralizing activity were isolated from convalescent Covid-19 patients and characterized; some of them bind not to the RBD but to the NTD instead. For one of them (mAb 4A8), which binds with high (nM) affinity, the structure of the Fab in complex with the spike trimer was analyzed in detail (Chi et al., 2020). The potent neutralizing activity of this mAb was tentatively ascribed to its restraining effect on the conformational changes in the spike trimer that are essential for spike activation and subsequent invasion of the host cell. Supplementary Figures 14C,D,E (see Supplementary Material) show a structure of the SARS-CoV-2 spike trimer in complex with three Fab fragments, each binding to a different NTD, and Supplementary Figure 15 visualizes the interface between the spike protein’s NTD and the Fab fragment.
Other Viral Membrane Proteins
The Abundant Membrane Protein M
Protein M, with a molecular mass of 24–28 kDa in various coronaviruses, is the most abundant protein in the viral membrane. It is involved in organizing viral assembly and binds to the nucleocapsid (Nal et al., 2005; Dhama et al., 2020). It is a multi-pass transmembrane protein with three TM helices connected by short peptides, and with a long C-terminal endodomain. In SARS-CoV and MERS-CoV, the N-terminal ectodomain is N-glycosylated at a single position (Fung and Liu, 2018). Electron microscopy and statistical analyses suggest that protein M occurs in two conformations, which is thought to regulate virus particle shape and size: an elongated form that makes the membrane more rigid, with less curvature and high spike density, and a more compact form that renders the membrane more flexible, with lower spike density (Neuman et al., 2011). However, no structural data are available as yet for protein M from any of the coronaviruses, so we have to rely exclusively on predictions (see Supplementary Material, Supplementary Figures 17, 18). Properties of the protein are summarized in Table 1.
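Because no experimental structure of protein M is available, its TM helices have to be inferred from sequence. One classic approach, shown here as a minimal sketch rather than the predictors used for the Supplementary Figures, is a Kyte-Doolittle hydropathy scan: average the hydropathy over a sliding window and report stretches above a threshold. The 19-residue window and 1.6 threshold are commonly quoted settings for this method and are assumptions here; dedicated predictors such as TMHMM are generally more reliable. The toy sequence is hypothetical, not protein M.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy scale (Kyte and Doolittle, 1982).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy_profile(seq: str, window: int = 19) -> List[Tuple[int, float]]:
    """Mean hydropathy per window, keyed by the 1-based position of the window centre."""
    seq = seq.upper()
    half = window // 2
    return [(i + half + 1, sum(KD[aa] for aa in seq[i:i + window]) / window)
            for i in range(len(seq) - window + 1)]


def tm_candidates(seq: str, window: int = 19, threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Merge consecutive window centres scoring >= threshold into (start, end) stretches."""
    segments: List[List[int]] = []
    for pos, score in hydropathy_profile(seq, window):
        if score < threshold:
            continue
        if segments and pos == segments[-1][1] + 1:
            segments[-1][1] = pos      # extend the current stretch
        else:
            segments.append([pos, pos])  # start a new stretch
    return [(start, end) for start, end in segments]


if __name__ == "__main__":
    toy = "K" * 10 + "L" * 25 + "K" * 10   # hypothetical: one hydrophobic stretch flanked by lysines
    print(tm_candidates(toy))
```

The reported stretches are ranges of window centres, so their ends only approximate the boundaries of a helix; a protein such as M would be expected to give three such stretches.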
The Minor Membrane Component Protein E
Protein E is the smallest of the SARS-CoVs’ structural proteins (8.5–12 kDa); its properties are summarized in Table 1. It has several functions: it acts as an ion channel formed by homopentameric assembly of protein E subunits, but it is also involved in virus assembly and release, and in interaction with the host (Yuan et al., 2006; Surya et al., 2018; Schoeman and Fielding, 2019; Dhama et al., 2020). Protein E is predicted to be a single-pass membrane protein with a short N-terminal peptide, followed by a TM helix and a longer C-terminal domain. However, the exact location of the N- and C-termini is still a matter of debate (see Supplementary Material, Supplementary Figure 19). Some studies indicate that the N- and C-termini might be located in the same compartment, and it has also been proposed that this protein might adopt different conformations in the viral membrane. Protein E from SARS-CoV was found to be S-palmitoylated at central cysteine residues (see Supplementary Material, Supplementary Figure 20). Shortly after these cysteines are two potential N-glycosylation sites, N48 and N66, which were shown to be partially occupied (Fung and Liu, 2018). This would mean that, at least during biosynthesis, this part of the protein must face the ER lumen. MERS-CoV protein E, on the other hand, has no N-glycosylation sites.
The structure of part of protein E from SARS-CoV (E8 till L65) embedded in LMPG (lyso-myristoyl phosphatidylglycerol) micelles was solved by NMR spectroscopy, with the cysteine residues C40, C43 and C44 replaced by alanines. All 16 models coincide closely, showing that the pentamer has no highly flexible regions. The predicted transmembrane helices appear to form a central structure, with the potential N-glycosylation sites at the periphery (see Supplementary Material, Supplementary Figure 20).
The side views suggest that, at least when protein E is incorporated into micelles, the N- and C-terminal amino acid residues are located on the same side of the membrane, with the potential N-glycans and the palmitoyl chains on the opposite side.
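Potential N-glycosylation sites such as N48 and N66 in SARS-CoV protein E are usually located by scanning the sequence for the N-X-S/T sequon, where X is any residue except proline. The sketch below performs only that motif scan; whether a sequon is actually occupied (as was shown experimentally for protein E) cannot be decided from sequence alone. The toy sequence is hypothetical.

```python
import re
from typing import List

# Lookahead so that overlapping sequons (e.g. "NNST") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq: str) -> List[int]:
    """1-based positions of Asn residues in N-X-S/T sequons where X is not Pro."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_sequons("MNASNPTNGT"))   # hypothetical toy sequence; prints [2, 8]
```

In the toy sequence, the asparagine at position 5 is skipped because it is followed by proline.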
Wrapping Up the Viral RNA: The Soluble Nucleocapsid Protein N
Protein N (45–50 kDa) is the only soluble structural protein in the SARS-CoVs. The virions use it to wrap up their RNA molecules (Chang et al., 2014; Dhama et al., 2020). It consists of two major domains that each contribute to RNA binding: an N- and a C-terminal domain, the latter of which the protein also uses for dimerization. Both domains are linked to each other by a serine-arginine (SR)-rich peptide. A third domain at the C-terminus is important for interacting with protein M. SARS-CoV and MERS-CoV protein N molecules are phosphorylated by host kinases on serine and threonine residues at multiple sites, especially within the SR-rich peptide (Fung and Liu, 2018). Moreover, SARS-CoV protein N was shown to be modified by sumoylation (on residue K62), but the effect of this modification needs further investigation (Fung and Liu, 2018). Finally, ADP-ribosylation also seems to occur in both SARS-CoV and MERS-CoV (Fung and Liu, 2018).
Supplementary Figure 21 (see Supplementary Material) shows that, despite the rather modest sequence similarity, both domains of protein N are structurally very similar in all three SARS-CoVs. The dimeric C-terminal domains are attached with their N-terminal residues to the C-termini of two non-interacting N-terminal domains, making an extended overall structure. The C-terminal tails of protein N point in opposite directions. Both domains as well as the linker region have many basic residues, explaining the elevated theoretical pI value of 10, and the number of hydrophobic residues is very limited (the aliphatic index of protein N is very small; see Table 1). The excess positive charges on this protein are considered essential for wrapping up the polyanionic viral RNA.
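The aliphatic index mentioned above (as reported by tools such as ExPASy ProtParam) follows Ikai’s formula, AI = X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where the X values are mole percentages. A minimal sketch follows, together with a crude count of basic versus acidic residues to illustrate the excess positive charge; a proper pI calculation needs a pKa set and is left to dedicated tools. The function names and the example sequence are hypothetical.

```python
def aliphatic_index(seq: str) -> float:
    """Ikai (1980) aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = seq.upper()

    def mol_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return mol_percent("A") + 2.9 * mol_percent("V") + 3.9 * (mol_percent("I") + mol_percent("L"))


def basic_minus_acidic(seq: str) -> int:
    """(Lys + Arg) - (Asp + Glu): a rough proxy for net charge near neutral pH (His ignored)."""
    seq = seq.upper()
    return seq.count("K") + seq.count("R") - seq.count("D") - seq.count("E")


if __name__ == "__main__":
    toy = "GKRSGSRGKALPQGSRK"   # hypothetical basic, glycine/serine-rich toy fragment
    print(round(aliphatic_index(toy), 1), basic_minus_acidic(toy))
```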
Protein-RNA interactions are proposed to be further guided by base-stacking interactions with the protein’s aromatic residues, which are amply present in the two RNA-binding domains: the YWF content amounts to 10.4% and 11% in the N- and C-terminal domains, respectively (see Supplementary Material, Supplementary Figure 22). Curiously, protein N is also predicted to have extended disordered regions (see Supplementary Material, Supplementary Figure 23), even though well-ordered structures were determined by X-ray crystallography. Only three regions (roughly residues G99-P142, A217-L230 and W301-Y360 in SARS-CoV-2) are predicted by the program IUPred to be ordered, i.e., parts of the N- and C-terminal domains and part of the SR-rich peptide. It has been argued (Chang et al., 2014) that the inclusion of intrinsically disordered regions (IDRs) within the structured regions of protein N increases not only its binding affinity for nucleotides, but also its binding cooperativity (making binding of a next domain better and stronger). This may be explained by the following known properties of IDRs: enhanced binding/speed of interaction; promiscuity in binding partners; larger interaction surfaces with partners upon complex formation (the IDR wraps itself tightly around its binding partner); and facilitated introduction/removal of post-translational modifications (Tompa, 2012; Habchi et al., 2014).
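The YWF percentages quoted above are simple composition counts over a residue range. The sketch below computes the aromatic (Tyr, Trp, Phe) content of a 1-based, inclusive segment, the convention used for ranges such as G99-P142; the helper names, sequences and boundaries in the example are hypothetical.

```python
def segment(seq: str, start: int, end: int) -> str:
    """Residues start..end, using 1-based inclusive numbering as in 'G99-P142'."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"range {start}-{end} outside sequence of length {len(seq)}")
    return seq[start - 1:end]


def aromatic_content(seq: str) -> float:
    """Percentage of Y, W and F residues in `seq`."""
    seq = seq.upper()
    n_aromatic = sum(seq.count(aa) for aa in "YWF")
    return 100.0 * n_aromatic / len(seq)


if __name__ == "__main__":
    toy = "MSGRAYWFKQRPGSTAFLK"          # hypothetical sequence
    domain = segment(toy, 3, 17)          # hypothetical domain boundaries
    print(f"YWF content: {aromatic_content(domain):.1f}%")
```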
Discussion
This paper summarizes and discusses current knowledge of the structural proteins that make up coronaviruses in general, and the beta-coronaviruses SARS-CoV-2, SARS-CoV and MERS-CoV in particular. We show how these proteins are well designed by Nature for their function, how they cooperate with each other to make very successful virions, and how these viruses mislead and hijack the host for their own benefit. Certain aspects are well known because they are explained and illustrated in other papers, but others often remain unnoticed or their importance underestimated. One example is the observation that coronaviruses might transiently interact with sialoglycans/heparan sulfate prior to binding to their true receptors, thereby facilitating and speeding up invasion of a host cell. Another point is that these beta-coronaviruses developed very different ways of entering a host cell, i.e., either by directly releasing their RNA after membrane fusion, or after invading the host cell via the endocytotic pathway (occasionally with the help of NRP1), and in all routes they rely on the action of a plethora of ubiquitously available host proteases. These invasion routes exist side by side: some virions may take one route while others, at the same time and in the same host, take the other. The way in which new virions leave a host cell through de-acidified lysosomes is also peculiar. Furthermore, it is of utmost importance to keep an eye on new mutants that may arise in the future, which might turn these SARS-CoVs into even smarter particles than they already are, possibly rendering the defense strategies we have developed ineffective. Finally, although a lot is already known about these SARS-CoVs, knowledge and essential details are still missing at several points.
It is hoped that the near future will see these gaps filled, and that smart solutions, perhaps not yet considered today, will emerge to put an end to the pandemic that is currently straining health systems globally. Very promising strategies to combat Covid-19 are in the pipeline, among others the development of decoy receptors and mini-protein inhibitors, monoclonal antibodies and nanobodies that might find applications in nasal sprays, new and repurposed antivirals, and, of course, vaccines. Today, after only a single year of development and clinical trials, four vaccines have already received approval in the EU/United Kingdom and the United States and are now being successfully applied. Two of them are viral vector vaccines in which recombinant DNA is packaged in a harmless adenovirus, either of chimpanzee origin (Oxford-AstraZeneca) or of human origin (Johnson and Johnson). The other two apply newer technology, based on a synthetic piece of mRNA packaged in lipid nanoparticles (Pfizer/BioNTech and Moderna) (Kaur and Gupta, 2020; Silveira et al., 2021). All four use coding sequences for the spike protein. These latest developments will certainly also contribute to the fight not only against other types of viral infections, but also against cancer (Pardi et al., 2018; Zhang et al., 2019; Xu et al., 2020b; Miao et al., 2021).
The problematic spread of human coronaviruses early in this century, with SARS-CoV, MERS-CoV and, most recently, SARS-CoV-2 as known culprits, unmistakably ushered in a huge variety of structural studies dealing with all aspects of these viruses. This will be even more so if new pandemics emerge in the near or more distant future, a situation that many researchers and healthcare workers predict will occur.
Consequently, it is of utmost importance to understand these structures and to be able to look at them in detail, using a combination of bioinformatic tools, most of which are freely available on the internet these days. It is hoped that this publication will stimulate more researchers and students to visualize the available structures on their computers and to use the bioinformatic tools that become available, which will help advance science in this and related fields. With this goal in mind, we present in the Supplementary Material a set of guidelines for the interactive program DeepView (Guex and Peitsch, 1997; Guex et al., 2009), which allows non-specialists in structural biology to load protein structures and scrutinize them. This program was also used to make the figures in this paper.
Epilog: Recent Developments
Circulating SARS-CoV-2 lineages were estimated to accumulate nucleotide mutations, mostly synonymous, at a rate of about 2.7 per month (Duchene et al., 2020). However, at least three examples of much faster mutation rates have recently emerged: one in the United Kingdom (Rambaut et al., 2020), another in South Africa (Tegally et al., 2020) and a third in Brazil (Faria et al., 2021). The United Kingdom variant has nine amino acid mutations in the spike protein compared to the original Wuhan strain, one of which is in the RBD (N501Y), while the South African and Brazilian variants each display ten non-synonymous mutations in the spike protein, three of which are in the RBD (K417N, E484K, N501Y in the South African variant and K417T, E484K, N501Y in the Brazilian one). Although the reason for this rapid development remains enigmatic so far, intra-host evolution in an immunodeficient or immunosuppressed individual with a long-term infection was suggested, possibly leading to an accumulation of “immune-escape” mutants. A major problem is that such mutations might affect the efficacy of vaccines developed to combat Covid-19. The variant strains are analyzed in Supplementary Figures 24–26 (see Supplementary Material).
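The RBD substitutions listed above use the usual notation of wild-type residue, position, new residue. A small sketch that parses them and compares the three variants, using only the RBD mutations given in this section (the dictionary keys are just labels, and the helper names are hypothetical):

```python
import re
from typing import Dict, Iterable, Set, Tuple

AA = "ACDEFGHIKLMNPQRSTVWY"
SUBST = re.compile(rf"^([{AA}])(\d+)([{AA}])$")

RBD_MUTATIONS: Dict[str, Set[str]] = {
    "United Kingdom": {"N501Y"},
    "South Africa": {"K417N", "E484K", "N501Y"},
    "Brazil": {"K417T", "E484K", "N501Y"},
}


def parse(sub: str) -> Tuple[str, int, str]:
    """Split e.g. 'K417N' into ('K', 417, 'N')."""
    m = SUBST.match(sub)
    if m is None:
        raise ValueError(f"not a single-residue substitution: {sub!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def positions(subs: Iterable[str]) -> Set[int]:
    return {parse(s)[1] for s in subs}


if __name__ == "__main__":
    shared_all = set.intersection(*RBD_MUTATIONS.values())
    print("in all three:", sorted(shared_all))
    sa, br = RBD_MUTATIONS["South Africa"], RBD_MUTATIONS["Brazil"]
    print("identical substitutions SA/Brazil:", sorted(sa & br))
    print("same position, different residue:", sorted(positions(sa ^ br)))
```

Only N501Y is common to all three variants; the South African and Brazilian variants share E484K and N501Y and differ only in the residue introduced at position 417.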
Author Contributions
SB and EV performed in-depth literature searches on the topic. SB wrote the first draft of the manuscript and made the analyses, figures and tables in this manuscript, which is partially based on a 30-h course “Bioinformatic Tools” that the first author has been teaching at Vrije Universiteit Brussel in the “Master of Science in Molecular Biology” study program since 2010. Both authors contributed to manuscript revision and fine-tuning and approved the submitted version.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Zunlong Ke; Joaquin Oton; Kun Qu; Mirko Cortese; Vojtech Zila; Lesley McKeane; Takanori Nakane; Jasenko Zivanov; Christopher J Neufeldt; Berati Cerikan; John M Lu; Julia Peukes; Xiaoli Xiong; Hans-Georg Kräusslich; Sjors H W Scheres; Ralf Bartenschlager; John A G Briggs. Nature, 2020-08-17.
Noura H Abd Ellah; Sheryhan F Gad; Khalid Muhammad; Gaber E Batiha; Helal F Hetta. Nanomedicine (Lond), 2020-07-29.
Meng Yuan; Hejun Liu; Nicholas C Wu; Chang-Chun D Lee; Xueyong Zhu; Fangzhu Zhao; Deli Huang; Wenli Yu; Yuanzi Hua; Henry Tien; Thomas F Rogers; Elise Landais; Devin Sok; Joseph G Jardine; Dennis R Burton; Ian A Wilson. Science, 2020-07-13.
Kui K Chan; Danielle Dorosky; Preeti Sharma; Shawn A Abbasi; John M Dye; David M Kranz; Andrew S Herbert; Erik Procko. Science, 2020-08-04.
Sourish Ghosh; Teegan A Dellibovi-Ragheb; Adeline Kerviel; Eowyn Pak; Qi Qiu; Matthew Fisher; Peter M Takvorian; Christopher Bleck; Victor W Hsu; Anthony R Fehr; Stanley Perlman; Sooraj R Achar; Marco R Straus; Gary R Whittaker; Cornelis A M de Haan; John Kehrl; Grégoire Altan-Bonnet; Nihal Altan-Bonnet. Cell, 2020-10-27.