Structure and Mutations of SARS-CoV-2 Spike Protein: A Focused Overview.

Rukmankesh Mehra, Kasper P Kepp.

Abstract

The spike protein (S-protein) of SARS-CoV-2, the protein that enables the virus to infect human cells, is the basis for many vaccines and a hotspot of concerning virus evolution. Here, we discuss the outstanding progress in structural characterization of the S-protein and how these structures facilitate analysis of virus function and evolution. We emphasize the differences in reported structures and that analysis of structure-function relationships is sensitive to the structure used. We show that the average residue solvent exposure in nearly complete structures is a good descriptor of open vs closed conformation states. Because of structural heterogeneity of functionally important surface-exposed residues, we recommend using averages of a group of high-quality protein structures rather than a single structure before reaching conclusions on specific structure-function relationships. To illustrate these points, we analyze some significant chemical tendencies of prominent S-protein mutations in the context of the available structures. In the discussion of new variants, we emphasize the selectivity of binding to ACE2 vs prominent antibodies rather than simply the antibody escape or ACE2 affinity separately. We note that larger chemical changes, in particular increased electrostatic charge or side-chain volume of exposed surface residues, are recurring in mutations of concern, plausibly related to adaptation to the negative surface potential of human ACE2. We also find indications that the fixated mutations of the S-protein in the main variants are less destabilizing than would be expected on average, possibly pointing toward a selection pressure on the S-protein. The richness of available structures for all of these situations provides an enormously valuable basis for future research into these structure-function relationships.

Keywords:  SARS-CoV-2; antibody escape; mutation; spike protein; structural biology

Year:  2021        PMID: 34856799      PMCID: PMC8673470          DOI: 10.1021/acsinfecdis.1c00433

Source DB:  PubMed          Journal:  ACS Infect Dis        ISSN: 2373-8227            Impact factor:   5.084


Introduction

Since the beginning of the pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2),[1−3] the evolution of the virus has been of increasing concern due to the risk of more infectious, lethal, or antibody- or vaccine-resistant variants.[4−9] Of central interest is the spike glycoprotein (S-protein), the heavily glycosylated homotrimeric protein on the surface of coronaviruses giving them their characteristic appearance.[9−11] This protein is responsible for binding to the human cell-surface receptor angiotensin-converting enzyme 2 (ACE2) to promote cell entry.[12−14] Specific presentation of the S-protein to the human immune system is the mode of function of several major vaccines.[15,16] Considering its central and urgent importance, it is not surprising that many structures of the S-protein have been published. The number of these and the speed at which they have arrived document an outstanding achievement in modern structural biology. A recent database provides an excellent overview for analysis of these structures.[17] Without the recent technical breakthroughs in cryoelectron microscopy of macromolecules,[18−21] many of these structures would not have been possible, as should be appropriately honored in the context below. Structures have been solved for most relevant conformational states of the receptor binding domain (RBD) without or with (parts of) ACE2 or antibodies bound, at resolutions typically at 2–4 Å. While sequence-based evolution studies are very helpful and informative, evolution occurs in the context of a folded protein structure.[22−26] Because function is structure-dependent, the structure is the platform on which the amino acid evolution occurs, affecting both the extent of positive and neutral evolution. 
Accordingly, evolution rates tend to depend on the structural context, e.g., solvent exposure, of the evolving site.[27−29] Of particular importance is the need for an arising mutation not to compromise the overall fold stability of the protein; thus, new mutations often evolve under a trade-off that preserves stability.[24,30−33] This trade-off may be the underlying force causing the prefusion S-protein to be metastable in its continuous selection for new mutations that evade antibodies and enhance ACE2 binding.[34] The structural context will aid the understanding of the evolution and function of the S-protein in relation to immune evasion, development of treatments such as antibodies, and infection via ACE2 binding. S-protein structures from other viruses, notably other human coronaviruses,[35−37] can further strengthen such analysis. In this Review, we aim to facilitate the use of the wealth of structural information by providing an extensive but data-focused critical overview of more than 200 published structures of the S-protein, including the context of concerning mutations in new emerging variants that may affect ACE2 binding (and thus increase transmission) or reduce antibody binding (and thus impair the efficiency of vaccines).[5,38−40]

Overview of Spike Protein Structures

During natural fusion with ACE2 of the host cell, the S-protein is cleaved and irreversibly changes conformation into a more stable postfusion open conformation state (Figure 1a).[41] The prefusion S-protein is in a metastable closed state, embedded in a lipid surface as a type-I membrane protein, and therefore needs to be stabilized, both upon fusion with the host cell and in the lab in order to characterize the protein. Strategies employed to this end have relied on mutation to proline in S2 (K986P and V987P), mutation of the furin cleavage site, stabilizing the C-terminal and transmembrane domain by mutation,[34,42] or employing cystine bridges to stabilize the prefusion trimer.[43] Strategies to improve stabilization and expression of the S-protein are progressing.[44,45]
Figure 1

Structural basis for S-protein fusion with host cells. (a) Prefusion closed state (PDB: 7DF3). ACE2 binding occurs via the open state, with RBDs in upward conformations (PDB: 7BNN). (b) Spike monomer cleavage sites: The first furin S1/S2 cleavage site is shown with yellow balls (R685–S686; fragment is missing in the structure) and the second S2′ cleavage site in magenta (R815–S816). The S1 N-terminal domain (NTD) is followed by the S1 C-terminal domain (S1-CTD). The S1-CTD comprises the receptor binding domain (RBD) region. The second cleavage (S2′) forms a fusion peptide upon ACE2 binding. Heptad repeats (HR1 and HR2) are located toward the C-terminus.

The S-protein covers the virus lipid surface as a trimer protein, containing the subunit S1 with the N-terminal domain (NTD), RBD, subdomains 1 and 2 (SD1 and SD2), and S2, with heptad repeat 1 (HR1), the central helix (CH), a connecting domain (CD), heptad repeat 2 (HR2), a transmembrane domain (TM), and the C-terminal part of the protein (Figure 1b).[46] In the closed conformation with all three RBDs in more compact, so-called “down” conformations (“closed” state), the S-protein is not strongly accessible to the human cell, but cleavage into S1 and S2 subunits enables the virus particle to merge with the human host cell membrane, after binding to ACE2 in the “open” conformation where one or more RBDs have changed to the “up” conformation (Figure 1a).[42,46,47] Mutations may increase infectivity by strengthening amino acid interactions (and thereby the affinity) with ACE2, possibly while also favoring the more open, ACE2-favoring conformations, whereas antibodies will tend to prevent fusion by shielding interaction between the S-protein and ACE2.[47]
Recent innovations in cryoelectron microscopy of proteins provide a central underpinning for the success in elucidating the structural biology of SARS-CoV-2[18−21,48] and an important reminder of the importance of basic science enabling later, sometimes urgent applied science. These breakthroughs include advancements in sample handling, electron detectors, and image processing.[19,20,48] The structures are obtained with samples of protein molecules typically deposited with a thin layer of vitrified ice on a carbon grid and rapidly cooled using a cryoagent.[19,20,49] Subsequent 3D-reconstruction of the protein structure is typically done by single particle analysis[48] or subtomogram averaging of multiple particle data to inform on, e.g., conformational ensembles.[19] Proteins that are embedded in lipids in vivo (such as the spike protein or human membrane proteins) are often difficult to stabilize due to their special hydrophobic parts, and various techniques to stabilize them by mutations or with detergents, amphipols, and nanodiscs are common.[50−52]
It is important to note for the following that the structures discussed at low temperature may not reflect the real conformations of the S-protein at physiological temperature (37 °C): freezing, which is fundamentally important to reduce thermal rearrangements due to ionization-induced radical reactions,[53] tends to remove some conformational dynamics of the protein.[54−58] Freezing-out of important conformations and “cryo-contraction” have been observed in X-ray diffraction studies at variable temperature for ribonuclease[56] and myoglobin[57] and in molecular simulations.[54] Conformational changes play a major role in the function of the S-protein upon fusion with the human host cell, and one can expect the conformations to be temperature-dependent, as has indeed recently been observed.[59]
Table 1 shows a list of nearly full-length (defined here as more than 900 residue coordinates) S-protein structures that we identified from searching the literature, UniProt, and the Protein Data Bank as of June 2021. Structures in the PDB that were not fully documented in the form of a full paper, either published or in preprint, were not included.
The number of structures published during 1.5 years of pandemic is notable; we are not aware of any similar effort in modern structural biology. In addition, several structures of important natural variants have been elucidated, as summarized in Table 2. Together with the already published structures of other coronavirus S-proteins (Table 3), this provides a wealth of structural information for detailed structure-based functional and evolutionary studies.
Table 1

Single-Particle Cryoelectron Microscopy Studies Reporting Nearly Full-Length (N > 900) SARS-CoV-2 S-Protein Structures

protein state | PDB codes | release date (a) | reference
prefusion | 6VSB | 2020–02–26 | Wrapp et al.[42]
open and closed | 6VYB, 6VXX | 2020–03–11 | Walls et al.[47]
Ab-bound | 6WPS | 2020–05–27 | Pinto et al.[187]
closed/all down | 6X29, 6X2A, 6X2B, 6X2C | 2020–05–27 | Henderson et al.[46]
closed/all down | 6X6P | 2020–06–10 | Herrera et al.[45]
Ab-bound | 7BYR | 2020–06–10 | Cao et al.[188]
closed, cleaved, 1-up | 6ZGE, 6ZGI, 6ZGG | 2020–07–01 | Wrobel et al.[80]
closed + Ab-bound | 6Z97 | 2020–07–01 | Huo et al.[189]
Ab-bound | 7C2L | 2020–07–01 | Chi et al.[190]
Ab-bound | 6XCM, 6XCN | 2020–07–01 | Barnes et al.[191]
Ab-bound | 6ZDH | 2020–07–01 | Zhou et al.[192]
pre- and postfusion | 6XR8, 6XRA | 2020–07–22 | Cai et al.[41]
cys-stabilized | 6ZOX, 6ZOZ, 6ZOY, 6ZP1, 6ZP0, 6ZP2 | 2020–07–22 | Xiong et al.[43]
Ab-bound | 6XEY | 2020–07–22 | Liu et al.[193]
prefusion stabilized | 6ZP5, 6ZP7, 6ZOW | 2020–07–29 | Melero et al.[68]
pH dependent states | 6XM0, 6XM3, 6XM4, 6XM5, 7JWY, 6XLU | 2020–08–12 | Zhou et al.[66]
prefusion closed | 6X79 | 2020–08–19 | McCallum et al.[194]
open | 7CN9 | 2020–08–26 | Liu et al.[195]
Ab-bound | 7JJI, 7JJJ | 2020–08–26 | Bangaru et al.[196]
prefusion closed | 6XF5, 6XF6 | 2020–09–02 | Zhou et al.[78]
Ab-bound | 7CHH | 2020–09–16 | Du et al.[197]
free + ACE2-bound | 7A93, 7A94, 7A95, 7A96, 7A97, 7A98 | 2020–09–16 | Benton et al.[198]
nanobody-bound | 6ZXN | 2020–09–23 | Hanke et al.[199]
Ab-bound (“inhibitor”) | 7JZN, 7JZL | 2020–09–23 | Cao et al.[98]
closed/all down | 6ZB4, 6ZB5 | 2020–09–30 | Toelzer et al.[200]
Ab-bound | 7K43, 7K4N | 2020–10–07 | Tortorici et al.[89]
with human VH binder | 7JWB | 2020–10–07 | Bracken et al.[201]
Ab-bound | 7JW0, 7JVC, 7JV6, 7JV4 | 2020–10–14 | Piccoli et al.[202]
Ab-bound | 7K8S, 7K8V, 7K8W, 7K8T, 7K8U, 7K8Z, 7K8X, 7K8Y, 7K90 | 2020–10–21 | Barnes et al.[203]
Ab-bound | 7A29, 7A25 | 2020–10–21 | Custodio et al.[204]
closed and 1-up | 7A4N, 7AD1 | 2020–11–04 | Juraszek et al.[79]
Ab-bound | 7KKK, 7KKL | 2020–11–11 | Schoof et al.[205]
prefusion and ACE2-bound | 7KJ2, 7KJ3, 7KJ4, 7KJ5 | 2020–11–11 | Xiao et al.[64]
ACE2-bound | 7CT5 | 2020–11–18 | Guo et al.[206]
closed and open and Ab-bound | 7DDD, 7DDN, 7DD2, 7DCX, 7DK6, 7DK4, 7DCC, 7DK7, 7DD8, 7DK5 | 2020–11–25 | Zhang et al.[207]
ACE2 complex | 7KNB, 7KMZ, 7KMS, 7KNE, 7KNH, 7KNI | 2020–12–09 | Zhou et al.[67]
Ab-bound | 7CWS, 7CWT, 7CWU | 2020–12–16 | Wang et al.[208]
closed, open, ACE2-bound | 7DF3, 7DK3, 7DF4 | 2020–12–16 | Xu et al.[82]
free and Ab-bound | 7CAB, 7CAC, 7CAI, 7CAK | 2020–12–16 | Lv et al.[97]
Ab-bound | 7CWM, 7CWN, 7CWL | 2020–12–16 | Yao et al.[96]
Ab-bound | 7L06, 7L09, 7L02 | 2020–12–30 | Williams et al.[209]
nanobody-bound | 7KSG, 7B18 | 2021–01–20 | Koenig et al.[210]
Ab-bound | 7LAB, 7LAA, 7LD1, 7LCN, 7LJR | 2021–01–27 | Li et al.[211]
Ab-bound | 7L3N | 2021–02–03 | Jones et al.[212]
Ab-bound | 7KS9 | 2021–02–10 | Banach et al.[213]
Ab-bound | 7KMK, 7KML, 7KXJ, 7KXK | 2021–02–10 | Miersch et al.[214]
vaccine BNT162b2 | 7L7K | 2021–02–24 | Vogel et al.[215]
Ab-bound | 7NDC, 7NDD, 7NDA, 7NDB, 7ND7, 7ND8, 7ND5, 7ND6, 7ND9, 7ND3, 7ND4 | 2021–03–03 | Dejnirattisai et al.[216]
Ab-bound | 7LSS, 7LS9 | 2021–03–17 | Cerutti et al.[217]
locked, active, ACE2-bound | 7DWY, 7DWZ, 7DX5, 7DX6, 7DX3, 7DWX, 7DX9, 7DX7, 7DX8, 7DWX, 7DX0, 7DX1, 7DX2 | 2021–03–31 | Yan et al.[63]
Ab-bound | 7L56, 7L57, 7L58 | 2021–04–14 | Rapp et al.[218]
Ab-bound C3 | 7LXY, 7LXZ, 7LY2 | 2021–04–14 | McCallum et al.[92]
bound to biliverdin | 7NT9, 7NTA, 7NTC | 2021–04–28 | Rosa et al.[219]
Ab-bound | 7M6E, 7M6F, 7M6G, 7M6H, 7M6I | 2021–05–05 | Scheid et al.[220]
Ab-bound | 7MKL | 2021–05–12 | VanBlargan et al.[221]
Ab-bound | 7AKD, 7AKJ | 2021–05–19 | Fedry et al.[222]
Ab-bound | 7KQE, 7KQB | 2021–05–26 | Asarnow et al.[223]
Ab-bound | 7N0G, 7N0H | 2021–06–02 | Ahmad et al.[224]
Ab-bound | 7DZW, 7DZX, 7DZY | 2021–06–02 | Liu et al.[225]
Ab-bound | 7E8C | 2021–06–09 | Cao et al.[226]
Ab-bound | 7MY2, 7MY3 | 2021–06–16 | Xu et al.[227]
Ab-bound | 7LRT, 7MM0 | 2021–07–14 | Wang et al.[228]

(a) Release date is approximate, as it refers to the first PDB code listed in each study.

Table 2

Published Spike-Protein Mutant/Variant Structures

protein state | PDB code | release date | reference
D614G variant | 6XS6 | 2020–07–22 | Yurkovetskiy et al.[69]
D614G variant | 7KDJ, 7KDK, 7KDH, 7KDI, 7KDL, 7KDG, 7KEC, 7KEA, 7KEB, 7KE9 | 2020–11–04 | Gobeil et al.[65]
D614G variant | 7DX1 | 2021–03–31 | Yan et al.[63]
D614G closed | 7BNM | 2021–02–03 | Benton et al.[124]
D614G open conformation | 7BNN | 2021–02–03 | Benton et al.[124]
D614G open 2-RBD-up | 7BNO | 2021–02–03 | Benton et al.[124]
B.1.1.7/alpha 1-RBD-up | 7LWT, 7LWU, 7LWV | 2021–03–31 | Gobeil et al.[125]
B.1.1.7/alpha 3-RBD down | 7LWS | 2021–03–31 | Gobeil et al.[125]
B.1.1.28 1-RBD-up | 7LWW | 2021–03–31 | Gobeil et al.[125]
Mink Cluster 5 1-RBD up | 7LWM, 7LWO | 2021–03–31 | Gobeil et al.[125]
Mink Cluster 5 2-RBD up | 7LWP | 2021–03–31 | Gobeil et al.[125]
Mink Cluster 5 3-RBD-down | 7LWI, 7LWJ, 7LWK, 7LWL | 2021–03–31 | Gobeil et al.[125]
P.1/gamma + ACE2 | 7NXC | 2021–04–07 | Gobeil et al.[125]
B.1.351/beta | 7LYK, 7LYL, 7LYN, 7LYO, 7LYP, 7LYQ | 2021–03–31 | Gobeil et al.[125]
alpha/beta | 7N1Q, 7N1T, 7N1U, 7N1V, 7N1W, 7N1X, 7N1Y | 2021–07–07 | Cai et al.[127]
P.1/gamma | 7M8K | 2021–05–05 | Wang et al.[229]
N501Y mutant Ab/ACE2-bound | 7MJG, 7MJH, 7MJJ, 7MJK, 7MJM | 2021–05–12 | Zhu et al.[230]
B.1.429/epsilon + S2M11, S2L20 | 7N8H, 7NHI | 2021–07–14 | McCallum et al.[126]
Hexapro stable lab mutant | 6XKL | 2020–07–15 | Hsieh et al.[44]
Table 3

Other Human Coronavirus Spike Protein Structures

protein | PDB | release date | reference
human coronavirus HKU1 | 5I08 | 2016–03–02 | Kirchdoerfer et al.[35]
human coronavirus NL63 spike | 5SZS | 2016–09–14 | Walls et al.[231]
MERS-CoV | 5X5C, 5X5F, 5X59 | 2017–05–03 | Yuan et al.[37]
MERS-CoV | 5W9K, 5W9I | 2017–08–16 | Pallesen et al.[36]
MERS-CoV | 6Q04, 6Q05, 6Q06, 6Q07 | 2019–12–11 | Park et al.[232]
SARS-CoV | 5X5B, 5X58 | 2017–05–03 | Yuan et al.[37]
SARS-CoV | 5XLR, 5WRG | 2017–06–07 | Gui et al.[233]
SARS-CoV | 6CRV, 6CRX, 6CRW, 6CRZ, 6CS1, 6CS0, 6CS2 | 2018–04–11 | Kirchdoerfer et al.[234]
ACE2-bound SARS-CoV | 6ACK, 6ACJ, 6ACC, 6ACD, 6ACG | 2018–08–08 | Song et al.[235]
human coronavirus 229E spike | 6U7H | 2019–11–13 | Li et al.[236]
human coronavirus OC43 trimer | 6NZK, 6OHW | 2019–06–05 | Tortorici et al.[132]
HKU2 S-protein | 6M15 | 2020–05–27 | Yu et al.[237]
Release date is approximate, as it refers to the first PDB code listed in each study.

The many available structures, even for presumably the same protein states, raise a question about the relevance and transferability of conclusions based on different structures. Many of the structures are of excellent resolution considering the size of the protein and the way the structures are obtained, approaching 3 Å resolution in some cryo-EM structures. Also, many conformation states and both antibody (Ab)-bound and ACE2-bound structures have been obtained, so that the structures can be used comparatively, and we can estimate the heterogeneity between them and how this affects structure–function relationships. For smaller parts of the RBD interacting with antibodies, some crystal structures are available at resolutions approaching 2 Å.

Structural Heterogeneity: Which Structure Should Be Used?

The protein structures listed in Tables 1–3 were produced by different research groups, sometimes with different protocols and in different protein states. The large number of studies complicates the choice of structure to use for structure-guided functional or evolutionary analysis. Of concern in this choice are (1) the overall quality of the structure, (2) the site coverage, (3) the protein state of interest (mutated/stabilized, prefusion, postfusion, free or bound to ACE2 or antibody), and (4) the conformation state of the protein (e.g., open or closed, with 0, 1, 2, or 3 RBDs in the upward conformation). Tables 4–7 display various properties of the SARS-CoV-2 S-protein structures relevant to making these decisions: Table 4 summarizes data for the apoprotein, Table 5 for the protein bound to ACE2, and Tables 6 and 7 for the protein bound to antibodies.
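The selection criteria above can be sketched as a simple filter. The following is a minimal illustration only, not the authors' procedure: the six rows are copied from Table 4, while the class name `Structure`, the function `shortlist`, and all threshold defaults (3.5 Å, 950 residues, 0.5% outliers) are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    pdb: str
    state: str           # conformation label as given in Table 4
    n_residues: int      # N, residues with coordinates
    outliers_pct: float  # Ramachandran outliers in %
    resolution: float    # in Å


# A handful of rows copied from Table 4 (apo S-protein trimers).
TABLE4_SAMPLE = [
    Structure("6XM0", "pH 5.5", 1058, 0.1, 2.7),
    Structure("6VXX", "closed", 972, 0.2, 2.8),
    Structure("7DF3", "closed", 1088, 0.3, 2.7),
    Structure("7DDN", "open", 1068, 0.1, 6.3),
    Structure("7CN9", "open", 1061, 0.0, 4.7),
    Structure("6VYB", "open", 979, 0.3, 3.2),
]


def shortlist(structures, state, max_resolution=3.5,
              min_residues=950, max_outliers_pct=0.5):
    """Keep structures in the wanted state that pass the (assumed)
    quality and coverage thresholds, best resolution first."""
    hits = [
        s for s in structures
        if s.state == state
        and s.resolution <= max_resolution
        and s.n_residues >= min_residues
        and s.outliers_pct <= max_outliers_pct
    ]
    return sorted(hits, key=lambda s: s.resolution)


if __name__ == "__main__":
    for s in shortlist(TABLE4_SAMPLE, "closed"):
        print(s.pdb, s.resolution, s.n_residues)
```

A filter like this only narrows the candidates; whether a given structure covers the specific sites of interest still has to be checked separately.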
Table 4

Properties of Some Representative Nearly Complete Structures of Apo-Trimer-S-Protein (a)

protein | PDB | N | chains | % outl. | res (Å) | RSAavg | reference
pH 5.5 | 6XM0 | 1058 | 3 | 0.1 | 2.7 | 0.29 | Zhou et al.[66]
pH 5.5, 1-up, conf. 1 | 6XM3 | 1060 | 3 | 0.1 | 2.9 | 0.29 | Zhou et al.[66]
pH 5.5, 1-up, conf. 2 | 6XM4 | 1060 | 3 | 0.1 | 2.9 | 0.29 | Zhou et al.[66]
pH 5.5, closed | 6XM5 | 1058 | 3 | 0.0 | 3.1 | 0.28 | Zhou et al.[66]
pH 4.5 | 7JWY | 1063 | 3 | 0.1 | 2.5 | 0.28 | Zhou et al.[66]
pH 4.0 | 6XLU | 1063 | 3 | 0.1 | 2.4 | 0.27 | Zhou et al.[66]
prefusion | 6Z97 | 1002 | 3 | 0.0 | 3.4 | 0.28 | Huo et al.[189]
prefusion, closed | 6XF5 | 1009 | 3 | 0.0 | 3.5 | 0.27 | Zhou et al.[78]
prefusion, 1-up | 6XF6 | 999 | 3 | 0.0 | 4.0 | 0.28 | Zhou et al.[78]
1-up closed | 6ZP5 | 1121 | 3 | 0.0 | 3.1 | 0.28 | Melero et al.[68]
prefusion 1-up | 6ZP7 | 996 | 3 | 0.0 | 3.3 | 0.29 | Melero et al.[68]
stabil. closed | 7A4N | 963 | 3 | 0.0 | 2.8 | 0.27 | Juraszek et al.[79]
1-up | 7AD1 | 967 | 3 | 0.0 | 2.9 | 0.28 | Juraszek et al.[79]
closed | 6X79 | 950 | 3 | 0.1 | 2.9 | 0.27 | McCallum et al.[194]
closed | 6X6P | 1017 | 3 | 0.0 | 3.2 | 0.26 | Herrera et al.[45]
closed | 6X29 | 972 | 3 | 0.0 | 2.7 | 0.27 | Henderson et al.[46]
1-up | 6X2A | 978 | 3 | 0.1 | 3.3 | 0.27 | Henderson et al.[46]
2-up | 6X2B | 977 | 3 | 0.2 | 3.6 | 0.28 | Henderson et al.[46]
closed | 6X2C | 972 | 3 | 0.1 | 3.2 | 0.27 | Henderson et al.[46]
closed C1 symmetry | 6ZB4 | 1055 | 3 | 0.0 | 3.0 | 0.28 | Toelzer et al.[200]
closed C3 symmetry | 6ZB5 | 1032 | 3 | 0.0 | 2.9 | 0.28 | Toelzer et al.[200]
closed | 7DDD | 1088 | 3 | 0.1 | 3.0 | 0.25 | Zhang et al.[207]
uncleaved, closed | 6ZGE | 1098 | 3 | 0.2 | 2.6 | 0.27 | Wrobel et al.[80]
cleaved 1-up | 6ZGG | 1071 | 3 | 0.1 | 3.8 | 0.30 | Wrobel et al.[80]
cleaved closed | 6ZGI | 1098 | 3 | 0.2 | 2.9 | 0.28 | Wrobel et al.[80]
closed | 7DF3 | 1088 | 3 | 0.3 | 2.7 | 0.24 | Xu et al.[82]
closed | 6VXX | 972 | 3 | 0.2 | 2.8 | 0.27 | Walls et al.[47]
open | 7DDN | 1068 | 3 | 0.1 | 6.3 | 0.28 | Zhang et al.[207]
open | 7DK3 | 1062 | 3 | 0.2 | 6.0 | 0.27 | Xu et al.[82]
open | 6VYB | 979 | 3 | 0.3 | 3.2 | 0.27 | Walls et al.[47]
locked | 7DWY | 1099 | 3 | 0.0 | 2.7 | 0.26 | Yan et al.[63]
active | 7DWZ | 1007 | 3 | 0.5 | 3.3 | 0.29 | Yan et al.[63]
2-RBD-up | 7A93 | 1074 | 3 | 0.0 | 5.9 | 0.29 | Benton et al.[124]
prefusion | 6VSB | 989 | 3 | 0.0 | 3.5 | 0.28 | Wrapp et al.[42]
stabilized closed | 6ZOX | 1017 | 3 | 0.0 | 3.0 | 0.26 | Xiong et al.[43]
stabilized locked | 6ZOZ | 1077 | 3 | 0.0 | 3.5 | 0.25 | Xiong et al.[43]
stabilized closed | 6ZP0 | 1030 | 3 | 0.0 | 3.0 | 0.26 | Xiong et al.[43]
stabilized locked | 6ZP2 | 1060 | 3 | 0.0 | 3.1 | 0.25 | Xiong et al.[43]
prefusion | 7JJI | 1109 | 3 | 0.2 | 3.6 | 0.26 | Bangaru et al.[196]
prefusion | 7KJ5 | 999 | 3 | 0.4 | 3.6 | 0.28 | Xiao et al.[64]
closed | 7CAB | 1029 | 3 | 0.1 | 3.5 | 0.26 | Lv et al.[97]
open | 7CN9 | 1061 | 3 | 0.0 | 4.7 | 0.32 | Liu et al.[195]
1-up nonstabil. | 7KDH | 979 | 3 | 0.4 | 3.3 | 0.27 | Gobeil et al.[65]
closed nonstabil. | 7KDG | 972 | 3 | 0.0 | 3.0 | 0.26 | Gobeil et al.[65]

(a) N = number of residues; chains = chains in structure; res (Å) = resolution in Å; RSAavg = average solvent exposure of all residues; % outl. = outliers of Ramachandran plot in %.

Table 7

Representative X-ray Structures of S-Protein RBD Bound to Antibodies (a)

Ab bound to RBD | PDB | % outl. | res (Å) | reference
C5 nanobody | 7OAO | 0.0 | 1.5 | Huo et al.[238]
H3/C1 | 7OAP | 0.2 | 1.9 | Huo et al.[238]
H3/C1 (alpha) | 7OAQ | 0.0 | 1.6 | Huo et al.[238]
H3/C1 (N501Y) | 7OAU | 0.0 | 1.7 | Huo et al.[238]
S2E12 | 7K3Q | 0.0 | 1.4 | Tortorici et al.[89]
S2X35 | 7JXE | 0.0 | 2.0 | Piccoli et al.[202]
S2A4 | 7JXD | 0.1 | 2.5 | Piccoli et al.[202]
S2H14 | 7JXC | 0.2 | 2.5 | Piccoli et al.[202]
VHH E | 7KN5 | 0.1 | 1.9 | Koenig et al.[210]
P4A1 | 7CJF | 0.0 | 2.1 | Guo et al.[239]
C1A-C2 | 7KFX | 0.0 | 2.2 | Clark et al.[240]
C1A-B12 | 7KFV | 0.1 | 2.1 | Clark et al.[240]
C1A-F10 | 7KFY | 0.0 | 2.1 | Clark et al.[240]
7D6 | 7EAM | 0.4 | 1.4 | Li et al.[101]
COVOX-269 | 7NEH | 0.0 | 1.8 | Supasa et al.[99]
COVOX-269 (N501Y) | 7NEG | 0.0 | 2.2 | Supasa et al.[99]
B38 | 7BZ5 | 0.0 | 1.8 | Wu et al.[241]
S309/S2X35 | 7R6W | 0.2 | 1.8 | Starr et al.[242]
LY-CoV481 | 7KMI | 0.0 | 1.7 | Jones et al.[212]
LY-CoV555 | 7KMG | 0.2 | 2.2 | Jones et al.[212]
LY-CoV488 | 7KMH | 0.0 | 1.7 | Jones et al.[212]
COVOX-222/EY6A | 7NX6 | 0.1 | 2.3 | Dejnirattisai et al.[100]
COVOX-222/EY6A (K417N) | 7NX7 | 0.2 | 2.3 | Dejnirattisai et al.[100]
COVOX-222/EY6A (K417T) | 7NX8 | 0.1 | 2.0 | Dejnirattisai et al.[100]
COVOX-222/EY6A (N501Y) | 7NX9 | 0.1 | 2.4 | Dejnirattisai et al.[100]
COVOX-222/EY6A (beta) | 7NXA | 0.1 | 2.5 | Dejnirattisai et al.[100]
COVOX-222/EY6A (gamma) | 7NXB | 0.2 | 2.7 | Dejnirattisai et al.[100]
SR31 | 7D2Z | 0.0 | 2.0 | Yao et al.[243]
MR17-SR31 | 7D30 | 0.0 | 2.1 | Yao et al.[243]
WCSL 129 | 7MZI | 0.0 | 1.9 | Wheatley et al.[244]
PDI 42 | 7MZG | 0.0 | 2.0 | Wheatley et al.[244]
Re5D06 | 7OLZ | 0.0 | 1.8 | Guttler et al.[245]
CR3022 | 6YLA | 0.2 | 2.4 | Huo et al.[189]
BD-236 | 7CHB | 0.2 | 2.4 | Du et al.[197]
Sb14/Sb68 | 7MFU | 0.2 | 1.7 | Ahmad et al.[224]
Sb45 | 7KGJ | 0.0 | 2.3 | Ahmad et al.[224]

(a) N = residues in structure; chains = chains in structure; res (Å) = resolution in Å; RSAavg = average solvent exposure of all residues; % outl. = outliers of Ramachandran plot in % (from PDB full report).

Table 5

Properties of Nearly Complete (N > 900) Structures of Spike Protein Bound to ACE2 (a)

protein state | PDB | N | chains | % outl. | res (Å) | RSAavg | reference
1 ACE2 | 7A94 | 1086 | 4 | 0.2 | 3.9 | 0.29 | Benton et al.[198]
1 ACE2, 1-up | 7A95 | 1075 | 4 | 0.2 | 4.3 | 0.30 | Benton et al.[198]
1 ACE2, 1-up | 7A96 | 1071 | 4 | 0.2 | 4.8 | 0.29 | Benton et al.[198]
2 ACE2, bound | 7A97 | 1072 | 5 | 0.1 | 4.4 | 0.29 | Benton et al.[198]
3 ACE2, bound | 7A98 | 1071 | 6 | 0.2 | 5.4 | 0.29 | Benton et al.[198]
1 ACE2 pH 7.4 | 7KNB | 1083 | 4 | 0.0 | 3.9 | 0.29 | Zhou et al.[67]
2 ACE2 pH 7.4 | 7KMZ | 1085 | 5 | 0.0 | 3.6 | 0.29 | Zhou et al.[67]
3 ACE2 pH 7.4 | 7KMS | 1086 | 6 | 0.0 | 3.6 | 0.28 | Zhou et al.[67]
1 ACE2, pH 5.5 | 7KNE | 1083 | 4 | 0.0 | 3.9 | 0.29 | Zhou et al.[67]
2 ACE2, pH 5.5 | 7KNH | 1085 | 5 | 0.0 | 3.7 | 0.30 | Zhou et al.[67]
3 ACE2, pH 5.5 | 7KNI | 1082 | 6 | 0.0 | 3.9 | 0.28 | Zhou et al.[67]
1 ACE2 | 7DF4 | 1082 | 4 | 0.2 | 3.8 | 0.27 | Xu et al.[82]
ACE2/PD, 1-up | 7DX5 | 1065 | 4 | 0.3 | 3.3 | 0.28 | Yan et al.[63]
ACE2/PD, 2-up | 7DX6 | 1065 | 4 | 0.3 | 3.0 | 0.29 | Yan et al.[63]
ACE2/2PD, 3-up | 7DX9 | 1065 | 5 | 0.3 | 3.6 | 0.29 | Yan et al.[63]
ACE2/PD, 1-up | 7DX7 | 1065 | 4 | 0.3 | 3.4 | 0.28 | Yan et al.[63]
ACE2/2PD, 2-up | 7DX8 | 1065 | 5 | 0.3 | 2.9 | 0.28 | Yan et al.[63]
1 ACE2 | 7KJ2 | 1069 | 4 | 0.3 | 3.6 | 0.28 | Xiao et al.[64]
2 ACE2 | 7KJ3 | 1069 | 5 | 0.4 | 3.7 | 0.28 | Xiao et al.[64]
3 ACE2 | 7KJ4 | 1069 | 6 | 0.4 | 3.4 | 0.28 | Xiao et al.[64]
design ACE2 | 7CT5 | 1067 | 6 | 0.1 | 4.0 | 0.29 | Guo et al.[206]

(a) N = number of residues in structure; chains = chains in structure; res (Å) = resolution in Å; RSAavg = average solvent exposure; % outl. = outliers of Ramachandran plot in % (from PDB full report); PD = peptidase domain of ACE2.

Table 6

Properties of Some Published Nearly Complete Cryo-EM Structures of Spike Protein Bound to Antibodies (a)

protein | PDB | N | chains | % outl. | res (Å) | RSAavg | reference
Fab 2–4 closed | 6XEY | 1081 | 9 | 0.1 | 3.3 | 0.28 | Liu et al.[193]
C105 state 1 | 6XCM | 1045 | 7 | 0.0 | 3.4 | 0.28 | Barnes et al.[191]
C105 state 2 | 6XCN | 1035 | 9 | 0.0 | 3.7 | 0.28 | Barnes et al.[191]
S2M11/S2L28 | 7LXZ | 1080 | 15 | 0.2 | 2.6 | 0.28 | McCallum et al.[92]
S2M11/S2X333 | 7LXY | 1065 | 15 | 0.2 | 2.2 | 0.27 | McCallum et al.[92]
S2M11/S2M28 | 7LY2 | 1067 | 15 | 0.2 | 2.5 | 0.27 | McCallum et al.[92]
S309 | 6WPS | 995 | 9 | 0.3 | 3.1 | 0.27 | Pinto et al.[187]
EY6A | 6ZDH | 1072 | 9 | 0.0 | 3.7 | 0.29 | Zhou et al.[192]
Sb23 | 7A29 | 1076 | 6 | 0.0 | 2.9 | 0.29 | Custodio et al.[204]
Sb23 | 7A25 | 1076 | 6 | 0.0 | 3.1 | 0.29 | Custodio et al.[204]
Fab 2–7 | 7LSS | 1044 | 5 | 0.0 | 3.7 | 0.28 | Cerutti et al.[217]
1–57 Fab | 7LS9 | 1131 | 9 | 0.1 | 3.4 | 0.25 | Cerutti et al.[217]
P17 1-up | 7CWM | 1087 | 9 | 0.1 | 3.6 | 0.28 | Yao et al.[96]
P17/H014 | 7CWN | 1086 | 15 | 0.1 | 3.2 | 0.28 | Yao et al.[96]
P17 2-up | 7CWL | 1090 | 9 | 0.1 | 3.8 | 0.28 | Yao et al.[96]
3C1 fab 2-up | 7DD2 | 1081 | 7 | 0.0 | 5.6 | 0.30 | Zhang et al.[207]
2 × 3C1 fab 2-up | 7DCX | 1081 | 9 | 0.0 | 5.9 | 0.29 | Zhang et al.[207]
2 × 2H2 Fab 2-up | 7DK6 | 1081 | 7 | 0.0 | 4.3 | 0.28 | Zhang et al.[207]
3 × 2H2 Fab 2-up | 7DK4 | 1079 | 9 | 0.0 | 3.8 | 0.27 | Zhang et al.[207]
3 × 3C1 fab 3-up | 7DCC | 1081 | 9 | 0.0 | 4.3 | 0.30 | Zhang et al.[207]
3 × 2H2 Fab 3 up | 7DK7 | 1081 | 9 | 0.0 | 9.7 | 0.29 | Zhang et al.[207]
1 × 3C1 fab 1-up | 7DD8 | 1081 | 5 | 0.0 | 7.5 | 0.29 | Zhang et al.[207]
1 × 2H2 Fab 1-up | 7DK5 | 1081 | 5 | 0.0 | 13.5 | 0.28 | Zhang et al.[207]
3 × 4A8 | 7C2L | 1073 | 9 | 0.9 | 3.1 | 0.29 | Chi et al.[190]
1 × Ab23-Fab | 7BYR | 1051 | 5 | 0.0 | 3.8 | 0.28 | Cao et al.[188]
1 × Fab H4 | 7L58 | 1108 | 5 | 0.4 | 5.1 | 0.36 | Rapp et al.[218]
3 × Fab 2–43 | 7L56 | 1056 | 9 | 0.2 | 3.6 | 0.28 | Rapp et al.[218]
1 × Fab 2–15 | 7L57 | 1055 | 5 | 0.1 | 5.9 | 0.38 | Rapp et al.[218]
nanobody Ty1 | 6ZXN | 1076 | 6 | 0.0 | 2.9 | 0.28 | Hanke et al.[199]
1 × H014 Fab 1-up | 7CAC | 1072 | 5 | 0.1 | 3.6 | 0.28 | Lv et al.[97]
2 × H014 Fab 2-up | 7CAI | 1069 | 7 | 0.2 | 3.5 | 0.28 | Lv et al.[97]
3 × H014 Fab 3-up | 7CAK | 1061 | 9 | 0.2 | 3.6 | 0.28 | Lv et al.[97]
S-6P BD-368-2 | 7CHH | 1052 | 9 | 0.0 | 3.5 | 0.28 | Du et al.[197]
FC05 + H014 | 7CWS | 1089 | 15 | 0.4 | 3.4 | 0.28 | Wang et al.[208]
hb27 + fc05 | 7CWT | 1088 | 15 | 0.4 | 3.7 | 0.28 | Wang et al.[208]
P17 + FC05 | 7CWU | 1090 | 15 | 0.1 | 3.5 | 0.29 | Wang et al.[208]
S2H13, 1-up | 7JV4 | 1025 | 9 | 0.3 | 3.4 | 0.29 | Piccoli et al.[202]
S2H13, closed | 7JV6 | 1019 | 9 | 0.3 | 3.0 | 0.29 | Piccoli et al.[202]
S304 | 7JW0 | 1061 | 9 | 0.5 | 4.3 | 0.33 | Piccoli et al.[202]
S2A4 | 7JVC | 1064 | 9 | 0.6 | 3.3 | 0.31 | Piccoli et al.[202]
VH domain | 7JWB | 1079 | 4 | 0.1 | 3.2 | 0.28 | Bracken et al.[201]
LCB1 2-up | 7JZL | 1018 | 6 | 0.3 | 2.7 | 0.28 | Cao et al.[98]
LCB3 2-up | 7JZN | 1018 | 6 | 0.3 | 3.1 | 0.28 | Cao et al.[98]
S2M11 | 7K43 | 1059 | 9 | 0.2 | 2.6 | 0.26 | Tortorici et al.[89]
S2E12 | 7K4N | 1037 | 9 | 0.4 | 3.3 | 0.29 | Tortorici et al.[89]
human Ab C002 | 7K8S | 1049 | 9 | 0.2 | 3.4 | 0.28 | Barnes et al.[203]
human Ab C110 | 7K8V | 1037 | 7 | 0.0 | 3.8 | 0.28 | Barnes et al.[203]
human Ab C119 | 7K8W | 1054 | 7 | 0.1 | 3.6 | 0.28 | Barnes et al.[203]
human Ab C002 | 7K8T | 1046 | 9 | 0.1 | 3.4 | 0.27 | Barnes et al.[203]
human Ab C104 | 7K8U | 1036 | 5 | 0.0 | 3.8 | 0.29 | Barnes et al.[203]
human Ab C135 | 7K8Z | 1026 | 7 | 0.0 | 3.5 | 0.28 | Barnes et al.[203]
human Ab C121 | 7K8X | 1042 | 7 | 0.0 | 3.9 | 0.29 | Barnes et al.[203]
human Ab C121 | 7K8Y | 1042 | 7 | 0.0 | 4.4 | 0.29 | Barnes et al.[203]
human Ab C144 | 7K90 | 1061 | 9 | 0.1 | 3.2 | 0.27 | Barnes et al.[203]
nanobody Nb6 | 7KKK | 1037 | 6 | 0.0 | 3.0 | 0.27 | Schoof et al.[205]
nanobody mNb6 | 7KKL | 1037 | 6 | 0.2 | 2.9 | 0.28 | Schoof et al.[205]
Fab 15033-7 | 7KMK | 1066 | 7 | 0.2 | 4.2 | 0.27 | Miersch et al.[214]
Fab 15033-7 | 7KML | 1071 | 9 | 0.2 | 3.8 | 0.28 | Miersch et al.[214]
Fab 910-30 | 7KS9 | 1039 | 5 | 0.0 | 4.8 | 0.28 | Banach et al.[213]
nanobody | 7KSG | 1096 | 6 | 0.0 | 3.3 | 0.29 | Koenig et al.[210]
1 × 2G12 | 7L02 | 1052 | 7 | 0.2 | 3.2 | 0.27 | Williams et al.[209]
2 × 2G12 | 7L06 | 1052 | 11 | 0.1 | 3.3 | 0.28 | Williams et al.[209]
2G12 | 7L09 | 1052 | 7 | 0.2 | 3.1 | 0.27 | Williams et al.[209]
LY-CoV555 | 7L3N | 1055 | 5 | 0.2 | 3.3 | 0.28 | Jones et al.[212]
BNT162b2 | 7L7K | 986 | 3 | 0.0 | 3.3 | 0.29 | Vogel et al.[215]
DH1041 | 7LAA | 1085 | 5 | 0.2 | 3.4 | 0.28 | Li et al.[211]
DH1052 | 7LAB | 1024 | 9 | 0.3 | 3.0 | 0.27 | Li et al.[211]
DH1047 | 7LD1 | 1045 | 9 | 0.4 | 3.4 | 0.27 | Li et al.[211]
DH1050.1 | 7LCN | 1042 | 9 | 0.9 | 3.4 | 0.27 | Li et al.[211]
DH1043 | 7LJR | 1053 | 5 | 0.4 | 3.7 | 0.27 | Li et al.[211]
COVOX-253H55L | 7NDA | 1082 | 5 | 0.4 | 3.3 | 0.27 | Dejnirattisai et al.[216]
COVOX-253H165L | 7NDB | 1111 | 5 | 0.1 | 4.6 | 0.27 | Dejnirattisai et al.[216]
COVOX-159 | 7NDC | 1082 | 9 | 0.1 | 4.1 | 0.28 | Dejnirattisai et al.[216]
COVOX-159 | 7NDD | 1082 | 9 | 0.1 | 4.2 | 0.28 | Dejnirattisai et al.[216]
COVOX-40 | 7ND3 | 1038 | 5 | 0.0 | 3.7 | 0.28 | Dejnirattisai et al.[216]
COVOX-150 | 7ND5 | 1074 | 5 | 0.1 | 3.4 | 0.28 | Dejnirattisai et al.[216]
COVOX-158 | 7ND6 | 1074 | 5 | 0.0 | 7.3 | 0.28 | Dejnirattisai et al.[216]
COVOX-316 | 7ND7 | 1038 | 9 | 0.4 | 3.6 | 0.27 | Dejnirattisai et al.[216]
COVOX-384 | 7ND8 | 1072 | 5 | 0.3 | 3.5 | 0.27 | Dejnirattisai et al.[216]
COVOX-253H55L | 7ND9 | 1118 | 5 | 0.1 | 2.8 | 0.27 | Dejnirattisai et al.[216]

(a) N = residues in structure; chains = chains in structure; res (Å) = resolution in Å; RSAavg = average solvent exposure of all residues; % outl. = outliers of Ramachandran plot in % (from PDB full report); Fab = S-protein-binding antibody fragment.

In terms of quality, 2.5–3.5 Å represents the resolution range where amino acid side chain conformations and functionally important surface residues become resolved.[60] A resolution of 3 Å does not enable identification of all atom positions, including water molecules, but provides very good information on the overall backbone conformation and secondary and tertiary structure.[60] We expect many surface residues to be not well-resolved at 3 Å resolution, and some rotamers may be mismodeled. These residues are the functionally important ones in the case of the S-protein, so resolution improvements toward the 2-Å range would be of substantial value. Generally, many of the published structures display very good resolution of the secondary and tertiary structure of the S-protein, but the confidence in the orientations of individual surface residues is limited (see below).
The percentage of outliers from Ramachandran plots (torsion angles of the peptide backbone) is also shown in Tables 4–7 as an indicator of the structure’s backbone conformations.[60] These were calculated using the Procheck program[61] (version 3.6.2) available via the PDBsum server from EMBL-EBI.[62] A smaller number implies a normal and expected backbone conformation in the structure, whereas a larger number implies more unusual or “strained” backbone conformations in the structure. For the apo-S-protein structures of Table 4, the values are generally as expected, with Ramachandran outlier residues typically constituting only 0.0–0.3% of the structures. The few exceptions to this are the active state 7DWZ (0.5%),[63] the prefusion state 7KJ5 (0.4%),[64] and the 1-up conformation structure 7KDH (0.4%),[65] but these numbers are fully reasonable, as 0–0.5% translates to only 0–5 of the approximately 1000 residues in the S-protein having an unusual backbone conformation. In terms of amino acid coverage, the total number of residue sites resolved by coordinates is listed as “N” in Tables 4–7. For a given resolution of ∼3 Å, a more complete structure seems more suitable for study; if specific mutations are of interest, their sites should at least be covered by full coordinates in the structure. For example, P681 is an important mutation site that is typically not present in the structures, and most structures miss many of the disordered N-terminal sites harboring mutations of interest such as S13I, L18F, T20N, and P26S. Thus, for example, structure 7LXY has site T20 but misses P26, whereas 6ZB4 has P26 but misses T20. Many other structures, such as 6VXX and 6XM0, miss both.
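Both checks in this paragraph, converting an outlier percentage into a residue count and testing whether the sites of interest have coordinates, are easy to script. The sketch below is illustrative only: the function names are hypothetical, the site check assumes a legacy fixed-column PDB file and counts a residue as resolved when its Cα atom is present, and the three ATOM records in the demo are toy lines generated for the example, not taken from any deposited structure.

```python
def outlier_count(percent, n_residues):
    """Convert a Ramachandran outlier percentage into a residue count."""
    return round(percent / 100 * n_residues)


def resolved_residues(pdb_lines, chain):
    """Residue numbers of one chain that have a C-alpha ATOM record.

    Uses the fixed columns of the legacy PDB format: atom name in
    columns 13-16, chain ID in column 22, residue number in 23-26.
    """
    resolved = set()
    for line in pdb_lines:
        if (len(line) >= 26 and line.startswith("ATOM")
                and line[12:16].strip() == "CA" and line[21] == chain):
            resolved.add(int(line[22:26]))
    return resolved


def missing_sites(sites, resolved):
    """Sites of interest that have no coordinates in the structure."""
    return sorted(s for s in sites if s not in resolved)


def ca_record(serial, resname, chain, resseq):
    """Build a toy C-alpha ATOM line (dummy coordinates) for the demo."""
    return (f"ATOM  {serial:5d}  CA  {resname:>3} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


if __name__ == "__main__":
    # 0.5% of roughly 1000 residues is about 5 residues.
    print(outlier_count(0.5, 1000))

    # Toy chain that resolves residues 20-22 only (made-up example).
    toy = [ca_record(i, "ALA", "A", r) for i, r in enumerate([20, 21, 22], 1)]
    sites = [13, 18, 20, 26]  # positions of S13I, L18F, T20N, P26S
    print(missing_sites(sites, resolved_residues(toy, "A")))
```

For real use, the lines would come from a downloaded coordinate file, and a stricter definition of "covered by full coordinates" (all side-chain atoms present) may be preferable to the Cα test assumed here.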
In addition to choosing the correct protein state for the analysis of interest (apoprotein or bound to, e.g., ACE2), it is also important to consider that protein structures may represent different, more or less open states, both in terms of having 0, 1, 2, or 3 RBDs in the upward conformation and in terms of other features, such as mutated or deleted transmembrane domains that may affect the overall structure. Herrera et al. reported that some structures (they identified 6VXX, 6X29, 6X2C, 6X79, 6ZOX, 6ZOY, 6ZP0, 6ZP1, 6ZWV) display one conformation (called conformation 1), whereas others (e.g., 6XR8, 6ZGE, 6ZGH, 6ZGI, 6ZP2, 7JJI, 7JJJ) display conformation 2 and have missing residues in the region that starts at the notorious site 614 (614–642).[45] As discussed below, the D614 site has substantial heterogeneity even between structures resembling the supposedly same state. Evidently, the S-protein is very sensitive to changes in the environment, and it is reasonable to expect the in vivo conformations of the protein in the lipid membrane of the virus to differ from the structures obtained in vitro: pH[66,67] and temperature[59] change the conformational states, and molecular crowding, salt, or other features of the local heterogeneous in vivo environment may be expected to do so as well. Conformational states may be affected by delicate effects in the protocols,[68] e.g., the mutations used to stabilize the protein for characterization[34] and the temperature effects on the more dynamic surface residues of the protein.[54−58] Higher temperature, as in the human body, is likely to favor more entropic conformational states, which might affect the epitopes and may explain the temperature dependence of some structures.[59] The extent to which specific protein variants favor certain open or closed conformations should thus be seen in this context. 
For example, the D614G mutation has been reported to favor a more open conformational state[69] but the tendency to do so is probably temperature-dependent.[70] Accordingly, subtomogram averaging and physiological-temperature molecular dynamics simulations are important for analyzing the conformations of the protein, as increasingly explored.[70−74] To determine the extent of the variation in the published S-protein structures of Tables –6, we have listed the average relative solvent accessible surface area (RSA) per site in the structure as an indicator of conformational openness, using the Naccess algorithm[75] as implemented via the FreeSASA program.[76] This property is of central importance in protein evolution,[77] and is particularly relevant because most of the function of interest of the S-protein relates to surface interaction with other proteins, notably antibodies and ACE2. Mutations that affect virus infectivity and immune evasion are likely to be solvent-exposed, as they will then more directly affect the affinity for antibodies or ACE2, e.g., via electrostatic, hydrophobic, or hydrogen bond interactions. As seen from Tables –6, the average exposure per residue of the S-protein tends to be approximately 25−30% (RSA values 0.25–0.30). Considering that the number is an average of approximately 1000 residues, even a difference of 0.01 between two structures (1% difference on average) indicates substantial surface heterogeneity, e.g., 20 residues that change from 0 to 50% exposure. However, we cannot determine which conformation is correct, due to mutations used to stabilize the proteins, the modest resolution and cryo-effects on the conformational dynamics.[54] Still, the total average provides a robust indicator of the surface-specific heterogeneity of the protein structures. 
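The arithmetic behind the statement that a 0.01 difference in average RSA is substantial can be made explicit with a minimal sketch. The per-residue values below are hypothetical placeholders (in practice they would come from a tool such as FreeSASA), and the function name `mean_rsa` is ours.

```python
def mean_rsa(rsa_values):
    """Average relative solvent accessibility over all resolved residues."""
    if not rsa_values:
        raise ValueError("no residues given")
    return sum(rsa_values) / len(rsa_values)


# Hypothetical per-residue RSA values for two 1000-residue structures that are
# identical except that 20 residues go from buried (0.0) to half exposed (0.5).
structure_a = [0.0] * 20 + [0.28] * 980
structure_b = [0.5] * 20 + [0.28] * 980

shift = mean_rsa(structure_b) - mean_rsa(structure_a)
print(f"mean RSA, structure a: {mean_rsa(structure_a):.4f}")
print(f"mean RSA, structure b: {mean_rsa(structure_b):.4f}")
print(f"difference:            {shift:.3f}")
```

Because the average is taken over about 1000 sites, a shift of 0.01 corresponds to a total change of 10 RSA units, here distributed as 20 residues times 0.5.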
We note that for the apo-S-protein (Table ), all average RSA values of 0.29 or larger belong to partly open or active states with one or more RBDs in the upward conformation, whereas closed states tend to have lower values of 0.25–0.28. For comparable structures from the same study, the closed structures have lower RSA than the partially open structures. Thus, the closed structure by Zhou et al.[78] (6XF5) has RSA = 0.27, whereas the structure with one RBD in the upward conformation (1-up; 6XF6) has RSA = 0.28 (Table ). All of the studies that included both closed and partially open states, by Melero et al.,[68] Juraszek et al.,[79] Henderson et al.,[46] Wrobel et al.,[80] and Gobeil et al.,[65] confirm this observation. The RSA thus seems to be a simple and informative descriptor of the conformation state of nearly complete S-protein structures. From the pH-dependent structures of Zhou et al.,[66] lower pH values tend to produce more closed apo-S-protein structures, with pH = 4.0 giving RSA = 0.27 (6XLU) and higher pH giving 0.28 (7JWY, 6XM5). This is consistent with a general recognition of conformational flexibility in the metastable prefusion state.[46,65,68]

ACE2 Binding

The critical step in infection is the fusion of the S-protein with human ACE2, a process that enables the partial opening of the cell membrane and injection of virus genetic material into the cytoplasm.[12,13] Newly arising mutations, in particular on the surface of the S-protein, can affect this crucial interaction and thereby the infection process, with both the alpha and beta variants shown to possess higher ACE2 affinity.[81] Figure a shows the apo-S-protein (i.e., without any antibody or other protein bound) in its primary closed conformation with all three RBDs in the downward conformation (PDB: 7DF3).[82] Prominent naturally occurring mutations are shown in red and orange. Sites of the RBD shown in red harbor mutations of concern either due to enhanced antibody evasion or effects on transmission. Figure b and Figure c show the corresponding structure and mutations of the open state involved in ACE2 binding: Figure b shows the specific conformations of this state, whereas Figure c shows the full context of ACE2 binding, based on the published structure 7KMS by Zhou et al.[67]
Figure 2

SARS-CoV-2 structures and mutations. (a) The S-protein in RBD-down conformation (PDB 7DF3). Trimer and monomer are represented along with natural mutations reported in spike. Red represents natural mutations of concern, and orange represents other natural mutations. (b) RBD up conformation (PDB 7KMS). (c) S-protein bound to ACE2 in RBD-up conformation (PDB 7KMS).

One can envision mutations increasing infectivity either by increasing affinity toward ACE2 or by maintaining higher ACE2 affinity relative to the most important antibodies, possibly by favoring open conformation states that associate more strongly with ACE2 (Figure ). The virus particle’s lifetime in the host may depend on the relative propensity to bind to ACE2 and infect cells vs binding to prominent antibodies. This relative propensity via competitive binding is quantified by the ratio of chemical association constants K_ACE2/K_Ab, corresponding to the difference in binding free energy of the S-protein to ACE2 vs antibodies (Ab). These affinities are in turn determined by the mutations that most strongly change the interaction with ACE2 and the antibodies. Competitive binding is a consideration distinct from either binding to antibodies (host immunity evasion) or binding to ACE2 (host cell fusion) alone, and it plausibly correlates better with virus fitness. Surface mutations may increase binding nonspecifically, and mutations that reduce binding to antibodies (“antibody escape”) can also reduce binding to ACE2, which may leave the overall fitness of the variant unchanged. These considerations indicate why the relative affinity toward ACE2 and antibodies is of importance to structure-based understanding of SARS-CoV-2 transmission. 
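The link between the ratio of association constants and the difference in binding free energies follows from the standard relation ΔG = −RT ln K. The sketch below illustrates this under stated assumptions: a temperature of 310.15 K (body temperature), energies in kcal/mol, and the convention that a positive ΔΔG means weaker binding. The function names are ours, and the numbers in the usage lines are arbitrary illustrations rather than measured affinities.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol K)
T_BODY = 310.15        # assumed temperature in K (37 °C)


def binding_free_energy(k_assoc, temperature=T_BODY):
    """Binding free energy (kcal/mol) from an association constant (1 M standard state)."""
    return -R_KCAL * temperature * math.log(k_assoc)


def selectivity_ratio(dg_ace2, dg_ab, temperature=T_BODY):
    """K_ACE2 / K_Ab computed from the two binding free energies."""
    return math.exp(-(dg_ace2 - dg_ab) / (R_KCAL * temperature))


def selectivity_change(ddg_ace2, ddg_ab, temperature=T_BODY):
    """Fold change in K_ACE2 / K_Ab caused by a mutation.

    ddg_* are mutation-induced changes in binding free energy;
    positive values mean weaker binding (assumed sign convention).
    """
    return math.exp(-(ddg_ace2 - ddg_ab) / (R_KCAL * temperature))


# A mutation that weakens both interactions equally leaves the selectivity unchanged,
print(selectivity_change(0.5, 0.5))
# whereas one that weakens only antibody binding by 1 kcal/mol raises it.
print(selectivity_change(0.0, 1.0))
```

The first usage line reflects the point made above: antibody escape that comes with an equal loss of ACE2 affinity does not change the competitive-binding ratio.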
One may also expect individual variations in, e.g., ACE2 expression and surface composition to contribute to the heterogeneity in susceptibility and transmissibility that plays a central role in the epidemiology of the disease.[83,84] For example, ACE2 expression is age-dependent[85,86] and may correlate with mortality,[87] consistent with lower susceptibility being a function of S-protein-ACE2 complex formation. To understand these interactions, we need to apply the structures of the S-protein bound to ACE2, but also a reasonable structure of the apo-S-protein, to appreciate how the protein structure itself is affected by the binding and how mutations affect the two states. Thus, whereas Table lists properties of the apo-S-protein, Table lists corresponding properties of resolved structures of the S-protein bound to (parts of) ACE2. As already mentioned, the S-protein preferably binds ACE2 in a more open conformation; thus, the RSA values of the ACE2-S-protein complexes are generally of the “open” type, typically 0.28–0.29, with only three exceptions in Table . In general, the resolution of the ACE2 complexes is not as good as for the apo-S-protein, with most structures resolved at >3.5 Å. The notable exceptions are the structures by Yan et al.[63] (2.9–3.6 Å), which also include the 1-up, 2-up, and 3-up conformation states with one or two ACE2 peptidase domains. Also of note, Zhou et al.[67] studied the complexes with 1, 2, and 3 ACE2 molecules bound at two different pH values, 7.4 and 5.5. The RSA values indicate that pH does not affect the ACE2 complexes as much as the apo-S-protein, probably because the ACE2-bound S-protein is always in the open state, regardless of pH.

Structural Basis of Antigenic Drift

Some properties of 80 structures published with antibodies (including nanobodies, etc.) bound to the S-protein are summarized in Table . Antigenic drift can be defined as reduced affinity (typically in the picomolar range) of new arising mutations toward important antibodies.[88,89] New variants may harbor mutations at positions of the S-protein that interact strongly with notable antibodies, including prominent antibodies targeting earlier variants, and these new mutations may reduce the affinity for the antibodies, producing more resistant virus variants.[6,90−92] Understanding this antibody evasion thus requires understanding the (loss of) binding affinity caused by mutation, which has been obtained in several important studies[39,93−95] and is a good structural basis for rationalization of the observed effects. The 80 structures included in Table cover a large range of resolutions and very distinct complexes, and thus offer a wealth of information on the protein–protein interactions of the S-protein. Despite the complexity of these structures, the structures are often of comparable resolution to typical structures of the apo-S-protein (Table ), with many achieving resolutions near 3 Å, which may be considered a benchmark. Some interesting highlights include structures revealing distinct conformation states even for the same antibody, such as P17 in 1-up and 2-up conformations (7CWM/7CWL),[96] which further testifies to the conformational plasticity of the S-protein. Another example is the study by Lv et al.[97] of H014 binding covering both 1, 2, and 3 molecules binding to the S-protein trimer at reasonable resolution, with corresponding 1-up, 2-up, and 3-up conformations, which provides important systematic insight into the effect of binding stoichiometry on the S-protein structure. An escape mutation with regards to one antibody may be captured by another antibody due to distinct antibody surfaces and associated binding modes. 
This is the molecular basis for the promising use of antibody cocktails in vaccines to minimize the potential threat of escape mutations.[93] An illustrative example is the antibody cocktail REGN10987+REGN10933 (casirivimab/imdevimab) combining two antibodies that bind to different parts of the RBD of the S-protein and thus potentially makes antigenic drift more difficult as it requires multiple substitutions.[93] This illustrates well the power of structural biology in providing the basis for interpreting the assay data of the escape mutations and the effect of different antibodies. Figure illustrates some representative structures of the S-protein associated with ACE2 and antibodies, with emphasis on the different conformations achievable. Antibodies are produced by the immune system to encapsulate the virus and target it for destruction by e.g., macrophages, and therefore the binding to ACE2 may involve residues distinct from those interacting with antibodies. Structural alignment of 10 representative ACE2 complexes (Figure a) reveals conformational heterogeneity. Structural alignment of antibody complexes (7A29, Figure b) reveals distinct conformations obtainable even when the same antibody binds. Such heterogeneity is also evident from the different conformational states attained in the strongly bound miniprotein complex 7JZL[98] (Figure c), where two antibodies bound to S-protein attained similar conformations whereas a third attained a different RBD-conformation. Thus, there is substantial conformational variation in the S-protein bound even to the same antibodies, probably also affected by both temperature, molecular crowding, and pH.[67]
Figure 3

Structural comparison of S-protein-ACE2 and antibody complexes. (a) Structural alignment of the ten spike-ACE2 monomers from PDB structures with high resolution (7DX8, 7DX6, 7DX5, 7DX7, 7KJ4, 7DX3, 7KMZ, 7KMS, 7DX9, and 7KJ2) shows mutual RMSD of 0.35–2.87 Å. (b) Spike–antibody complex 7A29. (c) Miniprotein complex 7JZL. Two spike monomers attained similar RBD-up conformations, and one displayed an RBD-down orientation.

Because the resolution is lower in nearly complete cryo-EM structures, it is also of interest to discuss X-ray crystal structures of smaller parts of the S-protein interacting with antibodies, which have been obtained at higher resolution in several cases and thus provide a more precise structural account of the S-protein–antibody interaction. Illustrative examples of such structures of antibodies interacting with the RBD, with resolutions of 2.5 Å and better, are compiled in Table . These structures provide in some cases substantially better detail of individual amino acid conformations, with the resolution of interactions also being improved relative to full cryo-EM structures; however, this comes at the expense of including only parts of the full chemical composition, which may affect precision. For the discussion here, we emphasize enlightening studies of variant effects on RBD interaction with antibodies (Figure ). In one notable study, Supasa et al.[99] reported 1.8 and 2.2 Å resolution structures of RBD binding to antibodies with and without the N501Y mutation at the ACE2 interacting surface. Although the alpha variant does not generally show escape from natural or monoclonal antibodies or vaccines, it is not easily neutralized by some antibodies, and interaction can occur with the antibody light chain at position 501.[99] Interaction of the COVOX-269 antibody with S-protein RBD was shown to be affected by the larger tyrosine side chain at position 501 (Figure a–c).
Figure 4

X-ray crystal structures of S-protein RBD complexed with antibodies. (a) Structural alignment of 7NEH and 7NEG (N501Y), both complexed with COVOX-269. (b) Close view of spike N501 (7NEH) and residues of COVOX-269 interacting with N501 highlighted as sticks. (c) Spike N501Y variant, with Y501 and antibody residues interacting with it shown in sticks. (d) Beta variant and (e) gamma variant complexed with two antibodies COVOX-222 and EY6A. (f) High-resolution of RBD complexed with cross-neutralizing antibody 7D6.

In another insightful study of the beta variant (lineage B.1.351) structure (7NXA; Figure d), three mutations, K417N, E484K, and N501Y, affected binding to the two antibodies COVOX-222 and EY6A, providing a structural basis for the chemical change in interaction and reduced antibody affinity.[100] Similarly, the published structure of the RBD of gamma (P.1) (7NXB; Figure e) comprises the three gamma mutations K417T, E484K, and N501Y, two of which are also present in the beta variant.[100] These mutation sites interact with COVOX-222, whereas EY6A binds to the peripheral region of the RBD (Figure d, 4e). The gamma variant has displayed less resistance to antibodies produced from vaccine or natural infection as compared to beta, indicating possible neutralization effects outside the RBD.[100] As a final example, Figure f represents a high-resolution (1.4 Å) RBD structure complexed with the cross-neutralizing antibody 7D6.[101] This antibody binds to a cryptic site, which is distinct from those typically targeted by other antibodies (e.g., Figure a–e), and accordingly, the RBD attains a different conformation (closed) as compared to the structures shown in Figure a–e. The side-chain conformations in these good-resolution crystal structures are substantially more precisely determined, which makes the structures highly complementary to the lower-resolution but more complete cryo-EM structures. 
Because binding to ACE2 sometimes affects the conformation states differently, and because the interacting residues are not generally the same, the binding energy effect of a mutation is likely to differ for ACE2 and any specific antibody. The most successful variants may not be those that simply increase the affinity for ACE2 or decrease the affinity for prominent antibodies, but those with the largest difference in affinity toward ACE2 and prominent antibodies. Such selectivity is plausibly related to the virion lifetime in the host and could in principle be measured or computed from published affinities of mutants toward antibodies and ACE2,[92−95,102,103] using the structures reviewed in Tables –7 as input for such analysis.

Evolution of the S-protein in a Structural Context

SARS-CoV-2 is the seventh coronavirus known to infect Homo sapiens, the others being the common-cold-causing HCoV-229E, HCoV-NL63, HCoV-HKU1, and HCoV-OC43, as well as the more pathogenic SARS-CoV-1 and MERS-CoV.[104,105] While some of the milder of these are α-coronaviruses and some β-coronaviruses (OC43, HKU1), all three of the most pathogenic viruses are β-coronaviruses with distinct functional properties.[106] Coronaviruses consist of four main structural proteins, with the membrane (M), envelope (E), and nucleocapsid (N) proteins in addition to the S-protein of focus in the present paper.[107] Compared to some other mRNA viruses, which generally have very high mutation rates, coronaviruses tend to evolve several-fold more slowly due to their proofreading machinery,[108] on the order of 10⁻⁴ substitutions per site per year.[105,109] SARS-CoV-2 is currently evolving substantially faster than this (albeit still slowly compared to other nonproofreading mRNA viruses), estimated at 7 × 10⁻⁴ substitutions per site per year (2 × 10⁻⁶ per day),[8] and has seen a remarkable lineage evolution during the pandemic, as expected from its high prevalence and ongoing adaptation to humans.[9] Since the emergence of SARS-CoV-2, estimated to be Autumn 2019, several hundred recurrent mutations have been identified, 80% of these being nonsynonymous changes in the virus proteins,[4] including many in the S-protein.[11] As with the other highly pathogenic coronaviruses MERS-CoV and SARS-CoV,[104] the closest related lineages are found in bats.[4] The virus is subject to both neutral evolution and positive selection, and both the nonsynonymous[4] and synonymous[110] evolution rates have been found to be high. 
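The unit conversion quoted above, and what the rate implies at the genome scale, can be checked with a short sketch. The rate is the estimate cited in the text; the genome length of 29,903 nucleotides (the Wuhan-Hu-1 reference sequence) is an assumption introduced here for illustration and is not taken from this Review, and the function names are ours.

```python
RATE_PER_SITE_PER_YEAR = 7e-4  # estimate quoted in the text
GENOME_LENGTH_NT = 29_903      # assumption: length of the Wuhan-Hu-1 reference genome


def per_day(rate_per_year, days_per_year=365.25):
    """Convert a per-year substitution rate into a per-day rate."""
    return rate_per_year / days_per_year


def expected_substitutions(rate_per_site_per_year, n_sites, years=1.0):
    """Expected number of substitutions accumulated over n_sites in the given time."""
    return rate_per_site_per_year * n_sites * years


print(f"{per_day(RATE_PER_SITE_PER_YEAR):.1e} substitutions per site per day")
print(f"{expected_substitutions(RATE_PER_SITE_PER_YEAR, GENOME_LENGTH_NT):.0f} "
      "substitutions per genome per year")
```

Under these assumptions the per-day value rounds to the 2 × 10⁻⁶ given in the text, and the per-genome figure comes out at roughly twenty substitutions per year.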
We expect SARS-CoV-2 to adapt by accumulating mutations that increase binding to human ACE2, under increased selection pressure from S-protein-based immunity induced by vaccines and therapeutic antibodies, which may lead to mutations that minimize binding to S-protein antibodies frequent in the population.[103,111] Accordingly, bat S-proteins do not have the same affinity for ACE2.[80] These two fitness effects may not work additively; in fact, we could envision many mutations binding better to both proteins, such that the relative specificity for ACE2 vs prominent antibodies would contribute to the fitness. At the same time, overall S-protein stability is probably a restraining parameter on possible evolution, limiting new groups of mutations to those that do not compromise overall conformational stability of the protein in the virion lipid surface, consistent with the constraining role of stability seen more generally in protein evolution.[24,30,33,112,113] After the early D614G mutant[114] quickly became dominant, plausibly due to a modest fitness advantage,[69,115] a period of relative calm[9] existed before a “storm” of new variants emerged during late 2020 and early 2021, with notable examples being alpha (B.1.1.7 lineage, clade 20I/501Y.v1),[116] beta (B.1.351, clade 20H/501Y.v2),[117] gamma (P.1, clade 20J/501Y.v3),[118] and delta (B.1.617.2).[119] These variants are of concern due to the presence of amino acid substitutions such as E484K that are associated with escape from monoclonal antibodies.[120,121] Importantly, whereas the new variants display reduced sensitivity to some vaccines after the first dose, vaccine efficiency remains high toward symptomatic infection after full dosing, and the most important function of the vaccines is to reduce the severity of the disease.[122,123] Representative S-protein structures of variants of SARS-CoV-2 are displayed in Table .
Table 8

Properties of Some Published Structures of Spike Protein Variantsa

protein | PDB | N | chains | % outl. | res (Å) | RSAavg | reference
D614G 1-up | 7KDJ | 979 | 3 | 0.2 | 3.5 | 0.27 | Gobeil et al.[65]
D614G closed | 7KDK | 972 | 3 | 0.0 | 2.8 | 0.26 | Gobeil et al.[65]
D614G 1-up | 7KDL | 979 | 3 | 0.1 | 3.0 | 0.27 | Gobeil et al.[65]
D614G closed | 7KDI | 972 | 3 | 0.1 | 3.3 | 0.27 | Gobeil et al.[65]
D614G 1-up | 7KEC | 979 | 3 | 0.4 | 3.8 | 0.26 | Gobeil et al.[65]
D614G 1-up | 7KEA | 979 | 3 | 0.2 | 3.3 | 0.27 | Gobeil et al.[65]
D614G 1-up | 7KEB | 979 | 3 | 0.4 | 3.5 | 0.26 | Gobeil et al.[65]
D614G 1-up | 7KE9 | 979 | 3 | 0.2 | 3.1 | 0.27 | Gobeil et al.[65]
D614G variant | 6XS6 | 785 | 3 | 0.0 | 3.7 | 0.29 | Yurkovetskiy et al.[69]
D614G variant | 7DX1 | 972 | 3 | 0.4 | 3.1 | 0.29 | Yan et al.[63]
D614G open | 7BNN | 1074 | 3 | 0.0 | 3.5 | 0.30 | Benton et al.[124]
D614G −2up | 7BNO | 1066 | 3 | 0.0 | 4.2 | 0.30 | Benton et al.[124]
Cluster-5 1-up | 7LWM | 997 | 3 | 0.3 | 2.8 | 0.27 | Gobeil et al.[125]
Cluster-5 1-up | 7LWO | 997 | 3 | 0.3 | 2.9 | 0.28 | Gobeil et al.[125]
Cluster-5 2-up | 7LWP | 995 | 3 | 0.4 | 3.0 | 0.28 | Gobeil et al.[125]
Cluster-5 3-d | 7LWI | 1001 | 3 | 0.2 | 3.1 | 0.27 | Gobeil et al.[125]
Cluster-5 3-d | 7LWJ | 1001 | 3 | 0.2 | 3.2 | 0.27 | Gobeil et al.[125]
Cluster-5 3-d | 7LWK | 1001 | 3 | 0.2 | 2.9 | 0.28 | Gobeil et al.[125]
Cluster-5 3-d | 7LWL | 1001 | 3 | 0.2 | 2.8 | 0.27 | Gobeil et al.[125]
alpha/B.1.1.7 1-up | 7LWT | 997 | 3 | 0.4 | 3.2 | 0.28 | Gobeil et al.[125]
alpha/B.1.1.7 1-up | 7LWU | 997 | 3 | 0.2 | 3.2 | 0.28 | Gobeil et al.[125]
alpha/B.1.1.7 1-up | 7LWV | 997 | 3 | 0.2 | 3.1 | 0.28 | Gobeil et al.[125]
alpha/B.1.1.7 3-d | 7LWS | 1000 | 3 | 0.2 | 3.2 | 0.28 | Gobeil et al.[125]
alpha/B.1.1.7 | 7N1X | 1096 | 3 | 0.6 | 4.0 | 0.28 | Cai et al.[127]
alpha/B.1.1.7 | 7N1U | 1075 | 3 | 0.3 | 3.1 | 0.28 | Cai et al.[127]
alpha/B.1.1.7 | 7N1Y | 1089 | 3 | 0.6 | 4.3 | 0.29 | Cai et al.[127]
alpha/B.1.1.7 | 7N1V | 1108 | 3 | 0.3 | 3.2 | 0.27 | Cai et al.[127]
alpha/B.1.1.7 | 7N1W | 1096 | 3 | 0.5 | 3.3 | 0.28 | Cai et al.[127]
beta/B.1.351 | 7N1Q | 1115 | 3 | 0.6 | 2.9 |  | Cai et al.[127]
beta/B.1.351 | 7N1T | 1115 | 3 | 0.7 | 3.1 | 0.27 | Cai et al.[127]
beta/B.1.351 2-up | 7LYK | 996 | 3 | 0.3 | 3.7 | 0.28 | Gobeil et al.[125]
beta closed/3-d | 7LYL | 1001 | 3 | 0.1 | 3.7 | 0.27 | Gobeil et al.[125]
beta 1-up | 7LYN | 999 | 3 | 0.3 | 3.3 | 0.27 | Gobeil et al.[125]
gamma | 7M8K | 1039 | 3 | 0.0 | N/A | 0.28 | Wang et al.[229]
gamma + ACE2 | 7NXC | 596 | 2 | 0.1 | 3.1 | 0.28 | Gobeil et al.[125]
epsilon+S2M11,S2L20 | 7N8H | 1038 | 15 | 0.0 | 2.3 | 0.28 | McCallum et al.[126]
Triple mutant | 7LWW | 998 | 3 | 0.3 | 3.0 | 0.27 | Gobeil et al.[125]

PDB = PDB code; N = number of residues in structure; chains = number of chains in structure; res (Å) = resolution in Å; RSAavg = average solvent exposure of all residues; reference = primary citation; % outl. = outliers of Ramachandran plot in % (from PDB full report).
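As a minimal sketch of the recommendation to use averages over groups of structures rather than single structures, the RSAavg values of the D614G entries with an explicit state label can be grouped by conformation state. The tuples below are transcribed from Table 8; the ambiguously labeled 7BNO entry and the unlabeled "variant" entries are left out, and the function name is ours.

```python
from collections import defaultdict
from statistics import mean

# (PDB code, state label, RSAavg) transcribed from the D614G rows of Table 8
D614G_ROWS = [
    ("7KDJ", "1-up", 0.27), ("7KDK", "closed", 0.26), ("7KDL", "1-up", 0.27),
    ("7KDI", "closed", 0.27), ("7KEC", "1-up", 0.26), ("7KEA", "1-up", 0.27),
    ("7KEB", "1-up", 0.26), ("7KE9", "1-up", 0.27), ("7BNN", "open", 0.30),
]


def mean_rsa_by_state(rows):
    """Group RSAavg values by conformation state and average each group."""
    groups = defaultdict(list)
    for _pdb, state, rsa in rows:
        groups[state].append(rsa)
    return {state: mean(values) for state, values in groups.items()}


for state, value in sorted(mean_rsa_by_state(D614G_ROWS).items()):
    print(f"{state:>6}: {value:.3f}")
```

Grouping in this way makes it easy to see whether a difference between states exceeds the spread within a state before attributing it to the variant itself.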

The D614G structure has been elucidated by Yurkovetskiy et al. (6XS6),[69] Gobeil et al.[65] (7KDK etc.), Yan et al.[63] (7DX1), and Benton et al.[124] (7BNN, 7BNO). Yurkovetskiy et al.[69] reported a single D614G structure in an open conformation state, which was speculated to be suitable for fusion, although the actual binding affinity to ACE2 was lowered. The authors speculated that this conformation change could explain its fixation in the population. The open conformation is consistent with the relatively large RSA (0.29), similar to that of Yan et al.[63] (7DX1, 0.29) and Benton et al.[124] (7BNN, 7BNO, 0.30), and has been supported by molecular dynamics simulations at physiologically relevant temperature.[70] Since then, however, other conformation states have been obtained for D614G. Gobeil et al.[65] have elucidated D614G in several more closed conformation states, with the variant clearly able to form the same closed and open conformation states as the reference early Wuhan variant (structures summarized in Table ). The more closed and 1-up conformations all have RSA values of 0.26–0.27, indicating reduced solvent exposure. Because D614G has been obtained in both open and closed states, and due to the delicate impact of environmental factors on conformation state,[59,67] the tendency toward open conformations could also be affected by the deletion of the furin site and proline mutation, as used by, e.g., Yurkovetskiy et al.[69] Herrera et al. found major heterogeneity in a motif starting at site 614 (614–642), implying its pronounced variability.[45] Because the in vivo protein conformational state is sensitive to composition and molecular environment, it is not possible to deduce that one variant prefers one conformation state over another except when studied under the same conditions with the same protocol. 
In addition to D614G, major achievements in the structural biology of variants of concern were documented during Summer 2021. Notably, Gobeil et al.[125] published both mink-related mutations and the structure of the beta variant in several conformation states (e.g., 7LYN in the 1-up conformation), McCallum et al.[126] published the epsilon variant structure bound to two distinct antibodies (S2M11 and S2L20; 7N8H in Table ), and Cai et al. published structures and antibody escape data for alpha and beta (7N1Q, 7N1T, 7N1U, 7N1V, 7N1W, 7N1X, and 7N1Y).[127] All of these structures are of good resolution with excellent metrics and form an essential basis for understanding SARS-CoV-2 evolution, consistent with this Review’s emphasis on three selection pressures: one to preserve or improve protein stability, one toward enhanced binding to ACE2, and one toward reduced affinity toward antibodies. Each mutation is likely to contribute differently to these three terms and thus to the overall fitness effect. 
Taking the wider perspective of other human-host coronaviruses (Table ) and other spike protein structures (Table ),[128] several mutations in the SARS-CoV-2 RBD have contributed to a stronger affinity toward ACE2.[47] Indeed, by far the most sequence variation between SARS-CoV-1 and SARS-CoV-2 occurs in the S1 domain that includes the RBD.[107,129] In addition, the highly positively charged amino acid sequence RRAR represents a new furin-like cleavage site that is not seen in other related β-coronaviruses.[80,130] Many sequence parts of the RBD of these viruses show similarity both to nonhuman but also human protein motifs, which may help to understand protein–protein interactions with the S-protein more broadly.[131]
Table 9

Properties of Structures of Spike Proteins of Other Human Coronavirusesa

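protein | PDB | N | chains | % outl. | res (Å) | RSAavg | reference
MERS-CoV | 5W9K | 1216 | 12 | 0.6 | 4.6 | 0.29 | Pallesen et al.[36]
 | 5W9I | 1006 | 12 | 0.9 | 3.6 | 0.28 | Pallesen et al.[36]
 | 5X5C | 1141 | 3 | 0.3 | 4.1 | 0.26 | Yuan et al.[37]
 | 5X5F | 1141 | 3 | 0.5 | 4.2 | 0.26 | Yuan et al.[37]
 | 5X59 | 1141 | 3 | 0.3 | 3.7 | 0.27 | Yuan et al.[37]
 | 6Q04 | 1159 | 3 | 0.2 | 2.5 | 0.25 | Park et al.[232]
 | 6Q05 | 1159 | 3 | 0.2 | 2.8 | 0.25 | Park et al.[232]
 | 6Q06 | 1159 | 3 | 0.2 | 2.7 | 0.25 | Park et al.[232]
 | 6Q07 | 1159 | 3 | 0.2 | 2.9 | 0.25 | Park et al.[232]
SARS-CoV | 6CRV | 881 | 3 | 0.1 | 3.2 | 0.28 | Kirchdoerfer[234]
 | 6CRX | 1069 | 3 | 0.1 | 3.9 | 0.28 | Kirchdoerfer[234]
 | 6CRW | 1069 | 3 | 0.1 | 3.9 | 0.27 | Kirchdoerfer[234]
 | 6CRZ | 1071 | 3 | 0.0 | 3.3 | 0.28 | Kirchdoerfer[234]
 | 6CS1 | 1069 | 3 | 0.0 | 4.6 | 0.29 | Kirchdoerfer[234]
 | 6CS0 | 1071 | 3 | 0.0 | 3.8 | 0.28 | Kirchdoerfer[234]
 | 6CS2 | 1092 | 4 | 0.0 | 4.4 | 0.29 | Kirchdoerfer[234]
 | 5X5B | 1054 | 3 | 0.0 | 3.7 | 0.26 | Yuan et al.[37]
 | 5X58 | 1054 | 3 | 0.0 | 3.2 | 0.26 | Yuan et al.[37]
 | 5XLR | 1022 | 3 | 0.0 | 3.8 | 0.28 | Gui et al.[233]
ACE2+SARS-CoV | 6ACK | 1069 | 4 | 0.0 | 4.5 | 0.27 | Song et al.[235]
 | 6ACJ | 1069 | 4 | 0.1 | 4.2 | 0.28 | Song et al.[235]
 | 6ACC | 1065 | 3 | 0.0 | 3.6 | 0.27 | Song et al.[235]
 | 6ACD | 1065 | 3 | 0.0 | 3.9 | 0.28 | Song et al.[235]
 | 6ACG | 1069 | 4 | 0.0 | 5.4 | 0.29 | Song et al.[235]
OC43 | 6NZK | 1175 | 3 | 0.1 | 2.8 | 0.25 | Tortorici et al.[132]
 | 6OHW | 1175 | 3 | 0.1 | 2.9 | 0.26 | Tortorici et al.[132]
229E | 6U7H | 965 | 3 | 0.0 | 3.1 | 0.27 | Li et al.[236]
HKU1 | 5I08 | 958 | 3 | 1.3 | 4.0 | 0.30 | Kirchdoerfer[35]
HKU2 | 6M15 | 965 | 3 | 0.0 | 2.4 | 0.27 | Yu et al.[237]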

PDB = PDB code; N = number of residues in structure; chains = number of chains in structure; res (Å) = resolution in Å; RSAavg = average solvent exposure of all residues; reference = primary citation; % outl. = outliers of Ramachandran plot in % (from PDB full report).

Table 10

Properties of Some Structures of Spike Proteins of Other Virusesa

protein | PDB | N | chains | % outl. | res (Å) | RSAavg | reference
porcine SADS-CoV | 6M16 | 965 | 3 | 0.0 | 2.8 | 0.28 | Yu et al.[237]
porcine SADS-CoV | 6M39 | 937 | 3 | 0.0 | 3.6 | 0.26 | Guan et al.[246]
porcine PDCoV | 6B7N | 966 | 3 | 0.1 | 3.3 | 0.25 | Shang et al.[247]
porcine PDCoV | 6BFU | 964 | 3 | 0.1 | 3.5 | 0.24 | Xiong et al.[248]
PEDV | 6VV5 | 1097 | 3 | 0.2 | 3.5 | 0.27 | Kirchdoerfer[249]
PEDV | 6U7K | 1064 | 3 | 0.1 | 3.1 | 0.26 | Wrapp et al.[250]
avian bronchitis | 6CV0 | 993 | 3 | 0.0 | 3.9 | 0.29 | Shang et al.[251]
bat RaTG13 | 7CN4 | 1120 | 3 | 0.1 | 2.9 | 0.26 | Zhang et al.[252]
pangolin PCoV_GX | 7CN8 | 1125 | 3 | 0.1 | 2.5 | 0.25 | Zhang et al.[252]
bat virus RaTG13 | 6ZGF | 1060 | 3 | 0.0 | 3.1 | 0.27 | Wrobel et al.[80]
mouse (MHV) | 6VSJ | 1122 | 6 | 0.0 | 3.9 | 0.27 | Shang et al.[253]
mouse (MHV) | 3JCL | 1067 | 3 | 0.4 | 4.0 | 0.27 | Walls et al.[254]
FIPV | 6JX7 | 1245 | 3 | 0.0 | 3.3 | 0.29 | Yang et al.[255]
Guangdong pango | 7BBH | 1063 | 3 | 0.0 | 2.9 | 0.27 | Wrobel et al.[256]

PDB = PDB code; N = number of residues in structure; chains = number of chains in structure; res (Å) = resolution in Å; RSAavg = average solvent exposure of all residues; reference = primary citation; % outl. = outliers of Ramachandran plot in % (from PDB full report).

Whereas the S-protein structures of MERS-CoV and SARS-CoV-1 display RBD-up conformations as typically associated with strong ACE2 binding by SARS-CoV-2, these ACE2-adapting conformations were not seen in the S-protein structures from HCoV-OC43 and HCoV-HKU1.[132] We note that the two latter coronaviruses mainly use 9-O-acetylated sialic acids for fusion,[107] which could possibly relate to this observation. The structure 6NZK in Table provides a structural basis for this type of interaction.[132] Figure shows the electrostatic potential maps of the S-protein RBD of the three most pathogenic coronaviruses, SARS-CoV-2 (Figure a), SARS-CoV-1 (Figure b), and MERS-CoV (Figure c), indicating clearly different electrostatic patterns on the surfaces. Electrostatic interactions are dominant interactions in proteins[133−137] and thus particularly likely to give rise to non-neutral functional effects. Whereas MERS-CoV has substantial buried positive charge and less positive charge on the surface, the SARS-CoV-1 and SARS-CoV-2 S-protein RBDs both have considerable positive charge on the surface. This may perhaps relate to the MERS-CoV S-protein interacting with the distinct receptor dipeptidyl peptidase 4,[138] whereas SARS-CoV-1 and SARS-CoV-2 preferably use ACE2.[13,129,139,140]
Figure 5

Electrostatic potential surface comparisons of the S1-CTD/RBD domain of three coronaviruses: (a) SARS CoV-2, (b) SARS CoV, and (c) MERS CoV. The blue colored surface reflects excess of positive charge, whereas the red colored surface (negative potential) indicates negative charge surplus.

As shown in Figure , electrostatic potential analysis indicates a high negative charge potential on the surface of ACE2 pointing toward the area interacting with the S-protein. Considering that positive charge introductions in prominent mutations of concern occur on the exact interface of this interaction, it is plausible that this will generate favorable electrostatic interactions between the RBD of the S-protein and ACE2, which could enhance the ability of the virus to fuse with the host cell. More studies into the electrostatic modulation of the S-protein surface therefore seem warranted.
Figure 6

ACE2-spike complex of SARS-CoV-2 (PDB code 7KMS). RBD region 334–527 is shown, together with electrostatic potential maps of ACE2 and RBD; high negative potential (red) on the ACE2 may interact with positive charge on RBD and affect SARS-CoV-2 fusion.


Natural Mutations of Concern

In the following, we take a closer look at prominent natural mutations, with relevant structures summarized in Table . Variants of interest and concern, summarized in Table , harbor mutations that are likely to increase transmission, e.g., by causing the S-protein to bind more strongly to ACE2, or evading the binding of human antibodies or vaccines.[103,111,141,142] Among these mutations, deletions have been associated with substantial escape tendency.[6] Most mutations of concern are, however, missense substitutions where one amino acid has been changed for another.[4,11]
Table 11

Notable SARS-CoV-2 Variants and Their S-Protein Mutations

WHO name | Pango lineage name | transmission potential | escape mutations | S-protein mutations
alpha | B.1.1.7 | normal[116]; higher ACE2 affinity[81] |  | 69del, 70del, 144del, N501Y, A570D, D614G, P681H, T716I, S982A, D1118H
beta | B.1.351 | high[257]; higher ACE2 affinity[81] | E484K,[38,150] K417N,[91,120] full variant[120,258] | L18F, D80A, D215G, K417N, E484K, N501Y, D614G, A701V
gamma | P.1 | high[118] | E484K,[38,150] K417T, full variant[229] | L18F, T20N, P26S, D138Y, R190S, K417N, K417T, E484K, N501Y, D614G, H655Y, T1027I, V1176F
delta | B.1.617.2 | high[259,260] | L452R,[126,148] P681R,[94] full variant[261] | T19R, E156del, F157del, R158G, L452R, T478K, D614G, P681R, D950N
epsilon | B.1.427, B.1.429 | normal (+20%)[262] | L452R,[126,148] S13I/W152C,[126] full variant[126,263] | S13I, W152C, L452R, D614G
eta | B.1.525 |  | E484K[38,150] | A67V, 69del, 70del, 144del, E484K, D614G, Q677H, F888L
iota | B.1.526 |  | E484K[38,150] | L5F, T95I, D253G, S477N, E484K, D614G, A701V
kappa | B.1.617.1 |  | L452R,[126,148] E484Q,[264] full variant[265] | G142D, E154K, L452R, E484Q, D614G, P681R, Q1071H
Some properties of mutations in the main natural variants are listed in Figure a, including the change in polarity (measured by the Grantham scale[143]) of the amino acid, the change in side-chain volume, ΔV, and the RSA average of the site from eight structures (7LXY, 6XM0, 6ZB4, 7DDD, 6ZGE, 6VXX, 6VYB, and 7DWY). These properties represent three of the five important patterns needed to quantify protein-difference metrics and substitution tendencies (considering hydrophobicity and polarity inversely related, the remaining being secondary structure and codon use),[144] and are important for the general fold stability of protein structures.[145] As seen, the polarity and volume changes tend to be randomly affected, as expected for neutral mutations with no chemical property under selection. On average, the mutations tend to occur in sites of typical exposure (∼0.25–0.30, Table ). However, the mutation sites of concern, such as sites E484 and P681, are more exposed than typical mutated sites, consistent with selection for interaction with other proteins.
Figure 7

Chemical properties of prominent natural S-protein mutations. (a) Change in polarity/hydrophobicity (ΔH, normalized Grantham polarity scale), amino acid side chain volume change (ΔV), and relative solvent accessibility of mutated site (RSA). (b) Normalized Grantham[143] polarity change vs side chain volume change for prominent natural S-protein mutations. (c) Solvent accessibility (FreeSASA/Naccess[75,76]) vs computationally estimated stability effect (Simba;[145] in kcal/mol) of prominent natural mutations. Large blue spheres represent the average values for all possible mutations in the S-protein, calculated based on the 7DWY structure.

Figure b displays the change in polarity and volume of the same natural mutations, indicating that they spread quite randomly around the average change of all possible mutations in the protein (these properties are obtained from the simple amino acids and are thus not structure-dependent). We note that, except for E484K and E484Q, mutations of concern tend to be farther away from the centroid, i.e. have larger changes in volume and polarity than average, consistent with a non-neutral chemical effect on protein function, e.g., association with ACE2 and antibodies. D614G is itself an unusual mutation involving both a change in charge and introduction of glycine (the smallest and sometimes structure-breaking amino acid), yet it became fixed early and is probably associated with an evolutionary advantage. N501Y, which is characteristic of the alpha, beta, and gamma variants, changes a small polar asparagine to an aromatic tyrosine, which increases binding to ACE2.[102] Changes to arginine (P681R, L452R) are also conspicuous.
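The polarity and volume changes discussed here depend only on the two amino acids involved, so they can be tabulated directly. The following is a minimal sketch, assuming the raw (unnormalized) polarity and volume values of Grantham's 1974 table for the residues involved; the figure uses a normalized polarity scale, so the numbers below are illustrative of the ranking only. The names `GRANTHAM` and `property_change` are hypothetical helpers, not part of any published tool.

```python
# Assumption: raw polarity and side-chain volume values from Grantham (1974)
# for the residues needed below; verify against the original table before reuse.
GRANTHAM = {
    # residue: (polarity, volume)
    "N": (11.6, 56.0),
    "Y": (6.2, 136.0),
    "E": (12.3, 83.0),
    "K": (11.3, 119.0),
    "Q": (10.5, 85.0),
    "L": (4.9, 111.0),
    "R": (10.5, 124.0),
    "D": (13.0, 54.0),
    "G": (9.0, 3.0),
    "P": (8.0, 32.5),
}


def property_change(mutation):
    """Return (delta polarity, delta volume) for a substitution such as 'N501Y'."""
    wild_type, mutant = mutation[0], mutation[-1]
    p0, v0 = GRANTHAM[wild_type]
    p1, v1 = GRANTHAM[mutant]
    return p1 - p0, v1 - v0


if __name__ == "__main__":
    for m in ["N501Y", "E484K", "E484Q", "L452R", "D614G", "P681R"]:
        dp, dv = property_change(m)
        print(f"{m}: dP = {dp:+.1f}, dV = {dv:+.1f}")
```

Under these assumed values, E484K and E484Q show the smallest polarity changes of the six, in line with the observation that their main effect is a change in charge rather than in polarity or size.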
Larger chemical changes are typically less likely to occur in protein evolution and are more often associated with non-neutral evolution.[146,147] Indeed, L452R is a known antibody escape mutation.[126,148] In contrast, mutations with typical changes in chemical properties that are not very exposed are expected (but not guaranteed) to be neutral. Mutations E484K and E484Q, which do not drastically change polarity and size, instead change the full charge of a very exposed site. Many of the most prominent mutations change the charge of exposed residues in the S-protein. Of the 33 mutations listed in Figure a, 18 change the charge further than already shown for the reference genome S-protein: D80A, D138Y, G142D, E154K, R190S, D215G, R246I, K417N/K417T, N439K, N440K, L452R, T478K, E484K/E484Q, A570D, D614G, and P681R. Twelve of these change the surface charge toward more positive, most notably the early D614G substitution, introduction of lysine (K) at positions 439 or 440 (N439K, N440K), gain of arginine at position 452 (L452R), and loss of glutamate at position 484 (E484K/E484Q), whereas the remaining 6 contribute to more negative surface charge. Functional effects have been seen for three of these positive charge introductions: N439K,[149] E484K,[38,150] and L452R.[126,148] The delta variant has two positive charge introductions in the RBD, L452R and T478K.[8] Interestingly, mutations that reduce positive charge at position K417 (K417N/T) tend to reduce ACE2 affinity;[102,149] together, these data could suggest that positive charge in the RBD is a main adaptation to human ACE2 interaction. We can also estimate the impact on the S-protein stability using the structures available and computational methods.
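The tally of charge-changing mutations can be checked with a few lines of code. This is a minimal sketch, assuming formal side-chain charges at neutral pH (Lys and Arg +1, Asp and Glu −1) and treating histidine as neutral; `CHARGE` and `charge_change` are hypothetical names introduced only for illustration.

```python
# Assumption: formal side-chain charges at neutral pH; histidine counted as neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

# The 18 charge-changing mutations named in the text
# (K417N/K417T and E484K/E484Q written out separately).
CHARGE_CHANGING = [
    "D80A", "D138Y", "G142D", "E154K", "R190S", "D215G", "R246I",
    "K417N", "K417T", "N439K", "N440K", "L452R", "T478K",
    "E484K", "E484Q", "A570D", "D614G", "P681R",
]


def charge_change(mutation):
    """Net change in formal charge for a substitution such as 'E484K'."""
    wild_type, mutant = mutation[0], mutation[-1]
    return CHARGE.get(mutant, 0) - CHARGE.get(wild_type, 0)


more_positive = [m for m in CHARGE_CHANGING if charge_change(m) > 0]
more_negative = [m for m in CHARGE_CHANGING if charge_change(m) < 0]

if __name__ == "__main__":
    print("toward positive:", len(more_positive), more_positive)
    print("toward negative:", len(more_negative), more_negative)
```

With these assumptions, the split is 12 mutations toward more positive charge and 6 toward more negative charge, matching the counts given above. E484K is the only substitution in the list with a net change of two charge units at a single site other than E154K.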
Fold stability (free energy of folding the protein) is the main trade-off in protein evolution of new functionality[30,31,113,151−153] and also plays a role in many diseases driven by point mutations.[154−156] Such calculations should be of substantial interest, as they may provide information on whether fold stability of the S-protein is a desirable trait in the human host or whether it is relatively easily traded for improved function (ACE2 interaction, antibody evasion). Many programs exist for this purpose, using different machine-learning or energy-based force fields.[157−165] To illustrate this point, we computed the change in folding free energy (in kcal/mol) estimated by the recently developed method Simba.[145] All these methods are inherently subject to limitations,[166−169] so these results should only be considered estimates; however, in contrast to many machine-learning methods, Simba has the advantage of interpretable residue-specific chemical contributions to stability, at generally similar accuracy.[145] These free energy changes are summarized in Figure c plotted against the RSA of each mutated site. The mutations in variants of concern are notably more stabilizing and notably more solvent-exposed than the “average” mutation in the S-protein. It is often expected that evolution will be faster on the surface due to less functional selection pressure and also smaller risk of disturbing the protein structure.[77,170,171] Accordingly, despite the functionality manifesting at the surface of the S-protein, the relation in Figure c is not surprising by itself. However, it is remarkable and a possible signature of positive selection that so many of the prominent natural mutations are in the upper right rather than lower left on this trend line, i.e. that they tend to be more solvent exposed and impair S-protein stability less than expected for a random mutation (estimated −1.2 kcal/mol), with several mutations being predicted to be stabilizing, although such considerations need more detailed analysis of evolution rates. Recent nanomechanical stress studies have indicated that the S-protein has evolved enhanced robustness.[172]
A particularly concerning mutation in the spike protein, E484K, seen in, e.g., beta (B.1.351) and gamma (P.1),[117] is known to evade antibodies[39] and has been reported to have about 10-fold higher escape than the normal variant.[120] Immune plasma from individuals infected with early variants of the virus may not be equally effective against infection with virus variants harboring the E484K mutation, so that new antibody cocktails are desirable.[39] The surface potential of the reference variant is already markedly more positive than that of SARS-CoV and MERS-CoV (Figure ) and has been supplemented by additional charge addition with E484K, which could suggest an evolutionary advantage of positive surface charge relating to ACE2 interaction and antibody escape. Because full charges contribute the largest electrostatic effects on interaction energies,[134,135] such charge changes may also be expected to produce large perturbations of the interaction with antibodies. Other exposed charge-changing mutations of note include K417T and K417N seen in gamma (P.1) and beta (B.1.351), respectively.
K417N, like E484K, reduces the binding of the antibody families IGHV3-53/3-66 and IGHV1-2.[91] N439K, a relatively common mutation, displays reduced binding of some antibodies,[148,149] as does L452R seen in lineage B.1.617 (which includes the sublineage delta/B.1.617.2) and B.1.429 (epsilon).[88,126,148]
Finally, it is important to stress that the effects of single amino acid changes, as analyzed above and in the single mutant scan assays for ACE2 binding and antibody binding,[88,94] are not expected to be completely linear, due to amino acid correlation effects (within-protein epistasis), and possibly epistasis between genes.[173−176] This point is very important in the context of SARS-CoV-2 variants.[9] The currently circulating variants are all multisite mutants, i.e. the virulence, transmission, or antibody evasion cannot be assumed to be simply a linear function of the individual amino acid effects. Influenza evolution has been shown to be highly dependent on epistasis in relation to maintaining protein stability,[177] and such structure–function trade-offs could be speculated to also cause the apparent avoidance of destabilization of many SARS-CoV-2 mutations in Figure c. This should be remembered in the context of Table , where some data are available both for the full variant and for individual mutations.
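The non-additivity described above can be stated compactly. As a sketch, in notation introduced here only for illustration (the symbols are assumptions and do not come from the cited studies), let ΔΔG_A and ΔΔG_B be the effects of two single mutations A and B on some measured property, e.g., the ACE2 binding free energy, and ΔΔG_AB the effect of the double mutant:

```latex
\Delta\Delta G_{AB} = \Delta\Delta G_{A} + \Delta\Delta G_{B} + \varepsilon_{AB}
```

Here ε_AB is the pairwise epistatic term. A purely additive (linear) model corresponds to ε_AB = 0, and a nonzero ε_AB is the reason that full-variant data cannot be reconstructed reliably from single-mutation data alone. For variants with more than two mutations, higher-order terms may contribute as well.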

Structural Effects on Mutation Analysis

In the analysis in Figure , we used averages of eight structures (7LXY, 6XM0, 6ZB4, 7DDD, 6ZGE, 6VXX, 6VYB, and 7DWY) to calculate the RSA and stability effects. The reason is that most structural heterogeneity in the S-protein structures tends to occur on the surface of the proteins, which is generally less ordered than the buried parts of the protein, and structural heterogeneity is known to affect protein energies.[167,178] This is directly seen in the different RSA of the same residues in some reported structures. Because the local environment of the residue is very important for deducing its properties and role, this concern is relevant to any theoretical analysis of mutation structure–function relationships. To further quantify the structural heterogeneity of the S-protein surface sites, we calculated the RSA of the 33 mutation sites discussed above, using ten different apo-S-protein structures (6XM0, 6VYB, 7DDD, 6ZGE, 6VXX, 7DWY, 6X6P, 6ZOX, 7CAB, and 6XF5) produced by different groups (Figure a; note that not all sites are available in all the structures, and S13 and P681 are not in any of these structures). The standard deviation in RSA was 0.12, i.e. we expect the solvent exposure of a specific site to have approximately 10% uncertainty on average on the scale of 0–100% exposure, although larger variations commonly occur. For example, the site D614, which became fixed with the glycine substitution early on, is notoriously heterogeneous, with a range of 0.7 RSA, not due to outliers but with four structures in the range 0–0.1 and four structures >0.4. This major site heterogeneity is consistent with previous analysis by Herrera et al. that found substantial conformational variation starting at position 614[45] and with the evident conformational differences between published D614G structures[65,69,70,124] (Table ).
Figure 8

Structural heterogeneity in apo-S-protein structures. (a) RSA comparison of 33 sites known to mutate in different structures. (b) Structural alignment of 10 apo-S-proteins (6XM0, 6VYB, 7DDD, 6ZGE, 6VXX, 7DWY, 6X6P, 6ZOX, 7CAB, and 6XF5) with mutual RMSD values in the range 0.5–3.6 Å.

As another example of site heterogeneity, the important site K417 (with mutations K417N and K417T, which reduce antibody binding[91]) has an RSA of only 0.01 in 7DDD and 0.07 in 7DWY (essentially a buried site), but 0.33 in 6XM0 and 0.41 in 6VYB, despite all these structures being of the same chemical composition as apo-S proteins. Analysis that draws conclusions based on the local environment of these important sites, e.g., computational estimates of ACE2-binding and stability, will be very sensitive to this heterogeneity. Larger heterogeneity is also seen in the N-terminal part of the S-protein, such as D80 (D80A) and D138 (D138Y), which are highly exposed in 6XM0 and 6VXX (RSA = 0.52–0.74) but almost fully buried in 7DDD (0.03–0.17) and 6ZGE (0.11–0.13). Part of this is due to the different open and closed conformation states they represent, but a large part is what can be considered experimental noise, since 6VXX, 6XM0, and 6VXX all represent closed conformation states. Because of the large heterogeneity of some sites and the known conformational sensitivity of the protein,[41,67] it is strongly recommended to account for this heterogeneity for the mutation of interest before deducing its potential effects. The corresponding standard deviation in the estimated free energy effect of each mutation was 0.2 kcal/mol, which gives a more chemical quantification of the structural heterogeneity (many structural variations occur in disordered low-energy modes which may overemphasize differences). This variation does not change the overall conclusions from Figure c that most mutations of concern tend to be solvent-exposed and probably less destabilizing than typical random mutations in the protein.
However, the structural heterogeneity of published cryoelectron microscopy structures reflects the real conformational plasticity of the protein as a functional requirement for its conversion from the metastable prefusion state.[34] Conformational plasticity in the prefusion state may even confer evolutionary advantage by providing dynamic variation of the epitopes accessible to antibodies; although this remains to be investigated, such “conformational masking” has been seen in relation to other viruses, e.g., HIV.[179] As a further indication of structural heterogeneity, we aligned 10 apo-S-protein structures (6XM0, 6VYB, 7DDD, 6ZGE, 6VXX, 7DWY, 6X6P, 6ZOX, 7CAB, and 6XF5). The structures differed in their relative RMSDs from 0.5 to 3.6 Å, ranging from close resemblance to substantial differences (Figure b). Thus, the overall structural heterogeneity is as large for the apo-S-structures as for the different ACE2 complexes (Figure a), i.e. the higher plasticity in the apo-state produces as much heterogeneity as the different conformations induced by ACE2 binding. We conclude that subtle effects in the laboratory may lead to distinct local conformations even for the same protein states, consistent with the conformational sensitivity implied by temperature- and pH-dependent effects;[59,67] on top of this come additional cooling effects on the structure and dynamics that may underemphasize entropic states of lipid-embedded proteins.[54]
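The per-site averaging recommended in this section amounts to a few summary statistics over structures. The following is a minimal sketch using only the four K417 RSA values quoted above; the function name `summarize` is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

# RSA of site K417 in four apo-S-protein structures, as quoted in the text.
rsa_k417 = {"7DDD": 0.01, "7DWY": 0.07, "6XM0": 0.33, "6VYB": 0.41}


def summarize(rsa_by_structure):
    """Mean, sample standard deviation, and range of RSA across structures."""
    values = list(rsa_by_structure.values())
    return {
        "mean": mean(values),
        "sd": stdev(values),  # assumption: sample SD (n - 1 denominator)
        "range": max(values) - min(values),
    }


summary = summarize(rsa_k417)

if __name__ == "__main__":
    for key, value in summary.items():
        print(f"{key}: {value:.3f}")
```

For these four structures the mean RSA is about 0.2, whereas any single structure would classify the site as either essentially buried or moderately exposed. The spread for this site is also larger than the 0.12 standard deviation reported across all sites, which illustrates why a single structure can be misleading for individual residues.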

Conclusions and Future Perspectives

SARS-CoV-2 is likely to become endemic and evolve into a milder virus due to intense efforts in clinical treatments and vaccination, yet the continued protein evolution, especially of the S-protein, will remain a concern to be monitored and countered.[180−182] The structural biology of the S-protein is a central cornerstone of this future, serving as the structural basis for rationalizing and predicting impacts of new mutations and treatments against them. We have summarized the current state of the art of structures of the S-protein of SARS-CoV-2, emphasizing the importance of completeness of the structures, their resolution, the conformations they represent, and the structural heterogeneity and plasticity especially of functionally important surface residues and its relation to evolutionary and functional analysis. While many of the published structures are of very good (∼3 Å) resolution allowing secondary structure to be well-accounted for, surface residue conformations are not generally resolved at 3 Å and this leads to substantial structural heterogeneity in the surface sites, which unfortunately are the sites of main functional interest in the S-protein. Even for protein structures supposedly representing the same conformational state, individual mutation sites can be situated in quite different environments, with solvent exposure varying up to 50% in some cases. These differences resonate with an overall tendency of conformational plasticity in the S-protein,[46,82] being affected also by temperature and pH.[59,67] Because of this heterogeneity, we recommend using averages of a group of high-quality protein structures rather than a single structure before reaching conclusions on specific structure–function relationships. 
The most important topic for the future of SARS-CoV-2 management is arguably to predict new virus evolution and counter it by, e.g., annual mutation-optimized boosters, to minimize mortality in the endemic state as is also attempted in many countries with influenza.[183−186] A large part of the antigenic drift in the endemic state is also likely to occur in the S-protein, making the structures of the S-protein and their mutations critical to a rational prediction and understanding of this evolution.[180] We expect SARS-CoV-2 to adapt by accumulating mutations that optimize binding to human ACE2 under selection from S-protein-based immunity induced by vaccines and therapeutic antibodies, which favors mutations that minimize binding to S-protein antibodies frequent in the population.[103,111] Indeed, the structures suggest that large chemical changes, notably affecting exposed surface charges, are recurring in the mutations of concern, possibly related to non-neutral effects on ACE2 or antibody interaction. These two fitness contributions may not work additively; many mutations may bind better to both proteins, such that the relative specificity for ACE2 vs prominent antibodies would determine fitness. We also expect overall protein stability to be a constraint on this evolution: mutations of the S-protein may be less destabilizing than randomly expected, consistent with a constraining selection pressure on the S-protein to not lose overall stability while adapting to the human host. These various selection pressures can be used to predict the part of the virus evolution that is adaptive, including antigenic drift, but not the random, neutral drift. 
We envision that the many experimental data on mutation effects on ACE2 binding and antibody escape and thermodynamic stability estimates of proteins (experimental and computational) will be used together with features of the mutation in its structural context to train computer models that can rationally predict fitness effects of newly arising mutations in the S-protein. The analysis above suggests that mutations in surface-exposed sites with larger chemical effects, typically volume- or charge-changing, will drive the adaptive evolution, although precise predictive models need to be semiquantitative. The heroic efforts in elucidating S-protein structures for all of the main conformation states bound to diverse antibodies and ACE2 will provide an unprecedented basis for analyzing the functional evolution of SARS-CoV-2 in its appropriate structural context.