
Structure and Mutations of SARS-CoV-2 Spike Protein: A Focused Overview.

Abstract

The spike protein (S-protein) of SARS-CoV-2, the protein that enables the virus to infect human cells, is the basis for many vaccines and a hotspot of concerning virus evolution. Here, we discuss the outstanding progress in structural characterization of the S-protein and how these structures facilitate analysis of virus function and evolution. We emphasize that reported structures differ and that analysis of structure–function relationships is sensitive to the structure used. We show that the average residue solvent exposure in nearly complete structures is a good descriptor of open vs closed conformational states. Because functionally important surface-exposed residues are structurally heterogeneous, we recommend using averages over a group of high-quality protein structures rather than a single structure before reaching conclusions on specific structure–function relationships. To illustrate these points, we analyze some significant chemical tendencies of prominent S-protein mutations in the context of the available structures. In discussing new variants, we emphasize the selectivity of binding to ACE2 vs prominent antibodies rather than antibody escape or ACE2 affinity considered separately. We note that larger chemical changes, in particular increased electrostatic charge or side-chain volume of exposed surface residues, recur in mutations of concern, plausibly related to adaptation to the negative surface potential of human ACE2. We also find indications that the fixed mutations of the S-protein in the main variants are less destabilizing than would be expected on average, possibly pointing toward a selection pressure on the S-protein. The richness of available structures for all of these situations provides an enormously valuable basis for future research into these structure–function relationships.

Keywords: SARS-CoV-2; antibody escape; mutation; spike protein; structural biology

Year: 2021 PMID: 34856799 PMCID: PMC8673470 DOI: 10.1021/acsinfecdis.1c00433

Source DB: PubMed Journal: ACS Infect Dis ISSN: 2373-8227 Impact factor: 5.084

Introduction

Since the beginning of the pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2),[1−3] the evolution of the virus has been of increasing concern because of the risk of more infectious, more lethal, or antibody- or vaccine-resistant variants.[4−9] Of central interest is the spike glycoprotein (S-protein), the heavily glycosylated homotrimeric protein on the surface of coronaviruses that gives them their characteristic appearance.[9−11] This protein binds the human cell-surface receptor angiotensin-converting enzyme 2 (ACE2) to promote cell entry.[12−14] Several major vaccines work by presenting the S-protein to the human immune system.[15,16] Given its central and urgent importance, it is not surprising that many structures of the S-protein have been published. Their number and the speed at which they have arrived document an outstanding achievement in modern structural biology. A recent database provides an excellent overview for analysis of these structures.[17] Many of these structures would not have been possible without the recent technical breakthroughs in cryoelectron microscopy of macromolecules,[18−21] a contribution that deserves due credit in the discussion below. Structures have been solved for most relevant conformational states of the receptor binding domain (RBD), with or without (parts of) ACE2 or antibodies bound, typically at resolutions of 2–4 Å. While sequence-based evolution studies are very helpful and informative, evolution occurs in the context of a folded protein structure.[22−26] Because function depends on structure, the structure is the platform on which amino acid evolution occurs, shaping the extent of both positive and neutral evolution.
Accordingly, evolution rates tend to depend on the structural context of the evolving site, e.g., its solvent exposure.[27−29] Of particular importance is that a new mutation must not compromise the overall fold stability of the protein; new mutations therefore often arise under a trade-off that avoids undermining stability.[24,30−33] This trade-off may underlie the metastability of the prefusion S-protein, which is under continuous selection for new mutations that evade antibodies and enhance ACE2 binding.[34] The structural context will aid the understanding of the evolution and function of the S-protein in relation to immune evasion, the development of treatments such as antibodies, and infection via ACE2 binding. S-protein structures from other viruses, notably other human coronaviruses,[35−37] can further strengthen such analysis. In this Review, we aim to facilitate the use of this wealth of structural information by providing an extensive, data-focused critical overview of more than 200 published structures of the S-protein, including the context of concerning mutations in newly emerging variants that may affect ACE2 binding (and thus increase transmission) or reduce antibody binding (and thus impair the efficacy of vaccines).[5,38−40]

Overview of Spike Protein Structures

During natural fusion with the host cell after ACE2 binding, the S-protein is cleaved and irreversibly changes into a more stable, open postfusion conformation (Figure 1a).[41] The prefusion S-protein is a type-I membrane protein anchored in the viral lipid membrane. Its metastable closed state needs stabilization, both upon fusion with the host cell and in the lab for characterization of the protein. Strategies employed to this end include mutation to proline in S2 (K986P and V987P), mutation of the furin cleavage site, stabilization of the C-terminal and transmembrane domains by mutation,[34,42] and engineered cystine (disulfide) bridges that stabilize the prefusion trimer.[43] Strategies to improve stabilization and expression of the S-protein continue to progress.[44,45]
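The substitutions above follow the point-mutation notation used throughout this Review: wild-type residue, sequence position, substituted residue (K986P replaces lysine 986 with proline). The Python below is a minimal sketch that assumes plain one-letter sequences with 1-based numbering. It parses such labels and applies them to a sequence, but only after checking that the wild-type residue matches. The function names and the five-residue toy sequence are hypothetical illustrations, not part of any tool used for this Review.

```python
import re
from dataclasses import dataclass

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")
# Wild-type residue, 1-based position, substituted residue, e.g. "K986P".
MUTATION_PATTERN = re.compile(r"([A-Z])(\d+)([A-Z])")


@dataclass(frozen=True)
class PointMutation:
    wild_type: str
    position: int  # 1-based position in the reference sequence
    mutant: str


def parse_mutation(label: str) -> PointMutation:
    """Split a label such as 'K986P' into wild type, position, and mutant."""
    match = MUTATION_PATTERN.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if wild_type not in AMINO_ACIDS or mutant not in AMINO_ACIDS:
        raise ValueError(f"non-standard amino acid code in {label!r}")
    return PointMutation(wild_type, position, mutant)


def apply_mutations(sequence: str, labels: list[str]) -> str:
    """Return a mutated copy of `sequence`, checking each wild-type residue first."""
    residues = list(sequence)
    for label in labels:
        mutation = parse_mutation(label)
        index = mutation.position - 1  # convert 1-based numbering to a list index
        if not 0 <= index < len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[index] != mutation.wild_type:
            raise ValueError(
                f"{label}: expected {mutation.wild_type} but found {residues[index]}"
            )
        residues[index] = mutation.mutant
    return "".join(residues)


# Usage on a hypothetical toy sequence: two consecutive proline substitutions,
# analogous in form to the K986P/V987P stabilizing pair.
print(apply_mutations("MKVLA", ["K2P", "V3P"]))
```

The wild-type check catches numbering mismatches early. Such mismatches matter in practice because residue numbering in deposited structures does not always follow the reference sequence.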

Figure 1

Structural basis for S-protein fusion with host cells. (a) Prefusion closed state (PDB: 7DF3). ACE2 binding occurs via the open state, with RBDs in upward conformations (PDB: 7BNN). (b) Spike monomer cleavage sites: the first furin S1/S2 cleavage site is shown with yellow balls (R685–S686; the fragment is missing in the structure) and the second S2′ cleavage site in magenta (R815–S816). The S1 N-terminal domain (NTD) is followed by the S1 C-terminal domain (S1-CTD), which comprises the receptor binding domain (RBD) region. The second cleavage (S2′) forms a fusion peptide upon ACE2 binding. Heptad repeats (HR1 and HR2) are located toward the C-terminus.

The S-protein covers the virus lipid surface as a trimer. Each monomer contains the subunit S1, with the N-terminal domain (NTD), RBD, and subdomains 1 and 2 (SD1 and SD2), and the subunit S2, with heptad repeat 1 (HR1), the central helix (CH), a connecting domain (CD), heptad repeat 2 (HR2), a transmembrane domain (TM), and the C-terminal part of the protein (Figure 1b).[46] In the closed state, all three RBDs adopt more compact, so-called "down" conformations, and the S-protein is poorly accessible to the human cell. After binding to ACE2 in the "open" conformation, in which one or more RBDs have switched to the "up" conformation, cleavage into the S1 and S2 subunits enables the virus particle to fuse with the human host cell membrane (Figure 1a).[42,46,47] Mutations may increase infectivity by strengthening amino acid interactions with ACE2 (and thereby the affinity), possibly while also favoring the more open, ACE2-accessible conformations, whereas antibodies tend to prevent fusion by blocking the interaction between the S-protein and ACE2.[47] Recent innovations in cryoelectron microscopy of proteins provide a central underpinning for the success in elucidating the structural biology of SARS-CoV-2[18−21,48] and are an important reminder of how basic science enables later, sometimes urgent, applied science. These breakthroughs include advances in sample handling, electron detectors, and image processing.[19,20,48] The structures are obtained from protein samples applied to a carbon grid and rapidly cooled in a cryogen, so that the molecules are embedded in a thin layer of vitrified ice.[19,20,49] Subsequent 3D reconstruction of the protein structure is typically done by single particle analysis[48] or by subtomogram averaging of multiple particle data, which can inform on, e.g., conformational ensembles.[19] Proteins that are embedded in lipids in vivo (such as the spike protein or human membrane proteins) are often difficult to stabilize because of their hydrophobic regions, and stabilization by mutations or with detergents, amphipols, and nanodiscs is common.[50−52] For the following discussion, it is important to note that structures determined at low temperature may not reflect the real conformations of the S-protein at physiological temperature (37 °C). Freezing is essential to reduce thermal rearrangements caused by ionization-induced radical reactions,[53] but it tends to remove some of the protein's conformational dynamics.[54−58] Freezing-out of important conformations and "cryo-contraction" have been observed in variable-temperature X-ray diffraction studies of ribonuclease[56] and myoglobin[57] and in molecular simulations.[54] Conformational changes play a major role in the function of the S-protein upon fusion with the human host cell, and one can expect the conformations to be temperature-dependent, as has indeed recently been observed.[59] Table 1 lists the nearly full-length S-protein structures (defined here as having more than 900 residue coordinates) that we identified by searching the literature, UniProt, and the Protein Data Bank as of June 2021. Structures in the PDB that were not documented in a full paper, either published or in preprint, were not included.
The number of structures published during the first 1.5 years of the pandemic is remarkable; we are not aware of any comparable effort in modern structural biology. In addition, structures of several important natural variants have been determined, as summarized in Table 2. Together with previously published structures of other coronavirus S-proteins (Table 3), this provides a wealth of structural information for detailed structure-based functional and evolutionary studies.
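The N > 900 criterion used for Table 1 counts residues that actually have coordinates, which can be read from the ATOM records of a PDB-format file. Below is a minimal stdlib-only Python sketch that assumes the fixed-column PDB layout (chain ID in column 22, residue number in columns 23–26, insertion code in column 27). It counts distinct residue positions per selected chain. As an assumption of this sketch, it applies the threshold to the most complete selected chain, because N values of around 1000 are below the 1273 residues of a single S-protein chain. The helper names are hypothetical. The chain IDs are a user choice: in complexes, ACE2 and antibody chains should be left out.

```python
def resolved_positions(pdb_text: str, chain_ids: set[str]) -> dict[str, set[tuple[int, str]]]:
    """Map each selected chain to the residue positions that have ATOM coordinates."""
    positions: dict[str, set[tuple[int, str]]] = {chain: set() for chain in chain_ids}
    for line in pdb_text.splitlines():
        # HETATM records (glycans, ligands, ions) are skipped on purpose.
        if not line.startswith("ATOM  ") or len(line) < 27:
            continue
        chain = line[21]  # column 22: chain identifier
        if chain not in positions:
            continue
        residue_number = int(line[22:26])  # columns 23-26: residue sequence number
        insertion_code = line[26]  # column 27: insertion code (usually blank)
        # Several atoms per residue collapse into one entry in the set.
        positions[chain].add((residue_number, insertion_code))
    return positions


def is_nearly_full_length(pdb_text: str, chain_ids: set[str], threshold: int = 900) -> bool:
    """Hypothetical check: does the most complete selected chain exceed `threshold` residues?"""
    counts = [len(sites) for sites in resolved_positions(pdb_text, chain_ids).values()]
    return max(counts, default=0) > threshold
```

In use, one would pass the text of a downloaded entry together with the chain IDs of the three S-protein protomers. Insertion codes are kept so that residues numbered with insertion codes are not merged.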

Table 1

Single-Particle Cryoelectron Microscopy Studies Reporting Nearly Full-Length (N > 900) SARS-CoV-2 S-Protein Structures

protein state	PDB codes	release date	reference
prefusion	6VSB	2020–02–26	Wrapp et al.[42]
open and closed	6VYB, 6VXX	2020–03–11	Walls et al.[47]
Ab-bound	6WPS	2020–05–27	Pinto et al.[187]
closed/all down	6X29, 6X2A, 6X2B, 6X2C	2020–05–27	Henderson et al.[46]
closed/all down	6X6P	2020–06–10	Herrera et al.[45]
Ab-bound	7BYR	2020–06–10	Cao et al.[188]
closed, cleaved, 1-up	6ZGE, 6ZGI, 6ZGG	2020–07–01	Wrobel et al.[80]
closed + Ab-bound	6Z97	2020–07–01	Huo et al.[189]
Ab-bound	7C2L	2020–07–01	Chi et al.[190]
Ab-bound	6XCM, 6XCN	2020–07–01	Barnes et al.[191]
Ab-bound	6ZDH	2020–07–01	Zhou et al.[192]
pre- and postfusion	6XR8, 6XRA	2020–07–22	Cai et al.[41]
cys-stabilized	6ZOX, 6ZOZ, 6ZOY, 6ZP1, 6ZP0, 6ZP2	2020–07–22	Xiong et al.[43]
Ab-bound	6XEY	2020–07–22	Liu et al.[193]
prefusion stabilized	6ZP5, 6ZP7, 6ZOW	2020–07–29	Melero et al.[68]
pH dependent states	6XM0, 6XM3, 6XM4, 6XM5, 7JWY, 6XLU	2020–08–12	Zhou et al.[66]
prefusion closed	6X79	2020–08–19	McCallum et al.[194]
open	7CN9	2020–08–26	Liu et al.[195]
Ab-bound	7JJI, 7JJJ	2020–08–26	Bangaru et al.[196]
prefusion closed	6XF5, 6XF6	2020–09–02	Zhou et al.[78]
Ab-bound	7CHH	2020–09–16	Du et al.[197]
free + ACE2-bound	7A93, 7A94, 7A95, 7A96, 7A97, 7A98	2020–09–16	Benton et al.[198]
nanobody-bound	6ZXN	2020–09–23	Hanke et al.[199]
Ab-bound (“inhibitor”)	7JZN, 7JZL	2020–09–23	Cao et al.[98]
closed/all down	6ZB4, 6ZB5	2020–09–30	Toelzer et al.[200]
Ab-bound	7K43, 7K4N	2020–10–07	Tortorici et al.[89]
with human VH binder	7JWB	2020–10–07	Bracken et al.[201]
Ab-bound	7JW0, 7JVC, 7JV6, 7JV4	2020–10–14	Piccoli et al.[202]
Ab-bound	7K8S, 7K8V, 7K8W, 7K8T, 7K8U, 7K8Z, 7K8X, 7K8Y, 7K90	2020–10–21	Barnes et al.[203]
Ab-bound	7A29, 7A25	2020–10–21	Custodio et al.[204]
closed and 1-up	7A4N, 7AD1	2020–11–04	Juraszek et al.[79]
Ab-bound	7KKK, 7KKL	2020–11–11	Schoof et al.[205]
prefusion and ACE2-bound	7KJ2, 7KJ3, 7KJ4, 7KJ5	2020–11–11	Xiao et al.[64]
ACE2-bound	7CT5	2020–11–18	Guo et al.[206]
closed and open and Ab-bound	7DDD, 7DDN, 7DD2, 7DCX, 7DK6, 7DK4, 7DCC, 7DK7, 7DD8, 7DK5	2020–11–25	Zhang et al.[207]
ACE2 complex	7KNB, 7KMZ, 7KMS, 7KNE, 7KNH, 7KNI	2020–12–09	Zhou et al.[67]
Ab-bound	7CWS, 7CWT, 7CWU	2020–12–16	Wang et al.[208]
closed, open, ACE2-bound	7DF3, 7DK3, 7DF4	2020–12–16	Xu et al.[82]
free and Ab-bound	7CAB, 7CAC, 7CAI, 7CAK	2020–12–16	Lv et al.[97]
Ab-bound	7CWM, 7CWN, 7CWL	2020–12–16	Yao et al.[96]
Ab-bound	7L06, 7L09, 7L02	2020–12–30	Williams et al.[209]
nanobody-bound	7KSG, 7B18	2021–01–20	Koenig et al.[210]
Ab-bound	7LAB, 7LAA, 7LD1, 7LCN, 7LJR	2021–01–27	Li et al.[211]
Ab-bound	7L3N	2021–02–03	Jones et al.[212]
Ab-bound	7KS9	2021–02–10	Banach et al.[213]
Ab-bound	7KMK, 7KML, 7KXJ, 7KXK	2021–02–10	Miersch et al.[214]
vaccine BNT162b2	7L7K	2021–02–24	Vogel et al.[215]
Ab-bound	7NDC, 7NDD, 7NDA, 7NDB, 7ND7, 7ND8, 7ND5, 7ND6, 7ND9, 7ND3, 7ND4	2021–03–03	Dejnirattisai et al.[216]
Ab-bound	7LSS, 7LS9	2021–03–17	Cerutti et al.[217]
locked, active, ACE2-bound	7DWY, 7DWZ, 7DX5, 7DX6, 7DX3, 7DWX, 7DX9, 7DX7, 7DX8, 7DWX, 7DX0, 7DX1, 7DX2	2021–03–31	Yan et al.[63]
Ab-bound	7L56, 7L57, 7L58	2021–04–14	Rapp et al.[218]
Ab-bound C3	7LXY, 7LXZ, 7LY2	2021–04–14	McCallum et al.[92]
bound to biliverdin	7NT9, 7NTA, 7NTC	2021–04–28	Rosa et al.[219]
Ab-bound	7M6E, 7M6F, 7M6G, 7M6H, 7M6I	2021–05–05	Scheid et al.[220]
Ab-bound	7MKL	2021–05–12	VanBlargan et al.[221]
Ab-bound	7AKD, 7AKJ	2021–05–19	Fedry et al.[222]
Ab-bound	7KQE, 7KQB	2021–05–26	Asarnow et al.[223]
Ab-bound	7N0G, 7N0H	2021–06–02	Ahmad et al.[224]
Ab-bound	7DZW, 7DZX, 7DZY	2021–06–02	Liu et al.[225]
Ab-bound	7E8C	2021–06–09	Cao et al.[226]
Ab-bound	7MY2, 7MY3	2021–06–16	Xu et al.[227]
Ab-bound	7LRT, 7MM0	2021–07–14	Wang et al.[228]

Release date is approximate, as it refers to the first PDB code listed in each study.

Table 2

Published Spike-Protein Mutant/Variant Structures

protein state	PDB code	release date	reference
D614G variant	6XS6	2020–07–22	Yurkovetskiy et al.[69]
D614G variant	7KDJ, 7KDK, 7KDH, 7KDI, 7KDL, 7KDG, 7KEC, 7KEA, 7KEB, 7KE9	2020–11–04	Gobeil et al.[65]
D614G variant	7DX1	2021–03–31	Yan et al.[63]
D614G closed	7BNM	2021–02–03	Benton et al.[124]
D614G open conformation	7BNN	2021–02–03	Benton et al.[124]
D614G open 2-RBD-up	7BNO	2021–02–03	Benton et al.[124]
B.1.1.7/alpha 1-RBD-up	7LWT, 7LWU, 7LWV	2021–03–31	Gobeil et al.[125]
B.1.1.7/alpha 3-RBD down	7LWS	2021–03–31	Gobeil et al.[125]
B.1.1.28 1-RBD-up	7LWW	2021–03–31	Gobeil et al.[125]
Mink Cluster 5 1-RBD up	7LWM, 7LWO	2021–03–31	Gobeil et al.[125]
Mink Cluster 5 2-RBD up	7LWP	2021–03–31	Gobeil et al.[125]
Mink Cluster 5 3-RBD-down	7LWI, 7LWJ, 7LWK, 7LWL	2021–03–31	Gobeil et al.[125]
P.1/gamma + ACE2	7NXC	2021–04–07	Gobeil et al.[125]
B.1.351/beta	7LYK, 7LYL, 7LYN, 7LYO, 7LYP, 7LYQ	2021–03–31	Gobeil et al.[125]
alpha/beta	7N1Q, 7N1T, 7N1U, 7N1V, 7N1W, 7N1X, 7N1Y	2021–07–07	Cai et al.[127]
P.1/gamma	7M8K	2021–05–05	Wang et al.[229]
N501Y mutant Ab/ACE2-bound	7MJG, 7MJH, 7MJJ, 7MJK, 7MJM	2021–05–12	Zhu et al.[230]
B.1.429/epsilon + S2M11, S2L20	7N8H, 7NHI	2021–07–14	McCallum et al.[126]
Hexapro stable lab mutant	6XKL	2020–07–15	Hsieh et al.[44]

Table 3

Other Human Coronavirus Spike Protein Structures

protein	PDB	release date	reference
human coronavirus HKU1	5I08	2016–03–02	Kirchdoerfer et al.[35]
human coronavirus NL63 spike	5SZS	2016–09–14	Walls et al.[231]
MERS-CoV	5X5C, 5X5F, 5X59	2017–05–03	Yuan et al.[37]
MERS-CoV	5W9K, 5W9I	2017–08–16	Pallesen et al.[36]
MERS-CoV	6Q04, 6Q05, 6Q06, 6Q07	2019–12–11	Park et al.[232]
SARS-CoV	5X5B, 5X58	2017–05–03	Yuan et al.[37]
SARS-CoV	5XLR, 5WRG	2017–06–07	Gui et al.[233]
SARS-CoV	6CRV, 6CRX, 6CRW, 6CRZ, 6CS1, 6CS0, 6CS2	2018–04–11	Kirchdoerfer et al.[234]
ACE2-bound SARS-CoV	6ACK, 6ACJ, 6ACC, 6ACD, 6ACG	2018–08–08	Song et al.[235]
human coronavirus 229E spike	6U7H	2019–11–13	Li et al.[236]
human coronavirus OC43 trimer	6NZK, 6OHW	2019–06–05	Tortorici et al.[132]
HKU2 S-protein	6M15	2020–05–27	Yu et al.[237]

Release date is approximate, as it refers to the first PDB code listed in each study.

The many available structures, even for presumably identical protein states, raise the question of how relevant and transferable conclusions based on different structures are. Many of the structures are of excellent resolution given the size of the protein and the way the structures are obtained, approaching 3 Å in some cryo-EM structures. Moreover, many conformational states, as well as both antibody (Ab)-bound and ACE2-bound structures, have been obtained. The structures can therefore be used comparatively to estimate the heterogeneity between them and how it affects structure–function relationships. For smaller parts of the RBD interacting with antibodies, some crystal structures are available at resolutions of about 2 Å or better (Table 7).
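In line with the recommendation to average over a group of high-quality structures rather than rely on a single one, this redundancy can be used to summarize each site's solvent exposure across structures by its mean and spread. The Python sketch below assumes per-residue RSA values have already been computed for each structure, for example with FreeSASA. The structure names and RSA numbers are made up for illustration. Sites resolved in fewer than two structures are skipped because their spread cannot be estimated.

```python
from statistics import mean, stdev


def site_consensus(
    rsa_by_structure: dict[str, dict[int, float]], min_structures: int = 2
) -> dict[int, tuple[float, float, int]]:
    """Summarize each site as (mean RSA, sample standard deviation, number of structures)."""
    if min_structures < 2:
        raise ValueError("a sample standard deviation needs at least two structures")
    values_by_site: dict[int, list[float]] = {}
    for per_site in rsa_by_structure.values():
        for site, rsa in per_site.items():
            values_by_site.setdefault(site, []).append(rsa)
    summary = {}
    for site in sorted(values_by_site):
        values = values_by_site[site]
        if len(values) < min_structures:
            continue  # too few structures resolve this site to estimate its spread
        summary[site] = (mean(values), stdev(values), len(values))
    return summary


# Made-up RSA values for two surface sites in three hypothetical structures;
# site 501 is unresolved in the third structure.
example = {
    "structure_1": {484: 0.40, 501: 0.20},
    "structure_2": {484: 0.60, 501: 0.22},
    "structure_3": {484: 0.50},
}
for site, (average, spread, n) in site_consensus(example).items():
    print(f"site {site}: mean RSA {average:.2f} ± {spread:.2f} over {n} structures")
```

A large spread at a site flags a residue whose exposure, and hence any structure–function conclusion about it, depends on which structure is chosen.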

Structural Heterogeneity: Which Structure Should Be Used?

The protein structures listed in Tables 1–3 were produced by different research groups, sometimes with different protocols and in different protein states. The large number of studies complicates the choice of structure for structure-guided functional or evolutionary analysis. Relevant to this choice are (1) the overall quality of the structure, (2) the site coverage, (3) the protein state of interest (mutated/stabilized, prefusion, postfusion, free, or bound to ACE2 or an antibody), and (4) the conformational state of the protein (e.g., open or closed, with 0, 1, 2, or 3 RBDs in the up conformation). Tables 4–7 display properties of the SARS-CoV-2 S-protein structures relevant to these decisions: Table 4 summarizes data for the apoprotein, Table 5 for the protein bound to ACE2, and Tables 6 and 7 for the protein bound to antibodies.
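These criteria can be applied mechanically as a first screen. The Python sketch below encodes a handful of rows copied from Table 4 and filters them by state, resolution, Ramachandran outliers, and coverage (N). The cutoffs (≤3.5 Å, ≤0.3% outliers, N ≥ 1000) are illustrative choices for this sketch, not thresholds used in this Review. Matching the state by keyword in the free-text label is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpikeStructure:
    pdb_id: str
    state: str  # free-text "protein" label from Table 4
    n_residues: int  # "N": residue sites with coordinates
    outliers_pct: float  # "% outl.": Ramachandran outliers in %
    resolution: float  # "res (Å)"
    rsa_avg: float  # "RSA_avg": average relative solvent exposure


# A subset of rows copied from Table 4 (apo trimer structures).
TABLE4_SUBSET = [
    SpikeStructure("6XM0", "pH 5.5", 1058, 0.1, 2.7, 0.29),
    SpikeStructure("6XF5", "prefusion, closed", 1009, 0.0, 3.5, 0.27),
    SpikeStructure("6ZGE", "uncleaved, closed", 1098, 0.2, 2.6, 0.27),
    SpikeStructure("7DF3", "closed", 1088, 0.3, 2.7, 0.24),
    SpikeStructure("6VXX", "closed", 972, 0.2, 2.8, 0.27),
    SpikeStructure("6VYB", "open", 979, 0.3, 3.2, 0.27),
    SpikeStructure("7DDN", "open", 1068, 0.1, 6.3, 0.28),
    SpikeStructure("7CN9", "open", 1061, 0.0, 4.7, 0.32),
    SpikeStructure("7DWZ", "active", 1007, 0.5, 3.3, 0.29),
]


def select_structures(
    structures: list[SpikeStructure],
    state_keyword: str,
    max_resolution: float = 3.5,
    max_outliers_pct: float = 0.3,
    min_residues: int = 1000,
) -> list[SpikeStructure]:
    """Keep structures in the requested state that pass all cutoffs, best resolution first."""
    kept = [
        s
        for s in structures
        if state_keyword in s.state
        and s.resolution <= max_resolution
        and s.outliers_pct <= max_outliers_pct
        and s.n_residues >= min_residues
    ]
    return sorted(kept, key=lambda s: s.resolution)


print([s.pdb_id for s in select_structures(TABLE4_SUBSET, "closed")])
```

With these cutoffs, no open structure in the subset survives, reflecting the generally lower resolution of the open-state entries in Table 4. Relaxing `max_resolution` in that case is a judgment call that should be stated explicitly.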

Table 4

Properties of Some Representative Nearly Complete Structures of Apo-Trimer-S-Protein

protein	PDB	N	chains	% outl.	res (Å)	RSA_avg	reference
pH 5.5	6XM0	1058	3	0.1	2.7	0.29	Zhou et al.[66]
pH 5.5, 1-up, conf. 1	6XM3	1060	3	0.1	2.9	0.29	Zhou et al.[66]
pH 5.5, 1-up, conf. 2	6XM4	1060	3	0.1	2.9	0.29	Zhou et al.[66]
pH 5.5, closed	6XM5	1058	3	0.0	3.1	0.28	Zhou et al.[66]
pH 4.5	7JWY	1063	3	0.1	2.5	0.28	Zhou et al.[66]
pH 4.0	6XLU	1063	3	0.1	2.4	0.27	Zhou et al.[66]
prefusion	6Z97	1002	3	0.0	3.4	0.28	Huo et al.[189]
prefusion, closed	6XF5	1009	3	0.0	3.5	0.27	Zhou et al.[78]
prefusion, 1-up	6XF6	999	3	0.0	4.0	0.28	Zhou et al.[78]
1-up closed	6ZP5	1121	3	0.0	3.1	0.28	Melero et al.[68]
prefusion 1-up	6ZP7	996	3	0.0	3.3	0.29	Melero et al.[68]
stabil. closed	7A4N	963	3	0.0	2.8	0.27	Juraszek et al.[79]
1-up	7AD1	967	3	0.0	2.9	0.28	Juraszek et al.[79]
closed	6X79	950	3	0.1	2.9	0.27	McCallum et al.[194]
closed	6X6P	1017	3	0.0	3.2	0.26	Herrera et al.[45]
closed	6X29	972	3	0.0	2.7	0.27	Henderson et al.[46]
1-up	6X2A	978	3	0.1	3.3	0.27	Henderson et al.[46]
2-up	6X2B	977	3	0.2	3.6	0.28	Henderson et al.[46]
closed	6X2C	972	3	0.1	3.2	0.27	Henderson et al.[46]
closed C1 symmetry	6ZB4	1055	3	0.0	3.0	0.28	Toelzer et al.[200]
closed C3 symmetry	6ZB5	1032	3	0.0	2.9	0.28	Toelzer et al.[200]
closed	7DDD	1088	3	0.1	3.0	0.25	Zhang et al.[207]
uncleaved, closed	6ZGE	1098	3	0.2	2.6	0.27	Wrobel et al.[80]
cleaved 1-up	6ZGG	1071	3	0.1	3.8	0.30	Wrobel et al.[80]
cleaved closed	6ZGI	1098	3	0.2	2.9	0.28	Wrobel et al.[80]
closed	7DF3	1088	3	0.3	2.7	0.24	Xu et al.[82]
closed	6VXX	972	3	0.2	2.8	0.27	Walls et al.[47]
open	7DDN	1068	3	0.1	6.3	0.28	Zhang et al.[207]
open	7DK3	1062	3	0.2	6.0	0.27	Xu et al.[82]
open	6VYB	979	3	0.3	3.2	0.27	Walls et al.[47]
locked	7DWY	1099	3	0.0	2.7	0.26	Yan et al.[63]
active	7DWZ	1007	3	0.5	3.3	0.29	Yan et al.[63]
2-RBD-up	7A93	1074	3	0.0	5.9	0.29	Benton et al.[124]
prefusion	6VSB	989	3	0.0	3.5	0.28	Wrapp et al.[42]
stabilized closed	6ZOX	1017	3	0.0	3.0	0.26	Xiong et al.[43]
stabilized locked	6ZOZ	1077	3	0.0	3.5	0.25	Xiong et al.[43]
stabilized closed	6ZP0	1030	3	0.0	3.0	0.26	Xiong et al.[43]
stabilized locked	6ZP2	1060	3	0.0	3.1	0.25	Xiong et al.[43]
prefusion	7JJI	1109	3	0.2	3.6	0.26	Bangaru et al.[196]
prefusion	7KJ5	999	3	0.4	3.6	0.28	Xiao et al.[64]
closed	7CAB	1029	3	0.1	3.5	0.26	Lv et al.[97]
open	7CN9	1061	3	0.0	4.7	0.32	Liu et al.[195]
1-up nonstabil.	7KDH	979	3	0.4	3.3	0.27	Gobeil et al.[65]
closed nonstabil.	7KDG	972	3	0.0	3.0	0.26	Gobeil et al.[65]

N = number of residues; chains = chains in structure; res (Å) = resolution in Å; RSA_avg = average solvent exposure of all residues; % outl. = outliers of Ramachandran plot in %.

Table 7

Representative X-ray Structures of S-Protein RBD Bound to Antibodies

Ab bound to RBD	PDB	% outl.	res (Å)	reference
C5 nanobody	7OAO	0.0	1.5	Huo et al.[238]
H3/C1	7OAP	0.2	1.9	Huo et al.[238]
H3/C1 (alpha)	7OAQ	0.0	1.6	Huo et al.[238]
H3/C1 (N501Y)	7OAU	0.0	1.7	Huo et al.[238]
S2E12	7K3Q	0.0	1.4	Tortorici et al.[89]
S2X35	7JXE	0.0	2.0	Piccoli et al.[202]
S2A4	7JXD	0.1	2.5	Piccoli et al.[202]
S2H14	7JXC	0.2	2.5	Piccoli et al.[202]
VHH E	7KN5	0.1	1.9	Koenig et al.[210]
P4A1	7CJF	0.0	2.1	Guo et al.[239]
C1A-C2	7KFX	0.0	2.2	Clark et al.[240]
C1A-B12	7KFV	0.1	2.1	Clark et al.[240]
C1A-F10	7KFY	0.0	2.1	Clark et al.[240]
7D6	7EAM	0.4	1.4	Li et al.[101]
COVOX-269	7NEH	0.0	1.8	Supasa et al.[99]
COVOX-269 (N501Y)	7NEG	0.0	2.2	Supasa et al.[99]
B38	7BZ5	0.0	1.8	Wu et al.[241]
S309/S2X35	7R6W	0.2	1.8	Starr et al.[242]
LY-CoV481	7KMI	0.0	1.7	Jones et al.[212]
LY-CoV555	7KMG	0.2	2.2	Jones et al.[212]
LY-CoV488	7KMH	0.0	1.7	Jones et al.[212]
COVOX-222/EY6A	7NX6	0.1	2.3	Dejnirattisai et al.[100]
COVOX-222/EY6A (K417N)	7NX7	0.2	2.3	Dejnirattisai et al.[100]
COVOX-222/EY6A (K417T)	7NX8	0.1	2.0	Dejnirattisai et al.[100]
COVOX-222/EY6A (N501Y)	7NX9	0.1	2.4	Dejnirattisai et al.[100]
COVOX-222/EY6A (beta)	7NXA	0.1	2.5	Dejnirattisai et al.[100]
COVOX-222/EY6A (gamma)	7NXB	0.2	2.7	Dejnirattisai et al.[100]
SR31	7D2Z	0.0	2.0	Yao et al.[243]
MR17-SR31	7D30	0.0	2.1	Yao et al.[243]
WCSL 129	7MZI	0.0	1.9	Wheatley et al.[244]
PDI 42	7MZG	0.0	2.0	Wheatley et al.[244]
Re5D06	7OLZ	0.0	1.8	Guttler et al.[245]
CR3022	6YLA	0.2	2.4	Huo et al.[189]
BD-236	7CHB	0.2	2.4	Du et al.[197]
Sb14/Sb68	7MFU	0.2	1.7	Ahmad et al.[224]
Sb45	7KGJ	0.0	2.3	Ahmad et al.[224]

res (Å) = resolution in Å; % outl. = outliers of Ramachandran plot in % (from PDB full report).

Table 5

Properties of Nearly Complete (N > 900) Structures of Spike Protein Bound to ACE2

protein state	PDB	N	chains	% outl.	res (Å)	RSA_avg	reference
1 ACE2	7A94	1086	4	0.2	3.9	0.29	Benton et al.[198]
1 ACE2, 1-up	7A95	1075	4	0.2	4.3	0.30	Benton et al.[198]
1 ACE2, 1-up	7A96	1071	4	0.2	4.8	0.29	Benton et al.[198]
2 ACE2, bound	7A97	1072	5	0.1	4.4	0.29	Benton et al.[198]
3 ACE2, bound	7A98	1071	6	0.2	5.4	0.29	Benton et al.[198]
1 ACE2 pH 7.4	7KNB	1083	4	0.0	3.9	0.29	Zhou et al.[67]
2 ACE2 pH 7.4	7KMZ	1085	5	0.0	3.6	0.29	Zhou et al.[67]
3 ACE2 pH 7.4	7KMS	1086	6	0.0	3.6	0.28	Zhou et al.[67]
1 ACE2, pH 5.5	7KNE	1083	4	0.0	3.9	0.29	Zhou et al.[67]
2 ACE2, pH 5.5	7KNH	1085	5	0.0	3.7	0.30	Zhou et al.[67]
3 ACE2, pH 5.5	7KNI	1082	6	0.0	3.9	0.28	Zhou et al.[67]
1 ACE2	7DF4	1082	4	0.2	3.8	0.27	Xu et al.[82]
ACE2/PD, 1-up	7DX5	1065	4	0.3	3.3	0.28	Yan et al.[63]
ACE2/PD, 2-up	7DX6	1065	4	0.3	3.0	0.29	Yan et al.[63]
ACE2/2PD, 3-up	7DX9	1065	5	0.3	3.6	0.29	Yan et al.[63]
ACE2/PD, 1-up	7DX7	1065	4	0.3	3.4	0.28	Yan et al.[63]
ACE2/2PD, 2-up	7DX8	1065	5	0.3	2.9	0.28	Yan et al.[63]
1 ACE2	7KJ2	1069	4	0.3	3.6	0.28	Xiao et al.[64]
2 ACE2	7KJ3	1069	5	0.4	3.7	0.28	Xiao et al.[64]
3 ACE2	7KJ4	1069	6	0.4	3.4	0.28	Xiao et al.[64]
design ACE2	7CT5	1067	6	0.1	4.0	0.29	Guo et al.[206]

N = number of residues in structure; chains = chains in structure; res (Å) = resolution in Å; RSA_avg = average solvent exposure; % outl. = outliers of Ramachandran plot in % (from PDB full report); PD = peptidase domain of ACE2.

Table 6

Properties of Some Published Nearly Complete Cryo-EM Structures of Spike Protein Bound to Antibodies

protein	PDB	N	chains	% outl.	res (Å)	RSA_avg	reference
Fab 2–4 closed	6XEY	1081	9	0.1	3.3	0.28	Liu et al.[193]
C105 state 1	6XCM	1045	7	0.0	3.4	0.28	Barnes et al.[191]
C105 state 2	6XCN	1035	9	0.0	3.7	0.28	Barnes et al.[191]
S2M11/S2L28	7LXZ	1080	15	0.2	2.6	0.28	McCallum et al.[92]
S2M11/S2X333	7LXY	1065	15	0.2	2.2	0.27	McCallum et al.[92]
S2M11/S2M28	7LY2	1067	15	0.2	2.5	0.27	McCallum et al.[92]
S309	6WPS	995	9	0.3	3.1	0.27	Pinto et al.[187]
EY6A	6ZDH	1072	9	0.0	3.7	0.29	Zhou et al.[192]
Sb23	7A29	1076	6	0.0	2.9	0.29	Custodio et al.[204]
Sb23	7A25	1076	6	0.0	3.1	0.29	Custodio et al.[204]
Fab 2–7	7LSS	1044	5	0.0	3.7	0.28	Cerutti et al.[217]
1–57 Fab	7LS9	1131	9	0.1	3.4	0.25	Cerutti et al.[217]
P17 1-up	7CWM	1087	9	0.1	3.6	0.28	Yao et al.[96]
P17/H014	7CWN	1086	15	0.1	3.2	0.28	Yao et al.[96]
P17 2-up	7CWL	1090	9	0.1	3.8	0.28	Yao et al.[96]
3C1 fab 2-up	7DD2	1081	7	0.0	5.6	0.30	Zhang et al.[207]
2 × 3C1 fab 2-up	7DCX	1081	9	0.0	5.9	0.29	Zhang et al.[207]
2 × 2H2 Fab 2-up	7DK6	1081	7	0.0	4.3	0.28	Zhang et al.[207]
3 × 2H2 Fab 2-up	7DK4	1079	9	0.0	3.8	0.27	Zhang et al.[207]
3 × 3C1 fab 3-up	7DCC	1081	9	0.0	4.3	0.30	Zhang et al.[207]
3 × 2H2 Fab 3 up	7DK7	1081	9	0.0	9.7	0.29	Zhang et al.[207]
1 × 3C1 fab 1-up	7DD8	1081	5	0.0	7.5	0.29	Zhang et al.[207]
1 × 2H2 Fab 1-up	7DK5	1081	5	0.0	13.5	0.28	Zhang et al.[207]
3 × 4A8	7C2L	1073	9	0.9	3.1	0.29	Chi et al.[190]
1 × Ab23-Fab	7BYR	1051	5	0.0	3.8	0.28	Cao et al.[188]
1 × Fab H4	7L58	1108	5	0.4	5.1	0.36	Rapp et al.[218]
3 × Fab 2–43	7L56	1056	9	0.2	3.6	0.28	Rapp et al.[218]
1 × Fab 2–15	7L57	1055	5	0.1	5.9	0.38	Rapp et al.[218]
nanobody Ty1	6ZXN	1076	6	0.0	2.9	0.28	Hanke et al.[199]
1 × H014 Fab 1-up	7CAC	1072	5	0.1	3.6	0.28	Lv et al.[97]
2 × H014 Fab 2-up	7CAI	1069	7	0.2	3.5	0.28	Lv et al.[97]
3 × H014 Fab 3-up	7CAK	1061	9	0.2	3.6	0.28	Lv et al.[97]
S-6P BD-368-2	7CHH	1052	9	0.0	3.5	0.28	Du et al.[197]
FC05 + H014	7CWS	1089	15	0.4	3.4	0.28	Wang et al.[208]
hb27 + fc05	7CWT	1088	15	0.4	3.7	0.28	Wang et al.[208]
P17 + FC05	7CWU	1090	15	0.1	3.5	0.29	Wang et al.[208]
S2H13, 1-up	7JV4	1025	9	0.3	3.4	0.29	Piccoli et al.[202]
S2H13, closed	7JV6	1019	9	0.3	3.0	0.29	Piccoli et al.[202]
S304	7JW0	1061	9	0.5	4.3	0.33	Piccoli et al.[202]
S2A4	7JVC	1064	9	0.6	3.3	0.31	Piccoli et al.[202]
VH domain	7JWB	1079	4	0.1	3.2	0.28	Bracken et al.[201]
LCB1 2-up	7JZL	1018	6	0.3	2.7	0.28	Cao et al.[98]
LCB3 2-up	7JZN	1018	6	0.3	3.1	0.28	Cao et al.[98]
S2M11	7K43	1059	9	0.2	2.6	0.26	Tortorici et al.[89]
S2E12	7K4N	1037	9	0.4	3.3	0.29	Tortorici et al.[89]
human Ab C002	7K8S	1049	9	0.2	3.4	0.28	Barnes et al.[203]
human Ab C110	7K8V	1037	7	0.0	3.8	0.28	Barnes et al.[203]
human Ab C119	7K8W	1054	7	0.1	3.6	0.28	Barnes et al.[203]
human Ab C002	7K8T	1046	9	0.1	3.4	0.27	Barnes et al.[203]
human Ab C104	7K8U	1036	5	0.0	3.8	0.29	Barnes et al.[203]
human Ab C135	7K8Z	1026	7	0.0	3.5	0.28	Barnes et al.[203]
human Ab C121	7K8X	1042	7	0.0	3.9	0.29	Barnes et al.[203]
human Ab C121	7K8Y	1042	7	0.0	4.4	0.29	Barnes et al.[203]
human Ab C144	7K90	1061	9	0.1	3.2	0.27	Barnes et al.[203]
nanobody Nb6	7KKK	1037	6	0.0	3.0	0.27	Schoof et al.[205]
nanobody mNb6	7KKL	1037	6	0.2	2.9	0.28	Schoof et al.[205]
Fab 15033-7	7KMK	1066	7	0.2	4.2	0.27	Miersch et al.[214]
Fab 15033-7	7KML	1071	9	0.2	3.8	0.28	Miersch et al.[214]
Fab 910-30	7KS9	1039	5	0.0	4.8	0.28	Banach et al.[213]
nanobody	7KSG	1096	6	0.0	3.3	0.29	Koenig et al.[210]
1 × 2G12	7L02	1052	7	0.2	3.2	0.27	Williams et al.[209]
2 × 2G12	7L06	1052	11	0.1	3.3	0.28	Williams et al.[209]
2G12	7L09	1052	7	0.2	3.1	0.27	Williams et al.[209]
LY-CoV555	7L3N	1055	5	0.2	3.3	0.28	Jones et al.[212]
BNT162b2	7L7K	986	3	0.0	3.3	0.29	Vogel et al.[215]
DH1041	7LAA	1085	5	0.2	3.4	0.28	Li et al.[211]
DH1052	7LAB	1024	9	0.3	3.0	0.27	Li et al.[211]
DH1047	7LD1	1045	9	0.4	3.4	0.27	Li et al.[211]
DH1050.1	7LCN	1042	9	0.9	3.4	0.27	Li et al.[211]
DH1043	7LJR	1053	5	0.4	3.7	0.27	Li et al.[211]
COVOX-253H55L	7NDA	1082	5	0.4	3.3	0.27	Dejnirattisai et al.[216]
COVOX-253H165L	7NDB	1111	5	0.1	4.6	0.27	Dejnirattisai et al.[216]
COVOX-159	7NDC	1082	9	0.1	4.1	0.28	Dejnirattisai et al.[216]
COVOX-159	7NDD	1082	9	0.1	4.2	0.28	Dejnirattisai et al.[216]
COVOX-40	7ND3	1038	5	0.0	3.7	0.28	Dejnirattisai et al.[216]
COVOX-150	7ND5	1074	5	0.1	3.4	0.28	Dejnirattisai et al.[216]
COVOX-158	7ND6	1074	5	0.0	7.3	0.28	Dejnirattisai et al.[216]
COVOX-316	7ND7	1038	9	0.4	3.6	0.27	Dejnirattisai et al.[216]
COVOX-384	7ND8	1072	5	0.3	3.5	0.27	Dejnirattisai et al.[216]
COVOX-253H55L	7ND9	1118	5	0.1	2.8	0.27	Dejnirattisai et al.[216]

N = residues in structure; chains = chains in structure; res (Å) = resolution in Å; RSA_avg = average solvent exposure of all residues; % outl. = outliers of Ramachandran plot in % (from PDB full report); Fab = S-protein-binding antibody fragment.

In terms of quality, 2.5–3.5 Å is the resolution range in which amino acid side-chain conformations, including those of functionally important surface residues, become resolved.[60] A resolution of 3 Å does not allow identification of all atom positions, including water molecules, but it provides very good information on the overall backbone conformation and on the secondary and tertiary structure.[60] We expect many surface residues not to be well resolved at 3 Å, and some rotamers may be mismodeled. Because these are the functionally important residues of the S-protein, improving resolution toward 2 Å would be of substantial value. Generally, many of the published structures resolve the secondary and tertiary structure of the S-protein very well, but confidence in the orientations of individual surface residues is limited (see below).
The percentages of Ramachandran-plot outliers (residues with unusual backbone torsion angles) are also shown in Tables 4–7 as an indicator of the structures' backbone conformations.[60] These were calculated using the Procheck program[61] (version 3.6.2) available via the PDBsum server from EMBL-EBI.[62] A smaller number implies normal, expected backbone conformations, whereas a larger number implies more unusual or "strained" backbone conformations. For the apo-S-protein structures of Table 4, the values are generally as expected, with Ramachandran outliers typically constituting only 0.0–0.3% of the residues. The few exceptions are the active state 7DWZ (0.5%),[63] the prefusion state 7KJ5 (0.4%),[64] and the 1-up structure 7KDH (0.4%).[65] Even these values are reasonable, however, because 0–0.5% corresponds to only 0–5 of the approximately 1000 resolved residues having an unusual backbone conformation. In terms of amino acid coverage, the total number of residue sites resolved by coordinates is listed as "N" in Tables 4–6. At a given resolution of ∼3 Å, a more complete structure seems more suitable for study, and if specific mutations are of interest, their sites should at least be covered by full coordinates in the structure. For example, P681 is an important mutation site that is typically absent from the structures, and most structures miss many of the disordered N-terminal sites harboring mutations of interest, such as S13I, L18F, T20N, and P26S. Structure 7LXY, for instance, has site T20 but misses P26, whereas 6ZB4 has P26 but misses T20, and many other structures, such as 6VXX and 6XM0, miss both.
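Both checks in this paragraph reduce to simple arithmetic and set membership. In the Python sketch below, the percentage-to-count conversion reproduces the 0.5% ≈ 5 residues estimate. The coverage check reports which mutation sites lack coordinates, taking each position from the label under the standard wild type–position–mutant notation. The helper names are hypothetical. The resolved-residue ranges in the example are also hypothetical, with a gap around the furin loop, and are not taken from any particular PDB entry.

```python
import re


def outlier_residue_count(outlier_pct: float, n_residues: int) -> float:
    """Approximate number of Ramachandran-outlier residues implied by a percentage."""
    return outlier_pct * n_residues / 100.0


def missing_sites(resolved: set[int], mutation_labels: list[str]) -> list[str]:
    """Return the labels (e.g. 'T20N') whose sequence position has no coordinates."""
    missing = []
    for label in mutation_labels:
        match = re.search(r"\d+", label)  # "T20N" -> "20"
        if match is None:
            raise ValueError(f"no sequence position in {label!r}")
        if int(match.group()) not in resolved:
            missing.append(label)
    return missing


# 0.5% outliers in a ~1000-residue structure corresponds to about 5 residues.
print(outlier_residue_count(0.5, 1000))

# Hypothetical structure with coordinates for residues 20-676 and 689-1146,
# i.e., a disordered N-terminus and an unresolved loop around site 681.
resolved = set(range(20, 677)) | set(range(689, 1147))
print(missing_sites(resolved, ["S13I", "L18F", "T20N", "P26S", "P681H"]))
```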
In addition to choosing the correct protein state for the analysis of interest (apoprotein or bound to, e.g., ACE2), it is important to consider that some structures represent different, more or less open states, not only in terms of having 0, 1, 2, or 3 RBDs in the upward conformation but also in terms of other features, such as mutated or deleted transmembrane domains that may affect the overall structure. Herrera et al. reported that some structures (6VXX, 6X29, 6X2C, 6X79, 6ZOX, 6ZOY, 6ZP0, 6ZP1, and 6ZWV) display one conformation (called conformation 1), whereas others (e.g., 6XR8, 6ZGE, 6ZGH, 6ZGI, 6ZP2, 7JJI, and 7JJJ) display conformation 2 and lack residues in the region starting at the notorious site 614 (residues 614–642).[45] As discussed below, the D614 site shows substantial heterogeneity even between structures that supposedly represent the same state. Evidently, the S-protein is very sensitive to changes in its environment, and it is reasonable to expect the in vivo conformations of the protein in the viral lipid membrane to differ from the structures obtained in vitro: pH[66,67] and temperature[59] change the conformational states, and molecular crowding, salt, and other features of the heterogeneous in vivo environment may be expected to do so as well. Conformational states may also be affected by subtle details of the experimental protocols,[68] e.g., the mutations used to stabilize the protein for characterization[34] and temperature effects on the more dynamic surface residues.[54−58] Higher temperatures, as in the human body, are likely to favor more entropic conformational states, which might affect the epitopes and may explain the temperature dependence of some structures.[59] The extent to which specific protein variants favor open or closed conformations should thus be interpreted in this context. 
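Why higher temperature should favor more entropic states can be illustrated with a minimal two-state closed ⇌ open model in which ΔG_open = ΔH − TΔS. The enthalpy and entropy values below are purely hypothetical, chosen only so that opening costs enthalpy but gains entropy; they are not fitted to any S-protein data, and real RBD dynamics involve more than two states.

```python
import math

R = 1.987204e-3  # gas constant in kcal/(mol K)


def open_fraction(dH, dS, T):
    """Equilibrium fraction in the 'open' state of a two-state model.

    dH in kcal/mol, dS in kcal/(mol K), T in K; dG_open = dH - T*dS and
    K = [open]/[closed] = exp(-dG_open / RT), so the fraction is K / (1 + K).
    """
    dG = dH - T * dS
    return 1.0 / (1.0 + math.exp(dG / (R * T)))


if __name__ == "__main__":
    # Hypothetical parameters: opening is endothermic but entropically favored.
    dH, dS = 10.0, 0.033
    # Roughly 4 °C (cold sample handling), room temperature, body temperature.
    for T in (277.15, 298.15, 310.15):
        print(f"T = {T:.2f} K: open fraction = {open_fraction(dH, dS, T):.2f}")
```

With a positive ΔS the open fraction rises with temperature; with a negative ΔS the trend reverses, which is why the sign of the entropy change, not the temperature alone, determines the effect.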
For example, the D614G mutation has been reported to favor a more open conformational state,[69] but this tendency is probably temperature dependent.[70] Accordingly, subtomogram averaging and molecular dynamics simulations at physiological temperature, which are increasingly being explored, are important for analyzing the conformations of the protein.[70−74] To determine the extent of variation in the published S-protein structures of Tables –6, we have listed the average relative solvent accessible surface area (RSA) per site in each structure as an indicator of conformational openness, using the Naccess algorithm[75] as implemented in the FreeSASA program.[76] This property is of central importance in protein evolution[77] and is particularly relevant here, because most of the functions of interest of the S-protein involve surface interactions with other proteins, notably antibodies and ACE2. Mutations that affect virus infectivity and immune evasion are likely to be solvent-exposed, because they then more directly affect the affinity for antibodies or ACE2, e.g., via electrostatic, hydrophobic, or hydrogen-bond interactions. As seen from Tables –6, the average exposure per residue of the S-protein tends to be approximately 25–30% (RSA values of 0.25–0.30). Because this number is an average over approximately 1000 residues, even a difference of 0.01 between two structures (1 percentage point on average) indicates substantial surface heterogeneity, e.g., 20 residues that change from 0 to 50% exposure. However, we cannot determine which conformation is correct, because of the mutations used to stabilize the proteins, the modest resolution, and cryo-effects on the conformational dynamics.[54] Still, the total average provides a robust indicator of the surface-specific heterogeneity of the protein structures. 
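The arithmetic behind the statement that an RSA difference of 0.01 corresponds to about 20 residues changing from 0 to 50% exposure can be checked directly. The per-residue RSA lists below are hypothetical stand-ins for the output of a solvent-accessibility program such as FreeSASA; no real structure is used, and the function names are illustrative.

```python
def average_rsa(rsa_values):
    """Mean relative solvent accessibility over all residues of a structure."""
    return sum(rsa_values) / len(rsa_values)


def average_rsa_shift(n_residues, n_changed, rsa_before, rsa_after):
    """Change in the structure-wide average RSA when n_changed residues go
    from rsa_before to rsa_after and all other residues stay the same."""
    return n_changed * (rsa_after - rsa_before) / n_residues


if __name__ == "__main__":
    # Hypothetical 1000-residue structure: 980 residues at RSA 0.30, 20 buried.
    before = [0.30] * 980 + [0.0] * 20
    # The same structure after the 20 buried residues become half exposed.
    after = [0.30] * 980 + [0.5] * 20
    print(average_rsa(after) - average_rsa(before))
    print(average_rsa_shift(1000, 20, 0.0, 0.5))
```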
We note that for the apo-S-protein (Table ), all average RSA values of 0.29 or larger belong to partly open or active states with one or more RBDs in the upward conformation, whereas closed states tend to have lower values of 0.25–0.28. For comparable structures from the same study, the closed structures have lower RSA than the partly open structures. For example, the closed structure of Zhou et al.[78] (6XF5) has RSA = 0.27, whereas the structure with one RBD in the upward conformation (1-up; 6XF6) has RSA = 0.28 (Table ). All of the studies by Melero et al.,[68] Juraszek et al.,[79] Henderson et al.,[46] Wrobel et al.,[80] and Gobeil et al.[65] that included both closed and partly open states confirm this observation. The RSA thus seems to be a simple but informative indicator of the conformational state of nearly complete S-protein structures. In the pH-dependent structures of Zhou et al.,[66] lower pH values tend to produce more closed apo-S-protein structures, with pH = 4.0 giving RSA = 0.27 (6XLU) and higher pH giving 0.28 (7JWY, 6XM5). This is consistent with the general recognition of conformational flexibility in the metastable prefusion state.[46,65,68]
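The within-study comparison described above can be expressed as grouping structures by conformational state and comparing mean RSA. The sketch uses the Zhou et al. pair quoted in the text (6XF5, 6XF6) and the Cluster-5 structures of Gobeil et al. listed in Table 8, with 1-up and 2-up structures pooled as "partly open". Means are compared rather than individual structures because single values can overlap (several Cluster-5 closed and 1-up structures share RSA = 0.27); the function name is hypothetical.

```python
from statistics import mean


def mean_rsa_by_state(records):
    """Group (pdb_id, state, rsa_avg) records by state; return mean RSA per state."""
    groups = {}
    for _pdb_id, state, rsa in records:
        groups.setdefault(state, []).append(rsa)
    return {state: mean(values) for state, values in groups.items()}


# RSA_avg values as quoted in the text (Zhou et al.) and in Table 8 (Gobeil et al.).
ZHOU_6XF5_6XF6 = [("6XF5", "closed", 0.27), ("6XF6", "partly open", 0.28)]
GOBEIL_CLUSTER5 = [
    ("7LWM", "partly open", 0.27), ("7LWO", "partly open", 0.28),
    ("7LWP", "partly open", 0.28),
    ("7LWI", "closed", 0.27), ("7LWJ", "closed", 0.27),
    ("7LWK", "closed", 0.28), ("7LWL", "closed", 0.27),
]

if __name__ == "__main__":
    for name, records in (("Zhou", ZHOU_6XF5_6XF6), ("Gobeil Cluster-5", GOBEIL_CLUSTER5)):
        means = mean_rsa_by_state(records)
        print(name, {state: round(value, 4) for state, value in means.items()})
```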

ACE2 Binding

The critical step in infection is the binding of the S-protein to human ACE2, which initiates fusion of the viral and cell membranes and release of the viral genetic material into the cytoplasm.[12,13] Newly arising mutations, in particular on the surface of the S-protein, can affect this crucial interaction and thereby the infection process; both the alpha and beta variants have been shown to possess higher ACE2 affinity.[81] Figure 2a shows the apo-S-protein (i.e., without any antibody or other protein bound) in its primary closed conformation with all three RBDs in the downward conformation (PDB: 7DF3).[82] Prominent naturally occurring mutations are shown in red and orange. Sites of the RBD shown in red harbor mutations of concern, due either to enhanced antibody evasion or to effects on transmission. Figure 2b and Figure 2c show the corresponding structure and mutations of the open state involved in ACE2 binding: Figure 2b shows the specific conformations of this state, whereas Figure 2c shows the full context of ACE2 binding, based on the published structure 7KMS by Zhou et al.[67]

Figure 2

SARS-CoV-2 structures and mutations. (a) The S-protein in RBD-down conformation (PDB 7DF3). Trimer and monomer are represented along with natural mutations reported in spike. Red represents natural mutations of concern, and orange represents other natural mutations. (b) RBD-up conformation (PDB 7KMS). (c) S-protein bound to ACE2 in RBD-up conformation (PDB 7KMS).

One can envision mutations increasing infectivity either by increasing affinity toward ACE2 or by maintaining higher ACE2 affinity relative to the most important antibodies, possibly by favoring open conformational states that associate more strongly with ACE2 (Figure ). The lifetime of the virus particle in the host may depend on its relative propensity to bind to ACE2 and infect cells vs to bind prominent antibodies. For competitive binding, this relative propensity is quantified by the ratio of association constants $K_{\mathrm{ACE2}}/K_{\mathrm{Ab}}$, which corresponds to the difference between the binding free energies of the S-protein to ACE2 and to antibodies (Ab). Both affinities are modulated by mutations, which generally change the interactions with ACE2 and with antibodies to different extents. Competitive binding is a distinct consideration from binding to antibodies alone (host immune evasion) or to ACE2 alone (host cell fusion), and it plausibly correlates better with virus fitness. Surface mutations may increase binding nonspecifically, and mutations that bind less to antibodies (“antibody escape”) can also bind less to ACE2, which may leave the overall fitness of the variant unchanged. These considerations explain why the relative affinity toward ACE2 and antibodies is important for a structure-based understanding of SARS-CoV-2 transmission. 
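For two competing partners, the ratio of association constants follows from the binding free energies as $K_{\mathrm{ACE2}}/K_{\mathrm{Ab}} = \exp[-(\Delta G_{\mathrm{ACE2}} - \Delta G_{\mathrm{Ab}})/RT]$. The sketch below uses hypothetical free energies and an assumed temperature of 310.15 K to illustrate the point made above: a mutation that weakens ACE2 and antibody binding by the same amount leaves the ratio unchanged, whereas one that weakens only antibody binding increases it.

```python
import math

R = 1.987204e-3  # gas constant in kcal/(mol K)


def association_constant(dG_bind, T=310.15):
    """Association constant (1/M, 1 M standard state) from a binding free
    energy in kcal/mol; a more negative dG_bind means tighter binding."""
    return math.exp(-dG_bind / (R * T))


def selectivity_ratio(dG_ace2, dG_ab, T=310.15):
    """K_ACE2 / K_Ab = exp(-(dG_ACE2 - dG_Ab) / RT)."""
    return math.exp(-(dG_ace2 - dG_ab) / (R * T))


if __name__ == "__main__":
    # Hypothetical binding free energies (kcal/mol), not measured values.
    wild_type = selectivity_ratio(-11.0, -13.0)
    weaker_both = selectivity_ratio(-11.0 + 1.5, -13.0 + 1.5)  # escape that also weakens ACE2
    weaker_ab_only = selectivity_ratio(-11.0, -13.0 + 1.5)     # selective antibody escape
    print(wild_type, weaker_both, weaker_ab_only)
```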
One may also expect individual variations in, e.g., ACE2 expression and surface composition to contribute to the heterogeneity in susceptibility and transmissibility that plays a central role in the epidemiology of the disease.[83,84] For example, ACE2 expression is age dependent[85,86] and may correlate with mortality,[87] consistent with susceptibility depending on S-protein–ACE2 complex formation. To understand these interactions, we need not only structures of the S-protein bound to ACE2 but also a reasonable structure of the apo-S-protein, to appreciate how the protein structure itself is affected by the binding and how mutations affect the two states. Thus, whereas Table lists properties of the apo-S-protein, Table lists corresponding properties of resolved structures of the S-protein bound to (parts of) ACE2. As already mentioned, the S-protein preferably binds ACE2 in a more open conformation; accordingly, the RSA values of the ACE2–S-protein complexes are generally of the “open” type, typically 0.28–0.29, with only three exceptions in Table . In general, the resolution of the ACE2 complexes is not as good as for the apo-S-protein, with most resolutions worse than 3.5 Å. Notable exceptions are the structures by Yan et al.[63] (2.9–3.6 Å), which also cover the 1-up, 2-up, and 3-up conformation states with one or two ACE2 peptidase domains. Also of note, Zhou et al.[67] studied the complexes with one, two, and three ACE2 molecules bound at two different pH values, 7.4 and 5.5. The RSA values indicate that pH does not affect the ACE2 complexes as much as the apo-S-protein, probably because the ACE2-bound S-protein is always in the open state, regardless of pH.

Structural Basis of Antigenic Drift

Some properties of 80 structures published with antibodies (including nanobodies, etc.) bound to the S-protein are summarized in Table . Antigenic drift can be defined as the reduced affinity of important antibodies (typically picomolar binders) toward newly arising variants.[88,89] New variants may harbor mutations at positions of the S-protein that interact strongly with notable antibodies, including prominent antibodies targeting earlier variants, and these mutations may reduce the affinity for the antibodies, producing more resistant virus variants.[6,90−92] Understanding this antibody evasion thus requires understanding the (loss of) binding affinity caused by mutation, which has been measured in several important studies[39,93−95] and, together with the structures, provides a good basis for rationalizing the observed effects. The 80 structures included in Table cover a large range of resolutions and very distinct complexes and thus offer a wealth of information on the protein–protein interactions of the S-protein. Despite their complexity, these structures often have resolutions comparable to those of typical apo-S-protein structures (Table ), with many achieving resolutions near 3 Å, which may be considered a benchmark. Interesting highlights include structures revealing distinct conformation states even for the same antibody, such as P17 in 1-up and 2-up conformations (7CWM/7CWL),[96] which further testifies to the conformational plasticity of the S-protein. Another example is the study by Lv et al.[97] of H014 binding, covering one, two, and three molecules bound to the S-protein trimer at reasonable resolution with corresponding 1-up, 2-up, and 3-up conformations, which provides important systematic insight into the effect of binding stoichiometry on the S-protein structure. A mutation that escapes one antibody may still be captured by another antibody because of distinct antibody surfaces and associated binding modes. 
This is the molecular basis for the promising use of antibody cocktails in therapy to minimize the potential threat of escape mutations.[93] An illustrative example is the antibody cocktail REGN10933+REGN10987 (casirivimab/imdevimab), which combines two antibodies that bind to different parts of the RBD of the S-protein and thus potentially makes antigenic drift more difficult, because escape would require multiple substitutions.[93] This illustrates well the power of structural biology in providing the basis for interpreting assay data on escape mutations and the effects of different antibodies. Figure 3 illustrates some representative structures of the S-protein associated with ACE2 and antibodies, with emphasis on the different conformations achievable. Antibodies are produced by the immune system to neutralize the virus and mark it for destruction by, e.g., macrophages, and the residues involved in ACE2 binding need not coincide with those interacting with antibodies. Structural alignment of 10 representative ACE2 complexes (Figure 3a) reveals conformational heterogeneity. Structural alignment of the antibody complex 7A29 (Figure 3b) reveals distinct conformations obtainable even when the same antibody binds. Such heterogeneity is also evident from the different conformational states attained in the strongly bound miniprotein complex 7JZL[98] (Figure 3c), where two spike monomers attain similar RBD-up conformations whereas the third adopts an RBD-down conformation. Thus, there is substantial conformational variation in the S-protein even when bound to the same antibodies, probably also affected by temperature, molecular crowding, and pH.[67]

Figure 3

Structural comparison of S-protein–ACE2 and antibody complexes. (a) Structural alignment of the ten spike–ACE2 monomers from PDB structures with high resolution (7DX8, 7DX6, 7DX5, 7DX7, 7KJ4, 7DX3, 7KMZ, 7KMS, 7DX9, and 7KJ2) shows mutual RMSD of 0.35–2.87 Å. (b) Spike–antibody complex 7A29. (c) Miniprotein complex 7JZL. Two spike monomers attained similar RBD-up conformations, and one displayed an RBD-down orientation.

Because the resolution is lower in nearly complete cryo-EM structures, it is also of interest to discuss X-ray crystal structures of smaller parts of the S-protein interacting with antibodies, which have in several cases been obtained at higher resolution and thus provide a more precise structural account of the S-protein–antibody interaction. Illustrative examples of such structures of antibodies interacting with the RBD, with resolutions of 2.5 Å or better, are compiled in Table . In some cases, these structures resolve individual amino acid conformations and interactions in substantially more detail than full cryo-EM structures, but at the expense of including only part of the protein, which may limit their precision as models of the full S-protein. For the discussion here, we emphasize enlightening studies of variant effects on RBD interaction with antibodies (Figure 4). In one notable study, Supasa et al.[99] reported 1.8 and 2.2 Å resolution structures of the RBD bound to antibodies with and without the N501Y mutation at the ACE2-interacting surface. Although the alpha variant does not generally escape natural or monoclonal antibodies or vaccines, it is not easily neutralized by some antibodies, and interaction can occur with the antibody light chain at position 501.[99] The interaction of the COVOX-269 antibody with the S-protein RBD was shown to be affected by the larger tyrosine side chain at position 501 (Figure 4a–c).
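The mutual RMSD values quoted in the Figure 3 caption come from superposing structures. A standard way to compute such a value for two sets of matched atoms (e.g., Cα atoms of the same residues) is the Kabsch algorithm, sketched below with NumPy. The alignment software actually used for Figure 3 is not stated in the text, and the sketch assumes that residues have already been matched between structures with different missing segments.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD (same units as the input) between matched coordinate sets P and Q
    of shape (N, 3) after optimal rigid-body superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    U, _S, Vt = np.linalg.svd(Pc.T @ Qc)
    # Correct for a possible reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


if __name__ == "__main__":
    # Toy check: a rotated and translated copy superposes with RMSD ~0.
    theta = np.pi / 6
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
    P = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
    Q = P @ Rz.T + np.array([5.0, -2.0, 3.0])
    print(kabsch_rmsd(P, Q))
```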

Figure 4

X-ray crystal structures of S-protein RBD complexed with antibodies. (a) Structural alignment of 7NEH and 7NEG (N501Y), both complexed with COVOX-269. (b) Close view of spike N501 (7NEH), with residues of COVOX-269 interacting with N501 highlighted as sticks. (c) Spike N501Y variant, with Y501 and antibody residues interacting with it shown as sticks. (d) Beta variant and (e) gamma variant complexed with the two antibodies COVOX-222 and EY6A. (f) High-resolution structure of the RBD complexed with the cross-neutralizing antibody 7D6.

In another insightful study of the beta variant (lineage B.1.351) structure (7NXA; Figure 4d), three mutations, K417N, E484K, and N501Y, affected binding to the two antibodies COVOX-222 and EY6A, providing a structural basis for the chemical change in interaction and the reduced antibody affinity.[100] Similarly, the published structure of the RBD of gamma (P.1) (7NXB; Figure 4e) comprises the three gamma mutations K417T, E484K, and N501Y, two of which are also present in the beta variant.[100] These mutation sites interact with COVOX-222, whereas EY6A binds to the peripheral region of the RBD (Figure 4d,e). The gamma variant has displayed less resistance than beta to antibodies produced after vaccination or natural infection, indicating possible neutralization effects outside the RBD.[100] As a final example, Figure 4f shows a high-resolution (1.4 Å) RBD structure complexed with the cross-neutralizing antibody 7D6.[101] This antibody binds to a cryptic site distinct from those typically targeted by other antibodies (e.g., Figure 4a–e), and accordingly, the RBD attains a different (closed) conformation compared to the structures shown in Figure 4a–e. The side-chain conformations in these high-resolution crystal structures are determined substantially more precisely, which makes the structures highly complementary to the lower-resolution but more complete cryo-EM structures. 
Because binding to ACE2 and binding to antibodies can affect the conformational states differently, and because the interacting residues generally differ, the effect of a mutation on binding free energy is likely to differ between ACE2 and any specific antibody. The most successful variants may therefore not be those that bind ACE2 most strongly or prominent antibodies most weakly, but those with the largest difference in affinity toward ACE2 and prominent antibodies. Such selectivity is plausibly related to the virion lifetime in the host and could in principle be measured or computed from published affinities of mutants toward antibodies and ACE2,[92−95,102,103] using the structures reviewed in Tables –7 as input for such analysis.
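One simple way to express this selectivity from measured dissociation constants is the change in binding free energy, ΔΔG = RT ln(K_d,mut/K_d,wt), computed separately for ACE2 and for an antibody; the difference ΔΔG_ACE2 − ΔΔG_Ab then measures the shift in selectivity. The K_d values in the example are hypothetical, T = 298.15 K is an assumed measurement temperature, and the function names are illustrative.

```python
import math

R = 1.987204e-3  # gas constant in kcal/(mol K)


def ddG_from_kd(kd_wt, kd_mut, T=298.15):
    """Binding free energy change (kcal/mol) of a mutant relative to wild type;
    positive values mean weaker binding (larger dissociation constant)."""
    return R * T * math.log(kd_mut / kd_wt)


def selectivity_shift(kd_ace2_wt, kd_ace2_mut, kd_ab_wt, kd_ab_mut, T=298.15):
    """ddG_ACE2 - ddG_Ab in kcal/mol; negative values mean the mutant has
    shifted its preference toward ACE2 relative to the antibody."""
    return ddG_from_kd(kd_ace2_wt, kd_ace2_mut, T) - ddG_from_kd(kd_ab_wt, kd_ab_mut, T)


if __name__ == "__main__":
    # Hypothetical K_d values in molar units (not measured data).
    print(selectivity_shift(20e-9, 10e-9, 50e-12, 500e-12))  # tighter ACE2, weaker antibody
    print(selectivity_shift(20e-9, 40e-9, 50e-12, 100e-12))  # both weakened twofold
```

The second case illustrates the earlier point that an "escape" mutation which weakens ACE2 binding by the same factor leaves the selectivity essentially unchanged.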

Evolution of the S-protein in a Structural Context

SARS-CoV-2 is the seventh coronavirus known to infect Homo sapiens, the others being the common-cold-causing HCoV-229E, HCoV-NL63, HCoV-HKU1, and HCoV-OC43, as well as the more pathogenic SARS-CoV-1 and MERS-CoV.[104,105] Whereas the milder 229E and NL63 are α-coronaviruses and OC43 and HKU1 are β-coronaviruses, all three of the most pathogenic viruses are β-coronaviruses with distinct functional properties.[106] Coronaviruses have four main structural proteins: the membrane (M), envelope (E), and nucleocapsid (N) proteins, in addition to the S-protein that is the focus of the present paper.[107] Compared with other RNA viruses, which generally have very high mutation rates, coronaviruses tend to evolve several-fold more slowly because of their proofreading machinery,[108] at rates of the order of 10⁻⁴ substitutions per site per year.[105,109] SARS-CoV-2 is currently evolving substantially faster than this (albeit still slowly compared with RNA viruses lacking proofreading), at an estimated 7 × 10⁻⁴ substitutions per site per year (2 × 10⁻⁶ per day),[8] and has seen a remarkable lineage evolution during the pandemic, as expected from its high prevalence and ongoing adaptation to humans.[9] Since the emergence of SARS-CoV-2, estimated to have occurred in autumn 2019, several hundred recurrent mutations have been identified, 80% of them nonsynonymous changes in the virus proteins,[4] including many in the S-protein.[11] As with the other highly pathogenic coronaviruses MERS-CoV and SARS-CoV,[104] the most closely related lineages are found in bats.[4] The virus is subject to both neutral evolution and positive selection, and both the nonsynonymous[4] and synonymous[110] evolution rates have been found to be high. 
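The per-day figure follows from dividing the annual rate by the number of days in a year, and multiplying by a sequence length gives the expected number of substitutions per year along one lineage. The genome length (29,903 nt for the Wuhan-Hu-1 reference) and spike length (1273 codons) are added here for illustration and are not taken from the text; the calculation assumes a uniform rate across sites, which real genomes do not follow.

```python
# Substitution-rate arithmetic for the estimates quoted above.
RATE_PER_SITE_PER_YEAR = 7e-4    # SARS-CoV-2 estimate cited in the text [8]
TYPICAL_CORONAVIRUS_RATE = 1e-4  # order-of-magnitude value cited in the text
GENOME_LENGTH_NT = 29_903        # Wuhan-Hu-1 reference genome (NC_045512.2)
SPIKE_CDS_NT = 1273 * 3 + 3      # 1273 spike codons plus the stop codon


def per_day(rate_per_year, days_per_year=365.25):
    """Convert a per-site rate from per year to per day."""
    return rate_per_year / days_per_year


def expected_substitutions(rate_per_year, n_sites, years=1.0):
    """Expected number of substitutions accumulating along one lineage,
    assuming the same rate at every site."""
    return rate_per_year * n_sites * years


if __name__ == "__main__":
    print(f"{per_day(RATE_PER_SITE_PER_YEAR):.1e} per site per day")
    print(f"{expected_substitutions(RATE_PER_SITE_PER_YEAR, GENOME_LENGTH_NT):.1f} per genome per year")
    print(f"{expected_substitutions(RATE_PER_SITE_PER_YEAR, SPIKE_CDS_NT):.1f} in the spike gene per year")
    print(f"{RATE_PER_SITE_PER_YEAR / TYPICAL_CORONAVIRUS_RATE:.0f}-fold the typical coronavirus rate")
```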
We expect SARS-CoV-2 to adapt to its human host by accumulating mutations that increase binding to human ACE2 and, under increasing selection pressure from S-protein-based immunity induced by vaccines and therapeutic antibodies, mutations that minimize binding to S-protein antibodies that are frequent in the population.[103,111] Consistent with such adaptation, bat S-proteins do not have the same affinity for human ACE2.[80] These two fitness effects may not act additively; in fact, many mutations could plausibly bind better to both proteins, so that the relative specificity for ACE2 vs prominent antibodies would contribute to the fitness. At the same time, overall S-protein stability is probably a restraining parameter on possible evolution, limiting new groups of mutations to those that do not compromise the overall conformational stability of the protein in the virion lipid surface, consistent with the constraining role of stability seen more generally in protein evolution.[24,30,33,112,113] After the early D614G mutant[114] quickly became dominant, plausibly because of a modest fitness advantage,[69,115] a period of relative calm[9] was followed by a “storm” of new variants emerging during late 2020 and early 2021, with notable examples being alpha (B.1.1.7 lineage, clade 20I/501Y.v1),[116] beta (B.1.351, clade 20H/501Y.v2),[117] gamma (P.1, clade 20J/501Y.v3),[118] and delta (B.1.617.2).[119] These variants are of concern because they carry amino acid substitutions such as E484K that are associated with escape from monoclonal antibodies.[120,121] Importantly, whereas the new variants display reduced sensitivity to some vaccines after the first dose, vaccine effectiveness against symptomatic infection remains high after full dosing, and the most important function of the vaccines is to reduce the severity of the disease.[122,123] Representative S-protein structures of SARS-CoV-2 variants are listed in Table 8.

Table 8

Properties of Some Published Structures of Spike Protein Variants^a

protein	PDB	N	chains	% outl.	res (Å)	RSA_avg	reference
D614G 1-up	7KDJ	979	3	0.2	3.5	0.27	Gobeil et al.[65]
D614G closed	7KDK	972	3	0.0	2.8	0.26	Gobeil et al.[65]
D614G 1-up	7KDL	979	3	0.1	3.0	0.27	Gobeil et al.[65]
D614G closed	7KDI	972	3	0.1	3.3	0.27	Gobeil et al.[65]
D614G 1-up	7KEC	979	3	0.4	3.8	0.26	Gobeil et al.[65]
D614G 1-up	7KEA	979	3	0.2	3.3	0.27	Gobeil et al.[65]
D614G 1-up	7KEB	979	3	0.4	3.5	0.26	Gobeil et al.[65]
D614G 1-up	7KE9	979	3	0.2	3.1	0.27	Gobeil et al.[65]
D614G variant	6XS6	785	3	0.0	3.7	0.29	Yurkovetskiy et al.[69]
D614G variant	7DX1	972	3	0.4	3.1	0.29	Yan et al.[63]
D614G open	7BNN	1074	3	0.0	3.5	0.30	Benton et al.[124]
D614G 2-up	7BNO	1066	3	0.0	4.2	0.30	Benton et al.[124]
Cluster-5 1-up	7LWM	997	3	0.3	2.8	0.27	Gobeil et al.[125]
Cluster-5 1-up	7LWO	997	3	0.3	2.9	0.28	Gobeil et al.[125]
Cluster-5 2-up	7LWP	995	3	0.4	3.0	0.28	Gobeil et al.[125]
Cluster-5 3-d	7LWI	1001	3	0.2	3.1	0.27	Gobeil et al.[125]
Cluster-5 3-d	7LWJ	1001	3	0.2	3.2	0.27	Gobeil et al.[125]
Cluster-5 3-d	7LWK	1001	3	0.2	2.9	0.28	Gobeil et al.[125]
Cluster-5 3-d	7LWL	1001	3	0.2	2.8	0.27	Gobeil et al.[125]
alpha/B.1.1.7 1-up	7LWT	997	3	0.4	3.2	0.28	Gobeil et al.[125]
alpha/B.1.1.7 1-up	7LWU	997	3	0.2	3.2	0.28	Gobeil et al.[125]
alpha/B.1.1.7 1-up	7LWV	997	3	0.2	3.1	0.28	Gobeil et al.[125]
alpha/B.1.1.7 3-d	7LWS	1000	3	0.2	3.2	0.28	Gobeil et al.[125]
alpha/B.1.1.7	7N1X	1096	3	0.6	4.0	0.28	Cai et al.[127]
alpha/B.1.1.7	7N1U	1075	3	0.3	3.1	0.28	Cai et al.[127]
alpha/B.1.1.7	7N1Y	1089	3	0.6	4.3	0.29	Cai et al.[127]
alpha/B.1.1.7	7N1V	1108	3	0.3	3.2	0.27	Cai et al.[127]
alpha/B.1.1.7	7N1W	1096	3	0.5	3.3	0.28	Cai et al.[127]
beta/B.1.351	7N1Q	1115	3	0.6	2.9		Cai et al.[127]
beta/B.1.351	7N1T	1115	3	0.7	3.1	0.27	Cai et al.[127]
beta/B.1.351 2-up	7LYK	996	3	0.3	3.7	0.28	Gobeil et al.[125]
beta closed/3-d	7LYL	1001	3	0.1	3.7	0.27	Gobeil et al.[125]
beta 1-up	7LYN	999	3	0.3	3.3	0.27	Gobeil et al.[125]
gamma	7M8K	1039	3	0.0	N/A	0.28	Wang et al.[229]
gamma + ACE2	7NXC	596	2	0.1	3.1	0.28	Gobeil et al.[125]
epsilon+S2M11,S2L20	7N8H	1038	15	0.0	2.3	0.28	McCallum et al.[126]
Triple mutant	7LWW	998	3	0.3	3.0	0.27	Gobeil et al.[125]

^a PDB = PDB code; N = number of residues in structure; chains = number of chains in structure; res (Å) = resolution in Å; RSAavg = average solvent exposure of all residues; reference = primary citation; % outl. = outliers of Ramachandran plot in % (from PDB full report).

The D614G structure has been elucidated by Yurkovetskiy et al. (6XS6),[69] Gobeil et al.[65] (7KDK etc.), Yan et al.[63] (7DX1), and Benton et al.[124] (7BNN, 7BNO). Yurkovetskiy et al.[69] reported a single D614G structure in an open conformation state that was proposed to be suitable for fusion, although the actual binding affinity to ACE2 was lowered; the authors speculated that this conformational change could explain the fixation of D614G in the population. The open conformation is consistent with the relatively large RSA (0.29), similar to that of the structures of Yan et al.[63] (7DX1, 0.29) and Benton et al.[124] (7BNN, 7BNO, 0.30), and has been supported by molecular dynamics simulations at physiologically relevant temperature.[70] Since then, however, other conformation states have been obtained for D614G. Gobeil et al.[65] determined D614G structures in several more closed conformation states, showing that the variant can clearly form the same closed and open conformation states as the early reference Wuhan variant (structures summarized in Table 8). These closed and 1-up conformations all have RSA values of 0.26–0.27, indicating reduced solvent exposure. Because D614G has been obtained in both open and closed states, and because environmental factors have a delicate impact on the conformation state,[59,67] the tendency toward open conformations could also be affected by the furin-site deletion and proline mutations used by, e.g., Yurkovetskiy et al.[69] Herrera et al. found major heterogeneity in a motif starting at site 614 (614–642), implying pronounced variability in this region.[45] Because the in vivo protein conformational state is sensitive to composition and molecular environment, it is not possible to conclude that one variant prefers one conformation state over another unless the variants are studied under the same conditions with the same protocol.
In addition to D614G, major achievements in the structural biology of variants of concern were reported during summer 2021. Notably, Gobeil et al.[125] published structures of both mink-related (Cluster-5) mutants and the beta variant in several conformation states (e.g., 7LYN in the 1-up conformation), McCallum et al.[126] published the epsilon variant structure bound to two distinct antibodies (S2M11 and S2L20; 7N8H in Table 8), and Cai et al. published structures and antibody escape data for alpha and beta (7N1Q, 7N1T, 7N1U, 7N1V, 7N1W, 7N1X, and 7N1Y).[127] All of these structures are of good resolution with excellent metrics and form an essential basis for understanding SARS-CoV-2 evolution, consistent with this Review’s emphasis on three selection pressures: one to preserve or improve protein stability, one toward enhanced binding to ACE2, and one toward reduced affinity toward antibodies. Each mutation is likely to contribute differently to these three terms and thus to the overall fitness effect. 
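The caution against cross-study comparisons can be made concrete with the D614G entries of Table 8: grouping the RSA_avg values by study shows that the spread between studies is larger than the spread within the Gobeil et al. series, so an apparent preference for open states cannot be separated from protocol differences. The grouping and spread measure below are a simple illustration, not an analysis performed in the original studies.

```python
from statistics import mean

# D614G RSA_avg values from Table 8, grouped by study.
D614G_RSA = {
    "Gobeil et al. [65]": [0.27, 0.26, 0.27, 0.27, 0.26, 0.27, 0.26, 0.27],
    "Yurkovetskiy et al. [69]": [0.29],
    "Yan et al. [63]": [0.29],
    "Benton et al. [124]": [0.30, 0.30],
}


def study_means(groups):
    """Mean RSA_avg per study."""
    return {study: mean(values) for study, values in groups.items()}


def within_and_between_spread(groups):
    """Largest within-study range and the range of the study means."""
    within = max(max(values) - min(values) for values in groups.values())
    means = study_means(groups).values()
    return within, max(means) - min(means)


if __name__ == "__main__":
    for study, m in study_means(D614G_RSA).items():
        print(f"{study}: {m:.3f}")
    within, between = within_and_between_spread(D614G_RSA)
    print(f"largest within-study range {within:.3f}; range of study means {between:.3f}")
```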
Taking the wider perspective of other human-host coronaviruses (Table ) and other spike protein structures (Table ),[128] several mutations in the SARS-CoV-2 RBD have contributed to a stronger affinity toward ACE2.[47] Indeed, by far the most sequence variation between SARS-CoV-1 and SARS-CoV-2 occurs in the S1 domain that includes the RBD.[107,129] In addition, the highly positively charged amino acid sequence RRAR represents a new furin-like cleavage site that is not seen in other related β-coronaviruses.[80,130] Many sequence parts of the RBD of these viruses show similarity both to nonhuman but also human protein motifs, which may help to understand protein–protein interactions with the S-protein more broadly.[131]

Table 9

Properties of Structures of Spike Proteins of Other Human Coronaviruses^a

protein	PDB	N	chains	% outl.	res (Å)	RSA_avg	reference
MERS-CoV	5W9K	1216	12	0.6	4.6	0.29	Pallesen et al.[36]
	5W9I	1006	12	0.9	3.6	0.28	Pallesen et al.[36]
	5X5C	1141	3	0.3	4.1	0.26	Yuan et al.[37]
	5X5F	1141	3	0.5	4.2	0.26	Yuan et al.[37]
	5X59	1141	3	0.3	3.7	0.27	Yuan et al.[37]
	6Q04	1159	3	0.2	2.5	0.25	Park et al.[232]
	6Q05	1159	3	0.2	2.8	0.25	Park et al.[232]
	6Q06	1159	3	0.2	2.7	0.25	Park et al.[232]
	6Q07	1159	3	0.2	2.9	0.25	Park et al.[232]
SARS-CoV	6CRV	881	3	0.1	3.2	0.28	Kirchdoerfer[234]
	6CRX	1069	3	0.1	3.9	0.28	Kirchdoerfer[234]
	6CRW	1069	3	0.1	3.9	0.27	Kirchdoerfer[234]
	6CRZ	1071	3	0.0	3.3	0.28	Kirchdoerfer[234]
	6CS1	1069	3	0.0	4.6	0.29	Kirchdoerfer[234]
	6CS0	1071	3	0.0	3.8	0.28	Kirchdoerfer[234]
	6CS2	1092	4	0.0	4.4	0.29	Kirchdoerfer[234]
	5X5B	1054	3	0.0	3.7	0.26	Yuan et al.[37]
	5X58	1054	3	0.0	3.2	0.26	Yuan et al.[37]
	5XLR	1022	3	0.0	3.8	0.28	Gui et al.[233]
ACE2+SARS-CoV	6ACK	1069	4	0.0	4.5	0.27	Song et al.[235]
	6ACJ	1069	4	0.1	4.2	0.28	Song et al.[235]
	6ACC	1065	3	0.0	3.6	0.27	Song et al.[235]
	6ACD	1065	3	0.0	3.9	0.28	Song et al.[235]
	6ACG	1069	4	0.0	5.4	0.29	Song et al.[235]
OC43	6NZK	1175	3	0.1	2.8	0.25	Tortorici et al.[132]
	6OHW	1175	3	0.1	2.9	0.26	Tortorici et al.[132]
229E	6U7H	965	3	0.0	3.1	0.27	Li et al.[236]
HKU1	5I08	958	3	1.3	4.0	0.30	Kirchdoerfer[35]
HKU2	6M15	965	3	0.0	2.4	0.27	Yu et al.[237]

Table 10

Properties of Some Structures of Spike Proteins of Other Viruses^a

protein	PDB	N	chains	% outl.	res (Å)	RSA_avg	reference
porcine SADS-CoV	6M16	965	3	0.0	2.8	0.28	Yu et al.[237]
porcine SADS-CoV	6M39	937	3	0.0	3.6	0.26	Guan et al.[246]
porcine PDCoV	6B7N	966	3	0.1	3.3	0.25	Shang et al.[247]
porcine PDCoV	6BFU	964	3	0.1	3.5	0.24	Xiong et al.[248]
PEDV	6VV5	1097	3	0.2	3.5	0.27	Kirchdoerfer[249]
PEDV	6U7K	1064	3	0.1	3.1	0.26	Wrapp et al.[250]
avian bronchitis	6CV0	993	3	0.0	3.9	0.29	Shang et al.[251]
bat RaTG13	7CN4	1120	3	0.1	2.9	0.26	Zhang et al.[252]
pangolin PCoV_GX	7CN8	1125	3	0.1	2.5	0.25	Zhang et al.[252]
bat virus RaTG13	6ZGF	1060	3	0.0	3.1	0.27	Wrobel et al.[80]
mouse (MHV)	6VSJ	1122	6	0.0	3.9	0.27	Shang et al.[253]
mouse (MHV)	3JCL	1067	3	0.4	4.0	0.27	Walls et al.[254]
FIPV	6JX7	1245	3	0.0	3.3	0.29	Yang et al.[255]
Guangdong pangolin	7BBH	1063	3	0.0	2.9	0.27	Wrobel et al.[256]

Figure 5

Electrostatic potential surface comparisons of the S1-CTD/RBD domain of three coronaviruses: (a) SARS-CoV-2, (b) SARS-CoV, and (c) MERS-CoV. Blue surfaces indicate positive electrostatic potential (excess positive charge), whereas red surfaces indicate negative potential (excess negative charge).

As shown in Figure 6, electrostatic potential analysis indicates a strongly negative potential on the surface of ACE2 facing the area that interacts with the S-protein. Because prominent mutations of concern introduce positive charge precisely at this interface, it is plausible that they generate favorable electrostatic interactions between the RBD of the S-protein and ACE2, which could enhance the ability of the virus to fuse with the host cell. Further studies of the electrostatic modulation of the S-protein surface therefore seem warranted.
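A first-order way to see which mutations of concern add positive charge is to count side-chain charge changes. The sketch below assumes Lys/Arg = +1, Asp/Glu = −1, and all other residues (including His) neutral, which is only approximate near physiological pH. It then multiplies the charge change by an assumed fixed local potential φ in units of kT/e, a crude estimate that ignores desolvation and structural relaxation; φ = −1 kT/e is a hypothetical value, not one read from the potential maps.

```python
# Approximate side-chain charges near physiological pH (His treated as neutral).
SIDE_CHAIN_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}
R = 1.987204e-3  # gas constant in kcal/(mol K)


def charge_change(mutation):
    """Net side-chain charge change for a point substitution written like 'E484K'."""
    wt, pos, mut = mutation[0], mutation[1:-1], mutation[-1]
    if not (wt.isalpha() and mut.isalpha() and pos.isdigit()):
        raise ValueError(f"not a point substitution: {mutation!r}")
    return SIDE_CHAIN_CHARGE.get(mut, 0) - SIDE_CHAIN_CHARGE.get(wt, 0)


def electrostatic_estimate(dq, phi_kT_per_e, T=310.15):
    """Crude interaction-energy change (kcal/mol) of charge dq placed in a fixed
    potential phi given in kT/e: dq * phi * kT."""
    return dq * phi_kT_per_e * R * T


if __name__ == "__main__":
    for m in ["E484K", "E484Q", "L452R", "T478K", "K417N", "N501Y", "P681R", "P681H"]:
        dq = charge_change(m)
        print(m, dq, round(electrostatic_estimate(dq, -1.0), 2))
```

Deletions such as 69del are rejected rather than silently scored as neutral.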

Figure 6

ACE2-spike complex of SARS-CoV-2 (PDB code 7KMS). RBD region 334–527 is shown, together with electrostatic potential maps of ACE2 and RBD; high negative potential (red) on the ACE2 may interact with positive charge on RBD and affect SARS-CoV-2 fusion.

Natural Mutations of Concern

In the following, we take a closer look at prominent natural mutations, with relevant structures summarized in Table . Variants of interest and concern, summarized in Table 11, harbor mutations that are likely to increase transmission, e.g., by causing the S-protein to bind more strongly to ACE2 or by evading binding by human antibodies, including vaccine-induced ones.[103,111,141,142] Among these mutations, deletions have been associated with a substantial tendency toward escape.[6] Most mutations of concern are, however, missense substitutions in which one amino acid is replaced by another.[4,11]

Table 11

Notable SARS-CoV-2 Variants and Their S-Protein Mutations

WHO name	Pango lineage name	transmission potential / ACE2 affinity	escape mutations	S-protein mutations
alpha	B.1.1.7	normal;[116] higher ACE2 affinity[81]		69del, 70del, 144del, N501Y, A570D, D614G, P681H, T716I, S982A, D1118H
beta	B.1.351	high;[257] higher ACE2 affinity[81]	E484K,[38,150] K417N,[91,120] full variant[120,258]	L18F, D80A, D215G, K417N, E484K, N501Y, D614G, A701V
gamma	P.1	high[118]	E484K,[38,150] K417T, full variant[229]	L18F, T20N, P26S, D138Y, R190S, K417T, E484K, N501Y, D614G, H655Y, T1027I, V1176F
delta	B.1.617.2	high[259,260]	L452R,[126,148] P681R,[94] full variant[261]	T19R, E156del, F157del, R158G, L452R, T478K, D614G, P681R, D950N
epsilon	B.1.427, B.1.429	normal (+20%)[262]	L452R,[126,148] S13I/W152C,[126] full variant[126,263]	S13I, W152C, L452R, D614G
eta	B.1.525		E484K[38,150]	A67V, 69del, 70del, 144del, E484K, D614G, Q677H, F888L
iota	B.1.526		E484K[38,150]	L5F, T95I, D253G, S477N, E484K, D614G, A701V
kappa	B.1.617.1		L452R,[126,148] E484Q,[264] full variant[265]	G142D, E154K, L452R, E484Q, D614G, P681R, Q1071H

Some properties of mutations in the main natural variants are listed in Figure a: the change in polarity of the amino acid (measured on the Grantham scale[143]), the change in side-chain volume, ΔV, and the average relative solvent accessibility (RSA) of the site over eight structures (7LXY, 6XM0, 6ZB4, 7DDD, 6ZGE, 6VXX, 6VYB, and 7DWY). These properties represent three of the five main factors used to quantify protein-difference metrics and substitution tendencies (treating hydrophobicity and polarity as inversely related; the remaining two are secondary structure and codon usage),[144] and they are important for the general fold stability of protein structures.[145] As seen, the polarity and volume changes appear randomly distributed, as expected for neutral mutations in which no chemical property is under selection. On average, the mutations occur at sites of typical exposure (∼0.25–0.30, Table ). However, sites of concern such as E484 and P681 are more exposed than typical mutated sites, consistent with selection for interaction with other proteins.

Figure 7

Chemical properties of prominent natural S-protein mutations. (a) Change in polarity/hydrophobicity (ΔH, normalized Grantham polarity scale), change in amino acid side-chain volume (ΔV), and relative solvent accessibility (RSA) of the mutated site. (b) Normalized Grantham[143] polarity change vs side-chain volume change for prominent natural S-protein mutations. (c) Solvent accessibility (FreeSASA/Naccess[75,76]) vs computationally estimated stability effect (Simba;[145] in kcal/mol) of prominent natural mutations. Large blue spheres represent the average values for all possible mutations in the S-protein, calculated based on the 7DWY structure.

Figure b displays the change in polarity and volume of the same natural mutations, showing that they scatter fairly randomly around the average change for all possible mutations in the protein (these properties depend only on the amino acids involved and are thus structure-independent). We note that, except for E484K and E484Q, mutations of concern tend to lie farther from the centroid, i.e., they have larger-than-average changes in volume and polarity, consistent with a non-neutral chemical effect on protein function, e.g., on association with ACE2 and antibodies. D614G is itself an unusual mutation, involving both a change in charge and the introduction of glycine (the smallest amino acid, and sometimes structure-breaking), yet it became fixed early and was probably associated with an evolutionary advantage. N501Y, which is characteristic of the alpha, beta, and gamma variants, replaces a small polar asparagine with an aromatic tyrosine and increases binding to ACE2.[102] Changes to arginine (P681R, L452R) are also conspicuous.
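Because the polarity and volume changes in Figure 7b depend only on the amino acids involved, they can be tabulated directly. The sketch below uses Grantham's (1974) polarity and volume values; the min–max normalization over the 20 amino acids, the helper names (`chemical_change`, `centroid_of_all_substitutions`, `distance_from_centroid`), and the demonstration input are our assumptions and may differ from the normalization used for the figure. The centroid is taken over all 19 possible substitutions at every position of a given sequence, mirroring the "all possible mutations" reference; for a sequence with equal amino acid composition it falls at the origin, and passing the actual S-protein sequence would give a protein-specific reference point.

```python
import math
from typing import Tuple

# Grantham (1974) polarity (p) and side-chain volume (v, cubic Angstrom) values.
GRANTHAM_POLARITY = {
    "A": 8.1, "R": 10.5, "N": 11.6, "D": 13.0, "C": 5.5, "Q": 10.5, "E": 12.3,
    "G": 9.0, "H": 10.4, "I": 5.2, "L": 4.9, "K": 11.3, "M": 5.7, "F": 5.2,
    "P": 8.0, "S": 9.2, "T": 8.6, "W": 5.4, "Y": 6.2, "V": 5.9,
}
GRANTHAM_VOLUME = {
    "A": 31.0, "R": 124.0, "N": 56.0, "D": 54.0, "C": 55.0, "Q": 85.0, "E": 83.0,
    "G": 3.0, "H": 96.0, "I": 111.0, "L": 111.0, "K": 119.0, "M": 105.0, "F": 132.0,
    "P": 32.5, "S": 32.0, "T": 61.0, "W": 170.0, "Y": 136.0, "V": 84.0,
}

# Assumption: normalize each change by the full range of its scale.
P_RANGE = max(GRANTHAM_POLARITY.values()) - min(GRANTHAM_POLARITY.values())
V_RANGE = max(GRANTHAM_VOLUME.values()) - min(GRANTHAM_VOLUME.values())


def chemical_change(wt: str, mut: str) -> Tuple[float, float]:
    """Normalized (delta polarity, delta volume) for a wt -> mut substitution."""
    d_pol = (GRANTHAM_POLARITY[mut] - GRANTHAM_POLARITY[wt]) / P_RANGE
    d_vol = (GRANTHAM_VOLUME[mut] - GRANTHAM_VOLUME[wt]) / V_RANGE
    return d_pol, d_vol


def centroid_of_all_substitutions(sequence: str) -> Tuple[float, float]:
    """Mean change over all 19 substitutions at every position of `sequence`."""
    changes = [chemical_change(wt, mut)
               for wt in sequence
               for mut in GRANTHAM_POLARITY if mut != wt]
    n = len(changes)
    return (sum(c[0] for c in changes) / n, sum(c[1] for c in changes) / n)


def distance_from_centroid(mutation: str, centroid: Tuple[float, float]) -> float:
    """Euclidean distance of a mutation such as 'N501Y' from the centroid."""
    d_pol, d_vol = chemical_change(mutation[0], mutation[-1])
    return math.hypot(d_pol - centroid[0], d_vol - centroid[1])


# Demonstration input with equal composition; substitute the real S-protein sequence.
center = centroid_of_all_substitutions("ACDEFGHIKLMNPQRSTVWY")
for m in ["N501Y", "L452R", "P681R", "D614G", "E484K", "E484Q"]:
    print(m, round(distance_from_centroid(m, center), 3))
```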
Larger chemical changes are typically less likely to occur in protein evolution and are more often associated with non-neutral evolution.[146,147] Indeed, L452R is a known antibody escape mutation.[126,148] In contrast, mutations at weakly exposed sites with typical changes in chemical properties are expected (but not guaranteed) to be neutral. Mutations E484K and E484Q do not drastically change polarity or size but instead change the formal charge of a very exposed site. Many of the most prominent mutations change the charge of exposed residues in the S-protein. Of the 33 mutations listed in Figure a, 18 change the charge relative to the reference S-protein: D80A, D138Y, G142D, E154K, R190S, D215G, R246I, K417N/K417T, N439K, N440K, L452R, T478K, E484K/E484Q, A570D, D614G, and P681R. Twelve of these make the surface charge more positive, most notably the early D614G substitution, the introduction of lysine (K) at positions 439 or 440 (N439K, N440K), the gain of arginine at position 452 (L452R), and the loss of glutamate at position 484 (E484K/E484Q), whereas the remaining six make it more negative. Functional effects have been documented for three of these positive charge introductions: N439K,[149] E484K,[38,150] and L452R.[126,148] The delta variant has two positive charge introductions in the RBD, L452R and T478K.[8] Interestingly, mutations that remove the positive charge at position 417 (K417N/T) tend to reduce ACE2 affinity;[102,149] together, these data could suggest that positive charge in the RBD is a main adaptation to the interaction with human ACE2. The impact of these mutations on S-protein stability can also be estimated using the available structures and computational methods.
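The charge bookkeeping in the preceding paragraph can be reproduced in a few lines. This sketch assumes integer formal side-chain charges at neutral pH (Asp/Glu −1, Lys/Arg +1, all others 0, with His treated as neutral), which is a simplification; the list contains the 18 charge-changing substitutions named in the text, and `charge_change` is a hypothetical helper name.

```python
# Assumed formal side-chain charges at pH ~7 (His treated as neutral).
FORMAL_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def charge_change(mutation: str) -> int:
    """Change in side-chain formal charge for a substitution written like 'E484K'."""
    wt, mut = mutation[0], mutation[-1]
    return FORMAL_CHARGE.get(mut, 0) - FORMAL_CHARGE.get(wt, 0)


charge_changing = [
    "D80A", "D138Y", "G142D", "E154K", "R190S", "D215G", "R246I",
    "K417N", "K417T", "N439K", "N440K", "L452R", "T478K",
    "E484K", "E484Q", "A570D", "D614G", "P681R",
]

more_positive = [m for m in charge_changing if charge_change(m) > 0]
more_negative = [m for m in charge_changing if charge_change(m) < 0]
print(len(charge_changing), len(more_positive), len(more_negative))  # 18 12 6
```

Note that E484K counts as +2 (−1 to +1) while E484Q counts as +1 (−1 to 0), which is why E484K is the larger electrostatic perturbation of the two.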
Fold stability (the free energy of folding the protein) is the main trade-off in the evolution of new protein functionality[30,31,113,151−153] and also plays a role in many diseases driven by point mutations.[154−156] Stability calculations should therefore be of substantial interest, as they may reveal whether fold stability of the S-protein is a desirable trait in the human host or whether it is relatively easily traded for improved function (ACE2 interaction, antibody evasion). Many programs for this purpose exist, based on different machine-learning or energy-based force fields.[157−165] To illustrate this point, we computed the change in folding free energy (in kcal/mol) with the recently developed method Simba.[145] All these methods are subject to inherent limitations,[166−169] so the results should be considered only as estimates; however, unlike many machine-learning methods, Simba provides interpretable, residue-specific chemical contributions to stability at generally similar accuracy.[145] These free energy changes are plotted against the RSA of each mutated site in Figure c. The mutations in variants of concern are notably less destabilizing and notably more solvent-exposed than the "average" mutation in the S-protein. Evolution is often expected to be faster at the surface, owing to weaker functional selection pressure and a smaller risk of disturbing the protein structure.[77,170,171] Accordingly, even though the functionality of the S-protein manifests at its surface, the relation in Figure c is not surprising by itself. However, it is remarkable, and a possible signature of positive selection, that so many of the prominent natural mutations lie in the upper right rather than the lower left along this trend line, i.e., that they tend to be more solvent-exposed and to impair S-protein stability less than expected for a random mutation (estimated at −1.2 kcal/mol), with several mutations even predicted to be stabilizing; such considerations, however, require more detailed analysis of evolution rates.
Recent nanomechanical stress studies have indicated that the S-protein has evolved enhanced robustness.[172] A particularly concerning S-protein mutation, E484K, seen in, e.g., beta (B.1.351) and gamma (P.1),[117] is known to evade antibodies[39] and has been reported to show about 10-fold higher escape than the reference variant.[120] Immune plasma from individuals infected with early variants of the virus may be less effective against variants harboring E484K, so new antibody cocktails are desirable.[39] The surface potential of the reference variant is already markedly more positive than those of SARS-CoV and MERS-CoV (Figure ), and E484K adds further positive charge, which could suggest an evolutionary advantage of positive surface charge related to ACE2 interaction and antibody escape. Because full charges contribute the largest electrostatic effects on interaction energies,[134,135] such charge changes may also be expected to perturb interactions with antibodies substantially. Other exposed charge-changing mutations of note include K417T and K417N, seen in gamma (P.1) and beta (B.1.351), respectively.
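The "upper right rather than lower left" argument amounts to asking whether a natural mutation lies above the trend line of stability effect vs RSA fitted over all possible mutations. A minimal sketch, assuming a plain least-squares line and the sign convention used in the text (negative ΔΔG is destabilizing); the helper names and all numbers are hypothetical placeholders, not Simba output.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual(rsa: float, ddg: float, intercept: float, slope: float) -> float:
    """Positive residual: less destabilizing than expected at this solvent exposure."""
    return ddg - (intercept + slope * rsa)


# Hypothetical (RSA, ddG in kcal/mol) pairs standing in for all possible mutations.
background = [(0.0, -2.5), (0.1, -2.1), (0.2, -1.6), (0.4, -0.9), (0.6, -0.4), (0.8, 0.1)]
b0, b1 = fit_line([r for r, _ in background], [g for _, g in background])

# Hypothetical values for two natural mutations.
for name, rsa, ddg in [("mutation A", 0.55, 0.2), ("mutation B", 0.15, -2.4)]:
    print(name, round(residual(rsa, ddg, b0, b1), 2))
```

A consistent excess of positive residuals among natural mutations would correspond to the pattern described in the text; whether that excess is statistically meaningful would still require comparison with evolution-rate expectations.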
K417N, like E484K, reduces binding by antibodies derived from the IGHV3-53/3-66 and IGHV1-2 germline genes.[91] N439K, a relatively common mutation, reduces binding of some antibodies,[148,149] as does L452R, seen in lineage B.1.617 (which includes the sublineage delta/B.1.617.2) and in B.1.429 (epsilon).[88,126,148] Finally, it is important to stress that the effects of single amino acid changes, as analyzed above and in the single-mutant scan assays for ACE2 binding and antibody binding,[88,94] are not expected to combine completely linearly, owing to correlations between amino acid sites (within-protein epistasis) and possibly epistasis between genes.[173−176] This point is very important in the context of SARS-CoV-2 variants.[9] The currently circulating variants are all multisite mutants, i.e., their virulence, transmission, or antibody evasion cannot simply be assumed to be a linear function of the individual amino acid effects. Influenza evolution has been shown to depend strongly on epistasis in relation to maintaining protein stability,[177] and such structure–function trade-offs could plausibly also explain the apparent avoidance of destabilization by many SARS-CoV-2 mutations in Figure c. This should be kept in mind in the context of Table , where some data are available both for the full variant and for individual mutations.
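For two substitutions A and B, the non-additivity described above is commonly quantified as a pairwise epistasis term. It is written here for free energy changes relative to the reference protein (folding or binding); this is a standard definition shown for illustration, not a quantity computed in this work:

```latex
\varepsilon_{AB} = \Delta\Delta G_{AB} - \left( \Delta\Delta G_{A} + \Delta\Delta G_{B} \right)
```

An additive (linear) model assumes $\varepsilon_{AB} = 0$. A variant carrying many mutations can accumulate such terms for every interacting pair (and at higher orders), so the effect of the full variant need not equal the sum of its single-mutant effects.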

Structural Effects on Mutation Analysis

In the analysis in Figure , we used averages over eight structures (7LXY, 6XM0, 6ZB4, 7DDD, 6ZGE, 6VXX, 6VYB, and 7DWY) to calculate the RSA and stability effects. We did so because most structural heterogeneity in S-protein structures occurs at the protein surface, which is generally less ordered than the buried parts of the protein, and structural heterogeneity is known to affect computed protein energies.[167,178] This is seen directly in the different RSA values reported for the same residue in different structures. Because the local environment of a residue is very important for deducing its properties and role, this concern is relevant to any theoretical analysis of mutation structure–function relationships. To further quantify the structural heterogeneity of the S-protein surface sites, we calculated the RSA of the 33 mutation sites discussed above using ten different apo-S-protein structures (6XM0, 6VYB, 7DDD, 6ZGE, 6VXX, 7DWY, 6X6P, 6ZOX, 7CAB, and 6XF5) produced by different groups (Figure a; note that not all sites are resolved in all the structures, and S13 and P681 are not resolved in any of them). The standard deviation in RSA was 0.12, i.e., we expect the solvent exposure of a specific site to carry an uncertainty of roughly 10% on average on the 0–100% exposure scale, although larger variations commonly occur. For example, site D614, where the glycine substitution became fixed early on, is notoriously heterogeneous, with an RSA range of 0.7 that is not due to outliers: four structures fall in the range 0–0.1 and four exceed 0.4. This major site heterogeneity is consistent with the previous analysis by Herrera et al., who found substantial conformational variation starting at position 614,[45] and with the evident conformational differences between published D614G structures[65,69,70,124] (Table ).
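Averaging a site's RSA over several structures, and reporting its spread, is straightforward once per-structure RSA values are available (e.g., from FreeSASA or Naccess). The sketch below uses the K417 values quoted for 7DDD, 7DWY, 6XM0, and 6VYB in this section; a structure in which the site is not resolved is represented as `None` and skipped (the entry named `hypothetical_unresolved` is an illustrative placeholder, not real data). How the reported 0.12 was pooled across sites is not stated, so the sketch shows only per-site statistics, and using the sample standard deviation is our assumption.

```python
import statistics
from typing import Dict, Optional

# RSA of site K417 as quoted in the text; None marks a structure without this site.
rsa_k417: Dict[str, Optional[float]] = {
    "7DDD": 0.01,
    "7DWY": 0.07,
    "6XM0": 0.33,
    "6VYB": 0.41,
    "hypothetical_unresolved": None,  # illustrative placeholder, not real data
}


def summarize_site(rsa_by_structure: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Count, mean, sample standard deviation, and range of a site's RSA across structures."""
    values = [v for v in rsa_by_structure.values() if v is not None]
    if not values:
        raise ValueError("site is not resolved in any structure")
    return {
        "n": float(len(values)),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "range": max(values) - min(values),
    }


summary = summarize_site(rsa_k417)
print({key: round(value, 3) for key, value in summary.items()})
```

For K417 this gives a mean of about 0.2 with a range of 0.4, i.e., the site would be classified as buried or exposed depending on which single structure is chosen.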

Figure 8

Structural heterogeneity in apo-S-protein structures. (a) RSA of 33 sites known to mutate, compared across different structures. (b) Structural alignment of 10 apo-S-proteins (6XM0, 6VYB, 7DDD, 6ZGE, 6VXX, 7DWY, 6X6P, 6ZOX, 7CAB, and 6XF5) with mutual RMSD values in the range 0.5–3.6 Å.

As another example of site heterogeneity, the important site K417 (with mutations K417N and K417T, which reduce antibody binding[91]) has RSA = 0.01 in 7DDD and 0.07 in 7DWY (an essentially buried site), but 0.33 in 6XM0 and 0.41 in 6VYB, even though all these structures are apo-S proteins of the same chemical composition. Any analysis that draws conclusions from the local environment of these important sites, e.g., computational estimates of ACE2 binding and stability, will be very sensitive to this heterogeneity. Large heterogeneity is also seen in the N-terminal part of the S-protein, e.g., at D80 (D80A) and D138 (D138Y), which are highly exposed in 6XM0 and 6VXX (RSA = 0.52–0.74) but almost fully buried in 7DDD (0.03–0.17) and 6ZGE (0.11–0.13). Part of this reflects the different open and closed conformation states that the structures represent, but a large part can be considered experimental noise, since 6VXX, 6XM0, and 6VXX all represent closed conformation states. Because of the large heterogeneity of some sites and the known conformational sensitivity of the protein,[41,67] we strongly recommend accounting for the heterogeneity at the site of interest before deducing a mutation's potential effects. The corresponding standard deviation in the estimated free energy effect of each mutation was 0.2 kcal/mol, which quantifies the structural heterogeneity in more chemically meaningful terms (many structural variations occur in disordered, low-energy modes, so purely structural measures may overemphasize differences). This variation does not change the overall conclusion from Figure c that most mutations of concern tend to be solvent-exposed and probably less destabilizing than typical random mutations in the protein.
However, the structural heterogeneity of published cryo-electron microscopy structures also reflects the real conformational plasticity of the protein, which is a functional requirement for its conversion from the metastable prefusion state.[34] Conformational plasticity in the prefusion state may even confer an evolutionary advantage by dynamically varying the epitopes accessible to antibodies; although this remains to be investigated, such "conformational masking" has been observed in other viruses, e.g., HIV.[179] As a further indication of structural heterogeneity, we aligned 10 apo-S-protein structures (6XM0, 6VYB, 7DDD, 6ZGE, 6VXX, 7DWY, 6X6P, 6ZOX, 7CAB, and 6XF5). Their pairwise RMSDs ranged from 0.5 to 3.6 Å, from close resemblance to substantial differences (Figure b). Thus, the overall structural heterogeneity is as large among the apo-S structures as among the different ACE2 complexes (Figure a), i.e., the higher plasticity of the apo state produces as much heterogeneity as the different conformations induced by ACE2 binding. We conclude that subtle differences in experimental conditions may lead to distinct local conformations even for the same protein state, consistent with the conformational sensitivity implied by temperature- and pH-dependent effects;[59,67] in addition, cooling may affect structure and dynamics in ways that underemphasize entropic states of lipid-embedded proteins.[54]
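Pairwise RMSD values such as those for the 10 apo-S structures are typically obtained after optimal rigid-body superposition of matched atoms. A minimal sketch of the Kabsch algorithm with NumPy, assuming the two coordinate arrays already contain the same atoms (e.g., matched Cα atoms of residues resolved in both structures) in the same order; residue matching and PDB parsing are omitted, `kabsch_rmsd` is a hypothetical helper name, and this is not necessarily the alignment procedure used for Figure 8b.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal superposition."""
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched atoms")
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c  # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against improper rotations
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Self-check: a rotated and translated copy should superimpose with RMSD ~ 0.
rng = np.random.default_rng(0)
coords = rng.normal(size=(50, 3)) * 10.0
theta = np.deg2rad(30.0)
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
moved = coords @ rz.T + np.array([5.0, -3.0, 2.0])
print(kabsch_rmsd(coords, moved))
```

The reflection guard matters for mirror-like inputs: without it the SVD can return an improper rotation and report an artificially low RMSD.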

Conclusions and Future Perspectives

SARS-CoV-2 is likely to become endemic and evolve into a milder virus owing to intense efforts in clinical treatment and vaccination, yet continued protein evolution, especially of the S-protein, will remain a concern to be monitored and countered.[180−182] The structural biology of the S-protein is a cornerstone of this future, serving as the structural basis for rationalizing and predicting the impacts of new mutations and designing treatments against them. We have summarized the current state of the art of S-protein structures of SARS-CoV-2, emphasizing the importance of the completeness of the structures, their resolution, the conformations they represent, and the structural heterogeneity and plasticity, especially of functionally important surface residues, together with its relation to evolutionary and functional analysis. While many of the published structures have very good (∼3 Å) resolution, allowing secondary structure to be well accounted for, surface residue conformations are generally not resolved at 3 Å, which leads to substantial structural heterogeneity at surface sites; unfortunately, these are the sites of main functional interest in the S-protein. Even for protein structures supposedly representing the same conformational state, individual mutation sites can be situated in quite different environments, with solvent exposure varying by up to 50% in some cases. These differences are consistent with a general tendency toward conformational plasticity in the S-protein,[46,82] which is also affected by temperature and pH.[59,67] Because of this heterogeneity, we recommend using averages over a group of high-quality protein structures rather than a single structure before reaching conclusions on specific structure–function relationships.
The most important task for the future management of SARS-CoV-2 is arguably to predict new virus evolution and counter it, e.g., by annual mutation-optimized boosters, to minimize mortality in the endemic state, as many countries already attempt for influenza.[183−186] A large part of the antigenic drift in the endemic state is also likely to occur in the S-protein, making the structures of the S-protein and its mutations critical to rationally predicting and understanding this evolution.[180] We expect SARS-CoV-2 to adapt by accumulating mutations that optimize binding to human ACE2 under selection from S-protein-based immunity induced by vaccines and therapeutic antibodies, which favors mutations that minimize binding by S-protein antibodies common in the population.[103,111] Indeed, the structures suggest that large chemical changes, notably affecting exposed surface charges, recur in the mutations of concern, possibly reflecting non-neutral effects on ACE2 or antibody interaction. These two fitness contributions may not be additive: many mutations may increase binding to both ACE2 and antibodies, in which case the relative specificity for ACE2 vs prominent antibodies would determine fitness. We also expect overall protein stability to constrain this evolution: S-protein mutations may be less destabilizing than randomly expected, consistent with a selection pressure on the S-protein not to lose overall stability while adapting to the human host. These various selection pressures can be used to predict the adaptive part of virus evolution, including antigenic drift, but not the random, neutral drift.
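The selectivity argument can be phrased in free-energy terms. For one binding partner, a mutation changes the binding free energy by ΔΔG = RT ln(K_d,mut / K_d,wt), and what matters for the argument is the difference between the ACE2 and antibody terms rather than either one alone. A minimal sketch with hypothetical dissociation constants (not measured values); the helper names and the 298.15 K default temperature are assumptions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_binding(kd_wt: float, kd_mut: float, temperature: float = 298.15) -> float:
    """Binding free energy change (kcal/mol); negative means tighter binding after mutation."""
    return R_KCAL * temperature * math.log(kd_mut / kd_wt)


def selectivity_shift(kd_ace2_wt: float, kd_ace2_mut: float,
                      kd_ab_wt: float, kd_ab_mut: float,
                      temperature: float = 298.15) -> float:
    """Negative values: the mutation shifts preference toward ACE2 relative to the antibody."""
    return (ddg_binding(kd_ace2_wt, kd_ace2_mut, temperature)
            - ddg_binding(kd_ab_wt, kd_ab_mut, temperature))


# Hypothetical mutation that tightens BOTH ACE2 binding (2-fold) and antibody binding (1.5-fold).
print(round(selectivity_shift(20e-9, 10e-9, 1e-9, 1e-9 / 1.5), 3))
```

In this hypothetical case the mutation improves binding to both partners, yet the net shift is still toward ACE2, which is the kind of situation in which considering ACE2 affinity or antibody escape separately would be misleading.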
We envision that the many experimental data on mutation effects on ACE2 binding and antibody escape, together with thermodynamic stability estimates (experimental and computational) and features of each mutation in its structural context, will be used to train computational models that can rationally predict the fitness effects of newly arising mutations in the S-protein. The analysis above suggests that mutations at surface-exposed sites with larger chemical effects, typically volume- or charge-changing, will drive the adaptive evolution, although precise predictions will require at least semiquantitative models. The heroic efforts in elucidating S-protein structures for all the main conformational states, bound to diverse antibodies and ACE2, provide an unprecedented basis for analyzing the functional evolution of SARS-CoV-2 in its appropriate structural context.
