Yasunori Watanabe1,2,3, Joel D Allen1, Daniel Wrapp4, Jason S McLellan4, Max Crispin5. 1. School of Biological Sciences, University of Southampton, Southampton SO17 1BJ, UK. 2. Oxford Glycobiology Institute, Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, UK. 3. Division of Structural Biology, University of Oxford, Wellcome Centre for Human Genetics, Oxford OX3 7BN, UK. 4. Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA. 5. School of Biological Sciences, University of Southampton, Southampton SO17 1BJ, UK. max.crispin@soton.ac.uk.
Abstract
The emergence of the betacoronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of coronavirus disease 2019 (COVID-19), represents a considerable threat to global human health. Vaccine development is focused on the principal target of the humoral immune response, the spike (S) glycoprotein, which mediates cell entry and membrane fusion. The SARS-CoV-2 S gene encodes 22 N-linked glycan sequons per protomer, which likely play a role in protein folding and immune evasion. Here, using a site-specific mass spectrometric approach, we reveal the glycan structures on a recombinant SARS-CoV-2 S immunogen. This analysis enables mapping of the glycan-processing states across the trimeric viral spike. We show how SARS-CoV-2 S glycans differ from typical host glycan processing, which may have implications in viral pathobiology and vaccine design.
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative pathogen of coronavirus disease 2019 (COVID-19) (, ), induces fever, severe respiratory illness, and pneumonia. SARS-CoV-2 uses an extensively glycosylated spike (S) protein that protrudes from the viral surface to bind to angiotensin-converting enzyme 2 (ACE2) to mediate host-cell entry (). The S protein is a trimeric class I fusion protein, composed of two functional subunits, responsible for receptor binding (S1 subunit) and membrane fusion (S2 subunit) (, ). The surface of the envelope spike is dominated by host-derived glycans, with each trimer displaying 66 N-linked glycosylation sites. The S protein is a key target in vaccine design efforts (), and understanding the glycosylation of recombinant viral spikes can reveal fundamental features of viral biology and guide vaccine design strategies (, ).
Viral glycosylation has wide-ranging roles in viral pathobiology, including mediating protein folding and stability and shaping viral tropism (). Glycosylation sites are under selective pressure as they facilitate immune evasion by shielding specific epitopes from antibody neutralization. However, we note the low mutation rate of SARS-CoV-2 and that as yet, there have been no observed mutations to N-linked glycosylation sites (). Surfaces with an unusually high density of glycans can also enable immune recognition (, , ). The role of glycosylation in camouflaging immunogenic protein epitopes has been studied for other coronaviruses (, , ). Coronaviruses form virions by budding into the lumen of endoplasmic reticulum–Golgi intermediate compartments (, ). However, observations of complex-type glycans on virally derived material suggest that the viral glycoproteins are subjected to Golgi-resident processing enzymes (, ).
High viral glycan density and local protein architecture can sterically impair the glycan maturation pathway. Impaired glycan maturation resulting in the presence of oligomannose-type glycans can be a sensitive reporter of native-like protein architecture (), and site-specific glycan analysis can be used to compare different immunogens and monitor manufacturing processes (). Additionally, glycosylation can influence the trafficking of recombinant immunogen to germinal centers ().
To resolve the site-specific glycosylation of the SARS-CoV-2 S protein and visualize the distribution of glycoforms across the protein surface, we expressed and purified three biological replicates of recombinant soluble material in an identical manner to that which was used to obtain the high-resolution cryo–electron microscopy (cryo-EM) structure, albeit without a glycan-processing blockade using kifunensine (). This variant of the S protein contains all 22 glycans on the SARS-CoV-2 S protein (Fig. 1A). Stabilization of the trimeric prefusion structure was achieved by using the 2P stabilizing mutations () at residues 986 and 987, a GSAS (Gly-Ser-Ala-Ser) substitution at the furin cleavage site (residues 682 to 685), and a C-terminal trimerization motif. This helps to maintain quaternary architecture during glycan processing. Before analysis, supernatant containing the recombinant SARS-CoV-2 S was purified by size exclusion chromatography to ensure that only native-like trimeric protein was analyzed (Fig. 1B and fig. S1). The trimeric conformation of the purified material was validated by using negative-stain EM (Fig. 1C).
Fig. 1
Expression and validation of the SARS-CoV-2 S glycoprotein.
(A) Schematic representation of the SARS-CoV-2 S glycoprotein. The positions of N-linked glycosylation sequons (N-X-S/T, where X ≠ P) are shown as branches (N, Asn; X, any residue; S, Ser; T, Thr; P, Pro). Protein domains are illustrated: N-terminal domain (NTD), receptor binding domain (RBD), fusion peptide (FP), heptad repeat 1 (HR1), central helix (CH), connector domain (CD), and transmembrane domain (TM). (B) SDS–polyacrylamide gel electrophoresis analysis of the SARS-CoV-2 S protein (indicated by the arrowhead) expressed in human embryonic kidney (HEK) 293F cells. Lane 1: filtered supernatant from transfected cells; lane 2: flow-through from StrepTactin resin; lane 3: wash from StrepTactin resin; lane 4: elution from StrepTactin resin. (C) Negative-stain EM 2D class averages of the SARS-CoV-2 S protein. 2D class averages of the SARS-CoV-2 S protein are shown, confirming that the protein adopts the trimeric prefusion conformation matching the material used to determine the structure ().
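The sequon definition given in the caption (N-X-S/T, where X ≠ P) can be expressed as a short sequence scan. The following is a minimal sketch, not the authors' pipeline: the function name `find_sequons` and the toy sequence are hypothetical, and a real analysis would run on the S protein sequence, which is not reproduced here.

```python
import re

# N-X-S/T sequon, where X is any residue except proline.
# The lookahead leaves X and S/T unconsumed, so overlapping
# sequons (for example "NNTS") are each reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence, offset=1):
    """Return 1-based positions of Asn residues that begin an N-X-S/T sequon."""
    return [match.start() + offset for match in SEQUON.finditer(sequence)]


# Hypothetical toy sequence used only for illustration (not the S protein).
toy = "MNGTANPSLNNTSK"
positions = find_sequons(toy)

# Each protomer is present three times in the trimer.
sites_per_trimer = 3 * len(positions)
```

Applied to a protomer with 22 sequons, the same multiplication gives the 66 sites per trimer quoted in the text. A sequon only marks a potential site; occupancy has to be measured, as the mass spectrometric analysis does.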
To determine the site-specific glycosylation of SARS-CoV-2 S, we used trypsin, chymotrypsin, and α-lytic protease to generate three glycopeptide samples. These proteases were selected to generate glycopeptides that contain a single N-linked glycan sequon. The glycopeptides were analyzed by liquid chromatography–mass spectrometry, and the glycan compositions were determined for all 22 N-linked glycan sites (Fig. 2). To convey the main processing features at each site, the abundances of each glycan are summed into oligomannose-type, hybrid-type, and categories of complex-type glycosylation based on branching and fucosylation. The detailed, expanded graphs showing the diverse range of glycan compositions are presented in table S1 and fig. S2.
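The reason for using several proteases can be illustrated with an in silico digest that counts sequons per peptide. This is a simplified sketch under stated assumptions: only trypsin is modeled, with the common textbook rule (cleave C-terminal to K or R unless the next residue is P); missed cleavages, chymotrypsin, and α-lytic protease are not modeled; and the function names and toy sequence are hypothetical. It requires Python 3.7 or later, where `re.split` accepts zero-width matches.

```python
import re

# N-X-S/T sequon, X != P; lookahead allows overlapping sequons to be counted.
SEQUON = re.compile(r"N(?=[^P][ST])")

# Simplified trypsin rule: cut after K or R, but not when followed by P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(sequence):
    """Split a sequence at every simplified tryptic cleavage site."""
    return [peptide for peptide in TRYPSIN_SITE.split(sequence) if peptide]


def sequons_per_peptide(sequence):
    """Pair each tryptic peptide with the number of sequons it contains."""
    return [(pep, len(SEQUON.findall(pep))) for pep in tryptic_peptides(sequence)]


# Hypothetical toy sequence used only for illustration (not the S protein).
toy = "ANGTKQNVSRPNLTKAAK"
report = sequons_per_peptide(toy)

# Peptides carrying more than one sequon would need a different protease.
ambiguous = [pep for pep, count in report if count > 1]
```

In this toy case the second peptide carries two sequons, which is the situation a second or third protease is meant to resolve. A sequon that straddles a cleavage site would be missed by this per-peptide count, so the sketch is a planning aid rather than a replacement for inspecting the digest.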
Fig. 2
Site-specific N-linked glycosylation of the SARS-CoV-2 S glycoprotein.
The schematic illustrates the color code for the principal glycan types that can arise along the maturation pathway from oligomannose- to hybrid- to complex-type glycans. The graphs summarize quantitative mass spectrometric analysis of the glycan population present at individual N-linked glycosylation sites simplified into categories of glycans. The oligomannose-type glycan series (M9 to M5; Man9GlcNAc2 to Man5GlcNAc2) is colored green, afucosylated and fucosylated hybrid-type glycans (hybrid and F hybrid) are dashed pink, and complex glycans are grouped according to the number of antennae and presence of core fucosylation (A1 to FA4) and are colored pink. Unoccupancy of an N-linked glycan site is represented in gray. The pie charts summarize the quantification of these glycans. Glycan sites are colored according to oligomannose-type glycan content, with the glycan sites labeled in green (80 to 100%), orange (30 to 79%), and pink (0 to 29%). An extended version of the site-specific analysis showing the heterogeneity within each category can be found in table S1 and fig. S2. The bar graphs represent the mean quantities of three biological replicates, with error bars representing the standard error of the mean.
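The summary statistics and color bands described in this caption can be sketched as follows. The replicate values are hypothetical and are not taken from table S1 or table S2; the band edges at 80% and 30% follow the caption, with the assumption that non-integer values between the stated ranges fall into the lower band.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Return the mean and the standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


def color_band(oligomannose_percent):
    """Map a site's oligomannose-type content (%) to the caption's color bands."""
    if oligomannose_percent >= 80:
        return "green"
    if oligomannose_percent >= 30:
        return "orange"
    return "pink"


# Hypothetical oligomannose-type content (%) at one site across three
# biological replicates; illustrative numbers only.
replicates = [70.0, 75.0, 80.0]
site_mean, site_sem = mean_and_sem(replicates)
site_color = color_band(site_mean)
```

With three replicates the standard error is the sample standard deviation divided by √3, which is what the error bars in the bar graphs represent.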
Two sites on SARS-CoV-2 S are principally oligomannose-type: N234 and N709. The predominant oligomannose-type glycan structure observed across the protein, with the exception of N234, is Man5GlcNAc2 (Man, mannose; GlcNAc, N-acetylglucosamine), which demonstrates that these sites are largely accessible to α-1,2-mannosidases but are poor substrates for GlcNAcT-I, which is the gateway enzyme in the formation of hybrid- and complex-type glycans in the Golgi apparatus. The stage at which processing is impeded is a signature related to the density and presentation of glycans on the viral spike. For example, the more densely glycosylated spikes of HIV-1 Env and Lassa virus (LASV) GPC exhibit numerous sites dominated by Man9GlcNAc2 (–).
A mixture of oligomannose- and complex-type glycans can be found at sites N61, N122, N603, N717, N801, and N1074 (Fig. 2). Of the 22 sites on the S protein, 8 contain substantial populations of oligomannose-type glycans, highlighting how the processing of the SARS-CoV-2 S glycans is divergent from host glycoproteins (). The remaining 14 sites are dominated by processed, complex-type glycans.
Although unoccupied glycosylation sites were detected on SARS-CoV-2 S, when quantified they were revealed to form a very minor component of the total peptide pool (table S2). In HIV-1 immunogen research, the holes generated by unoccupied glycan sites have been shown to be immunogenic and potentially give rise to distracting epitopes (). The high occupancy of N-linked glycan sequons of SARS-CoV-2 S indicates that recombinant immunogens will not require further optimization to enhance site occupancy.
Using the cryo-EM structure of the trimeric SARS-CoV-2 S protein [Protein Data Bank (PDB) ID 6VSB] (), we mapped the glycosylation status of the coronavirus spike mimetic onto the experimentally determined three-dimensional (3D) structure (Fig. 3). This combined mass spectrometric and cryo-EM analysis reveals how the N-linked glycans occlude distinct regions across the surface of the SARS-CoV-2 spike.
Fig. 3
Structure-based mapping of SARS-CoV-2 S N-linked glycans.
Representative glycans are modeled onto the prefusion structure of the trimeric SARS-CoV-2 S glycoprotein (PDB ID 6VSB) (), with one RBD in the “up” conformation and the other two RBDs in the “down” conformation. The glycans are colored according to oligomannose content as defined by the key. ACE2 receptor binding sites are highlighted in light blue. The S1 and S2 subunits are rendered with translucent surface representation, colored light and dark gray, respectively. The flexible loops on which the N74 and N149 glycan sites reside are represented as gray dashed lines, with glycan sites on the loops mapped at their approximate regions.
Shielding of the receptor binding sites on the SARS-CoV-2 spike by proximal glycosylation sites (N165, N234, N343) can be observed, especially when the receptor binding domain is in the “down” conformation. The shielding of receptor binding sites by glycans is a common feature of viral glycoproteins, as observed on SARS-CoV-1 S (, ), HIV-1 Env (), influenza hemagglutinin (, ), and LASV GPC (). Given the functional constraints of receptor binding sites and the resulting low mutation rates of these residues, there is likely selective pressure to use N-linked glycans to camouflage one of the most conserved and potentially vulnerable areas of their respective glycoproteins (, ).
We note the dispersion of oligomannose-type glycans across both the S1 and S2 subunits. This is in contrast to other viral glycoproteins; for example, the dense glycan clusters in several strains of HIV-1 Env induce oligomannose-type glycans that are recognized by antibodies (, ). In SARS-CoV-2 S, the oligomannose-type structures are likely protected by the protein component, as exemplified by the N234 glycan, which is partially sandwiched between the N-terminal and receptor binding domains (Fig. 3).
We characterized the N-linked glycans on extended flexible loop structures (N74 and N149) and at the membrane-proximal C terminus (N1158, N1173, N1194) that were not resolved in the cryo-EM maps (). These were determined to be complex-type glycans, consistent with steric accessibility of these residues.
Whereas the oligomannose-type glycan content (28%) (table S2) is above that observed on typical host glycoproteins, it is lower than that of other viral glycoproteins. For example, one of the most densely glycosylated viral spike proteins is HIV-1 Env, which exhibits ~60% oligomannose-type glycans (, ). This suggests that the SARS-CoV-2 S protein is less densely glycosylated and that the glycans form less of a shield compared with other viral glycoproteins, including HIV-1 Env and LASV GPC, which may be beneficial for the elicitation of neutralizing antibodies.
Additionally, the processing of complex-type glycans is an important consideration in immunogen engineering, especially considering that epitopes of neutralizing antibodies against SARS-CoV-2 S can contain fucosylated glycans at N343 (). Across the 22 N-linked glycosylation sites, 52% are fucosylated and 15% of the glycans contain at least one sialic acid residue (table S2 and fig. S3). Our analysis reveals that N343 is highly fucosylated with 98% of detected glycans bearing fucose residues. Glycan modifications can be heavily influenced by the cellular expression system used. We have previously demonstrated for HIV-1 Env glycosylation that the processing of complex-type glycans is driven by the producer cell but that the levels of oligomannose-type glycans were largely independent of the expression system and are much more closely related to the protein structure and glycan density ().
Highly dense glycan shields, such as those observed on LASV GPC and HIV-1 Env, feature so-called mannose clusters (, ) on the protein surface (Fig. 4). Whereas small mannose-type clusters have been characterized on the S1 subunit of Middle East respiratory syndrome (MERS)–CoV S (), no such phenomenon has been observed for the SARS-CoV-1 or SARS-CoV-2 S proteins. The site-specific glycosylation analysis reported here suggests that the glycan shield of SARS-CoV-2 S is consistent with other coronaviruses and similarly exhibits numerous vulnerabilities throughout the glycan shield (). Last, we detected trace levels of O-linked glycosylation at Thr323/Ser325 (T323/S325), with over 99% of these sites unmodified (fig. S4), suggesting that O-linked glycosylation of this region is minimal when the structure is native-like.
Fig. 4
Underprocessing of viral glycan shields.
From left to right, MERS-CoV S (), SARS-CoV-1 S (), SARS-CoV-2 S, LASV GPC (), and HIV-1 Env (, ). Site-specific N-linked glycan oligomannose quantifications are colored according to the key. All glycoproteins were expressed as soluble trimers in HEK 293F cells apart from LASV GPC, which was derived from virus-like particles from Madin-Darby canine kidney II cells.
Our glycosylation analysis of SARS-CoV-2 offers a detailed benchmark of site-specific glycan signatures characteristic of a natively folded trimeric spike. As an increasing number of glycoprotein-based vaccine candidates are being developed, their detailed glycan analysis offers a route for comparing immunogen integrity and will also be important to monitor as manufacturing processes are scaled for clinical use. Glycan profiling will therefore also be an important measure of antigen quality in the manufacture of serological testing kits. Last, with the advent of nucleotide-based vaccines, it will be important to understand how those delivery mechanisms affect immunogen processing and presentation.