
Site-specific glycosylation of SARS-CoV-2: Big challenges in mass spectrometry analysis.

Diana Campos¹, Michael Girgis², Miloslav Sanda¹,³

Abstract

Glycosylation of viral proteins is required for progeny formation and infectivity of virtually all viruses. It is increasingly clear that distinct glycans also play pivotal roles in the ability of viruses to shield themselves from, and evade, the host immune system. Recently, there has been great progress in the structural identification and quantitation of viral glycosylation, especially of spike proteins. Given the ongoing pandemic and the high demand for structural analysis of the densely glycosylated SARS-CoV-2 spike protein, mass spectrometry methodologies have been employed to determine its glycosylation patterns accurately. Many challenges remain in determining the site-specific glycosylation of the SARS-CoV-2 spike protein, compounded by conflicting results on glycan site occupancy and glycan structural characterization. These discrepancies are probably due to differences in expression systems, the form of the expressed spike glycoprotein, MS methodologies, and analysis software. In this review, we recap spike protein glycosylation and compare findings across studies. We also describe the most recent advances in glycosylation analysis in greater detail and explain some misinterpretations of previously observed data in recent publications. Our study provides a comprehensive view of spike protein glycosylation and highlights the importance of consistent glycosylation determination.
© 2022 The Authors. Proteomics published by Wiley-VCH GmbH.


Keywords:  N-glycosylation; O-glycosylation; SARS-CoV-2 glycoprotein; glycoproteomics


Year:  2022        PMID: 35700310      PMCID: PMC9349404          DOI: 10.1002/pmic.202100322

Source DB:  PubMed          Journal:  Proteomics        ISSN: 1615-9853            Impact factor:   5.393


Abbreviations: HIV, Human Immunodeficiency Virus; HCV, Hepatitis C Virus; HLA, Human Leukocyte Antigen; HEK 293, Human Embryonic Kidney 293 cells; NCE, Normalized Collision Energy; ID, Identity; HCD, Higher-energy C-Trap Dissociation; ETD, Electron-Transfer Dissociation; EThcD, Electron-Transfer/Higher-energy C-Trap Dissociation; PSMA, Prostate-Specific Membrane Antigen; PNGase, Peptide:N-Glycosidase; SARS-CoV-2, Severe Acute Respiratory Syndrome Coronavirus type 2; CCS, Collision Cross Section

INTRODUCTION

Viruses can be classified into two groups, enveloped and nonenveloped, depending on whether they have a lipid bilayer membrane on their outer surface. A characteristic feature of enveloped viruses is a cell membrane-derived envelope modified with virally encoded proteins [1]. The attachment and fusion proteins of enveloped viruses are usually modified by glycosylation, and these glycosylated surface epitopes are key to the pathogen-host interplay. Interactions between viral proteins and the host cell proteome play a crucial role in the infection process [2, 3]. Like cellular proteins, viral envelope proteins possess signal peptides that direct them to the secretory pathway, and they can be decorated with different kinds of post-translational modifications (PTMs). Viral protein PTMs, together with the host cell proteome, determine the extent of these interactions and of the host immune response [4, 5]. Glycosylation is one of the most important PTMs and can affect protein structure, orientation, binding affinity, and metabolism [6]. It involves the covalent attachment of different types of glycans to specific sites on proteins. Viral envelope proteins are often decorated with glycans that can account for up to half of the molecular weight of these glycoproteins [7]. Although there are numerous types of glycosylation, N-linked and mucin-type O-linked glycosylation are the most widely studied in viral research [8]. Prominent examples include the heavily glycosylated gp120 glycoprotein of HIV, the Ebola virus glycoprotein with its very high glycan content, and the HIV-1 glycoprotein gp160, which carries multiple N-linked glycans [7, 9-11]. N-linked glycosylation takes place on the amide nitrogen of the asparagine side chain, where the asparagine (N) residue lies within the sequon -N-X-T/S- (X ≠ proline) [12, 13, 14].
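The sequon rule above is simple enough to scan for programmatically. The Python sketch below is a minimal illustration, not a published tool: it reports the 1-based positions of asparagines that sit in an N-X-S/T motif with X ≠ proline. The function name `find_sequons` and the second example sequence are hypothetical; LFNVTSTLR is the PD-L1 glycopeptide shown in Figure 3. A sequon marks only a potential site; whether it is actually occupied has to be determined experimentally.

```python
import re

# N-X-S/T with X != proline; the zero-width lookahead lets overlapping
# sequons (e.g. "NNST", where both asparagines qualify) all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(sequence):
    """Return 1-based positions of Asn residues that start an N-X-S/T (X != P) sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]

print(find_sequons("LFNVTSTLR"))  # PD-L1 glycopeptide from Figure 3
print(find_sequons("ANPSANNST"))  # hypothetical: the N-P-S motif is skipped
```

The lookahead is used instead of a plain `N[^P][ST]` match because a consuming match would skip the second asparagine in overlapping sequons.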
O-linked glycosylation usually refers to the attachment of a sugar moiety to the side-chain oxygen of serine or threonine residues [15, 16, 17]. Both types of site-specific glycosylation have been shown to affect viral glycoprotein secretion and function [8]. Because the virus hijacks the host cellular machinery for replication, it also uses the host glycosylation machinery to produce viral proteins. Viral surface antigens may therefore carry familiar host glycans, which can change the ability of the host to recognize the virus and mount an immune response [18, 19, 20]. In addition, certain viruses can induce changes in the expression levels of host glycosyltransferases (enzymes that catalyze the formation of glycosidic bonds). For example, herpesviruses can induce the expression of host fucosyltransferases, leading to the expression of sLex or Ley antigens [21, 22]. Another example is the shift in the total glycoprofile of HCV-infected hepatoma cells toward more fucosylated, more sialylated, and more complex N-glycan structures [23]. Furthermore, changes in the expression of host glycosylation enzymes may indicate specific structural or functional modifications or reflect a demand for increased glycosylation capacity to support viral protein glycosylation. Mass spectrometry (MS) techniques have been widely used for proteomic and glycoproteomic analysis to decipher pathogenic structures and explore the mechanisms of host immune responses [24, 25, 26, 27, 28, 29, 30, 31, 32, 33]. In this review, we examined the most recent advances in the structural identification and quantitation of viral glycosylation. Given the ongoing pandemic and the high demand for protein and structural analysis, we focused mainly on an in-depth analysis of SARS-CoV-2.
We also highlighted the most intricate challenges in determining the site-specific glycosylation of the viral spike protein.

VIRAL GLYCOPROTEINS AND THEIR FUNCTIONS

Glycosylation of viral proteins is required for progeny formation and infectivity of many viruses [31, 34]. Furthermore, distinct glycans play pivotal roles in various stages of the viral cycle [34, 35]. For example, glycans on viral entry proteins are closely involved in modulating receptor binding and entry [36]. A prominent example is influenza virus, which attaches to glycans on cell-surface glycoproteins. Hemagglutinin (HA) and neuraminidase (NA), the surface glycoproteins of influenza viruses, interact with terminal sialic acid (SA) residues of host cell-surface glycoproteins. NA can cleave SA residues from host mucins to gain access to epithelial cells, playing a secondary role in viral entry into host cells [37]. NA can also cleave SA residues from the glycoproteins of the virus itself, enhancing infectivity by preventing aggregation of viral particles [38]. Host-cell-dependent glycosylation of HA and NA has also proven critical. Glycosylation in the HA stalk region is important for protein folding, trafficking, and pH stability [39, 40], while the extent of glycosylation near the HA receptor-binding site alters its affinity for SA-containing receptors [41, 42]. Glycosylation near the HA cleavage site also modulates virus pathogenicity [43]. Although the role of NA glycosylation is less well understood, N-linked glycosylation is important for functional NA, and lack of NA glycosylation increases the neurovirulence of the A/WSN/33 IAV strain in mice [44]. In addition, HA N-glycosylation affects T cell activation and cytokine production and thus promotes immune evasion [45]. This and other types of immune evasion have been reported for glycoproteins of different viruses. In fact, high levels of glycosylation are strongly believed to serve primarily as a protective shield against the host immune system [46, 47].
While the innate immune system has evolved a range of strategies to combat glycosylated epitopes of serious pathogens, viral mutations can lead to failures of the immune response. Alterations in viral glycoproteins can significantly affect viral characteristics, including the extent of protein glycosylation, which may jeopardize the efficacy of existing vaccines [45, 48]. Moreover, antigen glycosylation complicates the development of vaccines and antibody-based therapies. Glycans are structural features that are not directly encoded in the gene sequence, yet they play a crucial role in immune recognition and therefore affect vaccine design. Vaccine candidates are often expressed in cell lines that do not recapitulate the glycosylation pattern of the native pathogen and thus may not elicit biologically relevant immune responses. It is therefore important to identify common glycosylation patterns for translational applications. This has been particularly important in the study of the SARS-CoV-2 spike (S) protein.

SARS‐CoV‐2 spike protein glycosylation

SARS-CoV-2 initiates the infection cycle by binding to angiotensin-converting enzyme 2 (ACE2) on host epithelial cells in the respiratory tract. This leads to viral penetration, takeover of the host cell's biological machinery, and multiplication and maturation of the virus [49, 50, 51, 52]. Figure 1 shows the SARS-CoV-2 particle as depicted by Ganji et al. [53]. The SARS-CoV-2 genome encodes four structural proteins: the spike (S) glycoprotein, the membrane (M) protein, the envelope (E) protein, and the nucleocapsid (N) protein.
FIGURE 1

Anatomy of the SARS‐CoV‐2 particle showing its structural proteins (adapted from Ganji et al., [53]) with emphasis on the SARS‐CoV‐2 Spike (S) protein with its illustrative protein sequence components: S1 and S2 subunits, and Receptor binding domain (RBD). Also, the most commonly observed N‐ and O‐ glycosylation sites on S protein are depicted

Like many viruses, SARS-CoV-2 launches its cellular invasion through its heavily glycosylated S protein [54, 55, 56, 57, 58]. Although glycans contribute relatively modestly to the total molecular weight of the S trimer, they shield approximately 40% of the protein surface. The three-dimensional structure of the S protein shows that its surface is extensively shielded from antibody recognition by glycans, with the notable exception of the ACE2 receptor-binding domain [59]. The S protein comprises two subunits, S1 and S2: S1 contains the receptor-binding domain and mediates ACE2 recognition, whereas S2 orchestrates cellular adhesion and membrane fusion [60]. The S protein of SARS-CoV-2 has also been the main target for vaccine development [61]. Spike proteins are often used as immunogens in vaccines to elicit neutralizing antibodies and are frequently targeted by small molecules that might block host receptor binding and/or membrane fusion. Glycosylation is a heterogeneous process that depends on many factors, including age, underlying disease, and ethnicity; glycosylation profiles may therefore correlate with the differences in COVID-19 susceptibility observed among individuals [62, 63, 64, 65]. While overall shielding of the underlying protein surface does not appear to be highly sensitive to glycan microheterogeneity, microheterogeneity could affect the innate immune response by altering the ability of collectins and other lectins of the immune system to bind the S glycoprotein and neutralize the virus.
It may also affect the adaptive immune response by altering the number of viable human leukocyte antigen (HLA) epitopes [66]. With respect to the S protein sequence, 22 highly occupied N-linked glycosylation sites have been identified so far, together with a variable number of O-linked glycosylation sites at low stoichiometry [56, 67-70]. Most recent glycoproteomic analyses reported these SARS-CoV-2 S N-glycosites as occupied [67, 68, 69, 70, 71]. Table 1 summarizes the identified O-glycosites, with emphasis on sites in two critical regions of the protein: the furin cleavage site and the receptor-binding domain (RBD). SARS-CoV-2 has a polybasic cleavage site (RRAR) at the junction of S1 and S2, the two subunits of the spike. Three O-glycosites (S673, T678, and S686) are predicted to flank the cleavage site [72]. Occupancy of T678 was first identified by Sanda et al. in a recent publication [68]. The RBD is encoded by the most variable part of the coronavirus genome. O-glycosites were also consistently found in the RBD sequence, namely T323 and S325. Notably, 16 of 25 O-glycosites were located within three amino acids of known N-glycosites. However, O-glycosylation was found primarily on peptides that were not occupied by N-glycans. This suggests that, although O-glycans are a minor component of the S protein, they may maximize shielding of the minor fraction of peptides that lack N-glycans [73]. In the future, it will be relevant to map O-glycosites on native viruses derived from specific respiratory cell subtypes.
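The proximity statistic above (O-glycosites lying within three residues of an N-glycosite) can be reproduced in a few lines of Python. This is a sketch under two stated assumptions: "within three amino acids" is read as an inclusive distance |o - n| <= 3, and the O-glycosite list is hypothetical. Only the N-glycosite positions are ones named in this review; the function name is illustrative.

```python
def o_sites_near_n_sites(o_sites, n_sites, window=3):
    """O-glycosite positions within `window` residues (inclusive) of any N-glycosite."""
    return sorted(o for o in o_sites if any(abs(o - n) <= window for n in n_sites))

n_sites = [17, 122, 234, 603, 657]  # N-glycosites named in this review
o_sites = [19, 323, 325, 600, 678]  # hypothetical O-glycosite list for illustration
near = o_sites_near_n_sites(o_sites, n_sites)
print(f"{len(near)} of {len(o_sites)} O-glycosites lie within 3 residues of an N-glycosite: {near}")
```

A linear scan over all N-glycosites is adequate here because a spike protein carries only a few dozen sites.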
TABLE 1

A summary of site-specific O-glycosylation analyses of SARS-CoV-2 spike glycoproteins

Reference (a) | Source of S protein | Expression system | Total sites | RBD sites | Furin cleavage sites* | Notes
Shajahan A. et al., 2020 [69] | S1+S2 (separately) | HEK 293 | 2 | T323, S325 | – | in-gel digestion
Watanabe Y. et al., 2020 [70] | S protein 2P GSAS (682–685) | HEK 293 | 2 | T323, S325 | – | –
Sanda M. et al., 2021 [68] | S protein (R683A, R685A) | HEK 293 | 9 | T323, S325 | T678 | –
Gao C. et al., 2020 [92] | S protein (R683A, R685A) | HEK 293 | 5 | T323 | T678, S686 | –
Zhao P. et al., 2020 [58] | S protein 2P GGSG (682–685) | HEK 293 | 27 | T323, S325, S359, S366, S371, S373, T393, S399, S494, T547, T553, S555 | – | O-protease OpeRATOR
Bagdonaite I. et al., 2021 [73] | S protein 2P AARA (682–685) | HEK 293 | 12 | T478 | (T676), T678 | in-gel digestion and in-solution workflow
Tian Y. et al., 2021 [96] | S1 subunit | HEK 293 | 14 | T323, T523 | T678 | –
Zhang Y. et al., 2021 [95] | S1 subunit | HEK 293 | 30 | T323, S325 | (T676), T678, (S680) | –
Bagdonaite I. et al., 2021 [73] | S protein 2P AARA (682–685) | Insect cells | 15 | T478 | T678 | in-gel digestion
Bagdonaite I. et al., 2021 [73] | Soluble RBD | Insect cells | 6 | T323, T333, T345, T415, T523 | – | in-gel digestion
Zhang Y. et al., 2021 [95] | S protein | Insect cells | 43 | T323, S325, T333, S345, S477 | (T676), T678 | –
Tian W. et al., 2021 [101] | SARS-CoV-2 virions | Vero cells | 17 | T323 | – | –
Brun J. et al., 2021 [97] | SARS-CoV-2 virions (S1 subunit) | Calu-3 | 1 | – | T678 | –

Literature references are indicated as first author last name and year


Strategies used for the analysis of viral spike glycoprotein

Various strategies have been used to determine the nature, structure, and exact location of glycans on viral spike proteins. These strategies encompass sample preparation, chromatographic separation, mass spectrometry, and bioinformatic analysis. Sample preparation usually involves glycan-releasing enzymes (such as PNGase F for N-glycans) and proteases (such as trypsin), with or without enrichment depending on glycopeptide stoichiometry. Hydrophilic interaction liquid chromatography (HILIC), lectin affinity chromatography, and graphitized carbon chromatography have been widely adopted for glycopeptide enrichment. Figure 2 shows the workflow for sample preparation and data acquisition.
FIGURE 2

Schematic depiction of the site-specific bottom-up glycoproteomics workflow used to characterize SARS-CoV-2 S protein glycosylation. Different approaches to the workflow and other protocols used are described further in Section 2.2

To reduce the complexity of glycopeptide mixtures, chromatographic separation can be performed on reversed-phase C18, hydrophilic interaction liquid chromatography (HILIC) [74, 75], or porous graphitic carbon (PGC) columns [76, 77]. The separated glycans or glycopeptides are then analyzed by tandem mass spectrometry (MS/MS) to determine the glycosite location and glycan structure for both N- and O-glycosylation. Collision-based dissociation, such as collision-induced dissociation (CID) and higher-energy C-trap dissociation (HCD), is the most commonly used fragmentation technique. Electron-based dissociation, such as electron-capture dissociation (ECD) and electron-transfer dissociation (ETD), gently fragments the peptide backbone without neutral loss of the N-glycan moiety. Ultraviolet photodissociation (UVPD) combines features of both collision- and electron-based dissociation. Hybrid methods such as EThcD and ETciD on Orbitrap mass spectrometers have also been used for selective backbone fragmentation. Two MS acquisition strategies have been adopted for this type of analysis: data-dependent acquisition (DDA) and data-independent acquisition (DIA). Glycan compositions are assigned from intact precursor or fragment ions using a variety of software, such as Byonic, pGlyco, GPQuest, GPSeeker, O-Pair Search in MetaMorpheus, MSFragger-Glyco, and StrucGP. In the following sections, we present the most intricate challenges in assigning site-specific glycosylation of the SARS-CoV-2 spike protein. While the majority of site-specific N-glycosylation analyses were performed using the bottom-up approach, very few reports used the top-down approach for O-glycosylation analysis [60, 78-80].
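To illustrate the precursor-level step of composition assignment, the Python sketch below enumerates glycan compositions whose summed monoisotopic residue masses match an observed glycan mass (precursor neutral mass minus unmodified peptide mass) within a ppm tolerance. It is a hypothetical helper, not the algorithm of Byonic, pGlyco, or any other package named above, which also score fragment ions and control false discovery rates. The residue set, the count limits, and the 10 ppm tolerance are assumptions.

```python
from itertools import product

PROTON = 1.007276  # proton mass (Da)
# Monoisotopic residue masses (Da) of N-glycan building blocks as incorporated in a glycan
RESIDUE = {"HexNAc": 203.079373, "Hex": 162.052824, "Fuc": 146.057909, "NeuAc": 291.095417}

def neutral_mass(mz, charge):
    """Neutral monoisotopic mass of a precursor observed at m/z with a positive charge."""
    return charge * (mz - PROTON)

def candidate_compositions(glycan_mass, ppm_tol=10.0, max_counts=(7, 10, 3, 4)):
    """Compositions whose summed residue mass matches glycan_mass within ppm_tol.

    max_counts bounds HexNAc, Hex, Fuc and NeuAc (in RESIDUE order); the bounds are assumptions.
    """
    names = list(RESIDUE)
    hits = []
    for counts in product(*(range(n + 1) for n in max_counts)):
        mass = sum(c * RESIDUE[name] for c, name in zip(counts, names))
        if abs(mass - glycan_mass) / glycan_mass * 1e6 <= ppm_tol:
            hits.append(dict(zip(names, counts)))
    return hits

# Hypothetical glycan mass corresponding to HexNAc4Hex5Fuc1
print(candidate_compositions(1768.6395))
```

Because only masses are compared, structural isomers with the same composition (for example, core versus outer-arm fucosylation) cannot be distinguished at this step.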

Early reports of SARS-CoV-2 spike glycoprotein analysis

The SARS-CoV-2 spike is considered the viral glycoprotein most frequently studied by mass spectrometry in recent years. Roughly three dozen publications have focused on the analysis of N- and O-glycosylation of the spike glycoprotein. In the first published report, Watanabe et al. analyzed the N-glycosylation, and partially the O-glycosylation, of protein overexpressed in HEK 293 cells, using high-resolution mass spectrometry combined with HCD fragmentation [70]. Whereas glycopeptide tandem mass spectra in later reports were acquired at lower energy (NCE = 20 to 35) or with stepped collision energy, the tandem MS data in the first report were acquired with HCD at NCE 50%. At NCE 50%, glycopeptides were extensively fragmented, resulting in very low abundances of peptide + Yn ions. Without peptide + Yn ions, glycan characterization was limited to the intact mass and oxonium ions, which makes it difficult to identify the core structure [81]. In addition, oxonium ions are fragmented into single units, which reduces the chance of correctly identifying outer-arm-specific structures. The effect of NCE on glycopeptide tandem mass spectra is visible in Figure 3 [82, 83]. To obtain more complete information, data can be acquired at two or three different collision energies, using low collision energy for glycan structure elucidation and high collision energy for glycopeptide identification. Watanabe et al. described 22 occupied N-glycosites across the whole spike glycoprotein and two occupied O-glycosites in the RBD. In addition, they constructed a three-dimensional model based on the mapped SARS-CoV-2 N-linked glycans. Following this work, many groups, including our laboratory, have described glycosylation analyses of the SARS-CoV-2 spike glycoprotein [68].
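What remains in spectra acquired at high NCE is dominated by oxonium ions. A common screening heuristic, sketched below in Python, flags a spectrum as a likely glycopeptide spectrum when at least two diagnostic oxonium ions are present within a ppm tolerance. The ion list, the 15 ppm tolerance, the two-ion threshold, and the example spectrum are assumptions chosen for illustration; the m/z values are singly charged monoisotopic oxonium ions computed from standard residue masses.

```python
# Singly charged monoisotopic oxonium ions commonly used as glycan reporters (m/z)
OXONIUM = {
    "HexNAc": 204.0867,
    "HexNAc-H2O": 186.0761,
    "HexNAc-2H2O": 168.0655,
    "HexNAc fragment": 138.0550,
    "HexHexNAc": 366.1395,
    "NeuAc-H2O": 274.0921,
    "NeuAc": 292.1027,
}

def matched_oxonium_ions(peaks, ppm_tol=15.0):
    """Names of oxonium ions matched by any (m/z, intensity) peak within ppm_tol."""
    found = []
    for name, ref in OXONIUM.items():
        if any(abs(mz - ref) / ref * 1e6 <= ppm_tol for mz, _intensity in peaks):
            found.append(name)
    return found

def looks_like_glycopeptide(peaks, min_ions=2, ppm_tol=15.0):
    """Heuristic: at least min_ions distinct oxonium ions suggest a glycopeptide spectrum."""
    return len(matched_oxonium_ions(peaks, ppm_tol)) >= min_ions

# Hypothetical centroided HCD spectrum: (m/z, intensity)
peaks = [(138.0551, 1.2e4), (204.0865, 5.0e4), (366.1398, 9.0e3), (500.2500, 3.0e3)]
print(matched_oxonium_ions(peaks), looks_like_glycopeptide(peaks))
```

A positive result only indicates that a glycan is present; assigning core or outer-arm structures still requires peptide + Yn ions, which survive better at lower or stepped collision energies.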
FIGURE 3

Comparison of HCD tandem mass spectra of the PD‐L1 glycopeptide LFNVTSTLR occupied by a biantennary galili recorded at four different collision energies (NCE: 20, 30, 40, 50) (Sanda et al., [83])


Glycosylation analysis using mass spectrometers of different types and resolving power

SARS-CoV-2 S protein glycosylation has been analyzed almost exclusively with high-resolution mass spectrometers. In most cases, an Orbitrap-based analyzer operating at a resolution of 60,000 to 120,000 was used; time-of-flight (TOF) analyzers were used in very few cases, for the top-down approach [78, 84]. High and ultrahigh resolving power allowed researchers to analyze the 22 glycosylation sites effectively in mixtures of digested peptides and glycopeptides. Targeted multiple-reaction monitoring (MRM) and parallel reaction monitoring (PRM) methods have been used to detect and quantify spike protein fragments in biological fluids such as saliva and extracted nasal swabs [85, 86]. Although these methods have so far been reported only for nonglycosylated peptides, we envision that they will be applied to glycosylation analysis of the SARS-CoV-2 S protein in the near future.
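The resolution figures quoted above translate into concrete peak widths. The Python sketch below computes the full width at half maximum implied by a resolving power R = (m/z)/Δ(m/z), a signed ppm mass error, and an approximate Orbitrap resolving power at a given m/z. It assumes the instrument setting is quoted at m/z 200 and scales with 1/√(m/z), the usual behavior of Orbitrap analyzers; function names and example values are illustrative.

```python
import math

def fwhm(mz, resolving_power):
    """Peak width (full width at half maximum) implied by R = (m/z) / delta(m/z)."""
    return mz / resolving_power

def orbitrap_resolving_power(mz, r_ref, mz_ref=200.0):
    """Approximate Orbitrap resolving power at mz, given r_ref quoted at mz_ref.

    Assumes the 1/sqrt(m/z) scaling typical of Orbitrap analyzers.
    """
    return r_ref * math.sqrt(mz_ref / mz)

def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

# Hypothetical example: a 120,000 setting (quoted at m/z 200) observed at m/z 800
r800 = orbitrap_resolving_power(800.0, 120_000)
print(f"R at m/z 800: {r800:.0f}")
print(f"FWHM at m/z 800: {fwhm(800.0, r800):.4f}")
print(f"error: {ppm_error(800.004, 800.0):.2f} ppm")
```

At this peak width, isotope peaks of a 4+ glycopeptide ion, spaced about 0.25 m/z apart, are comfortably resolved.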

Data processing software

In most studies of SARS-CoV-2 S protein glycosylation, researchers used Byonic software (Protein Metrics, USA). Some reports used pGlyco [58], and a few researchers used proprietary software. Interestingly, several reports classified glycans into subgroups such as complex/hybrid and oligomannose without structure-based glycan separation by chromatographic techniques [87] or ion mobility [83]. Classifying glycans solely by composition derived from precursor ions can lead to misinterpretation of glycoproteomics data. A similar problem arises from assuming that each glycan composition corresponds to a single structure. For example, the core-fucosylated glycans reported by several groups [58, 69, 70], which did not consider a contribution from outer-arm fucosylated forms, may be inaccurately interpreted; several studies have reported the occurrence of both forms in biological samples [88]. Incorporating an orthogonal chromatographic separation or ion mobility may be the ultimate solution to this issue. Zhao et al. used glycomics-informed glycoproteomic analysis, in which the glycan search space (the glycan database used) is defined by glycomic analysis; this can minimize misinterpretation of glycoproteomics data. They also disclosed their protocol, in which the glycan search space was determined by separate analysis of glycans released with peptide N-glycosidase F (PNGase F) [58]. This methodology can help correct glycan assignments, but it depends heavily on the methodology used for glycan analysis. In addition, Hackett et al. showed that identification results depend on the software and the glycan search space, comparing four glycoproteomic software packages run with default settings and with fixed settings such as a common glycan search space [89]. Overall, the results differed between studies (Figure 4).
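To show why composition-only classification is risky, the Python sketch below applies a simple composition-based rule set: two HexNAc residues give high-mannose if Hex ≥ 5, otherwise paucimannose; three HexNAc with Hex ≥ 5 give hybrid; everything else is called complex. These rules are a hypothetical heuristic for illustration, not the scheme of any particular study. The example pairs are structurally distinct glycans with identical compositions: core- versus outer-arm-fucosylated biantennary glycans, and a galactosylated biantennary complex glycan versus a LacdiNAc-containing hybrid glycan, the second of which ends up misclassified as complex.

```python
def classify_by_composition(hexnac, hexose, fuc=0, neuac=0):
    """Assign an N-glycan class from composition alone (hypothetical heuristic).

    fuc and neuac are accepted for completeness but do not change the class here.
    """
    if hexnac < 2:
        return "other"
    if hexnac == 2:
        return "high-mannose" if hexose >= 5 else "paucimannose"
    if hexnac == 3 and hexose >= 5:
        return "hybrid"
    return "complex"

# Structurally different glycans with identical compositions (HexNAc, Hex, Fuc, NeuAc)
STRUCTURES = {
    "biantennary, core-fucosylated": (4, 5, 1, 0),
    "biantennary, outer-arm fucosylated": (4, 5, 1, 0),
    "biantennary, galactosylated (complex)": (4, 5, 0, 0),
    "LacdiNAc-containing hybrid": (4, 5, 0, 0),
}

for name, (h, x, f, s) in STRUCTURES.items():
    print(f"{name}: HexNAc{h}Hex{x}Fuc{f}NeuAc{s} -> {classify_by_composition(h, x, f, s)}")
```

Every pair prints the same composition, so only an orthogonal separation (chromatography or ion mobility) or diagnostic fragments can tell the members apart.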
FIGURE 4

Summary of site-specific N-glycosylation analysis of SARS-CoV-2. The N-glycan sites and glycan composition at each site are compared among different publications. For each site, the compositional analysis shows only the most abundant glycan type: unoccupied (gray), high-mannose (green), hybrid (orange), complex (blue), or paucimannose (yellow). When two types of N-glycans are equally abundant, both corresponding colors are displayed at the same site. Different sources of the S protein, expression systems, and proteases used for sample digestion are also shown for comparison


Levels of glycosylation analysis (glycans, glycopeptides, and glycoproteins)

Most studies of SARS-CoV-2 spike protein glycosylation were performed at the glycopeptide level. The advantage of glycopeptide analysis is its ability to link a glycan to an exact glycosylation site, which is important for analyzing the spike glycoprotein and its receptor-binding domain and for developing effective vaccines. On the other hand, the ability of bottom-up glycoproteomics to assign a specific isomeric structure to a particular glycosite is very limited. Cho et al. used isomeric separation to compare glycans released from SARS-CoV-2, SARS-CoV, and Middle East respiratory syndrome (MERS) spike glycoproteins [87]. Meanwhile, Gao et al. introduced various MALDI-MS/MS methods to analyze released glycans. They showed high levels of LacdiNAc-containing structures, which had previously been described in glycoproteins overexpressed in HEK 293 cells [90, 91, 92]. Sanda et al. previously used ion mobility to separate isobaric glycopeptides carrying a biantennary galactosylated complex glycan or a LacdiNAc-containing hybrid glycan [83] from PD-L1 overexpressed in HEK 293 cells [91]. The approaches used for O-glycosylation analysis are more diverse, including bottom-up (Figure 4) and top-down proteomics [78]. The remaining N-glycosylation analyses were performed using a bottom-up glycoproteomics strategy. Figure 4 summarizes the bottom-up methodologies used for N-glycosylation analysis in all cases. The results were consistent despite minor variations in a few aspects, such as different protease combinations, fragmentation settings, processing software, use of ion mobility to separate glycan isomers [68, 78], or use of ETD fragmentation to assign occupied glycosites [68, 92]. The top-down approach was used exclusively for O-glycosylation analysis of the RBD, because the RBD has only two potential O-glycosites, which makes top-down analysis effective.

N-glycosylation depends on the source of material

The N-glycosylation of SARS-CoV-2 S proteins has been investigated extensively. Several studies used recombinant S proteins produced in different expression systems [58, 68-70, 93-95], mostly because of the difficulties inherent in studying wild-type viral proteins. The human embryonic kidney cell line HEK 293 was the most frequently used expression system for recombinant S protein (Figure 4). Some studies compared recombinant S protein expressed in HEK cells with protein overexpressed in insect cells [93, 95]. As expected, different expression systems yielded different N-glycosylation profiles. Figure 4 lists the expression systems, protein sources, proteases, and references for the various N-glycosylation analyses. Blue indicates a complex glycan structure, green a high-mannose glycan, orange a hybrid glycan, and yellow a paucimannose glycan. Three research groups analyzed the S1 subunit, which encompasses the RBD, overexpressed in the HEK 293 cell line [93, 95, 96], while a single study used S1 subunit isolated from virions cultivated in Calu-3 cells [97]. The three studies of HEK 293-expressed S1 showed complex glycans as the major structural component at all sites, although Zhang et al. reported N603 as mostly occupied by high-mannose glycans [95]. Interestingly, 11 of 16 studies found N234 occupied by oligomannose glycans, whereas the remaining groups concluded that N234 carries a complex glycan structure, which would directly affect receptor binding [93, 94, 95, 96]. Notably, three of the studies reporting complex glycans at N234 used only the S1 subunit expressed in HEK 293 cells instead of the full-length (S1+S2) spike protein construct.
In addition, Brun et al. showed that almost one third of the glycosites (including N234) are occupied by oligomannose structures in the S1 subunit isolated from virions grown in Calu-3 cells. Three research groups used insect cell lines to produce the SARS-CoV-2 spike glycoprotein [67, 93, 95]. Insect cells are known to produce predominantly truncated N-glycans, also known as paucimannose glycans, and this glycan type was identified at almost all N-glycosylation sites in these studies. Wang et al. and Bangaru et al. found only a few sites (including N234) occupied by oligomannose glycans [67, 93]. Zhang et al. also used insect cells to overexpress their spike glycoprotein, but identified mostly oligomannose structures at all sites. A similar N-glycosylation pattern was detected with the same overexpression system for other proteins, such as prostate-specific membrane antigen (PSMA) [98]. Subsequent studies analyzed spike glycoprotein isolated from virions [96, 97, 99]. Spike from virions grown in Vero cells showed more oligomannose structures (including at N234) than protein overexpressed in HEK 293 cells, with Yao et al. as an exception. Yao et al. found complex glycan structures at all sites except three: N234 and N607, which were partially occupied by oligomannose and complex glycans, and N122, which was occupied predominantly by paucimannose glycans [99]. An interesting issue is the identification of unoccupied sites, which only two reports described. Shajahan et al. reported unoccupied N-glycosites (N17, N603, N1134, N1158, and N1173), and Sanda et al. reported one unoccupied site, N603. These sites were clearly identified as unoccupied and, in some cases, confirmed by PNGase F deglycosylation. Zhang et al. reported very low occupancy at N1173. The other 13 reports showed all sites as occupied or not identified, but did not report unoccupied sites.
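Site occupancy and the dominant glycan type per site, the two quantities summarized in Figure 4, can be derived from glycopeptide intensities. The Python sketch below is a simplified illustration with hypothetical data and a hypothetical function name: it treats the unoccupied peptide as one more category, computes occupancy as the glycosylated fraction of the summed intensity, and reports every glycan type tied for highest abundance, mirroring the two-color convention of Figure 4. It assumes that glycosylated and unoccupied forms ionize comparably, which is generally not true, so real occupancy estimates need calibration or orthogonal confirmation such as PNGase F deglycosylation.

```python
from collections import defaultdict

def summarize_sites(observations, rel_tie=1e-9):
    """Per-site occupancy and dominant glycan type(s) from glycopeptide intensities.

    observations: iterable of (site, category, intensity); category is a glycan type
    such as "high-mannose", "hybrid", "complex", "paucimannose", or "unoccupied".
    Returns {site: (occupancy, sorted list of the most abundant glycan types)}.
    Types within rel_tie (relative) of the maximum are reported together as ties.
    """
    by_site = defaultdict(lambda: defaultdict(float))
    for site, category, intensity in observations:
        by_site[site][category] += intensity

    summary = {}
    for site, totals in by_site.items():
        total = sum(totals.values())
        if total == 0:
            continue  # nothing measured at this site
        occupancy = 1.0 - totals.get("unoccupied", 0.0) / total
        glycans = {t: v for t, v in totals.items() if t != "unoccupied"}
        dominant = []
        if glycans:
            top = max(glycans.values())
            dominant = sorted(t for t, v in glycans.items() if top - v <= rel_tie * top)
        summary[site] = (occupancy, dominant)
    return summary

# Hypothetical intensities for three sites (not data from any cited study)
observations = [
    ("N234", "high-mannose", 8.0e6), ("N234", "complex", 2.0e6),
    ("N603", "complex", 3.0e6), ("N603", "unoccupied", 1.0e6),
    ("N1173", "complex", 5.0e5), ("N1173", "hybrid", 5.0e5), ("N1173", "unoccupied", 4.0e6),
]
summary = summarize_sites(observations)
for site, (occupancy, dominant) in summary.items():
    print(f"{site}: occupancy {occupancy:.2f}, dominant {', '.join(dominant) or 'none'}")
```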
The number of identified sites varies with the protein source and the digestion system. Our work focused mainly on O-glycosylation near the furin cleavage site, and the digestion system was chosen to cleave between the N657 glycosite and the predicted O-glycosites S673 and T678, reducing potential false-positive O-glycan identifications due to incomplete N-deglycosylation. Using a trypsin/GluC combination, we were not able to identify site N17 (very short peptide) or sites N709 and N717 (potentially doubly occupied glycopeptide). This pattern is consistent with the report of Tian et al., who used the same enzyme combination. Other reports used trypsin/chymotrypsin digestion with minor modifications and were able to characterize glycosylation at the sites missing from our report. Determining glycosylation features that are independent of the expression system can help in understanding protein sequence and structure. To the best of our knowledge, glycosylation of spike glycoprotein isolated from real patient samples has not yet been analyzed. Detailed knowledge of S protein glycosylation is important not only for vaccine development but also for understanding its role in receptor binding. It was recently shown that different types of N-glycans can have different effects on receptor interactions [84]. In this study, S1 protein expressed in different expression models (and hence carrying different major N-glycan types), including baculovirus-insect cells, Chinese hamster ovary (CHO) cells, and two variants of HEK 293 cells, showed different binding affinities to ACE2.
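Digestion-dependent gaps like those described above (a very short peptide, or a peptide carrying two glycosites) can be anticipated in silico before an experiment. The Python sketch below performs a complete digestion, assuming trypsin cleaves after K/R except before proline and GluC cleaves only after E (its usual specificity in ammonium bicarbonate buffer), and then flags sequon-containing peptides that are shorter than a hypothetical minimum length or carry more than one sequon. The toy sequence is hypothetical, not the spike protein, and missed cleavages are ignored.

```python
import re

SEQUON = re.compile(r"(?=N[^P][ST])")  # N-X-S/T with X != P; overlapping matches allowed

def digest(sequence, cleave_after=frozenset("KRE"), no_cut_before_pro=frozenset("KR")):
    """Complete in silico digestion returning (start_index, peptide) tuples.

    Cleaves C-terminal to residues in cleave_after; residues in no_cut_before_pro
    are not cleaved when followed by proline. Missed cleavages are not modelled.
    """
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa not in cleave_after:
            continue
        if aa in no_cut_before_pro and sequence[i + 1:i + 2] == "P":
            continue
        peptides.append((start, sequence[start:i + 1]))
        start = i + 1
    if start < len(sequence):
        peptides.append((start, sequence[start:]))
    return peptides

def sequon_report(sequence, min_len=5):
    """Flag sequon-containing peptides that are short or carry more than one sequon."""
    sites = [m.start() for m in SEQUON.finditer(sequence)]
    report = []
    for start, pep in digest(sequence):
        inside = [s + 1 for s in sites if start <= s < start + len(pep)]  # 1-based Asn positions
        if not inside:
            continue
        flags = []
        if len(pep) < min_len:
            flags.append("short")
        if len(inside) > 1:
            flags.append("multiple sequons")
        report.append((pep, inside, flags))
    return report

toy = "NLTRAYNGSWNVTCFKLDKPGFEQSNPTAR"  # hypothetical sequence, not the spike protein
for row in sequon_report(toy):
    print(row)
```

Swapping the cleavage sets (for example, adding chymotrypsin-like residues) shows how a different protease combination can split or merge glycosite-bearing peptides.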

O-Glycosylation of different proteoforms and analysis techniques

An important aspect of spike glycoprotein analysis is the form (partial sequence or mutant) of the glycoprotein used for overexpression. As mentioned above, Wang et al. [93], Tian et al. [96], and Zhang et al. [95] overexpressed the S1 subunit containing the receptor binding domain (RBD). Several groups analyzed protein with a mutated furin cleavage site, such as R683A/R685A [68, 92], R683S/R685S [70], and R683G/A684S/R685G [58]; with the furin site mutated, the spike protein is overexpressed as one intact unit. Many studies analyzed the glycosylation of protein overexpressed as a trimeric complex resembling the native viral spike [94]. Table 1 summarizes the studies of O-glycosylation sites. The depth of each O-glycoproteomic analysis is shown in the column “Total sites”, which gives the number of occupied sites identified in the spike glycoprotein. Unlike N-glycosylation, O-glycosylation has no known sequence motif. Therefore, the number of identified sites depends greatly on the methodology, enrichment, and digestion system used. Overall, the O-glycosylation site occupancy of overexpressed protein is relatively low compared with N-glycosylation. Most groups (9 of 11 reports) identified one or two occupied O-glycosites on the receptor binding domain, T323 and S325. These glycosites could influence binding of the spike protein to the ACE2 receptor [52, 84], and they are listed in the “RBD” column of Table 1. Another physiologically important O-glycosylation region was predicted near the furin cleavage site. This polybasic furin cleavage sequence is specific to the SARS-CoV-2 spike glycoprotein [52]. There are three predicted O-glycosites near this site: S673, T678, and S686. Eight reports described occupancy of T678, which is only two amino acids from the furin cleavage site. Gao et al. also reported occupied S686. No report identified occupied S673.
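The contrast between the two glycosylation types can be made concrete. N-glycosylation candidates follow the N-X-S/T sequon (X not proline), so they can be listed directly from sequence, whereas without a consensus motif every Ser and Thr is a potential O-glycosite. A short sketch on a hypothetical peptide sequence; the regular expression encodes only the basic sequon rule and ignores the other factors that determine actual occupancy.

```python
import re
from typing import List

# N-X-S/T with X != P; the lookahead consumes only the N, so overlapping
# sequons (e.g. "NNST") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")

def n_sequons(seq: str) -> List[int]:
    """1-based positions of Asn residues in N-X-S/T sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]

def o_candidates(seq: str) -> List[int]:
    """1-based positions of every Ser/Thr: with no consensus motif, all are candidates."""
    return [i + 1 for i, aa in enumerate(seq) if aa in "ST"]

TOY = "QNPSTVNNSTAPT"  # hypothetical peptide
print(n_sequons(TOY))     # [7, 8]
print(o_candidates(TOY))  # [4, 5, 9, 10, 13]
```

The overlapping `NNST` stretch yields two sequons, while `NPS` is excluded by the proline rule. Even for this short sequence the O-glycosite candidate list is longer, which is why O-glycosite assignment relies so heavily on the experimental method.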
In addition, Zhang et al. reported occupancy of sites T676 and S680, which were not predicted to be occupied [100]. A recent study used T676 and T678 mutants to confirm the influence of O-glycosylation on furin cleavage efficiency [100]. Some publications used EThcD fragmentation to assign O-glycosites efficiently [68, 73]. The remaining publications mostly used beam-type fragmentation and deduced glycosite occupancy from it. The major glycans at the O-glycosites were sialylated core 1 and core 2 structures, and some reports also identified fucosylated O-glycans. Sanda et al. used cyclic ion mobility of oxonium ion fragments to analyze sialic acid linkages. They found that O-glycan structures commonly described as core 2 could, in the case of the spike glycoprotein, be a mixture of extended core 1 and core 2 structures with different sialic acid linkages (Figure 5).
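Oxonium ions are the low-mass glycan fragments that beam-type fragmentation produces in abundance and that serve as glycopeptide markers. As a sketch, singly charged oxonium ion m/z values can be computed from standard monoisotopic residue masses and matched to a peak list within a ppm tolerance; the peak list, the 10 ppm tolerance, and the helper names below are hypothetical. Sialic acid linkage isomers (for example α2,3 versus α2,6) give oxonium ions of identical m/z, so m/z matching alone cannot tell them apart, which is why an orthogonal separation such as ion mobility is needed.

```python
from typing import Dict, List, Tuple

# Monoisotopic residue masses (Da) and constants.
RESIDUE = {"HexNAc": 203.07937, "Hex": 162.05282, "NeuAc": 291.09542}
PROTON = 1.007276
WATER = 18.010565

def oxonium_mz(composition: Dict[str, int], water_losses: int = 0) -> float:
    """m/z of a singly charged oxonium ion: residue masses + proton - water losses."""
    mass = sum(RESIDUE[k] * n for k, n in composition.items())
    return mass + PROTON - water_losses * WATER

DIAGNOSTIC = {
    "HexNAc": oxonium_mz({"HexNAc": 1}),                                # ~204.087
    "NeuAc": oxonium_mz({"NeuAc": 1}),                                  # ~292.103
    "NeuAc-H2O": oxonium_mz({"NeuAc": 1}, water_losses=1),              # ~274.092
    "HexHexNAc": oxonium_mz({"Hex": 1, "HexNAc": 1}),                   # ~366.139
    "NeuAcHexHexNAc": oxonium_mz({"NeuAc": 1, "Hex": 1, "HexNAc": 1}),  # ~657.235
}

def match_oxonium(peaks: List[float], tol_ppm: float = 10.0) -> List[Tuple[str, float]]:
    """Return (ion name, observed peak) pairs for peaks within tol_ppm of a diagnostic ion."""
    hits = []
    for name, mz in DIAGNOSTIC.items():
        for peak in peaks:
            if abs(peak - mz) / mz * 1e6 <= tol_ppm:
                hits.append((name, peak))
    return hits

peaks = [204.0866, 274.0920, 292.1030, 366.1400, 512.3000]  # hypothetical centroids
print([name for name, _ in match_oxonium(peaks)])
# ['HexNAc', 'NeuAc', 'NeuAc-H2O', 'HexHexNAc']
```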
FIGURE 5

An example of separation of O-glycan structures using ion mobility (Sanda et al. [68]).

Two studies used top-down proteomics to analyze O-glycosylation of the receptor binding domain only [78, 79]. O-glycopeptide enrichment, as used by Zhang Y et al., can significantly increase the number of identified glycosites [94]. However, the physiological significance of these additional sites is questionable given the very low site occupancy described in other reports. O-glycosylation of the S/T residue within N-glycosylation motifs is very rare, but two reports showed partial O-glycosylation of such peptides after N-glycan cleavage [68, 101].

Sources of data variability between laboratories, instrumentation, and methodologies

It is important to note that the identified glycoforms of the overexpressed spike protein depend greatly on the expression system used (Figure 4) and may not reflect the glycoforms of virion protein from SARS-CoV-2-infected patients. A number of different protein forms and expression systems have been used to deduce the structure and glycosylation of the SARS-CoV-2 S protein, and many of these were explored as candidates in vaccine research. To the best of our knowledge, only three studies have been carried out on SARS-CoV-2 virion preparations [96, 97, 99]. The remaining publications described glycosylation of protein overexpressed in human embryonic kidney cells or insect-cell expression systems. As expected, glycan composition is strongly influenced by the expression system as well as by the number of passages [90]. In addition, the type of digestion, the LC-MS system, and the software used for identification and quantification can all be sources of interlaboratory variation when describing glycosylation of the SARS-CoV-2 spike glycoprotein. A recent interlaboratory study examined variation in data processing and evaluation and revealed significant differences in the number and composition of reported glycoforms depending on the software, evaluation criteria, and experience of the researchers [102]. Figure 4 summarizes the N-glycosylation analyses, including the source of the analyzed material, the digestion system used, and glycan identification. As mentioned above, Hackett et al. compared four different software tools, and the results varied significantly (Figure 6) [89]. In previous work, they reanalyzed the data of Watanabe et al. [70] with their software tool GlycReSoft [89]. They observed glycan quantities and distributions similar to those of the original work but identified many sulfated glycans, especially at site N1074 (6 of the 30 major glycoforms), as well as penta-antennary or polyLacNAc-containing glycopeptides.
This indicates that the results depend strongly on the glycan composition database used for the analysis. Figure 6 shows the various software tools and their corresponding outcomes. The N-linked glycans of native spikes are very similar to those of the recombinant glycoprotein [96].
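The dependence on the glycan database can be illustrated with a toy composition lookup: a glycan mass (precursor mass minus peptide mass) is assigned to the closest database composition within a ppm tolerance. Real glycoproteomics software combines precursor matching with fragment-spectrum scoring, so this is only a sketch of the precursor step; the compositions, the observed mass, and the helper names are hypothetical. The same measurement is left unassigned by a database without sulfated entries and assigned to a sulfated composition once such entries are added.

```python
from typing import Dict, Optional

# Monoisotopic residue masses (Da); "Sulfo" is an SO3 substituent.
RESIDUE = {"HexNAc": 203.07937, "Hex": 162.05282, "Fuc": 146.05791,
           "NeuAc": 291.09542, "Sulfo": 79.95682}

def glycan_mass(comp: Dict[str, int]) -> float:
    """Sum of residue masses for a glycan composition (as attached to a peptide)."""
    return sum(RESIDUE[k] * n for k, n in comp.items())

def assign(observed: float, database: Dict[str, Dict[str, int]],
           tol_ppm: float = 10.0) -> Optional[str]:
    """Return the database composition closest to `observed` within tol_ppm, else None."""
    best, best_err = None, tol_ppm
    for name, comp in database.items():
        theo = glycan_mass(comp)
        err = abs(observed - theo) / theo * 1e6
        if err <= best_err:
            best, best_err = name, err
    return best

DB_BASIC = {
    "HexNAc2Hex9": {"HexNAc": 2, "Hex": 9},
    "HexNAc4Hex5Fuc1": {"HexNAc": 4, "Hex": 5, "Fuc": 1},
    "HexNAc4Hex5NeuAc1": {"HexNAc": 4, "Hex": 5, "NeuAc": 1},
}
DB_EXTENDED = {**DB_BASIC,
               "HexNAc4Hex5Fuc1Sulfo1": {"HexNAc": 4, "Hex": 5, "Fuc": 1, "Sulfo": 1}}

# Hypothetical measured glycan mass, 0.002 Da (about 1 ppm) off the sulfated composition.
OBSERVED = glycan_mass(DB_EXTENDED["HexNAc4Hex5Fuc1Sulfo1"]) + 0.002

print(assign(OBSERVED, DB_BASIC))     # None
print(assign(OBSERVED, DB_EXTENDED))  # HexNAc4Hex5Fuc1Sulfo1
```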
FIGURE 6

Comparison of software used for N-glycoproteomics data processing, from Hackett et al. [89]: (A) comparison of glycoproteomics software, and (B) data analysis from Zhang et al. [105].

Differences in data arise not only from experimental variation but also from data misinterpretation. For example, the bioinformatic analysis of mass spectrometry data by Krishnan et al. in some cases classified nonidentified N-glycosites as unoccupied [103]. Furthermore, Segreto et al. misinterpreted O-glycosylation data obtained by EThcD as data obtained by HCD fragmentation with modulated collision energy [104].

CONCLUDING REMARKS AND OUTLOOK

In this review, we collected and compared most of the results on glycosylation of SARS-CoV-2. We showed the discrepancies in both glycan structural characterization and site occupancy among the different studies. These could be related to differences in the expression systems, conditions, and/or form of the expressed spike glycoprotein. Moreover, the complex molecular structures and glycan heterogeneity of the spike protein binding domain remain unresolved because of the challenges of conventional mass spectrometry techniques and/or data analysis. Furthermore, we described the most recent advances in N-glycosylation and O-glycosylation analysis in greater detail. We also highlighted recent discrepancies in the literature on O-glycosylation of the SARS-CoV-2 spike glycoprotein and explained some misinterpretations of previous results in recent publications. Data analysis of O-glycosylation continues to be a major source of variation. A unified technique and reliable software may help standardize the workflow for this complex type of analysis.

CONFLICT OF INTEREST

The authors declare no conflict of interest.