Online Hydrophilic Interaction Chromatography (HILIC) Enhanced Top-Down Mass Spectrometry Characterization of the SARS-CoV-2 Spike Receptor-Binding Domain.

Jesse W Wilson, Aivett Bilbao, Juan Wang, Yen-Chen Liao, Dusan Velickovic, Roza Wojcik, Marta Passamonti, Rui Zhao, Andrea F G Gargano, Vincent R Gerbasi, Ljiljana Paša-Tolić, Scott E Baker, Mowei Zhou.

Abstract

SARS-CoV-2 cellular infection is mediated by the heavily glycosylated spike protein. Recombinant versions of the spike protein and the receptor-binding domain (RBD) are necessary for seropositivity assays and can potentially serve as vaccines against viral infection. RBD plays key roles in the spike protein's structure and function, and thus, comprehensive characterization of recombinant RBD is critically important for biopharmaceutical applications. Liquid chromatography coupled to mass spectrometry has been widely used to characterize post-translational modifications in proteins, including glycosylation. Most studies of RBDs were performed at the proteolytic peptide (bottom-up proteomics) or released glycan level because of the technical challenges in resolving highly heterogeneous glycans at the intact protein level. Herein, we evaluated several online separation techniques: (1) C2 reverse-phase liquid chromatography (RPLC), (2) capillary zone electrophoresis (CZE), and (3) acrylamide-based monolithic hydrophilic interaction chromatography (HILIC) to separate intact recombinant RBDs with varying combinations of glycosylations (glycoforms) for top-down mass spectrometry (MS). Within the conditions we explored, the HILIC method was superior to RPLC and CZE at separating RBD glycoforms, which differ significantly in neutral glycan groups. In addition, our top-down analysis readily captured unexpected modifications (e.g., cysteinylation and N-terminal sequence variation) and low abundance, heavily glycosylated proteoforms that may be missed by using glycopeptide data alone. The HILIC top-down MS platform holds great potential in resolving heterogeneous glycoproteins for facile comparison of biosimilars in quality control applications.

Year:  2022        PMID: 35380435      PMCID: PMC9003935          DOI: 10.1021/acs.analchem.2c00139

Source DB:  PubMed          Journal:  Anal Chem        ISSN: 0003-2700            Impact factor:   6.986


Introduction

The heavily glycosylated spike protein on the surface of SARS-CoV-2 virion particles mediates internalization into human cells via interactions with the cellular surface protein angiotensin converting enzyme-2 (ACE-2).[1−5] Due to the direct interaction of the spike receptor-binding domain (RBD) and ACE-2, the RBD serves as a key target for neutralizing antibodies to prevent infection.[1,6,7] Recombinant spike protein and RBD can potentially serve as vaccines,[1,8,9] and RBDs are necessary for diagnostic purposes in immunoassays.[7,9] RBD glycosylation has been demonstrated to play a role in ACE-2 binding and aids shielding of the spike protein from antibodies.[2,4,10,11] In general, protein glycosylation is known to modulate the immune response,[12,13] and recombinant protein glycosylation is often dependent on the expression platform employed.[14−16] Thus, understanding the full glycosylation profile of the spike RBD is important for the development and quality control of novel therapeutics or vaccines,[11] where knowledge of the precise combination of all post-translational modifications (PTMs) is necessary. Individual glycosites can be occupied with many glycan structural variants, which result in different forms of the protein termed glycoforms. With glycoproteins, macroheterogeneity describes the glycan occupancy at a given glycosite, while microheterogeneity is the variation of glycan composition per glycosite. 
More recent glycoprotein observations have led to the idea of glycan metaheterogeneity, a higher level of glycan regulation based on the variation in glycosylation across multiple sites.[17] Conventional analysis of protein glycosylation relies on bottom-up mass spectrometry (MS) approaches to produce glycopeptides using enzymatic digestion,[3,18−20] and release of glycans from the protein for comprehensive glycan profiling.[21] Although these approaches lead to robust identification of glycosites and glycans (macroheterogeneity and microheterogeneity, respectively), the overall connectivity between various intact glycoforms and other PTMs is lost (metaheterogeneity). Top-down MS experiments skip enzymatic digestion to analyze whole proteins where the relative abundance of exact proteoforms, proteins with varying glycosylations or PTMs, can be determined.[22] To better resolve heterogeneous samples (e.g., glycoproteins), liquid chromatography is often coupled to MS to separate proteins based on their chemical properties before MS detection.[22−24] Denaturing reversed-phase liquid chromatography (RPLC) is among the most widely used separation techniques with protein retention based on hydrophobic interactions.[25] Capillary zone electrophoresis (CZE) coupled to MS, in comparison, separates molecules based on charge and size characteristics within a capillary using an applied electric field in the presence of a background electrolyte.[26] Ion exchange is another charge-based online separation method that has recently been developed to study intact therapeutic proteins such as antibodies.[27] Hydrophilic interaction chromatography (HILIC), which has the opposite selectivity to RPLC, separates molecules based on hydrophilicity with greater retention for hydrophilic molecules such as glycans.[28−32] Two recent publications have used top-down MS to study RBD glycosylation.[33,34] Both the studies identified O-glycosylation at site T323 of the spike S1 protein and two 
N-glycosylation sites at N331 and N343, the same sites that were previously identified with bottom-up glycoproteomics.[3,19,20] The study from Roberts et al.[34] used denatured and native top-down analysis to determine the relative abundance of O-glycoforms of the RBD, while the study from Gstottner et al.[33] combined bottom-up and intact protein analysis with multiple glycosidase enzymatic steps to study the N- and O-linked glycosylation profile of the RBD. However, the modality of online separation for improving the intact protein analysis has not yet been thoroughly investigated in these reports. Herein, we compared HILIC with two other commonly used intact protein separation methods (C2 RPLC[35] and CZE[36]) on several RBDs recombinantly expressed in HEK 293 cells from two different vendors (Sino Biological and RayBiotech). HILIC allowed the greatest separation of RBD glycoforms, which were different in their neutral glycans. Drastic differences in the glycan composition were also detected between vendor sources when the same expression platform (from the same type of cell line) was used. Additionally, our results suggest that the RBD exhibits more than 200 individual glycoforms that are assignable. Top-down MS also helped the discovery of unexpected PTMs on the proteins that may affect the structure and function. When compared with glycopeptide data, HILIC top-down analysis better detected low abundance and/or heavily glycosylated proteoforms that may be missed due to more severe detection biases at the peptide level. We anticipate that online HILIC separation has great potential for defining the metaheterogeneity of heterogeneous glycoproteins for biotherapeutic or biotechnology use.

Materials and Methods

Chemicals and Proteins

The glycoprotein standard alpha-1-acid glycoprotein (AGP, G9885) was purchased from Millipore Sigma (St. Louis, MO, USA). Spike RBD proteins expressed in HEK 293 cells were purchased from two sources: Sino Biological (Beijing, China) and RayBiotech (Atlanta, GA, USA). A wild-type (WT) version of the RBD (SARS-CoV-2 spike protein amino acids 319–541), expressed with a C-terminal polyhistidine tag, was purchased from both sources. Additionally, the N501Y mutant (stronger ACE-2 binding)[37] from Sino Biological and the N331Q mutant (removes the N331 glycosite) from RayBiotech were purchased for comparison. Ammonium acetate, tris(2-carboxyethyl)phosphine hydrochloride (TCEP), 1,4-dithiothreitol, and iodoacetamide were purchased from Sigma (St. Louis, MO, USA). Peptide-N-glycosidase F (PNGase F, P0704S) was acquired from New England Biolabs (Ipswich, MA, USA).

Sample Preparation for Intact Protein Analysis

AGP and the RBDs from RayBiotech came as lyophilized powders and were diluted to a starting concentration of 1 mg/mL with deionized water. The RBDs from Sino Biological were also lyophilized but were diluted to a starting concentration of 0.25 mg/mL following the manufacturer’s recommendations. Samples were then buffer exchanged into 100 mM ammonium acetate using Zeba 7 kDa desalting columns (Thermo Fisher) that were equilibrated with ammonium acetate. All samples were vialed at 0.25 mg/mL in 100 mM ammonium acetate for MS analysis. Further sample preparation for PNGase F treatment of RBDs as well as glycopeptide and released glycan analysis can be found in the Supplementary Methods.

Online Liquid Chromatography for Intact Protein Mass Spectrometry

Online RPLC and HILIC separations were performed with a Waters NanoAcquity UPLC with dual pump trapping mode. The nanoflow C2 column (100 μm ID capillary) was packed in-house.[35] The HILIC column used was recently developed and made in-house,[38] consisting of an acrylamide-based polymer monolith stationary phase polymerized in a 200 μm ID capillary. Packed HILIC columns can be made using commercially available HILIC materials[29] but have a lower LC resolution and higher baseline. Both C2 and HILIC utilized an online desalting C2 column prior to the analytical separation. Online CZE separation was performed using a CMP Scientific (Brooklyn, NY) EVE-001 capillary electrophoresis autosampler using a proprietary 100 cm PS2 coated capillary (cat: E-SC-PS2-360/150-50-100-B1) from CMP Scientific using denaturing background electrolyte conditions (10% acetic acid in water). Further details for online separation can be found in the Supporting Information Methods. Most intact (RPLC, CZE, and HILIC) and top-down mass measurements were performed with a Thermo Fisher Orbitrap Eclipse tribrid. A Thermo Q-Exactive HF mass spectrometer was used for the Sino Biological C2 RPLC separation experiments. Additionally, a Thermo Exploris 480 was used for higher-energy collisional dissociation (HCD) top-down analysis of RBDs after PNGase F N-glycan removal. The nanoelectrospray source was set to 1.8–2.2 kV, with the transfer tube at 305 °C, source fragmentation voltage set to 35 V for HILIC experiments, and 15 V for C2 RPLC and CZE experiments. The HILIC separation experiments required higher source fragmentation voltages to diminish trifluoroacetic acid (TFA) adducts. TFA at 0.05% was used as a necessary ion-pairing agent in HILIC LC analysis.[29] The RF lens was set at 70% for experiments. MS1 spectra were acquired with a mass range of 600–6000 m/z at 7500 resolution (at m/z of 400), AGC target of 8E5, a maximum injection time of 200 ms, and 5 microscans. 
The Eclipse and Exploris were set to intact protein mode with low pressure. The Sino Biological WT RBD was run in triplicate for each separation method tested, while the other RBDs were run in triplicate using HILIC separation.

Data Analysis of Intact Mass Spectra

All raw mass spectra used for comparing separation techniques were deconvolved to zero-charge spectra and output as a matrix of mass, abundance, and elution time slice using Protein Metrics (Cupertino, CA) Intact Mass software (version 4.2)[39] with default settings. An R script was implemented (Version 4.0.2) to compare the list of deconvolved masses from each separation method and to remove mass peaks not observed in triplicate using a mass tolerance of ±2 Da. Further details of this peak filtering approach can be found in the Supporting Information Methods. The R source code, deconvolved intact mass data, and intact mass assignment for each RBD can be found at https://github.com/EMSL-Computing/RBD-intact-peak-analysis.
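The triplicate filter described above can be illustrated with a short sketch. This is not the authors' R script (which is available at the repository linked above); it is a minimal Python illustration that assumes each replicate has already been reduced to a plain list of deconvolved masses. The function name `filter_reproducible_peaks` and the example masses are hypothetical.

```python
from typing import List


def filter_reproducible_peaks(replicates: List[List[float]], tol: float = 2.0) -> List[float]:
    """Keep masses from the first replicate that are matched within +/- tol Da
    in every other replicate; report the mean of the matched masses."""
    kept = []
    for mass in replicates[0]:
        matches = []
        for other in replicates[1:]:
            # Candidate peaks in this replicate inside the tolerance window
            close = [m for m in other if abs(m - mass) <= tol]
            if not close:
                break  # missing in one replicate: discard this peak
            matches.append(min(close, key=lambda m: abs(m - mass)))
        else:
            # Reached only when the peak was found in all replicates
            kept.append(round((mass + sum(matches)) / (len(matches) + 1), 2))
    return kept


# Hypothetical deconvolved mass lists from three technical replicates
rep1 = [29830.3, 31974.3, 30500.0]
rep2 = [29831.0, 31973.5, 33000.0]
rep3 = [29829.5, 31975.0, 30500.5]
reproducible = filter_reproducible_peaks([rep1, rep2, rep3])
print(reproducible)
```

In this toy input, the peak near 30500 Da is dropped because it has no partner within 2 Da in the second replicate, while the other two peaks are retained.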

Results and Discussion

HILIC Provided the Highest Degree of Separation for Resolving the Heterogeneity of RBD Glycoforms by Separating Neutral Glycans

To benchmark the separation methods for glycoform separation, we used AGP (orosomucoid 1) as a mammalian glycoprotein standard. Almost half of the mass of the ∼40 kDa AGP is from extensive N-glycosylation (5× N-glycan sites) with a high degree of sialic acid incorporation.[40] Overall, the different methods showed different selectivities for AGP proteoform separation. Both CZE and HILIC demonstrated greater capacities to separate glycoforms than C2 RPLC (Figure S1). In contrast, C2 RPLC separation was mostly between the sequence variants of ORM1*F1 and ORM1*S (Figure S2). CZE separated AGP charge variants well based on the degree of sialylation per glycoform (Figure S3), while HILIC separation demonstrated a trend of increasing glycosylation with retention time (mostly based on neutral glycans). These observations were consistent with previous studies,[28,40,41] and the work of Baerenfaenger and Meyer was used for AGP proteoform assignment.[40] Variable amounts of sialic acid groups on the O- and N-glycan sites have been reported on the RBD.[20,33,34] To the best of our knowledge, separation techniques to resolve these complex RBD glycoforms have not been systematically evaluated. One might expect CZE to separate the RBDs well, since there is some variance in charged sialic acid, while HILIC may be able to separate the more neutral N-glycans based on total glycan composition. Figure 1A–C displays the C2 RPLC, CZE, and monolithic HILIC separations for the Sino Biological WT RBD with total ion chromatograms/electropherograms (TICs) and extracted ion chromatograms/electropherograms (XICs) of selected RBD glycoforms that differ only in their glycan compositions. The glycoform assignments were based on additional top-down, glycoproteomic, and released glycan data, which will be discussed in later sections. Minor separation was seen in both C2 RPLC and CZE for the selected high-abundance RBD glycoforms (Figure 1A,B).
In contrast, HILIC produced clear chromatographic separation of RBD glycoforms, with resolvable peaks even in the TIC (Figure 1C). The limited CZE separation may be due to the small degree of heterogeneity in sialic acid seen in recent reports of recombinant RBDs. The T323 O-glycan site is decorated with 1–2 sialic acids (the most abundant O-glycan has 2 sialic acids) with Core 1 or Core 2 O-glycan structures.[33,34] The identified N-glycans are complex and have between 0 and 3 sialic acids.[20,33] This combination of O- and N-glycan compositions leads to the majority of RBDs containing 2–4 sialic acids,[33] a narrower distribution than seen with proteins that separate well with CZE such as AGP (13–19 sialic acids).
Figure 1

Sino Biological WT RBD intact glycoform separation comparison between (A) C2 RPLC, (B) CZE, and (C) monolithic HILIC. Each XIC corresponds to a defined glycoform of the RBD (charge state 16+, except for the two lightest glycoforms with one N-glycan, which are 15+). HILIC separated the RBD glycoforms the best by the N-glycan occupancy and glycan composition. (D–F) Intact mass distributions for the RBD at given HILIC elution times matching three selected elution peaks marked in (C). (D) Part of the RBD population with 1× N-glycan occupied on sites N331 or N343. The mass shift between the most abundant peaks in (D) and (E) (labeled elution peaks 1 and 2) is 2144 Da, corresponding to the addition of the N-glycan H3N6F3. The 186 Da spacing observed does not match a glycan mass but is an additional unknown modification. (F) Late eluting RBD species (labeled elution peak 3) display a mass peak spacing of 146 Da, suggesting increasing fucosylation and sialic acid. Two fucose units weigh 1 Da more than one sialic acid. Labeled glycan compositions are based on matching top-down and released N-glycan data. Corresponding glycan key: hexose (H), N-acetylhexosamine (N), fucose (F), and sialic acid (S).

As seen here and previously, HILIC has a high capacity to separate neutrally charged glycans,[28,30] leading to a broader elution time profile for RBD glycoforms than observed with the standard intact protein separation methods using C2 RPLC or sheath flow CZE (Figure 1C). To make the data more comparable, we selected conditions that yield similar separation windows of ∼5 min across the main protein peaks for all methods. From the monolithic HILIC elution profile alone, differences in the N-glycan occupancy become readily apparent, which would otherwise be easily missed with CZE or C2 RPLC. Figure 1D–F displays the intact mass distributions for the Sino Biological WT RBD at three different points along the HILIC elution (marked in Figure 1C).
The first elution peak at 19 min (Figure 1C) contains RBD species weighing 29830.3 Da (Figure 1D), while the main elution peak at ∼21 min weighs 31974.3 Da (Figure 1E), a mass difference of 2144 Da. This mass shift of 2144 Da corresponds to the addition of the N-glycan group with the composition of H3N6F3 (hexose (H), N-acetylhexosamine (N), fucose (F), and sialic acid (S)), which was identified previously from glycoproteomics[20] and is observed in our released glycan and glycoproteomics data. Additionally, a repeated peak spacing of 41 Da is observed in Figure 1D,E that can be attributed to the exchange of an H1 for N1. Late eluting species (Figure 1F) display an abundant peak spacing of 146 Da that matches increasing amounts of fucose or sialic acid. Two fucose units weigh only 1 Da more than one sialic acid group, causing mass degeneracy that is difficult to disentangle due to peak overlap and limited mass resolution. Overall, compared to the generic RPLC and CZE methods for intact proteins, the HILIC used here provided a broader separation of RBD glycoforms based on the extent of glycosylation (increasing retention with extent of glycosylation) with distinct peaks being separated from the main protein elution area. The high resolving power of HILIC for glycoforms seen here was consistent with the different selectivities among the separation methods shown for the glycoprotein lipase.[30] Although further optimization for RPLC and CZE is possible (e.g., gradient length, stationary phase/coating, and mobile phase/buffer), we herein focused on evaluating HILIC separation of RBDs because of the ease of glycoform separation.
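The mass arithmetic behind these assignments can be checked with a few lines of code. The sketch below is illustrative only: it assumes standard average residue masses for each monosaccharide and assumes that the sialic acid is N-acetylneuraminic acid, which the text does not state explicitly. The dictionary and function names are hypothetical.

```python
# Average residue masses in Da (monosaccharide minus water).
# Assumption: sialic acid (S) is N-acetylneuraminic acid.
AVG_RESIDUE_MASS = {
    "H": 162.1406,  # hexose
    "N": 203.1925,  # N-acetylhexosamine
    "F": 146.1412,  # fucose (deoxyhexose)
    "S": 291.2546,  # sialic acid (NeuAc, assumed)
}


def glycan_mass_shift(composition: dict) -> float:
    """Mass added to a protein by a glycan with the given residue counts."""
    return sum(AVG_RESIDUE_MASS[residue] * count for residue, count in composition.items())


# Addition of the N-glycan H3N6F3 (expected shift of about 2144 Da)
h3n6f3 = glycan_mass_shift({"H": 3, "N": 6, "F": 3})

# Exchange of one hexose for one N-acetylhexosamine (about 41 Da spacing)
h_to_n = AVG_RESIDUE_MASS["N"] - AVG_RESIDUE_MASS["H"]

# Two fucose units versus one sialic acid (about 1 Da apart)
fuc2_vs_sia = 2 * AVG_RESIDUE_MASS["F"] - AVG_RESIDUE_MASS["S"]

print(f"H3N6F3: {h3n6f3:.1f} Da, H->N: {h_to_n:.1f} Da, 2F-S: {fuc2_vs_sia:.1f} Da")
```

Under these assumptions, the three values reproduce the 2144 Da shift, the 41 Da spacing, and the roughly 1 Da difference between two fucose units and one sialic acid that makes those compositions hard to distinguish at the intact level.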

HILIC Separation Reduced Spectral Congestion and Improved Detection of Low Abundance Glycoforms

A significant challenge in MS of heterogeneous macromolecules (such as glycoproteins) remains the effective determination of charge states prior to accurate mass determination by deconvolution. The complexity may arise from the lack of mass resolution and high spectral baselines due to peak coalescence. In addition to the advances of new algorithms[42] and instruments (e.g., charge detection MS),[43−45] online separations also play essential roles in reducing sample complexity. Figure 2 displays the deconvolved intact mass distributions for each tested RBD using HILIC separation. Each colored trace corresponds to the intact mass distribution at a given apex elution time from the HILIC separation. With each RBD, the intact mass increases with retention time. Visual comparison of the intact mass profiles between RBDs immediately suggests that the glycan compositions for the Sino Biological WT and N501Y RBDs (Figure 2A,C) are different from those of the RayBiotech WT RBD (Figure 2B). Importantly, the two Sino Biological RBDs (WT and N501Y) have very similar intact mass profiles that are only shifted by the N to Y amino acid substitution (49.07 Da). The RayBiotech WT RBD has a broader intact mass profile than the Sino Biological RBDs, which are centered around a few high-abundance mass peaks. The N331Q mutant (removes the N331 glycosite, Figure 2D) displayed a simpler intact mass profile with reduced glycosylation.
Figure 2

Deconvolved intact mass analysis of tested RBDs separated with monolithic HILIC across the elution profile with apex times given for each elution slice. (A) Sino Biological WT RBD. The same raw data as shown in Figure 1. (B) RayBiotech WT RBD. (C) Sino Biological N501Y RBD. (D) RayBiotech N331Q RBD. The intact mass distributions (without peak filtering) for HILIC separated slices are represented by the overlaid color traces. The mass distributions of the WT RBDs are drastically different between the two vendors, while the N501Y mutant is simply shifted by the mass of the mutation in comparison to the WT RBD from the same vendor.

With each RBD, more than 200 peaks were detected due to the number of different glycoforms, creating a broad and congested mass distribution that is challenging to interpret. Using the RayBiotech WT RBD, which had the most complex glycosylation pattern among the four RBDs, as an example, Figure 3A displays the TIC for the HILIC separation with selected mass spectra of RBD species from the elution (Figure 3B). Summing the elution window together produced a heavily congested mass spectrum (Figure 3C). Some of the lighter RBD species for this sample (red and blue mass spectral traces in Figure 3B) are easily lost without the use of the HILIC glycoform separation due to the high baseline and overlapping charge state distributions observed. For instance, the 12+ RBD species weighing 29273.3 Da (retention time 16.5 min) overlaps with heavier 13+ RBD species weighing ∼31,570 Da that elute later (retention time 19.5 min). However, taking 30 second windowed slices along the elution drastically reduces the spectral complexity before performing mass deconvolution, which eases detection of the lighter RBD species.
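The charge state overlap described above follows directly from the relation between neutral mass and m/z. The sketch below is a simple illustration, assuming positive-mode protonated ions and a proton mass of 1.00728 Da; the function names are hypothetical, and the masses are the two values quoted in the text.

```python
PROTON_MASS = 1.00728  # Da, assumed charge carrier in positive-mode ESI


def mz_from_mass(neutral_mass: float, charge: int) -> float:
    """m/z of an ion with `charge` protons attached to a neutral species."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz: float, charge: int) -> float:
    """Neutral mass implied by an m/z value at an assumed charge state."""
    return (mz - PROTON_MASS) * charge


# Lighter, early-eluting species at 12+ and heavier, late-eluting species at 13+
mz_light = mz_from_mass(29273.3, 12)
mz_heavy = mz_from_mass(31570.0, 13)
print(f"12+ of 29273.3 Da: m/z {mz_light:.1f}")
print(f"13+ of 31570 Da:   m/z {mz_heavy:.1f}")

# Neutral mass that would land on the same m/z if read as a 13+ ion
print(f"Same m/z read as 13+: {mass_from_mz(mz_light, 13):.0f} Da")
```

The two ions fall only about 11 m/z units apart, so in a summed spectrum of a heterogeneous glycoprotein the broad 13+ envelope of the heavier glycoforms can readily mask the 12+ signal of the lighter species. Slicing the elution in time separates these populations before deconvolution.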
Figure 3

(A) Example chromatogram and (B) overlaid mass spectra from the RayBiotech WT HILIC elution. Using 30 second windowed slices (colored segments and traces, red: 16.5–17 min, blue: 18–18.5 min, and gold: 19.5–20 min) from the elution aids resolution of more RBD glycoforms due to the reduction in spectral congestion in comparison to summing the full elution window together (C) (16.5–26.5 min) where overlapping charge state distributions can be observed. For example, the 12+ RBD species weighing 29273.3 Da (most abundant) overlaps with heavier 13+ RBD species weighing ∼31,500 Da.

This spectral complexity inevitably resulted in variability and artifact peaks during deconvolution. We thus implemented a peak filtering approach to ensure the consistency of results (more details in the Supporting Methods section). In essence, elution time slices (18 total) for each separation method were deconvolved separately. The resultant mass lists were then merged and filtered by removing mass peaks not observed in three technical replicates with a tolerance of ±2 Da. Representative examples of the chromatographic reproducibility and deconvolution results after peak filtering for the Sino Biological WT are plotted in Figure S4A−C with each separation method, and in Figure S4D for comparison of the observed intact masses. This approach combined the intact mass distribution from triplicates and reduced the influence of noise in the deconvolution when comparing different samples or conditions. On comparing the number of observed RBD proteoforms, HILIC detected the highest number (261 peaks), including low abundance species, in comparison to C2 RPLC (129 peaks) and CZE (177 peaks) (Figure S4F). It is noted that the filtering step kept ∼10% of the total peaks, while ∼70 and ∼20% of peaks showed up only in one or two replicates, respectively.
The average median abundances of peaks observed in all triplicates were consistently higher than those of peaks observed in one or two replicates for all separation methods, suggesting higher variability in detection of low-abundance species (Figure S5). Using this peak filtering approach, we compared the total numbers of peaks after deconvolution using the HILIC separation across time slices (the same as Figure 3B) vs summing across the full elution window (the same as Figure 3C) for all four RBDs (Figure S6A–D). Not surprisingly, separation increased the number of peaks by at least ∼3 fold (Figure S6E) and showed the lowest median abundances of the detected peaks (Figure S6F). Overall, the windowed elution slices and peak filtering approach used here best utilize the HILIC glycoform separation to detect the greatest number of proteoforms while reducing the influence of noise from the deconvolution process from spectral congestion issues. We additionally compared the chromatographic performance of a packed HILIC column (with commercially available packing material) to our monolithic HILIC column format (Figure S7).[38] The monolithic HILIC column consistently had lower chromatographic baselines and better peak resolution, which thus led us to focus on the monolithic format.

HILIC Top-Down Analysis Reported more Low-Abundance, Heavily Glycosylated Proteoforms than What Were Predicted from Glycopeptide Data

From the intact mass measurements alone, initial fitting of the masses to previously reported O- and N-glycans proved very difficult. The measured masses were 206 Da higher for the Sino Biological and 733 Da higher for the RayBiotech RBDs than expected (after considering the known C-terminal polyhistidine affinity tag), suggesting additional modifications. To better confirm the protein sequence, the N-glycans were removed with PNGase F, and the protein was reduced and then denatured to produce O-glycoforms for direct infusion top-down analysis, similar to the approach used by Roberts et al.[34] Removal of N-glycosylation significantly reduced spectral complexity and improved fragmentation for sequencing the protein backbones. After accounting for the expected sequence mass (including purification tags) and the O-glycoforms, the RayBiotech WT and N331Q RBDs were still 613.0 and 610.3 Da heavier than expected, respectively (Figure S8). Manual de novo sequencing of the N-terminus for the RayBiotech WT RBD by electron transfer dissociation (ETD) fragmentation (Figure S9)[46] determined that five amino acids with the sequence KSMHM (weighing 614.77 Da) remained from the partially cleaved signaling peptide. Top-down with HCD for each RBD produced only C-terminal and glycan fragments that matched the expected isolated O-glycoform but did not reveal the N-terminus (Figure S10), possibly due to the complication of O-glycosylation. Using the same approach, N-terminal ETD fragments for the Sino Biological WT (Figure S11) explained the extra 85.1 Da by an additional serine residue remaining from the signaling peptide sequence, as was previously reported for this RBD.[34] We also identified potential cysteinylation at C538 based on nonreducing intact mass analysis after N-glycan removal (Figure S12) and known information in a recent report.[33] Additionally, we detected an unknown 186 Da mass shift unique to the two Sino Biological RBDs (Figure S12C,D).
This mass shift was present in the intact RBDs (without any treatment, Figures 1 and 2) but lost with TCEP reduction. It cannot be readily assigned to known N- or O-glycans but could be a noncovalent adduct protected by disulfides or an unknown covalent modification linked to a cysteine residue. The identity of this modification remains unclear but must be accounted for in the intact mass distribution. Together, all the RBDs studied here had additional N-terminal residues added to the expected RBD sequence from the signaling peptide and cysteinylation, consistent with other reports by Gstottner et al.[33] and Roberts et al.[34] Notably, standard peptide mapping by bottom-up analysis produced 100 and 92.1% sequence coverage for the RayBiotech WT and Sino Biological WT, respectively (Figures S13 and S14), without considering the modified N-terminal sequence. Since the expected N-termini begin with arginine, additional preceding residues are easily missed due to the trypsin cleavage. Despite the relatively short sequence of the signaling peptides at N-termini, variable N-terminal cleavage points have been observed that leave additional residues, which could complicate RBD-based seropositivity assays if recombinant RBDs with variable N-termini are used.[47] Therefore, complete characterization of the sequence, especially with the power of top-down measurements, can be important for quality control of RBDs. After quantifying intact O-glycoforms and unexpected PTMs of all four RBDs from the intact mass and top-down data, we examined bottom-up glycopeptide (Figures S15–S18) and released glycan data (Figures S19 and S20; Tables S1–S4) regarding their coverage of glycosylation.
Overall, released N-glycans showed similar glycan profiles to glycopeptide data but captured few N-glycans with sialic acids, possibly due to the labile nature or detection bias of sialic acid groups.[48,49] Thus, we focused on combining the RBD intact masses after N-glycan removal (O-glycoforms) with the N-glycopeptide data for reconstruction of the mass distributions for each RBD by adapting the method reported by Yang et al.[50] Figure 4 plots the intact mass profile after peak filtering (top, black trace) with the corresponding reconstruction (bottom, red trace). Matching the reconstruction allowed us to attach assignments to at least half of the peaks in the filtered intact mass distributions, including the selected glycoforms in Figure 1. Some intact mass peaks could have multiple glycan assignments due to structural isomers or exchange of glycans between sites. Where multiple assignments overlapped in mass, the glycopeptide abundances were used to infer which species were the most likely.
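The idea of reconstructing an intact mass distribution from O-glycoform and site-specific N-glycan data can be sketched as a combinatorial product. The example below is not the authors' implementation or the exact method of Yang et al.; it is a simplified illustration that assumes glycosylation at each site is statistically independent and that every input is a list of (mass, relative abundance) pairs. The function name `reconstruct_profile` and all numbers are hypothetical.

```python
from collections import defaultdict
from itertools import product


def reconstruct_profile(o_glycoforms, site_glycans, bin_width=1.0):
    """Combine every O-glycoform with every choice of N-glycan per site.

    o_glycoforms: list of (intact mass after N-glycan removal, fraction)
    site_glycans: one list per N-glycosite of (mass shift, fraction);
                  a shift of 0.0 represents an unoccupied site.
    Assumes independent sites, so combined abundance is a simple product.
    """
    profile = defaultdict(float)
    for combo in product(o_glycoforms, *site_glycans):
        mass = sum(m for m, _ in combo)
        weight = 1.0
        for _, fraction in combo:
            weight *= fraction
        # Bin the mass so near-isobaric combinations add together
        profile[round(mass / bin_width) * bin_width] += weight
    return dict(sorted(profile.items()))


# Hypothetical inputs: two O-glycoforms and two N-glycosites
o_forms = [(27000.0, 0.6), (27291.0, 0.4)]
site_a = [(0.0, 0.1), (2144.0, 0.9)]   # sometimes unoccupied
site_b = [(2144.0, 1.0)]               # always occupied
reconstruction = reconstruct_profile(o_forms, [site_a, site_b])
for mass, abundance in reconstruction.items():
    print(f"{mass:.0f} Da: {abundance:.3f}")
```

Because the product runs over all sites, the number of predicted glycoforms grows multiplicatively with the number of glycans observed per site, which is consistent with the hundreds of peaks seen in the intact mass profiles. Any real correlation between sites would violate the independence assumption made here.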
Figure 4

(A–D) Comparison of HILIC separated deconvolved RBD mass distributions after peak filtering (top, black trace) with the reconstructed mass distribution from top-down and glycopeptide N-glycan analysis (bottom, red trace) for each RBD. The Pearson correlation between the intact mass spectrum and the reconstruction is given for every RBD. Note: For the Sino Biological RBDs, only glycosite N331 was observed to be sometimes unoccupied. The relative abundances of unoccupied glycosites were estimated to be 8% based on the best fit to the intact mass distribution as described in the Supporting Method.

Among all the RBDs investigated, the RayBiotech N331Q RBD (Figure 4D), which was the least heterogeneous sample, had the best fit between the reconstruction and the experimental data with a Pearson correlation of 0.83 and 52 assignable peaks in the reconstruction out of 107 experimental peaks matching within ±2 Da. The RayBiotech WT RBD had a correlation of 0.76 and matched 98 of 150 peaks. Similarly, the Sino Biological RBDs had correlations of 0.74 and 0.67 with 210/261 and 160/202 mass peaks matching the reconstruction for the WT and N501Y RBDs, respectively. Despite the overall agreement, significant amounts of higher mass peaks (>33 kDa and up to ∼30% relative abundance) in the experiment were unaccounted for in the reconstruction for the RayBiotech WT RBD (Figure 4B). Underestimations of high mass species were also seen in the two Sino Biological RBDs (the tails in the mass distribution >33 kDa in Figure 4A,C), but those species were at a much lower abundance (up to ∼5%). This discrepancy could be attributed to the detection biases in the glycopeptide analysis.
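As an illustration of the comparison described above, the following minimal Python sketch pairs experimental and reconstructed peaks within a ±2 Da window and computes a Pearson correlation over the paired abundances. The peak lists, the function names, and the choice to score unmatched experimental peaks as zero are assumptions made for illustration only; the authors' actual procedure is described in their Supporting Method and may differ.

```python
import math

def match_peaks(experimental, reconstructed, tol_da=2.0):
    """Pair each experimental peak with the nearest reconstructed peak.

    Both inputs are non-empty lists of (mass_da, relative_abundance).
    Unmatched experimental peaks are paired with abundance 0.0
    (an assumption of this sketch, not a documented choice of the paper).
    """
    paired = []
    n_matched = 0
    for exp_mass, exp_ab in experimental:
        nearest = min(reconstructed, key=lambda p: abs(p[0] - exp_mass))
        if abs(nearest[0] - exp_mass) <= tol_da:
            paired.append((exp_ab, nearest[1]))
            n_matched += 1
        else:
            paired.append((exp_ab, 0.0))
    return paired, n_matched

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical toy peak lists (mass in Da, relative abundance in %).
experimental = [(30000.0, 100.0), (30162.0, 60.0), (30324.0, 30.0), (30615.0, 10.0)]
reconstructed = [(30000.5, 90.0), (30161.2, 70.0), (30325.0, 20.0)]

paired, n_matched = match_peaks(experimental, reconstructed)
r = pearson([p[0] for p in paired], [p[1] for p in paired])
print(f"{n_matched}/{len(experimental)} peaks matched, Pearson r = {r:.2f}")
```

In this toy case the fourth experimental peak has no reconstructed partner within the window, which mirrors how unassigned high-mass peaks lower both the matched count and the correlation.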
It is known that glycosylation can reduce the ionization efficiency of glycopeptides relative to unglycosylated peptides, which complicates quantitative glycan analysis without derivatization.[51,52] Additionally, sialic acid is known to substantially shift the retention time of glycopeptides, with more heavily sialylated peptides eluting later in reverse-phase chromatography. Sialic acid can also undergo more substantial in-source fragmentation than other glycan groups.[53,54] The glycoforms missing from the reconstruction likely carry even larger N-glycans with more sialic acid and/or fucose that are not readily detected by the glycopeptide or released glycan analysis. For instance, the matched reconstruction and experimental peak with the highest mass for the RayBiotech WT RBD weighs 33,296 Da (O-glycan: H2N2S2; combined N-glycans: H10N10F2S3). Many additional higher mass peak pairs (e.g., 33,587 and 33,878 Da) are observed exclusively in the experimental data and are each spaced by one sialic acid (291 Da), supporting the possibility that the signal of sialylated glycopeptides is suppressed in our peptide data. In native or denaturing intact mode MS, ionization efficiency is largely driven by the protein backbone, which alleviates the ionization bias based on the extent of glycosylation.[55] This hypothesis could be tested in future work using targeted methods (e.g., derivatization, negative mode) that minimize losses and enhance the signal of sialic acids.[54] We also noted that both Sino Biological RBDs showed more heterogeneous glycoform distributions (∼3 times more peaks) in the reconstructions than in the experimental intact mass distribution. Herein, we collected dual enzyme digestion data to separately define the glycosylation on N331 and N343 for the two Sino Biological RBDs.
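The sialic acid spacing argument can be checked with simple arithmetic: 33,296 + 291 = 33,587 Da and 33,587 + 291 = 33,878 Da. The short Python sketch below scans a peak list for pairs separated by one sialic acid. The three masses and the 291 Da spacing come from the text; the function name and the 2 Da tolerance (borrowed from the peak-matching window) are assumptions for illustration.

```python
SIALIC_ACID_DA = 291.0  # nominal sialic acid spacing quoted in the text

def find_ladder_pairs(masses, spacing_da, tol_da=2.0):
    """Return (low, high) mass pairs whose difference equals spacing_da within tol_da."""
    ordered = sorted(masses)
    pairs = []
    for i, low in enumerate(ordered):
        for high in ordered[i + 1:]:
            if abs((high - low) - spacing_da) <= tol_da:
                pairs.append((low, high))
    return pairs

# Highest matched peak followed by two experiment-only peaks (Da), from the text.
peaks_da = [33296.0, 33587.0, 33878.0]
pairs = find_ladder_pairs(peaks_da, SIALIC_ACID_DA)
for low, high in pairs:
    print(f"{low:.0f} -> {high:.0f} Da (+{high - low:.0f} Da)")
```

A regular ladder of this kind is consistent with stepwise addition of sialic acid, although the mass difference alone does not prove the assignment.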
We suspect that the N-glycans on these two glycosites were likely correlated and not randomly combined as assumed in the reconstruction (i.e., connectivity between the microheterogeneity). In our RayBiotech WT RBD data, the N331 and N343 glycans were defined on a single tryptic peptide, maintaining the native connectivity between the two sites and therefore yielding a more similar reconstruction to the experimental spectrum. While the connectivity and variation of glycosylation across multiple sites (i.e., metaheterogeneity) require further confirmation, our preliminary analysis showed that a top-down framework with HILIC glycoform separation can be highly beneficial for defining the combinations of glycosylation with potentially reduced detection biases.
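To make the random-combination assumption concrete, the following Python sketch builds an intact mass profile from two independently measured glycosites by summing masses and multiplying site-level fractions. All glycan labels, masses, the backbone mass, and the function name are hypothetical placeholders; only the 8% unoccupied fraction at N331 echoes the estimate given in the Figure 4 caption. This is a simplified illustration of the independence assumption, not the authors' reconstruction code.

```python
from collections import defaultdict

def combine_sites(site_a, site_b, backbone_da):
    """Combine two glycosite distributions assuming independence.

    Each site maps a glycan label to (mass_da, fraction); fractions sum to 1.
    Returns {intact_mass_da: abundance}, merging combinations of equal mass.
    """
    profile = defaultdict(float)
    for mass_a, frac_a in site_a.values():
        for mass_b, frac_b in site_b.values():
            intact_mass = round(backbone_da + mass_a + mass_b, 1)
            profile[intact_mass] += frac_a * frac_b
    return dict(profile)

# Hypothetical site-level distributions (masses in Da are placeholders).
site_n331 = {
    "unoccupied": (0.0, 0.08),
    "glycan_a": (1444.0, 0.50),
    "glycan_b": (1606.0, 0.42),
}
site_n343 = {
    "glycan_a": (1444.0, 0.60),
    "glycan_c": (1897.0, 0.40),
}

profile = combine_sites(site_n331, site_n343, backbone_da=25000.0)
for mass in sorted(profile):
    print(f"{mass:.1f} Da: {profile[mass]:.3f}")
```

If glycosylation at the two sites is correlated, as suspected above, the true joint distribution would contain fewer combinations than this product of marginals, which is one way the reconstruction could appear more heterogeneous than the measured intact mass profile.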

Conclusions

Here, we demonstrated that online monolithic HILIC provided improved detection of RBD intact glycoforms compared to C2 RPLC and CZE, attributable to HILIC’s superior separation of neutral glycans.[28,30] While all separation methods helped reduce spectral congestion, the heterogeneity of the glycoproteins still caused considerable variation in the detection of glycoforms even among technical replicates. The peak filtering approach used here retained reproducible RBD proteoforms while limiting the influence of noise for these heterogeneous and difficult-to-study proteins. The intact mass profile of the detected glycoforms provides a rapid assessment of the integrity of the RBDs and readily revealed unexpected mass shifts from PTMs and sequence variations. When combined with top-down fragmentation, glycopeptide, and released glycan analyses, up to 75% of the intact masses could be assigned based on the computationally reconstructed mass profiles integrating all the data. Interestingly, our reconstruction based on glycopeptide data generally showed fewer low abundance and/or heavily glycosylated species than the top-down data, especially for the more heterogeneous RBDs. Such discrepancies have been reported consistently, to differing extents, in several recent studies of RBD/spike glycosylation[33,45] and in other glycoprotein analyses.[49] Given the known experimental biases in glycopeptide analysis, incorporating top-down data will be highly valuable for more accurate characterization of glycosylation. While it remains technically challenging to directly assign individual glycoforms by online top-down fragmentation, the continuing advances in MS instrumentation methods such as proton transfer charge reduction[56,57] and charge detection mass spectrometry[45] will likely allow more comprehensive analysis of heterogeneous glycoproteins in the near future.
New online separation modalities such as the HILIC method described here will also be indispensable for reducing sample complexity prior to MS analysis. Currently, HILIC separation of intact glycoproteins is uncommon but has shown great promise in the separation of glycoproteins that vary in the glycan occupancy and the amount of neutral glycans.[28,30,58] Future development could utilize the HILIC separation capacity for online top-down fragmentation of glycoforms to better define the PTMs at the intact protein level. In addition, the potential separation of glycoform isomers by HILIC should also be investigated for reducing ambiguity in intact mass assignment. Given the importance of protein glycosylation in modulating immune responses[12,13] and many other biological processes, we envision that HILIC coupled to MS will have great potential in the characterization of heterogeneous glycoprotein products. To realize this, solutions to reduce the amount of TFA used in the separation and column formats that allow for more sensitive analysis should be developed and commercialized.[38]
  56 in total

1.  Resolving heterogeneous macromolecular assemblies by Orbitrap-based single-particle charge detection mass spectrometry.

Authors:  Tobias P Wörner; Joost Snijder; Antonette Bennett; Mavis Agbandje-McKenna; Alexander A Makarov; Albert J R Heck
Journal:  Nat Methods       Date:  2020-03-09       Impact factor: 28.547

Review 2.  Glycan analysis for protein therapeutics.

Authors:  Xiangkun Yang; Michael G Bartlett
Journal:  J Chromatogr B Analyt Technol Biomed Life Sci       Date:  2019-04-26       Impact factor: 3.205

3.  Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor.

Authors:  Jun Lan; Jiwan Ge; Jinfang Yu; Sisi Shan; Huan Zhou; Shilong Fan; Qi Zhang; Xuanling Shi; Qisheng Wang; Linqi Zhang; Xinquan Wang
Journal:  Nature       Date:  2020-03-30       Impact factor: 49.962

4.  Proton Transfer Charge Reduction Enables High-Throughput Top-Down Analysis of Large Proteoforms.

Authors:  Romain Huguet; Christopher Mullen; Kristina Srzentić; Joseph B Greer; Ryan T Fellers; Vlad Zabrouskov; John E P Syka; Neil L Kelleher; Luca Fornelli
Journal:  Anal Chem       Date:  2019-11-22       Impact factor: 6.986

Review 5.  Approaches to Heterogeneity in Native Mass Spectrometry.

Authors:  Amber D Rolland; James S Prell
Journal:  Chem Rev       Date:  2021-09-01       Impact factor: 72.087

6.  Site-specific glycan analysis of the SARS-CoV-2 spike.

Authors:  Yasunori Watanabe; Joel D Allen; Daniel Wrapp; Jason S McLellan; Max Crispin
Journal:  Science       Date:  2020-05-04       Impact factor: 47.728

7.  Poly(acrylamide-co-N,N'-methylenebisacrylamide) Monoliths for High-Peak-Capacity Hydrophilic-Interaction Chromatography-High-Resolution Mass Spectrometry of Intact Proteins at Low Trifluoroacetic Acid Content.

Authors:  Marta Passamonti; Chiem de Roos; Peter J Schoenmakers; Andrea F G Gargano
Journal:  Anal Chem       Date:  2021-11-22       Impact factor: 6.986

Review 8.  The spike protein of SARS-CoV--a target for vaccine and therapeutic development.

Authors:  Lanying Du; Yuxian He; Yusen Zhou; Shuwen Liu; Bo-Jian Zheng; Shibo Jiang
Journal:  Nat Rev Microbiol       Date:  2009-02-09       Impact factor: 60.633

9.  Analysis of the SARS-CoV-2 spike protein glycan shield reveals implications for immune recognition.

Authors:  Oliver C Grant; David Montgomery; Keigo Ito; Robert J Woods
Journal:  Sci Rep       Date:  2020-09-14       Impact factor: 4.379

10.  Characterization of therapeutic proteins by cation exchange chromatography-mass spectrometry and top-down analysis.

Authors:  Rachel Liuqing Shi; Gang Xiao; Thomas M Dillon; Margaret S Ricci; Pavel V Bondarenko
Journal:  MAbs       Date:  2020 Jan-Dec       Impact factor: 5.857

