An Integrated Strategy Reveals Complex Glycosylation of Erythropoietin Using Mass Spectrometry.

Yudong Guan¹, Min Zhang², Manasi Gaikwad², Hannah Voss², Ramin Fazel³, Samira Ansari⁴, Huali Shen⁵, Jigang Wang¹, Hartmut Schlüter².

Abstract

The characterization of therapeutic glycoproteins is challenging due to the structural heterogeneity of the therapeutic protein glycosylation. This study presents an in-depth analytical strategy for glycosylation of first-generation erythropoietin (epoetin beta), including a developed mass spectrometric workflow for N-glycan analysis, bottom-up mass spectrometric methods for site-specific N-glycosylation, and an LC-MS approach for O-glycan identification. Permethylated N-glycans, peptides, and enriched glycopeptides of erythropoietin were analyzed by nanoLC-MS/MS, and de-N-glycosylated erythropoietin was measured by LC-MS, enabling the qualitative and quantitative analysis of glycosylation and different glycan modifications (e.g., phosphorylation and O-acetylation). The newly developed Python scripts enabled the identification of 140 N-glycan compositions (237 N-glycan structures) from erythropoietin, including 8 phosphorylated N-glycan species. The site-specificity of N-glycans was revealed at the glycopeptide level by pGlyco software using different proteases. In total, 114 N-glycan compositions were identified from glycopeptide analysis. Moreover, LC-MS analysis of de-N-glycosylated erythropoietin species identified two O-glycan compositions based on the mass shifts between non-O-glycosylated and O-glycosylated species. Finally, this integrated strategy was shown to enable in-depth glycosylation analysis of a therapeutic glycoprotein, which helps to understand its pharmacological properties and to improve the manufacturing processes.

Keywords: Python scripts; bottom-up; erythropoietin; glycoinformatics; glycosylation; mass spectrometry

Year: 2021 PMID: 34110173 PMCID: PMC9472269 DOI: 10.1021/acs.jproteome.1c00221

Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 5.370

Introduction

Erythropoietin (EPO) is a renal growth factor that stimulates the proliferation and differentiation of erythroid-progenitor cells in the bone marrow and is therefore used for the treatment of a wide spectrum of erythrocyte deficiencies; it was commercialized in 1989 by Amgen as Epogen.[1] The binding of EPO to its receptor (EPOR), which initiates the intracellular signaling pathways, depends strongly on its amino acid sequence and post-translational modifications (PTMs), such as glycosylation and glycan modifications.[2] EPO is available in four different forms, associated with different glycosylation patterns, including first-generation EPOs (epoetin alfa, epoetin beta, and epoetin delta) and second-generation EPO (darbepoetin alfa). Darbepoetin alfa has five modified amino acids and two additional N-glycosylation sites compared to first-generation EPOs.[3] First-generation epoetin consists of 166 amino acids with mono-, bi-, tri-, or tetra-antennary glycan structures on three N-glycosylation sites (Asn-24, Asn-38, and Asn-83) and one O-glycosylation site (Ser-126). Glycosylation has significant impacts on the biological functions of a protein, including protein stability, solubility, antigenicity, folding, and half-life.[4] An in-depth characterization of glycan patterns is important to guarantee the production quality of biotherapeutic glycoproteins with appropriate pharmacological characteristics. In contrast to polypeptides, the biosynthesis of glycans is not directly template-driven, which results in high heterogeneity of glycoproteins in biotechnological production.
It has been shown that the glycan composition of biopharmaceuticals depends critically on the presence and activity of different glycosyltransferases, the concentration of substrates, cell lines, and cell culture conditions, making the production of therapeutic glycoproteins challenging.[5] EPO is highly heterogeneous with a variety of glycosylation patterns as well as post-glycosylational modifications, such as sialylation by N-acetylneuraminic acid (Neu5Ac) or N-glycolylneuraminic acid (Neu5Gc), phosphorylation, and O-acetylation.[6] These patterns have a significant impact on the drug efficiency and adverse drug effects in EPO-treated patients. As an example, increases in Neu5Gc-containing glycoproteins can lead to immunogenic or inflammatory responses in patients.[7,8] Phosphate groups are commonly attached on the C6 position of mannose (Man) of N-glycans.[9,10] O-Acetyl groups bind to either the C4-, C7-, C8-, or C9-hydroxyl position of sialic acid residues.[11] For N-glycan sample preparation, O-acetylation is labile and removed during the commonly used permethylation, simplifying data analysis for N-glycan compositions.[12] However, previous investigations mainly analyzed EPO with total attached glycan compositions or focused on highly abundant N-glycans due to the lack of an efficient glycoinformatics strategy.[3,13−15] Liquid chromatography (LC) coupled with electrospray ionization (ESI)-mass spectrometry (MS) and matrix-assisted laser desorption ionization (MALDI)-MS are used frequently to analyze biomolecules with outstanding accuracy and reproducibility, as in proteomics.[16−18] For glycan analysis, various LC-MS-based strategies have been utilized, such as porous graphitic carbon (PGC)-LC-MS,[19] hydrophilic interaction liquid chromatography (HILIC)-LC-MS,[20,21] and reversed-phase (RP)-LC-MS.[22] In contrast to proteomics, MS-based glycomics is limited by differing standard procedures across laboratories, technical limitations, as well as the low availability of reliable glycoinformatics approaches, requiring efficient data interpretation software.[23]
NanoLC-MS/MS is better able to identify low-abundance glycans than MALDI-MS because, without chromatographic separation, ionization suppression occurs during laser desorption in MALDI-MS. An improved strategy to perform optimized solid-phase permethylation (OSPP) for in-depth N-glycomics, using nanoLC(RP)-MS/MS combined with the developed Python script matching algorithm, has been reported in our previous study.[24] Aiming to characterize each glycan precisely, an integrated analytical workflow was developed to analyze different glycosylation patterns of EPO (epoetin beta), expressed by Chinese hamster ovary (CHO) cell lines containing a cloned human erythropoietin gene, at N-glycan, glycopeptide, and de-N-glycosylated protein levels. In this platform, N-glycans were measured by nanoLC(RP)-MS/MS after N-glycan release, purification, reduction, and permethylation, followed by data analysis using newly designed Python scripts. In addition, enriched glycopeptides were analyzed by a bottom-up approach using nanoLC(RP)-ESI-MS/MS after tryptic or chymotryptic digestion. Furthermore, LC-MS analysis of de-N-glycosylated EPO was performed, enabling the identification of O-glycans based on the mass shift between non-O-glycosylated and O-glycosylated species. The glycosylation analysis of EPO at multiple levels provides new insights to profile the glycan structures, understand its pharmacological properties, and control the manufacturing processes.

Materials and Methods

Materials

All used chemicals, including sodium hydroxide beads, dimethyl sulfoxide (DMSO), and iodomethane, were purchased from Sigma-Aldrich (Darmstadt, Germany), unless otherwise stated. A2F N-glycan standard was obtained from QA-Bio. Two different batches of EPOs (epoetin beta: RDF9729003 and RDF9729004) were kindly provided by CinnaGen (Tehran, Iran), which were herein named EPO-3 and EPO-4. Sequencing grade modified trypsin, chymotrypsin, and peptide-N-glycosidase F (PNGase F) were obtained from Promega (Madison, WI, USA). Centrifugal filters (0.5 mL, 3 kDa cutoff) were purchased from Merck KGaA (Darmstadt, Germany). Solid-phase-extraction (SPE) columns containing RP materials (C18 Sep-Pak cartridges) were obtained from Waters (Milford, MA, USA).

N-Glycan Release, Purification, Reduction, and Permethylation

N-Glycan cleavage, purification, reduction and permethylation were performed as described in previous studies.[24,25] EPO was denatured using 6 M urea, reduced by 20 mM dithiothreitol (DTT) at 56 °C for 30 min, and alkylated by 40 mM iodoacetamide (IAA) at room temperature for 30 min in the dark, in which DTT and IAA were prepared in 100 mM ammonium bicarbonate (ABC) buffer. Next, buffers were exchanged to 100 mM ABC buffer using 3 kDa cutoff centrifugal filters. Thirty units of PNGase F was added and incubated at 37 °C for 24 h, followed by the addition of trypsin (1/100, w/w) for another 20 h. N-Glycans and tryptic peptides were separated using a RP-SPE C18 cartridge. Briefly, the C18 cartridge was conditioned with 5 mL of methanol and equilibrated with 10 mL of 5% acetic acid, respectively. After loading the digested samples, N-glycans were eluted with 5 mL of 5% acetic acid and evaporated using a SpeedVac vacuum concentrator. The A2F N-glycan standard, containing Neu5Ac2HexNAc3Hex5Fuc1Red-HexNAc1 at a purity of more than 90%, was collected from porcine thyroglobulin by hydrazinolysis using a combination of HPLC and glycosidase digestion. EPO derived N-glycans and A2F N-glycan standard were reduced by borane-ammonia complex, followed by OSPP with iodomethane.[24] Briefly, 10 μg/μL borane-ammonia complex was used to reduce N-glycans at 60 °C for 1 h and then removed by evaporation with three additions of 300 μL of methanol. Reduced N-glycans were resuspended in 110 μL of water/DMSO (10/100, v/v) solution, and 100 μL of iodomethane was added, which was transferred to the glass vial containing sodium hydroxide beads (200 mg). Afterward, samples were incubated in a Thermomixer compact (Eppendorf AG, Hamburg, Germany) using a rotation speed of 1300 rpm for 10 min. The solution containing permethylated N-glycans was transferred to a new vial, followed by the addition of 200 μL 5% acetic acid and 400 μL of chloroform. 
Chloroform-water extraction was performed to purify the permethylated N-glycans, which were dried in a SpeedVac vacuum concentrator and stored at −20 °C until further use.

NanoLC-MS/MS Analysis of Permethylated N-Glycans

The permethylated N-glycans, resuspended into 0.1% formic acid (FA), were injected into a Dionex UltiMate 3000 UPLC system (Thermo Fisher Scientific, Bremen, Germany), coupled with a tribrid quadrupole-orbitrap-ion trap mass spectrometer (Fusion, Thermo Fisher Scientific, Bremen, Germany).[26] Solvent A was 0.1% FA in water, and solvent B was 0.1% FA in ACN. The sample was desalted using a RP C18 trapping column (Thermo Scientific Acclaim PepMap, 100 μm × 2 cm, 5 μm, 100 Å) at a flow rate of 3 μL/min with 2% solvent B and then transported to an analytical RP C18 column (Thermo Scientific Acclaim PepMap RSLC, 75 μm × 50 cm, 2 μm, 100 Å) at a flow rate of 0.2 μL/min. For N-glycan separation, solvent B started with 10%, increasing to 30% in 5 min, to 75% in 70 min, and finally to 95% in 80 min. Detailed MS instrument parameters are described in the Supporting Information. The data were visualized using Thermo Xcalibur in version 4.2.28.14 (Thermo Fisher Scientific, Bremen, Germany).

Identification and Statistical Analysis of Permethylated N-Glycans

For N-glycan identification, permethylated Neu5Gc, Neu5Ac, N-acetylhexosamine (HexNAc), hexose (Hex), fucose (Fuc), and reduced N-acetylhexosamine (Red-HexNAc) are utilized as the building blocks to match experimental masses with theoretical N-glycan compositions using a newly designed Python script. Within predefined deviation, the Python script can provide all the possible N-glycan compositions for one precursor without any N-glycan database, especially solving the problems derived from isobaric compositions like Neu5Ac1Hex1 and Neu5Gc1Fuc1 (Figure S1). In addition, it further extends to the N-glycans assembled with phosphorylated Man residues (P-Hex), considering the HexNAc1Hex3Red-HexNAc1 (trimannosylchitobiose core) (Figure S2). Mass spectrometric raw data was processed using the MaxQuant software in version 1.6.2.3 (http://www.maxquant.org). As MaxQuant was exclusively used for mass value extraction, no database was included and all settings were chosen as automatically recommended for mass spectra.[27] The file in “csv” format, including data columns of Raw file, m/z, and Mass extracted from “allPeptides.txt”, was input into Python scripts and matched with possible N-glycan compositions with a deviation threshold of 2.5 p.p.m. at the MS1 level. From the result table, the masses, matched to N-glycan compositions, were further matched with monoisotopic m/z in the MS raw data. Finally, the positively matched masses were characterized as a preferred N-glycan structure with the assistance of GlycoWorkbench in version 2.1 build 146 (https://download.cnet.com/GlycoWorkbench-64-bit/3000-2383_4-75758804.html). To perform quantitative analysis of N-glycans between different batches of EPO samples, EPO-3 and EPO-4, the peak area of each identified N-glycan composition was integrated using Skyline in version 19.1 (http://skyline.maccosslab.org). 
N-Glycan compositions and Total Area MS1 (abundance of each N-glycan composition) were uploaded to Perseus in version 1.6.2.1 (http://www.perseus-framework.org) for statistical analysis. Log2 transformation, normalization, and two-sample Student’s t-test at a false discovery rate (FDR) less than 0.05 were performed to identify significantly differential abundant N-glycans between the compared groups. Only N-glycans, showing a mean difference higher than 1.5- or 2-fold change (FC) between different EPO batches, were considered in further analysis.
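The database-free matching step described above can be illustrated with a minimal sketch. It is not the authors' script: the residue masses are standard monoisotopic values for permethylated residues and are assumptions here, as are the search limits in `MAX_COUNTS` and the function names. The P-Hex mass is taken as Hex plus the 93.98198 Da increase given in this article, and every candidate is assumed to carry exactly one Red-HexNAc.

```python
from itertools import product

# Monoisotopic masses (Da) of permethylated residues. Assumed standard values,
# not taken from the authors' scripts.
RESIDUES = {
    "Neu5Gc": 391.18423,
    "Neu5Ac": 361.17367,
    "HexNAc": 245.12632,
    "Hex": 204.09977,
    "Fuc": 174.08921,
    "P-Hex": 204.09977 + 93.98198,  # Hex plus the phosphate increase from the text
}
# Reduced, permethylated reducing-end HexNAc including both chain termini (assumed).
RED_HEXNAC = 307.19948
# Hypothetical upper bounds for the residue counts in the search.
MAX_COUNTS = {"Neu5Gc": 5, "Neu5Ac": 5, "HexNAc": 8, "Hex": 12, "Fuc": 2, "P-Hex": 1}


def composition_mass(counts):
    """Neutral mass of a composition that carries exactly one Red-HexNAc."""
    return RED_HEXNAC + sum(RESIDUES[name] * n for name, n in counts.items())


def match_compositions(observed_mass, tol_ppm=2.5):
    """Return all compositions whose mass lies within tol_ppm of observed_mass."""
    names = list(RESIDUES)
    hits = []
    for combo in product(*(range(MAX_COUNTS[n] + 1) for n in names)):
        counts = dict(zip(names, combo))
        theoretical = composition_mass(counts)
        ppm = (observed_mass - theoretical) / theoretical * 1e6
        if abs(ppm) <= tol_ppm:
            # Keep only residues that are present, plus the signed deviation.
            hits.append(({k: v for k, v in counts.items() if v}, round(ppm, 2)))
    return hits


# Isobaric example: Neu5Ac1Hex1 and Neu5Gc1Fuc1 add the same mass to a core.
query = composition_mass({"Neu5Ac": 1, "HexNAc": 1, "Hex": 4})
candidates = match_compositions(query)
```

With these assumed masses, the trimannosylchitobiose core (HexNAc1Hex3Red-HexNAc1) evaluates to about 1164.6251 Da, in line with the value quoted in this article, and a single precursor mass returns both isobaric candidates. This is why the MS2 level and retention time are still needed to decide between them.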

Tryptic and Chymotryptic Digestion of EPO

EPO samples were denatured, reduced, and alkylated as described above. Buffers were exchanged to 100 mM ABC buffer using 3 kDa cutoff centrifugal filters. Samples were digested using trypsin (1/100, w/w) or chymotrypsin (1/100, w/w) at 37 °C for 20 h. Next, peptides were divided into two aliquots for peptide analysis and glycopeptide enrichment, which were concentrated by a SpeedVac vacuum concentrator and stored at −20 °C until further use.

NanoLC-MS/MS Analysis and Raw Data Processing for Peptides

Digested EPO, by trypsin or chymotrypsin, was resuspended in 0.1% FA for nanoLC-MS/MS analysis. Solvent A was 0.1% FA in water, and solvent B was 0.1% FA in ACN. The peptides were desalted using a RP C18 trapping column at a flow rate of 3 μL/min with 2% solvent B and then transported to an analytical RP C18 column at a flow rate of 0.2 μL/min. Starting with 1% for 5 min, solvent B increased to 25% in 65 min, to 35% in 80 min, and finally to 90% in 85 min. Peptide ions were transferred to a hybrid quadrupole-orbitrap mass spectrometer (Q Exactive, Thermo Fisher Scientific, Bremen, Germany). Detailed MS instrument parameters are described in the Supporting Information. Peptide identification was performed using the pFind software in version 3.1.6 (http://pfind.ict.ac.cn/software/pFind3/),[28] with the FASTA database containing the amino acid sequence of epoetin beta (https://www.drugbank.ca/drugs/DB00016). For peptide identification, up to 3 missed cleavages were tolerated; peptides identified at a FDR less than 0.01 with a minimum length of 6 amino acids and a maximum length of 100 were considered; the precursor mass tolerance was set to 10 p.p.m.; the mass tolerance at MS2 was set to 20 p.p.m.; the oxidation of methionine residues was included as a variable modification; carbamidomethylation of cysteine residues was set as a fixed modification.

Glycopeptide Enrichment by Zwitterionic Hydrophilic Interaction Liquid Chromatography

Digested EPO, by trypsin or chymotrypsin, was resuspended in 80% ACN with 1% trifluoroacetic acid (TFA) and enriched using self-packed zwitterionic hydrophilic interaction liquid chromatography (ZIC-HILIC) columns.[29] Micro-columns were packed with 30 mg of ZIC-HILIC materials obtained from a HILIC column (SeQuant ZIC-cHILIC, 4.6 × 250 mm, 3 μm, 100 Å, Merck KGaA, Darmstadt, Germany) on top of a C8 RP chromatographic material (3M, Eagan, MN, USA). The column was equilibrated with 600 μL of 1% TFA in 80% ACN. After sample loading, low-hydrophilic peptides were first removed using 600 μL of 1% TFA in 80% ACN. Then, glycopeptides were eluted by 600 μL 0.1% TFA, concentrated by a SpeedVac vacuum concentrator, and stored at −20 °C until further use.

NanoLC-MS/MS Measurement and pGlyco-Based Data Analysis of Enriched Glycopeptides

Dried glycopeptides were dissolved in 0.1% FA and injected into a Dionex UltiMate 3000 UPLC system, coupled to a tribrid orbitrap mass spectrometer. Solvent A was 0.1% FA in water, and solvent B was 0.1% FA in ACN. Samples were first desalted using a RP C18 trapping column at a flow rate of 3 μL/min with 1% solvent B and then transported to an analytical RP C18 column at a flow rate of 0.2 μL/min. Solvent B started with 1% and increased to 20% in 90 min, to 30% in 120 min, and to 90% in 130 min. Detailed MS instrument parameters are described in the Supporting Information. Glycopeptides were identified by pGlyco software in version 2.2.0 (http://pfind.ict.ac.cn/software/pGlyco/index.html) with the total N-glycan entries of 7884.[30] By pGlyco identification, the amino acid sequence of epoetin beta was used as a FASTA database. The parameters were set as follows: the precursor tolerance was 5 p.p.m.; the fragment mass tolerance was 20 p.p.m.; the oxidation of methionine was set as a variable modification and carbamidomethylation of cysteine as a fixed modification.

Sample Preparation and Analysis of de-N-Glycosylated EPO

EPO was denatured, reduced, alkylated, and digested by PNGase F as mentioned above. de-N-Glycosylated EPO samples were injected into an LC system (Waters ACQUITY, Milford, MA, USA), coupled with a hybrid quadrupole-orbitrap mass spectrometer. Solvent A was 0.1% FA in water, and solvent B was 0.1% FA in ACN. Using an RP monolithic column (ProSwift RP-4H, 1 × 250 mm, Thermo Fisher Scientific, Bremen, Germany), solvent B started with 5% for a duration of 2 min, increasing to 25% in 7 min, and to 60% in 34 min at a flow rate of 0.2 mL/min. Detailed MS instrument parameters are described in the Supporting Information. To deconvolute the MS raw data, the parameters of BioPharma Finder were set as follows: the amino acid sequence of epoetin beta was used as a FASTA database; the oxidation of methionine and carbamidomethylation of cysteine residues were included as variable modifications; the deamidation on Asn-24, Asn-38, and Asn-83 was set as a static modification. For the processing method, the m/z range was set from 1500 to 3000; Xtract (isotopically resolved) was used as the deconvolution algorithm; the output mass range was set from 16,000 to 30,000; charge states between 8 and 25 were considered, and a minimum number of 6 was used for deconvolution; the mass tolerance was set to 20 p.p.m.

Results and Discussions

The Identification and Relative Quantification of Permethylated N-Glycans

Various monosaccharides served as building blocks in the Python script to match the highly accurate MS1 data (Figure 1a). Furthermore, phosphorylation of glycans has been shown to play crucial roles in protein transport and human diseases; for example, phosphorylation of Man is involved in controlling the extracellular levels of leukemia inhibitory factor (LIF).[31] However, phosphorylated glycans remain poorly characterized by MS platforms. In the designed Python script (Figure S2), phosphorylated N-glycans were covered, which showed an increase of 93.98198 Da on Man (P-Hex) after permethylation (Figure 1b). First, the A2F N-glycan standard was used to test this strategy, in which 52 N-glycan compositions (72 N-glycan structures) were identified at MS1 and MS2 levels after permethylation (Table S1). In addition, 140 N-glycan compositions were identified from EPO-3 and EPO-4 samples, including 237 N-glycan structures in total (Table S2). Among them, different isomeric structures derived from one N-glycan composition were identified based on distinguishable retention time (RT) and MS2 fragments. For example, Neu5Ac1HexNAc7Hex6Fuc1Red-HexNAc1, extracted from N-glycans of EPO, showed six isomeric structures using C18 separation (Figure 1c). Trimannosylchitobiose-free N-glycan species from EPO, containing a Red-HexNAc, were also considered for masses below 1164.6251 Da, which is the monoisotopic mass of the trimannosylchitobiose core structure. They were also confidently identified at MS1 and MS2 levels, although most species have no chitobiose cores for PNGase F cleavage (Figure S3).[32] Eight phosphorylated N-glycans were identified from EPO samples, showing a much higher coverage than the previous study.[33] In the MS2 spectrum of HexNAc1Hex5P-Hex1Red-HexNAc1 (Figure 1d), the diagnostic fragment ion at m/z 313.1720 was generated from the terminal P-Hex residue of the N-glycan by collision-induced dissociation (CID) fragmentation. 
However, N-glycans with O-acetylated sialic acids were not considered in the Python scripts due to the removal of O-acetylation during permethylation. The development of an analysis of these glycans should be included during native N-glycan analysis in a further study as the complementary structural characterization.

Figure 1

MS-based characterization of N-glycans with the newly designed Python scripts. (a) Basic building blocks used in the matching algorithm, showing each monoisotopic molecular weight (monoMW) after permethylation (GlcNAc, N-acetylglucosamine; GalNAc, N-acetylgalactosamine; Gal, galactose; Man, mannose; Glc, glucose). (b) Structures of permethylated P-Hex, as an additional building block to investigate glycan modification using Python script in this study. (c) Isomeric N-glycans separation by nanoLC(C18) and characterization by MS2 fragments, illustrated with the extracted Neu5Ac1HexNAc7Hex6Fuc1Red-HexNAc1 from the N-glycans of EPO. (d) Characterization of HexNAc1Hex5P-Hex1Red-HexNAc1 with extracted ion chromatogram (EIC) of precursors and the generated MS2 spectrum.

Compared with previous studies,[3,34−37] this glycoinformatics strategy identified at least a two-fold higher number of N-glycan species (Figure 2a). Next, classification was performed based on different modifications including sialylation and phosphorylation (Figure 2b). Most N-glycan species, 54%, were sialylated by Neu5Ac, 25% by both Neu5Ac and Neu5Gc, and 3% by Neu5Gc. The level of Neu5Gc-containing N-glycans, 28% in total, is a key characteristic to evaluate the safety of the biotherapeutics for patient treatment because of the immunogenic responses. In addition, 6% of identified N-glycan species were found to be phosphorylated. To compare the amount of each N-glycan species, EICs of different charged species were integrated using Skyline and log2 transformed. All quantified N-glycan species of EPO-3 are summarized in Figure 2c, showing the three most abundant N-glycan species, Neu5Ac4HexNAc5Hex7Fuc1Red-HexNAc1, Neu5Gc2Neu5Ac3HexNAc4Hex6Fuc1Red-HexNAc1, and Neu5Ac3HexNAc5Hex7Fuc1Red-HexNAc1, and the three least abundant N-glycan species, HexNAc1Hex3P-Hex1Red-HexNAc1, Neu5Gc1HexNAc3Hex5Red-HexNAc1, and HexNAc2Hex3Red-HexNAc1.

Figure 2

Comparative analysis of identified N-glycan compositions of EPO. (a) N-glycan coverage comparison between this study and other recent publications. (b) Category analysis of identified N-glycan compositions of EPO based on the different modifications (sialylation and phosphorylation). (c) Comparative quantification of all the identified N-glycan compositions of EPO-3, showing the three highest (top) and three lowest (bottom) abundant N-glycan species.

Batch-to-Batch Comparison of EPO Derived N-Glycans

In the production of therapeutic glycoproteins, the reproducibility of structural characteristics, such as PTMs, is highly important as it guarantees the continuous maintenance of pharmacological characteristics. Therefore, the N-glycans of two batches of EPO, EPO-3 and EPO-4, were compared to investigate the differential abundance of N-glycan species across batches, as an essential step for quality control (QC). Using N-glycan MS data, total ion chromatograms (TICs) of both batches were first compared, showing a high similarity (Figure 3a). It has been proposed that Neu5Gc modifications of glycoprotein pharmaceuticals can lead to immunogenic responses of patients.[8] During the biosynthesis of Neu5Gc, cytidine 5′-monophosphate (CMP)-Neu5Ac in the cytosol is the precursor to generate CMP-Neu5Gc, catalyzed by CMP-Neu5Ac hydroxylase (CMAH). Next, CMP-Neu5Gc is transferred into the Golgi and then used to assemble glycoconjugates by various sialyltransferases.[38] The ratio between Neu5Ac and Neu5Gc, as an index to quantify the generated Neu5Gc-containing N-glycans of EPO, was measured based on the peak areas of the MS2 diagnostic fragments, m/z 344.1704 for Neu5Ac and 374.1809 for Neu5Gc. The analysis revealed that the amount of Neu5Ac was about 18-fold larger compared to Neu5Gc in EPO-3 and 17-fold in EPO-4 (Figure 3b). After two-sample Student’s t-test analysis by Perseus software (EPO-3 vs EPO-4), the significantly differentially abundant N-glycan species were visualized in a volcano plot. The N-glycans showing at least 1.5-FC between the two batches were considered (Figure 3c). Neu5Ac1HexNAc7Hex8Fuc1Red-HexNAc1 (green dot) showed a 1.7-fold higher abundance in EPO-4 compared to EPO-3. In contrast, 11 N-glycan species showed at least 1.5-fold higher abundance in EPO-3 compared to EPO-4 (1.5-FC < blue dots < 2-FC and red dots > 2-FC). 
To further investigate the N-glycans with an abundance difference higher than 2-FC, six N-glycan species were compared in the heatmap (Figure 3d). As the most significant difference, HexNAc2Hex3Red-HexNAc1 showed a 4.85-fold higher abundance in EPO-3 compared to EPO-4. In addition, most N-glycans, 128 species (91.4%), fluctuated within 1.5-FC between the two batches of EPO samples, as an index to evaluate the stability and repeatability during EPO production.
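The fold-change thresholds used in the volcano plot can be expressed as a small helper. This is a simplified sketch: the function name and labels are hypothetical, and it covers only the fold-change criterion, not the t-test with FDR control performed in Perseus.

```python
import math


def classify_fold_change(abundance_a, abundance_b):
    """Return the log2 ratio (a vs b) and the fold-change class of one species."""
    log2_fc = math.log2(abundance_a / abundance_b)
    fold = 2 ** abs(log2_fc)  # direction-independent fold change
    if fold > 2:
        label = ">2-FC"
    elif fold >= 1.5:
        label = "1.5- to 2-FC"
    else:
        label = "<1.5-FC"
    return log2_fc, label


# Ratios reported in the text: 4.85-fold higher in EPO-3, 1.7-fold higher in EPO-4.
high_in_epo3 = classify_fold_change(4.85, 1.0)
high_in_epo4 = classify_fold_change(1.0, 1.7)

# Share of species within 1.5-FC: 140 identified minus 11 + 1 outside the threshold.
stable_percent = round((140 - 11 - 1) / 140 * 100, 1)
```

A positive log2 ratio indicates a higher abundance in the first batch (here EPO-3), a negative one a higher abundance in the second.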

Figure 3

Quantitative comparison of identified N-glycan compositions between two different batches, EPO-3 and EPO-4. (a) TIC comparison of N-glycans from EPO-3 and EPO-4. (b) Quantitative comparison of sialylation levels with Neu5Ac and Neu5Gc between EPO-3 and EPO-4. (c) Quantitative comparison of all identified N-glycan compositions by volcano plot (EPO-3 vs EPO-4). (d) Heatmap of significantly different abundance of identified N-glycan compositions with more than 2-FC (EPO-3 vs EPO-4).

Peptide Identification by pFind

Trypsin cleaves peptides at the C-terminus of Arg and Lys with a high site-specificity, except for -Arg/Lys-Pro- bonds. Chymotrypsin is less specific and cleaves peptide bonds at the C-terminus of Tyr, Phe, Trp, and Leu. In addition, Met, Ala, Asp, and Glu can be cleaved by chymotrypsin at a lower cleavage rate. Through the combination of chymotryptic and tryptic peptide analysis of EPO-3, a total sequence coverage of 74.1% was identified, using the pFind algorithm (Figure S4), in which 62% of the protein sequence was covered by tryptic peptides and 54.8% was covered by chymotryptic peptides. Cys-7 and Cys-161 were modified with carbamidomethylation due to alkylation, and Met-54 was identified with oxidation. Except for N-glycosylated peptides, most chymotryptic and tryptic peptides of EPO were identified from MS data using the bottom-up approach. Also, non-O-glycosylated peptides, EAISPPDAAS(126)AAPLR from tryptic digestion and GAQKEAISPPDAAS(126)AAPL (Figure S5) from chymotryptic digestion, were identified. The analysis revealed that non-O-glycosylated EPO species existed, which provided a clue to identify O-glycans based on the mass difference between the non-O-glycosylated and O-glycosylated EPO species after de-N-glycosylation. Furthermore, the identified modifications were further considered in the deconvolution process of LC-MS data of de-N-glycosylated EPO.
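The cleavage rules above can be sketched as a simple in silico digestion. The sequence stretch below is assembled from the glycopeptides quoted in this article (carbamidomethylated Cys written as plain C); the function names are hypothetical, missed cleavages and the low-rate chymotryptic sites are ignored, and the N-X-S/T sequon rule (X not Pro) is general background knowledge rather than something stated here.

```python
import re


def digest(sequence, cleave_after, skip_before_pro=False):
    """Split a sequence C-terminally of the given residues (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence[:-1]):  # never cleave after the last residue
        if aa in cleave_after:
            if skip_before_pro and sequence[i + 1] == "P":
                continue  # e.g. trypsin does not cut Arg/Lys-Pro bonds
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides


def count_sequons(peptide):
    """Count N-X-S/T motifs (X is not Pro) using an overlapping lookahead."""
    return len(re.findall(r"(?=N[^P][ST])", peptide))


# Stretch of epoetin beta covering Asn-24 and Asn-38, taken from the peptides in this article.
SEGMENT = "LLEAKEAENITTGCAEHCSLNENITVPDTKVNFY"

tryptic = digest(SEGMENT, "KR", skip_before_pro=True)
chymotryptic = digest(SEGMENT, "YFWL")

for peptide in tryptic + chymotryptic:
    print(peptide, count_sequons(peptide))
```

In this sketch the tryptic peptide EAENITTGCAEHCSLNENITVPDTK retains two sequons, whereas chymotrypsin places them on separate peptides, which is the property that matters for site-specific glycopeptide assignment.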

Glycopeptide Analysis by pGlyco and the Identification of O-Acetylated Sialic Acids

In addition to the comprehensive qualitative and quantitative analysis of N-glycans, the characterization of glycopeptides enables the investigation of N-glycan locations on the polypeptide backbone. The glycopeptide analysis was performed in technical triplicates using nanoLC(RP)-MS/MS, and glycopeptides identified in at least two replicates were considered as valid data. After tryptic digestion of EPO-3, several EAEN(24)ITTGCcAEHCcSLNEN(38)ITVPDTK peptides were identified with one N-glycan species attached on Asn-24 or Asn-38, while most of them, attached with two N-glycans, were not identifiable by pGlyco (Table S3). In this case, chymotryptic digestion showed a better applicability for glycopeptide identification compared to tryptic digestion as chymotryptic cleavage separated the Asn-24 and -38 on two different peptides (LL)EAKEAEN(24)ITTGCcAEHCcSL and NEN(38)ITVPDTKVNF(Y), which can be identified using pGlyco (Table S4). For EPO-3, the identified N-glycan compositions from chymotryptic glycopeptide and N-glycan analysis are compared in Table S5. On (LL)EAKEAEN(24)ITTGCcAEHCcSL peptides, 17 N-glycan compositions were identified by pGlyco, of which 12 were also identified by permethylated N-glycan analysis. In parallel, 13 N-glycan compositions were identified on NEN(38)ITVPDTKVNF(Y) peptides and 10 of them were identified by permethylated N-glycan analysis; 49 N-glycan compositions were identified on the (L)VN(83)SSQPW(EPL) peptides and 27 were confirmed by permethylated N-glycan analysis. The O-acetylation of sialic acids was removed during permethylation for N-glycan analysis, while glycopeptides enabled the identification of O-acetylated sialic acids due to more gentle sample preparation, without high pH treatment. For this reason, glycopeptides attached with O-acetylated sialic acid residues were able to be searched from MS data manually, with a mass increase of 42.01057 Da compared to non-O-acetylated species. 
As an example, the glycopeptide LVN(83)SSQPW(Neu5Ac2HexNAc6Hex7Fuc1) was identified by pGlyco at m/z 1338.5282 (Figure 4a). With the addition of one O-acetyl group, LVN(83)SSQPW(Neu5Ac+OAc1Neu5Ac1HexNAc6Hex7Fuc1) was identified at m/z 1352.5311 at the MS1 level. Based on the diagnostic fragments at m/z 316.1012, 334.1119, and 699.2452, it could also be identified at the MS2 level, evidencing the existence of Neu5Ac+OAc-containing N-glycan species on the glycopeptide (Figure 4b). However, pGlyco cannot currently identify these glycan modifications, which points to a necessary improvement of this software.
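The manual search for O-acetylated species amounts to looking for a precursor shifted by 42.01057 Da divided by the charge. The sketch below checks the two m/z values reported above. The charge state is not stated in the text and is inferred here from the m/z spacing, so it should be read as an assumption; the function names are hypothetical.

```python
O_ACETYL_SHIFT = 42.01057  # Da, mass increase per O-acetyl group (from the text)


def infer_charge(mz_plain, mz_modified, mass_shift=O_ACETYL_SHIFT):
    """Estimate the charge state from the m/z spacing of the two species."""
    return round(mass_shift / (mz_modified - mz_plain))


def ppm_error(observed_mz, expected_mz):
    """Signed deviation of an observed m/z from its expected value, in ppm."""
    return (observed_mz - expected_mz) / expected_mz * 1e6


mz_plain, mz_acetylated = 1338.5282, 1352.5311  # m/z values reported above

z = infer_charge(mz_plain, mz_acetylated)
expected_mz = mz_plain + O_ACETYL_SHIFT / z
error = ppm_error(mz_acetylated, expected_mz)
```

Under this assumption the spacing corresponds to a triply charged ion, and the observed m/z of the O-acetylated species deviates from the expected value by less than 1 ppm.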

Figure 4

Identification of O-acetylated sialic acid by glycopeptide analysis. (a) MS2 spectrum annotation of LVN(83)SSQPW(Neu5Ac2HexNAc6Hex7Fuc1), identified by pGlyco. (b) Comparison of LVN(83)SSQPW(Neu5Ac2HexNAc6Hex7Fuc1) and LVN(83)SSQPW(Neu5Ac+OAc1Neu5Ac1HexNAc6Hex7Fuc1) at MS1 and MS2 levels.

The Comparison of Identified N-Glycans between Three Approaches

This Python script-based analytical approach enabled the identification of 140 N-glycan compositions of EPO after permethylation in this study. Using EPO-3, chymotrypsin-based bottom-up analysis of glycopeptides identified 54 N-glycan compositions and a trypsin-based bottom-up approach identified 103 N-glycan compositions by pGlyco analysis. A total of 29 N-glycan compositions were identified by all approaches; 51 species were exclusively identified by the analysis of tryptic glycopeptides, 10 by chymotryptic glycopeptide analysis, and 101 by the analysis of permethylated N-glycans (Figure S6). The 29 N-glycan compositions were more abundant than other species, and the small overlap between different approaches mainly resulted from the low abundant glycopeptides or N-glycans, which was limited by switching from the MS1 to MS2 level using data-dependent acquisition (DDA) mode. At the N-glycan level, permethylation is still an optimal derivatization strategy for MS analysis to mine more species in low abundance. However, it generates some side-reactions including the removal of O-acetylation of sialic acids, which can be avoided with glycopeptide analysis. Therefore, the combination of the two approaches obtains a higher coverage of N-glycan species.
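The overlap figures can be cross-checked with simple set arithmetic. The sketch below derives the pairwise-only overlaps from the totals, the exclusive counts, and the 29 shared compositions given above. The derived values are not stated in the text and should be compared against Figure S6; the variable names are hypothetical.

```python
# Counts reported in the text.
TOTAL = {"permethylated": 140, "tryptic": 103, "chymotryptic": 54}
EXCLUSIVE = {"permethylated": 101, "tryptic": 51, "chymotryptic": 10}
SHARED_BY_ALL = 29

# Compositions that each approach shares with exactly one other approach.
pairwise = {k: TOTAL[k] - EXCLUSIVE[k] - SHARED_BY_ALL for k in TOTAL}

# Each pairwise-only overlap is counted in two approaches, so the three
# unknowns follow from half the sum minus the third approach's share.
half_sum = sum(pairwise.values()) // 2
perm_tryp_only = half_sum - pairwise["chymotryptic"]
perm_chymo_only = half_sum - pairwise["tryptic"]
tryp_chymo_only = half_sum - pairwise["permethylated"]

# Union of both glycopeptide approaches (inclusion-exclusion).
glycopeptide_union = (
    TOTAL["tryptic"] + TOTAL["chymotryptic"] - tryp_chymo_only - SHARED_BY_ALL
)
```

The resulting union of tryptic and chymotryptic glycopeptide identifications is 114 compositions, which agrees with the total stated in the abstract.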

O-Glycan Identification by LC-MS Analysis of de-N-Glycosylated EPO

O-glycans are mostly released by chemical approaches such as β-elimination.[39] As a limitation, these techniques result in the degradation of the released O-glycans, significantly reducing the reliability of the identified O-glycan species. For this reason, the investigation of O-glycan patterns at the O-glycopeptide and O-glycoprotein level is advantageous, as it preserves the intact O-glycans. Compared to N-glycans, which are mainly divided into three major types (high-mannose, complex, and hybrid species) sharing a trimannosylchitobiose core structure (HexNAc2Hex3),[40] O-glycans show a higher heterogeneity, without a common core structure, and lower site-specificity.[41] However, with only one O-glycosylation site (Ser-126), O-glycans contribute much less to the heterogeneity of EPO than N-glycans. LC-MS analysis of de-N-glycosylated EPO-3 revealed three main species with charges from 8 to 22 (Figure 5a). After deconvolution by BioPharma Finder, three monoisotopic masses were identified: 18459.584, 19115.795, and 19406.902 Da. The 18459.584 Da species was identified as non-O-glycosylated EPO-3. The mass difference of 656.211 Da between the 18459.584 and 19115.795 Da species revealed the O-glycan composition Neu5Ac1HexNAc1Hex1, and the mass difference of 947.318 Da between the 18459.584 and 19406.902 Da species revealed the O-glycan composition Neu5Ac2HexNAc1Hex1 (Figure 5b). This LC-MS approach is well suited to de-N-glycosylated EPO because of its single O-glycosylation site. However, for therapeutic glycoproteins containing multiple O-glycosylation sites, it is preferable to profile O-glycan patterns at the glycopeptide level.

Figure 5

LC-MS analysis of de-N-glycosylated EPO-3 with three main species, labeled with green, blue, and red asterisks. (a) MS1 spectrum of de-N-glycosylated EPO-3, showing three main species with charges from 8 to 22. (b) Detected monoisotopic masses after deconvolution of LC-MS data generated from de-N-glycosylated EPO-3 by BioPharma Finder, showing two major O-glycosylated species.
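The assignment of O-glycan compositions from mass shifts can be illustrated by a small brute-force search over monosaccharide counts. This is a minimal sketch under stated assumptions, not the procedure used in the study: the residue set, the maximum of three copies per residue, and the 0.05 Da tolerance are hypothetical choices made for illustration, and the residue masses are standard monoisotopic values.

```python
from itertools import product

# Monoisotopic residue masses in Da (standard values; not taken from the paper).
RESIDUE_MASS = {"Neu5Ac": 291.095417, "HexNAc": 203.079373, "Hex": 162.052824}


def match_composition(delta_mass, max_count=3, tol=0.05):
    """Return all compositions whose summed residue mass lies within tol of delta_mass."""
    hits = []
    for counts in product(range(max_count + 1), repeat=len(RESIDUE_MASS)):
        mass = sum(n * m for n, m in zip(counts, RESIDUE_MASS.values()))
        if mass > 0 and abs(mass - delta_mass) <= tol:
            # Build a label such as "Neu5Ac1HexNAc1Hex1", skipping absent residues.
            hits.append("".join(f"{name}{n}" for name, n in zip(RESIDUE_MASS, counts) if n))
    return hits


# Deconvoluted monoisotopic masses reported for de-N-glycosylated EPO-3.
non_glycosylated = 18459.584
glycosylated = [19115.795, 19406.902]

assignments = {
    mass: match_composition(mass - non_glycosylated) for mass in glycosylated
}

for mass, compositions in assignments.items():
    print(f"{mass} Da: shift {mass - non_glycosylated:.3f} Da -> {compositions}")
```

With these settings, each of the two mass shifts matches exactly one composition, Neu5Ac1HexNAc1Hex1 and Neu5Ac2HexNAc1Hex1, respectively. A wider residue set or a looser tolerance could yield several candidates, which is one reason such assignments are best confirmed at the glycopeptide level when more than one site is present.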

Conclusions

In conclusion, an integrated strategy for in-depth glycosylation analysis of therapeutic glycoproteins was developed, comprising permethylated N-glycan analysis, bottom-up glycopeptide analysis using different proteases, and O-glycan identification by LC-MS analysis of de-N-glycosylated proteins. Focusing on N-glycan structural characterization, the Python scripts were newly designed, based on a database-free matching algorithm, to allow the identification of low-abundance N-glycan species, first in a commercial standard. By including phosphorylated Man residues, the additionally designed Python script enables the identification of phosphorylated N-glycans, demonstrating its practical value for in-depth N-glycomics. The uncommon N-glycan species identified in this study, such as trimannosylchitobiose-free N-glycans, still need further investigation. Understanding the different glycan patterns is important for improving the QC procedure for the glycosylation of EPO, providing feedback for managing EPO preparation, especially regarding the medically critical sialylation (Neu5Gc) level. For such detailed analysis, pGlyco alone is limited, as it cannot identify glycan modifications, such as O-acetylation of sialic acids, or glycopeptides carrying multiple glycans. De-N-glycosylation reduced the heterogeneity of EPO species significantly and enabled the identification of O-glycan species. The combination of these techniques and improvements in the newly developed strategy allows an in-depth characterization of glycan species from EPO. The strategy can also be applied to other therapeutic glycoproteins, with the aim of understanding drug functions and side effects and improving QC procedures.
