Yudong Guan1, Min Zhang2, Manasi Gaikwad2, Hannah Voss2, Ramin Fazel3, Samira Ansari4, Huali Shen5, Jigang Wang1, Hartmut Schlüter2. 1. The First Affiliated Hospital (Shenzhen People's Hospital), Southern University of Science and Technology, Shenzhen 518055, China. 2. Section Mass Spectrometric Proteomics, Institute of Clinical Chemistry and Laboratory Medicine, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany. 3. Research and Innovation Center, Livogen Pharmed Co., Tehran 1417755358, Iran. 4. CinnaGen Medical Biotechnology Research Center, Alborz University of Medical Sciences, Karaj 3165933155, Iran. 5. Department of Systems Biology for Medicine, School of Basic Medical Sciences, Fudan University, Shanghai 200032, China.
Abstract
The characterization of therapeutic glycoproteins is challenging due to the structural heterogeneity of protein glycosylation. This study presents an in-depth analytical strategy for the glycosylation of first-generation erythropoietin (epoetin beta), including a newly developed mass spectrometric workflow for N-glycan analysis, bottom-up mass spectrometric methods for site-specific N-glycosylation, and an LC-MS approach for O-glycan identification. Permethylated N-glycans, peptides, and enriched glycopeptides of erythropoietin were analyzed by nanoLC-MS/MS, and de-N-glycosylated erythropoietin was measured by LC-MS, enabling the qualitative and quantitative analysis of glycosylation and different glycan modifications (e.g., phosphorylation and O-acetylation). The newly developed Python scripts enabled the identification of 140 N-glycan compositions (237 N-glycan structures) from erythropoietin, including 8 phosphorylated N-glycan species. The site-specificity of N-glycans was revealed at the glycopeptide level by pGlyco software using different proteases. In total, 114 N-glycan compositions were identified from glycopeptide analysis. Moreover, LC-MS analysis of de-N-glycosylated erythropoietin species identified two O-glycan compositions based on the mass shifts between non-O-glycosylated and O-glycosylated species. Overall, this integrated strategy enables the in-depth glycosylation analysis of a therapeutic glycoprotein, which helps to understand its pharmacological properties and to improve the manufacturing processes.
Keywords:
Python scripts; bottom-up; erythropoietin; glycoinformatics; glycosylation; mass spectrometry
Erythropoietin (EPO) is a renal growth factor that stimulates the
proliferation and differentiation of erythroid-progenitor cells in
the bone marrow and is therefore used to treat a
wide spectrum of erythrocyte deficiencies; it was commercialized
in 1989 by Amgen as Epogen.[1] The binding
of EPO to its receptor (EPOR), initiating the intracellular signaling
pathways, highly depends on its amino acid sequence and post-translational
modifications (PTMs), such as glycosylation and glycan modifications.[2] EPO is available in four different forms, associated
with different glycosylation patterns, including first-generation
EPOs (epoetin alfa, epoetin beta, and epoetin delta) and second-generation
EPO (darbepoetin alfa). Darbepoetin alfa has five modified amino acids
and two additional N-glycosylation sites compared to first-generation
EPOs.[3] First-generation epoetin consists
of 166 amino acids with mono-, bi-, tri-, or tetra-antennary glycan
structures on three N-glycosylation sites (Asn-24, Asn-38, and Asn-83)
and one O-glycosylation site (Ser-126).
Glycosylation has significant impacts on the biological functions
of a protein, including protein stability, solubility, antigenicity,
folding, and half-life.[4] An in-depth characterization
of glycan patterns is important to guarantee the production quality
of biotherapeutic glycoproteins with appropriate pharmacological characteristics.
In contrast to polypeptides, the biosynthesis of glycans is not directly
template-driven, which leads to high heterogeneity of glycoproteins
in biotechnological production. It has been shown that the glycan
composition of biopharmaceuticals depends critically on the
presence and activity of different glycosyltransferases, concentration
of substrates, cell lines, and cell culture conditions, making the
production of therapeutic glycoproteins challenging.[5] EPO is highly heterogeneous with a variety of glycosylation
patterns as well as post-glycosylational modifications, such as sialylation
by N-acetylneuraminic acid (Neu5Ac) or N-glycolylneuraminic acid (Neu5Gc), phosphorylation, and O-acetylation.[6] These patterns have a significant impact on
drug efficacy and adverse drug effects in EPO-treated patients.
As an example, increased levels of Neu5Gc-containing glycoproteins can
lead to immunogenic or inflammatory responses in patients.[7,8] Phosphate groups are commonly attached at the C6 position of mannose
(Man) of N-glycans.[9,10] O-Acetyl groups bind to either
the C4-, C7-, C8-, or C9-hydroxyl position of sialic acid residues.[11] For N-glycan sample preparation, O-acetylation
is labile and removed during the commonly used permethylation, simplifying
data analysis for N-glycan compositions.[12] However, previous investigations mainly analyzed EPO with total
attached glycan compositions or focused on highly abundant N-glycans
due to the lack of an efficient glycoinformatics strategy.[3,13−15]
Liquid chromatography (LC) coupled with electrospray ionization
(ESI)-mass spectrometry (MS) and matrix-assisted laser desorption
ionization (MALDI)-MS are used frequently to analyze biomolecules
with outstanding accuracy and reproducibility, as in proteomics.[16−18] For glycan analysis, various LC-MS-based strategies have been utilized,
such as porous graphitic carbon (PGC)-LC-MS,[19] hydrophilic interaction liquid chromatography (HILIC)-LC-MS,[20,21] and reversed-phase (RP)-LC-MS.[22] In contrast
to proteomics, MS-based glycomics is limited by differing standard
procedures across different laboratories, technical limitations, as
well as the low availability of reliable glycoinformatics approaches,
requiring efficient data interpretation software.[23] NanoLC-MS/MS is better suited than MALDI-MS to identifying
low-abundance glycans because MALDI-MS, which lacks chromatographic separation,
suffers from ionization suppression during laser desorption. An improved strategy
to perform optimized solid-phase permethylation (OSPP) for in-depth
N-glycomics, using nanoLC(RP)-MS/MS combined with the developed Python
script matching algorithm, has been reported in our previous study.[24]
To characterize each glycan precisely, an integrated analytical
workflow was developed to analyze different glycosylation patterns
of EPO (epoetin beta), expressed by Chinese hamster ovary (CHO) cell
lines containing a cloned human erythropoietin gene, at N-glycan,
glycopeptide, and de-N-glycosylated protein levels. In this platform,
N-glycans were measured by nanoLC(RP)-MS/MS after N-glycan release,
purification, reduction, and permethylation, followed by data analysis
using newly designed Python scripts. In addition, enriched glycopeptides
were analyzed by a bottom-up approach using nanoLC(RP)-ESI-MS/MS after
tryptic or chymotryptic digestion. Furthermore, LC-MS analysis of
de-N-glycosylated EPO was performed, enabling the identification of
O-glycans based on the mass shift between non-O-glycosylated and O-glycosylated
species. The glycosylation analysis of EPO at multiple levels provides
new insights to profile the glycan structures, understand its pharmacological
properties, and control the manufacturing processes.
Materials and Methods
Materials
All chemicals used, including sodium hydroxide
beads, dimethyl sulfoxide (DMSO), and iodomethane, were purchased
from Sigma-Aldrich (Darmstadt, Germany), unless otherwise stated.
A2F N-glycan standard was obtained from QA-Bio. Two different batches
of EPOs (epoetin beta: RDF9729003 and RDF9729004) were kindly provided
by CinnaGen (Tehran, Iran), which were herein named EPO-3 and EPO-4.
Sequencing grade modified trypsin, chymotrypsin, and peptide-N-glycosidase
F (PNGase F) were obtained from Promega (Madison, WI, USA). Centrifugal
filters (0.5 mL, 3 kDa cutoff) were purchased from Merck KGaA (Darmstadt,
Germany). Solid-phase-extraction (SPE) columns containing RP materials
(C18 Sep-Pak cartridges) were obtained from Waters (Milford,
MA, USA).
N-Glycan Release, Purification, Reduction, and Permethylation
N-Glycan cleavage, purification, reduction and permethylation were
performed as described in previous studies.[24,25] EPO was denatured using 6 M urea, reduced by 20 mM dithiothreitol
(DTT) at 56 °C for 30 min, and alkylated by 40 mM iodoacetamide
(IAA) at room temperature for 30 min in the dark, in which DTT and
IAA were prepared in 100 mM ammonium bicarbonate (ABC) buffer. Next,
buffers were exchanged to 100 mM ABC buffer using 3 kDa cutoff centrifugal
filters. Thirty units of PNGase F were added, and the sample was incubated at 37 °C
for 24 h, followed by the addition of trypsin (1/100, w/w) for another
20 h. N-Glycans and tryptic peptides were separated using a RP-SPE
C18 cartridge. Briefly, the C18 cartridge was
conditioned with 5 mL of methanol and then equilibrated with 10 mL of 5%
acetic acid. After loading the digested samples, N-glycans
were eluted with 5 mL of 5% acetic acid and evaporated using a SpeedVac
vacuum concentrator. The A2F N-glycan standard, containing Neu5Ac2HexNAc3Hex5Fuc1Red-HexNAc1 at a purity of more than 90%, was collected from porcine
thyroglobulin by hydrazinolysis using a combination of HPLC and glycosidase
digestion. EPO derived N-glycans and A2F N-glycan standard were reduced
by borane-ammonia complex, followed by OSPP with iodomethane.[24] Briefly, 10 μg/μL borane-ammonia
complex was used to reduce N-glycans at 60 °C for 1 h and then
removed by evaporation with three additions of 300 μL of methanol.
Reduced N-glycans were resuspended in 110 μL of water/DMSO (10/100,
v/v) solution, and 100 μL of iodomethane was added, which was
transferred to the glass vial containing sodium hydroxide beads (200
mg). Afterward, samples were incubated in a Thermomixer compact (Eppendorf
AG, Hamburg, Germany) using a rotation speed of 1300 rpm for 10 min.
The solution containing permethylated N-glycans was transferred to
a new vial, followed by the addition of 200 μL 5% acetic acid
and 400 μL of chloroform. Chloroform-water extraction was performed
to purify the permethylated N-glycans, which were dried by a SpeedVac vacuum
concentrator and stored at −20 °C until further use.
NanoLC-MS/MS Analysis of Permethylated N-Glycans
The
permethylated N-glycans, resuspended into 0.1% formic acid (FA), were
injected into a Dionex UltiMate 3000 UPLC system (Thermo Fisher Scientific,
Bremen, Germany), coupled with a tribrid quadrupole-orbitrap-ion trap
mass spectrometer (Fusion, Thermo Fisher Scientific, Bremen, Germany).[26] Solvent A was 0.1% FA in water, and solvent
B was 0.1% FA in ACN. The sample was desalted using a RP C18 trapping column (Thermo Scientific Acclaim PepMap, 100 μm
× 2 cm, 5 μm, 100 Å) at a flow rate of 3 μL/min
with 2% solvent B and then transported to an analytical RP C18 column (Thermo Scientific Acclaim PepMap RSLC, 75 μm ×
50 cm, 2 μm, 100 Å) at a flow rate of 0.2 μL/min.
For N-glycan separation, solvent B started with 10%, increasing to
30% in 5 min, to 75% in 70 min, and finally to 95% in 80 min. Detailed
MS instrument parameters are described in the Supporting Information. The data were visualized using Thermo
Xcalibur in version 4.2.28.14 (Thermo Fisher Scientific, Bremen, Germany).
Identification and Statistical Analysis of Permethylated N-Glycans
For N-glycan identification, permethylated Neu5Gc, Neu5Ac, N-acetylhexosamine (HexNAc), hexose (Hex), fucose (Fuc),
and reduced N-acetylhexosamine (Red-HexNAc) were used
as the building blocks to match experimental masses with theoretical
N-glycan compositions using a newly designed Python script. Within a
predefined mass deviation, the Python script returns all possible
N-glycan compositions for one precursor without any N-glycan database,
which is particularly useful for isobaric compositions
such as Neu5Ac1Hex1 and Neu5Gc1Fuc1 (Figure S1). In addition, it
extends to N-glycans assembled with phosphorylated Man residues
(P-Hex), taking the HexNAc1Hex3Red-HexNAc1 (trimannosylchitobiose core) into account (Figure S2). Mass spectrometric raw data were processed using the MaxQuant
software in version 1.6.2.3 (http://www.maxquant.org). As MaxQuant was exclusively used for mass value extraction, no
database was included and all settings were chosen as automatically
recommended for mass spectra.[27] The file
in “csv” format, including data columns of Raw file,
m/z, and Mass extracted from “allPeptides.txt”, was
input into Python scripts and matched with possible N-glycan compositions
with a deviation threshold of 2.5 p.p.m. at the MS1 level. From the
result table, the masses matched to N-glycan compositions were then
compared with the monoisotopic m/z values in the MS raw data. Finally, each positively
matched mass was assigned a preferred N-glycan structure
with the assistance of GlycoWorkbench in version 2.1 build 146 (https://download.cnet.com/GlycoWorkbench-64-bit/3000-2383_4-75758804.html).
To perform quantitative analysis of N-glycans between different
batches of EPO samples, EPO-3 and EPO-4, the peak area of each identified
N-glycan composition was integrated using Skyline in version 19.1
(http://skyline.maccosslab.org). N-Glycan compositions and Total Area MS1 (abundance of each N-glycan
composition) were uploaded to Perseus in version 1.6.2.1 (http://www.perseus-framework.org) for statistical analysis. Log2 transformation, normalization, and
two-sample Student’s t-test at a false discovery
rate (FDR) less than 0.05 were performed to identify significantly
differentially abundant N-glycans between the compared groups. Only
N-glycans showing a mean difference higher than a 1.5- or 2-fold change
(FC) between the EPO batches were considered in further analysis.
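The fold-change filter described above can be illustrated with a short Python sketch. It is not the Perseus workflow itself: the median centering is an assumed normalization (the text does not name the method), the t-test and FDR control are omitted, the peak areas are made up, and the function names `log2_normalize` and `fold_change` are hypothetical.

```python
import math
from statistics import mean, median


def log2_normalize(run):
    """Log2-transform the peak areas of one run and subtract the run median."""
    logs = {glycan: math.log2(area) for glycan, area in run.items()}
    center = median(logs.values())
    return {glycan: value - center for glycan, value in logs.items()}


def fold_change(runs_a, runs_b, glycan):
    """Return (linear fold change A/B, label) for one N-glycan composition."""
    # Mean difference on the log2 scale, as used for the 1.5- and 2-FC cutoffs.
    diff = mean(r[glycan] for r in runs_a) - mean(r[glycan] for r in runs_b)
    magnitude = 2 ** abs(diff)
    if magnitude >= 2:
        label = ">=2-FC"
    elif magnitude >= 1.5:
        label = ">=1.5-FC"
    else:
        label = "<1.5-FC"
    return 2 ** diff, label


# Made-up peak areas for two hypothetical runs per batch.
batch_a = [log2_normalize({"G1": 400, "G2": 100, "G3": 100}) for _ in range(2)]
batch_b = [log2_normalize({"G1": 100, "G2": 100, "G3": 100}) for _ in range(2)]
print(fold_change(batch_a, batch_b, "G1"))
print(fold_change(batch_a, batch_b, "G2"))
```

Working on the log2 scale makes an increase and a decrease of the same magnitude symmetric around zero, which is why the cutoff is applied to the absolute log2 difference.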
Tryptic and Chymotryptic Digestion of EPO
EPO samples
were denatured, reduced, and alkylated as described above. Buffers
were exchanged to 100 mM ABC buffer using 3 kDa cutoff centrifugal
filters. Samples were digested using trypsin (1/100, w/w) or chymotrypsin
(1/100, w/w) at 37 °C for 20 h. Next, peptides were divided into
two aliquots for peptide analysis and glycopeptide enrichment, which
were concentrated by a SpeedVac vacuum concentrator and stored at
−20 °C until further use.
NanoLC-MS/MS Analysis and Raw Data Processing for Peptides
EPO digested with trypsin or chymotrypsin was resuspended in 0.1%
FA for nanoLC-MS/MS analysis. Solvent A was 0.1% FA in water, and
solvent B was 0.1% FA in ACN. The peptides were desalted using a RP
C18 trapping column at a flow rate of 3 μL/min with
2% solvent B and then transported to an analytical RP C18 column at a flow rate of 0.2 μL/min. Starting with 1% for
5 min, solvent B increased to 25% in 65 min, to 35% in 80 min, and
finally to 90% in 85 min. Peptide ions were transferred to a hybrid
quadrupole-orbitrap mass spectrometer (Q Exactive, Thermo Fisher Scientific,
Bremen, Germany). Detailed MS instrument parameters are described
in the Supporting Information.
Peptide
identification was performed using the pFind software in version 3.1.6
(http://pfind.ict.ac.cn/software/pFind3/),[28] with the FASTA database containing
the amino acid sequence of epoetin beta (https://www.drugbank.ca/drugs/DB00016). For peptide identification, up to 3 missed cleavages were tolerated;
peptides identified at an FDR less than 0.01 with a minimum length
of 6 amino acids and a maximum length of 100 were considered; the
precursor mass tolerance was set to 10 p.p.m.; the mass tolerance
at MS2 was set to 20 p.p.m.; the oxidation of methionine residues
was included as a variable modification; carbamidomethylation of cysteine
residues was set as a fixed modification.
Glycopeptide Enrichment by Zwitterionic Hydrophilic Interaction
Liquid Chromatography
EPO digested with trypsin or chymotrypsin
was resuspended in 80% ACN with 1% trifluoroacetic acid (TFA) and
enriched using self-packed zwitterionic hydrophilic interaction liquid
chromatography (ZIC-HILIC) columns.[29] Micro-columns
were packed with 30 mg of ZIC-HILIC materials obtained from a HILIC
column (SeQuant ZIC-cHILIC, 4.6 × 250 mm, 3 μm, 100 Å,
Merck KGaA, Darmstadt, Germany) on top of a C8 RP chromatographic
material (3M, Eagan, MN, USA). The column was equilibrated with 600
μL of 1% TFA in 80% ACN. After sample loading, low-hydrophilic
peptides were first removed using 600 μL of 1% TFA in 80% ACN.
Then, glycopeptides were eluted by 600 μL 0.1% TFA, concentrated
by a SpeedVac vacuum concentrator, and stored at −20 °C
until further use.
NanoLC-MS/MS Measurement and pGlyco-Based Data Analysis of Enriched
Glycopeptides
Dried glycopeptides were dissolved in 0.1%
FA and injected into a Dionex UltiMate 3000 UPLC system, coupled to
a tribrid orbitrap mass spectrometer. Solvent A was 0.1% FA in water,
and solvent B was 0.1% FA in ACN. Samples were first desalted using
a RP C18 trapping column at a flow rate of 3 μL/min
with 1% solvent B and then transported to an analytical RP C18 column at a flow rate of 0.2 μL/min. Solvent B started with
1% and increased to 20% in 90 min, to 30% in 120 min, and to 90% in
130 min. Detailed MS instrument parameters are described in the Supporting Information.
Glycopeptides were
identified by pGlyco software in version 2.2.0 (http://pfind.ict.ac.cn/software/pGlyco/index.html) with a total of 7884 N-glycan entries.[30] For pGlyco identification, the amino acid sequence of epoetin beta
was used as a FASTA database. The parameters were set as follows:
the precursor tolerance was 5 p.p.m.; the fragment mass tolerance
was 20 p.p.m.; the oxidation of methionine was set as a variable modification
and carbamidomethylation of cysteine as a fixed modification.
Sample Preparation and Analysis of de-N-Glycosylated EPO
EPO was denatured, reduced, alkylated and digested by PNGase F as
mentioned above. de-N-Glycosylated EPO samples were injected into
an LC system (Waters ACQUITY, Milford, MA, USA), coupled with a hybrid
quadrupole-orbitrap mass spectrometer. Solvent A was 0.1% FA in water,
and solvent B was 0.1% FA in ACN. Using a RP monolithic column (ProSwift
RP-4H, 1 × 250 mm, Thermo Fisher Scientific, Bremen, Germany),
solvent B started with 5% for a duration of 2 min, increasing to 25%
in 7 min, and to 60% in 34 min at a flow rate of 0.2 mL/min. Detailed
MS instrument parameters are described in the Supporting Information.
To deconvolute the MS raw data,
the parameters of BioPharma Finder were set as follows: the amino
acid sequence of epoetin beta was used as a FASTA database; the oxidation
of methionine and carbamidomethylation of cysteine residues were included
as variable modifications; the deamidation on Asn-24, Asn-38, and
Asn-83 was set as a static modification. For the processing method,
the m/z range was set from 1500 to 3000; Xtract (isotopically resolved)
was used as a deconvolution algorithm; the output mass range was set
from 16,000 to 30,000; charge states between 8 and 25 were considered,
and a minimum number of 6 was used for deconvolution; the mass tolerance
was set to 20 p.p.m.
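To make the relationship between the m/z range, the charge-state range, and the output mass range concrete, the following Python sketch lists which charge states of a protein fall inside the acquisition window. It only illustrates the arithmetic and is not part of BioPharma Finder; the 20,000 Da mass is a hypothetical example and the function names are assumptions.

```python
PROTON = 1.007276  # mass of a proton in Da


def mz(neutral_mass, charge):
    """m/z of a positive ion carrying `charge` protons."""
    return (neutral_mass + charge * PROTON) / charge


def charge_states_in_window(neutral_mass, z_range=(8, 25), mz_window=(1500.0, 3000.0)):
    """Charge states within z_range whose m/z lies inside mz_window."""
    low, high = mz_window
    return [z for z in range(z_range[0], z_range[1] + 1)
            if low <= mz(neutral_mass, z) <= high]


# Hypothetical 20,000 Da species within the 16,000 to 30,000 output mass range.
print(charge_states_in_window(20000.0))  # [8, 9, 10, 11, 12, 13]
```

For this hypothetical mass, six charge states fall between m/z 1500 and 3000; heavier species shift the observable envelope toward higher charge states.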
Results and Discussions
The Identification and Relative Quantification of Permethylated
N-Glycans
Various monosaccharides served as building blocks
in the Python script to match the highly accurate MS1 data (Figure 1a). Furthermore,
phosphorylation of glycans has been shown to play crucial roles in
protein transport and human diseases; for example, phosphorylation
of Man is involved in controlling the extracellular levels of leukemia
inhibitory factor (LIF).[31] However, phosphorylated
glycans remain poorly characterized by MS platforms. The designed
Python script (Figure S2) covers phosphorylated
N-glycans, which show a mass increase of 93.98198 Da on
Man (P-Hex) after permethylation (Figure 1b). First, the A2F N-glycan standard was
used to test this strategy, in which 52 N-glycan compositions (72
N-glycan structures) were identified at MS1 and MS2 levels after permethylation
(Table S1). In addition, 140 N-glycan compositions
were identified from EPO-3 and EPO-4 samples, including 237 N-glycan
structures in total (Table S2). Among them,
different isomeric structures derived from one N-glycan composition
were identified based on distinguishable retention times (RT) and MS2
fragments. For example, Neu5Ac1HexNAc7Hex6Fuc1Red-HexNAc1, extracted from N-glycans
of EPO, showed six isomeric structures using C18 separation
(Figure 1c). Trimannosylchitobiose-free
N-glycan species from EPO, containing a Red-HexNAc, were also considered
for masses below 1164.6251 Da, which is the monoisotopic mass of the trimannosylchitobiose
core structure. They were also confidently identified at MS1 and MS2
levels, although most species have no chitobiose cores for PNGase
F cleavage (Figure S3).[32] Eight phosphorylated N-glycans were identified from EPO
samples, showing a much higher coverage than a previous study.[33] In the MS2 spectrum of HexNAc1Hex5P-Hex1Red-HexNAc1 (Figure 1d), the diagnostic fragment
ion at m/z 313.1720 was generated from the terminal P-Hex residue
of the N-glycan by collision-induced dissociation (CID) fragmentation.
However, N-glycans with O-acetylated sialic acids were not considered
in the Python scripts because O-acetylation is removed during permethylation.
An analysis of these glycans should be developed
on native N-glycans in a future study as a complementary
structural characterization.
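The database-free matching idea can be sketched in Python as follows. This is a minimal illustration, not the scripts used in this study (those are shown in Figures S1 and S2). The residue masses are assumptions computed here from the elemental compositions of the permethylated residues and should be checked against Figure 1a; the P-Hex mass is the Hex mass plus the 93.98198 Da shift given in the text; and the function name `match_compositions` is hypothetical.

```python
# Monoisotopic masses (Da) of permethylated building blocks.
# Assumption: computed from elemental compositions, not copied from the paper.
RESIDUES = {
    "Neu5Gc": 391.18423,
    "Neu5Ac": 361.17367,
    "HexNAc": 245.12632,
    "Hex": 204.09977,
    "Fuc": 174.08921,
    "P-Hex": 204.09977 + 93.98198,  # Hex plus the phosphorylation shift from the text
}
# Reduced, permethylated reducing-end HexNAc, including both terminal groups.
RED_HEXNAC = 307.19949


def match_compositions(neutral_mass, tol_ppm=2.5):
    """Return every composition (one Red-HexNAc plus residues) within tol_ppm."""
    tol = neutral_mass * tol_ppm / 1e6
    names = list(RESIDUES)
    hits = []

    def search(idx, remaining, counts):
        # All residue types assigned: keep the combination if the mass fits.
        if idx == len(names):
            if abs(remaining) <= tol:
                hits.append({n: c for n, c in zip(names, counts) if c})
            return
        mass = RESIDUES[names[idx]]
        n = 0
        # Try 0, 1, 2, ... copies until the residue mass exceeds what is left.
        while n * mass <= remaining + tol:
            search(idx + 1, remaining - n * mass, counts + [n])
            n += 1

    search(0, neutral_mass - RED_HEXNAC, [])
    return hits


core = 3 * RESIDUES["Hex"] + RESIDUES["HexNAc"] + RED_HEXNAC
print(round(core, 4))  # 1164.6251, the core mass quoted in the text
print(match_compositions(core))

# Isobaric pair from the text: Neu5Ac1Hex1 and Neu5Gc1Fuc1 have the same mass.
isobaric = core + RESIDUES["Neu5Ac"] + RESIDUES["Hex"]
print(match_compositions(isobaric))
```

Because Neu5Ac1Hex1 and Neu5Gc1Fuc1 have identical masses, an MS1 match alone returns both candidates, and the MS2 fragments are needed to choose between them.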
Figure 1
MS-based characterization of N-glycans with the newly designed
Python scripts. (a) Basic building blocks used in the matching algorithm,
showing each monoisotopic molecular weight (monoMW) after permethylation
(GlcNAc, N-acetylglucosamine; GalNAc, N-acetylgalactosamine; Gal, galactose; Man, mannose; Glc, glucose).
(b) Structures of permethylated P-Hex, as an additional building block
to investigate glycan modification using Python script in this study.
(c) Isomeric N-glycans separation by nanoLC(C18) and characterization
by MS2 fragments, illustrated with the extracted Neu5Ac1HexNAc7Hex6Fuc1Red-HexNAc1 from the N-glycans of EPO. (d) Characterization of HexNAc1Hex5P-Hex1Red-HexNAc1 with extracted
ion chromatogram (EIC) of precursors and the generated MS2 spectrum.
Compared with previous studies,[3,34−37] this glycoinformatics strategy identified at least a two-fold higher
number of N-glycan species (Figure 2a). Next, classification was performed based on different
modifications including sialylation and phosphorylation (Figure 2b). Most N-glycan
species, 54%, were sialylated by Neu5Ac, 25% by both Neu5Ac and
Neu5Gc, and 3% by Neu5Gc. The level of Neu5Gc-containing N-glycans,
28% in total, is a key characteristic for evaluating the safety of
biotherapeutics for patient
treatment because of the immunogenic responses Neu5Gc can provoke. In addition, 6% of identified
N-glycan species were found to be phosphorylated. To compare the amount
of each N-glycan species, EICs of the different charge states were integrated
using Skyline and log2 transformed. All quantified N-glycan species
of EPO-3 are summarized in Figure 2c, showing the three most abundant N-glycan species,
Neu5Ac4HexNAc5Hex7Fuc1Red-HexNAc1, Neu5Gc2Neu5Ac3HexNAc4Hex6Fuc1Red-HexNAc1, and
Neu5Ac3HexNAc5Hex7Fuc1Red-HexNAc1, and the three least abundant N-glycan species,
HexNAc1Hex3P-Hex1Red-HexNAc1, Neu5Gc1HexNAc3Hex5Red-HexNAc1, and HexNAc2Hex3Red-HexNAc1.
Figure 2
Comparative analysis of identified N-glycan compositions of EPO.
(a) N-glycan coverage comparison between this study and other recent
publications. (b) Category analysis of identified N-glycan compositions
of EPO based on the different modifications (sialylation and phosphorylation).
(c) Comparative quantification of all the identified N-glycan compositions
of EPO-3, showing the three highest (top) and three lowest (bottom)
abundant N-glycan species.
Batch-to-Batch Comparison of EPO Derived N-Glycans
In the production of therapeutic glycoproteins, the reproducibility
of structural characteristics, such as PTMs, is highly important as
it guarantees the continuous maintenance of pharmacological characteristics.
Therefore, the N-glycans of two batches of EPO, EPO-3 and EPO-4,
were compared to investigate the differential abundance of N-glycan
species across batches, as an essential step for quality control (QC).
Using N-glycan MS data, total ion chromatograms (TICs) of both batches
were first compared, showing a high similarity (Figure 3a). It has been proposed that Neu5Gc modifications
of glycoprotein pharmaceuticals can lead to immunogenic responses
in patients.[8] During the biosynthesis of
Neu5Gc, cytidine 5′-monophosphate (CMP)-Neu5Ac in the cytosol
is the precursor to generate CMP-Neu5Gc, catalyzed by CMP-Neu5Ac hydroxylase
(CMAH). Next, CMP-Neu5Gc is transferred into the Golgi and then used to
assemble glycoconjugates by various sialyltransferases.[38] The ratio between Neu5Ac and Neu5Gc, as an index
to quantify the generated Neu5Gc-containing N-glycans of EPO, was
measured based on the peak areas of the MS2 diagnostic fragments,
m/z 344.1704 for Neu5Ac and 374.1809 for Neu5Gc. The analysis revealed
that the amount of Neu5Ac was about 18-fold higher than that of Neu5Gc
in EPO-3 and 17-fold higher in EPO-4 (Figure 3b). After two-sample Student’s t-test analysis by Perseus software (EPO-3 vs EPO-4), the significantly
differentially abundant N-glycan species were visualized in a volcano
plot. The N-glycans showing at least 1.5-FC between the two batches
were considered (Figure 3c). Neu5Ac1HexNAc7Hex8Fuc1Red-HexNAc1 (green dot) showed a 1.7-fold higher abundance
in EPO-4 compared to EPO-3. In contrast, 11 N-glycan species showed
at least 1.5-fold higher abundance in EPO-3 compared to EPO-4 (1.5-FC
< blue dots < 2-FC and red dots > 2-FC). To further investigate
the N-glycans with an abundance difference higher than 2-FC, six N-glycan
species were compared in the heatmap (Figure 3d). As the most significant difference,
HexNAc2Hex3Red-HexNAc1 showed a 4.85-fold
higher abundance in EPO-3 compared to EPO-4. In addition, most N-glycans,
128 species (91.4%), fluctuated within 1.5-FC between the two batches
of EPO samples, which serves as an index of the stability and repeatability
of EPO production.
Figure 3
Quantitative comparison of identified N-glycan compositions between
two different batches, EPO-3 and EPO-4. (a) TIC comparison of N-glycans
from EPO-3 and EPO-4. (b) Quantitative comparison of sialylation levels
with Neu5Ac and Neu5Gc between EPO-3 and EPO-4. (c) Quantitative comparison
of all identified N-glycan compositions by volcano plot (EPO-3 vs
EPO-4). (d) Heatmap of significantly different abundance of identified
N-glycan compositions with more than 2-FC (EPO-3 vs EPO-4).
Peptide Identification by pFind
Trypsin cleaves peptide bonds
at the C-terminal side of Arg and Lys with a high site-specificity, except
for -Arg/Lys-Pro- bonds. Chymotrypsin is less specific and cleaves
peptide bonds at the C-terminal side of Tyr, Phe, Trp, and Leu. In addition,
bonds C-terminal to Met, Ala, Asp, and Glu can be cleaved by chymotrypsin at a lower
rate. Through the combination of chymotryptic and tryptic peptide
analysis of EPO-3, a total sequence coverage of 74.1% was achieved
using the pFind algorithm (Figure S4),
in which 62% of the protein sequence was covered by tryptic peptides
and 54.8% was covered by chymotryptic peptides. Cys-7 and Cys-161
were modified with carbamidomethylation due to alkylation, and Met-54
was identified with oxidation. Except for N-glycosylated peptides,
most chymotryptic and tryptic peptides of EPO were identified from
MS data using the bottom-up approach. Also, non-O-glycosylated peptides,
EAISPPDAAS(126)AAPLR from tryptic digestion and GAQKEAISPPDAAS(126)AAPL
(Figure S5) from chymotryptic digestion,
were identified. The analysis revealed that non-O-glycosylated EPO
species existed, which provided a clue to identify O-glycans based
on the mass difference between the non-O-glycosylated and O-glycosylated
EPO species after de-N-glycosylation. Furthermore, the identified
modifications were taken into account in the deconvolution process
of LC-MS data of de-N-glycosylated EPO.
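The cleavage rules stated at the start of this section can be expressed as a small in-silico digestion in Python. This is a simplified sketch, not the pFind search: missed cleavages and the lower-rate chymotryptic sites (Met, Ala, Asp, Glu) are not modeled, and the function and constant names are hypothetical. The test segment is assembled from the two non-O-glycosylated peptides named above.

```python
def digest(sequence, cleave_after, not_before=frozenset()):
    """Cleave C-terminal to residues in cleave_after unless the next residue is in not_before."""
    peptides, start = [], 0
    for i in range(len(sequence) - 1):
        if sequence[i] in cleave_after and sequence[i + 1] not in not_before:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


# Trypsin: after Arg/Lys, but not when followed by Pro.
TRYPSIN = dict(cleave_after=frozenset("KR"), not_before=frozenset("P"))
# Chymotrypsin: main sites only (Tyr, Phe, Trp, Leu).
CHYMOTRYPSIN = dict(cleave_after=frozenset("YFWL"))

segment = "GAQKEAISPPDAASAAPLR"
print(digest(segment, **TRYPSIN))       # ['GAQK', 'EAISPPDAASAAPLR']
print(digest(segment, **CHYMOTRYPSIN))  # ['GAQKEAISPPDAASAAPL', 'R']
```

The two outputs reproduce the tryptic peptide EAISPPDAASAAPLR and the chymotryptic peptide GAQKEAISPPDAASAAPL that carry Ser-126.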
Glycopeptide Analysis by pGlyco and the Identification of O-Acetylated
Sialic Acids
In addition to the comprehensive qualitative
and quantitative analysis of N-glycans, the characterization of glycopeptides
enables the investigation of N-glycan locations on the polypeptide
backbone. The glycopeptide analysis was performed in technical triplicates
using nanoLC(RP)-MS/MS, and glycopeptides identified in at least two
replicates were considered as valid data. After tryptic digestion
of EPO-3, several EAEN(24)ITTGCcAEHCcSLNEN(38)ITVPDTK peptides were
identified with one N-glycan species attached on Asn-24 or Asn-38,
while most of them, attached with two N-glycans, were not identifiable
by pGlyco (Table S3). In this case, chymotryptic
digestion showed a better applicability for glycopeptide identification
compared to tryptic digestion as chymotryptic cleavage separated the
Asn-24 and -38 on two different peptides (LL)EAKEAEN(24)ITTGCcAEHCcSL
and NEN(38)ITVPDTKVNF(Y), which can be identified using pGlyco (Table S4). For EPO-3, the identified N-glycan
compositions from chymotryptic glycopeptide and N-glycan analysis
are compared in Table S5. On (LL)EAKEAEN(24)ITTGCcAEHCcSL
peptides, 17 N-glycan compositions were identified by pGlyco, of which
12 were also identified by permethylated N-glycan analysis. In parallel,
13 N-glycan compositions were identified on NEN(38)ITVPDTKVNF(Y) peptides
and 10 of them were identified by permethylated N-glycan analysis;
49 N-glycan compositions were identified on the (L)VN(83)SSQPW(EPL)
peptides and 27 were confirmed by permethylated N-glycan analysis.
The O-acetylation of sialic acids was removed during permethylation
for N-glycan analysis, while glycopeptides enabled the identification
of O-acetylated sialic acids due to more gentle sample preparation,
without high pH treatment. For this reason, glycopeptides carrying
O-acetylated sialic acid residues could be searched manually in the
MS data, based on a mass increase of 42.01057 Da compared to
non-O-acetylated species. As an example, the glycopeptide LVN(83)SSQPW(Neu5Ac2HexNAc6Hex7Fuc1) was identified
by pGlyco at m/z 1338.5282 (Figure 4a). With the addition of one O-acetyl group, LVN(83)SSQPW(Neu5Ac+OAc1Neu5Ac1HexNAc6Hex7Fuc1) was identified at m/z 1352.5311 at the MS1 level. Based on the
diagnostic fragments at m/z 316.1012, 334.1119, and 699.2452, it
could also be identified at the MS2 level, evidencing the existence of
Neu5Ac+OAc-containing N-glycan species on the glycopeptide (Figure 4b). However, pGlyco
cannot identify these glycan modifications, which calls for further
improvement of this software.
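The manual search for O-acetylated species can be checked with a short calculation using the m/z values quoted above. This is a sketch, not the authors' procedure. The precursor charge state of 3+ is an assumption inferred from the m/z spacing; it is not stated in the text. The pairing of each diagnostic fragment with a specific unmodified oxonium ion is also an assumption. The unmodified oxonium values are standard monoisotopic [M+H]+ masses.

```python
O_ACETYL = 42.01057  # Da, mass increase per O-acetyl group (from the text)


def shifted_mz(mz, delta_mass, charge):
    """m/z expected after adding `delta_mass` to an ion of the given charge."""
    return mz + delta_mass / charge


def ppm_error(observed, expected):
    """Relative deviation of an observed m/z from its expected value, in ppm."""
    return (observed - expected) / expected * 1e6


# Precursor level: charge 3+ is an ASSUMPTION, not given in the text.
ASSUMED_CHARGE = 3
expected_precursor = shifted_mz(1338.5282, O_ACETYL, ASSUMED_CHARGE)
precursor_error = ppm_error(1352.5311, expected_precursor)

# Fragment level: singly charged oxonium ions of the non-acetylated glycan
# (standard values) and the diagnostic m/z reported in the text.
# The name-to-fragment assignment is an ASSUMPTION.
UNMODIFIED = {"Neu5Ac-H2O": 274.0921, "Neu5Ac": 292.1027,
              "HexNAc1Hex1Neu5Ac1": 657.2349}
OBSERVED = {"Neu5Ac-H2O": 316.1012, "Neu5Ac": 334.1119,
            "HexNAc1Hex1Neu5Ac1": 699.2452}

diagnostic_errors = {
    name: ppm_error(OBSERVED[name], shifted_mz(mz, O_ACETYL, 1))
    for name, mz in UNMODIFIED.items()
}

print(f"precursor: {precursor_error:+.2f} ppm")
for name, err in diagnostic_errors.items():
    print(f"{name}+OAc: {err:+.2f} ppm")
```

Under these assumptions, the reported precursor and all three diagnostic fragments agree with a single O-acetyl addition to within a few ppm.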
Figure 4
Identification of O-acetylated sialic acid by glycopeptide analysis.
(a) MS2 spectrum annotation of LVN(83)SSQPW(Neu5Ac2HexNAc6Hex7Fuc1), identified by pGlyco. (b)
Comparison of LVN(83)SSQPW(Neu5Ac2HexNAc6Hex7Fuc1) and LVN(83)SSQPW(Neu5Ac+OAc1Neu5Ac1HexNAc6Hex7Fuc1) at MS1 and
MS2 levels.
The Comparison of Identified N-Glycans between Three Approaches
This Python script-based analytical approach enabled the identification
of 140 N-glycan compositions of EPO after permethylation in this study.
Using EPO-3, chymotrypsin-based bottom-up analysis of glycopeptides
identified 54 N-glycan compositions and a trypsin-based bottom-up
approach identified 103 N-glycan compositions by pGlyco analysis.
A total of 29 N-glycan compositions were identified by all approaches;
51 species were exclusively identified by the analysis of tryptic
glycopeptides, 10 by chymotryptic glycopeptide analysis, and 101 by
the analysis of permethylated N-glycans (Figure S6). The 29 shared N-glycan compositions were more abundant than the other
species, and the small overlap between the approaches mainly
resulted from low-abundance glycopeptides or N-glycans, whose identification is
limited by precursor selection for MS2 in data-dependent
acquisition (DDA) mode. At the N-glycan level, permethylation is still
an optimal derivatization strategy for MS analysis to mine more species
in low abundance. However, it generates some side-reactions including
the removal of O-acetylation of sialic acids, which can be avoided
with glycopeptide analysis. Therefore, the combination of the two
approaches obtains a higher coverage of N-glycan species.
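The counts reported for the three approaches can be cross-checked by inclusion-exclusion. The sketch below uses only the numbers stated in this section. The pairwise-only overlaps and the overall union are derived values, not figures reported by the authors, and the variable names are illustrative.

```python
# Counts stated in the text for EPO-3.
totals = {"permethylated": 140, "tryptic": 103, "chymotryptic": 54}
exclusive = {"permethylated": 101, "tryptic": 51, "chymotryptic": 10}
shared_by_all = 29

# For each approach: compositions shared with exactly one other approach.
pair_sum = {k: totals[k] - exclusive[k] - shared_by_all for k in totals}

# Each pairwise-only region is counted in two of the three sums above.
pairwise_total = sum(pair_sum.values()) // 2

# A pairwise-only region is the total minus the sum of the third approach.
only_two = {
    "tryptic+chymotryptic": pairwise_total - pair_sum["permethylated"],
    "permethylated+chymotryptic": pairwise_total - pair_sum["tryptic"],
    "permethylated+tryptic": pairwise_total - pair_sum["chymotryptic"],
}

# Union of the two glycopeptide (pGlyco) data sets.
glycopeptide_union = (totals["tryptic"] + totals["chymotryptic"]
                      - shared_by_all - only_two["tryptic+chymotryptic"])

# Union over all three approaches (derived, not reported in the text).
overall_union = sum(exclusive.values()) + shared_by_all + pairwise_total

print(only_two)
print(glycopeptide_union, overall_union)
```

The derived glycopeptide union of 114 compositions matches the total given in the abstract for the glycopeptide analysis, which suggests the reported counts are internally consistent.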
O-Glycan Identification by LC-MS Analysis of de-N-Glycosylated
EPO
O-glycans are mostly released by chemical approaches
such as β-elimination.[39] As a limitation,
these techniques result in the degradation of the released O-glycans,
significantly reducing the reliability of identified O-glycan species.
For this reason, the investigation of O-glycan patterns at the O-glycopeptide
and O-glycoprotein level is advantageous as it preserves the intact
O-glycans. Compared to N-glycans, which are mainly divided into three major
types (high-mannose, complex, and hybrid species) sharing a trimannosylchitobiose
core structure (HexNAc2Hex3),[40] O-glycans show a higher heterogeneity without a common
core structure and a lower site-specificity.[41] However, O-glycans contribute much less to the heterogeneity
of EPO, with one O-glycosylation site (Ser-126), than N-glycans.
LC-MS analysis of de-N-glycosylated EPO-3 revealed three main species
with charges from 8 to 22 (Figure 5a). After deconvolution by BioPharma Finder, three
monoisotopic masses were identified: 18459.584, 19115.795,
and 19406.902 Da. The 18459.584 Da species was identified as
non-O-glycosylated EPO-3. The mass difference of 656.211 Da
between the 18459.584 and 19115.795 Da species revealed the O-glycan
composition Neu5Ac1HexNAc1Hex1, and the mass difference of 947.318 Da between the 18459.584 and 19406.902
Da species revealed the O-glycan composition Neu5Ac2HexNAc1Hex1 (Figure 5b). This LC-MS approach is well suited
to de-N-glycosylated EPO because of its single O-glycosylation site.
However, for therapeutic glycoproteins containing multiple O-glycosylation
sites, it is preferred to profile O-glycan patterns at the glycopeptide
level.
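The assignment of O-glycan compositions from mass differences can be sketched as a small brute-force search over monosaccharide counts. This is an illustration, not the BioPharma Finder algorithm. The function `match_composition`, the 0.05 Da tolerance, and the limit of three residues per monosaccharide are assumptions. The residue masses are standard monoisotopic values.

```python
from itertools import product

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {"Neu5Ac": 291.09542, "HexNAc": 203.07937, "Hex": 162.05282}


def match_composition(delta, tolerance=0.05, max_count=3):
    """Return all compositions whose summed residue mass lies within
    `tolerance` Da of `delta`. Tolerance and max_count are assumptions."""
    hits = []
    for counts in product(range(max_count + 1), repeat=len(RESIDUE_MASS)):
        mass = sum(n * m for n, m in zip(counts, RESIDUE_MASS.values()))
        if mass > 0 and abs(mass - delta) <= tolerance:
            hits.append({name: n for name, n in zip(RESIDUE_MASS, counts) if n})
    return hits


NON_O_GLYCOSYLATED = 18459.584  # Da, deconvoluted mass from the text

for species in (19115.795, 19406.902):
    delta = species - NON_O_GLYCOSYLATED
    print(f"{delta:.3f} Da -> {match_composition(delta)}")
```

With these settings, each of the two mass differences matches exactly one composition, Neu5Ac1HexNAc1Hex1 and Neu5Ac2HexNAc1Hex1, in agreement with the assignments above.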
Figure 5
LC-MS analysis of de-N-glycosylated EPO-3 with three main species,
labeled with green, blue, and red asterisks. (a) MS1 spectrum of de-N-glycosylated
EPO-3, showing three main species with charges from 8 to 22. (b) Detected
monoisotopic masses after deconvolution of LC-MS data generated from
de-N-glycosylated EPO-3 by BioPharma Finder, showing two major O-glycosylated
species.
Conclusions
In conclusion, an integrated strategy for in-depth glycosylation
analysis of therapeutic glycoproteins, including permethylated N-glycan
analysis, glycopeptide bottom-up analysis using different proteases
and O-glycan identification by LC-MS analysis of de-N-glycosylated
proteins, was developed. Focusing on N-glycan structural characterization,
the Python scripts were newly designed, based on a database-free matching
algorithm, to allow the identification of the low abundant N-glycan
species in a commercial standard first. Including phosphorylated Man
residues, the additionally designed Python script enables the identification
of phosphorylated N-glycans, showing powerful practicability for in-depth
N-glycomics. The uncommon N-glycan species identified in this study,
like trimannosylchitobiose-free N-glycans, still need further investigation.
Understanding different glycan patterns is important for improving
the QC procedure for the glycosylation of EPO, providing feedback
to manage the EPO preparation, especially regarding the medically
critical sialylation (Neu5Gc) level. For such detailed analysis, pGlyco
alone is limited as it cannot identify glycan modifications, like
O-acetylation of sialic acids, or glycopeptides carrying multiple
glycans. De-N-glycosylation reduced the heterogeneity of EPO species
significantly and enabled the identification of O-glycan species.
The combination of these techniques and improvements in the newly
developed strategy allows an in-depth characterization of glycan species
from EPO. It can also be applied to other therapeutic
glycoproteins, with the aim of understanding drug functions and side
effects and improving the QC procedures.
Authors: Archana Shubhakar; Radoslaw P Kozak; Karli R Reiding; Louise Royle; Daniel I R Spencer; Daryl L Fernandes; Manfred Wuhrer Journal: Anal Chem Date: 2016-08-15 Impact factor: 6.986
Authors: Hiren J Joshi; Yoshiki Narimatsu; Katrine T Schjoldager; Hanne L P Tytgat; Markus Aebi; Henrik Clausen; Adnan Halim Journal: Cell Date: 2018-01-25 Impact factor: 41.582