With 28 potential N-glycosylation sites, human carcinoembryonic antigen (CEA) bears an extreme amount of N-linked glycosylation, and approximately 60% of its molecular mass can be attributed to its carbohydrates. CEA is often overexpressed and released by many solid tumors, including colorectal carcinomas. CEA displays an impressive heterogeneity and variability in sugar content; however, site-specific distribution of carbohydrate structures has not been reported so far. The present study investigated CEA samples purified from human colon carcinoma and human liver metastases and enabled the characterization of 21 out of 28 potential N-glycosylation sites with respect to their occupancy. The coverage was achieved by a multienzymatic digestion approach with specific enzymes, such as trypsin, endoproteinase Glu-C, and the nonspecific enzyme, Pronase, followed by analysis using sheathless CE-MS/MS. In total, 893 different N-glycopeptides and 128 unique N-glycan compositions were identified. Overall, a great heterogeneity was found both within (micro) and in between (macro) individual N-glycosylation sites. Moreover, notable differences were found on certain N-glycosylation sites between primary adenocarcinoma and metastatic tumor in regard to branching, bisection, sialylation, and fucosylation. Those features, if further investigated in a targeted manner, may pave the way toward improved diagnostics and monitoring of colorectal cancer progression and recurrence. Raw mass spectrometric data and Skyline processed data files that support the findings of this study are available in the MassIVE repository with the identifier MSV000086774 [DOI: 10.25345/C5Z50X].
With more than half of all secretory and cellular human proteins
being glycosylated, protein glycosylation is assumed to be the most
common and highly complex post-translational modification (PTM).[1] Glycoproteins play a role in an astonishing variety
of cellular processes, in particular, cell–cell interaction,
recognition, signaling, and adhesion processes on the cell surfaces.
For instance, protein N-glycosylation regulates and
fine-tunes critical immune response mechanisms and plays a major role
in tumor recognition and antitumor responses.[2,3] An
assembly error, as minor as a single monosaccharide misplacement,
may strongly impact the function of a glycoprotein and, in turn, the
cell phenotype.[4] Moreover, alterations
of the N-glycan profile can actively contribute to
tumor development and growth,[5] as well
as to the metastatic phenotype formation of the tumor cells.[6] Therefore, the common acceptance of glycans and
glycoproteins as cancer biomarkers is not surprising and keeps driving
the cancer glycomics field forward.[7]

Currently, various analytical platforms are being used to study
protein glycosylation. Among them, matrix-assisted laser desorption
ionization time-of-flight mass spectrometry (MALDI-TOF-MS),[8,9] and electrospray ionization mass spectrometry (ESI-MS), coupled
online or off-line, with liquid chromatography (LC)[9,10] and
capillary electrophoresis (CE)[11,12] show considerable popularity.
In-depth site-specific characterization of highly glycosylated proteins
poses an extra analytical challenge, as site-specific information can only be
well-resolved at the glycopeptide level through bottom-up PTM mapping
of the enzymatic glycoprotein hydrolysates.[13,14] Whereas LC-MS is an acknowledged and well-established candidate
for this task, CE-MS with a sheathless interface has advantages with
respect to sensitivity and analyte coverage in the bottom-up analysis
of glycopeptides.[15,16] The excellent performance currently
shown by sheathless CE-MS is due to the specific design of a novel
interface that was first described by Moini.[17] The use of this interface for glycopeptide analysis has been successfully
demonstrated on moderately glycosylated proteins, paving the way toward
more complex glycopeptidome separations and analysis.[18,19]

Human carcinoembryonic antigen (CEA) (UniProt entry P06731, CEACAM5_HUMAN)
is a highly N-glycosylated protein of approximately
180 kDa with sugars accounting for about half of its molecular mass.[20] CEA can be found in normal human colonic epithelial
cells and is reported to be upregulated in tumor forming and colonic
adenocarcinogenic cell lines;[21,22] elevated CEA levels
are valued as a progression and outcome biomarker in several forms
of human cancers,[23] including colorectal
cancer (CRC).[24] Continuous measurement
of CEA serum levels is clinically used for the postoperative and post-therapy
recurrence monitoring of CRC patients. Nonetheless, patients with
no elevation of CEA in disease progression have been shown to maintain
these low levels in case of recurrence.[25] Another study also demonstrated a poor predictive value and low
reliability of the CEA test for detecting treatable recurrences at
an early stage.[26] The CEA test is also
used for initial clinical diagnostics of CRC, but its predictive value
as a standalone biomarker in cancer screening or detection proves
rather poor.[27] As CEA is predominantly
produced by the tumor cells and has been shown to tightly participate
in the tumor metastatic events,[28] the expectation
is that the overly abundant and considerably heterogeneous glycosylation
could be reflective of cancer development and progression. While some
attempts have already been undertaken to profile CEA glycosylation
and analyze its relationship with cancer[29−31] and other biological
processes,[32] so far, its glycan structural
and site-specific heterogeneity in the cancer context has remained
largely unaddressed. Compared to the analysis of released N-glycans, the bottom-up approach allows the protein N-glycan pool to be characterized
with respect to site
occupancy. The same considerations are also relevant to its biomarker
potential, as glycopeptides are gaining increased attention in
diagnostics of cancer and other diseases and are investigated as target
substrates in cancer immunotherapy and immunodiagnostics.[33−35]

In this study we used sheathless CE-MS to analyze CEA N-glycopeptides obtained after independent enzymatic digestions
with trypsin, endoproteinase Glu-C, and Pronase. CEA samples retrieved
from three different sources (two purified from human colon carcinoma
and one purified from human liver metastases) were enzymatically hydrolyzed
and subsequently analyzed by sheathless CE-MS/MS. In total, 21 out
of 28 potential N-glycosylation sites, as well as
their site-associated dominant N-glycan sets could
be identified. The comparison between colon and metastatic CEA N-glycomes in terms of N-glycan classes
and their structural features hints toward different N-glycosylation trends in primary and metastatic CRC tumors.
Materials and Methods
Chemicals, Reagents, and Samples
All the chemicals were of analytical reagent grade
or higher. Propan-2-ol
(iPrOH), methanol (MeOH), ammonium bicarbonate (ABC), and sodium hydroxide
(NaOH) were purchased from Merck (Darmstadt, Germany). Water (LC-MS
grade), acetonitrile (MeCN) (LC-MS grade), glacial acetic acid (HAc),
hydrochloric acid (HCl), DL-dithiothreitol (DTT), iodoacetamide (IAA),
ammonium acetate (NH4Ac), formic acid (FA), TPCK-treated
trypsin from bovine pancreas, and Pronase from Streptomyces
griseus were purchased from Sigma-Aldrich (Steinheim, Germany).
Endoproteinase Glu-C from Staphylococcus aureus V8
was supplied by Promega Corporation (Madison, WI, USA). CEA purified
from human colon carcinoma was obtained from MyBioSource, Inc. (CEA1;
San Diego, CA, USA) and Fitzgerald Industries International (CEA2;
Acton, MA, USA). CEA purified from human liver metastases of colorectal
carcinoma cells was obtained from Lee BioSolutions, Inc. (CEA3; Maryland
Heights, MO, USA).
Digestion of CEA
Equal amounts of each CEA sample (10 μg) were taken for all the digestion procedures
and diluted with 10 μL of 25 mM ABC buffer (1 μg/μL
of protein). The protein disulfide bridges were reduced with 1 μL
of 22 mM (tryptic) or 55 mM DTT (Glu-C or Pronase) for 30 min at 60
°C. After the sample was cooled to room temperature, 1 μL
of 72 mM (tryptic) or 180 mM IAA (Glu-C or Pronase) was added. After
addition of the alkylation reagent the samples were left in the dark
for 30 min. Prior to adding the enzyme to the sample, the alkylation reaction
was quenched with 1 μL of 78 mM DTT and left for 30 min at
room temperature. TPCK-treated trypsin (0.5 μg/μL), Glu-C
(0.2 μg/μL) or Pronase (0.2 μg/μL) was added
to the sample in an enzyme:substrate ratio of 1:10 w/w (trypsin) or
1:20 w/w (Glu-C and Pronase). Finally, incubation was performed overnight
at 37 °C. Each enzymatic digestion was performed once for each
CEA sample.
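As a worked example, the enzyme volumes implied by the stated ratios and stock concentrations can be computed as follows. This is a minimal sketch: the function name is illustrative, and the resulting volumes are derived here from the stated values rather than quoted from the protocol.

```python
def enzyme_volume_ul(protein_ug, enzyme_to_substrate, stock_ug_per_ul):
    """Volume of enzyme stock (uL) for a given enzyme:substrate ratio (w/w)."""
    enzyme_ug = protein_ug * enzyme_to_substrate
    return enzyme_ug / stock_ug_per_ul

# 10 ug of CEA per digest, as stated in the text.
trypsin_ul = enzyme_volume_ul(10.0, 1 / 10, 0.5)  # 1:10 w/w, 0.5 ug/uL stock
glu_c_ul = enzyme_volume_ul(10.0, 1 / 20, 0.2)    # 1:20 w/w, 0.2 ug/uL stock
pronase_ul = enzyme_volume_ul(10.0, 1 / 20, 0.2)  # 1:20 w/w, 0.2 ug/uL stock

print(f"trypsin: {trypsin_ul:.1f} uL, Glu-C: {glu_c_ul:.1f} uL, Pronase: {pronase_ul:.1f} uL")
```

Under these values, trypsin corresponds to 1 μg of enzyme and Glu-C or Pronase to 0.5 μg of enzyme per 10 μg of CEA.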
Sheathless CE-MS/MS
CE experiments were carried out on a SCIEX/Beckman Coulter CESI 8000 system (SCIEX,
Framingham, MA) equipped with a temperature-controlled sample tray
and a power supply delivering up to 30 kV. A 91 cm long (LT) × 30 μm i.d. × 150 μm o.d. bare fused-silica
capillary (Silica Surface OptiMS cartridge, SCIEX) with the high sensitivity
porous sprayer in the outlet tip was used for all the separations.
A 10% v/v HAc water solution (pH 2.3) was used as a background electrolyte
(BGE). Prior to each sample injection, the capillary was rinsed at
pressure of 5 psi with 0.1 M NaOH (2.5 min), water (4 min), 0.1 M
HCl (2.5 min), water (4 min), and BGE (4 min). An online sample preconcentration
by transient isotachophoresis (t-ITP) was achieved by diluting the
CEA digests to a final concentration of 100 mM NH4Ac at
pH 4.0, which acted as a leading electrolyte solution (final concentration
of digested protein 0.40 μg/μL).[36] Samples were hydrodynamically injected in three different amounts:
1 psi for 60 s, 5 psi for 60 s and 8 psi for 60 s. In all experiments,
sample injection was preceded by a water dip and followed by a postplug
BGE injection (both at 0.5 psi for 25 s) to enhance t-ITP stacking
and to prevent sample loss. A separation voltage of 20 kV (normal
polarity, anode at the capillary inlet) was applied for all electrophoretic
separations, and the temperature was set to 25 °C. Replicate analyses
(n = 3) were carried out for every digested CEA sample
per injected volume.

The CE instrument was hyphenated to an
Impact HD UHR-QqTOF-MS (Bruker Daltonics, Bremen, Germany) via a CESI
OptiMS Bruker MS adapter kit (SCIEX) that allowed an optimal positioning
of the capillary porous tip in front of the mass spectrometer nanospray
shield (Bruker Daltonics). All experiments were carried out using
dopant enriched nitrogen gas (DEN-gas).[19] For this purpose, an in-house made polymer cone was slid onto the
housing of the porous tip, allowing for a coaxial sheath flow of the
DEN-gas around the nano-ESI emitter (MeCN was used as a dopant). Under
optimized conditions, CE-MS/MS experiments were carried out in ESI
positive mode using the following parameters: glass capillary voltage
at 1200 V, drying gas temperature at 150 °C, drying gas flow
rate at 1.2 L/min, nebulizer gas pressure at 0.2 bar, quadrupole ion
energy at 3.0 eV and collision cell energy at 7.0 eV. MS data was
acquired between m/z 200 and 2000
with a spectral acquisition rate of 1 Hz. MS/MS spectra were acquired
in a data dependent mode with an absolute threshold of 4548 counts
and active exclusion. Specific m/z values that were already acquired three times were excluded and
released after 0.8 min, unless the precursor had a five times higher
intensity than that observed in the previous acquisitions. Raw CE-MS/MS
data are available via the MassIVE repository with identifier MSV000086774
[DOI: 10.25345/C5Z50X].
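To give a sense of the size of the injected sample plugs, the hydrodynamic injection volume can be estimated with the Hagen–Poiseuille relation, V = ΔP·π·d⁴·t / (128·η·L). The sketch below is an estimate only: it assumes the viscosity of water at 25 °C (about 0.89 mPa·s) for the injected solution, which is not stated in the method, and it takes the capillary dimensions (30 μm i.d., 91 cm) from the description above. The function names are illustrative.

```python
import math

PSI_TO_PA = 6894.757  # pascal per psi

def injected_volume_nl(pressure_psi, time_s, id_um=30.0, length_cm=91.0,
                       viscosity_pa_s=0.00089):
    """Hagen-Poiseuille estimate of the hydrodynamically injected volume (nL).

    viscosity_pa_s defaults to water at 25 C, an assumption of this sketch.
    """
    d = id_um * 1e-6           # inner diameter in m
    length = length_cm * 1e-2  # capillary length in m
    dp = pressure_psi * PSI_TO_PA
    volume_m3 = dp * math.pi * d**4 * time_s / (128 * viscosity_pa_s * length)
    return volume_m3 * 1e12    # 1 m^3 = 1e12 nL

def capillary_volume_nl(id_um=30.0, length_cm=91.0):
    """Total internal volume of the capillary (nL)."""
    r = id_um * 1e-6 / 2
    return math.pi * r**2 * length_cm * 1e-2 * 1e12

for psi in (1, 5, 8):  # the three injection pressures used, each for 60 s
    v = injected_volume_nl(psi, 60)
    print(f"{psi} psi x 60 s: {v:.1f} nL ({100 * v / capillary_volume_nl():.1f}% of capillary)")
```

Because the relation is linear in pressure and time, the three injections scale as 1:5:8 regardless of the viscosity assumed.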
Data Analysis
Manual interpretation
of CE-MS/MS spectra was performed in DataAnalysis 4.3 (Bruker Daltonics,
Build 110.102.1532). All mass spectra were recalibrated internally
with sodium acetate clusters detected at the beginning of the electrophoretic
runs. Carbohydrate moieties of the N-glycopeptides
were deduced from the fragmentation spectra by manual annotation on
the basis of general glycan fragmentation rules[37] and/or basic rules of the N-glycan biosynthetic
pathway.[38] Glycoforms were labeled with
glycan net compositions specifying number of hexoses (Hex), N-acetylglucosamines (HexNAc), fucoses (Fuc), and N-acetylneuraminic acids (NeuAc); these monosaccharide abbreviations
will be further used throughout this manuscript. Exemplary annotations
of MS/MS N-glycopeptide spectra are provided in Supporting Information, Figure S1. N-Glycopeptides were included based upon their exact mass (±10
ppm), signal-to-noise (S/N; >9), and migration order. Briefly, all
MS/MS scans were screened for oxonium ions (m/z 204.087, 366.140, and 292.103, singly charged HexNAc1, Hex1HexNAc1, and NeuAc1 fragments, respectively). The peptide mass was deduced by subtracting
203.079 (neutral loss of HexNAc) from the characteristically intense
Y1 ion (corresponding to [peptide + HexNAc + H]+) and then matched against in silico digests of CEA (trypsin and
Glu-C) to determine the amino acid sequence and N-glycosylation site position. In addition, the presence of low-intensity
peptide b-ions was used to further confirm the amino acid sequence.
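The screening and peptide-mass steps described so far can be sketched as follows. This is an illustration, not the authors' procedure (which was manual): the 0.01 Da matching tolerance, the function names, and the example peak list are assumptions, while the oxonium ion m/z values and the 203.079 HexNAc loss are those quoted above.

```python
# Oxonium ion m/z values quoted in the text (singly charged).
OXONIUM_IONS = {"HexNAc": 204.087, "NeuAc": 292.103, "HexHexNAc": 366.140}
HEXNAC_LOSS = 203.079  # neutral loss of HexNAc, as quoted in the text

def find_oxonium_ions(peaks_mz, tol_da=0.01):
    """Names of oxonium ions found in a peak list (tol_da is an assumed tolerance)."""
    return [name for name, mz in OXONIUM_IONS.items()
            if any(abs(p - mz) <= tol_da for p in peaks_mz)]

def peptide_mh_from_y1(y1_mz):
    """[peptide + H]+ deduced from the singly charged Y1 ion [peptide + HexNAc + H]+."""
    return y1_mz - HEXNAC_LOSS

# Hypothetical MS/MS peak list, for illustration only.
spectrum = [204.0869, 292.1031, 366.1402, 1267.689]
markers = find_oxonium_ions(spectrum)
peptide_mh = peptide_mh_from_y1(1267.689)  # hypothetical Y1 value

print(markers, round(peptide_mh, 3))
```

The deduced [peptide + H]+ value would then be compared with the in silico digest masses to assign the peptide backbone.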
The glycan composition could be assigned based upon the presence of
other Y-ions in the spectra as well as by the neutral loss of (combined)
monosaccharides from the precursor mass. In case of Pronase, deduced
peptide M values were
matched to theoretical M values of randomly cleaved peptides from CEA generated by the FindPept
tool (http://www.expasy.org/tools/findpept.html, with a mass accuracy value of 10 ppm).[39] With some minor adaptations of the default data processing
pipeline, Skyline software (MacCoss Lab Software, version 20.1.0.76)
was used to identify nonfragmented N-glycopeptides in the MS1 spectra
and to confirm the identity of the manually
annotated fragmented species. Briefly, all MS/MS-confirmed glycan
compositions, their expected derivatives inferred from the N-glycan biosynthetic pathway and extra glycan species found
in the glycoprofile of tumor-associated CEA in the literature[29] were loaded into Skyline software as possible
modifications for all the samples (in total 128 compositions (Supporting Information, Table S1), 89 of these
were MS/MS-confirmed by at least one glycopeptide in at least one
sample type (CEA1, CEA2, or CEA3)). In addition, a peptide mass list
was created for every identified peptide backbone of every manually
detected N-glycopeptide. Upon the basis of this mass
list in combination with all possible glycan compositions, a full
assignment was provided by Skyline for the tryptic and Glu-C digestions.
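The principle of combining a peptide mass list with a list of glycan compositions can be illustrated with a short sketch. This is not Skyline's implementation: the function names, the hypothetical peptide mass, and the charge range are assumptions. The monosaccharide residue masses are standard monoisotopic values, and the 10 ppm window is the one stated in the text.

```python
# Monoisotopic residue masses (Da) of the monosaccharides used in this work.
RESIDUE_MASS = {"Hex": 162.05282, "HexNAc": 203.07937, "Fuc": 146.05791, "NeuAc": 291.09542}
PROTON = 1.007276

def glycan_mass(composition):
    """Summed residue mass of a glycan composition, e.g. {"Hex": 5, "HexNAc": 4}."""
    return sum(RESIDUE_MASS[name] * count for name, count in composition.items())

def glycopeptide_mz(peptide_mass, composition, charge):
    """Theoretical m/z of the protonated glycopeptide at a given charge state."""
    return (peptide_mass + glycan_mass(composition) + charge * PROTON) / charge

def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6

def match(observed_mz, peptide_mass, compositions, charges=(2, 3, 4), tol_ppm=10.0):
    """All (composition label, charge) pairs within tol_ppm of the observed m/z."""
    return [(label, z)
            for label, comp in compositions.items()
            for z in charges
            if abs(ppm_error(observed_mz, glycopeptide_mz(peptide_mass, comp, z))) <= tol_ppm]

# Hypothetical neutral peptide mass and two candidate compositions.
peptide = 1063.6026
candidates = {
    "H5N4": {"Hex": 5, "HexNAc": 4},
    "H5N4F1": {"Hex": 5, "HexNAc": 4, "Fuc": 1},
}
observed = glycopeptide_mz(peptide, candidates["H5N4"], 2) * (1 + 5e-6)  # 5 ppm off
print(match(observed, peptide, candidates))
```

A mass match alone does not distinguish isomeric compositions or structures, which is why the text additionally relies on MS/MS confirmation, isotope pattern quality, and migration order.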
Within a single data processing step, improved peak assignment and
quantification of the relative peak area of all compounds could be
achieved with Skyline. All N-glycopeptides assigned
by Skyline were visually evaluated based upon the peak shapes of the detected
protonated molecular ions of a compound, the ppm error (±10 ppm),
and the fit and quality of the isotopic patterns (dot
product value >0.8); only those that passed these criteria were chosen
for quantitation. After this, generated peak areas were normalized
per sample and per N-glycosylation site to obtain
their relative abundances (percentage of all glycoform peak areas
detected per site); these values were used for further analyses and
visualization. Although the Skyline software tool was not primarily designed
for glycopeptide profiling and annotation, its above-mentioned features
proved very useful for this application. All resulting N-glycopeptides
were supplied with a graphical structural representation of the attached
glycan (Supporting Information, Tables S2 and S3). A representation of the glycopeptide annotation and quantitation
pipeline, highlighting the Bruker DataAnalysis and Skyline software
environments and features, is shown in Supporting Information, Figure S2. Skyline processed data are available
via the MassIVE repository with identifier MSV000086774 [DOI: 10.25345/C5Z50X].
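The curation and normalization steps described in this section can be summarized in a few lines. The sketch below assumes a simple list of records for a single sample; the field names and example values are hypothetical, and only the numerical criteria (±10 ppm, dot product >0.8, normalization per site to 100%) come from the text. The visual inspection of peak shapes is not modeled.

```python
from collections import defaultdict

def passes_quality(record, max_ppm=10.0, min_idotp=0.8):
    """Numerical acceptance criteria from the text: ppm error and isotope dot product."""
    return abs(record["ppm"]) <= max_ppm and record["idotp"] > min_idotp

def relative_abundances(records):
    """Percent of the summed glycoform peak area per N-glycosylation site (one sample)."""
    kept = [r for r in records if passes_quality(r)]
    totals = defaultdict(float)
    for r in kept:
        totals[r["site"]] += r["area"]
    return {(r["site"], r["glycan"]): 100.0 * r["area"] / totals[r["site"]] for r in kept}

# Hypothetical records for one sample, for illustration only.
records = [
    {"site": "N650", "glycan": "H5N4F1S1", "area": 300.0, "ppm": 2.1, "idotp": 0.95},
    {"site": "N650", "glycan": "H5N4F1", "area": 100.0, "ppm": -4.0, "idotp": 0.91},
    {"site": "N650", "glycan": "H6N5F1", "area": 80.0, "ppm": 3.0, "idotp": 0.50},  # rejected
    {"site": "N375", "glycan": "H5N2", "area": 50.0, "ppm": 1.0, "idotp": 0.99},
]
print(relative_abundances(records))
```

Normalizing per site rather than per sample keeps the glycoform distributions of different sites comparable even when the peptide backbones ionize with different efficiencies.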
Results and Discussion
Analysis of CEA N-Glycopeptides by Sheathless CE-MS/MS
CEA N-glycopeptides were generated using three different proteases and analyzed by CE-MS/MS.
Representative extracted ion electropherograms (EIEs) of several N-glycoforms related to N-glycosylation
site N650 (tryptic peptide backbone ITPNNN650GTYACFVSNLATGR) are illustrated in Figure 1A. A broad range
of glycans was observed varying in the overall composition of monosaccharide
units, their structural assembly and charge (presence of terminal
NeuAc). The glycan portions were found to strongly affect the electrophoretic
mobility of the N-glycopeptides. The number of negatively
charged NeuAc units determined the formation of distinct glycoform
clusters per peptide backbone. Semiempirical models can be used to
relate the molecular mass (M) and the charge (q) of a peptide to its electrophoretic
mobility (m_e) and to predict the electrophoretic
migration behavior. Recently, it has been shown that the classical
polymer model, in which q/M^(1/2) is proportional to m_e, can be applied to predict the electrophoretic mobility
of N-glycopeptides.[40] An
illustration of this model is shown in Figure 1C, where for a range of N-glycoforms the theoretical M, q, and q/M^(1/2) values are provided next
to their experimentally obtained migration time values, which are
inversely proportional to m_e. The relationship
observed between migration times and q/M^(1/2) was in accordance with
the classical polymer model (Figure 1B). By investigating these models, structural modifications,
charge characteristics and conformations of peptides can be studied.
In our study these plots were constructed for all the detected N-glycoforms with the same peptide backbone as a sanity
check of the glycopeptide structure assignment which was primarily
based on MS/MS data.
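Such a sanity check can be sketched as an ordinary least-squares fit of migration time against the charge-to-size descriptor of the classical polymer model (charge divided by the square root of the molecular mass). The glycoform masses, charges, and migration times below are hypothetical placeholders, not values from this study, and the charge of each glycoform is taken as given rather than calculated.

```python
import math

def q_over_sqrt_m(charge, mass):
    """Charge-to-size descriptor of the classical polymer model."""
    return charge / math.sqrt(mass)

def linear_fit(x, y):
    """Least-squares slope, intercept, and Pearson r for paired values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

# Hypothetical glycoforms of one peptide backbone: (M in Da, q, migration time in min).
glycoforms = [(3500.0, 2.0, 30.1), (3800.0, 2.0, 30.9), (4100.0, 1.0, 36.5), (4400.0, 1.0, 37.0)]
x = [q_over_sqrt_m(q, m) for m, q, _ in glycoforms]
t = [time for _, _, time in glycoforms]
slope, intercept, r = linear_fit(x, t)
print(f"slope={slope:.1f}, intercept={intercept:.1f}, r={r:.3f}")
```

A glycoform whose migration time falls far from the fitted line would be a candidate for re-inspection of its MS/MS assignment.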
Figure 1
Experimentally observed linearity of electrophoretic behavior exemplified
with N-glycopeptides sharing the ITPNNNGTYACFVSNLATGR peptide backbone
(N-glycosylation site N650). Experimental
migration times of the N-glycopeptides in the electropherogram
(A) were fitted to a linear regression line (B) with the classical
polymer semiempirical model (q/M^(1/2)).[40] The table (C) showcases theoretical M, q, and q/M^(1/2) values
and experimentally obtained migration time values, which are inversely proportional
to m_e. Blue square: N-acetylglucosamine (N), green circle: mannose (H), yellow circle:
galactose (H), red triangle: fucose (F), pink diamond: N-acetylneuraminic acid (S).
Sequence Coverage and Characterization of CEA (Glyco)Peptides
Figure 2 shows the amino acid sequence of CEA and its 28 potential
N-glycosylation sites (marked in bold red). In
addition to the numerous N-glycosylation sites, CEA
was expected to have a considerable heterogeneity in the sugar content
per site of the protein, hence the analysis and structural elucidation
of N-glycopeptides from CEA enzymatic digests was
not straightforward. Taking these factors into account, digestion
with several proteases or mixtures of proteases was considered, as
it was previously demonstrated to increase sequence coverage and improve
the PTM characterization.[41,42] Three different proteolytic
enzymes were used to generate three distinct protein digests for each
CEA sample. Trypsin and Glu-C are specific serine proteases that cleave
C-terminal peptide bonds of lysine (K) and arginine (R), or aspartic
(D) and glutamic acid (E), respectively. Spectra of the MS/MS analysis
of trypsin and Glu-C digests are often
sufficient to characterize moderately glycosylated proteins.[41] In particular, Glu-C is able to generate the
peptide backbones that are less likely to be covered by trypsin, as
the trypsin-specific cleavage sites that have D and E amino acids
in close proximity to R amino acid have lower probability of being
hydrolyzed.[43] However, the abundant glycosylation
on CEA may account for inefficient cleaving in PTM-rich protein regions
and large glycopeptides with multiple glycosylation sites, thereby
impairing the annotation. Another issue is that the nonglycosylated peptides
in proteolytic digests suppress glycopeptide ionization in
ESI-MS, leading to a substantial sensitivity reduction for glycopeptides.[41] Pronase, a nonspecific mixture of proteolytic
enzymes, was reported to help in circumventing these limitations.[44] Pronase produces smaller peptide moieties (typically
1 to 8 amino acids) and usually cleaves nonglycosylated peptides to
single amino acids, which reduces signal suppression of N-glycopeptides.[45] However, since nonspecific
proteases generate peptides rather haphazardly, CE-MS/MS data sets
from Pronase digests were harder to interpret and, most importantly,
the resulting N-glycopeptides could not be reliably
quantified. Therefore, these N-glycopeptides were
viewed from an exploratory perspective only, namely for providing
a better overview of the total glycoprofile and enriching the putative
modifications list. When combining the information obtained with the
different enzymes, 21 out of the potential 28 N-glycosylation
sites were covered. These peptide backbones are highlighted on the
CEA sequence in Figure 2. Some very short N-glycopeptides from Pronase digests
were annotated (Supporting Information, Table S3) but not mapped, because their sequence
motifs lacked specificity and could be assigned to multiple glycosylation sites. Additionally,
the peptide mass fingerprint data of all samples were searched against
the human UniProt database with Mascot Daemon (v.2.5.1, Matrix Science);
the resulting peptide coverage is illustrated in Supporting Information, Figure S3. Although the total peptide
coverage appears to be rather low, this could be explained by the
high glycosylation degree of CEA. The Mascot matching algorithm is
not designed to map glycosylated peptides; moreover, glycans are likely
to cause steric hindrance for the proteases attempting to reach the
cleavage site. From Glu-C digests a rather long peptide (Ala174–Asp227) was found as an unmodified sequence only,
although it harbors N-glycosylation
sequons, some of which were found to be glycosylated and were characterized in our
other data sets (e.g., Pronase digests). This underscores the proteoform
variability in N-glycosylation for these sites, i.e.,
a certain portion of the glycoprotein does not bear glycan modifications
on these putative sites.
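The cleavage rules and the N-glycosylation sequon discussed in this section can be illustrated with a small in silico digest. The sketch below is simplified: it uses the common rule that trypsin does not cleave before proline, applies no such restriction to Glu-C, and ignores missed cleavages. The test fragment joins the two tryptic peptides LQLSNGNR and TLTLFNVTR reported below for sites N197/553 and N204/560; treating them as adjacent is an assumption of this example, and the reported positions are local to the fragment, not CEA numbering.

```python
import re

def digest(sequence, cleave_after, blocked_by=""):
    """Cleave C-terminally of the given residues unless the next residue is blocked."""
    peptides, start = [], 0
    for i in range(len(sequence) - 1):
        if sequence[i] in cleave_after and sequence[i + 1] not in blocked_by:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides

def sequon_positions(sequence):
    """1-based positions of Asn in N-X-S/T sequons (X is any residue except Pro)."""
    return [m.start() + 1 for m in re.finditer(r"N(?=[^P][ST])", sequence)]

fragment = "LQLSNGNRTLTLFNVTR"  # illustrative fragment, see lead-in
tryptic = digest(fragment, cleave_after="KR", blocked_by="P")
glu_c = digest("AAEKDPR", cleave_after="DE")  # toy sequence for the Glu-C rule
sites = sequon_positions(fragment)

print(tryptic, glu_c, sites)
```

In the fragment, the first sequon (N-R-T) spans the tryptic cleavage site, which shows how a glycan can sit directly next to the bond a protease has to reach.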
Figure 2
Human carcinoembryonic antigen (CEACAM5_HUMAN)
sequence. Putative N-glycosylation sites are indicated
in bold red. Glycopeptides that were detected after digestion with specific proteases are highlighted
in green (Glu-C) and orange (trypsin). The glycopeptide coverage by
a nonspecific protease (Pronase) digestion is shown in blue. The total
molecular mass of the glycosylated protein is reported to vary between
150 000 and 200 000 Da; the mass calculations are based
on the most prominent mass (180 000 Da).[21]
Site-Specific N-Glycoprofile of CEA
Supporting Information, Table S2 represents all the discovered N-glycopeptides
that were quantified (trypsin and Glu-C digestion) and Supporting Information, Table S3 provides those
that were only structurally characterized (Pronase digestion), all
grouped by N-glycosylation site for the three CEA
samples. To ease the interpretation of the trends in differential N-glycosylation patterns and structural features of the
glycoprofiles, glycosylation traits (namely glycosylation type, as
well as antennarity, fucosylation, and sialylation present on all
glycans (total) and within the complex types (complex)) were mathematically
derived from the data sets with relative abundance values (Supporting Information, Figure S4, derived traits
calculation is provided in Supporting Information, Table S4). Moreover, to obtain a better representation of the
most characteristic N-glycan species, the top 10 N-glycans (based upon their relative abundance) were selected
per quantifiable N-linked glycosylation site and
plotted as histograms in Figure 3 and Supporting Information, Figure S5.
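The two summaries described above, a top-10 selection and derived glycosylation traits, can be sketched from per-site relative abundances. The actual trait definitions are given in Supporting Information, Table S4 and are not reproduced here; the trait below (summed relative abundance of all compositions carrying at least one residue of a given type) is a simplified stand-in, and the example abundances are hypothetical. Composition labels follow the H/N/F/S shorthand used in the figures.

```python
import re

def parse_composition(label):
    """Parse a label such as 'H5N4F1S2' into {'H': 5, 'N': 4, 'F': 1, 'S': 2}."""
    return {k: int(v) for k, v in re.findall(r"([HNFS])(\d+)", label)}

def top_n(rel_abund, n=10):
    """The n most abundant glycoforms of one site, highest first."""
    return sorted(rel_abund.items(), key=lambda kv: kv[1], reverse=True)[:n]

def trait_percent(rel_abund, residue):
    """Summed relative abundance of glycoforms carrying at least one given residue."""
    return sum(a for label, a in rel_abund.items()
               if parse_composition(label).get(residue, 0) > 0)

# Hypothetical relative abundances (%) for one site in one sample.
site = {"H5N4F1S2": 40.0, "H5N4F1S1": 25.0, "H5N2": 20.0, "H6N5F1": 15.0}
print(top_n(site, 2), trait_percent(site, "F"), trait_percent(site, "S"))
```

Because the input is already normalized per site, each trait is directly a percentage of that site's glycoform pool.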
Figure 3
Relative abundances of the 10 most abundant glycoforms of the quantitatively
characterized N-glycosylation site N204/560 (CEA1 is plotted in blue, CEA2 in green and CEA3 in orange). Peak
area values were normalized by N-glycosylation site
for all samples (as percentage of all glycoforms peak areas detected
per site). The top 10 most abundant glycoforms of all N-glycosylation sites quantitatively characterized can be found in Supporting Information, Figure S4. Error bars
indicate the standard deviation. H: hexose, N: N-acetylhexosamine,
F: fucose and S: N-acetyl neuraminic acid.
Overall, the tryptic digestion allowed the identification of a
total of 489 different N-glycopeptides across
the three CEA sample types, corresponding to eight N-glycosylation sites (N197/N553, N204/N560, N375, N580, N650, and N665). Unfortunately, no differentiation could be
made between N-glycopeptides belonging to sites N197 and N553 or N204 and N560, as they presented the same peptide sequences after digestion with
trypsin (LQLSNGN197/553R and TLTLFN204/560VTR, respectively). Some N-glycosylation
sites were expected, but not found in the data (e.g., N152, N208, N246, N480). This may be
caused by glycans shielding the theoretical tryptic cleavage sites,
which resulted in large peptide moieties (aa ≥ 20). Furthermore,
certain peptide sequences carried several potential N-glycosylation sites. Hence, if all of these sites were occupied,
the complexity of the glycopeptide would complicate the detection
and identification of these sites. Interestingly, site N197/553 was represented by two charge variants: LQLSNGNR and its deamidated version, LQLSDGNR, which has not been reported
yet. It is well-known that asparagine (N) can undergo spontaneous
deamidation both in vivo and in vitro and the rate increases under
temperature and pH typical for tryptic digestion.[46] Therefore, this modification could happen as a result of
peptide degradation during sample preparation. However, no other asparagine
degradation was observed for (glyco)peptides in any of the samples.
The peak areas and shapes, observed for both charge variants, were
quite similar in value and appearance, and they migrated one after
another as expected. Moreover, the glycan subsets of the charge variants
overlapped significantly, but not fully (Supporting Information, Table S2 and Figure S5). Taking into account that the peptide sequence is shared between
the two N-glycosylation sites, it is conceivable
that the two peptides originated from different parts of the
protein, but further research is needed to confirm this suggestion.

The Glu-C digest resulted in 150 unique N-glycopeptides
and yielded the identification of three additional N-glycosylation sites (N152/N508, N466) compared to the tryptic digest and overlapped in one site (N580). However, two of these sites (N152/N508) could not be reliably quantified due to peptide backbone variability
from the nonspecific Glu-C cleavages (Supporting Information, Table S3). Interestingly, the site that was detected
in both trypsin and Glu-C digests (N580) revealed far fewer glycoforms in the trypsin data than in the
Glu-C data (albeit fully overlapping). This further points toward
the possible interference of abundant and bulky N-glycan modifications with the performance of certain enzymes, in
line with the literature.[43,47]

With regard to Pronase, the digests revealed a total of 254 N-glycopeptides.
Due to the nonspecificity of the enzyme,
a partial overlap in coverage was found between the Pronase generated N-glycopeptide pool and the glycopeptide sets produced by
the specific enzymes. Nonetheless, 10 new sites were confirmed (N104, N208, N256, N274, N330, N351, N432, N480, N529, N612), and the Pronase digest thus brought the
total number of characterized N-glycosylation sites to 21.

Throughout the identified and quantified N-glycosylation
sites, different classes of N-glycans were observed
on the same site, including some high-mannose and hybrid types but
mainly complex structures (bi-, tri-, and tetra-antennary structures).
Some nonquantifiable sites (N152/508, N351,
N256 in Supporting Information, Table S3) appeared to be populated only with high-mannose type structures.
The rest of the identified N-glycopeptides from the nonquantifiable
data carried a mixture of different N-glycan types,
though the majority of these overlapped strongly in sequence
position. Therefore, more high-mannose-only sites could exist on the
CEA structure.

Figure 4 gives an
overview of the site-specific N-glycosylation of
the three CEA samples. The overall resulting N-glycosylation
site coverage is rather comprehensive, although the B1 protein domain
was covered quite poorly. From the Pronase digest, small high-mannose
type glycoforms could be assigned to this sequence area
(Supporting Information, Table S3); the poor coverage by the specific proteases
is most likely caused by steric hindrance from the glycans shielding
the protease binding sites. Figure 4 also gives the N-glycan structures
with the highest relative abundance per CEA sample analyzed. The N-glycosylation sites indicated in red were covered by
the Pronase digests and, as these could not be reliably quantified,
are not accompanied by the most abundant glycan structures. Visual
evaluation suggests two observations: the distal (C-terminal) part of
CEA is easier for the specific proteolytic enzymes to reach, and it harbors
more traits that potentially differentiate the increasing malignant potential
of the tumor cells. In particular, the highest levels of fucosylation
and, especially, sialylation were observed for the liver metastases
CEA sample (CEA3), which presented a higher degree of metastatic involvement
(Figure 4 and Supporting Information, Figures S4 and S5).
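The site tally reported for the three digests can be reproduced with simple set arithmetic. The sketch below is illustrative only. The Glu-C and Pronase sets are taken from the text above, whereas the tryptic set is an assumption assembled from sites named elsewhere in this article (it is not given here as a single list). Sites sharing a peptide backbone (e.g., N152/N508) are counted separately, as the text does.

```python
# Sites attributed to each digest. GLUC and PRONASE_NEW follow the text;
# TRYPSIN is an assumption pieced together from sites named elsewhere in the article.
TRYPSIN = {"N197", "N553", "N204", "N560", "N375", "N580", "N650", "N655"}
GLUC = {"N152", "N508", "N466", "N580"}
PRONASE_NEW = {"N104", "N208", "N256", "N274", "N330",
               "N351", "N432", "N480", "N529", "N612"}
TOTAL_POTENTIAL_SITES = 28


def combined_coverage(*digests):
    """Return the union of the site sets from all digests."""
    covered = set()
    for sites in digests:
        covered |= sites
    return covered


covered = combined_coverage(TRYPSIN, GLUC, PRONASE_NEW)
gluc_additional = GLUC - TRYPSIN   # sites seen with Glu-C but not trypsin
shared = GLUC & TRYPSIN            # sites seen with both specific enzymes

print(len(covered), "of", TOTAL_POTENTIAL_SITES, "potential sites covered")
print("Glu-C additional:", sorted(gluc_additional))
print("Shared by trypsin and Glu-C:", sorted(shared))
```

Under these assumptions, the union contains 21 sites, with three sites contributed by Glu-C alone and N580 shared between the two specific enzymes, which matches the counts stated in the text.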
Figure 4
Schematic
representation of human carcinoembryonic antigen (CEACAM5_HUMAN)
domain structure with the N-glycosylation sites revealed
by this study marked for the different types of CEA samples. The CEA
domains are depicted as barrels and N-glycosylation
sites (shown as double dots) are mapped along the structure. The black
double dots are N-glycosylation sites identified
in this study (tryptic or Glu-C digest) and are supplied with the
most abundant N-glycan composition for each of the three
types of CEA analyzed (side panels). The N-glycans
highlighted in orange (N197 and N553) and in
green (N204 and N560) share the same peptide
backbone, hence the discovered glycan populations are shared. Sites confidently
characterized from the nonspecific digestion data are shown as red
double dots (Pronase) and are not accompanied by the most abundant
glycan structures due to the lack of reliable quantification. Undiscovered N-glycosylation sites are shown as gray double dots. All
identified N-glycopeptides can be found in Supporting Information, Table S2 (quantified)
and Table S3 (nonquantified). Blue square: N-acetylglucosamine (N), green circle: mannose (H), yellow
circle: galactose (H), red triangle: fucose (F), pink diamond: N-acetylneuraminic acid (S).
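The one-letter shorthand in the legend (H for hexose, N for N-acetylhexosamine, F for fucose, S for N-acetylneuraminic acid) is what compositions such as H9N8F7S1 are written in. As a minimal sketch, assuming standard monoisotopic residue masses and a free reducing-end glycan (one water added), a composition string can be parsed and converted to a mass as follows. The function names are hypothetical and are not part of the study's data processing.

```python
import re

# Monoisotopic residue masses in Da (standard values, not taken from the article).
RESIDUE_MASS = {
    "H": 162.05282,  # hexose (mannose or galactose)
    "N": 203.07937,  # N-acetylhexosamine (e.g., GlcNAc)
    "F": 146.05791,  # fucose (deoxyhexose)
    "S": 291.09542,  # N-acetylneuraminic acid
}
WATER = 18.01056


def parse_composition(text):
    """Parse a string such as 'H9N8F7S1' into per-residue counts."""
    counts = {letter: 0 for letter in RESIDUE_MASS}
    position = 0
    for match in re.finditer(r"([HNFS])(\d+)", text):
        if match.start() != position:
            raise ValueError(f"unexpected characters in {text!r}")
        counts[match.group(1)] += int(match.group(2))
        position = match.end()
    if position == 0 or position != len(text):
        raise ValueError(f"cannot parse composition {text!r}")
    return counts


def glycan_mass(counts):
    """Monoisotopic mass of the free glycan (sum of residues plus one water)."""
    return sum(RESIDUE_MASS[k] * n for k, n in counts.items()) + WATER


composition = parse_composition("H9N8F7S1")
print(composition, round(glycan_mass(composition), 4))
```

When the glycan is attached to a peptide, only the residue sum (without the added water) contributes to the glycopeptide mass.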
Looking closer at the overall N-glycan population
for the three different types of CEA samples analyzed, a characteristic
cancer glycosylation pattern emerges. To start with, a notable expression
of truncated/paucimannosidic and high-mannose structures was observed
in all three sources analyzed in this study. This feature is hypothesized
to arise from incomplete N-glycan processing in the
early stage of CRC.[48,49] As can be seen from the glycosylation
type panels in Supporting Information, Figure S4, these structures show no predilection toward
any particular area of the protein. Although incomplete structures are
widely present, they are not very abundant and contribute only moderately to
the overall CEA glycoprofile.

Bisection and branching appear
to have a controversial role in
cancer glycobiology. A fair amount of bisecting complex-type structures
represents an advantage in immune evasion and metastatic potential[50] and impacts several cancer-assisting biological
pathways.[51] Increased branching is also
viewed as both cancer-associated and oncogenesis-promoting.[48,51] It is widely accepted that high bisection levels are likely to coincide
with low degrees of branching, as bisecting GlcNAc expression is governed
by the GNT3 enzyme, which effectively inhibits the activities of other GlcNAc
transferases.[52] Surprisingly, this study
revealed that both branching and bisection levels are relatively high
in all CEA samples and particularly high for CEA3 (liver metastasis).
Furthermore, in many cases both features were combined within the
same glycan structure which, to the best of our knowledge, is uncommon
for cancer-related phenotypes or CEA glycosylation features.[29] Notable differences in antennarity between primary
colon carcinoma (CEA1 and CEA2) and metastatic tumor tissue (CEA3)
samples were found on sites N204/560, N375,
N466, N650 and N655 (Supporting Information, Figure S4). Peculiarly,
while it is typical for mammalian cells to terminate GlcNAc residues
with galactose units, the detected highly branched tetra-antennary
structures often had their antennae only partially occupied by galactose
(i.e., incomplete galactosylation), indicating lowered levels of substrate
or β4GalT enzyme dysregulation. Similar findings of lowered
galactosylation in the tumor-associated CEA N-glycome
were previously obtained by lectin microarray analysis, and the authors
speculated about its role in the immune response involving CEA.[53] The same study also reported overall increased
levels of GalNAc in tumor-associated CEA, which could also explain
and/or contribute to undergalactosylation of antennae. Alongside
high branching, CEA N-glycans showed a proclivity toward
(poly-)LacNAc elongation on highly branched structures, which is also
a known CRC feature associated with poor prognosis.[48] This signature difference between primary and metastatic
samples is most notable in the most distal part of the CEA sequence
(sites N650 and N655) and is not reflected much
along the other sites.

Sialylation changes are commonly found
in cancer biology and have
potential for diagnosis and therapy.[54−56] In particular, increased
sialylation has often been associated with cancer invasiveness, and
elevated serum total sialylation (TSA) and particularly high levels
of sialylation were found for liver metastases.[57,58] Induced by dysregulation of sialidases and sialyltransferases, aberrant
NeuAc patterns play a role in immune evasion and cancer-associated
inflammation. CRC is no exception, as increased sialylation has been shown
to contribute to postoperative recurrence,[59] therapeutic resistance, and metastatic spread.[60] The CEA samples exhibited an overall moderate degree of
sialylation (even highly branched N-glycans never
harbored more than two NeuAcs), while elevated sialylation levels
were found for the CEA3 sample on sites N204/560, N375, N580, N650, and N665 (Supporting Information, Figure S4). Total sialylation
was broadly in line with complex-type sialylation levels, indicating
a low contribution of hybrid-type N-glycans to the
overall sialylation pool. NeuAc modifications (e.g., methylation,
acetylation, and sulfation) were not included in the scope of this
work. It is also worth mentioning that NeuAc linkage isomers (α-2,3,
α-2,6, and α-2,9 variants) have been shown to be an important
discriminatory trait in several cancers, including CRC.[60] In this study, no NeuAc linkage information
could be obtained, but, acknowledging the importance of the trait,
experimental considerations for linkage specificity evaluation will
be included in follow-up research.

Among the glycan traits discussed
above and illustrated in Supporting Information, Figure S4, fucosylation
was the most striking feature of the CEA N-glycome
in the analyzed samples. An aberrant increase in fucosylation, arising
from a large pool of GDP-fucose donors, abnormal expression of enzymes,
and substrate availability, is reputed to be a red flag in many
oncological diseases (brain, colorectal, breast, liver, lung, and
other cancers), and it is involved in multiple stages of cancer biology
(proliferation, survival, multidrug resistance, invasion, and metastasis)
and sustained inflammatory processes.[61,62] Whereas upregulation
of core fucosylation (governed by FUT8 enzyme overexpression) has
been reported to directly promote epithelial-mesenchymal transformation
(EMT) and dedifferentiation (i.e., metastatic potential),[62] terminal fucosylation takes part in the generation
of Lewis antigens. Upon interaction with certain selectins, Lewis
type antigens (on both immune cells and especially cancer cells) promote
angiogenesis, reshape the tumor microenvironment, and, paradoxically, enhance
cancer progression via chronic activation of innate immune cells. Additionally,
elevated fucose levels are linked to cancer cell stemness. The
possible roles of fucosylated antigens in cancer biology were recently
reviewed in full detail by Blanas and colleagues.[63] Sialyl-Lewis antigens (sialylated versions of Lewis antigens)
also impact CRC progression at multiple stages of development.[48] However, these are not covered in the current
discussion because sialylation contributed only moderately to the
CEA glycoprofiles observed in this study. The CEA-associated glycan pool
observed in this study appeared to be overabundantly fucosylated with
regard to both core and terminal fucosylation: on average, about half
of all glycans from all quantifiable N-glycosylation
sites carried at least one fucose. The number of fucose monosaccharides
in a single glycan composition reached up to seven units. As an
illustration, see the MS/MS-confirmed structure with composition H9N8F7S1 on site N204/560 (CEA3 sample, Supporting Information, Table S2). Interestingly, almost all N-glycans associated with the sites N204/560 and
N466 carried at least one fucose within the structure regardless
of the CEA source. Nonetheless, Supporting Information, Figure S4 demonstrates that, on average, the CEA3 sample exhibits
a higher level of fucosylation than CEA1 and CEA2, when considering
both the abundance of fucosylated glycans and the number of fucose
units per glycan. That is particularly evident for sites N375, N580, and N650, where primary tumors showed
significant levels of nonfucosylated N-glycan species.
Moreover, for sites N204/560 and N466 of the
CEA3 sample, a higher amount of fucosylated determinants and oligofucosylated N-glycans was observed, accounting for more Lewis antigens.

Overall, notable differences in one or more of the analyzed N-glycan traits per N-glycosylation site
could be observed per CEA sample type (primary tumor vs metastatic
site; Figure 4 and Supporting Information, Figures S4 and S5).
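The two fucosylation measures discussed above (the share of glycan abundance carrying at least one fucose, and the number of fucose units per glycan) can be expressed as simple abundance-weighted traits per site. The sketch below is one plausible formulation and not necessarily the calculation behind Supporting Information, Figure S4. The function names are hypothetical, and the example abundances are invented for illustration and are not data from the study.

```python
import re


def residue_count(composition, letter):
    """Number of residues of one type (e.g., 'F') in a string such as 'H5N4F1S1'."""
    match = re.search(rf"{letter}(\d+)", composition)
    return int(match.group(1)) if match else 0


def trait_levels(glycoforms, letter):
    """Return (abundance fraction carrying >= 1 residue, weighted mean residue count).

    glycoforms is a list of (composition, relative_abundance) pairs for one site.
    """
    total = sum(abundance for _, abundance in glycoforms)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    carrying = sum(a for comp, a in glycoforms if residue_count(comp, letter) >= 1)
    weighted = sum(residue_count(comp, letter) * a for comp, a in glycoforms)
    return carrying / total, weighted / total


# Hypothetical relative abundances for a single site; not data from the study.
example_site = [("H5N4F1S1", 40.0), ("H5N4", 35.0), ("H6N5F3", 25.0)]

fuc_fraction, mean_fuc = trait_levels(example_site, "F")
sia_fraction, mean_sia = trait_levels(example_site, "S")
print(f"fucosylated fraction: {fuc_fraction:.2f}, mean fucose per glycan: {mean_fuc:.2f}")
print(f"sialylated fraction: {sia_fraction:.2f}, mean NeuAc per glycan: {mean_sia:.2f}")
```

Keeping the two measures separate distinguishes a site where many glycans carry a single fucose from one where fewer glycans are oligofucosylated, which is the distinction the comparison between CEA3 and the primary tumor samples relies on.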
Concluding Remarks
In this study, CEA N-glycosylation sites were
characterized using N-glycopeptide profiles obtained
after enzymatic digestion with trypsin, Glu-C, and Pronase and analysis
by sheathless CE-MS/MS. Complementary information was obtained through
the use of the three above-mentioned enzymes, hence allowing an improved N-glycosylation site coverage and the identification of
most of the potential N-glycosylation sites (21 out
of 28), their degree of occupancy, and their site-specific dominant N-glycan types (893 different N-glycopeptide
glycoforms were identified with a total of 128 unique glycan compositions).
Overall, the CEA N-glycoprofile follows the previously
reported cancer-associated patterns, while exhibiting its distinct
features: simultaneous increased bisection and branching, incomplete
galactosylation or (poly)LacNAc elongation on highly branched structures,
moderate levels of sialylation, and extremely high levels of fucosylation.
For a better understanding of CEA glycosylation pattern heterogeneity,
and to confirm our findings, an average glycosylation profile using
a glycomics approach could be explored in the foreseeable future.
Nonetheless, the N-glycome profile seems to be a
less promising source of biomarkers. This is especially due to the
low abundance of CEA and high abundance of other glycoproteins present
in complex biological samples that may skew the obtained N-glycome profiles of CEA, even for immunopurified samples. It should
be noted that this study presents an exploratory perspective on the N-glycosylation heterogeneity of CEA, and, due to the small
sample set, only initial observations about the biological differences
in glycosylation between primary and metastatic CRC can be made. These
findings should be validated on a larger set of samples to establish
novel biomarkers. As an example, the distal part of the CEA
sequence (sites N580, N650, N655 from the
tryptic digest, and site N466 from the Glu-C digest) shows
the most potential for discriminating tumor biological status (primary
colon carcinoma versus liver metastases) with regard to bisection,
antennarity, fucosylation, and sialylation traits. The presented multienzyme
sheathless CE-MS/MS bottom-up strategy shows potential to provide
important biological information on how N-glycosylation
may influence CEA processing in cancer biogenesis. Furthermore, this
approach may be successfully translated to the characterization of
other highly glycosylated and complex endogenous glycoprotein biomarkers
or glycoprotein biopharmaceuticals.