Multiple enzyme approach for the characterization of glycan modifications on the C-terminus of the intestinal MUC2 mucin.

Sjoerd van der Post, Kristina A Thomsson, Gunnar C Hansson.

Abstract

The polymeric mucin MUC2 constitutes the main structural component of the mucus that covers the colon epithelium. The protein's central mucin domain is highly O-glycosylated and binds water to provide lubrication and prevent dehydration, binds bacteria, and separates the bacteria from the epithelial cells. Glycosylation outside the mucin domain is suggested to be important for proper protein folding and protection against intestinal proteases. However, glycosylation of these regions of the MUC2 has not been extensively studied. A purified 250 kDa recombinant protein containing the last 981 amino acids of human MUC2 was produced in CHO-K1 cells. The protein was analyzed before and after PNGase F treatment, followed by in-gel digestion with trypsin, chymotrypsin, subtilisin, or Asp-N. Peptides were analyzed by nLC/MS/MS using a combination of CID, ETD, and HCD fragmentation. The multiple enzyme approach increased peptide coverage from 36% when only using trypsin, to 86%. Seventeen of the 18 N-glycan consensus sites were identified as glycosylated. Fifty-six N-glycopeptides covering 10 N-glycan sites, and 14 O-glycopeptides were sequenced and characterized. The presented method of protein digestion can be used to gain better insights into the density and complexity of glycosylation of complex glycoproteins such as mucins.

Keywords:  ETD; MUC2; N-glycosylation; O-glycosylation; chymotrypsin; mucus; subtilisin

Year:  2014        PMID: 25406038      PMCID: PMC4261943          DOI: 10.1021/pr500874f

Source DB:  PubMed          Journal:  J Proteome Res        ISSN: 1535-3893            Impact factor:   4.466


Introduction

It was for a long time assumed that mucus merely served as a simple lubricator and a physical barrier against the surroundings. Today it is clear that the proteins building mucus have very specific properties, with an important role for the O-glycans in harboring the commensal bacterial flora.[1,2] Mucus and its main component, MUC2 mucin, is secreted from the goblet cells that are scattered in between the intestinal enterocytes. The MUC2 protein (∼550 kDa) constitutes the major protein component of mucus and is characterized by two PTS-domains, rich in proline, serine, and threonine, that become densely O-glycosylated to form the extended mucin domains.[3] The attached glycans are estimated to increase the size of the molecule five times (∼2.5 MDa) and to provide a nutritional source for the gut flora as well as making the mucin resistant against proteolytic degradation by endogenous and bacterial proteases.[4,5] The N- and C-terminal parts of the MUC2 molecule are involved in di- and trimerization, enabling the formation of an enormous disulfide cross-linked net that is further polymerized to form the mucus gel upon secretion.[6−8] We have discovered that the inner stratified mucus layer in the colon is firmly attached to the epithelial cells and devoid of bacteria, as it acts as a physical barrier separating the bacteria from the underlying epithelial cells.[1] On the other hand, the commensal microbiota in the colon resides in the outer mucus layer, which has a more open structure that allows bacteria to enter. 
Results from mouse models have suggested that this process is controlled by proteases.[1] Both layers exist in germ-free mice, indicating that the transition is independent of bacterial proteolytic activities and must be due to endogenous proteases.[1] Although we have identified additional proteins that are secreted in conjunction with MUC2,[9] proteomics studies have so far not revealed any proteins other than MUC2 to be essential for the transition from the inner to the outer colonic mucus layer. It has therefore been proposed that the transition from the inner firm to the outer loose mucus layer involves proteolytic cleavages in the MUC2 molecule. The nature of the disulfide-bonded covalent network formed by MUC2 and our earlier studies clearly suggest that the transition from the inner to the outer mucus layer is accompanied by cleavages in the MUC2 C-terminal part.[1] Studies of such proteolytic processing events require detailed biochemical knowledge of the MUC2 protein backbone and its posttranslational modifications. We and others have characterized the O-glycosylation of the central PTS-domains of MUC2 from the various parts of the human gastrointestinal tract.[10−12] However, little is known about the glycosylation of the N- and C-terminal parts outside the central mucin domains of the human MUC2. Due to its size and complexity, MUC2 has proven to be an extremely challenging molecule to study. Therefore, we have generated recombinant proteins encompassing selected regions of the MUC2 molecule and have used these proteins to study the function of the individual parts of the MUC2 molecule.[6,13,14] The aim of the current project was to obtain as complete protein backbone sequence coverage as possible of a recombinant MUC2 C-terminal protein expressed in CHO-K1 cells. 
This region of the protein contains the last 981 amino acids of the human MUC2 with ∼10% cysteine residues, 18 N-glycosylation (N-X-S/T) consensus sites, and many potential O-glycosylation sites. Recent developments in the field of mass spectrometry, and applied methodologies for selective purification and analysis of glycosylated peptides, have resulted in an increasing number of studies where glycan modifications are both characterized and localized.[15−17] The adaptation of fragmentation techniques such as high-energy collisional dissociation (HCD) and electron-transfer dissociation (ETD) together with the introduction of high-resolution mass analyzers such as the Orbitrap hybrid mass spectrometer have facilitated elucidation of modified peptides.[18,19] The analysis of fragment ions with high mass accuracy permits detection of diagnostic glycan oxonium ions that are used for rapid identification of MS/MS spectra containing glycopeptides.[20] ETD has the advantage of fragmenting the peptide backbone while still retaining the modifications, and it has successfully been applied to the analysis of glycosylated peptides.[21] The analysis of glycopeptides from a complex protein sample requires selective enrichment steps or the release of the glycan moiety to facilitate analysis.[19,22] Alternative studies have used less specific proteases to digest glycoproteins, and by this they leave only short peptides containing glycans.[23,24] Here we aim to obtain as high protein backbone peptide coverage as possible of a purified protein in order to identify as many modified amino acids as possible. The main obstacle in this type of analysis is the limited number of tryptic cleavage sites in the protein backbone of the MUC2, resulting in poor sequence coverage. To overcome this problem, we explored the possibility to use multiple proteolytic enzymes in combination with ETD and HCD fragmentation. 
In this way we obtained near-complete sequence coverage of the MUC2 C-terminus (MUC2-C, amino acids 4,198–5,179 of the human MUC2) and were able to characterize its posttranslational modifications.
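
As an illustration of how the N-X-S/T consensus sites mentioned above can be located in a sequence, the following minimal Python sketch scans a peptide string for sequons. The function name `find_sequons` and the `offset` parameter are hypothetical, and excluding proline at position X is the usual convention rather than something stated in this paper.

```python
import re

# Lookahead so that overlapping motifs (for example in "NNTS") are all reported.
# Assumption: X may be any residue except proline (common convention).
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def find_sequons(sequence, offset=1):
    """Return (position, motif) for each N-X-S/T sequon.

    `offset` is the residue number of the first character of `sequence`.
    """
    return [(m.start() + offset, m.group(1)) for m in SEQUON.finditer(sequence)]

# Tryptic peptide 239-247 from Table 1; the sequon starts at Asn240 (site N6).
print(find_sequons("YNNTVEIVK", offset=239))  # [(240, 'NNT')]
```

Applied to the full construct sequence, a scan of this kind would be expected to list the 18 consensus sites named N1 to N18.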

Experimental Procedures

Materials

HPLC grade acetonitrile and analytical grade formic acid and trifluoroacetic acid were obtained from Sigma-Aldrich. Trypsin of proteomics grade (Promega), α-chymotrypsin (bovine pancreas), and subtilisin (protease from Bacillus licheniformis type VIII) were obtained from Sigma-Aldrich. Sequencing grade endoproteinase Asp-N and N-glycosidase F (PNGase F) were purchased from Roche Applied Science.

Sample preparations

Expression and purification of the recombinant C-terminus of MUC2 were performed as previously described.[13] Briefly, CHO-K1 cells were transfected with a plasmid encoding the C-terminal region (bases 12622–15708) of the human MUC2 with the addition of a Myc-tag and GFP, downstream of the immunoglobulin κ-chain signal sequence. Expressing clones were expanded and cultured in Iscove’s modified Dulbecco’s medium (Gibco), and medium was harvested every 3–4 days. Collected supernatant was filtered through 100 kDa cutoff filters (YM-100; Millipore) to remove smaller components and for buffer exchange to 50 mM Tris/HCl buffer (pH 8.0). MUC2-C was purified from the supernatant using anion exchange chromatography (MonoQ HR, 5/5, GE Healthcare), followed by gel filtration (Superose 6 HR 10/30, GE Healthcare) of the pooled MUC2-C containing fractions. Aliquots of the purified protein were reduced in 15 mM DTT for 15 min at 95 °C and then for 2 h at 37 °C, followed by alkylation in 40 mM iodoacetamide in the dark for 30 min. Protein aliquots used for de-N-glycosylation were dialyzed on 10,000 MW cutoff Slide-a-Lyzer cups (Millipore) against a 0.1 M Tris-HCl buffer pH 7.5, followed by addition of PNGase F, and left at 37 °C overnight. The samples were concentrated under vacuum using a SpeedVac, and 1 μg of protein per condition was analyzed by SDS-PAGE on 8% gels.[25] The gels were developed using Imperial protein stain (Pierce), and the major band migrating at 250 kDa corresponding to the construct was excised and digested with the following enzymes: (I) endoproteinase Asp-N (1:50) in 25 mM Tris/HCl pH 7.8, overnight at 37 °C, (II) trypsin (1:100) in 50 mM NH4HCO3, overnight at 37 °C, (III) chymotrypsin (1:100) in 50 mM NH4HCO3, overnight at 30 °C, (IV) subtilisin in 25 mM NH4HCO3, 2 h at 20 °C. 
The digestion reactions were quenched by the addition of 25 μL of 0.1% formic acid, and peptides were extracted from the gel pieces by the addition of 30 μL of 50% acetonitrile in water (twice). Extracted peptides were dried under vacuum and redissolved in 15 μL of 0.1% trifluoroacetic acid prior to analysis.

Mass spectrometry analysis

The protein digests were analyzed by online nLC-MS/MS using an Orbitrap XL mass spectrometer (Thermo Scientific) with the same configuration as described previously.[14] A C18 stationary phase Reprosil-Pur 120 C18-AQ, 3 μm (Dr. Maisch GmbH, Germany) was used for both the pre- and analytical columns (2 cm × 100 μm i.d. and 20 cm × 50 μm i.d., respectively). The gradient was run at 5–50% B (A: 0.2% formic acid, B: acetonitrile) over 100 min at a flow rate of ∼250 nL/min. Data were acquired in a data-dependent mode automatically switching between full scan (m/z 350–1600) and MS/MS acquisition. Full scans were collected at a resolution of 60,000 at m/z 400 with the lock mass option enabled for real time calibration (m/z 371.101). The HCD scans for the glycoproteomics analyses were performed on three of the five most abundant ions from the full scan, with a resolution of 7500, a maximum injection time of 250 ms, an AGC (automatic gain control) target of 100,000, an isolation width of 4 Da, and a normalized collision energy of 30%. For ETD, AGC target values were set to 30,000 and 200,000 for the fluoranthene anions and the precursors, respectively. The reaction time was set to 100 ms, and supplemental activation was enabled. Raw files from analyses of the different enzyme digests were converted to peaklists using extract_msn.exe (version 2, Thermo Scientific), and database searches were performed using Mascot (version 2.3, Matrix Science). For all samples, carbamidomethyl (C) was set as a fixed modification, and oxidation (M) and deamidation (NQ) were set as variable modifications. The peptide tolerance was ±5 ppm and the fragment ion tolerance ±100 ppm; peptide charges of 2+, 3+, and 4+ were considered, and the Mascot peptide ion score cutoff was set to 20. For the identification of copurified proteins the data were searched against all rodent entries in Swiss-Prot (v 2013_7, entries 26,115) combined with a list of known contaminants. At least two unique peptides were required for positive identification. 
The enzyme settings were Asp-N (cleavage N-terminal of D and E), chymotrypsin semispecific (F, L, Y, and W), and “none” for subtilisin, so that no restrictions were placed on its cleavage sites. The tryptic and Asp-N digests were searched against the database, allowing for one missed cleavage, and the chymotryptic and subtilisin digests were searched against the recombinant MUC2 C-terminal construct sequence concatenated to an in-house database containing compiled mucin sequences (http://www.medkem.gu.se/mucinbiology). Manual selection and interpretation of HCD spectra of N- and O-glycopeptides were based on the detection of diagnostic oxonium ions (m/z 204.08 (HexNAc), 292.09 (Neu5Ac), 366.11 (HexHexNAc), and 657.24 (HexHexNAcNeu5Ac)) using the Xcalibur browser (Thermo Scientific). The FindPept (for nonspecifically cleaved peptides) and GlycoMod tools (http://www.expasy.org) were used for identification of potential peptide candidates in the MUC2-C sequence, and all interpreted glycopeptides were identified within 5 ppm accuracy.
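
The screening for diagnostic oxonium ions was done manually in the Xcalibur browser; purely as an illustration, the same check could be sketched in Python as below. The m/z values are those listed above, while the function name `diagnostic_ions`, the peak-list format, and the 0.02 Da tolerance are assumptions made for this example.

```python
# Diagnostic oxonium ions and m/z values as listed in the text.
OXONIUM_IONS = {
    "HexNAc": 204.08,
    "Neu5Ac": 292.09,
    "HexHexNAc": 366.11,
    "HexHexNAcNeu5Ac": 657.24,
}

def diagnostic_ions(peaks, tol=0.02, min_intensity=0.0):
    """Return the names of oxonium ions present in an MS/MS peak list.

    peaks: iterable of (mz, intensity) tuples.
    tol: absolute m/z tolerance in Da (assumed value, not from the paper).
    """
    peaks = list(peaks)
    found = []
    for name, target in OXONIUM_IONS.items():
        if any(abs(mz - target) <= tol and inten > min_intensity
               for mz, inten in peaks):
            found.append(name)
    return found

# Toy spectrum (invented values) containing two of the four diagnostic ions.
spectrum = [(204.087, 1.0e5), (366.12, 5.0e4), (500.0, 1.0e3)]
print(diagnostic_ions(spectrum))  # ['HexNAc', 'HexHexNAc']
```

A spectrum returning at least one hit would then be a candidate for manual interpretation.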

Results

Sequence coverage of MUC2-C after de-N-glycosylation

The MUC2-C recombinant protein is made up of 1,300 amino acids containing a signal sequence followed by a MycTag, a GFP protein, and the last 981 amino acids of the human MUC2 protein[13] (Figure 1A). Purified, reduced, and alkylated protein was analyzed by gel electrophoresis, and the fully glycosylated protein migrated at approximately 250 kDa. The expected mass of the protein is 187 kDa, indicating a large number of posttranslational modifications. The band was subjected to in-gel digestion with trypsin and analyzed by nLC-MS/MS. Database searches confirmed that the MUC2-C was the major component of the excised band. Twelve additional proteins were also detected and are compiled in Supporting Information Table S1. The tryptic peptide coverage of the MUC2-C (excluding the Myc and GFP tags) was 36%. None of the N-glycan motifs was found deamidated prior to PNGase F digestion, suggesting that all were modified by N-glycans; two asparagine containing peptides that were not part of a motif were found deamidated although only a minor fraction compared to the unmodified peptide, and likely introduced during sample preparation.
Figure 1

Overview of the MUC2-C sequence and the obtained sequence coverage. (A) The full MUC2 protein with the C-terminal highlighted. The recombinant protein used in this study is composed of the last 981 amino acids with the addition of a Myc-tag and GFP protein at the N-terminal. (B) Total peptide coverage (shaded in gray) from the combined MS analyses of tryptic, chymotryptic, subtilisin, or Asp-N digests. The GFP and Myc-tags are excluded from the figure. N-Glycosylation consensus sites (N-X-S/T) are underlined and in bold and named N1 to N18. Identified O-glycosylation sites (S/T) are highlighted in bold. Regions covered by peptides identified in the individual enzymatic digests are depicted below and shaded in color.

The sample was then deglycosylated with PNGase F and reanalyzed. Deglycosylation increased the peptide coverage of MUC2-C to 39%, and the obtained tryptic peptides from the de-N-glycosylated MUC2-C are compiled in Supporting Information Table S2. By this approach, all peptides in the size range between 7 and 16 amino acids expected from a tryptic digest of MUC2-C were observed, with the exception of three peptides (Figure 1B). To increase the peptide coverage, we evaluated other enzymes for protein digestion (1 μg protein). Asp-N, subtilisin, and chymotryptic digestions of the deglycosylated recombinant protein were analyzed by mass spectrometry, and the results are shown in Figure 1B and compiled in Supporting Information Tables S3, S4, and S5, respectively. The majority of the peptides from the chymotryptic digest were formed by nonspecific cleavages other than C-terminal of the amino acids typical for chymotrypsin (F, L, Y, and W). The peptide coverage was therefore considerably improved when searches were set to semispecific digestion for the peptide identification. The chymotryptic and subtilisin digests were searched against a limited database compiled in-house containing only mucin sequences including the recombinant MUC2-C protein. 
In total, 101 and 67 unique peptides were detected in the subtilisin and chymotryptic digests, respectively, covering 67% and 53% of the MUC2-C sequence. As expected, subtilisin was found to cleave nonspecifically after large uncharged amino acids, and many of these subtilisin derived peptides were overlapping on limited stretches in the protein backbone. Most of the chymotryptic cleavage sites were found C-terminally of large hydrophobic residues, as previously reported in the literature;[26] however, we also observed that cleavages with both subtilisin and chymotrypsin frequently occurred C-terminally of iodoacetamide alkylated cysteines. Chymotrypsin and subtilisin were found to provide both overlapping peptides and complementary regions. The enzyme Asp-N cleaves N-terminally of aspartic acid (D) and, at a limited rate, glutamic acid (E). Only 23 peptides were detected in the Asp-N digest, mostly in areas already covered by the other enzymes. The poor sequence coverage can be explained by the low number and poor distribution of aspartic acids in the MUC2-C sequence, resulting in a limited number of peptides suitable for mass spectrometry detection. When combining the results from the four enzymes, the total peptide coverage of the MUC2-C was increased to 86% (Figure 1B). The remaining noncovered regions were in the N-terminal part of MUC2-C (amino acids 1–58 and 180–210, Figure 1A). These two regions are rich in the amino acids proline, serine, and threonine and as such are expected to be highly O-glycosylated, and potential proteolytic cleavage sites are likely shielded by glycan moieties.
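
To make the notion of combined peptide coverage concrete, the following minimal Python sketch computes the fraction of a sequence covered by the union of peptides from several digests. The function name `sequence_coverage` and the toy peptide positions are hypothetical and are not taken from the Supporting Information tables.

```python
def sequence_coverage(protein_length, peptides):
    """Fraction of residues covered by at least one peptide.

    peptides: iterable of (start, end) positions, 1-based and inclusive.
    Overlapping peptides are counted once.
    """
    covered = [False] * protein_length
    for start, end in peptides:
        for pos in range(max(start, 1), min(end, protein_length) + 1):
            covered[pos - 1] = True
    return sum(covered) / protein_length

# Toy example: two digests of a 40-residue sequence with partial overlap.
digest_a = [(1, 10)]
digest_b = [(5, 20)]
print(sequence_coverage(40, digest_a))             # 0.25
print(sequence_coverage(40, digest_a + digest_b))  # 0.5
```

Because overlapping peptides are counted only once, adding a digest raises the coverage only where it reaches regions the other enzymes missed, which is the situation described above for Asp-N.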

Site-specific N-glycosylation characterization from HCD fragmentation spectra

Seventeen of the 18 potential N-glycosylation sites in MUC2-C were detected as deamidated peptides after PNGase F treatment in the tryptic, chymotryptic, and subtilisin digests, and as such assumed to be modified. All the 17 sites appeared to be glycosylated, and the only site with a partial occupancy was found at N5 (Figure 1B). For this site a nonmodified peptide was detected in the Asp-N digest and a deamidated peptide in the chymotryptic digest (Supporting Information Tables S3 and S5). To confirm the presence of N-glycans on the MUC2-C, protein digests were analyzed with LC-MS/MS using HCD fragmentation with orbitrap detection. HCD spectra corresponding to glycopeptides were initially selected for manual interpretation by screening for diagnostic ions in the lower mass range at m/z 204 (HexNAc), 366 (HexHexNAc), and m/z 657 (HexHexNAcNeuAc). In total, 56 N-glycopeptides were identified from the tryptic, subtilisin, and chymotryptic digests of the MUC2-C, including 38 glycoforms at 10 of the 18 potential N-glycosylation consensus sites. No N-glycopeptides were detected in the Asp-N digest. Results from the site-specific N-glycosylation survey are shown in Table 1.
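Table 1

N-Glycopeptides Identified in the Different Enzymatic Digests of MUC2-C Analyzed by LC/MS/MS

| Start | End | Sequence | Site | Enzyme | MW calcd(a), glycan | MW calcd(a), peptide | Glycopep. obsd(b), [M + 2H]2+ | Glycopep. obsd(b), [M + 3H]3+ | Δppm | NeuAc(c) | Fuc(c) | Hex(c) | HexNAc(c) | Rel. amt(d) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 239 | 247 | YNNTVEIVK | 240N6 | TRY | 2352.85 | 1078.56 | | 1144.81 | 1.9 | 0 | 0 | 7 | 6 | ++ |
| | | | | TRY | 2643.94 | 1078.56 | | 1241.84 | 0.8 | 1 | 0 | 7 | 6 | ++ |
| | | | | TRY | 2790 | 1078.56 | | 1290.53 | 0.7 | 1 | 1 | 7 | 6 | + |
| | | | | TRY | 2935.04 | 1078.56 | | 1338.87 | 0.6 | 2 | 0 | 7 | 6 | ++ |
| | | | | TRY | 3226.13 | 1078.56 | | 1435.91 | 0.4 | 3 | 0 | 7 | 6 | ++ |
| 416 | 421 | FGNNTK | 418N8 | TRY | 1768.64 | 679.33 | 1224.99 | | 0.3 | 0 | 1 | 5 | 4 | + |
| | | | | TRY | 2133.77 | 679.33 | 1408.06 | 938.71 | –1.7 | 0 | 1 | 6 | 5 | + |
| | | | | TRY | 2424.87 | 679.33 | | 1035.74 | 0.2 | 1 | 1 | 6 | 5 | ++ |
| | | | | TRY | 2498.9 | 679.33 | | 1060.42 | –0.8 | 0 | 1 | 7 | 6 | + |
| | | | | TRY | 2715.96 | 679.33 | | 1132.77 | 0 | 2 | 1 | 6 | 5 | ++ |
| | | | | TRY | 2790 | 679.33 | | 1157.45 | –0.9 | 1 | 1 | 7 | 6 | ++ |
| | | | | TRY | 3081.09 | 679.33 | | 1254.48 | –0.2 | 2 | 1 | 7 | 6 | ++ |
| 428 | 442 | TNTTSDDCILPSGEI | 429N9 | SUB | 1622.58 | 1621.71 | | 1082.44 | –2.7 | 0 | 0 | 5 | 4 | + |
| | | | | SUB | 1913.68 | 1621.71 | | 1179.47 | –1 | 1 | 0 | 5 | 4 | ++ |
| | | | | SUB | 2204.77 | 1621.71 | | 1276.51 | –2.6 | 2 | 0 | 5 | 4 | ++ |
| 679 | 685 | DTCCNIT | 683N12 | SUB | 1216.42 | 882.32 | 1050.38 | | –0.9 | 0 | 0 | 5 | 2 | + |
| | | | | SUB | 1257.45 | 882.32 | 1070.89 | | –0.6 | 0 | 0 | 4 | 3 | + |
| | | | | SUB | 1403.51 | 882.32 | 1143.92 | | –1.5 | 0 | 1 | 4 | 3 | ++ |
| | | | | SUB | 1606.59 | 882.32 | 1245.46 | 830.65 | 0 | 0 | 1 | 4 | 4 | +++ |
| | | | | SUB | 1768.64 | 882.32 | 1326.49 | 884.66 | –2 | 0 | 1 | 5 | 4 | +++ |
| 689 | 695 | CNTSLCK | 690N13 | TRY | 1257.45 | 881.37 | 1070.42 | | –0.1 | 0 | 0 | 4 | 3 | + |
| | | | | TRY | 1622.58 | 881.37 | 1252.98 | 835.66 | 1.5 | 0 | 0 | 5 | 4 | + |
| | | | | TRY | 1768.64 | 881.37 | 1326.01 | 884.35 | 1.4 | 0 | 1 | 5 | 4 | ++ |
| | | | | TRY | 1913.68 | 881.37 | | 932.69 | –0.2 | 1 | 0 | 5 | 4 | + |
| | | | | TRY | 2059.73 | 881.37 | | 981.38 | –0.2 | 1 | 1 | 5 | 4 | + |
| 688 | 693 | KCNTSL or NTSLCK | 690N13 | CHY | 1216.42 | 721.34 | 969.89 | | –1 | 0 | 0 | 5 | 2 | ++ |
| | | | | CHY | 1622.58 | 721.34 | 1172.97 | | –1.3 | 0 | 0 | 5 | 4 | ++ |
| | | | | CHY | 1768.64 | 721.34 | 1246 | | –0.5 | 0 | 1 | 5 | 4 | ++ |
| 754 | 761 | KVDNNTLL | 757N14 | SUB | 1768.64 | 915.50 | 1343.08 | 895.72 | –0.7 | 0 | 1 | 5 | 4 | ++ |
| | | | | SUB | 2059.73 | 915.50 | 1489 | 992.75 | 0.2 | 1 | 1 | 5 | 4 | ++ |
| | | | | SUB | 2424.87 | 915.50 | | 1114.47 | –1 | 1 | 1 | 6 | 5 | ++ |
| | | | | SUB | 2790 | 915.50 | | 1236.18 | –1.1 | 1 | 1 | 7 | 6 | + |
| 743 | 761 | SSKCQDCVCTDKVDNNTLL | 757N14 | CHY | 1768.64 | 2255.98 | | 1342.55 | –0.2 | 0 | 1 | 5 | 4 | ++ |
| | | | | CHY | 2059.73 | 2255.98 | | 1439.58 | 0.3 | 1 | 1 | 5 | 4 | ++ |
| | | | | CHY | 2350.83 | 2255.98 | | 1536.61 | 0.2 | 2 | 1 | 5 | 4 | ++ |
| 767 | 774 | THVPCNTS | 772N15 | SUB | 1403.51 | 914.39 | 1159.96 | | –0.2 | 0 | 1 | 4 | 3 | + |
| | | | | SUB | 1768.64 | 914.39 | 1342.52 | 895.35 | –1.1 | 0 | 1 | 5 | 4 | ++ |
| | | | | SUB | 2059.73 | 914.39 | 1488.07 | 992.38 | –1.2 | 1 | 1 | 5 | 4 | +++ |
| | | | | SUB | 2350.83 | 914.39 | | 1089.42 | –0.3 | 2 | 1 | 5 | 4 | ++ |
| | | | | SUB | 2424.87 | 914.39 | | 1114.09 | –0.4 | 1 | 1 | 6 | 5 | + |
| 820 | 829 | NNCTFFSCVK | 821N16 | TRY | 1768.64 | 1275.54 | | 1015.73 | –1 | 0 | 1 | 5 | 4 | ++ |
| | | | | TRY | 2059.73 | 1275.54 | | 1112.77 | –1.1 | 1 | 1 | 5 | 4 | ++ |
| | | | | TRY | 2350.83 | 1275.54 | | 1209.8 | –0.3 | 2 | 1 | 5 | 4 | + |
| 816 | 829 | SDPKNNCTFFSCVK | 821N16 | TRY | 1768.64 | 1702.74 | | 1158.14 | –2.3 | 0 | 1 | 5 | 4 | ++ |
| | | | | TRY | 2059.73 | 1702.74 | | 1255.17 | 0.1 | 1 | 1 | 5 | 4 | ++ |
| 813 | 825 | DFKSDPKNNCTFF | 821N16 | SUB | 2059.73 | 1618.71 | | 1227.16 | –0.7 | 1 | 1 | 5 | 4 | +++ |
| | | | | SUB | 2350.83 | 1618.71 | | 1324.19 | –2.3 | 2 | 1 | 5 | 4 | + |
| 814 | 824 | FKSDPKNNCTF | 821N16 | CHY | 1768.64 | 1356.61 | | 1042.76 | –1.9 | 0 | 1 | 5 | 4 | ++ |
| | | or KSDPKNNCTFF | | CHY | 2059.73 | 1357.62; 1356.61 | | 1139.79 | –1 | 1 | 1 | 5 | 4 | +++ |
| | | | | CHY | 2350.83 | 1356.61 | | 1236.82 | –1.1 | 2 | 1 | 5 | 4 | + |
| 838 | 845 | VSNITCPN | 840N17 | SUB | 1216.42 | 903.41 | 1060.93 | | –1.4 | 0 | 0 | 5 | 2 | |
| 837 | 845 | SVSNITCPN(e) | 840N17 | SUB | 1216.42 | 990.44 | 1104.44 | | –1.3 | 0 | 0 | 5 | 2 | |
| 866 | 878 | TCTPRNETRVPCS | 871N18 | SUB | 2059.73 | 1576.71 | | 1213.16 | 0.1 | 1 | 1 | 5 | 4 | ++ |
| | | or CTPRNETRVPCST | | SUB | 2350.83 | 1577.72; 1576.71 | | 1310.19 | –0.8 | 2 | 1 | 5 | 4 | ++ |
| | | | | SUB | 2424.87 | 1576.71 | | 1334.87 | –1.6 | 1 | 1 | 6 | 5 | + |

(a) Theoretical peptide and glycan mass.

(b) Glycopeptide mass measured in the orbitrap (details are described in Methods section).

(c) NeuAc = N-acetylneuraminic acid; Hex = (Man and Gal); HexNAc = N-acetylhexosamine (GlcNAc); Fuc = Fucose.

(d) Relative amounts are estimated by comparing peak intensities within the group of closely eluting glycoforms of the same peptide.

(e) Coeluting precursor ions with composition GlcNAc2, GlcNAc2Man1–5 were also detected, but may be caused by in-source fragmentation, since they are not biologically relevant.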

The identified N-glycopeptides were modified by a broad variety of high mannose-, hybrid-, and complex type structures with up to seven different glycoforms per site. This is in line with previous analyses of wild type CHO cell N-glycosylation showing a broad heterogeneity.[27] Four N-glycosylation sites were covered by tryptic glycopeptides, three with chymotrypsin, and seven with subtilisin. In cases when glycopeptides covering the same N-glycosylation site were found in more than one enzymatic digest, the same N-glycan composition was generally found, together with other less abundant structures (Table 1, e.g. 690N13, 757N14, and 821N16). De-N-glycosylation by PNGase F allowed subtilisin to cleave on the N-terminal side of the deamidated asparagine at 224N5, 429N9, and 772N15. The majority of the glycopeptides eluted early on the reverse phase column, prior to the respective nonmodified peptides. However, glycopeptides from the subtilisin and chymotryptic digests covering site 757N14 eluted after the majority of the nonmodified peptides, and the glycopeptide covering site 821N16 identified in all three enzymatic digests eluted even later. These long retention times reflect the hydrophobic nature of the peptides. Interestingly, the deglycosylated peptides eluted shortly before the glycosylated versions, suggesting that the surface accessibility was affected by removal of the glycans. 
Examination of the full scans in areas where glycopeptides eluted revealed that all major glycopeptides were selected for fragmentation despite the narrow elution time windows. Approximately 20% of the HCD spectra containing glycopeptide oxonium ions could not be interpreted due to weak fragment ions. This is likely due to the low abundance of the precursor and the fact that different glycopeptides require different collision energies to generate a sufficient number of fragment ions, as has been reported to be essential for HCD.[20] The observed precursor ions were doubly or triply charged [M + 2H]2+ or [M + 3H]3+ ions; however, glycopeptides above approximately 2500 Da were largely triply charged independently of the size of the peptide. The peptide size varied from 6 up to 19 amino acids. The HCD spectra were similar whether generated from tryptic or from nontryptic glycopeptides, but occasionally chymotryptic and subtilisin digests revealed fragments from the peptide backbone.
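
The calculated masses and observed m/z values in Table 1 are related by simple arithmetic, which the following minimal Python sketch illustrates. The monoisotopic residue masses are standard values; the function names `glycan_mass` and `glycopeptide_mz` are hypothetical and do not describe the software used in this study.

```python
# Monoisotopic masses (Da) of monosaccharide residues and of the proton.
MONO = {"NeuAc": 291.0954, "Fuc": 146.0579, "Hex": 162.0528, "HexNAc": 203.0794}
PROTON = 1.00728

def glycan_mass(neuac, fuc, hexose, hexnac):
    """Mass added to the peptide by a glycan of the given composition."""
    return (neuac * MONO["NeuAc"] + fuc * MONO["Fuc"]
            + hexose * MONO["Hex"] + hexnac * MONO["HexNAc"])

def glycopeptide_mz(peptide_mass, glycan, charge):
    """m/z of the [M + zH]z+ ion of a glycopeptide."""
    return (peptide_mass + glycan + charge * PROTON) / charge

# Table 1, peptide mass 1356.61 (site 821N16) carrying Hex5HexNAc4Fuc1.
glycan = glycan_mass(neuac=0, fuc=1, hexose=5, hexnac=4)
print(round(glycan, 2))                               # 1768.64
print(round(glycopeptide_mz(1356.61, glycan, 3), 2))  # 1042.76
```

Both values agree with the corresponding entries in Table 1 (glycan 1768.64, [M + 3H]3+ 1042.76).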

HCD fragmentation of a core fucosylated N-glycopeptide 815KSDPKNNCTFF from the chymotryptic digest of MUC2-C

The HCD spectrum of an N-glycopeptide at m/z 1042.76 [M + 3H]3+ from a chymotryptic digest is shown in Figure 2. The lower region of the HCD spectrum contains the characteristic diagnostic glycan oxonium b-ions at m/z 274/292, 366, 528, 657, and 893, which confirm the presence of a glycopeptide. However, in order to interpret the HCD spectrum, the main challenge is to identify the series of monoprotonated or doubly protonated y1 ions, which are abundant during HCD fragmentation and contain the peptide and the first GlcNAc residue.[20] In this spectrum, the presence of fragment ions from the sequential loss of residues from both the fucosylated core and the nonfucosylated glycopeptides made it difficult to identify this ion. However, the smallest doubly charged ions in the spectrum at m/z 780.85 [M + 2H]2+ and m/z 853.88 [M + 2H]2+ differed by 73.03 m/z units, that is, by a mass of 146 Da, which corresponds to a Fuc residue. These ions correspond to the y1 ions of a core fucosylated N-glycan [peptide + GlcNAc + 2H+] and [peptide + GlcNAcFuc + 2H+], respectively. Subtracting the proposed N-glycan mass [Hex5HexNAc4Fuc1] = 1768.639 from the precursor ion mass measured in the full scan resulted in two candidates within 10 ppm, corresponding to the peptide sequences 814FKSDPKNNCTF and 815KSDPKNNCTFF (Table 1), at Δppm = 0.3.
Figure 2

HCD fragmentation spectrum of the N-glycopeptide from a chymotryptic digest of MUC2-C detected at m/z 1042.76 [M + 3H]3+, interpreted to have the sequence 815KSDPKNNCTFF (site 821N16, Figure 1B) and the N-glycan with the composition (Hex5HexNAc4Fuc). The mass chromatograms from LC/MS of the three N-glycopeptides with the same sequence are inserted.
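
The assignment of the two doubly charged y1 ions can be checked with a short calculation. The sketch below is only an illustration: it assumes standard monoisotopic masses and the calculated peptide mass of 1356.61 from Table 1, and the function name `y1_mz` is hypothetical.

```python
PROTON = 1.00728   # mass of a proton (Da)
HEXNAC = 203.0794  # monoisotopic residue mass of HexNAc (Da)
FUC = 146.0579     # monoisotopic residue mass of fucose (Da)

def y1_mz(peptide_mass, charge=2, core_fucose=False):
    """m/z of the y1 ion: peptide + core GlcNAc, optionally with core Fuc."""
    mass = peptide_mass + HEXNAC + (FUC if core_fucose else 0.0)
    return (mass + charge * PROTON) / charge

peptide = 1356.61  # calculated peptide mass from Table 1 (site 821N16)
y1 = y1_mz(peptide)
y1_fuc = y1_mz(peptide, core_fucose=True)
print(round(y1, 2), round(y1_fuc, 2))  # 780.85 853.88
print(round((y1_fuc - y1) * 2, 2))     # 146.06, the mass of one Fuc residue
```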

HCD fragmentation of glycopeptides 767THVPCNTS and 679DTCCNIT from the subtilisin digest of MUC2-C

The pooled mass chromatograms of five glycoforms which showed similar HCD spectra are found in Figure 3A. The base peak chromatograms as well as the full scans between 14.5 and 15.0 min show that these closely eluting peaks belong to different glycoforms of the same peptide (Figure 3B). The HCD spectra of the doubly charged precursor ion of the nonsialylated N-glycan at m/z 1342.52 [M + 2H]2+ and the triply charged precursor ion of the monosialylated N-glycan at m/z 992.38 [M + 3H]3+ are shown in Figure 3C and D, respectively. As in the previous example, a mixture of fragment ions originating from sequential loss of monosaccharide residues from glycopeptides with and without core fucosylation was observed. The fragment ions at m/z 1118.5 and 1264.5 correspond to the peptide with the core GlcNAc and the corresponding fucosylated variant, allowing identification of the peptide as 767THVPCNTS (site 772N15, Figure 1B). In the HCD spectra of the nonsialylated glycopeptide, additional fragment ions were found at m/z 915.39 and m/z 781.29, which were interpreted as the nonglycosylated peptide and the peptide + GlcNAc that had lost the N-terminal amino acids Thr-His-Val, respectively (Figure 3C). Fragment ions from cleavages in the peptide backbone were consistently found in all the spectra of doubly charged precursor ions of these glycoforms but not in the triply charged ones.
Figure 3

Pooled mass chromatograms of five N-glycopeptides labeled Gp 1–5 interpreted as glycoforms of the peptide 767THVPCNTS (site 772N15, Figure 1) from a subtilisin digest of MUC2-C (A) and pooled full scans collected between 14.5 and 15 min (B). Detected precursor ions and retention times of Gp1–5 are inserted in B. HCD fragmentation spectra of Gp 2 detected at m/z 1342.52 [M + 2H]2+ and Gp 3 detected at m/z 992.38 [M + 3H]3+ are shown in C and D, respectively.

Figure 4

HCD fragmentation spectra of N-glycopeptides from a subtilisin digest of MUC2-C interpreted to have the sequence 679DTCCNIT (site 683N12, Figure 1B) and the monosaccharide compositions [Hex5HexNAc2] detected at m/z 1050.38 [M + 2H]2+ (A) and (Hex5 HexNAc4Fuc) at m/z 884.66 [M + 3H]3+ (B).

We detected five glycoforms of a peptide eluting at 16.2–16.7 min that was interpreted as 679DTCCNIT (site 683N12, Figure 1B), of which two were of a less abundant high mannose/hybrid type and three of a complex type (Table 1). The HCD spectrum of the high mannose type (GlcNAc2Man5) glycoform detected at m/z 1050.38 [M + 2H]2+ is shown in Figure 4A. The four fragment ions at m/z 764.23, 883.32, 967.34, and 1086.40 were present in all HCD spectra of the respective glycoforms. The ions at m/z 883.32 and 1086.40 were assigned to the protonated peptide and the y1 ion (peptide + GlcNAc), and the m/z 764.23 and 967.34 ions were assigned to the same fragments after loss of the C-terminal threonine. Additional fragment ions at m/z 866.30 and 747.24 were interpreted as secondary and tertiary fragment ions formed after loss of the glycan, ammonia (17.02), and threonine from the peptide. A fragment ion at m/z 651.18 was interpreted as being formed from loss of the N-glycan and Ile-Thr from the C-terminal end, supporting the interpretation of the peptide sequence. A second glycoform on the same peptide was found at m/z 884.66 [M + 3H]3+ with the glycan composition [Hex5HexNAc4Fuc] as shown in Figure 4B.

Site-specific O-glycosylation of MUC2-C digested with Asp-N, trypsin, chymotrypsin, or subtilisin

Spectra of 15 O-glycopeptides from four different sites were elucidated from HCD experiments of enzymatic digests of the MUC2-C, and the results are compiled in Supporting Information Table S6. Four additional O-glycopeptides from the GFP/Myc-tag regions were detected, but these are not discussed further. Based on the sequence, we would expect to identify more O-glycosylated sites. However, the first 120 amino acids of the recombinant MUC2-C terminal are almost solely composed of serine and threonine residues with the potential to become heavily O-glycosylated; hence, no peptides were observed due to the lack of proteolytic cleavage sites and potential glycan density. O-Glycopeptides were distinguished from N-glycopeptides by a simpler fragmentation pattern, and all the observed O-glycans were core 1 type O-glycans with 1–4 sugar residues, as previously reported for CHO-K1 cells.[27,28] O-Glycan sites were only identified in three areas in the peptide backbone: between amino acids Ser118 and Ser132, on Thr483 or Thr484, and between Thr965 and Ser968 (Figure 1B). The HCD spectrum of the highly abundant ion at m/z 872.92 [M + 2H]2+ corresponded to an O-glycopeptide carrying a trisaccharide from the chymotryptic digest (Figure 5). The spectrum was dominated by three large fragment ions corresponding to sequential loss of the monosaccharide residues NeuAc, NeuAc + Hex, and NeuAc + Hex + HexNAc from the peptide, and the diagnostic ions at m/z 274, 292, and 366. The proposed peptide mass observed at m/z 1088.60 [M + H]+ matched the sequence 126GLRPYPSSVL in MUC2-C (Δppm = 1). In addition, fragment ions at m/z 771.41, 858.44, and 957.51 confirmed the proposed peptide sequence and correspond to the peptide after loss of SVL, VL, and L, respectively, from the C-terminal end. The ion at m/z 587.33 corresponds to the N-terminal amino acids 126GLRPY.
As the assigned peptide contained two potentially glycosylated Ser residues, we reanalyzed the chymotryptic material using ETD fragmentation, which made it possible to assign Ser132 (the first of the two adjacent serines in 126GLRPYPSSVL) as the glycosylation site (Supporting Information Table S6).
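The peptide assignment above can be reproduced from residue masses. The sketch below is illustrative only: the names are hypothetical, the amino acid and monosaccharide masses are standard monoisotopic values not given in the text, and treating the reported fragments as b-type ions is an inference from their masses rather than a statement from the source.

```python
# Standard monoisotopic residue masses (Da) for the amino acids in GLRPYPSSVL.
RESIDUE_MASS = {
    "G": 57.02146, "L": 113.08406, "R": 156.10111, "P": 97.05276,
    "Y": 163.06333, "S": 87.03203, "V": 99.06841,
}
WATER = 18.01056
PROTON = 1.00728
# HexHexNAcNeuAc trisaccharide (sum of standard residue masses).
TRISACCHARIDE = 162.05282 + 203.07937 + 291.09542


def peptide_mh(sequence):
    """Monoisotopic [M + H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + PROTON


def b_ions(sequence):
    """Singly charged b-ion m/z values keyed by ion number (b1 .. b(n-1))."""
    ions = {}
    running = 0.0
    for number, aa in enumerate(sequence[:-1], start=1):
        running += RESIDUE_MASS[aa]
        ions[number] = running + PROTON
    return ions


PEPTIDE = "GLRPYPSSVL"
mh = peptide_mh(PEPTIDE)
b = b_ions(PEPTIDE)
# Doubly protonated glycopeptide: neutral peptide + glycan + 2 protons, over 2.
precursor_2plus = (mh + TRISACCHARIDE + PROTON) / 2

print(f"[M + H]+ peptide      : {mh:.3f}")
print(f"[M + 2H]2+ glycopeptide: {precursor_2plus:.3f}")
for number in (5, 7, 8, 9):
    print(f"b{number} ({PEPTIDE[:number]}): {b[number]:.3f}")
```

With these assumed masses, b5, b7, b8, and b9 fall within about 0.01 m/z units of the reported ions at m/z 587.33, 771.41, 858.44, and 957.51, and the calculated precursor is close to the reported m/z 872.92.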
Figure 5

HCD fragmentation spectrum of an O-glycopeptide at m/z 872.92 [M + 2H]2+ from a chymotryptic digest of MUC2-C interpreted to have the sequence 126GLRPYPSSVL (site Ser132, Figure 1B) and the monosaccharide composition (HexHexNAcNeuAc).

Discussion

In the present study we demonstrate that substantial information on site-specific glycosylation of a large purified glycoprotein can be obtained when combining standard proteomics methods without including additional preparative steps selecting for glycopeptides. This approach is attractive when relatively small amounts (low μg) of highly purified material are available. Most methodologies described for glycopeptide enrichment, such as lectin-based approaches or chemical modification of the glycans, have the drawback of selecting for specific types of N-glycans or O-glycans.[29,30] Here, the use of multiple digestive enzymes and HCD fragmentation enabled the identification of 56 N- and 15 O-glycopeptides covering 10 N- and 4 O-glycosylation sites of a purified recombinant protein expressed in CHO-K1 cells containing the last 981 amino acids of the human MUC2 mucin. The majority of the MUC2-C derived glycopeptides eluted early and were, as expected, distributed over several minutes, allowing time for fragmentation of multiple glycoforms. The MUC2-C protein does not contain sufficient numbers of tryptic cleavage sites to generate the reasonably sized peptides required for complete sequence coverage. To overcome this problem, we used additional enzymes for protein digestion, such as Asp-N and the less specific enzymes chymotrypsin and subtilisin. Chymotrypsin and subtilisin have been less attractive to use than trypsin, as they are less specific and generally generate smaller peptides with no fixed basic amino acids. The small size and reduced information obtained via fragmentation result in reduced peptide identifications and overall lower peptide confidence scores.[31,32] However, when working with a purified protein and high-resolution mass spectrometry, these drawbacks can be overcome. In our analyses, digestion with chymotrypsin generated the highest sequence coverage.
One reason for this was that overnight digestion generated many peptides that are atypical for these enzymes. In total, we identified 64 (of 101) chymotryptic peptides formed by cleavages after residues other than L, W, F, and Y. We also found that chymotrypsin frequently cleaved C-terminally of cysteines alkylated with iodoacetamide (13 peptides), something that was advantageous in our case, as the MUC2 mucin is rich in cysteines. Details on the characteristics of HCD spectra from tryptic N-glycopeptides have been published by others.[18,19,33] We here show that nontryptic glycopeptides generate equally informative fragmentation spectra when analyzed using high-resolution HCD fragmentation. However, not all spectra were straightforward to interpret due to the presence of secondary fragment ions from Fuc on the core GlcNAc, and the presence of fragment ions originating from cleavages in the amino acid sequence. The diagnostic y-fragment ion series containing the peptide and one or two core monosaccharide residues were detected in all 56 spectra but were not always among the more abundant ions. In addition to the nonspecific enzymes subtilisin and chymotrypsin, we also tried proteinase K, which has recently been used by others for glycoproteomics.[23,24] However, digestion using proteinase K generated many overlapping glycopeptides of the same site, which made spectral data interpretation time-consuming. MUC2-C produced in CHO cells was found to be N-glycosylated on 17 out of 18 N-glycosylation consensus sites (N-X-S/T), with only one consensus site shown to be unmodified (N5, Figure 1B). The N-glycans of the individual sites were found to differ, with abundant core fucosylation on some sites and little or none on others. Terminal NeuAc was absent on N12, and larger N-glycans were found on sites N6 and N8 than on the other sites. We did not detect any NeuGc, which might be due to the CHO cell clone used or because the cells were grown serum-free.
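Consensus sites of the N-X-S/T type mentioned above can be located in a sequence with a short pattern scan. This is a minimal sketch with hypothetical names, not the procedure used in the study. It assumes the commonly applied rule that X is any residue except proline, which the text does not state, and it is applied only to the two peptide sequences quoted in the Results.

```python
import re

# N followed by any residue except P, then S or T.
# The lookahead keeps overlapping motifs such as "NNTS" from being missed.
# Excluding proline at X is an assumption based on the usual sequon rule.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence, first_residue_number=1):
    """Return the residue numbers of Asn in N-X-S/T motifs (X != P)."""
    return [match.start() + first_residue_number
            for match in SEQUON.finditer(sequence)]


# Peptides and start positions as quoted in the text.
print(find_sequons("DTCCNIT", first_residue_number=679))
print(find_sequons("THVPCNTS", first_residue_number=767))
```

The two calls return residue numbers 683 and 772, matching the site labels 683N12 and 772N15 used in the figure legends.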
A single N-glycan of high mannose type was detected at site N17, suggesting that protein folding may modify N-glycan trimming on this particular site when the protein passes through the Golgi. As the protein was produced in CHO-K1 cells, the N-glycan composition might not reflect the structures found on the native MUC2 protein, although the same subset of glycosyltransferases involved in N-glycosylation is expressed in both species, which indicates that similar structures can be synthesized.[34] The domain structure of the human MUC2 mucin C-terminal is identical when compared to that of rodents, and the majority of the consensus sequences for N-glycosylation are conserved.[35] A total of seven N-glycan motifs were only covered after de-N-glycosylation, and their composition remains unknown. The first four motifs (i.e., 141N1, 153N2, 164N3, and 175N4) were not identified and are found in close proximity to each other with a limited number of potential enzymatic cleavage sites, which is a likely reason why no glycopeptides were identified in this region. In addition, there might be N-glycopeptides derived from other sites with unfavorable properties for the analysis by reversed-phase chromatography, or resulting in fragmentation spectra with low ion intensities or limited sequence information. The identification of these missing sites might require N-glycopeptide enrichment using hydrophilic interaction chromatography (HILIC) prior to MS analyses to reduce the complexity of the samples by removing the majority of the unmodified peptides.[36] The human MUC2 C-terminus shows extensive sequence homology with the two other secreted human mucins MUC5B and MUC5AC and the von Willebrand factor (vWF), especially when it comes to the distribution of Cys residues. Interestingly, vWF differs from the mucins, which are more similar to each other, in the localization of the N-glycan sites.
Only a single N-glycan site is conserved between vWF and human MUC2, and this is not conserved in the two other secreted mucins (MUC5AC and MUC5B). This common Asn is N-glycosylated in vWF and has been shown to be essential for the folding/function of vWF, as mutations at this site give the severe von Willebrand type III disease.[37] N-Glycans are added cotranslationally and before folding and disulfide bond formation, suggesting that N-glycans are utilized to generate differently folded proteins and maybe different disulfide bonds. To date, 20 different GalNAc transferases have been described to be involved in O-glycan synthesis in the Golgi; they add the first monosaccharide residue to serine or threonine, initiating O-glycosylation. These transferases have differing substrate specificity and have also been shown to be expressed in a cell and developmental stage specific manner (reviewed by Bennett et al.[38]). Consequently, the site-specific O-glycosylation found on a recombinant protein produced in CHO-K1 cells will depend on the repertoire of GalNAc-transferases present in these cells and will not fully reflect the MUC2 present in the human intestine.[5,39] However, the identification of O-glycans on MUC2-C produced in a cell line still reveals important information. These O-glycosylated areas prove that these regions of the protein backbone must be situated on the exterior of the folded molecule. In this context, our O-glycan data fit well with our previously published work on certain pathogenic bacterial and parasite proteases which can disrupt the mucus gel.[5,40] These proteases were found to target the same specific site in the MUC2 C-terminal, and the addition of a single GalNAc in proximity of the cleavage site abolished the proteolytic activity (437LSTPSIIR↓T↓TGLRPYPSSVL; arrows show cleavage sites and underlines show the observed glycosylation site in the present study).
Proteolytic processing of mucins by endogenous or bacterial proteases is important for maintaining a functional colonic mucus layer. A prerequisite for understanding these complex processes is knowledge of protein modifications, especially glycosylation. As we identified the majority of the modified sites on the C-terminal of MUC2, our approach can be used as a reference when studying site-specific glycosylation on mucus collected from clinical samples. The protein quantities used for this method are possible to obtain from these sources, although one has to consider the increase in complexity introduced by peptides derived from the N-terminal of MUC2 and other mucus-associated proteins. We have now shown that by using multiple enzymes for protein digestion in combination with high-resolution mass spectrometry and different types of ion fragmentation, one can obtain detailed insight into complex and large glycoproteins such as the mucins.
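How combining digests raises sequence coverage can be illustrated with a small calculation. The sketch below is a toy example under stated assumptions, not the study's data or software: the function name is hypothetical, the 20-residue stretch is the sequence fragment quoted in the Discussion, and the two peptide lists are invented for illustration (only GLRPYPSSVL is a peptide actually reported from the chymotryptic digest).

```python
def sequence_coverage(protein, peptides):
    """Fraction of protein residues covered by at least one identified peptide."""
    covered = [False] * len(protein)
    for peptide in peptides:
        start = protein.find(peptide)
        while start != -1:  # mark every occurrence of the peptide
            for index in range(start, start + len(peptide)):
                covered[index] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


# Sequence fragment quoted in the Discussion (cleavage arrows removed).
FRAGMENT = "LSTPSIIRTTGLRPYPSSVL"

# Illustrative peptide lists; these are assumptions, not measured results.
tryptic_like = ["LSTPSIIR"]
chymotryptic_like = ["GLRPYPSSVL"]

single = sequence_coverage(FRAGMENT, tryptic_like)
combined = sequence_coverage(FRAGMENT, tryptic_like + chymotryptic_like)
print(f"one enzyme : {single:.0%}")
print(f"two enzymes: {combined:.0%}")
```

Marking residues rather than summing peptide lengths avoids counting overlapping peptides twice, which matters when less specific enzymes produce many overlapping products.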
  40 in total

1.  Semi-supervised learning for peptide identification from shotgun proteomics datasets.

Authors:  Lukas Käll; Jesse D Canterbury; Jason Weston; William Stafford Noble; Michael J MacCoss
Journal:  Nat Methods       Date:  2007-10-21       Impact factor: 28.547

Review 2.  Application of electron transfer dissociation (ETD) for the analysis of posttranslational modifications.

Authors:  Julia Wiesner; Thomas Premsler; Albert Sickmann
Journal:  Proteomics       Date:  2008-11       Impact factor: 3.984

3.  Enrichment of glycopeptides for glycan structure and attachment site identification.

Authors:  Jonas Nilsson; Ulla Rüetschi; Adnan Halim; Camilla Hesse; Elisabet Carlsohn; Gunnar Brinkmalm; Göran Larson
Journal:  Nat Methods       Date:  2009-10-18       Impact factor: 28.547

4.  Gel-forming mucins appeared early in metazoan evolution.

Authors:  Tiange Lang; Gunnar C Hansson; Tore Samuelsson
Journal:  Proc Natl Acad Sci U S A       Date:  2007-10-02       Impact factor: 11.205

5.  A complex, but uniform O-glycosylation of the human MUC2 mucin from colonic biopsies analyzed by nanoLC/MSn.

Authors:  Jessica M Holmén Larsson; Hasse Karlsson; Henrik Sjövall; Gunnar C Hansson
Journal:  Glycobiology       Date:  2009-03-25       Impact factor: 4.313

6.  Proteomic analyses of the two mucus layers of the colon barrier reveal that their main component, the Muc2 mucin, is strongly bound to the Fcgbp protein.

Authors:  Malin E V Johansson; Kristina A Thomsson; Gunnar C Hansson
Journal:  J Proteome Res       Date:  2009-07       Impact factor: 4.466

7.  Affinity enrichment and characterization of mucin core-1 type glycopeptides from bovine serum.

Authors:  Zsuzsanna Darula; Katalin F Medzihradszky
Journal:  Mol Cell Proteomics       Date:  2009-08-12       Impact factor: 5.911

Review 8.  Structural glycomics using hydrophilic interaction chromatography (HILIC) with mass spectrometry.

Authors:  Manfred Wuhrer; Arjen R de Boer; André M Deelder
Journal:  Mass Spectrom Rev       Date:  2009 Mar-Apr       Impact factor: 10.946

Review 9.  Mucin-type O-glycosylation--putting the pieces together.

Authors:  Pia H Jensen; Daniel Kolarich; Nicolle H Packer
Journal:  FEBS J       Date:  2009-11-17       Impact factor: 5.542

10.  Glycomics profiling of Chinese hamster ovary cell glycosylation mutants reveals N-glycans of a novel size and complexity.

Authors:  Simon J North; Hung-Hsiang Huang; Subha Sundaram; Jihye Jang-Lee; A Tony Etienne; Alana Trollope; Sara Chalabi; Anne Dell; Pamela Stanley; Stuart M Haslam
Journal:  J Biol Chem       Date:  2009-12-01       Impact factor: 5.157
