Proteoform Profile Mapping of the Human Serum Complement Component C9 Revealing Unexpected New Features of N-, O-, and C-Glycosylation.

Vojtech Franc^1,2, Yang Yang^1,2, Albert J R Heck^1,2.

Abstract

The human complement C9 protein (∼65 kDa) is a member of the complement pathway. It plays an essential role in the membrane attack complex (MAC), which forms a lethal pore on the cellular surface of pathogenic bacteria. Here, we charted in detail the structural microheterogeneity of C9 purified from human blood serum, using an integrative workflow combining high-resolution native mass spectrometry and (glyco)peptide-centric proteomics. The proteoform profile of C9 was acquired by high-resolution native mass spectrometry, which revealed the co-occurrence of ∼50 distinct mass spectrometry (MS) signals. Subsequent peptide-centric analysis, through proteolytic digestion of C9 and liquid chromatography (LC)-tandem mass spectrometry (MS/MS) measurements of the resulting peptide mixtures, provided site-specific quantitative profiles of three different types of C9 glycosylation and validation of the native MS data. Our study provides a detailed specification, validation, and quantification of 15 co-occurring C9 proteoforms and the first direct experimental evidence of O-linked glycans in the N-terminal region. Additionally, next to the two known glycosylation sites, a third novel, albeit low abundant, N-glycosylation site on C9 is identified, which surprisingly does not possess the canonical N-glycosylation sequence N-X-S/T. Our data also reveal a binding of up to two Ca2+ ions to C9. Mapping all detected and validated sites of modifications on a structural model of C9, as present in the MAC, hints at their putative roles in pore formation or receptor interactions. The applied methods herein represent a powerful tool for the unbiased in-depth analysis of plasma proteins and may advance biomarker discovery, as aberrant glycosylation profiles may be indicative of the pathophysiological state of the patients.

Substances:
Complement C9

Year: 2017 PMID: 28221766 PMCID: PMC5362742 DOI: 10.1021/acs.analchem.6b04527

Source DB: PubMed Journal: Anal Chem ISSN: 0003-2700 Impact factor: 6.986

Post-translational modifications (PTMs) of proteins regulate their activity, localization, turnover, interactions, and many other important physiological processes.[1,2] Of all possible PTMs, protein glycosylation is one of the most abundant yet structurally diverse, making it analytically and biochemically challenging to monitor.[3] This is because the enzymes involved in the glycosylation machinery can produce diverse glycosylation patterns on proteins and heterogeneous populations of glycans at every occupied glycosylation site.[4−6] In addition, these modifications are also present in nonstoichiometric amounts with multiple varieties of chemical moieties involved. Thus, new methods are needed for their detailed analysis. Recent progress in high-resolution native electrospray ionization (ESI) mass spectrometry (MS) can provide a novel means to facilitate the in-depth analysis of all coappearing modifications, at least when they are distinguishable in mass.[7−9] In combination with peptide-centric proteomics, this approach is very useful for highly detailed analysis of PTMs on proteins from different biological sources and can provide direct assessment of the biosimilarity among similar therapeutic proteins.[10] Through integrating native MS data and peptide-centric proteomics, one can obtain information about composition, stoichiometry, site-specificity, and relative abundance of the modifications at each site. Moreover, a direct cross-evaluation of both data sets (native MS and peptide-centric MS) validates the completeness of each approach and provides reliability for the quantitative profiling of protein proteoforms. So far, in the analysis of protein proteoforms, most MS methods typically use denaturing conditions prior to the ESI process.[11−14] These conditions inevitably disrupt protein tertiary and quaternary structures.
Although native MS is often regarded as somewhat less sensitive, it provides the advantage that the resulting mass spectra are less congested, as the ion signals are distributed over a substantially smaller number of charge states and over a wider m/z window. Collectively, such hybrid mass spectrometry strategies have the potential to become beneficial for the study of biologically important (glyco)proteins, whereby knowledge about their precise modifications is crucial in understanding their activity and function. Most proteins in human blood plasma are decorated by a plethora of PTMs, particularly involving glycosylation, and the complement component protein C9 is not an exception.[15] Human C9 is primarily produced in the liver and plays a key role in the formation of the membrane attack complex (MAC), together with the other complement proteins C5, C6, C7, and C8. While several cryoEM maps have recently become available for the MAC,[16,17] no detailed structure is available for its C9 component. Still, amino acid alignments have identified several domains in C9 based on its homology to other proteins. These include the N-terminal type 1 thrombospondin (TSP) domain, a low-density lipoprotein receptor class A repeat (LDLRA), a number of potential transmembrane regions, and the C-terminal epidermal growth factor (EGF)-like domain (Figure 1).[18] The majority of the detailed characterization studies of the PTMs occurring on C9 date back to the previous century, when the techniques used to perform such analyses were very cumbersome. In these early studies, C9 was reported to be N-glycosylated;[19−21] however, no current evidence exists regarding the composition and heterogeneity of these N-linked glycans. Moreover, although suspected, no direct proof has been reported for the presence of O-linked glycans.
C9 is additionally modified by a rarer type of glycosylation: C-mannosylation.[22] With such a diverse repertoire of modifications, C9 not only presents a challenging analytical target but also imparts a potential variability in its physiological functioning. Exemplary findings to support this statement come from reports wherein the extracellular Ser phosphorylation of C9 by ecto-protein kinases in K562 cancer cells was proposed to serve as a protective mechanism against complement in tumor cells.[23] Moreover, C9 with fucosylated N-glycans has been suggested as a biomarker for squamous cell lung cancer, as patients tend to show overproduction of these proteoforms.[24] A detailed map of the full proteoform profile of serum-derived complement component C9 is, therefore, of high interest.

Figure 1

Schematic of domain composition and primary structure of C9. The scheme includes previously reported sites of C-mannosylation[22] and N-glycosylation on C9.[19−21] The glycan nomenclature used is indicated at the bottom.

Here, we report an unbiased and in-depth analysis of the complement component C9 protein isolated from pooled human blood serum (of at least three donors) using modern, hybrid MS technologies. Our data provide a detailed view of the modifications co-occurring on C9. We validate all identified PTMs from high quality tandem mass spectrometry (MS/MS) spectra using a peptide-centric approach. In addition to the earlier reported C9 modifications, our data revealed the attachment of mucin type O-glycans in the N-terminal part of C9, providing the first experimental evidence of this modification on C9. Beyond the information about PTMs, our native MS measurements suggest binding of up to two Ca2+ ions to C9. Since the N-terminal region of C9 seems to play a crucial role in the C9 polymerization process and thus also in the assembly of the MAC,[25] the O-glycosylation site identified here may serve as a target for further functional investigations. Moreover, we identified a novel low abundant N-glycosylation site on N215 that is highly conserved throughout mammals and does not adhere to the typical N-glycosylation sequon N-X-S/T (X can be any amino acid except P).

Materials and Methods

Chemicals and Materials

Complement component C9 (Uniprot Code: P02748) purified from pooled human blood plasma (more than three healthy donors) was acquired from Complement Technology, Inc. (Texas, USA). The sample was purified according to a standard protocol[26] (the certificate of analysis is attached in the Supporting Information S5). Dithiothreitol (DTT), iodoacetamide (IAA), and ammonium acetate (AMAC) were purchased from Sigma-Aldrich (Steinheim, Germany). Formic acid (FA) was from Merck (Darmstadt, Germany). Acetonitrile (ACN) was purchased from Biosolve (Valkenswaard, The Netherlands). POROS Oligo R3 50 μm particles were obtained from PerSeptive Biosystems (Framingham, MA, USA) and packed into GELoader pipet tips (Eppendorf, Hamburg, Germany). Sequencing grade trypsin was obtained from Promega (Madison, WI). Glu-C, Asp-N, PNGase F, and Sialidase were obtained from Roche (Indianapolis, USA).

Sample Preparation for Native MS

Unprocessed protein solution in a phosphate buffer at pH 7.2, containing ∼30–40 μg of C9, was buffer exchanged into 150 mM aqueous AMAC (pH 7.5) by ultrafiltration (vivaspin500, Sartorius Stedim Biotech, Germany) using a 10 kDa cutoff filter. The resulting protein concentration was measured by UV absorbance at 280 nm and adjusted to 2–3 μM prior to native MS analysis. The enzyme Sialidase was used to remove sialic acid residues from C9. PNGase F was used to cleave the N-glycans of C9.[26] All samples were buffer exchanged to 150 mM AMAC (pH 7.2) prior to native MS measurements.

Native MS Analysis

Samples were analyzed on a modified Exactive Plus Orbitrap instrument with extended mass range (EMR) (Thermo Fisher Scientific, Bremen) using a standard m/z range of 500–10 000, as described in detail previously.[27] The voltage offsets on the transport multipoles and ion lenses were manually tuned to achieve optimal transmission of protein ions at elevated m/z. Nitrogen was used in the higher-energy collisional dissociation (HCD) cell at a gas pressure of 6–8 × 10^−10 bar. MS parameters used: spray voltage, 1.2–1.3 kV; source fragmentation, 30 V; source temperature, 250 °C; collision energy, 30 V; resolution (at m/z 200), 30 000. The instrument was mass calibrated as described previously, using a solution of CsI.[27]

Native MS Data Analysis

The accurate masses of the observed C9 proteoforms were calculated manually by averaging over all detected charge states of C9. For PTM composition analysis, data were processed manually and glycan structures were deduced on the basis of known biosynthetic pathways. Average masses were used for the PTM assignments, including hexose/mannose/galactose (Hex/Man/Gal, 162.1424 Da), N-acetylhexosamine/N-acetylglucosamine (HexNAc/GlcNAc/GalNAc, 203.1950 Da), and N-acetylneuraminic acid (NeuAc, 291.2579 Da). All symbols and text nomenclature used follow the recommendations of the Consortium for Functional Glycomics.
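The average residue masses listed above are enough to reproduce the glycan masses quoted in the Results. The following minimal sketch sums them for two compositions reported later in the text; the function and variable names are illustrative assumptions, not part of the authors' workflow, which was carried out manually.

```python
# Average residue masses (Da) exactly as listed in the Methods text.
RESIDUE_MASS = {
    "Hex": 162.1424,
    "HexNAc": 203.1950,
    "NeuAc": 291.2579,
}


def glycan_mass(composition):
    """Sum the average residue masses for a composition such as {"Hex": 5}."""
    return sum(RESIDUE_MASS[name] * count for name, count in composition.items())


# Compositions taken from the Results section of this paper.
biantennary = {"HexNAc": 4, "Hex": 5, "NeuAc": 2}   # one complex N-glycan
total_ptms = {"Hex": 12, "HexNAc": 9, "NeuAc": 6}   # most abundant proteoform

print(round(glycan_mass(biantennary), 2))  # 2206.01
print(round(glycan_mass(total_ptms), 2))   # 5522.01
```

Both values agree with the 2206.01 Da and 5522.01 Da figures used in the mass bookkeeping of the Results.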

In-Solution Digestion for Peptide-Centric Glycoproteomics

Intact human C9 protein in PBS buffer (10 mM sodium phosphate, 145 mM NaCl, pH 7.3) at a concentration of 1 mg/mL was reduced with 5 mM DTT at 56 °C for 30 min and alkylated with 15 mM IAA at room temperature for 30 min in the dark. The excess of IAA was quenched with 5 mM DTT. C9 was digested overnight with trypsin at an enzyme-to-protein ratio of 1:100 (w/w) at 37 °C. Another C9 sample was digested for 4 h with Asp-N at an enzyme-to-protein ratio of 1:75 (w/w) at 37 °C, and the resulting peptide mixtures were further treated with trypsin (1:100; w/w) overnight at 37 °C. All proteolytic digests containing modified glycopeptides were desalted by GELoader tips filled with POROS Oligo R3 50 μm particles,[28] dried, and dissolved in 40 μL of 0.1% FA prior to liquid chromatography (LC)-MS and MS/MS analysis.

LC-MS and MS/MS Analysis

All peptides (typically 300 fmol of C9 peptides) were separated and analyzed using an Agilent 1290 Infinity HPLC system (Agilent Technologies, Waldbronn, Germany) coupled online to an Orbitrap Fusion mass spectrometer (Thermo Fisher Scientific, Bremen, Germany). Reversed-phase separation was accomplished using a 100 μm inner diameter 2 cm trap column (in-house packed with ReproSil-Pur C18-AQ, 3 μm) (Dr. Maisch GmbH, Ammerbuch-Entringen, Germany) coupled to a 50 μm inner diameter 50 cm analytical column (in-house packed with Poroshell 120 EC-C18, 2.7 μm) (Agilent Technologies, Amstelveen, The Netherlands). Mobile-phase solvent A consisted of 0.1% FA in water, and mobile-phase solvent B consisted of 0.1% FA in ACN. The flow rate was set to 300 nL/min. A 45 min gradient was used as follows: 0–10 min, 100% solvent A; 10.1–35 min, 10% solvent B; 35–38 min, 45% solvent B; 38–40 min, 100% solvent B; 40–45 min, 100% solvent A. Nanospray was achieved using a coated fused silica emitter (New Objective, Cambridge, MA) (outer diameter, 360 μm; inner diameter, 20 μm; tip inner diameter, 10 μm) biased to 2 kV. The mass spectrometer was operated in positive ion mode, and the spectra were acquired in the data dependent acquisition mode. For the MS scans, the mass range was set from 300 to 2000 m/z at a resolution of 60 000 and the AGC target was set to 4 × 10^5. For the MS/MS measurements, HCD and electron-transfer and higher-energy collision dissociation (EThcD) were used. HCD was performed with normalized collision energies of 15% and 35%. A supplementary activation energy of 20% was used for EThcD. For the MS/MS scans, the mass range was set from 100 to 2000 m/z, and the resolution was set to 30 000; the AGC target was set to 5 × 10^5. The precursor isolation width was 1.6 Da, and the maximum injection time was set to 300 ms.

LC/MS and MS/MS Data Analysis

Raw data were interpreted by using the Byonic software suite (Protein Metrics Inc.),[29] and further validation of the key MS/MS spectra was performed manually. The following parameters were used for data searches: precursor ion mass tolerance, 10 ppm; product ion mass tolerance, 20 ppm; fixed modification, Cys carbamidomethyl; variable modifications, Met oxidation, Trp mannosylation, and both N- and O-glycosylation from mammalian glycan databases. A nonenzyme specificity search was chosen for all samples. The database used contained the C9 protein amino acid sequence (Uniprot Code: P02748). Profiling and relative quantification of PTM modified C9 peptides were achieved by use of the extracted ion chromatograms (XICs) from two independently processed C9 samples. The peptide mixtures were prepared with different combinations of proteolytic enzymes as described above ((1) Trypsin; (2) AspN + Trypsin). For peak area calculations, the first three isotopes were taken from each manually validated peptide proteoform. Integrated peak areas were normalized for all PTM sites individually, and the average peptide ratios from the two samples were taken as a final estimation of the abundance. The XICs were obtained using the software Skyline.[30] The glycan structures of each glycoform were manually annotated. The glycan structures reported here are depicted without the linkage type of glycan units, since the acquired MS/MS patterns do not provide such information.
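The per-site normalization and averaging over the two digests described above can be sketched as follows. This is only an illustration: the peak areas are hypothetical numbers rather than values from the study, the function names are assumptions, and each area is assumed to be already summed over the first three isotope peaks.

```python
def normalize_site(areas):
    """Scale the XIC peak areas of all peptide proteoforms at one site to 100%."""
    total = sum(areas.values())
    return {form: 100.0 * area / total for form, area in areas.items()}


def average_digests(digest_a, digest_b):
    """Average the normalized percentages from two independently processed digests."""
    norm_a, norm_b = normalize_site(digest_a), normalize_site(digest_b)
    forms = sorted(set(norm_a) | set(norm_b))
    # A form missing from one digest counts as 0% in that digest.
    return {f: (norm_a.get(f, 0.0) + norm_b.get(f, 0.0)) / 2 for f in forms}


# Hypothetical peak areas (arbitrary units) for a single N-glycosylation site.
trypsin = {"HexNAc4Hex5NeuAc2": 9.6e8, "HexNAc4Hex5NeuAc1": 0.4e8}
aspn_trypsin = {"HexNAc4Hex5NeuAc2": 4.75e8, "HexNAc4Hex5NeuAc1": 0.25e8}

site_profile = average_digests(trypsin, aspn_trypsin)
for form, percent in site_profile.items():
    print(f"{form}: {percent:.1f}%")
```

Normalizing each site separately means that differences in ionization efficiency between unrelated peptides do not enter the comparison; only proteoforms of the same peptide backbone are compared with one another.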

Combining Native MS and Peptide-Centric Proteomic Data

Reliability and completeness of the obtained proteoform profiles of C9 were assessed by an integrative approach combining the native MS data with the glycopeptide-centric proteomics data. This approach has been described in detail previously.[10] Briefly, in silico construction of the “intact protein spectra” was performed on the basis of the masses and relative abundances of all site-specific PTMs derived from the glycopeptide-centric analysis. Subsequently, the constructed spectrum was compared to the experimental native MS spectra of C9. The similarity between the two independent data sets (native MS spectra and constructed spectra based on glycopeptide-centric data) was expressed by a Pearson correlation coefficient. All R scripts used for the spectra simulation are available on GitHub (https://github.com/Yang0014/glycoNativeMS). All C9 proteoforms predicted from the peptide-centric data were further filtered by applying a 0.5% cutoff in relative intensity of the peaks in the experimental native spectrum, and mass deviations were manually checked.
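The idea behind the in silico construction can be illustrated with a simplified sketch. It is not the authors' R implementation: it assumes that sites are modified independently, reduces the spectrum to stick heights at matched masses instead of simulated peak shapes, applies the 0.5% cutoff to the simulated abundances for brevity, and uses hypothetical site profiles and hypothetical "measured" intensities.

```python
from itertools import product
from math import sqrt


def combine_sites(backbone_mass, sites):
    """Enumerate proteoforms as every combination of one modification state per site.

    `sites` is a list of dicts mapping added mass (Da) to relative abundance
    (fractions summing to 1). Returns {total mass: abundance}.
    """
    spectrum = {}
    for combo in product(*(site.items() for site in sites)):
        mass = round(backbone_mass + sum(delta for delta, _ in combo), 2)
        abundance = 1.0
        for _, fraction in combo:
            abundance *= fraction
        # Isobaric combinations accumulate in the same entry.
        spectrum[mass] = spectrum.get(mass, 0.0) + abundance
    return spectrum


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical site profiles: a rarely occupied N-site and a variably sialylated O-site.
rare_n_site = {0.0: 0.99, 2206.01: 0.01}
o_site = {656.60: 0.60, 947.85: 0.40}

simulated = combine_sites(60954.02, [rare_n_site, o_site])
kept = {m: a for m, a in simulated.items() if a >= 0.005}  # 0.5% cutoff

masses = sorted(kept)
measured = [0.60, 0.35, 0.05]  # hypothetical intensities at the same masses
r = pearson([kept[m] for m in masses], measured)
print(masses, round(r, 3))
```

In this toy case four combinations are generated and the least abundant one falls below the cutoff, which mirrors how low-level predicted proteoforms are removed before comparison.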

Results

Native MS Analysis Provides Hints about Novel Unexpected PTMs and Ca2+ Binding to C9

We started our investigation by first acquiring high-resolution native ESI-MS spectra of the human complement component C9 (Figure 2a). The recorded native MS spectrum of C9 shows at least five different charge states, ranging from [M + 13H]13+ to [M + 17H]17+. Each charge state contains various ion series that correspond to different masses and thus different proteoforms of C9. On the basis of their distinguishable masses, taking a 1% cutoff in relative intensity of the peaks, we can distinguish ∼50 co-occurring MS signals. Since we suspected these could correspond to different proteoforms of C9, we set out to further examine and validate our findings.
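A charge-state series like the one described above determines the neutral mass through M = z × (m/z − m_adduct), averaged over all observed charge states. The sketch below assumes that every charge is carried by a proton and uses synthetic peaks generated from an arbitrary mass; neither the peak list nor the function names come from the study.

```python
PROTON = 1.00728  # Da, mass of a proton (assumed charge carrier)


def neutral_mass(mz, z, adduct_mass=PROTON):
    """Neutral mass from one peak, assuming z identical charge carriers."""
    return z * (mz - adduct_mass)


def average_mass(peaks):
    """Average the neutral mass over (z, m/z) pairs of one proteoform."""
    masses = [neutral_mass(mz, z) for z, mz in peaks]
    return sum(masses) / len(masses)


# Synthetic 13+ to 17+ series generated from an arbitrary mass of 66 500 Da.
true_mass = 66500.0
peaks = [(z, true_mass / z + PROTON) for z in range(13, 18)]

estimate = average_mass(peaks)
print(round(estimate, 2))  # 66500.0
```

If some of the charge comes from a bound metal ion rather than protons, the all-proton assumption yields an apparent mass that is slightly too high, a point that becomes relevant for the Ca2+ assignment later in this section.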

Figure 2

Full native ESI-MS spectrum of the intact C9 sprayed from aqueous ammonium acetate (a). The charge states are indicated. Zoom in on the 15+ charge state in the inset reveals approximately 50 distinct ion signals. (b) Zoom in on the 15+ charge state, centered around m/z 4500, in the native mass spectra of intact C9. In (c), a similar spectrum of C9 treated with PNGase F for 4 h reveals partial N-deglycosylation. In (d), C9 was treated with PNGase F for 48 h. In (e), C9 was first enzymatically desialylated, and in (f), C9 was partially denatured, prior to native MS analysis. The differences in mass between C9 proteoforms in the unprocessed and treated samples allow the deduction of the PTM composition of the most abundant C9 proteoforms. The mass of the most abundant ion in the unprocessed sample, at m/z 4435.28, is 66 516.20 Da, from which the PTM composition Hex12HexNAc9NeuAc6 can be derived unambiguously. The 15 positive charges come from 13 H+ and one Ca2+ ion, owing to the presence of a bound Ca2+ ion, as revealed in (f).

To simplify the visualization of the C9 proteoform profile, we focused on the most intense charge state (15+). The average mass of the protein backbone of C9 is 60 954.02 Da. In this mass calculation, we used the mass of the C9 backbone sequence lacking the N-terminal signal peptide, corrected by the mass shift induced by the 12 disulfide bonds present in C9 (−24 × 1.0079 Da). Compared to, for instance, chicken ovalbumin[31] and CHO derived erythropoietin,[10] which we previously analyzed by high-resolution native mass spectrometry, the native mass spectra of C9 are remarkably less heterogeneous, especially since, according to previously published data, C9 has been shown to be C-mannosylated[22] at the TSP domain and N-glycosylated[19−21] at two sites in the MACPF domain.
Looking at the inset in Figure 2a, the mass difference of 656 Da between the abundant peaks with m/z of 4435.28 and 4479.02 corresponds to the glycan composition HexNAc1Hex1NeuAc1. The same mass difference can be observed between the abundant peaks with m/z of 4415.89 and 4459.61. This may correspond to either variability in the number of antennas on the N-glycans or the additional attachment of mucin type O-glycans. To address this, we next treated the protein with various enzymes that cleave off parts of the glycan moieties. The removal of N-glycans or sialic acid residues resulted in specific mass shifts, allowing us to calculate and partly predict the PTM composition of C9. We used PNGase F for cleavage of the N-glycans (Figure 2b–d) and sialidase for the specific removal of sialic acids (Figure 2e). We subsequently subjected these treated C9 samples to native MS analysis. The incubation of C9 with PNGase F for 4 h at room temperature resulted in the removal of one of the two N-glycan chains (Figure 2c). The mass difference of 2206 Da between the most abundant intact C9 proteoform and the N-deglycosylated C9 indicated the attachment of an N-glycan with the composition HexNAc4Hex5NeuAc2. A prolonged treatment (48 h; 37 °C) of C9 with PNGase F resulted in a second major mass shift of 2206 Da corresponding to a loss of the second N-glycan (Figure 2d). A closer look at the native MS spectrum of C9 exposed another lower abundant ion series in the higher mass range, suggesting the existence of an unexpected third N-glycosylation site. The observed mass difference of 2206 Da in the intact C9 between the peaks with m/z of 4435.28 and 4582.33 strongly indicates the presence of a third N-glycan with a composition similar to that at the two known sites (Figure 2b–d). The heterogeneity in the proteoform profile of the fully N-deglycosylated C9 remained very similar (Figure 2b,d).
Therefore, the observed mass differences of 656 Da most likely correspond to O-linked glycan chains with the overall composition of HexNAc1Hex1NeuAc1. Additionally, a mass difference of 947 Da between the peaks with m/z of 4435.28 and 4498.41 and between 4415.89 and 4459.61 suggests the occurrence of disialylated mucin O-glycans with the overall composition of HexNAc1Hex1NeuAc2. For the further assessment of the PTM composition of C9, we treated C9 with sialidase. The removal of sialic acid residues resulted in a pronounced simplification of the structural heterogeneity in C9 (Figure 2e). The proteoform occurring at 4435.28 m/z, which is known to contain only two occupied N-glycosylation sites, shifted to 4318.80 m/z, corresponding to an overall loss of six sialic acid moieties. Four out of six of these sialic acids were cleaved off from the two N-glycans. The remaining two sialic acids are thus most likely attached to O-glycans. This assumption is based on the mass difference of 1147.72 Da between the overall mass of C9 PTMs (66 513.76 − 60 954.02 Da = 5559.74 Da) and the total mass of N-glycans in the most abundant C9 proteoform represented by the peak with m/z of 4435.28 (2 × 2206.01 Da = 4412.02 Da). Since the proteoform occurring at 4435.28 m/z contains one O-glycan, further sequential mass differences (947 and 656 Da, respectively) among the peaks with m/z of 4435.28, 4498.41, and 4542.22 suggest that at least three O-glycans are attached to C9. As mentioned above, the expected total mass of PTMs for the proteoform present at 4435.28 m/z is 5559.74 Da. This estimation is based on the assumption that all charges on the C9 in the native MS spectrum come from H+. However, the total mass of the experimentally proven PTM composition (Hex12HexNAc9NeuAc6) is 5522.01 Da. The remaining mass of 37.73 Da (5559.74 − 5522.01 Da) can be explained by the presence of a Ca2+ ion (40.076 Da) in the structure of C9.
This means that the 15 charges present on the C9 at 4435.28 m/z come from 13 H+ and one Ca2+ ion (which contributes two charges). In that case, the deprotonated mass of the most abundant proteoform with all PTMs and a Ca2+ ion is 66 516.20 Da, which corresponds to a small standard deviation of ±0.05 Da with respect to the calculated C9 mass: 66 516.11 Da. This hypothesis was supported by recording the native MS profile of the partly denatured C9 (Figure 2f). Denaturation of C9 through the addition of 1% formic acid resulted in a release of Ca2+ and corresponding mass shifts of all ions in the spectrum. This indicates that C9 can bind two Ca2+ ions, whereby the most abundant proteoforms contain just one bound Ca2+ ion.
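The mass bookkeeping behind the Ca2+ assignment can be retraced with a few lines of arithmetic. The sketch below uses the masses quoted in the text; the variable names are illustrative, and the calcium mass is the standard atomic weight (40.078 Da) rather than the 40.076 Da quoted above, which does not change the result at two decimals. The comparison with Ca − 2H is an added consistency check, not a step stated by the authors: if two of the 15 assumed protons are in fact one Ca2+, the apparent mass is too high by the mass of Ca minus two hydrogens.

```python
BACKBONE = 60954.02      # Da, disulfide-corrected C9 backbone (from the text)
PTM_MASS = 5522.01       # Da, Hex12HexNAc9NeuAc6 (from the text)
APPARENT_PTM = 5559.74   # Da, 66 513.76 - 60 954.02, all 15 charges taken as H+
CA = 40.078              # Da, standard atomic weight of calcium
H = 1.0079               # Da, hydrogen mass as used in the text

# Mass left unexplained by the glycans when all charges are treated as protons.
residual = APPARENT_PTM - PTM_MASS

# Shift expected if one Ca2+ replaces two of the assumed protons.
ca_replacing_two_h = CA - 2 * H

# Calculated mass of the proteoform including one bound Ca2+.
expected_with_ca = BACKBONE + PTM_MASS + CA

# Charge balance: 13 protons plus one doubly charged calcium ion.
total_charge = 13 * 1 + 1 * 2

print(round(residual, 2))            # 37.73
print(round(ca_replacing_two_h, 2))  # 38.06
print(round(expected_with_ca, 2))    # 66516.11
print(total_charge)                  # 15
```

The calculated 66 516.11 Da matches the value given in the text, and the residual of 37.73 Da lies close to the 38.06 Da expected for one Ca2+ replacing two protons.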

Analysis of N-Glycosylation Revealed a Noncanonical N-Glycosylation Site on Asn215

C9 is a glycoprotein containing two known canonical N-glycosylation sites (N256 and N394). Our native MS measurements suggest the presence of a novel third low abundant N-glycosylation site. To validate this finding, C9 was digested using different proteolytic enzymes and the resulting peptide mixtures were analyzed by peptide-centric LC-MS/MS. Data interpretation provided information about the site location, glycan type, composition, and abundance of all three N-glycans. Low energy HCD MS/MS spectra of the tryptic peptides with amino acid sequences FSYSKNETYQLFLSYSSK and AVNITSENLIDDVVSLIR clearly revealed the composition of the two known N-glycans on N256 and N394 (Figure S1a,b). Improved sequence coverage obtained by EThcD MS/MS further confirmed the amino acid sequence of these N-glycopeptides. In addition, we also identified the third N-glycosylation site at N215, using the tryptic peptide TSNFNAAISLK (N215). Low energy HCD (Figure 3a) and EThcD MS/MS (Figure 3b) spectra of this glycopeptide unambiguously confirmed its amino acid sequence, localization, and the composition of the N-glycan. Markedly, this newly determined N-glycosylation site does not adhere to the canonical N-X-S/T motif.
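That the N215 peptide lacks a canonical sequon can be checked directly on the three tryptic sequences quoted above. The following sketch scans each peptide for N-X-S/T with X not P; the regular expression and helper name are illustrative assumptions, and the check is limited to the peptide itself, so a sequon spanning a cleavage site would not be seen.

```python
import re

# N followed by any residue except P, then S or T (lookahead keeps overlaps visible).
SEQUON = re.compile(r"N(?=[^P][ST])")


def canonical_sites(peptide):
    """Return 1-based positions of Asn residues within an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(peptide)]


# Tryptic peptides quoted in the text, keyed by the site they carry.
peptides = {
    "N256": "FSYSKNETYQLFLSYSSK",
    "N394": "AVNITSENLIDDVVSLIR",
    "N215": "TSNFNAAISLK",
}

for site, sequence in peptides.items():
    print(site, canonical_sites(sequence))
# N256 [6]
# N394 [3]
# N215 []
```

The two known sites sit in N-E-T and N-I-T sequons, whereas neither Asn in TSNFNAAISLK is followed by S or T at the +2 position.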

Figure 3

Low energy HCD MS/MS (a) and EThcD MS/MS (b) spectra of the peptide harboring the novel, noncanonical N-glycosylation site at N215, derived by tryptic digestion of C9. Both spectra were acquired for the same precursor with m/z of 1124.13. Sequential fragmentation of the N-glycan moiety in spectrum (a) allowed deduction of its glycan composition, while the EThcD spectrum (b) provided confirmation of the peptide sequence and position of the N-glycan. “P” = peptide backbone of the glycopeptide.

Analysis of O-Glycosylation Confirmed the Presence of up to Three O-Glycans

Peptide-centric LC-MS/MS analysis also provided direct evidence of the existence of three O-glycosylation sites on C9. Figure S2 displays annotated EThcD MS/MS spectra of the C9 N-terminal peptide with amino acid sequence QYTTSYDPELTESSGSASHIDCR derived from proteolytic digestion of C9 with AspN and trypsin. This peptide contains an N-terminal Q and exhibits a significant level of cyclization to pyroglutamic acid. Such N-terminal modifications are frequently reported in LC-MS/MS analysis.[32] Manual inspection of the MS/MS spectra exposed the structural composition of the attached O-glycans. These were mucin type O-glycans with 0, 1, or 2 sialic acids connected to the core structure HexNAc1Hex1. The peptide was found to be modified with up to two O-glycans; however, their precise location could not be determined, as the fragmentation spectra lack sufficient signature ions. A peptide bearing the third O-glycosylation site, which is present in the native MS spectrum, was not detected in LC-MS/MS, presumably due to its low abundance.

C9 TSP Domain Harbors Two C-Mannosylation Sites

C9 contains a thrombospondin (TSP) domain, which may be C-mannosylated.[22] Looking back at the native MS profile of the intact C9, the most abundant peaks with m/z of 4415.89 and 4435.28 are each accompanied by a lower abundant form (4426.65 m/z and 4446.08 m/z) differing in mass by 162 Da. The same mass difference, which corresponds to the presence of a single Hex, can also be observed in the N-deglycosylated and the desialylated C9. Peptide-centric LC-MS/MS detected peptides originating from the MSPWSEWSQCDPCLR sequence. This peptide contains the well-known sequence motif WXXW, which is frequently mannosylated in proteins with TSP repeats.[33] Our EThcD MS/MS spectra unequivocally confirmed that in C9 W27 is fully occupied by C-mannosylation, whereas W30 is only partially occupied (Figure S3a,b).
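The 162 Da spacing discussed above, like the glycan shifts in the preceding sections, follows from multiplying the m/z spacing by the charge state. The sketch below converts peak pairs quoted in the text (all within the 15+ charge state) into mass differences and matches them to candidate compositions built from the average residue masses in the Methods. The candidate list, the helper names, and the 1.5 Da tolerance are illustrative assumptions.

```python
CHARGE = 15  # charge state of the peaks quoted in the text

# Candidate mass shifts (Da) from the average residue masses in the Methods.
HEX, HEXNAC, NEUAC = 162.1424, 203.1950, 291.2579
CANDIDATES = {
    "Hex1": HEX,
    "HexNAc1Hex1NeuAc1": HEXNAC + HEX + NEUAC,
    "HexNAc1Hex1NeuAc2": HEXNAC + HEX + 2 * NEUAC,
    "HexNAc4Hex5NeuAc2": 4 * HEXNAC + 5 * HEX + 2 * NEUAC,
}


def mass_shift(mz_low, mz_high, z=CHARGE):
    """Mass difference between two peaks of the same charge state."""
    return (mz_high - mz_low) * z


def assign(shift, tolerance=1.5):
    """Return the closest candidate composition, or None if outside tolerance."""
    name, mass = min(CANDIDATES.items(), key=lambda item: abs(item[1] - shift))
    return name if abs(mass - shift) <= tolerance else None


# Peak pairs (m/z) quoted in the Results.
pairs = [
    (4415.89, 4426.65),  # C-mannosylation
    (4435.28, 4479.02),  # monosialylated O-glycan
    (4435.28, 4498.41),  # disialylated O-glycan
    (4435.28, 4582.33),  # third N-glycan
]

assignments = [assign(mass_shift(low, high)) for low, high in pairs]
for (low, high), name in zip(pairs, assignments):
    print(f"{low} -> {high}: {mass_shift(low, high):.1f} Da, {name}")
```

Because the peak positions are quoted to two decimals and multiplied by 15, the computed shifts deviate from the theoretical values by up to about 1 Da, which is why a fairly loose tolerance is assumed here.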

Defining the Overall Structural Heterogeneity, Validation, and Quantification of C9 Proteoforms

Next, we also assessed the relative abundance of the different proteoforms of C9. Figure 4a shows an overview of all detected C9 proteoforms in the annotated native spectrum. The site-specific characterization of C9 is based on peptide-centric proteomics data (Figure 4b, Table S1). Two N-glycosylation sites (N256 and N394) are fully occupied by complex biantennary N-glycans with sialylated antennas, each with a low degree of structural heterogeneity (only ∼4% of the N-glycans at N256 have one of the antennas nonsialylated). The third novel noncanonical N-glycosylation site at N215 is occupied only in ∼1% of C9, carrying the same type of glycan as the other two N-glycosylation sites. In contrast to the N-glycosylation, the O-glycosylation exhibits higher variability in its degree of sialylation. EThcD MS/MS spectra conclusively confirmed the composition of the presented O-glycan chains. The N-terminal peptides that contain the amino acid sequence QYTTSYDPELTESSGSASHIDCR possess mucin type O-glycans with different levels of sialylation. In 36% of C9, the peptide is occupied by an O-glycan with a sialic acid attached to both glycan units, HexNAc and Hex. The most abundant form (52%) represents O-glycan chains with one sialic acid attached to Hex. The lowest abundant form (8%) contains a nonsialylated O-glycan core. Unfortunately, our EThcD MS/MS spectra did not reveal the exact location of the O-glycosylated sites; however, the observed dual series of c, z, y, and b ions indirectly suggest that T11 is likely occupied (Figure S2b–d). The O-glycopeptide with two attached O-glycans is present at less than 3% relative abundance. The N-terminal peptide was also found to be unmodified (Figure S2a); however, its abundance is almost negligible (0.5%). Moreover, C-mannosylation contributes to the C9 heterogeneity with a variable occupancy of W residues by Man in the sequence motif WXXW. Tryptophan 27 is 100% occupied by Man, while W30 is only partially modified (23%).

Figure 4

Summary of assignments of PTMs on C9. (a) Native MS spectrum of C9, zoomed in on charge state [M + 13H + Ca2+]15+. The overall PTM composition of the most abundant proteoform with m/z of 4435.28 was deduced as described in Figure 2. Mass differences among the peaks correspond to various glycan units or glycans. (b) Relative abundances of peptide proteoforms were estimated from their corresponding peptide ion currents (XICs). Each PTM modified peptide was normalized individually so that the sum of all proteoforms was set to 100%. For clarity, only parts of the peptide sequence carrying PTMs are shown below the graph. (c) A comparison of the intact C9 native MS spectrum with the in silico constructed spectrum based on the peptide-centric proteomics data. The correlation is very high (R = 0.92). (d) Structural model of poly-C9[16] and mono-C9, whereby the sites of the modifications are indicated. The poly-C9 model was chosen as template for the mono-C9 using I-TASSER,[44] and the model was processed by means of the PyMOL Molecular Graphics System, Version 1.8 Schrödinger, LLC. (e) Overview of the C9 sequence with all identified PTM sites, wherein the newly discovered O-glycosylation sites at the N-terminus and the novel N-glycosylation site at residue 236 are highlighted in orange and purple, respectively.

A comparison of the intact C9 native MS spectrum with the in silico constructed MS spectrum (Figure 4c) reveals a high degree of consistency between the native MS and peptide-centric MS approaches (R = 0.92). Using this method, 15 C9 proteoforms could be validated (Table S2). The unmatched peaks mostly correspond to C9 proteoforms containing two Ca2+ ions and species that were not detected during the peptide-centric analysis. For example, the peak at m/z 4479.08 most likely corresponds to the C9 proteoform containing two O-glycans with one and two sialic acids on the core HexNAc1Hex1. Since a peptide with this PTM composition was not detected by LC-MS/MS, the reconstructed native-like spectrum does not contain this proteoform. Several nonannotated low abundant ion signals correspond to Na+ and/or K+ adducts, which are very frequent artifacts in native MS. The constructed spectrum also contains systematic artifacts, caused by the fact that labile PTMs are easily lost during the peptide-centric LC-MS/MS analysis.[34] In our case, the most prevalent artifacts are glycopeptides that partly lose their sialic acids and O-linked glycan chains. Other MS signals with relative abundance below 0.5% were eventually filtered out during the validation process. This cross-correlation of the data pinpoints the advantages of uniting native MS and peptide-centric LC-MS/MS, as well as the weaknesses associated with the use of only a peptide-centric approach. Without the information obtained from the native MS measurements, the undesired peptide artifacts and modifications that are not detected or overlooked at the peptide-centric level would not have been discovered. Still, the consistency between both data sets is very high (R = 0.92), verifying that we did cover and validate most of the detected signals of the C9 proteoforms and their PTMs in a comprehensive manner.

Discussion

All previous reports on C9 PTMs were largely focused on only one type of PTM. From these studies, it has been known that C9 is N-glycosylated[19−21] and C-mannosylated[22] with 8% of the total mass of C9 originating from these attached glycans.[35] The majority of these glycan chains are distributed between the two previously known N-glycosylation sites (N256 and N394); however, their exact monosaccharide composition had so far remained elusive. An earlier study suggested that the N-linked glycans of C9 are of the tri- or tetra-antennary complex type since neither endo F nor endo H released the glycans.[21] Here, we unambiguously determined the composition of these N-glycans using low-energy HCD. In agreement with previous observations, the N-deglycosylation reaction proceeded sequentially and in a time-dependent manner.[21] Incubation of C9 with PNGase F for 4 h at 37 °C resulted in the cleavage of a single N-linked glycan. The second N-glycan (N256) was released only after prolonged incubation (48 h). Surprisingly, the native MS profile of N-deglycosylated C9 was shifted in mass but did not substantially change in comparison with the native MS profile of the intact C9, indicating that the N-glycosylation pattern of C9 is largely homogeneous.

As C9 contains only two canonical N-glycosylation sequons, the observation of a third N-glycosylation site, located at N215 (NAA), was not expected. The N-glycosylation sequon N-X-S/T is well established. However, a few reports have shown N-glycosylation in the sequence N-X-C[36−38] or N-X-V.[39,40] In this regard, a high-resolution structural model of the bacterial oligosaccharyltransferase (OTase) provided evidence that an amino acid in the +2 position increases the binding affinity of N to the active site of OTase, but it is not the key requirement for glycosylation.
Rather, key requirements are that the position of the N residue be at the surface of a protein, neutrality of the amino acid in the +2 position, and compatibility with correct folding.[41] Nevertheless, our findings seem to represent a unique case, as an N-glycan attached to the N in the NAA sequence has, as far as we know, no analogy in the available literature. All three N-glycans found are of the biantennary complex type and vary only slightly in their degree of terminal sialylation. Although experimental evidence about N-glycosylation is available mostly for human C9, sequence analysis of C9 from various species and their alignment provided an interesting view on C9 glycosylation in general (see Figure S4).

Although the presence of O-glycans on C9 was proposed more than 20 years ago,[21] no direct evidence has been previously reported. Mucin-type O-linked glycan chains, which were found in this study, are mostly located at the N-terminal part of C9. The amino acid sequence QYTTSYDPELTESSGSASHIDCR of the N-terminal peptide does not seem to contain any distinct sequence motif for O-glycosylation; however, our integrated hybrid MS approach unambiguously demonstrated that O-glycans are attached to C9. The highly labile nature of O-glycans and their typically weak ionization response may have hampered their detection in earlier MS analyses. EThcD MS/MS spectra of the C9 O-glycosylated peptides revealed compositions of the O-linked glycan chains that are consistent with a core 1 type (Galβ1-3GalNAc, also known as T-antigen). This structure is further terminally modified by sialylation, whereby the disialyl core 1 type was found to be the most abundant. Here, we report O-glycosylation of C9 for the first time, so the possible functional significance of these modifications remains speculative.
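To make the sequon argument above concrete, the following Python sketch scans a sequence for canonical N-X-S/T sequons and labels the tripeptide at a given asparagine. It is only an illustration: the exclusion of proline at the X position is the usual textbook convention rather than something stated in this paper, the function names are hypothetical, and the toy sequence is invented and is not the C9 sequence.

```python
import re

# Canonical sequon: Asn, any residue except Pro (conventional rule,
# an assumption here), then Ser or Thr. The lookahead keeps matches
# zero-width so that overlapping sequons are all reported.
CANONICAL = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq):
    """Return 1-based positions of Asn residues in canonical sequons."""
    return [m.start() + 1 for m in CANONICAL.finditer(seq)]


def classify_asn(seq, pos):
    """Label the tripeptide that starts at the 1-based Asn position `pos`."""
    tri = seq[pos - 1:pos + 2]
    if len(tri) < 3 or tri[0] != "N":
        raise ValueError("position must be an Asn followed by two residues")
    if tri[1] != "P" and tri[2] in "ST":
        return "canonical N-X-S/T"
    if tri[2] == "C":
        return "atypical N-X-C"
    if tri[2] == "V":
        return "atypical N-X-V"
    return "non-canonical"


# Invented toy sequence containing one NAA and one NGS tripeptide.
toy = "GANAAKLNGSQ"
print(find_sequons(toy))     # positions of canonical sequons
print(classify_asn(toy, 3))  # the NAA tripeptide
```

A motif-based predictor of this kind would report only the NGS site in the toy sequence and miss the NAA tripeptide entirely, which is why a site such as the one described here can only be found experimentally.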
Interestingly, it is known that the domain within the first 16 amino acids at the N-terminus of C9 is crucial in regulating the self-polymerization of C9.[25] Since the N-termini and also the putative O-glycosylation sites are not highly conserved among mammals (Figure S4), it seems unlikely that the O-glycans would play a significant role in the polymerization process, unless this is species specific as well. O-Linked glycans can also bind bacterial carbohydrate-binding receptors.[42] It is therefore tempting to speculate that the O-glycans on C9 in the MAC may interact with receptors or glycans at the surface of encountered pathogens.

The third type of glycosylation harbored on C9 is C-mannosylation. This less common glycosylation has been previously found to occur on C9 using MS, Edman degradation, and nuclear magnetic resonance (NMR).[22] It was already known that the N-terminal type-1 TSP domain of C9 can be modified with two mannoses. Here, we report data supporting these findings. While in human C9 W27 is 100% occupied, W30 is C-mannosylated in only ∼23% of the C9 molecules.

All these PTMs are covalently attached to the backbone sequence of C9. Our native MS data provided additional information regarding the noncovalent binding of Ca2+ to C9. The only direct experimental evidence for the binding of Ca2+ by intact C9 was reported two decades ago using plasma emission spectroscopy. It was determined that C9 binds one Ca2+ ion per molecule with a dissociation constant of ∼3 μM at physiological pH and ionic strength.[43] Here, we confirm the findings of that earlier report. Notably, standard bottom-up/middle-down proteomics approaches are not capable of identifying metal ion binding, because such analyses are performed under denaturing conditions. Our data indicate that C9 can possibly bind up to two Ca2+ ions.
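The charge-state arithmetic behind assigning Ca2+-bound species in a native MS spectrum can be sketched as follows. This is an illustrative calculation, not the authors' procedure: it assumes that each bound Ca2+ contributes two of the charges in place of two protons (as in the [M + 13H + Ca2+]15+ annotation of the figure), it uses approximate average masses, and the function names are hypothetical.

```python
# Approximate average masses in Da (assumed values; intact-protein work
# is normally done with average rather than monoisotopic masses).
M_PROTON = 1.00739  # average H (1.00794) minus one electron
M_CA2 = 40.0769     # average Ca (40.078) minus two electrons


def adduct_mz(neutral_mass, charge, n_ca=0):
    """m/z of [M + (z - 2n)H + nCa]z+, each Ca2+ carrying two charges."""
    n_h = charge - 2 * n_ca
    if n_h < 0:
        raise ValueError("more Ca2+ charges than the total charge state")
    return (neutral_mass + n_h * M_PROTON + n_ca * M_CA2) / charge


def neutral_mass(mz, charge, n_ca=0):
    """Invert adduct_mz: neutral mass of M from an observed m/z."""
    n_h = charge - 2 * n_ca
    if n_h < 0:
        raise ValueError("more Ca2+ charges than the total charge state")
    return mz * charge - n_h * M_PROTON - n_ca * M_CA2


# Most abundant proteoform from the figure: m/z 4435.28 at 15+ with one Ca2+.
mass = neutral_mass(4435.28, 15, n_ca=1)
print(round(mass, 1))

# Expected m/z spacing between the one-Ca2+ and two-Ca2+ forms at 15+.
spacing = adduct_mz(mass, 15, n_ca=2) - adduct_mz(mass, 15, n_ca=1)
print(round(spacing, 2))
```

With these assumed masses, a second Ca2+ shifts a 15+ peak by roughly 2.5 m/z units, so resolving such species from neighboring adduct peaks depends on the resolution of the native MS measurement.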
The exact site of Ca2+ binding in C9 has not been identified yet; nevertheless, one of the possible candidate sites is located in the cysteine-rich LDLRA domain of C9.[43] Another possible Ca2+ binding site is at the N-terminal part of C9, which also harbors O-glycosylation sites. The 12 N-terminal amino acids of C9 exhibit a strong negative charge, and the 16 N-terminal amino acids contain a consensus sequence for Ca2+ binding proteins.[25] As mentioned above, the N-terminal region of C9, located at the outside of the MAC (Figure d,e), has been shown to play a significant role in C9 polymerization, hinting at a possible regulating role of the O-glycosylation and Ca2+ binding identified here in the assembly of the MAC or in interactions with other molecules or receptors.

In conclusion, we provide an unbiased, detailed specification of three different types of glycosylation on C9 isolated from human blood serum, together with the quantification and validation of 15 C9 proteoforms. In total, we achieved more than 90% correlation between the native MS data and the peptide-centric data. Mapping all sites of modifications on a structural model of C9, as present in the MAC, hints at their putative roles in pore formation or receptor interactions. In general, the applied combination of MS methods represents a powerful tool for the in-depth analysis of plasma proteins and thus may advance biomarker discovery.

43 in total

Review 1. Protein glycosylation: nature, distribution, enzymatic formation, and disease implications of glycopeptide bonds.

Authors: Robert G Spiro
Journal: Glycobiology Date: 2002-04 Impact factor: 4.313

Review 2. The impact of glycosylation on the biological function and structure of human immunoglobulins.

Authors: James N Arnold; Mark R Wormald; Robert B Sim; Pauline M Rudd; Raymond A Dwek
Journal: Annu Rev Immunol Date: 2007 Impact factor: 28.527

3. Multiple dimeric forms of human CD69 result from differential addition of N-glycans to typical (Asn-X-Ser/Thr) and atypical (Asn-X-cys) glycosylation motifs.

Authors: B A Vance; W Wu; R K Ribaudo; D M Segal; K P Kearse
Journal: J Biol Chem Date: 1997-09-12 Impact factor: 5.157

4. Fucosylated glycoproteomic approach to identify a complement component 9 associated with squamous cell lung cancer (SQLC).

Authors: Arul Narayanasamy; Jung-Mo Ahn; Hye-Jin Sung; Deok-Hoon Kong; Kwon-Soo Ha; Soo-Youn Lee; Je-Yoel Cho
Journal: J Proteomics Date: 2011-08-03 Impact factor: 4.044

5. Byonic: advanced peptide and protein identification software.

Authors: Marshall Bern; Yong J Kil; Christopher Becker
Journal: Curr Protoc Bioinformatics Date: 2012-12

Review 6. O-linked protein glycosylation structure and function.

Authors: E F Hounsell; M J Davies; D V Renouf
Journal: Glycoconj J Date: 1996-02 Impact factor: 2.916

7. High Resolution CZE-MS Quantitative Characterization of Intact Biopharmaceutical Proteins: Proteoforms of Interferon-β1.

Authors: David R Bush; Li Zang; Arseniy M Belov; Alexander R Ivanov; Barry L Karger
Journal: Anal Chem Date: 2015-12-24 Impact factor: 6.986

Review 8. The diverse and expanding role of mass spectrometry in structural and molecular biology.

Authors: Philip Lössl; Michiel van de Waterbeemd; Albert Jr Heck
Journal: EMBO J Date: 2016-10-26 Impact factor: 11.598

9. An artifact in LC-MS/MS measurement of glutamine and glutamic acid: in-source cyclization to pyroglutamic acid.

Authors: Preeti Purwaha; Leslie P Silva; David H Hawke; John N Weinstein; Philip L Lorenzi
Journal: Anal Chem Date: 2014-06-05 Impact factor: 6.986

10. Structure of the poly-C9 component of the complement membrane attack complex.

Authors: Natalya V Dudkina; Bradley A Spicer; Cyril F Reboul; Paul J Conroy; Natalya Lukoyanova; Hans Elmlund; Ruby H P Law; Susan M Ekkel; Stephanie C Kondos; Robert J A Goode; Georg Ramm; James C Whisstock; Helen R Saibil; Michelle A Dunstone
Journal: Nat Commun Date: 2016-02-04 Impact factor: 14.919
