Qingbo Shu, Mengjie Li, Lian Shu, Zhiwu An, Jifeng Wang, Hao Lv, Ming Yang, Tanxi Cai, Tony Hu, Yan Fu, Fuquan Yang.
Abstract
Large-scale identification of N-linked intact glycopeptides in human serum by liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) is challenging because of the wide dynamic range of serum protein abundances, the lack of a complete serum N-glycan database and the existence of proteoforms. To address this, a spectral library search method is presented for the identification of N-linked intact glycopeptides from N-linked glycoproteins in human serum, with target-decoy and motif-specific false discovery rate (FDR) control. Serum proteins were first separated into low-abundance and high-abundance proteins by acetonitrile (ACN) precipitation. After digestion, the N-linked intact glycopeptides were enriched by hydrophilic interaction liquid chromatography (HILIC), and a portion of the enriched N-linked intact glycopeptides was treated with Peptide-N-Glycosidase F (PNGase F) to generate N-linked deglycopeptides. Both N-linked intact glycopeptides and deglycopeptides were analyzed by LC-MS/MS. From the N-linked deglycopeptide data sets, 764 N-linked glycoproteins, 1699 N-linked glycosites and 3328 unique N-linked deglycopeptides were identified. Four types of N-linked glycosylation motifs (NXS/T/C/V, X≠P) were used to recognize the N-linked deglycopeptides. The spectra of these N-linked deglycopeptides were used to construct an N-linked deglycopeptide spectral library and to identify N-linked intact glycopeptides. A database containing 739 N-glycan masses was constructed and used during the spectral library search for the identification of N-linked intact glycopeptides. In total, 526 N-linked glycoproteins, 1036 N-linked glycosites, 22,677 N-linked intact glycopeptides and 738 N-glycan masses were identified under 1% FDR, representing the most in-depth serum N-glycoproteome identified by LC-MS/MS at the N-linked intact glycopeptide level.
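The four motifs named in the abstract (N-X-S/T/C/V with X≠P) can be written as a single pattern. The following is a minimal sketch of such a motif scan, not the authors' implementation; the function name `find_glycosites` and the example sequences are hypothetical.

```python
import re
from typing import List

# Asn, then any residue except Pro, then Ser/Thr/Cys/Val.
# The lookahead keeps the match zero-width after N, so overlapping
# motifs (e.g. "NNTS") are each reported.
SEQUON = re.compile(r"N(?=[^P][STCV])")


def find_glycosites(sequence: str) -> List[int]:
    """Return 1-based positions of Asn residues inside an N-X-S/T/C/V (X != P) motif."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical sequences, used only to exercise the pattern.
    for seq in ("AANGSK", "NPS", "NNTS", "ANAV"):
        print(seq, find_glycosites(seq))
```

A scan like this only flags candidate sites by sequence; it says nothing about whether a site is actually occupied.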
Keywords: Glycomics; N-linked intact glycopeptide; bioinformatics software; co-elution; glycoprotein pathways; glycoproteins; glycoproteomics; human serum; isotopic distribution; mass spectrometry; spectral library search
Year: 2020 PMID: 32102970 PMCID: PMC7124471 DOI: 10.1074/mcp.RA119.001791
Source DB: PubMed Journal: Mol Cell Proteomics ISSN: 1535-9476 Impact factor: 5.911
Fig. 1. Experimental design. A, Serum is processed with or without ACN precipitation and high pH RPLC fractionation. The N-glycan is shown as a dark green star and the peptide backbone as a line. B, Data analysis workflow for N-linked intact glycopeptide identification.
Statistics of identifications from N-linked deglycopeptide samples
| Indicator | UDGP data set | FDGP data set | Combined | % of identifications increased by pre-fractionation |
|---|---|---|---|---|
| Glycoproteins | 219 | 724 | 764 | 249 |
| Glycosites | 444 | 1536 | 1699 | 283 |
| Deglycopeptides | 915 | 2724 | 3328 | 264 |
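The percentage column is not defined in this record. Read as the relative gain of the combined result over the UDGP data set (an inference from the numbers, assuming UDGP is the data set without pre-fractionation), it can be reproduced with the short sketch below; the function name `percent_increase` is hypothetical.

```python
def percent_increase(baseline: int, combined: int) -> int:
    """Relative gain of `combined` over `baseline`, rounded to a whole percent."""
    return round(100 * (combined - baseline) / baseline)


# (UDGP, combined, reported %) taken from the table above.
DEGLYCO_TABLE = {
    "Glycoproteins": (219, 764, 249),
    "Glycosites": (444, 1699, 283),
    "Deglycopeptides": (915, 3328, 264),
}

if __name__ == "__main__":
    for indicator, (udgp, combined, reported) in DEGLYCO_TABLE.items():
        computed = percent_increase(udgp, combined)
        print(f"{indicator}: computed {computed}%, reported {reported}%")
```

Under this reading, all three reported percentages match the computed values.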
Fig. 2. Overlap analysis of the two glycan mass databases used and the N-glycan masses identified. A, Shared and unique N-glycan masses between the 194 N-glycan mass database and the 701 N-glycan mass database. B, N-glycan masses commonly identified using the 194 and 701 N-glycan mass databases in the FGP and UGP data sets. C, D, Scatter plots of the median pMatchGlyco score and the number of GPSMs for each identified N-glycan mass in the two serum data sets using the 739 N-glycan masses.
Statistics of identifications from N-linked intact glycopeptide samples using 701 N-glycan masses
| Indicator | UGP data set | FGP data set | Combined | Identifications increased by pre-fractionation |
|---|---|---|---|---|
| Glycoprotein | 373 | 451 | 521 | 148 |
| Glycosite | 676 | 856 | 1030 | 354 |
| Glycopeptide | 7141 | 17,482 | 22,194 | 15,053 |
Statistics of identifications from N-linked intact glycopeptide samples using 194 N-glycan masses
| Indicator | UGP data set | FGP data set | Combined | Identifications increased by pre-fractionation |
|---|---|---|---|---|
| Glycoprotein | 381 | 415 | 507 | 126 |
| Glycosite | 690 | 767 | 981 | 291 |
| Glycopeptide | 5393 | 10,448 | 13,874 | 8481 |
Fig. 3. Overlap of the identified serum glycoproteins and glycosites using the two glycan mass databases.
Statistics of identifications from N-linked glycopeptide samples using 739 N-glycan masses
| Indicator | UGP data set | FGP data set | Combined | Identifications increased by pre-fractionation |
|---|---|---|---|---|
| Glycoprotein | 378 | 446 | 526 | 148 |
| Glycosite | 685 | 847 | 1036 | 351 |
| Glycopeptide | 7254 | 17,854 | 22,677 | 15,423 |
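In the three intact glycopeptide tables, the last column is consistent with the absolute difference between the combined count and the UGP count. This is an inference from the values, not a definition given in the source. A minimal sketch that checks this reading against every row (the names `TABLES` and `check_tables` are hypothetical):

```python
# {glycan database size: {indicator: (UGP, combined, reported increase)}}
TABLES = {
    701: {
        "Glycoprotein": (373, 521, 148),
        "Glycosite": (676, 1030, 354),
        "Glycopeptide": (7141, 22194, 15053),
    },
    194: {
        "Glycoprotein": (381, 507, 126),
        "Glycosite": (690, 981, 291),
        "Glycopeptide": (5393, 13874, 8481),
    },
    739: {
        "Glycoprotein": (378, 526, 148),
        "Glycosite": (685, 1036, 351),
        "Glycopeptide": (7254, 22677, 15423),
    },
}


def check_tables(tables):
    """Return (database, indicator) pairs where combined - UGP != reported increase."""
    mismatches = []
    for database, rows in tables.items():
        for indicator, (ugp, combined, reported) in rows.items():
            if combined - ugp != reported:
                mismatches.append((database, indicator))
    return mismatches


if __name__ == "__main__":
    print(check_tables(TABLES))
```

An empty result means every reported increase equals the combined count minus the UGP count.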
Fig. 4. Validation of the proposed method using entrapment glycan masses. A, The distribution of the 739 N-glycan masses used for serum N-linked intact glycopeptide identification. B, The distribution of mass differences in the range of 0–150 Da among the 739 glycan masses. The inset shows the frequencies of mass differences in the range of 0–15 Da. C, D, Scatter plots of all matched glycan masses after including entrapment masses. All identified GPSMs in UGP (C) and FGP (D) using the 739 true N-glycan masses and 739 entrapment N-glycan masses are combined for plotting.
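Panel B concerns pairwise mass differences among glycan masses. The sketch below shows how such differences can be computed for a few glycan compositions, using standard monoisotopic residue masses. The three compositions are illustrative assumptions, not entries taken from the 739-mass database, and the function names are hypothetical.

```python
from itertools import combinations

# Monoisotopic residue masses in Da (monosaccharide minus water).
RESIDUE_MASS = {
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "Fuc": 146.0579,
    "NeuAc": 291.0954,
}


def glycan_mass(composition):
    """Sum of residue masses for a composition such as {"Hex": 5, "HexNAc": 4}."""
    return sum(RESIDUE_MASS[residue] * count for residue, count in composition.items())


def pairwise_differences(masses, max_diff=150.0):
    """Sorted absolute differences between all mass pairs, keeping those <= max_diff."""
    diffs = (abs(a - b) for a, b in combinations(masses, 2))
    return sorted(d for d in diffs if d <= max_diff)


if __name__ == "__main__":
    # Illustrative compositions only.
    glycans = [
        {"Hex": 5, "HexNAc": 4, "NeuAc": 1},
        {"Hex": 5, "HexNAc": 4, "Fuc": 2},
        {"Hex": 5, "HexNAc": 4, "Fuc": 1},
    ]
    masses = [glycan_mass(g) for g in glycans]
    print([round(d, 4) for d in pairwise_differences(masses)])
```

The first two compositions differ by about 1.02 Da (one NeuAc versus two Fuc), which illustrates why small mass differences between glycans matter when matching by mass.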
Fig. 5. Validation of semi-tryptic digestion and 16 PTMs setting using glycoprotein standards and serum spiked with glycoprotein standards. Four different sets of search parameters are used to identify glycopeptides from five non-human glycoprotein standards, and their GPSMs (A), N-linked glycosites (B), N-linked intact glycopeptides (C) and number of N-glycans (D) are compared. HRP, horseradish peroxidase; IOVO, ovomucoid; Oval, ovalbumin; Ogchi, alpha-1-acid glycoprotein; QSOX1, sulfhydryl oxidase 1.
Fig. 6. Site-specific glycosylation of serum glycoproteins. The number of identified N-glycan masses on each of the eight glycosites of complement factor H (A) and on each of the four glycosites of serotransferrin (B) are shown. C, D, The overlap of N-glycan masses identified on the four glycosites N432, N630 and N637 of serotransferrin and on the four glycosites N529, N822, N882 and N1029 of complement factor H in different samples, including commercial transferrin, serum spiked with HRP and ovalbumin, and the UGP and FGP data sets. The pink ellipse represents the N-glycan masses from commercial serotransferrin. The purple ellipse represents the N-glycan masses from human serum spiked with HRP and ovalbumin. The cyan ellipse represents the N-glycan masses from the UGP data set. The yellow ellipse represents the N-glycan masses from the FGP data set. CFH, complement factor H; TRFE, serotransferrin.
Fig. 7. A glycopeptide spectrum identified by pMatchGlyco in the UGP data set. A, The spectrum is shown in the m/z range of 0–3000. Eight commonly observed oxonium ions are colored in blue and labeled with their m/z values. B, Extracted ion chromatograms of the first six isotopic peaks of this glycopeptide's precursor ion. C, Peak areas of the first six isotopic peaks in the four replicate LC runs in the UGP data set. D, Experimentally observed isotopic distribution of the glycopeptide precursor ion in Run_01. E, Theoretical isotopic distribution of the same precursor ion predicted based on its chemical composition.
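Panel E refers to a theoretical isotopic distribution predicted from chemical composition. One common way to obtain such a prediction is to convolve per-element isotope abundance patterns. The sketch below does this on a nominal-mass grid for the first six peaks. It is an illustration under stated assumptions, not the method used by the authors or by pMatchGlyco: the abundances are approximate natural values, and the example composition is hypothetical.

```python
# Approximate natural isotope abundances, indexed by extra neutrons (0, +1, +2, ...).
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def convolve(a, b, n_peaks):
    """Multiply two isotope patterns as polynomials, truncated to n_peaks terms."""
    out = [0.0] * n_peaks
    for i, x in enumerate(a[:n_peaks]):
        for j, y in enumerate(b[: n_peaks - i]):
            out[i + j] += x * y
    return out


def element_pattern(abundances, count, n_peaks):
    """Pattern for `count` atoms of one element, via exponentiation by squaring."""
    result = [1.0]
    base = list(abundances)
    while count:
        if count & 1:
            result = convolve(result, base, n_peaks)
        base = convolve(base, base, n_peaks)
        count >>= 1
    return result


def isotopic_distribution(composition, n_peaks=6):
    """Relative intensities of the first n_peaks isotopic peaks, normalized to sum to 1."""
    dist = [1.0]
    for element, count in composition.items():
        pattern = element_pattern(ISOTOPES[element], count, n_peaks)
        dist = convolve(dist, pattern, n_peaks)
    total = sum(dist)
    return [p / total for p in dist]


if __name__ == "__main__":
    # Hypothetical composition, chosen only to exercise the functions.
    example = {"C": 150, "H": 240, "N": 30, "O": 70, "S": 1}
    print([round(p, 4) for p in isotopic_distribution(example)])
```

Because only the first six peaks are kept, the intensities are normalized over that truncated window rather than over the full distribution, which suits a comparison against six observed peak areas.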