Heeyoun Hwang, Hoi Keun Jeong, Hyun Kyoung Lee, Gun Wook Park, Ju Yeon Lee, Soo Youn Lee, Young-Mook Kang, Hyun Joo An, Jeong Gu Kang, Jeong-Heon Ko, Jin Young Kim, Jong Shin Yoo.
Abstract
Protein glycosylation is known to be involved in biological processes such as cell recognition, growth, differentiation, and apoptosis. Fucosylation plays an important role in the structural stability and function of N-linked glycoproteins. Although many biological and clinical studies of protein fucosylation by fucosyltransferases have been reported, structural classification of fucosylated N-glycoproteins into isoforms such as core or outer remains a challenge. Here, we report for the first time the classification of N-glycopeptides as core- and outer-fucosylated types using tandem mass spectrometry (MS/MS) and machine learning algorithms such as the deep neural network (DNN) and support vector machine (SVM). Training and test sets of more than 800 MS/MS spectra of N-glycopeptides from immunoglobulin gamma and alpha-1-acid glycoprotein standards were selected for classification of the fucosylation types using supervised learning models. The best-performing model had an accuracy of more than 99% against manual characterization and area under the curve values greater than 0.99, which were calculated from probability scores of the target and decoy datasets. Finally, this model was applied to classify fucosylated N-glycoproteins from human plasma. A total of 82 N-glycopeptides, with 54 core-, 24 outer-, and 4 dual-fucosylation types derived from 54 glycoproteins, were classified as the same type by both the DNN and SVM. Specifically, outer fucosylation was dominant in tri- and tetra-antennary N-glycopeptides, while core fucosylation was dominant in the mono-antennary, bi-antennary, and hybrid types of N-glycoproteins in human plasma. Thus, machine learning methods can be combined with MS/MS to distinguish between different isoforms of fucosylated N-glycopeptides.
Year: 2020 PMID: 31941975 PMCID: PMC6962204 DOI: 10.1038/s41598-019-57274-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Construction of training and test sets of glycopeptide spectra matches (GSMs) of N-glycopeptides identified from IgG and AGP standards, and classification of their fucosylation types both manually and by machine learning methods: the support vector machine (SVM) and deep neural network (DNN).
| N-glycoproteins (IgG & AGP standards) | Training set (433 GSMs) | | | | Test set (393 GSMs) | | | |
|---|---|---|---|---|---|---|---|---|
| Classification method | None (%) | Core (%) | Outer (%) | Dual (%) | None (%) | Core (%) | Outer (%) | Dual (%) |
| Manual classification | 170 (39.2%) | 106 (24.5%) | 89 (20.5%) | 68* (15.7%) | 162 (41.2%) | 70 (17.8%) | 96 (24.4%) | 65* (16.5%) |
| SVM classification | 170 (39.2%) | 106 (24.5%) | 89 (20.5%) | 68 (15.7%) | 163 (41.5%) | 70 (17.8%) | 97 (24.9%) | 62 (15.8%) |
| DNN classification | 170 (39.2%) | 106 (24.5%) | 89 (20.5%) | 68 (15.7%) | 163 (41.5%) | 70 (17.8%) | 98 (24.9%) | 62 (15.8%) |
*GSMs with dual fucosylation from AGP standard proteins were identified in additional experiments and added to the training and test sets.
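The percentages in the table are class counts divided by the set size. As a minimal sketch, the manual test-set row can be recomputed as follows; the function name `class_percentages` is illustrative and not from the source.

```python
def class_percentages(counts):
    """Return each class count as a percentage of the set total, to 1 decimal."""
    total = sum(counts.values())
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}


# Manual classification counts for the test set (393 GSMs), taken from the table.
manual_test = {"none": 162, "core": 70, "outer": 96, "dual": 65}

print(class_percentages(manual_test))
# {'none': 41.2, 'core': 17.8, 'outer': 24.4, 'dual': 16.5}
```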
Figure 1. The computational workflow for classifying the fucosylation of N-glycopeptides using machine learning. The relative intensities of 14 fucosylation features extracted from CID tandem MS spectra of identified N-glycopeptides were calculated and used to classify fucosylation using the DNN and SVM. Training and testing data sets were constructed with N-glycopeptides identified from standard IgG and AGP glycoproteins using IQ-GPA. The DNN and SVM models were constructed with TensorFlow (ver. 0.12.0) and the R package e1071 (ver. 3.4.3), respectively. The best-performing model was selected from each machine learning method, and classified N-glycopeptides were filtered with <1% FDR using a random decoy. Finally, the DNN and SVM were used to classify an unknown data set from human plasma according to four types of fucosylation: none, core, outer, and dual. Green circles = mannose; yellow circles = galactose; blue squares = N-acetylglucosamine; red triangles = fucose; and pink diamonds = N-acetylneuraminic acid.
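The feature-extraction step in this workflow can be sketched roughly as below. The 14 features used in the study are not listed in this record, so the four oxonium ions in `FEATURE_IONS`, the m/z tolerance, and normalization to the base peak are all assumptions made for illustration, as are the function and variable names.

```python
from typing import Dict, List, Tuple

# Hypothetical feature list: commonly cited singly protonated oxonium ions.
# These are NOT the 14 features of the study, which this record does not give.
FEATURE_IONS: Dict[str, float] = {
    "HexNAc": 204.0867,
    "HexNAc+Fuc": 350.1446,
    "Hex+HexNAc": 366.1395,
    "Hex+HexNAc+Fuc": 512.1974,
}


def relative_intensities(
    peaks: List[Tuple[float, float]],
    feature_ions: Dict[str, float] = FEATURE_IONS,
    tol: float = 0.02,  # assumed m/z tolerance in Da
) -> Dict[str, float]:
    """Map each feature ion to its intensity relative to the base peak.

    A feature with no peak inside the tolerance window gets 0.0.
    """
    if not peaks:
        return {name: 0.0 for name in feature_ions}
    base = max(intensity for _, intensity in peaks)
    features = {}
    for name, target_mz in feature_ions.items():
        matched = [i for mz, i in peaks if abs(mz - target_mz) <= tol]
        features[name] = max(matched) / base if matched and base > 0 else 0.0
    return features


# Toy spectrum as (m/z, intensity) pairs; values are made up for the example.
spectrum = [(204.087, 1000.0), (350.145, 250.0), (366.140, 500.0), (657.235, 100.0)]
features = relative_intensities(spectrum)
print(features)
```

A vector of this kind, one value per feature, is the sort of input a classifier such as an SVM or DNN would take for each GSM.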
Comparison of Pscore histograms from the classification of fucosylation types between selected machine learning models of the deep neural network (DNN) and support vector machine (SVM).
| | Training set (433 GSMs) | | Test set (393 GSMs) | | Unknown set (671 GSMs) | |
|---|---|---|---|---|---|---|
| | DNN | SVM | DNN | SVM | DNN | SVM |
| AUC* | 0.999 | 0.994 | 0.999 | 0.998 | 0.998 | 0.986 |
| Pscore cut <1% FDR** | 4.623 | 0.982 | 5.559 | 0.303 | 3.415 | 0.692 |
| Filtered GSMs*** | 433 | 417 | 391 | 387 | 657 | 626 |
| Union of filtered GSMs**** (TP/FP) | 433 (433/0) | | 392 (388/4) | | 638 (626/12) | |
| Sensitivity (TP/(TP + FN)) | 100% (433/433) | | 100% (388/388) | | 99.21% (626/631) | |
| Accuracy | 100% | | 99.75% | | 97.47% | |
*Area under the curve (AUC) values were calculated from receiver operating characteristic curves between the target and decoy.
**Pscore cutoffs at less than 1% FDR between the target and decoy. Pscores were calculated as the natural logarithm of the difference between the first- and second-ranked probabilities for classification of the fucosylation types.
***Number of glycopeptide spectra matches (GSMs) remaining after filtering at less than 1% FDR between the target and decoy.
****Number of GSMs in the union of those classified by the DNN and SVM, each filtered at less than 1% FDR between the target and decoy.
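The quantities in these footnotes can be sketched as below, following the wording literally. The FDR estimate (decoys passing divided by targets passing) and the rank-based AUC are assumptions, since this record does not state how either was computed, and all function names are illustrative. Note that ln(p1 − p2) is at most 0 when the inputs are probabilities in [0, 1], so the positive cutoffs in the table suggest the published scores use a scale not described here; the sketch is not expected to reproduce those values.

```python
import math


def pscore(probabilities):
    """ln(p1 - p2) for the two highest class probabilities, per the footnote."""
    ranked = sorted(probabilities, reverse=True)
    gap = ranked[0] - ranked[1]
    return math.log(gap) if gap > 0 else float("-inf")


def fdr_cutoff(target_scores, decoy_scores, max_fdr=0.01):
    """Lowest target score t where (#decoys >= t) / (#targets >= t) < max_fdr.

    Assumed FDR estimate. The ratio need not be monotone in t, so this returns
    the first threshold that passes when scanning from the most permissive.
    """
    for t in sorted(set(target_scores)):
        n_target = sum(s >= t for s in target_scores)
        n_decoy = sum(s >= t for s in decoy_scores)
        if n_target and n_decoy / n_target < max_fdr:
            return t
    return None


def auc(target_scores, decoy_scores):
    """Area under the ROC curve as the probability that a target outscores
    a decoy, counting ties as one half (Mann-Whitney formulation)."""
    wins = 0.0
    for t in target_scores:
        for d in decoy_scores:
            wins += 1.0 if t > d else 0.5 if t == d else 0.0
    return wins / (len(target_scores) * len(decoy_scores))


# Toy values, made up for the example.
print(pscore([0.90, 0.05, 0.03, 0.02]))        # ln(0.85)
print(fdr_cutoff([5, 4, 3, 2, 1], [1.5, 0.5]))  # 2
print(auc([3, 4], [1, 2]))                      # 1.0
```

The sensitivity row of the table is plain arithmetic on the union counts: for the unknown set, 626 / 631 ≈ 99.21%.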
Figure 2. Classification of fucosylated N-glycopeptides of (A) total proteins, (B) IgG, and (C) AGP in human plasma.
Figure 3. Classification of fucosylated N-glycopeptides of total proteins in human plasma by their (A) mono-antennary, (B) bi-antennary, (C) tri- and tetra-antennary, and (D) hybrid types.
Figure 4. Representative CID MS/MS spectra of N-glycopeptides classified as (A) core fucosylation with bi-antennary type (VCQDCPLLAPLNDTR_5_4_1_2) and (B) outer fucosylation with tri-antennary type (VCQDCPLLAPLNDTR_6_5_1_3) of alpha-2-HS glycoprotein in human plasma (green circle, mannose; yellow circle, galactose; blue square, N-acetylglucosamine; red triangle, fucose; pink diamond, N-acetylneuraminic acid; red arrow, fucosylation diagnostic ions; and red box, pair of fragmented ions with or without fucose).
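The glycopeptide labels in this caption can be split into a peptide sequence and four monosaccharide counts. The sketch below assumes the counts are ordered Hex, HexNAc, Fuc, NeuAc; that order is not stated in this record, although it is consistent with the bi- and tri-antennary assignments given for the two examples. The names `Glycopeptide` and `parse_label` are illustrative.

```python
from typing import NamedTuple


class Glycopeptide(NamedTuple):
    peptide: str
    n_hex: int      # hexose (mannose, galactose)
    n_hexnac: int   # N-acetylhexosamine
    n_fuc: int      # fucose
    n_neuac: int    # N-acetylneuraminic acid


def parse_label(label: str) -> Glycopeptide:
    """Parse 'PEPTIDE_a_b_c_d', assuming the order Hex_HexNAc_Fuc_NeuAc."""
    peptide, *counts = label.split("_")
    if len(counts) != 4:
        raise ValueError(f"expected 4 composition counts in {label!r}")
    return Glycopeptide(peptide, *(int(c) for c in counts))


core_example = parse_label("VCQDCPLLAPLNDTR_5_4_1_2")
outer_example = parse_label("VCQDCPLLAPLNDTR_6_5_1_3")
print(core_example)
print(outer_example)
```

Both examples carry a single fucose, which is why composition alone cannot separate core from outer fucosylation and fragment-ion evidence from the MS/MS spectrum is needed.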