Thejkiran Pitti, Ching-Tai Chen, Hsin-Nan Lin, Wai-Kok Choong, Wen-Lian Hsu, Ting-Yi Sung.
Abstract
N-linked glycosylation is one of the predominant post-translational modifications and is involved in a number of biological functions. Since experimental characterization of glycosites is challenging, glycosite prediction is crucial. Several predictors have been made available and report high performance. Most of them evaluate their performance at every asparagine in protein sequences, rather than only at asparagines within the N-X-S/T sequon. In this paper, we present N-GlyDE, a two-stage prediction tool trained on rigorously constructed non-redundant datasets to predict N-linked glycosites in the human proteome. The first stage uses a protein similarity voting algorithm trained on both glycoproteins and non-glycoproteins to predict a protein-level score that is used to improve glycosite prediction. The second stage uses a support vector machine to predict N-linked glycosites from features of gapped dipeptides, pattern-based predicted surface accessibility, and predicted secondary structure. N-GlyDE's final predictions are derived by adjusting the weights of the second-stage prediction results according to the first-stage prediction score. Evaluated on N-X-S/T sequons of an independent dataset comprising 53 glycoproteins and 33 non-glycoproteins, N-GlyDE achieves an accuracy of 0.740 and an MCC of 0.499, outperforming the compared tools. The N-GlyDE web server is available at http://bioapp.iis.sinica.edu.tw/N-GlyDE/.
Year: 2019 PMID: 31685900 PMCID: PMC6828726 DOI: 10.1038/s41598-019-52341-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
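The abstract restricts evaluation to asparagines inside the N-X-S/T sequon. As a minimal sketch of how such candidate sites could be located in a sequence, the Python snippet below scans for the motif. The function name `find_sequons` is hypothetical, and the exclusion of proline at the X position follows the common convention for this sequon rather than anything stated in the abstract.

```python
import re

# N-X-S/T sequon: asparagine, then any residue except proline (assumed
# convention), then serine or threonine. The lookahead keeps the match
# zero-width after N, so overlapping sequons such as "NNTS" are all found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of asparagines that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Toy sequence, not taken from the paper.
    print(find_sequons("MNASNPTQNNTS"))
```

In the toy sequence, the asparagine followed by proline is skipped, while the two adjacent asparagines near the end are both reported because each starts its own sequon.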
Figure 1. Schematic framework of N-GlyDE.
Figure 2. N-GlyDE first-stage prediction on 6195 proteins. The green and red bars indicate the percentage of N-linked glycoproteins and non-N-linked glycoproteins predicted at each interval. The blue line represents the total number of proteins observed across each interval.
Prediction performance of different predictors on the independent dataset.
| Predictors | Accuracy | Precision | Sensitivity | Specificity | MCC |
|---|---|---|---|---|---|
| N-GlyDE | 0.740 | 0.613 | 0.826 | 0.689 | 0.499 |
| GlycoMine | 0.725 | | 0.700 | | 0.430 |
| NetNGlyc | 0.572 | 0.460 | | 0.411 | 0.265 |
| GlycoEP_Std_PPP | 0.574 | 0.437 | 0.512 | 0.610 | 0.119 |
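The five columns above are standard binary-classification measures. The sketch below shows how they are conventionally computed from confusion-matrix counts; the function name `binary_metrics` and the example counts are hypothetical and are not the counts behind the table.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Compute the table's five measures from confusion-matrix counts."""
    total = tp + fp + tn + fn
    # Denominator of the Matthews correlation coefficient (MCC).
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # MCC is conventionally reported as 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Illustrative counts only.
    for name, value in binary_metrics(tp=40, fp=10, tn=40, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it informative when glycosites and non-glycosites are unevenly represented.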
Figure 3. ROC curves of N-GlyDE, GlycoMine, NetNGlyc, and GlycoEP_Std_PPP on the independent dataset. For each predictor, the area under the ROC curve is calculated.
Discriminative gapped dipeptides with high and low GDR.
| Positive-oriented gapped dipeptides | Top 10 GDR | Negative-oriented gapped dipeptides | Bottom 10 GDR |
|---|---|---|---|
| W5 | 2.401 | | 0.064 |
| | 2.198 | C4 | 0.542 |
| W11 | 1.96 | | 0.569 |
| Y7 | 1.96 | K11 | 0.574 |
| | 1.941 | | 0.589 |
| L0 | 1.923 | N0 | 0.595 |
| Y0 | 1.884 | M9 | 0.612 |
| | 1.794 | | 0.618 |
| W0 | 1.794 | K6 | 0.63 |
| H6 | 1.714 | | 0.643 |
Underlined and boldface asparagine corresponds to the sequon and is located at the center of a sequence window.
Figure 4. N-GlyDE prediction results for human VEGFR2, where sites with a prediction score above 0.6 (shown by the dotted line) are predicted as glycosites. Green bars represent the three glycosites with experimental evidence in UniProt. Glycosites annotated by sequence analysis in UniProt were recently validated by Chandler et al.’s mass spectrometry (MS) experiment on the extracellular domain of murine VEGFR2, which shares 86% sequence similarity with human VEGFR2. Blue bars represent the MS-validated glycosites; the brown bar (N631) represents a site undetected in the MS experiment; and the orange bar (N66) represents a sequon present only in human VEGFR2, not in the murine protein. Red bars represent the non-glycosites, which were not studied by Chandler et al. The number following ‘N’ is the asparagine position of the sequon in the sequence.
Figure 5. Illustrations of gapped dipeptides and sequence windows with lengths 3 ≤ w ≤ 11 used to derive secondary structure and surface accessibility features.
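To make the two notions in the Figure 5 caption concrete, the sketch below extracts a sequence window centred on a sequon asparagine and enumerates gapped dipeptides within it. It assumes that a gapped dipeptide is a pair of residues separated by d intervening positions, written here as residue, gap, residue (for example `N1S`), and that windows have odd length so the asparagine sits at the centre. The helper names, the `-` padding character, and the toy sequence are hypothetical; this is an illustration of the encoding idea, not N-GlyDE's actual feature extraction.

```python
from collections import Counter


def sequence_window(sequence, center, w):
    """Return the length-w window centred on 1-based position `center`.

    Assumes w is odd; positions beyond the sequence ends are padded with '-'.
    """
    half = w // 2
    padded = "-" * half + sequence + "-" * half
    start = center - 1  # index of the window's first residue in `padded`
    return padded[start:start + w]


def gapped_dipeptides(window, max_gap):
    """Count residue pairs separated by 0..max_gap intervening positions."""
    counts = Counter()
    for i, first in enumerate(window):
        for gap in range(max_gap + 1):
            j = i + gap + 1
            if j >= len(window):
                break
            second = window[j]
            if first != "-" and second != "-":  # skip padding positions
                counts[f"{first}{gap}{second}"] += 1
    return counts


if __name__ == "__main__":
    # Toy sequence with the sequon asparagine at position 4.
    window = sequence_window("ACDNGSK", center=4, w=5)
    print(window)
    print(sorted(gapped_dipeptides(window, max_gap=1).items()))
```

Window lengths from 3 to 11, as in the caption, could be produced by calling `sequence_window` with each odd `w` in that range.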