Fuyi Li, Chen Li, Jerico Revote, Yang Zhang, Geoffrey I Webb, Jian Li, Jiangning Song, Trevor Lithgow.
Abstract
Glycosylation plays an important role in cell-cell adhesion, ligand-binding and subcellular recognition. Current ap…
Year: 2016 PMID: 27708373 PMCID: PMC5052564 DOI: 10.1038/srep34595
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Overview of the GlycoMine framework.
The four major steps are denoted by different colors: dataset collection and preprocessing (blue), feature extraction (yellow), feature analysis and selection (red), and model evaluation (green).
Figure 2. Residue specificity and enrichment of sequons.
(a) N- and (b) O-linked glycosylation sites with the “human protein dataset” selected as the background set. Sequence logos and statistical tests (binomial probabilities with Bonferroni correction) were generated using the pLogo program [38].
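The statistical test named in this caption can be illustrated with a minimal sketch: a one-sided binomial tail probability for a residue being over-represented at one position, compared against a Bonferroni-corrected threshold. This is not pLogo's code; the counts, the background frequency, and the number of tests below are hypothetical values chosen only for illustration.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical example: a residue occurs at one position in 40 of 100
# foreground sequences, while its background frequency is 0.05.
p_value = binomial_tail(40, 100, 0.05)

# Hypothetical test count: 20 residues x 20 positions = 400 comparisons.
threshold = bonferroni_threshold(0.05, 20 * 20)

is_enriched = p_value < threshold
```

Dividing the significance level by the number of residue-position pairs tested keeps the family-wise error rate at or below the chosen level when many positions are examined at once.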
The selected optimal features for N-linked glycosylation.
| Feature | Position | Software |
|---|---|---|
| Normalized average hydrophobicity scales | P10 | AAindex |
| PSSM | P250 | PSI-BLAST |
| PSSM | P235 | PSI-BLAST |
| Conformational parameter of beta-turn | P10 | AAindex |
| PSSM | P173 | PSI-BLAST |
| Mean polarity | P8 | AAindex |
| Average flexibility indices | P7 | AAindex |
| Mean polarity | P10 | AAindex |
| PSSM | P274 | PSI-BLAST |
Features shown in italics are structural features; all other features are sequence-derived features or amino acid properties.
The selected optimal features for O-linked glycosylation.
| Feature | Position | Software |
|---|---|---|
| Conformational parameter of beta-turn | P8 | AAindex |
| PSSM | P38 | PSI-BLAST |
| Normalized average hydrophobicity scales | P8 | AAindex |
| PSSM | P293 | PSI-BLAST |
| PSSM | P248 | PSI-BLAST |
| PSSM | P8 | PSI-BLAST |
| PSSM | P128 | PSI-BLAST |
| Mean polarity | P8 | AAindex |
Features shown in italics are structural features; all other features are sequence-derived features or amino acid properties.
Figure 3. The relative importance and ranking of the selected optimal features.
(a) N-linked glycosylation and (b) O-linked glycosylation, based on the average decrease in accuracy of models trained after removal of the corresponding feature from the feature set.
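The importance measure described in this caption (retrain without one feature, then record the drop in accuracy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the GlycoMine implementation: a toy nearest-centroid classifier stands in for the actual model, and the feature names `signal` and `noise`, the data, and the single train/test split are all hypothetical.

```python
def nearest_centroid_accuracy(train, test, features):
    """Train a nearest-centroid classifier on `features` and return test accuracy.

    `train` and `test` are lists of (feature_dict, label) pairs.
    """
    sums, counts = {}, {}
    for x, y in train:
        counts[y] = counts.get(y, 0) + 1
        acc = sums.setdefault(y, {f: 0.0 for f in features})
        for f in features:
            acc[f] += x[f]
    centroids = {y: {f: sums[y][f] / counts[y] for f in features} for y in sums}

    correct = 0
    for x, y in test:
        # Predict the class whose centroid is closest in squared Euclidean distance.
        pred = min(
            centroids,
            key=lambda c: sum((x[f] - centroids[c][f]) ** 2 for f in features),
        )
        correct += pred == y
    return correct / len(test)


def average_accuracy_decrease(splits, features):
    """Average, over (train, test) splits, the accuracy lost by dropping each feature."""
    decrease = {f: 0.0 for f in features}
    for train, test in splits:
        baseline = nearest_centroid_accuracy(train, test, features)
        for f in features:
            reduced = [g for g in features if g != f]
            decrease[f] += baseline - nearest_centroid_accuracy(train, test, reduced)
    return {f: total / len(splits) for f, total in decrease.items()}


# Hypothetical toy data: "signal" separates the classes, "noise" does not.
train = [
    ({"signal": 0.0, "noise": 0.5}, 0),
    ({"signal": 0.2, "noise": 0.4}, 0),
    ({"signal": 1.0, "noise": 0.6}, 1),
    ({"signal": 0.8, "noise": 0.4}, 1),
]
test = [
    ({"signal": 0.1, "noise": 0.9}, 0),
    ({"signal": 0.9, "noise": 0.1}, 1),
]
importance = average_accuracy_decrease([(train, test)], ["signal", "noise"])
ranking = sorted(importance, key=importance.get, reverse=True)
```

Passing several cross-validation splits instead of one gives an averaged decrease per feature, and sorting by that value yields a ranking of the kind plotted in the figure.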
Figure 4. ROC curves.
(a) Different GlycoMine models trained with OFSs selected from all features, sequence features only, and structural features only, for N- and O-linked glycosylation sites. (b) N- and O-linked glycosylation-site predictions from GlycoMine (trained with the OFS) and NGlycPred using the independent test dataset.
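For readers unfamiliar with how such curves are built, the sketch below computes ROC points and the area under the curve from predicted scores and true labels. It is a generic illustration, not code from GlycoMine or NGlycPred; the scores and labels are hypothetical, and the function names are chosen here for clarity.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points.

    The decision threshold sweeps from the highest score to the lowest;
    samples with tied scores are added to the curve together.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical predictions for four candidate sites (1 = glycosylated).
scores = [0.9, 0.8, 0.35, 0.1]
labels = [1, 0, 1, 0]
curve = roc_points(scores, labels)
area = auc(curve)  # 0.75 for this toy input
```

An area of 1.0 corresponds to a perfect ranking of positive sites above negative ones, and 0.5 to a random ranking, which is why the area is a convenient single number for comparing the curves in each panel.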
Figure 5. Predicted N-linked glycosylation sites from two case-study proteins using GlycoMine.
(a) Toll-like receptor 8. (b) α-L-iduronidase. N-glycosylation sites predicted by both GlycoMine and NGlycPred are colored yellow, while sites that were correctly predicted by GlycoMine but not predicted by NGlycPred are colored red. The illustrations of Pfam domains and N-glycosylation sites of these two proteins, shown at the bottom of each panel, were rendered using the IBS program [98].
Figure 6. Functional enrichment analysis and classification of N-linked and O-linked glycoproteomes in terms of protein subcellular location, KEGG pathway, molecular function and biological process based on GO annotations.
(a) Subcellular locations and GO terms enriched in N-linked glycosylated proteins. (b) Subcellular locations and GO terms enriched in O-linked glycosylated proteins. (c,d) Distributions of N-linked and O-linked glycosylated proteins categorized based on the numbers of predicted glycosylation sites.