Phuc Vinh Nguyen Lam, Radoslav Goldman, Konstantinos Karagiannis, Tejas Narsule, Vahan Simonyan, Valerii Soika, Raja Mazumder.
Abstract
The asparagine-X-serine/threonine (NXS/T) motif, where X is any amino acid except proline, is the consensus motif for N-linked glycosylation. The large number of high-resolution crystal structures of glycosylated proteins now available allows structural analysis of N-linked glycosylation sites (NGS). Our analysis shows that diverse glycoproteins provide enough structural information to develop rules for predicting NGS. A Python-based tool was developed to investigate asparagines implicated in N-glycosylation in five species: Homo sapiens, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana and Saccharomyces cerevisiae. In the human proteome, 78% of all asparagines in NXS/T motifs involved in N-glycosylation are located in loop/turn conformations. A similar distribution was found in all the other species examined. Comparative analysis of NXS/T motifs not known to be glycosylated and of their reverse sequence (S/TXN) shows a similar distribution across secondary structural elements, indicating that the NXS/T motif in itself is not biologically relevant. Based on our analysis, we have defined rules to determine NGS. Using machine learning methods based on these rules, we can predict with 93% accuracy whether a particular site will be glycosylated. If structural information is not available, the tool uses structure prediction results instead, which gives 74% accuracy. The tool was used to identify glycosylation sites in 108 human proteins with structures and 2247 proteins without structures that have acquired NXS/T sites through non-synonymous variation. The tool, Structure Feature Analysis Tool (SFAT), is freely available to the public at http://hive.biochemistry.gwu.edu/tools/sfat.
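The motif definition in the abstract can be illustrated with a short Python sketch. This is not SFAT's code: the function name `find_motifs`, the two pattern constants, and the toy sequence are hypothetical, and excluding proline at the X position of the reverse motif (S/TXN) is an assumption, since the abstract states the exclusion only for NXS/T.

```python
import re

# N, then any residue except proline, then S or T.
# The lookahead lets overlapping motifs (e.g. "NNSS") all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")
# Reverse motif S/T-X-N; excluding proline at X here is an assumption.
REVERSE_SEQUON = re.compile(r"(?=([ST][^P]N))")


def find_motifs(sequence, pattern=SEQUON):
    """Return (1-based start position, matched tripeptide) for every hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical toy sequence, not taken from any real protein.
toy = "MKNVSANPTQNNSSTLN"
print(find_motifs(toy))                  # [(3, 'NVS'), (11, 'NNS'), (12, 'NSS')]
print(find_motifs(toy, REVERSE_SEQUON))  # [(5, 'SAN'), (9, 'TQN'), (15, 'TLN')]
```

Note that `NPT` at position 7 is skipped because of the proline rule. A motif hit alone says nothing about whether the site is glycosylated, which is the point the abstract makes about structural context.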
Year: 2013 PMID: 23459159 PMCID: PMC3914773 DOI: 10.1016/j.gpb.2012.11.003
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
| Organism | Available structuresᵃ | No. of annotated NXS/T sites | No. of unannotated NXS/T sites | No. of N sites | Total length | Sheet total length | Helix total length | Loop/turn total length |
|---|---|---|---|---|---|---|---|---|
| Human | 3094 | 2284 | 3779 | 30,762 | 1,627,531 | 377,793 | 713,587 | 536,151 |
| Mouse | 644 | 453 | 739 | 5984 | 91,718 | 31,568 | 24,182 | 35,968 |
| Fly | 103 | 42 | 103 | 1029 | 37,216 | 12,622 | 16,435 | 37,216 |
| Plant | 179 | 33 | 223 | 1834 | 136,158 | 30,062 | 62,978 | 43,118 |
| Yeast | 756 | 10 | 1163 | 16,745 | 191,581 | 41,428 | 87,412 | 62,741 |
Note: ᵃ Structures that have at least one asparagine in their sequence.
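As a rough illustration of how the length columns in the table above translate into secondary structure proportions, the following Python sketch computes the share of each element per organism. The names `TABLE`, `element_fractions` and `row_is_consistent` are hypothetical. The consistency check is included because, as listed, the fly row's loop/turn length equals its total length, so the fractions for that row should be treated with caution.

```python
# Residue counts copied from the table above: (total, sheet, helix, loop/turn).
TABLE = {
    "Human": (1_627_531, 377_793, 713_587, 536_151),
    "Mouse": (91_718, 31_568, 24_182, 35_968),
    "Fly": (37_216, 12_622, 16_435, 37_216),
    "Plant": (136_158, 30_062, 62_978, 43_118),
    "Yeast": (191_581, 41_428, 87_412, 62_741),
}


def element_fractions(sheet, helix, loop):
    """Share of each element among all residues assigned to an element."""
    covered = sheet + helix + loop
    return {
        "sheet": sheet / covered,
        "helix": helix / covered,
        "loop/turn": loop / covered,
    }


def row_is_consistent(total, sheet, helix, loop):
    """True when the three element lengths add up to the listed total."""
    return total == sheet + helix + loop


for organism, (total, sheet, helix, loop) in TABLE.items():
    fractions = element_fractions(sheet, helix, loop)
    summary = ", ".join(f"{name} {value:.1%}" for name, value in fractions.items())
    flag = "" if row_is_consistent(total, sheet, helix, loop) else "  (parts do not sum to total)"
    print(f"{organism}: {summary}{flag}")
```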
Figure 1. The distribution of secondary structure elements and asparagine. A. Distribution of secondary structural elements in human, mouse, fly, plant and yeast proteins. B. Distribution of asparagine across secondary structural elements in human, mouse, fly, plant and yeast proteins. P values are calculated with the χ² test by comparing the occurrence of asparagine in secondary structural elements to the overall distribution of α-helix, β-sheet and turns/loops in all available structures in the species of interest.
Figure 2. The distribution of asparagine in unannotated and annotated NXS/T motifs. A. Distribution of unannotated NXS/T motifs in secondary structural elements. P values are calculated with the χ² test by comparing the occurrence of asparagine in unannotated NXS/T motifs to the distribution of all asparagines. B. Distribution of annotated NXS/T motifs in secondary structural elements. P values are calculated with the χ² test by comparing the occurrence of asparagine in annotated NXS/T motifs to the distribution of asparagines in unannotated NXS/T motifs.
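The χ² comparisons named in the captions can be sketched as a goodness-of-fit test of observed counts against a reference distribution over three categories (sheet, helix, loop/turn). The source does not give the details of the calculation, so the goodness-of-fit form is an assumption, `chi_square_gof` is a hypothetical helper, and the observed asparagine counts below are invented purely for illustration. Only the reference lengths come from the human row of the table above.

```python
import math


def chi_square_gof(observed, reference):
    """Chi-square goodness-of-fit for exactly three categories (df = 2).

    `observed` holds counts; `reference` holds counts or lengths whose
    proportions define the expected distribution.
    """
    if len(observed) != 3 or len(reference) != 3:
        raise ValueError("this sketch handles exactly three categories (df = 2)")
    n = sum(observed)
    ref_total = sum(reference)
    expected = [n * r / ref_total for r in reference]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-statistic / 2)
    return statistic, p_value


# Reference: human sheet, helix and loop/turn lengths from the table above.
human_reference = [377_793, 713_587, 536_151]
# HYPOTHETICAL asparagine counts per element, invented for illustration only.
hypothetical_asn = [220, 400, 380]

stat, p = chi_square_gof(hypothetical_asn, human_reference)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

The closed-form p value keeps the sketch free of dependencies but only holds for two degrees of freedom. For other table shapes a statistics library would be the usual choice.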
Figure 3. Distribution of asparagines in human and mouse proteins. A. Distribution of asparagines in human proteins. B. Distribution of asparagines in mouse proteins.
Subcellular distribution of annotated NXS/T motifs in the human and mouse proteomes.
| Species | Entire proteome (%) | Secreted/membrane (%) | Cytoplasm/nucleus/mitochondria (%) |
|---|---|---|---|
| Human | 27 | 53 | 3 |
| Mouse | 21 | 36 | 2.6 |
Note: Values are the percentage of annotated NXS/T motifs among all NXS/T motifs in each category.
Figure 4. Identification of N-linked glycosylation (NLG) sites using SFAT.
Figure 5. Home page for the N-linked glycosylation prediction tool SFAT. Users can predict N-linked glycosylation sites, find the distribution of a motif in secondary structural elements, or map UniProtKB and PDB sequence features.