Thiago Britto-Borges, Geoffrey J Barton.
Abstract
Protein O-GlcNAcylation (O-GlcNAc) is an essential post-translational modification (PTM) in higher eukaryotes. The O-linked β-N-acetylglucosamine transferase (OGT) targets specific serines and threonines (S/T) in intracellular proteins. However, unlike phosphorylation, fewer than 25% of known O-GlcNAc sites match a clear sequence pattern. Accordingly, the three-dimensional structures of O-GlcNAc sites were characterised to investigate the role of structure in molecular recognition. From 1,584 O-GlcNAc sites in 620 proteins, 143 were mapped to protein structures determined by X-ray crystallography. The modified S/T were 1.7 times more likely to be annotated in the REM465 (REMARK 465) field, which defines missing residues in a protein structure, while 7 O-GlcNAc sites were solvent inaccessible and unlikely to be targeted by OGT. The 132 sites with complete backbone atoms clustered into 10 groups, but these were indistinguishable from clusters of unmodified S/T. This suggests there is no prevalent three-dimensional motif for OGT recognition. Predicted features from the 620 proteins were compared to unmodified S/T in O-GlcNAcylated proteins and globular proteins. The JPred4 predicted secondary structure shows that modified S/T were more likely to be in coils. Five of six methods to predict intrinsic disorder indicated O-GlcNAcylated S/T to be significantly more disordered than unmodified S/T. Although the analysis did not find a pattern in the three-dimensional structure of the sites, it revealed that the residues around the modification site are likely to be disordered and suggests a potential role of secondary structure elements in OGT site recognition.
Year: 2017 PMID: 28886091 PMCID: PMC5590929 DOI: 10.1371/journal.pone.0184405
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Sequence relative entropy of sites (+/- 7 residues) from 4 post-translational modifications.
The three kinases with the most sites in the PhosphoSitePlus database [14] are shown: protein kinase A (PKA, 1,285 sites), protein kinase C (PKC, 930 sites) and casein kinase 2 (CK2, 742 sites). In addition, 1,530 OGT sites were compiled from the same database. The sequence relative entropy was calculated with the WebLogo library [25]. Lines show the mean relative entropy and the semi-transparent areas represent 95% confidence intervals.
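As an illustration of the quantity plotted in Fig 1, the following is a minimal sketch of a per-position relative entropy calculation over equal-length sequence windows. It is not the WebLogo implementation: the function name `relative_entropy_profile`, the uniform background over 20 amino acids, and the absence of any small-sample correction or confidence interval are all assumptions made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def relative_entropy_profile(windows, background=None):
    """Return the relative entropy (in bits) at each position of the windows.

    windows: list of equal-length peptide strings centred on the modified S/T.
    background: dict of amino acid -> background frequency. A uniform
    background is assumed here if none is given (an illustrative choice).
    """
    if background is None:
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    profile = []
    for i in range(len(windows[0])):
        # Count only standard residues; gaps or unknown symbols are skipped.
        counts = Counter(w[i] for w in windows if w[i] in background)
        total = sum(counts.values())
        if total == 0:
            profile.append(0.0)
            continue
        entropy = 0.0
        for aa, count in counts.items():
            p = count / total
            entropy += p * math.log2(p / background[aa])
        profile.append(entropy)
    return profile


# Toy windows (hypothetical data): variable first position, fixed S at the end.
toy = ["AAS", "CAS", "DAS", "EAS"]
print(relative_entropy_profile(toy))
```

A fully conserved position reaches the maximum of log2(20) bits under the uniform background, while a position matching the background distribution scores zero.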
DSSP assigned secondary structure proportion of S/T in the SS143 dataset compared to unmodified S/T in same protein chains.
| Secondary structure | Modified: proportion (n) | Modified: 95% CI [lower, upper] | Unmodified: proportion (n) | Unmodified: 95% CI [lower, upper] | p value |
|---|---|---|---|---|---|
| C | 0.55 (78) | [0.46, 0.63] | 0.51 (2475) | [0.50, 0.53] | 0.36 |
| H | 0.25 (36) | [0.18, 0.32] | 0.32 (1525) | [0.31, 0.33] | 0.06 |
| E | 0.20 (29) | [0.13, 0.27] | 0.17 (811) | [0.16, 0.18] | 0.27 |
| Total | 143 | | 4,811 | | |
95% CI– 95% confidence interval; n–number of S/T.
The p value refers to the two-tailed z-score test between the proportions of modified and unmodified groups.
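The confidence intervals and the two-tailed test on proportions can be sketched as follows. This is a minimal example under stated assumptions: a normal-approximation (Wald) interval and a pooled standard error for the z statistic. The source does not say which variant of the test was used, so p values from this sketch need not match the table exactly. The function names are hypothetical.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-tailed z test for the difference of two proportions.

    Uses the pooled proportion for the standard error (an assumption).
    Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-tailed p value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Coil (C) row: 78 of 143 modified S/T versus 2475 of 4811 unmodified S/T.
low, high = wald_ci(78, 143)
print(f"modified C: [{low:.2f}, {high:.2f}]")
print(two_proportion_z_test(78, 143, 2475, 4811))
```

With the counts from the coil row, the Wald interval for the modified group rounds to the interval reported in the table.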
Fig 4. Structural superimpositions for the 10 clusters comprising 96 sites in the SS132 dataset.
Pairs of sites were superimposed on their 7 Cα atoms and the Cβ of the central S/T. The pairwise RMSDs were clustered with complete linkage and Euclidean distance. Clusters were defined by a 3 Å threshold. Green, yellow and grey represent residues in H, E and C secondary structures, respectively.
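The clustering step can be illustrated with a small pure-Python sketch of complete-linkage agglomerative clustering cut at a distance threshold. As a simplification, it treats the pairwise RMSD itself as the distance between sites, which may differ from the authors' exact procedure. The function name and the toy matrix are hypothetical.

```python
def complete_linkage_clusters(dist, threshold):
    """Agglomerative clustering with complete linkage.

    dist: symmetric matrix (list of lists) of pairwise distances, here
          assumed to be pairwise RMSD values in angstroms.
    threshold: merging stops once the closest pair of clusters is further
               apart than this value.
    Returns a list of clusters, each a list of site indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: distance between the two furthest members.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy RMSD matrix (hypothetical values, in angstroms) for four sites.
rmsd = [
    [0.0, 1.0, 5.0, 6.0],
    [1.0, 0.0, 5.5, 6.5],
    [5.0, 5.5, 0.0, 2.0],
    [6.0, 6.5, 2.0, 0.0],
]
print(complete_linkage_clusters(rmsd, threshold=3.0))
```

Complete linkage guarantees that every pair of sites within a cluster is no further apart than the threshold, which suits a search for tightly conserved local conformations.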
Structural evidence of buried O-GlcNAc sites in the SS143 dataset.
| PDB id | Chain | Position | Cluster id | RSA |
|---|---|---|---|---|
| 1f4j | B | 114 | D | 0.05 |
| 3cb2 | B | 170 | D | 0.02 |
| 4qvp | T | 131 | D | 0.01 |
| 2zxe | A | 366 | G | 0.02 |
| 3abm | R | 63 | G | 0.01 |
| 4l3j | A | 180 | G | 0.01 |
| 4y7y | Z | 190 | G | 0.04 |
RSA–site mean relative solvent accessibility; Cluster id–identifier of the structural cluster containing the site.
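For readers unfamiliar with RSA, the sketch below shows one way a site mean relative solvent accessibility could be computed from absolute accessible surface areas. Everything specific here is an assumption for illustration: the maximum accessibility values in `MAX_ASA` are placeholders to be replaced by the scale of the accessibility tool in use, the 0.05 cutoff is chosen only because every site listed above has an RSA at or below that value, and the function names are hypothetical.

```python
# Placeholder maximum accessible surface areas in square angstroms.
# These are illustrative values, not taken from the source.
MAX_ASA = {"S": 155.0, "T": 172.0}


def relative_accessibility(residue, asa, max_asa=MAX_ASA):
    """Absolute accessibility divided by the maximum for that residue type."""
    return asa / max_asa[residue]


def site_mean_rsa(observations, max_asa=MAX_ASA):
    """Mean RSA over several observations of the same site.

    observations: list of (residue, asa) tuples, for example one entry per
    structure or chain in which the site was observed.
    """
    values = [relative_accessibility(res, asa, max_asa) for res, asa in observations]
    return sum(values) / len(values)


def is_buried(mean_rsa, cutoff=0.05):
    """Flag a site as buried; the cutoff is an assumption for this sketch."""
    return mean_rsa <= cutoff


# Hypothetical serine observed twice with little exposed surface.
mean_rsa = site_mean_rsa([("S", 3.1), ("S", 6.2)])
print(round(mean_rsa, 2), is_buried(mean_rsa))
```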
Dataset summary.
See Methods for details.
| Dataset name | Number of sites | Number of proteins | Short name |
|---|---|---|---|
| Modified Sequence Sites | 1,385 | 620 | MSS |
| Unmodified Sequence Sites | 100,329 | 620 | USS |
| Structural Sites | 143 | 106 | SS143 |
| Structural Sites with backbone | 132 | 93 | SS132 |
| Globular Set | 1,164 | 1,164 | GS |
JPred4 predicted solvent accessibility for S/T in the MSS and USS datasets.
The proportions of buried S/T as predicted by the Jnetsol method in JPred4. The proportions of buried S/T do not differ significantly between the modified and unmodified groups (all p values > 0.05).
| Buried at | Modified (MSS): proportion (n) | Modified (MSS): 95% CI [lower, upper] | Unmodified (USS): proportion (n) | Unmodified (USS): 95% CI [lower, upper] | p value |
|---|---|---|---|---|---|
| 0% | 0.01 (7) | [0.00, 0.01] | 0.01 (836) | [0.008, 0.009] | 0.18 |
| 5% | 0.04 (55) | [0.03, 0.05] | 0.04 (3,917) | [0.038, 0.040] | 0.86 |
| 25% | 0.29 (403) | [0.27, 0.31] | 0.28 (28,044) | [0.27, 0.28] | 0.31 |
95% CI– 95% confidence interval; n–number of S/T predicted to be buried. The p value refers to the two-tailed z-score test between the modified and unmodified groups.
JPred4 predicted secondary structure proportions for S/T in the MSS and USS datasets.
| Secondary structure | Modified (MSS): proportion (n) | Modified (MSS): 95% CI [lower, upper] | Unmodified (USS): proportion (n) | Unmodified (USS): 95% CI [lower, upper] | p value |
|---|---|---|---|---|---|
| C | 0.88 (1,205) | [0.86, 0.90] | 0.829 | [0.826, 0.831] | <0.01 |
| H | 0.08 | [0.07, 0.09] | 0.126 | [0.124, 0.128] | <0.01 |
| E | 0.05 (66) | [0.04, 0.06] | 0.045 (4,495) | [0.044, 0.046] | 0.6 |
95% CI– 95% confidence interval; n–the number of S/T; the p value refers to the two-tailed z-score test between the modified and unmodified groups.
Predicted disorder between modified and unmodified S/T.
All disorder prediction methods except DisEMBL-HOTLOOPS show a small but significant increase in mean disorder score for modified S/T over unmodified S/T.
| Method | Mean score modified (MSS) ± SE | Mean score unmodified (USS) ± SE | p value |
|---|---|---|---|
| DisEMBL-REM465 | 0.48 ± 0.004 | 0.47 ± 0.001 | 0.01 |
| DisEMBL-COILS | 0.60 ± 0.004 | 0.58 ± 0.001 | <0.01 |
| DisEMBL-HOTLOOPS | 0.10 ± 0.001 | 0.10 ± 0.001 | 0.45 |
| IUpred-Long | 0.59 ± 0.006 | 0.55 ± 0.001 | <0.01 |
| IUpred-Short | 0.48 ± 0.005 | 0.45 ± 0.001 | <0.01 |
| JRonn | 0.62 ± 0.004 | 0.61 ± 0.001 | 0.02 |
The p value refers to the two-tailed t-test between the modified and unmodified groups. SE–standard error.
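The comparison of mean disorder scores can be approximated from the summary statistics alone. The sketch below is one possible reading, not the authors' procedure: it uses an unequal-variance (Welch-style) t statistic built from the two standard errors and, because both groups are large, a standard normal approximation for the two-tailed p value. Since the table values are rounded, results computed this way will not necessarily reproduce the reported p values. The function name is hypothetical.

```python
import math


def t_test_from_summary(mean1, se1, mean2, se2):
    """Two-tailed test for a difference in means from means and standard errors.

    The t statistic is the difference in means divided by the combined
    standard error. The p value uses the normal approximation, which is
    an assumption justified here only by the large group sizes.
    Returns (t, p_value).
    """
    t = (mean1 - mean2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p_value = math.erfc(abs(t) / math.sqrt(2))
    return t, p_value


# IUpred-Long row: 0.59 ± 0.006 (modified) versus 0.55 ± 0.001 (unmodified).
t, p = t_test_from_summary(0.59, 0.006, 0.55, 0.001)
print(f"t = {t:.2f}, p < 0.01: {p < 0.01}")
```

For the IUpred-Long row the sketch gives a large positive t statistic, in line with the reported p value below 0.01.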