Hsin-Yi Wu, Cheng-Tsung Lu, Hui-Ju Kao, Yi-Ju Chen, Yu-Ju Chen, Tzong-Yi Lee.
Abstract
BACKGROUND: Protein O-GlcNAcylation involves the attachment of a single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues. Elucidating O-GlcNAcylation sites on proteins is required to decipher its crucial roles in regulating cellular processes and to aid drug design. With an increasing number of O-GlcNAcylation sites identified by mass spectrometry (MS)-based proteomics, several methods have been proposed for the computational identification of O-GlcNAcylation sites. However, none of them focuses on investigating O-GlcNAcylated substrate motifs. We were therefore motivated to design a new method for identifying protein O-GlcNAcylation sites that takes substrate site specificity into account.
Year: 2014 PMID: 25521204 PMCID: PMC4290634 DOI: 10.1186/1471-2105-15-S16-S1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Number of sites in the training and independent testing sets.
| Data resource | Residue | O-GlcNAcylated sites (Positive data) | Non-O-GlcNAcylated sites (Negative data) |
|---|---|---|---|
| dbOGAP | Serine | 240 | 16740 |
| dbOGAP | Threonine | 135 | 10079 |
| dbOGAP | Ser and Thr | 375 | 26819 |
| UniProtKB | Serine | 57 | 4488 |
| UniProtKB | Threonine | 51 | 2978 |
| UniProtKB | Ser and Thr | 108 | 7466 |
| OGlycBase | Serine | 24 | 1013 |
| OGlycBase | Threonine | 24 | 694 |
| OGlycBase | Ser and Thr | 48 | 1707 |
| PhosphoSitePlus | Serine | 779 | 58082 |
| PhosphoSitePlus | Threonine | 582 | 34217 |
| PhosphoSitePlus | Ser and Thr | 1361 | 92299 |
|  | Serine | 578 | 41075 |
|  | Threonine | 470 | 23920 |
|  | Ser and Thr | 1048 | 64995 |
Figure 1. The tree-like visualization of the identified substrate motifs by applying MDDLogo.
Figure 2. TwoSampleLogo between O-GlcNAcylated and non-O-GlcNAcylated sites.
Figure 3. MDDLogo-identified motifs of O-GlcNAcylation data.
Five-fold cross-validation results for a single SVM model trained with various features.
| Training features | Number of positive data | Number of negative data | Sn | Sp | Acc | MCC |
|---|---|---|---|---|---|---|
| 20D Binary code | 375 | 375 | 0.66 | 0.69 | 0.69 | 0.21 |
| Amino Acid Composition (AAC) | 375 | 375 | 0.64 | 0.65 | 0.65 | 0.17 |
| Amino Acid Pair Composition (AAPC) | 375 | 375 | 0.66 | 0.67 | 0.67 | 0.20 |
| Accessible Surface Area (ASA) | 375 | 375 | 0.57 | 0.59 | 0.59 | 0.10 |
| Position Weight Matrix (PWM) | 375 | 375 | 0.62 | 0.63 | 0.63 | 0.14 |
| Position-specific scoring matrix (PSSM) | 375 | 375 | 0.68 | 0.69 | 0.69 | 0.22 |
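Two of the simpler features in the table above, the 20D binary code and amino acid composition (AAC), can be illustrated with a minimal sketch. The source only names these features, so the residue ordering in `AMINO_ACIDS`, the example window, and the treatment of non-standard characters (encoded as all zeros, ignored for AAC) are assumptions made here for illustration, not the authors' implementation.

```python
# Assumed ordering of the 20 standard amino acids (alphabetical by one-letter code).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_encode(window):
    """One-hot encode each residue as a 20-dimensional 0/1 vector, concatenated."""
    vec = []
    for residue in window:
        one_hot = [0] * len(AMINO_ACIDS)
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:  # non-standard characters stay all-zero (assumption)
            one_hot[idx] = 1
        vec.extend(one_hot)
    return vec


def amino_acid_composition(window):
    """Fraction of each of the 20 amino acids among the standard residues."""
    residues = [r for r in window if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


# Hypothetical 9-residue window centred on a serine; not taken from the paper.
window = "PVSSTSAAT"
binary_vec = binary_encode(window)
aac_vec = amino_acid_composition(window)
print(len(binary_vec), len(aac_vec))
```

The binary code grows with the window length (20 values per position), whereas AAC always yields 20 values and discards positional information.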
Performance of MDDLogo-clustered SVM models evaluated by five-fold cross validation.
| SVM model | Number of positive data | Number of negative data | Sn | Sp | Acc | MCC |
|---|---|---|---|---|---|---|
| All data (Single SVM) | 375 | 375 | 0.68 | 0.69 | 0.69 | 0.22 |
| Subgroup OG1 | 150 | 150 | 0.80 | 0.81 | 0.81 | 0.41 |
| Subgroup OG2 | 92 | 92 | 0.78 | 0.79 | 0.79 | 0.37 |
| Subgroup OG3 | 64 | 64 | 0.76 | 0.80 | 0.79 | 0.37 |
| Subgroup OG4 | 69 | 69 | 0.70 | 0.71 | 0.71 | 0.25 |
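The Sn, Sp, Acc, and MCC columns are not defined in this record. The sketch below assumes the standard confusion-matrix definitions of sensitivity, specificity, accuracy, and the Matthews correlation coefficient; the function name and the example counts are hypothetical and do not reproduce any row of the tables.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute Sn, Sp, Acc and MCC from confusion-matrix counts (standard definitions)."""
    sn = tp / (tp + fn)                      # sensitivity: recall on positive sites
    sp = tn / (tn + fp)                      # specificity: recall on negative sites
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts chosen only to exercise the formulas.
metrics = classification_metrics(tp=80, fn=20, tn=90, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Unlike accuracy, MCC takes all four cells of the confusion matrix into account, which makes it the more informative summary when positive and negative sites are unevenly represented.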
Figure 4. Comparison of independent testing performance between Single SVM model and MDDLogo-clustered SVM models.
Figure 5. Comparison of independent testing performance between our method and three available online O-GlcNAcylation site prediction tools.
Figure 6. A case study of O-GlcNAcylation site prediction on Calcium/calmodulin-dependent protein kinase type IV (CAMK4).