Jinlian Wang, Manabu Torii, Hongfang Liu, Gerald W Hart, Zhang-Zhi Hu.
Abstract
BACKGROUND: Protein O-GlcNAcylation (or O-GlcNAc-ylation) is an O-linked glycosylation involving the transfer of β-N-acetylglucosamine to the hydroxyl group of serine or threonine residues of proteins. Growing evidence suggests that protein O-GlcNAcylation is common and, like phosphorylation, modulates a broad range of biological processes. However, compared with phosphorylation, O-GlcNAcylation data are relatively limited and their annotation in databases is scarce. Furthermore, a bioinformatics resource for O-GlcNAcylation is lacking, and an O-GlcNAcylation site prediction tool is much needed.

DESCRIPTION: We developed dbOGAP, a database of O-GlcNAcylated proteins and sites, based primarily on literature published since O-GlcNAcylation was first described in 1984. The database currently contains ~800 proteins with experimental O-GlcNAcylation information, ~61% of which are human, and 172 proteins have a total of ~400 identified O-GlcNAcylation sites. The O-GlcNAcylated proteins are primarily nucleocytoplasmic, including proteins associated with membrane-bounded and non-membrane-bounded organelles. The known O-GlcNAcylated proteins perform a broad range of functions, including transcriptional regulation, macromolecular complex assembly, intracellular transport, translation, and regulation of cell growth or death. The database also contains ~365 potential O-GlcNAcylated proteins inferred from known O-GlcNAcylated orthologs. Additional annotations, including other protein posttranslational modifications, biological pathways, and disease information, are integrated into the database. We also developed OGlcNAcScan, an O-GlcNAcylation site prediction system based on a Support Vector Machine (SVM) and trained on protein sequences with known O-GlcNAcylation sites from dbOGAP. The system achieved an area under the ROC curve (AUC) of 74.3% in five-fold cross-validation.
The dbOGAP website supports searching and querying O-GlcNAcylated proteins and the associated literature, browsing by gene name, organism, or pathway, and downloading the database. The website also provides the OGlcNAcScan tool, which returns a list of predicted O-GlcNAcylation sites for a given protein sequence.
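OGlcNAcScan scores candidate S/T residues in a submitted sequence, but the text does not give the window length or feature encoding its SVM uses. The sketch below therefore assumes a symmetric window of `flank` residues on each side of every S/T (the residue at position "0", as in the Figure 3 logo), 'X' padding past the termini, and a one-hot encoding of the window. `candidate_windows`, `one_hot`, and the default `flank=10` are hypothetical choices for illustration, not the tool's actual code.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def candidate_windows(sequence: str, flank: int = 10) -> list[tuple[int, str]]:
    """Return (1-based position, window) for every S/T residue.

    Each window spans `flank` residues on either side of the S/T; positions
    beyond a terminus are padded with 'X'. The flank size is an assumption.
    """
    seq = sequence.upper()
    padded = "X" * flank + seq + "X" * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue in ("S", "T"):
            # Residue i of the original sequence sits at index i + flank in `padded`.
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows

def one_hot(window: str) -> list[int]:
    """Encode a window as len(window) * 20 binary features.

    Padding 'X' and non-standard residues become all-zero blocks.
    """
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector
```

For example, `candidate_windows("MSTAK", flank=2)` yields one window per S/T, `(2, "XMSTA")` and `(3, "MSTAK")`; vectors like these could then be fed to any SVM implementation, with known sites as positives and the remaining S/T windows as negatives.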
Year: 2011 PMID: 21466708 PMCID: PMC3083348 DOI: 10.1186/1471-2105-12-91
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Overall workflow of dbOGAP development.
Figure 2. Statistics of protein entries in dbOGAP. Left, Venn diagram showing the number of O-GlcNAcylated proteins and modification sites in the dbOGAP database, which contains a total of 1163 protein entries. A - Experimentally determined O-GlcNAcylated proteins; B - Proteins with identified O-GlcNAcylation sites (G-sites); C - Proteins with identified phosphorylation sites (P-sites); D - Proteins with O-GlcNAcylation sites inferred from orthologs with known O-GlcNAcylation sites. Right, taxonomic distribution of experimentally determined O-GlcNAcylated proteins in dbOGAP. Numbers in the pie chart are the percentages of proteins from each species relative to the total number of proteins in the database. "Other" species include Bos taurus (3 protein entries), Gallus gallus (2), Coturnix coturnix japonica (1), Rhea americana (1), Macaca mulatta (1), and viruses (3).
O-GlcNAcylation and phosphorylation occurring at identical or adjacent (+/- 4 amino acids) serine/threonine (S/T) sites of O-GlcNAcylated proteins.
| UniProt ID | Gene name | O-GlcNAcylation site | Identical phosphorylation S/T site | Adjacent phosphorylation S/T site |
|---|---|---|---|---|
| RRP1B_HUMAN | RRP1B | S731 | S731 | T728, S732, S735 |
| CARF_HUMAN | CDKN2AIP | S348 | | T345 |
| VIME_HUMAN | VIM | S7, T33, S34, S55 | S7, T33, S34, S55 | S5, S8, S9, S10, S29, T33, S34, S51, S56 |
| SPTB2_HUMAN | SPTBN1 | S2324 | | T2328 |
| TPR_HUMAN | TPR | S1676 | | T1677 |
| RBP2_HUMAN | RANBP2 | T1399 | T1399 | T1396, S1400 |
| H31_HUMAN | HIST1H3A | S11 | S11 | T12 |
| K2C8_HUMAN | KRT8 | S13, S15 | S13 | S9, S13, T14 |
| MYC_HUMAN | MYC | T58 | T58 | S62 |
| NUMA1_HUMAN | NUMA1 | S1844 | | S1840, S1847 |
| PHB_HUMAN | PHB | T258 | | S254 |
| EMSY_HUMAN | EMSY | S236 | | S238 |
| NU214_HUMAN | NUP214 | T1201, S1354 | | T1203, S1356 |
| CRTC2_HUMAN | CRTC2 | S70, S171, S173 | S70, S171 | T169, S171, T177 |
| KCC4_HUMAN | CAMK4 | S356 | S356 | S360 |
| FOXO1_HUMAN | FOXO1 | T317 | | S319 |
| BPTF_HUMAN | BPTF | T2094 | | S2098 |
| HCFC1_HUMAN | HCFC1 | T738 | T738 | T737 |
| K1C18_HUMAN | KRT18 | S30, S31, S49 | S30, S31 | S30, S31, S34, S47, S53 |
| P121A_HUMAN | POM121 | T693 | | S697 |
| RBM14_HUMAN | RBM14 | S244, S254, S256, S280 | S256, S280 | S256 |
| AKT1_HUMAN | AKT1 | T308, S473 | S473, T308 | |
| ATX2L_HUMAN | ATXN2L | S684 | S684 | |
| SYUA_HUMAN | SNCA | S87 | S87 | |
| IKKB_HUMAN | IKBKB | S733 | S733 | |
| ESR1_MOUSE | Esr1 | S10 | S10 | T7 |
| SPTB2_MOUSE | Sptbn1 | S2323 | | T2327 |
| BSN_MOUSE | Bsn | S1407, S2027, S2029, T2700, T2703 | S2029, S2694, T2703 | T1406, S2029, T2703 |
| SYN1_MOUSE | Syn1 | S518, T564 | | S520, S568 |
| ABLM1_MOUSE | Ablim1 | S496, S499 | S496, S499 | S494, T495, S496, S499, S502 |
| SKT_MOUSE | Skt | S357 | | S359, S361 |
| DEMA_MOUSE | Epb49 | S285 | | S289 |
| RBM14_MOUSE | Rbm14 | S278 | | S280 |
| CEBPB_MOUSE | Cebpb | S180, S181 | | S184 |
| SRBS1_MOUSE | Sorbs1 | S1199, S1200, S1201 | S1201 | S1201 |
| ESR2_MOUSE | Esr2 | S61 | S61 | |
| AKT1_MOUSE | Akt1 | S473 | S473 | |
| NOS3_RAT | Nos3 | S1178 | | T1174, S1176 |
| SP1_RAT | Sp1 | S613, T641, S642, S699, S703 | S613, T641, S642, S703 | T641, S642, S703 |
| LBR_RAT | Lbr | S96 | | S99 |
| TAU_RAT | Mapt | S711 | S711 | S707, T714, S715 |
| KPCB_RAT | Prkcb | T635 | T635 | |
| KPCD_RAT | Prkcd | T295, T348 | T295 | S299 |
| KPCE_RAT | Prkce | S368, T710 | S368, T710 | |
| KPCG_RAT | Prkcg | T689, S690 | | S687 |
| SYN1_RAT | Syn1 | S516, T562 | | S518, S566 |
| G3P_RAT | Gapdh | T227 | T227 | |
| LT_SV40 | SV40gp6 | S111, S112 | S112 | S112 |
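The table does not spell out how a phosphorylation site is assigned to the "identical" or "adjacent" column, but its rows are consistent with the following reading: a P-site is identical if it lies at an O-GlcNAcylation site, and adjacent if it lies within ±4 residues of a *different* O-GlcNAcylation site. For example, in K2C8_HUMAN, S13 appears in both columns because it is a G-site itself and lies two residues from the G-site S15. The sketch below encodes that inferred rule; it is our reconstruction, not the authors' stated procedure, and `compare_sites` is a hypothetical helper.

```python
import re

SITE = re.compile(r"^([ST])(\d+)$")

def position(site: str) -> int:
    """Parse a residue label such as 'S731' into its 1-based position."""
    match = SITE.match(site)
    if match is None:
        raise ValueError(f"expected an S/T site label, got {site!r}")
    return int(match.group(2))

def compare_sites(g_sites, p_sites, window=4):
    """Split P-sites into (identical, adjacent) relative to the G-sites.

    identical: P-site at the same position as a G-site.
    adjacent:  P-site 1..window residues away from some G-site.
    A P-site can land in both lists, as S13 does for KRT8.
    """
    g_positions = {position(g) for g in g_sites}
    identical, adjacent = [], []
    for p in sorted(set(p_sites), key=position):
        pos = position(p)
        if pos in g_positions:
            identical.append(p)
        if any(0 < abs(pos - g) <= window for g in g_positions):
            adjacent.append(p)
    return identical, adjacent
```

Applied to the RRP1B_HUMAN row, `compare_sites(["S731"], ["T728", "S731", "S732", "S735"])` returns `(["S731"], ["T728", "S732", "S735"])`, matching the table.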
Major GO categories of human O-GlcNAcylated proteins.
| Gene Ontology (GO) Terms | Count* | % Total | P-value |
|---|---|---|---|
| Biological process | | | |
| GO:0045449~regulation of transcription | 108 | 23.48 | 5.90E-04 |
| GO:0006350~transcription | 93 | 20.22 | 1.99E-04 |
| GO:0051252~regulation of RNA metabolic process | 75 | 16.30 | 6.09E-03 |
| GO:0006355~regulation of transcription, DNA-dependent | 69 | 15.00 | 3.08E-02 |
| GO:0043933~macromolecular complex subunit organization | 54 | 11.74 | 1.64E-09 |
| GO:0065003~macromolecular complex assembly | 53 | 11.52 | 4.68E-10 |
| GO:0046907~intracellular transport | 52 | 11.30 | 8.96E-10 |
| GO:0007049~cell cycle | 52 | 11.30 | 2.14E-07 |
| GO:0006412~translation | 51 | 11.09 | 3.30E-21 |
| GO:0006396~RNA processing | 51 | 11.09 | 4.00E-12 |
| GO:0008104~protein localization | 51 | 11.09 | 1.91E-05 |
| GO:0010605~negative regulation of macromolecule metabolic process | 48 | 10.43 | 1.39E-06 |
| GO:0042981~regulation of apoptosis | 48 | 10.43 | 1.60E-05 |
| GO:0043067~regulation of programmed cell death | 48 | 10.43 | 2.07E-05 |
| GO:0010941~regulation of cell death | 48 | 10.43 | 2.25E-05 |
| GO:0045184~establishment of protein localization | 47 | 10.22 | 1.11E-05 |
| GO:0010604~positive regulation of macromolecule metabolic process | 47 | 10.22 | 1.55E-04 |
| Molecular function | | | |
| GO:0000166~nucleotide binding | 132 | 28.70 | 2.56E-13 |
| GO:0003677~DNA binding | 101 | 21.96 | 7.11E-04 |
| GO:0032555~purine ribonucleotide binding | 92 | 20.00 | 7.02E-06 |
| GO:0032553~ribonucleotide binding | 92 | 20.00 | 7.02E-06 |
| GO:0017076~purine nucleotide binding | 92 | 20.00 | 3.91E-05 |
| GO:0030528~transcription regulator activity | 83 | 18.04 | 6.87E-07 |
| GO:0003723~RNA binding | 82 | 17.83 | 2.05E-24 |
| GO:0001882~nucleoside binding | 82 | 17.83 | 1.56E-05 |
| GO:0005524~ATP binding | 81 | 17.61 | 1.03E-06 |
| GO:0032559~adenyl ribonucleotide binding | 81 | 17.61 | 1.73E-06 |
| GO:0030554~adenyl nucleotide binding | 81 | 17.61 | 1.26E-05 |
| GO:0001883~purine nucleoside binding | 81 | 17.61 | 2.16E-05 |
| GO:0005198~structural molecule activity | 56 | 12.17 | 8.96E-12 |
| GO:0042802~identical protein binding | 51 | 11.09 | 3.27E-09 |
| Cellular component | | | |
| GO:0043232~intracellular non-membrane-bounded organelle | 170 | 36.96 | 6.37E-29 |
| GO:0043228~non-membrane-bounded organelle | 170 | 36.96 | 6.37E-29 |
| GO:0005829~cytosol | 141 | 30.65 | 7.16E-46 |
| GO:0031974~membrane-enclosed lumen | 131 | 28.48 | 6.60E-24 |
| GO:0043233~organelle lumen | 129 | 28.04 | 1.20E-23 |
| GO:0070013~intracellular organelle lumen | 128 | 27.83 | 5.00E-24 |
| GO:0031981~nuclear lumen | 116 | 25.22 | 2.44E-25 |
| GO:0005856~cytoskeleton | 77 | 16.74 | 2.21E-08 |
| GO:0005654~nucleoplasm | 77 | 16.74 | 2.07E-18 |
| GO:0030529~ribonucleoprotein complex | 72 | 15.65 | 3.55E-29 |
| GO:0005730~nucleolus | 55 | 11.96 | 3.09E-11 |
| GO:0000267~cell fraction | 49 | 10.65 | 2.01E-03 |
| GO:0044430~cytoskeletal part | 49 | 10.65 | 1.16E-04 |
* Within each major GO category, the table is sorted by the number of proteins annotated with each GO term. % Total is the percentage of all human O-GlcNAcylated proteins that are annotated with the given GO term. A more detailed GO profile is shown in [Additional file 1, Supplementary Table S1].
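The % Total column is a simple ratio, but the denominator (the number of human O-GlcNAcylated proteins analyzed) is not stated next to the table. Working backward from the listed values, a denominator of 460 reproduces them (108/460 = 23.48%, 132/460 = 28.70%); that figure is inferred from the table, not quoted from the text. A minimal check:

```python
def percent_total(count: int, total: int) -> float:
    """Percentage of proteins annotated with a GO term, rounded to two decimals."""
    return round(100 * count / total, 2)

# Inferred from the table itself; the text does not state this number.
INFERRED_TOTAL = 460

# (count, reported % Total) pairs copied from the table above.
rows = {
    "GO:0045449~regulation of transcription": (108, 23.48),
    "GO:0000166~nucleotide binding": (132, 28.70),
    "GO:0043232~intracellular non-membrane-bounded organelle": (170, 36.96),
    "GO:0000267~cell fraction": (49, 10.65),
}

for term, (count, reported) in rows.items():
    computed = percent_total(count, INFERRED_TOTAL)
    print(f"{term}: computed {computed:.2f}, reported {reported:.2f}")
```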
Pathway profiles using GeneGo Pathway Maps analysis.
| Pathways | P-value | Count* |
|---|---|---|
| Development_Role of CDK5 in neuronal development | 2.68E-11 | 12/34 |
| Development_Gastrin in cell growth and proliferation | 3.86E-11 | 15/62 |
| Immune response_Gastrin in inflammatory response | 2.00E-10 | 15/69 |
| Signal transduction_Activation of PKC via G-Protein coupled receptor | 5.21E-10 | 13/52 |
| Cytoskeleton remodeling_Cytoskeleton remodeling | 9.88E-10 | 17/102 |
| Glycolysis and gluconeogenesis (short map) | 1.43E-09 | 14/67 |
| Cytoskeleton remodeling_Neurofilaments | 7.28E-09 | 9/25 |
| Transcription_Role of Akt in hypoxia induced HIF1 activation | 1.59E-08 | 9/27 |
| Immune response_MIF - the neuroendocrine-macrophage connector | 1.93E-08 | 11/46 |
| Development_Prolactin receptor signaling | 2.50E-08 | 12/58 |
| Cytoskeleton remodeling_TGF, WNT and cytoskeletal remodeling | 2.72E-08 | 16/111 |
| Development_Gastrin in differentiation of the gastric mucosa | 3.22E-08 | 10/38 |
| Development_GM-CSF signaling | 4.94E-08 | 11/50 |
| Development_EGFR signaling pathway | 6.66E-08 | 12/63 |
| Cytoskeleton remodeling_Regulation of actin cytoskeleton by Rho GTPases | 7.06E-08 | 8/23 |
| G-protein signaling_Proinsulin C-peptide signaling | 7.62E-08 | 11/52 |
| Development_Glucocorticoid receptor signaling | 1.04E-07 | 8/24 |
| Development_VEGF signaling and activation | 1.16E-07 | 10/43 |
| Cell adhesion_Histamine H1 receptor signaling in the interruption of cell barrier integrity | 1.85E-07 | 10/45 |
| Immune response_Inhibitory action of Lipoxins on pro-inflammatory TNF-alpha signaling | 1.85E-07 | 10/45 |
| Signal transduction_Calcium signaling | 1.85E-07 | 10/45 |
| Regulation of CFTR activity (norm and CF) | 2.50E-07 | 11/58 |
| Translation_Regulation of translation initiation | 2.92E-07 | 8/27 |
| Immune response_Histamine H1 receptor signaling in immune response | 3.53E-07 | 10/48 |
| Immune response_IL-2 activation and signaling pathway | 4.34E-07 | 10/49 |
| Transcription_P53 signaling pathway | 5.48E-07 | 9/39 |
| Development_Slit-Robo signaling | 7.19E-07 | 8/30 |
| Development_Ligand-dependent activation of the ESR1/AP-1 pathway | 7.81E-07 | 6/14 |
| Development_PDGF signaling via STATs and NF-kB | 1.24E-06 | 8/32 |
| Signal transduction_AKT signaling | 1.33E-06 | 9/43 |
| Development_VEGF signaling via VEGFR2 - generic cascades | 2.00E-06 | 9/45 |
| Development_Thrombopoietin-regulated cell processes | 2.00E-06 | 9/45 |
| Mucin expression in CF via IL-6, IL-17 signaling pathways | 2.04E-06 | 8/34 |
| Development_TGF-beta-dependent induction of EMT via RhoA, PI3K and ILK. | 2.43E-06 | 9/46 |
| Development_PIP3 signaling in cardiac myocytes | 2.93E-06 | 9/47 |
| Cytoskeleton remodeling_Keratin filaments | 3.24E-06 | 8/36 |
| Development_Thyroliberin signaling | 3.61E-06 | 10/61 |
| Development_A3 receptor signaling | 4.22E-06 | 9/49 |
| Transport_RAN regulation pathway | 4.42E-06 | 6/18 |
| Immune response_IL-7 signaling in T lymphocytes | 5.01E-06 | 8/38 |
| Muscle contraction_Regulation of eNOS activity in endothelial cells | 5.66E-06 | 10/64 |
| Immune response_IL-6 signaling pathway | 7.57E-06 | 7/29 |
* Number of known O-GlcNAcylated proteins over the total number of proteins annotated in the corresponding GeneGo pathway. The table is ranked by pathway enrichment P-value.
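The Count column (e.g., 12/34 for the CDK5 map) gives the overlap between the query proteins and each pathway. The text does not state which statistic GeneGo used to turn that overlap into a P-value, or the background sizes involved. A common choice for this kind of over-representation test is a one-sided hypergeometric test, sketched below under that assumption; the background numbers in the example (`n=400`, `N=10000`) are hypothetical, so the result will not reproduce the table's P-values.

```python
from math import comb

def hypergeom_enrichment_p(k: int, K: int, n: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: query proteins found in the pathway (12 in "12/34")
    K: proteins annotated to the pathway (34 in "12/34")
    n: query proteins with any pathway annotation (assumed)
    N: background proteins with any pathway annotation (assumed)
    """
    total = comb(N, n)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total

# Hypothetical background: 400 query proteins out of 10,000 annotated proteins.
p = hypergeom_enrichment_p(12, 34, n=400, N=10000)
print(f"P(X >= 12) = {p:.3g}")
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the intermediate terms; a library such as SciPy's `hypergeom.sf` would be the usual choice for large-scale use.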
Figure 3. Sequence patterns and prediction performance of O-GlcNAcylation sites. Above, graphical representation of sequence patterns surrounding the O-GlcNAcylation sites, as determined by Two Sample Logo. The height of each amino acid character represents the relative frequency (enrichment or depletion) of that amino acid at a given position relative to the O-GlcNAcylated residue (S/T at position "0"). Below, the ROC curve of OGlcNAcScan obtained in a five-fold cross-validation test. The area under this curve (AUC) is 74.3% of the plot area. The diagonal line is the ROC curve of random guessing, whose AUC is 50%.
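The AUC reported above has a direct probabilistic reading: it is the chance that a randomly chosen true O-GlcNAcylation site receives a higher classifier score than a randomly chosen non-site, with ties counting one half. That is why random guessing gives 50%. The sketch below computes AUC this way and shows one way to assign examples to five cross-validation folds. The text does not say whether the folds were stratified or how scores were pooled across folds, so the stratified assignment is an assumption, and both helper names are hypothetical.

```python
import random

def roc_auc(scores, labels):
    """Area under the ROC curve via its rank (Mann-Whitney) form:
    P(score of a positive > score of a negative), ties counting 1/2.
    O(P * N) pairwise comparison; fine for a sketch, slow for large data."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k=5, seed=0):
    """Assign each example a fold index in 0..k-1, dealing positives and
    negatives out separately so each fold keeps a similar class ratio."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[i] = j % k
    return folds
```

In a cross-validation run, each fold's scores would come from a classifier trained on the other four folds, and `roc_auc` would be applied to the pooled held-out scores.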
Figure 4. The dbOGAP website home page. The website provides the functionalities labeled #1-#4: 1) searching and browsing the O-GlcNAcylated proteins in the database; 2) de novo prediction of O-GlcNAcylation sites for any protein sequence; 3) user annotation of O-GlcNAcylation information; 4) searching and browsing the full O-GlcNAcylation bibliography. The dbOGAP website can be accessed at http://cbsb.lombardi.georgetown.edu/OGAP.html.
Figure 5. The dbOGAP protein entry view (shown: human AKT1). The entry report provides general protein information as well as specific O-GlcNAcylation information in the context of other posttranslational modifications and site features. The literature evidence (PMID) for the O-GlcNAc sites (e.g., S473 and T308) is given. Clicking any site displays the residue in its neighboring sequence context (blue arrow). If O-GlcNAcylation sites are inferred from orthologs with known sites (e.g., T308 of mouse AKT1, red arrow, inferred from human AKT1 shown in the inset), the sequence alignment supporting the inferred site can be displayed (lower portion of the inset). Other annotations, including Gene Ontology and pathway information derived from UniProtKB and iProClass, are also included in the entry record (below the sequence section, not shown).
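Ortholog-based inference, as in the mouse AKT1 T308 example, amounts to carrying a known site position across a pairwise alignment and checking that the aligned residue in the ortholog is still S or T. The caption does not describe the exact procedure dbOGAP uses, so the sketch below is an illustrative assumption: it takes two already-aligned sequences (gaps as '-') and maps one position. `map_site` and the toy sequences are hypothetical.

```python
def map_site(aligned_src: str, aligned_tgt: str, src_pos: int):
    """Map a 1-based residue position in the source sequence onto the
    target sequence through a pairwise alignment ('-' marks gaps).

    Returns the target site label (e.g. 'T308') if the aligned target
    residue is S or T; returns None if it is a gap or another residue.
    """
    if len(aligned_src) != len(aligned_tgt):
        raise ValueError("aligned sequences must have equal length")
    src_i = tgt_i = 0  # residue counters (gaps are not counted)
    for a, b in zip(aligned_src, aligned_tgt):
        if a != "-":
            src_i += 1
        if b != "-":
            tgt_i += 1
        if a != "-" and src_i == src_pos:
            return f"{b}{tgt_i}" if b in ("S", "T") else None
    raise ValueError("source position lies beyond the end of the alignment")

# Toy alignment: the target has an insertion (G) after residue 2.
print(map_site("MK-TSAE", "MKGTSA-", 3))  # source T3 aligns to target T4
```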
Figure 6. The O-GlcNAcylation site prediction result from OGlcNAcScan (shown: human ankyrin-1). The section at the bottom displays a ranked list of predicted O-GlcNAcylation sites (e.g., S1162 is the top-ranked site). The ranking is based on the output value of the SVM classifier, which is converted into "Estimated Precision" and "Lift" scores (see the help page linked from the top of the page for an explanation). The estimated precision score is an estimated lower bound of the precision (e.g., a score of 0.3910 indicates that at least 39.1% of sites assigned similar SVM output scores are O-GlcNAcylation sites), and the Lift score is an index of the relative improvement achieved by the classifier, calculated as the estimated precision divided by a constant corresponding to the initial rate of positive sites (~0.0123). All displayed potential sites are shown as red "S/T" in the sequence section (middle). Clicking a predicted site highlights the residue in the sequence (arrow).
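The Lift score described in the caption is a single division: estimated precision over the base rate of positive sites (~0.0123). For the ankyrin-1 example, 0.3910 / 0.0123 is about 31.8, meaning sites at that score level are roughly 32 times more likely to be O-GlcNAcylated than an S/T residue picked at random. A minimal sketch using only the numbers given in the caption (the function name is hypothetical):

```python
# Base rate of positive sites among candidate S/T residues, from the caption.
BASE_RATE = 0.0123

def lift(estimated_precision: float, base_rate: float = BASE_RATE) -> float:
    """Lift = estimated precision / base rate.

    A lift of 1.0 means no improvement over choosing S/T residues at random.
    """
    return estimated_precision / base_rate

# The ankyrin-1 example from the caption: estimated precision 0.3910.
print(f"Lift for precision 0.3910: {lift(0.3910):.2f}")
```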