Jiangming Huang, Mengxi Wu, Yang Zhang, Siyuan Kong, Mingqi Liu, Biyun Jiang, Pengyuan Yang, Weiqian Cao.
Abstract
Numerous studies on cancers, biopharmaceuticals, and clinical trials have necessitated comprehensive and precise analysis of protein O-glycosylation. However, the lack of updated and convenient databases hinders the storage of and reference to emerging O-glycoprotein data. To resolve this issue, an O-glycoprotein repository named OGP was established in this work. It was constructed from a collection of O-glycoprotein data from different sources. OGP contains 9354 O-glycosylation sites and 11,633 site-specific O-glycans mapping to 2133 O-glycoproteins, and it is the largest O-glycoprotein repository thus far. Based on the recorded O-glycosylation sites, an O-glycosylation site prediction tool was developed. Moreover, an OGP-based website is already available (https://www.oglyp.org/). The website comprises four specially designed and user-friendly modules: statistical analysis, database search, site prediction, and data submission. The first version of the OGP repository and the website allow users to obtain various O-glycoprotein-related information, such as protein accession numbers, O-glycosylation sites, O-glycopeptide sequences, site-specific O-glycan structures, experimental methods, and potential O-glycosylation sites. O-glycosylation data mining can be performed efficiently on this website, which will greatly facilitate related studies. In addition, the database can be downloaded from the OGP website (https://www.oglyp.org/download.php).
Keywords: Data mining; O-glycoprotein related website; O-glycoprotein repository; O-glycosylation; Site prediction
Year: 2021 PMID: 33581334 PMCID: PMC9039567 DOI: 10.1016/j.gpb.2020.05.003
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 6.409
Figure 1. Overview of the OGP repository. A. OGP data collection. B. The scale of the OGP repository. C. Species distribution of O-glycoproteins and O-glycosylation sites in OGP. D. Comparison of OGP with O-GlycBase v6.0 at the glycosylation site level. E. Comparison of OGP with O-GlycBase v6.0 at the glycoprotein level.
Figure 2. Development of the OGP-based O-glycosylation site prediction model. A. Workflow for building the OGP-based O-glycosylation site prediction model. B. Effect of scales and ratios of positive and negative instances on model prediction performance. C. Influence of amino acid residue length on the performance of the site prediction model. D. ROC curves of each classification algorithm. E. Precision-recall curves of each classification algorithm. NB, naïve Bayesian; RF, random forest; SVM, support vector machine; ROC, receiver operating characteristic; AUC, area under the ROC curve; ANN, artificial neural networks; C4.5, C4.5 decision tree; KNN, k-nearest neighbors.
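The two quantities named in the Figure 2 caption, the amino acid residue length of each instance and the AUC used to compare classifiers, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that candidate sites are limited to serine and threonine residues, that each instance is a fixed-length window centred on the candidate, and that sequence ends are padded with a placeholder character. The function names `extract_windows` and `roc_auc`, the `flank` parameter, and the `X` padding are hypothetical choices made for illustration only.

```python
from typing import List, Tuple


def extract_windows(sequence: str, flank: int = 7, pad: str = "X") -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every Ser/Thr residue.

    Each window holds `flank` residues on either side of the candidate site,
    so its length is 2 * flank + 1. Positions past the sequence ends are
    filled with `pad` (an assumption, not taken from the source).
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue in "ST":  # assumption: only Ser/Thr are candidate sites
            # Residue i sits at index i + flank in the padded string,
            # so the slice below is centred on it.
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the probability that a randomly chosen positive instance
    receives a higher score than a randomly chosen negative one,
    with ties counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy sequence and toy scores, invented purely for illustration.
    print(extract_windows("MSAT", flank=2))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Changing `flank` changes the window length, which is the kind of parameter panel C varies. The pairwise AUC is quadratic in the number of instances, so a library routine would be a better fit for data at the scale of OGP.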
Figure 3. Construction of the OGP-based website. A. The MVC framework of the OGP-based website. B. Homepage of the website. MVC, Model View Controller.
Figure 4. A webpage returned from a query for Fibrinogen gamma chain. A. Basic information of the O-glycoprotein. B. Protein sequence and all recorded O-glycosylation sites highlighted in pink. C. Experimentally verified O-glycopeptides and site-specific O-glycans. D. Corresponding experimental methods. E. Related source of references.