A novel model to predict O-glycosylation sites using a highly unbalanced dataset.

Kun Zhou, Chunzhi Ai, Peipei Dong, Xuran Fan, Ling Yang.

Abstract

In silico approaches have become an alternative method for studying O-glycosylation. In this paper, we developed a linear, interpretable model for predicting O-glycosylation sites from an unbalanced dataset and used it to analyze the underlying biology of glycosylation. The training set comprised 4446 sites: 468 positive sites and 3978 negative sites. Sites were encoded using the amino acid index database (AAindex), and a forward stepwise procedure was used for feature selection. Linear discriminant analysis with equal a priori probabilities (PP-LDA) was employed to build the interpretable model. Model performance was assessed by both internal leave-one-out cross-validation and external validation. Two non-linear algorithms, a supervised support vector machine and an unsupervised self-organizing competitive neural network, were used for comparison. The PP-LDA model gave improved classification results, with an accuracy of 82.1% in cross-validation and 80.3% in external prediction. Further analysis of the linear model indicated that the properties at position R(1) and properties related to hydrophobicity contributed more to the glycosylation prediction. In addition, alpha-helix and turn propensities on the C-terminal side of the site, together with physicochemical properties on the N-terminal side, were also related to glycosylation activity. This model can not only predict the likelihood of glycosylation from an unbalanced dataset but also help to elucidate the underlying biological mechanisms of glycosylation. To make the prediction model publicly accessible, a downloadable program is provided in the supplementary materials.
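The abstract does not give the model's equations, so the following is only a minimal sketch, assuming a textbook two-class LDA with a pooled covariance matrix, of why equal a priori probabilities matter on a dataset as unbalanced as this one (468 positive versus 3978 negative sites). The function names (`fit_lda`, `predict`, `loocv`) are hypothetical, the random Gaussian features are a stand-in for AAindex-encoded sites, no forward stepwise selection is performed, and nothing here reproduces the authors' program or results.

```python
import numpy as np


def fit_lda(X, y, prior1=0.5):
    """Two-class LDA with a pooled covariance matrix (hypothetical helper).

    prior1 is the a priori probability of the positive class; 0.5 gives the
    equal-prior variant, so class imbalance does not shift the threshold.
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance (unbiased estimate)
    S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (len(X) - 2)
    w = np.linalg.solve(S, mu1 - mu0)  # discriminant direction S^-1 (mu1 - mu0)
    # Bias term; log(prior1 / prior0) is 0 when the priors are equal
    b = -0.5 * w @ (mu0 + mu1) + np.log(prior1 / (1.0 - prior1))
    return w, b


def predict(w, b, X):
    """Label 1 (glycosylated) when the discriminant score is positive."""
    return (X @ w + b > 0).astype(int)


def loocv(X, y, equal_priors=True):
    """Leave-one-out cross-validation; returns (accuracy, sensitivity, specificity)."""
    n = len(X)
    pred = np.empty(n, dtype=int)
    for i in range(n):
        train = np.arange(n) != i
        Xt, yt = X[train], y[train]
        # Equal priors, or the empirical positive fraction of the training fold
        prior1 = 0.5 if equal_priors else yt.mean()
        w, b = fit_lda(Xt, yt, prior1)
        pred[i] = predict(w, b, X[i:i + 1])[0]
    acc = np.mean(pred == y)
    sens = np.mean(pred[y == 1] == 1)
    spec = np.mean(pred[y == 0] == 0)
    return acc, sens, spec


# Synthetic stand-in for encoded sites, roughly the paper's 468:3978 class ratio
rng = np.random.default_rng(0)
n_pos, n_neg, d = 50, 425, 4
X = np.vstack([rng.normal(0.8, 1.0, (n_pos, d)), rng.normal(0.0, 1.0, (n_neg, d))])
y = np.array([1] * n_pos + [0] * n_neg)

for equal in (True, False):
    acc, sens, spec = loocv(X, y, equal_priors=equal)
    label = "equal priors" if equal else "empirical priors"
    print(f"{label}: accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

Because the discriminant direction `w` does not depend on the priors, using the empirical priors only lowers the bias term by log(n_pos/n_neg). That can only move held-out sites from "positive" to "negative", trading sensitivity for specificity; with equal priors the threshold stays midway between the projected class means, so the minority (glycosylated) class is not swamped by the majority.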

Year:  2012        PMID: 22864808     DOI: 10.1007/s10719-012-9434-x

Source DB:  PubMed          Journal:  Glycoconj J        ISSN: 0282-0080            Impact factor:   2.916
