Prediction of O-glycosylation sites based on multi-scale composition of amino acids and feature selection.

Yuan Chen, Wei Zhou, Haiyan Wang, Zheming Yuan.

Abstract

Protein glycosylation is one of the most important and complex post-translational modifications, providing greater proteomic diversity than any other post-translational modification. Fast and reliable computational methods to identify glycosylation sites are in great demand. Two key issues, feature encoding and feature selection, can critically affect the accuracy of a computational method. We present a new O-glycosylation site prediction method that uses only amino acid sequence information. The method includes the following components: (1) on the basis of multi-scale theory, features based on the multi-scale composition of amino acids are extracted from training sequences with identified glycosylation sites; (2) a two-stage feature selection removes features that have adverse effects on the prediction, consisting of a preliminary first-stage filtering with Student's t test and a second-stage screening through iterative elimination using novel pairwise comparisons conducted in random subspaces with a support vector machine. The important features retained are used to build the prediction model. The method is evaluated with sequence-based tenfold cross-validation tests on balanced datasets. The results of our experiments show that our method significantly outperforms those reported in the literature in terms of sensitivity, specificity, accuracy, and Matthews correlation coefficient. The prediction accuracy for serine (S) and threonine (T) sites reached 95.7% and 92.7%, respectively. The Matthews correlation coefficient of our method for S and T sites is 0.914 and 0.873, respectively. The feature selection evaluates each feature in the context of its interactions with the remaining features still included in the model, and it has the advantage of high efficiency.
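The abstract does not give the feature definition or the selection procedure in detail, so the following is only a minimal sketch of three of the ingredients it names: a multi-scale amino acid composition encoding, a Student's t statistic for first-stage filtering, and the four reported evaluation measures. The function names, the window half-widths, and the use of plain residue frequencies are illustrative assumptions, not the published method. The second-stage SVM-based pairwise elimination is not covered here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def multiscale_composition(seq, site, half_widths=(2, 5, 10)):
    """Residue frequencies in windows of several sizes centred on `site`.

    Hypothetical encoding: the half-widths and the plain-frequency
    definition are assumptions made for illustration only.
    """
    features = []
    for w in half_widths:
        # Windows are clipped at the sequence ends.
        window = seq[max(0, site - w): site + w + 1]
        counts = Counter(window)
        features.extend(counts[a] / len(window) for a in AMINO_ACIDS)
    return features


def student_t(xs, ys):
    """Pooled-variance Student's t statistic for one feature in two classes."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    pooled_var = (ssx + ssy) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mx - my) / se if se > 0 else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }
```

In a filter of this kind, features whose absolute t statistic falls below a chosen threshold would be dropped before any model-based screening. The threshold itself is a tuning choice that the abstract does not specify.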

Year: 2015    PMID: 25752770    DOI: 10.1007/s11517-015-1268-9

Source DB: PubMed    Journal: Med Biol Eng Comput    ISSN: 0140-0118    Impact factor: 2.602


