YuDong Cai, JianFeng He, Lin Lu.
Abstract
Mucin-type O-glycosylation is an important type of protein post-translational modification. The process is mediated by a family of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferases, which transfer N-acetylgalactosamine (GalNAc) to serine or threonine residues with unknown specificity. To determine the glycosylation sites of a given protein, we present a two-stage prediction method: the first stage determines whether a protein is a glycoprotein, and the second determines the glycosylation sites of proteins predicted to be glycosylated in the first stage. In the first stage, a protein is encoded by the protein families in PFAM, an annotated database of classified protein families, and is then classified by a predictor trained on the training set. In the second stage, nonapeptides of the predicted mucin-type glycoproteins, with a serine or threonine residue at the fifth site, are represented by indices from AAIndex. A predictor, constructed with features selected by the Maximum Relevance Minimum Redundancy (mRMR) method and the Incremental Feature Selection method, then predicts whether GalNAc is attached to each nonapeptide. The prediction accuracy of the first stage is 94.9%, validated by the Leave-One-Out method; the prediction accuracy of the second stage is 99.4%. These results show that the method is valuable for studying mucin-type O-glycosylation. Analysis of the features used to construct the second-stage predictor confirms results previously obtained by other groups. The residues at positions -1 and +3 have a great impact on the prediction. Among the other amino acid indices, those describing alpha and turn propensities and the hydrophobicity of the residues in the nonapeptide also influence recognition by the GalNAc transferases. A web server is available at http://chemdata.shu.edu.cn/gal/.
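The second-stage input described in the abstract, nonapeptides with a serine or threonine at the fifth site, can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `nonapeptides`, the example sequence, and the padding of the termini with a placeholder `X` are assumptions made here for illustration, since the abstract does not state how sites near the ends of a protein are handled.

```python
from typing import List, Tuple

WINDOW = 9            # nonapeptide length, as stated in the abstract
CENTER = WINDOW // 2  # zero-based index 4, i.e. the fifth site


def nonapeptides(sequence: str, pad: str = "X") -> List[Tuple[int, str]]:
    """Return (1-based site position, nonapeptide) for every Ser/Thr.

    Padding the termini with `pad` is an assumption of this sketch; the
    abstract does not say how sites near the protein ends are treated.
    """
    seq = sequence.upper()
    padded = pad * CENTER + seq + pad * CENTER
    windows = []
    for i, residue in enumerate(seq):
        if residue in "ST":
            # Residue i sits at index i + CENTER in `padded`, so a slice
            # starting at i places it at the fifth site of the window.
            windows.append((i + 1, padded[i:i + WINDOW]))
    return windows


if __name__ == "__main__":
    # Hypothetical sequence, used only to show the window layout.
    for position, peptide in nonapeptides("MKTAAPSGL"):
        print(position, peptide)
```

With this layout, the positions -1 and +3 mentioned in the abstract correspond to `peptide[3]` and `peptide[7]` of each window. Each window would then still need to be mapped to AAIndex values before feature selection, a step this sketch leaves out.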
Year: 2010 PMID: 20652405 DOI: 10.1007/s11030-010-9240-y
Source DB: PubMed Journal: Mol Divers ISSN: 1381-1991 Impact factor: 2.943