Abel Avitesh Chandra1, Alok Sharma1,2,3, Abdollah Dehzangi4,5, Tatushiko Tsunoda2,6,7. 1. School of Engineering & Physics, University of the South Pacific, Laucala Bay, Suva, Fiji. 2. Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan. 3. Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD 4111, Australia. 4. Department of Computer Science, Rutgers University, Camden, NJ 08102, USA. 5. Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA. 6. Laboratory for Medical Science Mathematics, Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo 113-0033, Japan. 7. Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo 113-8510, Japan.
Abstract
BACKGROUND: Post-translational modification (PTM) is a biological process that is associated with the modification of proteome, which results in the alteration of normal cell biology and pathogenesis. There have been numerous PTM reports in recent years, out of which, lysine phosphoglycerylation has emerged as one of the recent developments. The traditional methods of identifying phosphoglycerylated residues, which are experimental procedures such as mass spectrometry, have shown to be time-consuming and cost-inefficient, despite the abundance of proteins being sequenced in this post-genomic era. Due to these drawbacks, computational techniques are being sought to establish an effective identification system of phosphoglycerylated lysine residues. The development of a predictor for phosphoglycerylation prediction is not a first, but it is necessary as the latest predictor falls short in adequately detecting phosphoglycerylated and non-phosphoglycerylated lysine residues. RESULTS: In this work, we introduce a new predictor named RAM-PGK, which uses sequence-based information relating to amino acid residues to predict phosphoglycerylated and non-phosphoglycerylated sites. A benchmark dataset was employed for this purpose, which contained experimentally identified phosphoglycerylated and non-phosphoglycerylated lysine residues. From the dataset, we extracted the residue adjacency matrix pertaining to each lysine residue in the protein sequences and converted them into feature vectors, which is used to build the phosphoglycerylation predictor. CONCLUSION: RAM-PGK, which is based on sequential features and support vector machine classifiers, has shown a noteworthy improvement in terms of performance in comparison to some of the recent prediction methods. The performance metrics of the RAM-PGK predictor are: 0.5741 sensitivity, 0.6436 specificity, 0.0531 precision, 0.6414 accuracy, and 0.0824 Mathews correlation coefficient.
BACKGROUND: Post-translational modification (PTM) is a biological process that is associated with the modification of proteome, which results in the alteration of normal cell biology and pathogenesis. There have been numerous PTM reports in recent years, out of which, lysine phosphoglycerylation has emerged as one of the recent developments. The traditional methods of identifying phosphoglycerylated residues, which are experimental procedures such as mass spectrometry, have shown to be time-consuming and cost-inefficient, despite the abundance of proteins being sequenced in this post-genomic era. Due to these drawbacks, computational techniques are being sought to establish an effective identification system of phosphoglycerylated lysine residues. The development of a predictor for phosphoglycerylation prediction is not a first, but it is necessary as the latest predictor falls short in adequately detecting phosphoglycerylated and non-phosphoglycerylated lysine residues. RESULTS: In this work, we introduce a new predictor named RAM-PGK, which uses sequence-based information relating to amino acid residues to predict phosphoglycerylated and non-phosphoglycerylated sites. A benchmark dataset was employed for this purpose, which contained experimentally identified phosphoglycerylated and non-phosphoglycerylated lysine residues. From the dataset, we extracted the residue adjacency matrix pertaining to each lysine residue in the protein sequences and converted them into feature vectors, which is used to build the phosphoglycerylation predictor. CONCLUSION: RAM-PGK, which is based on sequential features and support vector machine classifiers, has shown a noteworthy improvement in terms of performance in comparison to some of the recent prediction methods. The performance metrics of the RAM-PGK predictor are: 0.5741 sensitivity, 0.6436 specificity, 0.0531 precision, 0.6414 accuracy, and 0.0824 Mathews correlation coefficient.
Entities:
Keywords:
amino acids; lysine; non-phosphoglycerylation; phosphoglycerylation; post-translational modification; predictor; protein lysine modification database; protein sequence; residue adjacency matrix; support vector machine
Authors: Jeongsoon Park; Yue Chen; Daniel X Tishkoff; Chao Peng; Minjia Tan; Lunzhai Dai; Zhongyu Xie; Yi Zhang; Bernadette M M Zwaans; Mary E Skinner; David B Lombard; Yingming Zhao Journal: Mol Cell Date: 2013-06-27 Impact factor: 17.970