Data construction for phosphorylation site prediction.

Haipeng Gong, Xiaoqing Liu, Jun Wu, Zengyou He.   

Abstract

Protein phosphorylation is one of the most pervasive post-translational modifications, regulating diverse cellular processes in various organisms. Because mass spectrometry-based experimental approaches for identifying phosphorylation events are resource-intensive, many computational methods have been proposed in which phosphorylation site prediction is formulated as a classification problem. These methods differ in several ways, and one crucial issue is how the training and test data are constructed for unbiased performance evaluation. In this article, we categorize the existing data construction methods and try to answer three questions: (i) Are different data construction methods equivalent for assessing phosphorylation site prediction algorithms? (ii) What kind of test data set is unbiased for assessing the prediction performance of a trained algorithm in different real-world scenarios? (iii) Among the summarized training data construction methods, which have better generalization performance in most scenarios? To answer these questions, we conduct comprehensive experimental studies for both non-kinase-specific and kinase-specific prediction tasks. The experimental results show that: (i) different data construction methods can lead to significantly different prediction performance; (ii) different test data construction methods can be unbiased with respect to different real-world scenarios; and (iii) different data construction methods have different generalization performance in different real-world scenarios. Therefore, when developing new algorithms, researchers should consider which scenario their algorithm is intended for, what the corresponding unbiased test data are and which training data construction method yields the best generalization performance.
© The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
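
To make the "classification problem" formulation in the abstract concrete, the following is a minimal sketch of one common way to turn a protein sequence into labeled instances: every serine, threonine or tyrosine becomes a candidate, represented by a fixed-width window of surrounding residues, and is labeled positive if it is an annotated phosphorylation site. This is an illustrative assumption, not the data construction procedure evaluated in the article. The function name `extract_windows`, the window half-width, the padding character and the toy sequence are all hypothetical.

```python
from typing import List, Set, Tuple


def extract_windows(
    sequence: str,
    phospho_positions: Set[int],
    half_width: int = 7,
    residues: str = "STY",
    pad: str = "-",
) -> List[Tuple[str, int, int]]:
    """Return (window, position, label) for every candidate residue.

    Assumptions (not taken from the article): positions are 0-based,
    windows are padded at the sequence ends, and any S/T/Y residue that
    is not annotated as phosphorylated is treated as a negative.
    """
    # Pad both ends so residues near the termini still get full-width windows.
    padded = pad * half_width + sequence + pad * half_width
    instances = []
    for i, aa in enumerate(sequence):
        if aa not in residues:
            continue
        # In the padded string the residue sits at index i + half_width,
        # so the window centered on it starts at index i.
        window = padded[i : i + 2 * half_width + 1]
        label = 1 if i in phospho_positions else 0
        instances.append((window, i, label))
    return instances


if __name__ == "__main__":
    # Toy sequence with one annotated site at position 2 (hypothetical data).
    for window, pos, label in extract_windows("MKSPTAYR", {2}, half_width=2):
        print(pos, window, label)
```

How negatives are chosen (for example, all unannotated S/T/Y residues versus only those from proteins with at least one known site) is exactly the kind of construction choice the abstract argues can change measured performance, so the labeling rule here should be read as one option among several.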

Keywords:  Friedman test; Wilcoxon signed-ranks test; classification; generalization performance; phosphorylation site prediction

Year: 2013    PMID: 23543354    DOI: 10.1093/bib/bbt012

Source DB: PubMed    Journal: Brief Bioinform    ISSN: 1467-5463    Impact factor: 11.622


