Wang-Ren Qiu (1, 2, 3), Quan-Shu Zheng (1), Bi-Qian Sun (1), Xuan Xiao (1, 4).
Abstract
Predicting phosphorylated proteins is a challenging problem, particularly when the query proteins have multi-label features, meaning that they may be phosphorylated at two or more different types of amino acid residues. In human proteins, phosphorylation usually occurs at serine, threonine and tyrosine. By introducing the "multi-label learning" approach, we developed a novel predictor that can handle systems containing both single- and multi-label phosphorylated proteins. The predictor, called Multi-iPPseEvo, was built by (1) incorporating protein sequence evolutionary information into the general pseudo amino acid composition (PseAAC) via grey system theory, (2) balancing the skewed training datasets with the asymmetric bootstrap approach, and (3) constructing an ensemble predictor that fuses an array of individual random forest classifiers through a voting system. Rigorous cross-validation with a set of multi-label metrics indicates that the multi-label phosphorylation predictor is very promising. The current approach represents a new strategy for multi-label biological problems, and the software is freely available for academic use at http://www.jci-bioinfo.cn/Multi-iPPseEvo.
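Step (1) of the abstract uses grey system theory to fold evolutionary information into PseAAC. The abstract does not say which grey model is used, so the following is only a minimal sketch assuming the standard GM(1,1) model: the sequence is accumulated (1-AGO), background values z1(k) are the means of consecutive accumulated values, and the coefficients a and b of x0(k) + a·z1(k) = b are fitted by least squares. The helper `grey_features` is hypothetical; it assumes a position-by-residue matrix with non-negative entries (for example, a normalized PSSM) and fits one model per column. The paper may use a different variant or input.

```python
from typing import List, Sequence, Tuple


def gm11_coefficients(x0: Sequence[float]) -> Tuple[float, float]:
    """Fit the standard GM(1,1) grey model and return (a, b).

    Model (assumed variant): x0(k) + a * z1(k) = b for k = 2..n, where x1 is
    the accumulated sequence of x0 and z1(k) = (x1(k) + x1(k-1)) / 2.
    """
    n = len(x0)
    if n < 3:
        raise ValueError("GM(1,1) needs at least 3 values")
    # First-order accumulated generating operation (1-AGO).
    x1: List[float] = []
    running = 0.0
    for value in x0:
        running += value
        x1.append(running)
    # Background values z1(k) for k = 2..n, paired with the targets x0(k).
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = [float(x0[k]) for k in range(1, n)]
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    # Closed-form least squares for [a, b] = (B^T B)^-1 B^T Y, rows of B = [-z1(k), 1].
    det = m * szz - sz * sz
    if det == 0:
        raise ValueError("degenerate sequence: background values are constant")
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det
    return a, b


def grey_features(profile: List[List[float]]) -> List[float]:
    """Hypothetical helper: [a1, b1, a2, b2, ...], one GM(1,1) fit per column."""
    features: List[float] = []
    for col in range(len(profile[0])):
        a, b = gm11_coefficients([row[col] for row in profile])
        features.extend([a, b])
    return features
```

For a geometric sequence such as `[1, 2, 4, 8, 16]` the model fits exactly, giving a = -2/3 and b = 2/3; a constant column gives a = 0 and b equal to the constant. The resulting coefficients could then be appended to the 20 amino acid composition components as extra pseudo components, but how the original predictor weights or normalizes them is not stated in the abstract.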
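Steps (2) and (3) of the abstract balance skewed training data and fuse several random forest classifiers by voting. The abstract does not define the asymmetric bootstrap, so this sketch swaps in a plainly simpler stand-in, balanced bootstrap sampling: every minority-class sample is kept and an equal-sized sample is drawn with replacement from the majority class, one draw per ensemble member. The fusion step is a plain majority vote over binary per-label outputs; the function names and the rule that ties count as negative are assumptions, and the trained classifiers themselves are replaced by precomputed predictions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def balanced_bootstrap(minority: Sequence[T], majority: Sequence[T],
                       rng: random.Random) -> Tuple[List[T], List[T]]:
    """Stand-in for the paper's asymmetric bootstrap (assumption):
    keep all minority samples and draw as many majority samples with replacement."""
    sampled = [rng.choice(majority) for _ in range(len(minority))]
    return list(minority), sampled


def majority_vote(votes: List[List[int]]) -> List[int]:
    """Fuse binary predictions from several classifiers.

    votes[c][j] is 1 if classifier c predicts label j (e.g. S, T or Y), else 0.
    A label is kept only when strictly more than half of the classifiers vote
    for it, so ties count as negative (assumed rule).
    """
    if not votes:
        raise ValueError("need at least one classifier")
    n_labels = len(votes[0])
    if any(len(v) != n_labels for v in votes):
        raise ValueError("all classifiers must score the same labels")
    n_clf = len(votes)
    fused: List[int] = []
    for j in range(n_labels):
        positive = sum(v[j] for v in votes)
        fused.append(1 if 2 * positive > n_clf else 0)
    return fused
```

In an ensemble, each balanced draw would train one classifier (a random forest in the abstract), and `majority_vote` would combine their outputs for each query protein. Drawing a fresh balanced sample per member keeps every classifier from being dominated by the majority class while still exposing the ensemble to many different majority examples.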
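The abstract evaluates the predictor with "a set of multi-label metrics" without naming them. A minimal sketch, assuming the five metrics commonly used for multi-label predictors in this line of work (aiming, coverage, accuracy, absolute true and absolute false), each averaged over samples from the true label set and the predicted label set of every protein. The function name is hypothetical, and treating an empty prediction as contributing 0 to aiming is an assumed convention.

```python
from typing import Dict, List, Set


def multilabel_metrics(true: List[Set[str]], pred: List[Set[str]],
                       n_labels: int) -> Dict[str, float]:
    """Aiming, coverage, accuracy, absolute true and absolute false (assumed set)."""
    if len(true) != len(pred) or not true:
        raise ValueError("need equal, non-empty lists of label sets")
    if any(not t for t in true):
        raise ValueError("every sample needs at least one true label")
    n = len(true)
    aiming = coverage = accuracy = absolute_true = absolute_false = 0.0
    for t, p in zip(true, pred):
        inter = len(t & p)
        union = len(t | p)  # > 0 because every true set is non-empty
        # An empty prediction contributes 0 to aiming (assumed convention).
        aiming += inter / len(p) if p else 0.0
        coverage += inter / len(t)
        accuracy += inter / union
        absolute_true += 1.0 if t == p else 0.0
        # Fraction of all labels that are wrongly added or missed.
        absolute_false += (union - inter) / n_labels
    return {
        "aiming": aiming / n,
        "coverage": coverage / n,
        "accuracy": accuracy / n,
        "absolute_true": absolute_true / n,
        "absolute_false": absolute_false / n,
    }
```

With the three labels S, T and Y, true sets `[{"S"}, {"S", "T"}, {"Y"}]` and predictions `[{"S"}, {"S"}, {"T"}]` give aiming 2/3, coverage 0.5, accuracy 0.5, absolute true 1/3 and absolute false 1/3. Absolute true rewards only exact matches, which is why it is usually the strictest of the five.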
Keywords: Ensemble classifier; Multi-label learning; Protein phosphorylation; Random Forests
Year: 2016 PMID: 27681207 DOI: 10.1002/minf.201600085
Source DB: PubMed Journal: Mol Inform ISSN: 1868-1743 Impact factor: 3.353