Chi-Chang Chang1,2, Chi-Hua Tung3, Chi-Wei Chen4,5, Chin-Hau Tu5, Yen-Wei Chu6,7. 1. School of Medical Informatics, Chung-Shan Medical University, Taichung, Taiwan. 2. IT Office, Chung Shan Medical University Hospital, Taichung, Taiwan. 3. Department of Bioinformatics, Chung-Hua University, Rm. S116, 707, Sec. 2, WuFu Rd., Hsinchu, 30012, Taiwan. 4. Department of Computer Science and Engineering, National Chung-Hsing University, 250, Kuo Kuang Rd., Taichung, 402, Taiwan. 5. Institute of Genomics and Bioinformatics, National Chung Hsing University, 250, Kuo Kuang Rd., Taichung, 402, Taiwan. 6. Institute of Genomics and Bioinformatics, National Chung Hsing University, 250, Kuo Kuang Rd., Taichung, 402, Taiwan. ywchu@nchu.edu.tw. 7. Biotechnology Center, Agricultural Biotechnology Center, Institute of Molecular Biology, National Chung Hsing University, 250, Kuo Kuang Rd., Taichung, 402, Taiwan. ywchu@nchu.edu.tw.
Abstract
Most modern tools for predicting sites of small ubiquitin-like modifier (SUMO) attachment (SUMOylation) rely on algorithms, chemical features of the protein, and consensus motifs. However, these tools rarely consider how post-translational modification (PTM) information at other sites within the same protein influences prediction accuracy. This study applied the Random Forest machine learning method, together with motif screening models and a feature selection combination mechanism, to develop a SUMOylation prediction system called SUMOgo. For prediction, PTM sites were encoded as new functional features alongside structural features such as sequence-based binary coding, encoded chemical features of proteins, and encoded secondary structure information that is important for PTM. Twenty cycles of prediction were conducted, each with a 1:1 combination of positive test data and randomly selected negative data. The Matthews correlation coefficient (MCC) of SUMOgo reached 0.511, which is higher than that of commonly used existing tools. This study further verified the important role of PTM information in SUMOgo and includes a case study on CREB binding protein (CREBBP). The final tool is available at http://predictor.nchu.edu.tw/SUMOgo .
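The abstract names three concrete ingredients of the evaluation: sequence-based binary coding of residues, a 1:1 balance of positive sites to randomly drawn negative sites repeated over twenty cycles, and the Matthews correlation coefficient. The Python sketch below illustrates each one under stated assumptions; it is not SUMOgo's implementation. The 21-symbol alphabet (with `X` for padding), the window half-width of 3, and the function names `binary_encode_window`, `matthews_cc`, and `balanced_sets` are hypothetical choices made for illustration. The Random Forest classifier itself is not shown; any implementation (for example scikit-learn's `RandomForestClassifier`) could be trained on the encoded windows of each balanced set.

```python
import math
import random

# Hypothetical alphabet: the 20 standard amino acids plus 'X', used for
# padding positions outside the sequence and for non-standard residues.
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"


def binary_encode_window(sequence, site, half_window=3):
    """One-hot ("sequence-based binary") code of the residues around `site`.

    `site` is a 0-based index (e.g. a candidate lysine). `half_window` is an
    assumed window half-width, not a value taken from SUMOgo.
    Returns (2 * half_window + 1) * len(ALPHABET) binary features.
    """
    features = []
    for pos in range(site - half_window, site + half_window + 1):
        residue = sequence[pos] if 0 <= pos < len(sequence) else "X"
        if residue not in ALPHABET:
            residue = "X"
        # Exactly one 1 per window position, at the residue's alphabet index.
        features.extend(1 if residue == letter else 0 for letter in ALPHABET)
    return features


def matthews_cc(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when any marginal count is zero (undefined denominator).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def balanced_sets(positives, negatives, cycles=20, seed=0):
    """Yield `cycles` pairs of (all positives, equal-sized random negatives).

    Requires len(negatives) >= len(positives); each cycle draws a fresh
    sample without replacement from the negative pool.
    """
    rng = random.Random(seed)
    for _ in range(cycles):
        yield positives, rng.sample(negatives, len(positives))
```

Under this setup, one would compute `matthews_cc` from each cycle's confusion matrix and summarize the twenty values, for instance by their mean; the abstract does not state how SUMOgo aggregated its cycles.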