
SSKM_Succ: A Novel Succinylation Sites Prediction Method Incorporating K-Means Clustering With a New Semi-Supervised Learning Algorithm.

Qiao Ning, Zhiqiang Ma, Xiaowei Zhao, Minghao Yin.   

Abstract

Protein succinylation is a type of post-translational modification (PTM) that occurs on lysine sites and plays a key role in regulating protein conformation and controlling cellular function. When training a computational method, it is difficult to designate negative samples because the non-succinylation status of lysine sites is uncertain, and if this is not handled properly, it may dramatically degrade the performance of computational models. Therefore, we propose a new semi-supervised learning method to identify reliable non-succinylation lysine sites as negative samples. This method, named SSKM_Succ, also employs K-means clustering to divide the data into 5 clusters. In addition, information on proximal PTMs and three kinds of sequence features (grey pseudo amino acid composition, K-space, and position-specific amino acid propensity) are used to represent protein samples. We then perform a two-step feature selection to remove redundant features and construct an optimized model for each cluster. Finally, a support vector machine is applied to construct a prediction model for each cluster. The method achieves an accuracy of 80.18 percent for succinylation sites on the independent testing dataset. We also compare the results with other existing tools, and the comparison indicates that our method is promising for predicting succinylation sites. Through further analysis, we verify that succinylated proteins have potential effects on amino acid degradation and fatty acid metabolism, and we speculate that protein succinylation may be closely related to neurodegenerative diseases. The code of SSKM_Succ is available at https://github.com/yangyq505/SSKM_Succ.git.
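To make one of the sequence features concrete, the following is a minimal sketch of a k-spaced amino acid pair encoding. It assumes that "K-space" in the abstract refers to this commonly used encoding, which the abstract does not spell out. The function name `k_spaced_pair_features`, the `max_gap` parameter, and the example window are hypothetical and are not taken from the SSKM_Succ code.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order shared by every feature vector.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def k_spaced_pair_features(window, max_gap=2):
    """Normalized counts of residue pairs separated by 0..max_gap positions.

    Assumption: this is one common reading of a "K-space" feature;
    the original method may differ in gap range or normalization.
    """
    features = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_positions = len(window) - gap - 1
        for i in range(n_positions):
            pair = window[i] + window[i + gap + 1]
            if pair in counts:  # skip padding or non-standard residues
                counts[pair] += 1
        total = max(n_positions, 1)  # avoid division by zero on short windows
        features.extend(counts[p] / total for p in PAIRS)
    return features


# Hypothetical 11-residue window centred on a candidate lysine (K).
window = "AKLMKKEDGSA"
vec = k_spaced_pair_features(window, max_gap=2)
print(len(vec))  # 1200 = 3 gaps x 400 pairs
```

Each window maps to a fixed-length vector, so vectors of this kind could be concatenated with the other feature groups before clustering and classification.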


Year: 2022    PMID: 32750881    DOI: 10.1109/TCBB.2020.3006144

Source DB: PubMed    Journal: IEEE/ACM Trans Comput Biol Bioinform    ISSN: 1545-5963    Impact factor: 3.710


  3 in total

1.  PUP-Fuse: Prediction of Protein Pupylation Sites by Integrating Multiple Sequence Representations.

Authors:  Firda Nurul Auliah; Andi Nur Nilamyani; Watshara Shoombuatong; Md Ashad Alam; Md Mehedi Hasan; Hiroyuki Kurata
Journal:  Int J Mol Sci       Date:  2021-02-20       Impact factor: 5.923

2.  Deep_KsuccSite: A novel deep learning method for the identification of lysine succinylation sites.

Authors:  Xin Liu; Lin-Lin Xu; Ya-Ping Lu; Ting Yang; Xin-Yu Gu; Liang Wang; Yong Liu
Journal:  Front Genet       Date:  2022-09-29       Impact factor: 4.772

3.  PredNTS: Improved and Robust Prediction of Nitrotyrosine Sites by Integrating Multiple Sequence Features.

Authors:  Andi Nur Nilamyani; Firda Nurul Auliah; Mohammad Ali Moni; Watshara Shoombuatong; Md Mehedi Hasan; Hiroyuki Kurata
Journal:  Int J Mol Sci       Date:  2021-03-08       Impact factor: 5.923

