Yaning Liu¹, Zhaomin Yu¹, Cheng Chen¹, Yu Han¹, Bin Yu².
Abstract
Lysine crotonylation is an important protein post-translational modification that plays a role in chromosome organization and nucleic acid metabolism. Identifying crotonylation sites is essential for understanding protein function and mechanisms. Traditional experimental methods are time-consuming and expensive, and they cannot identify crotonylation sites quickly and accurately. This paper therefore proposes a novel crotonylation site prediction method called LightGBM-CroSite. First, binary encoding (BE), position weight amino acid composition (PWAA), encoding based on grouped weight (EBGW), k-nearest neighbors (KNN), and pseudo-position-specific scoring matrix (PsePSSM) are used to extract features from protein sequences, yielding the original feature space. Second, the elastic net is used to remove redundant information and select the optimal feature subset. Third, the synthetic minority oversampling technique (SMOTE) is used to balance the positive and negative samples. Finally, the balanced feature vectors are input into LightGBM to predict crotonylation sites. Under the jackknife test, the accuracy (ACC), Matthews correlation coefficient (MCC) and area under the ROC curve (AUC) are 98.99%, 0.9798 and 0.9996, respectively. Compared with other state-of-the-art methods, LightGBM-CroSite shows better predictive performance on crotonylation sites. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/LightGBM-CroSite/.
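Several steps of the pipeline described above can be sketched in plain Python. This is a minimal, illustrative sketch and not the authors' implementation; the repository linked above is the authoritative source. It covers four steps: binary encoding of a residue window, SMOTE-style oversampling, the jackknife (leave-one-out) test, and ACC/MCC.

The following choices are assumptions not taken from the paper:

* The 21-letter alphabet, with `X` used for padding and for non-standard residues.
* The hypothetical 1-nearest-neighbour classifier `nearest_neighbour`, which only stands in for LightGBM.

The PWAA, EBGW, KNN and PsePSSM features, elastic-net selection, and AUC are omitted. In Python these steps are commonly handled with libraries such as scikit-learn (`ElasticNet`, `roc_auc_score`), imbalanced-learn (`SMOTE`) and the `lightgbm` package (`LGBMClassifier`).

```python
import math
import random

# Assumption: 20 standard amino acids plus 'X', used for padding and for any
# non-standard residue. The paper's exact alphabet is not stated here.
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"


def binary_encode(window: str) -> list[int]:
    """Binary (one-hot) encoding: each residue becomes a 21-bit indicator vector."""
    vec: list[int] = []
    for residue in window.upper():
        bits = [0] * len(ALPHABET)
        idx = ALPHABET.find(residue)
        bits[idx if idx >= 0 else len(ALPHABET) - 1] = 1  # unknown residue -> 'X'
        vec.extend(bits)
    return vec


def smote(minority: list[list[float]], n_new: int, k: int = 5,
          seed: int = 0) -> list[list[float]]:
    """SMOTE-style oversampling (needs at least two minority samples).

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours (Euclidean distance).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def nearest_neighbour(X_train, y_train, x):
    """Hypothetical stand-in classifier (1-NN); LightGBM would take its place."""
    j = min(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    return y_train[j]


def jackknife(X, y, fit_predict):
    """Jackknife (leave-one-out) test: predict each sample from all the others."""
    return [fit_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i])
            for i in range(len(X))]


def acc_mcc(y_true, y_pred):
    """Accuracy and Matthews correlation coefficient for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 when undefined
    return acc, mcc
```

Two usage examples:

* `jackknife([[0.0], [0.1], [1.0], [1.1]], [0, 0, 1, 1], nearest_neighbour)` returns `[0, 0, 1, 1]`.
* `acc_mcc([1, 1, 0, 0], [1, 0, 0, 0])` gives an accuracy of 0.75 and an MCC of about 0.577.

The helpers are kept independent. To combine `smote` with `jackknife`, one option is to oversample inside `fit_predict` using only the training portion of each split. That way, no synthetic point is derived from the held-out sample.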
Keywords: Crotonylation sites; Elastic net; LightGBM; Multi-feature fusion; SMOTE
Year: 2020 PMID: 32805274 DOI: 10.1016/j.ab.2020.113903
Source DB: PubMed Journal: Anal Biochem ISSN: 0003-2697 Impact factor: 3.365