Prediction of protein crotonylation sites through LightGBM classifier based on SMOTE and elastic net.

Yaning Liu, Zhaomin Yu, Cheng Chen, Yu Han, Bin Yu.

Abstract

Lysine crotonylation is an important protein post-translational modification that plays a key role in chromosome organization and nucleic acid metabolism. Recognizing crotonylation sites is important for understanding the function and mechanism of proteins. Traditional experimental methods are time-consuming and expensive, and cannot identify crotonylation sites quickly and accurately. Therefore, this paper proposes a novel crotonylation site prediction method called LightGBM-CroSite. First, binary encoding (BE), position weight amino acid composition (PWAA), encoding based on grouped weight (EBGW), k nearest neighbors (KNN), and pseudo-position specific scoring matrix (PsePSSM) are used to extract features from protein sequences and obtain the original feature space. Second, the elastic net is used to remove redundant information and select the optimal feature subset. Third, the synthetic minority oversampling technique (SMOTE) is used to balance the samples. Finally, the balanced feature vectors are input into LightGBM to predict the crotonylation sites. According to the results of the jackknife test, the accuracy (ACC), Matthews correlation coefficient (MCC), and area under the ROC curve (AUC) are 98.99%, 0.9798, and 0.9996, respectively. Compared with other state-of-the-art methods, the proposed method achieves better performance in crotonylation site prediction. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/LightGBM-CroSite/.
Copyright © 2020 Elsevier Inc. All rights reserved.
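
For readers unfamiliar with the steps named in the abstract, the following is a minimal, illustrative sketch of three of them in plain Python: binary encoding of a peptide window, SMOTE-style interpolation between minority-class samples, and the ACC and MCC metrics. It is not the authors' implementation (that is in the repository linked above). The 20-letter alphabet, the all-zero encoding of unknown residues, the function names, and the parameters `k` and `seed` are assumptions made for illustration; the elastic net selection and the LightGBM classifier are omitted.

```python
import math
import random

# Assumption: the 20 standard amino acids; any other symbol (e.g. padding 'X')
# is encoded as an all-zero block. The paper's exact alphabet may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_encode(window):
    """One-hot (binary) encode each residue of a peptide window."""
    vec = []
    for residue in window:
        vec.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vec


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic


def acc_and_mcc(y_true, y_pred):
    """Accuracy and Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc
```

In a full pipeline of the kind the abstract describes, oversampling would be applied only to the training portion of each jackknife split, so that synthetic samples do not leak into the held-out sample.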

Keywords:  Crotonylation sites; Elastic net; LightGBM; Multi-feature fusion; SMOTE

Year:  2020        PMID: 32805274     DOI: 10.1016/j.ab.2020.113903

Source DB:  PubMed          Journal:  Anal Biochem        ISSN: 0003-2697            Impact factor:   3.365


  7 in total

1.  DeepStack-DTIs: Predicting Drug-Target Interactions Using LightGBM Feature Selection and Deep-Stacked Ensemble Classifier.

Authors:  Yan Zhang; Zhiwen Jiang; Cheng Chen; Qinqin Wei; Haiming Gu; Bin Yu
Journal:  Interdiscip Sci       Date:  2021-11-03       Impact factor: 2.233

2.  BioAutoML: automated feature engineering and metalearning to predict noncoding RNAs in bacteria.

Authors:  Robson P Bonidia; Anderson P Avila Santos; Breno L S de Almeida; Peter F Stadler; Ulisses N da Rocha; Danilo S Sanches; André C P L F de Carvalho
Journal:  Brief Bioinform       Date:  2022-07-18       Impact factor: 13.994

3.  nhKcr: a new bioinformatics tool for predicting crotonylation sites on human nonhistone proteins based on deep learning.

Authors:  Yong-Zi Chen; Zhuo-Zhi Wang; Yanan Wang; Guoguang Ying; Zhen Chen; Jiangning Song
Journal:  Brief Bioinform       Date:  2021-11-05       Impact factor: 11.622

Review 4.  Application of Sparse Representation in Bioinformatics.

Authors:  Shuguang Han; Ning Wang; Yuxin Guo; Furong Tang; Lei Xu; Ying Ju; Lei Shi
Journal:  Front Genet       Date:  2021-12-15       Impact factor: 4.599

5.  iKcr_CNN: A novel computational tool for imbalance classification of human nonhistone crotonylation sites based on convolutional neural networks with focal loss.

Authors:  Lijun Dou; Zilong Zhang; Lei Xu; Quan Zou
Journal:  Comput Struct Biotechnol J       Date:  2022-06-16       Impact factor: 6.155

6.  ResSUMO: A Deep Learning Architecture Based on Residual Structure for Prediction of Lysine SUMOylation Sites.

Authors:  Yafei Zhu; Yuhai Liu; Yu Chen; Lei Li
Journal:  Cells       Date:  2022-08-25       Impact factor: 7.666

Review 7.  Protein lysine crotonylation: past, present, perspective.

Authors:  Gaoyue Jiang; Chunxia Li; Meng Lu; Kefeng Lu; Huihui Li
Journal:  Cell Death Dis       Date:  2021-07-14       Impact factor: 8.469
