STALLION: a stacking-based ensemble learning framework for prokaryotic lysine acetylation site prediction.

Shaherin Basith¹, Gwang Lee², Balachandran Manavalan¹.

Abstract

Protein post-translational modification (PTM) is an important regulatory mechanism that plays a key role in both normal and disease states. Acetylation of lysine residues is one of the most potent PTMs owing to its critical role in cellular metabolism and regulatory processes. Identifying protein lysine acetylation (Kace) sites is a challenging task in bioinformatics. To date, several machine learning-based methods for the in silico identification of Kace sites have been developed, a few of which are specific to prokaryotic species. Despite their advantages and attractive performance, these methods have certain limitations. Therefore, this study proposes a novel predictor, STALLION (STacking-based Predictor for ProkAryotic Lysine AcetyLatION), containing six prokaryotic species-specific models to identify Kace sites accurately. To extract crucial patterns around Kace sites, we employed 11 different encodings representing three different characteristics. Subsequently, a systematic and rigorous feature selection approach was used to identify the optimal feature set independently for each of five tree-based ensemble algorithms, and the corresponding baseline models were built for each species. Finally, the predicted values from the baseline models were used to train an appropriate classifier following the stacking strategy, yielding STALLION. Comparative benchmarking experiments showed that STALLION significantly outperformed the existing predictor on independent tests. To facilitate direct access to the STALLION models, a user-friendly online predictor has been implemented and is available at: http://thegleelab.org/STALLION.
© The Author(s) 2021. Published by Oxford University Press.
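The stacking step summarized above (baseline-model predictions used to train a second-level classifier) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the STALLION implementation: decision stumps (depth-1 trees) stand in for the five tree-based ensemble algorithms, a gradient-descent logistic regression is an assumed choice of meta-classifier, and the names `Stump`, `LogisticMetaLearner`, `out_of_fold_meta_features`, and `StackingSketch` are hypothetical. The meta-classifier is trained on out-of-fold predictions, so it never sees a base model's output on samples that model was fitted to.

```python
import math
import random


class Stump:
    """Depth-1 decision tree on one feature (hypothetical stand-in for a tree-based base learner)."""

    def __init__(self, feature):
        self.feature = feature
        self.threshold, self.left_p, self.right_p = 0.0, 0.5, 0.5

    def fit(self, X, y):
        best = None
        for t in sorted({row[self.feature] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[self.feature] <= t]
            right = [lab for row, lab in zip(X, y) if row[self.feature] > t]
            # Laplace-smoothed positive rate on each side of the split.
            lp = (sum(left) + 1) / (len(left) + 2)
            rp = (sum(right) + 1) / (len(right) + 2)
            # Misclassifications when each side predicts its majority class.
            err = sum(int((lp if row[self.feature] <= t else rp) >= 0.5) != lab
                      for row, lab in zip(X, y))
            if best is None or err < best[0]:
                best = (err, t, lp, rp)
        _, self.threshold, self.left_p, self.right_p = best
        return self

    def predict_proba(self, row):
        return self.left_p if row[self.feature] <= self.threshold else self.right_p


class LogisticMetaLearner:
    """Second-level classifier trained by batch gradient descent (assumed choice)."""

    def __init__(self, lr=0.5, epochs=500):
        self.lr, self.epochs = lr, epochs

    def fit(self, X, y):
        self.w, self.b = [0.0] * len(X[0]), 0.0
        n = len(X)
        for _ in range(self.epochs):
            gw, gb = [0.0] * len(self.w), 0.0
            for row, lab in zip(X, y):
                err = self.predict_proba(row) - lab  # d(log-loss)/dz
                gw = [g + err * v for g, v in zip(gw, row)]
                gb += err
            self.w = [w - self.lr * g / n for w, g in zip(self.w, gw)]
            self.b -= self.lr * gb / n
        return self

    def predict_proba(self, row):
        z = self.b + sum(w * v for w, v in zip(self.w, row))
        return 1.0 / (1.0 + math.exp(-z))


def out_of_fold_meta_features(make_base_models, X, y, k=5):
    # Each sample's meta-features come from base models that never saw it.
    n = len(X)
    meta = [None] * n
    for start in range(k):
        held_out = list(range(start, n, k))  # interleaved folds
        train = [i for i in range(n) if i % k != start]
        models = [m.fit([X[i] for i in train], [y[i] for i in train])
                  for m in make_base_models()]
        for i in held_out:
            meta[i] = [m.predict_proba(X[i]) for m in models]
    return meta


class StackingSketch:
    def __init__(self, make_base_models, k=5):
        self.make_base_models = make_base_models  # returns fresh, unfitted base learners
        self.k = k

    def fit(self, X, y):
        meta_X = out_of_fold_meta_features(self.make_base_models, X, y, self.k)
        self.meta = LogisticMetaLearner().fit(meta_X, y)
        # Refit the base learners on all data for use at prediction time.
        self.base = [m.fit(X, y) for m in self.make_base_models()]
        return self

    def predict_proba(self, row):
        return self.meta.predict_proba([m.predict_proba(row) for m in self.base])


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic toy data: the label depends on features 0 and 1; feature 2 is noise.
    X = [[rng.random() for _ in range(3)] for _ in range(200)]
    y = [int(a + b > 1.0) for a, b, _ in X]
    model = StackingSketch(lambda: [Stump(0), Stump(1), Stump(2)]).fit(X, y)
    acc = sum(int(model.predict_proba(r) >= 0.5) == lab for r, lab in zip(X, y)) / len(X)
    print(f"training accuracy: {acc:.2f}")
```

Running the script on the synthetic toy data prints a training accuracy. In an actual Kace setting, each row of `X` would instead hold encoded features for a sequence window around a candidate lysine, and each base learner would be trained on its own selected feature subset; those details, like the stump and logistic-regression choices here, are assumptions of this sketch.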

Keywords:  bioinformatics; feature optimization; lysine acetylation sites; machine learning; performance assessment; stacking strategy

Year: 2022; PMID: 34532736; PMCID: PMC8769686; DOI: 10.1093/bib/bbab376

Source DB: PubMed; Journal: Brief Bioinform; ISSN: 1467-5463; Impact factor: 11.622

