
Computational analysis and prediction of lysine malonylation sites by exploiting informative features in an integrative machine-learning framework.

Yanju Zhang, Ruopeng Xie, Jiawei Wang, André Leier, Tatiana T Marquez-Lago, Tatsuya Akutsu, Geoffrey I Webb, Kuo-Chen Chou, Jiangning Song.

Abstract

As a newly discovered post-translational modification (PTM), lysine malonylation (Kmal) regulates a myriad of cellular processes from prokaryotes to eukaryotes and has important implications in human diseases. Despite its functional significance, computational methods to accurately identify malonylation sites are still lacking and urgently needed. In particular, there is currently no comprehensive analysis and assessment of the different features and machine learning (ML) methods required for constructing the necessary prediction models. Here, we review, analyze and compare 11 different feature encoding methods, with the goal of extracting key patterns and characteristics from residue sequences of Kmal sites. We identify optimized feature sets, with which four commonly used ML methods (random forest, support vector machines, K-nearest neighbor and logistic regression) and one recently proposed method, Light Gradient Boosting Machine (LightGBM), are trained on data from three species, namely, Escherichia coli, Mus musculus and Homo sapiens, and compared using randomized 10-fold cross-validation tests. We show that integration of the single method-based models through ensemble learning further improves the prediction performance and model robustness on the independent test. When compared to the existing state-of-the-art predictor, MaloPred, the optimal ensemble models were more accurate for all three species (AUC: 0.930, 0.923 and 0.944 for E. coli, M. musculus and H. sapiens, respectively). Using the ensemble models, we developed an accessible online predictor, kmal-sp, available at http://kmalsp.erc.monash.edu/. We hope that this comprehensive survey and the proposed strategy for building more accurate models can serve as a useful guide for inspiring future developments of computational methods for PTM site prediction, expedite the discovery of new malonylation and other PTM types and facilitate hypothesis-driven experimental validation of novel malonylated substrates and malonylation sites.
© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
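
The abstract names the main building blocks (sequence-derived feature encodings, several single-method models, ensemble integration and AUC-based comparison) without giving their details. The following is a minimal Python sketch of that general workflow, not the authors' implementation. The window size, the one-hot (binary) encoding, the plain score averaging and all function names are assumptions chosen for illustration; the abstract does not state which of the 11 encodings or which ensemble rule kmal-sp uses.

```python
from typing import List, Sequence, Tuple

# The 20 standard amino acids; any other symbol (e.g. padding) encodes as zeros.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def lysine_windows(sequence: str, flank: int = 3, pad: str = "X") -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every lysine in the sequence.

    The flank size is a hypothetical choice; ends are padded with `pad`.
    """
    padded = pad * flank + sequence + pad * flank
    return [
        (i + 1, padded[i : i + 2 * flank + 1])
        for i, aa in enumerate(sequence)
        if aa == "K"
    ]


def binary_encode(window: str) -> List[int]:
    """One-hot encode each residue of a window (one possible feature encoding)."""
    vector: List[int] = []
    for aa in window:
        vector.extend(1 if aa == ref else 0 for ref in AMINO_ACIDS)
    return vector


def average_ensemble(model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Combine per-site scores from several models by simple averaging."""
    return [sum(column) / len(column) for column in zip(*model_scores)]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In practice the encoded windows would be passed to trained classifiers (for example random forest or LightGBM), and the scores from those models would be the inputs to `average_ensemble`; the averaging here stands in for whatever integration rule the published ensemble applies.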

Keywords:  Light Gradient Boosting Machine; computational prediction; ensemble learning; feature encoding methods; lysine malonylation; machine learning

Year: 2019; PMID: 30351377; PMCID: PMC6954445; DOI: 10.1093/bib/bby079

Source DB: PubMed; Journal: Brief Bioinform; ISSN: 1467-5463; Impact factor: 11.622


