Bastion3: a two-layer ensemble predictor of type III secreted effectors.

Jiawei Wang, Jiahui Li, Bingjiao Yang, Ruopeng Xie, Tatiana T Marquez-Lago, André Leier, Morihiro Hayashida, Tatsuya Akutsu, Yanju Zhang, Kuo-Chen Chou, Joel Selkrig, Tieli Zhou, Jiangning Song, Trevor Lithgow.

Abstract

MOTIVATION: Type III secreted effectors (T3SEs) can be injected into host cell cytoplasm via type III secretion systems (T3SSs) to modulate interactions between Gram-negative bacterial pathogens and their hosts. Because of their relevance to pathogen-host interactions, significant computational effort has gone into identifying T3SEs, and this in turn has stimulated new T3SE discoveries. However, as T3SEs with new characteristics are discovered, existing computational tools reveal important limitations: (i) most of the trained machine learning models are based on the N-terminus (sometimes also incorporating the C-terminus) rather than the proteins' complete sequences, and (ii) the underlying models (trained with classic algorithms) employ only a few features, most of which are extracted from sequence information alone. To achieve better T3SE prediction, we must identify more powerful, informative features and investigate how to integrate them effectively into a comprehensive model.
RESULTS: In this work, we present Bastion3, a two-layer ensemble predictor developed to accurately identify type III secreted effectors from protein sequence data. In contrast with existing methods that employ single models with few features, Bastion3 explores a wide range of features of various types, trains single models on these features and finally integrates these models through ensemble learning. We trained the models using a new gradient boosting machine, LightGBM, and further improved their performance through a novel genetic algorithm (GA) based two-step parameter optimization strategy. Our benchmark test demonstrates that Bastion3 performs substantially better than commonly used methods, with an ACC value of 0.959, F-value of 0.958, MCC value of 0.917 and AUC value of 0.956, outperforming all other toolkits by more than 5.6% in ACC value, 5.7% in F-value, 12.4% in MCC value and 5.8% in AUC value. Based on our proposed two-layer ensemble model, we further developed a user-friendly online toolkit to make T3SE prediction convenient for experimental scientists. With its improved performance and a design intended to ease future discoveries of novel T3SEs, Bastion3 is poised to become a widely used, state-of-the-art toolkit for T3SE prediction.
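
For readers less familiar with the reported metrics, the following is a minimal sketch of how ACC, F-value and MCC are conventionally computed from a binary confusion matrix. It is not the authors' evaluation code: the function name `binary_metrics` and the example counts are hypothetical, and the F-value is assumed here to be the standard F1 score. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return ACC, F-value (assumed to be F1) and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total

    # Precision and recall, guarding against empty denominators.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f_value = 2 * precision * recall / pr_sum if pr_sum else 0.0

    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"ACC": acc, "F": f_value, "MCC": mcc}


# Hypothetical counts, chosen only for illustration (not from the paper).
metrics = binary_metrics(tp=90, tn=95, fp=5, fn=10)
print(metrics)
```

Unlike ACC, MCC uses all four cells of the confusion matrix in both numerator and denominator, which is why it is often reported alongside ACC when positive and negative classes may be imbalanced.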
AVAILABILITY AND IMPLEMENTATION: http://bastion3.erc.monash.edu/.
CONTACT: selkrig@embl.de or wyztli@163.com or trevor.lithgow@monash.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Year:  2019        PMID: 30388198      PMCID: PMC7963071          DOI: 10.1093/bioinformatics/bty914

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

