DeepVF: a deep learning-based hybrid framework for identifying virulence factors using the stacking strategy.

Ruopeng Xie, Jiahui Li, Jiawei Wang, Wei Dai, André Leier, Tatiana T Marquez-Lago, Tatsuya Akutsu, Trevor Lithgow, Jiangning Song, Yanju Zhang.

Abstract

Virulence factors (VFs) enable pathogens to infect their hosts. A wealth of individual, disease-focused studies has identified a wide variety of VFs, and the growing mass of bacterial genome sequence data provides an opportunity for computational methods aimed at predicting VFs. Despite their advantages and performance improvements, existing methods have limitations. First, because the characteristics and mechanisms of VFs continue to evolve with the emergence of antibiotic resistance, it is increasingly difficult to identify novel VFs using existing tools that were developed on outdated datasets. Second, few systematic feature engineering efforts have examined how different types of features affect model performance, as the majority of tools extract only a few types of features. Addressing these issues is likely to improve the accuracy of VF predictors significantly, which would be particularly useful for genome-wide prediction of VFs. In this work, we present a deep learning (DL)-based hybrid framework, termed DeepVF, that uses the stacking strategy to identify VFs more accurately. Using an enlarged, up-to-date dataset, DeepVF comprehensively explores a wide range of heterogeneous features with popular machine learning algorithms. Specifically, four classical algorithms (random forest, support vector machines, extreme gradient boosting and multilayer perceptron) and three DL algorithms (convolutional neural networks, long short-term memory networks and deep neural networks) are employed to train 62 baseline models using these features. To integrate their individual strengths, DeepVF combines these baseline models into the final meta model using the stacking strategy.
Extensive benchmarking experiments demonstrate the effectiveness of DeepVF: it achieves more accurate and stable performance than the baseline models on the benchmark dataset and clearly outperforms state-of-the-art VF predictors on the independent test. Using the proposed hybrid ensemble model, a user-friendly online predictor of DeepVF (http://deepvf.erc.monash.edu/) is implemented. Furthermore, its utility, from the user's viewpoint, is compared with that of existing toolkits. We believe that DeepVF will be a useful tool for screening and identifying potential VFs from protein-coding gene sequences in bacterial genomes.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
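
The stacking strategy described in the abstract can be illustrated with a small, self-contained sketch. It is not the DeepVF implementation: the two feature views (amino acid composition and a coarse residue-group composition), the nearest-centroid baseline learner, the logistic-regression meta model, and the toy sequences are all assumptions chosen to keep the example short. DeepVF itself trains 62 baseline models with seven algorithms. What the sketch does preserve is the two-level structure: baseline models produce out-of-fold scores, and a meta model is trained on those scores.

```python
import math

# Toy, synthetic sequences (NOT real virulence factors): positives are rich in
# charged residues, negatives in hydrophobic ones, so the sketch is easy to follow.
POS = ["KRDEKRDE" * 4 + "AG" * i for i in range(1, 11)]
NEG = ["AVLIAVLI" * 4 + "KG" * i for i in range(1, 11)]
SEQS, LABELS = POS + NEG, [1] * len(POS) + [0] * len(NEG)

AMINO = "ACDEFGHIKLMNPQRSTVWY"
GROUPS = ["AVLIMFWC", "KRDEH", "GSTNQYP"]  # hypothetical coarse residue groups

def aac(seq):
    """Amino acid composition: fraction of each of the 20 residues."""
    return [seq.count(a) / len(seq) for a in AMINO]

def group_comp(seq):
    """Fraction of residues falling in each coarse group."""
    return [sum(seq.count(a) for a in g) / len(seq) for g in GROUPS]

FEATURE_VIEWS = [aac, group_comp]  # one baseline model per feature view

def fit_centroids(X, y):
    """Baseline learner (assumed for illustration): one mean vector per class."""
    cents = {}
    for c in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == c]
        cents[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def centroid_score(cents, x):
    """Score in [0, 1]; closer to the positive centroid gives a higher score."""
    d_pos, d_neg = math.dist(x, cents[1]), math.dist(x, cents[0])
    return 0.5 if d_pos + d_neg == 0 else d_neg / (d_pos + d_neg)

def out_of_fold_scores(seqs, labels, k=5):
    """Level 0: each sample is scored by baselines that never saw it."""
    meta = [[0.0] * len(FEATURE_VIEWS) for _ in seqs]
    for j, view in enumerate(FEATURE_VIEWS):
        X = [view(s) for s in seqs]
        for fold in range(k):
            train = [i for i in range(len(seqs)) if i % k != fold]
            cents = fit_centroids([X[i] for i in train], [labels[i] for i in train])
            for i in range(fold, len(seqs), k):
                meta[i][j] = centroid_score(cents, X[i])
    return meta

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=1.0, steps=2000):
    """Level 1 meta model: logistic regression fitted by batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(steps):
        errs = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
                for x, t in zip(X, y)]
        w = [wi - lr * sum(e * x[j] for e, x in zip(errs, X)) / len(X)
             for j, wi in enumerate(w)]
        b -= lr * sum(errs) / len(X)
    return w, b

def predict(seq, base_models, meta_model):
    """Stacked prediction: baseline scores feed the meta model."""
    scores = [centroid_score(c, view(seq))
              for view, c in zip(FEATURE_VIEWS, base_models)]
    w, b = meta_model
    return sigmoid(sum(wi * s for wi, s in zip(w, scores)) + b)

meta_features = out_of_fold_scores(SEQS, LABELS)
meta_model = fit_logistic(meta_features, LABELS)
# Refit every baseline on all data before scoring unseen sequences.
base_models = [fit_centroids([view(s) for s in SEQS], LABELS) for view in FEATURE_VIEWS]
train_acc = sum((predict(s, base_models, meta_model) > 0.5) == bool(t)
                for s, t in zip(SEQS, LABELS)) / len(SEQS)
```

Calling `predict("KRDE" * 10, base_models, meta_model)` returns a probability-like score for a new toy sequence. Training the meta model on out-of-fold scores, rather than on scores from baselines fitted to the same samples, is the usual way to keep the second level from simply learning the baselines' overfitting.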

Keywords:  deep learning; ensemble learning; machine learning; recognition; sequence analysis; virulence factor prediction

Year:  2021        PMID: 32599617     DOI: 10.1093/bib/bbaa125

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


  12 in total

1.  Computational prediction of species-specific yeast DNA replication origin via iterative feature representation.

Authors:  Balachandran Manavalan; Shaherin Basith; Tae Hwan Shin; Gwang Lee
Journal:  Brief Bioinform       Date:  2021-07-20       Impact factor: 11.622

2.  STALLION: a stacking-based ensemble learning framework for prokaryotic lysine acetylation site prediction.

Authors:  Shaherin Basith; Gwang Lee; Balachandran Manavalan
Journal:  Brief Bioinform       Date:  2022-01-17       Impact factor: 11.622

3.  TACOS: a novel approach for accurate prediction of cell-specific long noncoding RNAs subcellular localization.

Authors:  Young-Jun Jeon; Md Mehedi Hasan; Hyun Woo Park; Ki Wook Lee; Balachandran Manavalan
Journal:  Brief Bioinform       Date:  2022-07-18       Impact factor: 13.994

4.  SCMRSA: a New Approach for Identifying and Analyzing Anti-MRSA Peptides Using Estimated Propensity Scores of Dipeptides.

Authors:  Phasit Charoenkwan; Sakawrat Kanthawong; Nalini Schaduangrat; Pietro Liò; Mohammad Ali Moni; Watshara Shoombuatong
Journal:  ACS Omega       Date:  2022-09-01

5.  Deepm5C: A deep-learning-based hybrid framework for identifying human RNA N5-methylcytosine sites using a stacking strategy.

Authors:  Md Mehedi Hasan; Sho Tsukiyama; Jae Youl Cho; Hiroyuki Kurata; Md Ashad Alam; Xiaowen Liu; Balachandran Manavalan; Hong-Wen Deng
Journal:  Mol Ther       Date:  2022-05-06       Impact factor: 12.910

6.  Empirical comparison and analysis of machine learning-based predictors for predicting and analyzing of thermophilic proteins. (Review)

Authors:  Phasit Charoenkwan; Nalini Schaduangrat; Md Mehedi Hasan; Mohammad Ali Moni; Pietro Liò; Watshara Shoombuatong
Journal:  EXCLI J       Date:  2022-03-02       Impact factor: 4.022

7.  UMPred-FRL: A New Approach for Accurate Prediction of Umami Peptides Using Feature Representation Learning.

Authors:  Phasit Charoenkwan; Chanin Nantasenamat; Md Mehedi Hasan; Mohammad Ali Moni; Balachandran Manavalan; Watshara Shoombuatong
Journal:  Int J Mol Sci       Date:  2021-12-04       Impact factor: 5.923

8.  SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins.

Authors:  Saeed Ahmad; Phasit Charoenkwan; Julian M W Quinn; Mohammad Ali Moni; Md Mehedi Hasan; Pietro Liò; Watshara Shoombuatong
Journal:  Sci Rep       Date:  2022-03-08       Impact factor: 4.379

9.  Functional Annotation of Hypothetical Proteins From the Enterobacter cloacae B13 Strain and Its Association With Pathogenicity.

Authors:  Supantha Dey; Sazzad Shahrear; Maliha Afroj Zinnia; Ahnaf Tajwar; Abul Bashar Mir Md Khademul Islam
Journal:  Bioinform Biol Insights       Date:  2022-08-06

10.  Computational prediction and interpretation of druggable proteins using a stacked ensemble-learning framework.

Authors:  Phasit Charoenkwan; Nalini Schaduangrat; Pietro Liò; Mohammad Ali Moni; Watshara Shoombuatong; Balachandran Manavalan
Journal:  iScience       Date:  2022-08-05
