ASPIRER: a new computational approach for identifying non-classical secreted proteins based on deep learning.

Xiaoyu Wang1, Fuyi Li2, Jing Xu1, Jia Rong3, Geoffrey I Webb3, Zongyuan Ge4, Jian Li5, Jiangning Song3.   

Abstract

Protein secretion plays a pivotal role in many biological processes and is particularly important for intercellular communication, from the cytoplasm to the host or external environment. Gram-positive bacteria can secrete proteins through multiple secretion pathways. Among these, the non-classical secretion pathway has recently received increasing attention, but its exact mechanism remains unclear. Non-classical secreted proteins (NCSPs) are a class of secreted proteins lacking signal peptides and motifs. Several predictors have been proposed to identify NCSPs, and most of them use the whole amino acid sequence to construct the model. However, sequence length varies greatly between proteins. In addition, not all regions of a protein are equally important, and some local regions are not relevant to secretion. The functional regions of the protein, particularly the N- and C-terminal regions, contain important determinants for secretion. In this study, we propose a new hybrid deep learning-based framework, referred to as ASPIRER, which improves the prediction of NCSPs from amino acid sequences. More specifically, it combines a whole sequence-based XGBoost model and an N-terminal sequence-based convolutional neural network model. Five-fold cross-validation and independent tests demonstrate that ASPIRER outperforms existing state-of-the-art approaches. The source code and curated datasets of ASPIRER are publicly available at https://github.com/yanwu20/ASPIRER/. ASPIRER is anticipated to be a useful tool for improved prediction of novel putative NCSPs from sequence information and for prioritization of candidate proteins for follow-up experimental validation.
© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
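
The abstract describes two views of each protein: the whole sequence (fed to an XGBoost model) and the N-terminal region (fed to a convolutional neural network). The following is a minimal sketch of how such inputs might be prepared, not the authors' implementation. The function names, the amino acid composition features, the 50-residue window, the `X` padding character, and the weighted-average combination rule are all assumptions made for illustration; the abstract specifies none of them.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_features(seq: str) -> List[float]:
    """Whole-sequence view (assumed feature type): fraction of each
    standard residue, giving a fixed-length vector of 20 values
    regardless of how long the protein is."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in seq.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values()) or 1  # avoid dividing by zero
    return [counts[aa] / total for aa in AMINO_ACIDS]


def n_terminal_window(seq: str, length: int = 50, pad: str = "X") -> str:
    """N-terminal view: the first `length` residues, right-padded with
    `pad` when the protein is shorter (window length is an assumption)."""
    return seq.upper()[:length].ljust(length, pad)


def one_hot(window: str) -> List[List[int]]:
    """Encode a window as a len(window) x 20 matrix, a common CNN input.
    Padding and non-standard residues become all-zero rows."""
    return [[1 if residue == aa else 0 for aa in AMINO_ACIDS]
            for residue in window]


def combine_scores(p_whole: float, p_nterm: float, weight: float = 0.5) -> float:
    """Hypothetical combination rule: weighted average of the two
    model probabilities."""
    return weight * p_whole + (1.0 - weight) * p_nterm


# Usage with a toy sequence (not a real NCSP):
seq = "MKKLAV"
whole_view = composition_features(seq)             # 20 floats
nterm_view = one_hot(n_terminal_window(seq, 8))    # 8 x 20 matrix
```

The fixed-length outputs address the point made in the abstract that protein lengths vary greatly: both views have the same shape for every protein, so they can be passed to models that expect fixed-size input.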

Keywords:  bioinformatics; deep learning; feature engineering; machine learning; non-classical secreted protein; predictor

Year: 2022; PMID: 35176756; PMCID: PMC8921646; DOI: 10.1093/bib/bbac031

Source DB: PubMed; Journal: Brief Bioinform; ISSN: 1467-5463; Impact factor: 13.994

