Computational prediction and interpretation of both general and specific types of promoters in Escherichia coli by exploiting a stacked ensemble-learning framework.

Fuyi Li, Jinxiang Chen, Zongyuan Ge, Ya Wen, Yanwei Yue, Morihiro Hayashida, Abdelkader Baggag, Halima Bensmail, Jiangning Song.

Abstract

Promoters are short consensus DNA sequences that are responsible for the transcriptional activation or repression of all genes. Bacteria have many types of promoters, which play important roles in initiating gene transcription. Solving the promoter-identification problem therefore has important implications for improving the understanding of their functions. To this end, computational methods targeting promoter classification have been established; however, their performance remains unsatisfactory. In this study, we present a novel stacked-ensemble approach (termed SELECTOR) for both identifying promoters and classifying their types. SELECTOR combines the composition of k-spaced nucleic acid pairs, parallel correlation pseudo-dinucleotide composition, position-specific trinucleotide propensity based on single-strand, and DNA strand features, and uses five popular tree-based ensemble learning algorithms to build a stacked model. Both 5-fold cross-validation tests on benchmark datasets and independent tests on a newly collected independent test dataset showed that SELECTOR outperformed state-of-the-art methods in predicting both general and specific types of promoters in Escherichia coli. Furthermore, this framework provides essential interpretations that aid understanding of the model's success by leveraging the Shapley Additive exPlanation algorithm. These interpretations highlight the most important features for predicting both general and specific types of promoters, and they overcome a limitation of existing 'black-box' approaches, which are unable to reveal causal relationships from large numbers of initially encoded features.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
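
The first feature type named in the abstract, the composition of k-spaced nucleic acid pairs, can be illustrated with a minimal sketch. It follows the commonly used definition (the frequency of each of the 16 ordered nucleotide pairs separated by k intervening positions) and is not taken from the SELECTOR code. The function name `cksnap`, the default `k_max=2`, and the handling of ambiguous bases are assumptions made for illustration only.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
# The 16 ordered nucleotide pairs, in a fixed order: AA, AC, AG, AT, CA, ...
PAIRS = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]


def cksnap(sequence, k_max=2):
    """Return k-spaced nucleotide pair frequencies for gaps 0..k_max.

    The result has 16 * (k_max + 1) values. k_max=2 is an illustrative
    default, not a setting reported in the abstract.
    """
    seq = sequence.upper()
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        # Pair position i with position i + k + 1 (k bases in between).
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # assumption: skip pairs with ambiguous bases such as N
                counts[pair] += 1
                total += 1
        # Normalise counts to frequencies; all zeros if no valid pair exists.
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features
```

For a sequence such as `"ACGT"` with `k_max=1`, the vector has 32 entries: the first 16 describe adjacent pairs (AC, CG, GT) and the next 16 describe pairs separated by one base (AG, CT). Vectors of this kind would then be concatenated with the other feature types before being passed to the base learners of a stacked model.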

Keywords:  bioinformatics; machine learning; model interpretability; promoters; sequence analysis; stacking strategy

Year:  2021        PMID: 32363397      PMCID: PMC7986616          DOI: 10.1093/bib/bbaa049

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


