Critical assessment of computational tools for prokaryotic and eukaryotic promoter prediction.

Meng Zhang, Cangzhi Jia, Fuyi Li, Chen Li, Yan Zhu, Tatsuya Akutsu, Geoffrey I Webb, Quan Zou, Lachlan J M Coin, Jiangning Song.

Abstract

Promoters are crucial regulatory DNA regions for gene transcriptional activation. Rapid advances in next-generation sequencing technologies have accelerated the accumulation of genome sequences, providing increased training data to inform computational approaches for both prokaryotic and eukaryotic promoter prediction. However, it remains a significant challenge to accurately identify species-specific promoter sequences using computational approaches. To advance computational support for promoter prediction, we curated 58 comprehensive, up-to-date benchmark datasets for 7 different species (i.e. Escherichia coli, Bacillus subtilis, Homo sapiens, Mus musculus, Arabidopsis thaliana, Zea mays and Drosophila melanogaster) to help the research community assess the relative functionality of alternative approaches and to support future research on both prokaryotic and eukaryotic promoters. We revisited 106 predictors published since 2000 for promoter identification (40 for prokaryotic promoters, 61 for eukaryotic promoters, and 5 for both). We systematically evaluated their training datasets, computational methodologies, calculated features, performance and software usability. Using these benchmark datasets, we benchmarked 19 predictors with functioning webservers or local tools and assessed their prediction performance. We found that deep learning and traditional machine learning-based approaches generally outperformed scoring function-based approaches. Taken together, the curated benchmark dataset repository and the benchmarking analysis in this study serve to inform the design and implementation of computational approaches for promoter prediction and facilitate more rigorous comparison of new techniques in the future.
© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Keywords:  deep learning; machine learning; performance evaluation; promoter identification

Year: 2022; PMID: 35021193; PMCID: PMC8921625; DOI: 10.1093/bib/bbab551

Source DB: PubMed; Journal: Brief Bioinform; ISSN: 1467-5463; Impact factor: 11.622

