AssessORF: combining evolutionary conservation and proteomics to assess prokaryotic gene predictions.

Deepank R Korandla, Jacob M Wozniak, Anaamika Campeau, David J Gonzalez, Erik S Wright.

Abstract

MOTIVATION: A core task of genomics is to identify the boundaries of protein-coding genes, which may cover over 90% of a prokaryote's genome. Several programs are available for gene finding, yet it is currently unclear how well these programs perform and whether any offers superior accuracy. This is in part because there is no universal benchmark for gene finding and, therefore, most developers select their own benchmarking strategy.
RESULTS: Here, we introduce AssessORF, a new approach for benchmarking prokaryotic gene predictions based on evidence from proteomics data and the evolutionary conservation of start and stop codons. We applied AssessORF to compare gene predictions offered by GenBank, GeneMarkS-2, Glimmer and Prodigal on genomes spanning the prokaryotic tree of life. Gene predictions were 88-95% in agreement with the available evidence, with Glimmer performing the worst but no clear winner. All programs were biased towards selecting start codons that were upstream of the actual start. Given these findings, there remains considerable room for improvement, especially in the detection of correct start sites.
AVAILABILITY AND IMPLEMENTATION: AssessORF is available as an R package via the Bioconductor package repository.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Year: 2020 | PMID: 31532487 | PMCID: PMC7998711 | DOI: 10.1093/bioinformatics/btz714

Source DB: PubMed | Journal: Bioinformatics | ISSN: 1367-4803 | Impact factor: 6.937


  2 in total

1.  KEMET - A python tool for KEGG Module evaluation and microbial genome annotation expansion.

Authors:  Matteo Palù; Arianna Basile; Guido Zampieri; Laura Treu; Alessandro Rossi; Maria Silvia Morlino; Stefano Campanaro
Journal:  Comput Struct Biotechnol J       Date:  2022-03-26       Impact factor: 7.271

2.  A Large-Scale Genome-Based Survey of Acidophilic Bacteria Suggests That Genome Streamlining Is an Adaption for Life at Low pH.

Authors:  Diego Cortez; Gonzalo Neira; Carolina González; Eva Vergara; David S Holmes
Journal:  Front Microbiol       Date:  2022-03-21       Impact factor: 5.640
