
Combining diverse evidence for gene recognition in completely sequenced bacterial genomes.

D Frishman, A Mironov, H W Mewes, M Gelfand.

Abstract

Analysis of a newly sequenced bacterial genome starts with identification of protein-coding genes. Functional assignment of proteins requires exact knowledge of protein N-termini. We present a new program, ORPHEUS, that identifies candidate genes and accurately predicts gene starts. The analysis starts with a database similarity search and identification of reliable gene fragments. These fragments are used to derive statistical characteristics of protein-coding regions and ribosome-binding sites and to predict the complete set of genes in the analyzed genome. In a test on the Bacillus subtilis and Escherichia coli genomes (E. coli values in parentheses), the program correctly identified 93.3% (96.3%) of experimentally annotated genes longer than 100 codons described in the PIR-International database, and for these genes 96.3% (83.9%) of starts were predicted exactly. Furthermore, 98.9% (99.1%) of genes longer than 100 codons annotated in GenBank were found, and 92.9% (75.7%) of predicted starts coincided with the feature table description. Finally, for the complete gene complements of B. subtilis and E. coli, including genes shorter than 100 codons, gene prediction accuracy was 88.9% and 87.1%, respectively, with 94.2% and 76.7% of starts coinciding with the existing annotation.
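The two kinds of figures quoted in the abstract (the share of annotated genes found, and the share of found genes whose start was predicted exactly) can be illustrated with a minimal Python sketch. This is not the ORPHEUS evaluation code. It assumes, as a common convention in bacterial gene finding, that a gene counts as found when a prediction shares its strand and stop coordinate; the `Gene` type, the `evaluate` function, and all coordinates below are hypothetical.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are illustrative only."""
    strand: str  # "+" or "-"
    start: int   # position of the first base of the start codon
    stop: int    # position of the last base of the stop codon


def evaluate(annotated, predicted):
    """Return (fraction of annotated genes found,
               fraction of found genes with an exactly matching start).

    Assumption: a gene is "found" if some prediction has the same
    strand and the same stop coordinate.
    """
    # Index predictions by (strand, stop) for constant-time lookup.
    by_stop = {(g.strand, g.stop): g for g in predicted}

    # Pair each annotated gene with the prediction sharing its stop, if any.
    found = [
        (a, by_stop[(a.strand, a.stop)])
        for a in annotated
        if (a.strand, a.stop) in by_stop
    ]

    # Among found genes, count those whose start also matches.
    exact = sum(1 for a, p in found if a.start == p.start)

    gene_rate = len(found) / len(annotated) if annotated else 0.0
    start_rate = exact / len(found) if found else 0.0
    return gene_rate, start_rate


# Toy data: four annotated genes, three found, two with exact starts.
annotated = [
    Gene("+", 100, 400),
    Gene("+", 500, 900),
    Gene("-", 2000, 1500),
    Gene("+", 3000, 3300),
]
predicted = [
    Gene("+", 100, 400),    # found, exact start
    Gene("+", 530, 900),    # found, start differs
    Gene("-", 2000, 1500),  # found, exact start
]

gene_rate, start_rate = evaluate(annotated, predicted)
print(f"genes found: {gene_rate:.1%}, exact starts: {start_rate:.1%}")
```

Note that the start rate is computed over found genes only, which matches the abstract's wording "for these genes ... of starts were predicted exactly"; computing it over all annotated genes would give a lower figure.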

Year:  1998        PMID: 9611239      PMCID: PMC147632          DOI: 10.1093/nar/26.12.2941

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


  69 in total

1.  GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions.

Authors:  J Besemer; A Lomsadze; M Borodovsky
Journal:  Nucleic Acids Res       Date:  2001-06-15       Impact factor: 16.971

2.  ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes.

Authors:  Feng-Biao Guo; Hong-Yu Ou; Chun-Ting Zhang
Journal:  Nucleic Acids Res       Date:  2003-03-15       Impact factor: 16.971

3.  Yersinia pestis pFra shows biovar-specific differences and recent common ancestry with a Salmonella enterica serovar Typhi plasmid.

Authors:  M B Prentice; K D James; J Parkhill; S G Baker; K Stevens; M N Simmonds; K L Mungall; C Churcher; P C Oyston; R W Titball; B W Wren; J Wain; D Pickard; T T Hien; J J Farrar; G Dougan
Journal:  J Bacteriol       Date:  2001-04       Impact factor: 3.490

4.  Dictionary-driven prokaryotic gene finding.

Authors:  Tetsuo Shibuya; Isidore Rigoutsos
Journal:  Nucleic Acids Res       Date:  2002-06-15       Impact factor: 16.971

5.  The PEDANT genome database.

Authors:  Dmitrij Frishman; Martin Mokrejs; Denis Kosykh; Gabi Kastenmüller; Grigory Kolesov; Igor Zubrzycki; Christian Gruber; Birgitta Geier; Andreas Kaps; Kaj Albermann; Andreas Volz; Christian Wagner; Matthias Fellenberg; Klaus Heumann; Hans-Werner Mewes
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

6.  DNA-energetics-based analyses suggest additional genes in prokaryotes.

Authors:  Garima Khandelwal; Jalaj Gupta; B Jayaram
Journal:  J Biosci       Date:  2012-07       Impact factor: 1.826

7.  First genome data from uncultured upland soil cluster alpha methanotrophs provide further evidence for a close phylogenetic relationship to Methylocapsa acidiphila B2 and for high-affinity methanotrophy involving particulate methane monooxygenase.

Authors:  Peter Ricke; Michael Kube; Satoshi Nakagawa; Christoph Erkel; Richard Reinhardt; Werner Liesack
Journal:  Appl Environ Microbiol       Date:  2005-11       Impact factor: 4.792

8.  Identifying bacterial genes and endosymbiont DNA with Glimmer.

Authors:  Arthur L Delcher; Kirsten A Bratke; Edwin C Powers; Steven L Salzberg
Journal:  Bioinformatics       Date:  2007-01-19       Impact factor: 6.937

9.  Complete genome sequencing of Anaplasma marginale reveals that the surface is skewed to two superfamilies of outer membrane proteins.

Authors:  Kelly A Brayton; Lowell S Kappmeyer; David R Herndon; Michael J Dark; David L Tibbals; Guy H Palmer; Travis C McGuire; Donald P Knowles
Journal:  Proc Natl Acad Sci U S A       Date:  2004-12-23       Impact factor: 11.205

Review 10.  A bioinformatician's guide to metagenomics.

Authors:  Victor Kunin; Alex Copeland; Alla Lapidus; Konstantinos Mavromatis; Philip Hugenholtz
Journal:  Microbiol Mol Biol Rev       Date:  2008-12       Impact factor: 11.056

