
Dictionary-driven prokaryotic gene finding.

Tetsuo Shibuya, Isidore Rigoutsos.

Abstract

Gene identification, also known as gene finding or gene recognition, is among the important problems of molecular biology that have been receiving increasing attention with the advent of large scale sequencing projects. Previous strategies for solving this problem can be categorized into essentially two schools of thought: one school employs sequence composition statistics, whereas the other relies on database similarity searches. In this paper, we propose a new gene identification scheme that combines the best characteristics from each of these two schools. In particular, our method determines gene candidates among the ORFs that can be identified in a given DNA strand through the use of the Bio-Dictionary, a database of patterns that covers essentially all of the currently available sample of the natural protein sequence space. Our approach relies entirely on the use of redundant patterns as the agents on which the presence or absence of genes is predicated and does not employ any additional evidence, e.g. ribosome-binding site signals. The Bio-Dictionary Gene Finder (BDGF), the algorithm's implementation, is a single computational engine able to handle the gene identification task across distinct archaeal and bacterial genomes. The engine exhibits performance that is characterized by simultaneous very high values of sensitivity and specificity, and a high percentage of correctly predicted start sites. Using a collection of patterns derived from an old (June 2000) release of the Swiss-Prot/TrEMBL database that contained 451 602 proteins and fragments, we demonstrate our method's generality and capabilities through an extensive analysis of 17 complete archaeal and bacterial genomes. Examples of previously unreported genes are also shown and discussed in detail.
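The general idea described in the abstract, selecting gene candidates among ORFs according to the protein patterns that match their translations, can be illustrated with a minimal sketch. This is not the BDGF algorithm: the function names (`find_orfs`, `pattern_coverage`, `predict_genes`), the toy patterns, the coverage score, and the `min_coverage` threshold are all assumptions made for illustration, and the sketch scans only the three forward frames of the given strand with ATG as the sole start codon.

```python
import re

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Yield (start, end, protein) for ATG..stop ORFs in the three forward frames.

    Simplification: only ATG is treated as a start codon, and the reverse
    strand is not scanned.
    """
    dna = dna.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    protein = "".join(CODON_TABLE.get(dna[j:j + 3], "X")
                                      for j in range(start, i, 3))
                    yield start, i + 3, protein
                start = None


def pattern_coverage(protein, patterns):
    """Fraction of residues covered by at least one pattern occurrence.

    Patterns are regular expressions in which '.' stands for any residue
    (a hypothetical stand-in for dictionary patterns).
    """
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pat in patterns:
        # The lookahead lets overlapping occurrences all be reported.
        for m in re.finditer("(?=(" + pat + "))", protein):
            for k in range(m.start(1), m.end(1)):
                covered[k] = True
    return sum(covered) / len(protein)


def predict_genes(dna, patterns, min_coverage=0.5):
    """Keep ORFs whose translation is sufficiently covered by patterns."""
    return [(s, e, p) for s, e, p in find_orfs(dna)
            if pattern_coverage(p, patterns) >= min_coverage]


if __name__ == "__main__":
    toy_dna = "CCATGAAAGGTTCTGGTAAATAACC"
    toy_patterns = ["G.G", "KG"]  # made-up patterns, not from the Bio-Dictionary
    for start, end, protein in predict_genes(toy_dna, toy_patterns):
        print(start, end, protein)
```

In this toy setting an ORF is retained when enough of its translation is covered by pattern matches; a realistic scheme would need a far larger pattern collection, both strands, alternative start codons, and a more careful scoring rule.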

Year: 2002; PMID: 12060689; PMCID: PMC117281; DOI: 10.1093/nar/gkf338

Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971


  9 in total

1.  In silico pattern-based analysis of the human cytomegalovirus genome.

Authors:  Isidore Rigoutsos; Jiri Novotny; Tien Huynh; Stephen T Chin-Bow; Laxmi Parida; Daniel Platt; David Coleman; Thomas Shenk
Journal:  J Virol       Date:  2003-04       Impact factor: 5.103

2.  The web server of IBM's Bioinformatics and Pattern Discovery group.

Authors:  Tien Huynh; Isidore Rigoutsos; Laxmi Parida; Daniel Platt; Tetsuo Shibuya
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

3.  Structural details (kinks and non-alpha conformations) in transmembrane helices are intrahelically determined and can be predicted by sequence pattern descriptors.

Authors:  Isidore Rigoutsos; Peter Riek; Robert M Graham; Jiri Novotny
Journal:  Nucleic Acids Res       Date:  2003-08-01       Impact factor: 16.971

4.  Dictionary-driven protein annotation.

Authors:  Isidore Rigoutsos; Tien Huynh; Aris Floratos; Laxmi Parida; Daniel Platt
Journal:  Nucleic Acids Res       Date:  2002-09-01       Impact factor: 16.971

5.  Reevaluation of human cytomegalovirus coding potential.

Authors:  Eain Murphy; Isidore Rigoutsos; Tetsuo Shibuya; Thomas E Shenk
Journal:  Proc Natl Acad Sci U S A       Date:  2003-10-30       Impact factor: 11.205

6.  The web server of IBM's Bioinformatics and Pattern Discovery group: 2004 update.

Authors:  Tien Huynh; Isidore Rigoutsos
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

7.  Why genes overlap in viruses.

Authors:  Nicola Chirico; Alberto Vianelli; Robert Belshaw
Journal:  Proc Biol Sci       Date:  2010-07-07       Impact factor: 5.349

8.  Sequence analysis of the mobile genome island pKLC102 of Pseudomonas aeruginosa C.

Authors:  Jens Klockgether; Oleg Reva; Karen Larbig; Burkhard Tümmler
Journal:  J Bacteriol       Date:  2004-01       Impact factor: 3.490

9.  GISMO--gene identification using a support vector machine for ORF classification.

Authors:  Lutz Krause; Alice C McHardy; Tim W Nattkemper; Alfred Pühler; Jens Stoye; Folker Meyer
Journal:  Nucleic Acids Res       Date:  2006-12-14       Impact factor: 16.971

