
How to interpret an anonymous bacterial genome: machine learning approach to gene identification.

W S Hayes, M Borodovsky.

Abstract

In this report we address the problem of accurate statistical modeling of DNA sequences, either coding or noncoding, for a bacterial species whose genome (or a large portion of it) has been sequenced but not yet characterized experimentally. The availability of these models is critical for solving the genome annotation task by statistical methods of gene finding. We present a method, GeneMark-Genesis, which learns the parameters of Markov models of protein-coding and noncoding regions from anonymous bacterial genomic sequence. These models are subsequently used in the GeneMark and GeneMark.hmm gene-finding programs. Although there is basically one model of a noncoding region for a given genome, several models of protein-coding regions are automatically obtained by GeneMark-Genesis. The diversity of protein-coding models reflects the diversity of oligonucleotide compositions, particularly the diversity of codon usage strategies observed in genes from one and the same genome. In the simplest and most important case, there are just two gene models: typical and atypical. We show that the atypical model allows one to predict genes that escape identification by the typical model. Many genes predicted by the atypical model appear to be horizontally transferred genes. The early versions of GeneMark-Genesis were used for annotating the genomes of Methanococcus jannaschii and Helicobacter pylori. We report the results of accuracy testing of the full-scale version of GeneMark-Genesis on 10 completely sequenced bacterial genomes. Interestingly, the GeneMark.hmm program that employed the typical and atypical models defined by GeneMark-Genesis was able to predict 683 new atypical genes, with 176 of them confirmed by similarity search.
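
The idea of learning separate Markov models for coding and noncoding DNA and then comparing how well each explains a sequence can be illustrated with a minimal Python sketch. This is not the GeneMark-Genesis procedure: the abstract does not give the model form or the training scheme, so the choices here are assumptions made for illustration. They are the function names (`train_markov`, `log_likelihood`, `coding_log_odds`), the chain order of 2, the add-one pseudocount, the use of a three-periodic chain (indexed by codon position) as a stand-in for a protein-coding model, and the toy training sequences. Input is assumed to be uppercase A, C, G, T only, with coding sequences starting in frame.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov(seqs, order=2, period=1, pseudocount=1.0):
    """Estimate P(next base | phase, previous `order` bases) from sequences.

    period=1 yields a homogeneous chain (stand-in for noncoding DNA);
    period=3 yields a three-periodic chain indexed by codon position
    (stand-in for protein-coding DNA that starts in frame).
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        for i in range(order, len(seq)):
            context = (i % period, seq[i - order:i])
            counts[context][seq[i]] += 1.0
    probs = {}
    for context, nexts in counts.items():
        total = sum(nexts.values()) + pseudocount * len(ALPHABET)
        probs[context] = {
            b: (nexts.get(b, 0.0) + pseudocount) / total for b in ALPHABET
        }
    return {"order": order, "period": period, "probs": probs}

def log_likelihood(model, seq):
    """Sum of log transition probabilities; unseen contexts fall back to uniform."""
    order, period = model["order"], model["period"]
    uniform = 1.0 / len(ALPHABET)
    total = 0.0
    for i in range(order, len(seq)):
        context_probs = model["probs"].get((i % period, seq[i - order:i]))
        total += math.log(context_probs[seq[i]] if context_probs else uniform)
    return total

def coding_log_odds(coding_model, noncoding_model, seq):
    """Positive values mean the coding model explains `seq` better."""
    return log_likelihood(coding_model, seq) - log_likelihood(noncoding_model, seq)

# Toy training data, invented for illustration only.
coding_model = train_markov(["ATGGCTGCTGCTGCTGCTGCTTAA"], order=2, period=3)
noncoding_model = train_markov(["ATATATTTATATAAATATATTATA"], order=2, period=1)

print(coding_log_odds(coding_model, noncoding_model, "GCTGCTGCTGCT"))
print(coding_log_odds(coding_model, noncoding_model, "ATATATATAT"))
```

Under the same assumptions, a second coding model trained on a different set of genes could play the role of an atypical model: a sequence would then be scored against each coding model in turn, and the best log-odds kept.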

Year:  1998        PMID: 9847079     DOI: 10.1101/gr.8.11.1154

Source DB:  PubMed          Journal:  Genome Res        ISSN: 1088-9051            Impact factor:   9.043

