
On the convergence of a clustering algorithm for protein-coding regions in microbial genomes.

P Baldi

Abstract

MOTIVATION: As the number of fully sequenced prokaryotic genomes continues to grow rapidly, computational methods for reliably detecting protein-coding regions become even more important. Audic and Claverie (1998, Proc. Natl Acad. Sci. USA, 95, 10026-10031) have proposed a clustering algorithm for protein-coding regions in microbial genomes. The algorithm is based on three Markov models of order k associated with subsequences extracted from a given genome. The parameters of the three Markov models are recursively updated by the algorithm, which, in simulations, always appears to converge to a unique stable partition of the genome. The partition corresponds to three kinds of regions: (1) coding on the direct strand, (2) coding on the complementary strand, (3) non-coding.
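The iterative scheme described in the paragraph above can be sketched roughly as follows. This is a minimal illustration, not the published procedure of Audic and Claverie: it uses hard assignments (each window goes to its single best model), random initial labels, and add-one pseudocounts, all of which are assumptions made here. The function names `fit_markov`, `log_likelihood` and `cluster_windows` are hypothetical, and windows are assumed to contain only the letters A, C, G and T.

```python
import math
import random
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"  # assumption: windows contain only these four letters


def fit_markov(windows, k, pseudocount=1.0):
    """Estimate order-k transition log-probabilities from a list of windows."""
    counts = defaultdict(lambda: defaultdict(float))
    for w in windows:
        for i in range(k, len(w)):
            counts[w[i - k:i]][w[i]] += 1.0
    model = {}
    for ctx in map("".join, product(ALPHABET, repeat=k)):
        total = sum(counts[ctx].values()) + pseudocount * len(ALPHABET)
        model[ctx] = {
            a: math.log((counts[ctx][a] + pseudocount) / total) for a in ALPHABET
        }
    return model


def log_likelihood(window, model, k):
    """Log-probability of a window given its first k letters."""
    return sum(model[window[i - k:i]][window[i]] for i in range(k, len(window)))


def cluster_windows(windows, k=2, n_models=3, max_iter=50, seed=0):
    """Alternate between refitting the models and reassigning the windows."""
    rng = random.Random(seed)
    labels = [rng.randrange(n_models) for _ in windows]  # assumed random start
    models = []
    for _ in range(max_iter):
        # Refit one order-k model per cluster from its current members.
        models = [
            fit_markov([w for w, z in zip(windows, labels) if z == c], k)
            for c in range(n_models)
        ]
        # Reassign every window to the model under which it is most likely.
        new_labels = [
            max(range(n_models), key=lambda c: log_likelihood(w, models[c], k))
            for w in windows
        ]
        if new_labels == labels:  # partition is stable: stop
            break
        labels = new_labels
    return labels, models
```

Calling `cluster_windows(windows, k=2)` on a list of equal-length genome windows returns one label per window together with the three fitted models. Which label ends up corresponding to direct-strand coding, complementary-strand coding, or non-coding regions is not determined by the sketch and would have to be decided afterwards.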
RESULTS: Here we provide an explanation for the convergence of the algorithm by observing that it is essentially a form of the expectation maximization (EM) algorithm applied to the corresponding mixture model. We also provide a partial justification for the uniqueness of the partition based on identifiability. Other possible variations and improvements are briefly discussed.
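For orientation, the standard EM updates for a mixture of three order-k Markov models can be written as below. The notation is assumed here for illustration and is not taken from the paper: x_1, ..., x_N are the extracted subsequences, \pi_c are mixture weights, \theta^{c}_{s \to a} is the probability under model c of letter a following the k-letter context s, and n_i(s,a) counts how often that transition occurs in x_i.

```latex
\begin{align*}
P(x_i) &= \sum_{c=1}^{3} \pi_c \, P(x_i \mid M_c),
&
P(x_i \mid M_c) &= \prod_{s,a} \bigl(\theta^{c}_{s \to a}\bigr)^{n_i(s,a)}
\\[4pt]
\text{E-step:}\quad
\gamma_{ic} &= \frac{\pi_c \, P(x_i \mid M_c)}{\sum_{c'=1}^{3} \pi_{c'} \, P(x_i \mid M_{c'})}
\\[4pt]
\text{M-step:}\quad
\pi_c &= \frac{1}{N} \sum_{i=1}^{N} \gamma_{ic},
&
\theta^{c}_{s \to a} &= \frac{\sum_{i} \gamma_{ic} \, n_i(s,a)}{\sum_{i} \gamma_{ic} \sum_{b} n_i(s,b)}
\end{align*}
```

Here P(x_i \mid M_c) conditions on the first k letters of x_i. Each EM iteration cannot decrease the likelihood of the data, which is the usual argument for convergence of such schemes; replacing \gamma_{ic} by a 0/1 indicator of the most likely model gives a hard-assignment variant.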

Year:  2000        PMID: 10869034     DOI: 10.1093/bioinformatics/16.4.367

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  5 in total

1.  GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions.

Authors:  J Besemer; A Lomsadze; M Borodovsky
Journal:  Nucleic Acids Res       Date:  2001-06-15       Impact factor: 16.971

2.  Prokaryotic gene finding based on physicochemical characteristics of codons calculated from molecular dynamics simulations.

Authors:  Poonam Singhal; B Jayaram; Surjit B Dixit; David L Beveridge
Journal:  Biophys J       Date:  2008-03-07       Impact factor: 4.033

3.  MetaGene: prokaryotic gene finding from environmental genome shotgun sequences.

Authors:  Hideki Noguchi; Jungho Park; Toshihisa Takagi
Journal:  Nucleic Acids Res       Date:  2006-10-05       Impact factor: 16.971

4.  Gene identification in novel eukaryotic genomes by self-training algorithm.

Authors:  Alexandre Lomsadze; Vardges Ter-Hovhannisyan; Yury O Chernoff; Mark Borodovsky
Journal:  Nucleic Acids Res       Date:  2005-11-28       Impact factor: 16.971

5.  MetaGeneAnnotator: detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes.

Authors:  Hideki Noguchi; Takeaki Taniguchi; Takehiko Itoh
Journal:  DNA Res       Date:  2008-10-21       Impact factor: 4.458
