A hidden Markov model that finds genes in E. coli DNA.

A Krogh, I S Mian, D Haussler.

Abstract

A hidden Markov model (HMM) has been developed to find protein coding genes in E. coli DNA using E. coli genome DNA sequence from the EcoSeq6 database maintained by Kenn Rudd. This HMM includes states that model the codons and their frequencies in E. coli genes, as well as the patterns found in the intergenic region, including repetitive extragenic palindromic sequences and the Shine-Dalgarno motif. To account for potential sequencing errors and/or frameshifts in raw genomic DNA sequence, it allows for the (very unlikely) possibility of insertions and deletions of individual nucleotides within a codon. The parameters of the HMM are estimated using approximately one million nucleotides of annotated DNA in EcoSeq6, and the model is tested on a disjoint set of contigs containing about 325,000 nucleotides. The HMM finds the exact locations of about 80% of the known E. coli genes, and approximate locations for about 10%. It also finds several potentially new genes, and locates several places where insertion or deletion errors and/or frameshifts may be present in the contigs.
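
To illustrate the general idea of labelling DNA with an HMM, here is a minimal Python sketch of Viterbi decoding over a toy two-state model. It is not the model described in the abstract, which has states for individual codons, intergenic motifs, and rare insertions and deletions. The state names, the probability tables, and the function name `viterbi` are all assumptions made up for this example.

```python
import math

# Toy two-state HMM for illustration only. This is NOT the paper's model.
STATES = ("intergenic", "coding")

# All probabilities below are invented values, not estimates from EcoSeq6.
START = {"intergenic": 0.9, "coding": 0.1}
TRANS = {
    "intergenic": {"intergenic": 0.9, "coding": 0.1},
    "coding": {"intergenic": 0.1, "coding": 0.9},
}
EMIT = {
    "intergenic": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "coding": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def viterbi(seq):
    """Return the most probable state path for seq, computed in log space."""
    if not seq:
        return []
    # score[s] is the best log-probability of any path ending in state s.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backptrs = []
    for base in seq[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            # Pick the predecessor state that maximises the path score into s.
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            ptr[s] = best_prev
            new_score[s] = (
                score[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][base])
            )
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

With these invented parameters, `viterbi("ATATATGCGCGCGCTATATA")` labels the central GC-rich run as "coding" and the AT-rich flanks as "intergenic". A real gene finder would replace the single "coding" state with codon-level states and estimate all parameters from annotated sequence.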

Year:  1994        PMID: 7984429      PMCID: PMC308529          DOI: 10.1093/nar/22.22.4768

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


