
GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions.

J Besemer, A Lomsadze, M Borodovsky.

Abstract

Improving the accuracy of gene start prediction is one of the few remaining open problems in computational prediction of prokaryotic genes. The difficulty stems from the absence of relatively strong sequence patterns identifying true translation initiation sites. In this paper we show that the accuracy of gene start prediction can be improved by combining models of protein-coding and non-coding regions with models of regulatory sites near gene starts within an iterative hidden Markov model-based algorithm. The new gene prediction method, called GeneMarkS, uses an unsupervised training procedure and can be applied to a newly sequenced prokaryotic genome with no prior knowledge of any protein or rRNA genes. The GeneMarkS implementation uses an improved version of the gene-finding program GeneMark.hmm, heuristic Markov models of coding and non-coding regions, and the Gibbs sampling multiple alignment program. GeneMarkS precisely predicted 83.2% of the translation starts of GenBank-annotated Bacillus subtilis genes and 94.4% of the translation starts in an experimentally validated set of Escherichia coli genes. We also observed that GeneMarkS detects prokaryotic genes, in terms of identifying open reading frames containing real genes, with an accuracy matching that of the best currently used gene detection methods. Accurate translation start prediction, in addition to refining protein N-terminal sequence data, allows precise positioning of the sequence region upstream of a gene start. Sequence motifs related to transcription and translation regulatory sites can therefore be revealed and analyzed with higher precision. These motifs were shown to vary significantly; the functional and evolutionary implications of this variability are discussed.
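The abstract separates two measures: whether a gene is detected at all (the correct open reading frame is found) and whether its translation start is predicted exactly. A minimal sketch of how such figures could be computed is given below. It assumes, as is common for prokaryotic genes, that a predicted and an annotated gene are the same gene when they share strand and stop coordinate. The function name `start_accuracy`, the `GeneMap` type, and the toy coordinates are hypothetical, and the abstract does not state whether the reported percentages are normalized by all annotated genes or only by detected ones; this sketch uses detected genes.

```python
from typing import Dict, Tuple

# Hypothetical representation: a gene is keyed by (strand, stop coordinate),
# and the value is its translation start coordinate.
GeneMap = Dict[Tuple[str, int], int]


def start_accuracy(annotated: GeneMap, predicted: GeneMap) -> Tuple[float, float]:
    """Return (detection rate, exact-start rate among detected genes).

    Assumption: two genes are "the same" when strand and stop coordinate match.
    """
    detected = [key for key in annotated if key in predicted]
    if not annotated or not detected:
        return (0.0, 0.0)
    # Count detected genes whose predicted start equals the annotated start.
    exact = sum(1 for key in detected if annotated[key] == predicted[key])
    return (len(detected) / len(annotated), exact / len(detected))


# Toy data, invented for illustration only.
annotated = {("+", 1200): 300, ("+", 2500): 1600, ("-", 3100): 4000, ("+", 5200): 4500}
predicted = {("+", 1200): 300, ("+", 2500): 1630, ("-", 3100): 4000}

detection_rate, exact_start_rate = start_accuracy(annotated, predicted)
print(f"detected: {detection_rate:.1%}, exact starts: {exact_start_rate:.1%}")
```

In the toy data, three of the four annotated genes are detected and two of those three have an exactly matching start, so the two rates differ. This is the distinction the abstract draws between gene detection and start prediction.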

Year:  2001        PMID: 11410670      PMCID: PMC55746          DOI: 10.1093/nar/29.12.2607

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


