
Evaluating bacterial gene-finding HMM structures as probabilistic logic programs.

Søren Mørk, Ian Holmes.

Abstract

MOTIVATION: Probabilistic logic programming offers a powerful way to describe and evaluate structured statistical models. To investigate the practicality of probabilistic logic programming for structure learning in bioinformatics, we undertook a simplified bacterial gene-finding benchmark in PRISM, a probabilistic dialect of Prolog.
RESULTS: We evaluate Hidden Markov Model structures for bacterial protein-coding gene potential, including a simple null model structure, three structures based on existing bacterial gene finders and two novel model structures. We test standard versions as well as ADPH length modeling and three-state versions of the five model structures. The models are all represented as probabilistic logic programs and evaluated using the PRISM machine learning system in terms of statistical information criteria and gene-finding prediction accuracy, in two bacterial genomes. Neither of our implementations of the two currently most used model structures is the best performing in terms of statistical information criteria or prediction performance, suggesting that better-fitting models might be achievable.
AVAILABILITY: The source code of all PRISM models, data and additional scripts are freely available for download at: http://github.com/somork/codonhmm.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year:  2012        PMID: 22215819      PMCID: PMC3289911          DOI: 10.1093/bioinformatics/btr698

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

