
Algorithms for hidden Markov models restricted to occurrences of regular expressions.

Paula Tataru, Andreas Sand, Asger Hobolth, Thomas Mailund, Christian N S Pedersen.

Abstract

Hidden Markov Models (HMMs) are widely used probabilistic models, particularly for annotating sequential data with an underlying hidden structure. Patterns in the annotation are often more relevant to study than the hidden structure itself. A typical HMM analysis consists of annotating the observed data using a decoding algorithm and analyzing the annotation to study patterns of interest. For example, given an HMM modeling genes in DNA sequences, the focus is on occurrences of genes in the annotation. In this paper, we define a pattern through a regular expression and present a restriction of three classical algorithms to take the number of occurrences of the pattern in the hidden sequence into account. We present a new algorithm to compute the distribution of the number of pattern occurrences, and we extend the two most widely used existing decoding algorithms to employ information from this distribution. We show experimentally that the expectation of the distribution of the number of pattern occurrences gives a highly accurate estimate, while the typical procedure can be biased in the sense that the identified number of pattern occurrences does not correspond to the true number. We furthermore show that using this distribution in the decoding algorithms improves the predictive power of the model.
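As a rough illustration of the idea of computing the distribution of the number of pattern occurrences, the sketch below extends the standard forward recursion with two extra indices: the state of a deterministic automaton that recognizes the pattern over the hidden-state alphabet, and a running occurrence count. This is not the authors' implementation. The function name `occurrence_distribution`, the toy two-state HMM, and the hand-built automaton for the pattern "state 0 followed by state 1" are all assumptions made for the example.

```python
def occurrence_distribution(init, trans, emit, obs,
                            dfa_next, dfa_start, accepting, max_count):
    """Return P(number of pattern occurrences = k | obs) for k = 0..max_count.

    table[(h, q, k)] holds the joint probability of the observations read so
    far, the current hidden state h, the automaton state q, and k occurrences.
    Counts above max_count are lumped into max_count.
    Assumes obs has non-zero probability under the model.
    """
    n_states = len(init)
    table = {}
    # Initialisation: first hidden state, first automaton step.
    for h in range(n_states):
        q = dfa_next[dfa_start][h]
        k = min(1 if q in accepting else 0, max_count)
        p = init[h] * emit[h][obs[0]]
        table[(h, q, k)] = table.get((h, q, k), 0.0) + p
    # Recursion: advance the HMM and the automaton together.
    for x in obs[1:]:
        new_table = {}
        for (h, q, k), p in table.items():
            for h2 in range(n_states):
                w = p * trans[h][h2] * emit[h2][x]
                if w == 0.0:
                    continue
                q2 = dfa_next[q][h2]
                k2 = min(k + (1 if q2 in accepting else 0), max_count)
                key = (h2, q2, k2)
                new_table[key] = new_table.get(key, 0.0) + w
        table = new_table
    # Marginalise over hidden and automaton states, then normalise.
    dist = [0.0] * (max_count + 1)
    for (_, _, k), p in table.items():
        dist[k] += p
    total = sum(dist)
    return [p / total for p in dist]


# Toy model (hypothetical numbers): two hidden states, two observation symbols.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.7, 0.3], [0.1, 0.9]]
obs = [0, 1, 1, 0, 1]

# Automaton over hidden states for the pattern "0 followed by 1":
# state 0 = no progress, state 1 = last hidden state was 0,
# state 2 = pattern just completed (accepting).
dfa_next = [[1, 0], [1, 2], [1, 0]]

dist = occurrence_distribution(init, trans, emit, obs,
                               dfa_next, dfa_start=0, accepting={2},
                               max_count=len(obs) // 2)
expected_count = sum(k * p for k, p in enumerate(dist))
print(dist, expected_count)
```

The expectation computed at the end corresponds to the kind of estimate the abstract refers to. In this sketch the table grows with the number of hidden states, automaton states, and the count bound, so the count bound is kept as small as the sequence length allows.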


Year:  2013        PMID: 24833225      PMCID: PMC4009796          DOI: 10.3390/biology2041282

Source DB:  PubMed          Journal:  Biology (Basel)        ISSN: 2079-7737


  2 in total

1.  Semi-supervised morphosyntactic classification of Old Icelandic.

Authors:  Kryztof Urban; Timothy R Tangherlini; Aurelijus Vijūnas; Peter M Broadwell
Journal:  PLoS One       Date:  2014-07-16       Impact factor: 3.240

2.  Regmex: a statistical tool for exploring motifs in ranked sequence lists from genomics experiments.

Authors:  Morten Muhlig Nielsen; Paula Tataru; Tobias Madsen; Asger Hobolth; Jakob Skou Pedersen
Journal:  Algorithms Mol Biol       Date:  2018-12-08       Impact factor: 1.405
