HMM sampling and applications to gene finding and alternative splicing.

Simon L Cawley, Lior Pachter.

Abstract

The standard method of applying hidden Markov models to biological problems is to find a Viterbi (maximal weight) path through the HMM graph. The Viterbi algorithm reduces the problem of finding the most likely hidden state sequence that explains given observations to a dynamic programming problem for corresponding directed acyclic graphs. For example, in the gene finding application, the HMM is used to find the most likely underlying gene structure given a DNA sequence. In this note we discuss the applications of sampling methods for HMMs. The standard sampling algorithm for HMMs is a variant of the common forward-backward and backtrack algorithms, and has already been applied in the context of Gibbs sampling methods. Nevertheless, the practice of sampling state paths from HMMs does not seem to have been widely adopted, and important applications have been overlooked. We show how sampling can be used for finding alternative splicings for genes, including alternative splicings that are conserved between genes from related organisms. We also show how sampling from the posterior distribution is a natural way to compute probabilities for predicted exons and gene structures being correct under the assumed model. Finally, we describe a new memory-efficient sampling algorithm for certain classes of HMMs which provides a practical sampling alternative to the Hirschberg algorithm for optimal alignment. The ideas presented have applications not only to gene finding and HMMs but more generally to stochastic context-free grammars and RNA structure prediction.
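
As an illustration of the sampling idea mentioned in the abstract, the following is a minimal sketch of the textbook forward pass followed by a stochastic backtrack. It is not taken from the paper and is not the memory-efficient algorithm it describes. The two-state model (state 0 as "non-coding", state 1 as "exon"), the parameter values, and the names `forward`, `sample_path`, `INIT`, `TRANS`, and `EMIT` are all assumptions made up for this example.

```python
import random

# Hypothetical toy model: state 0 = "non-coding", state 1 = "exon".
# All probabilities below are invented for illustration only.
INIT = [0.8, 0.2]
TRANS = [[0.9, 0.1],
         [0.2, 0.8]]
EMIT = [{"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
        {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}]


def forward(obs, init, trans, emit):
    """Return a scaled forward table.

    f[t][k] is proportional to P(obs[0..t], state at t = k). Each row is
    normalized to sum to 1 to avoid underflow; the scaling cancels when
    the rows are used as sampling weights.
    """
    n = len(init)
    row = [init[k] * emit[k][obs[0]] for k in range(n)]
    total = sum(row)
    f = [[v / total for v in row]]
    for x in obs[1:]:
        prev = f[-1]
        row = [emit[k][x] * sum(prev[j] * trans[j][k] for j in range(n))
               for k in range(n)]
        total = sum(row)
        f.append([v / total for v in row])
    return f


def sample_path(obs, init, trans, emit, rng):
    """Draw one state path from the posterior P(path | obs)."""
    f = forward(obs, init, trans, emit)
    states = range(len(init))
    # The final state is drawn in proportion to the last forward row.
    path = [rng.choices(states, weights=f[-1])[0]]
    # Walk backwards: P(s_t = j | s_{t+1} = k, obs) is proportional to
    # f[t][j] * trans[j][k].
    for t in range(len(obs) - 2, -1, -1):
        nxt = path[-1]
        weights = [f[t][j] * trans[j][nxt] for j in states]
        path.append(rng.choices(states, weights=weights)[0])
    path.reverse()
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    seq = "ATGCGCGCTA"
    samples = [sample_path(seq, INIT, TRANS, EMIT, rng) for _ in range(1000)]
    # Fraction of sampled paths labelling each position as "exon".
    freq = [sum(p[t] for p in samples) / len(samples) for t in range(len(seq))]
    print(freq)
```

Under these assumptions, the fraction of sampled paths that assign a position (or a whole segment) to the "exon" state estimates its posterior probability under the model, and distinct sampled paths can be read as alternative labellings of the same sequence.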

Year:  2003        PMID: 14534169     DOI: 10.1093/bioinformatics/btg1057

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

