Analysing grouping of nucleotides in DNA sequences using lumped processes constructed from Markov chains.

Yann Guédon, Yves d'Aubenton-Carafa, Claude Thermes.

Abstract

The most commonly used models for analysing local dependencies in DNA sequences are (high-order) Markov chains. Incorporating knowledge about the possible grouping of nucleotides makes it possible to define dedicated sub-classes of Markov chains. We therefore address the problem of formulating lumpability hypotheses for a Markov chain. In the classical approach to lumpability, this problem can be formulated as the determination of an appropriate state space, smaller than the original one, such that the lumped chain defined on this state space retains the Markov property. We propose a different perspective on lumpability in which the state space is fixed and the partitioning of this state space is represented by a one-to-many probabilistic function within a two-level stochastic process. Three nested classes of lumped processes can be defined in this way as sub-classes of first-order Markov chains. These lumped processes provide parsimonious reparameterizations of Markov chains that help reveal relevant partitions of the state space. Characterizations of the lumped processes in terms of the original transition probability matrix are derived. Model selection methods based either on hypothesis testing or on penalized log-likelihood criteria are presented, as well as extensions to lumped processes constructed from high-order Markov chains. The relevance of the proposed approach to lumpability is illustrated by the analysis of DNA sequences. In particular, lumped processes make it possible to highlight differences between intronic sequences and sequences of gene untranslated regions.
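As a point of comparison with the classical approach mentioned in the abstract, here is a minimal sketch, assuming a first-order chain over {A, C, G, T}, of the standard Kemeny-Snell criterion for strong lumpability: a chain is lumpable with respect to a partition if, for every pair of blocks, all states of the source block have the same total probability of moving into the target block. This is the textbook criterion, not the authors' two-level lumped processes; the function names, the purine/pyrimidine partition and the example matrix are hypothetical and chosen only for illustration.

```python
# Sketch of the classical (Kemeny-Snell) strong lumpability check.
# This is NOT the two-level lumped-process method of the paper; names are hypothetical.
from collections import Counter

ALPHABET = "ACGT"


def estimate_transitions(seq, alphabet=ALPHABET):
    """Maximum-likelihood estimate of a first-order transition matrix.

    Returns a nested dict: matrix[a][b] = P(next = b | current = a).
    Rows of symbols never followed by another symbol are left as zeros.
    """
    counts = Counter(zip(seq, seq[1:]))  # counts of adjacent pairs (a, b)
    matrix = {}
    for a in alphabet:
        total = sum(counts[(a, b)] for b in alphabet)
        matrix[a] = {b: counts[(a, b)] / total if total else 0.0 for b in alphabet}
    return matrix


def is_strongly_lumpable(matrix, partition, tol=1e-9):
    """For every pair of blocks (K, L), the probability of moving into L
    must be the same (within tol) from every state i in K."""
    for block_from in partition:
        for block_to in partition:
            into = [sum(matrix[i][j] for j in block_to) for i in block_from]
            if max(into) - min(into) > tol:
                return False
    return True


def lumped_matrix(matrix, partition):
    """Transition matrix of the lumped chain; meaningful only if strongly lumpable,
    in which case any representative state of a block gives the same row."""
    return [[sum(matrix[block_from[0]][j] for j in block_to) for block_to in partition]
            for block_from in partition]


# Hypothetical example matrix (made-up numbers), lumpable for purines {A, G}
# versus pyrimidines {C, T}.
EXAMPLE_P = {
    "A": {"A": 0.30, "C": 0.10, "G": 0.20, "T": 0.40},
    "C": {"A": 0.20, "C": 0.30, "G": 0.10, "T": 0.40},
    "G": {"A": 0.10, "C": 0.25, "G": 0.40, "T": 0.25},
    "T": {"A": 0.15, "C": 0.50, "G": 0.15, "T": 0.20},
}

if __name__ == "__main__":
    print(is_strongly_lumpable(EXAMPLE_P, ["AG", "CT"]))  # True
    print(is_strongly_lumpable(EXAMPLE_P, ["AC", "GT"]))  # False
    print(lumped_matrix(EXAMPLE_P, ["AG", "CT"]))
```

On a matrix estimated from a real sequence with `estimate_transitions`, the row sums will almost never coincide exactly, so a fixed tolerance is only a crude stand-in for the hypothesis tests and penalized log-likelihood criteria the abstract refers to.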

Year:  2006        PMID: 16463190     DOI: 10.1007/s00285-005-0358-y

Source DB:  PubMed          Journal:  J Math Biol        ISSN: 0303-6812            Impact factor:   2.164


Cited by (1 in total):

1.  Statistical tests to identify appropriate types of nucleotide sequence recoding in molecular phylogenetics.

Authors:  Victor A Vera-Ruiz; Kwok W Lau; John Robinson; Lars S Jermiin
Journal:  BMC Bioinformatics       Date:  2014-01-31       Impact factor: 3.169

