
Sufficient statistics and expectation maximization algorithms in phylogenetic tree models.

Hisanori Kiryu.

Abstract

MOTIVATION: Measuring evolutionary conservation is a routine step in the identification of functional elements in genome sequences. Although a number of studies have proposed methods that use continuous-time Markov models (CTMMs) to find evolutionarily constrained elements, the probabilistic structure of these models has been less frequently investigated.
RESULTS: In this article, we investigate a sufficient statistic for CTMMs. The statistic is composed of the fractional duration of nucleotide characters over evolutionary time, F(d), and the number of substitutions occurring in phylogenetic trees, N(s). We first derive basic properties of the sufficient statistic. Then, we derive an expectation maximization (EM) algorithm for estimating the parameters of a phylogenetic model, which iteratively computes the expectation values of the sufficient statistic. We show that the EM algorithm exhibits much faster convergence than other optimization methods that use numerical gradient descent algorithms. Finally, we investigate the genome-wide distribution of fractional duration time F(d) which, unlike the number of substitutions N(s), has rarely been investigated. We show that F(d) has evolutionary information that is distinct from that in N(s), which may be useful for detecting novel types of evolutionary constraints existing in the human genome.
AVAILABILITY: The C++ source code of the 'Fdur' software is available at http://www.ncrna.org/software/fdur/
CONTACT: kiryu-h@k.u-tokyo.ac.jp
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
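
To illustrate why durations and substitution counts can act as a sufficient statistic, the following is a minimal sketch and not the Fdur implementation. It assumes a single branch whose substitution history is fully observed and a Jukes-Cantor rate matrix; all function names are hypothetical. In this simplified setting the log-likelihood depends on the history only through the time spent in each state and the number of substitutions between each ordered pair of states, and the rate estimate is the count divided by the duration. In an EM setting these quantities would presumably be replaced by their conditional expectations given the observed alignment; that step is not shown here.

```python
import math
import random

def jukes_cantor(rate=1.0, n=4):
    """Rate matrix with equal off-diagonal rates; each row sums to zero."""
    off = rate / (n - 1)
    return [[-rate if i == j else off for j in range(n)] for i in range(n)]

def simulate_path(Q, start, T, rng):
    """Simulate one substitution history on a branch of length T.
    Returns a list of (state, dwell_time) segments covering [0, T]."""
    path, state, t = [], start, 0.0
    while True:
        dwell = rng.expovariate(-Q[state][state])  # exponential waiting time
        if t + dwell >= T:
            path.append((state, T - t))  # truncate the last segment at T
            return path
        path.append((state, dwell))
        t += dwell
        # Pick the next state in proportion to the off-diagonal rates.
        others = [j for j in range(len(Q)) if j != state]
        state = rng.choices(others, weights=[Q[state][j] for j in others])[0]

def sufficient_stats(path, n):
    """Time spent in each state and substitution counts per ordered pair."""
    dur = [0.0] * n
    cnt = [[0] * n for _ in range(n)]
    for k, (s, d) in enumerate(path):
        dur[s] += d
        if k + 1 < len(path):
            cnt[s][path[k + 1][0]] += 1
    return dur, cnt

def complete_loglik(Q, dur, cnt):
    """Log-likelihood of a fully observed history, given its start state.
    It uses the history only through dur and cnt."""
    n = len(Q)
    ll = 0.0
    for i in range(n):
        ll += Q[i][i] * dur[i]
        for j in range(n):
            if i != j and cnt[i][j] > 0:
                ll += cnt[i][j] * math.log(Q[i][j])
    return ll

def mle_rates(dur, cnt):
    """Closed-form maximizer of complete_loglik: q_ij = count_ij / duration_i."""
    n = len(dur)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and dur[i] > 0:
                Q[i][j] = cnt[i][j] / dur[i]
        Q[i][i] = -sum(Q[i])  # diagonal makes the row sum to zero
    return Q

if __name__ == "__main__":
    rng = random.Random(0)
    T = 5000.0
    Q = jukes_cantor()
    dur, cnt = sufficient_stats(simulate_path(Q, 0, T, rng), 4)
    print("fractional durations:", [round(d / T, 3) for d in dur])
    print("number of substitutions:", sum(map(sum, cnt)))
    print("estimated rates, row 0:", [round(q, 3) for q in mle_rates(dur, cnt)[0]])
```

Dividing each duration by the branch length gives a fractional duration in the spirit of F(d), and the total of the count matrix corresponds to the number of substitutions N(s), although the paper's definitions over a full phylogenetic tree may differ from this single-branch simplification.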


Year:  2011        PMID: 21757463     DOI: 10.1093/bioinformatics/btr420

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  3 in total

1.  SCOUP: a probabilistic model based on the Ornstein-Uhlenbeck process to analyze single-cell expression data during differentiation.

Authors:  Hirotaka Matsumoto; Hisanori Kiryu
Journal:  BMC Bioinformatics       Date:  2016-06-08       Impact factor: 3.169

2.  Mirage 2.0: fast and memory-efficient reconstruction of gene-content evolution considering heterogeneous evolutionary patterns among gene families.

Authors:  Tsukasa Fukunaga; Wataru Iwasaki
Journal:  Bioinformatics       Date:  2022-06-30       Impact factor: 6.931

3.  TMRS: an algorithm for computing the time to the most recent substitution event from a multiple alignment column.

Authors:  Hisanori Kiryu; Yuto Ichikawa; Yasuhiro Kojima
Journal:  Algorithms Mol Biol       Date:  2019-11-18       Impact factor: 1.405

