A structural EM algorithm for phylogenetic inference.

Nir Friedman, Matan Ninio, Itsik Pe'er, Tal Pupko.

Abstract

A central task in the study of molecular evolution is the reconstruction of a phylogenetic tree from sequences of current-day taxa. The most established approach to tree reconstruction is maximum likelihood (ML) analysis. Unfortunately, searching for the maximum likelihood phylogenetic tree is computationally prohibitive for large data sets. In this paper, we describe a new algorithm that uses Structural Expectation Maximization (EM) for learning maximum likelihood phylogenetic trees. This algorithm is similar to the standard EM method for edge-length estimation, except that during iterations of the Structural EM algorithm the topology is improved as well as the edge lengths. Our algorithm performs iterations of two steps. In the E-step, we use the current tree topology and edge lengths to compute expected sufficient statistics, which summarize the data. In the M-step, we search for a topology that maximizes the likelihood with respect to these expected sufficient statistics. We show that searching for better topologies inside the M-step can be done efficiently, as opposed to standard methods for topology search. We prove that each iteration of this procedure increases the likelihood of the topology, and thus the procedure must converge. This convergence point, however, can be a suboptimal one. To escape from such "local optima," we further enhance our basic EM procedure by incorporating moves in the flavor of simulated annealing. We evaluate these new algorithms on both synthetic and real sequence data and show that for protein sequences even our basic algorithm finds more plausible trees than existing methods for searching maximum likelihood phylogenies. Furthermore, our algorithms are dramatically faster than such methods, enabling, for the first time, phylogenetic analysis of large protein data sets in the maximum likelihood framework.
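
Illustrative sketch (not from the abstract): one way a topology search can be made efficient is when the expected score of a tree decomposes into a sum of pairwise edge scores, so that the best-scoring tree is a maximum-weight spanning tree. The abstract does not state that decomposition, so it is an assumption here, and the function name `max_spanning_tree` and the weight matrix `W` are hypothetical, with made-up numbers. The code below shows only that search step, using Prim's algorithm on a complete graph; computing the scores from expected sufficient statistics (the E-step) is not shown.

```python
from typing import List, Tuple


def max_spanning_tree(weights: List[List[float]]) -> List[Tuple[int, int]]:
    """Prim's algorithm for a maximum-weight spanning tree on a complete graph.

    weights[i][j] is a symmetric score for linking nodes i and j
    (assumed here to be an expected log-likelihood contribution).
    Returns the chosen edges as (parent, child) pairs.
    """
    n = len(weights)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("-inf")] * n  # heaviest known link from each node into the tree
    parent = [-1] * n
    in_tree[0] = True
    for j in range(1, n):
        best[j] = weights[0][j]
        parent[j] = 0
    edges: List[Tuple[int, int]] = []
    for _ in range(n - 1):
        # Pick the node outside the tree with the heaviest link into it.
        j = max((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        in_tree[j] = True
        edges.append((parent[j], j))
        # Newly added node j may offer heavier links to the remaining nodes.
        for k in range(n):
            if not in_tree[k] and weights[j][k] > best[k]:
                best[k] = weights[j][k]
                parent[k] = j
    return edges


# Hypothetical pairwise scores for four nodes (illustrative values only).
W = [
    [0.0, -1.0, -5.0, -6.0],
    [-1.0, 0.0, -4.0, -2.0],
    [-5.0, -4.0, 0.0, -3.0],
    [-6.0, -2.0, -3.0, 0.0],
]
tree = max_spanning_tree(W)
total = sum(W[a][b] for a, b in tree)
print(tree, total)
```

Under the decomposition assumption, this search runs in time quadratic in the number of nodes, in contrast to local-rearrangement searches that re-evaluate the full likelihood for every candidate topology.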

Year:  2002        PMID: 12015885     DOI: 10.1089/10665270252935494

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  39 in total

1.  MAVID: constrained ancestral alignment of multiple sequences.

Authors:  Nicolas Bray; Lior Pachter
Journal:  Genome Res       Date:  2004-04       Impact factor: 9.043

2.  Site-specific evolutionary rate inference: taking phylogenetic uncertainty into account.

Authors:  Itay Mayrose; Amir Mitchell; Tal Pupko
Journal:  J Mol Evol       Date:  2005-03       Impact factor: 2.395

3.  Physicochemical constraint violation by missense substitutions mediates impairment of protein function and disease severity.

Authors:  Eric A Stone; Arend Sidow
Journal:  Genome Res       Date:  2005-06-17       Impact factor: 9.043

4.  Comparing 3D Genome Organization in Multiple Species Using Phylo-HMRF.

Authors:  Yang Yang; Yang Zhang; Bing Ren; Jesse R Dixon; Jian Ma
Journal:  Cell Syst       Date:  2019-06-26       Impact factor: 10.304

5.  Distribution and intensity of constraint in mammalian genomic sequence.

Authors:  Gregory M Cooper; Eric A Stone; George Asimenos; Eric D Green; Serafim Batzoglou; Arend Sidow
Journal:  Genome Res       Date:  2005-06-17       Impact factor: 9.043

6.  Three distinct modes of intron dynamics in the evolution of eukaryotes.

Authors:  Liran Carmel; Yuri I Wolf; Igor B Rogozin; Eugene V Koonin
Journal:  Genome Res       Date:  2007-05-10       Impact factor: 9.043

7.  A maximum likelihood method for reconstruction of the evolution of eukaryotic gene structure. (Review)

Authors:  Liran Carmel; Igor B Rogozin; Yuri I Wolf; Eugene V Koonin
Journal:  Methods Mol Biol       Date:  2009

8.  ProPhylER: a curated online resource for protein function and structure based on evolutionary constraint analyses.

Authors:  Jonathan Binkley; Kalpana Karra; Andrew Kirby; Midori Hosobuchi; Eric A Stone; Arend Sidow
Journal:  Genome Res       Date:  2009-10-21       Impact factor: 9.043

9.  A comparative analysis of methods for predicting clinical outcomes using high-dimensional genomic datasets.

Authors:  Xia Jiang; Binghuang Cai; Diyang Xue; Xinghua Lu; Gregory F Cooper; Richard E Neapolitan
Journal:  J Am Med Inform Assoc       Date:  2014-04-15       Impact factor: 4.497

10.  Mammalian genomes ease location of human DNA functional segments but not their description.

Authors:  Lee A Newberg; Charles E Lawrence
Journal:  Stat Appl Genet Mol Biol       Date:  2004-09-30