
Very fast algorithms for evaluating the stability of ML and Bayesian phylogenetic trees from sequence data.

Peter J Waddell, Hirohisa Kishino, Rissa Ota.

Abstract

Evolutionary trees sit at the core of all realistic models describing a set of related sequences, including alignment, homology search, ancestral protein reconstruction and 2D/3D structural change. It is important to assess the stochastic error when estimating a tree, including under models that use the most realistic likelihood-based optimizations, yet computation times may run to many days or weeks. In such cases the bootstrap is computationally prohibitive. Here we show that the extremely fast "resampling of estimated log likelihoods" (RELL) method behaves well under more general circumstances than previously examined. RELL approximates the bootstrap proportions (BP) of trees better than some bootstrap methods that rely on fast heuristics to search the tree space. The BIC approximation of the Bayesian posterior probability (BPP) of trees is made more accurate by including an additional term related to the determinant of the information matrix (which may also be obtained as a product of gradient or score vectors). Such estimates are shown to be very close to values obtained by MCMC. Our analysis of mammalian mitochondrial amino acid sequences suggests that when model breakdown occurs, as it typically does for sequences separated by more than a few million years, the BPP values are far too peaked and the real fluctuations in the likelihood of the data are many times larger than expected. Accordingly, several ways to incorporate the bootstrap and other types of direct resampling with MCMC procedures are outlined. Genes evolve by a process in which some sites follow a tree close to, but not identical with, the species tree. It is seen that under such a likelihood model BP and BPP estimates may still be reasonable estimates of the species tree. Since many of the methods studied are very fast computationally, there is no reason to ignore stochastic error even with the slowest ML or likelihood-based methods.
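
The two fast approximations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the per-site log likelihoods of each candidate tree have already been computed once at that tree's ML estimates, and the function names `rell_proportions` and `bic_weights` are hypothetical. The BIC weights use only the standard penalty of (k/2) ln n; the additional information-matrix determinant term described in the abstract is not reproduced here, because its exact form is not given in this record.

```python
import numpy as np


def rell_proportions(site_lnl, n_reps=1000, seed=0):
    """Approximate bootstrap proportions by resampling per-site log likelihoods.

    site_lnl: array of shape (n_trees, n_sites), evaluated once per tree.
    No tree is re-optimized on the resampled data, which is why RELL is fast.
    """
    site_lnl = np.asarray(site_lnl, dtype=float)
    n_trees, n_sites = site_lnl.shape
    rng = np.random.default_rng(seed)
    # Sampling n_sites sites with replacement is the same as drawing
    # multinomial counts of how often each site enters a replicate.
    probs = np.full(n_sites, 1.0 / n_sites)
    counts = rng.multinomial(n_sites, probs, size=n_reps)
    # Replicate log likelihood of every tree: shape (n_reps, n_trees).
    rep_lnl = counts @ site_lnl.T
    winners = rep_lnl.argmax(axis=1)
    return np.bincount(winners, minlength=n_trees) / n_reps


def bic_weights(max_lnl, n_params, n_sites):
    """Plain BIC approximation to posterior tree probabilities (equal priors).

    Assumption: only the (k/2) ln n penalty is applied; the determinant
    correction mentioned in the abstract is deliberately left out.
    """
    max_lnl = np.asarray(max_lnl, dtype=float)
    n_params = np.asarray(n_params, dtype=float)
    score = max_lnl - 0.5 * n_params * np.log(n_sites)
    score -= score.max()  # stabilize the exponentials
    weights = np.exp(score)
    return weights / weights.sum()


if __name__ == "__main__":
    # Toy input: random numbers standing in for real per-site log likelihoods.
    rng = np.random.default_rng(1)
    toy = rng.normal(loc=-3.0, scale=0.5, size=(3, 200))
    print(rell_proportions(toy, n_reps=2000))
    print(bic_weights(toy.sum(axis=1), n_params=[25, 25, 25], n_sites=200))
```

Both functions take a few milliseconds on inputs of this size, whereas a full bootstrap would repeat the ML tree search for every replicate.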


Year:  2002        PMID: 14571377

Source DB:  PubMed          Journal:  Genome Inform        ISSN: 0919-9454


