
Rapid likelihood analysis on large phylogenies using partial sampling of substitution histories.

A P Jason de Koning, Wanjun Gu, David D Pollock.

Abstract

Likelihood-based approaches can reconstruct evolutionary processes in greater detail and with better precision from larger data sets. The extremely large comparative genomic data sets that are now being generated thus create new opportunities for understanding molecular evolution, but analysis of such large quantities of data poses escalating computational challenges. Recently developed Markov chain Monte Carlo methods that augment substitution histories are a promising approach to alleviate these computational costs. We analyzed the computational costs of several such approaches, considering how they scale with model and data set complexity. This provided a theoretical framework to understand the most important computational bottlenecks, leading us to combine novel variations of our conditional pathway integration approach with recent advances made by others. The resulting technique ("partial sampling" of substitution histories) is considerably faster than all other approaches we considered. It is accurate, simple to implement, and scales exceptionally well with dimensions of model complexity and data set size. In particular, the time complexity of sampling unobserved substitution histories using the new method is much lower than that of previously existing methods, and model parameter and branch length updates are independent of data set size. We compared the performance of methods on a 224-taxon set of mammalian cytochrome-b sequences. For a simple nucleotide substitution model, partial sampling was at least 10 times faster than the PhyloBayes program, which samples substitutions in continuous time, and about 100 times faster than when using fully integrated substitution histories. Under a general reversible model of amino acid substitution, the partial sampling method was 1,600 times faster than when using fully integrated substitution histories, confirming significantly improved scaling with model state-space complexity. Partial sampling of substitutions thus dramatically improves the utility of likelihood approaches for analyzing complex evolutionary processes on large data sets.
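The claim that branch length updates can be independent of data set size can be illustrated with a small sketch. It is not the authors' implementation. It assumes that ancestral states have already been sampled at both ends of a branch, that the model is Jukes-Cantor (JC69, chosen here only because its transition probabilities have a closed form), and that the per-branch likelihood is then a product of endpoint transition probabilities. All function names are hypothetical. Under those assumptions, the sites collapse into counts of (parent, child) state pairs, and re-evaluating the likelihood for a new branch length touches at most 16 counts regardless of how many sites there are.

```python
import math
from collections import Counter


def jc69_transition_prob(i, j, t):
    """JC69 probability of state i becoming state j over branch length t
    (t in expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def endpoint_counts(parent_states, child_states):
    """Tally (parent, child) state pairs over all sites of one branch.
    This pass is O(sites) but is done once per sampled history."""
    return Counter(zip(parent_states, child_states))


def branch_log_likelihood(counts, t):
    """Log-likelihood of one branch given only the pair counts.
    Cost depends on the number of distinct pairs (at most 16 for
    nucleotides), not on the number of sites."""
    return sum(n * math.log(jc69_transition_prob(i, j, t))
               for (i, j), n in counts.items())


if __name__ == "__main__":
    # Hypothetical sampled states at the two ends of a single branch.
    parent = "ACGTACGTAC" * 1000
    child = "ACGTACGAAC" * 1000
    counts = endpoint_counts(parent, child)
    # Each candidate branch length is scored from the counts alone.
    for t in (0.05, 0.1, 0.2):
        print(t, branch_log_likelihood(counts, t))
```

The counting pass is the only step that scales with the number of sites. Every subsequent proposal of a new branch length reuses the same counts, which is one way an update can avoid revisiting the alignment.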

Year: 2009    PMID: 19783593    PMCID: PMC2877550    DOI: 10.1093/molbev/msp228

Source DB: PubMed    Journal: Mol Biol Evol    ISSN: 0737-4038    Impact factor: 16.240


