Guy Baele (1), Philippe Lemey. (1) Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium. guy.baele@rega.kuleuven.be.
Abstract
MOTIVATION: The advent of new sequencing technologies has led to increasing amounts of data being available for phylogenetic analyses, with genomic data giving rise to the field of phylogenomics. High-performance computing is becoming an indispensable research tool for fitting complex evolutionary models, which take into account specific genomic properties, to large datasets. Here, we perform an extensive Bayesian phylogenetic model selection study, comparing codon and nucleotide substitution models, including codon position partitioning for nucleotide data as well as gene-specific substitution models for both data types. For the best-fitting partitioned models, we also compare independent partitioning with standard diffuse prior specification to conditional partitioning via hierarchical prior specification. To compare the different models, we use state-of-the-art marginal likelihood estimation techniques, including path sampling and stepping-stone sampling.
RESULTS: We show that a full codon model best describes the features of a whole mitochondrial genome dataset, consisting of 12 protein-coding genes, but only when each gene is allowed to evolve under a separate codon model. However, when hierarchical prior specification is used for the partition-specific parameters instead of independent diffuse priors, codon position partitioned nucleotide models can still outperform standard codon models. We demonstrate the feasibility of fitting such a combination of complex models using the BEAGLE library for BEAST in combination with recent graphics cards. We argue that the development and use of such models need to be accompanied by state-of-the-art marginal likelihood estimators, because the more traditional and computationally less demanding estimators do not offer adequate accuracy.
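As an illustration of the stepping-stone sampling mentioned above, the following is a minimal sketch on a toy problem, not the BEAST implementation and not the models used in the study. It assumes a conjugate normal model (known variance, normal prior on the mean) so that every power posterior can be sampled exactly and the estimate can be checked against the closed-form marginal likelihood. All function names, the number of steps, the number of draws and the Beta(0.3, 1) quantile schedule for the powers are assumptions chosen for the example.

```python
import math
import random


def log_likelihood(mu, n, sum_y, sum_y2, sigma2):
    """Log-likelihood of n i.i.d. N(mu, sigma2) observations, from sufficient statistics."""
    sq = sum_y2 - 2.0 * mu * sum_y + n * mu * mu  # sum of (y_i - mu)^2
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - 0.5 * sq / sigma2


def exact_log_marginal(n, sum_y, sum_y2, sigma2, tau2):
    """Closed-form log marginal likelihood under the prior mu ~ N(0, tau2)."""
    quad = (sum_y2 - tau2 * sum_y ** 2 / (sigma2 + n * tau2)) / sigma2
    log_det = (n - 1) * math.log(sigma2) + math.log(sigma2 + n * tau2)
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * log_det - 0.5 * quad


def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def stepping_stone(n, sum_y, sum_y2, sigma2, tau2,
                   n_steps=32, n_draws=2000, alpha=0.3, rng=None):
    """Stepping-stone estimate of the log marginal likelihood (toy conjugate model)."""
    rng = rng or random.Random(1)
    # Powers beta_0 = 0 < ... < beta_K = 1, placed at evenly spaced Beta(alpha, 1) quantiles.
    betas = [(k / n_steps) ** (1.0 / alpha) for k in range(n_steps + 1)]
    total = 0.0
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        # The power posterior prior(mu) * L(mu)^b_prev is normal here, so it is sampled
        # directly; a real phylogenetic analysis would need MCMC at each power.
        precision = 1.0 / tau2 + b_prev * n / sigma2
        mean = (b_prev * sum_y / sigma2) / precision
        sd = math.sqrt(1.0 / precision)
        logs = [(b_next - b_prev)
                * log_likelihood(rng.gauss(mean, sd), n, sum_y, sum_y2, sigma2)
                for _ in range(n_draws)]
        # Each step estimates log(Z_{b_next} / Z_{b_prev}); the steps telescope to log Z_1.
        total += log_mean_exp(logs)
    return total


# Simulated data for the illustration only.
data_rng = random.Random(7)
ys = [data_rng.gauss(1.0, 1.0) for _ in range(20)]
n_obs, s1, s2 = len(ys), sum(ys), sum(v * v for v in ys)

print("stepping-stone:", stepping_stone(n_obs, s1, s2, sigma2=1.0, tau2=9.0))
print("exact         :", exact_log_marginal(n_obs, s1, s2, sigma2=1.0, tau2=9.0))
```

Comparing two models then amounts to taking the difference of their estimated log marginal likelihoods (a log Bayes factor). The toy model is chosen only because it has an exact answer to compare against.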