Bayesian phylogenetics with BEAUti and the BEAST 1.7.

Alexei J Drummond, Marc A Suchard, Dong Xie, Andrew Rambaut.

Abstract

Computational evolutionary biology, statistical phylogenetics and coalescent-based population genetics are becoming increasingly central to the analysis and understanding of molecular sequence data. We present the Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software package version 1.7, which implements a family of Markov chain Monte Carlo (MCMC) algorithms for Bayesian phylogenetic inference, divergence time dating, coalescent analysis, phylogeography and related molecular evolutionary analyses. This package includes an enhanced graphical user interface program called Bayesian Evolutionary Analysis Utility (BEAUti) that enables access to advanced models for molecular sequence and phenotypic trait evolution that were previously available to developers only. The package also provides new tools for visualizing and summarizing multispecies coalescent and phylogeographic analyses. BEAUti and BEAST 1.7 are open source under the GNU lesser general public license and available at http://beast-mcmc.googlecode.com and http://beast.bio.ed.ac.uk.

Year:  2012        PMID: 22367748      PMCID: PMC3408070          DOI: 10.1093/molbev/mss075

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


Introduction

Molecular sequences, morphological measurements, geographic distributions and fossil remains all provide a wealth of potential information about the evolutionary history of life on Earth, the dynamics of ancient and modern biological populations, and the emergence and spread of infectious diseases. One of the challenges of modern Evolutionary Biology is the integration of these different data sources to address evolutionary hypotheses over the full range of spatial and temporal scales. The field is witnessing a transition to an increasingly quantitative science. This transformation began first through an explosion of molecular sequence data with the parallel development of mathematical and computational tools for their analysis. However, increasingly this transformation can be observed in other aspects of Evolutionary Biology where large global databases of complementary sources of information, such as fossils, geographical distributions and population history are being curated and made publicly available.

Software Advances

Here, we present a major new version of the molecular evolutionary software package Bayesian Evolutionary Analysis by Sampling Trees (BEAST), updated to version 1.7, and representing a significant software advance over that previously described (Drummond and Rambaut 2007). Alongside the primary analysis engine in BEAST, this package also includes a suite of utilities for specifying the analysis design, processing output files and summarizing and visualizing the results. Taken together, these programs enable Bayesian inference of molecular sequences with an emphasis on time-structured evolutionary models including phylodynamic models, divergence time estimates, multiloci demographic models, gene/species–tree inference, a range of spatial phylogeographic analyses and discrete and continuous trait evolution. Implementing Markov chain Monte Carlo (MCMC) algorithms to perform these inferences, the package is intended and used for rigorous statistical inference and hypothesis testing of evolutionary models with joint inference of phylogeny. It is also possible to constrain portions of the phylogenetic model space to known values, including the tree topology, and perform conditional inference if required.
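
As a rough illustration of the MCMC idea mentioned above, the following sketch implements a generic random-walk Metropolis sampler for a one-dimensional target. It is not BEAST's sampler, which operates on trees and model parameters with many specialized proposal moves; the function name, the standard normal target and the step size are assumptions chosen only to keep the example small.

```python
import math
import random


def metropolis_hastings(log_density, start, n_steps, step_size, rng):
    """Random-walk Metropolis sampler for a one-dimensional target density."""
    samples = []
    current = start
    current_logp = log_density(current)
    for _ in range(n_steps):
        # Symmetric Gaussian proposal, so the Hastings correction is 1.
        proposal = current + rng.gauss(0.0, step_size)
        proposal_logp = log_density(proposal)
        # 1 - random() lies in (0, 1], which keeps log() well defined.
        log_u = math.log(1.0 - rng.random())
        # Accept with probability min(1, p(proposal) / p(current)).
        if log_u < proposal_logp - current_logp:
            current, current_logp = proposal, proposal_logp
        samples.append(current)
    return samples


def log_standard_normal(x):
    """Unnormalized log density of a standard normal (illustrative target)."""
    return -0.5 * x * x


chain = metropolis_hastings(log_standard_normal, 0.0, 20000, 1.0, random.Random(1))
mean = sum(chain) / len(chain)
variance = sum((x - mean) ** 2 for x in chain) / len(chain)
print(f"posterior mean ~ {mean:.2f}, variance ~ {variance:.2f}")
```

The chain only needs density ratios, which is why the normalizing constant of the posterior never has to be computed.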

User Interface

One area of significant improvement since the last release publication is in the analysis construction and model specification tool called Bayesian Evolutionary Analysis Utility (BEAUti). This acts as the graphical user interface (GUI) for BEAST and allows the user to import data, select models, choose prior distributions on individual parameters and specify the settings for the MCMC sampler (fig. 1). Although the BEAST model specification format (a standard XML format structured text file) allows for great flexibility in the construction of complex evolutionary models, the constraints of a GUI unavoidably restrict the scope of the researcher to a prespecified set of models and combinations, hiding many advanced inference models. Working directly within the BEAST XML input format, on the other hand, represents a high barrier to the accessibility of BEAST and incurs significant risk of inadvertent errors being introduced into the model. We have concentrated development efforts on BEAUti to provide greater flexibility in model specification while still maintaining the benefits of a visual, table-based representation of the model and automatic generation of BEAST XML files. Improvements to BEAUti provide support for multiple data partitions in a joint analysis and the input of fossil calibration and trait information.
Fig. 1. BEAUti GUI for importing data and specifying the evolutionary model.

Heterogeneous Data

Multiple data partitions may reflect separate loci for simultaneous inference of genealogies and species trees (Heled and Drummond 2010) and stochastic ancestral recombination graph reconstruction (Bloomquist and Suchard 2010) or the growing wealth of nonsequence data and their respective substitution models. These latter data and models include microsatellite markers (Wu and Drummond 2011), phenotypic traits under a multistate stochastic Dollo process (Alekseyenko et al. 2008), discretized geographic diffusion (Lemey et al. 2009), and multivariate continuous relaxed random walks (Lemey et al. 2010). We also ease the use of a growing number of tree prior specifications. These include the extended Bayesian skyline model (Heled and Drummond 2008) for multilocus data, the flexible Gaussian Markov random field skyride model (Minin et al. 2008), and birth–death models of speciation (Stadler 2010).

Multispecies Coalescent

Discordance between individual gene trees that share a phylogenetic history results from incomplete lineage sorting and becomes increasingly likely when times between speciation events are short compared with species' population sizes. We provide a fully Bayesian implementation of the multispecies coalescent that improves the accuracy and precision of species tree reconstruction (Heled and Drummond 2010) and divergence time estimation (McCormack et al. 2011).
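
To make the dependence on speciation intervals and population size concrete, the sketch below uses the standard three-species result from coalescent theory, which is not stated in the text above: the probability that a gene tree disagrees with the species tree is (2/3)·exp(−T), where T is the internal branch length in coalescent units. The function name, the `ploidy` parameter and the scaling T = generations / (ploidy × effective size) are illustrative assumptions.

```python
import math


def discordance_probability(generations, effective_size, ploidy=2):
    """Probability that a three-taxon gene tree differs from the species tree.

    `generations` is the time between the two speciation events and
    `effective_size` is the effective population size of the ancestral
    species on that internal branch.
    """
    if generations < 0 or effective_size <= 0:
        raise ValueError("generations must be >= 0 and effective_size > 0")
    # Internal branch length in coalescent units (assumed scaling).
    t = generations / (ploidy * effective_size)
    # The two lineages fail to coalesce on the branch with probability
    # exp(-t); two of the three equally likely deep coalescences are discordant.
    return (2.0 / 3.0) * math.exp(-t)


for generations in (1_000, 10_000, 100_000):
    p = discordance_probability(generations, effective_size=10_000)
    print(f"{generations:>7} generations: P(discordant) = {p:.3f}")
```

Short intervals relative to population size push the probability toward its maximum of 2/3, whereas long intervals drive it toward zero.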

Phenotypic Trait Analysis

For trait inference including phylogeography, we now provide several tools for mapping posterior distributions of trees onto higher dimensional or geographic maps for both interactive exploration and better visualization (Bielejec et al. 2011). These tools interface with GoogleEarth via keyhole markup language and enable users to generate animations of evolutionary processes through time and real space; see http://www.phylogeography.org for several examples.

Molecular Clocks

We have refined the relaxed clock models to allow more than one branch to have the same rate value to remove anticorrelation. In practice, this has an appreciable impact only on trees that have a small number of branches (< 15 taxa). An efficient implementation of the relaxed clock models that facilitates calculation of Bayes factors for model selection and model averaging of several clock models has also been developed (Li and Drummond 2012). Further, we provide a new random local clock (RLC) model (Drummond and Suchard 2010), in which all possible local clock configurations and a strict clock are nested, providing a convenient model to test for a strict clock. Heled and Drummond (2011) begin to investigate alternative approaches to the calibration of tree priors with fossil and geological evidence, and this area of research is still in its infancy. Often, uncertainty exists in the age of viral RNA/DNA or ancient DNA samples, and this uncertainty can now be incorporated (Shapiro et al. 2011), along with models for sequence damage and error (Rambaut et al. 2009).
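
The idea of letting several branches share a rate value can be illustrated with a small sketch. It assumes that branch rates come from a lognormal distribution discretized into equal-probability categories. It then contrasts giving each branch a distinct category (sampling without replacement, where one branch's rate constrains the others) with assigning categories independently (sampling with replacement). The function names, the midpoint-quantile discretization and the parameter values are illustrative assumptions, not BEAST's implementation.

```python
import math
import random
from statistics import NormalDist


def lognormal_rate_categories(n_categories, mean_log, sd_log):
    """Midpoint quantiles of a lognormal distribution, in increasing order."""
    standard_normal = NormalDist()
    return [
        math.exp(mean_log + sd_log * standard_normal.inv_cdf((i + 0.5) / n_categories))
        for i in range(n_categories)
    ]


def assign_branch_rates(n_branches, categories, rng, with_replacement=True):
    """Pick one rate category per branch.

    with_replacement=False forces every branch onto a different category,
    so drawing a high rate for one branch leaves only lower rates for the rest.
    with_replacement=True lets branches share a rate value.
    """
    if with_replacement:
        indices = [rng.randrange(len(categories)) for _ in range(n_branches)]
    else:
        # Raises ValueError if there are more branches than categories.
        indices = rng.sample(range(len(categories)), n_branches)
    return [categories[i] for i in indices]


rng = random.Random(0)
categories = lognormal_rate_categories(5, mean_log=0.0, sd_log=0.5)
print("distinct rates:", assign_branch_rates(5, categories, rng, with_replacement=False))
print("shared rates:  ", assign_branch_rates(5, categories, rng, with_replacement=True))
```

With few branches the without-replacement scheme ties the rates together strongly, which is consistent with the remark above that the change matters mainly for small trees.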

Performance

Finally, to exploit high-performance computing, BEAST 1.7 integrates with the BEAGLE library (Ayres et al. 2011) and provides a GUI to configure it. BEAGLE utilizes multicore processors, vectorization and massively parallel graphics processors to substantially decrease BEAST run-times (Suchard and Rambaut 2009).

Examples

Figure 2 presents a reconstruction of the gene tree relating 13 species of Darwin's finches from a 2,065-bp partial nucleotide alignment of the mitochondrial control region and cytochrome b genes (Sato et al. 1999) and five continuously measured phenotypic traits of the corresponding species (Sulloway 1982). In performing this simultaneous inference, we exploit the RLC model (Drummond and Suchard 2010) and find evidence for one suggestive rate change (Bayes factor in favor of the RLC over a strict clock = 2.3) in the lineage leading to the Cocos Island Finch, Pinaroloxias inornata. Multivariate Brownian trait diffusion shows strong correlation between wing and tarsus length and between bill depth and gonys length. Posterior trait prediction at any point along the history is possible and, currently unique to BEAST, comparative method inference is performed jointly with phylogenetic inference.
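
The reported Bayes factor can be read as the ratio of posterior odds to prior odds. The sketch below converts between the two under the assumption, made purely for illustration, that both models have equal prior probability; the prior actually used in the analysis is not stated in the text above, and the function names are hypothetical.

```python
def bayes_factor(posterior_prob_m1, prior_prob_m1):
    """Bayes factor for model 1 over model 2 from P(M1 | data) and P(M1)."""
    posterior_odds = posterior_prob_m1 / (1.0 - posterior_prob_m1)
    prior_odds = prior_prob_m1 / (1.0 - prior_prob_m1)
    return posterior_odds / prior_odds


def posterior_probability(bf_m1, prior_prob_m1=0.5):
    """P(M1 | data) implied by a Bayes factor and a prior probability."""
    prior_odds = prior_prob_m1 / (1.0 - prior_prob_m1)
    posterior_odds = bf_m1 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# With equal prior odds (an assumption), a Bayes factor of 2.3 corresponds
# to a posterior probability of 2.3 / 3.3 for the favored model.
print(f"{posterior_probability(2.3):.3f}")  # prints 0.697
```

Under that assumption the strict clock retains roughly 30% posterior probability, which fits the description of the rate change as suggestive rather than decisive.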
Fig. 2. Simultaneous phylogenetic and phenotypic trait reconstruction of Darwin's finches. Plotted are the maximum clade credibility tree and posterior estimate of the trait correlation matrix. We annotate the tree with estimates of selected posterior clade support values and the one significant nucleotide substitution local clock (in red) and the branches scale in expected substitutions per site. We depict correlation coefficients through their bivariate ellipse sizes, where more highly correlated phenotypes return narrower ellipses.

Our second example demonstrates the application of the multispecies coalescent model (*BEAST) to a 1,165-bp fragment of the mitochondrial genome sequenced from 16 Darwin's finches representing four species (Geospiza fortis, G. magnirostris, Camarhynchus parvulus, and Certhidea olivacea). Figure 3 shows 1) a representative gene tree and 2) the two species trees with highest posterior probability. The 99% credible set for the species tree contains 3 of the 15 possible tree topologies: 65.8% (((F, M),P), O); 17.2% ((F, M),(P, O)); 16.5% (((F, M),O), P). This uncertainty in the species tree arises despite overwhelming support for Certhidea olivacea and Camarhynchus parvulus as the nested outgroup species according to the gene tree (fig. 3a), due to the possibility of incomplete lineage sorting in the deeper branches of the gene tree. The possibility of incomplete lineage sorting can be appreciated in figure 3c, in which a representative gene tree is embedded inside the most probable species tree topology for this data, showing extensive incomplete lineage sorting in the Geospiza clade and also depicting the reason that species trees necessarily have (sometimes much) younger divergence times than the corresponding gene tree might suggest.
This example demonstrates that even for single-gene analyses, the multispecies coalescent can provide 1) important insight into the potential for incomplete lineage sorting, 2) more accurate assessment of uncertainty in the species tree estimate and 3) better estimates of species divergence times.
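
The counts quoted in this example can be checked with a few lines of arithmetic. The sketch below computes the number of rooted binary topologies for four species, (2n − 3)!! = 15, and builds the smallest credible set from the three posterior probabilities given in the text; the function names are illustrative and the remaining twelve topologies are simply omitted.

```python
def num_rooted_binary_topologies(n_tips):
    """Number of labelled rooted binary trees on n_tips tips: (2n - 3)!!."""
    if n_tips < 2:
        raise ValueError("need at least two tips")
    count = 1
    for k in range(3, 2 * n_tips - 2, 2):
        count *= k
    return count


def credible_set(posterior, level):
    """Smallest set of topologies whose total posterior reaches `level`.

    Returns all supplied topologies if the level is never reached.
    """
    ranked = sorted(posterior.items(), key=lambda item: item[1], reverse=True)
    chosen, total = [], 0.0
    for topology, probability in ranked:
        chosen.append(topology)
        total += probability
        if total >= level:
            break
    return chosen, total


# Posterior probabilities as reported in the text.
species_tree_posterior = {
    "(((F,M),P),O)": 0.658,
    "((F,M),(P,O))": 0.172,
    "(((F,M),O),P)": 0.165,
}

print(num_rooted_binary_topologies(4))  # prints 15
topologies, mass = credible_set(species_tree_posterior, 0.99)
print(len(topologies), f"{mass:.3f}")  # prints 3 0.995
```

The three listed topologies together carry 99.5% of the posterior, so the other twelve share the remaining 0.5%.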
Fig. 3. (a) Representative gene tree of mitochondrial DNA fragment from 16 Darwin's finches of four species (Geospiza fortis, G. magnirostris, Camarhynchus parvulus, and Certhidea olivacea). Nodes that have posterior clade probabilities of greater than 0.5 are labeled with their posterior clade probability. (b) The two most probable species trees (solid line represents most probable species tree; dashed line is second most probable). (c) Gene tree embedded in a point estimate of the species tree, including divergence times and effective population sizes. The x axis is divergence time in units of substitutions per site and the y axis is proportional to effective population size.

Availability and Future Directions

We make the BEAST package available in both executable and source code forms. BEAST requires Java version 1.5 or greater, and executables for Windows, Mac OS and Linux platforms are located at http://beast.bio.ed.ac.uk, which serves as the main page for the package. This page also links to a sizable list of self-contained step-by-step tutorials covering basic to advanced usage of BEAST. Popular tutorials describe how to use BEAST to infer population dynamics and phylogeographic processes and walk users all the way through to generating a range of graphical summaries of their results. GoogleCode houses BEAST's version-controlled source code at http://beast-mcmc.googlecode.com and links to two GoogleGroup discussion groups related to BEAST. The first is the “beast-users” group (http://groups.google.com/group/beast-users) with over 1,500 members. The second is the “beast-dev” group, to which 47 developers belong at the time of writing and which facilitates BEAST development across three continents. Future development directions for BEAUti and BEAST focus on easing the user experience in several ways. These include fitting hierarchical phylogenetic models (Suchard et al. 2003) that commonly arise in studies of intrahost viral evolution, exploiting MarkovJump methods (Minin and Suchard 2008; O'Brien et al. 2009) for computationally efficient and robust estimation of complex evolutionary processes under simple models, and specifying phylogeographic models (Lemey et al. 2009, 2010) in a convenient geographical user interface.
References (10 of 23 listed)

1.  Many-core algorithms for statistical phylogenetics.

Authors:  Marc A Suchard; Andrew Rambaut
Journal:  Bioinformatics       Date:  2009-04-15       Impact factor: 6.937

2.  Learning to count: robust estimates for labeled distances between molecular sequences.

Authors:  John D O'Brien; Vladimir N Minin; Marc A Suchard
Journal:  Mol Biol Evol       Date:  2009-01-08       Impact factor: 16.240

3.  Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics.

Authors:  Vladimir N Minin; Erik W Bloomquist; Marc A Suchard
Journal:  Mol Biol Evol       Date:  2008-04-11       Impact factor: 16.240

4.  Accommodating the effect of ancient DNA damage on inferences of demographic histories.

Authors:  Andrew Rambaut; Simon Y W Ho; Alexei J Drummond; Beth Shapiro
Journal:  Mol Biol Evol       Date:  2008-11-11       Impact factor: 16.240

5.  Phylogeography takes a relaxed random walk in continuous space and time.

Authors:  Philippe Lemey; Andrew Rambaut; John J Welch; Marc A Suchard
Journal:  Mol Biol Evol       Date:  2010-03-04       Impact factor: 16.240

6.  Unifying vertical and nonvertical evolution: a stochastic ARG-based framework.

Authors:  Erik W Bloomquist; Marc A Suchard
Journal:  Syst Biol       Date:  2009-11-09       Impact factor: 15.683

7.  BEAST: Bayesian evolutionary analysis by sampling trees.

Authors:  Alexei J Drummond; Andrew Rambaut
Journal:  BMC Evol Biol       Date:  2007-11-08       Impact factor: 3.260

8.  Bayesian inference of population size history from multiple loci.

Authors:  Joseph Heled; Alexei J Drummond
Journal:  BMC Evol Biol       Date:  2008-10-23       Impact factor: 3.260

9.  Bayesian phylogeography finds its roots.

Authors:  Philippe Lemey; Andrew Rambaut; Alexei J Drummond; Marc A Suchard
Journal:  PLoS Comput Biol       Date:  2009-09-25       Impact factor: 4.475

10.  Bayesian inference of species trees from multilocus data.

Authors:  Joseph Heled; Alexei J Drummond
Journal:  Mol Biol Evol       Date:  2009-11-11       Impact factor: 16.240

