
Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10.

Marc A Suchard, Philippe Lemey, Guy Baele, Daniel L Ayres, Alexei J Drummond, Andrew Rambaut.

Abstract

The Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software package has become a primary tool for Bayesian phylogenetic and phylodynamic inference from genetic sequence data. BEAST unifies molecular phylogenetic reconstruction with complex discrete and continuous trait evolution, divergence-time dating, and coalescent demographic models in an efficient statistical inference engine using Markov chain Monte Carlo integration. A convenient, cross-platform, graphical user interface allows the flexible construction of complex evolutionary analyses.


Keywords:  Bayesian inference; Markov chain Monte Carlo; phylodynamics; phylogenetics

Year:  2018        PMID: 29942656      PMCID: PMC6007674          DOI: 10.1093/ve/vey016

Source DB:  PubMed          Journal:  Virus Evol        ISSN: 2057-1577


1. Introduction

First released over 14 years ago, the Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software package has become firmly established in a broad diversity of biological fields, from phylogenetics and paleontology to population dynamics, ancient DNA, and the phylodynamics and molecular epidemiology of infectious disease (Drummond et al. 2012). BEAST's specific focus on time-scaled trees, and the evolutionary analyses dependent on them, has given it a unique place in the toolbox of molecular evolution and phylogenetic researchers. Since inception, a strong motivation for BEAST development has been the rapid growth of pathogen genome sequencing as part of public health responses to infectious diseases (Grenfell et al. 2004). In particular, fast-evolving viruses can now be tracked in near real-time (see, e.g. Quick et al. 2016) to understand their epidemiology and evolutionary dynamics. In BEAST version 1.10, we have introduced a series of advances with a particular focus on delivering accurate and informative insights for infectious disease research through the integration of diverse data sources, including phenotypic and epidemiological information, with molecular evolutionary models. These advances fall into three broad themes: the integration of diverse sources of extrinsic information as covariates of evolutionary processes, the increased flexibility and modularization of the model design process with robust and accurate model testing methods, and substantial improvements in the speed and efficiency of the statistical inference.

2. Data integration

Many traits in phylogenetics are represented as or partitioned into a finite number of discrete values, with geographical location standing out as a popular example. Because BEAST is dedicated to sampling time-scaled phylogenies, new developments of discrete character mapping enable the reconstruction of timed viral dispersal patterns while accommodating phylogenetic uncertainty. By extending the discrete diffusion models to incorporate empirical data as covariates or predictors of transition rates, BEAST can simultaneously test and quantify a range of potential predictive variables of the diffusion process (Lemey et al. 2014). Further, realizations of the trait transition process can be efficiently produced, to pinpoint the nature and timing of changes in evolutionary history beyond ancestral node state reconstruction (termed Markov jumps), or to infer the time spent in a particular state (Markov rewards) (Minin and Suchard 2008). For molecular data, fast stochastic mapping approaches are also employed to obtain site-specific estimates of the selective forces acting on individual codons, integrating over the posterior distribution of phylogenies and ancestral reconstructions to quantify the uncertainty in these estimates (Lemey et al. 2012).

Multivariate continuous traits are incorporated using phylogenetic Brownian diffusion processes, modelling the shared ancestral dependence across taxa and the correlations between these variables. Such continuous models have most frequently been applied to diffusion on a geographical landscape, with the traits representing coordinates and the phylogeny reconstructing the epidemiological process within the host population (Lemey et al. 2010). The landscapes can also represent other spaces, and integration of antibody binding assay data has extended ‘antigenic cartography’ (Smith et al. 2004) approaches to model simultaneous antigenic and genetic evolution and infer the viral trajectories in the immunological space generated by the host population (Bedford et al. 2014). Standard Brownian diffusion processes that assume a zero-mean displacement along each branch may, however, be unrealistic for many evolutionary problems (including geographical reconstruction). A recently developed relaxed directional random walk allows the diffusion processes to take on different directional trends in different parts of the phylogeny while preserving model identifiability (Gill et al. 2017) and opens up these processes for a wide range of applications. BEAST 1.10 also extends multivariate phylogenetic diffusion to latent liability model formulations in order to assess correlations between traits of different data types, including (various combinations of) continuous, binary and discrete traits (Cybis et al. 2015), as demonstrated by applications to flower morphology, antibiotic resistance, and viral epitope evolution. To infer correlations between high-dimensional traits in a computationally efficient manner, a novel phylogenetic factor analysis approach assumes that a small unknown number of independent evolutionary factors evolve along the phylogeny and generate clusters of dependent traits at the tips (Tolkoff et al. 2018).

Further extending the data integration approach, BEAST 1.10 includes a flexible framework for incorporating time-varying covariates of the effective population size. This uses Gaussian Markov random fields to reconstruct smoothed effective population size trajectories while simultaneously estimating to what extent predictor variables (e.g. fluctuations in climatic factors, host mobility, or vector density) may have driven the dynamics (Gill et al. 2016). Using a similar generalized linear modeling (GLM) approach, classical epidemiological time-series data such as case counts (Gill et al. 2016) can be integrated with pathogen genome sequence data to provide joint inference of important epidemiological parameters.

Finally, recent host-transmission models allow the integration of complete or partial knowledge of a pathogen’s transmission history, enabling the simultaneous inference of within-host population dynamics, viral evolutionary processes, and transmission times and bottlenecks (Vrancken et al. 2014). Likewise, other priors enable the reconstruction of transmission trees of infectious disease epidemics and outbreaks while accommodating phylogenetic uncertainty, and employ a newly designed set of phylogenetic tree proposals that respect node partitions (Hall et al. 2015).
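The use of covariates as predictors of discrete transition rates, described above, can be illustrated with a minimal sketch. It assumes a log-linear form in which each off-diagonal log rate is a sum of coefficient times inclusion indicator times predictor value; that form, the function name `glm_rate_matrix`, and the toy predictor values are illustrative assumptions and not BEAST code or its XML interface.

```python
import math


def glm_rate_matrix(predictors, coefficients, indicators):
    """Build a continuous-time Markov chain rate matrix from predictors.

    Hypothetical helper (not a BEAST API). Assumes the log-linear form
        log(rate[i][j]) = sum_k coefficients[k] * indicators[k] * predictors[k][i][j]
    where each predictor is a square matrix of (ideally standardized)
    pairwise values between locations.
    """
    n = len(predictors[0])
    rates = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            log_rate = sum(
                beta * delta * x[i][j]
                for x, beta, delta in zip(predictors, coefficients, indicators)
            )
            rates[i][j] = math.exp(log_rate)
        # The diagonal is still 0.0 here, so this sums the off-diagonal rates
        # and makes each row of the generator sum to zero.
        rates[i][i] = -sum(rates[i])
    return rates


# Toy example: three locations and one standardized "distance" predictor.
distance = [
    [0.0, -1.0, 1.0],
    [-1.0, 0.0, 0.5],
    [1.0, 0.5, 0.0],
]
# A negative coefficient means that more distant pairs exchange lineages less often.
Q = glm_rate_matrix([distance], coefficients=[-0.8], indicators=[1])
for row in Q:
    print(["%.3f" % value for value in row])
```

Setting an indicator to 0 removes the corresponding predictor from the model, so that all rates fall back to a common baseline. In a full analysis the coefficients and indicators would be sampled jointly with the phylogeny rather than fixed as they are here.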

3. Flexible model design

BEAST's companion graphical user interface program, BEAUti, allows the user to import data, select models, choose prior distributions, and specify the settings for both Bayesian inference and marginal likelihood estimation. Our efforts on BEAUti 1.10 have focused on allowing the user to easily link or unlink substitution, clock and tree models across multiple partitions, as well as to link individual parameters, providing considerable adaptability in model design. BEAUti can also group various parameters in a hierarchical phylogenetic model prior (Suchard et al. 2003), which allows parameters to take different values but be linked by a common distribution, the parameters of which can then be inferred. For example, flexible codon model parameterizations, using hierarchical phylogenetic models (Baele et al. 2016b) and incorporating a range of potential predictive variables for substitution behaviour (Bielejec et al. 2016a), provide insight into the tempo and mode of pathogen evolution. Marginal likelihood estimation to compare models using Bayes factors has become common practice in Bayesian phylogenetic inference. BEAST 1.10 now features marginal likelihood estimation (Baele et al. 2012) using path sampling (Gelman and Meng 1998; Lartillot and Philippe 2006) and stepping-stone sampling (Xie et al. 2011), as well as the recently developed generalized stepping-stone sampling (Fan et al. 2011; Baele et al. 2016a), which offers increased accuracy and improved numerical stability by employing the concept of ‘working distributions’, i.e. distributions with known normalizing constants that are parameterized using samples from the posterior distribution.
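To make the idea behind stepping-stone sampling concrete, the following is a minimal sketch on a toy problem, not the BEAST implementation. It assumes a normal likelihood with known variance and a normal prior on the mean, so that each power posterior can be sampled directly and the exact marginal likelihood is available for comparison. In a phylogenetic analysis the power posteriors would instead be sampled by MCMC. All function names, the cubic schedule of powers, and the toy data are assumptions made for illustration.

```python
import math
import random


def log_likelihood(theta, data, sigma):
    """Log density of the data under Normal(theta, sigma^2)."""
    n = len(data)
    squared_error = sum((y - theta) ** 2 for y in data)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - squared_error / (2 * sigma ** 2)


def sample_power_posterior(beta, data, sigma, mu0, tau0, rng):
    """Draw theta from the density proportional to likelihood^beta * prior.

    For this conjugate toy model the power posterior is itself normal.
    """
    precision = 1.0 / tau0 ** 2 + beta * len(data) / sigma ** 2
    mean = (mu0 / tau0 ** 2 + beta * sum(data) / sigma ** 2) / precision
    return rng.gauss(mean, math.sqrt(1.0 / precision))


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def stepping_stone_log_ml(data, sigma, mu0, tau0, n_steps=20, n_samples=2000, seed=1):
    """Stepping-stone estimate of the log marginal likelihood."""
    rng = random.Random(seed)
    # Powers from 0 (prior) to 1 (posterior), concentrated near the prior.
    betas = [(k / n_steps) ** 3 for k in range(n_steps + 1)]
    total = 0.0
    for b_prev, b_next in zip(betas, betas[1:]):
        # Each ratio of normalizing constants is an expectation under the
        # power posterior at the lower power b_prev.
        log_weights = []
        for _ in range(n_samples):
            theta = sample_power_posterior(b_prev, data, sigma, mu0, tau0, rng)
            log_weights.append((b_next - b_prev) * log_likelihood(theta, data, sigma))
        total += log_mean_exp(log_weights)
    return total


def exact_log_ml(data, sigma, mu0, tau0):
    """Closed-form log marginal likelihood of the normal-normal model."""
    n = len(data)
    d = [y - mu0 for y in data]
    s1, s2 = sum(d), sum(x * x for x in d)
    quad = (s2 - tau0 ** 2 / (sigma ** 2 + n * tau0 ** 2) * s1 ** 2) / sigma ** 2
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - 0.5 * math.log(1 + n * tau0 ** 2 / sigma ** 2)
            - 0.5 * quad)


toy_data = [0.3, 0.9, 0.1, 0.7, 0.5]
print("stepping-stone:", stepping_stone_log_ml(toy_data, 1.0, 0.0, 1.0))
print("exact:         ", exact_log_ml(toy_data, 1.0, 0.0, 1.0))
```

The log Bayes factor between two models is the difference between their estimated log marginal likelihoods. Generalized stepping-stone sampling replaces the prior at the lower end of the path with a working distribution, which this sketch does not attempt.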

4. Performance and efficiency

Increasing model complexity and sequence availability in modern-day analyses have stretched the computational demands of Bayesian phylogenetic inference. To improve efficiency for large-scale sequence data, BEAST 1.10 uses the BEAGLE library (Ayres et al. 2012), which provides access to massive parallelization on a range of computing architectures. In particular, the combination of BEAST 1.10 with BEAGLE 3.0 (Ayres et al., under review) allows multiple data partitions to be parallelized across a single high-performance device (i.e. a GPGPU graphics board), making use of the full capacity of these devices and reducing computational overheads. As the complexity of phylogenetic model designs increases, concomitant with the surge in scale of genomic data, updating only a parameter associated with a single data partition limits the occupancy of massively multicore devices. To address this, we have developed an adaptive multivariate transition kernel that simultaneously updates parameters across all the partitioned data, making more efficient use of available hardware (Baele et al. 2017). Through a combination of these two advances, BEAST 1.10 can yield a sizeable increase in effectively independent posterior samples per unit time over previous software versions. For the example data described below, we see a 5- to 25-fold improvement depending on the model parameter, using an NVIDIA Titan V.
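The efficiency measure used above, effectively independent samples per unit time, can be sketched as follows. This is a simple autocorrelation-based estimate of the effective sample size (ESS) that truncates the sum at the first non-positive autocorrelation; this is an illustrative choice and not necessarily the estimator used by BEAST's companion tools. The function names and the simulated chains are hypothetical.

```python
import random


def effective_sample_size(chain, max_lag=None):
    """Estimate ESS as n / (1 + 2 * sum of positive-lag autocorrelations).

    The sum stops at the first non-positive autocorrelation estimate,
    which is a simple truncation rule chosen here for illustration.
    """
    n = len(chain)
    mean = sum(chain) / n
    centered = [x - mean for x in chain]
    variance = sum(c * c for c in centered) / n
    if variance == 0.0:
        raise ValueError("ESS is undefined for a constant chain")
    if max_lag is None:
        max_lag = n // 2
    rho_sum = 0.0
    for lag in range(1, max_lag + 1):
        autocov = sum(centered[t] * centered[t + lag] for t in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def ess_per_second(chain, runtime_seconds):
    """Effectively independent samples generated per second of run time."""
    return effective_sample_size(chain) / runtime_seconds


def ar1_chain(n, phi, seed):
    """Simulate an AR(1) series as a stand-in for a correlated MCMC trace."""
    rng = random.Random(seed)
    values = [0.0]
    for _ in range(n - 1):
        values.append(phi * values[-1] + rng.gauss(0.0, 1.0))
    return values


# A strongly autocorrelated trace yields far fewer effective samples
# than an uncorrelated trace of the same length.
slow_mixing = ar1_chain(2000, phi=0.9, seed=0)
fast_mixing = ar1_chain(2000, phi=0.0, seed=0)
print("ESS, phi = 0.9:", effective_sample_size(slow_mixing))
print("ESS, phi = 0.0:", effective_sample_size(fast_mixing))
```

A fold improvement between two software versions would then be the ratio of their `ess_per_second` values for the same parameter, which is why the gain can differ from one model parameter to another.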

4.1 Example

Figure 1 presents a spatiotemporal reconstruction of Ebola virus evolution and spread during the 2013–2016 West African epidemic, highlighting several aspects of phylodynamic data integration. The estimates are based on a large data set of 1,610 genomes that represent over 5 per cent of the known cases (Dudas et al. 2017). Administrative regions (n = 56) are included as discrete sampling locations to estimate viral dispersal through time while testing the contribution of a set of potential covariates to the pattern of spread using a GLM parameterization of phylogeographic diffusion (Lemey et al. 2014). This indicates, for example, the importance of population sizes and geographic distance to explain viral dispersal intensities.
Figure 1.

Phylodynamic analysis of the 2013–2016 West African Ebola virus epidemic, encompassing simultaneous estimation of sequence and discrete (geographic) trait data with a GLM fitted to the discrete trait model in order to establish potential predictors of viral transition between locations. Plotted are a snapshot of geographic spread using SpreaD3 (Bielejec et al. 2016b), the maximum clade credibility tree, the posterior estimates of the GLM coefficients for seven possible predictors for Ebola virus spread (Bayes Factor support values of 3, 20, and 150 are indicated by vertical lines) and the effective population size through time, estimated by incorporating case counts.
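Bayes factor support for including a predictor, such as the thresholds of 3, 20, and 150 marked in Figure 1, can be related to posterior inclusion probabilities with a short calculation. The sketch below assumes that each predictor has an independent binary inclusion indicator and that the prior places 50 per cent probability on no predictor being included. Neither assumption is stated in the text above, and the function names are hypothetical.

```python
def prior_inclusion_prob(n_predictors, prob_none=0.5):
    """Per-predictor prior inclusion probability.

    Assumes independent indicators and a prior probability `prob_none`
    that no predictor at all is included (an assumption of this sketch).
    """
    return 1.0 - prob_none ** (1.0 / n_predictors)


def inclusion_bayes_factor(posterior_prob, prior_prob):
    """Bayes factor for inclusion: posterior odds divided by prior odds."""
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must lie strictly between 0 and 1")
    if not 0.0 <= posterior_prob < 1.0:
        raise ValueError("posterior_prob must lie in [0, 1)")
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds


def posterior_prob_for_bayes_factor(bayes_factor, prior_prob):
    """Posterior inclusion probability that corresponds to a given Bayes factor."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Positions of the Bayes factor thresholds on an inclusion-probability axis,
# for seven candidate predictors as in Figure 1.
prior = prior_inclusion_prob(7)
for threshold in (3, 20, 150):
    print(threshold, posterior_prob_for_bayes_factor(threshold, prior))
```

Because the prior odds of inclusion are well below 1 under this assumption, even a moderate posterior inclusion probability can correspond to substantial Bayes factor support.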


5. Relationship to BEAST2 and other software

Distinct from BEAST 1.10 described here, BEAST2 is an independent project (Bouckaert et al. 2014) intended as a platform that more readily facilitates the development of packages of models and analyses by other researchers. Although both projects share many of the same models and the underlying inference framework, BEAST has increasingly focused on the analysis of rapidly evolving pathogens and their evolution and epidemiology. We affirm that BEAST will continue to be developed in parallel with BEAST2. While these projects share a recent common origin, each now aims to foster complementary research domains. A range of other software focusing on phylodynamic analyses of fast-evolving pathogens has been described since the last version of BEAST was published. Of particular note are LSD (To et al. 2016), TreeDater (Volz and Frost 2017), and TreeTime (Sagulenko et al. 2018). These programs use least-squares algorithms (LSD) or maximum likelihood inference (TreeDater, TreeTime) and provide rapid analysis of large data sets for a subset of the models that BEAST provides. However, LSD implements very limited phylodynamic models, and TreeDater and TreeTime require a phylogenetic tree, inferred using other software, as input data, so that parameter estimates are conditioned on this single tree.

5.1 Availability

BEAST 1.10 is open source under the GNU Lesser General Public License and available at https://beast-dev.github.io/beast-mcmc for cross-platform compiled programs and https://github.com/beast-dev/beast-mcmc for software development and source code. It requires Java version 1.6 or greater. Documentation, tutorials, and help are available at http://beast.community and many users actively discuss BEAST usage and development in the ‘beast-users’ Google Groups discussion group (http://groups.google.com/group/beast-users). We also host an expanding suite of R tools designed for posterior analyses using BEAST (https://github.com/beast-dev/RBeast).
References (10 of 30 shown)

1.  Computing Bayes factors using thermodynamic integration.

Authors:  Nicolas Lartillot; Hervé Philippe
Journal:  Syst Biol       Date:  2006-04       Impact factor: 15.683

2.  Fast, accurate and simulation-free stochastic mapping.

Authors:  Vladimir N Minin; Marc A Suchard
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2008-12-27       Impact factor: 6.237

3.  Understanding Past Population Dynamics: Bayesian Coalescent-Based Modeling with Covariates.

Authors:  Mandev S Gill; Philippe Lemey; Shannon N Bennett; Roman Biek; Marc A Suchard
Journal:  Syst Biol       Date:  2016-07-01       Impact factor: 15.683

4.  Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty.

Authors:  Guy Baele; Philippe Lemey; Trevor Bedford; Andrew Rambaut; Marc A Suchard; Alexander V Alekseyenko
Journal:  Mol Biol Evol       Date:  2012-03-07       Impact factor: 16.240

5.  Virus genomes reveal factors that spread and sustained the Ebola epidemic.

Authors:  Gytis Dudas; Luiz Max Carvalho; Trevor Bedford; Andrew J Tatem; Guy Baele; Nuno R Faria; Daniel J Park; Jason T Ladner; Armando Arias; Danny Asogun; Filip Bielejec; Sarah L Caddy; Matthew Cotten; Jonathan D'Ambrozio; Simon Dellicour; Antonino Di Caro; Joseph W Diclaro; Sophie Duraffour; Michael J Elmore; Lawrence S Fakoli; Ousmane Faye; Merle L Gilbert; Sahr M Gevao; Stephen Gire; Adrianne Gladden-Young; Andreas Gnirke; Augustine Goba; Donald S Grant; Bart L Haagmans; Julian A Hiscox; Umaru Jah; Jeffrey R Kugelman; Di Liu; Jia Lu; Christine M Malboeuf; Suzanne Mate; David A Matthews; Christian B Matranga; Luke W Meredith; James Qu; Joshua Quick; Suzan D Pas; My V T Phan; Georgios Pollakis; Chantal B Reusken; Mariano Sanchez-Lockhart; Stephen F Schaffner; John S Schieffelin; Rachel S Sealfon; Etienne Simon-Loriere; Saskia L Smits; Kilian Stoecker; Lucy Thorne; Ekaete Alice Tobin; Mohamed A Vandi; Simon J Watson; Kendra West; Shannon Whitmer; Michael R Wiley; Sarah M Winnicki; Shirlee Wohl; Roman Wölfel; Nathan L Yozwiak; Kristian G Andersen; Sylvia O Blyden; Fatorma Bolay; Miles W Carroll; Bernice Dahn; Boubacar Diallo; Pierre Formenty; Christophe Fraser; George F Gao; Robert F Garry; Ian Goodfellow; Stephan Günther; Christian T Happi; Edward C Holmes; Brima Kargbo; Sakoba Keïta; Paul Kellam; Marion P G Koopmans; Jens H Kuhn; Nicholas J Loman; N'Faly Magassouba; Dhamari Naidoo; Stuart T Nichol; Tolbert Nyenswah; Gustavo Palacios; Oliver G Pybus; Pardis C Sabeti; Amadou Sall; Ute Ströher; Isatta Wurie; Marc A Suchard; Philippe Lemey; Andrew Rambaut
Journal:  Nature       Date:  2017-04-12       Impact factor: 49.962

6.  ASSESSING PHENOTYPIC CORRELATION THROUGH THE MULTIVARIATE PHYLOGENETIC LATENT LIABILITY MODEL.

Authors:  Gabriela B Cybis; Janet S Sinsheimer; Trevor Bedford; Alison E Mather; Philippe Lemey; Marc A Suchard
Journal:  Ann Appl Stat       Date:  2015-06       Impact factor: 2.083

7.  SpreaD3: Interactive Visualization of Spatiotemporal History and Trait Evolutionary Processes.

Authors:  Filip Bielejec; Guy Baele; Bram Vrancken; Marc A Suchard; Andrew Rambaut; Philippe Lemey
Journal:  Mol Biol Evol       Date:  2016-04-23       Impact factor: 16.240

8.  A counting renaissance: combining stochastic mapping and empirical Bayes to quickly detect amino acid sites under positive selection.

Authors:  Philippe Lemey; Vladimir N Minin; Filip Bielejec; Sergei L Kosakovsky Pond; Marc A Suchard
Journal:  Bioinformatics       Date:  2012-10-12       Impact factor: 6.937

9.  Fast Dating Using Least-Squares Criteria and Algorithms.

Authors:  Thu-Hien To; Matthieu Jung; Samantha Lycett; Olivier Gascuel
Journal:  Syst Biol       Date:  2015-09-30       Impact factor: 15.683

10.  Unifying the epidemiological and evolutionary dynamics of pathogens.

Authors:  Bryan T Grenfell; Oliver G Pybus; Julia R Gog; James L N Wood; Janet M Daly; Jenny A Mumford; Edward C Holmes
Journal:  Science       Date:  2004-01-16       Impact factor: 47.728
