Fabrícia F Nascimento, Mario Dos Reis, Ziheng Yang.
Abstract
Bayesian methods have become very popular in molecular phylogenetics due to the availability of user-friendly software implementing sophisticated models of evolution. However, Bayesian phylogenetic models are complex, and analyses are often carried out using default settings, which may not be appropriate. Here, we summarize the major features of Bayesian phylogenetic inference and discuss Bayesian computation using Markov chain Monte Carlo (MCMC), the diagnosis of an MCMC run, and ways of summarising the MCMC sample. We discuss the specification of the prior, the choice of the substitution model, and partitioning of the data. Finally, we provide a list of common Bayesian phylogenetic software and recommendations for its use.
Year: 2017 PMID: 28983516 PMCID: PMC5624502 DOI: 10.1038/s41559-017-0280-x
Source DB: PubMed Journal: Nat Ecol Evol ISSN: 2397-334X Impact factor: 15.460
List of Bayesian programs
| Program | Brief description | Refs |
|---|---|---|
| | Implements a vast number of models. Examples are simultaneous estimation of the tree topology and divergence times, phylodynamics, phylogeography, and species tree estimation under the multispecies coalescent model. | |
| | Implements a large number of models for analysis of nucleotide, amino acid, and morphological data. Estimates species phylogenies and species divergence times. | |
| | Similar to MrBayes, but with its own programming language to set up complex hierarchical Bayesian models. | |
| | Estimates divergence times on a fixed phylogenetic tree. | |
| | Estimates phylogenetic trees based on nucleotide data. This allows for multifurcating trees, helping to reduce spuriously high posterior probabilities for phylogenies. | |
| | Reconstructs phylogenetic trees using infinite mixture models to account for among-site and among-lineage heterogeneity in nucleotide or amino acid compositions, which may be important for inferring deep phylogenies. | |
| | Implements species tree estimation and species delimitation under the multispecies coalescent model using multi-locus genomic sequence data. | |
| | Estimates population sizes and migration rates under the population-subdivision model based on molecular data. | |
| | Estimates divergence times, population sizes and migration rates under the isolation-with-migration model using multi-locus DNA sequence data and a fixed phylogenetic tree for populations. | |
| | Estimates population structure from multi-locus genotype data. | |
| | Estimates clade diversification rates on phylogenies. | |
| | A program for MCMC diagnostics and summaries. | |
| | A package for MCMC diagnostics for Bayesian phylogenetic inference. | |
Figure 1. Prior, likelihood and posterior distribution for a two-parameter phylogenetic example.
The mitochondrial 12S rRNA gene sequences from human and orang-utan are used to estimate the evolutionary distance (d) and the transition/transversion ratio (κ) under the substitution model of ref. 75.
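The relationship among the three distributions named in the caption is Bayes' theorem. As a sketch, writing x for the sequence alignment, f(d, κ) for the prior and f(x | d, κ) for the likelihood (this notation is an assumption made here, not taken from the figure):

$$
f(d, \kappa \mid x) = \frac{f(d, \kappa)\, f(x \mid d, \kappa)}{\int\!\!\int f(d, \kappa)\, f(x \mid d, \kappa)\, \mathrm{d}d\, \mathrm{d}\kappa}
$$

The denominator is a normalising constant that does not depend on d or κ, so the posterior surface is proportional to the product of the prior and likelihood surfaces.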
Figure 2. Trace plots and histograms for parameters d and κ sampling the posterior distribution of Figure 1c using efficient and inefficient MCMC chains.
Parts a and b show the trace plots of d and κ for an efficient chain with good mixing. The window sizes are w = 0.12 for d and w = 180 for κ, with acceptance proportions P = 30.4% for d and 29.8% for κ, achieving efficiency Eff = 23% for d and 20% for κ. Parts a’ and b’ show the trace plots for an inefficient chain with poor mixing, with w = 5 for d and w = 1 for κ. In a’, the window for d is too wide and most proposals are rejected (P = 1.5%), so the chain is often stuck at the same value for many iterations, leading to poor mixing with Eff = 1.79%. In b’, the window for κ is too narrow, so most proposals are accepted (P = 98.6%), but the chain takes very small steps and traverses the posterior parameter space very slowly, with Eff = 1.28%. Parts c and c’ show histograms of κ from two runs of the efficient chain (c) and of the inefficient chain (c’), each with sample size n = 10,000. The posterior mean (and standard deviation) calculated using a very long run of the efficient chain is 0.104 (0.0114) for d, and 29.2 (10.0) for κ.
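The effect of the window size on the acceptance proportion and on efficiency can be reproduced in a toy setting. The following is a minimal sketch, not the authors' implementation, and it rests on several assumptions: the target is a one-dimensional normal density with mean 0.104 and standard deviation 0.0114 (borrowed from the posterior summary of d in the caption, standing in for the real posterior); the proposal is uniform within a sliding window of full width w centred on the current value; the window 0.001 is a hypothetical "too narrow" value chosen for this toy; and efficiency is estimated as 1 / (1 + 2 Σ ρ_k) with a simple truncated sum of sample autocorrelations ρ_k. The function names `log_target`, `metropolis` and `efficiency` are illustrative only.

```python
import math
import random


def log_target(x, mean=0.104, sd=0.0114):
    """Unnormalised log density of a toy normal target (assumed, not the real posterior)."""
    return -0.5 * ((x - mean) / sd) ** 2


def metropolis(w, n=10_000, x0=0.104, seed=1):
    """Metropolis sampler with a uniform sliding window of full width w.

    Returns the sampled values and the proportion of accepted proposals.
    """
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    sample, accepted = [], 0
    for _ in range(n):
        # Propose uniformly within [x - w/2, x + w/2].
        x_new = x + w * (rng.random() - 0.5)
        logp_new = log_target(x_new)
        # Accept with probability min(1, target(x_new) / target(x)).
        if math.log(1.0 - rng.random()) < logp_new - logp:
            x, logp = x_new, logp_new
            accepted += 1
        sample.append(x)  # a rejected proposal repeats the current value
    return sample, accepted / n


def efficiency(sample, max_lag=500, cutoff=0.05):
    """Rough estimate of Eff = 1 / (1 + 2 * sum_k rho_k).

    The sum stops at the first autocorrelation below `cutoff` or at `max_lag`,
    so the estimate is optimistic for very poorly mixing chains.
    """
    n = len(sample)
    mean = sum(sample) / n
    centred = [v - mean for v in sample]
    var = sum(v * v for v in centred) / n
    if var == 0.0:
        return 0.0  # the chain never moved
    rho_sum = 0.0
    for k in range(1, max_lag + 1):
        rho = sum(a * b for a, b in zip(centred, centred[k:])) / (n * var)
        if rho < cutoff:
            break
        rho_sum += rho
    return 1.0 / (1.0 + 2.0 * rho_sum)


if __name__ == "__main__":
    for label, w in [("moderate", 0.12), ("too wide", 5.0), ("too narrow", 0.001)]:
        sample, p_accept = metropolis(w)
        eff = efficiency(sample)
        print(f"{label:>10}: w = {w:<6} P = {p_accept:6.1%}  Eff = {eff:6.2%}")
```

Because the toy target is one-dimensional, the printed values are not expected to match those in the caption. The qualitative pattern should hold: a window that is too wide gives a very low acceptance proportion, a window that is too narrow gives a very high one, and both give much lower efficiency than an intermediate window.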