John J Andersen, Bradley J Nelson, Jeremy M Brown.
Abstract
BACKGROUND: Branch-length parameters are a central component of phylogenetic models and of intrinsic biological interest. Default branch-length priors in some Bayesian phylogenetic software can be unintentionally informative and lead to branch- and tree-length estimates that are unreasonable. Alternatively, priors may be uninformative, but lead to diffuse posterior estimates. Despite the widespread availability of relevant datasets from other groups, biologists rarely leverage outside information to specify branch-length priors that are specific to the analysis they are conducting.
Keywords: Bayesian phylogenetics; Branch lengths; Informed priors
Year: 2016 PMID: 27342194 PMCID: PMC4919878 DOI: 10.1186/s12859-016-1132-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1. Flowchart for generating informed branch-length priors with EmpPrior. EmpPrior-search queries TreeBASE to find data similar to the focal data. Outside data are then used as input for maximum-likelihood (ML) tree searches. Branch-length distributions are fit to ML trees in EmpPrior-fit and parameter estimates are used to set priors for analysis of the focal data.
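For the exponential branch-length prior, the fitting step in this workflow has a closed form: the ML estimate of the rate λ is the number of branches divided by their summed length. The Python sketch below illustrates that calculation only. It is not EmpPrior-fit's code; the helper names `newick_branch_lengths` and `fit_exponential_rate` and the example tree are hypothetical, and the regular expression assumes a plain Newick string whose taxon labels contain no colons.

```python
import re


def newick_branch_lengths(newick):
    """Pull every ':<number>' branch length out of a plain Newick string.

    Assumption: labels contain no colons and there are no bracketed comments.
    """
    pattern = r":([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)"
    return [float(value) for value in re.findall(pattern, newick)]


def fit_exponential_rate(branch_lengths):
    """ML estimate of the exponential rate: n / sum(b_i), i.e. 1 / mean."""
    total = sum(branch_lengths)
    if not branch_lengths or total <= 0:
        raise ValueError("need at least one branch with positive total length")
    return len(branch_lengths) / total


# Hypothetical ML tree from an outside dataset (illustrative values only).
example_tree = "((A:0.1,B:0.2):0.05,C:0.3,D:0.35);"
lengths = newick_branch_lengths(example_tree)
rate = fit_exponential_rate(lengths)
print(f"{len(lengths)} branches, fitted exponential rate = {rate:.3f}")
```

The fitted rate would then be supplied as the exponential prior's parameter when analysing the focal data.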
Fig. 2. EmpPrior-search graphical user interface (GUI). The EmpPrior-search GUI allows users to specify the gene name and constraints on the number of taxa in a series of text fields at the bottom. These restrictions help to ensure that datasets returned from the search can provide relevant information to inform analysis of the focal data. A window in the middle of the GUI logs information about the progress of the TreeBASE search and post-processing of datasets. A progress bar at the top provides users with a rough idea of EmpPrior-search’s progress. An optional post-processing step can be turned on with a radio button at the bottom, causing EmpPrior-search to attempt to extract the gene of interest from a multi-gene dataset. Due to inconsistencies in gene naming and data file formatting, this step can sometimes produce unreliable results. Users should always manually inspect relevant datasets to ensure that they have been parsed properly.
Fig. 3. Log-likelihood surfaces for c and α of the compound Dirichlet branch-length distribution. Both log-likelihood surfaces were calculated using maximum-likelihood (ML) branch lengths based on a dataset of cytochrome b and 16S sequences from alpine newts (Mesotriton alpestris) with TreeBASE Study ID S1777 [11]. The left plot shows log-likelihoods based on the compound Dirichlet distribution [3] for different values of the internal:external branch-length ratio (c) with all other parameters fixed. The right plot shows log-likelihoods for different values of the concentration parameter (α) with all other parameters fixed. The dashed line in each plot shows the ML estimate for each parameter returned by EmpPrior-fit.
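A one-dimensional profile like those in Fig. 3 can be sketched by evaluating a compound Dirichlet log-density over a grid of c while holding the other parameters fixed. The Python below assumes the usual gamma-Dirichlet construction: tree length T follows a gamma distribution with shape αT and rate βT, and branch proportions follow a Dirichlet distribution with concentration α on external branches and αc on internal branches. Whether EmpPrior-fit uses exactly this parameterization or a grid search is an assumption, and the function names and branch lengths are hypothetical.

```python
import math


def compound_dirichlet_logpdf(external, internal, alpha_t, beta_t, alpha, c):
    """Log-density of branch lengths under an assumed gamma-Dirichlet prior.

    external / internal: lists of positive branch lengths.
    The T exponent (alpha_t - conc_sum) already includes the Jacobian of the
    change of variables from (T, proportions) to individual branch lengths.
    """
    tree_length = sum(external) + sum(internal)
    conc_sum = alpha * len(external) + alpha * c * len(internal)
    tree_length_part = (
        alpha_t * math.log(beta_t)
        - math.lgamma(alpha_t)
        - beta_t * tree_length
        + (alpha_t - conc_sum) * math.log(tree_length)
    )
    proportion_part = (
        math.lgamma(conc_sum)
        - len(external) * math.lgamma(alpha)
        - len(internal) * math.lgamma(alpha * c)
        + (alpha - 1) * sum(math.log(b) for b in external)
        + (alpha * c - 1) * sum(math.log(b) for b in internal)
    )
    return tree_length_part + proportion_part


def profile_c(external, internal, alpha_t, beta_t, alpha, c_grid):
    """Return (c, log-likelihood) pairs with every other parameter fixed."""
    return [
        (c, compound_dirichlet_logpdf(external, internal, alpha_t, beta_t, alpha, c))
        for c in c_grid
    ]


# Hypothetical five-taxon tree: long external and short internal branches.
external = [0.1] * 5
internal = [0.01] * 2
grid = [0.05 * k for k in range(1, 41)]
surface = profile_c(external, internal, alpha_t=1.0, beta_t=1.0, alpha=1.0, c_grid=grid)
best_c, best_loglik = max(surface, key=lambda pair: pair[1])
print(f"grid maximum at c = {best_c:.2f}")
```

Profiling α works the same way, by looping over a grid of `alpha` values with `c` held fixed.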
Default and informed tree-length estimates
| Outside Dataset | Exponential λ | Compound Dirichlet α | Compound Dirichlet c | Focal TL MLE | Focal TL 95 % HPD |
|---|---|---|---|---|---|
| None (default) | 10 | | | 0.40 | [10.54–14.94] |
| | | 1 | 1 | 0.40 | [0.24–0.34] |
| S1777 | 95.6 | | | 0.40 | [0.83–1.25] |
| | | | | 0.40 | |
| | | 1 | 0.18 | 0.40 | [0.26–0.39] |
| | | | | 0.40 | |
| S2043 | 80.7 | | | 0.40 | [1.03–1.53] |
| | | | | 0.40 | |
| | | 1 | 0.27 | 0.40 | [0.26–0.38] |
| | | | | 0.40 | |
Highest posterior density intervals (95 % HPDs) of tree length (TL) for analyses of a focal dataset from brittle stars [12]. Each row corresponds to a Bayesian analysis of the focal data with different prior settings for branch and tree lengths. We used both default and informed parameterizations of the exponential (λ) and compound Dirichlet (α, c) branch-length priors. The top set of two shaded rows shows default settings and focal inferences. Below that, alternate shadings indicate prior settings parameterized with different outside datasets (S1777 and S2043). The focal TL MLE was produced with Garli [13]. Focal HPDs in bold contain the MLE.
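The comparison summarized in the table, whether a 95 % HPD of tree length contains the ML estimate, can be sketched in Python as follows. The HPD is computed with the common sample-based approach of taking the narrowest window that holds the requested fraction of sorted posterior samples. This is an illustrative assumption rather than the procedure used for the table, and `hpd_interval` and `contains` are hypothetical helper names.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the sorted samples."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


def contains(interval, value):
    """True if `value` lies inside the closed interval (low, high)."""
    low, high = interval
    return low <= value <= high


# Values taken from the table: focal TL MLE and one informed-prior HPD.
focal_tl_mle = 0.40
s1777_exponential_hpd = (0.83, 1.25)
print(contains(s1777_exponential_hpd, focal_tl_mle))
```

In practice the samples passed to `hpd_interval` would be post-burn-in tree lengths from the Bayesian analysis of the focal data.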