Adrien Rieux, François Balloux.
Abstract
Molecular dating of phylogenetic trees is a growing discipline that uses sequence data to co-estimate the timing of evolutionary events and rates of molecular evolution. All molecular-dating methods require converting genetic divergence between sequences into absolute time. Historically, this could only be achieved by assigning externally derived dates, obtained from fossil or biogeographical evidence, to internal nodes of the tree. In some cases, notably for fast-evolving genomes such as those of viruses and some bacteria, the time span over which samples were collected may cover a significant proportion of the time since they last shared a common ancestor. This situation allows phylogenetic trees to be calibrated by assigning sampling dates directly to the sequences representing the tips (terminal nodes) of the tree. The increasing availability of genomic data from ancient DNA extends the applicability of such tip-based calibration to a variety of taxa, including humans, extinct megafauna and various microorganisms, which typically have a scarce fossil record. The development of statistical models that account for heterogeneity in different aspects of the evolutionary process while accommodating very large data sets (e.g. whole genomes) has made it possible to use tip-dating methods to infer divergence times, substitution rates, past demography or the age of specific mutations across a variety of spatiotemporal scales. In this review, we summarize the current state of the art of tip dating, discuss some recent applications, highlight common pitfalls and provide a 'how-to' guide for performing such analyses thoroughly.
Keywords: Bayesian phylogenetics; calibration, divergence time and substitution rate inferences; measurably evolving populations; population dynamics; tip-dating
Year: 2016 PMID: 26880113 PMCID: PMC4949988 DOI: 10.1111/mec.13586
Source DB: PubMed Journal: Mol Ecol ISSN: 0962-1083 Impact factor: 6.185
Figure 1 Tip‐dating principle. (a) In this simplified theoretical situation adapted from Rambaut (2000), sequences A and B were isolated at different points in time ($t_A$ and $t_B$, respectively) and C is an outgroup sequence. If we assume the rate of evolution to be the same in lineages A and B, then the amount of molecular evolution expected to have occurred between $t_A$ and $t_B$ is equal to $AC - BC$ ($AC$ and $BC$ being the genetic distances between A and C and between B and C, respectively). If the time X between $t_A$ and $t_B$ represents a significant proportion of the time Y since A and B last shared a common ancestor, then the tip dates can be used to jointly estimate the rate of evolution, $\mu = (AC - BC)/(t_A - t_B)$, and to extrapolate the age of the common ancestor of A and B, $T_{\mathrm{MRCA}(AB)}$. (b) Top: tree with modern samples only, for which no divergence time can be estimated without calibrations on internal nodes or a strong prior on the molecular clock rate. Middle: tree whose tip dates may not be spread widely enough for accurate inference. Bottom: tree whose tip dates should be spread widely enough to estimate divergence times and the rate of evolution with a good degree of certainty, since the sampling dates cover a relatively large fraction of the total age of the tree.
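The arithmetic of panel (a) can be made concrete with a minimal Python sketch. It assumes a strict clock and calendar-time sampling dates `t_a` and `t_b` for A and B, so that $\mu = (AC - BC)/(t_A - t_B)$. For the extrapolation step only, it also assumes a pairwise distance `ab` between A and B, which the figure itself does not define. The function names and the numbers in the usage lines are hypothetical.

```python
def tip_dated_rate(ac: float, bc: float, t_a: float, t_b: float) -> float:
    """Rate mu = (AC - BC) / (t_a - t_b), in distance units per time unit.

    Assumes a strict clock: the extra divergence of the later-sampled tip
    from outgroup C accumulated during the interval between the two dates.
    """
    if t_a == t_b:
        raise ValueError("tips must be sampled at different dates")
    return (ac - bc) / (t_a - t_b)


def tmrca_ab(ab: float, t_a: float, t_b: float, mu: float) -> float:
    """Calendar date T of the A-B common ancestor (hypothetical helper).

    Under a strict clock, AB = mu * (t_a - T) + mu * (t_b - T),
    which rearranges to T = (t_a + t_b - AB / mu) / 2.
    """
    return (t_a + t_b - ab / mu) / 2


# Hypothetical example: A sampled in 2010, B in 2000 (distances per site).
mu = tip_dated_rate(ac=0.012, bc=0.010, t_a=2010, t_b=2000)
print(mu)  # about 2e-4 substitutions per site per year
print(tmrca_ab(ab=0.006, t_a=2010, t_b=2000, mu=mu))  # about 1990
```

The closed form only holds if both lineages evolve at the same rate; in practice this back-of-the-envelope calculation is replaced by model-based inference with the programs listed in the table.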
List of available and useful programs/packages to perform tip dating
| Software to perform tip dating | Method | Tree topology | Models of rate variation | Tip-date uncertainty (as in Fig. 4) | Ref/Source |
|---|---|---|---|---|---|
| Bayesian | Estimated/Fixed | SC, LC, ARC, URC | a, b, c, d, e | [1] |
| r8s | Nonparametric | Fixed | SC, LC, DC, ARC | a | [2] |
| PAML (MCMCTREE) | Bayesian | Fixed | SC, ARC, URC | a, b, d, e | [3] |
| PHYSHER | ML | Fixed | LC, DC | a | [4] |
| MrBayes | Bayesian | Estimated/Fixed | SC, ARC, URC | a, b, d, e | [5] |
| TipDate | ML | Estimated | SC, DC | a | [6] |
| DAMBE | Distance based | Fixed | SC, DC, ARC | a | [7] |
| Multidivtime | Bayesian | Fixed | SC, ARC | a | [8] |
| LSD | Least squares | Fixed | SC | a | [9] |
[1] (Drummond & Rambaut 2007), [2] (Sanderson 1997), [3] (Yang 2007), [4] (Fourment & Holmes 2014), [5] (Ronquist et al. 2012b), [6] (Rambaut 2000), [7] (Xia 2013), [8] (Thorne et al. 1998), [9] http://www.atgc-montpellier.fr/LSD/, [10] (Didelot & Wilson 2015), [11] (Huson & Bryant 2006), [12] (Martin et al. 2015), [13] (McVean et al. 2002), [14] (Jonsson et al. 2013), [15] (Lanfear et al. 2012), [16] distributed in the BEAST package, [17] https://cran.r-project.org/web/packages/TipDatingBeast/index.html, [18] https://github.com/simon-ho/sitesampler/, [19] (Murray et al. 2015), [20] http://tree.bio.ed.ac.uk/software/tracer/, [21] http://tree.bio.ed.ac.uk/software/figtree/, [22] https://www.cs.auckland.ac.nz/~remco/DensiTree/, [23] http://tree.bio.ed.ac.uk/software/pathogen/, [24] (Jombart et al. 2010), [25] http://beast.bio.ed.ac.uk/beagle, [26] http://www.christophheibl.de/Rpackages.html.
Models of rate variation among branches: strict clock (SC), local multirate clock (LC), discrete multirate clock (DC), autocorrelated relaxed clock (ARC) and uncorrelated relaxed clock (URC). See Ho & Duchêne (2014) for more details.
Available distributions for modelling tip-date uncertainty (see also Fig. 4): a, point values (no uncertainty); b, normal distribution; c, empirical probability density function measured directly on the calibrated sample; d, uniform distribution with hard minimum and maximum bounds; e, uniform distribution with a hard minimum and a soft maximum bound.
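To make the models of rate variation among branches in the notes above more concrete, the following Python sketch draws per-branch rates under three of them: strict clock (SC), uncorrelated relaxed clock (URC) and autocorrelated relaxed clock (ARC). It is illustrative only. The lognormal form of the uncorrelated clock and the geometric-Brownian-motion form of the autocorrelated clock are common textbook choices assumed here, not necessarily what any program in the table implements, and all function names are hypothetical.

```python
import math
import random


def strict_clock(n_branches: int, rate: float) -> list[float]:
    """SC: every branch evolves at the same rate."""
    return [rate] * n_branches


def uncorrelated_lognormal(n_branches: int, mean_rate: float, sigma: float,
                           rng: random.Random) -> list[float]:
    """URC: each branch rate is an independent lognormal draw.

    The log-scale location is shifted by -sigma**2 / 2 so that the
    expected rate equals mean_rate.
    """
    mu = math.log(mean_rate) - sigma ** 2 / 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]


def autocorrelated(parents: list[int], durations: list[float], root_rate: float,
                   nu: float, rng: random.Random) -> list[float]:
    """ARC (simplified): each branch rate drifts from its parent branch's rate.

    parents[i] is the index of the parent branch of branch i, or -1 if the
    branch descends directly from the root; parents must precede children.
    The log-rate changes by a normal step with variance nu * durations[i];
    the -variance / 2 shift keeps the expected rate equal to the parent's.
    """
    rates: list[float] = []
    for i, p in enumerate(parents):
        if p >= i:
            raise ValueError("parent branches must be listed before children")
        ancestral = root_rate if p < 0 else rates[p]
        var = nu * durations[i]
        rates.append(ancestral * math.exp(rng.gauss(-var / 2, math.sqrt(var))))
    return rates


rng = random.Random(42)
print(strict_clock(4, 1e-3))
print(uncorrelated_lognormal(4, 1e-3, 0.3, rng))
# Two branches leave the root (0 and 1); branch 0 has child branches 2 and 3.
print(autocorrelated([-1, -1, 0, 0], [1.0, 2.0, 0.5, 0.5], 1e-3, 0.1, rng))
```

Under the uncorrelated model, sister branches are no more similar than distant ones, whereas under the autocorrelated model closely related lineages tend to share similar rates.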
Figure 4 Different statistical distributions for modelling uncertainty in tip calibrations. Different distributions can be used to model the error associated with sampling dates. Choosing the best‐suited one depends on the type of sample and the information associated with the dating method (Ho & Phillips 2009). Point values (a) can be used if the age of a sample is known exactly (e.g. a recorded sampling date). Modelling radiocarbon dating errors with a normal distribution (b) is common practice in ancient DNA studies, although recent improvements allow the use of an empirical probability density function measured directly on the calibrated sample (c) (see Molak et al. 2015 for more details on this topic). Uniform distributions with hard minimum and maximum bounds (d) are suited to samples obtained from a well‐defined stratum [e.g. ancient DNA retrieved from ice cores (Willerslev et al. 2007) or from samples associated with archaeological horizons (Edwards et al. 2007)] or to modelling uncertainty in sampling-time accuracy (e.g. if the sampling month is known for some samples but not for others). Finally, a uniform distribution with a hard minimum and a soft maximum bound (e) can suit ancient DNA samples beyond the 45–50 ka resolution limit of radiocarbon dating (which therefore yields only a minimum age) when additional information (e.g. from fossil data) exists and justifies the use of a soft maximum bound. This figure is adapted from Ho & Duchêne (2014).
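The shapes in Figure 4 can be written as log-density functions for a tip's age, as they might enter a prior. This Python sketch covers options (a), (b), (d) and (e); the empirical density (c) is left out because it requires the calibrated radiocarbon curve of the sample itself. For (e) it assumes a parameterization borrowed from soft-bound calibration priors: a fraction `p_tail` (hypothetical default 0.025) of the probability lies beyond the soft maximum in an exponential tail that joins the flat part continuously. Ages are in years before present, and all names and numbers are hypothetical.

```python
import math


def log_point(age: float, value: float) -> float:
    """(a) Exactly known age: all probability on a single value."""
    return 0.0 if age == value else -math.inf


def log_normal(age: float, mean: float, sd: float) -> float:
    """(b) Normal distribution, e.g. a radiocarbon date and its standard error."""
    z = (age - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))


def log_uniform(age: float, lo: float, hi: float) -> float:
    """(d) Uniform with hard minimum and maximum bounds (e.g. a dated stratum)."""
    return -math.log(hi - lo) if lo <= age <= hi else -math.inf


def log_uniform_soft_max(age: float, lo: float, hi: float,
                         p_tail: float = 0.025) -> float:
    """(e) Hard minimum, soft maximum (assumed parameterization).

    A flat density holds 1 - p_tail of the mass on [lo, hi]; an exponential
    tail holds p_tail beyond hi. The tail rate lam satisfies
    p_tail * lam == (1 - p_tail) / (hi - lo), so the density is continuous at hi.
    """
    if age < lo:
        return -math.inf  # hard minimum: older ages are impossible
    flat = (1 - p_tail) / (hi - lo)
    if age <= hi:
        return math.log(flat)
    lam = flat / p_tail
    return math.log(flat) - lam * (age - hi)


# Hypothetical sample beyond the radiocarbon limit (minimum age 50 ka)
# with a fossil-based soft maximum at 130 ka.
for age in (40_000, 60_000, 130_000, 140_000):
    print(age, log_uniform_soft_max(age, 50_000, 130_000))
```

Unlike the hard maximum of (d), the soft maximum in (e) leaves a small probability that the true age lies beyond the upper bound, so a misjudged bound does not rule out the correct age entirely.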
Figure 5 Major steps for conducting accurate tip dating. This figure summarizes the five main steps that should be followed when performing tip‐dating analyses. For each step, additional advice, such as the important choices to be made and the software to use, is given in a practical guide in Appendix S1 (Supporting information).
Figure 2 Testing for temporal signal. Flow chart for testing for measurable evolutionary change in a data set prior to any tip‐dating analysis. The most robust method available so far is the ‘date‐randomization test’, which involves generating multiple randomized data sets by permuting sampling times and comparing the parameter estimates obtained from the original data set with those obtained from the randomized ones (see Section ‘To date or not to date’ in the text for details on how to perform this test and interpret the results). Visual evidence for a temporal signal can also be obtained by fitting a linear regression of the samples' root‐to‐tip distances against their ages; these distances must be computed from a tree built without constraining tip heights to their sampling times. Tools for generating date‐randomized data sets and computing root‐to‐tip distances are listed in Table 1.
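The two checks in Figure 2 can be sketched in plain Python. The first function fits the root-to-tip regression: its slope is a crude rate estimate and its x-intercept a crude date for the root. The second applies the date-permutation idea to that regression slope. This is a cheap stand-in, assumed here for illustration, for the date-randomization test described in the caption, which instead re-runs the full tip-dating analysis on each permuted data set. Root-to-tip distances are assumed to come from a tree inferred without tip-date constraints; function names and data are hypothetical.

```python
import random


def ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def root_to_tip(dates: list[float], dists: list[float]) -> tuple[float, float]:
    """Return (slope, x_intercept) of root-to-tip distance vs. sampling date.

    slope ~ substitution rate; x_intercept ~ date at which the fitted line
    reaches zero distance, a rough estimate of the age of the root.
    """
    slope, intercept = ols(dates, dists)
    return slope, -intercept / slope


def permuted_slopes(dates: list[float], dists: list[float],
                    n_perm: int = 999, seed: int = 1) -> list[float]:
    """Regression slopes after randomly reassigning the dates among tips."""
    rng = random.Random(seed)
    shuffled = list(dates)
    slopes = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        slopes.append(ols(shuffled, dists)[0])
    return slopes


# Hypothetical tips sampled 2000-2010 with perfectly clock-like distances.
dates = [2000.0, 2002.0, 2004.0, 2006.0, 2008.0, 2010.0]
dists = [(d - 1990.0) * 5e-4 for d in dates]
rate, root_date = root_to_tip(dates, dists)
null = permuted_slopes(dates, dists)
# Share of permutations with a slope at least as large as the observed one;
# a large share suggests the sampling dates carry little temporal signal.
share = sum(s >= rate for s in null) / len(null)
print(rate, root_date, share)
```

With real data the points scatter around the line, and a strong regression alone is not proof of clock-like behaviour, because root-to-tip distances share internal branches and are not independent.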
Figure 3 Transmission graph vs. phylogenetic tree. This figure, adapted from Jombart et al. (2011), illustrates the difference between a transmission chain and a phylogenetic reconstruction. Panel (a) represents the transmission chain of a pathogen as arrows connecting hosts (circles), with grey circles representing sampled hosts. In panel (b), the transmission graph (or network) is correctly reconstructed from the sampled hosts. In panel (c), a time‐structured phylogeny is reconstructed from the same samples, with black dots representing hypothetical ancestral isolates.