Jade Vincent Membrebe, Marc A Suchard, Andrew Rambaut, Guy Baele, Philippe Lemey.
Abstract
Many factors complicate the estimation of time scales for phylogenetic histories, requiring increasingly complex evolutionary models and inference procedures. The widespread application of molecular clock dating has led to the insight that evolutionary rate estimates may vary with the time frame of measurement. This is particularly well established for rapidly evolving viruses that can accumulate sequence divergence over years or even months. However, this rapid evolution stands at odds with a relatively high degree of conservation of viruses or endogenous virus elements over much longer time scales. Building on recent insights into time-dependent evolutionary rates, we develop a formal and flexible Bayesian statistical inference approach that accommodates rate variation through time. We evaluate the novel molecular clock model on a foamy virus cospeciation history and a lentivirus evolutionary history and compare the performance to other molecular clock models. For both virus examples, we estimate a similarly strong time-dependent effect that implies rates varying over four orders of magnitude. The application of an analogous codon substitution model does not implicate long-term purifying selection as the cause of this effect. However, selection does appear to affect divergence time estimates for the less deep evolutionary history of the Ebolavirus genus. Finally, we explore the application of our approach on woolly mammoth ancient DNA data, which shows a much weaker, but still important, time-dependent rate effect that has a noticeable impact on node age estimates. Future developments aimed at incorporating more complex evolutionary processes will further add to the broad applicability of our approach.
Keywords: Bayesian inference; evolutionary rate; molecular clock; phylogenetics
Year: 2019 PMID: 31004175 PMCID: PMC6657730 DOI: 10.1093/molbev/msz094
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. Epoch modeling in the FV phylogeny (top) and associated substitution rate estimates (bottom). Both trees are the same, but the one on the right is scaled in log time in order to better illustrate the superimposed epoch intervals (delineated by the vertical lines in both trees). The node circles in the phylogenies indicate the nodes to which a calibration prior is applied. The gray shading in the two regression plots represents the 95% highest posterior density (HPD) interval of the estimates.
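The epoch idea in this caption can be illustrated with a minimal sketch. It assumes, beyond what this record states, that each epoch has one constant rate, that the log rate is a linear function of the log of the epoch midpoint (natural logs and arithmetic midpoints are both assumptions), and that a branch accumulates rate times duration in every epoch it crosses. The function names, the epoch boundaries, and the intercept and slope values are hypothetical and are not taken from the paper.

```python
import math

def epoch_rate(midpoint, intercept, slope):
    """Rate for one epoch under an assumed log-linear relation:
    ln(rate) = intercept + slope * ln(midpoint)."""
    return math.exp(intercept + slope * math.log(midpoint))

def expected_divergence(t_young, t_old, boundaries, intercept, slope):
    """Expected substitutions per site along a branch spanning [t_young, t_old]
    (times measured backward from the present), summing rate x overlap
    over every epoch the branch crosses."""
    total = 0.0
    for start, end in zip(boundaries, boundaries[1:]):
        overlap = min(t_old, end) - max(t_young, start)
        if overlap > 0:
            # Arithmetic midpoint of the epoch (an assumption of this sketch).
            total += epoch_rate((start + end) / 2, intercept, slope) * overlap
    return total

# Hypothetical, exponentially spaced epoch boundaries: 0, 0.001, ..., 100.
boundaries = [0.0] + [10.0 ** k for k in range(-3, 3)]

# Hypothetical regression parameters; a negative slope makes older epochs slower.
intercept, slope = -3.0, -0.5

for start, end in zip(boundaries, boundaries[1:]):
    mid = (start + end) / 2
    print(f"epoch [{start:g}, {end:g}): rate = {epoch_rate(mid, intercept, slope):.5f}")

print("branch 0.5 -> 20:", expected_divergence(0.5, 20.0, boundaries, intercept, slope))
```

With a slope of zero the sketch reduces to a strict clock, which is a convenient sanity check on the epoch bookkeeping.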
Molecular Clock Model Estimates for the FV Data Set.
| Clock Model | ln MLE | Parameter | Mean | 95% HPD |
|---|---|---|---|---|
| TDRuni | −33,737 | Intercept | −2.349 | (−2.517; −2.157) |
| | | Slope | −0.868 | (−0.945; −0.800) |
| TDRexp | −33,667 | Intercept | −3.305 | (−3.389; −3.229) |
| | | Slope | −0.539 | (−0.570; −0.511) |
| Strict | −34,044 | Clock rate | 0.012 | (0.011; 0.013) |
| UCLD | −33,646 | Mean | 0.014 | (0.011; 0.018) |
| | | Dispersion | 0.019 | (0.014; 0.03) |
| RLC | −33,674 | Initial rate | 0.0072 | (0.0067; 0.0078) |
| | | Rate changes | 10 | (8; 12) |
Note.—The uncorrelated relaxed clock model with an underlying lognormal distribution (UCLD) yields the highest log marginal likelihood estimate (ln MLE) among the clock models being compared. In order of decreasing model fit to the data, the UCLD is followed by the exponential epoch model (TDRexp), the RLC model, the uniform epoch model (TDRuni), and the strict clock model.
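The ranking in this note can be reproduced from the tabulated values, since the log Bayes factor between two models is the difference of their log marginal likelihoods. The sketch below copies the ln MLE column for the FV data set; the dictionary and function names are illustrative only.

```python
# ln MLE values copied from the FV clock model table.
ln_mle = {
    "TDRuni": -33737,
    "TDRexp": -33667,
    "Strict": -34044,
    "UCLD": -33646,
    "RLC": -33674,
}

def log_bayes_factor(model_a, model_b, table=ln_mle):
    """ln Bayes factor of model_a over model_b (difference of ln MLEs)."""
    return table[model_a] - table[model_b]

# Rank models from best to worst fit (highest ln MLE first).
ranking = sorted(ln_mle, key=ln_mle.get, reverse=True)
best = ranking[0]

for model in ranking:
    lbf = log_bayes_factor(model, best)
    print(f"{model:7s} ln MLE = {ln_mle[model]:>7d}   ln BF vs {best} = {lbf:>5d}")
```

A negative ln BF against the best model measures how strongly that model is disfavored on these estimates; for example, TDRexp trails UCLD by 21 log units.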
Model Fit Estimates (ln MLE) under Different Dispersed and Aggregated Calibrations.
| Clock Model | Dispersed I | Dispersed II | Aggregated I | Aggregated II |
|---|---|---|---|---|
| TDRuni | −33,643 | −33,667 | NA | −33,635 |
| TDRexp | −33,629 | −33,638 | −33,636 | −33,637 |
| Strict | −34,691 | −33,933 | −33,648 | −33,933 |
| UCLD | −33,655 | −33,658 | −33,635 | −33,636 |
| RLC | −33,655 | −33,655 | −33,649 | −33,636 |
Note.—The exponential epoch model (TDRexp) outperforms the other clock models for the dispersed calibrations in terms of the log marginal likelihood (ln MLE) estimates, while differences are generally less conclusive for the aggregated calibrations.
NA: no estimate is available for the uniform epoch model (TDRuni) under the Aggregated I scheme because all shallow calibrating nodes fall under a single epoch.
Fig. 2. Relative errors for the different clock models under the (a) Dispersed I, (b) Dispersed II, (c) Aggregated I, and (d) Aggregated II calibration schemes. The whiskers represent the standard error of the relative rates. The exponential epoch model (TDRexp) is associated with the lowest mean relative error for all four calibration schemes.
Molecular Clock Model Estimates for the LV Data Set.
| Clock Model | ln MLE | TMRCA (My): Mean | TMRCA (My): 95% HPD | Parameter | Mean | 95% HPD |
|---|---|---|---|---|---|---|
| TDRexp(9) | −10,674 | 1.022 | (0.6012; 1.413) | Intercept | −0.344 | (−0.670; 0.005) |
| | | | | Slope | −0.621 | (−0.685; −0.526) |
| TDRexp(5) | −10,655 | 1.093 | (0.623; 2.691) | Intercept | −0.136 | (−0.643; 0.14) |
| | | | | Slope | −0.627 | (−0.660; −0.593) |
| Strict | −10,915 | 0.038 | (0.031; 0.045) | Overall rate | 77.7 | (59.9; 96.9) |
| UCLD | −10,856 | 0.032 | (0.023; 0.043) | Mean | 37.5 | (29.5; 46.3) |
| | | | | Dispersion | 26.6 | (21.9; 31.6) |
| RLC | −10,662 | 0.046 | (0.033; 0.059) | Initial rate | 26.5 | (19.4; 34.2) |
| | | | | Rate changes | 4 | (3; 5) |
Note.—The sparse exponential epoch model (TDRexp(5)) yields the best log marginal likelihood estimate (ln MLE), followed by the RLC model, the denser exponential epoch model (TDRexp(9)), the uncorrelated relaxed clock model with an underlying lognormal distribution (UCLD), and the strict clock model. The TDRexp models yield estimates for the TMRCA that are considerably deeper than those of the other clock models.
Estimates for the Ebolavirus Data Set with Changing Selection Pressure (ω).
| | Homogeneous | Time-Dependent |
|---|---|---|
| ln MLE | −66,035 | −65,767 |
| Selection parameter(s) | | |
| Intercept | | |
| Slope | | |
| Clock rate (subst./site/yr) | 0.0004 (0.0002; 0.0006) | 0.0002 (0.0001; 0.0003) |
| TMRCA (years) | 11,166 (7,894; 15,190) | 49,022 (30,586; 77,863) |
Note.—The time-dependent ω model resulted in a better fit than the homogeneous ω model.
Fig. 3. Differences in mean and variance of node age estimates between UCLD and TDRexp. The percentage reduction in variance of the TDRexp estimates relative to the UCLD estimates is plotted against the difference in mean estimates (mrca[UCLD] − mrca[TDRexp]) for the node heights studied by Chang et al. (2017).