Filip Bielejec, Philippe Lemey, Guy Baele, Andrew Rambaut, Marc A. Suchard.
Abstract
Molecular phylogenetic and phylogeographic reconstructions generally assume time-homogeneous substitution processes. Motivated by computational convenience, this assumption sacrifices biological realism and offers little opportunity to uncover the temporal dynamics in evolutionary histories. Here, we propose an evolutionary approach that explicitly relaxes the time-homogeneity assumption by allowing the specification of different infinitesimal substitution rate matrices across different time intervals, called epochs, along the evolutionary history. We focus on an epoch model implementation in a Bayesian inference framework that offers great modeling flexibility in drawing inference about any discrete data type characterized as a continuous-time Markov chain, including phylogeographic traits. To alleviate the computational burden that the additional temporal heterogeneity imposes, we adopt a massively parallel approach that achieves both fine- and coarse-grain parallelization of the computations across branches that accommodate epoch transitions, making extensive use of graphics processing units. Through synthetic examples, we assess model performance in recovering evolutionary parameters from data generated according to different evolutionary scenarios that comprise different numbers of epochs for both nucleotide and codon substitution processes. We illustrate the usefulness of our inference framework in two different applications to empirical data sets: the selection dynamics on within-host HIV populations throughout infection and the seasonality of global influenza circulation. In both cases, our epoch model captures key features of temporal heterogeneity that remained difficult to test using ad hoc procedures. [Bayesian inference; BEAGLE; BEAST; Epoch Model; phylogeography; Phylogenetics.]
Year: 2014 PMID: 24627184 PMCID: PMC4055869 DOI: 10.1093/sysbio/syu015
Source DB: PubMed Journal: Syst Biol ISSN: 1063-5157 Impact factor: 15.683
Figure. Collapsing branches. The transition probability matrix $P(t_0, t_1, r)$ governs the nonhomogeneous substitution process along a branch from time $t_0$ to $t_1$ and is the matrix product of the transition matrices $P_1(t_0, T_1, r)$ and $P_2(T_1, t_1, r)$, where $T_1$ is the epoch change-point time between homogeneous processes 1 and 2. We assume the rate scalar $r$ remains constant along the entire branch.
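The matrix product described in this caption can be sketched numerically. The following is a minimal illustration, not the BEAST/BEAGLE implementation: it assumes each homogeneous piece has transition matrix exp(Q · r · Δt), uses a simple pure-Python matrix exponential (scaling and squaring with a truncated Taylor series), and the two-state rate matrices `Q1` and `Q2` are hypothetical values chosen only for the example.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def mat_exp(q, t, squarings=10, terms=20):
    """Approximate exp(q * t) by scaling and squaring with a Taylor series."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term holds a^k / k! after this step
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def epoch_branch_matrix(q1, q2, t0, t1, change_point, r):
    """Transition matrix for a branch from t0 to t1 crossing one change point."""
    if not t0 <= change_point <= t1:
        raise ValueError("change_point must lie between t0 and t1")
    p1 = mat_exp(q1, r * (change_point - t0))  # homogeneous process 1
    p2 = mat_exp(q2, r * (t1 - change_point))  # homogeneous process 2
    return mat_mul(p1, p2)


# Hypothetical two-state rate matrices (rows sum to zero), for illustration only.
Q1 = [[-1.0, 1.0], [1.0, -1.0]]
Q2 = [[-2.0, 2.0], [0.5, -0.5]]
P = epoch_branch_matrix(Q1, Q2, t0=0.0, t1=2.0, change_point=0.5, r=1.0)
```

When both epochs share the same rate matrix, the product reduces to the usual homogeneous transition matrix over the whole branch, which is a convenient sanity check for any implementation.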
Figure. Epoch simulation scenarios on an influenza A maximum clade credibility tree topology. In the two-epoch example illustrated at the top, the transition time is set at $t_1 = 7$, creating two epochs with substitution processes governed by infinitesimal rate matrices $Q_1$ and $Q_2$, respectively, separated by the light and dark gray areas and the dotted line. In the three-epoch example illustrated at the bottom, transition times are set at $t_1 = 7$ and $t_2 = 15$, creating three epochs with substitution processes governed by infinitesimal rate matrices $Q_1$, $Q_2$, and then again $Q_1$, as indicated by the alternating dark and light areas.
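To make the epoch layout concrete, the sketch below splits a branch into the pieces that fall within each epoch, using the transition times 7 and 15 from the three-epoch scenario. The function name `split_branch`, the example branch endpoints, and the assumption that branch times are measured on the same scale as the transition times are illustrative choices, not taken from the paper.

```python
def split_branch(start, end, transition_times):
    """Return (epoch_index, duration) pieces of the branch [start, end]."""
    bounds = [float("-inf")] + sorted(transition_times) + [float("inf")]
    pieces = []
    for epoch, (lo, hi) in enumerate(zip(bounds, bounds[1:])):
        # Length of the overlap between the branch and this epoch interval.
        overlap = min(end, hi) - max(start, lo)
        if overlap > 0:
            pieces.append((epoch, overlap))
    return pieces


# Three-epoch scenario: matrices alternate Q1, Q2, Q1 across the epochs.
epoch_matrices = ["Q1", "Q2", "Q1"]
pieces = split_branch(5, 16, [7, 15])  # hypothetical branch spanning all epochs
labelled = [(epoch_matrices[epoch], duration) for epoch, duration in pieces]
# labelled == [("Q1", 2), ("Q2", 8), ("Q1", 1)]
```

Each piece would then contribute one homogeneous transition matrix, and the branch matrix is the ordered product of those pieces.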
Estimator performance for simulated data sets
| | Simulated | Estimated | | | | | | | | |
| Sampling | Parameter values | Coverage | Mean | MSE (a) | Coverage | Mean | MSE | Coverage | Mean | MSE |
| Nucleotides | | κ1 (b) | | | κ2 | | | κ3 | | |
| Dated tips | κ1 = 10 | 0.96 | 10.068 | 1.005 | 0.98 | 10.283 | 1.097 | — | — | — |
| Dated tips | κ1 = 1, κ2 = 10 | 0.98 | 1.007 | 0.008 | 0.96 | 10.446 | 1.551 | — | — | — |
| Dated tips | κ1 = 1, κ2 = 10, κ3 = 1 | 0.96 | 1.010 | 0.009 | 0.96 | 9.993 | 2.435 | 0.95 | 1.017 | 0.026 |
| Contemporaneous | κ1 = 1, κ2 = 10, κ3 = 1 | 0.96 | 1.049 | 0.002 | 0.95 | 10.002 | 3.723 | 0.92 | 1.022 | 0.061 |
| Codons | | ω1 (c) | | | ω2 | | | ω3 | | |
| Dated tips | ω1 = 1 | 0.94 | 0.993 | 0.048 | 0.93 | 1.014 | 0.53 | — | — | — |
| Dated tips | ω1 = 0.1, ω2 = 1 | 0.90 | 0.103 | 0.001 | 0.93 | 1.011 | 0.054 | — | — | — |
| Dated tips | ω1 = 0.1, ω2 = 1, ω3 = 0.1 | 0.92 | 0.102 | 0.001 | 0.89 | 1.096 | 0.274 | 0.96 | 0.110 | 0.02 |
| Contemporaneous | ω1 = 0.1, ω2 = 1, ω3 = 0.1 | 0.93 | 0.100 | 0.001 | 0.96 | 1.067 | 0.051 | 0.95 | 0.103 | 0.002 |
Notes: The table lists the parameter values used to generate data in the first major column and the coverage of their estimates, along with measures of variance and bias, in the second major column. Consecutive rows present the results for the first, second, and third nucleotide model simulations for dated-tip samples and the third nucleotide model simulation for contemporaneous sequences (ultrametric tree), followed by the results of the first, second, and third codon model simulations for dated-tip samples and the third codon model simulation for contemporaneous sequences.
(a) Mean squared error.
(b) HKY model's transition-transversion bias parameters.
(c) Yang codon model's nonsynonymous-to-synonymous substitution rate ratio.
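The three summary statistics in the table can be computed from replicate analyses along the following lines. This is a sketch under stated assumptions: coverage is taken as the fraction of replicates whose interval estimate contains the generating value, the mean and MSE are taken over per-replicate point estimates, and the inputs below are made-up toy numbers rather than results from the study.

```python
def summarize(true_value, estimates, intervals):
    """Return (coverage, mean, mse) over simulation replicates."""
    n = len(estimates)
    # Fraction of replicates whose interval contains the generating value.
    coverage = sum(lo <= true_value <= hi for lo, hi in intervals) / n
    mean = sum(estimates) / n
    # Mean squared error of the point estimates around the generating value.
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return coverage, mean, mse


# Toy inputs (hypothetical): four replicates simulated with a true value of 10.
toy_estimates = [9.0, 10.0, 11.0, 12.0]
toy_intervals = [(8.0, 10.5), (9.0, 11.0), (10.5, 12.0), (9.5, 13.0)]
coverage, mean, mse = summarize(10.0, toy_estimates, toy_intervals)
# coverage == 0.75, mean == 10.5, mse == 1.5
```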
Figure. Estimates of the $d_N/d_S$ ratio for within-host HIV analyses. Vertical lines represent 95% highest posterior density intervals for the $d_N/d_S$ ratio estimates. Parameter ω is estimated under the homogeneous model, while ω1 and ω2 are obtained using the epoch model.
Bayes factor test for decreased selection after progression
| Patient | Posterior probability | log Bayes factor |
| Patient 1 | >0.999 | 7.418 |
| Patient 2 | >0.999 | 9.602 |
| Patient 3 | 0.898 | 2.174 |
| Patient 5 | 0.430 | −0.282 |
| Patient 6 | >0.999 | 9.210 |
| Patient 7 | >0.999 | 8.112 |
| Patient 8 | 0.933 | 2.627 |
| Patient 9 | 0.895 | 2.142 |
| | 0.894 | 2.14 |
Notes: We report the posterior probability that ω1 < ω2 and the corresponding log Bayes factor against the alternative that ω1 ≥ ω2.
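The relationship between the two columns can be illustrated with a short calculation. The sketch below assumes a natural logarithm and equal prior probability for the two hypotheses (prior odds of 1); neither assumption is stated in the table, but with them the tabulated posterior probabilities reproduce the tabulated log Bayes factors to within rounding.

```python
import math


def log_bayes_factor(posterior_prob, prior_prob=0.5):
    """Natural-log Bayes factor from a posterior and a prior probability."""
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)  # 1.0 under the assumed prior
    return math.log(posterior_odds / prior_odds)


# Posterior probabilities taken from the table for Patients 3 and 5.
patient_3 = log_bayes_factor(0.898)  # close to the tabulated 2.174
patient_5 = log_bayes_factor(0.430)  # close to the tabulated -0.282
```

Posterior probabilities reported as >0.999 cannot be converted this way without the unrounded value, because the odds are very sensitive to the final digits.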
Figure. A two-epoch phylogeographic model applied to seasonal influenza H3N2. A. Maximum clade credibility (MCC) tree with branches colored according to modal discrete location states at each node. The gray time intervals represent the epoch model with a single discrete rate matrix shared across northern hemisphere spring and summer (light gray) time intervals and another rate matrix shared across the northern hemisphere autumn and winter (dark gray) time intervals. B. Diffusion rates supported by a Bayes factor >20 for spring and summer epoch intervals. The width of the arrows reflects the magnitude of the Bayes factor support. C. Diffusion rates supported by a Bayes factor >20 for autumn and winter epoch intervals.
Marginal likelihood estimates
| Model | Marginal likelihood (PS) | Marginal likelihood (SS) |
| homogeneous | −827.29 | −825.07 |
| 7-epoch | −806.36 | −803.40 |
| 14-epoch | −798.77 | −795.63 |
Notes: Comparison in terms of model fit between a homogeneous model, an epoch model with time discretized into S = 7 epochs alternating between 2 different rate matrices and an epoch model with time discretized into S = 14 epochs, alternating between 4 separate rate matrices. PS, path sampling; SS, stepping stone sampling.
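Model fit can be compared by differencing the tabulated estimates. The sketch below assumes the values are log marginal likelihoods on the natural-log scale, as path sampling and stepping-stone estimators usually report, so that a difference between two models is a log Bayes factor. The dictionary layout and function name are illustrative.

```python
# Tabulated marginal likelihood estimates (assumed to be on the log scale).
log_ml = {
    "homogeneous": {"PS": -827.29, "SS": -825.07},
    "7-epoch": {"PS": -806.36, "SS": -803.40},
    "14-epoch": {"PS": -798.77, "SS": -795.63},
}


def log_bf(model_a, model_b, estimator):
    """Log Bayes factor in favor of model_a over model_b."""
    return log_ml[model_a][estimator] - log_ml[model_b][estimator]


epoch7_vs_homogeneous = {e: round(log_bf("7-epoch", "homogeneous", e), 2)
                         for e in ("PS", "SS")}
epoch14_vs_epoch7 = {e: round(log_bf("14-epoch", "7-epoch", e), 2)
                     for e in ("PS", "SS")}
# epoch7_vs_homogeneous == {"PS": 20.93, "SS": 21.67}
# epoch14_vs_epoch7 == {"PS": 7.59, "SS": 7.77}
```

Under that assumption, both estimators rank the models in the same order, with the 14-epoch model fitting best and the homogeneous model worst.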