Abstract
BACKGROUND: To understand biological diversification, it is important to account for large-scale processes that affect the evolutionary history of groups of co-distributed populations of organisms. Such events predict temporally clustered divergence times, a pattern that can be estimated using genetic data from co-distributed species. I introduce a new approximate-Bayesian method for comparative phylogeographical model-choice that estimates the temporal distribution of divergences across taxa from multi-locus DNA sequence data. The model is an extension of that implemented in msBayes.
Year: 2014 PMID: 24992937 PMCID: PMC4227068 DOI: 10.1186/1471-2148-14-150
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Summary of the notation used throughout this work; modified from Oaks et al. [7]
| Number of population pairs. | |
| The number of genome copies sampled from population pair | |
| Number of loci sampled from population pair | |
| Total number of unique loci sampled. | |
| Sequence alignment of locus | |
| Population genetic summary statistics calculated from | |
| Vector containing the sequence alignments of each locus from each population pair: | |
| Vector containing the summary statistics of each locus from each population pair: | |
| Multi-dimensional Euclidean space around the observed summary statistics, | |
| Radius of | |
| Gene tree of the sequences in | |
| Vector containing the gene trees of each locus from each population pair: | |
| | | Number of population divergence-time parameters shared among the |
| Time of population divergence in units of 4N generations. | |
| Set of divergence-time parameters: { | |
| The index of the divergence-time in | |
| Vector of divergence-time indices: ( | |
| Time of divergence in units of 4N generations | |
| Vector of divergence times for each of the population pairs: ( | |
| Scaled time of divergence between the populations of pair | |
| Vector containing the scaled divergence times of each locus from each population pair: | |
| Mutation-rate-scaled effective population size of the 1 | |
| Mutation-rate-scaled effective population size of the population ancestral to pair | |
| Vectors ( | |
| Vector containing the | |
| Mutation-rate multiplier of locus | |
| Vector containing the locus-specific mutation-rate multipliers: ( | |
| The shape parameter of the gamma prior distribution on | |
| | respectively. The bottleneck in each descendant population begins immediately after divergence. |
| Vectors ( | |
| Proportion of time between present and | |
| Vector containing the | |
| m | Symmetric migration rate between the descendant populations of pair |
| Vector containing the migration rates for each population pair: ( | |
| | or differences in generation times among taxa. |
| | among loci or among taxa. |
| Vector of ploidy and/or generation-time scaling constants: | |
| Vector of mutation-rate scaling constants: | |
| Mean of divergence times across the | |
| Variance of divergence times across the | |
| Dispersion index of divergence times across the | |
| Number of samples from the joint prior. | |
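The notation above (samples from the joint prior, summary statistics, and a Euclidean region of a given radius around the observed summary statistics) follows the general shape of approximate-Bayesian rejection sampling. The following is a minimal sketch of that general idea only, not the implementation used in this work. The function names, the toy prior, and the toy simulator are all hypothetical stand-ins chosen so the example runs on its own.

```python
import math
import random


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors of summary statistics."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def abc_rejection(observed_stats, draw_from_prior, simulate_stats,
                  n_samples, radius, rng):
    """Retain prior draws whose simulated statistics fall within `radius`
    of the observed statistics (a generic rejection sampler)."""
    retained = []
    for _ in range(n_samples):
        params = draw_from_prior(rng)
        stats = simulate_stats(params, rng)
        if euclidean_distance(stats, observed_stats) <= radius:
            retained.append(params)
    return retained


# Hypothetical toy problem: infer the mean of a normal distribution.
def draw_from_prior(rng):
    return rng.uniform(0.0, 10.0)


def simulate_stats(mean, rng):
    draws = [rng.gauss(mean, 1.0) for _ in range(20)]
    return [sum(draws) / len(draws)]  # one summary statistic: the sample mean


rng = random.Random(1)
posterior_sample = abc_rejection([4.0], draw_from_prior, simulate_stats,
                                 n_samples=5000, radius=0.25, rng=rng)
print(len(posterior_sample), sum(posterior_sample) / len(posterior_sample))
```

Shrinking the radius makes the retained sample a closer approximation of the posterior, at the cost of retaining fewer draws.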
The models evaluated in the simulation-based analyses
For the M model, the prior on the concentration parameter, Gamma, was set to Gamma(2,2) for the validation analyses and Gamma(1.5,18.1) for the power analyses. The distributions of divergence times are given in units of 4N generations, followed in brackets by units of millions of generations ago (MGA); the former are converted to the latter assuming a per-site rate of 1 × 10−8 mutations per generation. For model M, the priors for theta parameters are U(0, 0.05) and Beta(1, 1) × 2 × U(0, 0.05). The latter is summarized as . For the M and M, and M models, ,, and are independently and exponentially distributed with a mean of 0.025.
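To illustrate what a concentration parameter controls, the sketch below assumes that the "DPP" model named in the figure captions refers to a Dirichlet-process prior over how population pairs are assigned to divergence events. Under that assumption, it computes the prior expected number of events for a fixed concentration value and draws one assignment with the standard Chinese-restaurant-process scheme. The function names and the fixed concentration values are hypothetical; in the analyses above the concentration parameter has a gamma prior rather than a fixed value.

```python
import random


def expected_number_of_events(n_pairs, concentration):
    """Prior mean number of categories under a Dirichlet process:
    sum over i = 0..n-1 of concentration / (concentration + i)."""
    return sum(concentration / (concentration + i) for i in range(n_pairs))


def sample_divergence_indices(n_pairs, concentration, rng):
    """Draw one assignment of pairs to divergence events
    (Chinese restaurant process); indices are 0-based."""
    indices = []
    counts = []  # counts[k] = number of pairs already assigned to event k
    for _ in range(n_pairs):
        # Join an existing event in proportion to its size, or open a
        # new event in proportion to the concentration parameter.
        weights = counts + [concentration]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(counts):
            counts.append(1)
        else:
            counts[choice] += 1
        indices.append(choice)
    return indices


rng = random.Random(0)
print(expected_number_of_events(22, 2.0))
print(sample_divergence_indices(22, 2.0, rng))
```

Larger concentration values shift prior mass toward more divergence events; smaller values favor shared divergences.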
The models used to simulate pseudo-replicate datasets for assessing the power of the models in Table 2
The distributions of divergence times are given in units of 4N generations, followed in brackets by units of millions of generations ago (MGA); the former are converted to the latter assuming a per-site rate of 1 × 10−8 mutations per generation. For all of the models, the priors for theta parameters are U(0, 0.05) and Beta(1, 1) × 2 × U(0, 0.05). The latter is summarized as . For the and models, ,, and are independently and exponentially distributed with a mean of 0.025.
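The unit conversion described above can be sketched as follows. The sketch relies on the standard relation theta = 4N × mu, so that one unit of 4N generations corresponds to theta / mu generations. The mutation rate of 1 × 10−8 is taken from the text; the reference theta of 0.025 is an assumption made here purely for illustration, and the function name is hypothetical.

```python
MUTATION_RATE = 1e-8      # per-site mutations per generation (from the text)
REFERENCE_THETA = 0.025   # assumed reference value of theta = 4N * mu (hypothetical)


def coalescent_units_to_mga(time_in_4n, theta=REFERENCE_THETA, mu=MUTATION_RATE):
    """Convert a time in units of 4N generations to millions of generations ago.

    Since theta = 4N * mu, one coalescent unit spans 4N = theta / mu generations.
    """
    generations = time_in_4n * theta / mu
    return generations / 1e6


# Under these assumptions, one unit of 4N generations is about 2.5 MGA.
print(coalescent_units_to_mga(1.0))
```

A different reference theta rescales the bracketed MGA values proportionally, so the value used should always be stated alongside the converted times.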
The models used to analyze the data from the 22 pairs of taxa from the Philippines (M), and a subset of nine of those pairs from the Islands of Negros and Panay ()
In addition to the n1 coalescent times, the has only a single parameter for each taxon pair. The remaining M models have three , two , and one parameter. The distributions of divergence times are given in units of 4N generations, followed in brackets by units of millions of generations ago (MGA); the former are converted to the latter assuming a per-site rate of 1 × 10−8 mutations per generation. The model (and its counterpart that samples over ordered divergence models) has only two parameters (the descendant populations of each pair share the same parameter, and there are no bottleneck parameters).
Figure 1. Comparison of model-choice accuracy. Model-choice accuracy for models (A–D) M, (E–H) M, (I–L) M, and (M–P) M when analyzing data generated under models (A, E, I, and M) M, (B, F, J, and N) M, (C, G, K, and O) M, and (D, H, L, and P) M. The unadjusted posterior probabilities of a single divergence event, based on ||=1, from 50,000 posterior estimates are assigned to bins of width 0.05 and plotted against the proportion of replicates in each bin where the truth is ||=1.
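The binning described in the caption can be sketched as below: each replicate contributes an estimated posterior probability of one divergence event and a flag saying whether one event was in fact the truth, and each bin of width 0.05 reports the proportion of its replicates where the truth was one event. This is an illustrative sketch only; the function name and the tiny input lists are hypothetical.

```python
def calibration_table(posterior_probs, truth_is_one_event, bin_width=0.05):
    """Bin posterior probabilities and report, per non-empty bin, the
    proportion of replicates where a single divergence event was true.

    Returns a list of (bin_lower, bin_upper, proportion_true, n_replicates).
    """
    n_bins = round(1.0 / bin_width)
    totals = [0] * n_bins
    hits = [0] * n_bins
    for prob, truth in zip(posterior_probs, truth_is_one_event):
        # A probability of exactly 1.0 is placed in the last bin.
        b = min(int(prob / bin_width), n_bins - 1)
        totals[b] += 1
        hits[b] += int(truth)
    rows = []
    for b in range(n_bins):
        if totals[b] > 0:
            rows.append((b * bin_width, (b + 1) * bin_width,
                         hits[b] / totals[b], totals[b]))
    return rows


# Hypothetical input: six replicates.
probs = [0.02, 0.03, 0.52, 0.53, 0.97, 1.0]
truth = [False, False, True, False, True, True]
for row in calibration_table(probs, truth):
    print(row)
```

For a well-calibrated method, the proportion in each bin should fall close to the bin's range of posterior probabilities.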
Figure 2. Comparison of model-choice accuracy using threshold. Model-choice accuracy for models (A–D) M, (E–H) M, (I–L) M, and (M–P) M when analyzing data generated under models (A, E, I, and M) M, (B, F, J, and N) M, (C, G, K, and O) M, and (D, H, L, and P) M. The unadjusted posterior probabilities of a single divergence event, based on D<0.01, from 50,000 posterior estimates are assigned to bins of width 0.05 and plotted against the proportion of replicates in each bin where the truth is D<0.01.
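As a rough illustration of the threshold criterion, the sketch below computes a dispersion index of divergence times as variance divided by mean (the standard definition of an index of dispersion) and then estimates the posterior probability of D<0.01 as the fraction of posterior samples below the threshold. The use of the sample variance (n − 1 denominator) is an assumption made here, and the function names and input values are hypothetical.

```python
import statistics


def dispersion_index(divergence_times):
    """Variance / mean of a vector of divergence times.

    Assumes at least two times, a positive mean, and the sample
    variance (n - 1 denominator); the latter is an illustrative choice.
    """
    mean = statistics.mean(divergence_times)
    variance = statistics.variance(divergence_times)
    return variance / mean


def prob_dispersion_below(posterior_time_vectors, threshold=0.01):
    """Fraction of posterior samples whose dispersion index is below threshold."""
    below = sum(1 for times in posterior_time_vectors
                if dispersion_index(times) < threshold)
    return below / len(posterior_time_vectors)


# Hypothetical posterior sample of divergence-time vectors (two pairs each).
posterior = [[1.0, 1.0], [1.0, 3.0], [2.0, 2.0], [0.5, 0.5]]
print(prob_dispersion_below(posterior))
```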
Figure 3. Power to avoid spurious estimation of clustered divergences when divergence times are random. The power of models (A–D) M, (E–H) M, (I–L) M, and (M–P) M to detect random variation in divergence times as simulated under the series of models. The plots illustrate the estimated number of divergence events () from analyses of 1000 datasets simulated under each of the models, with the estimated probability of the model inferring one divergence event, , given for each combination. The 22 divergence times were randomly drawn as indicated above each column of plots, where time is represented as millions of generations ago (MGA) according to a per-site rate of 1 × 10−8 mutations per generation. Four of the six data-generating models of the series are shown; please see Additional file 1: Figure S14 for all results.
Figure 4. Power to avoid spurious support for one divergence event when divergence times are random. The tendency of models (A–D) M, (E–H) M, (I–L) M, and (M–P) M to support one divergence event when there is random variation in divergence times as simulated under the series of models. The plots illustrate histograms of the estimated posterior probability of the one divergence model, p(||=1|B(S∗)), from analyses of 1000 datasets simulated under each of the models. The 22 divergence times were randomly drawn as indicated above each column of plots, where time is represented as millions of generations ago (MGA) according to a per-site rate of 1 × 10−8 mutations per generation. Four of the six data-generating models of the series are shown; please see Additional file 1: Figure S17 for all results.
Figure 5. Power to avoid spurious estimation of small temporal variance in divergences when divergence times are random. The power of models (A–D) M, (E–H) M, (I–L) M, and (M–P) M to detect random variation in divergence times as simulated under the series of models. The plots illustrate the estimated dispersion index of divergence times () from analyses of 1000 datasets simulated under each of the models, with the estimated probability of the model inferring one divergence event, , given for each combination. The 22 divergence times were randomly drawn as indicated above each column of plots, where time is represented as millions of generations ago (MGA) according to a per-site rate of 1 × 10−8 mutations per generation. Four of the six data-generating models of the series are shown; please see Additional file 1: Figure S20 for all results.
Figure 6. Power to avoid spurious support for no temporal variance in divergences (i.e.,) when divergence times are random. The tendency of models (A–D) M, (E–H) M, (I–L) M, and (M–P) M to support one divergence event when there is random variation in divergence times as simulated under the series of models. The plots illustrate histograms of the estimated posterior probability of the one divergence model, p(D<0.01|B(S∗)), from analyses of 1000 datasets simulated under each of the models. The 22 divergence times were randomly drawn as indicated above each column of plots, where time is represented as millions of generations ago (MGA) according to a per-site rate of 1 × 10−8 mutations per generation. Four of the six data-generating models of the series are shown; please see Additional file 1: Figure S23 for all results.
Figure 7. Estimated number of divergence events for 22 taxa from the Philippines. The (A–E) posterior and (F–J) prior probabilities of the number of divergence events (||) when the data of the 22 pairs of taxa from the Philippines are analyzed under the five models indicated at the top of each column of plots (Table 4). The average prior probability of an (K–O) unordered and (P–T) ordered model of divergence (t) with || divergence-time parameters is also shown. The posterior median of the dispersion index of divergence times (D) is also given for each model, followed by the 95% highest posterior density interval in parentheses.
Figure 8. Estimated number of divergence events for 9 taxa from the Philippines. The posterior probabilities of the number of divergence events, ||, when the data of the 9 pairs of taxa from Negros and Panay Islands are analyzed under the DPP model that samples over (A) unordered and (B) ordered models of divergence (Table 4). Both models share the same (C) prior probability of the number of divergence events, and the average prior probability of an (D) unordered and (E) ordered model of divergence (t) with || divergence-time parameters. The posterior median of the dispersion index of divergence times (D) is also given for each model, followed by the 95% highest posterior density interval in parentheses.
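The captions for Figures 7 and 8 report a posterior median and a 95% highest posterior density (HPD) interval. One common way to approximate an HPD interval from posterior samples is to take the narrowest window that contains the requested fraction of the sorted samples. The sketch below shows that generic approach; it is not necessarily the procedure used to produce the reported intervals, and the function name and sample values are hypothetical.

```python
import math
import statistics


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    # Number of samples the interval must contain; the small offset
    # guards against floating-point noise in mass * n.
    k = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(n - k + 1),
                key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


# Hypothetical right-skewed posterior sample of a dispersion index.
samples = [0] * 95 + [10, 20, 30, 40, 50]
print(statistics.median(samples), hpd_interval(samples))
```

For a skewed posterior such as this one, the HPD interval can be much narrower than an equal-tailed interval covering the same probability mass.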