Alexander T Xue¹, Michael J Hickerson¹,²
Abstract
Population genetic data from multiple taxa can address comparative phylogeographic questions about community-scale responses to environmental shifts, and a useful strategy to this end is to employ hierarchical co-demographic models that directly test multi-taxa hypotheses within a single, unified analysis. This approach has been applied to classical phylogeographic data sets such as mitochondrial barcodes, as well as to reduced-genome polymorphism data sets that can yield tens of thousands of SNPs and are produced by emergent technologies such as RAD-seq and GBS. A strategy for the latter was previously developed by adapting the site frequency spectrum into a novel summarization of population genomic data across multiple taxa, called the aggregate site frequency spectrum (aSFS), which can potentially be deployed under various inferential frameworks, including approximate Bayesian computation, random forest and composite likelihood optimization. Here, we introduce the R package multi-dice, a wrapper program that exploits existing simulation software for flexible execution of hierarchical model-based inference using the aSFS (derived from reduced-genome data) as well as mitochondrial data. We validate several novel software features, such as applying alternative inferential frameworks, enforcing a minimum time buffer around co-demographic pulses and specifying flexible hyperprior distributions. In sum, multi-dice provides comparative analysis within the familiar R environment while allowing a high degree of user customization, and will thus serve as a tool for comparative phylogeography and population genomics.
Keywords: aggregate site frequency spectrum; approximate Bayesian computation; comparative phylogeography; population genetics software; random forest
Year: 2017 PMID: 28449263 PMCID: PMC5724483 DOI: 10.1111/1755-0998.12686
Source DB: PubMed Journal: Mol Ecol Resour ISSN: 1755-098X Impact factor: 7.090
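To make the aSFS more concrete, the following is a minimal Python sketch. It assumes a construction in which each taxon's site frequency spectrum is computed separately and the taxa are then ranked within every frequency bin. The function names, the choice of a folded spectrum and the requirement that all taxa share one sample size are illustrative assumptions. This is not code from the multi-dice R package.

```python
from typing import List, Sequence


def folded_sfs(derived_counts: Sequence[int], n_samples: int) -> List[int]:
    """Folded SFS for one taxon (hypothetical helper).

    derived_counts[i] is the number of sampled gene copies carrying the
    derived allele at SNP i; n_samples is the number of gene copies, assumed
    equal for every taxon. Entry k-1 counts SNPs with minor-allele count k,
    for k = 1 .. n_samples // 2. Monomorphic sites are skipped.
    """
    spectrum = [0] * (n_samples // 2)
    for count in derived_counts:
        minor = min(count, n_samples - count)
        if minor > 0:
            spectrum[minor - 1] += 1
    return spectrum


def aggregate_sfs(spectra: Sequence[Sequence[int]]) -> List[int]:
    """aSFS sketch: within each frequency bin, sort the taxa's counts from
    highest to lowest, then concatenate the sorted bins.

    Sorting within bins discards taxon identity, so the result does not
    depend on the order in which taxa are listed.
    """
    n_bins = len(spectra[0])
    if any(len(s) != n_bins for s in spectra):
        raise ValueError("all taxa must share the same number of SFS bins")
    aggregated: List[int] = []
    for b in range(n_bins):
        aggregated.extend(sorted((s[b] for s in spectra), reverse=True))
    return aggregated
```

For example, two taxa with folded spectra `[3, 1]` and `[1, 2]` give the same aggregate `[3, 1, 2, 1]` in either input order. This exchangeability across taxa is what allows one summary vector to describe a whole community under a hierarchical model.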
Figure 1. Hierarchical co‐demographic models. (a) Example instantaneous co‐expansion model. (b) Example instantaneous co‐contraction model. In both models, eight of the ten taxa are assigned to three synchronous co‐demographic pulses (ψ = 3; ζ = 0.8). The first pulse contains three taxa (ζ1 = 0.3), the second pulse another two taxa (ζ2 = 0.2) and the third pulse yet another three taxa (ζ3 = 0.3). Pulse 1 occurs at the most recent time (τ), pulse 2 at the intermediate time (τ) and pulse 3 at the most ancient time (τ). The remaining two taxa behave idiosyncratically in time relative to all other taxa (τ and τ). Each taxon receives independent draws of the nuisance demographic parameters ({ε1, …, ε10} and {N1, …, N10}).
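The assignment in Figure 1 can be sketched as a single draw from a hierarchical model. The Python below assumes a uniform time prior on [0, t_max] (a hypothetical choice) and converts each pulse proportion to a taxon count by rounding. It omits the nuisance parameters ε and N. Function and argument names are illustrative and are not taken from multi-dice.

```python
import random
from typing import Dict, List, Sequence, Tuple


def draw_codemographic_times(
    n_taxa: int,
    pulse_props: Sequence[float],
    t_max: float,
    rng: random.Random,
) -> Tuple[List[float], Dict[str, int]]:
    """Assign taxa to synchronous pulses and draw one time per event.

    pulse_props plays the role of the vector (zeta_1, ..., zeta_psi); taxa
    not covered by it become idiosyncratic, each with its own time. Times
    are uniform on [0, t_max] (hypothetical prior) and measured backwards
    from the present, so pulse 1 is the most recent.
    """
    counts = [round(p * n_taxa) for p in pulse_props]
    if sum(counts) > n_taxa:
        raise ValueError("pulse proportions assign more taxa than exist")
    psi = len(counts)
    sigma = n_taxa - sum(counts)
    pulse_times = sorted(rng.uniform(0.0, t_max) for _ in range(psi))

    taxa = list(range(n_taxa))
    rng.shuffle(taxa)  # which taxa join which pulse is random
    times = [0.0] * n_taxa
    start = 0
    for pulse_time, count in zip(pulse_times, counts):
        for taxon in taxa[start:start + count]:
            times[taxon] = pulse_time
        start += count
    for taxon in taxa[start:]:  # idiosyncratic taxa: one independent time each
        times[taxon] = rng.uniform(0.0, t_max)
    return times, {"psi": psi, "sigma": sigma, "Psi": psi + sigma}
```

With `n_taxa=10` and `pulse_props=[0.3, 0.2, 0.3]`, this reproduces the figure's configuration: ψ = 3, σ = 2 and therefore Ψ = ψ + σ = 5 distinct event times.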
Glossary of hyperparameters, parameter summaries, and parameters
| Hyper/parameter | Details |
|---|---|
| Ψ | Number of total events; hyperparameter that directly governs ζ and in turn governs τ; Ψ = ψ + σ |
| ψ | Number of synchronous pulse events; hyperparameter that directly governs ζ |
| ζ | Total proportion of taxa belonging to any of ψ pulses; ranges from 0.0 to 1.0; ζ |
| ζ | Vector of proportions of taxa belonging to each event, thus including ζ |
| ζ | Vector of proportions of taxa belonging to each pulse {ζ1, …, ζψ}, ordered from most recent to most ancient; hyperparameter that directly governs τ |
| ζ | An element of ζ or ζ |
|  | Conversion of ζ |
|  | Conversion of ζ |
| σ | Number of idiosyncratic events, and thus idiosyncratic taxa as well; determines length of τ |
| τ | Vector of times across |
| τ | Vector of synchronous pulse times corresponding to ζ |
| τ | Vector of idiosyncratic pulse times and similarly ordered from most recent to most ancient |
| ε | Vector of nuisance size change magnitudes in units of ratio from ancestral |
|  | Vector of nuisance |
|  | Total number of taxa in data set |
| β | Pulse buffer value, in units of number of generations, between pulses and thereby modifying the τ prior; though not explored here, if ε or |
| Ωτ | Dispersion index of τ, or Var(τ)/ |
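Two of the summaries in this glossary can be computed directly from a vector of per‐taxon times. The sketch below makes several assumptions. It takes Ωτ to be the conventional variance‐to‐mean ratio (the denominator is cut off in this record) computed with the population variance. It also assumes that a synchronous pulse always contains at least two taxa. The function names are hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance
from typing import Sequence, Tuple


def event_counts(taus: Sequence[float]) -> Tuple[int, int, int]:
    """Return (psi, sigma, Psi) from per-taxon times.

    A time shared by two or more taxa counts as a synchronous pulse, a time
    held by a single taxon as an idiosyncratic event, and Psi = psi + sigma.
    """
    multiplicity = Counter(taus)
    psi = sum(1 for n in multiplicity.values() if n > 1)
    sigma = sum(1 for n in multiplicity.values() if n == 1)
    return psi, sigma, psi + sigma


def dispersion_index(taus: Sequence[float]) -> float:
    """Assumed Omega_tau: population variance of the times over their mean."""
    return pvariance(taus) / mean(taus)
```

A fully synchronous community (all times equal) gives Ωτ = 0. Larger values indicate that the times are spread out relative to their mean.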
ζ values given an even distribution of ζ = 1.0 across the ψ pulses, for each value of ψ > 0
| ψ value | ζ |
|---|---|
| ψ = 1 | ζ1 = 1.0 |
| ψ = 2 | {ζ1, ζ2} = 0.5 |
| ψ = 3 | {ζ1, ζ2, ζ3} = {0.4, 0.3, 0.3} (in random order per simulation) |
| ψ = 4 | {ζ1, ζ2, ζ3, ζ4} = {0.3, 0.3, 0.2, 0.2} (in random order per simulation) |
| ψ = 5 | {ζ1, ζ2, ζ3, ζ4, ζ5} = 0.2 |
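The proportions in this table are consistent with splitting ten taxa (implied by the tenths and by the ten taxa of Figure 1) into ψ integer‐sized groups that differ by at most one taxon. The sketch below assumes that rule. It is our reconstruction, not code from multi-dice.

```python
import random
from typing import List, Optional


def even_pulse_sizes(n_taxa: int, psi: int, rng: Optional[random.Random] = None) -> List[int]:
    """Split n_taxa across psi pulses as evenly as integer counts allow.

    When n_taxa is not divisible by psi, each leftover taxon adds one to a
    different pulse; passing rng shuffles which pulses receive the extras.
    """
    if not 0 < psi <= n_taxa:
        raise ValueError("need 0 < psi <= n_taxa")
    base, extra = divmod(n_taxa, psi)
    sizes = [base + 1] * extra + [base] * (psi - extra)
    if rng is not None:
        rng.shuffle(sizes)  # random pulse order per simulation
    return sizes


def even_zeta(n_taxa: int, psi: int, rng: Optional[random.Random] = None) -> List[float]:
    """Pulse proportions (zeta_1, ..., zeta_psi) when all taxa are synchronous."""
    return [size / n_taxa for size in even_pulse_sizes(n_taxa, psi, rng)]


for psi in range(1, 6):
    print(psi, even_zeta(10, psi))
```

Running the loop prints one row per ψ value, for example `3 [0.4, 0.3, 0.3]` and `4 [0.3, 0.3, 0.2, 0.2]`, matching the table before shuffling.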
Specifications of subset reference tables for truncating hyperprior range simulation experiment
| Subset reference table hyperprior | Total simulations (based on 100,000 per ψ value) | Total PODs (based on 20 per ψ value) | Total sub‐sampled simulations for each cycle of 10 hRF decision trees (based on 1,000 per ψ value) | Remaining simulations for hRF sub‐sampling once PODs removed | hABC accepted tolerance level (leading to 1,500 retained simulations) |
|---|---|---|---|---|---|
| ψ ~ | 600,000 | 120 | 6,000 | 599,880 | 0.00250 |
| ψ ~ | 500,000 | 100 | 5,000 | 499,900 | 0.00300 |
| ψ ~ | 400,000 | 80 | 4,000 | 399,920 | 0.00375 |
| ψ ~ | 300,000 | 60 | 3,000 | 299,940 | 0.00500 |
| ψ ~ | 200,000 | 40 | 2,000 | 199,960 | 0.00750 |
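Every column of this table follows from the number of ψ values a hyperprior admits (six for the 600,000‐simulation table, down to two for the 200,000‐simulation table) and the per‐ψ constants given in the header. The arithmetic is sketched below. We assume the tolerance is 1,500 divided by the total number of simulations. Dividing by the simulations remaining after POD removal would round to the same listed values, so the table alone does not settle this.

```python
from typing import Dict


def subset_table_spec(
    n_psi_values: int,
    sims_per_psi: int = 100_000,
    pods_per_psi: int = 20,
    subsample_per_psi: int = 1_000,
    retained: int = 1_500,
) -> Dict[str, float]:
    """Bookkeeping for one subset reference table (hypothetical helper).

    The hABC tolerance is taken as the fraction of the full reference table
    that must be accepted to keep `retained` simulations.
    """
    total = n_psi_values * sims_per_psi
    pods = n_psi_values * pods_per_psi
    return {
        "total_sims": total,
        "pods": pods,
        "hrf_subsample_per_cycle": n_psi_values * subsample_per_psi,
        "remaining_after_pods": total - pods,
        "habc_tolerance": retained / total,
    }


for k in (6, 5, 4, 3, 2):
    spec = subset_table_spec(k)
    print(k, spec["total_sims"], spec["pods"], spec["remaining_after_pods"], spec["habc_tolerance"])
```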
Results for testing inferential frameworks simulation experiment
|  | Instantaneous co‐expansion |  | Instantaneous co‐contraction |  |
|---|---|---|---|---|
| hRF prediction of Ψ | .600 | 2.22 | .807 | 1.77 |
| hRF coupled with PLS prediction of Ψ | .469 | 2.44 | .831 | 1.73 |
| hABC hyperparameter estimation of Ψ | ||||
| tol. = 0.0050 | ||||
| Mean | .500 | 2.41 | .800 | 1.77 |
| Median | .426 | 2.85 | .733 | 2.03 |
| Mode | .413 | 3.19 | .602 | 2.67 |
| tol. = 0.0010 | ||||
| Mean | .534 | 2.36 | .800 | 1.77 |
| Median | .428 | 2.85 | .735 | 2.03 |
| Mode | .427 | 3.05 | .631 | 2.53 |
| tol. = 0.0005 | ||||
| Mean | .547 | 2.34 | .802 | 1.76 |
| Median | .495 | 2.71 | .758 | 1.95 |
| Mode | .481 | 2.94 | .666 | 2.40 |
| hABC coupled with PLS hyperparameter estimation of Ψ | ||||
| tol. = 0.0050 | ||||
| Mean | .323 | 2.75 | .612 | 2.83 |
| Median | .251 | 2.75 | .392 | 2.99 |
| Mode | .234 | 2.75 | .301 | 2.99 |
| tol. = 0.0010 | ||||
| Mean | .384 | 2.67 | .641 | 2.61 |
| Median | .267 | 2.74 | .466 | 2.82 |
| Mode | .277 | 2.76 | .385 | 2.88 |
| tol. = 0.0005 | ||||
| Mean | .402 | 2.64 | .665 | 2.52 |
| Median | .221 | 2.77 | .457 | 2.84 |
| Mode | .202 | 2.85 | .397 | 2.90 |
| hCL optimization of Ψ | .027 | 4.10 | .259 | 3.49 |
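The hABC rows summarize the accepted Ψ values by their mean, median and mode at several tolerances. The sketch below is a minimal rejection ABC and makes several assumptions. It reads the tolerance as the fraction of the reference table that is accepted. It scales each statistic by its standard deviation, uses Euclidean distance, and takes the mode as the most frequent accepted value. Established ABC software may instead use regression adjustment or a kernel‐density mode, so this is not the procedure behind the reported numbers.

```python
import math
from statistics import mean, median, multimode, pstdev
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, Sequence[float]]  # (Psi value, summary statistics)


def rejection_abc(reference: List[Row], observed: Sequence[float], tol: float) -> Dict[str, float]:
    """Accept the closest fraction `tol` of simulations and summarize their Psi.

    Each statistic is scaled by its standard deviation across the table
    (falling back to 1.0 for constant statistics) before computing
    Euclidean distances to the observed vector.
    """
    scales = [pstdev([row[1][j] for row in reference]) or 1.0 for j in range(len(observed))]

    def distance(stats: Sequence[float]) -> float:
        return math.sqrt(sum(((s - o) / sc) ** 2 for s, o, sc in zip(stats, observed, scales)))

    ranked = sorted(reference, key=lambda row: distance(row[1]))
    n_keep = max(1, round(tol * len(reference)))
    accepted = [psi for psi, _ in ranked[:n_keep]]
    return {
        "mean": mean(accepted),
        "median": median(accepted),
        "mode": min(multimode(accepted)),  # ties broken toward fewer events
    }
```

Smaller tolerances keep fewer but closer simulations. This is the trade‐off that the tol. = 0.0050, 0.0010 and 0.0005 rows explore.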
Results for pulse buffer on prior space simulation experiment
|  | β = 0 generations |  | β = 30,000 generations |  | PODs: β = 0; reference table: β = 30,000 |  |
|---|---|---|---|---|---|---|
| hRF prediction of Ψ | .609 | 2.32 | .758 | 1.91 | .666 | 2.26 |
| hABC model selection of Ψ | ||||||
| Mean | .600 | 2.37 | .750 | 1.96 | .617 | 2.43 |
| Median | .557 | 2.65 | .686 | 2.20 | .596 | 2.71 |
| Mode | .507 | 3.07 | .722 | 2.18 | .527 | 2.91 |
| hABC parameter summary estimation of Ωτ | ||||||
| Mean | .932 | 7555 | .874 | 9750 | .904 | 11009 |
| Median | .886 | 12616 | .860 | 11120 | .905 | 11042 |
| Mode | .846 | 13727 | .889 | 12775 | .826 | 15227 |
| hABC parameter summary estimation of | ||||||
| Mean | .945 | 14550 | .927 | 12539 | .962 | 13072 |
| Median | .920 | 14199 | .946 | 11738 | .962 | 12923 |
| Mode | .915 | 15983 | .949 | 12222 | .957 | 13644 |
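The glossary describes β as a buffer, in generations, between pulses that modifies the τ prior. One simple way to impose such a buffer is rejection sampling, sketched below. This record does not say whether multi-dice uses rejection or a direct construction. The time bounds and function name are hypothetical, and the buffer is applied only between synchronous pulses.

```python
import random
from typing import List


def draw_buffered_pulse_times(
    psi: int,
    t_min: float,
    t_max: float,
    beta: float,
    rng: random.Random,
    max_tries: int = 10_000,
) -> List[float]:
    """Draw psi pulse times uniformly on [t_min, t_max], rejecting any draw in
    which two consecutive pulses are fewer than beta generations apart.

    Accepted draws are uniform over the buffered region; beta = 0 recovers
    the unbuffered prior.
    """
    if psi > 1 and beta * (psi - 1) > t_max - t_min:
        raise ValueError("buffer too wide for the time prior")
    for _ in range(max_tries):
        times = sorted(rng.uniform(t_min, t_max) for _ in range(psi))
        if all(later - earlier >= beta for earlier, later in zip(times, times[1:])):
            return times
    raise RuntimeError("no draw satisfied the buffer; raise max_tries or lower beta")
```

Because rejection discards draws with closely spaced pulses, a buffer such as β = 30,000 generations shifts prior mass toward more dispersed pulse times.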
Results for truncating hyperprior range simulation experiment
|  | ψ ~ |  | ψ ~ |  | ψ ~ |  | ψ ~ |  | ψ ~ |  |
|---|---|---|---|---|---|---|---|---|---|---|
| hRF prediction of Ψ | .987 | 0.73 | .897 | 1.79 | .809 | 2.08 | .756 | 2.07 | .758 | 1.91 |
| hABC model selection of Ψ | ||||||||||
| Mean | .963 | 1.22 | .901 | 1.79 | .808 | 2.13 | .754 | 2.10 | .750 | 1.96 |
| Median | .900 | 2.01 | .830 | 2.35 | .705 | 2.65 | .711 | 2.37 | .686 | 2.20 |
| Mode | .900 | 2.01 | .864 | 2.11 | .811 | 2.16 | .744 | 2.35 | .722 | 2.18 |
Figure 2. Flowchart of multi‐dice usage. multi‐dice performs multi‐taxa co‐demographic inference under a hierarchical model in three major steps: model specification, single‐population simulation across multiple taxa and conversion of simulated data to multi‐taxa summary statistics. Hierarchical co‐demographic model specification is carried out across several functions in sequence, with each preceding function nested within the next. This nesting extends to dice.sims(), so the entire model specification process can be performed in the same call as data simulation. Simulated data can then be converted to multi‐taxa summary statistics by either dice.aSFS() or dice.sumstats(), depending on the data type, and these functions can also be applied to empirical data. In the simplest usage, only two multi‐dice calls, dice.sims() followed by dice.aSFS() or dice.sumstats(), are needed to construct a reference table of multi‐taxa summary statistics under a hierarchical co‐demographic model. This reference table can then be used in downstream software for hRF or hABC, where appropriate statistical practices should be applied to examine robustness and fit. Importantly, exploratory analyses should be performed on the empirical data before deploying multi‐dice to guide its usage, for example to determine sensible prior distributions and to evaluate differences among taxa.
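The three steps in the flowchart can be mirrored in a small Python sketch that builds a reference table row by row. It is not multi-dice. The per‐taxon simulator is a hypothetical placeholder for external simulation software. All taxa are treated as synchronous, so idiosyncratic taxa are omitted, and nuisance parameters are also left out. The summary step uses the aSFS‐style bin‐wise sort. All names are illustrative.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Tuple[int, List[int]]  # (psi, multi-taxa summary statistics)


def build_reference_table(
    n_sims: int,
    psi_values: Sequence[int],
    n_taxa: int,
    t_max: float,
    simulate_taxon: Callable[[float, random.Random], List[int]],
    rng: random.Random,
) -> List[Row]:
    """Model draw -> per-taxon simulation -> multi-taxa summary, once per row."""
    if not all(1 <= p <= n_taxa for p in psi_values):
        raise ValueError("each psi must lie between 1 and n_taxa")
    table: List[Row] = []
    for _ in range(n_sims):
        # 1. Model specification: draw psi, then one time per pulse.
        psi = rng.choice(psi_values)
        pulse_times = [rng.uniform(0.0, t_max) for _ in range(psi)]
        # Simplification: every taxon is synchronous, assigned round-robin so each pulse is used.
        taxon_times = [pulse_times[i % psi] for i in range(n_taxa)]
        # 2. Single-population simulation: one data set per taxon.
        spectra = [simulate_taxon(t, rng) for t in taxon_times]
        # 3. Summarization: sort each bin across taxa, then concatenate bins.
        n_bins = len(spectra[0])
        summary = [v for b in range(n_bins) for v in sorted((s[b] for s in spectra), reverse=True)]
        table.append((psi, summary))
    return table


def toy_simulator(t: float, rng: random.Random) -> List[int]:
    """Hypothetical stand-in for a coalescent simulator: a 3-bin spectrum."""
    return [rng.randint(0, 20) + int(t // 10_000), rng.randint(0, 10), rng.randint(0, 5)]
```

A call such as `build_reference_table(50, [1, 2, 3], 5, 100_000.0, toy_simulator, random.Random(0))` yields 50 rows of 15 statistics each, which a downstream hABC or hRF step could then consume.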