Stéphane Guindon, Nicola De Maio.
Abstract
Statistical phylogeography provides useful tools to characterize and quantify the spread of organisms during the course of evolution. Analyzing georeferenced genetic data often relies on the assumption that samples are preferentially collected in densely populated areas of the habitat. Deviation from this assumption negatively impacts the inference of the spatial and demographic dynamics. This issue is pervasive in phylogeography. It affects analyses that approximate the habitat as a set of discrete demes as well as those that treat it as a continuum. The present study introduces a Bayesian modeling approach that explicitly accommodates spatial sampling strategies. An original inference technique, based on recent advances in statistical computing, is then described that is most suited to modeling data where sequences are preferentially collected at certain locations, independently of the outcome of the evolutionary process. The analysis of georeferenced genetic sequences from the West Nile virus in North America along with simulated data shows how assumptions about spatial sampling may impact our understanding of the forces shaping biodiversity across time and space.
Keywords: Bayesian inference; West Nile virus; phylogeography; sampling design; statistical modeling
Year: 2021 PMID: 34930835 PMCID: PMC8719894 DOI: 10.1073/pnas.2105273118
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Analyses of WNV data under the detection and survey sampling schemes. (A) Estimated location of the MRCA obtained under the detection scheme. (B) Estimated location of the MRCA obtained under the survey scheme. The black density line delineates the 95% credibility interval for this parameter. Solid and shaded dots on the maps correspond to the sampled locations. (C) Posterior densities of Kingman’s coalescent effective population sizes. (D) The age of the root node. (E) The dispersal distance per year (in kilometers). (F) The exponential growth parameter. Distributions in blue and red were obtained under the detection and survey schemes, respectively.
Accuracy and precision of dispersal rate estimates under seven spatial sampling designs: comparison of the detection and survey schemes
| Design | Coordinate | Detection: [mean 0.025 quantile, mean 0.975 quantile] | Detection: % correct | Survey: [mean 0.025 quantile, mean 0.975 quantile] | Survey: % correct |
|---|---|---|---|---|---|
| 1) | Lat. | [0.78, 1.80] | 0.90 | [0.94, 3.32] | 0.65 |
| 1) | Lon. | [0.75, 1.72] | 0.90 | [0.92, 3.34] | 0.68 |
| 2) | Lat. | [0.11, 0.22] | 0 | [0.27, 2.32] | 0.43 |
| 2) | Lon. | [0.12, 0.24] | 0 | [0.28, 2.93] | 0.53 |
| 3) | Lat. | [0.38, 0.87] | 0.30 | [0.78, 14.64] | 0.80 |
| 3) | Lon. | [0.39, 0.88] | 0.28 | [0.76, 12.30] | 0.83 |
| 4) | Lat. | [0.74, 1.72] | 0.93 | [0.83, 2.49] | 0.78 |
| 4) | Lon. | [0.27, 0.65] | 0.13 | [0.59, 4.75] | 0.90 |
| 5) | Lat. | [0.74, 1.72] | 0.90 | [0.87, 3.02] | 0.78 |
| 5) | Lon. | [0.54, 1.25] | 0.75 | [0.67, 2.50] | 0.90 |
| 6) | Lat. | [0.10, 0.18] | 0 | [0.16, 1.09] | 0.23 |
| 6) | Lon. | [0.47, 1.05] | 0.63 | [0.97, 4.20] | 0.55 |
| 7) | Lat. | [0.77, 1.76] | 0.88 | [0.93, 3.48] | 0.65 |
| 7) | Lon. | [0.79, 1.79] | 0.90 | [0.92, 3.16] | 0.68 |
The lower (respectively upper) bound of each bracketed interval is the average, taken over 40 simulation replicates, of the 0.025 (respectively 0.975) quantile of the posterior distribution for the corresponding dispersal parameter. The “% correct” column gives the proportion of simulated datasets where the 95% HPD interval brackets 1.0, the true value of the dispersal parameters.
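The two summaries reported in the table can be illustrated with a minimal Python sketch. It assumes each replicate is available as a list of posterior draws of a dispersal parameter; the function names (`quantile`, `hpd_interval`, `summarize`) and the toy log-normal draws are hypothetical and are not taken from the study or its software. The HPD interval is approximated here as the shortest window containing 95% of the sorted draws, which is one common sample-based estimate and may differ from the estimator used by the authors.

```python
import random
import statistics

def quantile(sorted_xs, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])

def hpd_interval(sorted_xs, mass=0.95):
    """Shortest interval containing a fraction `mass` of the sorted draws."""
    n = len(sorted_xs)
    k = max(1, int(round(mass * n)))  # number of draws inside the interval
    widths = [(sorted_xs[i + k - 1] - sorted_xs[i], i) for i in range(n - k + 1)]
    _, start = min(widths)            # narrowest window wins
    return sorted_xs[start], sorted_xs[start + k - 1]

def summarize(replicates, true_value=1.0):
    """Return (mean 0.025 quantile, mean 0.975 quantile, HPD coverage)."""
    lows, highs, hits = [], [], 0
    for draws in replicates:
        xs = sorted(draws)
        lows.append(quantile(xs, 0.025))
        highs.append(quantile(xs, 0.975))
        lo, hi = hpd_interval(xs)
        hits += lo <= true_value <= hi  # does the 95% HPD bracket the truth?
    return statistics.mean(lows), statistics.mean(highs), hits / len(replicates)

# Toy input only: 40 replicates of synthetic draws, NOT the paper's data.
rng = random.Random(0)
replicates = [[rng.lognormvariate(0.0, 0.2) for _ in range(2000)]
              for _ in range(40)]
mean_low, mean_high, coverage = summarize(replicates)
print(f"[{mean_low:.2f}, {mean_high:.2f}]  coverage = {coverage:.2f}")
```

Note that the interval bounds are averages of equal-tailed quantiles while the coverage uses the HPD interval, mirroring the distinction made in the table note; for skewed posteriors the two intervals need not coincide.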