Toni Safner, Mark P Miller, Brad H McRae, Marie-Josée Fortin, Stéphanie Manel.
Abstract
Techniques for identifying clusters of individuals, or boundaries between clusters, from genetic data on natural populations have recently multiplied, so these techniques need to be evaluated against one another. We used spatially explicit simulation models to compare three spatial Bayesian clustering programs and two edge detection methods. We simulated spatially structured populations in which a continuous population was subdivided by barriers. We evaluated the ability of each method to correctly identify boundary locations while varying (i) time since divergence, (ii) strength of isolation by distance, (iii) level of genetic diversity, and (iv) amount of gene flow across barriers. To further evaluate how effectively the methods detect genetic clusters in natural populations, we used previously published data on North American pumas and a European shrub. Our results show that, with both simulated and empirical data, the Bayesian spatial clustering algorithms outperformed the direct edge detection methods. All methods incorrectly detected boundaries in the presence of strong isolation by distance. Based on this finding, we support applying Bayesian spatial clustering algorithms for boundary detection in empirical datasets, provided that the influence of isolation by distance is tested.
Keywords: edge detection methods; genetic boundaries; landscape genetics; spatial Bayesian clustering
Year: 2011 PMID: 21541031 PMCID: PMC3083678 DOI: 10.3390/ijms12020865
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
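The abstract recommends testing for the influence of isolation by distance before interpreting inferred boundaries. One common test, not necessarily the one used in this study, is a Mantel test correlating pairwise genetic and geographic distances. The sketch below is a minimal pure-Python version; the function names, the permutation count and the one-sided p-value are illustrative assumptions, not the authors' procedure.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Upper-triangle entries (i < j) of a square matrix after relabelling rows/columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(gen_dist, geo_dist, permutations=999, seed=1):
    """Mantel test of isolation by distance (hypothetical helper).

    Returns the observed correlation r between genetic and geographic distances
    and a one-sided permutation p-value for r being larger than expected by chance.
    Individual labels are permuted in the genetic matrix only.
    """
    n = len(gen_dist)
    identity = list(range(n))
    geo = upper_triangle(geo_dist, identity)
    r_obs = pearson(upper_triangle(gen_dist, identity), geo)
    rng = random.Random(seed)
    n_extreme = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if pearson(upper_triangle(gen_dist, order), geo) >= r_obs:
            n_extreme += 1
    return r_obs, n_extreme / (permutations + 1)


# Ten individuals along a transect; genetic distance grows with sqrt(spatial distance).
positions = list(range(10))
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[math.sqrt(abs(a - b)) for b in positions] for a in positions]
r, p = mantel_test(gen, geo, permutations=199)
print(f"r = {r:.3f}, p = {p:.3f}")
```

In this toy transect the correlation is strong and significant: exactly the kind of continuous genetic gradient that, according to the abstract, can lead every method to report spurious boundaries.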
Main characteristics of software packages compared in this study.
| Software | BAPS5 | TESS | GENELAND | WOMBSOFT | AIS (Monmonier’s algorithm) |
| Model | Spatial Bayesian clustering | Spatial Bayesian clustering | Spatial Bayesian clustering | Nonparametric | Nonparametric |
| Analytical and stochastic methods | Markov chain Monte Carlo | Markov chain Monte Carlo | | | |
| Spatial | Colored Voronoi tessellation based on discrete sampling sites | Hidden Markov random field | Free colored Voronoi tessellation based on a continuous Poisson point process | Geographic coordinates are included in the local weighted regression | Delaunay triangulation ( |
| Clustering criteria | None | None | | | |
| Local edge detection criteria | None | None | None | Average rate of change based on individuals located within a kernel of a given bandwidth | High rate of change between pairs of individuals linked in the Delaunay triangulation |
| Data | Co-dominant and dominant | Co-dominant | Co-dominant and dominant | Co-dominant, dominant and categorical data | Co-dominant, dominant and sequence data |
| Platforms | Windows, Unix/Linux, Mac OS X | Windows, Unix/Linux | R package (Windows, Unix/Linux, Mac OS X) | R package (Windows, Unix/Linux, Mac OS X) | Windows |
HWE: Hardy-Weinberg equilibrium
LE: linkage equilibrium
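The table describes WOMBSOFT's edge criterion as an average rate of change computed from individuals within a kernel of a given bandwidth, with geographic coordinates entering a local weighted regression. The sketch below illustrates only that general idea: a Gaussian-kernel weighted linear fit of one genetic variable on coordinates, whose slope magnitude serves as the local rate of change. The Gaussian kernel, the single-variable response and the helper names are assumptions; this is not WOMBSOFT's code or its exact statistic.

```python
import math


def solve3(a, b):
    """Solve the 3x3 linear system a x = b (Gaussian elimination, partial pivoting)."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def local_rate_of_change(points, values, x0, y0, bandwidth):
    """Fit value ~ b0 + b1*(x - x0) + b2*(y - y0) by least squares with Gaussian
    kernel weights centred on (x0, y0); return the slope magnitude sqrt(b1^2 + b2^2)."""
    xtwx = [[0.0] * 3 for _ in range(3)]
    xtwy = [0.0] * 3
    for (x, y), v in zip(points, values):
        dx, dy = x - x0, y - y0
        w = math.exp(-(dx * dx + dy * dy) / (2 * bandwidth ** 2))
        row = (1.0, dx, dy)
        for i in range(3):
            xtwy[i] += w * row[i] * v
            for j in range(3):
                xtwx[i][j] += w * row[i] * row[j]
    _, b1, b2 = solve3(xtwx, xtwy)
    return math.hypot(b1, b2)


# Individuals on a 10 x 10 grid; a hypothetical allele is present only east of x = 4.5.
grid = [(x, y) for x in range(10) for y in range(10)]
allele = [1.0 if x >= 5 else 0.0 for x, _ in grid]
near = local_rate_of_change(grid, allele, 4.5, 4.5, bandwidth=1.0)  # at the step
far = local_rate_of_change(grid, allele, 1.0, 4.5, bandwidth=1.0)   # away from it
print(near, far)
```

On a purely linear field the weighted fit recovers the true gradient; across a step the rate of change peaks near the step, which is where a boundary element would be flagged. The bandwidth plays the role of the kernel size listed among WOMBSOFT's input parameters.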
Glossary of technical terms used.
| Hidden Markov random field | A hidden Markov random field model is a special case of hidden Markov models (HMMs). An HMM is a |
| Markov chain Monte Carlo | Markov chain Monte Carlo methods are a class of |
| Neighborhood graphs | Neighborhood graphs capture proximity between points by connecting nearby points with a graph edge. The many possible ways to define nearby points lead to a variety of neighborhood graph types, such as those derived from the Voronoi tessellation and the Delaunay triangulation. |
| Voronoi tessellation | Given a set of |
| Delaunay triangulation | The Delaunay triangulation connects the geographic positions of neighboring samples on a map, producing a network of triangles that links all samples. No sample point lies inside the circumcircle of any triangle. |
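The glossary defines the Delaunay triangulation by its empty-circumcircle property. The sketch below checks that property with the standard in-circle determinant test; it verifies a given triangulation rather than constructing one, and the function names are illustrative.

```python
def orientation(a, b, c):
    """Twice the signed area of triangle abc (positive if a, b, c are counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of triangle abc."""
    if orientation(a, b, c) < 0:
        b, c = c, b  # reorder so the triangle is counter-clockwise
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    return det > 0


def is_delaunay(points, triangles):
    """Check the empty-circumcircle property for triangles given as index triples."""
    for i, j, k in triangles:
        for m, p in enumerate(points):
            if m not in (i, j, k) and in_circumcircle(points[i], points[j], points[k], p):
                return False
    return True


# Four sampling points forming a kite; the short diagonal joins points 1 and 3.
kite = [(0, 0), (2, -1), (4, 0), (2, 1)]
print(is_delaunay(kite, [(0, 1, 3), (1, 2, 3)]))  # True: short diagonal
print(is_delaunay(kite, [(0, 1, 2), (0, 2, 3)]))  # False: long diagonal
```

With the long diagonal, point (2, 1) falls inside the circumcircle of the triangle (0, 0), (2, -1), (4, 0), so that triangulation is rejected. Edge-detection methods built on this graph then compare genetic distances only between individuals joined by a Delaunay link.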
Input parameters used for the Bayesian clustering methods, WOMBSOFT and Monmonier’s algorithm (AIS) in our application.
| Method | Parameter | Simulated data | Puma | Rhododendron |
| BAPS5 | K | 1–6 | 1–8 | 1–10 |
| | Number of replications | 10 | 10 | 10 |
| TESS | K | 1–6 | 1–7 | 1–7 |
| | Psi | 0–0.6 | 0.6–1 | 0.6–1 |
| | Number of sweeps | 10,000 | 100,000 | 100,000 |
| | Burn-in period | 2000 | 10,000 | 10,000 |
| | Number of runs | 10 | 10 | 10 |
| | Admixture parameter | Yes and no | Yes and no | |
| GENELAND | K | 1–6 | 1–7 | 1–7 |
| | Number of iterations | 50,000 | 100,000 | 100,000 |
| | Thinning | 10 | 10 | 10 |
| | Number of replications | 10 | 10 | 10 |
| | Allele frequencies | Correlated | Correlated | Correlated |
| WOMBSOFT | Grid resolution | 100 × 100 | 100 × 100 | 34 × 16 |
| | Bandwidth | 7 | 70 km | 30 km |
| | Binomial threshold | 0.3 | 0.3 | 0.3 |
| | Significance level of the binomial test | 0.05 | 0.01 | 0.05 |
| Monmonier’s algorithm | Genetic distances specified | Residual | Raw and residuals | Raw and residuals |
| | Number of barriers to be identified | 4 | 1–7 | 1–7 |
K: maximum number of clusters.
Psi: the interaction parameter of TESS, interpretable as the strength with which two neighbors tend to belong to the same cluster. The higher the value of psi, the more likely the population is to consist of a single cluster with a high level of genetic continuity.
The admixture model was used even though our data are known to contain no admixture.
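The footnote describes psi as the strength with which neighbors tend to share a cluster. Under a textbook Potts-type hidden Markov random field prior, a labelling's weight is proportional to exp(psi × number of neighboring pairs with equal labels). The sketch below enumerates all labellings of a tiny neighbor graph to show that the prior probability of a single cluster rises with psi. It is a prior-only illustration under that assumed form; TESS's actual parameterization and likelihood are not reproduced, and the function names are hypothetical.

```python
import itertools
import math


def potts_log_weight(labels, edges, psi):
    """Unnormalized log prior of a labelling: psi times the number of
    neighbouring pairs (edges of the neighbourhood graph) with equal labels."""
    return psi * sum(1 for i, j in edges if labels[i] == labels[j])


def prob_single_cluster(n, edges, k, psi):
    """Prior probability, by full enumeration, that all n individuals share
    one of k labels. Only feasible for tiny n and k."""
    total = 0.0
    single = 0.0
    for labels in itertools.product(range(k), repeat=n):
        w = math.exp(potts_log_weight(labels, edges, psi))
        total += w
        if len(set(labels)) == 1:
            single += w
    return single / total


# Four individuals along a transect, each linked to its immediate neighbours.
chain = [(0, 1), (1, 2), (2, 3)]
for psi in (0.0, 0.6, 1.0, 2.0):
    print(psi, round(prob_single_cluster(4, chain, 2, psi), 3))
```

With psi = 0 every labelling is equally likely, so four individuals with two possible labels form a single cluster with probability 2/16 = 0.125; larger psi pushes this probability toward 1, matching the footnote's interpretation.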
Figure 1. Results obtained with simulated datasets at generation 2,500. Genetic boundaries were simulated using μ = 0.000025 and δ = 11 for panels A, B, C, and D, and μ = 0.000025 and δ = 1 for panel E. (A) Monmonier's algorithm. Thin lines represent the Delaunay triangulation of sampling points. Boundaries are shown as black lines of varying width; wider boundaries were detected first. (B) WOMBSOFT. Circles represent sampling points. Green areas represent homogeneous zones, while boundaries are shown in light grey. (C) TESS. Dots represent sampling points. Lines separate Voronoi tessellation polygons. Colors represent individual spatial cluster membership. (D) GENELAND. Dots represent sampling points. Colors represent individual spatial cluster membership. (E) BAPS5. Lines separate Voronoi tessellation polygons. Colors represent individual spatial cluster membership. BAPS5 overestimated the number of populations, mostly for datasets with small dispersal distances.
Mean number of alleles, mean gene diversity and mean isolation-by-distance (IBD) slope observed in the analyzed datasets of 200 individuals for each parameter combination. The mean IBD slope was calculated from replicate datasets at the pre-barrier stage (generation 0). Standard deviations are given in brackets. μ denotes mutation rate and δ average dispersal distance. Parentheses give the percentage of individual tests for each parameter combination that yielded a significant slope at α = 0.05.
| Mean number of alleles | Mean gene diversity | Mean IBD slope |
| 24.9 [0.59] | 0.86 [0.0088] | 0.265 [0.02] (100%) |
| 9.5 [0.68] | 0.62 [0.0073] | 0.202 [0.03] (100%) |
| 17.8 [1.28] | 0.79 [0.0182] | 0.004 [0.003] (12%) |
| 6.32 [0.36] | 0.49 [0.0098] | 0.002 [0.002] (24%) |
Calculated across generation times and repetitions.
Calculated for generation 0 only, across repetitions.
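The mean number of alleles and mean gene diversity above are per-locus statistics averaged across loci. The sketch below assumes diploid genotypes and Nei's unbiased gene diversity, 2N/(2N − 1)(1 − Σp²) for N individuals; the source does not state which estimator was used, and the function names and data are hypothetical.

```python
from collections import Counter


def locus_summary(genotypes):
    """Number of alleles and Nei's unbiased gene diversity at one diploid locus.

    genotypes: list of (allele1, allele2) tuples, one per individual.
    Gene diversity is 2N/(2N - 1) * (1 - sum of squared allele frequencies).
    """
    copies = [allele for genotype in genotypes for allele in genotype]
    n = len(copies)  # 2N gene copies
    counts = Counter(copies)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return len(counts), n / (n - 1) * (1 - sum_p2)


def mean_over_loci(loci):
    """Average number of alleles and gene diversity across loci."""
    summaries = [locus_summary(genotypes) for genotypes in loci]
    mean_alleles = sum(s[0] for s in summaries) / len(summaries)
    mean_diversity = sum(s[1] for s in summaries) / len(summaries)
    return mean_alleles, mean_diversity


# Hypothetical data: two individuals scored at two loci.
loci = [
    [(1, 2), (1, 2)],  # two equally common alleles
    [(7, 7), (7, 7)],  # monomorphic locus
]
print(mean_over_loci(loci))
```

Here the polymorphic locus contributes a gene diversity of 2/3 and the monomorphic locus 0, giving means of 1.5 alleles and 1/3 across loci.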
Percentage of correct inferences in simulated data for each parameter combination (25 replicates each) and each generation time. At t = 0, an inference is considered correct if no boundary is detected. At all other generation times, an inference is correct if the two main barriers to gene flow are detected (i.e., K = 4). μ denotes mutation rate, δ average dispersal distance, and b boundary permeability (i.e., gene flow across the boundary). The mean number of clusters (averaged over replicates) estimated by the Bayesian clustering methods is given in brackets.
| 0 | 0 | 0 | 0 [5.4] | 0 [5.2] | 0 [2.0] | 0 [6.0] | |
| 100 | 0 | 0 | 0 [5.2] | 0 [4.8] | 0 [5.2] | 0 [6.0] | |
| 500 | 36 | 0 | 72 [4.3] | 4 [4.6] | 0 [5.2] | 0 [6.0] | |
| 1000 | 68 | 8 | 84 [4.2] | 52 [4.1] | 4 [5.1] | 0 [6.0] | |
| 3000 | 96 | 20 | 100 [4.0] | 100 [4.0] | 60 [4.5] | 0 [5.9] | |
| 5000 | 100 | 24 | 100 [4.0] | 92 [4.0] | 88 [4.1] | 0 [5.9] | |
| 0 | 0 | 0 | 0 [5.4] | 0 [4.9] | 4 [5.0] | 0 [6.0] | |
| 100 | 0 | 0 | 0 [5.5] | 0 [4.6] | 0 [5.0] | 0 [6.0] | |
| 500 | 4 | 0 | 36 [4.7] | 0 [4.6] | 0 [5.2] | 0 [5.9] | |
| 1000 | 8 | 0 | 68 [4.3] | 12 [4.3] | 0 [5.6] | 12 [5.6] | |
| 3000 | 40 | 28 | 100 [4.0] | 60 [4.1] | 48 [4.7] | 16 [5.2] | |
| 5000 | 68 | 40 | 100 [4.0] | 80 [4.0] | 68 [4.4] | 64 [4.5] | |
| 0 | 0 | 0 | 100 [1.0] | 0 [3.5] | 100 [1.0] | 100 [1.0] | |
| 100 | 0 | 0 | 0 [2.6] | 0 [3.9] | 100 [4.0] | 0 [1.0] | |
| 500 | 36 | 32 | 100 [4.0] | 4 [4.0] | 100 [4.0] | 100 [4.0] | |
| 1000 | 60 | 72 | 100 [4.0] | 32 [4.0] | 100 [4.0] | 100 [4.0] | |
| 3000 | 96 | 92 | 100 [4.0] | 92 [4.0] | 100 [4.0] | 100 [4.0] | |
| 5000 | 100 | 88 | 100 [4.0] | 96 [4.0] | 100 [4.0] | 100 [4.0] | |
| 0 | 0 | 0 | 100 [1.0] | 0 [3.6] | 100 [1.0] | 100 [1.0] | |
| 100 | 0 | 0 | 0 [1.4] | 0 [3.8] | 96 [4.0] | 0 [1.0] | |
| 500 | 0 | 0 | 56 [4.0] | 0 [4.0] | 100 [4.0] | 44 [4.0] | |
| 1000 | 0 | 24 | 88 [4.0] | 4 [4.0] | 100 [4.0] | 96 [4.0] | |
| 3000 | 40 | 92 | 100 [4.0] | 40 [4.1] | 100 [4.0] | 100 [4.0] | |
| 5000 | 64 | 100 | 100 [4.0] | 80 [4.0] | 100 [4.0] | 100 [4.0] | |
| 0 | 0 | 0 | 100 [1.0] | 0 [2.4] | 100 [1.0] | 100 [1.0] | |
| 100 | 0 | 0 | 0 [1.0] | 0 [2.3] | 100 [4.0] | 0 [1.0] | |
| 500 | 0 | 0 | 0 [1.9] | 0 [2.3] | 100 [4.0] | 0 [1.0] | |
| 1000 | 0 | 0 | 0 [1.9] | 0 [2.4] | 100 [4.0] | 0 [1.0] | |
| 3000 | 0 | 0 | 0 [2.0] | 0 [2.6] | 100 [4.0] | 0 [1.0] | |
| 5000 | 0 | 0 | 0 [2.0] | 0 [2.9] | 100 [4.0] | 0 [1.0] | |
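The table caption scores each replicate as correct or incorrect and reports the mean estimated number of clusters in brackets. The sketch below applies that rule to a list of estimated K values from a Bayesian clustering method, reading "no boundary detected" at t = 0 as K = 1 (an interpretation). The helper names and the replicate values in the example are made up for illustration, not data from the study.

```python
def is_correct(generation, k_estimated):
    """Scoring rule from the table caption, read for Bayesian clustering output:
    at t = 0 a replicate is correct only if no boundary is inferred (K = 1);
    at later generations only if both main barriers are recovered (K = 4)."""
    if generation == 0:
        return k_estimated == 1
    return k_estimated == 4


def summarize(generation, k_per_replicate):
    """Return (percent correct, mean estimated K) for one cell of the table."""
    n = len(k_per_replicate)
    n_correct = sum(is_correct(generation, k) for k in k_per_replicate)
    percent = round(100 * n_correct / n)
    mean_k = round(sum(k_per_replicate) / n, 1)
    return percent, mean_k


# Hypothetical replicate results (not data from the study).
print(summarize(3000, [4] * 23 + [5] * 2))  # prints (92, 4.1)
```

A cell such as "92 [4.1]" can thus arise when most replicates recover K = 4 and a few overestimate it, so the percentage and the bracketed mean carry complementary information.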
Figure 2. (A) Relationship between global FST (averaged across repetitions) and the number of generations since the barriers were imposed, for each of the four simulated datasets with impermeable barriers (b = 0). (B) Percentage of correct inferences, averaged across dispersal distances and mutation rates, versus generation time for simulated datasets with impermeable barriers (b = 0). (C) Percentage of correct inferences, averaged across dispersal distances and mutation rates, versus generation time for simulated datasets with permeable barriers (b = 0.03).
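Panel A tracks global genetic differentiation among the barrier-delimited subpopulations over time. The caption does not specify the estimator; as a simple, well-known stand-in, the sketch below computes Nei's G_ST = (H_T − H_S)/H_T for one locus from subpopulation allele frequencies, weighting subpopulations equally (an assumption). The function name is hypothetical.

```python
def nei_gst(subpop_freqs):
    """Nei's G_ST = (H_T - H_S) / H_T for one locus.

    subpop_freqs: list of dicts mapping allele -> frequency, one per subpopulation;
    subpopulations are weighted equally. Undefined (division by zero) if the
    locus is monomorphic overall.
    """
    k = len(subpop_freqs)
    # Mean within-subpopulation expected heterozygosity.
    h_s = sum(1 - sum(p * p for p in freqs.values()) for freqs in subpop_freqs) / k
    # Expected heterozygosity of the pooled allele frequencies.
    alleles = set().union(*subpop_freqs)
    pooled = {a: sum(freqs.get(a, 0.0) for freqs in subpop_freqs) / k for a in alleles}
    h_t = 1 - sum(p * p for p in pooled.values())
    return (h_t - h_s) / h_t


print(nei_gst([{"A": 1.0}, {"B": 1.0}]))                      # fixed differences
print(nei_gst([{"A": 0.5, "B": 0.5}, {"A": 0.5, "B": 0.5}]))  # no differentiation
```

Subpopulations fixed for different alleles give G_ST = 1 and identical subpopulations give 0; a statistic of this kind is what rises with generations in panel A once gene flow across the barriers stops.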
Figure 3. Results for the puma dataset. (A) Five clusters identified by TESS. (B) Three clusters identified by BAPS5. (C) Three clusters identified by GENELAND. (D) Significant boundary elements (black circles) detected by WOMBSOFT. Small dots indicate sampling sites. Puma habitat is shown in grey.
Figure 4. Results for the Rhododendron dataset. (A) Six clusters identified by TESS; (B) Four clusters identified by BAPS5; (C) Three clusters identified by GENELAND; (D) Significant boundary elements (black circles) detected by WOMBSOFT. Small crosses indicate sampling sites.