Abstract
BACKGROUND: Spatial Principal Component Analysis (sPCA; Jombart, Heredity 101:92-103, 2008) is designed to investigate non-random spatial distributions of genetic variation. Unfortunately, the associated tests for assessing the existence of spatial patterns (the global and local tests; Heredity 101:92-103, 2008) lack statistical power and may fail to reveal existing spatial patterns. Here, we present a non-parametric test for the significance of specific patterns recovered by sPCA.
Keywords: Eigenvalues; Monte-Carlo; Spatial genetic patterns; sPCA
Year: 2017 PMID: 29246102 PMCID: PMC5732370 DOI: 10.1186/s12859-017-1988-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Flow chart illustrating the steps of the spca_randtest. The top panel shows the first step: one permutation, which yields one value each of fi+ and fi−. To assess the statistical significance of global or local patterns, the permutation is repeated x times to obtain empirical distributions of fi+ and fi−, which are then compared with the observed values of fi+ and fi−. If at least one of the two is significant, the second step of the test uses the eigenvalue distribution recorded over the permutations to obtain an empirical p-value for each eigenvalue, starting from the most positive (or most negative). If the first eigenvalue is significant at the chosen threshold, the next one is tested against a more stringent threshold (Bonferroni correction), and so on until a non-significant eigenvalue is found and the routine stops.
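The comparison of an observed statistic with its permutation distribution can be sketched in a few lines of Python. This is a minimal illustration and not the spca_randtest implementation: the function name `empirical_p_value`, the `alternative` argument and the (count + 1) / (permutations + 1) convention are assumptions made for the sketch, and the statistic itself (fi+ or fi−) is taken as already computed.

```python
import numpy as np


def empirical_p_value(observed, permuted, alternative="greater"):
    """Monte-Carlo p-value of `observed` against permutation values.

    Hypothetical helper: the +1 in numerator and denominator counts the
    observed value as one draw from the null, a common convention that
    is assumed here rather than taken from the paper.
    """
    permuted = np.asarray(permuted, dtype=float)
    if alternative == "greater":      # e.g. a statistic built on positive eigenvalues
        extreme = np.count_nonzero(permuted >= observed)
    elif alternative == "less":       # e.g. a statistic built on negative eigenvalues
        extreme = np.count_nonzero(permuted <= observed)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return (extreme + 1) / (permuted.size + 1)


# Illustrative values only: a null distribution from 999 permutations.
rng = np.random.default_rng(1)
null_values = rng.normal(size=999)
print(empirical_p_value(3.0, null_values))
```

With this convention the smallest attainable p-value is 1 / (permutations + 1), so the number of permutations bounds the resolution of the test.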
Fig. 2 Graphical representation of the island and stepping stone migration models (IS and SS) in the upper panel. Black arrows represent the presence and direction of migration among populations (purple circles). The lower panel shows two examples of simulated global patterns, in which 100 pairs of coordinates are picked from a set of 1000 random pairs generated in 2D squares at different scales (in the examples shown, the scales are 1:10,000 and 1:100,000, respectively). Each group of 25 coordinate pairs is assigned to a different simulated population, distinguished by red, blue, black and yellow, so as to obtain spatially segregated populations. These simulated spatial distributions are used to calculate the matrix L of spatial connection (see Additional file 3: Figure S1).
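The sampling scheme described in this caption might be sketched as follows. The caption does not say how spatial segregation was enforced, so ordering the sampled points along the x axis before cutting them into blocks of 25 is an assumption of this sketch, as are the function name `simulate_global_layout` and the reading of "scale" as the side length of the square.

```python
import numpy as np


def simulate_global_layout(scale, n_pool=1000, n_sampled=100, n_pops=4, seed=0):
    """Draw sample coordinates and assign them to contiguous populations.

    Hypothetical helper. Assumptions: `scale` is the side of the 2D
    square, and segregation is obtained by sorting along x.
    """
    if n_sampled % n_pops != 0:
        raise ValueError("n_sampled must be divisible by n_pops")
    rng = np.random.default_rng(seed)
    # Pool of random coordinate pairs inside the square.
    pool = rng.uniform(0.0, scale, size=(n_pool, 2))
    # Pick a subset of the pool without replacement.
    chosen = pool[rng.choice(n_pool, size=n_sampled, replace=False)]
    # Assumed segregation rule: consecutive blocks along the x axis.
    chosen = chosen[np.argsort(chosen[:, 0])]
    labels = np.repeat(np.arange(n_pops), n_sampled // n_pops)
    return chosen, labels


coords, pops = simulate_global_layout(scale=10_000.0)
print(coords.shape, np.bincount(pops))
```

Any rule that keeps each block of 25 points spatially contiguous would serve the same purpose; sorting on one axis is simply the shortest one to write down.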
Table 1 Significant results for the global test (g test), local test (l test), and spca_randtest (r test +/−) for random, global and local patterns using 200 loci per individual. IS, SS and IBD indicate the migration models (see Methods); migration rates are coded by number: 1 = 0.005, 2 = 0.01 and 3 = 0.1
| Model (200 SNPs) | Significance level | Random: g test | Random: r test (+) | Random: l test | Random: r test (−) | Global: g test | Global: r test (+) | Global: l test | Global: r test (−) | Local: g test | Local: r test (+) | Local: l test | Local: r test (−) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IS-1 | .05 | 0.054 | 0.059 | | | | | | | | 0.071 | 0.061 | |
| IS-1 | .01 | 0.011 | | | 0.010 | | | | | | 0.010 | 0.015 | 0.113 |
| IS-2 | .05 | | | 0.058 | 0.056 | | | | | 0.056 | 0.059 | 0.050 | 0.123 |
| IS-2 | .01 | | | | 0.013 | 0.067 | | | | 0.011 | | 0.012 | 0.026 |
| IS-3 | .05 | 0.051 | | 0.053 | | 0.055 | | | | | | | 0.059 |
| IS-3 | .01 | 0.010 | 0.014 | 0.013 | | 0.010 | 0.013 | | 0.013 | | | | 0.019 |
| SS-1 | .05 | 0.053 | 0.058 | 0.053 | 0.050 | | | | | 0.063 | 0.064 | 0.124 | |
| SS-1 | .01 | | 0.011 | 0.010 | 0.010 | | | | | 0.017 | 0.010 | | |
| SS-2 | .05 | | 0.058 | 0.058 | 0.063 | | | | | | | 0.059 | |
| SS-2 | .01 | 0.011 | 0.011 | 0.013 | 0.016 | | | | | | | 0.014 | 0.147 |
| SS-3 | .05 | | | 0.057 | | 0.054 | 0.128 | | | | 0.054 | | 0.071 |
| SS-3 | .01 | 0.014 | | 0.011 | 0.013 | 0.014 | 0.036 | | 0.010 | | | | |
| IBD-1 | .05 | | 0.050 | 0.053 | | | | | | | 0.087 | | |
| IBD-1 | .01 | | 0.012 | | 0.010 | | | | | | 0.023 | 0.192 | |
| IBD-2 | .05 | 0.052 | | 0.061 | | | | | | | 0.076 | | |
| IBD-2 | .01 | | | 0.011 | | | | | | | 0.018 | | |
| IBD-3 | .05 | 0.052 | | 0.053 | 0.050 | | | | | 0.050 | 0.083 | | |
| IBD-3 | .01 | 0.013 | | 0.011 | 0.012 | | | | | | 0.023 | | |
* p-values are in italic when non-significant and in bold when the fraction of true positives is above 20%
Results show the proportion of significant tests over 1000 replicates, based on 1000 permutations with thresholds .05 and .01
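Each cell of such a table is a rejection proportion: the share of replicates whose test p-value falls below the threshold. A minimal Python sketch of that bookkeeping follows; the function name `rejection_rates` and the strict `<` comparison are assumptions, and the p-values are made up for illustration.

```python
import numpy as np


def rejection_rates(p_values, thresholds=(0.05, 0.01)):
    """Proportion of replicates declared significant at each threshold.

    Hypothetical helper; assumes "significant" means p < threshold.
    """
    p = np.asarray(p_values, dtype=float)
    return {alpha: float(np.mean(p < alpha)) for alpha in thresholds}


# Illustrative p-values from four replicates (not data from the paper).
print(rejection_rates([0.001, 0.03, 0.2, 0.6]))
```

Under a random pattern this proportion estimates the type I error rate and should sit near the nominal threshold; under a simulated global or local pattern it estimates power.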
Table 2 Results for the same simulations reported in Table 1 using a subset of 40 loci per individual
| Model (40 SNPs) | Significance level | Random: g test | Random: r test (+) | Random: l test | Random: r test (−) | Global: g test | Global: r test (+) | Global: l test | Global: r test (−) | Local: g test | Local: r test (+) | Local: l test | Local: r test (−) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IS-1 | .05 | 0.052 | 0.061 | 0.046 | 0.050 | | | | | | | 0.055 | 0.077 |
| IS-1 | .01 | 0.016 | 0.013 | 0.010 | | | | | | | | 0.015 | 0.022 |
| IS-2 | .05 | 0.053 | | | | 0.103 | | | | 0.073 | | 0.057 | |
| IS-2 | .01 | 0.011 | | | | 0.022 | 0.072 | 0.011 | 0.005 | 0.012 | | 0.010 | |
| IS-3 | .05 | | 0.050 | 0.050 | | | 0.060 | | | | | 0.053 | |
| IS-3 | .01 | | 0.011 | | | | 0.011 | 0.011 | 0.011 | | | 0.013 | |
| SS-1 | .05 | 0.052 | 0.054 | | | | | | | 0.050 | | 0.067 | 0.169 |
| SS-1 | .01 | | 0.012 | | 0.011 | | | | | | | 0.021 | 0.052 |
| SS-2 | .05 | | | 0.050 | | | | | | 0.052 | | | 0.081 |
| SS-2 | .01 | 0.013 | 0.010 | 0.010 | 0.015 | | | 0.016 | | | | 0.011 | 0.014 |
| SS-3 | .05 | 0.068 | | 0.050 | | 0.066 | 0.055 | 0.053 | | | | | |
| SS-3 | .01 | 0.014 | | 0.013 | 0.012 | 0.012 | | | | | | | |
| IBD-1 | .05 | | 0.053 | 0.052 | 0.057 | | | | | | 0.055 | 0.124 | |
| IBD-1 | .01 | | | 0.013 | 0.013 | | | | | | | 0.032 | |
| IBD-2 | .05 | | 0.054 | 0.060 | | | | | | | 0.051 | 0.111 | |
| IBD-2 | .01 | 0.011 | | 0.015 | | | | | | | 0.015 | 0.026 | |
| IBD-3 | .05 | | | 0.051 | 0.050 | | | | | | 0.058 | 0.115 | |
| IBD-3 | .01 | 0.012 | 0.013 | 0.012 | 0.010 | | | | | | 0.010 | 0.023 | |
* p-values are in italic when non-significant and in bold when the fraction of true positives is above 20%
Table 3 Results of the spca_randtest with 10,000 permutations on the human mtDNA dataset (Montano et al., [14])
| Spatial patterns | Observed p-value | Decreasing positive eigenvalues | Observed p-value | Bonferroni-corrected significance level |
|---|---|---|---|---|
| Global pattern | | 3.4e-2 | | 0.05 |
| Local pattern | 0.8826 | 8.5e-3 | | 0.025 |
| | | 4.1e-3 | | 0.016 |
| | | 1.6e-3 | 0.506 | 0.0125 |
The simulated distributions of the f+ and f− statistics are compared with the f+ and f− statistics observed for the original dataset. A significant global pattern (that is, a significant observed f+ statistic) is found with the spca_randtest (p-value < 0.01). Each positive eigenvalue is therefore compared with its simulated distribution and declared significant if its observed p-value is lower than the Bonferroni-corrected threshold, starting from a threshold of 0.05. Observed p-values that are significant against the Bonferroni-corrected thresholds are highlighted in bold
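The thresholds listed in the table (0.05, 0.025, 0.016, 0.0125) are consistent with dividing the starting threshold of 0.05 by the rank of the eigenvalue being tested. Under that reading, the sequential procedure can be sketched in Python as below. The function name `sequential_bonferroni` is hypothetical, the alpha / rank rule is inferred from the table rather than stated in the text, and the p-values in the usage lines are illustrative placeholders, not results from the mtDNA analysis.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Test ordered eigenvalue p-values until the first non-significant one.

    `p_values` must be ordered from the most extreme eigenvalue onward.
    Returns a list of (rank, threshold, significant) tuples; the list
    ends at the first eigenvalue that fails its threshold.
    Assumption: the k-th eigenvalue is tested at alpha / k.
    """
    results = []
    for rank, p in enumerate(p_values, start=1):
        threshold = alpha / rank
        significant = p < threshold
        results.append((rank, threshold, significant))
        if not significant:
            break  # the routine stops at the first non-significant eigenvalue
    return results


# Illustrative p-values only.
for rank, threshold, significant in sequential_bonferroni([0.001, 0.002, 0.003, 0.506]):
    print(rank, round(threshold, 4), significant)
```

Stopping at the first failure means that later eigenvalues are never tested, even if their individual p-values would pass a threshold.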
Fig. 3 Plot of the first and second most positive observed eigenvalues of the mtDNA dataset analysed here. The background map shows the countries where the populations included in the original study were sampled (from West to East: Senegal, Guinea-Bissau, Guinea, Sierra Leone, Liberia, Ivory Coast, Ghana, Togo, Benin, Nigeria, Cameroon, Equatorial Guinea, Gabon, Congo). sPC1 and sPC2 are represented independently using a square size proportional to the value of each population along the first and second component, respectively. White squares show negative values and black squares positive values, with size proportional to the absolute value of the coordinate. sPC1-sPC2 is a summarized representation of the values taken by each population along the first and second components, using a color gradient. The maps were produced using the rworldmap R package [15]. Genetic data and geographical coordinates of the populations analysed are publicly available from Montano et al. ([14])