Abstract
Geographic genetic differentiation measures are used for purposes such as assessing genetic diversity and connectivity, and searching for signals of selection. Confirmation by unrelated measures can minimize false positives. A popular differentiation measure, Bray-Curtis, has been used increasingly in molecular ecology under the name AFD (hereafter called BCAFD). Critically, BCAFD is expected to be partially independent of the commonly used Hill "Q-profile" measures. BCAFD needs scrutiny for potential biases, by examining limits on its value and comparing simulations against expectations. BCAFD has two dependencies on within-population (alpha) variation, which are undesirable for a between-population (beta) measure. The first dependency arises from its similarity to G_ST and F_ST. The second dependency is that BCAFD cannot be larger than the highest allele proportion in either location (alpha variation); this can be overcome by data filtering or by a modified statistic, A or "Adjusted AFD". The first dependency does not forestall applications such as assessing connectivity or selection, provided that we know the measure's null behavior under selective neutrality with specified conditions, which this article shows for A, at equilibrium and nonequilibrium, for the commonly used data type of single-nucleotide polymorphisms (SNPs) in two locations. Thus, A can be used in tandem with mathematically contrasting differentiation measures, with the aim of reducing false inferences. For detecting adaptive loci, the relative performance of A and other measures was evaluated, showing that it is best to use two mathematically different measures simultaneously, and that A is part of one of the best such pairwise criteria. For any application, using A rather than BCAFD avoids the counterintuitive limitation by maximum allele proportion within localities.
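The second dependency described above can be seen directly from the Bray-Curtis form. A minimal sketch, assuming each location's allele proportions at one locus are given as a list summing to 1 (the function name `bray_curtis_afd` is hypothetical, not from the article): for a biallelic SNP the statistic reduces to |p₁ − p₂|, and because both proportions are non-negative, that difference can never exceed max(p₁, p₂).

```python
def bray_curtis_afd(props1, props2):
    """Bray-Curtis dissimilarity between two allele-proportion vectors at one locus.

    With each vector summing to 1, sum|x - y| / sum(x + y) is half the L1 distance.
    """
    if len(props1) != len(props2):
        raise ValueError("both locations must list the same alleles")
    num = sum(abs(a - b) for a, b in zip(props1, props2))
    den = sum(a + b for a, b in zip(props1, props2))
    return num / den

# Biallelic SNP: focal allele at proportions p1 and p2 in the two locations.
p1, p2 = 0.30, 0.05
d = bray_curtis_afd([p1, 1 - p1], [p2, 1 - p2])
print(round(d, 6))        # 0.25, i.e. |p1 - p2|
print(d <= max(p1, p2))   # True: bounded by the larger focal-allele proportion

# Check the bound |p1 - p2| <= max(p1, p2) over a grid of proportions.
grid = [i / 20 for i in range(21)]
assert all(bray_curtis_afd([a, 1 - a], [b, 1 - b]) <= max(a, b) + 1e-12
           for a in grid for b in grid)
```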
Keywords: adaptation; allele frequency difference; biodiversity; genetic differentiation; mutual information; outlier loci
Year: 2022 PMID: 36110882 PMCID: PMC9465203 DOI: 10.1002/ece3.9176
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 3.167
FIGURE 1 (a) Comparison of simulation results with algebraic predictions for BCAFD: 9000 points from the 1000 replicates of each of nine neutral scenarios (effective size N = 1000, 10,000, 100,000; dispersal rate m = 0.01, 0.03, 0.10), with regression equation (Simulated-BCAFD) = 0.83 × (Predicted-BCAFD from Equation 6) (significance P <<0, R² = .50, intercept negligibly different from zero: −7.6 × 10⁻⁵). The black line is the regression line; the red line is the expected 1:1 relationship. (b) The same data, using the correction for the limitation by maximum p, that is, a plot of A = |p₁ − p₂|/(0.6152 + 0.3985 × p_max) against the expectation shown in Equations (6) and (8). In this case the expected 45-degree plot is achieved exactly, with the expected slope of unity (slope coefficient = 1.00, 95% confidence limits 0.98 to 1.02, significance P <<0, R = 0.50, intercept negligibly different from zero: 0.0004). The red 1:1 line coincides exactly with the regression line. (c) The nine scenarios from (b) plotted individually: comparison of simulation results with algebraic predictions, using A, the correction for the limitation by maximum p. Each panel shows 1000 points from the 1000 replicates of one scenario, whose dispersal rate m and effective size N are shown in the panel's headline. The slopes of the regression lines are shown on the panels with 95% confidence intervals, which include unity in all but two marginal cases; each panel is therefore concordant with the overall result in (b) and the relationship in Equation (8). In all cases, the intercept was negligibly different from zero, and P was < 10⁻¹⁸.
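The adjusted statistic in panel (b) can be written as a one-line function. A minimal sketch, assuming p₁ and p₂ are the proportions of the same (focal) SNP allele in the two locations and that p_max = max(p₁, p₂); the function name `adjusted_afd` is hypothetical. The example shows two loci with the same raw BCAFD (0.2): the locus with the smaller p_max, whose raw value is more tightly capped, receives the larger A.

```python
def adjusted_afd(p1, p2):
    """A = |p1 - p2| / (0.6152 + 0.3985 * p_max), coefficients as in Figure 1b.

    Assumes p1, p2 are proportions of the same focal allele in the two
    locations and p_max = max(p1, p2).
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele proportions must lie in [0, 1]")
    p_max = max(p1, p2)
    return abs(p1 - p2) / (0.6152 + 0.3985 * p_max)

# Two SNPs with the same raw difference |p1 - p2| = 0.2:
low = adjusted_afd(0.3, 0.1)    # p_max = 0.3
high = adjusted_afd(0.9, 0.7)   # p_max = 0.9
print(round(low, 4), round(high, 4))  # 0.2722 0.2054
```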
TABLE 1 The effect of p_max on forecasts for BCAFD
| Central p_max | R² | P | Intercept | Slope coefficient (95% CL) |
|---|---|---|---|---|
| 0.05 | .465 | 1.0 × 10⁻¹³¹ | +0.0008 | 0.630258 (0.59–0.67) |
| 0.15 | .444 | 1.7 × 10⁻¹¹¹ | +0.0020 | 0.673769 (0.62–0.72) |
| 0.25 | .420 | 2.3 × 10⁻¹⁰⁴ | +0.0022 | 0.713834 (0.66–0.77) |
| 0.35 | .456 | 8.2 × 10⁻¹¹⁸ | +0.0007 | 0.79615 (0.74–0.85) |
| 0.45 | .414 | 2.2 × 10⁻¹⁰⁶ | +0.0024 | 0.766259 (0.71–0.83) |
| 0.55 | .482 | 4.5 × 10⁻¹²⁶ | +0.0001 | 0.849727 (0.79–0.91) |
| 0.65 | .569 | 7.4 × 10⁻¹⁵⁸ | −0.0015 | 0.900086 (0.85–0.95) |
| 0.75 | .482 | 2.8 × 10⁻¹²⁸ | −0.0008 | 0.824037 (0.77–0.88) |
| 0.85 | .538 | 1.6 × 10⁻¹⁵¹ | −0.0020 | 0.947642 (0.89–1.01) |
| 0.95 | .586 | 2.1 × 10⁻²⁰¹ | −0.0023 | 1.042645 (0.99–1.10) |
Note: The 9000 data points from Figure 1a, sorted by p_max in the final generation. In the first column, "Central p_max = 0.05" identifies the points with p_max between 0 and 0.1, and so on. The remaining columns show the results of regression analysis of (Simulated-BCAFD) against (Predicted-BCAFD from Equation 6) for the subset of data points identified in the left column. All regressions showed an intercept very close to zero, as expected. Many significant digits are retained in the slope coefficients because they are used subsequently in Figure 2, where the coefficients are plotted against the central p_max values.
FIGURE 2 The effect of the maximum allele proportion p_max on the regression slope coefficient of (simulated BCAFD) on (expected BCAFD from Equation 6). This plot itself has the regression equation (slope coefficient) = 0.6152 + 0.3985 × p_max, with R² = .90 and P = .000025. The values on which the plot is based are taken from Table 1.
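The regression summarized in Figure 2 can be refitted from the ten rows of Table 1. A minimal sketch using ordinary least squares in plain Python (the helper name `ols` is hypothetical): the fit gives an intercept of about 0.6152 and a slope of about 0.3985, the same coefficients that appear in the denominator of A in Figure 1b, with R² of about 0.90.

```python
# Central p_max values and slope coefficients, copied from Table 1.
central_pmax = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
slopes = [0.630258, 0.673769, 0.713834, 0.79615, 0.766259,
          0.849727, 0.900086, 0.824037, 0.947642, 1.042645]

def ols(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    r2 = sxy ** 2 / (sxx * syy)
    return a, b, r2

a, b, r2 = ols(central_pmax, slopes)
print(f"slope coefficient = {a:.4f} + {b:.4f} x p_max, R^2 = {r2:.2f}")
# prints: slope coefficient = 0.6152 + 0.3985 x p_max, R^2 = 0.90
```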
FIGURE 3 (mutual information) plotted against A (i.e., BCAFD corrected for maximum-value dependency). is shown as squares, as discs, as triangles. All measures were from the same simulated dataset that was used in Figure 1.
Detection of loci under directional selection
Known selection strength s. Each cell gives TP ± SE / FP ± SE / performance.

| Criteria {Differentiation measure(s)} | s = 0.001 | s = 0.003 | s = 0.005 | s = 0.05 | Mean performance |
|---|---|---|---|---|---|
| | 488.18 ± 5.90 / 10 ± 0 / 33.03 | 851.18 ± 4.20 / 10 ± 0 / 46.23 | 911 ± 2.57 / 10 ± 0 / 47.92 | 999 ± 0.30 / 10 ± 0 / 50.23 | |
| | 485.36 ± 6.15 / 10 ± 0 / 32.9 | 820.45 ± 4.69 / 10 ± 0 / 45.32 | 891 ± 3.16 / 10 ± 0 / 47.37 | 998.73 ± 0.38 / 10 ± 0 / 50.22 | 43.95 ± 3.82 |
| | 459.91 ± 5.46 / 10 ± 0 / 31.72 | 857.55 ± 3.75 / 10 ± 0 / 46.42 | 938.09 ± 2.27 / 10 ± 0 / 48.65 | 999.82 ± 0.12 / 10 ± 0 / 50.25 | |
| | 488.36 ± 6.57 / 10 ± 0 / 33.03 | 804.82 ± 4.54 / 10 ± 0 / 44.84 | 874.73 ± 3.73 / 10 ± 0 / 46.91 | 998.18 ± 0.40 / 10 ± 0 / 50.21 | 43.75 ± 3.74 |
| | 458.27 ± 5.35 / 10 ± 0 / 31.64 | 857 ± 3.71 / 10 ± 0 / 46.4 | 938 ± 2.27 / 10 ± 0 / 48.65 | 999.82 ± 0.12 / 10 ± 0 / 50.25 | |
| | 468.91 ± 5.52 / 7.45 ± 0.25 / 38.87 | 820.45 ± 4.69 / 7.45 ± 0.25 / 52.66 | 891 ± 3.16 / 7.45 ± 0.25 / 54.71 | 998.73 ± 0.38 / 7.45 ± 0.25 / 57.52 | 50.94 ± 4.15 |
| | 443.64 ± 5.09 / 5.64 ± 0.24 / 44.28 | 843.82 ± 4.32 / 5.64 ± 0.24 / 60.18 | 910.73 ± 2.57 / 5.64 ± 0.24 / 61.99 | 999 ± 0.30 / 5.64 ± 0.24 / 64.15 | |
| | 470.18 ± 5.87 / 7.18 ± 0.26 / 39.81 | 804.82 ± 4.54 / 7.18 ± 0.26 / 53.1 | 874.3 ± 3.73 / 7.18 ± 0.26 / 55.16 | 998.18 ± 0.40 / 7.18 ± 0.26 / 58.41 | 51.62 ± 4.08 |
| | 442.18 ± 5.01 / 5.64 ± 0.24 / 44.19 | 843.36 ± 4.26 / 5.64 ± 0.24 / 60.17 | 910.73 ± 2.57 / 5.64 ± 0.24 / 61.99 | 999 ± 0.30 / 5.64 ± 0.24 / 64.15 | 57.63 ± 4.55 |
| | 450.45 ± 5.67 / 5.64 ± 0.34 / 44.65 | 819.67 ± 4.77 / 5.64 ± 0.34 / 59.48 | 891 ± 3.16 / 5.64 ± 0.34 / 61.48 | 998.73 ± 0.38 / 5.64 ± 0.34 / 64.14 | 57.44 ± 4.37 |
| | 475.55 ± 6.22 / 8.91 ± 0.16 / 35.03 | 804.82 ± 4.54 / 8.91 ± 0.16 / 47.71 | 874.73 ± 3.73 / 8.91 ± 0.16 / 49.79 | 998.18 ± 0.40 / 8.91 ± 0.16 / 53.09 | 46.40 ± 3.95 |
| | 449 ± 5.56 / 6.18 ± 0.44 / 42.33 | 819.55 ± 4.71 / 6.18 ± 0.44 / 57.26 | 891 ± 3.16 / 6.18 ± 0.44 / 59.29 | 998.73 ± 0.38 / 6.18 ± 0.44 / 62.01 | 55.22 ± 4.41 |
| | 441.36 ± 5.67 / 4.91 ± 0.31 / 47.59 | 804.09 ± 4.62 / 4.91 ± 0.31 / 62.32 | 874.73 ± 3.73 / 4.91 ± 0.31 / 64.28 | 998.18 ± 0.40 / 4.91 ± 0.31 / 67.25 | |
| | 458.27 ± 5.35 / 10 ± 0 / 31.64 | 857 ± 3.71 / 10 ± 0 / 46.4 | 938 ± 2.26 / 10 ± 0 / 48.65 | 999.82 ± 0.12 / 10 ± 0 / 50.25 | 44.24 ± 4.27 |
| | 439.91 ± 5.59 / 4.91 ± 0.31 / 47.51 | 804.09 ± 4.62 / 4.91 ± 0.31 / 62.32 | 874.73 ± 3.73 / 4.91 ± 0.31 / 64.28 | 998.18 ± 0.40 / 4.91 ± 0.31 / 67.25 | |
Note: The table shows the number of loci (±SE), from selection simulations of 1000 loci, that were identified as being under selection by criteria based on differentiation values from neutral simulations of 1000 loci: either a "univariate" criterion of being in the top 1% of neutral values for one differentiation measure, or a "bivariate" criterion of being simultaneously in the top 1% for two differentiation measures. In each of columns 2–5, the first (top) value in each cell is the number of loci identified as being under selection (true positives, TP) in the selection simulation with the known selection strength shown at the top of the column, out of the 1000 independent loci simulated. The second value is the number of loci identified as being under selection in the parallel neutral simulation (false positives, FP); with univariate criteria and a top-1% cutoff, FP is always 10 (1% of 1000 loci). The third value is the "performance": the percentage of true positives among all loci identified as outliers by that criterion (TP + FP). The performance shown is for the case where 1% of all loci were under that selective regime and all other loci were neutral; it is calculated as 100 × (TP × 0.01)/[(TP × 0.01) + (FP × 0.99)]. The proportions of neutral and selected loci would not be known beforehand in a study designed to detect loci under selection, but because the criteria are standardized to a constant univariate false-positive rate, the performance values can be used to compare them. The right column shows the performance averaged over all four selection strengths. Within each of the univariate and bivariate groups, the three criteria with the best average performance are bolded. Note that the rank order of performance values is similar for most selection strengths, except the weakest (s = 0.001).
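The outlier criteria and the performance value described in this note can be sketched as follows. This is a minimal illustration, assuming each measure's per-locus values are stored as plain Python lists in a dict keyed by a measure name (any names such as "A" are hypothetical), and that "top 1%" means a value at or above the 10th-largest of 1000 neutral values; ties are not treated specially. One measure in `measures` gives a univariate criterion, two give a bivariate criterion.

```python
def top_percent_cutoff(neutral_values, percent=1.0):
    """Smallest value still inside the top `percent`% of the neutral values."""
    n_top = max(1, round(len(neutral_values) * percent / 100))
    return sorted(neutral_values)[-n_top]

def flag_outliers(observed, neutral, measures, percent=1.0):
    """Flag loci in the top `percent`% of neutral values for every listed measure.

    observed / neutral: dicts mapping a measure name to a list of per-locus values.
    """
    cutoffs = {m: top_percent_cutoff(neutral[m], percent) for m in measures}
    n_loci = len(observed[measures[0]])
    return [all(observed[m][i] >= cutoffs[m] for m in measures)
            for i in range(n_loci)]

def performance(tp, fp, selected_fraction=0.01):
    """Percentage of flagged loci that are true positives (formula in the note)."""
    true_hits = tp * selected_fraction
    false_hits = fp * (1 - selected_fraction)
    return 100 * true_hits / (true_hits + false_hits)

# Reproduce three performance values from the table.
print(round(performance(488.18, 10), 2))    # 33.03
print(round(performance(443.64, 5.64), 2))  # 44.28
print(round(performance(441.36, 4.91), 2))  # 47.59
```

Because the cutoffs come from neutral simulations, applying a univariate criterion to the neutral data itself flags exactly 1% of loci, which is why FP is fixed at 10 in the univariate rows.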
Scheme for the simulation, for each generation, using terms defined in the text of Appendix 1

| Stage | Location 1 | Location 2 |
|---|---|---|
| Generation | | |
| After drift | | |
| After dispersal | | |
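The per-generation order in this scheme (drift in each location, then dispersal between them) can be sketched as a generic two-location model. This is not the Appendix 1 model: the cell formulas are defined there, and the sketch rests on assumptions of its own, namely binomial drift of n gene copies per location and symmetric exchange of a fraction m of each gene pool. The names `one_generation` and `simulate` are hypothetical.

```python
import random

def one_generation(p1, p2, n, m, rng):
    """One generation: drift in each location, then symmetric dispersal.

    Assumptions (not taken from Appendix 1): drift is binomial sampling of
    n gene copies per location; a fraction m of each location's genes
    comes from the other location.
    """
    # After drift: resample n gene copies from each location's allele proportion.
    d1 = sum(rng.random() < p1 for _ in range(n)) / n
    d2 = sum(rng.random() < p2 for _ in range(n)) / n
    # After dispersal: mix the two post-drift gene pools.
    return (1 - m) * d1 + m * d2, (1 - m) * d2 + m * d1

def simulate(p0, n, m, generations, seed=1):
    """Run the two-location scheme from a common starting proportion p0."""
    rng = random.Random(seed)
    p1 = p2 = p0
    for _ in range(generations):
        p1, p2 = one_generation(p1, p2, n, m, rng)
    return p1, p2

p1, p2 = simulate(p0=0.5, n=1000, m=0.03, generations=50)
print(p1, p2, abs(p1 - p2))  # final proportions and their raw difference (BCAFD)
```

Using a seeded `random.Random` instance keeps replicate runs reproducible; the pure-Python binomial draw is slow for n = 100,000 but keeps the sketch dependency-free.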
Time in generations to half-equilibrium for the scenario conditions simulated

| N | m | Time to half-equilibrium (generations) |
|---|---|---|
| 1000 | 0.01 | 32.8488 |
| 1000 | 0.03 | 11.1944 |
| 1000 | 0.10 | 3.27386 |
| 10,000 | 0.01 | 34.3131 |
| 10,000 | 0.03 | 11.3596 |
| 10,000 | 0.10 | 3.28785 |
| 100,000 | 0.01 | 34.4666 |
| 100,000 | 0.03 | 11.3764 |
| 100,000 | 0.10 | 3.28925 |
Note: See Appendix 1 for definitions of other symbols.
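The tabulated times can be reproduced, to the printed precision, by assuming that the departure from equilibrium shrinks geometrically by a factor (1 − m)²(1 − 1/N) each generation, so that the half-time is ln 2 / −ln[(1 − m)²(1 − 1/N)]. This is a reconstruction that matches all nine rows, not a formula quoted from Appendix 1; the function name is hypothetical.

```python
import math

def half_equilibrium_time(n, m):
    """Generations for the gap to equilibrium to halve.

    Assumes geometric decay by (1 - m)**2 * (1 - 1/n) per generation
    (a reconstruction consistent with the table, not Appendix 1 itself).
    """
    factor = (1 - m) ** 2 * (1 - 1 / n)
    return math.log(2) / -math.log(factor)

for n in (1000, 10_000, 100_000):
    for m in (0.01, 0.03, 0.10):
        print(n, m, round(half_equilibrium_time(n, m), 4))
```

Dispersal dominates the approach to equilibrium here: the times barely change across N but fall roughly tenfold as m rises from 0.01 to 0.10.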
Expected time in generations to fixation for the scenario conditions simulated

| Initial allele proportion | N | Fixation time (generations) |
|---|---|---|
| 0.5 | 100,000 | 277258.9 |
| 0.1 | 100,000 | 102337.1 |
| 0.01 | 100,000 | 18606.7 |
| 0.5 | 10,000 | 27725.9 |
| 0.1 | 10,000 | 10233.7 |
| 0.01 | 10,000 | 1860.7 |
| 0.5 | 1000 | 2772.6 |
| 0.1 | 1000 | 1023.4 |
| 0.01 | 1000 | 186.1 |
Note: See Appendix 1 for definitions of symbols.
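Every value in this table is reproduced by T = −4N · p ln p / (1 − p). That is the standard diffusion-approximation (Kimura–Ohta) mean time to fixation, conditional on fixation, for an allele starting at proportion 1 − p; equivalently, the conditional time to loss of the allele at p. Reading the "initial" column as the proportion of the allele that is lost is an assumption of this sketch, and the function name is hypothetical.

```python
import math

def conditional_fixation_time(q, n):
    """Mean generations to fixation, given that fixation occurs, for an allele
    starting at proportion q in a population of effective size n
    (Kimura-Ohta diffusion approximation)."""
    return -4 * n * (1 - q) * math.log(1 - q) / q

for p in (0.5, 0.1, 0.01):
    for n in (100_000, 10_000, 1000):
        # Assumption: the tabulated initial value p is the allele that is lost,
        # so the allele that fixes starts at 1 - p.
        print(p, n, round(conditional_fixation_time(1 - p, n), 1))
```

The times scale linearly with N, so the 100,000-size scenarios sit far from fixation over any practical simulation length.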