Literature DB >> 31871215

Contingent Convergence: The Ability To Detect Convergent Genomic Evolution Is Dependent on Population Size and Migration.

Abstract

Outlier scans, in which the genome is scanned for signatures of selection, have become a prominent tool in studies of local adaptation, and more recently studies of genetic convergence in natural populations. However, such methods have the potential to be confounded by features of demographic history, such as population size and migration, which are considerably varied across natural populations. In this study, we use forward-simulations to investigate and illustrate how several measures of genetic differentiation commonly used in outlier scans (FST, DXY and Δπ) are influenced by demographic variation across multiple sampling generations. In a factorial design with 16 treatments, we manipulate the presence/absence of founding bottlenecks (N of founding individuals), prolonged bottlenecks (proportional size of diverging population) and migration rate between two populations with ancestral and diverged phenotypic optima. Our results illustrate known constraints of individual measures associated with reduced population size and a lack of migration; but notably we demonstrate how relationships between measures are similarly dependent on these features of demography. We find that false-positive signals of convergent evolution (the same simulated outliers detected in independent treatments) are attainable as a product of similar population size and migration treatments (particularly for DXY), and that outliers across different measures (for e.g., FST and DXY) can occur with little influence of selection. Taken together, we show how underappreciated, yet quantifiable measures of demographic history can influence commonly employed methods for detecting selection.

Entities: CellLine Chemical Disease Gene Mutation Species

Keywords: DXY; FST; Outlier scans; convergence; demographic history; false-positives

Year: 2020 PMID： 31871215 PMCID： PMC7003088 DOI： 10.1534/g3.119.400970

Source DB: PubMed Journal: G3 (Bethesda) ISSN： 2160-1836 Impact factor: 3.154

Studies assessing adaptation and evolution across the genome are increasing in popularity with the availability of modern sequencing technologies. These studies often center around quantifying patterns of variation in genome-wide SNPs, which can be used to highlight regions or genes having experienced selection relative to the neutral backdrop of the rest of the genome. These analyses, which we refer to here as outlier scans, have become a common tool in population genetics and have been useful across diverse, natural systems in identifying candidate genes associated with the evolution of a range of adaptive traits (reviewed recently by (Ahrens )). More recently, the method of overlapping outlier scans across independent lineages has been employed to test whether the same regions are involved in independent adaptation events (i.e., genetic convergent evolution) (reviewed by (Fraser and Whiting 2019)). This study seeks to investigate how different outlier scan methods are influenced by demographic variation in natural populations, and how this may lead to overlapping false-positives. Recent discussions have highlighted the propensity of outlier scans to yield false-positives, given that outliers caused by heterogeneous genomic landscapes are commonplace irrespective of selection (Ellegren and Wolf 2017). For example, background selection (BGS), whereby linkage between neutral and deleterious variants reduces local diversity (Charlesworth ; Bank ; Burri 2017), has been invoked to offer alternative explanations for patterns at first attributed to directional selection (Cruickshank and Hahn 2014). Furthermore, the influence of neutral processes such as genetic drift (Charlesworth 2009) can similarly produce elevated genetic divergence and false-positive signatures of selection. The strength of these processes is dependent on demography. For example, the influence of neutral processes and BGS should be more pronounced in smaller populations (low Ne) (Charlesworth ; Charlesworth 2009; Yeaman and Otto 2011; Cutter and Payseur 2013); as has been demonstrated in humans (Torres ) and Drosophila (Charlesworth 1996; Sella ). This relationship, however, may not be simply linear, with additional simulation evidence suggesting the effects of BGS are strongest at intermediate-low Ne, and weaker at very low or high Ne (Zeng 2013). Lack of connectivity among populations may also elevate measures of genetic differentiation, if a large global population is composed of smaller, isolated, sub-divided populations that each experience the effects of reduced Ne (Charlesworth ; Hoban ). In an attempt to mitigate the likelihood of false-positives, some have advocated using multiple measures of population differentiation and divergence to identify regions of the genome likely to be under selection (Charlesworth 1998; Cruickshank and Hahn 2014). However, how these measures are correlated with each other, and with selection under different demographic scenarios, has not been explored. A diverse array of measures of genetic differentiation and divergence have been employed for outlier scans, but here we focus on two of the most common measures of relative differentiation (FST and changes in nucleotide diversity [Δπ]) and a measure of absolute divergence (DXY). The fixation-index FST (Weir and Cockerham 1984; Hudson ), which measures the relative amount of within- and between-population variance. This measure is therefore maximized when genomic regions exhibit the lowest within- and highest between-population variance. FST outliers at the right-tail of the distribution are considered candidates for adaptation because they reflect regions with large differences in allele frequency or high substitution rate (large between-population variance) and/or low nucleotide variation (low within-population variance) relative to the rest of the genome. Changes in nucleotide diversity (Δπ) are another indicator of adaptation, as selection on a beneficial allele limits variation within a population resulting in selective sweeps (Smith and Haigh 1974). Comparisons of the ratio of π between diverging populations reveal regions under selection, as local π is reduced in one population in comparison to the other. While similar to FST, Δπ does not discriminate between which copy of a polymorphism is fixed, such that a substitution between populations is equivalent to a common non-polymorphic site. Δπ outliers therefore represent regions of the genome with reduced π in either population relative to the rest of the genome. As a measure of absolute genetic divergence, DXY (Nei 1987) does not consider the relative frequencies of polymorphisms within populations (Charlesworth 1998; Cruickshank and Hahn 2014). DXY can be quantified as the average number of pairwise differences between sequence comparisons between two populations. This measure is therefore influenced by ancestral π and the substitution rate, so DXY outliers highlight regions that are highly variable ancestrally, or in either population (large π), or exhibit many substitutions and thus increased sequence divergence. Because each measure of differentiation/divergence (hereafter referred to collectively as measures of divergence) quantifies genetic variation between populations in a slightly different way, each has a unique relationship with demography. While we can predict how individual measures are influenced by demography, and subsequently neutral processes, we know little about how different relationships between measures of divergence and demography affect their combined usage. We expect then that the utility of using multiple measures of divergence to detect selection may vary with demography. This complex interplay between divergence measures and demography may be further exacerbated in studies that compare scans from multiple populations to identify convergent genomic evolution. Here, researchers use measures of population divergence across independent pairs of evolutionary replicates with outlier loci compared across results. This strategy has been employed extensively across diverse taxa, including: birds (Cooper and Uy 2017), fish (Hohenlohe ; Jones ; Fraser ; Reid ; Rougemont ; Meier ), insects (Soria-Carrasco ; Van Belleghem ), mammals (Waterhouse ), molluscs (Westram ; Ravinet ) and plants (Roda ; Trucchi ). For the sake of consistency, it is common to infer outlier loci within each replicate pair through a common method, but if replicates differ in their demographic histories then the applicability/power of that common method will also vary accordingly. This begs the question, can demographic variation among replicates alone explain signals of, or lack of, convergence? Here, we used forward-simulations to investigate the effects of different demographic histories on the relationships between measures of divergence with selection. We varied demography through manipulating the number of founding individuals (founding bottlenecks), the population size of the diverging population (prolonged bottlenecks) and the presence/absence of migration. We simulated the effects of selection on 25kb regions, each with a gene designed from features taken from the guppy (Poecilia reticulata) genome assembly, a prominent model system for studies of convergent evolution (Reznick and Endler 1982; Fraser ). Moreover, we measured genetic divergence at 12 set time points through FST, DXY and Δπ between diverging populations to examine temporal relationships. Our aim was to investigate the significance of founding/prolonged bottlenecks and migration between populations when employing outlier scans, and highlight demographic scenarios that are susceptible to false-positives. We also aimed to test the occurrence of common outliers across measures, and whether overlapping outliers are consistently good indicators of selection. We sought to answer the following questions: 1) How do these demographic factors influence measures of divergence through time? 2) How well do measures of divergence identify regions of the genome under strong selection through time and across demographic factors? 3) How are the different measures of divergence related through time and across demographic factors? 4) Where do we detect the strongest signals of convergence (i.e., overlapping outliers using single or multiple measures), and are these consistent with selection?

Methods

The forward-simulation software SLiM 3.0 (Haller and Messer 2019) was used to simulate population divergence under contrasting demographic treatments in a fully factorial design between a population with an ancestral phenotype (AP) of N = 1000 and population with a diverged phenotype (DP), with a mutation rate based on the guppy genome (4.89e-8) (Künstner ) and scaled 100-fold over three raw mutation rates (4.89e-5, 4.89e-6, 4.89e-7) to ensure robustness of results across different, more realistic values of θ (4Neμ) (Table 1). These scaled mutation rates therefore represent effective population sizes of 10-, 100-, and 1000-times greater than the number of individuals in our simulations (N = 1000), in line with estimates from other species (Charlesworth 2009). The main text reflects results for the intermediate mutation rate 4.89e-6, with others presented in supplementary figures. SLiM employs a classic Wright-Fisher model to simulate populations, in which a population of diploid hermaphrodites proceeds through generations such that an individual’s contribution toward the next generation is proportional to its relative fitness.

Table 1

Simulation Parameters

Variable	Value	Description
AP	0	Ancestral phenotypic optimum
AP-S_σ	1	AP Selection (σ of fitness around phenotypic optimum, after transforming)
AP_N	1000	AP population size
DP	10	Diverged phenotypic optimum
DP-S_σ	0.1-10.00	DP Selection (σ of fitness around phenotypic optimum, after transforming)
DP_N	(10, 100, 500, 1000)	DP population size
BN	(100, 1000)	Founding bottleneck (N burn-in genomes sampled to populate DP)
m	(0, 0.002)	Migration (Percentage gene flow in both directions)
μ	4.89e^-6 (4.89e^-5, 4.89e^-7)	Mutation rate (bp^-1) (additional rates for higher/lower scaling)
μ_σ	1.00	Mutation effect size (σ of distribution of effect sizes)
r	1.00e^-6 (1.00e^-5, 1.00e^-7)	Recombination rate (bp^-1) (additional rates for higher/lower scaling)
C_max	1.00	Maximum fitness cost through competition
C_σ	0.40	Local phenotypic competition (σ of fitness reductions between individuals)
C_Dist	1.20	Maximum phenotypic distance between competitive individuals

Demographic treatments included reducing DP size (prolonged bottlenecks) relative to the AP size (N = 1000) (0.01, 0.1, 0.5, 1.0), migration as a proportion of individuals exchanged between populations (0.0, 0.002) and founding bottlenecks as the number of individuals sampled from the burn-in population to construct DP genomes (N = 100 or 1000). For example, a demographic history with founding bottleneck = 100 individuals, DP size = 0.01, and migration = 0.002 would represent the following scenario: 1) At generation 1, following a burn-in period of 10,000 generations to reach mutation-selection balance, populations split as 100 genomes are sampled from the burn-in population of 1000 to form DP, while AP is formed by sampling all 1000 burn-in genomes; 2) At the next generation (and for the remainder of the simulation), DP size reflects the prolonged bottleneck treatment, in this case 10 (0.01 of 1000); 3) AP and DP experience migration in both directions for the remainder of the simulation. Thus, each of our treatments can be thought of as a manipulation of the following: Founding bottlenecks limit the amount of variation within the burn-in population that is available to found DP; Prolonged bottlenecks limit the size of DP, reducing mutational input and moderating the strength of neutral evolution and efficacy of selection; and migration dictates the presence of migration between AP and DP (Figure 1). The total N of individuals (AP + DP) within simulations varies between 1010 and 2000 depending on DP size parameter, however this potential expansion does not impact our results (Supporting Information).

Figure 1

Experimental design for simulations. A) Examples of selection treatments experienced by genes from across the range, illustrated as the relationship between relative fitness of an individual and its phenotype. Each facet represents a different fitness landscape modified through editing the standard deviation of the normal distribution of fitness consequences in DP (blue, dashed). Facet labels constitute S, which were transformed as 10- to give fitness function standard deviations (S). The x-axes represent phenotype as calculated through non-synonymous mutations and the y-axes represent relative fitness of individuals. The ancestral phenotype of AP (red, solid) is the same in all treatments (mean = 0, σ = 1), while DPs have a diverged phenotypic optimum (mean = 10, σ = S). B) Distribution of S values applied to genomic regions (1 per region) C) Demographic representation of treatment factors. D) Representation of simulation timeline for treatments, illustrating that all treatments share an ancestral burn-in population before splitting into 16 replicated “AP” (solid) and “DP” (dotted) population pairs. Red, dashed lines denote sampling generations at which FST, DXY and Δπ are calculated and averaged across the preceding 20 generations. The purpose of this averaging was to achieve a general sense of population differentiation at sampling points, such that values represent stable patterns rather than stochastic generation to generation variation. E) Examples of two simulated 25kb regions, showing central genes that function as QTL, and illustrating how regions vary in gene length, exon N and selection target (%). F) Representation of 100 simulated 25kb regions dataset per iteration (N = 20) per treatment group (N = 16). Combining these treatment levels in a fully factorial experimental design generated a total of 16 different, independent demographic histories that all 25kb gene regions (see below for architecture) experienced. This factorial design allows us to examine the relative influence of founding bottlenecks, prolonged bottlenecks and migration, but our study is limited to these features of demographic history and does not extend to population expansions, more complex cases of migration, or demographic fluctuations through time. In some cases, the influence of prolonged bottlenecks renders the effects of founding bottlenecks unnecessary. However, when DP size is greater than 100, the inclusion of the founding bottleneck allows us to compare populations with equivalent mutational input but different standing variation. SLiM runs were performed independently over 25kb genomic regions with a central ‘gene’ that functioned as a QTL and varied in length and exon content (Figure 1E); with exon number, lengths, and intron lengths drawn at random from the guppy genome gene annotation file (gff). In total, we simulated a dataset of 100 25kb regions (Figure 1F). Recombination rate was scaled alongside mutation rate from a human-derived r = 1e-8 to 1e-6. Selection (S) in our model was represented as the fitness consequences incurred through distance from a phenotypic optimum in a one-dimensional fitness landscape. Selection in Wright-Fisher models is soft, in that low fitness individuals are not removed but have a lower likelihood of contributing to the next generation. Phenotypic optima were maintained over the course of simulations; thus, selection was constant throughout. This setup can be considered as analogous to two environments with contrasting optimum trait values for a single trait. Intensity of selection was manipulated by modifying the standard deviation (Sσ) of the normal distribution curve from which the density distribution was calculated through the “dnorm” function in SLiM. Values for S were drawn from a continuous distribution between -1.00 and 1.00 and transformed such that Sσ = 10-, yielding values between 0.1 and 10, with 0.1 representing the steepest fitness peak (S = 1) and 10 the shallowest (S = -1) (Figure 1A). Phenotypes were calculated per individual, per region, as the additive phenotypic effects of exonic non-synonymous mutations, which appeared at a rate of 7/3 relative to synonymous mutations (assuming that most mutations in the third base of a codon do not alter the amino acid). Additive genetic variance was assessed due to its prevalence in complex traits in nature (Hill ). Effect sizes for mutations with phenotypic effects were drawn from a Gaussian distribution with mean = 0 and σ = 1. The remaining synonymous exon mutations, and mutations in introns and outside of genes, had no effect on fitness. For each simulation, populations were seeded with 1000 individuals and allowed to proceed for a burn-in period of 5*2N (10,000) generations to reach mutation-selection-migration balance. During this period, burn-in populations evolved toward the ancestral phenotypic optimum of 0, defined as a normal distribution with mean = 0 and σ = 1.0 (Figure 1A). Burn-in populations were then subjected to each demographic treatment to simulate the founding of multiple populations from a shared ancestral state. During this ‘divergence period’, AP continued to evolve around the ancestral optimum of 0, while DP’s phenotypic optimum was centered around 10, with fitness consequences defined according to S. Individuals also experienced fitness costs associated with phenotypic proximity to other individuals within the population as a proxy for competition and to ensure a realistic amount of phenotypic variation persisted within populations. Fitness costs due to competitive proximity were scaled to a maximum value of 1 with σ = 0.4 and occurred reciprocally between local individuals with phenotypes with a difference of ≤ 1.2 (3 * 0.4). Simulations were sampled at 100, 500, 1000, and then every 1000 generations up to 10,000 (Figure 1C). FST was calculated across the 25kb region based on the proportion of subpopulation heterozygosity (H) relative to total heterozygosity (H) (according to Hudson, Slatkin and Maddison (1992)): . DXY was calculated as the sum of nucleotide differences (d) between the ith haplotype from AP and the jth haplotype from DP (according to Nei (1987)): . Mean heterozygosity (π) for each population was calculated as a single measure across the 25kb region. At each sampling point, each measure was calculated and averaged across the preceding 20 generations. Averaging was performed such that measures would not be dramatically biased by events occurring within individual generations. The change in mean heterozygosity between AP and DP (Δπ) was calculated as the ratio of πAP to πDP, such that reduced diversity in the DP population increases the value of Δπ: . Statistics were calculated over all monomorphic and polymorphic sites of 25kb regions. This has been designed to replicate genome scans that use window-approaches, with each 25kb region analogous to an independent window. In total, 100 unique 25kb regions were simulated across 16 demographic treatments across three mutation rates, with results for the intermediate mutation rate presented in the main text. To account for stochastic noise in the simulation, each 25kb region was iterated 20 times for each demographic treatment. Simulations with divergent AP and DP phenotypes are referred to as “PhenoDiv” simulations. Additional simulations were performed in which both populations shared the ancestral phenotype (“PhenoNull” simulations) and in which all sites were neutral with no selection imposed (“Neutral” simulations). The former of these was used to assess whether patterns associated with selection were driven by phenotypic divergence or variable stabilizing selection within populations, while the latter was used to disentangle effects of selection and neutral processes such as drift. A common set of 100 25kb regions were used for all simulations. Outliers for simulations were taken as upper 95% quantiles. All data analysis was performed in R (3.5) (R Core Team 2016). To assess relationships between divergence measures and selection, data were grouped by sampling generation within each treatment group (N = 16) and Pearson’s correlation coefficients were calculated between measures of divergence and selection (S) at each sampling point for all regions (N = 100). Correlation coefficients were then grouped within specific treatment levels (e.g., DP size = 0.5, or migration = 0.002) and averaged to give a coefficient reflecting each specific treatment level. These correlation coefficients were calculated for each iteration (N = 20) and averaged over to give final values. To assess the effects of treatments on detecting outliers, we compared distributions and 95% cut-offs within each treatment for each measure of divergence for PhenoDiv, PhenoNull, and Neutral simulations. We limited this analysis to early (100, 500), intermediate (3000) and late (10,000) sampling generations. Here, data from all iterations of each 25kb region were pooled. To calculate false positive (FPR) and false negative rates (FNR), we pooled Neutral simulation data within treatment groups (20 iterations of 100 genes, N = 2000), removed a random set of iterations (N = 100), and replaced it with an assortment of random iterations of each gene (e.g., Gene 1/Iteration 2, Gene 2/Iteration 14... Gene 100/Iteration 4) from PhenoDiv data. We calculated FPR as the proportion of data above the 95% quantile for each measure of divergence/differentiation that came from neutral simulations. For single measures, FPR = FNR as 5% of data in each permutation comes from PhenoDiv. We combined outlier sets across all combinations of FST, DXY and Δπ and examined neutral proportions within outlier sets to determine FPR. FNR for combined outlier sets corresponded to the proportion of PhenoDiv data not recovered in the combined outlier sets. These permutations were performed 100 times with results averaged. Proportional overlap of outlier sets was also calculated and compared across demographic treatment groups to examine convergence of results across treatments. Overlap was calculated within each permutation, averaged over, and visualized using heatmaps with hierarchical clustering of axes. To examine how simulated gene features influenced patterns of genetic divergence, we used a linear mixed modeling (LMM) approach with gene ID and treatment group as random factors. Gene features modeled as independent factors were: number of exons (Exon N), gene size, the proportion of gene that is coding (selection target %), selection applied to each gene (S), and the generation at which the optimum phenotype was reached (Pheno Gen).

Data availability

A full set of scripts including all bash, Eidos and R scripts necessary to repeat this analysis can be downloaded from Github (https://github.com/JimWhiting91/Contingent_Convergence_Pipeline). Supplementary figures (S1-49) have been uploaded through the GSA figshare portal. Supplementary figures include results for different mutation/recombination rates, PhenoNull and neutral data across sampling generations along with additional figures. Supplemental material available at figshare: https://doi.org/10.25387/g3.11323088.

Results

Demography modifies measures of divergence

Founding bottlenecks, simulated by reducing the number of possible founding genomes to a random 10% of the ancestral population, had little measurable effect on FST, DXY or Δπ, producing minimal variance between treatments over all sampling points (Figure 2A).

Figure 2

Effects of demographic treatments on measures of genetic divergence across all sampling generations. Point color and shape denote treatment groups for founding bottlenecks (A), prolonged bottlenecks (B) and migration (C). Each point represents values of divergence averaged across all genes within individual treatments groups, averaged within treatment levels, and averaged over 20 iterations. Prolonged bottlenecks, simulated by modifying the stable number of individuals within DP, had pronounced and variable effects on all measures. FST increased with reductions in DP size. This effect was generally consistent through time, although variance between treatments increased gradually through time (Figure 2B). A similar effect was observed for Δπ; however, this measure was particularly susceptible to inflation under the most extreme reductions in DP size, with substantially more elevated values observed between DP size proportions of 0.01 and 0.1 compared with 0.1 and either 0.5 or 1.0. This pattern was broadly consistent through sampling points. Such observations are unsurprising given that both FST and Δπ increase with processes that reduce within-population genetic variance. DXY was generally robust to prolonged bottlenecks, but in contrast to FST and Δπ, DXY decreased when DP sizes were reduced (Figure 2B). The inclusion of migration reduced absolute values of all measures of divergence. For FST and DXY, variance between migration treatments increased across the simulation, however DXY was generally more consistent across migration treatments. Δπ was also reduced in the presence of migration, although the effect of migration on Δπ was generally consistent across all generations and did not increase through time, as was the case for FST and DXY. While FST and DXY increased generally over time, Δπ peaked around generation 500 and declined thereafter. This peak corresponded to the median generation that DP replicates reached their optimum phenotype (median across all data = 493), suggesting this peak reflects selective sweeps. The majority of these patterns were apparent in the PhenoNull (Figure S1) and neutral (Figure S2) simulations, however divergence for all measures was reduced and ultimately negligible by the removal of divergent phenotypes when migration was present.

Demography moderates the association between measures of divergence and selection

Again, founding bottlenecks had a minimal effect on the correlations observed between strength of selection and measures of divergence with minimal variance observed between bottleneck treatments in all comparisons (Figure 3A).

Figure 3

Effects of demographic treatments on the relationship between selection and measures of genetic divergence across all sampling generations. Point color and shape denote treatment groups for founding bottlenecks (A), prolonged bottlenecks (B) and migration (C). Each point represents correlation coefficients calculated across all genes within individual treatments groups, averaged within treatment levels, and averaged over 20 iterations. Prolonged bottlenecks had substantial effects on relative (FST and Δπ) measures and marginal effects on absolute (DXY) measures of divergence (Figure 3B). FST correlations with selection became consistently weaker as DP size reduced. There was minimal difference between Δπ correlations with selection except for the most extreme reductions in DP size. Effects for both were generally consistent through time. For DXY, correlations with selection were broadly consistent with minimal variance across DP size reductions (Figure 3B). Expectedly, the absence of migration largely precluded the ability of measures of divergence to predict strength of selection, with notable variance observed between no migration (0.0) and minimal migration (0.002) treatments emerging rapidly for all divergence measures (Figure 3C). Both FST and DXY variance between migration treatments was greatest at the 10,000 generations sampling point, whereas similar variance was observed between migration treatments for Δπ across simulations. This observation again highlights the significance of temporal differences between measures. Interestingly, correlations between Δπ and selection persisted, albeit weakly, in the absence of migration, which was not the case for FST and DXY. Further, at larger population scaling (μ = 4.89e-5, r = 1e-5) DXY correlations with selection were negative (although became more positive over time with migration) (Figure S6), most likely due to a stronger influence of mutational input. In larger populations, positive correlations were observed between FST and selection without migration, but were weaker than with migration (Figure S6). Prolonged bottlenecks had a minimal effect on how measures of divergence correlated with selection in PhenoNull data (Figure S5). Both FST and DXY became negatively correlated with selection over time without divergent selection, likely due to stronger selection on common alleles shared between AP and DP. Negative correlations were stronger for DXY, consistent with reductions in πDP with stronger selection. This suggests positive associations between selection and DXY in PhenoDiv simulations are likely more dependent on adaptive substitutions in order to overcome this effect. Δπ was generally positively associated with selection in PhenoNull simulations regardless of migration, but was slightly reduced when migration was absent. By examining the correlation coefficients of all 16 unique demographic histories, we can investigate the combined effect of migration and prolonged bottlenecks and directly compare effectiveness of individual measures across time (Figure 4). FST consistently outperforms DXY in terms of associating with selection under most demographic treatments, particularly when DP sizes are larger and migration is present. By 10,000 generations however, the relative dominance of FST appears to subside, with the trend through time suggesting a relative improvement in DXY in treatments with migration (Figure 4). Δπ is similarly more informative than DXY across sampling generations under most demographic treatments. Interestingly at sampling generation 10,000, reductions in prolonged bottlenecks produce the biggest bias toward Δπ (Figure 4). The resilience of Δπ under no-migration treatments is also apparent in FST - Δπ comparisons, such that at 3,000 generations Δπ is more informative than FST in the absence of migration. Consistent with its rapid response to sweeps around 500 generation, Δπ slightly outperforms FST under most demographic scenarios in early generations. By 10,000 generations, however, FST performs as well as Δπ without migration and outperforms Δπ with migration.

Figure 4

Pairwise comparisons between the correlation coefficients of selection with FST, DXY and Δπ across four sampling points. Each data point represents a unique demographic treatment with points colored according to DP population size and shaped according to migration level. Correlation 1 refers to the first measure listed in the comparison and Correlation 2 to the second. The y = x line is plotted within each facet to illustrate biases toward one measure. Points below the line are biased toward Correlation 1, while points above the line are biased toward Correlation 2. Each point represents correlation coefficients calculated across all genes within individual treatments groups and averaged over 20 iterations.

Demography moderates the shape and tail end of divergence distributions

By comparing distributions across simulations with divergent (PhenoDiv) and stabilizing (PhenoNull) selection with neutral runs, we can examine the effect of demographic treatments on the ability of each measure of divergence to discriminate between them (Figure S11-13; Figure 5). There are few differences between distributions of FST for the three simulation types when migration is absent between AP and DP replicates, highlighting an increased likelihood of false-positives. The exceptions occur in early sampling points at 100 and 500 generations (Figure S12) when sweeps are most common. With migration, PhenoNull FST is marginally elevated compared with neutral FST, but distributions are broadly similar. PhenoDiv FST distributions however become more positive and flattened, according to variable selection, with the majority of PhenoDiv FST above the 95% quantiles of PhenoDiv and neutral FST by 10,000 generations (Figure 5).

Figure 5

Distributions of FST under each of the 16 unique demographic treatments under three selection regimes: PhenoDiv (divergent selection), PhenoNull (stabilizing selection) and Neutral, after 10,000 generations. Upper 5% quantiles are highlighted for each distribution, with linetype corresponding to selection: Solid = Divergent, Dashed = Stabilizing, Dotted = Neutral. Each distribution represents data pooled from 20 iterations of 100 25kb regions (N = 2000). Similar patterns were observed for DXY distributions (Figure S14-17), with little to discriminate between in treatments without migration. However, without migration, neutral DXY was generally reduced relative to PhenoNull and PhenoDiv DXY at earlier sampling points. With migration, like FST, PhenoDiv DXY was readily distinguishable from PhenoNull and neutral distributions, but PhenoNull DXY was also generally more positive than neutral DXY. These patterns also emerged more slowly than for FST. In contrast, PhenoDiv Δπ (Figure S18-21) was elevated according to DP size, such that at 500 generations the majority of 25kb regions under divergent selection exhibited Δπ above neutral and PhenoNull 95% cut-offs for all treatments with DP size ≥ 0.5. We quantified false-positive rates (FPR) by permuting over merged data comprised of randomly sampled 5% PhenoDiv regions and 95% neutral regions and observing the upper 5% quantile (Table S1). By 100 generations, FST FPR ranged between 0.06 and 0.91, and was lower with increased DP size and lower in treatments without migration (Table S1). By 500 generations (Table 2), FPR rates were lower with migration and higher without, but only for treatments with DP sizes of 0.5 and 1.0. FST FPR remained high (> 0.81) for all treatments with smaller DP sizes. By 3,000 generations, migration was the most important demographic factor for FST FPR. With migration, FPR ranged from 0.15 – 0.61, and decreased with increasing DP size. Without migration, FPR were high (0.88 - 0.97), close to the random proportion of neutral (0.95) data. By the end of simulations, FST FPR was as low as 0.13, and was no greater than 0.27 with migration and DP size ≥ 0.1. Without migration, FPR was > 0.94.

Table 2

False positive (FPR) and false negative rates (FNR) calculated across all measures of divergence and their combined use. Estimates were calculated by combining 5% of data under divergent selection with 95% neutral data and taking upper 5% cut-offs. For single measures, outlier N is always 100 (5% OF 2,000) and FNR = FPR

Gen	Demographic Treatments		Single Measures			Combined Measures
			F_ST	D_XY	Δπ	F_ST & D_XY			F_ST & Δπ			D_XY & Δπ			All 3
	Migration	DP Size	FPR	FPR	FPR	Outlier N	FPR	FNR	Outlier N	FPR	FNR	Outlier N	FPR	FNR	Outlier N	FPR	FNR
500	0	0.01	0.87	0.87	0.99	62.21	0.87	0.92	9.64	0.98	1.00	26.21	0.98	1.00	9.64	0.98	1.00
500		0.1	0.91	0.86	0.99	54.79	0.89	0.94	14.77	0.98	1.00	13.63	0.99	1.00	7.89	0.98	1.00
500		0.5	0.50	0.81	0.47	31.56	0.49	0.84	41.54	0.19	0.66	12.11	0.37	0.92	10.01	0.24	0.92
500		1	0.28	0.80	0.28	26.84	0.28	0.81	59.81	0.04	0.43	14.99	0.00	0.85	14.75	0.00	0.85
500	0.002	0.01	0.93	0.76	0.98	60.15	0.92	0.95	19.57	0.97	0.99	17.07	0.96	0.99	15.21	0.96	0.99
500		0.1	0.85	0.81	0.96	14.83	0.78	0.97	29.09	0.89	0.97	1.48	0.72	0.99	1.48	0.72	0.99
500		0.5	0.35	0.90	0.47	10.56	0.37	0.93	58.81	0.17	0.51	5.20	0.21	0.96	5.20	0.21	0.96
500		1	0.18	0.90	0.30	11.37	0.29	0.92	66.35	0.02	0.35	6.61	0.12	0.94	5.67	0.00	0.94
10000	0	0.01	0.95	0.89	0.97	27.86	0.96	0.99	0.00	0.00	1.00	7.57	0.96	1.00	0.00	0.00	1.00
10000		0.1	0.98	0.89	1.00	31.29	0.99	1.00	0.73	0.73	1.00	1.53	0.98	1.00	0.00	0.00	1.00
10000		0.5	0.98	0.88	0.98	31.21	0.99	1.00	7.60	0.99	1.00	4.98	0.95	1.00	3.79	0.99	1.00
10000		1	0.94	0.88	0.95	31.47	0.95	0.98	5.25	0.82	0.99	5.15	0.87	0.99	2.80	0.89	1.00
10000	0.002	0.01	0.53	0.17	0.79	56.84	0.21	0.55	34.41	0.44	0.81	22.76	0.17	0.81	22.56	0.17	0.81
10000		0.1	0.27	0.04	0.89	72.06	0.00	0.28	24.08	0.54	0.89	11.12	0.00	0.89	11.12	0.00	0.89
10000		0.5	0.17	0.17	0.70	74.68	0.02	0.27	32.26	0.09	0.70	28.57	0.00	0.71	28.57	0.00	0.71
10000		1	0.14	0.34	0.78	67.03	0.07	0.37	25.33	0.11	0.78	20.70	0.05	0.80	20.70	0.05	0.80

Initial DXY FPR were generally high irrespective of demographic treatment, ranging between 0.78 and 0.93. FPR were largely similar after 500 generations, but by 3,000 generations there was a distinction between treatments with (0.07 ≤ FPR ≤ 0.71) and without (0.85 ≤ FPR ≤ 0.88) migration. Interestingly, here FPR rates were lower (0.07 ≤ FPR ≤ 0.24) when DP were smaller (size = 0.01, 0.1) rather than larger (0.47 ≤ FPR ≤ 0.71). This pattern was also observed at the end of simulations, with FPR lower without migration (0.01 ≤ FPR ≤ 0.34) and lowest with DP size = 0.1. Δπ FPR rates were generally higher across all sampling generations, with FPR not falling below 0.28. There was a clear distinction based on DP size in earlier (100 and 500) sampling points, with FPR lower with larger DP size. However, at later sampling generations (3,000 and 10,000), FPR were generally high (≥ 0.68) regardless of treatment. Taken together, there are clear effects of DP size and migration on distributions of genetic variation and upper quantiles of interest. DP size appears initially most important in the first few hundred generations for FST and Δπ, but these effects are later swamped by the effect of migration for FST and are simply eroded for Δπ. DXY exhibits similar patterns to FST, but these develop after many more generations, and while gene flow increases the informativeness of DXY for detecting divergent selection, as it does for FST, DXY and FST experience opposing effects of increases to DP size.

Demography moderates relationships between measures of divergence

Given FST, DXY and Δπ are all measures of population genetic divergence, there is an assumption that positive correlations should exist between them. We employed the same analysis as above for correlations with selection, but instead examined correlations between individual measures. Founding bottlenecks had minimal effects on the correlations observed between all measures of divergence (Figure 6A).

Figure 6

Effects of demographic treatments on the relationship between measures of genetic divergence across all sampling generations. Point color and shape denote treatment groups for founding bottlenecks (A), prolonged bottlenecks (B) and migration (C). Each point represents correlation coefficients calculated across all genes within individual treatments groups, averaged within treatment levels, and averaged over 20 iterations. Positive correlations between FST and DXY emerged rapidly irrespective of DP size, but smaller DP sizes generally increased the correlation (Figure 6B), with variance between treatments generally decreasing over time. Similarly, FST and Δπ were generally positively correlated, however reductions in DP size reduced correlation coefficients. By 4,000 generations, FST - Δπ correlations for DP sizes ≥ 0.1 stabilized around 0.4, but correlations under extreme prolonged bottlenecks continued to decline to a low of 0.19 (Figure 6B). DXY - Δπ correlations were generally weaker than other comparisons across the course of simulations, but were minimally affected by prolonged bottlenecks. Migration induced substantial variance between correlations of divergence measures, with effects dependent on sampling point (Figure 6C). In the absence of migration, FST and DXY were more strongly correlated for the first 4,000 generations than in treatments with migration. However, from here until 10,000 generations this pattern reversed and FST - DXY correlations increased with migration and deteriorated in allopatry. FST - Δπ correlations were strong initially, but a lack of migration weakened the correlation over time until measures were uncorrelated by around 4,000 generations. In contrast, in the presence of migration, positive correlations between FST and Δπ were strong and relatively stable (R2 = 0.75 – 0.61 over the whole simulation period). Interestingly, without migration, DXY and Δπ were largely uncorrelated, but migration induced a positive correlation between DXY and Δπ that emerged after 2,000 generations and continued to increase through time. Contextualised by our previous demonstrations of associations with selection, these results highlight that selection can induce positive correlations between measures. By 10,000 generations, all pairwise comparisons of divergence measures become positively correlated with migration, which we know is when variation is most strongly associated with selection. Crucially however, this relationship only emerges after several thousand generations, before which we observe positive correlations in migration-absent treatments when associations with selection are weak for all measures (FST - DXY in particular). Extreme reductions in DP size also increase positive correlations between FST and DXY despite poor associations with selection in these treatments. These results highlight that positive correlations between statistics are also achievable in the absence of divergent selection. It is also interesting to note the decay of correlations with Δπ and both FST and DXY in the absence of migration. These observations are most likely driven by the substitution rate. Substitutions that are not linked to selection (drift with ineffective selection) likely drive increased FST and DXY but not Δπ. In PhenoNull simulations, FST and DXY were positively correlated in all treatments, but stronger associations linked with demographic treatments with effective selection did not emerge (Figure S22). This was also true for neutral simulations (Figure S23), but FST - DXY correlations were slightly larger than for PhenoDiv and PhenoNull when selection was ineffective, reaching a maximum of R2 = 0.81 without migration (Figure S23C). This demonstrates that the positive correlations in PhenoDiv simulations are driven in part by divergently adaptive allele frequency shifts and substitutions when selection is effective, but these correlations can also emerge under stabilizing selection or neutrality. Specifically, strong positive correlations with Δπ were dependent on divergent selection, and FST – Δπ correlations became negative over time in treatments without migration under neutrality. DXY – Δπ correlations became negative in PhenoNull data with migration and were unassociated without. Thus, these results highlight that the relationships between measures of divergence are highly dependent on migration, population size, time, and selection experienced over a genomic region. We then examined FPR and false negative rates (FNR) when combining outliers across FST, DXY and Δπ (Table S1; summaries from 500 and 10,000 generations in Table 2). Combined FST - DXY outliers exhibited FPR rates that were highly variable (0.00 - 0.95) and similar to or slightly lower than FST alone at generations 100 and 500, suggesting some improvement in reducing FPR. During this period, FNRs were also high (≥ 0.78), suggesting most regions with divergent selection could not be detected on a neutral backdrop. Combined FST - DXY outlier sets performed well at generations 3,000 and 10,000 for treatments with migration, in some cases dropping to 0 although FNR were highly variable (0.26 ≤ FNR ≤ 1.00). High FPR (0.84 ≤ FPR ≤ 0.99) and high FNR (0.93 ≤ FPNR ≤ 1.00) were observed without migration, highlighting most common outliers between FST and DXY to be neutral, and a failure to detect almost all divergent regions. Combined FST - Δπ outliers tended to outperform FST and Δπ outliers according to FPR with DP sizes of 0.5 or 1.0 and in earlier (100 and 500) sampling generations. Here, FPR dropped to a low of 0 but FNR were reasonable through this time (≥ 0.35). Beyond this (sampling generations 3,000 and 10,000), FPR again dropped to 0, however these were generally alongside high FNR of up to 1.0 without migration, highlighting a failure to detect any common outliers at all. A good example of improvement over singular measures was observed at generation 500, with a founding bottleneck, equal sized populations, and migration. Here, an average of ∼66% of divergent regions were detected with an average FPR of 0. This is compared with: singular FST, where FPR/FNR = 0.18; and singular Δπ where FPR/FNR = 0.29. By 10,000 generations, combined outlier sets of FST - Δπ performed poorer than singular FST with migration present, but generally returned low numbers of false positives when migration was absent, unlike FST - DXY. Combined DXY - Δπ outliers performed poorly across all treatments and all sampling generations, with FNR failing to fall below 0.71. There were, however, some benefits in terms of low FPR for treatments with larger DP sizes (0.5 and 1.0) across all sampling points, suggesting high confidence in outliers (although most divergent regions are missed). This discordance between DXY and Δπ and large FNR also limited the combined usage of all three measures together, with similarly high FNR largely precluding their combined usage. All three statistics did exhibit low FPR with larger DP populations at 100, and 500 generations despite migration. However, at later sampling generations performance of all three combined outlier sets was poor (high FPR/FNR) if migration was absent. These results therefore highlight that combining measures can help reduce FPR, but usually at the cost of increased FNR (expectedly), and only under certain demographies. Our observation that high FPR are prevalent among combined outlier sets from statistics, particularly in the absence of migration, suggests their usage must be dependent on a knowledge of disparity in population size and connectivity of populations. These findings also highlight that the strong correlations that emerge between measures of divergence under scenarios with ineffective selection or even under neutrality do extend to the tail-ends of distributions.

Demography drives signals of convergence irrespective of selection

Clusters of overlapping outliers developed steadily over time (Figure S26-28). By sampling generation 10,000, significant proportions of overlapping outliers were recovered across different demographic treatment groups (Figure 7). For FST and DXY, clustering of treatments was driven by the presence/absence of migration, with the highest proportions of convergent outliers observed between treatments with migration.

Figure 7

The proportional overlap of outliers above the 95% quantile, averaged across 100 downsampled datasets consisting of 95% Neutral and 5% PhenoDiv data for each of the 16 demographic treatments after 10,000 generations. Axis orderings were determined through hierarchical clustering. Heatmaps are shown for single measures of FST, DXY and Δπ in the first column, and combined measures in the second column. Heatmaps are colored according to a common scale of 0 to 1. Treatments are labeled with founding bottleneck (Bot), DP population size (Pop2), and migration (Mig) values. Interestingly, for DXY reasonable proportions of convergent outliers were also recovered across no migration treatments, and the same was true of FST by 3,000 generations (Figure S28), and for both measures at 10,000 generations in ‘smaller’ populations with reduced scaling (μ = 4.89e-7, r = 1e-7; Figure S37). This is despite these treatment groups lacking effective selection. Importantly, there was little overlap between migration and no-migration clusters, suggesting different convergent outliers within each. Combining outliers from FST and DXY reduced overlap among no-migration treatments, but did not remove overlapping outliers altogether. There were minimal convergent outliers observed for Δπ outliers, however combining Δπ with FST and DXY did appear somewhat effective in removing convergent outliers found between no-migration treatments. However, for both combination of Δπ with FST and DXY, the highest proportional overlap was observed for treatments with the smallest DP size. Interestingly, the clustering of FST - DXY outliers in the presence of migration was greatly reduced when divergent selection was removed in both PhenoNull (Figure S29-32) and neutral data (Figure S33-36), but clusters of outliers in the absence of migration were still apparent. Because migration was the dominant factor in clustering of treatments with convergent outlier overlap, we sought to investigate what features of simulated genes drove variance in measures of divergence with treatments separated by migration factor using linear mixed models at the final sampling generation. With migration between AP and DP, selection had by far the strongest effect on FST (Table 3) in the expected positive direction. We also observed a weaker positive association with selection target (% coding) of gene (Table 3), and weaker negative associations with Pheno Gen (generation DP optimum reached) (Table 3). These fixed effects explained 48.96% of variance in FST with migration.

Table 3

LMM results for models of measures of divergence explained by features of simulated regions. For all models, random variables included gene ID and demographic treatment

		Var. explained (%)
Measure	Migration	Fixed	Total	Fixed Effect	Estimate	Std. Error	df	t	P
F_ST	0.002	48.96	69.12	Selection	0.044	0.002	98	27.27	<2.00E-16
				Selection Target	0.009	0.001	103	9.32	2.47E-15
				Pheno Gen	−0.002	0.000	14530	−7.55	4.60E-14

F_ST	0	1.10	96.84	Selection	0.005	0.001	96	4.32	3.78E-05
				Pheno Gen	0.000	0.000	14480	−2.58	0.010
				Exon N	0.005	0.002	96	2.51	0.014
				Selection Target	0.004	0.002	97	2.46	0.015

D_XY	0.002	9.53	84.25	Selection	0.007	0.000	98	16.97	<2.00E-16
				Pheno Gen	−0.001	0.000	14560	−10.49	<2.00E-16
				Exon N	−0.001	0.000	101	−4.15	6.89E-05

D_XY	0	30.70	53.94	Selection Target	−0.003	0.000	98	−11.57	<2.00E-16

Δπ	0.002	8.27	76.20	Selection	0.184	0.006	99	29.82	<2.00E-16
				Selection Target	0.018	0.004	120	4.71	6.84E-06
				Pheno Gen	0.006	0.002	14250	2.92	0.004

Δπ	0	0.59	95.84	Selection	0.066	0.004	99	16.14	<2.00E-16
				Exon N	−0.018	0.003	111	−6.15	1.24E-08
				Pheno Gen	−0.004	0.001	13900	−3.33	8.67E-04

Conversely, fixed effects explained minimal variance (1.10%) of FST in treatments without migration. These fixed effects were significant, but had weak positive associations with selection, exon N and selection target %, and a weak negative association with Pheno Gen. DXY was similarly most strongly positively associated with selection in treatments with migration (Table 3), but the strength of this effect relative to the other model effects was not as large as observed for FST. DXY was also negatively associated with Pheno Gen (Table 3), as was FST, but negatively associated with Exon N (Table 3). This model however explained less variance of DXY with migration (9.53%) than FST with migration. In treatments without migration, selection target % was strongly negatively associated with DXY (Table 3), and this fixed effect explained 30.70% of DXY variance without migration. Selection was the most important fixed effect for Δπ regardless of migration (Table 3), however both models explained minimal variance (8.27% with migration, 0.59% without migration). This is consistent with the previous demonstration of the erosion of Δπ over time (Figure 1). Removing divergent selection (i.e., in PhenoNull simulations) modified models for FST and DXY for treatments with migration. Selection and selection target remained the most important model effects, but had more similar sized effects, and ultimately variance explained dropped from 48.96 to 3.60%. Conversely, the model for PhenoNull DXY with migration increased variance explained from 9.53 to 13.10% compared with PhenoDiv DXY. Selection target had a strongly significant negative association alongside strength of selection in this model. Δπ models were largely unchanged. Together, these results confirm that divergent selection, and not stabilizing selection within DP, drive FST and DXY variation in PhenoDiv simulations.

Discussion

Summary of results

Here, we show that features of demography can have dramatic effects on how measures of population divergence identify regions of the genome under selection. Effects are also strongly time-dependent. Using simulated populations, we have demonstrated the relative influences of founding bottlenecks, prolonged bottlenecks, and migration on three commonly used measures of genetic divergence (FST, DXY, Δπ), while demonstrating the relative usefulness of each measure for informing on selection when population sizes and connectivity vary. We find that founding bottlenecks have little effect on population divergence measures, potentially either because populations quickly recover (before first sampling after 100 generations), or founding bottlenecks of 10% (100 individuals) were not extreme enough. Prolonged bottlenecks (reductions in DP size) however, artificially inflate FST and Δπ but reduce DXY, and can erase the relationship of FST and DXY with selection under the most extreme reductions in population size. Relative measures are, in part, driven by intra-population changes in allele frequency, which become exaggerated in smaller populations as a product of drift (Charlesworth 2009; Ellegren and Galtier 2016). As a consequence, we observe inflated measures of relative divergence as allele frequencies drift in smaller DP replicates. In contrast, DXY increases with larger DP size, as a product of the relationship between the number of segregating sites and the population-level mutation rate (4Neμ) (Hartl ). DXY is a measure of sequence divergence and is averaged across all sites (although similar statistics limit averaging to segregating sites only), which results in higher DXY as segregating sites are introduced into either population at a rate of 4Neμ. This relationship with the number of segregating sites can be observed by examining the positive relationships between DXY and πDP (Figure S38). Overall, the relationship between DXY and selection is less affected by prolonged bottlenecks than FST and Δπ, likely due to the lack of allele frequency relevance. Consider for example, two SNPs with minor allele frequencies of 0/0.5 (SNP 1) and 0.5/0.5 (SNP 2) in AP/DP. Each locus contributes equally toward Dxy (SNP 1 = [0 × 0.5] + [1 × 0.5] = 0.5; SNP 2 = [0.5 × 0.5] + [0.5 × 0.5] = 0.5), whereas the reduction of within-population variance observed for SNP 1 inflates FST and Δπ. However, we also see evidence of DXY FPR increasing with increased DP size (Table 2), suggesting this relationship with increased acquisition of segregating sites may conflict with increased efficacy of selection. Migration, even at the relatively modest rate of 0.2% employed here, substantially reduced absolute values for all measures of divergence. However, while absolute values were reduced, their informativeness of selection coefficients increased dramatically (both in terms of their overall relationship with selection and in identifying outliers). Such a result is expected given the role of gene flow in homogenizing neutral loci (reducing measures of divergence), while retaining population divergence around adaptive loci (increasing informativeness). The well-known ‘genomic islands of divergence’ model is often invoked to explain this pattern of heterogenous genomic divergence in studies of speciation-with-gene-flow (Turner ; Nosil ). Migration also exhibited interesting temporal patterns that are useful for discussing the discrepancies observed between Δπ and both FST and DXY. Of the measures considered here, Δπ uniquely disregards SNP substitutions in its calculation, as fixed substitutions have no influence on intra-population heterozygosity beyond reducing the heterozygosity of proximate SNPs in the aftermath of a hard sweep. This characteristic explains why Δπ responds to selection more rapidly than FST and Dxy and is influenced by migration in a constant manner across time. In addition, the inability of Δπ to account for adaptive substitutions is likely why the informativeness of Δπ peaked at around 500 generations (approximate median for sweeps), decayed, and then stabilized. In contrast, the contributions of adaptive substitutions to FST and Dxy over time increases their predictive power. This characteristic of Δπ, however, also makes it unable to differentiate between divergent and stabilizing selection. This temporal observation has important implications for studies of rapid adaptation on the scale of 10s to 100s of generations, in which our simulations suggest Δπ may be the most informative measure of divergence, while Dxy is initially uninformative for several thousand generations. Examining the effects of bottlenecks and migration on the associations between measures of divergence themselves is a novel element to this study. The relative weaknesses of individual measures of divergences has prompted a recent movement within the literature to employ multiple measures of divergence to avoid false-positives (for e.g., Tine ; Malinsky ; Hämälä and Savolainen 2019). Our results provide some support for this strategy, with reductions in FPR in treatments with gene flow generally, and low FPR when combining either FST or DXY with Δπ (albeit at a reasonably high FNR). However, we also find large numbers of overlapping outliers across combined FST - DXY with a large FPR across treatments without migration. Further, these false-positives persisted in PhenoNull and neutral data, whereas clusters of treatments with overlapping outliers (Figure 7) and with low FPR were restricted to simulations with divergent phenotypes. It is clear then that combining outlier sets of FST and DXY only improves analyses under certain demographic histories, in particular when populations are connected by migration. Cruickshank and Hahn (2014) suggested that a disagreement between FST and DXY outliers in genomic-islands-of-divergence tests highlights a particular susceptibility of FST to BGS (although see (Matthey‐Doret and Whitlock 2019)). Our results are in line with the notion that DXY may be more resistant to false positives due to BGS, given that reductions in DP size resulted in minimal variance in DXY (assuming reductions in population size are analogous to different rates of BGS across the genome exhibit reduced Ne). However, BGS was not specifically manipulated in these simulations, and results are difficult to disentangle with variation in efficacy of removing deleterious mutations. Further, this minimal variance may be explained by the opposing forces of selection efficacy and acquisition of segregating sites discussed previously. We also found that a greater proportion of variance could be explained by the size of selection target for DXY (30.7%) in the absence of migration than for FST (1.1%). We found that FST and DXY are positively correlated under several demographic scenarios, but this strong association only reflects selection when gene flow is present and only after several thousand generations, consistent with previous simulation work (Ravinet ). When migration is absent, and selection ineffective, FST and DXY are also positively correlated; but this relationship decays over time (Figure 6C), with empirical evidence (see below) suggesting a negative relationship is likely to emerge without gene flow. No-migration treatments are consistent with Isolation-By-Distance demographic histories and, similarly, comparisons between reproductively-isolated populations or species. Indeed, negative correlations between FST and DXY within and between clades of birds (Irwin ; Vijay ), within a radiation event of monkeyflowers (Stankowski ) and between speciating orca populations (Foote ) support a declining relationship between these measures over long periods of time in isolation. In agreement, we find that without migration FST and DXY exhibit opposite associations with the proportion of regions made up of coding elements. Mechanistically, a larger selection target increases the rate of deleterious mutation, reducing local π directly through loss of polymorphic deleterious sites, or indirectly through the loss of linked neutral variants under a BGS model. Reductions in local π increase FST and decrease DXY. Over time, associations between FST and DXY should stabilize, given π is generally well-conserved in stable populations even across long time periods (Romiguier ; Dutoit ; Van Doren ). By comparing our results to a second dataset that lacked divergent selection (PhenoNull), we found consistent support that positive associations between DXY and selection are strongly dependent on the inclusion of a divergent phenotype. Positive associations with FST are attainable only in the absence of migration, and are weakly negative without, and Δπ patterns are primarily driven by variable stabilizing selection and made weaker by the inclusion of phenotypic divergence. These comparisons are useful in highlighting the relative roles of adaptive allele frequency changes and substitutions in driving patterns of genetic differentiation. It is also of interest to note that overlapping outliers are readily attainable (most strongly for DXY) across no-migration treatments in PhenoNull (Figure S32) and neutral simulations (Figure S36), but not for treatments with migration. This confirms that overlapping outliers in no-migration treatments occur due to common non-divergent processes within genomic regions, whereas overlapping outliers in treatments with migration are driven by the effects of divergent selection. The discrepancies between our PhenoDiv and other simulated datasets highlight the necessity in quantifying phenotypic differences or environmental selection pressure when interpreting patterns of variation across the genome.

Detecting genetic convergence

In addition to understanding how outlier detection in individual pairs were affected by bottlenecks and migration, we wanted to explore how studies looking at overlapping outliers in multiple pairs (i.e., detecting genetic convergence) were affected by these aspects of demography. Our simulation design, in which an ancestral burn-in population is used to found 16 independent AP-DP pairs is analogous to replicated ecotype population pairs in model systems, such as various ecotype pairs of the three-spined stickleback, Gasterosteus aculeatus (Hohenlohe ; Jones ); high/low predation Trinidadian guppies, Poecilia reticulata (Fraser ); crab/wave ecotype periwinkles, Littorina saxitilis (Westram ; Ravinet ; Kess ), and alpine/montane ecotypes of Heliosperma pusillum (Trucchi ). We found overlapping outliers between demographic treatments and thus, that signals of convergent outliers are attainable for singularly used FST and DXY, and combined outlier sets for FST – Δπ, DXY – Δπ and FST - DXY. Clustering of outliers was predominantly driven by the presence or absence of migration (with minimal overlap between clusters) (Figure 7). The overlap of quantile-based outliers between demographic treatments without migration and ineffective selection may be explained in part probabilistically. With migration restricted, AP and DP exhibit significantly elevated measures of divergence, as seen in Figure 1. This increase results in a normal distribution of FST and DXY across neutral simulations without migration. In contrast, with migration, drift is limited and random recombination influences gene flow and subsequently variation in divergence. Distributions of divergence with migration under neutrality are therefore are heavily right tail-skewed (Figure S39). Spread of data increases with longer right tails, and density of data in each overlapping distribution is limited to the median and lower quantiles. We also expect some effect of different amounts of starting variation among genome regions following burn-ins. With gene flow restricted, this variation may promote overlap under neutral conditions given all demographic treatments are founded from a common burn-in. However, this feature of our simulations is analogous to the conserved landscapes of variation observed in natural genomes (Burri 2017; Vijay ; Stankowski ). Empirically, the influence of migration on outlier overlap has been observed in replicate pairs of parasitic and non-parasitic lampreys. As we show here, when comparing outliers from disconnected and connected parasitic/non-parasitic pairs, Rougemont recover greater numbers of overlapping outliers among comparisons of disconnected pairs than connected pairs. Overlapping outliers among connected pairs are however better correlated, which the authors suggest reflects selection.

Limitations of simulations

In our analyses, we grouped genomic regions within 16 unique demographic treatments, assuming the effects of reductions in population size and migration are equivalent for all regions. This may be unrepresentative of genomes sampled from the wild, in which gene flow and effective population size can vary across genomic regions through structural variation, variable recombination rate and BGS (Gossmann ). A prominent example of such a mechanism is the well-characterized consequence of reduced gene flow within inversions that carry locally adapted alleles. Assuming inversion variants are fixed between populations, gene flow across the locus is limited by the prevention of recombinant haplotypes and resistance to introgression (Kirkpatrick and Barton 2006; Ravinet ). Recent genome scan studies have highlighted convergent outliers within inversions (Jones ; Nishikawa ; Morales ), confirming theoretical models regarding their role in shielding adaptive haplotypes from introgression during the adaptation process. Our simulations suggest that genes with reduced migration, and reduced Ne as a consequence, will have inflated measures of divergence and differentiation relative to other genomic regions. Therefore, we predict this may have led to their over-representation in genome scan outliers, and increased potential to overlap across replicate populations. Thus, caution should be taken regarding the adaptive significance of these outliers relative to absolutely lower values of genetic divergence attained from regions outside of chromosomal rearrangements. By choosing to use a factorial design here, we have increased our understanding of the interplay between specific features of demographic history and multiple measures of population divergence. However, computational limitations constrained absolute population sizes to a maximum of 1,000 individuals and generations to a maximum of 10,000 (with 10,000 generation burn-in). To mitigate these constraints, we repeated the analysis over multiple mutation rates (and scaled recombination rate) to illustrate patterns over 100-fold variation of θ. In general, most trends were consistent, suggesting that results should be consistent across taxa of variable effective population size. However, certain patterns were exaggerated or dampened with increased or decreased mutation rate respectively. For instance, the overlap of outliers between no-migration treatments was exaggerated at our smallest level of population scaling (Figure S37), suggesting false-positive signals of convergence may be more likely with reduced Ne. Further, patterns associated with DXY and selection vary according to population size, appearing delayed in smaller populations (Figure S7) and swamped by mutational input in larger populations (Figure S6). This latter effect can induce negative associations between selection and DXY with effective selection, as selection reduces local genetic variation that would otherwise increase DXY. It is well-documented that both Ne (Frankham 1995) and mutation rate (μ) (Hodgkinson and Eyre-Walker 2011) are highly variable across taxa, which suggests that applying knowledge of the relationships between measures of population differentiation will vary in nature. In addition, temporal variation within our simulations is confounded by the time at which there was a major shift in the DP population phenotype (Figure S40), and by extension when selective sweeps occur. This variation in timing is random with respect to adaptive mutations arising de novo, but is also influenced by demographic treatment. For example, treatments that experience founding bottlenecks are less likely to evolve using variation in the founder, increasing dependence on de novo mutations for adaptation. Additionally, variation in our DP size parameter modifies the per population number of new mutations per generation. This is also true for features of simulated genes, such as size of selection target. Predictable temporal variation in the time at which adaptation is likely to occur is a probable source of variance between measures of divergence and is particularly clear for Δπ, but this was controlled for in later modeling analyses as a fixed effect. A further consideration for these simulations concerns the architecture of the phenotype. Results reported here pertain to mutation effect sizes drawn from a distribution centered at 0 with σ = 1. This produces mutations of typically large effect, but was selected on the basis of phenotypic optima being reasonably distant, with a difference of 10. Thus, 99% of mutations in simulations produce phenotypic differences of less than a third the divergence distance of AP and DP phenotypes. There are numerous factors that influence the distribution of mutation effect sizes in nature, including: selection, mutation, drift, gene flow, extent of pleiotropy and distance to phenotypic optima (Dittmar ), with no single expectation for natural systems as a result. The relatively large distance between optima in our simulations, as well as the rapid change in optima implemented in simulations (Collins and De Meaux 2009), likely gives increased importance to mutations of large effect. The interactions between mutation effect size and the results presented here are beyond the scope of the current study, but we did investigate the effect of reducing μσ of mutation effect distributions to 0.1 (Supporting Information; Figures S40-45). Briefly, we see reductions in the strength of correlations and associations with selection with decreasing phenotypic effects of mutation, consistent with the notion of softer sweeps on small-effect loci. We also see increased variance in the amount of time taken for simulations to reach the PhenoDiv optimum, which agrees with the probable importance of large effect loci in our standard dataset. We, however, retain strong positive associations between measures such as FST and DXY, as we see in neutral simulations, as well as overlapping outliers linked to selection at the tail ends of PhenoDiv distributions. Running the simulations in this way suggests that many of the patterns described here may be robust to scenarios with reduced mutation effect sizes. It is also important when translating these results to genomic data to consider how correlations between measures of divergence depend on the selection type used in simulations. For example, FST and DXY are strongly positively correlated under neutrality without migration, but under the same demographic scenarios we observe a decay in the relationship between FST and DXY when divergent selection is involved. Genomes of natural populations will include regions that are neutral or nearly neutral, under stabilizing selection around a common phenotypic optimum, or divergent between populations. Thus, the patterns described here may not apply to all genomic regions pooled together.

Concluding remarks

We have used forward-in-time simulations to perform a factorial experiment in which we explored the relationships between three measures of genetic divergence, selection, and features of demographic history that are commonly variable in natural populations. In agreement with previous theoretical work, we found the reliance of measures of genetic divergence to identify loci under selection are dependent on population size and migration, and variable through time, with a notable lag in DXY. Furthermore, we provided novel comparisons between measures of genetic divergence that call into question the use of multiple measures to rule out false-positives. We also demonstrated that signals of convergent evolution across independent replicates can be driven by similar features of demographic history with minimal influences of selection, specifically, across replicate populations pairs that lack migration. Therefore, we strongly advise those using overlapping outlier scans to carefully consider the demographic context of their system to avoid false-positives. In particular, the presence or absence of migration between diverging populations is a key factor determining the informativeness of genetic variation for selection, and importantly shapes our expectations of outlier overlap among replicate population pairs. It is tempting to assume that replication in study design or analysis in the form of taking multiple measures of genetic divergence can reduce the risk of attaining false-positives. We hope to emphasize in this study that this is not always the case, as false-positives (i.e., genome scan outliers that are not associated with regions under divergent selection) can be driven by non-random genomic or common demographic features that cannot be bypassed through replication. Moreover, many of the patterns we observe are variable through time, such that the relevant pitfalls of analyses will depend on the age of the populations being considered. It should thus also be important to estimate population splits, as a young replicate pair and older replicate pair with similar demographic histories should be expected to exhibit potentially different patterns of genetic variation. Recent simulation work by Quilodrán , has also emphasized the influence of genomic features and demography, and includes simulation software for estimating the distribution of genetic variation over user-defined chromosomes. Such an approach is particularly useful for systems with chromosome-level genome assemblies in order to gain a sense for how features such as recombination, gene density, and selection targets may produce false-positives under certain demographies. The software employed here, SLiM (Haller and Messer 2019), may also be used to this end, and the scripts accompanying this study will facilitate similar analyses over system-specific genome regions. Further, recent work on genomic landscapes of linked selection (Stankowski ) has highlighted that much of the total variance of genetic divergence such as FST, DXY, and π can be explained by the major principal component (PC1) over numerous pairwise comparisons. These population comparisons need not reflect divergent phenotypes, as PC1 reflects genomic features associated with diversity landscapes. Adopting this approach may be useful for systems lacking a chromosome-level genome assembly by estimating SNPs or regions with non-random elevated measures of divergence associated with genome features. Such SNPs or regions may be particularly prone to false-positives under certain demographic histories.

63 in total

1. Quantifying the variation in the effective population size within a genome.

Authors: Toni I Gossmann; Megan Woolfit; Adam Eyre-Walker
Journal: Genetics Date: 2011-09-27 Impact factor: 4.562

Review 2. Fundamental concepts in genetics: effective population size and patterns of molecular evolution and variation.

Authors: Brian Charlesworth
Journal: Nat Rev Genet Date: 2009-03 Impact factor: 53.242

3. Background selection and F_ST : Consequences for detecting local adaptation.

Authors: Remi Matthey-Doret; Michael C Whitlock
Journal: Mol Ecol Date: 2019-09-03 Impact factor: 6.185

4. Genomewide patterns of variation in genetic diversity are shared among populations, species and higher-order taxa.

Authors: Nagarjun Vijay; Matthias Weissensteiner; Reto Burri; Takeshi Kawakami; Hans Ellegren; Jochen B W Wolf
Journal: Mol Ecol Date: 2017-06-27 Impact factor: 6.185

Review 5. Determinants of genetic diversity.

Authors: Hans Ellegren; Nicolas Galtier
Journal: Nat Rev Genet Date: 2016-06-06 Impact factor: 53.242

6. Genomics of Parallel Ecological Speciation in Lake Victoria Cichlids.

Authors: Joana Isabel Meier; David Alexander Marques; Catherine Elise Wagner; Laurent Excoffier; Ole Seehausen
Journal: Mol Biol Evol Date: 2018-06-01 Impact factor: 16.240

7. A coalescent model of background selection with recombination, demography and variation in selection coefficients.

Authors: K Zeng
Journal: Heredity (Edinb) Date: 2012-11-28 Impact factor: 3.821

8. Genome-culture coevolution promotes rapid divergence of killer whale ecotypes.

Authors: Andrew D Foote; Nagarjun Vijay; María C Ávila-Arcos; Robin W Baird; John W Durban; Matteo Fumagalli; Richard A Gibbs; M Bradley Hanson; Thorfinn S Korneliussen; Michael D Martin; Kelly M Robertson; Vitor C Sousa; Filipe G Vieira; Tomáš Vinař; Paul Wade; Kim C Worley; Laurent Excoffier; Phillip A Morin; M Thomas P Gilbert; Jochen B W Wolf
Journal: Nat Commun Date: 2016-05-31 Impact factor: 14.919