María Gabián¹, Paloma Morán¹, María Saura², Antonio Carvajal-Rodríguez¹
¹ Centro de Investigación Mariña (CIM), Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain.
² Departamento de Mejora Genética Animal, Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), 28040 Madrid, Spain.
Abstract
Pollution and other anthropogenic effects have driven a decrease in Atlantic salmon (Salmo salar) in the Iberian Peninsula. The restocking effort carried out in the 1980s with salmon from northern latitudes, which aimed to mitigate the decline of native populations, failed, probably because the foreign salmon from northern Europe were poorly adapted to the warm waters of the Iberian Peninsula. This result would imply that the Iberian populations of Atlantic salmon have experienced local adaptation in their past evolutionary history, as has been described for other populations of this species and other salmonids. Local adaptation can occur by divergent selection between environments, favoring the fixation of alleles that increase the fitness of a population in the environment it inhabits relative to other alleles favored in another population. In this work, we compared the genomes of different populations from the Iberian Peninsula (Atlantic and Cantabric basins) and Scotland in order to provide tentative evidence of candidate SNPs responsible for the adaptive differences between populations, which may explain the failure of the restocking carried out during the 1980s. For this purpose, the samples were genotyped with a 220,000 high-density SNP array (Affymetrix) specific to Atlantic salmon. Our results revealed potential evidence of local adaptation for North Spanish and Scottish populations. As expected, most differences concerned the comparison of the Iberian Peninsula with Scotland, although there were also differences between Atlantic and Cantabric populations. A high proportion of the genes identified are related to development and cellular metabolism, DNA transcription and anatomical structure. A particular SNP was identified within the NADP-dependent malic enzyme-2 (mMEP-2*), previously reported by independent studies as a candidate for local adaptation in salmon from the Iberian Peninsula. 
Interestingly, the corresponding SNP within the mMEP-2* region was consistent with a genomic pattern of divergent selection.
Keywords:
Atlantic salmon; NADP-dependent malic enzyme-2; SNP array; divergent selection; local adaptation
1. Introduction
Wild populations of Atlantic salmon (Salmo salar) from the Iberian Peninsula suffered a drastic decrease in the last three decades of the 20th century, mainly due to anthropogenic actions such as pollution [1], the construction of physical barriers that reduced the habitat, and the increase in overfishing [2,3].

To reduce this drastic decline, efforts were focused on restocking from 1981 to 1994, not only with eggs from the wild but also with eggs purchased from fish farms, mainly from rivers located in Scotland and Norway. Restocking was done by planting eggs into the river or by releasing fry previously hatched in local fish hatcheries (a practice known as supportive breeding; see [4] and references therein). Nevertheless, there was no increase in the effective population size, suggesting that the introduction of foreign genomes may have contributed even more to the decline of some populations [5,6].

The restocking failure was probably due to a deficiency in the adaptation of salmon from northern Europe to the warm waters of the Iberian Peninsula, since this area constitutes the southern limit of distribution of this species in Europe [7]. In fact, local adaptation has been described in previous studies for other populations of this species and salmonids in general [8,9,10,11]. Therefore, it seems that the Iberian populations of Atlantic salmon may have also experienced local adaptation at some point in their evolutionary history.

To investigate the causes of the deficient adaptation of the northern European populations to the rivers of the Iberian Peninsula, several translocation experiments were carried out from 1992 to 1993, comparing salmon from River Ulla (northwestern Spain) and River Shin (Scotland), the former being one of the rivers used in the restocking [12]. 
The results of these experiments, carried out in a hatchery and in an experimental channel located in northwest Spain, indicated that while salmon from River Shin reached a higher condition factor than salmon from River Ulla, mortality was significantly higher in salmon from River Shin during the early stages of development. These findings suggest that salmon from Scotland are not well adapted to the conditions of the northern Iberian Peninsula and may explain the failure of restocking with foreign salmon [13], which contrasts with the success of the supportive breeding program carried out with local populations since 1998 [14].

The fact that salmon populations located in different areas are genetically different is related to an innate preference to return to their river of origin, a behavior known as homing [15]. In line with this, a recent work by Jeffery et al. [16] studied the occurrence of regional differentiation in Atlantic salmon populations from both sides of the Atlantic using a panel of 96 SNPs. The authors were able to assign salmon populations to their areas of origin with high precision. Although the main cause of this strong differentiation is the reproductive behavior of the species, as gene flow is highly restricted by the tendency to return to the river of birth, other differentiating factors, such as local adaptation, may be involved [17,18]. Local adaptation can occur by divergent selection across environments, which favors the fixation of alleles that increase the fitness of a population in the environment it inhabits relative to other alleles favored in another population [19,20,21,22,23]. 
Consequently, the number of homozygous individuals for favorable alleles in each environment increases because of positive selection [24,25], thus contributing to population differentiation.

For example, in Atlantic salmon, the NADP-dependent malic enzyme-2 (mMEP-2*) has been identified as a candidate for local adaptation, particularly for populations in the Iberian Peninsula, as its allelic frequencies vary at macro- and microgeographic scales in relation to freshwater temperatures and the at-sea age of returning adults [13,26,27].

The combined effect of natural selection and interpopulation isolation can produce a fingerprint in genomes. This effect of selection on the genomes of different populations of the same species can be detected by following at least two different methodologies [28,29], namely:

(1) Detection of atypically high values of differentiation (FST outliers) associated with particular markers/loci.
(2) Detection of unusual haplotypic patterns associated with a selective sweep (hitchhiking) effect, such as increased homozygosity in certain regions.

Regardless of the method, genome-wide fingerprinting provides only indirect evidence of a possible selective effect, so it is necessary to annotate candidate genomic regions and then study the candidate genes potentially involved in the selective process. Identification of these selective signatures may reveal genomic regions of biological and commercial interest. In addition, understanding local adaptation could be important for interpreting how salmonid populations respond to habitat alterations, climate change and fisheries.

Therefore, the aim of this work is to compare the genomes of different populations from the Iberian Peninsula and Scotland to identify SNPs and candidate regions associated with high values of interpopulation differentiation and unusual haplotype patterns. 
The processes in which these genes are involved may explain the adaptive differences between populations and the failures of the restocking carried out during the 1980s.
2. Materials and Methods
A total of 282 returning adult individuals (165 females and 117 males) sampled between 2008 and 2009 were considered. To cover the entire distribution range of salmon in the Iberian Peninsula, 250 individuals from six Spanish rivers (Miño, Ulla, Eo, Sella, Urumea and Bidasoa) were analyzed. This included the Atlantic coast (77 females and 80 males) and the Cantabric coast (70 females and 23 males). The vast majority of the salmon analyzed came from recreational fishing. A few salmon came from sampling conducted by the Atlantic salmon management program. For both the rod-caught (dead) salmon and those caught at the capture stations, the adipose fin was cut off and stored in ethanol by the staff responsible for the salmon management program. Additionally, 32 individuals from Scotland (18 females and 14 males) coming from two Scottish tributaries (Baddoch and Girnock) of the Dee River (see Figure 1) were available for analysis. To identify the population structure, the samples were grouped using principal component analysis (PCA) with Adegenet [30] based on the Bayesian Information Criterion (BIC).
Figure 1
Origin and number of samples analysed: 250 individuals from six rivers in the Iberian Peninsula (Miño, Ulla, Eo, Sella, Urumea and Bidasoa), covering the entire distribution of salmon at its southern boundary, and 32 individuals from two Scottish rivers (Baddoch and Girnock), tributaries of the Dee River.
2.1. Genotyping Quality Analyses
To purify the genomic DNA from ethanol-preserved adipose fins, an NZY Tissue gDNA Isolation Kit (NZYtech) was used. Quantification and purity were assessed using a Nanodrop-1000 spectrophotometer. DNA samples were adjusted to a final concentration of 100 ng/μL and frozen until use. Morphological sex was confirmed by the presence/absence of the SDY intron gene (~200 bp) successfully amplified in males and absent in females, by using the primers SDY E1S1 and SDY E2AS4 [31,32]. Samples were genotyped using a 220 K Affymetrix genotyping array (Thermo Scientific) according to the manufacturer’s recommendations [33]. Genotypes from samples showing a dish quality control (DQC) < 0.82 or call rate < 0.97 were discarded. Only those SNPs classified as Poly High Resolution, with a call rate > 0.97, were used in our analysis. Unmapped SNPs and those with a minor allele frequency (MAF) < 0.01 were also removed (Table 1).
Table 1
Quality control was performed on a 220,000 customized array.
| Quality filter | No. of SNPs removed |
|---|---|
| Not mapped | 1112 |
| Low quality | 47,432 |
| MAF < 0.01 | 5679 |
| Erratic genotypes | 3 |
| Total analysed SNPs | 165,774 |
After applying these filters, a total of 165,774 SNPs remained allocated across the genome and were proportionally distributed according to the size of each chromosome (Figure 2).
Figure 2
SNP distribution after quality control. A total of 165,774 SNPs remained allocated across the genome and were proportionally distributed according to the size of each chromosome.
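As a rough illustration of the filters described in Section 2.1, the sketch below applies the quoted thresholds (DQC, call rate, MAF) to toy data and checks that the counts in Table 1 add up. It is not the vendor's pipeline: the genotype coding (0/1/2 copies of one allele, `None` for a missing call), the function names and the category label string are assumptions made only for this example.

```python
# Illustrative QC sketch; thresholds are those quoted in Section 2.1.
# Genotype coding (0/1/2, None = missing call) and all names are hypothetical.

ARRAY_SIZE = 220_000
REMOVED = {  # counts reported in Table 1
    "not_mapped": 1112,
    "low_quality": 47_432,
    "maf_below_0.01": 5679,
    "erratic_genotypes": 3,
}


def remaining_snps(array_size=ARRAY_SIZE, removed=REMOVED):
    """SNPs left after subtracting every removed class from the array size."""
    return array_size - sum(removed.values())


def sample_passes(dqc, call_rate, min_dqc=0.82, min_call_rate=0.97):
    """Samples with DQC < 0.82 or call rate < 0.97 are discarded."""
    return dqc >= min_dqc and call_rate >= min_call_rate


def minor_allele_frequency(genotypes):
    """MAF computed over called genotypes only."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2.0 * len(called))
    return min(freq, 1.0 - freq)


def snp_passes(genotypes, category, mapped, min_call_rate=0.97, min_maf=0.01):
    """Keep mapped, high-resolution SNPs with call rate > 0.97 and MAF >= 0.01."""
    if not genotypes:
        return False
    call_rate = sum(g is not None for g in genotypes) / len(genotypes)
    return (
        mapped
        and category == "PolyHighResolution"  # assumed label for this class
        and call_rate > min_call_rate
        and minor_allele_frequency(genotypes) >= min_maf
    )


print(remaining_snps())  # 165774, the total reported in Table 1
```

The final line simply confirms that the four removal classes in Table 1 account for the difference between the nominal array size and the 165,774 SNPs analysed.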
2.2. Selection Signatures and Gene Functional Annotation
To evaluate the occurrence of selection, we applied two methods based on the identification of FST outliers and three haplotype-based methods.

The first family of methods is one of the most common ways to detect local adaptation in non-model organisms. These methods rely on allelic frequencies, assuming that loci under selection will show unusually high FST values [34,35,36]. The outlier-based analyses were performed using BayeScan v.2.1 [37] and the EOS test implemented in the HacDivSel v.1.4 software [21]. BayeScan uses a Bayesian likelihood method to estimate the posterior probability of loci experiencing selection. We used default settings, but to protect against false positives, we increased the prior odds for the neutral model to 5000 and considered as significant only SNPs detected with a Bayes factor between 32 and 100 (log10 BF = 1.5–2.0), as these indicate a very strong or decisive signal of divergent selection [36,38]. EOS, in turn, implements a two-step conservative heuristic strategy for detecting extreme FST outliers [21]. We performed the EOS analysis using the default parameters. Input files were converted to the correct format using PGDSpider v.2.1.1.2 [39].

Nevertheless, the outlier strategy has some drawbacks and challenges that require these methods to be used with caution; for example, because the markers are considered independent, there is a loss of power when their number is large. In addition, the assumed null model can be wrong, leading to false positives if the population has experienced demographic or historical events [38,40,41,42,43].

To overcome these drawbacks, we also applied three haplotype-based methods for detecting diversifying selection: (1) a cross-population extended haplotype homozygosity model (XP-EHH) [44], as implemented in selscan v. 1.1.0 [45], (2) the nvdFST test [21], and (3) a test for multiloci FST variance as implemented in SmileFinder [46]. 
Haplotypes were inferred from genotypes with fastPHASE software [47].

Although all three methods require haplotypes, the methodology for searching for selective patterns is different.

XP-EHH. The cross-population extended haplotype homozygosity model is based on the inspection of patterns of linkage disequilibrium decay around selected loci and detects selection based on an excess of specific haplotypes in one of the populations. This method requires a linkage map. Since no map was available for our 220 K array, we used the physical distance.

nvdFST. The nvdFST statistic combines two measures: a normalized variance difference (nvd) and an FST index. The nvd measure divides the haplotypes into two sets for each candidate SNP: one set carrying the major allele for the SNP and the other set carrying the minor allele. Only SNPs shared by both populations were considered. A variance of mutational distances is computed within each set, and a normalized difference between variances defines the statistic, which will increase under selection. The FST measure takes advantage of the fact that if selection acts on a SNP flagged by a high nvd value, then the FST at that site will be higher than the overall FST, assuming equilibrium in the presence of migration. A resampling method is used to compute the p-value under the hypothesis of panmixia, and the final candidate SNPs are those with the highest nvd values that additionally reject panmixia [21]. We used different window sizes (1000, 500, 250, 125 and 62) for computing nvd and considered as potential candidates the 1% of the SNPs with the highest nvd that were also significant for the FST test under any window size.

SmileFinder. This method uses a resampling-based strategy to infer the significance of multiloci FST variance using sliding windows of haplotypes of increasing size. 
In this case, the highest values of variance indicate the presence of selection.

The gene content within genomic regions identified by candidate SNPs was explored using SalmoBase [48], and gene function was annotated with the DAVID tool [49], using as references, prioritized by phylogenetic proximity, other salmonids, Danio rerio, Xenopus laevis, Mus musculus and Homo sapiens.
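To make the idea behind the extended haplotype homozygosity used by XP-EHH more concrete, the following minimal sketch computes EHH for one core allele in one population: the probability that two randomly chosen chromosomes carrying that allele are identical over the whole stretch between the core SNP and a second site. This is only the basic building block, under the assumption that haplotypes are given as equal-length strings of allele codes; it is not the selscan implementation, which additionally integrates EHH over distance and standardizes the cross-population log-ratio.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, core_allele, upto):
    """Extended haplotype homozygosity between the core site and site `upto`.

    haplotypes  : list of equal-length strings (phased chromosomes)
    core        : index of the core SNP
    core_allele : allele that defines the carrier chromosomes
    upto        : index of the site up to which homozygosity is evaluated
    """
    carriers = [h for h in haplotypes if h[core] == core_allele]
    n = len(carriers)
    if n < 2:
        return 0.0  # no pair of chromosomes to compare
    lo, hi = min(core, upto), max(core, upto)
    # Group carriers by their full allele string over the interval [lo, hi].
    classes = Counter(h[lo:hi + 1] for h in carriers)
    identical_pairs = sum(comb(count, 2) for count in classes.values())
    return identical_pairs / comb(n, 2)


# Toy example: four chromosomes, core SNP at index 1, core allele "1".
toy = ["0100", "0100", "0101", "1110"]
for site in range(4):
    print(site, ehh(toy, core=1, core_allele="1", upto=site))
```

EHH equals 1 at the core site and decays with distance as recombination and mutation break up the carrier haplotypes; a slower decay in one population than in the other is the pattern that a cross-population comparison looks for.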
3. Results
3.1. Population Structure
Genotyped samples were classified into three groups after clustering with PCA, corresponding to samples from Scotland (Baddoch and Girnock) and both slopes of the Iberian Peninsula: rivers from the Atlantic coast (Miño and Ulla) and from the Cantabric coast (Eo, Sella, Urumea and Bidasoa) (Figure 3). From now on, these three pools of data will be referred to as Atlantic, Cantabric and Scotland, with 157, 93 and 32 individuals, respectively. Although 9% of individuals from Atlantic rivers clustered with Cantabric rivers, they were considered Atlantic data. This finding is probably related to the fact that during the 1980s and until 1992, supportive breeding was performed in Galicia with an initial stock originating from different Galician rivers, including River Eo. However, since 1992, supportive breeding has been exclusively performed with individuals from the same river.
Figure 3
Samples were clustered through principal component analysis with Adegenet (Jombart 2008) based on the Bayesian Information Criterion (BIC). Three pools of data were clustered, listed as Atlantic (Miño and Ulla), Cantabric (Eo, Sella, Urumea and Bidasoa) and Scotland (Baddoch and Girnock), with 157, 93 and 32 individuals, respectively.
3.2. Selection Signatures: Outlier Methods
The BayeScan and EOS methods were applied to the comparison of population pairs. For BayeScan, we required a posterior probability of at least 0.9 (logBF ≥ 1.5) as evidence for local adaptation, but only the comparison between Atlantic and Scotland showed evidence of positive selection and only for two significant unannotated SNPs (ctg718000187612_7079_SGT and ctg7180001695888_3098_SCT) in chromosome Ssa22 in a region of around 20 Mb. These same two SNPs were also detected by EOS, which found 142 significant SNPs in the Cantabric–Scotland comparison and 412 in Atlantic–Scotland. There were no significant SNPs in the Atlantic–Cantabric comparison (Table 2).
Table 2
Summary of significant SNPs in each comparison between studied populations after the application of different selection detection methods (Atl-Can: Atlantic–Cantabric; Can-Scot: Cantabric–Scotland; Atl-Scot: Atlantic–Scotland).
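| Statistic | Software | Significant SNPs, Atl-Can | Significant SNPs, Can-Scot | Significant SNPs, Atl-Scot |
|---|---|---|---|---|
| FST outlier detection | HacDivSel (EOS test) | 0 | 142 | 412 |
| FST outlier detection | BayeScan (logBF = 1.5) | 0 | 0 | 2 |
| Haplotype-based methods | HacDivSel (nvdFST) | 748 | 1504 | 2607 |
| Haplotype-based methods | SmileFinder | 631 | 1346 | 2786 |
| Haplotype-based methods | selscan (XP-EHH) | 201 | 1863 | 2880 |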
The largest number of significant SNPs was found in chromosome Ssa04, for the comparison of Atlantic versus Scotland (Figure 4). In addition, further regions were detected by EOS in both comparisons with Scotland. Notably, the 24.8–24.9 Mb region in Ssa09 has previously been identified by several studies as carrying a strong diversifying selection signal among European and also North American Atlantic salmon populations [10,33,50,51].
Figure 4
Proportion of significant SNPs for the outlier methods: BayeScan (log BF = 1.5) and EOS-HacDivSel (Atl-Can: Atlantic–Cantabric; Can-Scot: Cantabric–Scotland; Atl-Scot: Atlantic–Scotland).
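For readers less familiar with the quantity that the outlier methods in this section evaluate, the sketch below computes a per-locus FST between two populations from allele frequencies and lists the loci in the upper tail. It is a deliberately simple stand-in (a heterozygosity-based FST with equal population weights and an empirical top-fraction cut-off chosen for illustration), not the Bayesian model of BayeScan nor the two-step heuristic of EOS; function names and the cut-off are assumptions.

```python
def fst_biallelic(p1, p2):
    """Per-locus FST = (H_T - H_S) / H_T for one biallelic SNP.

    p1, p2 are the frequencies of the same allele in the two populations,
    which are given equal weight (an assumption of this sketch).
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)   # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0                           # monomorphic locus
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def top_fst_loci(freqs_pop1, freqs_pop2, top_fraction=0.01):
    """Return all per-locus FST values and the indices of the highest ones."""
    fst = [fst_biallelic(a, b) for a, b in zip(freqs_pop1, freqs_pop2)]
    n_keep = max(1, int(round(top_fraction * len(fst))))
    ranked = sorted(range(len(fst)), key=lambda i: fst[i], reverse=True)
    return fst, ranked[:n_keep]


# Toy example: the second locus is strongly differentiated.
fst_values, candidates = top_fst_loci([0.5, 0.2, 0.5], [0.5, 0.8, 0.6],
                                      top_fraction=0.34)
print(fst_values, candidates)
```

A plain empirical cut-off like this one ignores demography and the non-independence of linked markers, which is one reason dedicated outlier tests and the haplotype-based methods are used instead.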
3.3. Selection Signatures: Haplotype Methods
The haplotype-based methods identified many more candidate SNPs, even for the Atlantic–Cantabric comparison (Table 2). Since the methods assayed use different strategies to infer selection and it is a recommended practice to combine various approaches [52], we focused on those SNPs detected by at least two of the haplotype-based methods (Figure 5 and Table 3), whether or not they were also detected by the outlier-based methods. The complete list of these SNPs and the methods for which they were significant are given in Supplementary Tables S1 and S2.
Figure 5
Per chromosome regions with significant SNPs obtained from at least two haplotype-based methods. Regions surrounded by a red circle indicate SNPs detected in both comparisons with Scotland.
Table 3
Number of significant SNPs in at least two haplotype-based methods for each comparison. Comparisons: Atl-Can: Atlantic–Cantabric; Atl-Scot: Atlantic–Scotland; Can-Scot: Cantabric–Scotland. Methods: X: XP-EHH; N: nvdFST; S: SmileFinder. Total: total number of different SNPs detected.
| Comparison | X-N | X-S | N-S | X-N-S | Total |
|---|---|---|---|---|---|
| Atl-Can | 0 | 4 | 14 | 0 | 16 |
| Atl-Scot | 210 | 275 | 147 | 64 | 506 |
| Can-Scot | 59 | 31 | 82 | 19 | 170 |
Chromosomes Ssa09, Ssa11 and Ssa24 had the highest abundance of selection signatures detected by at least two haplotype-based methods, with 90 (Table S1 identifiers H136–H225), 95 (H241–H335) and 119 (H474–H592) SNPs, respectively (Table S1). There were 16 candidate SNPs for local adaptation between the Atlantic and Cantabric rivers, 506 for Atlantic–Scotland and 170 for Cantabric–Scotland (Table 3). Regarding the comparison with Scotland, 62 out of the 170 SNPs detected in Cantabric–Scotland were also detected in the Atlantic–Scotland pair. These common SNPs were located in different chromosomes: Ssa01, 04, 07, 09, 11, 16, 18, 22, 24 and 27 (red circles in Figure 5).

The aforementioned 24.8–24.9 Mb region of divergent selection in Ssa09 was also identified in both comparisons with Scotland by the nvdFST method, although not by selscan or SmileFinder. On the other hand, chromosome Ssa22 had 13 significant SNPs detected for the Atlantic–Scotland comparison by both the XP-EHH and nvdFST methods in a region close (1–2 Mb downstream) to where BayeScan and EOS found the two significant (non-annotated) SNPs (Figure 4).

Notably, 64 SNPs were detected as significant by all three methods for the Atlantic–Scotland comparison, and 19 of these were for the Cantabric–Scotland comparison.
3.4. Gene Functional Annotation
The complete list of significant SNPs in one or two population comparisons and for two or three of the haplotype-based methods can be found in the supplementary Tables S1 and S2 and the genes containing these SNPs in Table S3.
3.5. Atlantic–Cantabric Comparison
As expected, there were fewer significant SNPs detected by at least two haplotype-based methods in the Atlantic–Cantabric comparison than in either of the two pairwise comparisons with Scotland. These SNPs were located in chromosomes Ssa03 (identifiers H040–H043 in Table S1), Ssa13 (identifiers H339–H340 in Table S1), Ssa17 (identifiers H408–H412 in Table S1), Ssa20 (identifier H439 in Table S1), Ssa25 (identifiers H593–H594 in Table S1) and Ssa26 (identifiers H600–H601 in Table S1).

Notably, SNP H601, which was detected by nvdFST and SmileFinder (Table S2), lies within the gene Cyld, which has recently been associated with body weight in rainbow trout [53].
3.6. Comparison with Scotland: SNPs Significant for All Three Haplotype-Based Methods
There were 64 significant SNPs for all three haplotype-based methods. The GO (Gene Ontology) categories of the genes containing those SNPs correspond mainly to developmental and cellular processes. Among the annotated genes (Table 4), we found the protocadherin fat4, which belongs to the cadherin gene family and has been associated with amoebic gill disease resistance [54]; the glyoxalase 1 gene, glo1, which is important in many physiological processes and diseases and seems to increase its expression in response to stress [55]; MAM domain containing glycosylphosphatidylinositol anchor 1, mdga1, which is related to brain development [56] and has previously been identified as under selection in the divergence between Norwegian populations of Atlantic salmon [57]; WD repeat domain 43, wdr43, which is related to lipid metabolism [58] and to the transcriptional response to contaminant exposure [59]; and zinc finger AN1-type domain 3, zfand3, which is involved in sex determination and male germ cell maturation in the teleost Nile tilapia [60].
Table 4
Genes that include (or are located at less than 1000 kb) significant SNPs for the Atlantic–Scotland comparison detected by all three haplotype-based methods. SNPs were annotated using the online tool SalmoBase (https://salmobase.org/, accessed on 20 May 2021). The #SNPs refer to the number of all SNPs detected by all three methods in that chromosome, not only to those included in genes. The latter are indicated by SNP IDs. In parentheses, the Mb position in the corresponding chromosome.
*: was significant in both comparisons Atlantic–Scotland and Cantabric–Scotland.
When the SNPs detected by only two haplotype-based methods were also considered, most of the genes identified were related to metabolic or cellular developmental processes, DNA transcription and anatomical structure. In Ssa03, the gene tpr, which is related to the response to temperature changes in mice [61], is affected by two SNPs in the Cantabric–Scotland comparison. Another polymorphism, in Ssa04, is close to pgk, a locus related to growth in turbot [62]. Other genes involved in growth in Atlantic salmon are close to significant SNPs: agrn and pomt1, in Ssa15 (1,164,514 bp) and Ssa20 (31,286 bp), respectively [63]. In addition, other SNPs are near e2f4 in Ssa10 and fra10ac1 in Ssa28, genes related to growth and late maturation in salmon [64].
3.7. Malic Enzyme
As the NADP-dependent malic enzyme-2 (mMEP-2*) has been reported to be a candidate for local adaptation in salmon from the Iberian Peninsula [27], there is a priori independent information about local adaptation effects related to that locus (LOC106586750), which is located at 39,878,908–39,909,791 bp in Ssa25 [65]. In this region, there is just one SNP (ctg7180001794010_7928_SCT) at 39,897,822 bp in chromosome Ssa25 (id 3579 in Table S4). This malic SNP is surrounded by two other SNPs located at −11 kb and +15.6 kb (ids 3578 and 3580, Table S4). However, in chromosome Ssa25, there were only 6 SNPs detected as significant by at least two haplotype-based methods and these SNPs are located at 7,175,102 bp and 32,983,986 bp (Table S1), thus leaving out the region coding for mMEP-2*.

In a previous study using the same microarray and populations (from Spain), genomic linkage disequilibrium decreased strongly for SNPs separated by 1 Mb (reaching half of the maximum value for SNPs separated by 0.5 Mb) [66]. In our case, if we consider a window size of 51 SNPs centered on the mMEP-2* gene, the SNPs at the upstream and downstream edges of the window lie at distances of −0.23 Mb (SNP id 3554) on the left and +0.64 Mb (SNP id 3604) on the right. 
These distances are minimum values because, unlike the SNP list presented in Table S4, HacDivSel does not consider non-shared SNPs, which could increase the distance between the adjacent SNPs in a window of the given size.

Similarly, if we use a window of size 125 SNPs, we will have 62 SNPs on each side, implying a distance of at least 0.61 Mb on the left and 1.5 Mb on the right side of the malic SNP (ctg7180001794010_7928_SCT).

Therefore, we expect that if the malic SNP reflects a selective pattern affecting the variance of haplotype allelic classes, it could be detected under a window size of less than 50 SNPs or even 125 SNPs, since linkage disequilibrium is still appreciable up to 1 Mb away.

Moreover, considering a candidate SNP that has been previously identified, we can perform the test only for this SNP so that multiple testing corrections are not required. Unfortunately, because the distribution of the nvd statistic is not known, we still face the problem that the statistical significance of a SNP that we assigned to the 1% of candidates with the highest nvd is meaningless if we have only one candidate. Fortunately, we can still perform an FST test and look for a more direct way of estimating statistical significance by comparing haplotype variances.

Therefore, we modified our program HacDivSel to accept user-defined candidate SNPs and performed the analysis in a window of SNPs centered on the gene coding for mMEP-2*. For each population, we compared the homogeneity of variances between the partitions carrying one or the other malic allele, i.e., C or T at the ctg7180001794010_7928_SCT SNP. These are the same variances compared by nvd, and we can use a robust test of variances (e.g., the composite test as defined in Ramsey and Ramsey 2007) to assess whether the variance for the major allele partition is significantly lower, which would indicate the presence of a positive selection effect in the population. 
We considered divergent selection when the comparison of variances was significant in at least one population and when both populations had a significantly high FST value (see Appendix A for details of the variance comparison). With this criterion, we analysed the SNP within the mMEP-2* region using windows of 25, 51 and 125 SNPs to study the possible pattern of divergent selection. The results are presented in Table 5.
Table 5
Homogeneity variance test for divergent selection from the new version of the HacDivSel program. Var test p value: The p-value of the within-population variance homogeneity F test. The within-population variance homogeneity test implies two comparisons, one for each population, and two p values are obtained. The minimum p-value obtained is given in the table.
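| Comparison | Window size | Var test p-value | FST p-value | Divergence significance test |
|---|---|---|---|---|
| Atl-Can | 25 | 4 × 10⁻⁸ | 0.014 | * |
| Atl-Can | 51 | 0.003 | 0.004 | * |
| Atl-Can | 125 | 0.164 | 0.002 | ns |
| Can-Scot | 25 | 0.011 | 0.038 | * |
| Can-Scot | 51 | 0.005 | 0.030 | * |
| Can-Scot | 125 | 0.653 | 0.024 | ns |
| Atl-Scot | 25 | 8 × 10⁻¹⁰ | 0 | * |
| Atl-Scot | 51 | 0.021 | 0 | * |
| Atl-Scot | 125 | 1 | 0 | ns |

*: Significant variance and FST p-values. ns: non-significant.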
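The logic of the candidate-SNP test can be sketched as follows. Haplotypes in the window are split by the allele they carry at the candidate SNP, a distance is computed for each haplotype, the variances of the two partitions are compared, and divergence is declared only when the variance test and the FST test are both significant. Several elements here are assumptions made for illustration and are not taken from HacDivSel: the distance is the number of window sites differing from a major-allele reference, the composite variance test is replaced by a simple one-sided permutation test, and the significance level is set to 0.05. The p-values in `TABLE5` are those reported in Table 5.

```python
import random
from statistics import pvariance

ALPHA = 0.05  # assumed significance level


def hac_partitions(haplotypes, major_alleles, candidate):
    """Split per-haplotype distances by the allele at the candidate SNP.

    Distance = number of window sites differing from the major-allele
    reference (an assumed measure). Both partitions must be non-empty
    before their variances can be compared.
    """
    major, minor = [], []
    for hap in haplotypes:
        distance = sum(a != m for a, m in zip(hap, major_alleles))
        if hap[candidate] == major_alleles[candidate]:
            major.append(distance)
        else:
            minor.append(distance)
    return major, minor


def variance_permutation_p(major, minor, n_perm=2000, seed=1):
    """One-sided permutation p-value for var(major) < var(minor).

    A stand-in for the composite variance test mentioned in the text.
    """
    rng = random.Random(seed)
    observed = pvariance(minor) - pvariance(major)
    pooled = list(major) + list(minor)
    k = len(major)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if pvariance(pooled[k:]) - pvariance(pooled[:k]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def is_divergent(var_p_min, fst_p, alpha=ALPHA):
    """Both the minimum variance-test p-value and the FST p-value must be significant."""
    return var_p_min < alpha and fst_p < alpha


# (comparison, window size, minimum variance p-value, FST p-value) from Table 5
TABLE5 = [
    ("Atl-Can", 25, 4e-8, 0.014), ("Atl-Can", 51, 0.003, 0.004),
    ("Atl-Can", 125, 0.164, 0.002),
    ("Can-Scot", 25, 0.011, 0.038), ("Can-Scot", 51, 0.005, 0.030),
    ("Can-Scot", 125, 0.653, 0.024),
    ("Atl-Scot", 25, 8e-10, 0.0), ("Atl-Scot", 51, 0.021, 0.0),
    ("Atl-Scot", 125, 1.0, 0.0),
]

for comparison, window, var_p, fst_p in TABLE5:
    print(comparison, window, "*" if is_divergent(var_p, fst_p) else "ns")
```

Under the assumed 0.05 threshold, the decision rule reproduces the last column of Table 5: the 125-SNP windows fail on the variance test even though their FST p-values remain significant.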
The malic mMEP-2* SNP, analysed with this new method in the new version of the HacDivSel software, was clearly detected as a candidate for local adaptation in the three comparisons under the window sizes assayed within 1 Mb. The window size of 125 SNPs was non-significant in the three comparisons, which probably means that the linkage disequilibrium decreased enough to make it difficult to detect the selection pattern under the given sample sizes.

Furthermore, the neighbors of mMEP-2* are only a few kb apart, which suggests that SNPs in the vicinity should still present a selective pattern, even in the Atlantic–Cantabric comparison (Table 6). In contrast, the SNPs in the extremes of the windows of size 50 and 125 SNPs should not have a selective pattern, at least not one caused by the mMEP-2* effect, with the exception of SNP ID 3704 (Table 6).
Table 6
Homogeneity variance test for divergent selection for the Atlantic–Cantabric comparison. ID: SNP ID in Table S4 (e.g., ID 3579 corresponds to SNP ctg7180001794010_7928_SCT in the region with the gene symbol LOC106586750 corresponding to mMEP-2*). Var test p value: minimum p-value of the two variance homogeneity F tests (one for each population).
| ID | Window size | Var test p-value | FST p-value | Divergence significance test |
|---|---|---|---|---|
| 3454 | 25 | 1 | 1 | ns |
| 3529 | 25 | 1 | 1 | ns |
| 3578 | 25 | 0.026 | 3 × 10⁻⁹ | * |
| 3579 | 25 | 4 × 10⁻⁸ | 0.002 | * |
| 3580 | 25 | 0.002 | 0.013 | * |
| 3629 | 25 | 1 | 1 | ns |
| 3704 | 25 | 2 × 10⁻⁸ | 0.008 | * |

*: Significant variance and FST p-values. ns: non-significant.
In addition, we estimated the false-positive rate (FPR) of the new method using simulated data. We note that, at least with the tested window sizes (from 25 to 400 SNPs), the method appears conservative, with an FPR below 5% (see Table A1 in Appendix A), supporting the reliability of our results.
Table A1
False positive rate (FPR) for the divergent selection detection variance test using simulated neutral evolution data available from Carvajal-Rodríguez, 2017 [21]. The genome size was 1 Mb. Population size N = 1000. Number of generations T = 10,000. Population mutation rate θ = 4Nµ. Population recombination rate ρ = 4Nr. The candidate position is in the middle of the window. The total number of replicates for each case was 1000. FPR = 100 × (number of replicates with a significant test)/1000. MAF = 1%. Significance level α = 0.05.
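| File Name | θ | ρ | Mean Window Size | FPR |
|---|---|---|---|---|
| C4 | 12 | 0 | 25 | 1% |
| C4 | 12 | 0 | 87 | 1% |
| C5 | 12 | 4 | 25 | 2% |
| C5 | 12 | 4 | 87 | 1% |
| C6 | 12 | 12 | 25 | 2% |
| C6 | 12 | 12 | 86 | 1% |
| C16 | 60 | 0 | 25 | 1% |
| C16 | 60 | 0 | 51 | 1% |
| C16 | 60 | 0 | 125 | 2% |
| C16 | 60 | 0 | 436 | 1% |
| C17 | 60 | 4 | 25 | 2% |
| C17 | 60 | 4 | 51 | 2% |
| C17 | 60 | 4 | 125 | 2% |
| C17 | 60 | 4 | 429 | 2% |
| C18 | 60 | 60 | 25 | 2% |
| C18 | 60 | 60 | 51 | 2% |
| C18 | 60 | 60 | 125 | 2% |
| C18 | 60 | 60 | 431 | 1% |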
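The FPR definition given in the caption of Table A1 can be written out as a short Python sketch. It only restates the formula (percentage of neutral replicates with a significant test) and the comparison with the 5% level. The function names and the example counts are hypothetical and are not taken from the simulation files.

```python
def fpr_percent(significant_flags):
    """Percentage of replicates whose test was significant."""
    n_replicates = len(significant_flags)
    if n_replicates == 0:
        raise ValueError("at least one replicate is required")
    return 100 * sum(significant_flags) / n_replicates


def is_conservative(fpr, alpha=0.05):
    """True when the observed FPR does not exceed the nominal level."""
    return fpr <= 100 * alpha


# Hypothetical example: 20 significant tests among 1000 neutral replicates
flags = [True] * 20 + [False] * 980
fpr = fpr_percent(flags)
print(fpr, is_conservative(fpr))
```

With 1000 replicates per case, as in the table, each significant replicate contributes 0.1 percentage points to the FPR.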
4. Discussion
In this work, we performed a genome-wide scan using a high-density SNP array to investigate patterns of selective variation in the Atlantic salmon genome and to identify candidate loci that may be involved in the local adaptation of Atlantic salmon in the northern Iberian Peninsula, which is the southern limit of the distribution of this species.
We used different frequency-based (FST outliers) and haplotype-based selection detection methods and observed little overlap in the regions detected by the two types of strategies. The lack of overlap between frequency-based and haplotype-based methods has already been reported in other studies [67,68]. Regarding the use of outliers, most salmonid phenotypic traits, such as growth, body size and fat content, among others, are polygenic [9], so the effect of selection may involve a subtle change in allele frequency at several loci. This context makes it difficult to detect the selection effect using outlier-based methods that rely on a strong change in allele frequency at a few independent loci [21]. Therefore, as expected, haplotype-based methods have more power to detect recent signatures of selection [69]. We combined the results of three haplotype-based methods to identify 630 SNPs, 55% of which matched 116 genes in 25 out of 29 chromosomes (Tables S1–S3). The chromosomes containing the most candidate genes were Ssa24 (29 genes), Ssa09 (13 genes), Ssa11 (9 genes), Ssa01 and Ssa17 (8 genes each) and Ssa10 and Ssa03 (7 and 6 genes, respectively). The genes potentially most affected by divergent selection were those resulting from the comparison between the peninsular slopes and Scotland, as expected, since the intensity of local adaptation appears to be correlated with the geographical distance between salmonid populations [9,57].
Our results support the idea that Atlantic salmon populations show significant genetic differences that may be the result of a combination of diversifying natural selection and the effect of gene flow and drift.
Even with the conservative threshold we used for outlier methods, coupled with the requirement of a positive signal in at least two haplotype-based methods, we detected quite a few signals and genes that could be involved in local adaptation processes. Similar to other genome-wide studies looking for local adaptation, several of the candidate genes we have found are distributed along different chromosomal regions and are related to cellular growth and lipid metabolic processes, among others.
In addition to the SNPs that coincided in at least two haplotype-based methods, we found a well-known Ssa09 haploblock associated with a strong signature of divergent selection among Atlantic salmon populations [10]. This region, located at 24.8–24.9 Mb, presented a strong signal in the comparisons between Scotland and the Atlantic and Cantabric populations for both the EOS and the nvdFST methods. The same region has been previously reported as a candidate for selection by several authors and contains the protein phosphatase 1a (ppm1a) gene and the SIX homeobox (six6) gene, the latter related to circadian timing processes [10]. For the comparison between the Atlantic and Scotland samples, nvdFST also detected SNPs within the region located at 77–78 Mb on Ssa13, corresponding to the SLC25A14 gene, which encodes a brain mitochondrial transporter protein and has also been found to diverge between European and North American populations [70]. Similarly, we found divergence in regions containing immune-related genes already identified in the literature as divergent in comparisons between northern and southern Norwegian populations [57].
Malic Enzyme
The gene encoding the malic enzyme deserves special mention. As previously explained, this gene, located on Ssa25, was specifically explored for candidate SNPs, since early electrophoretic studies on genetic variation in Atlantic salmon referred specifically to this gene [71]. Variation at mMEP-2* alleles (isozymes 100 and 125) presents strong correlations with environmental temperature, both within and among rivers, and associations with phenotypic performance [71]. Indeed, there is a latitudinal cline in which the frequency of the *125 allele of mMEP-2* is higher in northern European Atlantic salmon populations than in southern European populations. Moreover, in the Atlantic populations of the Iberian Peninsula, the frequency of the *125 allele is almost zero [72]. In fact, the failure of the restocking carried out in the Iberian Peninsula with salmon from Northern Europe was proven by the low frequency of the *125 allele in the Iberian populations after restocking [13].
With the genomic selection detection methods initially assayed, however, we could not detect any significant SNP within the region coding for mMEP-2*. This result was expected. First, the allele frequencies of the malic SNP were 0.02229 in the Atlantic sample, 0.134 in the Cantabric sample and 0.4219 in the Scotland sample, which implies that the frequencies of the three samples are in the same half of the frequency range. Second, since we are analyzing several thousand SNPs, multitest correction implies a loss of statistical power. Thus, it is expected that other, more powerful signals will be more easily detected. Fortunately, after modification of the HacDivSel method to consider some candidates a priori, the malic mMEP-2* SNP was clearly detected as a candidate for local adaptation.
Although the maladaptation of northern European salmon demonstrated by the garden experiment described by García de Leániz et al. [12] is not exclusively due to the malic enzyme, the role of this gene could be relevant, as suggested by the results obtained in the present work.
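The two reasons given above for the initial lack of detection can be illustrated with a short Python sketch using the allele frequencies reported in the text. It rests on assumptions that are not from the present work: FST is computed with Wright's simple formula for one biallelic locus and two equally weighted populations (not the estimator used in our analyses), the multitest correction is taken to be Bonferroni, and the number of tests (5000) is a hypothetical stand-in for "several thousand SNPs". All function names are hypothetical.

```python
# Allele frequencies of the malic SNP as reported in the text
freqs = {"Atlantic": 0.02229, "Cantabric": 0.134, "Scotland": 0.4219}


def same_half(values):
    """True if all frequencies lie below 0.5 or all lie above 0.5."""
    vals = list(values)
    return all(v < 0.5 for v in vals) or all(v > 0.5 for v in vals)


def fst_two_pops(p1, p2):
    """Wright's FST for one biallelic locus, two equally weighted populations
    (an assumed, simplified estimator)."""
    p_bar = (p1 + p2) / 2
    denom = p_bar * (1 - p_bar)
    if denom == 0:
        return 0.0
    return ((p1 - p2) / 2) ** 2 / denom


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction (assumed)."""
    return alpha / n_tests


print(same_half(freqs.values()))
for a, b in [("Atlantic", "Cantabric"), ("Cantabric", "Scotland"), ("Atlantic", "Scotland")]:
    print(a, b, round(fst_two_pops(freqs[a], freqs[b]), 3))
print(bonferroni_threshold(0.05, 5000))  # 5000 tests is a hypothetical number
```

Under these assumptions, the single-locus differentiation is largest for the Atlantic and Scotland samples and smallest for the Atlantic and Cantabric samples, while the per-test threshold shrinks in proportion to the number of SNPs tested.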
5. Conclusions
In this study, we provided additional evidence of local adaptation for north Spanish and Scottish populations of Atlantic salmon using genome-wide information. Defining the spatiotemporal scales over which adaptation operates is important from the point of view of management and conservation. Ignoring the existence of locally adapted populations may pose a greater risk to the conservation management of threatened populations [8]. We modified the program HacDivSel to introduce user-defined candidate SNPs and performed the analysis using a window of SNPs centered at a SNP within the NADP-dependent malic enzyme-2 (mMEP-2*), previously reported by independent studies as a candidate for local adaptation in salmon from the Iberian Peninsula. Interestingly, the corresponding SNP within the mMEP-2* region was consistent with a genomic pattern of divergent selection.
In summary, our results suggest that Spanish and Scottish populations may differ at the functional gene level due to local adaptation, which would explain the failure to restock Spanish rivers with Norwegian and Scottish individuals.