Literature DB >> 25286752

A low-density SNP array for analyzing differential selection in freshwater and marine populations of threespine stickleback (Gasterosteus aculeatus).

Anne-Laure Ferchaud, Susanne H Pedersen, Dorte Bekkevold, Jianbo Jian, Yongchao Niu, Michael M Hansen1.   

Abstract

BACKGROUND: The threespine stickleback (Gasterosteus aculeatus) has become an important model species for studying both contemporary and parallel evolution. In particular, differential adaptation to freshwater and marine environments has led to high differentiation between freshwater and marine stickleback populations at the phenotypic trait of lateral plate morphology and the underlying candidate gene Ectodysplacin (EDA). Many studies have focused on this trait and candidate gene, although other genes involved in marine-freshwater adaptation may be equally important. In order to develop a resource for rapid and cost efficient analysis of genetic divergence between freshwater and marine sticklebacks, we generated a low-density SNP (Single Nucleotide Polymorphism) array encompassing markers of chromosome regions under putative directional selection, along with neutral markers for background.
RESULTS: RAD (Restriction site Associated DNA) sequencing of sixty individuals representing two freshwater and one marine population led to the identification of 33,993 SNP markers. Ninety-six of these were chosen for the low-density SNP array, among which 70 represented SNPs under putatively directional selection in freshwater vs. marine environments, whereas 26 SNPs were assumed to be neutral. Annotation of these regions revealed several genes that are candidates for affecting stickleback phenotypic variation, some of which have been observed in previous studies whereas others are new.
CONCLUSIONS: We have developed a cost-efficient low-density SNP array that allows for rapid screening of polymorphisms in threespine stickleback. The array provides a valuable tool for analyzing adaptive divergence between freshwater and marine stickleback populations beyond the well-established candidate gene Ectodysplacin (EDA).

Entities:  

Mesh:

Substances:

Year:  2014        PMID: 25286752      PMCID: PMC4196021          DOI: 10.1186/1471-2164-15-867

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

It is becoming increasingly evident that evolution is not just a long-term process on the scale of millennia; contemporary evolution can take place over just a few generations [1, 2]. Similarly, the importance of parallel evolution in populations facing similar environmental conditions and the role of gene reuse (or lack thereof) in this process is increasingly discussed [3-6]. The threespine stickleback (Gasterosteus aculateus) is distributed throughout the northern Hemisphere and shows extensive morphological and ecological variation [7]. Numerous resources, including its genome sequence are available, and the species has emerged as one of the most important models for studying both contemporary [8, 9] and parallel evolution [10-16]. Adaptation to freshwater and marine environments, respectively, has received particular attention due to the differences of plate morphology in the two environments and the finding of Ectodysplacin (EDA) as a candidate locus [10, 16, 17]. Nevertheless, other regions of the genome than that harboring EDA also show footprints of differential selection in freshwater and marine habitats [11-13], and in some cases these encompass non-coding and presumably regulatory regions [13]. In some geographical regions, notably Northern Europe, the patterns of divergence between marine and freshwater populations of threespine sticklebacks appear less distinct than in other regions, possibly reflecting gene flow overcoming selection [18, 19]. However, this has mainly been studied with specific focus on EDA and/or lateral plate morphology. Screening of adaptive divergence at other chromosomal regions could be achieved by whole genome sequencing or RAD (Restriction Associated DNA) sequencing [11, 13], though this precludes studies requiring large sample sizes. Also, a medium-density SNP chip has previously been constructed, encompassing 3,072 markers [12]. However in some situations, where analysis of many individuals from many localities is required, it would be preferable to invest more in sample size than in genomic resolution. This involves cases where hundreds of individuals are analyzed in order to assess e.g. temporal changes of allele frequencies as a result of selection, or hybrid zone dynamics [18-20]. Microsatellite loci have been developed that mark chromosomal regions under differential selection in freshwater and marine environments [16, 17], but development of a SNP array would allow for even faster and cost-efficient genotyping. In the present study, we therefore aimed at generating a low-density SNP array encompassing markers of chromosomal regions under differential freshwater-marine selection along with neutral markers for background, thus providing a resource for extensive studies of parallel evolution and marine-freshwater hybrid zone dynamics. We identified SNPs based on RAD sequencing of one marine and two isolated freshwater populations. Based on these data we chose 96 SNPs for inclusion in the array. In order to validate the array we also analyzed a sample of threespine sticklebacks from a Danish river that represents a mixture of marine and freshwater morphs.

Methods

Ethical statement

Sampling of sticklebacks took place in accordance with Danish law and regulations. Threespine stickleback is not included in the Directive “Bekendtgørelse om fredning af visse dyre- og plantearter mv., indfangning af og handel med vildt og pleje af tilskadekommet vildt” (Directive on Protection of Certain Animal and Plant Species, Catch and Trade of Game, and Nursing of Wounded Game) by the Danish Ministry of the Environment. Catch of sticklebacks is therefore permitted unless it involves so high numbers of individuals that it would significantly affect the ecosystem, which was clearly not the case in this study. The fish were euthanized using an overdose of benzocaine and were subsequently stored in 96% ethanol.

Sampled localities

Sixty threespine sticklebacks, 20 from each site, were sampled by cast nets or minnow traps from three localities in Jutland, Denmark: 1) Lake Hald, a 3.3 km2 freshwater lake, 2) a small unnamed freshwater pond (ca. 0.01 km2) near the town of Hadsten and 3) the Mariager Fjord, a marine environment (see Figure 1). These individuals were analyzed using RAD sequencing [11, 21] in order to identify SNPs. 4) An additional 96 individuals were sampled close to the outlet of the Odder River, Jutland, Denmark (see Figure 1). Individuals from this estuarine population were genotyped in order to validate the generated SNP array. The first two samples (Lake Hald and Hadsten) consisted of morphs with low numbers of lateral plates (“low-plated”), as typically observed in freshwater [10]. The third, marine sample consisted of the typical marine morph with high numbers of lateral plates (“high-plated”), whereas the fourth estuarine population consisted of a mixture of low and high-plated morphs.
Figure 1

Map showing the location of sampled three-spine stickleback populations in Jutland, Denmark.

Map showing the location of sampled three-spine stickleback populations in Jutland, Denmark.

RAD sequencing and SNP identification

Genomic DNA was extracted from muscle tissue using standard phenol-chloroform extraction. RAD sequencing was conducted by Beijing Genomics Institute (BGI, Hong Kong, China). The procedures for construction of libraries and Illumina HiSeq paired-end sequencing followed those described for European eel (A. anguilla) by Pujolar et al. [22], except for the fact that samples were digested with the restriction enzyme SbfI instead of EcoRI. Sequence lengths were 90 bp. Only the first reads (with the restriction site) were used in subsequent analyses due to low coverage of the second reads (not containing the restriction site). The sequence reads were sorted according to their unique barcode tag and filtered and trimmed using the FASTX Toolkit (http://hannonlab.cshl.edu/fastx-toolkit). Final read lengths were trimmed to 75 nucleotides to avoid an increase of sequencing errors in the tail ends [22]. Reads of poor quality (with a Phred score < 10 per nucleotide position) were removed. Reads were subsequently aligned to the stickleback genome using Bowtie version 0.12.8 [23] with a maximum of 2 mismatches allowed between individual reads and the genome sequence. Alignments were suppressed for a particular read if more than one reportable alignment was present. This was done in order to minimize the occurrence of paralogous sequences in the data. The reference-aligned data were subsequently used to identify SNPs and call genotypes. For this purpose we used the Refmap.pl pipeline in Stacks [24], implementing a maximum-likelihood model for SNP calling and filtering out RAD loci within individuals with a coverage < 10x. Furthermore, we required loci to be genotyped in at least 70% of the individuals from each population sample. Loci with a sequencing depth > 80x or exhibiting three alleles within individuals were also removed in order to avoid paralogs. FST for each SNP between pairs of marine and freshwater populations was estimated using Populations implemented in Stacks [ [24] ]. The same pipeline was used for estimating sliding windows FST across 150,000 bp along each chromosome, based on a Gaussian Kernel smoothing function. Finally, the smoothed FST values were plotted using the R package [25].

SNP low-density array design

Based on the outcome of the analysis of RAD data we selected 96 SNPs for inclusion in the low density SNP array. We selected SNPs 1) exhibiting high genetic differentiation between the two freshwater and marine populations, both at the individual SNPs and based on smoothed FST values, indicating possible diversifying selection; and 2) SNPs outside regions of elevated differentiation, presumably reflecting neutral markers. We used the threespine stickleback genome sequence to extend the flanking sequence to at least 100 bp to allow for optimal primer design. We also searched for possible candidate loci marked by the SNPs using the stickleback genome browser (http://sticklebrowser.stanford.edu), in which many genes are already annotated by name and putative orthology. The best Blast hit was used to assess the putative orthologous gene. The putative orthology relationships of the remaining genes, i.e. those that have not yet been annotated, were further analyzed by a Blast comparison of their predicted protein sequence against the NCBI protein database. The function of the candidate genes was assessed using two searchable databases: The AmiGO 2 GO browser and an integrated database of human genes that also provides putative orthology with other vertebrates (http://www.genecards.org/). The selected 96 SNPs were genotyped in 96 individuals from the Odder River population on 96.96 Dynamic Arrays (Fluidigm Corporation, SanFrancisco, CA, USA), using the Fluidigm EP1 instrumentation according to the manufacturer’s recommendations. The Fluidigm system uses nano-fluidic circuitry to simultaneously genotype up to 96 individuals at 96 loci (see [26] for a description of the Fluidigm system methodology). Genotypes were called using the Fluidigm SNP Genotyping Analysis software. We used Genalex 6.5 [27] to estimate expected and observed heterozygosity and test for Hardy-Weinberg equilibrium at each locus. Significance levels were adjusted using False Discovery Rate correction [28].

Results

RAD sequencing

RAD sequencing generated from 1.06 to 8.22 million reads per individual, with an average of 2.8 million reads. The mean depth of sequencing was 44.59. The number of reads retained through each step of the analysis is listed in Table 1. After all filtering steps in Stacks and post-filtering to remove possible paralogs, 19,793 loci were retained that represent 33,993 SNPs.
Table 1

RAD sequencing statistics

PopulationNRaw read count (M)Read counts (M) after FASTX filtering and BOWTIE alignment% Raw reads aligned% Raw reads used
Hadsten 202.851.8765.563.3
Lake Hald 202.801.1240.038.8
Mariager Fjord 202.801.6659.257.47

Summary-statistics for different steps of restriction-site associated DNA-sequencing (RAD-seq) data processing. N denotes the number of individuals in each sample. For each population the per individual average of raw read counts (Raw read count in Million bp), the number and percentage of high quality reads that were successfully aligned to the stickleback genome (Bowtie aligned), and the percentage of the aligned reads subsequently fed into Stacks (% Raw reads used) are presented.

RAD sequencing statistics Summary-statistics for different steps of restriction-site associated DNA-sequencing (RAD-seq) data processing. N denotes the number of individuals in each sample. For each population the per individual average of raw read counts (Raw read count in Million bp), the number and percentage of high quality reads that were successfully aligned to the stickleback genome (Bowtie aligned), and the percentage of the aligned reads subsequently fed into Stacks (% Raw reads used) are presented. Genome-wide FST was 0.056 between the Lake Hald (freshwater) and Mariager Fjord (marine) populations and 0.111 between Hadsten (freshwater) and the Mariager Fjord populations. Sliding window analysis of FST revealed high peaks of differentiation, potentially marking chromosome regions under differential selection in marine and freshwater. Twenty-one peaks distributed across 15 different chromosomes were thus identified in the Hadsten – Mariager Fjord comparison, whereas 15 peaks across 9 different chromosomes were revealed in the case of Hald – Mariager Fjord (Figure 2). Though most of these identified regions were found in both marine-freshwater population comparisons, some of them were found in only one of the two pairs.
Figure 2

Genome-wide distribution of smoothed F estimates for pairwise comparisons between the Hadsten/Mariager and Lake Hald/Mariager populations. Grey boxes indicate boundaries of chromosomes (from 1 to 21) and successive chromosomes are denoted by different shades of grey. Peaks above the red line correspond to the chromosomal regions exhibiting elevated differentiation that are referred to throughout the text, from which candidate SNPs for directional selection were chosen. Most peaks of elevated differentiation were shared between the two population comparisons, but in the cases where elevated differentiation was only observed in a single comparison this is denoted by a red dot.

Genome-wide distribution of smoothed F estimates for pairwise comparisons between the Hadsten/Mariager and Lake Hald/Mariager populations. Grey boxes indicate boundaries of chromosomes (from 1 to 21) and successive chromosomes are denoted by different shades of grey. Peaks above the red line correspond to the chromosomal regions exhibiting elevated differentiation that are referred to throughout the text, from which candidate SNPs for directional selection were chosen. Most peaks of elevated differentiation were shared between the two population comparisons, but in the cases where elevated differentiation was only observed in a single comparison this is denoted by a red dot. We selected 96 SNPs for inclusion in the array. Twenty-six were chosen at random, but randomly distributed across 19 chromosomes to represent putatively neutral markers, with FST ranging from 0 to 0.18 between the two independent freshwater populations and the marine sample. The remaining 70 SNPs were chosen to reflect all of the high differentiation regions identified by the sliding-window approach. Some of the SNPs included represented high-differentiation peaks observed in both marine-freshwater population comparisons, but some were found to be outliers in only one of the two comparisons (Figure 2). The SNPs presumably under (hitchhiking) selection exhibited FST values ranging from 0.24 to 0.93 between Hadsten and Mariager Fjord and from 0.27 to 0.78 between Lake Hald and Mariager Fjord (Table 2). The number of outlier SNPs per chromosome ranged from 1 to 7. Considering all SNPs (neutral and under possible selection), each chromosome was represented by at least 4 SNPs.
Table 2

List of the 96 selected SNPs

SNP IDChr_positionpqF ST Pop 1 IDPop 2 ID
5812b*I_14574103CT0.18HadstenMariager
27027*II_13068160AG0.00HadstenMariager
1800*II_19830429GT0.12HadstenMariager
1139*III_9900776CT0.03HadstenMariager
2620*IV_10499598CT0.07HadstenMariager
11120*V_10474117CT0.14HadstenMariager
9990*VI_10404859CT0.06HadstenMariager
10561*VI_2863536AG0.01HadstenMariager
9202*VII_22894561AG-0.12HadstenMariager
35752*VII_4229364AG0.05HadstenMariager
7275*VIII_11705544CT0.00HadstenMariager
28703*VIII_14145458AC-0.05HadstenMariager
4390*IX_10719826AG-0.04HadstenMariager
16548*XI_11232372GT-0.01HadstenMariager
13177*XII_10200557CT-0.11HadstenMariager
14236*XIV_10623162AG0.04HadstenMariager
15044*XIV_7617082CT-0.07HadstenMariager
20574*XV_10990137AG-0.01HadstenMariager
20825*XV_14641595CT-0.01HadstenMariager
31954*XVII_1792778AT-0.03HadstenMariager
15728*XIX_1761203CG-0.01HadstenMariager
22319*XX_12299466AG0.18HadstenMariager
32977*XX_15123750AC-0.04HadstenMariager
21643*XXI_10893532CG0.05HadstenMariager
21858*XXI_4924010AG-0.10HadstenMariager
33523*Scaffold_122_287328GT0.09HadstenMariager
5812I_14574107CT0.50HaldMariager
5939I_16513837CT0.57HadstenMariager
28321I_21607623CT0.27HaldMariager
28526I_4931967AC0.89HadstenMariager
6844I_4932075AT0.85HadstenMariager
1955II_22061028GT0.81HadstenMariager
2113II_3125972AC0.80HadstenMariager
2114II_3182440AG0.72HadstenMariager
276III_13446716AG0.67HadstenMariager
3231IV_20387384TC0.71HadstenMariager
3644IV_29334535AG0.70HadstenMariager
27608IV_29334612AT0.78HadstenMariager
3851IV_3216905AT0.70HadstenMariager
4073IV_5292424AC0.40HadstenMariager
27768IV_6660266AT0.77HadstenMariager
27791IV_8128962GT0.70HadstenMariager
29903V_8795372AG0.65HadstenMariager
10131VI_12603660GT0.68HadstenMariager
8466VII_11202861AG0.61HadstenMariager
9026VII_19985290CG0.79HadstenMariager
9206VII_22946194AG0.91HadstenMariager
7745VIII_17638021CT0.82HadstenMariager
7807VIII_18320215AG0.67HadstenMariager
7808VIII_18320304AG0.76HadstenMariager
8351VIII_9101385AG0.52HadstenMariager
4521IX_13007542GT0.40HadstenMariager
5200IX_5019009CT0.78HadstenMariager
5238IX_5337004CG0.77HadstenMariager
23341X_11668366CG0.36HadstenMariager
33387X_6967185AC0.26HadstenMariager
16649XI_12761116CT0.69HadstenMariager
16691XI_13340957CT0.70HadstenMariager
31479XI_9810247CT0.50HadstenMariager
13340XII_12843029AG0.78HaldMariager
13682XII_18204005CT0.68HadstenMariager
13744XII_3061560AG0.60HadstenMariager
30542XII_8981405CT0.39HadstenMariager
14188XII_9924630AG0.54HadstenMariager
11996XIII_11719547CT0.47HadstenMariager
19976XVI_18491CT0.47HadstenMariager
20063XVI_3012006CT0.24HadstenMariager
35236XVI_5021761CT0.54HadstenMariager
20222XVI_565127CT0.54HadstenMariager
20238XVI_593641CT0.89HadstenMariager
18651XVII_11584572CT0.68HadstenMariager
18814XVII_13715805AG0.70HadstenMariager
32060XVII_7058248CT0.49HaldMariager
17506XVIII_10340214AG0.63HadstenMariager
17577XVIII_11202384GT0.49HadstenMariager
17787XVIII_14143185AC0.76HadstenMariager
18047XVIII_3081169AT0.26HadstenMariager
18262XVIII_6321841CT0.77HadstenMariager
15358XIX_11929930AG0.62HadstenMariager
30997XIX_16674218AT0.69HadstenMariager
30997bXIX_16674221CG0.68HadstenMariager
15971XIX_3850285GT0.48HadstenMariager
31101XIX_5560633CT0.72HadstenMariager
22229XX_10890902AG0.50HaldMariager
32994XX_15998772CT0.53HadstenMariager
22693XX_17936432AG0.93HadstenMariager
23102XX_7553459AC0.56HadstenMariager
21688XXI_11538922CT0.65HadstenMariager
21693XXI_11580802AG0.69HadstenMariager
22037XXI_8170313AT0.71HadstenMariager
24457Scaffold_122_232322AG0.83HadstenMariager
25425Scaffold_27_3893488CT0.59HadstenMariager
33942Scaffold_309_4735AG0.51HadstenMariager
26062Scaffold_58_401854AC0.57HadstenMariager
26071Scaffold_58_511232CT0.45HadstenMariager
26405Scaffold_76_220535CT0.59HadstenMariager

The 26 putatively neutral SNPs are indicated by asterisks (*) following the SNP IDs. Chr_position denotes the position of the SNPs in the threespine stickleback genome [13]. p and q are the two alleles found at the SNP position. FST denotes differentiation at the SNPs between population 1 and population 2.

List of the 96 selected SNPs The 26 putatively neutral SNPs are indicated by asterisks (*) following the SNP IDs. Chr_position denotes the position of the SNPs in the threespine stickleback genome [13]. p and q are the two alleles found at the SNP position. FST denotes differentiation at the SNPs between population 1 and population 2. The potential candidate loci for the SNPs under selection, along with their ontological relationships (when available) are listed in Table 3. This table lists 71 candidate genes identified from 20 chromosomes, 7 of which are involved in functions related to morphogenesis and growth, 2 related to skeletal biology, 5 related to kidney functions and 11 involved in osmoregulation. The remaining 46 candidate genes are associated to other functional categories, such as immune response, hormonal system or vision (see Table 3 for details). We chose not to include SNPs close to EDA, as this gene is usually analyzed using an indel (insertion-deletion) marker (Stn381) that is not suitable for inclusion in the array [10, 18]. Among the SNPs included in the array, the one closest to the EDA gene is situated more than 2.3 Mb away and therefore not showing tight linkage relationships. All sequences along with SNP positions used for generating the array are listed in Additional file 1: Table S1.
Table 3

Identified candidate genes for freshwater marine adaptation in threespine stickleback

SNP IDChr_positionF ST Candidate geneRelated functionMGSBKFOMOF
5812I_145741070,5Teneurin transmembrane protein 1 (ODZ1 )morphogenesis ×
5939I_165138370,57Claudin 4 (CLDN4)internal organ development ×
28321I_216076230,27insulin-like growth factor binding protein 2 (IGFBP2)*growth and developmental rates ×
28526I_49319670,89maltase-glucoamylase (alpha-glucosidase) (MGAM)digestion ×
6844I_49320750,85maltase-glucoamylase (alpha-glucosidase) (MGAM)digestion ×
1955II_220610280,81microfibrillar-associated protein 1 (MFAP1)elastik fibres and collagen formation ×
2113II_31259720,8ADAM metallopeptidase with thrombospondin type 1 motif, 18 (ADAMTS18)*tumor supressor, eye development ×
2114II_31824400,72testis-specific serine kinase 3 (TSSK3)*germ cell development, protein kinase activity ×
276III_134467160,67RUN and SH3 domain containing 1 (RUSC1)*neuronal differentiation, cytoplasmic development × ×
3231IV_203873840,71family with sequence similarity 19 (chemokine (C-C motif)-like) (FAM19A1)regulators of immune and nervous cells ×
3644IV_293345350,7coiled-coil-helix-coiled-coil-helix domain containing 3 (CHCHD3)crista integrity and mitochondrial function ×
27608IV_293346120,78coiled-coil-helix-coiled-coil-helix domain containing 3 (CHCHD3)crista integrity and mitochondrial function ×
3851IV_32169050,7polycomb group ring finger 1 (PCGF1)early embryonic development ×
4073IV_52924240,4vascular endothelial growth factor B (VEGFB)vascular endothelial growth ×
27768IV_66602660,77family with sequence similarity 70, member A (FAM70A)transmembrane protein ×
27791IV_81289620,7heparan sulfate (glucosamine) 3-O-sulfotransferase 1 (HS3ST1)synthesis of anticoagulant ×
29903V_87953720,65retinol binding protein 4, plasma (RBP4)*cardiac regulation, kidney filtration, retinal binding × ×
10131VI_126036600,68glutamate receptor, ionotropic, delta 1 (GRID1)nervous system ×
8466VII_112028610,61lens intrinsic membrane protein 2 (LIM2)eye development and cataractogenesis ×
9026VII_199852900,79SCC-112immune responses ×
9206VII_229461940,91RAD50cell growth and viability ×
7745VIII_176380210,82tumor protein p63 (TP73L)regulation of epithelial morphogenesis ×
7807VIII_183202150,67WW and C2 domain containing 1 (Wwc2)*memory performance, regulation of organ growth × ×
7808VIII_183203040,76WW and C2 domain containing 1 (Wwc2)*memory performance, regulation of organ growth × ×
8351VIII_91013850,52nephrosis 2, idiopathic, steroid-resistant (NPHS2)*renal regulation, cell development × ×
4521IX_130075420,4phosphatidylinositol transfer protein, cytoplasmic 1 (PITPNC1)*cell signaling and lipid metabolism ×
5200IX_50190090,78retinoblastoma binding protein 6 (RBBP6)*suppresses cellular proliferation, embryonic development ×
5238IX_53370040,77KIAA0922immune responses ×
23341X_116683660,36protein kinase (cAMP-dependent, catalytic) inhibitor beta (PKIB)urinary regulation ×
33387X_69671850,26Transcription factor EF1 (EF1)regulates dendritic spine morphogenesis ×
16649XI_127611160,69ATPase, Ca++ transporting, cardiac muscle, slow twitch 2 (ATP2A2)*contraction/relaxation muscle cycle, heart regulation ×
16691XI_133409570,7transcription elongation factor B (SIII), polypeptide 2 (TCEB2)renal regulation ×
31479XI_98102470,5ATP-binding cassette, sub-family A (ABC1), member 3 (ABCA3)*programmed cell death, membrane regulation × ×
13340XII_128430290,78COMM domain containing 7 (COMMD7)hepato cellular growth ×
13682XII_182040050,68TPX2, microtubule-associated (TPX2)cell development ×
13744XII_30615600,6keratin 18 (KRT18)internal organ development ×
30542XII_89814050,39erythrocyte membrane protein band 4.1-like 1 (EPB41L1)*neuronal plasma regulation, cytoskeleton regulation × ×
14188XII_99246300,54suppression of tumorigenicity 5 (ST5)immune responses ×
11996XIII_117195470,47transient receptor potential cation channel, subfamily M, member 3 (TRPM3)mediates calcium entry ×
19976XVI_184910,47MMADHC (uc010fnu.1) (CR595331)vitamine B12 metabolism ×
20063XVI_30120060,24sodium leak channel, non-selective (VGCNL1)neuronal background sodium leak conductance, cell death ×
35236XVI_50217610,54retinoid X receptor, alpha (RXRA)*retinoid development, heart development and morphogenesis × ×
20222XVI_5651270,54FLJ10154hormonal expression ×
20238XVI_5936410,89FLJ10154hormonal expression ×
18651XVII_115845720,68EPH receptor A8 (EPHA8)nervous system development ×
18814XVII_137158050,7PDZ domain containing ring finger 3 (PDZRN3)myogenic differentiation ×
32060XVII_70582480,49vitamin D (1,25- dihydroxyvitamin D3) receptor (VDR)hormone receptor for vitamine D3, related to bone density ×
17506XVIII_103402140,63FBJ osteosarcoma oncogene (FOS)cell proliferation, differentiation, transformation ×
17577XVIII_112023840,49iodotyrosine deiodinase (C6orf71)thyroid hormone production ×
17787XVIII_141431850,76phospholipase C, beta 1 (PLCB1)intracellular transduction ×
18047XVIII_30811690,26regulatory factor X, 6 (RFXDC1)Production of insulin ×
18262XVIII_63218410,77estrogen receptor 2 (ER beta) (ESR2)hormonal receptor, gametogenesis ×
15358XIX_119299300,62death-associated protein kinase (DAPK2)programmed cell death ×
30997XIX_166742180,69lipase maturation factor 2 (LMF2)maturation of the endoplasmic reticulum ×
30997bXIX_166742210,68lipase maturation factor 2 (LMF2)maturation of the endoplasmic reticulum ×
15971XIX_38502850,48Fc receptor-like A (FCRLM1)immune responses ×
31101XIX_55606330,72AK130540salivary gland ×
22229XX_108909020,5cornifelin (CNFN)ion transport across squamous epithelia, keratinization ×
32994XX_159987720,53ubiquilin 4 (UBQLN4)proteasomal protein degradation ×
22693XX_179364320,93TAF12 RNA polymerase II, TATA box binding protein (TBP)-associated factor (TAF12)transcriptional activators ×
23102XX_75534590,56metastasis suppressor 1 (MTSS1)metastases supressor ×
21688XXI_115389220,65AK095260osmoregulation ×
21693XXI_115808020,69cadherin 20 (CDH20)tumor suppressor ×

Candidate genes are identified for 63 SNPs under putative directional selection. Note that 7 out of the 70 putative outliers SNPs were not found near to a coding gene and are not reported in the table. This concerns the six SNPs identified in diverse scaffold and one SNP (22037) in chromosome XX (see Table 2). Genes are assigned to one of the following categories: MG = Morphogenesis and Growth, OM = Osmoregulation, SB = skeletal Biology, KF = Kidney Function, OF = Other Function. These putative functions have been assessed using both the GeneCard database (http://www.genecards.org/) and AmiGO 2 GO browser. In cases of multiple functions assigned to a single gene, this is denoted by “*”. For genes with multiple functions, only main functions previously documented in vertebrate species are reported.

Identified candidate genes for freshwater marine adaptation in threespine stickleback Candidate genes are identified for 63 SNPs under putative directional selection. Note that 7 out of the 70 putative outliers SNPs were not found near to a coding gene and are not reported in the table. This concerns the six SNPs identified in diverse scaffold and one SNP (22037) in chromosome XX (see Table 2). Genes are assigned to one of the following categories: MG = Morphogenesis and Growth, OM = Osmoregulation, SB = skeletal Biology, KF = Kidney Function, OF = Other Function. These putative functions have been assessed using both the GeneCard database (http://www.genecards.org/) and AmiGO 2 GO browser. In cases of multiple functions assigned to a single gene, this is denoted by “*”. For genes with multiple functions, only main functions previously documented in vertebrate species are reported. Validation of the array based on analysis of 96 individuals from the Odder River provided results for all SNPs. However, there was significant drop-out at the markers 19976 and 26062 indicating technical problems with these two SNPs. Seven loci showed low expected heterozygosity (He < 0.05), whereas mean He across all loci was 0.226 (Additional file 2: Table S2). Twelve loci showed deviations from Hardy-Weinberg equilibrium, possibly reflecting that samples were taken in a mixture zone between freshwater and marine sticklebacks (Additional file 2: Table S2). Genotypic data for all SNPs and individuals are provided in Genalex 6.5 [27] format in Additional file 3.

Discussion

Development and utility of low density SNP chips

We are currently witnessing a transition from population genetics to population genomics, particularly mediated by the development of Next Generation Sequencing [29-31]. Whereas this allows for addressing research questions at the level of entire genomes [13, 32, 33], the methods used also provide resources that can be used for generating markers for more specific purposes. For instance, Hess et al. [34] conducted a population genomics study of Pacific lampreys using RAD sequencing, and subsequently used RAD data to construct a 96 SNP chip including markers that could be used for species identification, for general studies of genetic population structure and for screening loci previously suggested to be under directional selection [35]. Similarly, Pujolar et al. [36] used RAD sequencing of European (Anguilla anguilla) and American eel (A. rostrata) to develop a 96 SNP chip encompassing markers diagnostic for the two species. This resource was subsequently used for tracing hybridization between the two species several generations back in time. The SNP chip developed in the current study similarly distills information derived from RAD sequencing. The 96 SNPs encompass markers of chromosomal regions that exhibit elevated differentiation in comparisons involving a marine population and two independent freshwater stickleback populations, possibly reflecting diversifying selection. It therefore provides a useful resource for analyzing differential adaptive responses in freshwater and marine sticklebacks and the extent to which this reflects parallel evolution. Nevertheless, it also involves some important caveats. First, although there is evidence for geographically widespread parallel evolution and gene reuse when marine sticklebacks colonize freshwater environments [11, 13, 16], there are clearly also examples of non-parallel adaptive responses [16], either reflecting differences in local freshwater environments or different genetic architecture underlying similar phenotypes. Our inclusion of SNPs therefore undoubtedly represents some degree of ascertainment bias [37, 38], particularly in terms of not identifying chromosomal regions under selection in other freshwater populations than those used for identifying SNPs. Second, three-spine stickleback is widespread across the Northern Hemisphere, and there is presumably a geographical limit defined by phylogeographical relationships beyond which many of the SNPs are no longer polymorphic; this can be regarded as another aspect of ascertainment bias. The developed SNP chip may therefore be of primary use in North-Western Europe, encompassing the North Sea and Baltic Sea regions. Other marker resources have been developed for three-spine stickleback, including a 3,072 SNP chip [12], a resource of 158 microsatellite markers linked to physiologically important genes [17] and a resource of 110 SNPs representing both genic and non-genic regions [39]. Compared to the 3,072 SNP chip [12], the array developed in the present study obviously provides less dense genome coverage, but is also cheaper in running costs and specifically targeted towards freshwater-saltwater adaptation. Compared to the microsatellite resource [17], our 96 SNP array provides faster genotyping. On the other side, marker-by-marker multiallelic microsatellites provide more statistical power than diallelic SNPs [39-41]. A further important difference between 1) the microsatellite resource [17] and the 110 SNP resource [39] on the one side and 2) the current 96 SNP array on the other side consists in the choice of markers. Microsatellites and approximately half of the 110 SNPs were chosen based on the criterion that they should be linked to physiologically important genes [17]. In contrast, 70 of the SNPs included in the 96 SNP array were chosen from genomic regions exhibiting elevated differentiation, regardless of their linkage to candidate genes. There is increasing evidence that non-coding DNA may be of functional importance and potentially under selection [13, 42–44]. Indeed, 7 of the 70 SNPs under putative directional selection could not be linked to a candidate gene and could therefore potentially mark regulatory regions under selection. In total, our resource can be considered unbiased with respect to prior choice of candidate genes, but can be subject to ascertainment bias given that markers were chosen based on genetic differentiation between a subset of freshwater and marine populations. On the other side, the microsatellite resource by Shimada et al. [17] and a major part of the SNP resource by DeFaveri et al. [39] are specifically targeted towards genes of physiological importance but do not involve ascertainment bias in terms of choosing loci exhibiting high differentiation. Hence, there are pros and cons with both approaches and the choice of markers and methods may depend on the specific study and research question.

Candidate genes for marine and freshwater adaptation

Similar to previous studies undertaking genome-wide scans of threespine sticklebacks [11–13, 16, 17], we identified several chromosomal regions that are likely under differential selection in freshwater and marine environments (Figure 2). Comparison of our results with results from whole genome sequencing [12] and RAD sequencing [11] suggests that several of the regions may be the same, thereby also implying that the same candidate genes may be involved. Specifically, there appears to be concordance among the previous and the current study in identifying regions on chromosomes I, IV, VII, IX, XI, XIV, XVI and XX as being involved in freshwater-saltwater adaptation (compare e.g. Figure 2 of the present study with Figure two (a) in [13]). The identified outlier chromosomal regions harbor a number of candidate genes with functional relationships that are already known to be important for adaptation between freshwater and marine habitats, such as genes affecting bone development, kidney function and osmoregulation (Skeletal Biology: SB; Kidney Function: KF; Osmoregulation: OM ,respectively; see Table 3). We find it interesting that our study reveals two candidate loci (both on chromosome XI; ATP2A2 and ABCA3, see Table 3) putatively implied in ATPase activity, generally associated with salinity tolerance. Other candidate genes related to this ATPase activity have previously been found on chromosome I and in two other regions of chromosome XI [12], and the candidate genes suggested by the current study further emphasize the importance of this physiological trait. The insulin-like growth factor binding protein 2, IGFBP2 in chromosome I (see Table 3) is another interesting candidate gene observed in the present study that was also suggested as a candidate for freshwater-marine adaptation by Hohenlohe et al. [11]. We also note four highly differentiated SNPs in four different chromosomal regions (Table 3); ADAMTS18 in chromosome II, retinol binding protein 4 (RBP4) in chromosome V, lens fiber membrane intrinsic protein 2 (LIM2) in chromosome VII and the retinoic X receptor alpha (RXRA) in chromosome XVI (FST values ranging from 0.54 to 0.8, Table 2) that could be involved in vision. This could reflect adaptation to different light environments, in the present case between freshwater and marine habitats, as previously observed in other marine organisms [45, 46]. As our SNP resource was specifically designed based on RAD sequencing data, there are a number of candidate genes and chromosomal regions that will inevitably not be represented. First, some candidate genes and SNPs may only be regionally important, as discussed previously. Second, RAD sequencing using the 8-base cutter SbfI obviously provides less resolution than e.g. whole genome sequencing, and there may be regions and candidate genes showing elevated differentiation that have not been detected. Our SNP resource can be regarded as a reduced representation of outlier regions detected by RAD sequencing, which by itself represents a reduced representation of the whole genome. Obviously, the SNP resource can be supplemented by other previously identified candidate genes and markers, and conversely it represents a supplement to the markers and resources already available [10, 12, 13, 17].

Conclusions

We have constructed a low density SNP array that encompasses both neutral SNPs for background and SNPs representing genomic regions that exhibit differentiation compatible with diversifying selection in freshwater and marine environments. We find this resource to be particularly useful for addressing research questions that require high sample sizes, e.g. several hundreds, which would in most cases not be feasible for whole genome sequencing and RAD sequencing. For instance, this concerns situations where hybrid zone dynamics between freshwater and marine sticklebacks are analyzed along environmental gradients [20]. This may necessitate large sample sizes, e.g. if continuous sampling is conducted in order to identify clinal shifts of allele frequencies [47] or define populations based on neutral or adaptive markers [48]. Also, studies of selection based on detecting allele frequency change using analysis of temporal samples, e.g. taken at different time points within a year [18], may require analysis of many samples and large sample sizes. We find our SNP array to be particularly useful in such situations, as it allows for studies going beyond analyzing EDA and instead targeting multiple genomic regions involved in differential adaptation to freshwater and marine environments. We specifically intend to use the SNP array for testing the hypothesis that gene flow from marine populations overrides selection in freshwater sticklebacks in coastal regions [18]. If this is indeed the case, then this should not only be detectable at the EDA locus but also at other genes involved in adaptive responses, including those represented in our array.

Availability of supporting data

Sequence reads have been deposited in NCBI’s Sequence Read Archive (Accession number: SAMN0255793). Additional file 1: Table S1: Nucleotide sequences for each SNP position. Fifty nucleotides before and after the targeted position are reported in this table. The two nucleotides corresponding to SNP alleles are presented in brackets. (XLSX 15 KB) Additional file 2: Table S2: Diversity indices estimated for each locus over 96 individuals from Odder river system. Sample Size (N), observed heterozygosity (Ho), expected heterozygosity (He) and outcomes of tests for Hardy-Weinberg equilibrium (HWE test). *significant at 5% level, **significant at 0.01 level, ***significant at 0.001 level. (XLSX 12 KB) Additional file 3: SNP genotypes for 96 SNPs in 96 sticklebacks. SNP genotype data for 96 SNPs in 96 sticklebacks from the Odder River, Denmark. The data are provided in Genalex 6.5 [27] format. (XLSX 99 kb) (XLSX 100 KB)
  42 in total

1.  Genomics: Stickleback is the catch of the day.

Authors:  Hopi E Hoekstra
Journal:  Nature       Date:  2012-04-04       Impact factor: 49.962

2.  A gene with major phenotypic effects as a target for selection vs. homogenizing gene flow.

Authors:  Joost A M Raeymaekers; Nellie Konijnendijk; Maarten H D Larmuseau; Bart Hellemans; Luc De Meester; Filip A M Volckaert
Journal:  Mol Ecol       Date:  2013-11-28       Impact factor: 6.185

Review 3.  Genome-wide genetic marker discovery and genotyping using next-generation sequencing.

Authors:  John W Davey; Paul A Hohenlohe; Paul D Etter; Jason Q Boone; Julian M Catchen; Mark L Blaxter
Journal:  Nat Rev Genet       Date:  2011-06-17       Impact factor: 53.242

4.  Strong and consistent natural selection associated with armour reduction in sticklebacks.

Authors:  Arnaud LE Rouzic; Kjartan Østbye; Tom O Klepaker; Thomas F Hansen; Louis Bernatchez; Dolph Schluter; L Asbjørn Vøllestad
Journal:  Mol Ecol       Date:  2011-03-28       Impact factor: 6.185

5.  A single amino acid mutation contributes to adaptive beach mouse color pattern.

Authors:  Hopi E Hoekstra; Rachel J Hirschmann; Richard A Bundey; Paul A Insel; Janet P Crossland
Journal:  Science       Date:  2006-07-07       Impact factor: 47.728

6.  The genetic architecture of reproductive isolation during speciation-with-gene-flow in lake whitefish species pairs assessed by RAD sequencing.

Authors:  Pierre-Alexandre Gagnaire; Scott A Pavey; Eric Normandeau; Louis Bernatchez
Journal:  Evolution       Date:  2013-03-09       Impact factor: 3.694

7.  The probability of genetic parallelism and convergence in natural populations.

Authors:  Gina L Conte; Matthew E Arnegard; Catherine L Peichel; Dolph Schluter
Journal:  Proc Biol Sci       Date:  2012-10-17       Impact factor: 5.349

8.  The genomic basis of adaptive evolution in threespine sticklebacks.

Authors:  Felicity C Jones; Manfred G Grabherr; Yingguang Frank Chan; Pamela Russell; Evan Mauceli; Jeremy Johnson; Ross Swofford; Mono Pirun; Michael C Zody; Simon White; Ewan Birney; Stephen Searle; Jeremy Schmutz; Jane Grimwood; Mark C Dickson; Richard M Myers; Craig T Miller; Brian R Summers; Anne K Knecht; Shannon D Brady; Haili Zhang; Alex A Pollen; Timothy Howes; Chris Amemiya; Jen Baldwin; Toby Bloom; David B Jaffe; Robert Nicol; Jane Wilkinson; Eric S Lander; Federica Di Palma; Kerstin Lindblad-Toh; David M Kingsley
Journal:  Nature       Date:  2012-04-04       Impact factor: 49.962

9.  Rapid SNP discovery and genetic mapping using sequenced RAD markers.

Authors:  Nathan A Baird; Paul D Etter; Tressa S Atwood; Mark C Currey; Anthony L Shiver; Zachary A Lewis; Eric U Selker; William A Cresko; Eric A Johnson
Journal:  PLoS One       Date:  2008-10-13       Impact factor: 3.240

10.  Killer whale nuclear genome and mtDNA reveal widespread population bottleneck during the last glacial maximum.

Authors:  Andre E Moura; Charlene Janse van Rensburg; Malgorzata Pilot; Arman Tehrani; Peter B Best; Meredith Thornton; Stephanie Plön; P J Nico de Bruyn; Kim C Worley; Richard A Gibbs; Marilyn E Dahlheim; Alan Rus Hoelzel
Journal:  Mol Biol Evol       Date:  2014-02-04       Impact factor: 16.240

View more
  7 in total

Review 1.  Fish TNF and TNF receptors.

Authors:  Yaoguo Li; Tiaoyi Xiao; Jun Zou
Journal:  Sci China Life Sci       Date:  2020-07-24       Impact factor: 6.038

2.  Population genomic evidence for adaptive differentiation in Baltic Sea three-spined sticklebacks.

Authors:  Baocheng Guo; Jacquelin DeFaveri; Graciela Sotelo; Abhilash Nair; Juha Merilä
Journal:  BMC Biol       Date:  2015-03-24       Impact factor: 7.431

3.  Low genetic and phenotypic divergence in a contact zone between freshwater and marine sticklebacks: gene flow constrains adaptation.

Authors:  Susanne Holst Pedersen; Anne-Laure Ferchaud; Mia S Bertelsen; Dorte Bekkevold; Michael M Hansen
Journal:  BMC Evol Biol       Date:  2017-06-06       Impact factor: 3.260

4.  Comparison of Migratory and Resident Populations of Brown Trout Reveals Candidate Genes for Migration Tendency.

Authors:  Alexandre Lemopoulos; Silva Uusi-Heikkilä; Ari Huusko; Anti Vasemägi; Anssi Vainikka
Journal:  Genome Biol Evol       Date:  2018-06-01       Impact factor: 3.416

5.  Divergent selection and drift shape the genomes of two avian sister species spanning a saline-freshwater ecotone.

Authors:  Jennifer Walsh; Gemma V Clucas; Matthew D MacManes; W Kelley Thomas; Adrienne I Kovach
Journal:  Ecol Evol       Date:  2019-11-07       Impact factor: 2.912

6.  Identification of stress-related genes by co-expression network analysis based on the improved turbot genome.

Authors:  Xi-Wen Xu; Weiwei Zheng; Zhen Meng; Wenteng Xu; Yingjie Liu; Songlin Chen
Journal:  Sci Data       Date:  2022-06-29       Impact factor: 8.501

7.  Genomics of rapid ecological divergence and parallel adaptation in four tidal marsh sparrows.

Authors:  Jennifer Walsh; Phred M Benham; Petra E Deane-Coe; Peter Arcese; Bronwyn G Butcher; Yvonne L Chan; Zachary A Cheviron; Chris S Elphick; Adrienne I Kovach; Brian J Olsen; W Gregory Shriver; Virginia L Winder; Irby J Lovette
Journal:  Evol Lett       Date:  2019-07-16
  7 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.