Lei Liu, Keng Pee Ang, J A K Elliott, Matthew Peter Kent, Sigbjørn Lien, Danielle MacDonald, Elizabeth Grace Boulding.
Abstract
Comparative genome scans can be used to identify chromosome regions, but not traits, that are putatively under selection. Identification of targeted traits may be more likely in recently domesticated populations under strong artificial selection for increased production. We used a North American Atlantic salmon 6K SNP dataset to locate genome regions of an aquaculture strain (Saint John River) that were highly diverged from that of its putative wild founder population (Tobique River). First, admixed individuals with partial European ancestry were detected using STRUCTURE and removed from the dataset. Outlier loci were then identified as those showing extreme differentiation between the aquaculture population and the founder population. All Arlequin methods identified an overlapping subset of 17 outlier loci, three of which were also identified by BayeScan. Many outlier loci were near candidate genes and some were near published quantitative trait loci (QTLs) for growth, appetite, maturity, or disease resistance. Parallel comparisons using a wild, nonfounder population (Stewiacke River) yielded only one overlapping outlier locus as well as a known maturity QTL. We conclude that genome scans comparing a recently domesticated strain with its wild founder population can facilitate identification of candidate genes for traits known to have been under strong artificial selection.
Keywords: Atlantic salmon; SNP; artificial selection; candidate genes; continent of origin; domestication selection; outlier tests; population structure
Year: 2016 PMID: 28250812 PMCID: PMC5322405 DOI: 10.1111/eva.12450
Source DB: PubMed Journal: Evol Appl ISSN: 1752-4571 Impact factor: 5.183
Table 1. Atlantic salmon (Salmo salar L.) samples analyzed in the present study
| Group | Population name | Gen (strain) | Abbreviation | Sample size |
|---|---|---|---|---|
| Aquacultural Group | 2008–2009 Parents & Offspring | 6 (84JC) | 2009PO_AQUA | 134 |
| | 2009–2010 Parents & Grandparents | 5 (89JC) | 2010PG_AQUA | 250 |
| | 2010–2011 Parents & Grandparents | 5 (90JC) | 2011PG_AQUA | 268 |
| | 2010–2011N Parents & Grandparents | 5 (90JC) | 2011PGN_AQUA | 96 |
| | 2011–2012 Parents & Grandparents | 6 (87JC) | 2012PG_AQUA | 191 |
| | Mowi (EU) | 10 | MOWI | 8 |
| | Hybrids (AQUA × MOWI) | – | Hybrids | 10 |
| Wild Founder | Tobique River Wild Population | 1 | TOB_WILD | 98 |
| Wild Outgroup | Stewiacke River Wild Population | 1 | STW_WILD | 100 |
All are North American (NA) subspecies of Salmo salar unless otherwise indicated. Each population of the SJR AQUA strains was primarily derived from a single year class of fish from the Mactaquac Biodiversity Facility (see text of Methods). Historically, the 4‐year classes of the SJR AQUA strain were spawned on a 4‐year cycle so that the offspring of year 1 of a cycle were the parents of year 1 of the next cycle (J. A. K. Elliott, pers. obs.).
Number of generations in captivity (year strain founded from fish returning to the Mactaquac Dam).
Relatives of the broodstock in the main 2010–2011 breeding nucleus.
European (EU) subspecies of Salmo salar. The founder fish for the Mowi strain were collected from the Voss River, Norway, and surrounding areas in 1964 (Ferguson et al., 2007).
F1 hybrids (2001) and F1 backcrosses (2005) between NA and EU (Mowi) subspecies of Salmo salar (Boulding et al., 2008).
The sampled “wild‐exposed two‐year salmon” from single‐pair crosses were captured as smolts or presmolts from the wild by DFO technicians (i.e., caught as smolts in the rotary screw traps set lower in the Tobique River system) then reared in the Mactaquac hatchery for 2 years to maturity. Fin clips were taken during spawning in late Fall 2010 under the supervision of D.M.
Stewiacke River reared at Coldbrook Biodiversity Facility, NS from single‐pair crosses. Fin clips were taken during spawning in late Fall 2010 under the supervision of S. Ratelle.
Table 2. Pairwise FST values: Slatkin's linearized FST (above the diagonal) and Nei's mean number of pairwise differences (below the diagonal)
| | 2009PO_AQUA | 2010PG_AQUA | 2011PG_AQUA | 2011PGN_AQUA | 2012PG_AQUA | TOB_WILD | STW_WILD |
|---|---|---|---|---|---|---|---|
| 2009PO_AQUA | – | 0.051 | 0.057 | 0.064 | 0.028 | 0.046 | 0.132 |
| 2010PG_AQUA | 0.049 | – | 0.034 | 0.003# | 0.028 | 0.016 | 0.101 |
| 2011PG_AQUA | 0.054 | 0.033 | – | 0.043 | 0.030 | 0.031 | 0.105 |
| 2011PGN_AQUA | 0.060 | 0.003# | 0.041 | – | 0.039 | 0.026 | 0.115 |
| 2012PG_AQUA | 0.027 | 0.028 | 0.029 | 0.038 | – | 0.028 | 0.107 |
| TOB_WILD | 0.044 | 0.015 | 0.031 | 0.025 | 0.028 | – | 0.089 |
| STW_WILD | 0.116 | 0.092 | 0.095 | 0.103 | 0.096 | 0.082 | – |
Pairwise FST values (Weir & Cockerham, 1984) calculated as Slatkin's linearized FST (Slatkin, 1995) as implemented in Arlequin 3.5 (Excoffier & Lischer, 2010). All genetic distance measures are highly significant in a permutation test with 10,100 permutations at p = .00000 ± .0000 except for # which had p = .00762 ± .0009.
Nei's mean number of pairwise differences between pairs of populations is a method of measuring genetic distance when the characters are binary (Nei & Li, 1979) as implemented in Arlequin 3.5 (Excoffier & Lischer, 2010). All genetic distance measures are highly significant in a permutation test with 10,100 permutations at p = .00000 ± .0000 except for # which had p = .00762 ± .0009.
Full population names corresponding to these abbreviations are given in Table 1.
All genetic distance calculations were performed on a reduced dataset after individuals shown by STRUCTURE to have European ancestry had been removed.
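The relationship between an ordinary FST and the Slatkin linearized value reported above the diagonal can be sketched as follows. This is a minimal illustration that assumes the standard transformation D = FST / (1 − FST) (Slatkin, 1995); the function names are hypothetical and are not part of Arlequin.

```python
def linearize_fst(fst: float) -> float:
    """Slatkin's linearized distance D = FST / (1 - FST)."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("FST must lie in [0, 1)")
    return fst / (1.0 - fst)


def delinearize_fst(d: float) -> float:
    """Invert the transformation: FST = D / (1 + D)."""
    if d < 0.0:
        raise ValueError("linearized FST must be non-negative")
    return d / (1.0 + d)


# Linearized value listed for 2009PO_AQUA versus STW_WILD in the table above.
d_reported = 0.132
fst_implied = delinearize_fst(d_reported)
print(round(fst_implied, 3))
```

Because the transformation is monotonic, the ranking of population pairs is the same on either scale; only the spacing between values changes.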
Figure 1. Detection of outlier loci putatively under diversifying selection within five different groups of the SJR Aquaculture strain and one wild group from the Tobique River (2009PO_AQUA, 2010PG_AQUA, 2011PG_AQUA, 2011PGN_AQUA, 2012PG_AQUA, and TOB_WILD). (a) Locus distribution on a continuous chromosome from the nonhierarchical Arlequin 3.5 analysis. The solid blue line marks −log10(p) = 2 (p = .01). The solid red line marks −log10(p) = 3 (p = .001) (see Appendix S1 for outlier loci numbers and official names of SNPs on chip). (b) Ten outlier loci identified by BayeScan 2.1 (Foll & Gaggiotti, 2008). The FST estimates are plotted against the false discovery rate (q‐values). Loci to the left of the solid black line, which corresponds to q = 0.05, are significant outliers (see Appendix S2 for outlier loci numbers and official names of SNPs on chip).
Figure 3. Locus distribution on a continuous chromosome from the hierarchical Arlequin 3.5 analysis with the three AQUA populations {2010PG, (2011PG + 2011PGN), and 2012PG} placed into one group and the two random halves of the TOB_WILD population placed in another group (Appendix S5). The line at −log10(p) = 1.5 approximates p = .05 (exactly, −log10(.05) ≈ 1.3), the line at −log10(p) = 2 represents p = .01, and the line at −log10(p) = 3 represents p = .001. (a) FST. (b) FCT. (c) QTL (the number shows the chromosome and the letter the study: “a” Baranski et al., 2010; “b” Boulding et al., 2008; “c” Gutierrez et al., 2012, males time 3; “d” Gutierrez et al., 2012, females time 3; “e” Gutierrez et al., 2012, females time 4; “f” Gutierrez et al., 2012, males time 4; “g” Houston et al., 2009; “h” Petersen et al., 2013; “i” Reid et al., 2005; “j” Tsai et al., 2015).
Figure 4. A Venn diagram with the circles representing the three different methods of outlier analysis done comparing the TOB_WILD population with the AQUA populations using Arlequin 3.5 showing: (i) the one overlapping subset of outlier loci found by all three methods of analysis, (ii) the three overlapping subsets of outlier loci that were found only by two methods of analysis, and (iii) the three subsets found exclusively by one method of analysis. The 17 loci found by all three methods are shown in Table 4.
Figure 2. Locus distribution on a continuous chromosome for pairwise comparisons of six different generations of the AQUA population versus the TOB_WILD population. The solid blue lines mark −log10(p) = 2 (p = .01). The solid red lines mark −log10(p) = 3 (p = .001). Detailed results are in Appendix S3.
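The significance lines in these plots are drawn on a −log10(p) scale. The conversion in both directions can be sketched as below; the helper names are hypothetical, and the thresholds are simply common significance levels of the kind named in the captions.

```python
import math


def neg_log10(p: float) -> float:
    """Convert a p-value to the -log10(p) scale used on the y-axis."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log10(p)


def p_from_neg_log10(y: float) -> float:
    """Convert a -log10(p) axis value back to a p-value."""
    return 10.0 ** (-y)


# Print the axis height of a few conventional thresholds.
for p in (0.05, 0.01, 0.001):
    print(f"p = {p:<6} -> -log10(p) = {neg_log10(p):.2f}")
```

On this scale a line at −log10(p) = 2 corresponds to p = .01 and a line at 3 to p = .001, so each additional unit on the axis is a tenfold stricter threshold.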
Table 3. Protein homologies found for 23 candidate SNP loci under diversifying selection identified by nonhierarchical and hierarchical Arlequin analysis and BayeScan after outlier analysis of datasets comprising AQUA populations and the TOB_WILD population. Homologies are also shown for the 17 “consistent” loci found using all three Arlequin FST methods. Genome position of transcript is from Ssa ICSASG_v2 (http://salmobase.org).
| SNP Name | Nonhierarch. Arlequin | BayeScan | All three Arlequin | Hierarch. Arlequin | Chr | Position NA map | Chr | Position EU map | %seq. similarity | Genome position (transcript) | E‐value | Matches (Gene or Protein) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ESTV_15118_153 | √ | 1 | 13.7 | 1 | 94.9 | 70% | CIGSSA_003028.t4 | 1E‐75 | Mitochondrial ribosomal protein s5 | |||
| ESTNV_16380_81 | √ | 3 | 11.5 | 3 | 0.9 | 99% | CIGSSA_119709.t2 | 2E‐125 | WW domain‐containing adapter protein with coiled‐coil‐like isoform X2 | |||
| GCR_cBin8095_Ctg1_103 | 3 | 103.2 | 3 | 87.2 | 99% | CIGSSA_018951.t1 | 4E‐138 | Protein phosphatase 1 regulatory subunit 1B‐like | ||||
| ESTNV_36457_2447 | √ | √ | 4 | 0 | 4 | 6.1 | 99% | CIGSSA_024082.t6 | 0 | Protogenin B‐like isoform X1 | ||
| ESTNV_35352_77 | √ | √ | 4 | 101.6 | 4 | 109.9 | 72% | CIGSSA_024847.t1 | 1E‐163 | RPA‐interacting protein rpain‐a | ||
| GCR_hBin33595_Ctg1_191 | √ | √ | √ | 6 | 7.9 | – | – | 98% | CIGSSA_032258.t2 | 1E‐10 | ATPase family AAA domain protein 5‐like isoform X1 | |
| ESTV_16674_300 | 6 | 63.7 | – | – | 100% | CIGSSA_034497.t1 | 3E‐14 | Zona pellucida sperm‐binding protein 4‐like | ||||
| ESTNV_33402_1114 | √ | 6 | 67.4 | 6 | 66 | 100% | CIGSSA_032952.t1 | 0 | Cleavage and polyadenylation specificity factor subunit 3 | |||
| ESTNV_34703_1491 | √ | √ | 9 | 67.4 | 9 | 52.7 | 100% | CIGSSA_042894.t2 | 0 | HCLS1‐binding protein 3 | ||
| ESTNV_17881_371 | √ | √√ | 9 | 96.2 | 9 | 91.9 | 99% | XM_014214565 | 0 | Protein phosphatase, Mg2+/Mn2+ dependent | ||
| ESTNV_24797_128 | √ | √ | √ | 12 | 38.0 | 12 | 68.7 | 61.6% | CIGSSA_063980.t1 | 3.41E‐27 | MHC class ii alpha chain | |
| ESTNV_16810_167 | √ | √ | √ | 12 | 70.4 | 12 | 108.3 | 99.8% | CIGSSA_062376.t1 | 0 | CD34b1 | |
| ESTNV_27237_290 | √ | 13 | 37.1 | – | – | 94% | CIGSSA_069043.t1 | 2.39E‐66 | Orm1‐like protein 2 | |||
| ESTNV_22997_260 | √ | √ | √ | 13 | 52 | 13 | 85 | 97% | XM_014138527.1 | 0 | RILP‐like protein 1 (LOC106568309), transcript variant X4, | |
| ESTNV_28880_179 | √ | √ | 14 | – | 14 | 62 | 100% | CIGSSA_073641.t4 | 2E‐151 | Probable 39S ribosomal protein L24, mitochondrial isoform X1 | ||
| ESTNV_29368_168 | √ | 15 | 34.9 | 15 | 44.2 | 83% | CIGSSA_079971.t5 | 2.00E‐161 | Translocation associated membrane protein 2 | |||
| ESTNV_34978_699 | √ | √ | 16 | 42.9 | 16 | 43.5 | 100% | CIGSSA_085177.t1 | 4E‐102 | Glycine cleavage system H protein, mitochondrial‐like | ||
| ESTV_14714_122 | √ | √ | √ | 19 | 61.5 | 19 | 52.7 | 98% | XM_014158407.1 | 0.0 | Zinc transporter ZIP11‐like transcript variant X2, mRNA | |
| ESTNV_31104_129 | √ | 20 | – | 20 | 46.2 | 100% | CIGSSA_101213.t1 | 3E‐45 | Rab5 gdpgtp exchange factor | |||
| ESTNV_28135_402 | √ | √ | 22 | 58.2 | 22 | 48.5 | 100% | CIGSSA_109518.t2 | 1E‐26 | Claudin‐19 isoform X2 | ||
| ESTNV_28135_544 | √ | √ | 22 | 58.2 | 22 | 48.5 | 100% | CIGSSA_109518 | 1E‐26 | Claudin‐19 isoform X2 | ||
| ESTNV_31411_621 | √ | 22 | 44.5 | 22 | 35.4 | 100% | CIGSSA_108471.t1 | 1E‐92 | DNA‐binding protein inhibitor ID‐1 | |||
| ESTNV_31110_261 | √ | 1/23 | 63.3 | 23 | 13.2 | 91% | CIGSSA_112247.t1 | 2E‐26 | Thrombopoietin receptor‐like | |||
| ESTNV_31110_721 | √ | 1/23 | 63.3 | 23 | 13.2 | 91% | CIGSSA_112247.t1 | 2E‐26 | Thrombopoietin receptor‐like | |||
| ESTNV_31210_275 | √ | √ | 1/23 | 118.1 | 23 | 51.4 | 87% | CIGSSA_112663.t1 | 0 | Palmitoyltransferase ZDHHC7‐like isoform X1 | ||
| ESTNV_35075_2399 | √ | √ | 25 | 73 | 25 | 52.7 | 92% | CIGSSA_117588.t3 | 0 | Type I inositol 3,4‐bisphosphate 4‐phosphatase‐like isoform X3 | ||
| ESTNV_25950_262 | √ | 26/28 | 49.6 | 26 | 22.9 | 81% | CIGSSA_119802.t1 | 4E‐140 | Phosphatidylinositol 3‐kinase regulatory subunit alpha‐like |
p < .001.
After individuals shown by STRUCTURE to have European subspecies ancestry had been removed from the dataset.
Non‐hierarchical Arlequin outlier analysis (Appendix S1) was done using the nonhierarchical option and included five AQUA populations: 2010PG_AQUA, (2011PG_AQUA + 2011PGN_AQUA), 2012PG_AQUA as well as the TOB_WILD population.
BayeScan analysis (Figure 1b; Appendix S2) is always nonhierarchical and included five AQUA populations: 2009PO_AQUA, 2010PG_AQUA, 2011PG_AQUA, 2011PGN_AQUA, 2012PG_AQUA, as well as the TOB_WILD population.
Hierarchical Arlequin outlier analysis (Appendix S5) was performed using the hierarchical option. The three large AQUA populations (2010PG, {2011PG + 2011PGN}, and 2012PG) were placed in one group, and the TOB_WILD population was randomly split into two populations (see text) that were placed in a second group.
Significant FST (p < .01) in all three methods of Arlequin analysis, including at least one of the pairwise comparisons (Figure 2; Appendix S3).
“Chr” means the chromosome number on which the outlier locus was located.
“NA” means the position on the North American Atlantic salmon female linkage map by Brenna‐Hansen et al. (2012); “EU” means the position on the European Atlantic salmon female linkage map by Lien et al. (2011).
Genome position is shown as the reference transcript name and number containing the SNP where possible as numbering of the base pairs may change in subsequent releases of Atlantic salmon (Salmo salar) genome.
“√” means the significance value is smaller than the significance level (see text) for that method.
These two SNPs are 460 base pairs apart in the same mRNA transcript and, consequently, were in perfect linkage disequilibrium with each other.
This SNP is also an outlier locus for nonhierarchical and hierarchical comparisons of the Stewiacke populations with the AQUA populations.
Chromosome containing a SNP inferred from European map (Lien et al., 2011) after known translocations were accounted for (Brenna‐Hansen et al., 2012); consequently, the exact position on the North American chromosome is unknown.
ESTNV_31210_275 codes for a protein that plays a role in follicle stimulating hormone activation of testicular Sertoli cells (Pedram et al. 2012).
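The “All three Arlequin” column and the Venn diagram in Figure 4 both describe overlaps among sets of outlier loci. A minimal sketch of that bookkeeping follows, assuming each method's outliers are available as a set of SNP names. The function name and the three toy sets are made-up placeholders, not the study's results.

```python
from itertools import combinations


def venn_regions(sets: dict[str, set[str]]) -> dict[frozenset[str], set[str]]:
    """Map each combination of methods to the loci found by exactly those methods."""
    regions = {}
    names = list(sets)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            # Loci shared by every method in this combination...
            inside = set.intersection(*(sets[n] for n in combo))
            # ...minus anything also reported by a method outside it.
            others = [sets[n] for n in names if n not in combo]
            outside = set().union(*others)
            regions[frozenset(combo)] = inside - outside
    return regions


# Hypothetical outlier lists for the three Arlequin analyses.
outliers = {
    "nonhierarchical": {"snp_A", "snp_B", "snp_C"},
    "pairwise": {"snp_A", "snp_B", "snp_D"},
    "hierarchical": {"snp_A", "snp_E"},
}
regions = venn_regions(outliers)
consistent = regions[frozenset(outliers)]  # loci found by all three methods
print(sorted(consistent))
```

With three methods the function returns seven disjoint regions: one shared by all three, three shared by exactly two, and three unique to a single method, matching the layout described for the Venn diagram.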
Table 4. Comparison of the overlapping set of 17 outlier loci found by all three different methods of outlier analysis that compared the TOB_WILD population with the AQUA populations using Arlequin 3.5 (Figure 4). Outlier SNPs from previous studies that included the SJR watershed are listed when they lie within 15 cM of the focal SNP. Candidate genes documented to affect growth, sexual maturity, or immune response within 5 Mb of the SNP (http://salmobase.org) are also shown.
| SNP name | Chr NA map | Position NA map | Chr EU map | Position EU map | Published outlier loci ± 15 cM EU female map | Candidate growth/gonad/immune genes within 5 Mbp of SNP |
|---|---|---|---|---|---|---|
| GCR_cBin4356_Ctg1_1956 | 2 | – | 2 | 0.4 | | dcb (Near MHC class II antigen, CIGSSA_012441.t1); rapgef5 (Rap guanine nucleotide exchange factor, CIGSSA_010612.t1) |
| GCR_cBin25891_Ctg1_305 | 3 | 96.9 | 3 | 83.2 | | Adjacent gene: StAR‐related lipid transfer protein 3\|conserved (predicted) (CIGSSA_018950.t4); other genes: Igf2bp1 (insulin‐like growth factor 2 mRNA binding protein 1, CIGSSA_016974.t1); crhr1 (corticotropin‐releasing hormone receptor 1, CIGSSA_016935.t1) |
| ESTNV_35352_77 | 4 | 101.6 | 4 | 109.9 | | Adjacent gene: mlxipl or ChREBP (carbohydrate response element binding, CIGSSA_023462.t1) |
| GCR_cBin37707_Ctg1_102 | 5 | 43.7 | 5 | 47.8 | EU_F 33.9 cM: 16466_1044 | Adjacent genes: CCR4‐NOT transcription complex subunit 6‐like and a disintegrin and metalloproteinase with thrombospondin motifs 2‐like |
| GCR_hBin33595_Ctg1_191 | 6 | 7.9 | – | – | | Adjacent gene: mitochondrial Rho GTPase 1 (CIGSSA_033844.t8) |
| ESTNV_17881_371 | 9 | 96.2 | 9 | 91.9 | EU_F 88.2 cM: ESTNV_34364_237 | Immunoglobulin superfamily member 11 (CIGSSA_044008); unconventional myosin‐VIIa (CIGSSA_044110.t4) |
| ESTNV_24797_128 | 12 | 38 | 12 | 68.7 | EU_F 61.2 cM: GCR_cBin25404_Ctg1_420 | SNP is in a candidate gene (MHC class II beta chain); Adjacent gene: myosin heavy chain larval type 2 (CIGSSA_061303.t1) |
| ESTNV_16810_167 | 12 | 70.4 | 12 | 108.3 | EU_F 110.5 cM: 16129_0239 | Adjacent genes: ora1 (vomeronasal type‐1 receptor 4‐like) and ora2 (vomeronasal type‐1 receptor 1‐like) |
| ESTNV_22997_260 | 13 | 52 | 13 | 85 | NA_F 48.8 cM: 15806_943 | Adjacent genes: RILP‐like protein 1 isoform X1 and U11/U12 small nuclear ribonucleoprotein 35 kDa protein |
| GCR_cBin3895_Ctg1_167 | 14 | 44.8 | 14 | 29.7 | | crh (corticotropin‐releasing hormone, CIGSSA_072826.t1); mhc1uxa2 (Mhc1uxa2 protein, CIGSSA_074346.t1) |
| ESTNV_29368_168 | 15 | 34.9 | 15 | 44.2 | | bmp2 (bone morphogenetic protein 2, CIGSSA_080330.t1) |
| GCR_cBin15233_Ctg1_136_V2 | 15 | 53.4 | – | – | | Intron in mdag2 (MAM domain‐containing glycosylphosphatidylinositol anchor protein 2), TSHR (thyrotropin receptor, XM_014145015.1) |
| GCR_cBin2472_Ctg1_142 | 15 | 59.6 | 15 | 77.2 | | Rab‐32 (Ras‐related protein, CIGSSA_078732.t1) |
| GCR_cBin27732_Ctg1_177 | 19 | 54.4 | 19 | 42 | | Adjacent genes: cadherin‐10‐like and cadherin‐6‐like; Mcr4 (Melanocortin receptor 4, CIGSSA_099455) |
| ESTV_14714_122 | 19 | 61.5 | 19 | 52.7 | | ZupT (zinc transporter ZIP11‐like) contains SNP; sstr5 (somatostatin receptor 5 or growth hormone‐inhibiting hormone (GHIH) expressed in gut) |
| GCR_cBin6274_Ctg1_67 | 20 | 29.9 | 20 | 34.3 | EU 39.7 cM: 16260_0757 | Rab‐35 (Ras‐related protein, CIGSSA_102343); gnrh1 (gonadotropin‐releasing hormone 1, XM_014160183.1); lysosome membrane protein 2‐like isoform X1, X2 |
| ESTNV_31210_275 | 1p/23 | 118.1 | 23 | 51.4 | | myo5c (unconventional myosin‐Vc, CIGSSA_112000) |
p = .001 or smaller.
Nonhierarchical AMOVA (Appendix S1), pairwise AMOVA (Appendix S3), and hierarchical AMOVA (Appendix S5).
“Chr” means the chromosome number on which the outlier locus was located.
“NA” means the position on the North American Atlantic salmon female linkage map by Brenna‐Hansen et al. (2012); “EU” means the position on the European Atlantic salmon female linkage map by Lien et al. (2011).
See Table 3 for protein homology for this locus.
Freamo et al. (2011).
Culling et al. (2013).
Mäkinen et al. (2014).
Johnston et al. (2014).
Very et al. (2008).
Three of the 10 loci identified by BayeScan (Appendix S2).
One of the three outlier loci was identified by all six pairwise comparisons between an AQUA population and the TOB_WILD population.
Also significant for FCT in the hierarchical Arlequin analysis.
Chromosome containing SNP inferred from European map (Lien et al., 2011) after known translocations were accounted for (Brenna‐Hansen et al., 2012); consequently, the exact position on the chromosome is unknown.
SNP has not been located on European map (Lien et al., 2011).
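The “Published outlier loci ± 15 cM” column amounts to a window search along a linkage map. A hedged sketch follows, assuming each marker is recorded with a chromosome number and a female‐map position in cM. The `Marker` type, the function name, and the example positions are illustrative assumptions, not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    name: str
    chromosome: int
    position_cm: float


def markers_within_window(focal, published, window_cm=15.0):
    """Return published markers on the focal chromosome within +/- window_cm."""
    return [
        m for m in published
        if m.chromosome == focal.chromosome
        and abs(m.position_cm - focal.position_cm) <= window_cm
    ]


# Hypothetical focal SNP and previously published outliers.
focal = Marker("focal_snp", 9, 50.0)
published = [
    Marker("pub_1", 9, 45.0),   # 5 cM away: inside the window
    Marker("pub_2", 9, 20.0),   # 30 cM away: outside the window
    Marker("pub_3", 12, 50.0),  # different chromosome: ignored
]
print([m.name for m in markers_within_window(focal, published)])
```

Positions from different linkage maps (NA versus EU) are not directly comparable, so in practice the focal SNP and the published markers would need to be placed on the same map before applying the window.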