Literature DB >> 32165371

Network Analysis of Linkage Disequilibrium Reveals Genome Architecture in Chum Salmon.

Garrett McKinney1,2, Megan V McPhee3, Carita Pascal2, James E Seeb2, Lisa W Seeb2.   

Abstract

Many studies exclude loci that exhibit linkage disequilibrium (LD); however, high LD can signal reduced recombination around genomic features such as chromosome inversions or sex-determining regions. Chromosome inversions and sex-determining regions are often involved in adaptation, allowing for the inheritance of co-adapted gene complexes and for the resolution of sexually antagonistic selection through sex-specific partitioning of genetic variants. Genomic features such as these can escape detection when loci with LD are removed; in addition, failing to account for these features can introduce bias to analyses. We examined patterns of LD using network analysis to identify an overlapping chromosome inversion and sex-determining region in chum salmon. The signal of the inversion was strong enough to show up as false population substructure when the entire dataset was analyzed, while the effect of the sex-determining region on population structure was only obvious after restricting analysis to the sex chromosome. Understanding the extent and geographic distribution of inversions is now a critically important part of genetic analyses of natural populations. Our results highlight the importance of analyzing and understanding patterns of LD in genomic dataset and the perils of excluding or ignoring loci exhibiting LD. Blindly excluding loci in LD would have prevented detection of the sex-determining region and chromosome inversion while failing to understand the genomic features leading to high-LD could have resulted in false interpretations of population structure.
Copyright © 2020 McKinney et al.

Entities:  

Keywords:  chum salmon; inversion; linkage disequilibrium; network analysis; x-chromosome

Mesh:

Year:  2020        PMID: 32165371      PMCID: PMC7202013          DOI: 10.1534/g3.119.400972

Source DB:  PubMed          Journal:  G3 (Bethesda)        ISSN: 2160-1836            Impact factor:   3.154


The vast amount of genomic data now available allows research to move beyond analysis of allele frequency distributions among populations and into the effects of genomic structure and organization on adaptation and population structure. Genomic research has illuminated a range of evolutionary processes influencing adaptation. Single gene differences have been shown to have an important role in adaptation and life-history variation: see for example the Eda gene that influences armor plating in stickleback (Schluter ) and the VGLL3 gene that maintains variation in age at maturity in Atlantic salmon (Salmo salar) (Barson ). On a larger genomic scale, the importance of islands of divergence (e.g., Larson ) and chromosome inversions (Wellenreuther and Bernatchez 2018) is increasingly recognized in non-model organisms, particularly in the context of sexually antagonistic selection (Kirkpatrick and Guerrero 2014; Barson ). While islands of divergence can exhibit elevated linkage disequilibrium (LD) due to genetic hitchhiking, other features of genomic architecture such as inversions or sex-determining regions inhibit recombination, resulting in elevated LD and the inheritance of haplotype blocks. Historically, many studies explicitly excluded loci exhibiting LD under the assumption that these markers provide redundant information (e.g., Larson ), but with the current recognition of their importance and ability to detect genomic architecture, that practice is becoming obsolete. Chromosome inversions have been associated with widely divergent adaptive variation ranging from social behavior (Thomas ), life-history variation (Miller ), alternative reproductive strategies (Küpper ), and adaptation to different environments (Jones ). Inversions have been implicated in the formation and divergence of sex chromosomes (Lemaitre ) and establishment of reproductive barriers leading to speciation (Noor ). Inversions often exhibit elevated divergence due to inhibited recombination, which can manifest as population substructure in genetic analyses (Tian ; Arostegui ), and the extended LD exhibited by large inversions facilitates their detection in marker-dense next-generation sequencing projects (Kemppainen ). An understanding of the extent and geographic distribution of inversions is now a critically important part of genetic analyses of natural populations. Sex chromosomes are another genomic feature that often exhibits reduced recombination and elevated LD, with important evolutionary implications as males and females often exhibit different phenotypes and experience different selective pressures (Badyaev and Hill 2000; Tamate and Maekawa 2006). Elevated LD in sex chromosomes can be due to the presence of inversions (Lemaitre ; Connallon ) or as a result of sex-specific patterns of recombination (Kijas ). Reduced recombination can facilitate divergence between sex chromosomes under sexually antagonistic selection (Charlesworth 2018). Sexually antagonistic selection has been noted in a number of cases such as ornamental traits in poeciliid fishes (Lindholm and Breden 2002), genetic variation for fitness in red deer (Foerster ), and optimal age at maturity in Atlantic salmon (Barson ). When causal variants are located on the sex chromosome, sexually antagonistic selection can be resolved through portioning of genetic variation between sexes (Roberts ). With increasingly dense genomic data, sex-associated loci are likely to be found even if the sex-chromosome has not been identified; but, when unaccounted for, these sex-associated markers can bias genomic analyses (e.g., Benestan ). Traditionally, sex-associated markers have been identified through genome-wide association studies (GWAS) of individuals with known sex or by identifying characteristics of heterogametic sex loci, such as presence/absence or genotypic frequencies where half of the samples are heterozygous and the other half are a single homozygous genotype (e.g., Star ; McKinney ). An alternative approach is to search for regions of extended LD which are often associated with sex-determining regions. Identification of genomic features should be a routine step in genomic analyses to avoid biases outlined above (Benestan ; Arostegui ). Genome assemblies allow direct visualization of LD patterns along the chromosome, but most species do not have assembled genomes. In this case, network analysis can be performed on patterns of LD to identify genomic features (Kemppainen ). Network analysis is likely to be particularly informative when multiple genomic features overlap, manifesting as a single region of high LD. Here, we describe the detection and genetic architecture of multiple genomic features in chum salmon (Oncorhynchus keta) from western Alaska. Chum salmon exhibit an anadromous life history where individuals emerge in freshwater and then migrate to the ocean to mature before returning to freshwater streams to spawn. Populations often exhibit life-history variation in the timing of return migration. We used patterns of LD combined with network analysis to discover a chromosomal inversion in chum salmon that contains the sex-determining region. The inversion exhibits spatial variation in frequency throughout western Alaska and includes putatively adaptive genes associated with life-history variation in other salmonids.

Materials and Methods

SNP discovery and RAD sequencing

SNP discovery was conducted on 6 collections of chum salmon (48 samples each) using RAD sequencing (Figure 1). These collections were distributed among four major regions: Norton Sound, Yukon River, Kuskokwim River, and Nushagak River (Table 1). DNA was obtained from Alaska Department of Fish and Game (ADFG).
Figure 1

Map of sampling locations. Collections are colored by region, and shapes denote whether samples were genotyped using RADseq and GT-seq or GT-seq only.

Table 1

Number of samples initially sequenced and retained after quality filtering for RADseq and GT-seq datasets. Collections used to evaluate accuracy of the putative sex- determining region are marked with an asterisk *

RegionCollectionInitial RADseq SamplesRetained RADseq SamplesInitial GT-seq SamplesRetained GT-seq Samples
Norton SoundEldorado River48488382
Norton SoundFish River*48398277
Norton SoundKwiniuk River*008274
Yukon RiverNulato River48388380
Yukon RiverOtter Creek48489992
Yukon RiverEast Fork Andreafsky River008383
Kuskokwim RiverHolokuk River48488379
Kuksokwim RiverAniak River008279
Nushagak RiverKokwok River48468379
Nushagak RiverMulchatna River*008273
288267842798
Map of sampling locations. Collections are colored by region, and shapes denote whether samples were genotyped using RADseq and GT-seq or GT-seq only. Sequencing libraries were prepared with the SbfI enzyme using a modification of the improved RADseq library preparation protocol developed in Ali . Samples were sequenced on a HiSeq 4000 with paired-end 150 bp reads; 96 samples were sequenced per lane. Two rounds of sequencing were conducted, with the volume of DNA for each individual adjusted in the second round of sequencing to reduce variation in sequence reads per individual (Prince ). Sequence data were processed with STACKS v1.47 (Catchen ) using default settings with the following exceptions: process_radtags (-c -r -q -t 140), ustacks (-r–model_type bounded–bound_low 0–bound_high 0.01), cstacks (-n 2). The catalog of variation for STACKS was created using six individuals from each collection. Loci and samples were filtered using an iterative process where poor-quality loci and samples were removed initially, the proportion of missing data were recalculated, and more stringent thresholds were applied for final filtering. A genotype rate threshold of 50% was initially applied to remove poor-quality loci. The genotype rate per sample was then estimated using the retained loci; samples were retained if they had a genotype rate of at least 75%. This iterative process prevents filtering of good quality samples due to an excess of poor quality loci, as well as filtering of good quality loci due to an excess of poor quality samples. Allele frequencies and FST were estimated with Genepop (Rousset 2008) using the retained samples; loci with a minor allele frequency (MAF) of at least 0.05 were retained. HDplot (https://github.com/gjmckinney/HDplot, McKinney ) was used to identify paralogs, which were excluded from further analysis because read depth was too low for accurate genotyping (McKinney ). Singleton (non-paralogous) loci were retained for final analysis if they met a threshold of 90% genotype rate. Retained singleton loci were aligned against the rainbow trout O. mykiss genome (Omyk_1.0, NCBI: GCF_002163495.1) using bowtie2 v2.2.1 (Langmead and Salzberg 2012) to allow investigation of genomic patterns of differentiation.

Population structure

Genomic features can manifest as population substructure when reduced recombination leads to fixation of alternate alleles over large genomic areas (e.g., Arostegui ). Population structure was visualized using individual-based PCAs in R with Adegenet (Jombart 2008) to determine if any populations exhibited patterns of substructure. The populations included in this study spanned a large geographical region in Alaska, so population structure was examined within each region to prevent large-scale population structure from overwhelming any signal of genomic features. Populations exhibiting substructure were candidates for initial examination of LD.

Identification of genomic structures

Genomic features were identified by examining LD between markers. Pairwise LD was estimated between SNPs using the r-squared method in Plink (v1.07, Purcell ). Pairwise LD was plotted for each chromosome using R to identify regions of elevated LD. Network analysis and community detection were conducted in R using the igraph package (https://igraph.org/r) to identify groups of linked SNPs, hereafter referred to as ‘sets’. This was done to determine if multiple patterns of LD were present within a single genomic region. For network analysis, SNP pairs with an r2 < 0.3 were removed. SNPs that were in LD with fewer than 3 other SNPs were also excluded to reduce the number of SNPs that were linked only by close physical proximity. Following network analysis, chromosomal positions of SNPs in LD sets were examined to determine if they could be attributed to genomic structures. Finally, patterns of variation for SNPs within LD sets were visualized using PCA.

High-throughput assay panel

Two putative, co-occurring genomic features were identified from the RADseq data (see results): a chromosome inversion and a sex-determining region. GT-seq assays were developed for 39 loci diagnostic for the putative inversion and 21 loci diagnostic for the putative sex-determining region. Multiple loci for each genomic feature were included to provide a control for genotyping error and because some loci would be routinely lost during panel optimization (described below). Filters were applied prior to and during primer design to remove loci that were likely to amplify off-target sequence with GT-seq following the methods of McKinney ; this included identifying transposable elements and primers that align to multiple genomic regions. Loci with SNPs within 20 bp of the end of the RAD tag required genomic sequence past the RAD tag for primer design. We obtained a draft copy of the chum salmon genome (B.F. Koop, University of Victoria, Victoria, British Columbia, Canada; personal communication) and aligned RAD tag sequences to the unassembled scaffolds using bowtie2; custom perl scripts were used to obtain flanking sequence for primer design. Primers were designed using BatchPrimer3 (You ). Amplicons for retained primers were then examined to ensure that SNPs were contained within the first 100 bp of the amplicon to facilitate downstream single-end sequencing. Two rounds of panel optimization were conducted in order to remove loci that did not perform well. Each round of optimization was conducted using 48 individuals sequenced on a MiSeq with 150 bp paired-end sequence. DNA was extracted and sequencing libraries were prepared following the methods of Campbell . Sequencing data were processed and genotyped using GTscore (https://github.com/gjmckinney/GTscore, McKinney ). After each round of sequencing, the number of reads amplified by each primer was counted, as well as the proportion of reads that contained both the primer and bioinformatics probes for a locus. Loci with excessive amplification and off-target amplification were removed following the methods of McKinney . An expanded sample of individuals was genotyped using the GT-seq panel to better characterize genomic structure across the region. This included additional individuals from the ascertainment collections as well as additional collections from Norton Sound, and the Yukon, Kuskokwim, and Nushagak rivers (Table 1). Populations and individuals with paired sex data were preferentially chosen to assess concordance between phenotypic sex and the putative sex-determining region. Sequencing libraries were prepared as above. A total of 871 samples (842 plus 39 sequenced twice with GT-seq for quality control) were sequenced on a single lane of a HiSeq 4000 with 100 bp single-end sequencing. These samples were also genotyped for an additional 478 markers on this sequencing lane as part of a separate collaborative project (McKinney et al., unpubl.). All GT-seq loci were used for evaluating sample quality even though loci developed for the separate project are not included in this study. Samples were evaluated for quality based on a minimum 90% genotype rate and visualization of allele scatter plots.

Data availability

Raw RADseq is available in NCBI BioProject PRJNA611968. Raw GT-seq data are available in NCBI BioProject PRJNA609360.The authors affirm that all data necessary for confirming the conclusions of this article are represented fully within the article and its tables and figures. Supplemental material available at figshare: https://doi.org/10.25387/g3.11903409.

Results

STACKS initially outputted 222,668 SNPs within 94,002 RAD tags. A total of 30,006 SNPs within 22,693 RAD tags were retained after applying all filters (Table S1). Of these loci, 13,015 SNPs within 10,821 RAD tags were aligned to the rainbow trout genome. After all filtering steps, 267 individuals were retained for SNP discovery. Clear patterns of population substructure were apparent in the Yukon River from individual-based PCAs using all 13,015 markers (Figure 2). No substructure was visible in individual PCAs for other geographic regions (Figure S1). The loci driving population substructure within Yukon River collections were aligned to a region of elevated LD on Omy28 (Figure 3A) which corresponds to linkage group 15 on the chum salmon linkage map (Waples ; Sutherland ). To avoid confusion, we refer to the chromosome as LG15 throughout even though SNP positions are derived from alignments to the O. mykiss chromosome 28. Network analysis identified three distinct sets of loci that exhibited high LD (Figure 3B). Loci in set 1 spanned from 437 Kb to 20.3 Mb, and the loci in set 2 spanned from 1.6 Mb to 21.3 Mb. The loci in set 3 spanned only a 2 Kb region, suggesting that their high LD is likely due only to close physical proximity. Therefore, these loci were excluded from further analysis.
Figure 2

Individual PCA using all loci for Yukon River collections. Both collections exhibit patterns of substructure.

Figure 3

A) Plot of linkage disequilibrium (pairwise r2) of SNPs aligned to O. mykiss chromosome 28 (chum LG 15). Each point is a SNP pair colored by LD. The pattern of elevated LD spans 20 Mb of chromosome 28. B) Network analysis with community detection identified three distinct sets of loci contributing to LD on LG15. Set 1 (red background) has 51 loci, set 2 (purple background) has 27 loci, and set 3 (green background) has 4 loci. Loci in sets 1 and 2 span the entire 20 Mb while loci in set 3 are linked due to close physical proximity.

Individual PCA using all loci for Yukon River collections. Both collections exhibit patterns of substructure. A) Plot of linkage disequilibrium (pairwise r2) of SNPs aligned to O. mykiss chromosome 28 (chum LG 15). Each point is a SNP pair colored by LD. The pattern of elevated LD spans 20 Mb of chromosome 28. B) Network analysis with community detection identified three distinct sets of loci contributing to LD on LG15. Set 1 (red background) has 51 loci, set 2 (purple background) has 27 loci, and set 3 (green background) has 4 loci. Loci in sets 1 and 2 span the entire 20 Mb while loci in set 3 are linked due to close physical proximity. We then plotted all RADseq samples for all loci from LG15. The PCA of LG15 revealed clustering of samples that was driven by the loci in sets 1 and 2; PCA loadings show that Axis 1 was primarily driven by loci in set 1, and axis 2 was primarily driven by loci in set 2 (Figure 4A).
Figure 4

A) Individual PCA of RADseq samples using all loci from LG15. Examination of locus loadings show that Axis 1 is primarily driven by loci in set 1 (inversion loci) and axis 2 is primarily driven by loci in set 2 (sex-associated loci). Labels for each cluster of individuals denote the putative chromosome type of individuals within the cluster with respect to inversion and sex. B) Individual PCA of RADseq and GT-seq samples using loci successfully developed into GT-seq assays. Samples are color coded by phenotypic sex with gray individuals having unassigned sex.

A) Individual PCA of RADseq samples using all loci from LG15. Examination of locus loadings show that Axis 1 is primarily driven by loci in set 1 (inversion loci) and axis 2 is primarily driven by loci in set 2 (sex-associated loci). Labels for each cluster of individuals denote the putative chromosome type of individuals within the cluster with respect to inversion and sex. B) Individual PCA of RADseq and GT-seq samples using loci successfully developed into GT-seq assays. Samples are color coded by phenotypic sex with gray individuals having unassigned sex. Genotype patterns within a genomic feature can give clues to the type of genomic feature. Genotypes for loci in sets 1 and 2 (including those genotyped with GT-seq; see below) were plotted by SNP set and position to better visualize genotype patterns (Figure 5); this plot revealed patterns consistent with multiple genomic features within this region. Loci in set 1 exhibited three genotype classes (both homozygous and heterozygous) and near-complete linkage for loci spanning 20 Mb, which is consistent with a genomic inversion that is variable within populations (e.g., Arostegui ). While we could not directly test for an inversion using methods such as breakpoint mapping or linkage mapping, it is unlikely that the observed pattern of LD spanning 20Mb with three alternate genotypes could result from another process. We assume that the inversion was the least common form but this is not always the case (Arostegui ). This inversion was present in all collections, but its frequency varied by region, with Yukon and Kuskokwim rivers having the highest frequency of the inversion (Table 2).
Figure 5

Genotypes for combined RADseq and GT-seq samples. Rows are samples ordered by sex (inferred from PCA cluster) and inversion type; columns are loci. Loci were separated by genomic feature to aid in visualization: inversion-associated loci are to the left of the dashed line and sex-associated loci are to the right of the dashed line. Within each genomic feature, loci are ordered by position. Individual genotypes are color coded with 0 and 2 representing alternate homozygous genotypes and 1 being a heterozygous genotype. Prefix of Oke_uwRAD was dropped from marker names for brevity.

Table 2

Frequency of the chromosome inversion and sex-assignment accuracy by collection. Samples were assigned an inversion type (+/−) and sex based on PCA clustering. For collections with phenotypic sex data, phenotypes were compared to sex assigned through clustering to assess accuracy

RegionCollectionData Source+++-Freq (-)M clusterF clusterSex Assignment Accuracy
Norton SoundEldorado RiverRAD/GT-seq96%4%0%2%5080NA
Norton SoundFish RiverRAD/GT-seq90%10%0%6%546284%
Norton SoundKwiniuk RiverGT-seq92%8%0%4%304489%
Yukon R.Nulato RiverRAD/GT-seq73%23%4%22%6553NA
Yukon R.Otter CreekRAD/GT-seq76%22%1%16%6575NA
Yukon R.East Fork AndreafskyGT-seq83%17%0%10%3746NA
Kuskokwim R.Holokuk RiverRAD/GT-seq89%10%1%7%5473NA
Kuskokwim R.Aniak RiverGT-seq85%15%0%9%4138NA
Nushagak R.Kokwok RiverRAD/GT-seq96%4%0%2%9629NA
Nushagak R.Mulchatna RiverGT-seq90%10%0%5%373699%
Genotypes for combined RADseq and GT-seq samples. Rows are samples ordered by sex (inferred from PCA cluster) and inversion type; columns are loci. Loci were separated by genomic feature to aid in visualization: inversion-associated loci are to the left of the dashed line and sex-associated loci are to the right of the dashed line. Within each genomic feature, loci are ordered by position. Individual genotypes are color coded with 0 and 2 representing alternate homozygous genotypes and 1 being a heterozygous genotype. Prefix of Oke_uwRAD was dropped from marker names for brevity. Loci in set 2 exhibited a more complex pattern that is consistent with a sex-determining region. Putative males in the top half of Figure 5 and loci to the right of the dividing line show a pattern of very high heterozygosity with homozygous genotypes almost entirely of one class rather than both homozygous genotypes being equally represented. Putative females in the bottom half of Figure 5 with loci to the right of dividing line show a pattern of high homozygosity for the alternate allele found in males, and all three genotype classes are present. This overall pattern is consistent with fixation of one allele on the Y-chromosome with the alternate allele at high frequency, but not fixed, on the ancestral X-chromosome. These patterns are consistent with an XY sex-determining region and balanced sex ratios. This finding was further supported by phenotypic sex data (see below).

High-throughput panel for screening genomic structures

Development of the GT-seq panel involved filtering steps during primer design and panel optimization. After evaluation and optimization, the final GT-seq panel contained 22 SNPs diagnostic for a putative inversion and 18 SNPs diagnostic for a putative sex-determining region. Primer sequences for final loci are located in Table S2 and bioinformatic probes used for genotyping in GTscore are in Table S3. A total of 43 individuals were removed for failing quality control: 19 samples had <90% genotype rate and an additional eight had broad allelic scatter due to reduced read depth (Figure S2B), five individuals showed elevated heterozygosity and indistinct allelic clusters suggesting contamination (Figure S2C), one individual showed four clusters consistent with triploidy (Figure S2D), two individuals showed five clusters consistent with tetraploidy (Figure S2E), and one individual had no heterozygous genotypes and was likely a different species of salmon (Figure S2F). Inversion type and sex were assigned by clustering samples with PCA (Figure 4B); this PCA mirrored the results from RADseq analysis shown in Figure 4A. Results for inversion and sex are reported for the full dataset with the RADseq and GT-seq samples combined. Phenotypic sex data were available for samples in five of the collections examined; however, systemic errors in phenotypic records were identified for the East Fork Andreafsky and Aniak Rivers populations, causing phenotypes for these populations to be removed from analysis. The average concordance between phenotypic sex and cluster sex for the remaining combined RADseq and GT-seq datasets was 90% (range 84–99%, Table 2). Phenotypic sex of each individual, along with results of inversion type and sex assignment can be found in Table S4. Similar to the RADseq data alone, within the full dataset the inversion was found in every collection but showed regional variation in frequency. Inversion frequency ranged from 2 to 22% and was most common in the Lower Yukon collections followed by the Kuskokwim River collections (Table 2). Outside of the Yukon and Kuskokwim rivers the inversion reached a maximum frequency of only 6% (Table 2).

Discussion

Genomic structures including inversions and sex-determining regions provide important contributions to adaptive variation and population structure (Wellenreuther and Bernatchez 2018; Benestan ). Large inversions are associated with adaptation and life- history variation across taxa, including ecotypes in mosquitos (Anopheles sp.) (Ayala ), migration vs. residency in rainbow trout (Oncorhynchus mykiss) (Pearse ), and annual vs. perennial yellow monkey flowers (Mimulus guttatus) (Lowry and Willis 2010). Inhibited recombination within inverted regions increases divergence between chromosomal forms and can drive patterns of population structure (Tian ; Arostegui ). A similar pattern has been observed for sex chromosomes where sex-associated markers can drive patterns of population structure (Benestan ). We identified two genomic structures in chum–a chromosome inversion and sex-determining region–that occur in the same genomic region. The signal of the chromosome inversion was strong enough to cause population structuring when all markers were used for PCA; however, population structure due to the sex-determining region was only visible in the chromosome-specific PCA.

Identification of genomic structures using LD

Population substructure shaped by the inversion and sex-determining region was only determined through a combination of LD and network analysis. Both the inversion and sex-determining region exhibited LD spanning the same genomic region, which would be difficult to ascertain manually. Network analysis and community detection was a simple method to automate detection of different groups of linked markers that contributed to this overall pattern of LD. Previous work has shown that a combination of LD and network analyses facilitates detection of genomic features even in the absence of reference genome (Kemppainen ). Here we demonstrated that LD and network analysis can be used to tease apart multiple genomic features within the same chromosomal region. The extended LD combined with the specific genotype patterns observed for high-LD loci suggests a genomic inversion that overlaps with the sex-determining region in chum salmon. The genotype patterns also suggest that the inversion arose on the X chromosome. An inversion arising on the X chromosome should result in 5 different chromosomal combinations: XX, XXINV, XINVXINV, XY, XINVY. This agrees with the observed pattern of 5 sample clusters on the individual PCA (Figure 4). If the inversion had arisen on the Y chromosome only three clusters would be expected: XX, XY, XYINV. Levels of LD for sex-associated loci varied depending on the X-chromosome type, suggesting different levels of divergence between the Y-chromosome and the ancestral and inverted X-chromosomes. Salmonids exhibit sex-specific recombination patterns where females recombine across the full length of chromosomes while recombination in males is restricted to telomeric regions (Lien ; Pearse ). While autosomes will be free to recombine in females, recombination between the X and Y can only take place in males and will be limited to telomeric regions. This type of recombination will likely result in patterns of extended LD as the sex chromosomes diverge due to genetic drift. Extended LD on sex-chromosomes has also been noted in Atlantic salmon (Kijas ) and Chinook salmon (McKinney preprint) and may be a general feature in salmonids.

Chromosome inversion

An inversion on the X chromosome containing the sex-determining region has interesting evolutionary implications. Females can exhibit all three genotypes of the inversion, while males can be homozygous for the ancestral form or heterozygous for the inversion but never homozygous for the inversion. Inversions can lead to fixation of genetic variants on alternate chromosomes and facilitate adaptive divergence (reviewed in Wellenreuther and Bernatchez 2018). In salmonids, an inversion on Omy05 is associated with variation in migrant vs. resident life history in rainbow trout (Miller ; Pearse ; Pearse ; Arostegui ) . We cannot determine if the inversion we identified in chum salmon is adaptively important with the data currently available, but one intriguing result is that the inversion contains the Greb1L gene. This gene has been associated with variation in the timing of migration return in both Chinook salmon and rainbow trout (Prince ; Micheletti ; Hess ). Chum salmon exhibit two ecotypes that vary in season of migration into freshwater: summer migrants and fall migrants. In the Yukon River, summer chum salmon enter the river between early June and early July and spawn primarily in tributaries in the lower 700 km of the river. Fall chum salmon, the less-abundant ecotype, enter in mid-July to late August and, unlike the majority the chum salmon populations throughout the range, undertake long migrations with populations extending to the headwaters of Yukon River over 2800 km upriver (Buklis and Barton 1984; Flannery ). In addition, significant divergence in phenotypic and genetic characters between the two ecotypes suggests different evolutionary histories in the recent past (Seeb ). Here we only examined the summer ecotype. Further study of individual migration timing including the fall and summer ecotypes in the Yukon River could determine whether there is an association between inversion type and return. We also recommend follow up work with whole-genome sequencing combined with individual metadata to better determine if this inversion is adaptive, and what genes and gene variants may be involved (i.e., Pearse ).

Sex-determining region

Sex in salmon is genetically controlled (Devlin and Nagahama 2002) and is governed by the sex-determining gene sdY (Yano ). The identity of sex chromosomes differ across salmonid species due to translocation of the sex-determining region (Davidson ), and in Atlantic salmon sdY has been identified on three different chromosomes (Eisbrenner ). In this study, we identified the sex-chromosome in chum salmon as LG15 which is orthologous to Ssa03, one of the chromosomes containing the sdY gene in Atlantic salmon. Interestingly, the homeolog of LG15 is orthologous to the sex chromosome in coho salmon, lake whitefish, and sockeye salmon (Sutherland ) suggesting that sex genes were silenced or lost on separate homeologs between species. Discrepancies between genetic and phenotypic sex have been observed in previous studies and have been speculated to be due to spontaneous sex reversal (Nagler ), non-functional Y-chromosomes due to sdY exon loss (Cavileer ), or the possibility of autosomal modifiers that may interact with sdY (Eisbrenner ). However, discrepancies may also be explained by technical errors including error in phenotypic sex assignment or errors in sample records. The populations in this study were sexed visually which can be unreliable, particularly if fish were caught before secondary sex characteristics were fully developed (Lozorie and McIntosh 2014). Despite this, overall concordance between phenotypic sex and sex determined by PCA cluster was high (90% overall and 99% in the Mulchatna collection) suggesting that this is the true sex determining region. Future studies of this genomic region should involve new collections of samples with careful attention to accurate phenotypic sex identification and pairing of phenotypic data and tissue samples.

Conclusion

Examining genome-wide patterns of LD is an important tool for evolutionary analysis. Many studies explicitly exclude loci with LD; these may be missing important patterns of genomic variation. We identified a co-locating inversion and sex-determining region in chum salmon by performing network analysis on patterns of LD. The signal of these features was strong enough to drive PCA of the full dataset, resulting in false population structure. Attempting to identify and account for genomic features should be standard practice in genome-scale datasets.
  51 in total

1.  Natural selection and the genetics of adaptation in threespine stickleback.

Authors:  Dolph Schluter; Kerry B Marchinko; R D H Barrett; Sean M Rogers
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2010-08-27       Impact factor: 6.237

2.  Sex chromosomes and sexual selection in poeciliid fishes.

Authors:  Anna Lindholm; Felix Breden
Journal:  Am Nat       Date:  2002-12       Impact factor: 3.926

3.  Local adaptation and the evolution of inversions on sex chromosomes and autosomes.

Authors:  Tim Connallon; Colin Olito; Ludovic Dutoit; Homa Papoli; Filip Ruzicka; Lengxob Yong
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2018-10-05       Impact factor: 6.237

4.  Fast gapped-read alignment with Bowtie 2.

Authors:  Ben Langmead; Steven L Salzberg
Journal:  Nat Methods       Date:  2012-03-04       Impact factor: 28.547

5.  Signatures of sex-antagonistic selection on recombining sex chromosomes.

Authors:  Mark Kirkpatrick; Rafael F Guerrero
Journal:  Genetics       Date:  2014-02-27       Impact factor: 4.562

6.  Sex matters in massive parallel sequencing: Evidence for biases in genetic parameter estimation and investigation of sex determination systems.

Authors:  Laura Benestan; Jean-Sébastien Moore; Ben J G Sutherland; Jérémy Le Luyer; Halim Maaroufi; Clément Rougeux; Eric Normandeau; Nathan Rycroft; Jelle Atema; Les N Harris; Ross F Tallman; Spencer J Greenwood; Fraser K Clark; Louis Bernatchez
Journal:  Mol Ecol       Date:  2017-07-24       Impact factor: 6.185

7.  Linkage mapping with paralogs exposes regions of residual tetrasomic inheritance in chum salmon (Oncorhynchus keta).

Authors:  R K Waples; L W Seeb; J E Seeb
Journal:  Mol Ecol Resour       Date:  2015-03-11       Impact factor: 7.090

8.  A widespread chromosomal inversion polymorphism contributes to a major life-history transition, local adaptation, and reproductive isolation.

Authors:  David B Lowry; John H Willis
Journal:  PLoS Biol       Date:  2010-09-28       Impact factor: 8.029

Review 9.  The Guppy Sex Chromosome System and the Sexually Antagonistic Polymorphism Hypothesis for Y Chromosome Recombination Suppression.

Authors:  Deborah Charlesworth
Journal:  Genes (Basel)       Date:  2018-05-19       Impact factor: 4.096

10.  BatchPrimer3: a high throughput web application for PCR and sequencing primer design.

Authors:  Frank M You; Naxin Huo; Yong Qiang Gu; Ming-Cheng Luo; Yaqin Ma; Dave Hane; Gerard R Lazo; Jan Dvorak; Olin D Anderson
Journal:  BMC Bioinformatics       Date:  2008-05-29       Impact factor: 3.169

View more
  6 in total

1.  A chromosomal inversion may facilitate adaptation despite periodic gene flow in a freshwater fish.

Authors:  Matt J Thorstensen; Peter T Euclide; Jennifer D Jeffrey; Yue Shi; Jason R Treberg; Douglas A Watkinson; Eva C Enders; Wesley A Larson; Yasuhiro Kobayashi; Ken M Jeffries
Journal:  Ecol Evol       Date:  2022-05-07       Impact factor: 3.167

2.  Contrasting population differentiation in two sympatric Triplophysa loaches on the Qinghai-Tibet Plateau.

Authors:  Ling Jin; Zitong Li; Chongnv Wang; Yingnan Wang; Xinxin Li; Jian Yang; Yahui Zhao; Baocheng Guo
Journal:  Front Genet       Date:  2022-08-25       Impact factor: 4.772

3.  Neutral and adaptive loci reveal fine-scale population structure in Eleginops maclovinus from north Patagonia.

Authors:  Cristian B Canales-Aguirre; Wesley A Larson; Garrett J McKinney; C Eliza Claure; J Dellis Rocha; Santiago G Ceballos; María I Cádiz; José M Yáñez; Daniel Gomez-Uchida
Journal:  Ecol Evol       Date:  2022-10-03       Impact factor: 3.167

4.  The sockeye salmon genome, transcriptome, and analyses identifying population defining regions of the genome.

Authors:  Kris A Christensen; Eric B Rondeau; David R Minkley; Dionne Sakhrani; Carlo A Biagi; Anne-Marie Flores; Ruth E Withler; Scott A Pavey; Terry D Beacham; Theresa Godin; Eric B Taylor; Michael A Russello; Robert H Devlin; Ben F Koop
Journal:  PLoS One       Date:  2020-10-29       Impact factor: 3.240

5.  Power of a dual-use SNP panel for pedigree reconstruction and population assignment.

Authors:  Samuel A May; Garrett J McKinney; Ray Hilborn; Lorenz Hauser; Kerry A Naish
Journal:  Ecol Evol       Date:  2020-08-10       Impact factor: 2.912

6.  Whole-genome sequencing and genome regions of special interest: Lessons from major histocompatibility complex, sex determination, and plant self-incompatibility.

Authors:  Xavier Vekemans; Vincent Castric; Helen Hipperson; Niels A Müller; Helena Westerdahl; Quentin Cronk
Journal:  Mol Ecol       Date:  2021-07-01       Impact factor: 6.622

  6 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.