Garrett McKinney, Megan V McPhee, Carita Pascal, James E Seeb, Lisa W Seeb.
Abstract
Many studies exclude loci that exhibit linkage disequilibrium (LD); however, high LD can signal reduced recombination around genomic features such as chromosome inversions or sex-determining regions. Chromosome inversions and sex-determining regions are often involved in adaptation, allowing for the inheritance of co-adapted gene complexes and for the resolution of sexually antagonistic selection through sex-specific partitioning of genetic variants. Genomic features such as these can escape detection when loci with LD are removed; in addition, failing to account for these features can introduce bias to analyses. We examined patterns of LD using network analysis to identify an overlapping chromosome inversion and sex-determining region in chum salmon. The signal of the inversion was strong enough to show up as false population substructure when the entire dataset was analyzed, while the effect of the sex-determining region on population structure was only obvious after restricting analysis to the sex chromosome. Understanding the extent and geographic distribution of inversions is now a critically important part of genetic analyses of natural populations. Our results highlight the importance of analyzing and understanding patterns of LD in genomic datasets and the perils of excluding or ignoring loci exhibiting LD. Blindly excluding loci in LD would have prevented detection of the sex-determining region and chromosome inversion, while failing to understand the genomic features leading to high LD could have resulted in false interpretations of population structure.
Keywords: chum salmon; inversion; linkage disequilibrium; network analysis; x-chromosome
Year: 2020 PMID: 32165371 PMCID: PMC7202013 DOI: 10.1534/g3.119.400972
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Map of sampling locations. Collections are colored by region, and shapes denote whether samples were genotyped using RADseq and GT-seq or GT-seq only.
Number of samples initially sequenced and retained after quality filtering for RADseq and GT-seq datasets. Collections used to evaluate accuracy of the putative sex-determining region are marked with an asterisk (*).
| Region | Collection | Initial RADseq Samples | Retained RADseq Samples | Initial GT-seq Samples | Retained GT-seq Samples |
|---|---|---|---|---|---|
| Norton Sound | Eldorado River | 48 | 48 | 83 | 82 |
| Norton Sound | Fish River* | 48 | 39 | 82 | 77 |
| Norton Sound | Kwiniuk River* | 0 | 0 | 82 | 74 |
| Yukon River | Nulato River | 48 | 38 | 83 | 80 |
| Yukon River | Otter Creek | 48 | 48 | 99 | 92 |
| Yukon River | East Fork Andreafsky River | 0 | 0 | 83 | 83 |
| Kuskokwim River | Holokuk River | 48 | 48 | 83 | 79 |
| Kuskokwim River | Aniak River | 0 | 0 | 82 | 79 |
| Nushagak River | Kokwok River | 48 | 46 | 83 | 79 |
| Nushagak River | Mulchatna River* | 0 | 0 | 82 | 73 |
| Total | | 288 | 267 | 842 | 798 |
Figure 2. Individual PCA using all loci for Yukon River collections. Both collections exhibit patterns of substructure.
Figure 3. A) Plot of linkage disequilibrium (pairwise r²) of SNPs aligned to O. mykiss chromosome 28 (chum LG15). Each point is a SNP pair colored by LD. The pattern of elevated LD spans 20 Mb of chromosome 28. B) Network analysis with community detection identified three distinct sets of loci contributing to LD on LG15. Set 1 (red background) has 51 loci, set 2 (purple background) has 27 loci, and set 3 (green background) has 4 loci. Loci in sets 1 and 2 span the entire 20 Mb, while loci in set 3 are linked due to close physical proximity.
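The two steps named in this caption, pairwise r² followed by grouping of loci in a network, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: r² is taken as the squared Pearson correlation of 0/1/2 genotype codes, the 0.3 edge threshold is arbitrary, and plain connected components stand in for the community-detection step, whose algorithm is not specified here. The function names and the toy genotype matrix are hypothetical.

```python
import numpy as np

def pairwise_r2(genotypes):
    """Squared Pearson correlation between every pair of loci.

    genotypes: (n_samples, n_loci) array of 0/1/2 codes with no missing data.
    Monomorphic loci should be removed first (their correlation is undefined).
    """
    r = np.corrcoef(genotypes, rowvar=False)
    return r ** 2

def ld_components(r2, threshold=0.3):
    """Label loci by connected component of the graph with edges r2 >= threshold.

    Connected components are a simplified stand-in for community detection.
    """
    n = r2.shape[0]
    adjacent = r2 >= threshold
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first walk over linked loci
            i = stack.pop()
            for j in np.flatnonzero(adjacent[i]):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy data (hypothetical): a, b and c are mutually uncorrelated genotype vectors.
a = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
b = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
c = (a + b) % 3
# Loci 0-2 track a (locus 2 with the alleles swapped), loci 3-4 track b, locus 5 is c.
toy = np.column_stack([a, a, 2 - a, b, b, c])
labels = ld_components(pairwise_r2(toy))
print(labels)
```

Because r² is a squared correlation, locus 2 (coded with the alleles swapped) still falls in the same group as loci 0 and 1.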
Figure 4. A) Individual PCA of RADseq samples using all loci from LG15. Examination of locus loadings shows that axis 1 is primarily driven by loci in set 1 (inversion loci) and axis 2 is primarily driven by loci in set 2 (sex-associated loci). Labels for each cluster of individuals denote the putative chromosome type of individuals within the cluster with respect to inversion and sex. B) Individual PCA of RADseq and GT-seq samples using loci successfully developed into GT-seq assays. Samples are color coded by phenotypic sex, with gray individuals having unassigned sex.
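To illustrate how locus loadings can reveal which loci drive each PCA axis, the sketch below runs a PCA by singular value decomposition on a small toy genotype matrix. It is an assumption-laden example rather than the analysis used in the study: the function name, the 0.1 loading cutoff, and the toy data are hypothetical, and genotypes are only column-centered, with no scaling or handling of missing data.

```python
import numpy as np

def genotype_pca(genotypes, n_axes=2):
    """PCA via SVD of the column-centered genotype matrix.

    Returns per-individual scores, per-locus loadings, and the fraction of
    variance explained by each retained axis.
    """
    x = np.asarray(genotypes, dtype=float)
    x = x - x.mean(axis=0)                    # center each locus
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_axes] * s[:n_axes]       # individual coordinates
    loadings = vt[:n_axes].T                  # one row per locus
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_axes]

# Toy data (hypothetical): three loci in perfect LD, two more in perfect LD
# with each other, and one independent locus.
a = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
b = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
c = (a + b) % 3
toy = np.column_stack([a, a, 2 - a, b, b, c])

scores, loadings, explained = genotype_pca(toy)
axis1_loci = np.flatnonzero(np.abs(loadings[:, 0]) > 0.1)
axis2_loci = np.flatnonzero(np.abs(loadings[:, 1]) > 0.1)
print(axis1_loci, axis2_loci, explained)
```

In this toy case the larger block of linked loci dominates axis 1 and the smaller block dominates axis 2, which mirrors the logic of reading loadings to attribute axes to sets of loci. The sign of each loading is arbitrary, so only absolute values are compared.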
Figure 5. Genotypes for combined RADseq and GT-seq samples. Rows are samples ordered by sex (inferred from PCA cluster) and inversion type; columns are loci. Loci were separated by genomic feature to aid in visualization: inversion-associated loci are to the left of the dashed line and sex-associated loci are to the right of the dashed line. Within each genomic feature, loci are ordered by position. Individual genotypes are color coded, with 0 and 2 representing alternate homozygous genotypes and 1 representing a heterozygous genotype. The prefix Oke_uwRAD was dropped from marker names for brevity.
Frequency of the chromosome inversion and sex-assignment accuracy by collection. Samples were assigned an inversion type (+/−) and sex based on PCA clustering. For collections with phenotypic sex data, phenotypes were compared to sex assigned through clustering to assess accuracy.
| Region | Collection | Data Source | ++ | +- | -- | Freq (-) | M cluster | F cluster | Sex Assignment Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| Norton Sound | Eldorado River | RAD/GT-seq | 96% | 4% | 0% | 2% | 50 | 80 | NA |
| Norton Sound | Fish River | RAD/GT-seq | 90% | 10% | 0% | 6% | 54 | 62 | 84% |
| Norton Sound | Kwiniuk River | GT-seq | 92% | 8% | 0% | 4% | 30 | 44 | 89% |
| Yukon R. | Nulato River | RAD/GT-seq | 73% | 23% | 4% | 22% | 65 | 53 | NA |
| Yukon R. | Otter Creek | RAD/GT-seq | 76% | 22% | 1% | 16% | 65 | 75 | NA |
| Yukon R. | East Fork Andreafsky | GT-seq | 83% | 17% | 0% | 10% | 37 | 46 | NA |
| Kuskokwim R. | Holokuk River | RAD/GT-seq | 89% | 10% | 1% | 7% | 54 | 73 | NA |
| Kuskokwim R. | Aniak River | GT-seq | 85% | 15% | 0% | 9% | 41 | 38 | NA |
| Nushagak R. | Kokwok River | RAD/GT-seq | 96% | 4% | 0% | 2% | 96 | 29 | NA |
| Nushagak R. | Mulchatna River | GT-seq | 90% | 10% | 0% | 5% | 37 | 36 | 99% |
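The comparison of phenotypic sex with cluster-assigned sex described in the table caption can be expressed as a simple matching fraction. The sketch below is one plausible way to compute it, assuming individuals without a recorded phenotype are skipped; the function name and the example labels are hypothetical and are not data from the study.

```python
def sex_assignment_accuracy(phenotypic, assigned):
    """Fraction of individuals whose cluster-assigned sex matches the phenotype.

    phenotypic: list of "M", "F", or None (no phenotypic sex recorded).
    assigned:   list of "M" or "F" inferred from PCA clustering.
    Returns None when no individual has a recorded phenotype.
    """
    pairs = [(p, a) for p, a in zip(phenotypic, assigned) if p is not None]
    if not pairs:
        return None
    matches = sum(p == a for p, a in pairs)
    return matches / len(pairs)

# Hypothetical example: five fish, one without a recorded phenotype.
phenotypes = ["M", "F", "F", None, "M"]
clusters = ["M", "F", "M", "F", "M"]
accuracy = sex_assignment_accuracy(phenotypes, clusters)
print(accuracy)  # 3 of the 4 phenotyped fish match
```

Returning None rather than 0 for collections without phenotypic data corresponds to the NA entries in the table.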