Comparative Genomic Analyses and a Novel Linkage Map for Cisco (Coregonus artedi) Provide Insights into Chromosomal Evolution and Rediploidization Across Salmonids.

Danielle M Blumstein1, Matthew A Campbell2, Matthew C Hale3, Ben J G Sutherland4, Garrett J McKinney5, Wendylee Stott6, Wesley A Larson7,1.   

Abstract

Whole-genome duplication (WGD) is hypothesized to be an important evolutionary mechanism that can facilitate adaptation and speciation. Genomes that exist in states of both diploidy and residual tetraploidy are of particular interest, as mechanisms that maintain the ploidy mosaic after WGD may provide important insights into evolutionary processes. The Salmonidae family exhibits residual tetraploidy, and this, combined with the evolutionary diversity formed after an ancestral autotetraploidization event, makes this group a useful study system. In this study, we generate a novel linkage map for cisco (Coregonus artedi), an economically and culturally important fish in North America and a member of the subfamily Coregoninae, which previously lacked a high-density haploid linkage map. We also conduct comparative genomic analyses to refine our understanding of chromosomal fusion/fission history across salmonids. To facilitate this comparative approach, we use the naming strategy of protokaryotype identifiers (PKs) to associate duplicated chromosomes to their putative ancestral state. The female linkage map for cisco contains 20,292 loci, 3,225 of which are likely within residually tetraploid regions. Comparative genomic analyses revealed that patterns of residual tetrasomy are generally conserved across species, although interspecific variation persists. To determine the broad-scale retention of residual tetrasomy across the salmonids, we analyze sequence similarity of currently available genomes and find evidence of residual tetrasomy in seven of the eight chromosomes that have been previously hypothesized to show this pattern. This interspecific variation in extent of rediploidization may have important implications for understanding salmonid evolutionary histories and informing future conservation efforts.
Copyright © 2020 Blumstein et al.

Keywords:  Salmonidae; comparative genomics; coregonines; linkage mapping; residual tetrasomy; whole genome duplication

Year:  2020        PMID: 32611547      PMCID: PMC7407451          DOI: 10.1534/g3.120.401497

Source DB:  PubMed          Journal:  G3 (Bethesda)        ISSN: 2160-1836            Impact factor:   3.154


The evolutionary significance of whole-genome duplications (WGDs) has been intensively debated for decades (e.g., Ohno 1970; Taylor ; Santini ; Wood ; Zhan ; Mayrose ; Van de Peer ). Multiple studies have hypothesized that WGD is an important evolutionary mechanism that can facilitate adaptation on short- and long-term evolutionary timescales (Ohta 1989; Selmecki ; Van de Peer ). For example, genes found in polyploid regions are able to gain new function (i.e., neofunctionalization) without the consequences of deleterious mutations affecting the main function of the original gene copy. This may facilitate adaptive molecular divergence and evolution of new phenotypes (Wittbrodt ; Wendel 2000; Rastogi and Liberles 2005). However, other studies have hypothesized that WGD presents significant challenges for meiosis and mitosis (Hollister 2015) and may not have as much of an effect on evolution as originally considered (Mayrose ; Arrigo and Barker 2012; Vanneste ; Clarke ). A consensus is therefore yet to be reached on the evolutionary impact of WGD relative to other evolutionary forces. Conducting genetic studies on organisms with relatively recent WGD can be challenging due to the inability to differentiate alleles and sequences from the same chromosome (homologs) from those on the duplicated chromosome (homeologs) (Limborg ). Fortunately, approaches leveraging gamete manipulation, high sequencing coverage, and long-read sequencing have improved our ability to characterize duplicated regions. Linkage mapping with haploids and doubled haploids has facilitated analysis of duplicated regions in salmonids (Brieuc ; Kodama ; Lien ; Waples ). Further, long-read sequencing technologies have made it possible to assemble complex genomes with convoluted duplication histories (Kyriakidou ).
These technological advances have revolutionized our ability to understand genomic architecture in species that have adaptively radiated into dozens of species following an ancestral WGD in lineages, such as salmonids (Lien ; Robertson ; Campbell ) and many plant species (Alix ). Salmonids are derived from an ancestral species that underwent a WGD ∼100 million years ago (Ss4R, Allendorf and Thorgaard 1984; Berthelot ; Macqueen and Johnston 2014; Lien ) and have since diversified into a broad array of ecologically and genetically distinct taxa. The Salmonidae family is comprised of three subfamilies: Salmoninae (salmon, trout, and char), Thymallinae (graylings), and Coregoninae (whitefish and ciscoes) (Norden 1961), and diversification of these subfamilies post-dates the Ss4R, occurring 40-50 million years ago (Campbell ; Macqueen and Johnston 2014). Phylogenetic analysis has revealed that the majority of the salmonid genome returned to a diploid inheritance state prior to the divergence of the subfamilies (Robertson ). However, the rediploidization process is still incomplete and approximately 20–25% of each salmonid genome still shows signals of tetrasomic inheritance (i.e., residual tetrasomy, or the recombination between homeologs that results in the exchange of alleles between homeologous chromosomes (Allendorf ; Lien ; Robertson )). Early evidence for residual tetrasomy in the salmonids was identified using allozyme studies in experimental crosses (Allendorf and Danzmann 1997) and, more recently, by linkage maps, sequenced genomes, and sequence capture (Kodama ; McKinney ; Robertson ; Christensen ; Pearse ). High-density linkage maps that include both duplicated and non-duplicated markers have revealed that eight pairs of homeologous chromosomes repeatedly display evidence of residual tetraploidy despite independent fusion and fission events (Brieuc ; Kodama ; Sutherland ). 
These chromosomes, often referred to as the “magic eight,” have been observed in linkage mapping studies of coho salmon Oncorhynchus kisutch (Kodama ), Chinook salmon O. tshawytscha (Brieuc ; McKinney ; McKinney ), pink salmon O. gorbuscha (Tarpey ), chum salmon O. keta (Waples ), and sockeye salmon O. nerka (Larson ). Mapping studies have also revealed that at least one of the two homeologs exhibiting residual tetraploidy is within a chromosomal fusion, suggesting that fusions play a role in the persistence of tetrasomy (Brieuc ; Kodama ; Sutherland ). Analysis of sequenced genomes for Atlantic salmon (Salmo salar) and rainbow trout (O. mykiss) also supports evidence of residual tetraploidy, although in these genomic studies only seven of these pairs were identified as displaying clear signals of conserved residual tetraploidy (Lien ; Campbell ). Sequence capture analysis also identified seven pairs displaying conserved signals of residual tetrasomy across species (Robertson ). This led to the definition of two types of homeologous regions: 1) ancestral ohnologue resolution (AORe) regions with relatively low sequence similarity with ancestral homeologs that likely rediploidized prior to species diversification; and 2) lineage-specific ohnologue resolution (LORe) regions with high sequence similarity among homeologs, likely maintained by residual tetraploidy (Robertson ). Over the past decade, an extensive proliferation of genomic resources has occurred within salmonids. Currently, linkage maps that include duplicated regions are available for five Oncorhynchus species, and genome assemblies are available for grayling Thymallus thymallus (Savilammi ), Atlantic salmon (Lien ), Arctic char Salvelinus alpinus (Christensen ), rainbow trout (Pearse ), and Chinook salmon (Christensen ). These resources permit investigation into the processes of rediploidization and residual tetrasomy across the salmonid family.
There is, however, an underrepresentation of other lineages within the salmonid family, such as the Coregoninae subfamily (but see Gagnaire , and recently De-Kayne ). Our focal species for this manuscript was the North American cisco (Coregonus artedi). Cisco are a commercially, economically, and ecologically important species across northern North America. Additionally, these species are preyed upon by many apex predators and have historically represented an important trophic linkage in freshwater ecosystems, such as in the Laurentian Great Lakes (Eshenroder ). Cisco also display extremely high phenotypic diversity, which has led to the definition of multiple forms based primarily on morphological evidence (Eshenroder ; Koelz 1929; Yule ). Recent environmental shifts and a renewed focus on conservation of native species have resulted in increased interest in restoring cisco in the Laurentian Great Lakes and other inland lakes in the United States and Canada (Zimmerman and Krueger 2009; Eshenroder ). Key to this restoration effort is understanding the relative roles of phenotypic plasticity and adaptive genetic diversity in shaping phenotypic diversity within cisco, and genomic tools and resources are needed to address these important questions. In the current study, we develop a high-density linkage map for cisco, the first haploid linkage map for the Coregoninae subfamily, and analyze existing genomic resources for other salmonids with the goal of investigating patterns of residual tetrasomy and chromosomal fusion and fission history across the salmonid family, with particular focus on the coregonines.
Our results suggest that (1) interspecific variation in residual tetrasomy is greater than previously observed; (2) binary definitions of chromosome ploidy status may not adequately capture variation within and among species; (3) linkage maps and sequenced genomes identify slightly different patterns regarding residual tetrasomy; and (4) a large number of fissions and fusions are specific to the base of the Coregoninae subfamily and species-specific fusions within Coregoninae are rare. This study uses new and existing resources to conduct the most comprehensive analysis of residual tetrasomy across the salmonid phylogeny to date.

Methods

Experimental crosses for linkage mapping

Genotypes from four diploid families (n = 73, 81, 84, and 95) and three haploid families (n = 80, 111, 139) were used to build sex-specific linkage maps (Table S2). Diploid crosses were constructed from cisco collected in northern Lake Huron (45°58’51.6” N, 84°19’40.8” W, USA) during spawning season (November 2015) by U. S. Fish and Wildlife Service crews using standardized gill net assessment methods. Gametes for haploid crosses were collected following the same methods, from the same location and month but in 2017. Gametes were extracted from mature fish and eggs were combined directly with sperm to produce diploid crosses or with sperm that had been irradiated with 300,000 µJ/cm2 UV light for two minutes to break down the DNA and produce haploid crosses. UV irradiation leaves the sperm intact so that the egg can be activated but no paternal genetic material is contributed (i.e., gynogenesis, Chourrout 1982), resulting in haploid embryos with maternal genetic material only. Crosses were made in the field and transported to the U. S. Geological Survey-Great Lakes Science Center, Ann Arbor, Michigan (USA) for rearing. Tissue samples (fin clips) were taken from adult parents, from offspring of the diploid crosses at age two, and from haploids approximately 50 days post fertilization. All samples were preserved in a combination of 95% ethanol and 5% EDTA and sent to the University of Wisconsin, Stevens Point Molecular Conservation Genetics Lab for processing. Laboratory and field collections were conducted under the auspices of the U.S. Fish and Wildlife Service and U.S. Geological Survey-Great Lakes Science Center and all necessary animal care and use protocols were filed by these agencies.

DNA extraction and RAD-sequencing library preparation

DNA was extracted using DNeasy 96 Blood and Tissue Kits (Qiagen, Valencia, California) per the manufacturer’s instructions. Quality and quantity of the extracted genomic DNA were measured using the Quant-iT PicoGreen double-stranded DNA Assay (Life Technologies) with a plate reader (BioTek). To confirm ploidy of haploid samples, parents and offspring were genotyped using six polymorphic microsatellite loci known to occur at diploid sites developed by Angers , Patton , and Rogers , and individuals were classified as haploids if only a single allele was present at all loci. The probability of not detecting a diploid if a diploid was present is ∼1.09% based on microsatellite heterozygosity in the parental population (unpublished data, Wendylee Stott). Genomic DNA from diploids and confirmed haploids was prepared for RAD sequencing using the SbfI restriction enzyme following the methods outlined in Ali , except that restriction-digested DNA was sheared with NEBNext dsDNA Fragmentase (New England Biolabs, Inc) instead of by sonication. DNA was then purified and indexed using NEBNext Ultra DNA Library Prep Kit for Illumina per the manufacturer’s instructions (New England Biolabs, Inc). Libraries were sequenced on a HiSeq4000 with paired end 150bp chemistry at the Michigan State Genomics Core Facility (East Lansing, MI).
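One way to read the ∼1.09% figure is as the chance that a true diploid happens to be homozygous at all six microsatellite loci and is therefore scored as a haploid. The sketch below illustrates that reasoning only. It assumes independent loci, and the per-locus heterozygosities are invented for illustration (the real values are unpublished and the exact calculation used in the study is not described here).

```python
from math import prod

def prob_missed_diploid(heterozygosities):
    """Probability that a diploid shows a single allele at every locus,
    assuming the loci are independent."""
    return prod(1.0 - h for h in heterozygosities)

# Hypothetical per-locus heterozygosities, NOT the values from the study.
example_h = [0.5, 0.6, 0.4, 0.55, 0.5, 0.6]
print(f"P(diploid scored as haploid) = {prob_missed_diploid(example_h):.4f}")
```

Under this reading, adding loci or choosing more heterozygous loci lowers the chance of misclassifying a diploid.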

SNP discovery and genotyping

Quality filtering, SNP identification, and genotyping were conducted using Stacks v.2.2 (Rochette and Catchen 2017). First, samples were demultiplexed with process_radtags with flags -c, -q, -r, -t 140, --bestrad. Markers were discovered de novo and genotyped within individuals with ustacks (flags = -m 3, -M 5, -H, --max_locus_stacks 4, --model_type bounded, --bound_high 0.05, --disable-gapped). A catalog of loci was created using a subset of the individuals (diploid parents = 8, haploid parents = 5, wild fish = 38, total cisco = 51) with cstacks (flags = -n 3, --disable-gapped). The 38 wild fish used in the catalog were collected from the same geographic area using the same collection methods as listed above and were included to search for a sex identification marker, which was unsuccessful (data not shown). Putative loci within each individual fish were matched against the catalog with sstacks (flag = --disable-gapped), tsv2bam was used with only the forward reads to orient the data by SNP, and gstacks was used to combine genotypes across individuals. Only the forward reads from the paired-end data were used in gstacks due to variable read depth in reverse reads and thus less reliable genotyping. gstacks was also run separately with the forward and reverse reads using tsv2bam to assemble longer contigs for sequence alignment and annotation. Final genotype calls were output as VCF files with populations (flags = -r 0.75), with each family grouped as a separate population in the popmap sample interpretation file. VCFtools (Danecek ) was used to identify and remove individuals from the study that were missing more than 30% of data. Maximum likelihood-based methods developed by Waples were used to identify loci that could be mapped in haploid crosses and to identify potentially duplicated loci.
Custom Python scripts (Python Software Foundation version 2.7), available on GitHub (see Data Availability), were used to filter the haplotype VCF file output from the populations module to identify loci that could be mapped in the diploid families. Loci missing more than 25% of data and loci that were genotyped as heterozygous in both parents of diploid families (and therefore could not be reliably mapped) were removed (as in Larson ). Individual genotypes were exported with the custom Python scripts as LepMap3 input files. As a final step before linkage mapping, genotypes from all seven families (haploid and diploid) were combined into a single dataset to form the final female LepMap3 input file and the four diploid families were combined into a single dataset to form the final male LepMap3 input file.
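The two locus filters described above can be sketched as follows. This is an illustration and not the authors' script: the data layout, the function names, and the choice to compute missingness over offspring only are all assumptions, and the genotypes are toy values.

```python
def is_heterozygous(genotype):
    """True when a called diploid genotype carries two different alleles."""
    return genotype is not None and genotype[0] != genotype[1]

def mappable_loci(calls_by_locus, dam, sire, max_missing=0.25):
    """Return loci that pass both filters in one diploid family.

    calls_by_locus: {locus: {sample: (allele1, allele2) or None}}
    (hypothetical layout; None marks a missing genotype).
    """
    kept = []
    for locus, calls in calls_by_locus.items():
        offspring = [g for s, g in calls.items() if s not in (dam, sire)]
        if not offspring:
            continue
        missing = sum(g is None for g in offspring) / len(offspring)
        if missing > max_missing:
            continue  # too much missing data
        if is_heterozygous(calls.get(dam)) and is_heterozygous(calls.get(sire)):
            continue  # heterozygous in both parents, not reliably mappable
        kept.append(locus)
    return kept

# Toy family with invented genotypes.
family = {
    "locus_1": {"dam": ("A", "G"), "sire": ("A", "A"), "o1": ("A", "G"),
                "o2": ("A", "A"), "o3": ("A", "G"), "o4": None},
    "locus_2": {"dam": ("C", "T"), "sire": ("C", "T"), "o1": ("C", "T"),
                "o2": ("C", "C"), "o3": ("T", "T"), "o4": ("C", "T")},
    "locus_3": {"dam": ("A", "A"), "sire": ("A", "G"), "o1": None,
                "o2": None, "o3": ("A", "G"), "o4": ("A", "A")},
}
print(mappable_loci(family, "dam", "sire"))
```

In this toy family only locus_1 is retained: locus_2 is heterozygous in both parents and locus_3 is missing in half of the offspring.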

Linkage mapping

The program LepMap3 (Rastas 2017) was used to construct linkage maps following the methods of McKinney . Due to heterochiasmy (i.e., recombination rate differences between males and females) that occurs in the Salmonidae family (Sakamoto ), a separate map was constructed for each sex. Loci were filtered and clustered into linkage groups (LGs) based on recombination rates by calculating logarithm of the odds (LOD) scores between all pairs of loci with the SeparateChromosomes2 module. The LOD scores were chosen by increasing the LOD value by one, with no minimum marker parameter, until the number of LGs stabilized and was similar to that expected based on the haploid karyotype of cisco (N = 40, Phillips ), and until additional LGs contained fewer than 100 markers in the female map and fewer than 10 markers in the male map. The final LOD scores used to generate the map were LOD = 15 and 5 for the female and the male maps, respectively. Loci were then ordered within LGs by utilizing paternal and maternal haplotypes as inheritance vectors with the OrderMarkers2 module. We used a minimum marker number per LG of 100 for the female map and 40 for the male map as LGs with fewer markers did not display consistent synteny with genomic resources (i.e., markers aligned to >>2 chromosome arms) and were likely statistical artifacts (data not shown). LGs were reordered and markers removed until no large gaps remained (Rastas 2017).
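To make the role of the LOD threshold concrete, the sketch below computes a textbook two-point LOD score for markers scored in haploid offspring and then groups markers by single linkage. It is not LepMap3's algorithm, which uses its own likelihood model, and the function names, toy haplotypes, and the threshold of 3 are illustrative assumptions (the study used LOD = 15 and 5).

```python
from math import log10

def two_point_lod(hap_a, hap_b):
    """LOD for linkage between two markers in haploid offspring.
    hap_a, hap_b: equal-length lists of 0/1 maternal alleles."""
    n = len(hap_a)
    mismatches = sum(a != b for a, b in zip(hap_a, hap_b))
    r = min(mismatches, n - mismatches)  # phase unknown: smaller class = recombinants
    theta = r / n
    lod = n * log10(2)  # minus log-likelihood under free recombination (theta = 0.5)
    if r > 0:
        lod += r * log10(theta)
    if n - r > 0:
        lod += (n - r) * log10(1 - theta)
    return lod

def linkage_groups(markers, lod_limit):
    """Join any two markers whose pairwise LOD reaches lod_limit."""
    names = list(markers)
    parent = {m: m for m in names}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if two_point_lod(markers[a], markers[b]) >= lod_limit:
                parent[find(a)] = find(b)
    groups = {}
    for m in names:
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values(), key=len, reverse=True)

# Toy haplotypes for 20 haploid offspring (invented data).
m1 = [0, 1] * 10
m2 = [1] + m1[1:]        # differs from m1 in one offspring
m3 = [0, 0, 1, 1] * 5    # unrelated to m1
toy = {"m1": m1, "m2": m2, "m3": m3}
print(linkage_groups(toy, lod_limit=3))
```

Raising the threshold splits loosely connected groups apart, which is why the number of LGs is tracked as the LOD value is increased.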

Comparative analysis of syntenic regions of linkage maps via mapcomp

mapcomp can be used to compare syntenic relationships among markers between linkage maps of any related species using a genome intermediate from another related species (Sutherland ). Here, mapcomp was used to compare the cisco map with maps for other Coregonus spp., including lake whitefish C. clupeaformis (Gagnaire ) and European whitefish C. lavaretus “Albock” (De-Kayne and Feulner 2018), as well as with representative species from other salmonid genera (i.e., Atlantic salmon (Lien ), brook trout S. fontinalis (Sutherland ), and Chinook salmon (Brieuc )) and a representative outgroup to the salmonid WGD, northern pike Esox lucius (Rondeau ). All code used to collect and prepare maps and to run the analysis is available on GitHub (see Data Availability). MapComp pairs loci between the two compared linkage maps if they align at the same locus or close to each other on the same contig or scaffold on the intermediate reference genome (Sutherland ). Due to the large phylogenetic distance covered in this analysis, two reference genomes were used, including grayling (Savilammi ) for comparisons within Coregonus and Atlantic salmon (Lien ) for comparisons between all species. As salmonid chromosomal evolution is typified by Robertsonian fusions (Phillips and Rab 2001), fused chromosome arms in cisco, lake whitefish, and European whitefish were identified by aligning cisco markers to multiple salmonid genomes to identify cases where one cisco LG corresponded with at least two chromosome arms in another species. The fusion and fission phylogenetic history was plotted based on the most parsimonious explanation of common fusions among Coregonus spp., basing the approximate occurrences of fusions on shared fusions among species. Fusion history shared at the base of the Salmoninae lineage was taken from earlier work (Sutherland ).
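The pairing rule described above (markers from two maps are paired when they align close together on the same contig of the intermediate genome) can be sketched as follows. This is not MapComp's code. The tuple layout, the `max_dist` parameter and its value, and all marker, LG, and contig names are hypothetical.

```python
from collections import Counter

def pair_markers(map_a, map_b, max_dist):
    """Pair each map_a marker with the nearest map_b marker on the same
    contig, provided the two alignments lie within max_dist bp.

    Each map is a list of (marker, linkage_group, contig, position),
    with contig and position taken from the intermediate genome."""
    by_contig = {}
    for marker, lg, contig, pos in map_b:
        by_contig.setdefault(contig, []).append((pos, marker, lg))
    pairs = []
    for marker, lg, contig, pos in map_a:
        candidates = by_contig.get(contig, [])
        if not candidates:
            continue
        b_pos, b_marker, b_lg = min(candidates, key=lambda c: abs(c[0] - pos))
        if abs(b_pos - pos) <= max_dist:
            pairs.append((marker, lg, b_marker, b_lg))
    return pairs

# Invented alignments for two species.
map_a = [("a1", "LG_A1", "contig_1", 1_000), ("a2", "LG_A1", "contig_1", 50_000),
         ("a3", "LG_A2", "contig_2", 10_000), ("a4", "LG_A3", "contig_9", 500)]
map_b = [("b1", "LG_B4", "contig_1", 1_500), ("b2", "LG_B4", "contig_1", 900_000),
         ("b3", "LG_B7", "contig_2", 400_000)]

pairs = pair_markers(map_a, map_b, max_dist=100_000)
# Count marker pairs supporting each LG-to-LG correspondence.
support = Counter((lg_a, lg_b) for _, lg_a, _, lg_b in pairs)
print(support)
```

Tallying pairs per LG combination, as in the last step, is one simple way to see which linkage groups correspond and to spot an LG that matches two arms in another species.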

Homeolog identification, similarity and inheritance mode

Homeologous chromosome arms can be identified in haploid crosses by mapping multiple alleles of duplicated markers based on the expected segregation ratio per paralog as described in Brieuc . Duplicated markers in cisco were mapped using this method, and duplicated markers from previously constructed linkage maps for coho salmon (Kodama ), Chinook salmon (McKinney ), pink salmon (Tarpey ), chum salmon (Waples ), and sockeye salmon (Larson ) were obtained. Homeologs were then ranked based on the number of markers supporting each known homeologous relationship and the number of duplicated markers was used to determine patterns of inheritance (i.e., disomy vs. tetrasomy) of the homeologous pair. Homeology was assessed by comparing DNA sequence similarity between homeologous arms from chromosome-level genome assemblies. Genomes included in this analysis were grayling (GCA_004348285.1, Savilammi ), Atlantic salmon (GCF_000233375.1, Lien ), Arctic char (GCF_002910315.2, Christensen ), rainbow trout (GCA_002163495.1, Pearse ), and Chinook salmon (GCF_002872995.1, Christensen ). Homeologous arms were inferred either as identified in the original genome paper (references above) or through MapComp comparisons. Homeologous arms were then aligned to determine sequence similarity using LASTZ v1.02 (Harris 2007) following methods outlined in Lien . Options specified with LASTZ included --chain --gapped --gfextend --identity=75.0..100.0 --ambiguous=iupac --exact=20. The analysis was restricted to alignments with minimum percent match values of 75%, and a minimum length of 1,000 base pairs to minimize the likelihood of spurious alignments that might be due to gene family duplication rather than WGD. Overall similarity of a homeologous pair was represented by the median percent similarity of all alignments, weighted by alignment length, and summarized with boxplots for each homeologous pair in each species ordered based on descending median percentage sequence similarity.
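The summary statistic described above, a length-weighted median of percent similarity after filtering alignments at 75% identity and 1,000 bp, can be sketched as follows. The function names and the toy alignments are assumptions, and the weighted median uses the lower-median convention, which may differ from the implementation used in the study.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    if not values:
        raise ValueError("no values supplied")
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= half:
            return value

def homeolog_similarity(alignments, min_identity=75.0, min_length=1000):
    """alignments: list of (percent_identity, alignment_length_bp).
    Returns the length-weighted median identity of retained alignments,
    or None when nothing passes the filters."""
    kept = [(pid, length) for pid, length in alignments
            if pid >= min_identity and length >= min_length]
    if not kept:
        return None
    return weighted_median([p for p, _ in kept], [l for _, l in kept])

# Invented alignments between two homeologous arms.
toy_alignments = [(92.5, 5000), (88.0, 1200), (95.1, 3000),
                  (80.0, 500),    # dropped: shorter than 1,000 bp
                  (70.0, 4000)]   # dropped: below 75% identity
print(homeolog_similarity(toy_alignments))
```

Weighting by length keeps many short alignments from dominating the summary for a homeologous pair.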
Each homeologous pair was classified into one of two categories, tetrasomic or disomic, using a machine learning approach. Previous research indicates that salmonids are undergoing rediploidization of tetrasomic homeologous pairs, disomic homeologous pairs, and intermediate homeologous pairs of uncertain affinity (Campbell ; Lien ). To objectively classify protokaryotypes into tetrasomic homeologous or disomic homeologous pairs in each species, a training set was constructed containing the four highest and four lowest sequence similarity homeolog pairs. A k-nearest neighbor (knn) classification approach was then applied to the dataset using this training set. The k-nearest neighbor method uses votes from the training set to classify; it therefore supplies an objective method not only to place protokaryotypes into either a tetrasomic or disomic class, but also to identify intermediates and the class to which they are more similar, based on the number and kind of votes received from the training set. In order to establish the k nearest-neighbors for each species, we used the resampling-based approach of 10-fold cross-validation repeated for 100 iterations implemented in the trainControl function in the R package caret (v6.0-84, Kuhn 2019). For a k of one to 10, the median sequence similarity between all protokaryotypes for each species was divided into 10 folds, with the first fold used to test the model and the remaining folds to train the model. Next, the second fold was used to test the model and the other folds were the training set. This process continued for a total of 10 times and was repeated 100 times. The k for each species was chosen based on the largest number of k nearest-neighbors exhibiting the highest accuracy from the cross-validation procedure. This k was then used to classify homeologous pairs as disomic or tetrasomic, along with the predefined training set (knn function, Ripley and Venables 2019).
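The classification step can be illustrated with a small pure-Python k-nearest neighbor vote on a single feature (median sequence similarity). This is a stand-in for the R workflow named above, not a port of it: the cross-validation used to choose k is omitted, k is fixed by hand, and the pair names and similarity values are invented.

```python
from collections import Counter

def build_training_set(similarity, n_each=4):
    """Label the n_each least similar pairs 'disomic' and the n_each
    most similar pairs 'tetrasomic'."""
    ranked = sorted(similarity, key=similarity.get)
    training = {pk: "disomic" for pk in ranked[:n_each]}
    training.update({pk: "tetrasomic" for pk in ranked[-n_each:]})
    return training

def knn_classify(similarity, training, k):
    """Classify every pair outside the training set by majority vote of
    its k nearest training pairs; also return the vote counts."""
    results = {}
    for pk, value in similarity.items():
        if pk in training:
            continue
        nearest = sorted(training, key=lambda t: abs(similarity[t] - value))[:k]
        votes = Counter(training[t] for t in nearest)
        results[pk] = (votes.most_common(1)[0][0], dict(votes))
    return results

# Hypothetical median percent similarity per homeologous pair.
toy_similarity = {"pk_a": 96.0, "pk_b": 95.2, "pk_c": 94.8, "pk_d": 94.1,
                  "pk_e": 86.0, "pk_f": 85.5, "pk_g": 85.1, "pk_h": 84.7,
                  "pk_i": 93.0, "pk_j": 87.0}

training = build_training_set(toy_similarity)
calls = knn_classify(toy_similarity, training, k=3)  # k fixed for illustration
print(calls)
```

The vote counts returned alongside each label are what allow intermediate pairs to be flagged: a split vote indicates a pair that sits between the two classes.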
Overall, similarity between the two predicted categories from the knn classification was tested with a Wilcoxon test (R Core Team 2018) incorporating percent similarity from all alignments to determine whether the categories displayed significantly different sequence similarity between homeologs (alpha = 0.01). All scripts used in this analysis are available on GitHub (see Data Availability).
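A two-sample Wilcoxon (rank-sum) comparison of the kind described above can be sketched in plain Python. The version below uses a normal approximation with no continuity or tie correction, so its p-values will not exactly match those of a statistics package. The similarity values are invented.

```python
from statistics import NormalDist

def rank_sum_test(x, y):
    """Two-sided rank-sum test using a normal approximation.
    Returns (U statistic for x, approximate p-value)."""
    combined = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # tied values share the mean rank
        for idx in range(i, j + 1):
            ranks[idx] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p

# Hypothetical percent similarities for alignments in each predicted class.
tetrasomic = [95.1, 96.3, 94.2, 97.0, 93.5, 95.8]
disomic = [85.2, 86.1, 84.9, 87.3, 83.8, 86.6]

u_stat, p_value = rank_sum_test(tetrasomic, disomic)
print(u_stat, p_value < 0.01)
```

With complete separation between the two toy groups, U takes its maximum value and the approximate p-value falls below the alpha of 0.01 used in the study.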

Data availability

Raw sequence data have been uploaded to SRA under BioProject PRJNA555579. File S1 contains detailed descriptions of all supplemental files. File S2 contains sampling information for cisco (C. artedi) families. File S3 contains the male linkage map for cisco (C. artedi). File S4 contains information for each marker on the female and male cisco (C. artedi) linkage maps. File S5 contains homologous chromosome arms determined by mapcomp. File S6 contains the probable metacentric chromosomes from the mapcomp analysis for coregonines. File S7 contains homeologous chromosome pairs for currently available haploid linkage maps. File S8 contains all the homeologous chromosome pairs for all available salmonid genomic resources. File S9 contains support for classifications from the k-nearest neighbor machine learning algorithm. Code used to generate the linkage map is available at https://github.com/DaniBlumstein/Cisco-Linkage-Map. Code used to collect Coregonus maps and run mapcomp is available at https://github.com/bensutherland/coregonus_mapcomp. Code used for classifications from the k-nearest neighbor machine learning algorithm is available at https://github.com/MacCampbell/residual-tetrasomy. Supplemental material available at figshare: https://doi.org/10.25387/g3.12588551.

Results

RADseq, SNP discovery, and data filtering

RADseq data were obtained from 746 cisco across seven families, with an average of 4.1 M reads per individual (range: 1.1 – 30.8 M reads per individual). Individuals that were genotyped at more than 30% of loci and loci that were genotyped in more than 75% of the total individuals were retained, resulting in a dataset of 676 individuals (n = 333 diploid offspring; 330 haploid offspring; and 13 parents) and 49,998 unique polymorphic loci (Supplementary file S2). A total of 22,020 unique loci were mapped in the female (Figure 1) and male linkage maps (Supplementary file S2) and 27,978 loci were unplaced. The female map included 20,292 loci distributed across 38 LGs (Table 1), the male map included 6,340 loci distributed across 40 LGs, and 4,612 loci were present on both maps (Supplementary file S3). A total of 40 chromosomes was expected from karyotyping of coregonine fishes from the Great Lakes (Phillips ), which matches the number of LGs mapped in males. However, male LGs 39 and 40 contained relatively few markers and may be fragments of other linkage groups rather than the two linkage groups that were not mapped in females. Eight LGs (i.e., Cart01 – Cart08) were identified as metacentric based on homology to two chromosome arms in other salmonids using mapcomp (see below). In the female map, metacentric LGs were on average 85.44 cM (57.56 – 101.35 cM) and contained an average of 731.5 loci (range: 592 – 856). Putative acrocentric LGs in the female map were on average 59.10 cM (50.97 – 64.53 cM) and contained an average of 484 loci (range: 292 – 582). The total length of the female map was 2,456.51 cM. The average lengths of metacentric LGs on the male map were 66.76 cM (51.62 – 87.14 cM) and they contained 212 loci on average (range: 159 – 278). Putative acrocentric LGs in the male map were on average 57.00 cM (40.54 – 83.66 cM) and contained 145 loci (range: 41 – 224). The total length of the male map was 2,357.97 cM.
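The marker densities reported in Table 1 (loci per cM) follow directly from the locus counts and map lengths. The short sketch below recomputes them for two linkage groups using values taken from Table 1. The function and variable names are illustrative.

```python
def marker_density(n_loci, length_cm):
    """Loci per centimorgan; 0.0 for an unmapped (zero-length) LG."""
    return n_loci / length_cm if length_cm > 0 else 0.0

# (female loci, female cM, male loci, male cM), values from Table 1.
table1_rows = {
    "Cart02": (764, 97.77, 159, 61.11),
    "Cart09": (501, 64.53, 92, 44.47),
}

for lg, (f_loci, f_cm, m_loci, m_cm) in table1_rows.items():
    female = marker_density(f_loci, f_cm)
    male = marker_density(m_loci, m_cm)
    print(f"{lg}: female {female:.2f} loci/cM, male {male:.2f} loci/cM")
```

The roughly threefold higher density on the female map reflects the much larger number of loci placed on it (20,292 vs. 6,340) over a similar total map length.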
We identified 3,383 putatively duplicated loci on the female linkage map, and of these, 2,671 loci mapped to one paralog and 709 loci mapped to both paralogs.
Figure 1

A) Female linkage map for cisco (Coregonus artedi) containing 20,292 loci. Each dot represents a locus, duplicated loci are blue and non-duplicated loci are gray. Lengths are in centimorgans (cM). Approximate location of centromeres for metacentric LGs are denoted in red. Metacentric LGs were identified through homologous relationships of chromosome arms with other Salmonids via mapcomp. B) Circos plot of cisco LGs highlighting 17 supported homeologous regions within the linkage map. Included in the 17 homeologous regions are six of the eight regions that are likely still residually tetrasomic across the Salmonids. Colors represent the number of markers supporting relationship, with darker colors representing higher marker numbers (maximum support = 86 markers) and theoretical links inferred via mapcomp.

Table 1

Linkage map results for the female and male cisco (Coregonus artedi) linkage maps. Duplicated and non-duplicated loci are from the female linkage map. Linkage group (LG) type is denoted with acrocentric (A) and metacentric (M).

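| C. art LG | Duplicated loci | Non-duplicated loci | Length, female (cM) | Length, male (cM) | Total loci, female | Total loci, male | Loci/cM, female | Loci/cM, male | LG type |
|---|---|---|---|---|---|---|---|---|---|
| Cart01 | 203 | 510 | 101.35 | 87.14 | 713 | 224 | 7.03 | 2.57 | M |
| Cart02 | 380 | 384 | 97.77 | 61.11 | 764 | 159 | 7.81 | 2.60 | M |
| Cart03 | 223 | 477 | 93.56 | 78.98 | 700 | 248 | 7.48 | 3.14 | M |
| Cart04 | 45 | 547 | 92.45 | 67.80 | 592 | 194 | 6.40 | 2.86 | M |
| Cart05 | 272 | 584 | 91.73 | 58.07 | 856 | 219 | 9.33 | 3.77 | M |
| Cart06 | 104 | 729 | 91.36 | 68.95 | 833 | 278 | 9.12 | 4.03 | M |
| Cart07 | 184 | 496 | 57.72 | 60.40 | 680 | 204 | 11.78 | 3.38 | M |
| Cart08 | 182 | 532 | 57.56 | 51.62 | 714 | 170 | 12.40 | 3.29 | M |
| Cart09 | 230 | 271 | 64.53 | 44.47 | 501 | 92 | 7.76 | 2.07 | A |
| Cart10 | 18 | 488 | 63.98 | 49.68 | 506 | 152 | 7.91 | 3.06 | A |
| Cart11 | 36 | 510 | 63.72 | 49.41 | 546 | 165 | 8.57 | 3.34 | A |
| Cart12 | 370 | 177 | 63.18 | 74.69 | 547 | 80 | 8.66 | 1.07 | A |
| Cart13 | 24 | 558 | 62.90 | 53.67 | 582 | 181 | 9.25 | 3.37 | A |
| Cart14 | 68 | 454 | 62.66 | 59.74 | 522 | 140 | 8.33 | 2.34 | A |
| Cart15 | 87 | 318 | 62.45 | 48.26 | 405 | 121 | 6.49 | 2.51 | A |
| Cart16 | 22 | 456 | 62.25 | 45.80 | 478 | 153 | 7.68 | 3.34 | A |
| Cart17 | 27 | 478 | 62.25 | 53.19 | 505 | 160 | 8.11 | 3.01 | A |
| Cart18 | 22 | 446 | 61.68 | 42.08 | 468 | 136 | 7.59 | 3.23 | A |
| Cart19 | 27 | 541 | 61.85 | 75.08 | 568 | 213 | 9.18 | 2.84 | A |
| Cart20 | 28 | 536 | 60.19 | 48.36 | 564 | 224 | 9.37 | 4.63 | A |
| Cart21 | 30 | 388 | 59.04 | 61.23 | 418 | 142 | 7.08 | 2.32 | A |
| Cart22 | 19 | 544 | 58.96 | 49.49 | 563 | 204 | 9.55 | 4.12 | A |
| Cart23 | 49 | 473 | 58.57 | 55.39 | 522 | 162 | 8.91 | 2.92 | A |
| Cart24 | 11 | 556 | 58.38 | 57.23 | 567 | 195 | 9.71 | 3.41 | A |
| Cart25 | 23 | 458 | 58.28 | 51.08 | 481 | 161 | 8.25 | 3.15 | A |
| Cart26 | 28 | 445 | 58.09 | 50.91 | 473 | 147 | 8.14 | 2.89 | A |
| Cart27 | 30 | 499 | 58.00 | 60.69 | 529 | 213 | 9.12 | 3.51 | A |
| Cart28 | 24 | 444 | 57.79 | 54.44 | 468 | 141 | 8.10 | 2.59 | A |
| Cart29 | 13 | 404 | 57.78 | 62.64 | 417 | 149 | 7.22 | 2.38 | A |
| Cart30 | 27 | 384 | 57.76 | 71.50 | 411 | 141 | 7.12 | 1.97 | A |
| Cart31 | 22 | 541 | 56.93 | 77.82 | 563 | 184 | 9.89 | 2.36 | A |
| Cart32 | 15 | 478 | 56.56 | 75.17 | 493 | 177 | 8.72 | 2.35 | A |
| Cart33 | 19 | 460 | 56.37 | 49.57 | 479 | 146 | 8.50 | 2.95 | A |
| Cart34 | 260 | 82 | 55.54 | 83.66 | 342 | 62 | 6.16 | 0.74 | A |
| Cart35 | 38 | 254 | 54.74 | 40.54 | 292 | 81 | 5.33 | 2.00 | A |
| Cart36 | 18 | 332 | 54.45 | 45.50 | 350 | 117 | 6.43 | 2.57 | A |
| Cart37 | 23 | 402 | 53.16 | 55.82 | 425 | 122 | 7.99 | 2.19 | A |
| Cart38 | 24 | 431 | 50.97 | 73.21 | 455 | 171 | 8.93 | 2.34 | A |
| Cart39 | 0 | 0 | 0.00 | 45.13 | 0 | 41 | 0.00 | 0.91 | A |
| Cart40 | 0 | 0 | 0.00 | 58.46 | 0 | 71 | 0.00 | 1.21 | A |
| Average | 80.63 | 426.68 | 61.41 | 58.95 | 507.30 | 158.50 | 7.89 | 2.73 | |
| Total | 3225 | 17067 | 2456.51 | 2357.98 | 20292 | 6340 | 315.42 | 109.34 | |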
The main focus of our comparative analysis was to define the homologous and homeologous relationships among the linkage groups available for the coregonines, specifically in cisco (current study), lake whitefish (Gagnaire ), and European whitefish (De-Kayne and Feulner 2018), and bring these species into the context of the broader chromosomal correspondence within the lineage by identifying the homologous chromosome arms in brook trout, Atlantic salmon, and Chinook salmon, as well as the non-duplicated northern pike (Table 2, Supplementary file S5). To facilitate these comparisons, we applied the same chromosome identification system as used by Sutherland , here termed the “protokaryotype identifier (PK)” system. For consistency, we maintain the 0.1 and 0.2 definitions for each ancestral chromosome pair (PK) as used in the prior work (Sutherland ).
Table 2

MapComp results documenting homologous chromosomes for the three coregonines; cisco (C. art), lake whitefish (C. clu, Gagnaire ), and European whitefish (C. alb, De-Kayne and Feulner 2018), integrated with Atlantic salmon (S. sal, Lien ), brook trout (S. fon, Sutherland ), and Chinook salmon (O. tsh, Brieuc ). Homologous chromosomes for all species are named using the corresponding Northern Pike (E. luc) linkage group as a reference (Rondeau ), as per Sutherland , here termed protokaryotype ID (PK). Letters after linkage group (LG) names indicate the first (a) or second (b) arm of the LG, ^ indicates weak evidence and * indicates uncertainty between homeologs from MapComp analysis

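| E. luc (PK) | C. art | C. alb | C. clu | S. fon | S. sal | O. tsh |
| --- | --- | --- | --- | --- | --- | --- |
| 01.1 | Cart23 | Calb16 | Cclu28 | Sf25 | Ssa20b | Ots13q |
| 01.2 | Cart14 | Calb33 | Cclu35 | Sf38 | Ssa09c | Ots14q |
| 02.1 | Cart01a | Calb02b^* | Cclu04a | Sf06a | Ssa26 | Ots04q |
| 02.2 | Cart12 | Calb02b^* | Cclu04a^* | Sf28 | Ssa11a | Ots12q |
| 03.1 | Cart25 | Calb19 | Cclu25 | Sf22 | Ssa14a | Ots10q |
| 03.2 | Cart26 | Calb22 | Cclu26 | Sf11 | Ssa03a | Ots28 |
| 04.1 | Cart30 | Calb29 | Cclu16 | Sf33 | Ssa09b | Ots08q |
| 04.2 | Cart21 | Calb30 | Cclu29 | Sf07b | Ssa05a | Ots21 |
| 05.1 | Cart06b | Calb01a | Cclu05a | Sf01a | Ssa19b | Ots24 |
| 05.2 | Cart18 | Calb35 or Calb40 | Cclu15 | Sf27 | Ssa28 | Ots25 |
| 06.1 | Cart06a | Calb01b | Cclu05b | Sf01b | Ssa01b | Ots01q |
| 06.2 | Cart15 | Calb27 | Cclu05b^* | Sf36 | Ssa18a | Ots06q |
| 07.1 | Cart20 | Calb06 | Cclu13 | Sf08b | Ssa13b | Ots09p |
| 07.2 | Cart19 | Calb07 | Cclu08 | Sf09 | Ssa04b | Ots30 |
| 08.1 | Cart27 | Calb17 | Cclu36 | Sf04a | Ssa23 | Ots01p |
| 08.2 | Cart03a | Calb08 | Cclu06a | Sf17 | Ssa10a | Ots05q |
| 09.1 | Cart03b* | not identified | Cclu06b | Sf42 | Ssa02b | Ots32 |
| 09.2 | Cart34* | not identified | not identified | Sf03b | Ssa12a | Ots02q |
| 10.1 | Cart08a | Calb20b | Cclu10 | Sf23 | Ssa27 | Ots13p |
| 10.2 | Cart04b | Calb09a | Cclu24a | Sf34 | Ssa14b | Ots31 |
| 11.1 | Cart05a | Calb13b | Cclu18 | Sf14 | Ssa06a | Ots27 |
| 11.2 | Cart09 | Calb34 | not identified | Sf08a | Ssa03b | Ots09q |
| 12.1 | Cart33 | Calb14 | Cclu27 | Sf18 | Ssa13a | Ots22 |
| 12.2 | Cart16 | Calb28 | Cclu14 | Sf30 | Ssa15b | Ots16q |
| 13.1 | Cart17 | Calb25 | Cclu34 | Sf06b | Ssa24 | Ots04p |
| 13.2 | Cart32 | Calb31 | Cclu37 | Sf40 | Ssa20a | Ots12p |
| 14.1 | Cart01b | Calb02a | Cclu04b | Sf13 | Ssa01c | Ots20 |
| 14.2 | Cart38 | Calb11 | Cclu33 | Sf10 | Ssa11b | Ots33 |
| 15.1 | Cart10 | Calb18 | Cclu31 | Sf35 | Ssa09a | Ots08p |
| 15.2 | Cart31 | Calb10 | Cclu22 | Sf12 | Ssa01a | Ots11q |
| 16.1 | Cart07a | Calb03b | Cclu02b or Cclu03 | Sf26 | Ssa21 | Ots26 |
| 16.2 | Cart28 | Calb21 | Cclu32 | Sf24 | Ssa25 | Ots03q |
| 17.1 | Cart24 | Calb05 | Cclu38 | Sf03a | Ssa12b | Ots02p |
| 17.2 | Cart22 | Calb12 | Cclu21 | Sf21 | Ssa22 | Ots07q |
| 18.1 | Cart29 | Calb24 | Cclu40 | Sf19 | Ssa15a | Ots05p |
| 18.2 | Cart37 | Calb23 | Cclu17 | Sf31 | Ssa06b | Ots18 |
| 19.1 | Cart13 | Calb04 | Cclu30 | Sf15 | Ssa10b | Ots19 |
| 19.2 | Cart11 | Calb15b | Cclu11 | Sf20 | Ssa16a | Ots06p |
| 20.1 | Cart08b^ | Calb36 | Cclu01a | Sf07a | Ssa05b | Ots23 |
| 20.2 | Cart02b | Calb20a | not identified | Sf29 | Ssa02a | Ots03p |
| 21.1 | Cart05b | Calb13a | Cclu12 | Sf05b | Ssa29 | Ots29 |
| 21.2 | Cart36 | Calb26 | Cclu39 | Sf16 | Ssa19a | Ots16p |
| 22.1 | Cart02a | Calb39^ | not identified | Sf39 | Ssa17a | Ots07p |
| 22.2 | not identified | Calb15a | Cclu19^ | Sf05a | Ssa16b | Ots14p |
| 23.1 | Cart07b^* | Calb03a | Cclu02a | Sf02b | Ssa07b | Ots15p |
| 23.2 | Cart07b^* | missing | Cclu01b^ | Sf37 | Ssa17b | Ots17 |
| 24.1 | Cart04a | Calb09b | Cclu24b | Sf02a | Ssa07a | Ots15q |
| 24.2 | Cart35 | Calb32 | Cclu23 | Sf32 | Ssa18b | Ots10p |
| 25.1 | not identified | not identified | Cclu09^ | Sf04b | Ssa04a | Ots34 |
| 25.2 | not identified | not identified | not identified | Sf41 | Ssa08a | Ots11p |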
In brief, PKs correspond to hypothetical ancestral salmonid chromosomes, which are thought to be similar to those of the salmonid WGD sister outgroup, the Esociformes (Ishiguro , López ), and are ordered as PK 01-25. Protokaryotypes correspond 1:1 with the northern pike genome but have two descendant homeologous regions within salmonid genomes. For example, PK 01 corresponds to northern pike chromosome 01 and was an ancestral pre-duplication salmonid chromosome that gave rise to homeologous Atlantic salmon chromosomes Ssa09c (PK 01.2) and Ssa20b (PK 01.1) and to homeologous rainbow trout chromosomes Omy27 (PK 01.1) and Omy24 (PK 01.2) (Supplementary file S8; Sutherland ). The previously hypothesized “magic eight” PKs from linkage mapping studies are PK 02, 06, 09, 11, 20, 22, 23, and 25. The PKs defined as LORes by Robertson and those that displayed residual tetraploidy in previous genome-based studies (Lien ; Campbell ) are the same, with the exception of PK 06, which those studies did not identify as residually tetraploid. Most PKs were identifiable in the mapcomp analysis conducted here, with some notable exceptions for each coregonine species. In cisco, chromosome arms PK 22.2, 25.1, and 25.2 were unidentified; two of these arms (PK 25.1 and 25.2) were also unidentified in the European whitefish linkage map (De-Kayne and Feulner 2018). Additionally, it was difficult to determine correspondences for PK 09 and 23. In European whitefish, five chromosome arms were unidentifiable (PK 09.1, 09.2, 23.2, 25.1, and 25.2), and there were homeology ambiguities for PK 02, as well as homology ambiguities for PK 05.2 (Table 2). In lake whitefish, five arms were unidentifiable (i.e., PK 09.2, 11.2, 20.2, 22.1, and 25.2), and there were homeology ambiguities for PK 02.2 and 06.2. In multiple species, arms for which homologous relationships were difficult to determine often had a high proportion of duplicated loci, which presumably makes it challenging to distinguish homologs from homeologs.
Nonetheless, most homologs and homeologs (42/50; 84%) were identified in all three coregonine species. This information was then leveraged to characterize the fusion/fission history within the Coregoninae lineage using the methods outlined in Sutherland . The fusion/fission analysis indicated far fewer species-specific fusions than identified for subfamily Salmoninae in Sutherland , with most fusions within subfamily Coregoninae occurring prior to the divergence of the coregonines (Figure 2). This difference in species-specific fusions may also be related to the generally lower number of fusions in coregonines relative to Salmo and Oncorhynchus (Supplementary file S6). In the coregonines, however, the majority of fusions were observed in more than one species, which was not the case in most other species previously characterized. Two strongly supported fusions were observed in all three coregonine species: PK 05.1-06.1 and PK 10.2-24.1. PK 11.1-21.1 was fused in both cisco and European whitefish and presumably underwent a fission in lake whitefish (Figure 2). However, evidence for the correspondence of PK 11.1 and 21.1 in lake whitefish was not highly conclusive, and a more recent analysis of lake whitefish based on a regenerated linkage map suggests that PK 11.1-21.1 may not have undergone a fission in this species and is indeed still fused (Claire Mérot, pers. comm.). Therefore, more work is needed to determine whether this fusion is conserved in all three species. The full characterization of fissions will require the resolution of the ambiguous arms that are considered as probable in the current analysis, and this may be further clarified in future work.
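As a rough illustration of how fused arms can be read off Table 2, the sketch below groups protokaryotype arms that map to the same cisco linkage group. It is a simplification and not the fusion/fission method of Sutherland: the function names `base_lg` and `fused_arms` are hypothetical, only an excerpt of the cisco column is included, and the label parsing assumes the `CartNNa`/`CartNNb` convention with `^` and `*` as uncertainty markers.

```python
import re
from collections import defaultdict

# Excerpt of the cisco (C. art) column of Table 2: PK arm -> linkage group arm.
CISCO_ARMS = {
    "05.1": "Cart06b",
    "06.1": "Cart06a",
    "10.2": "Cart04b",
    "24.1": "Cart04a",
    "08.2": "Cart03a",
    "09.1": "Cart03b*",
    "01.1": "Cart23",
    "03.1": "Cart25",
}


def base_lg(label):
    """Drop uncertainty markers (^, *) and a trailing arm letter (a, b, or c)."""
    cleaned = label.replace("^", "").replace("*", "")
    return re.sub(r"[abc]$", "", cleaned)


def fused_arms(pk_to_lg):
    """Return {linkage group: sorted PK arms} for groups carrying more than one arm."""
    groups = defaultdict(list)
    for pk, lg in pk_to_lg.items():
        groups[base_lg(lg)].append(pk)
    return {lg: sorted(pks) for lg, pks in groups.items() if len(pks) > 1}


if __name__ == "__main__":
    for lg, pks in sorted(fused_arms(CISCO_ARMS).items()):
        print(lg, "-".join(pks))
```

Linkage groups with a single arm (here Cart23 and Cart25) are left out, so the output lists only candidate fusions such as PK 05.1-06.1 on Cart06.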
Figure 2

Fusions and fissions in the Coregoninae and Salmoninae lineages. This is an extension of Figure 4 from Sutherland . White boxes display the fusion events, where the homologous chromosomes for all species are named according to the protokaryotype ID. Bold and underlined chromosome numbers are the homeologous pairs that exhibit residual tetraploidy (i.e., “magic eight”); * indicates uncertainty in one species, and ** indicates uncertainty in two species (i.e., C. artedi is ambiguous for homeolog 09.1 or 09.2 while C. lavaretus is missing 09.1 and 09.2). Above the species names are conserved fusions, whereas below are the species-specific fusions. The phylogeny is adapted from Crête-Lafrenière . Branch lengths do not represent phylogenetic distance, only relative phylogenetic position. Footnote 1: Arms 11.1-21.1 were fused in both C. artedi and C. lavaretus, but likely underwent fission in lake whitefish (but see Results).

Figure 4

Distribution of protokaryotype (PK) similarity in aligned sections between the homeolog pairs across salmonids based on genome assemblies. For each species with a genome sequence, the percent similarity (y-axis) of the 25 PK pairs is shown as box plots. PK pairs are ranked from highest to lowest median similarity for each species (x-axis), with the average similarity of protokaryotypes presented as a dashed line. The classification of protokaryotypes into putatively tetrasomic and disomic pairs by the machine learning approach described in the main text is shown by coloring the boxplots purple (putatively tetrasomic) or yellow (putatively disomic). The number of alignments used in computing similarity is presented at the top of each bar. Those protokaryotypes that did not receive the highest observed voting proportion for the assigned class are indicated with an asterisk (*). High variance in some PKs (e.g., PK 11 in S. alp) may be due to methodological limitations that caused additional non-homeologous chromosome arms to be included in the comparisons (see Discussion). Species abbreviations are grayling (T. thy), Atlantic salmon (S. sal), Arctic char (S. alp), rainbow trout (O. myk), and Chinook salmon (O. tsh).

In summary, five fusions were likely shared among all three species, and one was shared between cisco and European whitefish, and possibly all three species (PK 11.1-21.1). Cisco had two species-specific fusions (PK 10.1-20.1 and 22.1-20.2), bringing the total count of observed fusions to eight. European whitefish had one species-specific fusion (PK 20.2-10.1), bringing the total count of observed fusions to seven. Lake whitefish also had one species-specific fusion (PK 20.1-23.2), bringing the total count to six. Interestingly, the PK 09.2-17.1 fusion that was originally proposed to be shared among all known salmonids (Sutherland ) was found not to be fused in any of the species here, suggesting either that this fusion occurred after the divergence of Coregonus from the ancestor of the rest of the salmonids, or that a fission occurred at the base of the coregonines (Figure 2). The observation that this fusion was not present in grayling (Varadharajan et al. 2018) suggests the former.

Homeolog identification, similarity, and inheritance mode

A second major goal of this study was to compare homeologous relationships and modes of inheritance within and among species. We identified 17 of the 25 homeologous chromosome pairs (PK) in cisco using the markers that could be mapped to both homeologs in the linkage map, and each homeologous pair shared between one and 86 duplicated loci (Figure 3, Supplementary file S7). Of the 17 homeolog pairs, six (PK 02, 06, 09, 11, 20, and 23) had many loci (42-86) supporting homeology; these are six of the “magic eight” discussed above. The other 11 had few markers supporting homeology (i.e., 1-6) and are not members of the “magic eight”. The remaining two members of the “magic eight” (PK 22 and 25) could not be evaluated in cisco because one or both of their arms were not identifiable. All of the previously constructed linkage maps for salmonids that included duplicated regions had a large number of markers supporting homeology for the “magic eight”, with the exception of pink salmon, where seven of the eight PKs had high support (34-68 loci) but one pair (PK 25) displayed substantially lower support (nine loci) (Tarpey ) (Figure 3, Supplementary file S7).
Figure 3

Ranking of homeologous chromosome pairs based on putative residual tetrasomic inheritance as measured by the number of markers shared among homeologs for linkage maps or percent sequence similarity for genomes. A lower rank represents more marker pairs supporting a homeolog and/or a higher sequence similarity. Chromosomes for all species are named according to the protokaryotype ID (PK). PKs are ordered in the figure by averaging the ranks across all species and then sorting the averages from smallest to largest (i.e., ordered from highest support for residual tetrasomy to lowest). Gray indicates that no duplicated loci could be mapped to both homeologs. Species abbreviations are grayling (T. thy), Atlantic salmon (S. sal), Arctic char (S. alp), rainbow trout (O. myk), Chinook salmon (O. tsh), cisco (C. art), and coho salmon (O. kis).
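The ordering rule in this caption (average the per-species ranks, then sort ascending) can be sketched as follows. The ranks and species keys are hypothetical placeholders rather than values from Supplementary file S7, and skipping missing (gray) cells when averaging is an assumption that the caption does not confirm.

```python
from statistics import mean

# Hypothetical ranks (1 = strongest support for residual tetrasomy).
# None stands for a gray cell: no duplicated loci mapped to both homeologs.
RANKS = {
    "PK 24": {"species_A": 3, "species_B": 3, "species_C": 2},
    "PK 02": {"species_A": 1, "species_B": 2, "species_C": 1},
    "PK 09": {"species_A": 2, "species_B": 1, "species_C": None},
}


def order_by_mean_rank(ranks):
    """Sort PKs by their mean rank across species, smallest mean first."""
    averaged = {}
    for pk, per_species in ranks.items():
        observed = [r for r in per_species.values() if r is not None]
        if observed:  # assumption: PKs with no observed rank are dropped
            averaged[pk] = mean(observed)
    return sorted(averaged, key=averaged.get)


if __name__ == "__main__":
    print(order_by_mean_rank(RANKS))
```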

To better understand the genetic similarity between homeologs and infer inheritance mechanisms (i.e., residual tetrasomy or disomy), all 25 known homeologous relationships were compared in reference genomes for grayling, Atlantic salmon, Arctic char, rainbow trout, and Chinook salmon (Figures 3 and 4, Supplementary file S8). Using the machine learning algorithm (see Methods), the optimal number of nearest neighbors (k) for each species was identified as five. Those five nearest neighbors from the training sets voted on the assignment of a particular PK to either the putatively tetrasomic or the putatively disomic class (Figure 4), and the proportion of votes supporting each assignment is reported in Supplementary file S9. The highest observed vote proportion for assignment to a class is 4 of 5, because the training set was limited to four members of each class while five nearest neighbors were used for voting.
For Atlantic salmon and all Oncorhynchus spp. (i.e., rainbow trout and Chinook salmon), the same eight PKs (i.e., PK 01, 02, 09, 11, 20, 22, 23, 25) were classified as tetrasomic using the machine learning approach. This list of PKs includes all of those defined as LORes by Robertson , and one additional (i.e., PK 01), but does not include PK 06, which is considered to be part of the “magic eight” using linkage map evidence. Arctic char showed evidence for residual tetrasomy in seven of these eight PKs, with the exception of PK 11 (see below for details regarding this discrepancy due to other chromosome arms in this fusion). Grayling also shared seven of the eight residually tetraploid homeolog pairs, with the exception of PK 01.
Most PKs received the highest possible vote proportion for their classification (0.8); however, PK 01 in Atlantic salmon and rainbow trout received a lower vote proportion (0.6) (Figure 4, Supplementary file S9), suggesting reduced support (i.e., lower sequence similarity) for this homeologous pair being tetrasomic. Additionally, PK 19 in grayling did not have the highest vote proportion; it was assigned to the disomic class but had the highest sequence similarity in that class (Figure 4, Supplementary file S9). Sequence similarity was significantly higher for the tetrasomic PKs across all species (P < 0.0001). Although the group of tetrasomic PKs was largely conserved across species, there was substantial variation among species in the relative sequence similarity of these homeolog pairs (i.e., their order from highest to lowest similarity). PK 01 consistently displayed the lowest sequence similarity of all the PKs in all five species where it was classified as tetrasomic and did not always receive the highest observed vote proportion (see above). However, a number of other homeolog pairs displayed highly variable sequence similarity rankings across species (Figure 4). For example, PK 09 had the highest sequence similarity in the grayling genome, the sixth highest in the rainbow trout genome, and the fourth or fifth highest in the other genomes. This variation suggests that the frequency of tetravalent meiosis for each PK may differ across species and that the process of diploidization has occurred in a species-specific manner post WGD, as suggested in the mechanisms proposed by Robertson .
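The ceiling of 0.8 on the vote proportion follows directly from the set-up described above: with four training examples per class and five voting neighbors, at most four votes can agree. The sketch below shows this with a plain k-nearest-neighbor vote on a single feature. The similarity values are hypothetical, and the use of one percent-similarity feature and absolute distance is an assumption; the actual features and implementation are described in the Methods.

```python
def knn_vote(query, training, k=5):
    """Return (winning class, vote proportion) from the k nearest training points."""
    nearest = sorted(training, key=lambda item: abs(item[0] - query))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    winner = max(votes, key=votes.get)
    return winner, votes[winner] / k


# Hypothetical percent similarities: four training examples per class.
TRAINING = [
    (95.0, "tetrasomic"), (94.5, "tetrasomic"),
    (94.0, "tetrasomic"), (93.5, "tetrasomic"),
    (88.0, "disomic"), (87.5, "disomic"),
    (87.0, "disomic"), (86.5, "disomic"),
]

if __name__ == "__main__":
    print(knn_vote(94.8, TRAINING))  # clearly within the tetrasomic range
    print(knn_vote(91.0, TRAINING))  # intermediate similarity, weaker vote
```

An intermediate query draws neighbors from both classes and so receives a lower proportion, which mirrors the reduced support reported for PK 01.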

Discussion

The amount of genomic resources available for Salmonidae has increased drastically over the last decade. However, many previous studies investigating genome evolution in salmonids focus on one or a few species, with a limited number of studies considering broader subsets of available taxa to understand patterns of genome evolution across the Salmonidae family (but see Sutherland ; Robertson ). Here, we utilize genomic resources along with a newly generated high-density linkage map for cisco to compare patterns of homology, fusion/fission events, homeology, and residual tetrasomy across species. The cisco linkage map incorporates duplicated regions and contains 20,292 loci, making it denser than most salmonid RAD-based haploid linkage maps (typically built from 3,000 to 7,000 loci). The higher-density linkage map was achieved by using updated RAD library preparation and linkage mapping algorithms (Rastas 2017), in addition to including more families and more individuals per family. Higher marker density allowed the identification of orthologous relationships between coregonines and other salmonids as well as the identification of homeologous chromosomes in cisco. We also demonstrate the use of the protokaryotype ID (PK), defined here but first used in Sutherland , for comparative analyses in salmonids in order to unify and facilitate comparative approaches in salmonid linkage maps and chromosome-level assemblies. Comparisons across Salmonidae revealed that patterns of rediploidization are relatively similar across genera and loosely correspond with phylogeny. However, we did identify substantial variation in sequence similarity between homeologs, both within species across homeolog pairs and among species, suggesting that frequently used binary classifications such as AORe/LORe and “magic eight” may be oversimplified.

Protokaryotype identifiers to facilitate comparative genomics in salmonids and other fishes

Comparative genomics within Salmonidae is important for the interpretation of the effects of rediploidization after WGD on genome evolution (e.g., Berthelot ; Kodama ; Lien ). However, chromosomes in all species have been named differently, making it difficult to directly compare studies without complicated lookup tables or alignments to confirm homology (e.g., Brieuc ; Kodama ). Recently developed methods for connecting linkage maps through reference genomes (Sutherland ) facilitated description of homologous relationships for all linkage group arms across salmonids (with a few exceptions in coregonines). Additionally, Sutherland and Savilammi have explored the utility of naming chromosomes based on homology to northern pike. This naming system has the potential to facilitate comparative genomics in salmonids by creating a “Mueller element”-like system (reviewed in Schaeffer 2018), where each chromosome arm has a universal identifier. However, there also remains value in species-specific identifiers; for example, Cart03 is the third named linkage group in the Cisco linkage map (Table 2). By comparison, Cart03 named via the PK system could be Cart03 (PK 08.2-09.1) or Cart03PK08.2-09.1 as Cart03 represents the fusion of two ancestral salmonid chromosome arms 08.2 and 09.1 (Figure 1, Table 2). While Sutherland named salmonid chromosomes based on ancestral northern pike chromosomes, the utility of the system was not yet fully explored or discussed. Here, we demonstrate the utility of this system and advocate its use in future studies. For example, the PK system can facilitate comparisons of chromosomes containing genes for adaptive potential in sockeye salmon (So13PK18.2, TULP4, Larson ), for run timing in Chinook (Ots28PK03.2, GREB1L, Prince ), and for age-at-maturity in Atlantic salmon (Ssa25PK16.2, VGLL3, Barson ). 
While there may be some sections of the PK that are not always retained (e.g., some transposition of parts of chromosomes), as long as the majority of the chromosome is preserved, then the PK system enables general comparisons. The PK system will facilitate quick and accurate comparisons across taxa, adding significant value to the myriad studies searching for adaptively important genes and regions in salmonids by leveraging comparative approaches. This system was previously applied by Sutherland to compare sex chromosomes across the species by comparing chromosomes containing the transposing salmonid sex determining gene (sdY, Yano ). This comparison demonstrated that some chromosome arms more frequently contain or are fused to the chromosome that contains the sex determining gene than would be expected by chance or explainable by phylogenetic conservation (i.e., PK 01.2 (AC04q), PK 03.1 (Cclu25, Co30, So09), PK 19.1 (So09.5, AC04q.1), PK 15.1 (AC04q.2, BC35), Sutherland ). Even more intriguing is that the northern pike naming was based on the three-spined stickleback Gasterosteus aculeatus (Rondeau ), and PK 19 is the sex determining chromosome in three-spined stickleback (Peichel ). As observed above, this chromosome is often fused with sex chromosomes in salmon (Sutherland ). By comparison, using the naming system, it is easy to observe that LG24 in northern pike (i.e., PK 24 in salmonids), recently identified to hold the sex determining gene in northern pike (Pan ), does not appear to contain the sex-determining locus in any tested salmonid. Deriving this information would be more difficult without the PK system and would require extensive cross-referencing. The example of comparing sex chromosomes from the PK system indicates a broad phylogenetic utility of this nomenclature as it applies to three-spined stickleback (a neoteleost) as well as Esociformes and Salmoniformes. 
The protokaryotypes as defined here may be able to represent the ancestral karyotype of the five major euteleost lineages and be applicable in comparative genomic studies among and within (1) Esociformes and Salmoniformes, (2) Stomiatii, (3) Argentiniformes, (4) Galaxiiformes, and (5) Neoteleostei (Betancur-R et al. 2013). Exploration of the PK system as defined here and its applicability across euteleosts should be conducted to determine the suitability of the PK system for comparative genomics in the Euteleostei.
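To make the lookup role of the PK system concrete, the following sketch uses two rows copied from Table 2 to find the protokaryotype arm of a species-specific chromosome name, list its homologs in the other species, and build a combined label in the `Ssa25PK16.2` style used above. The function names (`pk_of`, `homologs`, `pk_label`) are hypothetical, and exact string matching on chromosome names is a simplifying assumption.

```python
SPECIES = ("C. art", "C. alb", "C. clu", "S. fon", "S. sal", "O. tsh")

# Two rows copied from Table 2: PK arm -> chromosome name in each species.
TABLE2 = {
    "03.2": ("Cart26", "Calb22", "Cclu26", "Sf11", "Ssa03a", "Ots28"),
    "16.2": ("Cart28", "Calb21", "Cclu32", "Sf24", "Ssa25", "Ots03q"),
}


def pk_of(chromosome):
    """Return the PK arm whose row contains this chromosome name, or None."""
    for pk, names in TABLE2.items():
        if chromosome in names:
            return pk
    return None


def homologs(chromosome):
    """Return {species: chromosome} for the PK arm matching this chromosome."""
    pk = pk_of(chromosome)
    return dict(zip(SPECIES, TABLE2[pk])) if pk else {}


def pk_label(chromosome):
    """Append the PK arm to a species-specific name, e.g. Ssa25 -> Ssa25PK16.2."""
    pk = pk_of(chromosome)
    return f"{chromosome}PK{pk}" if pk else chromosome


if __name__ == "__main__":
    print(pk_label("Ots28"), homologs("Ots28"))
```

Extending `TABLE2` to all 50 arms would replace the cross-referencing between species-specific naming schemes with a single dictionary lookup.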

Homology and fission/fusion history in coregonines

Comparisons using linkage maps for three coregonine species (i.e., cisco, lake whitefish, and European whitefish) allowed us to assess homology and variation in karyotypes across the genus. Within lake whitefish, aneuploidy has been documented in diverged populations and historical contingency (Dion-Côté , 2017). Our results show that ambiguity in homologous relationships remained for at least five chromosome arms in all three coregonines. This degree of uncertainty was much higher than documented in Salmo, Oncorhynchus, and Salvelinus by Sutherland , where there were only two ambiguities across these groups. Coregonines appear to have a number of relatively small acrocentric chromosomes (Phillips and Rab 2001), some of which contain a high degree of duplicated loci, making the construction of linkage maps more difficult than for other salmonids (Gagnaire ; De-Kayne and Feulner 2018). For example, PK 25 has never been successfully mapped in coregonines, likely because it is small, submetacentric or acrocentric, and contains many duplicates. Various approaches were attempted to recover the missing PK in coregonine linkage maps, including using unassigned markers to form LGs, using only non-duplicated loci from the female cisco linkage map to form LGs, and aligning unassigned sequences to reference genomes. Markers either formed very large LGs with many gaps, remained unassigned, or aligned to unplaced scaffolds on reference genomes. In other salmonids, where PK 25 is part of a larger and/or metacentric chromosome, mapping is expected to be easier as there are many disomically inherited markers on the chromosome. Interestingly, the fact that PK 25 is likely residually tetrasomic in cisco, even though it is likely an acrocentric or submetacentric chromosome, indicates that metacentric chromosomes may not be required for homeologous recombination, as previously suggested in Lien .
This potentially contradicts previous theory which suggests that homeologous recombination requires at least one chromosome arm to be metacentric (Kodama ), but requires further testing given uncertainties regarding PK25. Additionally, PK 25.2 in grayling is a submetacentric chromosome and also displays signals of residual tetrasomy (Savilammi ), potentially providing further evidence that a small secondary arm may be sufficient to facilitate tetrasomic meiosis. The fusion history in coregonines differs substantially from many other members of the salmonid family. Members of the Coregonus, Salvelinus, and Thymallus genera possess the “A karyotype,” with a diploid chromosome number (2N) ∼80 and many acrocentric chromosomes, whereas Oncorhynchus and Salmo, possess the “B karyotype,” with 2N ∼60 and many metacentric chromosomes (Phillips and Rab 2001). Given that these both come from an ancestral type of n = 50 chromosome arms, species with the “A karyotype” have undergone fewer fusions than lineages with the “B karyotype”. Interestingly, it appears that “A karyotype” species also generally contain a lower proportion of species-specific fusions compared to “B karyotype” species, suggesting that the reduction in chromosome number and the higher frequency of metacentric chromosomes characteristic of the “B karyotype”, comes from species-specific fusions. Sutherland investigated fusion history within many species from the Oncorhynchus genus and found that most species had many species-specific fusions (e.g., 17 species-specific fusions in pink salmon). However, Sutherland only investigated one species from the Coregonus and Salvelinus genera as this was all that was available at the time of publication, and no species from Thymallus. Our current study is the first to investigate fusion history across multiple coregonines and illustrates that most fusions are shared among species in the Coregonus genus, contrasting the pattern observed in Oncorhynchus spp. (Sutherland ). 
The functional effect of differing fusion histories is yet to be determined, and remains an important question differentiating species within the Coregonus, Salvelinus, and Thymallus genera from other salmonids. Further information from genome sequencing projects, for example the European whitefish genome (De-Kayne ) should facilitate important future studies contrasting genomic processes and structure in species with differing fusion histories.
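The difference in fusion number between the two karyotypes follows from simple arithmetic on the figures quoted above (an ancestral n = 50 chromosome arms, 2N ∼80 for the “A karyotype” and 2N ∼60 for the “B karyotype”). The sketch below assumes that every net fusion joins two chromosomes and reduces the haploid count by one, and it ignores fissions and other rearrangements, so the results are rough expectations rather than observed counts. The function name `net_fusions` is hypothetical.

```python
ANCESTRAL_ARMS = 50  # haploid number of chromosome arms after the WGD


def net_fusions(diploid_number, ancestral_arms=ANCESTRAL_ARMS):
    """Net fusions needed to go from the ancestral arm count to the haploid count."""
    haploid_number = diploid_number // 2
    return ancestral_arms - haploid_number


if __name__ == "__main__":
    print("A karyotype (2N ~ 80):", net_fusions(80))
    print("B karyotype (2N ~ 60):", net_fusions(60))
```

Under these assumptions an “A karyotype” species carries roughly 10 net fusions and a “B karyotype” species roughly 20.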

Patterns of homeology and residual tetrasomy across salmonids

Although patterns of residual tetrasomy were generally conserved, variation within and among species was observed when examining results from linkage maps vs. reference genomes. Sequence similarity analyses using reference genomes suggested that all species showed evidence for residual tetrasomic inheritance in seven homeologous pairs (PK 02, 09, 11, 20, 22, 23, and 25) with the exception of PK11 in Arctic char (see below). Using linkage maps, these same seven homeologous pairs have been found to be tetrasomic in Oncorhynchus (Kodama ; McKinney ), Salvelinus (Sutherland ; Nugent ), Salmo (Robertson ), and likely Coregonus (results reported herein), strongly suggesting that tetravalent meioses can and do form between these homeologs in all investigated species to date. However, evidence for residual tetrasomy differed between linkage map and genome methods for multiple PKs, most notably PK 06, which was classified as tetrasomic in linkage mapping studies but not in genome analyses, and PK 01 which was classified as tetrasomic in genome analyses but not linkage maps. It is likely that some of these differences are the result of methodological limitations of the current approach and point to future analysis approaches that may be able to improve upon the framework presented here. This is further described below. The observation that PK11 did not display high sequence similarity in Arctic char might suggest a difference in diploidization rates in Salvelinus compared to other salmonids for this homeologous pair, but it is more likely that methodological limitations prevented us from detecting residual tetrasomy, as a linkage map study in Arctic char found a high number of duplicated markers on this PK (Nugent ). The percentage similarity analysis applied in the present study uses complete chromosome alignments and requires post-filtering to remove non-homeologous alignments. 
This method appears to be robust when chromosome arms are well defined, but PK 11 in Arctic char appears to be composed of four chromosome arms that have come together in a series of species-specific fusions (inferred from Christensen ). Since arm boundaries were not well defined, alignments in this PK produced a wide interquartile range, suggesting that, while some regions of the PK are likely undergoing residual tetrasomy, the alignments may have masked these regions by integrating over multiple chromosome arms. This would be particularly problematic if the chromosomes being compared both contained non-target chromosomes that were homeologous. To improve upon the method applied here, the breaks between fused chromosome arms could be better defined, which could prevent such ambiguities or noise in the calculated sequence similarity. We therefore conclude that PK 11 is likely tetrasomic in Arctic char, but that we were unable to classify it as such due to methodological limitations. The sequence similarity method applied here is generally robust, but the fusion history of the species being analyzed needs to be considered to avoid unexpected and erroneous similarity values. Ideally, only the section containing the ancestral chromosome of interest would be compared between the homeologs. This is an avenue of method development that will be valuable for future work. In contrast, the finding that PK 06 is not tetrasomic does not appear to be due to methodological limitations of our genome analysis but may be due to differences in estimating the extent of residual tetraploidy between linkage mapping and genome assembly approaches. Linkage mapping in Oncorhynchus and Salvelinus consistently finds support for tetrasomic inheritance at PK 06 (Larson ; Nugent ), but the genome analysis conducted here and the one conducted for rainbow trout (Campbell ) found that this PK displayed intermediate sequence similarity consistent with disomic homeologs.
One of the ways the two approaches differ is the length of the sequence used during each analysis. The genome analysis conducted here calculated similarity by using alignments of at least 1,000 bp, whereas linkage maps compare alleles within ∼100-150 bp RADtags. The short sequences analyzed by software such as Stacks (Rochette and Catchen 2017; Rochette ) make it possible to collapse sequences into a single locus that can be mapped at both paralogs, even when sequence divergence in a given region is relatively large. This makes linkage maps a less conservative characterization method for determining residual tetrasomy. In addition, many genome assemblers applied to salmonid genomes (e.g., Chin ; Koren ; Ruan and Li 2019) are not optimized for paralogous regions in polyploid genomes. This could be especially problematic for genomes that combine both disomic and tetrasomic regions, such as in salmonids. The end result is that duplicated regions may be detected as single copies as a result of sequence collapse during the assembly process (Alkan ; Varadharajan ). If sequences do not collapse during assembly, contigs might be fragmented and misassembled in the genome, making it difficult to differentiate between homologs and homeologs (Kyriakidou ). This could lead to homeologous regions being missed altogether in genome sequences, particularly in comparisons that require chromosome-level assemblies. However, the fact that support for tetrasomic inheritance in other PKs identified as tetrasomic through linkage mapping was consistent with that observed in genome analysis strongly suggests that there is something unique with PK 06 rather than a fault with the genome analyses conducted here. Perhaps, as suggested by Campbell , the PK 06 chromosome arms are returning to a diploid state faster than the other seven tetrasomic homeolog pairs or the tetrasomically inherited portion of PK 06 is smaller than other tetrasomic PKs. 
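The effect of the alignment-length threshold can be illustrated with a small filter. The sketch keeps only alignments of at least 1,000 bp, as stated above, and summarizes them by their median percent similarity (the statistic used to rank PKs in Figure 4). The alignment tuples are hypothetical values, and the function name `median_similarity` is an assumption rather than part of the original pipeline.

```python
from statistics import median

MIN_ALIGNMENT_BP = 1000  # threshold stated in the text


def median_similarity(alignments, min_length=MIN_ALIGNMENT_BP):
    """Median percent similarity over alignments at least min_length bp long."""
    kept = [pct for length, pct in alignments if length >= min_length]
    return median(kept) if kept else None


# Hypothetical (alignment length in bp, percent similarity) pairs.
ALIGNMENTS = [(2500, 94.0), (1200, 92.0), (150, 99.0), (1000, 93.0)]

if __name__ == "__main__":
    print(median_similarity(ALIGNMENTS))
```

The 150 bp alignment, comparable in length to a RADtag, is excluded even though it is highly similar, which shows why the genome-based measure is the more conservative of the two.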
Another notable difference between linkage mapping and genome analysis was the consistent classification of PK 01 as tetrasomic in the genome analysis (five of six species) but not in any linkage map. PK 01 uniformly exhibited the least similarity between tetrasomic homeologous pairs and was assigned to the putatively tetrasomic class of PKs with less certainty by the machine learning algorithm. This suggests that PK 01 may have low levels of tetrasomy. We also observed some consistent patterns of variation in sequence similarity within disomic markers. For example, homeolog pairs for PK 24 and 21 generally displayed the lowest sequence similarity, and homeolog pairs for PK 07 and 19 displayed higher similarity. Our study therefore adds nuance to the understanding of the rediploidization process by identifying a core group of conserved tetrasomic homeologs, potentially intermediate homeologs (PK 01, 06), and consistently diverged homeologs (PK 21, 24). Future investigations can be refined to examine four well-defined categories across PKs: tetrasomic, intermediate, disomic, and most diverged. This refinement should reduce noise from the incorrect pooling of homeologs and aid in understanding the rediploidization process in salmonids. Interestingly, more variation in sequence similarity was observed within tetrasomic homeologs than within disomic homeologs. For example, PK 23 has the second highest sequence similarity in Arctic char, the fourth highest in rainbow trout, the sixth highest in Atlantic salmon, and the seventh highest in grayling. While this may be due in part to differences in genome assembly method and assembly quality, the fact that variation exists even among the highest quality genomes (Atlantic salmon and rainbow trout) suggests that rediploidization rates at tetrasomic PKs may vary among species, even though the same seven PKs are consistently classified as tetrasomic. 
In other words, although tetrasomic inheritance appears to be largely conserved between species, our genome analyses also suggest some independence in the return to disomy since the three salmonid subfamilies split ∼50 MYA.
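The rank comparison used above for PK 23 can be sketched as follows. The sketch assumes that each species has one summary similarity value per PK; the species labels, the placeholder PK names (PK A to PK C), and every number are hypothetical and serve only to show the ranking step.

```python
def rank_of(pk, similarity_by_pk):
    """1-based rank of pk when PKs are sorted by descending similarity."""
    ordered = sorted(similarity_by_pk, key=similarity_by_pk.get, reverse=True)
    return ordered.index(pk) + 1

# Hypothetical similarity values, NOT results from this study.
species_tables = {
    "species_1": {"PK 23": 96.8, "PK A": 97.2, "PK B": 95.9, "PK C": 95.1},
    "species_2": {"PK 23": 94.7, "PK A": 96.9, "PK B": 96.1, "PK C": 95.3},
}

# Rank of the same PK within each species.
ranks = {sp: rank_of("PK 23", table) for sp, table in species_tables.items()}
print(ranks)
```

A PK whose rank shifts among species while remaining in the tetrasomic class is the pattern interpreted above as species-specific variation in rediploidization rate.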

Conclusions

Here we provide the most complete analysis of chromosomal rearrangements in coregonines using the currently available genomic resources and a haploid linkage map for cisco. We also integrate this analysis with prior characterizations of chromosomal rearrangements in salmonids through the use of a common identifier system, the protokaryotype ID (PK), and suggest its continued use to facilitate comparative analyses of salmonids. Our study revealed that patterns of tetrasomic inheritance are largely conserved across the salmonids, but that there is substantial variation in these patterns both within and among species. For example, while the same seven PKs appear to be tetrasomically inherited across all species examined, their relative levels of sequence similarity differ within species, suggesting the potential for independent evolutionary trajectories following speciation. Additionally, we documented that analyses based on linkage maps do not identify the same tetrasomically inherited PKs as genome analyses and postulate that this may be due to inconsistencies in genome assemblies or to differences in the length of sequence used in comparisons. This study provides important insights into the WGD in salmonids and also provides a framework that can be built upon to improve our understanding of WGDs both within and beyond salmonids.