Seth R Smith, Eric Normandeau, Haig Djambazian, Pubudu M Nawarathna, Pierre Berube, Andrew M Muir, Jiannis Ragoussis, Chantelle M Penney, Kim T Scribner, Gordon Luikart, Chris C Wilson, Louis Bernatchez.
Abstract
Here, we present an annotated, chromosome-anchored genome assembly for Lake Trout (Salvelinus namaycush), a highly diverse salmonid species of notable conservation concern and an excellent model for research on adaptation and speciation. We leveraged Pacific Biosciences long-read sequencing, paired-end Illumina sequencing, proximity ligation (Hi-C) sequencing, and a previously published linkage map to produce a highly contiguous assembly composed of 7378 contigs (contig N50 = 1.8 Mb) assigned to 4120 scaffolds (scaffold N50 = 44.975 Mb). Long-read sequencing data were generated using DNA from a female double haploid individual. In total, 84.7% of the genome was assigned to 42 chromosome-sized scaffolds and 93.2% of Benchmarking Universal Single-Copy Orthologues (BUSCOs) were recovered, putting this assembly on par with the best currently available salmonid genomes. Estimates of genome size based on k-mer frequency analysis were highly similar to the total size of the finished genome, suggesting that the entirety of the genome was recovered. A mitochondrial genome assembly was also produced. Self-versus-self synteny analysis allowed us to identify homeologs resulting from the salmonid-specific autotetraploid event (Ss4R) as well as regions exhibiting delayed rediploidization. Alignment with three other salmonid genomes and the Northern Pike (Esox lucius) genome also allowed us to identify homologous chromosomes in related taxa. We also generated multiple resources useful for future genomic research on Lake Trout, including a repeat library and a sex-averaged recombination map. A novel RNA sequencing data set for liver tissue was also generated in order to produce a publicly available set of annotations for 49,668 genes and pseudogenes. Potential applications of these resources to population genetics and the conservation of native populations are discussed.
Keywords: Salvelinus; Lake Trout; genome assembly; genomics; salmonid
Year: 2021 PMID: 34351050 PMCID: PMC9291852 DOI: 10.1111/1755-0998.13483
Source DB: PubMed Journal: Mol Ecol Resour ISSN: 1755-098X Impact factor: 8.678
FIGURE 1. Photograph of an adult Lake Trout (Salvelinus namaycush) from Great Bear Lake, Northwest Territories, Canada. Photo credit: Andrew Muir.
FIGURE 2. Circos plot displaying centromere positions, Tc1‐Mariner abundance, density of annotated protein coding genes, local homeolog sequence identity, male and female Lake Trout (Salvelinus namaycush) linkage maps, and homeolog pairs resulting from Ss4R. (A) Black boxes in the outside ring display the mean mapping positions (±5 Mb) for centromere associated RAD loci from Smith et al. (2020). (B) The second ring displays Z‐transformed Tc1‐Mariner repeat abundance in 5 Mb sliding windows with an offset of 100 kb. (C) The third ring displays the density of annotated genes in 5 Mb sliding windows with an offset of 100 kb. (D) The fourth ring displays local homeolog identity between syntenic blocks detected by SynMap2. Red points correspond to windows with elevated sequence identity putatively resulting from delayed rediploidization (posterior probability >0.5). Blue points correspond to windows with elevated sequence divergence between homeologs. (E) The fifth ring displays map distance (centimorgans) for male (red) and female (blue) linkage maps (y‐axis) versus physical distance (x‐axis) for each of the 42 chromosomes. Connections are drawn between syntenic blocks identified by SyMAP v5 that putatively result from Ss4R.
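The windowed tracks in rings B and C can be illustrated with a minimal sketch. It assumes that "abundance" means a count of feature start positions per window and that the Z-transform uses the population standard deviation; the caption states neither, and the function names and toy positions below are hypothetical.

```python
from bisect import bisect_left
from statistics import mean, pstdev


def sliding_window_counts(positions, chrom_length, window=5_000_000, step=100_000):
    """Count features starting in each half-open window [start, start + window)."""
    ordered = sorted(positions)
    # Only full-length windows are kept; the last one ends at the chromosome end.
    last_start = max(chrom_length - window, 0)
    starts = list(range(0, last_start + 1, step))
    counts = [
        bisect_left(ordered, s + window) - bisect_left(ordered, s)
        for s in starts
    ]
    return starts, counts


def z_transform(values):
    """Centre on the mean and scale by the population standard deviation (assumed)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Hypothetical feature start positions (bp) on a toy 6 Mb chromosome.
toy_positions = [50_000, 150_000, 2_000_000, 5_050_000, 5_500_000]
starts, counts = sliding_window_counts(toy_positions, chrom_length=6_000_000)
z_scores = z_transform(counts)
```

With a 5 Mb window and a 100 kb offset, adjacent windows overlap by 4.9 Mb, so the resulting track is heavily smoothed and neighbouring values are not independent.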
General summary statistics for the Lake Trout (Salvelinus namaycush) genome assembly. The total number of chromosomes, scaffolds (including chromosomes), and contigs are listed in the top row. Metrics reported for chromosomes and scaffolds include gaps of unknown length. Consensus accuracy was obtained from the output of POLCA after running three iterations of the program.
| Metric | Chromosomes | Scaffolds | Contigs | Gaps |
|---|---|---|---|---|
| Count | 42 | 4120 | 7378 | 3258 |
| Minimum length (bp) | 22,041,605 | 9606 | 84 | 100 |
| Mean length (bp) | 47,175,710 | 569,295 | 317,859 | 100 |
| Max length (bp) | 98,200,354 | 98,200,354 | 34,788,501 | 100 |
| Total length (bp) | 1,981,379,816 | 2,345,496,355 | 2,345,170,555 | 325,800 |
| N50 (bp) | 48,336,861 | 44,976,251 | 1,804,090 | 100 |
| N90 (bp) | 34,530,387 | 249,999 | 114,532 | 100 |
| N95 (bp) | 26,015,404 | 84,453 | 61,568 | 100 |
| Consensus accuracy (%) | ‐ | ‐ | 99.9959 | ‐ |
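For readers unfamiliar with the Nx metrics in the table, the following is a minimal sketch of the standard definition: the length L such that sequences of length at least L cover the given fraction of the total. The function name `nx` and the toy lengths are illustrative assumptions, not the authors' code. The final lines check that the counts and totals reported in the table are mutually consistent with fixed 100 bp gaps.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx statistic (N50 when fraction=0.5) for a list of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until the target fraction is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length


# Toy lengths (hypothetical), total = 30.
toy_lengths = [10, 8, 6, 4, 2]
toy_n50 = nx(toy_lengths, 0.5)  # 10 + 8 = 18 >= 15, so N50 is 8
toy_n90 = nx(toy_lengths, 0.9)  # 10 + 8 + 6 + 4 = 28 >= 27, so N90 is 4

# Values copied from the table above.
n_scaffolds, n_contigs, n_gaps = 4120, 7378, 3258
gap_length_bp = 100
scaffold_total_bp = 2_345_496_355
contig_total_bp = 2_345_170_555

# Each gap joins two contigs within a scaffold, so gaps = contigs - scaffolds.
gaps_from_counts = n_contigs - n_scaffolds
# Scaffold length exceeds contig length by exactly the summed gap length.
gap_total_bp = scaffold_total_bp - contig_total_bp
```

Here `gaps_from_counts` equals the reported 3258 gaps, and `gap_total_bp` equals 3258 × 100 = 325,800 bp, matching the total gap length in the table.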
FIGURE 3. Comparison of BUSCO scores across multiple chromosome‐level salmonid assemblies. Scores for the preduplication outgroup species (Northern Pike; Esox lucius) are also included for comparison. Assemblies are listed top to bottom according to the total percentage of complete BUSCOs. Complete single‐copy, complete duplicated, fragmented, and missing BUSCO percentages are delineated with green, blue, yellow, and red bars, respectively.
Number of elements, total sequence length, and percent of the Lake Trout (Salvelinus namaycush) genome occupied by retroelements, DNA transposons, and other repeat types.
| Repeat type | No. elements | Length (bp) | Percent of genome |
|---|---|---|---|
| Retroelements | 551,376 | 305,755,720 | 13.04 |
| SINEs | 0 | 0 | 0.00 |
| Penelope | 11,724 | 3,138,292 | 0.13 |
| LINEs | 483,866 | 245,479,169 | 10.47 |
| CRE/SLACS | 0 | 0 | 0.00 |
| L2/CR1/Rex | 337,340 | 178,461,635 | 7.61 |
| R1/LOA/Jockey | 9131 | 2,778,587 | 0.12 |
| R2/R4/NeSL | 705 | 573,357 | 0.02 |
| RTE/Bov‐B | 28,238 | 14,293,769 | 0.61 |
| L1/CIN4 | 12,257 | 6,142,123 | 0.26 |
| LTR Elements | 67,510 | 60,276,551 | 2.57 |
| BEL/Pao | 1533 | 1,173,630 | 0.05 |
| Ty1/Copia | 1427 | 1,007,823 | 0.04 |
| Gypsy/DIRS1 | 55,237 | 49,788,865 | 2.12 |
| Retroviral | 9313 | 8,306,233 | 0.35 |
| DNA transposons | 533,707 | 233,872,078 | 9.97 |
| hobo‐Activator | 34,814 | 15,807,935 | 0.67 |
| Tc1‐IS630‐Pogo | 473,487 | 209,441,783 | 8.93 |
| En‐Spm | 0 | 0 | 0.00 |
| MuDR‐IS905 | 0 | 0 | 0.00 |
| PiggyBac | 9091 | 3,370,797 | 0.14 |
| Tourist/Harbinger | 3105 | 834,759 | 0.04 |
| Other (Mirage, P‐elements, Transib) | 1104 | 292,535 | 0.01 |
| Rolling‐circles | 348 | 227,654 | 0.01 |
| Unclassified | 2,885,512 | 722,299,456 | 30.79 |
| All interspersed repeats | ‐ | 1,261,927,254 | 53.80 |
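The percent column can be cross-checked against the length column with a short sketch. It assumes the denominator is the total scaffold length from the assembly statistics table (2,345,496,355 bp); the exact denominator the authors used is not stated, so the comparison allows a tolerance of 0.01 percentage points. The helper name `percent_of_genome` is illustrative.

```python
# Assumed denominator: total scaffold length from the assembly statistics table.
ASSEMBLY_BP = 2_345_496_355

# (length in bp, reported percent) copied from the repeat table.
top_level = {
    "Retroelements": (305_755_720, 13.04),
    "DNA transposons": (233_872_078, 9.97),
    "Unclassified": (722_299_456, 30.79),
}
all_interspersed_bp = 1_261_927_254
all_interspersed_pct = 53.80


def percent_of_genome(length_bp, total_bp=ASSEMBLY_BP):
    """Percent of the assembly occupied by a repeat class."""
    return 100 * length_bp / total_bp


# The three top-level classes should sum to the interspersed-repeat total.
summed_bp = sum(length for length, _ in top_level.values())

# Largest absolute difference between recomputed and reported percentages.
max_diff = max(
    abs(percent_of_genome(length) - reported)
    for length, reported in top_level.values()
)
```

The three top-level lengths sum exactly to the 1,261,927,254 bp reported for all interspersed repeats, and the recomputed percentages agree with the reported ones to within the stated tolerance.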