Mikhail Yu Ozerov, Martin Flajšhans, Kristina Noreikiene, Anti Vasemägi, Riho Gross.
Abstract
The wels catfish (Silurus glanis) is one of the largest freshwater fish species in the world. This top predator plays a key role in ecosystem stability and is an iconic trophy fish for recreational fishermen. S. glanis is also highly valued for its high-quality boneless flesh and has been cultivated for over 100 years in Eastern and Central Europe. Interest in rearing S. glanis continues to grow; aquaculture production of this species has almost doubled during the last decade. However, despite its high ecological, cultural and economic importance, the available genomic resources for S. glanis are very limited. To fill this gap, we report a de novo assembly and annotation of the whole genome sequence of a female S. glanis. Linked-read technology with 10X Genomics Chromium chemistry and the Supernova assembler produced a highly continuous draft genome of S. glanis: a ~0.8 Gb assembly (scaffold N50 = 3.2 Mb; longest individual scaffold = 13.7 Mb; BUSCO completeness = 84.2%), which included 313.3 Mb of putative repeated sequences. In total, 21,316 protein-coding genes were predicted, of which 96% were functionally annotated from either sequence homology or protein signature searches. This highly continuous genome assembly will be an invaluable resource for aquaculture genomics, genetics, conservation, and breeding research on S. glanis.
Keywords: 10X Genomics Chromium linked-reads; Silurus glanis; de novo assembly; teleost; wels catfish; whole genome sequencing
Year: 2020; PMID: 32917720; PMCID: PMC7642921; DOI: 10.1534/g3.120.401711
Source DB: PubMed; Journal: G3 (Bethesda); ISSN: 2160-1836; Impact factor: 3.154
Figure 1. Wels catfish (Silurus glanis). Photo by Filip Staes, http://www.fsfotografie.be/.
The S. glanis genome size, heterozygosity and repeat content as estimated by the GenomeScope and findGSE software
| Genome characteristics | k = 17 | k = 21 | k = 25 |
|---|---|---|---|
| GenomeScope | | | |
| Genome haploid length (Mb) | 723.4 | 753.6 | 769.9 |
| Genome repeat length (Mb) | 337.8 | 216.8 | 205.4 |
| Genome unique length (Mb) | 385.6 | 536.7 | 564.5 |
| Heterozygosity, % | 0.24 | 0.25 | 0.23 |
| Estimated repetitive ratio, % | 46.7 | 28.8 | 26.7 |
| Read error rate, % | 0.30 | 0.36 | 0.36 |
| findGSE | | | |
| Genome haploid length (Mb) | 822.6 | 901.3 | 906.5 |
| Genome repeat length (Mb) | 395.1 | 326.9 | 300.2 |
| Genome unique length (Mb) | 427.5 | 574.5 | 606.3 |
| Estimated repetitive ratio, % | 48.0 | 36.3 | 33.1 |
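The repetitive ratio rows can be reproduced from the length rows: the ratio is the repeat length divided by the haploid length, and the repeat and unique lengths sum to the haploid length within rounding. The following is a minimal sketch of that arithmetic using the first block of rows in the table above; the dictionary layout and the function name `repetitive_ratio` are illustrative choices, not part of either estimation tool.

```python
# Values (Mb) copied from the first block of rows in the table above,
# keyed by k-mer size. The data layout is an illustrative assumption.
rows = {
    17: {"haploid": 723.4, "repeat": 337.8, "unique": 385.6},
    21: {"haploid": 753.6, "repeat": 216.8, "unique": 536.7},
    25: {"haploid": 769.9, "repeat": 205.4, "unique": 564.5},
}


def repetitive_ratio(repeat_mb, haploid_mb):
    """Percentage of the haploid genome length estimated as repetitive."""
    return 100.0 * repeat_mb / haploid_mb


# One ratio per k-mer size, rounded to one decimal place as in the table.
ratios = {
    k: round(repetitive_ratio(v["repeat"], v["haploid"]), 1)
    for k, v in rows.items()
}

# Repeat + unique should match the haploid length to within rounding (0.1 Mb).
sums_consistent = all(
    abs(v["repeat"] + v["unique"] - v["haploid"]) < 0.15 for v in rows.values()
)

print(ratios, sums_consistent)
```

The estimated repeat fraction falls as k increases, which is why the table reports all three k-mer sizes rather than a single value.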
The S. glanis genome assembly and annotation statistics
| Genome assembly | |
|---|---|
| Number of contigs | 105,816 |
| Total contig size (bp) | 712,999,588 |
| Contig N50 (bp) | 13,869 |
| Largest contig (bp) | 140,841 |
| Number of scaffolds | 25,703 |
| Total scaffold size (bp) | 793,358,859 |
| Scaffold N50 (bp) | 3,169,562 |
| Largest scaffold (bp) | 13,715,129 |
| GC content (%) | 39.2 |
| Unknown base (%) | 10.1 |
| BUSCO completeness (genome assembly) | |
| Complete | 3,859 (84.2%) |
| Complete and single copy | 3,717 (81.1%) |
| Complete and duplicated | 142 (3.1%) |
| Fragmented | 312 (6.8%) |
| Missing | 413 (9.0%) |
| Genome annotation | |
| Number of protein-coding genes | 21,316 |
| with partial EST support | 10,260 |
| with > 90% EST support | 4,989 |
| with full length EST support | 3,795 |
| with > 100 RNAseq reads aligned | 17,330 |
| with > 10 RNAseq reads aligned | 19,855 |
| Number of functionally-annotated proteins | 20,532 |
| Mean protein length (interquartile range, aa) | 501 (218-617) |
| Longest protein (aa) | 27,306 (titin-like) |
| Average number of exons per gene (mean length, interquartile range) | 9 (212, 89-194 bp) |
| Average number of introns per gene (mean length, interquartile range) | 8 (1,208, 133-1,274 bp) |
| BUSCO completeness (annotated gene set) | |
| Complete | 3,427 (74.8%) |
| Complete and single copy | 3,248 (70.9%) |
| Complete and duplicated | 179 (3.9%) |
| Fragmented | 403 (8.8%) |
| Missing | 754 (16.4%) |
Minimum scaffold length: 1 Kb.
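Assembly continuity here is summarized by N50 (the abstract reports a scaffold N50 of 3.2 Mb): the length L such that sequences of length L or longer together cover at least half of the total assembly size. Below is a minimal sketch of that standard definition on a toy list of lengths; the function `n50` and the toy data are illustrative and are not the assembler's own reporting code. The last lines check, under the assumption that unknown bases correspond to the difference between total scaffold size and total contig size, that the table's figures are mutually consistent.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Lengths are sorted from longest to shortest and accumulated until
    the running sum reaches half of the total; the length at which that
    happens is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Toy example (hypothetical lengths, not from the assembly):
# total = 30, half = 15; 10 + 8 = 18 >= 15, so N50 = 8.
toy_n50 = n50([10, 8, 6, 4, 2])

# Consistency check on the table, ASSUMING unknown bases (gaps) equal
# total scaffold size minus total contig size.
total_contig_bp = 712_999_588
total_scaffold_bp = 793_358_859
gap_percent = round(
    100 * (total_scaffold_bp - total_contig_bp) / total_scaffold_bp, 1
)

print(toy_n50, gap_percent)
```

Because N50 depends on which sequences are included, the minimum scaffold length of 1 Kb stated above matters when comparing these values with other assemblies.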
Figure 2. BUSCO assessment of the S. glanis and other Siluriformes genomes.
Figure 3. Circos plot showing the high level of synteny between S. glanis (this study) and I. punctatus (Liu ).
The S. glanis transcriptome assembly statistics
| Transcriptome assembly (multiple tissues) | |
|---|---|
| Number of transcripts | 48,133 |
| Total transcript size (bp) | 80,812,654 |
| Transcript N50 (bp) | 2,394 |
| Largest transcript (bp) | 69,646 |
| BUSCO completeness (transcriptome) | |
| Complete | 4,222 (92.1%) |
| Complete and single copy | 3,844 (83.9%) |
| Complete and duplicated | 378 (8.2%) |
| Fragmented | 123 (2.7%) |
| Missing | 239 (5.2%) |
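The completeness rows in the table above (complete, fragmented, missing) appear to follow the BUSCO category scheme, in which each percentage is a count divided by the total number of orthologs searched, and "complete" is the sum of single-copy and duplicated hits. The sketch below recomputes the percentages from the counts in the table; the function name `busco_summary` and its return layout are hypothetical and unrelated to the BUSCO tool's own output format.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Return (total, percentages) for a set of BUSCO-style counts.

    'complete' is single-copy plus duplicated; every percentage is
    relative to the total number of orthologs searched.
    """
    total = single + duplicated + fragmented + missing
    counts = {
        "complete": single + duplicated,
        "single": single,
        "duplicated": duplicated,
        "fragmented": fragmented,
        "missing": missing,
    }
    percentages = {name: round(100 * n / total, 1) for name, n in counts.items()}
    return total, percentages


# Counts copied from the transcriptome table above.
total, pct = busco_summary(single=3844, duplicated=378, fragmented=123, missing=239)
print(total, pct)
```

Applying the same function to the completeness counts in the genome assembly table gives the same total, which suggests that the same ortholog set was used for each assessment.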
Figure 4. Inference of the S. glanis demographic history as revealed by PSMC analysis. The inferred population size is presented as a bold red line, and the surrounding thin pink lines are the estimates of population size generated after 100 rounds of bootstrapping. Green and white bars above the figure represent interglacial and glacial periods, respectively. The names of the last four geological epochs are indicated above the bars: the Holocene, the Last Glacial Period (LGP), the Eemian and the Penultimate Glacial Period (PGP).