Mikhail Yu Ozerov, Freed Ahmad, Riho Gross, Lilian Pukk, Siim Kahar, Veljo Kisand, Anti Vasemägi.
Abstract
The Eurasian perch (Perca fluviatilis) is the most common fish of the Percidae family and is widely distributed across Eurasia. Perch is a popular target for professional and recreational fisheries, and a promising freshwater aquaculture species in Europe. However, despite its high ecological, economic and societal importance, the available genomic resources for P. fluviatilis are rather limited. In this work, we report the de novo assembly and annotation of the whole genome sequence of perch. Linked-read technology with 10X Genomics Chromium chemistry and the Supernova assembler produced a ∼1.0 Gbp draft perch genome assembly (scaffold N50 = 6.3 Mb; longest individual scaffold of 29.3 Mb; BUSCO completeness of 88.0%), which included 281.6 Mb of putative repeated sequences. The perch genome assembly presented here, generated from a small amount of starting material (0.75 ng) and a single linked-read library, is highly continuous and considerably more complete than the currently available draft of the P. fluviatilis genome. A total of 23,397 protein-coding genes were predicted, 23,171 (99%) of which were annotated functionally from either sequence homology or protein signature searches. Linked-read technology enables fast, accurate and cost-effective de novo assembly of large non-model eukaryote genomes. The highly continuous assembly of the Eurasian perch genome presented in this study will be an invaluable resource for a range of genetic, ecological, physiological, ecotoxicological, functional and comparative genomic studies in perch and other fish species of the Percidae family.
Keywords: 10X Genomics Chromium linked-read; Perca fluviatilis; de novo assembly; fish; whole genome sequencing
Year: 2018 PMID: 30355765 PMCID: PMC6288837 DOI: 10.1534/g3.118.200768
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Genome size, heterozygosity and repeat content as estimated by GenomeScope and findGSE software
| Genome characteristics | k = 17 | k = 21 | k = 25 |
|---|---|---|---|
| GenomeScope | | | |
| Genome haploid length (Mb) | 851.7 | 894.8 | 928.2 |
| Genome repeat length (Mb) | 426.9 | 306.9 | 307.4 |
| Genome unique length (Mb) | 424.8 | 587.9 | 620.8 |
| Heterozygosity, % | 0.28 | 0.26 | 0.24 |
| Estimated repetitive ratio, % | 50.1 | 34.3 | 33.1 |
| Read error rate, % | 0.14 | 0.18 | 0.19 |
| findGSE | | | |
| Genome haploid length (Mb) | 1,050.9 | 1,163.8 | 1,172.8 |
| Genome repeat length (Mb) | 578.1 | 529.8 | 503.2 |
| Estimated repetitive ratio, % | 55.0 | 45.5 | 42.9 |
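
The repetitive ratios in this table appear to be the repeat length divided by the haploid genome length. The sketch below checks that reading against the tabulated values; the function name and the grouping into two sets of rows are illustrative assumptions, not part of the original analysis.

```python
# Values (Mb) copied from the table above, in the order k = 17, 21, 25.
# The grouping into "first" and "second" sets of rows is an assumption
# made for illustration.
first_rows = {"haploid": [851.7, 894.8, 928.2], "repeat": [426.9, 306.9, 307.4]}
second_rows = {"haploid": [1050.9, 1163.8, 1172.8], "repeat": [578.1, 529.8, 503.2]}


def repetitive_ratio(repeat_mb, haploid_mb):
    """Return the repeat fraction of the genome as a percentage, to one decimal."""
    return round(100 * repeat_mb / haploid_mb, 1)


for rows in (first_rows, second_rows):
    ratios = [repetitive_ratio(r, h) for r, h in zip(rows["repeat"], rows["haploid"])]
    print(ratios)
```

Under this assumption the recomputed percentages match the tabulated ones for every k-mer size.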
Eurasian perch genome assembly statistics
| Statistic | 10X Genome assembly | Genome assembly by Malmstrøm et al. |
|---|---|---|
| Number of contigs | 100,796 | 181,537 |
| Total contig size (bp) | 851,640,084 | 626,588,998 |
| Contig N50 (bp) | 18,196 | 4,140 |
| Largest contig (bp) | 241,857 | 46,493 |
| Number of scaffolds | 31,105 | 139,898 |
| Total scaffold size (bp) | 958,225,764 | 630,583,430 |
| Scaffold N50 (bp) | 6,260,519 | 5,973 |
| Largest scaffold (bp) | 29,260,448 | 73,288 |
| GC/N (%) | 40.9/11.1 | 40.6/0.6 |
| Complete BUSCOs | 4,033 (88.0%) | 2,144 (46.8%) |
| Complete and single-copy BUSCOs | 3,933 (85.8%) | 2,105 (45.9%) |
| Complete and duplicated BUSCOs | 100 (2.2%) | 39 (0.9%) |
| Fragmented BUSCOs | 323 (7.0%) | 1,246 (27.2%) |
| Missing BUSCOs | 228 (5.0%) | 1,194 (26.0%) |
| Number of protein-coding genes | 23,397 | |
| Number of functionally annotated proteins | 23,171 | |
| Mean protein length (interquartile range, aa) | 506 (224-614) | |
| Longest protein (aa) | 8,907 (nesprin-1) | |
| Average number of exons per gene (mean length, interquartile range of length) | 9 (228, 89-189 bp) | |
| Average number of introns per gene (mean length, interquartile range of length) | 8 (1,224, 150-1,340 bp) | |
Minimum scaffold length is 1 Kb.
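
For readers unfamiliar with the N50 statistic quoted for the assembly (for example, the 6.3 Mb scaffold N50 in the abstract), a minimal sketch of the usual definition follows: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly size. The function name and the toy lengths are illustrative; this is not the code used by the authors or by any particular assembler.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    at which the running total first reaches half of the overall total.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example: total = 20, and 6 + 5 = 11 is the first running sum >= 10.
print(n50([2, 3, 4, 5, 6]))  # prints 5
```

Applying a minimum-length filter before the calculation (such as the 1 Kb scaffold cutoff noted above) changes the total and can therefore change the N50.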
Figure 1. Cumulative length of the assembly represented by scaffolds (solid line) and contigs (dashed line). Black lines: de novo perch genome assembly obtained using linked reads (this study). Gray lines: recently published genome assembly using Illumina short reads (Malmstrøm et al.).
Eurasian perch transcriptome assembly statistics
| Statistic | Combined transcriptome assembly (multiple tissues) |
|---|---|
| Number of transcripts | 36,431 |
| Total transcript size (bp) | 108,727,847 |
| Transcript N50 (bp) | 3,962 |
| Largest transcript (bp) | 78,856 |
| Complete BUSCOs | 4,411 (96.2%) |
| Complete and single-copy BUSCOs | 3,644 (79.5%) |
| Complete and duplicated BUSCOs | 767 (16.7%) |
| Fragmented BUSCOs | 58 (1.3%) |
| Missing BUSCOs | 115 (2.5%) |
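
The Complete, Fragmented and Missing rows appear to be BUSCO categories, as in the completeness figure quoted in the abstract. Assuming each percentage is the category count divided by the total number of orthologs searched, the sketch below recomputes the transcriptome percentages from the counts in the table. The function name and dictionary keys are illustrative only.

```python
def busco_percentages(single_copy, duplicated, fragmented, missing):
    """Return the total ortholog count and the percentage in each category.

    The total is assumed to be the sum of the four mutually exclusive
    categories; "complete" is single-copy plus duplicated.
    """
    total = single_copy + duplicated + fragmented + missing
    counts = {
        "complete": single_copy + duplicated,
        "single_copy": single_copy,
        "duplicated": duplicated,
        "fragmented": fragmented,
        "missing": missing,
    }
    return total, {name: round(100 * n / total, 1) for name, n in counts.items()}


# Counts copied from the transcriptome table above.
total, pct = busco_percentages(3644, 767, 58, 115)
print(total, pct)
```

With these counts the categories sum to 4,584 orthologs, and the recomputed percentages agree with the tabulated ones.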