Julien Alban Nguinkal, Ronald Marco Brunner, Marieke Verleih, Alexander Rebl, Lidia de Los Ríos-Pérez, Nadine Schäfer, Frieder Hadlich, Marcus Stüeken, Dörte Wittenburg, Tom Goldammer.
Abstract
The pikeperch (Sander lucioperca) is a fresh and brackish water Percid fish natively inhabiting the northern hemisphere. This species is emerging as a promising candidate for intensive aquaculture production in Europe. Specific traits like cannibalism, growth rate and meat quality require a genomics-based understanding for an optimal husbandry and domestication process. Still, the aquaculture community lacks an annotated genome sequence to facilitate genome-wide studies on pikeperch. Here, we report the first highly contiguous draft genome assembly of Sander lucioperca. In total, 413 and 66 gigabase pairs of raw DNA sequencing data were generated with the Illumina platform and the PacBio Sequel System, respectively. The PacBio data were assembled into a final assembly of ~900 Mb, covering 89% of the estimated genome size of 1,014 Mb. The draft genome consists of 1,966 contigs ordered into 1,313 scaffolds. The contig and scaffold N50 lengths are 3.0 Mb and 4.9 Mb, respectively. The identified repetitive structures account for 39% of the genome. We used homologies to other ray-finned fishes and ab initio gene prediction methods to predict 21,249 protein-coding genes in the Sander lucioperca genome, of which 88% were functionally annotated by either sequence homology or a search for protein domains and signatures. The assembled genome spans 97.6% and 96.3% of Vertebrate and Actinopterygii single-copy orthologs, respectively. The outstanding mapping rate (99.9%) of genomic paired-end (PE) reads on the assembly suggests an accurate and nearly complete genome reconstruction. This draft genome sequence is the first genomic resource for this promising aquaculture species. It will provide an impetus for genomic-based breeding studies targeting phenotypic and performance traits of captive pikeperch.
Keywords: aquaculture; fish; genes annotation; genome assembly; genome sequencing; pikeperch
Year: 2019 PMID: 31540274 PMCID: PMC6770990 DOI: 10.3390/genes10090708
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Estimated characteristics of the Sander lucioperca genome based on 19-mer analysis. The vertical axis represents the 19-mer depth, and the horizontal axis the corresponding frequency. The heterozygous and the homozygous peaks are marked. Low-coverage (<50) 19-mers are putative erroneous sequences, whereas deep-coverage (>450) 19-mers indicate repetitive genomic sequences.
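A k-mer spectrum like the one described in this caption is commonly turned into a rough genome size estimate by dividing the total number of counted k-mers by the depth of the homozygous peak. The following is a minimal sketch of that idea, not necessarily the procedure used by the authors. The function name `estimate_genome_size`, the toy histogram, and the choice of the most populated depth as the homozygous peak are assumptions for illustration. Only the error cutoff of 50 is taken from the caption.

```python
def estimate_genome_size(histogram, min_depth=50):
    """Rough genome size estimate: (total k-mers) / (main peak depth).

    histogram maps a k-mer depth to the number of distinct k-mers
    observed at that depth. Depths below min_depth are treated as
    sequencing errors and ignored (cutoff of 50 as in the caption).
    """
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers left after error filtering")
    # Assumption: the homozygous peak is the most populated depth.
    peak_depth = max(kept, key=kept.get)
    # Each distinct k-mer at depth d contributes d k-mer occurrences.
    total_kmers = sum(depth * n for depth, n in kept.items())
    return peak_depth, total_kmers / peak_depth


# Hypothetical toy histogram, NOT data from the paper.
toy_histogram = {5: 900, 10: 400, 100: 20, 150: 30, 200: 60, 250: 30, 400: 5}
peak, size = estimate_genome_size(toy_histogram)
print(f"peak depth = {peak}, estimated size = {size:.0f} k-mer positions")
```

In a strongly heterozygous genome the heterozygous peak can be taller than the homozygous one, so a real analysis would pick the peak more carefully than this sketch does.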
Figure 2. Comparison of contiguity (N50) and repeat content among selected Perciformes fish species. (A): Contig N50 (scaled with natural logarithm) of the pikeperch assembly compared with recently published assemblies of species of the same taxonomic order (Perciformes). (B): Correlation of repeat content and genome size in recently published genomes of Perciformes fish species. R is the Pearson’s correlation coefficient and p the associated p-value.
Figure 3. Assembly length and mappability statistics. (A): The cumulative length of the pikeperch assembly in relation to the total number of contigs, sorted from the longest to the shortest. (B): Overall trend of the contig Nx metric as x varies from 0 to 100. (C): Mapping rates of genomic paired-end reads of 40 pikeperch individuals to our constructed reference pikeperch assembly.
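The Nx metric in panel (B) generalizes N50: Nx is the length of the shortest contig in the smallest set of longest contigs that together cover at least x% of the assembly. A minimal sketch of that definition follows. The function name `nx` and the toy contig lengths are illustrative assumptions, not the paper's data or code.

```python
def nx(lengths, x):
    """Return Nx for a collection of contig (or scaffold) lengths.

    Contigs are sorted from longest to shortest and accumulated until
    at least x percent of the total length is covered. The length of
    the last contig added is Nx. x = 50 gives the familiar N50.
    """
    if not 0 <= x <= 100:
        raise ValueError("x must be between 0 and 100")
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding issues.
        if running * 100 >= x * total:
            return length


# Hypothetical toy contig lengths, NOT data from the paper.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
curve = {x: nx(toy_contigs, x) for x in (0, 50, 75, 90, 100)}
print(curve)
```

With this convention, x = 0 returns the longest contig and x = 100 the shortest, which matches the endpoints of an Nx curve.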
Summary statistics of Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis for Sander lucioperca genome assembly.
| Categories | Actinopterygii #Genes | Actinopterygii Percentage | Vertebrata #Genes | Vertebrata Percentage |
|---|---|---|---|---|
| Complete single-copy | 4413 | 96.27 | 2523 | 97.56 |
| Complete duplicated | 112 | 2.45 | 26 | 1.01 |
| Fragmented | 89 | 1.94 | 40 | 1.54 |
| Missing | 82 | 1.79 | 23 | 0.89 |
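The percentages in this table can be recomputed from the gene counts once the size of each BUSCO lineage set is known. The table does not state those totals, so the sketch below infers them under one assumption: the total is the first row plus the fragmented and missing rows, with the duplicated genes treated as already included in the first row. Under that assumption the totals come out as 4584 (Actinopterygii) and 2586 (Vertebrata), and the recomputed percentages agree with the table to within rounding. The dictionary keys and the function name are illustrative.

```python
# Gene counts copied from the BUSCO table above.
counts = {
    "Actinopterygii": {"complete": 4413, "duplicated": 112,
                       "fragmented": 89, "missing": 82},
    "Vertebrata": {"complete": 2523, "duplicated": 26,
                   "fragmented": 40, "missing": 23},
}


def busco_percentages(c):
    """Return (inferred total, percentage per category).

    Assumption: duplicated genes are a subset of the 'complete' row,
    so the lineage total is complete + fragmented + missing.
    """
    total = c["complete"] + c["fragmented"] + c["missing"]
    pct = {name: round(100 * n / total, 2) for name, n in c.items()}
    return total, pct


results = {lineage: busco_percentages(c) for lineage, c in counts.items()}
for lineage, (total, pct) in results.items():
    print(lineage, total, pct)
```

Adding the duplicated row on top of the other three would overshoot these totals, which is why the sketch treats it as a subset.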
Summary statistics of Sander lucioperca genome assembly and annotation.
| Metric | Value |
|---|---|
| **Assembly** | |
| Total size (nt) | 900,477,756 |
| No. of contigs | 1966 |
| Contigs N50 (nt) | 2,995,800 |
| Longest contig (nt) | 17,774,792 |
| No. of scaffolds | 1313 |
| Scaffold N50 (nt) | 4,929,547 |
| Longest scaffold (nt) | 19,065,786 |
| Average scaffold (nt) | 685,817 |
| GC-content (%) | 40.91 |
| **Gene annotation** | |
| Number of coding genes | 21,249 |
| Mean gene length (nt) | 10,961 |
| Mean coding sequence (CDS) length (nt) | 1313 |
| Mean intron length (nt) | 1696 |
| Mean exon length (nt) | 196 |
| Average no. of exons per CDS | 6.7 |
| % of genome covered by genes | 25.9 |
| % of genome covered by CDS | 3.1 |
| **Functional annotation** | |
| Non-redundant (NR) hits | 18,536 (87.2%) |
| Swissprot hits | 13,783 (64.8%) |
| trEMBL hits | 18,171 (85.5%) |
| Interpro hits | 18,486 (87.0%) |
| **Non-coding RNAs** | |
| tRNA | 2313 |
| rRNA | 180 |
| miRNA | 166 |
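Several entries in the summary table are derived quantities and can be cross-checked from the primary values. The sketch below recomputes the average scaffold length, the share of the genome covered by genes and by CDS, the mean CDS length from the exon figures, and the fraction of the estimated genome size that was assembled. All inputs are copied from the table and the abstract. The variable names are illustrative, and the gene coverage figure assumes that genes do not overlap.

```python
# Primary values from the summary table.
total_size_nt = 900_477_756
n_scaffolds = 1313
n_genes = 21_249
mean_gene_len_nt = 10_961
mean_cds_len_nt = 1313
mean_exon_len_nt = 196
exons_per_cds = 6.7
# Estimated genome size of 1,014 Mb, from the abstract.
estimated_genome_nt = 1_014_000_000

# Average scaffold length = total assembly size / number of scaffolds.
avg_scaffold_nt = total_size_nt / n_scaffolds

# Coverage = (number of genes * mean feature length) / assembly size.
# Assumption: genes do not overlap, so lengths can simply be summed.
gene_pct = 100 * n_genes * mean_gene_len_nt / total_size_nt
cds_pct = 100 * n_genes * mean_cds_len_nt / total_size_nt

# Mean CDS length should be close to exons per CDS * mean exon length.
cds_from_exons_nt = exons_per_cds * mean_exon_len_nt

# Share of the estimated genome represented in the assembly.
assembled_pct = 100 * total_size_nt / estimated_genome_nt

print(f"average scaffold: {avg_scaffold_nt:,.0f} nt")
print(f"genes cover {gene_pct:.1f}% and CDS cover {cds_pct:.1f}% of the assembly")
print(f"CDS length from exons: {cds_from_exons_nt:.1f} nt")
print(f"assembled fraction of estimated genome: {assembled_pct:.1f}%")
```

These checks reproduce the tabulated values to the precision shown there, which suggests the table is internally consistent.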
Figure 4. Shared gene families and their distribution per species. (A): Venn diagram showing the shared gene families between selected Perciformes species: L.mac (Lateolabrax maculatus), S.sin (Sillago sinica), C.arg (Channa argus), P.fla (Perca flavescens), S.luc (Sander lucioperca), P.char (Parachaenichthys charcoti). Colored numbers indicate the number of species-specific gene families. (B): Total number of gene families for each species.
Figure 5. Phylogenetic analysis of Sander lucioperca and closely related Perciformes genomes. The constructed phylogenetic tree is based on one-to-one single-copy orthologs between the seven Perciformes fish species. The node labels indicate the estimated divergence time from the last common ancestor (LCA), in million years ago (MYA).
Comparison of currently reported genome assemblies of fish species in the Percidae family.
| Species | Estimated repeats (%) | Total assembly size (Mb) | Ungapped size (Mb) | Number of contigs | Contig N50 (Mb) | #Coding genes |
|---|---|---|---|---|---|---|
| Yellow perch | 41 | 877.4 | 877.0 (99.9%) | 1097 | 4.2 | 23,749 |
| Pikeperch | 39 | 900.5 | 899.8 (99.9%) | 1966 | 3.0 | 21,249 |
| Eurasian perch | 33 | 958.2 | 851.6 (88.9%) | 100,821 | 0.0182 | 23,397 |