Shihu Zhao, Xiufeng Yang, Bo Pang, Lei Zhang, Qi Wang, Shangbin He, Huashan Dou, Honghai Zhang.
Abstract
Chanodichthys erythropterus is a fierce carnivorous fish found widely in East Asian waters. It is both a popular food fish in China and a representative victim of overfishing. Genetic breeding programs launched to meet market demand urgently require a high-quality genome to support genomic selection and genetic research. In this study, we constructed a chromosome-level reference genome of C. erythropterus using Oxford Nanopore Technologies (ONT) long-read single-molecule sequencing, de novo assembly, and Hi-C. The 1.085 Gb C. erythropterus genome was assembled from 132 Gb of Nanopore sequence. The assembly has a BUSCO completeness of 98.5% and a contig N50 of 23.29 Mb. Using Hi-C data, the contigs were clustered and ordered onto 24 chromosomes covering 99.49% of the genome assembly. Additionally, 33,041 (98.0%) of the 33,706 predicted protein-coding genes were functionally annotated, with gene prediction supported by transcriptome data from seven tissues. This high-quality genome assembly will be a valuable resource for future molecular breeding and functional genomics research on C. erythropterus.
Year: 2022 PMID: 36050331 PMCID: PMC9436972 DOI: 10.1038/s41597-022-01648-0
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Fig. 1 17-mer frequency distribution in the C. erythropterus genome. The x-axis is the k-mer depth, and the y-axis is the frequency of k-mers at that depth.
Results of the k-mer analysis.
| K-mer | Depth | K-mer number | Genome size (Mb) | Heterozygosity rate (%) | Repeat rate (%) |
|---|---|---|---|---|---|
| 17 | 27 | 30,891,679,507 | 1,120.68 | 0.31 | 57.05 |
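K-mer-based genome size estimates are commonly derived from the relation genome size ≈ total k-mer count / peak k-mer depth. The Python sketch below applies that relation to the values in the table; the function name is illustrative and not taken from the source. The uncorrected ratio comes out near 1,144 Mb, somewhat above the 1,120.68 Mb reported in the table, so the reported figure likely reflects a correction step that this record does not describe.

```python
def estimate_genome_size(total_kmers: int, peak_depth: int) -> float:
    """Uncorrected k-mer estimate of genome size in base pairs."""
    return total_kmers / peak_depth


# Values copied from the k-mer analysis table (k = 17).
size_bp = estimate_genome_size(30_891_679_507, 27)
size_mb = round(size_bp / 1e6, 2)
print(f"{size_mb:,.2f} Mb")  # prints "1,144.14 Mb"
```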
Assembly statistics of C. erythropterus.
| Type | Contig length (bp) | Scaffold length (bp) | Contig number | Scaffold number |
|---|---|---|---|---|
| Total | 1,085,492,200 | 1,085,510,300 | 231 | 50 |
| Max | 46,701,910 | 73,070,995 | — | — |
| Number ≥ 2,000 bp | — | — | 231 | 50 |
| N50 | 23,286,394 | 42,399,299 | 18 | 11 |
| N60 | 20,193,970 | 41,239,264 | 23 | 13 |
| N70 | 13,953,221 | 39,512,133 | 29 | 16 |
| N80 | 8,516,902 | 39,089,359 | 39 | 19 |
| N90 | 3,227,172 | 37,095,974 | 60 | 21 |
Summary of the 24 assembled chromosomes of C. erythropterus.
| Sequence ID | Sequence length (bp) | Sequence ID | Sequence length (bp) |
|---|---|---|---|
| Chr1 | 38,364,365 | Chr13 | 54,232,047 |
| Chr2 | 41,374,698 | Chr14 | 47,491,587 |
| Chr3 | 73,070,995 | Chr15 | 42,777,030 |
| Chr4 | 39,512,133 | Chr16 | 48,609,862 |
| Chr5 | 39,089,359 | Chr17 | 42,399,299 |
| Chr6 | 35,868,044 | Chr18 | 39,783,364 |
| Chr7 | 45,130,715 | Chr19 | 39,191,619 |
| Chr8 | 47,279,267 | Chr20 | 39,167,548 |
| Chr9 | 39,627,888 | Chr21 | 41,239,264 |
| Chr10 | 61,666,924 | Chr22 | 37,095,974 |
| Chr11 | 59,924,899 | Chr23 | 33,623,848 |
| Chr12 | 61,677,361 | Chr24 | 31,722,787 |
| Placed | 1,079,920,877 | | |
| Unplaced | 5,589,423 | | |
| Total | 1,085,510,300 | | |
| Percentage placed | 99.49% | | |
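The chromosome lengths above are enough to re-derive the placed total, the placed percentage, and the scaffold N50 in the assembly statistics table. The sketch below is a minimal Python check, not the authors' pipeline; the helper `nx` is a hypothetical name. The unplaced scaffolds are not listed individually, so the assembly total is passed in explicitly. This is safe here because the unplaced sequence sums to 5,589,423 bp, so every unplaced scaffold is shorter than the smallest chromosome.

```python
def nx(lengths, total, fraction=0.5):
    """Return (Nx, Lx): the length and rank of the sequence at which the
    cumulative sum of descending-sorted lengths first reaches fraction * total."""
    running = 0
    for rank, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= total * fraction:
            return length, rank
    raise ValueError("listed sequences do not reach the requested fraction")


# Chr1..Chr24 lengths (bp), copied from the table above.
chromosomes = [
    38_364_365, 41_374_698, 73_070_995, 39_512_133, 39_089_359, 35_868_044,
    45_130_715, 47_279_267, 39_627_888, 61_666_924, 59_924_899, 61_677_361,
    54_232_047, 47_491_587, 42_777_030, 48_609_862, 42_399_299, 39_783_364,
    39_191_619, 39_167_548, 41_239_264, 37_095_974, 33_623_848, 31_722_787,
]
total_assembly = 1_085_510_300  # placed + unplaced, from the table

placed = sum(chromosomes)
placed_pct = round(100 * placed / total_assembly, 2)
n50, l50 = nx(chromosomes, total_assembly)

print(placed, placed_pct)  # 1079920877 99.49
print(n50, l50)            # 42399299 11
```

The same helper called with `fraction=0.9` returns the scaffold N90 and its sequence count listed in the assembly statistics table.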
Fig. 2 Hi-C chromosome contact map.
Results of the BUSCO assessment of C. erythropterus.
| Type | Number |
|---|---|
| Complete BUSCOs (C) | 3,304 (98.5%) |
| Complete and single-copy BUSCOs (S) | 3,275 (97.6%) |
| Complete and duplicated BUSCOs (D) | 29 (0.9%) |
| Fragmented BUSCOs (F) | 14 (0.4%) |
| Missing BUSCOs (M) | 36 (1.1%) |
| Total BUSCO groups searched | 3,354 |
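The BUSCO categories are related by simple arithmetic: complete = single-copy + duplicated, and complete + fragmented + missing = total groups searched. The Python sketch below checks those identities against the table and rebuilds the percentages; the variable names are illustrative, and the one-line summary imitates BUSCO's short-summary notation rather than quoting output from the source.

```python
# Counts copied from the BUSCO table above.
counts = {"S": 3275, "D": 29, "F": 14, "M": 36}
total_groups = 3354

complete = counts["S"] + counts["D"]
assert complete == 3304
assert complete + counts["F"] + counts["M"] == total_groups


def pct(n: int) -> float:
    """Percentage of all searched BUSCO groups, rounded to one decimal."""
    return round(100 * n / total_groups, 1)


summary = (
    f"C:{pct(complete)}%[S:{pct(counts['S'])}%,D:{pct(counts['D'])}%],"
    f"F:{pct(counts['F'])}%,M:{pct(counts['M'])}%,n:{total_groups}"
)
print(summary)  # C:98.5%[S:97.6%,D:0.9%],F:0.4%,M:1.1%,n:3354
```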
Classification of repeat elements in the C. erythropterus genome.
| Type | Denovo + Repbase: length (bp) | Denovo + Repbase: % in genome | TE proteins: length (bp) | TE proteins: % in genome | Combined TEs: length (bp) | Combined TEs: % in genome |
|---|---|---|---|---|---|---|
| DNA | 58,226,942 | 5.36 | 7,413,708 | 0.68 | 62,122,195 | 5.72 |
| LINE | 7,641,127 | 0.70 | 16,986,628 | 1.56 | 20,557,781 | 1.89 |
| SINE | 1,634,833 | 0.15 | 0 | 0 | 1,634,833 | 0.15 |
| LTR | 467,225,494 | 43.04 | 32,239,687 | 2.97 | 469,221,600 | 43.23 |
| Unknown | 21,969,188 | 2.02 | 0 | 0 | 21,969,188 | 2.02 |
| Total | 551,340,511 | 50.79 | 56,626,202 | 5.22 | 557,279,616 | 51.34 |
Statistics of protein-coding gene models annotated in the C. erythropterus genome.
| Category | Gene set | Number | Average transcript length (bp) | Average CDS length (bp) | Average exons per gene | Average exon length (bp) | Average intron length (bp) |
|---|---|---|---|---|---|---|---|
| | Augustus | 41,060 | 10,388.42 | 1,140.26 | 6.27 | 181.73 | 1,753.44 |
| | GlimmerHMM | 108,494 | 8,823.60 | 566.91 | 3.86 | 146.98 | 2,889.85 |
| | SNAP | 63,613 | 17,053.13 | 684.81 | 5.08 | 134.69 | 4,007.40 |
| | Geneid | 31,402 | 20,537.73 | 1,833.65 | 6.23 | 294.09 | 3,572.90 |
| | Genscan | 32,242 | 23,196.75 | 1,545.59 | 8.10 | 190.80 | 3,049.14 |
| Homolog | | 77,362 | 5,250.48 | 793.11 | 3.88 | 204.37 | 1,547.29 |
| Homolog | | 32,561 | 11,939.92 | 1,570.24 | 6.95 | 225.83 | 1,741.90 |
| Homolog | | 34,130 | 10,738.32 | 1,553.64 | 6.48 | 239.75 | 1,675.95 |
| Homolog | | 40,317 | 9,754.61 | 1,366.59 | 5.83 | 234.28 | 1,735.50 |
| Homolog | | 41,063 | 8,962.70 | 1,270.36 | 5.57 | 228.06 | 1,683.09 |
| Homolog | | 34,358 | 11,162.86 | 1,430.97 | 6.45 | 222.02 | 1,787.22 |
| RNAseq | PASA | 116,439 | 12,899.85 | 1,279.78 | 7.79 | 164.34 | 1,711.96 |
| RNAseq | Cufflinks | 80,918 | 18,982.81 | 3,213.28 | 8.52 | 376.93 | 2,095.63 |
| | EVM | 37,168 | 14,243.82 | 1,274.10 | 7.17 | 177.66 | 2,101.51 |
| | PASA-update | 36,819 | 14,260.02 | 1,288.94 | 7.22 | 178.52 | 2,085.34 |
| | Final set | 33,706 | 15,469.83 | 1,363.50 | 7.77 | 175.58 | 2,085.05 |
Comparison of gene models annotated in the C. erythropterus genome with those of other teleosts.
| Species | Number | Average transcript length (bp) | Average CDS length (bp) | Average exons per gene | Average exon length (bp) | Average intron length (bp) |
|---|---|---|---|---|---|---|
| C. erythropterus | 33,706 | 15,469.83 | 1,363.50 | 7.77 | 175.58 | 2,085.05 |
| | 42,645 | 17,491.76 | 1,690.94 | 9.95 | 169.90 | 1,765.00 |
| | 45,899 | 16,217.28 | 1,585.31 | 9.23 | 171.79 | 1,778.31 |
| | 44,351 | 16,478.32 | 1,645.32 | 9.64 | 170.66 | 1,716.65 |
| | 34,414 | 15,105.52 | 1,309.42 | 7.86 | 166.68 | 2,012.35 |
| | 43,518 | 15,745.34 | 1,727.67 | 9.94 | 173.73 | 1,567.13 |
| | 32,715 | 26,262.69 | 1,703.09 | 9.44 | 180.32 | 2,908.24 |
Fig. 3 Comparison of the predicted gene models in the C. erythropterus genome with those of other species. (a) CDS length distribution. (b) Exon length distribution. (c) Exon number distribution. (d) Gene length distribution. (e) Intron length distribution.
The number of genes with homology or functional classification for C. erythropterus.
| Type | Number | Percent (%) |
|---|---|---|
| Total | 33,706 | — |
| SwissProt | 22,560 | 66.9 |
| Nr | 27,865 | 82.7 |
| KEGG | 23,194 | 68.8 |
| InterPro | 32,791 | 97.3 |
| GO | 29,853 | 88.6 |
| Pfam | 21,159 | 62.8 |
| Annotated | 33,041 | 98.0 |
| Unannotated | 665 | 2.0 |
Fig. 4 Venn diagram of the number of genes with functional annotation using multiple public databases.
Classification of ncRNAs in the C. erythropterus genome.
| Type | Subtype | Copy number | Average length (bp) | Total length (bp) | % of genome |
|---|---|---|---|---|---|
| miRNA | | 1,609 | 114.79 | 184,694 | 0.017014 |
| tRNA | | 8,135 | 75.75 | 616,216 | 0.056767 |
| rRNA | rRNA | 1,251 | 133.09 | 166,498 | 0.015338 |
| rRNA | 18S | 49 | 448.49 | 21,976 | 0.002024 |
| rRNA | 28S | 105 | 278.25 | 29,216 | 0.002691 |
| rRNA | 5.8S | 8 | 157.00 | 1,256 | 0.000116 |
| rRNA | 5S | 1,089 | 104.73 | 114,050 | 0.010507 |
| snRNA | snRNA | 1,060 | 152.67 | 161,831 | 0.014908 |
| snRNA | C/D-box | 231 | 145.46 | 33,601 | 0.003095 |
| snRNA | H/ACA-box | 93 | 151.15 | 14,057 | 0.001295 |
| snRNA | splicing | 690 | 155.31 | 107,164 | 0.009872 |
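The derived columns of the ncRNA table can be recomputed from the copy number and total length. The Python sketch below does this for the miRNA and tRNA rows. It assumes the "% of genome" column is relative to the scaffold-level assembly total of 1,085,510,300 bp; the source does not state the denominator, but that value reproduces the reported figures. The names `ASSEMBLY_BP` and `summarize` are illustrative.

```python
# Assumption: the denominator is the scaffold-level assembly total (bp).
ASSEMBLY_BP = 1_085_510_300

# (copy number, total length in bp), copied from the table above.
rows = {
    "miRNA": (1_609, 184_694),
    "tRNA": (8_135, 616_216),
}


def summarize(copies: int, total_bp: int) -> tuple[float, float]:
    """Return (average length in bp, percent of the assembly)."""
    return round(total_bp / copies, 2), round(100 * total_bp / ASSEMBLY_BP, 6)


stats = {name: summarize(*values) for name, values in rows.items()}
for name, (avg_len, pct_genome) in stats.items():
    print(name, avg_len, pct_genome)
# miRNA 114.79 0.017014
# tRNA 75.75 0.056767
```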
| Measurement(s) | whole genome sequencing |
| Technology Type(s) | Oxford Nanopore Sequencing |
| Sample Characteristic - Organism | Chanodichthys erythropterus |