Hui Ge1,2, Kebing Lin1, Mi Shen3, Shuiqing Wu1, Yilei Wang2, Ziping Zhang4, Zhiyong Wang2, Yong Zhang5, Zhen Huang6, Chen Zhou1, Qi Lin1, Jianshao Wu1, Lei Liu3, Jiang Hu3, Zhongchi Huang1, Leyun Zheng1.
Abstract
The red-spotted grouper Epinephelus akaara (E. akaara) is one of the most economically important marine fish in China, Japan and South-East Asia and is a threatened species. The species is also considered a good model for studies of sex inversion, development, genetic diversity and immunity. Despite its importance, molecular resources for E. akaara remain limited and no reference genome has been published to date. In this study, we constructed a chromosome-level reference genome of E. akaara by taking advantage of long-read single-molecule sequencing and de novo assembly by Oxford Nanopore Technology (ONT) and Hi-C. A red-spotted grouper genome of 1.135 Gb was assembled from a total of 106.29 Gb polished Nanopore sequence (GridION, ONT), equivalent to 96-fold genome coverage. The assembled genome represents 96.8% completeness (BUSCO) with a contig N50 length of 5.25 Mb and a longest contig of 25.75 Mb. The contigs were clustered and ordered onto 24 pseudochromosomes covering approximately 95.55% of the genome assembly with Hi-C data, with a scaffold N50 length of 46.03 Mb. The genome contained 43.02% repeat sequences and 5,480 noncoding RNAs. Furthermore, combined with several RNA-seq data sets, 23,808 (99.5%) genes were functionally annotated from a total of 23,923 predicted protein-coding sequences. The high-quality chromosome-level reference genome of E. akaara was assembled for the first time and will be a valuable resource for molecular breeding and functional genomics studies of red-spotted grouper in the future.
Keywords: Hi-C; Nanopore sequencing; RNA-seq; genome assembly; red-spotted grouper
Year: 2019 PMID: 31325912 PMCID: PMC6899872 DOI: 10.1111/1755-0998.13064
Source DB: PubMed Journal: Mol Ecol Resour ISSN: 1755-098X Impact factor: 7.090
Figure 1. The red‐spotted grouper (Epinephelus akaara) [Colour figure can be viewed at http://wileyonlinelibrary.com]
Genome assembly statistics and postprocessing of E. akaara
| Genome assembly of E. akaara | Canu | Canu + Nanopolish | Canu + Nanopolish + Pilon 5X | Canu + Nanopolish + Pilon 5X + Hi‐C |
|---|---|---|---|---|
| Total assembly size of contigs (bp) | 1,124,444,644 | 1,131,374,363 | 1,135,521,910 | |
| Number of contigs | 2,055 | 2,055 | 2,055 | |
| N50 contig length (bp) | 5,194,129 | 5,229,032 | 5,248,933 | |
| N90 contig length (bp) | 130,058 | 131,300 | 132,032 | |
| Longest contig (bp) | 25,455,544 | 25,638,949 | 25,751,680 | |
| Total assembly size of scaffolds (bp) | | | | 1,135,726,210 |
| Number of scaffolds | | | | 24 |
| N50 scaffold length (bp) | | | | 46,028,906 |
| N90 scaffold length (bp) | | | | 35,838,671 |
Unanchored contig base count is not included.
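The N50 and N90 values in the table follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of the assembly size. The sketch below illustrates that definition on a made-up list of lengths; the function name `nx_length` and the toy data are assumptions for illustration, not the authors' pipeline or the real contig lengths.

```python
def nx_length(lengths, fraction=0.5):
    """Return the Nx length for a list of sequence lengths.

    Nx is the length L such that sequences of length >= L together
    cover at least `fraction` of the total assembly size.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length


# Toy lengths (hypothetical, total = 30), not data from the paper.
toy_lengths = [10, 8, 6, 4, 2]
toy_n50 = nx_length(toy_lengths, 0.5)  # running total reaches 15 at the 8 bp sequence
toy_n90 = nx_length(toy_lengths, 0.9)  # running total reaches 27 at the 4 bp sequence
```

Applied to the real contig or scaffold lengths of an assembly, the same function would be expected to reproduce the N50 and N90 rows, provided the same set of sequences is counted.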
Figure 2. Red‐spotted grouper genome contig contact matrix using Hi‐C data. The colour bar indicates the contact density from red (high) to white (low) [Colour figure can be viewed at http://wileyonlinelibrary.com]
Summary statistics of annotated repeats
| Type | Repbase TEs: length (bp) | Repbase TEs: per cent of sequence (%) | RepeatModeler: length (bp) | RepeatModeler: per cent of sequence (%) | TE proteins: length (bp) | TE proteins: per cent of sequence (%) | Combined TEs: length (bp) | Combined TEs: per cent of sequence (%) |
|---|---|---|---|---|---|---|---|---|
| DNA | 72,547,595 | 6.39 | 152,578,623 | 13.44 | 12,639,588 | 1.11 | 190,003,264 | 16.73 |
| LINE | 54,462,165 | 4.8 | 48,239,766 | 4.25 | 47,434,545 | 4.18 | 72,820,879 | 6.41 |
| LTR | 15,915,741 | 1.4 | 13,792,952 | 1.21 | 10,697,269 | 0.94 | 79,800,582 | 7.03 |
| SINE | 7,048,847 | 0.62 | 5,336,045 | 0.47 | 0 | 0 | 7,684,877 | 0.68 |
| Other | 2,402,087 | 0.21 | 14,324,885 | 1.26 | 620,554 | 0.06 | 47,787,296 | 4.21 |
| Unknown | 569,614 | 0.05 | 102,421,979 | 9.02 | 0 | 0 | 90,408,213 | 7.96 |
| Total | 152,946,049 | 13.47 | 336,694,250 | 29.65 | 71,391,956 | 6.29 | 488,505,111 | 43.02 |
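As a quick consistency check on the Combined TEs column, the per-class lengths can be summed and compared with the reported total and percentage. The sketch below assumes the percentages are taken relative to the polished contig assembly size (1,135,521,910 bp) from the assembly statistics table; that choice of denominator is an assumption, since the source does not state it.

```python
# Combined TE lengths (bp) per repeat class, copied from the table.
combined_te_bp = {
    "DNA": 190_003_264,
    "LINE": 72_820_879,
    "LTR": 79_800_582,
    "SINE": 7_684_877,
    "Other": 47_787_296,
    "Unknown": 90_408_213,
}

# Assumed denominator: total contig assembly size after Pilon polishing.
assembly_bp = 1_135_521_910

combined_total_bp = sum(combined_te_bp.values())
combined_percent = round(100 * combined_total_bp / assembly_bp, 2)

print(combined_total_bp, combined_percent)
```

Under that assumption the class lengths add up to the reported total of 488,505,111 bp, and the percentage agrees with the 43.02% repeat content quoted in the abstract.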
Summary statistics of predicted protein‐coding genes
| Gene set | Total number of genes | Average gene length (bp) | Average CDS length (bp) | Average number of exons per gene | Average exon length (bp) | Average intron length (bp) |
|---|---|---|---|---|---|---|
| De novo | ||||||
| Augustus | 29,024 | 16,738 | 1,413.28 | 8.33 | 169.76 | 2,092.11 |
| SNAP | 59,925 | 26,835.73 | 1,417.17 | 10.27 | 137.93 | 2,740.60 |
| Homology | ||||||
| | 53,989 | 5,519.60 | 901.7 | 4.39 | 205.53 | 1,363.34 |
| | 33,543 | 11,471.97 | 1,387.91 | 7.1 | 195.57 | 1,654.05 |
| | 38,897 | 8,415.67 | 1,140.39 | 5.73 | 199.1 | 1,538.84 |
| | 41,023 | 8,218.57 | 1,033.88 | 5.62 | 183.82 | 1,553.69 |
| | 36,878 | 11,137.05 | 1,321.62 | 7.1 | 186.2 | 1,609.70 |
| | 41,660 | 9,565.48 | 1,305.17 | 6.56 | 198.92 | 1,485.36 |
| | 46,255 | 6,769.88 | 987.84 | 5.1 | 193.58 | 1,409.27 |
| | 33,739 | 9,958.68 | 1,191.94 | 6.45 | 184.89 | 1,609.54 |
| Final set | ||||||
| Evm | 23,923 | 23,162.39 | 1,791.15 | 10.9 | 164.27 | 2,157.86 |
Summary statistics of functionally annotated protein‐coding genes
| Type | Number | Per cent (%) |
|---|---|---|
| SwissProt | 22,223 | 92.9 |
| TrEMBL | 23,793 | 99.5 |
| KEGG | 13,828 | 57.8 |
| GO | 16,469 | 68.8 |
| InterProScan | 21,892 | 91.5 |
| Total Annotated Genes | 23,808 | 99.5 |
| Predicted Genes | 23,923 | – |
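The percentages in this table appear to be each database's annotated gene count divided by the 23,923 predicted genes. The short sketch below recomputes them from the counts in the table; the variable names are illustrative only.

```python
# Annotated gene counts per database, copied from the table.
annotated_counts = {
    "SwissProt": 22_223,
    "TrEMBL": 23_793,
    "KEGG": 13_828,
    "GO": 16_469,
    "InterProScan": 21_892,
    "Total Annotated Genes": 23_808,
}
predicted_genes = 23_923

# Percentage of predicted genes with an annotation, to one decimal place.
annotated_percent = {
    name: round(100 * count / predicted_genes, 1)
    for name, count in annotated_counts.items()
}

for name, percent in annotated_percent.items():
    print(f"{name}: {percent}%")
```

With these counts, every recomputed value matches the Per cent column, including the 99.5% overall annotation rate quoted in the abstract.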
Summary statistics of noncoding RNA
| Type | Copy number | Average length (bp) | Total length (bp) | Percentage (%) of genome |
|---|---|---|---|---|
| rRNA | ||||
| rRNA | 792 | 385.48 | 305,299 | 0.022804 |
| 18S | 47 | 1,742.36 | 81,891 | 0.006117 |
| 28S | 35 | 4,066.66 | 142,333 | 0.010632 |
| 5.8S | 47 | 148.04 | 6,958 | 0.00052 |
| 5S | 663 | 111.79 | 74,117 | 0.005536 |
| snRNA | ||||
| snRNA | 1,252 | 144.96 | 181,492 | 0.013557 |
| C/D‐box | 230 | 122.29 | 28,127 | 0.002101 |
| H/ACA‐box | 107 | 146.43 | 15,668 | 0.00117 |
| splicing | 915 | 150.49 | 137,697 | 0.010285 |
| miRNA | 828 | 85.36 | 70,675 | 0.005279 |
| tRNA | 2,608 | 74.68 | 194,771 | 0.014548 |
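The 5,480 noncoding RNAs quoted in the abstract can be recovered from this table by adding the four top-level classes, and the subtype rows can be checked against their class totals in the same way. A minimal sketch using the copy numbers above (the grouping of subtypes under rRNA and snRNA is read from the table layout):

```python
# Copy numbers of the top-level noncoding RNA classes, from the table.
class_copies = {"rRNA": 792, "snRNA": 1_252, "miRNA": 828, "tRNA": 2_608}

# Subtype copy numbers listed under the rRNA and snRNA class rows.
rrna_subtypes = {"18S": 47, "28S": 35, "5.8S": 47, "5S": 663}
snrna_subtypes = {"C/D-box": 230, "H/ACA-box": 107, "splicing": 915}

total_ncrna = sum(class_copies.values())
rrna_sum = sum(rrna_subtypes.values())
snrna_sum = sum(snrna_subtypes.values())

print(total_ncrna, rrna_sum, snrna_sum)
```

The class totals sum to 5,480, and the subtype rows add up to the 792 rRNA and 1,252 snRNA copies reported for their classes.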