De novo assembly of a chromosome-level reference genome of red-spotted grouper (Epinephelus akaara) using nanopore sequencing and Hi-C.

Hui Ge¹,², Kebing Lin¹, Mi Shen³, Shuiqing Wu¹, Yilei Wang², Ziping Zhang⁴, Zhiyong Wang², Yong Zhang⁵, Zhen Huang⁶, Chen Zhou¹, Qi Lin¹, Jianshao Wu¹, Lei Liu³, Jiang Hu³, Zhongchi Huang¹, Leyun Zheng¹.

Abstract

The red-spotted grouper Epinephelus akaara (E. akaara) is one of the most economically important marine fish in China, Japan and South-East Asia and is a threatened species. The species is also considered a good model for studies of sex inversion, development, genetic diversity and immunity. Despite its importance, molecular resources for E. akaara remain limited and no reference genome has been published to date. In this study, we constructed a chromosome-level reference genome of E. akaara by taking advantage of long-read single-molecule sequencing and de novo assembly by Oxford Nanopore Technology (ONT) and Hi-C. A red-spotted grouper genome of 1.135 Gb was assembled from a total of 106.29 Gb polished Nanopore sequence (GridION, ONT), equivalent to 96-fold genome coverage. The assembled genome represents 96.8% completeness (BUSCO) with a contig N50 length of 5.25 Mb and a longest contig of 25.75 Mb. The contigs were clustered and ordered onto 24 pseudochromosomes covering approximately 95.55% of the genome assembly with Hi-C data, with a scaffold N50 length of 46.03 Mb. The genome contained 43.02% repeat sequences and 5,480 noncoding RNAs. Furthermore, combined with several RNA-seq data sets, 23,808 (99.5%) genes were functionally annotated from a total of 23,923 predicted protein-coding sequences. The high-quality chromosome-level reference genome of E. akaara was assembled for the first time and will be a valuable resource for molecular breeding and functional genomics studies of red-spotted grouper in the future.

Keywords: Hi-C; Nanopore sequencing; RNA-seq; genome assembly; red-spotted grouper

Year: 2019 PMID: 31325912 PMCID: PMC6899872 DOI: 10.1111/1755-0998.13064

Source DB: PubMed Journal: Mol Ecol Resour ISSN: 1755-098X Impact factor: 7.090

INTRODUCTION

Groupers (Epinephelus spp.) are economically important marine fish that are distributed worldwide in tropical and subtropical waters (Heemstra & Randall, 1993). Grouper aquaculture is well developed in China, Japan and South‐East Asia because of the high nutritional value, high market value and large consumer demand for these fish (Pierre et al., 2010). In China, the annual production of groupers exceeds 100,000 tons, generating gross revenue of $1.5 billion (The 8th National Grouper Breeding and Breeding Industrialization Forum in China). Among the Serranidae species, red‐spotted grouper (Epinephelus akaara), also called Hong Kong grouper, is particularly preferred by consumers and is highly prized in China (Figure 1). Because the wild population of red‐spotted grouper has declined sharply in the past few decades (Morris, Roberts, & Hawkins, 2000), this species has been listed as endangered since 2003 (The 2004 IUCN red list of threatened species, http://www.iucnredlist.org). Red‐spotted grouper is a protogynous sequential hermaphrodite: most individuals start out as females and develop into males after one or several years, which increases the cost of obtaining mature male fish for breeding. Thus, it is important to investigate the mechanisms controlling the sex inversion of red‐spotted grouper (Huang, Zhou, Li, & Gui, 2009). Outbreaks of infectious diseases, such as the disease caused by nervous necrosis virus (NNV) infection, have become a serious threat to the red‐spotted grouper culture industry (Tso & Lu, 2018). Investigation of molecular immunity and disease resistance mechanisms in red‐spotted grouper will benefit the sustainable development of this industry. Additionally, red‐spotted grouper is considered to be a good model for studies of sex inversion, development, genetic diversity and immunity (Huang et al., 2009; Mao, Lei, Alex, Hong, & Wang, 2012).

Figure 1

The red‐spotted grouper (Epinephelus akaara) [Colour figure can be viewed at http://wileyonlinelibrary.com]

Currently, genomes from approximately 181 species of fish have been sequenced and are available at NCBI (https://www.ncbi.nlm.nih.gov/genome/browse#!/overview/fish). However, only limited genetic information is available for molecular breeding and functional genomics of E. akaara because no genome of this species is available. Genomic information, as an essential conservation and management tool (Nguyen et al., 2007; Timmins‐Schiffman, Friedman, Metzger, White, & Roberts, 2013), is needed for the protection and long‐term survival of the endangered red‐spotted grouper. Genomic data and resources can aid us in fully understanding the diversity and population dynamics of the species and can provide sound support for suitable conservation measures. Compared to second‐generation sequencing technologies, which construct a complete genome by assembling numerous short sequence reads and thus leave many gaps, third‐generation sequencing technologies produce long reads and enable the production of a genome with a high level of completeness. Recent advances in Nanopore sequencing technology (ONT, Oxford, UK) enable the production of long single reads, with the longest reaching 2,272,580 bases (https://nanoporetech.com/about-us/news/longer-and-longer-dna-sequence-more-two-million-bases-now-achieved-nanopore). Long‐read assemblies can span difficult regions that have otherwise remained unsequenced. For example, the ultra‐long reads produced by ONT enabled assembly and phasing of the 4‐Mb major histocompatibility complex (MHC) locus in the human genome (Jain et al., 2018). Recently, many researchers have reported the value of long Nanopore reads in improving the contiguity and completeness of assembled genomes (Austin et al., 2017; Mondal, Rawal, Gaikwad, Sharma, & Singh, 2017; Tan et al., 2018). In the case of Murray cod (Maccullochella peelii), the inclusion of Nanopore reads for scaffolding followed by gap‐closing using Illumina data led to a 29% reduction in the number of scaffolds and a 55% and 54% increase in the scaffold and contig N50, respectively (Austin et al., 2017). Given the potential for low cost and reduced time requirements of Nanopore sequencing, Nanopore long read‐only assembly is beginning to be adopted in the genomics field (Jain et al., 2018; Michael et al., 2018; Schmidt et al., 2017). However, its application has rarely been reported except in model animals, such as flies and nematodes (Eccles et al., 2018; Miller, Staber, Zeitlinger, & Hawley,). In this study, we assembled a high‐quality genome of red‐spotted grouper using a combination of Nanopore long‐read sequencing and Hi‐C sequencing. Our reference genome should lay a strong foundation for genome science‐based molecular breeding of red‐spotted grouper with improved phenotypes of high yield, rapid growth and resistance to diseases and stress.

MATERIALS AND METHODS

DNA sampling and sequencing

A male adult E. akaara (NCBI taxonomy ID: 215347), bred at the Fisheries Research Institute of Fujian, Xiamen, China (XMRG2392), was used for DNA sequencing (Figure 1). The DNA of E. akaara was extracted from muscle tissue using a Blood & Cell Culture DNA Midi Kit (Q13343, Qiagen, CA, USA), and then, the genomic DNA was used to construct Illumina and Nanopore libraries. An Illumina library with insert sizes of 400 bp was constructed using an Illumina TruSeq Nano DNA Library Prep Kit and then sequenced on an Illumina HiSeq X Ten instrument. The raw data were filtered using the following strategies (Luo et al., 2012): (a) filtered reads with adapters; (b) trimmed reads with 2 low‐quality bases at the 5’ end and 3 low‐quality bases at the 3’ end; (c) filtered reads with N bases more than 10%; (d) filtered duplicated reads due to polymerase chain reaction amplification; and (e) filtered reads with low‐quality bases (≤5) >50%. All the obtained clean reads were applied to estimate the genome size and heterozygosity by a k‐mer analysis. A 17‐mer frequency distribution analysis was performed on the quality‐filtered reads from the Illumina sequencing to estimate the genome size of the red‐spotted grouper using jellyfish (Marçais & Kingsford, 2011). For each Nanopore library, approximately 8 µg of gDNA was size‐selected (10–50 kb) with a Blue Pippin (Sage Science, Beverly, MA) and processed using the Ligation sequencing 1D kit (SQK‐LSK108, ONT, UK) according to the manufacturer's instructions. Nineteen libraries were constructed and sequenced on nineteen different R9.4 FlowCells using the GridION X5 sequencer (ONT, UK) for 48 hr each at the Genome Center of Nextomics (Wuhan, China). Base calling was subsequently performed on fast5 files using the ONT Albacore software (v0.8.4), and the “passed filter” reads representing data of somewhat higher quality were used for further downstream analyses.
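
Filters (c) to (e) above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the pipeline used in the study: the names `keep_read` and `dedup` and the toy reads are hypothetical, quality values are assumed to be already decoded Phred integers, duplicates are approximated as exact sequence matches, and adapter filtering and end trimming are omitted.

```python
def keep_read(seq, quals, max_n_frac=0.10, low_q=5, max_low_frac=0.50):
    """Return True if a read passes the N-content and low-quality filters."""
    if not seq:
        return False
    n_frac = seq.upper().count("N") / len(seq)                # fraction of N bases
    low_frac = sum(q <= low_q for q in quals) / len(quals)    # fraction of bases with Q <= 5
    return n_frac <= max_n_frac and low_frac <= max_low_frac


def dedup(reads):
    """Drop exact duplicate sequences, keeping the first occurrence (assumed proxy for PCR duplicates)."""
    seen = set()
    unique = []
    for seq, quals in reads:
        if seq not in seen:
            seen.add(seq)
            unique.append((seq, quals))
    return unique


# Hypothetical toy reads: (sequence, per-base Phred qualities)
reads = [
    ("ACGTACGTAC", [30] * 10),            # clean read
    ("ACGTACGTAC", [30] * 10),            # exact duplicate of the first read
    ("ANNTACGTAC", [30] * 10),            # 20% N bases, removed
    ("ACGTTTGTAC", [2] * 6 + [30] * 4),   # 60% low-quality bases, removed
]
clean = [read for read in dedup(reads) if keep_read(*read)]
print(len(clean))
```

Only the first read survives in this toy set; the thresholds are exposed as keyword arguments so that they can be adjusted.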

RNA sampling and sequencing

Nine RNA samples were extracted from a male adult red‐spotted grouper (brain tissue) and a female adult red‐spotted grouper (eight tissues including brain, gill, gonad, heart, kidney, liver, muscle and spleen) using TRNzol Universal Reagent (Cat# DP424, TIANGEN). The adult E. akaara were bred at the Fisheries Research Institute of Fujian, Xiamen, China. The concentration and integrity of RNA samples were evaluated using a NanoDrop™ One UV–Vis spectrophotometer (Thermo Fisher Scientific, USA), Qubit® 3.0 Fluorometer (Invitrogen, USA) and Agilent Bioanalyzer 2100 (Agilent Technologies, USA). RNA samples with an integrity value over 7.2 were used for the subsequent cDNA library construction and sequencing. The cDNA library was prepared using the TruSeq Sample Preparation Kit (Illumina, CA, USA), and 150‐bp paired‐end sequencing was conducted on a HiSeq X Ten platform (Illumina). The clean data were obtained by removing reads containing adapters, reads containing poly‐N and low‐quality reads from the raw data.

Oxford Nanopore assembly, correction and validation

The Oxford Nanopore sequencing data were assembled using the complete Canu pipeline with default parameters (Koren et al., 2017). To further improve the accuracy of the assembly, nanopolish (v0.8.4) was used based on alignments from bwa mem -x ont2d (Jain et al., 2018). As Illumina data are known to have a lower overall error rate, five rounds of consensus correction were performed using Illumina reads mapped with bwa v0.7.12 (Li & Durbin, 2009) (bwa, RRID:SCR_010910) and pilon (Walker et al., 2014) (pilon, RRID:SCR_014731). To evaluate the completeness of the genome assembly, we used benchmarking universal single‐copy orthologs v3.0.1 (Simão, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015) (busco, RRID:SCR_015008) to search the annotated genes in the assembly. Additionally, we aligned all the RNA‐seq reads against the E. akaara genome assembly using HISAT2 (Kim, Langmead, & Salzberg, 2015) with default parameters.

Repeat annotation

The repeat sequences in the genome consisted of simple sequence repeats (SSRs), moderately repetitive sequences and highly repetitive sequences. The SSRs in the genome of E. akaara were identified with the MIcroSAtellite identification tool (misa, RRID:SCR_010765) (Thiel, Michalek, Varshney, & Graner, 2003), which can identify and locate simple and composite microsatellites. Repetitive sequences including tandem repeats and transposable elements (TEs) were searched for in the E. akaara genome. The tandem repeats finder (trf, v4.07b) was used to annotate the tandem repeats (Benson, 1999). The TEs were identified by a combination of homology‐based and de novo approaches. Repeat regions were searched against the repbase (v.22.11) library by repeatmasker v4.0.6 (Tarailo‐Graovac & Chen, 2009) (repeatmasker, RRID:SCR_012954). In addition, repeatmodeler v1.0.4 (repeatmodeler, RRID:SCR_015027) was used to construct a de novo repeat library, which built a repeat consensus database with classification information (Tarailo‐Graovac & Chen, 2009), and repeatmasker v4.0.6 was adopted to search the repeat regions against the built repeat library. RepeatProteinMask was used to search against the TE protein database using a WU‐BLASTX engine.
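
To make the notion of a simple sequence repeat concrete, the following Python sketch finds perfect microsatellites with regular expressions. It is not the MISA algorithm and does not handle composite microsatellites; the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are hypothetical choices for illustration, since the thresholds used in the study are not stated here.

```python
import re

# Hypothetical minimum copy numbers per motif length (1-6 bp); not taken from the paper.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect microsatellites in seq."""
    seq = seq.upper()
    hits = []
    for size, min_rep in sorted(min_repeats.items()):
        # A motif of `size` bases followed by at least (min_rep - 1) further copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. "AA", "ATAT")
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


# Toy sequence: an (AT)8 dinucleotide repeat and an (A)12 mononucleotide repeat
toy = "GGC" + "AT" * 8 + "CCG" + "A" * 12 + "TTG"
for start, motif, copies in find_ssrs(toy):
    print(start, motif, copies)
```

Positions are 0‐based; classifying hits by motif length gives the mono‐ to hexa‐nucleotide categories used in SSR summaries.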

Gene prediction

Gene models were constructed by incorporating ab initio prediction, homology‐based prediction and RNA‐seq‐assisted prediction. For ab initio gene prediction, repeat regions of the E. akaara genome were first masked based on the result from the repeat annotation, and then snap (V2013‐02‐16) (Korf, 2004) and augustus v3.2.2 (augustus: Gene Prediction, RRID:SCR_008417) (Stanke et al., 2006), with model parameters trained on homologous genes from BUSCO, were employed to generate gene structures. For homology‐based prediction, protein sequences from 8 sequenced vertebrates, Ctenopharyngodon idellus (Wang et al., 2015), Cynoglossus semilaevis (Chen et al., 2014), Takifugu rubripes (Kai et al., 2011), Gasterosteus aculeatus (Jones et al., 2012), Lates calcarifer (Shubha et al., 2016), Oreochromis niloticus (Conte, Gammerdinger, Bartie, Penman, & Kocher, 2017), Oryzias latipes (Kasahara et al., 2007) and Danio rerio (Genome Reference Consortium, GRCz11; released in May 2017, https://www.ncbi.nlm.nih.gov/genome/?term=Danio+rerio), were initially mapped onto the E. akaara genome using tBlastn (Mount, 2007). Subsequently, exonerate (v2.2.0) (Slater & Birney, 2005) was used to polish BLAST hits to obtain exact intron/exon positions. Furthermore, E. akaara RNA‐seq data from nine tissues sequenced in this study and four tissues provided by Sun Yat‐sen University were used to identify candidate exon regions and donor and acceptor sites by TransDecoder (https://github.com/TransDecoder/TransDecoder) based on RNA‐seq alignments to the genome using Tophat. Finally, the three respective annotation files were integrated using evidencemodeler (EVM) v1.1.1 (Haas et al., 2008) to produce a consensus gene set. The gene set was aligned to the transposon database by transposonpsi (v08222010) with default parameters. Any gene with homology to transposons was removed from the final gene set.

Gene function annotation

The functional annotation of the predicted genes of E. akaara was performed by alignment to the SwissProt, TrEMBL, KEGG and GO databases using blastall and kaas (v2.1) (Moriya, Itoh, Okuda, Yoshizawa, & Kanehisa, 2007). Motifs and domains were annotated using interproscan v5.24 (Jones et al., 2014) (interproscan, RRID:SCR_005829) by searching against publicly available databases including prodom (prodom, RRID:SCR_006969), prints (prints, RRID:SCR_003412), pfam (pfam, RRID:SCR_004726), smart, panther (panther, RRID:SCR_004869) and prosite (prosite, RRID:SCR_003457).

Noncoding RNA annotation

Noncoding RNAs, including rRNAs, snRNAs, miRNAs and tRNAs, were identified in the E. akaara genome by adopting infernal v1.1.2 (Nawrocki & Eddy, 2013) with the rfam database (release 13.0) (Kalvari et al., 2018) and BLASTN (Camacho et al., 2009) (E‐value ≤ 1e−5). Transfer RNA was predicted using trnascan‐se v1.3.1 (trnascan‐se, RRID:SCR_010835) software with default parameters for eukaryotes (Lowe & Eddy, 1997). The rRNAs and their subunits were predicted using rnammer v1.2 (Lagesen et al., 2007).

Chromosome assembly by Hi‐C

An improved Hi‐C procedure (Rao et al., 2014) was adapted for red‐spotted grouper bred at the Fisheries Research Institute of Fujian, Xiamen, China (XMRG201305). In brief, a 100 μl blood sample was treated with 1% formaldehyde for 10 min at room temperature to induce cross‐linking. The reaction was quenched by adding 2.5 M glycine to 0.2 M for 5 min. Nuclei were digested with 100 units of DpnII and marked by biotin‐14‐dCTP (Invitrogen), and then ligated by T4 DNA Ligase. After incubating overnight to reverse cross‐links, the ligated DNA was sheared into 300‐ to 600‐bp fragments. The DNA fragments were blunt‐end repaired and A‐tailed, followed by purification through biotin–streptavidin‐mediated pulldown. Finally, the Hi‐C libraries were quantified and sequenced on an Illumina NovaSeq 6000 instrument. The Hi‐C library generated 112.83 Gb of data (Table 1), of which 384.80 million clean paired‐end reads were aligned using bowtie2 (v2.3.2) in end‐to‐end mode (--very-sensitive -L 30). The two reads of each pair were mapped to the genome independently, but only the pairs for which both reads could be uniquely mapped to the assembled genome were used in further analysis. In total, 262.94 million paired reads could be uniquely mapped to the flank sequences of DpnII‐cleaved sites in the assembled genome. Reads with low mapping quality (maq < 20), multiple hits, duplications and singletons were discarded. Subsequently, the contact counts between contigs were calculated and normalized by the number of DpnII digestion sites on the draft genome.
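
The filtering and normalization described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure actually used: the names `count_sites` and `normalized_contacts` are hypothetical, read pairs are represented as plain tuples instead of alignment records, and dividing by the product of the two contigs' DpnII site counts is only one plausible normalization. The DpnII recognition sequence GATC is the one fixed fact the code relies on.

```python
from collections import Counter

DPNII_SITE = "GATC"  # DpnII recognition sequence


def count_sites(seq, site=DPNII_SITE):
    """Count occurrences of a restriction site in a contig sequence."""
    return seq.upper().count(site)


def normalized_contacts(pairs, contigs, min_mapq=20):
    """Count inter-contig Hi-C links and normalize by restriction-site counts.

    pairs:   iterable of (contig_a, mapq_a, contig_b, mapq_b), one per read pair
    contigs: dict mapping contig name -> sequence
    """
    sites = {name: count_sites(seq) for name, seq in contigs.items()}
    raw = Counter()
    for a, mapq_a, b, mapq_b in pairs:
        # Keep pairs that link two different contigs with both mates well mapped
        if a != b and mapq_a >= min_mapq and mapq_b >= min_mapq:
            raw[tuple(sorted((a, b)))] += 1
    # Assumed normalization: links divided by the product of site counts
    return {
        pair: n / (sites[pair[0]] * sites[pair[1]])
        for pair, n in raw.items()
        if sites[pair[0]] and sites[pair[1]]
    }


# Hypothetical toy data
contigs = {"c1": "GATC" * 4 + "AAAA", "c2": "TTGATCTTGATC", "c3": "GATCAA"}
pairs = [("c1", 60, "c2", 60)] * 8 + [("c1", 60, "c3", 10)] + [("c2", 30, "c3", 30)]
print(normalized_contacts(pairs, contigs))
```

In the toy data the c1/c3 pair is dropped because one mate has a mapping quality below 20, which mirrors the filter described in the text.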

Table 1

Genome assembly statistics and postprocessing of E. akaara

Genome assembly of E. akaara (2n = 24)	Canu	Canu + Nanopolish	Canu + Nanopolish+Pilon 5X	Canu + Nanopolish+Pilon 5X + Hi‐C
Total assembly size of contigs (bp)	1,124,444,644	1,131,374,363	1,135,521,910
Number of contigs	2,055	2,055	2,055
N50 contig length (bp)	5,194,129	5,229,032	5,248,933
N90 contig length (bp)	130,058	131,300	132,032
Longest contig (bp)	25,455,544	25,638,949	25,751,680
Total assembly size of scaffolds (bp)				1,135,726,210ᵃ
Number of scaffolds				24
N50 scaffold length (bp)				46,028,906
N90 scaffold length (bp)				35,838,671

ᵃ Unanchored contig base count is not included.

We then used Lachesis (https://github.com/shendurelab/LACHESIS) to cluster, order and orient the contigs. First, 2,055 contigs were clustered into 24 chromosome groups (Figure 2) according to the agglomerative hierarchical clustering algorithm with the following parameters: CLUSTER_MIN_RE_SITES = 100; CLUSTER_MAX_LINK_DENSITY = 2; CLUSTER_NONINFORMATIVE_RATIO = 1.5; ORDER_MIN_N_RES_IN_TRUNK = 60; ORDER_MIN_N_RES_IN_SHREDS = 60. The contigs within the chromosome groups were then constructed as branches based on the interaction strength among the contigs, and the most dependable branches were selected as roots to add the rest of the contigs into suitable positions and to produce a group with the correct contig order. Finally, the orientations of contigs within chromosomal groups were determined using WDAG based on the interaction strength between the two possible directions of each pair of contigs.

Figure 2

Red‐spotted grouper genome contig contact matrix using Hi‐C data. The colour bar indicates the contact density from red (high) to white (low) [Colour figure can be viewed at http://wileyonlinelibrary.com]
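
The clustering step described above can be illustrated with a minimal average‐linkage agglomerative clustering sketch in Python. It is not the LACHESIS implementation and ignores the thresholds listed in the text; the function name `cluster_contigs`, the toy link densities and the stopping rule are assumptions made for illustration only.

```python
def cluster_contigs(links, n_groups):
    """Average-linkage agglomerative clustering of contigs.

    links:    dict mapping frozenset({a, b}) -> normalized Hi-C link density
    n_groups: number of groups to stop at (e.g. the chromosome number)
    Returns a list of sets of contig names.
    """
    names = sorted({name for pair in links for name in pair})
    clusters = [{name} for name in names]  # start with one cluster per contig

    def avg_link(c1, c2):
        total = sum(links.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_groups:
        # Find the pair of clusters with the highest average link density
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: avg_link(clusters[ij[0]], clusters[ij[1]]),
        )
        if avg_link(clusters[i], clusters[j]) == 0.0:
            break  # no remaining Hi-C support for any merge
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical link densities between five contigs
fs = frozenset
links = {fs("ab"): 5.0, fs("bc"): 4.0, fs("ac"): 3.0, fs("de"): 6.0, fs("cd"): 0.5}
groups = cluster_contigs(links, n_groups=2)
print(sorted(sorted(group) for group in groups))
```

With these toy values, contigs a, b and c end up in one group and d and e in the other, because the weak c/d link is outweighed by the stronger within‐group links.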

RESULTS AND DISCUSSION

Initial characterization of the E. akaara genome

To estimate the genome size and heterozygosity of E. akaara, 54.7 Gb Illumina clean reads (Table S1) were used for k‐mer analysis. Based on the total number of 48,893,735,532 17‐mers and a peak 17‐mer depth of 44, the estimated genome size of E. akaara was calculated to be 1,111 Mb, and the estimated heterozygosity rate was approximately 0.375% (Table S2 and Figure S1).
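
The reported genome size is consistent with the common estimator that divides the total number of k‐mers by the depth of the main k‐mer peak. The short Python check below assumes that this estimator was the one applied; the variable names are illustrative.

```python
total_kmers = 48_893_735_532  # total number of 17-mers reported above
peak_depth = 44               # depth of the main 17-mer peak

# Assumed estimator: genome size = total k-mers / peak depth
genome_size_bp = total_kmers / peak_depth
print(f"{genome_size_bp / 1e6:,.0f} Mb")
```

The result rounds to 1,111 Mb, matching the estimate given in the text.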

De novo assembly of an E. akaara reference genome

We sequenced the genome of E. akaara with Oxford Nanopore sequencing technology. Nineteen flow cells yielded 111.05 Gb of data in total, of which 106.29 Gb was retained after filtering. The average read length and the read N50 length were 18.35 kb and 26 kb, respectively (Table S3 and Figure S2). The average quality Q‐score was between 9.55 and 10.53 after filtering. Canu, a state‐of‐the‐art genome assembler known to support Oxford Nanopore sequencing technology (Schmidt et al., 2017), was used with the E. akaara ONT sequenced data. The initial assembly of these ONT data resulted in an assembly size of 1,124 Mb (Table 1), which was close to the genome size estimated by the k‐mer distribution analysis (1,111 Mb), suggesting that the size of the E. akaara genome assembled from ONT reads is appropriate. To improve the accuracy of the Nanopore‐based assembly, two assembly polishing approaches were used: Nanopolish, which performs consensus‐based error correction with the ONT reads, and Pilon, which uses Illumina short reads for correction (Loman, Quick, & Simpson, 2015). As shown in Table 1, by comparing the uncorrected assembly, the Nanopolish‐corrected assembly and the assembly corrected by Nanopolish followed by five rounds of Pilon, we found that Nanopolish and Pilon incrementally improved the continuity and completeness of the initial genome assembly, as judged by the increase in the contig N50 length (from 5.19 Mb to 5.25 Mb) and the longest contig length (from 25.46 Mb to 25.75 Mb). Finally, the length of the resulting assembled genome was 1,135 Mb with 2,055 contigs. To the best of our knowledge, this assembly is one of the most contiguous fish genome assemblies, with the highest contig N50 when compared with the published genomes produced with other sequencing strategies (Table S4) (Austin et al., 2017; Conte et al., 2017; Kim et al., 2018; Lehmann et al., 2018; Tan et al., 2018; Tørresen et al., 2017; Vij et al., 2016; Zhao et al., 2018), confirming that Nanopore long read‐only sequences can be effectively used to produce highly contiguous genome assemblies. The reference assembly of the E. akaara genome will be an invaluable molecular resource for supporting a range of genetic, genomic and phylogenetic studies specifically for grouper and more generally for other related fish species of the Perciformes. Further, the genome completeness of the assembly was evaluated by BUSCO and RNA‐seq data. Approximately 96.8% of the complete BUSCO were found in the assembly (Table S5). When mapping the clean RNA‐seq data (149.1 Gb in total) to our assembled genome, the percentage of aligned reads ranged from 82.6% to 95.0% (Table S6). These results suggested that the genome assembly was complete and robust.
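
Because the discussion above relies on N50 and N90 values and on sequencing depth, a short Python sketch of how these quantities are commonly computed may help. The function name `n50` and the toy contig lengths are hypothetical; only the 106.29 Gb read total and the 1,111 Mb k‐mer estimate are taken from the text.

```python
def n50(lengths, fraction=0.5):
    """Largest length L such that sequences of length >= L cover `fraction` of the total."""
    threshold = sum(lengths) * fraction
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must be non-empty")


toy_contigs = [100, 80, 60, 40, 20]  # hypothetical contig lengths
print(n50(toy_contigs))              # 80
print(n50(toy_contigs, 0.9))         # N90: 40

# Sequencing depth implied by the figures in the text
filtered_bases = 106.29e9       # ONT bases retained after filtering
estimated_genome = 1_111e6      # k-mer-based genome size estimate
coverage = filtered_bases / estimated_genome
print(f"{coverage:.0f}-fold")   # 96-fold
```

Passing `fraction=0.9` gives the N90 in the same way, so both contig and scaffold statistics of the kind listed in Table 1 can be derived from a list of sequence lengths.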

Chromosome assembly by Hi‐C data

In total, 95.55% (1,085,005,028 bp) of the contig bases (1,135,521,910 bp) were reliably anchored to the 24 chromosomes, with a contig N50 of 5.20 Mb and a scaffold N50 of 46.03 Mb (Tables 2 and S4).
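
The anchoring rate follows directly from the two base counts given above, as the following Python check shows; the variable names are illustrative.

```python
anchored_bp = 1_085_005_028  # contig bases placed on the 24 pseudochromosomes
contig_bp = 1_135_521_910    # total contig bases in the polished assembly

anchored_pct = 100 * anchored_bp / contig_bp
unanchored_bp = contig_bp - anchored_bp
print(f"{anchored_pct:.2f}% anchored, {unanchored_bp:,} bp unanchored")
```

About 50.5 Mb of contig sequence therefore remains unplaced.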

Table 2

Summary statistics of annotated repeats

Type	Repbase TEs		RepeatModeler		TE proteins		Combined TEs
Type	Length (bp)	Per cent of sequence (%)	Length (bp)	Per cent of sequence (%)	Length (bp)	Per cent of sequence (%)	Length (bp)	Per cent of sequence (%)
DNA	72,547,595	6.39	152,578,623	13.44	12,639,588	1.11	190,003,264	16.73
LINE	54,462,165	4.8	48,239,766	4.25	47,434,545	4.18	72,820,879	6.41
LTR	15,915,741	1.4	13,792,952	1.21	10,697,269	0.94	79,800,582	7.03
SINE	7,048,847	0.62	5,336,045	0.47	0	0	7,684,877	0.68
Other	2,402,087	0.21	14,324,885	1.26	620,554	0.06	47,787,296	4.21
Unknown	569,614	0.05	102,421,979	9.02	0	0	90,408,213	7.96
Total	152,946,049	13.47	336,694,250	29.65	71,391,956	6.29	488,505,111	43.02

Genome annotation

Repeat sequences were identified based on the assembled sequences of the E. akaara genome. A total of 545,286 SSRs were identified, including 190,558 mono‐, 261,169 di‐, 61,227 tri‐, 25,893 tetra‐, 5,335 penta‐ and 1,104 hexa‐nucleotide repeats (Table S8). Approximately 5.36% of the E. akaara genome was identified as tandem repeats (Table S9). Overall, the combined results of the homology‐based and de novo predictions indicated that repetitive sequences account for 43.02% of the genome, with DNA transposons representing the most abundant class of repeats accounting for 16.73% (Table 2). For genome annotation, a total of 23,923 protein‐coding genes were identified in the E. akaara genome (Table 3). Compared with other existing published vertebrate genomes, the number of genes in E. akaara is similar to that in C. semilaevis (21,253 genes) and D. rerio (24,884 genes) (Table S10). In total, 23,808 genes were annotated in at least one of the databases, and up to 99.5% of E. akaara genes were functionally annotated (Table 4). Finally, 828 miRNAs, 792 rRNAs, 1,252 snRNAs and 2,608 tRNAs were also identified (Table 5). These results revealed that it is possible to obtain functional and highly contiguous genome assemblies covering most of the genes in gigabase‐sized fish genomes when using ONT in combination with Illumina data for polishing.
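
The summary figures in this section can be cross‐checked against one another with simple arithmetic. The Python sketch below uses only numbers stated in the text and in Table 2; the variable names are illustrative.

```python
# SSR counts by motif length, as listed above
ssr_counts = {"mono": 190_558, "di": 261_169, "tri": 61_227,
              "tetra": 25_893, "penta": 5_335, "hexa": 1_104}
ssr_total = sum(ssr_counts.values())

# Repeat content: combined TE length (Table 2) over total contig bases
assembly_bp = 1_135_521_910
combined_te_bp = 488_505_111
repeat_pct = 100 * combined_te_bp / assembly_bp

# Share of predicted genes with a functional annotation
predicted_genes = 23_923
annotated_genes = 23_808
annotated_pct = 100 * annotated_genes / predicted_genes

# Noncoding RNA classes reported above
ncrna = {"miRNA": 828, "rRNA": 792, "snRNA": 1_252, "tRNA": 2_608}
ncrna_total = sum(ncrna.values())

print(ssr_total, f"{repeat_pct:.2f}%", f"{annotated_pct:.1f}%", ncrna_total)
```

The totals agree with the 545,286 SSRs, 43.02% repeat content, 99.5% annotation rate and 5,480 noncoding RNAs reported in the article.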

Table 3

Summary statistics of predicted protein‐coding genes

Gene set	Total number of gene	Average gene length (bp)	Average CDS length (bp)	Average exons number per gene	Average exon length (bp)	Average intron length (bp)
De novo
Augustus	29,024	16,738	1,413.28	8.33	169.76	2,092.11
SNAP	59,925	26,835.73	1,417.17	10.27	137.93	2,740.60
Homology
C. idellus	53,989	5,519.60	901.7	4.39	205.53	1,363.34
C. semilaevis	33,543	11,471.97	1,387.91	7.1	195.57	1,654.05
D. rerio	38,897	8,415.67	1,140.39	5.73	199.1	1,538.84
G. aculeatus	41,023	8,218.57	1,033.88	5.62	183.82	1,553.69
L. calcarifer	36,878	11,137.05	1,321.62	7.1	186.2	1,609.70
O. niloticus	41,660	9,565.48	1,305.17	6.56	198.92	1,485.36
O. latipes	46,255	6,769.88	987.84	5.1	193.58	1,409.27
T. rubripes	33,739	9,958.68	1,191.94	6.45	184.89	1,609.54
Final set
EVM	23,923	23,162.39	1,791.15	10.9	164.27	2,157.86

Table 4

Summary statistics of functional annotated protein‐coding genes

Type	Number	Per cent (%)
SwissProt	22,223	92.9
TrEMBL	23,793	99.5
KEGG	13,828	57.8
GO	16,469	68.8
InterProScan	21,892	91.5
Total Annotated Genes	23,808	99.5
Predicted Genes	23,923	–

Table 5

Summary statistics of noncoding RNA

Type	Copy number	Average length (bp)	Total length (bp)	Percentage (%) of genome
rRNA
rRNA	792	385.48	305,299	0.022804
18S	47	1,742.36	81,891	0.006117
28S	35	4,066.66	142,333	0.010632
5.8S	47	148.04	6,958	0.00052
5S	663	111.79	74,117	0.005536
snRNA
snRNA	1,252	144.96	181,492	0.013557
CD‐box	230	122.29	28,127	0.002101
HACA‐box	107	146.43	15,668	0.00117
splicing	915	150.49	137,697	0.010285
miRNA	828	85.36	70,675	0.005279
tRNA	2,608	74.68	194,771	0.014548

CONCLUSION

In the present study, we assembled the most contiguous genome of E. akaara by de novo assembly of long sequence reads produced by Nanopore sequencing technology (GridION) and Hi‐C. The reference‐quality genome (1.135 Gb) of E. akaara has a contig N50 of 5.25 Mb and a longest contig of 25.75 Mb. By utilizing Hi‐C technology, we assembled contigs into scaffolds, which resulted in a chromosome‐level genome assembly with 24 chromosomes and a scaffold N50 length of 46.03 Mb. Of the 23,923 predicted protein‐coding genes, 23,808 (99.5%) were functionally annotated. Our genome assembly results demonstrate that long sequence reads produced by Nanopore sequencing can be effectively used for genome assembly. Our well‐annotated genome will allow us to identify genes and pathways related to sex determination and differentiation to reveal the mechanisms controlling the sex inversion of red‐spotted grouper, to screen for genetic variation conferring resistance to pathogen infection at the genome scale and to investigate genetic correlations with production traits of E. akaara in the future.

CONFLICT OF INTEREST

The authors declare that they have no competing interests.

AUTHOR CONTRIBUTIONS

H.G., J.H., Y.W., Z.H. and L.Z. designed research; H.G., K.L., M.S., S.W., Y.Z., Z.H., C.Z., Q.L., J.S., Z.W., D.C., L.L., D.W., J.H., Y.W., Z.H., L.Z. performed research and/or analysed data; H.G., K.L., M.S., Z.Z. and S.W. wrote the study. All authors reviewed the manuscript.

48 in total

1. Expression pattern, cellular localization and promoter activity analysis of ovarian aromatase (Cyp19a1a) in protogynous hermaphrodite red-spotted grouper.

Authors: Wei Huang; Li Zhou; Zhi Li; Jian-Fang Gui
Journal: Mol Cell Endocrinol Date: 2009-04-15 Impact factor: 4.102

2. The draft genome of the grass carp (Ctenopharyngodon idellus) provides insights into its evolution and vegetarian adaptation.

Authors: Yaping Wang; Ying Lu; Yong Zhang; Zemin Ning; Yan Li; Qiang Zhao; Hengyun Lu; Rong Huang; Xiaoqin Xia; Qi Feng; Xufang Liang; Kunyan Liu; Lei Zhang; Tingting Lu; Tao Huang; Danlin Fan; Qijun Weng; Chuanrang Zhu; Yiqi Lu; Wenjun Li; Ziruo Wen; Congcong Zhou; Qilin Tian; Xiaojun Kang; Mijuan Shi; Wanting Zhang; Songhun Jang; Fukuan Du; Shan He; Lanjie Liao; Yongming Li; Bin Gui; Huihui He; Zhen Ning; Cheng Yang; Libo He; Lifei Luo; Rui Yang; Qiong Luo; Xiaochun Liu; Shuisheng Li; Wen Huang; Ling Xiao; Haoran Lin; Bin Han; Zuoyan Zhu
Journal: Nat Genet Date: 2015-05-04 Impact factor: 38.330

3. De novo assembly of the complex genome of Nippostrongylus brasiliensis using MinION long reads.

Authors: David Eccles; Jodie Chandler; Mali Camberis; Bernard Henrissat; Sergey Koren; Graham Le Gros; Jonathan J Ewbank
Journal: BMC Biol Date: 2018-01-11 Impact factor: 7.431

4. Whole-genome sequence of a flatfish provides insights into ZW sex chromosome evolution and adaptation to a benthic lifestyle.

Authors: Songlin Chen; Guojie Zhang; Changwei Shao; Quanfei Huang; Geng Liu; Pei Zhang; Wentao Song; Na An; Domitille Chalopin; Jean-Nicolas Volff; Yunhan Hong; Qiye Li; Zhenxia Sha; Heling Zhou; Mingshu Xie; Qiulin Yu; Yang Liu; Hui Xiang; Na Wang; Kui Wu; Changgeng Yang; Qian Zhou; Xiaolin Liao; Linfeng Yang; Qiaomu Hu; Jilin Zhang; Liang Meng; Lijun Jin; Yongsheng Tian; Jinmin Lian; Jingfeng Yang; Guidong Miao; Shanshan Liu; Zhuo Liang; Fang Yan; Yangzhen Li; Bin Sun; Hong Zhang; Jing Zhang; Ying Zhu; Min Du; Yongwei Zhao; Manfred Schartl; Qisheng Tang; Jun Wang
Journal: Nat Genet Date: 2014-02-02 Impact factor: 38.330

5. First de novo draft genome sequence of Oryza coarctata, the only halophytic species in the genus Oryza.

Authors: Tapan Kumar Mondal; Hukam Chand Rawal; Kishor Gaikwad; Tilak Raj Sharma; Nagendra Kumar Singh
Journal: F1000Res Date: 2017-09-25

6. Nanopore sequencing and assembly of a human genome with ultra-long reads.

Authors: Miten Jain; Sergey Koren; Karen H Miga; Josh Quick; Arthur C Rand; Thomas A Sasani; John R Tyson; Andrew D Beggs; Alexander T Dilthey; Ian T Fiddes; Sunir Malla; Hannah Marriott; Tom Nieto; Justin O'Grady; Hugh E Olsen; Brent S Pedersen; Arang Rhie; Hollian Richardson; Aaron R Quinlan; Terrance P Snutch; Louise Tee; Benedict Paten; Adam M Phillippy; Jared T Simpson; Nicholas J Loman; Matthew Loose
Journal: Nat Biotechnol Date: 2018-01-29 Impact factor: 54.908

7. Chromosome-scale assembly of the Monopterus genome.

Authors: Xueya Zhao; Majing Luo; Zhigang Li; Pei Zhong; Yibin Cheng; Fengling Lai; Xin Wang; Jiumeng Min; Mingzhou Bai; Yulan Yang; Hanhua Cheng; Rongjia Zhou
Journal: Gigascience Date: 2018-05-01 Impact factor: 6.524

8. Gene finding in novel genomes.

Authors: Ian Korf
Journal: BMC Bioinformatics Date: 2004-05-14 Impact factor: 3.169

9. Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families.

Authors: Ioanna Kalvari; Joanna Argasinska; Natalia Quinones-Olvera; Eric P Nawrocki; Elena Rivas; Sean R Eddy; Alex Bateman; Robert D Finn; Anton I Petrov
Journal: Nucleic Acids Res Date: 2018-01-04 Impact factor: 16.971

10. Finding Nemo: hybrid assembly with Oxford Nanopore and Illumina reads greatly improves the clownfish (Amphiprion ocellaris) genome assembly.

Authors: Mun Hua Tan; Christopher M Austin; Michael P Hammer; Yin Peng Lee; Laurence J Croft; Han Ming Gan
Journal: Gigascience Date: 2018-03-01 Impact factor: 6.524
