Younhee Shin1,2, Myunghee Jung1,3, Ga-Hee Shin1, Ho-Jin Jung1, Su-Jin Baek1, Gi-Yong Lee1, Byeong-Chul Kang1, Jaeyoung Shim1, Ji-Man Hong1, Jung Youn Park4, Cheul Min An4, Young-Ok Kim4, Jae Koo Noh4, Ju-Won Kim4, Bo-Hye Nam4, Chan-Il Park5.
Abstract
The rock bream (Oplegnathus fasciatus) is one of the most economically valuable marine fish in East Asia, and due to various environmental factors, there is substantial revenue loss in the production sector. Therefore, knowledge of its genome is required to uncover the genetic factors and the solutions to these problems. In this study, we constructed the first draft genome of O. fasciatus as a reference for the family Oplegnathidae. The genome size is estimated to be 749 Mb, and it was assembled into 766 Mb by combining Illumina and PacBio sequences. A total of 24,053 transcripts (23,338 genes) are predicted, and among those transcripts, 23,362 (97%) are annotated with functional terms. Finally, the completeness of the genome assembly was assessed by CEGMA, which resulted in the complete mapping of 220 (88.7%) core genes in the genome. To the best of our knowledge, this is the first draft genome for the family Oplegnathidae.
Year: 2018 PMID: 30351299 PMCID: PMC6198749 DOI: 10.1038/sdata.2018.234
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Figure 1. Illustration of the complete Oplegnathus fasciatus genome assembly and the structural and functional annotation pipelines used.
(a) the genome assembly pipeline, (b) the structural and functional annotation pipeline, (c) details of the reference gene sets used for the ab initio and evidence-based gene model predictions.
Figure 2. Illustration of the genome size and the functional annotation of the Oplegnathus fasciatus genome.
(a) k-mer based genome size estimation, (b) sequence similarity-based species distribution obtained from BLAST.
Summary of the complete sequence libraries used in this study.
| S. No | Sample type | Library type | Platform | Insert size (bp)/cell | Read length (bp) | Total length (Gb) | Coverage (X) | Preprocessed length (Gb) | Preprocessed coverage (X) | SRA accession number |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | DNA | Paired-end | Illumina-HiSeq2000 | 350 | 101 | 56.4 | 73.6 | 39.9 | 52.1 | SRR5860988 |
| 2 | DNA | Paired-end | Illumina-HiSeq2000 | 550 | 101 | 53.6 | 69.9 | 35.8 | 46.7 | SRR5860989 |
| 3 | DNA | Mate-pair | Illumina-HiSeq2000 | 3,000 | 101 | 31.1 | 40.6 | 16.2 | 21.1 | SRR5860986 |
| 4 | DNA | Mate-pair | Illumina-HiSeq2000 | 5,000 | 101 | 27.6 | 36.0 | 12.1 | 15.8 | SRR5860987 |
| 5 | DNA | Mate-pair | Illumina-HiSeq2000 | 8,000 | 101 | 26.2 | 34.2 | 2.4 | 3.1 | SRR5860984 |
| 6 | DNA | Mate-pair | Illumina-HiSeq2000 | 10,000 | 101 | 29.5 | 38.5 | 3.2 | 4.2 | SRR5860985 |
| 7 | DNA | Long fragments | PacBio RSII | 20 Kb | Max: 50,375/Min: 50 | 11.5 | 15.0 | | | SRR5860983 |
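The coverage columns can be reproduced as total sequenced bases divided by genome size. The sketch below is an illustration, not the authors' script. It assumes the divisor is the 766,301,214 bp PacBio-scaffolded assembly, an inference from the fact that the tabulated values match that size rather than the 749 Mb k-mer estimate. The function name `coverage` and the library labels are hypothetical.

```python
# Sketch: sequencing coverage = total bases / genome size.
# Assumption: the divisor is the final assembly size (766,301,214 bp),
# inferred because it reproduces the tabulated coverage values.
ASSEMBLY_SIZE_GB = 766_301_214 / 1e9


def coverage(total_gb: float, genome_gb: float = ASSEMBLY_SIZE_GB) -> float:
    """Return fold coverage rounded to one decimal place."""
    return round(total_gb / genome_gb, 1)


# Raw total length (Gb) per library, copied from the table above.
raw_total_gb = {
    "PE 350 bp": 56.4,
    "PE 550 bp": 53.6,
    "MP 3 kb": 31.1,
    "MP 5 kb": 27.6,
    "MP 8 kb": 26.2,
    "MP 10 kb": 29.5,
    "PacBio 20 kb": 11.5,
}

for library, gb in raw_total_gb.items():
    print(f"{library}: {coverage(gb)}X")
```

The same function applied to the preprocessed lengths (for example 39.9 Gb for the 350 bp library) yields the preprocessed coverage column.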
Oplegnathus fasciatus genome de novo assemblies.
| Description | 1st scaffolding (w/o PacBio) | 2nd scaffolding (w/PacBio) |
|---|---|---|
| No. of scaffolds | 31,533 | 4,149 |
| No. of bases (bp) | 762,490,804 | 766,301,214 |
| Scaffold N50 (bp) | 874,256 | 1,126,915 |
| Maximum length (bp) | 5,005,633 | 7,250,909 |
| Minimum length (bp) | 143 | 1,000 |
| N (%) | 5.3 | 5.2 |
| No. of contigs | 108,639 | |
| No. of contig bases (bp) | 730,022,001 | |
| Contig N50 (bp) | 37,752 | |
| Minimum contig length (bp) | 200 | |
| Maximum contig length (bp) | 462,101 | |
| Contig N (%) | 0.5 | |
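For readers unfamiliar with the N50 statistic reported above, the following minimal sketch shows the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total bases. The function name `n50` and the toy lengths are illustrative and do not come from the assembly itself.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that brings the running sum to half the total.
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example with hypothetical lengths (not the real assembly).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```

In practice the lengths would be read from the scaffold or contig FASTA file; adding long-read scaffolding raises N50 because fewer, longer sequences carry the same number of bases.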
Repeat elements present in the Oplegnathus fasciatus genome.
| Categories | Subcategories | No. of elements | Length occupied (bp) | % of sequence |
|---|---|---|---|---|
| SINEs | | 16,852 | 2,167,823 | 0.28 |
| | MIRs | 2,753 | 418,120 | 0.05 |
| LINEs | | 76,644 | 19,492,079 | 2.54 |
| | LINE1 | 1,232 | 534,505 | 0.07 |
| | LINE2 | 31,556 | 7,363,574 | 0.96 |
| | L3/CR1 | 149 | 53,174 | 0.01 |
| LTR elements | | 10,054 | 2,940,460 | 0.38 |
| | ERV_Class I | 184 | 111,018 | 0.01 |
| DNA elements | | 253,296 | 50,393,060 | 6.58 |
| | hAT-Charlie | 11,297 | 2,077,564 | 0.27 |
| Unclassified | | 469,919 | 88,403,276 | 11.54 |
| Total interspersed repeats | | | 163,396,698 | 21.32 |
| Small RNA | | 5,689 | 758,706 | 0.1 |
| Satellites | | 1,693 | 165,759 | 0.02 |
| Simple repeats | | 334,581 | 14,726,054 | 1.92 |
| Low complexity | | 41,697 | 2,428,206 | 0.32 |
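As a consistency check on the table above, the total for interspersed repeats should equal the sum of the five top-level classes (SINEs, LINEs, LTR elements, DNA elements, Unclassified). The sketch below recomputes that sum and its share of the assembly. It assumes percentages are taken relative to the 766,301,214 bp assembly, and the variable names are illustrative.

```python
# Sketch: recompute the "Total interspersed repeats" row from its components.
# Assumption: percentages are relative to the 766,301,214 bp assembly.
ASSEMBLY_BP = 766_301_214

# Length occupied (bp) by each top-level interspersed repeat class.
interspersed_bp = {
    "SINEs": 2_167_823,
    "LINEs": 19_492_079,
    "LTR elements": 2_940_460,
    "DNA elements": 50_393_060,
    "Unclassified": 88_403_276,
}

total_bp = sum(interspersed_bp.values())
total_pct = round(100 * total_bp / ASSEMBLY_BP, 2)

for name, bp in interspersed_bp.items():
    print(f"{name}: {bp:,} bp ({100 * bp / ASSEMBLY_BP:.2f}%)")
print(f"Total interspersed repeats: {total_bp:,} bp ({total_pct}%)")
```

Subcategory rows (MIRs, LINE1, LINE2, L3/CR1, ERV_Class I, hAT-Charlie) are deliberately left out of the sum because they are already counted within their parent classes.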
Datasets submitted to the figshare repository for this project, with their descriptions.
| File name | Format | Description |
|---|---|---|
| Oplegnathusfasciatus_contig.fa | fasta | Genome assembly result file (CLC Assembly Cell) |
| Oplegnathusfasciatus_scaffold.fa | fasta | Genome assembly result file (SSPACE - scaffolding with Illumina MP reads) |
| Oplegnathusfasciatus_super_scaffold.fa | fasta | Genome assembly result file (SSPACE - scaffolding with PacBio long reads) |
| Oplegnathusfasciatus_super_scaffold.fa.out | txt | Repeat annotation file from RepeatMasker |
| Oplegnathusfasciatus_super_scaffold.fa.tbl | txt | Repeat annotation summary file |
| Oplegnathusfasciatus_super_scaffold.fa.masked | fasta | Repeat masked genome assembly file |
| Oplegnathusfasciatus_cds.fna | fasta | Predicted coding sequence |
| Oplegnathusfasciatus_gene.gff3 | gff3 | Annotated coding sequence, gff3 format file |
| Oplegnathusfasciatus_protein.faa | fasta | Predicted protein sequence |
| Oplegnathusfasciatus_gene_definition.xls | xls | BLAST description table exported from Blast2GO |
| Oplegnathusfasciatus_Interpro.xls | xls | InterPro database annotation table |
| Oplegnathusfasciatus_gene_KEGG.xls | xls | KEGG database annotation |
| Oplegnathusfasciatus_GO_annotation.tar | tar | Gene Ontology annotations (BP, MF, CC) |