Bo-Hye Nam1, Woori Kwak2,3, Young-Ok Kim1, Dong-Gyun Kim1, Hee Jeong Kong1, Woo-Jin Kim1, Jeong-Ha Kang1, Jung Youn Park1, Cheul Min An1, Ji-Young Moon1, Choul Ji Park4, Jae Woong Yu3, Joon Yoon2, Minseok Seo3, Kwondo Kim2,3, Duk Kyung Kim3, SaetByeol Lee3, Samsun Sung3, Chul Lee2,3, Younhee Shin5, Myunghee Jung5, Byeong-Chul Kang5, Ga-Hee Shin5, Sojeong Ka6, Kelsey Caetano-Anolles6, Seoae Cho3, Heebal Kim7.
Abstract
Background: Abalones are large marine snails belonging to the genus Haliotis, family Haliotidae, class Gastropoda, phylum Mollusca. Haliotis is the only genus in the family Haliotidae and contains several species of abalone. The most comprehensive treatment of Haliotidae considers 56 species valid, with 18 additional subspecies [1]. Abalone is an economically important fishery and aquaculture animal and is considered a highly prized seafood delicacy. The total global supply of abalone has increased 5-fold since the 1970s, and farm production has increased explosively from 50 mt to 103 464 mt in the past 40 years. Additionally, researchers have recently focused on abalone because of its reported tumor suppression effect. However, despite the valuable features of this marine animal, no genomic information is available for the Haliotidae family, and related research is still limited. To construct the H. discus hannai genome, a total of 580 Gb of sequence was generated using the Illumina and PacBio platforms, corresponding to 322-fold coverage of the 1.8-Gb genome size estimated for H. discus hannai by flow cytometry. The final genome assembly consisted of 1.86 Gb in 35 450 scaffolds (>2 kb). The GC content was 40.51%, and the N50 length of the assembled scaffolds was 211 kb. We identified 29 449 genes using EVidenceModeler based on gene information from ab initio prediction, protein homology with known genes, and transcriptome evidence from RNA-seq. Here we present the first Haliotidae genome, H. discus hannai, with sequencing data, assembly, and gene annotation information. This will help to resolve the lack of genomic information in the Haliotidae family and provide more opportunities for understanding gastropod evolution.
Keywords: Abalone genome; Haliotis discus hannai; Haliotidae
Year: 2017 PMID: 28327967 PMCID: PMC5439488 DOI: 10.1093/gigascience/gix014
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1. Example of H. discus hannai, the Pacific abalone.
Summary statistics of generated whole genome shotgun sequencing data.
| Library name | Library type | Insert size (bp) | Platform | Read length (bp) | No. reads | Total bp |
|---|---|---|---|---|---|---|
| 250 bp | Paired-end | 250 | Nextseq500 | 150 | 876 529 480 | 131 440 418 087 |
| 350 bp | Paired-end | 350 | Hiseq2000 | 101 | 1 413 620 786 | 142 775 699 386 |
| 3 k | Mate-pair | 3000 | Nextseq500 | 150 | 580 064 464 | 85 689 154 056 |
| 5 k | Mate-pair | 5000 | Nextseq500 | 150 | 468 432 888 | 69 966 139 205 |
| 8 k | Mate-pair | 8000 | Nextseq500 | 150 | 335 132 792 | 50 109 845 012 |
| 10 k | Mate-pair | 10 000 | Nextseq500 | 150 | 569 376 096 | 85 080 237 236 |
| 20 k | P6-C4 | 20 000 | PacBio RS II | 10 094 (average) | 1 573 020 | 15 879 626 978 |
| Total | | | | | | 580 941 119 960 |
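The 322-fold coverage quoted in the abstract follows from dividing the total sequenced bases by the 1.8-Gb estimated genome size. A minimal sketch of that arithmetic, using the per-library totals from the table above (the dictionary keys and variable names are illustrative only):

```python
# Total bases per library, copied from the sequencing summary table.
library_bp = {
    "250 bp paired-end": 131_440_418_087,
    "350 bp paired-end": 142_775_699_386,
    "3 k mate-pair": 85_689_154_056,
    "5 k mate-pair": 69_966_139_205,
    "8 k mate-pair": 50_109_845_012,
    "10 k mate-pair": 85_080_237_236,
    "20 k PacBio": 15_879_626_978,
}

# Genome size estimate reported in the abstract (flow cytometry).
ESTIMATED_GENOME_BP = 1.8e9

total_bp = sum(library_bp.values())
coverage = total_bp / ESTIMATED_GENOME_BP

print(f"total bases:   {total_bp:,}")
print(f"fold coverage: {coverage:.1f}x")
```

The sum reproduces the Total row of the table, and the ratio is a little under 323, in line with the 322-fold figure in the abstract.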
Figure 2. 19-mer distribution computed using Jellyfish with the 350-bp paired-end whole genome sequencing data.
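A k-mer distribution like the one in Figure 2 is a histogram of how many distinct k-mers occur once, twice, and so on across the reads. The toy sketch below illustrates the idea in plain Python. It is not Jellyfish and would not scale to real data; it ignores reverse complements, and the function name `kmer_histogram` and the example read are hypothetical.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Map each multiplicity to the number of distinct k-mers seen that often."""
    counts = Counter()
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    # Histogram over the multiplicities, sorted by multiplicity.
    histogram = Counter(counts.values())
    return dict(sorted(histogram.items()))


if __name__ == "__main__":
    # Hypothetical read; ACGT occurs twice, three other 4-mers occur once.
    print(kmer_histogram(["ACGTACGT"], k=4))
```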
Summary statistics for the H. discus hannai draft genome (>2 kb).
| Assembled genome | |
|---|---|
| Size (1n) | 1.80 Gb |
| GC level | 40.51% |
| No. scaffolds | 35 450 |
| N50 of scaffolds (bp) | 211 346 |
| N bases in scaffolds (%) | 116 Mb (6.45%) |
| Longest (shortest) scaffolds (bp) | 2 207 537 (2000) |
| Average scaffold length (bp) | 50 870.65 |
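For readers unfamiliar with the N50 statistic in the table above, the sketch below computes it for a list of scaffold lengths and also checks that the scaffold count multiplied by the average length matches the reported assembly size. The function name `n50` and the toy lengths are illustrative; the per-scaffold lengths of the actual assembly are not given here.

```python
def n50(lengths):
    """Return the length L such that scaffolds of length >= L
    contain at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example: total is 32, and the two longest scaffolds already cover 16.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))

# Consistency check using values from the table above.
n_scaffolds = 35_450
mean_length_bp = 50_870.65
approx_size_gb = n_scaffolds * mean_length_bp / 1e9
print(f"{approx_size_gb:.2f} Gb")
```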
Summary of identified repeat elements in the Haliotis discus hannai genome.
| Repeat element | No. element | Length (%) |
|---|---|---|
| SINE | 284 485 | 96 155 199 (5.11%) |
| LINE | 700 245 | 160 387 248 (8.53%) |
| LTR element | 383 770 | 55 149 794 (2.93%) |
| DNA element | 58 022 | 14 563 432 (0.77%) |
| Small RNA | 20 997 | 1 537 853 (0.08%) |
| Simple repeat | 161 246 | 32 547 245 (1.73%) |
| Low complexity | 326 399 | 21 446 303 (1.14%) |
| Unclassified | 1 522 272 | 265 603 066 (14.1%) |
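Summing the per-class rows gives a rough figure for overall repeat content. The sketch below assumes the classes do not overlap, which the table does not state, so the total should be read as an approximation.

```python
# (length in bp, percentage of genome) per repeat class, from the table above.
repeats = {
    "SINE": (96_155_199, 5.11),
    "LINE": (160_387_248, 8.53),
    "LTR element": (55_149_794, 2.93),
    "DNA element": (14_563_432, 0.77),
    "Small RNA": (1_537_853, 0.08),
    "Simple repeat": (32_547_245, 1.73),
    "Low complexity": (21_446_303, 1.14),
    "Unclassified": (265_603_066, 14.1),
}

total_repeat_bp = sum(bp for bp, _ in repeats.values())
total_repeat_pct = sum(pct for _, pct in repeats.values())

print(f"total repeat length: {total_repeat_bp:,} bp")
print(f"total repeat share:  {total_repeat_pct:.2f}%")
```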
Figure 3. Repeat element information of H. discus hannai compared to L. gigantea. (a) Total amount and ratio of identified repeat elements classified into eight classes (DNA, LINE, SINE, LTR, Low complexity, Satellite, Simple repeat, and Unknown) in each genome. (b) Distribution of copy number of the two most abundant repeat elements in each genome as a function of divergence. Heat maps indicate the total amount of each repeat element, divided into 20 levels of divergence.
Summary statistics of generated transcriptome data for six organ tissues using Illumina platform.
| Library name | Library type | Platform | Read length | No. read | Total bp |
|---|---|---|---|---|---|
| Blood | Paired-end | Hiseq2000 | 101 | 53 525 950 | 5 406 120 950 |
| Digestive duct | Paired-end | Hiseq2000 | 101 | 56 485 666 | 5 705 052 266 |
| Gill | Paired-end | Hiseq2000 | 101 | 66 415 882 | 6 708 004 082 |
| Hepatopancreas | Paired-end | Hiseq2000 | 101 | 58 467 176 | 5 905 184 776 |
| Mantle | Paired-end | Hiseq2000 | 101 | 65 741 776 | 6 639 919 376 |
| Ovary | Paired-end | Hiseq2000 | 101 | 60 997 100 | 6 160 707 100 |
| Total | | | | | 36 524 988 550 |
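In the transcriptome table, each library's total bp equals its read count multiplied by the 101-bp read length. A short check of that relationship, with values copied from the table (variable names are illustrative):

```python
READ_LENGTH = 101  # bp, identical for all six RNA-seq libraries

# tissue: (number of reads, total bp as reported)
rnaseq = {
    "Blood": (53_525_950, 5_406_120_950),
    "Digestive duct": (56_485_666, 5_705_052_266),
    "Gill": (66_415_882, 6_708_004_082),
    "Hepatopancreas": (58_467_176, 5_905_184_776),
    "Mantle": (65_741_776, 6_639_919_376),
    "Ovary": (60_997_100, 6_160_707_100),
}

# Compare reads x read length against the reported total for each tissue.
mismatches = [
    tissue
    for tissue, (reads, reported_bp) in rnaseq.items()
    if reads * READ_LENGTH != reported_bp
]
grand_total_bp = sum(reported_bp for _, reported_bp in rnaseq.values())

print("mismatching libraries:", mismatches)
print(f"grand total: {grand_total_bp:,} bp")
```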
Summary statistics of protein alignments using tBLASTn, used as protein-based evidence for gene structure.
| Species | Type | Element | Total count | Count/gene | Total length, bp | Mean length, bp | Genome coverage, % |
|---|---|---|---|---|---|---|---|
| | Protein (69 002) | Transcript | 18 792 | | 109 068 639 | 5803.99 | 5.80 |
| | | Exon | 77 320 | 4.11 | 12 667 395 | 163.83 | 0.67 |
| | Protein (42 474) | Transcript | 11 605 | | 68 796 463 | 5928.17 | 3.66 |
| | | Exon | 47 300 | 4.08 | 7 978 167 | 168.67 | 0.42 |
| | Protein (53 876) | Transcript | 15 901 | | 55 043 032 | 3461.61 | 2.93 |
| | | Exon | 46 040 | 2.90 | 7 567 059 | 164.36 | 0.40 |
| | Protein (23 851) | Transcript | 29 345 | | 177 851 531 | 6060.71 | 9.47 |
| | | Exon | 118 165 | 4.03 | 20 583 999 | 174.20 | 1.10 |
| | Protein (28 027) | Transcript | 32 978 | | 231 175 282 | 7009.98 | 12.30 |
| | | Exon | 140 784 | 4.27 | 23 649 828 | 167.99 | 1.26 |
| | Protein (29 096) | Transcript | 10 570 | | 67 396 621 | 6376.22 | 3.59 |
| | | Exon | 45 737 | 4.33 | 7 797 503 | 170.49 | 0.42 |
| | Protein (38 730) | Transcript | 9116 | | 46 270 640 | 5075.76 | 2.46 |
| | | Exon | 34 572 | 3.79 | 5 627 082 | 162.76 | 0.30 |
| | Protein (58 493) | Transcript | 27 438 | | 125 307 206 | 4566.92 | 6.67 |
| | | Exon | 92 426 | 3.37 | 15 483 164 | 167.52 | 0.82 |
Summary statistics for ab initio gene prediction results using various programs and parameters.
| Program | Matrix | Element | Total count | Count/gene | Total length, bp | Mean length, bp | Genome coverage, % |
|---|---|---|---|---|---|---|---|
| Augustus | Custom parameter (RNAseq) | Gene | 88 825 | 3.92 | 367 066 732 | 4132.47 | 19.54 |
| | | CDS | 348 528 | | 76 388 076 | 219.17 | 4.07 |
| | Custom parameter ( | Gene | 90 396 | 4.11 | 395 511 710 | 4375.32 | 21.05 |
| | | CDS | 371 487 | | 78 508 401 | 211.34 | 4.18 |
| | Custom parameter (H. discus discus IsoSeq) | Gene | 84 322 | 3.97 | 346 455 180 | 4108.72 | 18.44 |
| | | CDS | 335 103 | | 72 527 841 | 216.43 | 3.86 |
| | Custom parameter (BUSCO) | Gene | 111 058 | 4.24 | 626 749 935 | 5643.45 | 33.36 |
| | | CDS | 470 839 | | 84 333 972 | 179.11 | 4.49 |
| | Custom parameter (CEGMA) | Gene | 76 504 | 4.95 | 393 121 657 | 5138.58 | 20.92 |
| | | CDS | 378 485 | | 63 424 677 | 167.58 | 3.38 |
| | Custom parameter (Protein) | Gene | 22 420 | 3.43 | 184 289 721 | 8219.88 | 9.81 |
| | | CDS | 76 848 | | 20 291 739 | 264.05 | 1.08 |
| Fgenesh | Custom parameter | Gene | 184 051 | 3.46 | 1 366 924 540 | 7426.88 | 72.75 |
| | | CDS | 636 568 | | 98 055 591 | 154.04 | 5.22 |
| Geneid | | Gene | 789 540 | 1.41 | 436 990 370 | 553.47 | 23.26 |
| | | CDS | 1 112 959 | | 140 976 492 | 126.67 | 7.50 |
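The genome coverage column is presumably the total predicted length divided by an assembly length, but the table does not state which length was used as the denominator. The sketch below back-calculates it from three Gene rows; each comes out near 1.88 Gb. This is an inference from the numbers and not a value given in the source.

```python
# program: (total gene length in bp, reported genome coverage in %)
gene_rows = {
    "Augustus (RNAseq)": (367_066_732, 19.54),
    "Fgenesh": (1_366_924_540, 72.75),
    "Geneid": (436_990_370, 23.26),
}

# Implied denominator = total length / (coverage as a fraction).
implied_genome_bp = {
    program: total_bp / (coverage_pct / 100)
    for program, (total_bp, coverage_pct) in gene_rows.items()
}

for program, size_bp in implied_genome_bp.items():
    print(f"{program}: {size_bp / 1e9:.3f} Gb")
```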
Summary statistics for the consensus gene set of Haliotis discus hannai genome.
| Element | No. elements | Exon/transcript | Avg. length | Total length | Genome coverage |
|---|---|---|---|---|---|
| Gene | 29 449 | – | 2705 | 79 661 536 | 4.2% |
| Exon | 74 745 | 2.54 | 280 | 20 985 298 | 1.1% |
| Intron | 45 296 | 1.54 | 1295 | 58 676 238 | 3.1% |
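The rows of the consensus gene set table are linked by simple relationships: introns per gene are one fewer than exons per gene, exon and intron lengths add up to the total gene length, and the exon/transcript column is the exon count divided by the gene count. A brief check with the table's values (variable names are illustrative):

```python
genes, gene_bp = 29_449, 79_661_536
exons, exon_bp = 74_745, 20_985_298
introns, intron_bp = 45_296, 58_676_238

# Each gene with n exons has n - 1 introns, so the counts differ by the gene count.
introns_expected = exons - genes

# Exon and intron sequence together should make up the gene span.
length_sum = exon_bp + intron_bp

exons_per_gene = exons / genes
mean_gene_length = gene_bp / genes

print(introns_expected == introns)   # intron count agrees
print(length_sum == gene_bp)         # lengths add up
print(f"{exons_per_gene:.2f} exons per gene")
print(f"{mean_gene_length:.0f} bp mean gene length")
```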
Summary statistics of the Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis of the H. discus hannai genome based on the metazoan database.
| Categories | #Genes | Percentage |
|---|---|---|
| Complete single-copy BUSCOs | 609 | 72.2 |
| Complete duplicate BUSCOs | 48 | 5.7 |
| Fragmented BUSCOs | 130 | 15.4 |
| Missing BUSCOs | 104 | 12.3 |
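The percentages in the BUSCO table are consistent with a denominator of 843 genes, which is the sum of the first, third, and fourth rows. The table itself does not state the total, so 843 is an assumption in the sketch below.

```python
# Counts copied from the BUSCO table above.
busco_counts = {
    "Complete single-copy BUSCOs": 609,
    "Complete duplicate BUSCOs": 48,
    "Fragmented BUSCOs": 130,
    "Missing BUSCOs": 104,
}

# Assumed total: first + third + fourth rows (not stated in the source table).
ASSUMED_TOTAL = 609 + 130 + 104

busco_pct = {
    category: round(100 * count / ASSUMED_TOTAL, 1)
    for category, count in busco_counts.items()
}

for category, pct in busco_pct.items():
    print(f"{category}: {pct}%")
```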