Qiang Lin1, Ying Qiu2,3, Ruobo Gu2,3,4, Meng Xu5, Jia Li3, Chao Bian3,6,7, Huixian Zhang1, Geng Qin1, Yanhong Zhang1, Wei Luo1, Jieming Chen3, Xinxin You3,6, Mingjun Fan3, Min Sun3, Pao Xu2,6, Byrappa Venkatesh8, Junming Xu3,4,6, Hongtuo Fu2,6, Qiong Shi3,4,6,9.
Abstract
Background: The lined seahorse, Hippocampus erectus, is an Atlantic species that mainly inhabits shallow seabeds and coral reefs. It has become very popular in China because of its wide use in traditional Chinese medicine. To improve the aquaculture yield of this valuable fish, we are developing genomic resources to assist selection in genetic breeding. Here we report the whole-genome sequencing, assembly, and gene annotation of the lined seahorse, which enrich the genomic resources for this species and support their further application in its molecular breeding. A total of 174.6 Gb (gigabases) of raw DNA sequence was generated on the Illumina HiSeq 2500 platform. The final assembly of the lined seahorse genome is approximately 458 Mb, representing 94% of the estimated genome size (489 Mb by k-mer analysis). The contig N50 and scaffold N50 reached 14.57 kb and 1.97 Mb, respectively. Assembly quality was assessed with BUSCO, which identified 85% of the known vertebrate genes, and with de novo assembled RNA-seq transcripts, which showed a high mapping ratio (more than 99% of the transcripts could be mapped to the assembly). Using homology-based, de novo, and transcriptome-based prediction methods, we predicted 20 788 protein-coding genes in the assembly, fewer than the 23 458 genes we previously reported for the tiger tail seahorse (H. comes). In summary, we present a draft genome of the lined seahorse. These genomic data will enrich the genomic resources for this economically important fish and provide insights into the genetic mechanisms underlying its iconic morphology and male pregnancy behavior.
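The contig and scaffold N50 values follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of all assembled bases. A minimal Python sketch of that definition is shown below; the lengths are hypothetical toy values, not data from the lined seahorse assembly, and this is not the authors' assessment pipeline.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all bases in the assembly.
    """
    # Ignore empty entries and sort from longest to shortest.
    ordered = sorted((x for x in lengths if x > 0), reverse=True)
    if not ordered:
        raise ValueError("no positive sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: running total always reaches the sum")


# Hypothetical contig lengths (bp); total = 300, half = 150.
toy_contigs = [10, 80, 30, 70, 50, 20, 40]
print(n50(toy_contigs))  # 80 + 70 = 150 >= 150, so N50 = 70
```

Scaffold N50 is computed the same way over scaffold lengths, which is why it can be far larger than contig N50 (1.97 Mb versus 14.57 kb here).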
Keywords: genome; assembly; annotation; Hippocampus erectus
Year: 2017 PMID: 28444302 PMCID: PMC5459928 DOI: 10.1093/gigascience/gix030
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Photo of a cultivated lined seahorse in Shenzhen, China.
Comparison of genome assembly and annotation between the lined seahorse and the previously reported tiger tail seahorse
| Genome Assembly | Lined Seahorse | Tiger Tail Seahorse |
|---|---|---|
| Contig N50 size (kb) | 14.57 | 34.67 |
| Scaffold N50 size (Mb) | 1.97 | 1.87 |
| Estimated genome size (Mb) | 489 | 695 |
| Assembled genome size (Mb) | 457.76 | 501.59 |
| Genome coverage (×) | 243.05 | 192.05 |
| Longest scaffold (bp) | 7 855 128 | 9 810 584 |
| Genome annotation | ||
| Protein-coding gene number | 20 788 | 23 458 |
| Genes with functional annotation | 18 776 (90.32%) | 22 245 (94.83%) |
| Genes without functional annotation | 2012 (9.68%) | 1213 (5.17%) |
| Transposable elements content | 28.1% | 24.8% |
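The estimated genome sizes in the comparison table come from k-mer analysis. A common form of that estimate divides the total number of k-mers in the reads by the depth at the main peak of the k-mer frequency histogram. The Python sketch below assumes that simple form; the histogram is a hypothetical toy, the `min_depth` error cutoff is an illustrative assumption, and the k-mer size and software used for the lined seahorse are not stated here. It then checks the 94% figure from the abstract against the assembled and estimated sizes in the table.

```python
from typing import Dict


def estimate_genome_size(histogram: Dict[int, int], min_depth: int = 2) -> float:
    """Estimate genome size as total k-mers / depth of the main histogram peak.

    `histogram` maps depth d -> number of distinct k-mers seen exactly d times.
    Depths below `min_depth` (an assumed cutoff) are treated as sequencing
    errors and excluded from both the k-mer total and the peak search.
    """
    kept = {d: n for d, n in histogram.items() if d >= min_depth}
    if not kept:
        raise ValueError("no depths at or above min_depth")
    total_kmers = sum(d * n for d, n in kept.items())
    peak_depth = max(kept, key=kept.get)  # depth with the most distinct k-mers
    return total_kmers / peak_depth


# Hypothetical histogram: many depth-1 error k-mers, main peak at depth 6.
toy_hist = {1: 5000, 2: 300, 3: 200, 4: 400, 5: 800,
            6: 1000, 7: 800, 8: 400, 9: 100}
print(estimate_genome_size(toy_hist))  # 22500 k-mers / depth 6 = 3750.0

# Assembled vs. k-mer-estimated size for the lined seahorse (table values, Mb).
assembled_mb, estimated_mb = 457.76, 489
print(f"{assembled_mb / estimated_mb:.1%}")  # 93.6%, reported as 94%
```

Dedicated tools such as GenomeScope additionally model heterozygosity and sequencing errors, so this peak-based estimate is only a first approximation.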
Assessment of the completeness of the lined seahorse genome using transcriptome data
| Dataset | Number | Total length (bp) | Bases covered by assembly (%) | Sequences covered by assembly (%) | >90% of sequence in one scaffold (number) | >90% of sequence in one scaffold (%) | >50% of sequence in one scaffold (number) | >50% of sequence in one scaffold (%) |
|---|---|---|---|---|---|---|---|---|
| All | 71 765 | 52 877 091 | 98.22 | 99.52 | 68 292 | 95.16 | 71 255 | 99.29 |
| >200 bp | 71 765 | 52 877 091 | 98.22 | 99.52 | 68 292 | 95.16 | 71 255 | 99.29 |
| >500 bp | 29 811 | 40 111 717 | 98.12 | 99.68 | 27 902 | 93.60 | 29 640 | 99.43 |
| >1000 bp | 14 780 | 29 612 539 | 97.92 | 99.70 | 13 561 | 91.75 | 14 686 | 99.36 |
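The percentage columns in the two rightmost column pairs are the transcript counts divided by the number of transcripts in each dataset. The short Python check below recomputes them from the counts in the table; it only reproduces that arithmetic and does not repeat the underlying transcript-to-assembly alignment.

```python
# (total transcripts, >90% in one scaffold, >50% in one scaffold), from the table.
rows = {
    "All": (71_765, 68_292, 71_255),
    ">500 bp": (29_811, 27_902, 29_640),
    ">1000 bp": (14_780, 13_561, 14_686),
}


def percent(part: int, whole: int) -> str:
    """Return part/whole as a percentage string with two decimals."""
    return f"{100 * part / whole:.2f}"


for name, (total, over90, over50) in rows.items():
    # Prints the dataset name, then the >90% and >50% percentages.
    print(name, percent(over90, total), percent(over50, total))
```

The ">200 bp" row is identical to "All" because every transcript in the dataset is longer than 200 bp.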
Figure 2: Phylogeny of ray-finned fishes. The spotted gar was used as the outgroup. Details of the protein sequence sources are given in the main text.