Kai Liu1, Dongpo Xu1, Jia Li2, Chao Bian2, Jinrong Duan1, Yanfeng Zhou1, Minying Zhang1, Xinxin You2, Yang You1, Jieming Chen2, Hui Yu2, Gangchun Xu1, Di-An Fang1, Jun Qiang1, Shulun Jiang1, Jie He1, Junmin Xu2,3,4, Qiong Shi2,3,4,5, Zhiyong Zhang6, Pao Xu1,4.
Abstract
Background: The Chinese clearhead icefish, Protosalanx hyalocranius, is a representative icefish species of economic importance with a distinctive appearance. Because of its economic value in China, the fish was introduced into Lake Dianchi and several other lakes from Lake Taihu half a century ago. Like the Sinocyclocheilus cavefish, the clearhead icefish has certain cavefish-like traits, such as a transparent body and nearly scaleless skin. Here, we sequenced the whole genome of this surface-dwelling fish and generated a draft genome assembly, with the aim of exploring the molecular mechanisms underlying these biological features. A total of 252.1 Gb of raw reads were sequenced. Subsequently, a novel draft genome assembly was generated, with a scaffold N50 of 1.163 Mb. Genome completeness was estimated at 98.39 % using the CEGMA evaluation. Finally, we annotated 19 884 protein-coding genes and found that repeat sequences account for 24.43 % of the genome assembly. We report the first draft genome of the Chinese clearhead icefish. The genome assembly will provide a solid foundation for further molecular breeding and germplasm resource protection in the Chinese clearhead icefish, as well as in other icefishes. It is also a valuable genetic resource for revealing the molecular mechanisms behind the cavefish-like characters.
Keywords: Clearhead icefish; Gene prediction; Genome assembly; Protosalanx hyalocranius; Repetitive sequences; Whole genome sequencing
Year: 2017 PMID: 28327943 PMCID: PMC5530312 DOI: 10.1093/gigascience/giw012
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Picture of a Chinese clearhead icefish. The specimen was captured in Lake Taihu, Jiangsu Province, China.
The statistics of genome assembly and annotation for P. hyalocranius
| Genome assembly | |
|---|---|
| Contig N50 size (kb) | 17.2 |
| Scaffold N50 size (Mb) | 1.163 |
| Estimated genome size (Mb) | 525 |
| Assembled genome size (Mb) | 536 |
| Genome coverage (X) | 315 |
| The longest scaffold (bp) | 5 398 389 |
| Gap length (Mb) | 122 |
| Genome annotation | |
| Protein-coding gene number | 19 884 |
| Annotated functional gene number | 19 125 (96.2 %) |
| Unannotated functional gene number | 759 (3.8 %) |
| Repeat content | 24.43 % |
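
For readers unfamiliar with the N50 statistic reported above, the following is a minimal sketch of how it is commonly computed: sort the sequence lengths from longest to shortest and return the length at which the running total first reaches half of the assembly. The function name `n50` and the toy lengths are illustrative assumptions, not data from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 >= total:
            return length
    raise ValueError("cannot compute N50 of an empty collection")


# Toy example (hypothetical lengths, not from the icefish assembly).
toy_scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(toy_scaffolds))
```

Applied to the real scaffold lengths, the same calculation would be expected to give the reported 1.163 Mb, and applied to contig lengths the reported 17.2 kb.
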
Detailed classification of repeat sequences in the assembled genome
| Type | Repeat size (bp) | % of Genome |
|---|---|---|
| ProteinMask | 9 925 152 | 1.85 |
| RepeatMasker | 5 948 136 | 1.11 |
| Tandem Repeat Finder | 66 595 756 | 12.41 |
| De novo | 93 726 009 | 17.47 |
| Total | 131 090 229 | 24.43 |
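
The percentages in this table can be cross-checked with a short calculation. The sketch below assumes that each percentage is the repeat size divided by the assembled genome size, and it infers that size from the Total row because the exact base count is not given here. It also shows that the four categories sum to more than the total, which suggests that the annotations overlap and that the total counts each base only once.

```python
# Repeat sizes (bp) and reported percentages, copied from the table above.
repeats = {
    "ProteinMask": (9_925_152, 1.85),
    "RepeatMasker": (5_948_136, 1.11),
    "Tandem Repeat Finder": (66_595_756, 12.41),
    "De novo": (93_726_009, 17.47),
}
total_bp, total_pct = 131_090_229, 24.43

# Assumption: genome size implied by the Total row (about 536.6 Mb).
implied_genome_bp = total_bp / (total_pct / 100)

# Recompute each category's share of the genome.
recomputed = {
    name: 100 * size / implied_genome_bp for name, (size, _) in repeats.items()
}
for name, pct in recomputed.items():
    print(f"{name}: {pct:.2f} % (reported {repeats[name][1]} %)")

# The categories overlap, so their sum exceeds the non-redundant total.
category_sum = sum(size for size, _ in repeats.values())
print(f"sum of categories: {category_sum} bp, non-redundant total: {total_bp} bp")
```
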
Gene annotation statistics of the genome of P. hyalocranius
| Method | Number | Average transcript length (bp) | Average CDS length (bp) | Average exons per gene | Average exon length (bp) | Average intron length (bp) |
|---|---|---|---|---|---|---|
| AUGUSTUS | 23 132 | 4897.24 | 1264.61 | 5.78 | 218.81 | 760.04 |
| GeneScan | 21 379 | 17 213.49 | 1973.56 | 10.22 | 193.05 | 1652.41 |
| Homolog | 25 390 | 7156.92 | 1312.32 | 6.17 | 212.62 | 1129.99 |
| Homolog | 25 319 | 6411.36 | 1194.58 | 5.89 | 202.73 | 1066.29 |
| Homolog | 16 563 | 7990.91 | 1759.17 | 11.59 | 151.75 | 588.32 |
| Homolog | 19 128 | 8335.40 | 1351.98 | 7.44 | 181.78 | 1084.78 |
| Homolog | 24 861 | 8019.18 | 1375.58 | 6.92 | 198.85 | 1122.70 |
| Homolog | 25 354 | 6819.62 | 1183.46 | 6.18 | 191.44 | 1087.68 |
| Final gene set | 19 884 | 12 889.35 | 1821.79 | 9.13 | 199.49 | 1360.92 |
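
The columns of this table are related to one another, which allows a rough sanity check. The sketch below assumes that the transcript length spans only coding exons and the introns between them, so that average exon length is about CDS / exons and average intron length is about (transcript − CDS) / (exons − 1). Because these are ratios of averages rather than averages of per-gene ratios, only approximate agreement should be expected. The function name `implied_lengths` is illustrative.

```python
def implied_lengths(transcript_bp, cds_bp, exons_per_gene):
    """Estimate average exon and intron length from the other averages.

    Assumes a gene with n exons has n - 1 introns and that the
    transcript length is the sum of coding exons and introns.
    """
    exon_bp = cds_bp / exons_per_gene
    intron_bp = (transcript_bp - cds_bp) / (exons_per_gene - 1)
    return exon_bp, intron_bp


# Rows copied from the table: (transcript, CDS, exons per gene, exon, intron).
rows = {
    "AUGUSTUS": (4897.24, 1264.61, 5.78, 218.81, 760.04),
    "Final gene set": (12889.35, 1821.79, 9.13, 199.49, 1360.92),
}

for method, (transcript, cds, exons, exon_rep, intron_rep) in rows.items():
    exon_est, intron_est = implied_lengths(transcript, cds, exons)
    print(
        f"{method}: exon {exon_est:.2f} (reported {exon_rep}), "
        f"intron {intron_est:.2f} (reported {intron_rep})"
    )
```

For both rows the estimates fall within about 0.1 % of the reported values, which suggests the table is internally consistent under these assumptions.
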
Figure 2: Phylogeny of seven representative ray-finned fishes. The spotted gar was used as the outgroup species.
Figure 3: Distribution of 4DTV distances between the clearhead icefish and tilapia. The horizontal axis shows the 4DTV distance corrected using the HKY model. The vertical axis shows the percentage of collinear gene pairs.