Yitao Zhou1, Shijun Xiao2, Gang Lin1,3, Duo Chen1, Wan Cen1, Ting Xue1, Zhiyu Liu4, Jianxing Zhong4, Yanting Chen5, Yijun Xiao1, Jianhua Chen1, Yunhai Guo6, Youqiang Chen1, Yanding Zhang1, Xuefeng Hu7, Zhen Huang8.
Abstract
Pufferfish are ideal models for studies of vertebrate chromosome evolution. The yellowbelly pufferfish, Takifugu flavidus, is an important marine fish species for the aquaculture industry and ecology of East Asia. A chromosome-level assembly of this species would facilitate studies of chromosome evolution and functional gene mapping. To this end, 44, 27 and 50 Gb of reads were generated for genome assembly using Illumina, PacBio and Hi-C sequencing technologies, respectively. More than 13 Gb of full-length transcripts were sequenced on the PacBio platform. A 366 Mb genome was obtained, with contig and scaffold N50 lengths of 4.4 Mb and 15.7 Mb, respectively. In total, 266 contigs were reliably assembled into 22 chromosomes, representing 95.9% of the total genome. A total of 29,416 protein-coding genes were predicted, of which 28,071 were functionally annotated. More than 97.7% of the BUSCO genes were detected in the genome. The genome resource generated in this work will be used for the conservation and population genetics of the yellowbelly pufferfish, as well as for vertebrate chromosome evolution studies.
Year: 2019 PMID: 31704938 PMCID: PMC6841922 DOI: 10.1038/s41597-019-0279-z
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 The workflow used for yellowbelly pufferfish genome assembly and annotation in this work. Panes shaded green, cyan and yellow represent the input sequencing data, intermediate files and final outputs, respectively. Bioinformatics software is highlighted in red along the workflow.
Fig. 2 The yellowbelly pufferfish individual used for genome sequencing and assembly.
Sequencing data used for the yellowbelly pufferfish genome assembly.
| Library | Sequencing platform | Insert size | Raw data (Gb) | Sequence coverage (X) |
|---|---|---|---|---|
| genome | Illumina HiSeq X Ten | 250 bp | 41.8 | 110 |
| genome | PacBio SEQUEL | 20 kb | 27.2 | 73 |
| Hi-C | Illumina HiSeq X Ten | 250 bp | 50.7 | 132 |
| transcriptome | PacBio SEQUEL | 0.6–3 kb | 13.1 | — |
Note that the sequence coverage values were calculated from the genome size estimated by the k-mer-based method (Fig. 3).
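Sequence coverage is total sequenced bases divided by genome size, so the table can be inverted to recover the genome size that each reported coverage implies. The sketch below does this in Python. The function names are hypothetical, and because the k-mer-based estimate itself is not given in this excerpt, the implied sizes serve only as a consistency check.

```python
def coverage(raw_bases: float, genome_size_bp: float) -> float:
    """Fold coverage = total sequenced bases / genome size."""
    return raw_bases / genome_size_bp


def implied_genome_size(raw_bases: float, coverage_x: float) -> float:
    """Invert the coverage formula: genome size = total sequenced bases / coverage."""
    return raw_bases / coverage_x


# Raw data (bases) and reported coverage, taken from the sequencing data table.
LIBRARIES = {
    "Illumina genome": (41.8e9, 110),
    "PacBio genome": (27.2e9, 73),
    "Hi-C": (50.7e9, 132),
}

for name, (raw_bases, cov) in LIBRARIES.items():
    size_mb = implied_genome_size(raw_bases, cov) / 1e6
    print(f"{name}: implied genome size ~{size_mb:.0f} Mb")
```

The three libraries imply genome sizes of roughly 373 to 384 Mb. Because they do not agree exactly, the reported coverages are best read as approximate values.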
Fig. 3 The 17-mer count distribution used for genome size estimation. Note that the peaks at depths of about 33, 66 and 132 represent heterozygous, homozygous and repeated k-mers, respectively.
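A k-mer-based size estimate usually rests on one idea: the total number of k-mers divided by the depth of the homozygous peak approximates the genome length. The heterozygous peak sits at about half that depth (33 ≈ 66/2), and repeats sit at multiples of it (132 ≈ 2 × 66). The following minimal Python sketch assumes a histogram that maps each depth to the number of distinct k-mers observed at that depth, which is the shape of output produced by k-mer counters such as Jellyfish's `histo` command. The tool actually used is not named here, and the error cutoff and toy histogram are hypothetical.

```python
from typing import Dict


def find_peak_depth(hist: Dict[int, int], min_depth: int) -> int:
    """Depth with the most distinct k-mers, ignoring the low-depth error tail.

    Assumes the homozygous peak is the tallest. In highly heterozygous genomes
    the heterozygous peak can be taller, so pass the depth explicitly instead.
    """
    candidates = {depth: n for depth, n in hist.items() if depth >= min_depth}
    return max(candidates, key=candidates.get)


def estimate_genome_size(hist: Dict[int, int], peak_depth: int, min_depth: int) -> float:
    """Genome size ~ total k-mer occurrences above the error cutoff / homozygous peak depth."""
    total_kmers = sum(depth * n for depth, n in hist.items() if depth >= min_depth)
    return total_kmers / peak_depth


# Hypothetical toy histogram: sequencing-error k-mers at depths 1-2, then
# heterozygous, homozygous and repeat peaks at the depths described for Fig. 3.
toy_hist = {1: 5000, 2: 800, 33: 40, 66: 100, 132: 10}
peak = find_peak_depth(toy_hist, min_depth=5)
print(peak, estimate_genome_size(toy_hist, peak, min_depth=5))
```

Weighting each depth by its k-mer count counts every k-mer occurrence. Dividing by the homozygous depth then converts occurrences into positions in the haploid genome.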
Assembly statistics for the yellowbelly pufferfish.
| Statistic | Contig length (Mb), new | Contig length (Mb), old | Scaffold length (Mb), new | Scaffold length (Mb), old | Contig number, new | Contig number, old | Scaffold number, new | Scaffold number, old |
|---|---|---|---|---|---|---|---|---|
| Total | 366.26 | 278.46 | 366.28 | 366.28 | 1,117 | 376,565 | 867 | 3,226 |
| Max | 12.82 | 0.046 | 28.84 | 2.8 | — | — | — | — |
| Number ≥ 2 kb | — | — | — | — | 1,115 | 23,662 | 867 | 3,146 |
| N50 | 4.4 | 0.0011 | 15.7 | 0.37 | 28 | 64,775 | 10 | 251 |
| N90 | 0.4 | 0.0003 | 11.7 | 0.055 | 127 | 241,187 | 21 | 1,198 |
Note that the term “contig” here refers to the contiguous sequences obtained after Hi-C-based chromosome construction. “New” denotes the genome assembled in the present work, and “old” denotes the genome published in 2014.
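N50 (or N90) is the length of the sequence at which sequences sorted from longest to shortest first cover 50% (or 90%) of the assembly. The accompanying count, often called L50 (or L90), is the number of sequences needed to reach that point, and it is what the N50 and N90 rows report in the number columns. The following short Python sketch implements this standard definition. The function name is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def nx_lx(lengths: Iterable[int], percent: int, total: Optional[int] = None) -> Tuple[int, int]:
    """Return (Nx length, Lx count) for x = `percent`.

    Sequences are sorted longest first and accumulated until they cover
    `percent`% of `total`, which defaults to the sum of `lengths`. Passing a
    larger `total` accounts for unlisted sequences, provided each of them is
    shorter than the sequences needed to reach the threshold.
    """
    ordered = sorted(lengths, reverse=True)
    if total is None:
        total = sum(ordered)
    cumulative = 0
    for count, length in enumerate(ordered, start=1):
        cumulative += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if cumulative * 100 >= percent * total:
            return length, count
    raise ValueError("listed sequences cover less than the requested fraction of total")


contigs = [50, 30, 10, 5, 5]  # toy lengths summing to 100
print(nx_lx(contigs, 50))  # (N50, L50)
print(nx_lx(contigs, 90))  # (N90, L90)
```

Suppose the function receives the 22 chromosome lengths from the chromosome summary table, with `total` set to the 366.28 Mb scaffold total. It then returns 15,676,631 bp with a count of 10 for N50, and 11,708,235 bp with a count of 21 for N90. Both results agree with the scaffold N50/N90 values above. This shortcut assumes that every unplaced scaffold is shorter than the chromosomes counted.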
Summary of the assembled chromosomes of the yellowbelly pufferfish.
| Chr | Chr length (bp) | Contig number | Gene number |
|---|---|---|---|
| Chr1 | 28,838,366 | 15 | 2,171 |
| Chr2 | 19,632,357 | 11 | 1,243 |
| Chr3 | 19,136,632 | 13 | 1,361 |
| Chr4 | 18,781,179 | 28 | 1,639 |
| Chr5 | 18,395,123 | 16 | 1,444 |
| Chr6 | 16,875,900 | 15 | 1,440 |
| Chr7 | 16,703,359 | 13 | 1,189 |
| Chr8 | 16,202,710 | 8 | 1,268 |
| Chr9 | 15,776,270 | 8 | 1,063 |
| Chr10 | 15,676,631 | 7 | 1,215 |
| Chr11 | 15,654,207 | 13 | 1,091 |
| Chr12 | 15,631,021 | 10 | 1,269 |
| Chr13 | 15,542,920 | 11 | 1,272 |
| Chr14 | 15,503,328 | 17 | 1,341 |
| Chr15 | 15,463,098 | 11 | 1,395 |
| Chr16 | 14,247,604 | 12 | 1,103 |
| Chr17 | 13,381,174 | 14 | 986 |
| Chr18 | 13,174,367 | 19 | 1,324 |
| Chr19 | 12,605,058 | 6 | 993 |
| Chr20 | 12,303,402 | 10 | 991 |
| Chr21 | 11,708,235 | 5 | 868 |
| Chr22 | 9,947,389 | 9 | 747 |
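The share of the assembly anchored to chromosomes follows directly from this table: sum the 22 chromosome lengths and divide the total by the assembly size. The small Python check below makes one assumption. It uses the 366.28 Mb scaffold total from the assembly statistics table as the denominator, because the authors' exact denominator is not stated here.

```python
# (chromosome, length in bp, gene number), taken from the chromosome summary table.
CHROMOSOMES = [
    ("Chr1", 28_838_366, 2_171), ("Chr2", 19_632_357, 1_243),
    ("Chr3", 19_136_632, 1_361), ("Chr4", 18_781_179, 1_639),
    ("Chr5", 18_395_123, 1_444), ("Chr6", 16_875_900, 1_440),
    ("Chr7", 16_703_359, 1_189), ("Chr8", 16_202_710, 1_268),
    ("Chr9", 15_776_270, 1_063), ("Chr10", 15_676_631, 1_215),
    ("Chr11", 15_654_207, 1_091), ("Chr12", 15_631_021, 1_269),
    ("Chr13", 15_542_920, 1_272), ("Chr14", 15_503_328, 1_341),
    ("Chr15", 15_463_098, 1_395), ("Chr16", 14_247_604, 1_103),
    ("Chr17", 13_381_174, 986), ("Chr18", 13_174_367, 1_324),
    ("Chr19", 12_605_058, 993), ("Chr20", 12_303_402, 991),
    ("Chr21", 11_708_235, 868), ("Chr22", 9_947_389, 747),
]
ASSEMBLY_SIZE_BP = 366_280_000  # assumption: scaffold total (366.28 Mb) as denominator
TOTAL_GENES = 29_416  # protein-coding genes predicted genome-wide

anchored_bp = sum(length for _, length, _ in CHROMOSOMES)
anchoring_rate = 100 * anchored_bp / ASSEMBLY_SIZE_BP
genes_on_chromosomes = sum(genes for _, _, genes in CHROMOSOMES)

print(f"Anchored: {anchored_bp:,} bp ({anchoring_rate:.1f}% of the assembly)")
print(f"Genes on chromosomes: {genes_on_chromosomes:,} of {TOTAL_GENES:,}")
```

This gives 351,180,330 bp, or about 95.9% of the assembly, which agrees with the figure quoted in the abstract.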
Repetitive element annotations in the yellowbelly pufferfish.
| Type | No. of TEs | Length (bp) | % of total TEs | % of genome |
|---|---|---|---|---|
| Total | 300,773 | 60,927,544 | 100 | 16.63 |
| Retrotransposons (Class I) | 77,720 | 29,919,159 | 49.11 | 8.17 |
| LTR | 20,782 | 10,394,098 | 17.06 | 2.84 |
| Ty1/Copia | 865 | 227,841 | 0.37 | 0.06 |
| Ty3/Gypsy | 7,440 | 3,670,541 | 6.02 | 1.00 |
| Other | 12,477 | 6,495,716 | 10.66 | 1.77 |
| Non-LTR | 51,417 | 18,060,948 | 29.64 | 4.93 |
| LINE | 37,274 | 16,042,878 | 26.33 | 4.38 |
| SINE | 14,143 | 2,018,070 | 3.31 | 0.55 |
| | 5,521 | 1,464,113 | 2.40 | 0.40 |
| DNA transposons (Class II) | 39,742 | 11,514,017 | 18.90 | 3.14 |
| CMC[DTC] | 3,477 | 365,955 | 0.60 | 0.10 |
| hAT | 8,606 | 3,318,856 | 5.45 | 0.91 |
| Mutator | 406 | 110,998 | 0.18 | 0.03 |
| Tc1/Mariner | 9,293 | 3,015,854 | 4.95 | 0.82 |
| PIF/Harbinger | 2,035 | 672,257 | 1.10 | 0.18 |
| Other | 6,632 | 1,014,243 | 1.66 | 0.28 |
| | 74 | 12,208 | 0.02 | 0.00 |
| | 194,185 | 19,801,588 | 32.50 | 5.41 |
| | 1,412 | 870,602 | 1.43 | 0.24 |
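Each percentage column in the repeat table is a length ratio. A category's masked length is divided either by the total TE length (60,927,544 bp) or by the assembly size. The Python sketch below assumes the 366.28 Mb scaffold total as the genome denominator. It reproduces the rounded values for rows such as Total, LINE and SINE, but the authors' exact denominator is not stated.

```python
from typing import Tuple

TOTAL_TE_BP = 60_927_544        # "Total" row of the repeat table
ASSEMBLY_SIZE_BP = 366_280_000  # assumption: scaffold total (366.28 Mb)


def te_percentages(length_bp: int) -> Tuple[float, float]:
    """Return (% of total TEs, % of genome), each rounded to two decimals."""
    return (
        round(100 * length_bp / TOTAL_TE_BP, 2),
        round(100 * length_bp / ASSEMBLY_SIZE_BP, 2),
    )


# A few rows from the repeat table, given as masked length in bp.
for label, length_bp in [("Total", 60_927_544), ("LINE", 16_042_878), ("SINE", 2_018_070)]:
    pct_te, pct_genome = te_percentages(length_bp)
    print(f"{label}: {pct_te}% of TEs, {pct_genome}% of genome")
```

The category percentages need not sum to 100. When repeat annotations overlap, the same bases can be counted under more than one category.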
Statistics of the functional annotation of protein-coding genes.
| Database | Number | Percent (%) |
|---|---|---|
| Nr | 27,859 | 94.7 |
| GO | 16,533 | 56.2 |
| KEGG | 27,700 | 94.2 |
| SwissProt | 23,881 | 81.2 |
| At least one database | 28,017 | 95.2 |
| Total | 29,416 | 100 |
Note that “at least one database” refers to genes with a hit in at least one of the databases listed above.
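The “at least one database” row is the size of the union of the per-database gene sets, not the sum of the per-database counts, because most genes hit several databases. The toy Python sketch below uses hypothetical gene IDs to show the difference.

```python
from typing import Dict, Set, Tuple


def annotation_summary(hits: Dict[str, Set[str]], total_genes: int) -> Dict[str, Tuple[int, float]]:
    """Per-database (count, percent of all genes) plus the union across databases."""
    summary = {
        db: (len(genes), round(100 * len(genes) / total_genes, 1))
        for db, genes in hits.items()
    }
    annotated = set().union(*hits.values())  # genes with a hit in >= 1 database
    summary["At least one database"] = (
        len(annotated),
        round(100 * len(annotated) / total_genes, 1),
    )
    return summary


# Hypothetical gene IDs; real input would be the hit lists from each search.
toy_hits = {
    "Nr": {"g1", "g2", "g3"},
    "GO": {"g2"},
    "KEGG": {"g1", "g4"},
    "SwissProt": {"g3"},
}
for row, (count, pct) in annotation_summary(toy_hits, total_genes=5).items():
    print(f"{row}: {count} ({pct}%)")
```

Applied to the real table, 28,017 of 29,416 genes gives the reported 95.2%.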
Fig. 4 Two examples of alignments of scaffolds from the previous genome assembly to the new yellowbelly pufferfish genome assembly. (a) Alignments on contig5 in the new genome. (b) Alignments on contig74 in the new genome. The X axis represents scaffolds from the previous genome, and the Y axis represents contig sequences assembled in this work. Forward and reverse alignments of the scaffold sequences are shown in blue and red, respectively.
| Field | Value |
|---|---|
| Measurement(s) | whole genome sequencing assay • transcription profiling assay • sequence_assembly • sequence annotation |
| Technology Type(s) | DNA sequencing • RNA sequencing assay • genome assembly • bioinformatics analysis |
| Sample Characteristic - Organism | Takifugu flavidus |
| Sample Characteristic - Environment | aquatic environment |