Zhixiong Zhou, Bo Liu, Baohua Chen, Yue Shi, Fei Pu, Huaqiang Bai, Leibin Li, Peng Xu.
Abstract
Takifugu bimaculatus is a teleost species native to the southeast coast of China, where it has been cultivated as an important edible fish over the last decade. Genetic breeding programs, recently initiated to improve the aquaculture performance of T. bimaculatus, urgently require a high-quality reference genome to facilitate genomic selection and related genetic studies. To address this need, we produced a chromosome-level reference genome of T. bimaculatus using PacBio single-molecule real-time (SMRT) sequencing and high-throughput chromosome conformation capture (Hi-C) technologies. The genome was assembled into 2,193 contigs with a total length of 404.21 Mb and a contig N50 length of 1.31 Mb. After chromosome-level scaffolding, 22 chromosomes with a total length of 371.68 Mb were constructed. Moreover, a total of 21,117 protein-coding genes and 3,471 ncRNAs were annotated in the reference genome. The highly accurate, chromosome-level reference genome of T. bimaculatus provides an essential genomic resource not only for genome-scale selective breeding of T. bimaculatus but also for exploring the evolutionary basis of speciation and local adaptation in the Takifugu genus.
Year: 2019 PMID: 31570724 PMCID: PMC6768875 DOI: 10.1038/s41597-019-0195-2
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Summary of genome sequencing data generated with multiple sequencing technologies.
| Library Type | Insert Size (bp) | Raw Data (Gb) | Clean Data (Gb) | Average Read Length (bp) | N50 Read Length (bp) | Sequencing Coverage (X) |
|---|---|---|---|---|---|---|
| Illumina | 350 | 53.43 | 53.28 | 150 | 150 | 135.52 |
| PacBio | 20,000 | 28.97 | — | 7,505 | 12,513 | 73.69 |
| Hi-C | — | 46.39 | 46.13 | 150 | 150 | 117.8 |
| RNA-Seq | — | 21.35 | 20.95 | 150 | 150 | 54.3 |
| Total | — | 149.99 | — | — | — | 381.5 |
Note: The genome size of T. bimaculatus used to calculate sequencing coverage was 393.15 Mbp, as estimated by the genome survey.
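The coverage column follows from dividing the sequenced bases by the estimated genome size. The sketch below is only an illustration of that arithmetic, not the authors' pipeline; the function name `sequencing_coverage` is hypothetical, and the inputs are the Illumina clean data and PacBio raw data from the table.

```python
def sequencing_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Genome size estimated by the genome survey (393.15 Mbp).
GENOME_SIZE_BP = 393.15e6

# Data volumes taken from the table (Gb converted to bases).
illumina = sequencing_coverage(53.28e9, GENOME_SIZE_BP)  # Illumina clean data
pacbio = sequencing_coverage(28.97e9, GENOME_SIZE_BP)    # PacBio raw data

print(f"Illumina: {illumina:.2f}X, PacBio: {pacbio:.2f}X")
```

Under these inputs the Illumina and PacBio values come out at roughly 135.52X and 73.69X, in line with the table.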
Fig. 1 The genome assembly pipeline.
Statistics of the genome assembly of T. bimaculatus.
| | Contig length (bp) | Scaffold length (bp) | Number of contigs | Number of scaffolds |
|---|---|---|---|---|
| Total | 404,208,938 | 404,312,138 | 2,193 | 1,161 |
| Max | 8,128,173 | 28,865,866 | — | — |
| Number >= 2000 | — | — | 2,143 | 1,111 |
| N50 | 1,312,995 | 16,785,490 | 82 | 11 |
| N60 | 951,152 | 16,217,719 | 117 | 13 |
| N70 | 563,057 | 15,683,578 | 173 | 16 |
| N80 | 220,884 | 13,896,868 | 292 | 19 |
| N90 | 68,784 | 10,376,233 | 627 | 22 |
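For readers unfamiliar with the Nx rows, the sketch below shows how such statistics are conventionally computed: sequences are sorted from longest to shortest, and Nx is the length of the sequence at which the running total first reaches x% of the assembly. The function name `nx` and the toy lengths are hypothetical and are not taken from the assembly.

```python
from typing import Sequence, Tuple


def nx(lengths: Sequence[int], x: float = 50) -> Tuple[int, int]:
    """Return (Nx, Lx): the sequence length and the sequence count at which
    the cumulative length, summed from longest to shortest, first reaches
    x percent of the total assembly length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    # Only reachable for x > 100; fall back to the shortest sequence.
    return ordered[-1], len(ordered)


# Toy contig lengths (hypothetical), total length 24.
toy = [2, 8, 3, 5, 2, 4]
print(nx(toy, 50))  # N50 and the number of contigs needed to reach it
print(nx(toy, 90))  # N90 and the number of contigs needed to reach it
```

In the table, the "Number" columns of the Nx rows play the role of the count returned here: 82 contigs, or 11 scaffolds, are enough to cover half of the assembly.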
Summary of assembled 22 chromosomes of T. bimaculatus.
| Chromosomes | Length (bp) | Number of Contigs |
|---|---|---|
| Chr1 | 28,865,866 | 68 |
| Chr2 | 20,901,650 | 55 |
| Chr3 | 20,839,560 | 60 |
| Chr4 | 19,082,936 | 61 |
| Chr5 | 18,556,983 | 59 |
| Chr6 | 17,762,956 | 51 |
| Chr7 | 17,385,507 | 47 |
| Chr8 | 17,095,808 | 54 |
| Chr9 | 17,068,765 | 55 |
| Chr10 | 16,786,025 | 53 |
| Chr11 | 16,785,490 | 54 |
| Chr12 | 16,284,555 | 50 |
| Chr13 | 16,217,719 | 54 |
| Chr14 | 16,120,980 | 47 |
| Chr15 | 16,059,269 | 50 |
| Chr16 | 15,683,578 | 65 |
| Chr17 | 14,840,516 | 62 |
| Chr18 | 14,847,795 | 52 |
| Chr19 | 13,896,868 | 51 |
| Chr20 | 13,487,414 | 56 |
| Chr21 | 12,729,218 | 46 |
| Chr22 | 10,376,233 | 40 |
| Linked Total | 371,675,691 | 1,242 |
| Unlinked Total | 32,532,707 | 951 |
| Linked Percent | 91.95% | 56.63% |
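The "Linked Percent" row can be reproduced from the totals in this table and the contig totals in the assembly statistics (404,208,938 bp in 2,193 contigs). This is a minimal check of the arithmetic; the helper name `anchored_percent` is hypothetical.

```python
def anchored_percent(linked: int, total: int) -> float:
    """Percentage of the assembly (by length or by contig count) placed on chromosomes."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * linked / total


# Totals from the chromosome summary and the assembly statistics tables.
length_pct = anchored_percent(371_675_691, 404_208_938)  # anchored bases
contig_pct = anchored_percent(1_242, 2_193)              # anchored contigs

print(f"Anchored length: {length_pct:.2f}%, anchored contigs: {contig_pct:.2f}%")
```

The gap between the two percentages indicates that the unanchored contigs are mostly short: about 43% of the contigs carry only about 8% of the assembled bases.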
Classification of repeat elements and ncRNAs in T. bimaculatus genome.
| Repeat type | De novo + Repbase length (bp) | TE protein length (bp) | Combined TEs length (bp) | Proportion in genome (%) |
|---|---|---|---|---|
| DNA | 21,029,049 | 3,437,660 | 24,459,756 | 6.05 |
| LINE | 37,262,756 | 12,547,875 | 49,755,614 | 12.31 |
| SINE | 1,189,529 | 0 | 1,189,529 | 0.29 |
| LTR | 25,586,059 | 5,992,977 | 31,547,035 | 7.80 |
| Simple Repeat | 8,473,364 | 0 | 8,473,364 | 2.10 |
| Unknown | 4,719,800 | 0 | 4,719,800 | 1.17 |
| Total | 88,122,922 | 21,916,443 | 109,924,780 | 27.20 |

| ncRNA type | Subtype | Copy number | Average length (bp) | Total length (bp) | Proportion in genome (%) |
|---|---|---|---|---|---|
| miRNA | | 1,666 | 91.11 | 151,786 | 0.037551 |
| tRNA | | 753 | 75.20 | 56,629 | 0.01401 |
| rRNA | rRNA (total) | 464 | 113.37 | 52,604 | 0.013014 |
| | 18S | 1 | 121 | 121 | 0.00003 |
| | 28S | 9 | 142.78 | 1,285 | 0.000318 |
| | 5.8S | 0 | 0 | 0 | 0 |
| | 5S | 454 | 112.77 | 51,198 | 0.012666 |
| snRNA | snRNA (total) | 588 | 141.15 | 82,996 | 0.020533 |
| | CD-box | 84 | 92.52 | 7,772 | 0.001923 |
| | HACA-box | 77 | 162.88 | 12,542 | 0.003103 |
| | Splicing | 413 | 144.85 | 59,821 | 0.0148 |
Note: “De novo” denotes transposable elements identified de novo using RepeatMasker, RepeatModeler, RepeatScout, and LTR_FINDER. “TE protein” denotes homologues of transposable elements in Repbase identified with RepeatProteinMask. “Combined TEs” denotes the combined result of the transposable elements identified by the two approaches. “Unknown” denotes transposable elements that could not be classified by RepeatMasker.
Fig. 2 Circos plot of the reference genome of T. bimaculatus and syntenic relationship with the T. rubripes genome. (a) Circos plot of 22 chromosomes and the annotated genes, ncRNAs and transposable elements of T. bimaculatus. The tracks from inside to outside are 22 chromosome-level scaffolds, the positive-strand gene abundance (red), negative-strand gene abundance (blue), positive-strand TE abundance (orange), negative-strand TE abundance (green), ncRNA abundance of both strands, and contigs that comprised the scaffolds (adjacent contigs on a scaffold are shown in different colours). (b) Circos diagram between T. bimaculatus and T. rubripes. Each coloured arc represents a 1 kb fragment match between the two species. We re-ordered the chromosome numbers of T. rubripes for better illustration.
Fig. 3 Gene and repetitive element annotations of the T. bimaculatus genome. (a) Divergence distribution of TEs in the T. bimaculatus genome. (b) Venn diagram of the number of genes with structure prediction based on different strategies. (c) Venn diagram of the number of functionally annotated genes based on different public databases.
Gene structure and function annotation in T. bimaculatus genome.
| Gene structure annotation | Value |
|---|---|
| Number of protein-coding genes | 21,117 |
| Number of unannotated genes | 19 |
| Average transcript length (bp) | 7,914.81 |
| Average exons per gene | 9.71 |
| Average exon length (bp) | 162.13 |
| Average CDS length (bp) | 1,573.89 |
| Average intron length (bp) | 728.2 |

| Gene function annotation | Number (Percent) |
|---|---|
| Swissprot | 20,086 (95.10%) |
| Nr | 20,817 (98.60%) |
| KEGG | 18,307 (86.70%) |
| InterPro | 21,090 (99.90%) |
| GO | 19,934 (94.40%) |
| Pfam | 18,050 (85.50%) |
| Annotated | 21,098 (99.90%) |
| Unannotated | 19 (0.10%) |
Fig. 4 Divergence times and distribution of different types of orthologues in representative species. (a) Estimated divergence times of representative species based on the phylogenomic analysis. The blue bars at the ancestral nodes indicate the 95% confidence intervals of the estimated divergence times (MYA, million years ago). Different background colours represent the corresponding geological ages. (b) Distribution of different types of orthologues in the selected representative species.
| Measurement(s) | whole genome sequencing assay • transcription profiling assay |
| Technology Type(s) | DNA sequencing • RNA sequencing |
| Factor Type(s) | organism part |
| Sample Characteristic - Organism | Takifugu bimaculatus |