Dian-Chang Zhang, Liang Guo, Hua-Yang Guo, Ke-Cheng Zhu, Shang-Qi Li, Yan Zhang, Nan Zhang, Bao-Suo Liu, Shi-Gui Jiang, Jiong-Tang Li.
Abstract
Golden pompano (Trachinotus ovatus), a marine fish in the family Carangidae, has a wide geographical distribution and adapts to severe environmental conditions. It is also an economically valuable aquaculture species. To understand the genetic mechanisms underlying this adaptation and to improve aquaculture production, we assembled its genome. Combining Illumina and PacBio reads yielded a genome sequence of 647.5 Mb with a contig N50 of 1.80 Mb and a scaffold N50 of 5.05 Mb. The assembly covers 98.9% of the estimated genome size (655 Mb). Based on Hi-C data, 99.4% of the assembled bases were anchored into 24 pseudo-chromosomes. The annotation includes 21,915 protein-coding genes, which contain complete copies of 95.7% of the 2,586 BUSCO vertebrate conserved genes. This genome is expected to contribute to comparative analyses of the Carangidae family.
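The abstract reports contiguity as N50 values and completeness as a share of the estimated genome size. The following is a minimal sketch of both calculations, assuming the usual N50 definition (the length of the sequence at which the running total of lengths, sorted longest first, reaches half of the assembly). The function name `n50` and the toy lengths are illustrative and are not taken from the pompano assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths, used only to illustrate the definition.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_lengths)

# Share of the estimated genome size covered by the assembly,
# using the two sizes quoted in the abstract (in Mb).
assembly_mb = 647.5
estimated_mb = 655.0
covered_percent = 100 * assembly_mb / estimated_mb

print(toy_n50, round(covered_percent, 1))
```

With the two sizes from the abstract, the ratio rounds to the reported 98.9%.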
Year: 2019 PMID: 31641137 PMCID: PMC6805935 DOI: 10.1038/s41597-019-0238-8
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 The pipeline of the chromosome-level pompano genome assembly.
Fig. 2 The k-mer distribution of Illumina paired-end reads from GenomeScope, based on a k value of 31. Frequency distribution of k-mers of different occurrences in two paired-end libraries. K-mer occurrences (x axis) are plotted against their frequencies (y axis).
Data statistics of whole genome sequencing reads of pompano.
| Platform | Insert size | Clean pairs | Total bases | Genome coverage (X) | SRA accession |
|---|---|---|---|---|---|
| Illumina | 500 bp | 44,554,312 | 19,894,674,143 | 30.3 | SRR8185380 |
| Illumina | 700 bp | 94,147,131 | 15,691,188,500 | 23.9 | SRR8185379 |
| Illumina | 3 kb | 24,639,173 | 5,597,129,699 | 8.5 | SRR8185378 |
| Illumina | 5 kb | 22,753,897 | 5,688,834,998 | 8.6 | SRR8185382 |
| Illumina | 14 kb | 149,292,822 | 28,171,641,480 | 42.9 | SRR8185385 |
| Hi-C (Illumina X ten) | | 382,798,592 | 114,839,577,600 | 175.1 | SRR8168440 |
| PacBio | | 2,278,176 | 16,879,861,540 | 25.7 | SRR7943174 |
| Total | | 272,622,581 | 206,762,907,960 | 315.3 | |
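The coverage column can be approximated as total bases divided by genome size. The sketch below assumes the denominator is the 655.7 Mb estimate from the k = 31 analysis; the source does not state which genome size was used, and the names `GENOME_SIZE_BP` and `fold_coverage` are illustrative.

```python
# Assumption: coverage is computed against the k = 31 genome size estimate.
GENOME_SIZE_BP = 655.7e6


def fold_coverage(total_bases, genome_size_bp=GENOME_SIZE_BP):
    """Return sequencing depth as total bases divided by genome size."""
    return total_bases / genome_size_bp


# Total bases per library, copied from the table above.
libraries = {
    "Illumina 500 bp": 19_894_674_143,
    "Illumina 700 bp": 15_691_188_500,
    "Illumina 3 kb": 5_597_129_699,
    "Illumina 5 kb": 5_688_834_998,
    "Illumina 14 kb": 28_171_641_480,
    "Hi-C": 114_839_577_600,
    "PacBio": 16_879_861_540,
}

coverage = {name: fold_coverage(bases) for name, bases in libraries.items()}
total_bases = sum(libraries.values())
total_coverage = fold_coverage(total_bases)

for name, depth in coverage.items():
    print(f"{name}: {depth:.2f}X")
print(f"Total: {total_bases:,} bases, {total_coverage:.2f}X")
```

Under this assumption the per-library values agree with the table to within about 0.1X, and the summed bases reproduce the reported total of 206,762,907,960.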
Estimation of genome size of pompano by k-mer analysis.
| K | Total number of k-mers | Number of erroneous k-mers | Peak in Jellyfish counting | Estimated genome size (Mb) |
|---|---|---|---|---|
| 17 | 30,359,515,882 | 1,700,273,328 | 45 | 636.9 |
| 19 | 29,905,858,631 | 2,266,172,955 | 43 | 642.8 |
| 21 | 29,425,980,179 | 2,419,537,116 | 42 | 643.0 |
| 23 | 28,931,567,876 | 2,494,020,191 | 41 | 644.8 |
| 25 | 28,427,735,494 | 2,544,415,369 | 40 | 647.1 |
| 27 | 27,917,344,738 | 2,581,038,454 | 39 | 649.6 |
| 29 | 27,402,087,718 | 2,606,597,782 | 38 | 652.5 |
| 31 | 26,882,868,388 | 2,621,598,458 | 37 | 655.7 |
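The estimates in this table are consistent with the common k-mer formula: genome size = (total k-mers − erroneous k-mers) / peak depth. The sketch below applies that formula to each row. The formula is inferred from the tabulated numbers rather than quoted from the source, and the function name is illustrative.

```python
def estimate_genome_size_mb(total_kmers, erroneous_kmers, peak_depth):
    """Estimate genome size in Mb from k-mer counts and the main peak depth."""
    return (total_kmers - erroneous_kmers) / peak_depth / 1e6


# (k, total k-mers, erroneous k-mers, peak depth, reported size in Mb)
rows = [
    (17, 30_359_515_882, 1_700_273_328, 45, 636.9),
    (19, 29_905_858_631, 2_266_172_955, 43, 642.8),
    (21, 29_425_980_179, 2_419_537_116, 42, 643.0),
    (23, 28_931_567_876, 2_494_020_191, 41, 644.8),
    (25, 28_427_735_494, 2_544_415_369, 40, 647.1),
    (27, 27_917_344_738, 2_581_038_454, 39, 649.6),
    (29, 27_402_087_718, 2_606_597_782, 38, 652.5),
    (31, 26_882_868_388, 2_621_598_458, 37, 655.7),
]

estimates = {
    k: estimate_genome_size_mb(total, errors, peak)
    for k, total, errors, peak, _ in rows
}

for k, _, _, _, reported in rows:
    print(f"k={k}: computed {estimates[k]:.1f} Mb, reported {reported} Mb")
```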
Comparisons of other published Carangiformes assemblies.
| Order | Carangiformes | ||||||
|---|---|---|---|---|---|---|---|
| Family | Carangidae | Echeneidae | |||||
| Species | | | | | | | |
| Assembled Size (Mb) | 647.5 | 639.2 | 672.1 | 716.4 | 661.8 | 614.2 | 544.2 |
| Scaffold N50 size (Mb) | 5.05 | 5.61 | 5.81 | 1.27 | 9.51 | 0.411 | NA |
| Total scaffolds | 373 | 384 | 34,656 | 99,598 | 1,343 | 7,606 | NA |
| Pseudo-chromosome number | 24 | NA | NA | NA | NA | NA | 24 |
| Average pseudo-chromosome length (Mb) | 26.8 | NA | NA | NA | NA | NA | 22.5 |
| Number of protein-coding genes | 21,915 | NA | 22,083 | 25,802 | NA | NA | 21,288 |
| Average CDS length (bp) | 1,608 | NA | 1,806 | 1,647 | NA | NA | 1,863 |
| Average exon number | 10.4 | NA | 11.0 | 9.96 | NA | NA | 11.2 |
| Average exon length (bp) | 275 | NA | 248 | 271 | NA | NA | 267 |
Fig. 3 Hi-C chromosome contact map. Each block represents a Hi-C contact between two genomic loci within a 100-kb window. A darker block indicates higher contact intensity.
Repeat content in pompano genome.
| Repeat elements | Copies | Bases | Percent (%) |
|---|---|---|---|
| SINE | 11,964 | 1,473,642 | 0.22 |
| Penelope | 2,054 | 373,482 | 0.06 |
| LINE | 54,917 | 13,503,181 | 2.08 |
| LTR | 15,038 | 2,965,180 | 0.46 |
| DNA transposon | 161,301 | 22,551,263 | 3.48 |
| Unclassified | 435,045 | 69,429,000 | 10.71 |
| Subtotal | 680,319 | 109,922,266 | 16.96 |
| Satellites | 1,037 | 167,798 | 0.026 |
| Simple repeats | 415,200 | 18,131,460 | 2.80 |
| Low complexity | 50,191 | 2,814,637 | 0.43 |
| Subtotal | 466,428 | 21,113,895 | 3.26 |
| | 2,167 | 188,301 | 0.029 |
| Total | 1,148,914 | 131,224,462 | 20.25 |
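The final row of the repeat table can be checked against the rows above it. A minimal sketch, assuming the final row is the sum of the two subtotal rows and the 2,167-copy row (an inference from the numbers, since the source shows no formula); the dictionary keys are descriptive placeholders, not labels from the source.

```python
# (copies, bases) for the rows that appear to feed the final row of the table.
components = {
    "first subtotal": (680_319, 109_922_266),
    "second subtotal": (466_428, 21_113_895),
    "2,167-copy row": (2_167, 188_301),
}

# Values reported in the final row of the table.
reported_copies = 1_148_914
reported_bases = 131_224_462

summed_copies = sum(copies for copies, _ in components.values())
summed_bases = sum(bases for _, bases in components.values())

print(summed_copies == reported_copies, summed_bases == reported_bases)
```

Both sums match the final row, which supports reading that row as the overall repeat total.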
Annotation of pompano genes to different databases.
| Type | Database | Assigned gene number |
|---|---|---|
| Homolog | Ensembl | 21,277 |
| | SwissProt | 19,794 |
| | TrEMBL | 21,356 |
| | Total | 21,365 |
| | Gene Ontology | 20,594 |
| | KEGG pathway | 7,956 |
| Total | | 21,365 |
Fig. 4 K-mer spectra copy number plot. The colors on the stacked bars represent copy number in the assembly. Frequency counts (spectral distribution) are computed from the Illumina paired-end reads.
Fig. 5 Alignment frequency distribution of PacBio long reads and Illumina short reads.
Fig. 6 Distribution of insert sizes of sequencing reads in five libraries.
Mapping ratio of RNA-seq reads from eight tissues.
| Tissue | Cleaned pairs | Total bases | Alignment ratio | SRA accession |
|---|---|---|---|---|
| Blood | 10,639,911 | 2,631,736,943 | 90.67% | SRR8656488 |
| Liver | 16,235,470 | 4,029,392,277 | 89.20% | SRR8656489 |
| Muscle | 14,800,607 | 3,677,971,940 | 94.05% | SRR8656490 |
| Brain | 14,983,402 | 3,714,276,260 | 82.65% | SRR8656491 |
| Spleen | 8,778,246 | 2,178,602,070 | 93.22% | SRR8656484 |
| Fin | 25,750,965 | 6,390,342,718 | 93.52% | SRR8656485 |
| Ovary | 19,151,732 | 4,749,798,341 | 91.98% | SRR8656486 |
| Stomach | 18,574,229 | 4,604,137,153 | 87.94% | SRR8656487 |
| Total | 128,914,562 | 31,976,257,702 | 90.49% | |
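The overall alignment ratio in the Total row is not the plain mean of the eight tissue ratios. The sketch below assumes it is a mean weighted by cleaned read pairs; the source does not say how the figure was derived, so this is a plausibility check rather than the authors' method.

```python
# (tissue, cleaned read pairs, alignment ratio in percent) from the table above
samples = [
    ("Blood", 10_639_911, 90.67),
    ("Liver", 16_235_470, 89.20),
    ("Muscle", 14_800_607, 94.05),
    ("Brain", 14_983_402, 82.65),
    ("Spleen", 8_778_246, 93.22),
    ("Fin", 25_750_965, 93.52),
    ("Ovary", 19_151_732, 91.98),
    ("Stomach", 18_574_229, 87.94),
]

total_pairs = sum(pairs for _, pairs, _ in samples)

# Mean ratio weighted by the number of cleaned pairs in each tissue.
weighted_ratio = sum(pairs * ratio for _, pairs, ratio in samples) / total_pairs

# Unweighted mean, for comparison.
simple_mean = sum(ratio for _, _, ratio in samples) / len(samples)

print(f"pairs: {total_pairs:,}")
print(f"weighted: {weighted_ratio:.2f}%  unweighted: {simple_mean:.2f}%")
```

The pair-weighted mean lands within about 0.01 percentage points of the reported 90.49%, whereas the unweighted mean is close to 90.40%.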
BUSCO evaluation of the pompano genes compared with the vertebrate gene set.
| BUSCO benchmark | Number | Percentage (%) |
|---|---|---|
| Complete BUSCOs | 2,473 | 95.7% |
| Complete and single-copy BUSCOs | 2,438 | 94.3% |
| Complete and duplicated BUSCOs | 35 | 1.4% |
| Fragmented BUSCOs | 45 | 1.7% |
| Missing BUSCOs | 68 | 2.6% |
| Total BUSCO vertebrate genes | 2,586 | 100% |
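The BUSCO categories should partition the benchmark set, which makes the table easy to sanity-check. A minimal sketch that recomputes the counts and percentages from the four disjoint categories; the variable names are illustrative.

```python
# Disjoint BUSCO categories, copied from the table above.
busco_counts = {
    "complete_single_copy": 2_438,
    "complete_duplicated": 35,
    "fragmented": 45,
    "missing": 68,
}
total_buscos = 2_586

complete = busco_counts["complete_single_copy"] + busco_counts["complete_duplicated"]
category_sum = sum(busco_counts.values())

# Percentage of the benchmark set in each category.
percent = {name: 100 * n / total_buscos for name, n in busco_counts.items()}
complete_percent = 100 * complete / total_buscos

for name, value in percent.items():
    print(f"{name}: {value:.2f}%")
print(f"complete: {complete} ({complete_percent:.2f}%)")
```

The counts add up to 2,586 and the complete count equals 2,473. Dividing 2,473 by 2,586 gives about 95.6%, so the tabulated 95.7% appears to be the sum of the rounded single-copy (94.3%) and duplicated (1.4%) percentages.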
Fig. 7 Whole-genome plots of four Carangiformes genomes compared with the pompano genome. Alignment dot plots show the comparisons between four Carangiformes assemblies (y-axis) and the pompano assembly (x-axis). Vertical and horizontal dotted lines mark the boundaries of chromosomes and of scaffolds in the assemblies, respectively. (a) Plot between the Seriola quinqueradiata assembly and the pompano assembly. (b) Plot between the Seriola rivoliana assembly and the pompano assembly. (c) Plot between the Seriola dumerili assembly and the pompano assembly. (d) Plot between the Echeneis naucrates assembly and the pompano assembly.
| Measurement(s) | DNA • chromosome conformation capture assay • transcription profiling assay |
| Technology Type(s) | DNA sequencing • Hi-C • RNA sequencing |
| Factor Type(s) | organism part |
| Sample Characteristic - Organism | Trachinotus ovatus |
| Sample Characteristic - Environment | ocean biome |