Fayan Wang, Lihan Wang, Dan Liu, Qiang Gao, Miaomiao Nie, Shihai Zhu, Yan Chao, Chaojie Yang, Cunfang Zhang, Rigui Yi, Weilin Ni, Fei Tian, Kai Zhao, Delin Qi.
Abstract
Gymnocypris eckloni is widely distributed in isolated lakes and the upper reaches of the Yellow River and plays a significant role in the trophic web of freshwater communities. In this study, we generated a chromosome-level genome of G. eckloni using PacBio, Illumina and Hi-C sequencing data. The genome consists of 23 pseudo-chromosomes that contain 918.68 Mb of sequence, with a scaffold N50 length of 43.54 Mb. In total, 23,157 genes were annotated, representing 94.80% of the total predicted protein-coding genes. The phylogenetic analysis showed that G. eckloni was most closely related to C. carpio, with an estimated divergence time of ~34.8 million years ago. We thus obtained a high-quality, chromosome-level genome assembly for G. eckloni. This genome will serve as a valuable genomic resource for future research on the evolution and ecology of the schizothoracine fish in the Qinghai-Tibetan Plateau.
Year: 2022 PMID: 35918339 PMCID: PMC9346132 DOI: 10.1038/s41597-022-01595-w
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Sequencing data used for the G. eckloni genome assembly.
| Library types | Insert size (bp) | Raw data (Gb) | Clean data (Gb) | Read length (bp) | Sequence coverage (X) |
|---|---|---|---|---|---|
| Illumina reads | 300 | 215.7 | 215.2 | 150 | 231.2 |
| PacBio reads | 20000 | 312.2 | 239.0 | 23706 | 334.6 |
| Hi-C reads | — | 257.3 | 257.3 | 300 | 275.8 |
| RNA reads | 300 | 67.76 | 66.43 | 150 | — |
| Total | — | 852.96 | 777.93 | — | — |
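The coverage column can be sanity-checked with simple arithmetic. The sketch below assumes that sequence coverage equals raw data divided by an estimated genome size; that estimate is not stated in this record, so the code back-computes it from each row. The function name `implied_genome_size_gb` is illustrative only.

```python
# Raw data (Gb) and sequence coverage (X), copied from the sequencing table.
libraries = {
    "Illumina": (215.7, 231.2),
    "PacBio": (312.2, 334.6),
    "Hi-C": (257.3, 275.8),
}


def implied_genome_size_gb(raw_gb, coverage):
    """Genome size (Gb) implied by coverage = raw data / genome size (assumed relation)."""
    return raw_gb / coverage


implied = {name: implied_genome_size_gb(*row) for name, row in libraries.items()}

for name, size in implied.items():
    print(f"{name}: {size:.3f} Gb")
```

All three libraries imply a genome size of about 0.933 Gb, which is consistent with the coverage values having been derived from the raw (not clean) data against a single genome size estimate.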
The statistics of length and number for the de novo assembled G. eckloni genome.
| Term | Contig length (bp) | Scaffold length (bp) | Contig No. | Scaffold No. |
|---|---|---|---|---|
| Total | 918,450,624 | 918,681,488 | 3,170 | 711 |
| | 22,682,260 | 89,391,071 | — | — |
| | — | — | 3,058 | 711 |
| N50 | 4,192,824 | 43,543,958 | 56 | 8 |
| | 2,476,204 | 34,715,927 | 85 | 11 |
| | 1,500,513 | 32,896,108 | 133 | 13 |
| | 641,416 | 29,129,546 | 229 | 16 |
| | 146,685 | 25,669,045 | 553 | 20 |
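For readers unfamiliar with the N50 statistic reported for this assembly, the following is a minimal sketch of the standard definition: sort sequences from longest to shortest and report the length at which the running total first reaches 50% of the assembly, together with the number of sequences needed to get there. The function name `nx` and the toy lengths are illustrative and are not taken from the paper.

```python
def nx(lengths, x=50):
    """Return (Nx length, number of sequences needed) for a list of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length, count
    raise AssertionError("unreachable for 0 < x <= 100")


# Toy assembly (hypothetical lengths, total = 300).
toy = [80, 70, 50, 40, 30, 20, 10]
n50_length, n50_count = nx(toy, 50)
n90_length, n90_count = nx(toy, 90)
print(n50_length, n50_count)
print(n90_length, n90_count)
```

Under this definition, the scaffold N50 of 43,543,958 bp in the table means that the 8 longest scaffolds together cover at least half of the assembled sequence.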
Fig. 1 Characteristics of the G. eckloni genome. (a) Hi-C intra-chromosomal contact map of the G. eckloni genome assembly. (b) Circos plot of the G. eckloni genome assembly. 1) Pseudo-chromosomes; 2) gene distribution; 3) GC content; 4) repeat distribution; 5) rRNA distribution; 6) tRNA distribution; 7) miRNA distribution; 8) snRNA distribution. All data were obtained using a sliding window of 10 kb.
Gene annotation of the G. eckloni genome via three methods.
| Method | Gene set | Number | Average transcript length (bp) | Average CDS length (bp) | Average exon length (bp) | Average intron length (bp) | Exons per gene |
|---|---|---|---|---|---|---|---|
| De novo | Augustus | 38,431 | 9,427.68 | 1,102.72 | 181.26 | 1,637.57 | 6.08 |
| De novo | GlimmerHMM | 88,372 | 9,368.19 | 580.21 | 146.67 | 2,973.05 | 3.96 |
| De novo | SNAP | 47,478 | 20,534.02 | 796.89 | 143.55 | 4,336.44 | 5.55 |
| De novo | Geneid | 32,716 | 17,045.39 | 1,223.45 | 216.31 | 3,398.15 | 5.66 |
| De novo | Genscan | 32,712 | 19,569.14 | 1,429.87 | 189.78 | 2,775.94 | 7.53 |
| Homolog | | 18,845 | 11,159.32 | 1,293.66 | 179.22 | 1,586.58 | 7.22 |
| Homolog | | 24,602 | 9,475.91 | 1,264.07 | 184.13 | 1,400.12 | 6.87 |
| Homolog | | 19,535 | 13,585.62 | 1,522.10 | 182.89 | 1,647.47 | 8.32 |
| Homolog | | 23,776 | 10,240.81 | 1,276.54 | 182.36 | 1,494.04 | 7.00 |
| Homolog | | 18,028 | 13,503.98 | 1,497.61 | 181.74 | 1,658.25 | 8.24 |
| Homolog | | 20,929 | 13,270.63 | 1,510.71 | 180.45 | 1,595.28 | 8.37 |
| Homolog | | 20,090 | 11,862.97 | 1,393.39 | 185.72 | 1,610.03 | 7.50 |
| RNAseq | PASA | 91,220 | 14,128.66 | 1,240.08 | 165.12 | 1,979.72 | 7.51 |
| RNAseq | Transcripts | 66,837 | 31,133.65 | 2,702.91 | 300.16 | 3,551.63 | 9.00 |
| EVM | | 35,931 | 11,908.94 | 1,192.33 | 176.57 | 1,862.87 | 6.75 |
| Pasa-update | | 35,599 | 12,447.20 | 1,220.20 | 177.47 | 1,910.77 | 6.88 |
| Final set | | 24,430 | 16,219.34 | 1,536.71 | 173.00 | 1,862.69 | 8.88 |
Note: CDS refers to coding sequence; GlimmerHMM is a gene finder based on a Generalized Hidden Markov Model (GHMM); SNAP refers to Semi-HMM-based Nucleic Acid Parser; EVM refers to EVidenceModeler.
Fig. 2 Composition of gene elements in the G. eckloni genome compared with other species. (a) CDS length distribution and comparison with other species. (b) Exon length distribution and comparison with other species. (c) Exon number distribution and comparison with other species. (d) Gene length distribution and comparison with other species. (e) Intron length distribution and comparison with other species.
Fig. 3 Venn diagram of the number of genes with homology or functional classification by each method.
Fig. 4 Phylogenetic tree based on single-copy genes from 14 species, showing the estimated divergence times (blue numbers), the topology, and the expansion (green numbers) and contraction (red numbers) of gene families.
| Measurement(s) | Genome |
| Technology Type(s) | Whole Genome Sequencing |
| Sample Characteristic - Organism | Gymnocypris eckloni |
| Sample Characteristic - Environment | fresh water |
| Sample Characteristic - Location | Little Yellow River |