Hai-Ping Liu, Shi-Jun Xiao, Nan Wu, Di Wang, Yan-Chao Liu, Chao-Wei Zhou, Qi-Yong Liu, Rui-Bin Yang, Wen-Kai Jiang, Qi-Qi Liang, Chi Zhang, Jun-Hua Gong, Xiao-Hui Yuan, Zhen-Bo Mou.
Abstract
Animal genomes in the Qinghai-Tibetan Plateau provide valuable resources for understanding the molecular mechanisms of environmental adaptation. Tibetan fish species play essential roles in the local ecology; however, genomic information for native fishes remains insufficient. Oxygymnocypris stewartii, a member of the genus Oxygymnocypris in the subfamily Schizothoracinae, is a native fish of the Tibetan Plateau that lives at elevations from roughly 3,000 m to 4,200 m. In this report, the PacBio and Illumina sequencing platforms were used to generate ~385.3 Gb of genomic sequencing data. A genome of about 1,849.2 Mb was obtained with a contig N50 length of 257.1 kb. About 44.5% of the genome was identified as repetitive elements, and 46,400 protein-coding genes were annotated in the genome. The assembled genome can be used as a reference for future population genetic studies of O. stewartii and will improve our understanding of the high-altitude adaptation of fishes in the Qinghai-Tibetan Plateau.
Year: 2019 PMID: 30720802 PMCID: PMC6362891 DOI: 10.1038/sdata.2019.9
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Figure 1. A picture of Oxygymnocypris stewartii.
(a) The appearance of Oxygymnocypris stewartii; (b) Collection site (red triangle) of the Oxygymnocypris stewartii sampled for genomic sequencing.
Sequencing data used for the Oxygymnocypris stewartii genome assembly.
| Library types | Insert size (bp) | Raw data (Gb) | Clean data (Gb) | Read length (bp) | Sequence coverage (X) |
|---|---|---|---|---|---|
| Illumina reads | 250 | 145.4 | 144.3 | 150 | 76.21 |
| PacBio reads | 20,000 | 141.1 | 140.7 | 13,287 | 74.31 |
| RNA reads | 250 | 98.8 | 94.76 | 150 | 50.04 |
| Total | - | 385.3 | 379.76 | - | 200.56 |

Note that the coverage was calculated using the estimated genome size.
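The coverage column can be reproduced by dividing each library's clean data by the estimated genome size. The following is a minimal sketch, assuming the 1,893.51 Mb estimate reported in the 17-mer statistics is the denominator; the function and variable names are illustrative and are not taken from the authors' pipeline.

```python
# Estimated genome size in megabases (value reported in the 17-mer statistics).
GENOME_SIZE_MB = 1893.51

# Clean data per library in gigabases (values from the sequencing-data table).
CLEAN_DATA_GB = {
    "Illumina reads": 144.3,
    "PacBio reads": 140.7,
    "RNA reads": 94.76,
}


def coverage(clean_gb: float, genome_mb: float = GENOME_SIZE_MB) -> float:
    """Return sequence coverage (X): clean bases / genome size, rounded to 2 decimals."""
    return round(clean_gb * 1000 / genome_mb, 2)


coverages = {name: coverage(gb) for name, gb in CLEAN_DATA_GB.items()}
total_coverage = coverage(sum(CLEAN_DATA_GB.values()))

for name, cov in coverages.items():
    print(f"{name}: {cov}X")
print(f"Total: {total_coverage}X")
```

Under this assumption the computed values agree with the coverage column of the table.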
Statistics of 17-mer analysis for Oxygymnocypris stewartii genome.
| K-mer | K-mer number | Peak depth | Genome size (Mb) | Used bases | Used reads | Coverage (X) |
|---|---|---|---|---|---|---|
| 17 | 115,523,294,760 | 60 | 1,893.51 | 144,295,054,200 | 961,967,028 | 76.21 |

Note that all 17-mer sequences were extracted from paired-end clean reads that passed quality control (QC) from the next-generation sequencing libraries, and the frequency of each 17-mer was calculated and plotted.
Figure 2. 17-mer frequency distribution in the Oxygymnocypris stewartii genome.
The X-axis is the k-mer depth, and the Y-axis represents the frequency of k-mers at a given depth.
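A common first-order estimate of genome size from a k-mer spectrum divides the total number of k-mers by the depth of the main peak. The sketch below applies that textbook formula to the 17-mer statistics, assuming the value 115,523,294,760 is the total 17-mer count. It is not confirmed that the authors computed their estimate this way: the naive result (about 1,925 Mb) is somewhat higher than the reported 1,893.51 Mb, which may reflect a correction step that is not described in this section.

```python
def kmer_genome_size_mb(kmer_count: int, peak_depth: int) -> float:
    """Naive genome size estimate in Mb: total k-mers / depth of the main peak."""
    if peak_depth <= 0:
        raise ValueError("peak_depth must be positive")
    return kmer_count / peak_depth / 1e6


# Values from the 17-mer statistics; treating the first as the k-mer count is an assumption.
KMER_COUNT = 115_523_294_760
PEAK_DEPTH = 60

naive_size_mb = kmer_genome_size_mb(KMER_COUNT, PEAK_DEPTH)
print(f"Naive estimate: {naive_size_mb:.2f} Mb")
```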
The statistics of length and number for the de novo assembled genome of Oxygymnocypris stewartii.
| Statistics | Length (bp) | Number |
|---|---|---|
| Total | 1,849,224,471 | 26,281 |
| Max | 8,753,147 | - |
| Number >= 2,000 bp | - | 25,716 |
| N50 | 257,093 | 1,104 |
| N60 | 120,727 | 2,199 |
| N70 | 70,409 | 4,248 |
| N80 | 44,440 | 7,597 |
| N90 | 29,065 | 12,765 |

Note that the length statistics of the genome assembly were based on the estimated genome size.
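For readers unfamiliar with the Nx statistics: with contigs sorted from longest to shortest, Nx is the length of the contig at which the cumulative length first reaches x% of a reference length, and the accompanying number is how many contigs were needed to get there. The following is a minimal sketch using made-up contig lengths, since the real contig lengths are not listed here. The `reference_length` parameter is a hypothetical convenience for substituting an estimated genome size for the assembly total.

```python
from typing import Optional, Sequence, Tuple


def nx(lengths: Sequence[int], fraction: float = 0.5,
       reference_length: Optional[int] = None) -> Tuple[int, int]:
    """Return (Nx length, number of contigs needed) for the given fraction.

    With reference_length=None the total assembly length is the reference;
    passing an estimated genome size instead gives the NGx variant.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    reference = reference_length if reference_length is not None else sum(lengths)
    target = reference * fraction
    cumulative = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        cumulative += length
        if cumulative >= target:
            return length, count
    raise ValueError("contigs do not reach the requested fraction of the reference")


# Hypothetical toy contig lengths, not data from the assembly.
toy_contigs = [10, 8, 6, 4, 2]
n50_length, n50_count = nx(toy_contigs, 0.5)
n90_length, n90_count = nx(toy_contigs, 0.9)
print(n50_length, n50_count)
print(n90_length, n90_count)
```

In the table, the "Length (bp)" column corresponds to the first returned value and the "Number" column to the second.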
The annotation of repeated sequences in the Oxygymnocypris stewartii genome using TRF, RepeatMasker, and RepeatProteinMask.
| Type | Repeat size (bp) | Percentage of genome (%) |
|---|---|---|
| TRF (Tandem Repeats Finder) | 151,169,214 | 8.17 |
| RepeatMasker | 788,753,932 | 42.65 |
| RepeatProteinMask | 103,914 | 0.01 |
| Total | 822,841,233 | 44.50 |

Note that the total content was merged across methods and redundancy was eliminated.
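The percentage column is each repeat length divided by the assembly length, and the merged total is smaller than the sum of the three methods because overlapping annotations are counted once. The following is a quick check, assuming the 1,849,224,471 bp assembly total is the denominator; the variable names are illustrative.

```python
# Total assembly length in bp (from the assembly statistics).
ASSEMBLY_LENGTH_BP = 1_849_224_471

# Repeat lengths in bp per method, plus the merged total (from the repeat table).
REPEAT_BP = {
    "TRF": 151_169_214,
    "RepeatMasker": 788_753_932,
    "RepeatProteinMask": 103_914,
    "Total": 822_841_233,
}


def percent_of_genome(length_bp: int, genome_bp: int = ASSEMBLY_LENGTH_BP) -> float:
    """Return length as a percentage of the genome, rounded to 2 decimals."""
    return round(100 * length_bp / genome_bp, 2)


repeat_percent = {name: percent_of_genome(bp) for name, bp in REPEAT_BP.items()}

# Bases annotated by more than one method: sum of the methods minus the merged total.
overlap_bp = sum(bp for name, bp in REPEAT_BP.items() if name != "Total") - REPEAT_BP["Total"]

for name, pct in repeat_percent.items():
    print(f"{name}: {pct}%")
print(f"Redundant bases removed by merging: {overlap_bp:,}")
```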
Summary statistics of repeat annotation in Oxygymnocypris stewartii.
| Type | RepeatMasker length (bp) | RepeatMasker % in genome | TE proteins length (bp) | TE proteins % in genome | Combined TEs length (bp) | Combined TEs % in genome |
|---|---|---|---|---|---|---|
| DNA | 294,627,292 | 15.93 | 6,140 | 0.0003 | 294,628,980 | 15.93 |
| LINE | 180,661,987 | 9.77 | 54,732 | 0.003 | 180,672,396 | 9.77 |
| SINE | 10,828,447 | 0.59 | 0 | 0 | 10,828,447 | 0.59 |
| LTR | 283,995,197 | 15.36 | 43,968 | 0.0024 | 284,000,105 | 15.36 |
| Satellite | 35,364,895 | 1.91 | 0 | 0 | 35,364,895 | 1.91 |
| Simple_repeat | 37,479,121 | 2.03 | 0 | 0 | 37,479,121 | 2.03 |
| Unknown | 25,680,794 | 1.39 | 0 | 0 | 25,680,794 | 1.39 |
| Total | 788,753,932 | 42.65 | 103,914 | 0.0056 | 788,758,656 | 42.65 |
Figure 3. Distribution of the divergence rate of each type of repetitive element in the Oxygymnocypris stewartii genome.
The divergence rate was calculated between the TEs identified in the genome by the homology-based method and the consensus sequences in Repbase.
The number of the annotated non-coding RNA in the Oxygymnocypris stewartii genome.
| Type | Subtype | Number | Average length (bp) | Total length (bp) | % of genome |
|---|---|---|---|---|---|
| miRNA | | 1,758 | 106.4 | 187,050 | 0.0101 |
| tRNA | | 24,208 | 75.45 | 1,826,526 | 0.0988 |
| rRNA | All | 1,363 | 123.19 | 167,907 | 0.0091 |
| | 18S | 112 | 294.73 | 33,010 | 0.0018 |
| | 28S | 170 | 210.1 | 35,717 | 0.0019 |
| | 5.8S | 19 | 103.42 | 1,965 | 0.0001 |
| | 5S | 1,062 | 91.54 | 97,215 | 0.0053 |
| snRNA | All | 923 | 132.36 | 122,168 | 0.0066 |
| | C/D-box | 221 | 111.13 | 24,560 | 0.0013 |
| | H/ACA-box | 215 | 143.72 | 30,899 | 0.0017 |
| | Splicing | 444 | 129.1 | 57,322 | 0.0031 |
The statistics of gene models of protein-coding genes annotated in the Oxygymnocypris stewartii genome.
| Method | Tool or species | Gene number | Average transcript length (bp) | Average CDS length (bp) | Average exon length (bp) | Average intron length (bp) | Exons per gene |
|---|---|---|---|---|---|---|---|
| De novo | Augustus | 101,732 | 7,592.54 | 981.37 | 188.14 | 1,568.10 | 5.22 |
| | GlimmerHMM | 223,822 | 7,337.30 | 534.39 | 154.34 | 2,762.67 | 3.46 |
| | SNAP | 198,963 | 10,915.28 | 755.07 | 150.73 | 2,534.08 | 5.01 |
| | Geneid | 97,442 | 10,811.87 | 1,010.54 | 230.95 | 2,903.65 | 4.38 |
| | Genscan | 95,641 | 12,679.26 | 1,184.24 | 200.27 | 2,339.66 | 5.91 |
| Homolog | | 53,733 | 8,271.98 | 1,195.23 | 202.14 | 1,440.46 | 5.91 |
| | | 70,092 | 6,457.54 | 1,162.26 | 217.66 | 1,220.15 | 5.34 |
| | | 63,215 | 8,466.61 | 1,261.85 | 206.17 | 1,407.08 | 6.12 |
| | | 78,104 | 6,467.89 | 1,176.98 | 227.61 | 1,268.52 | 5.17 |
| | | 44,944 | 9,259.53 | 1,202.88 | 189.77 | 1,509.12 | 6.34 |
| | | 59,212 | 8,747.74 | 1,268.06 | 205.62 | 1,447.61 | 6.17 |
| | | 70,380 | 7,956.29 | 1,204.04 | 205.73 | 1,391.50 | 5.85 |
| | | 46,698 | 9,041.85 | 1,176.04 | 189.54 | 1,511.28 | 6.2 |
| RNA-seq | Cufflinks | 93,109 | 21,118.90 | 3,436.98 | 357.39 | 2,051.98 | 9.62 |
| | PASA | 140,045 | 10,537.33 | 1,152.91 | 165.15 | 1,569.00 | 6.98 |
| EVM | | 101,031 | 8,674.09 | 1,018.48 | 183.65 | 1,684.16 | 5.55 |
| PASA-update | | 100,450 | 8,739.14 | 1,026.34 | 184.60 | 1,691.47 | 5.56 |
| Final set | | 46,400 | 13,348.16 | 1,438.34 | 171.04 | 1,607.39 | 8.41 |

Note that CDS refers to coding sequence; GlimmerHMM is a gene finder based on a generalized hidden Markov model (GHMM); SNAP refers to Semi-HMM-based Nucleic Acid Parser; EVM refers to EVidenceModeler.
The comparison of the gene models annotated from the Oxygymnocypris stewartii genome and other teleosts.
| Species | Gene number | Average transcript length (bp) | Average CDS length (bp) | Average exon length (bp) | Average intron length (bp) | Exons per gene |
|---|---|---|---|---|---|---|
| Oxygymnocypris stewartii | 46,400 | 13,348.16 | 1,438.34 | 171.04 | 1,607.39 | 8.41 |
| | 32,811 | 10,444.53 | 1,384.98 | 180.99 | 1,361.89 | 7.65 |
| | 19,805 | 43,772.47 | 1,457.89 | 171.22 | 5,631.04 | 8.51 |
| | 22,278 | 37,435.55 | 1,600.64 | 179.14 | 4,516.11 | 8.93 |
| | 45,899 | 16,243.9 | 1,585.31 | 171.68 | 1,780.2 | 9.23 |
| | 21,317 | 8,334.84 | 1,699.01 | 165.45 | 715.91 | 10.27 |
| | 25,619 | 25,207.59 | 1,642.64 | 174.39 | 2,798.97 | 9.42 |
| | 49,264 | 11,780.68 | 1,260.28 | 163.96 | 1,573.34 | 7.69 |
| | 22,966 | 17,866.19 | 1,760.81 | 170.99 | 1,732.19 | 10.3 |
Figure 4. Comparison of the predicted gene models in the Oxygymnocypris stewartii genome with those of other species.
(a) CDS length distribution and comparison with other species. (b) Exon length distribution and comparison with other species. (c) Exon number distribution and comparison with other species. (d) Gene length distribution and comparison with other species. (e) Intron length distribution and comparison with other species.
The number of genes with homology or functional classification for Oxygymnocypris stewartii.
| Database | Subset | Number annotated | Percent annotated (%) |
|---|---|---|---|
| NR | | 45,976 | 99.1 |
| Swiss-Prot | | 43,115 | 92.9 |
| KEGG | | 39,302 | 84.7 |
| InterPro | All | 43,183 | 93.1 |
| | Pfam | 38,742 | 83.5 |
| | GO | 31,811 | 68.6 |
| Annotated | | 45,991 | 99.1 |
| Total | | 46,400 | - |
Figure 5. Venn diagram of the number of genes with functional annotation using multiple public databases.