Hengxing Ba, Zexi Cai, Haoyang Gao, Tao Qin, Wenyuan Liu, Liuwei Xie, Yaolei Zhang, Binyu Jing, Datao Wang, Chunyi Li.
Abstract
Tarim red deer (Cervus elaphus yarkandensis) is the only subspecies of red deer (of 22 subspecies) from Central Asia. This subspecies is a desert dweller of the Tarim Basin of southern Xinjiang, China, and exhibits some unique adaptations to the dry and extremely hot climate. We report here the assembly of a Tarim red deer genome, termed CEY_v1, employing a 10X Genomics library. The assembled genome comprises 2.6 Gb, with a contig N50 of 275.5 Kb and a scaffold N50 of 31.7 Mb. Around 96% of the assembled sequences were anchored onto 34 chromosomes based on the published high-quality red deer genetic linkage map. More than 94% of BUSCO genes were detected as complete in CEY_v1 (90.5% single-copy and 3.6% duplicated), and 20,653 genes were annotated. CEY_v1 is expected to contribute to comparative analyses of genome biology, to evolutionary studies within Cervidae, and to investigation of the mechanisms underlying the adaptation of this subspecies to the extremely dry and hot climate.
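For readers unfamiliar with the contiguity metric quoted above, N50 is the length L such that sequences (contigs or scaffolds) of length at least L together cover at least half of the total assembly. The following is a minimal Python sketch of that definition; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the CEY_v1 data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Illustrative helper (not from the paper): sort lengths from longest
    to shortest and return the length at which the running sum first
    reaches half of the total.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up lengths (total = 300).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Applied to contig lengths the function gives a contig N50, and applied to scaffold lengths a scaffold N50; the two differ because scaffolds join contigs across gaps.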
Year: 2020 PMID: 32561793 PMCID: PMC7305323 DOI: 10.1038/s41597-020-0537-0
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 Photograph and location of the Tarim red deer selected in this study. (a) A photograph of an adult male Tarim red deer individual, from which blood samples were collected for genome sequencing. (b) A natural distribution map of Tarim red deer (yellow arrowhead).
Fig. 2 Circos plot showing the 34 chromosomes of CEY_v1. (a) Chromosome length in Mb; (b) arrangement of the scaffolds (>1 Mb), shown in random colors within each chromosome; (c) heatmap of the number of mapped SNPs per 1 Mb window, ranging from 0 to 60; (d) histogram of GC skew in 1 Mb windows with a 1 Kb step size; (e) line plot of gene density in 1 Mb windows; and (f) line plot of repeat density in 1 Mb windows.
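GC skew is conventionally defined per window as (G − C) / (G + C). The record does not state how the authors computed track (d), so the following Python sketch only illustrates the usual sliding-window calculation; the function names and the toy sequence are assumptions, and the tiny window and step stand in for the 1 Mb window and 1 Kb step named in the caption.

```python
def gc_skew(seq):
    """GC skew of one sequence: (G - C) / (G + C), or 0.0 if no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def gc_skew_windows(seq, window, step):
    """Return (start, skew) pairs for every full window along seq."""
    return [
        (start, gc_skew(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence; a real analysis would use e.g. window=1_000_000, step=1_000.
toy_seq = "GGGCATCCCG"
for start, skew in gc_skew_windows(toy_seq, window=4, step=2):
    print(start, skew)
```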
Statistics of chromosome anchoring based on the SNP markers.
| | Anchored | Oriented | Unplaced |
|---|---|---|---|
| Markers (unique) | 38,083 | 37,606 | 106 |
| Markers per Mb | 15.5 | 15.5 | 1 |
| N50 Scaffolds | 28 | 28 | 0 |
| Scaffolds | 269 | 160 | 18,740 |
| Scaffolds with 1 marker | 63 | 0 | 91 |
| Scaffolds with 2 markers | 14 | 2 | 4 |
| Scaffolds with 3 markers | 9 | 2 | 1 |
| Scaffolds with ≥ 4 markers | 183 | 156 | 1 |
| Total bases | 2,490,596,933 (95.90%) | 2,441,137,212 (94.2%) | 106,169,671 (4.10%) |
Prediction of repeat elements in the Tarim red deer genome.
| Type | Repeat size (bp) | % of genome |
|---|---|---|
| TRF | 26,065,074 | 1.00 |
| RepeatMasker | 836,426,458 | 32.21 |
| RepeatProteinMask | 431,640,750 | 16.62 |
| | 988,599,789 | 38.07 |
| Total | 1,099,992,590 | 42.36 |
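The "% of genome" column can be checked with simple arithmetic. The sketch below assumes that the denominator is the sum of the anchored and unplaced bases reported in the chromosome-anchoring table (2,490,596,933 + 106,169,671 bp); that choice is an assumption, since the record does not state which genome size the authors divided by. The variable and function names are illustrative.

```python
# Assumed genome size: anchored + unplaced bases from the anchoring table.
ANCHORED_BASES = 2_490_596_933
UNPLACED_BASES = 106_169_671
GENOME_SIZE = ANCHORED_BASES + UNPLACED_BASES

# Repeat sizes (bp) copied from the labelled rows of the table above.
repeat_bp = {
    "TRF": 26_065_074,
    "RepeatMasker": 836_426_458,
    "RepeatProteinMask": 431_640_750,
    "Total": 1_099_992_590,
}


def percent_of_genome(bp, genome_size=GENOME_SIZE):
    """Express a base-pair count as a percentage of the genome size."""
    return 100 * bp / genome_size


for name, bp in repeat_bp.items():
    print(f"{name}: {percent_of_genome(bp):.2f}%")
```

The "Total" row is smaller than the sum of the individual rows, which suggests that regions predicted by more than one method are counted once.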
Statistics of repeat elements in the Tarim red deer genome.
| | | | Repbase TEs | | TE Proteins | | Combined TEs | |
|---|---|---|---|---|---|---|---|---|
| | Length (bp) | % in Genome | Length (bp) | % in Genome | Length (bp) | % in Genome | Length (bp) | % in Genome |
| DNA | 765,397 | 0.03 | 26,322,675 | 1.01 | 655,292 | 0.25 | 26,729,330 | 1.03 |
| LINE | 855,277,270 | 32.94 | 640,898,202 | 24.68 | 423,761,737 | 16.32 | 980,437,996 | 37.76 |
| SINE | 281,327 | 0.01 | 109,276,352 | 4.21 | 0 | 0.00 | 109,493,480 | 4.22 |
| LTR | 247,139,539 | 9.52 | 73,669,154 | 2.84 | 7,252,671 | 0.28 | 303,709,517 | 11.70 |
| Other | 0 | 0.00 | 192 | 0.00 | 444 | 0.00 | 636 | 0.00 |
| Unknown | 3,083,692 | 0.12 | 0 | 0.00 | 0 | 0.00 | 3,083,692 | 0.12 |
| Total | 988,599,789 | 38.07 | 836,426,458 | 32.21 | 431,640,750 | 16.62 | 1,086,749,836 | 41.85 |
The statistics of gene models of protein-coding genes annotated in the Tarim red deer genome.
| Methods / Gene set | Number of genes | Average gene length (bp) | Average CDS length (bp) | Average exon length (bp) | Average intron length (bp) | Exons per gene |
|---|---|---|---|---|---|---|
| Augustus | 25,176 | 44,593.56 | 1,427.27 | 175.37 | 6,046.70 | 8.14 |
| Homolog | 26,515 | 23,126.00 | 1,524.78 | 181.24 | 2,913.94 | 8.41 |
| | 28,410 | 40,491.39 | 1,575.50 | 180.72 | 5,042.44 | 8.72 |
| | 102,682 | 31,718.82 | 1,081.93 | 165.30 | 5,525.07 | 6.55 |
| | 27,407 | 33,288.38 | 1,459.59 | 179.88 | 4,474.00 | 8.11 |
| | 29,486 | 23,673.50 | 1,267.90 | 184.48 | 3,815.11 | 6.87 |
| | 36,502 | 47,716.59 | 1,749.88 | 168.55 | 4,899.43 | 10.38 |
| Glean | 20,652 | 37,290.72 | 1,577.53 | 190.74 | 4,912.07 | 8.54 |
Statistics of functional annotation.
| Type | Number of overall predicted genes | Percentage of overall predicted genes |
|---|---|---|
| Total | 20,652 | 100% |
| SwissProt | 20,189 | 97.71% |
| KEGG | 18,017 | 87.20% |
| TrEMBL | 20,528 | 99.35% |
| NR | 20,505 | 99.24% |
| GO | 13,867 | 67.11% |
Comparison of the deer genome assembly metrics.
| Species | Assembled genome size (ungapped) (Gb) | Genome coverage (×) | Contig N50 (Kb) | Scaffold N50 (Mb) | Number of scaffolds |
|---|---|---|---|---|---|
| Tarim red deer | 2.60 (2.56) | 63 | 275.5 | 31.7 | 19,010 |
| White-lipped deer | 2.69 (2.64) | 214 | 39.6 | 3.8 | 171,874 |
| Chinese water deer | 2.53 (2.48) | 76 | 131.4 | 13.8 | 22,246 |
| Black muntjac | 2.68 (2.67) | 116 | 8.2 | 1.3 | 21,052 |
| Hog deer | 2.68 (2.64) | 197 | 172.8 | 20.6 | 136,093 |
| Milu | 2.52 (2.46) | 82 | 32.7 | 3.0 | 46,381 |
| Red deer | 3.40 (1.95) | 62 | 7.9 | 0.27 | 34,724 |
| Reeves muntjac | 2.58 (2.51) | 34 | 225.1 | 9.4 | 29,705 |
| Muntjak | 2.57 (2.52) | 41 | 215.5 | - | 25,651 |
| Mule deer | 2.34 (2.34) | 25 | 113.3 | 0.8 | 838,758 |
| Reindeer | 2.64 (2.54) | 220 | 89.7 | 0.94 | 58,765 |
| Eastern roe deer | 2.61 (2.55) | 77 | - | 6.6 | 92,100 |
| White-tailed deer | 2.38 (2.36) | 150 | 122.0 | 0.9 | 17,025 |
| Alces alces | 2.74 (2.54) | 35 | 131.8 | 4.1 | 48,219 |
Fig. 3 Reconstructed chromosome 1 of the Tarim red deer genome (CEY_v1) using two genetic maps, the red deer female and male genetic maps, with equal weights. (a) “Side-by-side” alignments between the chromosome and the linkage groups. Conflicting markers are shown as crossing lines. (b) Two scatter plots in which dots represent the physical position on the chromosome (x-axis) versus the genetic map distance (y-axis); both show a monotonic trend with no breaks, illustrating near-perfect collinearity. Adjacent scaffolds within the chromosome are shown as boxes with alternating shades, marking the boundaries of the component scaffolds. The ρ-value on each scatter plot is the Pearson correlation coefficient, which ranges from −1 to 1 (values close to −1 or 1 indicate near-perfect collinearity). (c) Correlation between the sizes of the reconstructed chromosomes and the previous estimates by Johnston et al.[27].
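The ρ-value described in panel (b) can be illustrated with a short Python sketch that computes the Pearson correlation coefficient between physical positions and genetic map distances. The marker coordinates below are invented for illustration and are not taken from CEY_v1; the function name `pearson` is likewise an assumption, and the authors' own tooling may compute the value differently.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical markers: physical position (Mb) and genetic distance (cM).
physical_mb = [1.0, 5.0, 12.0, 20.0, 31.0]
genetic_cm = [0.5, 4.0, 9.5, 18.0, 27.0]
rho = pearson(physical_mb, genetic_cm)
print(f"rho = {rho:.4f}")
```

A value near 1 means marker order on the chromosome follows the linkage map; a value near −1 indicates the same order read in the opposite orientation.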
| Measurement(s) | DNA • genome • sequence_assembly • sequence feature annotation |
| Technology Type(s) | DNA sequencing • sequence assembly process • sequence annotation |
| Sample Characteristic - Organism | Cervus elaphus |