Wei Wang1, Hui-Juan Yan2, Shi-Yi Chen3, Zhen-Zhen Li4, Jun Yi1, Li-Li Niu2, Jia-Po Deng2, Wei-Gang Chen2, Yang Pu2, Xianbo Jia3, Yu Qu2, Ang Chen2, Yan Zhong2, Xin-Ming Yu2, Shuai Pang4, Wan-Long Huang4, Yue Han4, Guang-Jian Liu4, Jian-Qiu Yu2.
Abstract
The hog deer (Axis porcinus) is a small deer species in the family Cervidae that has undergone a serious global decline over the past decades. Chengdu Zoo currently holds a captive population of hog deer in China with sufficient genetic diversity. In the present study, we sequenced and de novo assembled its genome. Libraries with six different insert sizes were sequenced, generating 395 Gb of clean data in total. With the aid of 10X Genomics linked reads, the genome was assembled to 2.72 Gb in length (contig N50, 66.04 Kb; scaffold N50, 20.55 Mb), in which 94.5% of expected genes were detected. We comprehensively annotated 22,473 protein-coding genes, 37,019 tRNAs, and 1,058 Mb of repeated sequences. The newly generated reference genome is expected to contribute significantly to comparative analyses of genome biology and evolution within the family Cervidae.
Year: 2019 PMID: 30620341 PMCID: PMC6326164 DOI: 10.1038/sdata.2018.305
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Figure 1. An adult female hog deer and its small baby in Chengdu Zoo.
Library information and sequencing results.
| Type | Library | Insert size | Raw data (Gb) | Clean data (Gb) |
|---|---|---|---|---|
| Genomic DNA sequencing | DES01754 | 250 bp | 102.4 | 101.68 |
| | DES01765 | 350 bp | 74.1 | 73.59 |
| | DES01755 | 450 bp | 72.3 | 71.62 |
| | DEL01229 | 2 Kb | 31.5 | 30.56 |
| | DEL01226 | 2 Kb | 34.4 | 34.40 |
| | DEL01227 | 5 Kb | 33.3 | 31.36 |
| | DEL01230 | 5 Kb | 27.2 | 26.78 |
| | DEL01228 | 10 Kb | 13.7 | 12.38 |
| | DEL01231 | 10 Kb | 14.65 | 12.83 |
| | KD17051609 | 10X Genomics | 236.7 | 230.78 |
| RNA sequencing | RRA59894-S | 250 bp | 7.2 | 7.12 |
| | RRA59895-S | 250 bp | 8.8 | 8.66 |
| | RRA59896-S | 250 bp | 8.3 | 8.16 |
| | RRA59897-S | 250 bp | 10.0 | 9.86 |
| | RRA59898-S | 250 bp | 9.1 | 9.00 |
| | RRA59899-S | 250 bp | 10.0 | 9.94 |
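The 395 Gb of clean data quoted in the abstract can be cross-checked against the library table. The sketch below assumes that this figure covers the nine genomic DNA libraries spanning the six insert sizes (250 bp to 10 Kb) and excludes the 10X Genomics and RNA-seq libraries; the source does not say so explicitly. The values are copied from the table.

```python
# Clean data (Gb) for the genomic DNA libraries, copied from the table.
# Assumption: the 10X Genomics and RNA-seq libraries are not part of the
# 395 Gb figure quoted in the abstract.
clean_gb = {
    "DES01754": 101.68,  # 250 bp
    "DES01765": 73.59,   # 350 bp
    "DES01755": 71.62,   # 450 bp
    "DEL01229": 30.56,   # 2 Kb
    "DEL01226": 34.40,   # 2 Kb
    "DEL01227": 31.36,   # 5 Kb
    "DEL01230": 26.78,   # 5 Kb
    "DEL01228": 12.38,   # 10 Kb
    "DEL01231": 12.83,   # 10 Kb
}

total_gb = round(sum(clean_gb.values()), 2)
print(f"Clean data from insert-size libraries: {total_gb} Gb")
```

Under that assumption the sum comes to about 395.2 Gb, which agrees with the abstract.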
The de novo assembled genome of hog deer.
| Statistic | Contig length (bp) | Scaffold length (bp) | Number of contigs | Number of scaffolds |
|---|---|---|---|---|
| Total | 2,679,167,314 | 2,719,585,391 | 544,656 | 463,740 |
| Max | 794,078 | 91,389,359 | - | - |
| N50 | 66,035 | 20,551,061 | 11,195 | 40 |
| N90 | 9,852 | 1,790,557 | 47,200 | 170 |
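For readers unfamiliar with the N50 and N90 statistics, the following is a minimal sketch of how they are conventionally computed from a list of sequence lengths. The function name `nx` and the toy lengths are illustrative and do not come from the study; the Number columns for N50 and N90 in the table appear to correspond to the count returned alongside the length (often called L50 and L90).

```python
def nx(lengths, fraction=0.5):
    """Return (Nx length, number of sequences needed to reach it).

    Sequences are sorted from longest to shortest and accumulated until
    the running total reaches `fraction` of the summed length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count


# Toy example (hypothetical lengths, not data from the assembly).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = nx(toy_lengths, 0.5)
n90, l90 = nx(toy_lengths, 0.9)
print(n50, l50, n90, l90)
```

In the toy example, the two longest sequences already cover half of the 300 bp total, so N50 is 70 with a count of 2.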
Annotation of repeated sequences.
| Tools | Repeat Size (bp) | % of genome |
|---|---|---|
| RepeatMasker | 1,016,366,209 | 37.37 |
| RepeatProteinMask | 439,972,572 | 16.10 |
| TRF | 42,982,131 | 1.58 |
| Total | 1,057,944,353 | 38.90 |
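The Total row is smaller than the sum of the three tool rows (1,499,320,912 bp). This would be expected if the tools annotate overlapping regions and the total counts each base only once. The source does not describe how the total was derived, so the sketch below is an assumption that illustrates the idea with hypothetical half-open intervals rather than the study's data.

```python
def merged_length(intervals):
    """Bases covered by half-open (start, end) intervals, overlaps counted once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Hypothetical repeat calls from three tools on one sequence.
toy_calls = {
    "tool_a": [(0, 100), (200, 300)],
    "tool_b": [(50, 150)],
    "tool_c": [(250, 260)],
}
summed = sum(end - start for calls in toy_calls.values() for start, end in calls)
combined = merged_length([iv for calls in toy_calls.values() for iv in calls])
print(summed, combined)
```

Here the per-tool lengths add up to 310 bp, while the merged, non-redundant coverage is 250 bp.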
Prediction of protein-coding genes.
| Methods / Tools | Gene number | Exons per gene | Average gene length (bp) | Average CDS length (bp) | Average exon length (bp) | Average intron length (bp) |
|---|---|---|---|---|---|---|
| Homologous comparison | 34,654 | 5.21 | 15,443.28 | 1,052.42 | 202.17 | 3,421.74 |
| | 26,310 | 5.55 | 16,413.01 | 1,154.44 | 207.95 | 3,352.45 |
| | 71,084 | 3.64 | 8,528.97 | 779.38 | 214.37 | 2,940.32 |
| | 73,148 | 3.48 | 8,194.63 | 732.05 | 210.60 | 3,013.96 |
| | 25,194 | 6.60 | 20,193.20 | 1,269.57 | 192.35 | 3,378.97 |
| RNA-seq | 81,311 | 8.05 | 37,959.37 | 3,869.12 | 480.89 | 4,838.45 |
| Augustus | 36,909 | 4.67 | 14,638.62 | 1,002.89 | 214.88 | 3,718.34 |
| GlimmerHMM | 557,641 | 2.41 | 4,014.61 | 424.63 | 176.26 | 2,547.72 |
| SNAP | 128,744 | 3.53 | 25,890.73 | 530.45 | 150.38 | 10,034.10 |
| GenID | 286,917 | 1.64 | 4,298.66 | 190.45 | 115.91 | 6,388.70 |
| GeneScan | 71,999 | 5.48 | 24,967.54 | 920.23 | 168.05 | 5,372.64 |
| EVM | 44,470 | 3.92 | 16,031.05 | 957.78 | 194.69 | 3,845.72 |
| Final set | 22,473 | 8.61 | 34,536.59 | 1,449.48 | 172.73 | 4,476.40 |
Annotation of non-coding RNA genes.
| Type | Subtype | Copy | Average length (bp) | Total length (bp) | % of genome |
|---|---|---|---|---|---|
| miRNA | | 17,289 | 97.54 | 1,686,371 | 0.06 |
| tRNA | | 37,019 | 72.90 | 2,698,717 | 0.10 |
| rRNA | rRNA | 920 | 97.94 | 90,101 | 0.01 |
| | 18S | 51 | 131.27 | 6,695 | 0.00 |
| | 28S | 250 | 143.38 | 35,844 | 0.00 |
| | 5.8S | 4 | 81.25 | 325 | 0.00 |
| | 5S | 615 | 76.81 | 47,237 | 0.00 |
| snRNA | snRNA | 4,119 | 102.84 | 423,601 | 0.02 |
| | CD-box | 501 | 92.24 | 46,212 | 0.00 |
| | HACA-box | 607 | 132.91 | 80,680 | 0.00 |
| | Splicing | 2,925 | 97.20 | 284,299 | 0.01 |
Functional annotation of the predicted protein-coding genes.
| Methods for annotation | Number | Percent (%) |
|---|---|---|
| Swissprot | 20,162 | 89.7 |
| InterPro | 19,650 | 87.4 |
| KEGG | 17,783 | 79.1 |
| NR | 20,957 | 93.3 |
| Annotated | 20,994 | 93.4 |
| Unannotated | 1,479 | 6.6 |
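The percentages in the functional annotation table can be reproduced from the gene counts. The sketch below assumes that the denominator is the final set of 22,473 protein-coding genes; the counts are copied from the table.

```python
# Assumption: percentages are relative to the final gene set (22,473 genes).
TOTAL_GENES = 22_473

hits = {
    "Swissprot": 20_162,
    "InterPro": 19_650,
    "KEGG": 17_783,
    "NR": 20_957,
    "Annotated": 20_994,
}

# Share of the final gene set with a hit, rounded to one decimal place.
percent = {name: round(100 * n / TOTAL_GENES, 1) for name, n in hits.items()}

# Genes with no hit in any database.
unannotated = TOTAL_GENES - hits["Annotated"]
unannotated_percent = round(100 * unannotated / TOTAL_GENES, 1)

for name, value in percent.items():
    print(f"{name}: {value}%")
print(f"Unannotated: {unannotated} ({unannotated_percent}%)")
```

The recomputed values match the table, and the Annotated and Unannotated rows sum to the full gene set.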