Ming-Shan Wang1,2, Yan Zeng1,2, Xiao Wang1,2, Wen-Hui Nie1, Jin-Huan Wang1, Wei-Ting Su1, Newton O Otecko1,2, Zi-Jun Xiong1,3, Sheng Wang4, Kai-Xing Qu5, Shou-Qing Yan6, Min-Min Yang1,2, Wen Wang1,2, Yang Dong7,8, Dong-Dong Wu1,2, Ya-Ping Zhang1,2,9.
Abstract
Gayal (Bos frontalis), also known as mithan or mithun, is a large, endangered, semi-domesticated bovine with a limited geographical distribution in the hill-forests of China, Northeast India, Bangladesh, Myanmar, and Bhutan. Many questions about the gayal, such as its origin, population history, and the genetic basis of its local adaptation, remain largely unresolved. De novo sequencing and assembly of the whole gayal genome provides an opportunity to address these issues. We report high-depth sequencing, de novo assembly, and annotation of a female Chinese gayal genome. Using the Illumina genomic sequencing platform, we generated 350.38 Gb of raw data from 16 different insert-size libraries. A total of 276.86 Gb of clean data was retained after quality control. The assembled genome is about 2.85 Gb, with scaffold and contig N50 sizes of 2.74 Mb and 14.41 kb, respectively. Repetitive elements account for 48.13% of the genome. Gene annotation yielded 26 667 protein-coding genes, of which 97.18% have been functionally annotated. BUSCO assessment shows that our assembly captures 93% (3183 of 4104) of the core eukaryotic genes and 83.1% of vertebrate universal single-copy orthologs. We provide the first comprehensive de novo genome of the gayal. This genetic resource is integral for investigating the origin of the gayal and for performing comparative genomic studies to improve understanding of the speciation and divergence of bovine species. The assembled genome could be used as a reference in future population genetic studies of gayal.
Keywords: Bos frontalis; annotation; genome assembly; phylogeny
Year: 2017 PMID: 29048483 PMCID: PMC5710521 DOI: 10.1093/gigascience/gix094
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: A picture showing a female gayal (Bos frontalis; provided by Kai-Xing Qu).
Figure 2: Karyotype of the gayal used for genome sequencing (provided by Wen-Hui Nie).
Figure 3: 17-mer frequency distribution of sequencing reads.
Statistics of the completeness of the hybrid de novo assembly of the Bos frontalis genome
| Terms | Contig size, bp | Contig number | Scaffold size, bp | Scaffold number |
|---|---|---|---|---|
| N90 | 2461 | 211 577 | 158 610 | 1357 |
| N80 | 5335 | 140 237 | 1 060 177 | 800 |
| N70 | 8109 | 99 930 | 1 668 147 | 587 |
| N60 | 11 044 | 71 764 | 2 170 469 | 437 |
| N50 | 14 405 | 50 585 | 2 737 757 | 320 |
| Max length | 208 099 | | 13 764 521 | |
| Total length | 2 669 378 334 | | 2 848 570 279 | |
| Total number | | 583 373 | | 460 059 |
| Average length | 4575 | | 6191 | |
| Number ≥ 500 bp | | 394 757 | | 116 481 |
| Number ≥ 1000 bp | | 300 178 | | 53 989 |
| Number ≥ 2000 bp | | 229 796 | | 19 915 |
| Number ≥ 5000 bp | | 146 493 | | 5387 |
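The N50 to N90 rows above can be read with the usual Nx convention: sort the sequences from longest to shortest and accumulate lengths until x% of the total length is reached; the size is the length of the last sequence added and the number is how many sequences were needed. The following is a minimal Python sketch of that convention, assuming it is the one used for this table. The function name `nx` and the toy lengths are hypothetical and are not taken from the gayal assembly.

```python
def nx(lengths, x=50):
    """Return (size, number) for the Nx statistic.

    size   = length of the shortest sequence in the smallest set of
             longest sequences that covers at least x% of the total
    number = how many sequences are in that set
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count


# Toy contig lengths (hypothetical; not data from the gayal assembly).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]

n50_size, n50_number = nx(toy_lengths, 50)
n90_size, n90_number = nx(toy_lengths, 90)
print("N50:", n50_size, n50_number)
print("N90:", n90_size, n90_number)
```

Under this convention the size falls and the number rises as x increases, which is the pattern seen from the N50 row to the N90 row in the table.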
Statistics of the completeness of the assembled genomes for Bos frontalis and closely related species by BUSCO (version 2)
| Species | Terms | Complete (C) | Complete and single-copy (S) | Complete and duplicated (D) | Fragmented (F) | Missing (M) |
|---|---|---|---|---|---|---|
| Gayal | Number | 3494 | 3434 | 60 | 319 | 291 |
| Gayal | Proportion, % | 85.14 | 83.67 | 1.46 | 7.77 | 7.09 |
| Zebu | Number | 3698 | 3644 | 54 | 158 | 248 |
| Zebu | Proportion, % | 90.11 | 88.79 | 1.32 | 3.85 | 6.04 |
| Wisent | Number | 3794 | 3763 | 31 | 180 | 130 |
| Wisent | Proportion, % | 92.45 | 91.69 | 0.76 | 4.39 | 3.17 |
| Yak | Number | 3841 | 3809 | 32 | 138 | 125 |
| Yak | Proportion, % | 93.59 | 92.81 | 0.78 | 3.36 | 3.05 |
| Buffalo | Number | 3817 | 3780 | 37 | 142 | 145 |
| Buffalo | Proportion, % | 93.01 | 92.11 | 0.90 | 3.46 | 3.53 |
| Bison | Number | 3779 | 3735 | 44 | 165 | 160 |
| Bison | Proportion, % | 92.08 | 91.01 | 1.07 | 4.02 | 3.90 |
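The proportions in this table can be recomputed from the counts. The sketch below assumes the denominator is 4104, the total quoted in the abstract, which also equals C + F + M in every row. The counts are copied from the table; the names `busco_counts` and `proportions` are illustrative only.

```python
TOTAL_BUSCOS = 4104  # assumed denominator; equals C + F + M for each species

# Counts copied from the table: C = complete, S = single-copy,
# D = duplicated, F = fragmented, M = missing.
busco_counts = {
    "Gayal":   {"C": 3494, "S": 3434, "D": 60, "F": 319, "M": 291},
    "Zebu":    {"C": 3698, "S": 3644, "D": 54, "F": 158, "M": 248},
    "Wisent":  {"C": 3794, "S": 3763, "D": 31, "F": 180, "M": 130},
    "Yak":     {"C": 3841, "S": 3809, "D": 32, "F": 138, "M": 125},
    "Buffalo": {"C": 3817, "S": 3780, "D": 37, "F": 142, "M": 145},
    "Bison":   {"C": 3779, "S": 3735, "D": 44, "F": 165, "M": 160},
}


def proportions(row, total=TOTAL_BUSCOS):
    """Convert a row of counts into percentages of the BUSCO set."""
    return {key: round(100 * value / total, 2) for key, value in row.items()}


for species, row in busco_counts.items():
    # Internal consistency: complete = single-copy + duplicated,
    # and complete + fragmented + missing = whole BUSCO set.
    consistent = (row["S"] + row["D"] == row["C"]
                  and row["C"] + row["F"] + row["M"] == TOTAL_BUSCOS)
    print(species, proportions(row), "consistent" if consistent else "check")
```

The two consistency checks are a quick way to catch transcription errors when a table like this is copied by hand.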
Statistics of repeats in the Bos frontalis genome
| Type | Repeat size, bp | % of genome |
|---|---|---|
| Trf | 17 696 175 | 0.62 |
| Repeatmasker | 868 885 926 | 30.50 |
| Proteinmask | 265 003 148 | 9.30 |
| | 917 371 710 | 32.20 |
| Total | 1 371 023 312 | 48.13 |
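The "% of genome" column appears to be each repeat size divided by the total scaffold length of the assembly (2 848 570 279 bp). That denominator is an assumption checked here by arithmetic, not something the table states. The sketch also shows that the per-method sizes add up to more than the Total row, which suggests the methods annotate overlapping regions and the total is non-redundant. One row has no label in the table and is kept under a placeholder key.

```python
ASSEMBLY_BP = 2_848_570_279  # total scaffold length (assumed denominator)

# Repeat sizes in bp, copied from the table.
repeat_bp = {
    "Trf": 17_696_175,
    "Repeatmasker": 868_885_926,
    "Proteinmask": 265_003_148,
    "(unlabeled row)": 917_371_710,  # the label is missing in the table
}
TOTAL_REPEAT_BP = 1_371_023_312  # the "Total" row


def percent_of_genome(bp, genome_bp=ASSEMBLY_BP):
    """Express a repeat size as a percentage of the assembly length."""
    return round(100 * bp / genome_bp, 2)


percentages = {name: percent_of_genome(bp) for name, bp in repeat_bp.items()}
total_percent = percent_of_genome(TOTAL_REPEAT_BP)

# A positive value means the per-method annotations cannot be disjoint.
excess_bp = sum(repeat_bp.values()) - TOTAL_REPEAT_BP

print(percentages)
print("total %:", total_percent)
print("sum of methods minus total (bp):", excess_bp)
```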
General statistics of predicted protein-coding genes
| Gene set | | Total | Exon number | CDS length, bp | mRNA length, bp | Exons per gene | Exon length, bp | Intron length, bp |
|---|---|---|---|---|---|---|---|---|
| Homolog | | 19 666 | 141 323 | 1325 | 20 618 | 7.19 | 184 | 3118 |
| Homolog | | 17 627 | 121 986 | 1323 | 20 802 | 6.92 | 191 | 3290 |
| Homolog | | 24 783 | 146 172 | 1108 | 17 567 | 5.89 | 187 | 3360 |
| Homolog | | 20 283 | 121 282 | 1142 | 16 288 | 5.97 | 191 | 3041 |
| Homolog | | 17 988 | 117 965 | 1277 | 19 469 | 6.55 | 194 | 3273 |
| Homolog | | 20 947 | 147 367 | 1287 | 20 973 | 7.03 | 183 | 3261 |
| | AUGUSTUS | 41 227 | 180 664 | 1127 | 22 786 | 4.38 | 257 | 6403 |
| | GlimmerHMM | 27 067 | 104 294 | 874 | 5433 | 3.85 | 226 | 1597 |
| | Genescan | 46 598 | 297 828 | 1321 | 36 828 | 6.39 | 206 | 6585 |
| Glean (final) | | 26 667 | 87 392 | 1156 | 4996 | 3.27 | 352 | 1686 |
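The "Exons per gene" column can be cross-checked against the other columns as exon number divided by the total number of genes. The sketch below does this for the four labeled gene sets, with values copied from the table. The 0.01 tolerance is an assumption: the recomputed ratios match the printed values only to within the last digit, so the table may mix rounding and truncation.

```python
# (total genes, exon number, exons per gene as printed in the table)
gene_sets = {
    "AUGUSTUS":      (41_227, 180_664, 4.38),
    "GlimmerHMM":    (27_067, 104_294, 3.85),
    "Genescan":      (46_598, 297_828, 6.39),
    "Glean (final)": (26_667, 87_392, 3.27),
}

TOLERANCE = 0.01  # assumed: allows for rounding or truncation in the table


def exons_per_gene(total_genes, exon_number):
    """Average number of exons per predicted gene."""
    return exon_number / total_genes


for name, (genes, exons, reported) in gene_sets.items():
    ratio = exons_per_gene(genes, exons)
    agrees = abs(ratio - reported) < TOLERANCE
    print(f"{name}: recomputed {ratio:.4f}, table {reported}, agrees={agrees}")
```

The same check can be applied to the homolog rows, since each row carries its own gene and exon counts.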
Figure 4: Phylogenetic trees of gayal and other bovine species. (A) Tree constructed using the maximum likelihood method. (B) Tree constructed using Bayesian inference.
Figure 5: Maximum likelihood tree of gayal and other bovine species based on complete mtDNA sequences. IDs in parentheses are GenBank accession numbers.
Figure 6: Estimated divergence times between gayal and other bovine species.