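Abdul Awal Mintoo<sup>1,2,3</sup>, Hailin Zhang<sup>4,5</sup>, Chunhai Chen<sup>4</sup>, Mohammad Moniruzzaman<sup>2,3</sup>, Tingxian Deng<sup>6</sup>, Mahbub Anam<sup>1,2,3</sup>, Quazi Mohammad Emdadul Huque<sup>2,3</sup>, Xuanmin Guang<sup>4</sup>, Ping Wang<sup>4</sup>, Zhen Zhong<sup>4</sup>, Pengfei Han<sup>4</sup>, Asma Khatun<sup>2</sup>, Tabith M Awal<sup>1,2,3</sup>, Qiang Gao<sup>4</sup>, Xianwei Liang<sup>6</sup>.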
Abstract
Water buffalo (Bubalus bubalis), a large member of the family Bovidae, is an important livestock species throughout Southeast Asia. To better understand the molecular basis of buffalo improvement and breeding, we sequenced and assembled the genome (2n = 50) of a river buffalo (Bubalus bubalis) from Bangladesh. The assembly spans 2.77 Gb, with a contig N50 of 25 kb and a scaffold N50 of 6.9 Mb. Based on the assembled genome, we annotated 24,613 genes for future functional genomics studies. Phylogenetic analysis of the cattle and water buffalo lineages showed that they diverged about 5.8–9.8 million years ago. Our findings provide insight into the water buffalo genome and will contribute to further research on buffalo, such as molecular breeding, understanding complex traits, conservation, and biodiversity.
Keywords: annotation; genome; phylogenetic analysis; water buffalo
Year: 2019 PMID: 30962899 PMCID: PMC6434576 DOI: 10.1002/ece3.4965
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 2.912
Statistics of the raw and clean sequencing data
| Paired-end library | Insert size | Read length (bp) | Raw data: total (Gb) | Clean data: total (Gb) | Clean data: sequence depth (×) | Clean data: physical depth (×) |
|---|---|---|---|---|---|---|
| Solexa reads | 170 bp | 100_100 | 37.94 | 31.07 | 10.55 | 8.97 |
| | 500 bp | 100_100 | 64.89 | 57.46 | 19.50 | 48.76 |
| | 800 bp | 100_100 | 44.68 | 38.38 | 13.03 | 52.11 |
| | 2 kb | 49_49 | 93.38 | 66.66 | 22.63 | 461.77 |
| | 5 kb | 49_49 | 40.43 | 25.55 | 8.67 | 442.47 |
| | 10 kb | 49_49 | 34.18 | 22.78 | 7.73 | 788.90 |
| | 20 kb | 49_49 | 35.30 | 14.05 | 4.77 | 973.60 |
| Total | — | — | 350.80 | 255.95 | 86.88 | 2,776.57 |
Depths were calculated assuming a genome size of 2.946 Gb.
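The two depth columns follow from the clean data volume and the library geometry. The sketch below assumes sequence depth = clean bases / assumed genome size, and physical depth = sequence depth × insert size / (2 × read length per end), a common definition for paired-end libraries. The formula is our assumption and is not stated in the source, but it reproduces the tabulated values to within about 0.1%.

```python
GENOME_SIZE_GB = 2.946  # genome size assumed in the table footnote


def sequence_depth(clean_gb: float, genome_gb: float = GENOME_SIZE_GB) -> float:
    """Fold coverage of the genome by clean read bases."""
    return clean_gb / genome_gb


def physical_depth(seq_depth: float, insert_bp: int, read_len_bp: int) -> float:
    """Fold coverage by the genomic span of read pairs (assumed definition).

    Each pair contributes 2 * read_len_bp sequenced bases but spans insert_bp,
    so the sequence depth is scaled by insert / (2 * read length).
    """
    return seq_depth * insert_bp / (2 * read_len_bp)


# (insert size in bp, read length per end in bp, clean data in Gb) from the table
libraries = [
    (170, 100, 31.07),
    (500, 100, 57.46),
    (800, 100, 38.38),
    (2_000, 49, 66.66),
    (5_000, 49, 25.55),
    (10_000, 49, 22.78),
    (20_000, 49, 14.05),
]

for insert_bp, read_len, clean_gb in libraries:
    d = sequence_depth(clean_gb)
    p = physical_depth(d, insert_bp, read_len)
    print(f"{insert_bp:>6} bp  sequence {d:6.2f}x  physical {p:8.2f}x")
```

The table's figures appear to be derived from rounded Gb values, so exact agreement in the last decimal place is not expected.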
Assembly statistics of our river water buffalo genome, the African buffalo genome, and the published water buffalo genome
| Statistic | River water buffalo (this study) | African buffalo | Water Buffalo# |
|---|---|---|---|
| Contig N50 (bp) | 25,036 | 42,601 | 21,938 |
| Largest contig (bp) | 262,402 | 471,476 | — |
| Number of contigs | 235,999 | 561,609 | 630,368 |
| Scaffold N50 (bp) | 6,957,949 | 2,411,048 | 1,412,388 |
| Largest scaffold (bp) | 25,744,419 | 16,927,952 | — |
| Number of scaffolds | 33,840 | 442,401 | 366,983 |
| Total assembled size (bp) | 2,770,477,792 | 2,688,614,675 | 2,836,166,969 |
“Water Buffalo#” denotes the published assembly UMD_CASPUR_WB_2.0.
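Contig and scaffold N50 are the main contiguity measures in the assembly table. As a reminder of the standard definition, the sketch below computes N50 from a list of sequence lengths: sort longest first and report the length at which the running total first reaches half of all assembled bases. The toy lengths are hypothetical, and the sketch applies no minimum-length filter, which some assembly-statistics tools do.

```python
def n50(lengths: list[int]) -> int:
    """Return the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("no sequences given")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: the running total always reaches the full sum")


toy_contigs = [80, 70, 50, 40, 30, 20, 10]  # hypothetical contig lengths
print(n50(toy_contigs))
```

Applied to contig lengths this gives the contig N50; applied to scaffold lengths, the scaffold N50.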
Figure 1. Synteny blocks between the water buffalo and cattle genomes
Summary statistics of CNVs detected against the cattle reference genome using buffalo reads
| Size range | Deletion: blocks | Deletion: length (bp) | Deletion: cover (bp) | Duplication: blocks | Duplication: length (bp) | Duplication: cover (bp) |
|---|---|---|---|---|---|---|
| >1 kb | 16,207 | 3,000 | 113,985,100 | 1,475 | 9,300 | 20,914,100 |
| >5 kb | 5,468 | 9,700 | 88,900,300 | 1,101 | 12,800 | 19,716,700 |
| >10 kb | 2,601 | 15,900 | 68,317,500 | 689 | 19,000 | 16,688,100 |
| >20 kb | 907 | 30,700 | 45,028,000 | 316 | 29,300 | 11,383,700 |
| >50 kb | 235 | 82,300 | 25,510,100 | 47 | 61,800 | 3,475,500 |
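The decreasing counts and the ">" labels suggest that the size ranges are cumulative: a block counted under ">5 kb" is also counted under ">1 kb". The sketch below tallies hypothetical CNV calls that way. It assumes "blocks" is the number of CNV blocks and "cover" is the total bases they span, with non-overlapping blocks within each type. The table's "length" column is not reproduced because its definition is not given in the source.

```python
from collections import defaultdict

THRESHOLDS_BP = [1_000, 5_000, 10_000, 20_000, 50_000]


def summarize_cnvs(calls):
    """calls: iterable of (cnv_type, start, end) with half-open [start, end) coordinates.

    Returns {(cnv_type, threshold): (block_count, covered_bp)}. Each threshold row
    counts every block longer than that threshold, so the rows are cumulative.
    Covered bases are summed per block, which assumes blocks do not overlap.
    """
    summary = defaultdict(lambda: [0, 0])
    for cnv_type, start, end in calls:
        length = end - start
        for threshold in THRESHOLDS_BP:
            if length > threshold:
                row = summary[(cnv_type, threshold)]
                row[0] += 1
                row[1] += length
    return {key: tuple(value) for key, value in summary.items()}


# Hypothetical calls for illustration only.
example = [("deletion", 0, 3_000), ("deletion", 10_000, 16_000), ("duplication", 0, 60_000)]
for key, (blocks, covered) in sorted(summarize_cnvs(example).items()):
    print(key, blocks, covered)
```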
Summary statistics of interspersed repeat regions in Bubalus bubalis
| Type | Repbase TEs: length (bp) | Repbase TEs: % in genome | TE proteins: length (bp) | TE proteins: % in genome | De novo: length (bp) | De novo: % in genome | Combined TEs: length (bp) | Combined TEs: % in genome |
|---|---|---|---|---|---|---|---|---|
| DNA | 36,153,654 | 1.30 | 7,072,414 | 0.26 | 4,337,092 | 0.16 | 40,080,739 | 1.45 |
| LINE | 601,949,239 | 21.73 | 395,094,173 | 14.26 | 958,438,214 | 34.60 | 1,094,513,235 | 39.51 |
| SINE | 201,003,037 | 7.26 | 0 | 0.00 | 11,615,206 | 0.42 | 210,326,622 | 7.59 |
| LTR | 100,299,951 | 3.62 | 11,867,770 | 0.43 | 296,982,552 | 10.72 | 375,094,177 | 13.54 |
| Other | 272 | 0.00 | 0 | 0.00 | 0 | 0.00 | 272 | 0.00 |
| Unknown | 0 | 0.00 | 0 | 0.00 | 134,207 | 0.00 | 134,207 | 0.00 |
| Total | 921,567,446 | 33.26 | 413,831,008 | 14.94 | 1,118,906,262 | 40.39 | 1,255,723,859 | 45.33 |
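The Combined TEs column is smaller than the sum of the three methods (for LINEs, 1,094,513,235 bp against more than 1.95 Gb summed). This suggests that overlapping annotations were merged so that each base is counted once. The sketch below shows such a merge on hypothetical intervals, assuming half-open coordinates on a single sequence. It also computes the percentage against the 2,770,477,792 bp total assembled size, which reproduces the 45.33% total and the 39.51% LINE figure.

```python
ASSEMBLY_BP = 2_770_477_792  # total assembled size from the assembly statistics table


def merged_length(intervals):
    """Total bases covered by half-open [start, end) intervals on one sequence,
    counting overlapping stretches only once."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint from the running block: close it and open a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the running block.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def percent_of_genome(length_bp, genome_bp=ASSEMBLY_BP):
    return 100 * length_bp / genome_bp


# Hypothetical hits on one sequence from the three annotation methods.
repbase = [(100, 400)]
te_proteins = [(300, 500)]
de_novo = [(450, 900), (1_000, 1_100)]
print(merged_length(repbase + te_proteins + de_novo))
print(round(percent_of_genome(1_255_723_859), 2))
```

A plain sum of the hypothetical hits would give 1,050 bp; the merge counts the overlapping stretch from 300 to 500 only once and reports 900 bp.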
Summary statistics of non‐coding RNAs in Bubalus bubalis
| Type | Copy number | Average length (bp) | Total length (bp) | % of genome |
|---|---|---|---|---|
| miRNA | 23,310 | 100.82 | 2,350,000 | 0.0848 |
| tRNA | 38,483 | 72.86 | 2,803,734 | 0.1012 |
| rRNA | 867 | 105.79 | 91,722 | 0.0033 |
| rRNA: 18S | 123 | 135.18 | 16,627 | 0.0006 |
| rRNA: 28S | 271 | 146.65 | 39,741 | 0.0014 |
| rRNA: 5.8S | 9 | 81.89 | 737 | 0.0000 |
| rRNA: 5S | 464 | 74.61 | 34,617 | 0.0013 |
| snRNA | 1,762 | 114.17 | 201,174 | 0.0073 |
| snRNA: CD‐box | 319 | 92.78 | 29,598 | 0.0011 |
| snRNA: H/ACA‐box | 300 | 135.20 | 40,560 | 0.0015 |
| snRNA: splicing | 1,106 | 114.34 | 126,457 | 0.0046 |
Summary statistics of gene predictions in Bubalus bubalis from de novo, homology-based, and transcript-based approaches, and of the integrated gene set
| Gene set | Number | Average gene length (bp) | Average CDS length (bp) | Average exons per gene | Average exon length (bp) | Average intron length (bp) |
|---|---|---|---|---|---|---|
| AUGUSTUS | 21,098 | 50,022 | 1,453 | 9 | 166.05 | 6,266 |
| | 27,004 | 21,134 | 1,272 | 7 | 177.59 | 3,224 |
| | 25,417 | 23,299 | 1,343 | 7 | 181.74 | 3,435 |
| | 24,332 | 22,204 | 1,343 | 7 | 181.01 | 3,250 |
| | 24,849 | 23,684 | 1,354 | 7 | 180.79 | 3,440 |
| | 25,515 | 22,246 | 1,322 | 7 | 181.52 | 3,330 |
| | 26,247 | 21,689 | 1,248 | 7 | 181.92 | 3,488 |
| | 24,378 | 21,661 | 1,290 | 7 | 183.08 | 3,368 |
| Transcript | 95,359 | 3,145 | 893 | 3 | 319.30 | 1,254 |
| Homolog and transcript | 34,560 | 18,446 | 1,128 | 6 | 183.90 | 3,325 |
| Final integrated set | 24,613 | 45,255 | 1,407 | 9 | 164.26 | 5,789 |
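The table's averages appear to be pooled over the whole gene set. For the AUGUSTUS row, 1,453 / 166.05 ≈ 8.75 coding exons per gene (reported rounded as 9), and 1,453 + 7.75 × 6,266 ≈ 50,015 bp, close to the 50,022 bp average gene length. This suggests that gene length is measured across coding exons and introns only. The sketch below computes the six columns that way from hypothetical gene models; the `GeneModel` type is illustrative and is not a format used in the source.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    """Hypothetical gene model: at least one coding exon, given as
    non-overlapping half-open [start, end) intervals on one strand."""
    exons: list[tuple[int, int]]


def gene_set_stats(genes):
    """Pooled averages: totals over all genes divided by the number of genes,
    exons, or introns, as appropriate."""
    n_genes = len(genes)
    total_gene = total_cds = n_exons = 0
    total_intron = n_introns = 0
    for gene in genes:
        exons = sorted(gene.exons)
        total_gene += exons[-1][1] - exons[0][0]  # first exon start to last exon end
        total_cds += sum(end - start for start, end in exons)
        n_exons += len(exons)
        # Introns are the gaps between consecutive exons.
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
            total_intron += next_start - prev_end
            n_introns += 1
    return {
        "number": n_genes,
        "avg_gene_length": total_gene / n_genes,
        "avg_cds_length": total_cds / n_genes,
        "avg_exons_per_gene": n_exons / n_genes,
        "avg_exon_length": total_cds / n_exons,
        "avg_intron_length": total_intron / n_introns if n_introns else 0.0,
    }


toy_genes = [GeneModel([(0, 100), (300, 400), (1_000, 1_200)]), GeneModel([(0, 300)])]
print(gene_set_stats(toy_genes))
```

Averaging per-gene means instead of pooling would weight single-exon genes more heavily and would not reproduce the ratios noted above.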
Summary of BUSCO benchmarks for the genome assemblies and gene sets
| BUSCO benchmark (count/%) | Genesets | Genome | Genesets | Genome | Genesets | Genome | Genesets | Genome |
|---|---|---|---|---|---|---|---|---|
| Complete single‐copy | 2,395/92.6 | 3,870/94.3 | 2,387/92.3 | 1,680/40.9 | 2,389/92.4 | 3,987/97.1 | 2,399/92.8 | 3,785/92.2 |
| Complete duplicated | 39/1.5 | 37/0.9 | 42/1.6 | 750/18.3 | 29/1.1 | 27/0.7 | 44/1.7 | 248/6.0 |
| Fragmented | 81/3.1 | 78/1.9 | 90/3.5 | 105/2.6 | 99/3.8 | 59/1.4 | 79/3.1 | 50/1.2 |
| Missing | 71/2.8 | 119/2.9 | 67/2.6 | 1,569/38.2 | 69/2.7 | 31/0.8 | 64/2.4 | 21/0.6 |
“B. bubalis*” denotes the assembly from this study, and “B. bubalis#” denotes the assembly UMD_CASPUR_WB_2.0. BUSCO version: 2.0. Lineage dataset: vertebrata_odb9 (creation date: 2016‐02‐13; number of species: 65; number of BUSCOs: 4,041).
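Each BUSCO cell reports a count and the corresponding percentage. The four categories sum to 2,586 in every Genesets column and to 4,104 in every Genome column, so the percentages are relative to those column totals. The sketch below converts counts to percentages rounded to one decimal place. The reported values agree to within 0.1 percentage point; for example, 71 of 2,586 is 2.7%, reported as 2.8%.

```python
CATEGORIES = ("complete_single", "complete_duplicated", "fragmented", "missing")


def busco_percentages(counts):
    """Return the column total and each category as a percentage of it."""
    total = sum(counts[c] for c in CATEGORIES)
    return total, {c: round(100 * counts[c] / total, 1) for c in CATEGORIES}


# First Genesets column of the table above.
first_geneset = {"complete_single": 2_395, "complete_duplicated": 39,
                 "fragmented": 81, "missing": 71}
total, pct = busco_percentages(first_geneset)
print(total, pct)
```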
Figure 2. Comparison of gene parameters among Bovidae genomes. (a) Gene length; (b) CDS length; (c) exon length; (d) intron length.
Figure 3. Estimation of divergence times. The numbers on the nodes represent divergence times before present, in millions of years ago (Mya). The red points at three internal nodes indicate the fossil calibration times used in the analysis, taken from TimeTree (http://www.timetree.org/): the Equus caballus‐Bos taurus divergence (74–81 Mya), the Camelus bactrianus‐Bos taurus divergence (61–71 Mya), and the Bos taurus‐Bos grunniens divergence (1.96–6.77 Mya). The estimated divergence times are shown with their 95% confidence intervals.
Figure 4. Demographic history inferred from a single buffalo genome. The buffalo population reached a maximum size at about 20,000 years ago, coinciding with the Last Glacial Maximum (LGM), and showed another peak at about 200,000 years ago, almost simultaneous with the Penultimate Glaciation (PG) (vertical gray shading). The horizontal axis shows time, measured as pairwise sequence divergence, and the vertical axis shows effective population size, measured as the scaled mutation rate. The light pink lines are PSMC inferences from 100 rounds of bootstrapped sequences, and the red line is the estimate from the original data.
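PSMC reports time as pairwise sequence divergence and population size as a scaled mutation rate. Converting these to years and individuals requires a per-generation mutation rate μ and a generation time g, and this caption gives neither. The minimal conversion below uses the standard coalescent relations d = 2μT (T in generations) and θ = 4·Ne·μ. The μ and g values are placeholders and are not the parameters behind Figure 4.

```python
def divergence_to_years(pairwise_divergence, mu_per_gen, generation_years):
    """Two lineages accumulate differences at 2 * mu per site per generation,
    so T = d / (2 * mu) generations, then scaled by the generation time."""
    generations = pairwise_divergence / (2 * mu_per_gen)
    return generations * generation_years


def theta_to_ne(theta_per_site, mu_per_gen):
    """Per-site scaled mutation rate theta = 4 * Ne * mu, solved for Ne."""
    return theta_per_site / (4 * mu_per_gen)


# Placeholder parameters for illustration only; not the values used for Figure 4.
MU = 1.0e-8  # hypothetical mutations per site per generation
GEN = 6.0    # hypothetical generation time in years

print(divergence_to_years(2.4e-4, MU, GEN))
print(theta_to_ne(4.0e-4, MU))
```

Both conversions scale inversely with μ, so a different assumed mutation rate shifts the curve along both axes.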