John L Williams, Daniela Iamartino, Kim D Pruitt, Tad Sonstegard, Timothy P L Smith, Wai Yee Low, Tommaso Biagini, Lorenzo Bomba, Stefano Capomaccio, Bianca Castiglioni, Angelo Coletta, Federica Corrado, Fabrizio Ferré, Leopoldo Iannuzzi, Cynthia Lawley, Nicolò Macciotta, Matthew McClure, Giordano Mancini, Donato Matassino, Raffaele Mazza, Marco Milanesi, Bianca Moioli, Nicola Morandi, Luigi Ramunno, Vincenzo Peretti, Fabio Pilla, Paola Ramelli, Steven Schroeder, Francesco Strozzi, Francoise Thibaud-Nissen, Luigi Zicarelli, Paolo Ajmone-Marsan, Alessio Valentini, Giovanni Chillemi, Aleksey Zimin.
Abstract
Water buffalo is a globally important species for agriculture and local economies. A de novo assembled, well-annotated reference sequence for the water buffalo is an important prerequisite for studying the biology of this species, and is necessary to manage genetic diversity and to use modern breeding and genomic selection techniques. However, no such genome assembly has been previously reported. There are 2 species of domestic water buffalo, the river (2n = 50) and the swamp (2n = 48) buffalo. Here we describe a draft-quality reference sequence for the river buffalo created from Illumina GA and Roche 454 short-read sequences using the MaSuRCA assembler. The assembled sequence is 2.83 Gb, consisting of 366 983 scaffolds with a scaffold N50 of 1.41 Mb and contig N50 of 21 398 bp. Annotation of the genome was supported by transcriptome data from 30 tissues and identified 21 711 predicted protein-coding genes. Searches for complete mammalian BUSCO gene groups found 98.6% of curated single-copy orthologs present among predicted genes, which suggests a high level of completeness of the genome. The annotated sequence is available from NCBI at accession GCA_000471725.1.
Keywords: Water buffalo; annotation; genome assembly; transcriptome
Year: 2017 PMID: 29048578 PMCID: PMC5737279 DOI: 10.1093/gigascience/gix088
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Sequencing data used for the water buffalo genome assembly. The coverage is calculated using an estimated genome size of 2.8 Gbp.
| Sequencing technology | Mean library fragment length, bp | Mean read length, bp | Number of reads | Coverage |
|---|---|---|---|---|
| 454 FLX | 800 | 353 | 16 737 372 | ×2.1 |
| 454 FLX mate pair | 20 000 | 229 | 4 821 352 | ×0.4 |
| Illumina GAIIx paired end | 400 | 96 | 2 122 738 136 | ×73 |
| Illumina GAIIx mate pair | 4000–6000 | 75 | 335 354 888 | ×9.0 |
Raw genomic sequence data are available from the SRA (PRJNA207334).
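The coverage column can be reproduced from the read counts and mean read lengths in the table. The following is a minimal sketch of that arithmetic, assuming coverage is simply total sequenced bases divided by the estimated 2.8 Gbp genome size stated in the caption; the names `LIBRARIES`, `GENOME_SIZE_BP`, and `coverage` are illustrative and do not come from the paper.

```python
# Read counts and mean read lengths copied from the sequencing table above.
LIBRARIES = {
    "454 FLX": (16_737_372, 353),
    "454 FLX mate pair": (4_821_352, 229),
    "Illumina GAIIx paired end": (2_122_738_136, 96),
    "Illumina GAIIx mate pair": (335_354_888, 75),
}

# Estimated genome size of 2.8 Gbp, as stated in the table caption.
GENOME_SIZE_BP = 2_800_000_000


def coverage(num_reads: int, mean_read_len: int,
             genome_size: int = GENOME_SIZE_BP) -> float:
    """Fold coverage: total sequenced bases divided by genome size."""
    return num_reads * mean_read_len / genome_size


if __name__ == "__main__":
    for name, (reads, read_len) in LIBRARIES.items():
        print(f"{name}: x{coverage(reads, read_len):.1f}")
```

Under this assumption the four libraries round to the values reported in the table (the paired-end Illumina library comes out slightly below ×73 before rounding to a whole number).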
Assembly statistics of water buffalo.
| Statistic | Value |
|---|---|
| Total sequence length, bp | 2 836 166 969 |
| Total assembly gap length, bp | 74 388 041 |
| Gaps between scaffolds | 0 |
| Number of scaffolds | 366 983 |
| Scaffold N50, bp | 1 412 388 |
| Scaffold L50 | 581 |
| Number of contigs | 630 368 |
| Contig N50, bp | 21 938 |
| Contig L50 | 35 881 |
Note: one of the scaffolds is the full mitochondrial genome.
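For readers unfamiliar with the N50 and L50 statistics in the table, the sketch below shows one common way to compute them from a list of sequence lengths. It is an illustration on toy data, not the procedure used to produce the reported values; the function name `n50_l50` is hypothetical, and conventions differ slightly between tools (for example, whether the cumulative sum must reach or exceed half the total).

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated until
    the running total reaches at least half of the summed length.
    N50 is the length of the sequence at which that happens; L50 is the
    number of sequences accumulated up to and including it.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("lengths must be non-empty")


if __name__ == "__main__":
    # Toy scaffold lengths (not from the buffalo assembly).
    toy = [80, 70, 50, 40, 30, 20, 10]
    n50, l50 = n50_l50(toy)
    print(f"N50 = {n50}, L50 = {l50}")
```

Read this way, the reported scaffold N50 of 1 412 388 with an L50 of 581 means that the 581 longest scaffolds account for at least half of the assembled sequence.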
Counts of predicted genomic features in Annotation Release 100.
| Feature | Count |
|---|---|
| Genes and pseudogenes | 27 837 |
| Genes, protein-coding | 21 711 |
| Genes, non-coding | 2303 |
| Pseudogenes | 3823 |
| mRNAs | 41 486 |
| mRNAs, fully supported | 38 378 |
| mRNAs, with > 5% ab initio | 1662 |
| mRNAs, partial | 1500 |
| Other RNAs | 5544 |
| Other RNAs, fully supported | 3911 |
| Other RNAs, with > 5% ab initio | 0 |
| Other RNAs, partial | 0 |
| CDSs | 41 665 |
| CDSs, fully supported | 38 378 |
| CDSs, with > 5% ab initio | 1956 |
| CDSs, partial | 1515 |
Genome annotation comparison with other domestic species.
| Species | Common name | Protein-coding genes | Partial CDS | Assembly size, bp | Divergence time to buffalo, Myr | RefSeq assembly accession | Annotation release ID |
|---|---|---|---|---|---|---|---|
| Bubalus bubalis | Water buffalo | 21 711 | 1515 | 2 836 166 969 | - | GCF_000471725.1 | 100 |
| Bos taurus | Cattle | 21 295 | 1589 | 2 670 139 648 | 12.3 | GCF_000003055.6 | 105 |
| Capra hircus | Goat | 20 755 | 457 | 2 922 813 246 | 24.6 | GCF_001704415.1 | 102 |
| Ovis aries | Sheep | 20 645 | 758 | 2 615 516 299 | 24.6 | GCF_000298735.2 | 102 |
| Sus scrofa | Pig | 24 205 | 4112 | 2 808 525 991 | 62 | GCF_000003025.5 | 105 |
Completeness of buffalo genome assembly as assessed by BUSCO.
| Category | Count |
|---|---|
| Complete BUSCOs (C) | 4048 |
| Complete and single-copy BUSCOs (S) | 4007 |
| Complete and duplicated BUSCOs (D) | 41 |
| Fragmented BUSCOs (F) | 50 |
| Missing BUSCOs (M) | 6 |
| Total BUSCO groups searched | 4104 |
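The 98.6% completeness figure quoted in the abstract follows directly from these counts. The sketch below checks the internal consistency of the table and derives the percentages; the dictionary keys and the helper `busco_percentages` are illustrative names, not part of BUSCO itself.

```python
# Counts copied from the BUSCO table above.
BUSCO_COUNTS = {
    "complete_single_copy": 4007,  # S
    "complete_duplicated": 41,     # D
    "fragmented": 50,              # F
    "missing": 6,                  # M
}
TOTAL_SEARCHED = 4104


def busco_percentages(counts, total):
    """Express each BUSCO category as a percentage of all groups searched."""
    return {name: 100 * value / total for name, value in counts.items()}


# Complete BUSCOs (C) are the single-copy plus duplicated groups.
complete = BUSCO_COUNTS["complete_single_copy"] + BUSCO_COUNTS["complete_duplicated"]

if __name__ == "__main__":
    # The four categories should partition the full set of groups searched.
    assert sum(BUSCO_COUNTS.values()) == TOTAL_SEARCHED
    print(f"Complete: {complete} ({100 * complete / TOTAL_SEARCHED:.1f}%)")
    for name, pct in busco_percentages(BUSCO_COUNTS, TOTAL_SEARCHED).items():
        print(f"{name}: {pct:.1f}%")
```

The complete count (4048 of 4104) works out to 98.6%, matching the abstract; fragmented and missing groups together account for the remaining 56 groups.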