Steven J M Jones, Gregory A Taylor, Simon Chan, René L Warren, S Austin Hammond, Steven Bilobram, Gideon Mordecai, Curtis A Suttle, Kristina M Miller, Angela Schulze, Amy M Chan, Samantha J Jones, Kane Tse, Irene Li, Dorothy Cheung, Karen L Mungall, Caleb Choo, Adrian Ally, Noreen Dhalla, Angela K Y Tam, Armelle Troussard, Heather Kirk, Pawan Pandoh, Daniel Paulino, Robin J N Coope, Andrew J Mungall, Richard Moore, Yongjun Zhao, Inanc Birol, Yussanne Ma, Marco Marra, Martin Haulena.
Abstract
The beluga whale is a cetacean that inhabits Arctic and subarctic regions and is the only living member of the genus Delphinapterus. The genome of the beluga whale was determined using DNA sequencing approaches that employed both microfluidic partitioned and non-partitioned library construction. The former allowed for the construction of a highly contiguous assembly with a scaffold N50 length of over 19 Mbp and a total reconstruction of 2.32 Gbp. To aid our understanding of the functional elements, transcriptome data were also derived from brain, duodenum, heart, lung, spleen, and liver tissue. The assembled sequence and all of the underlying sequence data are available at the National Center for Biotechnology Information (NCBI) under the Bioproject accession number PRJNA360851A.
Keywords: Cetacea; Delphinapterus leucas; beluga whale; genome; genome assembly
Year: 2017 PMID: 29232881 PMCID: PMC5748696 DOI: 10.3390/genes8120378
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Genome assembly workflow. gDNA: genomic DNA.
Figure 2. Transcriptome assembly workflow.
Assembly statistics and gene content for the genome sequences reported in this study.
| Assembly | Total Size (Gbp) | No. of Gaps | No. of Scaffolds | Scaffold N50 (bp) | Longest Scaffold (bp) | BUSCO Complete Genes | BUSCO Complete + Fragmented Genes |
|---|---|---|---|---|---|---|---|
| ABySS-pe | 2.325 + 0.216% in gaps | 210,782 | 102,940 | 58,545 | 997,316 | 4153 (66.42%) | 4689 (74.99%) |
| Supernova | 2.314 + 1.40% in gaps | 30,858 | 8930 | 16.79 × 10^6 | 78 × 10^6 | 5667 (90.63%) | 5911 (94.53%) |
| Rails/Cobbler | 2.327 + 1.37% in gaps | 26,898 | 6971 | 19.59 × 10^6 | 95 × 10^6 | 5669 (90.66%) | 5915 (94.59%) |
| Sealer | 2.327 + 1.36% in gaps | 25,839 | 6971 | 19.59 × 10^6 | 95 × 10^6 | 5669 (90.66%) | 5915 (94.59%) |
BUSCO: Benchmarking Universal Single-Copy Orthologs.
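
The BUSCO columns above give a gene count and a percentage but not the size of the ortholog set that was searched. As a rough consistency check, the sketch below looks for set sizes that would reproduce every tabulated percentage after rounding to two decimals. The function name `compatible_totals` and the search bound `hi` are hypothetical helpers introduced for illustration, and the rounding convention is an assumption rather than something the source states.

```python
# (count, percent) pairs copied from the BUSCO columns of the assembly table.
BUSCO_ROWS = [
    (4153, 66.42), (4689, 74.99),   # ABySS-pe: complete, complete + fragmented
    (5667, 90.63), (5911, 94.53),   # Supernova
    (5669, 90.66), (5915, 94.59),   # Rails/Cobbler and Sealer (identical values)
]


def compatible_totals(count, percent, hi=20000, half_step=0.005):
    """Return every set size n (count <= n <= hi) for which 100 * count / n
    lies within half a rounding step of the reported percentage.

    `hi` is an arbitrary search bound and `half_step` assumes the percentages
    were rounded to two decimal places; both are assumptions.
    """
    eps = 1e-9  # guards against floating-point noise at the boundary
    return {
        n
        for n in range(count, hi + 1)
        if abs(100.0 * count / n - percent) <= half_step + eps
    }


# Intersect the candidate set sizes across all rows.
candidates = None
for count, percent in BUSCO_ROWS:
    found = compatible_totals(count, percent)
    candidates = found if candidates is None else candidates & found

print(sorted(candidates))
```

Under these assumptions the rows agree on a single set size, which suggests the counts and percentages in the table are internally consistent.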
Transcriptome assembly statistics for all tissues studied and the read counts for each library.
| Tissue | n | n:N50 | Min | N80 | N50 | N20 | Max | Sum | Read Count |
|---|---|---|---|---|---|---|---|---|---|
| Liver, dam | 960,722 | 117,511 | 74 | 144 | 420 | 1542 | 47,312 | 246.3 × 10^6 | 239.4 × 10^6 |
| Brain, dam | 2,019,281 | 247,296 | 74 | 193 | 587 | 2013 | 18,656 | 691 × 10^6 | 247.0 × 10^6 |
| Liver, daughter | 854,394 | 99,263 | 74 | 145 | 555 | 1538 | 47,494 | 235.4 × 10^6 | 241.2 × 10^6 |
| Brain, daughter | 2,258,624 | 260,327 | 74 | 198 | 653 | 2219 | 19,796 | 806.2 × 10^6 | 270.4 × 10^6 |
| Lung, dam | 1,170,674 | 374,225 | 162 | 201 | 282 | 504 | 5091 | 339.2 × 10^6 | 25.6 × 10^6 |
| Mixed, dam | 860,603 | 305,220 | 149 | 186 | 220 | 352 | 4751 | 208.2 × 10^6 | 26.6 × 10^6 |
| Serum, dam | 1,244,441 | 516,834 | 135 | 195 | 203 | 287 | 8034 | 281.5 × 10^6 | 23.8 × 10^6 |
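
For readers unfamiliar with the N80, N50, and N20 columns, the sketch below shows the usual definition of an Nx statistic: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches x% of the summed length. The function `nx_stat` is a hypothetical helper, not the tool used in the study. Reading the `n:N50` column as the number of sequences needed to reach N50 is an assumption, and because the source does not state the tool or any minimum-length filter, this will not necessarily reproduce the tabulated values.

```python
from typing import Iterable, Tuple


def nx_stat(lengths: Iterable[int], percent: int = 50) -> Tuple[int, int]:
    """Return (Nx length, number of sequences needed to reach it).

    Sequences are taken longest first; Nx is the length of the sequence at
    which the running total first reaches `percent` of the summed length.
    Integer arithmetic is used so the threshold comparison is exact.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running * 100 >= total * percent:
            return length, count
    raise AssertionError("unreachable: the full sum always meets the target")


# Toy lengths (not data from the study); their sum is 300.
toy = [80, 70, 50, 40, 30, 20, 10]
for pct in (80, 50, 20):
    length, count = nx_stat(toy, pct)
    print(f"N{pct} = {length} (reached after {count} sequences)")
```

The same function applies to the scaffold N50 in the genome assembly table: a larger N50 at a similar total size means that half of the assembly sits in fewer, longer scaffolds.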