Christopher M Austin, Mun Hua Tan, Katherine A Harrisson, Yin Peng Lee, Laurence J Croft, Paul Sunnucks, Alexandra Pavlova, Han Ming Gan.
Abstract
One of the most iconic Australian fish is the Murray cod, Maccullochella peelii (Mitchell 1838), a freshwater species that can grow to ∼1.8 metres in length and live to age ≥48 years. The Murray cod is of conservation concern as a result of strong population contractions, but it is also popular for recreational fishing and is of growing aquaculture interest. In this study, we report the whole genome sequence of the Murray cod to support ongoing population genetics, conservation, and management research, as well as to better understand the evolutionary ecology and history of the species. A draft Murray cod genome of 633 Mbp (N50 = 109 974 bp; BUSCO and CEGMA completeness of 94.2% and 91.9%, respectively) with an estimated 148 Mbp of putative repetitive sequences was assembled from the combined sequencing data of 2 individual fish with an identical maternal lineage; 47.2 Gb of Illumina HiSeq data and 804 Mb of Nanopore data were generated from the first individual, while 23.2 Gb of Illumina MiSeq data were generated from the second individual. The inclusion of Nanopore reads for scaffolding, followed by gap-closing using Illumina data, led to a 29% reduction in the number of scaffolds and a 55% and 54% increase in the scaffold and contig N50, respectively. We also report the first transcriptome of the Murray cod, which was subsequently used to annotate the genome, leading to the identification of 26 539 protein-coding genes. We present the whole genome of the Murray cod and anticipate that it will be a catalyst for a range of genetic, genomic, and phylogenetic studies of the Murray cod and, more generally, of other fish species of the family Percichthyidae.
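The abstract summarises assembly contiguity with the N50 statistic. As a reminder of what that number means, the following is a minimal sketch of the usual definition (the length of the sequence at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size). The function name `n50` and the toy scaffold lengths are hypothetical and are not taken from the study, and the authors' assembly tools may compute the statistic with slightly different conventions.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed definition: sort lengths from longest to shortest and return
    the length at which the cumulative sum first reaches at least half of
    the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy assembly (lengths in bp), for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_scaffolds)
print(toy_n50)
```

In this toy case the total is 300 bp, so the running sum reaches the halfway mark (150 bp) at the second-longest scaffold. A larger N50 therefore indicates that more of the assembly sits in long sequences.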
Keywords: Murray cod; genome; hybrid assembly; long reads; transcriptome
Year: 2017 PMID: 28873963 PMCID: PMC5597895 DOI: 10.1093/gigascience/gix063
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: The iconic Murray cod. Photo: Paul Sunnucks.
Figure 2: Estimation of genome size, repeat content, and heterozygosity by GenomeScope, based on 21-mers in HiSeq sequence reads (maximum k-mer coverage set to 1000).
Murray cod assembly and annotation statistics
| Genome assembly | Illumina only | Illumina (≥500 bp) | Illumina + Nanopore (≥500 bp) |
|---|---|---|---|
| Number of contigs | 95 612 | 41 152 | 45 882 |
| Contig N50 size | 33 442 bp | 34 269 bp | 52 687 bp |
| Longest contig | 328 477 bp | 328 477 bp | 501 239 bp |
| Number of scaffolds | 80 098 | 25 642 | 18 198 |
| Total scaffold size | 622 421 194 bp | 609 090 121 bp | 633 241 041 bp |
| Scaffold N50 size | 68 937 bp | 70 993 bp | 109 974 bp |
| Longest scaffold | 548 726 bp | 548 726 bp | 1 119 190 bp |
| % GC/AT/N | 40.7/59.1/0.2 | 40.7/59.2/0.1 | 40.4/58.7/0.9 |
| CEGMA completeness | 89.52% | 84.68% | 91.94% |
| Complete BUSCOs | 4228 (92.3%) | 4229 (92.3%) | 4317 (94.2%) |
| Complete and single-copy BUSCOs | 4115 (89.8%) | 4115 (89.8%) | 4202 (91.7%) |
| Complete and duplicated BUSCOs | 113 (2.5%) | 114 (2.5%) | 115 (2.5%) |
| Fragmented BUSCOs | 224 (4.9%) | 222 (4.8%) | 156 (3.4%) |
| Missing BUSCOs | 132 (2.8%) | 133 (2.9%) | 111 (2.4%) |
| Transcriptome assembly | | | |
| Number of transcripts | 321 855 | | |
| Transcriptome size | 305 149 376 bp | | |
| Mean transcript length | 948.10 bp | | |
| Longest transcript | 23 655 bp | | |
| CEGMA completeness | 99.19% | | |
| Annotation | | | |
| Number of protein-coding genes | 26 539 | | |
| Mean gene length | 10 115.3 bp | | |
| Longest gene | 134 909 bp | | |
| With functional annotation | 25 607 | | |
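
The improvements quoted in the abstract (a 29% reduction in scaffold number, and 55% and 54% increases in scaffold and contig N50) can be checked against the table. The sketch below assumes that the baseline is the "Illumina (≥500 bp)" column and that the comparison is with the "Illumina + Nanopore (≥500 bp)" column; the source does not state the baseline explicitly, so this is an inference from the numbers. The helper `percent_change` and the dictionary names are illustrative only.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Values copied from the assembly statistics table.
# Assumption: the "Illumina (>=500 bp)" column is the baseline.
illumina_only = {"scaffolds": 25642, "scaffold_n50": 70993, "contig_n50": 34269}
hybrid = {"scaffolds": 18198, "scaffold_n50": 109974, "contig_n50": 52687}

# Round to whole percentages, as reported in the abstract.
changes = {
    key: round(percent_change(illumina_only[key], hybrid[key]))
    for key in illumina_only
}

for key, value in changes.items():
    print(f"{key}: {value:+d}%")
```

Under that assumption the rounded values agree with the abstract: the scaffold count falls by 29%, and the scaffold and contig N50 rise by 55% and 54%.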