Thuy-Yen Duong1, Mun Hua Tan2,3, Yin Peng Lee2,3, Larry Croft2, Christopher M Austin2,3.
Abstract
Freshwater catfish of the genus Clarias, known as airbreathing catfish, are widespread and important for food security through small-scale inland fisheries and aquaculture, yet limited genomic data are available for this group. The bighead catfish (Clarias macrocephalus) is a commercially farmed species in Southeast Asia that is threatened in its natural environment by habitat destruction, over-exploitation and competition from other introduced species of Clarias. Despite its commercial importance and the threats to natural populations, public databases do not include any genomic data for C. macrocephalus. We present the first genomic data for the bighead catfish, generated by Illumina sequencing. A total of 128 Gb of sequence data in paired-end 150 bp reads was assembled de novo, generating a final assembly of 883 Mbp contained in 27,833 scaffolds (N50 length: 80.8 kbp), with BUSCO completeness of 96.3% and 87.6% based on the metazoan and Actinopterygii ortholog datasets, respectively. Annotation of the genome predicted 21,124 gene sequences, which were assigned putative functions based on homology to existing protein sequences in public databases. Raw fastq reads and the final version of the genome assembly have been deposited in NCBI (BioProject: PRJNA604477, WGS: JAAGKR000000000, SRA: SRR11188453). The complete C. macrocephalus mitochondrial genome was also recovered from the same sequence read dataset and is available on NCBI (accession: MT109097), representing the first mitogenome for this species. Lastly, we find an expansion of the mb and ora1 genes, which is thought to be associated with adaptations to air-breathing and a semi-terrestrial lifestyle in this genus of catfish.
Keywords: Aquaculture; Catfish; Clarias macrocephalus; Genome; Illumina
Year: 2020 PMID: 32637481 PMCID: PMC7326715 DOI: 10.1016/j.dib.2020.105861
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
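The abstract reports 128 Gb of 150 bp paired-end reads and an assembly of about 883 Mbp, which together imply a nominal sequencing depth. The sketch below works through that arithmetic. It assumes that "128 Gb" means 128 × 10^9 bases and uses the assembly size as a stand-in for the genome size; the function names are illustrative and not part of the original study.

```python
def nominal_depth(total_bases, genome_size):
    """Average number of times each genome position is covered by raw reads."""
    return total_bases / genome_size


def read_count(total_bases, read_length):
    """Approximate number of individual reads for a fixed read length."""
    return total_bases / read_length


# Figures from the abstract and the assembly table.
# ASSUMPTION: "128 Gb" is read as 128e9 bases (gigabases, not gigabytes).
TOTAL_BASES = 128e9
ASSEMBLY_SIZE = 883_399_353  # bp, used here as a proxy for genome size
READ_LENGTH = 150            # bp, paired-end

depth = nominal_depth(TOTAL_BASES, ASSEMBLY_SIZE)
reads = read_count(TOTAL_BASES, READ_LENGTH)
pairs = reads / 2

print(f"nominal depth: about {depth:.0f}x")
print(f"reads: about {reads / 1e6:.0f} million ({pairs / 1e6:.0f} million pairs)")
```

Under these assumptions the depth comes out at roughly 145×. This is a nominal figure only: it ignores read trimming and filtering, so the effective depth used by the assembler may differ.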
Sequencing, assembly and annotation of the Clarias macrocephalus genome.
| Genome size estimate | |
| Based on 19-mers | 883,670,904 bp |
| Based on 21-mers | 880,824,886 bp |
| Based on 25-mers | 878,508,361 bp |
| Assembly | |
| Number of scaffolds | 27,833 |
| Assembly size | 883,399,353 bp |
| Average scaffold size | 31,739.28 bp |
| Scaffold N50 size | 80,802 bp |
| Largest scaffold | 650,799 bp |
| Smallest scaffold | 200 bp |
| Number of Ns | 998,952 bp |
| Number of gaps | 17,036 |
| Percentage of short reads aligned to assembly | 96.14% |
| BUSCO completeness (metazoan dataset) | |
| Complete | 942 (96.3%) |
| Complete and single copy | 900 (92.0%) |
| Complete and duplicated copy | 42 (4.3%) |
| Fragmented | 21 (2.1%) |
| Missing | 15 (1.6%) |
| BUSCO completeness (Actinopterygii dataset) | |
| Complete | 4014 (87.6%) |
| Complete and single copy | 3859 (84.2%) |
| Complete and duplicated copy | 155 (3.4%) |
| Fragmented | 294 (6.4%) |
| Missing | 276 (6.0%) |
| Annotation | |
| Number of predicted genes (AED ≤ 0.5) | 21,124 |
| Number of genes with homology to NR | 20,693 (98.0%) |
| Number of genes with functional domain | 20,794 (98.4%) |
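The assembly statistics in the table above (scaffold count, average size, N50, largest and smallest scaffold) can all be derived from a list of scaffold lengths. The following is a minimal sketch of that calculation, not the authors' pipeline. The `toy_lengths` list is hypothetical data chosen for illustration, and N50 is computed with the common definition: the length of the scaffold at which the cumulative sum, taken from longest to shortest, first reaches half of the total assembly size.

```python
def n50(lengths):
    """Scaffold length at which the cumulative sum (longest first)
    first reaches at least half of the total assembly size."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


def summarize(lengths):
    """Basic assembly statistics from a list of scaffold lengths (bp)."""
    total = sum(lengths)
    return {
        "scaffolds": len(lengths),
        "assembly_size": total,
        "average": total / len(lengths),
        "n50": n50(lengths),
        "largest": max(lengths),
        "smallest": min(lengths),
    }


# HYPOTHETICAL toy lengths, not the real C. macrocephalus scaffolds.
toy_lengths = [200, 500, 1_000, 2_000, 3_000, 4_000]
stats = summarize(toy_lengths)
print(stats)

# Consistency check on two reported values from the table:
# assembly size divided by scaffold count should give the average size.
reported_average = 883_399_353 / 27_833
print(f"average scaffold size: {reported_average:.2f} bp")
```

For the toy data the total is 10,700 bp, so the halfway point of 5,350 bp is reached at the 3,000 bp scaffold. The final check reproduces the reported average scaffold size of 31,739.28 bp from the reported assembly size and scaffold count.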
Fig. 1. Gene copies of mb, ora1 and sult6b1 in the genomes of Clarias macrocephalus and other fish (C. macrocephalus genes colored in blue).
Fig. 2. Mitochondrial genome of Clarias macrocephalus.
Number of gene copies for mb, ora1 and sult6b1 in the genomes of Clarias macrocephalus and other fishes.
| Species name | Ensembl code | mb | ora1 | sult6b1 |
| | ENSAMXP | 1 | 1 | 2 |
| | ENSCCRP | 2 | 1 | 1 |
| | ENSDARP | 5 | 1 | 1 |
| | ENSEEEP | 1 | 3 | 5 |
| | ENSGMOP | 1 | 1 | 4 |
| | ENSGACP | 2 | 1 | 6 |
| | ENSIPUP | 1 | 1 | 7 |
| | ENSONIP | 1 | 1 | 2 |
| | ENSORLP | 1 | 1 | 3 |
| | ENSPFOP | 1 | 1 | 1 |
| | ENSPLAP | 1 | 1 | 3 |
| | ENSPNAP | 1 | 1 | 2 |
| | ENSSANP | 1 | 1 | 3 |
| | ENSSGRP | 1 | 1 | 2 |
| | ENSSRHP | 2 | 1 | 5 |
| | ENSTRUP | 2 | 1 | 1 |
| | ENSTNIP | 5 | 3 | 1 |
| Subject | Biology |
| Specific subject area | Genomics |
| Type of data | Sequencing raw reads, Assembly, Table, Figure |
| How data were acquired | Illumina NovaSeq |
| Data format | Raw Reads (fastq), Assembly (fasta), Protein and Transcript sequences (fasta) |
| Parameters for data collection | DNA from a white muscle tissue sample of an adult catfish specimen was used for library preparation and sequencing. |
| Description of data collection | Total genomic DNA was extracted using the SDS-chloroform method. The gDNA library was then prepared with the Illumina TruSeq PCR-Free kit following the manufacturer's instructions. Paired-end sequencing of the library was performed on a NovaSeq 6000 (2 × 150 bp run configuration) at the Deakin Genomics Centre. |
| Data source location | 9° 19′ 28.9″ N; 104° 53′ 20.9″ E |
| Data accessibility | Mitochondrial genome is available on NCBI under accession number MT109097 (