The genome sequence of the channel bull blenny, Cottoperca gobio (Günther, 1861).

Iliana Bista, Shane A McCarthy, Jonathan Wood, Zemin Ning, H William Detrich III, Thomas Desvignes, John Postlethwait, William Chow, Kerstin Howe, James Torrance, Michelle Smith, Karen Oliver, Eric A Miska, Richard Durbin.

Abstract

We present a genome assembly for Cottoperca gobio (channel bull blenny, (Günther, 1861)); Chordata; Actinopterygii (ray-finned fishes), a temperate-water outgroup for Antarctic notothenioids. The size of the genome assembly is 609 megabases, with the majority of the assembly scaffolded into 24 chromosomal pseudomolecules. Gene annotation of this assembly on Ensembl has identified 21,662 coding genes.

Copyright: © 2020 Bista I et al.

Keywords:  Cottoperca gobio; Notothenioidei; channel bull blenny; genome assembly; chromosomal

Year:  2020        PMID: 33195818      PMCID: PMC7649722          DOI: 10.12688/wellcomeopenres.16012.1

Source DB:  PubMed          Journal:  Wellcome Open Res        ISSN: 2398-502X


Species taxonomy

Eukaryota; Metazoa; Chordata; Vertebrata; Gnathostomata; Actinopterygii; Teleostei; Clupeocephala; Percomorphaceae; Perciformes; Notothenioidei; Bovichtidae; Cottoperca; Cottoperca gobio (Günther, 1861) - synonym: Cottoperca trigloides (Balushkin, 2000), NCBI taxid: 56716.

Background

Cottoperca gobio (channel bull blenny) is a member of the family Bovichtidae within the Notothenioidei, a fish group endemic to the Southern Ocean. The Bovichtidae (thornfishes) are considered to be the most basally diverging family of notothenioids and are less adapted to life in extreme cold than Antarctic members of the clade (Near). C. gobio occupies the Patagonian regions of Chile and Argentina and the area around the Falkland Islands. In contrast to Antarctic notothenioids (cryonotothenioids), the Bovichtidae do not produce antifreeze glycoproteins (AFGPs), a key adaptation to extreme Antarctic cold (Chen; Cheng), and their hemoglobins possess slightly higher oxygen affinity than those of most high-Antarctic species (Giordano; Giordano). Cytogenetic investigation of C. gobio showed that the karyotype of this species consists of 2n=48 chromosomes (Pisano). This condition, shared by other Bovichtidae, is considered to be the ancestral karyotype for all notothenioids (Mazzei). Here, we present a chromosomally complete genome sequence of Cottoperca gobio generated using specimens collected south of the Falkland Islands/Islas Malvinas. We trust that this genome sequence will be used to aid analysis of the population structure and phylogeography of non-Antarctic and Antarctic notothenioid fish species, which are increasingly under threat from climate change and human activities (Dornburg).

Genome sequence report

The C. gobio genome was sequenced from a specimen collected under permits to fish in territorial waters of the Falkland Islands/Islas Malvinas issued by the United Kingdom, by the Falkland Islands Government, and by Argentina. The genome assembly for C. gobio (fCotGob3.1) is based on a combination of data from four technologies: 75x coverage of Pacific Biosciences (PacBio) single-molecule long reads (N50 14 kb), 54x coverage of Illumina data generated from a 10X Genomics Chromium library (estimated molecule length N50 43 kb), BioNano Saphyr two-enzyme data (BspQI and BssSI), and 145x coverage of Illumina HiSeqX data obtained from a Hi-C library prepared by Arima Genomics using tissue from a second individual (fCotGob2, spleen tissue). The final assembly has a total length of 609 Mb, in 322 sequence scaffolds with a scaffold N50 of 25 Mb (Figure 1; Table 1). The majority (94.36%) of the assembly sequence was assigned to 24 chromosomal-level scaffolds using the Hi-C data (Figure 2; Table 2). The assembly has a BUSCO (Simão) gene completeness score of 93.4% using the actinopterygii reference set (with -sp zebrafish parameter). The chromosomes clearly show a one-to-one relationship with those in the Japanese medaka (Oryzias latipes) HdrR assembly GCA_002234675.1 (Figure 3 and Figure 4): of the 3780 complete and single-copy BUSCO genes present in both genomes, 3671 (97.1%) were found on homologous chromosomes. The C. gobio chromosomes were therefore named to correspond to their medaka homologs. Analysis of conserved syntenies detected no major interchromosomal rearrangements, but many intrachromosomal rearrangements, in the approximately 195 million years since the divergence of the medaka and C. gobio lineages (Steinke; Figure 4). While not fully phased, the deposited assembly represents one haplotype. Contigs corresponding to the second haplotype have also been deposited.

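
As a rough cross-check of the figures quoted above, the following Python sketch recomputes the synteny concordance and estimates the raw data volume implied by each coverage value. It assumes that coverage was calculated against the 609 Mb assembly span, which the text does not state, and all variable and function names are illustrative.

```python
# Illustrative arithmetic only; names are hypothetical, values come from the text.
ASSEMBLY_SPAN_MB = 609  # total assembly length reported for fCotGob3.1

# Fold coverage per data type, as quoted in the genome sequence report.
coverage = {
    "PacBio CLR": 75,
    "10X Genomics Illumina": 54,
    "Hi-C Illumina": 145,
}


def approx_yield_gb(fold, span_mb=ASSEMBLY_SPAN_MB):
    """Approximate sequenced bases in Gb.

    ASSUMPTION: coverage is expressed relative to the assembly span.
    """
    return fold * span_mb / 1000


yields_gb = {name: approx_yield_gb(fold) for name, fold in coverage.items()}

# Single-copy BUSCO genes found on homologous chromosomes in both genomes.
concordant, shared = 3671, 3780
concordance_pct = 100 * concordant / shared

for name, gb in yields_gb.items():
    print(f"{name}: ~{gb:.1f} Gb")
print(f"Synteny concordance: {concordance_pct:.1f}%")
```

The yields are only as good as the coverage assumption; if coverage was computed against an estimated genome size rather than the assembly span, the numbers would shift proportionally.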
Figure 1.

Genome assembly of Cottoperca gobio, fCotGob3.1. - BlobToolKit Snailplot, showing N50 metrics and BUSCO gene completeness.

BlobToolKit plots are available at: fCotGob3.1 - BlobToolKit.

Table 1.

Data information for Cottoperca gobio, fCotGob3.1 genome assembly.

Project accession information

Assembly identifier: fCotGob3.1
Species: Cottoperca gobio (Cottoperca trigloides)
Specimens: fCotGob3 (PacBio, 10XG and BioNano); fCotGob2 (Hi-C and RNA-seq); fCotGob1 (RNA-seq)
NCBI taxonomy ID: 56716
BioProject: PRJEB30272
Study accession: PRJEB19273
BioSample IDs: SAMEA104132835 (fCotGob1); SAMEA5365137 (fCotGob1.brain1); SAMEA5365124 (fCotGob1.gonad1); SAMEA5365123 (fCotGob1.muscle1); SAMEA104242971 (fCotGob2); SAMEA4872137 (fCotGob2.spleen1); SAMEA104242975 (fCotGob3)

Raw data accessions

Pacific Biosciences SEQUEL I: ERR2219167 - ERR2219176
10X Genomics Illumina: ERR2639757 - ERR2639760
Hi-C Illumina: ERR4179340 - ERR4179344
BioNano: ERZ1392783 - ERZ1392785
RNA-seq: ERR3132340 (fCotGob1.brain1); ERR3132342 (fCotGob1.gonad1); ERR3132341 (fCotGob1.muscle1); ERR2639616 (fCotGob2.spleen1)

Genome assembly

Assembly accession: GCA_900634415.1
Accession of alternate haplotype: GCA_900634435.1
Span (Mb): 609
Number of contigs: 766
Contig N50 length (bp): 5,939,854
Number of scaffolds: 322
Scaffold N50 length (bp): 25,156,145
Longest scaffold (Mb): 30.48
BUSCO genome score: C:93.4%, [S:90.5%, D:2.9%], F:1.3%, M:5.3%, n:4584

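
The BUSCO genome score in Table 1 uses the standard compact notation (C: complete, S: single-copy, D: duplicated, F: fragmented, M: missing, n: number of genes searched). A minimal Python sketch for parsing that string and checking its internal consistency is given here; the function name `parse_busco` is hypothetical and not part of BUSCO itself.

```python
import re

# BUSCO summary string as reported in Table 1.
busco_summary = "C:93.4%, [S:90.5%, D:2.9%], F:1.3%, M:5.3%, n:4584"


def parse_busco(summary):
    """Parse a BUSCO short summary into a dict (hypothetical helper)."""
    fields = {}
    for key, value in re.findall(r"([CSDFMn]):([\d.]+)", summary):
        # 'n' is a gene count; every other field is a percentage.
        fields[key] = int(value) if key == "n" else float(value)
    return fields


scores = parse_busco(busco_summary)

# Complete = single-copy + duplicated; complete + fragmented + missing = 100%.
complete_ok = abs(scores["S"] + scores["D"] - scores["C"]) < 0.05
total_ok = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) < 0.05
print(scores, complete_ok, total_ok)
```

Both checks allow a small tolerance because BUSCO percentages are rounded to one decimal place.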
Figure 2.

Hi-C contact map for the genome assembly of Cottoperca gobio, fCotGob3.1.

Visualized in Juicebox (Durand).

Table 2.

Chromosomal pseudomolecules in the genome assembly fCotGob3.1, of species Cottoperca gobio - GCA_900634415.1.

Name      INSDC       RefSeq       Size (Mb)  GC%   Protein  Gene
1         LR131916.1  NC_041355.1  27.06      40.8  1,808    1,175
2         LR131927.1  NC_041356.1  12.92      41.9  792      681
3         LR131933.1  NC_041357.1  30.03      40.3  1,487    919
4         LR131934.1  NC_041358.1  28.95      40.7  1,629    1,007
5         LR131935.1  NC_041359.1  30.48      40.9  2,033    1,302
6         LR131936.1  NC_041360.1  27.68      40.9  1,823    1,143
7         LR131937.1  NC_041361.1  23.07      41    1,619    1,088
8         LR131938.1  NC_041362.1  23.43      41.2  1,836    1,194
9         LR131939.1  NC_041363.1  30.07      41    1,888    1,158
10        LR131917.1  NC_041364.1  27.44      40.8  1,407    992
11        LR131918.1  NC_041365.1  22.19      40.8  1,440    909
12        LR131919.1  NC_041366.1  22.9       40.6  1,424    850
13        LR131920.1  NC_041367.1  27.74      41    1,542    1,029
14        LR131921.1  NC_041368.1  25.7       40.6  1,627    1,134
15        LR131922.1  NC_041369.1  24.96      41    1,365    967
16        LR131923.1  NC_041370.1  26.58      41    1,811    1,094
17        LR131924.1  NC_041371.1  25.16      40.8  1,663    1,228
18        LR131925.1  NC_041372.1  14.93      41.8  1,018    690
19        LR131926.1  NC_041373.1  21.06      41.2  1,563    969
20        LR131928.1  NC_041374.1  17.6       41.4  964      649
21        LR131929.1  NC_041375.1  24.1       40.6  1,400    937
22        LR131930.1  NC_041376.1  22.61      41.3  1,415    1,026
23        LR131931.1  NC_041377.1  15.93      41.9  973      594
24        LR131932.1  NC_041378.1  22.44      41.1  1,229    1,184
Unplaced  -           -            34.34      41.6  2,093    1,676

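
The chromosome sizes in Table 2 can be used to reproduce the proportion of the assembly assigned to chromosomes (94.36% in the genome sequence report). The following Python sketch does this with the sizes transcribed from the table; variable names are illustrative, and the sizes are rounded to 0.01 Mb, so the result is approximate.

```python
# Chromosome sizes in Mb, transcribed from Table 2 (chromosomes 1 to 24).
chromosome_sizes_mb = [
    27.06, 12.92, 30.03, 28.95, 30.48, 27.68,
    23.07, 23.43, 30.07, 27.44, 22.19, 22.90,
    27.74, 25.70, 24.96, 26.58, 25.16, 14.93,
    21.06, 17.60, 24.10, 22.61, 15.93, 22.44,
]
unplaced_mb = 34.34  # "Unplaced" row of Table 2

placed_mb = sum(chromosome_sizes_mb)
total_mb = placed_mb + unplaced_mb
placed_pct = 100 * placed_mb / total_mb

print(f"Placed: {placed_mb:.2f} Mb of {total_mb:.2f} Mb ({placed_pct:.2f}%)")
```

The total recovered this way agrees with the 609 Mb span reported in Table 1, which suggests the table rows were transcribed consistently.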
Figure 3.

Syntenic relationships of fCotGob3.1 assembly with Japanese medaka HdrR chromosomes, based on single copy orthologs.

Visualised in Circos (Krzywinski).

Figure 4.

Examples of conserved synteny between Japanese medaka HdrR (purple) and fCotGob3.1 (pink) from chromosomes 1, 3, 6, and 16 (source: Ensembl).

Gene annotation

An Ensembl annotation was generated for the fCotGob3.1 assembly using RNA-seq data from four tissues (brain, muscle, ovary, and spleen). The annotation for assembly fCotGob3.1 was released in Ensembl under database version 99.31 (Hunt) (for fish clade annotation information see 2019-09: fish clade gene annotation). The resulting Ensembl annotation includes 60,811 transcripts assigned to 21,662 coding and 2,823 non-coding genes (Channel bull blenny - Ensembl). A RefSeq annotation is also available as NCBI Cottoperca gobio Annotation Release 100 (Table 2).

Methods

Specimen acquisition and nucleic acid extractions

Both specimens used to generate the genome assembly were collected south of the Falkland Islands/Islas Malvinas in 2004 (Lat Long: -52° 40’, -59° 12’) during the ICEFISH 2004 Cruise (International Collaborative Expedition to collect and study Fish Indigenous to Sub-Antarctic Habitats; led by H. W. Detrich (Detrich)) of the RVIB Nathaniel B. Palmer. Following euthanasia, fresh blood was collected from specimen fCotGob3, and spleen tissue (used for Hi-C) was collected from specimen fCotGob2 and flash frozen in liquid nitrogen. Blood was processed immediately, whereas the flash-frozen spleen was stored in a -80 °C freezer until processing. For RNA sequencing, tissue samples from two specimens were used (fCotGob2: spleen; fCotGob1: brain, skeletal muscle, ovary). The additional tissues (fCotGob1) were preserved in RNALater and kept frozen until extraction. These tissues were sampled by T. Desvignes, H. W. Detrich, and J. H. Postlethwait from a specimen captured northwest of the Falkland Islands in 2018 by the Falkland Islands Fisheries Department (Grass).

High molecular weight (HMW) DNA from fresh blood cells was prepared using an agarose plug extraction protocol (Smith). Blood DNA was initially stabilised in agarose plugs and then shipped to the Sanger Institute, where the final steps of the extraction were performed using a BioNano Tissue extraction protocol. Quality control (QC) of HMW DNA was performed using the Femto Pulse instrument (Agilent).

Total RNA was extracted from approximately 20–40 mg each of brain, skeletal muscle, ovary and spleen tissue using the RNeasy extraction kit (Qiagen). QC was performed using the Qubit HS RNA kit and Agilent Bioanalyzer Nano chips. Only extracts with a RIN value >8 were used for sequencing.

Sequencing

PacBio continuous long read (CLR) and 10X Genomics linked-read sequencing libraries were constructed according to the manufacturers’ instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on PacBio SEQUEL I and Illumina HiSeq X instruments. Hi-C data were generated using the Arima Hi-C kit v1 by Arima Genomics. BioNano data were generated on a Saphyr instrument (dual enzyme) at Bionano Genomics. RNA-seq was performed on a HiSeq 4000 with 150 bp insert paired-end (PE) libraries.

Genome assembly

An initial PacBio assembly was made using Falcon-unzip (Chin) without repeat-masking during overlap detection with Dazzler. The contigs from this assembly were first scaffolded by comparing them to a second wtdbg (Ruan & Li, 2019) assembly using cross_genome, then scaffolded further using the 10X data with scaff10X, and then with BioNano two-enzyme hybrid scaffolding using Solve v3.2.1. The original PacBio data were then used to fill gaps with PBJelly (English) and to polish with Arrow. The resulting assembly was polished again using the 10X Illumina data, by mapping with bwa mem (Li, 2013), calling variants with freebayes (Garrison & Marth, 2012), and correcting homozygous non-reference variants with bcftools consensus. Contiguity was increased further by filling gaps with the contigs from a second wtdbg assembly, which was made using PacBio reads corrected with Canu (Koren). This assembly was re-polished with Arrow and freebayes, and retained haplotigs were identified with Purge Haplotigs (Roach). Finally, the assembly was scaffolded to chromosomes using Arima Hi-C data with Salsa (Ghurye).

The scaffolded assembly was checked for contamination and manually improved using gEVAL (Chow). The manual curation included correcting mis-joins, improving concordance with all available data types, and using the Hi-C 2D map visualized in Juicebox to produce complete chromosomal units (Durand). Curation resulted in 9 manual breaks, 114 manual joins and the removal of 102 regions representing false duplications, decreasing the scaffold count by 39% to 322 and increasing the scaffold N50 by 68% to 25.2 Mb. The chromosomal-level scaffolds were named based on conserved synteny to the medaka assembly (Oryzias latipes, assembly accession GCA_002234675.1). The genome was further analysed within the BlobToolKit environment (Challis). Software tools and versions used for assembly are listed in Table 3.

Table 3.

Software tools used for genome assembly.

Software tool       Version                    Source
Falcon-unzip        falcon-2018.03.12-04.00    (Chin et al., 2016)
wtdbg               1.1                        (Ruan & Li, 2019)
cross_genome        2014-08-22                 https://sourceforge.net/projects/phusion2/files/cross_genome/
PBJelly             PBSuite_15.8.24            (English et al., 2012)
Canu                1.6                        (Koren et al., 2017)
Purge Haplotigs     v1                         (Roach et al., 2018)
Juicebox            -                          (Durand et al., 2016; Robinson et al., 2018)
scaff10x            1.0                        https://github.com/wtsi-hpag/Scaff10X
Solve               Solve3.2.2_08222018        https://bionanogenomics.com/downloads/bionano-solve/
arrow               GenomicConsensus 2.2.2     https://github.com/PacificBiosciences/GenomicConsensus
Bwa-mem             0.7.17-r1188               (Li, 2013)
freebayes           v1.1.0-3-g961e5f3          (Garrison & Marth, 2012)
bcftools consensus  1.7                        http://samtools.github.io/bcftools/bcftools.html
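
The genome assembly methods report that manual joins during curation increased the scaffold N50. To illustrate why joining scaffolds raises this metric, here is a minimal Python sketch of the usual N50 definition applied to toy data. The scaffold lengths are hypothetical and are not taken from fCotGob3.1, and the function is a generic illustration rather than the code used by the authors.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total span.
    """
    ordered = sorted(lengths, reverse=True)
    half_span = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_span:
            return length
    raise ValueError("cannot compute N50 of an empty list")


# Hypothetical toy scaffold lengths in Mb (NOT real fCotGob3.1 data).
before_curation = [10, 8, 6, 5, 4, 3, 2, 2]

# Simulate two manual joins: 10 + 8 and 6 + 5. Total span is unchanged.
after_curation = [18, 11, 4, 3, 2, 2]

print("N50 before joins:", n50(before_curation))
print("N50 after joins:", n50(after_curation))
```

Because joins leave the total span unchanged while concentrating it in fewer, longer scaffolds, the half-span threshold is reached at a longer scaffold, so N50 rises as the scaffold count falls.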

Data availability

Underlying data

European Nucleotide Archive: Cottoperca gobio (channel bull blenny) genome assembly, fCotGob3.1. BioProject accession number PRJEB30272; https://identifiers.org/ena.embl:PRJEB30272. The C. gobio genome sequencing is part of the Wellcome Sanger Institute’s Vertebrate Sequencing project, and of the Vertebrate Genomes Project (VGP) ordinal references programme (Rhie). All raw data and the assembly have been deposited in the ENA. Raw data and assembly accession identifiers are reported in Table 1.

Reporting guidelines

Not applicable.

Consent

Not applicable.

Author contributions

RD, JHP, HWD, SAM, IB: designed the experiment. IB, MS, KO: generated data. HWD, TD, JHP: provided samples. SAM, IB, JW, ZN, RD: performed data analysis. JW, WC, KH, JT: performed data curation. VGP Consortium: provided guidance for methodology development. EAM, RD: supervised the work and provided funding. IB: wrote the manuscript. All authors reviewed and edited the final version of the manuscript.

Reviewer reports

The authors reported a well-assembled genome of an interesting species. The methods used are sound and sufficient, and the datasets are easily accessible. My only suggestion is that they may want to elaborate on the importance of sequencing this species and of using it as an outgroup for comparative genomics of the Antarctic notothenioids, in addition to the conservation of this species per se. Alternatively, if possible, they could make a comparison with the published Antarctic fish, which would considerably increase interest and perhaps the number of citations as well.

Comments from my doctoral student (not that I agree with them all): Have the barcodes been trimmed when using 10X Illumina data for polishing? The genes were annotated, but how their function was predicted was not clear. The sample numbers, BioProject, BioSamples and the length of each chromosome can be placed in the “data available” part, not in the main text. The tools used were detailed in the text, so it is redundant to include them in Table 3. The biological background was touched on in the introduction, but no analyses were used to address any biological questions. (I understand that this is a data note though.)

Are sufficient details of methods and materials provided to allow replication by others? Yes
Is the rationale for creating the dataset(s) clearly described? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Are the protocols appropriate and is the work technically sound? Yes
Reviewer Expertise: Molecular phylogenetics, genomics, bioinformatics.

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

The manuscript reports the newly available genome assembly for Cottoperca gobio, a basal notothenioid that diverged before the evolution of the cold adaptations and cold specialization that define the cryonotothenioids. One of the major challenges in studying the Antarctic notothenioids has been the limited availability of genetic and genomic information from phylogenetically close temperate species, making comparative investigations difficult. The C. gobio genome plays an incredibly important role in filling this gap, providing a temperate companion to the recent Eleginops maclovinus genome. These provide a far more appropriate baseline for evaluating the changes that have come with evolution in a polar environment than the traditional model fish species, all of which are far more phylogenetically distant. The genome sequencing, assembly, and annotation are all well described and appear to have been handled appropriately. Personally, I can only say that I am happy that this genome is now available and look forward to utilizing it in my own work over the coming months.

Are sufficient details of methods and materials provided to allow replication by others? Yes
Is the rationale for creating the dataset(s) clearly described? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Are the protocols appropriate and is the work technically sound? Yes
Reviewer Expertise: Comparative physiology, transcriptomics, genomics and bioinformatics.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

References (first 10 of 20)

1.  Circos: an information aesthetic for comparative genomics.

Authors:  Martin Krzywinski; Jacqueline Schein; Inanç Birol; Joseph Connors; Randy Gascoyne; Doug Horsman; Steven J Jones; Marco A Marra
Journal:  Genome Res       Date:  2009-06-18       Impact factor: 9.043

2.  Evolution of antifreeze glycoprotein gene from a trypsinogen gene in Antarctic notothenioid fish.

Authors:  L Chen; A L DeVries; C H Cheng
Journal:  Proc Natl Acad Sci U S A       Date:  1997-04-15       Impact factor: 11.205

3.  BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.

Authors:  Felipe A Simão; Robert M Waterhouse; Panagiotis Ioannidis; Evgenia V Kriventseva; Evgeny M Zdobnov
Journal:  Bioinformatics       Date:  2015-06-09       Impact factor: 6.937

4.  Development and analysis of a germline BAC resource for the sea lamprey, a vertebrate that undergoes substantial chromatin diminution.

Authors:  Jeramiah J Smith; Andrew B Stuart; Tatjana Sauka-Spengler; Sandra W Clifton; Chris T Amemiya
Journal:  Chromosoma       Date:  2010-03-02       Impact factor: 4.316

5.  Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom.

Authors:  Neva C Durand; James T Robinson; Muhammad S Shamim; Ido Machol; Jill P Mesirov; Eric S Lander; Erez Lieberman Aiden
Journal:  Cell Syst       Date:  2016-07       Impact factor: 10.304

6.  Scaffolding of long read assemblies using long range contact information.

Authors:  Jay Ghurye; Mihai Pop; Sergey Koren; Derek Bickhart; Chen-Shan Chin
Journal:  BMC Genomics       Date:  2017-07-12       Impact factor: 3.969

7.  Juicebox.js Provides a Cloud-Based Visualization System for Hi-C Data.

Authors:  James T Robinson; Douglass Turner; Neva C Durand; Helga Thorvaldsdóttir; Jill P Mesirov; Erez Lieberman Aiden
Journal:  Cell Syst       Date:  2018-02-07       Impact factor: 10.304

8.  Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies.

Authors:  Michael J Roach; Simon A Schmidt; Anthony R Borneman
Journal:  BMC Bioinformatics       Date:  2018-11-29       Impact factor: 3.169

9.  BlobToolKit - Interactive Quality Assessment of Genome Assemblies.

Authors:  Richard Challis; Edward Richards; Jeena Rajan; Guy Cochrane; Mark Blaxter
Journal:  G3 (Bethesda)       Date:  2020-04-09       Impact factor: 3.154

10.  Fast and accurate long-read assembly with wtdbg2.

Authors:  Jue Ruan; Heng Li
Journal:  Nat Methods       Date:  2019-12-09       Impact factor: 28.547
