Whole genome resequencing data for three rockfish species of Sebastes.

Shengyong Xu, Linlin Zhao, Shijun Xiao, Tianxiang Gao.

Abstract

Here we report Illumina-based whole genome sequencing of three rockfish species of Sebastes in the northwest Pacific. Genomic DNA was used to prepare 350-bp paired-end libraries, and high-throughput sequencing yielded 128.5, 137.5, and 124.8 million mapped reads, corresponding to 38.54, 41.26, and 37.43 Gb of sequence data for S. schlegelii, S. koreanus, and S. nudus, respectively. K-mer analyses estimated genome sizes of 846.4, 832.5, and 813.1 Mb, with sequencing coverage of 45×, 49×, and 46× for the three species, respectively. Comparative genomic analyses identified 46,624 genome-wide single nucleotide polymorphisms (SNPs). Phylogenetic analysis revealed that the three species are more closely related to each other than to six other rockfish species. Demographic analysis identified contrasting changes between S. schlegelii and the other two species, suggesting drastically different responses to climate change. The genome data reported in this study are valuable for further studies on the comparative genomics and evolutionary biology of rockfish species.

Year:  2019        PMID: 31222011      PMCID: PMC6586840          DOI: 10.1038/s41597-019-0100-z

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   6.444


Background & Summary

The rockfish genus Sebastes Cuvier 1829 is the most speciose in the family Sebastidae (Actinopterygii: Scorpaeniformes)[1,2]. The genus contains nearly 110 species worldwide, most of which are subject to substantial commercial and recreational fisheries[2]. This great species diversity is likely attributable to recent diversification[2-4], which has resulted in taxonomic confusion in some areas because of morphological similarity. Rockfish have provided valuable opportunities for evolutionary studies, shedding light on the origin and diversification of the genus[3,5]. In addition, as ovoviviparous teleosts, rockfish could provide exceptional clues for studying the evolution of reproductive ecology. Ovoviviparity is a distinctive reproductive mode in fish, in which fertilized eggs are retained in the female ovary until the embryos are mature[6]. In these respects, molecular information such as whole genome data would provide more comprehensive insights into the evolutionary biology of these species. In this study, we report whole genome data for three marine ovoviviparous fish of the genus Sebastes, viz., Sebastes schlegelii Hilgendorf 1880, Sebastes koreanus Kim and Lee 1994, and Sebastes nudus Matsubara 1943. The three rockfish are commercial species commonly distributed in Korea, Japan, and the northeast coast of China[1]. One adult male of each species was collected from the coastal waters of Qingdao, China. Prior to sequencing, the genome sizes of the three species were estimated at ~800 Mb; accordingly, nearly 40 Gb of sequencing data (about 50× genome coverage) was produced for each species on the Illumina HiSeq 2500 platform. We intend to develop genomic resources for further studies on the taxonomy, phylogenetics, conservation and evolution of these commercially important rockfish. The experimental design, sequencing and analysis pipeline is shown in Fig. 1.
After data filtering, a total of 38.54, 41.26, and 37.43 Gb of sequence data were obtained for S. schlegelii, S. koreanus, and S. nudus, respectively (Table 1). K-mer analyses estimated genome sizes of 846.4, 832.5, and 813.1 Mb for the three species, respectively (Table 2). The genome sequences of S. schlegelii, S. koreanus, and S. nudus were assembled into scaffolds with total sizes of 755.1, 751.7, and 748.5 Mb, respectively. The estimated genomic characteristics of the three rockfish species are shown in Table 2.
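The coverage figures quoted for the three species can be checked directly from the data volumes in Table 1 and the k-mer genome sizes in Table 2. The short sketch below is only an illustration of that arithmetic; the function name is hypothetical, and 1 Gb is assumed to equal 1000 Mb.

```python
# Clean data volume (Gb, Table 1) and k-mer genome size estimate (Mb, Table 2).
DATA = {
    "S. schlegelii": (38.54, 846.36),
    "S. koreanus": (41.26, 832.53),
    "S. nudus": (37.43, 813.12),
}


def sequencing_depth(total_gb: float, genome_mb: float) -> float:
    """Average depth = sequenced bases / genome size (assumes 1 Gb = 1000 Mb)."""
    return total_gb * 1000.0 / genome_mb


if __name__ == "__main__":
    for species, (gb, mb) in DATA.items():
        print(f"{species}: {sequencing_depth(gb, mb):.1f}x")
```

The ratios come out slightly above 45, 49 and 46, which agrees with the reported 45×, 49× and 46× if those values were truncated to whole numbers.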
Fig. 1

Overview of the experimental design and analysis pipeline.

Table 1

Summary of the high-throughput sequencing in this study.

| Species | Sample type | Library type | Total length (Gb) | Effective rate (%) | Error rate (%) | Q20 (%) | SRA Accession Number |
|---|---|---|---|---|---|---|---|
| Sebastes schlegelii | DNA | 350-bp paired-end | 38.54 | 99.63 | 0.02 | 97.53 | SRP171394 |
| Sebastes koreanus | DNA | 350-bp paired-end | 41.26 | 99.63 | 0.02 | 97.53 | SRP171394 |
| Sebastes nudus | DNA | 350-bp paired-end | 37.43 | 99.86 | 0.03 | 97.42 | SRP171394 |

Table 2

Statistical information of k-mer analysis and genome assembly in this study.

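| Species | Gen (Mb) | Cov (×) | Het (%) | Rep (%) | Nocont | N50cont (bp) | Lcont (Kb) | Noscaf | N50scaf (bp) | Lscaf (Kb) | Ltotal (Mb) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sebastes schlegelii | 846.36 | 45 | 0.22 | 44.10 | 379,992 | 6,790 | 86.78 | 260,971 | 13,978 | 207.70 | 755.13 |
| Sebastes koreanus | 832.53 | 49 | 0.20 | 43.65 | 385,402 | 7,261 | 109.96 | 259,176 | 16,225 | 269.69 | 751.68 |
| Sebastes nudus | 813.12 | 46 | 0.31 | 41.40 | 402,857 | 6,813 | 109.92 | 318,031 | 10,934 | 165.86 | 748.46 |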

Note: Gen: genome size, Cov: sequencing coverage, Het: Heterozygous ratio, Rep: Repeat ratio, Nocont: number of contigs, N50cont: contig N50, Lcont: maximum length of contigs, Noscaf: number of scaffolds, N50scaf: scaffold N50, Lscaf: maximum length of scaffolds, Ltotal: total length of scaffolds.
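For readers unfamiliar with the N50 statistic reported for contigs and scaffolds, the following minimal sketch shows one common way to compute it from a list of sequence lengths. It is not taken from the assembler used in this study; the function name and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]  # hypothetical lengths
    print(n50(toy_contigs))
```

In the toy example the two longest sequences already cover half of the total length, so the N50 equals the length of the second one.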

The filtered clean data were mapped to the published S. steindachneri reference genome (GCA_001910785.2), and the resulting BAM files were used in demographic analyses. A coalescent-based hidden Markov model, the pairwise sequentially Markovian coalescent (PSMC) model, was used to infer the history of effective population size (Ne). The PSMC results showed contrasting demographic changes during the last glacial period: Ne increased in S. schlegelii and decreased in the other two species (Fig. 2). These results suggest that closely related species can respond in drastically different ways to climate change, as previously reported for two closely related dolphin species[7]. Such contrasting demographic changes could be due to the altered ecology of competitors and the pattern of population differentiation[7]. Further studies are warranted to characterize the contrasting demographic patterns among closely related species. In addition, phylogenetic relationships among species of the genus Sebastes were reconstructed from whole genome sequences. Together with six previously reported genome sequences, a total of 14,821,089 single nucleotide polymorphisms (SNPs) were identified. After SNP filtering, the remaining 46,624 SNPs were used for phylogenetic reconstruction. The neighbour-joining topology showed that S. schlegelii, S. koreanus, and S. nudus are more closely related to each other than to the other rockfish species in this genus (Fig. 3).
Based on a literature survey and to the best of our knowledge, the whole genome data reported here are the first to be made publicly available for these three rockfish species; they could therefore be valuable for further studies on the taxonomy, phylogenetics and evolutionary biology of rockfish species.
Fig. 2

Demographic history of three rockfish species in this study. PSMC estimates of demographic changes in effective population size (Ne) over time inferred from the draft genome sequences of the three rockfishes. Thick lines represent the median and thin light lines correspond to 100 rounds of bootstrapping.

Fig. 3

Phylogenetic relationship reconstructed based on whole genome sequences of nine rockfish species. The whole genome sequences of 9 rockfish species (including 6 reported species and 3 species in this study) were used for phylogenetic reconstruction based on neighbour-joining algorithm.

Methods

Sample collection

Animal experiments were conducted in accordance with the guidelines approved by the Zhejiang Ocean University Animal Ethics Committee and national legislation. Sample collection followed the procedure described in our previously published work[8]. To obtain enough genomic DNA for Illumina sequencing, we collected fresh epaxial white muscle tissue from Sebastes schlegelii, S. koreanus, and S. nudus sampled in Qingdao, China. The samples were frozen quickly in liquid nitrogen for 1 hour before storage at −80 °C. Genomic DNA was extracted using a standard phenol/chloroform protocol. The integrity of the genomic DNA was checked by agarose gel electrophoresis, which showed a main band around 20 kb, satisfying the requirement for Illumina library construction according to the manufacturer’s protocol.

Whole-genome sequencing

Whole genome sequencing was performed commercially at Novogene Co. Ltd in Beijing. In brief, 1.0 μg of genomic DNA was fragmented using an E210 Focused-ultrasonicator (Covaris, Woburn, MA). The sheared DNA fragments were used to prepare paired-end libraries with an average insert size of 350 bp for all samples according to the manufacturer’s instructions (Illumina Inc., San Diego, CA). Each library was sequenced in two independent lanes of a HiSeq 2500 platform (Illumina Inc.) in 150-bp paired-end mode. The raw data were converted to single-sample FASTQ files by base calling; after removal of interfering information such as adaptors and low-quality reads, the clean FASTQ files of each sample were used for further bioinformatics analyses.
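As a consistency check on the sequencing yield, the sketch below converts a number of 150-bp read pairs into gigabases. It assumes that the read counts quoted in the abstract (128.5, 137.5 and 124.8 million) refer to read pairs; this interpretation is an assumption made here, not a statement from the authors, and the function name is hypothetical.

```python
READ_LENGTH_BP = 150  # paired-end read length stated in the text


def paired_end_yield_gb(read_pairs_million: float,
                        read_length_bp: int = READ_LENGTH_BP) -> float:
    """Total bases (Gb) produced by a given number of read pairs (in millions).

    Each pair contributes two reads of read_length_bp bases.
    """
    return read_pairs_million * 1e6 * 2 * read_length_bp / 1e9


if __name__ == "__main__":
    for species, pairs in [("S. schlegelii", 128.5),
                           ("S. koreanus", 137.5),
                           ("S. nudus", 124.8)]:
        print(f"{species}: {paired_end_yield_gb(pairs):.2f} Gb")
```

Under this assumption the computed yields fall within about 0.01 Gb of the 38.54, 41.26 and 37.43 Gb listed in Table 1.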

Genome assembly

The genome size, heterozygosity ratio and repeat ratio were estimated by k-mer analysis (K = 17) in GCE v1.0.0[9]. Paired-end reads were assembled into contigs and scaffolds in SOAPdenovo v2.01[10], which is based on a de Bruijn graph structure, with a k-mer size of 41.
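The idea behind k-mer based genome size estimation can be illustrated with the textbook estimator: genome size is approximately the total number of k-mers divided by the depth at the main peak of the k-mer frequency histogram. The sketch below is a simplified illustration only; GCE fits a more elaborate model, and the function names, the error cutoff `min_depth` and the toy histogram are all hypothetical.

```python
def kmer_depth_from_base_depth(base_depth: float, read_len: int, k: int) -> float:
    """Expected k-mer depth for a given base depth: each read of length L
    yields L - k + 1 k-mers, so depth shrinks by (L - k + 1) / L."""
    return base_depth * (read_len - k + 1) / read_len


def estimate_genome_size(histogram: dict, min_depth: int = 3) -> float:
    """Estimate genome size from a k-mer histogram {depth: distinct k-mer count}.

    K-mers seen fewer than min_depth times are treated as sequencing errors
    (an assumed cutoff). Size = total k-mer occurrences / peak depth.
    """
    kept = {d: c for d, c in histogram.items() if d >= min_depth}
    if not kept:
        raise ValueError("no k-mers left after removing low-depth bins")
    peak_depth = max(kept, key=kept.get)
    total_kmers = sum(d * c for d, c in kept.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    toy_hist = {1: 5000, 2: 800, 18: 50, 19: 100, 20: 200, 21: 100, 22: 50, 40: 10}
    print(estimate_genome_size(toy_hist))          # toy data, not from the study
    print(kmer_depth_from_base_depth(45, 150, 17))  # 150-bp reads, K = 17
```

The second helper shows why the k-mer peak sits below the base-level coverage: with 150-bp reads and K = 17, only 134 of every 150 positions start a complete k-mer.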

Phylogenetic analysis

The genome data generated here were supplemented with publicly available sequences of six rockfish species of the genus Sebastes, i.e. S. steindachneri (GCA_001910785.2), S. aleutianus (GCA_001910805.2), S. minor (GCA_001910765.2), S. nigrocinctus (GCA_000475235.3), S. norvegicus (GCA_900302655.1), and S. rubrivinctus (GCA_000475215.1), downloaded from the NCBI database. The clean reads were aligned to the S. steindachneri reference genome using the bwa-mem algorithm in BWA 0.7.12[11] with default parameters. Single nucleotide polymorphism (SNP) calling was carried out in SAMtools 1.3.1[12] with default parameters, and SNP filtering was performed using VCFtools[13]. The SNP calling procedure and parameters are expanded versions of those described in our related work[14]. To prevent sex bias from affecting the tree topology, contigs containing SNPs were cross-checked against the sex-determining loci identified in a previous study[15], and sex-determining SNP loci were excluded from the phylogenetic analysis. A phylogenetic tree of the nine Sebastes species was reconstructed from the filtered SNPs using the neighbour-joining (NJ) method in Tassel 5[16] with default parameters. However, potential sampling bias is a caveat when phylogenetic analyses are based on SNPs derived from a single individual per species. Further analyses that sample more individuals are warranted to obtain more robust results.
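Neighbour-joining operates on a matrix of pairwise distances between samples. As a rough illustration of how such a matrix might be derived from filtered SNP genotypes, the sketch below computes a simple allele-sharing distance. This is not necessarily the distance Tassel 5 uses internally; the genotype coding (0, 1 or 2 copies of the alternate allele, `None` for missing), the function names and the toy data are assumptions made for this example.

```python
from itertools import combinations


def allele_sharing_distance(g1, g2):
    """Mean of |g1 - g2| / 2 over sites genotyped in both samples.

    Genotypes are alternate-allele counts (0, 1, 2); None marks a missing call.
    """
    diffs = [abs(a - b) / 2.0
             for a, b in zip(g1, g2)
             if a is not None and b is not None]
    if not diffs:
        raise ValueError("no sites genotyped in both samples")
    return sum(diffs) / len(diffs)


def distance_matrix(genotypes):
    """Symmetric pairwise distance matrix as a dict keyed by (name, name)."""
    names = sorted(genotypes)
    dist = {(n, n): 0.0 for n in names}
    for a, b in combinations(names, 2):
        d = allele_sharing_distance(genotypes[a], genotypes[b])
        dist[(a, b)] = dist[(b, a)] = d
    return dist


if __name__ == "__main__":
    toy = {  # hypothetical genotypes at five SNP sites
        "A": [0, 0, 2, 1, None],
        "B": [0, 2, 2, 1, 0],
        "C": [2, 2, 0, None, 0],
    }
    for pair, d in sorted(distance_matrix(toy).items()):
        print(pair, round(d, 3))
```

With only one individual per species, as in this study, each row of the matrix represents a single genome, which is the sampling caveat noted above.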

Demographic analysis

The demographic history of all three rockfish species was analysed using the PSMC model, as implemented in the PSMC package[17]. The “fq2psmcfa” and “splitfa” tools from the PSMC package were used to create the input files for PSMC modelling. The PSMC command included the options “-N25” for the number of iterations of the algorithm, “-t15” as the upper limit for the time to the most recent common ancestor (TMRCA), “-r5” for the initial θ/ρ, and “-p 4+25*2+4+6” for the pattern of atomic time intervals. The reconstructed population history was plotted using the “psmc_plot.pl” script with the substitution rate “-u 2.5e-8” adopted from medaka[18] and a generation time of 8 years. The generation time was calculated as g = a + s/(1 − s)[19], where s is the expected adult survival rate, assumed to be 80%, and a is the age at sexual maturity, which is 4 years for S. schlegelii[20]. With a = 4 and s = 0.8, g = 4 + 0.8/0.2 = 8 years, which was the value used in the PSMC analysis. To assess variance in the estimated effective population size, we performed 100 bootstraps for each species.
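The quantities in this section can be reproduced with a few lines of arithmetic. The sketch below (1) evaluates the generation time formula, (2) expands the "-p" interval pattern into the number of atomic intervals and free population-size parameters, and (3) rescales PSMC output to years and Ne following the convention described in the PSMC documentation (N0 = θ0 / (4μ·s), with s the input bin size, 100 by default). The function names and the example θ0 value are hypothetical, and the scaling step should be read as an illustration, not as the authors' exact plotting code.

```python
from typing import Tuple


def generation_time(maturation_age: float, adult_survival: float) -> float:
    """g = a + s / (1 - s), the formula given in the text."""
    if not 0.0 <= adult_survival < 1.0:
        raise ValueError("adult_survival must be in [0, 1)")
    return maturation_age + adult_survival / (1.0 - adult_survival)


def parse_psmc_pattern(pattern: str) -> Tuple[int, int]:
    """Expand a pattern such as "4+25*2+4+6".

    A term "n" is one parameter spanning n atomic intervals; a term "r*n"
    is r parameters, each spanning n intervals.
    Returns (atomic intervals, free parameters).
    """
    intervals = 0
    parameters = 0
    for term in pattern.replace(" ", "").split("+"):
        if "*" in term:
            repeats, span = (int(x) for x in term.split("*"))
        else:
            repeats, span = 1, int(term)
        intervals += repeats * span
        parameters += repeats
    return intervals, parameters


def scale_psmc(theta0: float, t_k: float, lambda_k: float,
               mu: float = 2.5e-8, gen_time: float = 8.0,
               bin_size: int = 100) -> Tuple[float, float]:
    """Convert scaled PSMC output to (years before present, Ne)."""
    n0 = theta0 / (4.0 * mu * bin_size)
    return 2.0 * n0 * t_k * gen_time, n0 * lambda_k


if __name__ == "__main__":
    print(generation_time(4, 0.8))           # approximately 8 years
    print(parse_psmc_pattern("4+25*2+4+6"))
    print(scale_psmc(theta0=0.001, t_k=1.0, lambda_k=2.0))  # theta0 is made up
```

The pattern expands to 64 atomic intervals governed by 28 free parameters; grouping intervals in this way reduces the number of parameters to estimate.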

Data Records

All raw sequencing reads for the three rockfish species have been deposited in the NCBI Sequence Read Archive[21], and the assembled genome sequences (Sebastes schlegelii[22], S. nudus[23], and S. koreanus[24]) have been deposited in GenBank. In addition, the assembled genome sequences, aligned VCF files and phylogenetic tree file are stored in Figshare[25].

Technical Validation

In the present study, the sampled fish were captured by hook-and-line fishing in the coastal waters of Qingdao, China. Species were identified in the laboratory from morphological characters. DNA quality was checked by agarose gel electrophoresis (Fig. 4). Preprocessing of the raw reads, including quality evaluation and data filtering, followed the procedures of our previous study[8]. The quality of raw reads was evaluated using FastQC[26], and low-quality reads were filtered using HTQC[27] according to the following criteria: (1) adaptors in the reads were trimmed and removed; (2) read pairs were removed when either read had more than 10% N bases; (3) read pairs were removed if either read had more than 20% low-quality bases (phred quality score < 5); (4) ambiguous or low-quality fragments at the two ends of reads were trimmed using a window size of 5 bp and an average quality threshold of 20. Sequencing quality was also assessed by examining GC content, Q20 statistics and error rate (Table 1, Fig. 5). FastQC output files can also be viewed in the Supplementary Information. Moreover, the parameters used in the bioinformatics analyses followed default settings or the published literature, as described in the Methods section.
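Filtering criteria (2) and (3) above can be expressed compactly in code. The sketch below is an independent illustration and not the HTQC implementation; it assumes Phred+33 quality encoding, and the function names are hypothetical. Adaptor removal and end trimming (criteria 1 and 4) are not covered.

```python
def phred_scores(quality_string: str, offset: int = 33):
    """Decode a FASTQ quality string (Phred+33 encoding assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def read_passes(seq: str, qual: str,
                max_n_frac: float = 0.10,
                max_lowq_frac: float = 0.20,
                lowq_threshold: int = 5) -> bool:
    """True if the read has at most 10% N bases and at most 20% bases
    with Phred quality below 5 (thresholds taken from the text)."""
    if not seq or len(seq) != len(qual):
        return False
    n_frac = seq.upper().count("N") / len(seq)
    scores = phred_scores(qual)
    lowq_frac = sum(1 for q in scores if q < lowq_threshold) / len(scores)
    return n_frac <= max_n_frac and lowq_frac <= max_lowq_frac


def pair_passes(read1, read2) -> bool:
    """A pair is kept only if both mates pass; each read is (seq, qual)."""
    return read_passes(*read1) and read_passes(*read2)


if __name__ == "__main__":
    good = ("ACGTACGTAC", "IIIIIIIIII")
    n_rich = ("NNACGTACGT", "IIIIIIIIII")   # 20% N bases
    low_q = ("ACGTACGTAC", "###IIIIIII")    # 30% bases with Phred 2
    print(pair_passes(good, good), pair_passes(good, n_rich), pair_passes(low_q, good))
```

Because the filter acts on pairs, a single failing mate removes both reads, which keeps the two FASTQ files of a paired-end library synchronised.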
Fig. 4

Agarose gel electrophoresis of DNA integrity assessment. The DNA lanes presented here have been cropped from a larger image with multiple DNA samples. Two kinds of DNA markers (M-1 and M-2) were used for DNA integrity assessment. Numbers embedded in the diagram (33, 34, and 35) represent S. schlegelii, S. koreanus, and S. nudus, respectively.

Fig. 5

Quality evaluation including base composition, quality scoring and error rate of sequencing data. Sequencing quality met the requirement of further bioinformatics analyses in all three species. Illustrated here by the example of S. nudus.

Design Type(s): species comparison design • sequence analysis objective
Measurement Type(s): whole genome sequencing assay
Technology Type(s): DNA sequencing
Factor Type(s):
Sample Characteristic(s): Sebastes schlegelii • Sebastes koreanus • Sebastes nudus

References

1.  The incomplete history of mitochondrial lineages between two rockfishes, Sebastes longispinis and Sebastes hubbsi (Scorpaeniformes: Scorpaenidae).

Authors:  Y Kai; K-D Park; T Nakabo
Journal:  J Fish Biol       Date:  2012-08       Impact factor: 2.051

2.  Genomic characterization of sex-identification markers in Sebastes carnatus and Sebastes chrysomelas rockfishes.

Authors:  Benjamin L S Fowler; Vincent P Buonaccorsi
Journal:  Mol Ecol       Date:  2016-03-28       Impact factor: 6.185

3.  The geography of morphological convergence in the radiations of Pacific Sebastes rockfishes.

Authors:  Travis Ingram; Yoshiaki Kai
Journal:  Am Nat       Date:  2014-10-06       Impact factor: 3.926

4.  A draft genome assembly of the Chinese sillago (Sillago sinica), the first reference genome for Sillaginidae fishes.

Authors:  Shengyong Xu; Shijun Xiao; Shilin Zhu; Xiaofei Zeng; Jing Luo; Jiaqi Liu; Tianxiang Gao; Nansheng Chen
Journal:  Gigascience       Date:  2018-09-01       Impact factor: 6.524

5.  HTQC: a fast quality control toolkit for Illumina sequencing data.

Authors:  Xi Yang; Di Liu; Fei Liu; Jun Wu; Jing Zou; Xue Xiao; Fangqing Zhao; Baoli Zhu
Journal:  BMC Bioinformatics       Date:  2013-01-31       Impact factor: 3.169

6.  The variant call format and VCFtools.

Authors:  Petr Danecek; Adam Auton; Goncalo Abecasis; Cornelis A Albers; Eric Banks; Mark A DePristo; Robert E Handsaker; Gerton Lunter; Gabor T Marth; Stephen T Sherry; Gilean McVean; Richard Durbin
Journal:  Bioinformatics       Date:  2011-06-07       Impact factor: 6.937

7.  Genomic and phenotypic characterization of a wild medaka population: towards the establishment of an isogenic population genetic resource in fish.

Authors:  Mikhail Spivakov; Thomas O Auer; Ravindra Peravali; Ian Dunham; Dirk Dolle; Asao Fujiyama; Atsushi Toyoda; Tomoyuki Aizu; Yohei Minakuchi; Felix Loosli; Kiyoshi Naruse; Ewan Birney; Joachim Wittbrodt
Journal:  G3 (Bethesda)       Date:  2014-03-20       Impact factor: 3.154

8.  Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors:  Heng Li; Richard Durbin
Journal:  Bioinformatics       Date:  2009-05-18       Impact factor: 6.937

9.  SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler.

Authors:  Ruibang Luo; Binghang Liu; Yinlong Xie; Zhenyu Li; Weihua Huang; Jianying Yuan; Guangzhu He; Yanxiang Chen; Qi Pan; Yunjie Liu; Jingbo Tang; Gengxiong Wu; Hao Zhang; Yujian Shi; Yong Liu; Chang Yu; Bo Wang; Yao Lu; Changlei Han; David W Cheung; Siu-Ming Yiu; Shaoliang Peng; Zhu Xiaoqian; Guangming Liu; Xiangke Liao; Yingrui Li; Huanming Yang; Jian Wang; Tak-Wah Lam; Jun Wang
Journal:  Gigascience       Date:  2012-12-27       Impact factor: 6.524

10.  Deep Transcriptomic Analysis of Black Rockfish (Sebastes schlegelii) Provides New Insights on Responses to Acute Temperature Stress.

Authors:  Likang Lyu; Haishen Wen; Yun Li; Jifang Li; Ji Zhao; Simin Zhang; Min Song; Xiaojie Wang
Journal:  Sci Rep       Date:  2018-06-14       Impact factor: 4.379
