
Draft genome of the sea cucumber Apostichopus japonicus and genetic polymorphism among color variants.

Jihoon Jo, Jooseong Oh, Hyun-Gwan Lee, Hyun-Hee Hong, Sung-Gwon Lee, Seongmin Cheon, Elizabeth M A Kern, Soyeong Jin, Sung-Jin Cho, Joong-Ki Park, Chungoo Park.

Abstract

The Japanese sea cucumber (Apostichopus japonicus Selenka 1867) is an economically important species as a source of seafood and ingredient in traditional medicine. It is mainly found off the coasts of northeast Asia. Recently, substantial exploitation and widespread biotic diseases in A. japonicus have generated increasing conservation concern. However, the genomic knowledge base and resources available for researchers to use in managing this natural resource and to establish genetically based breeding systems for sea cucumber aquaculture are still in a nascent stage. A total of 312 Gb of raw sequences were generated using the Illumina HiSeq 2000 platform and assembled to a final size of 0.66 Gb, which is about 80.5% of the estimated genome size (0.82 Gb). We observed nucleotide-level heterozygosity within the assembled genome to be 0.986%. The resulting draft genome assembly comprising 132 607 scaffolds with an N50 value of 10.5 kb contains a total of 21 771 predicted protein-coding genes. We identified 6.6-14.5 million heterozygous single nucleotide polymorphisms in the assembled genome of the three natural color variants (green, red, and black), resulting in an estimated nucleotide diversity of 0.00146. We report the first draft genome of A. japonicus and provide a general overview of the genetic variation in the three major color variants of A. japonicus. These data will help provide a comprehensive view of the genetic, physiological, and evolutionary relationships among color variants in A. japonicus, and will be invaluable resources for sea cucumber genomic research.
© The Author 2017. Published by Oxford University Press.

Keywords:  Apostichopus japonicus; Color variants; Genetic variation; Population genomics; Sea cucumber genome

Year:  2017        PMID: 28369350      PMCID: PMC5437941          DOI: 10.1093/gigascience/giw006

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   6.524


Data description

Background information on Apostichopus japonicus

The class Holothuroidea (sea cucumbers) belongs to the phylum Echinodermata and comprises approximately 1250 recorded species worldwide, including some of commercial and medical value [1, 2]. Apostichopus japonicus Selenka 1867 is a well-known, commercially important sea cucumber species that occurs along the northwestern Pacific coast, including China, Japan, Korea, and the Far Eastern seas. This species exhibits a wide array of dorsal/ventral color variants (in particular green, red, and black; Fig. 1), which differ in their biological and morphological attributes (e.g., ossicle shape, habitat preference, spawning period, and polian vesicles) [1, 3]. The red variant is found on rock pebbles and gravel substrate and has higher salinity and temperature tolerance than the other color variants [4, 5]. Green and black variants are found on sandy and muddy bottoms at shallower depths, and the green variant has greater plasticity in thermotolerance than the red variant [6, 7].
Figure 1.

Three color variants of A. japonicus (green, red, and black).

Recently, overexploitation and the prevalence of biotic diseases (viral infections) in sea cucumber aquaculture have generated increasing conservation concern [8, 9]. However, the genomic knowledge base and resources available for researchers to use in managing this natural resource or establishing genetically based breeding systems are still in a nascent stage [10].

Sample collection and genomic DNA extraction

Specimens of the three A. japonicus color variants (green, red, and black) were collected from the same geographical location (GPS data: 34.1 N, 127.18 E, Geomun-do, Yeosu, Republic of Korea). Genomic DNA of each color variant was extracted manually from the body wall tissue of a single male specimen. Briefly, we ground the tissue to a fine powder in liquid nitrogen using a mortar and pestle. The tissue powder was digested for 1 hour at 65°C in CTAB buffer (2% cetyltrimethylammonium bromide, 1.4 M NaCl, 20 mM EDTA, and 100 mM Tris-HCl, pH 8.0), followed by phenol/chloroform extraction and ethanol precipitation.

Sequencing and quality control

Using the standard protocol provided by Illumina (San Diego, USA), we constructed both short-insert (180 and 400 bp) and long-insert (2 kb) libraries for 2 × 101 bp paired-end reads, which were sequenced using a HiSeq 2000 instrument. For the green color variant, a total of 225 Gb of raw data was generated from all three libraries. For the red and black color variants, 40 and 47 Gb of raw reads, respectively, were produced from a 400 bp short-insert library. The raw reads were preprocessed using Trimmomatic v0.33 [11] and Trim Galore [12], in which reads containing adapter sequences, poly-N sequences, or low-quality bases (below a mean Phred score of 20) were removed. To correct errors in the raw sequences, we used ALLPATHS-LG v52488 [13]. Approximately 208, 39, and 42 Gb of clean reads were obtained for the green, red, and black color variant samples, respectively (Table 1). The A. japonicus genome size was estimated to be approximately 0.9 Gb based on k-mer measurement (Fig. 2), which is broadly consistent with the genome size measured by flow cytometry (∼0.82 Gb) [14]. Based on this estimation, the clean sequence reads correspond to about 356-fold coverage of the A. japonicus genome.
Table 1.

Statistics on total reads of the A. japonicus genome

| Variant | Insert size (bp) | Total reads (a), raw data | Total reads (a), w/o adaptor | Total reads (a), error corrected | % error corrected |
|---|---|---|---|---|---|
| Green | 180 | 498 608 646 | 474 117 288 | 466 062 920 | 1.70 |
| Green | 400 | 897 432 174 | 842 766 704 | 831 964 242 | 1.28 |
| Green | 2000 (v1) | 293 701 464 | 270 513 434 | 268 573 812 | 0.72 |
| Green | 2000 (v2) | 538 359 438 | 496 446 984 | 493 387 418 | 0.62 |
| Green | Total | 2 228 101 722 | 2 083 844 410 | 2 059 988 392 | 1.14 |
| Red | 400 | 397 799 042 | 394 984 810 | 383 734 440 | 2.85 |
| Black | 400 | 460 597 940 | 423 543 558 | 416 007 614 | 1.78 |

(a) The length of each read is 101 bp.
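The read counts in Table 1 can be converted into base totals and error-correction percentages with simple arithmetic. The sketch below is not part of the authors' pipeline. It assumes every read is 101 bp (per the table footnote) and that "% error corrected" means the share of adaptor-trimmed reads lost during error correction, an interpretation inferred from the table values rather than stated in the text.

```python
READ_LENGTH_BP = 101  # from the Table 1 footnote

# (raw, adaptor-trimmed, error-corrected) read counts copied from Table 1
TABLE1 = {
    "Green (total)": (2_228_101_722, 2_083_844_410, 2_059_988_392),
    "Red": (397_799_042, 394_984_810, 383_734_440),
    "Black": (460_597_940, 423_543_558, 416_007_614),
}


def gigabases(read_count, read_length=READ_LENGTH_BP):
    """Convert a read count into gigabases of sequence."""
    return read_count * read_length / 1e9


def percent_error_corrected(trimmed, corrected):
    """Assumed definition: percentage of trimmed reads lost at error correction."""
    return 100.0 * (trimmed - corrected) / trimmed


def summarize(table=TABLE1):
    """Return raw Gb, clean Gb, and % error corrected for each library set."""
    rows = {}
    for name, (raw, trimmed, corrected) in table.items():
        rows[name] = {
            "raw_gb": gigabases(raw),
            "clean_gb": gigabases(corrected),
            "pct_corrected": percent_error_corrected(trimmed, corrected),
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize().items():
        print(f"{name}: raw {row['raw_gb']:.1f} Gb, "
              f"clean {row['clean_gb']:.1f} Gb, "
              f"{row['pct_corrected']:.2f}% error corrected")
```

Dividing a clean-base total by an assumed genome size gives the fold coverage; the result depends on which genome-size estimate is used as the denominator.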

Figure 2.

K-mer distribution of the A. japonicus genome.
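A k-mer depth histogram such as the one in Fig. 2 is commonly turned into a genome size estimate by dividing the total number of k-mer occurrences by the depth of the main peak. The source does not say which tool or k-mer length was used, so the following is only a generic sketch: the function name, the `min_depth` cutoff for discarding error k-mers, and the toy histogram are all assumptions made for illustration.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors and ignored.
    Simplification: the tallest remaining bin is taken as the main peak, which
    may pick the heterozygous peak in a highly heterozygous genome.
    """
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(kept, key=kept.get)
    total_kmers = sum(depth * n for depth, n in kept.items())
    return total_kmers / peak_depth, peak_depth


if __name__ == "__main__":
    # Toy histogram (invented): an error tail at depth 1-2 and a peak at depth 20
    toy = {1: 5000, 2: 800, 18: 100, 19: 300, 20: 500, 21: 300, 22: 100}
    size, peak = estimate_genome_size(toy)
    print(f"peak depth {peak}, estimated size {size:.0f} k-mers")
```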


Assembly

For whole-genome assembly, we used reads only from the green color variant libraries and employed Platanus v1.2.4 [15], which is well suited for high-throughput short reads and heterozygous diploid genomes. Briefly, error-corrected paired-end reads (insert sizes: 180 bp and 400 bp) were used as input for contig assembly. Next, all cleaned paired-end reads (insert sizes: 180 bp and 400 bp) and mate-pair reads (two 2 kb libraries) were mapped onto the contigs for scaffold building and were used for gap filling (a gap being any nucleotide represented by “N” in the scaffolds). After gap filling by Platanus, the gaps that still remained in the resulting scaffolds were closed using GapCloser (a module of SOAPdenovo2 [16]). The final genome assembly was 0.66 Gb in total length, which is about 80.5% of the genome size estimated by flow cytometry (0.82 Gb) [14], and is composed of 132 607 scaffolds and unscaffolded contigs (each at least 1 kb long) with an N50 value of 10.5 kb (Table 2). We assessed the completeness of the assembly using CEGMA v2.4.010312 [17] and BUSCO v1.22 [18]. In total, 73.4% of the core eukaryotic genes (based on the 248 core essential genes) and 60.7% of the metazoan single-copy orthologs (based on the 843 genes) were identifiable in the genome. Because assembling highly heterozygous genomes is a major challenge in de novo genome sequencing, we further explored whether other assemblers could produce better genome assembly statistics. We applied two popular genome assemblers, SOAPdenovo2 2.04-r240 [16] and ALLPATHS-LG v52488 [13], and as expected [15], the Platanus assembler was superior to the others (Table S1).
Table 2.

Statistics on Apostichopus japonicus genome assembly

| Statistics | Values |
|---|---|
| Total assembled bases (bp) | 664 375 288 |
| Average length of scaffolds (bp) | 5010 |
| Number of scaffolds | 132 607 |
| Number of contigs | 197 146 |
| Length of longest scaffold (bp) | 131 537 |
| GC content (%) | 35.92 |
| Scaffold N50 (bp) | 10 488 |
| Contig N50 (bp) | 5525 |
| Number of genes | 21 771 |
| Number of exons per gene | 4.67 |
| Average exon length (bp) | 209 |
| Number of introns per gene | 4.21 |
| Average intron length (bp) | 1048 |
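For readers unfamiliar with the N50 statistic reported in Table 2, the sketch below shows the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The toy scaffold lengths are invented; only the total assembled bases and scaffold count are taken from Table 2.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def mean_length(total_bases, n_sequences):
    """Average sequence length, e.g. total assembled bases / number of scaffolds."""
    return total_bases / n_sequences


if __name__ == "__main__":
    toy_scaffolds = [2, 2, 2, 3, 3, 4, 8, 8]  # invented lengths
    print("toy N50:", n50(toy_scaffolds))
    # Values from Table 2: 664 375 288 bp in 132 607 scaffolds
    print("mean scaffold length:", round(mean_length(664_375_288, 132_607)))
```

Because N50 weights long scaffolds more heavily than the mean does, the scaffold N50 in Table 2 is roughly twice the average scaffold length.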

Annotation

To identify genomic repeat elements in the A. japonicus genome assembly, we ran RepeatMasker (version 4.0.6) [19] using the Repbase transposable element library (release 20150807) [20] and the de novo repeat library constructed by RepeatModeler (version 1.0.8) [21]. Approximately 27.2% of the A. japonicus genome was identified as interspersed repeats. Protein-coding genes were predicted in four steps. First, ab initio gene prediction was performed with trained AUGUSTUS v3.2.1 [22] using hints from spliced alignment of transcripts to the repeat-masked assembled genome with BLAT [23] and PASA v2.0.2 [24]. To obtain high-quality spliced alignments of expressed transcript sequences for the AUGUSTUS training set, we collected high-throughput messenger RNA sequencing (RNA-seq) data from our previous study [25] (body wall tissue of adult specimens) and from another transcriptome study [26] (embryo, larva, and juvenile stages [developmental-stage specific]; gonads, intestines, respiratory trees, and coelomic fluid of adults [tissue-specific]), and assembled the RNA-seq reads using Trinity v2.1.1 [27]. Second, for homology-based gene prediction, homologous proteins from other species (from UniProt [28]) were mapped to the repeat-masked assembled genome using tBLASTn [29] with an E-value ≤ 1 × 10^−5. The aligned sequences were then processed with GeneWise v2.4.0 [30] to obtain precise spliced alignments and gene structures. Third, for gene prediction with transcriptome evidence, existing RNA-seq reads [23, 25] were mapped to the repeat-masked assembled genome using TopHat v2.1.0 [31], and gene models were built using Cufflinks v2.2.1 [32]. Finally, the resulting gene sets from each approach were integrated into a comprehensive and non-redundant consensus gene set. We predicted a total of 21 771 genes (≥ 50 amino acids) in the assembled A. japonicus genome, including 101 776 exons (average 4.67 exons per gene), with an average gene size of 5402 nucleotides (average transcript size of 982 nucleotides) (Table 2).
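The E-value cutoff used in the homology-based step can be illustrated with a small filter over BLAST tabular output. This is a sketch under stated assumptions, not the authors' script: the source does not say which output format was used, so the standard 12-column tabular layout is assumed, and the two example hits (`protA`, `protB`, `scaffold12`, `scaffold7`) are invented.

```python
import csv
import io

# Column order of the standard 12-column BLAST tabular output (assumed format)
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
E_VALUE_CUTOFF = 1e-5  # threshold stated in the text


def filter_hits(handle, cutoff=E_VALUE_CUTOFF):
    """Keep only alignments whose E-value is at or below the cutoff."""
    reader = csv.DictReader(handle, fieldnames=BLAST6_COLUMNS, delimiter="\t")
    return [row for row in reader if float(row["evalue"]) <= cutoff]


# Two invented hits: one passes the cutoff, one does not
EXAMPLE = "\n".join(
    "\t".join(fields)
    for fields in [
        ["protA", "scaffold12", "78.5", "210", "40", "2",
         "1", "210", "1500", "2129", "2e-30", "120"],
        ["protB", "scaffold7", "35.0", "60", "36", "3",
         "10", "69", "300", "479", "0.003", "31.2"],
    ]
)

if __name__ == "__main__":
    for hit in filter_hits(io.StringIO(EXAMPLE)):
        print(hit["qseqid"], hit["sseqid"], hit["evalue"])
```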

Genetic polymorphism among natural color variants

To provide a general overview of the total genetic variation in the species, we realigned reads from the green color variant to the assembled genome using BWA v0.7.13 [33]. Picard v1.141 (http://broadinstitute.github.io/picard) was used to mark and remove duplicates. Before single nucleotide polymorphism (SNP) and small insertion and deletion (indel) calling, we realigned reads around indels using GATK RealignerTargetCreator and IndelRealigner v3.5 [34] to avoid misalignment around indels. Next, GATK HaplotypeCaller was used to call SNPs and indels from the resulting alignments. In this study, we observed nucleotide-level heterozygosity within the assembled genome to be 0.986%; namely, we identified a total of 6 550 122 SNPs in the assembled genome, for a heterozygous SNP rate of 0.00986 per site. This high rate of nucleotide polymorphism is not uncommon in marine invertebrates and has also been found in the genome of the sea urchin (∼1%; at least one SNP per 100 bases) [35], which belongs to the same phylum. To measure nucleotide diversity in A. japonicus, the aforementioned analyses were repeated for the red and black color variants separately, and VCFtools v0.1.14 [36] with sliding window analysis (bin 10 kb, step 1 kb) was used to calculate nucleotide diversity. We identified 6.6–14.5 million heterozygous SNPs (1.7–3.7 million small indels) in the assembled genome from the three natural color variants (Table 3), resulting in an estimated nucleotide diversity of 0.00146.
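The sliding-window calculation (bin 10 kb, step 1 kb) can be sketched as follows. This is a generic illustration of windowed nucleotide diversity using the standard per-site estimator for a biallelic SNP, π = 2k(n − k) / (n(n − 1)), where k is the alternate allele count among n sampled chromosomes. It is not VCFtools' implementation, and the function names and toy sites are assumptions.

```python
def site_pi(alt_count, n_chromosomes):
    """Per-site nucleotide diversity for a biallelic SNP."""
    k, n = alt_count, n_chromosomes
    if n < 2:
        raise ValueError("need at least two chromosomes")
    return 2.0 * k * (n - k) / (n * (n - 1))


def sliding_windows(seq_length, size=10_000, step=1_000):
    """Yield half-open (start, end) windows; the last one is clipped to the sequence."""
    start = 0
    while start < seq_length:
        yield start, min(start + size, seq_length)
        if start + size >= seq_length:
            break
        start += step


def windowed_pi(sites, seq_length, size=10_000, step=1_000):
    """sites maps 0-based position -> (alt_count, n_chromosomes).

    Returns (start, end, pi) tuples, where pi is the sum of per-site values
    divided by the window length (invariant sites contribute zero).
    """
    results = []
    for start, end in sliding_windows(seq_length, size, step):
        total = sum(site_pi(*sites[pos]) for pos in sites if start <= pos < end)
        results.append((start, end, total / (end - start)))
    return results


if __name__ == "__main__":
    # Invented SNPs on a 20 bp toy sequence, three diploid samples (n = 6)
    toy_sites = {3: (1, 6), 12: (3, 6)}
    for window in windowed_pi(toy_sites, seq_length=20, size=10, step=5):
        print(window)
```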
Table 3.

SNP and small indel statistics among three color variants

| Variant | Number of heterozygous SNP loci | Number of small indel loci |
|---|---|---|
| Green | 6 550 122 | 1 662 708 |
| Red | 14 509 713 | 3 681 007 |
| Black | 12 627 560 | 3 198 584 |
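The heterozygous SNP rate quoted in the text for the green variant follows from dividing its SNP count by the total assembled bases in Table 2. The sketch below repeats that division for all three variants. Only the green rate (0.00986 per site) is reported in the source; the other two values are plain arithmetic on the table counts, shown for illustration and not as results from the paper.

```python
ASSEMBLED_BASES = 664_375_288  # total assembled bases from Table 2

# Heterozygous SNP loci per color variant, copied from Table 3
HET_SNPS = {
    "Green": 6_550_122,
    "Red": 14_509_713,
    "Black": 12_627_560,
}


def per_site_rates(counts, assembled_bases=ASSEMBLED_BASES):
    """Divide each variant's locus count by the assembly length."""
    return {name: n / assembled_bases for name, n in counts.items()}


if __name__ == "__main__":
    for name, rate in per_site_rates(HET_SNPS).items():
        print(f"{name}: {rate:.5f} heterozygous SNPs per site ({rate * 100:.3f}%)")
```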
In summary, we report the first draft genome of A. japonicus (Fig. 3) and provide a general overview of the genetic variation in its three color variants (green, red, and black). These data will help elucidate the genetic, physiological, and evolutionary relationships among different color variants in A. japonicus and will be invaluable resources for sea cucumber genomic research.
Figure 3.

Schematic workflow of A. japonicus genome assembly and annotation. The left side represents the genome assembly and the right side represents the transcriptome assembly that was performed in previous publications. To achieve suitable gene prediction, we integrated these two assembly results.

Availability of supporting data

The raw dataset of all A. japonicus genome libraries and the assembly were deposited in the NCBI database with BioProject accession number PRJNA335936, SRA accession number SRP082485, and GenBank accession number MODV00000000. The additional dataset associated with genome annotation along with further supporting data are available in the GigaScience Database, GigaDB [37]. The RNA-seq datasets used in this study were downloaded from the ENA database with accession number PRJEB12167 and the NCBI database with SRA accession number SRA046386.

Abbreviations

Indel: insertion and deletion; RNA-seq: high-throughput messenger RNA sequencing; SNP: single nucleotide polymorphism.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

CP designed the study. CP, JKP, and SJC contributed to the project coordination. JJ, HGL, HHH, and SJ collected the samples and extracted the genomic DNA. CP, JO, SGL, and SC conducted the genome analyses. CP, JKP, JJ, and EK wrote the paper. All authors read and approved the final manuscript.