High-continuity genome assembly of the jellyfish Chrysaora quinquecirrha.

Wang-Xiao Xia¹, Hao-Rong Li², Jing-Hao Ge¹, Yao-Wu Liu³, Hong-Hui Li⁴, Yan-Hua Su⁴, Hai-Zhen Wang⁴, Hui-Fang Guo⁵, Yu-Xuan Dai¹, Yao-Wen Liu⁶, Xing-Chun Gou⁷.

Abstract

The Atlantic sea nettle (Chrysaora quinquecirrha) is of high ecological value and occupies an important evolutionary position. However, owing to limited sequencing technologies and complex jellyfish genomic sequences, the current C. quinquecirrha genome assembly is highly fragmented. Here, we used high-throughput chromosome conformation capture (Hi-C) technology to obtain high-coverage sequencing data for the C. quinquecirrha genome and used these data to scaffold the previously published contig-level assembly. The resulting high-continuity genome assembly of C. quinquecirrha contained 1 882 scaffolds with an N50 length of 3.83 Mb, 5.23 times longer than that of the previously released assembly, and additional analyses indicated a high degree of genomic continuity and accuracy. This high-continuity genome sequence of C. quinquecirrha provides a basis for studying jellyfish evolution through comparative genomics and an important resource for studies on jellyfish growth and development.

Keywords: Assembly; Evolution; Genome; Hi-C; High continuity; Jellyfish

Year: 2021 PMID: 33377334 PMCID: PMC7840447 DOI: 10.24272/j.issn.2095-8137.2020.258

Source DB: PubMed Journal: Zool Res ISSN: 2095-8137

DEAR EDITOR, Jellyfish belong to the phylum Cnidaria and are umbrella-shaped gelatinous zooplankton, a group of lower invertebrates. Jellyfish, especially C. quinquecirrha, have substantial ecological impact owing to their wide distribution; C. quinquecirrha ranges from the southern coast of New England to tropical areas along the eastern coast of North America (Decker et al., 2007). Atlantic sea nettles are reproductively active in late spring and early summer, and large populations can significantly affect fisheries (Olesen et al., 1996). Moreover, continuous blooms of gelatinous zooplankton can permanently disrupt natural food webs (Oguz et al., 2012), because jellyfish consume fish eggs, larvae, and juveniles and can therefore have long-term effects on commercially important fishery species (Finenko et al., 2013).
A genome sequence would facilitate research on C. quinquecirrha, including studies of its developmental processes. The first reference genome of C. quinquecirrha was recently assembled and released (Xia et al., 2020). However, owing to its complexity and high heterozygosity, that assembly is highly fragmented, which hinders further study of this species. Several technologies, such as Bionano optical mapping (http://www.bionanogenomics.com), 10x Genomics linked reads (https://www.10xgenomics.com), and Hi-C (Pal et al., 2019), have been developed to aid the assembly of high-continuity genomes (Chen et al., 2020; Dudchenko et al., 2017; Ghurye et al., 2019), and Hi-C in particular has been widely used for high-quality genome assembly. Combined with second-generation sequencing, Hi-C provides high-throughput measurements of physical interactions between genomic loci, including the frequency of interactions within and between chromosomes and the number of interactions between chromosome fragments (Pal et al., 2019). Because loci on the same chromosome interact far more frequently than loci on different chromosomes, differences in interaction frequency can be used to distinguish chromosomes and to order and orient sequences, allowing genomes to be assembled at the chromosome level (Pal et al., 2019). In this study, we generated high-coverage Hi-C sequencing data for C. quinquecirrha and used them to scaffold the previously published genome into a high-continuity assembly.

Fresh muscle samples of C. quinquecirrha were dissected, and high-quality DNA was extracted with a Qiagen Blood & Cell Culture DNA Mini Kit (Germany). The Hi-C library was prepared by Novogene (China) using DpnII restriction digestion and polymerase chain reaction (PCR) enrichment, and was sequenced on an Illumina NovaSeq 6000 platform (USA) with a read length of 150 bp.
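The idea that interaction frequency separates chromosomes can be illustrated with a toy sketch, assuming contig-to-contig Hi-C link counts have already been tallied. The function name `group_contigs`, the raw-count input, and the `threshold` cutoff are hypothetical; real scaffolders such as 3D-DNA use considerably more elaborate models and also order and orient sequences, which this sketch does not attempt. Contigs joined by at least `threshold` read pairs are merged into the same group with a union-find structure.

```python
from collections import defaultdict


def group_contigs(contacts, threshold):
    """Group contigs whose pairwise Hi-C link count reaches `threshold`.

    contacts: dict mapping (contig_a, contig_b) -> number of read pairs
    linking the two contigs (hypothetical input format).
    Returns a sorted list of sorted contig groups.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for (a, b), n_links in contacts.items():
        # Register both contigs so weakly linked ones still form singleton groups.
        find(a)
        find(b)
        if n_links >= threshold:
            union(a, b)

    groups = defaultdict(list)
    for contig in parent:
        groups[find(contig)].append(contig)
    return sorted(sorted(g) for g in groups.values())


# Toy usage: strong links within c1-c3 and c4-c5, a weak link between them.
contacts = {("c1", "c2"): 120, ("c2", "c3"): 95, ("c3", "c4"): 3, ("c4", "c5"): 110}
print(group_contigs(contacts, threshold=50))  # [['c1', 'c2', 'c3'], ['c4', 'c5']]
```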
Raw Hi-C reads with more than 30% low-quality bases or more than 10% unknown (N) bases were filtered out. Duplicate reads, which may arise during PCR, and adaptor sequences were also removed as described previously (Chen et al., 2020), and all remaining clean reads were used for genome assembly. We used Juicer (v1.5.6) (Durand et al., 2016b) to align the clean Hi-C reads to the previously published contig-level genome (Xia et al., 2020) and removed obviously duplicated mapping regions. Juicer was run with default parameters, except that -S was set to “early”. We then anchored the contigs into longer sequences using the 3D de novo assembly (3D-DNA) pipeline (v170123) (Dudchenko et al., 2017) with the parameters “-m haploid -i 15000 -r 0”. After constructing the draft assembly, we visualized the Hi-C contact map in Juicebox (v1.9.8) (Durand et al., 2016a), corrected fragments with obvious assembly errors based on the interaction patterns, and adjusted minor errors in ordering to obtain the final high-continuity genome.

The genome annotation workflow was the same as in the previous study (Xia et al., 2020), except that additional de novo gene prediction tools were used. Specifically, we ran SNAP (v2006-07-28) (Korf, 2004) with the mam54.hmm HMM library and default parameters, Genscan (v1.0) (Burge & Karlin, 1997) with the HumanIso library, and GlimmerHMM (v3.0.1) (Majoros et al., 2004) to predict coding regions. We analyzed genome synteny between C. quinquecirrha and Aurelia aurita (GCA_004194415.1_ABSv1) using LAST (v802) (Kiełbasa et al., 2011) with the parameters “-m 100 -E 0.05”. One-to-one aligned regions in the resulting MAF file were selected, and the syntenic blocks between the two genomes were plotted using Circos (v0.69-6) (Krzywinski et al., 2009).
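The read-level cleaning steps described above can be sketched as follows, assuming Phred+33 quality strings. The low-quality cutoff (`low_q`, Phred < 20) is a hypothetical value because the text does not define "low-quality", and the duplicate check is a simplified stand-in that treats read pairs whose two ends map to identical positions and strands as PCR duplicates; it is not Juicer's exact deduplication logic. All function names are illustrative.

```python
def passes_filter(seq, qual, low_q=20, max_low_frac=0.30, max_n_frac=0.10, offset=33):
    """Return True if a read should be kept.

    seq: read sequence; qual: Phred+33 quality string of the same length.
    A base counts as low quality when its Phred score is below `low_q`
    (hypothetical cutoff). Reads exceeding either fraction are dropped.
    """
    if not seq or len(seq) != len(qual):
        raise ValueError("sequence and quality must be non-empty and of equal length")
    n = len(seq)
    low = sum(1 for c in qual if ord(c) - offset < low_q)
    unknown = seq.upper().count("N")
    return low / n <= max_low_frac and unknown / n <= max_n_frac


def passes_pair(read1, read2, **kwargs):
    """Keep a read pair only if both (seq, qual) reads pass the filter."""
    return passes_filter(*read1, **kwargs) and passes_filter(*read2, **kwargs)


def dedup_pairs(pairs):
    """Drop pairs whose ends map to identical (chrom, pos, strand) coordinates.

    Each pair is ((chrom1, pos1, strand1), (chrom2, pos2, strand2)); the ends
    are sorted so that A-B and B-A orientations count as the same molecule.
    """
    seen = set()
    kept = []
    for end1, end2 in pairs:
        key = tuple(sorted((end1, end2)))
        if key not in seen:
            seen.add(key)
            kept.append((end1, end2))
    return kept
```

Filtering is applied before alignment and deduplication after it, mirroring the order in the text.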
Although several jellyfish genome assemblies have been published in recent years, most are highly fragmented (Jiang et al., 2019; Leclère et al., 2019). For C. quinquecirrha, the contig N50 of the published genome is 733 647 bp (~734 kb). To obtain a high-continuity assembly of the C. quinquecirrha genome, we generated high-coverage (~272 Gb) Hi-C data and mapped them to the previously published genome (Xia et al., 2020). To improve assembly accuracy, we constructed long sequences using the 3D de novo assembly software, allowing contigs to be broken where necessary. Finally, we identified 51 scaffolds with clearly defined boundaries, and ~67.18% of the total contig length was assembled into super-scaffolds (Supplementary Table 1; Supplementary Figures 1, 2). This assembly had a scaffold N50 of 3.83 Mb (Table 1; Supplementary Table 2), 5.23 times longer than that of the earlier version (Xia et al., 2020). In addition, the cumulative assembly length and L50 values (L50: the smallest number of sequences that together make up at least 50% of the total assembly length) of the two genome versions (contig and Hi-C) indicated a substantial improvement in the continuity of the C. quinquecirrha assembly (Figure 1A). The BUSCO scores were also improved compared with those of the contig version (Supplementary Table 3).
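The Nx and Lx statistics used here (N50/L50, and the N60 to N90 rows of Table 1) can be computed directly from a list of sequence lengths. The sketch below, with the hypothetical function name `nx_lx`, sorts lengths in descending order and stops at the first sequence at which the running total reaches x% of the assembly length; applied to each version's sequence lengths, it should reproduce the Length/Number pairs in Table 1 if the same definition was used.

```python
def nx_lx(lengths, x=50):
    """Return (Nx, Lx) for a list of sequence lengths, e.g. (N50, L50) for x=50.

    Nx is the length of the shortest sequence among the longest sequences that
    together cover at least x% of the total length; Lx is how many there are.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Compare scaled totals to avoid floating-point rounding at the cutoff.
        if running * 100 >= total * x:
            return length, count


# Toy usage with five sequences totalling 300 bp.
print(nx_lx([100, 80, 60, 40, 20]))      # (80, 2)
print(nx_lx([100, 80, 60, 40, 20], 90))  # (40, 4)
```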

Table 1

Statistics of two genome versions

Statistical item	Contig version (Xia et al., 2020)		Hi-C version (this study)	
	Length (bp)	Number	Length (bp)	Number
N90	66 354	666	227 000	195
N80	205 342	365	582 500	109
N70	395 469	249	873 000	62
N60	555 468	178	2 312 428	36
N50	733 647	125	3 825 607	24
Average length (bp)	134 943		179 287	
Max length (bp)	4 015 784		15 257 941	
Total length (bp)	336 819 409		337 419 359	
Total number of sequences		2 496		1 882
Number of sequences ≥1 000 bp		2 496		1 880
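The rows of Table 1 are linked by simple arithmetic, which offers a quick consistency check. The sketch below uses only values copied from the table; it recomputes the average sequence lengths (which the table appears to truncate to whole base pairs) and the N50 fold improvement. From the Table 1 values the ratio comes out at about 5.21, close to the 5.23-fold figure quoted in the text. The dictionary layout and function names are illustrative.

```python
# Values copied from Table 1 (lengths in bp; total_number = sequence count).
CONTIG = {"total_length": 336_819_409, "total_number": 2_496, "n50": 733_647}
HIC = {"total_length": 337_419_359, "total_number": 1_882, "n50": 3_825_607}


def average_length(version):
    """Mean sequence length, truncated to whole bp as Table 1 appears to do."""
    return version["total_length"] // version["total_number"]


def n50_fold(new, old):
    """Ratio of the new assembly's N50 to the old assembly's N50."""
    return new["n50"] / old["n50"]


print(average_length(CONTIG), average_length(HIC))  # 134943 179287
print(round(n50_fold(HIC, CONTIG), 2))              # 5.21
```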

Figure 1

Statistics and evaluation of Hi-C-based genome assembly

A: Cumulative assembly length of sequences in two genome assemblies of C. quinquecirrha: the previously published contig version (Xia et al., 2020) and the Hi-C version from this study. Dots above and below each line indicate L50 and L90 values, respectively. B: Scaffold-level genome assembly of C. quinquecirrha shown as a Circos diagram; outer to inner rings show the distribution of protein-coding genes, tandem repeats (TRP), long terminal repeats (LTR), short interspersed repetitive elements (SINE), long interspersed repetitive elements (LINE), DNA elements, and GC content, respectively. C: Distribution of contigs and coding genes in each scaffold; from left to right, the plot shows gene density distribution, contig number, and coding gene number in each scaffold. D: Synteny between the scaffold-level genomes of C. quinquecirrha and A. aurita; syntenic blocks between the two genomes are linked in a Circos plot.

To obtain more detailed information on the distribution of repetitive sequences (including interspersed nuclear elements, TRP, and DNA elements), coding genes, and GC content along each assembled sequence, we divided the genome into 200 kb sliding windows and plotted the results with Circos (v0.69-6) (Figure 1B). GC content varied little among scaffolds (scaf), ranging from 37.37% (scaf35) to 41.48% (scaf54). In contrast, the proportion of repetitive sequences varied widely among scaffolds, from 17.40% (scaf51; repeat length: 842 697 bp) to 61.94% (scaf44; repeat length: 1 888 601 bp). In the Circos plot, GC content was the most uniformly distributed feature, followed by coding genes (Figure 1B). In some scaffolds, such as scaf29, repetitive sequences were quite unevenly distributed (Figure 1B).
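The windowed GC profile plotted in Figure 1B can be sketched as below. The text specifies 200 kb windows but not the step size, so non-overlapping windows (step equal to the window size) are assumed by default; excluding N (gap) bases from the denominator is likewise an assumption, and `gc_by_window` is a hypothetical name.

```python
def gc_by_window(seq, window=200_000, step=None):
    """Return (start, end, gc_fraction) tuples for successive windows along `seq`.

    `step` defaults to `window` (non-overlapping windows; an assumption).
    GC fraction is (G + C) / (A + C + G + T), so N bases are ignored;
    windows containing no A/C/G/T bases get None.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    step = step or window
    seq = seq.upper()
    results = []
    for start in range(0, len(seq), step):
        chunk = seq[start:start + window]
        acgt = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, start + len(chunk), gc / acgt if acgt else None))
    return results


# Toy usage with 4 bp windows instead of 200 kb.
print(gc_by_window("GGCCATATNNNN", window=4))  # [(0, 4, 1.0), (4, 8, 0.0), (8, 12, None)]
```

The same windowing could be reused for repeat or gene density by counting annotated bases per window instead of G and C.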
The distribution ranges of short interspersed repetitive elements (SINE; length: 138 bp, 0.0009% of the length of scaf29) and TRP (length: 1.32 Mb, 8.69% of the length of scaf29) are shown in different colors in the two corresponding rings of Figure 1B. TRP was relatively more concentrated in scaf36 (SINE, length: 1 944 bp, 0.064% of the length of scaf36; TRP, length: 1.36 Mb, 44.65% of the length of scaf36), whereas SINE was more concentrated in scaf29 (Figure 1B).

To better understand the distribution of protein-coding genes and contigs among the scaffolds of C. quinquecirrha, we counted the number of coding genes and contigs in each scaffold. The 51 longest scaffolds were each ≥1 Mb, the longest reaching ~15.26 Mb (Table 1). The longest scaffold (scaf29) was also composed of the largest number of contigs (72) (Figure 1C), suggesting that complex sequence composition in this region may have contributed to the fragmentation of the previous assembly. Scaf29 also contained the most genes (1 170), consistent with a positive relationship between gene number and scaffold length (Figure 1C). This relationship was supported by a strong positive Pearson correlation (r=0.982) between scaffold length and gene number.

To characterize synteny between C. quinquecirrha and another jellyfish species, we performed a whole-genome collinearity comparison between C. quinquecirrha and Aurelia aurita. Regions with sequence similarity between scaffolds of the two species are connected in the resulting plot (Figure 1D; only sequences longer than 1 Mb are shown). Collinearity between the two jellyfish was detectable but weak compared with that typically observed between species with short divergence times (e.g., among mammals). Among the C. quinquecirrha scaffolds, scaf26 had the longest collinear region with A. aurita (401.33 kb; Figure 1D). After removing several matched fragments with overlapping regions, the total lengths of the collinear regions were 8.23 Mb (3.65% of the whole genome) in C. quinquecirrha and 10.23 Mb (5.22% of the whole genome) in A. aurita. Thus, based on comparison of their scaffold-level genomes, the two jellyfish share few collinear regions (Figure 1D), which may reflect differences in chromosome number and the accumulation of genetic variation over their long divergence time (~475 million years; Xia et al., 2020).

Through a systematic survey of the Animal Genome Size Database (http://www.genomesize.com), we found that C-values in jellyfish groups, including hydrozoans and scyphozoans, range from 0.26 to 1.49 pg, indicating considerable variation in genome size. In addition, although there are many extant jellyfish species (likely more than 250), only a few genomes have been published and most are highly fragmented, as was the previously published C. quinquecirrha genome, suggesting complex genome composition or karyotypes in jellyfish. Genome size can strongly affect assembly difficulty, as illustrated by the contrast between Anura and Urodela in Amphibia. C-values range from 0.95 to 12.40 pg in Anura (Olmo & Morescalchi, 2005) but from 10.02 to 120.6 pg in Urodela (Goin et al., 1968). The number of published genomes in the two groups also reflects the impact of genome size on assembly: to date, only one urodele genome has been published (Nowoshilow et al., 2018), compared with many anuran genomes, e.g., Nanorana parkeri (Sun et al., 2015), Xenopus laevis (Session et al., 2016), and Xenopus tropicalis (Hellsten et al., 2010). These observations suggest that assembly difficulty may be closely related to genome size and repeat content.
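The scaffold length versus gene number relationship (r=0.982) is summarized by a standard Pearson correlation coefficient; a dependency-free sketch follows. The function name is hypothetical, and the example inputs are toy numbers rather than values from this assembly.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (the 1/n factors cancel in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Toy usage: gene count rising exactly in proportion to scaffold length gives r = 1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

In practice the two lists would hold the length and coding-gene count of each scaffold, in the same scaffold order.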
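The total collinear lengths were computed after removing overlapping matched fragments. One way to avoid double-counting, sketched below as an assumption rather than the authors' exact procedure, is to merge overlapping alignment intervals on each scaffold and sum the merged spans. The `(scaffold, start, end)` block format, the half-open coordinates, and the function names are hypothetical.

```python
from collections import defaultdict


def merged_length(intervals):
    """Total bases covered by half-open [start, end) intervals, overlaps counted once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it out and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def collinear_length(blocks):
    """Sum merged collinear spans over scaffolds; blocks are (scaffold, start, end)."""
    by_scaffold = defaultdict(list)
    for scaffold, start, end in blocks:
        by_scaffold[scaffold].append((start, end))
    return sum(merged_length(ivs) for ivs in by_scaffold.values())


# Toy usage: two overlapping blocks on s1 (150 bp merged) plus 10 bp on s2.
print(collinear_length([("s1", 0, 100), ("s1", 50, 150), ("s2", 0, 10)]))  # 160
```

Dividing each species' merged total by its genome (or plotted sequence) length gives the percentage of collinear sequence.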
Obtaining high-quality genomes for such complex species remains a major technological challenge. Sequencing technology has developed rapidly in recent years. In particular, Hi-C techniques have been of considerable benefit to comparative genomics (Cali et al., 2019) and have provided unprecedented accuracy and convenience for obtaining high-quality chromosome-level genomes (Pal et al., 2019). For example, high-continuity genomes have been obtained for many species using Hi-C (Dudchenko et al., 2017), and Hi-C should enable the assembly of more high-continuity jellyfish genomes in the future. In this study, we assembled a high-continuity genome of C. quinquecirrha by generating high-coverage Hi-C sequencing data. Compared with the previously published version (Xia et al., 2020), the N50 length was substantially improved (Figure 1A). Genome synteny analysis revealed collinear regions between C. quinquecirrha and A. aurita. The high-continuity C. quinquecirrha genome could improve our knowledge of genome evolution and find practical application in conservation biology and population genetics. It should also deepen our understanding of jellyfish genomes and support studies on the growth, development, and reproduction of C. quinquecirrha.

DATA AVAILABILITY

The raw Hi-C sequencing data and genome assembly of Chrysaora quinquecirrha were deposited in the National Center for Biotechnology Information (NCBI) database under accession No. PRJNA658826. The annotation file was uploaded to the DRYAD database (https://datadryad.org/stash/share/RY8FQAETuLZR_O0_hEdQsvROvJnMcsPZ9UtJBZsnqlQ). Supplementary data to this article can be found online.

COMPETING INTERESTS

The authors declare that they have no competing interests.

AUTHORS’ CONTRIBUTIONS

X.C.G. and Y.W.L. conceived and supervised the project and revised the manuscript. W.X.X. and Y.W.L. collected samples. W.X.X. and H.R.L. performed bioinformatics analyses. W.X.X. and J.H.G. wrote the manuscript. Y.W.L., H.H.L., Y.H.S., H.Z.W., H.F.G., and Y.X.D. revised the manuscript. All authors read and approved the final version of the manuscript.

REFERENCES

1. Adaptive seeds tame genomic sequence comparison.

Authors: Szymon M Kiełbasa; Raymond Wan; Kengo Sato; Paul Horton; Martin C Frith
Journal: Genome Res Date: 2011-01-05

2. Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions.

Authors: Damla Senol Cali; Jeremie S Kim; Saugata Ghose; Can Alkan; Onur Mutlu
Journal: Brief Bioinform Date: 2019-07-19

3. Prediction of complete gene structures in human genomic DNA.

Authors: C Burge; S Karlin
Journal: J Mol Biol Date: 1997-04-25

4. Whole-genome sequence of the Tibetan frog Nanorana parkeri and the comparative evolution of tetrapod genomes.

Authors: Yan-Bo Sun; Zi-Jun Xiong; Xue-Yan Xiang; Shi-Ping Liu; Wei-Wei Zhou; Xiao-Long Tu; Li Zhong; Lu Wang; Dong-Dong Wu; Bao-Lin Zhang; Chun-Ling Zhu; Min-Min Yang; Hong-Man Chen; Fang Li; Long Zhou; Shao-Hong Feng; Chao Huang; Guo-Jie Zhang; David Irwin; David M Hillis; Robert W Murphy; Huan-Ming Yang; Jing Che; Jun Wang; Ya-Ping Zhang
Journal: Proc Natl Acad Sci U S A Date: 2015-03-02

5. Integrating Hi-C links with assembly graphs for chromosome-scale assembly.

Authors: Jay Ghurye; Arang Rhie; Brian P Walenz; Anthony Schmitt; Siddarth Selvaraj; Mihai Pop; Adam M Phillippy; Sergey Koren
Journal: PLoS Comput Biol Date: 2019-08-21

6. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom.

Authors: Neva C Durand; James T Robinson; Muhammad S Shamim; Ido Machol; Jill P Mesirov; Eric S Lander; Erez Lieberman Aiden
Journal: Cell Syst Date: 2016-07

7. The genome of the jellyfish Clytia hemisphaerica and the evolution of the cnidarian life-cycle.

Authors: Lucas Leclère; Coralie Horin; Sandra Chevalier; Pascal Lapébie; Philippe Dru; Sophie Peron; Muriel Jager; Thomas Condamine; Karen Pottin; Séverine Romano; Julia Steger; Chiara Sinigaglia; Carine Barreau; Gonzalo Quiroga Artigas; Antonella Ruggiero; Cécile Fourrage; Johanna E M Kraus; Julie Poulain; Jean-Marc Aury; Patrick Wincker; Eric Quéinnec; Ulrich Technau; Michaël Manuel; Tsuyoshi Momose; Evelyn Houliston; Richard R Copley
Journal: Nat Ecol Evol Date: 2019-03-11

8. High-Quality Genome Assembly of Chrysaora quinquecirrha Provides Insights Into the Adaptive Evolution of Jellyfish.

Authors: Wangxiao Xia; Haorong Li; Wenmin Cheng; Honghui Li; Yajing Mi; Xingchun Gou; Yaowen Liu
Journal: Front Genet Date: 2020-06-04

9. A hybrid de novo assembly of the sea pansy (Renilla muelleri) genome.

Authors: Justin B Jiang; Andrea M Quattrini; Warren R Francis; Joseph F Ryan; Estefanía Rodríguez; Catherine S McFadden
Journal: Gigascience Date: 2019-04-01

10. De novo genome assembly and Hi-C analysis reveal an association between chromatin architecture alterations and sex differentiation in the woody plant Jatropha curcas.

Authors: Mao-Sheng Chen; Longjian Niu; Mei-Li Zhao; Chuanjia Xu; Bang-Zhen Pan; Qiantang Fu; Yan-Bin Tao; Huiying He; Chunhui Hou; Zeng-Fu Xu
Journal: Gigascience Date: 2020-02-01
