Literature DB >> 32467969

De Novo Genome Assembly of Limpet Bathyacmaea lactea (Gastropoda: Pectinodontidae): The First Reference Genome of a Deep-Sea Gastropod Endemic to Cold Seeps.

Ruoyu Liu1,2, Kun Wang3, Jun Liu1, Wenjie Xu3, Yang Zhou1,2, Chenglong Zhu3, Baosheng Wu1,2, Yongxin Li3, Wen Wang3, Shunping He1,4, Chenguang Feng3, Haibin Zhang1.   

Abstract

Cold seeps, characterized by the methane, hydrogen sulfide, and other hydrocarbon chemicals, foster one of the most widespread chemosynthetic ecosystems in deep sea that are densely populated by specialized benthos. However, scarce genomic resources severely limit our knowledge about the origin and adaptation of life in this unique ecosystem. Here, we present a genome of a deep-sea limpet Bathyacmaea lactea, a common species associated with the dominant mussel beds in cold seeps. We yielded 54.6 gigabases (Gb) of Nanopore reads and 77.9-Gb BGI-seq raw reads, respectively. Assembly harvested a 754.3-Mb genome for B. lactea, with 3,720 contigs and a contig N50 of 1.57 Mb, covering 94.3% of metazoan Benchmarking Universal Single-Copy Orthologs. In total, 23,574 protein-coding genes and 463.4 Mb of repetitive elements were identified. We analyzed the phylogenetic position, substitution rate, demographic history, and TE activity of B. lactea. We also identified 80 expanded gene families and 87 rapidly evolving Gene Ontology categories in the B. lactea genome. Many of these genes were associated with heterocyclic compound metabolism, membrane-bounded organelle, metal ion binding, and nitrogen and phosphorus metabolism. The high-quality assembly and in-depth characterization suggest the B. lactea genome will serve as an essential resource for understanding the origin and adaptation of life in the cold seeps.
© The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Entities:  

Keywords:  zzm321990 Bathyacmaeazzm321990 ; Nanopore; cold seeps; deep-sea adaptation; limpet; mollusk genome

Mesh:

Year:  2020        PMID: 32467969      PMCID: PMC7313663          DOI: 10.1093/gbe/evaa100

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Introduction

Cold seeps are the deep-sea areas where methane, hydrogen sulfide, and other hydrocarbon-rich fluid seepages occur (Joseph 2017). Cold-seep animals not only have to adapt the harsh deep-sea environments including high hydrostatic pressure, darkness, and low temperatures, but also should cope with hypoxia and the rich reducing chemicals that are toxic to most animals (Levin 2005; Hourdez and Lallier 2006). Despite inhospitable, cold seeps support huge chemosynthesis-based ecosystems characterized by abundant specialized benthos (Van Dover et al. 2002; Niu et al. 2017; Åström et al. 2019). In a typical cold-seep ecosystem, certain invertebrates of mussels, clams, tubeworms, shrimps, and gastropods are the common species (Levin 2005; Mazumdar et al. 2019), most of them perform symbioses with chemosynthetic bacteria (Petersen and Dubilier 2009; Kiel 2010), whereas some gastropods feed on bacteria and other animals (Zande and Carney 2001; Amano 2003). Biologists have long wondered the origin and adaptations of these specialized organisms in such unique environments. To date, genomes of symbiotic Bathymodiolus platifrons (Sun et al. 2017) and Lamellibrachia luymesi (Li et al. 2019) in cold seeps have been published, which have well revealed the mechanisms of symbioses. However, our knowledge about the underlying adaptations of them to the cold-seep environment is still lagging. This is particularly true for nonsymbiotic animals. Thus, the genomes of cold-seep animals with different nutrition modes are vital for a comprehensive understanding to the origin and adaptation of cold-seep life. As important members of deep-sea gastropods, Pectinodontid limpets are often associated with the dominant bathymodioline mussel beds (Feng et al. 2018; Mazumdar et al. 2019). Previous investigations reported that some limpets (e.g., Bathyacmaea lactea) feed on bacteria and decomposing the periostracum of Bathymodiolus shells (Zande and Carney 2001; Zhang et al. 2016; Feng et al. 2018; Linse et al. 2019; Mazumdar et al. 2019), which makes deep-sea limpets different from the well-studied symbionts. Meanwhile, a large number of reports (Nakano and Ozawa 2007) accompanied by complete fossil records (Jenkins et al. 2017) make the Pectinodontid limpets ideal for studying phylogeography and paleoecology. Here, we present a genome assembly of cold-seep limpet B. lactea (NCBI: txid2049241; Gastropoda, Pectinodontidae) by Nanopore and BGI-seq sequencing. To the best of our knowledge, this is the first genome of gastropods in the deep-sea cold seeps. We believe that high-quality assembly and annotation will provide a solid foundation in answering the fundamental issues about the origin and adaptation of cold-seep life.

Materials and Methods

Sampling and sequencing

A total of eight specimens of deep-sea limpet B. lactea were obtained by the HOV Shenhai Yongshi at a depth of ∼1,388 m in the Haima cold seeps in the South China Sea (supplementary fig. S1, Supplementary Material online). These samples were clean and immediately refrigerated by liquid nitrogen. The whole-genome DNA of a specimen (sample No. MBSQW58) was extracted by Qiagen Genomic DNA extraction kit (Cat#13323, Qiagen, Hilden, Germany) and then subjected to construct genome BGI-seq library using an MGIEasy Library Prep Kit (v1.1, MGI Tech). We conducted 100-bp paired-end single-indexed sequencing on the MGISEQ-2000 platform (BGI, Shenzhen, China). Nanopore libraries with insertions >20 kb were prepared and sequenced on one flow cell using a PromethION DNA sequencer (v24, Oxford Nanopore, Oxford, UK).

Genome Assembly and Assessment

For the BGI-seq data, both low-quality reads and adaptor sequences were removed by fastp (v0.20, Chen et al. 2018) with default parameters. We estimated the genome size via a 23-mer frequency distribution analysis using the formula: G = K_num/K_depth. The K_num and K_depth represented the total number and the expected depth of 23-mer analysis, respectively. For the Nanopore data, raw reads were filtered by program ontbc (https://github.com/FlyPythons/ontbc) with parameters “–min_score 7 –min_length 1000.” The filtered Nanopore reads were first corrected the base errors by NextDenovo (https://github.com/Nextomics/NextDenovo) with default settings, and then assembled into contigs by program wtdbg (v2.4, Ruan and Li 2019) with parameters “-L 5000 -k 0 -p 21 -S 1.” The filtered BGI-seq reads were mapped to assembled contigs using Burrows–Wheeler Aligner (BWA, Li and Durbin 2010) and polished twice with Pilon (v1.21, Walker et al. 2014) under default settings. We employed Benchmarking Universal Single-Copy Orthologs (BUSCO v3.0, Simão et al. 2015) to evaluate the completeness of the genome assembly and annotation using the “metazoa_odb9” database. To further evaluate the accuracy, we aligned the clean BGI-seq reads against the genome assembly using BWA (Li and Durbin 2010). Then, SAMtools/BCFtools package (v1.3.1, Li et al. 2009) was employed to investigate the reads coverage of the final assembly and calculate the base error rate of it.

Genome Annotation

We carried out annotations for repeats before predicting the structure of genes. Firstly, we build a de novo repeat library by RepeatModeler (v1.0.11, Tarailo-Graovac and Chen 2009), then we employed RepeatMasker (v3.3.0, Tarailo-Graovac and Chen 2009) to identify homolog repeats in B. lactea. Moreover, RepeatMasker (v3.3.0, Tarailo-Graovac and Chen 2009) and RepeatProteinMask (v3.3.0, a package in RepeatMasker) were employed to search previously reported repeats in the Repbase. Further, we used Tandem Repeat Finder (v4.04, Benson 1999) to identify tandem repeats with following settings: Match = 2, Mismatch = 7, Delta = 7, PM = 80, PI = 10, Minscore = 50. The location and structure of genes were predicted by a combination of ab initio and homology-based approaches. Before homology-based annotation, the genomes of six mollusk species were retrieved from NCBI (supplementary table S1, Supplementary Material online). Transcript of each gene with the longest length and no premature termination site was picked out and translated into amino acids (Wang, Xu, et al. 2019). These protein sequences were aligned to the repeats soft-masked genome by TBlastN (v2.7.1, Altschul et al. 1990) with a cut-off value 1e-5. Then, identified homologous sequences were subjected to GeneWise (Birney et al. 2004) to define gene models. For the ab initio annotation, we used AUGUSTUS (v3.2.1, Stanke et al. 2008) to predict gene models based on the “seahare” training set. All gene models were integrated into a nonredundant gene set by EvidenceModeler (EVM, v1.1.1, Haas et al. 2008). To determine the function of those predicted protein-coding genes, we employed InterProScan (v5.30-69.0, Jones et al. 2014) to searched for the best hits of these proteins against all incorporated databases.

Phylogenetic Analysis and Divergence Time Analysis

To assess the phylogenetic position of B. lactea, genomic data of another seven lophotrochozoa species (see supplementary table S1, Supplementary Material online) were retrieved from NCBI and preprocessed in the same way as those six mollusk species described above. Protein set of B. lactea and these thirteen species were clustered into families by running reciprocal BLAST analysis in OrthoFinder (v2.3.1, Emms and Kelly 2015). The 580 produced single-copy orthologues were aligned using MUSCLE (v3.8.1551, Edgar 2004) with default parameters. Low-quality alignments were trimmed by TrimAl (Capella-Gutiérrez et al. 2009) using an automated1 algorithm. Subsequently, the remaining alignments were concatenated into a supergene and used for phylogeny reconstruction. Finally, a maximum likelihood (ML) tree was built by RAxML (v8.2.12, Stamatakis 2014) with the GTRGAMMA model and 1,000 bootstraps. Divergence time among species was estimated via the MCMCtree program in PAML (v4.9, Yang 2007). Five fossil-based calibration points were used as time priors (Benton et al. 2009; Plazzi and Passamonti 2010): The most recent common ancestor of lophotrochozoa (550.3–636.1 Ma); the most recent common ancestor of molluscs (532–549 Ma); Aplysia californica (or Biomphalaria glabrata)–Lottia gigantea (470.2–531.5 Ma); A. californica–B. glabrata (168.6–473.4 Ma); Crassostrea gigasCrassostrea virginica (63–83 Ma).

Substitution Rate, Demographic History, and TE Activity

The branch-length calibrated ML tree was subjected to r8s (v1.81, Sanderson 2003) to investigate the substitution rate of each species under the penalized likelihood method. We explored demographic histories using the Pairwise Sequentially Markovian Coalescence model (PSMC, v0.6.5-r67, Li and Durbin 2011) based on heterozygous sites. The filtered BGI-seq reads were mapped to the final genome assembly using BWA-mem (Li and Durbin 2010). Subsequently, SAMtools (v1.3.1, Li et al. 2009) with the parameters “mpileup -q 20 -Q 20” were employed to extract heterozygous sites. The substitution rate was set according to the r8s (Sanderson 2003) results. Finally, the PSMC model (Li and Durbin 2011) was analyzed using parameters: -N25 -t15 -r5 -b -p “4 + 25*2 + 4 + 6.” We further used a custom Perl script parseRM.pl (available at https://github.com/4ureliek/Parsing-RepeatMasker-Outputs; Kapusta et al. 2017) to parse the TE activities in B. lactea based on alignment outputs from RepeatMasker (v3.3.0, Tarailo-Graovac and Chen 2009). The substitution rate was set as 0.00121 per site per million years, according to the results of r8s (v1.81, Sanderson 2003). The analysis result was packed into bin per 3 Ma.

Gene Family Analysis and Rapidly Evolving Gene Ontology Categories

Based on the OrthoFinder analysis, we investigated significant expansion and contraction of gene families with CAFÉ (v4.0, Berger and Young 2006) along the time-calibrated tree (Viterbi P-values < 0.01). Single-copy orthologues from the OrthoFinder analysis were realigned with PRANK (v170427, Löytynoja 2014) and filtered by Gblocks (v0.91b, Castresana 2000) using default settings to remove the ambiguously aligned sequences. We used the free-ratio model in the “Codeml” module of the PAML (v4.9, Yang 2007) to estimate the Ka/Ks values of species for each gene. According to Gene Ontology (GO) annotations, these genes were assigned to the corresponding GO categories. Any GO comprised <10 orthologues was excluded from subsequent analysis. Following the method of Wang, Shen, et al. (2019), we counted the Ka and Ks values of B. lactea and it is shallow-water relative L. gigantea in each GO and set the nonsynonymous substitution probability of L. gigantea corrected by substitutions of these two genomes as the background expectation. By a one-sided binomial test, those with a significantly higher Ka/Ks level in B. lactea than expected were considered as rapidly evolving GO categories (FDR < 0.05).

Results and Discussion

In total, we generated a 77.9 Gb (98.3×) raw data set of BGI-seq and a 54.6 Gb (68.9×, reads N50 of 18.3 kb) raw data set of Nanopore, respectively (supplementary table S2, Supplementary Material online). After filtering, 77.0 Gb (97.1×) BGI reads and 46.2 Gb (58.3×, N50 of 18.7 kb) passed Nanopore reads were harvested (supplementary table S2, Supplementary Material online). The 23-mer survey suggested an estimated genome size of ∼792.8 Mb for B. lactea (supplementary fig. S2 and table S3, Supplementary Material online). A slight subpeak was observed at half the value of the expected depth (supplementary fig. S2, Supplementary Material online), indicating a moderately heterozygous genome. After assembly and polishing, we obtained a final genome of 754.3 Mb, with 3,720 contigs and a contig N50 of 1.57 Mb (table 1). The completeness assessment showed that the assembly included complete matches for 922 of 978 metazoan BUSCOs (94.3%), which represented a high-quality assembly compared with other gastropod genomes (supplementary table S4, Supplementary Material online). Furthermore, we found that 98.1% of the BGI-seq reads were reliably aligned to the genome assembly, and 93.9% of the reads were properly aligned to the genome with their mates. The genome-level base error rate was as low as 2.3 bp per 100,000 bases. These results suggested a high-quality assembly of the B. lactea genome.
Table 1

Assembly Statistics of the Bathyacmaea lactea Genome

Contig
Size (bp)Number
N90111,041755
N501,568,604125
Longest8,382,764
Shortest1,973
Total size754,256,7723,720
Assembly Statistics of the Bathyacmaea lactea Genome

Genome Annotations

Totally, ∼463.4 Mb of repeat sequences that accounted for 61.4% of the genome were identified (supplementary table S5, Supplementary Material online). This proportion was quite high compared with that of other mollusk genomes (20.5–61.1%, Sun et al. 2017; Cai et al. 2019). DNA transposons (10.4%) and long interspersed nuclear elements (LINEs; 6.5%) were the most abundant TEs in the B. lactea genome (supplementary table S6, Supplementary Material online). We finally generated a gene set of 23,574 protein-coding genes with an average of 6.8 exons and 1,417-bp coding region per gene, which was comparable to that of other gastropod genomes (supplementary fig. S3 and table S7, Supplementary Material online). BUSCO assessment indicated that 95.7% of the metazoa conserved genes were completely detected in the B. lactea gene set. This result was better than most of the published gastropod genomes (supplementary table S4, Supplementary Material online). As for function annotation, 21,136 of the 23,574 genes were annotated by at least one database, representing 89.7% of the total genes (supplementary table S8, Supplementary Material online).

Phylogenetics and Divergence Time Analysis

Totally, 580 single-copy orthologues were identified among fourteen lophotrochozoa species. Based on the single-copy orthologues, ML analysis recovered a well-resolved phylogeny of lophotrochozoan and supported B. lactea was most closely related to limpet L. gigantea. They diverged from a common ancestor at ∼199.5 Ma (fig. 1), with a 95% confidence interval of 141.4–258.7 Ma.
Fig. 1.

Evolutionary analysis of Bathyacmaea lactea. (a) Phylogeny and divergence time (Ma) of fourteen Lophotrochozoa species, labeled with the 95% confidence interval (purple bars). Numbers on branches indicate the event of gene family expansions (red) and contractions (blue). (b) Comparison of substitution rates among fourteen Lophotrochozoa species. (c) Demographic history of B. lactea estimated by PSMC. (d) The landscape of TE accumulation along the timeline. The result was exhibited by bins of 3 Ma.

Evolutionary analysis of Bathyacmaea lactea. (a) Phylogeny and divergence time (Ma) of fourteen Lophotrochozoa species, labeled with the 95% confidence interval (purple bars). Numbers on branches indicate the event of gene family expansions (red) and contractions (blue). (b) Comparison of substitution rates among fourteen Lophotrochozoa species. (c) Demographic history of B. lactea estimated by PSMC. (d) The landscape of TE accumulation along the timeline. The result was exhibited by bins of 3 Ma. The r8s analysis showed that gastropods tended to have a higher substitution rate than bivalves (fig. 1). The substitution rate of B. lactea was ∼1.21 × 10−9 per site per year, which was slightly higher than that of its shallow-sea relative L. gigantea (fig. 1). Demographic history analysis showed that the effective population size of B. lactea reached a peak of ∼2.6 Ma and then experienced a long-term recession until it recovered again at ∼21,000 years ago (21.0 ka, fig. 1). The rise and fall of population of B. lactea coincided with the great events in ancient times, such as the marine life extinction around the Pliocene-Pleistocene boundary (∼2.58 Ma, Pimiento et al. 2017) and the Last Glacial Maximum (26.5–19.0 ka, Clark et al. 2009). These results indicate the deep-sea ecosystems may also be influenced by paleoclimate. Furthermore, we found that B. lactea had undergone a long-lasting TE activity until the last 10 Ma, including two concentrated TE expansions (fig. 1 and supplementary fig. S4, Supplementary Material online). It has been reported that TEs could play important roles in response to environmental challenges (Casacuberta and González 2013). Hence, whether and how TEs facilitate the adaptability of deep-sea limpet to cold seeps also deserve the attention of further studies.

Gene Family Analysis and Rapidly Evolving GO Categories

We detected 80 expanded gene families and 11 contracted gene families in the B. lactea lineage (fig. 1). The expanded gene families were enriched in 97 GO categories, and many of them were associated with metal ion binding, heterocyclic compound metabolism, apoptotic process, nitrogen compound metabolism, and membranes (supplementary table S9, Supplementary Material online). Meanwhile, gene families related to peptidase inhibitor activity were contracted (supplementary table S10, Supplementary Material online). Further analysis harvested 87 rapidly evolving GOs (supplementary table S11, Supplementary Material online), most of them were involved in heterocyclic compound metabolism, membrane-bounded organelle, metal ion binding, and nitrogen and phosphorus metabolism (supplementary fig. S5, Supplementary Material online). These expanded and rapidly evolving genes depict the holistic adaptive changes within the genome. Positive changes of genes associated with heterocyclic and nitrogen compound metabolism hinted the genetic basis of B. lactea adapting to the chemosynthesis-based and highly toxic reducing environments in cold seeps (Berry, et al. 1987; Xue and Warshawsky 2005). Moreover, changes in membranes and metal ion binding may facilitate lives survive and thrive in the inhospitable conditions of cold seeps including high pressures, coldness, and hypoxia (Hourdez and Lallier 2006; Wang, Xu, et al. 2019). Although many works remain to be done to disentangle the origin and adaptation of life in cold seeps, we believe that the high-quality genome and numerous data created in this study will serve as important resources in answering these issues. Click here for additional data file.
  36 in total

1.  Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis.

Authors:  J Castresana
Journal:  Mol Biol Evol       Date:  2000-04       Impact factor: 16.240

Review 2.  Evolution and biogeography of deep-sea vent and seep invertebrates.

Authors:  C L Van Dover; C R German; K G Speer; L M Parson; R C Vrijenhoek
Journal:  Science       Date:  2002-02-15       Impact factor: 47.728

3.  GeneWise and Genomewise.

Authors:  Ewan Birney; Michele Clamp; Richard Durbin
Journal:  Genome Res       Date:  2004-05       Impact factor: 9.043

4.  Towards a molecular phylogeny of Mollusks: bivalves' early evolution as revealed by mitochondrial genes.

Authors:  Federico Plazzi; Marco Passamonti
Journal:  Mol Phylogenet Evol       Date:  2010-09-09       Impact factor: 4.286

5.  The Last Glacial Maximum.

Authors:  Peter U Clark; Arthur S Dyke; Jeremy D Shakun; Anders E Carlson; Jorie Clark; Barbara Wohlfarth; Jerry X Mitrovica; Steven W Hostetler; A Marshall McCabe
Journal:  Science       Date:  2009-08-07       Impact factor: 47.728

6.  BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.

Authors:  Felipe A Simão; Robert M Waterhouse; Panagiotis Ioannidis; Evgenia V Kriventseva; Evgeny M Zdobnov
Journal:  Bioinformatics       Date:  2015-06-09       Impact factor: 6.937

7.  InterProScan 5: genome-scale protein function classification.

Authors:  Philip Jones; David Binns; Hsin-Yu Chang; Matthew Fraser; Weizhong Li; Craig McAnulla; Hamish McWilliam; John Maslen; Alex Mitchell; Gift Nuka; Sebastien Pesseat; Antony F Quinn; Amaia Sangrador-Vegas; Maxim Scheremetjew; Siew-Yit Yong; Rodrigo Lopez; Sarah Hunter
Journal:  Bioinformatics       Date:  2014-01-21       Impact factor: 6.937

8.  RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies.

Authors:  Alexandros Stamatakis
Journal:  Bioinformatics       Date:  2014-01-21       Impact factor: 6.937

9.  A draft genome assembly of the solar-powered sea slug Elysia chlorotica.

Authors:  Huimin Cai; Qiye Li; Xiaodong Fang; Ji Li; Nicholas E Curtis; Andreas Altenburger; Tomoko Shibata; Mingji Feng; Taro Maeda; Julie A Schwartz; Shuji Shigenobu; Nina Lundholm; Tomoaki Nishiyama; Huanming Yang; Mitsuyasu Hasebe; Shuaicheng Li; Sidney K Pierce; Jian Wang
Journal:  Sci Data       Date:  2019-02-19       Impact factor: 6.444

10.  Fast and accurate long-read alignment with Burrows-Wheeler transform.

Authors:  Heng Li; Richard Durbin
Journal:  Bioinformatics       Date:  2010-01-15       Impact factor: 6.937

View more
  5 in total

1.  Genome of a giant isopod, Bathynomus jamesi, provides insights into body size evolution and adaptation to deep-sea environment.

Authors:  Jianbo Yuan; Xiaojun Zhang; Qi Kou; Yamin Sun; Chengzhang Liu; Shihao Li; Yang Yu; Chengsong Zhang; Songjun Jin; Jianhai Xiang; Xinzheng Li; Fuhua Li
Journal:  BMC Biol       Date:  2022-05-13       Impact factor: 7.364

2.  The First Draft Genome of a Cold-Water Coral Trachythela sp. (Alcyonacea: Stolonifera: Clavulariidae).

Authors:  Yang Zhou; Chenguang Feng; Yujin Pu; Jun Liu; Ruoyu Liu; Haibin Zhang
Journal:  Genome Biol Evol       Date:  2021-02-03       Impact factor: 3.416

3.  Comparative transcriptomic analysis of in situ and onboard fixed deep-sea limpets reveals sample preparation-related differences.

Authors:  Guoyong Yan; Yi Lan; Jin Sun; Ting Xu; Tong Wei; Pei-Yuan Qian
Journal:  iScience       Date:  2022-03-17

4.  Convergent Evolution and Structural Adaptation to the Deep Ocean in the Protein-Folding Chaperonin CCTα.

Authors:  Alexandra A-T Weber; Andrew F Hugall; Timothy D O'Hara
Journal:  Genome Biol Evol       Date:  2020-11-03       Impact factor: 3.416

5.  The genome of an apodid holothuroid (Chiridota heheva) provides insights into its adaptation to a deep-sea reducing environment.

Authors:  Long Zhang; Jian He; Peipei Tan; Zhen Gong; Shiyu Qian; Yuanyuan Miao; Han-Yu Zhang; Guangxian Tu; Qi Chen; Qiqi Zhong; Guanzhu Han; Jianguo He; Muhua Wang
Journal:  Commun Biol       Date:  2022-03-10
  5 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.