The genome of the oyster Saccostrea offers insight into the environmental resilience of bivalves.

Daniel Powell¹, Sankar Subramanian¹, Saowaros Suwansa-Ard¹, Min Zhao¹, Wayne O'Connor², David Raftos³, Abigail Elizur¹.

Abstract

Oysters are keystone species in estuarine ecosystems and are of substantial economic value to fisheries and aquaculture worldwide. Contending with disease and environmental stress are considerable challenges to oyster culture. Here we report a draft genome of the Sydney Rock Oyster, Saccostrea glomerata, an iconic and commercially important species of edible oyster in Australia known for its enhanced resilience to harsh environmental conditions. This is the second reference genome to be reported from the family Ostreidae enabling a genus-level study of lophotrochozoan genome evolution. Our analysis of the 784-megabase S. glomerata genome shows extensive expansions of gene families associated with immunological non-self-recognition. Transcriptomic analysis revealed highly tissue-specific patterns of expression among these genes, suggesting a complex assortment of immune receptors provide this oyster with a unique capacity to recognize invading microbes. Several gene families involved in stress response are notably expanded in Saccostrea compared with other oysters, and likely key to this species' adaptations for improved survival higher in the intertidal zone. The Sydney Rock Oyster genome provides a valuable resource for future research in molluscan biology, evolution and environmental resilience. Its close relatedness to Crassostrea will further comparative studies, advancing the means for improved oyster agriculture and conservation.

Year: 2018 PMID: 30295708 PMCID: PMC6289776 DOI: 10.1093/dnares/dsy032

Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458

1. Introduction

Oysters of the family Ostreidae are a group of bivalve molluscs that includes over 70 extant members. They are considered keystone species and are widely distributed in estuarine ecosystems, where they perform important roles in mitigating turbidity and improving water quality. Edible oysters have established commercial significance in fisheries and aquaculture industries, being among the most highly produced mollusc species in the world. Having evolved an extraordinary resilience to the harsh conditions of intertidal marine environments, oysters are capable of tolerating wild fluctuations in temperature and salinity, extended emersion and the persistent exposure to microbes encountered by filter-feeding. A rich and diverse set of immune and stress response genes in the oyster genome is thought to be pivotal to the remarkably effective host defence system that enables these animals to thrive in estuaries and coastal oceans worldwide. Despite these adaptations, oyster populations both wild and captive are threatened with mass mortalities caused by epizootic infections and by factors associated with environmental change. Understanding and ameliorating susceptibility to these threats is essential for the establishment of secure mariculture and effective conservation. The Sydney Rock Oyster (Saccostrea glomerata) is an economically important species of edible oyster in Australia, naturally populating the shorelines of its eastern coast and extending across the Tasman Sea to the northern regions of New Zealand. Its cultivation contributes substantially to an aquaculture industry in Australia dating back to the 19th century, supported by selective breeding programmes that have been operating for over 25 years. The Sydney Rock Oyster, in contrast to the more widely distributed and invasive Pacific oyster (Crassostrea gigas), grows ∼60% slower under favourable conditions, yet has a higher tolerance to abiotic stress, surviving up to three times longer out of water. It also appears to be resistant to the devastating viral disease Pacific Oyster Mortality Syndrome caused by OsHV-1. These characteristics provoke inquiry into the nature of the higher resilience observed in S. glomerata and offer a unique opportunity for comparative studies given the recent availability of the Pacific oyster genome. A lack of sequenced genomes from closely related species of Lophotrochozoa has limited the extent of comparative studies within this highly diverse superphylum. Decoding a complete Saccostrea genome enables a deeper understanding of the biology and evolution of the Ostreidae and will serve as a valuable resource for genetic improvement within the oyster farming industry. Here, we present an annotated draft genome for S. glomerata and explore comparisons between genes relevant to resilience among a close relative and other more evolutionarily distant molluscs.

2. Materials and methods

2.1. Generation of sequence data

Mantle and gill tissues were dissected from a single female oyster for high molecular weight DNA extraction and library preparation. PCR-free short-insert libraries of 210 and 450 bp along with mate-pair libraries of 3, 6 and 9 kb were sequenced on the Illumina HiSeq 2500 (Supplementary Table S1). A Chicago library was produced from an additional oyster and sequenced on the Illumina platform.

2.2. Sequence assembly

Raw reads were quality filtered and trimmed using either Trimmomatic or Skewer to produce over 200 Gb of clean data (Supplementary Table S2). Clean reads were assembled and initially scaffolded de novo using Meraculous2. For detailed information see Supplementary data. Construction of a primary haploid assembly was performed using HaploMerger2 and further scaffolding performed by Dovetail Genomics (Santa Cruz, CA, USA). Read mapping of genomic reads was performed using Bowtie2. Rate of heterozygosity was estimated using the SAMtools/BCFtools (mpileup) pipeline to call SNPs and short InDels. Estimates of completeness were undertaken using both the CEGMA pipeline and BUSCO searches of the OrthoDB metazoan library.

2.3. Genome annotation

Automated gene annotation was performed using MAKER2 (see Supplementary data). Repetitive sequences were soft-masked with RepeatMasker (http://www.repeatmasker.org/RMDownload.html (13 September 2018, date last accessed)) using the RepBase library and a custom repeat library generated with RepeatModeler (http://www.repeatmasker.org/RepeatModeler/ (13 September 2018, date last accessed)). Protein-coding sequences from the genomes of C. gigas, Pinctada fucata, Lottia gigantea, Octopus bimaculoides, Drosophila melanogaster and Homo sapiens were used for homology-based gene prediction. Stranded RNA-Seq data were aligned to the genome and used with the BRAKER1 pipeline to train an AUGUSTUS model. This model was supplied to the MAKER2 pipeline together with a stranded transcriptome assembly produced by Ertl et al., CEGMA-derived proteins and the features of the RNA-Seq alignment. Predictions were removed from the final gene set if they aligned neither to the protein-coding genes from other species using BLAST (E-value <10⁻¹⁰) nor to the Pfam database.
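
The final filtering rule can be sketched as a small predicate. This is only an illustration of the logic described above, not the pipeline that was actually run; the function name and its arguments are hypothetical, and only the 10⁻¹⁰ E-value threshold comes from the text.

```python
def keep_prediction(best_blast_evalue, has_pfam_hit, cutoff=1e-10):
    """Return True if a gene prediction has supporting evidence.

    best_blast_evalue: lowest BLAST E-value against proteins from other
        species, or None when there was no hit at all (hypothetical input).
    has_pfam_hit: True when the prediction matched the Pfam database.
    A prediction is dropped only when it lacks both kinds of support.
    """
    has_blast = best_blast_evalue is not None and best_blast_evalue < cutoff
    return has_blast or has_pfam_hit


# Hypothetical predictions: (best E-value, Pfam hit?)
examples = [(1e-30, False), (1e-5, False), (None, True), (None, False)]
kept = [keep_prediction(e, p) for e, p in examples]
```

Under these assumptions, only the second and fourth toy predictions would be removed.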

2.4. Phylogenetic analysis

Amino acid sequences of 1,205 genes were concatenated to create a super gene (see Supplementary data). The concatenated sequences from 13 species were aligned using MUSCLE with default settings. After removing alignment gaps from all sequences, 247,779 amino acids were available for further analysis. This multiple sequence alignment was used to infer the phylogenetic relationships between the species with the maximum likelihood program RAxML. A γ distribution with four rate categories was used to model the rate variation among sites. To model substitutions between amino acids we opted for the LG (Le and Gascuel) substitution matrix and used the empirical amino acid frequencies. The species Nematostella vectensis was set as the outgroup. A bootstrap resampling procedure with 100 pseudo-replicates was used to obtain statistical confidence for each bifurcation (node) of the phylogenetic tree. The software FigTree (http://tree.bio.ed.ac.uk/software/figtree/ (13 September 2018, date last accessed)) was used to view and print the tree generated by RAxML.
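
The gap-removal step can be illustrated with a minimal sketch. It assumes the alignment is held as a mapping from species label to aligned sequence and that "-" marks a gap; the function name and the toy sequences are hypothetical and stand in for the real 13-species alignment.

```python
def drop_gap_columns(alignment, gap="-"):
    """Remove every alignment column in which at least one sequence has a gap.

    alignment: dict mapping a species label to its aligned sequence;
        all sequences must have the same length.
    Returns a new dict with the same keys and gap-free sequences.
    """
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if gap not in col]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


# Toy alignment of three species (not real data)
toy = {"Sgl": "MK-LV", "Cgi": "MKALV", "Lgi": "M-ALI"}
gap_free = drop_gap_columns(toy)
```

In the toy case only the first, fourth and fifth columns are free of gaps, so each sequence shrinks from five to three residues.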

2.5. Divergence time estimation

To estimate the divergence times between molluscan species, the Bayesian MCMCtree method was employed. The amino acid sequence alignment was used for this analysis and the maximum likelihood tree obtained from the RAxML program was used as the guide tree. The following fossil ages were used to calibrate the tree: 306–581 million years (MY) for the split between Capitella teleta and Helobdella robusta, 470–532 MY for the Aplysia californica–L. gigantea divergence, 532–549 MY for the first appearance of the molluscs and 550–636 MY for the first appearance of the Lophotrochozoa and Eumetazoa. We also fixed a maximum age of 650 MY for the root of the tree (root age). To obtain the Hessian matrix for the protein data, the codeml program of the software PAML was used. Using the WAG+Gamma model of amino acid substitution and the four calibration times listed above, the divergence times were estimated. The results of MCMCtree were checked for convergence using the program Tracer (http://tree.bio.ed.ac.uk/software/tracer/ (13 September 2018, date last accessed)) and the time-tree generated by this program was viewed using FigTree. The divergence times for each bifurcating node are given in Fig. 1a.

Figure 1.

Divergence time and rate of non-synonymous substitutions between bivalves. (a) A time tree based on protein sequences from 16 metazoan genomes. Divergence times were estimated using a Bayesian MCMC method. A maximum likelihood based phylogenetic tree was calibrated based on seven well-defined fossils (see Methods). (b) Genetic divergence measured by the non-synonymous substitution rate (dN) between bivalves and the outgroup Lottia gigantea using 3,269 orthologous genes. The relative divergences with respect to the outgroup reveal the differences in the rate of protein evolution among bivalves.

2.6. Rate of protein evolution and gene expression

To examine the correlation between gene expression and rate of protein evolution we used 11,388 orthologous genes for the Sydney Rock-Pacific Oyster comparison. Using the codeml module of the software program PAML we obtained the likelihood-based pairwise non-synonymous divergence for each gene. We then obtained the expression levels of each gene in five tissues (gill, mantle, muscle, haemolymph and digestive system). Our analysis revealed a highly significant negative correlation (P < 0.0001) between expression level and rate of protein evolution (Supplementary Fig. S7a). We also sorted protein-coding genes based on their expression levels and grouped them into 12 categories, each containing an equal number (949) of genes. The average estimates of gene expression levels and rate of protein evolution were computed for genes belonging to each category (Supplementary Fig. S7b). The relationship based on the mean estimates was also highly significant (P < 0.0001). To examine the rate of protein evolution across bivalves we used 3,269 orthologous genes from the 6 bivalves and L. gigantea, which was used as the outgroup. Using the codeml module of PAML we obtained the likelihood-based pairwise non-synonymous divergence between L. gigantea and each bivalve genome. Since the time of divergence between L. gigantea and each bivalve is expected to be the same, any difference in the pairwise divergence suggests variation in the rate of evolution between different bivalves (Fig. 1b).
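
The grouping into equal-sized expression categories can be sketched as follows. This is a minimal illustration under stated assumptions: each gene is represented as an (expression, dN) pair, the function name is hypothetical, and the toy values are invented. With the 11,388 genes described above, 12 bins give exactly 949 genes per bin.

```python
from statistics import mean


def bin_by_expression(genes, n_bins=12):
    """Sort genes by expression, split them into equal-sized bins and
    return the (mean expression, mean dN) of each bin.

    genes: list of (expression_level, dN) tuples.
    Genes left over when len(genes) is not divisible by n_bins are
    dropped from the top of the ranking (a simplification of this sketch).
    """
    ordered = sorted(genes, key=lambda g: g[0])
    size = len(ordered) // n_bins
    bins = [ordered[i * size:(i + 1) * size] for i in range(n_bins)]
    return [(mean(e for e, _ in b), mean(d for _, d in b)) for b in bins]


# Toy data: 24 genes whose dN falls as expression rises (not real values)
toy_genes = [(float(i), 1.0 / (i + 1)) for i in range(24)]
summary = bin_by_expression(toy_genes)

genes_per_bin = 11388 // 12  # bin size for the gene set described in the text
```

A negative relationship of the kind reported in Supplementary Fig. S7b would appear here as a mean dN that decreases from the first bin to the last.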

2.7. Gene family analysis

Protein sequences from S. glomerata, C. gigas, P. fucata and Patinopecten yessoensis were compared for orthology using all-against-all BLASTP alignment (E-value of 10⁻⁵) and clustered using OrthoMCL (inflation value of 1.5). Protein family domain analysis was performed with Hidden Markov Model (HMM) searches of the Pfam database (see Supplementary data). Phylogenetic trees were generated using full-length protein sequences aligned with MUSCLE and constructed with FastTree using the Jones–Taylor–Thornton model, then visualized with FigTree v1.4.3 and MEGA 7. Functional annotation of the protein-coding genes from the S. glomerata genome was undertaken using BLASTp 2.5.0+ (E-value 10⁻⁵) to search for homologs in a local copy of the NCBI non-redundant database (nr). Protein-coding sequences were aligned against the NCBI KOG database (version 28 March 2017) using RPSBLAST v2.2.15 performed via the WebMGA server. For analysis of biological pathways present within the S. glomerata genome, KEGG orthologous gene information was obtained using the KEGG Automatic Annotation Server (KAAS) with a bi-directional best hit approach and the eukaryote representative gene set as reference.

2.8. Quantifying gene expression

Paired-end transcriptome libraries from gill, mantle, male gonad, female gonad, muscle, haemolymph and digestive system used in this study were generated previously. Expression levels were measured by aligning quality-processed RNA-Seq reads to the genome assembly using HiSat. Mapped reads were sorted with SAMtools and counts reported as transcripts per one million mapped reads (TPM) using StringTie. TPM values visualized in heatmaps were transformed to log2 (TPM + 1) and normalized across tissues using the scale function in R.
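
The heatmap transformation can be sketched for a single gene as follows. This is an approximation in Python of the log2 (TPM + 1) step followed by centring and scaling; it assumes the scaling was applied per gene across the seven tissues and uses the sample standard deviation, as R's scale function does. The function name and the toy TPM values are hypothetical.

```python
import math
from statistics import mean, stdev


def zscore_log_tpm(tpm_by_tissue):
    """Transform one gene's TPM values to log2(TPM + 1), then express each
    tissue as the number of standard deviations from the gene's mean.

    Uses the sample standard deviation (n - 1 denominator). A gene with
    identical values in every tissue is returned as all zeros, a choice
    made for this sketch only.
    """
    logged = [math.log2(v + 1) for v in tpm_by_tissue]
    mu, sd = mean(logged), stdev(logged)
    if sd == 0:
        return [0.0] * len(logged)
    return [(x - mu) / sd for x in logged]


# Toy TPM values for one gene in seven tissues (not real data)
z = zscore_log_tpm([0, 1, 3, 7, 15, 31, 63])
```

After this step every gene has mean 0 and standard deviation 1 across tissues, so heatmap colours reflect relative tissue specificity and not absolute abundance.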

3. Results

3.1. Genome sequencing and de novo assembly

The genome of S. glomerata was estimated to be 784 Mb in size based on an analysis of k-mer frequency distribution. It was sequenced to over 300-fold coverage based on this estimate by a whole-genome shotgun approach using the Illumina HiSeq platform (Supplementary Table S1). Over 200 Gb of quality-filtered short-insert and mate-pair read data (Supplementary Table S2) were initially assembled and scaffolded in accordance with the strategy outlined in Supplementary Fig. S1. Further contiguity improvements were made by Chicago library sequencing, HiRise scaffolding and gap filling (Supplementary data). The final S. glomerata draft assembly included 788 Mb in 10,107 scaffolds with a scaffold N50 of 804.2 kb and a contig N50 of 39.8 kb (Supplementary Tables S3 and S4). This genome is ∼44% larger than that estimated for the closely related Pacific oyster C. gigas (545 Mb) yet smaller than that of the pearl oyster P. fucata (1.14 Gb). Many invertebrate genomes, including those of oysters, exhibit high levels of heterozygosity and repetitiveness, which can complicate the assembly process. The repeat content of the draft genome was estimated to be 45.03% (Supplementary Table S5). The S. glomerata genome exhibits a relatively high level of heterozygosity based on the observation of 3.2 million SNPs and 354,373 short insertion/deletions (indels) in 703,199,470 eligible positions. This corresponds to a polymorphism rate of 0.51%, comparable with that of the inbred sequenced C. gigas (0.73%) but far higher than that of the octopus O. bimaculoides (0.08%). Almost 82% complete and 96% partial matches to the 248 core eukaryotic gene set could be identified in the draft assembly using the CEGMA pipeline (Supplementary Table S6), which is comparable with other invertebrate genome assemblies reported previously. Using the BUSCO tool we could detect a total of 787 (93.3%) of the 843 genes in the metazoan library, with 672 (79%) of these being complete matches. Over 91% of the clean short-insert read data could be aligned to the draft assembly. Half the assembly was contained in the longest 241 scaffolds, ranging from 0.8 to 7.1 Mb. Over 90% of the assembled bases were covered by the longest 1,321 (13%) scaffolds. The assembly metrics of the S. glomerata genome are comparable to those of other published mollusc genomes and, together with completeness measures, indicate the production of a comprehensive draft.
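
Two of the summary statistics above lend themselves to a short worked sketch: the N50 and the polymorphism rate. The N50 function below follows the usual definition (the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly); the scaffold lengths are toy values, and the function name is hypothetical. The polymorphism calculation uses the rounded figure of 3.2 million SNPs quoted in the text, so it is approximate.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths: the length of the
    sequence at which the cumulative sum, taken from longest to shortest,
    first reaches at least half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no lengths given")


# Toy scaffold lengths (not the real assembly)
toy_n50 = n50([8, 5, 4, 3, 2, 2])

# Polymorphism rate from the counts quoted in the text
snps = 3_200_000            # rounded value reported as "3.2 million"
indels = 354_373
eligible_positions = 703_199_470
polymorphism_pct = (snps + indels) / eligible_positions * 100

# Genome size relative to C. gigas, from the estimates in the text (Mb)
size_increase_pct = (784 / 545 - 1) * 100
```

With these inputs the polymorphism rate comes to roughly 0.51% and the size difference to roughly 44%, in line with the figures reported above.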

3.2. Genome annotation and comparative analysis

A total of 29,738 protein-coding genes from the S. glomerata genome were annotated using homology-based and ab initio predictions (Supplementary data). Almost 88% of these were supported by RNA-Seq evidence derived from six different tissues. The gene content spanned one-third of the genome with a mean gene length of 8,737 bp, averaging 8 exons per transcript (Supplementary Table S7). The number of gene predictions presented here is similar to that reported for the two published oysters C. gigas (28,027) and P. fucata (29,353). Comparing BUSCO search results with the gene models from C. gigas and P. fucata shows that the predictions from this study have the greatest number of complete (87.2%) and the lowest number of missing genes (5.6%) of the three, indicating this study has produced the most complete gene model set for an oyster species reported to date (Supplementary Table S8). Phylogenetic analysis using 1,205 conserved orthologues with 247,779 amino acid positions from 16 metazoan species shows that Saccostrea diverged from Crassostrea ∼77 million years ago (Ma) (Fig. 1a and Supplementary Fig. S6). The time estimates obtained for the other nodes of the tree were comparable to those of previous studies. Saccostrea glomerata has the slowest non-synonymous substitution rate in protein sequences of the six bivalve assemblies reported to date, a rate slightly lower than that of the scallop and suggestive of a slowly evolving genome (Fig. 1b). The rate of protein evolution has a particularly strong negative correlation with the level of gene expression, indicating strong selection pressures on genes expressed highly across the genome (Supplementary Fig. S7). Comparison of orthologous gene groups shared among the bivalves S. glomerata, C. gigas, P. fucata and the scallop P. yessoensis shows a core set of 8,838 gene groups and a unique set of 1,111 that are Saccostrea-specific. Saccostrea glomerata shares 13,106 gene groups with the most closely related C. gigas yet has a larger number of unique genes (Fig. 2a and b). Gene ontology (GO) enrichment analysis of the gene groups unique to Saccostrea reveals an overrepresentation of GO terms associated largely with binding and metabolism (Supplementary Table S9).
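
The core, shared and species-specific counts above are essentially set operations on the clustered gene groups. The sketch below assumes each group has been reduced to the set of species it contains; the group identifiers, the helper names and the toy memberships are hypothetical and do not reproduce the OrthoMCL output.

```python
def count_core_and_unique(groups, species, focal):
    """Count gene groups present in every listed species (core) and groups
    whose members come from the focal species only (unique)."""
    wanted = set(species)
    core = sum(1 for members in groups.values() if wanted <= members)
    unique = sum(1 for members in groups.values() if members == {focal})
    return core, unique


def count_shared(groups, a, b):
    """Count gene groups that contain members from both species a and b."""
    return sum(1 for members in groups.values() if a in members and b in members)


# Toy clustering result: group id -> species represented (not real data)
toy_groups = {
    "OG1": {"Sgl", "Cgi", "Pfu", "Pye"},
    "OG2": {"Sgl", "Cgi"},
    "OG3": {"Sgl"},
    "OG4": {"Pfu", "Pye"},
    "OG5": {"Sgl", "Cgi", "Pfu", "Pye"},
}
core, unique = count_core_and_unique(toy_groups, ["Sgl", "Cgi", "Pfu", "Pye"], "Sgl")
shared_sgl_cgi = count_shared(toy_groups, "Sgl", "Cgi")
```

Note that a group counted as shared between two species here also includes the core groups, which is why the shared count in the text (13,106) exceeds the core count (8,838).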

Figure 2.

Gene family representation analysis. (a) Numbers of shared and unique gene groups in four species of molluscs. Gene groups were constructed by clustering of orthologous groups using OrthoMCL software. (b) Genome-wide orthology based on OrthoMCL gene clustering among nine species of molluscs. (c) A selection of expanded Pfam domains in S. glomerata. Associated gene families were considered expanded with a corrected P value of <0.01. Multiple domains in a given gene model were counted only once. Sgl, S. glomerata; Cgi, C. gigas; Pfu, P. fucata; Pye, P. yessoensis; Mph, Modiolus philippinarum; Bpl, Bathymodiolus platifrons; Lgi, Lottia gigantea; Aca, Aplysia californica; Obi, O. bimaculoides; Lan, Lingula anatina; Cte, Capitella teleta; Dme, Drosophila melanogaster; Spu, Strongylocentrotus purpuratus; Cel, Caenorhabditis elegans; Dre, Danio rerio; Xtr, Xenopus tropicalis; Hsa, Homo sapiens; Nve, Nematostella vectensis. *Pfam domain enriched in each species of bivalves; #Pfam domains not found expanded in Crassostrea.

3.3. Gene family expansion

To better understand the genome-wide similarities among molluscs, we examined the distribution of 8,629 Pfam domains across a diverse set of 26 metazoan genomes, identifying a number of significant gene family expansions in Saccostrea. These include toll-like receptors (TLRs; PF01582), immunoglobulin domain-containing genes (PF00047), thrombospondins (PF00090), complement (C1q) subunits (PF00386), G-protein-coupled receptors (GPCRs; PF00002) and heat-shock proteins (PF00012), many of which are not found expanded in Crassostrea (Fig. 2c). A group of 42 gene families were determined to be significantly expanded in the S. glomerata genome and not in any of the other molluscs examined. Almost a third (13) of these are expanded uniquely in Saccostrea, that is to say, they are not considered expanded in any of the other 25 species included in this analysis (Supplementary Table S10). Some of the most notable of these expansions have occurred in gene families associated with non-self-recognition and other components of the immune response. Expansions of some immune-related gene families have been described previously in studies of Crassostrea and other bivalves, and have been attributed to a pathogen-rich and dynamic intertidal habitat. However, several protein families in Saccostrea greatly exceed the domain counts found in the other oysters. Saccostrea glomerata also retains a considerable inventory of some of the important and well-studied immune-related gene families compared with other lophotrochozoans (Table 1), even though these families do not appear expanded when compared with the metazoan genomes used in this study.
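
The domain counts underlying this comparison tally each gene model once per Pfam domain, however many copies of the domain it carries. A minimal sketch of that counting rule is given below; it assumes the HMM search results have already been reduced to (gene, Pfam accession) pairs, and the gene identifiers and function name are hypothetical. The statistical test for expansion is not shown.

```python
from collections import defaultdict


def genes_per_domain(hits):
    """Count, for each Pfam accession, the number of distinct gene models
    with at least one hit. Repeated domains within a gene count once.

    hits: iterable of (gene_id, pfam_accession) pairs, one per domain hit.
    """
    genes = defaultdict(set)
    for gene_id, pfam in hits:
        genes[pfam].add(gene_id)
    return {pfam: len(ids) for pfam, ids in genes.items()}


# Toy hits: gene g1 carries two copies of the TIR domain PF01582 (not real data)
toy_hits = [
    ("g1", "PF01582"),
    ("g1", "PF01582"),
    ("g2", "PF01582"),
    ("g2", "PF00047"),
    ("g3", "PF00047"),
]
counts = genes_per_domain(toy_hits)
```

Counting genes and not domain instances keeps a single multi-domain protein from inflating the apparent size of a family.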

Table 1.

Distribution of protein families associated with immune response

Pfam	Domain	Sgl	Cgi	Pfu	Pye	Mph	Bpl	Lgi	Aca	Obi	Lan	Cte
PF00335	Tetraspanin	63	54	53	52	4	42	39	49	31	43	50
PF00061	Lipocalin	12	13	14	11	1	11	14	17	0	17	2
PF02798	GST_N	33	25	37	29	2	23	24	56	22	44	22
PF00255	GSHPx	6	8	9	6	1	7	4	3	4	10	3
PF08210	APOBEC_N	7	2	2	1	1	1	0	0	0	6	0
PF00080	Sod_Cu	8	10	8	8	2	6	9	9	4	6	6
PF01823	MACPF	20	13	9	5	10	13	4	3	0	8	1
PF11648	RIG-I_C-RD	9	8	6	5	2	12	3	5	1	6	3
PF02898	NO_synthase	1	1	7	1	1	2	1	3	1	3	2
PF16673	TRAF_BIRC3_bd	2	2	2	1	1	1	0	0	0	0	1
	Total	161	136	147	119	25	118	98	145	63	143	90

Number of proteins containing the specific Pfam domain for each family. GST_N, glutathione S-transferase, N-terminal domain; GSHPx, glutathione peroxidase; APOBEC_N, apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like, N-terminal domain; Sod_Cu, copper/zinc superoxide dismutase (SODC); MACPF, membrane attack complex/perforin (MACPF) domain; RIG-I_C-RD, C-terminal domain of RIG-I; NO_synthase, nitric oxide synthase, oxygenase domain; TRAF_BIRC3_bd, TNF receptor-associated factor BIRC3 binding domain.

TLRs are characterized by an intracellular toll/interleukin-1 receptor (TIR) domain and have a well-described association with innate immunity in animals and plants. The S. glomerata genome contains 182 TIR domain-containing genes, a marked expansion beyond the already broad repertoires of 61 and 91 found in C. gigas and P. fucata, respectively (Fig. 3a). This level of expansion extends to other families of non-self-recognition proteins, such as C-type lectins and fibrinogen-related proteins (FREPs). The diversification of TLRs is thought to be important for the recognition of Pathogen Associated Molecular Patterns (PAMPs) and the subsequent activation of an immunological response via MYD88 signalling. The expansions of TLRs indicate a highly specialized recognition and signalling system in Saccostrea. The abundant expression in the male gonad implies that a number of these genes may have a role in reproduction (Fig. 3c). TLRs have various functions in addition to immunity in other invertebrates, and mammalian studies have suggested a role for TLRs in spermatogenesis and the protection of spermatozoa. The clear tissue specificity of groups of TIR domains in S. glomerata may reflect distinctions or possible interactions between reproductive and immune processes.

Figure 3.

Expansion of Toll/interleukin-1 receptor domain containing genes in S. glomerata. (a) Phylogenetic tree of TIR domain-containing genes among five molluscs, S. glomerata (blue), C. gigas (red), P. fucata (purple), P. yessoensis (green), L. gigantea (gold). I indicates an expansion that includes the largest cluster of 13 genes located on a single scaffold (Scaffold SL_29), II & III indicate the major Saccostrea expansions. (b) Scaffold containing the largest cluster of TIR domain-containing genes. The 13 blue regions indicate TIR genes with 3 non-TIR genes interspersed in gold. (c) Expression profile of 154 TIR genes in 7 different tissues. Cell colours indicate the number of standard deviations from the mean expression level.

The fibrinogen_C domain is the C-terminal globular domain of FREPs, which are known to exhibit extremely high levels of sequence variability and can function as molecular recognition units in immunological defence. There are a total of 576 genes in the S. glomerata genome containing at least one fibrinogen_C domain, more than in any of the 25 other genomes included in the gene family analysis. This number is triple that of C. gigas (192) and P. fucata (163) and over 100 more than B. floridae, which has the highest complement of FREPs (395) of any other species included in the comparison (Fig. 4a). The S. glomerata genes containing fibrinogen_C domains vary considerably in their length, composition and domain arrangement. The majority appear to be indiscriminately distributed across the genome apart from a few main clusters, the largest being a set of 17 genes arranged in the same orientation and spanning 478 kb on a single scaffold (Fig. 4b). The majority of FREPs were not found to be expressed among the 7 tissue types used in this study; however, the expression profile of the 262 FREPs that were detected suggests these genes are distinctly tissue-specific, occurring to a lesser extent within the haemolymph (Fig. 4c). The broadest expression of FREPs can be seen in gill tissue, likely due to the increased exposure of this organ to exogenous microbes. Also observed is an expansion of C-type lectin domain-containing genes that are similarly important for non-self-recognition and display comparable tissue-specific patterns of expression (Supplementary Fig. S9).

Figure 4.

Expansion and expression of S. glomerata fibrinogen_C domain-containing genes. (a) Distribution of fibrinogen_C domain-containing gene models detected using hmmsearch among 22 selected metazoan genomes. (b) Scaffold SGL_242 and Scaffold SGL_221 contain the 2 largest clusters of fibrinogen_C domain-containing genes. The largest cluster is on Scaffold SGL_242 and contains 17 genes that vary in genomic span and are each orientated in the anti-sense direction. Scaffold SGL_221 contains 15 genes, the majority of which form a set in the sense direction flanked by 5 genes in the anti-sense direction. (c) Expression profiles of 262 fibrinogen_C domain containing genes in 7 S. glomerata tissues. Expression of the remaining 314 fibrinogen_C domain containing genes was not detected in any of the 7 tissues sequenced.

Phenoloxidases are copper-containing proteins, including tyrosinase and laccase, that have important roles in the innate immune mechanisms of invertebrates. Laccases contain multicopper oxidase domains (PF00394), which along with tyrosinase domains (PF00264) are expanded in S. glomerata. Phenoloxidase has been shown to be important in disease resistance in S. glomerata, and laccases have been shown to have antibacterial activity in C. gigas. Phylogenetic analysis shows that genes in S. glomerata containing at least three multicopper oxidase domains form two clusters; these genes are expressed most highly in the digestive system and mantle, with a pattern suggestive of tissue-specific functionality (Supplementary Fig. S10). In addition to the expansion of immune-related gene families, oysters have stress response adaptations that enable them to persist in challenging environments. Comparisons of anti-apoptosis and stress-related genes reveal a broadening of these gene families in S. glomerata compared with C. gigas (Fig. 5). A similar pattern is shared between the mussel B. platifrons, which endures the stress associated with deep sea vents, and the shallow-water mussel M. philippinarum. Expansion of these gene families appears to be a critical response for building resilience in these more stress-adapted species.

Figure 5.

Comparison of domains associated with stress between oysters and mussels. Number of proteins containing the specific Pfam domain for each family. Bcl-2, apoptosis regulator proteins Bcl-2 family; TNF, Tumour necrosis factor family; CARD, Caspase recruitment domain; BIR, inhibitor of apoptosis domain; USP, universal stress protein family; TNFR_c6, tumour necrosis factor receptor cysteine-rich region.

Precise regulation of apoptosis is essential for an organism’s ability to adapt to changing environments. Inhibitor of apoptosis proteins (IAPs) are characterized by the presence of at least one copy of the baculovirus IAP repeat (BIR) domain, which facilitates binding between IAPs and caspases, controlling apoptotic signalling. Oysters are thought to maintain a powerful anti-apoptosis system given the repertoire of 48 IAPs annotated from the genome of Crassostrea, a view further supported by the 61 found in Pinctada in this study. The S. glomerata genome encodes 80 IAP genes, almost half of which form two clusters based on phylogenetic analysis, suggesting a recent expansion (Supplementary Fig. S8a). The expression of these 80 IAPs in 7 oyster tissues appears tissue-specific, suggesting the possibility of functional specializations (Supplementary Fig. S8b). However, the phylogenetic relationships of these IAPs do not appear to accord with tissue-specific patterns of expression. Expansions of genes containing the IAP repeat were found only within the 6 bivalves included in this study. The enrichment of IAP genes unique to the bivalve lineage is likely a fundamental component of their extraordinary survivability.

3.4. Analysis of molluscan genes associated with reproduction

Reproductive activities of all eumetazoans are controlled by the neuroendocrine system. In vertebrates, this includes a hypothalamic–pituitary–gonadal (HPG) axis. Several homologs of the HPG axis, such as gonadotropin-releasing hormone (GnRH) and glycoprotein hormones, have been identified among invertebrates. However, the molecular components of reproductive regulation in invertebrates are still largely unresolved and can be variable across phyla. We found 40 genes in the S. glomerata genome that have previously been linked to invertebrate reproductive processes. A similar distribution was observed for these genes among six other molluscan genomes (Fig. 6). The majority of non-neuropeptide genes associated with reproduction were highly expressed in the gonads, both testis and ovary. However, abundant expression of some of these genes within non-reproductive tissues (haemolymph, gill, mantle, muscle and digestive gland) indicates they may fulfil additional roles. For example, the high abundance of NPY and GPR54-2 within the digestive gland suggests a functional association of these two proteins in the physiological regulation of the digestive system and feeding. Putative receptors for GnRH/corazonin, tachykinin and NPY, and a homologue of vertebrate GPR54, which is a receptor for the vertebrate reproductive neuropeptide ‘kisspeptin’, were abundant in the gonad tissues. This provides strong evidence for a role of these receptors, with their cognate ligands, in Saccostrea reproduction.

Figure 6.

Genes associated with reproduction and their distribution in 6 molluscs and their expression in S. glomerata.

Genes that were more abundant in the testis compared with the ovary include the putative receptor genes (5HTR1, TACRs, GnRHR/CrzR, and NPYR) and testis-specific protein genes (TSSKs). Conversely, expression of genes encoding reproductive signalling factors, such as CDA-A, β-catenin, Phb2, nanos, FoxL2 and vitellogenin, was higher in ovary tissue. Neuropeptides, including CCAP, buccalin, APGWamide and ELH, were found at relatively low abundance in the gonads, but at higher abundance in the gill and mantle tissues. In the muscle and digestive tissues, there was overall low expression of all reproduction-related genes, except for NPY and GPR54-2, which were highly expressed and potentially specific to the digestive gland.

4. Discussion

This study presents the sequenced and assembled genome of the Sydney Rock Oyster, S. glomerata, an important species for Australian aquaculture and a focus of conservation due to the severe decline in shellfish reef habitats. The S. glomerata genome is the second reference genome to be reported from the family Ostreidae (edible oysters), enabling for the first time whole-genome comparative studies between ostreid species. This resource can improve our understanding of the mechanisms these organisms have evolved for survival in the highly stressful intertidal environment, facilitating selective breeding and the enhancement of oyster cultivation practices.

The draft assembly achieved comparably high levels of completeness and contiguity with respect to other published mollusc genomes. The statistics for scaffold and contig N50 improved upon the two other oyster genomes previously reported, due largely to technological advances in sequencing. The use of 250 bp read lengths and ‘Chicago’ libraries contributed heavily to the production of a highly contiguous assembly, offering a more affordable alternative to BAC or fosmid sequencing. The larger genome of S. glomerata did not appear to contain a proportionately higher number of protein-coding genes, nor larger average exon or intron sizes. However, 354 Mb of repetitive sequences (45% of the genome) were identified compared with only 202 Mb (36% of the genome) found in C. gigas. An expansion of nuclease and transposase domains such as PF01498 (15 out of 89 total genes across 23 species) in S. glomerata suggests that this may be the result of increased transposable element activity, adding further support to a role for these mobile sequences in shaping genome variation.

The types of gene families found expanded in the S. glomerata genome are not dissimilar to those in the other bivalve genomes. In fact, there is a noteworthy collection of genes associated with stress and immune defence in the genome of C. gigas that are thought to have expanded in response to environmental variability and the exposures associated with filter-feeding that the oyster confronts. It appears that these features of the oyster genome are not unique to C. gigas, despite its invasive capacity, and are rather a characteristic of oysters more generally and their adaptation to a sessile existence in harsh environments. However, some of the immune-associated gene families are up to 3-fold more abundant in Saccostrea than in the other published molluscs and are more abundant than in any of the other 26 species investigated here, suggesting an evolutionary history of extensive selective pressure from invading microbes. Given the relatively short divergence time from Crassostrea and a slower rate of protein evolution, yet a distinctly larger, more repetitive genome, it may be that the mechanisms for gene duplication are intrinsically more active in Saccostrea and are, at least in part, contributing to a phenotype of higher environmental resilience.

Genes associated with molecular pattern recognition are collectively the most significantly expanded in the S. glomerata genome. Large expansions of FREPs have been extensively studied in the snail Biomphalaria and are speculated to be due to lineage-specific selective pressure from trematode parasites like Schistosoma mansoni. Protozoan parasites, among other microbes, are major threats for a variety of bivalve species. Commercial production of S. glomerata has been severely impacted by mortalities arising from infection with protozoan pathogens such as Marteilia sydneyi, the cause of QX disease, which infect S. glomerata exclusively. The substantial expansion of FREPs in Saccostrea may be due to increased selective pressure from persistent challenge by disease-causing microbes and a need for broad recognition capacity. Clustering of these genes within the genome offers some evidence of local tandem duplications, and the clear pattern of tissue-specific expression indicates these genes may have evolved to perform specialized functions. Interestingly, the expression of FREPs appears somewhat concentrated in the gill tissue, which coincides with the site of infection for M. sydneyi and other important pathogens.

In Australia, wild populations of S. glomerata have been under pressure from the introduction of the faster growing C. gigas, which can rapidly overgrow and displace the native oyster in the low to mid-intertidal zones. In higher intertidal areas, however, S. glomerata is able to survive due to its greater tolerance to thermal stress and emersion. A highly developed anti-apoptosis system is thought to be important for the endurance of oysters under abiotic stress, a view based on the expansion of IAP genes and on the dramatic upregulation of IAPs that air exposure induces in C. gigas. The expansion of anti-apoptosis and stress response genes in the S. glomerata genome appears essential to the resilience observed in this species.

The draft S. glomerata genome presented here provides a valuable resource for further studies in molluscan biology and would accelerate genetic enhancement programs for commercially produced oyster species.
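As a rough check on the repeat-content comparison in the Discussion, the quoted figures can be recomputed directly. The sketch below is illustrative only and is not part of the authors' analysis. It takes the 784 Mb S. glomerata genome size from the abstract, back-calculates an implied C. gigas genome size from the quoted 202 Mb and 36% (an assumption, since that size is not stated here), and compares the PF01498 count against a naive even split across the 23 species, which is a hypothetical baseline rather than the test used in the study. All variable and function names are made up for illustration.

```python
# Figures quoted in the text (Mb = megabases).
SG_GENOME_MB = 784.0        # S. glomerata genome size, from the abstract
SG_REPEAT_MB = 354.0        # repetitive sequence in S. glomerata
CG_REPEAT_MB = 202.0        # repetitive sequence in C. gigas
CG_REPEAT_FRACTION = 0.36   # repeat fraction quoted for C. gigas

# PF01498 domain counts quoted in the text.
PF01498_SG = 15             # genes in S. glomerata
PF01498_TOTAL = 89          # genes across all species compared
N_SPECIES = 23              # species in that comparison


def repeat_fraction(repeat_mb: float, genome_mb: float) -> float:
    """Fraction of the genome made up of repetitive sequence."""
    return repeat_mb / genome_mb


def implied_genome_size(repeat_mb: float, fraction: float) -> float:
    """Genome size implied by a repeat length and its quoted fraction."""
    return repeat_mb / fraction


# 354 Mb of a 784 Mb genome should reproduce the quoted ~45%.
sg_fraction = repeat_fraction(SG_REPEAT_MB, SG_GENOME_MB)

# Assumption: C. gigas genome size is inferred, not taken from the text.
cg_genome_mb = implied_genome_size(CG_REPEAT_MB, CG_REPEAT_FRACTION)

# Share of the genome-size difference that extra repeats could account for.
extra_repeat_mb = SG_REPEAT_MB - CG_REPEAT_MB
extra_genome_mb = SG_GENOME_MB - cg_genome_mb
repeat_share_of_difference = extra_repeat_mb / extra_genome_mb

# Fold difference relative to an even split of PF01498 genes per species.
even_split = PF01498_TOTAL / N_SPECIES
pf01498_fold = PF01498_SG / even_split

print(f"S. glomerata repeat fraction: {sg_fraction:.1%}")
print(f"Implied C. gigas genome size: {cg_genome_mb:.0f} Mb")
print(f"Repeat share of size difference: {repeat_share_of_difference:.0%}")
print(f"PF01498 fold over even split: {pf01498_fold:.1f}")
```

Under these assumptions, the additional repeats would account for roughly two-thirds of the difference in genome size, which is consistent with the text's observation that the larger genome does not carry proportionately more protein-coding genes. The even-split baseline ignores phylogeny and genome size, so the fold value is only a rough indication.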
