
The developmental transcriptome atlas of the spoon worm Urechis unicinctus (Echiurida: Annelida).

Chungoo Park1, Yong-Hee Han2, Sung-Gwon Lee1, Kyoung-Bin Ryu2, Jooseong Oh1, Elizabeth M A Kern3, Joong-Ki Park3, Sung-Jin Cho2.

Abstract

Background: Echiurida is one of the most intriguing major subgroups of annelida because, unlike most other annelids, echiurids lack metameric body segmentation as adults. For this reason, transcriptome analyses from various developmental stages of echiurid species can be of substantial value for understanding precise expression levels and the complex regulatory networks during early and larval development.
Results: A total of 914 million raw RNA-Seq reads were produced from 14 developmental stages of Urechis unicinctus and were de novo assembled into contigs spanning 63,928,225 bp with an N50 length of 2700 bp. The resulting comprehensive transcriptome database of the early developmental stages of U. unicinctus consists of 20,305 representative functional protein-coding transcripts. Approximately 66% of unigenes were assigned to superphylum-level taxa, including Lophotrochozoa (40%). The completeness of the transcriptome assembly was assessed using benchmarking universal single-copy orthologs; 75.7% of the single-copy orthologs were present in our transcriptome database. We observed 3 distinct patterns of global transcriptome profiles from 14 developmental stages and identified 12,705 genes that showed dynamic regulation patterns during the differentiation and maturation of U. unicinctus cells.
Conclusions: We present the first large-scale developmental transcriptome dataset of U. unicinctus and provide a general overview of the dynamics of global gene expression changes during its early developmental stages. The analysis of time-course gene expression data is a first step toward understanding the complex developmental gene regulatory networks in U. unicinctus and will furnish a valuable resource for analyzing the functions of gene repertoires in various developmental phases.

Year:  2018        PMID: 29618045      PMCID: PMC5863216          DOI: 10.1093/gigascience/giy007

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   6.524


Data Description

Background

Within the major annelid groups, Echiurida (also called “marine spoon worms”) is represented by a morphologically and ontogenetically unique assemblage that includes approximately 165 species, most of which lack segmentation as adults. However, they possess annelid-like morphological and developmental features, including the organization of the larval nervous system [1, 2]. They were once considered a separate metazoan phylum. However, reevaluation of morphological and molecular data indicated that Echiurida is nested within Annelida, which represents one of the three major animal phyla with body segmentation [3-7]. In this respect, transcriptome analyses from various developmental stages of echiurid species are of substantial value for understanding precise expression levels and the complex regulatory networks involved in early and larval development. Indeed, data from recently published developmental transcriptomes of other Lophotrochozoans (e.g., Aplysia californica and Platynereis dumerilii) have highlighted insights into molecular mechanisms underlying early development and metamorphosis [8, 9]. Urechis unicinctus is an echiuran species that inhabits burrows in soft sediments in intertidal areas (Fig. 1). The Urechis genus may hold important clues to the genetic basis of the evolutionary gain and loss of segmentation due to its nested position within Annelida (i.e., sister to capitellid polychaetes), a lophotrochozoan phylum that is represented by a diverse group of segmented worms [4, 7]. However, current knowledge is limited on the molecular mechanisms that underlie the ontogeny of U. unicinctus. The goal of this study is to enhance our understanding of gene expression during embryonic development. Here, we report the transcriptome profiles (generated with the Illumina HiSeq platform) of developing embryos of U. unicinctus. 
Transcriptome sequencing data assist in the discovery of the roles of genes involved in various embryological and larval development processes. As the first large-scale transcriptomic dataset for U. unicinctus, this resource will help in the validation of development-specific gene features predicted by the genome.
Figure 1:

Adult Urechis unicinctus used in this study (proboscis retracted). Scale bar: 1 cm.

Sample collection, embryo culture, and RNA isolations

Adults of U. unicinctus were collected from intertidal mud flats on the southern coast of South Korea. We extracted eggs and sperm from 1 adult female and 1 adult male. To obtain U. unicinctus embryos, artificial fertilization was performed by mixing sperm and eggs at an appropriate ratio. Embryos were reared in artificial seawater (reef crystals from Aquarium Systems, France) in a plastic case at room temperature (18°C–20°C). Late trochophores, a typical larval stage in which the intestinal tract is formed, were fed the microalga Isochrysis galbana. Reared embryo samples were collected at each of the following stages: 0 hour (unfertilized egg), 0.5 hours post-fertilization (fertilized egg), polar body cell, 2 cell, 4 cell, 8 cell, 16 cell, 32 cell, blastula, emerged cilia, early trochophore (day 1), middle trochophore (day 2), late trochophore (day 5), and segmentation stage (day 30–45). Diagnostic features for each of the 3 trochophore stages are as follows. The early trochophore is a nonfeeding stage. In the middle trochophore, the gastrointestinal valve opens and the anus appears. In the late trochophore, the longer cilia of the apical tufts are replaced by shorter cilia that cover a greater area, and the prototroch cilia are longer. These developmental stages follow Newby's classification [10]. Total RNA was isolated from the embryos of the above samples using TRIZOL reagent (Invitrogen, Carlsbad, California) following the manufacturer's instructions. The purity and integrity of the total RNA isolated from each embryo sample were examined using a Nanodrop 2000C spectrophotometer (Thermo Scientific, Waltham, Massachusetts) and a Bioanalyzer 2100 (Agilent Technologies, Palo Alto, California). Adult images were taken with a Canon EOS 550D, and embryo bright-field images were taken on a Leica DM6 B microscope using differential interference contrast (DIC) optics.

TruSeq Stranded Ribo-Zero library preparation and sequencing

Total RNA concentration was calculated using Quant-IT RiboGreen (Invitrogen, R11490). To assess the integrity of the total RNA, samples were run on TapeStation RNA screentape (Agilent, 5067–5576). Only high-quality RNA preparations, with an RNA Integrity Number greater than 7.0, were used for RNA library construction. A library was prepared independently for each sample from 1 μg of total RNA using an Illumina TruSeq Stranded Total RNA Sample Prep Kit (Illumina, Inc., San Diego, California). The rRNA in total RNA was depleted using a Ribo-Zero kit. After rRNA depletion, the remaining RNA was purified, fragmented, and primed for cDNA synthesis. The cleaved RNA fragments were copied into first-strand cDNA using reverse transcriptase and random hexamers. This was followed by second-strand cDNA synthesis using DNA Polymerase I, RNase H, and dUTP. These cDNA fragments then underwent an end repair process, the addition of a single “A” base, and ligation of the adapters. The products were then purified and enriched with polymerase chain reaction (PCR) to create the final cDNA library. The libraries were quantified using quantitative PCR according to the qPCR Quantification Protocol Guide (KAPA Library Quantification kits for Illumina Sequencing platforms) and qualified using the TapeStation D1000 ScreenTape (Agilent Technologies, Waldbronn, Germany). The resulting libraries were sequenced on either the Illumina HiSeq 2000 system as paired-end reads with 101 cycles or the Illumina HiSeq 4000 system as paired-end reads with 151 cycles (Table 1). The experimental procedures and complete assembly pipeline are summarized in Fig. 2.
Table 1:

Reads statistics

Samples | Total produced bases (bp) | Number of reads | Read length (bp) | Guanine plus cytosine (GC) % | Q30% | Number of clean reads (%)
Oocyte | 8,749,299,078 | 57,942,378 | 151 | 43.87 | 90.53 | 54,583,372 (94.20)
Fertilized embryo | 7,204,375,496 | 47,711,096 | 151 | 43.86 | 92.32 | 45,817,358 (96.04)
Polar body | 7,553,516,790 | 50,023,290 | 151 | 41.40 | 91.12 | 47,401,970 (94.76)
2 cell | 8,663,957,200 | 57,377,200 | 151 | 40.21 | 92.63 | 55,263,572 (96.32)
4 cell | 6,693,881,642 | 44,330,342 | 151 | 40.88 | 90.81 | 43,001,172 (97.00)
8 cell | 7,417,271,000 | 49,121,000 | 151 | 42.14 | 92.31 | 46,360,492 (94.38)
16 cell | 7,993,095,608 | 52,934,408 | 151 | 41.52 | 91.75 | 50,571,562 (95.54)
32 cell | 22,163,185,664 | 146,776,064 | 151 | 42.11 | 91.44 | 139,587,140 (95.10)
Blastula | 8,885,042,038 | 58,841,338 | 151 | 45.23 | 92.04 | 56,298,300 (95.68)
Emerged cilia | 8,077,246,398 | 53,491,698 | 151 | 44.18 | 89.83 | 50,401,516 (94.22)
Early trochophore | 7,354,720,616 | 72,819,016 | 101 | 45.90 | 96.02 | 72,513,798 (99.58)
Middle trochophore | 7,581,052,122 | 75,059,922 | 101 | 46.58 | 96.31 | 74,755,084 (99.59)
Late trochophore | 7,807,192,940 | 77,298,940 | 101 | 46.69 | 96.66 | 77,100,204 (99.74)
Segmentation | 10,556,984,102 | 69,913,802 | 151 | 48.19 | 92.37 | 67,990,654 (97.25)
Figure 2:

Schematic diagram of Urechis unicinctus transcriptome analysis in this study.


Transcriptome preprocessing and de novo assembly

After the sequencing run, we obtained high-quality clean reads from the raw data by removing reads containing adapter sequences, poly-N sequences, or low-quality bases. Quality-based trimming and filtering were performed using Trimmomatic, version 0.33 (Trimmomatic, RRID:SCR_011848) [11] with the parameters ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 for the 101 bp libraries (or MINLEN:50 for the 151 bp libraries). An average of 63 million clean reads per sample was obtained (Table 1). To obtain a comprehensive reference transcriptome database, all clean reads were pooled before de novo assembly without normalization of read abundance, even though assembling the full set of merged reads increases assembly time and memory usage. The merged reads were assembled de novo using Trinity, version 2.1.1 (Trinity, RRID:SCR_013048) [12] with default parameters. The resulting assembled transcriptome consisted of 620,490 transcripts with an N50 value of 846 bp (Table 2). After assembly, open reading frames (ORFs) were predicted using TransDecoder (version 3.0.0) (http://transdecoder.sourceforge.net). To maximize sensitivity for capturing ORFs, all transcripts were aligned against the Uniprot/Swiss-Prot database (http://www.uniprot.org) via BLASTP search with an E-value cutoff of 10^-5. Next, ORFs shorter than 100 amino acids were discarded to avoid retaining transcripts with poor evidence for protein-coding regions. Finally, redundant transcripts with more than 99% sequence identity were removed using CD-HIT (version 4.6.5) [13], producing 60,472 nonredundant ORFs. These sequences span 63,928,225 bp with an N50 length of 2,700 bp.
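The N50 values quoted for the assembly can be illustrated with a small calculation. The following is a minimal sketch under the usual definition of N50 (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases); the function name and the toy lengths are hypothetical and are not taken from the study's pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Sequences are sorted from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths, for illustration only.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))
```

In practice the lengths would be read from the assembled FASTA file; a short N50 for the full assembly alongside a much longer N50 for the filtered ORF set is consistent with many short fragments being discarded during filtering.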
Table 2:

Statistics for Urechis unicinctus transcriptome assembly

Samples | Total assembled bases (bp) | Number of assembled transcripts | N50 transcript length (bp) (min–max: median) | Number of non-redundant ORFs | Number of ORFs with NR blast hit (longest ORF per unigene)
Oocyte | 45,868,755 | 26,569 | 2801 (201–26,298: 1105) | 9684 | 7791
Fertilized embryo | 43,996,849 | 28,361 | 2689 (201–26,298: 917) | 9469 | 7561
Polar body | 43,132,738 | 26,716 | 2626 (201–26,298: 1020) | 9246 | 7380
2 cell | 44,839,836 | 31,326 | 2412 (201–26,298: 917) | 9139 | 7254
4 cell | 47,675,420 | 23,122 | 3204 (201–26,298: 841) | 9414 | 7567
8 cell | 45,215,462 | 27,532 | 2564 (201–31,183: 1442) | 9030 | 7220
16 cell | 49,536,401 | 33,776 | 2470 (201–26,298: 871) | 9470 | 7463
32 cell | 58,598,783 | 38,718 | 2461 (201–26,298: 927) | 11,193 | 8597
Blastula | 50,083,677 | 30,553 | 3004 (201–31,183: 901) | 10,994 | 8535
Emerged cilia | 58,462,746 | 27,855 | 3320 (201–31,183: 1513) | 12,153 | 9625
Early trochophore | 64,464,321 | 38,443 | 3291 (201–36,191: 858) | 12,980 | 10,034
Middle trochophore | 72,767,170 | 42,797 | 3234 (201–36,191: 930) | 14,482 | 11,001
Late trochophore | 77,723,477 | 48,553 | 3081 (201–36,191: 837) | 15,208 | 11,300
Segmentation | 49,350,938 | 26,509 | 2740 (201–32,619: 1318) | 11,883 | 9030
Total | 368,166,154 | 620,490 | 846 (201–36,191: 322) | 32,880 | 20,305

Abbreviation: ORF: open reading frame.

To quantify expression levels, the reads for each library were mapped independently to the reference U. unicinctus transcriptome sequences using Bowtie, version 2.2.6 (Bowtie, RRID:SCR_005476) [14]; expression levels of these transcripts were estimated with RSEM, version 1.2.26 (RSEM, RRID:SCR_013027) [15]. Expression levels are reported in fragments per kilobase of transcript per million fragments mapped (FPKM) in our analyses.
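The expression unit named above can be made concrete with a short calculation. This is a minimal sketch of the textbook FPKM formula, not a reproduction of RSEM's internals (RSEM estimates fragment counts probabilistically and uses an effective transcript length, which this sketch does not model); the function name and the example numbers are hypothetical.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million fragments mapped.

    FPKM = fragments / (length in kb) / (mapped fragments in millions)
         = fragments * 1e9 / (length in bp * total mapped fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2,000 bp transcript
# in a library with 50 million mapped fragments.
print(fpkm(500, 2000, 50_000_000))
```

Dividing by transcript length and library size is what makes values comparable across transcripts of different lengths and across libraries of different sequencing depth.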

Annotation

To annotate coding sequences (CDSs), the resulting 60,472 CDSs were compared against the NCBI nonredundant protein (NR) database (downloaded on 11 April 2017) using BLASTP with an E-value cutoff of 10^-10, retaining the best BLAST hit. About 66% (40,111/60,472) of the CDSs were assigned to superphylum-level taxa, including Lophotrochozoa (40%), Deuterostomia (8%), and Panarthropoda (2%) (Fig. 3A), as generally expected. For further analysis, we excluded the CDSs (18%; 7,231/40,111) whose best hits were sequences derived from nonmetazoan taxa. When multiple coding sequences mapped to the same gene in the NR database, the sequence with the longest CDS was assigned to that gene. Based on this criterion, we established a comprehensive transcriptome database of 14 early developmental stages of U. unicinctus that comprises 20,305 representative functional protein-coding transcripts. We further assessed the completeness of the U. unicinctus developmental transcriptome using the program benchmarking universal single-copy orthologs, version 2.0 (BUSCO, RRID:SCR_015008) [16]. A total of 75.9% (230/303 genes) and 75.7% (740/978 genes) of the eukaryote and metazoan single-copy orthologs were identified, respectively (Fig. 3B).
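The rule of keeping one representative per NR gene can be sketched as follows. This is an illustrative reading of the criterion described above (the longest CDS among those sharing the same best NR hit), not the authors' actual script; the record layout, the identifiers, and the tie-breaking choice (the first record seen wins) are assumptions.

```python
def pick_representatives(cds_records):
    """Choose the longest CDS for each NR hit.

    cds_records: iterable of (cds_id, nr_hit_id, cds_length) tuples.
    Returns a dict mapping each nr_hit_id to the id of its longest CDS.
    On equal lengths, the record encountered first is kept (an assumption).
    """
    best = {}
    for cds_id, nr_hit_id, cds_length in cds_records:
        current = best.get(nr_hit_id)
        if current is None or cds_length > current[1]:
            best[nr_hit_id] = (cds_id, cds_length)
    return {hit: cds_id for hit, (cds_id, _) in best.items()}


# Hypothetical records, for illustration only.
records = [
    ("cds1", "geneA", 300),
    ("cds2", "geneA", 900),
    ("cds3", "geneB", 450),
]
print(pick_representatives(records))
```

Under this reading, the number of representative transcripts equals the number of distinct NR genes hit by the retained CDSs.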
Figure 3:

Analysis of de novo transcriptome and global gene expression patterns. A) Superphylum distribution for homology search of Urechis unicinctus coding sequences against the NR database using the best BLAST hit. B) Results of BUSCO analysis. C) Result of principal component analysis and a dendrogram of transcriptomes of 14 U. unicinctus developmental stages based on pairwise distance matrices (1 − ρ, Spearman correlation coefficient). The first, second, and third principal components account for 86.8, 6.8, and 5.9% of variance, respectively.

Transcriptome comparisons

To show that gene expression reflects development-specific differentiation and maturation processes, we built expression distance matrices for each developmental stage and constructed a gene expression tree (Fig. 3C). Two major transitions in expression patterns were observed: blastula to emerged cilia and late trochophore to segmentation. These transitions divided the 14 U. unicinctus developmental stages into 3 phases. Phase I comprises the oocyte, fertilized embryo, polar body, 2-, 4-, 8-, 16-, and 32-cell, and blastula stages. Phase II comprises the emerged cilia and early-, middle-, and late-trochophore stages. Phase III consists of the segmentation stage alone. These 3 distinct phases of global transcriptome profiles covering 14 developmental stages were supported by principal component analysis, which was performed using the “prcomp” function in the “stats” package in R (version 3.2.4) (Fig. 3C). These results suggest that developmental stages are well characterized by our transcription profiles and that the differential gene expression profiles presented in this study will be useful for further study of ontogenic processes at the gene expression level. In an additional analysis, a gene whose expression level changed significantly (≥10-fold and false discovery rate-adjusted P value ≤ 0.1%) in at least one comparison was defined as a developmentally regulated gene. We identified 12,705 genes that showed dynamic regulation patterns during the differentiation and maturation of U. unicinctus cells (Fig. 4). Note that we used the trimmed mean of M values normalization [17] provided by the edgeR Bioconductor package for R for this test.
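The pairwise distance used for the expression tree (1 − ρ, with ρ the Spearman correlation coefficient) can be sketched in a few lines. This is a minimal pure-Python illustration, assuming each stage is represented by a vector of expression values over the same genes; the function names and the toy profiles are hypothetical, and the original analysis was carried out in R.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def spearman_distance(x, y):
    """Distance 1 - rho, where rho is the Spearman rank correlation."""
    return 1.0 - pearson(average_ranks(x), average_ranks(y))


def distance_matrix(profiles):
    """profiles: dict of stage name -> expression vector over the same genes."""
    names = list(profiles)
    return {a: {b: spearman_distance(profiles[a], profiles[b]) for b in names}
            for a in names}


# Hypothetical toy profiles over four genes, for illustration only.
toy = {
    "stage_A": [1.0, 5.0, 9.0, 20.0],
    "stage_B": [2.0, 6.0, 8.0, 30.0],
    "stage_C": [25.0, 7.0, 3.0, 1.0],
}
for stage, row in distance_matrix(toy).items():
    print(stage, row)
```

Because the measure depends only on ranks, it ranges from 0 (identical gene ordering) to 2 (fully reversed ordering) and is insensitive to monotonic rescaling of expression values; a matrix of this kind can then be passed to a hierarchical clustering routine to obtain a dendrogram.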
Figure 4:

Representative images of Urechis unicinctus developmental stages and their gene expression profiles. A) Overview of U. unicinctus developmental stages. (a) oocyte, (b) fertilized embryo, (c) polar body, (d) 2 cell, (e) 4 cell, (f) 8 cell, (g) 16 cell, (h) 32 cell, (i) blastula, (j) emerged cilia, (k) early trochophore, (l) middle trochophore, (m) late trochophore, and (n) segmentation. p, polar body; bp, blastopore; c, cilia; ls, larval stomach; int, intestine; glv, gastrointestinal valve; m, mouth; vnc, ventral nerve cord; a, anus. Scale bar: 50 μm. B) A heat map showing dynamic gene expression patterns with the relative expression levels (column) in each stage (row). Expression values (trimmed mean of M values) were log2-transformed and mean-centered by transcript. The hierarchical clustering was performed with Euclidean distances of gene expression values.

Although this study presents the first large-scale developmental transcriptome dataset for a developmentally interesting animal group, U. unicinctus (Echiurida), the global landscape of its developmental transcriptome is not yet complete due to the lack of biological replicates and reference genome sequences. In summary, we present the first large-scale, developmental, stage-specific transcriptome dataset for U. unicinctus and provide a general overview of the dynamics of global gene expression changes at different developmental stages. These data will fill an important gap in annelid-wide comparisons of gene expression patterns and will lead to a better understanding of gene repertoires involved in different developmental stages and of complex developmental gene regulatory networks.

Availability of supporting data

All raw sequencing data used for assembly have been deposited in the NCBI database under the accession numbers SRX2999418–SRX2999431, associated with BioProject PRJNA394029. Additional data further supporting the results of this article, including the transcriptome assembly, annotations, and BUSCO results, can be found in the GigaScience repository, GigaDB [18].

Abbreviations

BUSCO: benchmarking universal single-copy orthologs; CDS: coding sequence; ORF: open reading frame.

Competing interests

All authors report no competing interests.

Author contributions

C.P. and S.J.C. designed the study; J.K.P. contributed to the project coordination; Y.H.H., K.B.R., and S.J.C. performed the experiments; S.G.L., J.O., and C.P. analyzed the data and evaluated the conclusions; C.P., S.J.C., J.K.P., S.G.L., and E.M.A.K. wrote the paper; all authors read and approved the final manuscript.
References

1.  BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.

Authors:  Felipe A Simão; Robert M Waterhouse; Panagiotis Ioannidis; Evgenia V Kriventseva; Evgeny M Zdobnov
Journal:  Bioinformatics       Date:  2015-06-09       Impact factor: 6.937

2.  Developmental transcriptome of Aplysia californica.

Authors:  Andreas Heyland; Zer Vue; Christian R Voolstra; Mónica Medina; Leonid L Moroz
Journal:  J Exp Zool B Mol Dev Evol       Date:  2010-12-06       Impact factor: 2.656

3.  Articulating "Archiannelids": Phylogenomics and Annelid Relationships, with Emphasis on Meiofaunal Taxa.

Authors:  Sónia C S Andrade; Marta Novo; Gisele Y Kawauchi; Katrine Worsaae; Fredrik Pleijel; Gonzalo Giribet; Greg W Rouse
Journal:  Mol Biol Evol       Date:  2015-07-23       Impact factor: 16.240

4.  Phylogeny of Annelida (Lophotrochozoa): total-evidence analysis of morphology and six genes.

Authors:  Jan Zrzavý; Pavel Ríha; Lubomír Piálek; Jan Janouskovec
Journal:  BMC Evol Biol       Date:  2009-08-06       Impact factor: 3.260

5.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.

Authors:  Ben Langmead; Cole Trapnell; Mihai Pop; Steven L Salzberg
Journal:  Genome Biol       Date:  2009-03-04       Impact factor: 13.583

6.  RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome.

Authors:  Bo Li; Colin N Dewey
Journal:  BMC Bioinformatics       Date:  2011-08-04       Impact factor: 3.307

7.  Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Authors:  Manfred G Grabherr; Brian J Haas; Moran Yassour; Joshua Z Levin; Dawn A Thompson; Ido Amit; Xian Adiconis; Lin Fan; Raktima Raychowdhury; Qiandong Zeng; Zehua Chen; Evan Mauceli; Nir Hacohen; Andreas Gnirke; Nicholas Rhind; Federica di Palma; Bruce W Birren; Chad Nusbaum; Kerstin Lindblad-Toh; Nir Friedman; Aviv Regev
Journal:  Nat Biotechnol       Date:  2011-05-15       Impact factor: 54.908

8.  CD-HIT: accelerated for clustering the next-generation sequencing data.

Authors:  Limin Fu; Beifang Niu; Zhengwei Zhu; Sitao Wu; Weizhong Li
Journal:  Bioinformatics       Date:  2012-10-11       Impact factor: 6.937

9.  Annelid phylogeny and the status of Sipuncula and Echiura.

Authors:  Torsten H Struck; Nancy Schult; Tiffany Kusen; Emily Hickman; Christoph Bleidorn; Damhnait McHugh; Kenneth M Halanych
Journal:  BMC Evol Biol       Date:  2007-04-05       Impact factor: 3.260

10.  Trimmomatic: a flexible trimmer for Illumina sequence data.

Authors:  Anthony M Bolger; Marc Lohse; Bjoern Usadel
Journal:  Bioinformatics       Date:  2014-04-01       Impact factor: 6.937
