
Annotation of nerve cord transcriptome in earthworm Eisenia fetida.

Vasanthakumar Ponesakki, Sayan Paul, Dinesh Kumar Sudalai Mani, Veeraragavan Rajendiran, Paulkumar Kanniah, Sudhakar Sivasubramaniam.

Abstract

In annelid worms, the nerve cord is a crucial organ that controls sensory and behavioral physiology. Because genome resources for earthworms are inadequate, comprehensive analysis of their transcriptome datasets has become a priority for monitoring the genes expressed in the nerve cord and predicting their roles in neurotransmission and sensory perception. The present study focuses on identifying potential transcripts and predicting their functional features by annotating the transcriptome dataset of nerve cord tissue prepared by Gong et al., 2010 from the earthworm Eisenia fetida. In total, 9762 transcripts were successfully annotated against the NCBI nr database using the BLASTX algorithm, and 7680 of them were assigned a total of 44,354 GO terms. The conserved domain analysis indicated over-representation of the P-loop NTPase domain and the calcium-binding EF-hand domain. The COG functional annotation classified 5860 transcript sequences into 25 functional categories. Further, 4502 contig sequences mapped to 124 KEGG pathways. The annotated contig dataset contained 22 crucial neuropeptides with considerable matches to the marine annelid Platynereis dumerilii, suggesting a possible role in neurotransmission and neuromodulation. In addition, 108 human stem cell marker homologs were identified, including crucial epigenetic regulators, transcriptional repressors and cell cycle regulators, which may contribute to neuronal and segmental regeneration. The complete functional annotation of this nerve cord transcriptome can be further used to interpret the genetic and molecular mechanisms associated with neuronal development, nervous system regeneration and nerve cord function.

Keywords:  Annotation; Eisenia fetida; Nerve cord; Transcriptome

Year:  2017        PMID: 29204349      PMCID: PMC5688751          DOI: 10.1016/j.gdata.2017.10.002

Source DB:  PubMed          Journal:  Genom Data        ISSN: 2213-5960


Introduction

The cellular components of the nervous system play a regulatory role in sensory signal transduction, synaptic transmission, muscle contraction and neuronal communication [1]. Among invertebrates, annelid earthworms occupy a pivotal position within the soil faunal biomass because of their immense contribution to maintaining soil quality and fertility and their ecotoxicological importance in assessing soil contaminants and toxicity [2], [3]. In addition, earthworms are renowned for their ability to regenerate amputated organs and are therefore used as a key model for stem cell and regeneration research [4]. Over the past few decades, the nervous system of earthworms has been studied extensively to reveal its importance in maintaining the sensory and behavioral physiology of the species. The nervous system of the worm is composed of a bilobed cerebral ganglion connected to the circumpharyngeal connectives, followed by the sub-esophageal ganglion and the nerve cord, which extends throughout the length of the worm [5], [6]. The complex regeneration of the bilobed cerebral ganglion commences from the amputated nerve cord, and the worm regenerates a functional brain in eight days [7]. Addressing the correlation between the nerve cord and the escape reflex of the earthworm, Drewes, 1984 indicated that activation of the giant nerve fibers in the ventral nerve cord is critically required to elicit escape reflexes under threatening external stimuli [8]. Subsequent studies have confirmed that transplantation of the ventral nerve cord in a heterotopic or homotopic manner can restore the sensory and motor activities needed to reconstruct the crucial behavioral reflex pathways associated with the escape mechanism of the worm [9].
The earthworm Eisenia fetida, popularly known as the red wiggler worm, has long been a species of interest to environmental and regeneration biologists because of its easy availability, effective use in waste management, extensive regeneration capacity, rapid growth and reproductive ability [10], [11], [12], [13]. Morphological analysis has demonstrated the presence of both peptidergic and amine-secreting neurons in the ventral nerve cord of the worm [14]. Immunocytochemical analysis revealed that the organization of the serotonergic system in E. fetida is correlated with embryogenesis, and that during post-embryonic development and hatching the number of serotonin-immunoreactive cells and the total serotonin content increase in the ganglia of the ventral nerve cord [6]. The stomatogastric nervous system of E. fetida showed the distribution of monoamine neurotransmitters such as serotonin, dopamine and octopamine, amino acid neurotransmitters such as GABA, and neuropeptides such as proctolin, FMRFamide and ETP, which have excitatory and inhibitory effects on the muscle cells of the alimentary canal [15]. Herbert et al., 2009 identified six novel neuropeptides in the ventral nerve cord ganglia of the worm with significant sequence homology to insect neuropeptides such as pyrokinins (PKs) and periviscerokinins (PVKs) [16]. Varhalmi et al., 2008 demonstrated the expression of the neurotrophic factor PACAP (pituitary adenylate cyclase-activating polypeptide) in both normal and regenerated ventral nerve cord ganglia of E. fetida, suggesting that the peptide may act as a crucial neuromodulator and assist during caudal regeneration of the worm [17]. Transcriptome analysis is a widely used, powerful tool that provides rapid and detailed gene expression data for a given tissue sample.
It is an effective way to study gene expression and gene regulation, characterize gene functions, and discover novel genes and biomarkers during specific developmental and physiological events [18]. Recent advances in next-generation sequencing techniques, together with relevant bioinformatics tools, have made it possible to explore non-model species at the transcriptome level [19]. In neurological research in particular, transcriptome sequencing has emerged as a valuable technique for exploring the differentially expressed genes that control the cellular and molecular architecture of the invertebrate nervous system. Among molluscs, the de novo transcriptome analysis of the central nervous system (CNS) of Lymnaea stagnalis generated 116,355 contigs, of which only 18% had a significant match to known proteins in a BLAST search, including some newly identified monoamine synthesis enzymes such as tyrosine hydroxylase and dopa decarboxylase [20]. Among arthropods, the transcriptome-based EST database prepared from the CNS of the desert locust Schistocerca gregaria contained 4000 functionally annotated transcripts, a considerable proportion of which were associated with neuronal signaling and signal transduction regulating the physiological processes of the species [21]. The deep-sequencing transcriptome study of tissue samples from the nervous systems of Cancer borealis (Jonah crab) and Homarus americanus (American lobster) identified 9489 and 11,061 protein-coding transcripts, respectively, along with genes encoding ion channels, amine and GABA receptors and neurotransmitter receptors that regulate the neuronal properties and behavioral physiology of these crustaceans [22].
The inadequate genome-level information for most annelid worms has made transcriptome analysis and annotation the key to understanding the crucial genes and transcripts associated with biological events such as metabolism, regeneration and development. A few significant studies based on transcriptome and expressed sequence tag analysis of various earthworm species have been performed in the recent past. These studies have revealed key findings such as the expression of globin-related genes and fibrinolytic enzymes in the midgut of the earthworm Eisenia andrei [18], differential expression of putative genes under inorganic, organic and agrochemical xenobiotic exposure in the earthworm Lumbricus rubellus [23], genes regulating anterior regeneration in Perionyx excavatus [24] and genes responsible for the immune response in Eisenia andrei [25]. Considering the ecological and environmental importance of E. fetida, several transcriptome analyses have been undertaken to extract maximum information about its genetic and molecular resources. Pirooznia et al., 2007 cloned and sequenced ESTs (expressed sequence tags) of the worm to monitor the genes associated with environmental toxicity [26]. Gong et al., 2010 constructed a transcriptome-based EST library for designing and validating the oligo-probes associated with the given ESTs [27]. More recently, a comprehensive de novo transcriptome analysis and annotation study captured the differential expression of toxin genes, giving priority to the ecotoxicological impact of the species [28]. Despite these efforts, the nervous system and the nerve cord specific genes of earthworms have not been explored to any great extent at the transcriptome level. Among the few important studies, transcriptome sequencing of E. fetida after exposure to the neurotoxic chemical perfluorooctanesulfonic acid (PFOS) showed altered expression of genes related to neuronal development and calcium homeostasis, resulting in neurodegeneration [29]. Subsequently, perfluorooctanoic acid (PFOA) exposure in the worm led to the differential regulation of neuronal development genes associated with synaptogenesis, synaptic transmission and cellular morphogenesis [30]. However, the data are still insufficient and can be improved with comprehensive transcriptome analysis and annotation. Notably, the transcriptome data prepared by Gong et al. [27] were obtained from 454 sequencing of nerve cord tissue comprising both neuronal and peripheral cells of E. fetida, but the complete functional annotation of the assembled contigs was not carried out in their study. The present study focuses on the functional annotation of the assembled transcriptome dataset of E. fetida generated by Gong et al., 2010, to identify nerve cord specific genes and predict their functional features in regulating cellular and physiological processes such as signal transduction, reproduction and sensory perception. Given the extensive regeneration capacity and well-known metabolic activity of E. fetida, an annotation overview based on biological pathway assessment and gene ontology analysis can also help to identify and categorize the crucial stem cell markers, developmental genes and metabolic markers residing in the nerve cord of the species.

Materials and methods

Acquiring de novo assembled contig sequences of Eisenia fetida

We selected the Newbler and Seqman assembled contig sequences obtained from the 454 transcriptome sequencing of the nerve cord cDNA sample of the earthworm E. fetida (Gong et al., 2010; Table S1, Table S2). Summary statistics for the contigs, including sequence length, median length, N50 value and total number of bases, were generated using the NGS QC Toolkit [31].
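The summary statistics named above can be reproduced from a plain list of contig lengths. The following is a minimal sketch, assuming the common definition of Nx (the length of the contig at which the cumulative length, summed from the longest contig downward, first reaches x% of all bases). Whether the NGS QC Toolkit applies exactly this definition is an assumption, and the function names and toy lengths are hypothetical.

```python
from statistics import mean, median


def nx_length(lengths, fraction):
    """Return the Nx length: the contig length at which the running total,
    taken over contigs sorted longest first, reaches `fraction` of all bases."""
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("no contig lengths supplied")


def summarize(lengths):
    """Collect the kind of summary statistics reported for an assembly."""
    return {
        "total_sequences": len(lengths),
        "total_bases": sum(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": round(mean(lengths), 2),
        "median": median(lengths),
        "N25": nx_length(lengths, 0.25),
        "N50": nx_length(lengths, 0.50),
        "N75": nx_length(lengths, 0.75),
    }


# Hypothetical toy contig lengths (bp), not taken from the dataset.
toy_lengths = [100, 200, 300, 400, 500]
stats = summarize(toy_lengths)
```

Under this definition N25 is never smaller than N50, which in turn is never smaller than N75, because a larger share of the bases can only be reached by including shorter contigs.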

Annotation of the transcriptome data

The functional annotation of the Newbler and Seqman assembled contigs of E. fetida was performed through (1) sequence-based alignment and (2) domain-based alignment. The sequence-based similarity search was carried out by aligning the sequences against the NCBI non-redundant protein (nr) database and the SwissProt database using the BLASTX algorithm with an E-value threshold of 1E-5. Based on the significant BLAST hits obtained from the nr database, Gene Ontology (GO) (www.geneontology.org) annotation was performed using the BLAST2GO functional annotation software [32], [33]. The GO annotation was accomplished in two sub-steps: (1) retrieving the GO terms associated with the BLAST hits (mapping) and (2) assigning GO terms and Enzyme Commission (EC) numbers to the query sequences using the following parameters: Annotation Cut-Off: 55; E-value hit filter: 1.0E-6; GO Weight: 5; HSP-Hit Coverage Cut-Off: 0; and Hit Filter: 500. Domain and motif information for the query contig sequences was obtained by scanning the sequences against the InterPro protein signature databases using the InterProScan annotation embedded in BLAST2GO [34]. The GO terms resulting from the InterProScan annotation were then merged with the GO terms already acquired from the GO annotation. To predict and classify functions more reliably, the assembled contigs were also annotated according to their clusters of orthologous groups (COGs). The screening and annotation of the corresponding orthologous groups for the contigs were performed using the eggNOG database (evolutionary genealogy of genes) implemented within the BLAST2GO annotation software [35]. The Newbler and Seqman assembled contigs were aligned to the eggNOG database using an E-value threshold of 1E-5 and a minimum similarity of 50%.
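The E-value threshold described above can be illustrated on tabular BLAST output. This is a sketch only: the study ran its searches through BLAST2GO, so the assumption here is that hits are available in BLAST's default 12-column tabular format. The function name `best_hits` and the example rows are hypothetical.

```python
import csv
import io

# Column order of BLAST's default tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5):
    """Keep, for each query contig, the hit with the lowest E-value among
    those at or below `max_evalue`."""
    best = {}
    for row in csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t"):
        evalue = float(row["evalue"])
        if evalue > max_evalue:
            continue  # hit fails the threshold
        current = best.get(row["qseqid"])
        if current is None or evalue < float(current["evalue"]):
            best[row["qseqid"]] = row
    return best


# Hypothetical three-row result table; identifiers and values are made up.
example = io.StringIO(
    "contig1\tprotA\t85.0\t60\t9\t0\t1\t180\t5\t64\t1e-20\t95.0\n"
    "contig1\tprotB\t70.0\t55\t16\t0\t1\t165\t8\t62\t1e-08\t60.0\n"
    "contig2\tprotC\t40.0\t30\t18\t0\t1\t90\t3\t32\t0.5\t25.0\n"
)
hits = best_hits(example)
```

In this toy input only `contig1` keeps a hit; `contig2` is dropped because its single hit has an E-value above the threshold, which is the situation most short contigs in a dataset like this one would be in.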
The KEGG database serves as a major knowledge base providing details of the biological pathways and molecular networks associated with given genes or transcripts [36]. KEGG pathway annotation of the assembled contigs was achieved by mapping the sequences assigned EC numbers against the online KEGG pathway database using the BLAST2GO program. Enrichment of the gene ontology terms associated with the nerve cord transcriptome of E. fetida relative to its draft genome [37] was analyzed using Fisher's exact test [38] as implemented in BLAST2GO version 4.1. The annotated draft genome of E. fetida was used as the reference set and the nerve cord specific transcripts as the test set. The P-values were corrected using the Benjamini and Hochberg FDR method, and FDR < 0.05 was considered statistically significant. The enrichment of the KEGG pathways was further confirmed with the KOBAS 3.0 web tool [39].
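The enrichment procedure above combines a per-term Fisher's exact test with a Benjamini and Hochberg correction. The sketch below shows both steps using only the Python standard library. It assumes a one-sided test for over-representation and treats the test and reference sets as two disjoint groups; the text does not state which tail or table layout BLAST2GO uses, so both are assumptions, and all counts are hypothetical.

```python
from math import comb


def fisher_right_tail(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    k: test-set sequences carrying the GO term
    n: size of the test set
    K: reference-set sequences carrying the GO term
    N: size of the reference set
    Corresponds to the 2x2 table [[k, n - k], [K, N - K]].
    """
    total = n + N          # all sequences in both groups
    with_term = k + K      # all sequences carrying the term
    upper = min(n, with_term)
    # Sum hypergeometric probabilities for outcomes at least as extreme as k.
    tail = sum(comb(with_term, i) * comb(total - with_term, n - i)
               for i in range(k, upper + 1))
    return tail / comb(total, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts for three GO terms, each given as (k, n, K, N).
toy_counts = [(3, 4, 1, 4), (2, 4, 2, 4), (4, 4, 0, 4)]
pvals = [fisher_right_tail(*counts) for counts in toy_counts]
fdr = benjamini_hochberg(pvals)
significant = [q < 0.05 for q in fdr]
```

Only the third toy term passes the FDR < 0.05 criterion; the adjustment matters because one test is run per GO term, so uncorrected p-values would overstate the number of enriched terms.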

Identification of neuropeptides and stem cell markers

In invertebrates, neuropeptides are largely localized in the central and peripheral nervous systems and act as essential neurotransmitters regulating cellular and physiological activities [40]. The Newbler and Seqman assembled transcripts of E. fetida annotated against the nr database were thoroughly screened for neuropeptide genes present in the nerve cord tissue of the species. In parallel, potential stem cell specific genes participating in the cellular differentiation and segmental regeneration of the worm were identified by comparing the assembled contigs with 250 previously reported stem cell marker genes from human amniocytes [41] using local BLASTX with an E-value cut-off of 1E-5.

Identification of calcium binding proteins and phylogenetic analysis

The nr-annotated contigs of E. fetida were further screened to identify calcium-binding proteins containing the EF-hand domain. The identified contigs were translated into their corresponding protein sequences using the ExPASy Translate tool. The translated protein sequences were aligned to their orthologs in the leech Helobdella robusta and the polychaete Capitella teleta using the Clustal W multiple sequence alignment tool [42] with default settings. The phylogenetic tree was constructed using the UPGMA method [43] with 100 bootstrap replicates.
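The clustering step of UPGMA can be sketched in a few lines. This is an illustration of the method only, not the software used in the study: it takes a ready-made pairwise distance table (in practice derived from the multiple sequence alignment), omits bootstrapping, and uses hypothetical sequence labels and distances.

```python
def upgma(distances):
    """Cluster taxa by UPGMA.

    distances: dict mapping (taxon_a, taxon_b) -> pairwise distance, with one
    entry per unordered pair. Returns (tree, root_height), where tree is a
    nested tuple of taxon names.
    """
    size = {}   # cluster -> number of leaves it contains
    d = {}      # frozenset({cluster_1, cluster_2}) -> distance
    for (a, b), value in distances.items():
        size[a] = size[b] = 1
        d[frozenset((a, b))] = value

    height = 0.0
    while len(size) > 1:
        # Merge the two closest clusters; sort by str for a stable order.
        closest = min(d, key=d.get)
        a, b = sorted(closest, key=str)
        height = d.pop(closest) / 2   # node height is half the distance
        merged = (a, b)
        for other in size:
            if other in (a, b):
                continue
            # Distance to the new cluster is the size-weighted average.
            da = d.pop(frozenset((a, other)))
            db = d.pop(frozenset((b, other)))
            d[frozenset((merged, other))] = (
                (size[a] * da + size[b] * db) / (size[a] + size[b]))
        size[merged] = size.pop(a) + size.pop(b)
    (tree,) = size
    return tree, height


# Hypothetical pairwise distances between four sequences.
toy = {
    ("seqA", "seqB"): 2, ("seqA", "seqC"): 6, ("seqB", "seqC"): 8,
    ("seqA", "seqD"): 10, ("seqB", "seqD"): 10, ("seqC", "seqD"): 9,
}
tree, root_height = upgma(toy)
```

UPGMA assumes a constant rate of change along all branches, which is why every merge is placed at half the inter-cluster distance. Bootstrap support, as used in the study, would require repeating the procedure on resampled alignment columns.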

Results and discussion

Quality assessment and length distribution of contigs

To create a complete annotation framework for the transcriptome dataset obtained from the nerve cord cDNA sample of E. fetida, we selected a total of 94,716 contigs generated from the de novo assembly of 562,327 filtered reads using the Newbler and Seqman assemblers (Gong et al., 2010; Table S1, Table S2) [27]. Both Newbler and Seqman are efficient assemblers for 454 transcriptome datasets. The Newbler assembly is comparatively faster and produces fewer contigs with low redundancy and better alignment scores, whereas the Seqman assembly usually generates a larger number of small and redundant contigs along with many significant novel transcripts [44], [45]. The quality assessment statistics (Table 1) obtained from the NGS QC Toolkit showed that the N50 and N75 values were 146 bp and 109 bp for the Newbler assembled contigs and 170 bp and 119 bp for the Seqman assembled contigs, respectively. The GC content of both the Newbler and Seqman assembled contigs was 39.89%. Contig length distribution is an important feature because it is linked to the number of BLAST matches that can be retrieved. Hence, we analyzed the contig length distribution and correlated it with the number of sequences matched in the BLAST search against the non-redundant (nr) database. Among the 94,716 assembled contigs, a large proportion (87,183, 92.04%) had a sequence length ≤ 300 bp, whereas only 5000 (5.27%), 1504 (1.58%), 586 (0.61%) and 443 (0.46%) contigs had sequence lengths of 301–500 bp, 501–700 bp, 701–900 bp and ≥ 901 bp, respectively.
Table 1

Summary Statistics of the Newbler and Seqman assembled contigs.

Statistics                     Newbler contigs    Seqman contigs
Total sequences                31,114             63,602
Total bases                    4,828,470          10,463,348
Min sequence length (bp)       72                 40
Max sequence length (bp)       1395               2167
Average sequence length (bp)   155.19             164.51
Median sequence length (bp)    115.00             128.00
N25 length (bp)                277                303
N50 length (bp)                146                170
N75 length (bp)                109                119
GC%                            39.89%             39.89%
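The length-bin shares quoted in the text can be recomputed from the reported counts. The short sketch below is only an arithmetic check; the variable and function names are hypothetical. The reported percentages are reproduced when values are truncated, rather than rounded, to two decimals.

```python
# Contig counts per length bin, as reported in the text.
bins = {
    "<= 300 bp": 87183,
    "301-500 bp": 5000,
    "501-700 bp": 1504,
    "701-900 bp": 586,
    ">= 901 bp": 443,
}
total = sum(bins.values())


def truncated_percent(count, whole, places=2):
    """Percentage of `whole`, truncated (not rounded) to `places` decimals."""
    scale = 10 ** places
    return (count * 100 * scale // whole) / scale


shares = {label: truncated_percent(n, total) for label, n in bins.items()}
```

The bin counts sum to the 94,716 contigs in Table 1 (31,114 Newbler plus 63,602 Seqman), so no contig is left out of the length distribution.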

Annotation of E. fetida contigs against public databases

In a transcriptomic study, annotation is a crucial step for gaining maximum information about the sequence identity and features of the resulting transcripts. The assembled contig sequences of E. fetida were annotated by similarity searching against the NCBI non-redundant protein (nr) database and the SwissProt database using the BLASTX algorithm with a cut-off E-value < 1E-5. The annotation was further improved by comparing the contigs against the EST database using the BLASTN algorithm with an E-value threshold of 1E-5. A total of 17,189 (18.14%) contigs were annotated by BLAST search against these three databases. Overall, 9762 (10.30%), 8992 (9.49%) and 9560 (10.09%) E. fetida transcripts had significant BLAST matches against the nr, SwissProt and EST databases, respectively. A three-way Venn diagram shows the overlap of the assembled contigs annotated against the nr, SwissProt and EST databases (Fig. 1A). According to the Venn diagram, 2689 contigs had homologous sequences in all three databases, and 395, 276 and 5075 contig sequences showed annotation overlap between nr and EST, EST and SwissProt, and nr and SwissProt, respectively. Among the contigs with significant BLAST hits, 6199 (36.06%) matched exclusively with dbEST, as they were not annotated against the nr and SwissProt databases. The sequence-based similarity analysis showed that most of the assembled contigs did not match known proteins in these databases, probably because of their short sequence length or the lack of genome information for E. fetida.
After correlating contig length with the retrieved BLAST hits, we observed that 51.57% of the contigs with sequence length ≥ 900 bp showed BLAST matches against the nr database, whereas the matching efficiency gradually decreased to 10.43% for contigs with sequence lengths between 100 and 300 bp (Fig. 1B). The low number of BLAST hits for shorter contigs is mainly due to the lack of conserved domains representing the functional properties of the annotated proteins [46]. The overall data distribution of the annotated contigs obtained from the BLAST2GO analysis is outlined in Fig. 1C. The complete list of the Newbler and Seqman assembled transcripts annotated against the nr database is documented in Table S1.
Fig. 1

(A) A three-way Venn diagram showing the unique and overlapping transcripts annotated against the public databases nr, SwissProt and EST (BLASTX algorithm; E-value threshold 1E-5). (B) The correlation between contig length and the percentage of sequences with BLAST matches. Longer assembled contigs have a higher proportion of sequences with BLAST matches against the NCBI nr database (cut-off E-value < 1E-5). (C) The data distribution obtained from BLAST2GO annotation, showing the InterProScan annotation, BLAST hits, mapping and annotation summary of the assembled contigs.
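The seven regions of a three-way Venn diagram such as Fig. 1A follow directly from the three sets of annotated contig identifiers. A minimal sketch with Python sets is given here; the function name and the contig identifiers are hypothetical, and the real sets would come from the per-database BLAST results.

```python
def venn3_regions(nr, swissprot, est):
    """Count contigs in each of the seven regions of a three-set Venn diagram."""
    return {
        "nr only": len(nr - swissprot - est),
        "SwissProt only": len(swissprot - nr - est),
        "EST only": len(est - nr - swissprot),
        "nr & SwissProt only": len((nr & swissprot) - est),
        "nr & EST only": len((nr & est) - swissprot),
        "SwissProt & EST only": len((swissprot & est) - nr),
        "all three": len(nr & swissprot & est),
    }


# Hypothetical contig identifiers with hits in each database.
nr_hits = {"c1", "c2", "c3", "c4"}
swissprot_hits = {"c2", "c3", "c5"}
est_hits = {"c3", "c4", "c6", "c7"}

regions = venn3_regions(nr_hits, swissprot_hits, est_hits)
annotated = len(nr_hits | swissprot_hits | est_hits)
```

Because the regions are mutually exclusive, their counts add up to the size of the union, which corresponds to the total number of contigs annotated against at least one database.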

E-value distribution and sequence similarity distribution are important statistical parameters for evaluating the quality of a BLAST analysis [47]. The E-value distribution of the top BLAST hits from the nr database revealed that, among the contigs with BLAST hits, 9040 (92.60%) showed significant homology with E-values ranging from 1E-5 to 1E-45 (Fig. 2A). The similarity distribution indicated that 53.69% of the assembled contigs with BLAST hits showed sequence similarity ≥ 80% to known protein sequences in the nr database (Fig. 2B). Analysis of the top-hit species distribution for the E. fetida transcriptome dataset identified the leech Helobdella robusta as the top-hit species, with the largest number of BLAST hits (20%). Among other species, a notable number of BLASTX hits matched nr protein sequences of the polychaete worm Capitella teleta (16%), the brachiopod Lingula anatina (6%), the oyster Crassostrea gigas (4%) and the owl limpet Lottia gigantea (3%) (Fig. 2C). Helobdella robusta and Capitella teleta are the most extensively studied annelid species, with complete draft genome information giving detailed insights into their sensory perception, signal transduction, embryonic development and metabolic activities [48]. In addition, the genome datasets of these two annelids have revealed several neurohormone and neuropeptide precursors, providing a significant clue to neuropeptide evolution in the early lineage [49]. The top-hit species distribution obtained from the BLASTX annotation indicates that the nerve cord specific transcripts of E. fetida are evolutionarily more closely related to those of other annelids, and that the distribution of species depends on the completeness of the genome information available in the databases.
Interestingly, only 141 (1%) of the BLAST hits for the identified transcripts matched known protein sequences of E. fetida in the nr database, probably because of the lack of an annotated draft genome and insufficient proteome information for the species in the database.
Fig. 2

(A) E-value distribution of the nr BLAST hits for each contig sequence with a cut-off E-value of 1E-5. (B) Sequence similarity distribution (in percentage) of the top BLAST hits for each contig sequence. (C) Top-hit species distribution of the E. fetida contigs annotated against the nr database with an E-value cut-off of 1E-5.

Conserved domain annotation of E. fetida transcripts

Annotation based on evolutionarily conserved domains provides detailed insight into the domain architecture and functional sites conserved across protein families [50]. To further understand the functional properties of the E. fetida transcripts, conserved domains were identified against the InterPro database using the BLAST2GO software [34]. The top 30 InterPro domains/families for the Newbler and Seqman assembled contigs are shown in Fig. 3. The InterProScan analysis identified “P-loop containing nucleoside triphosphate hydrolase” (IPR027417) as the most represented domain with 222 contigs, followed by the “EF-hand domain pair” (IPR011992) (160 contigs), “EF-hand domain” (IPR002048) (132 contigs) and “NAD(P)-binding domain” (IPR016040) (109 contigs). Detailed analysis of the identified domains showed that most of the abundant domains are associated with critical biological functions such as nucleotide binding, signal transduction and transcriptional regulation, which carry out genetic and environmental information processing. More specifically, the P-loop NTPase domain plays a crucial role in energy production through NTP hydrolysis and binds nucleotides [51]. The domain also participates in signal transduction, stress response and bacterial transcriptional regulation [52]. The abundance of this domain in the E. fetida transcriptome dataset indicates that many of the transcripts expressed in the nerve cord are associated with signal transduction, protein transport and localization, signal recognition, chromosome partitioning and membrane transport [53]. In addition, the InterPro analysis showed the presence of several ATP synthase subunits and molecular motor proteins with the P-loop NTPase domain in the transcriptome dataset.
ATP and related nucleotides are important neurotransmitters of the central and peripheral nervous systems, mediating purinergic neurotransmission, smooth muscle contraction and sensory perception of external stimuli [54], [55]. Among the molecular motor proteins, myosin is a conserved motor protein that regulates muscle contraction and motility in most living organisms through ATP hydrolysis [56], [57]. Since our transcriptome dataset derives from the nerve cord of E. fetida, the energy generated by nucleotide hydrolysis may assist in neurotransmission, sensory signal transduction and muscle contraction. The second most represented domain in our study, the EF-hand domain pair, is mainly involved in signal transduction, transcriptional regulation and muscle contraction through calcium ion binding [58]. The EF-hand domain is also the hallmark of calcium-binding proteins that participate in neuronal stimulation and neural circuit development through their calcium-buffering activities. In invertebrates such as Drosophila, calcium buffering assists axonal growth during neuronal development [59], [60]. Subsequent studies have reported that EF-hand domain containing calcium-binding proteins play a regulatory role in the maturation of the nervous system and the formation of chemosensory neurons in nematodes such as Caenorhabditis elegans, facilitating serotonin synthesis [61]. Hence, the expression of calcium-binding proteins containing the EF-hand domain in the transcriptome dataset may assist in the neuronal development and locomotory activity of the worm and help maintain its behavioral physiology.
Fig. 3

Histogram of top 30 InterPro domains distribution obtained from InterProScan annotation of the E. fetida nerve cord transcriptome dataset using the BLAST2GO software.

Gene ontology annotation

Gene ontology (GO) is an international functional classification system that aims to annotate and classify the functional features of genes and gene products in a species-independent manner [32]. BLAST2GO software was used to categorize the annotated contigs into three ontologies: Biological Process, Molecular Function and Cellular Component. We used the BLAST2GO mapping step to retrieve the GO terms associated with the contigs having BLAST hits in the nr database. Of the 9762 contigs with nr BLAST hits, 8819 (90.34%) were mapped to associated GO terms, and 7680 (78.67%) were successfully annotated with a total of 44,354 GO terms. Of these 44,354 GO terms, 19,786 (44.60%) belong to Biological Process, 12,937 (29.16%) to Molecular Function and 11,631 (26.22%) to Cellular Component. The major functional subcategories describing the predicted functions of the annotated contigs were obtained from GO level 2 and represent a total of 55 functional groups (20 Biological Process, 15 Molecular Function and 20 Cellular Component) (Fig. 4). Within Biological Process, the dominant subcategories were “cellular process” (58.45%), “metabolic process” (56.17%) and “single-organism process” (40.06%); within Cellular Component, the most represented GO terms were “cell” (56.88%), “cell part” (56.30%) and “organelle” (44.80%); and within Molecular Function, most of the annotated contigs were associated with subcategories such as “binding” (53.34%) and “catalytic activity” (45.96%).
Interestingly, the transcriptome annotation data obtained from the sterile-cultured earthworm Eisenia andrei also identified these GO terms as the dominant functional subcategories for the reported transcripts [25]. Apart from these over-represented subcategories, a substantial percentage of the annotated contigs were assigned to significant GO terms such as “biological regulation” (GO:0065007), “response to stimulus” (GO:0050896), “structural molecule activity” (GO:0005198) and “macromolecular complex” (GO:0032991).
Fig. 4

Histogram representing the gene ontology distribution of the assembled E. fetida contigs. The functionally annotated contigs were classified into three main GO categories: Biological Process (BP), Molecular Function (MF) and Cellular Component (CC).

Environmental and cellular signals play a pivotal role in regulating the physiological processes of living organisms. In earthworms, sensory perception of external stimuli is a notable physiological feature carried out mainly by the sensory nerve fibers residing in the ventral nerve cord ganglion [62]. Tactile, photic and chemical stimuli create electrical impulses in the form of action potentials that excite the giant nerve fibers and support the sensory and escape reflexes of the worm [63]. In addition, several neuropeptides and monoamine neurotransmitters serve as important chemical messengers that transmit neuronal signals and mediate synaptic modulation [16]. Hence, the nerve cord of E. fetida is expected to contain a large number of contigs associated with signal transduction, cell signaling and sensory perception. In the annotated transcriptome dataset, 323 contigs were assigned to the function “signal transduction” (GO:0007165), 31 contigs to “cell-cell signaling” (GO:0007267) and 30 contigs to different sensory perceptions (GO:0007601, GO:0050908, GO:0050916, GO:0019233, GO:0001580, GO:0007600, GO:0007605, GO:0050909) (Table S2).
A total of 410 transcripts in the transcriptome dataset were associated with different signaling pathways, including the cytokine-mediated signaling pathway (GO:0019221), G-protein coupled receptor signaling pathway (GO:0007186), fibroblast growth factor receptor signaling pathway (GO:0008543), epidermal growth factor receptor signaling pathway (GO:0007173), Notch signaling pathway (GO:0007219), Wnt signaling pathway (GO:0016055) and smoothened signaling pathway (GO:0007224). G protein-coupled receptors (GPCRs) are conserved seven-transmembrane-domain receptors that mediate sensory perception in response to environmental cues by altering the neural circuits [64]. Several odor molecules, pheromones, light-sensitive components, neurohormones and neurotransmitters serve as potential ligands that bind and activate GPCRs to drive a wide range of cellular and physiological processes such as photoreception, olfaction, chemoreception, synaptic transmission, neuromodulation and cellular communication [65], [66]. In addition, the Notch pathway acts as a key signaling mechanism that determines cell fate during development. In invertebrates such as Drosophila, the Notch pathway assists in the patterning of neurogenesis and the lateral inhibition of neural fate to promote embryonic and larval development [67]. Interestingly, in the annelid Platynereis, Notch signaling is associated with the development of the chaetal sac and the formation of bristles that support locomotory activity [68]. Previous studies confirmed that the ventral nerve cord has a pronounced effect on the reproductive behavior of several invertebrate species. In the tobacco budworm, an insect, the ventral nerve cord regulates the emission of sex pheromone and enhances egg production and deposition [69]. The egg-laying behavior of the nematode C. elegans is modulated by different environmental stimuli that activate the neural circuits, motor neurons and skeletal muscles.
The ventral nerve cord plays a major role in regulating the hermaphrodite-specific motor neurons that conduct the synaptic transmission for contraction of the egg-laying muscles in C. elegans [70]. Transcriptome analysis has pinpointed the regulatory role of pheromones (Attractin, Temptin and Seductin), sexual differentiation markers (DMRT3, SPATA2 and SOX3), germ line determination factors (vasa, PL10) and fertilization factors (Fertilin and Acrosin) in the reproduction and sexual maturity of Mediterranean earthworms [71]. The functional annotation of E. fetida showed that a total of 89 contigs in the dataset were assigned to the gene ontology term reproduction (GO:0000003) and 61 contigs were associated with embryo development (GO:0009790, GO:0009795). Based on the sequence homology and functional importance of the annotated contigs, we shortlisted some crucial genes (Table 2) regulating functions such as anatomical structure formation involved in morphogenesis (GO:0048646), developmental maturation (GO:0021700), gonad development (GO:0008406), hatching (GO:0035188), nematode larval development (GO:0002119) and hermaphrodite genitalia development (GO:0040035). Among the identified genes, Ras-related Rab-2, dihydrolipoyl dehydrogenase, DEAD box, Centrin-2 and Calmodulin showed a complete (100%) sequence match against the nr database. Calmodulin is a well-known calcium-binding protein that can alter the Smad signaling pathway to trigger ventral mesoderm formation in Xenopus species [72]. In addition, calmodulin plays a decisive role in modulating calcium function to allow the compaction essential for blastocyst formation [73]. Among the other embryonic regulators, the 14-3-3 zeta protein affects the RAS/MAP kinase pathway to trigger eye and embryo differentiation in Drosophila [74].
Table 2

List of transcripts associated with embryonic development and reproduction.

SeqName | Description | E-value | Sim mean
Newbler_Contig00646 | Ras-related Rab-2 | 5.46E-13 | 100%
Newbler_Contig08599 | cAMP-dependent kinase catalytic subunit beta isoform X3 | 3.47E-21 | 97%
Newbler_Contig11354 | Dihydrolipoyl dehydrogenase | 6.86E-15 | 100%
Newbler_Contig12349 | DEAD box ATP-dependent RNA helicase | 3.51E-13 | 100%
Newbler_Contig17280 | Pre-mRNA-splicing factor ATP-dependent RNA helicase DHX16 | 1.47E-14 | 98.05%
Newbler_Contig19026 | Guanine nucleotide-binding subunit beta-2-like 1 | 2.42E-20 | 99.85%
Newbler_Contig19125 | Ras-related Rac1 | 3.65E-15 | 99.85%
Newbler_Contig20803 | Transmembrane emp24 domain-containing 2 | 1.25E-100 | 92%
Newbler_Contig21587 | Ras-related Rap-1b isoform X2 | 1.86E-74 | 94.65%
Newbler_Contig23046 | DNA excision repair ERCC-1 isoform X1 | 5.65E-15 | 97%
Newbler_Contig24204 | Structural maintenance of chromosomes 1A | 3.58E-21 | 97.30%
Newbler_Contig25563 | Centrin-2 isoform X2 | 1.02E-35 | 100%
Newbler_Contig28743 | Calmodulin | 1.00E-51 | 100%
Newbler_Contig28993 | Ubiquitin-conjugating enzyme E2 2 | 6.26E-13 | 99.80%
Newbler_Contig30617 | BRICK1 | 1.58E-20 | 92.75%
Seqman_Contig_4254 | TALDO_DROME RecName: Full=Probable transaldolase | 1.49E-35 | 80.40%
Seqman_Contig_16470 | Elongation factor 1- oocyte form | 1.10E-23 | 98.60%
Seqman_Contig_23100 | frataxin mitochondrial | 6.56E-14 | 87.10%
Seqman_Contig_31323 | 14-3-3 zeta | 1.46E-26 | 97.70%
Seqman_Contig_40681 | Casein kinase I isoform alpha | 1.23E-20 | 95%
Seqman_Contig_50528 | Angiopoietin-related protein 2 | 1.45E-19 | 97%
Seqman_Contig_60730 | GTP-binding 128up | 2.73E-25 | 87.70%
Seqman_Contig_60731 | 14-3-3 GF14 omicron | 3.95E-12 | 99.70%

Enzyme based classification

The Enzyme Commission (EC) number is a hierarchical classification system that classifies enzymes according to the chemical reactions they catalyze [75]. Analysis of the EC numbers assigned to the annotated contigs indicated that, among the 7680 contigs with functionally annotated gene ontology terms, 2638 contig sequences were assigned to 535 EC numbers. The enzyme code distribution for the annotated contig dataset is presented in Fig. 5A. Among the enzyme classes, enzymes with hydrolase activity were the most represented (1265, 47.96%), followed by enzymes with transferase (629, 23.84%) and oxidoreductase (513, 19.44%) activities. The over-representation of these enzyme classes indicates that most of the annotated contigs of E. fetida are involved in crucial metabolic processes, signal transduction and the immune response of the species. This functional interpretation is supported by previous studies explicating the roles of oxidoreductases and hydrolases in regulating metabolism [76] and signal transduction [77].
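The class shares quoted above can be recomputed from the reported counts. The following is a minimal sketch, not the authors' procedure: it assumes that the denominator is the 2638 contigs carrying an EC number and that each contig is counted in exactly one enzyme class. The variable and function names are illustrative only, and the recomputed values can differ from the reported percentages in the last decimal place.

```python
# Counts reported in the text for the three most represented enzyme classes.
ec_class_counts = {
    "hydrolase": 1265,
    "transferase": 629,
    "oxidoreductase": 513,
}
total_ec_contigs = 2638  # contigs assigned to at least one EC number (from the text)


def class_shares(counts, total):
    """Return each class's share of the total as a percentage."""
    return {name: 100.0 * n / total for name, n in counts.items()}


shares = class_shares(ec_class_counts, total_ec_contigs)
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} {ec_class_counts[name]:5d} {pct:6.2f}%")

# Contigs left over for the remaining enzyme classes, under the ASSUMPTION
# that every contig is counted in exactly one class.
remaining = total_ec_contigs - sum(ec_class_counts.values())
print("contigs in the remaining classes:", remaining)
```

If a contig can carry EC numbers from more than one class, the leftover count is only a lower bound on the size of the remaining classes.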
Fig. 5

(A) Distribution of the enzyme classes in the annotated contig sequences of earthworm E. fetida. (B) Histogram presenting the Clusters of Orthologous Groups (COG) classification of the assembled contig sequences.


COG functional classification

The Clusters of Orthologous Groups (COGs) database is a useful resource that allows the possible functions of gene products from a new species to be predicted and classified based on their orthologous relationships [78]. The COG functional annotation was performed using the EggNOG database integrated in the BLAST2GO annotation pipeline [35]. The EggNOG analysis showed that a total of 5860 transcript sequences of E. fetida (60.02% of the contigs with a BLAST hit) were assigned to 25 COG functional groups (Fig. 5B). Among these 25 functional categories, the cluster “Posttranslational modification, protein turnover, chaperones” (1991, 33.98%) was the largest functional group. Heat shock proteins (HSPs) are notable molecular chaperones with anti-apoptotic activity that protect cells under different stress conditions such as altered temperature, pH and oxygen deprivation [79]. These proteins are expressed abundantly in glial and neuronal cells and are involved in regulating inflammatory responses, cell homeostasis and cell survival. In addition, heat shock proteins show altered expression in the central and peripheral nervous systems to support axonal regeneration and remyelination after nerve injuries such as cerebral ischemia [80]. The transcriptome dataset of E. fetida contained a total of 65 heat shock protein transcripts. Among the other functional groups, the clusters “Translation, ribosomal structure and biogenesis” (1233, 21.04%), “Function unknown” (1180, 20.13%), “Intracellular trafficking, secretion, and vesicular transport” (909, 15.51%), “Signal transduction mechanisms” (822, 14.02%), “General Function Prediction Only” (710, 12.11%) and “Energy production and conversion” (700, 11.94%) were the most highly represented functional categories (Fig. 5B).
The COG functional annotation also showed that the clusters “Cell wall/membrane/envelope biogenesis”, “Defense mechanisms”, “Extracellular structures”, “Cell motility” and “Nuclear structure” were under-represented and constituted the smallest functional groups in our study. Overall, the COG analysis indicates that most of the contigs in the E. fetida transcriptome dataset are associated with information storage and processing. In the metabolism category, the clusters “Carbohydrate transport and metabolism”, “Lipid transport and metabolism” and “Amino acid transport and metabolism” were well represented. In addition, a considerable number of contigs were assigned to poorly characterized functional categories, probably because E. fetida is phylogenetically distant from the other species available in the COG database. In eukaryotic systems, ribosome biogenesis is an essential metabolic activity connected with pivotal cellular processes such as cell proliferation and differentiation [81]. In invertebrate species such as C. elegans in particular, ribosome biogenesis regulates gonadogenesis by assisting in germline proliferation and pattern formation [82]. The transcriptome annotation of E. fetida includes several important translational regulatory factors, such as 60S ribosomal L5, 60S ribosomal L3, 40S ribosomal S13, 40S ribosomal S5 and 40S ribosomal S20, which may assist in the biogenesis and structural constitution of the ribosome in the worm.

KEGG pathway annotation

The KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway database is a knowledgebase that displays the biological pathways connected to genes or gene products, derived from molecular interaction network analysis [36]. These biological pathways provide significant functional information by graphically representing the crucial cellular and metabolic processes of a species. To identify the biological pathways associated with the transcriptome dataset generated from the nerve cord tissue of E. fetida, the assembled contigs were annotated with the corresponding EC numbers against the KEGG Pathway database using the BLAST2GO annotation tool. In total, 4502 contig sequences were mapped to 124 KEGG pathways. Fig. 6 shows the top 20 KEGG pathways most highly represented by the annotated contig sequences of E. fetida. Among these pathways, Purine metabolism (617 contigs), Thiamine metabolism (500 contigs) and Biosynthesis of antibiotics (313 contigs) were the most dominant KEGG pathways observed in the transcriptome dataset. The other highly represented pathways were Glycolysis/Gluconeogenesis, Oxidative phosphorylation and Aminobenzoate degradation. Notably, purines are small organic molecules involved in a wide range of neuronal functions such as neurotransmission, neuromodulation, control of neurite outgrowth and glial cell proliferation. Purine metabolism is firmly connected to neuronal differentiation and neural development, and any deficiency in purine biosynthesis and recycling can lead to the degeneration of motor control, behavior and cognition [83]. In addition, purinergic signaling in the central and peripheral nervous systems is essential for regulating diverse physiological and cellular processes such as mechanosensory transduction, neurohormone secretion, cell proliferation, differentiation and neuronal protection [84].
Further analysis of the mapped KEGG pathways indicated that the majority of the annotated contig sequences were attributed to metabolism pathways (4229 contigs), whereas only a few contigs were mapped to organismal systems (162 contigs), genetic information processing (83 contigs) and environmental information processing (27 contigs) (Table S3).
Fig. 6

Pie chart denoting the distribution of top 20 KEGG pathways associated with the annotated contigs of earthworm Eisenia fetida.

Functional enrichment analysis

Functional enrichment analysis is a powerful analytical technique that provides valuable information on the over- and under-representation of annotated functions and biological pathways associated with a group of genes compared with their background dataset [85]. The enrichment analysis of the nerve cord specific GO terms and KEGG pathways in comparison with the entire genome of E. fetida [37] was performed using Fisher's exact test with an FDR-corrected P value < 0.05. The top 10 over-represented GO terms from each of the three ontologies are shown in Fig. S1. The GO enrichment data showed that “ATP binding” (GO:0005524), “extracellular exosome” (GO:0070062), “structural constituent of ribosome” (GO:0003735) and “GTP binding” (GO:0005525) were the most enriched GO terms for the nerve cord transcripts. The enrichment of transcripts associated with molecular functions such as ATP and GTP binding in the nerve cord transcriptome dataset is notable, as these transcripts might be a potential source of energy and phosphate to support crucial physiological functions such as neuronal sensory signal transduction and the muscle contraction and relaxation of the worm [86]. The details of the over-represented GO terms and biological pathways are given in Table S4. The pathway enrichment study suggested that, among the 124 KEGG pathways mapped to the nerve cord transcripts, 23 pathways were functionally enriched, and transcripts associated with nucleotide metabolism, lipid metabolism and metabolism of cofactors and vitamins were abundant in the transcriptome dataset.
Fig. S1

Bar chart representing top 30 functionally enriched gene ontology terms from three different categories (Biological Process, Cellular Component and Molecular Function), associated with the nerve cord transcripts of earthworm E. fetida in comparison to their whole genome dataset.
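The enrichment test described in this section (Fisher's exact test followed by FDR correction at 0.05) can be illustrated with a short standard-library sketch. It is an illustration only and not the BLAST2GO implementation: the one-sided (over-representation) form of the test, the Benjamini-Hochberg procedure as the FDR method, the function names and all term names and counts below are assumptions made for the example.

```python
from math import comb


def fisher_over_representation(a, b, c, d):
    """One-sided Fisher's exact test for over-representation.

    2x2 table of sequence counts:
        a = test set WITH the term        b = test set WITHOUT the term
        c = reference set WITH the term   d = reference set WITHOUT the term
    Returns P(X >= a) under the hypergeometric null distribution.
    """
    n_total = a + b + c + d   # all sequences in the table
    n_term = a + c            # sequences carrying the term
    n_test = a + b            # size of the test set
    denominator = comb(n_total, n_test)
    tail = sum(
        comb(n_term, k) * comb(n_total - n_term, n_test - k)
        for k in range(a, min(n_term, n_test) + 1)
    )
    return tail / denominator


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# HYPOTHETICAL counts (a, b, c, d) for three made-up terms.
example_terms = {
    "term_enriched": (120, 880, 150, 8850),
    "term_neutral": (50, 950, 500, 8500),
    "term_depleted": (5, 995, 400, 8600),
}

names = list(example_terms)
raw_p = [fisher_over_representation(*example_terms[n]) for n in names]
adj_p = benjamini_hochberg(raw_p)
significant = [n for n, q in zip(names, adj_p) if q < 0.05]
print(significant)
```

With these made-up counts only the first term passes the 0.05 threshold after correction. A two-sided test would additionally be needed to flag under-represented terms.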

Identification of neuropeptide specific genes

Neuropeptides are peptidergic neurotransmitters capable of transmitting cellular signals to facilitate neuronal communication [87]. Neuropeptides are widely distributed in the central and peripheral nervous systems and regulate a wide variety of cellular, physiological and behavioral processes such as metabolism, energy expenditure, reproduction, circadian rhythmicity, pain sensation, memory and learning ability [88]. In invertebrate species, neuropeptides are mainly confined to the neurosecretory cells, interneurons and motor neurons and act as synaptic modulators that affect the neural circuits and neuromuscular junctions in the central nervous system [89]. In nematodes such as C. elegans in particular, neuropeptides control muscle contraction and convey convulsive behavior through an excitatory and inhibitory imbalance of the cholinergic and GABAergic motor neurons [90]. In annelids such as earthworms, leeches and polychaetes, more than 40 neuropeptides have been identified so far, showing high sequence similarity to the reported neuropeptides of arthropods, molluscs and vertebrates [91]. Among earthworm species, E. fetida has been thoroughly investigated to identify putative neuropeptides and interpret their function in physiological regulation. The neuropeptides are mainly observed in the ventral nerve cord ganglia and the stomatogastric nervous system of the worm and are known to regulate physiological and reproductive activities such as muscle contraction, gut motility, osmoregulation and egg-laying behaviors associated with changes in body shape, rotatory movements and mucosal secretion [15], [16], [92], [93]. In addition, neuropeptides participate in embryogenesis and in the caudal regeneration process of the species [16], [94]. The nerve cord specific transcriptome sequences of E. fetida annotated against the nr database revealed a total of 37 contigs homologous to 22 neuropeptides (Table 3). Most of the neuropeptide homologs in the nr BLAST alignment were retrieved from the marine annelid Platynereis dumerilii. Among the identified neuropeptides, XPRLamide neuropeptide precursor, whitnin-1 neuropeptide precursor SPTR, QERAS neuropeptide precursor, WI neuropeptide and allatostatin-C neuropeptide precursor showed high sequence homology (80–100%) in the BLAST search. XPRLamide is an insect pyrokinin-related neuropeptide that plays a regulatory role in insect development through diapause termination and assists in sexual communication through pheromone biosynthesis [95]. The whitnin gene is a proctolin neuropeptide homolog commonly observed in the annelid Platynereis dumerilii and the gastropod Haliotis asinina [96], [97]. Proctolin functions as a myotropic neuromuscular transmitter that regulates visceral and skeletal muscle contraction through neuromodulatory activity [98]. Allatostatin-C neuropeptide is widely distributed throughout the peripheral and ventral nerve cord ganglia of the arthropod Calanus finmarchicus and modulates its feeding and locomotory activity for physiological adaptation [99]. In the lobster Homarus americanus, allatostatin-C has a modulatory role on the cardiac ganglion to trigger rhythmic heart contraction [100]. In addition, the transcriptome data indicated the expression of neuropeptide Y and FMRF-amide neuropeptides in the nerve cord tissue. Both of these neuropeptides, along with proctolin, are known to be expressed in the stomatogastric nervous system of E. fetida, where they regulate the muscle cells in the alimentary canal of the worm [15].
Table 3

List of Neuropeptides with their gene ontology details.

Sl. No. | SeqName | Description | Length | E-value | Sim mean | GO names
1 | Newbler_Contig0720505 | CLCCY neuropeptide precursor [Platynereis dumerilii] | 235 | 4.58E-07 | 70 | P:neuropeptide signaling pathway; C:extracellular region
2 | Newbler_Contig08652 | XPRLamide neuropeptide precursor [Eisenia fetida] | 183 | 1.03E-32 | 100 | P:signal transduction
3 | Newbler_Contig11852 | XPRLamide neuropeptide precursor [Eisenia fetida] | 146 | 1.03E-21 | 100 | P:signal transduction
4 | Newbler_Contig13284 | FMRF-amide neuropeptides-like [Biomphalaria glabrata] | 237 | 7.94E-16 | 67.5 | P:neuropeptide signaling pathway
5 | Newbler_Contig18293 | whitnin-1 neuropeptide precursor SPTR [Platynereis dumerilii] | 254 | 1.54E-16 | 71.67 | C:cellular_component
6 | Newbler_Contig24715 | FMRFamide precursor [Lumbriculus variegatus] | 485 | 2.48E-06 | 66 | P:neuropeptide signaling pathway
7 | Newbler_Contig25668 | neuropeptide F [Lottia gigantea] | 309 | 4.55E-08 | 69.33 | P:neuropeptide signaling pathway; C:membrane; C:integral component of membrane; C:extracellular region; F:hormone activity
8 | Newbler_Contig26118 | SVPGVLRF-amide 3 [Aplysia californica] | 305 | 4.87E-17 | 61.56 | C:extracellular space; P:neuropeptide signaling pathway; P:embryo development ending in birth or egg hatching; F:neuropeptide hormone activity; C:extracellular region; F:neuropeptide receptor binding; F:molecular_function; F:neuropeptide Y receptor binding
9 | Newbler_Contig26513 | whitnin-1 neuropeptide precursor SPTR [Platynereis dumerilii] | 577 | 7.10E-20 | 77.83 | P:signal transduction; C:cellular_component
10 | Newbler_Contig29900 | WI neuropeptide partial [Platynereis dumerilii] | 351 | 1.41E-15 | 77 | P:neuropeptide signaling pathway
11 | Newbler_Contig30157 | prohormone-3 neuropeptide precursor [Platynereis dumerilii] | 481 | 1.83E-43 | 60.5 | P:neuropeptide signaling pathway
12 | Seqman_Contig_11518 | neuropeptide Y [Lymnaea stagnalis] | 224 | 4.04E-14 | 73.25 | P:neuropeptide signaling pathway; C:extracellular region; F:hormone activity
13 | Seqman_Contig_11560 | MIP allatostatin B neuropeptide precursor [Platynereis dumerilii] | 816 | 1.01E-17 | 52.5 | P:neuropeptide signaling pathway
14 | Seqman_Contig_18656 | pedal peptide neuropeptide precursor 2 MLDpeptide [Platynereis dumerilii] | 231 | 1.10E-10 | 61 | P:neuropeptide signaling pathway
15 | Seqman_Contig_1992 | WI neuropeptide partial [Platynereis dumerilii] | 253 | 2.76E-12 | 84 | P:neuropeptide signaling pathway
16 | Seqman_Contig_20437 | Mesotocin-neurophysin partial [Corvus brachyrhynchos] | 280 | 7.46E-10 | 57.75 | P:neuropeptide signaling pathway; F:neurohypophyseal hormone activity; C:extracellular region
17 | Seqman_Contig_2485 | whitnin-1 neuropeptide precursor SPTR [Platynereis dumerilii] | 695 | 6.60E-28 | 72.2 | P:neuropeptide signaling pathway
18 | Seqman_Contig_27117 | allatostatin-C neuropeptide precursor [Platynereis dumerilii] | 682 | 1.59E-15 | 67 | P:neuropeptide signaling pathway
19 | Seqman_Contig_3225 | FVRIamide neuropeptide precursor [Platynereis dumerilii] | 296 | 2.35E-07 | 57.75 | P:neuropeptide signaling pathway
20 | Seqman_Contig_36821 | oxytocin-neurophysin 1 [Xenopus (Silurana) tropicalis] | 153 | 2.43E-08 | 68.27 | P:neuropeptide signaling pathway; F:neurohypophyseal hormone activity; C:extracellular region
21 | Seqman_Contig_39560 | allatostatin-C neuropeptide precursor [Platynereis dumerilii] | 604 | 1.80E-06 | 80 | P:neuropeptide signaling pathway
22 | Seqman_Contig_40507 | XPRLamide neuropeptide precursor [Eisenia fetida] | 214 | 7.63E-39 | 80 | P:neuropeptide signaling pathway
23 | Seqman_Contig_40845 | LYamide FDSIG neuropeptide precursor [Platynereis dumerilii] | 136 | 2.23E-09 | 79.5 | P:neuropeptide signaling pathway
24 | Seqman_Contig_5359 | CLCCY neuropeptide precursor [Platynereis dumerilii] | 314 | 2.46E-07 | 64.5 | P:neuropeptide signaling pathway; F:ion channel inhibitor activity; C:extracellular region; P:pathogenesis
25 | Seqman_Contig_55026 | whitnin-1 neuropeptide precursor SPTR [Platynereis dumerilii] | 160 | 9.93E-12 | 70.75 | C:membrane
26 | Seqman_Contig_57817 | FMRF-amide neuropeptides-like [Biomphalaria glabrata] | 258 | 1.16E-17 | 62.63 | P:neuropeptide signaling pathway
27 | Seqman_Contig_58801 | whitnin-1 neuropeptide precursor SPTR [Platynereis dumerilii] | 605 | 2.32E-12 | 95.5 | P:neuropeptide signaling pathway; C:integral component of membrane
28 | Seqman_Contig_59269 | QERAS neuropeptide precursor [Platynereis dumerilii] | 235 | 5.04E-08 | 94 | P:neuropeptide signaling pathway
29 | Seqman_Contig_59283 | neuropeptide F [Lottia gigantea] | 591 | 6.28E-14 | 70.4 | P:neuropeptide signaling pathway; C:extracellular region; F:hormone activity
30 | Seqman_Contig_59977 | prohormone-3 neuropeptide precursor [Platynereis dumerilii] | 473 | 1.87E-12 | 49 | P:neuropeptide signaling pathway
31 | Seqman_Contig_60850 | CLCCY neuropeptide precursor [Platynereis dumerilii] | 340 | 3.14E-09 | 72 | P:neuropeptide signaling pathway; C:extracellular region
32 | Seqman_Contig_61369 | Neuroendocrine 7B2 [Zootermopsis nevadensis] | 344 | 1.19E-34 | 63.15 | P:neuropeptide signaling pathway; C:secretory granule
33 | Seqman_Contig_61694 | prohormone-3 neuropeptide precursor [Platynereis dumerilii] | 619 | 1.78E-28 | 52.5 | P:neuropeptide signaling pathway
34 | Seqman_Contig_62842 | XPRLamide neuropeptide precursor [Eisenia fetida] | 233 | 2.59E-26 | 95 | P:neuropeptide signaling pathway
35 | Seqman_Contig_7798 | neuropeptide precursor [Platynereis dumerilii] | 830 | 1.08E-10 | 54 | P:neuropeptide signaling pathway
36 | Seqman_Contig_7835 | MIP allatostatin B neuropeptide precursor [Platynereis dumerilii] | 191 | 6.26E-09 | 68.5 | P:neuropeptide signaling pathway
37 | Seqman_Contig_8949 | MIP allatostatin B neuropeptide precursor [Platynereis dumerilii] | 1026 | 4.21E-10 | 53.5 | P:neuropeptide signaling pathway

Identification of potential stem cell markers

Annelids are renowned for their phenomenal ability to regenerate amputated body parts [4]. Among annelid species, the earthworm E. fetida has been widely used to study different aspects of annelid regeneration. The regeneration potential and survival rate of E. fetida are positively correlated with the number of segments retained by the worm after amputation [13]. Confirming the influence of the nerve cord on the regeneration process, Morgan (1902) reported that the presence of the ventral nerve cord at the amputated region is essential for head regeneration in the earthworm Allolobophora foetida [101]. Subsequent studies indicated that regeneration of the ventral nerve cord after amputation is correlated with the conduction of impulses that regulate coordinated movement and locomotory activity [102]. Recently, Nino et al. (2016) demonstrated that, during anterior regeneration in the brain-amputated earthworm Eudrilus eugeniae, the nerve cord takes control of the neurological activities and regulates the segmental nerves to assist in the worm's survival [7]. Regeneration in annelids is a combined outcome of epimorphosis and morphallaxis, and the formation of the blastema during epimorphosis is supported by the neoblast stem cells residing particularly in the epidermal cell layers [103], [104], [105]. In particular, the post-amputation wound healing process enables the synthesis and activation of several growth factors and signaling molecules, which assist in signal transduction by triggering stem cell proliferation to facilitate tissue regeneration and organogenesis [106]. The tissue homogenate of E. fetida showed activation of EGF and FGF to carry out signal transduction during regeneration [107]. In addition, crucial pluripotent stem cell markers such as oct4, c-myc, sox2 and nanog play a regulatory role in the regeneration process of the species [108].
To screen for genes associated with stem cell differentiation, we compared the entire transcriptome dataset of the worm with 250 essential stem cell markers present in human amniocytes [41]. The corresponding protein sequences of these stem cell markers were obtained from the UniProt database and searched locally with BLAST against the assembled contig dataset of E. fetida using an E-value threshold of 1E-5. In total, 487 contigs homologous to 108 stem cell markers (Table S5) were identified from the BLAST search. In the transcriptome dataset of E. fetida, we noted the presence of crucial epigenetic regulators such as Wdr5, Kat5, Rbbp4 and Cbx; transcriptional repressors such as Cnot1, Tle1 and Yy1; and cell cycle specific genes such as Smc1a, Cdk2 and Pcna, all showing high sequence homology with their human counterparts in the BLAST search. The BLAST hits retrieved against the stem cell markers were further confirmed by matching them with the nr BLAST hits. Among the trithorax group proteins, Wdr5 acts as a transcriptional activator for the self-renewal of embryonic stem cells in mammalian systems [109]. In invertebrate species such as planaria, Wdr5 forms a complex with the Smed gene, and knockdown of the Smed-wdr5 complex inhibits regeneration by reducing blastema formation [110]. The close evolutionary relationship between planaria and earthworms and the mechanistic similarity in their regeneration and organogenesis point to a possible role of this gene in assisting blastema formation in the worm.
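The screening step described above (keep BLAST hits at or below the 1E-5 E-value threshold, then count the contigs found per marker) can be sketched as follows. This is an illustrative sketch rather than the authors' script: it assumes the local BLAST results were written in the standard 12-column tabular format (BLAST+ `-outfmt 6`), with the marker protein as the query and the contig as the subject. The function name, marker labels, contig names and all hit values are hypothetical.

```python
import csv
import io
from collections import defaultdict

E_VALUE_CUTOFF = 1e-5  # threshold stated in the text

# Standard BLAST+ tabular (outfmt 6) column order:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
HYPOTHETICAL_ROWS = [
    ("WDR5_HUMAN", "contig_001", "85.0", "300", "45", "0", "1", "300", "10", "909", "1e-120", "410"),
    ("WDR5_HUMAN", "contig_002", "40.0", "120", "72", "2", "5", "124", "3", "362", "3e-07", "52.1"),
    ("PCNA_HUMAN", "contig_003", "70.2", "250", "74", "1", "1", "250", "20", "769", "4e-90", "330"),
    ("YY1_HUMAN", "contig_004", "31.0", "60", "41", "1", "300", "359", "1", "180", "0.002", "35.0"),
]
sample_text = "\n".join("\t".join(row) for row in HYPOTHETICAL_ROWS)


def contigs_per_marker(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Map each query (marker) to the set of subject contigs passing the cutoff."""
    hits = defaultdict(set)
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        marker, contig, evalue = row[0], row[1], float(row[10])
        if evalue <= cutoff:
            hits[marker].add(contig)
    return dict(hits)


hits = contigs_per_marker(sample_text)
n_markers = len(hits)
n_contigs = len(set().union(*hits.values()))
print(f"{n_contigs} contigs homologous to {n_markers} markers")
```

Counting distinct contigs across all markers, as done here, avoids double-counting a contig that matches more than one marker. Whether the reported totals were computed that way is not stated in the text.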

Phylogenetic analysis of calcium binding proteins

Calcium acts as a crucial messenger molecule that facilitates synaptic transmission, neuronal growth and axonal regeneration [59]. The domain-based annotation of the E. fetida nerve cord transcriptome demonstrated an abundance of calcium-binding proteins containing the EF-hand domain. We selected a total of 91 E. fetida contigs representing calcium-binding proteins and performed a BLAST search to identify their orthologs in the leech H. robusta and the polychaete C. teleta. The BLAST analysis showed significant orthology to 74 H. robusta proteins and 84 C. teleta proteins, respectively. The protein sequences were aligned using the Clustal W tool, and a phylogenetic tree was constructed using the UPGMA method with 100 bootstrap replicates (Fig. S2). The phylogenetic tree contained a total of 168 internal nodes and 170 leaf nodes. A few of the nodes lack proper bootstrap support, probably because the tree was constructed from a large set of gene families [111]. Among the EF-hand domain containing transcripts, the calbindin family proteins (Seqman_Contig_17763, Newbler_Contig27185, Seqman_Contig_9416, Newbler_Contig27868 and Seqman_Contig_24001) and the striated muscle regulatory proteins (Seqman_Contig_61120 and Newbler_Contig23758) were evolutionarily closely related to their orthologs in the phylogenetic tree. Among the members of the calbindin family, the multifunctional protein calretinin is widely distributed in the central and peripheral nervous systems and performs heterogeneous functions, ranging from regulating calcium homeostasis to trigger nerve impulses, to maintaining neuronal physiology, ensuring neuronal protection by reducing neuronal cell death and supporting neurogenesis [112], [113]. The protein is evolutionarily conserved and has high sequence identity with its invertebrate ortholog calbindin 32 [113].
The close evolutionary relationship of the calretinin and calbindin 32 proteins in our phylogenetic analysis suggests that the protein is functionally conserved in annelids and may play an essential role in regulating their neurophysiology and neuronal development. Our evolutionary data also indicated tandem gene duplication of the sarcoplasmic calcium-binding proteins (Seqman_Contig_3135, Newbler_Contig21647, Newbler_Contig26378, Seqman_Contig_60438, Seqman_Contig_61888, Seqman_Contig_60208, Newbler_Contig24773, Newbler_Contig07358 and Seqman_Contig_6452) in the transcriptome dataset of E. fetida. Among the other proteins, the 20 kDa calcium-binding (Antigen SM20) protein (Newbler_Contig24366) clustered together with the troponin C protein (Seqman_Contig_8254) as a monophyletic clade and showed significant evolutionary relatedness to their orthologs. In contrast, the neuronal calcium sensors (Newbler_Contig25072 and Seqman_Contig_10943) and neurocalcin (Seqman_Contig_31579) proteins were phylogenetically distant from their orthologs in H. robusta and C. teleta.
Fig. S2

Phylogenetic tree representing the evolutionary relationship of Eisenia fetida calcium binding proteins to their orthologs in the leech Helobdella robusta and the polychaete Capitella teleta. The tree was constructed using the UPGMA method and bootstrapping of 100 replicates.
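For readers unfamiliar with UPGMA, the clustering step can be illustrated with a small standard-library sketch. It is a toy example under stated assumptions, not the pipeline used for Fig. S2: it starts from a ready-made pairwise distance matrix (the Clustal W alignment and distance estimation are not reproduced), it omits bootstrapping, and the four labels and their distances are hypothetical.

```python
from itertools import combinations


def upgma(labels, distance):
    """Build a UPGMA tree from a symmetric distance matrix.

    labels:   list of leaf names
    distance: square matrix (list of lists), distance[i][j] between leaves i and j
    Returns (tree, merges): the tree as nested tuples, and a list of
    (left_subtree, right_subtree, node_height) in merge order.
    """
    clusters = {i: (label, 1) for i, label in enumerate(labels)}  # id -> (tree, size)
    # Pairwise distances keyed by (smaller_id, larger_id).
    d = {(i, j): distance[i][j] for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])  # closest pair
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average, i.e. the mean over all leaf pairs.
        new_d = {}
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        merges.append((tree_a, tree_b, dab / 2))  # ultrametric node height
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, merges


# HYPOTHETICAL distances between four sequences A-D.
labels = ["A", "B", "C", "D"]
distance = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
tree, merges = upgma(labels, distance)
heights = [h for _, _, h in merges]
print(tree)
print(heights)
```

In this toy matrix A and B are joined first, then C, then D. UPGMA assumes a constant evolutionary rate across lineages, which is worth keeping in mind when interpreting branch lengths from any such tree.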

Conclusion

Given the scarcity of genome information for annelid earthworms, transcriptome sequencing has emerged as a practical substitute for exploring them at the cellular, molecular and genetic levels. The cerebral ganglion and nerve cord of earthworms are crucial organs for regulating their movement, rhythmicity, regeneration, social interaction and physiological behavior. However, the nerve cord specific transcripts and their functional features have not been explored in detail. The present study emphasizes the role of the earthworm nerve cord by providing a complete annotation summary of the nerve cord transcripts and their functions. In our study we have (1) presented the complete functional information for 8819 nerve cord transcripts, (2) rationalized the involvement of the earthworm nerve cord in maintaining the sensory and behavioral physiology of the species from a transcriptomic perspective and (3) identified 22 neuropeptides and 108 stem cell marker homologs present in the nerve cord tissue of the earthworm Eisenia fetida. The complete annotation summary of the nerve cord specific transcripts can be further utilized as a resource to interpret the genetic and biochemical pathways associated with the neurobiological and physiological processes of the species. The following are the supplementary data related to this article.

Table S1

BLAST2GO functional annotation against nr database.

Table S2

Gene Ontology details of the annotated transcripts.

Table S3

Summary of KEGG pathway mapping.

Table S4

Complete list of over-represented functions and pathways with FDR corrected P-values.

Table S5

List of human stem cell marker homologs in the earthworm Eisenia fetida.

List of Abbreviations

Gamma-aminobutyric acid; Eisenia tetradecapeptide; Expressed Sequence Tag; G protein-coupled receptors; Next Generation Sequencing; Gene Ontology; Clusters of Orthologous Groups; Evolutionary genealogy of genes; Kyoto Encyclopedia of Genes and Genomes; WD repeat-containing protein 5; Histone acetyltransferase KAT5; Histone-binding protein RBBP4; Chromobox family; CCR4-NOT transcription complex subunit 1; Transducin-like enhancer protein 1; Transcriptional repressor protein YY1; Structural maintenance of chromosomes 1A; Cyclin-dependent kinase 2; Proliferating cell nuclear antigen

Conflict of interests

The authors declare no potential conflicts of interest.

