Dae-Won Kim, Won Gi Yoo, Myoung-Ro Lee, Hye-Won Yang, Yu-Jung Kim, Shin-Hyeong Cho, Won-Ja Lee, Jung-Won Ju.
Abstract
BACKGROUND: Although spargana, which are the plerocercoids of Spirometra erinacei, are of biological and clinical importance, expressed sequence tags (ESTs) from this parasite have not been explored. To understand molecular and biological features of this parasite, sparganum ESTs were examined by large-scale EST sequencing and multiple bioinformatics tools.
Year: 2014 PMID: 25128015 PMCID: PMC4262225 DOI: 10.1186/1756-3305-7-368
Source DB: PubMed Journal: Parasit Vectors ISSN: 1756-3305 Impact factor: 3.876
Figure 1. Main workflow for analysis. Outline of the analysis steps performed on the Spirometra erinacei EST data. External programs used for analysis are shown where appropriate. ESTs were pre-processed and subjected to clustering and assembly (A). Singlets and contigs were examined for homology (B), screened for secretory antigen candidates (C) and compared with other species at the whole-transcriptome scale (D).
Transcriptome features of spargana
| Feature | Number |
|---|---|
| Total sequence reads | 5,760 |
| Total analyzed reads (average size) | 5,634 (687 bp) |
| Total number of assembled sequences (average size) | 1,794 (715 bp) |
| Contigs | 934 |
| Singlets | 860 |
| Total annotated genes (BLASTX or InterProScan) | 1,351 |
| BLASTX | 1,335 |
| InterProScan | 96 |
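The counts in the table above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes that "Total annotated genes (BLASTX or InterProScan)" is the union of the two annotation sets, so the overlap follows from inclusion-exclusion. The variable names are hypothetical and not taken from the paper.

```python
# Values copied from the "Transcriptome features of spargana" table.
total_reads = 5760
analyzed_reads = 5634
contigs = 934
singlets = 860
blastx_hits = 1335
interproscan_hits = 96
annotated_union = 1351  # assumed to be the union of BLASTX and InterProScan hits

# Reads discarded during pre-processing.
discarded_reads = total_reads - analyzed_reads

# Assembled sequences are contigs plus singlets.
assembled = contigs + singlets

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B| (assumption: 1,351 is the union).
annotated_by_both = blastx_hits + interproscan_hits - annotated_union

# Assembled sequences with no annotation from either tool.
unannotated = assembled - annotated_union

print(discarded_reads, assembled, annotated_by_both, unannotated)
```

Under that assumption, 934 contigs and 860 singlets add up to the reported 1,794 assembled sequences.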
Figure 2. Distribution of ESTs within contigs after clustering the 5,634 sequences using CAP3.
Biological process and molecular function GO terms with the 15 highest scores
| Category | Level | GO ID | GO term | Representation (a): number | Representation (a): percentage (b) | Score (c) |
|---|---|---|---|---|---|---|
| Biological process | 5 | GO:0044260 | Cellular macromolecule metabolic process | 213 | 31.84% | 74.85 |
| Biological process | 6 | GO:0044267 | Cellular protein metabolic process | 164 | 24.51% | 73.52 |
| Biological process | 5 | GO:0010467 | Gene expression | 128 | 19.13% | 71.09 |
| Biological process | 7 | GO:0006412 | Translation | 82 | 12.26% | 70.64 |
| Biological process | 6 | GO:0034645 | Cellular macromolecule biosynthetic process | 123 | 18.39% | 62.53 |
| Biological process | 4 | GO:0043170 | Macromolecule metabolic process | 229 | 34.23% | 62.04 |
| Biological process | 5 | GO:0019538 | Protein metabolic process | 179 | 26.76% | 55.76 |
| Biological process | 4 | GO:0050794 | Regulation of cellular process | 107 | 15.99% | 45.18 |
| Biological process | 4 | GO:0044249 | Cellular biosynthetic process | 146 | 21.82% | 42.06 |
| Biological process | 5 | GO:0009059 | Macromolecule biosynthetic process | 123 | 18.39% | 37.73 |
| Biological process | 6 | GO:0016310 | Phosphorylation | 53 | 7.92% | 34.29 |
| Biological process | 4 | GO:0006810 | Transport | 76 | 11.36% | 29.17 |
| Biological process | 8 | GO:0006468 | Protein phosphorylation | 35 | 5.23% | 28.24 |
| Biological process | 5 | GO:0007165 | Signal transduction | 58 | 8.67% | 28.05 |
| Biological process | 4 | GO:0055114 | Oxidation-reduction process | 42 | 6.28% | 27.96 |
| Molecular function | 9 | GO:0005524 | ATP binding | 100 | 12.12% | 100 |
| Molecular function | 6 | GO:0032550 | Purine ribonucleoside binding | 135 | 16.36% | 84 |
| Molecular function | 5 | GO:0035639 | Purine ribonucleoside triphosphate binding | 135 | 16.36% | 82.2 |
| Molecular function | 4 | GO:1901265 | Nucleoside phosphate binding | 186 | 22.55% | 78.48 |
| Molecular function | 4 | GO:0043168 | Anion binding | 162 | 19.64% | 70.46 |
| Molecular function | 4 | GO:0003676 | Nucleic acid binding | 112 | 13.58% | 69.51 |
| Molecular function | 5 | GO:0000166 | Nucleotide binding | 186 | 22.55% | 66.35 |
| Molecular function | 8 | GO:0032559 | Adenyl ribonucleotide binding | 100 | 12.12% | 61.8 |
| Molecular function | 5 | GO:0046872 | Metal ion binding | 88 | 10.67% | 54.83 |
| Molecular function | 5 | GO:0032549 | Ribonucleoside binding | 136 | 16.48% | 52.84 |
| Molecular function | 5 | GO:0001883 | Purine nucleoside binding | 135 | 16.36% | 50.4 |
| Molecular function | 7 | GO:0032555 | Purine ribonucleotide binding | 135 | 16.36% | 50.4 |
| Molecular function | 5 | GO:0003723 | RNA binding | 56 | 6.79% | 38.19 |
| Molecular function | 7 | GO:0030554 | Adenyl nucleotide binding | 100 | 12.12% | 37.8 |
| Molecular function | 9 | GO:0005525 | GTP binding | 37 | 4.48% | 37 |
(a) Note that individual GO categories can have multiple mappings. The representation is the number of SpAEs that can be mapped to a given GO term.
(b) The representation percentage is based on the total number of GO mappings in each of the two major ontologies (biological process: 669, molecular function: 825).
(c) The score was calculated by BLAST2GO according to the number of different sequences annotated at a child GO term and the distance to the node of the child GO term.
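The percentage column can be reproduced from the per-ontology totals given in the footnotes. The following is a minimal sketch, assuming the percentage is simply the SpAE count divided by the ontology total and rounded to two decimals. The function and dictionary names are hypothetical.

```python
# Total GO mappings per ontology, as stated in the table footnote.
GO_TOTALS = {
    "biological process": 669,
    "molecular function": 825,
}

def representation_percentage(count: int, ontology: str) -> float:
    """Percentage of GO mappings in `ontology` represented by `count` SpAEs."""
    return round(100.0 * count / GO_TOTALS[ontology], 2)

# Spot checks against rows of the table.
print(representation_percentage(213, "biological process"))  # GO:0044260
print(representation_percentage(100, "molecular function"))  # GO:0005524
```

For example, 213 of 669 biological process mappings gives the 31.84% reported for GO:0044260.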
The most abundant transcripts in spargana
| Cluster ID | No. of reads | Accession ID | Description | Organism | E-value |
|---|---|---|---|---|---|
| EPA018LGAA12C000033 | 164 | XP_007424327.1 | PREDICTED: fibronectin isoform X1 | | 1.73E-91 |
| EPA018LGAA12C000039 | 90 | EUB60510.1 | Polyadenylate-binding protein | | 0 |
| EPA018LGAA12C000019 | 87 | AAD11479.1 | Cytoplasmic antigen containing repeat epitope, partial | | 0 |
| EPA018LGAA12C000001 | 80 | EUB65008.1 | Cyclin-I | | 2.45E-69 |
| EPA018LGAA12C000005 | 70 | AFX72984.1 | Elongation factor 1 alpha | | 0 |
| EPA018LGAA12C000052 | 61 | - | - | - | |
| EPA018LGAA12C000025 | 50 | AAL18701.1 | AF418991_1 cytoplasmic antigen 4 | | 1.88E-62 |
| EPA018LGAA12C000035 | 38 | GAA43229.2 | ATP-dependent RNA helicase UAP56/SUB2 | | 1.6E-132 |
| EPA018LGAA12C000055 | 38 | AFX73009.1 | pDJA1 chaperone | | 0 |
| EPA018LGAA12C000053 | 36 | BAA90773.1 | Glyceraldehyde-3-phosphate dehydrogenase | | 0 |
| EPA018LGAA12C000018 | 35 | AFM74218.1 | 40S ribosomal protein S24 | | 4.97E-70 |
| EPA018LGAA12C000047 | 35 | CDJ25645.1 | Transaldolase | | 5.15E-64 |
| EPA018LGAA12C000028 | 32 | CDJ16325.1 | Programmed cell death protein 4 | | 3.03E-88 |
| EPA018LGAA12C000056 | 27 | - | - | - | |
| EPA018LGAA12C000061 | 26 | CAX75788.1 | Tubulin beta-2C chain | | 0 |
| EPA018LGAA12C000062 | 24 | - | - | - | |
| EPA018LGAA12C000010 | 22 | ABR68549.1 | Cystatin-2 | | 1.49E-19 |
| EPA018LGAA12C000063 | 21 | CDJ08795.1 | Nervous system adducin | | 5.3E-113 |
| EPA018LGAA12C000064 | 21 | EUB59337.1 | Actin | | 0 |
| EPA018LGAA12C000065 | 21 | Q8MUA4.1 | 14332_ECHGR RecName: Full = 14-3-3 protein homolog 2 | | 5.4E-101 |
| EPA018LGAA12C000066 | 21 | CDJ25303.1 | Synaptic vesicle membrane protein VAT 1 | | 1.4E-169 |
| EPA018LGAA12C000070 | 20 | BAB62718.1 | Plerocercoid growth factor/cysteine protease | | 0 |
| EPA018LGAA12C000009 | 19 | ABN14906.1 | Heat shock protein 90 alpha | | 4.31E-71 |
| EPA018LGAA12C000040 | 19 | - | - | - | |
| EPA018LGAA12C000075 | 18 | XP_002020246.1 | GL13880 | | 1.1E-107 |
| EPA018LGAA12C000049 | 16 | - | - | - | |
| EPA018LGAA12C000080 | 16 | CDJ23790.1 | Heat shock 70 kDa protein 4 | | 0 |
| EPA018LGAA12C000081 | 16 | CDJ17948.1 | gtp binding protein 2 | | 3.37E-70 |
| EPA018LGAA12C000024 | 15 | CDJ17047.1 | 40s ribosomal protein s15 | | 6.88E-65 |
| EPA018LGAA12C000076 | 15 | CDJ20938.1 | Ubiquitin conjugating enzyme E2 G1 | | 8.5E-104 |
| EPA018LGAA12C000082 | 15 | CDJ15210.1 | Excitatory amino acid transporter 3 | | 2.23E-80 |
| EPA018LGAA12C000085 | 15 | CDJ17337.1 | Fructose 16 bisphosphate aldolase | | 6.4E-178 |
| EPA018LGAA12C000086 | 15 | CDJ13399.1 | Signal peptidase complex subunit 3 | | 7E-137 |
The 25 most frequent Pfam domains in spargana
| Protein domain family | Pfam ID | No. of SpAEs |
|---|---|---|
| Protein kinase domain | PF00069 | 22 |
| RNA recognition motif domain | PF00076 | 20 |
| BTB/Kelch-associated | PF07707 | 15 |
| EF-hand domain pair | PF13499 | 13 |
| BTB/POZ | PF00651 | 12 |
| Chaperonin Cpn60/TCP-1 | PF00118 | 11 |
| Phox/Bem1p | PF00564 | 10 |
| WD40 repeat | PF00400 | 10 |
| Small GTPase superfamily | PF00071 | 9 |
| Heat shock protein 70 family | PF00012 | 9 |
| Kelch repeat type 1 | PF01344 | 8 |
| Fibronectin, type III | PF00041 | 8 |
| Null | PF13414 | 7 |
| Calponin homology domain | PF00307 | 7 |
| 14-3-3 domain | PF00244 | 7 |
| Ubiquitin-conjugating enzyme, E2 | PF00179 | 7 |
| Thioredoxin domain | PF00085 | 7 |
| Zinc finger, C2H2 | PF13465 | 6 |
| Leucine rich repeat 4 | PF12799 | 6 |
| Collagen triple helix repeat | PF01391 | 6 |
| Dynein light chain, type 1/2 | PF01221 | 6 |
| AMP-dependent synthetase/ligase | PF00501 | 6 |
| Tetraspanin/Peripherin | PF00335 | 6 |
| Aminotransferase, class V/Cysteine desulfurase | PF00266 | 6 |
| K Homology domain, type 1 | PF00013 | 6 |
The 10 most abundant enzymes in spargana
| Enzyme code | Name | No. of reads | No. of SpAEs | Cluster IDs |
|---|---|---|---|---|
| EC:1.2.1.12 | Glyceraldehyde-3-phosphate dehydrogenase | 36 | 2 | EPA018LGAA12C000053, EPA018LGAA12S001658 |
| EC:2.2.1.2 | ATP-dependent RNA helicase DDX1 | 35 | 1 | EPA018LGAA12C000047 |
| EC:3.4 | Cysteine proteinase | 20 | 6 | EPA018LGAA12C000070, EPA018LGAA12C000238, EPA018LGAA12C000503, EPA018LGAA12C000561, EPA018LGAA12C000680, EPA018LGAA12S005500 |
| EC:3.6.1.3 | Heat shock protein 90 alpha | 19 | 10 | EPA018LGAA12C000009, EPA018LGAA12C000086, EPA018LGAA12C000157, EPA018LGAA12C000209, EPA018LGAA12C000367, EPA018LGAA12C000500, EPA018LGAA12C000591, EPA018LGAA12S002094, EPA018LGAA12S004373, EPA018LGAA12S005358 |
| EC:2.6.1.52 | Phosphoserine aminotransferase 1 | 18 | 1 | EPA018LGAA12C000075 |
| EC:4.1.2.13 | Fructose-bisphosphate aldolase | 15 | 1 | EPA018LGAA12C000085 |
| EC:2.1.1.45 | Thymidylate synthase | 12 | 2 | EPA018LGAA12C000100, EPA018LGAA12C000380 |
| EC:6.3.1.2 | Glutamine synthetase | 12 | 2 | EPA018LGAA12C000104, EPA018LGAA12C000121 |
| EC:2.3.1.29 | 2-amino-3-ketobutyrate coenzyme A ligase | 10 | 1 | EPA018LGAA12C000125 |
| EC:1.11.1.7 | Glutathione peroxidase | 9 | 1 | EPA018LGAA12C000127 |
Putative secretory proteins predicted by ORFpredictor, SignalP, TMHMM and YLoc
| Cluster ID | No. of reads | Accession ID | E-value | Description | |
|---|---|---|---|---|---|
| EPA018LGAA12C000067 | 12 | - | - | - | - |
| EPA018LGAA12C000103 | 12 | XP_005104335.1 | 2.98e-09 | PREDICTED: ADP-ribosyl cyclase-like | 25% |
| EPA018LGAA12C000011 | 10 | EUB64644.1 | 5.25e-08 | DNA-binding protein HEXBP | 48% |
| EPA018LGAA12C000266 | 5 | CCD82741.1 | 2.66e-27 | T-cell immunomodulatory protein | 27% |
| EPA018LGAA12C000319 | 5 | ETE62793.1 | 9.78e-51 | Collagen alpha-1(III) chain | 66% |
| EPA018LGAA12C000036 | 5 | AFX72984.1 | 0 | Elongation factor 1 alpha | - |
| EPA018LGAA12C000355 | 4 | AFI71096.1 | 1.26e-34 | Ag5 | 31% |
| EPA018LGAA12C000352 | 4 | - | - | - | - |
| EPA018LGAA12C000362 | 4 | - | - | - | - |
| EPA018LGAA12C000007 | 4 | - | - | - | - |
| EPA018LGAA12C000336 | 4 | - | - | - | - |
| EPA018LGAA12C000609 | 3 | - | - | - | - |
| EPA018LGAA12C000487 | 3 | CDJ10900.1 | 7.93e-50 | Phospholipase A | 30% |
| EPA018LGAA12C000593 | 3 | GAA50115.1 | 3.69e-32 | Ribonuclease Oy | 33% |
| EPA018LGAA12C000572 | 3 | CDJ18388.1 | 4.19e-41 | Hypothetical protein EgrG_001045000 | 34% |
| EPA018LGAA12C000068 | 3 | - | - | - | - |
| EPA018LGAA12C000451 | 3 | CDJ13292.1 | 3.32e-06 | Collagen alpha 2(I) chain | 53% |
| EPA018LGAA12C000891 | 2 | XP_003223989.1 | 6.16e-16 | PREDICTED: transforming growth factor-beta-induced protein ig-h3 | 78% |
| EPA018LGAA12C000855 | 2 | - | - | - | - |
| EPA018LGAA12C000638 | 2 | EUB63160.1 | 1.95e-07 | Murinoglobulin-2 | - |
| EPA018LGAA12C000838 | 2 | CDJ18319.1 | 4.40e-24 | Hypothetical protein EgrG_001037900 | - |
| EPA018LGAA12C000772 | 2 | - | - | - | - |
| EPA018LGAA12S001747 | 1 | - | - | - | - |
| EPA018LGAA12S004749 | 1 | CDI70591.1 | 8.15e-42 | Armet protein | 33% |
| EPA018LGAA12S003348 | 1 | CDJ11019.1 | 2.28e-10 | Collagen alpha(iv) chain | 56% |
| EPA018LGAA12S001839 | 1 | - | - | - | - |
| EPA018LGAA12S002027 | 1 | AFM74226.1 | 7.49e-16 | Cysteine-rich with egf-like domains protein | 34% |
| EPA018LGAA12S004089 | 1 | CDJ21221.1 | 6.76e-51 | Leucine rich repeat typical subtype | 33% |
| EPA018LGAA12S003220 | 1 | - | - | - | - |
| EPA018LGAA12S002557 | 1 | CDJ24800.1 | 1.67e-54 | Heat shock protein DnaJ N terminal | 56% |
| EPA018LGAA12S003645 | 1 | CDJ12970.1 | 1.81e-23 | Type II collagen B | 38% |
| EPA018LGAA12S000676 | 1 | - | - | - | - |
| EPA018LGAA12S003769 | 1 | - | - | - | - |
| EPA018LGAA12S004845 | 1 | - | - | - | - |
| EPA018LGAA12S000743 | 1 | - | - | - | - |
| EPA018LGAA12S000277 | 1 | - | - | - | - |
| EPA018LGAA12S001291 | 1 | CAJ00244.1 | 2.87e-11 | TPA: endonuclease-reverse transcriptase | - |
| EPA018LGAA12S000397 | 1 | AAM82156.1 | 1.03e-11 | AF523312_1 oncosphere-specific antigen | 42% |
| EPA018LGAA12S003589 | 1 | XP_007441014.1 | 5.03e-28 | PREDICTED: c-C motif chemokine 4-like | 46% |
Figure 3. Transcriptome-wide relative similarity between sparganum and other species. Spargana contigs and singlets were searched against whole transcriptomes using TBLASTX (score cut-off of ≥50). The Venn diagrams show the number of spargana sequences associated with each dataset. Global similarity comparison of Cestoda (A) and Trematoda (B) with a free-living flatworm. Square tiles indicate genes, with the squares colored by their highest TBLASTX score to each of the databases: red ≥300; yellow ≥200; green ≥150; blue ≥100; and purple <100.
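The color scheme in the Figure 3 caption amounts to a simple threshold binning of the highest TBLASTX score. The sketch below illustrates that mapping only and is not the authors' plotting code. The function name is hypothetical, and returning `None` for scores under the cut-off of 50 is an assumption about how non-hits would be handled.

```python
from typing import Optional

SCORE_CUTOFF = 50  # cut-off stated in the caption

def score_color(score: float) -> Optional[str]:
    """Map a highest TBLASTX score to the tile color named in the caption."""
    if score < SCORE_CUTOFF:
        return None  # assumption: below the cut-off means no hit, so no tile
    if score >= 300:
        return "red"
    if score >= 200:
        return "yellow"
    if score >= 150:
        return "green"
    if score >= 100:
        return "blue"
    return "purple"  # scores from the cut-off up to, but not including, 100

for s in (320, 250, 150, 100, 75, 40):
    print(s, score_color(s))
```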