Neil D Young, Bronwyn E Campbell, Ross S Hall, Aaron R Jex, Cinzia Cantacessi, Thewarach Laha, Woon-Mok Sohn, Banchob Sripa, Alex Loukas, Paul J Brindley, Robin B Gasser.
Abstract
The two parasitic trematodes Clonorchis sinensis and Opisthorchis viverrini have a major impact on the health of tens of millions of people throughout Asia. Their greatest impact is through the malignant cancer (cholangiocarcinoma) that these parasites induce in chronically infected people. For this reason, both C. sinensis and O. viverrini have been classified by the World Health Organization (WHO) as Group 1 carcinogens. Despite their impact, little is known about these parasites and their interplay with the host at the molecular level. Recent advances in genomics and bioinformatics provide unique opportunities to gain improved insights into the biology of parasites as well as their relationships with their hosts at the molecular level. The present study elucidates the transcriptomes of C. sinensis and O. viverrini using a platform based on next-generation (high-throughput) sequencing and advanced in silico analyses. From more than 500,000 sequences per species, >50,000 sequences were assembled for each species and categorized as biologically relevant based on homology searches, gene ontology and/or pathway mapping. The results of the present study could assist in defining molecules that are essential for the development, reproduction and survival of liver flukes and/or that are linked to the development of cholangiocarcinoma. This study also lays a foundation for future genomic and proteomic research on C. sinensis and O. viverrini and the cancers that they are known to induce, as well as for novel intervention strategies.
Year: 2010 PMID: 20582164 PMCID: PMC2889816 DOI: 10.1371/journal.pntd.0000719
Source DB: PubMed Journal: PLoS Negl Trop Dis ISSN: 1935-2727
Summary of the clustering performance and bioinformatic analyses performed on the nucleotide sequences encoded in the transcriptome of the adult stage of Clonorchis sinensis and Opisthorchis viverrini.
| cDNA libraries | Clonorchis sinensis | Opisthorchis viverrini |
| Initial clustering | | |
| Sequences before clustering | 574,448 (351±141; 1–727) | 642,918 (373±133; 1–724) |
| Proportion of sequences incorporated into clusters | 83.96% (482,325) | 84.19% (541,264) |
| Contigs | 42,179 (711±483; 42–11,947) | 60,833 (680±438; 41–9,753) |
| Singletons | 92,123 (279±161; 40–727) | 101,654 (307±162; 40–724) |
| Total unique sequences after assembly | 134,302 (415±363; 40–11,947) | 162,487 (447±348; 40–9,753) |
| Coverage (average reads per assembled contig) | 10.8±20.0 | 8.6±14.5 |
| Containing an open reading frame (ORF) | 88,714 (66.1%) | 107,217 (66.0%) |
Summarized as number of sequences (average sequence length ± standard deviation; minimum and maximum sequence lengths).
Summarized as number of sequences (proportion of total sequences used for the analysis).
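The derived entries in the clustering summary can be recomputed from the raw counts. The following is a minimal sketch, assuming that the first data column refers to C. sinensis and the second to O. viverrini; the dictionary keys and the function name `derived_values` are illustrative and do not come from the study.

```python
# Counts transcribed from the clustering summary above.
# Assumption: the first data column is C. sinensis, the second O. viverrini.
clustering = {
    "C. sinensis": dict(reads=574_448, clustered=482_325, contigs=42_179,
                        singletons=92_123, unique=134_302, with_orf=88_714),
    "O. viverrini": dict(reads=642_918, clustered=541_264, contigs=60_833,
                         singletons=101_654, unique=162_487, with_orf=107_217),
}


def derived_values(row):
    """Recompute the derived table entries from the raw counts."""
    return {
        # Reads that were not incorporated into clusters remain singletons.
        "singletons": row["reads"] - row["clustered"],
        # Unique sequences after assembly are contigs plus singletons.
        "unique": row["contigs"] + row["singletons"],
        "pct_clustered": round(100 * row["clustered"] / row["reads"], 2),
        "pct_with_orf": round(100 * row["with_orf"] / row["unique"], 1),
    }


for species, row in clustering.items():
    print(species, derived_values(row))
```

Under these assumptions, the recomputed singleton counts, unique-sequence totals and percentages agree with the tabulated values for both columns.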
Summary of the bioinformatic analyses performed on the amino acid sequences encoded by the transcriptome of the adult stage of Clonorchis sinensis and Opisthorchis viverrini.
| cDNA libraries | Clonorchis sinensis | Opisthorchis viverrini |
| Characterization of transcripts | | |
| Nucleotide sequences containing a predicted ORF | 50,769 (92.9%) | 61,417 (92.3%) |
| Full-length transcripts (containing start and stop codons) | 3,113 | 4,144 |
| Partial transcripts with start codon only | 8,466 | 11,407 |
| Partial transcripts with stop codon only | 10,558 | 13,296 |
| Sequences with signal peptides | 3,305 (6.5%) | 4,246 (6.9%) |
| Containing transmembrane-domains | 3,453 (6.8%) | 4,382 (7.1%) |
| Putative excretory/secretory proteins | 1,143 (2.3%) | 1,470 (2.4%) |
Summarized as number of sequences (proportion of total sequences used in the analysis).
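The distinction between full-length and partial transcripts in the table above depends on whether a predicted ORF carries a start codon, a stop codon, or both. The sketch below illustrates that classification under stated assumptions: the ORF is supplied in frame as a DNA string, the start codon is ATG, and the stop codons are those of the standard genetic code. The function name `classify_transcript` is hypothetical and this is not the pipeline used in the study.

```python
# Stop codons of the standard genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_transcript(orf: str) -> str:
    """Classify an in-frame ORF by the presence of start and stop codons."""
    seq = orf.upper()
    has_start = seq.startswith("ATG")
    has_stop = len(seq) >= 3 and seq[-3:] in STOP_CODONS
    if has_start and has_stop:
        return "full-length"
    if has_start:
        return "partial (start codon only)"
    if has_stop:
        return "partial (stop codon only)"
    return "partial (no start or stop codon)"


print(classify_transcript("ATGAAATAA"))
print(classify_transcript("ATGAAACCC"))
print(classify_transcript("CCCAAATGA"))
```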
Comparative genomic analysis between Clonorchis sinensis, Opisthorchis viverrini, other parasitic trematodes and selected eukaryotic (model) organisms.
| | Clonorchis sinensis with homology (%) | | | Opisthorchis viverrini with homology (%) | | |
| Predicted proteins similar | <1E−05 | <1E−15 | <1E−30 | <1E−05 | <1E−15 | <1E−30 |
| | | | | 29,995 (48.84) | 22,216 (36.17) | 16,324 (26.58) |
| | 27,103 (53.38) | 21,036 (41.43) | 15,796 (31.11) | | | |
| NCBI non-redundant database | 16,782 (33.06) | 12,164 (23.96) | 7,974 (15.71) | 19,126 (31.14) | 13,664 (22.25) | 9,093 (14.81) |
| | 15,725 (30.97) | 10,580 (20.84) | 6,700 (13.20) | 17,495 (28.49) | 11,572 (18.84) | 7,186 (11.70) |
| | 15,229 (30.00) | 11,033 (21.73) | 7,465 (14.70) | 16,736 (27.25) | 11,897 (19.37) | 8,095 (13.18) |
| | 14,526 (28.61) | 10,159 (20.01) | 6,429 (12.66) | 15,982 (26.02) | 11,116 (18.10) | 7,083 (11.53) |
| | 10,177 (20.05) | 6,628 (13.06) | 3,890 (7.66) | 11,238 (18.30) | 7,246 (11.80) | 4,203 (6.84) |
| | 10,164 (20.02) | 6,591 (12.98) | 3,890 (7.66) | 11,206 (18.25) | 7,259 (11.82) | 4,213 (6.86) |
| | 10,000 (19.70) | 6,386 (12.58) | 3,744 (7.37) | 11,100 (18.07) | 7,043 (11.47) | 4,042 (6.58) |
| | 9,737 (19.18) | 6,187 (12.19) | 3,613 (7.12) | 10,718 (17.45) | 6,795 (11.06) | 3,889 (6.33) |
| | 9,642 (18.99) | 6,105 (12.03) | 3,505 (6.90) | 10,716 (17.45) | 6,683 (10.88) | 3,823 (6.22) |
| | 9,032 (17.79) | 5,676 (11.18) | 3,292 (6.48) | 10,092 (16.43) | 6,251 (10.18) | 3,583 (5.83) |
| | 8,029 (15.81) | 4,847 (9.55) | 2,771 (5.46) | 8,951 (14.57) | 5,367 (8.74) | 2,974 (4.84) |
| | 4,509 (8.88) | 2,371 (4.67) | 1,266 (2.49) | 5,194 (8.46) | 2,655 (4.32) | 1,397 (2.27) |
All amino acid sequences conceptually translated from ORF-enriched sequence data were searched against protein databases using BLASTx employing permissive (E-value of <1E−05), moderate (E-value of <1E−15) and stringent (E-value of <1E−30) search strategies.
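The three search strategies are nested: a hit that passes the stringent cut-off also passes the moderate and permissive ones, which is why the counts fall from left to right within each group of columns. The following is a minimal sketch of that tiering, using only the three E-value cut-offs stated above; the names `THRESHOLDS`, `strictest_tier` and `count_by_tier` and the example E-values are illustrative and not taken from the study.

```python
# E-value cut-offs as stated in the table footnote, strictest first.
THRESHOLDS = [("stringent", 1e-30), ("moderate", 1e-15), ("permissive", 1e-5)]


def strictest_tier(evalue):
    """Return the strictest search strategy that a hit satisfies, or None."""
    for name, cutoff in THRESHOLDS:
        if evalue < cutoff:
            return name
    return None


def count_by_tier(evalues):
    """Cumulative counts: a stringent hit is also moderate and permissive."""
    return {name: sum(e < cutoff for e in evalues) for name, cutoff in THRESHOLDS}


# Hypothetical best-hit E-values for four query sequences.
example = [1e-40, 1e-20, 1e-7, 1e-3]
print([strictest_tier(e) for e in example])
print(count_by_tier(example))
```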
Figure 1. Venn diagram illustrating the overlap in sequence homology among parasitic trematodes.
Predicted proteins with significant sequence similarity (permissive BLASTx search with E-value <1E−05) among parasitic trematodes, Clonorchis sinensis and Opisthorchis viverrini (family Opisthorchiidae), Schistosoma mansoni (Schistosomatidae) and Fasciola hepatica (Fasciolidae).
Comparative genomic analysis between or among Clonorchis sinensis, Opisthorchis viverrini, Fasciola hepatica, Schistosoma mansoni, S. japonicum (blood flukes) and selected mammals.
| | Clonorchis sinensis (n = 50,769) with homology (%) | | | Opisthorchis viverrini (n = 61,417) with homology (%) | | |
| Proteins predicted to be similar | <1E−05 | <1E−15 | <1E−30 | <1E−05 | <1E−15 | <1E−30 |
| | 13,254 (26.10) | 9,325 (18.37) | 5,940 (11.70) | 14,438 (23.51) | 10,067 (16.39) | 6,459 (10.52) |
| | 11,134 (21.93) | 7,403 (14.58) | 4,508 (8.88) | 12,167 (19.81) | 7,994 (13.02) | 4,762 (7.75) |
| | 10,875 (21.42) | 7,164 (14.11) | 4,320 (8.51) | | | |
| | | | | 11,780 (19.18) | 7,660 (12.47) | 4,529 (7.37) |
| | 9,954 (19.61) | 6,475 (12.75) | 3,829 (7.54) | 10,983 (17.88) | 7,116 (11.59) | 4,143 (6.75) |
| Eukaryotic model organisms | 3,434 (6.76) | 1,732 (3.41) | 906 (1.78) | 3,753 (6.11) | 1,856 (3.02) | 947 (1.54) |
ORF-enriched sequence data were searched against protein databases by BLASTx using permissive (E-value of <1E−05), moderate (E-value of <1E−15) and stringent (E-value of <1E−30) search strategies.
Proteins that were homologous to model organisms assessed in Table 3.
Summary of the numbers of unique genes predicted to be expressed by the adult stage of Clonorchis sinensis and Opisthorchis viverrini based on amino acid sequence similarity to model eukaryotic organisms.
| | Sequences with homology to unique genes | Cluster size | Estimated number of genes |
| | 7,154 | 2.03±3.64 (1–208) | 25,007 |
| | 6,845 | 2.22±2.95 (1–122) | 22,824 |
| | 6,110 | 1.63±1.24 (1–32) | 31,054 |
| | 5,920 | 1.72±1.34 (1–26) | 29,585 |
| | 5,872 | 1.73±1.34 (1–30) | 29,307 |
| | 5,468 | 1.76±1.45 (1–33) | 28,800 |
| | 5,289 | 1.84±1.47 (1–24) | 27,580 |
| | 4,525 | 1.99±1.79 (1–26) | 25,452 |
| | 4,033 | 1.99±2.09 (1–62) | 25,511 |
| | 2,060 | 2.19±2.89 (1–65) | 23,205 |
All amino acid sequences conceptually translated from ORF-enriched sequence data were searched against protein databases using BLASTx employing permissive (E-value of <1E−05) search strategies.
Cluster size denotes the average number of sequences (± standard deviation) clustered with a unique gene. The numbers (range) of sequences representing each cluster are given in parentheses.
The estimated number of unique genes is based on the multiplication of the number of ORF-enriched sequences by the predicted proportion of unique genes.
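One reading of this footnote is that the proportion of unique genes is the reciprocal of the mean cluster size, so that the estimate equals the number of ORF-enriched sequences divided by the mean cluster size. The sketch below follows that reading; it is an assumption, as is the use of 50,769 ORF-enriched sequences (the count reported elsewhere in these tables) as the multiplier, and the function name `estimate_gene_count` is hypothetical.

```python
def estimate_gene_count(n_orf_sequences: int, mean_cluster_size: float) -> int:
    """Estimate unique genes as ORF-enriched sequences x proportion unique.

    Assumption: the proportion of unique genes is 1 / mean cluster size.
    """
    proportion_unique = 1.0 / mean_cluster_size
    return round(n_orf_sequences * proportion_unique)


# First table row: mean cluster size 2.03, tabulated estimate 25,007.
print(estimate_gene_count(50_769, 2.03))
```

With the rounded cluster size of 2.03, this gives a value within a few genes of the tabulated 25,007; the small difference is consistent with the table reporting cluster sizes to two decimal places.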
Functions predicted for proteins encoded in the transcriptome of the adult stage of Clonorchis sinensis and Opisthorchis viverrini based on gene ontology (GO).
| Parental GO terms | Clonorchis sinensis (CS) | Opisthorchis viverrini (OV) | Top GO term for CS and OV |
| Biological process GO:0008150 | |||
| Anatomical structure formation GO:0010926 | 134 (1.26) | 124 (1.02) | Protein polymerization GO:0051258 (CS:56; OV:37) |
| Biological adhesion GO:0022610 | 76 (0.72) | 100 (0.83) | Homophilic cell adhesion GO:0007156 (CS:44; OV:65) |
| Biological regulation GO:0065007 | 829 (7.82) | 952 (7.86) | Regulation of transcription, DNA-dependent GO:0006355 (CS:138; OV:163) |
| Cellular component biogenesis GO:0044085 | 164 (1.55) | 155 (1.28) | Protein polymerization GO:0051258 (CS:56; OV:37) |
| Cellular component organization GO:0016043 | 243 (2.29) | 247 (2.04) | Protein polymerization GO:0051258 (CS:56; OV:37) |
| Cellular process GO:0009987 | 3786 (35.70) | 4254 (35.13) | Protein amino acid phosphorylation GO:0006468 (CS:321; OV:345) |
| Death GO:0016265 | 10 (0.09) | 17 (0.14) | Regulation of apoptosis GO:0042981 (CS:6; OV:11) |
| Developmental process GO:0032502 | 27 (0.25) | 40 (0.33) | Multicellular organismal development GO:0007275 (CS:18; OV:21) |
| Growth GO:0040007 | 1 (0.01) | 1 (0.01) | Regulation of cell growth GO:0001558 (CS:1; OV:1) |
| Immune system process GO:0002376 | 4 (0.04) | 3 (0.02) | Immune response GO:0006955 (CS:3; OV:2) |
| Localization GO:0051179 | 852 (8.03) | 994 (8.21) | Transport GO:0006810 (CS:204; OV:236) |
| Locomotion GO:0040011 | 2 (0.02) | 3 (0.02) | Ciliary or flagellar motility GO:0001539 (CS:1; OV:2) |
| Metabolic process GO:0008152 | 3516 (33.15) | 4113 (33.96) | Protein amino acid phosphorylation GO:0006468 (CS:321; OV:345) |
| Multicellular organismal process GO:0032501 | 28 (0.26) | 37 (0.31) | Multicellular organismal development GO:0007275 (CS:18; OV:21) |
| Multi-organism process GO:0051704 | 1 (0.01) | 1 (0.01) | Pathogenesis GO:0009405 (CS:1; OV:1) |
| Regulation of biological process GO:0050789 | 805 (7.59) | 920 (7.60) | Regulation of transcription, DNA-dependent GO:0006355 (CS:138; OV:163) |
| Reproduction GO:0000003 | 3 (0.03) | 8 (0.07) | Spermatogenesis GO:0007283 (CS:2; OV:1) |
| Response to stimulus GO:0050896 | 123 (1.16) | 140 (1.16) | DNA repair GO:0006281 (CS:49; OV:51) |
| Viral reproduction GO:0016032 | 1 (0.01) | 1 (0.01) | Viral genome replication GO:0019079 (CS:1), viral transcription GO:0019083 (OV:1) |
| Cellular component GO:0005575 | |||
| Cell GO:0005623 | 2953 (60.19) | 3393 (61.55) | Intracellular GO:0005622 (CS:640; OV:707), |
| Envelope GO:0031975 | 59 (1.20) | 66 (1.20) | Nuclear pore GO:0005643 (CS:18; OV:18) |
| Extracellular region GO:0005576 | 85 (1.73) | 110 (2.00) | Proteinaceous extracellular matrix GO:0005578 (CS:12; OV:13) |
| Macromolecular complex GO:0032991 | 671 (13.68) | 705 (12.79) | Ribosome GO:0005840 (CS:128; OV:141) |
| Membrane-enclosed lumen GO:0031974 | 58 (1.18) | 59 (1.07) | Mediator complex GO:0000119 (CS:14; OV:14) |
| Organelle GO:0043226 | 1057 (21.55) | 1160 (21.04) | Nucleus GO:0005634 (CS:368; OV:413) |
| Synapse GO:0045202 | 22 (0.45) | 19 (0.34) | Postsynaptic membrane GO:0045211 (CS:20; OV:17) |
| Virion GO:0019012 | 1 (0.02) | 1 (0.02) | Viral capsid GO:0019028 (CS:1), viral nucleocapsid GO:0019013 (OV:1) |
| Molecular function GO:0003674 | |||
| Antioxidant activity GO:0016209 | 10 (0.09) | 16 (0.13) | Glutathione peroxidase activity GO:0004602 (CS:4; OV:6) |
| Auxiliary transport protein activity GO:0015457 | | 1 (0.01) | Sodium channel inhibitor activity GO:0019871 (OV:1) |
| Binding GO:0005488 | 5160 (48.56) | 5757 (47.81) | ATP binding GO:0005524 (CS:919; OV:1012) |
| Catalytic activity GO:0003824 | 4159 (39.14) | 4733 (39.30) | Protein kinase activity GO:0004672 (CS:289; OV:316) |
| Electron carrier activity GO:0009055 | 68 (0.64) | 96 (0.80) | Electron carrier activity GO:0009055 (CS:68; OV:96) |
| Enzyme regulator activity GO:0030234 | 180 (1.69) | 221 (1.84) | Serine-type endopeptidase inhibitor activity GO:0004867 (CS:33; OV:67) |
| Metallochaperone activity GO:0016530 | 1 (0.01) | | Copper chaperone activity GO:0016531 (CS:1) |
| Molecular transducer activity GO:0060089 | 111 (1.04) | 124 (1.03) | Signal transducer activity GO:0004871 (CS:38; OV:49) |
| Nutrient reservoir activity GO:0045735 | 2 (0.02) | 2 (0.02) | Nutrient reservoir activity GO:0045735 (CS:2; OV:2) |
| Proteasome regulator activity GO:0010860 | 2 (0.02) | 1 (0.01) | Proteasome activator activity GO:0008538 (CS:2; OV:1) |
| Structural molecule activity GO:0005198 | 218 (2.05) | 224 (1.86) | Structural constituent of ribosome GO:0003735 (CS:132; OV:145) |
| Transcription regulator activity GO:0030528 | 197 (1.85) | 229 (1.90) | Transcription factor activity GO:0003700 (CS:99; OV:121) |
| Translation regulator activity GO:0045182 | 29 (0.27) | 31 (0.26) | Translation initiation factor activity GO:0003743 (CS:17; OV:22) |
| Transporter activity GO:0005215 | 489 (4.60) | 607 (5.04) | ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism GO:0015662 (CS:48; OV:63) |
Values in parentheses are the percentage of the total number of predicted proteins within each GO category (i.e. biological process, molecular function or cellular component).
The most frequently reported GO category and number of sequences within each category are summarized for each parental GO category.
The parental (i.e. level 2) and specific GO categories were assigned according to InterPro domains with homology to functionally annotated genes.
Summary of key biological pathways predicted from amino acid sequences encoded in the transcriptome of the adult stage of Clonorchis sinensis and Opisthorchis viverrini based on homology to annotated proteins in the Kyoto Encyclopedia of Genes and Genomes (KEGG) biological pathways database.
| Parent KEGG pathway | Clonorchis sinensis (CS) | Opisthorchis viverrini (OV) | Top KEGG pathway term |
| Cellular processes | |||
| Behavior | 1 (0.01) | 3 (0.04) | Circadian rhythm ko04710 (CS:1; OV:3) |
| Cell communication | 471 (6.54) | 507 (6.29) | Focal adhesion ko04510 (CS:168; OV:174) |
| Cell growth and death | 249 (3.46) | 287 (3.56) | Cell cycle ko04110 (CS:109; OV:123) |
| Cell motility | 137 (1.90) | 135 (1.67) | Regulation of actin cytoskeleton ko04810 (CS:137; OV:135) |
| Development | 120 (1.67) | 116 (1.44) | Axon guidance ko04360 (CS:96; OV:94) |
| Endocrine system | 508 (7.06) | 603 (7.48) | Insulin signaling pathway ko04910 (CS:151; OV:167) |
| Immune system | 284 (3.95) | 402 (4.99) | Leukocyte transendothelial migration ko04670 (CS:64; OV:70) |
| Nervous system | 150 (2.08) | 180 (2.23) | Long-term potentiation ko04720 (CS:88; OV:116) |
| Sensory system | 46 (0.64) | 76 (0.94) | Olfactory transduction ko04740 (CS:31; OV:55) |
| Environmental information processing | |||
| Membrane transport | 116 (1.61) | 132 (1.64) | Other ion-coupled transporters ko00000 (CS:45; OV:48) |
| Signaling molecules and interaction | 103 (1.43) | 115 (1.43) | Neuroactive ligand-receptor interaction ko04080 (CS:37; OV:46) |
| Signal transduction | 794 (11.03) | 957 (11.87) | MAPK signaling pathway ko04010 (CS:162; OV:194) |
| Genetic information processing | |||
| Folding, sorting and degradation | 281 (3.90) | 308 (3.82) | Ubiquitin mediated proteolysis ko04120 (CS:132; OV:153) |
| Replication and repair | 106 (1.47) | 101 (1.25) | Other replication, recombination and repair proteins ko00000 (CS:50; OV:51) |
| Transcription | 87 (1.21) | 89 (1.10) | RNA polymerase ko03020 (CS:41; OV:34) |
| Translation | 192 (2.67) | 243 (3.01) | Ribosome ko03010 (CS:91; OV:104) |
| Human diseases | |||
| Cancers | 547 (7.60) | 580 (7.20) | Colorectal cancer ko05210 (CS:71; OV:73) |
| Infectious diseases | 37 (0.51) | 46 (0.57) | Epithelial cell signaling in |
| Metabolic disorders | 33 (0.46) | 38 (0.47) | Type II diabetes mellitus ko04930 (CS:22; OV:25) |
| Neurodegenerative disorders | 102 (1.42) | 123 (1.53) | Huntington's disease ko05040 (CS:40; OV:46) |
| Metabolism | |||
| Amino acid metabolism | 565 (7.85) | 659 (8.18) | Lysine degradation ko00310 (CS:75; OV:79) |
| Biosynthesis of polyketides and nonribosomal peptides | 1 (0.01) | 2 (0.02) | Biosynthesis of ansamycins ko01051 (CS:1; OV:2) |
| Biosynthesis of secondary metabolites | 114 (1.58) | 124 (1.54) | Limonene and pinene degradation ko00903 (CS:31; OV:37) |
| Carbohydrate metabolism | 623 (8.66) | 598 (7.42) | Starch and sucrose metabolism ko00500 (CS:91; OV:97) |
| Energy metabolism | 198 (2.75) | 232 (2.88) | Oxidative phosphorylation ko00190 (CS:95; OV:104) |
| Glycan biosynthesis and metabolism | 211 (2.93) | 213 (2.64) | N-Glycan biosynthesis ko00510 (CS:55; OV:21) |
| Lipid metabolism | 389 (5.40) | 439 (5.45) | Glycerophospholipid metabolism ko00564 (CS:65; OV:80) |
| Metabolism of cofactors and vitamins | 225 (3.13) | 229 (2.84) | Folate biosynthesis ko00790 (CS:67; OV:67) |
| Metabolism of other amino acids | 114 (1.58) | 116 (1.44) | Selenoamino acid metabolism ko00450 (CS:33; OV:30) |
| Nucleotide metabolism | 189 (2.63) | 184 (2.28) | Purine metabolism ko00230 (CS:126; OV:121) |
| Xenobiotics biodegradation and metabolism | 205 (2.85) | 223 (2.77) | Benzoate degradation via CoA ligation ko00632 (CS:33; OV:41) |
Values in parentheses are the percentage of the total number of predicted proteins within each KEGG category.
The most frequently reported KEGG biological pathway and number of sequences within each pathway.
Figure 2. Characterization of the putative excretory/secretory proteins of the adult stage of Clonorchis sinensis and Opisthorchis viverrini.
Protein families (A) and biological pathways (B) were assigned to proteins based on their homology to annotated proteins in the Kyoto Encyclopedia of Genes and Genomes (KEGG) biological pathways. Within gene ontology (GO) categories, the parental (i.e. level 2) biological processes (C) were assigned to proteins according to InterPro domains with homology to functionally annotated genes. Individual KEGG and GO categories can have multiple mappings.