Gagan Garg, Shoba Ranganathan.
Abstract
BACKGROUND: Excretory/secretory proteins (ESPs) play a major role in parasitic infection, as they are present at the host-parasite interface and regulate the host immune system. In the case of parasitic helminths, transcriptomics has been used extensively to understand the molecular basis of parasitism and to develop novel therapeutic strategies against parasitic infections. However, none of the transcriptomic studies has extensively covered ES protein prediction for identifying novel therapeutic targets, especially as parasites adopt non-classical secretion pathways.
Year: 2011 PMID: 22369360 PMCID: PMC3333173 DOI: 10.1186/1471-2164-12-S3-S14
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Bioinformatics workflow for secretome analysis. The bioinformatics workflow, comprising Phase I (pre-processing and assembly), Phase II (prediction of excretory/secretory proteins) and Phase III (protein-level annotation), was augmented by homologue identification from nematodes as well as parasitic nematodes, using specialized databases.
Comparison of results from different NGS assemblers
| Assembler | No. of second order contigs | No. of contigs | Largest contig | Average length | N50* | N90* | Number of bases |
|---|---|---|---|---|---|---|---|
| MIRA [ | 3056 | 25845 | 3620 | 402.36 | 406 | 253 | 11628536 |
| Newbler [ | | 25127 | 2607 | 407.11 | 409 | 252 | 10229510 |
*N50 is the length of the shortest contig such that the sum of contigs of equal length or longer is at least 50% of the total assembly size. N90 is defined in the same way, with a threshold of 90% of the total assembly size.
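The N50 and N90 definitions in the footnote can be illustrated with a minimal Python sketch. The function name `nx` and the contig lengths are made up for illustration; this is not the code used by MIRA or Newbler, only one straightforward reading of the definition.

```python
def nx(lengths, percent):
    """Return the Nx statistic for a list of contig lengths.

    Contigs are visited from longest to shortest; the result is the length
    of the contig at which the running total first reaches `percent` of the
    total assembly size. `percent` is an integer such as 50 or 90.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length


# Hypothetical contig lengths (not taken from the study).
example_lengths = [100, 200, 300, 400, 500]
n50 = nx(example_lengths, 50)
n90 = nx(example_lengths, 90)
print(n50, n90)
```

For the made-up lengths, the total is 1500 bases: the two longest contigs (500 + 400) already cover half of it, so N50 is 400, while four contigs are needed to reach 90%, so N90 is 200.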
Top 15 most represented protein domains found in ES proteins using InterProScan
| InterPro description | InterPro code | Number of ES proteins (%) |
|---|---|---|
| Protein Kinase like domain | IPR011009 | 126 (4.90) |
| Protein kinase, catalytic domain | IPR000719 | 114 (4.43) |
| Serine/threonine-protein kinase like domain | IPR017442 | 99 (3.85) |
| Serine/threonine-protein kinase domain | IPR002290 | 64 (2.49) |
| Serine/threonine-protein kinase active site | IPR008271 | 52 (2.02) |
| WD40 repeat like domain | IPR011046 | 40 (1.55) |
| WD40 repeat subgroup | IPR019781 | 39 (1.52) |
| WD40/YVTN repeat like domain | IPR015943 | 39 (1.52) |
| WD40 repeat | IPR001680 | 39 (1.52) |
| WD40 repeat domain | IPR017986 | 38 (1.47) |
| Tyrosine-protein kinase catalytic domain | IPR020635 | 37 (1.44) |
| WD40 repeat 2 | IPR019782 | 37 (1.44) |
| Helicase C | IPR001650 | 35 (1.36) |
| NAD(P)-binding domain | IPR016040 | 29 (1.13) |
| Immunoglobulin-like fold | IPR013783 | 28 (1.09) |
Top 15 most represented KEGG pathways found in ES proteins predicted by KAAS
| Pathway name | Number of ES proteins represented (%) |
|---|---|
| Metabolic pathways | 109 (4.24) |
| Protein processing in endoplasmic reticulum | 57 (2.22) |
| Ubiquitin mediated proteolysis | 44 (1.71) |
| Wnt signalling pathway | 29 (1.13) |
| Glycolysis / Gluconeogenesis | 28 (1.08) |
| Spliceosome | 28 (1.08) |
| Glutathione metabolism | 26 (1.01) |
| Circadian rhythm - mammal | 22 (0.85) |
| TGF-beta signalling pathway | 22 (0.85) |
| RNA transport | 20 (0.77) |
| Endocytosis | 20 (0.77) |
| Purine metabolism | 19 (0.74) |
| Phagosome | 19 (0.74) |
| Proteasome | 18 (0.70) |
| Drug metabolism | 17 (0.66) |
Top 15 most represented KEGG BRITE objects found in ES proteins predicted by KAAS
| BRITE object | Number of ES proteins represented (%) |
|---|---|
| Enzymes | 282 (10.96) |
| Spliceosome | 49 (1.90) |
| Chaperones and folding catalysts | 44 (1.71) |
| Peptidases | 44 (1.71) |
| Protein kinases | 43 (1.67) |
| Ubiquitin system | 37 (1.44) |
| Chromosome | 34 (1.32) |
| Cytoskeleton proteins | 27 (1.05) |
| DNA repair and recombination proteins | 21 (0.82) |
| GTP-binding proteins | 19 (0.74) |
| Proteasome | 18 (0.70) |
| Transcription factors | 17 (0.66) |
| Ribosome biogenesis | 16 (0.62) |
| Translation factors | 11 (0.43) |
| DNA replication proteins | 9 (0.35) |
Figure 2. Comparison of S. ratti ES protein matches across three databases. The numbers at each vertex indicate the number of proteins matching only that specific database. The numbers on the edges indicate the number of proteins matching the two databases linked by that edge. The number within the triangle indicates the number of S. ratti ES proteins with matches to all three databases.
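The vertex, edge and centre counts described in this caption correspond to a partition of three sets of matching proteins. The sketch below shows one way such counts could be derived; the function name, the database labels `a`, `b`, `c` and the protein identifiers are all hypothetical, and it assumes that each edge count excludes proteins matching all three databases, which the caption does not state explicitly.

```python
def partition_matches(a, b, c):
    """Split three sets of protein IDs into the seven regions of the figure.

    a, b, c: sets of protein IDs with a match in each (hypothetical) database.
    Returns a dict mapping region name to the set of IDs in that region.
    """
    return {
        # Vertices: proteins matching exactly one database.
        "only_a": a - b - c,
        "only_b": b - a - c,
        "only_c": c - a - b,
        # Edges: proteins matching exactly two databases (assumption:
        # proteins matching all three are counted only in the centre).
        "a_and_b": (a & b) - c,
        "a_and_c": (a & c) - b,
        "b_and_c": (b & c) - a,
        # Centre of the triangle: proteins matching all three databases.
        "all_three": a & b & c,
    }


# Made-up protein IDs for illustration only.
db_a = {"p1", "p2", "p3", "p4"}
db_b = {"p2", "p3", "p5"}
db_c = {"p3", "p4", "p6"}

regions = partition_matches(db_a, db_b, db_c)
counts = {name: len(ids) for name, ids in regions.items()}
print(counts)
```

Because the seven regions are disjoint, their counts sum to the number of proteins with a match in at least one database, which gives a quick consistency check on figures of this kind.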