
Genome analysis of Excretory/Secretory proteins in Taenia solium reveals their Abundance of Antigenic Regions (AAR).

Sandra Gomez1, Laura Adalid-Peralta2, Hector Palafox-Fonseca1, Vito Adrian Cantu-Robles3, Xavier Soberón4, Edda Sciutto5, Gladis Fragoso5, Raúl J Bobes5, Juan P Laclette5, Luis del Pozo Yauner3, Adrián Ochoa-Leyva6.   

Abstract

Excretory/Secretory (ES) proteins play an important role in host-parasite interactions. Experimental identification of ES proteins is time-consuming and expensive. Alternative bioinformatics approaches are cost-effective and can be used to prioritize the experimental analysis of therapeutic targets for parasitic diseases. Here we predicted and functionally annotated the ES proteins in the T. solium genome using an integration of bioinformatics tools. Additionally, we developed a novel measurement to evaluate the potential antigenicity of the T. solium secretome using the sequence length and number of antigenic regions of ES proteins. This measurement was formalized as the Abundance of Antigenic Regions (AAR) value. The AAR value for the secretome was similar to that obtained for a set of experimentally determined antigenic proteins and differed from the value calculated for the non-ES proteins of the T. solium genome. Furthermore, we calculated the AAR values for known helminth secretomes, and they were similar to that obtained for T. solium. The results reveal the utility of the AAR value as a novel genomic measurement to evaluate the potential antigenicity of secretomes. This comprehensive analysis of the T. solium secretome provides functional information for future experimental studies, including the identification of novel ES proteins of therapeutic, diagnostic and immunological interest.

Year:  2015        PMID: 25989346      PMCID: PMC4437048          DOI: 10.1038/srep09683

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


The secretome refers to the set of proteins that are excreted/secreted by a given cell, including extracellular-matrix (ECM) proteins, vesicle proteins (e.g., from microsomal vesicles) and proteins shed from the cell membrane1. These Excretory/Secretory (ES) proteins play important roles in development, adhesion, proteolysis and extracellular matrix organization of the organism. In parasitic organisms, ES proteins act as virulence factors and as immune regulators that control host immune recognition during infection. They are crucial for parasite survival inside and outside the host, and their expression usually changes in response to environmental stimuli1. Because ES proteins are involved in the clinical manifestations in the host, they represent attractive drug targets for the development of novel therapeutic strategies2. Moreover, ES proteins are an important source of immunogenic proteins because they are accessible to the host immune system. Thus, considerable attention has been paid to ES proteins as biomarkers to detect the presence of a parasite and/or the status of the infection in different infectious diseases3456. The prediction of ES proteins from sequenced genomes is a novel strategy used to prioritize the experimental study of new therapeutic and immunodiagnostic targets for human parasitic diseases2. The increasing availability of whole parasite genomes provides the opportunity to systematically screen in silico for the encoded secretomes and for the most probable antigenic proteins before undertaking confirmatory experiments. Echinococcosis (hydatid disease) and cysticercosis, caused by the proliferation of larval tapeworms in vital organs, are important neglected tropical diseases7.
Cysticercosis is a tissue infection caused by the parasite Taenia solium (known as the pork tapeworm). The life cycle includes the pig as intermediate host and the human as definitive host. The tapeworm is the adult stage of T. solium; it infects the human intestine and releases eggs into the human feces. The intermediate host becomes infected by ingesting vegetation contaminated with eggs; the oncospheres subsequently hatch, penetrate the intestinal wall and circulate to the musculature. The oncospheres develop into the larval stage (cysticerci) in muscle and the central nervous system (CNS). The life cycle is completed when humans ingest raw or undercooked infected meat and develop the adult tapeworm in the intestine89. However, humans can also accidentally ingest the eggs and develop cysticerci. In humans, cysticerci become established predominantly in the CNS, causing neurocysticercosis (NC), which is the most common tapeworm infection of the brain worldwide and is endemic in developing countries1011. NC causes symptoms that range from headache and dizziness to epilepsy and severe intracranial hypertension, affecting the social and economic development of the affected communities1112.
Tapeworms (Platyhelminthes, Cestoda) secrete several ES molecules to regulate the host immune system for parasite survival13141516171819. ES proteins involved in the uptake and sequestration of host hydrophobic molecules20 and in mediating the host immune response to parasite infection21 have been experimentally characterized in different life cycle stages of T. solium22. Several ES proteins with peptidase activities have also been reported232425. However, because neither a curated protein database nor a genome sequence for T. solium was available at the time, those studies produced only partial lists of ES proteins. Recently, the T. solium genome has been published26, giving us the opportunity to characterize the ES proteins encoded in the genome and to screen in silico for the most probable protective antigens before undertaking confirmatory experiments.
Predicting the number of antigenic regions in each protein at a genome-wide level can help in the design of vaccine components and immunodiagnostic reagents. There are many bioinformatics methods to predict antigenic regions from a protein sequence. The classical approach to epitope prediction uses amino acid properties including hydrophobicity27, hydrophilicity28, surface accessibility29, flexibility30 and antigenicity31. In addition, there are methods that use machine learning algorithms such as Hidden Markov Models (HMM)32, Artificial Neural Networks (ANN)33 and Support Vector Machines (SVM)34 to locate antigenic epitopes. However, normalizing epitope density by sequence length has not previously been considered as a measure of the antigenic potential of protein sequences at a genome-wide level. In the present study, we predicted the ES proteins encoded in the T. solium genome and functionally annotated them in terms of similarity to other known proteins, biochemical pathways, gene ontologies, protein families and domains. ES proteins were also analyzed for the number of antigenic regions using three different bioinformatics algorithms and searched for structural homologues using fold recognition algorithms. We developed a novel genomic measurement to evaluate the potential antigenicity of a secretome using the sequence length and the number of antigenic regions of ES proteins. This measurement was formalized as the Abundance of Antigenic Regions (AAR) value. We also determined the AAR value for a set of 46 experimentally determined antigenic proteins of T. solium and for previously reported ES proteins of 12 parasitic helminth species.
We believe that our genome-wide exploration of ES proteins is a valuable resource for future experimental studies of the T. solium secretome. Our work represents a starting point for the characterization of the parasite secretome and should contribute to a better understanding of host-parasite interactions.

Results

Prediction of Excretory/Secretory (ES) proteins of T. solium genome

The bioinformatics pipeline is summarized in Figure 1. Of the 12,902 proteins encoded in the T. solium genome26, we annotated a total of 731 proteins as classical secretory proteins with SignalP35 and 543 proteins as non-classical secretory proteins with SecretomeP36. The classical and non-classical secretory proteins were merged, yielding a set of 1190 distinct proteins because 84 proteins were shared between both predictions (see Venn diagram in Figure 1). The 1190 proteins were subsequently analyzed with TargetP37 to identify mitochondrial proteins; 98 proteins were predicted as mitochondrial and were removed from the set. The remaining 1092 proteins were scanned using TMHMM38, which predicted transmembrane regions in 254 proteins. These transmembrane proteins were removed from the dataset. Finally, a total of 838 sequences were predicted as ES proteins by our bioinformatics pipeline (Figure 1). The 838 ES proteins represent 6.5% of the total sequences of the T. solium genome. The ES proteins were searched against the RNAseq and EST libraries from T. solium to determine the percentage of ES proteins that are supported at the RNA level. Access to the RNA data was kindly provided by the T. solium consortium (unpublished data). Interestingly, we found RNA support for 347 ES proteins, representing 41.4% of the total T. solium secretome.
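The filtering arithmetic of the pipeline can be checked with a few lines of Python. This is only a bookkeeping sketch of the counts reported above; the variable names are illustrative and no prediction tool is actually run.

```python
# Counts taken from the text; variable names are illustrative only.
total_proteins = 12902   # proteins encoded in the T. solium genome
classical = 731          # SignalP predictions
non_classical = 543      # SecretomeP predictions
shared = 84              # proteins predicted by both tools
mitochondrial = 98       # removed after TargetP
transmembrane = 254      # removed after TMHMM
rna_supported = 347      # ES proteins with RNAseq/EST support

# Union of the two prediction sets (inclusion-exclusion).
merged = classical + non_classical - shared
after_targetp = merged - mitochondrial
es_proteins = after_targetp - transmembrane


def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


print(merged, after_targetp, es_proteins)
print(percent(es_proteins, total_proteins))
print(percent(rna_supported, es_proteins))
```

The intermediate totals reproduce the 1190, 1092 and 838 proteins quoted in the text, and the two percentages correspond to the reported 6.5% and 41.4%.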
Figure 1

Bioinformatics pipeline to identify and annotate the ES proteins in T. solium genome.

Functional annotation of T. solium secretome

ES protein identification

Of the 838 ES proteins, 654 (81.6%) proteins show significant BLASTP matches with proteins deposited in the non-redundant (nr) database and 63 (7.5%) proteins show significant BLASTP matches with hypothetical protein homologs. According to the sequence descriptions of the protein homologs, several ES proteins were identified as diagnostic antigen gp50 (14 proteins), cysteine-rich secretory protein (9 proteins), chorion class high cysteine protein (6 proteins), oncosphere antigen a (5 proteins) and others.

Gene Ontology analysis

ES proteins were annotated for Biological Process, Molecular Function and Cellular Component with Gene Ontology (GO) terms. Out of 838 ES proteins, 349 (41.6%) proteins were annotated with GO terms using Blast2GO3940. In an effort to obtain more sequences with annotations, the 488 unannotated proteins were subjected to GO term annotation using Argot241. The advantage of Argot2 is that it exploits HMMER searches in addition to the typical BLAST searches and combines the clustering of GO terms based on their semantic similarities with a weighting scheme to annotate the query sequences41. Using Argot2, we annotated 276 of the 488 proteins left unannotated by Blast2GO3940. In summary, of the 838 ES proteins, 625 (74.6%) proteins were annotated with 1429 different GO terms (835 for Biological Process, 231 for Cellular Component and 363 for Molecular Function) using the two annotation programs. The 12,064 non-ES proteins of the T. solium genome were also analyzed for GO term annotation, and a total of 10,218 (84.7%) proteins were mapped to GO terms. The distribution of GO terms at the second-level category is provided in Figure 2 for ES and non-ES proteins from the T. solium genome.
Figure 2

Gene Ontology distribution of ES proteins and non-ES proteins from T. solium.

Distribution of Gene Ontology terms at level 2 for: (A) Molecular Function, (B) Cellular Component and (C) Biological Process.

The most represented GO terms in the 838 ES proteins in the Molecular Function category (Figure 2A) were binding (42%) and catalytic activity (37%). The molecular function regulator and catalytic activity terms are overrepresented among the ES proteins compared with the distribution of the same terms for the non-ES proteins of the T. solium genome (Figure 2A). Conversely, the transporter activity and binding terms are underrepresented in the secretome compared with the non-ES proteins. At the third-level subcategory, the binding term predominantly includes ion binding (13%), protein binding (11%), organic cyclic compound binding (11%), heterocyclic compound binding (11%) and small molecule binding (10%). The catalytic activity term predominantly includes hydrolase activity (10%), transferase activity (10%), oxidoreductase activity (3%), isomerase activity (1%), ligase activity (0.5%) and lyase activity (0.5%). The most represented GO terms in the ES proteins in the Cellular Component category (Figure 2B) were cell (28%), organelle (21%), membrane (21%), macromolecular complex (10%), extracellular region (9%) and membrane enclosed lumen (4%). The extracellular matrix, extracellular region and membrane terms are overrepresented in the secretome compared with the non-ES proteins (Figure 2B). The most represented GO terms in the 838 ES proteins in the Biological Process category (Figure 2C) were cellular process (18%), metabolic process (16%), single-organism process (14%), biological regulation (10%), response to stimulus (7%) and multicellular organism process (5%). The biological adhesion, biological regulation and metabolic process terms are overrepresented in the secretome compared with the non-ES proteins of the T. solium genome.

Gene Ontology terms enrichment

We analyzed whether any GO term shows a significant enrichment in the secretome compared with the distribution expected from the GO terms of the whole T. solium genome (Figure 3). In the Molecular Function category, a significant enrichment was found for terms related to the regulation of peptidase activities, extracellular matrix structural constituent and oxidoreductase activity (Figure 3A). In the Cellular Component category, terms related to extracellular components, endoplasmic reticulum lumen and components anchored to the membrane show a significant enrichment (Figure 3B). In the Biological Process category, the significantly enriched terms were related to regulation of peptidase and hydrolase activity, proteolysis and extracellular structure organization (Figure 3C). The complete lists of significant GO enrichments assigned to ES proteins are provided in Supplementary Tables S1–S3.
Figure 3

Gene Ontology enrichment of ES proteins as compared to the total proteins from T. solium genome.

Significant enrichments of Gene Ontology terms for: (A) Molecular Function, (B) Cellular Component and (C) Biological Process.
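The statistical test behind the enrichment analysis is not stated in this section. As an illustration only, the sketch below assumes a one-sided hypergeometric test, a common choice for GO term enrichment; the function name and the example counts are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: proteins in the background (e.g. the whole genome)
    K: background proteins annotated with the GO term
    n: proteins in the subset (e.g. the secretome)
    k: subset proteins annotated with the GO term
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the exact integer counts first, then divide once.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical example: a term carried by 100 of 12,902 genome proteins
# and by 20 of the 838 ES proteins (illustrative counts, not from the paper).
p_value = hypergeom_upper_tail(20, 838, 100, 12902)
print(f"{p_value:.3g}")
```

In practice such p-values are computed for every GO term and then corrected for multiple testing, which this sketch does not cover.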

Pathway mapping

We used KAAS424344 to map ES proteins to biochemical pathways. A total of 384 (45.8%) ES proteins were associated with 166 KEGG pathways. The most represented KEGG pathways are shown in Table 1 and full annotations are available in Supplementary Table S4. The two most frequently mapped KEGG pathways were protein processing in endoplasmic reticulum and lysosome. Interestingly, four proteins were predicted to be involved in antigen processing and presentation (ranking 23), which might play critical roles in host-parasite interactions.
Table 1

Top 15 most represented KEGG pathways in T. solium secretome

Pathway name | Number of represented ES proteins (%)
Protein processing in endoplasmic reticulum | 11 (1.31)
Lysosome | 10 (1.19)
Pathways in cancer | 10 (1.19)
Focal adhesion | 9 (1.07)
Hippo signaling pathway | 7 (0.84)
Proteoglycans in cancer | 7 (0.84)
Purine metabolism | 5 (0.60)
Wnt signaling pathway | 5 (0.60)
PI3K-Akt signaling pathway | 5 (0.60)
Phagosome | 5 (0.60)
Protein digestion and absorption | 5 (0.60)
Alcoholism | 5 (0.60)
Epstein-Barr virus infection | 5 (0.60)
Glycerophospholipid metabolism | 4 (0.48)
Pyrimidine metabolism | 4 (0.48)

Enzyme Code Distribution

We classified the enzymes among the ES proteins and the non-ES proteins according to the six Enzyme Commission (EC) classes (Figure 4). The results show an overrepresentation of hydrolases, oxidoreductases and ligases in the ES proteins compared with the same enzyme types in the non-ES proteins of the T. solium genome (Figure 4A). Hydrolases represented 43% of the enzymes in the ES proteins but 31% of those in the non-ES proteins (Figure 4A). Oxidoreductases represented 16% of the enzymes in the ES proteins but only 9% of those in the non-ES proteins (Figure 4A). The three most represented EC subclasses of hydrolases were: acting on peptide bonds (peptide hydrolases) (18 proteins), acting on ester bonds (8) and glycosylases (6) (Figure 4B). The three most represented EC subclasses of transferases were: transferring phosphorus-containing groups (13 proteins), glycosyltransferases (5) and acyltransferases (4) (Figure 4C). Finally, the most represented EC subclasses of oxidoreductases are shown in Figure 4D.
Figure 4

Enzyme commission classes and subclasses distribution of T. solium ES proteins.

(A) EC classes for ES and non-ES proteins, (B) EC hydrolase subclasses for ES proteins, (C) EC transferase subclasses for ES proteins and (D) oxidoreductase subclasses for ES proteins.

Analysis of protein domains and motifs

The annotation of ES proteins using InterProScan4546 resulted in 491 protein families and domains. The most represented InterPro domains are shown in Table 2. The three most represented protein domains were the Immunoglobulin-like fold, CAP domain and fibronectin type III. Interestingly, the Immunoglobulin-like domains are involved in a variety of functions, including cell-cell recognition, cell-surface receptors, muscle structure and the immune system. The Taeniidae antigen was also overrepresented (ranking 14).
Table 2

Top 15 most represented protein domains in T. solium secretome

InterPro code | InterPro description | Number of ES proteins (%)
IPR013783 | Immunoglobulin-like fold | 32 (3.81)
IPR014044 | CAP domain | 18 (2.14)
IPR003961 | Fibronectin, type III | 17 (2.02)
IPR007110 | Immunoglobulin-like | 16 (1.90)
IPR001283 | Allergen V5/Tpx-1-related | 15 (1.78)
IPR002223 | Proteinase inhibitor I2, Kunitz metazoa | 14 (1.67)
IPR020901 | Proteinase inhibitor I2, Kunitz, conserved site | 12 (1.43)
IPR003599 | Immunoglobulin subtype | 9 (1.07)
IPR011009 | Protein kinase-like domain | 9 (1.07)
IPR013083 | Zinc finger, RING/FYVE/PHD-type | 9 (1.07)
IPR002126 | Cadherin | 8 (0.95)
IPR015919 | Cadherin-like | 8 (0.95)
IPR000719 | Protein kinase, catalytic domain | 8 (0.95)
IPR008860 | Taeniidae antigen | 8 (0.95)
IPR007087 | Zinc finger, C2H2 | 8 (0.95)

Functional analyses of the specific T. solium secretome

We compared the 838 ES proteins against the genomes of E. multilocularis (Family: Taeniidae) and H. microstoma (Family: Hymenolepididae) to discard the ES proteins with homologues in these two genomes. These two species have the sequenced genomes most closely related to that of T. solium to date26. From these analyses, we retrieved 121 ES proteins without homologues in these two genomes (threshold e-value of 1 E−3). These 121 ES proteins were also searched with BLAST against all the non-redundant (nr) proteins of NCBI, and we did not find any related protein homologue (threshold e-value of 1 E−3). Thus, these 121 proteins constitute the specific secretome of the T. solium genome and can be used as specific targets for T. solium infections. Mapping the set of 121 ES proteins to the InterPro and KEGG databases did not yield any functional annotations. Nonetheless, we annotated 39 sequences with 83 different GO terms using Argot2. However, the GO term enrichment analysis of these 39 sequences did not show statistically significant results compared with the GO distributions for the whole T. solium genome. In an effort to obtain more functional information for this set of ES proteins, we subjected the 121 sequences to a fold recognition analysis using the Phyre2 algorithm47. Phyre2 has recently been used as an alternative approach for the functional annotation of novel protein sequences: if the structure predicted for a query protein is confident, the functions of the template protein can be tentatively assigned to the query. The minimum Phyre2 confidence score was set to 55%, and the proteins with confidence scores equal to or higher than this cut-off are shown in Table 3. The protein 08062.0.1 has a high structural similarity to the UPLC1 protein. Interestingly, the UPLC1 protein is an important regulator of cancer cell migration/invasion and of actin-based cytoskeletal remodeling48.
Table 3

Phyre2 confident predictions found in the T. solium specific secretome

Gene ID | Top structural hit | Confidence (%) | Sequence identity (%) | Template information
08062.0.1 | 2b0b | 86.7 | 18 | PDB header: metal binding protein; Chain: F: PDB Molecule: uplc1
47522.0.1 | 2dcw | 78.5 | 100 | PDB header: antimicrobial protein; Chain: A: PDB Molecule: tachystatin-b2
10029.7.1 | 1yfo | 69.0 | 47 | (Phosphotyrosine protein) phosphatases II; Higher-molecular-weight phosphotyrosine protein phosphatases
43027.1 | 2fd5 | 59.9 | 42 | Tetracyclin repressor-like, C-terminal domain
69637.1 | 1xak | 58.7 | 47 | Immunoglobulin-like beta-sandwich; Accessory protein X4 (ORF8, ORF7a)

The Abundance of Antigenic Regions (AAR) value

To evaluate the antigenic potential of the T. solium secretome, the number of antigenic regions in each protein sequence was obtained using three different bioinformatics algorithms: the method reported by Kolaskar and Tongaonkar31, CBTOPE34 and BepiPred32. The Kolaskar31 method is a classical approach that uses the antigenic propensity and physicochemical properties of amino acids to predict antigenic regions. The BepiPred32 method combines the hydrophilicity of amino acids with a Hidden Markov Model (HMM) to predict B-cell epitopes. The CBTOPE34 method predicts conformational B-cell epitopes using the amino acid composition as an input feature for a Support Vector Machine (SVM) model. To normalize the number of antigenic regions by sequence length, we introduce the Abundance of Antigenic Regions (AAR) value (see materials and methods). This normalization was applied to the results of the three methods used for antigenic prediction. The AAR value expresses the average number of amino acids per antigenic region in a sequence. Hence, a low AAR value means that the protein has more antigenic regions per unit length (a higher epitope density). We determined the AAR value for the 838 ES proteins and found on average one antigenic region every 26.2 amino acids using the Kolaskar method (Table 4), while the AAR values using the CBTOPE34 and BepiPred32 methods were 105.7 and 93.6, respectively (Table 4). The three different AAR values obtained for the ES proteins are due to the different numbers of antigenic regions predicted by each method. However, all three methods show a consistent AAR difference between the ES and non-ES proteins (Table 4). Hence, we use the AAR values obtained with the Kolaskar31 method for comparisons between protein datasets.
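The formal definition of the AAR value is given in the materials and methods, which are not reproduced here. The sketch below assumes the reading suggested by the text, namely sequence length divided by the number of predicted antigenic regions, averaged over a dataset. The function names, the handling of proteins with no predicted region and the example data are assumptions made for illustration.

```python
def aar(sequence_length, n_antigenic_regions):
    """Amino acids per antigenic region (assumed reading of the AAR value)."""
    if n_antigenic_regions <= 0:
        raise ValueError("AAR is undefined without at least one antigenic region")
    return sequence_length / n_antigenic_regions


def mean_aar(proteins):
    """Average AAR over (length, regions) pairs.

    Assumption: proteins with no predicted region are skipped; the source
    does not say how such cases were treated.
    """
    values = [aar(length, regions) for length, regions in proteins if regions > 0]
    return sum(values) / len(values)


# Hypothetical proteins as (sequence length, predicted antigenic regions).
example = [(262, 10), (131, 5), (300, 10), (90, 0)]
print(aar(262, 10))
print(mean_aar(example))
```

Under this reading, a protein of 262 residues with 10 predicted regions has one region every 26.2 residues, and a lower value indicates a higher epitope density.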
Table 4

Abundance of Antigenic Regions (AAR) for different T. solium protein datasets

Protein dataset | Number of proteins in the dataset | Average of AAR values (Kolaskar) | Average of AAR values (CBTOPE) | Average of AAR values (BepiPred)
Secretome | 838 | 26.2 | 105.7 | 93.6
Secretome supported at RNA level | 347 | 26.2 | 108.2 | 101.9
Specific secretome | 121 | 28.9 | 85.4 | 76.7
Specific secretome supported at RNA level | 48 | 28.3 | 84.4 | 83.5
Experimentally determined ES proteins | 46 | 21.7 | 74.3 | 81.3
Non-ES proteins from T. solium genome | 12064 | 42.1 | 126.5 | 102.1
The AAR value for the 347 ES proteins that are supported at the RNA level was 26.2 (Table 4). The AAR value for the set of 121 ES proteins specific to the T. solium genome was 28.9, while the non-ES proteins have on average one antigenic region every 42.1 amino acids (Table 4). The AAR value for the 48 ES proteins specific to the T. solium secretome that are supported at the RNA level was 28.3. Interestingly, all ES protein datasets had a roughly 1.5- to 1.9-fold higher density of antigenic regions than the non-ES proteins of the T. solium genome (Table 4; ratios of Kolaskar AAR values). Hence, the epitope density in ES proteins is higher than in non-ES proteins. To validate the biological significance of AAR values, we calculated this value for a dataset of experimentally derived ES proteins of T. solium compiled from the literature (see materials and methods). This set contained 46 protein sequences that have been experimentally reported to be useful in the diagnosis of human taeniasis or neurocysticercosis (Supplementary Table S5). Interestingly, the AAR value for this antigenic protein dataset was 21.8, which is close to the value calculated for the secretome (Table 4). In contrast, the non-ES proteins showed an AAR value of 42.1. Interestingly, 44 (95.6%) of the 46 diagnostic proteins were found in our secretome (Supplementary Table S5). Furthermore, we also found RNA support for these 44 proteins (Supplementary Table S5). To test whether our AAR values are similar to those of other known secretomes, we selected the secretomes of 12 helminth species recently reported in the Helminth Secretome Database (HSD)2 and calculated their AAR values. Table 5 contains the AAR values for the 12 helminth secretomes (4 nematodes, 4 trematodes and 4 cestodes). Interestingly, the AAR values obtained for the known helminth secretomes were very similar to that obtained for the T. solium secretome reported in this study (Table 5).
Table 5

Abundance of Antigenic Regions (AAR) for different known helminth secretomes

ES proteins | Relative Density of Antigenic Regions (Kolaskar) | Average of AAR values (CBTOPE) | Average of AAR values (BepiPred)
Nematodes
Heterorhabditis bacteriophora | 26.4 | 96.5 | 105.0
Caenorhabditis brenneri | 26.9 | 102.1 | 96.0
Caenorhabditis japonica | 26.7 | 97.8 | 94.8
Heterodera glycines | 29.1 | 100.6 | 97.7
Trematodes
Echinostoma paraensei | 24.6 | 78.9 | 82.4
Fasciola gigantica | 28.2 | 82.0 | 80.5
Opisthorchis viverrini | 26.6 | 86.6 | 73.6
Paragonimus westermani | 26.3 | 68.1 | 77.8
Cestodes
Echinococcus multilocularis | 28.0 | 91.0 | 92.0
Mesocestoides corti | 26.6 | 84.8 | 65.9
Moniezia expansa | 27.3 | 95.5 | 95.0
Spirometra erinaceieuropaei | 27.6 | 111.6 | 78.9
Taenia solium | 26.2 | 105.7 | 93.6
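The comparison between datasets can be made concrete with the Kolaskar-based values from Tables 4 and 5. The short sketch below only summarizes the tabulated numbers (mean and range across the 12 helminth secretomes, and the non-ES to ES ratio in T. solium); it is an illustration, not an analysis performed in the study.

```python
from statistics import mean

# Kolaskar-based AAR values copied from Table 5 (12 helminth secretomes).
helminth_aar = {
    "Heterorhabditis bacteriophora": 26.4,
    "Caenorhabditis brenneri": 26.9,
    "Caenorhabditis japonica": 26.7,
    "Heterodera glycines": 29.1,
    "Echinostoma paraensei": 24.6,
    "Fasciola gigantica": 28.2,
    "Opisthorchis viverrini": 26.6,
    "Paragonimus westermani": 26.3,
    "Echinococcus multilocularis": 28.0,
    "Mesocestoides corti": 26.6,
    "Moniezia expansa": 27.3,
    "Spirometra erinaceieuropaei": 27.6,
}

# Kolaskar-based AAR values copied from Table 4.
t_solium_es = 26.2       # secretome
t_solium_non_es = 42.1   # non-ES proteins

helminth_mean = mean(helminth_aar.values())
lowest = min(helminth_aar.values())
highest = max(helminth_aar.values())

# A ratio above 1 means fewer residues per antigenic region in ES proteins,
# i.e. a higher density of antigenic regions.
ratio = t_solium_non_es / t_solium_es

print(f"helminth mean AAR: {helminth_mean:.1f} (range {lowest}-{highest})")
print(f"non-ES / ES AAR ratio in T. solium: {ratio:.2f}")
```

The T. solium secretome value of 26.2 falls inside the range spanned by the other helminth secretomes, whereas the non-ES value of 42.1 lies well outside it.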

Discussion

Cysticercosis is a neglected zoonotic infection caused by the parasite T. solium. It is included in the WHO list of neglected tropical diseases and is the most prevalent human tapeworm infection. We have applied different bioinformatics approaches to identify and annotate all the predicted ES proteins encoded in the T. solium genome. To the best of our knowledge, the present study is the most comprehensive in silico collection of the T. solium secretome, which represents 6.5% of the total proteins encoded in the genome. This proportion of ES proteins is in agreement with secretomes previously reported for other species226. ES proteins can circulate in the extracellular space of an organism, which makes them attractive targets for novel therapeutics because they may be more accessible to drugs than other proteins. Our T. solium secretome provides a rich source of potential drug targets, vaccine candidates and diagnostic proteins for developing new treatment and diagnostic strategies. In addition, our study contributes to the knowledge of the molecular mechanisms of host-parasite interaction and to the identification of novel proteins with immunomodulatory properties that could be used as targets to control inflammatory processes in non-infectious diseases. Functional information on the T. solium secretome was obtained through the analysis of Gene Ontology (GO) annotations of the 838 ES proteins. The top 10 enriched GO terms showed a statistical overrepresentation in the ES proteins of biological activities that are strongly related to the typical functions of secreted proteins (Figure 3). The GO terms related to extracellular matrix, endoplasmic reticulum lumen and anchored to membrane showed a significant enrichment in the Cellular Component category. The secretome of an organism includes all proteins secreted by the cell, including those of the extracellular matrix, proteins shed from the cell membrane and vesicle proteins such as those of microsomal vesicles14950.
The GO term enrichment related to the endoplasmic reticulum lumen suggests that, even with a correctly predicted signal peptide, some proteins may be residents of the endoplasmic reticulum. The top 10 enriched GO terms of Biological Process and Molecular Function showed a statistical overrepresentation in the ES proteins of peptidase activities, extracellular organization and cell adhesion terms. Proteins with peptidase domains have previously been reported to be involved in virulence in several helminth species51. Several ES proteins were predicted to be involved in the antigen processing and presentation pathway. Interestingly, there is evidence that glycoantigens secreted by cysticerci can modulate the host inflammatory response through the activation of dendritic cells in experimental murine cysticercosis caused by T. crassiceps52. However, the relevance of ES proteins to the modulation of host-parasite relationships has not been studied in human cysticercosis, although it is well known that helminth ES proteins can modulate the host immune system during infection to promote parasite survival131415. The functional annotations found in the T. solium secretome by GO term enrichment, pathway mapping, enzyme code distribution and protein domain analysis support the usefulness of our bioinformatics workflow for predicting secretomes in other genomes. However, it is clear that integrating bioinformatics strategies with RNAseq data can improve the identification of expressed secretomes. Interestingly, 41.4% of our secretome was supported at the RNA level (unpublished data). The 121 ES proteins specific to the T. solium secretome represent potential novel drug or vaccine targets for therapeutic strategies and underline the importance of future experimental research to characterize this protein dataset. The proteins of this dataset are not shared with other sequenced organisms, suggesting that they can be explored as diagnostic proteins for specific T. solium infections.
T. solium is unable to synthesize the amino acid lysine, and among the secreted proteins we found enzymes able to degrade lysine-containing peptides. This finding is an example of the complex host-parasite interactions. The presence of lytic proteins in our secretome suggests that these proteins can be used to break down nutrients, making them more accessible to the parasite, or to break down immune response-related molecules that could damage the parasite535455565758. Interestingly, the hydrolases and oxidoreductases were overrepresented in the secretome compared with the non-ES proteins of the T. solium genome. This is in agreement with the considerable enrichment of these enzyme types found in other experimentally determined secretomes505960. It was previously suggested that a high epitope density in a single protein molecule significantly enhances its antigenicity and immunogenicity61. Here, we found that experimentally determined antigenic proteins have a higher antigenic density, measured by normalizing the number of antigenic regions by sequence length (AAR values in Tables 4 and 5). The AAR value is thus a manageable metric that reflects the epitope density of a protein. To our knowledge, AAR is the first example of a tool that uses antigenic regions and sequence length to estimate the antigenicity of proteins at a genome-wide level. Nearly 40% of predicted ES proteins remain unannotated in the Helminth Secretome Database (HSD)2. The sequence annotation results obtained for the T. solium specific secretome, which were based on BLAST and HMMER searches, fold recognition strategies and AAR values, suggest that these strategies can be used to enhance the annotations of known secretomes. The Abundance of Antigenic Regions (AAR) value for the T. solium secretome (Table 4) showed that these proteins are enriched in antigenic regions compared with the non-ES proteins.
Interestingly, the AAR values for the ES proteins were very similar to those obtained for the diagnostic proteins, suggesting their potential use in the diagnosis of T. solium infections (Table 4). In addition, the AAR values obtained for known helminth secretomes were very similar to that obtained for the T. solium secretome (Table 5). These results demonstrate the utility of the AAR value as a novel genomic measurement to evaluate the potential antigenicity of ES proteins at the genome-wide level. Traditional cloning of proteins for immunization purposes is clearly not feasible on a genomic scale. The AAR approach is cost-effective and can guide a genome-wide search for antigenic proteins of therapeutic, diagnostic and immunological interest. The use of different algorithms to predict antigenic regions could potentially improve the predictions. In this work, we obtained the AAR values using the number of antigenic regions predicted by three independent algorithms: CBTOPE34, which is based on a Support Vector Machine (SVM) model; BepiPred32, which is based on a Hidden Markov Model (HMM); and Kolaskar31, which uses the antigenic propensity and physicochemical properties of amino acids to predict antigenic regions. Although the Kolaskar31 method predicts more antigenic regions per protein than CBTOPE34 and BepiPred32, there is a consistent difference in AAR values between ES and non-ES proteins for each method (Tables 4–5). The T. solium ES proteins could be used as antigens to capture antibodies from infected patients. Subsequently, the antibodies could be used to directly detect the ES antigens in infected patients through a sandwich ELISA. Currently, human NC diagnosis lacks the sensitivity and specificity needed to establish a definitive NC diagnosis in patients with neurological diseases. The HP10 monoclonal antibody is one of the best proteins used for immunodiagnosis. 
However, HP10 is only effective for the detection and follow-up of the most severe forms of NC (that is, when vesicular cysticerci are located in the subarachnoid space at the base62). Novel ES antigens from the oncosphere stage have also recently been suggested for NC diagnosis2563. However, immunoassays in pigs using T. solium ES or total antigens have shown low sensitivity and many false positives and false negatives64. Experimental study of the ES proteins identified in this work will be needed to confirm which proteins are candidates for the development of new diagnostic tests and new disease treatments. However, protein functions are strongly context-dependent, and further experimental analyses are needed to improve the reliability of the functional interpretation of our results. Additionally, further studies at the proteomic level are highly desirable to confirm the predicted secretome reported herein.

Methods

The bioinformatics pipeline is summarized in Figure 1. We started with the 12,902 protein sequences of the T. solium genome26. The SignalP (version 4.1)35 and SecretomeP (version 2.0)36 algorithms were applied to all of these proteins. SignalP was used to predict classically secreted proteins, selecting the eukaryote organism option and truncating each sequence at a positional limit of 70 residues before submitting it to the neural network algorithm. The method option for input sequences that may include TM regions was selected, and the D-cutoff values were left at their defaults. SecretomeP was used to predict non-classically secreted proteins using the default options for mammalian organisms. All classical and non-classical secretory proteins were merged, and the resulting list was scanned with TargetP37 to predict mitochondrial proteins, using a specificity of 95% and the default options for non-plant organisms. The mitochondrial proteins predicted by TargetP were discarded from the protein data set. The resulting ES proteins were subsequently scanned for transmembrane helices with TMHMM (version 2.0)38, and protein sequences exhibiting transmembrane helices were also excluded from the final protein data set.
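
The filtering order described above can be sketched in a few lines of Python. This is only an illustration of the selection logic, assuming that the per-protein calls from SignalP, SecretomeP, TargetP and TMHMM have already been parsed into simple fields. The `Prediction` record, its field names, the `select_es_proteins` function and the example protein identifiers are hypothetical and are not part of the original pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Per-protein calls, assumed to be parsed beforehand from each tool's output."""
    protein_id: str
    signalp_secreted: bool       # SignalP: classical signal peptide predicted
    secretomep_secreted: bool    # SecretomeP: non-classical secretion predicted
    targetp_mitochondrial: bool  # TargetP: mitochondrial targeting predicted
    tmhmm_helices: int           # TMHMM: number of predicted transmembrane helices


def select_es_proteins(predictions):
    """Return the IDs that survive the three filtering steps, in input order."""
    selected = []
    for p in predictions:
        # Step 1: merge classical and non-classical secretion candidates.
        if not (p.signalp_secreted or p.secretomep_secreted):
            continue
        # Step 2: discard predicted mitochondrial proteins.
        if p.targetp_mitochondrial:
            continue
        # Step 3: discard proteins with any predicted transmembrane helix.
        if p.tmhmm_helices > 0:
            continue
        selected.append(p.protein_id)
    return selected


# Made-up records covering each outcome of the filter.
examples = [
    Prediction("prot_A", True, False, False, 0),   # classical, kept
    Prediction("prot_B", False, True, False, 0),   # non-classical, kept
    Prediction("prot_C", True, False, True, 0),    # mitochondrial, dropped
    Prediction("prot_D", True, False, False, 2),   # transmembrane, dropped
    Prediction("prot_E", False, False, False, 0),  # not secreted, dropped
]
print(select_es_proteins(examples))  # ['prot_A', 'prot_B']
```

Keeping each exclusion as a separate step mirrors the order given in the text and makes it easy to count how many proteins each tool removes.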

Functional annotation and comparative analysis of ES proteins

The ES proteins were functionally annotated using several bioinformatics tools. To identify homologous proteins, the ES proteins were searched with BLASTP against the non-redundant (nr) database using the Blast2GO package. The E-value cut-off was set at 1.0 E−3. Using Blast2GO39406566, the ES proteins were functionally mapped to GO terms and annotated with the following parameters: E-Value-Hit-Filter: 1.0 E−3; Annotation cut-off: 55; GO weight: 5; Hsp-Hit Coverage cut-off: 0. The ES proteins were also mapped to Gene Ontology terms using Argot241 with the Total Score (TS) set to ≥200. Additionally, the ES proteins were associated with protein families and domains through InterProScan4546. Blast2GO was used to identify the statistically enriched GO terms among the ES proteins, setting the term filter value to 0.05 and the term filter mode to FDR. KAAS was used to map the ES proteins to KEGG pathways and KEGG BRITE objects, using the BBH (bi-directional best hit) method to assign orthologs and the representative gene data set for eukaryotes424344.

The 838 ES proteins were searched for sequence similarity against the Hymenolepis microstoma (Family: Hymenolepididae) and E. multilocularis (Family: Taeniidae) genomes26 using BLASTP (E-value cut-off set at 1.0 E−3) to obtain the specific secretome of T. solium.

The number of antigenic regions was calculated for each protein using the methods of Kolaskar and Tongaonkar31, CBTOPE34 and BepiPred32. The Abundance of Antigenic Regions (AAR) was calculated as follows for each method:

$X_p = L_p / A_p$

where X_p is the relative abundance of antigenic regions in protein p, L_p is the sequence length of protein p, and A_p is the number of antigenic regions in protein p. The AAR value was introduced to define the number of amino acids between antigenic regions for each protein. This value was scored as the ratio of the sequence length to the number of predicted antigenic regions for each protein. 
Hence, the final value gives the number of amino acids needed to find one antigenic region in the corresponding sequence. The dataset of experimentally determined proteins used to diagnose human T. solium infections was compiled from a search of the NCBI database. From this search, we found 46 proteins, distinct at the sequence level, that have been experimentally reported to be useful for the diagnosis of human taeniosis or neurocysticercosis (Supplementary Table S5). The ES protein sequences were also submitted to the Phyre2 program47 using the default options, and the twenty top-scoring matches (if any) were retained for each protein. The Phyre2 result is based on secondary structure prediction coupled with fold recognition and three-dimensional structure prediction47.
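
As a rough illustration of the AAR definition given above (sequence length divided by the number of predicted antigenic regions), the following Python sketch computes the per-protein value and a summary over a protein set. The function names and example numbers are hypothetical. Two further choices are assumptions made only for this sketch and are not stated in the text: proteins with no predicted region are skipped, and a set is summarized by the arithmetic mean.

```python
from statistics import mean


def aar(seq_length, n_regions):
    """Per-protein AAR: residues per predicted antigenic region (length / regions).

    Returns None when no antigenic region was predicted, because the ratio is
    undefined in that case (an assumption of this sketch).
    """
    if n_regions <= 0:
        return None
    return seq_length / n_regions


def mean_aar(proteins):
    """Average AAR over (seq_length, n_regions) pairs, skipping undefined values."""
    values = [aar(length, regions) for length, regions in proteins]
    defined = [v for v in values if v is not None]
    return mean(defined) if defined else None


# Made-up example: (sequence length, number of predicted antigenic regions).
toy_set = [(300, 10), (120, 3), (90, 0)]
print(aar(300, 10))       # 30.0 residues per antigenic region
print(mean_aar(toy_set))  # 35.0, the mean of 30.0 and 40.0
```

Under this definition a lower value means that fewer residues are needed to find one antigenic region, so a lower AAR corresponds to a higher density of antigenic regions.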

Author Contributions

S.G., L.A.P., H.P.F., V.A.C.R. and A.O.L. generated and conducted the bioinformatics analyses. S.G. and A.O.L. performed the statistical analysis. L.A.P. and A.O.L. wrote the manuscript. X.S., E.S., G.F., R.J.B., J.P.L. and L.P.Y. examined the data. A.O.L. conceived the project, generated data, conducted bioinformatics analyses and coordinated the draft manuscript. All authors edited and approved the final manuscript.

Additional information

How to cite this article: Gomez, S. et al. Genome analysis of Excretory/Secretory proteins in Taenia solium reveals their Abundance of Antigenic Regions (AAR). Sci. Rep. 5, 9683; DOI:10.1038/srep09683 (2015).
  66 in total

1.  Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.

Authors:  A Krogh; B Larsson; G von Heijne; E L Sonnhammer
Journal:  J Mol Biol       Date:  2001-01-19       Impact factor: 5.469

2.  Protein structure prediction on the Web: a case study using the Phyre server.

Authors:  Lawrence A Kelley; Michael J E Sternberg
Journal:  Nat Protoc       Date:  2009       Impact factor: 13.491

3.  New hydrophilicity scale derived from high-performance liquid chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites.

Authors:  J M Parker; D Guo; R S Hodges
Journal:  Biochemistry       Date:  1986-09-23       Impact factor: 3.162

4.  Temporal alterations in the secretome of the selective ligninolytic fungus Ceriporiopsis subvermispora during growth on aspen wood reveal this organism's strategy for degrading lignocellulose.

Authors:  Chiaki Hori; Jill Gaskell; Kiyohiko Igarashi; Phil Kersten; Michael Mozuch; Masahiro Samejima; Dan Cullen
Journal:  Appl Environ Microbiol       Date:  2014-01-17       Impact factor: 4.792

Review 5.  Cysticercosis of the central nervous system: how should it be managed?

Authors:  Hector H Garcia; Armando E Gonzalez; Robert H Gilman
Journal:  Curr Opin Infect Dis       Date:  2011-10       Impact factor: 4.915

6.  Comparison of the peptidase activity in the oncosphere excretory/secretory products of Taenia solium and Taenia saginata.

Authors:  Mirko J Zimic; Jesús Infantes; César López; Jeanette Velásquez; Marilú Farfán; Mónica Pajuelo; Patricia Sheen; Manuela Verastegui; Armando Gonzalez; Hector H Garciá; Robert H Gilman
Journal:  J Parasitol       Date:  2007-08       Impact factor: 1.276

7.  Diagnosis of porcine cysticercosis: a comparative study of serological tests for detection of circulating antibody and viable parasites.

Authors:  E Sciutto; M Hernández; G García; A S de Aluja; A N Villalobos; L F Rodarte; M Parkhouse; L Harrison
Journal:  Vet Parasitol       Date:  1998-08-14       Impact factor: 2.738

8.  Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide.

Authors:  E A Emini; J V Hughes; D S Perlow; J Boger
Journal:  J Virol       Date:  1985-09       Impact factor: 5.103

Review 9.  Host targeting of virulence determinants and phosphoinositides in blood stage malaria parasites.

Authors:  Souvik Bhattacharjee; Robert V Stahelin; Kasturi Haldar
Journal:  Trends Parasitol       Date:  2012-10-16

10.  From genomics to chemical genomics: new developments in KEGG.

Authors:  Minoru Kanehisa; Susumu Goto; Masahiro Hattori; Kiyoko F Aoki-Kinoshita; Masumi Itoh; Shuichi Kawashima; Toshiaki Katayama; Michihiro Araki; Mika Hirakawa
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

  18 in total

1.  A gel-free proteomic analysis of Taenia solium and Taenia crassiceps cysticerci vesicular extracts.

Authors:  Giovani Carlo Veríssimo da Costa; Regina Helena Saramago Peralta; Dário Eluan Kalume; Ana Larissa Gama Martins Alves; José Mauro Peralta
Journal:  Parasitol Res       Date:  2018-09-13       Impact factor: 2.289

Review 2.  Recent Trends in System-Scale Integrative Approaches for Discovering Protective Antigens Against Mycobacterial Pathogens.

Authors:  Aarti Rana; Shweta Thakur; Girish Kumar; Yusuf Akhter
Journal:  Front Genet       Date:  2018-11-27       Impact factor: 4.599

3.  Zone of Interaction Between the Parasite and the Host: Protein Profile of the Body Cavity Fluid of Gasterosteus aculeatus L. Infected with the Cestode Schistocephalus solidus (Muller, 1776).

Authors:  Albina Kochneva; Ekaterina Borvinskaya; Lev Smirnov
Journal:  Acta Parasitol       Date:  2021-01-02       Impact factor: 1.440

4.  Experimental and Theoretical Approaches To Investigate the Immunogenicity of Taenia solium-Derived KE7 Antigen.

Authors:  Raúl J Bobes; José Navarrete-Perea; Adrián Ochoa-Leyva; Víctor Hugo Anaya; Marisela Hernández; Jacquelynne Cervantes-Torres; Karel Estrada; Filiberto Sánchez-Lopez; Xavier Soberón; Gabriela Rosas; Cáris Maroni Nunes; Martín García-Varela; Rogerio Rafael Sotelo-Mundo; Alonso Alexis López-Zavala; Goar Gevorkian; Gonzalo Acero; Juan P Laclette; Gladis Fragoso; Edda Sciutto
Journal:  Infect Immun       Date:  2017-11-17       Impact factor: 3.441

Review 5.  Taenia solium Cysticercosis and Its Impact in Neurological Disease.

Authors:  Hector H Garcia; Armando E Gonzalez; Robert H Gilman
Journal:  Clin Microbiol Rev       Date:  2020-05-27       Impact factor: 26.132

6.  Computational identification and characterization of antigenic properties of Rv3899c of Mycobacterium tuberculosis and its interaction with human leukocyte antigen (HLA).

Authors:  Ritam Das; Kandasamy Eniyan; Urmi Bajpai
Journal:  Immunogenetics       Date:  2021-07-06       Impact factor: 2.846

7.  Genome-wide analysis of excretory/secretory proteins in Echinococcus multilocularis: insights into functional characteristics of the tapeworm secretome.

Authors:  Shuai Wang; Wei Wei; Xuepeng Cai
Journal:  Parasit Vectors       Date:  2015-12-30       Impact factor: 3.876

8.  Effect of Transforming Growth Factor-β upon Taenia solium and Taenia crassiceps Cysticerci.

Authors:  Laura Adalid-Peralta; Gabriela Rosas; Asiel Arce-Sillas; Raúl J Bobes; Graciela Cárdenas; Marisela Hernández; Celeste Trejo; Gabriela Meneses; Beatriz Hernández; Karel Estrada; Agnes Fleury; Juan P Laclette; Carlos Larralde; Edda Sciutto; Gladis Fragoso
Journal:  Sci Rep       Date:  2017-09-27       Impact factor: 4.379

9.  Functional diversity of secreted cestode Kunitz proteins: Inhibition of serine peptidases and blockade of cation channels.

Authors:  Martín Fló; Mariana Margenat; Leonardo Pellizza; Martín Graña; Rosario Durán; Adriana Báez; Emilio Salceda; Enrique Soto; Beatriz Alvarez; Cecilia Fernández
Journal:  PLoS Pathog       Date:  2017-02-13       Impact factor: 6.823

10.  Secretome Prediction of Two M. tuberculosis Clinical Isolates Reveals Their High Antigenic Density and Potential Drug Targets.

Authors:  Fernanda Cornejo-Granados; Zyanya L Zatarain-Barrón; Vito A Cantu-Robles; Alfredo Mendoza-Vargas; Camilo Molina-Romero; Filiberto Sánchez; Luis Del Pozo-Yauner; Rogelio Hernández-Pando; Adrián Ochoa-Leyva
Journal:  Front Microbiol       Date:  2017-02-07       Impact factor: 5.640
