Stephen J Goodswen, Paul J Kennedy, John T Ellis.
Abstract
Over the last two decades, various in silico approaches have been developed and refined that attempt to identify protein and/or peptide vaccine candidates from informative signals encoded in the protein sequences of a target pathogen. To date, no signal has been identified that clearly indicates a protein will effectively contribute to a protective immune response in a host. The premise of this study is that proteins under positive selection from the immune system are more likely to be suitable vaccine candidates than proteins exposed to other selection pressures. Furthermore, our expectation is that protein sequence regions encoding major histocompatibility complex (MHC)-binding peptides will contain consecutive positive selection sites. Using freely available data and bioinformatic tools, we present a high-throughput pipeline that predicts positive selection sites, protein subcellular locations, and sequence locations of medium to high T-cell MHC class I binding peptides. Positive selection sites are estimated from a sequence alignment by comparing rates of synonymous (dS) and non-synonymous (dN) substitutions among protein-coding sequences of orthologous genes in a phylogeny. The main pipeline output is a list of protein vaccine candidates predicted to be naturally exposed to the immune system and to contain sites under positive selection. Candidates are ranked by the number of consecutive sites located on protein sequence regions encoding MHC I-binding peptides. Results are constrained by the reliability of the prediction programs and the quality of the input data. Protein sequences from the Toxoplasma gondii ME49 strain (TGME49) were used as a case study. Surface antigen (SAG), dense granule (GRA), microneme (MIC), and rhoptry (ROP) proteins are considered worthy T. gondii candidates. Given 8263 TGME49 protein sequences processed anonymously, the top 10 predicted candidates were all worthy candidates. In particular, the top 10 included ROP5 and ROP18, which are T. gondii virulence determinants. The chance of randomly selecting a ROP protein was 0.2% given 8263 sequences. We conclude that the approach described is a valuable addition to other in silico approaches for identifying vaccine candidates worthy of laboratory validation, and that it could be adapted for other apicomplexan parasite species (with appropriate data).
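
The ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact scoring rule, so this example assumes one plausible reading: score each protein by its longest run of consecutive positive selection sites that falls inside a single predicted MHC I-binding peptide. The function names, the input layout, and the toy data are all hypothetical and are not taken from the study's pipeline.

```python
from typing import Dict, Iterable, List, Tuple


def longest_run_in_peptides(
    selected_sites: Iterable[int],
    peptides: Iterable[Tuple[int, int]],
) -> int:
    """Longest run of consecutive selected sites inside any one peptide.

    selected_sites: 1-based residue positions flagged as positively selected.
    peptides: inclusive (start, end) residue ranges of predicted MHC I binders.
    """
    sites = set(selected_sites)
    best = 0
    for start, end in peptides:
        run = 0
        for pos in range(start, end + 1):
            # Extend the current run on a selected site, otherwise reset it.
            run = run + 1 if pos in sites else 0
            best = max(best, run)
    return best


def rank_candidates(proteins: Dict[str, dict]) -> List[Tuple[str, int]]:
    """Return (protein ID, score) pairs sorted by score, highest first."""
    scored = [
        (pid, longest_run_in_peptides(p["sites"], p["peptides"]))
        for pid, p in proteins.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy input invented for illustration (assumption: not data from the study).
toy = {
    "protein_A": {"sites": [10, 11, 12, 30], "peptides": [(8, 16)]},
    "protein_B": {"sites": [5, 7, 9], "peptides": [(1, 9)]},
}
ranking = rank_candidates(toy)
```

In this toy case `protein_A` ranks first because sites 10 to 12 form an unbroken run inside one peptide, whereas the sites in `protein_B` are never adjacent.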
Keywords: Hammondia hammondi; Neospora caninum; Toxoplasma gondii; positive selection; reverse vaccinology; vaccine discovery
Year: 2018 PMID: 30177953 PMCID: PMC6109633 DOI: 10.3389/fgene.2018.00332
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
List of Apicomplexan species used in study.
| Data group | Species |
|---|---|
| 16 species | |
| 25 species | 14 Set + |
| 55 species | 14 and 25 Set + |
Programs used in the study [download date: July 2017].
| Program | Version | Function | Download URL | Reference |
|---|---|---|---|---|
| BLASTP | 2.6.0 | Performs a protein vs. protein sequence alignment | ||
| Clustal Omega | 1.2.4 | Computes a multiple sequence alignment | ||
| PAL2NAL | 14 | Converts a multiple sequence alignment of proteins and the corresponding mRNA sequences into a codon-based DNA alignment. | ||
| RAxML | 8.2.10 | Creates a phylogenetic tree based on maximum-likelihood inference. | ||
| CODEML | 4.9e | Computes substitution rate ratio (dN/dS) | ||
| predict_binding.py | 2.17 | Predicts peptides binding to Major Histocompatibility Complex (MHC) class I molecules | ||
| WoLF PSORT | 0.2 | Predicts subcellular localization sites of proteins | ||
| SignalP | 4.1 | Predicts the presence and location of signal peptide cleavage sites | ||
| TargetP | 1.1 | Predicts subcellular location | ||
| TMHMM | 2.0 | Predicts transmembrane helices | ||
| Phobius | 1.01 | A combined transmembrane topology and signal peptide predictor | ||
| Vacceed | 1.0 | Predicts secreted and/or membrane-associated proteins | ||
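
As a reading aid for the dN/dS ratio listed above, the sketch below applies the conventional interpretation: a ratio above 1 suggests positive selection, a ratio near 1 suggests neutral evolution, and a ratio below 1 suggests purifying selection. This is a simplification and not how CODEML reports sites; the function name and the `tolerance` band around 1 are assumptions made only for illustration.

```python
def classify_omega(dn: float, ds: float, tolerance: float = 0.05) -> str:
    """Label a dN/dS ratio using the conventional threshold of 1.

    tolerance is a hypothetical band around 1 treated as 'neutral';
    it is an illustrative assumption, not a value from the study.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio dN/dS")
    omega = dn / ds
    if omega > 1 + tolerance:
        return "positive"
    if omega < 1 - tolerance:
        return "purifying"
    return "neutral"


# Example values invented for illustration.
labels = [classify_omega(0.3, 0.1), classify_omega(0.1, 0.1), classify_omega(0.02, 0.2)]
```

In practice, site-level calls usually rest on statistical tests over codon models rather than a raw threshold, so this function should be read only as a summary of what the ratio means.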
Number of ortholog groups per dataset and the number of predicted candidates.
| Species dataset | Filter set | Dataset similarity criteria | Ortholog groups (input) | Ortholog groups (output) | Number of candidates |
|---|---|---|---|---|---|
| 16 | 1 | >70% and <99% | 3139 | 2986 | 651 (280) |
| 16 | 2 | >70% and <95% | 143 | 130 | 60 (43) |
| 16 | 3 | >70% and <90% | 22 | 19 | 11 (8) |
| 25 | 1 | >70% and <99% | 3606 | 3373 | 663 (290) |
| 25 | 2 | >70% and <95% | 581 | 520 | 61 (44) |
| 25 | 3 | >70% and <90% | 252 | 226 | 16 (13) |
| 55 | 1 | >70% and <99% | 3522 | NC | NC |
| 55 | 2 | >70% and <95% | 597 | 527 | 63 (46) |
| 55 | 3 | >70% and <90% | 314 | 288 | 17 (14) |
Comparisons between predicted outcomes from different species datasets when predicting target candidates for Toxoplasma gondii ME49.
| Species dataset | Similarity criteria | TP | FP | FN | TN | SN (%) | SP (%) | PPV (%) | NPV (%) | Processing time (hms) |
|---|---|---|---|---|---|---|---|---|---|---|
| 16 | >70% and <99% | 90 | 189 | 35 | 894 | 72 | 82 | 32 | 96 | 65 h 25 m 6 s |
| 16 | >70% and <95% | 40 | 3 | 13 | 27 | 75 | 90 | 93 | 68 | 1 h 53 m 48 s |
| 16 | >70% and <90% | 7 | 1 | 2 | 3 | 78 | 75 | 88 | 60 | 12 m 4 s |
| 25 | >70% and <99% | 87 | 203 | 53 | 1257 | 62 | 86 | 30 | 96 | 97 h 3 m 51 s |
| 25 | >70% and <95% | 38 | 6 | 14 | 389 | 73 | 98 | 86 | 96 | 8 h 46 m 43 s |
| 25 | >70% and <90% | 6 | 7 | 3 | 192 | 67 | 96 | 46 | 98 | 5 h 1 m 8 s |
| 55 | >70% and <95% | 40 | 6 | 12 | 400 | 76 | 98 | 87 | 97 | 360 h 14 m 24 s |
| 55 | >70% and <90% | 8 | 6 | 3 | 247 | 73 | 98 | 57 | 99 | 202 h 26 m 14 s |
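
The percentage columns in the table above can be recomputed from the TP, FP, FN, and TN counts. The sketch below uses the standard definitions of sensitivity (SN), specificity (SP), positive predictive value (PPV), and negative predictive value (NPV); the function name is hypothetical, and the source does not state how its percentages were rounded.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard confusion-matrix metrics, returned as percentages."""
    return {
        "SN": 100 * tp / (tp + fn),   # sensitivity: true positives recovered
        "SP": 100 * tn / (tn + fp),   # specificity: true negatives recovered
        "PPV": 100 * tp / (tp + fp),  # share of predicted positives that are real
        "NPV": 100 * tn / (tn + fn),  # share of predicted negatives that are real
    }


# Counts from the 16-species row with the >70% and <95% similarity criteria.
row = confusion_metrics(tp=40, fp=3, fn=13, tn=27)
for name, value in row.items():
    print(f"{name}: {value:.1f}%")
```

For this row the computed values are approximately 75.5, 90.0, 93.0, and 67.5, which agree with the four percentages printed in that row (75, 90, 93, 68) to within rounding.
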
The top 10 predicted Toxoplasma gondii ME49 vaccine candidates for this study, i.e., proteins predicted to be exposed to the immune system, to be under positive selection, and to contain consecutive positive selection sites (PSSs) on intermediate and/or high binding MHC I peptides.
| Protein ID | Protein name | No. of sites | No. of sig. sites | Consec. PSSs | Max. No. consecutive | Exposed probability | Reference |
|---|---|---|---|---|---|---|---|
| TGME49_227280 | Dense granule protein GRA3 | 20 | 20 | 179 | 8 | 0.99 | |
| TGME49_310780 | Dense granule protein GRA4 | 82 | 21 | 153 | 8 | 0.92 | |
| TGME49_309330 | SAG-related sequence SRS55F | 63 | 18 | 107 | 5 | 0.82 | |
| TGME49_320190 | SAG-related sequence SRS16B | 72 | 26 | 52 | 4 | 0.90 | |
| TGME49_320200 | SAG-related sequence SRS16A | 39 | 13 | 41 | 3 | 0.94 | |
| TGME49_215775 | Rhoptry protein ROP8 | 134 | 13 | 40 | 2 | 0.86 | |
| TGME49_214080 | Toxofilin | 39 | 14 | 36 | 3 | 0.94 | |
| TGME49_205250 | Rhoptry protein ROP18 | 87 | 11 | 35 | 3 | 0.98 | |
| TGME49_238440 | SAG-related sequence SRS22A | 28 | 18 | 30 | 4 | 0.59 | |
| TGME49_308090 | Rhoptry protein ROP5 | 20 | 7 | 29 | 3 | 0.86 | |