Mattia Dalsass¹,², Alessandro Brozzi¹, Duccio Medini¹, Rino Rappuoli¹
Abstract
Reverse Vaccinology (RV) is a widely used approach to identify potential vaccine candidates (PVCs) by screening the proteome of a pathogen through computational analyses. Since its first application to the Group B meningococcus (MenB) vaccine around 2000, several software programs have been developed, each implementing a different flavor of the original RV protocol. However, to date there has been no comprehensive review of these RV tools. We compared six of these applications designed for bacterial vaccines (NERVE, Vaxign, VaxiJen, Jenner-predict, Bowman-Heinson, and VacSol) on a set of 11 pathogens for which a curated list of known bacterial protective antigens (BPAs) was available. We present results on: (1) the criteria and programs used to select PVCs; (2) computational runtime; and (3) performance in terms of the fraction of the proteome identified as PVCs and the fraction and enrichment of BPAs among the PVCs. This review shows that none of the programs recalled 100% of the tested BPAs and that the output protein lists agree poorly, which suggests that vaccine candidate prioritization should not rely on the output of a single RV tool. Individually, the best balance between the fraction of the proteome predicted as good candidates and the recall of BPAs was achieved by the machine-learning approach proposed by Bowman (1) and enhanced by Heinson (2). Although it outperforms the other approaches, it is less accessible to non-expert users, and its results depend strongly on the composition of the a priori training dataset.
In conclusion, we believe that, to significantly improve the performance of future RV methods, further studies should focus on improving the accuracy of existing protein annotation tools and should leverage machine-learning techniques applied to biological datasets expanded through the incorporation and curation of bacterial proteins with negative experimental results.
Keywords: antigen; bacterial pathogens; bacterial protective antigens (BPAs); potential vaccine candidates (PVCs); reverse vaccinology (RV) programs
Year: 2019 PMID: 30837982 PMCID: PMC6382693 DOI: 10.3389/fimmu.2019.00113
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Figure 1. Cartoon schematically representing the main steps of protein subunit vaccine development; the Reverse Vaccinology part is highlighted in the box (A). Timeline of the release of standalone RV programs and their main characteristics in terms of software type, interface, and target pathogen (B).
Synoptic summary of the main characteristics of the six programs tested.
| NERVE | Decision-tree | Non-cytoplasmic protein; < 2 transmembrane helices; high adhesin probability; no homology with human proteins | Input and output data are automatically structured in a database | Not updated | 4 |
| VaxiJen | Machine-learning | Output probability greater than a cut-off (0.5) | Very fast; graphical interface | Fixed training datasets (100 known bacterial antigens, 100 putative non-antigens) | 20 |
| Vaxign | Decision-tree | Non-cytoplasmic protein; < 2 transmembrane helices; high adhesin probability; no homology with human and mouse proteins | Regularly maintained; easy to use and intuitive | Download of results limited to 500 proteins | 18 |
| Jenner-predict | Decision-tree | Non-cytoplasmic protein; < 2 transmembrane helices; presence of Pfam domains involved in host-pathogen interaction and pathogenesis | Upload and download of large datasets | Temporarily unavailable | 1 |
| Bowman-Heinson | Machine-learning | Output probability greater than a cut-off (0.5) | Larger training set (200 known bacterial antigens, 200 putative non-antigens) | Annotation tools designed for eukaryotes applied to bacterial proteins; pipeline not released | 0 |
| VacSol | Decision-tree | Non-cytoplasmic protein; < 2 transmembrane helices; no homology with human proteins; essential gene; virulence factor | User-friendly interface | Too restrictive | 0 |
Prototype of the gold-standard 2 × 2 table used to measure RV performance.
| RV method prediction | BPA | Not-BPA |
|---|---|---|
| PVC | True positive (TP) | False positive (FP) |
| Not-PVC | False negative (FN) | True negative (TN) |
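The recall and fold-enrichment values in the performance table later in this section can be derived from the four cells of this 2 × 2 table. The Python sketch below assumes that fold-enrichment is the BPA rate among PVCs divided by the BPA rate in the whole proteome; this definition is our assumption, although it reproduces the reported NERVE value of 5.1. The class `Confusion` and the helper `from_summary` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2 x 2 table: RV prediction (PVC / not-PVC) vs. gold standard (BPA / not-BPA)."""
    tp: int  # predicted PVC, known BPA
    fp: int  # predicted PVC, not a BPA
    fn: int  # predicted not-PVC, known BPA
    tn: int  # predicted not-PVC, not a BPA

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def recall(self) -> float:
        # Fraction of known BPAs that the program calls PVC.
        return self.tp / (self.tp + self.fn)

    def pvc_fraction(self) -> float:
        # Fraction of the proteome predicted as PVC.
        return (self.tp + self.fp) / self.total

    def fold_enrichment(self) -> float:
        # Assumed definition: BPA rate among PVCs relative to the BPA rate overall.
        precision = self.tp / (self.tp + self.fp)
        prevalence = (self.tp + self.fn) / self.total
        return precision / prevalence


def from_summary(n_proteins: int, n_bpa: int, n_pvc: int, bpa_in_pvc: int) -> Confusion:
    """Rebuild the 2 x 2 table from summary counts (hypothetical helper)."""
    fp = n_pvc - bpa_in_pvc
    fn = n_bpa - bpa_in_pvc
    tn = n_proteins - n_pvc - fn
    return Confusion(tp=bpa_in_pvc, fp=fp, fn=fn, tn=tn)


# NERVE totals over the 11 pathogens: 27,247 proteins, 100 BPAs, 3,396 PVCs, 64 BPAs recalled.
nerve = from_summary(27_247, 100, 3_396, 64)
print(nerve.tn, nerve.recall(), round(nerve.pvc_fraction(), 3), round(nerve.fold_enrichment(), 1))
```

Because the BPA prevalence is tiny (100 of 27,247 proteins), fold-enrichment is a more informative summary than accuracy, which would be dominated by true negatives.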
Summary of the external programs used by the six RV programs to predict the protein features used to filter or classify PVCs.
| Subcellular localization | PSORTb | X | X | X | X | |
| TargetP | X | |||||
| Transmembrane domains | HMMTOP | X | X | X | X | |
| Pathogenic domains or virulence factors | SPAAN | X | X | |||
| Pfam | X | |||||
| VFDB | X | |||||
| LipoP | X | |||||
| Similarity to host proteins | BLASTp against MHCPEP db | X | ||||
| BLASTp against RefSeq and Swiss-Prot db | X | |||||
| OrthoMCL | X | |||||
| B- and T-cell response | NetMHC | X | ||||
| Vaxitope | X | |||||
| ABCPred | X | |||||
| ProPred-I | X | |||||
| ProPred | X | |||||
| GPS-MBA | X | |||||
| PickPocket | X | |||||
| Post-translational modification | YinOYang (glycosylation) | X | ||||
| NetPhosK (phosphorylation) | X | |||||
| ProP (proprotein convertase cleavage) | X |
Summary of run times on a benchmark dataset of 100 proteins (average length 360 aa).
| Program | Run time |
|---|---|
| VaxiJen | 5 s |
| Vaxign | 5 min 40 s |
| NERVE | 17 min 37 s |
| Bowman-Heinson | 27 min 8 s |
| VacSol | 49 min 40 s |
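To compare these timings on a common scale, they can be converted into seconds per protein. The Python sketch below does this; the extrapolation to the full 27,247-protein set assumes that run time scales linearly with the number of proteins on the same hardware, which the 100-protein benchmark does not itself establish. The function names are hypothetical.

```python
# Run times from the 100-protein benchmark (average length 360 aa).
RUN_TIMES = {
    "VaxiJen": "5 s",
    "Vaxign": "5 min 40 s",
    "NERVE": "17 min 37 s",
    "Bowman-Heinson": "27 min 8 s",
    "VacSol": "49 min 40 s",
}
N_BENCHMARK = 100
UNIT_SECONDS = {"min": 60, "s": 1}


def to_seconds(text: str) -> int:
    """Parse strings such as '17 min 37 s' into a number of seconds."""
    tokens = text.split()
    # Tokens alternate value and unit, e.g. ['17', 'min', '37', 's'].
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in zip(tokens[0::2], tokens[1::2]))


def seconds_per_protein(run_times: dict[str, str], n_proteins: int = N_BENCHMARK) -> dict[str, float]:
    return {name: to_seconds(t) / n_proteins for name, t in run_times.items()}


def extrapolated_hours(per_protein: dict[str, float], n_proteins: int) -> dict[str, float]:
    # Assumption: run time grows linearly with the number of proteins.
    return {name: s * n_proteins / 3600 for name, s in per_protein.items()}


per_protein = seconds_per_protein(RUN_TIMES)
for name, hours in extrapolated_hours(per_protein, 27_247).items():
    print(f"{name}: {per_protein[name]:.2f} s/protein, ~{hours:.1f} h for 27,247 proteins")
```

Per-protein figures make the spread explicit: VaxiJen needs a fraction of a second per protein, whereas VacSol needs tens of seconds.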
Number and fraction of proteins predicted as PVCs by each of the six programs (NERVE, VaxiJen, Vaxign, VacSol, Bowman-Heinson, and Jenner-predict); pathogens are listed in decreasing order of proteome size.
| Pathogen | NERVE | VaxiJen | Vaxign | VacSol | Bowman-Heinson | Jenner-predict |
|---|---|---|---|---|---|---|
| | 627 (11.7%) | 1,979 (37%) | 452 (8.5%) | 174 (3.3%) | 661 (12.4%) | 250 (4.7%) |
| | 690 (19.2%) | 972 (27%) | 504 (14%) | 190 (5.3%) | 414 (11.5%) | 121 (3.4%) |
| | 398 (11.8%) | 984 (29.2%) | 330 (9.8%) | 25 (0.7%) | 380 (11.3%) | 260 (7.7%) |
| | 254 (9.5%) | 992 (37.3%) | 125 (4.7%) | 118 (4.4%) | 300 (11.3%) | 126 (4.7%) |
| | 194 (9.2%) | 625 (29.6%) | 122 (5.8%) | 111 (5.3%) | 216 (10.2%) | 75 (3.6%) |
| | 272 (12.9%) | 917 (43.5%) | 197 (9.3%) | 45 (2.1%) | 304 (14.4%) | 81 (3.8%) |
| | 256 (12.8%) | 815 (40.7%) | 180 (9%) | 37 (1.8%) | 308 (15.4%) | 88 (4.4%) |
| | 92 (5.6%) | 682 (41.3%) | 85 (5.2%) | 20 (1.2%) | 234 (14.2%) | 58 (3.5%) |
| | 199 (12.2%) | 530 (32.6%) | 111 (6.8%) | 39 (2.4%) | 211 (13%) | 60 (3.7%) |
| | 201 (13.5%) | 429 (28.7%) | 131 (8.8%) | 40 (2.7%) | 231 (15.5%) | 81 (5.4%) |
| | 213 (16.5%) | 432 (33.5%) | 96 (7.4%) | 10 (0.8%) | 186 (14.4%) | 25 (1.9%) |
| Total (27,247 proteins) | 3,396 (12.5%) | 9,357 (34.3%) | 2,602 (9.5%) | 809 (3%) | 3,445 (12.6%) | 1,225 (4.5%) |
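Because the proteomes differ in size, the last row of the table (3,396 PVCs, 12.5%, for NERVE) is a pooled fraction over all 27,247 proteins rather than the plain average of the 11 per-pathogen percentages. The short Python check below, using the NERVE column, shows that the two quantities differ slightly; the variable and function names are ours.

```python
# NERVE column of the table: PVC counts and per-pathogen percentages for the 11 proteomes.
NERVE_COUNTS = [627, 690, 398, 254, 194, 272, 256, 92, 199, 201, 213]
NERVE_PERCENTS = [11.7, 19.2, 11.8, 9.5, 9.2, 12.9, 12.8, 5.6, 12.2, 13.5, 16.5]
TOTAL_PROTEINS = 27_247


def pooled_percent(counts: list[int], total: int) -> float:
    """Percentage of all proteins (across proteomes) predicted as PVC."""
    return 100 * sum(counts) / total


def unweighted_mean_percent(percents: list[float]) -> float:
    """Plain average of per-proteome percentages (each proteome weighs the same)."""
    return sum(percents) / len(percents)


print(sum(NERVE_COUNTS),
      round(pooled_percent(NERVE_COUNTS, TOTAL_PROTEINS), 1),
      round(unweighted_mean_percent(NERVE_PERCENTS), 1))
```

The pooled value (12.5%) matches the table, while the unweighted mean is 12.3%; pooling gives larger proteomes proportionally more weight.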
Figure 2. Hierarchical clustering of RV programs based on the fraction (%) of each proteome predicted as PVCs. Columns correspond to the six programs and rows to the 11 pathogen proteomes. The legend shows the color scale for the fraction of the proteome predicted as PVCs.
Figure 3. Hierarchical clustering of the six RV programs (columns) based on PVC calls for all 27,247 proteins (rows). Each cell is the output of one program for one protein: white indicates not-PVC and black indicates PVC.
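Neither caption states the distance metric or the linkage method. As an illustrative sketch only, the Python code below clusters the six programs by their per-pathogen PVC percentages (the columns of the fraction table) using Euclidean distance and average linkage (UPGMA); both choices are our assumptions, and the same function could equally be applied to the binary per-protein calls of Figure 3.

```python
import math
from itertools import combinations

# Per-pathogen % of proteome predicted as PVC (11 proteomes, in table order).
PROFILES = {
    "NERVE": [11.7, 19.2, 11.8, 9.5, 9.2, 12.9, 12.8, 5.6, 12.2, 13.5, 16.5],
    "VaxiJen": [37, 27, 29.2, 37.3, 29.6, 43.5, 40.7, 41.3, 32.6, 28.7, 33.5],
    "Vaxign": [8.5, 14, 9.8, 4.7, 5.8, 9.3, 9, 5.2, 6.8, 8.8, 7.4],
    "VacSol": [3.3, 5.3, 0.7, 4.4, 5.3, 2.1, 1.8, 1.2, 2.4, 2.7, 0.8],
    "Bowman-Heinson": [12.4, 11.5, 11.3, 11.3, 10.2, 14.4, 15.4, 14.2, 13, 15.5, 14.4],
    "Jenner-predict": [4.7, 3.4, 7.7, 4.7, 3.6, 3.8, 4.4, 3.5, 3.7, 5.4, 1.9],
}


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(profiles):
    """Naive agglomerative clustering with average linkage (UPGMA).

    Returns the merges in order as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of program names.
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Mean of all pairwise distances between members of a and members of b.
            d = sum(euclidean(profiles[x], profiles[y]) for x in a for y in b) / (len(a) * len(b))
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b, d))
    return merges


for a, b, d in average_linkage(PROFILES):
    print(f"{' + '.join(a)}  <->  {' + '.join(b)}: {d:.1f}")
```

With these data, VaxiJen, which flags roughly a third of each proteome, joins the other programs only at the final merge. The naive search is cubic in the number of items, which is fine for six programs but not for clustering 27,247 proteins.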
Cohen's kappa values for the pairwise agreement between programs.
| | VacSol | VaxiJen | Vaxign | NERVE | Bowman-Heinson |
|---|---|---|---|---|---|
| VaxiJen | −0.012 | | | | |
| Vaxign | −0.016 | 0.129 | | | |
| NERVE | −0.014 | 0.135 | | | |
| Bowman-Heinson | 0.009 | 0.126 | 0.278 | 0.276 | |
| Jenner-predict | 0.032 | 0.073 | 0.321 | 0.269 | 0.275 |
Bold marks the maximum value in each column.
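Cohen's kappa corrects the raw fraction of proteins on which two programs agree for the agreement expected by chance from each program's PVC rate. Values near 0, such as those involving VacSol, indicate agreement no better than chance, and even the highest values in the table (about 0.3) are far from perfect agreement (1). A minimal Python implementation for two binary call vectors (1 = PVC, 0 = not-PVC) is sketched below; the function name and the toy inputs are illustrative, not data from the study.

```python
def cohens_kappa(calls_a: list[int], calls_b: list[int]) -> float:
    """Cohen's kappa for two binary PVC call vectors of equal length."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("call vectors must be non-empty and of equal length")
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    p_a = sum(calls_a) / n  # PVC rate of program A
    p_b = sum(calls_b) / n  # PVC rate of program B
    # Chance agreement: both call PVC, or both call not-PVC, independently.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        return float("nan")  # undefined when both programs make the same constant call
    return (observed - expected) / (1 - expected)


print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```

In the toy example the programs agree on 3 of 4 proteins (0.75), but chance alone would give 0.5, so kappa is 0.5 rather than 0.75.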
Summary of the performance of the RV programs in terms of BPA recall and fold-enrichment.
| NERVE | 3,396 | 64 | 64 | 12 | 5.1 | 1.51E-33 |
| VaxiJen | 9,357 | | | | 2.2 | 1.80E-17 |
| Vaxign | 2,602 | 58 | 58 | 10 | 6.1 | 1.90E-33 |
| VacSol | 809 | 4 | 4 | 3 | 1.3 | 3.46E-01 |
| Bowman-Heinson | 3,445 | 75 | 75 | 13 | 5.9 | 1.99E-46 |
| Jenner-predict | 1,225 | 44 | 44 | 4 | | 1.09E-32 |
Bold marks the maximum value in each column. Numbers refer to the full set of 27,247 proteins from the 11 pathogens; there are 100 BPAs in total.
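The text does not state which test produced the p-values in the last column. A one-sided hypergeometric test (equivalently, a one-sided Fisher's exact test) of whether a program's PVC list contains more BPAs than expected by chance is a natural candidate; under that assumption, the pure-Python sketch below gives a p-value for VacSol (4 BPAs among 809 PVCs) close to the reported 3.46E-01. The function name is ours.

```python
from math import comb


def hypergeom_upper_tail(k: int, n_total: int, n_bpa: int, n_pvc: int) -> float:
    """P(X >= k), where X is the number of BPAs among n_pvc proteins drawn at
    random without replacement from n_total proteins containing n_bpa BPAs."""
    lower = max(k, 0)
    upper = min(n_bpa, n_pvc)
    # Exact integer arithmetic; Python's int / int division stays accurate for huge operands.
    numerator = sum(comb(n_bpa, i) * comb(n_total - n_bpa, n_pvc - i)
                    for i in range(lower, upper + 1))
    return numerator / comb(n_total, n_pvc)


N_PROTEINS, N_BPA = 27_247, 100
# (PVCs, BPAs among PVCs) taken from the performance table above.
RESULTS = {"NERVE": (3_396, 64), "Vaxign": (2_602, 58), "VacSol": (809, 4),
           "Bowman-Heinson": (3_445, 75), "Jenner-predict": (1_225, 44)}

for name, (n_pvc, hits) in RESULTS.items():
    print(f"{name}: p = {hypergeom_upper_tail(hits, N_PROTEINS, N_BPA, n_pvc):.2e}")
```

Working with exact binomial coefficients avoids the overflow that a floating-point evaluation of terms such as C(27,247, 3,396) would hit, while the final division still yields p-values as small as those in the table.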