Literature DB >> 24790156

Vacceed: a high-throughput in silico vaccine candidate discovery pipeline for eukaryotic pathogens based on reverse vaccinology.

Stephen J Goodswen1, Paul J Kennedy1, John T Ellis1.   

Abstract

UNLABELLED: We present Vacceed, a highly configurable and scalable framework designed to automate the process of high-throughput in silico vaccine candidate discovery for eukaryotic pathogens. Given thousands of protein sequences from the target pathogen as input, the main output is a ranked list of protein candidates determined by a set of machine learning algorithms. Vacceed has the potential to save time and money by reducing the number of false candidates allocated for laboratory validation. Vacceed, if required, can also predict protein sequences from the pathogen's genome.
AVAILABILITY AND IMPLEMENTATION: Vacceed is tested on Linux and can be freely downloaded from https://github.com/sgoodswe/vacceed/releases (includes a worked example with sample data). Vacceed User Guide can be obtained from https://github.com/sgoodswe/vacceed.
© The Author 2014. Published by Oxford University Press.

Entities:  

Mesh:

Substances:

Year:  2014        PMID: 24790156      PMCID: PMC4207429          DOI: 10.1093/bioinformatics/btu300

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Several subunit vaccines against prokaryotic pathogens have been identified (Ariel ; Montigiani ; Ross ) using reverse vaccinology (Rappuoli, 2000). Vaxign (He ) and NERVE (Vivona ) are examples of vaccine discovery tools for prokaryotes, but there is currently no equivalent tool for eukaryotes. Freely available bioinformatics tools and an unprecedented volume of –omics data now present an opportunity for in silico vaccine discovery for eukaryotic pathogens. A general approach is to use several tools to predict and gather evidence for protein characteristics. From this potential evidence, the researcher makes an informed decision as to a protein’s vaccine candidacy suitability. Determining which tools are appropriate, as well as how to use them, presents the first of many challenges. A further challenge, especially to a researcher with limited programming ability, is to extract and gather the pertinent evidence distributed within large-scale outputs. The subsequent and more imposing challenge is that the evidence is mainly in different formats, contradicting and inaccurate. Poor evidence reliability arises because some of the input data to the tools (e.g. protein sequences and training data) are inaccurate or missing. Moreover, tools used to predict protein characteristics are, in general, inaccurate. Vaccine candidates identified in silico can only be validated in a laboratory. Validation should provide feedback to inform and improve vaccine candidacy decision making. The repetitive nature for this ideal in silico approach is in need of automation. Furthermore, an automated process must accommodate an ever-increasing choice of new or improved prediction programs that inevitably replace existing ones. We have developed Vacceed to address the challenges raised here i.e. to provide a flexible, automated process to predict worthy vaccine candidates from large volumes of superfluous, disseminated and noisy data. Vacceed is the collective name for a framework of linked bioinformatics programs, Perl scripts, R functions and Linux shell scripts. A previous published study provided guidance in development (Goodswen ).

2 DISTINCTIVE FEATURES

A detailed description of all aspects of Vacceed is provided in a comprehensive user guide provided as Supplementary Information. The focus of this article is to introduce Vacceed via a selection of distinctive features. The Vacceed framework is built around the concept of linked resources (see Fig. 1). Each resource, in this context, is built from a central Linux shell script encapsulating all programs needed to perform specific but related tasks. Typical tasks include predicting a particular protein characteristic as well as pre- and post-validation. A resource can be executed as an independent modular unit. This flexible design allows for scalability and easy maintenance. Any prediction program can be integrated within an existing or new resource if it meets the following criteria: runs in a Linux environment, has high-throughput capability, is applicable to eukaryotes, can be trained or has trained data specific to target pathogen and provides consistent text output. From a user’s perspective, all the work involved in the complexity of linking tasks and resources into a seamless continuous pipeline has already been resolved in Vacceed. The only time a user must be concerned with the contents of a resource is when adding a new one. There is a template resource script and generic Perl scripts to ease this process.
Fig. 1.

Vacceed framework. A set hierarchal structure exists for the execution of all Vacceed scripts e.g. startup → master script → resource script → subordinate script (only three resources are shown to maintain clarity)

Vacceed framework. A set hierarchal structure exists for the execution of all Vacceed scripts e.g. startup → master script → resource script → subordinate script (only three resources are shown to maintain clarity) Core to Vacceed are user-definable configuration files (see Fig. 2). These files are in effect the user’s interface to configuring each resource, if desired, and consequently controlling the outcome of the entire pipeline. For example, by altering names in a list, the user can determine the resources to be run and their order. The expectation is to have one configuration file for each target pathogen. The command-line syntax to invoke Vacceed is ‘perl startup xx’, where xx determines the appropriate configuration file. Specifying a code allows for multiple instances of Vacceed to process different species or resource combinations. No other user input is needed. An e-mail with attached log file is sent on successful completion or immediately following an error.
Fig. 2.

Extract of a Vacceed configuration file defined by a header-key format (only one resource, WoLF PSORT, is shown for brevity)

Extract of a Vacceed configuration file defined by a header-key format (only one resource, WoLF PSORT, is shown for brevity) The framework is organized into two major parts referred henceforth as part A—build proteome, and part B—run pipeline (see Fig. 3). Table 1 lists the programs currently integrated in each part. A starting prerequisite for part B is a file containing amino acid sequences for proteins from the target eukaryotic pathogen i.e. the proteome. Known protein sequences for many pathogens can be downloaded from public databases. Part A is used, only if required, to predict novel protein sequences and/or collect evidence to support the existence of known proteins. Part A resources typically predict genes, which is one among multiple tasks within linked resources involved in building the proteome. Examples of other tasks are validating gene start and end sequences (e.g. ATG, TAA, TAG or TGA), predicting exon locations relative to gene start, converting predictions to amino acid sequences and homology searching.
Fig. 3.

Schematic of data flow in Vacceed

Table 1.

Programs currently integrated in Vacceed

NameFunctionURL (last viewed May 2014)
Part A—Build proteome
AugustusAb initio gene predictorhttp://bioinf.uni-greifswald.de/augustus
GlimmerHMMAb initio gene predictorhttp://ccb.jhu.edu/software/glimmerhmm
BLATAligns expressed sequence tags (ESTs) to DNAhttp://genome.ucsc.edu/FAQ/FAQblat.html
GMAPAligns expressed sequence tags (ESTs) to DNAhttp://research-pub.gene.com/gmap
N-ScanAb initio gene predictor supported by genome comparisonhttp://mblab.wustl.edu/software.html
BLASTNFinds regions of similarity between nucleotide sequenceshttp://www.ncbi.nlm.nih.gov
BLASTPFinds regions of similarity between protein sequenceshttp://www.ncbi.nlm.nih.gov
Part B—Run pipeline (vaccine candidate discovery)
WoLf PSORTProtein subcellular localization predictionhttp://wolfpsort.seq.cbrc.jp
SignalPPredicts presence and location of signal peptide cleavage siteshttp://www.cbs.dtu.dk/services/SignalP
TargetPProtein subcellular localization predictionhttp://www.cbs.dtu.dk/services/TargetP
PhobiusCombined transmembrane topology and signal peptide predictorhttp://phobius.binf.ku.dk/instructions.html
TMHMMPrediction of transmembrane helices in proteinshttp://www.cbs.dtu.dk/services/TMHMM
MHC I-bindingPeptide binding to MHC class I moleculeshttp://tools.immuneepitope.org/mhci/download
MHC II-bindingPeptide binding to MHC class II moleculeshttp://tools.immuneepitope.org/mhcii/download
Schematic of data flow in Vacceed Programs currently integrated in Vacceed Part B resources predict protein characteristics. One resource called ‘Evidence’, however, parses output files and collates relevant protein characteristics (referred henceforth as an evidence profile). A typical profile is a mixture of data types corresponding to an accuracy measure or score for the predicted characteristic (see Fig. 4). A crucial feature of the resource is a set of supervised machine learning algorithms for binary classification executed via Rscript. The ensemble of classifiers constitutes the heart of Vacceed’s decision making. The main output is a ranked vaccine candidate list of all proteins in the target pathogen based on an average probability of individual classifier predictions (see Fig. 4). Machine learning algorithms are the key to overcoming the challenge that an unknown percentage of evidence is questionable in each profile.
Fig. 4.

Examples of evidence profiles and a ranked vaccine candidate list (only four proteins out of potentially thousands constituting the target pathogen are shown for brevity)

Examples of evidence profiles and a ranked vaccine candidate list (only four proteins out of potentially thousands constituting the target pathogen are shown for brevity) Resources encapsulate, for the most part, a large number of independent computation-intensive tasks. Vacceed takes advantage of multi-core processors. Part A processes one chromosome per CPU in parallel. Chromosomes are queued if there are more chromosomes than CPUs. The user, however, can specify the number of chromosomes to process in parallel. Part B internally splits the proteins by the number of CPUs and processes each subset in parallel. Alternatively, the user can specify the split value. Proof of concept: There is no program yet to evaluate in silico vaccine candidates in a host–vaccine interaction. The best interim option is to validate the in silico process by predicting candidates using experimentally validated proteins with known immunogenicity characteristics i.e. compare predicted with expected to determine sensitivity and specificity of the process. Using a mixed dataset of 140 published proteins observed to induce or not induce immune responses, we demonstrated in an earlier study (Goodswen ) Vacceed’s decision making potential by effectively distinguishing expected true from expected false vaccine candidates, with an average sensitivity and specificity of 0.97 and 0.98, respectively. Funding: PhD scholarship from Zoetis (Pfizer) Animal Health. Conflict of Interest: none declared.
  8 in total

Review 1.  Reverse vaccinology.

Authors:  R Rappuoli
Journal:  Curr Opin Microbiol       Date:  2000-10       Impact factor: 7.934

2.  Genomic approach for analysis of surface proteins in Chlamydia pneumoniae.

Authors:  Silvia Montigiani; Fabiana Falugi; Maria Scarselli; Oretta Finco; Roberto Petracca; Giuliano Galli; Massimo Mariani; Roberto Manetti; Mauro Agnusdei; Roberto Cevenini; Manuela Donati; Renzo Nogarotto; Nathalie Norais; Ignazio Garaguso; Sandra Nuti; Giulietta Saletti; Domenico Rosa; Giulio Ratti; Guido Grandi
Journal:  Infect Immun       Date:  2002-01       Impact factor: 3.441

3.  Search for potential vaccine candidate open reading frames in the Bacillus anthracis virulence plasmid pXO1: in silico and in vitro screening.

Authors:  N Ariel; A Zvi; H Grosfeld; O Gat; Y Inbar; B Velan; S Cohen; A Shafferman
Journal:  Infect Immun       Date:  2002-12       Impact factor: 3.441

4.  Identification of vaccine candidate antigens from a genomic analysis of Porphyromonas gingivalis.

Authors:  B C Ross; L Czajkowski; D Hocking; M Margetts; E Webb; L Rothel; M Patterson; C Agius; S Camuglia; E Reynolds; T Littlejohn; B Gaeta; A Ng; E S Kuczek; J S Mattick; D Gearing; I G Barr
Journal:  Vaccine       Date:  2001-07-20       Impact factor: 3.641

5.  A guide to in silico vaccine discovery for eukaryotic pathogens.

Authors:  Stephen J Goodswen; Paul J Kennedy; John T Ellis
Journal:  Brief Bioinform       Date:  2012-10-24       Impact factor: 11.622

6.  Vaxign: the first web-based vaccine design program for reverse vaccinology and applications for vaccine development.

Authors:  Yongqun He; Zuoshuang Xiang; Harry L T Mobley
Journal:  J Biomed Biotechnol       Date:  2010-07-04

7.  NERVE: new enhanced reverse vaccinology environment.

Authors:  Sandro Vivona; Filippo Bernante; Francesco Filippini
Journal:  BMC Biotechnol       Date:  2006-07-18       Impact factor: 2.563

8.  A novel strategy for classifying the output from an in silico vaccine discovery pipeline for eukaryotic pathogens using machine learning algorithms.

Authors:  Stephen J Goodswen; Paul J Kennedy; John T Ellis
Journal:  BMC Bioinformatics       Date:  2013-11-02       Impact factor: 3.169

  8 in total
  16 in total

1.  Compilation of parasitic immunogenic proteins from 30 years of published research using machine learning and natural language processing.

Authors:  Stephen J Goodswen; Paul J Kennedy; John T Ellis
Journal:  Sci Rep       Date:  2022-06-20       Impact factor: 4.996

Review 2.  Fundamentals and Methods for T- and B-Cell Epitope Prediction.

Authors:  Jose L Sanchez-Trincado; Marta Gomez-Perosanz; Pedro A Reche
Journal:  J Immunol Res       Date:  2017-12-28       Impact factor: 4.818

3.  VacSol: a high throughput in silico pipeline to predict potential therapeutic targets in prokaryotic pathogens using subtractive reverse vaccinology.

Authors:  Muhammad Rizwan; Anam Naz; Jamil Ahmad; Kanwal Naz; Ayesha Obaid; Tamsila Parveen; Muhammad Ahsan; Amjad Ali
Journal:  BMC Bioinformatics       Date:  2017-02-13       Impact factor: 3.169

Review 4.  Vaccination against parasites - status quo and the way forward.

Authors:  Anja Joachim
Journal:  Porcine Health Manag       Date:  2016-12-15

5.  The genome of the protozoan parasite Cystoisospora suis and a reverse vaccinology approach to identify vaccine candidates.

Authors:  Nicola Palmieri; Aruna Shrestha; Bärbel Ruttkowski; Tomas Beck; Claus Vogl; Fiona Tomley; Damer P Blake; Anja Joachim
Journal:  Int J Parasitol       Date:  2017-02-02       Impact factor: 3.981

6.  A Gene-Based Positive Selection Detection Approach to Identify Vaccine Candidates Using Toxoplasma gondii as a Test Case Protozoan Pathogen.

Authors:  Stephen J Goodswen; Paul J Kennedy; John T Ellis
Journal:  Front Genet       Date:  2018-08-20       Impact factor: 4.599

7.  Development of a Modular Vaccine Platform for Multimeric Antigen Display Using an Orthobunyavirus Model.

Authors:  Andrea Aebischer; Kerstin Wernike; Patricia König; Kati Franzke; Paul J Wichgers Schreur; Jeroen Kortekaas; Marika Vitikainen; Marilyn Wiebe; Markku Saloheimo; Ronen Tchelet; Jean-Christophe Audonnet; Martin Beer
Journal:  Vaccines (Basel)       Date:  2021-06-15

Review 8.  Structural and Computational Biology in the Design of Immunogenic Vaccine Antigens.

Authors:  Lassi Liljeroos; Enrico Malito; Ilaria Ferlenghi; Matthew James Bottomley
Journal:  J Immunol Res       Date:  2015-10-07       Impact factor: 4.818

Review 9.  Cystoisospora suis - A Model of Mammalian Cystoisosporosis.

Authors:  Aruna Shrestha; Ahmed Abd-Elfattah; Barbara Freudenschuss; Barbara Hinney; Nicola Palmieri; Bärbel Ruttkowski; Anja Joachim
Journal:  Front Vet Sci       Date:  2015-11-30

Review 10.  Vaccines Meet Big Data: State-of-the-Art and Future Prospects. From the Classical 3Is ("Isolate-Inactivate-Inject") Vaccinology 1.0 to Vaccinology 3.0, Vaccinomics, and Beyond: A Historical Overview.

Authors:  Nicola Luigi Bragazzi; Vincenza Gianfredi; Milena Villarini; Roberto Rosselli; Ahmed Nasr; Amr Hussein; Mariano Martini; Masoud Behzadifar
Journal:  Front Public Health       Date:  2018-03-05
View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.