
PEPVAC: a web server for multi-epitope vaccine development based on the prediction of supertypic MHC ligands.

Abstract

Prediction of peptide binding to major histocompatibility complex (MHC) molecules is a basis for anticipating T-cell epitopes, as well as epitope discovery-driven vaccine development. In the human, MHC molecules are known as human leukocyte antigens (HLAs) and are extremely polymorphic. HLA polymorphism is the basis of differential peptide binding, until now limiting the practical use of current epitope-prediction tools for vaccine development. Here, we describe a web server, PEPVAC (Promiscuous EPitope-based VACcine), optimized for the formulation of multi-epitope vaccines with broad population coverage. This optimization is accomplished through the prediction of peptides that bind to several HLA molecules with similar peptide-binding specificity (supertypes). Specifically, we offer the possibility of identifying promiscuous peptide binders to five distinct HLA class I supertypes (A2, A3, B7, A24 and B15). We estimated the phenotypic population frequency of these supertypes to be 95%, regardless of ethnicity. Targeting these supertypes for promiscuous peptide-binding predictions results in a limited number of potential epitopes without compromising the population coverage required for practical vaccine design considerations. PEPVAC can also identify conserved MHC ligands, as well as those with a C-terminus resulting from proteasomal cleavage. The combination of these features with the prediction of promiscuous HLA class I ligands further limits the number of potential epitopes. The PEPVAC server is hosted by the Dana-Farber Cancer Institute at the site http://immunax.dfci.harvard.edu/PEPVAC/.


Year: 2005 PMID： 15980443 PMCID： PMC1160118 DOI： 10.1093/nar/gki357

Source DB: PubMed Journal: Nucleic Acids Res ISSN： 0305-1048 Impact factor: 16.971

INTRODUCTION

T-cells are the key component of the adaptive immune system, playing a pivotal role in fighting both infectious agents and cancer cells (1). T-cell-based immune responses are driven by antigenic peptides (epitopes), presented in the context of major histocompatibility complex (MHC) molecules (2). Therefore, the prediction of peptides that can bind to MHC molecules has become the basis for the anticipation of T-cell epitopes (3). MHC molecules fall into two major classes, namely MHC class I (MHCI) and MHC class II (MHCII). Antigens presented by MHCI and MHCII are recognized by two distinct sets of T-cells, CD8+ T-cells and CD4+ T-cells, respectively. Identification of T-cell epitopes is important for both understanding disease pathogenesis and vaccine design. Thus, the availability of computational methods that can readily identify potential epitopes from primary protein sequences has fueled a new paradigm in vaccine development that is driven by this epitope discovery. A major complication to this vaccine development approach is the extreme polymorphism of the MHC molecules. In humans, MHC molecules are known as human leukocyte antigens (HLAs), and there are hundreds of allelic variants of the class I (HLA I) and the class II (HLA II) molecules. These HLA allelic variants bind distinct sets of peptides, as MHC polymorphism is the basis for peptide-binding specificity (4), and are expressed at vastly variable frequencies in different ethnic groups (5). This complexity suggests that a large number of HLA molecules will have to be targeted for peptide-binding predictions, requiring so many peptides for a broadly protective multi-epitope vaccine as to be impractical. Interestingly, groups of several HLA molecules (supertypes) can bind largely overlapping sets of peptides (6,7).
The identification of these HLA supertypes facilitates epitope-based vaccine development for the following two reasons: first, targeting of representative HLA alleles from distinct supertypes allows the immune response to be stimulated in a variety of genetic backgrounds; second, the selection of promiscuous peptide binders to those alleles included within a given supertype limits the number of peptides to be considered without decreasing the spectrum of the immune response. In this paper, we describe a web server, PEPVAC (Promiscuous EPitope-based VACcine), that allows the prediction of promiscuous epitopes to five HLA I supertypes: A2 (A*0201-07, A*0209 and A*6802), A3 (A*0301, A*1101, A*3101, A*3301, A*6801 and A*6601), A24 (A*2402 and B*3801), B7 (B*0702, B*3501, B*5101-02, B*5301 and B*5401) and B15 (A*0101, B*1501_B62 and B*1502). These supertypes were defined using a method based on the clustering of the predicted peptide-binding repertoire of MHC molecules (8). The combined phenotypic frequency of these supertypes is >95% for five major American ethnicities (Black, Caucasian, Hispanic, Native American and Asian). Thus, targeting these supertypes with epitope predictions would potentially provide a population coverage ≥95%, regardless of ethnicity. Peptides binding to HLA I molecules are potential CD8+ T-cell epitopes. In vivo, the C-terminus of these antigenic epitopes results from the selective proteolysis of cytosolic proteins mediated by the proteasome (9). The proteasome is thus important for determining these epitopes. Therefore, PEPVAC has also been implemented with an algorithm for the identification of those peptides containing a C-terminus that is likely to be the result of proteasomal cleavage. Finally, PEPVAC also allows the prediction of conserved epitopes from sequences with variability masked. The combination of these two features serves both to refine the predictions of T-cell epitopes and to limit the number of potential epitopes.

Prediction of peptide-MHCI binding

The peptide-binding mode of MHCI molecules differs from that of MHCII (10–12), and as a result, the prediction of peptide-MHCII binding is less reliable than that of peptide-MHCI binding. Therefore, we have focused here on the prediction of MHCI ligands, a class that is specifically recognized by CD8+ cytotoxic T lymphocytes. Peptides binding to a specific MHCI molecule are related by sequence similarity, and thus we use position-specific scoring matrices (PSSMs) derived from aligned MHCI ligands as the predictors of peptide-MHCI binding in combination with a dynamic algorithm. PSSMs are also known as profiles and weight matrices and have previously been shown to be adequate tools for the prediction of peptide-MHC binding (13–16). PSSMs are derived from block alignments of MHCI ligands that are of the same length. Such a restriction guarantees proper structural alignment of ligands and subsequent accuracy of the peptide-binding predictions (13,14). Given that MHCI ligands are usually nine residues in length, the PSSMs used in this study predict ligands of that same size (nine residues). The accuracy of the prediction of peptide-MHCI binding using PSSMs varies depending on the threshold and the targeted MHCI molecule. On average, however, ROC analyses of the predictions at different thresholds result in AUC values (Area Under ROC Curve) above 0.8, indicating that these PSSMs are very good predictors of peptide-MHCI binding. Furthermore, >80% of known CD8+ T-cell epitopes can be predicted at a 2% threshold from their protein sources.
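As a minimal illustration of PSSM-based scoring, the following Python sketch scores every 9mer of a protein against a matrix and keeps the top-scoring fraction. The matrix layout (a list of nine residue-to-score dictionaries), the function names and the reading of the 2% threshold as "the top 2% of scored 9mers" are assumptions made for this example; the sketch does not reproduce the server's actual scoring algorithm.

```python
from typing import Dict, List, Tuple

PEPTIDE_LEN = 9

# Hypothetical layout: one dict per peptide position, mapping residue -> score.
Pssm = List[Dict[str, float]]


def score_peptide(peptide: str, pssm: Pssm) -> float:
    """Sum the position-specific score of each residue in the peptide."""
    return sum(pssm[i][aa] for i, aa in enumerate(peptide))


def predict_binders(protein: str, pssm: Pssm,
                    threshold: float = 0.02) -> List[Tuple[str, float]]:
    """Return the top-scoring fraction of 9mers (assumed meaning of the threshold)."""
    peptides = [protein[i:i + PEPTIDE_LEN]
                for i in range(len(protein) - PEPTIDE_LEN + 1)]
    scored = sorted(((p, score_peptide(p, pssm)) for p in peptides),
                    key=lambda item: item[1], reverse=True)
    # Keep at least one peptide so that short proteins still return a result.
    n_keep = max(1, round(len(scored) * threshold))
    return scored[:n_keep]
```

Ranking by a summed score and cutting at a fixed fraction keeps the number of predicted binders proportional to protein length, which is what makes a percentage threshold comparable across alleles.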

Supertypes: identification and population coverage analysis

We defined HLA I supertypes through clustering of predicted MHC peptide-binding repertoires (8). In brief, the core of the method consists of the generation of a distance matrix whose coefficients are inversely proportional to the number of peptide binders shared by any two HLA molecules (Figure 1). Subsequently, this distance matrix is fed to a phylogenic clustering algorithm to establish the kinship among the distinct HLA peptide-binding repertoires. Figure 2 shows a phylogenic tree built upon the peptide-binding repertoire of 55 HLA I molecules, using a Fitch and Margoliash clustering algorithm (17). We defined supertypes (Figure 2) as groups of HLA I alleles with ≥20% peptide-binding overlap (pairwise between any pair of alleles). The supertypes identified in this study include the A2, A3, B7, B27 and B44 supertypes previously identified by Sidney et al. (16). Furthermore, we have also identified three new supertypes, BX, B15 and B57 (Figure 2). The cumulative phenotypic frequency (CPF) of these supertypes is shown in Table 1. CPF was calculated using the gene and haplotype frequencies reported for five distinct American ethnic groups including Blacks, Caucasians, Hispanics, North American Natives and Asians (18). CPF represents the population coverage that would be provided by a vaccine composed of epitopes restricted by the alleles included in the supertype. The A2, A3 and B7 supertypes have the largest CPF in the five studied ethnic groups, close to 90%, irrespective of ethnicity. To increase the population coverage to ≥95%, regardless of ethnicity, it is necessary to include at least two more supertypes. Specifically, the supertypes A2, A3, B7, B15 and A24/B44 represent the minimal supertypic combinations with the indicated population coverage. Alleles belonging to each of these supertypes are shown in Figure 2 and Table 1.
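The overlap and distance computation described above can be sketched in Python as follows. The normalization of the overlap (shared binders divided by the larger repertoire) and the distance form `1 - overlap` are assumptions chosen for illustration, since the text only states that distances are inversely related to the shared binders; the clustering step itself (Fitch and Margoliash) is not reproduced here.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def overlap_fraction(a: Set[str], b: Set[str]) -> float:
    """Shared binders as a fraction of the larger repertoire (assumed normalization)."""
    largest = max(len(a), len(b))
    return len(a & b) / largest if largest else 0.0


def distance_matrix(repertoires: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Pairwise distances, here taken as 1 - overlap (an assumed functional form)."""
    return {(x, y): 1.0 - overlap_fraction(repertoires[x], repertoires[y])
            for x, y in combinations(sorted(repertoires), 2)}


def meets_supertype_criterion(alleles: Iterable[str],
                              repertoires: Dict[str, Set[str]],
                              min_overlap: float = 0.20) -> bool:
    """True if every pair of alleles shares at least min_overlap of their binders."""
    return all(overlap_fraction(repertoires[x], repertoires[y]) >= min_overlap
               for x, y in combinations(list(alleles), 2))
```

Because all repertoires are predicted from the same random protein at the same threshold, they have the same size, so the choice of denominator does not change the result in that setting.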

Figure 1

Strategy to define HLA I supertypes. HLA I supertypes are identified by clustering their peptide-binding repertoire (8). The method consists of four basic steps. (i) Predict the peptide-binding repertoire (i,j sets in figure) of each HLA I molecule from the same random protein using the relevant PSSMs in combination with the RANKPEP scoring algorithm (13). (ii) Compute the number of common peptides between the binding repertoire of any two HLA I molecules. (iii) Build a distance matrix whose coefficients are inversely proportional to the peptide-binding overlap between any pair of HLA I molecules. (iv) Use a phylogenic clustering algorithm to compute and visualize HLA I supertypes (clusters of HLA I molecules with overlapping peptide-binding repertoires).

Figure 2

HLA I peptide-binding overlap and supertypes. The figure shows an unrooted dendrogram built after clustering the overlap between the peptide-binding repertoires of the indicated HLA I molecules. Peptide-binding repertoires of HLA I molecules were obtained from a random protein (1000 amino acids) using the relevant PSSMs at a 2% peptide-binding threshold. This dendrogram reflects the relationship between the peptide-binding specificities of HLA I molecules. HLA I alleles with similar peptide-binding specificities branch together in groups or clusters. The closer HLA I alleles branch, the larger is the overlap between their peptide-binding repertoires. Supertypes (shadowed with different colors) consist of groups of HLA I alleles with at least a 20% peptide-binding overlap (pairwise between any pair of alleles).

Table 1

Cumulative phenotype frequency of defined supertypes

Supertype	Alleles	Blacks (%)	Caucasians (%)	Hispanics (%)	North American natives (%)	Asians (%)
A2	A*0201-7, A*6802	43.7	49.9	51.8	52.4	44.7
A3	A*0301, A*1101, A*3101, A*3301, A*6801, A*6601	35.4	46.9	41.5	40.7	47.9
B7	B*0702, B*3501, B*5101-02, B*5301, B*5401	45.9	42.2	40.5	52.0	31.3
B15	A*0101, B*1501_B62, B*1502	13.06	37.80	16.75	27.26	21.04
A24	A*2402, B*3801	15.5	17.28	25.85	41.94	35.0
B44	B*4402, B*4403	10.4	27.7	17.15	14.4	10.1
B57	B*5701-02, B*5801, B*1503	19.2	10.3	5.9	5.8	16.5
ABX	A*2902, B*4002	7.4	11.3	19.1	16.3	16.3
B27	B*2701-06, B*2709, B*3909	2.3	4.8	5.1	16.9	4.7
BX	B*1509, B*1510, B*39011	3.1	0.7	4.2	7.8	4.1

Cumulative phenotype frequency was obtained using the HLA I gene and haplotype frequencies published by Cao et al. (18) corresponding to the indicated five American ethnic groups. The method for computing the cumulative phenotype frequency considered the linkage disequilibrium between the HLA-A and -B genes and was based on that reported by Dawson et al. (21).
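One way to see how haplotype frequencies translate into population coverage is the textbook Hardy-Weinberg calculation sketched below in Python: an individual is covered unless both inherited HLA-A/HLA-B haplotypes lack every targeted allele. This is an illustrative assumption and not necessarily the exact procedure of Dawson et al. (21); the function name and the input layout are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical input: (HLA-A allele, HLA-B allele) -> haplotype frequency.
HaplotypeFreqs = Dict[Tuple[str, str], float]


def cumulative_phenotype_frequency(haplotypes: HaplotypeFreqs,
                                   targeted: Set[str]) -> float:
    """Fraction of individuals carrying at least one targeted allele.

    Assumes Hardy-Weinberg proportions at the haplotype level, so that
    linkage between HLA-A and HLA-B is captured by the haplotype table.
    """
    covered = sum(freq for (a_allele, b_allele), freq in haplotypes.items()
                  if a_allele in targeted or b_allele in targeted)
    # An individual is uncovered only if both haplotypes are uncovered.
    return 1.0 - (1.0 - covered) ** 2
```

Working from haplotype rather than single-gene frequencies avoids double counting individuals who carry a targeted HLA-A allele and a targeted HLA-B allele on the same chromosome.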

PEPVAC web server

Following the HLA I supertypic analysis discussed above, we have implemented a tool for the prediction of promiscuous peptide binders to a set of supertypes with a CPF >95%, irrespective of ethnicity. We named this tool PEPVAC, and it is online at the site hosted by the Molecular Immunology Foundation/Dana-Farber Cancer Institute. The web interface to PEPVAC is divided into several sections that facilitate intuitive use (Figure 3A). The main features of the web server are discussed below.

Figure 3

The PEPVAC web server. (A) PEPVAC input page. The page is divided into several sections. E-MAIL, for obtaining the results via e-mail (optional). GENOMES, where a selection of genomes from pathogenic organisms is available, as well as the possibility of uploading a user-provided genome. SUPERTYPES, the supertypes A2, A3, B7, A24, and B15 are available for selection. Alleles targeted for peptide-binding predictions in each supertype are indicated. The minimum population coverage of the selected supertypes is calculated on the fly and shown in the relevant window. PROTEASOMAL CLEAVAGE, prediction of proteasomal cleavage using three optimal language models is carried out in parallel to the peptide-binding predictions. (B) PEPVAC result page. An example result page where the A3 supertype was selected for peptide-binding predictions from the genome of Influenza A virus (A/PR/8/34) is shown. The result page first displays a summary of the predictions, followed by the predicted peptide binders to each of the selected supertypes (only A3 in the shown example). Peptides highlighted in violet contain a C-terminal residue that is predicted to be the result of proteasomal cleavage. If the proteasomal cleavage filter is checked ON in the input page, only violet peptides will be shown.

Input and limitations

In PEPVAC, the input query for epitope predictions is entered in the GENOME section (Figure 3A). Input consists of one or more protein sequences in FASTA format. Only the standard 20 amino acid residues are considered. There are several translated genomes from pathogenic organisms that can be selected as inputs. More usefully, a user-provided local file containing a set of protein sequences can be uploaded to the server using the choose/browse button. PEPVAC can also process files with protein sequences in which the variable sites have been masked with a dot ‘.’ symbol. In that case, peptide-binding predictions will be carried out only over consecutive stretches of nine or more residues. Sequences with variable positions masked according to the Shannon entropy variability metric (4,19) can be obtained at the site . Currently, there is a limit of 200 sequences and 50 000 symbols that can be processed per request. If such limits are exceeded, the server will return an error.
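The input handling described above can be sketched in Python as follows: parse FASTA text, enforce the request limits, and enumerate 9mers only within unmasked stretches of nine or more standard residues. All names are hypothetical, and two details are assumptions rather than documented server behavior: the symbol limit is counted over all sequence characters, and non-standard residues break a stretch in the same way as a masking dot.

```python
import re
from typing import Dict, Iterator, Tuple

MAX_SEQUENCES = 200
MAX_SYMBOLS = 50_000
PEPTIDE_LEN = 9
# Runs of nine or more standard residues; '.' or any other symbol ends a run.
STRETCH = re.compile(r"[ACDEFGHIKLMNPQRSTVWY]{9,}")


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into a header -> sequence mapping."""
    records: Dict[str, str] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip() or f"seq{len(records) + 1}"
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def check_limits(records: Dict[str, str]) -> None:
    """Raise ValueError if the request exceeds the stated limits."""
    if len(records) > MAX_SEQUENCES:
        raise ValueError("more than 200 sequences in one request")
    if sum(len(seq) for seq in records.values()) > MAX_SYMBOLS:
        raise ValueError("more than 50 000 symbols in one request")


def candidate_peptides(sequence: str) -> Iterator[Tuple[int, str]]:
    """Yield (start position, 9mer) pairs from each unmasked stretch."""
    for stretch in STRETCH.finditer(sequence):
        segment = stretch.group()
        for offset in range(len(segment) - PEPTIDE_LEN + 1):
            yield stretch.start() + offset, segment[offset:offset + PEPTIDE_LEN]
```

Restricting the enumeration to unmasked stretches means that every returned 9mer lies entirely within conserved positions, which is how masking yields conserved epitope candidates.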

Supertypes and thresholds

The A2, A3, B7, B15 and A24 (Figure 2 and Table 1) supertypes have been chosen for promiscuous peptide-binding predictions in PEPVAC. Only those peptides that are predicted to bind to all the alleles included in the supertypes are returned in the output (Figure 3B). Threshold for the prediction of promiscuous peptide binders in PEPVAC has been fixed to provide a reduced and manageable set of promiscuous peptide binders to each supertype. As an example, predicted promiscuous peptides to the above five supertypes from a genome, such as that of Influenza virus A (4160 amino acids) distributed in 10 distinct open reading frames, represent only 5.51% (254 9mer peptides) of all possible peptides (4617 9mer peptides).
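The selection rule stated above, that a peptide is reported only if it is predicted to bind every allele in the supertype, amounts to a set intersection. The Python sketch below assumes a hypothetical input of per-allele predicted binders with their scores, and ranks the shared peptides by the score for the first allele listed, following the description of the server output; the function name and data layout are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical input: allele -> {predicted binder peptide -> PSSM score}.
BinderScores = Dict[str, Dict[str, float]]


def promiscuous_binders(scores: BinderScores,
                        supertype_alleles: Sequence[str]) -> List[Tuple[str, float]]:
    """Peptides predicted to bind all alleles, ranked by the first allele's score."""
    if not supertype_alleles:
        return []
    # Intersect the predicted binder sets of every allele in the supertype.
    shared = set(scores[supertype_alleles[0]])
    for allele in supertype_alleles[1:]:
        shared &= set(scores[allele])
    first = scores[supertype_alleles[0]]
    return sorted(((pep, first[pep]) for pep in shared),
                  key=lambda item: item[1], reverse=True)
```

Requiring binding to all alleles is the strictest possible notion of promiscuity, which is why the resulting peptide set is small relative to the full set of 9mers.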

Proteasome cleavage

In PEPVAC, predictions of supertypic peptide binders are combined with the prediction of proteasomal cleavage using probabilistic language models derived from HLA I-restricted epitopes (14). Currently, there are three optional models for proteasomal cleavage that differ in the sensitivity/specificity ratio of their predictions, as discussed elsewhere (14). These models are selected within the PROTEASOME CLEAVAGE section. Model 1 has the highest sensitivity (∼95%) and the lowest specificity (∼60%). Conversely, Model 3 has the lowest sensitivity (65%) with the largest specificity (80%). Model 2 has a sensitivity and specificity of ∼70%. Promiscuous peptide binders containing a C-terminal end predicted to be the result of proteasomal cleavage are shown in violet in the result page (Figure 3B). In the previous example with the Influenza virus A, the list of promiscuous peptide binders to the five selected supertypes decreases from 254 down to 170 peptides (3.7% of all 9mer peptides from Influenza virus A genome) after considering proteasomal cleavage using Model 1. Furthermore, a combination of the predictions of peptide-MHCI binding and proteasomal cleavage increases the specificity of the epitope predictions by discarding predicted peptide-MHCI binders that are experimentally unable to elicit CD8+ T-cell responses (20).

Output

The results page returned by PEPVAC is shown in Figure 3B. This page first displays a summary of the predictions, including the chosen selections, the number of predicted peptides and the minimum population coverage provided by the supertypic selection, followed by the predicted peptide binders to each of the selected supertypes (only A3 in the shown example). Peptides are predicted to bind to all alleles included in the supertype, and appear ranked with regard to the PSSMs of the first allele included in the supertype. Relevant information about each sorted peptide includes its protein source as well as its molecular weight.

18 in total

Review 1. Nine major HLA class I supertypes account for the vast preponderance of HLA-A and -B polymorphism.

Authors: A Sette; J Sidney
Journal: Immunogenetics Date: 1999-11 Impact factor: 2.846

2. Examining the independent binding assumption for binding of peptide epitopes to MHC-I molecules.

Authors: Björn Peters; Weiwei Tong; John Sidney; Alessandro Sette; Zhiping Weng
Journal: Bioinformatics Date: 2003-09-22 Impact factor: 6.937

3. Genome-wide characterization of a viral cytotoxic T lymphocyte epitope repertoire.

Authors: Weimin Zhong; Pedro A Reche; Char-Chang Lai; Bruce Reinhold; Ellis L Reinherz
Journal: J Biol Chem Date: 2003-09-05 Impact factor: 5.157

4. Prediction of MHC class I binding peptides using profile motifs.

Authors: Pedro A Reche; John-Paul Glutting; Ellis L Reinherz
Journal: Hum Immunol Date: 2002-09 Impact factor: 2.850

Review 5. Immunoinformatics and the prediction of immunogenicity.

Authors: Darren R Flower; Irini A Doytchinova
Journal: Appl Bioinformatics Date: 2002

6. Enhancement to the RANKPEP resource for the prediction of peptide binding to MHC molecules using profiles.

Authors: Pedro A Reche; John-Paul Glutting; Hong Zhang; Ellis L Reinherz
Journal: Immunogenetics Date: 2004-09-03 Impact factor: 2.846

7. Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach.

Authors: Morten Nielsen; Claus Lundegaard; Peder Worning; Christina Sylvester Hvid; Kasper Lamberth; Søren Buus; Søren Brunak; Ole Lund
Journal: Bioinformatics Date: 2004-02-12 Impact factor: 6.937

8. HLA supertypes and supermotifs: a functional perspective on HLA polymorphism.

Authors: A Sette; J Sidney
Journal: Curr Opin Immunol Date: 1998-08 Impact factor: 7.486

9. Analysis of the frequencies of HLA-A, B, and C alleles and haplotypes in the five major ethnic groups of the United States reveals high levels of diversity in these loci and contrasting distribution patterns in these populations.

Authors: K Cao; J Hollenbach; X Shi; W Shi; M Chopek; M A Fernández-Viña
Journal: Hum Immunol Date: 2001-09 Impact factor: 2.850

10. Sequence variability analysis of human class I and class II MHC molecules: functional and structural correlates of amino acid polymorphisms.

Authors: Pedro A Reche; Ellis L Reinherz
Journal: J Mol Biol Date: 2003-08-15 Impact factor: 5.469
