Development of an epitope conservancy analysis tool to facilitate the design of epitope-based diagnostics and vaccines.

Huynh-Hoa Bui¹, John Sidney, Wei Li, Nicolas Fusseder, Alessandro Sette.

Abstract

BACKGROUND: In an epitope-based vaccine setting, the use of conserved epitopes would be expected to provide broader protection across multiple strains, or even species, than epitopes derived from highly variable genome regions. Conversely, in a diagnostic and disease monitoring setting, epitopes that are specific to a given pathogen strain, for example, can be used to monitor responses to that particular infectious strain. In both cases, concrete information pertaining to the degree of conservancy of the epitope(s) considered is crucial.
RESULTS: To assist in the selection of epitopes with the desired degree of conservation, we have developed a new tool to determine the variability of epitopes within a given set of protein sequences. The tool was implemented as a component of the Immune Epitope Database and Analysis Resources (IEDB), and is directly accessible at http://tools.immuneepitope.org/tools/conservancy.
CONCLUSION: An epitope conservancy analysis tool was developed to analyze the variability or conservation of epitopes. The tool is user friendly, and is expected to aid in the design of epitope-based vaccines and diagnostics.

Substances：
Epitopes
Vaccines

Year: 2007 PMID： 17897458 PMCID： PMC2233646 DOI： 10.1186/1471-2105-8-361

Source DB: PubMed Journal: BMC Bioinformatics ISSN： 1471-2105 Impact factor: 3.169

Background

An epitope can be defined as a group of amino acids derived from a protein antigen that interacts with antibodies or T-cell receptors, thereby activating an immune response. Epitopes can be classified as either continuous or discontinuous. Continuous epitopes, also known as linear or sequential epitopes, are composed of amino acid residues that are contiguous in their primary protein sequence. Conversely, discontinuous epitopes, also known as assembled or conformational epitopes, are composed of amino acid residues that are typically present in different protein regions, but which are brought together by protein folding. Recognition of T cell epitopes typically depends upon processing of antigenic proteins, and as a result T cell epitopes are usually continuous. B cell epitopes, often recognized in the native protein context, may be either continuous or discontinuous. Pathogenic proteins, in general, and epitopes in particular, are often variable. The degree of variability or similarity of specific proteins or protein regions can provide important information regarding evolutionary, structural, functional, and immunological correlates. Given a set of homologous proteins, phylogenetic relationships can be constructed and used to calculate the evolutionary rate at each amino acid site. Regions that evolve slowly are considered "conserved" while those that evolve rapidly are considered "variable". This approach is widely used in sequence conservation identification and mapping programs such as ConSeq [1] and ConSurf [2,3]. However, to fully describe and characterize protein and/or epitope variability, measures of identity and conservancy are typically utilized. Identity refers to the extent to which two amino acid sequences are invariant, and is measured as the percentage of identical amino acids in the alignment of two sequences. Conservancy is defined as the fraction of protein sequences that contain the epitope considered at or above a specified level of identity. 
Conversely, the fraction of protein sequences that contain the epitope considered below a specified level of identity reflects the degree of variability or uniqueness of the epitope. Amino acid residues that are crucial for retention of protein function are believed to be associated with intrinsically lower variability, even under immune pressure. As such, these regions often represent good targets for the development of epitope-based vaccines, as the epitopes targeted can be expected to be present irrespective of disease stage or of the particular strain of the pathogen. Furthermore, these same residues are often highly conserved across related species, as has been found in several instances in the context of the Poxviridae [4]. As a result, a vaccine containing such conserved epitopes might be effective in providing broad-spectrum protection. Conversely, in a diagnostic and disease monitoring setting, epitopes that are specific to a given pathogen can be used to monitor responses to that particular infectious strain, removing the confounding influence of immune responses derived from previous exposures to partially cross-reactive strains or organisms. Herein, to assist in the selection of epitopes having a desired level of conservation or, conversely, variability, we have developed an epitope conservancy analysis tool. The tool has been specifically designed to determine the degree of conservation or variability associated with a specific epitope within a given set of protein sequences. Although our emphasis is on epitope identification, the tool can also be utilized for other purposes, such as tracking mutation of epitopes during disease progression. This tool was implemented as a component of the Immune Epitope Database and Analysis Resources (IEDB) [5-7] and was used in predicting the cross-reactivity of influenza A epitopes [8].

Implementation

Approach

Given an epitope sequence e and a set of protein sequences {p}, our approach is to find the best local alignment(s) of e on each p. The degree of conservation of e within {p} is calculated as the fraction of {p} that matched the aligned e above a chosen identity level. Two separate processes were developed for assessing the degree of conservation/variability of continuous and discontinuous epitope sequences.
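The approach can be written compactly as follows. This is a sketch in notation assumed here rather than taken from the tool: e is the epitope of length m, p a protein of length n, P the protein set, t the identity threshold in percent, and I, C and V are illustrative names for identity, conservancy and variability. The threshold is treated as "at or above", as in the continuous-sequence procedure and in Table 1.

```latex
% Best ungapped identity (in %) of epitope e on protein p, with n >= m.
% p_{j+i-1} is residue i of the window that starts at position j of p.
I(e, p) = \max_{1 \le j \le n-m+1} \; \frac{100}{m} \sum_{i=1}^{m} \mathbf{1}\left[ e_i = p_{j+i-1} \right]

% Conservancy and variability of e within the protein set P at threshold t.
C_t(e, P) = \frac{\left|\{\, p \in P : I(e, p) \ge t \,\}\right|}{|P|},
\qquad
V_t(e, P) = 1 - C_t(e, P)
```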

Continuous sequence

If e is continuous, the process of finding the best alignment of e on p involves breaking p into sub-sequences {s} of length equal to that of e and comparing e to each s. For a p sequence of length n and an e sequence of length m, a total of n-m+1 different {s} sequences are generated. For each comparison of e and s, the degree of identity is calculated as the percentage of residues that are identical between the two sequences. If p contains repeat regions, or the identity threshold is low, multiple alignments may be found for e. However, the s sequence(s) associated with the maximum identity score determines the alignment(s) of e on p. The degree of conservation of e is then calculated as the percentage of p sequences in which e is aligned with an identity level at or above a chosen threshold. Conversely, the degree of variability is calculated as the fraction of p sequences in which e is aligned below the chosen threshold. An illustrative conservancy analysis of a continuous epitope sequence is shown in Table 1.
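The window-by-window comparison described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the tool's Java implementation: the function name best_ungapped_alignments and the example protein string are hypothetical.

```python
from typing import List, Tuple


def best_ungapped_alignments(epitope: str, protein: str) -> Tuple[float, List[int]]:
    """Return (maximum identity in %, 0-based start positions that reach it).

    A window of len(epitope) is slid over the protein without gaps, so a
    protein of length n and an epitope of length m yield n - m + 1 windows.
    """
    m, n = len(epitope), len(protein)
    if m == 0 or n < m:
        return 0.0, []
    best, starts = -1, []
    for i in range(n - m + 1):
        window = protein[i:i + m]
        matches = sum(a == b for a, b in zip(epitope, window))
        if matches > best:
            best, starts = matches, [i]   # new maximum: reset the hit list
        elif matches == best:
            starts.append(i)              # tie: keep every best alignment
    return 100.0 * best / m, starts


# Hypothetical protein that embeds the Strain 1 variant of the epitope.
print(best_ungapped_alignments("FLPSDFFPSV", "MAAYLPSDFFPSIGG"))  # (80.0, [3])
```

Keeping every start position that ties for the maximum reflects the case in which repeat regions produce more than one best alignment.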

Table 1

Example conservancy analysis of a continuous sequence

	Reference sequence¹										Identity²

Source	F	L	P	S	D	F	F	P	S	V	No.	(%)
Strain 1	Y	L	P	S	D	F	F	P	S	I	8	(80)
Strain 2	F	L	P	S	D	F	F	P	S	V	10	(100)
Strain 3	R	L	P	S	K	Q	F	P	S	V	7	(70)
Strain 4	Y	E	P	T	D	F	F	P	S	V	7	(70)
Strain 5	F	L	P	T	D	F	F	P	S	V	9	(90)
Strain 6	F	L	P	T	D	F	S	F	T	V	6	(60)
Strain 7	F	L	P	S	D	F	F	P	S	V	10	(100)
Strain 8	Y	E	P	S	E	F	S	F	S	I	4	(40)
Strain 9	F	L	P	S	E	F	F	P	S	V	9	(90)
Strain 10	F	L	P	S	D	F	F	P	S	V	10	(100)

Total: conservancy at identity threshold ≥ 80%											6	(60)
Total: variability at identity threshold <80%											4	(40)

1. Residues that are different from that of the corresponding residue in the reference sequence are highlighted in bold.

2. Identity indicates the number (%) of residues in the homologous sequence that are identical to the corresponding residue in the reference sequence.

3. Totals indicate the number (%) of strains in which the reference sequence is found with an identity above or below the indicated threshold.
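The totals in Table 1 can be reproduced with a short script. The sequences below are transcribed from the table; the function names are illustrative assumptions and do not correspond to the tool's code.

```python
# Reference epitope and the ten strain sequences transcribed from Table 1.
REFERENCE = "FLPSDFFPSV"
STRAINS = [
    "YLPSDFFPSI",  # Strain 1
    "FLPSDFFPSV",  # Strain 2
    "RLPSKQFPSV",  # Strain 3
    "YEPTDFFPSV",  # Strain 4
    "FLPTDFFPSV",  # Strain 5
    "FLPTDFSFTV",  # Strain 6
    "FLPSDFFPSV",  # Strain 7
    "YEPSEFSFSI",  # Strain 8
    "FLPSEFFPSV",  # Strain 9
    "FLPSDFFPSV",  # Strain 10
]


def identical_residues(reference: str, sequence: str) -> int:
    """Count positions at which two equal-length sequences carry the same residue."""
    if len(reference) != len(sequence):
        raise ValueError("sequences must have equal length")
    return sum(a == b for a, b in zip(reference, sequence))


def conservancy_counts(reference, sequences, threshold_pct):
    """Return (number at or above the threshold, number below it)."""
    m = len(reference)
    # Integer comparison avoids floating-point edge cases at the threshold.
    conserved = sum(
        100 * identical_residues(reference, s) >= threshold_pct * m
        for s in sequences
    )
    return conserved, len(sequences) - conserved


counts = [identical_residues(REFERENCE, s) for s in STRAINS]
conserved, variable = conservancy_counts(REFERENCE, STRAINS, 80)
print(counts)               # [8, 10, 7, 7, 9, 6, 10, 4, 9, 10]
print(conserved, variable)  # 6 4
```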

Discontinuous sequence

If e is discontinuous, a continuous sequence pattern is first generated. For example, given a discontinuous sequence "A1, B3, C6" (meaning A is at position 1, B is at position 3 and C is at position 6), its matching sequence pattern is AXBXXC where X is any amino acid residue, and the number of X's between two nearest known amino acid residues is equal to the gap distance between them. Next, the same procedure described for continuous sequences is used to identify the best alignment(s) of e on p. The identity level is calculated based on the defined epitope residues. An illustration of a discontinuous sequence conservancy analysis is shown in Table 2. To obtain meaningful results, the program only performs calculations for discontinuous sequences consisting of at least three identified residues.
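The conversion of a discontinuous epitope into a continuous pattern can be sketched as below. The input syntax ("A1, B3, C6") follows the example in the text; the function name, the parsing rules and the error handling are assumptions made for illustration, and the tool's actual input handling may differ.

```python
import re


def discontinuous_to_pattern(spec: str, wildcard: str = "X") -> str:
    """Turn a string such as "A1, B3, C6" into a pattern such as "AXBXXC"."""
    residues = []
    for token in spec.split(","):
        match = re.fullmatch(r"\s*([A-Za-z])(\d+)\s*", token)
        if match is None:
            raise ValueError(f"cannot parse residue {token!r}")
        residues.append((int(match.group(2)), match.group(1).upper()))
    # Mirrors the stated restriction to at least three identified residues.
    if len(residues) < 3:
        raise ValueError("at least three defined residues are required")
    residues.sort()
    first, last = residues[0][0], residues[-1][0]
    # Start with wildcards only, then place each defined residue at its offset.
    pattern = [wildcard] * (last - first + 1)
    for position, amino_acid in residues:
        pattern[position - first] = amino_acid
    return "".join(pattern)


print(discontinuous_to_pattern("A1, B3, C6"))  # AXBXXC
```

Positions are taken relative to the first defined residue, so only the spacing between residues affects the resulting pattern.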

Table 2

Example conservancy analysis of a discontinuous sequence

	Reference sequence¹										Identity²

Source	F	X	X	X	D	F	F	X	X	V	No.	(%)
Strain 1	Y	L	P	S	D	F	F	P	S	I	3	(60)
Strain 2	F	L	P	S	D	F	F	P	S	V	5	(100)
Strain 3	R	L	P	S	K	Q	F	P	S	V	2	(40)
Strain 4	Y	E	P	T	D	F	F	P	S	V	4	(80)
Strain 5	F	L	P	T	D	F	F	P	S	V	5	(100)
Strain 6	F	L	P	T	D	F	S	F	T	V	4	(80)
Strain 7	F	L	P	S	D	F	F	P	S	V	5	(100)
Strain 8	Y	E	P	S	E	F	S	F	S	I	1	(20)
Strain 9	F	L	P	S	E	F	F	P	S	V	4	(80)
Strain 10	F	L	P	S	D	F	F	P	S	V	5	(100)

Total: conservancy at identity threshold ≥ 80%											7	(70)
Total: variability at identity threshold <80%											3	(30)

1. Residues that are not defined in the reference sequence are highlighted in italics. Residues that are different from that of the corresponding residue in the reference sequence are highlighted in bold.

2. Identity indicates the number (%) of residues in the homologous sequence that are identical to the corresponding residue in the reference sequence. Residues highlighted with gray shading are not considered in calculating the identity because they are not defined in the reference sequence.

3. Totals indicate the number (%) of strains in which the reference sequence is found with an identity above or below the indicated threshold.
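The values in Table 2 follow from counting matches at the defined positions only. The sketch below uses the pattern and sequences transcribed from the table; the function name is hypothetical, and treating "X" as the wildcard is an assumption that prevents a literal X from being scored as a defined residue.

```python
# Pattern and strain sequences transcribed from Table 2.
PATTERN = "FXXXDFFXXV"
STRAINS = [
    "YLPSDFFPSI", "FLPSDFFPSV", "RLPSKQFPSV", "YEPTDFFPSV", "FLPTDFFPSV",
    "FLPTDFSFTV", "FLPSDFFPSV", "YEPSEFSFSI", "FLPSEFFPSV", "FLPSDFFPSV",
]


def pattern_identity(pattern: str, window: str, wildcard: str = "X"):
    """Return (matches, defined), ignoring wildcard positions of the pattern."""
    if len(pattern) != len(window):
        raise ValueError("pattern and window must have equal length")
    defined = [(a, b) for a, b in zip(pattern, window) if a != wildcard]
    matches = sum(a == b for a, b in defined)
    return matches, len(defined)


scores = [pattern_identity(PATTERN, s) for s in STRAINS]
matches = [m for m, _ in scores]
# A strain counts as conserved when matches / defined is at least 80%.
conserved = sum(100 * m >= 80 * d for m, d in scores)
variable = len(STRAINS) - conserved
print(matches)              # [3, 5, 2, 4, 5, 4, 5, 1, 4, 5]
print(conserved, variable)  # 7 3
```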

Program description

The epitope conservancy analysis tool was implemented as a Java web-application. An overview of the tool is shown in Figure 1. As input, the program requires the user to provide an epitope set, consisting of one or more epitope sequences, and a set of protein sequences against which each epitope is compared to determine conservancy. Based on our experience, to achieve the best results it is recommended that the protein sequence set utilized be constructed such that redundancies are eliminated and the representation of different substrains and serotypes is balanced. To assist in assembling protein sequence sets, a "Browse for sequences in NCBI" link is provided. When this link is selected, a browser is opened, enabling the user to search for all available protein sequences in NCBI, grouped by organism taxonomic level. To reduce redundancies in the protein sequence set, the user can check the box at the bottom of the input form to have the program automatically remove all duplicated sequences in the protein data set used in the analysis. As output, the program will calculate the fraction of protein sequences that match each epitope sequence above or below a given identity level. The program also calculates the minimum and maximum matching identity level for each epitope. A position mapping of epitope sequences to matching protein sub-fragments is also provided and can be viewed by clicking on the "Go" link in the "View details" column. Detailed sequence mappings of an epitope to all protein sequences in a dataset are also generated. In some cases, if a protein sequence has significant repeat regions, or the level of matching identity is set at a low value, multiple matching protein sub-fragments can be found for a given epitope sequence. All calculation results can be downloaded as text files by clicking on the "Download data to file" button.
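The summary output described above (fraction of matching sequences, minimum and maximum identity, optional removal of duplicates) might be computed along the following lines. This is a sketch only: the function names, the dictionary keys and the default threshold are assumptions, and the sequences are those of Table 1 used as a stand-in protein set.

```python
from typing import Dict, List


def best_identity(epitope: str, protein: str) -> float:
    """Highest ungapped identity (%) of the epitope over all windows of the protein."""
    m = len(epitope)
    if m == 0 or len(protein) < m:
        return 0.0
    best = max(
        sum(a == b for a, b in zip(epitope, protein[i:i + m]))
        for i in range(len(protein) - m + 1)
    )
    return 100.0 * best / m


def summarize(epitope: str, proteins: List[str], threshold_pct: float = 80.0,
              remove_duplicates: bool = False) -> Dict[str, float]:
    """Summarize conservancy of one epitope within a protein set."""
    if remove_duplicates:
        # dict.fromkeys drops exact duplicates while keeping the input order.
        proteins = list(dict.fromkeys(proteins))
    if not proteins:
        raise ValueError("the protein set is empty")
    identities = [best_identity(epitope, p) for p in proteins]
    conserved = sum(identity >= threshold_pct for identity in identities)
    return {
        "n_sequences": len(proteins),
        "conserved_pct": 100.0 * conserved / len(proteins),
        "min_identity": min(identities),
        "max_identity": max(identities),
    }


TABLE_1 = [
    "YLPSDFFPSI", "FLPSDFFPSV", "RLPSKQFPSV", "YEPTDFFPSV", "FLPTDFFPSV",
    "FLPTDFSFTV", "FLPSDFFPSV", "YEPSEFSFSI", "FLPSEFFPSV", "FLPSDFFPSV",
]
print(summarize("FLPSDFFPSV", TABLE_1))
print(summarize("FLPSDFFPSV", TABLE_1, remove_duplicates=True))
```

With the Table 1 sequences, removing the three identical copies of the reference sequence lowers the conservancy at the 80% threshold from 60% (6 of 10) to 50% (4 of 8), which illustrates why redundancy in the protein set affects the result.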

Figure 1

An overview of the epitope conservancy analysis tool.

Results and discussion

To determine the degree of conservation of an epitope within a given set of protein sequences, it is necessary to align the epitope to each protein sequence. The degree of conservation is then calculated as the fraction of protein sequences that match the aligned epitope sequence above a defined identity level. Conversely, the degree of variability is calculated as the fraction of protein sequences that match the aligned epitope sequence below a defined identity level. For continuous epitopes, existing sequence searching and alignment tools, such as BLAST [9] or ClustalW [10], can be used to perform pair-wise local alignment of the epitope to a protein sequence. However, to be relevant in an immunological context, it is crucial that the entire epitope sequence is completely aligned with absolutely no gaps. This requirement entails the use of somewhat different parameters, making it cumbersome to use currently existing alignment tools for the characterization of immune epitopes. At the same time, there is no alignment tool currently available for analyzing discontinuous sequences. To rectify these shortcomings, we have developed a robust, user-friendly, epitope conservancy analysis tool. The tool has the capacity to simultaneously align and assess the degree of conservation/variability of each epitope, and can perform these functions for both linear and discontinuous peptide epitope sequences. For the purpose of developing cross-reactive vaccines that target highly variable pathogens, the use of epitopes conserved across different species is desired. Nevertheless, care should be taken to avoid selecting epitopes that are conserved between the pathogen and the host, as this could lead to undesirable induction of auto-immunity. Moreover, epitopes that are extremely conserved between species are sometimes less immunogenic because they may be derived from proteins that resemble similar proteins in the host.
As a result, they are less likely to be recognized by T cells due to self-tolerance. It should also be emphasized that conservation at the sequence level does not assure that the epitope will be equally recognized and cross-reactive. This is due to the differences in the antigen sequences from which the epitope is derived. For T cell epitopes, whether they will be processed in the first place is determined by flanking residues that are different for different antigens. Therefore, the same epitope sequence from different antigens may or may not be generated and subsequently presented to and recognized by T cell receptors. In the case of B cell epitopes, their recognition by an antibody is dependent on the antigen 3D structures. A sequence-wise conserved epitope may not be structurally conserved, as it can adopt different conformations in the context of the antigen structures. Exposed amino acids, as opposed to buried amino acids, are more important in determining the immunogenicity of a given segment of peptide. This is because only exposed residues, as observed in antigen:antibody co-crystals, can form contacts with the complementarity determining regions (CDRs) of the corresponding antibody. Those residues that are recognized by a single antibody are often defined as a discontinuous epitope. The epitope conservancy analysis tool developed here can be used to assess the pattern conservation of discontinuous epitopes. Nevertheless, pattern-wise conserved discontinuous epitopes may not be cross-reactive due to the unknown influence of neighboring and interspersed amino acids. As a result, if antigen structures are available, it may be better to predict cross-reactivity based on the epitope's 3D structural conservation. Depending on the specific needs of a user, an analysis of epitope conservancy may need to be performed at various phylogenetic levels.
For example, to determine the potential of a given epitope to be cross-reactive amongst different isolates of a pathogen, or with different microorganisms associated with different pathogenicity, it may be necessary to determine conservancy within a given substrain, type or clade, within a specific species, or within a genus or other higher phylogenetic classification group. This type of analysis was utilized previously to identify highly conserved HBV derived epitopes [11,12], and was also applied to identify HCV, P. falciparum and HIV derived epitopes [13-19]. Alternatively, to develop epitope-based diagnostic applications aimed at detecting all isolates of a given pathogen but not isolates from related strains, or aimed at detecting specific strains or isolates, it might be necessary to identify epitopes that are highly conserved in only a single or just a few isolates, and poorly conserved in others. Finally, the analysis of potential homologies with sequences expressed by a pathogen's host, or by an animal species to be used as an animal model, might be of particular relevance. We anticipate that its relevance might range from predicting poor responses due to self-tolerance and differential performance in animal species expressing different degrees of similarity with a given epitope, to predicting potential safety problems and autoreactivity linked to cross-reactive self-reactivity and molecular mimicry. For each of these broad applications, the analysis tool we have developed provides the means to easily assemble the protein sets required to undertake the appropriate analyses, and generates the information necessary to make the appropriate design decisions.
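An analysis at several phylogenetic levels amounts to running the same conservancy calculation on differently assembled protein sets. The sketch below assumes that the user has already grouped the sequences, and reports conservancy per group. The group labels "A" and "B" and the assignment of the Table 1 sequences to them are arbitrary and purely illustrative, as are the function names.

```python
from typing import Dict, List


def best_identity(epitope: str, protein: str) -> float:
    """Highest ungapped identity (%) of the epitope over all windows of the protein."""
    m = len(epitope)
    if m == 0 or len(protein) < m:
        return 0.0
    best = max(
        sum(a == b for a, b in zip(epitope, protein[i:i + m]))
        for i in range(len(protein) - m + 1)
    )
    return 100.0 * best / m


def conservancy_by_group(epitope: str, groups: Dict[str, List[str]],
                         threshold_pct: float = 80.0) -> Dict[str, float]:
    """Percentage of sequences per group that match at or above the threshold."""
    result = {}
    for label, proteins in groups.items():
        if not proteins:
            raise ValueError(f"group {label!r} is empty")
        hits = sum(best_identity(epitope, p) >= threshold_pct for p in proteins)
        result[label] = 100.0 * hits / len(proteins)
    return result


# Arbitrary, illustrative grouping of the Table 1 sequences (not real taxonomy).
GROUPS = {
    "A": ["YLPSDFFPSI", "FLPSDFFPSV", "FLPTDFFPSV", "FLPSEFFPSV"],
    "B": ["RLPSKQFPSV", "YEPTDFFPSV", "FLPTDFSFTV", "YEPSEFSFSI"],
}
print(conservancy_by_group("FLPSDFFPSV", GROUPS))  # {'A': 100.0, 'B': 0.0}
```

An epitope that scores high in the target group and low in all others would be a candidate for the strain-specific diagnostic use described above, whereas a vaccine candidate would be expected to score high in every group.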

Conclusion

To address the issue of conservation (or variability) of epitopes or, more broadly speaking, peptide sequences, we have developed a tool to calculate the degree of conservancy (or inversely, the variability) of an epitope within a given protein sequence set. Conservancy can be calculated following user-defined identity criteria, and minimal and maximal levels of conservancy are identified. Furthermore, the program provides detailed information for each alignment executed. This epitope conservancy analysis tool is publicly available and can be used to assist in the selection of epitopes with the desired pattern of conservation for designing epitope-based diagnostics and vaccines.

Availability and requirements

• Project name: Epitope Conservancy Analysis
• Project home page:
• Operating system(s): Platform independent
• Programming language: Java
• Other requirements: Java 1.4 or higher, Tomcat 4.0 or higher
• License: none
• Any restrictions to use by non-academics: none

Abbreviations

BLAST: Basic Local Alignment Search Tool
CDRs: Complementarity determining regions
IEDB: Immune Epitope Database and Analysis Resources
MSA: Multiple sequence alignment
NCBI: National Center for Biotechnology Information

Competing interests

The author(s) declares that there are no competing interests.

Authors' contributions

HHB developed the program. WL and NF participated in programming tasks. HHB, JS and AS wrote the manuscript. All authors read and approved the final version.

18 in total

1. Degenerate cytotoxic T cell epitopes from P. falciparum restricted by multiple HLA-A and HLA-B supertype alleles.

Authors: D L Doolan; S L Hoffman; S Southwood; P A Wentworth; J Sidney; R W Chesnut; E Keogh; E Appella; T B Nutman; A A Lal; D M Gordon; A Oloo; A Sette
Journal: Immunity Date: 1997-07 Impact factor: 31.745

2. The design and implementation of the immune epitope database and analysis resource.

Authors: Bjoern Peters; John Sidney; Phil Bourne; Huynh-Hoa Bui; Soeren Buus; Grace Doh; Ward Fleri; Mitch Kronenberg; Ralph Kubo; Ole Lund; David Nemazee; Julia V Ponomarenko; Muthu Sathiamurthy; Stephen P Schoenberger; Scott Stewart; Pamela Surko; Scott Way; Steve Wilson; Alessandro Sette
Journal: Immunogenetics Date: 2005-05-14 Impact factor: 2.846

Review 3. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors: S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal: Nucleic Acids Res Date: 1997-09-01 Impact factor: 16.971

4. HLA-DR-promiscuous T cell epitopes from Plasmodium falciparum pre-erythrocytic-stage antigens restricted by multiple HLA class II alleles.

Authors: D L Doolan; S Southwood; R Chesnut; E Appella; E Gomez; A Richards; Y I Higashimoto; A Maewal; J Sidney; R A Gramzinski; C Mason; D Koech; S L Hoffman; A Sette
Journal: J Immunol Date: 2000-07-15 Impact factor: 5.422

5. HLA-A0201, HLA-A1101, and HLA-B*0702 transgenic mice recognize numerous poxvirus determinants from a wide variety of viral gene products.

Authors: Valerie Pasquetto; Huynh-Hoa Bui; Rielle Giannino; Cindy Banh; Fareed Mirza; John Sidney; Carla Oseroff; David C Tscharke; Kari Irvine; Jack R Bennink; Bjoern Peters; Scott Southwood; Vincenzo Cerundolo; Howard Grey; Jonathan W Yewdell; Alessandro Sette
Journal: J Immunol Date: 2005-10-15 Impact factor: 5.422

6. Conserved hepatitis C virus sequences are highly immunogenic for CD4(+) T cells: implications for vaccine development.

Authors: V Lamonaca; G Missale; S Urbani; M Pilli; C Boni; C Mori; A Sette; M Massari; S Southwood; R Bertoni; A Valli; F Fiaccadori; C Ferrari
Journal: Hepatology Date: 1999-10 Impact factor: 17.425

7. Identification of A2-restricted hepatitis C virus-specific cytotoxic T lymphocyte epitopes from conserved regions of the viral genome.

Authors: P A Wentworth; A Sette; E Celis; J Sidney; S Southwood; C Crimi; S Stitely; E Keogh; N C Wong; B Livingston; D Alazard; A Vitiello; H M Grey; F V Chisari; R W Chesnut; J Fikes
Journal: Int Immunol Date: 1996-05 Impact factor: 4.823

8. Identification of HLA-A3 and -B7-restricted CTL response to hepatitis C virus in patients with acute and chronic hepatitis C.

Authors: K M Chang; N H Gruener; S Southwood; J Sidney; G R Pape; F V Chisari; A Sette
Journal: J Immunol Date: 1999-01-15 Impact factor: 5.422

9. ConSurf 2005: the projection of evolutionary conservation scores of residues on protein structures.

Authors: Meytal Landau; Itay Mayrose; Yossi Rosenberg; Fabian Glaser; Eric Martz; Tal Pupko; Nir Ben-Tal
Journal: Nucleic Acids Res Date: 2005-07-01 Impact factor: 16.971

10. The immune epitope database and analysis resource: from vision to blueprint.

Authors: Bjoern Peters; John Sidney; Phil Bourne; Huynh-Hoa Bui; Soeren Buus; Grace Doh; Ward Fleri; Mitch Kronenberg; Ralph Kubo; Ole Lund; David Nemazee; Julia V Ponomarenko; Muthu Sathiamurthy; Stephen Schoenberger; Scott Stewart; Pamela Surko; Scott Way; Steve Wilson; Alessandro Sette
Journal: PLoS Biol Date: 2005-03 Impact factor: 8.029
