Conformational B-cell epitope prediction on antigen protein structures: a review of current algorithms and comparison with common binding site prediction methods.

Bo Yao, Dandan Zheng, Shide Liang, Chi Zhang.

Abstract

Accurate prediction of B-cell antigenic epitopes is important for immunologic research and medical applications, but compared with other bioinformatic problems, antigenic epitope prediction is more challenging because of the extreme variability of antigenic epitopes, where the paratope on the antibody binds specifically to a given epitope with high precision. In spite of the continuing efforts in the past decade, the problem remains unsolved and therefore still attracts a lot of attention from bioinformaticists. Recently, several discontinuous epitope prediction servers became available, and it is intriguing to review all existing methods and evaluate their performances on the same benchmark. In addition, these methods are also compared against common binding site prediction algorithms, since they have been frequently used as substitutes in the absence of good epitope prediction methods.

Year: 2013 PMID: 23620816 PMCID: PMC3631208 DOI: 10.1371/journal.pone.0062249

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Antigenic epitopes are regions of the antigen protein surface that are preferentially recognized by antibodies. Prediction of B-cell antigenic epitopes directly helps the design of vaccine components and immuno-diagnostic reagents. B-cell antigenic epitopes are usually classified as either continuous or discontinuous. The majority of available epitope prediction methods focus on continuous epitopes [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], even though discontinuous epitopes dominate most antigenic epitope families [13]. Unfortunately, due to computational complexity and the limited number of known antibody-antigen complex structures, only a few methods exist for discontinuous epitope prediction: CEP [14], DiscoTope [15], BEpro(PEPITO) [16], ElliPro [17], SEPPA [18], EPITOPIA [19], [20], EPCES [21], EPSVR [22], EPMeta [22], and Bpredictor [23]. Because all current discontinuous epitope prediction methods require the three-dimensional (3D) structures of antigenic proteins, the small number of available antigen-antibody complex structures greatly limits the development of reliable methods. In addition, an unbiased benchmark set is much needed [21], [24].

Results

Performance of Structure-based Prediction Methods

In this review, we discuss and evaluate the conformational epitope predictors DiscoTope [15], BEpro(PEPITO) [16], ElliPro [17], SEPPA [18], EPITOPIA [19], [20], EPCES [21], EPSVR [22], Bpredictor [23], and EPMeta [22], for all of which web servers or freely downloadable software packages exist. DiscoTope [15] linearly combines two scores, the hydrophilicity scale and the epitope log-odds ratios, the latter of which is one kind of epitopic residue propensity score. BEpro(PEPITO) [16] also applies a linear combination to two scores: the epitopic residue propensity and the half sphere exposure values at multiple distances. ElliPro [17] uses a single score, the residue protrusion index (PI). SEPPA [18] employs the epitopic residue propensity and the compactness of the neighboring residues around a residue (contact number or flat surface), again using a linear combination. EPITOPIA [19], [20] applies a naive Bayesian classifier to forty-four physico-chemical and structural–geometrical attributes, including secondary structure, propensity, conservation, solvent accessible surface, and hydrophilicity. EPCES [21] devises a special linear method, using a voting mechanism for consensus, to integrate six scores: propensity, amino acid side-chain energy value, secondary structure composition, contact number, conservation score, and surface planarity score. Going one step further, EPSVR [22] uses the same attributes as EPCES [21] but integrates them with support vector regression (SVR). Bpredictor [23] applies a random forest classifier to attributes including an adjacent residue distance score, accessible surface area, conservation, secondary structure, and propensity. EPMeta is a meta server that combines EPSVR, EPCES, EPITOPIA, SEPPA, PEPITO, and DiscoTope 1.2.
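
As a rough illustration of the linear-combination approach described above, the following sketch standardizes two per-residue scores and adds them with fixed weights. The function names, the z-score normalization, the equal weights, and the toy numbers are all hypothetical; none of them is taken from DiscoTope, BEpro(PEPITO), or SEPPA.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a list of scores to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def combine_linear(propensity, exposure, w_propensity=0.5, w_exposure=0.5):
    """Weighted sum of two standardized per-residue scores.

    The weights are illustrative assumptions, not published parameters.
    """
    if len(propensity) != len(exposure):
        raise ValueError("score lists must have the same length")
    zp, ze = zscore(propensity), zscore(exposure)
    return [w_propensity * p + w_exposure * e for p, e in zip(zp, ze)]


# Toy scores for four surface residues (made-up numbers).
scores = combine_linear([0.2, 0.8, 0.5, 0.1], [10.0, 30.0, 20.0, 5.0])
# Residue indices ordered from most to least epitope-like.
ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Standardizing first keeps one score from dominating merely because of its units; a real predictor would fit the weights on training complexes rather than fix them by hand.
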
In general, the features used by these predictors include conservation scores, structural features such as secondary structure composition, geometric characteristics such as the protrusion index and planarity score, and amino acid features such as hydrophilicity and propensity (odds ratios). These attributes can be integrated by linear combination or by machine-learning algorithms such as naive Bayesian classifiers, SVR, and random forest classifiers. The number of features used by a predictor ranges from two scores to forty-four attributes. For a small number of attributes, a simple linear combination usually works well, whereas a large number of features often requires sophisticated machine-learning algorithms to integrate the scores optimally. Notably, some of these features may be mutually exclusive or overlapping. For example, the antigenic epitope is frequently located at either a protruding region or a flat surface. In such cases, linearly combining two incompatible terms contradicts the physical basis and will only degrade the performance of a predictor.
The above epitope predictors are trained with most or all of the available antigen-antibody complex structures obtained from X-ray diffraction on crystallized proteins. Therefore, the independent test set compiled by Liang et al. [22], which contains 19 protein monomer structures with epitope information derived from experimental methods other than crystal structures, was applied to all methods as an independent evaluation. Table 1 shows the area under the receiver operating characteristic curve (AUC) for all methods. A receiver operating characteristic (ROC) curve plots sensitivity (the true positive rate) against 1 - specificity (the false positive rate) at various threshold settings. The threshold is varied by increasing the number of predicted residues in steps of 1% of the total surface residues.
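
The thresholding scheme described above can be sketched as follows for a single protein. This is a minimal illustration under stated assumptions: residues are ranked by a predictor score, the top k% of surface residues are called epitopic for k = 1, 2, ..., 100, and the area is integrated with the trapezoidal rule. The function names, the rounding of residue counts, and the integration rule are assumptions and may differ from the procedure of Liang et al. [22].

```python
def roc_points(scores, is_epitope, n_steps=100):
    """ROC points from predicting the top k/n_steps of surface residues.

    scores      -- one predictor score per surface residue (higher = more epitope-like)
    is_epitope  -- matching list of 0/1 labels from experiment
    """
    n = len(scores)
    positives = sum(is_epitope)
    negatives = n - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one epitope and one non-epitope residue")
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    for k in range(1, n_steps + 1):
        n_pred = round(k * n / n_steps)  # residues predicted at this threshold
        tp = sum(is_epitope[i] for i in order[:n_pred])
        fp = n_pred - tp
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Trapezoidal area under (false positive rate, true positive rate) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A predictor that ranks every epitopic residue above every other surface residue gives an AUC of 1 under this scheme, and a random ranking gives about 0.5, which is the baseline against which the values in Table 1 should be read.
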
The mean AUC values are calculated using the method described by Liang et al. [22], except for Bpredictor. For Bpredictor, the AUC value is taken directly from the original publication [23], which applied the same benchmark by Liang et al. as the current work. Among single servers, EPSVR and Bpredictor have the best performance according to the AUC values. Although EPSVR has the highest mean AUC value among the servers evaluated here, the differences between EPSVR and the other single servers are not statistically significant (p-value > 0.05) according to pairwise Student's t-tests. The meta server, EPMeta, achieves a mean AUC value of 0.638, which is significantly higher than that of all single servers.
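
One way to compare two servers on the same benchmark is a paired t statistic over per-protein AUC values, sketched below. Whether the original comparison used a paired or an unpaired test is not stated, so the paired design, the function name, and the example numbers are assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(auc_a, auc_b):
    """Paired t statistic and degrees of freedom for two per-protein AUC lists."""
    if len(auc_a) != len(auc_b) or len(auc_a) < 2:
        raise ValueError("need two equally long lists with at least two proteins")
    diffs = [a - b for a, b in zip(auc_a, auc_b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Made-up AUC values for three proteins, for illustration only.
t_value, dof = paired_t([0.6, 0.7, 0.8], [0.5, 0.5, 0.5])
```

Under this paired assumption, a benchmark of 19 proteins gives 18 degrees of freedom, for which the two-sided 5% critical value of Student's t is about 2.10.
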

Table 1

List of the conformational B-cell epitope prediction methods and their obtained AUC results.

Method	URL of web server	AUC	Accuracy (%) (b)
DiscoTope [15]	http://www.cbs.dtu.dk/services/DiscoTope/	0.567	15.5
BEpro(PEPITO) [16]	http://pepito.proteomics.ics.uci.edu/	0.570	17.0
ElliPro [17]	http://tools.immuneepitope.org/tools/ElliPro/iedb_input	0.585	14.3
SEPPA [18]	http://lifecenter.sgst.cn/seppa/index.php	0.576	17.2
EPITOPIA [19], [20]	http://epitopia.tau.ac.il/index.html	0.579	17.8
EPCES [21]	http://sysbio.unl.edu/EPCES/	0.586	18.8
EPSVR [22]	http://sysbio.unl.edu/EPSVR/	0.597	24.7
Bpredictor [23]	http://code.google.com/p/my-project-bpredictor/downloads/list	0.598 (a)	24.0 (c)
EPMeta [22]	http://sysbio.unl.edu/EPMeta/	0.638	25.6

(a) The AUC value is obtained from Reference [23].

(b) 10% of surface residues are returned as predicted epitopic residues.

(c) Estimated based on Figure 4 in Reference [23].

The accuracy, i.e. the positive prediction rate, is useful for experimental testing. If each server returns 10% of surface residues as predicted epitopic residues, the accuracy is 14.3%, 15.5%, 17.0%, 17.2%, 17.8%, 18.8%, 24.7%, and 25.6% for ElliPro [17], DiscoTope 1.2 [15], BEpro(PEPITO) [16], SEPPA [18], EPCES [21], EPITOPIA [19], [20], EPSVR [22], and EPMeta [22], respectively. The accuracy of Bpredictor is around 24%, based on Figure 4 in Reference [23]. The 10% cutoff was chosen because the average length of antigen proteins is around 200 amino acids and the average size of an epitopic patch is about 20 amino acid residues. The current level of accuracy of all predictors is not yet satisfactory. Even the highest accuracy at this cutoff, 25.6% achieved by EPMeta, leaves considerable room for improvement. If 3% of surface residues are returned as predicted epitopic residues, the accuracy of EPMeta is 31.6%, which is the highest value across all conditions and methods.
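
The accuracy measure used above can be sketched as the fraction of true epitopic residues among the top-scoring fraction of surface residues. The function name, the rounding of the residue count, and the toy data are assumptions made for illustration.

```python
def accuracy_at_fraction(scores, is_epitope, fraction=0.10):
    """Positive prediction rate when the top `fraction` of surface residues is predicted.

    scores      -- one predictor score per surface residue (higher = more epitope-like)
    is_epitope  -- matching list of 0/1 labels from experiment
    """
    if not scores or len(scores) != len(is_epitope):
        raise ValueError("scores and labels must be non-empty and equally long")
    n_pred = max(1, round(fraction * len(scores)))  # always predict at least one residue
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = sum(is_epitope[i] for i in order[:n_pred])
    return true_pos / n_pred


# Toy protein with 20 surface residues, scored in descending order (made-up data).
toy_scores = [1.0 - 0.05 * i for i in range(20)]
toy_labels = [1, 0, 0, 0, 0, 1] + [0] * 14
acc_10 = accuracy_at_fraction(toy_scores, toy_labels, 0.10)
acc_05 = accuracy_at_fraction(toy_scores, toy_labels, 0.05)
```

In this toy case the top 10% contains two residues, one of which is epitopic, so the accuracy is 0.5; tightening the cutoff to 5% keeps only the single best residue. The same trade-off appears in the text, where EPMeta's accuracy rises from 25.6% at the 10% cutoff to 31.6% at the 3% cutoff.
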

Single Chain or Multiple Chains

Antibodies recognize antigenic epitopes with high specificity, yet the epitopic surface is not as conserved as other functional protein binding sites, whose conservation stems from the conserved functions of protein-protein interactions during evolution. The interfaces of regular protein-protein binding are usually more conserved and contain more hydrophobic amino acid residues than non-binding protein surfaces. This makes exposed protein-protein interfaces relatively easy to distinguish from both antigenic epitopes and non-binding protein surfaces. In other words, the prediction task for a single-chain protein that has both protein-protein binding interfaces and an antigenic epitope is easier than that for a complete protein complex. In the benchmark, six of the proteins (PDB IDs: 1eku, 1av1, 1al2, 1jeq, 2gib, and 1qgt) possess multiple chains. Therefore, all methods are tested in two scenarios for these six proteins: prediction on the single chain where the experimental antigenic epitope is located, and prediction on the whole protein, including all chains. When multiple chains are used, all chains are considered, and the total number of surface residues is counted for the intact complex structure. As a result, some methods, such as EPSVR, perform worse when the whole protein is used for prediction, giving lower mean AUC values for the six proteins than prediction on the single chain containing the antigenic epitope. Therefore, if sufficient data become available in the future, separate test datasets should be compiled for the different cases, i.e. single-chain antigens, single chains from antigen complexes, and antigen complexes. A good antigenic epitope predictor should perform satisfactorily on all types of benchmarks.

Protein Binding Site Prediction Methods

Protein binding site prediction methods are frequently borrowed for conformational epitope prediction [24], [25], because epitopic patches can be regarded as one kind of protein binding site and because few epitope prediction methods are available for analysis and comparison. The methodologies used by protein binding site prediction and epitope prediction are similar; both integrate amino acid scoring functions with a machine-learning algorithm or other platform to train a prediction model on known data. The major difference lies in their training sets: protein binding site prediction uses all known protein-protein binding complexes, whereas an epitope prediction method is trained with antibody-antigen complexes only. Therefore, we also applied the independent epitope benchmark to some binding site prediction methods. For this we selected binding site prediction methods that have both demonstrated good performance and convenient web servers for public use. The AUCs achieved by these methods on the epitope benchmark are shown in Table 2. The performances of the binding site prediction methods in predicting B-cell epitopes are significantly lower than those of all conformational epitope prediction methods. This is not surprising, because all binding site prediction methods are designed around the conservation and hydrophobicity of binding patches, but B-cell epitopic patches are neither more conserved nor more hydrophobic than other protein-protein binding surfaces. Instead, the residues on antigenic epitopes are more diverse than regular surface residues because of evolutionary pressure from the host immune system. Therefore, we conclude that general binding site prediction methods are not suitable for antigenic epitope prediction. We recommend that future epitope prediction methods not claim a performance improvement on the basis of comparison with binding site prediction methods.

Table 2

List of the protein binding site prediction methods and their obtained AUC results.

Method	URL of web server	AUC
ProMate [26]	http://bioinfo.weizmann.ac.il/promate/	0.530
ConSurf [27]	http://consurf.tau.ac.il/index_proteins.php	0.460 (a)
PINUP [28]	http://sysbio.unl.edu/services/PINUP	0.562
PIER [29]	http://abagyan.ucsd.edu/PIER/pier.cgi?act=dataset	0.537

(a) Conserved residues are selected as for common binding site prediction.

Discussion

Existing epitope prediction algorithms apply various sets of attributes and classifiers, which naturally leads to one question: which combination of attributes is optimal for the prediction? To answer this question, one may systematically evaluate different machine-learning algorithms on all non-redundant attributes and identify the optimal set among them. Also of great importance to epitope prediction research is the growth of the training data, especially antigens that have both bound and unbound structures. In addition, it is important to collect high-quality independent testing data, such as the set compiled by Liang et al. [22], which contains experimentally measured epitopic residues but no complex structures. We also recommend that all future researchers implement their algorithms as freely accessible web servers or downloadable software packages, because B-cell epitope prediction algorithms will likely become more and more complicated and meta-methods usually have better prediction accuracy than any single algorithm (Table 1).

Conclusions

In recent years, a number of new conformational B-cell epitope prediction algorithms have been developed. While prediction performance has improved somewhat, it is still far from satisfactory. Compared with other bioinformatic problems, antigenic epitope prediction is especially difficult because no properties are universally observed for antigenic epitopes but not for other protein surfaces. Additionally, common binding site prediction methods are not suitable for antigenic epitope prediction because they focus on the conservation of surface residues.

28 in total

1. ProMate: a structure based prediction program to identify the location of protein-protein binding sites.

Authors: Hani Neuvirth; Ran Raz; Gideon Schreiber
Journal: J Mol Biol Date: 2004-04-16 Impact factor: 5.469

2. Prediction of residues in discontinuous B-cell epitopes using protein 3D structures.

Authors: Pernille Haste Andersen; Morten Nielsen; Ole Lund
Journal: Protein Sci Date: 2006-09-25 Impact factor: 6.725

3. New hydrophilicity scale derived from high-performance liquid chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites.

Authors: J M Parker; D Guo; R S Hodges
Journal: Biochemistry Date: 1986-09-23 Impact factor: 3.162

4. Mapping Epitope Structure and Activity: From One-Dimensional Prediction to Four-Dimensional Description of Antigenic Specificity

Authors:
Journal: Methods Date: 1996-06 Impact factor: 3.608

5. Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide.

Authors: E A Emini; J V Hughes; D S Perlow; J Boger
Journal: J Virol Date: 1985-09 Impact factor: 5.103

6. Prediction of conformational B-cell epitopes from 3D structures by random forests with a distance-based feature.

Authors: Wen Zhang; Yi Xiong; Meng Zhao; Hua Zou; Xinghuo Ye; Juan Liu
Journal: BMC Bioinformatics Date: 2011-08-17 Impact factor: 3.169

7. Predicting linear B-cell epitopes using string kernels.

Authors: Yasser El-Manzalawy; Drena Dobbs; Vasant Honavar
Journal: J Mol Recognit Date: 2008 Jul-Aug Impact factor: 2.137

8. Prediction of antigenic epitopes on protein surfaces by consensus scoring.

Authors: Shide Liang; Dandan Zheng; Chi Zhang; Martin Zacharias
Journal: BMC Bioinformatics Date: 2009-09-22 Impact factor: 3.169

9. SEPPA: a computational server for spatial epitope prediction of protein antigens.

Authors: Jing Sun; Di Wu; Tianlei Xu; Xiaojing Wang; Xiaolian Xu; Lin Tao; Y X Li; Z W Cao
Journal: Nucleic Acids Res Date: 2009-05-22 Impact factor: 16.971

10. Antibody-protein interactions: benchmark datasets and prediction tools evaluation.

Authors: Julia V Ponomarenko; Philip E Bourne
Journal: BMC Struct Biol Date: 2007-10-02
