Kosmas A Galanis, Katerina C Nastou, Nikos C Papandreou, Georgios N Petichakis, Diomidis G Pigis, Vassiliki A Iconomidou.
Abstract
Linear B-cell epitope prediction has attracted steadily growing interest ever since the first method was developed in 1981. Identifying B-cell epitopes with an accurate prediction method can make vaccine design faster and cheaper overall, a crucial necessity in the COVID-19 era. Consequently, several B-cell epitope prediction methods have been developed over the past few decades, but without significant success. In this study, we review the current performance and methodology of some of the most widely used linear B-cell epitope predictors that are available via a command-line interface, namely BcePred, BepiPred, ABCpred, COBEpro, SVMTriP, LBtope, and LBEEP. Additionally, we attempted to remedy the performance issues of the individual methods by developing a consensus classifier, which combines the separate predictions of these methods into a single output, with the aim of accelerating epitope-based vaccine design. While the method comparison was performed with some necessary caveats, and individual methods might perform much better on specialized data sets, we hope that this update on performance can help researchers choose a predictor for the development of biomedical applications such as designed vaccines, diagnostic kits, immunotherapeutics, immunodiagnostic tests, antibody production, and disease diagnosis and therapy.
Keywords: B-cell epitope; consensus prediction method; immunotherapy; linear epitope; vaccine design
Year: 2021 PMID: 33809918 PMCID: PMC8004178 DOI: 10.3390/ijms22063210
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Linear B-cell epitope predictors in chronological order, alongside a short description of their methodology, their current status, and their web page. After researching the relevant publications, we gathered all the linear B-cell epitope predictors we could find into this fairly complete, but not exhaustive, catalogue. For every method, we consulted the source publication to determine its methodology, which we summarize in a short description. We also checked the availability status of every predictor as of the writing of this review, and categorized each one by its general and current availability as an online tool and by its obtainability as a standalone software package. We also list the institution at which each predictor was developed. The last column gives the website link for each method, when available.
| Predictor | Description | Status | Institution | Link |
|---|---|---|---|---|
| | Physico-chemical propensity scales, occurrence of residues | Not currently available online | Department of Zoology, University of Poona, India | |
| | Physico-chemical propensity scales | Not available online | Laboratoire de Spectroscopies et Structures Biomoléculaire, Université de Reims Champagne Ardenne, France | - |
| | Physico-chemical propensity scales | Freely available online | Institute of Environmental Biology and Biotechnology, CEA, France | |
| | Physico-chemical propensity scales | Freely available online and downloadable | Institute of Microbial Technology, Chandigarh, India | |
| | HMM & Parker hydrophilicity scale | Freely available online and downloadable | Center for Biological Sequence Analysis, Technical University of Denmark | |
| | Physicochemical | Not available online | emergentec biodevelopment GmbH, Vienna, Austria | - |
| | SVM & AAP | Not available online | Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, Shanghai, China | - |
| | Neural networks (feed forward & recurrent) | Freely available online and downloadable | Institute of Microbial Technology, Chandigarh, India | |
| | SVM | Freely available online and downloadable | Dep. of Computer Science & Dep. of Genetics, Development and Cell Biology, Iowa State University, USA | |
| | SVM & AAP | Not currently available online | Faculty of Biology, Moscow | |
| | Machine Learning algorithm trained to discern antigenic features | Freely available online and downloadable | Tel Aviv Uni. Israel & Uni. of British Columbia Canada & Uni. of Massachusetts, USA | |
| | SVM | Freely available online and downloadable upon request | Dep. of Computer Science and Institute for Genomics and Bioinformatics, University of California USA | |
| | SVM | Not currently available online | Singapore Immunology Network & Dep. of Biochemistry, National Uni. of Singapore | |
| | SVM & Physicochemical propensity scales & Amino Acid Segments | Not currently available online | National Taiwan Ocean | |
| | SVM | Not available online | Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute & Harvard School of Public Health, Boston, USA | - |
| | SVM | Not currently available online | School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People’s Republic of China | |
| | SVM | Freely available online and downloadable | University of Nebraska USA, Osaka Uni. Japan | |
| | SVM & Physicochemical propensity scales & Position Specific Scoring Matrix | Not available online | School of Medicine, Taipei Medical University, Taipei, Taiwan | - |
| | SVM & Physicochemical propensity scales & AAP | Freely available online and downloadable | Institute of Microbial Technology, Chandigarh, India | |
| | Amino acid descriptors & Random Forest | Not currently available online | Research Center of Modernization of Traditional Chinese Medicines, Central South University, Changsha, China | |
| | Multiple Linear Regression | Not currently available online | The Key Laboratory of Bioinformatics, Ministry of Education, School of Life Sciences, Tsinghua University, Beijing, China | |
| | Deep Maxout Networks | Not currently available online | The Key Laboratory of Bioinformatics, Ministry of Education, School of Life Sciences, Tsinghua University, Beijing, China | |
| | Deviation from Expected Mean & SVM | Freely available for download | Center for Advanced Study in Crystallography and Biophysics, University of Madras, Guindy Campus, Chennai, Tamil Nadu, India | |
| | Amino acid Anchoring Pair Composition & SVM | Not currently available online | Department of Molecular Biology, Hebei University College of Life Sciences, China | |
| | Deep Ridge Neural Network | Not currently available online | Department of Computer Science, University of Central Florida, Orlando, FL, USA | |
| | Random forest algorithm trained on epitopes derived from crystal structures | Freely available online and downloadable | Department of Bio and Health Informatics, Technical University of Denmark, Denmark | |
| | Ensemble framework combining ERT & GB | Freely available online | Department of Physiology, Ajou University School of Medicine, Suwon, South Korea | |
HMM: Hidden Markov Model, SVM: Support Vector Machine, AAP: Amino Acid Pairs, ERT: Extremely Randomized Tree, GB: Gradient Boosting, CEA: Commissariat à l’énergie atomique et aux énergies alternatives.
A summary of methods, threshold values, and modifications applied to each predictor. For each predictor, the best-performing mode was selected first, and its threshold was then set to the value shown in the table, using the criteria described in the manuscript.
| Predictor | Threshold | Mode | Threshold Type |
|---|---|---|---|
| BcePred | 2 | Combined | Not Default |
| BepiPred-1.0 | 0.35 | BepiPred | Default |
| ABCpred | 0.51 | 20 | Default |
| COBEpro | 4 | - | Not Default |
| SVMTriP | 0.2 | 20 | Default |
| LBtope | 0.6 | LBtop_Confirm | Default |
| LBEEP | 0.6 | Balanced | Default |
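To illustrate how per-predictor thresholds such as these could feed a consensus call, the following is a minimal sketch. The threshold values are taken from the table above; everything else is an assumption made for illustration: the `>=` comparison, the simple majority rule, the function names `binarize` and `consensus_vote`, and the example scores, which are made up and are not real predictor output. This record does not state the actual combination rule used by the consensus classifier.

```python
from typing import Dict

# Thresholds copied from the table above. The ">=" comparison and the
# majority rule below are assumptions made for illustration only.
THRESHOLDS: Dict[str, float] = {
    "BcePred": 2.0,
    "BepiPred-1.0": 0.35,
    "ABCpred": 0.51,
    "COBEpro": 4.0,
    "SVMTriP": 0.2,
    "LBtope": 0.6,
    "LBEEP": 0.6,
}


def binarize(scores: Dict[str, float]) -> Dict[str, bool]:
    """Turn each predictor's raw score into an epitope / non-epitope call."""
    return {name: score >= THRESHOLDS[name] for name, score in scores.items()}


def consensus_vote(scores: Dict[str, float]) -> bool:
    """Hypothetical majority vote: epitope if more than half of the calls are positive."""
    calls = binarize(scores)
    return sum(calls.values()) > len(calls) / 2


# Made-up scores for a single peptide (not real predictor output).
example_scores = {
    "BcePred": 2.3, "BepiPred-1.0": 0.10, "ABCpred": 0.70, "COBEpro": 5.0,
    "SVMTriP": 0.05, "LBtope": 0.65, "LBEEP": 0.40,
}
print(consensus_vote(example_scores))
```

Because each predictor scores on its own scale, binarizing before combining avoids having to normalize raw scores; a weighted or score-averaging scheme would be an equally plausible alternative.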
Input window sizes and prediction approach of each method. The classification of query proteins as epitopes can generally be performed on either a “per residue” or a “per peptide” basis. In the “per residue” methods, each residue of a protein is assigned its own antigenicity score, while in the “per peptide” methods a prediction is limited to windows of fixed size.
| Predictor | Prediction | Window Size |
|---|---|---|
| ABCpred | Per peptide | 10, 12, 14, 16, 18, 20 |
| SVMTriP | Per peptide | 10, 12, 14, 16, 18, 20 |
| LBEEP | Per peptide | 5–15 |
| BcePred | Per residue | - |
| BepiPred-1.0 | Per residue | - |
| COBEpro | Per residue | - |
| LBtope | Per residue | - |
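The difference between the two approaches can be sketched by turning per-residue scores into per-peptide calls with a sliding window. This is only an illustration under stated assumptions: averaging the scores inside the window is one possible aggregation and is not confirmed by this record, the function name `window_calls` is hypothetical, the default threshold of 0.35 is simply the BepiPred-1.0 value from the threshold table, and the scores are made up.

```python
from typing import List, Tuple


def window_calls(residue_scores: List[float], window: int = 20,
                 threshold: float = 0.35) -> List[Tuple[int, bool]]:
    """Slide a fixed window and call it an epitope if its mean score reaches the threshold.

    Returns (start_index, is_epitope) for every full window; sequences
    shorter than the window yield an empty list.
    """
    calls = []
    for start in range(len(residue_scores) - window + 1):
        mean = sum(residue_scores[start:start + window]) / window
        calls.append((start, mean >= threshold))
    return calls


# Made-up per-residue scores: a low-scoring stretch followed by a high-scoring one.
scores = [0.1] * 20 + [0.9] * 20
calls = window_calls(scores)
print(len(calls), calls[0], calls[-1])
```

A 40-residue sequence yields 21 overlapping 20-residue windows, which is why a per-residue predictor can be evaluated in "per peptide" mode at a chosen window size.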
A summary of the sources of the positive and negative data sets for each predictor. Every predictor required a database from which to construct its training data set, which consists of a positive and a negative subset. In this table, we outline the database or curated data set from which each method sourced its training data, along with the date on which the data were obtained. The date identifies the snapshot of the data that could have been used for each predictor’s training, allowing us to determine possible overlaps between our testing data set and the relevant training data.
| Predictor | Positive | Negative |
|---|---|---|
| BcePred | BCIPEP (2004) | 1029 random sequences |
| BepiPred-1.0 | HIV/PELLEQUER/ANTIJEN | Not described in the original publication |
| ABCpred | BCIPEP (2006) | 700 random sequences |
| COBEpro | HIV/PELLEQUER | HIV/Pellequer non-Epitopes |
| SVMTriP | IEDB (2012) | 4925 IEDB non-epitopes |
| LBtope | IEDB (2012) | IEDB (2012) non-epitopes |
| LBEEP | IEDB (2015) | IEDB (2015) non-epitopes |
A summary of the test data sets used in this study. The counts of the positive and negative subsets in each of the three data sets developed for method testing are shown.
| Data Set | Epitopes | Non-Epitopes |
|---|---|---|
| BepiPred-2.0 * | 11,814 | 18,689 |
| Consensus_R | 7675 | 15,617 |
| Consensus_NR | 4286 | 5266 |
* A slightly modified version of the BepiPred-2.0 data set was used, from which a few epitopes were removed because their sequence of origin was shorter than 20 amino acid residues, and thus the epitope could not be extended to the desired length.
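The length constraint in the footnote can be illustrated with a small sketch of extending an epitope to a fixed length inside its parent sequence. The symmetric padding, the shift at the sequence ends, the handling of epitopes already at or above the target length, and the function name `extend_epitope` are all assumptions for illustration; this record only states that epitopes from sequences shorter than 20 residues could not be extended.

```python
from typing import Optional


def extend_epitope(protein: str, start: int, end: int, target: int = 20) -> Optional[str]:
    """Pad protein[start:end] with flanking residues up to `target` length.

    Returns None when the parent sequence is shorter than `target`;
    epitopes already at or above `target` are returned unchanged.
    """
    if len(protein) < target:
        return None
    length = end - start
    if length >= target:
        return protein[start:end]
    missing = target - length
    new_start = max(0, start - missing // 2)
    new_end = new_start + target
    if new_end > len(protein):  # ran past the C-terminus: shift the window left
        new_end = len(protein)
        new_start = new_end - target
    return protein[new_start:new_end]


# Made-up 30-residue sequence, used only to exercise the function.
parent = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
print(extend_epitope(parent, 10, 16))
print(extend_epitope("ACDEFGHIKL", 2, 8))
```

The second call returns `None` because a 10-residue parent sequence cannot supply a 20-residue window, which mirrors the reason given for removing those epitopes.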
Performance of all predictors in “per peptide” mode. The methods are tested against the Consensus_NR (Non_Redundant) data set.
| Predictor | SN% | SP% | ACC% | MCC |
|---|---|---|---|---|
| Consensus_noLBEEP | 48.39 | 58.81 | 54.14 | |
| Consensus_ALL | 27.15 | 78.73 | 55.59 | 0.0687 |
| BcePred | 22.21 | 79.85 | 53.99 | 0.0251 |
| ABCpred | 66.44 | 36.90 | 50.16 | 0.0348 |
| LBtope | 45.91 | 58.94 | 53.10 | 0.0488 |
| BepiPred-1.0 | 49.95 | 57.84 | 54.30 | |
| COBEpro | 58.63 | 45.67 | 51.49 | 0.0431 |
| SVMTriP | 16.21 | 85.87 | 54.62 | 0.0290 |
| LBEEP | 19.06 | 80.12 | 52.72 | −0.0103 |
SN: Sensitivity, SP: Specificity, ACC: Accuracy, MCC: Matthews Correlation Coefficient.
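The four metrics are linked through the confusion matrix, so accuracy and MCC can be recomputed from sensitivity, specificity, and the class sizes of the test set. The sketch below uses the standard definitions; the function name `metrics_from_rates` is hypothetical, and the class sizes (4286 epitopes, 5266 non-epitopes) are the Consensus_NR counts from the test data set table.

```python
from math import sqrt
from typing import Tuple


def metrics_from_rates(sn: float, sp: float, n_pos: int, n_neg: int) -> Tuple[float, float]:
    """Return (accuracy, MCC) from sensitivity and specificity (as fractions) and class sizes."""
    tp = sn * n_pos          # true positives implied by the sensitivity
    fn = n_pos - tp
    tn = sp * n_neg          # true negatives implied by the specificity
    fp = n_neg - tn
    acc = (tp + tn) / (n_pos + n_neg)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Consensus_ALL row, evaluated with the Consensus_NR class sizes.
acc, mcc = metrics_from_rates(0.2715, 0.7873, 4286, 5266)
print(f"ACC = {acc:.4f}, MCC = {mcc:.4f}")
```

For the Consensus_ALL row this lands close to the tabulated 55.59% accuracy and 0.0687 MCC, with small differences expected from rounding of the published percentages. It also shows why a predictor whose sensitivity and specificity sum to less than 100%, such as LBEEP, ends up with a negative MCC.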
Performance of “per residue” predictors. The methods are tested against the Consensus_NR data set.
| Predictor | SN% | SP% | ACC% | MCC |
|---|---|---|---|---|
| Consensus_RES | 46.64 | 58.24 | 53.04 | 0.0489 |
| BcePred | 29.18 | 72.21 | 52.90 | 0.0154 |
| LBtope | 45.56 | 57.47 | 52.13 | 0.0304 |
| BepiPred-1.0 | 48.12 | 56.76 | 52.88 | 0.0488 |
| COBEpro | 49.27 | 52.49 | 51.05 | 0.0175 |
SN: Sensitivity, SP: Specificity, ACC: Accuracy, MCC: Matthews Correlation Coefficient.
Figure 1. Matthews Correlation Coefficient (MCC) values achieved by all methods tested on the Consensus_NR data set at 20 amino acid residues in “per peptide” mode. The vertical axis shows the MCC value and the horizontal axis the names of the methods. The best MCC is achieved by the BepiPred method, followed closely by our Consensus methods, while the worst performers are the LBEEP, SVMTriP, and BcePred methods.
Comparison of the performance of our consensus predictor and BepiPred-2.0 against the Consensus_NR data set.
| Predictor | SN% | SP% | ACC% | MCC |
|---|---|---|---|---|
| Consensus_noLBEEP | 50.18 | 58.54 | 54.07 | 0.0873 |
| BepiPred-2.0 | 63.35 | 42.63 | 51.93 | 0.0607 |
SN: Sensitivity, SP: Specificity, ACC: Accuracy, MCC: Matthews Correlation Coefficient.