Harinder Singh, Hifzur Rahman Ansari, Gajendra P S Raghava.
Abstract
One of the major challenges in designing a peptide-based vaccine is identifying the antigenic regions of an antigen that can stimulate a B-cell response, also called B-cell epitopes. Several methods have been developed in the past for predicting conformational and linear (or continuous) B-cell epitopes. However, the existing methods for predicting linear B-cell epitopes are far from perfect. In this study, we attempted to develop an improved method for predicting linear B-cell epitopes. We retrieved experimentally validated B-cell epitopes as well as non-B-cell epitopes from the Immune Epitope Database and derived two types of datasets, called the Lbtope_Variable and Lbtope_Fixed length datasets. The Lbtope_Variable dataset contains 14876 B-cell epitopes and 23321 non-epitopes of variable length, whereas the Lbtope_Fixed length dataset contains 12063 B-cell epitopes and 20589 non-epitopes of fixed length. We also evaluated the performance of models on these datasets after removing highly identical peptides. In addition, we derived a third dataset, Lbtope_Confirm, with 1042 epitopes and 1795 non-epitopes, in which each epitope or non-epitope has been experimentally validated in at least two studies. A number of models were developed to discriminate epitopes from non-epitopes using machine-learning techniques such as Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). We achieved accuracies ranging from ∼54% to 86% using diverse features such as binary profile, dipeptide composition, and AAP (amino acid pair) profile. In this study, experimentally validated non-B-cell epitopes have been used for the first time to develop a method for predicting linear B-cell epitopes; previous studies used random peptides as non-B-cell epitopes. To serve the scientific community, we have developed a web server, LBtope, for predicting and designing B-cell epitopes (http://crdd.osdd.net/raghava/lbtope/).
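Two of the features named in the abstract, binary profile and dipeptide composition, are standard peptide encodings. The sketch below shows one common way to compute them; the 20-letter alphabet and the handling of non-standard residues are assumptions, not details taken from the study. Note that a binary profile's length grows with the peptide (20 values per residue), so it yields equal-sized vectors only for equal-length peptides, whereas dipeptide composition is always 400-dimensional.

```python
from itertools import product

# Assumed alphabet: the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def binary_profile(peptide: str) -> list[int]:
    """One-hot encode each residue: 20 bits per position.

    Non-standard residues get an all-zero block (an assumption).
    """
    vector = []
    for residue in peptide:
        bits = [0] * len(AMINO_ACIDS)
        if residue in AMINO_ACIDS:
            bits[AMINO_ACIDS.index(residue)] = 1
        vector.extend(bits)
    return vector


def dipeptide_composition(peptide: str) -> list[float]:
    """Fraction of each of the 400 dipeptides among the overlapping residue pairs.

    Pairs containing non-standard residues are ignored (an assumption).
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(peptide) - 1):
        pair = peptide[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values()) or 1  # avoid dividing by zero for very short input
    return [counts[d] / total for d in DIPEPTIDES]
```

For example, `binary_profile("AC")` returns 40 values with ones at positions 0 and 21, and `dipeptide_composition("AAC")` assigns 0.5 to both "AA" and "AC".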
Year: 2013 PMID: 23667458 PMCID: PMC3646881 DOI: 10.1371/journal.pone.0062216
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Length-wise distribution of peptides (B-cell epitopes and non-epitopes). Peptides were divided into length bins, for example peptides shorter than five residues and peptides of 5 to 10 residues.
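The binning behind Figure 1 can be sketched as below. The caption names only the first two bins, so this sketch treats 10 as inclusive and continues with hypothetical 5-residue bins beyond that; those later bin edges are assumptions, not the figure's actual bins.

```python
from collections import Counter


def length_bin(peptide: str) -> str:
    """Assign a peptide to a length bin.

    "<5" and "5-10" follow the caption; the 5-residue bins after 10
    (11-15, 16-20, ...) are a hypothetical choice.
    """
    n = len(peptide)
    if n < 5:
        return "<5"
    if n <= 10:
        return "5-10"
    low = ((n - 11) // 5) * 5 + 11
    return f"{low}-{low + 4}"


def length_distribution(peptides):
    """Count how many peptides fall in each length bin."""
    return Counter(length_bin(p) for p in peptides)
```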
Figure 2. Two-sample logo showing the dominance of surface-accessible residues in B-cell epitopes.
Yellow and black residues indicate surface-accessible and non-accessible residues, respectively.
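A two-sample logo compares residue frequencies position by position between a positive set (here, epitopes) and a negative set (non-epitopes). The sketch below computes only the raw per-position frequency differences for aligned, equal-length peptides; the statistical testing a real two-sample logo tool applies to decide which residues to display is deliberately left out, and the function name is hypothetical.

```python
from collections import Counter


def position_frequency_difference(positives, negatives):
    """Per-position residue frequency differences between two peptide sets.

    Returns one dict per position mapping residue ->
    (frequency among positives) - (frequency among negatives).
    Positive values mark residues enriched in the positive set.
    """
    length = len(positives[0])
    if any(len(p) != length for p in list(positives) + list(negatives)):
        raise ValueError("all peptides must have the same length")
    differences = []
    for i in range(length):
        pos_counts = Counter(p[i] for p in positives)
        neg_counts = Counter(p[i] for p in negatives)
        residues = set(pos_counts) | set(neg_counts)
        differences.append({
            r: pos_counts[r] / len(positives) - neg_counts[r] / len(negatives)
            for r in residues
        })
    return differences
```

Residues with large positive differences would appear in the enriched half of the logo; in Figure 2 these are dominated by surface-accessible residues.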
Performance of SVM models developed on the Lbtope_Fixed dataset using various features.
| Features/Parameters | Threshold | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|---|
| | −0.3 | 67.92 | 53.10 | 58.48 | 0.20 | 0.65 |
| | −0.2 | 58.17 | 52.81 | 54.76 | 0.11 | 0.58 |
| | −0.3 | 74.08 | 79.71 | 77.67 | 0.53 | 0.85 |
| | −0.4 | 67.67 | 67.33 | 67.45 | 0.34 | 0.72 |
| | −0.2 | 81.75 | 77.62 | 79.12 | 0.58 | 0.86 |
| | 0 | 66.67 | 53.48 | 58.27 | 0.19 | 0.64 |
| | −0.3 | 80.5 | 81.67 | 81.24 | 0.61 | 0.88 |
These models were developed using 5-fold cross-validation on 90% of the data and tested on the remaining 10%.
AAP*: modified AAP profile in which simple matrix assignment is used instead of multiplication.
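The AAP* footnote distinguishes two ways of turning an amino acid pair (AAP) scale into features, but the source does not define either precisely. The following is a hedged sketch under stated assumptions: the AAP scale is taken as the log ratio of each dipeptide's frequency in epitopes versus non-epitopes; the AAP profile multiplies each dipeptide's count in a peptide by its scale value; and AAP* simply assigns the scale value to each pair that is present. The function names and the pseudocount are hypothetical.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def pair_counts(peptide):
    """Count overlapping dipeptides in one peptide (standard residues only)."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(peptide) - 1):
        pair = peptide[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return counts


def pair_frequencies(peptides):
    """Relative frequency of each dipeptide over all pairs in a peptide set."""
    totals = dict.fromkeys(DIPEPTIDES, 0)
    for pep in peptides:
        for d, c in pair_counts(pep).items():
            totals[d] += c
    n = sum(totals.values()) or 1
    return {d: c / n for d, c in totals.items()}


def aap_scale(epitopes, non_epitopes, pseudocount=1e-6):
    """Assumed AAP scale: log(freq in epitopes / freq in non-epitopes).

    The hypothetical pseudocount avoids log(0) for unseen pairs.
    """
    f_pos = pair_frequencies(epitopes)
    f_neg = pair_frequencies(non_epitopes)
    return {d: math.log((f_pos[d] + pseudocount) / (f_neg[d] + pseudocount))
            for d in DIPEPTIDES}


def aap_profile(peptide, scale):
    """Assumed AAP feature: dipeptide count multiplied by its scale value."""
    counts = pair_counts(peptide)
    return [counts[d] * scale[d] for d in DIPEPTIDES]


def aap_star_profile(peptide, scale):
    """Assumed AAP* feature: scale value assigned to each pair present, no multiplication."""
    counts = pair_counts(peptide)
    return [scale[d] if counts[d] else 0.0 for d in DIPEPTIDES]
```

Under these assumptions, a pair seen twice in a peptide contributes twice its scale value to the AAP profile but only once to AAP*.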
Performance of IBk models developed on the Lbtope_Fixed dataset using various features.
| Features/Parameters | Threshold | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|---|
| | 0.3 | 78.5 | 74.05 | 75.67 | 0.51 | 0.83 |
| | 0.4 | 68.17 | 69.95 | 69.3 | 0.37 | 0.73 |
| | 0.4 | 78.25 | 81.76 | 80.48 | 0.59 | 0.83 |
| | 0.3 | 80.33 | 81.67 | 81.18 | 0.61 | 0.86 |
These models were developed using 5-fold cross-validation on 90% of the data and tested on the remaining 10%.
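IBk is Weka's instance-based (k-nearest-neighbour) classifier, and the IBk tables use thresholds between 0.2 and 0.4 on a probability-like score. A minimal sketch, assuming the score is the fraction of the k nearest training peptides that are epitopes and that distance is Euclidean on feature vectors; k = 3, the distance, and the helper names are hypothetical, not the settings used in the study.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_epitope_score(query, train_vectors, train_labels, k=3):
    """Fraction of the k nearest training vectors labelled as epitopes (label 1)."""
    ranked = sorted(zip(train_vectors, train_labels),
                    key=lambda item: euclidean(query, item[0]))
    nearest = ranked[:k]
    return sum(label for _, label in nearest) / len(nearest)


def predict_epitope(query, train_vectors, train_labels, threshold=0.4, k=3):
    """Call the query an epitope when its neighbour score reaches the threshold."""
    return knn_epitope_score(query, train_vectors, train_labels, k) >= threshold
```

With a low threshold such as 0.3 or 0.4, a peptide can be called an epitope even when most of its neighbours are non-epitopes, which trades specificity for sensitivity.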
Performance of SVM models developed on the Lbtope_Variable dataset using various features.
| Features/Parameters | Threshold | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|---|
| | −0.4 | 68.06 | 66.31 | 66.99 | 0.34 | 0.73 |
| | −0.1 | 67.72 | 64.42 | 65.71 | 0.31 | 0.77 |
| | 0 | 76.06 | 63.87 | 68.61 | 0.39 | 0.72 |
| | −0.2 | 75.18 | 76.34 | 75.89 | 0.51 | 0.83 |
These models were developed using 5-fold cross-validation on 90% of the data and tested on the remaining 10%.
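In these tables the Threshold column is the score cutoff at which a peptide is called an epitope, and the other columns follow from the resulting confusion matrix: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/N, and MCC = (TP·TN − FP·FN)/sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal sketch of that computation, assuming scores at or above the threshold count as positive and labels are 1 for epitopes and 0 for non-epitopes; the function name is hypothetical.

```python
import math


def threshold_metrics(scores, labels, threshold):
    """Sensitivity, specificity, accuracy (in %) and MCC at a score threshold."""
    tp = fn = fp = tn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    total = tp + fn + fp + tn
    sensitivity = 100 * tp / (tp + fn) if tp + fn else 0.0
    specificity = 100 * tn / (tn + fp) if tn + fp else 0.0
    accuracy = 100 * (tp + tn) / total if total else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}
```

Because accuracy weights sensitivity and specificity by class size, it sits closer to specificity whenever non-epitopes outnumber epitopes, as they do in all three datasets.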
Performance of IBk models developed on the Lbtope_Variable dataset using various features.
| Features/Parameters | Threshold | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|---|
| | 0.4 | 70.41 | 72.82 | 71.88 | 0.42 | 0.77 |
| | 0.4 | 66.38 | 61.12 | 63.17 | 0.27 | 0.68 |
| | 0.3 | 77.67 | 76.68 | 77.07 | 0.53 | 0.81 |
| | 0.2 | 80.90 | 77.50 | 78.82 | 0.57 | 0.84 |
These models were developed using 5-fold cross-validation on 90% of the data and tested on the remaining 10%.
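Each table notes that models were trained with 5-fold cross-validation on 90% of the data and tested on the remaining 10%. A minimal sketch of that protocol: hold out a random 10% once, then split the other 90% into five folds so that each fold serves once as the validation set. The random seed, the unstratified shuffling, and the function names are assumptions; the source does not say how the split was drawn.

```python
import random


def holdout_and_folds(items, holdout_fraction=0.1, n_folds=5, seed=0):
    """Hold out a test set, then split the remainder into n_folds folds."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * holdout_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    folds = [train[i::n_folds] for i in range(n_folds)]
    return train, test, folds


def cross_validation_splits(folds):
    """Yield (training, validation) pairs, each fold serving once as validation."""
    for i, validation in enumerate(folds):
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, validation
```

With imbalanced classes such as these, a stratified split that keeps the epitope/non-epitope ratio in every fold is a common alternative to plain shuffling.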
Performance of SVM models developed on the Lbtope_Confirm dataset (epitopes tested in at least two studies) using various features.
| Features/Parameters | Threshold | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|---|
| | 0 | 81.73 | 73.18 | 76.33 | 0.53 | 0.84 |
| | −0.1 | 76.92 | 74.86 | 75.62 | 0.50 | 0.82 |
| | 0 | 81.73 | 73.74 | 76.68 | 0.54 | 0.83 |
| | −0.3 | 84.62 | 81.01 | 82.33 | 0.64 | 0.91 |
These models were developed using 5-fold cross-validation on 90% of the data and tested on the remaining 10%.
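Unlike the threshold-specific columns, the AUC column summarizes the whole ROC curve. The source does not say how AUC was computed; the sketch below uses the standard equivalence between ROC AUC and the probability that a randomly chosen epitope scores higher than a randomly chosen non-epitope, with ties counted as one half. The function name is hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison of epitope (1) and non-epitope (0) scores."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one epitope and one non-epitope")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

The pairwise loop is quadratic; for datasets with tens of thousands of peptides, an equivalent rank-based (Mann-Whitney) formulation after sorting is much faster.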
Performance of IBk models developed on the Lbtope_Confirm dataset (epitopes tested in at least two studies) using various features.
| Features/Parameters | Threshold | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|---|
| | 0.3 | 80.77 | 77.09 | 78.45 | 0.56 | 0.90 |
| | 0.3 | 73.08 | 73.18 | 73.14 | 0.45 | 0.79 |
| | 0.4 | 87.5 | 81.56 | 83.75 | 0.67 | 0.85 |
| | 0.4 | 82.69 | 87.71 | 85.87 | 0.70 | 0.92 |
These models were developed using 5-fold cross-validation on 90% of the data and tested on the remaining 10%.
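Lbtope_Confirm keeps only peptides whose epitope or non-epitope status was reported in at least two studies. A minimal sketch of such a filter, assuming hypothetical records of the form (peptide, study_id, is_epitope) and assuming that peptides with conflicting reports are excluded; the source states only the two-study requirement, so the record schema and the conflict rule are both assumptions.

```python
from collections import defaultdict


def confirmed_dataset(records, min_studies=2):
    """Split peptides supported by at least min_studies distinct studies.

    records: iterable of (peptide, study_id, is_epitope) tuples (hypothetical schema).
    Peptides reported both ways are dropped (an assumption).
    """
    studies = defaultdict(lambda: {True: set(), False: set()})
    for peptide, study_id, is_epitope in records:
        studies[peptide][bool(is_epitope)].add(study_id)
    epitopes, non_epitopes = [], []
    for peptide, by_label in studies.items():
        positive_studies, negative_studies = by_label[True], by_label[False]
        if positive_studies and negative_studies:
            continue  # conflicting reports: excluded under this sketch's assumption
        if len(positive_studies) >= min_studies:
            epitopes.append(peptide)
        elif len(negative_studies) >= min_studies:
            non_epitopes.append(peptide)
    return sorted(epitopes), sorted(non_epitopes)
```

Using sets of study IDs means a peptide reported twice by the same study still counts as supported by only one study.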
Performance of SVM models developed on the Lbtope_Fixed_non_redundant dataset using various features.
| Features/Parameters | Threshold | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|---|
| | 0.0 | 59.35 | 57.31 | 58.33 | 0.17 | 0.61 |
| | 0.0 | 54.38 | 57.31 | 55.85 | 0.12 | 0.57 |
| | 0.0 | 65.88 | 60.57 | 63.23 | 0.26 | 0.66 |
| | 0.0 | 65.75 | 63.97 | 64.86 | 0.30 | 0.69 |
These models were developed using 5-fold cross-validation on 90% of the data and tested on the remaining 10%.
Performance of SVM models developed on the Lbtope_Variable_non_redundant dataset using various features.
| Features/Parameters | Threshold | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|---|
| | −0.3 | 56.41 | 59.9 | 58.39 | 0.16 | 0.60 |
| | −0.6 | 58.41 | 66.86 | 63.19 | 0.25 | 0.66 |
| | −0.3 | 63.51 | 62.95 | 63.19 | 0.26 | 0.68 |
| | −0.1 | 66 | 67.24 | 66.7 | 0.33 | 0.73 |
These models were developed using 5-fold cross-validation on 90% of the data and tested on the remaining 10%.
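The non-redundant datasets were obtained by removing highly identical peptides, but the identity measure and cutoff are not given here. A minimal greedy sketch, assuming equal-length peptides (as in the fixed-length set), simple position-wise identity, and a hypothetical cutoff of 0.8; dedicated clustering tools such as CD-HIT, which use alignment-based identity, are the usual choice in practice.

```python
def identity(a: str, b: str) -> float:
    """Fraction of positions with identical residues (equal-length peptides assumed)."""
    if len(a) != len(b) or not a:
        raise ValueError("position-wise identity needs non-empty, equal-length peptides")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def remove_redundant(peptides, max_identity=0.8):
    """Greedy filter: keep a peptide only if it is below max_identity to every kept one.

    The 0.8 cutoff is a hypothetical value, not the study's.
    """
    kept = []
    for peptide in peptides:
        if all(identity(peptide, other) < max_identity for other in kept):
            kept.append(peptide)
    return kept
```

Because the filter is greedy, the peptide kept from each group of near-duplicates depends on input order.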
Performance of our method, LBtope, on the ABCpred dataset, and performance of ABCpred on the Lbtope_Fixed dataset (fixed-length patterns used in this study).
| Performance on Datasets/Parameters | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC |
|---|---|---|---|---|
| | 54.55 | 49.54 | 51.39 | 0.04 |
| | 57.14 | 71.57 | 64.36 | 0.29 |
| | 70 | 33.71 | 57.9 | 0.04 |
Performance of BCPred and Chen's method on the Lbtope_Fixed and Lbtope_Variable datasets, and performance of LBtope models on the datasets used in previous studies.
| Performance of models/Parameters | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC |
|---|---|---|---|---|
| | 58.87 | 50.47 | 53.57 | 0.09 |
| | 69.33 | 33.81 | 51.57 | 0.03 |
| | 62.1 | 43.26 | 50.22 | 0.05 |
| | 70.76 | 35.89 | 53.33 | 0.07 |
| | 74.49 | 40.64 | 57.56 | 0.16 |
| | 78.09 | 27.23 | 52.66 | 0.06 |
| | 84.86 | 81.37 | 82.65 | 0.65 |
| | 85.82 | 85.65 | 85.74 | 0.71 |
Performance of LBtope models on the Lbtope_Confirm and Lbtope_positive_fbcpred_negative datasets.