Georgios A Dalkas, Marianne Rooman.
Abstract
BACKGROUND: Identifying immunogenic regions on the surface of antigens, which can be recognized by antibodies and trigger an immune response, is a major challenge in the design of new and effective vaccines. Predicting such regions with computational immunology techniques is a challenging goal, which will ultimately lead to a drastic reduction in the experimental tests required to validate their efficiency. However, current methods are far from being sufficiently reliable and/or applicable on a large scale.
Keywords: Antigen-antibody complexes; B-cell epitopes; Bioinformatics predictor; Immunoinformatics; Machine learning; Physicochemical properties; Statistical potentials; β2 adrenergic G-protein-coupled receptor
Year: 2017 PMID: 28183272 PMCID: PMC5301386 DOI: 10.1186/s12859-017-1528-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Cumulative distributions for individual features, with the D-value of the KS test indicated. (a) Energy-like solvent accessibility feature F11 for the sequence interval of size I = 7, with a D-value of 0.185; (b) Feature F2, defined as the ratio of the amino acid frequency in epitopes and in the remaining antigen, with a D-value of 0.177
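The D-value quoted in the caption above is the Kolmogorov-Smirnov statistic: the largest vertical gap between two cumulative distributions. The following is a minimal sketch of a two-sample version, assuming the two samples are the feature values of epitope and non-epitope residues. The function name and the sample values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from bisect import bisect_right


def ks_d_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: the largest vertical gap between
    the empirical cumulative distributions of the two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap is
    # reached at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical feature values, invented for illustration only.
epitope_values = [0.9, 1.1, 1.3, 1.4, 1.6]
non_epitope_values = [0.5, 0.7, 0.9, 1.0, 1.2]
print(ks_d_statistic(epitope_values, non_epitope_values))
```

A D-value of 0 means the two cumulative distributions coincide, and a D-value of 1 means the two samples do not overlap at all.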
Prediction performance of the individual features F1-13 and of their combination (F), for all window sizes W = 0-9, estimated by the AUC score and evaluated by 10-fold cross validation of the S85 set. The features indicate intrinsically disordered regions (F8 and F7), flexibility (F5 and F6), evolutionary information (F13), energy-like (F9), secondary structure (F4), solvent accessibility (F10 and F11), solubility (F12), hydrophilicity (F3), and amino acid composition (F1 and F2)
| W | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | F13 | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.586 | 0.574 | 0.545 | 0.561 | 0.517 | 0.560 | 0.523 | 0.519 | 0.551 | 0.516 | 0.521 | 0.547 | 0.532 | 0.644 |
| 3 | 0.591 | 0.615 | 0.576 | 0.533 | | 0.579 | 0.543 | 0.514 | 0.569 | 0.548 | 0.542 | 0.585 | 0.547 | 0.639 |
| 5 | 0.604 | 0.597 | 0.579 | 0.552 | 0.542 | | 0.544 | 0.511 | | 0.583 | 0.575 | 0.588 | | 0.635 |
| 7 | 0.600 | 0.603 | 0.570 | 0.558 | 0.541 | | 0.545 | 0.495 | | | | | 0.548 | 0.640 |
| 9 | | | | | 0.533 | 0.579 | | | 0.553 | 0.569 | 0.586 | 0.570 | 0.550 | |
Values in bold correspond to the optimal window sizes for each feature
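The AUC scores in the table above can be read as the probability that a randomly chosen epitope residue receives a higher score than a randomly chosen non-epitope residue, so 0.5 corresponds to random prediction. The following is a minimal sketch of that pairwise definition, assuming labels of 1 for epitope residues and 0 for the others. The function name and the example numbers are hypothetical, and the paper's own evaluation code is not shown in this record.

```python
def auc_score(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example scores higher; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue scores and epitope labels, for illustration only.
example_scores = [0.8, 0.4, 0.6, 0.3, 0.5]
example_labels = [1, 1, 0, 0, 0]
print(auc_score(example_scores, example_labels))
```

The double loop is quadratic in the number of residues, which is acceptable for a sketch; a rank-based formulation gives the same value more efficiently on large datasets.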
Fig. 2 Prediction performance of the individual features F1-13 and of their combination (F), estimated by the AUC and evaluated by 10-fold cross validation of the S85 set, using a sequence window size W = 9. The bold horizontal line indicates the level of random prediction. From least to best performing: intrinsically disordered regions (F8 and F7), flexibility (F5 and F6), evolutionary information (F13), energy-like (F9), secondary structure (F4), solvent accessibility (F10 and F11), solubility (F12), hydrophilicity (F3), and amino acid composition (F1 and F2)
Increase of the prediction performance upon sequential addition of features. The window size is W = 9, and the AUC score is evaluated in 10-fold cross validation on the S85 dataset
| Feature combination | AUC score |
|---|---|
| F1 | 0.619 |
| F1 + F2 | 0.624 |
| F1 + F2 + F10 | 0.629 |
| F1 + F2 + F10 + F11 | 0.630 |
| F1 + F2 + F10 + F11 + F12 | 0.631 |
| F1 + F2 + F9 + F10 + F11 + F12 | 0.631 |
| F1 + F2 + F6 + F9 + F10 + F11 + F12 | 0.636 |
| F1 + F2 + F3 + F6 + F9 + F10 + F11 + F12 | 0.636 |
| F1 + F2 + F3 + F6 + F9 + F10 + F11 + F12 + F13 | 0.637 |
| F1 + F2 + F3 + F6 + F9 + F10 + F11 + F12 + F13 + F7 | 0.640 |
| F1 + F2 + F3 + F6 + F9 + F10 + F11 + F12 + F13 + F7 + F4 | 0.644 |
| F1 + F2 + F3 + F6 + F9 + F10 + F11 + F12 + F13 + F7 + F4 + F5 | 0.644 |
| F1 + F2 + F3 + F6 + F9 + F10 + F11 + F12 + F13 + F7 + F4 + F5 + F8 | |
The largest AUC score is indicated in bold
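Sequential addition of features, as reported in the table above, can be sketched as a loop that grows the feature set one feature at a time and records the score after each step. This record does not state how the order of addition was chosen, so the sketch below assumes a greedy rule (add whichever unused feature gives the highest combined score). The function names, the `score_fn` callback, and the toy gains are all hypothetical; in practice `score_fn` would return a cross-validated AUC.

```python
def greedy_feature_addition(feature_names, score_fn):
    """Add features one at a time, each time picking the unused feature
    that gives the highest score when combined with those already chosen.
    Returns a list of (selected_features, score) pairs, one per step."""
    selected = []
    remaining = list(feature_names)
    history = []
    while remaining:
        # Evaluate every unused feature together with the current selection.
        best = max(remaining, key=lambda f: score_fn(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), score_fn(selected)))
    return history


# Toy scoring function with invented gains; these are NOT values from the paper.
toy_gain = {"F1": 0.12, "F2": 0.05, "F10": 0.03}


def toy_score(subset):
    return 0.5 + sum(toy_gain[f] for f in subset)


for features, score in greedy_feature_addition(["F10", "F1", "F2"], toy_score):
    print(" + ".join(features), round(score, 3))
```

With a real scoring function the score need not rise at every step, which matches the plateaus visible in the table.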
Prediction performance of the combination of features as a function of the window size, estimated by the AUC score and evaluated on the S19 test set
| Window size | AUC score |
|---|---|
| 0 | 0.643 |
| 3 | 0.639 |
| 5 | 0.635 |
| 7 | 0.640 |
| 9 | |
The best score is indicated in bold
The performance of different epitope prediction servers, estimated by the AUC score and evaluated on the S19 test set
| Category | Method | AUC |
|---|---|---|
| Sequence-based | Ensemble (bound) | 0.579 |
| Sequence-based | Zhang (bound) | 0.600 |
| Sequence-based | Zhang (unbound) | 0.601 |
| Sequence-based | Ensemble (unbound) | 0.604 |
| Sequence-based | CBTOPE | 0.607 |
| Sequence-based | SEPIa | |
| Structure-based | EPCES | 0.569 |
| Structure-based | EPITOPIA | 0.572 |
| Structure-based | DiscoTope | 0.579 |
| Structure-based | BPredictor | 0.587 |
| Structure-based | SEPPA | 0.589 |
| Structure-based | EPSVR | 0.606 |
The largest score is indicated in bold
Fig. 3 Predicted and observed epitope residues in the human β2AR receptor. The predicted epitope residues are in green, the observed epitopes are in red, and the residues that are both predicted and observed as epitopes are in blue. Above: amino acid sequence, with the modeled loop regions in italic and underlined. Below: structure of β2AR co-crystallized with a Fab fragment, shown as ribbons with predicted and observed epitopes in sticks; β2AR is colored in light purple with modeled regions in light pink, Fab heavy chain in dark gray and Fab light chain in light gray