Johannes Sollner, Rainer Grohmann, Ronald Rapberger, Paul Perco, Arno Lukas, Bernd Mayer.
Abstract
BACKGROUND: The application of peptide-based diagnostics and therapeutics that mimic part of a protein antigen is experiencing renewed interest. So far, the selection and design rationale for such peptides has usually been driven by T-cell epitope prediction, available experimental and modelled 3D structure, B-cell epitope predictions such as hydrophilicity plots, or experience. If no structure is available, the rational selection of peptides for the production of functionally altering or neutralizing antibodies is practically impossible. Specifically, if many alternative antigens are available, reducing the number of peptides that must be synthesized before one successful candidate is found is of central technical interest. We have investigated the integration of B-cell epitope prediction with the variability of the antigen and the conservation of patterns for post-translational modification (PTM) prediction to improve on the state of the art in the field. In particular, the application of machine-learning methods shows promising results.
Year: 2008 PMID: 18179690 PMCID: PMC2244602 DOI: 10.1186/1745-7580-4-1
Source DB: PubMed Journal: Immunome Res ISSN: 1745-7580
Figure 1. The figure shows the first part of the overall workflow applied in this project.
Figure 2. The figure shows the second part of the overall workflow applied in this project.
Protectivity AROC values for individual proteins, derived separately from predictions based on antigenicity, 1 – modifications, or 1 – variability, as well as a combined (sum) score
| Function | Protein ID | Antigenicity | 1 – modifications | 1 – variability | Combined (sum) score |
| attachment/fusion | P11224 | 0.51 | 0.15 | 0.42 | 0.15 |
| attachment/fusion | 2327073 | 0.77 | 0.69 | 0.77 | 0.77 |
| attachment/fusion | 1930067 | 0.83 | 0.51 | 0.64 | 0.65 |
| attachment/fusion | P09592 | 0.66 | 0.77 | 0.57 | 0.74 |
| RNA encapsidation | 13559809 | 0.62 | 0.48 | 0.50 | 0.46 |
| enzyme/secreted cytolysin | P13128 | 0.45 | 0.47 | 0.43 | 0.45 |
| RNA encapsidation | 37724690 | 0.29 | 0.74 | 0.26 | 0.67 |
| toxin/translocation/binary-toxin | Q46221 | 0.42 | 0.28 | 0.10 | 0.11 |
| attachment/capsid | P30129 | 0.77 | 0.35 | 0.07 | 0.29 |
| DNA replication | 138881 | 0.31 | 0.31 | 0.14 | 0.13 |
| toxin | P01558 | 0.41 | 0.49 | 0.29 | 0.41 |
| host evasion/IgG binding | 13622466 | 0.88 | 0.61 | 0.86 | 0.76 |
| attachment/fusion | 116774 | 0.60 | 0.40 | 0.26 | 0.41 |
| enzyme/protease | P10845 | 0.91 | 0.55 | 0.60 | 0.77 |
| attachment/fusion | P08669 | 0.63 | 0.58 | 0.67 | 0.70 |
| attachment (?) | Q02938 | 0.82 | 0.60 | 0.54 | 0.73 |
| enzyme/protease/tissue invasion | 30260755 | 0.18 | 0.80 | 0.94 | 0.85 |
| unknown function | P13664 | 0.46 | 0.68 | 0.91 | 0.70 |
| unknown function | 42374894 | 0.62 | 0.58 | 0.64 | 0.66 |
| toxin/translocation/binary-toxin | P13423 | 0.59 | 0.62 | 0.22 | 0.56 |
| unknown function | 14162008 | 0.23 | 0.23 | 0.11 | 0.08 |
| attachment/fusion | P59594 | 0.57 | 0.45 | 0.59 | 0.52 |
| unknown function | 13621499 | 0.35 | 0.30 | 0.65 | 0.36 |
| attachment (?) | 13622014 | 0.27 | 0.46 | 0.16 | 0.39 |
| unknown function | 15675130 | 0.64 | 0.96 | 0.73 | 0.92 |
| unknown function | 13622584 | 0.63 | 0.56 | 0.72 | 0.65 |
| unknown function | 790646 | 0.98 | 0.83 | 0.25 | 0.95 |
| host evasion/resistance to phagocytosis (?) | P26948 | 0.62 | 0.57 | 0.61 | 0.70 |
| unknown function | P21206 | 0.54 | 0.52 | 0.43 | 0.58 |
| enzyme | 153640 | 0.35 | 0.88 | 0.80 | 0.87 |
| transport | 13623184 | 0.55 | 0.70 | 0.65 | 0.69 |
| attachment/fusion | P35253 | 0.56 | 0.45 | 0.27 | 0.38 |
| host evasion/resistance to phagocytosis/toxin | P55128 | 1.00 | 0.40 | 0.54 | 0.73 |
| host evasion/superantigen | P06886 | 0.90 | 0.68 | 0.37 | 0.80 |
| toxin | P0A0L2 | 0.43 | 0.54 | 0.49 | 0.46 |
| attachment | 97812 | 0.80 | 0.57 | 0.80 | 0.73 |
| attachment/fusion | P27662 | 0.66 | 0.43 | 0.72 | 0.50 |
| transport | 13621681 | 0.64 | 0.41 | 0.54 | 0.46 |
| toxin/peptidase/binary-toxin | P15917 | 0.51 | 0.22 | 0.65 | 0.21 |
| attachment/fusion | P33478 | 0.71 | 0.61 | 0.69 | 0.69 |
| attachment/fusion | P07946 | 0.71 | 0.46 | 0.31 | 0.53 |
| attachment/fusion | Q05320 | 0.43 | 0.54 | 0.18 | 0.38 |
| attachment/RNA encapsidation | P03308 | 0.88 | 0.86 | 0.71 | 0.82 |
| attachment/fusion/neuraminidase | O89343 | 0.19 | 0.88 | 0.31 | 0.71 |
| attachment/fusion | P05769 | 0.93 | 0.97 | 0.91 | 0.97 |
| attachment/RNA encapsidation | P08617 | 0.86 | 0.82 | 0.86 | 0.90 |
Proteins highlighted in bold are markedly homologous or identical to sequences used for the training of the PCA19 antigenicity classifier. Values derived from those proteins are denoted as "contaminated". The last five rows compare overall (global) classification performances of concatenated (merged) proteins as well as mean and median values between individual parameters and the combined score. Note that "merged" does not mean averaged and is thus biased by the length of individual proteins.
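As an illustration of how a per-protein AROC for the combined score could be obtained, the following is a minimal sketch, not the authors' implementation. It assumes that all three features are scaled to the range 0 to 1, that the sum score is the plain sum of antigenicity, 1 – modifications and 1 – variability, and that each residue carries a binary label (inside or outside a protective epitope). The function names and the toy values are hypothetical.

```python
from typing import Sequence


def sum_score(antigenicity: float, modification: float, variability: float) -> float:
    """Combine the three features; modification and variability enter as 1 - value.

    Assumption: all inputs are already scaled to [0, 1].
    """
    return antigenicity + (1.0 - modification) + (1.0 - variability)


def aroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney statistic).

    labels: 1 = residue inside a protective epitope, 0 = outside.
    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue features: (antigenicity, modification, variability).
features = [(0.9, 0.1, 0.2), (0.8, 0.0, 0.1), (0.3, 0.6, 0.7), (0.4, 0.2, 0.9), (0.2, 0.5, 0.8)]
labels = [1, 1, 0, 0, 1]
combined = [sum_score(*f) for f in features]
print(round(aroc(combined, labels), 2))
```

An AROC of 0.5 corresponds to random ranking, which is why values well below 0.5 in the table indicate a feature that ranks protective regions lower than non-protective ones for that protein.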
The table lists AROC performance of protectivity classifications for individual proteins based on different parameters
| 10 | 7 | 16 | 7 | |
| 21 | 19 | 16 | 30 | |
| 0.62 | 0.56 | 0.56 | 0.65 | |
| 0.55 | 0.59 | 0.45 | 0.63 |
Note that, measured by the median AROC, non-contaminated proteins unexpectedly seem to perform better than contaminated sequences. The combined classifier yields higher median AROC values than classifications based on individual parameters. Possibly more important, the number of proteins with AROC values ≥ 0.65 is substantially higher for the combined score (protein count 30) than for the best single score, antigenicity (protein count 21).
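The two summary statistics discussed here, the median AROC and the number of proteins at or above 0.65, can be computed as in the short sketch below. It uses only the first five rows of the per-protein table (first and last numeric columns, assumed to be antigenicity and the combined score), so the results do not reproduce the full-table figures; the function name is hypothetical.

```python
from statistics import median
from typing import Dict, Sequence


def summarize(aroc_values: Sequence[float], threshold: float = 0.65) -> Dict[str, float]:
    """Return the median AROC and the count of proteins at or above the threshold."""
    return {
        "median": median(aroc_values),
        "n_at_or_above": sum(v >= threshold for v in aroc_values),
    }


# First five rows of the per-protein table only (illustrative subset).
antigenicity = [0.51, 0.77, 0.83, 0.66, 0.62]
combined = [0.15, 0.77, 0.65, 0.74, 0.46]

print(summarize(antigenicity))
print(summarize(combined))
```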
Figure 3. The figure shows the distribution of AROC values for protectivity predictions based solely on antigenicity, PTM (post-translational modifications), or variability, as well as on the sum score of the three.
The table lists numbers of protective epitopes exhibiting negative or positive feature association when considering all six possible combinations of antigenicity, modification percentage, variability and evolutionary constraint index (ECI)
| 0/110 | 33/14 | 31/26 | 26/23 | |
| - | 0/110 | 22/27 | 19/21 | |
| - | - | 0/110 | 15/31 | |
| - | - | - | 0/110 |
Note that, as for the prediction of protectivity, the modification and variability scores (features) were used as 1 – value, so the actual associations have to be inverted as well. Each field in the table contains a pair of numbers, where the first and second indicate the number of epitopes associated with a Pearson coefficient of ≤ -0.5 and ≥ 0.5, respectively. High numbers therefore indicate a strong degree of feature association.
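The counting scheme described in this note can be sketched as follows. This is an illustration under assumptions: each epitope is represented by two equally long lists of per-residue feature values, and the Pearson coefficient is computed over the residues of that epitope. The function names and the toy data are hypothetical.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / sqrt(sxx * syy)


def count_associations(
    epitopes: List[Tuple[Sequence[float], Sequence[float]]], cutoff: float = 0.5
) -> Tuple[int, int]:
    """Return (negative, positive): epitopes with r <= -cutoff and r >= cutoff."""
    negative = positive = 0
    for feature_a, feature_b in epitopes:
        r = pearson(feature_a, feature_b)
        if r <= -cutoff:
            negative += 1
        elif r >= cutoff:
            positive += 1
    return negative, positive


# Three hypothetical epitopes with r = 1, r = -1 and r = 0.
toy_epitopes = [
    ([1, 2, 3, 4], [2, 4, 6, 8]),
    ([1, 2, 3, 4], [4, 3, 2, 1]),
    ([1, 2, 3, 4], [2, 4, 1, 3]),
]
print(count_associations(toy_epitopes))
```

Correlating a feature with itself always gives r = 1, which is consistent with the diagonal entries of the table, where no epitope falls in the negative bin.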
The table shows the class separation measured by 10-fold stratified cross-validation using a C4.5 decision tree learner on the training dataset (MTS). In each row, a different combination of input parameters was used, as indicated by X (present) in the last four columns. The separation baseline for this dataset is 50%.
| Nr | %sep | TN | FP | FN | TP | FPrate | TPrate | score | antigen | PTM | var |
| a | 57.78 | 1086 | 759 | 799 | 1046 | 0.41 | 0.57 | ||||
| b | 54.58 | 1242 | 603 | 1073 | 772 | 0.33 | 0.42 | ||||
| c | 49.86 | 920 | 925 | 925 | 920 | 0.50 | 0.50 | ||||
| d | 60.08 | 1382 | 463 | 1010 | 835 | 0.25 | 0.45 | ||||
| e | 70.41 | 1399 | 446 | 646 | 1199 | 0.24 | 0.65 | ||||
| f | 63.47 | 1607 | 238 | 1110 | 735 | 0.13 | 0.40 | ||||
| g | 65.42 | 1337 | 508 | 768 | 1077 | 0.28 | 0.58 | ||||
| h | 65.28 | 1259 | 586 | 695 | 1150 | 0.32 | 0.62 | ||||
| i | 66.18 | 1263 | 582 | 666 | 1179 | 0.32 | 0.64 | ||||
| j | 56.02 | 1298 | 547 | 1076 | 769 | 0.30 | 0.42 | ||||
| k | 61.00 | 1446 | 399 | 1041 | 804 | 0.22 | 0.44 | ||||
| l | 63.82 | 1520 | 325 | 1010 | 835 | 0.18 | 0.45 |
Column abbreviations: %sep (% class separation), score (sum score), antigen (PCA19 derived antigenicity score), PTM (post-translational modifications), var (variability score), TPrate (True Positive rate) and FPrate (False Positive rate).
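The derived columns can be recomputed from the four confusion-matrix counts. The sketch below assumes that %sep is the overall fraction of correctly classified instances, an interpretation that is consistent with the tabulated values; the function name is hypothetical.

```python
from typing import Dict


def confusion_metrics(tn: int, fp: int, fn: int, tp: int) -> Dict[str, float]:
    """Derive %sep, FPrate and TPrate from confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "pct_sep": 100.0 * (tn + tp) / total,  # correctly classified, in percent
        "fp_rate": fp / (fp + tn),             # false positives among all negatives
        "tp_rate": tp / (tp + fn),             # true positives among all positives
    }


# Counts from row e of the table.
row_e = confusion_metrics(tn=1399, fp=446, fn=646, tp=1199)
print({name: round(value, 2) for name, value in row_e.items()})
# prints {'pct_sep': 70.41, 'fp_rate': 0.24, 'tp_rate': 0.65}
```

Because each class contributes 1845 instances in every row (TN + FP = FN + TP = 1845), the 50% baseline corresponds to assigning all instances to a single class.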
Figure 4. The figure shows the ROC plot of C4.5 decision trees used to determine the relevance of individual parameters and parameter combinations for the prediction of protectivity. For details on the obtained classifications, please see Table 4.
The table compares the performance of a decision tree derived using the C4.5 algorithm and a Random Forest, both generated with standard parameters. Ten-fold stratified cross-validation (on MTS) and validation on an independent protectivity validation set (MVS) are compared
| | C4.5 %sep | C4.5 FPrate | C4.5 TPrate | Random Forest %sep | Random Forest FPrate | Random Forest TPrate |
| Cross-validation | 70.41 | 0.24 | 0.65 | 83.39 | 0.15 | 0.82 |
| Validation set | 73.64 | 0.26 | 0.73 | 84.27 | 0.15 | 0.83 |
Column abbreviations: %sep (% class separation), FPrate (False Positive rate) and TPrate (True Positive rate).
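For readers unfamiliar with stratified cross-validation, the sketch below shows one simple way to build ten folds that preserve the class proportions. It is a plain-Python illustration, not the tooling used in the study; the function name, the seed and the round-robin assignment are assumptions. The class sizes of 1845 each are taken from the confusion-matrix counts in the C4.5 table.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds with near-equal class proportions."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so every fold receives its share of this class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Balanced labels mirroring the 1845/1845 class sizes implied by the C4.5 table.
labels = [0] * 1845 + [1] * 1845
folds = stratified_folds(labels)
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # A classifier would be fitted on train_idx and evaluated on test_idx here.
```

Each instance is held out exactly once, so the confusion-matrix counts summed over the ten folds cover the whole training set.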
Figure 5. The figure plots the classification results of the C4.5 tree (in blue) and the Random Forest (in green), as obtained during validation, in comparison to the signal peptide (thick grey bar) and the published continuous epitope (thick red bar). The values of the sum score are plotted on top. The last line indicates the hypothetical selection of five 17-mers (slim red bars) based on the Random Forest prediction, as it performed best during cross-validation.
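The source does not state how the five 17-mers were chosen from the per-residue prediction. One possible approach, shown here purely as a hypothetical sketch, is to rank all 17-residue windows by their mean predicted score and greedily keep the best non-overlapping ones. The function name, the greedy strategy and the toy score profile are all assumptions.

```python
from typing import List, Sequence, Tuple


def pick_windows(scores: Sequence[float], width: int = 17, count: int = 5) -> List[Tuple[int, float]]:
    """Greedily choose non-overlapping windows with the highest mean score.

    Returns (start_index, mean_score) pairs sorted by start index.
    """
    if len(scores) < width:
        return []
    # Mean score of every window, computed with a running sum.
    running = sum(scores[:width])
    means = [running / width]
    for i in range(width, len(scores)):
        running += scores[i] - scores[i - width]
        means.append(running / width)
    ranked = sorted(range(len(means)), key=lambda start: means[start], reverse=True)
    chosen: List[int] = []
    for start in ranked:
        # Two windows overlap when their starts are closer than one window width.
        if all(abs(start - other) >= width for other in chosen):
            chosen.append(start)
            if len(chosen) == count:
                break
    return sorted((start, means[start]) for start in chosen)


# Hypothetical per-residue prediction with two high-scoring stretches.
profile = [0.0] * 100
profile[10:27] = [1.0] * 17
profile[50:67] = [0.5] * 17
print(pick_windows(profile, width=17, count=2))
```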