Jesper Salomon, Darren R Flower.
Abstract
BACKGROUND: Modelling the interaction between potentially antigenic peptides and Major Histocompatibility Complex (MHC) molecules is a key step in identifying potential T-cell epitopes. For Class II MHC alleles, the binding groove is open at both ends, causing ambiguity in the positional alignment between the groove and peptide, as well as creating uncertainty as to what parts of the peptide interact with the MHC. Moreover, the antigenic peptides have variable lengths, making naive modelling methods difficult to apply. This paper introduces a kernel method that can handle variable length peptides effectively by quantifying similarities between peptide sequences and integrating these into the kernel.
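The abstract describes the kernel only in outline. As a minimal sketch, assuming a convolution-style kernel that compares every pair of equal-length substrings using a substitution matrix, the code below shows how peptides of different lengths can be compared without fixing a groove alignment. This is an illustrative construction, not the authors' SKM: `toy_score` is a hypothetical stand-in for a real substitution matrix such as BLOSUM62, and the substring length `k` and scale `beta` are assumed parameters (this `beta` is not claimed to be the β reported in the results tables).

```python
import math

# Hypothetical toy substitution scores, NOT BLOSUM62 or any AAindex matrix:
# +4 for identical residues, +1 for a few conservative pairs, -1 otherwise.
CONSERVATIVE_PAIRS = {frozenset(p) for p in ("IL", "IV", "LV", "KR", "DE", "ST", "FY")}


def toy_score(a: str, b: str) -> int:
    """Symmetric substitution score for residues a and b (illustrative only)."""
    if a == b:
        return 4
    if frozenset((a, b)) in CONSERVATIVE_PAIRS:
        return 1
    return -1


def substring_kernel(x: str, y: str, k: int = 3, beta: float = 0.1, score=toy_score) -> float:
    """Sum exp(beta * S(u, v)) over every length-k substring u of x and v of y,
    where S(u, v) is the summed position-wise substitution score.
    Peptides of different lengths are compared without any fixed alignment."""
    total = 0.0
    for i in range(len(x) - k + 1):
        u = x[i:i + k]
        for j in range(len(y) - k + 1):
            v = y[j:j + k]
            s = sum(score(a, b) for a, b in zip(u, v))
            total += math.exp(beta * s)
    return total


def normalised_kernel(x: str, y: str, **kwargs) -> float:
    """Cosine-style normalisation so that K(x, x) == 1 for every peptide.
    Assumes both peptides are at least k residues long (otherwise this divides by zero)."""
    kxy = substring_kernel(x, y, **kwargs)
    return kxy / math.sqrt(substring_kernel(x, x, **kwargs) * substring_kernel(y, y, **kwargs))
```

For example, `normalised_kernel("PKYVKQNTLKLAT", "KYVKQNTLKLA")` compares a 13-mer with an 11-mer directly. Because the kernel sums over all substring pairs, it is positive semidefinite whenever the residue-level matrix exp(β·S) is; this sketch does not check that condition for `toy_score`.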
Year: 2006 PMID: 17105666 PMCID: PMC1664591 DOI: 10.1186/1471-2105-7-501
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Overview of data sets
| Database | Data set | Peptides | Binders | Non-binders |
|---|---|---|---|---|
| MHCBN | HLA-DRB1*0101 | 580 | 475 | 105 |
| | HLA-DRB1*0301 | 369 | 219 | 150 |
| MHCBench | Set 1 | 1017 | 694 | 323 |
| | Set 2 | 673 | 381 | 292 |
| | Set 3a | 590 | 373 | 217 |
| | Set 3b | 495 | 279 | 216 |
| | Set 4a | 646 | 323 | 323 |
| | Set 4b | 584 | 292 | 292 |
| | Set 5a | 117 | 70 | 47 |
| | Set 5b | 85 | 48 | 37 |
| MHCPEP | 20 sets from MHC alleles | 3578 | 3578 | 0 |
Overview of the benchmark data sets. MHCBench Sets 1–5 contain data from the HLA-DRB1*0401 allele. MHCPEP consists of data from numerous alleles, with 18 MHC Class II and a single MHC Class I allele selected.
Figure 1. Evaluating performance for varying values of the β parameter using 10-fold CV. Graphs are plotted for accuracy (proportion of correct predictions), sensitivity (proportion of binders correctly predicted as binders, i.e. the true positive rate), specificity (proportion of non-binders correctly predicted as non-binders, i.e. the true negative rate), and AROC (area under the receiver operating characteristic curve).
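The four measures plotted in Figure 1 can be computed from true labels and predicted scores as sketched below. This is a minimal illustration assuming labels coded 1 = binder and 0 = non-binder and a hypothetical decision threshold of 0.5; the source does not state the threshold used. AROC is computed with the Mann-Whitney formulation, which equals the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn), with 1 = binder and 0 = non-binder."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def threshold_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity (TP rate) and specificity (TN rate) at a fixed threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def aroc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney statistic: the probability
    that a randomly chosen binder outscores a randomly chosen non-binder (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
# All three threshold metrics are 2/3 here, and AROC is 8/9.
print(threshold_metrics(y_true, scores), aroc(y_true, scores))
```

Note that AROC depends only on the ranking of scores, whereas accuracy, sensitivity and specificity change with the chosen threshold.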
Results from varying substitution matrices
| HENS920102 | BLOSUM62. Matrix based on possible pair-wise substitutions from aligned segments of polypeptides [34] | 0.05 | 0.8049 | 0.8708 | 0.532 | 0.543 |
| BLAJ010101 | Matrix built from structural superposition data for identifying potential remote homologues [52] | 0.027 | 0.8207 | 0.8752 | 0.540 | 0.571 |
| DOSZ010104 | SM_THREADER_NORM. Amino acid similarity matrices based on force fields (Normalised version) [36] | 0.045 |
Performance with different scoring matrices, evaluated using 10-fold CV. The test measurements are the same as in the previous experiment. Best values are shown in bold.
Results on HLA-DRB1*0101 and HLA-DRB1*0301
| HLA-DRB1*0101 | SKM (β = 0.04) | 0.886 | 0.643 |
| | SKM (β = 0.085) | 0.904 | 0.778 | 0.096 |
| | LP_top2 | 0.779 | 0.221 |
| | TEPITOPE | 0.842 | 0.158 |
| HLA-DRB1*0301 | SKM (β = 0.06) | 0.823 | 0.177 |
| | SKM (β = 0.08) | 0.757 | 0.575 | 0.525 |
| | LP_top2 | 0.721 | 0.279 |
| | TEPITOPE | 0.585 | 0.415 |
Results of 5-fold cross-validation, with best results shown in bold. Results for LP_top2 and TEPITOPE are taken from [21]. Measurements are the same as previously reported, except for AOVER-ROC, the area over the ROC curve. Since AROC = 1.00 corresponds to perfect classification, AOVER-ROC = 1 - AROC can be read as an error measure.
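The caption reports 5-fold cross-validation and defines AOVER-ROC as 1 - AROC. A minimal sketch of fold assignment and the error measure follows. The source does not state whether its folds were stratified; stratifying by binder/non-binder label, and the function names below, are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds so each fold gets a near-equal share of every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            # Continue the round-robin from where the previous class stopped,
            # so fold sizes stay balanced overall as well as per class.
            folds[(offset + pos) % k].append(idx)
        offset += len(idxs)
    return folds


def train_test_splits(folds):
    """Yield (train, test) index lists, holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def aover_roc(aroc_value):
    """Area over the ROC curve, used as an error measure: 1 - AROC."""
    return 1.0 - aroc_value
```

Each held-out fold is scored once, so every sample contributes to exactly one test prediction. For instance, `aover_roc(0.904)` gives 0.096, matching the pairing of those two values in the table above.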
Comparison of AROC values on HLA-DRB1*0401 data sets from MHCBench
| TEPITOPE | 0.776 | 0.740 | 0.740 | 0.754 | 0.763 | 0.750 | 0.651 | 0.661 | 0.729 |
| PERUN | 0.771 | 0.685 | 0.693 | 0.713 | 0.724 | 0.672 | 0.695 | 0.714 | 0.708 |
| Gibbs Sampler² | 0.803 | 0.775 | 0.75 | 0.762 | 0.793 | 0.787 | 0.621¹ | 0.661¹ | 0.744 |
| LP_top2² | 0.725 | 0.721 | 0.728 | 0.753 | 0.719 | 0.728 | 0.756 |
| SKM | 0.787 | 0.770 |
Comparing performance of SKM with results reported for the Gibbs Sampling method [27], "LP_top2" [21], and PERUN [7]. Best results shown in bold.
1: Best reported results, where Cysteines are treated as Alanines [27].
2: Best reported results of [21].
Results of LP_top2 and the Gibbs Sampler are from evaluation on the MHCBench sets. However, as described in [21], training was performed on a training set consisting of selected samples from MHCPEP [1] and SYFPEITHI [53]. Since MHCBench mainly consists of samples from MHCPEP, there is a large overlap between training and test sets (e.g. 502 of the 646 samples in Set 4a).
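Train/test overlap of this kind can be checked directly before comparing methods. A minimal sketch follows; the source does not say how the 502 shared samples were counted, so exact sequence matching is an assumption. The shared-k-mer check is an extra heuristic added here for near-duplicates (k = 9 is chosen because Class II binding cores are commonly described as nine residues), and all function names are hypothetical.

```python
def _clean(peptide: str) -> str:
    """Normalise case and surrounding whitespace before comparing sequences."""
    return peptide.strip().upper()


def exact_overlap(train, test):
    """Count test peptides whose exact sequence also appears in the training set."""
    train_set = {_clean(p) for p in train}
    shared = sum(1 for p in test if _clean(p) in train_set)
    return shared, len(test)


def kmer_overlap(train, test, k=9):
    """Count test peptides sharing at least one length-k substring with any training peptide."""
    train_kmers = {p[i:i + k] for p in map(_clean, train) for i in range(len(p) - k + 1)}
    count = 0
    for p in map(_clean, test):
        if any(p[i:i + k] in train_kmers for i in range(len(p) - k + 1)):
            count += 1
    return count
```

Exact matching gives a lower bound on leakage; the k-mer variant also flags overlapping peptides that share a likely binding core but differ in their flanks.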
Results of SKM on multiple MHC Class II alleles from MHCPEP
| HLA-DR1¹ | Human | 1346 | 0.04 | 0.9123 | 0.9153 | 0.9094 | 0.9712 | 0.8460 | 0.8247 | 0.0288 |
| - *0101 | | 474 | 0.06 | 0.8987 | 0.9114 | 0.8861 | 0.9673 | 0.8864 | 0.7977 | 0.0327 |
| - *0102 | | 12 | 0.005 | 0.8333 | 0.6667 | 1 | 0.9444 | 0.9444 | 0.7071 | 0.0556 |
| HLA-DR2¹ | Human | 648 | 0.15 | 0.9059 | 0.9692 | 0.8426 | 0.9608 | 0.8701 | 0.8183 | 0.0392 |
| - *0201 | | 44 | 0.8 | 0.8864 | 0.9091 | 0.8636 | 0.9360 | 0.9360 | 0.7735 | 0.0640 |
| HLA-DR3¹ | Human | 378 | 0.15 | 0.9101 | 0.9577 | 0.8624 | 0.9750 | 0.9216 | 0.8239 | 0.0250 |
| - *0301 | | 242 | 0.02 | 0.9339 | 0.9008 | 0.9669 | 0.9847 | 0.9676 | 0.8697 | 0.0153 |
| HLA-DR4¹ | Human | 1742 | 0.125 | 0.9248 | 0.9460 | 0.9036 | 0.9749 | 0.8677 | 0.8504 | 0.0251 |
| - *0401 | | 910 | 0.125 | 0.8890 | 0.9187 | 0.8593 | 0.9521 | 0.7989 | 0.7794 | 0.0479 |
| - *0402 | | 240 | 0.07 | 0.9 | 0.925 | 0.875 | 0.9717 | 0.9365 | 0.8010 | 0.0282 |
| HLA-DR5 | Human | 398 | 0.125 | 0.9171 | 0.9799 | 0.8542 | 0.9717 | 0.9166 | 0.8408 | 0.0283 |
| HLA-DR6 | Human | 46 | 0.25 | 0.9348 | 1 | 0.8696 | 0.9981 | 0.9981 | 0.8771 | 0.0019 |
| HLA-DR7 | Human | 528 | 0.1 | 0.9034 | 0.9659 | 0.8409 | 0.9696 | 0.8965 | 0.8132 | 0.0304 |
| HLA-DR8 | Human | 160 | 0.06 | 0.8938 | 0.8625 | 0.925 | 0.9683 | 0.9505 | 0.7890 | 0.0317 |
| HLA-DR9 | Human | 192 | 0.2 | 0.9375 | 0.9896 | 0.8854 | 0.9779 | 0.9575 | 0.8798 | 0.0221 |
| HLA-DR10 | Human | 12 | 5 | 0.5833 | 0.6667 | 0.5 | 0.6389 | 0.6389 | 0.1690 | 0.3611 |
| HLA-DR11 | Human | 590 | 0.03 | 0.9169 | 0.9390 | 0.8949 | 0.9615 | 0.8847 | 0.8347 | 0.0385 |
| HLA-DR14 | Human | 126 | 1 | 0.9762 | 1 | 0.9524 | 0.9934 | 0.9917 | 0.9535 | 0.0066 |
| HLA-DR17 | Human | 308 | 0.03 | 0.9448 | 0.9545 | 0.9351 | 0.9802 | 0.9579 | 0.8898 | 0.0198 |
| HLA-DR53 | Human | 72 | 0.2 | 0.8889 | 1 | 0.7778 | 0.9931 | 0.9931 | 0.7977 | 0.0069 |
| HLA-DP9 | Human | 90 | 0.2 | 0.9889 | 0.9778 | 1 | 1 | 1 | 0.9780 | 0 |
| HLA-DPw4 | Human | 38 | 0.01 | 0.7895 | 0.8421 | 0.7368 | 0.9058 | 0.9058 | 0.5822 | 0.0942 |
| HLA-DQ1 | Human | 78 | 0.02 | 0.8974 | 0.9231 | 0.8718 | 0.9579 | 0.9579 | 0.7959 | 0.0420 |
| HLA-DQ2 | Human | 210 | 0.08 | 0.8952 | 0.9714 | 0.8190 | 0.9664 | 0.936 | 0.7998 | 0.0336 |
| HLA-DQ4 | Human | 194 | 0.2 | 0.8866 | 0.8969 | 0.8763 | 0.9557 | 0.9188 | 0.7734 | 0.0443 |
Performance on multiple alleles, evaluated using 10-fold CV. Average scores, weighted by number of samples, are shown underneath.
1: All binders belonging to a group of alleles.
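The caption states that averages are weighted by number of samples. A minimal sketch of that weighting follows, using the sample counts and one score column (0.9717 and 0.9696) from the HLA-DR5 and HLA-DR7 rows purely as illustrative inputs; the helper name is hypothetical.

```python
def weighted_mean(values, weights):
    """Average of values, each weighted by its sample count."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Two rows from the table above (HLA-DR5 and HLA-DR7): sample counts and one score column.
samples = [398, 528]
scores = [0.9717, 0.9696]
print(round(weighted_mean(scores, samples), 4))  # 0.9705
```

The source does not say whether sub-allele rows (e.g. *0101) were counted separately from their allele-group rows when averaging; including both would count the same peptides twice, so the caller must choose which rows to pass in.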