Morten Nielsen, Claus Lundegaard, Ole Lund.
Abstract
BACKGROUND: Antigen presenting cells (APCs) sample the extracellular space and present peptides from it to T helper cells, which can be activated if the peptides are of foreign origin. The peptides are presented on the surface of the cells in complex with major histocompatibility class II (MHC II) molecules. Identification of peptides that bind MHC II molecules is thus a key step in rational vaccine design, and the development of methods for accurate prediction of peptide:MHC interactions plays a central role in epitope discovery. The MHC class II binding groove is open at both ends, making the correct alignment of a peptide in the binding groove a crucial part of identifying the core of an MHC class II binding motif. Here, we present a novel stabilization matrix alignment method, SMM-align, that allows for direct prediction of peptide:MHC binding affinities. The predictive performance of the method is validated on a large MHC class II benchmark data set covering 14 HLA-DR (human MHC) and three mouse H2-IA alleles.
Year: 2007 PMID: 17608956 PMCID: PMC1939856 DOI: 10.1186/1471-2105-8-238
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Summary of the HLA-DR benchmark results.
| Measure | Subset | SMM-align | Gibbs sampler | TEPITOPE | SVRMHC | MHCpred | ARB | SMM-PFR | NetMHCII | Alleles |
|---|---|---|---|---|---|---|---|---|---|---|
| AUC | A | 0.730 | 0.697 | | | | 0.758 | 0.749 | 0.756 | 14 |
| AUC | B | 0.740 | 0.705 | 0.736 | | | 0.762 | 0.748 | 0.750 | 11 |
| AUC | C | 0.710 | 0.690 | 0.714 | 0.688 | | 0.719 | 0.719 | 0.722 | 5 |
| AUC | D | 0.737 | 0.710 | 0.723 | | 0.606 | 0.717 | 0.749 | 0.754 | 3 |
| Pearson | A | 0.420 | 0.368 | | | | 0.464 | 0.436 | 0.448 | 14 |
| Pearson | C | 0.408 | 0.369 | | 0.157 | | 0.431 | 0.428 | 0.435 | 5 |
| Pearson | D | 0.458 | 0.384 | | | 0.218 | 0.425 | 0.480 | 0.487 | 3 |
| Spearman | A | 0.430 | 0.372 | | | | 0.464 | 0.445 | 0.453 | 14 |
| Spearman | B | 0.443 | 0.378 | 0.428 | | | 0.479 | 0.458 | 0.463 | 11 |
| Spearman | C | 0.398 | 0.353 | 0.430 | 0.377 | | 0.424 | 0.422 | 0.427 | 5 |
| Spearman | D | 0.450 | 0.365 | 0.434 | | 0.210 | 0.407 | 0.474 | 0.481 | 3 |
The predictive performance is shown in terms of the average area under the ROC curve (upper panel), the average Pearson's correlation (middle panel), and the average Spearman's rank correlation (lower panel) for the SMM (SMM-align), Gibbs sampler [4], TEPITOPE [3], SVRMHC [7], MHCpred [15], and ARB [12] methods, respectively. SMM-PFR refers to the extended SMM-align method, which includes penalties for longer peptides and for short amino-terminal peptide flanking residues (PFRs). NetMHCII refers to the extended SMM-align method that includes direct encoding of the PFRs in addition to these penalties. The last column gives the number of alleles included in each average. A shows the average performance for all 14 HLA-DR alleles, B the average performance for the subset of 11 alleles covered by the TEPITOPE method, C the average performance for the five alleles covered by the SVRMHC method, and D the average performance for the three alleles covered by the MHCpred method. For each allele, the performance of the SMM-align, Gibbs sampler, and NetMHCII methods was estimated using five-fold cross-validation as described in Methods.
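The three performance measures named in this caption can be sketched in plain Python. This is a minimal illustration, not the authors' evaluation code. The toy IC50 values and predicted scores are invented, the 500 nM binder threshold is the one quoted elsewhere in this record, and the use of -log10(IC50) as the measured value for the correlations is an assumption made here for illustration.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def roc_auc(scores, is_binder):
    """Area under the ROC curve via the Mann-Whitney rank-sum identity."""
    ranks = average_ranks(scores)
    n_pos = sum(is_binder)
    n_neg = len(is_binder) - n_pos
    rank_sum = sum(r for r, b in zip(ranks, is_binder) if b)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Invented toy data: measured IC50 (nM) and predicted scores (higher = stronger).
ic50_nm = [20.0, 150.0, 480.0, 900.0, 5000.0, 30000.0]
predicted = [0.80, 0.55, 0.60, 0.40, 0.45, 0.10]

measured = [-math.log10(v) for v in ic50_nm]   # assumption: log-scale affinity
is_binder = [v < 500.0 for v in ic50_nm]       # binders: stronger than 500 nM

auc = roc_auc(predicted, is_binder)
pcc = pearson(predicted, measured)
src = spearman(predicted, measured)
```

In a cross-validated setting, these functions would be applied to the predictions for the held-out peptides of each fold. How the folds were built is described in the paper's Methods and is not reproduced here.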
Details of the benchmark calculation covering the 14 HLA-DR alleles.
| Allele | SMM-align | Gibbs sampler | TEPITOPE | SVRMHC | MHCpred | ARB | SMM-PFR | NetMHCII | Peptides |
|---|---|---|---|---|---|---|---|---|---|
| 1*0101 | 0.702 | 0.676 | 0.647 | 0.623 | 0.565 | 0.666 | 0.716 | 0.716 | 1203 |
| 1*0301 | 0.779 | 0.722 | 0.734 | | | 0.799 | 0.770 | 0.765 | 474 |
| 1*0401 | 0.741 | 0.759 | 0.754 | 0.739 | 0.606 | 0.737 | 0.756 | 0.758 | 457 |
| 1*0404 | 0.798 | 0.743 | 0.829 | | | 0.788 | 0.808 | 0.785 | 168 |
| 1*0405 | 0.727 | 0.724 | 0.790 | 0.701 | | 0.724 | 0.733 | 0.735 | 171 |
| 1*0701 | 0.768 | 0.695 | 0.768 | | 0.647 | 0.749 | 0.774 | 0.787 | 310 |
| 1*0802 | 0.724 | 0.721 | 0.769 | | | 0.803 | 0.740 | 0.756 | 174 |
| 1*0901 | 0.726 | 0.734 | | | | 0.711 | 0.759 | 0.775 | 117 |
| 1*1101 | 0.715 | 0.715 | 0.710 | | | 0.727 | 0.720 | 0.734 | 359 |
| 1*1302 | 0.810 | 0.716 | 0.720 | | | 0.917 | 0.819 | 0.818 | 179 |
| 1*1501 | 0.715 | 0.672 | 0.726 | 0.730 | | 0.792 | 0.733 | 0.736 | 365 |
| 3*0101 | 0.620 | 0.512 | | | | 0.717 | 0.771 | 0.815 | 102 |
| 4*0101 | 0.730 | 0.742 | | | | 0.800 | 0.729 | 0.736 | 181 |
| 5*0101 | 0.664 | 0.618 | 0.653 | 0.649 | | 0.677 | 0.655 | 0.664 | 343 |
The predictive performance is shown in terms of the area under the ROC curve (AUC) for the SMM-align, Gibbs sampler [4], TEPITOPE [3], SVRMHC [7], MHCpred [15], and ARB [12] methods, respectively. SMM-PFR refers to the extended SMM-align method, which includes penalties for long peptides and for short amino-terminal peptide flanking residues (PFRs). NetMHCII refers to the final extended SMM-align method, which includes direct encoding of the PFRs in addition to these penalties. The first column gives the allele names, written as 1*0101 for DRB1*0101, etc. The last column gives the number of peptides included for each allele. For each allele, the performance of the SMM-align, Gibbs sampler, and NetMHCII methods was estimated using five-fold cross-validation as described in the text. The details of the benchmark calculation in terms of Pearson's and Spearman's rank correlation are shown in Supplementary materials table 1 [see Additional file 1].
Predictive performance in terms of the area under the ROC curve (AUC) of the different methods evaluated on six data sets.
| Method | AntiJen DRB1*0101 | AntiJen DRB1*0401 | AntiJen DRB1*1501 | IEDB DRB1*0101 | IEDB DRB1*0401 | IEDB DRB1*1501 |
|---|---|---|---|---|---|---|
| ISC-PLS | 0.709 | 0.757 | 0.609 | | | |
| SMM-align | 0.718 | 0.806 | 0.691 | 0.702 | 0.741 | 0.715 |
| TEPITOPE | 0.667 | 0.744 | 0.665 | 0.647 | 0.754 | 0.726 |
| Chang | 0.770 | 0.757 | 0.677 | | | |
| SMM-regr | 0.807 | 0.819 | 0.741 | 0.744 | 0.750 | 0.718 |
| SMM-regr-alter | 0.616 | 0.785 | 0.669 | 0.645 | 0.721 | 0.712 |
| SMM-PFR | 0.742 | 0.814 | 0.726 | 0.716 | 0.756 | 0.733 |
The methods are: ISC-PLS [15], SMM-align, TEPITOPE, Chang [11], SMM-regr (SMM with peptide length regression correction from the training data set), SMM-regr-alter (SMM with peptide length regression correction from the alternative AntiJen/IEDB data set), and SMM-PFR (the extended SMM-align method including penalties for long peptides and short amino-terminal peptide flanking residues). The data sets consist of peptide binding data from two sources (IEDB and AntiJen) covering three HLA-DR alleles (1*0101, 1*0401, and 1*1501). Performance values for the ISC-PLS and Chang methods are taken from Chang et al. [11]. These values are only available for the AntiJen data set.
Figure 1. Length distribution of amino-terminal PFRs for MHC-II binding and non-binding peptides. All peptide data for the three alleles in the AntiJen and IEDB data sets are included in the figure. Binding peptides have an affinity stronger than 500 nM. The PFR is defined as the residues flanking the peptide-binding core as determined by the SMM-align method.
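The PFR definition in this caption can be made concrete with a small sketch: once a binding core has been placed within a peptide, the amino-terminal PFR length is the number of residues before the core. The code below is an illustration only. The 9-residue core length is an assumption not stated in this record, and `TOY_MATRIX` is a hypothetical scoring matrix invented for the example, not the trained SMM-align weight matrix.

```python
CORE_LENGTH = 9  # assumption: 9-residue binding core

# Hypothetical toy scores: core position -> {amino acid: score}.
# Positions and values are invented for illustration only.
TOY_MATRIX = {
    0: {"F": 1.0, "Y": 0.9, "L": 0.6, "I": 0.6},
    3: {"A": 0.4, "L": 0.3},
    5: {"T": 0.4, "S": 0.3},
    8: {"L": 0.5, "A": 0.4},
}

def core_score(core):
    """Sum the toy matrix scores over the residues of a candidate core."""
    return sum(TOY_MATRIX.get(pos, {}).get(aa, 0.0) for pos, aa in enumerate(core))

def best_core(peptide):
    """Return (offset, core) for the highest-scoring window of CORE_LENGTH."""
    if len(peptide) < CORE_LENGTH:
        raise ValueError("peptide is shorter than the binding core")
    offsets = range(len(peptide) - CORE_LENGTH + 1)
    offset = max(offsets, key=lambda i: core_score(peptide[i:i + CORE_LENGTH]))
    return offset, peptide[offset:offset + CORE_LENGTH]

def pfr_lengths(peptide):
    """Return (amino-terminal, carboxy-terminal) PFR lengths around the best core."""
    offset, _core = best_core(peptide)
    return offset, len(peptide) - offset - CORE_LENGTH

example_peptide = "PKYVKQNTLKLAT"  # illustrative 13-residue peptide
offset, core = best_core(example_peptide)
n_term_pfr, c_term_pfr = pfr_lengths(example_peptide)
```

Because the groove is open at both ends, every window of the peptide is a candidate core, which is why the sketch scans all offsets before measuring the flanks.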
Summary of the mouse H2-IA benchmark.
| Allele | SMM | NetMHCII | ARB | PredBalbc | Peptides |
|---|---|---|---|---|---|
| H-2-IAb | 0.913 | 0.908 | 0.662 | | 76 |
| H-2-IAd | 0.819 | 0.818 | 0.819 | 0.659 | 342 |
| H-2-IAs | 0.877 | 0.898 | | | 126 |
The predictive performance is shown in terms of the average area under the ROC curve. The methods included in the benchmark are SMM (SMM-align), NetMHCII (the extended SMM-align method including direct encoding of peptide flanking residues and penalties for longer peptides and short amino terminal peptide flanking residues), ARB [12], and PredBalbc [22]. The performance of the SMM and NetMHCII methods was estimated using five-fold cross-validation as described in Methods.
Figure 2. Kullback-Leibler logo visualizations of peptide binding motifs. The upper panel depicts the motif for the DRB1*0101 allele, and the lower panel the motif for the DRB1*1302 allele. From left to right, the columns show the motif estimated by the SMM (NetMHCII), Gibbs sampler, and TEPITOPE methods, respectively. The height of a column in the logo is proportional to the relative information content in the sequence motif, and the letter height is proportional to the amino acid frequency [23].
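The column height described in this caption can be illustrated with a short calculation of the Kullback-Leibler information content per core position. This is a simplified sketch. It assumes a uniform background frequency of 1/20 per amino acid and uses raw observed frequencies without pseudocounts or sequence weighting, both of which a real logo tool may handle differently. The aligned cores are invented.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_information(cores, background=None):
    """Kullback-Leibler information (bits) at each position of aligned cores."""
    if background is None:
        # Assumption: uniform background over the 20 amino acids.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    info = []
    for pos in range(len(cores[0])):
        counts = Counter(core[pos] for core in cores)
        total = sum(counts.values())
        bits = 0.0
        for aa, n in counts.items():
            p = n / total
            bits += p * math.log2(p / background[aa])
        info.append(bits)
    return info

# Invented aligned 9-mer cores, for illustration only.
toy_cores = ["YVKQNTLKL", "FVKANTLRA", "YAKQSTLKL", "FVRQNSLKA"]
info = column_information(toy_cores)
```

A fully conserved position reaches the maximum of log2(20) bits under a uniform background, while a position split evenly between two residues carries log2(10) bits.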
Data included in the benchmark calculation.
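| Allele | Peptides (IEDB) | Binders (IEDB) | SYFPEITHI |
|---|---|---|---|
| DRB1*0101 | 1203 | 920 | 33 |
| DRB1*0301 | 474 | 65 | 22 |
| DRB1*0401 | 457 | 209 | 65 |
| DRB1*0404 | 168 | 74 | 23 |
| DRB1*0405 | 171 | 88 | 23 |
| DRB1*0701 | 310 | 125 | 34 |
| DRB1*0802 | 174 | 58 | 1 |
| DRB1*0901 | 117 | 47 | 13 |
| DRB1*1101 | 359 | 95 | 23 |
| DRB1*1302 | 179 | 101 | 20 |
| DRB1*1501 | 365 | 188 | 12 |
| DRB3*0101 | 102 | 3 | 2 |
| DRB4*0101 | 181 | 74 | 5 |
| DRB5*0101 | 343 | 112 | 11 |
| H-2-IAb | 76 | 43 | 47 |
| H-2-IAd | 342 | 56 | 7 |
| H-2-IAs | 126 | 35 | 19 |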
The first column gives the allele name; the second to fourth columns give the number of unique peptides and the number of binders included for each allele in the IEDB [13] and SYFPEITHI [33] databases, respectively. Binding peptides were identified using an IC50 binding threshold of 500 nM.
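The binder definition in this caption, and the class balance it implies, can be sketched as follows. The counts are copied from three rows of the table above. Two things are assumptions made here: pairing those rows with allele names by matching their peptide counts to the HLA-DR detail table, and reading the second value of each row as the number of IEDB binders. Whether the 500 nM threshold is inclusive is not stated, so the strict comparison is also an assumption.

```python
IC50_BINDER_THRESHOLD_NM = 500.0

def is_binder(ic50_nm):
    """Classify a peptide as a binder if its IC50 is below the threshold."""
    # Assumption: strict inequality; the caption does not say which is used.
    return ic50_nm < IC50_BINDER_THRESHOLD_NM

# (unique peptides, binders) for three rows of the table; allele names are
# inferred by matching peptide counts with the HLA-DR detail table.
iedb_counts = {
    "DRB1*0101": (1203, 920),
    "DRB1*0301": (474, 65),
    "DRB3*0101": (102, 3),
}

binder_fraction = {
    allele: binders / peptides
    for allele, (peptides, binders) in iedb_counts.items()
}
```

The fraction of binders differs widely between alleles, which is one reason a threshold-free measure such as the AUC is convenient when comparing performance across them.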