Menaka Rajapakse, Bertil Schmidt, Lin Feng, Vladimir Brusic.
Abstract
BACKGROUND: Peptides binding to Major Histocompatibility Complex (MHC) class II molecules are crucial for the initiation and regulation of immune responses. Predicting which peptides bind to a specific MHC molecule plays an important role in identifying potential vaccine candidates. The binding groove of class II MHC molecules is open at both ends, allowing peptides longer than nine residues to bind. Finding the consensus motif that governs peptide binding to an MHC class II molecule is difficult because binding peptides differ in length and the 9-mer binding core can lie at different positions within them. The difficulty increases when the molecule is promiscuous and binds a large number of low-affinity peptides. In this paper, we propose two approaches that use multi-objective evolutionary algorithms (MOEA) to predict peptides binding to MHC class II molecules. The first uses information from both binders and non-binders for self-discovery of motifs. The second additionally uses information from experimentally determined motifs for guided discovery of motifs.
Year: 2007 PMID: 18031584 PMCID: PMC2212666 DOI: 10.1186/1471-2105-8-459
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
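The abstract casts motif discovery as a multi-objective problem that draws on both binders and non-binders. At the heart of any MOEA is Pareto dominance: a candidate survives if no other candidate is at least as good on every objective and strictly better on at least one. The sketch below is a minimal illustration under assumed objectives (a hypothetical pair: the fraction of binders a motif recognizes and the fraction of non-binders it rejects, both maximized); it is not the authors' objective definition or selection scheme.

```python
from typing import List, Sequence, Tuple

# One value per objective; all objectives are assumed to be maximized.
Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if `a` is at least as good as `b` on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Sequence[Objectives]) -> List[int]:
    """Indices of the non-dominated points (the first front of a non-dominated sort)."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical candidate motifs scored as (fraction of binders recognized,
# fraction of non-binders rejected).
candidates = [(0.90, 0.40), (0.70, 0.70), (0.60, 0.65), (0.40, 0.90)]
print(pareto_front(candidates))
```

Here the third candidate is dropped because the second beats it on both objectives, while the remaining three represent different trade-offs between recognizing binders and rejecting non-binders.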
I-Ag7 datasets and experimental motifs
| Dataset | Experimental Motif | Non-binders | Binders | Reference |
|---|---|---|---|---|
| Reizis | Yes | 21 | 33 | [40] |
| Harrison | Yes | 19 | 157 | [41] |
| Gregori | Yes | 31 | 109 | [43] |
| Latek | Yes | 8 | 37 | [42] |
| - | Yes | - | - | [44] |
| - | Yes | - | - | [38] |
| - | Yes | - | - | [39] |
| Corper | - | 35 | 13 | [62] |
| MHCPEP | - | - | 176 | [63] |
| Yu | - | 16 | 10 | [64] |
| Brusic | - | 37 | - | [unpublished] |
Information on I-Ag7 related peptide binding datasets and motifs. Unavailable information is indicated by "-".
Representation of an experimentally derived I-Ag7 motif
| Position | Well-Tolerated | Weakly-Tolerated | Non-Tolerated |
|---|---|---|---|
| P1 | VEQMHLPD | - | R |
| P2 | - | - | - |
| P3 | - | - | - |
| P5 | - | - | - |
| P7 | QVYLHINRF | - | - |
| P8 | - | - | - |
Description of the experimentally determined I-Ag7 9-mer peptide-binding motif by Reizis: the amino acids at each position are classified as well-tolerated, weakly tolerated, or non-tolerated. Positions P4, P6 and P9 are the primary anchor positions, i.e. the positions that contribute most to binding.
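Because the class II groove is open at both ends, a 9-mer motif such as the one tabulated above has to be located inside longer peptides. The sketch below assumes a motif stored as per-position sets of well-tolerated and non-tolerated residues and a hypothetical scoring rule (+1 for each well-tolerated residue, 0 otherwise, and rejection of any window that contains a non-tolerated residue); it then scores a peptide by its best 9-residue window. The residues in `toy_motif` are invented for illustration and are not the Reizis motif, and the scoring rule is not taken from the paper.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical motif: core position (1-based) -> (well-tolerated residues, non-tolerated residues).
# Residues are illustrative only and are NOT the published I-Ag7 motif.
Motif = Dict[int, Tuple[Set[str], Set[str]]]

CORE_LENGTH = 9  # length of the class II binding core


def score_core(core: str, motif: Motif) -> Optional[int]:
    """Score one 9-mer: +1 per well-tolerated residue; None if any residue is non-tolerated."""
    score = 0
    for pos, (well, banned) in motif.items():
        residue = core[pos - 1]
        if residue in banned:
            return None  # a non-tolerated residue rules this core out
        if residue in well:
            score += 1
    return score


def best_core(peptide: str, motif: Motif) -> Tuple[Optional[int], int]:
    """Slide a 9-residue window along the peptide; return (best score, 0-based offset).

    Returns (None, -1) if the peptide is shorter than 9 residues or every window is ruled out.
    """
    best_score: Optional[int] = None
    best_offset = -1
    for offset in range(len(peptide) - CORE_LENGTH + 1):
        s = score_core(peptide[offset:offset + CORE_LENGTH], motif)
        if s is not None and (best_score is None or s > best_score):
            best_score, best_offset = s, offset
    return best_score, best_offset


toy_motif: Motif = {
    1: ({"V", "L", "I"}, {"R"}),
    4: ({"A", "S"}, set()),
    9: ({"E", "D"}, {"K"}),
}
print(best_core("GGVAAAGGGGEG", toy_motif))
```

For this 12-residue toy peptide the call returns `(3, 2)`: the window starting at offset 2 (`VAAAGGGGE`) matches all three listed positions, so it is reported as the binding core.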
Validation of I-Ag7 experimental motifs
| Experimental Motif | Reizis | Harrison | Gregori | Latek | Corper | MHCPEP | Yu |
|---|---|---|---|---|---|---|---|
|  | 0.95 | 0.68 | 0.74 | 0.95 | 0.50 | 0.59 | 0.48 |
|  | 0.75 | 0.88 | 0.69 | 0.64 | 0.53 | 0.72 | 0.33 |
|  | 0.64 | 0.68 | 0.71 | 0.73 | 0.40 | 0.64 | 0.61 |
|  | 0.66 | 0.72 | 0.80 | 0.95 | 0.64 | 0.52 | 0.75 |
|  | 0.49 | 0.64 | 0.76 | 0.82 | 0.60 | 0.48 | 0.43 |
|  | 0.55 | 0.64 | 0.69 | 0.58 | 0.56 | 0.47 | 0.50 |
|  | 0.69 | 0.54 | 0.66 | 0.70 | 0.56 | 0.66 | 0.40 |
AUC of the experimentally determined I-Ag7 motifs (rows) on their own datasets and on the other experimental datasets (columns).
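All of the comparisons in this section use AUC, the area under the ROC curve. One standard way to compute it, shown here as a minimal sketch that assumes each peptide receives a single real-valued motif score, is the rank (Mann-Whitney) form: the probability that a randomly chosen binder scores higher than a randomly chosen non-binder, with ties counted as one half. Under this reading, values near 0.5 in the table above, such as several entries in the Corper and Yu columns, mean the motif separates binders from non-binders no better than chance.

```python
from typing import Sequence


def auc(binder_scores: Sequence[float], nonbinder_scores: Sequence[float]) -> float:
    """AUC as the probability that a random binder outscores a random non-binder (ties count 0.5)."""
    if not binder_scores or not nonbinder_scores:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for b in binder_scores:
        for n in nonbinder_scores:
            if b > n:
                wins += 1.0
            elif b == n:
                wins += 0.5
    return wins / (len(binder_scores) * len(nonbinder_scores))


# Hypothetical motif scores for three binders and two non-binders.
print(auc([3, 2, 2], [2, 1]))
```

The pairwise loop is quadratic, which is acceptable for datasets of a few hundred peptides like those listed here; a sort-based rank computation would scale better for larger sets.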
Performance of I-Ag7 MOEA derived motifs
| MOEA-derived Motifs | Reizis | Harrison | Gregori | Latek | Corper | MHCPEP | Yu |
|---|---|---|---|---|---|---|---|
| self-discovery | 0.75 | 0.75 | 0.77 | 0.93 | 0.70 | 0.75 | 0.75 |
| guided-discovery | 0.77 | 0.74 | 0.81 | 0.83 | 0.72 | 0.77 | 0.71 |
Seven-fold cross-validation performance (AUC) of the MOEA-derived motifs on the training datasets.
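The MOEA results above are reported from seven-fold cross-validation. A minimal sketch of the fold-splitting step follows; it assumes a plain shuffled partition of peptide indices, whereas the paper's exact fold assignment (for example, whether folds were stratified by binder/non-binder label) is not described in this section. `k_fold_indices` is a hypothetical helper name.

```python
import random
from typing import List, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k folds; return a (train, test) pair of index lists per fold."""
    if not 1 < k <= n:
        raise ValueError("k must satisfy 1 < k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)  # fixed seed for reproducible folds
    folds = [order[i::k] for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, test))
    return splits


# Example: the Reizis dataset has 21 non-binders + 33 binders = 54 peptides.
splits = k_fold_indices(54, 7)
print([len(test) for _, test in splits])
```

Each peptide appears in exactly one test fold, so every peptide is scored once by a motif that was trained without it.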
Figure 1: Comparison of performances. Comparison of the performance of the MOEA-based algorithms (self-discovery and guided-discovery) against MEME, RANKPEP, and experimental motifs on the balanced I-Ag7 test datasets (performance averaged over 25 test datasets).
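Figure 1 reports performance averaged over 25 balanced test datasets. How those datasets were drawn is not described here; the sketch below assumes, purely for illustration, that each one is built by randomly sampling equal numbers of binders and non-binders (the size of the smaller class) and that the AUC is then averaged across the 25 draws. `balanced_mean_auc` is a hypothetical name, not part of any published tool.

```python
import random
from statistics import mean
from typing import List, Sequence


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Rank-based AUC: fraction of (binder, non-binder) pairs ordered correctly; ties count 0.5."""
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    pairs = [(p > n) + 0.5 * (p == n) for p in pos for n in neg]
    return sum(pairs) / len(pairs)


def balanced_mean_auc(
    binder_scores: Sequence[float],
    nonbinder_scores: Sequence[float],
    repeats: int = 25,
    seed: int = 0,
) -> float:
    """Average AUC over `repeats` balanced subsamples (both classes cut to the smaller size)."""
    rng = random.Random(seed)
    size = min(len(binder_scores), len(nonbinder_scores))
    aucs: List[float] = []
    for _ in range(repeats):
        pos = rng.sample(list(binder_scores), size)
        neg = rng.sample(list(nonbinder_scores), size)
        aucs.append(auc(pos, neg))
    return mean(aucs)


# Hypothetical motif scores; the binder class is larger and gets subsampled.
binders = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4]
nonbinders = [0.7, 0.5, 0.3, 0.2]
print(round(balanced_mean_auc(binders, nonbinders), 3))
```

Balancing keeps AUC estimates from different datasets on an equal footing in terms of class sizes, and averaging over repeated draws reduces the dependence on any single subsample.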
Description of peptides in BM-Set1
| BM-Set1 (DRB1*0401) | Original binders | Original non-binders | NR binders | NR non-binders |
|---|---|---|---|---|
| Set1 | 694 | 323 | 248 | 283 |
| Set2 | 381 | 292 | 161 | 255 |
| Set3a | 373 | 217 | 151 | 204 |
| Set3b | 279 | 216 | 128 | 197 |
| Set4a | 323 | 323 | 120 | 283 |
| Set4b | 292 | 292 | 120 | 255 |
| Set5a | 70 | 47 | 65 | 45 |
| Set5b | 48 | 37 | 47 | 37 |
| Southwood | 16 | 6 | 15 | 6 |
| Geluk | 22 | 83 | 19 | 80 |
The number of binders and non-binders in the original and non-redundant (NR) datasets in BM-Set1.
Comparison of performance on BM-Set1
| Version | Dataset | †SVRMHC | Gibbs | ARB | TEPITOPE | MOEA |
|---|---|---|---|---|---|---|
| Original | set1 | 0.711 | 0.799 | 0.666 | 0.760 | 0.760 |
| Original | set2 | 0.652 | 0.766 | 0.653 | 0.736 | 0.765 |
| Original | set3a | 0.626 | 0.740 | 0.652 | 0.730 | 0.733 |
| Original | set3b | 0.618 | 0.751 | 0.666 | 0.750 | 0.752 |
| Original | set4a | 0.706 | 0.788 | 0.668 | 0.748 | 0.748 |
| Original | set4b | 0.664 | 0.770 | 0.661 | 0.748 | 0.770 |
| Original | set5a | 0.553 | 0.604 | 0.539 | 0.653 | 0.777 |
| Original | set5b | 0.606 | 0.621 | 0.579 | 0.679 | 0.748 |
| Original | Southwood | 0.912 | 0.862 | 0.514 | 0.490 | 0.784 |
| Original | Geluk | 0.697 | 0.723 | 0.682 | 0.710 | 0.786 |
| NR | set1 | 0.619 | 0.673 | 0.572 | 0.594 | 0.587 |
| NR | set2 | 0.581 | 0.665 | 0.640 | 0.653 | 0.685 |
| NR | set3a | 0.578 | 0.598 | 0.600 | 0.598 | 0.660 |
| NR | set3b | 0.577 | 0.692 | 0.669 | 0.699 | 0.713 |
| NR | set4a | 0.597 | 0.671 | 0.575 | 0.573 | 0.599 |
| NR | set4b | 0.577 | 0.669 | 0.651 | 0.655 | 0.690 |
| NR | set5a | 0.544 | 0.601 | 0.536 | 0.646 | 0.790 |
| NR | set5b | 0.593 | 0.610 | 0.572 | 0.671 | 0.743 |
| NR | Southwood | 0.917 | 0.850 | 0.671 | 0.505 | 0.770 |
| NR | Geluk | 0.655 | 0.697 | 0.510 | 0.670 | 0.768 |
Comparison of AUC values on BM-Set1 (DRB1*0401). †These values are based on smaller datasets because SVRMHC did not return predictions for some of the peptides. The Gibbs sampler values were estimated from the matrix provided by the authors in [32].
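To summarize the NR half of this table, the sketch below averages each method's AUC over the ten non-redundant datasets and counts on how many datasets each method scores highest. It is plain arithmetic on the tabulated values; because the SVRMHC entries come from smaller datasets (see the footnote), its average is not strictly comparable with the others.

```python
from statistics import mean

METHODS = ["SVRMHC", "Gibbs", "ARB", "TEPITOPE", "MOEA"]

# AUC values for the non-redundant (NR) BM-Set1 datasets, copied from the table above,
# in the column order of METHODS.
NR_AUC = {
    "set1": [0.619, 0.673, 0.572, 0.594, 0.587],
    "set2": [0.581, 0.665, 0.640, 0.653, 0.685],
    "set3a": [0.578, 0.598, 0.600, 0.598, 0.660],
    "set3b": [0.577, 0.692, 0.669, 0.699, 0.713],
    "set4a": [0.597, 0.671, 0.575, 0.573, 0.599],
    "set4b": [0.577, 0.669, 0.651, 0.655, 0.690],
    "set5a": [0.544, 0.601, 0.536, 0.646, 0.790],
    "set5b": [0.593, 0.610, 0.572, 0.671, 0.743],
    "Southwood": [0.917, 0.850, 0.671, 0.505, 0.770],
    "Geluk": [0.655, 0.697, 0.510, 0.670, 0.768],
}

# Mean AUC per method across the ten NR datasets.
mean_auc = {m: mean(row[i] for row in NR_AUC.values()) for i, m in enumerate(METHODS)}

# Number of datasets on which each method has the highest AUC (no ties at the maximum here).
best_counts = {m: 0 for m in METHODS}
for row in NR_AUC.values():
    best_counts[METHODS[row.index(max(row))]] += 1

print({m: round(v, 3) for m, v in mean_auc.items()})
print(best_counts)
```

With these values, MOEA has the highest mean AUC (about 0.70, against about 0.67 for the Gibbs sampler) and is the best method on 7 of the 10 NR datasets; the Gibbs sampler is best on set1 and set4a, and SVRMHC on Southwood.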
Description of peptides in BM-Set2
| Type | Allele | Binders | Non-binders |
|---|---|---|---|
| Mouse | I-Ab | 43 | 33 |
| Mouse | I-Ad | 56 | 286 |
| Mouse | I-As | 35 | 91 |
| HLA | DRB1-0101 | 920 | 283 |
| HLA | DRB1-0301 | 65 | 409 |
| HLA | DRB1-0401 | 209 | 248 |
| HLA | DRB1-0404 | 74 | 94 |
| HLA | DRB1-0405 | 88 | 83 |
| HLA | DRB1-0701 | 125 | 185 |
| HLA | DRB1-0802 | 58 | 116 |
| HLA | DRB1-0901 | 47 | 70 |
| HLA | DRB1-1101 | 95 | 264 |
| HLA | DRB1-1302 | 101 | 78 |
| HLA | DRB1-1501 | 188 | 177 |
| HLA | DRB4-0101 | 74 | 107 |
| HLA | DRB5-0101 | 112 | 231 |
The number of binders and non-binders in each BM-Set2 dataset. The datasets in BM-Set2 were obtained from [77]. The DRB3-0101 allele dataset was excluded from the performance comparison because of its severe class imbalance (3 binders and 99 non-binders).
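The exclusion of DRB3-0101 can be made concrete with a little arithmetic on the counts above. The sketch below computes the minority-class fraction for each allele; the 5% cut-off is a hypothetical threshold chosen for illustration, not one stated by the authors. It also shows why only 3 binders are a problem under the five-fold cross-validation used for BM-Set2: at least two folds would contain no binder at all, so a per-fold AUC cannot be computed on them.

```python
# (binders, non-binders) per allele, copied from the BM-Set2 table; DRB3-0101 from its caption.
COUNTS = {
    "I-Ab": (43, 33), "I-Ad": (56, 286), "I-As": (35, 91),
    "DRB1-0101": (920, 283), "DRB1-0301": (65, 409), "DRB1-0401": (209, 248),
    "DRB1-0404": (74, 94), "DRB1-0405": (88, 83), "DRB1-0701": (125, 185),
    "DRB1-0802": (58, 116), "DRB1-0901": (47, 70), "DRB1-1101": (95, 264),
    "DRB1-1302": (101, 78), "DRB1-1501": (188, 177), "DRB4-0101": (74, 107),
    "DRB5-0101": (112, 231), "DRB3-0101": (3, 99),
}
THRESHOLD = 0.05  # hypothetical cut-off for "severe" imbalance
K_FOLDS = 5


def minority_fraction(binders: int, nonbinders: int) -> float:
    """Share of the smaller class in the dataset."""
    return min(binders, nonbinders) / (binders + nonbinders)


def min_folds_without_binders(binders: int, k: int) -> int:
    """Lower bound on folds that receive no binder when `binders` peptides are spread over k folds."""
    return max(0, k - binders)


flagged = [a for a, (b, n) in COUNTS.items() if minority_fraction(b, n) < THRESHOLD]
print(flagged)
print(min_folds_without_binders(*(COUNTS["DRB3-0101"][0], K_FOLDS)))
```

DRB3-0101 is the only allele whose minority class falls below 5% (about 2.9%); the next most imbalanced, DRB1-0301, still has about 13.7% binders.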
Comparison of Performance on BM-Set2
| Type | Allele | SVRMHC | Gibbs | ARB | TEPITOPE | NetMHCII | MOEA |
|---|---|---|---|---|---|---|---|
| Mouse | I-Ab | - | - | 0.662 | - | 0.908 | 0.919 |
| Mouse | I-Ad | - | - | 0.819 | - | 0.818 | 0.855 |
| Mouse | I-As | - | - | - | - | 0.898 | 0.889 |
| HLA | DRB1-0101 | 0.623 | 0.676 | 0.666 | 0.647 | 0.716 | 0.651 |
| HLA | DRB1-0301 | - | 0.722 | 0.799 | 0.734 | 0.765 | 0.778 |
| HLA | DRB1-0401 | 0.739 | 0.759 | 0.737 | 0.754 | 0.758 | 0.725 |
| HLA | DRB1-0404 | - | 0.743 | 0.788 | 0.829 | 0.785 | 0.786 |
| HLA | DRB1-0405 | 0.701 | 0.724 | 0.724 | 0.790 | 0.735 | 0.756 |
| HLA | DRB1-0701 | - | 0.695 | 0.749 | 0.768 | 0.787 | 0.735 |
| HLA | DRB1-0802 | - | 0.721 | 0.803 | 0.769 | 0.756 | 0.773 |
| HLA | DRB1-0901 | - | 0.734 | 0.711 | - | 0.775 | 0.712 |
| HLA | DRB1-1101 | - | 0.715 | 0.727 | 0.710 | 0.734 | 0.759 |
| HLA | DRB1-1302 | - | 0.716 | 0.917 | 0.720 | 0.818 | 0.820 |
| HLA | DRB1-1501 | 0.730 | 0.672 | 0.792 | 0.726 | 0.736 | 0.743 |
| HLA | DRB4-0101 | - | 0.742 | 0.800 | - | 0.736 | 0.759 |
| HLA | DRB5-0101 | 0.649 | 0.618 | 0.677 | 0.653 | 0.664 | 0.660 |
Comparison of AUC values from five-fold cross-validation of allele datasets given in BM-Set2. "-" indicates that the allele is unavailable for testing with the respective prediction method.
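A paired view of the last two columns helps in reading this table. The sketch below compares NetMHCII and MOEA allele by allele on the 13 HLA alleles, reporting the mean AUC of each method and the number of alleles on which each is higher. It is arithmetic on the tabulated values only and makes no claim about statistical significance.

```python
from statistics import mean

# (NetMHCII AUC, MOEA AUC) for the HLA alleles, copied from the BM-Set2 comparison table.
HLA_AUC = {
    "DRB1-0101": (0.716, 0.651),
    "DRB1-0301": (0.765, 0.778),
    "DRB1-0401": (0.758, 0.725),
    "DRB1-0404": (0.785, 0.786),
    "DRB1-0405": (0.735, 0.756),
    "DRB1-0701": (0.787, 0.735),
    "DRB1-0802": (0.756, 0.773),
    "DRB1-0901": (0.775, 0.712),
    "DRB1-1101": (0.734, 0.759),
    "DRB1-1302": (0.818, 0.820),
    "DRB1-1501": (0.736, 0.743),
    "DRB4-0101": (0.736, 0.759),
    "DRB5-0101": (0.664, 0.660),
}

mean_netmhcii = mean(net for net, _ in HLA_AUC.values())
mean_moea = mean(moea for _, moea in HLA_AUC.values())

# Count, per allele, which method has the higher AUC.
wins = {"NetMHCII": 0, "MOEA": 0, "tie": 0}
for net, moea in HLA_AUC.values():
    if moea > net:
        wins["MOEA"] += 1
    elif net > moea:
        wins["NetMHCII"] += 1
    else:
        wins["tie"] += 1

print(f"NetMHCII mean AUC {mean_netmhcii:.3f}, MOEA mean AUC {mean_moea:.3f}, wins {wins}")
```

On these 13 alleles the two methods are close: MOEA has the higher AUC on 8 alleles and NetMHCII on 5, while NetMHCII has a slightly higher mean (about 0.751 against 0.743) because its margins on DRB1-0101, DRB1-0701 and DRB1-0901 are larger than any of MOEA's margins.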