Peng Wang, John Sidney, Courtney Dow, Bianca Mothé, Alessandro Sette, Bjoern Peters.
Abstract
The identification of MHC class II-restricted peptide epitopes is an important goal in immunological research. A number of computational tools have been developed for this purpose, but there is a lack of large-scale systematic evaluation of their performance. Herein, we used a comprehensive dataset consisting of more than 10,000 previously unpublished MHC-peptide binding affinities, 29 peptide/MHC crystal structures, and 664 peptides experimentally tested for CD4+ T cell responses to systematically evaluate the performance of publicly available MHC class II binding prediction tools. While in selected instances the best tools were associated with AUC values up to 0.86, in general, class II predictions did not perform as well as historically noted for class I predictions. It appears that the ability of MHC class II molecules to bind variable-length peptides, which requires the correct assignment of peptide binding cores, is a critical factor limiting the performance of existing prediction tools. To improve performance, we implemented a consensus prediction approach that combines the top-performing methods. We show that this consensus approach achieved the best overall performance. Finally, we make the large datasets used here publicly available as a benchmark to facilitate further development of MHC class II binding peptide prediction methods.
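The abstract does not state how the top-performing methods are combined. A minimal sketch, assuming one plausible scheme: each method's raw output is converted to a percentile rank against a shared background set of peptides (so that IC50-style outputs, where lower is better, and score-style outputs, where higher is better, become comparable), and the median rank across methods is taken as the consensus. The method names and numbers are hypothetical.

```python
from statistics import median


def percentile_rank(score: float, background: list[float], higher_is_better: bool) -> float:
    """Percent of background peptides predicted to bind better than `score`.

    0 means no background peptide scores better (strongest predicted binder).
    """
    if higher_is_better:
        better = sum(1 for b in background if b > score)
    else:  # e.g. a predicted IC50 in nM, where smaller means tighter binding
        better = sum(1 for b in background if b < score)
    return 100.0 * better / len(background)


def consensus_rank(
    scores: dict[str, float],
    backgrounds: dict[str, list[float]],
    higher_is_better: dict[str, bool],
) -> float:
    """Median percentile rank of one peptide across the selected methods (assumed rule)."""
    ranks = [
        percentile_rank(scores[m], backgrounds[m], higher_is_better[m])
        for m in scores
    ]
    return median(ranks)


# Hypothetical methods: two score-style outputs and one IC50-style output
backgrounds = {
    "matrix_a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "ic50_b": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    "matrix_c": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
}
directions = {"matrix_a": True, "ic50_b": False, "matrix_c": True}
peptide_scores = {"matrix_a": 9, "ic50_b": 30, "matrix_c": 0.5}

print(consensus_rank(peptide_scores, backgrounds, directions))  # 20.0
```

The median is used in this sketch because it is insensitive to a single method being far off for a given peptide; other combination rules would fit the description equally well.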
Year: 2008 PMID: 18389056 PMCID: PMC2267221 DOI: 10.1371/journal.pcbi.1000048
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Overview of the MHC-peptide binding affinity dataset.
| Organism | MHC class II type | New MHC-peptide affinities | Known MHC-peptide affinities |
|---|---|---|---|
| Human | HLA-DRB1*0101 | 3882 | 1390 |
| | HLA-DRB1*0301 | 502 | 817 |
| | HLA-DRB1*0401 | 512 | 675 |
| | HLA-DRB1*0404 | 449 | 233 |
| | HLA-DRB1*0405 | 457 | 175 |
| | HLA-DRB1*0701 | 505 | 424 |
| | HLA-DRB1*0802 | 245 | 213 |
| | HLA-DRB1*0901 | 412 | 174 |
| | HLA-DRB1*1101 | 520 | 522 |
| | HLA-DRB1*1302 | 289 | 242 |
| | HLA-DRB1*1501 | 520 | 491 |
| | HLA-DRB3*0101 | 420 | 104 |
| | HLA-DRB4*0101 | 245 | 203 |
| | HLA-DRB5*0101 | 520 | 383 |
| Mouse | H-2-IAb | 500 | 225 |
| | H-2-IEd | 39 | 231 |
Known: number of records in IEDB as of 12-04-2006.
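Summing the table gives a quick consistency check on the abstract's figure of "more than 10,000" previously unpublished affinities. The counts below are copied from the table; the dictionary layout is just a convenient encoding.

```python
# (new, known) MHC-peptide affinity counts per MHC class II type, copied from the table
AFFINITIES = {
    "HLA-DRB1*0101": (3882, 1390),
    "HLA-DRB1*0301": (502, 817),
    "HLA-DRB1*0401": (512, 675),
    "HLA-DRB1*0404": (449, 233),
    "HLA-DRB1*0405": (457, 175),
    "HLA-DRB1*0701": (505, 424),
    "HLA-DRB1*0802": (245, 213),
    "HLA-DRB1*0901": (412, 174),
    "HLA-DRB1*1101": (520, 522),
    "HLA-DRB1*1302": (289, 242),
    "HLA-DRB1*1501": (520, 491),
    "HLA-DRB3*0101": (420, 104),
    "HLA-DRB4*0101": (245, 203),
    "HLA-DRB5*0101": (520, 383),
    "H-2-IAb": (500, 225),
    "H-2-IEd": (39, 231),
}

total_new = sum(new for new, _ in AFFINITIES.values())
total_known = sum(known for _, known in AFFINITIES.values())
print(total_new, total_known)  # 10017 6502
```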
Overview of nine MHC class II peptide prediction methods tested with the new dataset.
| Category | Method | MHC class II types | Training dataset | Algorithm |
|---|---|---|---|---|
| Matrix based | ARB | 16 (16) | IEDB | Average relative binding (ARB) matrix |
| | PROPRED | 51 (11) | TEPITOPE | Pocket profile |
| | SVMHC | 51 (11) | TEPITOPE | Pocket profile |
| | SYFPEITHI | 6 (6) | SYFPEITHI | Position-specific scoring matrices |
| | RANKPEP | 46 (16) | MHCPEP | Position-specific scoring matrices |
| | SMM-align | 17 (16) | IEDB, SYFPEITHI | Stabilized matrix |
| Machine learning based | SVRMHC | 6 (5) | AntiJen | Support vector machine regression |
| | MHC2PRED | 21 (15) | MHCBN, JenPep | Support vector machine |
| Multivariate regression | MHCPRED | 10 (6) | JenPep | Quantitative structure-activity relationship (QSAR) regression |
MHC class II types: number of MHC class II types covered by each prediction method; the number in parentheses is how many of those types are also in our dataset.
Performance (AUC) of various MHC class II prediction methods.
| MHC class II type | Number of peptides | ARB | MHC2PRED | MHCPRED | PROPRED | RANKPEP | SMM-align | SVRMHC | SYFPEITHI | Consensus |
|---|---|---|---|---|---|---|---|---|---|---|
| DRB1*0101 | 3882 | 0.76 | 0.67 | 0.62 | 0.74 | 0.70 | 0.77 | 0.69 | 0.71 | 0.79 |
| DRB1*0301 | 502 | 0.66 | 0.53 | 0.65 | 0.67 | 0.69 | 0.65 | 0.72 | ||
| DRB1*0401 | 512 | 0.67 | 0.52 | 0.60 | 0.69 | 0.63 | 0.68 | 0.66 | 0.65 | 0.69 |
| DRB1*0404 | 449 | 0.72 | 0.64 | 0.79 | 0.66 | 0.75 | 0.80 | |||
| DRB1*0405 | 457 | 0.67 | 0.51 | 0.75 | 0.62 | 0.69 | 0.62 | 0.72 | ||
| DRB1*0701 | 505 | 0.69 | 0.63 | 0.78 | 0.58 | 0.78 | 0.68 | 0.83 | ||
| DRB1*0802 | 245 | 0.74 | 0.70 | 0.77 | 0.75 | 0.82 | ||||
| DRB1*0901 | 412 | 0.62 | 0.48 | 0.61 | 0.66 | 0.68 | ||||
| DRB1*1101 | 520 | 0.73 | 0.60 | 0.80 | 0.70 | 0.81 | 0.73 | 0.80 | ||
| DRB1*1302 | 289 | 0.79 | 0.54 | 0.58 | 0.52 | 0.69 | 0.73 | |||
| DRB1*1501 | 520 | 0.7 | 0.63 | 0.72 | 0.62 | 0.74 | 0.64 | 0.67 | 0.72 | |
| DRB3*0101 | 420 | 0.59 | 0.68 | |||||||
| DRB4*0101 | 245 | 0.74 | 0.61 | 0.65 | 0.71 | 0.74 | ||||
| DRB5*0101 | 520 | 0.7 | 0.59 | 0.79 | 0.73 | 0.75 | 0.63 | 0.79 | ||
| IAb | 500 | 0.8 | 0.56 | 0.51 | 0.74 | 0.75 | 0.86 | |||
| IEd | 39 | 0.53 | 0.83 | |||||||
| Mean | | 0.71 | 0.58 | 0.58 | 0.73 | 0.66 | 0.73 | 0.65 | 0.68 | 0.76 |
| Min | | 0.59 | 0.48 | 0.51 | 0.58 | 0.52 | 0.66 | 0.62 | 0.65 | 0.68 |
| Max | | 0.80 | 0.70 | 0.63 | 0.80 | 0.83 | 0.81 | 0.69 | 0.73 | 0.86 |
Performance is measured in terms of AUC as described in Materials and Methods. ARB was evaluated by 10-fold cross-validation; all other methods were evaluated in blind tests.
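AUC can be read as the probability that a randomly chosen binder receives a higher predicted score than a randomly chosen non-binder. The sketch below computes it that way, assuming peptides have already been labelled binders or non-binders by thresholding their measured IC50 values. The cutoff is left as a parameter because its value is not given in this section (the 1000 nM in the example is hypothetical), and predictors that output IC50 values (lower is better) would need their outputs negated first.

```python
def binder_labels(measured_ic50_nm: list[float], cutoff_nm: float) -> list[bool]:
    """Label a peptide a binder if its measured IC50 is below the (assumed) cutoff."""
    return [ic50 < cutoff_nm for ic50 in measured_ic50_nm]


def roc_auc(labels: list[bool], scores: list[float]) -> float:
    """Probability that a binder outscores a non-binder; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one binder and one non-binder")
    wins = 0.0
    for p in pos:  # O(len(pos) * len(neg)) pairwise comparison; fine for a sketch
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical measurements and predictions for four peptides
labels = binder_labels([50.0, 300.0, 5000.0, 20000.0], cutoff_nm=1000.0)
print(labels)                                 # [True, True, False, False]
print(roc_auc(labels, [0.9, 0.4, 0.6, 0.1]))  # 0.75
```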
Figure 1. Performance of nine MHC class II prediction methods using HLA DRB1*0101 as an example.
Prediction results of eight methods for HLA DRB1*0101 are shown as ROC curves, generated by plotting the true positive rate (y-axis) against the false positive rate (x-axis). The AUC value for each ROC curve is shown in parentheses.
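Each ROC curve traces the true positive rate against the false positive rate as the decision threshold on a method's score is lowered. A minimal sketch of that construction, with the area under the resulting curve computed by the trapezoidal rule; the labels and scores are hypothetical, and peptides with tied scores are consumed together so that ties produce a diagonal segment.

```python
def roc_points(labels: list[bool], scores: list[float]) -> list[tuple[float, float]]:
    """(false positive rate, true positive rate) pairs, threshold from high to low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one binder and one non-binder")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every peptide tied at this score before emitting a point
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points: list[tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([True, True, False, False], [0.9, 0.4, 0.6, 0.1])
print(pts)                  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_area(pts))  # 0.75
```

The trapezoidal area equals the pairwise-comparison AUC (ties counted as one half), which is why the two readings of AUC are interchangeable.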
MHC class II structures used to evaluate the performance of different MHC class II epitope prediction methods.
| Core | Peptide | Chain | PDB ID | MHC class II type |
|---|---|---|---|---|
| PFPQPELPY | LQPFPQPELPY | C | 1S9V | DQB1*0201 |
| EALYLVCGE | LVEALYLVCGERGG | C | 1JK8 | DQB1*0302 |
| LPSTKVSWA | EGRDSMNLPSTKVSWAAVGGGGSLVPRGSGGGG | C | 1UVQ | DQB1*0602 |
| MRMATPLLM | PVSKMRMATPLLMQA | C | 1A6A | DRB1*0301 |
| FKGEQGPKG | AGFKGEQGPKGEPG | E | 2FSE | DRB1*0101 |
| IGILNAAKV | GELIGILNAAKVPAD | C | 1KLG | DRB1*0101 |
| VIPMFSALS | PEVIPMFSALSEGATP | C | 1SJE | DRB1*0101 |
| WRFLRGYHQ | GSDWRFLRGYHQYA | C | 1AQD | DRB1*0101 |
| YSDQATPLL | AAYSDQATPLLLSPR | C | 1T5W | DRB1*0101 |
| YVKQNTLKL | PKYVKQNTLKLAT | C | 2G9H | DRB1*0101 |
| MRADAAAGG | AYMRADAAAGGA | E | 2SEB | DRB1*0401 |
| YVKQNTLKL | PKYVKQNTLKLAT | C | 1J8H | DRB1*0401 |
| VHFFKNIVT | ENPVVHFFKNIVTPR | C | 1BX2 | DRB1*1501 |
| FKNIVTPRT | NPVVHFFKNIVTPRTPPPSQ | C | 1FV1 | DRB5*0101 |
| YHFVKKHVH | GGVYHFVKKHVHES | C | 1H15 | DRB5*0101 |
| AQKAKANKA | FEAQKAKANKAVDGGGG | B | 1LNU | IAb |
| MRMATPLLM | GSHSRGLPKPPKPVSKMRMATPLLMQALPMGSGSGS | C | 1MUJ | IAb |
| SQAVHAAHA | RGISQAVHAAHAEI | B | 1IAO | IAd |
| TQGVTAASS | GHATQGVTAASSHE | B | 2IAD | IAd |
| IAPVFVLLE | YEIAPVFVLLEYVT | B | 1ES0 | IAg7 |
| RHGLDNYRG | AMKRHGLDNYRGYS | P | 1F3J | IAg7 |
| DYGILQINS | STDYGILQINSRW | P | 1IAK | IAk |
| HRGAIEWEG | GNSHRGAIEWEGIESG | P | 1D9K | IAk |
| GGASQYRPS | HSRGGASQYRPSQRHGTGSGSGS | P | 1K2D | IAu |
| IAYLKQASA | ADLIAYLKQASAKGG | B | 1KTD | IEk |
| IAYLKQATK | ADLIAYLKQATKGGG | B | 1KT2 | IEk |
| IAYPKAATK | ADLIAYPKAATKF | E | 1R5V | IEk |
| ITAFNDGLK | KKVITAFNDGLKGGG | B | 1FNE | IEk |
| ITAFNEGLK | KKVITAFNEGLKGGG | B | 1I3R | IEk |
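The abstract attributes much of the difficulty to core assignment: a class II peptide longer than nine residues contains several candidate 9-mer binding cores, and each crystal structure in the table fixes which one is actually bound. The sketch below enumerates the candidate cores of a crystallized peptide and locates the known core among them; it assumes a core length of nine, matching the cores listed in the table, and the helper names are illustrative.

```python
CORE_LENGTH = 9  # assumed core length, matching the cores listed in the table


def candidate_cores(peptide: str, k: int = CORE_LENGTH) -> list[str]:
    """Every contiguous k-residue window of the peptide (possible binding cores)."""
    return [peptide[i:i + k] for i in range(len(peptide) - k + 1)]


def core_offset(peptide: str, core: str) -> int:
    """0-based start of the known core within the peptide, or -1 if it is absent."""
    return peptide.find(core)


# Two rows from the table: (PDB ID, peptide, known core)
examples = [
    ("2G9H", "PKYVKQNTLKLAT", "YVKQNTLKL"),
    ("1MUJ", "GSHSRGLPKPPKPVSKMRMATPLLMQALPMGSGSGS", "MRMATPLLM"),
]
for pdb_id, peptide, core in examples:
    windows = candidate_cores(peptide)
    print(pdb_id, len(windows), core_offset(peptide, core))
# 2G9H 5 2
# 1MUJ 28 16
```

A method that scores every window and keeps the best one assigns the core correctly only if its top window starts at that offset; for the 1MUJ peptide a random pick among the 28 windows would be right only 1 time in 28.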
Accuracy of MHC class II prediction methods for identifying epitope core regions. Method columns give the number of core regions identified correctly.
| MHC class II type | Known cores | PROPRED | SMM-align | RANKPEP | ARB | MHCPRED | MHC2PRED | SVRMHC | SYFPEITHI |
|---|---|---|---|---|---|---|---|---|---|
| DQB1*0201 | 1 | NA | NA | 0 | NA | NA | 0 | NA | NA |
| DQB1*0302 | 1 | NA | NA | 0 | NA | NA | 0 | NA | NA |
| DQB1*0602 | 1 | NA | NA | 0 | NA | NA | NA | NA | NA |
| DRB1*0101 | 6 | 6 | 5 | 5 | 4 | 1 | 2 | 3 | 6 |
| DRB1*0301 | 1 | 1 | 1 | 1 | 0 | NA | 0 | NA | 1 |
| DRB1*0401 | 2 | 2 | 1 | 1 | 0 | 0 | 2 | 0 | 1 |
| DRB1*1501 | 1 | 1 | 1 | 1 | 0 | NA | 0 | 1 | 1 |
| DRB5*0101 | 2 | 2 | 1 | 0 | 0 | NA | 0 | 2 | NA |
| IAb | 2 | NA | 1 | 2 | 0 | 0 | 0 | NA | NA |
| IAd | 2 | NA | 0 | 0 | 0 | 0 | 0 | NA | NA |
| IAg7 | 2 | NA | NA | 0 | NA | NA | 1 | NA | NA |
| IAk | 2 | NA | NA | 1 | NA | 0 | NA | NA | NA |
| IAu | 1 | NA | NA | 0 | NA | NA | NA | NA | NA |
| IEk | 5 | NA | NA | 5 | NA | 3 | NA | NA | NA |
| Accuracy (Correct/Total) | 29 | 1.000 (12/12) | 0.625 (10/16) | 0.552 (16/29) | 0.250 (4/16) | 0.211 (4/19) | 0.250 (5/20) | 0.545 (6/11) | 0.900 (9/10) |
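The Accuracy row pools counts over the MHC class II types a method covers: correct cores are summed and divided by the number of known cores for those types only, with NA rows excluded from both sums. The sketch below reproduces two entries of that row from the per-type counts, encoding NA as `None`.

```python
from typing import Optional

# Known cores per MHC class II type, in table order (DQB1*0201 ... IEk)
KNOWN_CORES = [1, 1, 1, 6, 1, 2, 1, 2, 2, 2, 2, 2, 1, 5]

# Correct core identifications per type, copied from the table; None marks NA
SMM_ALIGN = [None, None, None, 5, 1, 1, 1, 1, 1, 0, None, None, None, None]
RANKPEP = [0, 0, 0, 5, 1, 1, 1, 0, 2, 0, 0, 1, 0, 5]


def core_accuracy(known: list[int], correct: list[Optional[int]]) -> tuple[int, int, float]:
    """Pooled accuracy over the types a method covers: (hits, total, hits / total)."""
    covered = [(k, c) for k, c in zip(known, correct) if c is not None]
    hits = sum(c for _, c in covered)
    total = sum(k for k, _ in covered)
    return hits, total, hits / total


for name, counts in [("SMM-align", SMM_ALIGN), ("RANKPEP", RANKPEP)]:
    hits, total, acc = core_accuracy(KNOWN_CORES, counts)
    print(f"{name}: {acc:.3f} ({hits}/{total})")
# SMM-align: 0.625 (10/16)
# RANKPEP: 0.552 (16/29)
```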
Figure 2. The performance of various MHC class II binding prediction approaches to identify CD4+ T cell epitopes.
ROC curves are generated from the predictions made by five MHC class II peptide binding prediction methods on the LCMV CD4+ T cell activation data. The AUC value for each method is shown in parentheses.
Sensitivity and positive predictive value for predicting T cell activation.
| | ARB | MHC2PRED | MHCPRED | RANKPEP | SMM-align | Consensus |
|---|---|---|---|---|---|---|
| Sensitivity | 4/9 (44.4%) | 2/9 (22.2%) | 1/9 (11.1%) | 3/9 (33.3%) | 2/9 (22.2%) | 6/9 (66.7%) |
| Positive predictive value | 4/64 (6.2%) | 2/64 (3.1%) | 1/64 (1.6%) | 3/64 (4.7%) | 2/64 (3.1%) | 6/64 (9.4%) |
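Both measures share a numerator, the number of known epitopes captured by a method's predictions. Sensitivity divides it by the 9 known epitopes; positive predictive value divides it by 64, read here (an assumption drawn from the table's denominators) as the number of peptides each method selected as predicted positives. A minimal sketch:

```python
KNOWN_EPITOPES = 9       # denominator of the sensitivity row
SELECTED_PEPTIDES = 64   # denominator of the PPV row (assumed: peptides predicted positive)

# Epitopes captured per method, taken from the numerators in the table
HITS = {"ARB": 4, "MHC2PRED": 2, "MHCPRED": 1, "RANKPEP": 3, "SMM-align": 2, "Consensus": 6}


def sensitivity_and_ppv(hits: int, epitopes: int = KNOWN_EPITOPES,
                        selected: int = SELECTED_PEPTIDES) -> tuple[float, float]:
    """Sensitivity = hits / known epitopes; PPV = hits / selected peptides."""
    return hits / epitopes, hits / selected


for method, hits in HITS.items():
    sens, ppv = sensitivity_and_ppv(hits)
    print(f"{method}: sensitivity {sens:.1%}, PPV {ppv:.1%}")
```

Under this reading the two measures rank the methods identically, since they differ only in the denominator.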