Thomas Stranzl, Mette Voldby Larsen, Claus Lundegaard, Morten Nielsen.
Abstract
Reliable predictions of immunogenic peptides are essential in rational vaccine design and can minimize the experimental effort needed to identify epitopes. In this work, we describe a pan-specific major histocompatibility complex (MHC) class I epitope predictor, NetCTLpan. The method integrates predictions of proteasomal cleavage, transporter associated with antigen processing (TAP) transport efficiency, and MHC class I binding affinity into an MHC class I pathway likelihood score and is an improved and extended version of NetCTL. The NetCTLpan method performs predictions for all MHC class I molecules with known protein sequence and allows predictions for 8-, 9-, 10-, and 11-mer peptides. In order to meet the need for a low false positive rate, the method is optimized to achieve high specificity. The method was trained and validated on large datasets of experimentally identified MHC class I ligands and cytotoxic T lymphocyte (CTL) epitopes. It has been reported that MHC molecules are differentially dependent on TAP transport and proteasomal cleavage. Here, we did not find any consistent signs of such MHC dependencies, and the NetCTLpan method is implemented with fixed weights for proteasomal cleavage and TAP transport for all MHC molecules. The predictive performance of the NetCTLpan method was shown to outperform other state-of-the-art CTL epitope prediction methods. Our results further confirm the importance of using full-type human leukocyte antigen restriction information when identifying MHC class I epitopes. Using the NetCTLpan method, the experimental effort to identify 90% of new epitopes can be reduced by 15% and 40%, respectively, when compared to the NetMHCpan and NetCTL methods. The method and benchmark datasets are available at http://www.cbs.dtu.dk/services/NetCTLpan/.
Year: 2010 PMID: 20379710 PMCID: PMC2875469 DOI: 10.1007/s00251-010-0441-4
Source DB: PubMed Journal: Immunogenetics ISSN: 0093-7711 Impact factor: 2.846
Numbers of ligands per supertype in the training and test set
| Supertype | Train | Test 9-mer | Test 8-/10-/11-mer | HIV |
|---|---|---|---|---|
| A1 | 36 | 0 | 29 | 5 |
| A2 | 50 | 208 | 94 | 82 |
| A3 | 50 | 49 | 75 | 41 |
| A24 | 19 | 0 | 5 | 9 |
| A26 | 50 | 43 | 74 | 2 |
| B7 | 50 | 8 | 57 | 32 |
| B8 | 28 | 0 | 19 | 5 |
| B62 | 47 | 0 | 27 | 10 |
| B27 | 50 | 224 | 141 | 3 |
| B39 | 50 | 21 | 36 | 1 |
| B44 | 50 | 336 | 227 | 16 |
| B58 | 24 | 0 | 22 | 10 |
| Total | 504 | 889 | 806 | 216 |
Fig. 1 ROC curves for a pooled data set from the HLA-A*0101, HLA-B*4402, and HLA-B*5101 alleles. The source proteins for all three alleles were cut into overlapping peptides of the size of the given ligand, and all peptides except the given ligands were taken as negative. The data set contained 31 HLA-A*0101, 50 HLA-B*4402, and 29 HLA-B*5101 ligands, and the predictions were made using the NetCTLpan method. The black curve shows the ROC curve for the combined data set. The other three curves show the allele-specific sensitivity (fraction of ligands identified) as a function of the overall specificity for each of the three alleles. The insert shows the curves for the full range of specificities.
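The negative-set construction described in the Fig. 1 caption can be sketched as follows. This is a minimal illustration only: the function name `label_peptides` and the sequences are hypothetical and not taken from the NetCTLpan benchmark data.

```python
def label_peptides(protein: str, ligand: str) -> list:
    """Cut `protein` into overlapping peptides of len(ligand).

    Peptides identical to the ligand are labelled 1 (positive);
    every other peptide is labelled 0 (negative).
    """
    k = len(ligand)
    return [
        (protein[i:i + k], int(protein[i:i + k] == ligand))
        for i in range(len(protein) - k + 1)
    ]


# Made-up source protein and 9-mer ligand, for illustration only.
protein = "MKTAYIAKQRQISFVKSHFSRQ"
ligand = "AYIAKQRQI"
peptides = label_peptides(protein, ligand)
n_positive = sum(label for _, label in peptides)
```

A protein of length n yields n - k + 1 peptides of length k, so a single ligand is typically evaluated against many negatives from the same protein.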
Fig. 2 Weights on proteasomal cleavage and TAP transport efficiency related to AUCx fraction. The smaller the included fraction, the higher the contribution of proteasomal cleavage and TAP transport efficiency to a high performance. Optimal weights on proteasomal cleavage and TAP were found by optimizing the average AUCx value on the SYF training data set. The dotted line indicates the AUC0.1 fraction.
Fig. 3 Performance comparison in terms of ROC curves for NetCTLpan and NetMHCpan. The true positive rate is shown as a function of the false positive rate. The figure is based on the SYF training set. The shaded area shows the area under the curve used to calculate the AUC0.1. The insert shows the complete curves.
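The fractional AUC shown as the shaded area in Fig. 3 can be approximated as below. This sketch rests on two assumptions not stated in this record: the area is integrated with the trapezoidal rule, and it is divided by the false positive rate cutoff so that a perfect ranking scores 1. The function names are hypothetical, and both classes must be present in `labels`.

```python
def roc_points(scores, labels):
    """Return ROC points (fpr, tpr) for decreasing score thresholds."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        # Peptides with identical scores enter the curve as one step.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def partial_auc(scores, labels, max_fpr=0.1):
    """Area under the ROC curve for FPR in [0, max_fpr], divided by max_fpr."""
    pts = roc_points(scores, labels)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the last segment at the cutoff by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area / max_fpr


# One ligand among ten negatives, with made-up scores.
labels = [1] + [0] * 10
negatives = [i / 20 for i in range(10)]
auc01_top = partial_auc([1.0] + negatives, labels)      # ligand ranked first
auc01_second = partial_auc([0.42] + negatives, labels)  # one negative ranked higher
auc_second = partial_auc([0.42] + negatives, labels, max_fpr=1.0)
```

In this toy case a single higher-scoring negative drops the fractional value from 1 to 0 while the full AUC stays near 0.9, which illustrates why the fractional measure rewards high specificity.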
AUC and fractional AUC value comparison between NetCTLpan and NetMHCpan
| Data | Measure | NetCTLpan | NetMHCpan | p value |
|---|---|---|---|---|
| Train (9) | AUC | 0.976 | | 0.056 |
| Train (9) | AUC0.1 | **0.869** | 0.852 | 0.002 |
| Test (8/9/10/11) | AUC | 0.977 | | 0.273 |
| Test (8/9/10/11) | AUC0.1 | | 0.855 | 0.002 |
| Test (HIV) | AUC | | 0.920 | 0.028 |
| Test (HIV) | AUC0.1 | | 0.593 | 0.106 |
| Test (HLA-C) | AUC | | 0.866 | <0.001 |
| Test (HLA-C) | AUC0.1 | | 0.307 | <0.001 |
The performance values are calculated as average per protein AUC values over the corresponding data sets. p values are calculated by a paired t test excluding ties. The best performing method is, for each data set and performance measure, highlighted in bold
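The paired comparison of per-protein values can be sketched as follows, assuming that "excluding ties" means dropping proteins for which both methods give identical values. The function name and the example values are hypothetical. The sketch returns the t statistic and degrees of freedom only; a p value would then come from Student's t distribution, for instance via a statistics library.

```python
import math
from statistics import mean, stdev


def paired_t_excluding_ties(a, b):
    """Paired t statistic over the pairs in which the two methods differ.

    Returns (t, degrees_of_freedom). Assumption: tied pairs are dropped
    before the test is computed.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two non-tied pairs")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all non-tied differences are identical")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Made-up per-protein AUC values for two methods; the second pair is a tie.
method_a = [0.90, 0.80, 0.70, 0.95]
method_b = [0.80, 0.80, 0.50, 0.90]
t_stat, dof = paired_t_excluding_ties(method_a, method_b)
```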
Supertype-specific weights benchmark
| Supertype | Cleavage weight | TAP weight | Train: Fixed | Train: Specific | Train: p value | Test (8/9/10/11): Fixed | Test (8/9/10/11): Specific | Test (8/9/10/11): p value | Test (HIV): Fixed | Test (HIV): Specific | Test (HIV): p value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A1 | 0.050 | 0.075 | 0.942 | | 0.294 | | 0.936 | 0.326 | 0.381 | | 0.610 |
| A2 | 0.550 | 0.000 | 0.808 | | 0.133 | | 0.758 | 0.008 | | 0.657 | 0.104 |
| A3 | 0.225 | 0.025 | 0.890 | 0.890 | 0.598 | 0.872 | 0.872 | (a) | 0.648 | 0.648 | (a) |
| A24 | 0.000 | 0.000 | 0.917 | | 0.257 | 0.783 | | 0.389 | 0.636 | | 0.960 |
| A26 | 0.275 | 0.025 | 0.885 | | 0.476 | | 0.864 | 0.006 | 0.761 | | 0.500 |
| B7 | 0.000 | 0.000 | 0.710 | | 0.378 | 0.765 | | 0.998 | | 0.437 | 0.064 |
| B8 | 0.725 | 0.000 | 0.916 | | 0.231 | 0.858 | | 0.517 | 0.132 | | 0.374 |
| B62 | 0.275 | 0.200 | 0.889 | | 0.014 | | 0.727 | 0.345 | 0.440 | | 0.303 |
| B27 | 0.475 | 0.025 | 0.911 | | 0.014 | | 0.902 | 0.001 | | 0.299 | 0.390 |
| B39 | 0.175 | 0.025 | 0.859 | | 0.896 | | 0.849 | 0.362 | | 0.711 | (b) |
| B44 | 0.100 | 0.025 | 0.859 | | 0.127 | 0.885 | | 0.001 | | 0.631 | 0.845 |
| B58 | 0.025 | 0.025 | 0.959 | | 0.399 | 0.820 | | 0.161 | 0.774 | | 0.128 |
| All | 0.225 | 0.025 | 0.869 | | 0.016 | | 0.860 | 0.143 | | 0.603 | 0.300 |
Optimal weights per supertype are shown. Performance is given as the average AUC0.1 value for each data set. Fixed weights for proteasomal cleavage and TAP transport efficiency are 0.225 and 0.025, respectively. The higher AUC0.1 value is highlighted in bold for each data set and supertype
(a) AUC0.1 values are equal for fixed and specific weights, (b) only one sample available for the given supertype
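How the fixed weights might enter a combined score is sketched below. The functional form is an assumption: this record gives the weights (0.225 for proteasomal cleavage, 0.025 for TAP transport efficiency) but not the formula, so a simple weighted sum with the MHC binding score at weight 1 is used for illustration. The function name, peptide names, and input scores are hypothetical.

```python
CLEAVAGE_WEIGHT = 0.225  # fixed weight reported for proteasomal cleavage
TAP_WEIGHT = 0.025       # fixed weight reported for TAP transport efficiency


def combined_score(mhc, cleavage, tap,
                   w_cleavage=CLEAVAGE_WEIGHT, w_tap=TAP_WEIGHT):
    """Weighted sum of the three pathway scores (assumed linear form)."""
    return mhc + w_cleavage * cleavage + w_tap * tap


# Made-up (MHC, cleavage, TAP) scores for three candidate peptides.
candidates = {
    "peptide_1": (0.60, 0.80, 1.20),
    "peptide_2": (0.65, 0.10, 0.40),
    "peptide_3": (0.30, 0.90, 2.00),
}
ranked = sorted(candidates,
                key=lambda name: combined_score(*candidates[name]),
                reverse=True)
```

With these made-up inputs, peptide_1 overtakes peptide_2 despite a lower MHC score, because its cleavage and TAP contributions are larger. Supertype-specific weights, as benchmarked in the table above, would be passed through `w_cleavage` and `w_tap`.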
Benchmark comparison of the NetCTLpan and the NetCTL methods
| Data | Measure | NetCTLpan | NetCTL | p value |
|---|---|---|---|---|
| Train (9) | AUC | **0.976** | 0.971 | 0.018 |
| Train (9) | AUC0.1 | **0.869** | 0.816 | <0.001 |
| Test (9) | AUC | **0.982** | 0.975 | <0.001 |
| Test (9) | AUC0.1 | **0.877** | 0.802 | <0.001 |
| Test (HIV) | AUC | 0.933 | | 0.366 (a) |
| Test (HIV) | AUC0.1 | | 0.606 | 0.600 |
Average AUC and AUC0.1 values for the NetCTLpan and NetCTL methods calculated for the SYF train set and the SYF and HIV test sets. For each data set and performance measure, the best performing method is shown in bold. p values are calculated by a paired t test excluding ties.
(a) When using full HLA typing information, the NetCTLpan performance values are 0.959 and 0.745 for AUC and AUC0.1, respectively. Both these values are significantly higher than the values of NetCTL.
Benchmark comparison of NetCTL, NetCTLpan, and NetMHCpan_ST (supertype-specific version of NetCTLpan)
| Data | Measure | NetCTL | NetCTLpan | NetMHCpan_ST |
|---|---|---|---|---|
| Train (9) | AUC | 0.971 | 0.976 | 0.971 |
| Train (9) | AUC0.1 | 0.816 | 0.869 | 0.830 |
| Test (9) | AUC | 0.975 | 0.982 | 0.971 |
| Test (9) | AUC0.1 | 0.802 | 0.877 | 0.805 |
| Test (8/10/11) | AUC | NA | 0.972 | 0.961 |
| Test (8/10/11) | AUC0.1 | NA | 0.848 | 0.770 |
The performance values are calculated as average per protein AUC values for the training and test data sets
Benchmark comparison of the NetCTLpan and MHC-pathway methods
| Data | Measure | | | p value | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Train (9) | AUC | 0.978 | 0.972 | <0.001 | 0.983 | 0.981 | 0.839 | 0.881 | 0.803 | 438 |
| Train (9) | AUC0.1 | 0.874 | 0.854 | 0.01 | 0.858 | 0.862 | 0.278 | 0.360 | 0.260 | 438 |
| Test (9) | AUC | 0.978 | 0.974 | <0.001 | 0.978 | 0.977 | 0.809 | 0.870 | 0.774 | 615 |
| Test (9) | AUC0.1 | 0.871 | 0.847 | <0.001 | 0.864 | 0.870 | 0.204 | 0.362 | 0.215 | 615 |
| Test (10) | AUC | 0.966 | 0.957 | <0.005 | 0.964 | 0.966 | 0.810 | 0.817 | 0.734 | 291 |
| Test (10) | AUC0.1 | 0.842 | 0.800 | <0.005 | 0.835 | 0.824 | 0.272 | 0.238 | 0.180 | 291 |
The performance values are calculated as average per protein AUC values for the training and test data sets. The benchmark is made on the subset of the SYF ligand data sets covered by the MHC-pathway method.
(a) MHC prediction score from MHC-pathway method
(b) Immunoproteasomal cleavage score from MHC-pathway predictions. The TAP prediction method is identical between the two methods. p values for the comparison of NetCTLpan to MHC-pathway are calculated by a paired t test excluding ties.