Morten Nielsen, Sune Justesen, Ole Lund, Claus Lundegaard, Søren Buus.
Abstract
BACKGROUND: Binding of peptides to Major Histocompatibility Complex class II (MHC-II) molecules plays a central role in governing responses of the adaptive immune system. MHC-II molecules sample peptides from the extracellular space, allowing the immune system to detect the presence of foreign microbes from this compartment. Predicting which peptides bind to an MHC-II molecule is therefore of pivotal importance for understanding the immune response and its effect on host-pathogen interactions. The experimental cost of characterizing the binding motif of an MHC-II molecule is significant, and considerable effort has therefore been invested in developing accurate computational methods capable of predicting this binding event. Prediction of peptide binding to MHC-II is complicated by the open binding cleft of the MHC-II molecule, which allows binding of peptides that extend out of the binding groove. Moreover, the genes encoding the MHC molecules are immensely diverse, leading to a large set of different MHC molecules, each potentially binding a unique set of peptides. Characterizing each MHC-II molecule using peptide-screening binding assays is hence not a viable option.
Year: 2010 PMID: 21073747 PMCID: PMC2994798 DOI: 10.1186/1745-7580-6-9
Source DB: PubMed Journal: Immunome Res ISSN: 1745-7580
Quantitative HLA-DR peptide binding data
| Allele | # | # bind | Allele | # | # bind |
|---|---|---|---|---|---|
| DRB1*0101 | 7685 | 4382 | DRB1*1101 | 1794 | 778 |
| DRB1*0301 | 2505 | 649 | DRB1*1201 | 117 | 81 |
| DRB1*0302 | 148 | 44 | DRB1*1202 | 117 | 79 |
| DRB1*0401 | 3116 | 1039 | DRB1*1302 | 1580 | 493 |
| DRB1*0404 | 577 | 336 | DRB1*1402 | 118 | 78 |
| DRB1*0405 | 1582 | 627 | DRB1*1404 | 30 | 16 |
| DRB1*0701 | 1745 | 849 | DRB1*1412 | 116 | 63 |
| DRB1*0802 | 1520 | 431 | DRB1*1501 | 1769 | 709 |
| DRB1*0806 | 118 | 91 | DRB3*0101 | 1501 | 281 |
| DRB1*0813 | 1370 | 455 | DRB3*0301 | 160 | 70 |
| DRB1*0819 | 116 | 54 | DRB4*0101 | 1521 | 485 |
| DRB1*0901 | 1520 | 622 | DRB5*0101 | 3106 | 1280 |
| Total | 33931 | 13992 | | | |
# is the number of peptide binding measurements for each allele, and # bind is the number of peptides with a binding affinity stronger than 500 nM.
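The share of binders differs considerably between alleles, which is worth keeping in mind when comparing per-allele performance values. As a minimal sketch, the fraction of binders can be derived from the two count columns; the four alleles and the totals below are copied from the table above, while the function and variable names are illustrative only.

```python
# (number of measurements, number of binders at 500 nM), copied from the table above
counts = {
    "DRB1*0101": (7685, 4382),
    "DRB1*0301": (2505, 649),
    "DRB1*1404": (30, 16),
    "DRB3*0101": (1501, 281),
}
TOTAL = (33931, 13992)  # the "Total" row of the table


def binder_fraction(n_measured, n_binders):
    """Fraction of measured peptides that bind stronger than 500 nM."""
    return n_binders / n_measured


fractions = {allele: binder_fraction(*c) for allele, c in counts.items()}
overall = binder_fraction(*TOTAL)

# List alleles from the lowest to the highest binder fraction
for allele, frac in sorted(fractions.items(), key=lambda kv: kv[1]):
    print(f"{allele}: {frac:.3f}")
print(f"all alleles: {overall:.3f}")
```
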
MHC class II ligands from the SYFPEITHI database
| Allele | # | Allele | # |
|---|---|---|---|
| HLA-DRB1*0101 | 53 | HLA-DRB1*1101 | 35 |
| HLA-DRB1*0102 | 5 | HLA-DRB1*1104 | 8 |
| HLA-DRB1*0301 | 88 | HLA-DRB1*1201 | 11 |
| HLA-DRB1*0401 | 468 | HLA-DRB1*1301 | 16 |
| HLA-DRB1*0402 | 36 | HLA-DRB1*1302 | 19 |
| HLA-DRB1*0403 | 1 | HLA-DRB1*1401 | 9 |
| HLA-DRB1*0404 | 42 | HLA-DRB1*1501 | 22 |
| HLA-DRB1*0405 | 36 | HLA-DRB1*1502 | 3 |
| HLA-DRB1*0701 | 47 | HLA-DRB1*1601 | 2 |
| HLA-DRB1*0801 | 39 | HLA-DRB3*0101 | 2 |
| HLA-DRB1*0802 | 1 | HLA-DRB3*0301 | 5 |
| HLA-DRB1*0803 | 1 | HLA-DRB4*0101 | 6 |
| HLA-DRB1*0901 | 6 | HLA-DRB4*0103 | 2 |
| HLA-DRB1*1001 | 183 | HLA-DRB5*0101 | 18 |
| Total | 1164 | | |
HLA-DR restricted T cell epitopes from the IEDB database
| Allele | # | Allele | # |
|---|---|---|---|
| HLA-DRB1*0101 | 125 | HLA-DRB1*1103 | 3 |
| HLA-DRB1*0102 | 4 | HLA-DRB1*1104 | 6 |
| HLA-DRB1*0103 | 5 | HLA-DRB1*1201 | 3 |
| HLA-DRB1*0301 | 173 | HLA-DRB1*1301 | 15 |
| HLA-DRB1*0401 | 342 | HLA-DRB1*1302 | 10 |
| HLA-DRB1*0402 | 33 | HLA-DRB1*1303 | 3 |
| HLA-DRB1*0403 | 14 | HLA-DRB1*1401 | 16 |
| HLA-DRB1*0404 | 46 | HLA-DRB1*1404 | 1 |
| HLA-DRB1*0405 | 21 | HLA-DRB1*1405 | 2 |
| HLA-DRB1*0406 | 6 | HLA-DRB1*1501 | 193 |
| HLA-DRB1*0407 | 4 | HLA-DRB1*1502 | 20 |
| HLA-DRB1*0408 | 2 | HLA-DRB1*1503 | 2 |
| HLA-DRB1*0701 | 56 | HLA-DRB1*1601 | 5 |
| HLA-DRB1*0703 | 1 | HLA-DRB1*1602 | 3 |
| HLA-DRB1*0801 | 4 | HLA-DRB3*0101 | 12 |
| HLA-DRB1*0802 | 2 | HLA-DRB3*0202 | 10 |
| HLA-DRB1*0803 | 2 | HLA-DRB3*0301 | 1 |
| HLA-DRB1*0901 | 13 | HLA-DRB4*0101 | 17 |
| HLA-DRB1*1001 | 4 | HLA-DRB4*0103 | 1 |
| HLA-DRB1*1101 | 88 | HLA-DRB5*0101 | 55 |
| HLA-DRB1*1102 | 1 | HLA-DRB5*0102 | 1 |
| Total | 1325 | | |
Five-fold cross-validation performance of the pan-specific NetMHCIIpan-2.0 method compared to the allele-specific NN-align and TEPITOPE methods on the quantitative benchmark data set
| DRB1*0101 | 7685 | 4382 | 0.675 | 0.825 | 0.727 | ||
| DRB1*0301 | 2505 | 649 | 0.690 | 0.855 | 0.718 | ||
| DRB1*0302 | 148 | 44 | 0.272 | 0.659 | |||
| DRB1*0401 | 3116 | 1039 | 0.643 | 0.833 | 0.762 | ||
| DRB1*0404 | 577 | 336 | 0.565 | 0.766 | 0.747 | ||
| DRB1*0405 | 1582 | 627 | 0.698 | 0.858 | 0.780 | ||
| DRB1*0701 | 1745 | 849 | 0.718 | 0.855 | 0.777 | ||
| DRB1*0802 | 1520 | 431 | 0.518 | 0.778 | 0.645 | ||
| DRB1*0806 | 118 | 91 | 0.744 | 0.902 | 0.884 | ||
| DRB1*0813 | 1370 | 455 | 0.729 | 0.878 | 0.750 | ||
| DRB1*0819 | 116 | 54 | 0.370 | 0.706 | |||
| DRB1*0901 | 1520 | 622 | 0.597 | 0.810 | |||
| DRB1*1101 | 1794 | 778 | 0.756 | 0.873 | 0.793 | ||
| DRB1*1201 | 117 | 81 | 0.699 | 0.860 | |||
| DRB1*1202 | 117 | 79 | 0.695 | 0.866 | |||
| DRB1*1302 | 1580 | 493 | 0.634 | 0.825 | 0.596 | ||
| DRB1*1402 | 118 | 78 | 0.623 | 0.825 | |||
| DRB1*1404 | 30 | 16 | 0.466 | 0.661 | |||
| DRB1*1412 | 116 | 63 | 0.680 | 0.857 | |||
| DRB1*1501 | 1769 | 709 | 0.641 | 0.815 | 0.731 | ||
| DRB3*0101 | 1501 | 281 | 0.673 | 0.843 | |||
| DRB3*0301 | 160 | 70 | 0.604 | 0.826 | |||
| DRB4*0101 | 1521 | 485 | 0.675 | 0.837 | |||
| DRB5*0101 | 3106 | 1280 | 0.735 | 0.865 | 0.760 | ||
| Ave | 33931 | 13992 | 0.631 | 0.821 | 0.688 | 0.846 | |
| Ave* | 0.673 | 0.841 | 0.697 | 0.854 | 0.744 | ||
| Ave** | 0.580 | 0.797 | 0.679 | 0.837 | |||
# gives the number of peptide binding measurements for each allele, and #bind gives the number of peptides with a binding affinity stronger than 500 nM. NN-align is the method described by Nielsen et al. [5], NetMHCIIpan-2.0 is the method described here, and TEPITOPE is the method described by Sturniolo et al. [1]. Ave gives the per-allele average, Ave* gives the per-allele average of the 13 alleles characterized by the TEPITOPE method, and Ave** gives the per-allele average of the 11 alleles not characterized by the TEPITOPE method. The best-performing method for each of the 24 alleles is highlighted in bold. AUC values were calculated using a binding threshold of 500 nM. Only AUC values are included for the TEPITOPE method, since prediction values for this method are not linearly related to the binding affinity.
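The AUC at a fixed binding threshold can be sketched as follows. This is an illustrative implementation rather than the authors' evaluation code: it assumes that "stronger than 500 nM" means a measured affinity below 500 nM, it uses the pairwise (Mann-Whitney) definition of the AUC, and the peptide affinities and prediction scores are hypothetical toy values. Because the AUC depends only on how the scores rank the peptides, it stays meaningful for a method whose scores are not linearly related to affinity, which the last lines of the sketch demonstrate.

```python
import math


def auc_at_threshold(affinities_nm, scores, threshold_nm=500.0):
    """AUC with binders defined as peptides whose affinity is below threshold_nm.

    Computed as the probability that a randomly chosen binder receives a
    higher score than a randomly chosen non-binder (ties count one half).
    """
    binders = [s for a, s in zip(affinities_nm, scores) if a < threshold_nm]
    non_binders = [s for a, s in zip(affinities_nm, scores) if a >= threshold_nm]
    if not binders or not non_binders:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for b in binders:
        for n in non_binders:
            if b > n:
                wins += 1.0
            elif b == n:
                wins += 0.5
    return wins / (len(binders) * len(non_binders))


# Hypothetical toy data: measured affinities (nM) and prediction scores
affinities_nm = [20, 150, 480, 900, 5000, 30000]
scores = [0.9, 0.7, 0.4, 0.5, 0.2, 0.1]

auc = auc_at_threshold(affinities_nm, scores)

# A monotone, non-linear rescaling of the scores leaves the ranking,
# and therefore the AUC, unchanged.
rescaled = [math.exp(5.0 * s) for s in scores]
auc_rescaled = auc_at_threshold(affinities_nm, rescaled)
print(round(auc, 3), round(auc_rescaled, 3))
```
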
LOO benchmark comparison of the pan-specific NetMHCIIpan-2.0 and the NetMHCIIpan-1.0 methods
| DRB1*0101 | 5166 | 3510 | 0.571 | 0.778 | 0.720 | ||
| DRB1*0301 | 1020 | 277 | 0.465 | 0.746 | 0.664 | ||
| DRB1*0401 | 1024 | 510 | 0.591 | 0.775 | 0.716 | ||
| DRB1*0404 | 663 | 386 | 0.693 | 0.852 | 0.770 | ||
| DRB1*0405 | 630 | 425 | 0.594 | 0.808 | 0.759 | ||
| DRB1*0701 | 853 | 498 | 0.655 | 0.825 | 0.761 | ||
| DRB1*0802 | 420 | 148 | 0.637 | 0.841 | 0.766 | ||
| DRB1*0901 | 530 | 254 | 0.406 | 0.653 | |||
| DRB1*1101 | 950 | 429 | 0.580 | 0.799 | 0.721 | ||
| DRB1*1302 | 498 | 199 | 0.323 | 0.648 | 0.652 | ||
| DRB1*1501 | 934 | 450 | 0.533 | 0.738 | 0.686 | ||
| DRB3*0101 | 549 | 75 | 0.449 | 0.716 | |||
| DRB4*0101 | 446 | 200 | 0.448 | 0.724 | |||
| DRB5*0101 | 924 | 478 | 0.627 | 0.831 | 0.686 | ||
| Ave | 0.541 | 0.768 | 0.606 | 0.799 | |||
| Ave* | 0.570 | 0.786 | 0.639 | 0.819 | 0.718 | ||
The two methods are compared in a leave-one-out experiment on the peptide binding data described in the original NetMHCIIpan publication [29].
# is the number of peptide binding measurements for each allele, and #bind is the number of peptides with a binding affinity stronger than 500 nM. NetMHCIIpan-1.0 is the method by Nielsen et al. [29], NetMHCIIpan-2.0 is the method described here, and TEPITOPE is the method by Sturniolo et al. [1]. Prediction values for NetMHCIIpan-1.0 were taken from [29]. Ave gives the per-allele average, and Ave* gives the per-allele average of the 11 alleles characterized by the TEPITOPE method. The best-performing method for each of the 14 alleles is highlighted in bold. AUC values were calculated using a binding threshold of 500 nM. Only AUC values are included for the TEPITOPE method, since prediction values for this method are not linearly related to the binding affinity.
The extended LOO benchmark
| OLD | NEW | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| DRB1*0101 | 7685 | 4382 | 0.567 | 0.767 | DRB1*0401 | 0.352 | DRB1*1402 | 0.322 | 0.727 | ||
| DRB1*0301 | 2505 | 649 | 0.433 | 0.727 | DRB3*0101 | 0.277 | DRB1*0302 | 0.156 | 0.718 | ||
| DRB1*0401 | 3116 | 1039 | 0.563 | 0.787 | DRB1*0405 | 0.066 | DRB1*0405 | 0.066 | 0.762 | ||
| DRB1*0404 | 577 | 336 | 0.592 | DRB1*0401 | 0.091 | 0.804 | DRB1*0401 | 0.091 | 0.747 | ||
| DRB1*0405 | 1582 | 627 | 0.826 | DRB1*0401 | 0.066 | 0.633 | DRB1*0401 | 0.066 | |||
| DRB1*0701 | 1745 | 849 | DRB1*0901 | 0.504 | 0.648 | 0.826 | DRB1*0901 | 0.504 | 0.780 | ||
| DRB1*0802 | 1520 | 431 | DRB1*1101 | 0.111 | 0.369 | 0.692 | DRB1*0813 | 0.041 | 0.777 | ||
| DRB1*0901 | 1520 | 622 | 0.757 | DRB5*0101 | 0.431 | 0.517 | DRB5*0101 | 0.431 | 0.645 | ||
| DRB1*1101 | 1794 | 778 | DRB1*1302 | 0.084 | 0.460 | 0.741 | DRB1*1302 | 0.084 | |||
| DRB1*1302 | 1580 | 493 | DRB1*1101 | 0.084 | 0.323 | 0.671 | DRB1*1101 | 0.084 | 0.793 | ||
| DRB1*1501 | 1769 | 709 | DRB1*0404 | 0.295 | 0.525 | 0.756 | DRB1*0404 | 0.295 | 0.596 | ||
| DRB3*0101 | 1501 | 281 | 0.339 | 0.672 | DRB1*0301 | 0.277 | DRB3*0301 | 0.223 | 0.731 | ||
| DRB4*0101 | 1521 | 485 | 0.506 | 0.753 | DRB1*0404 | 0.397 | DRB1*0404 | 0.397 | |||
| DRB5*0101 | 3106 | 1280 | 0.547 | 0.781 | DRB1*1101 | 0.295 | DRB1*1101 | 0.295 | |||
| DRB1*0302 | 148 | 44 | 0.396 | 0.729 | DRB1*0301 | 0.156 | 0.759 | DRB1*1402 | 0.119 | ||
| DRB1*0806 | 118 | 91 | 0.670 | 0.886 | DRB1*0802 | 0.107 | DRB1*0802 | 0.107 | |||
| DRB1*0813 | 1370 | 455 | 0.735 | DRB1*0802 | 0.041 | 0.666 | DRB1*0802 | 0.041 | |||
| DRB1*0819 | 116 | 54 | 0.789 | DRB1*0802 | 0.107 | DRB1*0813 | 0.083 | 0.750 | |||
| DRB1*1201 | 117 | 81 | 0.786 | DRB1*1101 | 0.445 | 0.609 | DRB1*1202 | 0.045 | |||
| DRB1*1202 | 117 | 79 | 0.623 | 0.814 | DRB1*1101 | 0.399 | DRB1*1201 | 0.045 | |||
| DRB1*1402 | 118 | 78 | 0.570 | 0.793 | DRB1*1101 | 0.148 | DRB1*0302 | 0.119 | |||
| DRB1*1404 | 30 | 16 | 0.393 | 0.594 | DRB1*0404 | 0.311 | DRB1*0806 | 0.240 | |||
| DRB1*1412 | 116 | 63 | 0.640 | 0.845 | DRB1*0802 | 0.180 | DRB1*0813 | 0.139 | |||
| DRB3*0301 | 160 | 70 | 0.395 | 0.738 | DRB3*0101 | 0.223 | DRB3*0101 | 0.223 | |||
| Ave | 0.527 | 0.766 | 0.554 | 0.780 | |||||||
| Ave* | 0.543 | 0.779 | 0.529 | 0.774 | 0.744 | ||||||
| Ave** | 0.539 | 0.771 | 0.606 | 0.800 | |||||||
The predictive performance of the pan-specific NN-align method when trained in a leave-one-out experiment and evaluated on the 24 alleles included in the new peptide binding data set.
# is the number of peptide binding measurements for each allele, and #bind is the number of peptides with a binding affinity stronger than 500 nM. OLD is the method described here trained on the old peptide data set, NEW is the method described here trained on the new data set, and TEPITOPE is the method by Sturniolo et al. [1]. NN is the nearest neighbor as defined by the pseudo sequence distance, and dist is the nearest neighbor distance calculated as described in Materials and methods. Ave is the per-allele average, Ave* is the per-allele average of the 13 alleles characterized by the TEPITOPE method, and Ave** is the per-allele average performance of the 10 alleles included in the new peptide binding data set. The best-performing method for each of the 24 alleles is highlighted in bold. AUC values were calculated using a binding threshold of 500 nM. Only AUC values are included for the TEPITOPE method, since prediction values for this method are not linearly related to the binding affinity. The double line separates the 10 novel alleles from the original 14 alleles included in the development of the NetMHCIIpan-1.0 method.
Predictive performance in terms of the AUC on the Lin benchmark data set
| Allele | Multipred_SVM | SVMHC | |||
|---|---|---|---|---|---|
| DRB1*0101 | 0.883 | 0.847 | 0.860 | 0.860 | |
| DRB1*0301 | 0.716 | 0.668 | 0.718 | 0.690 | |
| DRB1*0401 | 0.815 | 0.745 | 0.650 | 0.750 | |
| DRB1*0701 | 0.852 | 0.715 | 0.700 | 0.740 | |
| DRB1*1101 | 0.821 | 0.824 | 0.780 | 0.830 | |
| DRB1*1301 | 0.715 | 0.718 | 0.630 | 0.720 | |
| DRB1*1501 | 0.791 | 0.737 | 0.620 | 0.660 | |
| Ave | 0.825 | 0.787 | 0.768 | 0.720 | 0.750 |
The AUC was calculated using the following binding affinity threshold values for each of the 7 alleles: DRB1*0101, 0401, 0701, and 1501 threshold = 100 nM, DRB1*0301, 1101, and 1301, threshold = 1000 nM. The performance values for Multipred_SVM and SVMHC were taken from Nielsen et al. [5]. TEPITOPE is the method described by Sturniolo et al. [1]. NetMHCIIpan-2.0 is the pan-specific method described here, and NetMHCIIpan-1.0 is the pan-specific method by Nielsen et al. [29]. For each allele, the best performing method is highlighted in bold.
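The allele-specific thresholds listed above can be captured in a small lookup table before the AUC is calculated. The sketch below is illustrative: the threshold values are taken from the caption, while the names `LIN_THRESHOLDS_NM` and `is_binder` are hypothetical, and treating a peptide as a binder when its affinity is strictly below the threshold is an assumption.

```python
# Binding affinity thresholds (nM) per allele, as stated in the caption above
LIN_THRESHOLDS_NM = {
    "DRB1*0101": 100,
    "DRB1*0401": 100,
    "DRB1*0701": 100,
    "DRB1*1501": 100,
    "DRB1*0301": 1000,
    "DRB1*1101": 1000,
    "DRB1*1301": 1000,
}


def is_binder(allele, affinity_nm):
    """Label a peptide as a binder using the threshold for its allele.

    Assumption: binder means affinity strictly below the threshold.
    """
    return affinity_nm < LIN_THRESHOLDS_NM[allele]


# The same measured affinity can fall on different sides of the threshold
for allele in ("DRB1*0101", "DRB1*0301"):
    print(allele, is_binder(allele, 300.0))
```
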
Endogenous HLA-DR ligand benchmark
| SYF | # | |||
|---|---|---|---|---|
| Ave per ligand | 1164 | 0.800 | ||
| Ave per allele | 28 | 0.788 | ||
| In TEPITOPE | 17 | 0.768 | 0.786 | |
| !In TEPITOPE | 11 | 0.814 | ||
| Ave per epitope | 1325 | 0.729 | ||
| Ave per allele | 42 | 0.759 | ||
| In TEPITOPE | 20 | 0.745 | 0.747 | |
| !In TEPITOPE | 22 | 0.772 | ||
NetMHCIIpan-1.0 is the method described by Nielsen et al. [29], NetMHCIIpan-2.0 is the pan-specific method described here, and TEPITOPE is the method described by Sturniolo et al. [1]. Ave per ligand/epitope gives the average AUC over the 1164/1325 ligands/epitopes in the benchmark data set. Ave per allele gives the average over the per-allele averaged AUC values. In TEPITOPE gives the per-allele average of the subset of alleles characterized by the TEPITOPE method, and !In TEPITOPE gives the per-allele average performance of the alleles not characterized by the TEPITOPE method. AUC values were calculated as described in the text. For each benchmark subset, the best-performing method is highlighted in bold.
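The two averaging schemes can give different numbers when some alleles contribute many more ligands than others. A minimal sketch of the distinction, using hypothetical allele labels and AUC values rather than data from the benchmark:

```python
from collections import defaultdict


def average_per_item(records):
    """Mean AUC over all ligands/epitopes, each counted once."""
    return sum(auc for _, auc in records) / len(records)


def average_per_allele(records):
    """Mean of the per-allele mean AUCs, each allele counted once."""
    by_allele = defaultdict(list)
    for allele, auc in records:
        by_allele[allele].append(auc)
    allele_means = [sum(v) / len(v) for v in by_allele.values()]
    return sum(allele_means) / len(allele_means)


# Hypothetical (allele, AUC) pairs: allele A has three ligands, allele B one
records = [("A", 0.9), ("A", 0.8), ("A", 0.7), ("B", 0.5)]

per_item = average_per_item(records)      # dominated by the well-covered allele
per_allele = average_per_allele(records)  # both alleles weigh the same
print(round(per_item, 3), round(per_allele, 3))
```

The per-allele average keeps a heavily sampled allele from dominating the summary, which matters for data sets where a single allele accounts for a large share of the entries.
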
Figure 1. Histogram of the predictive performance measured in terms of the AUC value for the ligands/epitopes in the SYFPEITHI/IEDB dataset as a function of the peptide length. 2.0 refers to the pan-specific method developed here, and 1.0 refers to the NetMHCIIpan-1.0 method. SYF refers to the SYFPEITHI ligand data set, and IEDB refers to the IEDB T cell epitope data set.