Peng Wang, John Sidney, Yohan Kim, Alessandro Sette, Ole Lund, Morten Nielsen, Bjoern Peters.
Abstract
BACKGROUND: MHC class II binding predictions are widely used to identify epitope candidates in infectious agents, allergens, cancer and autoantigens. The vast majority of prediction algorithms for human MHC class II to date have targeted HLA molecules encoded in the DR locus. This reflects a significant gap in knowledge as HLA DP and DQ molecules are presumably equally important, and have only been studied less because they are more difficult to handle experimentally.
Year: 2010 PMID: 21092157 PMCID: PMC2998531 DOI: 10.1186/1471-2105-11-568
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Overview of human MHC class II loci, alleles and polymorphism.
| Locus | Gene | Chain | # of alleles |
|---|---|---|---|
| HLA-DP | HLA-DPA1 | alpha | 28 |
| HLA-DP | HLA-DPB1 | beta | 138 |
| HLA-DQ | HLA-DQA1 | alpha | 35 |
| HLA-DQ | HLA-DQB1 | beta | 108 |
| HLA-DR | HLA-DRA | alpha | 3 |
| HLA-DR | HLA-DRB1 | beta | 785 |
| HLA-DR | HLA-DRB2 | beta | 1 |
| HLA-DR | HLA-DRB3 | beta | 52 |
| HLA-DR | HLA-DRB4 | beta | 14 |
| HLA-DR | HLA-DRB5 | beta | 19 |
| HLA-DR | HLA-DRB6 | beta | 3 |
| HLA-DR | HLA-DRB7 | beta | 2 |
| HLA-DR | HLA-DRB8 | beta | 1 |
| HLA-DR | HLA-DRB9 | beta | 1 |
Information was extracted from the IMGT database. HLA-DM and HLA-DO molecules are not included because they are not expressed on the cell surface.
Overview of MHC class II binding dataset utilized in the present study.
| Allelic variant | # of binding affinities | # of binders¹ | Fraction of binders | Population frequency (%)² |
|---|---|---|---|---|
| HLA-DPA1*0201-DPB1*0101 | 1399 | 702 | 0.5 | 16.0 |
| HLA-DPA1*0103-DPB1*0201 | 1404 | 635 | 0.45 | 17.5 |
| HLA-DPA1*01-DPB1*0401 | 1337 | 540 | 0.4 | 36.2 |
| HLA-DPA1*0301-DPB1*0402 | 1407 | 621 | 0.44 | 41.6 |
| HLA-DPA1*0201-DPB1*0501 | 1410 | 528 | 0.37 | 21.7 |
| HLA-DQA1*0501-DQB1*0201 | 1658 | 742 | 0.45 | 11.3 |
| HLA-DQA1*0501-DQB1*0301 | 1689 | 1023 | 0.61 | 35.1 |
| HLA-DQA1*0301-DQB1*0302 | 1719 | 670 | 0.39 | 19.0 |
| HLA-DQA1*0401-DQB1*0402 | 1701 | 731 | 0.43 | 12.8 |
| HLA-DQA1*0101-DQB1*0501 | 1739 | 687 | 0.4 | 14.6 |
| HLA-DQA1*0102-DQB1*0602 | 1629 | 974 | 0.6 | 14.6 |
| HLA-DRB1*0101 | 6427 | 4519 | 0.7 | 5.4 |
| HLA-DRB1*0301 | 1715 | 553 | 0.32 | 13.7 |
| HLA-DRB1*0401 | 1769 | 978 | 0.55 | 4.6 |
| HLA-DRB1*0404 | 577 | 396 | 0.69 | 3.6 |
| HLA-DRB1*0405 | 1582 | 806 | 0.51 | 6.2 |
| HLA-DRB1*0701 | 1745 | 1033 | 0.59 | 13.5 |
| HLA-DRB1*0802 | 1520 | 591 | 0.39 | 4.9 |
| HLA-DRB1*0901 | 1520 | 815 | 0.54 | 6.2 |
| HLA-DRB1*1101 | 1794 | 957 | 0.53 | 11.8 |
| HLA-DRB1*1302 | 1580 | 656 | 0.42 | 7.7 |
| HLA-DRB1*1501 | 1769 | 909 | 0.51 | 12.2 |
| HLA-DRB3*0101 | 1501 | 426 | 0.28 | 26.1 |
| HLA-DRB4*0101 | 1521 | 654 | 0.43 | 41.8 |
| HLA-DRB5*0101 | 1769 | 992 | 0.56 | 16.0 |
| H-2-IAb | 660 | 180 | 0.27 | - |
| Total | 44541 | 22318 | ||
| Min | 577 | 180 | ||
| Max | 6427 | 4519 | ||
| DP | | | | 92.6 |
| DQ | | | | 81.6 |
| DRB1 | | | | 71.0 |
| DRB3/4/5 | | | | 70.9 |
| Total | | | | 99.9 |
1. Binder defined as IC50 <1000 nM.
2. Average haplotype and phenotype frequencies for individual alleles are based on data available at dbMHC. The dbMHC data cover populations in Europe, North Africa, North-East Asia, the South Pacific (Australia and Oceania), Hispanic North and South America, American Indian, South-East Asia, South-West Asia, and Sub-Saharan Africa. DP, DRB1 and DRB3/4/5 frequencies consider only the beta chain frequency, given that the DRA chain is largely monomorphic, and that differences in DPA are not hypothesized to significantly influence binding. Frequency data are not available for DRB3/4/5 alleles. However, because of linkage with DRB1 alleles, coverage for these specificities may be assumed as follows: DRB3 with DR3, DR11, DR12, DR13 and DR14; DRB4 with DR4, DR7 and DR9; DRB5 with DR15 and DR16. Specific allele frequencies at each B3/B4/B5 locus are based on published associations with various DRB1 alleles, and assume only limited variation at the indicated locus.
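The binder definition in footnote 1 can be illustrated with a minimal sketch. The function names and the example IC50 values below are hypothetical and are not taken from the study; only the 1000 nM threshold and the DPA1*0201-DPB1*0101 counts (702 binders out of 1399 measurements) come from the table above.

```python
from typing import Sequence

# Footnote 1: a peptide counts as a binder when IC50 < 1000 nM.
BINDER_THRESHOLD_NM = 1000.0


def is_binder(ic50_nm: float, threshold_nm: float = BINDER_THRESHOLD_NM) -> bool:
    """Return True when the measured IC50 is strictly below the threshold."""
    return ic50_nm < threshold_nm


def binder_fraction(ic50_values_nm: Sequence[float]) -> float:
    """Fraction of measurements classified as binders."""
    if not ic50_values_nm:
        raise ValueError("need at least one IC50 measurement")
    n_binders = sum(is_binder(v) for v in ic50_values_nm)
    return n_binders / len(ic50_values_nm)


# Hypothetical measurements in nM (illustration only, not study data).
example_ic50 = [12.0, 450.0, 999.9, 1000.0, 25000.0]
example_fraction = binder_fraction(example_ic50)

# Consistency check against the table row for HLA-DPA1*0201-DPB1*0101.
table_fraction = round(702 / 1399, 2)
```

Note that a value of exactly 1000 nM is treated as a non-binder here, because the footnote uses a strict inequality.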
Comparison of the performance of ARB, SMM-align and PROPRED on the current and old datasets.
| Allelic variant | ARB (current) | ARB (old) | SMM-align (current) | SMM-align (old) | PROPRED (current) | PROPRED (old) |
|---|---|---|---|---|---|---|
| HLA-DPA1*0103-DPB1*0201 | 0.823 | | **0.921** | | | |
| HLA-DPA1*01-DPB1*0401 | 0.847 | | **0.930** | | | |
| HLA-DPA1*0201-DPB1*0101 | 0.824 | | **0.909** | | | |
| HLA-DPA1*0201-DPB1*0501 | 0.859 | | **0.923** | | | |
| HLA-DPA1*0301-DPB1*0402 | 0.821 | | **0.932** | | | |
| HLA-DQA1*0101-DQB1*0501 | 0.871 | | **0.930** | | | |
| HLA-DQA1*0102-DQB1*0602 | 0.777 | | **0.838** | | | |
| HLA-DQA1*0301-DQB1*0302 | 0.748 | | **0.807** | | | |
| HLA-DQA1*0401-DQB1*0402 | 0.845 | | **0.896** | | | |
| HLA-DQA1*0501-DQB1*0201 | 0.855 | | **0.901** | | | |
| HLA-DQA1*0501-DQB1*0301 | 0.844 | | **0.910** | | | |
| HLA-DRB1*0101 | 0.770 | 0.764 | **0.798** | 0.769 | 0.720 | 0.738 |
| HLA-DRB1*0301 | 0.753 | 0.660 | **0.852** | 0.693 | 0.699 | 0.652 |
| HLA-DRB1*0401 | 0.731 | 0.667 | **0.781** | 0.684 | 0.737 | 0.686 |
| HLA-DRB1*0404 | 0.707 | 0.724 | **0.816** | 0.753 | 0.769 | 0.789 |
| HLA-DRB1*0405 | 0.771 | 0.669 | **0.822** | 0.694 | 0.767 | 0.750 |
| HLA-DRB1*0701 | 0.767 | 0.692 | **0.834** | 0.776 | 0.773 | 0.776 |
| HLA-DRB1*0802 | 0.702 | 0.737 | 0.741 | 0.750 | 0.647 | |
| HLA-DRB1*0901 | 0.747 | 0.622 | **0.765** | 0.660 | | |
| HLA-DRB1*1101 | 0.800 | 0.731 | **0.864** | 0.808 | 0.804 | 0.796 |
| HLA-DRB1*1302 | 0.727 | 0.787 | **0.797** | 0.695 | 0.600 | 0.584 |
| HLA-DRB1*1501 | 0.763 | 0.700 | **0.796** | 0.738 | 0.743 | 0.715 |
| HLA-DRB3*0101 | 0.709 | 0.590 | **0.819** | 0.677 | | |
| HLA-DRB4*0101 | 0.785 | 0.741 | **0.816** | 0.713 | | |
| HLA-DRB5*0101 | 0.760 | 0.703 | **0.832** | 0.751 | 0.728 | 0.790 |
| H-2-IAb | 0.800 | 0.803 | **0.855** | 0.746 | | |
| Average | 0.784 | 0.706 | | 0.727 | 0.726 | 0.731 |
| Min | 0.702 | 0.590 | 0.741 | 0.660 | 0.600 | 0.584 |
| Max | 0.871 | 0.803 | 0.932 | 0.808 | 0.804 | 0.796 |
Best prediction performance for each allelic variant was highlighted in bold.
1. The current AUC values for ARB and SMM-align were derived by cross-validation. The current AUC values for PROPRED were derived by predicting affinities for the new dataset.
2. The old AUC values were taken from previous evaluation [30].
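The AUC values in this table summarize how well predicted scores separate binders from non-binders. The source does not spell out the exact computation, so the following is only a minimal sketch of the standard rank-based definition (the probability that a randomly chosen binder receives a higher score than a randomly chosen non-binder, with ties counted as one half). The function name, the scores and the IC50 values are hypothetical, and higher scores are assumed to mean stronger predicted binding.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Rank-based AUC: P(score of a binder > score of a non-binder)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


# Hypothetical data (illustration only): measured IC50 in nM and
# predicted scores where a higher score means stronger predicted binding.
measured_ic50 = [50.0, 300.0, 800.0, 5000.0, 20000.0]
predicted = [0.9, 0.8, 0.3, 0.6, 0.1]

# Binder labels follow the IC50 < 1000 nM definition used in the dataset table.
binder_labels = [ic50 < 1000.0 for ic50 in measured_ic50]
auc = roc_auc(predicted, binder_labels)
```

This pairwise form is quadratic in the number of peptides, which is acceptable for datasets of a few thousand measurements; a sort-based implementation would be preferable for larger sets.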
Cross validation prediction performances of all methods on complete and similarity reduced datasets measured with AUC.
| Allelic variant | ARB | SMM-align | PROPRED | combinatorial library | NN-align | Consensus | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HLA-DPA1*0103-DPB1*0201 | 0.823 | 0.745 | 0.921 | 0.767 | 0.840 | 0.724 | 0.793 | 0.932 | 0.935 | 0.796 | ||||
| HLA-DPA1*01-DPB1*0401 | 0.847 | 0.746 | 0.930 | 0.767 | 0.833 | 0.704 | 0.802 | 0.938 | 0.941 | 0.794 | ||||
| HLA-DPA1*0201-DPB1*0101 | 0.824 | 0.743 | 0.909 | 0.786 | 0.849 | 0.723 | 0.818 | 0.927 | 0.818 | 0.932 | ||||
| HLA-DPA1*0201-DPB1*0501 | 0.859 | 0.709 | 0.923 | 0.728 | 0.867 | 0.729 | 0.942 | 0.781 | 0.946 | 0.782 | ||||
| HLA-DPA1*0301-DPB1*0402 | 0.821 | 0.771 | 0.932 | 0.818 | 0.864 | 0.756 | 0.828 | 0.938 | 0.941 | 0.830 | ||||
| HLA-DQA1*0101-DQB1*0501 | 0.871 | 0.741 | 0.930 | 0.783 | 0.809 | 0.728 | 0.805 | 0.933 | 0.809 | 0.942 | ||||
| HLA-DQA1*0102-DQB1*0602 | 0.777 | 0.708 | 0.838 | 0.734 | 0.765 | 0.752 | 0.762 | 0.851 | 0.778 | 0.859 | ||||
| HLA-DQA1*0301-DQB1*0302 | 0.748 | 0.637 | 0.807 | 0.663 | 0.698 | 0.616 | 0.823 | 0.690 | 0.837 | 0.692 | ||||
| HLA-DQA1*0401-DQB1*0402 | 0.845 | 0.643 | 0.896 | 0.761 | 0.681 | 0.637 | 0.742 | 0.908 | 0.749 | 0.916 | ||||
| HLA-DQA1*0501-DQB1*0201 | 0.855 | 0.700 | 0.901 | 0.736 | 0.586 | 0.620 | 0.777 | 0.917 | 0.774 | 0.923 | ||||
| HLA-DQA1*0501-DQB1*0301 | 0.844 | 0.756 | 0.910 | 0.801 | 0.802 | 0.745 | 0.811 | 0.917 | 0.814 | 0.919 | ||||
| HLA-DRB1*0101 | 0.770 | 0.710 | 0.798 | 0.756 | 0.720 | 0.692 | 0.739 | 0.697 | 0.763 | 0.810 | 0.759 | 0.820 | ||
| HLA-DRB1*0301 | 0.753 | 0.728 | 0.852 | 0.808 | 0.699 | 0.669 | 0.829 | 0.862 | 0.823 | 0.873 | ||||
| HLA-DRB1*0401 | 0.731 | 0.668 | 0.781 | 0.721 | 0.737 | 0.711 | 0.734 | 0.799 | 0.735 | 0.804 | ||||
| HLA-DRB1*0404 | 0.707 | 0.681 | 0.816 | 0.789 | 0.769 | 0.753 | 0.823 | 0.803 | 0.826 | 0.800 | ||||
| HLA-DRB1*0405 | 0.771 | 0.716 | 0.822 | 0.767 | 0.767 | 0.742 | 0.794 | 0.847 | 0.851 | |||||
| HLA-DRB1*0701 | 0.767 | 0.736 | 0.834 | 0.796 | 0.773 | 0.750 | 0.762 | 0.729 | 0.851 | 0.806 | 0.858 | 0.808 | ||
| HLA-DRB1*0802 | 0.702 | 0.649 | 0.741 | 0.689 | 0.647 | 0.641 | 0.698 | 0.772 | 0.708 | 0.778 | ||||
| HLA-DRB1*0901 | 0.747 | 0.654 | 0.765 | 0.696 | 0.572 | 0.553 | 0.713 | 0.801 | 0.796 | |||||
| HLA-DRB1*1101 | 0.800 | 0.777 | 0.864 | 0.829 | 0.804 | 0.779 | 0.847 | 0.880 | 0.850 | 0.885 | ||||
| HLA-DRB1*1302 | 0.727 | 0.667 | 0.797 | 0.754 | 0.600 | 0.577 | 0.732 | 0.796 | 0.742 | 0.811 | ||||
| HLA-DRB1*1501 | 0.763 | 0.696 | 0.796 | 0.741 | 0.743 | 0.703 | 0.756 | 0.820 | 0.756 | 0.827 | ||||
| HLA-DRB3*0101 | 0.709 | 0.678 | 0.819 | 0.780 | 0.655 | 0.655 | 0.798 | 0.834 | 0.787 | 0.844 | ||||
| HLA-DRB4*0101 | 0.785 | 0.747 | 0.816 | 0.762 | 0.697 | 0.691 | 0.789 | 0.844 | 0.846 | 0.784 | ||||
| HLA-DRB5*0101 | 0.760 | 0.697 | 0.832 | 0.776 | 0.728 | 0.711 | 0.795 | 0.848 | 0.786 | 0.851 | ||||
| H-2-IAb | 0.800 | 0.775 | 0.855 | 0.830 | 0.858 | 0.853 | 0.846 | |||||||
| Average | 0.785 | 0.711 | 0.850 | 0.763 | 0.726 | 0.703 | 0.751 | 0.691 | 0.782 | 0.864 | 0.783 | 0.871 | ||
| Min | 0.702 | 0.637 | 0.741 | 0.663 | 0.600 | 0.577 | 0.572 | 0.553 | 0.796 | 0.693 | 0.772 | 0.690 | 0.778 | 0.692 |
| Max | 0.871 | 0.777 | 0.932 | 0.830 | 0.804 | 0.779 | 0.867 | 0.756 | 0.956 | 0.847 | 0.942 | 0.850 | 0.946 | 0.854 |
1. SR stands for similarity reduced.
2. The Consensus-best3 method is based on NN-align, SMM-align and combinatorial peptide library. PROPRED was used for allelic variants when combinatorial peptide library was not available.
Best prediction performance for each allelic variant was highlighted. The best performing method for "ALL" dataset was highlighted with underline while the best performing method for "SR" dataset was highlighted in bold.
Figure 1. A Venn diagram illustrating the relationship among the "ALL", "SR" and "SP" datasets. The simulated dataset illustrates the superset relationships among the "ALL", "SR" and "SP" sets. The "ALL" dataset contains all three peptides. The "SR" dataset contains two peptides, with one of the two similar peptides removed, and the "SP" dataset contains only the single peptide that shares no similarity with any other peptide.
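The superset relationship in the caption can be sketched in a few lines. The similarity criterion used here (two peptides sharing an identical 9-mer) is an assumption made for illustration only, since this section does not state the criterion the authors applied; the function names and the three peptide sequences are likewise hypothetical.

```python
from typing import Callable, List


def share_kmer(a: str, b: str, k: int = 9) -> bool:
    """Hypothetical similarity test: do two peptides share an identical k-mer?"""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(b[i:i + k] in kmers_a for i in range(len(b) - k + 1))


def similarity_reduced(peptides: List[str],
                       similar: Callable[[str, str], bool] = share_kmer) -> List[str]:
    """SR set: greedily keep a peptide only if it is not similar to one already kept."""
    kept: List[str] = []
    for p in peptides:
        if not any(similar(p, q) for q in kept):
            kept.append(p)
    return kept


def singular(peptides: List[str],
             similar: Callable[[str, str], bool] = share_kmer) -> List[str]:
    """SP set: peptides that are not similar to any other peptide in the full set."""
    return [p for i, p in enumerate(peptides)
            if not any(similar(p, q) for j, q in enumerate(peptides) if j != i)]


# Three made-up 15-mers: the first two overlap in a 9-mer, the third is unrelated.
all_set = ["AAKLMNPQRSTVWYA", "GAKLMNPQRSTVWYG", "DEFGHIKDEFGHIKD"]
sr_set = similarity_reduced(all_set)
sp_set = singular(all_set)
```

With these inputs the "ALL" set has three peptides, the "SR" set keeps two, and the "SP" set keeps one, mirroring the simulated dataset in the figure.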
Prediction performance on singular peptide set (SP) using training sets with and without homologs.
| Allelic variant | SR | ALL | AUC reduction¹ | # peptide reduction² | % peptide reduction³ |
|---|---|---|---|---|---|
| HLA-DPA1*0103-DPB1*0201 | 0.787 | 0.797 | 0.010 | 801 | 0.571 |
| HLA-DPA1*01-DPB1*0401 | 0.809 | 0.801 | -0.008 | 797 | 0.596 |
| HLA-DPA1*0201-DPB1*0101 | 0.764 | 0.735 | -0.029 | 795 | 0.568 |
| HLA-DPA1*0201-DPB1*0501 | 0.587 | 0.640 | 0.053 | 824 | 0.584 |
| HLA-DPA1*0301-DPB1*0402 | 0.744 | 0.772 | 0.028 | 805 | 0.572 |
| HLA-DQA1*0101-DQB1*0501 | 0.850 | 0.821 | -0.029 | 1155 | 0.664 |
| HLA-DQA1*0102-DQB1*0602 | 0.667 | 0.719 | 0.052 | 1036 | 0.636 |
| HLA-DQA1*0301-DQB1*0302 | 0.569 | 0.756 | 0.187 | 1123 | 0.653 |
| HLA-DQA1*0401-DQB1*0402 | 0.632 | 0.551 | -0.081 | 1116 | 0.656 |
| HLA-DQA1*0501-DQB1*0201 | 0.587 | 0.652 | 0.065 | 1069 | 0.645 |
| HLA-DQA1*0501-DQB1*0301 | 0.764 | 0.766 | 0.002 | 1087 | 0.644 |
| HLA-DRB1*0101 | 0.777 | 0.781 | 0.004 | 2923 | 0.455 |
| HLA-DRB1*0301 | 0.782 | 0.786 | 0.004 | 579 | 0.338 |
| HLA-DRB1*0401 | 0.682 | 0.709 | 0.027 | 548 | 0.310 |
| HLA-DRB1*0404 | 0.805 | 0.818 | 0.013 | 103 | 0.179 |
| HLA-DRB1*0405 | 0.765 | 0.748 | -0.017 | 533 | 0.337 |
| HLA-DRB1*0701 | 0.793 | 0.810 | 0.017 | 570 | 0.327 |
| HLA-DRB1*0802 | 0.672 | 0.622 | -0.050 | 503 | 0.331 |
| HLA-DRB1*0901 | 0.669 | 0.651 | -0.018 | 478 | 0.314 |
| HLA-DRB1*1101 | 0.809 | 0.799 | -0.010 | 590 | 0.329 |
| HLA-DRB1*1302 | 0.712 | 0.733 | 0.021 | 510 | 0.323 |
| HLA-DRB1*1501 | 0.712 | 0.719 | 0.007 | 598 | 0.338 |
| HLA-DRB3*0101 | 0.829 | 0.838 | 0.009 | 514 | 0.342 |
| HLA-DRB4*0101 | 0.762 | 0.745 | -0.017 | 510 | 0.335 |
| HLA-DRB5*0101 | 0.774 | 0.798 | 0.024 | 571 | 0.323 |
| H-2-IAb | 0.816 | 0.833 | 0.017 | 114 | 0.173 |
| Average | 0.737 | 0.748 | 0.011 | 779 | 0.444 |
The "ALL" column indicates the 5-fold cross validation performance on this subset when training with the entire dataset. The "SR" column indicates the 5-fold cross validation performance on this subset when training with the sequence similarity reduced dataset.
1. AUC reduction = AUC all - AUC SR
2. # peptide reduction = # peptide all - # peptide SR
3. % peptide reduction = (# peptide all - # peptide SR)/# peptide all
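The three footnote formulas can be checked against a table row with a short sketch. The class and function names are hypothetical. The HLA-DRB1*0404 inputs come from the tables above (AUC of 0.818 for "ALL" and 0.805 for "SR", 577 measurements in the full dataset); the "SR" peptide count of 474 is not listed in the source and is derived here as 577 minus the reported reduction of 103.

```python
from dataclasses import dataclass


@dataclass
class ReductionSummary:
    auc_reduction: float               # AUC all - AUC SR
    peptide_reduction: int             # number of peptides all - number of peptides SR
    peptide_reduction_fraction: float  # peptide_reduction / number of peptides all


def summarize(auc_all: float, auc_sr: float, n_all: int, n_sr: int) -> ReductionSummary:
    """Apply footnotes 1 to 3, rounding to the three decimals used in the table."""
    if n_all <= 0:
        raise ValueError("n_all must be positive")
    removed = n_all - n_sr
    return ReductionSummary(
        auc_reduction=round(auc_all - auc_sr, 3),
        peptide_reduction=removed,
        peptide_reduction_fraction=round(removed / n_all, 3),
    )


# HLA-DRB1*0404; n_sr = 474 is derived (577 - 103), not reported directly.
row = summarize(auc_all=0.818, auc_sr=0.805, n_all=577, n_sr=474)
```

The "% peptide reduction" column is expressed as a fraction between 0 and 1 rather than as a percentage, which is why the sketch does not multiply by 100.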