Cem Meydan, Hasan H Otu, Osman Uğur Sezerman.
Abstract
BACKGROUND: The MHC (Major Histocompatibility Complex) is a key player in the immune response of most vertebrates. Computationally predicting whether a given antigenic peptide will bind to a specific MHC allele is important for developing vaccines against emerging pathogens, for creating ways to control the immune response, and for immunotherapy applications. One problem that makes this prediction difficult is detecting the binding core region of a peptide, which is complicated by bulges and loops that cause the total sequence length to vary. Most machine learning methods require sequences of equal length to discover binding motifs, and so ignore length variance in both the motif mining and the prediction steps. To overcome this limitation, we propose using time-based motif mining methods that work independently of position.
Year: 2013 PMID: 23368521 PMCID: PMC3549809 DOI: 10.1186/1471-2105-14-S2-S13
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. An example of the temporal rule mining process. (a) Schematic representation of the sliding-window approach on a sample set of sequences binding to the MHC class I allele HLA-A*0201. The windows are shifted with the given s and w values until all sequences are covered, and the resulting rules are then filtered by their support. This process is repeated for all s values from -8 to -1 and from 1 to 8. (b) Representation of the L→V rule captured with the parameters s = 6 and w = 3. For HLA-A*0201, leucine at position 2 and valine around position 9 form a well-known binding motif [40]. Although this motif is present in 6 of the 7 sequences (support of 0.86), it is unlikely to be captured by position-specific methods because of length variance and positional shifts.
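The support calculation described in Figure 1 can be sketched in Python. This is a minimal illustration, not the authors' implementation. It assumes that a rule a→b with shift s and window width w holds for a sequence when some occurrence of residue a at position i has residue b somewhere in positions i+s to i+s+w-1 (0-based, clipped to the sequence), and that support is the fraction of sequences in which the rule holds. The helper names (`rule_holds`, `support`, `mine_pair_rules`) and the sample peptides are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Shift values from the caption: -8..-1 and 1..8 (0 is excluded).
SHIFTS = [s for s in range(-8, 9) if s != 0]


def rule_holds(seq: str, a: str, b: str, s: int, w: int) -> bool:
    """Return True if some residue `a` at position i has residue `b` in the
    window i+s .. i+s+w-1 (0-based, clipped to the sequence).

    ASSUMPTION: this window definition is one reading of the s/w parameters.
    """
    n = len(seq)
    for i, residue in enumerate(seq):
        if residue != a:
            continue
        start = max(0, i + s)
        stop = min(n, i + s + w)
        # range() is empty when the window lies outside the sequence;
        # position i itself is never matched against itself.
        if any(seq[j] == b for j in range(start, stop) if j != i):
            return True
    return False


def support(seqs: list[str], a: str, b: str, s: int, w: int) -> float:
    """Fraction of sequences in which the rule a -> b (shift s, width w) holds."""
    return sum(rule_holds(seq, a, b, s, w) for seq in seqs) / len(seqs)


def mine_pair_rules(seqs: list[str], w: int, min_support: float):
    """Scan every residue pair over all shifts and keep the rules whose
    support reaches min_support. Returns (a, b, s, support) tuples."""
    rules = []
    for s in SHIFTS:
        for a, b in product(AMINO_ACIDS, repeat=2):
            sup = support(seqs, a, b, s, w)
            if sup >= min_support:
                rules.append((a, b, s, sup))
    return rules


# Hypothetical sample of peptides of mixed length (9-mers and one 10-mer).
peptides = ["LLFGYPVYV", "NLVPMVATV", "ILKEPVHGV",
            "KLVALGINAV", "SLYNTVATL", "GILGFVFTL"]

print(support(peptides, "L", "V", s=6, w=3))
for rule in mine_pair_rules(peptides, w=3, min_support=0.6):
    print(rule)
```

With s = 6 and w = 3, the L→V rule holds in four of the six sample peptides (support of about 0.67), including the 10-mer KLVALGINAV, where the valine sits at position 10 rather than 9. A position-specific scoring scheme would count those two positions separately, whereas the windowed rule treats them as the same motif.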
Figure 2. Pseudocode of the temporal apriori motif mining algorithm.
Figure 3. Overall flow of the recursive rule mining step.
Figure 4. Overview of the experimentation process.
Results of MHC-PPM for class I predictions on the Peters dataset [22].
| Allele | # Peptides (9-mers) | # Peptides (10-mers) | ANN | ARB | SMM | MHC-PPM |
|---|---|---|---|---|---|---|
| 9-mers + 10-mers | | | | | | |
| HLA-A*0201 | 3089 | 1316 | - | 0.919 | 0.931 | |
| HLA-A*0202 | 1447 | 1056 | - | 0.851 | 0.871 | ||
| HLA-A*0203 | 1443 | 1055 | - | 0.838 | 0.878 | ||
| HLA-A*0301 | 2094 | 1082 | - | 0.883 | 0.911 | ||
| HLA-A*0206 | 1437 | 1054 | - | 0.849 | 0.885 | ||
| HLA-A*1101 | 1985 | 1093 | - | 0.897 | 0.932 | ||
| HLA-A*2402 | 197 | 78 | - | 0.722 | 0.809 | ||
| HLA-A*3101 | 1869 | 1057 | - | 0.881 | 0.878 | ||
| HLA-A*3301 | 1140 | 1055 | - | 0.866 | 0.863 | ||
| HLA-A*6801 | 1141 | 1055 | - | 0.827 | 0.864 | ||
| HLA-B*0702 | 1262 | 205 | - | 0.925 | 0.952 | ||
| HLA-B*3501 | 736 | 177 | - | 0.833 | 0.866 | ||
| HLA-B*5101 | 244 | 177 | - | 0.782 | 0.875 | ||
| HLA-B*5301 | 254 | 177 | - | 0.758 | 0.847 | ||
| 9-mers | | | | | | |
| HLA-A*0101 | 1157 | - | 0.964 | 0.980 | 0.963 | |
| HLA-A*2601 | 672 | - | 0.907 | 0.931 | 0.901 | ||
| HLA-A*2902 | 160 | - | 0.755 | 0.911 | 0.907 | ||
| HLA-A*6802 | 1434 | - | 0.865 | 0.898 | 0.867 | ||
| HLA-B*0801 | 708 | - | 0.936 | 0.943 | 0.926 | ||
| HLA-B*1501 | 978 | - | 0.941 | 0.900 | 0.922 | ||
| HLA-B*1801 | 118 | - | 0.838 | 0.573 | 0.853 | ||
| HLA-B*2705 | 969 | - | 0.938 | 0.915 | 0.938 | ||
| HLA-B*4002 | 118 | - | 0.754 | 0.541 | 0.842 | ||
| HLA-B*4402 | 119 | - | 0.778 | 0.533 | 0.740 | ||
| HLA-B*4403 | 119 | - | 0.763 | 0.461 | 0.770 | ||
| HLA-B*5401 | 255 | - | 0.903 | 0.847 | 0.883 | ||
| HLA-B*5701 | 59 | - | 0.826 | 0.428 | 0.871 | ||
| HLA-B*5801 | 988 | - | 0.961 | 0.889 | 0.944 | ||
| 0.888 | 0.798 | 0.893 | |||||
| 0.888 | 0.751 | 0.894 | |||||
| 0.872 | 0.910 | 0.901 | |||||
The best-performing method for each allele is underlined. For ARB and SMM, the AUC values given are averages of the 9-mer and 10-mer AUC values, weighted by the peptide counts given for each allele. The alleles in the bottom part of the table were trained and tested on 9-mers only, so their results are directly comparable.
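For the alleles in the upper part of the table, the pooled ARB and SMM values can be written as a peptide-count-weighted mean. Here $n_9$ and $n_{10}$ are the 9-mer and 10-mer peptide counts and $\mathrm{AUC}_9$, $\mathrm{AUC}_{10}$ the per-length AUCs; this notation is introduced only for illustration:

$$
\mathrm{AUC}_{9+10} = \frac{n_9\,\mathrm{AUC}_9 + n_{10}\,\mathrm{AUC}_{10}}{n_9 + n_{10}}
$$

For HLA-A*0201 ($n_9 = 3089$, $n_{10} = 1316$), the weights are $3089/4405 \approx 0.701$ and $1316/4405 \approx 0.299$, so the pooled value is dominated by the 9-mer result.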
Results of MHC-PPM for class II predictions on the Wang2008 dataset [19].
| Allele | # | RANKPEP | ARB | PROPRED | SMM-align | MHCMIR | MHC-PPM |
|---|---|---|---|---|---|---|---|
| HLA-DRB1*0101 | 3882 | 0.700 | 0.760 | 0.740 | 0.770 | 0.810 | |
| HLA-DRB1*0301 | 502 | 0.670 | 0.660 | 0.650 | 0.690 | 0.640 | |
| HLA-DRB1*0401 | 512 | 0.630 | 0.670 | 0.690 | 0.680 | 0.666 | |
| HLA-DRB1*0404 | 449 | 0.660 | 0.720 | 0.790 | 0.750 | 0.730 | |
| HLA-DRB1*0405 | 457 | 0.620 | 0.670 | 0.690 | 0.730 | 0.734 | |
| HLA-DRB1*0701 | 505 | 0.580 | 0.690 | 0.780 | 0.780 | 0.830 | |
| HLA-DRB1*0802 | 245 | - | 0.740 | 0.770 | 0.750 | 0.740 | |
| HLA-DRB1*0901 | 412 | 0.610 | 0.620 | - | 0.660 | 0.620 | |
| HLA-DRB1*1101 | 520 | 0.700 | 0.730 | 0.800 | 0.810 | 0.810 | |
| HLA-DRB1*1302 | 289 | 0.520 | 0.580 | 0.690 | 0.720 | 0.679 | |
| HLA-DRB1*1501 | 520 | 0.620 | 0.700 | 0.720 | 0.740 | 0.730 | |
| HLA-DRB3*0101 | 420 | - | 0.590 | - | 0.680 | - | |
| HLA-DRB4*0101 | 245 | 0.650 | 0.740 | - | 0.710 | 0.760 | |
| HLA-DRB5*0101 | 520 | 0.730 | 0.700 | 0.790 | 0.750 | 0.710 | |
| H-2 IAb | 500 | 0.740 | - | 0.750 | 0.690 | 0.786 | |
| H-2 IEd | 39 | 0.830 | - | - | - | - | |
| Average | 0.661 | 0.705 | 0.733 | 0.727 | 0.732 | ||
| Weighted Avg | 0.671 | 0.722 | 0.738 | 0.743 | 0.760 | ||
The # column gives the total number of peptides for each allele. The best-performing method for each allele is underlined.
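The Average and Weighted Avg rows can be checked against the per-allele values. This is a minimal sketch that assumes Weighted Avg weights each allele by its peptide count (#) and that alleles marked "-" are left out of both averages. Applied to the RANKPEP column, these assumptions reproduce the reported 0.661 and 0.671. The names `average_auc` and `weighted_average_auc` are hypothetical.

```python
# (allele, number of peptides, RANKPEP AUC) transcribed from the Wang2008 table;
# None stands for a "-" entry (no prediction reported for that allele).
RANKPEP = [
    ("HLA-DRB1*0101", 3882, 0.700),
    ("HLA-DRB1*0301", 502, 0.670),
    ("HLA-DRB1*0401", 512, 0.630),
    ("HLA-DRB1*0404", 449, 0.660),
    ("HLA-DRB1*0405", 457, 0.620),
    ("HLA-DRB1*0701", 505, 0.580),
    ("HLA-DRB1*0802", 245, None),
    ("HLA-DRB1*0901", 412, 0.610),
    ("HLA-DRB1*1101", 520, 0.700),
    ("HLA-DRB1*1302", 289, 0.520),
    ("HLA-DRB1*1501", 520, 0.620),
    ("HLA-DRB3*0101", 420, None),
    ("HLA-DRB4*0101", 245, 0.650),
    ("HLA-DRB5*0101", 520, 0.730),
    ("H-2 IAb", 500, 0.740),
    ("H-2 IEd", 39, 0.830),
]


def average_auc(rows):
    """Unweighted mean AUC over the alleles that have a value."""
    values = [auc for _, _, auc in rows if auc is not None]
    return sum(values) / len(values)


def weighted_average_auc(rows):
    """Mean AUC weighted by peptide count, over the alleles that have a value."""
    pairs = [(n, auc) for _, n, auc in rows if auc is not None]
    return sum(n * auc for n, auc in pairs) / sum(n for n, _ in pairs)


print(f"Average:      {average_auc(RANKPEP):.3f}")
print(f"Weighted Avg: {weighted_average_auc(RANKPEP):.3f}")
```

Because HLA-DRB1*0101 contributes 3882 of the 9352 weighted peptides, the weighted average is pulled toward that allele's AUC, which explains why the two summary rows differ.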
Results of MHC-PPM for MHC class II predictions on the Wang2010 dataset [23].
| Allele | # Peptides (ALL) | # Peptides (SR) | ARB (ALL) | ARB (SR) | SMM-align (ALL) | SMM-align (SR) | NN-align (ALL) | NN-align (SR) | MHC-PPM (ALL) | MHC-PPM (SR) |
|---|---|---|---|---|---|---|---|---|---|---|
| HLA-DPA1*0103-DPB1*0201 | 1404 | 603 | 0.823 | 0.745 | 0.921 | 0.767 | 0.931 | 0.772 | ||
| HLA-DPA1*01-DPB1*0401 | 1337 | 540 | 0.847 | 0.746 | 0.930 | 0.767 | 0.935 | 0.751 | ||
| HLA-DPA1*0201-DPB1*0101 | 1399 | 604 | 0.824 | 0.743 | 0.909 | 0.786 | 0.938 | 0.806 | ||
| HLA-DPA1*0201-DPB1*0501 | 1410 | 586 | 0.859 | 0.709 | 0.923 | 0.728 | 0.948 | 0.773 | ||
| HLA-DPA1*0301-DPB1*0402 | 1407 | 602 | 0.821 | 0.771 | 0.932 | 0.818 | 0.935 | 0.815 | ||
| HLA-DQA1*0101-DQB1*0501 | 1739 | 584 | 0.871 | 0.741 | 0.930 | 0.783 | 0.945 | 0.805 | ||
| HLA-DQA1*0102-DQB1*0602 | 1629 | 593 | 0.777 | 0.708 | 0.838 | 0.734 | 0.842 | 0.730 | ||
| HLA-DQA1*0301-DQB1*0302 | 1719 | 596 | 0.748 | 0.637 | 0.807 | 0.663 | 0.693 | 0.845 | ||
| HLA-DQA1*0401-DQB1*0402 | 1701 | 585 | 0.845 | 0.643 | 0.896 | 0.761 | 0.742 | 0.920 | ||
| HLA-DQA1*0501-DQB1*0201 | 1658 | 589 | 0.855 | 0.700 | 0.901 | 0.736 | 0.919 | 0.766 | ||
| HLA-DQA1*0501-DQB1*0301 | 1689 | 602 | 0.844 | 0.756 | 0.910 | 0.801 | 0.915 | 0.771 | ||
| HLA-DRB1*0101 | 6427 | 3504 | 0.770 | 0.710 | 0.798 | 0.756 | 0.821 | 0.758 | ||
| HLA-DRB1*0301 | 1715 | 1136 | 0.753 | 0.728 | 0.852 | 0.808 | 0.828 | 0.747 | ||
| HLA-DRB1*0401 | 1769 | 1221 | 0.731 | 0.668 | 0.781 | 0.721 | 0.763 | 0.711 | ||
| HLA-DRB1*0404 | 577 | 474 | 0.707 | 0.681 | 0.816 | 0.789 | 0.823 | 0.803 | ||
| HLA-DRB1*0405 | 1582 | 1049 | 0.771 | 0.716 | 0.822 | 0.767 | 0.831 | 0.734 | ||
| HLA-DRB1*0701 | 1745 | 1175 | 0.767 | 0.736 | 0.834 | 0.796 | 0.846 | 0.804 | ||
| HLA-DRB1*0802 | 1520 | 1017 | 0.702 | 0.649 | 0.741 | 0.689 | 0.752 | 0.687 | ||
| HLA-DRB1*0901 | 1520 | 1042 | 0.747 | 0.654 | 0.765 | 0.696 | 0.762 | 0.671 | ||
| HLA-DRB1*1101 | 1794 | 1204 | 0.800 | 0.777 | 0.864 | 0.829 | 0.858 | 0.811 | ||
| HLA-DRB1*1302 | 1580 | 1070 | 0.727 | 0.667 | 0.797 | 0.732 | 0.768 | 0.717 | ||
| HLA-DRB1*1501 | 1769 | 1171 | 0.763 | 0.696 | 0.796 | 0.741 | 0.813 | 0.745 | ||
| HLA-DRB3*0101 | 1501 | 987 | 0.709 | 0.678 | 0.819 | 0.780 | 0.782 | 0.718 | ||
| HLA-DRB4*0101 | 1521 | 1011 | 0.785 | 0.747 | 0.816 | 0.762 | 0.860 | 0.772 | ||
| HLA-DRB5*0101 | 1769 | 1198 | 0.760 | 0.697 | 0.832 | 0.776 | 0.795 | 0.843 | ||
| H-2-Iab | 660 | 546 | 0.800 | 0.775 | 0.855 | 0.830 | 0.824 | 0.807 | ||
| Average | 0.785 | 0.711 | 0.849 | 0.763 | 0.858 | 0.755 | ||||
| Weighted Average | 0.784 | 0.709 | 0.843 | 0.762 | 0.853 | 0.754 | ||||
Results are given for each method on the full peptide set (ALL) and on the similarity-reduced set (SR). The best-performing method for each allele is shown in bold for the ALL set and underlined for the SR set.
Effect of peptide flanking residues on binding affinity to the HLA-DRB1*1501 allele.
| Sequence | Experimental IC50 (nM) | ARB IC50 (nM) | NetMHCIIpan IC50 (nM) | SMM_align IC50 (nM) | NN_align IC50 (nM) | MHC-PPM IC50 (nM) |
|---|---|---|---|---|---|---|
| ENPVVHFFKNIVTPR | 33 | 21.9 | 10 | 21 | 8 | 11 |
| VVHFFKNIVHAAA | 33 | 21.9 | 9.2 | 52 | 10.7 | 139 |
| VVHFFKNIVTAAA | 45 | 21.9 | 9.5 | 20 | 11.5 | 224 |
| VVHFFKNIVT | 35 | 21.9 | 8.1 | 20 | 10.5 | 142 |
| VVHFFKNIVTA | 4 | 21.9 | 8.1 | 20 | 9.8 | 83 |
| VVHFFKNIVTAA | 5 | 21.9 | 8.9 | 20 | 11.1 | 263 |
| 326 | 82.5 | 23.6 | 25 | 23.8 | 316 | |
| A | 454 | 82.5 | 23.8 | 25 | 23.9 | 320 |
| AA | 264 | 1286.7 | 45 | 30 | 74.4 | 392 |
| RMSE | | 371.90 | 190.78 | 191.84 | 187.13 | |
| Pearson's corr. | | 0.349 | | 0.041 | 0.540 | 0.721 |
Experimental affinity measurements are from [11]. Predictions for the other methods were obtained from the IEDB website [21]. MHC-PPM has the lowest root mean squared error (RMSE), and its Pearson correlation is approximately equal to that of the top-performing method.
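The RMSE and Pearson correlation rows can be recomputed from the IC50 columns of the table above. This is a minimal sketch that assumes both metrics are computed on the raw nM values (not log-transformed). That assumption is inferred from the fact that it reproduces the reported figures, for example 371.90 and 0.349 for ARB. On the same data, the MHC-PPM column gives an RMSE of about 134.35, the lowest of the five methods, and NetMHCIIpan gives a correlation of about 0.728, slightly above MHC-PPM's 0.721. The function names are hypothetical.

```python
from math import sqrt

# Experimental IC50 values (nM) and per-method predictions, in table row order.
EXPERIMENTAL = [33, 33, 45, 35, 4, 5, 326, 454, 264]
PREDICTIONS = {
    "ARB": [21.9, 21.9, 21.9, 21.9, 21.9, 21.9, 82.5, 82.5, 1286.7],
    "NetMHCIIpan": [10, 9.2, 9.5, 8.1, 8.1, 8.9, 23.6, 23.8, 45],
    "SMM_align": [21, 52, 20, 20, 20, 20, 25, 25, 30],
    "NN_align": [8, 10.7, 11.5, 10.5, 9.8, 11.1, 23.8, 23.9, 74.4],
    "MHC-PPM": [11, 139, 224, 142, 83, 263, 316, 320, 392],
}


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


for name, pred in PREDICTIONS.items():
    print(f"{name:12s} RMSE={rmse(EXPERIMENTAL, pred):8.2f}  r={pearson(EXPERIMENTAL, pred):.3f}")
```

RMSE on raw IC50 values is dominated by the weak binders (hundreds of nM), so it mainly rewards getting the large values right, while Pearson's correlation rewards getting the overall trend right. This is why the two metrics can rank the methods differently.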