Kasper Winther Jørgensen, Søren Buus, Morten Nielsen.
Abstract
Major histocompatibility complex class II (MHC-II) molecules sample peptides from the extracellular space, allowing the immune system to detect the presence of foreign microbes from this compartment. Prediction of MHC class II ligands is complicated by the open binding cleft of the MHC class II molecule, which allows bound peptides to extend out of the binding groove. Furthermore, only a few HLA-DR alleles have been characterized with a sufficient number of peptides (100-200 peptides per allele) to derive an accurate description of their binding motif. Little work has been performed characterizing structural properties of MHC class II ligands. Here, we perform one such large-scale analysis. A large set of SYFPEITHI MHC class II ligands covering more than 20 different HLA-DR molecules was analyzed in terms of their secondary structure and surface exposure characteristics in the context of the native structure of the corresponding source protein. We demonstrated that MHC class II ligands are significantly more exposed and have significantly more coil content than other peptides in the same protein with similar predicted binding affinity. We next exploited this observation to derive an improved prediction method for MHC class II ligands by integrating prediction of MHC-peptide binding with prediction of surface exposure and protein secondary structure. This combined prediction method was shown to significantly outperform the state-of-the-art MHC class II peptide binding prediction method when used to identify MHC class II ligands. We also tried to integrate N- and O-glycosylation in our prediction methods, but this additional information was found not to improve prediction performance. In summary, these findings strongly suggest that local structural properties influence antigen processing and/or the accessibility of peptides to the MHC class II molecule.
Year: 2010 PMID: 21209859 PMCID: PMC3012731 DOI: 10.1371/journal.pone.0015877
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Mean and standard deviation values for ligands and non-ligands, compared across the different classes.
| Class | Ligand | Non-ligand | P-value |
| α-helix | 0.231±0.276 | 0.285±0.322 | P<0.002 |
| β-strand | 0.289±0.207 | 0.279±0.232 | P<0.396 |
| α+β | 0.520±0.171 | 0.564±0.192 | P<0.0002 |
| Coil | 0.480±0.076 | 0.437±0.192 | P<0.0002 |
| RSA | 0.298±0.076 | 0.273±0.099 | P<0.0013 |
| 1-log50k | 0.404±0.173 | 0.404±0.173 | P<0.895 |
P-values are obtained from a paired t-test. Class indicates the different classes/methods used in the analysis. The first four classes are self-explanatory (i.e. α-helix, β-strand, α+β, and coil). RSA is the relative surface accessibility. All these values are obtained using NetSurfP. 1-log50k is the binding affinity in log-transformed units obtained from NetMHCIIpan.
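The 1-log50k units mentioned in the note can be illustrated with a short sketch. It assumes the convention commonly used by the NetMHC family of tools, score = 1 - log(IC50)/log(50000) with IC50 in nM; this record does not spell out the formula, so the function names, the 50,000 nM ceiling, and the clipping to [0, 1] are assumptions made for illustration.

```python
import math

# Assumed ceiling (in nM) of the log transformation; not stated in this record.
MAX_IC50_NM = 50000.0


def ic50_to_log50k(ic50_nm: float) -> float:
    """Map an IC50 value in nM to a 1-log50k score clipped to [0, 1]."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    score = 1.0 - math.log(ic50_nm) / math.log(MAX_IC50_NM)
    return min(1.0, max(0.0, score))


def log50k_to_ic50(score: float) -> float:
    """Invert the transformation: map a 1-log50k score back to IC50 in nM."""
    return MAX_IC50_NM ** (1.0 - score)


if __name__ == "__main__":
    for ic50 in (1.0, 50.0, 500.0, 50000.0):
        print(f"IC50 = {ic50:>8.1f} nM -> score = {ic50_to_log50k(ic50):.3f}")
```

Under this assumed convention, a higher score means stronger predicted binding, and a score of 0.404 (the mean reported for both groups in the table) would correspond to an IC50 of roughly 630 nM.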
Predictive performance of the model compared to NetMHCIIpan as measured by AUC0.1 and AUC.
| Data set | NetMHCIIpan AUC | NetMHCIIpan AUC0.1 | RSA model (Rescaled01) α | RSA model AUC | RSA model AUC0.1 | P-value |
| Balanced training set | 0.781 | 0.293 | 0.3 | 0.784 | 0.312 | <0.0004 |
| Rest of training set | 0.823 | 0.334 | 0.3 | 0.834 | 0.371 | <10⁻⁷ |
| Test set | 0.796 | 0.318 | 0.3 | 0.792 | 0.329 | <0.02 |
The balanced set was used to identify the optimal weights for RSA and coil combined with rescaled binding affinities (Rescaled01), as defined by Eq. (1). The optimal α-values for each model are given in the table. P-values are from paired t-tests comparing the AUC0.1 of the model to that of the NetMHCIIpan method. "Rest of training set" refers to the remainder of the training set, and "Test set" to the 697 ligands in the test set.
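The two performance measures in the table can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes AUC0.1 is the area under the ROC curve integrated from a false-positive rate of 0 up to 0.1 and divided by 0.1, so that a perfect ranking scores 1. The normalization, the function names, and the toy data are assumptions.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low scores."""
    n_pos = sum(1 for label in labels if label)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Peptides with identical scores are added together as one ROC step.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def partial_auc(points: Sequence[Tuple[float, float]], max_fpr: float = 1.0) -> float:
    """Trapezoidal area under the ROC curve up to max_fpr, divided by max_fpr."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / max_fpr


if __name__ == "__main__":
    # Toy data: 1 marks a ligand, 0 a non-ligand peptide from the same protein.
    toy_scores = [0.9, 0.8, 0.7, 0.6]
    toy_labels = [1, 0, 1, 0]
    curve = roc_points(toy_scores, toy_labels)
    print("AUC    =", partial_auc(curve))
    print("AUC0.1 =", partial_auc(curve, max_fpr=0.1))
```

Restricting the integration to low false-positive rates rewards methods that rank true ligands at the very top of the list, which is presumably why AUC0.1 is the measure used for the significance tests.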
Comparison of glycosylation between ligand and non-ligands.
| Class | Ligand | Non-ligand | P-value |
| 1-log50k | 0.404±0.173 | 0.404±0.173 | <0.895 |
| N-glyc | 20 | 40 | <0.015 |
| O-glyc | 7 | 10 | <0.63 |
All 459 ligands with corresponding non-ligands were analyzed with respect to N- and O-glycosylation. Ligands and non-ligands were defined as described in the text. P-values are based on binomial tests with a hypothesized proportion of 0.5.
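The binomial test described in the note can be sketched with the standard library alone. The sketch assumes a two-sided exact test in which the ligand count is compared against the total of ligand and non-ligand counts under a proportion of 0.5; the function name and the choice of a two-sided test are assumptions, and the counts are taken from the table above.

```python
from math import comb


def binomial_test_half(k: int, n: int) -> float:
    """Two-sided exact binomial test of k successes in n trials against p = 0.5.

    The p-value sums the probabilities of all outcomes that are no more
    likely than the observed one. With p = 0.5 every outcome i has
    probability comb(n, i) / 2**n, so the comparison can use exact integers.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    observed = comb(n, k)
    extreme = sum(comb(n, i) for i in range(n + 1) if comb(n, i) <= observed)
    return extreme / 2 ** n


if __name__ == "__main__":
    # Counts from the table: (ligand, non-ligand) glycosylated peptides.
    for name, ligand, non_ligand in (("N-glyc", 20, 40), ("O-glyc", 7, 10)):
        p_value = binomial_test_half(ligand, ligand + non_ligand)
        print(f"{name}: p = {p_value:.4f}")
```

With these counts the sketch yields p-values below the thresholds listed in the table (<0.015 for N-glycosylation and <0.63 for O-glycosylation), which suggests the reported values came from a test of this kind.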
Distribution of MHC class II ligands across alleles for the two data sets.
| Allele | Training set | Test set | Allele | Training set | Test set |
| DRB1*0101 | 13 | 47 | DRB1*1104 | 7 | 2 |
| DRB1*0102 | 5 | 1 | DRB1*1201 | 8 | 6 |
| DRB1*0301 | 20 | 89 | DRB1*1301 | 14 | 12 |
| DRB1*0401 | 365 | 154 | DRB1*1302 | 14 | 9 |
| DRB1*0402 | 33 | 4 | DRB1*1401 | 3 | 7 |
| DRB1*0403 | - | 1 | DRB1*1501 | 2 | 21 |
| DRB1*0404 | 43 | 4 | DRB1*1502 | - | 3 |
| DRB1*0405 | 26 | 10 | DRB1*1601 | - | 2 |
| DRB1*0701 | 23 | 27 | DRB3*0101 | - | 3 |
| DRB1*0801 | 33 | 7 | DRB3*0202 | 3 | - |
| DRB1*0802 | - | 1 | DRB3*0301 | 3 | 2 |
| DRB1*0803 | - | 1 | DRB4*0101 | 1 | 5 |
| DRB1*0901 | 4 | 2 | DRB4*0103 | - | 2 |
| DRB1*1001 | 1 | 241 | DRB5*0101 | 7 | 14 |
| DRB1*1101 | 16 | 20 | Total | 644 | 697 |