Clemente Aguilar-Bonavides1, Reinaldo Sanchez-Arias2, Cristina Lanzas1,3.
Abstract
BACKGROUND: The major histocompatibility complex (MHC) is responsible for presenting antigens (epitopes) on the surface of antigen-presenting cells (APCs). When pathogen-derived epitopes are presented by MHC class II on an APC surface, T cells may be able to trigger a specific immune response. Prediction of MHC-II epitopes is particularly challenging because the open binding cleft of the MHC-II molecule allows epitopes to bind beyond the peptide-binding groove; the molecule can therefore accommodate peptides of variable length. Among the methods proposed to predict MHC-II epitopes, artificial neural networks (ANNs) and support vector machines (SVMs) are the most effective. We propose a novel classification algorithm for predicting MHC-II epitopes, called sparse representation via ℓ1-minimization.
Keywords: Classification algorithms; Epitope prediction; Immunoinformatics; MHC class II; Machine learning; Peptide binding; Sparse representation
Year: 2014 PMID: 25392716 PMCID: PMC4225598 DOI: 10.1186/1756-0381-7-23
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
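The abstract names sparse representation via ℓ1-minimization as the classifier but this record does not give its formulation. The following is a minimal sketch of the generic sparse-representation classification idea: express a query vector as a sparse combination of training vectors, then assign the class whose training vectors reconstruct it with the smallest residual. It is an assumption-laden illustration, not the authors' implementation: the ℓ1 problem is replaced by its penalized (lasso) form solved with iterative soft-thresholding (ISTA), and the function names, `lam`, and `iters` are hypothetical.

```python
import math

def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t*|v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def l1_coefficients(columns, y, lam=0.01, iters=2000):
    """Approximate argmin_x 0.5*||A x - y||^2 + lam*||x||_1 with ISTA.

    `columns` holds the columns of A, one encoded training peptide each.
    The penalized form stands in for the equality-constrained l1 problem.
    """
    n = len(columns)
    # Step size 1/L; the squared Frobenius norm is a safe upper bound on L.
    lip = sum(a * a for col in columns for a in col) or 1.0
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = A x - y, computed once per iteration.
        r = [sum(columns[j][i] * x[j] for j in range(n)) - y[i]
             for i in range(len(y))]
        # Gradient step on the quadratic term, then soft-thresholding.
        for j in range(n):
            grad_j = sum(columns[j][i] * r[i] for i in range(len(y)))
            x[j] = soft_threshold(x[j] - grad_j / lip, lam / lip)
    return x

def classify(train_vectors, train_labels, y, lam=0.01):
    """Return the label whose training vectors best reconstruct y."""
    x = l1_coefficients(train_vectors, y, lam)
    best_label, best_res = None, math.inf
    for label in sorted(set(train_labels)):
        # Reconstruct y using only the coefficients of this class.
        recon = [0.0] * len(y)
        for col, lab, coef in zip(train_vectors, train_labels, x):
            if lab == label:
                for i, a in enumerate(col):
                    recon[i] += coef * a
        res = math.sqrt(sum((yi - ri) ** 2 for yi, ri in zip(y, recon)))
        if res < best_res:
            best_label, best_res = label, res
    return best_label
```

In practice the training vectors would be encoded peptides (for example with one of the encodings named in the figure captions), and columns are usually scaled to unit norm before solving; both steps are omitted here for brevity.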
Peptide sequences and their binding affinities
| | Bind | 49 | 93 | 112 | 153 | 233 | 285 | 321 |
| H2-IAb | Non-bind | 485 | 441 | 422 | 381 | 301 | 249 | 213 |
| | Total | 534 | 534 | 534 | 534 | 534 | 534 | 534 |
| | Bind | 34 | 43 | 52 | 56 | 68 | 70 | 79 |
| H2-IAd | Non-bind | 214 | 205 | 196 | 192 | 180 | 178 | 169 |
| | Total | 248 | 248 | 248 | 248 | 248 | 248 | 248 |
| | Bind | 32 | 53 | 67 | 92 | 149 | 171 | 175 |
| HLA-DPA1*0103/DPB1*0201 | Non-bind | 264 | 243 | 229 | 204 | 147 | 125 | 121 |
| | Total | 296 | 296 | 296 | 296 | 296 | 296 | 296 |
| | Bind | 2253 | 2988 | 3326 | 3799 | 4503 | 4783 | 4924 |
| HLA-DRB1*0101 | Non-bind | 3095 | 2360 | 2022 | 1549 | 845 | 565 | 424 |
| | Total | 5348 | 5348 | 5348 | 5348 | 5348 | 5348 | 5348 |
| | Bind | 58 | 87 | 102 | 139 | 196 | 210 | 270 |
| HLA-DRB1*0401 | Non-bind | 252 | 223 | 208 | 171 | 114 | 100 | 40 |
| | Total | 310 | 310 | 310 | 310 | 310 | 310 | 310 |
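The table above reports, for each allele, how many peptides count as binders and non-binders in each of seven columns, with the total held constant. A plausible reading is that each column corresponds to a different affinity cutoff, but the cutoff values and the affinity scale are not present in this record. The sketch below assumes IC50-style affinities where a lower value means stronger binding and a peptide is a binder when its affinity is at or below the cutoff; the function name, the rule, and the example numbers are all hypothetical.

```python
def binder_counts(affinities, cutoffs):
    """Count binders and non-binders at each cutoff.

    Assumption: a peptide binds when affinity <= cutoff (IC50-style scale).
    """
    rows = []
    for cutoff in cutoffs:
        bind = sum(1 for a in affinities if a <= cutoff)
        rows.append({
            "cutoff": cutoff,
            "bind": bind,
            "non_bind": len(affinities) - bind,
            "total": len(affinities),
        })
    return rows

# Made-up affinities and cutoffs, for illustration only.
for row in binder_counts([10, 80, 300, 900, 4000], [50, 500, 1000]):
    print(row)
```

Under this rule the binder count can only grow as the cutoff is relaxed, which matches the left-to-right increase in the "Bind" rows.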
Figure 1. A 10-fold cross-validation partition example.
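Since the figure itself is not reproduced here, a 10-fold partition can be sketched in a few lines: shuffle the sample indices, deal them into ten folds, and use each fold once as the test set while the other nine form the training set. Whether the original study stratified folds by class is not stated in this record; the version below does not, and the function name and seed are hypothetical.

```python
import random

def k_fold_partition(n_samples, k=10, seed=0):
    """Return k (train_indices, test_indices) pairs covering all samples."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits

# Example with 248 samples, the H2-IAd total in the table above.
splits = k_fold_partition(248)
print([len(test) for _, test in splits])
```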
Average results for 10-fold cross-validation with ℓ1-minimization and SVM
| | | | | | | |
|---|---|---|---|---|---|---|
| | | | | | | |
| Avg. Sn (%) | 89 (6) | 86 (2) | 81 (12) | 97 (6) | 97 (25) | 97 (22) |
| Avg. Sp (%) | 86 (10) | 54 (9) | 82 (10) | 29 (10) | 8 (22) | 9 (21) |
| Avg. Acc. (%) | 88 (4) | 70 (7) | 82 (8) | 85 (8) | 82 (12) | 82 (15) |
| Avg. MCC | 0.7558 | 0.4418 | 0.6441 | 0.3715 | 0.1 | 0 |
| Time in secs | 0.1650 | 0.1368 | 0.1368 | 1.5198 | 1.6831 | 1.9103 |
| | | | | | | |
| Avg. Sn (%) | 78 (4) | 81 (1) | 78 (1) | 83 (7) | 97 (23) | 100 (37) |
| Avg. Sp (%) | 76 (10) | 76 (2) | 77 (6) | 72 (12) | 31 (20) | 2 (37) |
| Avg. Acc. (%) | 77 (3) | 78 (6) | 77 (8) | 81 (7) | 86 (10) | 83 (19) |
| Avg. MCC | 0.5559 | 0.5804 | 0.5694 | 0.4738 | 0.3951 | 0 |
| Time in secs | 0.1039 | 0.1268 | 0.1597 | 0.1770 | 0.2045 | 0.1928 |
Numbers in parentheses indicate the standard deviation.
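The table reports sensitivity (Sn), specificity (Sp), accuracy (Acc) and the Matthews correlation coefficient (MCC). As a reminder of how these follow from a confusion matrix, the sketch below uses the standard definitions; the function name is hypothetical, and returning 0 for MCC when its denominator vanishes is a common convention assumed here, not something stated in this record.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Sn, Sp, Acc and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn) if tp + fn else 0.0   # true positive rate
    sp = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is 0 when a row or column of the matrix is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}

print(binary_metrics(tp=40, fn=10, tn=45, fp=5))
```

A classifier that labels almost everything as positive can reach high Sn and reasonable Acc on an imbalanced set while its MCC stays near zero, which is why MCC is worth reading alongside the other three columns.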
Logistic regression p-values with cutoff value, encoding factor and predictive method as predictors
| | Sn | *** | *** | * |
| | Sp | *** | *** | *** |
| | Acc | *** | NS | NS |
| | Sn | NS | NS | *** |
| | Sp | *** | *** | *** |
| | Acc | NS | NS | NS |
| | Sn | *** | *** | *** |
| | Sp | *** | *** | *** |
| | Acc | NS | NS | NS |
| | Sn | *** | *** | ** |
| | Sp | *** | *** | ** |
| | Acc | *** | NS | NS |
| | Sn | *** | *** | ** |
| | Sp | *** | *** | *** |
| | Acc | NS | NS | NS |
95% confidence interval (CI) and P-values: NS (not significant), *P < 0.05, **P < 0.01, ***P < 0.001.
Figure 2. 11-factor encoding. ROCs for sparse representation and SVM.
Figure 3. DPPS encoding. ROCs for sparse representation and SVM.
Figure 4. Binary encoding. ROCs for sparse representation and SVM.
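The ROC figures are not reproduced in this record. For readers who want to rebuild such curves from classifier scores, the sketch below sweeps a decision threshold from high to low, records the false-positive and true-positive rates at each distinct score, and integrates with the trapezoidal rule. The function names and the toy scores are hypothetical, and the code assumes at least one positive and one negative label.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points; labels are 1 (positive) or 0 (negative)."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Consume all samples sharing a score so ties yield a single point.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

pts = roc_points([0.9, 0.8, 0.7, 0.6, 0.4, 0.2], [1, 1, 0, 1, 0, 0])
print(pts, auc(pts))
```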
11-factor and DPPS encoding for H2-IAb
| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| | Sn Avg (%) | 100.00 | 97.05 | 97.15 | 96.51 | 92.00 | 68.28 | 58.72 |
| SR (ℓ1) | Sp Avg (%) | 0.00 | 9.67 | 12.65 | 16.44 | 24.17 | 48.05 | 60.15 |
| | Acc Avg (%) | 90.00 | 81.82 | 79.41 | 73.58 | 62.38 | 57.49 | 59.60 |
| | Sn Avg (%) | 99.40 | 98.58 | 99.74 | 100.00 | 100.00 | 99.56 | 96.28 |
| SVM | Sp Avg (%) | 17.86 | 15.97 | 16.84 | 12.22 | 8.09 | 8.14 | 11.52 |
| | Acc Avg (%) | 91.94 | 84.11 | 82.32 | 74.84 | 59.80 | 50.82 | 45.31 |
| | | | | | | | | |
| | Sn Avg (%) | 96.93 | 95.23 | 94.08 | 89.43 | 67.82 | 48.98 | 34.81 |
| SR (ℓ1) | Sp Avg (%) | 22.50 | 34.44 | 38.41 | 57.65 | 55.34 | 63.87 | 74.77 |
| | Acc Avg (%) | 89.99 | 84.63 | 82.40 | 79.84 | 62.39 | 56.94 | 58.79 |
| | Sn Avg (%) | 85.77 | 84.58 | 87.36 | 90.62 | 85.72 | 84.75 | 84.59 |
| SVM | Sp Avg (%) | 49.50 | 54.22 | 51.01 | 56.44 | 42.57 | 41.77 | 39.28 |
| | Acc Avg (%) | 82.40 | 79.26 | 79.77 | 80.15 | 66.88 | 61.81 | 57.30 |
11-factor and DPPS encoding for H2-IAd
| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| | Sn Avg (%) | 100.00 | 100.00 | 94.74 | 96.58 | 93.90 | 96.19 | 97.65 |
| SR (ℓ1) | Sp Avg (%) | 0.00 | 0.00 | 8.33 | 5.56 | 36.00 | 4.76 | 7.50 |
| | Acc Avg (%) | 90.00 | 80.00 | 75.80 | 77.00 | 70.48 | 70.22 | 68.80 |
| | Sn Avg (%) | 96.93 | 98.37 | 98.57 | 97.74 | 98.33 | 100.00 | 99.35 |
| SVM | Sp Avg (%) | 23.61 | 24.17 | 25.24 | 18.10 | 26.25 | 25.40 | 23.61 |
| | Acc Avg (%) | 86.19 | 85.80 | 83.17 | 79.80 | 69.33 | 78.94 | 75.00 |
| | | | | | | | | |
| | Sn Avg (%) | 94.62 | 95.06 | 95.95 | 92.71 | 78.29 | 89.84 | 85.18 |
| SR (ℓ1) | Sp Avg (%) | 10.71 | 30.00 | 36.67 | 37.00 | 63.00 | 61.43 | 59.64 |
| | Acc Avg (%) | 82.68 | 83.46 | 83.51 | 80.27 | 72.23 | 81.82 | 77.02 |
| | Sn Avg (%) | 70.82 | 79.50 | 81.63 | 76.03 | 83.05 | 76.83 | 82.32 |
| SVM | Sp Avg (%) | 45.00 | 33.00 | 39.67 | 50.33 | 39.00 | 65.71 | 51.79 |
| | Acc Avg (%) | 67.33 | 71.39 | 72.98 | 70.08 | 65.30 | 73.73 | 72.63 |
11-factor and DPPS encoding for HLA-DPA1*0103/DPB1*0201
| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| | Sn Avg (%) | 96.15 | 95.83 | 94.14 | 90.24 | 76.29 | 68.91 | 71.09 |
| SR (ℓ1) | Sp Avg (%) | 0.00 | 0.00 | 18.25 | 29.22 | 70.38 | 75.52 | 75.39 |
| | Acc Avg (%) | 86.21 | 79.31 | 77.28 | 71.29 | 73.31 | 72.64 | 73.61 |
| | Sn Avg (%) | 96.19 | 96.32 | 95.20 | 93.44 | 89.10 | 83.78 | 72.69 |
| SVM | Sp Avg (%) | 16.67 | 21.00 | 22.14 | 24.57 | 58.29 | 72.09 | 81.24 |
| | Acc Avg (%) | 88.02 | 82.83 | 78.72 | 72.21 | 73.59 | 77.06 | 77.69 |
| | | | | | | | | |
| | Sn Avg (%) | 93.93 | 92.22 | 87.37 | 83.29 | 70.67 | 66.28 | 67.76 |
| SR (ℓ1) | Sp Avg (%) | 18.33 | 11.00 | 26.67 | 51.00 | 75.86 | 80.78 | 85.85 |
| | Acc Avg (%) | 85.82 | 77.69 | 73.64 | 73.30 | 73.31 | 74.67 | 78.44 |
| | Sn Avg (%) | 58.33 | 46.55 | 43.66 | 44.52 | 49.10 | 55.00 | 57.82 |
| SVM | Sp Avg (%) | 77.50 | 94.67 | 94.29 | 93.33 | 88.67 | 87.09 | 87.42 |
| | Acc Avg (%) | 60.42 | 55.14 | 55.08 | 59.72 | 68.99 | 73.62 | 75.33 |
11-factor and DPPS encoding for HLA-DRB1*0101
| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| | Sn Avg (%) | 94.70 | 81.31 | 73.10 | 60.49 | 40.13 | 13.98 | 4.70 |
| SR (ℓ1) | Sp Avg (%) | 10.70 | 38.79 | 51.11 | 68.25 | 83.50 | 96.72 | 99.23 |
| | Acc Avg (%) | 59.31 | 57.55 | 59.42 | 66.00 | 76.65 | 87.98 | 91.74 |
| | Sn Avg (%) | 80.46 | 79.84 | 76.55 | 73.05 | 55.24 | 37.70 | 20.51 |
| SVM | Sp Avg (%) | 45.39 | 45.22 | 51.70 | 59.53 | 80.45 | 88.48 | 93.20 |
| | Acc Avg (%) | 65.67 | 65.24 | 62.66 | 64.64 | 76.46 | 83.11 | 87.43 |
| | | | | | | | | |
| | Sn Avg (%) | 72.89 | 52.12 | 39.46 | 24.60 | 8.41 | 4.61 | 4.24 |
| SR (ℓ1) | Sp Avg (%) | 48.24 | 72.52 | 82.56 | 91.84 | 98.13 | 99.33 | 99.59 |
| | Acc Avg (%) | 62.51 | 63.52 | 66.27 | 72.36 | 83.96 | 89.32 | 92.03 |
| | Sn Avg (%) | 70.72 | 68.74 | 56.70 | 72.50 | 59.55 | 56.79 | 55.44 |
| SVM | Sp Avg (%) | 54.62 | 57.37 | 75.38 | 49.70 | 75.77 | 76.90 | 79.59 |
| | Acc Avg (%) | 63.94 | 62.38 | 68.32 | 56.32 | 73.20 | 74.78 | 77.67 |
11-factor and DPPS encoding for HLA-DRB1*0401
| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| | Sn Avg (%) | 96.00 | 95.51 | 89.92 | 75.49 | 37.65 | 35.00 | 20.00 |
| SR (ℓ1) | Sp Avg (%) | 0.00 | 1.79 | 3.23 | 26.65 | 70.50 | 74.76 | 91.48 |
| | Acc Avg (%) | 77.42 | 69.47 | 61.44 | 53.55 | 58.37 | 61.94 | 82.26 |
| | Sn Avg (%) | 97.40 | 97.46 | 98.10 | 93.56 | 94.70 | 98.00 | 25.00 |
| SVM | Sp Avg (%) | 27.04 | 32.14 | 19.73 | 28.63 | 30.08 | 24.76 | 98.15 |
| | Acc Avg (%) | 84.28 | 78.95 | 72.30 | 64.45 | 53.86 | 48.39 | 86.45 |
| | | | | | | | | |
| | Sn Avg (%) | 93.65 | 87.37 | 84.10 | 62.68 | 42.12 | 43.00 | 10.00 |
| SR (ℓ1) | Sp Avg (%) | 43.33 | 38.06 | 43.91 | 56.15 | 78.58 | 84.76 | 95.93 |
| | Acc Avg (%) | 84.23 | 73.53 | 70.94 | 59.70 | 65.18 | 71.29 | 84.84 |
| | Sn Avg (%) | 83.68 | 81.45 | 81.85 | 64.97 | 48.82 | 63.75 | 32.50 |
| SVM | Sp Avg (%) | 38.33 | 42.36 | 35.15 | 53.96 | 60.47 | 39.88 | 67.04 |
| | Acc Avg (%) | 75.11 | 70.40 | 66.33 | 60.00 | 56.26 | 47.58 | 62.58 |
Figure 5. DPPS encoding accuracy. Comparison of predictive accuracy.