Thammakorn Saethang, Osamu Hirose, Ingorn Kimkong, Vu Anh Tran, Xuan Tho Dang, Lan Anh T Nguyen, Tu Kien T Le, Mamoru Kubo, Yoichi Yamada, Kenji Satou.
Abstract
BACKGROUND: Epitope identification is an essential step toward synthetic vaccine development, since epitopes play an important role in activating the immune response. Classical experimental approaches are laborious and time-consuming, so computational methods for generating epitope candidates have been actively studied. Most of these methods, however, rely on sophisticated nonlinear techniques to achieve higher predictive performance. The use of such techniques tends to diminish their interpretability with respect to binding potential; that is, they provide little insight into binding mechanisms.
Year: 2012 PMID: 23176036 PMCID: PMC3548761 DOI: 10.1186/1471-2105-13-313
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Visualization of the HLA-nonapeptide complex. (A) Crystal structure of the LLFGYPVYV-HLA-A*02:01 complex determined by X-ray diffraction (PDB entry 1DUZ [7]). (B) Conformation of the nonapeptide extracted from the complex.
Amino acid descriptors used in this study
| Descriptor | Type | Derivation method | Dimensions | Ref. |
| DPPS | physicochemical | principal component analysis (PCA) | 10 | […] |
| FASGAI | physicochemical | factor analysis (FA) | 6 | […] |
| z-scale | physicochemical | PCA and partial least squares (PLS) | 5 | […] |
| ISA/ECI | quantum-chemical | - | 2 | […] |
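The feature counts reported for these descriptors in the classification results of peptide-encoding schemes (90, 54, 45, and 18) each equal the descriptor dimension multiplied by the nine positions of a nonapeptide, which is consistent with concatenating one descriptor vector per residue. The sketch below illustrates that position-wise encoding under this assumption. `TOY_SCALE` is a hypothetical two-dimensional scale with made-up values (not the published DPPS, FASGAI, z-scale, or ISA/ECI tables), and `encode_peptide` and `n_features` are illustrative helper names.

```python
from typing import Dict, List

# HYPOTHETICAL two-dimensional scale with made-up numbers, used only to
# illustrate the encoding; real descriptor values would replace it.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOY_SCALE: Dict[str, List[float]] = {
    aa: [float(i), float(i) / 10.0] for i, aa in enumerate(AMINO_ACIDS)
}


def encode_peptide(peptide: str, scale: Dict[str, List[float]]) -> List[float]:
    """Concatenate each residue's descriptor vector, position by position."""
    features: List[float] = []
    for residue in peptide:
        features.extend(scale[residue])  # appends d values for this position
    return features


def n_features(dims_per_residue: int, peptide_length: int = 9) -> int:
    """Length of the encoded vector: d descriptor values for each position."""
    return dims_per_residue * peptide_length


if __name__ == "__main__":
    # Nonapeptide from the HLA-A*02:01 complex in Figure 1.
    vec = encode_peptide("LLFGYPVYV", TOY_SCALE)
    print(len(vec))  # 9 positions x 2 toy descriptors = 18
    for name, d in {"DPPS": 10, "FASGAI": 6, "z-scale": 5, "ISA/ECI": 2}.items():
        print(name, n_features(d))
```

Run against the four descriptor dimensions, `n_features` reproduces 90, 54, 45, and 18, matching the encoding table.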
Classification results of peptide-encoding schemes
| EpicCapo | 360 | 0.792 ± 0.006 | 0.841 ± 0.004 | 0.915 ± 0.001 | 0.883 | 0.744 | 0.815 | ||||
| EpicCapo(3 AAPPs*) | 27 | 0.876 ± 0.005 | 0.862 ± 0.003 | 0.855 | 0.828 | 0.878 | |||||
| DPPS | 90 | 0.865 ± 0.005 | 0.760 ± 0.007 | 0.834 ± 0.004 | 0.816 ± 0.004 | 0.888 ± 0.001 | 0.868 | 0.697 | 0.807 | 0.785 | 0.878 |
| FASGAI | 54 | 0.847 ± 0.004 | 0.761 ± 0.004 | 0.825 ± 0.003 | 0.801 ± 0.003 | 0.882 ± 0.001 | 0.840 | 0.730 | 0.803 | 0.787 | 0.874 |
| z-scale | 45 | 0.847 ± 0.005 | 0.732 ± 0.005 | 0.815 ± 0.004 | 0.793 ± 0.004 | 0.873 ± 0.002 | 0.848 | 0.676 | 0.788 | 0.765 | 0.858 |
| ISA/ECI | 18 | 0.799 ± 0.005 | 0.652 ± 0.005 | 0.760 ± 0.003 | 0.731 ± 0.003 | 0.797 ± 0.001 | 0.829 | 0.643 | 0.766 | 0.739 | 0.796 |
| Binary encoding | 180 | 0.721 ± 0.006 | 0.831 ± 0.003 | 0.807 ± 0.003 | 0.883 ± 0.002 | 0.705 | 0.820 | 0.799 | 0.879 | ||
Means and standard deviations were calculated from 20 iterations of 10-fold cross-validation.
Underlined values represent the highest performance.
sens = sensitivity; spec = specificity; F1 = F-score; ACC = accuracy; AUC = area under the curve.
*These three top-ranked AAPPs were MICC010101, SIMK990101, and SIMK990105 (see Additional file 1).
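The threshold-dependent measures named in these notes can be computed from a confusion matrix as follows. This is a minimal sketch: the counts in the usage line are hypothetical, F1 is taken as the harmonic mean of precision and sensitivity (the standard definition, assumed here), and `summarize` assumes sample standard deviations over the repeated cross-validation runs, which the notes do not specify.

```python
import statistics
from typing import Dict, Sequence, Tuple


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, F1, and accuracy from confusion-matrix counts."""
    sens = tp / (tp + fn)  # fraction of true binders predicted as binders
    spec = tn / (tn + fp)  # fraction of non-binders predicted as non-binders
    precision = tp / (tp + fp)
    f1 = 2 * precision * sens / (precision + sens)  # harmonic mean of precision and sensitivity
    acc = (tp + tn) / (tp + fp + tn + fn)
    return {"sens": sens, "spec": spec, "F1": f1, "ACC": acc}


def summarize(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over repeated cross-validation runs (assumed)."""
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    # Hypothetical counts for a single test fold.
    print(classification_metrics(tp=80, fp=10, tn=90, fn=20))
```

With these hypothetical counts, sensitivity is 0.8, specificity 0.9, and accuracy 0.85. Calling `summarize` on the 20 per-iteration scores gives one "mean ± SD" cell of the table.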
Figure 2. ROC curves of peptide-encoding schemes evaluated on a test set.
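An ROC curve traces the true-positive rate against the false-positive rate as the decision threshold is lowered. Its area (AUC) equals the probability that a randomly chosen binder scores higher than a randomly chosen non-binder, with ties counted as one half. The sketch below computes the curve and checks its trapezoidal area against that pairwise definition. The scores are hypothetical classifier outputs, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points, lowering the threshold one distinct score at a time."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def pairwise_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (binder, non-binder) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.4, 0.3]  # hypothetical predicted binding scores
    labels = [1, 1, 0, 1, 0]            # 1 = binder, 0 = non-binder
    print(trapezoid_auc(roc_points(scores, labels)), pairwise_auc(scores, labels))
```

Both functions return 5/6 for the toy data. Grouping tied scores into a single step is what makes the trapezoidal area agree with the tie-aware pairwise count.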
Classification results of 34 allele datasets
| HLA-A*01:01 | 1157 | 0.964 | 0.980 | 0.977 | 0.972 ± 0.004 | 0.977 ± 0.003 | |
| HLA-A*02:01 | 3089 | 0.934 | 0.952 | 0.946 | 0.950 ± 0.004 | 0.951 ± 0.004 | |
| HLA-A*02:02 | 1447 | 0.875 | 0.899 | 0.899 | 0.901 ± 0.004 | ||
| HLA-A*02:03 | 1443 | 0.884 | 0.916 | 0.916 | 0.920 ± 0.003 | 0.923 ± 0.003 | |
| HLA-A*02:06 | 1437 | 0.872 | 0.914 | 0.916 | 0.925 ± 0.004 | 0.927 ± 0.004 | |
| HLA-A*03:01 | 2094 | 0.908 | 0.928 | 0.937 | 0.934 ± 0.004 | 0.938 ± 0.003 | |
| HLA-A*11:01 | 1985 | 0.918 | 0.948 | 0.939 | 0.945 ± 0.004 | 0.951 ± 0.002 | |
| HLA-A*24:02 | 197 | 0.718 | 0.780 | 0.801 | |||
| HLA-A*26:01 | 672 | 0.907 | 0.931 | 0.924 | 0.941 ± 0.005 | 0.957 ± 0.007 | |
| HLA-A*29:02 | 160 | 0.755 | 0.911 | 0.916 | |||
| HLA-A*31:01 | 1869 | 0.909 | 0.925 | 0.928 | 0.930 ± 0.002 | ||
| HLA-A*33:01 | 1140 | 0.892 | 0.915 | 0.926 ± 0.004 | |||
| HLA-A*68:01 | 1141 | 0.840 | 0.885 | 0.883 | |||
| HLA-A*68:02 | 1434 | 0.865 | 0.898 | 0.889 | 0.901 ± 0.005 | 0.907 ± 0.003 | |
| HLA-B*07:02 | 1262 | 0.952 | 0.964 | 0.960 | 0.960 ± 0.004 | 0.964 ± 0.002 | |
| HLA-B*08:01 | 708 | 0.936 | 0.943 | 0.955 | 0.942 ± 0.005 | 0.951 ± 0.004 | |
| HLA-B*15:01 | 978 | 0.900 | 0.940 | 0.941 | 0.940 ± 0.006 | 0.950 ± 0.005 | |
| HLA-B*18:01 | 118 | 0.573 | 0.853 | 0.838 | 0.886 ± 0.013 | ||
| HLA-B*27:05 | 969 | 0.915 | 0.940 | 0.938 | |||
| HLA-B*35:01 | 736 | 0.851 | 0.889 | 0.875 | 0.900 ± 0.004 | ||
| HLA-B*40:02 | 118 | 0.541 | 0.842 | 0.754 | 0.811 ± 0.007 | ||
| HLA-B*44:02 | 119 | 0.533 | 0.740 | 0.739 | |||
| HLA-B*44:03 | 119 | 0.461 | 0.753 | 0.763 | |||
| HLA-B*51:01 | 244 | 0.822 | 0.868 | 0.886 | |||
| HLA-B*53:01 | 254 | 0.871 | 0.882 | 0.885 | |||
| HLA-B*54:01 | 255 | 0.847 | 0.921 | 0.903 | 0.927 ± 0.008 | 0.938 ± 0.006 | |
| HLA-B*57:01 | 59 | 0.428 | 0.843 | 0.826 | 0.792 ± 0.009 | 0.854 ± 0.010 | |
| HLA-B*58:01 | 988 | 0.889 | 0.945 | 0.961 | 0.959 ± 0.005 | 0.964 ± 0.004 | |
| H-2 Db | 303 | 0.865 | 0.912 | 0.901 | |||
| H-2 Dd | 85 | 0.696 | 0.853 | 0.837 | |||
| H-2 Kb | 223 | 0.792 | 0.810 | 0.833 | 0.844 ± 0.021 | ||
| H-2 Kd | 176 | 0.798 | 0.936 | 0.931 | |||
| H-2 Kk | 164 | 0.758 | 0.770 | 0.790 | |||
| H-2 Ld | 102 | 0.551 | 0.924 | 0.942 | |||
| Average | | 0.801 | 0.895 | 0.895 | 0.900 | 0.912 | 0.931 |
| p-value | SMM | SMMPMBEC | NetMHC | EpicCapo | EpicCapo+ |
| ARB | 4.37E-5 | 3.69E-5 | 1.25E-5 | 5.21E-6 | 2.64E-6 |
| SMM | | 8.61E-1 | 2.30E-1 | 8.28E-3 | 2.87E-5 |
| SMMPMBEC | | | 2.61E-1 | 3.50E-3 | 8.49E-6 |
| NetMHC | | | | 8.57E-3 | 7.74E-5 |
| EpicCapo | | | | | 1.95E-5 |
For each dataset, AUCs were evaluated by 5-fold cross-validation. In the lower part of the table, p-values for pairwise differences in average AUC were calculated using two-tailed paired t-tests.
Means and standard deviations for EpicCapo and EpicCapo+ were calculated from 20 iterations of 5-fold cross-validation.
Underlined values represent the highest performance among ARB, SMM, SMMPMBEC, and NetMHC. Values in bold represent significant improvements of the EpicCapo or EpicCapo+ AUCs (from 20 iterations of 5-fold cross-validation) over the underlined values according to one-tailed t-tests (significance level = 0.01).
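A paired t-test compares two methods on the same datasets. The per-dataset AUC differences are averaged and divided by their standard error, giving a t statistic with n - 1 degrees of freedom. The following is a minimal standard-library sketch, assuming each pair is the AUC of two methods on the same allele dataset. The numbers in the usage line are hypothetical, and `paired_t` is an illustrative name.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """t statistic and degrees of freedom for a paired t-test of a against b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    # Hypothetical per-dataset AUCs for two methods on three shared datasets.
    print(paired_t([0.9, 0.8, 0.7], [0.8, 0.75, 0.7]))
```

The p-value follows from the t distribution with the returned degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel(a, b)` returns the same statistic together with a two-sided p-value. Its `alternative` argument (`"greater"` or `"less"`) gives a one-tailed version like the one used for the bold entries.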
Optimal subsets of AAPPs and number of selected features identified by EpicCapo using 14 HLA-A allele datasets
| Allele | AUC | Selected AAPPs | Selected features |
| A*01:01 | 0.980 | 1,11,14,20,24,26,28,33 | 72 |
| A*02:01 | 0.958 | 9,11,14,24,26,28,31 | 62 |
| A*02:02 | 0.913 | 14,28 | 18 |
| A*02:03 | 0.925 | 3,9,11,14,19,24,25,26,28,29,31,33 | 104 |
| A*02:06 | 0.926 | 1,3,9,11,13,14,18,19,21,22,24,25,26,27,28,31,34,38,39 | 141 |
| A*03:01 | 0.946 | 11,14,20,24,26,28,33 | 58 |
| A*11:01 | 0.956 | 11,14,26,28 | 35 |
| A*24:02 | 0.877 | 5,6,14,24,28,31 | 31 |
| A*26:01 | 0.960 | 14,28 | 18 |
| A*29:02 | 0.955 | 5,8,9,20,33 | 23 |
| A*31:01 | 0.940 | 11,14,20,26,28,33 | 46 |
| A*33:01 | 0.940 | 14,28 | 17 |
| A*68:01 | 0.904 | 11,14,20,26,28,33 | 40 |
| A*68:02 | 0.913 | 1,9,11,14,20,22,24,26,28,33,39 | 79 |
| Average | 0.935 |
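The EpicCapo(3 AAPPs) row of the peptide-encoding results uses 27 features. This suggests that each AAPP contributes one feature per nonapeptide position, so a subset of k AAPPs offers at most 9k features. Under that assumption, the sketch below checks the selected-feature counts against this budget, recomputes the average AUC, and tallies how often each AAPP index is chosen. The values are copied from the table, and the helper names are illustrative.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Tuple

# allele -> (AUC, selected AAPP indices, number of selected features), from the table.
OPTIMAL: Dict[str, Tuple[float, List[int], int]] = {
    "A*01:01": (0.980, [1, 11, 14, 20, 24, 26, 28, 33], 72),
    "A*02:01": (0.958, [9, 11, 14, 24, 26, 28, 31], 62),
    "A*02:02": (0.913, [14, 28], 18),
    "A*02:03": (0.925, [3, 9, 11, 14, 19, 24, 25, 26, 28, 29, 31, 33], 104),
    "A*02:06": (0.926, [1, 3, 9, 11, 13, 14, 18, 19, 21, 22, 24, 25, 26, 27, 28, 31, 34, 38, 39], 141),
    "A*03:01": (0.946, [11, 14, 20, 24, 26, 28, 33], 58),
    "A*11:01": (0.956, [11, 14, 26, 28], 35),
    "A*24:02": (0.877, [5, 6, 14, 24, 28, 31], 31),
    "A*26:01": (0.960, [14, 28], 18),
    "A*29:02": (0.955, [5, 8, 9, 20, 33], 23),
    "A*31:01": (0.940, [11, 14, 20, 26, 28, 33], 46),
    "A*33:01": (0.940, [14, 28], 17),
    "A*68:01": (0.904, [11, 14, 20, 26, 28, 33], 40),
    "A*68:02": (0.913, [1, 9, 11, 14, 20, 22, 24, 26, 28, 33, 39], 79),
}

POSITIONS = 9  # nonapeptide length


def feature_budget(aapps: List[int], positions: int = POSITIONS) -> int:
    """Upper bound on features, ASSUMING one feature per AAPP per position."""
    return positions * len(aapps)


def audit(table: Dict[str, Tuple[float, List[int], int]]) -> Tuple[float, List[str], Counter]:
    for allele, (_auc, aapps, n_feat) in table.items():
        if n_feat > feature_budget(aapps):
            raise ValueError(f"{allele}: {n_feat} features exceed the 9k budget")
    avg_auc = mean(auc for auc, _, _ in table.values())
    # Alleles whose selected subset kept every one of its 9k features.
    untrimmed = sorted(a for a, (_, aapps, n) in table.items() if n == feature_budget(aapps))
    usage = Counter(i for _, aapps, _ in table.values() for i in aapps)
    return avg_auc, untrimmed, usage


if __name__ == "__main__":
    avg, full, usage = audit(OPTIMAL)
    print(round(avg, 3), full, usage[14], usage[28])
```

On these values, every feature count fits within its budget and the average rounds to 0.935, matching the Average row. Only A*01:01, A*02:02, and A*26:01 keep all 9k features. AAPPs 14 and 28 each appear in 13 of the 14 subsets (all except A*29:02).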
Prediction results of EpicCapo using four influenza A strains, categorized by allele
| A*01:01 | 14 | 13 | 6 | 5 | A1 |
| A*26:01 | 6 | 9 | 1 | 5 | A1 |
| A*29:02 | 103 | 134 | 61 | 161 | ? |
| A*02:01 | 122 | 160 | 71 | 168 | A2 |
| A*02:02 | 302 | 370 | 162 | 391 | A2 |
| A*02:03 | 268 | 326 | 144 | 307 | A2 |
| A*02:06 | 200 | 250 | 105 | 264 | A2 |
| A*68:02 | 198 | 220 | 109 | 277 | A2 |
| A*24:02 | 90 | 108 | 50 | 150 | A24 |
| A*03:01 | 85 | 94 | 50 | 136 | A3 |
| A*11:01 | 162 | 176 | 91 | 229 | A3 |
| A*31:01 | 183 | 227 | 110 | 245 | A3 |
| A*33:01 | 96 | 117 | 62 | 110 | A3 |
| A*68:01 | 263 | 346 | 151 | 325 | A3 |
| Total | 2092 | 2550 | 1173 | 2773 | |
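The last column appears to group the alleles into HLA supertypes (A1, A2, A3, A24, and an unassigned "?"), so the per-allele counts can be rolled up by supertype. The sketch below performs this rollup and confirms that each count column adds up to the Total row. The four count columns are kept in table order, and `by_supertype` and `column_totals` are illustrative helper names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, Tuple[int, int, int, int], str]

# (allele, four counts in table order, supertype label), copied from the table.
ROWS: List[Row] = [
    ("A*01:01", (14, 13, 6, 5), "A1"),
    ("A*26:01", (6, 9, 1, 5), "A1"),
    ("A*29:02", (103, 134, 61, 161), "?"),
    ("A*02:01", (122, 160, 71, 168), "A2"),
    ("A*02:02", (302, 370, 162, 391), "A2"),
    ("A*02:03", (268, 326, 144, 307), "A2"),
    ("A*02:06", (200, 250, 105, 264), "A2"),
    ("A*68:02", (198, 220, 109, 277), "A2"),
    ("A*24:02", (90, 108, 50, 150), "A24"),
    ("A*03:01", (85, 94, 50, 136), "A3"),
    ("A*11:01", (162, 176, 91, 229), "A3"),
    ("A*31:01", (183, 227, 110, 245), "A3"),
    ("A*33:01", (96, 117, 62, 110), "A3"),
    ("A*68:01", (263, 346, 151, 325), "A3"),
]
TOTAL = (2092, 2550, 1173, 2773)


def by_supertype(rows: List[Row]) -> Dict[str, List[int]]:
    """Sum each count column within each supertype label."""
    sums: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0, 0])
    for _allele, counts, supertype in rows:
        for i, count in enumerate(counts):
            sums[supertype][i] += count
    return dict(sums)


def column_totals(rows: List[Row]) -> Tuple[int, ...]:
    """Sum each count column over all alleles."""
    return tuple(sum(col) for col in zip(*(counts for _, counts, _ in rows)))


if __name__ == "__main__":
    print(column_totals(ROWS) == TOTAL)
    for supertype, sums in by_supertype(ROWS).items():
        print(supertype, sums)
```

The column sums reproduce the Total row exactly. The A2 alleles contribute the largest share of every column, for example 1090 of 2092 in the first.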
Comparison of epitopes identified by EpicCapo with the broadly protective influenza A viral epitopes identified by Uchida [50]
| Strain | Epitope | HLA alleles |
| H1N1 (A/PR/8/34) | GILGFVFTL | A*02:01, A*02:02, A*02:03, A*02:06 |
| | IILKANFSV | A*02:01, A*02:02, A*02:03, A*02:06, A*68:02 |
| | GMFNMLSTV | A*02:01, A*02:02, A*02:03, A*02:06 |
| H3N2 (A/Aichi/2/68) | GILGFVFTL | A*02:01, A*02:02, A*02:03, A*02:06 |
| | VMLKANFSV | A*02:01, A*02:02, A*02:03, A*02:06 |
| | GMFNMLSTV | A*02:01, A*02:02, A*02:03, A*02:06 |
| H1N1 (A/NewYork/4290/2009) | GILGFVFTL | A*02:01, A*02:02, A*02:03, A*02:06 |
| | IVLKANFSV | A*02:01, A*02:02, A*02:06, A*68:02 |
| | GMFNMLSTV | A*02:01, A*02:02, A*02:03, A*02:06 |
| H5N1 (A/Hong Kong/483/97) | GILGFVFTL | A*02:01, A*02:02, A*02:03, A*02:06 |
| | IILKANFSV | A*02:01, A*02:02, A*02:03, A*02:06, A*68:02 |
| | GMFNMLSTV | A*02:01, A*02:02, A*02:03, A*02:06 |
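The comparison can also be read across strains. Intersecting the four strains' epitope sets shows which of the listed epitopes are shared by all of them. The following small sketch transcribes the rows into a dictionary; the data structure and function names are illustrative.

```python
from typing import Dict, Set

Table = Dict[str, Dict[str, Set[str]]]

A02_ALLELES = {"A*02:01", "A*02:02", "A*02:03", "A*02:06"}

# Strain -> {epitope -> HLA alleles}, transcribed from the table.
PREDICTED: Table = {
    "H1N1 (A/PR/8/34)": {
        "GILGFVFTL": A02_ALLELES,
        "IILKANFSV": A02_ALLELES | {"A*68:02"},
        "GMFNMLSTV": A02_ALLELES,
    },
    "H3N2 (A/Aichi/2/68)": {
        "GILGFVFTL": A02_ALLELES,
        "VMLKANFSV": A02_ALLELES,
        "GMFNMLSTV": A02_ALLELES,
    },
    "H1N1 (A/NewYork/4290/2009)": {
        "GILGFVFTL": A02_ALLELES,
        "IVLKANFSV": {"A*02:01", "A*02:02", "A*02:06", "A*68:02"},
        "GMFNMLSTV": A02_ALLELES,
    },
    "H5N1 (A/Hong Kong/483/97)": {
        "GILGFVFTL": A02_ALLELES,
        "IILKANFSV": A02_ALLELES | {"A*68:02"},
        "GMFNMLSTV": A02_ALLELES,
    },
}


def conserved_epitopes(table: Table) -> Set[str]:
    """Epitopes listed for every strain."""
    return set.intersection(*(set(epitopes) for epitopes in table.values()))


def shared_alleles(table: Table, epitope: str) -> Set[str]:
    """Alleles listed for the epitope in every strain where it appears."""
    return set.intersection(*(eps[epitope] for eps in table.values() if epitope in eps))


if __name__ == "__main__":
    for ep in sorted(conserved_epitopes(PREDICTED)):
        print(ep, sorted(shared_alleles(PREDICTED, ep)))
```

Only GILGFVFTL and GMFNMLSTV are listed for all four strains, each with the same four A*02 alleles. The third epitope differs among strains (IILKANFSV, VMLKANFSV, IVLKANFSV).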
Figure 3. Our peptide data-encoding scheme, using the first position of a nonapeptide as an example.