Hsin-Wei Wang, Ya-Chi Lin, Tun-Wen Pai, Hao-Teng Chang.
Abstract
Epitopes are antigenic determinants that are useful because they induce B-cell antibody production and stimulate T-cell activation. Bioinformatics can enable rapid, efficient prediction of potential epitopes. Here, we designed a novel B-cell linear epitope prediction system called LEPS (Linear Epitope Prediction by Propensities and Support Vector Machine), which combines physicochemical propensity identification with support vector machine (SVM) classification. We tested LEPS on four datasets: AntiJen, HIV, a newly generated PC dataset, and AHP, which merges the other three. Peptides with globally or locally high physicochemical propensities were first identified as primitive linear epitope (LE) candidates. These candidates were then classified with the SVM based on the unique features of amino acid segments. This step reduced the number of predicted epitopes and enhanced the positive prediction value (PPV). Compared to four other well-known LE prediction systems, LEPS achieved the highest accuracy (72.52%), specificity (84.22%), PPV (32.07%), and Matthews' correlation coefficient (10.36%).
Year: 2011 PMID: 21876642 PMCID: PMC3163029 DOI: 10.1155/2011/432830
Source DB: PubMed Journal: J Biomed Biotechnol ISSN: 1110-7243
Epitopes predicted in the PC dataset after analysis with LEPS.
| Antigen : length (UniProt ID^a) | LEPS-predicted Epitopes | Experimental epitopes | Ref. |
|---|---|---|---|
| PrP : 253 | M1ANLGCWML9 | ||
| R37YPGQG42 | [ | ||
| Q52GG54 | [ | ||
| Q91GGGT95 | [ | ||
| N100KPSKPKTNMKHMA113 | [ | ||
| G123GLGGYMLG131 | [ | ||
| H140FG | [ | ||
| Q160VYYRPMD167 | [ | ||
| F198TETD202 | [ | ||
| Y218ERESQAYYQRGS230 | |||
| GAPDH : 338 | A4KVGING10 | ||
| A21AFLKNTVDV30 | |||
| [ | |||
| K48RDSTHGTFP | [ | ||
| G100VFTTIDK | [ | ||
| S123APSADAPM131 | |||
| V136NENSYEKS144 | |||
| V148SNASCTTN156 | |||
| [ | |||
| V188VDGPSSKLWRDGRGAM204 | |||
| A210STGAAKAVG219 | |||
| L225NGKLT230 | |||
| R235VPTPDVSV243 | |||
| R249LGKGASYEE258 | |||
| S268GPLKGILEYTEDEVVSSD | [ | ||
| I302SLNNNF308 | |||
| Y315DNEFGY321 | |||
| I329THMHKVDHA338 | |||
| Ara h 1 : 626 | [ | ||
| Q47 | [ | ||
| E66YDPRCVY73 | [ | ||
| P75RGHTGTTNQRSPPG | [ | ||
| [ | |||
| R | [ | ||
| E134DWRRPSHQQPR | [ | ||
| NN173 | |||
| P295GQFEDFF302 | [ | ||
| Y312LQGFSRN319 | [ | ||
| F325NAEFNEIRR334 | [ | ||
| Q345EERGQRR352 | [ | ||
| K381SVSKKGSEEEG | [ | ||
| N409NFGKLFEVK418 | [ | ||
| G463NLELV468 | [ | ||
| K472EQQQRGRREEEEDEDEEEEGSN | |||
| R498RYTARLKEG507 | [ | ||
| E525LHLLGFGIN534 | [ | ||
| H539RIFLAGDKD548 | [ | ||
| I551DQIEKQAKDLAFPGSGE568 | [ | ||
| P587QSQSQSPSSPEKESPEKEDQEEEN | |||
| SARS N : 422 | A36RPKQRRPQGLPNNTASWFT55 | [ | |
| H60GKEEL65 | |||
| T77NSGPDDQ84 | |||
| L140NTPKDHIGTRNPNNN155 | |||
| A156ATVLQLPQGTTLPKGFYAEGSRGG180 | [ | ||
| T266KQYNVTQAFGRRGP280 | [ | ||
| N286FGDQDLIRQGTDYK300 | [ | ||
| K356HIDAYKTFPPTEPKKDKKK375 | [ | ||
| R386QKKQPTVTLLPAADMDDFSRQLQN410 | [ | ||
| ZP3 : 399 | [ | ||
| Q71AAELTLGPSACAPVPAEPLSK92 | [ | ||
| H101ECGSELQMTPDSLIYSTVLHY122 | [ | ||
| P124N | [ | ||
| G156IQPTWVPFHSTLSREQ172 | [ | ||
| D251SSSIFISPRPG262 | [ | ||
| V291TATDQAPSPLN302 | [ | ||
| A311DEWLPVEGPRD322 | [ | ||
| Q346EPGNPSEFEADLMLGPLVLSEAENGP372 | [ | ||
| AIV-H4 : 511 | Q17NYTGNPVIC26 | D107TCYPFDVPEYQSLR121 | [ |
| F137QWNTVKQNGKSGACKRANVNDFFNRLNWLVK | [ | ||
| N206LYKNNPGRVTVSTK220 | [ | ||
| T224SVVPNIGSGPLVRGGQSGRVSXYWTIV250 | [ | ||
| V257FNTIGNLIAPRGHYKLNNQKKSTILNTAIPIGSC | [ | ||
| D455SEMNKLFERVRRQL469 | [ | ||
| A473EDKGNGCFEIFHKCDNN490 | [ | ||
| N512RFQIQGVKLTQGYM526 | [ | ||
| AIV-H5 : 568 | A25NNSTEQVDTIMEKNVTVTHAQDILEKTHNGKL57 | [ | |
| E85FLNVPEWSYIVEKINPANDLCYP108 | [ | ||
| C151PYQGRSSFFRNVVW165 | [ | ||
| D199AAEQTRLYQNPTTY213 | [ | ||
| R223SKVNGQSGRMEFFWTILKPNDAINFESNGNFIA | [ | ||
| ENAYKIV273 | |||
| L472RDNAKELGNGCFEFYHR489 | [ | ||
| E284LEYGNCNTKC294 | |||
| AIV-H12 : 527 (C7FPM3, residue 1–527) | D31TVN | [ | |
| K127YERVKMFDFTKWNVTYTGTSKACNNTSNQGS | [ | ||
| YRSMRWLTLKSGQFPVQTDEY180 | |||
| F190TWAIHHPPTSDEQVKLYKNPNSLSSVTTDEINR | [ | ||
| FRPNIGPRPL234 | |||
| Q238QGRMDYYWAVLKPGQTV255 | [ | ||
| T259NGNLIAPEYGHLITGKSHGRILKNDLPIGQCTTEC294 | [ | ||
| T310SKHYIGKCPKYIPS324 | [ | ||
| R334NVPQAQDRGLFGAIAGFIEG354 | [ | ||
| I430TDIWAYNAELLVLLENQKTLDEHDANVRNLHD | [ | ||
| VR465 | |||
| G478CFEILHKCDDGCMDTIKNGT498 | [ | ||
| Q502DYEEESKLERQRINGVKLEENSTYK527 | [ | ||
| DEN-3 E-glycoprotein : 493 (D2JWZ8, residue 281–773) | T331QLATLRKLCIEGKI345 | [ | |
| D351SRCPTQGEAVLPEEQDPNY370 | [ | ||
| Q411YENLKYTVIITVHTGDQHQVGNETQGVT | [ | ||
| L476LTMKNKAWMVHRQW490 | [ | ||
| Q526EVVVLG | [ | ||
| W669YKKGSSI676 | |||
| L707NSLG711 | |||
| O. tsutsugamushi 47-kDa antigen : 466 (Q53246) | H21SKSLLNQKAVLPQQKSDMHIN42 | [ | |
| T65NIGISLNNKVSKYQQEV82 | [ | ||
| V97TNENVIAGR106 | [ | ||
| Y145ATFGDSNQS154 | [ | ||
| V173TNGIISSKGRDMG186 | [ | ||
| F193IQTNAAIHM202 | [ | ||
| H201MGSFGGPMF210 | [ | ||
| I233PSNTVLEAV242 | [ | ||
| [ | |||
| L333LRNGKSMTLKCKIIANK350 | [ | ||
| Q357SNDQSLVVN366 | [ | ||
| L373TPDLVKKYNITSA386 | [ | ||
| HPV L1 protein : 510 (A8BQ01) | D41VYVTRTNVYYHGGSSRLLTVGHPYYSIKKSNN | [ | |
| V90KLPDPNKFGLPDADLYDPDTQRLLWACVGVE | [ | ||
| T205TIEDGDMVET215 | [ | ||
| D219ICTNTCKYPDYLKMAAEPY238 | [ | ||
| G235DSMFFSLRREQMFTRHFFNRGGKMGDTIPD285 | [ | ||
| R326AQGHNNGMCW336 | |||
| S350TNVSLCATEA360 | [ | ||
| F370KEYLRHMEEYDLQFIFQLCKITLTPEIMAY400 | [ | ||
| V416PPPPSASL424 | |||
| K440PTPPKTPTD | [ | ||
| G497TPPPTSKRKRV508 | |||
| Bacillus anthracis, PA domain III and IV : 248 (P13423, residue 488–735) | R532RIAAV | [ | |
| A596ELNATNIYTVL607 | [ | ||
| I620RDKRFHYDRNNIAVGADES639 | [ | ||
| L692NISSLRQDGKT703 | [ | ||
| L716YIS | [ | ||
^a Because some of the epitopes in the PC dataset were partial antigen fragments, the serial numbers for the residues in each epitope were assigned according to the sequence information retrieved from the UniProt database [38]. The overlapping amino acids between the experimentally verified and predicted epitopes are shown in bold.
Figure 1. The design of LEPS. (a) Step 1(a): primitive epitope candidates with globally and locally high antigenicity were extracted by calculating weighting coefficients for various physicochemical propensities of each amino acid. After filtering with the SVM classifier (step 2(a)), the predicted epitopes were highlighted (step 3(a)) in the query sequence and the simulated structure. (b) Step 1(b): 1230 experimentally verified epitopes and 872 non-epitopes were analyzed to determine the statistical characteristics of amino acid segments (AASs). Step 2(b): the epitope indexes of 872 epitopes and 872 non-epitopes were then used to train the SVM model to predict candidate epitopes based on the statistical characteristics defined in step 1(b).
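Step 1(a) can be illustrated with a minimal sketch of window-based candidate extraction. Everything specific here is an assumption made for illustration and not taken from the paper: the Kyte-Doolittle hydropathy scale (negated as a hydrophilicity proxy) stands in for the LEPS propensities, the cutoff is simply the sequence-wide average score, and `window`, `min_len`, and both function names are hypothetical.

```python
from statistics import mean

# Kyte-Doolittle hydropathy values. This scale is only a stand-in; the
# source does not list the propensity scales that LEPS actually uses.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(seq, scale, window=7):
    """Average the negated scale over a centered window (truncated at the ends)."""
    half = window // 2
    # Negate hydropathy so hydrophilic residues score high; unknown letters get 0.
    values = [-scale.get(aa, 0.0) for aa in seq]
    return [mean(values[max(0, i - half): i + half + 1]) for i in range(len(seq))]


def candidate_segments(seq, scale=KYTE_DOOLITTLE, window=7, min_len=5):
    """Return (start, end, peptide) runs scoring above the sequence average.

    Positions are 1-based and inclusive, like the residue numbers in the table.
    """
    if not seq:
        return []
    scores = window_scores(seq, scale, window)
    threshold = mean(scores)  # assumed cutoff: the global average score
    segments, start = [], None
    # A trailing -inf sentinel closes a run that reaches the end of the sequence.
    for i, score in enumerate(scores + [float("-inf")]):
        if score > threshold and start is None:
            start = i
        elif score <= threshold and start is not None:
            if i - start >= min_len:
                segments.append((start + 1, i, seq[start:i]))
            start = None
    return segments
```

In the design described above, segments like these would only be primitive candidates; a trained classifier would still have to accept or reject each one.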
Figure 2. Comparison of the performances of LEPS, BepiPred, ABCPred, BCPred, and FBCPred systems. The best performance for each indicator is marked with a star.
Comparison of the performances of LEPS, BepiPred, ABCPred, BCPred, and FBCPred systems.
| Systems | SEN^a | SPE^a | ACC^a | PPV^a | MCC^a |
|---|---|---|---|---|---|
| PC dataset | |||||
| LEPS | 12.78 | 3.65 | |||
| BepiPred | 48.23 | 59.72 | 55.33 | 38.19 | |
| ABCPred0.8^b | 40.26 | 48.89 | 36.21 | 5.13 |
| BCPred | 50.92 | 59.35 | 52.83 | 36.07 | 4.43 |
| FBCPred | 51.03 | 52.55 | 52.20 | 35.26 | 3.17 |
| AntiJen dataset | |||||
| LEPS | 26.72 | ||||
| BepiPred | 51.79 | 57.61 | 55.52 | 22.02 | 6.04 |
| ABCPred0.8 | 40.40 | 44.70 | 21.83 | 5.46 | |
| BCPred | 58.84 | 54.87 | 53.92 | 23.34 | 8.93 |
| FBCPred | 60.31 | 51.21 | 51.45 | 22.33 | 6.73 |
| HIV dataset | |||||
| LEPS | 48.33 | 63.45 | 22.76 | ||
| BepiPred | 50.16 | 60.85 | 56.72 | 61.22 | 9.72 |
| ABCPred0.7 | 14.65 | 56.59 | 56.33 | 5.64 | |
| BCPred | 80.18 | 54.57 | 66.57 | 65.55 | |
| FBCPred | 73.20 | 58.20 | 65.56 | 27.81 | |
| AHP dataset^c | |||||
| LEPS | 26.97 | ||||
| BepiPred | 51.48 | 57.91 | 55.57 | 25.06 | 6.32 |
| ABCPred0.8 | 39.06 | 45.58 | 24.51 | 5.45 | |
| BCPred | 59.45 | 54.80 | 54.50 | 26.32 | 9.73 |
| FBCPred | 60.40 | 51.66 | 52.31 | 25.38 | 7.60 |
^a SEN: sensitivity; SPE: specificity; ACC: accuracy; PPV: positive prediction value; MCC: Matthews' correlation coefficient. All values are in %.
^b The number after ABCPred (a subscript in the original table) is the threshold value that gave the highest accuracy.
^c This dataset merges the other three datasets.
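The five indicators in the table follow the standard confusion-matrix definitions, sketched below. The function name and the example counts are hypothetical, and the source does not state whether the counts were taken per residue or per peptide; results are scaled to percent to match the table.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return SEN, SPE, ACC, PPV, and MCC in percent from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Report 0.0 when a ratio is undefined (empty denominator).
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    metrics = {
        "SEN": ratio(tp, tp + fn),                  # sensitivity (recall)
        "SPE": ratio(tn, tn + fp),                  # specificity
        "ACC": ratio(tp + tn, tp + fp + tn + fn),   # accuracy
        "PPV": ratio(tp, tp + fp),                  # positive prediction value
        "MCC": ratio(tp * tn - fp * fn, mcc_denominator),
    }
    return {name: 100.0 * value for name, value in metrics.items()}


# Hypothetical counts, for illustration only.
example = classification_metrics(tp=30, fp=20, tn=80, fn=70)
```

With these made-up counts, a predictor that rarely calls a position an epitope shows low sensitivity but high specificity, which is the same trade-off visible in the LEPS rows.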
Figure 3. The LEPS server. (a) Users can input a query sequence and manually adjust the weight and window size of each propensity. (b) The output information of HIV integrase predicted by LEPS shows 17 candidates, and only 9 candidates were retained after SVM filtration. The final predicted epitope segments are labeled in yellow at the bottom.
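The adjustable weight and window size per propensity suggest a weighted combination of smoothed per-residue profiles. The sketch below is one plausible reading, not the server's documented formula: the two toy scales, the weight normalization, and the function names are all assumptions.

```python
# Toy scales with made-up values, for illustration only.
TOY_HYDROPHILICITY = {"K": 1.0, "D": 1.0, "G": 0.0, "L": -1.0}
TOY_FLEXIBILITY = {"K": 0.5, "D": 0.5, "G": 1.0, "L": 0.0}


def smooth(values, window):
    """Moving average over a centered window, truncated at the sequence ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def combined_profile(seq, propensities):
    """Combine propensities given as (scale, weight, window) tuples.

    Each scale is smoothed with its own window, then the profiles are
    averaged using weights normalized to sum to 1 (an assumed convention).
    """
    total_weight = sum(weight for _, weight, _ in propensities)
    profile = [0.0] * len(seq)
    for scale, weight, window in propensities:
        raw = [scale.get(aa, 0.0) for aa in seq]  # unknown residues score 0
        for i, value in enumerate(smooth(raw, window)):
            profile[i] += weight * value / total_weight
    return profile


profile = combined_profile(
    "KKLL",
    [(TOY_HYDROPHILICITY, 1.0, 1), (TOY_FLEXIBILITY, 3.0, 1)],
)
```

Raising the weight of one propensity pulls the combined profile toward that scale, while a larger window flattens short peaks before they are combined.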
Figure 4. The predicted LEs of HIV integrase mapped onto a simulated 3D structure. The predicted epitopes are labeled in yellow, and the selected epitopes (number 1 and number 3) are shown in yellow spheres.
Figure 5. The experimental and predicted epitopes of 10 kDa chaperonin. The structural surfaces display the true epitopes (a) and predicted epitopes (b) in yellow spheres. The red and blue spheres represent the remainder of the protein. Both figures were created with PyMOL.