| Literature DB >> 22347389 |
Chun-Hung Su, Nikhil R Pal, Ken-Li Lin, I-Fang Chung.
Abstract
BACKGROUND: Identification of amino acid propensities that are strong determinants of linear B-cell epitope is very important to enrich our knowledge about epitopes. This can also help to obtain better epitope prediction. Typical linear B-cell epitope prediction methods combine various propensities in different ways to improve prediction accuracies. However, fewer but better features may yield better prediction. Moreover, for a propensity, when the sequence length is k, there will be k values, which should be treated as a single unit for feature selection and hence usual feature selection method will not work. Here we use a novel Group Feature Selecting Multilayered Perceptron, GFSMLP, which treats a group of related information as a single entity and selects useful propensities related to linear B-cell epitopes, and uses them to predict epitopes. METHODOLOGY/ PRINCIPALEntities:
Mesh:
Substances:
Year: 2012 PMID: 22347389 PMCID: PMC3275595 DOI: 10.1371/journal.pone.0030617
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
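The abstract's central idea is that all k values contributed by one propensity are selected or rejected together. A minimal stdlib-Python sketch of such group gating (the sigmoid gate, the single output node, and all sizes here are illustrative assumptions, not the paper's exact formulation):

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_forward(x, groups, gammas, W, b):
    """Forward pass of a toy one-output network: every feature in group g
    is multiplied by the same gate sigmoid(gammas[g]) before the weighted
    sum, so a whole propensity is kept or dropped as one unit."""
    gated = []
    for g, idxs in enumerate(groups):
        gate = sigmoid(gammas[g])  # one learnable gate per group, not per feature
        gated.extend(x[i] * gate for i in idxs)
    return sigmoid(sum(w * v for w, v in zip(W, gated)) + b)

# two hypothetical propensities, each contributing k = 3 positional values
groups = [[0, 1, 2], [3, 4, 5]]
x = [0.2, 0.5, 0.1, 0.9, 0.4, 0.7]
gammas = [5.0, -5.0]  # group 0 almost fully open, group 1 almost closed
W = [random.uniform(-1, 1) for _ in range(6)]
out = gated_forward(x, groups, gammas, W, b=0.0)
print(round(out, 4))
```

During training, the gate parameters would be learned along with the weights; propensities whose gates stay open are the ones GFSMLP reports as selected.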
The frequency with which each propensity is selected in 1,000 runs of GFSMLP (η = 0.2, μ = 0.1, n = 15, number of iterations = 2,000).
| Propensities \ Data sets | AAP872 | ABCpred | BCPred | Combo |
| 1. Hydrophilicity (Parker) | 315 | 178 | 295 | 386 |
| 2. Accessibility (Emini) | 232 | 214 | 251 | 315 |
| 3. Flexibility (Karplus) | 239 | 162 | 216 | 305 |
| 4. Surface Exposed Scale (Janin) | 149 | 183 | 186 | 250 |
| 5. Polarity (Ponnuswamy) | 397 | 307 | 402 | 498 |
| 6. Turns (Pellequer) | 186 | 147 | 91 | 209 |
| 7. Antigenicity (Kolaskar) | 240 | 206 | 140 | 309 |
| 8. Beta-turn (Chou) | 629 | 553 | 554 | 643 |
Average of misclassification rates over 100 runs of GFSMLP using the selected propensity (η = 0.2, μ = 0, n = 15, number of iterations = 2,000).
| Propensities \ Data sets | AAP872 | ABCpred | BCPred | Combo |
| 1. Hydrophilicity (Parker) | 9.43±0.79 | 5.46±0.77 | 6.63±0.60 | 14.02±0.86 |
| 2. Accessibility (Emini) | 8.87±0.70 | 4.79±0.48 | 6.09±0.49 | 13.24±0.74 |
| 3. Flexibility (Karplus) | 10.01±0.84 | 5.34±0.66 | 6.99±0.74 | 14.50±1.03 |
| 4. Surface Exposed Scale (Janin) | 15.80±2.67 | 6.72±1.05 | 8.68±1.05 | 20.31±1.93 |
| 5. Polarity (Ponnuswamy) | 19.78±1.75 | 16.59±2.97 | 13.07±1.84 | 24.10±2.43 |
| 6. Turns (Pellequer) | 23.49±2.25 | 11.24±2.08 | 16.25±2.64 | 26.37±2.06 |
| 7. Antigenicity (Kolaskar) | 11.00±1.28 | 5.84±0.62 | 8.06±0.96 | 16.96±1.29 |
| 8. Beta-turn (Chou) | 8.40±0.60 | 4.53±0.45 | 6.05±0.52 | 12.67±0.66 |
The accuracy of an SVM using a single propensity, under the two-level 10-fold cross-validation scheme.
| Propensities \ Data sets | AAP872 | ABCpred | BCPred | Combo |
| 1. Hydrophilicity (Parker) | 56.99% | 53.75% | 58.00% | 57.11% |
| 2. Accessibility (Emini) | 57.23% | 56.10% | 58.36% | 57.52% |
| 3. Flexibility (Karplus) | 53.96% | 54.23% | 55.43% | 56.10% |
| 4. Surface Exposed Scale (Janin) | 54.65% | 55.29% | 55.43% | 57.48% |
| 5. Polarity (Ponnuswamy) | 52.76% | 49.67% | 52.86% | 54.20% |
| 6. Turns (Pellequer) | 52.41% | 52.59% | 52.43% | 52.67% |
| 7. Antigenicity (Kolaskar) | 55.50% | 56.28% | 55.29% | 56.43% |
| 8. Beta-turn (Chou) | 56.94% | 56.44% | 58.64% | 59.46% |
The frequency with which each feature/propensity is selected in 100 runs of GFSMLP (η = 0.2, μ = 0.1, n = 15, number of iterations = 2,000).
| Propensities | [1] | [2] | [3] | [4] | [5] | [6] | [7] | [8] | Sum |
| AAP872 | | | | | | | | | |
| [1] Hydrophilicity (Parker) | 100 | 12 | 1 | 16 | 25 | 16 | 12 | 11 | 193 |
| [2] Accessibility (Emini) | 11 | 100 | 13 | 0 | 7 | 15 | 5 | 12 | 163 |
| [3] Flexibility (Karplus) | 4 | 12 | 100 | 13 | 13 | 17 | 12 | 22 | 193 |
| [4] Surface Exposed Scale (Janin) | 19 | 1 | 16 | 100 | 1 | 11 | 11 | 13 | 172 |
| [5] Polarity (Ponnuswamy) | 51 | 29 | 40 | 23 | 100 | 41 | 45 | 38 | 367 |
| [6] Turns (Pellequer) | 11 | 22 | 15 | 21 | 17 | 100 | 17 | 7 | 210 |
| [7] Antigenicity (Kolaskar) | 6 | 9 | 9 | 16 | 9 | 9 | 100 | 13 | 171 |
| [8] Beta-turn (Chou) | 22 | 44 | 25 | 49 | 23 | 25 | 27 | 100 | 315 |
| ABCpred | | | | | | | | | |
| [1] Hydrophilicity (Parker) | 100 | 4 | 0 | 14 | 10 | 11 | 4 | 1 | 144 |
| [2] Accessibility (Emini) | 2 | 100 | 2 | 0 | 1 | 10 | 3 | 4 | 122 |
| [3] Flexibility (Karplus) | 0 | 3 | 100 | 16 | 8 | 16 | 10 | 6 | 159 |
| [4] Surface Exposed Scale (Janin) | 8 | 0 | 5 | 100 | 0 | 11 | 6 | 9 | 139 |
| [5] Polarity (Ponnuswamy) | 30 | 18 | 27 | 13 | 100 | 31 | 33 | 20 | 272 |
| [6] Turns (Pellequer) | 15 | 9 | 8 | 16 | 4 | 100 | 12 | 2 | 166 |
| [7] Antigenicity (Kolaskar) | 4 | 2 | 2 | 5 | 2 | 19 | 100 | 3 | 137 |
| [8] Beta-turn (Chou) | 16 | 28 | 13 | 39 | 22 | 18 | 23 | 100 | 259 |
| BCPred | | | | | | | | | |
| [1] Hydrophilicity (Parker) | 100 | 7 | 1 | 12 | 18 | 21 | 9 | 3 | 171 |
| [2] Accessibility (Emini) | 4 | 100 | 8 | 0 | 5 | 18 | 7 | 9 | 151 |
| [3] Flexibility (Karplus) | 0 | 6 | 100 | 8 | 16 | 14 | 7 | 10 | 161 |
| [4] Surface Exposed Scale (Janin) | 8 | 1 | 9 | 100 | 1 | 13 | 10 | 13 | 155 |
| [5] Polarity (Ponnuswamy) | 41 | 32 | 43 | 19 | 100 | 33 | 39 | 33 | 340 |
| [6] Turns (Pellequer) | 16 | 15 | 12 | 19 | 9 | 100 | 18 | 1 | 190 |
| [7] Antigenicity (Kolaskar) | 6 | 4 | 2 | 10 | 14 | 12 | 100 | 11 | 159 |
| [8] Beta-turn (Chou) | 15 | 27 | 20 | 34 | 25 | 21 | 27 | 100 | 269 |
| Combo | | | | | | | | | |
| [1] Hydrophilicity (Parker) | 100 | 26 | 11 | 36 | 25 | 20 | 24 | 18 | 260 |
| [2] Accessibility (Emini) | 19 | 100 | 12 | 2 | 10 | 18 | 15 | 23 | 199 |
| [3] Flexibility (Karplus) | 4 | 12 | 100 | 22 | 30 | 19 | 18 | 16 | 221 |
| [4] Surface Exposed Scale (Janin) | 16 | 3 | 25 | 100 | 5 | 20 | 20 | 26 | 215 |
| [5] Polarity (Ponnuswamy) | 52 | 48 | 51 | 39 | 100 | 48 | 59 | 43 | 440 |
| [6] Turns (Pellequer) | 19 | 26 | 25 | 26 | 11 | 100 | 18 | 11 | 236 |
| [7] Antigenicity (Kolaskar) | 11 | 17 | 18 | 26 | 17 | 23 | 100 | 22 | 234 |
| [8] Beta-turn (Chou) | 35 | 53 | 29 | 52 | 39 | 34 | 47 | 100 | 389 |
The accuracy of an SVM using pairs of propensities, under two-level 10-fold cross-validation.
| Propensities \ Data sets | AAP872 | ABCpred | BCPred | Combo |
| Propensities: 1 & 2 | 58.49% | 54.57% | 59.93% | 62.21% |
| Propensities: 1 & 3 | 56.42% | 53.01% | 56.43% | 60.06% |
| Propensities: 1 & 4 | 57.05% | 55.38% | 59.57% | 62.20% |
| Propensities: 1 & 5 | 58.43% | 54.16% | 59.07% | 60.67% |
| Propensities: 1 & 6 | 57.28% | 54.48% | 57.57% | 57.52% |
| Propensities: 1 & 7 | 56.94% | 56.60% | 57.50% | 59.86% |
| Propensities: 1 & 8 | 57.86% | 57.82% | 59.79% | 60.92% |
| Propensities: 2 & 3 | 56.60% | 55.46% | 60.29% | 60.47% |
| Propensities: 2 & 4 | 55.45% | 55.38% | 56.71% | 59.38% |
| Propensities: 2 & 5 | 55.22% | 54.57% | 55.64% | 58.77% |
| Propensities: 2 & 6 | 57.12% | 56.70% | 58.21% | 59.25% |
| Propensities: 2 & 7 | 56.76% | 57.26% | 56.14% | 60.95% |
| Propensities: 2 & 8 | 57.28% | 60.02% | 61.71% | 62.90% |
| Propensities: 3 & 4 | 55.34% | 55.45% | 58.50% | 60.71% |
| Propensities: 3 & 5 | 56.26% | 52.06% | 57.50% | 59.38% |
| Propensities: 3 & 6 | 54.82% | 54.89% | 56.57% | 58.36% |
| Propensities: 3 & 7 | 53.84% | 57.01% | 56.64% | 58.40% |
| Propensities: 3 & 8 | 57.57% | 58.30% | 59.21% | 60.27% |
| Propensities: 4 & 5 | 52.64% | 54.82% | 53.71% | 56.91% |
| Propensities: 4 & 6 | 54.36% | 55.39% | 57.07% | 57.84% |
| Propensities: 4 & 7 | 56.71% | 57.59% | 55.29% | 59.78% |
| Propensities: 4 & 8 | 58.03% | 59.29% | 62.07% | 62.53% |
| Propensities: 5 & 6 | 53.33% | 52.55% | 54.14% | 54.69% |
| Propensities: 5 & 7 | 55.85% | 57.02% | 57.50% | 58.04% |
| Propensities: 5 & 8 | 58.32% | 57.90% | 60.29% | 61.00% |
| Propensities: 6 & 7 | 57.11% | 55.04% | 56.14% | 57.40% |
| Propensities: 6 & 8 | 56.99% | 58.72% | 58.07% | 58.41% |
| Propensities: 7 & 8 | 57.39% | 58.30% | 60.29% | 60.59% |
Figure 1. Encoding scheme for calculating correlations between pairs of amino acid propensities.
Figure 2. GFSMLP network structure.
Eight amino acid propensities are used in the input layer. Each propensity contributes 20 normalized amino acid values, so the input is 160-dimensional. The 20 values corresponding to a particular propensity are treated as a group. The algorithm selects one or more propensities and evaluates their performance. After training, the GFSMLP reports the most useful propensity or propensities for classifying the input peptide sequence as epitope or non-epitope at the output layer.
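This grouped encoding can be sketched as follows; the scale values below are made-up placeholders for illustration, NOT the published Parker or Emini numbers, and only two of the paper's eight scales are shown:

```python
# Illustrative propensity scales (placeholder values, five residues only).
PARKER_HYDRO = {"A": 0.31, "C": 0.12, "D": 0.85, "G": 0.44, "K": 0.97}
EMINI_ACCESS = {"A": 0.49, "C": 0.26, "D": 0.81, "G": 0.51, "K": 0.95}
SCALES = [PARKER_HYDRO, EMINI_ACCESS]  # the paper uses 8 such scales

def encode(peptide):
    """Map each residue through every scale; the values from one scale
    stay contiguous, so they form one selectable group of features."""
    features = []
    for scale in SCALES:
        features.extend(scale[aa] for aa in peptide)
    return features

vec = encode("ADKGC")
print(len(vec))  # 2 scales x 5 residues = 10 features (8 x 20 = 160 in the paper)
```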
Figure 3. Scheme of the two-level 10-fold cross-validation.
The data set is partitioned into 10 parts (folds) in the outer loop. One fold is held out for testing the SVM, and the remaining nine folds form the training set. In the inner loop, the training set is further divided into 10 folds to choose the optimal SVM parameters; the tuned SVM is then evaluated on the fold held out in the outer loop. The procedure is repeated 10 times so that each fold serves once as the test set.
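The nested scheme can be sketched in stdlib Python with a toy thresholding classifier standing in for the SVM; the data, the classifier, and the candidate-parameter grid are all illustrative assumptions:

```python
import random

random.seed(1)

def make_folds(n, k=10):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def accuracy(samples, thr):
    # Toy stand-in for the trained SVM: predict positive when score > thr.
    return sum((x > thr) == y for x, y in samples) / len(samples)

def inner_score(train, thr, k=10):
    # Inner 10-fold loop: estimate a parameter's quality using only the training set.
    scores = []
    for fold in make_folds(len(train), k):
        val = [train[i] for i in fold]
        scores.append(accuracy(val, thr))
    return sum(scores) / len(scores)

def nested_cv(data, candidate_thrs, k=10):
    outer_accs = []
    for fold in make_folds(len(data), k):
        fold_set = set(fold)
        test = [data[i] for i in fold]
        train = [d for i, d in enumerate(data) if i not in fold_set]
        # Inner loop chooses the parameter; the held-out fold never influences it.
        best = max(candidate_thrs, key=lambda t: inner_score(train, t))
        outer_accs.append(accuracy(test, best))
    return sum(outer_accs) / len(outer_accs)

# Synthetic data: score x with label (x > 0.6), so thr = 0.6 is ideal.
data = [(x, x > 0.6) for x in (random.random() for _ in range(200))]
acc = nested_cv(data, candidate_thrs=[0.2, 0.4, 0.6, 0.8])
print(round(acc, 3))
```

Keeping the parameter search strictly inside the inner loop is what makes the outer-loop accuracy an unbiased estimate; tuning on the test fold would inflate it.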