Edgar Kozlova, Benjamin Viart, Ricardo de Avila, Liza Felicori, Carlos Chavez-Olortegui.
Abstract
BACKGROUND: The humoral immune response relies on the interaction between antibodies and antigens to clear pathogens and foreign molecules. These proteins interact at specific positions known as antigenic determinants or B-cell epitopes. Identifying epitopes experimentally is costly and time-consuming. In silico methods that help discover new epitopes are therefore an appealing alternative, given their importance for biomedical applications such as vaccine design, disease diagnostics, antivenoms and immunotherapeutics. However, prediction performance remains suboptimal, at around 70% accuracy. Further research could improve our understanding of the biochemical and structural properties that characterize a B-cell epitope.
Year: 2015 PMID: 26696329 PMCID: PMC4686779 DOI: 10.1186/1471-2105-16-S19-S7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Dataset composition
| Groups | Proteins | Epitopes | Non-epitopes |
|---|---|---|---|
| UniProt | 544996 | -- | -- |
| Neurotoxin | 16 | 29 | 0 |
| Metalloproteinase | 11 | 70 | 0 |
| Negative examples | 13 | 0 | 49 |
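The labelled part of the dataset is imbalanced: 29 + 70 = 99 epitopes against 49 non-epitopes. This imbalance matters when reading accuracy figures. The following minimal sketch shows the arithmetic. It assumes that each listed epitope or non-epitope counts as one labelled example; the variable names are illustrative only.

```python
# Counts copied from the dataset composition table (labelled examples only).
epitopes = {"Neurotoxin": 29, "Metalloproteinase": 70}
non_epitopes = {"Negative examples": 49}

n_pos = sum(epitopes.values())      # labelled epitopes
n_neg = sum(non_epitopes.values())  # labelled non-epitopes
n_total = n_pos + n_neg

# Accuracy of a trivial classifier that always predicts the majority class.
majority_baseline = max(n_pos, n_neg) / n_total

print(n_pos, n_neg, n_total)        # 99 49 148
print(round(majority_baseline, 3))  # 0.669
```

Suppose a model were scored on all 148 labelled examples. It could then reach an accuracy of about 0.67 without separating the classes at all. A threshold-free metric such as AUC is a useful companion in that setting.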
Metalloproteinase and neurotoxin epitopes differed significantly from each other (95% confidence level) in their content of the amino acids R, K, M and Y (Table 2, column 1). Each epitope set was also compared with its parent proteins. Metalloproteinase epitopes differed from metalloproteinases in R, Q, V and M (Table 2, column 4), and neurotoxin epitopes differed from neurotoxins in D and C (Table 2, column 5).
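These comparisons rely on Welch's two-sample t-test, which does not assume equal variances in the two groups. The sketch below applies the test to a per-sequence composition feature. It assumes the compared quantity is the relative frequency of a residue in each sequence; the source does not spell this out. The example sequences are hypothetical placeholders, not the study's epitopes. The p-value comes from numerical (Simpson's rule) integration of the t density rather than from a statistics library.

```python
import math
from statistics import mean, variance


def residue_fraction(seq, residue):
    """Relative frequency of one residue in a sequence (assumed feature definition)."""
    return seq.count(residue) / len(seq)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * integral of the density over [0, |t|] (Simpson's rule, even steps)."""
    h = abs(t) / steps
    s = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (s * h / 3)


# Hypothetical placeholder sequences, NOT the study's data.
group_1 = ["RKAYR", "KRMYT", "RGKYS", "MKRYA"]
group_2 = ["DCSGR", "CDSGN", "GCDTS", "SCRGA"]
a = [residue_fraction(s, "R") for s in group_1]
b = [residue_fraction(s, "R") for s in group_2]
t, df = welch_t(a, b)
print(f"t = {t:.3f}, df = {df:.2f}, p = {two_sided_p(t, df):.4f}")
```

Where SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test and should give matching results.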
Analysis of means for all datasets with Welch's two-sample t-test (p-values, 95% confidence level)
| Parameter | (1) ME vs NE | (2) Random vs ME | (3) Random vs NE | (4) MP vs ME | (5) NP vs NE |
|---|---|---|---|---|---|
| R (Arg) | 0.0762 | 0.4226 | |||
| H (His) | 0.1046 | 0.1074 | 0.5636 | 0.7906 | |
| K (Lys) | 0.4098 | 0.4818 | |||
| D (Asp) | 0.0890 | 0.6994 | 0.7091 | ||
| E (Glu) | 0.9289 | 0.2681 | 0.0838 | 0.6696 | 0.4072 |
| S (Ser) | 0.2953 | 0.5024 | 0.3546 | 0.9630 | 0.8954 |
| T (Thr) | 0.4077 | 0.1867 | 0.3509 | 0.2199 | 0.4523 |
| N (Asn) | 0.1878 | 0.7647 | 0.5880 | 0.4944 | |
| Q (Gln) | 0.1509 | 0.9483 | 0.8471 | 0.8185 | |
| C (Cys) | 0.1821 | ||||
| G (Gly) | 0.6979 | 0.2576 | 0.4620 | 0.3509 | 0.8450 |
| P (Pro) | 0.3156 | 0.5165 | 0.3781 | 0.2103 | 0.4271 |
| A (Ala) | 0.2121 | 0.1092 | 0.0756 | ||
| V (Val) | 0.0993 | 0.2903 | 0.0550 | 0.1854 | |
| I (Ile) | 0.2657 | 0.0352 | 0.1286 | 0.3275 | |
| L (Leu) | 0.1374 | 0.1182 | 0.5549 | 0.2322 | |
| M (Met) | 0.0725 | 0.2477 | |||
| F (Phe) | 0.6997 | 0.4713 | 0.0765 | 0.7890 | 0.5818 |
| Y (Tyr) | 0.5245 | 0.8318 | 0.0938 | ||
| W (Trp) | 0.0889 | 0.9443 | 0.5782 | 0.1221 | |
| Isoelectric point | 0.0425 | 0.5190 | 0.5190 | 0.3221 | |
| GRAVY | 0.0672 | 0.0672 | |||
| Aliphatic index | 0.8550 | ||||
P-values below 0.05 are written in bold. Confidence level = 95%; H0: the difference in means is zero; H1: the difference in means is not zero. ME = metalloproteinase epitopes, NE = neurotoxin epitopes, MP = metalloproteinase proteins, NP = neurotoxin proteins, Random = random sequences.
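Two of the global descriptors in the table have standard definitions. GRAVY (grand average of hydropathy) is the mean Kyte-Doolittle hydropathy over all residues. Ikai's aliphatic index weights the mole percentages of Ala, Val, Ile and Leu. The following is a minimal sketch of both. It assumes sequences are uppercase strings of the 20 standard residues. The source does not say which tool produced its values, so results may differ slightly from the authors'.

```python
# Kyte-Doolittle hydropathy values, used for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), X in mole percent."""
    def mole_pct(aa):
        return 100 * seq.count(aa) / len(seq)

    return mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * (mole_pct("I") + mole_pct("L"))


print(gravy("ARND"), aliphatic_index("AVIL"))
```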
Performance of all data mining methods, shown as AUC and accuracy.
| Method | PCP AUC | PCP accuracy | PSS AUC | PSS accuracy | PCP+PSS AUC | PCP+PSS accuracy |
|---|---|---|---|---|---|---|
| SVM | 1 | 1 | 1 | 1 | 1 | 1 |
| MLR | 0.986 | 0.952 | 0.655 | 0.714 | 1 | 1 |
| DT | 0.957 | 0.962 | 0.921 | 0.943 | 0.943 | 0.952 |
| NB | 0.8 | 0.838 | 0.521 | 0.667 | 0.793 | 0.838 |
| KM | 0.493 | 0.667 | 0.509 | 0.681 | 0.507 | 0.667 |
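The table reports two metrics per feature set. The following minimal sketch computes both. It assumes binary labels (1 = epitope, 0 = non-epitope) and one real-valued score per example. The 0.5 decision threshold for accuracy is also an assumption; the source does not say how predictions were thresholded. AUC is computed in its rank (Mann-Whitney) form, which is equivalent to the area under the ROC curve.

```python
def auc(scores, labels):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)


# A constant scorer: AUC 0.5, but accuracy equal to the majority-class share.
const_scores, toy_labels = [1.0, 1.0, 1.0], [1, 1, 0]
print(auc(const_scores, toy_labels), accuracy(const_scores, toy_labels))
```

The toy example shows why the two metrics can disagree. A model with no ranking ability scores an AUC of 0.5 but can still reach a moderate accuracy on an imbalanced set.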
Top eight of the 39 properties used by the classification models, in rank order.
| Classification Model: Multiple Linear Regression | |||
|---|---|---|---|
| 1st | Statistic of N | Z-fit | Statistic of E |
| 2nd | Statistic of Q | ASA | Statistic of C atoms |
| 3rd | Statistic of S | RSA | Statistic of N |
| 4th | Statistic of T | Strand index | Statistic of Q |
| 5th | Uncharged STNQ | Helix index | Statistic of S |
| 6th | Special CGP | Coil index | Statistic of T |
| 7th | Statistic of H atoms | -- | Uncharged STNQ |
| 8th | Statistic of C atoms | -- | Statistic of H atoms |
| 1st | Statistic of K | Z-fit | Statistic of K |
| 2nd | Statistic of D | RSA | Statistic of D |
| 3rd | Statistic of M | ASA | Statistic of M |
| 4th | Statistic of S atoms | Strand index | Statistic of S atoms |
| 5th | Statistic of I | Coil index | Statistic of I |
| 6th | Statistic of W | -- | Statistic of W |
| 7th | Statistic of Y | -- | Coil index |
| 8th | Isoelectric point | -- | -- |
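The ranking names several sequence-derived descriptors. The following minimal sketch computes two of them under stated assumptions. "Uncharged STNQ" and "Special CGP" are taken to be the fraction of residues belonging to each group. The isoelectric point is estimated by bisection on the Henderson-Hasselbalch net charge, using an illustrative pKa set. The source confirms neither definition, and other published pKa scales shift pI values.

```python
# Residue groups named in the ranking table (fraction-of-residues definition is assumed).
GROUPS = {"Uncharged STNQ": "STNQ", "Special CGP": "CGP"}

# Illustrative pKa values (assumption); other published scales give different pI values.
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def group_fraction(seq, residues):
    """Fraction of positions in seq occupied by any residue in `residues`."""
    return sum(seq.count(aa) for aa in residues) / len(seq)


def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of a linear peptide at the given pH."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free amino and one free carboxyl terminus
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POS.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEG.items())
    return pos - neg


def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """pH of zero net charge, found by bisection (net charge decreases as pH rises)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


peptide = "STNQCGPA"  # hypothetical example sequence
features = {name: group_fraction(peptide, res) for name, res in GROUPS.items()}
features["Isoelectric point"] = isoelectric_point(peptide)
print(features)
```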