Shen Jean Lim, Joo Chuan Tong, Fook Tim Chew, Martti T Tammi.
Abstract
BACKGROUND: Bioinformatics tools are commonly used for assessing potential protein allergenicity. While these methods achieve good accuracy for highly conserved sequences, they are less effective when overall sequence similarity is low. In this study, we assessed the feasibility of using position-specific scoring matrices as a basis for predicting potential allergenicity in proteins.
Year: 2008 PMID: 19091021 PMCID: PMC2638161 DOI: 10.1186/1471-2105-9-S12-S21
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Prediction quality of the profile-based methods
| Method | E-value threshold | False negatives | False positives | Accuracy (%) | Specificity (%) | Sensitivity (%) | PPV (%) | NPV (%) | MCC |
|---|---|---|---|---|---|---|---|---|---|
| General profile | 10 | 11 | 1792 | 21.67 | 10.38 | 96.42 | 13.99 | 94.71 | 0.08 |
| General profile | 1 | 22 | 737 | 67.03 | 63.17 | 92.58 | 27.74 | 98.25 | 0.38 |
| General profile | 10⁻¹ | 31 | 298 | 85.72 | 85.11 | 89.80 | 48.45 | 98.22 | 0.59 |
| General profile | 10⁻² | 36 | 184 | 90.42 | 90.80 | 87.95 | 60.22 | 98.03 | 0.68 |
| General profile | 10⁻³ | 41 | 137 | 92.27 | 93.14 | 86.56 | 66.70 | 97.87 | 0.72 |
| General profile | 10⁻⁴ | 43 | 108 | 93.44 | 94.61 | 85.70 | 71.76 | 97.77 | 0.75 |
| General profile | 10⁻⁶ | 48 | 83 | 94.32 | 95.87 | 84.04 | 76.53 | 97.55 | 0.77 |
| General profile | 10⁻⁹ | 53 | 62 | 95.02 | 96.92 | 82.45 | 81.34 | 97.34 | 0.79 |
| Group-specific profiles | 10 | 14 | 1801 | 21.14 | 9.92 | 95.43 | 13.79 | 93.48 | 0.06 |
| Group-specific profiles | 1 | 22 | 748 | 66.53 | 62.58 | 92.72 | 27.33 | 98.27 | 0.38 |
| Group-specific profiles | 10⁻¹ | 29 | 317 | 84.99 | 84.17 | 90.40 | 46.88 | 98.31 | 0.58 |
| Group-specific profiles | 10⁻² | 34 | 202 | 89.76 | 89.89 | 88.87 | 57.87 | 98.16 | 0.66 |
| Group-specific profiles | 10⁻³ | 37 | 151 | 91.83 | 92.44 | 87.81 | 64.73 | 98.04 | 0.71 |
| Group-specific profiles | 10⁻⁴ | 40 | 124 | 92.86 | 93.79 | 86.69 | 68.87 | 97.90 | 0.73 |
| Group-specific profiles | 10⁻⁶ | 44 | 94 | 94.02 | 95.31 | 85.50 | 74.51 | 97.75 | 0.76 |
| Group-specific profiles | 10⁻⁹ | 48 | 70 | 94.88 | 96.52 | 84.04 | 79.49 | 97.56 | 0.79 |
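The quality measures reported in these tables (accuracy, specificity, sensitivity, PPV, NPV and the Matthews correlation coefficient) can all be derived from the four confusion-matrix counts. The following is a minimal sketch using the standard textbook definitions; the function name `prediction_quality` and the example counts are hypothetical and are not taken from the study.

```python
import math


def _pct(numerator, denominator):
    """Percentage, or 0.0 when the denominator is zero."""
    return 100.0 * numerator / denominator if denominator else 0.0


def prediction_quality(tp, fn, tn, fp):
    """Compute standard binary-classification measures from confusion counts.

    tp/fn: allergens predicted correctly / missed
    tn/fp: non-allergens predicted correctly / wrongly flagged
    Percent-valued measures are returned on a 0-100 scale; MCC lies in [-1, 1].
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _pct(tp + tn, tp + fn + tn + fp),
        "specificity": _pct(tn, tn + fp),
        "sensitivity": _pct(tp, tp + fn),
        "ppv": _pct(tp, tp + fp),
        "npv": _pct(tn, tn + fn),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only for illustration.
    for name, value in prediction_quality(tp=80, fn=20, tn=900, fp=100).items():
        print(f"{name}: {value:.2f}")
```

Because the non-allergen class is much larger than the allergen class, accuracy alone can look high even when sensitivity is poor, which is why MCC is a useful single-number summary here.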
Average prediction quality of the group-specific profiles. Performance of group-specific profile models at an E-value threshold of 10⁻⁹.
| Allergen group | Accuracy (%) | Specificity (%) | Sensitivity (%) | PPV (%) | NPV (%) | MCC |
|---|---|---|---|---|---|---|
| Animal | 86.01 | 87.55 | 65.39 | 27.09 | 97.26 | 0.36 |
| Food | 69.63 | 63.89 | 83.22 | 48.85 | 90.37 | 0.43 |
| Weed | 77.79 | 78.44 | 69.33 | 18.70 | 97.26 | 0.27 |
| Insect | 87.20 | 87.82 | 82.08 | 44.05 | 97.71 | 0.54 |
| Mite | 95.29 | 95.80 | 90.81 | 68.48 | 99.02 | 0.76 |
| Grass | 87.81 | 87.91 | 87.16 | 49.27 | 98.25 | 0.59 |
| Tree | 82.10 | 81.56 | 86.88 | 35.85 | 98.14 | 0.48 |
| Fungi | 80.50 | 80.82 | 78.17 | 35.77 | 96.51 | 0.44 |
| Other | 82.50 | 83.62 | 61.13 | 17.29 | 97.55 | 0.26 |
Comparison of the performance between the profile-based methods and existing allergenicity prediction systems
| Method | Accuracy (%) | Specificity (%) | Sensitivity (%) | PPV (%) | NPV (%) | MCC |
|---|---|---|---|---|---|---|
| General profile model | 95.02 | 96.92 | 82.45 | 81.34 | 97.34 | 0.79 |
| Group-specific profile model | 94.88 | 96.52 | 84.04 | 79.49 | 97.56 | 0.79 |
| FAO/WHO | 31.58 | 23.31 | 86.36 | 14.55 | 91.83 | 0.08 |
| SVM (global description) | 93.01 | 95.40 | 77.22 | 59.02 | 96.52 | 0.71 |
| SVM (aa composition) | 61.77 | 57.61 | 89.33 | 24.14 | 97.28 | 0.32 |
| SVM (dipeptide composition) | 61.73 | 57.55 | 89.40 | 24.12 | 92.29 | 0.32 |
| MEME/MAST motifs | 86.84 | 99.75 | 1.26 | 31.59 | 87.00 | 0.04 |
| ARP | 61.55 | 57.45 | 88.74 | 18.92 | 97.12 | 0.31 |
Figure 1. Distribution of the allergen data used in this study.
Figure 2. General strategy of the profile-based method. The query protein is searched with RPS-BLAST against a searchable database of allergen profiles generated by PSI-BLAST. Query sequences that produce hits satisfying the specified E-value threshold (that is, hits with an E-value at or below the cutoff) are predicted to be potential allergens.
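The decision rule in this strategy can be sketched as a small post-processing step over the search results. The sketch below assumes the RPS-BLAST search was run with BLAST+ tabular output (`-outfmt 6`), in which the E-value is the 11th tab-separated column; the function name `predict_allergens` and the sample hit lines are hypothetical, and the source does not state how the search output was actually parsed.

```python
def predict_allergens(tabular_lines, evalue_cutoff=1e-9):
    """Return the set of query IDs with at least one profile hit
    whose E-value is at or below the cutoff.

    tabular_lines: iterable of lines in BLAST+ tabular format (-outfmt 6);
    column 1 is the query ID and column 11 is the E-value (assumption:
    default column layout).
    """
    predicted = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= evalue_cutoff:
            predicted.add(query_id)
    return predicted


if __name__ == "__main__":
    # Hypothetical hits: query, profile, %id, length, mismatches, gaps,
    # q.start, q.end, s.start, s.end, E-value, bit score.
    sample = [
        "query_A\tprofile_1\t45.0\t120\t60\t2\t1\t120\t5\t124\t2e-15\t80.1",
        "query_B\tprofile_7\t28.0\t90\t62\t3\t10\t99\t1\t90\t0.5\t25.4",
    ]
    print(sorted(predict_allergens(sample)))
```

The default cutoff of 10⁻⁹ mirrors the strictest threshold reported in the tables; loosening it would flag more queries, raising sensitivity at the cost of specificity.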
Figure 3. Schematic representation of how allergen profiles are constructed in this study. The development of this approach consists of A) a preliminary screening step and B) an optimization step.