Hon Cheng Muh, Joo Chuan Tong, Martti T Tammi.
Abstract
Allergy is a major health problem in industrialized countries. The number of transgenic food crops is growing rapidly, creating the need for allergenicity assessment before they are introduced into the human food chain. While existing bioinformatic methods have achieved good accuracies for highly conserved sequences, the discrimination of allergens and non-allergens from allergen-like non-allergen sequences remains difficult. We describe AllerHunter, a web-based computational system for the assessment of potential allergenicity and allergic cross-reactivity in proteins. It combines an iterative pairwise sequence similarity encoding scheme with a support vector machine (SVM) as the discriminating engine. The pairwise vectorization framework allows the system to model essential features in allergens that are involved in cross-reactivity, without being limited to distinct sets of physicochemical properties. The system was rigorously trained and tested using 1,356 known allergen and 13,449 putative non-allergen sequences. Extensive testing was performed for validation of the prediction models. The system is effective for distinguishing allergens and non-allergens from allergen-like non-allergen sequences. Testing results showed that AllerHunter, with a sensitivity of 83.4% and specificity of 96.4% (accuracy = 95.3%, area under the receiver operating characteristic curve AROC = 0.928 ± 0.004 and Matthews correlation coefficient MCC = 0.738), performs significantly better than a number of existing methods using an independent dataset of 1,443 protein sequences. AllerHunter is available at http://tiger.dbs.nus.edu.sg/AllerHunter.
Year: 2009 PMID: 19516900 PMCID: PMC2689655 DOI: 10.1371/journal.pone.0005861
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. An iterative pairwise sequence similarity training scheme used for constructing a protein's feature vector.
The feature vector for a particular protein X is F_X = (f_X1, f_X2, …, f_Xn), where n is the total number of allergens in the training dataset and f_Xi is the Smith-Waterman alignment score of sequence X against the i-th allergen in the training dataset.
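The pairwise encoding in the caption above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scoring parameters (match = 2, mismatch = -1, linear gap = -2) and the toy sequences are assumptions chosen for readability, whereas a real system would normally use a substitution matrix and affine gap penalties. The function names `smith_waterman_score` and `feature_vector` are hypothetical.

```python
from typing import List, Sequence


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of a against b.

    Simplified Smith-Waterman with a linear gap penalty; the scoring
    values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for ca in a:
        curr = [0]  # first column is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def feature_vector(x: str, training_allergens: Sequence[str]) -> List[int]:
    """Encode sequence x as its alignment scores against every training allergen."""
    return [smith_waterman_score(x, allergen) for allergen in training_allergens]


# Toy sequences, purely hypothetical and far shorter than real proteins.
training_allergens = ["MKTLLV", "GAVLIK", "MKTAAV"]
vector = feature_vector("MKTLLV", training_allergens)
print(vector)  # one score per training allergen
```

Each protein, allergen or not, is mapped to a vector whose length equals the number of training allergens, so every sequence lands in the same fixed-dimensional space that a classifier such as an SVM can consume.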
Comparison of the performances between SVM-pairwise and state-of-the-art techniques using an independent dataset of 1,443 sequences.
| Method | SE (%) | ACC (%) | SP, all (%) | SP, all APNs (%) | MCC |
|---|---|---|---|---|---|
| FAO/WHO | 97.8 | 20.9 | 27.9 | 0.03 | 0.001 |
| AlgPred | 92.2 | 46.4 | 75.9 | 28.1 | 0.201 |
| DASARP | 91.0 | 94.3 | 85.9 | 33.2 | 0.298 |
| APPEL | 81.4 | 92.7 | 96.4 | 89.6 | 0.641 |
| SVM-pairwise | 83.7 | 95.3 | 96.4 | 98.3 | 0.738 |
SE: sensitivity; ACC: accuracy; SP: specificity; MCC: Matthews correlation coefficient. The specificity (SP) of the system was assessed using i) all putative non-allergens, ii) allergen-like putative non-allergens (APN), iii) the 100 APN sequences with the lowest E-values, and iv) divergent putative non-allergens (DPN), respectively.
Figure 2. Comparison of SVM-pairwise's performance against existing systems.
A cumulative plot of specificity against log(E-value) is shown, indicating that SVM-pairwise is more capable of differentiating allergen-like non-allergens than other reported systems.