
Allermatch, a webtool for the prediction of potential allergenicity according to current FAO/WHO Codex alimentarius guidelines.

Mark W E J Fiers, Gijs A Kleter, Herman Nijland, Ad A C M Peijnenburg, Jan Peter Nap, Roeland C H J van Ham.   

Abstract

BACKGROUND: Novel proteins entering the food chain, for example by genetic modification of plants, have to be tested for allergenicity. Allermatch (http://allermatch.org) is a webtool for the efficient and standardized prediction of potential allergenicity of proteins and peptides according to the current recommendations of the FAO/WHO Expert Consultation, as outlined in the Codex alimentarius. DESCRIPTION: A query amino acid sequence is compared with all known allergenic proteins retrieved from the protein databases using a sliding window approach. This identifies stretches of 80 amino acids with more than 35% similarity or small identical stretches of at least six amino acids. The outcome of the analysis is presented in a concise format. The predictive performance of the FAO/WHO criteria is evaluated by screening sets of allergens and non-allergens against the Allermatch databases. Besides correct predictions, both methods are shown to generate false positive and false negative hits and the outcomes should therefore be combined with other methods of allergenicity assessment, as advised by the FAO/WHO.
CONCLUSIONS: Allermatch provides an accessible, efficient, and useful webtool for analysis of potential allergenicity of proteins introduced in genetically modified food prior to market release that complies with current FAO/WHO guidelines.

Year:  2004        PMID: 15373946      PMCID: PMC522748          DOI: 10.1186/1471-2105-5-133

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


Background

The safety of genetically engineered foods must be assessed before authorities in most nations will consider granting market approval. An important issue in current food safety assessment is the evaluation of the potential allergenicity of food derived from biotechnology. Since many food allergens are proteins, introduction of a new ("foreign") protein in food by genetic engineering can in theory cause allergic reactions. Therefore the allergenicity of novel proteins needs to be assessed. Potential allergenicity of a protein is a complex issue and various tests can be used for prediction, including bioinformatics, in vitro digestibility and binding of antisera from allergic patients. A step-by-step procedure to assess allergenicity is described by the Codex alimentarius and the FAO/WHO consultation group [1,2]. An important step in this procedure is to use bioinformatics to determine whether the primary structure (amino acid sequence) of a given transgenic protein is sufficiently similar to sequences of known allergenic proteins. The recommended procedure [1] to establish the possibility of allergenicity is to: (1) Obtain the amino acid sequences of known allergens from protein databases in FASTA format (using the amino acids from the mature proteins only, disregarding the leader sequences, if any). (2) Prepare the complete set of 80-amino acid length sequences derived from the query protein (again disregarding the leader sequence, if any). (3) Compare each of the sequences of (2) with all sequences of (1), using the program FASTA [3] with default settings for gap penalty and extension. According to the Codex alimentarius [2], potential allergenicity should be considered when there is either: (a) More than 35% similarity over a window of 80 amino acids of the query protein with a known allergen. (b) A stretch of identity of 6 to 8 contiguous amino acids. This procedure is described in more detail by the expert consultation and the Codex alimentarius.
Potential allergenicity requires further testing of the protein with panels of patient sera and possibly animal exposure tests [1,2].
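The two criteria (a) and (b) above can be read as a simple decision rule. The following is a minimal sketch of that rule only, not the Allermatch implementation; the function name `meets_codex_criteria` and its parameters are hypothetical, and the inputs (the best 80 aa window similarity and the longest identical stretch) are assumed to have been computed elsewhere.

```python
def meets_codex_criteria(best_window_similarity: float,
                         longest_identical_stretch: int,
                         similarity_cutoff: float = 35.0,
                         min_stretch: int = 6) -> bool:
    """Return True if either criterion (a) or (b) is met.

    best_window_similarity: highest % similarity found over any 80 aa window.
    longest_identical_stretch: longest run of contiguous identical residues
        shared with a known allergen.
    min_stretch: hypothetical parameter; the text allows 6 to 8 residues.
    """
    criterion_a = best_window_similarity > similarity_cutoff  # "more than 35%"
    criterion_b = longest_identical_stretch >= min_stretch
    return criterion_a or criterion_b


print(meets_codex_criteria(36.0, 0))   # criterion (a) met
print(meets_codex_criteria(10.0, 6))   # criterion (b) met
print(meets_codex_criteria(35.0, 5))   # neither met
```

A positive outcome of this rule only indicates that further testing is warranted; it is not a verdict on allergenicity.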

Construction and content

Three allergen databases were created: one derived from SwissProt [4], one from the WHO-IUIS allergen list [5], and a third that is a non-redundant combination of the other two. The databases were created by extracting all proteins from public databases: SwissProt (version 44.1, July 5 2004, [4]), PIR [6] and GenPept. Leader sequences were, if annotated, trimmed from the sequence. The SwissProt allergen list contains 334 mature protein sequences, while the WHO-IUIS allergen list (version June 7, 2004) contains 632 sequences (correcting for three internal duplications). These two databases share 236 duplicate entries. The non-redundant combined database therefore contains 730 sequences (Figure 1).
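The size of the combined database follows from the counts above (334 + 632 - 236 = 730). As an illustration, a non-redundant merge could be sketched as below. This assumes that "duplicate" means an identical mature sequence, which the text does not specify; the function names and the toy entries are hypothetical.

```python
def combine_nonredundant(*databases):
    """Merge lists of (accession, sequence) pairs, keeping the first
    entry seen for each distinct sequence.

    Assumption: two entries are duplicates when their sequences are identical.
    """
    seen = set()
    combined = []
    for database in databases:
        for accession, sequence in database:
            if sequence not in seen:
                seen.add(sequence)
                combined.append((accession, sequence))
    return combined


def expected_union_size(size_a: int, size_b: int, shared: int) -> int:
    """Inclusion-exclusion count for two overlapping sets."""
    return size_a + size_b - shared


# Toy data (not real allergen entries).
swissprot_like = [("A1", "MKTLL"), ("A2", "GAVLI")]
iuis_like = [("B1", "GAVLI"), ("B2", "PFWYC")]
print(combine_nonredundant(swissprot_like, iuis_like))
print(expected_union_size(334, 632, 236))  # counts reported in the text
```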
Figure 1

A Venn-diagram showing the relationships of the three databases provided by Allermatch™. This figure shows the size and overlap between the SwissProt and WHO-IUIS allergen databases.

Allermatch™ is built around the FASTA package (version 3.4t21 [3]) running with default parameters (ktup = 2, matrix = Blosum50, Gap open = -10, Gap extend = -2). The Allermatch™ analysis tool and the web interface are written in Python and run on a SUSE Linux Enterprise server with an Apache web server (version 1.3.26). Allermatch™ provides two search methods (modes 1 and 2) that correspond to the FAO/WHO guidelines described above, and a third method (mode 3) as an extra tool. The outline of the application is schematically presented in Figure 2.
Figure 2

Schematic representation of the Allermatch™ webtool. The user submits a protein sequence of interest to the Allermatch™ webtool and chooses one of the three alignment methods and three databases available. Upon completion the results are formatted and returned to the user.

Mode 1: Sliding window approach

The query protein sequence is divided into 80 amino acid (aa) windows using a sliding window with steps of a single residue. Each of these windows is compared with all sequences in the allergen database of choice. All database entries showing a similarity higher than a configurable threshold percentage (default is 35%) to any of the 80 aa query sequence windows are flagged. Upon completion of the analysis, a table is shown with all flagged database entries. Per entry, the highest similarity score is given, as well as the number of windows having a similarity above the cut-off percentage. For each allergen database entry identified, more detailed information on the similarity between the allergen and query sequence can be retrieved, such as those areas of both proteins within all 80 aa windows scoring above the cut-off percentage. Because the similarity score calculated by FASTA can apply to stretches smaller than 80 aa, Allermatch™ converts such a score to its equivalent over an 80 aa window. For example, 40% similarity on a stretch of 40 aa converts to 20% similarity on an 80 aa window.
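The window generation and the conversion of a short-stretch similarity to an 80 aa window can be sketched as follows. This is an illustrative reading of the description above, not the Allermatch source code; the function names are hypothetical, and two behaviours are assumptions: a query shorter than the window is treated as a single window, and alignments of 80 aa or longer are left unscaled.

```python
def sliding_windows(sequence: str, window: int = 80) -> list:
    """Return all windows of the given length, moving one residue at a time.

    Assumption: a query shorter than the window yields itself as one window.
    """
    if len(sequence) <= window:
        return [sequence]
    return [sequence[i:i + window] for i in range(len(sequence) - window + 1)]


def rescale_to_window(percent: float, aligned_length: int, window: int = 80) -> float:
    """Convert a similarity over `aligned_length` residues to a full window.

    Assumption: alignments at least as long as the window are not rescaled.
    """
    if aligned_length >= window:
        return percent
    return percent * aligned_length / window


print(len(sliding_windows("A" * 100)))  # a 100 aa query gives 100 - 80 + 1 windows
print(rescale_to_window(40.0, 40))      # the worked example from the text
```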

Mode 2: Wordmatch

This method looks for short sub-sequences (words), which have a perfect identity with a database entry. The wordsize is configurable (default is 6 aa). The output given is similar to the output given by Mode 1. All database entries with at least one hit are listed and for each of these, more detailed information can be retrieved upon request.
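A minimal sketch of exact word matching is given below. It is one straightforward way to find identical sub-sequences and is not claimed to be how Allermatch implements mode 2; the function name `shared_words` and the toy sequences are hypothetical.

```python
def shared_words(query: str, allergen: str, wordsize: int = 6) -> list:
    """Return (query_position, word) for every word of length `wordsize`
    in the query that occurs identically somewhere in the allergen."""
    allergen_words = {
        allergen[i:i + wordsize]
        for i in range(len(allergen) - wordsize + 1)
    }
    hits = []
    for i in range(len(query) - wordsize + 1):
        word = query[i:i + wordsize]
        if word in allergen_words:
            hits.append((i, word))
    return hits


# Toy sequences sharing the 7-residue stretch AYIAKQR.
query = "MKTAYIAKQRQISFVK"
allergen = "GGGAYIAKQRGGG"
print(shared_words(query, allergen, wordsize=6))
print(shared_words(query, allergen, wordsize=8))
```

With these toy sequences, a word size of 6 reports two overlapping hits, whereas a word size of 8 reports none, which illustrates how a larger word size makes the criterion stricter.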

Mode 3: full FASTA alignment with an Allermatch™ allergen database

The Allermatch™ webtool also offers a full alignment of the query sequence with either of the allergen databases using FASTA. Although this full alignment is currently not required by the FAO/WHO guidelines, the full alignment of protein sequences helps positioning of regions of potential allergenicity in the whole primary structure of the protein. The FASTA output is parsed and information from the allergen database is added and presented.

Utility and discussion

To examine the predictive performance of the FAO/WHO criteria for potential allergenicity, we have performed two tests. The first test determines the percentage of false negatives and the second test assesses the number of false positives. Both tests are performed with standard settings; for the sliding window approach an 80 amino acid window with a 35% similarity cutoff is used and for the wordmatch approach 6, 7 and 8 aa word sizes are tested. The false negative error rate is estimated by a leave-one-out method, testing all sequences in each Allermatch™ database against that database with the tested sequence excluded. Each sequence not resulting in a hit is considered a false negative. The results of each method/database combination are summarized in Table 1, column 1. The results show that the percentage of false negatives decreases when a larger database of allergen sequences is used. This may (partly) be explained by an increased proportion of similar, but not equal, sequences in the larger databases, such as isoallergens listed by WHO-IUIS. In examining the results, various sequences were observed that were not able to produce a hit (data not shown) due to their short length, since a perfect hit on a sequence shorter than 28 amino acids cannot convert to a 35% hit on an 80 amino acid window. Column 2 of the same table shows the corrected false negative rate after exclusion of these sequences. Even after this correction, the wordmatch method with a word size of 6 amino acids yields fewer false negatives than the sliding window approach. It is clear, however, that for short protein sequences the sensitivity of the sliding window method is reduced.
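The leave-one-out estimate and the length limit mentioned above can be sketched as follows. This is a hedged illustration under stated assumptions: `has_hit` stands in for whichever search method is being evaluated, `shares_word` is a toy wordmatch-style predicate, and the database entries are invented.

```python
def max_window_similarity(length: int, window: int = 80) -> float:
    """Best achievable similarity (in %) over a full window for a sequence
    of the given length, assuming a perfect match over its whole length."""
    return 100.0 * min(length, window) / window


def leave_one_out_false_negatives(database, has_hit):
    """Return accessions that produce no hit against the rest of the database.

    database: list of (accession, sequence) pairs.
    has_hit(sequence, others): hypothetical predicate wrapping the search method.
    """
    missed = []
    for i, (accession, sequence) in enumerate(database):
        others = database[:i] + database[i + 1:]  # exclude the tested sequence
        if not has_hit(sequence, others):
            missed.append(accession)
    return missed


def shares_word(query, others, wordsize=6):
    """Toy predicate: True if the query shares an identical word with any entry."""
    words = {query[i:i + wordsize] for i in range(len(query) - wordsize + 1)}
    return any(
        seq[i:i + wordsize] in words
        for _, seq in others
        for i in range(len(seq) - wordsize + 1)
    )


toy_db = [("iso1", "MKTAYIAKQR"), ("iso2", "GGTAYIAKQA"), ("solo", "PFWYCHNDEE")]
print(leave_one_out_false_negatives(toy_db, shares_word))
print(max_window_similarity(27), max_window_similarity(28))
```

In the toy database, the two similar entries find each other, while the unrelated entry is counted as a false negative. The second line of output shows that 27 residues can reach at most 33.75% over an 80 aa window, and 28 residues exactly 35%.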
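Table 1

Prediction quality of the FAO/WHO methods.

| Database | Method | Wordsize | (1) False negatives: number | (1) % | (2) False negatives (corrected): number | (2) % | (3) False positives: number | (3) % |
|---|---|---|---|---|---|---|---|---|
| SwissProt | Window | n.a. | 71 / 334 | 21.3 | 57 / 320 | 17.8 | 3 / 12 | 25.0 |
| SwissProt | Wordmatch | 6 | 54 / 334 | 16.2 | n.a. | n.a. | 7 / 12 | 58.3 |
| SwissProt | Wordmatch | 7 | 69 / 334 | 20.7 | n.a. | n.a. | 6 / 12 | 50.0 |
| SwissProt | Wordmatch | 8 | 78 / 334 | 23.4 | n.a. | n.a. | 3 / 12 | 25.0 |
| WHO-IUIS | Window | n.a. | 99 / 632 | 15.7 | 78 / 611 | 12.8 | 4 / 12 | 33.3 |
| WHO-IUIS | Wordmatch | 6 | 58 / 632 | 9.2 | n.a. | n.a. | 9 / 12 | 75.0 |
| WHO-IUIS | Wordmatch | 7 | 98 / 632 | 15.5 | n.a. | n.a. | 8 / 12 | 66.7 |
| WHO-IUIS | Wordmatch | 8 | 117 / 632 | 18.5 | n.a. | n.a. | 3 / 12 | 25.0 |
| SwissProt & WHO-IUIS | Window | n.a. | 101 / 730 | 13.8 | 77 / 706 | 10.9 | 5 / 12 | 41.7 |
| SwissProt & WHO-IUIS | Wordmatch | 6 | 55 / 730 | 7.5 | n.a. | n.a. | 9 / 12 | 75.0 |
| SwissProt & WHO-IUIS | Wordmatch | 7 | 95 / 730 | 13.0 | n.a. | n.a. | 8 / 12 | 66.7 |
| SwissProt & WHO-IUIS | Wordmatch | 8 | 115 / 730 | 15.8 | n.a. | n.a. | 3 / 12 | 25.0 |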

The number and percentage of false negative and false positive results are shown here for all FAO/WHO recommended method/database combinations. Result set 1 describes the number of false negatives observed in a leave-one-out approach. The next result set (2) shows the same results but corrected for those sequences that could not generate a hit, even against themselves, because of their short length. The last result set (3) shows the observed number of false positives when testing 12 non-allergenic sequences with the Allermatch™ webtool. Each of the result sets consists of two columns; the first column shows the number of erroneous hits and the total number of sequences in this set. The second column shows the percentage of erroneous hits.

In the second test, we assess the odds of a false positive by testing 12 protein sequences known to be non-allergenic. Their non-allergenicity is based on non-reactivity of these proteins towards IgE-sera of allergy patients or on the inability to cause IgE-responses in experimental animals (Table 2). It should be noted that such data are only available for a limited number of proteins, which accounts for the size of this dataset. Each of these 12 sequences was tested against all databases with all methods. Each non-allergenic sequence resulting in a hit is considered a false positive (Table 1, column 3). The number of false positives grows with the database size, as is to be expected: the chance of a random hit increases with a larger database. In contrast to the false negative rates, here the sliding window method gives a lower error rate than the wordmatch method with the default word size of 6 aa. This test might, however, overestimate the number of false positives. A number of these non-allergens are related to and display similarities with their allergenic counterparts, i.e. T1 (related to Bet v 1), human serum albumin (related to animal serum albumins), and human heat shock protein 70 (similar to heat shock proteins from fungi and other allergens). A selection of unrelated, non-allergenic proteins is therefore likely to give a lower false positive rate. Caution should be taken in interpreting these false hit rates. The methods used might perform differently with other sets of proteins. For example, a member of a completely novel group of valid allergens is likely to generate a false negative result.
Table 2

Sequences used for the negative control

| Protein | Host organism | Evidence for non-allergenicity | Accession | Reference |
|---|---|---|---|---|
| Amaranth seed albumin | Amaranthus hypochondriacus | IgG-response, but no raised IgE-levels, after administration (intranasal and intraperitoneal) of amaranth seed albumin to mice | GenPept CAA77664 | [14] |
| T1 | Catharanthus roseus | No reaction of recombinant T1 in IgE-sera binding, basophile histamine release, and skin prick testing using patients allergic to the related birch pollen allergen Bet v 1 | Not applicable | [15] |
| Mite ferritin heavy chain | Dermatophagoides pteronyssinus | Reaction of mite ferritin with IgG, but not with IgE, of sera from patients allergic to house dust mite | GenPept AAG02250 | [16] |
| Maltose binding protein | Escherichia coli | No reaction with IgE-sera from patients allergic to natural rubber latex (maltose binding protein used as part of fusion proteins with latex allergens) | SwissProt P02928 | [17] |
| Human serum albumin | Homo sapiens | No reaction of human serum albumin with IgE-sera of patients allergic to cat- and porcine-serum albumin | SwissProt P02768 | [18] |
| Human heat shock protein 70 | Homo sapiens | No reaction of human heat shock protein 70 with IgE-sera of patients allergic to heat shock protein 70 from Echinococcus granulosus | SwissProt P08107 | [19] |
| Human beta-2-glycoprotein I | Homo sapiens | Presence of IgM antibodies, but not of IgE antibodies, directed against human beta-2-glycoprotein I in sera from atopic eczema/dermatitis patients | SwissProt P02749 | [20] |
| Guayule rubber particle protein | Parthenium argentatum | No cross-reactivity between proteins from guayule and latex using IgE-sera from patients allergic to latex | SwissProt Q40778 | [21] |
| Purple acid phosphatase 1 | Solanum tuberosum | Stimulation of IgG-, but no or only low stimulation of IgE-antibodies following administration of potato acid phosphatase to mice (oral and intraperitoneal) | TrEMBL Q6J5M7 | [22] |
| Purple acid phosphatase 2 | Solanum tuberosum | See above | TrEMBL Q6J5M9 | [22] |
| Purple acid phosphatase 3 | Solanum tuberosum | See above | TrEMBL Q6J5M8 | [22] |
| Potato lectin | Solanum tuberosum | Stimulation of IgG-, but no or only low stimulation of IgE-antibodies following administration of potato lectin to mice (intraperitoneal) | TrEMBL Q9S8M0 | [23] |
The imperfect results shown here agree with the literature, in which the FAO/WHO methods for sequence comparisons are also shown to lack full predictive capability [7-9]. Interestingly, the results show that there is a balance between false positives and negatives when increasing the threshold level for short exact matches from 6 to 8 amino acids, with the number of false positives sharply decreasing at 8 amino acids (Table 1). The outcomes of these tests therefore need to be further refined by checking for the presence of potential IgE-epitopes as recommended by Kleter and Peijnenburg [7], as well as combined with results of other assays as recommended by the Codex. Other methods to decrease false hit rates may also be considered [8,9]. We plan to implement such supplementary methods in the future to support the Codex based predictions of potential allergenicity. The prediction of potential allergenicity by primary sequence comparison depends on the quality of the data used for comparison. Addition of a non-allergenic or poorly annotated protein to any of the Allermatch™ allergen databases would obviously result in undesired false positives and should be prevented. A workable strategy could be to use multiple databases, i.e. a database based on SwissProt's list of allergens, which contains well-annotated sequences from SwissProt, simultaneously with a larger database based on the WHO-IUIS list, which contains possibly less well annotated sequences from other protein databases, such as GenPept. For example, a number of protein accessions in the WHO-IUIS database do not mention the presence of signal- and/or pro-peptides, whereas removal of such peptides is essential to prevent false positives. Users of Allermatch™ should, at all times, take into account the possibility of a false positive or negative, for example by checking original data (accessions, clinical literature) and confirming results, before arriving at conclusions.
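The balance between false negatives and false positives as the word size increases can be made explicit by recomputing the percentages from the counts in Table 1. The sketch below uses only the wordmatch counts for the combined SwissProt & WHO-IUIS database; the names `TABLE1_COMBINED_WORDMATCH` and `error_rates` are hypothetical.

```python
# wordsize -> (false negatives, allergens tested, false positives, controls tested),
# copied from Table 1, combined database, wordmatch rows.
TABLE1_COMBINED_WORDMATCH = {
    6: (55, 730, 9, 12),
    7: (95, 730, 8, 12),
    8: (115, 730, 3, 12),
}


def error_rates(false_neg, n_allergens, false_pos, n_controls):
    """Return (false negative %, false positive %), rounded to one decimal."""
    return (
        round(100.0 * false_neg / n_allergens, 1),
        round(100.0 * false_pos / n_controls, 1),
    )


for wordsize, counts in sorted(TABLE1_COMBINED_WORDMATCH.items()):
    fn_pct, fp_pct = error_rates(*counts)
    print(f"wordsize {wordsize}: {fn_pct}% false negatives, {fp_pct}% false positives")
```

The false negative percentage rises with each step in word size, while the false positive percentage drops most sharply between 7 and 8 residues.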
To prevent false positives as much as possible, one should choose the well-annotated SwissProt database. To prevent false negatives, the combination of the larger WHO-IUIS database with that of SwissProt is more appropriate. Updates to the SwissProt and WHO-IUIS allergen lists will be incorporated in the Allermatch™ databases on a regular basis. Several other websites in the public domain offer sequence alignment facilities that support the prediction of potential allergenicity, such as SDAP [10,11], AllerPredict [12] and Farrp [13]. These websites offer search algorithms that find contiguous identical amino acids between a query sequence and database sequences (SDAP, AllerPredict) and more than 35% identity in alignments (SDAP, AllerPredict), as well as a general FASTA search of a query protein sequence against the database (SDAP, Farrp).

Conclusions

Allermatch™ is an efficient and comprehensive webtool that combines all bioinformatics approaches required to assess the allergenicity of protein sequences according to the current guidelines in the Codex. The application will be kept up to date with the FAO/WHO criteria and the SwissProt and WHO-IUIS allergen lists. It will be extended with other, supplementary methods to support and refine the prediction of allergenicity.

Availability and requirements

Allermatch™ is platform independent and accessible using any Netscape 4+ compatible web browser at http://allermatch.org.

Authors' contributions

MF developed and implemented the Allermatch™ webtool. HN provided the domain name registration and advised in the web site development. GK and AP provided the scientific background and constructed the sequence databases. JPN and RvH provided time, resources and ample discussion. All authors have read and approved the final manuscript.
  18 in total

1.  Statistical evaluation of local alignment features predicting allergenicity using supervised classification algorithms.

Authors:  D Soeria-Atmadja; A Zorzet; M G Gustafsson; U Hammerling
Journal:  Int Arch Allergy Immunol       Date:  2004-01-21       Impact factor: 2.749

2.  The Protein Information Resource.

Authors:  Cathy H Wu; Lai-Su L Yeh; Hongzhan Huang; Leslie Arminski; Jorge Castro-Alvear; Yongxing Chen; Zhangzhi Hu; Panagiotis Kourtesis; Robert S Ledley; Baris E Suzek; C R Vinayaka; Jian Zhang; Winona C Barker
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

3.  Allergic cross-reactions between cat and pig serum albumin. Study at the protein and DNA levels.

Authors:  C Hilger; M Kohnen; F Grigioni; C Lehners; F Hentges
Journal:  Allergy       Date:  1997-02       Impact factor: 13.146

4.  Improved tools for biological sequence comparison.

Authors:  W R Pearson; D J Lipman
Journal:  Proc Natl Acad Sci U S A       Date:  1988-04       Impact factor: 11.205

5.  Allergen nomenclature. WHO/IUIS Allergen Nomenclature Subcommittee.

Authors:  T P King; D Hoffman; H Lowenstein; D G Marsh; T A Platts-Mills; W Thomas
Journal:  Int Arch Allergy Immunol       Date:  1994-11       Impact factor: 2.749

6.  Non-allergenic antigen in allergic sensitization: responses to the mite ferritin heavy chain antigen by allergic and non-allergic subjects.

Authors:  M J Epton; W Smith; B J Hales; L Hazell; P J Thompson; W R Thomas
Journal:  Clin Exp Allergy       Date:  2002-09       Impact factor: 5.018

7.  Molecular and immunological characterization of the C-terminal region of a new Echinococcus granulosus Heat Shock Protein 70.

Authors:  E Ortona; P Margutti; F Delunardo; S Vaccari; R Riganò; E Profumo; B Buttari; A Teggi; A Siracusano
Journal:  Parasite Immunol       Date:  2003-03       Impact factor: 2.280

8.  Association between the occurrence of the anticardiolipin IgM and mite allergen-specific IgE antibodies in children with extrinsic type of atopic eczema/dermatitis syndrome.

Authors:  E Szakos; G Lakos; M Aleksza; E Gyimesi; G Páll; B Fodor; J Hunyadi; E Sólyom; S Sipka
Journal:  Allergy       Date:  2004-02       Impact factor: 13.146

9.  Evaluation of protein allergenic potential in mice: dose-response analyses.

Authors:  R J Dearman; S Stone; H T Caddick; D A Basketter; I Kimber
Journal:  Clin Exp Allergy       Date:  2003-11       Impact factor: 5.018

10.  Molecular characterization of recombinant T1, a non-allergenic periwinkle (Catharanthus roseus) protein, with sequence similarity to the Bet v 1 plant allergen family.

Authors:  Sylvia Laffer; Said Hamdi; Christian Lupinek; Wolfgang R Sperr; Peter Valent; Petra Verdino; Walter Keller; Monika Grote; Karin Hoffmann-Sommergruber; Otto Scheiner; Dietrich Kraft; Marc Rideau; Rudolf Valenta
Journal:  Biochem J       Date:  2003-07-01       Impact factor: 3.857
