
Bioinformatic screening and detection of allergen cross-reactive IgE-binding epitopes.

Scott McClain

Abstract

Protein allergens can be related by cross-reactivity. Allergens that share relevant sequence can cross-react, whereas those lacking sufficient similarity in their IgE antibody-binding epitopes do not. Cross-reactivity is based on shared epitopes, which in turn are based on shared sequence and higher-level structure (charge and shape). Epitopes are important in predicting cross-reactivity potential and may help establish criteria that identify homology among allergens. IgE-binding epitope sequences from selected allergens were used to determine how the FASTA algorithm could be used to identify a threshold of significance. A statistical measure (expectation value, E-value) was used to identify a threshold specific to cross-reactivity potential. Peanut Ara h 1 and Ara h 2, shrimp tropomyosin Pen a 1, and the birch tree pollen allergen Bet v 1 were the sources of known epitopes. Each epitope or set of epitopes was inserted into random amino acid sequence to create hypothetical proteins that were used as queries against an allergen database. Alignments with allergens were noted for their ability to match the epitope's source allergen as well as any cross-reactive or other homologous allergens. A FASTA expectation value range (1 × 10−5 to 1 × 10−6) was identified that could act as a threshold to help identify cross-reactivity potential.
© 2017 Syngenta Crop Protection, LLC. Molecular Nutrition & Food Research published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

Keywords:  Allergen; Bioinformatics; Cross-reactivity; Epitope; FASTA; Homology

Year:  2017        PMID: 28191711      PMCID: PMC5573986          DOI: 10.1002/mnfr.201600676

Source DB:  PubMed          Journal:  Mol Nutr Food Res        ISSN: 1613-4125            Impact factor:   5.914



Introduction

Cross‐reactivity has generally been studied in the context of food safety. The goal has been to understand how sensitization to one particular food may allow for cross‐reactivity to homologous allergens in other foods. Cross‐reactivity begins with sensitization to one allergen. Exposure to another highly similar protein that shares enough of the IgE‐binding epitope structure/sequence can also sensitize and/or elicit an allergic response. This shared sequence and shared reactivity is termed cross‐reactivity. In some cases, a sensitizer may not cause allergy symptoms, and the cross‐reactive allergen is the eliciting antigen but is not itself sensitizing to the patient. This illustrates the case where the eliciting, cross‐reactive allergen is an incomplete allergen. In contrast, an allergen that can both sensitize and elicit a reaction is considered a “true” or complete allergen. Nevertheless, it is the shared sequence homology that is of interest in determining the potential to elicit a clinically relevant allergy response, that is, an allergy event associated with symptoms such as wheezing, urticaria, and oral allergy syndrome. Protein allergens have the potential to cross‐react if two or more allergens share amino acid sequence to a degree that there is shared IgE antibody binding. IgE binding is ascribed to the IgE‐binding regions, or epitopes, of the allergen. Shared IgE binding across multiple proteins is based on the premise that there is a high degree of homology within the epitope(s) to maintain IgE binding. Thus, cross‐reactivity results from the Fc epsilon‐RI (FcεRI) receptor binding by the IgE–allergen complex from either allergen 1. Together, this complex is the basis for stimulating mast cells and basophils to release mediators, such as histamine. IgE‐binding epitopes can take the form of a sequential, uninterrupted amino acid string or of a discontinuous distribution throughout the larger allergen sequence.
A sequential epitope tends to result in a minimum length of sequence that can bind IgE 2, 3, and in some food allergen based cases retain IgE‐binding capacity after surviving gastric enzyme reduction of the intact protein 4. Sequential epitopes may be part of larger epitopes, but likely have the same physical constraints in binding IgE whether or not they are isolated as peptides or within the intact protein 5. Still, sequential epitopes acting to bind IgE as part of the larger allergen structure would of course be impacted by the constraints of structure and charge imparted by the rest of the sequence. Discontinuous epitopes (e.g., conformational or nonsequential) are defined by an allergen remaining intact and the maintenance of proper folding (secondary and tertiary sequence conformation) to bring distributed residues within range of one another to allow IgE binding. The structural family of allergens with the most numerous sequences is the birch tree pollen allergens (Bet v 1). This allergen group is well recognized for birch tree respiratory sensitivity, which can be due to sensitivity to one or more of the many isoforms produced by Betula pendula (verrucosa) species 6. However, the allergen belongs to a broader group of structurally related proteins in several species, some of which can induce allergy based on shared homology. This allergen group is a good example of epitope homology where patients with birch pollen hay fever can also experience clinical symptoms not from the original sensitizing allergen, Bet v 1, but are instead reacting to a Bet v 1 homologue in a food. One allergen that shares homology with Bet v 1 is Mal d 1; the pathogen resistance associated protein in apples that can cause oral allergy syndrome 7. The Bet v 1 sequence appears to be the “parental” source of the shared epitopes, as all of the Mal d 1 epitopes (both B and T‐cell) are contained within Bet v 1. 
Clinically, the focus is on the elicitation response. This is shown by Bet v 1 being able to inhibit IgE binding by Mal d 1 at the B‐cell epitopes, whereas Mal d 1 is unable to fully inhibit Bet v 1 IgE binding 7. It should be noted that the foods themselves are not exclusively dependent on Bet v 1 as a sensitizer. The foods can also prompt naïve or original reactivity to the allergens in those foods directly 8. It should be recognized that other proteins in birch pollen and in foods with Bet v 1 homologues may also sensitize and elicit allergy of their own accord. Bioinformatics has the capacity to statistically determine the probability of taxonomic relatedness at the protein level 9, 10. As Pearson (2000) notes, “…, with biological sequences (as opposed to fair coins), the assumptions underlying the statistical model may not be met. When the assumptions fail, the highest scoring unrelated sequence may have an expectation value (E‐value) that is much too low [e.g., E < 10−3] or much too high [E > 100]” 11. This sets the context for using FASTA as a tool, which needs to be vetted for its use in specific cases with appropriate context for the groups of proteins being evaluated. Bioinformatics has been extended herein in its application for assessing whether similarity can describe the possibility of cross‐reactivity between protein allergens. The shared percent identity in amino acids remains a traditional way to describe how alike two proteins are in their sequence. Although the use of identity (i.e., a percentage of shared, exact amino acid matches across a total amino acid length) to “find” potential cross‐reactivity among sequences is recognized as imperfect 5, an identity threshold has found its way into regulatory guidance 12, 13.
Thus, the metric of a minimum 35% shared identity, plus a minimum 80 amino acid overlap length, has become the criterion for establishing significant shared sequence between an unknown or novel protein and a known allergen. In the regulatory framework from 2001 (FAO/WHO; evaluation of allergenicity of genetically modified foods), the intent was to set a tiered approach: if an alignment between an allergen and a novel protein exceeded 35% identity over an 80 amino acid overlap, then a second step, serum screening, would be employed to confirm the existence or absence of cross‐reactivity. However, as was recognized at the time, there was no qualified, complete list of known allergens 14 that could be systematically explored for similarity thresholds. Together with the fact that very few epitopes for allergens were known at the time, the 35% over 80 amino acids represented a conservative approach to setting a tiered assessment that hinged on serology‐based confirmation of allergenicity, but lacked a detailed exploration of allergens and the way in which cross‐reactivity can be assessed bioinformatically. It is well understood that bioinformatics and alignment algorithms base their probability assessments of homology on extrapolating to higher order protein structure from linear sequence similarity (i.e., identical and similar residues). In the case of the FASTA algorithm, the intent is to identify local alignments between sequences 11 to find the portions of two proteins that may describe their core areas of shared sequence. This local alignment feature of FASTA is consistent with how epitopes tend to be localized as small portions within the larger, intact protein sequence. In the current study, verified IgE‐binding epitopes of key allergens were used to determine a minimum alignment threshold to detect homologous sequences.
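The identity and overlap criterion described above can be illustrated with a minimal sketch. It assumes that percent identity is counted as exact matches divided by the number of alignment columns (gap columns included) and that both limits are inclusive; neither detail is specified in the text, and the function names are hypothetical. The toy alignment is invented for illustration, while the two numeric checks use identity and overlap values that appear in Table 2.

```python
# Criterion values taken from the text: 35% identity over at least 80 amino acids.
MIN_IDENTITY_PERCENT = 35.0
MIN_OVERLAP_LENGTH = 80


def identity_and_overlap(aligned_query: str, aligned_subject: str) -> tuple[float, int]:
    """Return (percent identity, overlap length) for two gapped, aligned strings.

    Assumption: overlap is the number of alignment columns, gaps included.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned strings must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    overlap = len(aligned_query)
    # Count columns where both sequences carry the same residue (never a gap).
    matches = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    return 100.0 * matches / overlap, overlap


def meets_criterion(identity: float, overlap: int) -> bool:
    """True when both limits are met (inclusive limits are an assumption)."""
    return identity >= MIN_IDENTITY_PERCENT and overlap >= MIN_OVERLAP_LENGTH


# Toy alignment, invented for illustration only.
toy_identity, toy_overlap = identity_and_overlap("MKVLA-TGE", "MKILAQTGD")

# Values as listed in Table 2: 43.3% over 90 aa, and 86.4% over 66 aa.
long_moderate_hit = meets_criterion(43.3, 90)
short_strong_hit = meets_criterion(86.4, 66)
```

In this sketch the 86.4% alignment over 66 residues fails the criterion purely on overlap length, which illustrates how a short but highly similar epitope region can fall outside a rule built around 80 residue windows.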
In using only epitope sequence information, the focus was on testing the capacity of minimum, but immunologically relevant sequence to act as a bioinformatic screen for other homologous or cross‐reactive allergens. Known, sequential and discontinuous epitopes were used to model the localized positioning of the epitopes within a larger protein sequence. Hypothetical query proteins were constructed of random amino acid sequence that was modified to include known allergen epitopes; in effect, doping random sequence with known, biologically relevant allergen epitopes. Each hypothetical protein was then compared to a database of known allergens to determine whether FASTA alignments could discern homology based on only epitopes. In using epitopes, the goal was to model the use of bioinformatics for establishing threshold criteria based on biological similarity among distinct allergen groups.

Methods

Random amino acid sequence was used to construct hypothetical protein sequence(s) that were of the same length as known allergens. Random sequence was used to fill in between the portions of known allergen epitopes. This random “filler” sequence was derived from random, alternative open reading frame sequence, as derived and translated from an original gene, human alpha‐amylase; the filler sequence otherwise had no similarity to the parental gene or any known gene, including allergens. The actual primary reading frame was ignored, and a reverse reading frame was translated and prepared using the BLAST program, GETORF3 routine 15; this amino acid sequence was then randomized (Supporting Information Fig. 1). Portions of each filler sequence were repetitively used to construct hypothetical proteins of the proper length for each allergen from which epitopes were derived (discussed below).

Hypothetical sequence construction based on allergen epitopes

Each hypothetical sequence was prepared by spacing the known epitopes and placing them into their same locations, relative to the N‐terminus of the native allergen. Allergen sequences were based on their identification, as listed in the Food Allergy Research and Resource www.allergenonline.org database 16, 17. To create the hypothetical sequence for Ara h 1, the IgE‐binding 10‐mer peptides that were identified by the Bannon laboratory 18, 19 were placed into random sequence to create a 626 amino acid sequence to be used for allergen database comparisons (Fig. 1A). The length was based on the Ara h 1 protein (GI: 1168391) with epitope locations mapped into the hypothetical sequence according to their native location in Ara h 1 18.
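The construction described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the filler here is drawn uniformly from the 20 standard amino acids, whereas the study derived its filler from a randomized alternative reading frame translation; the function names, the seed, and the placeholder peptides are all hypothetical and are not real epitopes.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_hypothetical_protein(length: int, epitopes: dict[int, str], seed: int = 0) -> str:
    """Place epitopes at fixed 0-based start positions inside random filler.

    Assumption: uniform random filler stands in for the study's filler sequence.
    """
    rng = random.Random(seed)
    residues = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    for start, peptide in epitopes.items():
        if start < 0 or start + len(peptide) > length:
            raise ValueError(f"epitope at {start} does not fit in {length} residues")
        # Overwrite the filler so the epitope keeps its native position.
        residues[start:start + len(peptide)] = peptide
    return "".join(residues)


def epitope_fraction(length: int, epitopes: dict[int, str]) -> float:
    """Fraction of positions covered by at least one epitope."""
    covered = set()
    for start, peptide in epitopes.items():
        covered.update(range(start, start + len(peptide)))
    return len(covered) / length


# Placeholder peptides (not real epitopes) totalling 16 residues in 160 positions.
placeholders = {10: "WWWWWWWWWW", 50: "CCCCCC"}
query = build_hypothetical_protein(160, placeholders)
fraction = epitope_fraction(160, placeholders)
```

Counting covered positions as a set avoids double counting where epitopes overlap, which matters when overlapping epitopes are concatenated into a single region.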
Figure 1

(A) A random selection of amino acids loaded with peanut Ara h 1 epitopes, numbers 10–22: total length = 626. Epitopes are identified by bold and highlighted lettering. (B) A random selection of amino acids was loaded with the epitopes of peanut Ara h 2 (AH2‐1, AH2‐3a, and AH2‐3c); total length = 172 aa. A contiguous epitope region covering 66 amino acids is identified by bold and highlighted lettering; underlined epitopes are, in order from N‐ to C‐terminal end, AH2‐1, AH2‐3a, and AH2‐3c. (C) A random selection of amino acids was loaded with shrimp Pen a 1 epitopes: total length = 284. Epitopes are identified by bold and highlighted lettering. (D) A random selection of amino acids was loaded with the region of sequence from Bet v 1 containing the discontinuous epitope residues; total length = 160 aa. Epitope region is identified by bold and highlighted lettering; underlined letters are the Bet v 1 epitope residues.

A second peanut‐based allergen sequence was constructed based on the recent discovery that nonhomologous proteins may have cross‐reactivity 20. A hypothetical sequence was constructed using synthetic epitopes known to cross‐react with peanut allergens. The epitopes AH2‐1, AH2‐3a, and AH2‐3c from Bublin et al. 20 were loaded into random sequence to make a 172 amino acid length sequence (Fig. 1B) based on the Ara h 2 protein (GI: 26245447). A supplementary 172 amino acid hypothetical sequence was also prepared, in which a contiguous 69 amino acid section covering the AH2‐1, AH2‐2, AH2‐3a, AH2‐3b, and AH2‐3c epitopes 20 was inserted into random sequence. The epitopes of the Pen a 1 tropomyosin allergen from the brown shrimp species Penaeus aztecus 21 (the genus is now listed as Farfantepenaeus) were used to prepare hypothetical sequence as with the other allergens. Epitope regions were inserted by concatenating the individual, overlapping epitopes listed in the work by Reese et al., Fig. 1A–E 21.
The length is based on the Pen a 1 protein (GI: 73532979) and was used to create a hypothetical protein sequence of 284 amino acids (Fig. 1C). The epitope 22 of the European white birch (B. pendula) pollen allergen, Bet v 1, was prepared similarly to the others. The single epitope is discontinuous along the length of the allergen. The length is based on the average length of several listed isoforms of the Bet v 1 protein (example given by GI: 1542865), with random sequence used to create a 160 amino acid hypothetical protein (Fig. 1D). In addition to the sequence using only the epitope residues, a separate hypothetical sequence was constructed using the entire region of the Bet v 1 protein over which the epitope residues are dispersed. This “epitope region” is 56 amino acids in length, and the two separate sequences were compared with the allergen database. It should be noted that using only one epitope does not necessarily represent the multiple, nonoverlapping epitopes required by an allergen for cross‐linking of the FcεRI receptor 1.

FASTA comparisons

Each hypothetical sequence was compared to a protein allergen sequence database (FARRP, 2015). The database consisted of 1,897 sequences representing clinically confirmed, as well as putative, allergens. The comparison was performed using the FASTA algorithm, version 3.4t11 10. Parameter settings for FASTA were as follows: BLOSUM 50 matrix, gap penalty = 12, gap extension penalty = 2, Z = 2,000, and z = 1. A minimum 30% shared identity and a sequence overlap length of 40 amino acids were used as display limits for the output shown in tables; an upper E‐value threshold of 10 was also used for display of alignments. These display limits were set below the Codex Alimentarius (2009) guideline values of 35% and 80 amino acids to ensure that alignments would be displayed both above and below the Codex threshold values.
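The display limits above amount to a simple filter over alignment results, sketched below. The sketch assumes the three limits are applied together and inclusively, which the text does not state; the `Hit` record and the function name are hypothetical and do not correspond to any FASTA output format. Two of the example records use values listed in Table 2, and the third is an invented hit included only to show an exclusion.

```python
from dataclasses import dataclass

# Display limits as stated in the text; applying them jointly is an assumption.
MIN_IDENTITY = 30.0   # percent
MIN_OVERLAP = 40      # amino acids
MAX_E_VALUE = 10.0


@dataclass(frozen=True)
class Hit:
    description: str
    identity: float
    overlap: int
    e_value: float


def displayed(hits: list[Hit]) -> list[Hit]:
    """Keep hits inside the display limits, most significant (smallest E) first."""
    kept = [
        h for h in hits
        if h.identity >= MIN_IDENTITY
        and h.overlap >= MIN_OVERLAP
        and h.e_value <= MAX_E_VALUE
    ]
    return sorted(kept, key=lambda h: h.e_value)


hits = [
    Hit("Per A 4 allergen", 39.6, 48, 6.3),           # values from Table 2
    Hit("Major allergen Mal D 1", 43.3, 90, 1.1e-11),  # values from Table 2
    Hit("invented weak hit", 28.0, 60, 0.5),           # hypothetical, fails identity
]
shown = displayed(hits)
```

Sorting by E‐value mirrors the ordering of the alignment tables, where the most significant alignment is listed first.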

Results

The alignments produced between each epitope‐containing sequence and allergens were evaluated to identify homologous and possible cross‐reactive allergens. Four sequences were used: peanut Ara h 1 and Ara h 2, shrimp tropomyosin Pen a 1, and the birch tree pollen allergen, Bet v 1. They have varying levels of taxonomic conservation at the epitope level and across the entire sequence, which were expected to affect alignment metrics 23, 24. Tropomyosin, for example, has well‐recognized conservation across many species that was expected to allow identification of all known tropomyosin allergens. In order to perform the analysis in this study, each allergen had representative, cross‐reactive epitopes inserted into random, hypothetical sequence. FASTA was used to compare the hypothetical “full‐length” protein to the allergen database in order to observe the primary alignment to any database sequences below the expectation cut‐off of 10 (E‐value = 10). Relevant alignments were judged by whether or not aligned database sequences matched the allergen (or group of allergens) from which the epitopes were derived, and by the E‐value at which non‐homologous alignments were observed. It should be noted that two or more significantly aligning sequences do not imply known cross‐reactivity in every case. Although highly cross‐reactive groups of allergens have been examined, serology work to confirm cross‐reactivity has not been performed for many members of the respective allergen homologues discussed herein. In the first exercise, the 10‐mer peptides that were identified by the Bannon laboratory 18, 19 were examined for their impact on alignments between allergens and a hypothetical 626 amino acid sequence. FASTA analysis showed that the overlap with the parental Ara h 1 protein was between 305 and 328 amino acids in length, with the best alignment producing an E‐value of 5.3 × 10−39 (Table 1).
Only Ara h 1 and two beta‐conglycinin proteins from soybean showed alignments, both vicilin‐like 7S globulins 25, 26; the least significant alignment was 2.1 × 10−11. Alignments indicated little variance among these homologous proteins, as there were no unexpected allergens from other species identified. The epitopes supported very specific identification of homology even though each epitope was only ten amino acids (except for the case of the overlapping region). The epitopes in total represented 19% of the overall hypothetical sequence.
Table 1

Summary of hypothetical sequence comparisons to an allergen database

| Allergens used for hypothetical sequence | Most significant alignment: E‐value | Most significant alignment: species, allergen | Least significant alignment: E‐value | Least significant alignment: species, allergen |
|---|---|---|---|---|
| Ara h 1 | 5.3 × 10−39 | Arachis hypogaea, Ara h 1 | 2.1 × 10−11 | Glycine max, β‐conglycinin |
| Bet v 1, residues only | 6.8 × 10−3 | Betula pendula, Bet v 1 | 8.0 × 10−1 | Vigna radiata, Vig r 1.0101 (PRP 10 protein) |
| Bet v 1, 56 aa epitope region | 1.1 × 10−22 | Betula pendula, Bet v 1 | 3.0 × 10−3 | Vigna radiata, Vig r 6.0101 |
| Pen a 1 | 3.4 × 10−27 | Metapenaeus ensis, Hom a 1.0102 homologue | 1.5 × 10−9 | Tyrophagus putrescentiae, Tyr p 10 |
| Ara h 2 | 5.1 × 10−17 | Arachis hypogaea, Ara h 2 | 5.1 × 10−10 | Arachis hypogaea, Ara h 2.01 allergen |

Significance is based on the summary FASTA statistic, E‐value; a small value is more significant. Least significant alignment displayed in this table is that which is E‐value ≤ 9.9 × 10−1.

The Pen a 1 allergen is the tropomyosin protein from shrimp with a length of 284 amino acids (FARRP 2015), and it represents a highly conserved protein with less variability across species than Ara h 1. The percentage of epitope residues relative to the overall sequence length was 32%, the most of any of the hypothetical proteins. FASTA results showed a total of 72 alignments, with all of these far below an E‐value of 0.1 (Table 1). The most significant alignment was nearly identical in its E‐value to several other tropomyosins from related species (35 alignments had E‐values between 1.3 × 10−26 and 8.1 × 10−22), indicating close homology within this structurally related group. The source organism of the Pen a 1 sequence is the shrimp species Farfantepenaeus aztecus, and it displayed the second most significant alignment (E‐value 1.5 × 10−26). The analysis of Bet v 1 was focused on exploring whether a discontinuous epitope 22 would retain the capacity to represent the protein in general; that is, could the epitope alone flag Bet v 1 (and homologues) when inserted into random sequence. Sixteen epitope residues (Fig. 1D) were considered a challenge for FASTA because of the lower concentration of contiguous residues as well as the lower total number of residues relative to the random sequence length (10%). The hypothesis was that algorithms such as FASTA may be limited in identifying either the parental sequence or homologues with so few epitope residues scattered throughout the sequence. As shown in Table 1, the Bet v 1 allergen was clearly observed (as the most significant alignment), as were numerous other members of this large, cross‐reactive allergen family (Tables 2 and 3).
In all, there were 106 alignments (alignment data not shown) with only two alignments producing an E‐value greater than the cut‐off of 0.1 used for Table 1. An E‐value range of 0.001–0.01 is recommended as an upper threshold below which alignments likely start to have significance 11. Most of the other species with Bet v 1 homologues were represented, including the genera Castanea, Corylus, Malus, Quercus, and Carpinus, and the E‐values ranged between 6.8 × 10−3 and 8.0 × 10−1, a very narrow range.
Table 2

All alignments comparing the hypothetical protein containing the Bet v 1 epitope region with the allergen database

| Database match description | GI number | Species | % Identity | Overlap | E‐value |
|---|---|---|---|---|---|
| Pollen allergen Betv1, isoform At8 | 4006928 | Betula pendula | 58.8 | 114 | 1.10E‐22 |
| Pollen allergen Betv1, isoform At37 | 4006953 | Betula pendula | 58.8 | 114 | 1.50E‐22 |
| Bet Vi Jap1 | 12583681 | Betula platyphylla | 57.9 | 114 | 1.70E‐22 |
| Isoallergen Bet V 1 B1 | 4590392 | Betula pendula | 57.9 | 114 | 1.70E‐22 |
| Pollen allergen Bet V 1 | 1542861 | Betula pendula | 57.9 | 114 | 1.70E‐22 |
| Bet Vi Jap3 | 12583685 | Betula platyphylla | 57.9 | 114 | 2.00E‐22 |
| Pollen allergen , Betv1 | 4376216 | Betula pendula | 55.7 | 115 | 2.00E‐22 |
| Chain A, birch pollen allergen Bet V 1 mutant N28t, K32q, E45s, P108g | 11514622 | Betula pendula | 57.9 | 114 | 2.80E‐22 |
| Pollen allergen Bet V 1 | 1542869 | Betula pendula | 87.9 | 66 | 3.30E‐22 |
| Bet V 1 D | 452732 | Betula pendula | 57 | 114 | 3.30E‐22 |
| Bet V 1 L | 452744 | Betula pendula | 57 | 114 | 3.30E‐22 |
| Major pollen allergen Bet V 1‐ | 1168706 | Betula pendula | 57 | 114 | 3.30E‐22 |
| Pollen allergen Bet V 1 | 1542867 | Betula pendula | 58.8 | 114 | 3.30E‐22 |
| Pollen allergen Betv1 | 2564220 | Betula pendula | 57 | 114 | 3.30E‐22 |
| Pollen allergen Bet V 1 | 1542865 | Betula pendula | 57.9 | 114 | 3.90E‐22 |
| Chain A, birch pollen allergen Bet V 1 | 159162097 | Betula pendula | 56.5 | 115 | 4.50E‐22 |
| Bet Vi Jap2 | 12583683 | Betula platyphylla | 57.9 | 114 | 4.60E‐22 |
| Pollen allergen Betv1, isoform At14 | 4006947 | Betula pendula | 86.4 | 66 | 5.20E‐22 |
| Pollen allergen Betv1, isoform At7 | 4006967 | Betula pendula | 57 | 114 | 5.30E‐22 |
| Pollen allergen Bet V 1 | 1542873 | Betula pendula | 55.7 | 115 | 5.30E‐22 |
| Bet V 1 ‐ like | 17938 | Betula pendula | 86.4 | 66 | 6.30E‐22 |
| chain A, crystal structure of a dimeric variant of Bet V 1 | 565807648 | Betula pendula | 86.4 | 66 | 6.30E‐22 |
| Chain A, crystal structure of a variant of the major birch pollen allergen Bet V 1 | 560188693 | Betula pendula | 86.4 | 66 | 6.30E‐22 |
| Variant of Bet V chain A of the crystal structure | 560188694 | Betula pendula | 86.4 | 66 | 6.30E‐22 |
| Chain B, crystal structure of a dimeric variant of Bet V 1 | 560188692 | Betula pendula | 86.4 | 66 | 6.30E‐22 |
| Isoallergen Bet V 1 B2 | 4590394 | Betula pendula | 86.4 | 66 | 6.30E‐22 |
| Major allergen Bet V 1 | 1321720 | Betula pendula | 86.4 | 66 | 6.30E‐22 |
| Pollen allergen Betv1 | 2564224 | Betula pendula | 86.4 | 66 | 6.30E‐22 |
| Pollen allergen Betv1, isoform At5 | 4006965 | Betula pendula | 86.4 | 66 | 6.30E‐22 |
| Pollen allergen Bet V 1 | 1542863 | Betula pendula | 84.8 | 66 | 7.40E‐22 |
| Pollen allergen Betv1, isoform At42 | 4006955 | Betula pendula | 84.8 | 66 | 7.40E‐22 |
| Pollen allergen Betv1 | 2564228 | Betula pendula | 56.1 | 114 | 7.40E‐22 |
| Pollen allergen Betv1, isoform At87 | 4006963 | Betula pendula | 84.8 | 66 | 8.40E‐22 |
| Pollen allergen Betv1, isoform At45 | 4006957 | Betula pendula | 54.8 | 115 | 8.70E‐22 |
| Chain A, birch pollen allergen Bet V 1 mutant E45s | 38492423 | Betula pendula | 86.4 | 66 | 1.00E‐21 |
| Pollen allergen Bet V 1 | 1542871 | Betula pendula | 84.8 | 66 | 1.00E‐21 |
| Pollen allergen Betv1 | 2564222 | Betula pendula | 84.8 | 66 | 1.00E‐21 |
| Pollen allergen Betv1, isoform At50 | 4006959 | Betula pendula | 84.8 | 66 | 1.00E‐21 |
| Pollen allergen Betv1, isoform At10 | 4006945 | Betula pendula | 84.8 | 66 | 1.20E‐21 |
| Bet V 1 F | 452736 | Betula pendula | 83.3 | 66 | 1.90E‐21 |
| Bet V 1 J | 452740 | Betula pendula | 83.3 | 66 | 1.90E‐21 |
| Pollen allergen , Betv1 | 4376222 | Betula pendula | 83.3 | 66 | 1.90E‐21 |
| Major allergen Bet V 1 | 2414158 | Betula pendula | 81.8 | 66 | 3.10E‐21 |
| Isoallergen Bet V 1 B3 | 4590396 | Betula pendula | 53 | 115 | 3.10E‐21 |
| Major allergen Bet V 1 | 1321716 | Betula pendula | 81.8 | 66 | 3.70E‐21 |
| Pollen allergen , Betv1 | 4376220 | Betula pendula | 80.3 | 66 | 6.90E‐21 |
| Bet V 1 E | 452734 | Betula pendula | 81.8 | 66 | 7.00E‐21 |
| Major birch pollen allergen Bet V 1.010, chain A of the crystal structure | 550544347 | Betula pendula | 83.3 | 66 | 1.10E‐20 |
| Major allergen Bet V 1 | 1321728 | Betula pendula | 81.8 | 66 | 2.10E‐20 |
| Pollen allergen Betv1, isoform At59 | 4006961 | Betula pendula | 51.8 | 114 | 1.50E‐19 |
| 1‐Sc1 | 534910 | Betula pendula | 72.2 | 72 | 2.80E‐19 |
| Bet V 1 C | 452730 | Betula pendula | 74.2 | 66 | 7.30E‐19 |
| Bet V 1 K | 452742 | Betula pendula | 74.2 | 66 | 7.30E‐19 |
| Bet V 1b | 450885 | Betula pendula | 74.2 | 66 | 7.30E‐19 |
| Major pollen allergen Bet V 1‐M/ | 1168710 | Betula pendula | 74.2 | 66 | 7.30E‐19 |
| Major allergen Bet V 1 | 1321724 | Betula pendula | 74.2 | 66 | 7.30E‐19 |
| Major allergen Bet V 1 | 1321718 | Betula pendula | 75.8 | 66 | 8.60E‐19 |
| Pollen allergen , Betv1 | 4376219 | Betula pendula | 75.8 | 66 | 8.60E‐19 |
| Pollen allergen , Betv1 | 4376221 | Betula pendula | 75.8 | 66 | 8.60E‐19 |
| Major allergen Bet V 1 | 1321722 | Betula pendula | 74.2 | 66 | 1.00E‐18 |
| 1 Sc2 | 534900 | Betula pendula | 70.8 | 72 | 1.40E‐18 |
| 1 Sc‐3 | 534898 | Betula pendula | 77 | 61 | 1.90E‐18 |
| Major allergen Cor A 1 | 1321731 | Corylus avellana | 80.4 | 56 | 4.30E‐18 |
| Major allergen Bet V 1 | 1321726 | Betula pendula | 78.7 | 61 | 4.30E‐18 |
| Aln G I | 261407 | Alnus glutinosa | 77 | 61 | 6.90E‐18 |
| Major allergen Bet V 1 | 1321714 | Betula pendula | 77 | 61 | 6.90E‐18 |
| Pollen allergen Car B 1 | 1545897 | Carpinus betulus | 78.6 | 56 | 1.80E‐17 |
| Pollen allergen Car B 1 | 1545895 | Carpinus betulus | 76.8 | 56 | 1.70E‐16 |
| Car B I, partial | 402747 | Carpinus betulus | 76.8 | 56 | 2.00E‐16 |
| Major allergen variant Cor A 1.0402 | 11762102 | Corylus avellana | 68.9 | 61 | 3.30E‐16 |
| Major allergen variant Cor A 1.0403 | 11762104 | Corylus avellana | 68.9 | 61 | 3.30E‐16 |
| Major allergen Cor A 1.0401 | 5726304 | Corylus avellana | 68.9 | 61 | 3.90E‐16 |
| Major allergen variant Cor A 1.0404 | 11762106 | Corylus avellana | 68.9 | 61 | 5.30E‐16 |
| Car B I, partial | 402743 | Carpinus betulus | 73.2 | 56 | 1.40E‐15 |
| Pollen allergen Car B 1 | 1545877 | Carpinus betulus | 73.2 | 56 | 2.60E‐15 |
| Car B I | 402745 | Carpinus betulus | 73.2 | 56 | 3.10E‐15 |
| Pollen allergen | 300872535 | Ostrya carpinifolia | 73.2 | 56 | 3.10E‐15 |
| Pollen allergen Car B 1 | 1545879 | Carpinus betulus | 73.2 | 56 | 3.10E‐15 |
| Pollen allergen Car B 1 | 1545881 | Carpinus betulus | 73.2 | 56 | 3.10E‐15 |
| Pollen allergen Car B 1 | 1545875 | Carpinus betulus | 73.2 | 56 | 3.10E‐15 |
| Pollen allergen Car B 1 isoform | 167472843 | Carpinus betulus | 73.2 | 56 | 3.10E‐15 |
| Pollen allergen Car B 1 isoform | 167472841 | Carpinus betulus | 73.2 | 56 | 3.10E‐15 |
| Pollen allergen Car B 1 isoform | 167472839 | Carpinus betulus | 73.2 | 56 | 3.10E‐15 |
| Pollen allergen Car B 1 isoform | 167472837 | Carpinus betulus | 73.2 | 56 | 3.10E‐15 |
| Pollen allergen Car B 1 | 1545893 | Carpinus betulus | 66.1 | 62 | 3.10E‐15 |
| Pollen allergen Car B 1 isoform | 167472845 | Carpinus betulus | 71.4 | 56 | 3.60E‐15 |
| Pollen allergen Car B 1 | 1545889 | Carpinus betulus | 65.6 | 61 | 6.90E‐15 |
| Major allergen | 22690 | Corylus avellana | 73.2 | 56 | 8.10E‐15 |
| Major allergen | 22686 | Corylus avellana | 54.5 | 88 | 2.10E‐14 |
| Major allergen | 22688 | Corylus avellana | 71.4 | 56 | 2.50E‐14 |
| Major allergen | 22684 | Corylus avellana | 71.4 | 56 | 2.50E‐14 |
| Major allergen Cor A 1 | 1321733 | Corylus avellana | 62.3 | 61 | 1.70E‐13 |
| Ypr10 | 16555781 | Castanea sativa | 63.9 | 61 | 2.40E‐13 |
| Fag S 1 pollen allergen | 212291472 | Fagus sylvatica | 58.1 | 62 | 1.60E‐12 |
| Fag S 1 pollen allergen | 212291474 | Fagus sylvatica | 59.7 | 62 | 2.20E‐12 |
| Cherry‐allergen PRUA1 | 1513216 | Prunus avium | 35.5 | 121 | 3.60E‐12 |
| Fag S 1 pollen allergen | 212291470 | Fagus sylvatica | 58.1 | 62 | 5.00E‐12 |
| Mal D 1 | 747852 | Malus domestica | 43.3 | 90 | 1.10E‐11 |
| Major allergen Mal D 1 | 4768879 | Malus x domestica | 43.3 | 90 | 1.10E‐11 |
| Major allergen Mal D 1 | 4590382 | Malus x domestica | 43.3 | 90 | 1.10E‐11 |
| Major allergen Mal D 1 | 4590376 | Malus x domestica | 43.3 | 90 | 1.10E‐11 |
| Major allergen Mal D 1 | 4590378 | Malus x domestica | 43.3 | 90 | 1.10E‐11 |
| Major allergen Mal D 1 | 4590364 | Malus x domestica | 43.3 | 90 | 1.10E‐11 |
| Ribonuclease‐like PR‐10c | 15418742 | Malus domestica | 43.3 | 90 | 1.10E‐11 |
| Major allergen Pyr C 1 | 14423877 | Pyrus communis | 44.8 | 87 | 1.50E‐11 |
| Major cherry allergen Pru Av 1 mutant E45w, chain A, | 159162378 | Prunus avium | 34.7 | 121 | 1.50E‐11 |
| Group 2 Car B 1 = isoallergenic variant | 1008580 | Carpinus betulus | 79.5 | 39 | 1.80E‐11 |
| Major allergen | 886683 | Malus x domestica | 43.3 | 90 | 1.80E‐11 |
| Major allergen Mal D 1 | 4590388 | Malus x domestica | 35.9 | 117 | 1.80E‐11 |
| 18 Kd winter accumulating protein C | 54311119 | Morus bombycis | 38.9 | 113 | 2.10E‐11 |
| Major allergen Pru P 1 | 82492265 | Prunus persica | 34.7 | 121 | 2.10E‐11 |
| Ap15 | 862307 | Malus x domestica | 42.5 | 87 | 2.90E‐11 |
| Major allergen Mal D 1 | 2443824 | Malus x domestica | 42.5 | 87 | 2.90E‐11 |
| Major allergen D 1 | 21685277 | Malus x domestica | 42.5 | 87 | 2.90E‐11 |
| Major allergen Mal D 1 | 4590366 | Malus x domestica | 42.5 | 87 | 2.90E‐11 |
| Major allergen Mal D 1 | 4590380 | Malus x domestica | 42.2 | 90 | 2.90E‐11 |
| Major allergen Mal D 1 | 4590368 | Malus x domestica | 42.5 | 87 | 3.40E‐11 |
| Pollen allergen Que A 1 isoform | 167472849 | Quercus alba | 52.1 | 73 | 4.00E‐11 |
| 18 Kda winter accumulating protein | 610664572 | Morus alba var. atropurpurea | 38.1 | 113 | 4.00E‐11 |
| Pollen allergen Que A 1 isoform | 167472851 | Quercus alba | 57.4 | 61 | 5.50E‐11 |
| Major allergen Mal D 1 | 1313966 | Malus x domestica | 48.1 | 77 | 5.50E‐11 |
| Major allergen Mal D 1 | 27922941 | Malus x domestica | 48.1 | 77 | 5.50E‐11 |
| Putative allergen Pru Du 1.01 | 190613871 | Prunus dulcis x Prunus persica | 33.9 | 121 | 5.50E‐11 |
| Major cherry allergen Pru Av 1.0201 | 44409451 | Prunus avium | 35.9 | 117 | 6.50E‐11 |
| Ribonuclease‐like PR‐10b | 15418738 | Malus domestica | 48.1 | 77 | 7.60E‐11 |
| Ribonuclease‐like PR‐10a | 15418744 | Malus domestica | 41.4 | 87 | 7.60E‐11 |
| 18 Kd winter accumulating protein A | 54311115 | Morus bombycis | 55.4 | 56 | 1.00E‐10 |
| Major allergen Mal D 1 | 1313968 | Malus x domestica | 55.7 | 61 | 1.50E‐10 |
| Major allergen Mal D1 | 1313972 | Malus x domestica | 55.7 | 61 | 1.50E‐10 |
| Major cherry allergen Pru Av 1.0202 | 44409474 | Prunus avium | 49.3 | 67 | 1.50E‐10 |
| Major Cherry allergen Pru Av 1.0203 | 44409496 | Prunus avium | 49.3 | 67 | 1.50E‐10 |
| Group 1 Car B 1 = isoallergenic variant | 1008578 | Carpinus betulus = hornbeams, pollen, Peptide Recomb (80 aa) | 74.4 | 39 | 2.80E‐10 |
| Group 1 Car B 1 = isoallergenic variant | 1008579 | Carpinus betulus | 74.4 | 39 | 2.80E‐10 |
| Major allergen Mal D1 | 1313970 | Malus × domestica | 54.1 | 61 | 3.20E‐10 |
| Putative allergen Rub I 1 | 110180525 | Rubus idaeus | 41.6 | 77 | 6.50E‐10 |
| Major strawberry allergen Fra A 1‐C | 90185688 | Fragaria x ananassa | 45.8 | 72 | 9.90E‐10 |
| Fra A 1‐A allergen | 88082485 | Fragaria x ananassa | 45.8 | 72 | 1.00E‐09 |
| Major strawberry allergen Fra A 1‐B | 90185682 | Fragaria × ananassa | 45.8 | 72 | 1.00E‐09 |
| Major strawberry allergen Fra A 1‐D | 90185684 | Fragaria × ananassa | 45.8 | 72 | 1.00E‐09 |
| Pollen allergen Que A 1 isoform | 167472847 | Quercus alba | 53.2 | 62 | 1.20E‐09 |
| Major allergen protein homolog | 2677826 | Prunus armeniaca | 51.8 | 56 | 1.60E‐09 |
| Pathogenesis‐related protein | 18744 | Glycine max | 47.8 | 69 | 1.90E‐09 |
| PR10 protein | 565380238 | Solanum lycopersicum | 34.4 | 93 | 1.90E‐09 |
| TSI‐1 protein | 2887310 | Solanum lycopersicum | 34.4 | 93 | 2.00E‐09 |
| Chain B, crystal structure of the strawberry pathogenesis‐related 1 | 550544407 | Fragaria x ananassa | 44.4 | 72 | 2.20E‐09 |
| Major strawberry allergen Fra A 1‐E | 90185692 | Fragaria x ananassa | 44.4 | 72 | 2.20E‐09 |
| Cas S 1 pollen allergen | 212291464 | Castanea sativa | 58.5 | 53 | 2.60E‐09 |
| PR10 Protein | 565380268 | Solanum lycopersicum | 35.4 | 96 | 3.60E‐09 |
| Cas S 1 pollen allergen | 212291468 | Castanea sativa | 56.6 | 53 | 4.20E‐09 |
| Cas S 1 pollen allergen | 212291466 | Castanea sativa | 56.6 | 53 | 5.80E‐09 |
| Pathogenesis‐related protein 10 | 60418924 | Vigna radiata | 43.8 | 73 | 9.20E‐09 |
| Bet V 1 related allergen | 281552898 | Actinidia deliciosa | 42.5 | 80 | 2.90E‐08 |
| Ara H 8 allergen | 37499626 | Arachis hypogaea | 42.5 | 73 | 8.80E‐08 |
| Ara H 8 allergen isoform 3 | 169786740 | Arachis hypogaea | 42.5 | 73 | 8.80E‐08 |
| Bet V 1 related allergen | 281552896 | Actinidia chinensis | 34.7 | 95 | 3.20E‐07 |
| Pathogenesis‐related protein 10 | 110676574 | Arachis hypogaea | 40.3 | 72 | 4.40E‐07 |
| Ara H 8 allergen isoform | 145904610 | Arachis hypogaea | 33.3 | 81 | 9.60E‐07 |
| PRP‐like protein | 302379159 | Daucus carota | 34.6 | 78 | 5.70E‐06 |
| PRP‐like protein | 302379157 | Daucus carota | 34.6 | 78 | 6.60E‐06 |
| PRP‐like protein | 302379147 | Daucus carota | 33.3 | 78 | 1.10E‐05 |
| PRP‐like protein | 302379149 | Daucus carota | 33.3 | 78 | 1.10E‐05 |
| PRP‐like protein | 302379155 | Daucus carota | 32.1 | 78 | 2.80E‐05 |
| Pathogenesis‐related protein‐like protein 1 | 19912791 | Daucus carota | 33.3 | 78 | 2.80E‐05 |
| Major allergen Api G | 14423646 | Apium graveolens | 32.9 | 76 | 2.90E‐05 |
| PRP‐like protein | 302379151 | Daucus carota | 32.1 | 78 | 5.30E‐05 |
| PRP‐like protein | 302379153 | Daucus carota | 32.1 | 78 | 5.30E‐05 |
| Cytokinin‐specific binding protein | 4190976 | Vigna radiata | 31.3 | 96 | 3.00E‐03 |
| Per A 4 allergen | 60678787 | Periplaneta americana | 39.6 | 48 | 6.30E+00 |
| Vacuolar serine protease | 12005497 | Penicillium oxalicum | 35.8 | 53 | 7.50E+00 |

Rows indicated by shaded cells in the E‐value column have a percent identity <35 and/or an overlap <80.

Table 3

The best (lowest E‐value) 15 alignments comparing the hypothetical protein containing the Bet v 1 epitopes only with the allergen database

Database match description | GI number | Species | % Identity | Overlap | E‐value
1‐Sc1 | 534910 | Betula pendula | 30.4 | 79 | 6.80E‐03
Bet v 1 c | 452730 | Betula pendula | 33.3 | 63 | 8.90E‐03
Bet v 1 k | 452742 | Betula pendula | 33.3 | 63 | 8.90E‐03
Bet v 1b | 450885 | Betula pendula | 33.3 | 63 | 8.90E‐03
Major pollen allergen Bet v 1‐M/ | 1168710 | Betula pendula | 33.3 | 63 | 8.90E‐03
Major allergen Bet v 1 | 1321724 | Betula pendula | 33.3 | 63 | 8.90E‐03
Major allergen Bet v 1 | 1321718 | Betula pendula | 33.3 | 63 | 8.90E‐03
Pollen allergen Betv1, isoform at59 | 4006961 | Betula pendula | 33.3 | 63 | 8.90E‐03
Pollen allergen, Betv1 | 4376221 | Betula pendula | 33.3 | 63 | 8.90E‐03
Pollen allergen, Betv1 | 4376219 | Betula pendula | 33.3 | 63 | 8.90E‐03
Car b I, partial | 402747 | Carpinus betulus | 32.1 | 56 | 1.20E‐02
Pollen allergen Car b 1 | 1545895 | Carpinus betulus | 30.2 | 63 | 1.20E‐02
Pollen allergen Betv1, isoform at7 | 4006967 | Betula pendula | 33.9 | 56 | 1.50E‐02
Major allergen Bet v 1 | 1321722 | Betula pendula | 31.7 | 63 | 1.50E‐02
Bet v 1 e | 452734 | Betula pendula | 35.7 | 56 | 1.80E‐02

Rows indicated by shaded cells in the E‐value column have a percent identity <35 and an overlap <80.

To determine how the Bet v 1 epitope affected the FASTA E‐value range, the entire native protein sequence encompassing the Bet v 1 epitope residues (56 amino acids; Fig. 1D) was also loaded into random sequence, constructed in the same way as the other hypothetical proteins, and compared against the allergen database. The expectation was that homologues of Bet v 1 would be more easily distinguished because a much greater proportion of the allergen was represented in the hypothetical sequence. The sequence consisted of 35% Bet v 1 sequence and produced a greater total alignment count (169) than the epitope residues alone. As before, there were only two alignments above an E‐value of 0.1. The main difference across the alignments was that most of the Bet v 1 isoforms represented in the allergen database were observed (63), and many more representative isoforms from Carpinus and Caucus, for example, populated the alignment list (Table 2). The main impact on the alignment metrics was a much more significant best E‐value (10⁻²² vs. 10⁻³) when the larger 56 amino acid region was used. This is expected with FASTA when a large localized portion of the sequence is an exact match [10]. The cut‐off between homologues and nonhomologous sequences was also more in line with an established threshold for evaluating allergens, an E‐value of 3.9 × 10⁻⁷ [27], when the larger 56 amino acid Bet v 1 section was used (Table 2).
One last example of epitope detection was examined based on the recent discovery that nonhomologous proteins may cross‐react [20]. The synthetic peptides AH2‐1, AH2‐3a, and AH2‐3c aligned with Ara h 2, and these peptides showed cross‐reactivity by IgE‐binding inhibition [20]. These same three peptides were loaded into random sequence to make a 172 amino acid sequence for comparison to the allergen database. The bioinformatic results show that Ara h 2 was clearly identified (Table 1); similarity with Ara h 1 and Ara h 3 was not identified, as only four proteins aligned in total and all of them were Ara h 2 isoforms. The additional hypothetical sequence containing a contiguous 66 amino acid section of the Ara h 2 protein (Fig. 1B) was more effective in identifying Ara h 1 and Ara h 3 (Supporting Information Table 2). The only other peanut allergen identified by this additional comparison and studied by Bublin et al. was Ara h 6. A primary reason a larger portion of the Ara h 2 protein was required to identify the nonhomologous but cross‐reactive proteins is the likely presence of undiscovered epitopes that are shared among these proteins. In contrast, the disparate epitope locations across the three proteins appear to have limited the ability to clearly identify any shared similarity that may exist for this unique example of nonhomologous cross‐reactivity. For example, the epitope regions in Ara h 1 and Ara h 3 are located far from the N‐terminus [20]: the Ara h 1 region begins at amino acid position 584 and the Ara h 3 region at position 328, whereas the Ara h 2 peptide AH2‐1 begins very near the N‐terminal end, at position 26.
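The construction used throughout the Results, in which peptides are loaded into filler sequence at fixed positions, can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's pipeline: the function names, the seeded random filler (the study drew its filler from alternative‐frame sequence of a human alpha‐amylase gene), the placeholder peptides, and the second and third start positions are all hypothetical. Only the 172‐residue length and the start position 26 for AH2‐1 come from the text.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def random_filler(length, seed=0):
    """Return a reproducible random amino acid string (stand-in filler)."""
    rng = random.Random(seed)
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def embed_epitopes(filler, epitopes):
    """Overwrite filler residues with peptides at 1-based start positions.

    `epitopes` is a list of (start_position, peptide) pairs. The total
    length is unchanged, so the spacing between peptides is preserved.
    """
    seq = list(filler)
    occupied = set()
    for start, peptide in epitopes:
        begin = start - 1
        end = begin + len(peptide)
        if begin < 0 or end > len(seq):
            raise ValueError(f"peptide at {start} does not fit in the filler")
        span = set(range(begin, end))
        if occupied & span:
            raise ValueError(f"peptide at {start} overlaps another peptide")
        occupied |= span
        seq[begin:end] = peptide
    return "".join(seq)


def epitope_fraction(total_length, epitopes):
    """Fraction of the hypothetical sequence made up of epitope residues."""
    return sum(len(peptide) for _, peptide in epitopes) / total_length


# Placeholder peptides (NOT the real AH2 sequences); positions 70 and 120
# are hypothetical, position 26 is the AH2-1 start given in the text.
epitopes = [(26, "W" * 10), (70, "Y" * 12), (120, "F" * 8)]
hypothetical = embed_epitopes(random_filler(172, seed=42), epitopes)
print(len(hypothetical), f"{epitope_fraction(172, epitopes):.0%}")
```

The resulting string would then be written out as a query and searched against an allergen database with a FASTA program; that step is not shown here.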

Discussion

The premise behind comparing novel protein sequences to allergens has always been based on either identifying a known allergen or identifying a risk of allergic cross‐reactivity. Presumably, unexpected cross‐reactivity with a novel protein, or newly discovered protein, could happen in one of three ways. First, a portion of a known allergen could be inadvertently used to construct a synthetic novel protein. Second, a novel protein is derived from a species not yet identified as an allergen source and the novel protein is unexpectedly similar to its homologous counterpart, a known allergen. The third, and most unlikely scenario, involves the unintentional modification of a protein that results in enough similarity to share IgE binding with an allergen. For all practical purposes, even heavily modified novel proteins have to retain their function and structure to the extent that they are highly similar to the native protein expressed in the source organism. This level of retained, native similarity to a known structural class of protein makes it unlikely that random modification would somehow create an unexpected, but immunologically relevant level of similarity with an allergen. Otherwise, it is straightforward to identify the taxonomic class of the source organism and the structural family and function of the modified novel protein. Well‐characterized allergen epitopes were used to examine the sensitivity of bioinformatics as a screening tool. The goal was to examine the level of sensitivity for detecting known cross‐reactivity potential by focusing on epitope sequence isolated from the rest of the parental core sequence. The B‐cell epitope was considered the minimum level of biologically relevant sequence that could identify the parental allergen or homologues for which cross‐reactivity risk could be observed. The results demonstrated that FASTA retained its usefulness in identifying localized areas of sequence similarity even when the localized areas are small. 
This is an extension of the concept discussed by Bannon and Ogawa [4] and experimentally tested using maize allergen sequences [27], as well as with motif‐centric experiments in which shared sequence across proteins is predicted first and then tested for immune reactivity [28]. In this article, the Bet v 1 exercise was the best example of detecting homology at a very low proportion of native allergen sequence, with only 16 total residues representing a single discontinuous epitope (Fig. 1D). The 16 residues, although not contiguous within the hypothetical sequence, created enough of an exact match within the larger sequence to identify Bet v 1 (Table 1). This is because the spacing along the length of the hypothetical sequence was the same as in endogenous Bet v 1 and thus still allowed FASTA to identify this unique pattern of similarity; any other order would lower the similarity score. In this regard, the results hint at the importance of the underlying unique structure of the whole protein, and of the inherent spacing of residues along the linear sequence, for identifying similarity between sequences with FASTA. The summary statistic, the E‐value, performed well in identifying both the parental sequence from which the epitope was derived and homologous proteins with known cross‐reactivity. This was improved upon (lower E‐values and more homologous proteins from related species) when the whole Bet v 1 region was loaded into the random sequence, or when there were many more residues, as there are for Ara h 1 and tropomyosin. In the case of tropomyosin, the epitopes are conserved and unique to such a degree that the E‐value range remained virtually unchanged even when the epitopes were inserted into a much longer (700 amino acids) hypothetical sequence (data not shown). In contrast to only lengthening the random amino acid content, a reduction from six to four total epitopes (two replaced with random sequence) was used as a separate comparison.
This shifted the E‐value range upward: the lowest value moved from 3.4 × 10⁻²⁷ to 2.1 × 10⁻¹⁵ and the highest from 1.5 × 10⁻⁹ to 1 × 10⁻⁶, plus one additional alignment at E‐value = 7.4. The reduction in epitopes was impactful, but with such a highly conserved protein there was no reduction in the ability to detect homologues. The sensitivity in detecting similarity was much better using an E‐value than other metrics such as percent identity, or a combination of percent identity and overlap length (Table 2). For example, the E‐value was consistent in grouping known homologous Bet v 1 sequences, in contrast to the inconsistent designation by percent identity and overlap. Those alignments with E‐values below 10⁻¹ that also have <35% identity and an overlap length <80 are shaded in gray in Table 2. An E‐value of 10⁻¹ was taken as the cutoff for surveying the accuracy of all alignments with regard to the percent identity and overlap metrics. Clearly, members of the Bet v 1 family were identified by >35% identity and 80 or more amino acid overlap (Codex metrics), but these metrics were comparatively poor at identifying all of the homologous proteins. In the most extreme example, where only the Bet v 1 epitope residues were loaded into the hypothetical sequence, no alignments exceeding Codex metrics were observed below an E‐value of 10. Interestingly, the threshold in E‐value observed with proteins from carrot (Daucus carota) is in line with the threshold of 3.9 × 10⁻⁷ calculated by Silvanovich et al. [27], but ranges just past this cutoff, with E‐values of 5.3 × 10⁻⁵ to 5.7 × 10⁻⁶. These values are very close to one another considering that the E‐value is typically judged on a log scale, and both would be considered statistically significant [29].
However, the alignment of the hypothetical Bet v 1 protein with Daucus displays percent identity and overlap length values just below the thresholds of 35 and 80, respectively; this is an example in which those metrics [12] do not identify an alignment that is important from a screening perspective (Fig. 2). This would be important to recognize with regard to cross‐reactivity given the recent confirmation of IgE reactivity to the Dau c 1 protein in Bet v 1‐sensitized patients [30]. A lack of vicilin (vicilin‐like) homologues to the Ara h 1 hypothetical sequence was noted. Further evaluation showed that some expected homologues, such as those from Lupinus and Pisum, were not present until the display limit was lowered to 25% identity. Yet these same Lupinus and Pisum allergens had E‐values ranging from ∼1 × 10⁻¹⁰ to 1 × 10⁻¹⁶, further indicating that percent identity can be disconnected from the more relevant overall similarity identified by the E‐value.
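To make the contrast between the two screening criteria concrete, the sketch below applies both to a handful of rows taken from Table 2. The `Alignment` class and the function names are illustrative assumptions. The Codex rule is encoded as >35% identity over an overlap of 80 or more residues, as stated in the text, and the E‐value cutoff of 1 × 10⁻⁵ is the upper end of the range given in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    description: str
    percent_identity: float
    overlap: int
    e_value: float


def meets_codex(a, min_identity=35.0, min_overlap=80):
    """Codex-style rule: >35% identity over an overlap of 80 or more residues."""
    return a.percent_identity > min_identity and a.overlap >= min_overlap


def meets_e_value(a, threshold=1e-5):
    """E-value rule: alignment is flagged when its E-value is below the cutoff."""
    return a.e_value < threshold


# Values copied from Table 2 (identity %, overlap, E-value).
rows = [
    Alignment("Major allergen Mal D 1 (Malus x domestica)", 42.5, 87, 2.9e-11),
    Alignment("Pathogenesis-related protein 10 (Vigna radiata)", 43.8, 73, 9.2e-09),
    Alignment("Ara H 8 allergen (Arachis hypogaea)", 42.5, 73, 8.8e-08),
    Alignment("PRP-like protein (Daucus carota)", 34.6, 78, 5.7e-06),
    Alignment("Cytokinin-specific binding protein (Vigna radiata)", 31.3, 96, 3.0e-03),
    Alignment("Per A 4 allergen (Periplaneta americana)", 39.6, 48, 6.3e+00),
]

# Alignments the E-value rule flags but the Codex-style rule would miss.
missed_by_codex = [
    a.description for a in rows if meets_e_value(a) and not meets_codex(a)
]
for description in missed_by_codex:
    print(description)
```

On these rows, the carrot, peanut, and mung bean alignments pass the E‐value rule while falling short of the identity or overlap requirement, which is the pattern the text describes.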
Figure 2

Alignment of the Bet v 1 epitope region (loaded into random amino acid sequence) with a PRP protein from carrot (Daucus carota).

The more simplistic Codex metrics also fail to discriminate the tropomyosin and Ara h 2 epitope regions (Supporting Information Tables 1 and 2). Alignments with the Pen a 1 hypothetical protein, for example, all have significant E‐values, but the percent identity values decline to the point where percent identity and overlap length do not coincide with the obvious homology among the tropomyosins. This is also highlighted by the Bet v 1 analysis, where the recently confirmed cross‐reactivity with Vig r 6 and Vig r 1 [31] from Vigna radiata was detected using an E‐value but would not have been noted using Codex metrics (Table 2). The impact of lower percentages of epitope residues, relative to the rest of the allergen, is highlighted even more clearly by the inability to identify homologues for the Bet v 1 epitope. Compared with using E‐values, there were no alignments among the 14 best results in which either 35% identity or an overlap of 80 amino acids was observed (Table 3). An E‐value range of 1 × 10⁻⁵ to 1 × 10⁻⁶ has been identified that delineates significant allergen similarity, as modeled for the Ara h 1, Pen a 1, and Bet v 1 (region) epitopes. This E‐value range was selected based on the observation that E‐values lower than 1 × 10⁻⁵ were exclusively observed for the Pen a 1 and Ara h 1 hypothetical sequences (Table 1). In addition, the hypothetical sequence containing the Bet v 1 epitope region displayed a breakpoint between cross‐reactive and other, nonhomologous sequences near this E‐value (Table 2). When modeling shorter proteins (the Bet v 1 protein is only 160–161 amino acids) with very limited epitope sequence in a discontinuous structure, a higher threshold (E‐value = 1 × 10⁻³) may be appropriate to identify similarity.
There is some minimum level of sequence information below which FASTA cannot consistently produce the same threshold E‐value as alignments based on full‐length sequences. The combination of a short protein and a single, dispersed epitope accounts for this impact. Modeling of nonhomologous Ara h 2 cross‐reactive proteins [20] requires further clinical confirmation and further bioinformatic modeling in order to identify a meaningful E‐value threshold. In addition, E‐values based solely on Bet v 1 and a single discontinuous cross‐reactive epitope seem too indistinct to be considered predictive at this time. It is likely that the epitopes for both Bet v 1 and the Ara h 1/Ara h 2/Ara h 3 complex are so uniquely distributed that they produce bioinformatic outcomes distinct from modeling observations of other allergens. Thus, a single bioinformatic threshold for similarity based on FASTA E‐values would be unlikely to hold true for other discontinuous epitopes or nonhomologous proteins. This points to limitations in trying to fit a single threshold value across the many disparate groups of allergens. In the case of nonhomologous proteins, the cross‐reactive epitopes may arise from very subtly distinct secondary and tertiary protein structures that simply cannot be resolved in the experiment herein, where the epitopes have been taken away from the contextual core sequence that would otherwise help identify homologous regions of proteins (i.e., more numerous local aligning sequences). In combination with a well‐curated allergen database, allergen sequence screening benefits from an observed reduction in false‐positive alignments when using an E‐value of 1 × 10⁻⁵ to 1 × 10⁻⁶. More important from a safety perspective, false negatives based on percent identity were avoided by using an E‐value.
A range, rather than a single value, is appropriate due to the unique nature of individual proteins and cross‐reactive allergen groups, for which thresholds are expected to vary to some degree. Certainly, for full‐length proteins, an E‐value of 1 × 10⁻⁵ would be an effective threshold. Nevertheless, the overall conclusion is consistent with the basic tenet of allergen cross‐reactivity that there is no known risk of cross‐reactivity without homology to a known allergen. FASTA has the capability to serve this screening purpose given the appropriate application of the algorithm. The key is the use of alignment parameters that are based on modeling of known allergens (both epitopes and core structure) and that incorporate a relevant criterion of statistical significance (i.e., the E‐value). Taken together, epitopes do appear to weight a random amino acid string enough to identify significant similarity and potential cross‐reactivity. In reality, allergens do not consist of just epitopes; they have core sequence that gives them their unique secondary and tertiary structure. Core sequence homology among related organisms is key to the FASTA calculation of probable similarity because the algorithm was designed to help identify conserved domains [10]. In terms of using a dedicated allergen database, shared core sequence is the basis for relatedness among allergens [32, 33]. When at least some of the core sequence is present (e.g., the Bet v 1 epitope region) or when epitopes are relatively numerous, FASTA easily identified homologues and produced an E‐value threshold that was indicative of statistically significant alignments according to FASTA expectations. The bioinformatic screening of sequences based on FASTA E‐values would be consistent with the intent of the FASTA statistical metrics for homology and an improvement over shared identity and overlap length.
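Because the proposed threshold is a range and E‐values are judged on a log scale, a screening step might report where an alignment falls relative to that range rather than a single pass/fail result. The sketch below is one hypothetical way to express this; the function names and the three labels are assumptions, while the bounds 1 × 10⁻⁶ and 1 × 10⁻⁵ and the example E‐values come from the text and Table 2.

```python
import math


def log10_gap(e1, e2):
    """Distance between two E-values in orders of magnitude."""
    return abs(math.log10(e1) - math.log10(e2))


def classify_e_value(e_value, lower=1e-6, upper=1e-5):
    """Place an E-value relative to the proposed threshold range.

    'below_range': more significant than the whole range
    'in_range'   : inside the range, where closer inspection is warranted
    'above_range': less significant than the whole range
    """
    if e_value <= 0:
        raise ValueError("E-value must be positive")
    if e_value < lower:
        return "below_range"
    if e_value <= upper:
        return "in_range"
    return "above_range"


# E-values from Table 2: an apple Mal d 1 row, the two ends of the carrot
# range quoted in the Discussion, and the cytokinin-specific binding protein.
for e in (2.9e-11, 5.7e-06, 5.3e-05, 3.0e-03):
    print(f"{e:.1e}", classify_e_value(e))

# Gap between the 3.9e-7 threshold of Silvanovich et al. and the lowest
# carrot E-value, in orders of magnitude.
print(round(log10_gap(3.9e-7, 5.7e-6), 2))
```

Reporting the position relative to the range keeps borderline alignments, such as the carrot PRP‐like proteins, visible for the detailed follow‐up examination the text recommends.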
Examination of the details of all alignments within an allergen class is always advised once screening for homology has been performed. The study herein complements previous work [27, 34, 35, 36] intended to identify biologically relevant metrics for screening novel proteins. The first attempts at creating regulatory thresholds for novel protein screening were based on percent identity and sequence overlap length, and they have persisted to the present day [12, 13]. These values were primarily based on Bet v 1 homology structure, but it was unclear how they would perform using curated allergen databases such as FARRP [16], which were not available at the time. There are more sophisticated informatics methods that have been adapted for specific antigens [37]. For deep analyses, it is unlikely that one method can completely capture the variability across all known homologues of a given structural class [38], much less all of the different classes in an allergen database. Yet a local‐alignment approach based on shared similarity and the use of an accepted general standard is a way forward, since it underpins the very use of algorithms such as FASTA and the similar BLAST. In effect, by using the summary E‐value statistic, the underlying percent identity and alignment length are incorporated into an analysis of similarity between proteins. As presented, the concept of building an analysis of cross‐reactivity among known allergens offers a “from the ground up” approach that can be extended. It has the potential to support modeling of the different allergen structural groups to identify meaningful thresholds for shared similarity, as it remains to be determined whether there is a single shared feature of proteins that makes them allergens.
The goal is to support the growing understanding of how allergens are similar in structure, function, and their propensity to cross‐react, in order to work toward a predictive approach that is both conservative and accurate.

No conflict of interest. Employed solely by agricultural biotechnology company, Syngenta Crop Protection, LLC.

Supplemental Figure 1. The alpha‐amylase gene from human (NCBI GenBank identification 565263) was used as the source for the alternative‐frame open reading frame sequences used as filler sequence to prepare each of the hypothetical sequences.

Supplementary Tables
  32 in total

Review 1.  Evolutionary biology of plant food allergens.

Authors:  Christian Radauer; Heimo Breiteneder
Journal:  J Allergy Clin Immunol       Date:  2007-08-08       Impact factor: 10.793

2.  The use of E-scores to determine the quality of protein alignments.

Authors:  Andre Silvanovich; Gary Bannon; Scott McClain
Journal:  Regul Toxicol Pharmacol       Date:  2009-02-24       Impact factor: 3.271

3.  Allergens are distributed into few protein families and possess a restricted number of biochemical functions.

Authors:  Christian Radauer; Merima Bublin; Stefan Wagner; Adriano Mari; Heimo Breiteneder
Journal:  J Allergy Clin Immunol       Date:  2008-04       Impact factor: 10.793

4.  Empirical statistical estimates for sequence similarity searches.

Authors:  W R Pearson
Journal:  J Mol Biol       Date:  1998-02-13       Impact factor: 5.469

Review 5.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors:  S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal:  Nucleic Acids Res       Date:  1997-09-01       Impact factor: 16.971

6.  Improved tools for biological sequence comparison.

Authors:  W R Pearson; D J Lipman
Journal:  Proc Natl Acad Sci U S A       Date:  1988-04       Impact factor: 11.205

7.  A novel approach for investigation of specific and cross-reactive IgE epitopes on Bet v 1 and homologous food allergens in individual patients.

Authors:  Diana Mittag; Vincent Batori; Philipp Neudecker; Regina Wiche; Esben P Friis; Barbara K Ballmer-Weber; Stefan Vieths; Erwin L Roggen
Journal:  Mol Immunol       Date:  2006-02       Impact factor: 4.407

Review 8.  Molecular properties of food allergens.

Authors:  Heimo Breiteneder; E N Clare Mills
Journal:  J Allergy Clin Immunol       Date:  2005-01       Impact factor: 10.793

9.  Dominating IgE-binding epitope of Bet v 1, the major allergen of birch pollen, characterized by X-ray crystallography and site-directed mutagenesis.

Authors:  Michael D Spangfort; Osman Mirza; Henrik Ipsen; R J Joost Van Neerven; Michael Gajhede; Jørgen N Larsen
Journal:  J Immunol       Date:  2003-09-15       Impact factor: 5.422

10.  Bioinformatic screening and detection of allergen cross-reactive IgE-binding epitopes.

Authors:  Scott McClain
Journal:  Mol Nutr Food Res       Date:  2017-03-27       Impact factor: 5.914
