Giulia Pagliarani, Roberta Paris, Anna Rosa Iorio, Stefano Tartarini, Stefano Del Duca, Paul Arens, Sander Peters, Eric van de Weg.
Abstract
European populations exhibit progressive sensitisation to food allergens, and apples are one of the foods for which sensitisation is observed most frequently. Apple cultivars vary greatly in their allergenic characteristics, and a better understanding of the genetic basis of low allergenicity may therefore allow allergic individuals to increase their fruit intake. Mal d 1 is considered to be a major apple allergen, and this protein is encoded by the most complex allergen gene family. Not all Mal d 1 members are likely to be involved in allergenicity. Therefore, additional knowledge about the existence and characteristics of the different Mal d 1 genes is required. In the present study, we investigated the genomic organisation of the Mal d 1 gene cluster in linkage group 16 of apple through the sequencing of two bacterial artificial chromosome clones. The results provided new information on the composition of this family with respect to the number and orientation of functional and pseudogenes and their physical distances. The results were compared with the apple and peach genome sequences that have recently been made available. A broad analysis of the whole apple genome revealed the presence of new genes in this family, and a complete list of the observed Mal d 1 genes is supplied. Thus, this study provides an important contribution towards a better understanding of the genetics of the Mal d 1 family and establishes the basis for further research on allelic diversity among cultivars in relation to variation in allergenicity. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s11032-011-9588-4) contains supplementary material, which is available to authorized users.
Year: 2011 PMID: 22408383 PMCID: PMC3285766 DOI: 10.1007/s11032-011-9588-4
Source DB: PubMed Journal: Mol Breed ISSN: 1380-3743 Impact factor: 2.589
Description of the predicted ORFs in the MC-12 and MC-20 sequences (FN823234 and FN823235, respectively) analysed with BLASTP software
| ORF | Protein ID | Dir (a) | Start–end position (from T7-end) | Most similar sequence ID | E-value | Description (b) | Source | Conserved domains | Mal d 1 isoform (c) |
|---|---|---|---|---|---|---|---|---|---|
| ORF1 | CBL94129 | ↓ | 520–1891 | AAT85304 | 4.00E-92 | Reverse transcriptase (RNA-dependent DNA polymerase) domain containing protein | | RVE superfamily (integrase domain) | |
| ORF2 | CBL94130 | ↓ | 2837–3941 | | | Hypothetical protein with no significant similarities | | | |
| ORF3 | CBL94131 | ↓ | 5458–5673 | | | Hypothetical protein with no significant similarities | | | |
| ORF4 | CBL94132 | ↓ | 12339–13767 | AAO45751 | 3.00E-11 | Gag-protease polyprotein (retrotransposon protein) | | Gag proteins from LTR retrotransposons | |
| ORF5 | CBL94133 | ↓ | 14727–16116 | ABA98191 | | Retrotransposon protein, putative, Ty3-gypsy subclass | | RT_LTR (reverse transcriptases from LTR retrotransposons); RNase H superfamily | |
| ORF6 | CBL94134 | ↑ | 19701–20302 | AAX18298 | 1.00E-84 | | | Polyketide_cyc2 superfamily | |
| ORF7 | CBL94135 | ↓ | 22203–22903 | | | Hypothetical protein with no significant similarities | | | |
| ORF8 | CBL94136 | ↑ | 24478–25500 | | | Hypothetical protein with no significant similarities | | | |
| ORF9 | CBL94137 | ↓ | 29603–30001 | | | Hypothetical protein with no significant similarities | | | |
| ORF10 | CBL94138 | ↓ | 30450–31410 | AAS00052 | 1.00E-86 | | | Polyketide_cyc2 superfamily | |
| ORF11 | CBL94139 | ↑ | 33218–37315 | AAP52115 | 8.00E-57 | Retrotransposon protein, putative, Ty1-copia subclass | | RVE superfamily (integrase domain); RVT_2 superfamily (reverse transcriptases domain) | |
| ORF12 | CBL94140 | ↑ | 39819–42609 | | | Hypothetical protein with no significant similarities | | | |
| ORF13 | CBL94141 | ↓ | 45547–46197 | AAD26548 | 2.00E-85 | | | Polyketide_cyc2 superfamily | |
| ORF14 | CBL94142 | ↑ | 50864–51162 | | | Hypothetical protein with no significant similarities | | | |
| ORF15 | CBL94143 | ↑ | 52015–52470 | CAN81839 | 9.00E-42 | Hypothetical protein | | | |
| ORF16 | CBL94144 | ↑ | 53617–54123 | AAG51247 | 2.00E-39 | copia-type polyprotein, putative (retrotransposon protein) | | | |
| ORF17 | CBL94145 | ↓ | 59754–60614 | ACE80952 | 4.00E-46 | | | Polyketide_cyc2 superfamily | |
| ORF18 | CBL94146 | ↓ | 63792–65322 | | | Hypothetical protein with no significant similarities | | | |
| ORF19 | CBL94147 | ↑ | 66153–66580 | | | Hypothetical protein with no significant similarities | | | |
| ORF20 | CBL94148 | ↓ | 68018–68719 | AAS00053 | 6.00E-92 | | | Polyketide_cyc2 superfamily | |
| ORF21 | CBL94149 | ↑ | 70217–70631 | | | Hypothetical protein with no significant similarities | | | |
| ORF22 | CBL94150 | ↑ | 71128–72667 | | | Hypothetical protein with no significant similarities | | | |
| ORF23 | CBL94151 | ↓ | 73771–74403 | AAX18303 | 2.00E-84 | | | Polyketide_cyc2 superfamily | |
| ORF24 | CBL94152 | ↑ | 75956–76767 | | | Hypothetical protein with no significant similarities | | | |
| ORF25 | | ↓ | 78571–78853 | | | | | | |
| ORF26 | CBL94154 | ↑ | 79009–83451 | XP_002280852 | 0.0 | Hypothetical protein | | RVE superfamily (integrase domain); RVT_2 superfamily (reverse transcriptases domain) | |
| ORF27 | CBL94155 | ↓ | 87386–89959 | ABA95230 | 1.00E-51 | Retrotransposon protein, putative | | RVT_2 superfamily (reverse transcriptases domain) | |
| ORF28 | CBL94156 | ↓ | 92219–92826 | AAX18306 | 5.00E-85 | | | Polyketide_cyc2 superfamily | |
| ORF29 | | ↓ | 94875–95346 | | | | | | |
| ORF30 | CBL94158 | ↓ | 95952–96348 | | | Hypothetical protein with no significant similarities | | | |
| ORF31 | CBL94159 | ↓ | 96909–106394 | ABA94905 | 2.00E-17 | Retrotransposon protein | | Transposase domain | |
| ORF32 | CBL94160 | ↓ | 107170–109602 | CAB39638 | 4.00E-33 | RNA-directed DNA polymerase-like protein | | RT_like superfamily (reverse transcriptases domain) | |
| ORF33 | CBL94161 | ↓ | 114358–115592 | CAN79553 | 4.00E-71 | Hypothetical protein | | HMA, heavy-metal-associated domain | |
| ORF34 | CBL94162 | ↑ | 116800–121966 | NP_196685 | 0.0 | Transducin family protein/WD-40 repeat family protein | | WD40 domain | |
| | | | | | | | | | |
| ORF35 | CBL94163 | ↓ | 06–1982 | ABD28426 | 4E-179 | RNA-directed DNA polymerase (Reverse transcriptase) | | RT_nLTR (reverse transcriptase from non-LTR retrotransposons) | |
| ORF36 | CBL94164 | ↑ | 2242–5241 | | | Hypothetical protein with no significant similarities | | | |
| ORF37 | CBL94165 | ↓ | 5409–5900 | ABB00038 | 4E-47 | Reverse transcriptase family member | | | |
| ORF38 | CBL94166 | ↓ | 9631–11289 | CAN74865 | 1E-35 | Hypothetical protein | | PMD, plant mobile domain | |
| ORF39 | CBL94167 | ↓ | 13865–15433 | | | Hypothetical protein with no significant similarities | | | |
| ORF40 | CBL94168 | ↑ | 16118–17227 | CAN69026 | 7E-09 | Hypothetical protein | | RVE superfamily (integrase domain) | |
| ORF41 | CBL94169 | ↑ | 17718–23267 | CAN62188 | 7E-38 | Hypothetical protein | | | |
| ORF42 | CBL94170 | ↓ | 24714–25663 | | | Hypothetical protein with no significant similarities | | | |
| ORF43 | CBL94171 | ↑ | 27463–30549 | CAN67317 | 3E-79 | Hypothetical protein | | RVT_2 (reverse transcriptase) superfamily | |
| ORF44 | CBL94172 | ↓ | 36000–40765 | ABA98193 | 0.0 | Retrotransposon protein, putative, Ty3-gypsy subclass | | RT_LTR: reverse transcriptases (RTs) from retrotransposons; RVP_2: single domain aspartyl proteases from retrotransposons; Retrotrans_gag: Gag or capsid-like proteins from LTR retrotransposons; Chromo (CHRomatin Organisation MOdifier) domain; RVT_1: reverse transcriptase domain | |
| ORF45 | CBL94173 | ↑ | 43900–44379 | AAX18324 | 7E-85 | | | Polyketide_cyc2 superfamily | |
| ORF46 | CBL94174 | ↓ | 60900–61379 | AAX18307 | 5E-86 | | | Polyketide_cyc2 superfamily | |
| ORF47 | CBL94175 | ↓ | 63263–63742 | AAX20996 | 5E-87 | | | Polyketide_cyc2 superfamily | |
| ORF48 | CBL94176 | ↑ | 66110–72637 | | | Hypothetical protein with no significant similarities | | | |
| ORF49 | CBL94177 | ↓ | 74151–74630 | AAX18309 | 6E-87 | | | Polyketide_cyc2 superfamily | |
| ORF50 | CBL94178 | ↑ | 75936–77515 | | | Hypothetical protein with no significant similarities | | | |
| ORF51 | CBL94179 | ↓ | 77779–79873 | XP_002271465 | 5E-73 | Hypothetical protein | | DUF789: conserved domain found in several plant proteins of unknown function | |
| ORF52 | CBL94180 | ↑ | 83182–85403 | BAB10341 | 3E-41 | Drought-induced protein Di19-like protein | | Drought-induced 19 protein (Di19) domain | |
| ORF53 | CBL94181 | ↑ | 87649–91977 | | | Hypothetical protein with no significant similarities | | | |
| ORF54 | CBL94182 | ↓ | 102592–103322 | | | Hypothetical protein with no significant similarities | | | |
| ORF55 | CBL94183 | ↓ | 114066–117194 | XP_002519174 | 0.0 | Sensor histidine kinase, putative | | REC Response_reg, Response regulator receiver domain; myb-like DNA-binding domain, SHAQKYF class | |
| ORF56 | CBL94184 | ↑ | 117937–126807 | XP_002519175 | 0.0 | Protein COBRA precursor, putative | | COBRA superfamily | |
(a) Direction of the gene. Arrows point downward when the gene runs from the T7-end to the Sp6-end, and upward when it runs from the Sp6-end to the T7-end
(b) ORFs similar to Mal d 1 genes are indicated in bold and pseudogenes are highlighted in gray
(c) Mal d 1 isoform names proposed following the official allergen nomenclature (King et al. 1995) and Gao et al. (2005)
* Mal d 1 genes for which the genomic sequence is reported in this work for the first time
Description of the Mal d 1-like sequences in the MC-12 and MC-20 sequences (FN823234 and FN823235, respectively) analysed with BLASTN software
| ORF | Dir (a) | Full length (nt) | CDS (nt) | Intron/Exon | Most similar sequence ID | Description (b) | SNPs in CDS | SNPs in intron | E-value |
|---|---|---|---|---|---|---|---|---|---|
| ORF6 | ↑ | 602 | 480 | Exon I: 184; Intron: 122; Exon II: 296 | AY827697 | | 1 | 0 | 0.0 |
| ORF10 | ↓ | 961 | 486 | Exon I: 184; Intron: 475; Exon II: 302 | AY428588 | | 0 | Not available | 1.00E-156 |
| ORF13 | ↓ | 651 | 480 | Exon I: 184; Intron: 171; Exon II: 296 | AY827654 | | 1 | 0 | 0.0 |
| ORF17 | ↓ | 861 | 486 | Exon I: 184; Intron: 375; Exon II: 302 | AY822733 | | – | – | 3.00E-25 |
| ORF20 | ↓ | 702 | 495 | Exon I: 184; Intron: 208; Exon II: 310 | AY428589 | | 0 | Not available | 4.00E-160 |
| ORF23 | ↓ | 633 | 480 | Exon I: 184; Intron: 153; Exon II: 296 | AY827712 | | 0 | 0 | 0.0 |
| ORF25 | ↓ | 283 | – | With stop codons and truncated | AY827730 | | – | – | 2.00E-113 |
| ORF28 | ↓ | 608 | 480 | Exon I: 184; Intron: 128; Exon II: 296 | AY827725 | | 1 | 0 | 0.0 |
| ORF29 | ↓ | 470 | – | With stop codons | AY827730 | | – | – | 0.0 |
| | | | | | | | | | |
| ORF45 | ↑ | 480 | 480 | Intronless | AY822733 | | 5 | – | 0.0 |
| ORF46 | ↓ | 480 | 480 | Intronless | AY822717 | | 5 | – | 0.0 |
| ORF47 | ↓ | 480 | 480 | Intronless | AY822721 | | 0 | – | 0.0 |
| ORF49 | ↓ | 480 | 480 | Intronless | AY822719 | | 0 | – | 0.0 |
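The exon and intron lengths listed for the intron-containing genes in the table above can be checked against the reported full-length and CDS values. The following is a minimal sketch of such a check, not part of the original study. It assumes that the full length equals exon I + intron + exon II and that the CDS equals exon I + exon II. The names `ROWS` and `check_lengths` are hypothetical and introduced only for illustration.

```python
# Rows copied from the table above, intron-containing genes only:
# (ORF, full length, CDS, exon I, intron, exon II), all lengths in nt.
ROWS = [
    ("ORF6", 602, 480, 184, 122, 296),
    ("ORF10", 961, 486, 184, 475, 302),
    ("ORF13", 651, 480, 184, 171, 296),
    ("ORF17", 861, 486, 184, 375, 302),
    ("ORF20", 702, 495, 184, 208, 310),
    ("ORF23", 633, 480, 184, 153, 296),
    ("ORF28", 608, 480, 184, 128, 296),
]


def check_lengths(rows):
    """Return (orf, field, reported, computed) for each length that does not add up.

    Assumption (not stated in the source): full length = exon I + intron + exon II
    and CDS = exon I + exon II.
    """
    mismatches = []
    for orf, full, cds, exon1, intron, exon2 in rows:
        if exon1 + intron + exon2 != full:
            mismatches.append((orf, "full length", full, exon1 + intron + exon2))
        if exon1 + exon2 != cds:
            mismatches.append((orf, "CDS", cds, exon1 + exon2))
    return mismatches


if __name__ == "__main__":
    for orf, field, reported, computed in check_lengths(ROWS):
        print(f"{orf}: {field} reported as {reported} nt, parts sum to {computed} nt")
```

On the values as transcribed here, every full length matches the sum of its parts, and the only flagged entry is the CDS of ORF20 (495 nt reported, 184 + 310 = 494 nt). The check cannot tell whether that difference comes from the original table or from the transcription.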
Fig. 1 Genomic organisation of the Mal d 1 gene cluster on LG16. (a) Genetic map of Durello di Forlì LG16. SSRs developed based on the sequences of the two BACs are indicated in bold. (b) Physical map of the two BAC clones from cv Florina, MC-12 and MC-20. (c) Physical map of the Mal d 1 cluster on LG16 from the GD draft genome sequence. In (b) and (c), the Mal d 1 gene positions are indicated as black bars, the pseudogene positions as striped bars and the other genes in the cluster as dotted bars. The isoallergen genes previously known but located for the first time or designated by a new name are underlined, and the new isoallergen genes are indicated in boxes. The arrows indicate gene orientation; ° and ″ are identical sequences; *Mal d 1.12 sequence, but with a gap of 45 bp
Fig. 2 Schematic overview of Mal d 1 allergen gene positions in the apple genetic map. Genetic positions of Mal d 1 loci are estimated through retrieval of their physical location in the GD whole genome relative to reference marker sequences. Genetic positions of reference markers are indicated according to Supplementary Figure 9 in Velasco et al. (2010). Mal d 1 loci in new genomic regions are underlined; *Mal d 1.05 in tandem duplication
Fig. 3 Alignment of predicted amino acid sequences of Mal d 1 isoforms. Mal d 1 sequences were retrieved from the BAC clone sequences and from Gbrowser of the apple genome sequence. Two Bet v 1 isoforms, Bet v 1.01 and Bet v 1.04, are also included. The sequences are indicated with the isoform name followed by the ID number and the LG in which they are located. The P-loop region is indicated by the dashed box; substitutions between Bet v 1 sequences are indicated as small boxes; position 45 is highlighted in red and the other amino acids putatively important for IgE recognition are within brackets or large boxes. Important amino acid substitutions are shown as circles