Allergen homologs in the Euroglyphus maynei draft genome.

S Dean Rider, Marjorie S Morgan, Larry G Arlian.

Abstract

Euroglyphus maynei is a house dust mite commonly found in homes worldwide and is the source of allergens that sensitize and induce allergic reactions in humans. It is the source of species-specific allergens as well as allergens that are cross-reactive with the allergens from house dust mites Dermatophagoides farinae and D. pteronyssinus, and the ectoparasitic scabies mite Sarcoptes scabiei. The genomics, proteomics and molecular biology of E. maynei and its allergens have not been as extensively investigated as those of D. farinae, D. pteronyssinus, and S. scabiei, for which natural and recombinant allergens have been characterized. Until now, little was known about the genome of E. maynei and its allergens, but this information will be important for producing recombinant allergens for diagnostic and therapeutic purposes and for understanding the allergic response mechanism by immune effector cells that mediate the allergic reaction. We sequenced and assembled the 59 Mb E. maynei genome to aid the identification of homologs for known allergenic proteins. The predicted proteome shared orthologs with D. farinae and S. scabiei, and included proteins with homology to more than 30 different groups of allergens. However, the majority of allergen candidates could not be assigned as clear orthologs to known mite allergens. The genomic sequence data, predicted proteome, and allergen homologs identified from E. maynei provide insight into the relationships among astigmatid mites and their allergens, which should allow for the development of improved diagnostics and immunotherapy.

Year:  2017        PMID: 28829832      PMCID: PMC5568732          DOI: 10.1371/journal.pone.0183535

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The allergy-causing house dust mite Euroglyphus maynei occurs in homes worldwide [1-4]. It appears to be more common in homes in Europe and in the United Kingdom than in homes in the United States. Generally, E. maynei is less frequently found in homes than Dermatophagoides farinae and D. pteronyssinus. However, it usually co-inhabits homes with D. farinae and D. pteronyssinus, and sometimes it is the most numerous of the three species or the only species found [5]. E. maynei is phylogenetically closely related to the common house dust mites, D. farinae and D. pteronyssinus, and to the itch mite, Sarcoptes scabiei. E. maynei is an astigmatid mite and belongs to the suborder Oribata, infraorder Desmonomata and family Pyroglyphidae along with the Dermatophagoides spp [6,7]. Many patients who are sensitive to D. farinae and D. pteronyssinus are also sensitive to E. maynei [8-10]. E. maynei is the source of many allergens, some of which cross-react with those from D. farinae and D. pteronyssinus while others are unique to this mite species [9,11-15]. Antigens from E. maynei are also cross-reactive with some antigens from the scabies mite S. scabiei. However, there has been less research into E. maynei biology, allergens, prevalence of sensitivity and molecular biology of the allergens compared to D. farinae and D. pteronyssinus. Presumably, this is because this species is not as prevalent in homes as D. farinae and D. pteronyssinus and it is not widely cultured for research and commercial purposes. Thus, few of its allergens have been characterized or produced by recombinant technology. A draft genome for D. farinae has been produced [16] but none has been reported for D. pteronyssinus or E. maynei. The genomes have been sequenced for the related itch mite, S. scabiei, the unrelated mites, Tetranychus urticae and Metaseiulus occidentalis, and the tick, Ixodes scapularis [7,17-19]. We report here the draft genome of E. maynei and compare its allergome to those of D. farinae and D. pteronyssinus and to the genomes of these other acarine species.

Materials and methods

The strain of E. maynei that was chosen for genome sequencing is a long-standing laboratory reference culture that originated in Galveston, TX, USA, and has been maintained as a randomly-breeding population for over 20 years [5,20]. Genomic DNA was isolated from a pool of living mites (~20 mg wet weight) using the Wizard SV genomic DNA purification system (Promega, Madison, WI) and the manufacturer’s animal tissue protocol. Mites were ground in ice-cold digestion buffer using a Dounce homogenizer prior to an overnight proteinase K digestion and column purification of the DNA. TruSeq library construction and paired-end sequencing were done by Beckman Coulter Genomics (Danvers, MA) to generate ~24 million paired reads (2x150 bp; ~12 million fragments; 350 bp insert size). This was designed to cover ~100 Mb of DNA at over 30x coverage. Unless specifically indicated, the software programs were used with the recommended or default settings. KmerGenie v1.6741 [21] was used to estimate the best value of k and the minimum abundance (m) for a given kmer when using De Bruijn graph assembly algorithms. The best estimates provided were k = 47 and m = 7 using the diploid model or k = 53 and m = 12 with the regular model, although there were other peaks with the regular model. Therefore, different values of k (21, 47, 53, 71, 91) were used with their corresponding minimum abundance estimates to generate multiple assemblies. Minia v1.6088 [22] was used with SSPACE-STANDARD v3.0 [23] to assemble trimmed reads into gapped contigs. Within SSPACE, Bowtie v0.12.5 [24] was used to map reads during scaffolding, without implementing contig extension. The insert size provided to SSPACE was 372 +/- 334.8 bp. An additional assembly was also created using CLC Genomics Workbench v7.5. This method adopted a kmer size of 25, which was automatically implemented by the software to optimize computational efficiency based on the amount of input data.
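The coverage target and the role of the minimum abundance parameter can be illustrated with a short sketch. This is not the KmerGenie or Minia implementation: the function names `sequencing_depth` and `solid_kmers` are hypothetical, and the k-mer counter ignores reverse complements, which real assemblers normally account for.

```python
from collections import Counter


def sequencing_depth(n_reads: int, read_len: int, genome_size: int) -> float:
    """Expected average depth: total sequenced bases divided by genome size."""
    return n_reads * read_len / genome_size


def solid_kmers(reads, k: int, min_abundance: int) -> dict:
    """Count k-mers across reads and keep those seen at least min_abundance times.

    Assumption: forward strand only; low-abundance k-mers are treated as
    likely sequencing errors and dropped before graph construction.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return {kmer: c for kmer, c in counts.items() if c >= min_abundance}


# ~24 million reads of 150 bp against a ~100 Mb target
depth = sequencing_depth(24_000_000, 150, 100_000_000)
print(depth)
```

With the read counts reported above, the depth works out to 36x for a 100 Mb target, consistent with the stated design goal of over 30x.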
The MISA perl script [25] was used to scan the assembly for simple sequence repeats of: 1 nt with >10 copies, 2 nt with >6 copies, 3 nt with >5 copies, 4 nt with >5 copies, 5 nt with >5 copies, 6 nt with >5 copies, 7 nt with >5 copies, and 8 nt with >5 copies. The Benchmarking Universal Single-Copy Orthologs (BUSCO) strategy was used to test the completeness of the assemblies using the arthropod, metazoan, and eukaryote profiles [26]. The Augustus gene model output from the BUSCO arthropod analysis (in optimization mode) was used as input for gene predictions within the Maker software pipeline. Maker software was implemented with the accessory software programs RepeatMasker v1.317 [27], SNAP [28], and Augustus v3.0.1 [29]. SNAP utilized a scabies mite Hidden Markov Model (HMM) generated previously [7] and Augustus used the BUSCO-generated model from the k = 47, m = 7 Minia assembly. Hints for gene predictions were based on 17,091 EST sequences from other Sarcoptiformes mites (NCBI EST database), and the CEGMA 458 core proteins [30]. Over 200 of the Maker predictions possessed introns less than 10 bp. These protein predictions were removed from the final annotation data set that was deposited into NCBI under BioProject PRJNA350546. Annotations of the mitochondrial genome contig were aided by the MITOS web server (http://mitos.bioinf.uni-leipzig.de/index.py) and RNA Weasel (http://megasun.bch.umontreal.ca/RNAweasel/). GAG software (http://genomeannotation.github.io/GAG) was used to analyze the output from Maker to quantify the number of loci with start and stop codons, and to generate tables needed for submission of the information into public databases. Reciprocal best BLAST hits were identified using Legacy BLAST and the perl script orthoparahomlist.pl [31] and output was used to estimate the number of proteins that are shared among E. maynei, S. scabiei and D. farinae. 
Blast2Go was used to categorize proteins into functional groups based on the presence of conserved domains [32]. To identify candidate allergen proteins, allergen protein sequences from Dermatophagoides farinae, D. pteronyssinus, Sarcoptes scabiei, Blomia tropicalis, and Euroglyphus maynei were used as queries in BLAST searches against the predicted proteome of E. maynei. Query cutoffs included an expect value of E = 1x10^-5 and a query coverage of >50%. Some fragments were later selected with lower query coverage because they represented missing regions of fragmented loci. Phylogenetic comparisons among proteins relied on ClustalOmega alignments [33] and neighbor-joining trees generated with the web-based interface at EBI [34] available at http://www.ebi.ac.uk or using the MUSCLE + PhyML pipeline available at http://www.phylogeny.fr. The gapped contigs were filtered to remove contaminating sequences and sequences shorter than 200 nt prior to deposition in public databases. Sequence reads, gapped contigs, and predicted proteins are all stored in public databases under BioProject PRJNA350546.
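The two query cutoffs can be applied to tabular BLAST output with a small filter. The sketch below assumes a tab-separated layout with the columns query ID, subject ID, E-value, query start, query end, and query length; that layout, the function name `filter_hits`, and the per-alignment definition of query coverage are assumptions for illustration, not details taken from the study.

```python
import csv
import io


def filter_hits(tsv_text: str, max_evalue: float = 1e-5, min_qcov: float = 50.0):
    """Keep hits with E-value <= max_evalue and query coverage > min_qcov percent.

    Assumed columns: qseqid, sseqid, evalue, qstart, qend, qlen.
    Coverage is computed per alignment, not merged across alignments.
    """
    kept = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        qseqid, sseqid, evalue, qstart, qend, qlen = row[:6]
        qcov = 100.0 * (abs(int(qend) - int(qstart)) + 1) / int(qlen)
        if float(evalue) <= max_evalue and qcov > min_qcov:
            kept.append((qseqid, sseqid, float(evalue), round(qcov, 1)))
    return kept


# Hypothetical example rows (identifiers and values are made up)
example = (
    "queryA\tprot1\t1e-80\t1\t300\t321\n"
    "queryA\tprot2\t1e-3\t1\t300\t321\n"
    "queryB\tprot3\t1e-20\t10\t100\t400\n"
)
print(filter_hits(example))
```

Hits rejected only for low coverage could be kept in a separate list, which mirrors the later manual rescue of fragments that represented missing regions of fragmented loci.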

Results and discussion

Genome size and assembly metrics

The E. maynei genome size has not been reported. Kmer distributions from our survey data provided the first estimates of this metric. The genome size estimates, based on the two KmerGenie models, ranged from ~68 to ~97 Mb (S1 Fig). Thus, the survey provided >30x coverage of the sequenceable genome. Inbred or isogenic lines of E. maynei have never been generated. The E. maynei genome survey was based on a pool of DNA from thousands of (presumably genetically different) individual mites. Current assembly methods have trouble with heterogeneous DNA samples such as this. However, stretches of unique sequences (like protein-coding genes) are more amenable to reconstruction than repetitive elements. As a result, we anticipated that genes that were not interrupted by repetitive elements or that did not possess gross allelic variation within the population (large insertions or deletions) could be assembled from the sequence data. Assembly metrics for scaffolds are presented in S1 Table. BUSCO was performed on assembled contigs to test how well genes were assembled from the data, and to estimate how well the assemblies represented a complete gene set. These analyses indicated that the Minia-produced assemblies using a kmer length of 47 or 53 were the best in terms of the number of complete or fragmented loci present, and had the fewest missing loci (Table 1). The models used by KmerGenie predicted that these assemblies would be the best. SSPACE scaffolds (gapped contigs) generated with these two data sets displayed improvement over the contigs, and were also better than the scaffolds generated by CLC assembly methods. The scaffolds from the Minia/SSPACE assembly using a kmer length of 47 were then selected for use in protein predictions. The size of this assembly was roughly 59 Mb, distributed across 196,133 fragments with an N50 of 480 nt. Following removal of short (<200 nt) sequences, the deposited assembly represented 43 Mb across 72,749 fragments with an N50 of 788 nt.
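The N50 values quoted above follow the usual definition: the length of the fragment at which the running total, taken from longest to shortest, first reaches half of the assembly size. A minimal sketch follows; the function name `n50` and the toy lengths are illustrative and are not the tool or data used to produce the reported metrics.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("no sequence lengths supplied")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example: dropping fragments shorter than 200 nt raises the N50
toy = [100, 150, 150, 300, 500, 800]
print(n50(toy))
print(n50([x for x in toy if x >= 200]))
```

The toy example shows why filtering out short sequences increases N50 while reducing the total assembly size, the same direction of change seen between the 59 Mb and 43 Mb versions of the assembly.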
Table 1

BUSCO analyses of contigs, scaffolds, and predicted proteins for Euroglyphus maynei.

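Test | k | m | Complete (%) | Duplicated (%) | Fragmented (%) | Missing (%) | # loci | Lineage / mode
Contigs | 21 | 17 | 3.7 | 0.2 | 9.3 | 86 | 429 | Eukaryota / genome
Contigs | 47 | 7 | 26 | 1.1 | 17 | 56 | 429 | Eukaryota / genome
Contigs | 53 | 12 | 23 | 1.1 | 12 | 64 | 429 | Eukaryota / genome
Contigs | 71 | 7 | 15 | 0 | 12 | 72 | 429 | Eukaryota / genome
Contigs | 91 | 3 | 3.2 | 0 | 8.8 | 87 | 429 | Eukaryota / genome
Scaffolds | 47 | 7 | 31 | 1.3 | 17 | 50 | 429 | Eukaryota / genome
Scaffolds | 53 | 12 | 26 | 1.3 | 14 | 58 | 429 | Eukaryota / genome
Scaffolds | 25 (CLC Bio) | n.a. | 32 | 5.3 | 10 | 56 | 429 | Eukaryota / genome
Proteome | 47 | 7 | 61 | 11 | 29 | 9 | 429 | Eukaryota / OGS
Proteome | 47 | 7 | 43 | 3.6 | 25 | 30 | 843 | Metazoa / OGS
Proteome | 47 | 7 | 32 | 2.4 | 18 | 49 | 2675 | Arthropoda / OGS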

Values for kmer length (k), minimum abundance (m) during assembly, and the percentages of complete, duplicated, fragmented and missing loci are presented along with the total number of loci sampled, the analysis lineage and mode used.

Predicted proteome

Maker predicted over 15,000 proteins (or fragments). The predicted proteome was also analyzed by BUSCO, and nearly double the number of complete loci was identified compared to the tests of scaffolds. This likely reflected enhancements that the Maker pipeline provided for gene predictions and structural annotations, which were not inherent in the methods embedded in the BUSCO genomic DNA analysis pipeline. BUSCO estimates of missing proteins ranged from 9% of highly conserved eukaryotic loci to 49% of arthropod (mostly insect) loci. However, prior analyses suggest that the number of missing loci is overestimated when BUSCO is performed with species of Acari [7,35]. Genome statistics generated by GAG revealed that the majority of predicted proteins (80.4%) had a start codon, but that only 8,198 of the 15,598 loci (52.5%) had both a predicted start codon and a predicted stop codon. All analyses indicated that the assembled E. maynei sequences were partly fragmented, and that some loci were probably missed in our survey (Table 1). Although the assembly and gene prediction methods had some difficulty assembling and predicting full-length genes from the source sequences, the majority of test loci were represented in our data set.

Comparison to related mites

To determine if the assembled genome and predicted proteome were representative of an astigmatid mite, we made multiple comparisons. The assembled genome of 59 Mb was similar to the genomes assembled from S. scabiei (56.2 Mb) and D. farinae (53.5 Mb), two other mites in the Astigmata. Previous analyses indicated that astigmatid mite genomes are replete with simple sequence repeats [7]. We clearly demonstrated that simple sequence repeats (SSR) are more frequent in the genomes of astigmatid mites than they are in the genomes of ticks and other non-astigmatid mites whose genomes are larger. This is not the same phenomenon as having a large proportion of the total genome represented by transposons and retroviral sequences. The E. maynei genome was found to have >1400 SSR/megabasepair (S2 Table). This number was slightly over half of that found in either S. scabiei, or D. farinae, but it was consistent with what we expected from an astigmatid mite [7]. Blast2Go was used to categorize predicted proteins into molecular functions, biological processes, and cellular components for comparison to the S. scabiei and D. farinae proteomes. The most abundant molecular functions, biological processes, and cellular components were not substantially different across the mite species and were consistent with their phylogenetic relationship (Fig 1). We also utilized reciprocal best BLAST searches of D. farinae, S. scabiei, and E. maynei to identify proteins that were common across these astigmatid mite species. There were more than five thousand proteins shared among the predicted proteomes of the three astigmatid mites that were examined, but over 25% of the loci from each species were not able to be assigned to an ortholog in another species using reciprocal BLAST (Fig 2). The unassigned loci included homologs, as well as unique sequences.
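The reciprocal best hit criterion used to call shared proteins can be sketched for a single pair of species. This is a simplified illustration, not the orthoparahomlist.pl script: the function names are hypothetical, hits are assumed to be (query, subject, bit score) tuples, and ties are resolved by keeping the first hit seen.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bitscore) tuples (assumed layout).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical identifiers and scores
a_vs_b = [("a1", "b1", 200), ("a1", "b2", 150), ("a2", "b2", 90), ("a3", "b1", 80)]
b_vs_a = [("b1", "a1", 210), ("b2", "a1", 140), ("b2", "a2", 95)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In the toy data, a2 and a3 each have a best hit in the other proteome but are not that protein's best hit in return, so they stay unassigned. This is the situation described above for homologs that could not be assigned to an ortholog.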
Fig 1

Top Gene Ontologies for mite proteins containing Interpro domains.

The twelve most abundant GO terms were identified for E. maynei (Em) and the corresponding values for those terms for S. scabiei (Ss) and D. farinae (Df) were compiled. Data for Cellular Processes, Cellular Components, and Molecular Functions are presented. Bar volumes represent the percentage of proteins annotated with a particular GO term when compared to the total number of proteins in the twelve selected categories for each species.

Fig 2

Venn diagram representing shared orthologs in three astigmatid mite species.

Orthologs were identified as best reciprocal Blast hits.

Allergen homologs

Information from E. maynei can aid our understanding of the evolution and molecular biology of other mites such as Dermatophagoides farinae and Sarcoptes scabiei. However, a key impetus for studying E. maynei is that it is an important source of allergens for humans and companion animals. In the future, diagnostic tests and immunotherapy will likely be based on using recombinant allergens. Thus, knowledge of the E. maynei genome and of the allergens it is predicted to encode is a prerequisite for developing them. Because limited molecular information is available for E. maynei allergens, we were interested in identifying E. maynei orthologs of the previously reported dust mite allergens. Mite allergens are placed into groups based on their immunoreactivity, and the groups are named sequentially based on date of discovery. There are currently 33 allergen classes (Group 1 to Group 33) associated with house dust mites [36]. The WHO/IUIS database (http://www.allergen.org/) attempts to maintain an authoritative listing of these allergens, but ambiguities persist in the nomenclature for the more recently described mite allergens. This stems from discord among published manuscripts, the presence of sequence database entries from different investigators, a lack of sequence data, or a combination of these issues. For example, Group 17 is reported to be a calcium-binding EF hand protein [37], but no sequence data have been deposited in a publicly available database for comparisons. Three separate identities have been given to Group 26: EF-2 [38], cyclophilin [39], or myosin [36]. The latter is presented in a review that cites An et al. [38]. However, a more recent review by the same author [40] cites the WHO/IUIS database. Two different assignments have been made for Group 29: cyclophilin [38] or profilin (Accession number AIO08866.1) [39]. Furthermore, the WHO/IUIS database indicates that myosin and ferritin were previously alternatives for the group 30 allergens.
From a bioinformatics perspective, it was straightforward to identify even distantly related sequences when sequence data were available, but assigning sequences to the proper allergen group name (even when clearly orthologous proteins were identified) was problematic. We were able to identify sequences encoding allergen candidates for 28 of the 30 allergen groups for which sequence data have been unequivocally assigned, as well as candidates for alternative group 26 and group 29 proteins (EF-2, cyclophilin, profilin). Of the 30 allergen groups, the only ones we could not find in our data set were group 19 (a short antimicrobial peptide) and group 24 (ubiquinol-cytochrome c reductase binding protein). Two groups did not assemble well, as evidenced by incomplete or fragmented protein coding regions: gelsolins (Group 16) and the Heat Shock Proteins (Group 28). Gelsolin proteins contain a repeat structure, and this likely created ambiguity during assembly of sequences into full-length protein coding loci. With the Heat Shock Proteins, a similar assembly challenge existed but it was likely due to the presence of a multigene family. This also occurred for certain gene families in the S. scabiei genome, where the same assembly strategy was used [7]. Additional allergen candidates were split among multiple contigs. For example, a single paramyosin was represented by three contigs, as was one of at least two identified apolipophorin genes. An amylase gene was split in two, as was an inorganic pyrophosphatase. A supervised assembly method would likely result in a more complete reconstruction of these loci. The majority of loci with homology to known allergens were assembled as contiguous sequences and contained complete functional domains (Table 2).
Table 2

Allergen homologs in the genome of E. maynei.

Locus ID | Start Methionine | Domains (complete / fragmented) | Orthology
BLA29_000749-RA | Yes | I29, Peptidase C1 | Yes
BLA29_005376-RA | Yes | I29, Peptidase C1 | Yes
BLA29_005717-RA | Yes | I29, Peptidase C1 | Yes
BLA29_002725-RA | Yes | I29, Peptidase C1 |
BLA29_000188-RA | Yes | Peptidase C1 |
BLA29_005995-RA | No | Peptidase C1 |
BLA29_001679-RA | No | Peptidase C1 |
BLA29_002474-RA | Yes | Peptidase C1 |
BLA29_004167-RA | Yes | Peptidase C1 |
BLA29_009596-RA | No | Peptidase C1 |
BLA29_007497-RA | Yes | Peptidase C1 |
BLA29_001844-RA | Yes | ML superfamily | Yes
BLA29_002039-RA | Yes | ML superfamily | Yes
BLA29_007771-RA | Yes | ML superfamily |
BLA29_004181-RA | Yes | ML superfamily |
BLA29_001646-RA | Yes | ML superfamily | Yes
BLA29_001231-RA | Yes | Tryp SPc | Yes
BLA29_009032-RA | Yes | Tryp SPc |
BLA29_000748-RA | Yes | Tryp SPc |
BLA29_004152-RA | Yes | Tryp SPc |
BLA29_004244-RA | Yes | Tryp SPc |
BLA29_003534-RA | Yes | Tryp SPc |
BLA29_001429-RA | Yes | Tryp SPc |
BLA29_001029-RA | Yes | Tryp SPc |
BLA29_004415-RA | Yes | Tryp SPc |
BLA29_005972-RA | Yes | Tryp SPc |
BLA29_003326-RA | Yes | Tryp SPc |
BLA29_006839-RA | No | Tryp SPc |
BLA29_007935-RA | Yes | Tryp SPc |
BLA29_008722-RA | No | Tryp SPc |
BLA29_008667-RA | No | Tryp SPc |
BLA29_004732-RA | Yes | TSP 1, Tryp SPc |
BLA29_013905-RA | No | Amy Ac |
BLA29_003192-RA | Yes | Amy Ac, Amy C |
BLA29_005982-RA | No | Amy Ac, Amy C |
BLA29_010118-RA | Yes | Blo t 5 | Yes
BLA29_005095-RA | Yes | Tryp SPc | Yes
BLA29_008589-RA | Yes | Grp7 allergen |
BLA29_008391-RA | Yes | Grp7 allergen |
BLA29_010568-RA | Yes | GST N Mu, GST C |
BLA29_007805-RA | Yes | GST N Mu, GST C |
BLA29_008641-RA | Yes | GST N Mu, GST C |
BLA29_002590-RA | Yes | Tryp SPc | Yes
BLA29_005861-RA | Yes | Tropomyosin | Yes
BLA29_002808-RA | Yes | Myosin tail 1 | Yes 1/3
BLA29_004720-RA | No | Fam76, TMPIT, Bap31 | Yes 1/3
BLA29_010257-RA | No | Myosin tail 1 | Yes 1/3
BLA29_002507-RA | Yes | Glyco 18, CBM 14 |
BLA29_002269-RA | No | Glyco 18, CBM 14, CBM 14 |
BLA29_005952-RA | Yes | Glyco 18 |
BLA29_014546-RA | Yes | CBM 14 |
BLA29_003622-RA | Yes | CBM 14 |
BLA29_011891-RA | No | CBM 14, CBM 14 |
BLA29_012640-RA | Yes | Lipocalin | Yes
BLA29_007304-RA | Yes | Lipocalin |
BLA29_007408-RA | Yes | Lipocalin | Yes
BLA29_007657-RA | Yes | Lipocalin | Yes
BLA29_008468-RA | Yes | Lipocalin | Yes
BLA29_010085-RA | No | Lipocalin | Yes
BLA29_010726-RA | Yes |  | Yes 1/3
BLA29_003031-RA | Yes | LPD N | Yes 1/3
BLA29_002649-RA | Yes |  | Yes 1/3
BLA29_008619-RA | Yes | ADF Gelsolin / ADF Gelsolin |
BLA29_014251-RA | No | ADF Gelsolin, ADF Gelsolin |
BLA29_001338-RA | Yes | arginine kinase like | Yes
BLA29_006943-RA | No | arginine kinase like | Yes 1/2
BLA29_011280-RA | Yes | arginine kinase like | Yes 1/2
BLA29_006646-RA | Yes | arginine kinase like |
BLA29_006289-RA | Yes | Blo t 5 | Yes
BLA29_010079-RA | Yes | TIM phosphate binding |
BLA29_008286-RA | No | Serpin |
BLA29_005081-RA | Yes | Serpin |
BLA29_002584-RA | Yes | Serpin |
BLA29_005776-RA | Yes | Serpin |
BLA29_000753-RA | Yes | Serpin |
BLA29_004510-RA | Yes | Serpin |
BLA29_004546-RA | Yes | Serpin |
BLA29_002279-RA | Yes | Serpin |
BLA29_011336-RA | Yes | HSP70 |
BLA29_015195-RA | Yes | HSP70 |
BLA29_006000-RA | No | HSP70/dnaK |
BLA29_008310-RA | Yes | HSP70 |
BLA29_008418-RA | No | HSP70 |
BLA29_003383-RA | Yes | HSP70 |
BLA29_006210-RA | Yes | HSP70 |
BLA29_001944-RA | Yes | HSP70 |
BLA29_004075-RA | Yes | HSP70 |
BLA29_004373-RA | Yes | HSP70 |
BLA29_008613-RA | Yes | HSP70 |
BLA29_002587-RA | Yes | Cyclophilin, RRM PPIL 4 |
BLA29_010596-RA | Yes | Cyclophilin |
BLA29_003954-RA | No | Cyclophilin |
BLA29_009322-RA | Yes | Cyclophilin |
BLA29_010263-RA | Yes | Cyclophilin |
BLA29_012512-RA | No | Cyclophilin |
BLA29_013856-RA | Yes | Cyclophilin |
BLA29_006024-RA | Yes | Cyclophilin |
BLA29_007484-RA | Yes | Cyclophilin |
BLA29_008744-RA | No | Cyclophilin |
BLA29_002533-RA | Yes | WD40, Cyclophilin |
BLA29_004461-RA | Yes | RRM PPIE, Cyclophilin |
BLA29_000133-RA | Yes | PROF |
BLA29_008125-RA | Yes | Ferritin |
BLA29_002580-RA | Yes | Ferritin |
BLA29_008226-RA | Yes | Ferritin |
BLA29_001133-RA | Yes | ADF Cofilin | Yes
BLA29_004452-RA | No | Pyrophosphatase |
BLA29_014471-RA | Yes | Pyrophosphatase |
BLA29_003724-RA | Yes | alpha tubulin |
BLA29_003856-RA | Yes | alpha tubulin |
BLA29_001366-RA | Yes | alpha tubulin |

The locus ID for each allergen identified is indicated, along with whether or not it had a predicted start codon. Complete and partial functional domains are indicated. Orthology was determined by phylogenetic analyses of gene families.

Allergen orthologs

There are different allergen groups that either share sequence homology between groups, or that are represented by multigene families. Therefore, an additional challenge is to identify the best allergen candidates from among those that show some homology to the known allergens. To do this, we pursued a phylogenetics approach within certain gene families to identify which E. maynei homologs were most closely related to the known allergens (Table 2). Noteworthy results from those comparisons are presented below.

Cysteine proteases (group 1 family)

The cysteine proteases in E. maynei were represented by at least 11 proteins with homology to cathepsins and allergenic proteases from other species. All possessed a complete peptidase domain. One of these cysteine proteases was orthologous to the two known allergens from Dermatophagoides mites (Der f 1 and Der p 1). Two other proteins also displayed close affinity to these allergens in the phylogenetic analyses. The other 9 homologs were more distantly related and were outside the clade that includes the expanded family of scabies allergen paralogs [7]. In S. scabiei, the majority of cysteine protease allergen paralogs that have been identified have mutations that inactivate the protein and it has been proposed that these scabies mite inactive protease paralogs (SMIPP-C) are involved in modulating or evading the host immune system [41]. Only one protein in E. maynei was missing the active site cysteine and one of the predicted proteins lacked a start methionine. These observations were consistent with the hypothesis that inactive protease genes are under a unique type of selection pressure in scabies mites that is not present in house dust mites like E. maynei and D. farinae.

Lipid binding domain proteins (group 2 and 22 families)

MD-2 like lipid binding domains are present in two different allergen groups (2 and 22). Group 2 is considered to be a major allergen, and group 22 is considered a minor allergen. In E. maynei, the predicted proteome contained four proteins with complete MD-2 domains and one protein with a partial domain. Two of these proteins were clear orthologs of Der f 2, and one was closely related to Der p 22. The remaining two homologs (one was the partial protein) were more similar to family members from Sarcoptes scabiei.

Serine proteases (group 3, 6, and 9 families)

Twenty serine protease homologs were identified. In Sarcoptes scabiei, the serine protease gene family is expanded and contains inactive paralogs that are speculated to be involved in manipulating host immunity [42-44]. Most of the serine protease allergen homologs found in S. scabiei var. canis are derived from group 3 allergens and are present as tandem duplications in the genome [7]. In E. maynei, there was one protein that appeared to be an ortholog to each of the group 3, 6, and 9 allergens from Dermatophagoides mites. Each of these three proteins possessed an intact (active) catalytic triad. There were 17 more predicted proteins whose relationships were unclear. They were from different genomic scaffolds and they were attached by long branches to the main phylogenetic tree (S2 Fig). Beyond establishing that they are related to the allergens, no attempt was made to determine what type of proteases these are. Two of the predicted proteins contained short introns and may represent pseudogenes. Three of the remaining proteins were missing one or two of the residues in the catalytic triad, and are likely inactive. A different subset of three proteases appeared to be missing the residues required for substrate binding. Thus, the serine protease gene family in E. maynei was unlike that of either Dermatophagoides or scabies mites.

Amylases (group 4 family)

Mite group 4 allergens are amylase enzymes that can be found in feces. A nearly full-length E. maynei homolog was identified that differs from the existing NCBI record (AAD38943; 82% identity). Two other fragments were found that probably represent the N and C termini of the existing NCBI record (>98% identity). Thus, at least two alpha amylases were present in E. maynei, one of which is the previously identified ortholog to the group 4 allergens. This is in contrast to S. scabiei, which did not appear to possess an ortholog of the allergenic alpha amylase, but did possess distantly related amylases [7].

Chitin-interacting proteins (group 12, 15, 18, 23 families)

Four groups of allergens, also present in mite feces, share the presence of chitin interaction domains. One domain is a chitin-binding module (CBM14; present in all four allergen groups) and the other is a putative chitinase domain (Glyco18; also found in groups 15 and 18). Three of the proteins in E. maynei that showed homology to these allergens contained only the CBM14 domain, one contained only a Glyco18 domain, and two possessed both Glyco18 and CBM14 domains. The domain organization and the homology of the E. maynei proteins to known allergens are discordant. While these might be potential E. maynei allergen candidates, none of these proteins could be assigned as orthologs to the known mite allergens.

Fatty acid binding proteins (group 13 family)

There were 6 homologs of the group 13 allergens. One was most closely related to a version of Blo t 13, and one was most closely related to Der p 13. Three of the remaining 4 appeared to be paralogs, and along with one S. scabiei protein formed a clade with Der f 13 (S3 Fig).

Arginine kinases (group 20 family)

Four arginine kinase-containing contigs were identified that probably represent three loci. Two proteins appear to be paralogs (77% identity) with one of the paralogs split into two contigs that were probably fragmented during assembly because of repetitive sequences (as was the case in S. scabiei). These paralogs appeared to be orthologs of the Dermatophagoides allergens, while the last arginine kinase was a more distantly related type.

Serpin-like proteins (group 27 family)

Eight serpin-like proteins were identified. Only one could be considered a candidate ortholog for Der f 27, and the remainder were more distantly related.

Alpha tubulin homologs (group 33 family)

Three alpha tubulin homologs were identified. All were very closely related to one another and to other alpha tubulins. Because of this high sequence conservation, it was not possible to determine which one is the more likely allergen candidate.

Conclusions

The E. maynei draft genome is similar to those of other astigmatid mites because it is small and possesses a large number of simple sequence repeats per megabase pair. Future investigations will undoubtedly improve the assembly and annotation, which could be achieved with alternative DNA sources (e.g., sequences derived from multiple individual mites, or using long read technologies). Gene annotations would also be improved with the acquisition of RNA sequence data. We found that the composition of the predicted proteomes from astigmatid mites was similar with respect to the proteins that could be categorized based on gene ontology term abundance for conserved domains. Allergen-related proteins were readily identified within the predicted proteome. Identifying clear 1:1 orthologs for the mite allergen proteins was more challenging. The sequence data generated from this study should allow for the development of better methods for diagnosing the sources of allergens originating from different species of astigmatid mites that occur in house dust in homes worldwide. It is also a prerequisite for developing well-defined recombinant allergen cocktails for the treatment of dust mite allergies. The work is timely and important to enable new approaches for the diagnosis and understanding of the mechanistic aspect of the allergic reaction to house dust mites.

Kmer analyses of trimmed reads from E. maynei.

The top right is the data output from the standard model test with the optimal kmer distribution graph presented to the right. Bottom is the data output from the diploid model test with the optimal kmer distribution to the right. In both models, the model is represented by the red line. For the diploid model, the green line represents the heterozygous kmer distribution and the blue line represents homozygous kmer distribution predicted within the model. (TIF)

Maximum likelihood tree for the group 3, 6 and 9 mite allergens (serine proteases).

Bootstrap values are included. (TIF)

Maximum likelihood tree for the group 13 mite allergens and related proteins (fatty acid binding proteins).

Bootstrap values are included. (TIF)

Assembly metrics for contigs before scaffolding.

(XLSX)

MISA output data file for simple sequence repeat counts.

(XLSX)