Comparative Genomics Reveals Thousands of Novel Chemosensory Genes and Massive Changes in Chemoreceptor Repertories across Chelicerates.

Joel Vizueta¹, Julio Rozas¹, Alejandro Sánchez-Gracia¹.

Abstract

Chemoreception is a widespread biological function that is essential for the survival, reproduction, and social communication of animals. Though the molecular mechanisms underlying chemoreception are relatively well known in insects, they are poorly studied in the other major arthropod lineages. Current availability of a number of chelicerate genomes constitutes a great opportunity to better characterize gene families involved in this important function in a lineage that emerged and colonized land independently of insects. At the same time, that offers new opportunities and challenges for the study of this interesting animal branch in many translational research areas. Here, we have performed a comprehensive comparative genomics study that explicitly considers the high fragmentation of available draft genomes and that for the first time included complete genome data that cover most of the chelicerate diversity. Our exhaustive searches exposed thousands of previously uncharacterized chemosensory sequences, most of them encoding members of the gustatory and ionotropic receptor families. The phylogenetic and gene turnover analyses of these sequences indicated that the whole-genome duplication events proposed for this subphylum would not explain the differences in the number of chemoreceptors observed across species. A constant and prolonged gene birth and death process, altered by episodic bursts of gene duplication yielding lineage-specific expansions, has contributed significantly to the extant chemosensory diversity in this group of animals. This study also provides valuable insights into the origin and functional diversification of other relevant chemosensory gene families different from receptors, such as odorant-binding proteins and other related molecules.

Substances: Arthropod Proteins

Year: 2018 PMID: 29788250 PMCID: PMC5952958 DOI: 10.1093/gbe/evy081

Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416

Introduction

The i5k initiative (Robinson et al. 2011) has greatly boosted the complete genome sequencing and functional annotation of a number of arthropod species. The currently available genome data were obtained from species chosen for their significance as model organisms in diverse areas, such as agriculture, medicine, food safety or biodiversity, or for their strategic phylogenetic position in evolutionary studies on the diversification of the major arthropod lineages (Adams et al. 2000; Colbourne et al. 2011; Cao et al. 2013; Chipman et al. 2014; Sanggaard et al. 2014; Gulia-Nuss et al. 2016). As expected, the first sequencing initiatives focused on insects, although the number of sequenced noninsect genomes has increased considerably over time, especially in chelicerates. The recent genome sequence data from chelicerate species (Cao et al. 2013; Sanggaard et al. 2014; Gulia-Nuss et al. 2016) are disrupting the strongly biased taxonomic distribution of arthropod genomes hitherto available. More importantly, these new data have greatly facilitated studies on the origin and evolutionary divergence of this highly diverse animal subphylum (Kenny et al. 2016; Schwager et al. 2017), which has important impacts on translational research such as silk production in spiders, biomedical applications of spider and scorpion venom toxins, or plague control in acari (Mille et al. 2015; Hoy et al. 2016; Babb et al. 2017; Gendreau et al. 2017; Pennisi 2017). Chemoreception is a paradigmatic example of a relatively well-known biological system in insects, but it is not as well characterized in other arthropods despite numerous practical applications as pest control strategies, biosensors or electronic nose sensors (Berna et al. 2009; Wei et al. 2017). In chelicerates, as in other animals, the chemosensory system (CS) is critical for the survival, reproduction, and social communication of individuals. 
The detection and integration of environmental chemical signals, including smell and taste, allow organisms to detect food, hosts, and predators and frequently play a crucial role in social communication (Joseph and Carlson 2015). In Drosophila, peripheral events occur in specialized hair-like cuticular structures (sensilla) that are distributed throughout the body surface, with a prominent concentration in the antennae and maxillary palps (olfactory sensilla) or on the distal tarsal segments of the legs (gustatory sensilla) (Pelosi 1996; Shanbhag et al. 2001). In this species, chemoreceptor proteins, which are located in the membranes of sensory neurons innervating the sensillum lymph, convert the external chemical signal into an electrical one, which is, in turn, processed in higher brain regions (de Bruyne and Baker 2008; Sánchez-Gracia et al. 2009; Sato and Touhara 2009). The sensillum lymph contains a set of highly abundant small globular proteins (hereafter termed “binding proteins”) that are thought to bind, solubilize, and transport chemical cues to the space surrounding chemoreceptors (Vogt and Riddiford 1981; Pelosi et al. 2006). The genome of the fruit fly encodes two different kinds of membrane chemoreceptors that are phylogenetically unrelated. The first group comprises the superfamily of insect olfactory (Or) and gustatory (Gr) receptors, which encode seven-transmembrane receptors with an atypical membrane topology and heteromeric function, and share a common origin (Missbach et al. 2015). Interestingly, and despite performing analogous functions, these receptors are structurally and genetically unrelated to their vertebrate counterparts, where G protein-coupled receptors are involved in chemoreception (Kaupp 2010). 
The second group of chemoreceptors encodes the ionotropic receptor (Ir) gene family, a highly divergent lineage that is related to the ionotropic glutamate receptors superfamily (iGluR) associated with both olfaction and taste functions (Robertson and Wanner 2006; Benton et al. 2009; He et al. 2013; Missbach et al. 2014). The extracellular binding proteins of Drosophila include the odorant binding protein (Obp), chemosensory protein (Csp), chemosensory proteins A and B (CheA and CheB) and Niemann–Pick Type C2 (Npc2) families (Li et al. 2008; Dani et al. 2011; Iovinella et al. 2011). Moreover, sensory neuron membrane proteins (SNMPs), which are related to the CD36 receptor family and expressed in specific Drosophila pheromone-responding sensory neurons, also play a key role in sensory perception by facilitating the contact between ligand and receptor (Gomez-Diaz et al. 2016). It is worth noting that there is a lack of evidence that all CS family members actually possess a true chemosensory function, and they are usually classified as chemosensory-related genes based on their sequence similarity with previously examined members (Kitabayashi et al. 1998; Wanner et al. 2005; Ishida et al. 2013; Joseph and Carlson 2015). There are few comprehensive studies of the characterization and classification of CS gene families in noninsect genomes, with only six noninsect arthropod species investigated to date: The crustacean Daphnia pulex, the myriapods Strigamia maritima and Trigoniulus corallinus, and the chelicerates Ixodes scapularis, Metaseiulus occidentalis and Tetranychus urticae (Colbourne et al. 2011; Chipman et al. 2014; Kenny et al. 2015; Gulia-Nuss et al. 2016; Hoy et al. 2016; Ngoc et al. 2016). Moreover, we and others have also reported transcriptome data for various chelicerate species (Frías-López et al. 2015; Qu et al. 2016; Eliash et al. 2017; Vizueta et al. 2017). 
These works confirm that chelicerates contain members of all insect CS gene families, with the single exception of the Or family (Benton et al. 2007; et al. 2011; Missbach et al. 2014), which likely emerged from a Gr ancestor during the diversification of flying insects (Missbach et al. 2015). The recent identification of two novel candidate CS families in chelicerates, the Obp-like and the candidate carrier protein (Ccp) families, is also remarkable (Vizueta et al. 2017). The Obp-like family, which encodes proteins with some sequence and structural similarity to canonical insect OBPs, has also been identified in centipedes (Vizueta et al. 2017), a finding that leaves the evolutionary history of these gene families in arthropods unclear. The Ccp family, which was first discovered in the transcriptome of D. silvatica, contains members that are differentially expressed in the putative chemosensory appendages of this spider. Although OBP-like proteins and CCPs share some structural features with other CS proteins, their potential functional roles as chemosensory proteins and the extent to which these proteins are present in arthropods remain to be elucidated (Renthal et al. 2017; Vizueta et al. 2017).
The ancestor of all extant chelicerates can be traced back to the Cambrian period (∼530 Ma); therefore, this group colonized land independently of the other arthropod lineages (Hexapoda, Crustacea, and Myriapoda; Rota-Stabelli et al. 2013). Because chelicerates have no OR-encoding genes, other proteins likely perform the function of ORs. Current experimental data from noninsect arthropods, such as the specific gene expression and electrophysiological recording data for some IR members in the olfactory structures of lobsters and hermit crabs (Corey et al. 2013; Groh-Lunow et al. 2015) and RNA-seq of the palps and first pair of legs of spiders (Vizueta et al. 2017) and of centipede antennae (C. Frias-López, F.C. Almeida, S. Guirao-Rico, R. Jenner, A. Sánchez-Gracia and J. Rozas, unpublished results), indicate that the IR family contains the best candidates for actual olfactory receptors. The specific organs and molecules responsible for gustatory function are less well understood; nevertheless, as some Gr and Ir family members are differentially expressed across some body parts in these species, contact chemoreceptors appear to be the best candidates.
Given this difference in the functional roles of the various CS families, it is highly relevant to gain further comprehensive insights into their evolution in arthropods other than insects/hexapods. Here, we carried out an enhanced comparative genomic analysis of the CS families across 11 chelicerate genomes. We applied powerful sequence similarity-based searches using state-of-the-art methodologies and expressly considered the fragmented nature of the surveyed genomes. We conducted a comprehensive phylogenetic analysis of chemosensory genes from different gene families and characterized the turnover rates of chemoreceptor families across chelicerates after accurate estimation of the number of gene duplications and gene losses in each lineage. We also contribute new knowledge about some interesting questions that are not yet fully resolved, such as the evolutionary relationship between OBP and OBP-like proteins or the extent to which CCP and CSP are present in chelicerates.

Materials and Methods

Genomic Data

We retrieved all genomic sequences, annotations, and predicted peptides of 14 arthropods, including 11 chelicerates, from public databases (fig. 1). Specifically, we used the genome information of the fruit fly Drosophila melanogaster (r6.05, FlyBase) (Adams et al. 2000), the crustacean Daphnia pulex (r1.26, Ensembl Genomes) (Colbourne et al. 2011), and the centipede Strigamia maritima (r1.26, Ensembl Genomes) (Chipman et al. 2014). The chelicerate genomes included the horseshoe crab Limulus polyphemus (v2.1.2, NCBI Genomes) (Nossa et al. 2014); the acari Tetranychus urticae (r1.26, Ensembl Genomes) (Grbić et al. 2011), Metaseiulus occidentalis (v1.0, NCBI Genomes) (Hoy et al. 2016), and Ixodes scapularis (r1.26, Ensembl Genomes) (Gulia-Nuss et al. 2016); the scorpions Centruroides exilicauda (bark scorpion, genome assembly version v1.0, annotation version v0.5.3; Human Genome Sequencing Center [HGSC]) and Mesobuthus martensii (v1.0, Scientific Data Sharing Platform Bioinformation [SDSPB]; Cao et al. 2013); and the spiders Acanthoscurria geniculata (tarantula, v1, NCBI Assembly, BGI; Sanggaard et al. 2014), Stegodyphus mimosarum (African social velvet spider, v1, NCBI Assembly, BGI; Sanggaard et al. 2014), Latrodectus hesperus (western black widow, v1.0, HGSC), Parasteatoda tepidariorum (common house spider, v1.0 Augustus 3, SpiderWeb and HGSC; Schwager et al. 2017), and Loxosceles reclusa (brown recluse, v1.0, HGSC).

Fig. 1.—Phylogenetic relationships among the 14 surveyed species. Divergence times are given in millions of years. Some branches representative of major lineages are shaded in different colors. Green, insects; light blue, crustaceans; dark blue, myriapods; black, horseshoe crabs; orange, acari; brown, scorpions; red, spiders. Numbers in the right part of the figure indicate the number of CS encoding sequences separated per each family (SMIN values).

Query Data Sets and Protein Search Protocol

Our comprehensive CS search protocol included the creation of three data sets, which were iteratively used as queries in successive hierarchical rounds of sequence similarity- and profile-based searches (fig. 2).

Fig. 2.—Workflow showing the steps used for the identification and annotation of the chemosensory gene families.

Data Set 1

The starting data set contained the CS proteins from publicly available, well-annotated genomes. This data set included the protein sequences of the Gr, Ir/iGluR, Or, Csp, Obp, Npc2, and Snmp-Cd36 families from 1) the hexapods D. melanogaster (Benton et al. 2009; Vogt et al. 2009; Vieira and Rozas 2011; Pelosi et al. 2014), T. castaneum (Sánchez-Gracia et al. 2009; Croset et al. 2010; Dippel et al. 2014), A. pisum (Zhou et al. 2010), and A. mellifera (Robertson and Wanner 2006; Forêt et al. 2007; Nichols and Vogt 2008); 2) the crustacean D. pulex (Peñalva-Arana et al. 2009); 3) the myriapod S. maritima (Chipman et al. 2014); and 4) the acari I. scapularis (Gulia-Nuss et al. 2016), M. occidentalis (Hoy et al. 2016), and T. urticae (Ngoc et al. 2016).

Data Set 2

This data set included the sequences of data set 1 (DS1) plus the newly identified CS protein sequences with specific CS protein domains (see Table S1 in Vizueta et al. [2017] for details). To identify these sequences, we ran InterProScan (5.4.47; Jones et al. 2014) on the genome-wide predicted peptides that lacked a functional chemosensory annotation (i.e., in chelicerate genomes that were not used to build DS1). Furthermore, we also included in data set 2 (DS2) the members of the Ccp family identified in Vizueta et al. (2017), as well as those found in current chelicerate genomes, after conducting several rounds of BlastP searches (version 2.2.30; Altschul 1997).

Data Set 3

This data set resulted from incorporating into DS2 some additional highly curated sequences obtained in a second search round against all surveyed genomes. For that, we built for each CS family a multiple sequence alignment (MSA) of all DS2 proteins, using the corresponding Pfam profile as a guide (with the HMMER software; Eddy 2011). We used these MSAs to build new (more specific) HMM profiles, one per gene family (generically named CS-F-HMM). For the second search round against the predicted peptides from all genomes, we used as queries both the CS-F-HMM profiles (in HMMER searches; i-E-value < 10⁻⁵) and the sequences of DS2 (in BlastP searches; E-value < 10⁻⁵). Moreover, we only retained the BlastP-positive hits for which the alignment between the query and the subject either covered at least two-thirds of the query length or included at least 80% of the subject peptide. Finally, we trimmed the regions of the subject sequences that did not align with the queries and added the aligned regions to DS2 to build data set 3 (DS3).
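
The retention rule for BlastP hits described above can be written as a short predicate. The following Python sketch is only an illustration under stated assumptions, not the code used in the study: the `Hit` record, its field names, and the use of aligned residue counts on the query and on the subject are hypothetical choices made for the example.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-5  # BlastP E-value threshold stated in the text


@dataclass
class Hit:
    """One BlastP alignment (hypothetical record; field names are illustrative)."""
    evalue: float
    query_len: int        # full length of the query protein
    subject_len: int      # full length of the subject peptide
    query_aligned: int    # query residues covered by the alignment
    subject_aligned: int  # subject residues covered by the alignment


def keep_hit(hit: Hit) -> bool:
    """Return True if the hit passes the E-value and coverage criteria."""
    if hit.evalue >= E_VALUE_CUTOFF:
        return False
    # Integer arithmetic avoids rounding problems at the exact thresholds.
    covers_query = 3 * hit.query_aligned >= 2 * hit.query_len          # >= two-thirds
    covers_subject = 100 * hit.subject_aligned >= 80 * hit.subject_len  # >= 80%
    return covers_query or covers_subject
```

Under these assumptions, a short predicted peptide that aligns almost entirely to part of a long query is retained through the subject criterion, which is what allows fragments from fragmented assemblies to enter DS3.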

Data Set 4 and Data Set for the Analyses

Data set 4 (DS4) is the most curated and inclusive data set used for searches. The new information in DS4 was obtained after conducting exhaustive searches for CS-encoding regions directly on the DNA genome sequences using DS3 peptides as queries in a TBlastN search (E-value < 10⁻⁵). Positive blast hits on regions that were not annotated in the GFF files were considered putative novel CS family members. For the genome of A. geniculata, where there is no GFF information, we checked for the presence of any protein-coding region in the available transcriptomic data. The TBlastN search essentially identifies exonic regions. To expand these regions to cover complete genes (as much as possible), we concatenated all sequences with hits located in the same scaffold and separated by <16 kb. We chose a 16-kb cut-off value because it corresponds to the 95th percentile of the intron length distribution in the studied genomes (i.e., fragments separated by greater distances are unlikely to be exons of the same gene). Next, we translated the nucleotide sequences according to the TBlastN reading frame. To avoid generating chimeric proteins from physically close but different genes, we used the specific CS-F-HMM profile to determine whether the number of different domains of each new protein after concatenation was compatible with a single gene (HMMER search; i-E-value < 10⁻⁵). In addition to the “16-kb cut-off approach,” and because the putative exons of an incomplete gene might be located in different scaffolds, we also applied the ESPRIT algorithm (Dessimoz et al. 2011) to join these partial fragments using DS1 as a guide. Finally, all the newly discovered CS-encoding sequences were added to DS3 to generate DS4. These protein data in DS4 were then used as a query to conduct an additional search round (in the same way as in the DS3 and DS4 steps). 
Finally, we conducted a semiautomatic step to remove from the newly identified sequences putative errors introduced in the search process (deleting putative artefactual stop codons generated by TBlastN searches, splitting different genes erroneously fused in the same sequence, and removing very small fragments). With the curated data, we established the final chelicerate CS protein data set, named DSA (data set for the analyses), which was used in further comparative genomic and evolutionary analyses (supplementary table S1, Supplementary Material online). All new CS-proteins (including incomplete fragments) identified in this study are provided in the supplementary material, Supplementary Material online.
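
The “16-kb cut-off approach” amounts to chaining hits that lie on the same scaffold and are separated by less than 16 kb. The Python sketch below illustrates that grouping step only. It is a simplified example under stated assumptions: the `(scaffold, start, end)` tuple layout and the function name `group_hits` are hypothetical, strand and reading frame are ignored, and the subsequent CS-F-HMM domain check against chimeric proteins is not reproduced.

```python
from collections import defaultdict

MAX_GAP_BP = 16_000  # 16-kb cut-off stated in the text


def group_hits(hits, max_gap=MAX_GAP_BP):
    """Group hits on the same scaffold that are separated by < max_gap bp.

    `hits` is an iterable of (scaffold, start, end) tuples with start <= end.
    Returns a list of groups; each group is a list of hits ordered by start
    and is a candidate set of exons from one gene.
    """
    by_scaffold = defaultdict(list)
    for scaffold, start, end in hits:
        by_scaffold[scaffold].append((scaffold, start, end))

    groups = []
    for scaffold in sorted(by_scaffold):
        ordered = sorted(by_scaffold[scaffold], key=lambda h: h[1])
        current = [ordered[0]]
        current_end = ordered[0][2]
        for hit in ordered[1:]:
            if hit[1] - current_end < max_gap:
                # Close enough to the previous hit(s): same candidate gene.
                current.append(hit)
                current_end = max(current_end, hit[2])
            else:
                # Gap too large: close the group and start a new one.
                groups.append(current)
                current = [hit]
                current_end = hit[2]
        groups.append(current)
    return groups
```

For example, with the made-up hits `("scaf1", 1000, 1300)`, `("scaf1", 9000, 9400)`, and `("scaf1", 40000, 40250)`, the first two are chained (gap of 7,700 bp) and the third starts a new group (gap of 30,600 bp).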

Functional and Structural Classification of CS Sequences

We classified the novel sequences in different categories based on structural and functional criteria. First, we examined the presence of premature stop codons; these features could represent real nonfunctional copies (pseudogenes), errors in sequencing or genome assembly steps or inaccuracies in our automatic annotation step based on TBlastN hits. All sequences encoding complete proteins (CPs) that were free of stop codons were included in the first category (CP set). Operationally, we considered a CP when its length was >80% of the corresponding average protein domain length. In addition, and only for the GR family, we also required that the CP members contained a minimum of 5 of the 7 transmembrane domains (defined by the software TMHMM version 2.0c; Krogh et al. 2001; Phobius version 1.01; Käll et al. 2004). For the CP Ir/iGluR members, we required the presence of the two ligand-binding domains, namely, PF00060 (ligand-gated ion channel) and PF10613 (ligand ion channel L-glutamate- and glycine-binding site), which are present in all Ir/iGluR subfamilies, i.e., kainate, AMPA, NMDA, conserved IRs (Ir25a/Ir8a), and divergent IRs (Croset et al. 2010). The third domain exhibited by some members of the family, PF01094 (ANF receptor), was not used in this step. The remaining sequences that were free of stop codons and did not pass the length filter criteria were classified as incomplete proteins (IP set). Finally, the CP and IP sequences exhibiting some in-frame stop codons (that could represent pseudogenes, among other features; Ψ) were incorporated into two extra data sets (CPΨ and IPΨ sets, respectively). We used three different estimators of the number of copies of a particular CS family (family size). 
In addition to the straightforward number of CPs in a particular genome (SCP), we also determined the minimum number of sequences that could be unequivocally attributed to different functional genes (SMIN) and the maximum number of members in cases where all the incomplete protein fragments were actually different functional genes (SMAX). We estimated these numbers by aligning all protein sequences (both CP and IP) within a family using the CS-F-HMM profile as a guide and examining the matching distribution of all fragments aligned along the protein. The SMIN was obtained by adding to the total number of sequences present in the CP set, the minimum number of sequences of the IP set that could be unequivocally attributed to different family members. This minimum amount was determined by counting the number of partial sequences aligned in the most covered protein region of the CS-F-HMM profile-guided MSA. The SMAX is the total number of both CP and IP copies identified (supplementary table S1 and , Supplementary Material online).
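
The three family-size estimators can be illustrated with a small Python sketch. It assumes, for the purpose of the example only, that each incomplete (IP) sequence is summarized by the inclusive interval of profile positions it covers in the CS-F-HMM-guided MSA; the function names and the interval representation are hypothetical. The reasoning follows the text: fragments that overlap the same region of the profile cannot be pieces of the same gene, so the deepest stack of overlapping fragments gives the minimum number of distinct incomplete genes.

```python
def max_coverage_depth(fragments):
    """Largest number of fragments that overlap any single profile position.

    `fragments` holds inclusive (start, end) coordinates on the profile.
    """
    events = []
    for start, end in fragments:
        events.append((start, 1))     # fragment starts covering at `start`
        events.append((end + 1, -1))  # and stops covering after `end`
    depth = best = 0
    # Tuples sort by position first; at equal positions -1 precedes +1,
    # so adjacent, non-overlapping fragments are not counted together.
    for _, change in sorted(events):
        depth += change
        best = max(best, depth)
    return best


def family_size_estimates(n_complete, incomplete_fragments):
    """Return (S_CP, S_MIN, S_MAX) for one family in one genome."""
    s_cp = n_complete
    s_min = n_complete + max_coverage_depth(incomplete_fragments)
    s_max = n_complete + len(incomplete_fragments)
    return s_cp, s_min, s_max
```

With 10 complete proteins and four made-up fragments covering positions 1-120, 100-250, 110-300, and 260-400, three fragments overlap at positions 110-120, so the sketch returns SCP = 10, SMIN = 13, and SMAX = 14.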

Phylogenetic Analyses

As the divergence between some members of the same CS family is huge (i.e., their most recent common ancestor traces back far before the split of the major arthropod lineages, ∼600 Ma; Hedges et al. 2006), building a reliable MSA to estimate the phylogenetic relationships is not straightforward. To address this long-standing problem, we applied the MSA-free HMM distance-based method (Bogusz and Whelan 2017) implemented in the PaHMM-Tree software, which outperforms MSA-based methods when dealing with the high alignment uncertainty that is usually associated with large divergences. All the phylogenies except those of the IR family (see Results for more details about this family) were based on complete sequences. We used the iTOL web server (Letunic and Bork 2007) to format and display the trees.

Gene Turnover Rates

We estimated the gene family turnover rates using a gene tree–species tree reconciliation approach. The ultrametric species tree required for the analysis was inferred by fitting the amino acid variation of all 88 putative single-copy orthologs to the most accepted topology for the 11 species. For the analysis, we used OrthoMCL (v2.0.9; Li et al. 2003) to identify 1:1 orthologs by clustering the sequences by similarity and then generated an MSA (for each ortholog group) with T-Coffee v11.00 (mcoffee mode; Notredame et al. 2000). After filtering the MSAs with trimAl v1.4 (-automated1 option; Capella-Gutiérrez et al. 2009), we estimated the best-fit amino acid substitution model for each MSA with the program jModelTest based on the Akaike information criterion for model selection (Guindon and Gascuel 2003; Darriba et al. 2012) and concatenated all MSAs, keeping the individual coordinate information to be used as a partition for the phylogenetic analysis. We used RAxML software (option –f e) to obtain ML estimates of branch lengths and r8s software v 1.80 (Sanderson 2003) to linearize the unrooted ML tree using the penalized likelihood algorithm. For the last step, we constrained the ages of two internal nodes according to the fossil calibrations: 1) the root (on the range 528–445 Myr; Dunlop and Selden 2009) and 2) the split between scorpions and spiders (at a minimum of 428 Myr; Jeyaprakash and Hoy 2009). We analyzed the family turnover rates for the two largest gene families in Arachnida, Gr and Ir/iGluR, using a gene tree–species tree reconciliation approach. For each family and lineage, we estimated separately the birth (β) and death (δ) rates, which measure the number of sequence gains and losses per sequence per million years, respectively. For the global analysis, we estimated the average values across all branches, excluding Li. polyphemus, which was used to root the tree. 
We used the software OrthoFinder (Emms and Kelly 2015) to obtain orthogroups (i.e., all groups of N:N orthologs) and gene trees to calculate the number of gene gain and loss events in each lineage with the program Notung (Chen et al. 2000). Finally, we estimated the global turnover rates (β and δ) from these events using formulas 1 and 2 in Almeida et al. (2014), whereas the net turnover rates (Δ) were directly estimated as Δ = β − δ.
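
To make the units of β, δ, and Δ concrete, the following Python sketch converts counts of gain and loss events into rates per sequence per million years. It is a deliberately simplified illustration and does not reproduce formulas 1 and 2 of Almeida et al. (2014): the normalization by the number of copies at the start of the branch, the unweighted average across branches, and all names and example values are assumptions made for the example. Only the relation Δ = β − δ is taken directly from the text.

```python
def branch_rates(gains, losses, ancestral_copies, branch_myr):
    """Birth, death, and net rates for one branch (simplified assumption).

    Rates are expressed as events per copy per million years, using the
    number of copies at the start of the branch as the copies "at risk".
    """
    exposure = ancestral_copies * branch_myr  # copy x Myr
    beta = gains / exposure
    delta = losses / exposure
    return beta, delta, beta - delta  # net turnover: delta_net = beta - delta


def global_rates(branches):
    """Unweighted average of beta and delta across branches.

    `branches` is a list of (gains, losses, ancestral_copies, branch_myr).
    """
    rates = [branch_rates(*branch) for branch in branches]
    n = len(rates)
    beta = sum(r[0] for r in rates) / n
    delta = sum(r[1] for r in rates) / n
    return beta, delta, beta - delta
```

For instance, a hypothetical branch of 100 Myr that starts with 50 copies and accumulates 20 gains and 5 losses gives β = 0.004, δ = 0.001, and a net rate of about 0.003 events per copy per Myr under these assumptions.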

Results

The Chemosensory Subgenome of Chelicerates

Our comprehensive search protocol revealed 6,026 CS protein-coding sequences across the 11 surveyed chelicerate genomes (supplementary table S1, Supplementary Material online). Surprisingly, nearly 85% of them (5,086) had previously inaccurate genome annotations, including 4,131 nonannotated sequences (without a GFF record) and another 955 that, despite having structural annotation data in the GFF file, lacked functional information (as putative CS proteins) in the GFF field. Nevertheless, only 2,646 of the 6,026 sequences (supplementary table S1, Supplementary Material online) encoded complete (or nearly complete) CS proteins free of stop codons (CP set). Among the remaining sequences, 1,895 were incomplete (but without stop codons in frame) (IP set) and 1,485 showed one or more premature stop codons (including both CP and IP sequences). Globally, the actual number of putative functional CS genes ranged from 4,255 (SMIN) to 4,541 (SMAX), although only 2,646 of them were complete (SCP) (supplementary table S1, Supplementary Material online). Remarkably, although canonical insect Obp and Or genes were absent in chelicerate genomes, we found a huge and unexpected number of novel Gr-coding (108 uncharacterized peptides plus 3,331 novel genomic sequences) and Ir/iGluR-coding (525 plus 694) sequences. Furthermore, it is noteworthy that Csp members were absent in all genomes, except in the tick I. scapularis, and Ccp family members were identified only in spiders and scorpions (fig. 1).
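
The counts reported in this paragraph are mutually consistent, which can be verified with a few lines of arithmetic. The Python snippet below uses only the numbers given in the text; the variable names are illustrative.

```python
total_sequences = 6026

# Sequences with previously inaccurate annotations.
non_annotated = 4131             # no GFF record
no_functional_annotation = 955   # GFF record but no CS functional information
previously_inaccurate = non_annotated + no_functional_annotation   # 5086
percent_inaccurate = round(100 * previously_inaccurate / total_sequences, 1)  # 84.4

# Partition of all sequences by completeness and stop codons.
complete_cp = 2646     # CP set
incomplete_ip = 1895   # IP set
with_stop_codons = 1485
partition_total = complete_cp + incomplete_ip + with_stop_codons   # 6026

# S_MAX counts every CP and IP sequence as a different functional gene.
s_max = complete_cp + incomplete_ip                                # 4541
```

The three categories add up to the 6,026 sequences, the "nearly 85%" corresponds to 84.4%, and the reported SMAX of 4,541 equals the sum of the CP and IP sets, as expected from its definition.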

Chemoreceptors

We found that the Gr family is the largest CS gene family in chelicerates (SMIN = 3,074, SMAX = 3,157, and SCP = 2,032, considering only putative functional sequences; fig. 1; supplementary table S1, Supplementary Material online). Moreover, we also identified 1,097 putative Gr pseudogenes (see Discussion). Remarkably, there are extraordinary differences in the family size across chelicerates; although some species exhibit >400 copies, such as the scorpion C. exilicauda (SMIN = 832), the mite T. urticae (SMIN = 469) or the spider P. tepidariorum (SMIN = 643), others have <60, such as I. scapularis (SMIN = 57) and Li. polyphemus (SMIN = 58) (supplementary table S1, Supplementary Material online). These results cannot be explained by putative differences in the assembly quality across genomes because the same trend was observed with SMAX and SCP values. In fact, there is no relationship between the values of our three estimates of the real number of Gr genes across genomes and the N50, the number of scaffolds or the number of predicted peptides in these genomes (supplementary table S1, Supplementary Material online). Strikingly, even the most closely related species, the spiders La. hesperus and P. tepidariorum, greatly differ in their repertory size (fig. 1), revealing a highly dynamic evolution. These differences are clearly shown in the phylogenetic tree as large monophyletic groups (mostly species-specific clades). Despite these findings, the tree also reveals a distinctive monophyletic group of apparently less dynamic sequences with representatives from all chelicerates (fig. 3; supplementary fig. S1, Supplementary Material online). However, we did not detect any GR protein closely related to the functionally characterized carbon dioxide, sweet taste, and fructose insect receptors in chelicerates (Jones et al. 2007; Miyamoto et al. 2012; Fujii et al. 2015).

Fig. 3.—Phylogenetic tree of the Gr family members across arthropods. The different species are depicted in colors as in figure 1. The scale bar represents one amino acid substitution per site.

The Ir/iGluR is the second largest CS family (SCP = 323, SMIN = 825, and SMAX = 979). Again, although less markedly than in the Gr family, we detected a highly uneven distribution of copies across lineages. Interestingly, the repertory sizes of these two families do not correlate across chelicerates (Pearson correlation, P-value > 0.05); for instance, T. urticae encodes very few Ir/iGluR copies (SMIN = 19) but a large number of Gr genes (SMIN = 469). As in the Gr family, the relative Ir/iGluR family sizes across species are very similar regardless of whether SCP, SMIN, or SMAX values are used, suggesting that the assembly quality has no influence. The phylogenetic analysis using sequences with the complete ligand channel domain reproduced the established relationships of the five major arthropod Ir/iGluR subfamilies (fig. 4; supplementary fig. S2, Supplementary Material online; Croset et al. 2010; Vizueta et al. 2017). The gene topology allowed us to identify 249 IR proteins (or truthful IR set, t-IR) (200 with the two ligand-binding domains plus another 49 with only the ligand channel domain; supplementary table S1, Supplementary Material online), which would represent the minimum number of functional IR copy candidates to perform a chemosensory function. The phylogenetic analysis also revealed the absence of members of the Ir25a/Ir8a-conserved IR subfamily in M. martensii, S. mimosarum, A. geniculata, and La. hesperus. However, a more comprehensive analysis of the IP set revealed that, in fact, all these species encode one IR25a receptor (supplementary table S2 and fig. S3, Supplementary Material online). Interestingly, we failed to detect putative homologs of IR8a in any chelicerate except the horseshoe crab Li. polyphemus (LpolIR11 sequence). 
Still, we could detect putative homologs of two Drosophila antennal IRs, IR93a and IR76b. The first member was identified in all species, excluding A. geniculata and S. mimosarum, whereas IR76b was present in Daphnia, the horseshoe crab, the two scorpions and the spiders P. tepidariorum and La. hesperus (supplementary table S2 and fig. S3, Supplementary Material online). Nonetheless, we did not find putative homologs of the other Drosophila antennal IRs with orthologous copies in insects, such as IR21a and IR40a (Croset et al. 2010; Eyun et al. 2017).
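
The comparison of Gr and Ir/iGluR repertory sizes relies on the Pearson correlation coefficient. The Python sketch below shows how such a coefficient is computed from paired per-species family sizes. It is an illustration only: the function name is hypothetical, and the per-species values must be taken from supplementary table S1, which is not reproduced here. Assessing significance would additionally require a P-value, for example from a library routine such as `scipy.stats.pearsonr`.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

A call such as `pearson_r(gr_sizes, ir_sizes)`, where both lists hold the SMIN values of the 11 chelicerates in the same species order, returns a value between -1 and 1; a value close to 0 is what the reported lack of correlation would look like.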

Fig. 4.—Phylogenetic tree of the Ir/iGluR family members across arthropods. The tree is based on LCD domain sequences (PF00060). Different lineages are colored as in figure 1. The three main subfamilies of iGluRs and the conserved IR clade are shaded in different colors. The scale bar represents one amino acid substitution per site.

Other Chemosensory Families

We identified several novel and complete OBP-like encoding sequences in chelicerates (fig. 1; supplementary table S1, Supplementary Material online). In addition to the described members in I. scapularis, M. occidentalis, S. mimosarum, and S. maritima (Renthal et al. 2017; Vizueta et al. 2017), we identified a total of 26 new (out of 30) OBP-like proteins in chelicerates. All the chelicerates encode at least one member of this family, with repertory sizes ranging from 1 to 4 copies. Additionally, and very surprisingly, we detected 19 novel (out of a total of 21) Obp-like genes in the centipede S. maritima. Our phylogenetic analysis of canonical OBP (from insects) and OBP-like proteins (fig. 5, supplementary fig. S4, Supplementary Material online) does not support the reciprocal monophyly of these two gene families. Although some OBP-like sequences (such as MoccOPBl2, IscaOBPl2 and PtepOBPl3) are phylogenetically close to the OBP Plus-C subfamily, some insect OBPs, for example, DmelOBP99c (a member of the insect minus-C subfamily), are more closely related to the chelicerate OBP-like sequences than to the other insect OBP sequences. Moreover, the phylogenetic analysis revealed three major clades, each almost exclusively containing sequences of a given arthropod lineage (i.e., D. melanogaster, S. maritima, and chelicerates).

Fig. 5. Phylogenetic relationships of the Obp-like and insect (D. melanogaster) Obp family members. Lineages and species names are colored as in figure 1. For clarity, two D. melanogaster nodes with 12 and 33 descendant sequences are collapsed. The color of the inner circle indicates the Obp subfamily: Classic (black), Minus-C (green), Plus-C (blue) and Dimer (red). The outer circle in yellow indicates the members from noninsect species with PBP/GOBP domain (IPR006170). The scale bar represents 0.1 amino acid substitutions per site.

The size of the Npc2 family has remained relatively constant during the diversification of the major chelicerate lineages, ranging from 10 to 20 (SMIN values, supplementary table S1, Supplementary Material online), with the outstanding exception of T. urticae, which encodes 47 genes. Nevertheless, nearly half of the Npc2 members of some species are incomplete fragments or show premature stop codons, making it much more difficult to draw firm conclusions about the real size of this family than for the other families. In this case, we found a strong positive correlation between N50 and SCP, SMIN, and SMAX values (Pearson correlation coefficient, r > 0.80; P < 0.05; supplementary table S1, Supplementary Material online), indicating that the observed variation in the number of Npc2 genes across species is clearly associated with genome assembly continuity. This result is probably due to the fact that the length of the genomic region that includes the target sequences of the similarity searches is the longest (together with the Cd36-Snmp family, see below) among the families surveyed in this work. Unlike chemoreceptors and Obp-like members, NPC2 proteins are not arranged in large species-specific phylogenetic clades (supplementary fig. S5, Supplementary Material online), suggesting a less dynamic evolution of this family compared with chemoreceptors and OBP-like proteins.
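The association between assembly continuity and family size reported above is a plain Pearson correlation between per-genome N50 values and per-genome copy-number estimates. The sketch below shows how such a coefficient can be computed. The function name `pearson_r` and the input values are illustrative assumptions; the numbers are toy values, not the data in supplementary table S1.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Toy placeholder values (hypothetical, NOT taken from the study):
# one assembly N50 and one family-size estimate per genome.
toy_n50 = [1.0, 2.0, 3.0, 4.0]
toy_family_size = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(toy_n50, toy_family_size)
```

A value of r close to 1 for real data would mean that more contiguous assemblies tend to yield larger size estimates, which is the pattern described for Npc2. The significance test (the P value) is not covered by this sketch.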
Our searches for members of the recently discovered Ccp gene family (Vizueta et al. 2017) only provided positive results in spiders and in Centruroides exilicauda (the bark scorpion), although the sequence identity of the copy detected in this last species is low. We found important differences in family size across species, from 2 in Lo. reclusa to 21 in P. tepidariorum (SMIN). As in D. silvatica, most CCPs exhibited an identifiable signal peptide sequence and a conserved cysteine pattern, supporting their putative role in the extracellular binding and transport of chemical cues (Vizueta et al. 2017). The phylogenetic analysis of this family revealed relatively short branches and clades likely representing orthologous genes (supplementary fig. S6, Supplementary Material online). Even so, the 21 copies of P. tepidariorum (11 of them forming a species-specific clade) are a remarkable exception and could be associated with an adaptive event linked to this family in this lineage. The high-quality assembly and annotation of the P. tepidariorum genome may be good enough to have a closer look at the genomic location of Ccp genes and to search in this family for signatures of the lineage-specific bursts of tandem duplications stated by Schwager et al. (2017).

The Cd36-Snmp Family

The Cd36-Snmp family size has also remained relatively constant during the diversification of chelicerates, especially with respect to the SMAX values (ranging from 8 to 19). Nevertheless, as in the Npc2 family, nearly half of the positive hits encode incomplete proteins, most of which are in spiders and scorpions (supplementary table S1, Supplementary Material online). Consistent with the large size of the target genomic regions of this family, we also found a positive correlation between N50 and SCP and SMIN (but not SMAX) values for this family (Pearson correlation coefficient, r > 0.56; P < 0.05; supplementary table S1, Supplementary Material online), although weaker than in the case of NPC2. The phylogenetic analysis (supplementary fig. S7, Supplementary Material online) showed that only one of the three phylogenetic clades described by Nichols and Vogt (2008) has remained monophyletic across all arthropods (i.e., the group including the SNMP protein of D. melanogaster). However, many sequences do not form monophyletic groups and, therefore, cannot be unambiguously assigned to a given subfamily group, suggesting a more complex grouping than that observed in insects (Nichols and Vogt 2008).

Gene Turnover Rates of Chemoreceptors

We estimated gene family turnover rates for the two largest Chelicerata gene families, Gr and Ir/iGluR, using Li. polyphemus to root the tree (fig. 6, supplementary fig. S8, Supplementary Material online). As the analysis could have been compromised by the use of three different estimates of family size (per CS family), we first evaluated the behavior of these size estimates with respect to the turnover rates. We found that the number of gene duplications and losses calculated using SCP (only for the Gr family), SMIN, and SMAX values strongly correlated across lineages (r > 0.94; P < 10⁻⁵); therefore, we did not expect important relative rate differences among the three estimates. Thus, we calculated birth and death rates only with SMIN because this estimate likely represented the true number of copies in most genomes.

Fig. 6. Gene turnover of chemoreceptors across chelicerates. Estimates obtained from the data set used to estimate SMIN. Numbers above and below each branch indicate lineage-specific gene duplications and losses, respectively. Green, GR family; blue, IR/iGluR family. Estimates in very short and outgroup branches have large uncertainty and are not shown. Numbers in the ancestral nodes show the estimated family sizes. Numbers at the tips indicate the number of sequences used for the analysis; such values can differ from SMIN because only sequences that clustered in an orthogroup (with three or more sequences) were included in the analysis.

We found that the global (across the whole phylogenetic tree) gene turnover rates of Gr and Ir/iGluR showed important differences (supplementary fig. S8, Supplementary Material online). In Gr, the net turnover rates were positive (Δ = 0.003), indicating an overall expansion of the gustatory receptor repertory during arachnid diversification. In contrast, the Ir/iGluR family showed an overall contraction (Δ = −0.002). These results should be considered with caution because global turnover rates are strongly affected by the presence of specific phylogenetic branches with extreme values. In the Gr family, for instance, the external lineages leading to T. urticae (β = 0.015), C. exilicauda (β = 0.030), and P. tepidariorum (β = 0.030) have β values that are much higher than the global rates (β = 0.007); in contrast, other branches, such as the internal lineage leading to acari (δ = 0.008) and the external lineage leading to La. hesperus (δ = 0.007), show death rates that clearly exceed global estimates (δ = 0.004). The Ir/iGluR family exhibits smaller turnover rate differences among the lineages than those observed for Gr. Even so, the external branches of C. exilicauda (β = 0.005), and especially of P. tepidariorum (β = 0.011), are clear outliers and the only ones that show a clear expansion of the Ir/iGluR repertory during the diversification of arachnids. It should be noted that the Ir/iGluR data set includes the sequences of five subfamilies of this highly functional, diverse family of receptors, which show very dissimilar turnover rates in insects. In fact, the Ir subfamily, which is the only subfamily encoding putative chemosensory receptors, is the most dynamic family of insects. Therefore, to disentangle subfamily-specific effects, we estimated the gene turnover rates using only the IR copies from SMIN and the t-IR set (fig. 4). As expected, birth and death rates estimated from the SMIN and t-IR sets did not show big differences (results not shown), suggesting a major effect of the Ir subfamily on gene turnover estimates in the Ir/iGluR family. Indeed, the t-IR estimates were even more variable across lineages than those obtained for the whole family, especially for birth rates, with slightly higher average rates. Especially noteworthy is the case of the P. tepidariorum lineage, which not only confirmed the findings of the SMIN set analysis but also showed that the gene number expansion (supplementary fig. S2, Supplementary Material online) was definitively caused by the birth of new Ir genes (t-Ir set based estimates, β = 0.020, δ = 4 × 10⁻⁴).
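The relationship between the birth rate (β), the death rate (δ), and the net turnover rate (Δ) used in this section can be illustrated with a short calculation. The sketch assumes that Δ is simply β minus δ, which is consistent with the global Gr values reported above (β = 0.007, δ = 0.004, Δ = 0.003). The function names are hypothetical and are not part of any software used in the study.

```python
def net_turnover(birth_rate, death_rate):
    """Net turnover rate, assumed here to be birth rate minus death rate."""
    return birth_rate - death_rate

def classify(delta, tolerance=1e-9):
    """Label a net rate as expansion, contraction, or stable."""
    if delta > tolerance:
        return "expansion"
    if delta < -tolerance:
        return "contraction"
    return "stable"

# Global Gr rates reported in the text
gr_delta = net_turnover(0.007, 0.004)

# t-IR based estimates for the P. tepidariorum lineage reported in the text
ptep_ir_delta = net_turnover(0.020, 4e-4)

print(round(gr_delta, 3), classify(gr_delta))
print(round(ptep_ir_delta, 4), classify(ptep_ir_delta))
```

Under this assumption, the P. tepidariorum Ir estimates give a net rate dominated almost entirely by gene births, in line with the interpretation given above.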

Discussion

The early diversification of arthropods predated the colonization of land by animals (Rota-Stabelli et al. 2013). Chemical communication strategies associated with this terrestrialization, therefore, should have been invented several times independently in their major lineages (Hexapoda, Crustacea, Myriapoda, and Chelicerata). It is likely that proteins involved in the first peripheral chemosensory perception steps, which are commonly associated with medium-size gene families, played a central role. Hence, these gene families represent an important fraction of arthropod genomes and contribute significantly to gene turnover dynamics in insects (Sánchez-Gracia et al. 2009, 2011). The recent availability of the complete genome sequences from various chelicerates has provided insights into their CS family members. Nevertheless, the quality of the genome assembly and functional annotation is far from satisfactory. Some genomes are highly fragmented, with an absence of functional annotations or annotations obtained using only nonexhaustive automated protocols. Here, we report the first comparative analysis of the actual copy number and gene turnover evolution of CS families in 11 nonhexapod genomes. This study is in fact the first comprehensive comparative genomics study that, although enriched in Arachnida species, covers most of the chelicerate diversity (see Eyun et al. [2017], Palmer and Jiggins [2015], and Sanggaard et al. [2014] for examples of previous studies based on many fewer genomes).

The Outstanding Chemoreceptor Repertory of Chelicerata Genomes

The most important challenge for understanding gene family evolution is having well-characterized copies and accurate functional annotations of their members. This is particularly relevant when using highly fragmented genome assemblies generated from short-read sequencing data. To circumvent this problem, we applied a very comprehensive identification and characterization protocol that combined both protein and DNA sequence data, including HMM profiles and protein domain signatures, in a series of sequential searches with accurate filters based on our biological knowledge of the CS system. Our study revealed a surprisingly large number of novel Gr- and Ir-encoding sequences. This feature can be mostly explained by the poor functional annotation status of some genomes. In fact, in those genomes in which CS families had been explicitly characterized (the three acari species, D. melanogaster, D. pulex, and S. maritima), the results of our search protocol largely matched previous annotations. This agreement, therefore, indicated that the novel CS-encoding sequences were not false positives caused by a misleading search protocol. We also found that some of the newly identified CS genes were highly fragmented, which is also a consequence of the low quality of assemblies and, therefore, of the poor annotation of gene structures in most surveyed genomes. Most genes are distributed across many different scaffolds, preventing the calculation of the exact number of functional copies in a particular genome. This feature led us to define three repertory size statistics, which not only provided an approximate idea of true values but also allowed for harmonized comparisons across genomes and lineages. As expected, the largest discrepancy occurred between size estimates based on complete genes (SCP) and those including information of incomplete gene fragments (SMIN and SMAX).
Despite this difference, however, all three data sets yielded very similar estimates of gene turnover rates; therefore, all of them are good approximations of true CS family sizes and are appropriate to study gene family dynamics across chelicerates. Although SMIN and SMAX values were generally similar, two families showed very important discrepancies: Ir/iGluR and Cd36-Snmp. These discrepancies could be explained by the fact that these genes (and the encoding region including introns) are larger than in the other families, and therefore, it is more likely that the encoding region was fragmented in different scaffolds. In fact, this effect was not observed in genomes with more contiguity (based on the N50 values of the genome assemblies), as observed in T. urticae, M. occidentalis, S. mimosarum, and P. tepidariorum. Finally, we also found numerous sequences with in-frame stop codons, which we have preliminarily classified as putative pseudogenes. It should be taken into account that not all sequences with evidence of stop codons must be nonfunctional copies; indeed, some of these stop codons may be introduced during gene assembly from dispersed TBlastN hits (which has been done in a semiautomatic way). Only with the use of additional, high-quality assembled genomes will it be possible to obtain accurate information concerning the nature and number of these putative pseudogenes.
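As an illustration of how sequences with in-frame stop codons can be flagged, the following minimal sketch scans a coding sequence codon by codon. It assumes the standard genetic code (stop codons TAA, TAG, and TGA) and a known reading frame. The function name `premature_stops` is hypothetical, and this is not the annotation pipeline used in the study.

```python
# Stop codons of the standard genetic code (an assumption of this sketch)
STOP_CODONS = {"TAA", "TAG", "TGA"}

def premature_stops(cds, frame=0):
    """Return 0-based start positions of in-frame stop codons,
    ignoring the last complete codon (a legitimate terminal stop)."""
    seq = cds.upper().replace("U", "T")
    # Start positions of every complete codon in the chosen frame
    starts = list(range(frame, len(seq) - 2, 3))
    hits = []
    for i in starts[:-1]:
        if seq[i:i + 3] in STOP_CODONS:
            hits.append(i)
    return hits

def is_putative_pseudogene(cds, frame=0):
    """Flag a sequence that carries at least one premature in-frame stop."""
    return len(premature_stops(cds, frame)) > 0

# Toy sequences for illustration only
intact = "ATGAAATAA"        # ATG AAA TAA: only a terminal stop
disrupted = "ATGTAGAAATAA"  # ATG TAG AAA TAA: internal stop at position 3
```

A flag raised by such a scan is only a first indication: as noted above, a stop codon can also be introduced when a gene model is assembled from dispersed hits, so flagged sequences remain putative pseudogenes.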

CS Gene Turnover in Chelicerates: Complex Evolutionary Dynamics

We have shown that although chelicerates have larger Gr gene repertories than nonchelicerates, the estimated birth and death rates for the Gr family are almost the same as those in insects (Almeida et al. 2014). The disparate family sizes might be explained by former differences in the ancestors of each of these two lineages. In fact, at least two ancient and independent whole-genome duplications (WGD) have been proposed for chelicerates, one in the ancestor of spiders and scorpions (∼450 Ma; Schwager et al. 2017), and the other likely occurred in the lineage of horseshoe crabs (Kenny et al. 2016; Schwager et al. 2017). Thus, it is tempting to hypothesize that evolutionary forces and genomic mechanisms underlying the long-term birth and death dynamics of chemosensory families were essentially the same in all arthropods, although eventually promoted by lineage-specific genome-scale events such as WGD. Nevertheless, not all of our results are compatible with such an evolutionary scenario. For instance, the results obtained for the Ir subfamily do not agree with those observed for Gr. The birth and death rates of these putative chemoreceptors differ between chelicerates and nonchelicerates, and they do not show the footprint of the WGD preceding the diversification of spiders and scorpions. In fact, the net turnover rate of this family shows the opposite pattern to that of the GRs, suggesting an important contraction of ionotropic receptors in chelicerates. Furthermore, the occurrence of WGD events could not satisfactorily explain the full evolutionary history of most of the surveyed families, not even for the Gr family. First, T. urticae shows a very large GR repertory (SMIN = 469) and a very small IR repertory (SMIN = 6) compared with the other acari, and this pattern is unequivocally not explained by the use of a particular family size statistic (the three estimators point to the same feature).
Although we cannot completely rule out the possibility of a WGD in this lineage, there is no compiled evidence in support of this phenomenon (Grbić et al. 2011; Kenny et al. 2016). Second, the closest phylogenetic lineages in our study (La. hesperus and P. tepidariorum, with the most recent common ancestor tracing back approximately 100 Ma) show enormous differences in Gr and Ccp family sizes. Finally, the turnover rates estimated in pairs of phylogenetically close species (C. exilicauda and M. martensii; La. hesperus and P. tepidariorum) are difficult to reconcile with a constant birth and death process. Therefore, the evolutionary process was rather complex and cannot be entirely explained by WGD. Here, we have demonstrated that other processes specifically affecting chemosensory families, such as long-term birth-and-death evolution associated with high turnover rates, occurred in parallel to these whole-genome changes. In addition, more episodic, and probably lineage-specific, expansions and/or contractions also contributed to determine current sizes, as suggested in other studies (Chipman et al. 2014; Schwager et al. 2017). In order to know the relative role of these different processes in shaping actual CS family sizes and their functional meaning, it is imperative to improve the quality of existing genomes and include in the analysis new, more closely related genomes (i.e., increase the phylogenetic coverage).

Phylogenetic Analysis of CS Genes in Arthropods

Despite the above-mentioned limitations, our phylogenetic analysis can shed light on the diversification pattern of CS families. As arthropod CS families are very old and many of their members, especially chemoreceptors, are distantly related, the use of standard multiple sequence alignment (MSA) methods could be inappropriate for building robust phylogenies. A common method to circumvent this problem is filtering poorly aligned positions and, therefore, considering only highly conserved sites for phylogenetic analyses (Croset et al. 2010; Wu et al. 2016). This approach nevertheless results in a significant loss of relevant amino acid positions that likely contain valuable information on functional and structural features related to the molecular specificity and diversification. Here, we used, for the first time in highly divergent CS families, a method to estimate gene trees using an MSA-free approach, which takes into account alignment uncertainty. For the sake of comparison, we reconstructed the same phylogenetic trees using RAxML based on HMM profile-guided MSAs (Stamatakis 2014; Supplementary file 4). Major differences between PaHMM-Tree and RAxML were found at internal nodes and nodes with low bootstrap support in ML trees (< 70% from 500 replicates). Although bootstrap values increased when filtering poorly aligned positions (Capella-Gutiérrez et al. 2009), the number of informative sites retained after removing these unreliable positions was very low, causing the ML trees to be based on a very small number of positions. These trees may not reflect the real evolutionary history of the chemosensory proteins. Besides, for very large families, such as the Gr, the bootstrap analysis was unfeasible in practice due to excessive computation times.
Given that PaHMM-Tree is an alignment-free approach, which allows us to utilize all the amino acid positions to reconstruct the trees, and that the results obtained by Bogusz and Whelan (2017) point to a better performance of this approach for highly divergent sequences without the need for a previous filtering step, here, we decided to report the results based on this method. However, a more exhaustive study comparing these and other tree reconstruction methods, using both real and simulated data and under different degrees of divergence, would be necessary to know whether this method actually improves the phylogenetic analysis. Our phylogenetic analysis correctly recovered all previously known (and accepted) relationships among subfamilies and revealed new aspects of the diversification of CS genes. We found that chelicerates virtually have their own GR repertoires with almost no phylogenetic clade containing members of insects, crustaceans, and myriapods. In fact, we did not find homologs of any of the GRs functionally characterized in insects. Apparently, chelicerate genomes do not encode any protein sequence close to Drosophila sugar, fructose, or carbon dioxide receptors (Jones et al. 2007; Miyamoto et al. 2012; Fujii et al. 2015), questioning their ability to detect these substances. Nevertheless, chelicerates might be using other phylogenetically distant gustatory receptors to perform these tasks. Yet, the presence of a monophyletic clade with more conserved GR chelicerate sequences would suggest the existence of some other important biological function played by these receptors. The members of this clade could have a highly relevant function in chelicerates, evolving under lower evolutionary rates despite the tremendous diversification of this subphylum. Future functional studies combined with new evidence based on greater coverage phylogenetic analysis will definitely shed light on this interesting hypothesis.
Another remarkable result is the verification that most GRs found in species with very large repertories, such as P. tepidariorum or C. exilicauda, are monophyletic, pointing to important bursts of gene duplication events in relatively recent time periods. These events probably represent adaptive expansions of the gustatory repertory associated with chemosensory diversifications. In other cases, such as in the T. urticae lineage, apparent species-specific family expansions might be just an artefact caused by the continued effect of the birth-and-death process in a very long terminal branch (i.e., reflecting the low phylogenetic coverage of this part of the tree). Although the general phylogenetic pattern observed in the IR is very similar to that of the GR, we detected some Ir members with relatively conserved sequences across all arthropods. We can hypothesize that these receptors should have a very relevant and not easily replaceable function. For instance, IR25a, a receptor found in all arthropods surveyed to date, is a broadly expressed protein involved in trafficking other IRs to the membrane in olfactory and taste organs, and it has also been proposed to have a coreceptor function in the membrane (Joseph and Carlson 2015). We also found a putative ortholog of IR8a in the horseshoe crab Li. polyphemus, which led us to reformulate the hypothesis of Eyun et al. (2017) suggesting that this member arose in the ancestor of myriapods and pancrustaceans, tracing back its origin, again, to at least the ancestor of arthropods. Our analysis also supports the presence of a group of IR76b homologs outside the insect clade (Eyun et al. 2017) which was likely present in the arthropod ancestor. This receptor, proposed to play a coreceptor function for other IRs and associated with a gustatory function as a detector of low salt concentrations (Zhang et al. 2013), has been identified in all chelicerates except in the acari and some spider clades.
Its absence in these arthropod groups suggests a secondary loss in the ancestor of these lineages. However, we could not fully refute the possibility that we were unable to detect this member in these genomes, especially in spiders, because of assembly fragmentation. Our current phylogenetic analysis failed to detect putative homologs of IR21a and IR40a in chelicerates. Though we found some weak evidence for homologs of these receptors in the transcriptome of the spider D. silvatica (Vizueta et al. 2017), we rely more on the analysis applied herein, which is more comprehensive and uses an alignment-free method based on HMM profiles to generate the trees. This new evidence, together with previous genomic analyses, would indicate the presence of IR21a exclusively in panarthropods (Eyun et al. [2017] have recently found a putative homolog of the IR21a protein in copepods) and of IR40a exclusively in insects. Notably, our study shows that all chelicerates and the centipede S. maritima carry members of the Obp-like family, a gene family that is closely related to insect OBPs (Renthal et al. 2017; Vizueta et al. 2017). This family, which is absent in crustaceans, might represent a remote homolog of canonical insect OBPs. The close relationship of a Drosophila minus-C OBP with a chelicerate OBP-like clade, in agreement with the results of Renthal et al. (2017) based on the disulfide bonding pattern, suggests that this subfamily represents an ancestral state of an OBP. Nonetheless, we cannot completely ignore the possibility that the similar sequence arose by structural convergence. Like a canonical OBP, OBP-like has a signal peptide region, a predicted globular protein with the characteristic cysteine patterns of OBPs, and predicted folding similar to that of insect OBPs. Moreover, some experimental results have also confirmed the expression of some Obp-like members in specific chelicerate chemosensory appendages (Renthal et al. 2017).
All compiled evidence, therefore, suggests that chelicerate and myriapod OBP-like proteins may have a similar function to canonical OBPs, such as in solubilizing and transporting chemical cues. Regardless, the extraordinarily large repertory observed in S. maritima clearly merits further investigation. This is especially interesting because the genome paper of S. maritima reported a high number of tandem duplications (Chipman et al. 2014). Intriguingly, we did not find CSP-encoding genes in the surveyed chelicerates, except the single copy found in the tick I. scapularis (Gulia-Nuss et al. 2016). Although Eyun et al. (2017) reported some sequences encoding CSP proteins in the bark scorpion C. exilicauda and the spider La. hesperus, our analysis of such sequences could not unequivocally establish that they encode real CSP proteins; indeed, these sequences are very short with multiple in-frame stop codons and do not exhibit the characteristic cysteine CSP pattern, suggesting a false positive result. Our analysis also allowed us to identify members of the Ccp gene family in spiders, as well as a remote homolog in the bark scorpion C. exilicauda, suggesting that the origin of this rapidly evolving gene family traces back to the ancestor of these two groups. Remarkably, we observed a large expansion of some members (a lineage-specific expansion) in the house spider P. tepidariorum, a feature that reflects its greater number of chemoreceptors. We have established that the CCP-encoding genes have a signal peptide fragment and similar folding characteristics to the insect OBP and are differentially expressed in the putative chemosensory appendages of the spider D. silvatica (Vizueta et al. 2017).
Therefore, although their actual function is unknown, it is tempting to assign them a putative function in the transport and solubilization of chemical cues, a functional role equivalent to that of the canonical OBP. Nevertheless, given that the Ccp is a rapidly evolving gene family that emerged in some derived chelicerate lineages, it could provide new insights into the extracellular-binding protein functions and their roles in diversification and adaptation in arthropods.

Conclusions

Noninsect arthropods comprise a significant portion of Earth’s biodiversity and include many species of economic and medical importance. Here, we conducted the first comprehensive comparative genomic analysis across 11 genomes of this old lineage and the first of this magnitude outside of insects. Although the high fragmentation of genome drafts prevented us from establishing the exact number of chemosensory genes in each species, our exhaustive search protocol exposed an unprecedentedly large number of new family members. Remarkably, many of these new genes had not been characterized or even detected before, and most of them encode chemoreceptors. Moreover, we found a remarkable disparity in chemoreceptor repertories across species that is difficult to explain without invoking lineage-specific adaptive expansions probably related to sensory diversification processes. Characterizing the intragenomic dynamics and the specific function of these recently expanded chemosensory genes is an exciting prospect that, together with the improvement of existing genome assemblies and the reduction of the phylogenetic gap, will allow researchers to move forward in the knowledge of chelicerate genomics and biology. This work aims to contribute to this advance and hopes to be the starting signal for many future comprehensive comparative genomic studies in a group of animals that is as fascinating as it is poorly known.

Supplementary Material

Supplementary data are available at Genome Biology and Evolution online.

86 in total

1. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.

Authors: A Krogh; B Larsson; G von Heijne; E L Sonnhammer
Journal: J Mol Biol Date: 2001-01-19 Impact factor: 5.469

2. T-Coffee: A novel method for fast and accurate multiple sequence alignment.

Authors: C Notredame; D G Higgins; J Heringa
Journal: J Mol Biol Date: 2000-09-08 Impact factor: 5.469

3. TimeTree: a public knowledge-base of divergence times among organisms.

Authors: S Blair Hedges; Joel Dudley; Sudhir Kumar
Journal: Bioinformatics Date: 2006-10-04 Impact factor: 6.937

4. Trancriptomic approach reveals the molecular diversity of Hottentotta conspersus (Buthidae) venom.

Authors: Bea G Mille; Steve Peigneur; Reinhard Predel; Jan Tytgat
Journal: Toxicon Date: 2015-03-27 Impact factor: 3.033

5. Pheromone binding and inactivation by moth antennae.

Authors: R G Vogt; L M Riddiford
Journal: Nature Date: 1981 Sep 10-16 Impact factor: 49.962

6. The insect SNMP gene family.

Authors: Richard G Vogt; Natalie E Miller; Rachel Litvack; Richard A Fandino; Jackson Sparks; Jon Staples; Robert Friedman; Joseph C Dickens
Journal: Insect Biochem Mol Biol Date: 2009-04-11 Impact factor: 4.714

7. Genomic insights into the Ixodes scapularis tick vector of Lyme disease.

Authors: Monika Gulia-Nuss; Andrew B Nuss; Jason M Meyer; Daniel E Sonenshine; R Michael Roe; Robert M Waterhouse; David B Sattelle; José de la Fuente; Jose M Ribeiro; Karine Megy; Jyothi Thimmapuram; Jason R Miller; Brian P Walenz; Sergey Koren; Jessica B Hostetler; Mathangi Thiagarajan; Vinita S Joardar; Linda I Hannick; Shelby Bidwell; Martin P Hammond; Sarah Young; Qiandong Zeng; Jenica L Abrudan; Francisca C Almeida; Nieves Ayllón; Ketaki Bhide; Brooke W Bissinger; Elena Bonzon-Kulichenko; Steven D Buckingham; Daniel R Caffrey; Melissa J Caimano; Vincent Croset; Timothy Driscoll; Don Gilbert; Joseph J Gillespie; Gloria I Giraldo-Calderón; Jeffrey M Grabowski; David Jiang; Sayed M S Khalil; Donghun Kim; Katherine M Kocan; Juraj Koči; Richard J Kuhn; Timothy J Kurtti; Kristin Lees; Emma G Lang; Ryan C Kennedy; Hyeogsun Kwon; Rushika Perera; Yumin Qi; Justin D Radolf; Joyce M Sakamoto; Alejandro Sánchez-Gracia; Maiara S Severo; Neal Silverman; Ladislav Šimo; Marta Tojo; Cristian Tornador; Janice P Van Zee; Jesús Vázquez; Filipe G Vieira; Margarita Villar; Adam R Wespiser; Yunlong Yang; Jiwei Zhu; Peter Arensburger; Patricia V Pietrantonio; Stephen C Barker; Renfu Shao; Evgeny M Zdobnov; Frank Hauser; Cornelis J P Grimmelikhuijzen; Yoonseong Park; Julio Rozas; Richard Benton; Joao H F Pedra; David R Nelson; Maria F Unger; Jose M C Tubio; Zhijian Tu; Hugh M Robertson; Martin Shumway; Granger Sutton; Jennifer R Wortman; Daniel Lawson; Stephen K Wikel; Vishvanath M Nene; Claire M Fraser; Frank H Collins; Bruce Birren; Karen E Nelson; Elisabet Caler; Catherine A Hill
Journal: Nat Commun Date: 2016-02-09 Impact factor: 14.919

8. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses.

Authors: Salvador Capella-Gutiérrez; José M Silla-Martínez; Toni Gabaldón
Journal: Bioinformatics Date: 2009-06-08 Impact factor: 6.937

9. Fatty acid solubilizer from the oral disk of the blowfly.

Authors: Yuko Ishida; Jun Ishibashi; Walter S Leal
Journal: PLoS One Date: 2013-01-11 Impact factor: 3.240

10. Genome Sequencing of the Phytoseiid Predatory Mite Metaseiulus occidentalis Reveals Completely Atomized Hox Genes and Superdynamic Intron Evolution.

Authors: Marjorie A Hoy; Robert M Waterhouse; Ke Wu; Alden S Estep; Panagiotis Ioannidis; William J Palmer; Aaron F Pomerantz; Felipe A Simão; Jainy Thomas; Francis M Jiggins; Terence D Murphy; Ellen J Pritham; Hugh M Robertson; Evgeny M Zdobnov; Richard A Gibbs; Stephen Richards
Journal: Genome Biol Evol Date: 2016-06-27 Impact factor: 3.416

Cited by: 13 in total

1. Lipocalins in Arthropod Chemical Communication.

Authors: Jiao Zhu; Alessio Iannucci; Francesca Romana Dani; Wolfgang Knoll; Paolo Pelosi
Journal: Genome Biol Evol Date: 2021-06-08 Impact factor: 3.416

2. Odorant-Binding Proteins as Sensing Elements for Odour Monitoring. (Review)

Authors: Paolo Pelosi; Jiao Zhu; Wolfgang Knoll
Journal: Sensors (Basel) Date: 2018-09-27 Impact factor: 3.576

3. The draft genome sequence of the spider Dysdera silvatica (Araneae, Dysderidae): A valuable resource for functional and evolutionary genomic studies in chelicerates.

Authors: Jose Francisco Sánchez-Herrero; Cristina Frías-López; Paula Escuer; Silvia Hinojosa-Alvarez; Miquel A Arnedo; Alejandro Sánchez-Gracia; Julio Rozas
Journal: Gigascience Date: 2019-08-01 Impact factor: 6.524

4. Comparative Genomics Identifies Putative Signatures of Sociality in Spiders.

Authors: Chao Tong; Gabriella M Najm; Noa Pinter-Wollman; Jonathan N Pruitt; Timothy A Linksvayer
Journal: Genome Biol Evol Date: 2020-03-01 Impact factor: 3.416

5. Comparison of transcriptomes from two chemosensory organs in four decapod crustaceans reveals hundreds of candidate chemoreceptor proteins.

Authors: Mihika T Kozma; Hanh Ngo-Vu; Yuen Yan Wong; Neal S Shukla; Shrikant D Pawar; Adriano Senatore; Manfred Schmidt; Charles D Derby
Journal: PLoS One Date: 2020-03-12 Impact factor: 3.240

6. Genomic insights into mite phylogeny, fitness, development, and reproduction.

Authors: Yan-Xuan Zhang; Xia Chen; Jie-Ping Wang; Zhi-Qiang Zhang; Hui Wei; Hai-Yan Yu; Hong-Kun Zheng; Yong Chen; Li-Sheng Zhang; Jian-Zhen Lin; Li Sun; Dong-Yuan Liu; Juan Tang; Yan Lei; Xu-Ming Li; Min Liu
Journal: BMC Genomics Date: 2019-12-09 Impact factor: 3.969

7. The Odorant-Binding Proteins of the Spider Mite Tetranychus urticae.

Authors: Jiao Zhu; Giovanni Renzone; Simona Arena; Francesca Romana Dani; Harald Paulsen; Wolfgang Knoll; Christian Cambillau; Andrea Scaloni; Paolo Pelosi
Journal: Int J Mol Sci Date: 2021-06-25 Impact factor: 5.923

8. A new non-classical fold of varroa odorant-binding proteins reveals a wide open internal cavity.

Authors: Beatrice Amigues; Jiao Zhu; Anais Gaubert; Simona Arena; Giovanni Renzone; Philippe Leone; Isabella Maria Fischer; Harald Paulsen; Wolfgang Knoll; Andrea Scaloni; Alain Roussel; Christian Cambillau; Paolo Pelosi
Journal: Sci Rep Date: 2021-06-23 Impact factor: 4.379

9. The genome sequence of the grape phylloxera provides insights into the evolution, adaptation, and invasion routes of an iconic pest.

Authors: Claude Rispe; Fabrice Legeai; Paul D Nabity; Rosa Fernández; Arinder K Arora; Patrice Baa-Puyoulet; Celeste R Banfill; Leticia Bao; Miquel Barberà; Maryem Bouallègue; Anthony Bretaudeau; Jennifer A Brisson; Federica Calevro; Pierre Capy; Olivier Catrice; Thomas Chertemps; Carole Couture; Laurent Delière; Angela E Douglas; Keith Dufault-Thompson; Paula Escuer; Honglin Feng; Astrid Forneck; Toni Gabaldón; Roderic Guigó; Frédérique Hilliou; Silvia Hinojosa-Alvarez; Yi-Min Hsiao; Sylvie Hudaverdian; Emmanuelle Jacquin-Joly; Edward B James; Spencer Johnston; Benjamin Joubard; Gaëlle Le Goff; Gaël Le Trionnaire; Pablo Librado; Shanlin Liu; Eric Lombaert; Hsiao-Ling Lu; Martine Maïbèche; Mohamed Makni; Marina Marcet-Houben; David Martínez-Torres; Camille Meslin; Nicolas Montagné; Nancy A Moran; Daciana Papura; Nicolas Parisot; Yvan Rahbé; Mélanie Ribeiro Lopes; Aida Ripoll-Cladellas; Stéphanie Robin; Céline Roques; Pascale Roux; Julio Rozas; Alejandro Sánchez-Gracia; Jose F Sánchez-Herrero; Didac Santesmasses; Iris Scatoni; Rémy-Félix Serre; Ming Tang; Wenhua Tian; Paul A Umina; Manuella van Munster; Carole Vincent-Monégat; Joshua Wemmer; Alex C C Wilson; Ying Zhang; Chaoyang Zhao; Jing Zhao; Serena Zhao; Xin Zhou; François Delmotte; Denis Tagu
Journal: BMC Biol Date: 2020-07-23 Impact factor: 7.431

10. Comparative morphological and transcriptomic analyses reveal chemosensory genes in the poultry red mite, Dermanyssus gallinae.

Authors: Biswajit Bhowmick; Yu Tang; Fang Lin; Øivind Øines; Jianguo Zhao; Chenghong Liao; Rickard Ignell; Bill S Hansson; Qian Han
Journal: Sci Rep Date: 2020-10-21 Impact factor: 4.379