A TALE of shrimps: Genome-wide survey of homeobox genes in 120 species from diverse crustacean taxa.

Abstract

Homeodomain-containing proteins are an important group of transcription factors found in most eukaryotes, including animals, plants and fungi. Homeobox genes are responsible for a wide range of critical developmental and physiological processes, from embryonic development and innate immune homeostasis to whole-body regeneration. Given the continued interest of developmental and evolutionary biologists in this key class of proteins, multiple efforts have thus far focused on the identification and characterization of homeobox orthologs from key model organisms, in attempts to infer their evolutionary origin and how this underpins the evolution of complex body plans. Despite their importance, the genetic complement of homeobox genes has yet to be described in crustaceans, one of the most valuable groups of animals and a source of economically important food crops. With crustacean aquaculture being a growing industry worldwide, systematic and cross-species identification of crustacean homeobox orthologs is necessary in order to harness this genetic circuitry for the improvement of aquaculture sustainability. Using publicly available transcriptome data sets, we identified a total of 4183 putative homeobox genes from 120 crustacean species that include food crop species such as lobsters, shrimps, crayfish and crabs. Additionally, we identified 717 homeobox orthologs from 6 non-crustacean arthropods, which include the scorpion, deer tick, mosquitoes and centipede. This high-confidence set of homeobox genes will now serve as a key resource to the broader community for future functional and comparative genomics studies.

Keywords: Crustacean; TALE; arthropod; comparative genomics; homeobox; homeodomain

Year: 2018 PMID: 29899973 PMCID: PMC5968366 DOI: 10.12688/f1000research.13636.1

Source DB: PubMed Journal: F1000Res ISSN: 2046-1402

Introduction

As one of the fastest growing industries, the seafood trade is dominated by fishing and farming of crustaceans, with annual sales exceeding $40 billion (Stentiford). Crustacean aquaculture is multi-faceted: it not only meets the ever-increasing demand from international markets but is also directly linked to the socio-economic conditions of many developing nations through the creation of jobs and infrastructure. Aquaculture practices have intensified in recent years to cope with the demand. Yet many of these practices are not sustainable, since high densities of farmed shrimps often serve as hotbeds for pathogens and, if left unchecked, lead to infectious diseases that devastate cultures and cause massive financial losses. As a result, regulations associated with aquaculture diseases are being enforced, with emphasis placed on preventative measures, e.g. enhancement of broodstock and research aiming to further our understanding of crustacean development and of ways to utilize the innate ability of crustaceans to combat pathogens (Lai & Aboobaker, 2017; Stentiford).

Several conserved molecular genetic circuitries are well known for regulating many aspects of development and innate immune homeostasis. One prominent example is the homeobox genes, a family of transcription factors defined by the presence of a homeodomain (Holland, 2013). Homeobox genes are among the most important master regulators of development, and some headway has already been made in understanding their involvement in innate immunity: Caudal in Drosophila melanogaster is implicated in commensal-gut mutualism (Ryu; Ryu). Given their importance, major efforts have thus far focused on the characterization of homeobox genes in well-known model organisms such as humans (Garcia-Fernàndez, 2005; Holland), Caenorhabditis elegans (Bürglin, 1997), D. melanogaster (Mukherjee & Bürglin, 2007), planarians (Currie; Felix & Aboobaker, 2010; Garcia-Fernandez), amphioxus (Luke), teleost fish (Mulley) and many more. Although homeobox orthologs have previously been studied in the crustacean Parhyale hawaiensis (Kao), a systematic and cross-species characterization of this gene family across the broader Crustacea, with a focus on food crop species, is currently lacking. A better understanding of homeobox genes in crustaceans is therefore required to address this major shortfall, which motivates the present work.

Methods

Transcriptome data sets and query sets

We retrieved complete transcriptome data sets for 120 crustacean species available at the time of manuscript preparation from the European Nucleotide Archive. Six non-crustacean arthropod proteomes were retrieved from UniProt. A complete list of accessions used in this study is provided in Supplementary Table 1. The query sequences used in subsequent homology searches were retrieved from UniProt and GenBank.

Identification of homeobox orthologs

Based on a previously published workflow (Lai & Aboobaker, 2017), we used multiple Basic Local Alignment Search Tool (BLAST)-based approaches, such as BLASTp and tBLASTn, to identify genes with homeodomain sequences. The BLAST results were filtered by e-value (< 10^-6) and by best reciprocal BLAST hits against the GenBank non-redundant (nr) database, and redundant contigs with at least 95% identity were collapsed using CD-HIT. We then used HMMER (version 3.1) with profile hidden Markov models (HMMs) (Finn) to scan the best reciprocal nr BLAST hits for the presence of Pfam homeodomains (Bateman), and compiled a final non-redundant set of crustacean and arthropod homeobox gene orthologs (Dataset 1).
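The e-value cutoff and the reciprocal-best-hit test described above can be illustrated with a minimal sketch. It assumes BLAST results in the standard 12-column tabular format (`-outfmt 6`). The function names, sequence identifiers and toy hit tables are hypothetical and are not taken from the authors' pipeline. The sketch implements the strict symmetric form of the reciprocal test, and the check against nr used in the study may have been more permissive. The CD-HIT and HMMER steps are not shown.

```python
from typing import Dict, Iterable, List, Tuple

EVALUE_CUTOFF = 1e-6  # threshold stated in the Methods

Hit = Tuple[str, str, float, float]  # (query, subject, e-value, bit score)


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse BLAST tabular output (-outfmt 6, 12 tab-separated columns)."""
    hits: List[Hit] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Column 1 = query id, 2 = subject id, 11 = e-value, 12 = bit score.
        hits.append((cols[0], cols[1], float(cols[10]), float(cols[11])))
    return hits


def best_hits(hits: Iterable[Hit], cutoff: float = EVALUE_CUTOFF) -> Dict[str, str]:
    """Drop hits at or above the cutoff, then keep the top-scoring subject per query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, bitscore in hits:
        if evalue >= cutoff:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward: Iterable[Hit], reverse: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return (query, subject) pairs that are each other's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


# Toy data (hypothetical): known homeobox queries searched against a transcriptome ...
FORWARD = parse_outfmt6([
    "Q_Pbx\tcontig_1\t85.0\t60\t9\t0\t1\t60\t1\t60\t1e-30\t120",
    "Q_Pbx\tcontig_2\t40.0\t50\t30\t0\t1\t50\t1\t50\t1e-3\t35",   # fails the cutoff
    "Q_Meis\tcontig_3\t70.0\t60\t18\t0\t1\t60\t1\t60\t1e-20\t95",
])
# ... and the candidate contigs searched back against the reference set.
REVERSE = parse_outfmt6([
    "contig_1\tQ_Pbx\t85.0\t60\t9\t0\t1\t60\t1\t60\t1e-30\t120",
    "contig_3\tQ_Irx\t60.0\t60\t24\t0\t1\t60\t1\t60\t1e-10\t60",  # best hit is not Q_Meis
    "contig_3\tQ_Meis\t55.0\t60\t27\t0\t1\t60\t1\t60\t1e-8\t50",
])

if __name__ == "__main__":
    print(reciprocal_best_hits(FORWARD, REVERSE))
```

In this toy example only the Q_Pbx/contig_1 pair survives: contig_2 is removed by the e-value cutoff, and contig_3 is rejected because its best reverse hit is a different query.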

Multiple sequence alignment and phylogenetic tree construction

Multiple sequence alignment of homeodomain sequences was performed using MAFFT (version 7) (Katoh). A phylogenetic tree was built from the MAFFT alignment using RAxML with the WAG + G model to generate a best-scoring maximum likelihood tree (Stamatakis, 2014). Geneious (version 7) was used to generate a graphical representation of the Newick tree (Kearse).
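The exact command lines are not reported in the text, so the following is only a sketch of how an alignment and tree-building run of this kind might be assembled. The file names, run name and random seeds are hypothetical. The flags shown are standard MAFFT v7 and RAxML v8 options, but the choice of `--auto` and of the rapid bootstrap mode (`-f a`) is an assumption rather than the authors' documented settings. The default of 1000 replicates follows the bootstrap count given in the Figure 2 legend.

```python
import shlex
from typing import List


def mafft_command(fasta_in: str) -> List[str]:
    """Build a MAFFT call; MAFFT writes the alignment to standard output."""
    # --auto lets MAFFT pick an alignment strategy based on data size (assumed setting).
    return ["mafft", "--auto", fasta_in]


def raxml_command(alignment: str, run_name: str,
                  bootstraps: int = 1000, seed: int = 12345) -> List[str]:
    """Build a RAxML call for an ML tree under WAG + gamma with bootstrap support."""
    return [
        "raxmlHPC",
        "-f", "a",             # rapid bootstrapping plus best-scoring ML tree search (assumed mode)
        "-m", "PROTGAMMAWAG",  # WAG substitution matrix with gamma rate heterogeneity
        "-p", str(seed),       # seed for the parsimony starting trees (arbitrary)
        "-x", str(seed),       # seed for rapid bootstrapping (arbitrary)
        "-#", str(bootstraps), # number of bootstrap replicates
        "-s", alignment,       # input alignment
        "-n", run_name,        # suffix used for RAxML output files
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no external dependencies.
    print(shlex.join(mafft_command("tale_homeodomains.fasta")) + " > tale_homeodomains.aln")
    print(shlex.join(raxml_command("tale_homeodomains.aln", "tale_tree")))
```

Printing the commands rather than executing them keeps the example runnable without MAFFT or RAxML installed. The lists can be passed to `subprocess.run` once the settings have been checked against the intended analysis.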

Results and discussion

Identification of putative homeobox genes in crustaceans

With the recent availability of a large number of transcriptome data sets, we performed an extensive search for homeobox genes in 120 crustacean species. We focused on species represented across the broader Crustacea, sampling from three main crustacean classes, Malacostraca, Branchiopoda and Copepoda, with emphasis on key food crop species from the order Decapoda (Supplementary Table 1). Using BLAST-based approaches and profile HMMs (Bateman; Finn; Finn) for homology searches, we conservatively identified 4183 transcripts with homeodomain sequences from crustaceans (Figure 1; Dataset 1). Additionally, we included six non-crustacean arthropod species in our search, and from these species we identified 717 homeobox orthologs (Figure 1; Dataset 1).

Figure 1.

The homeobox superfamily in Crustacea and representative arthropod species.

(A) The numbers of homeobox gene orthologs identified in each species are depicted as boxplots, indicating the median and quartiles. Violin plots underlying the boxplots illustrate the sample distribution across different crustacean taxa and the kernel probability density (the width of a shaded area represents the proportion of data located in that area). The homeobox gene orthologs from six non-crustacean species within Arthropoda (others) are also shown. (B) Bar charts illustrating the number of homeobox gene orthologs in crustaceans from Decapoda, Branchiopoda and Copepoda, along with six non-crustacean arthropods (others).

Classification and phylogenetic analysis of TALE class genes

Concerted efforts to establish an evolutionary classification of homeobox genes have resulted in 11 recognised classes (Edvardsen; Holland; Ryan; Zhong; Zhong & Holland, 2011). The Three-Amino acid-Loop Extension (TALE) superclass of homeobox genes is characterized by three additional residues between alpha helices 1 and 2 of the homeodomain (Bertolino). TALE class homeodomain proteins are further divided into 6 subclasses, Meis, Pknox, Pbc, Irx, Mkx and Tgif, characterized by distinct motifs beyond the homeodomain (Bürglin, 1997; Bürglin, 2005; Holland; Mukherjee & Bürglin, 2007). We classified a total of 165 TALE class orthologs from 15 decapod crustacean species (Figure 2). These genes form distinct phylogenetic groupings, which allows confident assignment of decapod TALE class orthologs to 6 subclasses (Figure 2). Importantly, the tree topology of crustacean TALE class orthologs recapitulated observations from a previous study (Holland).

Figure 2.

Phylogeny of TALE superclass orthologs in decapod crustaceans.

The tree was constructed using the maximum-likelihood method from an amino acid multiple sequence alignment, which includes TALE class genes from other species (Zhong and Zhong & Holland, 2011). TALE orthologs representing the 6 subclasses are colour-coded. The node labels of each taxon are marked with distinctive colours denoted in the figure inset. Bootstrap support values (n=1000) are denoted as branch labels.

Conclusion

We identified 4900 homeodomain transcripts from 120 crustaceans and 6 non-crustacean arthropod species. Although this data set is non-exhaustive – transcriptomes contain only genes expressed at the point of sample collection – it will now serve as a key resource for future functional studies in the context of crustacean aquaculture. Beyond crustaceans, this work is widely applicable to studies on homeobox genes from other animals and will facilitate evolutionary and comparative genomics investigations.

Data availability

The data referenced by this article are under copyright with the following copyright statement: Copyright: © 2018 Chang WH and Lai AG

Data associated with the article are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

Dataset 1: List of Pfam annotated homeobox genes and associated e-values in crustaceans and other arthropods. DOI, 10.5256/f1000research.13636.d190417 (Chang & Lai, 2018).

Dataset 2: Fasta file for homeobox gene sequences in crustaceans and other arthropods. DOI, 10.5256/f1000research.13636.d190418 (Chang & Lai, 2018).

Referee reports

In this manuscript, Chang and Lai identify sequences of the homeobox genes in crustaceans from transcriptional data. For TALE family members, they classify these orthologs into the six subfamilies. The introduction provides the justification for establishing this resource in these agriculturally important species. In total, this will be an important reference source for future work in understanding the transcriptional regulation of development in crustacean species. There are a few minor questions and suggestions the authors can address.

The BLAST-based approaches and CD-HIT should be described in more detail. Were default settings applied? Perhaps adding an additional supplemental file or link with the commands would clarify this and facilitate replication in other species.

The authors have the opportunity to address how effective their approach is and potential places for improvement. For example, how do the datasets derived from proteomes compare to those from transcriptomes? Is there an overlapping dataset where the two approaches could be compared? Also, how does the identification of homeobox genes using this approach in Drosophila compare to the number annotated in this well-studied species? If all are identified, this would suggest that this approach may be sufficient; if some are missing, it suggests that this conservative starting point could be enhanced in the future by additional refinement.

Minor comments: Introduction, second paragraph, description of homeobox identification in other species. Either provide references for “and many more” or omit. As is, this phrase is vague and doesn’t add constructively to the introduction.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

This work makes a putative assessment of the overall homeodomain complements of the transcriptomes of a number of crustacean species. One class of homeodomain-containing genes (TALE) from one order of crustaceans (Decapoda) is assessed in detail, but otherwise no attempt is made to categorise putative hits into gene families. The work is therefore preliminary in scope, and would benefit from the provision of even the broadest-level classification of hits into appropriate classes/subclasses of gene, which should be fairly straightforward given the diagnostic residues used to categorise these classes.

The title as it stands is misleading: a genome-wide survey is not made, and instead transcriptomic data is used, which will (by necessity) be gappy.

A link is made between aquaculture, innate immunity and homeodomain-containing proteins, but this is very tenuous. In particular, why is the focus on TALE class genes if Caudal is the gene used as the exemplar for a link between these fields?

The methods section needs to be more precise. For example: "such as BLASTp and tBLASTn to identify genes with homeodomain sequences". What was done? How were protein sequences derived from nucleotide data for BLASTp searches? What sequences were used to search your datasets? (Perhaps add these to the sentence: "list of query sequences used in subsequent homology searches from Uniprot and GenBank."). The latter is particularly important as more distant sequences may be missed.

I have several questions about e-values.
- Dataset 1 contains several identified genes with higher e-values than the stated cutoff (< 10^-6). Is this deliberate?
- The e-value of < 10^-6 will also likely result in larger datasets returning more hits, purely as a consequence of how the E (expect) value is calculated. For example, the Daphnia magna transcriptome has 271,000 sequences, Triops 12,000. Therefore it is much more likely that sequences will make it through your annotation pathway in Daphnia magna rather than Triops. This will skew the results shown in Fig 1B. Is it possible to show that homeodomain genes are not artificially excluded, perhaps by giving the e-values and best BLAST hits from "next-best" excluded sequences in your initial searches of small datasets, to prove no homeodomain sequences were artificially excluded? This is crucial, given the short length of the homeodomain, which will be the primary source of signal.

Was HMMER really run on the best reciprocal nr hits, as is suggested by your phrasing? Or was it run on the transcriptome-derived data?

Fig 1A: Violin plots are not appropriate here. Look for instance at the Branchiopod data, where 3 points are used to infer this plot.

The results in the tree in Fig 2 seem to indicate that decapod crustaceans completely lack Mkx genes, and the presentation of this is disingenuous in the text (note the paraphyly of known Mkx homologues with regard to the inferred crustacean Mkx). Instead, the crustacean Mkx seem to be Pbc?

"Importantly, the tree topology of crustacean TALE class orthologs recapitulated observations from a previous study (Holland et al., 2007)." This statement does not seem to be correct.

The homeodomain complements, especially of ANTP class genes, of several crustacean species have been described previously, but no attempt is made to place the results observed in the context of the annotated sets of other species. Could this be provided? In particular, the utility of the re-assessment of non-crustacean datasets is unclear, as these resources have been annotated previously in more detail. Were additional homeodomain-containing genes found by this re-analysis? Or fewer?

In short, this work seems to be partially successful in its aims. With the addition of further information about the identity of sequences, and the correction of the problems noted above, it will be a coherent addition to extant information on crustacean homeodomain-containing genes.

$40 billion: which currency? USD?

There are several areas where the phrasing could be improved, e.g. "With continued fascination on this key class of proteins". Sometimes articles (a/the) are missing from the text, e.g. "Phylogenetic tree was built from".

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

The report by Chang and Lai provides an extensive dataset and phylogenetic analysis of crustacean homeobox genes. This data will be useful to individuals interested in studying the evolution and function of homeobox genes in crustacea and other organisms. Overall, this manuscript is well written and appears to be technically sound. I only have some very minor comments the authors could address to improve clarity:

1) In the abstract, the rationale for the work takes an intellectual leap: it is unclear how identification of homeobox genes will be useful for aquaculture sustainability. I think the authors provide some compelling reasons within the text of the manuscript.
2) In the results and discussion there are a few instances where the authors should use the past tense.
3) Page 3, last sentence: even though the authors referred to Peter Holland's work, they should be much more precise about the observations from a "previous study". The authors should expand on what they mean.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

References (26 in total)

1. The genesis and evolution of homeobox gene clusters.
Authors: Jordi Garcia-Fernàndez
Journal: Nat Rev Genet Date: 2005-12

2. HomeoDB: a database of homeobox gene diversity.
Authors: Ying-Fu Zhong; Thomas Butts; Peter W H Holland
Journal: Evol Dev Date: 2008 Sep-Oct

3. Evolution of homeobox genes.
Authors: Peter W H Holland
Journal: Wiley Interdiscip Rev Dev Biol Date: 2012-09-10

4. The homeobox gene Caudal regulates constitutive local expression of antimicrobial peptide genes in Drosophila epithelia.
Authors: Ji-Hwan Ryu; Ki-Bum Nam; Chun-Taek Oh; Hyuck-Jin Nam; Sung-Hee Kim; Joo-Heon Yoon; Je-Kyeong Seong; Mi-Ae Yoo; In-Hwan Jang; Paul T Brey; Won-Jae Lee
Journal: Mol Cell Biol Date: 2004-01

5. Innate immune homeostasis by the homeobox gene caudal and commensal-gut mutualism in Drosophila.
Authors: Ji-Hwan Ryu; Sung-Hee Kim; Hyo-Young Lee; Jin Young Bai; Young-Do Nam; Jin-Woo Bae; Dong Gun Lee; Seung Chul Shin; Eun-Mi Ha; Won-Jae Lee
Journal: Science Date: 2008-01-24

6. Comprehensive analysis of animal TALE homeobox genes: new conserved motifs and cases of accelerated evolution.
Authors: Krishanu Mukherjee; Thomas R Bürglin
Journal: J Mol Evol Date: 2007-07-30

7. A novel homeobox protein which recognizes a TGT core and functionally interferes with a retinoid-responsive motif.
Authors: E Bertolino; B Reimund; D Wildt-Perinic; R G Clerc
Journal: J Biol Chem Date: 1995-12-29

8. The genome of the crustacean Parhyale hawaiensis, a model for animal development, regeneration, immunity and lignocellulose digestion.
Authors: Damian Kao; Alvina G Lai; Evangelia Stamataki; Silvana Rosic; Nikolaos Konstantinides; Erin Jarvis; Alessia Di Donfrancesco; Natalia Pouchkina-Stancheva; Marie Sémon; Marco Grillo; Heather Bruce; Suyash Kumar; Igor Siwanowicz; Andy Le; Andrew Lemire; Michael B Eisen; Cassandra Extavour; William E Browne; Carsten Wolff; Michalis Averof; Nipam H Patel; Peter Sarkies; Anastasios Pavlopoulos; Aziz Aboobaker
Journal: Elife Date: 2016-11-16

9. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data.
Authors: Matthew Kearse; Richard Moir; Amy Wilson; Steven Stones-Havas; Matthew Cheung; Shane Sturrock; Simon Buxton; Alex Cooper; Sidney Markowitz; Chris Duran; Tobias Thierer; Bruce Ashton; Peter Meintjes; Alexei Drummond
Journal: Bioinformatics Date: 2012-04-27

10. HMMER web server: 2015 update.
Authors: Robert D Finn; Jody Clements; William Arndt; Benjamin L Miller; Travis J Wheeler; Fabian Schreiber; Alex Bateman; Sean R Eddy
Journal: Nucleic Acids Res Date: 2015-05-05