Evolutionary History of the Globin Gene Family in Annelids.

Flávia A Belato, Christopher J Coates, Kenneth M Halanych, Roy E Weber, Elisa M Costa-Paiva.

Abstract

Animals depend on the sequential oxidation of organic molecules to survive; thus, oxygen-carrying/transporting proteins play a fundamental role in aerobic metabolism. Globins are the most common and widespread group of respiratory proteins. They can be divided into three types: circulating intracellular, noncirculating intracellular, and extracellular, all of which have been reported in annelids. The diversity of oxygen transport proteins has been underestimated across metazoans. We probed 250 annelid transcriptomes in search of globin diversity in order to elucidate the evolutionary history of this gene family within this phylum. We report two new globin types in annelids, namely androglobins and cytoglobins. Although cytoglobins and myoglobins from vertebrates and from invertebrates are referred to by the same name, our data show they are not genuine orthologs. Our phylogenetic analyses show that extracellular globins from annelids are more closely related to extracellular globins from other metazoans than to the intracellular globins of annelids. Broadly, our findings indicate that multiple gene duplication and neo-functionalization events shaped the evolutionary history of the globin family.
© The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Keywords:  androglobin; cytoglobin; extracellular globin; gene tree; respiratory proteins; transcriptomics

Year:  2020        PMID: 32597988      PMCID: PMC7549130          DOI: 10.1093/gbe/evaa134

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Significance

Annelid worms have the greatest diversity of oxygen-carrying proteins, also known as blood pigments, among all animals. However, the real diversity of these proteins remains underestimated. To assess the diversity of globins present in annelids, and to elucidate their evolutionary relationships, we have searched for globin genes among the genomes and transcriptomes of 250 annelid species. We found two new globins in this phylum: androglobins and cytoglobins. Our results indicate that cytoglobins and myoglobins from vertebrates and invertebrates have different evolutionary origins, and that androglobins and extracellular globins originated early in animal evolution. We show that multiple gene duplication events shaped the complex evolutionary history of the globin family.

Introduction

Aerobic metabolism relies on the sustained transfer of oxygen (O2) from environmental sources to the respiring tissues of animals, which is carried out by O2 transport proteins (also known as respiratory pigments) (Terwilliger 1998; Burmester and Hankeln 2004; Coates and Decker 2017). Globins represent the most widespread respiratory pigments and occur almost ubiquitously amongst organisms, including bacteria, fungi, plants, protists, and animals (Hardison 1996, 1998; Weber and Vinogradov 2001; Vinogradov et al. 2007; Vázquez-Limón et al. 2012; Vinogradov, Bailly, et al. 2013; Vinogradov, Tinajero-Trejo, et al. 2013). In metazoans, intra- and extracellular hemoglobins (Hb and HBL-Hb, respectively) and myoglobin (Mb) have been known for over a century (Lankester 1872). More recent comparative genomic studies revealed the existence of several new globin types in vertebrates, such as cytoglobin (Cygb), androglobin (Adgb), and neuroglobin (Ngb) (Burmester et al. 2000; Kawada et al. 2001; Burmester et al. 2002; Trent and Hargrove 2002; Burmester and Hankeln 2004; Hoogewijs et al. 2012). Following this trend, studies demonstrated that the known diversity of oxygen-carrying proteins in animals is underestimated, and this also seems to be true in annelids (Bailly et al. 2008; Martín-Durán et al. 2013; Costa-Paiva et al. 2017, 2018; Belato et al. 2019). Despite the conserved tertiary structure of all globins, the recently discovered proteins differ markedly in their amino acid sequences and carry out several alternative cellular functions besides O2 transport, such as oxygen sensing, enzymatic activity, signal transduction, lipid and nitric oxide metabolism, and detoxification of reactive oxygen species, suggesting that the presence of more than one type of oxygen-binding protein in animals is related to those other cellular functions (Weber and Vinogradov 2001; Burmester and Hankeln 2014).
The circulating annelid Hbs occur within nucleated red blood cells (RBCs), in contrast to the anucleate RBCs that harbor the intensively studied mammalian Hbs (Storz 2018). Invertebrate Hbs are found in at least six phyla: Phoronida, Nemertea, Mollusca, Annelida, Arthropoda, and Echinodermata (Terwilliger and Ryan 2001; Weber and Vinogradov 2001), where they may occur in closed vascular systems, or dissolved in the coelomic fluid and hemolymph (functional equivalents to blood). Annelid Hbs exhibit the classical “Mb-fold,” a structure that comprises five to eight α-helices, named A–H, forming a three-on-three or two-on-two helical sandwich that surrounds the oxygen-binding heme group (Bolognesi et al. 1997; Terwilliger 1998; Weber and Vinogradov 2001; Vinogradov and Moens 2008; Gell 2018). All invertebrate Hbs contain the characteristic, invariant globin amino acid residues: His at E7 (seventh amino acid in helix E), His at F8, and Phe at the interhelical region CD1 (Bolognesi et al. 1997; Weber and Vinogradov 2001). In contrast to the tetrameric vertebrate Hbs, annelid RBC Hbs may be monomeric, dimeric, tetrameric, polymeric, or a combination of these states (Weber 1980; Mangum 1985). The hexagonal bilayer hemoglobins (HBL-Hbs), also called chlorocruorins and erythrocruorins, are giant (MW ∼3.5 × 10⁶), extracellular circulating protein complexes that occur freely dissolved in blood equivalents (Weber 1971; Weber and Vinogradov 2001). For decades, the HBL-Hbs were considered to be present only in a few annelid species (Vinogradov 1985; Weber and Vinogradov 2001). Recently, we demonstrated a much wider phylogenetic distribution of these giant extracellular proteins in invertebrates, including Mollusca, Platyhelminthes, and some deuterostome lineages (Belato et al. 2019).
The mega-molecular HBL-Hbs comprise two types of polypeptides: globin chains, with single oxygen-binding sites that satisfy the criterion of a globin-like fold, and linker chains that lack heme groups and are required for the multimeric (hierarchical) assembly of the vast quaternary structures (Vinogradov 1985; Lamy et al. 1996; Weber and Vinogradov 2001; Royer et al. 2006). Among the intracellular noncirculating globins, Mbs are monomers consisting of ∼140 amino acids that reside in the cytoplasm of muscle cells of metazoan taxa and function as intracellular O2 stores and in transcellular (facilitated) diffusion of O2 (Wittenberg 1970; Suzuki and Imai 1998). Noncirculating globins also comprise the nerve hemoglobins (nHbs), which occur sporadically in glial cells surrounding the nerve cord and neurons of various invertebrate taxa, including Annelida, Arthropoda, Mollusca, Nematoda, and Nemertea (Wittenberg 1992; Weber and Vinogradov 2001; Geuens et al. 2004; Burmester and Hankeln 2008). nHbs consist of ∼150 amino acid residues and may exhibit the Mb-like structure and exist as homodimers, as seen in the annelid Aphrodita aculeata (Wittenberg 1992; Dewilde et al. 1996; Weber and Vinogradov 2001; Geuens et al. 2004). Although all nHbs contain the diagnostic residues (Phe CD1, His E7, and His F8), phylogenetic analyses indicate divergent evolutionary origins (Wittenberg 1992; Dewilde et al. 1996; Weber and Vinogradov 2001; Burmester and Hankeln 2008). The principal function of invertebrate nHbs is considered to be O2 storage and supply during hypoxia, sustaining the aerobic metabolism of the nervous system (Kraus and Doeller 1988; Wittenberg 1992; Weber and Vinogradov 2001; Geuens et al. 2004).
Cygbs are noncirculating globins that are colocated alongside Mbs in the cytoplasm of cells of several different vertebrate tissues. However, Cygbs have longer polypeptide chains, with around 170 amino acid residues, since additional residues flank the N- and C-termini, and they thus lack sequence insertions that interrupt the globin fold (Burmester et al. 2002). These proteins do not contain signal peptides and are found in the cytoplasm and nucleus of many different cell types (Burmester et al. 2002). Vertebrate Cygb shows structural and phylogenetic affinities to vertebrate Mb (Burmester et al. 2002; DeSanctis et al. 2004); however, these relationships are not resolved for invertebrate Cygbs. Adgbs, the most recently discovered noncirculating globins, are cytoplasmic, large chimeric proteins that exhibit a modular domain structure. They comprise an N-terminal calpain-like domain, a rearranged globin domain, where the eight α-helices (A–H) are organized such that helices C–H precede helices A–B, and an IQ calmodulin-binding motif (Hoogewijs et al. 2012; Bracke et al. 2018). Despite the different globin domain sequence, Adgbs satisfy the globin-fold criterion (Hoogewijs et al. 2012; Bracke et al. 2018). These chimeric proteins have been recorded in a wide range of metazoan taxa, such as Mollusca, Cnidaria, and Chordata (Hoogewijs et al. 2012; Bracke et al. 2018).
Annelids thus exhibit the greatest diversity of oxygen-binding proteins among metazoans (Mangum 1998; Costa-Paiva et al. 2017), with three types of globins described so far: 1) noncirculating cytoplasmic globins, such as Mbs and nHbs; 2) circulating intracellular RBC Hbs; and 3) extracellular HBL-Hbs dissolved in body fluids (Wittenberg 1970; Vinogradov et al. 1993; Lamy et al. 1996; Suzuki and Imai 1998; Weber and Vinogradov 2001; Bailly et al. 2007). Reflecting the scarcity of available sequences for these three globin types in annelids, only three families are known to express all three simultaneously: Opheliidae, Terebellidae, and Alvinellidae (Weber 1978; Hourdez et al. 2000). To the best of our knowledge, only one study (Bailly et al. 2007) has focused exclusively on the evolutionary history of the globin superfamily in annelids. Using 28 annelid globin sequences, Bailly et al. (2007) demonstrated that extracellular globin lineages appear to have a separate evolutionary history compared with intracellular circulating and noncirculating annelid globins. Nevertheless, the real diversity of globin genes within annelids is yet to be investigated, and phylogenetic relationships between different globin types in these animals remain uncertain. In order to assess the real diversity of globins present in annelids, and to elucidate the evolutionary relationships of these proteins within the phylum, we carried out a systematic survey of 250 annelid transcriptomes for globin genes. We report the existence of four noncirculating intracellular globin types in annelids: Mbs, nHbs, Cygbs, and Adgbs. Our molecular evolutionary analyses indicate a complex evolutionary history for members of the globin superfamily within Annelida, which includes several gene duplication and neo-functionalization events.

Materials and Methods

Transcriptomes of 250 annelid species were used in this work and information about each species is indexed in supplementary file 1, Supplementary Material online. The transcriptomes were collected as part of the WormNet II project that primarily seeks to resolve annelid phylogeny. Specimens were obtained by several collection techniques, including intertidal sampling, dredging, and box cores. Afterwards, all samples were either preserved in RNALater or frozen at –80 °C. Protocols from Kocot et al. (2011) and Whelan et al. (2015) were used for RNA extraction, cDNA preparation, and high-throughput sequencing. Briefly, total RNA was extracted using TRIzol (Invitrogen) either from whole small animals or, in larger specimens, from the body walls and coelomic regions. After extraction, RNAs were purified using the RNeasy kit (Qiagen) with on-column DNase digestion. To reverse transcribe single-stranded RNA template, the SMART cDNA Library Construction Kit (Clontech) was used and double-stranded cDNA synthesis was performed with the Advantage 2 PCR system (Clontech). The Genomic Services Lab at the Hudson Alpha Institute (Huntsville, AL) was responsible for barcoding and sequencing libraries with Illumina technology. Because transcriptomic sequencing was conducted from 2012 to 2015, paired-end runs were 100 or 125 bp in length, utilizing either v3 or v4 chemistry on Illumina HiSeq 2000 or 2500 platforms (San Diego, CA). Finally, in order to facilitate sequence assembly, paired-end transcriptome data were digitally normalized to an average k-mer coverage of 30 using the script normalize-by-median.py (Brown et al. 2012) and were assembled using Trinity r2013-02-25 with default settings (Grabherr et al. 2011). Bioinformatic methods employed to search in silico for genes of the globin family were similar to those in Belato et al. (2019).
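The digital normalization step mentioned above can be illustrated with a minimal sketch. This is not the normalize-by-median.py script itself: it uses an exact in-memory counter, ignores read pairing and reverse complements, and the k-mer size `k` is an assumption, since the text states only the coverage target of 30. The idea is that a read is kept only while the median abundance of its k-mers, among reads already kept, is still below the target.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of a read."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_by_median(reads, k=20, cutoff=30):
    """Keep a read only while the median count of its k-mers is below `cutoff`.

    Simplified illustration: `k` is an assumed value, counts are exact,
    and pairing and reverse complements are ignored.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: skipped in this sketch
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads add to the counts
    return kept
```

Highly expressed transcripts contribute many redundant reads, so capping their coverage in this way reduces the data volume passed to the assembler while retaining reads from rare transcripts.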
Transcriptome data were processed through the Trinotate annotation pipeline (http://trinotate.github.io/; last accessed July 24, 2020) (Grabherr et al. 2011). The Trinotate pipeline uses a BLAST-based method against two databases, namely EggNOG 4.5.1 (Huerta-Cepas et al. 2016) and KEGG (Kanehisa et al. 2016), to provide the Gene Ontology (GO) annotation. The GO is a standardized functional classification system for genes that describes the properties of genes and their products using a dynamically updated controlled vocabulary (Gene Ontology Consortium 2004). The complete list of software employed by the Trinotate pipeline to provide the annotation of genes is: HMMER 3.2.1, for protein domain identification (Finn et al. 2011); tmHMM 2.0, for prediction of transmembrane helices of proteins (Krogh et al. 2001); RNAmmer 1.2, for prediction of ribosomal RNA (Lagesen et al. 2007); SignalP 4.1, to predict signal peptide cleavage sites (Petersen et al. 2011); GOseq, for prediction of the GO (Young et al. 2010); and EggNOG 4.5.1, for searching orthologous groups of genes (Huerta-Cepas et al. 2016). As we used transcriptomic data, we can only make inferences about the presence of gene signatures and refrain from drawing conclusions about their absence, because genes may be present in the genome without being expressed in the sampled tissue at the time of collection. Retrieved sequences were manually verified by inspecting each functional annotation made by Trinotate in order to select sequences annotated as Hbs, Mbs, Cygbs, Adgbs, and nHbs. In addition, 10 annelid extracellular hemoglobin (HBL-Hb) sequences from Belato et al. (2019) were selected to be used in our analyses because these sequences were obtained employing the same bioinformatic pipeline, including the same annotation and validation steps.
RNA sequences identified as one of the genes described above were then translated into amino acids using TransDecoder software with default settings (https://transdecoder.github.io/; last accessed July 24, 2020). All translated protein sequences were evaluated using the Pfam domain check (Finn et al. 2016) employing the EMBL-EBI protein database with an e-value cutoff of 10⁻⁵. This step was necessary because the TransDecoder translation may produce multiple open reading frames (ORFs). Translations that returned a confirmed Pfam domain and were longer than 130 amino acid residues were retained for further analyses. In order to refine the results, we added a confirmatory step, where we performed a reciprocal BLASTp (Altschul et al. 1990) of all sequences annotated as the target genes against the non-redundant protein database (nr) from the National Center for Biotechnology Information (NCBI). Only sequences that presented a significant top “hit” to one of the target genes (Hbs, Mbs, Cygbs, Adgbs, nHbs, and HBL-Hbs), with an e-value threshold of 10⁻¹⁰, were retained. We have labeled proteins according to their putative functional role, considering local similarity between sequences from our data set and the NCBI database. Sequence similarity and similarity in domain structure are generally indicative of similarity in function (Marcotte 1999; Ashburner et al. 2000; Gabaldón and Huynen 2004). Adgbs were manually rearranged in order to remove the IQ motif and concatenate the eight α-helices (A–H) of the globin domain that are inverted in these globins.
After all validation steps, the remaining 379 sequences were aligned with MAFFT using the accurate algorithm E-INS-i (Katoh and Standley 2013), and gap-rich regions in the alignment were removed with trimAl 1.2 (Capella-Gutierrez et al. 2009) using a gap threshold of 0.75. The alignment was manually curated using the software Geneious 11.1.2 (Kearse et al. 2012) in order to remove spuriously aligned sequences based on similarity to the protein alignment as a whole. To eliminate data set redundancy, sequences that were 100% similar to each other were also excluded from the alignment. The resulting amino acid alignment of 238 sequences was subsequently used for phylogenetic analyses. ModelFinder, an ultrafast and automatic model selector implemented in IQ-TREE software (Kalyaanamoorthy et al. 2017), was applied to carry out statistical selection of the best-fit model of protein evolution for the data set using the Akaike and Bayesian Information Criteria (AIC and BIC, respectively) (Darriba et al. 2011).
Two phylogenetic inference methods were employed: 1) a maximum likelihood inference performed with the IQ-TREE software (Nguyen et al. 2015) with branch support obtained with the ultrafast bootstrap approximation (UFBoot) with 1,000 replicates (Minh et al. 2013) and 2) a Bayesian inference using MrBayes 3.2.7 (Ronquist and Huelsenbeck 2003) with two independent runs, each one containing four Metropolis-coupled chains that were run for 10⁷ generations and sampled every 500th generation to approximate posterior distributions. To confirm whether chains achieved stationarity and to determine an appropriate burn-in, we evaluated trace plots of all MrBayes parameter outputs in Tracer v1.6 (Rambaut et al. 2014). The first 25% of samples were discarded as burn-in and a majority rule consensus tree was generated using MrBayes. Bayesian posterior probabilities were used for assessing statistical support of each bipartition. The resultant trees were summarized with FigTree 1.4.3 (Rambaut 2009) and rooted by midpoint rooting (Farris 1972; Hess and Russo 2007). The Phyre2 web portal (Kelley et al. 2015) was used to predict the putative tertiary structures of the different globins and models were visualized and inspected using UCSF Chimera (Pettersen et al. 2004).
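The length, e-value, redundancy, and gap-trimming criteria described above can be sketched in a few lines. This is an illustrative sketch only, not the TransDecoder, BLAST, Geneious, or trimAl implementation. Two points are assumptions: the e-value criterion is read as "at or below 10⁻¹⁰", and the gap threshold of 0.75 is read as the minimum fraction of sequences that must be ungapped for a column to be kept. All function names are hypothetical.

```python
def passes_filters(protein, evalue, min_length=130, max_evalue=1e-10):
    """Length (> 130 residues) and BLASTp e-value criteria (assumed <= 1e-10)."""
    return len(protein) > min_length and evalue <= max_evalue


def trim_gappy_columns(aligned, gap_threshold=0.75):
    """Drop alignment columns whose fraction of ungapped sequences is too low.

    `aligned` maps sequence names to equal-length aligned strings.
    The reading of `gap_threshold` is an assumption (see lead-in).
    """
    names = list(aligned)
    n_cols = len(aligned[names[0]])
    keep = [
        col for col in range(n_cols)
        if sum(aligned[n][col] != "-" for n in names) / len(names) >= gap_threshold
    ]
    return {n: "".join(aligned[n][c] for c in keep) for n in names}


def drop_identical(aligned):
    """Keep the first record of every set of identical aligned sequences."""
    seen = set()
    unique = {}
    for name, seq in aligned.items():
        if seq not in seen:
            seen.add(seq)
            unique[name] = seq
    return unique


# Toy alignment: the third column is a gap in three of four sequences.
toy = {"a": "MK-LV", "b": "MKALV", "c": "MK-LV", "d": "MK-LV"}
trimmed = trim_gappy_columns(toy)
nonredundant = drop_identical(trimmed)
```

In the toy alignment the gap-rich third column is removed, after which all four sequences become identical and only the first is retained.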
In order to better understand the evolutionary relationship between extracellular globins of annelids and those from other metazoans, another maximum likelihood analysis was performed using the IQ-TREE software (Nguyen et al. 2015) with branch support obtained under the UFBoot (Minh et al. 2013). This analysis expanded the original 238-sequence alignment with another 15 extracellular globin sequences from five metazoan species obtained from Belato et al. (2019): Astrotoma agassizii (Echinodermata; MH995909 and MH996362), Cephalodiscus gracilis (Hemichordata; MH995925–26), Hemithiris psittacea (Brachiopoda; MH996374–75 and MH996036–38), Phoronis psammophila (Phoronida; MH996210–11), and Priapulus sp. (Priapulida; MH996240–42 and MH996405). To assess the evolutionary relationships between annelid globins and other metazoan globins, we selected a representative panel of 54 globins from deuterostomes (including vertebrates) and other protostomes from NCBI to be used as references of metazoan globins (supplementary file 2, Supplementary Material online). Together with a subset of 43 annelid globins (table 1), these 97 sequences of annelid globins + metazoan globins were aligned with MAFFT using the accurate algorithm E-INS-i (Katoh and Standley 2013), and gap-rich regions in the alignment were removed with trimAl 1.2 (Capella-Gutierrez et al. 2009) using a gap threshold of 0.50 (supplementary file 3, Supplementary Material online). Afterwards, a maximum likelihood analysis was performed using the IQ-TREE software (Nguyen et al. 2015) with branch support obtained under the UFBoot (Minh et al. 2013).
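The abbreviated accession ranges used above (for example, MH995925–26) can be expanded programmatically, which gives a quick check that the five listed species contribute 15 sequences. The sketch below is a hypothetical helper, not part of the published pipeline; it assumes that the digits after the dash replace the trailing digits of the starting accession.

```python
import re


def expand_accessions(text):
    """Expand abbreviated ranges such as 'MH995925–26' into full accessions."""
    accessions = []
    for token in re.findall(r"[A-Z]+\d+(?:[–-]\d+)?", text):
        prefix, start, end_suffix = re.fullmatch(
            r"([A-Z]+)(\d+)(?:[–-](\d+))?", token
        ).groups()
        if end_suffix is None:
            accessions.append(prefix + start)
            continue
        # Assumption: the abbreviated end replaces the trailing digits of the start.
        end = start[:len(start) - len(end_suffix)] + end_suffix
        for number in range(int(start), int(end) + 1):
            accessions.append(f"{prefix}{number:0{len(start)}d}")
    return accessions


# Accessions exactly as listed in the text for the five reference species.
REFERENCE_ACCESSIONS = {
    "Astrotoma agassizii": "MH995909 and MH996362",
    "Cephalodiscus gracilis": "MH995925–26",
    "Hemithiris psittacea": "MH996374–75 and MH996036–38",
    "Phoronis psammophila": "MH996210–11",
    "Priapulus sp.": "MH996240–42 and MH996405",
}

total = sum(len(expand_accessions(v)) for v in REFERENCE_ACCESSIONS.values())
print(total)  # 15
```

The same helper applies to the accession column of table 1, where ranges such as MT312099–101 cross a digit boundary.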
Table 1

List of all taxa analyzed in which globin genes were found and the number of expressed genes in each species

Taxon | Hb Genes | Mb Genes | nHb Genes | Cygb Genes | Adgb Genes | HBL-Hb Genes | Accession Number
Acrocirridae
 Macrochaeta sp.7MT312144–50
Aeolosomatidae
 Aeolosoma sp.412MT312084–87 MT312044 MH995870–71
Alciopidae
 Alciopa sp.3MT312045–47
Alvinellidae
 Paralvinella palmiformis Desbruyères & Laubier, 198631MT312166–68 MT311987
 Paramphinome jeffreysii (McIntosh, 1868)11MT312169 MT312037
Ampharetidae
 Amphisamytha galapagensis Zottoli, 19831MT312090
 Auchenoplax crinita Ehlers, 18871MT312103
 Melinna maculata Webster, 187911MT312152 MT312032
Amphinomidae
 Chloeia pinnata Moore, 19111MT312108
 Hermodice carunculata (Pallas, 1766)1MT312128
 Pherecardia striata (Kinberg, 1857)1MT312171
Aphroditidae
 Aphrodita japonica Marenzeller, 1879112MT312093 MT311999 MT312048–49
Arenicolidae
 Arenicola loveni Kinberg, 18662MT312000–01
 Abarenicola pacifica Healy & Wells, 1959112MT312083 MT311998 MH995867–68
Aspidosiphonidae
 Aspidosiphon laevis Quatrefages, 186511MT312102 MT312050
 Lithacrosiphon cristatus (Sluiter, 1902)2MT312140–41
Branchiobdellidae
 Branchiobdella parasita (Braun, 1805)1MT312020
Chaetopteridae
 Chaetopterus variopedatus (Renier, 1804)1MT312106
 Mesochaetopterus taylori Potts, 19141MT312007
Chrysopetalidae
 Arichlidon gathofi Watson Russell, 20004MT312095–98
Cirratulidae
 Aphelochaeta sp.12MT312092 MH996356 MH995881
 Chaetozone sp.1MT312107
 Tharyx kirkegaardi Blake, 19911MT312206
Dinophilidae
 Dinophilus gyrociliatus O. Schmidt, 18571MT312054
Dorvilleidae
 Ophryotrocha globopalpata Blake & Hilbig, 199011MT312163 MT312073
Eunicidae
 Eunice norvegica (Linnaeus, 1767)1MT312113
 Eunice pennata (Müller, 1776)11MT312114 MT312005
 Marphysa sanguinea (Montagu, 1813)1MT312151
 Palola sp.21MT312164–65 MT312036
Flabelligeridae
 Ilyphagus octobranchus Hartman, 19651MT312066
 Poeobius meseres Heath, 19301MT312176
Glossoscolecidae
 Andiorrhinus sp.2MH995879–80
 Pontoscolex corethrurus (Muller, 1857)1MT312039
 Urobenus sp.1MT312043
Glyceridae
 Glycera americana Leidy, 18551MT312120
 Glycera dibranchiata Ehlers, 18684MT312121–24
 Hemipodia simplex (Grube, 1857)3MT312125–27
Goniadidae
 Goniada brunnea Treadwell, 19061MT312006
Haplotaxidae
 Delaya leruthi (Hrabĕ, 1958)1MT312112
 Haplotaxidae gen. sp.1MT312059
Hesionidae
 Hesionides sp.1MT312129
 Microphthalmus listensis Westheide, 19671MT312153
 Microphthalmus similis Bobretzky, 187011MT312070 MT312033
Histriobdellidae
 Histriobdella homari Beneden, 18582MT312063–64
Hrabeiellidae
 Hrabeiella periglandulata Pizl and Chalupský, 198411MT312130 MT312065
Komarekionidae
 Komarekiona eatoni Gates, 19741MT312028
Lessoniaceae
 Eisenia sp.1MT312026
Lumbricidae
 Dendrobaena hortensis (Michaelsen, 1890)11MT312021 MT311991
Lumbrineridae
 Lumbrineris crassicephala Hartman, 196511MT312142 MT312031
 Ninoe nigripes Verrill, 187321MT312159–60 MT312034
Maldanidae
 Axiothella rubrocincta (Johnson, 1901)11MT312105 MT312002
 Clymenella torquata (Leidy, 1855)111MT312109 MT312003 MT312053
 Nicomache venticola Blake & Hilbig, 199013MT312158 MT312008–10
 Praxillella pacifica Berkley, 1929141MT312177 MT312012–15 MT311988
 Sabaco elongatus (Verrill, 1873)21MT312184–85 MT312016
Megascolecidae
 Amynthas sp.1MT312091
 Pontodrilus litoralis (Grube, 1855)1MT312038
Microchaetidae
 Gattyana cirrhosa (Pallas, 1766)1MT312057
 Kynotus pittarellii Cognetti, 19061MT312029
Moniligastridae
 Drawida sp.1MT312025
Naididae
 Aulodrilus japonicus Yamaguchi, 19531MT312104
 Bothrioneurum vejdovskyanum Štolc, 18861MT312019
 Heterodrilus sp. 113MT312062 MT311992–94
Nephtyidae
 Aglaophamus verrilli (McIntosh, 1885)122MT312088 MT312017–18 MH995874 MH995876
 Nephtys incisa Malmgren, 186511MT312156 MT312071
Nereididae
 Alitta succinea (Leuckart, 1847)1MT312089
Octochaetidae
 Dichogaster green tree worm1MT312022
 Dichogaster guadeloupensis James, 19961MT312023
Oenonidae
 Arabella sp.1MT312094
 Drilonereis sp.1MT312004
Onuphidae
 Diopatra cuprea (Bosc, 1802)1
Opheliidae
 Armandia sp.3MT312099–101
 Ophelina acuminata Örsted, 18432MT312161–62
Orbiniidae
 Leitoscoloplos robustus (Verrill, 1873)22MT312138–39 MT312067–68
 Naineris laevigata (Grube, 1855)1MT312155
 Proscoloplos cygnochaetus Day, 19541MT312179
Oweniidae
 Galathowenia oculata (Zachs, 1923)2MT312118–19
 Owenia fusiformis Delle Chiaje, 18441MT312035
Parergodrilidae
 Stygocapitella subterranea 2 Knöllner, 193431MT312194–96 MT312081
Parvidrilidae
 Parvidrilus meyssonnieri DesChâtelliers & Martin, 20121MT312170
Pectinariidae
 Pectinaria gouldii (Verrill, 1874)1MT312011
Phyllodocidae
 Eulalia myriacyclum (Schmarda, 1861)11MT312055 MT312027
 Nereiphylla sp.1MT312157
Pilargidae
 Synelmis sp.1MT312197
Polygordiidae
 Polygordius sp.1MT312074
Polynoidae
 Halosydna brevisetosa Kinberg, 18551MT312058
 Hermenia verruculosa Grube, 18562MT312060–61
 Lepidonotus semitectus (Stimpson, 1856)1MT312030
Protodriloididae
 Protodriloides chaetifer (Remane, 1926)41MT312180–83 MT312077
Sabellariidae
 Idanthyrsus sp.2MT312131–32
Sabellidae
 Bispira pacifica (Berkeley & Berkeley, 1954)1MT311989
 Myxicola infundibulum (Montagu, 1808)1MT312154
Scalibregmatidae
 Scalibregma inflatum Rathke, 18431MT312186
Serpulidae
 Crucigera zygophora (Johnson, 1901)2MT312110–11
 Galeolaria caespitosa Lamarck, 18181MT312056
 Serpula vermicularis Linnaeus, 176711MT312188 MT312078
 Spirobranchus kraussii (Baird, 1865)2MT312075–76
Siboglinidae
 Lamellibrachia luymesi van der Land & Nørrevang, 19751MT312133
 Osedax sp.2KT166962–63
 Sclerolinum brattstromi Webb, 19641MT312187
 Siboglinum ekmani Jägersten, 19563MT312189–91
Sigalionidae
 Sigalion sp.1MT312079
Sparganophilidae
 Sparganophilus sp.11MT312080 MT312040
Spionidae
 Boccardia proboscidea Hartman, 19402MT312051–52
 Laonice sp.2MT311995–96
 Prionospio dubia Day, 19611MT312178
Sternaspidae
 Sternaspis scutata (Ranzani, 1817)1MT311990
 Sternaspis sp.1MT311997
Syllidae
 Odontosyllis gibba Claparède, 18631MT312072
 Syllis cf. hyalina Grube, 18631MT312041
Terebellidae
 Eupolymnia nebulosa (Montagu, 1819)3MT312115–17
 Lanicides sp.4MT312134–37
 Lysilla sp.11MT312143 MT312069
 Pista macrolobata Hessle, 19174MT312172–75
 Streblosoma hartmanae Kritzler, 19712MT312192–93
 Terebellides stroemii Sars, 18358MT312198–205
 Thelepus crispus Johnson, 19012MT312207–08
Themistidae
 Themiste pyroides (Chamberlin, 1919)1MT312209
Travisiidae
 Travisia brevis Moore, 19233MT312210–12
Tritogeniidae
 Tritogenia sulcata Kinberg, 18671MT312042
Trochochaetidae
 Trochochaetidae gen. sp.1MT312082

Note.—Hb, hemoglobin; Mb, myoglobin; nHb, nerve hemoglobin; Cygb, cytoglobin; Adgb, androglobin; HBL-Hb, hexagonal bilayer hemoglobin. GenBank accession numbers are also provided here and detailed in supplementary file 4, Supplementary Material online.

Results

The initial Trinotate analysis recovered 5,267 nucleotide sequences annotated as Hbs, Mbs, Cygbs, Adgbs, nHbs, or HBL-Hbs. After all processing steps, including translation from nucleotides to amino acids, selection by minimum size, reciprocal BLASTp, and Pfam domain evaluation, our in silico analyses recovered 238 unique amino acid sequences. These consisted of 130 Hb sequences, 19 Mb sequences, 27 Cygb sequences, four Adgb sequences, 39 nHb sequences, and 19 HBL-Hb sequences (table 1). These genes are actively transcribed in 121 annelid species belonging to 64 different families as detailed in table 1. The sequences obtained in this work were deposited at GenBank; their accession numbers are listed in table 1 and detailed in supplementary file 4, Supplementary Material online. The number of expressed Hb genes in a given species ranged from one in 48 different species to eight in Terebellides stroemii (Trichobranchidae). For Mbs, the number of expressed genes in a given species ranged from one in 10 different species to four in Praxillella pacifica (Maldanidae). For Cygbs the corresponding numbers were one in 25 different species and two in Aglaophamus verrilli (Nephtyidae). One Adgb gene was found in each of four different species, and nHb genes ranged from one expressed copy in 24 different species to three in Alciopa sp. (Alciopidae). Besides the 10 HBL-Hb sequences selected from Belato et al.’s (2019) previous work that were used as reference (GenBank accession numbers: MH995867–68, MH995870–71, MH995874, MH995876, MH995879–80, MH996356, and MH995881), we found more HBL-Hb genes, ranging from one in two different species to three in Heterodrilus sp. (Naididae). Besides the two well-known intracellular noncirculating globin types found in annelids, Mbs and nHbs, we reveal the presence of Cygbs and Adgbs as two new members of this globin type.
Additionally, we found three more families that express all globin types (extracellular circulating, intracellular circulating, and intracellular noncirculating) simultaneously: 1) Aeolosomatidae (Aeolosoma sp.), 2) Arenicolidae (Abarenicola pacifica), and 3) Nephtyidae (Aglaophamus verrilli). Following trimming and alignment of translated transcripts, the final alignment had a maximum sequence length of 142 residue positions (supplementary file 5, Supplementary Material online). All sequences in the alignment contained the essential residues of the globin domain (Phe at CD1, His at E7, His at F8), which is an indicator of respiratory function for each one of these proteins (Lecomte et al. 2005; fig. 1). The best-fixed rate model for phylogenetic analyses of the annelid globins data set and the annelid globins + metazoan extracellular globins data set was the WAG model. For phylogenetic analyses of the annelid globins + metazoan globins data set the best-fixed rate model was LG.
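As a quick consistency check on the sequence counts reported above, the per-type numbers can be summed and compared with the 238 unique sequences in the final alignment. The snippet is only an arithmetic illustration; the variable names are arbitrary.

```python
# Per-type sequence counts as reported in the Results.
counts = {"Hb": 130, "Mb": 19, "Cygb": 27, "Adgb": 4, "nHb": 39, "HBL-Hb": 19}

total = sum(counts.values())
print(total)  # 238, matching the number of unique sequences in the alignment

# Share of each globin type in the final data set, in percent.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
```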
Fig. 1.

Multiple amino acid sequence alignment of annelid Cygb (cytoglobin), Hb (hemoglobin), HBL (hexagonal bilayer hemoglobin), nHb (nerve hemoglobin), Mb (myoglobin), and manually rearranged Adgb (androglobin). Invariant amino acid residues at positions CD1, E7, and F8, which are diagnostic characters of the globin domain, are indicated in bold.

Bayesian and maximum likelihood inferences recovered the same topology with several strongly supported clades in the annelid globins tree (fig. 2), although the topology did not mirror the recent Annelida phylogeny (e.g., Struck et al. 2015; Weigert and Bleidorn 2016). Adgb genes clustered into a monophyletic group highly supported by bootstrap values and posterior probabilities (100%; PP = 1; orange clade; fig. 2), as did the HBL-Hb genes, which also clustered into one monophyletic group with strong statistical support (100%; PP = 1; green clade; fig. 2). When HBL-Hb sequences from other metazoans were added to the alignment, all extracellular globins from both annelids and other metazoans clustered together in one monophyletic group with high support values (100%; supplementary file 6, Supplementary Material online).
Fig. 2.

Maximum likelihood gene genealogy of annelid globin genes rooted by midpoint. Bootstrap support values obtained from the maximum likelihood inference are shown in black, and posterior probability values obtained from the Bayesian inference are shown in red. To improve clarity, only support values >80 (bootstrap) or >0.8 (posterior probability) are shown. Yellow clades represent hemoglobin groups. Blue clades represent nerve hemoglobins. Purple clades represent myoglobins. Green clade represents hexagonal bilayer hemoglobins. Pink clades represent cytoglobins. Orange clade represents androglobins.
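The display rule in this caption (show a support value only if it exceeds 80 for bootstrap or 0.8 for posterior probability) can be written as a small Python helper. This is an illustrative sketch; the function name, the parameter names, and the "bootstrap/posterior" label format are assumptions and do not describe how the published figure was drawn.

```python
def label_for_node(bootstrap=None, posterior=None,
                   min_bootstrap=80.0, min_posterior=0.8):
    """Build the support label for one node, keeping only values above the cutoffs.

    bootstrap: maximum likelihood bootstrap support in percent, or None.
    posterior: Bayesian posterior probability between 0 and 1, or None.
    Returns an empty string when neither value passes its cutoff.
    """
    parts = []
    if bootstrap is not None and bootstrap > min_bootstrap:
        parts.append(f"{bootstrap:g}")
    if posterior is not None and posterior > min_posterior:
        parts.append(f"{posterior:g}")
    return "/".join(parts)


print(label_for_node(100, 1.0))    # both values pass
print(label_for_node(67, 0.58))    # neither passes, so the node is left unlabeled
```

Under this rule a node with 67% bootstrap and a posterior probability of 0.58 would carry no label in the tree.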

Mbs clustered in two distinct well-supported clades (100%; PP > 0.97; purple clades; fig. 2), as did Cygbs, which grouped into two separate highly supported clades (100%; PP > 0.99; pink clades; fig. 2). Most nHb sequences (26 of 39) clustered into one large and highly supported clade (100%; PP = 1; blue clade; fig. 2), and the remainder fell into four small clades with very few sequences each (blue clades; fig. 2). The majority of Hb sequences also clustered into one large clade (67%; PP > 0.58; yellow clade; fig. 2); however, several smaller clades with few sequences were also formed (yellow clades; fig. 2). Some nodes within the Hb clades had low support values; however, such results are common in phylogenetic analyses within a single protein family (DeSalle 2015).

Tertiary structures inferred for the different globins using the Phyre2 web portal were highly similar to one another and consistent with a putative respiratory function (supplementary file 7, Supplementary Material online). Model prediction confidence and coverage ranged from 97% to 100%.

In the phylogenetic reconstruction recovered with the annelid globins + metazoan globins data set, Adgbs from both vertebrates and invertebrates clustered in one clade (75%; red clade; fig. 3), and HBL-Hbs also clustered into one monophyletic group (75%; green clade; fig. 3). Mbs clustered in two separate clades with strong bootstrap support, one with vertebrate Mbs (85%; light purple clade; fig. 3) and another with invertebrate Mbs (100%; dark purple clade; fig. 3). Cygbs formed three different groups with high support values, one with vertebrate Cygbs (85%; pink clade; fig. 3) and two others with invertebrate Cygbs (86% and 87%; pink clades; fig. 3). nHbs and vertebrate Ngb clustered into two distinct clades (71% and 75%; blue clades; fig. 3), as did vertebrate Hbs A and B, which clustered in two monophyletic groups (95%; dark and light gray clades, respectively; fig. 3). Invertebrate Hbs clustered into five different clades (yellow clades; fig. 3). Some low nodal support values were also found in this gene tree.
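Statements such as "clustered into one monophyletic group" can be tested mechanically on a rooted gene tree. The Python sketch below represents a tree as nested tuples and asks whether a set of tips is exactly the tip set of some clade. The tree and its tip labels are hypothetical and are not taken from figs. 2 or 3; this is one simple way to phrase the test, not the method used in the study.

```python
def tips(node):
    """Return the set of tip labels below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))


def is_monophyletic(tree, group):
    """True if some clade of the rooted tree contains exactly the tips in group."""
    group = frozenset(group)
    if tips(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical rooted toy tree; labels are illustrative only.
toy_tree = (
    (("Adgb_1", "Adgb_2"), ("HBL_1", "HBL_2")),
    (("Mb_1", "Hb_1"), ("Mb_2", "Hb_2")),
)

print(is_monophyletic(toy_tree, {"Adgb_1", "Adgb_2"}))  # True: one clade
print(is_monophyletic(toy_tree, {"Mb_1", "Mb_2"}))      # False: split across two clades
```

In the toy tree the two "Mb" tips fall in different clades, which is analogous to a globin type that is recovered as two or more separate groups.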
Fig. 3.

Maximum likelihood gene genealogy of 43 annelid globin genes and 54 metazoan globin genes rooted by midpoint. Bootstrap support values obtained from the maximum likelihood inference are shown above the branches. Dark and light blue clades are nerve hemoglobin and vertebrate neuroglobin, respectively. Green clade is hexagonal bilayer hemoglobin. Yellow clades are invertebrate hemoglobins. Dark and light pink clades are invertebrate and vertebrate cytoglobin, respectively. Dark and light gray clades are vertebrate hemoglobin A and B. Red clade is androglobin. Dark and light purple clades are vertebrate and invertebrate myoglobin, respectively. Vertebrate and invertebrate Cygbs, Hbs, and Mbs do not represent genuine orthologs.


Discussion

We recovered two more types of noncirculating globins within annelids, Adgbs and Cygbs, in addition to the already known nerve Hbs (nHbs) and Mbs, confirming that Annelida has the greatest diversity of oxygen-binding proteins (Mangum 1998; Weber and Vinogradov 2001; Costa-Paiva et al. 2017, 2018). Three annelid families, Opheliidae, Terebellidae, and Alvinellidae, were previously known to simultaneously express the three globin types: extracellular circulating, intracellular circulating, and intracellular noncirculating (Weber 1978; Hourdez et al. 2000). Our data reveal three additional families that express these three globin types: Aeolosomatidae, Arenicolidae, and Nephtyidae. In conjunction with the results of Bailly et al. (2007), our results demonstrate that co-occurrence of different globin types is much more common than previously documented and that it probably already existed in the annelid ancestor.

Our findings corroborate previous studies suggesting that vertebrate Cygb and Mb lineages are distinct from invertebrate Cygb and Mb lineages (Suzuki and Imai 1998; Hoffmann et al. 2011, 2012; Blank and Burmester 2012; Storz et al. 2013; Pillai et al. 2020). Furthermore, our phylogenetic analyses show that Cygbs and Mbs from vertebrates are not true orthologs of invertebrate Cygbs and Mbs (fig. 3). These proteins received the same name because of their presumed functional role, not their evolutionary relationships.

The phylogenetic hypothesis for invertebrate globin sequences constructed by Goodman et al. (1988) divided annelid globins into two monophyletic groups: intracellular and extracellular. Similarly, the phylogenetic analysis of annelid globin genes carried out by Bailly et al. (2007) demonstrated a well-supported division between intracellular and extracellular globins. Interestingly, we did not find one clade of extracellular globins and another of intracellular globins (fig. 2).
In our phylogenetic reconstruction, extracellular globins clustered into one group within a major clade that also contains Hbs (fig. 2). Based on our results, the extracellular globins therefore appear to be phylogenetically close to the intracellular Hbs. When additional extracellular globin sequences from other metazoans were added to the analysis, they all clustered together, forming a monophyletic group (supplementary file 6, Supplementary Material online). These results suggest that the extracellular globins from annelids are more closely related to their extracellular counterparts in other invertebrate phyla (Belato et al. 2019) than to the annelid intracellular globins. Studies suggest that extracellular globins arose from an ancient duplication of an intracellular globin gene (Gotoh et al. 1987; Yuasa et al. 1996; Bailly et al. 2007; Belato et al. 2019). Considering that extracellular globins were found in both deuterostome and protostome lineages, such as Echinodermata, Mollusca, Platyhelminthes, and Brachiopoda (Belato et al. 2019), this duplication event most likely occurred before the protostome–deuterostome split.

Hbs are very ancient proteins that have undergone several gene duplication events (Goodman et al. 1988; Hardison 1998; Vinogradov et al. 2005, 2007; Storz 2018; Pillai et al. 2020). Supporting these observations, we find that Hbs are divided into several subgroups that may represent paralogs. Annelid Hbs clustered into one large clade that represents the annelid intracellular monomeric RBC Hbs (Goodman et al. 1975, 1988; Weber and Vinogradov 2001; Bailly et al. 2007) and into some smaller clades with few sequences (fig. 2, yellow clades). Moreover, as expected, Hbs from Glycera dibranchiata and G. americana clustered into a clade separate from other Hbs, which seems to represent the well-known distinct circulating glycerid RBC Hbs (Weber et al. 1977; Weber and Vinogradov 2001).
Herein, we present the first record of Adgbs in annelids. All newly discovered sequences clustered into one well-supported group (fig. 2, orange clade), which is consistent with their conspicuous inversion in the globin domain, with helices C–H preceding helices A–B (Hoogewijs et al. 2012; Bracke et al. 2018). In the gene tree of annelid globins + metazoan globins, the Adgbs of vertebrates and invertebrates clustered together, corroborating the hypothesis that Adgb genes predate the origin of metazoans (Hoogewijs et al. 2012; Bracke et al. 2018).

We also present the first record of invertebrate Cygbs within the Annelida. Cygbs grouped into two distinct clades (fig. 2, pink clades), one of which is the sister group to nHbs and the other the sister group to a clade that contains Mbs and Hbs (fig. 2). These results indicate that annelid Cygbs have a close molecular affinity to other noncirculating intracellular globins, such as invertebrate Mbs and nHbs, similar to vertebrate Cygbs, which show phylogenetic affinities to vertebrate Mbs (Burmester et al. 2002).

Annelid Mbs clustered into two well-supported clades (fig. 2, purple clades), and these two distinct groups presumably represent the two major components, MbI and MbII, that have been reported in the annelid Arenicola marina (Weber and Pauptit 1972; Kleinschmidt and Weber 1998). Our results agree with those of Suzuki and Imai (1998), who separated Mbs from invertebrates and vertebrates (fig. 3, purple clades).

Some nerve Hbs clustered into one large clade (fig. 2, blue clade) with high support values, whereas the others were mixed with other intracellular Hb sequences (fig. 2, yellow and blue clades), supporting the hypothesis of divergent evolutionary origins of invertebrate nHbs (Wittenberg 1992; Weber and Vinogradov 2001). Dewilde et al. (1996) and Burmester et al. (2000) reported that nHbs from the worm Aphrodita aculeata show higher sequence similarity to the intracellular Hbs from the bloodworm G. dibranchiata than to other invertebrate nHbs. Our gene genealogy corroborates these results in that the nHbs from Aphrodita japonica are more closely related to the Hbs from G. dibranchiata than to the other nHbs grouped in the larger clade (fig. 2). Vertebrate Ngbs and nHbs clustered in separate clades (fig. 3), confirming that although both globins are localized in nerve tissues, they have different evolutionary origins (Weber and Vinogradov 2001; Blank and Burmester 2012; Pillai et al. 2020). The inferred tertiary structures of the globins suggest that they could have a putative respiratory function (supplementary file 7, Supplementary Material online).

In conclusion, our findings confirm a pattern evident from several recent studies: oxygen-binding proteins have a much broader phylogenetic distribution than previously established, especially in annelids (Bailly et al. 2008; Martín-Durán et al. 2013; Costa-Paiva et al. 2017, 2018; Belato et al. 2019). We found two new intracellular noncirculating globin types within annelids, Adgbs and Cygbs, in addition to the two other documented types. We confirm that Mbs and Cygbs from vertebrates and those from invertebrates have different evolutionary origins. Our analyses demonstrate an intimate relationship between annelid extracellular globins and those from other metazoans, most likely because they were already present in the common ancestor of protostomes and deuterostomes. They also reaffirm the crucial importance of further comprehensive studies on the molecular evolution of the globin superfamily across the metazoan evolutionary tree.

Supplementary Material

Supplementary data are available at Genome Biology and Evolution online.
  78 in total

Review 1.  Neuroglobin: a respiratory protein of the nervous system.

Authors:  Thorsten Burmester; Thomas Hankeln
Journal:  News Physiol Sci       Date:  2004-06

Review 2.  Structural divergence and distant relationships in proteins: evolution of the globins.

Authors:  Juliette T J Lecomte; David A Vuletich; Arthur M Lesk
Journal:  Curr Opin Struct Biol       Date:  2005-06       Impact factor: 6.809

3.  Characterization of a stellate cell activation-associated protein (STAP) with peroxidase activity found in rat hepatic stellate cells.

Authors:  N Kawada; D B Kristensen; K Asahina; K Nakatani; Y Minamiyama; S Seki; K Yoshizato
Journal:  J Biol Chem       Date:  2001-04-24       Impact factor: 5.157

4.  Error, signal, and the placement of Ctenophora sister to all other animals.

Authors:  Nathan V Whelan; Kevin M Kocot; Leonid L Moroz; Kenneth M Halanych
Journal:  Proc Natl Acad Sci U S A       Date:  2015-04-20       Impact factor: 11.205

5.  Cytoglobin: a novel globin type ubiquitously expressed in vertebrate tissues.

Authors:  Thorsten Burmester; Bettina Ebner; Bettina Weich; Thomas Hankeln
Journal:  Mol Biol Evol       Date:  2002-04       Impact factor: 16.240

6.  Crystal structure of cytoglobin: the fourth globin type discovered in man displays heme hexa-coordination.

Authors:  Daniele de Sanctis; Sylvia Dewilde; Alessandra Pesce; Luc Moens; Paolo Ascenzi; Thomas Hankeln; Thorsten Burmester; Martino Bolognesi
Journal:  J Mol Biol       Date:  2004-02-27       Impact factor: 5.469

Review 7.  Hemoglobins from bacteria to man: evolution of different patterns of gene expression.

Authors:  R Hardison
Journal:  J Exp Biol       Date:  1998-04       Impact factor: 3.312

8.  KEGG as a reference resource for gene and protein annotation.

Authors:  Minoru Kanehisa; Yoko Sato; Masayuki Kawashima; Miho Furumichi; Mao Tanabe
Journal:  Nucleic Acids Res       Date:  2015-10-17       Impact factor: 16.971

9.  A broad genomic survey reveals multiple origins and frequent losses in the evolution of respiratory hemerythrins and hemocyanins.

Authors:  José M Martín-Durán; Alex de Mendoza; Arnau Sebé-Pedrós; Iñaki Ruiz-Trillo; Andreas Hejnol
Journal:  Genome Biol Evol       Date:  2013       Impact factor: 3.416

10.  The Pfam protein families database: towards a more sustainable future.

Authors:  Robert D Finn; Penelope Coggill; Ruth Y Eberhardt; Sean R Eddy; Jaina Mistry; Alex L Mitchell; Simon C Potter; Marco Punta; Matloob Qureshi; Amaia Sangrador-Vegas; Gustavo A Salazar; John Tate; Alex Bateman
Journal:  Nucleic Acids Res       Date:  2015-12-15       Impact factor: 16.971

