
The Origin and Evolution of Antistasin-like Proteins in Leeches (Hirudinida, Clitellata).

Rafael Eiji Iwama, Michael Tessler, Mark E Siddall, Sebastian Kvist.

Abstract

Bloodfeeding is employed by many parasitic animals and requires specific innovations for efficient feeding. Some of these innovations are molecular features related to the inhibition of hemostasis. For example, bloodfeeding insects, bats, and leeches release proteins with anticoagulatory activity through their salivary secretions. The antistasin-like protein family, composed of serine protease inhibitors with one or more antistasin-like domains, is tightly linked to inhibition of hemostasis in leeches. However, this protein family has also been recorded in non-bloodfeeding invertebrates, such as cnidarians, mollusks, polychaetes, and oligochaetes. The present study aims to 1) root the antistasin-like gene tree and delimit the major orthologous groups, 2) identify potential independent origins of salivary proteins secreted by leeches, and 3) identify major changes in domain and/or motif structure within each orthologous group. Rigorous phylogenetic analyses based on nine new transcriptomes and a diverse set of comparative data distinguish five clades containing leech antistasin-like proteins: the trypsin + leukocyte elastase inhibitors clade, the antistasin clade, the therostasin clade, and two additional, unnamed clades. The antistasin-like gene tree supports multiple origins of leech antistasin-like proteins, owing to the presence of both leech and non-leech sequences in one of the unnamed clades, but a single origin of factor Xa and trypsin + leukocyte elastase inhibitors. This is further supported by three sequence motifs that are exclusive to antistasins, the trypsin + leukocyte elastase inhibitor clade, and the therostasin clade, respectively. We discuss the implications of our findings for the evolution of this diverse family of leech anticoagulants.
© The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


Keywords:  anticoagulants; antistasin; bloodfeeding; leeches; protein evolution

Year:  2021        PMID: 33527140      PMCID: PMC7851590          DOI: 10.1093/gbe/evaa242

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Significance

Antistasin-like proteins are common in leech saliva and have also been recorded in non-bloodfeeding invertebrates, such as cnidarians, polychaetes, and oligochaetes. Their function in non-bloodfeeding invertebrates is still unknown, as are their origin and the protein-level changes related to bloodfeeding in leeches. The present study establishes a root for the leech antistasin-like gene tree, delimits orthologous groups of leech antistasin-like proteins, and identifies conserved regions specific to each protein lineage. Our findings support multiple origins of leech antistasin-like proteins and identify protein-level changes that might be related to bloodfeeding.

Introduction

Bloodfeeding occurs in mammals, fishes, birds, and numerous invertebrate taxa, such as insects, arachnids, and leeches (Mans and Neitz 2004; Oviedo et al. 2009; Phillips and Siddall 2009; Gerdol et al. 2018). Bloodfeeding requires several evolutionary innovations, both morphological and molecular (Ribeiro and Arcà 2009). The feeding structures and mouthparts of bloodfeeding animals are morphologically diverse, but they usually include an organ designed for making incisions, such as the proboscis of mosquitoes and glossiphoniid leeches. Alternatively, they involve specific adaptations of the teeth and mandibular systems, as in bats and lampreys (Potter and Gill 2003; Ribeiro and Arcà 2009; Tessler, de Carle, et al. 2018). Other innovations include the following:
- organs harboring bacteria that aid in the production of B vitamins lacking from the blood diet;
- specialized organs for attachment to the host’s skin;
- metabolic adaptations against heme toxicity;
- mechanisms for water excretion.

(Sawyer 1986; Graça-Souza et al. 2006; Kersch and Pietrantonio 2011; Siddall et al. 2011; Manzano-Marín et al. 2015; Smith et al. 2015). Some of the most important innovations required for bloodfeeding are related to the inhibition of hemostasis (blood clotting). This is achieved through the secretion of salivary compounds that act as anticoagulants (Ribeiro and Arcà 2009; Min et al. 2010). Anticoagulants are proteins that inhibit hemostasis by interacting with various components of the coagulation cascade. Such proteins have been recorded for a subset of bloodfeeding species, including bats, insects, and leeches (Assumpção et al. 2008; Min et al. 2010; Mizurini et al. 2013; Ribeiro et al. 2018; Tessler, Marancik, et al. 2018), but they are not restricted to bloodfeeders. In some non-bloodfeeding organisms, anticoagulants prevent clot formation under normal conditions and maintain blood flow in the organism's circulatory system (Tsakiris et al. 1999; Hanumanthaiah et al. 2002; Lee et al. 2010; Mehr et al. 2015).
Leeches (Hirudinida sensu Tessler, de Carle, et al. 2018) are notorious for their bloodfeeding behavior, which was extensively leveraged in medicinal practice during the 18th and 19th centuries (Elliott and Kutschera 2011). They continue to be used in modern, science-based medicine (Utley et al. 1998; Herlin et al. 2017). Despite the prominence of bloodfeeding in leeches, about one-third of known leech species are non-bloodfeeding. These species feed on other macroinvertebrates using one of two main strategies. Liquidosomatophagous leeches use a proboscis to ingest the body fluids and tissues of invertebrates; this feeding strategy occurs in the suborders Glossiphoniiformes and Oceanobdelliformes (Sawyer 1986). By contrast, some members of the Americobdelliformes + Erpobdelliformes + Hirudiniformes clade (formerly Arhynchobdellida) are macrophagous, meaning that they ingest their prey whole. Hirudinea (leeches + Branchiobdellida [crayfish worms] + Acanthobdellida [salmonid-fish parasites]) is consistently recovered as sister to the non-bloodfeeding oligochaete family Lumbriculidae. Two hypotheses about the origin of bloodfeeding have been proposed (Siddall et al. 2001; Erséus and Källersjö 2004; Tessler, de Carle, et al. 2018). The first hypothesis, proposed by Sawyer (1986), states that bloodfeeding in leeches evolved from a non-bloodfeeding ancestor. However, the detailed placement of non-bloodfeeding taxa on the leech tree has led to a second hypothesis. This hypothesis regards bloodfeeding as the plesiomorphic state for leeches, with subsequent independent losses of the behavior (Apakupakul et al. 1999; Borda and Siddall 2004; Tessler, de Carle, et al. 2018).
Bloodfeeding leeches use three main strategies to inhibit hemostasis: 1) inhibition of crosslinked platelets by preventing platelet adhesion to collagen, 2) inhibition of platelet adhesion to fibrinogen, and 3) thrombin inhibition (Koh and Kini 2009; Francischetti 2010). Among the leech anticoagulants that act on thrombin, antistasin works upstream by targeting factor Xa, the protease that converts prothrombin into thrombin (Dunwiddie et al. 1989). Antistasin is a serine protease inhibitor formed by two tandemly repeated domains, each of which includes ten cysteine residues (Nutt et al. 1988). These domains are also found in other leech anticoagulants that belong to the antistasin-like protein family, such as ghilanten, therostasin, theromin, bdellastasin, guamerin, piguamerin, hirustasin, poecistasin, and gigastasin. Both ghilanten and therostasin also target factor Xa, whereas theromin targets thrombin (Blankenship et al. 1990; Chopin et al. 2000; Salzet et al. 2000). Piguamerin, guamerin, bdellastasin, hirustasin, and poecistasin are thought to target plasmin, trypsin, and leukocyte elastase, although their role in hemostasis inhibition is not well understood (Söllner et al. 1994; Jung et al. 1995; Mittl et al. 1997; Rester et al. 1999). Proteins with antistasin-like domains have been recorded from several non-bloodfeeding invertebrate taxa, such as cnidarians, mollusks, polychaetes, and oligochaetes (Holstein et al. 1992; Lee et al. 2010; Nikapitiya et al. 2010; Mehr et al. 2015). It has been proposed that these proteins might be involved in the immune response of non-bloodfeeding taxa, but their specific function remains unclear (Mehr et al. 2015). Antistasin-like proteins occur in non-bloodfeeding animals, and leech antistasin-like proteins have different targets. Together, these observations raise questions regarding the origin and evolution of these proteins.
The present study aims to clarify some of these questions by 1) rooting the antistasin-like tree and delimiting the major orthologous groups, 2) determining if independent origins should be inferred for leech antistasin-like proteins, and 3) identifying the major changes within each orthologous group.

Materials and Methods

Specimen Collection and Data Set Generation

All specimens used to generate the new transcriptomes were preserved in RNAlater at the time of collection. Specimens were later identified using traditional microscopy. Identifications were confirmed using the cytochrome c oxidase subunit I and 18S rDNA loci recovered from the transcriptomes described below. Salivary glands were dissected from larger specimens. For medium-sized specimens, the anterior end (where salivary tissue is found) was used, and for the very small branchiobdellidans, the entire specimen was used. The following species were newly sequenced: Lumbriculus variegatus (USA, NY), Xironogiton victoriensis (Germany), Branchiobdella cf. kozarovi (the Netherlands), Acanthobdella peledina (Sweden), Mesobdella gemmata (Chile), Cylicobdella costaricae (Costa Rica), Patagoniobdella sp. (Chile), Erpobdella sp. (USA, VT), and Americobdella valdiviana (Chile). Additionally, transcriptome data from 13 annelid species available on the NCBI Sequence Read Archive (SRA) were included in the analysis. Table 1 lists all transcriptomes used in the present study. The data set was supplemented with oligochaete expressed sequence tag (EST) data available from GenBank. It was also supplemented with transcriptome sequences identified as antistasin, therostasin, guamerin, piguamerin, and bdellastasin in previous leech studies (Kvist et al. 2017; Tessler, Marancik, et al. 2018; Iwama et al. 2019). To further explore the root of the antistasin-like gene tree, a second data set was generated. It included all sequences in the original data set as well as putative antistasin-like proteins from non-annelid taxa. The complete list of sequences included in the final data set, along with literature references, is available as supplementary table S1, Supplementary Material online. The final data sets are available as supplementary data S2 and S3, Supplementary Material online.
Table 1

List of Transcriptomes Used in the Present Study and Their Respective Statistics

Species | SRR | TSA | TSA used in this study | Reference | Raw reads | Contigs | ORFs | N50
Lumbriculus variegatus | SRR12921559 | GIWA00000000 | GIWA01000000 | Present study | 32,973,677 | 250,739 | 135,066 | 1,975
Xironogiton victoriensis | SRR12921556 | GIWI00000000 | GIWI01000000 | Present study | 37,418,234 | 318,490 | 148,552 | 714
Branchiobdella cf. kozarovi | SRR12921562 | GIVZ00000000 | GIVZ01000000 | Present study | 36,321,044 | 157,890 | 76,134 | 1,202
Acanthobdella peledina | SRR12921564 | GIWE00000000 | GIWE01000000 | Present study | 37,413,030 | 145,245 | 65,145 | 1,179
Mesobdella gemmata | SRR12921558 | GIWH00000000 | GIWH01000000 | Present study | 38,530,097 | 116,307 | 79,332 | 2,312
Cylicobdella costaricae | SRR12921561 | GIWF00000000 | GIWF01000000 | Present study | 36,208,544 | 90,518 | 69,695 | 3,088
Patagoniobdella sp. | SRR12921557 | GIWB00000000 | GIWB01000000 | Present study | 36,495,218 | 107,805 | 75,331 | 2,573
Erpobdella sp. | SRR12921560 | GIWG00000000 | GIWG01000000 | Present study | 38,688,118 | 99,710 | 68,909 | 2,747
Americobdella valdiviana | SRR12921563 | GIVY00000000 | GIVY01000000 | Present study | 30,510,929 | 73,730 | 47,931 | 2,578
Aulodrilus japonicus | SRR10997429 | - | - | Erséus et al. (2020) | 30,043,367 | 95,949 | 74,793 | 1,543
Branchiura sowerbyi | SRR9668457 | - | - | Zhao et al. (2020) | 22,053,563 | 150,477 | 108,372 | 2,250
Chaetogaster diaphanus | SRR10997419 | - | - | Erséus et al. (2020) | 18,614,401 | 83,045 | 49,988 | 1,260
Criodrilus lacuum | SRR5353276 | - | - | Anderson et al. (2017) | 22,692,862 | 113,512 | 58,490 | 725
Dendrobaena hortensis | SRR5353263 | - | - | Anderson et al. (2017) | 29,866,771 | 142,713 | 89,432 | 1,448
Dichogaster saliens | SRR5353279 | - | - | Anderson et al. (2017) | 30,019,522 | 21,555 | 11,349 | 626
Drawida sp. | SRR5353252 | - | - | Anderson et al. (2017) | 25,982,583 | 152,918 | 77,737 | 1,042
Eudrilus eugeniae | SRR5353275 | - | - | Anderson et al. (2017) | 28,843,703 | 66,148 | 37,199 | 1,001
Glossoscolex sp. | SRR5353272 | - | - | Anderson et al. (2017) | 8,536,064 | 139,997 | 79,610 | 939
Kynotus pittarelli | SRR5353264 | - | - | Anderson et al. (2017) | 21,051,306 | 98,804 | 65,050 | 1,075
Loimia bermudensis | SRR11434463 | - | - | Stiller et al. (2020) | 54,545,318 | 116,168 | 67,198 | 2,168
Lutodrilus multivesiculatus | SRR5353257 | - | - | Anderson et al. (2017) | 6,930,865 | 58,513 | 41,218 | 1,098
Propappus volki | SRR5353250 | - | - | Anderson et al. (2017) | 25,630,160 | 120,312 | 64,644 | 1,260
Rhinodrilus priollii | SRR5353249 | - | - | Anderson et al. (2017) | 24,922,104 | 53,921 | 40,186 | 1,494

Note: New transcriptomes are those with "Present study" in the Reference column.
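As a quick consistency check, the per-transcriptome figures reported in Results for the nine new transcriptomes can be recomputed from the table. The Python sketch below copies the relevant columns by hand. It assumes the reported averages were truncated rather than rounded: the ORF mean is about 85,121.7 and is reported as 85,121.

```python
# Raw reads, contigs, and ORFs for the nine new transcriptomes, copied from Table 1.
NEW_TRANSCRIPTOMES = {
    "Lumbriculus variegatus": (32_973_677, 250_739, 135_066),
    "Xironogiton victoriensis": (37_418_234, 318_490, 148_552),
    "Branchiobdella cf. kozarovi": (36_321_044, 157_890, 76_134),
    "Acanthobdella peledina": (37_413_030, 145_245, 65_145),
    "Mesobdella gemmata": (38_530_097, 116_307, 79_332),
    "Cylicobdella costaricae": (36_208_544, 90_518, 69_695),
    "Patagoniobdella sp.": (36_495_218, 107_805, 75_331),
    "Erpobdella sp.": (38_688_118, 99_710, 68_909),
    "Americobdella valdiviana": (30_510_929, 73_730, 47_931),
}


def truncated_mean(values):
    """Arithmetic mean with the fractional part dropped (integer division)."""
    values = list(values)
    return sum(values) // len(values)


raw_reads = [raw for raw, _, _ in NEW_TRANSCRIPTOMES.values()]
mean_contigs = truncated_mean(contigs for _, contigs, _ in NEW_TRANSCRIPTOMES.values())
mean_orfs = truncated_mean(orfs for _, _, orfs in NEW_TRANSCRIPTOMES.values())

print(min(raw_reads), mean_contigs, mean_orfs)  # 30510929 151159 85121
```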


RNA Extraction, cDNA Library Preparation, and Sequencing

RNA extractions of leech tissue followed Kvist et al. (2014) and Tessler, Marancik, et al. (2018). Briefly, RNA was extracted with a TRIzol-based (Life Sciences) protocol and then further purified using a Qiagen RNeasy Mini Kit. Transcriptomes were sequenced at the New York Genome Center on an Illumina HiSeq 2500. Libraries were prepared with TruSeq and sequenced as 2 × 125-bp paired-end reads.

Data Sanitation and Annotation

Raw reads were trimmed and filtered with Trimmomatic ver. 0.39 (Bolger et al. 2014) and TrimGalore ver. 0.6.4 (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore, last accessed November 24, 2020) using a PHRED quality threshold of 30. They were then assembled with Trinity ver. 2.8.1 (Haas et al. 2013) with the pair distance set to 800. Open reading frames (ORFs) of at least 30 amino acids were predicted from the transcriptomic contigs and oligochaete ESTs with TransDecoder ver. 5.2.0. Predicted ORFs were searched with BlastP and BlastX at an e-value cutoff of 1E−10. The searches were run against a local database adapted from Kvist et al. (2014, 2017) and Tessler, Marancik, et al. (2018), and against the Swiss-Prot database. Protein domains were identified with InterProScan ver. 5.4 (Jones et al. 2014) through searches against Pfam (Finn et al. 2016), Prosite (Sigrist et al. 2002), PRINTS (Attwood et al. 1994), and SUPERFAMILY (Wilson et al. 2007). Signal peptides were predicted using SignalP ver. 4.1 (Petersen et al. 2011) with the sensitive D-cutoff value. The final antistasin-like data set comprised two groups of sequences. The first was previously annotated antistasin-like sequences from other studies. The second was BLAST hits against archetypal antistasin-like proteins that met two conditions: they had no better hit against other Swiss-Prot proteins, and only antistasin-like domains were predicted in them.
We used the online MEME Suite ver. 5.1.1 (Bailey et al. 2015) to identify conserved regions among the antistasin-like proteins in our data set and to establish homology between domains. Motifs were searched across putative orthologs of anticoagulants from both oligochaetes and leeches. MEME uses a deterministic optimization algorithm to fit a position weight matrix that describes each putative motif. To differentiate the tandemly repeated antistasin-like domains, we restricted the number of times any motif could occur within the same sequence. We also limited motif length to 25 amino acids, corresponding to the length of antistasin-like domains.
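The sketch below illustrates what a position weight matrix is. It builds a log-odds PWM from motif sites that are already aligned, then scans a sequence for its best-scoring window. This is not MEME's algorithm: MEME discovers motifs in unaligned sequences by iterative optimization, whereas here the sites are given. The uniform 1/20 background, the pseudocount of 0.5, the function names, and the toy sites are all illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(sites: List[str], pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Log2-odds position weight matrix from equal-length, gap-free aligned sites.

    Assumes a uniform background frequency (1/20) for every amino acid.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all motif sites must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(sites) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for i in range(width):
        column = [site[i] for site in sites]
        pwm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pwm


def best_window(pwm: List[Dict[str, float]], sequence: str) -> Tuple[int, float]:
    """Slide the PWM along a sequence; return (0-based start, score) of the best window.

    Residues outside the 20 standard amino acids (e.g. X) score 0, i.e. background.
    Returns (-1, -inf) if the sequence is shorter than the motif.
    """
    width = len(pwm)
    best_start, best_score = -1, float("-inf")
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column.get(aa, 0.0) for column, aa in zip(pwm, window))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


# Hypothetical toy motif sites (not MEME output from this study).
pwm = build_pwm(["CPCGE", "CPCGD", "CKCGE"])
print(best_window(pwm, "AAAACPCGEAAA")[0])  # 4
```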

Alignments and Phylogenetic Analysis

All alignments were performed using the web version of MAFFT ver. 7 (Katoh and Standley 2013) with automatic selection of the alignment strategy, the BLOSUM62 scoring matrix, and a gap opening penalty of 3.00. IQTREE ver. 1.6.12 (Nguyen et al. 2015) was used to infer the best-fitting model of amino acid evolution. The same software was used to reconstruct gene trees under maximum likelihood (ML). The best-fitting model suggested by IQTREE, WAG + I + G4 (General Empirical Model of Protein Evolution), was used for all gene tree estimations. The default heuristic search in IQTREE was performed with 1,000 replications. Likelihood bootstrap support values were calculated using 1,000 replicates with default settings. The trees were rooted at Loimia bermudensis (Terebellidae) following previous phylogenetic hypotheses (e.g., Rousset et al. 2007). Orthologous groups were defined according to the distribution of archetypal anticoagulants and of the predicted motifs.
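Nonparametric bootstrap support, as computed by IQTREE, rests on two steps. First, alignment columns are resampled with replacement to build pseudo-replicates. Second, support is the percentage of trees inferred from those replicates that recover a given clade. The sketch below shows only these two steps and omits the ML search on each replicate. As a simplifying assumption, it represents each replicate tree by its set of clades. Function names and toy data are hypothetical.

```python
import random
from typing import FrozenSet, Iterable, List, Sequence, Set


def bootstrap_alignment(alignment: Sequence[str], rng: random.Random) -> List[str]:
    """One nonparametric bootstrap pseudo-replicate: resample alignment columns with replacement."""
    n_sites = len(alignment[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def bootstrap_support(clade: FrozenSet[str], replicate_trees: Iterable[Set[FrozenSet[str]]]) -> float:
    """Percentage of replicate trees (each given as its set of clades) that recover `clade`."""
    replicates = list(replicate_trees)
    recovered = sum(1 for clades in replicates if clade in clades)
    return 100.0 * recovered / len(replicates)


rng = random.Random(42)
replicate = bootstrap_alignment(["ACDE", "ACDF"], rng)

# Hypothetical replicate trees; in practice each comes from an ML search on one pseudo-replicate.
ab = frozenset({"A", "B"})
trees = [{ab}, {ab}, {ab}, {frozenset({"A", "C"})}]
print(bootstrap_support(ab, trees))  # 75.0
```

A clade recovered in 3 of 4 replicates gets 75% support, exactly at the threshold used for the blue circles in figure 1 ("above 75").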

Results

Raw sequences generated for this study and their respective assemblies are deposited in the SRA and the Transcriptome Shotgun Assembly (TSA) Sequence Database (BioProject accession number: PRJNA670722); SRA and TSA accession numbers are available in table 1. Over 30,000,000 raw reads were generated for each of the nine new transcriptomes, and an average of 151,159 contigs was assembled per transcriptome. Additionally, we included three further data sources in our analyses (see table 1 for statistics): oligochaete ESTs deposited in GenBank, annotated transcriptomic data from previous studies of leech anticoagulants, and additional annelid transcriptomes available on the SRA. TransDecoder predicted an average of 85,121 ORFs for each new transcriptome. Of these, 57 in total matched one of antistasin, therostasin, guamerin, piguamerin, or bdellastasin while also possessing a predicted antistasin-like domain. No hits against antistasin-like proteins were found in the Acanthobdella peledina transcriptome. TransDecoder predicted a total of 14,188 ORFs for the ESTs and an average of 51,743 ORFs for each SRA transcriptome. A total of 141 EST sequences showed significant matches against one of the aforementioned proteins. The final antistasin-like data set comprised 232 sequences from the new and SRA transcriptomes, previous leech studies, and oligochaete ESTs.
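The selection criteria described in Methods can be sketched as a filter over BLAST tabular output (`-outfmt 6`). These criteria are a best BLAST hit against an archetypal antistasin-like protein, an e-value at or below 1E−10, and only antistasin-like domains predicted. In this output format, the 11th and 12th columns are the e-value and bitscore. The sketch makes two assumptions: subject identifiers have been reduced to bare accessions, and the domain check was done separately (e.g., from InterProScan results) and is passed in as a set of query IDs. The query names and the non-archetype subject are hypothetical. The archetype accessions are taken from the figure 3 caption for illustration only.

```python
import csv
from typing import Dict, Iterable, Set, Tuple

E_VALUE_CUTOFF = 1e-10  # cutoff stated in Methods


def best_hits(outfmt6_lines: Iterable[str]) -> Dict[str, Tuple[str, float, float]]:
    """Best subject per query from BLAST tabular output (-outfmt 6, 12 default columns).

    'Best' means lowest e-value (column 11), ties broken by higher bitscore (column 12).
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in csv.reader(outfmt6_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        current = best.get(query)
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return best


def select_antistasin_like(outfmt6_lines: Iterable[str], archetypes: Set[str],
                           antistasin_domain_only: Set[str]) -> Set[str]:
    """Queries whose best hit is an archetype, passes the cutoff, and has only antistasin-like domains."""
    return {
        query
        for query, (subject, evalue, _) in best_hits(outfmt6_lines).items()
        if subject in archetypes and evalue <= E_VALUE_CUTOFF and query in antistasin_domain_only
    }


rows = [
    "q1\tP16242\t80.0\t60\t12\t0\t1\t60\t1\t60\t1e-30\t120",
    "q1\tNON_ARCHETYPE\t40.0\t60\t30\t2\t1\t60\t1\t60\t1e-5\t40",
    "q2\tNON_ARCHETYPE\t90.0\t60\t6\t0\t1\t60\t1\t60\t1e-40\t150",
    "q2\tP82354\t70.0\t60\t18\t0\t1\t60\t1\t60\t1e-20\t90",
    "q3\tQ9NBW4\t50.0\t40\t20\t1\t1\t40\t1\t40\t1e-8\t50",
]
archetypes = {"P16242", "P82354", "Q9NBW4"}
print(select_antistasin_like(rows, archetypes, {"q1", "q2", "q3"}))  # {'q1'}
```

In this toy input, q2 is rejected because its best hit is not an archetype, and q3 is rejected because its e-value misses the cutoff.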

Gene Tree

The final alignment for the data set composed exclusively of annelid sequences included 2,289 aligned sites. The best-scoring ML tree had a log likelihood (ln L) of −62,971.666 (fig. 1). The tree was rooted at L. bermudensis; this is the first time that an antistasin gene tree has been rooted. In our analysis, all leech sequences fall within a larger clade denoted “leech antistasins” in figure 1, although this clade is not formed exclusively by leech sequences. Clades 1–3 are formed only by non-Hirudinea sequences, except for a few branchiobdellidan sequences that fall within clade 2. The larger leech antistasin clade can be further subdivided into six clades. Three of these contain archetypal leech sequences, which were used to define them as antistasins, trypsin and leukocyte elastase inhibitors (TLIs), and therostasins. The remaining three clades (4–6) do not contain any archetypal sequences and therefore remain undefined. Clade 4 is formed by sequences from leeches, branchiobdellidans, and oligochaetes. Clade 5 is formed only by oligochaete sequences. It is sister to a larger clade formed by clade 6 and the three defined clades that contain the archetypal leech sequences.
Fig. 1

Best-scoring ML tree resulting from the analysis of the putative antistasin-like proteins (ln L = −62,971.666): (A) best-scoring ML tree with the leech antistasins clade collapsed; (B) leech antistasins clade of the best-scoring ML tree. Nodes with likelihood bootstrap support values above 75 are denoted by blue circles. Branch lengths are proportional to change. Archetypal leech bioactive proteins are denoted in bold. Red and black branches represent leech and non-leech sequences, respectively.

The archetypal antistasin is sister to the archetypal ghilanten, and these two anticoagulants are separated by very short branches. They fall within a clade formed by sequences from both glossiphoniids and non-proboscis-bearing leeches. The TLI clade is sister to the antistasin clade and contains five archetypal sequences: bdellastasin, guamerin, hirustasin, piguamerin, and poecistasin. The archetypal therostasin, theromin, and gigastasin fall within a monophyletic group (therostasins). This group is sister to the clade formed by clade 6, the antistasin clade, and the TLI clade. Therostasin and theromin are recovered as sister to each other in our analysis, despite their functional divergence (see Introduction). The tree resulting from the analysis that included putative non-annelid antistasin-like sequences is shown in supplementary figure S4, Supplementary Material online. Including these sequences and rooting the tree at Hydra vulgaris rendered paraphyletic or polyphyletic several clades that were otherwise recovered as monophyletic, including the leech antistasins clade. We therefore largely disregarded this hypothesis because of obvious artefactual issues (see Discussion).

Motif and Domain Prediction

In total, 50 motifs (M1–M50) were predicted at an e-value threshold of 1E−10. Figure 2 shows the distribution of the 22 predicted motifs that are present among Hirudinea sequences. M1–M12, M20, M40, and M50 are located within antistasin-like domains predicted by InterProScan, as illustrated in figure 3. M1 is present in almost all sequences included in the analysis. The exceptions are six leech sequences that fall in three separate clades of the gene tree. M2 and M3 are present in most oligochaete and branchiobdellidan sequences (fig. 2). By contrast, only three leech sequences in the antistasin clade possess M2, and only one sequence possesses M3 (fig. 2). Three motifs (M26, M33, and M39) are exclusive or almost exclusive to certain clades. M26 and M33 are exclusive to the TLI and therostasin clades, respectively (fig. 2). M39 occurs in most members of the antistasin clade (fig. 2), including archetypal antistasin and ghilanten, but it is also present in sequences from the therostasin clade, including gigastasin. The only motif shared between the archetypal theromin and therostasin is M6, which corresponds to an antistasin-like domain.
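Statements such as "M26 is exclusive to the TLI clade" amount to a simple tabulation of motif occurrences by clade. The minimal Python sketch below assumes that motif hits per sequence (e.g., parsed from MEME output) and clade assignments are already available as dictionaries. The toy data are hypothetical and only mimic the pattern described above: M39 is shared between two clades and is therefore not reported as exclusive.

```python
from collections import defaultdict
from typing import Dict, Set


def clade_exclusive_motifs(motifs_by_seq: Dict[str, Set[str]],
                           clade_of: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each clade to the motifs that occur only in sequences of that clade."""
    clades_with_motif: Dict[str, Set[str]] = defaultdict(set)
    for seq_id, motifs in motifs_by_seq.items():
        for motif in motifs:
            clades_with_motif[motif].add(clade_of[seq_id])
    exclusive: Dict[str, Set[str]] = defaultdict(set)
    for motif, clades in clades_with_motif.items():
        if len(clades) == 1:
            (only_clade,) = clades
            exclusive[only_clade].add(motif)
    return dict(exclusive)


# Hypothetical toy data.
motifs_by_seq = {
    "s1": {"M1", "M26"}, "s2": {"M1", "M26"},
    "s3": {"M1", "M33", "M39"}, "s4": {"M1", "M39"}, "s5": {"M39"},
}
clade_of = {"s1": "TLI", "s2": "TLI", "s3": "therostasin", "s4": "antistasin", "s5": "antistasin"}
print(clade_exclusive_motifs(motifs_by_seq, clade_of))
```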
Fig. 2

Motif distribution on the best-scoring ML tree, based on the MEME Suite. (A) Clade 1; (B) Clade 2; (C) Antistasin clade; (D) TLI clade; (E) Therostasin clade. Each color denotes a different motif. Archetypal leech bioactive proteins are denoted in bold. Red and black branches represent leech and non-leech sequences, respectively.

Fig. 3

Alignment of antistasin-like proteins isolated from Oligochaeta and Hirudinea: (A) archetypal antistasin (AAA29193) and ghilanten (P16242); (B) archetypal theromin (P82354) and therostasin (Q9NBW4); (C) archetypal poecistasin, piguamerin (P81499), guamerin (AAD09442), hirustasin (P80302), and bdellastasin (P82107); and (D) contig BP524404 from Eisenia andrei, BF422342 from Lumbricus rubellus, therostasin (Q9NBW4), theromin (P82354), guamerin (AAD09442), and antistasin (AAA29193). See text for further discussion.


Discussion

Anticoagulants are important evolutionary innovations insofar as they allow bloodfeeders to feed efficiently. The results presented here provide much-needed data for understanding the origin and evolution of antistasin-like anticoagulants in leeches. Including oligochaete and polychaete antistasin-like sequences in a phylogenetic context provides an objective root for the gene tree. It also allowed us to recognize the major protein-level changes in each group of orthologous proteins. Additionally, the detailed analyses of conserved motifs allowed us to compare the composition and number of motifs in antistasin-like domains across orthologous groups. These analyses also identified major changes within antistasin-like sequences over evolutionary time. Our phylogenetic analysis suggests that the archetypal antistasin-like proteins isolated from leech saliva belong to three orthologous groups: 1) TLIs, 2) antistasins, and 3) therostasins. Beyond these, two additional clades (clades 4 and 6 in fig. 1) likely represent new anticoagulants that have yet to be named. Without formal functional analyses, however, we refrain from naming these putative anticoagulants here. Antistasin-like proteins have been recorded in non-annelid taxa. Even so, previous studies on the evolution and classification of anticoagulants have left gene trees unrooted because of the lack of comparative data from outgroup taxa. The issues of long branches and random rooting are discussed in greater detail elsewhere (e.g., Wheeler 1990; Graham et al. 2002; Rosenfeld et al. 2012). Importantly, the position of the root can render clans paraphyletic if the root falls within a given clan (a clan is the equivalent of a monophyletic group in an unrooted tree; see Wilkinson et al. 2007). Therefore, orthology determination and any nomenclatural system based on phylogenetic relationships depend on the root of the tree.
Moreover, the lack of a root has so far prevented any conclusion about the direction of evolution within leech anticoagulants. We rooted our tree at a sequence from L. bermudensis, a polychaete in the family Terebellidae, and included oligochaete sequence data to provide a meaningful outgroup. We also explored the possibility of a non-annelid root. However, the resulting topology requires at least two additional origins of leech antistasin-like proteins and is discordant with previous studies (Amorim et al. 2015; Iwama et al. 2019; Kvist et al. 2020).
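The dependence of clades on the root can be made concrete with a small sketch. An unrooted tree, stored as an adjacency map, is rooted on the edge leading to an outgroup leaf. The leaf set below each internal node is then collected as a clade. In the toy tree, the clan {C, D, E} is a clade when the tree is rooted at A. It is not a clade when the tree is rooted at E, because the root then falls within that clan. The tree, node names, and function name are hypothetical.

```python
from typing import Dict, FrozenSet, Set

UnrootedTree = Dict[str, Set[str]]  # node -> neighbouring nodes


def clades_rooted_at(tree: UnrootedTree, outgroup: str) -> Set[FrozenSet[str]]:
    """Root on the edge leading to the leaf `outgroup`; return the leaf set of every internal node."""
    (anchor,) = tree[outgroup]  # a leaf has exactly one neighbour
    clades: Set[FrozenSet[str]] = set()

    def descend(node: str, parent: str) -> FrozenSet[str]:
        children = [n for n in tree[node] if n != parent]
        if not children:  # reached a leaf
            return frozenset([node])
        leaves = frozenset().union(*(descend(child, node) for child in children))
        clades.add(leaves)
        return leaves

    descend(anchor, outgroup)
    return clades


# Hypothetical unrooted tree: leaves A-E, internal nodes x, y, z.
tree = {
    "A": {"x"}, "B": {"x"}, "x": {"A", "B", "y"},
    "y": {"x", "z", "E"}, "z": {"y", "C", "D"},
    "C": {"z"}, "D": {"z"}, "E": {"y"},
}
cde = frozenset({"C", "D", "E"})
print(cde in clades_rooted_at(tree, "A"), cde in clades_rooted_at(tree, "E"))  # True False
```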

Classification of Leech Antistasin-like Proteins

The classification of leech anticoagulants is fraught with problems. A clear example is the classification and separation of hirudin and hirudin-like factors (Müller et al. 2016, 2017, 2019). Hirudin-like factors share several compositional characteristics with hirudin, making them a target for BLAST-based orthology determination, yet all but one lack the critical capacity for anticoagulation. As a result, Müller et al. (2020) suggested a classification system for hirudin and hirudin-like factors that hinges entirely on the function of these proteins. However, this system does not reflect the historical events that led to the diversification of the proteins. Herein, we suggest a classification scheme for antistasin-like proteins based on orthology as inferred by phylogenetic analysis (fig. 1). Our results corroborate the findings of Iwama et al. (2019). We suggest the following names for the three major groups of leech antistasin-like proteins: 1) the TLI clade, 2) the antistasin clade, and 3) the therostasin clade. The TLI clade is formed by guamerin, piguamerin, bdellastasin, hirustasin, and poecistasin. Guamerin is a leukocyte elastase inhibitor, whereas bdellastasin, hirustasin, piguamerin, and poecistasin are strong inhibitors of trypsin and plasmin (Söllner et al. 1994; Kim and Kang 1998; Moser et al. 1998; Rester et al. 1999; Tang et al. 2018). The antistasin clade comprises three components: archetypal antistasin, isolated from the leech Haementeria officinalis de Filippi, 1849; ghilanten, isolated from Haementeria ghilianii de Filippi, 1849; and sequences from both proboscis- and non-proboscis-bearing leeches (fig. 1). Previous studies have pointed to the sister relationship of these two archetypal antistasin-like anticoagulants (with very short branch lengths). They have also pointed to the high sequence (95.8%) and functional similarity between them, and have suggested their synonymization (Kvist et al. 2014; Amorim et al. 2015; Iwama et al. 2019).
However, two gaps have prevented the determination of orthologous groupings within this protein family: the lack of a root for the antistasin-like gene tree, and insufficient comparative data across the diversity of Clitellata. This, in turn, has prevented the adoption of a single name for these highly similar proteins. Our phylogenetic analysis confirms the sister relationship of antistasin and ghilanten, and the motif prediction reiterates their similarity in primary structure. Moreover, previous studies demonstrated that both inhibit hemostasis by targeting factor Xa (Nutt et al. 1988; Condra et al. 1989; Dunwiddie et al. 1989; Blankenship et al. 1990; Lapatto et al. 1997). Based on these findings, we suggest the formal synonymization of antistasin and ghilanten: antistasin has seniority and should therefore be the formal name used for this group of anticoagulants. In our rooted tree, as in Iwama et al. (2019), therostasins form a monophyletic group, separate from the other clades in the tree (fig. 1). Archetypal therostasin and theromin are sister to each other, as reported by Amorim et al. (2015). However, unlike antistasin and ghilanten, these two anticoagulants are functionally divergent. Therostasin is a strong factor Xa inhibitor, whereas theromin is a thrombin inhibitor (Chopin et al. 2000; Salzet et al. 2000). Moreover, the genetic distance between these two anticoagulants is much greater (79.7% amino acid similarity) than that between antistasin and ghilanten (95.8% similarity). Also, the motif analysis revealed that M1 and M33 (fig. 2) are absent from the theromin sequence. This motif is conserved in all previously identified members of the therostasin clade (except for the early-diverging Haemadipsa interrupta (Moore, 1935) sequence). Therefore, the distribution of motifs within the therostasin clade supports the nomenclatural differentiation between therostasins and theromin.
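The text does not state exactly how the 95.8% and 79.7% values were computed. As one common convention, the sketch below computes percent identity over aligned columns where neither sequence has a gap. A similarity score that also counts conservative substitutions (e.g., positive BLOSUM62 scores) would give higher values. Genetic distance can then be taken as 100 minus this percentage. The function name and example sequences are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences over columns where neither has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    identical = sum(1 for x, y in compared if x == y)
    return 100.0 * identical / len(compared)


# Hypothetical aligned fragments.
identity = percent_identity("AC-DE", "ACCDF")
print(identity, 100.0 - identity)  # 75.0 25.0
```

Excluding gapped columns means that a long indel, such as the one described for theromin below, does not by itself lower the identity value. Whether the reported figures treat gaps this way is an open assumption.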
Our results also necessitate broadening the definition of antistasin-like proteins to include direct thrombin inhibitors, as well as trypsin, leukocyte elastase, plasmin, and factor Xa inhibitors. Clades 1–6 do not contain any well-characterized proteins, and not all of them have strong nodal support. Because of the low nodal support for some of these clades, we cannot robustly infer whether they are antistasin-like proteins, therostasin-like proteins, or a new lineage of anticoagulants. We therefore choose not to name these clades.

Functional Diversity of Antistasin-like Proteins in Leeches

The TLI clade is formed by archetypal anticoagulants that have been functionally characterized, although their role in antihemostasis in leeches is still not completely understood. Guamerin is the only archetypal protein within this clade that does not inhibit trypsin or plasmin. Instead, guamerin is a leukocyte elastase inhibitor. It inhibits several neutrophil serine proteases involved in inflammation and could influence immunological responses (Jung et al. 1995). Bdellastasin, poecistasin, hirustasin, and piguamerin are strong inhibitors of plasmin and trypsin. Plasmin is a key component of the fibrinolysis pathway and degrades several blood plasma proteins, including the fibrin and fibrinogen that form the blood clot. It therefore seems counterintuitive that leeches would use antiplasmin proteins to inhibit hemostasis. However, these four archetypal anticoagulants also inhibit trypsin. The effect of the serine protease trypsin on blood coagulation has long been described, and it is analogous to that of factor Xa (Ferguson et al. 1960). In fishes, trypsin secretion is localized in the nasopharyngeal mucosa and gill tissues, and its effect on hemostasis in gill blood has been demonstrated (Kim et al. 2009). The effect of trypsin inhibition on hemostasis has yet to be robustly demonstrated for other animal taxa. However, an inhibitory impact on factor IX, factor VII, and/or factor V (as in fishes) would suggest that these proteins play important roles in antihemostasis (Furie and Furie 2005; Kim et al. 2009; Tavares-Dias and Oliveira 2009). Our alignment of antistasin and ghilanten shows that differences in primary structure between these two anticoagulants are concentrated in their N-terminal domains (fig. 3), including their reactive sites contained in M2.
Previous studies have linked changes in the reactive sites and exosite-binding regions of antistasin to subtle changes in anticoagulant potency (Dunwiddie et al. 1989; Blankenship et al. 1990). The alignment of the sequences of the functionally divergent therostasin and theromin (fig. 3) revealed relatively high sequence similarity (79.7%), except in the C-terminal region, where theromin has an 8-residue gap (between residues 53 and 54). Interestingly, this gap region is aligned with M33, which is present in all other therostasins included in this study but absent from theromin (fig. 2), suggesting a deletion event affecting this motif in theromin. The biological significance, if any, of M33 is still unknown, but our tree indicates that this motif originated within the therostasin clade and was subsequently lost in theromin. This loss could result either from the accumulation of mutations following gene duplication or from the expression of different splice variants, a common source of toxin variation in other animal taxa (Siigur et al. 2001; Fry et al. 2010; Haney et al. 2019). Another gap is present in the C-terminal region of theromin, between residues 37 and 38. These residues fall within the putative active site of therostasin, suggesting that this region is degraded in theromin; this is further supported by an amino acid substitution (I/C) in this region of the theromin sequence.
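Figures such as a percent similarity and the residue positions that flank a gap are read off a pairwise alignment. The Python below is a minimal sketch of that bookkeeping. It computes percent identity over columns where neither sequence is gapped, lists gap runs, and reports the residues on either side of each run. The two sequences are short hypothetical placeholders, not therostasin or theromin, and ignoring gapped columns is an assumption of this sketch. The 79.7% "similarity" reported above may have been computed under a different definition (for example, one that also counts conservative substitutions), so this code is not expected to reproduce it.

```python
from typing import List, Tuple


def pairwise_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    paired = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not paired:
        return 0.0
    matches = sum(1 for a, b in paired if a == b)
    return 100.0 * matches / len(paired)


def gap_runs(seq: str, gap: str = "-") -> List[Tuple[int, int]]:
    """Return (start_column, length) for each run of gaps, using 0-based columns."""
    runs = []
    start = None
    for col, ch in enumerate(seq):
        if ch == gap and start is None:
            start = col
        elif ch != gap and start is not None:
            runs.append((start, col - start))
            start = None
    if start is not None:  # a gap run that reaches the end of the alignment
        runs.append((start, len(seq) - start))
    return runs


def flanking_residues(seq: str, gap_start: int, gap: str = "-") -> Tuple[int, int]:
    """1-based residue numbers of `seq` on either side of a gap run starting at gap_start."""
    before = sum(1 for ch in seq[:gap_start] if ch != gap)
    return before, before + 1


# Hypothetical toy alignment (not real leech sequences).
aligned_a = "ACDEFGHIKL"
aligned_b = "ACDE---IKM"

print(f"identity: {pairwise_identity(aligned_a, aligned_b):.1f}%")
for start, length in gap_runs(aligned_b):
    left, right = flanking_residues(aligned_b, start)
    print(f"{length}-residue gap in sequence b between residues {left} and {right}")
```

For this toy alignment, the sketch reports 85.7% identity over the seven ungapped columns and a 3-residue gap between residues 4 and 5 of sequence b, which is the same style of positional description used for the theromin gaps.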

Origin of Leech Salivary Antistasin-like Proteins

Although the archetypal antistasin was originally described from the glossiphoniid leech Haementeria officinalis (Nutt et al. 1988; Condra et al. 1989; Dunwiddie et al. 1989), several studies have recorded antistasin-like serine protease inhibitors in non-leech taxa, such as oligochaetes, polychaetes, and cnidarians (Holstein et al. 1992; Lee et al. 2010; Mehr et al. 2015). Their function is still largely unknown, but it has been suggested that they might be involved in physiological processes other than antihemostasis (Kwak et al. 2019), such as immune response (Nikapitiya et al. 2010). There therefore appears to be a marked functional shift between non-leech antistasin-like proteins and leech salivary antistasin-like proteins that may be related to bloodfeeding. This shift raises questions about the origin and evolution of anticoagulatory capabilities within the antistasin family of serine protease inhibitors, including the structural and sequence changes involved in the inhibition of hemostasis. The presence of non-leech antistasin-like sequences in clades 4 and 5 necessarily invokes at least two independent origins of leech antistasin-like proteins: one for clade 4 and one for the factor Xa and TLI inhibitors. Clades 4 and 5 lack functionally characterized anticoagulants, so their proteins may not be involved in anticoagulation or bloodfeeding. In fact, BLAST searches of the final antistasin-like data set (formed exclusively by antistasin-like domains) recovered hits against the leech cocoon protein described by Mason et al. (2004), which is formed by six repeats of antistasin-like and NOTCH domains (Mason et al. 2006). Moreover, a recent study has suggested that proteins with antistasin-like domains might be involved in segment polarity signaling (Kwak et al. 2019).
By contrast, archetypal anticoagulants in the TLI and the factor Xa inhibitor clades are produced and secreted by leech salivary gland cells and have been shown to possess anticoagulatory capabilities. The topology shown in figure 1 suggests a single origin of TLIs and factor Xa inhibitors, and that the ancestor of Hirudinida was capable of inhibiting factor Xa, trypsin, and/or leukocyte elastase. Over the last decade, it has been proposed that antistasin, hirudins, mannilase, piguamerin, bdellastasin, destabilase, and leech-derived tryptase inhibitors in modern leeches derive from orthologs present in the ancestor of Hirudinida (Min et al. 2010; Kvist et al. 2011, 2013; Siddall et al. 2016; Tessler, Marancik, et al. 2018; Iwama et al. 2019; Babenko et al. 2020). Nevertheless, these analyses lacked a robust method for determining orthology and/or an objective root for the trees. These methodological difficulties limit any evolutionary conclusions regarding the origin of anticoagulants.
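The inference that non-leech sequences nested among leech sequences imply more than one origin is, at its core, a parsimony argument on a rooted gene tree. As a sketch, the Python below applies Fitch's small-parsimony algorithm to a binary character (1 = leech sequence, 0 = non-leech sequence) on two hypothetical trees: one in which a non-leech sequence splits the leech sequences, and one in which all leech sequences form a single clade. The topologies and tip names are invented for illustration and are not the trees in figure 1. Fitch's count gives only the minimum number of state changes; reading those changes as gains (origins) rather than losses depends on rooting the tree with non-leech outgroups.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: either a tip label or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Fitch small parsimony: (candidate state set at this node, minimum changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change is needed here.
        return shared, left_cost + right_cost
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical tip states: 1 = leech sequence, 0 = non-leech sequence.
states = {
    "cnidarian": 0,
    "oligochaete_1": 0,
    "oligochaete_2": 0,
    "leech_clade4": 1,
    "leech_factorXa": 1,
    "leech_TLI": 1,
}

# A non-leech sequence nested among leech sequences (a mixed clade).
split_tree: Tree = (
    ("cnidarian", "oligochaete_1"),
    (("oligochaete_2", "leech_clade4"), ("leech_factorXa", "leech_TLI")),
)
# All leech sequences forming a single clade.
single_clade_tree: Tree = (
    ("cnidarian", "oligochaete_1"),
    ("oligochaete_2", ("leech_clade4", ("leech_factorXa", "leech_TLI"))),
)

for name, tree in [("split", split_tree), ("single clade", single_clade_tree)]:
    _, changes = fitch(tree, states)
    print(f"{name}: minimum number of state changes = {changes}")
```

The mixed topology requires two changes, whereas the single-clade topology requires one; with a non-leech state fixed at the root, these correspond to two independent origins of the leech state versus one.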

Evolution of Leech Antistasin-like Anticoagulants

Holstein et al. (1992), in their description of the Hydra antistasin-like serine protease inhibitor, noticed that the sequence was composed of six internal repeats of 25 or 26 amino acids and that these repeats shared the same distribution of cysteines as the C-terminal region of the two antistasin-like domains in the archetypal leech antistasin. They therefore suggested that members of the antistasin-like family of serine protease inhibitors evolved by duplication of the repeated internal folds that form the domains in the archetypal antistasin. In our analyses, all oligochaete antistasin-like sequences possess multiple domains that are shared by all sequences in non-leech clades. These results suggest that the ancestral leech antistasin was a multidomain protein (with two to six domains). On the other hand, sequences in the TLI clade, as well as in the antistasin and therostasin clades, exhibit a reduced number of domains predicted by InterProScan, and the motif analysis suggests that the antistasin-like domains present in leech sequences predate bloodfeeding and the diversification of Hirudinida (figs. 2 and 3). Despite efforts to elucidate the evolution and origin of leech anticoagulants, few studies have focused on the specific protein-level changes that resulted in anticoagulatory activity. One of the few recent attempts demonstrated the importance of specific residues in different regions of hirudin and hirudin-like factors for anticoagulatory activity (Müller et al. 2020). However, the direction of changes within each orthologous group was not resolved, because no phylogenetic hypothesis, let alone a properly rooted tree, was available. In hirudins and hirudin-like factors, this task is particularly difficult because these proteins seem to be restricted to Hirudiniformes, which are thought to have a bloodfeeding ancestor (Iwama et al. 2019).
Unlike hirudins and hirudin-like factors, antistasin-like proteins are widely distributed across leeches and their relatives. However, most proteins belonging to the antistasin family possess a variable number of antistasin-like domains repeated in tandem (fig. 3), which complicates comparisons between domains of different antistasin-like proteins because orthology between domains is difficult to establish. In the present study, this problem was partially alleviated by comparing the distribution and organization of motifs among antistasin-like domains. Sequence motifs are a common resource for establishing homology between distantly related domains, and some important tools for protein and domain identification (e.g., PROSITE profiles and PRINTS) are based on motif identification (Attwood et al. 1994; Sigrist et al. 2002; Attwood et al. 2012; Karlin and Belshaw 2012; Gerdol et al. 2019). Moreover, several studies have attempted tentative functional characterization based on the presence and organization of motifs associated with a specific function within a protein family (e.g., Fang et al. 2016; Müller et al. 2016; Wang et al. 2019). Our motif analyses identified putative homology between domains belonging to different orthologous groups of leech and oligochaete antistasin-like proteins. In the archetypal antistasin and ghilanten, the two tandem domains seem to be homologous to two domains present in the oligochaete sequences (fig. 3). The first domain is characterized by the presence of M2 and the second by the presence of M1. Furthermore, the N-terminal region of each domain seems to be modified relative to the oligochaete sequences, with the acquisition of M31 and M39, respectively, in the leech sequences. The alignment shown in figure 3 failed to capture the putative homologous relationship between domains in antistasin and the oligochaete sequences.
This might be due to artifacts in the motif analysis, because we expect high similarity between tandem domains. Another explanation is a major domain rearrangement during the evolution of antistasin-like proteins, a common process in protein evolution (Björklund et al. 2005; Bornberg-Bauer et al. 2005). The archetypal proteins in the TLI clade are all formed by a single domain characterized by M1 and by M26 in an N-terminal position, suggesting homology with the oligochaete domains that are also characterized by M1, and an acquisition of M26 in the TLI clade. The archetypal therostasin is formed by two antistasin-like domains, characterized by M1 and M6. Moreover, the presence of M4 in the C-terminal region of the second domain represents a putative modification in the therostasin clade. Unlike in the other members of the therostasin clade, only the M6-characterized domain is predicted in theromin. The sequence alignment of therostasin and theromin demonstrates that most changes in theromin are located in the C-terminal region of the alignment, within the second antistasin-like domain (fig. 3). This may explain why M1 is not predicted to be present in the second domain of theromin. Although the motif analysis and domain prediction successfully identified changes within each orthologous group of leech antistasin-like proteins, the role of these changes in hemostasis inhibition (if any) is still unknown. Therefore, it is not yet possible to directly link these motifs and domain numbers to anticoagulatory activity. Future efforts should focus on functionally characterizing antistasin-like proteins in order to better understand the transitions related to bloodfeeding and the significance of each identified motif in the inhibition of hemostasis.
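Comparing tandem domains by their motif content can be sketched as a set comparison. The Python below scores each leech domain against each oligochaete domain with the Jaccard index of their motif sets, then reports the best-scoring oligochaete domain and the motifs found only in the leech domain (candidate acquisitions). The motif compositions are a simplified, hypothetical encoding of the pattern described above for antistasin (M31 and M2 in the first domain, M39 and M1 in the second), set against oligochaete domains reduced to M2 and M1 alone; all domain labels are invented. The Jaccard score is an assumption of this sketch, not the method used in the study, and it ignores motif order, which the text treats as informative. Shared motifs suggest, but do not establish, homology.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |a & b| / |a | b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_domain_matches(
    query: Dict[str, List[str]], reference: Dict[str, List[str]]
) -> Dict[str, Tuple[str, float, Set[str]]]:
    """For each query domain: the most motif-similar reference domain, its score,
    and the query motifs absent from that reference domain."""
    matches = {}
    for q_name, q_motifs in query.items():
        q_set = set(q_motifs)
        # Ties are resolved in favor of the first reference domain listed.
        best = max(reference, key=lambda r: jaccard(q_set, set(reference[r])))
        score = jaccard(q_set, set(reference[best]))
        matches[q_name] = (best, score, q_set - set(reference[best]))
    return matches


# Hypothetical, simplified motif content per domain (N- to C-terminal order is ignored).
leech_domains = {
    "antistasin_domain1": ["M31", "M2"],
    "antistasin_domain2": ["M39", "M1"],
}
oligochaete_domains = {
    "oligochaete_domainA": ["M2"],
    "oligochaete_domainB": ["M1"],
}

results = best_domain_matches(leech_domains, oligochaete_domains)
for leech_name, (olig_name, score, gained) in results.items():
    print(f"{leech_name} ~ {olig_name} (Jaccard {score:.2f}); leech-only motifs: {sorted(gained)}")
```

Under these toy inputs, the first leech domain pairs with the M2-bearing oligochaete domain and the second with the M1-bearing one, with M31 and M39 flagged as leech-only motifs. An order-aware comparison (for example, aligning motif strings) would be a natural refinement when motif organization matters.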

Supplementary Material

Supplementary data are available at Genome Biology and Evolution online.