A genome-wide analysis of annexins from parasitic organisms and their vectors.

Cinzia Cantacessi, Jennifer M Seddon, Terrence L Miller, Chiuan Yee Leow, Laëtitia Thomas, Lyndel Mason, Charlene Willis, Giselle Walker, Alex Loukas, Robin B Gasser, Malcolm K Jones, Andreas Hofmann.

Abstract

In this study, we conduct an in-depth analysis of annexin proteins from a diverse range of invertebrate taxa, including the major groups that contain the parasites and vector organisms that are harmful to humans and domestic animals. Using structure-based amino acid sequence alignments and phylogenetic analyses, we present a classification for this protein group and assign names to sequences with ambiguous annotations in public databases. Our analyses reveal six distinct annexin clades, and the mapping of genes encoding annexins to the genome of the human blood fluke Schistosoma mansoni supports the hypothesis of gene duplication as a major evolutionary event in annexin genesis. This study illuminates annexin diversity from a novel perspective using contemporary phylogenetic hypotheses of eukaryote evolution, and will aid the consolidation of annexin protein identities in public databases and provide a foundation for future functional analysis and characterisation of these proteins in parasites of socioeconomic importance.

Substances: Annexins

Year: 2013 PMID: 24113121 PMCID: PMC3795353 DOI: 10.1038/srep02893

Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379

Annexins are a large family of proteins which are widely expressed across all eukaryotes and play key roles in a range of fundamental biological activities, including calcium metabolism, cell adhesion, growth and differentiation and subcellular transport [1], as well as membrane repair [2]. In parasites, annexins are considered to play critical roles in mechanisms linked to their survival, including the maintenance of cell structure integrity and modulation of the immune responses of the vertebrate hosts [3]. Due to their location at the host-parasite interface and their immunogenic properties, these parasite annexins have been proposed as potential targets for the development of novel drug and vaccine candidates [3,4]. Structurally, annexins are characterised by a C-terminal domain composed of four homologous repeats of ~70 amino acids in length. The homologous domains often contain the characteristic endonexin sequence (K-G-X-G-T), which structurally translates into a type II calcium binding site with a high affinity for calcium and phospholipids [5]. The variable N-terminal domain harbours sites for post-translational modifications and protein-protein interactions [1]. Previous studies [6] have demonstrated that the evolution of annexins has been characterised by successive gene duplication events, which have led to the expansion and diversification of annexin-encoding genes in vertebrates, invertebrates, plants and protists. Despite the substantial amino acid sequence similarities, sequence variants in different groups of eukaryotes are associated with structural features and biochemical properties, resulting in functional differences that are specific to each eukaryote group [3,7]. Based on the classification proposed by Fernandez and Morgan [8], which integrated the use of phylogenetic analyses of amino acid sequences with gene structural analyses and genetic linkage maps, annexins are grouped into distinct families that correspond to the evolutionary divisions of the eukaryotes.
This classification system, endorsed by the 50th Harden Conference (First International Annexin Conference, Wye College, UK, Sept 1–5, 1999), led to the current annexin nomenclature, which includes ‘A’ (from vertebrates, including humans), ‘B’ (invertebrates, including parasitic helminths), ‘C’ (fungi), ‘D’ (plants), and ‘E’ (protists) annexins [8]. Within the A annexins, a total of 12 distinct sequences have been described and assigned the identifiers A1–A13 (annexin A12 being unassigned), while annexins in families B through E are numbered progressively based on their presumed evolutionary distance from the A annexins. However, newly identified annexin sequences are usually named based on identity with the first hit on a BLAST search and without consideration of the family as a whole, thus leading to ambiguity in identity and relationship to other annexins. The vast majority of these proteins described to date have been detected in animals, plants and fungi. Systematic studies over the last 15 years have shown that each of these multicellular taxa arose independently from unicellular protists. As more genomic data for protists and multicellular eukaryotic groups become available, the annexin nomenclature will become increasingly complex. There has been significant confusion and inconsistency in the classification and nomenclature of annexins within Group B, including those of parasitic helminths. Recent advances in high-throughput sequencing and bioinformatics have resulted in an explosion of large-scale genomic and transcriptomic studies of parasitic helminths [9–13] and, in turn, of the sequence data deposited in public databases for a range of helminth species of medical and veterinary importance.
These advances have exacerbated inconsistencies in the classification of B annexins; for example, an annexin from the human blood fluke Schistosoma japonicum (gb:CAX82892) is currently designated as ‘annexin A13’ instead of carrying a ‘B’ identifier, and ‘annexin B2’ has been assigned to two distinct proteins, one from the human tapeworm Taenia solium (gb:AAY17503) and one from Schistosoma mansoni (up:G4VL68) [14]. Given the biological significance of parasite annexins, implementing a rational and consistent nomenclature for these proteins will promote structural and functional investigations of individual members of this protein family, and thus assist future studies aimed at elucidating their role(s) in host-parasite interactions and the modulation of the hosts' immune response. In the present study, we (i) prepared a comprehensive, secondary structure-based sequence alignment of B annexins from a range of parasitic helminths of public health and veterinary importance (including the blood flukes Schistosoma spp., the carcinogenic liver flukes Clonorchis sinensis and Opisthorchis viverrini and the hookworm Necator americanus) and some parasite vectors available in public databases, (ii) inferred phylogenetic relationships and (iii) proposed a nomenclature of B annexins considering secondary structure, characteristic protein signature/motifs, taxonomic features and evolutionary distance from a corresponding vertebrate homolog (A annexin). We also mapped the various annexin groups to clades arising from the contemporary hypotheses of eukaryote evolution as presented by Walker and colleagues [15].
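
The endonexin sequence (K-G-X-G-T) described in the introduction lends itself to a simple pattern search. The following is a minimal sketch, not the authors' method: it assumes a strict match to the five-residue pattern, whereas real annexin repeats may carry substitutions that a strict pattern would miss. The function name and the toy sequence are hypothetical.

```python
import re

# Endonexin motif as written in the text: K-G-X-G-T, where X is any residue.
# Assumption: a strict match; degenerate variants are not considered here.
ENDONEXIN = re.compile(r"KG.GT")


def find_endonexin(seq: str) -> list[int]:
    """Return 1-based start positions of every K-G-X-G-T match in seq."""
    return [m.start() + 1 for m in ENDONEXIN.finditer(seq.upper())]


# Hypothetical toy sequence for illustration only (not a real annexin).
toy = "MAAKGFGTDEAAILKGLGTPP"
print(find_endonexin(toy))  # [4, 15]
```

Counting motif presence per annexin repeat would additionally require repeat boundaries, for example taken from a structure-based alignment; those are not modelled in this sketch.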

Results

Identification of annexins

In order to construct a dataset of putative Group B annexin sequences, we searched genomic sequences of a total of 35 species from 12 invertebrate and protistan phyla. After identification and verification, amino acid sequences from 28 species were confirmed as putative functional annexins (see Table 1) with amino acid identities ranging from 0.17 to 0.84. The structure-based alignment of the amino acid sequences is provided in Supplementary Figure S2. The published sequence of annexin (Sm)5 (new name: annexin B7a) from S. mansoni (gi:256084742) lacked 75 amino acids corresponding to positions spanning D91-V165.

Table 1

Organisms searched for annexin proteins

Phylum	Class	Family	Organism	Full genome sequenced	No of full-length annexin sequences	No of partial annexin sequences	Full-length annexin proteins (no of putative isoforms)	URL
Apicomplexa	Aconoidasida	Plasmodiidae	Plasmodium falciparum*	yes	0	0		1
Arthropoda	Arachnida	Ixodidae	Ixodes scapularis	yes	1	0	B28	1
	Insecta	Bombycidae	Bombyx mori	yes	4	0	B11 (2), B17 (2)	1
	Insecta	Culicidae	Aedes aegypti	yes	6	0	B9 (2), B17 (4)	1
	Insecta	Culicidae	Anopheles gambiae	yes	4	1	B9, B17 (3)	1
	Insecta	Culicidae	Culex quinquefasciatus	yes	5	0	B9 (2), B17 (3)	1
	Insecta	Drosophilidae	Drosophila melanogaster	yes	5	0	B9 (3), B11 (2)	1
	Insecta	Pediculidae	Pediculus humanus	yes	5	0	B9 (2), B11, B17 (2)	1
Cnidaria	Hydrozoa	Hydridae	Hydra vulgaris	yes	1	0	B12	1
	Hydrozoa	Hydridae	Hydra magnipapillata	yes	3	0	B4, B12 (2)	1
Mollusca	Gastropoda	Pomatiopsidae	Oncomelania hupensis	no	0	0		1
Nematoda	Adenophorea	Trichinellidae	Trichinella spiralis	yes	0	0		1
	Chromadorea	Ascarididae	Ascaris suum*	yes	7	0	B8, B19 (3), B21 (3)	1
	Chromadorea	Onchocercidae	Brugia malayi*	yes	2	1	B37, B40	1
	Secernentea	Heteroderidae	Heterodera glycines	no	1	0	B19	1
	Secernentea	Rhabditidae	Caenorhabditis elegans	yes	5	0	B8 (2), B19, B21, B36	1
	Secernentea	Strongyloididae	Strongyloides ratti*	yes	0	1		4
	Secernentea	Trichostrongylidae	Haemonchus contortus	yes	0	1		2
	Secernentea	Uncinariidae	Necator americanus	yes	0	5		3
Placozoa	Tricoplacia**	Trichoplacidae**	Trichoplax adhaerens	yes	3	0	B4, B6, B26	1
Platyhelminthes	Cestoda	Taeniidae	Echinococcus granulosus*	yes	12	0	B1, B2, B3, B5, B15, B18, B20, B23, B24, B25, B33, B38	6
	Cestoda	Taeniidae	Taenia solium*	no	3	0	B1, B2, B3	1
	Monogenea	Microcotylidae	Microcotyle sebastis	no	1	0	B34	1
	Rhabditophora	Dugesiidae	Schmidtea mediterranea	yes	3	0	B5, B16, B27	8
	Trematoda	Fasciolidae	Fasciola gigantica	yes	3	4	B22, B30, B39	5
	Trematoda	Fasciolidae	Fasciola hepatica	yes	3	6	B5 (2), B7	5
	Trematoda	Opisthorchiidae	Clonorchis sinensis	yes	7	0	B5 (2), B7, B14, B22, B30, B35	5
	Trematoda	Opisthorchiidae	Opisthorchis viverrini	yes	5	2	B5 (3), B22, B30	5
	Trematoda	Schistosomatidae	Schistosoma haematobium*	yes	7	0	B5 (2), B7, B13, B22, B30, B32	7
	Trematoda	Schistosomatidae	Schistosoma japonicum*	yes	6	0	B5, B7, B22, B30, B32, B39	1
	Trematoda	Schistosomatidae	Schistosoma mansoni	yes	13	0	B5 (2), B7 (2), B10, B13, B22, B29, B30, B31, B32, B39 (2)	1
Porifera	Demospongiae	Spongillidae	Ephydatia fluviatilis	no	1	0	B4	1
Sarcomastigophora	Kinetoplastea	Trypanosomatidae	Leishmania braziliensis*	yes	0	0		1
	Kinetoplastea	Trypanosomatidae	Trypanosoma brucei*	yes	0	0		1
	Lobosea	Entamoebidae	Entamoeba histolytica**	yes	0	0		1

Sequences accessed at:

1 http://www.ncbi.nlm.nih.gov/BLAST/

2 http://bioinfosecond.vet.unimelb.edu.au/wblast3.html

3 http://bioinfosecond.vet.unimelb.edu.au/wblast4.html

4 http://www.sanger.ac.uk/cgi-bin/blast/submitblast/strongyloides

5 http://bioinfosecond.vet.unimelb.edu.au/wblast2.html

6 http://www.sanger.ac.uk/resources/downloads/helminths/echinococcus-granulosus.html

7 http://bioinfosecond.vet.unimelb.edu.au/

8 http://smedgd.neuro.utah.edu/

All taxonomy checked in Catalogue of Life at http://www.catalogueoflife.org/col, 11th March 2013 edition.

*not in Catalogue of Life; taxonomy thus taken from NCBI Taxonomy Browser.

**listed as not assigned.

Annexin sequences were detected in all species of invertebrates surveyed except the gastropod mollusc Oncomelania hupensis and the nematode Trichinella spiralis. No molluscan annexin sequences have yet been identified, although large-scale genomic datasets are available for two gastropods, Oncomelania hupensis, the pulmonate intermediate host (vector) of the trematode Opisthorchis viverrini, and the marine gastropod, the sea hare Aplysia californica [16]. Many invertebrates have multiple annexins. The human blood fluke S. mansoni (Platyhelminthes: Digenea) has 13, and 10 annexin sequences are currently known for the liver fluke Fasciola hepatica, although some of these are only partial. The searchable database of annexin proteins on the existing Annexin Website (http://www.annexins.org/) has been updated to include the sequences surveyed in the present study.

Phylogenetic analyses of group B annexins

Bayesian inference analysis of the structure-based amino acid sequence alignment (Supplementary Figure S2) resulted in a consensus tree with most of the putative B annexins forming clades with relatively high posterior probability (Figure 1). The maximum likelihood tree had a similar topology; however, confidence based on ML bootstrapping was low for many of the basal nodes.

Figure 1

Consensus Bayesian phylogram of B annexins based on Bayesian inference analysis of the structure-based amino acid sequence alignment.

Posterior probabilities and maximum likelihood bootstrap values, respectively, are shown at the nodes. ‘*’ indicates values <50%.

The Bayesian inference and maximum likelihood analyses differed in two instances. Maximum likelihood analysis placed the annexin from Microcotyle sebastis (gb:EU719209) with low bootstrap support in a clade together with annexins from T. solium (up:Q52MU2; B2) and Echinococcus granulosus (EG_04230; B2). In the Bayesian analysis, the M. sebastis sequence was not within the annexin B2 group, but was placed external to a clade containing B22 and B39 annexins with moderate support from posterior probability (Figure 1). A similar situation was encountered with an annexin from E. granulosus (EG_00675). Maximum likelihood analysis placed this annexin in the clade formed by annexin B7 sequences, but again with low bootstrap support (Figure 1). Bayesian inference, in contrast, placed the E. granulosus sequence distant from annexin B7, with strong posterior probability, and separate from other annexin clades. Since the Bayesian inference analysis offered stronger support for these groupings, we assigned these two sequences to their own clades, i.e. annexin B18 (E. granulosus, EG_00675) and annexin B34 (M. sebastis, gb:EU719209). Our present results indicate that two annexins, namely “AnxB13” from B. mori and “B2” from S. mansoni (up:C3VEV0) [17], should indeed be renamed as B11 and B30, respectively. The phylogenetic clades grouped strongly according to phyla (e.g., Arthropoda, Nematoda, Platyhelminthes) and in most cases according to class (e.g., Insecta, Cestoda, Trematoda), both at individual B-number groupings and in more basal major clades (see Figure 1). There are two basal clades of annexins (I and II) restricted to the phylum Platyhelminthes. In the phylum Arthropoda, two major clades (IV and V) are restricted to class Insecta and these clades grouped relatively close in both the Bayesian inference and maximum likelihood analyses.
The only other arthropod annexin included in these analyses, from the tick Ixodes scapularis (annexin B28, class Arachnida), grouped well outside of these two clades, more closely related to the platyhelminth clade II (Figure 1). A couple of notable exceptions included annexins B4 and B5. The annexin B4 group contained sequences derived from members of the Cnidaria (Hydra magnipapillata), Porifera (Ephydatia fluviatilis), and the basal metazoan phylum Placozoa (Trichoplax adhaerens) (see Figure 1). However, visual examination of the structure-based amino acid sequence alignment and the close relationship inferred between these annexin sequences in the phylogenetic analyses supported the grouping despite the relatively distant taxonomic relationships of these organisms across three phyla. The annexin sequences from these three phyla were grouped together in clade III, comprising basal organisms, with two annexins from S. mediterranea (see Figure 1), which may indicate an older origin for these latter sequences. The annexin B5 group contained sequences obtained from platyhelminths of the classes Cestoda, Trematoda and Turbellaria, strongly supporting the orthology of these sequences.

Mapping of genes coding annexins in the genome of Schistosoma mansoni - implications for annexin evolution

In most clades (Figure 1) there was a substantial number of putative paralogs or isoforms, indicated by letters appended to the annexin group number. However, the Platyhelminthes clade I consists mostly of annexin groups containing only or mainly orthologs. For example, there were large numbers of sequences retrieved for E. granulosus and S. mansoni. In clade I, all nine E. granulosus sequences were considered orthologs and for S. mansoni, three sequences were inferred as orthologs and four as paralogs. In contrast, there were seven paralogs but only one ortholog for Ascaris suum in clade VI. This pattern may indicate that there was significant gene duplication prior to the divergence of the Platyhelminthes, with each duplicated gene lineage passed into a range of species. However, not all lineages are represented in all species, arguing for a birth-and-death model of gene evolution. Alternatively, gene duplications or isoform development within species may be older in the Platyhelminthes than in the other phyla studied here. Older duplications will accumulate mutations with time, giving greater p-distances, and hence will be assigned different annexin numbers using the present approach.

By 2011, 81% of the S. mansoni genome could be mapped to the seven autosomal and one Z/W sex-determining chromosomes [18]. The majority of annexins are located on autosomal chromosomes 4 and 6 (Table 2). From our present data (Figure 1), it is clearly visible that sequences from the upper Platyhelminthes clade (I) are associated with S. mansoni chromosome 6, and sequences from the lower clade (II) are associated with chromosome 4. One could thus infer that multiple copies within each of these clades have arisen by successive duplication events.

Table 2

Genomic mapping of Schistosoma mansoni annexins

Annexin	GeneDB	Location	Chromosome	Clade
B5a	Smp_045560	4519381–4536000	4	II
B5b	Smp_045550	4548453–4568563	4	II
B10	Smp_146690	4476102–4503979	4	II
B13	Smp_045500	4628571–4637840	4	II
B29	Smp_207040	4583330–4595243	4	II
B31	Smp_045490	4611818–4625468	4	II
B7a	Smp_074140	286104–308545	6	I
B7b partial	Smp_162160	258618–271457	6	I
B22	Smp_074150	330560–360327	6	I
B30	Smp_077720	19430888–19460055	6	I
B32	Smp_164100	19468674–19494199	6	I
B39b	Smp_077880	19921307–19935211	6	I
B39b partial	Smp_201250	48936–58086	SC_0076 (chromosome 1)	I
B39b partial	Smp_201340	249520–254223	SC_0153 (chromosome 1)	I
B39a	Smp_155580	115733–134818	SC_0154	I
B39a partial	Smp_194120	504–4350	SC_0542	I
B39a partial	Smp_178820	1962573–1962761	SC_0041	I
B39b partial	Smp_205300	20951–26123	SC_0276	I
B39b partial	Smp_173300	79778–80578	SC_0154	I
B39b partial	Smp_173290	81573–82755	SC_0154	I

Sequences accessed at: http://www.genedb.org/Homepage/Smansoni.
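
The physical clustering implied by Table 2 can be checked directly from the listed coordinates. The sketch below is illustrative only: it uses the twelve genes that Table 2 places on chromosomes 4 and 6 (scaffold-only entries are omitted), and the 100 kb gap threshold, like the function name, is an arbitrary assumption rather than a value taken from the study.

```python
from collections import defaultdict

# (annexin, chromosome, start, end), copied from Table 2.
# Only genes placed directly on chromosomes 4 and 6 are included.
GENES = [
    ("B5a", "4", 4519381, 4536000),
    ("B5b", "4", 4548453, 4568563),
    ("B10", "4", 4476102, 4503979),
    ("B13", "4", 4628571, 4637840),
    ("B29", "4", 4583330, 4595243),
    ("B31", "4", 4611818, 4625468),
    ("B7a", "6", 286104, 308545),
    ("B7b", "6", 258618, 271457),
    ("B22", "6", 330560, 360327),
    ("B30", "6", 19430888, 19460055),
    ("B32", "6", 19468674, 19494199),
    ("B39b", "6", 19921307, 19935211),
]


def clusters(genes, max_gap=100_000):
    """Group genes per chromosome into runs whose neighbours are <= max_gap bp apart.

    max_gap is a hypothetical threshold chosen for illustration.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, name))
    result = {}
    for chrom, items in by_chrom.items():
        items.sort()  # order genes by start coordinate
        runs = [[items[0]]]
        for gene in items[1:]:
            previous_end = runs[-1][-1][1]
            if gene[0] - previous_end <= max_gap:
                runs[-1].append(gene)
            else:
                runs.append([gene])
        result[chrom] = [[g[2] for g in run] for run in runs]
    return result


for chrom, runs in sorted(clusters(GENES).items()):
    print(chrom, runs)
```

With these coordinates and this threshold, the six chromosome 4 genes form a single run spanning roughly 162 kb, while the chromosome 6 genes split into three groups (B7b/B7a/B22, B30/B32, and B39b alone). A tandem arrangement of this kind is what one would expect from successive local duplications, although the sketch itself does not test that hypothesis.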

The patterns in different species may provide clues as to when in evolution the duplication events occurred, although this is somewhat confounded by differing amounts of data in each species. The present tree (Figure 1) suggests that, for example, annexins B30 and B32 represent a duplication event only in trematodes. From the species involved, we hypothesise that clade II was the original platyhelminth annexin clade, given that Ixodes and Schmidtea are at its base, and that clade I arose from a duplication early in platyhelminth evolution.

Discussion

The annexin nomenclature and diversity largely reflect the early investigations of these molecules in “advanced” multicellular organisms, and the focus on the roles of these molecules in humans and mammalian models. As a result, the current annexin nomenclature scheme has an implicit understanding that the substantial diversity of annexin structure and function occurs within animals, fungi and plants, as four of five annexin groups are seen in these multicellular taxa. The ‘protistan’ annexins are grouped together as Group E annexins. Such categorisation, while convenient, cannot possibly be supported by modern concepts of eukaryotic phylogeny. Phylogenetic systematic studies [15,19] have broken down the traditional concepts of relationships of single-celled eukaryotes, resulting in a new system of highly divergent clades, thereby changing concepts from primitive stem-group protozoa and algae as precursors of ‘crown eukaryotes’, to diverse ‘supergroups’, two of which contain multicellular animals and fungi, and plants. The current view of eukaryotic systematics recognises six distinct lineages; in five of these, annexin sequences have been identified (Figure 2). The lineages are the opisthokonts (in which one finds Group A, B and C annexins), the archaeplastids (Group D annexins), the SAR (Stramenopile, Alveolate, Rhizarian) clade (annexins not yet categorised), the centrohelid-telonemid-haptophyte (CTH) clade (no annexins described), the excavates (Group E annexins) and the amoebozoa (Group C annexins). The CTH supergroup constitutes the only major clade in which annexin molecules are apparently absent. This observation is supported by the recent draft genomes for two of these groups, the haptophytes (Emiliania huxleyi) and the picobilophytes [20], which have not yielded any annexin sequences.

Figure 2

Eukaryote scheme and occurrence of annexins as known to date.

Denotation of ‘A’-‘E’ indicates presence of annexins in the groups listed. ‘Yes’ indicates presence of (full-length or partial) uncategorised annexin sequences. ‘None’ indicates absence of annexins to current knowledge.

Records of annexins in other supergroup members (see Figure 2) are fragmentary and reflect research attention paid to particular species of major significance to humans. Some interesting patterns emerge that raise questions about annexin and protist phylogeny.

Firstly, the monophyly of the metazoans is well supported. The Group B annexins separate strongly into clades reflective of the phylogenetic relationships of the organisms in which they are found. Thus, clades I and II are found largely in platyhelminths, while other clades are dominant in ecdysozoan lineages. Therefore, there is strong support for the three major animal lineages, lophotrochozoan, deuterostome and ecdysozoan, in annexin phylogeny. Annexins are also present in the closest relatives of the animals, the choanoflagellates.

Secondly, in the current nomenclature, the fungal annexins are classed along with those of dictyostelid and myxogastrid amoebozoans, as Group C annexins. Phylogenetic inferences based on numerous genes and cellular ultrastructure place the dictyostelids and myxogastrids firmly within the Amoebozoa and not within the opisthokonts with the fungi [15]. The prima facie case for including annexins from Dictyostelium discoideum or Physarum polycephalum with fungi in the Group C annexins is thus not supported by phylogeny. Interestingly, although the genome of the archamoeban Entamoeba histolytica has been described [21], no annexin sequences have been located in that species.

Group D annexins are those of plants and their phylogeny has recently been investigated in more detail [22,23]. Annexins are present in green algae including both the chlorophytes and streptophytes, but apparently are not in the red algae, a major clade of the supergroup Archaeplastida. The genomic sequences available for one cryptomonad, Guillardia theta, indicate the presence of annexins in this photosynthetic protist. G. theta is the product of secondary endosymbiosis, a process whereby the original non-photosynthetic cell incorporated a red algal cell [24]. Detailed phylogenetic analyses of group D annexins in relation to other holders of secondary red algal endosymbioses should unravel whether the annexins in that cell arose with the host cell or the endosymbiont.

The stramenopiles, alveolates and rhizarians (SAR) supergroup is a monophyletic but structurally and ecologically diverse clade. As the alveolates contain some very important human pathogens, genomic data on this clade are abundant. The human cerebral malaria parasite, Plasmodium falciparum, lacks annexin sequences and such sequences have not yet been detected among other alveolates, indicating that this clade may have lost these proteins. Other lineages in the SAR group do contain annexins. A single annexin sequence known for the Rhizaria occurs in the enigmatic Bigelowiella natans, a chlorarachniophyte that has also undergone secondary endosymbiosis. The hypothesis to test here is whether annexins have transferred between supergroups through secondary endosymbiosis.

Finally, the Group E annexins are found in two disparate groups of excavates, the diplomonads and parabasalids. Phylogeny of the Excavata indicates an early bifurcation into two lineages, the metamonads (including Giardia) and the Discoba (including the kinetoplastids and the amoeboflagellate Naegleria gruberi). Interestingly, although genomes for a number of parasitic species (e.g. trypanosomes) in the Discoba have been well described, annexins have only been detected in the metamonads, suggesting loss of these proteins from the discoban lineage.

The Bayesian inference and maximum likelihood analyses agree with respect to topology and nodal support for the majority of the clades containing the assigned Group B numbers (Figure 1).
Within the framework of the current annexin nomenclature, we have assigned novel annexins from parasitic organisms and parasite vectors to remedy ambiguous annexin names found in databases. The phylogenetic analyses conducted in this context show clearly that annexin diversity follows the phyla, and that within groups, there have been successive gene duplication events, as previously proposed [8,23]. The individual effects of diversification on the annexin and species phylogenies are difficult to determine. Clearly, the demarcation cannot be attributed to variable regions of these proteins, such as the N-terminal domain, which is divergent both within and between clades. In contrast, variations in canonical features are more suitable to study effects of diversification, and one such feature that is accessible at the level of primary structure is the presence or absence of the endonexin sequence. This motif, at the level of three-dimensional structure, is responsible for the canonical type II calcium binding in annexins. Indeed, common, but inconsistent patterns of presence or absence of these motifs emerge when examining Group B annexins (see Supplementary Figure S4). The most frequent lack of the endonexin sequence appears in repeat III (30 times), as compared to repeats I, II and IV (19, 13 and 10 times, respectively). Interestingly, the calcium-dependent membrane binding mechanisms of some invertebrate annexins may engage exclusively the canonical membrane binding site of the I/IV module (Leow et al., submitted). There is a trend towards loss of endonexin motifs in one or more annexin repeats in clades I (trematode) and VI (nematode), whereas the clades of insect annexins (clades IV and V) retain endonexin domains in all annexin repeats. Endonexin sequences are generally present in all four repeats in the basal invertebrates, although individual repeats of some basal annexins may have lost the motifs.
The trend towards partial or complete loss of the endonexin motif may reflect the early changes in cellular structure that led to evolution of the unique cellular architecture of helminths, notably those of the parasite groups. Current genome data is biased towards species with direct implications for humans, but future studies dissecting uncategorised annexins in supergroups such as the Rhizaria and Excavates (see Figure 2) will advance our understanding of molecular evolution. Intriguingly, instances of secondary endosymbiosis may be potentially complicating, but highly informative. Contemporary phylogenies in the past decade have postulated highly divergent eukaryotic clades, different from the traditional top-down concept with a “ladder” from amitochondriate parasites “up” to multicellular organisms [15]. This has led to parts of the traditional annexin nomenclature being unnecessarily confusing. A prominent example of the current annexin nomenclature resulting in complicated relationships is the case of Group C annexins, which appear in both the Amoebozoa (Dictyostelids, Myxogastrids) and the Opisthokonts (Fungi). With increasing amounts of genomic data becoming available, the nomenclature of annexins might benefit from some modifications, particularly considering changing inferences of eukaryotic evolution.

Methods

Sequence identification and secondary structure-based alignment

Putative annexin amino acid sequences (available in public databases) from 34 organisms representing 13 phyla (Table 1) were identified using the BLASTp and tBLASTn algorithms [25], and the corresponding nucleotide sequences were subsequently retrieved. Two search patterns were used (see Supplementary Table S1), namely the C-terminal domain of Anx(Sm)1 (gb:XP_002578586; 330 residues), as well as its first repeat only (71 residues). The selection focused on parasitic organisms and their vectors, but also included non-parasitic organisms representing annexins that have already been established in the literature. Secondary structure elements for each amino acid sequence were predicted using the software PSIPRED [26]. A secondary structure-based sequence alignment was generated automatically using the software SBAL [27], visually inspected and manually adjusted (see Supplementary Figure S2). For each annexin protein sequence, the corresponding cDNA was retrieved from public databases (see Table 1); subsequently, all cDNA sequences were aligned using ClustalW [28] with default parameters. The phyla from which representative annexins were obtained included a range of protistan (Amoebozoa: Archamoebae, Alveolata: Apicomplexa, Excavata: Euglenozoa) and animal (Placozoa, Radiata: Cnidaria, Lophotrochozoa: Mollusca and Platyhelminthes, Ecdysozoa: Nematoda and Arthropoda, Deuterostomia: Chordata) groups.

The gene encoding annexin (Sm)5 from S. mansoni (new name: annexin B7a) was amplified from a cDNA library obtained from seven different life cycle stages (eggs through to adult worms) by polymerase chain reaction (PCR) using Pfu polymerase (Stratagene), buffers and nucleotides as recommended by the manufacturer, and 0.25 μM of each forward (5′-CATGCCATGGGCATGGGAAGAGATAAATCACAAATAA-3′) and reverse primer (5′-CCGCTCGAGTTGCCATTCAGCACCAATTA-3′), and a cycling protocol of 1 min at 95°C followed by 30 cycles of denaturation at 95°C for 10 sec, annealing at 53°C for 30 sec and extension at 68°C for 3 min. The final extension step was 68°C for 7 min. DNA sequencing was performed with BigDye (Applied Biosystems) terminator chemistry as per the manufacturer's instructions.

Phylogenetic analyses and prediction of orthologous/paralogous relationships

A non-redundant data set, including full-length annexin sequences, was extracted from the structure-based sequence alignment. Best-fit evolutionary models for maximum-likelihood (ML) phylogenetic analyses of both annexin amino acid and nucleotide sequences were predicted using ProtTest [29] and jModelTest [30], respectively. The best-fit model inferred from the Akaike Information Criterion (AIC) was used in the amino acid dataset analyses and the Bayesian Information Criterion (BIC) for the nucleotide dataset. For each amino acid and nucleotide sequence alignment, ML and Bayesian Inference (BI) trees were derived using MEGA v.5 [31] and MrBayes 3.1.2 [32], respectively. All trees were rooted using the human annexin A13 (GenBank accession numbers NP_004297 and NM_004306, for the amino acid and nucleotide sequence, respectively) as the outgroup. The ML phylogenetic trees of amino acid and nucleotide sequences were constructed using the Jones-Taylor-Thornton (JTT) model assuming uniform rates among sites (+G + I; i.e. including gamma, proportion of invariant sites) and the General Time Reversible model (GTR), respectively. For each ML analysis, the bootstrapped confidence interval was based on 100 replicates. BI analyses for both nucleotide and amino acid sequence alignments were run over 1,000,000 generations (‘ngen = 1,000,000’) with two runs each containing four simultaneous Markov Chain Monte Carlo (MCMC) chains (‘nchains = 4’) and every 100th tree being saved (‘samplefreq = 100’). The parameters used were as follows: ‘nst = 6’, ‘rates = invgamma’, with MCMC left at default settings, ‘ratepr = variable’ and ‘burnin = 100’. Consensus trees were constructed with ‘contype = allcompat’, nodal support being determined using consensus posterior probabilities. An initial Bayesian inference analysis of the amino acid dataset which excluded sequences of Schmidtea mediterranea was performed at 10,000,000 generations.
The overall topology and posterior probability values did not differ significantly from those of the final analysis conducted at 1,000,000 generations, as the likelihood values stabilised well before 1,000,000 generations when examined in Tracer v1.5 (http://beast.bio.ed.ac.uk/Tracer). All trees were displayed using FigTree v1.4 (http://tree.bio.ed.ac.uk/software/figtree/). The backtrans feature of TreeBeST (http://treesoft.sourceforge.net/treebest.shtml) was used to create a protein-guided codon alignment of the nucleotide sequences from the present protein sequence alignment. A species-guided ML tree using the Hasegawa-Kishino-Yano (HKY) model was constructed in TreeBeST and viewed in FigTree v1.4. The species tree was constructed with reference to relevant molecular phylogenies [33–39] and the Tree of Life web project (http://tolweb.org/tree/phylogeny.html and references therein). For these analyses only, the human annexin A13 was removed as the outgroup because it was not consistent with the species tree constraint; instead, the annexin B sequence from the freshwater sponge Ephydatia fluviatilis was used as the outgroup.
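
The Bayesian settings quoted earlier in this section can be gathered into a single command block, which also makes the sampling arithmetic explicit. The Python sketch below is illustrative only and is not the authors' input file: the class and function names are hypothetical, the block layout is an assumption, ‘nruns’ is used here to express the "two runs" mentioned in the text, and ‘burnin’ is assumed to count saved samples rather than generations. The ‘nst’ and ‘rates’ options shown apply to nucleotide data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BayesSettings:
    """MCMC settings quoted in the text (field names mirror MrBayes options)."""
    ngen: int = 1_000_000    # generations per run
    nruns: int = 2           # independent runs ("two runs" in the text)
    nchains: int = 4         # simultaneous MCMC chains per run
    samplefreq: int = 100    # every 100th tree is saved
    burnin: int = 100        # assumption: counted in saved samples
    nst: int = 6
    rates: str = "invgamma"
    ratepr: str = "variable"
    contype: str = "allcompat"


def mrbayes_block(s: BayesSettings) -> str:
    """Render the settings as a NEXUS 'mrbayes' block (illustrative layout)."""
    return "\n".join([
        "begin mrbayes;",
        f"  lset nst={s.nst} rates={s.rates};",
        f"  prset ratepr={s.ratepr};",
        f"  mcmc ngen={s.ngen} nruns={s.nruns} nchains={s.nchains} "
        f"samplefreq={s.samplefreq};",
        f"  sumt burnin={s.burnin} contype={s.contype};",
        "end;",
    ])


def saved_samples(s: BayesSettings) -> int:
    """Trees saved per run, ignoring the starting state of the chain."""
    return s.ngen // s.samplefreq


def burnin_generations(s: BayesSettings) -> int:
    """Generations covered by the burn-in if it is counted in saved samples."""
    return s.burnin * s.samplefreq


if __name__ == "__main__":
    settings = BayesSettings()
    print(mrbayes_block(settings))
    print("saved samples per run:", saved_samples(settings))
    print("burn-in in generations:", burnin_generations(settings))
```

Under these assumptions, each run saves roughly 10,000 trees and the burn-in of 100 samples corresponds to the first 10,000 generations, which is why checking that the likelihood values had stabilised (as done in Tracer) matters for such a short burn-in.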

Nomenclature strategy

The B Group naming convention implemented here for the new sequence data sought to conform to the framework of nomenclature proposed by Fernandez and Morgan [8], who suggested that new names should be assigned based on their level of amino acid sequence identity (‘closeness’) to the authoritative human annexins. Initial alignment and phylogenetic analyses of the amino acid sequence data reported here with those of the human annexins ANXA1−ANXA13 resulted in phylograms that were markedly polyphyletic, with the human annexins interspersed within various clades of B annexin sequences (data not shown). This observed polyphyly made it ambiguous, for naming purposes, how ‘close’ the new B annexin amino acid sequences were to the human annexins as a whole. Therefore, we chose to exclude the human annexins ANXA1−ANXA11 from these analyses and to use the human annexin ANXA13 as the functional outgroup in all subsequent analyses and for naming purposes. Since the annexin A13 gene is the probable common ancestor of all vertebrate annexins [40], it is the appropriate outgroup sequence and root for the non-vertebrate phyla presented here. The determination of ‘closeness’ of the B annexins to the A13 sequence for naming purposes was initially undertaken using the maximum likelihood and Bayesian inference phylogenies. However, due to the large number of new sequences included in the present study (n = 115), we selected p-distances of B amino acid sequences relative to the human annexin A13 sequence as a more robust and objective method for assigning names, because this yields the actual proportion of amino acid sites that differ between two sequences rather than inferring genetic distance based on a model of evolution [41]. The p-distances were calculated in MEGA v.5 [31], with the setting ‘pairwise deletion of gaps/missing data’ selected. The data were then exported into a spreadsheet and sorted (see Supplementary Table S3). 
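
As a rough illustration of the distance measure described above, the following Python sketch computes a p-distance with pairwise deletion and ranks sequences by their distance to a reference. It is not the MEGA implementation: the function names, the toy sequences and the choice of ‘-’ and ‘?’ as gap/missing symbols are assumptions made for the example.

```python
MISSING = frozenset("-?")  # assumption: symbols treated as gap/missing data


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared sites at which two aligned sequences differ.

    Pairwise deletion: a site is skipped for this pair only if either
    sequence has a gap or missing-data symbol at that position.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def rank_by_distance(reference: str, candidates: dict) -> list:
    """Return (name, p-distance) pairs sorted from closest to most distant."""
    distances = {name: p_distance(reference, seq)
                 for name, seq in candidates.items()}
    return sorted(distances.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Toy aligned fragments, purely hypothetical (not real annexin data).
    reference = "MKGTALVE"
    candidates = {
        "seq_1": "MRGT-LVE",
        "seq_2": "MKGTALVD",
        "seq_3": "-RGSALVQ",
    }
    for name, dist in rank_by_distance(reference, candidates):
        print(f"{name}\t{dist:.3f}")
```

Because sites are dropped per pair rather than across the whole alignment, each distance is computed over a different number of sites; this retains more data than complete deletion when many sequences contain gaps.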
New B annexin amino acid identifiers were assigned beginning with B4, respecting those that have already been established in the literature (i.e. B1, B2, B3 from Taenia solium; B9 and B11 from Drosophila melanogaster; B12 from Hydra vulgaris). The same B annexin number identifier was assigned to sequences proposed to be orthologs (shared ancestry through speciation) or paralogs (shared ancestry through duplication) based on clades with shared similarity (i.e. putative isoforms), as assessed by a combination of inferred relationships from the phylogenetic analyses, orthology/paralogy and secondary structure. Clades containing sequences from different species were considered putative orthologs and were assigned the same identifier. Where sequences from the same species were present in a clade, letter designations (‘a’, ‘b’, ‘c’, etc.) were appended after the B numbers to indicate either putative isoforms or putative paralogs. The number assigned to a group of putative isoforms or paralogs was determined based on the sequence with the shortest p-distance to annexin A13. The subsequent letter designations for the putative isoforms or paralogs within the B annexins were then assigned in descending order based on p-distance from annexin A13.
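
The naming rules above can be sketched as a small procedure. The Python example below is one possible reading rather than the authors' actual workflow, and several points are assumptions: clades are numbered in order of their closest member to annexin A13, numbers already used in the literature are simply skipped, and "descending order" is read literally so that the largest p-distance receives ‘a’ (the hypothetical `letters_descending` flag reverses this). The record type, function name and toy values are likewise hypothetical.

```python
from collections import defaultdict
from string import ascii_lowercase
from typing import NamedTuple


class AnnexinRecord(NamedTuple):
    seq_id: str      # sequence identifier
    species: str     # source species
    clade: str       # clade label taken from the phylogenetic analyses
    p_dist: float    # p-distance to human annexin A13


# Numbers already established in the literature, as listed in the text.
RESERVED = frozenset({1, 2, 3, 9, 11, 12})


def assign_names(records, start=4, reserved=RESERVED, letters_descending=True):
    """Map each seq_id to a B annexin name under the assumptions stated above."""
    clades = defaultdict(list)
    for rec in records:
        clades[rec.clade].append(rec)

    # Assumption: clades are numbered by their closest member to A13.
    ordered = sorted(clades.values(),
                     key=lambda members: min(m.p_dist for m in members))

    names = {}
    number = start
    for members in ordered:
        while number in reserved:
            number += 1
        by_species = defaultdict(list)
        for m in members:
            by_species[m.species].append(m)
        for group in by_species.values():
            if len(group) == 1:
                # A single sequence per species in the clade: number only.
                names[group[0].seq_id] = f"B{number}"
                continue
            # Several sequences from one species: append letters.
            group.sort(key=lambda m: m.p_dist, reverse=letters_descending)
            for letter, m in zip(ascii_lowercase, group):
                names[m.seq_id] = f"B{number}{letter}"
        number += 1
    return names


if __name__ == "__main__":
    toy = [
        AnnexinRecord("s1", "species_A", "clade_1", 0.50),
        AnnexinRecord("s2", "species_A", "clade_1", 0.55),
        AnnexinRecord("s3", "species_B", "clade_1", 0.52),
        AnnexinRecord("s4", "species_A", "clade_2", 0.60),
    ]
    for seq_id, name in sorted(assign_names(toy).items()):
        print(seq_id, name)
```

A real assignment would also need to keep clades that already contain a published annexin (for example B1 or B9) under their established number; the sketch leaves that step out.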

Author Contributions

C.C., J.M.S., T.L.M., G.W., M.K.J. and A.H. designed, performed, and analysed computational work. C.Y.L., L.T., L.M., C.W., M.K.J. and A.H. designed, performed, and analysed experimental work. A.L., R.B.G. and M.K.J. provided essential datasets. All authors wrote and reviewed the manuscript.

38 in total

Review 1. Annexins: from structure to function.

Authors: Volker Gerke; Stephen E Moss
Journal: Physiol Rev Date: 2002-04 Impact factor: 37.312

2. The phylogeny of the Schistosomatidae based on three genes with emphasis on the interrelationships of Schistosoma Weinland, 1858.

Authors: A E Lockyer; P D Olson; P Ostergaard; D Rollinson; D A Johnston; S W Attwood; V R Southgate; P Horak; S D Snyder; T H Le; T Agatsuma; D P McManus; A C Carmichael; S Naem; D T J Littlewood
Journal: Parasitology Date: 2003-03 Impact factor: 3.234

3. Phylogeny and classification of the Digenea (Platyhelminthes: Trematoda).

Authors: P D Olson; T H Cribb; V V Tkach; R A Bray; D T J Littlewood
Journal: Int J Parasitol Date: 2003-07 Impact factor: 3.981

4. MrBayes 3: Bayesian phylogenetic inference under mixed models.

Authors: Fredrik Ronquist; John P Huelsenbeck
Journal: Bioinformatics Date: 2003-08-12 Impact factor: 6.937

Review 5. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors: S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal: Nucleic Acids Res Date: 1997-09-01 Impact factor: 16.971

6. The genome of the protist parasite Entamoeba histolytica.

Authors: Brendan Loftus; Iain Anderson; Rob Davies; U Cecilia M Alsmark; John Samuelson; Paolo Amedeo; Paola Roncaglia; Matt Berriman; Robert P Hirt; Barbara J Mann; Tomo Nozaki; Bernard Suh; Mihai Pop; Michael Duchene; John Ackers; Egbert Tannich; Matthias Leippe; Margit Hofer; Iris Bruchhaus; Ute Willhoeft; Alok Bhattacharya; Tracey Chillingworth; Carol Churcher; Zahra Hance; Barbara Harris; David Harris; Kay Jagels; Sharon Moule; Karen Mungall; Doug Ormond; Rob Squares; Sally Whitehead; Michael A Quail; Ester Rabbinowitsch; Halina Norbertczak; Claire Price; Zheng Wang; Nancy Guillén; Carol Gilchrist; Suzanne E Stroup; Sudha Bhattacharya; Anuradha Lohia; Peter G Foster; Thomas Sicheritz-Ponten; Christian Weber; Upinder Singh; Chandrama Mukherjee; Najib M El-Sayed; William A Petri; C Graham Clark; T Martin Embley; Bart Barrell; Claire M Fraser; Neil Hall
Journal: Nature Date: 2005-02-24 Impact factor: 49.962

7. A consensus amino-acid sequence repeat in Torpedo and mammalian Ca2+-dependent membrane-binding proteins.

Authors: M J Geisow; U Fritsche; J M Hexham; B Dash; T Johnson
Journal: Nature Date: 1986 Apr 17-23 Impact factor: 49.962

8. Contributions to the phylogeny of Platyhelminthes based on partial sequencing of 18S ribosomal DNA.

Authors: K Rohde; C Hefford; J T Ellis; P R Baverstock; A M Johnson; N A Watson; S Dittmann
Journal: Int J Parasitol Date: 1993-09 Impact factor: 3.981

9. Interrelationships and evolution of the tapeworms (Platyhelminthes: Cestoda).

Authors: P D Olson; D T Littlewood; R A Bray; J Mariaux
Journal: Mol Phylogenet Evol Date: 2001-06 Impact factor: 4.286

Review 10. The annexins.

Authors: Stephen E Moss; Reg O Morgan
Journal: Genome Biol Date: 2004-03-31 Impact factor: 13.583
