
The genomic basis of parasitism in the Strongyloides clade of nematodes.

Vicky L Hunt1, Isheng J Tsai2,3, Avril Coghlan4, Adam J Reid4, Nancy Holroyd4, Bernardo J Foth4, Alan Tracey4, James A Cotton4, Eleanor J Stanley4, Helen Beasley4, Hayley M Bennett4, Karen Brooks4, Bhavana Harsha4, Rei Kajitani5, Arpita Kulkarni6, Dorothee Harbecke6, Eiji Nagayasu3, Sarah Nichol4, Yoshitoshi Ogura7, Michael A Quail4, Nadine Randle8, Dong Xia8, Norbert W Brattig9, Hanns Soblik9, Diogo M Ribeiro4, Alejandro Sanchez-Flores4,10, Tetsuya Hayashi7, Takehiko Itoh5, Dee R Denver11, Warwick Grant12, Jonathan D Stoltzfus13, James B Lok13, Haruhiko Murayama3, Jonathan Wastling8,14, Adrian Streit6, Taisei Kikuchi3, Mark Viney1, Matthew Berriman4.   

Abstract

Soil-transmitted nematodes, including the Strongyloides genus, cause one of the most prevalent neglected tropical diseases. Here we compare the genomes of four Strongyloides species, including the human pathogen Strongyloides stercoralis, and their close relatives that are facultatively parasitic (Parastrongyloides trichosuri) and free-living (Rhabditophanes sp. KR3021). A significant paralogous expansion of key gene families--families encoding astacin-like and SCP/TAPS proteins--is associated with the evolution of parasitism in this clade. Exploiting the unique Strongyloides life cycle, we compare the transcriptomes of the parasitic and free-living stages and find that these same gene families are upregulated in the parasitic stages, underscoring their role in nematode parasitism.

Year:  2016        PMID: 26829753      PMCID: PMC4948059          DOI: 10.1038/ng.3495

Source DB:  PubMed          Journal:  Nat Genet        ISSN: 1061-4036            Impact factor:   38.330


Introduction

More than a billion people are infected with intestinal nematodes[1,2]. The World Health Organization (WHO) has classified infections with soil-transmitted nematodes[3] as one of the 17 most neglected tropical diseases and estimates that worldwide they cause an annual disease burden of 5 million Years Lost due to Disability (YLD), greater than that for malaria (4 million YLD) and HIV/AIDS (4.5 million)[4]. Parasitic nematode infections can impair physical and educational development[1]. Strongyloides spp. are soil-transmitted gastrointestinal parasitic nematodes infecting a wide range of vertebrates[5]. Two species – S. stercoralis and S. fuelleborni – infect some 100–200 million people worldwide[6,7]. Other Strongyloides species infect livestock, such as S. papillosus in sheep. Strongyloides spp. are from a clade of nematodes[8-10] that includes taxa with diverse lifestyles including free-living (Rhabditophanes), parasitism of invertebrates, facultative parasitism of vertebrates (Parastrongyloides) and obligate parasitism of vertebrates (Strongyloides)[8,9]. Nematodes have independently evolved parasitism of animals several times[11], and thus understanding the genomic adaptations to parasitism in one clade will help in understanding how parasitism has evolved across the phylum more widely. The Strongyloides life cycle alternates between free-living and parasitic generations. The female-only, parthenogenetic[12] parasitic stage lives in the small intestine of its host where it produces offspring that develop outside of the host either directly to infective third-stage larvae (iL3s) or via a dioecious, sexually reproducing, adult generation[13], whose progeny are also iL3s. The iL3s penetrate the skin of a host and migrate to its gut[14] where they develop into parasitic adults (Fig. 1). 
Therefore, this life cycle has two genetically identical adult female stages – one obligate and parasitic, and one facultative and free-living; we have compared these transcriptomically and proteomically to reveal the genes and gene products specifically present in the parasitic stage. The closely related genus Parastrongyloides[5,15] is similar to Strongyloides spp., except that its parasitic generation is dioecious and sexually reproducing, and that it can have apparently unlimited cycles of its free-living adult generation[5,16](Fig. 1).
Fig. 1

Evolution and comparative genomics of Strongyloides and relatives

The life cycles of six clade IV nematodes showing the transition from a free-living lifestyle (in Rhabditophanes), through facultative parasitism (P. trichosuri), to obligate parasitism (Strongyloides spp.), and the phylogeny of these species (maximum-likelihood phylogeny based on a concatenated alignment of 841,529 amino acid sites from 4,437 conserved single-copy orthologous genes). Values on nodes (all 100) are the number of bootstrap replicate trees showing the split induced by the node, out of 100 bootstrap replicates. The phylogeny is annotated with the numbers of gene families appearing along each branch of the phylogeny (+values on each branch) and histograms show the number of duplications (blue) and losses (red) for individual genes (dark blue or red) and for families (light blue or red); the number of gene origins and gene losses in 18 astacin families (upper numbers in boxes) and ten SCP/TAPS families (lower numbers in boxes) as estimated by the Ensembl Compara pipeline is also shown. The pie charts summarize the evolutionary history of the genome of each species, defining genes shared among all six species, the five parasitic species (Strongyloididae, which includes all except Rhabditophanes), the four Strongyloides species, and species-specific genes. The host species of the parasites are shown: for P. trichosuri the brushtail possum, for S. ratti and S. venezuelensis the rat, for S. stercoralis humans, and for S. papillosus sheep.

Here we report the genome sequences for six nematodes from one superfamily: four species of Strongyloides: S. stercoralis (a parasite of humans and dogs), S. ratti and S. venezuelensis (both parasites of rats, and important laboratory models of nematode infection) and S. papillosus (a parasite of sheep); Parastrongyloides trichosuri (which infects the brushtail possum Trichosurus vulpecula), and the free-living nematode Rhabditophanes sp.[8]. To investigate the genomic and molecular basis of parasitism in these nematodes we compared (i) the genomes and gene families of these parasitic (Strongyloides and Parastrongyloides, the Strongyloididae) and free-living (Rhabditophanes) taxa (Fig. 1); (ii) the transcriptomes of parasitic adult females, free-living adult females and iL3s of S. ratti and S. stercoralis, and (iii) the proteomes of parasitic and free-living females of S. ratti. We have identified the genes present in the parasitic species, and the genes and gene products uniquely upregulated in the parasitic stages of S. stercoralis and S. ratti; together these are the major genomic and molecular adaptations to the parasitic lifestyle of these nematodes.

Results

Chromosome biology

We have produced a high-quality 43 Mb reference genome assembly for S. ratti (Supplementary Note), with its two autosomes[17] assembled into single scaffolds and the X chromosome[17] into ten (Table 1; Fig. 2). This assembly is the second most contiguous assembled nematode genome after the Caenorhabditis elegans reference genome[18]. We also produced high quality draft assemblies of the 42–60 Mb genomes of S. stercoralis, S. venezuelensis, S. papillosus, P. trichosuri and Rhabditophanes sp., which are 95.6 – 99.6% complete (Supplementary Table 1). With GC contents of 21% and 22% respectively, the S. ratti and S. stercoralis genomes are the most AT-rich reported to date for nematodes (Supplementary Table 1). The ~43 Mb S. ratti and S. stercoralis genomes are small compared with other nematodes. However, the total protein-coding content of each nematode genome is similar (18–22 Mb versus 14–30 Mb in eight outgroup species; Supplementary Table 1). Significant loss of introns as well as shorter intergenic regions account for the smaller genomes from the present study (Spearman’s correlation between genome size and intron number ρ=0.91, P < 0.001 and size of intergenic regions ρ=0.63, P = 0.02; Supplementary Table 2). However, parsimony analysis of intronic positions conserved in two or more species revealed that substantial intron losses occurred prior to the evolution of the Rhabditophanes-Parastrongyloides-Strongyloides clade (Supplementary Fig. 1), and are therefore not an adaptation associated with parasitism.
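The rank correlation quoted above can be illustrated with a minimal pure-Python sketch of Spearman's ρ. The inputs here are only the seven genomes listed in Table 1 (assembly size and intron count), which is an assumption made for illustration: the published values (ρ=0.91 and ρ=0.63) come from the species set in Supplementary Table 2, so this sketch is not expected to reproduce them. The function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Table 1 values, in the column order S. ratti, S. stercoralis, S. papillosus,
# S. venezuelensis, P. trichosuri, Rhabditophanes, C. elegans.
# Assumption: illustrative subset only, not the paper's full species set.
assembly_size_mb = [43.1, 42.6, 60.2, 52.1, 42.2, 47.2, 100.2]
intron_count = [21345, 21268, 22364, 23715, 20039, 24491, 169506]

rho = spearman_rho(assembly_size_mb, intron_count)
```

A permutation or exact test would be needed to attach a P value; that step is omitted here.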
Table 1

Properties of genome assemblies

Genome statistics based on scaffolds, excluding scaffolds less than 1000 bp. N50 is the size above which 50% of the assembled bases are distributed; N50 (number) is the number of scaffolds in which 50% of assembled bases exist.

| | S. ratti | S. stercoralis | S. papillosus | S. venezuelensis | P. trichosuri | Rhabditophanes | C. elegans |
|---|---|---|---|---|---|---|---|
| Clade | IV | IV | IV | IV | IV | IV | V |
| Number of chromosomes | 3[107] | 3[108] | 2[109] | 2[22] | 3[23] | 5 (a) | 6[18] |
| Assembly version | V5.0.4 | V2.0.4 | V2.1.4 | V2.0.4 | V2.0.4 | V2.0.4 | WS244 |
| Assembly size (Mb) | 43.1 | 42.6 | 60.2 | 52.1 | 42.2 | 47.2 | 100.2 |
| Number of scaffolds | 115 (b) | 675 | 4,353 | 520 | 1,391 | 380 | 6 |
| N50 of scaffolds (kb) | 11,700 | 431 | 86 | 715 | 837 | 537 | 17,500 |
| N50 (number) | 2 | 16 | 129 | 16 | 12 | 22 | 3 |
| Maximum scaffold length (Mb) | 16.8 | 5.0 | 1.7 | 5.9 | 6.2 | 7.3 | 20.9 |
| G+C content (%) | 21 | 22 | 26 | 25 | 31 | 32 | 36 |
| Number of genes | 12,451 | 13,098 | 18,457 | 16,904 | 15,010 | 13,496 | 23,629 |
| Number of exons | 33,796 | 34,366 | 40,821 | 40,619 | 35,049 | 37,987 | 145,275 |
| Exons, combined length (Mb) | 17.5 | 17.9 | 22.4 | 20.3 | 20.8 | 17.8 | 30.1 |
| Median exon length (bp) | 263 | 265 | 304 | 261 | 348 | 276 | 146 |
| Number of introns | 21,345 | 21,268 | 22,364 | 23,715 | 20,039 | 24,491 | 169,506 |

(a) See Supplementary Figure 7.

(b) 12 scaffolds, covering 93% of the genome, are assigned to chromosomes; 103 scaffolds are not assigned to a chromosome.
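The two N50 statistics defined in the Table 1 legend can be computed as in the following minimal sketch. The function name `n50` and the `min_length` parameter are hypothetical; the 1,000 bp cutoff mirrors the legend's exclusion of shorter scaffolds, and the input lengths in the usage line are invented for illustration.

```python
def n50(lengths, min_length=1000):
    """Return (N50 size, N50 number) for scaffold lengths given in bp.

    N50 size: length of the scaffold at which the running total, taken from
    the longest scaffold downwards, first reaches half of the assembled bases.
    N50 number: how many scaffolds were needed to reach that point.
    """
    # Exclude scaffolds shorter than min_length, then sort longest first.
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no scaffolds at or above min_length")
    half = sum(kept) / 2
    running = 0
    for count, length in enumerate(kept, start=1):
        running += length
        if running >= half:
            return length, count


# Made-up scaffold lengths; the 500 bp scaffold is filtered out.
size, number = n50([100_000, 80_000, 50_000, 30_000, 20_000, 500])
```

Read this way, the S. ratti entries in Table 1 (N50 of 11,700 kb, N50 number of 2) say that the two largest scaffolds alone hold at least half of the assembled bases.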

Fig. 2

Nuclear genomic synteny and mitochondrial genomes of four Strongyloides spp., P. trichosuri and Rhabditophanes sp

(a) The S. ratti genome, our best assembled genome, is used as the reference sequence; synteny is based on sequence matches. Graduation of color across the S. ratti chromosomes represents position along the chromosome for chromosome I (yellow-red), chromosome II (blue-purple) and chromosome X (green). Black boxes represent scaffolds >1Mb; scaffolds <1Mb are grouped together and shown in grey. (b) The mitochondrial gene order and phylogeny for our six species and seven outgroup species that encompass four nematode clades. Our eighth outgroup species, Meloidogyne hapla, was excluded due to insufficient mitochondrial genome data. Inverted sequences are shown by gene boxes with inverted text. The maximum likelihood tree (left) was constructed using 12 mitochondrial proteins. Amino acid sequences were aligned before concatenation (Supplementary Note).

The canonical view of a nematode chromosome, defined nearly twenty years ago using C. elegans autosomes (and later confirmed in C. briggsae[19]) is of a gene-dense, repeat-poor “center” of conserved genes (based on homology with yeast genes[18]), flanked by two gene-poor, repeat-rich “arms” in which most genes are less strongly conserved. S. ratti is the first non-Caenorhabditis nematode whose whole chromosomes have been assembled and it presents a strikingly different organisation with relatively little variability in gene density, repeat density or gene conservation to yeast genes along its autosomes (Supplementary Figs. 2, 3). Synteny is highly conserved within the parasitic Strongyloididae, but much less between this family and Rhabditophanes (Fig. 2). Scaffolds of the parasitic species largely correspond to blocks from a particular S. ratti chromosome, but in a scrambled order. This suggests that intra-chromosomal rearrangement is frequent, but inter-chromosomal rearrangement is rare, a common phenomenon in nematode chromosome evolution[19-21]. The notable exception was for S. papillosus and S. venezuelensis scaffolds that have many blocks that are syntenic to both S. ratti chromosome I and X (Supplementary Table 3). This likely reflects the fusion event between chromosomes I and X in these species[22-24]. Associated with this fusion is a change in the chromosome biology of sex determination in these species. S. papillosus undergoes chromatin diminution (where a chromosome fragments after which part of the chromosome is eliminated during mitosis) to mimic the XX/XO sex-determining system of S. ratti[25] and S. stercoralis[22]. By analyzing the differential coverage of mapped sequence data from iL3s (which are all female) and adult males, we were able to identify regions of the S. papillosus X-I fusion chromosome that are eliminated from males during diminution (Supplementary Table 4). 
Six scaffolds were identified from the diminished region using existing genetic markers (Supplementary Table 5), but our read-depth approach extended this map to 153 scaffolds (18% of the assembly, 10.9 Mb). Interestingly, some genes with orthologs on the X chromosome of S. ratti are not diminished in S. papillosus, so dosage of these genes in males has changed since the species diverged, including three genes on S. papillosus chromosome II (confirming earlier work[22]), and 33 that lie in non-diminished regions of the X-I fusion chromosome (Supplementary Table 6).
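The read-depth approach described above can be sketched as follows. This is a minimal illustration, not the procedure from the Supplementary Note: it assumes per-scaffold mean read depths are already available for a female (iL3) sample and an adult male sample, normalizes each sample by its median scaffold depth, and flags scaffolds whose male-to-female ratio falls well below 1. The function name, the input dictionaries and the 0.75 cutoff are all hypothetical.

```python
from statistics import median


def diminished_scaffolds(female_depth, male_depth, max_ratio=0.75):
    """Flag scaffolds with reduced relative read depth in males.

    female_depth, male_depth: dicts mapping scaffold name -> mean read depth.
    Returns a dict of scaffold name -> normalized male/female depth ratio for
    scaffolds at or below max_ratio (hypothetical cutoff).
    """
    shared = [s for s in female_depth
              if s in male_depth and female_depth[s] > 0]
    # Normalize by the median scaffold depth so that differences in total
    # sequencing yield between the two samples cancel out.
    f_norm = median(female_depth[s] for s in shared)
    m_norm = median(male_depth[s] for s in shared)
    if f_norm == 0 or m_norm == 0:
        raise ValueError("median depth is zero; cannot normalize")
    ratios = {
        s: (male_depth[s] / m_norm) / (female_depth[s] / f_norm)
        for s in shared
    }
    return {s: r for s, r in ratios.items() if r <= max_ratio}


# Invented depths: scaffolds "d" and "e" have half the relative depth in males.
female = {"a": 40, "b": 40, "c": 40, "d": 40, "e": 40}
male = {"a": 30, "b": 30, "c": 30, "d": 15, "e": 15}
flagged = diminished_scaffolds(female, male)
```

Median normalization is only reasonable when most scaffolds are not diminished, which is in line with the 18% of the assembly reported above.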

Extensive rearrangement of the mitochondrial gene order

The S. stercoralis mitochondrial (mt) genome is highly rearranged compared with nematodes from clades I, III and V[26]. Manual finishing of the mt genomes of the six species revealed that the Rhabditophanes mt genome consists of two circular chromosomes, a feature of some other nematode species[27]. Compared with eight outgroup species, Rhabditophanes has a conventional gene order but Strongyloides spp. and P. trichosuri have highly rearranged mt genomes (Fig. 2, Supplementary Table 7). Similar observations have been reported in other clade IV parasitic nematodes[27-30] and there is evidence of mt recombination[29,31], which is rarely observed in animals[32]. Consistent with published nematode mt genomes, the gene-based phylogeny of the mt genome (Fig. 2) conflicts with phylogenies based on nuclear genes[29,33,34], and the rearranged gene order of the mt genome of Strongyloides spp. is accompanied by nucleotide divergence (Fig. 2).

Gene families associated with the evolution of parasitism

We predicted 12,451–18,457 genes across the six genomes, numbers comparable to other nematode species (Table 1, Supplementary Fig. 4). We then used Ensembl Compara (Supplementary Note)[35] to identify orthologs and gene families (Supplementary Table 8) in these and eight outgroup species, encompassing four further nematode clades (Supplementary Fig. 4). By pinpointing when a new gene family arose, and where a family has expanded or contracted, we could determine which gene families are associated with the evolution of parasitism. The largest acquisition of gene families (1075 families) was found on the branch leading to the parasitic nematodes, Strongyloides spp. and P. trichosuri (Fig. 1, Supplementary Fig. 4). Despite this highly dynamic pattern of gene gains and loss within each species’ genome, the proportion of Strongyloides- (and Strongyloididae-) specific genes is consistent across the phylogeny (Fig. 1). The branches leading to these five parasitic species also showed greater expansion of genes and families of genes, compared to that in the free-living Rhabditophanes. Gain and expansion of gene families in these parasitic species likely reflects the necessary adaptations required by these species to be able to parasitize vertebrate hosts while maintaining a free-living phase. The two most expanded Strongyloides spp. gene families encode astacin-like[36] and SCP/TAPS (SCP/Tpx-l/Ag5/PR-1/Sc7[37], also known as CAP-domain) proteins, present in multiple subfamilies (based on Ensembl Compara analysis, Supplementary Table 8, and protein domain combinations, Supplementary Table 9). The astacin family of metallopeptidases was the most expanded, with 184–387 copies in Strongyloides/Parastrongyloides compared with Rhabditophanes and with eight outgroup species, showing that this expansion accompanies the evolution of parasitism (Fig. 1; Supplementary Table 10). 
Among the outgroup species the hookworm Necator americanus[38] has 82 astacin coding genes, and the free-living C. elegans 40[36]. SCP/TAPS proteins are often immunomodulatory molecules in parasitic nematodes[37] and have been investigated as potential vaccine candidates against N. americanus[39,40]. We found 89–205 SCP/TAPS coding genes in the Strongyloides spp. genomes, including nine subfamilies not present in P. trichosuri, Rhabditophanes or the eight outgroup species (Supplementary Tables 8 and 10). In N. americanus there are 137 SCP/TAPS coding genes[38], suggesting that this gene family has independently expanded twice: in nematode clades IV and V. Additional gene expansions included receptor-type protein tyrosine phosphatases which have a putative role in signaling[41], and are expanded in Strongyloides and Parastrongyloides (52–75 genes) compared with Rhabditophanes (13), and the eight outgroup species (up to 39 genes). Acetylcholinesterase coding genes were expanded in Strongyloides and Parastrongyloides (30–126 genes) compared to Rhabditophanes (1) and 1–5 genes in our outgroup species. Many parasitic nematodes secrete acetylcholinesterases which are thought to facilitate their maintenance in hosts[42] and the expansion of this gene family in these parasitic species is consistent with this role. Some families show sub-clade specific expansion; for instance, S. papillosus / S. venezuelensis have a paralogous expansion of genes encoding Speckle-type POZ domains[43] (92–130 genes) compared with S. ratti / S. stercoralis (9–10 genes) (Fig. 1; Supplementary Table 8). No function or annotation could be assigned to approximately one third (26–37%) of the genes present in the six species, but 50% of these could be assigned to novel gene families. The six largest of these families occurred only in Strongyloides and Parastrongyloides, comprising a total of 630 genes. We have named these Strongyloides genome project families (sgpf) 1–6. 
Members of sgpf-1 and -5 are predicted to have signal peptides and to be highly glycosylated (Supplementary Table 11).

Expanded gene families are upregulated in parasitic stages

We identified genes and gene families that are likely to play a key role in the parasitic lifestyle of S. ratti and S. stercoralis, by comparing the transcriptomes of parasitic and free-living female stages. We generated S. ratti transcriptome data and used previously published S. stercoralis data[44]. A total of 909 S. ratti and 1,188 S. stercoralis genes were upregulated in parasitic females compared with free-living females (edgeR, fold change>2, FDR<0.01; Supplementary Tables 12, 13) of which 423 S. ratti and 457 S. stercoralis orthologous genes were upregulated in the parasitic female stage of both species (Supplementary Table 14). The two most expanded Strongyloides gene families – SCP/TAPS[37] and astacin domain coding genes[45-48] – dominated the list of genes differentially expressed by the parasitic female. In S. ratti and S. stercoralis, respectively, 58 and 62% of putative astacin-like proteins and 57 and 71% SCP/TAPS genes were differentially expressed between parasitic vs. free-living females (Fig. 3; Supplementary Tables 10, 13). However, other paralogously expanded genes were not enriched among the upregulated genes suggesting they may not be important for parasitism. Both Strongyloides and Parastrongyloides infect their hosts by skin penetration; the larvae then migrate through the host, and adult females in the host live in the mucosa of the small intestine[49,50] where they feed on the host. Astacins are metallopeptidases that have previously been associated with a role in tissue migration by nematode infective larvae[46,51]. Around half of the putative astacin-like proteins in Strongyloides spp. contain the canonical zinc binding motif (HEXXHXXGXXH) of astacin active sites and likely have a role in penetrating the host mucosa in which the parasitic females live. 
Teasing apart the role of different astacin gene family members in the migration and gut-dwelling phases of this life cycle could provide insights to allow new therapeutic interventions to be developed. For S. ratti and S. stercoralis respectively, 63 and 53% of the SCP/TAPS genes upregulated in the parasitic female encode a signal peptide suggesting that they may be secreted from the worm into the host. An immunomodulatory role for SCP/TAPS proteins has also been proposed based on the inhibitory effect that these proteins have on neutrophil and platelet activity in hookworm infections[37,52,53].
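Screening predicted proteins for the canonical astacin zinc-binding motif (HEXXHXXGXXH, where X is any amino acid) can be done with a regular expression, as in this minimal sketch. The function name and the two test sequences are invented for illustration; a real screen would read the predicted proteome from a FASTA file and would normally be combined with domain annotation.

```python
import re

# HEXXHXXGXXH with each X standing for any single amino acid letter.
ASTACIN_ZINC_MOTIF = re.compile(r"HE[A-Z]{2}H[A-Z]{2}G[A-Z]{2}H")


def has_zinc_binding_motif(protein_seq):
    """Return True if the protein sequence contains HEXXHXXGXXH."""
    return ASTACIN_ZINC_MOTIF.search(protein_seq.upper()) is not None


# Made-up sequences: the second has the motif's glycine replaced by alanine.
with_motif = has_zinc_binding_motif("MKTHELMHALGFYHEQ")
without_motif = has_zinc_binding_motif("MKTHELMHALAFYHEQ")
```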
Fig. 3

The parasitic female, free-living female and infective third-stage larvae transcriptomes of Strongyloides spp

The progeny of the parasitic female pass out of the host (as larvae for S. stercoralis, or eggs and larvae for S. ratti) where infective third stage larvae (iL3s) can develop directly, or free-living males and females develop, whose progeny develop into iL3s; iL3s then infect hosts. The human parasite, S. stercoralis, can undergo internal auto-infection (grey dashed line) where iL3s develop and internally reinfect the same host. The transcriptomes of the parasitic females, free-living females and iL3s were compared for S. ratti and S. stercoralis. Representative GO terms that were significantly enriched (left-hand side area of box) and Ensembl Compara gene families significantly upregulated (right-hand side of box) for each of these three stages of the lifecycle are summarized. The pie charts show the proportion of the GO terms common to S. ratti and S. stercoralis, or unique to either. Numbers in the right-hand side of boxes represent the number of genes upregulated in each gene family for S. ratti and S. stercoralis.

Other gene families commonly upregulated in the parasitic females of both species, compared with free-living females and iL3s, included those coding for transthyretin-like proteins, prolyl endopeptidases, acetylcholinesterases, trypsin-inhibitors, and aspartic peptidases (Fig. 3, Supplementary Table 15). The transthyretin-like genes had some of the highest fold changes of genes upregulated in the parasitic females (Supplementary Table 13). Transthyretin-like genes are a large, nematode-specific gene family[54], expressed in adult parasitic stages[55-57], and are distant relatives of vertebrate transthyretins that are involved in transporting thyroid hormones[58]. While some aspartic peptidases are essential for the digestion of host hemoglobin in blood-borne parasites[59,60], it has been proposed that others are involved in digesting other host macromolecules[61]. Hypothetical protein-coding genes accounted for 20–37% of the differentially expressed genes from pairwise comparisons of parasitic females, free-living females and iL3s, and included genes with the highest relative expression levels (Supplementary Table 13). These novel genes are likely to be important to these distinctive phases of the life cycle, including in parasitism. Three small novel gene families (sgpf-7-9) were predominantly upregulated in S. ratti parasitic females, two of which are predicted to be predominantly secretory or membrane-targeted (Supplementary Table 11). In contrast, the largest hypothetical protein-coding gene families, sgpf-1–6, accounted for only a small proportion (1% in both S. ratti and S. stercoralis) of all differentially expressed hypothetical protein-coding genes suggesting that they do not have roles involved in parasitism. Using gene ontology annotations to summarize the putative functions of upregulated genes revealed distinct differences between the life cycle stages of both species (Fig. 3, Supplementary Table 16). 
The genes upregulated in iL3s appear to be associated with sensing the environment and with signal transduction, and were the most consistent between S. ratti and S. stercoralis. The products of free-living female expressed genes have core metabolic and growth-related roles (such as in cytoskeleton and chromatin). In parasitic stages, the dominant functional categories were proteases, consistent with the abundant astacins (Fig. 3, Supplementary Table 16).

The products of putative parasitism genes are secreted

In parallel we compared the somatic proteome of parasitic and free-living females of S. ratti. Of 1,266 proteins detected overall, 569 were comparatively upregulated in parasitic females and 409 in free-living females (Supplementary Tables 12, 17). We found a modest overlap between the transcriptome and somatic proteome; 6% of genes upregulated in the parasitic female transcriptome were also upregulated in the proteome, and 10% for free-living females (Supplementary Fig. 5; Supplementary Table 18). A poor concordance between transcript and peptide abundance has been reported in many systems[62-64] and likely reflects post-translational processes that decouple protein and mRNA abundance. In the present study, this may be compounded by the excretion / secretion of many gene products from parasitic stages, to interact with the host. Indeed, 43% of genes upregulated in the parasitic female transcriptome are predicted to encode signal peptides, compared with 26% for the free-living females. Furthermore, while several of the putative parasitism gene families were highly upregulated in the somatic proteome (aspartic peptidases, prolyl endopeptidases and acetylcholinesterases; Supplementary Table 17), we found only five astacin-like and no SCP/TAPS proteins (Supplementary Fig. 5). To address this we extended the analysis to the excretory/secretory (ES) proteome data of Soblik et al [65]. In the ES proteome we detected an additional 882 proteins, and found greater consistency with the parasitic female transcriptome: 13% of the parasitic female ES proteins overlapped with the upregulated transcriptome (Supplementary Table 18). We also found 25 astacin and 14 SCP/TAPS gene products in the ES proteome. Other gene families highly upregulated in the parasitic female transcriptome were also dominant in the parasitic ES proteome including prolyl endopeptidases, acetylcholinesterases, and transthyretin-like proteins (Supplementary Table 19). 
Protein products of novel gene families sgpf-1 and -5 were also identified in the ES products of both parasitic and free-living females (Supplementary Table 11). Other parasitic nematodes have been noted to have many protease coding genes, and different species appear to have expanded different protease families[38,66-68]. Together these, and our findings, suggest that expansion of protease coding genes, and secretion of extensive quantities of proteases is likely to be an essential feature of nematode parasitism. These proteases are, presumably, used to penetrate host tissue, acquire resources from the host and to protect the parasite from host-induced harm.

Parasitism-associated genes are in co-expressed clusters

We observed that genes upregulated in the parasitic females and iL3s were often physically clustered in the genome, more so than for genes upregulated in the free-living female (Supplementary Table 20). To test whether this clustering was significant we asked whether clusters of three or more adjacent genes, upregulated in the same life cycle stage, occurred more often than would be expected by chance. We found that 31%, 4% and 26% of upregulated genes were in such clusters in S. ratti parasitic females, free-living females and iL3s, respectively, while in S. stercoralis this was 34%, 2% and 34% (Supplementary Table 20). This clustering is more than would be expected by chance (Supplementary Fig. 6; Supplementary Table 20). The parasitic female clusters were larger (19 and 16 genes in the largest S. ratti and S. stercoralis clusters, respectively) compared with those of the iL3s (9 and 14 genes) and free-living female stages (3 genes) (Supplementary Table 20). Although nematodes, including S. ratti[69], have operons, these clusters are unlikely to be operons because (i) the average intergenic distance among clustered genes does not differ from the genome-wide average (Supplementary Fig. 6) and (ii) cluster members include genes on both strands. Clusters of genes upregulated in the parasitic female were more likely to comprise genes from the same gene family. The majority (88 and 73% for S. ratti and S. stercoralis, respectively) of these parasitic female clusters were of genes belonging to the same Compara gene family; this is greater than for iL3s (8–10%) (Supplementary Tables 20–22). Two gene families dominated parasitic female clusters: astacins (24 and 23% of parasitic female clusters for S. ratti and S. stercoralis) and SCP/TAPS (15 and 11%). Tandem expansions of astacin and SCP/TAPS genes could provide a plausible explanation for the preponderance of these gene families in the parasitic female expression clusters. 
However, even with the exclusion of the astacin and SCP/TAPS families, most remaining parasitic female clusters still comprised genes from the same gene family (85 and 65% for S. ratti and S. stercoralis, respectively); fewer clusters from the same gene family occurred for iL3s (7 and 9%) compared to parasitic females (Supplementary Table 21). Phylogenetic analysis of astacins, including the eight outgroup species, showed that 139 S. ratti genes form one distinct clade (Fig. 4), presumably derived from a single ancestral astacin gene. Similarly, the S. ratti SCP/TAPS gene family has almost exclusively expanded from one ancestral gene (Fig. 4). These gene clusters likely arose by tandem duplication of genes, as has occurred for other large gene families, for example in C. elegans[18]. However, in contrast to C. elegans, physical adjacency of the duplicated genes has been maintained in Strongyloides, perhaps due to the expansions being recent and therefore not having yet been broken-up by recombination. Alternatively the adjacency may be functional, for example there being pressure to maintain a common regulatory environment. Clustering of gene families was relatively rare among Rhabditophanes and eight outgroup species (Supplementary Table 21), meaning that this clustering is specific to the Strongyloides/Parastrongyloides lineage and thus to the parasitic lifestyle in this clade.
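The clustering question posed above (do runs of three or more adjacent genes, upregulated in the same stage, occur more often than expected by chance?) can be sketched as a label-permutation test. This is an assumed null model chosen for illustration; the procedure actually used is described in the supplementary material and may differ. The sketch treats one chromosome or scaffold as an ordered list of genes, ignores strand and scaffold boundaries, and all names are hypothetical.

```python
import random


def genes_in_clusters(flags, min_size=3):
    """Count upregulated genes lying in runs of >= min_size adjacent
    upregulated genes. flags: booleans in chromosomal gene order."""
    total = run = 0
    for flag in list(flags) + [False]:  # sentinel closes a trailing run
        if flag:
            run += 1
        else:
            if run >= min_size:
                total += run
            run = 0
    return total


def clustering_p_value(flags, n_perm=1000, min_size=3, seed=0):
    """Return (observed count, empirical one-sided P value).

    The null distribution is built by shuffling the upregulated labels
    across gene positions (assumed null model)."""
    rng = random.Random(seed)
    observed = genes_in_clusters(flags, min_size)
    shuffled = list(flags)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if genes_in_clusters(shuffled, min_size) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Invented example: 10 upregulated genes in one block among 100 genes.
example = [True] * 10 + [False] * 90
observed, p_value = clustering_p_value(example, n_perm=200)
```

Dividing the observed count by the number of upregulated genes gives the kind of percentage quoted above for each life cycle stage.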
Fig. 4

Strongyloides-specific expansions and chromosomal clustering of gene families

(a) Astacin-like and (b) SCP/TAPS are the two major Strongyloides ratti gene families upregulated in the transcriptome of parasitic females. Left shows the phylogeny of each of these for S. ratti, our eight outgroup species and the crayfish Astacus astacus; S. ratti genes are in light blue. Right shows the distribution of these genes in the genome, plotted as clusters of physically adjacent genes in the genome. Numbers above the peaks are the number of genes in a cluster of physically neighboring genes; ticks below the axis denote scaffold boundaries for chromosome X. The transcriptomic expression of these genes (in RPKM, reads per kilobase per million mapped reads) for parasitic females, free-living females and iL3s is shown on a grey scale, and the results of pairwise edgeR analysis of the gene expression among these lifecycle stages are shown in red or blue where a gene is upregulated. The color representing upregulation (red or blue) in a given stage of the life cycle relates to the color of the name of that stage for each pairwise comparison (fold change > 2, FDR < 0.01); no differential expression is shown as a white block.

The clusters of genes upregulated in the parasitic females were themselves chromosomally clustered forming 'parasitism regions' (Fig. 4). In S. ratti a third of genes upregulated in the parasitic female are concentrated in three regions of chromosome II, most notably a 3.6 Mb region at one end of chromosome II, comprising 171 genes that were upregulated in the parasitic female transcriptome (Supplementary Fig. 2). A similar pattern is evident in S. stercoralis where seven scaffolds and contigs with a high density of genes upregulated in the parasitic female also belong to chromosome II; 46% of the 171 S. ratti genes belong to just eight different gene families including those coding for aspartic peptidases, astacin-like, SCP/TAPS, transthyretin-like and trypsin inhibitor-like proteins. This is the first report of chromosomal clustering of genes likely to be important in nematode parasitism and hints at possible regulatory mechanisms for parasite development.

Discussion

Understanding the molecular and genetic differences between parasitic and free-living organisms is of fundamental biological interest, and essential to identify novel drug targets, and other methods to control parasitic nematodes and the diseases that they cause. We have undertaken a comparative genomics study of six taxa from an evolutionary clade that transitions from a free-living to parasitic lifestyle, which we combined with transcriptomic and proteomic analyses of parasitic and free-living female stages of Strongyloides spp. Together, this is a powerful way to discover the molecular adaptations to parasitism among these nematodes. We find that a preponderance of genes expanded in parasitic species are specifically used in the parasitic stages and are within genomic clusters, concentrated in regions of chromosome II. This is consistent with the idea that the within-host stages of parasitic nematodes deploy a specific biology that enables them to be successful parasites. The Strongyloides proteome and transcriptome have a limited overlap, as has been observed in other systems. For the Strongyloides clade we find that astacin and SCP/TAPS coding genes are prominent amongst parasitism-associated genes. Other parasitic nematodes appear to have expanded the number of protease coding genes in their genome, which also appear to be used predominantly during the within-host stages. In Strongyloides we have also found genomic clustering of these and other likely parasitism-associated genes, which is likely to have been initiated during the adaptation to parasitism, followed by subsequent repeated gene duplication, associated with adaptation to different hosts. This genomic arrangement may facilitate expression of a parasitic transcriptional program by these parasites. Operons have been demonstrated in Strongyloides, and it will be important to determine whether these parasitism associated genes are under operonic control. 
Strongyloides is a particularly amenable laboratory system: both S. ratti and S. venezuelensis can be maintained in the laboratory in their natural rat host, as well as in other rodents, and the human parasite S. stercoralis can also be maintained in the laboratory. In addition to providing a compelling model of the evolution of parasitism, Strongyloides and Parastrongyloides are, uniquely among parasitic nematodes, amenable to transgenesis[70-73], which will allow functional genomic studies, directed by our findings, to further explore the genetic basis of nematode parasitism.

Online Methods

Parasite material, sequencing and assembly

S. ratti, S. stercoralis, S. venezuelensis and S. papillosus larvae were obtained from fecal cultures of infected laboratory animals; for Parastrongyloides trichosuri and Rhabditophanes sp. KR3021 material was obtained from stages grown on agar plates. To produce the S. ratti reference genome, a combination of Sanger capillary, 454 and Illumina-derived sequence data was used, while data for the other species were generated using Illumina technology. The S. ratti genome was initially assembled using Newbler v.2.3[74] (for the capillary and 454 sequence data) and ABySS v.1.3.1[75] (for the Illumina data); Illumina paired-end reads were mapped to this with SMALT (Hannes Ponstingl, pers. comm.). The genomes of the other species, except S. venezuelensis, were assembled using a combination of SGA assembler[76] and Velvet[77], from 100 bp paired-end Illumina reads, produced from short (~500 bp) fragment[78] and 3 kb mate-pair libraries[79]. Illumina reads were used in the IMAGE[80] and Gapfiller[81] software to fill gaps, and in iCORN[82] to correct base errors. Gap5[83] was used to manually extend and link scaffolds using Illumina read pairs. Genetic markers[22] were mapped to the S. ratti assembly to order and orient scaffolds, and in S. papillosus to assign scaffolds to chromosomes and regions of putative chromosomal diminution. The S. venezuelensis genome was assembled using the Platanus assembler[84] and improved as described above for other species. The resulting v2 S. venezuelensis assembly was further scaffolded using an optical map produced using an Argus optical mapping platform (Opgen). CEGMA v2[85] was used to assess the completeness of each assembly. Assembled sequences were scanned for contamination from other species, using a series of BLASTX and BLASTP[86] searches against vertebrate and invertebrate sequence databases. Repeat sequences in the assemblies were characterized using RepeatModeler and TransposonPSI.
Mitochondrial genomes were assembled using MITObim assembler[87] with the C. elegans mitochondrial genes as seeds. The gene order of each assembly was confirmed by PCR. A mitochondrial protein-coding gene sequence phylogeny was constructed using RAxML v7.2.8[88].

Identifying regions that undergo chromatin diminution or belong to the X chromosome

To identify chromosomal regions that undergo chromatin diminution in S. papillosus, and scaffolds that belong to the X chromosome in S. ratti, S. stercoralis, and P. trichosuri, DNA of males and females from each species was sequenced and mapped to the appropriate reference genome using SMALT v0.7.4 (Hannes Ponstingl, pers. comm.). The read depth was calculated for each scaffold using the BedTools function genomecov[89], and all scaffolds were classified as diminished/X or non-diminished/autosomal based on differences in read coverage. Since males are hemizygous for the diminished region in S. papillosus[22], and for the X chromosome in the other species, a male:female read-depth ratio of 0.5:1 was expected in diminished or X scaffolds relative to autosomes, whereas in non-diminished/autosomal regions the ratio would be expected to be close to 1:1.
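The read-depth classification above can be illustrated with a minimal sketch. It assumes per-scaffold mean depths have already been computed for the male and female libraries, normalises by total mapped reads, and uses a cutoff of 0.75 (midway between the expected 0.5 and 1.0). The function name, the normalisation and the cutoff are illustrative assumptions, not details reported in the Methods.

```python
def classify_scaffolds(male_depth, female_depth, male_total, female_total, cutoff=0.75):
    """Label scaffolds by normalised male:female read-depth ratio.

    male_depth / female_depth: dict of scaffold -> mean read depth.
    male_total / female_total: total mapped reads per library, used to
    correct for different sequencing effort (an assumed normalisation).
    cutoff: hypothetical threshold between the expected ratios 0.5 and 1.0.
    """
    labels = {}
    for scaffold, m in male_depth.items():
        f = female_depth.get(scaffold, 0.0)
        if f == 0:
            # No female coverage: the ratio is undefined, so do not guess.
            labels[scaffold] = "unclassified"
            continue
        ratio = (m / male_total) / (f / female_total)
        if ratio < cutoff:
            labels[scaffold] = "diminished/X"
        else:
            labels[scaffold] = "non-diminished/autosomal"
    return labels
```

In practice the ratio distribution would be inspected before fixing a cutoff, since short or repeat-rich scaffolds can give noisy depth estimates.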

Gene prediction and functional annotation

Genes were predicted using Augustus[90] – with a training set of approximately 200–400 manually curated genes per species, aligned transcript data and S. ratti protein sequences as hints – supplemented with non-overlapping predictions from MAKER[91]. If there was more than one alternative splice pattern for a gene prediction in the combined Augustus/MAKER gene set we only kept the transcript corresponding to the longest predicted protein. Astacin gene models and a subset of SCP/TAPS gene models from S. ratti, S. venezuelensis and S. stercoralis were manually curated prior to phylogenetic analyses. A protein name was assigned to each predicted protein based on manually curated orthologs in UniProt[92] from selected species (human, zebrafish, Drosophila melanogaster, Caenorhabditis elegans, and Schistosoma mansoni orthologs) where possible. If a predicted protein was not assigned a protein name based on its orthologs, then a protein name was assigned based on InterPro[93] domains in the protein. Gene Ontology (GO) terms were assigned by transferring GO terms from human, zebrafish, C. elegans, and D. melanogaster orthologs using an approach based on the Ensembl Compara approach for transferring GO terms to orthologs in vertebrate species[35], but modified for improved accuracy in transferring GO terms across phyla. Manually curated GO annotations were downloaded from the GO Consortium website[94], and for a particular predicted protein in the present study, the manually curated GO terms were obtained for all its human, zebrafish, C. elegans, and D. melanogaster orthologs. From this set the last common ancestor term (in the GO hierarchy) was found for each pair of GO terms from orthologs of two different species (e.g. a C. elegans ortholog and a zebrafish ortholog) and then transferred to our predicted protein. GO terms of the three possible types (molecular function, cellular component and biological process) were assigned to predicted proteins in this way. 
Additional GO terms were identified using InterProScan[95].
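The pairwise last-common-ancestor step in the GO transfer described above can be sketched as follows. This is an illustration only: the GO hierarchy is a directed acyclic graph, so two terms can share more than one most-specific ancestor, and the sketch returns all of them. The toy term names, the `parents` mapping and the function names are assumptions made for the example, not part of the published pipeline.

```python
from itertools import product


def ancestors(term, parents):
    """Return the term itself plus all of its ancestors in a parent-pointer DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def last_common_ancestors(a, b, parents):
    """Most specific terms that are ancestors of (or equal to) both a and b."""
    common = ancestors(a, parents) & ancestors(b, parents)
    # Drop any shared term that is a strict ancestor of another shared term.
    return {t for t in common
            if not any(t != u and t in ancestors(u, parents) for u in common)}


def transfer_terms(ortholog_terms, parents):
    """ortholog_terms: dict of species -> set of curated terms on its orthologs.

    Only pairs of terms from two different species are compared.
    """
    transferred = set()
    species = sorted(ortholog_terms)
    for i, s1 in enumerate(species):
        for s2 in species[i + 1:]:
            for a, b in product(ortholog_terms[s1], ortholog_terms[s2]):
                transferred |= last_common_ancestors(a, b, parents)
    return transferred
```

With a toy hierarchy in which "zinc ion binding" and "calcium ion binding" are both children of "ion binding", a worm ortholog annotated with the first and a zebrafish ortholog annotated with the second would yield "ion binding" as the transferred term. A real implementation would probably also discard uninformative root terms.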

Gene orthology and species tree reconstruction

Eight outgroup species were used, encompassing four previously defined nematode clades[11] (clade I, Trichinella spiralis, Trichuris muris; clade III, Ascaris suum, Brugia malayi; clade IV, Bursaphelenchus xylophilus, Meloidogyne hapla; clade V, Necator americanus, C. elegans), together with the six species from the present study to construct a Compara database using the Ensembl Compara pipeline[35]. The database was used to identify orthologs and paralogs; gene duplications and gene losses; as well as gene families shared among the species, or sub-sets of the species, or specific to one species. 4,437 gene families were identified that contained just one gene from each species and that were present in at least ten of the 14 species (the six species from the present study and the eight outgroups). An alignment for the proteins in each family was built using MAFFT v6.857[96], poorly-aligning regions were trimmed using GBlocks v0.91b, and the remaining columns were concatenated. For each alignment, the best-fitting amino acid substitution model was identified as that minimising the Akaike Information Criterion from the set of models available in RAxML v8.0.24[88], testing models with both pre-defined amino acid frequencies and observed frequencies in the data, and all with the CAT model of rate variation across sites. A maximum likelihood phylogenetic tree was constructed based on the concatenated alignment, with each protein alignment an independent partition of these data, applying the best-fitting substitution model identified above to each partition. This inference used RAxML v8.0.24 with ten random addition-sequence replicates and 100 bootstrap replicates, and otherwise default heuristic search settings.
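Two of the selection steps above lend themselves to a short sketch: filtering gene families to those with at most one gene per species and at least ten species represented, and choosing a substitution model by minimum AIC (AIC = 2k - 2 ln L, where k is the number of free parameters). The data structures and function names are hypothetical, and the log-likelihoods would in practice come from the phylogenetics software rather than being supplied by hand.

```python
from collections import Counter


def single_copy_families(families, min_species=10):
    """Keep families with one gene per species, present in >= min_species species.

    families: dict of family_id -> list of (species, gene_id) members.
    """
    kept = []
    for family_id, members in families.items():
        per_species = Counter(species for species, _ in members)
        one_each = all(n == 1 for n in per_species.values())
        if one_each and len(per_species) >= min_species:
            kept.append(family_id)
    return kept


def best_model_by_aic(log_likelihoods, n_params):
    """Return the model name with the smallest AIC = 2k - 2 ln L.

    log_likelihoods: dict of model -> maximised log-likelihood.
    n_params: dict of model -> number of free parameters (k).
    """
    aic = {model: 2 * n_params[model] - 2 * lnl
           for model, lnl in log_likelihoods.items()}
    return min(aic, key=aic.get)
```

Running the model choice separately on each family alignment mirrors the per-partition model assignment described above.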

Analysis of intron-exon structure and synteny analysis

Introns that were present in two or more species were identified from gene structures and full gene nucleotide alignments of 208 single-copy orthologs using Scipio[97] and GenePainter[98]. The output from GenePainter was parsed into DOLLOP (PHYLIP package; Felsenstein, J.) to infer intron gain and loss on every node of the species tree using maximum parsimony. Whole-assembly nucleotide alignments were produced between S. ratti and the other five species using nucmer[99]. Each scaffold from the other species was assigned a chromosome based on its nucmer alignment to a S. ratti chromosome. To identify syntenic regions, conserved blocks of three or more consecutive orthologous genes in the same order and orientation were defined by DAGchainer[100], between the S. ratti reference and each of the other five species. To gain a high-level view of synteny, PROmer[101] was used to identify very highly conserved sequence matches, based on translated sequence, after which scaffolds from a particular species were ordered by matching to S. ratti chromosome and position in that chromosome, and the matches plotted using Circos[102].
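The idea of a syntenic block (three or more consecutive orthologous genes in the same order and orientation) can be illustrated with a deliberately strict sketch. It is not DAGchainer's algorithm, which scores chains and tolerates gaps; here a block is broken by any missing ortholog, change of scaffold, skipped gene or strand mismatch. The input layout and function name are assumptions for the example.

```python
def syntenic_blocks(ref_genes, ortholog_pos, min_genes=3):
    """Find runs of reference genes whose orthologs are strictly collinear.

    ref_genes: list of (gene_id, strand) in order along a reference chromosome.
    ortholog_pos: dict of reference gene_id -> (scaffold, gene_index, strand)
        for its ortholog in the other species.
    Returns a list of blocks, each a list of reference gene ids.
    """
    blocks, run, prev = [], [], None
    for gene, strand in ref_genes:
        hit = ortholog_pos.get(gene)
        extends = (
            hit is not None and prev is not None
            and hit[0] == prev[0]       # same scaffold as the previous ortholog
            and hit[1] == prev[1] + 1   # immediately next gene, same order
            and hit[2] == strand        # same orientation as in the reference
        )
        if not extends:
            # Close the current run and keep it if it is long enough.
            if len(run) >= min_genes:
                blocks.append(run)
            run = []
        if hit is not None and hit[2] == strand:
            run.append(gene)
            prev = hit
        else:
            run, prev = [], None
    if len(run) >= min_genes:
        blocks.append(run)
    return blocks
```

Wholesale inversions (reversed order with flipped strands) are not detected by this simplification.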

Transcriptome and proteome analyses

For S. ratti and S. stercoralis the transcriptomes were compared from the parasitic female, free-living female and third stage infective larvae (iL3s); we note that parasitic and free-living adult females will have eggs in utero. For S. ratti, free-living females were picked individually from cultures of S. ratti-infected rat faeces, from where iL3s were also collected; parasitic females were collected by dissection of S. ratti-infected rats[103]. Two biological replicates were collected for parasitic and free-living females. These samples were divided approximately equally and used for both transcriptomic and proteomic analysis. A single biological sample was used for iL3 transcriptomic analysis. RNA was extracted using Trizol, and poly(A) RNA was selected with Dynabeads, acoustically sheared and reverse transcribed to construct Illumina libraries, which were sequenced. For S. stercoralis we used previously published data[44]. RNA-seq data were analyzed using R v.3.0.2 and the Bioconductor package edgeR[104] to identify genes differentially expressed between all pairwise combinations of the three life-cycle stages. For S. ratti the proteome was also compared between the parasitic and free-living females. Equivalent samples of the material collected for the transcriptome analyses were used. Protein was extracted by freeze-thawing, mechanical grinding and chemical extraction and digested with trypsin. The resulting peptide mixture was analyzed by liquid chromatography-mass spectrometry. Proteins were identified and quantified using Progenesis. For downstream analyses at least two unique peptides were required to identify proteins. Protein abundance (iBAQ) was calculated from Progenesis. For both the transcriptome and proteome data, GO analysis was performed in R using topGO v.2.16.0 and Fisher’s exact test.
For the analysis of the ES proteome[65], converted raw spectral files were analysed by the Mascot search engine, where <1% FDR and a minimum of two significant peptides were required to identify proteins. Protein abundance was calculated from Mascot algorithm emPAI.

Astacins and SCP/TAPS

Genes encoding astacins and SCP/TAPS were identified using InterProScan. For these gene families we aligned amino acid sequences of all S. ratti and eight outgroup species’ members using MAFFT[96]. The alignments were edited with TCS[105] using the weighted option and the distance matrix of the new alignment was calculated using ProtTest[106]. The phylogenetic tree was constructed by maximum likelihood using RAxML[88] with 100 bootstrap replicates.

Gene clusters

Clusters of genes were identified as three or more adjacent genes upregulated in the same stage of the life cycle. The members of a cluster were considered to share a common gene family where ≥50 % of genes belonged to the same Compara gene family. To investigate the number of clusters expected by chance for a particular life cycle stage, for n genes upregulated in a particular stage, we randomly selected n genes from the genome, and calculated the number of clusters seen for the n random genes; this was repeated 1000 times and the mean value calculated.
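The cluster definition and the randomisation described above can be sketched as follows. The sketch treats the genome as a single ordered list of genes and ignores scaffold and chromosome boundaries, which is a simplifying assumption. The function names and the fixed random seed are illustrative choices, not details from the Methods.

```python
import random


def count_clusters(flags, min_size=3):
    """Count runs of at least min_size adjacent upregulated genes.

    flags: booleans in genome order, True where a gene is upregulated
    in the life-cycle stage of interest.
    """
    clusters = 0
    run = 0
    for upregulated in flags:
        if upregulated:
            run += 1
        else:
            if run >= min_size:
                clusters += 1
            run = 0
    if run >= min_size:
        clusters += 1
    return clusters


def expected_clusters(n_genes_total, n_upregulated, n_iter=1000, seed=0):
    """Mean number of clusters when n_upregulated genes are drawn at random."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_iter):
        chosen = set(rng.sample(range(n_genes_total), n_upregulated))
        flags = [i in chosen for i in range(n_genes_total)]
        total += count_clusters(flags)
    return total / n_iter
```

Comparing the observed `count_clusters` value for a stage with `expected_clusters` for the same number of upregulated genes indicates whether the observed clustering exceeds what random placement would give.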