
The draft genome of the parasitic nematode Trichinella spiralis.

Makedonka Mitreva, Douglas P Jasmer, Dante S Zarlenga, Zhengyuan Wang, Sahar Abubucker, John Martin, Christina M Taylor, Yong Yin, Lucinda Fulton, Pat Minx, Shiaw-Pyng Yang, Wesley C Warren, Robert S Fulton, Veena Bhonagiri, Xu Zhang, Kym Hallsworth-Pepin, Sandra W Clifton, James P McCarter, Judith Appleton, Elaine R Mardis, Richard K Wilson.

Abstract

Genome evolution studies for the phylum Nematoda have been limited by focusing on comparisons involving Caenorhabditis elegans. We report a draft genome sequence of Trichinella spiralis, a food-borne zoonotic parasite, which is the most common cause of human trichinellosis. This parasitic nematode is an extant member of a clade that diverged early in the evolution of the phylum, enabling identification of archetypical genes and molecular signatures exclusive to nematodes. We sequenced the 64-Mb nuclear genome, which is estimated to contain 15,808 protein-coding genes, at ∼35-fold coverage using whole-genome shotgun and hierarchal map-assisted sequencing. Comparative genome analyses support intrachromosomal rearrangements across the phylum, disproportionate numbers of protein family deaths over births in parasitic compared to a non-parasitic nematode and a preponderance of gene-loss and -gain events in nematodes relative to Drosophila melanogaster. This genome sequence and the identified pan-phylum characteristics will contribute to genome evolution studies of Nematoda as well as strategies to combat global parasites of humans, food animals and crops.


Year:  2011        PMID: 21336279      PMCID: PMC3057868          DOI: 10.1038/ng.769

Source DB:  PubMed          Journal:  Nat Genet        ISSN: 1061-4036            Impact factor:   38.330


Currently no complete genome sequence information exists from lineages spanning the phylum Nematoda (Supplementary Fig. 1). Yet, such information is essential to understanding evolution of the Nematoda analogous to the way that a basal chordate informed vertebrate evolution1. To this end, the genome sequence of Trichinella spiralis, a food-borne, zoonotic parasite, was generated to reveal molecular characters and evolutionary trends among this organism, evolutionarily distant parasitic and non-parasitic nematodes, and a member of the next closest sequenced relatives, the arthropods. In so doing, commonalities that link nematodes to other Metazoa were identified, as well as distinctions that define the Nematoda and differentiate T. spiralis from other species investigated. The Trichinella assembly is 64 million base pairs in length and encodes at least 15,808 proteins, which makes this genome substantially smaller than that of the prototypical nematode, Caenorhabditis elegans. Trichinellosis is a worldwide zoonotic disease. The nematode Trichinella spiralis, the most common cause of human trichinellosis, is a member of a clade that diverged early in the evolution of the Nematoda. It differs substantially in biological and molecular characters from other crown groups2–4. The lineage giving rise to the genus Trichinella last shared a common ancestor approximately 275 million years ago (Lower Permian Period) whereas the diversification of extant Trichinella species occurred as recently as 16–20 million years ago (Miocene Epoch)5. The life-cycle of Trichinella spp. (Supplementary Fig. 2) begins when muscle tissue containing first stage larvae (ML) is ingested by the new host. The ML rapidly develop to adults in the intestines where they mate and produce newborn larvae (NBL). The NBL migrate from the intestines through the lymphatic system and eventually to the blood where they search for striated skeletal muscle cells to invade, complete the cycle and become infectious.
Intense inflammation is a primary cause of disease and involves myositis, myocarditis and encephalitis, the intensity of which depends on the number of parasites ingested. Currently, the genus consists of 8 distinct species and/or genotypes that are further categorized as encapsulated or non-encapsulated predicated upon the formation of a collagen envelope around the infected muscle cell. This capsule is believed to be a host-derived structure induced only by species that infect placental mammals and is unique to this genus. In addition to the formation of a collagen capsule, and contrary to most other parasitic nematodes, T. spiralis exhibits little host specificity, completes its entire life cycle in a single host, does not have a free-living stage, and lives as an intracellular parasite within a single striated muscle cell. As such, this genus presents biological characteristics that markedly differ from what is common among most other nematodes. Herein we compared molecular characteristics of nematodes and other metazoans using the entire T. spiralis genome. The comparative approach identified conserved protein and gene sequences with apparent archetypical standing for the phylum Nematoda. We found that intrachromosomal rearrangements were common throughout the phylum; however, this was in contrast to other characters such as protein family deaths and births which showed a clear demarcation between parasitic and a non-parasitic nematode. In addition, unlike Drosophila melanogaster the levels of gene loss and gain in each nematode species indicate that these events may have played a substantially larger role in the evolution of this phylum. The identification of these and other conserved characteristics, predicated in part upon this work, will advance more targeted research on pathogens from a phylum harboring thousands of pathogens that infect humans, animals and plants. 
The advances may one day provide holistic strategies to treat and control diseases caused by pathogens from across the Nematoda.

RESULTS

Sequencing, assembly and gene organization

Data were generated from whole-genome shotgun sequencing and hierarchal map-assisted sequencing6. The assembly totaled 64 Mb (Supplementary Note and Supplementary Table 1), which is in line with recent genome size estimates made by flow cytometry (1C = 71 Mb)6–7. The data provided coverage of 35-fold, with 15% of the supercontigs encompassing 90% of the genome. The T. spiralis fingerprint clone map enabled construction of nine ultracontigs comprised of 69 supercontigs representing 49 Mb or 76% of the genome. The repeat content of the T. spiralis genome is estimated at 18%. The repeats have a low GC content (27%) relative to the genome overall (34%) and to protein coding regions (43%). The 15,808 protein-coding sequences occupy 26.6% of the genome at an average density of 272 genes per Megabase (Mb). Although 15% of C. elegans genes are organized in operons8, spatial relationships of genes in T. spiralis do not readily indicate the existence of operons (Supplementary Note). This observation validated prior studies indicating similar findings4. As such, the existence of operons in this nematode remains an open question. Further, T. spiralis lacks both the canonical SL1 trans-spliced leader found in most nematodes and the SL2 trans-spliced leader that is spliced onto transcripts from downstream genes in C. elegans operons. To date, at least 15 distinct spliced leaders encoded by 19 SL RNA genes have been identified in T. spiralis4; however, these putative spliced leaders exhibit sequence variability at nearly all base positions, and were found to be present in only 1% of the cDNAs examined. It is likely, therefore, that the canonical SL1 and SL2 spliced leader sequences were not part of the genetic repertoire in nematodes that diverged early in the evolution of the Nematoda. This hypothesis is supported in part by our inability to identify canonical SL1 and SL2 sequences among Trichuris muris ESTs as well (data not shown).
After comparison to an extensive collection of proteins from other species, 45% (7,251) of the predicted protein coding genes were T. spiralis specific, of which 12% had EST confirmation (Supplementary Fig. 3). The amino acid (AA) composition of predicted proteins in T. spiralis is similar to that observed in other nematodes9, organisms (Supplementary Table 2), and taxa10. In agreement with previous studies11, nematodes show a correlation between AA usage and the degree of codon degeneracy (R=0.74).
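The relationship between amino acid usage and codon degeneracy reported above can be illustrated with a short calculation. The following is only a minimal sketch, not the authors' pipeline: it assumes the standard genetic code, a plain Pearson correlation, and a hypothetical helper `usage_vs_degeneracy` that takes a list of protein sequences as strings.

```python
from collections import Counter
from math import sqrt

# Number of synonymous codons per amino acid in the standard genetic code
# (61 sense codons in total).
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def aa_frequencies(proteins):
    """Return the fraction of each of the 20 standard amino acids."""
    counts = Counter(aa for seq in proteins for aa in seq if aa in DEGENERACY)
    total = sum(counts.values())
    return {aa: counts.get(aa, 0) / total for aa in DEGENERACY}


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def usage_vs_degeneracy(proteins):
    """Correlate amino acid frequency with the number of synonymous codons."""
    freqs = aa_frequencies(proteins)
    amino_acids = sorted(DEGENERACY)
    return pearson_r(
        [DEGENERACY[aa] for aa in amino_acids],
        [freqs[aa] for aa in amino_acids],
    )
```

Applied to a predicted proteome, `usage_vs_degeneracy` returns a single coefficient comparable in spirit to the reported R value; the exact statistic used in the study may differ.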

Genome evolution

The availability of a genome from a member of the Dorylaimia expanded our abilities to evaluate genome evolution among highly divergent crown clades and to potentially identify factors underlying lineage diversification. We evaluated changes associated with nematode evolution in relation to: i) genome organization; ii) births and deaths of gene families; iii) gene duplications and deletions that have occurred within gene families; and iv) linear organization of orthologous genes. Organizational characteristics were evaluated by comparing the genomes of T. spiralis and C. elegans. The number of predicted genes in T. spiralis is notably lower than the 20,140 genes identified in C. elegans even though the two genomes exhibit similar repeat content and gene density. A comparison of approximately 3,400 predicted orthologous genes (based on reciprocal best BLAST hits) showed that T. spiralis has a significantly shorter average intron size (191 bp vs. 391 bp, P=6.5e–69), while the average exon size is relatively similar for the two species (179 bp for T. spiralis and 226 bp for C. elegans, P=7.0e–3). Focusing only on predicted orthologous genes with 20 or more exons, the mean total length for all exons was significantly higher in C. elegans (P=0.001). Comparisons of Pfam domains contained in orthologous pairs showed C. elegans had significantly more domains compared to the orthologous T. spiralis genes (876 vs. 755, P<0.01). These differences coincide with the smaller size of the T. spiralis genome; however, we cannot rule out the possibility of higher numbers of gene fragments in T. spiralis resulting from less refined genome annotation. Delineating gene family emergence and extinction within phylogenetically related organisms can identify molecular determinants that underlie species (and pathogen) adaptation and lineage or species evolution. Such an approach has been used in analyzing nematode ESTs12–14.
Here we measured potential emergence and extinction events of protein families across the Nematoda. The analysis included species from four major lineages that collectively span the phylum (C. elegans, Meloidogyne incognita15, Brugia malayi16 and T. spiralis). These species represent nematodes that are non-parasitic, parasitic in plants, and parasitic in animals, respectively, thus representing diverse trophic ecologies. Arthropod (Drosophila melanogaster17) and yeast (Saccharomyces cerevisiae18) species were used as outgroups. Markov clustering19 of the complete protein catalog (87,406 proteins) comprising all six species generated 12,163 protein families (Supplementary Table 3). Inter-specific protein families overlaid onto species phylogeny identified 702 protein families at the node between Nematoda and the outgroups (Fig. 1a and Supplementary Table 4). Of these nematode families, 274 families were common among all four members of the Nematoda. We screened the genes in the 274 core nematode group (1,990 genes) against all available nematode ESTs/cDNAs and found that 73% shared homology to nematode transcriptome data from 27 nematode genera, and only 5% shared sequence homology to arthropods using the same cutoff value. These numbers do not preclude gains that may have occurred before the appearance of the Nematoda or gains relative to Drosophila that may still be present in other arthropods. In contrast, 88 protein family deaths were identified as common among the four nematodes relative to D. melanogaster. Protein family deaths outnumbered births for all three parasitic species, whereas in the non-parasitic species C. elegans, births outnumbered deaths four to one. The methods utilized here will allow future assessment of this tendency with availability of additional genomes from other parasitic and non-parasitic nematodes. Emergence of new protein families was observed in all nematode lineages, albeit less so for B. malayi.
Accordingly, it is now possible to explore the relevance of protein families identified in the evolution of lineages within the Nematoda and across phyla.
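The logic of overlaying clustered protein families onto the species phylogeny can be sketched as a presence/absence classification. This is a simplified illustration and not the reconstruction used in the study: the category names, the function names and the rule that a candidate death requires presence in both outgroups are assumptions made here for the example, and the toy families are invented.

```python
from collections import Counter

NEMATODES = frozenset({"C. elegans", "M. incognita", "B. malayi", "T. spiralis"})
OUTGROUPS = frozenset({"D. melanogaster", "S. cerevisiae"})


def classify_family(species_present):
    """Assign a protein family to a coarse category from its species content."""
    present = set(species_present)
    in_nematodes = present & NEMATODES
    in_outgroups = present & OUTGROUPS
    if in_nematodes == NEMATODES and not in_outgroups:
        # Present in all four nematodes, absent from both outgroups.
        return "nematode_core"
    if in_nematodes and not in_outgroups:
        # Present in some nematodes only.
        return "nematode_restricted"
    if not in_nematodes and in_outgroups == OUTGROUPS:
        # Assumed rule: found in both outgroups, missing from every nematode.
        return "candidate_death_in_nematodes"
    return "other"


def summarize(families):
    """Count families per category; `families` maps family id -> species set."""
    return Counter(classify_family(species) for species in families.values())


# Invented toy input, for illustration only.
toy_families = {
    "fam_A": NEMATODES,
    "fam_B": {"T. spiralis", "B. malayi"},
    "fam_C": OUTGROUPS,
    "fam_D": NEMATODES | OUTGROUPS,
}
```

With real Markov-clustering output, each family would map to the set of species contributing at least one member, and lineage-specific births and deaths would be placed on individual branches rather than in these coarse bins.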
Fig. 1

Protein and gene family changes associated with the origin and evolution of the Nematoda. (a) Protein family changes. At the branch of each lineage, the ‘+’ number indicates family birth events and the ‘-’ number indicates family death events represented by all members indicated for that lineage. For example, there are 702 protein family births ancestral to the phylum Nematoda and 88 protein family deaths in common among the four nematodes by comparison to arthropods (represented by D. melanogaster). These events were reconstructed from 12,206 inter-specific orthologous families (63,273 proteins). (b) Gene duplications and losses over the evolution of the common protein families. The gene duplication and loss events were reconstructed using 858 orthologous multi-member protein families (containing 8,260 proteins) conserved among all 6 species. At the branch of each lineage, the ‘+’ number indicates the number of gene duplication events and the ‘-’ number indicates the number of gene loss events for that lineage.

Similarly, quantitative changes in protein family members (duplications and deletions) can reflect evolutionary determinants of lineage and species diversity. We evaluated 858 families (8,260 genes) common to the four nematode species and two outgroup species defined above (Fig. 1b); 674 families had no obvious duplications or deletions, 70 had only deletions, 105 had only duplications and nine had both. Nematode species had a higher number of events compared to D. melanogaster (Fig. 1b). Among the nematodes, M. incognita had the highest number of both duplications and deletions, likely because 30% of its genome is duplicated, resulting in more species-specific events15. An example for T. spiralis involves the secreted DNase II-like protein family, a member of which has been evaluated as a vaccine candidate20 and implicated in host-parasite interactions. The genome shows more extensive expansion of this family (estimated 125 genes) than previously realized (Supplementary Note and Supplementary Fig. 4). To provide additional examples, we compared protein families in C. elegans with sequence homologues in T. spiralis. Ten families were relatively expanded and five families were contracted in T. spiralis (P<0.001) (Supplementary Table 5). These families can be grouped into i) those present prior to the separation of nematodes and arthropods (nine families) and ii) those putatively born coincident with this separation (six families), and possibly the origin of nematodes. The six protein families in this latter group included four that are relatively expanded in T. spiralis: a retrotransposon (2:201, Ce:Ts), a translation initiation factor 2C, putatively related to lipid metabolism (2:140, Ce:Ts), a zinc finger C2H2 type protein (1:14, Ce:Ts), and a hypothetical protein (1:44, Ce:Ts) associated with defective egg laying in C. elegans. Two protein families are relatively contracted in T. spiralis: a major sperm protein (33:1, Ce:Ts), and a protein of unknown function, DUF1647 (18:1, Ce:Ts). Comparisons of orthologous protein families outlined in sections ii and iii facilitated assessment of a nematode genome (T. spiralis) from a basally positioned clade (clade 2), with those from highly divergent clades (clades 8, 9, 12)21 and an outgroup member (D. melanogaster). Results consistently demonstrated similar and extensive levels of disparity in orthologous family sizes between T. spiralis and either C. elegans or D. melanogaster, while members of clades 8, 9, and 12 showed higher levels of shared attributes with C. elegans only (Fig. 2). Information in the next section provides independent measures, based on genome organization, that support these data, which were previously indicated by rRNA sequence comparisons21.
Fig. 2

Comparison of orthologous protein families among nematodes that span the phylum. Orthologous families comprised of each of the three parasites and D. melanogaster and C. elegans are plotted separately. The size of the dot represents the size of the orthologous family; the position represents the composition of the family based on the three represented species. With the assumption that evolutionarily close species have similar orthologous family size (fewer duplications and deletions), these plots illustrate that T. spiralis is equally distinct from both C. elegans and D. melanogaster while the two other parasites share greater commonality with C. elegans. P-values (derived using Chi-square test in pair-wise plot comparison) indicate a greater number of families present in C. elegans compared to D. melanogaster, and show that significantly fewer families are biased to C. elegans when T. spiralis is present in the orthologous family

Next we evaluated the nematode genomes across the phylum regarding extent and limits to evolutionary changes and functional associations that may depend on gene arrangements. Comparisons between C. elegans and B. malayi (~350 million years of separation) indicated that intra- rather than inter-chromosomal rearrangements preferentially characterize genome evolution evident between these species16. We used the T. spiralis genes organized on the six longest ultracontigs to extend this analysis. As for B. malayi, T. spiralis genes showed macrosyntenic relationships with predicted orthologs from C. elegans (P<0.0001), albeit to a lesser extent (Fig. 3a). Because only females of T. spiralis carry two copies of the X chromosome (female 2n=12 [XX], male 2n=11 [XO]), the correlation coefficient was also calculated with the X chromosome excluded. This resulted in improved support for macrosynteny. This non-random distribution of orthologous genes is consistent with that observed in several nematode species22–24.
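One common way to ask whether orthologs on an ultracontig are distributed non-randomly across C. elegans chromosomes is a chi-square statistic on a contingency table of counts. The sketch below is an assumption-laden illustration, not necessarily the test behind the reported P value: rows stand for ultracontigs, columns for chromosomes, and the function name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for a table of counts.

    `table` is a list of rows (for example one row per ultracontig) holding
    observed ortholog counts per column (for example per chromosome).
    Every row and column total is assumed to be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof
```

A large statistic relative to its degrees of freedom indicates that orthologs from a given ultracontig concentrate on particular chromosomes, which is the pattern described as macrosynteny.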
Fig. 3

Genes from T. spiralis show macrosyntenic relationships with predicted orthologs from other nematodes. (a) T. spiralis genes on the six largest ultracontigs with orthologs in C. elegans, colored to indicate the C. elegans chromosome on which the ortholog is located. The correlation was strong (R=0.95, R=0.76 and R=0.99), and even stronger when the X chromosome was excluded (R=0.97, R=0.97 and R=0.99). As example, R=0.95 indicates that both T. spiralis Ultracontigs 1 and 4 are strongly associated with one predominant C. elegans chromosome, Chr III, and not a result of random gene distribution. (b) Orthologous segments shared among nematode species shown on the C. elegans chromosomes. Red segments are considered to be ancestral orthologous segments among nematodes. The size of segments corresponds to the C. elegans orthologous segment that might be different than the orthologous segment in the other two species (Supplementary Table 7).

Assuming a constant tendency towards randomness, genome re-assortment is expected to occur at a rate commensurate with evolutionary distance. Using syntenic blocks of C. elegans for standardization, we measured dynamics of nematode chromosome re-assortment among multiple nematode pairs25. The highest syntenic conservation score was observed between C. elegans and C. briggsae (0.752), less so between C. elegans and B. malayi (0.508), and the least between C. elegans and T. spiralis (0.28) (Supplementary Table 6). Because sequences for non-C. elegans genomes have varying levels of fragmentation, it was not possible to use entirely complementary gene sets in the pairwise comparisons (orthologous genes on different scaffolds were not considered). Nevertheless, the relative syntenic conservation values were consistent with the perceived evolutionary distance of the species investigated. The approximately 72% of the T. spiralis genome organization that lacked demonstrable congruence with the C. elegans genome provided a tentative estimate on the limits of evolutionary diversity of this kind across the Nematoda. Despite an anticipated tendency toward randomization, existence of syntenic blocks suggests functional constraints to genome evolution. This possibility was investigated with a high-level orthology map created with coding exons as anchors26 from C. elegans, B. malayi and T. spiralis. We identified 196 orthologous segments (Supplementary Table 7); 155 were shared between C. elegans and B. malayi, five were shared between B. malayi and T. spiralis and 36 segments were shared among all three species, putatively defined as ancestral orthologous segments. No segments were shared exclusively between C. elegans and T. spiralis (Fig. 3b). The results are again consistent with the perceived evolutionary distance among these organisms based on all pairwise comparisons. The genes within the 36 ancestral segments accounted for ~50% of the genes in all segments for C. elegans and B. malayi, but 97% of the genes in T. spiralis. Over half of the ancestral segments are located on C. elegans chromosomes III and IV. These ancestral segments tended to localize more centrally in the chromosomes (P=0.001)27. This tendency was also suggested by the two-species orthologous segments, although less evident (different at P=0.1). The overall patterns highlighted likely reflect basic properties that influence the evolution of genome organization in nematodes. Nematode species from the lineages evaluated span recent and early radiation events within the phylum Nematoda. Hence, the quantitative and qualitative measures of genomic diversity will help to define both the extent and limits of genome organizational diversity across the Nematoda and help clarify molecular determinants of nematode lineages and species. Nevertheless, the results based on Markov clustering of predicted orthologous protein families will exclude other forms of diversity such as nucleotide substitutions, insertions and deletions. As such, the documented differences reflect but a small component of the total genomic diversity within the Nematoda.

Molecular determinants archetypical of the phylum Nematoda

Molecular determinants for traits that characterize the archetypical nematode have been evaluated12,14. To identify proteins and protein sequences that are broadly conserved among the four nematodes that span the phylum, we further compared worm-derived proteins to those of arthropod and yeast outgroups. The 12,163 orthologous protein families were partitioned into: 1) orthologous protein sequences that are broadly conserved among all of the four nematode species and any of the two outgroups (2,517 families, 14,801 nematode proteins); 2) those conserved exclusively among the four nematodes (274 families, 1,990 nematode proteins); and 3) those that are conserved between any nematode and any outgroup (4,980 families, 30,729 proteins) (Supplementary Table 3). We evaluated 328 protein families represented by a single copy gene in all six species by querying the C. elegans database for RNAi phenotypes. The exclusion of multi-member protein families from this evaluation precluded cases where compensation by other family members might obscure RNAi phenotypes. Of the 328 C. elegans genes, 232 (71%) had associated RNAi phenotypes (significant enrichment at P<0.00001) consistent with a gene set essential to core cellular and biochemical functions of eukaryotes (Supplementary Table 8). Of the 2,517 nematode protein families (Fig. 4), 274 were detected in all four nematodes only (see Genome evolution section ii) and were collectively referred to as Nematode Orthologous Groups (NOGs) (Supplementary Table 9 and Supplementary Fig. 5). These NOGs were significantly enriched (P<0.00001) for genes with RNAi phenotypes in C. elegans and likely represent a gene set essential to core cellular and biochemical functions of nematodes.
Fig. 4

Distribution of orthologous families among the four nematode representatives spanning the phylum Nematoda. The lineages represented in the Nematoda are: Rhabditida (C. elegans), Tylenchina (M. incognita), Spirurina (B. malayi) and Dorylaimia (T. spiralis). The trophic ecology of each of the 4 nematode species used in this study for pan-phylum analysis is indicated next to the species name. The 2,517 orthologous groups are conserved in all four nematodes. Sixty-four orthologous groups are conserved among the parasitic species, but not the free-living C. elegans. Enrichment of functional categories related to certain orthologous groups compared to the complete functional repertoire for the 4 nematode species is presented in Supplementary Table 8 and Supplementary Table 9.

The 274 NOGs encoded 189 multi-copy gene families and 85 single copy gene families (scNOGs). Sixty-eight of the scNOGs had RNAi information and 21 had observable RNAi phenotypes (Table 1 and Supplementary Table 9). There was no enrichment of RNAi phenotypes in the C. elegans genes in scNOGs compared to all C. elegans genes (p<0.05). Nevertheless, among the 21 genes with phenotypes, eight had known tissue localization and only one was neuronal. Of the remaining 64 genes, 17 had known expression patterns of which 10 were neuronal. Therefore, the biological significance of the scNOGs may be underestimated by RNAi information because nervous tissue is relatively insensitive to RNAi (e.g.28).
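Enrichment statements of this kind are often evaluated with a one-sided hypergeometric test. The following is a minimal sketch under that assumption; the source does not state which test was used, and the function name and argument names are hypothetical. Genome-wide background counts are not given in the text, so any values supplied for `population` and `successes` would be the reader's own.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, observed):
    """One-sided hypergeometric P value, P(X >= observed).

    population: genes with RNAi information in the background set
    successes:  background genes that show an RNAi phenotype
    draws:      genes in the set of interest (for example scNOG genes)
    observed:   genes in that set that show an RNAi phenotype
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total
```

For the scNOGs, `draws` would be 68 and `observed` 21, with the two background values taken from the RNAi annotation of all C. elegans genes.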
Table 1

Pan-phylum single-copy genes with C. elegans ortholog having severe RNAi phenotype

T. spiralis gene | Ortholog in B. malayi | Ortholog in C. elegans | Ortholog in M. incognita | Descriptor (a) | C. elegans RNAi (b) | TM (c) | SP (d)
Tsp_14949 | 14972.m07791 | F39H12.2 | Minc14650 | Hypothetical, WD40 repeat-like | Emb | - | -
Tsp_03879 | 14330.m00196 | F28F8.6 | Minc16561 | Machado-Joseph protein | Emb | - | -
Tsp_09591 | 14058.m00575 | M05B5.2 | Minc04214 | Hypothetical protein | Lon Unc thin Gro | Y | Y
Tsp_02563 | 14972.m07706 | F53B6.1 | Minc01712a | Tetraspanin family protein | Lva Dpy Bmd Bli | Y | -
Tsp_07476 | 14379.m00149 | W01A8.4 | Minc03402 | NADH dehydrogenase (ubiquinone) 1 beta subcomplex 4 | Lva Emb Bmd | Y | -
Tsp_05829 | 14961.m05209 | ZK899.2 | Minc06660 | Hypothetical protein | Lva Emb Lvl Gro | Y | -
Tsp_10274 | 13068.m00024 | F44G4.2 | Minc14463 | NADH dehydrogenase (ubiquinone) 1 beta subcomplex 2 | Lva Emb RBS | - | -
Tsp_05872 | 14972.m06963 | ZK682.5 | Minc05446a | Leucine Rich Repeat family protein | Lva Gro | Y | Y
Tsp_05373 | 13756.m00013 | C45B2.7 | Minc06522 | Patched related family protein 4 | Prl Unc Lva Dpy Emb Lvl | Y | -
Tsp_09505 | 14968.m01485 | W08F4.6 | Minc18112 | Hypothetical protein | Prl Unc Lva Lvl Bmd Ela | - | -
Tsp_10877 | 14992.m10900 | T19B10.2 | Minc10356 | Hypothetical protein | Prl Unc Tsla Rup Gro | - | Y
Tsp_11032 | 13644.m00292 | C09H10.7 | Minc15358 | Hypothetical protein | Pvl Da, Emb Stp | - | -
Tsp_10369 | 13847.m00044 | F10E7.6 | Minc16059 | Hypothetical protein | Sck Clr Ela Gro | Y | -
Tsp_01966 | 14972.m07319 | W04G3.2 | Minc11161 | Lipocalin protein | Unc Lva Lvl Bmd | - | Y
Tsp_10030 | 14961.m05181 | Y8G1A.2 | Minc07816 | Innexin membrane protein | Unc Rup Stp Gro | Y | -

(a) Descriptor, annotation based on KEGG Orthology and Interpro.

(b) RNAi phenotype description (www.wormbase.org).

(c) TM, transmembrane.

(d) SP, signal peptide for secretion.

Nematode-specific amino acid sequences in scNOG proteins may have practical significance for functional investigations. As such, we evaluated the scNOGs sequences for molecular features by forced alignment with non-nematode homologs, i.e. human, chicken, frog and zebrafish, associated with the same Pfam entries. The scNOGs were categorized into two groups: i) those involving nematode-specific insertions and deletions (InDels) (e.g.29) relative to non-nematode homologues (15 proteins) (Supplementary Fig. 6a) and ii) those involving unique patterns of conservation independent of InDels (70 proteins) (Supplementary Fig. 6b and Supplementary Fig. 7) (e.g.14). Sequence variation exclusive of conserved motifs was generally higher among the nematode proteins than among the vertebrate proteins, even though evolutionarily, each comparison spanned similar predicted lengths of time, consistent with a previous report30 (Supplementary Fig. 8). Therefore, pan-Nematoda specific conservation has persisted despite the high evolutionary rate in adjacent sequences of these NOGs. The nematode specific amino acid sequences in NOGs may have fundamental importance across the Nematoda. For instance, the predicted subunit of an electron transfer complex (Supplementary Fig. 6a) has well defined insertions, and a severe RNAi phenotype is associated with the C. elegans member of this NOG. As such, comparative information from the vertebrate homolog may guide experiments to dissect the functional roles of the NOG insertions. Furthermore, a sequence containing amino acid insertions in one protein interaction partner may be compensated by deletions in the other protein interaction partner. We indeed identified that the interaction partner of the complex to which that protein belongs (long chain Acyl-CoA dehydrogenase, interaction that has been confirmed experimentally31) has deletions in the non-nematode protein (Supplementary Note, Supplementary Fig. 9 and Supplementary Fig. 10).
This series of analyses identified genes and proteins that may have fundamental importance to all nematode species. Two categories of nematode-specific sequences are responsible for delineation as scNOGs. Therefore, scNOGs, and most likely other NOGs, contain pan-phylum nematode-specific sequences incorporated either into universally conserved protein structures or into protein structures that are unique to the Nematoda. Evidence reflecting biological significance highlights the potential for NOGs to serve as targets for control of parasitic nematodes that infect humans, animals and plants, while potentially limiting risk to the host.

Nematode core- and phylogenetically-restricted functional categories

A question of central importance is whether or not parasitic nematodes (and potentially other parasites) have independently evolved, or preferentially retained common solutions to challenges of parasitism despite their exploitation of widely divergent trophic ecologies (e.g.32). Much interest in this context has focused on: i) secretory proteins, ii) molecular functions, and iii) biochemical pathways that are conserved or taxonomically restricted. Although not all secretory proteins from parasitic nematodes are involved in interactions with the host, constituents of this protein category are prime candidates for examining the host-pathogen interface. Here, we sought proteins that are broadly conserved among nematodes, or among parasitic nematodes. These proteins were sorted into orthologous protein groups shared among species representing diverse parasite lineages and then sub-grouped into those with secretory peptides (Supplementary Fig. 11). Predicted secretory protein orthologs were interrogated with previously identified secreted proteins using an orthogonal approach, based on excretory-secretory products in T. spiralis and B. malayi identified by tandem mass spectrographic analysis33–34. Only two proteins were identified as secretory and common to each parasite member (including vertebrate and plant parasites), but absent from the non-parasitic C. elegans: i) a serine peptidase member of the prolyl oligopeptidase family that can be critical for invasion of the mammalian host cells by protozoan parasites35; and ii) a cyanate hydratase that in other organisms hydrolyzes and detoxifies environmental cyanate36. Our results suggest that the number of conserved secretory proteins broadly involved in nematode interactions with hosts may be relatively few.
Nevertheless, this number is likely to increase when reducing our analysis to sub-groupings of parasitic nematodes, as we found when proteomes for any two of the three parasitic species were interrogated here. Among the T. spiralis genes analyzed, 35% (5,456/15,808) could be assigned one or more GO terms. Putative molecular functions were assigned to 90% of this 35%; biological processes to 68% and cellular components to 45%. The remaining two-thirds of genes in T. spiralis represent uncharacterized and possibly novel functions in the parasite. A set of 25 molecular functions were significantly enriched (at P<0.01) or depleted when intra- or inter-specific orthologous groups were compared to the complete repertoire of GO terms for T. spiralis (Supplementary Table 10 and Supplementary Fig. 12). Among the orthologous families confined only to T. spiralis and C. elegans, rhodopsin-like receptor activity was enriched, a possible consequence of the number of genes involved in G-protein coupled receptor protein signaling pathways. In orthologous groups with members only from T. spiralis and B. malayi, the enriched category involved steroid binding proteins. Among a total of 71 molecular GO categories identified, 42 were enriched and 29 were depleted in the 2,517 nematode orthologous families (including C. elegans) by comparison to the complete proteomes of the four nematode species (Supplementary Table 11). When considering the 64 orthologous groups conserved among the three parasitic nematodes, nine GO categories were statistically enriched or depleted; ATP-binding was the only depleted category, whereas DNA- and RNA-binding, aspartic-type endopeptidase and prolyl oligopeptidase activities were among those enriched (Supplementary Table 12). Therefore, commonalities in molecular functions may exist even among parasites from widely diverse ecological niches.
Further light will be shed on genetic associations among parasitic and non-parasitic nematodes as more robust comparisons among species from each category begin to surface. Guided by the possibility that parasitic nematodes undergo reductive genome evolution due to reliance on the metabolic capacity and homeostatic buffering of their host, we compared T. spiralis genes encoding enzymes to similar genes from the other parasites and the non-parasitic C. elegans (37–38, Supplementary Fig. 13) and the NemaCyc viewer (Supplementary Fig. 14). We found that the parasitic species had fewer KOs (KEGG orthology) associated with their genes (~522–548), compared to C. elegans (704) (Table 2 and Supplementary Table 13). The number of genes correlated with the number of associated KOs. Therefore, we examined the KOs in relation to nematode lineages used in this study. Among the 785 KOs associated with the nematode species evaluated herein, 337 were shared among all 4 species, i.e. Core Nematode KOs (CNKs). The pathway that had most of the KOs as CNKs was the energy metabolism (53% of all KOs were conserved across all 4 species); the least was the metabolism of cofactors and vitamins (34% of the KOs were in all 4 species). Among the energy metabolism pathways, there were 96 KOs related to oxidative phosphorylation, 52 of which were conserved among all 4 nematodes. This result supports previous observations in which parasite enzymes involved in oxidative phosphorylation exhibited significant sequence divergence from similar host proteins. These differences were largely associated with nematode-specific insertions14,29. Despite the high level of conservation, the number of CNKs among all 4 nematodes was very low (34%) suggesting that different adaptations distinguish nematodes with distinct modes of existence.
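The Core Nematode KO (CNK) count described above is a set intersection over the per-species KO assignments. A minimal Python sketch follows; the KO identifiers are made up for illustration and `core_kos` is a hypothetical helper, not part of the published pipeline.

```python
from functools import reduce

def core_kos(kos_by_species):
    """Return (core, union): KOs found in every species, and KOs found in any."""
    sets = [set(kos) for kos in kos_by_species.values()]
    core = reduce(set.intersection, sets)
    union = reduce(set.union, sets)
    return core, union

# Toy KO identifiers for illustration only; these are not the study's data.
example = {
    "C. elegans": {"K00001", "K00002", "K00003", "K00004"},
    "M. incognita": {"K00001", "K00002", "K00005"},
    "B. malayi": {"K00001", "K00002", "K00003"},
    "T. spiralis": {"K00001", "K00002", "K00006"},
}

core, union = core_kos(example)
fraction = len(core) / len(union)  # share of represented KOs that are core
print(sorted(core), f"{fraction:.0%} of represented KOs are core")
```

The same function applied to the KOs of a single pathway would give the per-pathway conserved fraction.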
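Table 2

Genes and KEGG Orthologies (KOs) represented in metabolic pathways in four nematodes

| Pathway | KOs in KEGG reference pathway | Represented KOs in nematodes | Conserved KOs in nematodes | C. elegans genes | C. elegans KOs | M. incognita genes | M. incognita KOs | B. malayi genes | B. malayi KOs | T. spiralis genes | T. spiralis KOs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Metabolism | 2258 | 785 | 337 | 2480 | 704 | 1822 | 525 | 1132 | 548 | 1069 | 515 |
| 1.1 Carbohydrate Metabolism | 550 | 192 | 92 | 626 | 167 | 499 | 130 | 294 | 133 | 252 | 145 |
| 1.2 Energy Metabolism | 408 | 131 | 71 | 235 | 123 | 210 | 97 | 144 | 107 | 123 | 87 |
| 1.3 Lipid Metabolism | 325 | 144 | 52 | 710 | 122 | 380 | 98 | 218 | 101 | 199 | 87 |
| 1.4 Nucleotide Metabolism | 174 | 78 | 35 | 306 | 74 | 294 | 52 | 182 | 51 | 182 | 53 |
| 1.5 Amino Acid Metabolism | 484 | 188 | 75 | 607 | 174 | 430 | 129 | 250 | 114 | 266 | 124 |
| 1.6 Metabolism of Other Amino Acids | 126 | 55 | 26 | 222 | 50 | 119 | 39 | 73 | 41 | 76 | 39 |
| 1.7 Glycan Biosynthesis and Metabolism | 160 | 83 | 30 | 163 | 74 | 153 | 54 | 95 | 63 | 89 | 55 |
| 1.8 Biosynth. of Polyketides and Nonrib. Peptides | 4 | 2 | 1 | 5 | 2 | 6 | 1 | 4 | 2 | 1 | 1 |
| 1.9 Metabolism of Cofactors and Vitamins | 301 | 91 | 31 | 392 | 80 | 298 | 55 | 174 | 57 | 185 | 56 |
| 1.10 Biosynthesis of Secondary Metabolites | 55 | 25 | 13 | 234 | 20 | 115 | 18 | 59 | 19 | 47 | 18 |
| 1.11 Xenobiotics Biodegradation and Metabolism | 178 | 61 | 27 | 548 | 55 | 249 | 40 | 125 | 37 | 119 | 38 |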

DISCUSSION

Here we present the genome sequence of T. spiralis, a member of the Dorylaimia and a lineage that diverged early in the evolution of the phylum Nematoda. The draft sequence of T. spiralis covered over 90% of the estimated genome and expected genes. Coupled with genomes from nematode lineages depicting more recent episodes of divergence, the T. spiralis data provide new perspectives on genomic evolution that more broadly spans the Nematoda. The T. spiralis genome sequence and the accompanying genome-mining analysis address four key issues. First, details of genomic diversity that were deduced among species have outlined molecular determinants, where the magnitude of change likely reflects molecular elements that have figured decisively in both lineage and species evolution of the Nematoda (e.g.39–41). It has been argued that such drastic differences can be related to functional diversification, speciation and species adaptation. Given the modest number of nematode species with available genomes, we fully expect that as additional nematode genome sequences become available, much greater resolution of differences will occur. Nonetheless, results presented here helped resolve many specific genomic characteristics that can be further investigated in this context. Second, host characteristics may select for common parasite characteristics of otherwise widely disparate nematode species. The similarities in the steroid binding protein family common to the parasites of humans and mammals, T. spiralis and B. malayi, were distinct from a large family of related nuclear hormone receptors in C. elegans, many of which are homologous to steroid-binding receptors in other organisms42. This distinction provides support for convergent enrichment of common steroid binding receptors in the two parasites of humans and other mammals, possibly dictated by characteristics of the host environment, as previously suggested43. 
Third, the new databases guided discovery of genes and proteins that appear to have fundamental importance to all nematode species (archetypical characteristics). Accordingly, the NOGs were significantly enriched for genes with RNAi phenotypes in C. elegans. Success in circumscribing archetypical nematode characteristics from pan-phylum databases will serve to refocus research on characteristics that have the broadest application for controlling pathogens of humans, animals and plants. Fourth, these results provide a valuable resource to investigate the biology of the intracellular pathogen, T. spiralis. One example involves a DNase II gene family of T. spiralis, which includes secreted proteins previously implicated in host-parasite interactions and immune control20. The curious expansion and diversification of this family by comparison to other nematodes can now be related to unique characteristics of T. spiralis, and possibly the lineages it represents. A second example centers around why species within this genus have separated into those that generate protective capsules from those which do not; a character which is not host related. There are innumerable anticipated applications of the genome data towards elucidating the biology, methods for immune control and treatments of this parasite. The comparative value of this genome sequence will extend these applications well beyond this species and phylum.

METHODS

Sequencing, assembly and annotation

Rats were infected orally with ML of T. spiralis strain ISS 195. Infections were allowed to proceed for a minimum of 30 days, then the muscle tissue was digested and the parasites were collected. Genomic DNA was extracted from muscle larvae of T. spiralis using standard protocols. Whole genome shotgun, BAC and EST libraries were generated3,6. The assembly was performed using the PCAP package44. The physical map for T. spiralis was constructed using 26,784 clones (Supplementary Note). The repeats were masked using RECON45 and RepeatMasker (see URLs1). Ribosomal RNA genes were then identified using RNAmmer (see URLs2). Transfer RNA genes were identified with tRNAscan-SE46. Non-coding RNAs were identified by sequence homology searches of the Rfam database (see URLs3). Protein-coding genes were predicted using a combination of ab initio programs47 and FgenesH (Softberry, Corp) and the evidence-based program EAnnot48. A consensus gene set from the above prediction algorithms was generated using a logical, hierarchical approach. Gene product naming was determined by BER (see URLs4). Signal peptides for secretion and transmembrane domain-containing proteins were identified using PHOBIUS49.

Protein families and genome evolution

OrthoMCL19 was used to predict orthologous groups of proteins. Phylogenetic trees were built for protein families with one member from each of the 6 species using PHYLIP (version 3.69; see URLs5) after aligning the family members with MUSCLE (version 3.7; 50). The consensus of these trees was used as the phylogeny of the species. The death and birth of each protein family, overlaid on the species phylogeny, was reconstructed using PHYLIP-DOLLOP by treating each protein family as a character. Gene duplication and deletion events of the families having a member from each of the 6 species were reconstructed using URec51, and a neighbor-joining tree of each family was generated using PHYLIP-NEIGHBOR. The dynamics of nematode chromosome re-assortment among multiple nematode pairs were measured using OrthoCluster25, with syntenic blocks of C. elegans used for standardization. For the identification of the ancestral orthologous regions we used exons that are orthologous among species as map "anchors"52 (Supplementary Note).

Nematode-specific molecular features

A profile was built for each of the 85 scNOGs using HMMBUILD53. The profiles were calibrated using hmmcalibrate and each profile was used to search Pfam (release 23.0). Hits better than 0.1 were considered. The selected non-nematode species were of evolutionary distances similar to C. elegans and T. spiralis: human, chicken, zebrafish and frog. After identification of the non-nematode families that were associated with the same Pfam as the scNOGs, the multi-fasta files were aligned using MUSCLE. These alignments were used to build a distance matrix using PHYLIP-PROTDIST. RNAi source data were from WormMart, WormBase release 180. The core nematode groups were screened against nematode (~1.1 M ESTs and/or Roche/454 cDNAs) and arthropod (5.3 M ESTs) transcript data, and sequence homology at a cut-off of 35 bits and 55% identity was accepted as significant.
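The significance filter in the last sentence amounts to a two-condition threshold on each alignment. A minimal Python sketch, assuming hits are available as records with hypothetical `bits` and `identity` fields and assuming the cut-offs are inclusive (the text does not say whether they are):

```python
from typing import NamedTuple

class Hit(NamedTuple):
    query: str       # core nematode group member (hypothetical ID)
    subject: str     # EST or cDNA identifier (hypothetical ID)
    bits: float      # alignment bit score
    identity: float  # percent identity, 0-100

def significant(hits, min_bits=35.0, min_identity=55.0):
    """Keep hits meeting both the bit-score and percent-identity cut-offs."""
    return [h for h in hits if h.bits >= min_bits and h.identity >= min_identity]

# Toy records for illustration only; these are not the study's data.
hits = [
    Hit("grp1", "est_a", 40.2, 61.0),  # passes both cut-offs
    Hit("grp1", "est_b", 33.0, 80.0),  # fails the bit-score cut-off
    Hit("grp2", "est_c", 52.0, 48.5),  # fails the identity cut-off
]
kept = significant(hits)
print([h.subject for h in kept])
```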

Structural annotation and comparison of interaction partners

The three-dimensional structure was modeled using the Rosetta3.0 software suite54–56. A total of 40,000 decoys were generated using the full-atom scoring method57 for each sequence. Several of the decoys with a small radius of gyration and low all-atom energy (i.e. the bottom of the energy well) were compared using TM-align58 and MAMMOTH59. The position of the insertions was mapped onto the models generated. The secondary structure predictions calculated for the Rosetta ab initio program were added to the sequence alignment generated by MUSCLE50. The functional significance of the insertions in the electron transfer complex was further dissected by comparing interacting proteins. Two protein-protein interaction databases, IntAct60 and MINT61, were used to determine whether this protein or its orthologs were involved in a protein-protein interaction.

Functional associations and taxonomic restrictions

Default parameters for InterProScan (v16.1) were used to search against the InterPro database62, and Gene Ontology (GO, 63) annotations were obtained with no additional curation (IEA associations only). These annotations have been displayed graphically by AmiGO and can be accessed at Nematode.net37. Significant enrichment of GO terms was computed based on the hypergeometric distribution using FUNC64 (including false discovery rate, FDR). A probability refinement was done to remove the GO terms identified as significant only because of their child terms. We used the FDR computed by FUNC to reduce false discovery. Therefore, unless specified otherwise, GO term enrichment was selected based on both p-value <0.05 (after refinement) and FDR <0.1. The gene products were associated with a specific biochemical pathway using the KEGG pathway mappings65. WU-BLAST matches of the genes against KEGG database version 46.0 were used for pathway mapping with a filter of 1e-10. Graphical presentation of the pathway associations was done using NemaPath38. The C. elegans NemaCyc viewer is based on mapping a BLASTP alignment of the KEGG’s genesDB against the predicted T. spiralis genes. Scores stronger than 1e-10 were considered.
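The enrichment criterion above (hypergeometric p-value <0.05 together with FDR <0.1) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the FUNC implementation: FUNC computes its own FDR and applies a refinement step, neither of which is modeled here. Benjamini-Hochberg adjustment is used only as a stand-in for the FDR, and the counts are made up.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the GO term."""
    total = comb(N, n)
    top = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1)) / total

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy counts, term -> (k annotated in the group, K annotated in the background).
N, n = 1000, 50  # background size and group size (hypothetical)
counts = {"GO:A": (12, 40), "GO:B": (3, 60), "GO:C": (1, 5)}
pvals = {t: hypergeom_upper_tail(k, n, K, N) for t, (k, K) in counts.items()}
fdr = dict(zip(pvals, bh_adjust(list(pvals.values()))))
enriched = [t for t in pvals if pvals[t] < 0.05 and fdr[t] < 0.1]
print(enriched)
```

Depletion would use the lower tail, P(X <= k), with the same thresholds.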
  64 in total

1.  URec: a system for unrooted reconciliation.

Authors:  Pawel Górecki; Jerzy Tiuryn
Journal:  Bioinformatics       Date:  2006-12-20       Impact factor: 6.937

2.  The evolution of biased codon and amino acid usage in nematode genomes.

Authors:  Asher D Cutter; James D Wasmuth; Mark L Blaxter
Journal:  Mol Biol Evol       Date:  2006-08-25       Impact factor: 16.240

3.  Limited microsynteny between the genomes of Pristionchus pacificus and Caenorhabditis elegans.

Authors:  Kwang-Zin Lee; Andreas Eizinger; Ramkumar Nandakumar; Stephan C Schuster; Ralf J Sommer
Journal:  Nucleic Acids Res       Date:  2003-05-15       Impact factor: 16.971

Review 4.  Life with 6000 genes.

Authors:  A Goffeau; B G Barrell; H Bussey; R W Davis; B Dujon; H Feldmann; F Galibert; J D Hoheisel; C Jacq; M Johnston; E J Louis; H W Mewes; Y Murakami; P Philippsen; H Tettelin; S G Oliver
Journal:  Science       Date:  1996-10-25       Impact factor: 47.728

5.  Analysis of a 43-kDa glycoprotein from the intracellular parasitic nematode Trichinella spiralis.

Authors:  D K Vassilatis; D Despommier; D E Misek; R I Polvere; A M Gold; L H Van der Ploeg
Journal:  J Biol Chem       Date:  1992-09-15       Impact factor: 5.157

6.  InterPro, progress and status in 2005.

Authors:  Nicola J Mulder; Rolf Apweiler; Teresa K Attwood; Amos Bairoch; Alex Bateman; David Binns; Paul Bradley; Peer Bork; Phillip Bucher; Lorenzo Cerutti; Richard Copley; Emmanuel Courcelle; Ujjwal Das; Richard Durbin; Wolfgang Fleischmann; Julian Gough; Daniel Haft; Nicola Harte; Nicolas Hulo; Daniel Kahn; Alexander Kanapin; Maria Krestyaninova; David Lonsdale; Rodrigo Lopez; Ivica Letunic; Martin Madera; John Maslen; Jennifer McDowall; Alex Mitchell; Anastasia N Nikolskaya; Sandra Orchard; Marco Pagni; Chris P Ponting; Emmanuel Quevillon; Jeremy Selengut; Christian J A Sigrist; Ville Silventoinen; David J Studholme; Robert Vaughan; Cathy H Wu
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971

7.  TM-align: a protein structure alignment algorithm based on the TM-score.

Authors:  Yang Zhang; Jeffrey Skolnick
Journal:  Nucleic Acids Res       Date:  2005-04-22       Impact factor: 16.971

8.  Molecular determinants archetypical to the phylum Nematoda.

Authors:  Yong Yin; John Martin; Sahar Abubucker; Zhengyuan Wang; Lucjan Wyrwicz; Leszek Rychlewski; James P McCarter; Richard K Wilson; Makedonka Mitreva
Journal:  BMC Genomics       Date:  2009-03-18       Impact factor: 3.969

9.  FUNC: a package for detecting significant associations between gene sets and ontological annotations.

Authors:  Kay Prüfer; Bjoern Muetzel; Hong-Hai Do; Gunter Weiss; Philipp Khaitovich; Erhard Rahm; Svante Pääbo; Michael Lachmann; Wolfgang Enard
Journal:  BMC Bioinformatics       Date:  2007-02-06       Impact factor: 3.169

10.  NemaPath: online exploration of KEGG-based metabolic pathways for nematodes.

Authors:  Todd Wylie; John Martin; Sahar Abubucker; Yong Yin; David Messina; Zhengyuan Wang; James P McCarter; Makedonka Mitreva
Journal:  BMC Genomics       Date:  2008-11-04       Impact factor: 3.969
