Genome of the human hookworm Necator americanus.

Yat T Tang1, Xin Gao1, Bruce A Rosa1, Sahar Abubucker1, Kymberlie Hallsworth-Pepin1, John Martin1, Rahul Tyagi1, Esley Heizer1, Xu Zhang1, Veena Bhonagiri-Palsikar1, Patrick Minx1, Wesley C Warren1,2, Qi Wang1, Bin Zhan3,4, Peter J Hotez3,4, Paul W Sternberg5,6, Annette Dougall7, Soraya Torres Gaze7, Jason Mulvenna8, Javier Sotillo7, Shoba Ranganathan9,10, Elida M Rabelo11, Richard W Wilson1,2, Philip L Felgner12, Jeffrey Bethony13, John M Hawdon13, Robin B Gasser14, Alex Loukas7, Makedonka Mitreva1,2,15.   

Abstract

The hookworm Necator americanus is the predominant soil-transmitted human parasite. Adult worms feed on blood in the small intestine, causing iron-deficiency anemia, malnutrition, growth and development stunting in children, and severe morbidity and mortality during pregnancy in women. We report sequencing and assembly of the N. americanus genome (244 Mb, 19,151 genes). Characterization of this first hookworm genome sequence identified genes orchestrating the hookworm's invasion of the human host, genes involved in blood feeding and development, and genes encoding proteins that represent new potential drug targets against hookworms. N. americanus has undergone a considerable and unique expansion of immunomodulator proteins, some of which we highlight as potential treatments against inflammatory diseases. We also used a protein microarray to demonstrate a postgenomic application of the hookworm genome sequence. This genome provides an invaluable resource to boost ongoing efforts toward fundamental and applied postgenomic research, including the development of new methods to control hookworm and human immunological diseases.

Year:  2014        PMID: 24441737      PMCID: PMC3978129          DOI: 10.1038/ng.2875

Source DB:  PubMed          Journal:  Nat Genet        ISSN: 1061-4036            Impact factor:   38.330


INTRODUCTION

Soil-transmitted helminths (STHs), including Ascaris, Trichuris and hookworms, cause neglected tropical diseases (NTDs) affecting >1 billion people worldwide[1,2]. Hookworms alone infect approximately 700 million people (primarily in disadvantaged communities in tropical and subtropical regions), causing a disease burden of 1.5-22.1 million disability-adjusted life years (DALYs)[3]. Necator americanus accounts for ~85% of all hookworm infections[4] and causes necatoriasis, characterized clinically by anaemia, malnutrition in pregnant women and an impairment of cognitive and/or physical development in children[5].
The life cycle of N. americanus commences with eggs being shed in the faeces of infected people. Eggs embryonate in soil under favorable conditions, and then the first-stage larvae hatch, feed on environmental microbes and moult twice to reach the infective third-stage larvae (iL3). These larvae infect the human host by skin penetration, enter subcutaneous blood and lymph vessels and travel via the circulation to the lungs. The iL3 break into the alveoli and migrate via the trachea to the oropharynx, after which they are swallowed and travel to the small intestine, where they develop to become dioecious adults. The adult worms (~1 cm long) attach to the mucosa, where they feed on blood (up to 30 μl per day per worm), and can survive in the human host for up to a decade. The pre-patent period of N. americanus is 4-8 weeks and a female worm can produce up to 10,000 eggs per day.
New methods to control hookworm disease are urgently needed. Presently, the treatment of hookworm disease relies mainly on mass treatment with albendazole[6], but its repeated and excessive use has the potential to lead to treatment failures[7] and drug resistance[8]. Recent indications of reduced cure rates in infected humans[9] imply an urgent need for new intervention strategies.
Early attempts to utilize bioinformatic approaches for the discovery of immunogens were hampered by a lack of understanding of the molecular biology of N. americanus and other hookworms[4], and the absence of genome and proteome sequences. A recent study[10] demonstrated that comparative genomics facilitates the characterization and prioritization of anthelmintic targets which results in a higher hit rate compared with conventional approaches. In addition to a need for anti-hookworm vaccines in countries with high rates of hookworm infections, hookworms and other helminths are being explored as treatments (probiotics) against immunological diseases in humans in many industrialized countries where hookworm infections are not endemic[11]. Recent studies[12-14] indicate that hookworms suppress the production of pro-inflammatory molecules and promote anti-inflammatory and wound healing properties, suggesting a mechanism by which worms reside for long periods in humans and suppress autoimmune and allergic diseases. Indeed hookworm recombinant proteins have been tested in clinical trials for non-infectious diseases[15]. Herein we characterized the N. americanus genome and compared it with those of other nematodes and the human host. Bioinformatic analyses of the protein-coding genes identified salient molecular groups, some of which may represent new intervention targets. The production and screening of a hookworm protein microarray reveals novel findings on the immune response to the parasite and demonstrates a post-genomic exploration of the genome sequence, including identification of molecules with low similarity to proteins in other species but recognized by all infected individuals, therefore demonstrating high diagnostic potential.

RESULTS

Genome features

The nuclear genome of N. americanus (244 Mb) was assembled, with 11.4% (1,336) of the supercontigs (≥1 kb) comprising 90% of the genome. The 244 Mb sequence was estimated to represent 92% of the N. americanus genome (Table 1; Supplementary Figs. 1, 2 and 3; Supplementary Note). The GC content is 40.2%, the amino acid composition is comparable to that of other species (including 5 nematodes, the host and two outgroups) and the repeat content is 23.5%. In total, 669 repeat families were predicted and annotated. The predicted protein-encoding genes (n = 19,151) represent 33.7% of the genome at an average density of 78.5 genes per Mb and a GC content of 45.8%. Compared to C. elegans, N. americanus exons were shorter and the introns were longer, but the average intron length and count for genes orthologous between the two species were not significantly different. However, introns in C. elegans genes that are orthologous to N. americanus genes are significantly longer than introns in non-orthologous C. elegans genes, which may indicate a diversity of function for these genes, since longer introns are thought to contain functional elements that are present in addition to what might be regarded as ‘normal’ intron structure[16]. In addition, N. americanus iL3-overexpressed genes had longer introns than adult-overexpressed genes, which may indicate a greater diversity of regulation for these gene sets[16]. Positional bias was observed for intron length, which was comparable to C. elegans position-specific intron lengths for orthologous genes. Most genes (82.6%) were confirmed using RNA-Seq data from the iL3 and adult stages of N. americanus (two biological replicates per stage), and 6.5% and 3.7% were overexpressed in these stages, respectively. Alternative splicing was detected for 24.6% (4,712) of the genes, of which ~68.3% have orthologs in C. elegans. Among N. americanus genes with C. elegans orthologs, the alternatively spliced genes were more likely than other genes to belong to orthologous groups for which more than half of the C. elegans genes were also alternatively spliced (p = 0.037, binomial distribution test). As expected, genes associated with alternative splicing had a higher number of exons than those without (p<10⁻¹⁵ and 2×10⁻⁷ for N. americanus and C. elegans, respectively). A total of 3,223 N. americanus genes were predicted to be trans-spliced, of which 818 had conserved gene order and orientation with 373 C. elegans operons (Fig. 1d; Supplementary Figs. 6 and 7; Supplementary Table 4; Supplementary Note). The genes within the operons had significantly more similar expression profiles to one another than to random subsets of non-operon genes (p<0.0001), supporting the inference that they are co-transcribed under similar regulatory control[17].
Table 1

Summary of N. americanus genomic features

Estimated genome size (Mb)                             244
Assembly statistics
    Total number of supercontigs (≥1 kb)               11,713
    Total number of base pairs (bp) in supercontigs    244,009,025
    Number of N50 supercontigs                         283
    N50 supercontig length (bp)                        213,095
    Number of N90 supercontigs                         1,336
    N90 supercontig length (bp)                        29,214
    GC content of whole genome (%)                     40.20
    Repetitive sequences (%)                           23.50
Protein-coding loci
    Total number of protein-coding genes               19,151
    Avg. gene locus footprint (bp)                     4,289
    Avg. number of exons per gene                      6.4
    Avg. exon size (bp)                                125
    Avg. intron size (bp)                              642
    Avg. intergenic space (bp)                         6,631

N50: number-50% of all nucleotides in the assembly are in 283 supercontigs, length-50% of the genome is in supercontigs with a minimum length of 213kb; N90: number-90% of all nucleotides in the assembly are within 1,336 supercontigs, length-90% of the genome is in supercontigs with a minimum length of 29kb.
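
The N50 and N90 definitions in the table note can be illustrated with a short calculation. The sketch below is not the assembly pipeline used in the study; the function name `nx_stats` and the toy supercontig lengths are hypothetical and serve only to show how the "number" and "length" values are derived from a list of supercontig sizes.

```python
def nx_stats(lengths, fraction):
    """Return (count, min_length) for the longest supercontigs that together
    contain at least `fraction` of all assembled bases.

    For fraction=0.5 this gives the "number of N50 supercontigs" and the
    "N50 supercontig length"; fraction=0.9 gives the N90 equivalents.
    """
    if not lengths or not 0 < fraction <= 1:
        raise ValueError("need a non-empty list and a fraction in (0, 1]")
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            # `length` is the shortest supercontig needed to reach the target.
            return count, length


# Toy assembly (hypothetical lengths in bp, 300 bp in total).
toy_lengths = [100, 80, 60, 40, 20]
n50_count, n50_length = nx_stats(toy_lengths, 0.5)
n90_count, n90_length = nx_stats(toy_lengths, 0.9)
print(n50_count, n50_length, n90_count, n90_length)
```

In the toy example, the two longest supercontigs already hold half of the bases, so the N50 count is 2 and the N50 length is 80 bp; four supercontigs are needed to reach 90%.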

Figure 1

N. americanus gene feature organization compared to C. elegans

a, The average exon of N. americanus genes is significantly shorter and the average intron length is significantly larger than for C. elegans genes. b, Orthologous genes have significantly more introns than non-orthologous genes in both species. c, Introns are longer for orthologous genes in C. elegans at every intron position (compared to non-orthologous genes). In a-c, Error bars indicate standard error values. d, N. americanus genes in operons and conserved with C. elegans shown on the C. elegans chromosomes.

The N. americanus predicted secretome (1,590 classically secreted and 4,785 non-classically secreted proteins) represents 33% of the deduced proteome. Functional annotation of predicted proteins based on sequence comparisons identified 4,961 unique domains and 1,411 gene ontology (GO) terms for 57% and 44% of the N. americanus genes, respectively, and annotations are provided for 68% of the predicted N. americanus proteins (Supplementary Table 5).

Transcriptional differences between infective and parasitic stages

Hookworms spend a considerable amount of time as free-living larvae in the external environment before transitioning to parasitism. Gene expression differences between these stages reflect this developmental progression (Supplementary Table 3). Of the 1,948 differentially expressed genes, 36% were significantly overexpressed in iL3 and 64% in adults. Compared to iL3-overexpressed genes, nearly twice as many of the adult-overexpressed genes were N. americanus-specific (58% compared to 32%, p<10⁻¹⁵), suggesting that species-specific genes are more likely to be related to parasitism than to the non-parasitic iL3 stage[18]. The iL3-overexpressed genes are over-represented (p<0.01) for eight molecular functions, including signal transduction, transmembrane receptor activity, and anion transporter activity, reflecting the ability of iL3 to adapt to a complex environment and infect a suitable host. This finding is supported by the enrichment of genes encoding G-protein coupled receptor proteins among iL3-overexpressed genes (p=5.1×10⁻⁸) and their under-representation among adult-overexpressed genes (p=4.1×10⁻⁷). Consistent with other parasitic nematodes[19], serine/threonine protein kinase activity is also enriched among iL3-overexpressed genes (p=0.008). The complexity of transcription regulatory activities is likely to be high in iL3, as evidenced by the enrichment of sequence-specific DNA binding transcription factor activity genes (p=1.7×10⁻¹⁴) and genes with alternative splicing (p<2×10⁻¹³), and the fact that most (92.5%) of the differentially expressed transcription factors are iL3-overexpressed. This iL3-stage enrichment of transcription factor-related activity might indicate that transcription factors (TFs) are poised for rapid gene expression after host invasion (i.e. gene expression is not active but is likely to be primed, as observed in arrested stages of C. elegans[20]).
In contrast, in the adult stage, a broad spectrum of enzymes, such as proteases, hydrolases and catalases (Supplementary Table 6), is detected, emphasizing the nutritional adaptation of adult worms to a high-protein diet (i.e. blood[21]). Proteins with a signal peptide (SP) for secretion had transcripts that were enriched among adult-overexpressed genes (p<10⁻¹⁵), whereas transmembrane domain-containing proteins (p=1.2×10⁻⁸) had transcripts that were enriched among iL3-overexpressed genes. SP-containing genes are enriched for proteases and protease inhibitors, the former contributing substantially to the predicted secretome (Supplementary Table 6), with 55% of all proteases (325/592) predicted as secreted. Proteases (particularly N. americanus-specific proteases with no orthologs in C. elegans) are overexpressed more often in adults than in iL3 (p<10⁻¹⁵ for both comparisons; Fig. 2b,c; Supplementary Note, Supplementary Table 7). Serine-type endopeptidase inhibitor activity, required to protect the adult stage from the digestive and immunologically hostile environment in the host[22], was adult-enriched (p=1.6×10⁻⁴). The adult enrichment of transcription pertaining to structural constituents of the cuticle (p=1.7×10⁻⁵) also relates to the importance of protection of the parasite from the host[23].
Figure 2

Molecular functions enriched among N. americanus genes, stage-enriched genes and the N. americanus degradome

a, “Molecular Function” Gene Ontology (GO) terms enriched in life-cycle stages and in N. americanus compared to other species. Included are (i) categories enriched in the iL3 or adult life cycle stages in N. americanus, (ii) categories significantly over-represented or depleted in N. americanus compared to at least two of the comparison species, and (iii) second-order root nodes. b, Expression profiling of N. americanus proteases with C. elegans orthologs. c, Expression profiling of N. americanus proteases with no C. elegans orthologs.

Blood feeding in adult hookworms is facilitated by an anticoagulation process and degradation of blood proteins by proteases. Known hookworm anticoagulants[24] are dominated by single-domain serine protease inhibitors (SPIs). We annotated 87 SPIs in N. americanus, accounting for 8 of 17 protease inhibitor clans. Given that serine proteases in humans are involved in diverse physiological functions (including blood coagulation and immunomodulation), the diversity of SPIs in N. americanus is likely critical not only for anticoagulation during blood-feeding, but also for long-term survival in the host. Specifically, SPIs are also likely to protect adult worms from enzymes in the small intestine, where serine proteases, including trypsin, chymotrypsin, and elastase, are prominent[25], thereby mediating hookworm-associated growth delay[22]. SPIs are enriched among the adult-overexpressed genes (p=3.9×10⁻⁸), but not among the iL3-overexpressed genes (p=0.35). Most of the SPIs characterized in hookworms are Kunitz-type molecules, but our findings suggest that multiple types of SPIs are produced by adult N. americanus in the human host. A mass spectrometry-based proteomics analysis was also performed using whole adult N. americanus worms (Full Methods Online), and the proteins detected (Supplementary Table 7) were also enriched for proteases (p=4.9×10⁻⁷), SPIs (p=1.8×10⁻⁴), as well as proteins with signal peptides (p=4.7×10⁻¹¹) and a wide range of GO terms, many of which were related to proteolysis (Supplementary Table 6).

Pathogenesis and immunobiology of hookworm disease

N. americanus causes chronic disease and does not usually induce sterile immunity in the host. Adult hookworms live in the host for several years due to their ability to modulate and evade host immune defenses[13] with their E/S products that sustain development and create a site of immune privilege[26]. Comparing the N. americanus genome with genomes from other nematodes, its host, and distant species led to the identification of molecules that facilitate parasitism. Sixty percent of N. americanus genes share an ortholog with other species (Supplementary Table 8; Supplementary Fig. 11, Supplementary Note). Comparative analysis identified metalloendopeptidases as the most prominent N. americanus protease, which is likely associated with the cleavage of eotaxin and inhibition of eosinophil recruitment[27], in addition to tissue penetration[28] and haemoglobinolysis[29]. N. americanus is the only blood-feeding nematode included in the comparison, and the hierarchical structure for enriched molecular functions reveals shared and unique patterns and subsequent functional relationships. SCP/Tpx-1/Ag5/PR-1/Sc7 (SCP/TAPS; IPR014044; Supplementary Table 5) is a protein family inferred to be involved in host-parasite interactions. There are 137 SCP/TAPS proteins in N. americanus, a 4-fold expansion of this protein family compared to other nematodes. More than half (69/137) of the N. americanus SCP/TAPS proteins are adult-overexpressed (p<10⁻¹⁵), and only 6 of the 137 N. americanus SCP/TAPS proteins have orthologs in C. elegans (according to the MCL clustering; see Methods). The presence of a limited repertoire of orthologs in C. elegans suggests that nematode SCP/TAPS proteins may have originated prior to parasitism. Primary sequence similarity classified SCP/TAPS proteins into multiple groups, which do or do not contain C. elegans members, suggesting independent expansion of SCP/TAPS proteins after parasite speciation.
The large expansion of SCP/TAPS proteins in N. americanus suggests multiple, possibly distinct roles in host-parasite interactions. SCP/TAPS proteins have been studied extensively as potential hookworm drug/vaccine candidates[30] or as therapeutics for human inflammatory diseases[15] or stroke[31]. The 96 N. americanus-specific SCP/TAPS identified might serve as candidates for selective drug or vaccine targets[32] (Supplementary Table 5). A total of 336 N. americanus genes that are orthologous to previously-predicted immunogenic/immunomodulatory proteins in A. suum[24] were identified, along with three homologs of transforming growth factor beta (TGF-β), an important protein in modulation of inflammation and the evolution of nematode parasitism[33] (Supplementary Table 5). Additional protein-coding genes in N. americanus inferred to be involved in host-parasite immunomodulatory interactions include macrophage migration inhibitory factors (MIF), neutrophil inhibitor factor (NIF), hookworm platelet inhibitor (HPI), galectins, C-type lectins (C-TL), peroxiredoxins (PRX) and glutathione S-transferases (GST), among others.

Prospects for new interventions

Historically, anthelmintic drugs have been discovered using in vivo and in vitro compound screens[34]. Recent comparative ‘omics’ studies (accompanied by experimental screening) in multiple nematode species[10] demonstrate that genomic and transcriptomic data can be used to prioritize targets, with a higher hit rate compared with conventional approaches. Hence, the availability of the N. americanus genome is expected to enable comparative genomic and chemogenomic studies for the prediction and prioritization of therapeutic targets. Since more than half (53%) of all current drug targets[35] consist of rhodopsin-like G-protein-coupled receptors (GPCRs), nuclear receptors (NRs), ligand-gated ion channels (LGICs), kinases and voltage-gated ion channels (VGICs), these protein groups were investigated in the N. americanus genome to identify potential therapeutic targets (Supplementary Table 9).
GPCRs are attractive drug targets due to their importance in signal transduction[35]. We identified 272 GPCR genes, whereas there are nearly 1,700 GPCR genes in C. elegans. Although GPCRs are challenging to characterize at the primary sequence level (and the N. americanus genome is in a draft state), there may be a biological explanation for this difference in the number of GPCRs identified, including frequent amplifications of several subfamilies of GPCRs in C. elegans relative to the closely-related C. briggsae[36]. Three of the 5 GRAFS families (glutamate, rhodopsin, and frizzled, but not adhesion or secretin) are found in N. americanus. The putative GPCRs are enriched for iL3-overexpression (30 genes, p=5.1×10⁻⁸), with only one gene being adult-overexpressed (p=4.1×10⁻⁷ for under-representation).
N. americanus encodes members of both major ion channel categories (LGICs and VGICs); 224 LGICs belonging to two of the three subfamilies of LGIC (Cys-loop family and glutamate-activated cation channels) were identified, compared with 159 LGIC-encoding genes in C. elegans. Genes encoding nicotinic acetylcholine receptor subunits (nAChR) of cys-loop family members are also found. Nematodes have a much larger number of nAChR alpha subunits than examined vertebrates (17 nAChR-encoding genes in mammals and birds vs. 29 nAChR subunits in C. elegans[38]), and several anthelmintics such as levamisole[39] and monepantel[40] have been developed to exploit these differences. Ivermectin[41] targets a subunit of glutamate-gated chloride channels that are present in N. americanus (eight genes; IPR015680); three of these genes clustered with six C. elegans glutamate-gated chloride channel genes (avr 14/15 and glc 1-4[42]). The lack of a clear ortholog of the ivermectin-sensitive genes within the N. americanus genome and the underlying sequence diversity at a position correlated with direct activation by ivermectin may explain the relative ivermectin insensitivity of N. americanus[43] compared to other nematodes[44]. VGICs include sodium, potassium and calcium channels, and are anthelmintic targets (e.g., emodepside inhibiting SLO-1 in C. elegans[45] and parasitic nematodes such as A. suum[46]). N. americanus encodes 48 VGICs (fewer than C. elegans), including members from the major families such as 6-transmembrane (6TM) potassium channels, voltage-gated calcium channels, and voltage-gated chloride channels. Consistent with other nematodes[47], voltage-gated sodium channels are not present in N. americanus.
Protein kinases are involved in numerous signal transduction pathways that regulate biological processes, and have been a major focus of drug discovery[48]. Of the 274 N. americanus genes encoding kinases, 15 and 12 are overexpressed in iL3 and adults, respectively. Gene expression, tissue expression, conservation among nematodes and dissimilarity to the human ortholog were used for the prioritization[10] of candidate targets (Supplementary Table 10).
To evaluate current drugs and inhibitors that target homologous kinases, compounds from a publicly available database were also prioritized (Full Methods Online). The highest scoring compound is an approved tyrosine kinase inhibitor for treating chronic myelogenous leukemia (CML)[49]. A total of 233 other compounds had the second-highest score of 5 (Supplementary Table 11), indicating that these existing drugs might be repurposed for treating neglected tropical diseases, thus minimizing development time and cost[50]. Chokepoints in metabolic pathways[51] were analyzed and prioritized to identify further drug targets. N. americanus encodes at least 3,976 protein-coding genes associated with 3,265 KEGG orthology (KO) terms (Supplementary Table 7), 938 (24%) of which are involved in metabolic pathways, representing 32 potentially complete modules. A total of 34% of the metabolic pathway genes are classified as chokepoints (Supplementary Table 12), of which 120 are conserved among nematodes and non-nematode species used in the comparative analysis. Combining chokepoint prioritization with the requirements that a chokepoint be an expression bottleneck in N. americanus and that its C. elegans ortholog display a lethal RNAi phenotype yielded 8 enzymes encoded by 10 distinct genes (Supplementary Tables 12-14). Among the prioritized chokepoints is adenylosuccinate lyase (ASL; EC 4.3.2.2), an enzyme involved in the purine metabolism pathway (ko00230) and a chokepoint in the adenine ribonucleotide biosynthesis module (M00049). To identify chokepoint inhibitors for repurposing, compounds from publicly available databases (449 target-compound pairs) were assessed using the same method as for kinase inhibitors. The highest ranked candidates include compounds such as azathioprine (DB00993), a pro-drug that is converted into mercaptopurine (DB01033) to inhibit purine metabolism and DNA synthesis (Supplementary Fig. 18, Supplementary Table 14, Supplementary Note).
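
The text does not spell out how chokepoints were called. As a minimal sketch, assuming the commonly used definition (a reaction that is the only consumer of some substrate or the only producer of some product in the network), detection can be illustrated as follows. The function name `find_chokepoints`, the reaction labels and the metabolite labels are hypothetical; the expression-bottleneck and RNAi-phenotype filters described above are not modelled.

```python
from collections import defaultdict


def find_chokepoints(reactions):
    """Return the reactions that uniquely consume or uniquely produce a metabolite.

    `reactions` maps a reaction name to a (substrates, products) pair of sets.
    This follows an assumed, widely used chokepoint definition and is not
    necessarily the exact procedure applied in the study.
    """
    consumers = defaultdict(set)  # metabolite -> reactions that use it
    producers = defaultdict(set)  # metabolite -> reactions that make it
    for name, (substrates, products) in reactions.items():
        for metabolite in substrates:
            consumers[metabolite].add(name)
        for metabolite in products:
            producers[metabolite].add(name)

    chokepoints = set()
    for users in list(consumers.values()) + list(producers.values()):
        if len(users) == 1:
            chokepoints |= users
    return chokepoints


# Toy network (hypothetical): R4 shares both its substrate and its product
# with other reactions, so it is the only non-chokepoint.
toy_network = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"A"}, {"C"}),
    "R3": ({"B", "C"}, {"D"}),
    "R4": ({"C"}, {"D"}),
}
print(sorted(find_chokepoints(toy_network)))
```

Blocking a chokepoint is expected either to starve the pathway of a product or to let a substrate accumulate, which is the rationale for treating such enzymes as candidate targets.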

Post-genomic exploration using the N. americanus immunome

The N. americanus genome enables development of post-genomic tools to address the immuno-biology of human hookworm disease and accelerate antigen discovery for the development of vaccines and diagnostics. We developed a protein microarray containing 564 N. americanus recombinant proteins inferred from the genome (Supplementary Table 15). The microarray was probed with sera from individuals aged 4 to 66 years resident in an N. americanus-endemic area of northeastern Minas Gerais state in Brazil. This pilot study, based on 200 individuals from the youngest (<14 years of age) and the oldest (>45 years of age) age strata, identified 22 antigens that were significant targets of anti-hookworm immune responses. Older individuals showed stronger IgG responses to a larger number of secreted antigens, but these antibodies appear to play no role in killing the parasite or protecting against heavy infection. Hence, unlike other STHs of humans, protective immunity to N. americanus does not seem to develop in most individuals during adolescence. This is consistent with knowledge that, in Necator endemic areas, older human individuals often harbour the heaviest-intensity infections[1,52,53]. Younger individuals showed IgG responses against fewer antigens, usually with lower intensity. Thus, while antibodies are a key feature of the immune response to N. americanus and increase with host age, they fail to protect individuals from infection over time. The absence of overall protective immunity to hookworm infection, as opposed to the age-acquired protective immunity observed with other STH infections, is likely multifactorial. Detailed kinetic studies of the IgG subclasses and IgE responses to hookworm antigens represented on our protein microarray will be required to better understand the roles of these antibodies in the acquisition of immunity against hookworm[13].
The protein microarray can be probed with sera from individuals with different genetic backgrounds and different histories of exposure to hookworm[54], as well as animals rendered immunologically resistant to hookworm infection by vaccination with irradiated iL3[55], thereby facilitating efforts to develop an efficacious vaccine against hookworm disease. Furthermore, secreted proteins that are recognized by most or all infected individuals and have weak or no homology to proteins of other nematode species are antigens that might form the basis of sensitive and specific serodiagnostic tests.

DISCUSSION

N. americanus is responsible for causing more disease worldwide than any other STH. The characterization of the first genome of a human hookworm is expected to significantly facilitate future fundamental explorations of the epidemiology and evolutionary biology of hookworms as well as efforts toward the development of therapeutics to combat hookworm disease. Since N. americanus is the first hookworm whose genome has been sequenced, the data presented provide a first insight into blood-feeding nematodes of major human and animal health importance. Our post-genomic exploration of inferred proteomic information highlights the utility of the draft genome sequence for understanding the immuno-biology of human hookworm disease and accelerating the development of vaccines and diagnostics. It is also pertinent to note that hookworms are garnering interest for their therapeutic properties against a range of non-infectious inflammatory diseases of humans. The genome sequence, therefore, represents a veritable pharmacopoeia – indeed, recombinant hookworm molecules have already undergone clinical trials for stroke and deep vein thrombosis[15]. Clearly the N. americanus genome sequence will have broad implications and provides many exciting opportunities to establish post-genomic methods in the quest to develop improved interventions against this ancient and neglected parasite, as well as inflammatory diseases that are reaching epidemic proportions in industrialized societies.

Online Methods

Parasite material

The Anhui strain of N. americanus was maintained[56] in male Golden Syrian hamsters (3-4 weeks old; Harlan) under the George Washington University IACUC approved protocol 053-12,2, and in accordance with all Animal Welfare guidance. Adult worms were collected from intestines of hamsters infected subcutaneously with N. americanus iL3 for 8 weeks[57]. DNA was extracted with the QIAamp DNA Mini Kit according to the manufacturer's instructions (Qiagen). For transcriptome sequencing, two key developmental stages from a host-parasite interaction perspective, the infective L3 (iL3; environmental) and adult (parasitic) worm stages, were collected.

Sequencing, assembly and annotation

Fragment and paired-end whole-genome shotgun libraries (3 kb and 8 kb insert sizes) were sequenced using the Roche/454 platform and assembled with Newbler[58]. A repeat library was generated (RepeatModeler) and repeats characterized (CENSOR[59] v. 4.2.27 against RepBase release 17.03[60]). Ribosomal RNA genes (RNAmmer[61]) and transfer RNAs (tRNAscan-SE[62]) were identified. Other non-coding RNAs were identified by a sequence homology search against the Rfam database[63]. Repeats and predicted RNAs were then masked using RepeatMasker. Protein-coding genes were predicted using a combination of ab initio programs[64,65] and the annotation pipeline tool MAKER[66]. A consensus high-confidence gene set from the above prediction algorithms was generated. The size and number of exons and introns in N. americanus were determined by parsing exon sizes from gff-format annotations (considering only exon features tagged as “coding_exon”) and calculating intron sizes, and were compared with those of C. elegans genes (WS230). Significant differences in exon and intron lengths and numbers were tested between species and between orthologous and non-orthologous gene groups using two-tailed t-tests with unequal variance. Two separate approaches were used to identify putative operons in N. americanus. Gene product naming was determined by BER (JCVI) and functional categories of deduced proteins were assigned[67-69]. Orthologous groups were built from 13 species using OrthoMCL[70] and genes not orthologous to the other 12 species were classified as N. americanus-specific.
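
The exon and intron measurements described above can be sketched in a few lines. This is an illustrative outline rather than the scripts used in the study: the function names, the toy annotation and the simplification that column 9 holds a bare gene identifier are all assumptions. The second function computes only the Welch (unequal-variance) t statistic and its degrees of freedom; obtaining the two-tailed p-value would additionally require the t distribution.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def exon_intron_sizes(gff_lines, feature="coding_exon"):
    """Collect exon and intron sizes from tab-separated GFF lines.

    Assumption: column 9 holds a plain gene identifier, so exons sharing
    the same contig and column-9 value belong to the same gene.
    """
    exons_by_gene = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        exons_by_gene[(cols[0], cols[8])].append((int(cols[3]), int(cols[4])))

    exon_sizes, intron_sizes = [], []
    for exons in exons_by_gene.values():
        exons.sort()
        # GFF coordinates are 1-based and inclusive.
        exon_sizes.extend(end - start + 1 for start, end in exons)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
            intron_sizes.append(next_start - prev_end - 1)
    return exon_sizes, intron_sizes


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's t-test."""
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Toy annotation (hypothetical): one gene with three coding exons.
toy_gff = [
    "ctg1\tsrc\tgene\t101\t1100\t.\t+\t.\tgeneA",
    "ctg1\tsrc\tcoding_exon\t101\t200\t.\t+\t0\tgeneA",
    "ctg1\tsrc\tcoding_exon\t301\t350\t.\t+\t0\tgeneA",
    "ctg1\tsrc\tcoding_exon\t1001\t1100\t.\t+\t0\tgeneA",
]
exons, introns = exon_intron_sizes(toy_gff)
print(exons, introns)
```

Real GFF files encode the parent gene inside a structured attribute string, so the grouping key would need to be parsed from column 9 accordingly.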

RNA-seq

RNA was extracted[18], DNase-treated and used to generate both Roche/454 and Illumina cDNA libraries that were sequenced using a Genome Sequencer Titanium FLX (Roche Diagnostics) and Illumina (Illumina Inc, San Diego, CA), with slight modification. The 454 cDNA reads were analyzed as previously described[18]. The Illumina RNA-seq data were processed[71] and low-compositional complexity bases were masked[72]. RNA-Seq reads were aligned[73] to the predicted gene set and genes with a breadth of coverage ≥50% across the gene sequence (i.e., “expressed”) were used for further downstream analysis. Expression was quantified using expression values normalized to the depth of coverage per 100 million mapped bases (DCPM). Expressed genes were subject to further differential expression analysis using edgeR[74] (false discovery rate <0.05), in order to identify stage-overexpressed genes.
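
The "expressed" filter and the DCPM normalization can be illustrated as follows. The exact DCPM formula is not given in the text, so this sketch assumes it is the mean per-base depth over a gene scaled to 100 million mapped bases; the function names and the toy depth profile are hypothetical.

```python
def breadth_of_coverage(depths):
    """Fraction of gene positions covered by at least one aligned read."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(1 for depth in depths if depth > 0) / len(depths)


def is_expressed(depths, min_breadth=0.5):
    """Apply the breadth-of-coverage threshold (>= 50%) described in the text."""
    return breadth_of_coverage(depths) >= min_breadth


def dcpm(depths, total_mapped_bases):
    """Mean depth over the gene per 100 million mapped bases.

    Assumed formula: mean(depth) * 1e8 / total mapped bases in the library.
    """
    mean_depth = sum(depths) / len(depths)
    return mean_depth * 1e8 / total_mapped_bases


# Toy gene of 10 bp (hypothetical per-base depths) in a library with
# 200 million mapped bases.
toy_depths = [0, 0, 4, 4, 4, 4, 0, 4, 4, 4]
print(breadth_of_coverage(toy_depths), is_expressed(toy_depths))
print(dcpm(toy_depths, 200_000_000))
```

Scaling by the total number of mapped bases makes values comparable between libraries of different sequencing depth, which is the purpose of the normalization regardless of the exact formula used.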

Deduced Proteome Functional Annotation and Enrichment

Proteins were searched against KEGG[75] using KAAS[68] (cut-off 35 bits), and InterProScan[69] was used to obtain InterPro[76] domain matches and Gene Ontology[67] (GO) annotations. Proteins with signal peptides[77], non-classical secretion[78] and transmembrane topology[77] were identified. The degradome was identified by comparison to the MEROPS[79] protease unit database using WU-BLAST (identifying the best hit with E ≤ 1e-10). Enrichment of different protease groups among different gene sets (based on similarity to C. elegans) was detected using False Discovery Rate (FDR)-corrected binomial distribution probability tests[80]. The significance of GO enrichment between the iL3- and adult-overexpressed gene sets was calculated using FUNC[81] at a 0.01 significance threshold after Family-Wise Error Rate (FWER) population correction[81]. QuickGO[82] was used to analyze the hierarchical structure of significant GO categories.
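The FDR-corrected binomial enrichment test mentioned above can be sketched in Python as follows. The text does not state which FDR procedure or which tail was used, so the one-sided upper-tail test and the Benjamini-Hochberg adjustment here are assumptions, as are the function names.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of seeing at least k
    members of a protease group among n genes when the background
    fraction of that group is p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

One p-value would be computed per protease group with `binom_upper_tail`, and the full list passed to `benjamini_hochberg` before applying the chosen FDR threshold.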

Proteomic analysis of somatic worm extract

Whole worms were ground under liquid nitrogen before solubilisation in lysis buffer, total protein was precipitated, and established methods[83] were used to reduce, alkylate and trypsin-digest two 1.5 mg samples of total somatic protein. Peptide fractions were prepared before LC and mass spectral analysis. Only proteins confirmed with at least two peptides and a confidence of p≤0.05 were considered identified. GO functional enrichment among the genes supported by proteomics was calculated[81], using all of the genes without proteomics support as the background for comparison.

Transcription Factors and their binding sites

Transcription factors in N. americanus were identified by extracting KEGG Orthology (KO) numbers from the KEGG transcription factor database (derived from TRANSFAC 7.0[84]) and comparing them to N. americanus KOs. Documented matrices of transcription factor binding sites were downloaded from the JASPAR database[85]. The corresponding protein accession numbers were extracted, converted to KOs and compared to the N. americanus transcription factor KOs to define a subset of N. americanus transcription factors with available binding site information. The binding site matrices of this subset were used to scan the sequences up to 500 bp upstream and downstream of differentially expressed genes using Patser.
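To illustrate what scanning flanking sequence with a binding site matrix involves, here is a minimal Python sketch of log-odds weight matrix scoring. It is not Patser and does not reproduce its options or scoring details; the pseudocount, the uniform background, the forward-strand-only scan and the function names are all assumptions.

```python
from math import log2


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds scores.

    counts: list of dicts, one per motif position, e.g. {"A": 3, "T": 7}.
    """
    matrix = []
    for col in counts:
        total = sum(col.get(b, 0) for b in "ACGT") + 4 * pseudocount
        matrix.append(
            {
                b: log2(((col.get(b, 0) + pseudocount) / total) / background)
                for b in "ACGT"
            }
        )
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, window, score) for forward-strand windows scoring
    at or above threshold; start is 0-based."""
    seq = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other symbols
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits
```

In practice the reverse complement of each flanking sequence would be scanned as well, and the threshold would be chosen per matrix.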

SCP/TAPS

Each protein was searched for the SCP/TAPS-representative protein domains[86] IPR014044 (“CAP domain”) and PF00188 (“CAP”)[86] using InterProScan[69] and hmmpfam[87]. Phylogenetic trees were built from full-length primary sequences derived from ungapped genes using Bayesian inference[88] and Neighbor Joining[89], as previously described for other helminths[32,86,90]. Leaves of the tree were annotated with domain information, secretion mode and expression data, and then visualized using iTOL[91].

Potential Drug Targets

GPCRs, LGICs and VGICs were identified with InterProScan[69]. Ion channels were identified using WU-BLASTP (E ≤ 1e-10) against the C. elegans proteome (WS230). Ivermectin Target Characterization: sequence alignments were obtained with MUSCLE[92] for the C. elegans and N. americanus orthologs within two orthologous groups (NAIF1.5_00184 and NAIF1.5_06724). Homology models for the two N. americanus orthologs (NECAME_16744 and NECAME_16780) were built with MODELLER[93] using the C. elegans crystal structure as the template[94]. For each ortholog, five models were built and the one with the lowest total function score (energy) was chosen as the model shown. Sequence alignments are colored using the ClustalX scheme in Jalview[95]; protein structure models are rendered in PyMOL (Schrödinger, LLC, The PyMOL Molecular Graphics System, Version 1.3r1. 2010).

Kinome and Chokepoints

N. americanus genes were screened against the collection of kinase domain models in the Kinomer database[96]. Custom score thresholds were applied for each kinase group and adjusted until an hmmpfam search[87] came as close as possible to identifying the known C. elegans kinases. The same cutoffs were then applied to the N. americanus gene set to identify putative kinases, as previously described[97]. Kinases were prioritized by adapting a previously described protocol[10]. A chokepoint of a KEGG metabolic pathway was defined as a reaction that either consumes a unique substrate or produces a unique product. The reaction database from KEGG v58[98] was used, and chokepoints were identified and prioritized as previously described[99]. Metabolic module abundances were calculated (and normalized in DCPM) based on KAAS annotations[68], and module bottlenecks were defined as reaction steps in the cascade that are both essential for module completion and have a low enzyme abundance that primarily constrains the overall module abundance. Sequences for homology modelling were aligned with their reference sequences using T-COFFEE[100], models were constructed using MODELLER[101] with default parameters from the PDB structures with the highest sequence similarity, and docking was performed using AutoDock4.2[102] with default parameters. Chemogenomic screening for compound prioritization was performed as previously described[99].
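The chokepoint definition above (a reaction that consumes a unique substrate or produces a unique product) translates directly into a short Python sketch. The input layout, a mapping from reaction identifier to substrate and product lists, is a hypothetical simplification of the KEGG reaction database, and the prioritization step is not covered.

```python
from collections import Counter


def chokepoints(reactions):
    """Return the set of reaction IDs that are chokepoints.

    reactions: dict mapping reaction ID -> (substrates, products),
    where each is a list of compound IDs (hypothetical layout).
    """
    # Count how many reactions consume / produce each compound.
    consumed = Counter(c for subs, _ in reactions.values() for c in set(subs))
    produced = Counter(c for _, prods in reactions.values() for c in set(prods))
    return {
        rid
        for rid, (subs, prods) in reactions.items()
        if any(consumed[c] == 1 for c in subs)
        or any(produced[c] == 1 for c in prods)
    }
```

For example, given `R1: A -> B`, `R2: A -> B` and `R3: B -> C`, only `R3` is a chokepoint: it is the sole consumer of B and the sole producer of C, whereas R1 and R2 share both their substrate and their product.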

Protein microarray

In 2005, 1,494 individuals between the ages of 4 and 66 years (inclusive) were enrolled (with informed consent) into a cross-sectional study in an N. americanus-endemic area of Northeastern Minas Gerais state in Brazil, using protocols approved by the George Washington University IRB (117040 and 060605), the Ethics Committee of Instituto René Rachou, and the National Ethics Committee of Brazil (CONEP) (Protocol numbers 04/2008 and 12/2006). Venous blood (15 mL) was collected from individuals determined to be positive for N. americanus. A total of 1,275 N. americanus open reading frames (ORFs) contained a classical signal peptide for secretion and had RNA-seq evidence for transcription in iL3 and/or adult worms. Of those, 623 corresponding cDNAs were successfully amplified, cloned and expressed, and the protein extracts were contact-printed without purification onto nitrocellulose glass FAST® slides. The printed in vitro-expressed proteins were quality-checked using antibodies against the incorporated N-terminal poly-histidine (His) and C-terminal hemagglutinin (HA) tags. Protein arrays were blocked in blocking solution (Whatman) and probed with human sera overnight. Arrays were washed, and isotype- and subclass-specific responses were detected using biotinylated mouse monoclonal antibodies against human IgG1, IgG3 and IgG4 (Sigma) and biotin-conjugated mouse monoclonal anti-human IgE Fc (Human Reagent Laboratory, Baltimore, MD). Microarrays were scanned using a GenePix microarray scanner (Molecular Devices). The data were analyzed using the “group average” method[103], whereby the mean fluorescence was considered for analysis.