Literature DB >> 23773280

Computational cloning of drug target genes of a parasitic nematode, Oesophagostomum dentatum.

Nathan M Romine1, Richard J Martin, Jeffrey K Beetham.   

Abstract

BACKGROUND: Gene identification and sequence determination are critical requirements for many biological, genomic, and bioinformatic studies. With the advent of next generation sequencing (NGS) technologies, such determinations are predominantly accomplished in silico for organisms for which the genome is known or for which there exists substantial gene sequence information. Without detailed genomic/gene information, in silico sequence determination is not straightforward, and full coding sequence determination typically involves time- and labor-intensive PCR-based amplification and cloning methods.
RESULTS: An improved method was developed with which to determine full length gene coding sequences in silico using de novo assembly of RNA-Seq data. The scheme improves upon initial contigs through contig-to-gene identification by BLAST nearest-neighbor comparison, and through single-contig refinement by iterative-binning and -assembly of reads. Application of the iterative method produced the gene identification and full coding sequence for 9 of 12 genes and improved the sequence of 3 of the 12 genes targeted by benzimidazole, macrocyclic lactone, and nicotinic agonist classes of anthelminthic drugs in the swine nodular parasite Oesophagostomum dentatum. The approach improved upon the initial optimized assembly with Velvet that only identified full coding sequences for 2 genes.
CONCLUSIONS: Our reiterative methodology represents a simplified pipeline with which to determine longer gene sequences in silico from next generation sequence data for any nematode for which detailed genetic/gene information is lacking. The method significantly improved upon an initial Velvet assembly of RNA-Seq data that yielded only 2 full length sequences. The identified coding sequences for the 11 target genes enables further future examinations including: (i) the use of recombinant target protein in functional assays seeking a better understanding of the mechanism of drug resistance, and (ii) seeking comparative genomic and transcriptomic assessments between parasite isolates that exhibit varied drug sensitivities.

Entities:  

Mesh:

Substances:

Year:  2013        PMID: 23773280      PMCID: PMC3689052          DOI: 10.1186/1471-2156-14-55

Source DB:  PubMed          Journal:  BMC Genet        ISSN: 1471-2156            Impact factor:   2.797


Background

Helminth infection of the gut of humans and domestic animals is a global concern with tremendous social and economic costs [1]. Treatment of either humans or animals may involve administering anthelminthic drugs, typically from 1 of 3 main drug classes: the benzimidazoles (which selectively bind nematode beta-tubulin), macrocyclic lactones (which allosterically activate glutamate-gated chloride channels present in nematodes), or nicotinic agonists (which selectively activate subtypes of nematode nicotinic acetylcholine receptors; nAChRs) [2,3]. Not yet in widespread use are two more recently developed drugs monepantel [4] and derquantel [5] that represent amino-acetonitrile and spiroindole drug classes, respectively; both of these compounds have sites of action on subtypes of nematode nicotinic receptors other than the nematode levamisole receptor subtype [3] and have been promoted as ‘resistance-busting’. Of particular concern is that resistance has developed to members of each of the typically used drug classes [6,7]. Studies of anthelminthic drugs, many performed in the non-parasitic nematode Caenorhabditis elegans, have identified a number of genes/proteins that are drug targets of benzimidazoles (ben-1), macrocyclic lactones (avr-14, avr-15, glc-1, glc-2, glc-3, glc-4), and the nicotinic agonist levamisole (lev-1, lev-8, unc-29, unc-38, unc-63) (reviewed in [7,8]). From studies like these it appears that the molecular basis of susceptibility to macrocyclic lactones and levamisole is polygenic. Although studies in C. elegans have provided important insights about drug-targets and drug-sensitivity, they are free-living bacteriovores and not parasitic worms. Therefore, there is great interest in extending drug sensitivity studies to the parasitic nematodes. One major hindrance to such efforts has been the lack of genome/gene information that is available for the parasitic nematodes. Consequently, studies requiring sequence determination typically include the time- and labor-intensive steps of: (i) identifying the target gene from among known genes of genetically close organisms, (ii) aligning target gene sequences to determine regions of high similarity, (iii) designing DNA primers to those regions, and finally, (iv) amplifying the target using PCR and a single pair of primers, cloning and sequencing the cloned product. A modification of this approach, one having the potential to reduce the time and expense required to identify multiple gene sequences, is to use RNA-Seq data to build the target nematode gene sequences in silico, i.e., by computational methods. The next-generation sequencing (NGS) technique of RNA-Seq economically produces large amounts of sequence data, albeit comprised of very short (50-150 base) reads. When applied to an organism having a known genome, RNA-Seq sequence data can be computationally analyzed using software packages such as the commonly used Velvet package [9] to produce entire gene sequences in silico. Unfortunately, when applied to organisms for which little genome information is known, the output is more typically comprised of short contigs and contigs of lower quality [10]. An example of a parasitic nematode lacking a described genome is Oesophagostomum dentatum, a soil-transmitted helminth (STH) with a classic fecal-oral transmission route that causes minor nodules in the large intestine of swine and that is used as a parasite model for STH [11,12]. This report describes a reiterative method that builds upon assembly using the existing Velvet analysis software to allow greatly improved in silico determination of gene sequences from O. dentatum. The method was applied towards a determination of the sequence of 12 O. dentatum genes that, predominantly in C. elegans, have been identified as targets of the major classes of anthelminthic drugs (benzimidazoles, macrocyclic lactones, and nicotinic agonists). Establishing these sequences in O. dentatum facilitates downstream physiological and molecular studies.

Results

Generation of mRNA-seq read libraries

Separate mRNA-seq libraries were constructed from 5 μg of high quality total RNA (RNA Integrity Number ≥ 7.3) isolated from 58 adult male and 141 adult female worms. Given the absence of a sequenced genome of O. dentatum for use as a scaffold during subsequent assembly and analysis, paired-end sequencing was performed to facilitate contig-building steps. Similar numbers of 75-cycle paired reads were obtained for both libraries, (2.144 × 107 for male, 2.159 × 107 for female) for a combined total of more than 40 million reads. The lack of information about the exome size and complexity of O. dentatum, and that the RNA-seq libraries were not normalized, preclude estimation of the depth of coverage of the libraries; however, in broad approximation, the 3,100 × 106 DNA base calls within the 40 × 106 reads represent considerable coverage of mRNAs transcribed from the estimated 53-59 MB genome [13]. Reads were trimmed of read tags (added during library building to allow multiplexing during the sequencing run) and deposited at the NCBI sequence read archive: male library [GenBank:SRR393668] and female library [GenBank:SRR393669].

Velvet assembly

The NGS assembler Velvet executed in the paired-end mode was used to assemble the 40 million reads from the combined male and female libraries into contigs (Figure 1, step 1). Combining the 2 libraries increased the potential number of reads that were identified for each target sequence while maintaining compliance with NCBI guidelines requiring submitted cDNA sequences be derived from a single strain. The paired-end mode was used for optimal de novo assembly, given the lack of a reference genome. Hashing, the early Velvet step in which individual reads are indexed into overlapping sequences (k-mers) of some length “k”, has significant impact on the length and quality of the contigs built during an assembly, with longer k-mers generally corresponding to greater specificity (i.e., that a detected alignment is actually correct) but lower sensitivity to detect alignments [9]. Because optimal k-mer length cannot be known a priori, multiple assemblies were performed using k-mers ranging from length 17 to 49.
Figure 1

Determination of gene sequences (1) Individual data sets were assembled into contigs using Velvet. (2) BLAST searches for genes of the nAChR-pathway were carried out with a high cutoff (expect value = 1E-10) to identify contigs highly similar to the target genes. (3) Reads were individually mapped (using Bowtie) to high similarity contigs. (4) All paired-reads for which at least one read mapped to a contig (in Step- 3) were identified and binned using a custom Java program. (5) De novo assembly of Step- 4 sequences was performed using Velvet. (6) Iteration of Steps 3-5 was performed until the iteration resulted in no additional reads being mapped to the contig of interest.

Determination of gene sequences (1) Individual data sets were assembled into contigs using Velvet. (2) BLAST searches for genes of the nAChR-pathway were carried out with a high cutoff (expect value = 1E-10) to identify contigs highly similar to the target genes. (3) Reads were individually mapped (using Bowtie) to high similarity contigs. (4) All paired-reads for which at least one read mapped to a contig (in Step- 3) were identified and binned using a custom Java program. (5) De novo assembly of Step- 4 sequences was performed using Velvet. (6) Iteration of Steps 3-5 was performed until the iteration resulted in no additional reads being mapped to the contig of interest. The 12 genes of C. elegans shown to produce anthelminthic targets for which the homologous genes and sequences of O. dentatum were sought are shown in Table 1; the drug classes for which they are targets are shown in Table 2. To identify contigs exhibiting high similarity to the C. elegans target sequences separate BLAST databases, each comprised of the set of contigs produced by a single k-mer assembly, were built and then queried each database using tblastn and the protein sequences of the C. elegans target genes (Figure 1, step 2). A stringent “E-value” threshold of 1 × 10-10 was used within the tblastn search to limit false positive identifications; the E-value corresponds to the number of times a match of the same quality would be found by chance within the database. A custom java-script (Additional file 1) was used to collect those contigs whose corresponding high scoring pair (HSP) E-values were ≤ 1 × 10-10 and, for further stringency, to retain only those contigs whose region of alignment with the target protein exhibited ≥ 60% identity across ≥ 50% of the contig length. Some contigs that passed these filters exhibited high similarity to more than one target gene (data not shown), as would be expected given that some of the target genes are paralogous to one another. To unambiguously assign the filtered contigs to single target genes, contigs were queried (using blastx) against the C. elegans genome database and then gene identity was assigned based upon the match exhibiting the lowest E-value.
Table 1

Drug-target genes and high similarity initial-assembly contigs at multiple k-mers

C. elegans Target gene
21-mer HSPs
23-mer HSPs
25-mer HSPs
27-mer HSPs
IDAcc #CDS len#X¯R#X¯R#X¯R#X¯R
lev-1
CAB03148
1419
3
148
99-219
3
238
174-294
5
238
96-525
4
237
114-381
lev-8
CAB01685
1596
2
147
126-168
1
255
255
1
108
108
 
nd
nd
unc-29
CAB02308
1482
12
144
105-168
7
259
120-480
8
248
132-426
11
202
96-402
unc-38
CCD69819
1524
6
180
111-294
9
184
90-375
9
283
84-1098
8
339
87-1380
unc-63
CCD66192
1524
3
272
105-576
5
247
96-510
5
263
105-510
5
235
105-510
avr-14
CCD61323
1251
1
90
90
2
112
102-123
2
114
105-123
4
111
105-123
avr-15
CAB03329
1437
4
163
123-210
7
154
123-192
6
140
105-210
3
185
123-240
ben-1
CAB00853
1335
2
163
108-219
1
111
111
1
114
114
3
102
90-117
glc-1
CAB07361
1305
1
123
123
1
108
108
1
318
318
1
237
237
glc-2
CCD62432
1305
4
198
147-318
1
879
879
1
939
939
3
375
207-582
glc-3
CCD69051
1455
3
146
87-207
2
115
111-120
2
102
93-111
4
117
99-144
glc-4
CCD65896
1503
6
192
111-279
2
702
249-1155
2
702
249-1155
2
702
249-1155
 
29-mer HSPs
31-mer HSPs
33-mer HSPs
35-mer HSPs
37-mer HSPs
ID
#
X¯
R
#
X¯
R
#
X¯
R
#
X¯
R
#
X¯
R
lev-1
5
219
93-420
11
150
99-330
9
181
114-255
8
207
102-558
6
334
138-657
lev-8
 
 
nd
 
 
nd
 
 
nd
 
 
nd
 
 
nd
unc-29
8
223
120-465
9
171
96-378
8
180
96-396
2
172
162-183
4
173
144-207
unc-38
7
493
90-1473
7
416
84-1473
6
472
108-1473
4
708
261-1473
3
890
360-1473
unc-63
3
227
105-297
4
285
105-414
3
275
105-414
1
414
414
2
277
141-414
avr-14
3
123
111-135
5
145
111-207
6
169
123-270
5
202
123-321
5
240
123-327
avr-15
5
156
90-243
7
153
96-291
4
132
93-213
4
138
96-243
4
108
108-111
ben-1
2
97
93-102
4
99
90-120
3
99
96-102
3
109
87-135
5
109
93-129
glc-1
1
237
237
 
 
nd
2
141
141-141
 
 
nd
 
 
nd
glc-2
3
261
147-393
2
439
135-744
2
475
207-744
2
387
171-603
2
336
333-339
glc-3
3
112
111-114
7
169
93-330
7
120
90-168
5
145
117-213
6
153
111-189
glc-4
2
702
249-1155
2
702
249-1155
2
702
249-1155
2
702
249-1155
1
972
972
 
39-mer HSPs
41-mer HSPs
43-mer HSPs
45-mer HSPs
47-mer HSPs
ID
#
X¯
R
#
X¯
R
#
X¯
R
#
X¯
R
#
X¯
R
lev-1
3
564
339-807
1
1392
1392
1
1401
1401
2
619
345-894
3
245
219-270
lev-8
 
 
nd
 
 
nd
 
 
nd
 
 
nd
 
 
nd
unc-29
 
 
nd
 
 
nd
 
 
nd
 
 
nd
 
 
nd
unc-38
1
1473
1473
1
1473
1473
2
868
264-1473
1
1473
1473
 
 
nd
unc-63
 
 
nd
 
 
nd
 
 
nd
 
 
nd
 
 
nd
avr-14
1
390
390
1
219
219
 
 
nd
 
 
nd
 
 
nd
avr-15
2
111
111-111
 
 
 
1
120
120
1
174
174
1
174
174
ben-1
7
100
75-120
10
138
90-234
16
115
84-234
25
114
87-348
15
160
90-453
glc-1
 
 
nd
 
 
nd
 
 
nd
 
 
nd
 
 
nd
glc-2
 
 
nd
 
 
nd
 
 
nd
 
 
nd
 
 
nd
glc-3
3
618
84-1383
 
 
nd
1
378
378
 
 
nd
 
 
nd
glc-411428142811428142819729721972972114281428

C. elegans target genes (name (ID), GenBank accession number (Acc #) and coding sequence length (CDS len)) used to BLASTx-query a database comprised of the initial de novo library assembly. For each k-mer (e.g. “21 mer HSPs”) are listed in columns the number of high scoring pairs identified (HSP; “#”), the mean HSP length in DNA bases (“”), and the range of HSP-lengths (“R”) with minimum and maximum length shown. Bold HSP-length values indicates the longest HSP identified among all k-mers for a given target gene. “nd” indicate no high-similarity HSPs were identified at that k-mer.

Table 2

Target gene identification and comparison

Gene ID
Max HSP length (DNA)
Max Contig length (DNA)
Max Contig length (AA)
Length (AA)
Final vs C. elegans protein
O. dentatum
 InitialFinalInitialFinalInitialFinalC. elegans% ID% full lengthAcc #
Levamisole target genes (nAChR subunits)
 
 
 
 
lev-1
1401
1401
1580
1612
467
477
472
91
100
GACS01000001
lev-8
255
255
269
269
85
89
531
85
17
GACS01000002
unc-29
480
576
482
614
160
204
493
88
41
GACS01000003
unc-38
1473
1455
1607
1631
491
507
511
72
100
GACS01000004
unc-63
576
1248
578
1770
192
417
502
82
82
GACS01000005
Macrocyclic lactone, benzimidazole target genes
 
 
 
 
avr-14
390
1299
629
1485
130
464
416
52
100
GACS01000006
avr-15
291
1242
560
1531
97
447
478
44
100
GACS01000007
ben-1_1
453
1308
453
1436
151
448
444
94
100
GACS01000008
ben-1_2
453
1317
453
1751
151
448
444
95
100
GACS01000009
glc-2
939
1215
994
1389
313
424
434
72
100
GACS01000010
glc-3
1383
1383
1724
2048
461
531
484
58
100
GACS01000011
glc-4142815211628162847650850077100GACS01000012

Best initial high scoring pair (HSP) and contig lengths correspond to the longest HSP from the initial read assemblies (see bold values from Table 1) and the contig from which that HSP derives. “% full length” represents the percent of the comparator C. elegans protein that is represented within the final contig. “% ID” represents the percent identity as determined by pairwise alignment.

Drug-target genes and high similarity initial-assembly contigs at multiple k-mers C. elegans target genes (name (ID), GenBank accession number (Acc #) and coding sequence length (CDS len)) used to BLASTx-query a database comprised of the initial de novo library assembly. For each k-mer (e.g. “21 mer HSPs”) are listed in columns the number of high scoring pairs identified (HSP; “#”), the mean HSP length in DNA bases (“”), and the range of HSP-lengths (“R”) with minimum and maximum length shown. Bold HSP-length values indicates the longest HSP identified among all k-mers for a given target gene. “nd” indicate no high-similarity HSPs were identified at that k-mer. Target gene identification and comparison Best initial high scoring pair (HSP) and contig lengths correspond to the longest HSP from the initial read assemblies (see bold values from Table 1) and the contig from which that HSP derives. “% full length” represents the percent of the comparator C. elegans protein that is represented within the final contig. “% ID” represents the percent identity as determined by pairwise alignment. For each gene of interest, Table 1 indicates C. elegans target gene information including coding sequence length, and summarizes the BLAST results profile for each k-mer assembly from k-value 21 to 47; profiles for k-values of 17, 19, and 49 were unremarkable and are not shown. For each k-mer evaluated and each target sequence, the table indicates the number of HSPs identified, their mean length, and their range from shortest to longest. A best/longest HSP was identified for each target contig (as is indicated by bold font for R value), and no contig yielded more than 1 HSP for a target gene (data not shown). Relative to the length of the target gene coding sequences, the length of the corresponding longest HSP was quite short for 9 genes, and was near to completely full length for 3 other sequences (lev-1, unc-38, and glc-4). Table 1 demonstrates the impact that k-mer can have on the quality of assembly; for a given length k-mer, the quality of the assembly varied greatly, with (i) some k-mers producing many but short HSPs, (ii) other k-mers yielding few, but longer HSPs, and (iii) no single k-mer yielding the longest HSP (contig) for all target genes. Also as shown in Table 1, for some targets only a single k-mer produced the best HSP (e.g. glc-3 at 39-mer), whereas for other targets a range of k-mers yielded equivalent best HSPs (e.g. unc-38 at k-mers 29-45).

Iterative-binning and iterative-assembly to optimize sequence determination

To improve the overall quality of the contigs (i.e., increase contig lengths by extension and by gap filling) additional computational steps of assembly were developed (Figure 1, steps 3-6). In outline, this involved identifying and binning all RNA-Seq library read-pairs for which at least 1 read matched a single contig for each target gene, and then using Velvet to reassemble those binned reads into contigs; this process was repeated until the output contigs exhibited no relative improvement. In detail, high-identity contigs that were identified in the initial Velvet assembly (at k-mer length 31 for all gene targets excepting lev-8, glc-1, and glc-3,which used k-mer lengths of 23, 25, and 39 respectively) were used in Step 3 (Figure 1) to identify and bin all library reads that mapped to those contigs; this process utilized the mapping program Bowtie set to increase the sensitivity of read identification by using a low quality threshold value (of 150) and by running it in unpaired-read mode. The unpaired-read mapping mode allowed inclusion of those reads that did not contribute to the prior contig (Table 1) for reason their pair read failed to map to that prior contig. Consequently, the reads binned in Step 3 included the paired reads along with a number of single reads for which their pair did not map; a custom Java-script collected into a single bin those reads that mapped as well as reads that did not map but whose read-pair did map (Figure 1, Step 4: Additional file 2). The Velvet program was then used to assemble contigs from the collected paired-end reads combined from both libraries (Figure 1, Step 5) using multiple length k-mers and coverage cutoffs to identify assembly conditions that produced a maximum contig length; this step involved minimal computational time given the small number of reads (< 2 * 105) present among all bins. Bowtie mapping, read set collection and assembly were reiterated (Figure 1, Step 6) until a maximum contig length and maximum coverage (relative to the target genes) were achieved. The dramatic effectiveness of the reiterative method is evidenced by comparison of contigs and corresponding best-HSPs of the final (iterative-derived) contigs to the initial contigs and to the C. elegans target sequence, as is shown in Table 2. Whereas the initial Velvet assembly yielded 3 sequences corresponding to nearly 100% of the target gene coding sequences (lev-1, unc-38, glc-4), reiteration yielded the full coding sequence of an additional 7 target sequences including a second isotype of ben-1, and yielded significant improvement to all other sequences excepting that of lev-8. Interestingly, whereas the initial assembly yielded contigs that best-mapped by BLAST analysis to glc-1 (Table 1), reiteration yielded an extended contig unambiguously identified by BLAST as the paralogous target gene avr-15. A closer examination of the initial contigs that were identified as glc-1 revealed that their region of similarity to glc-1 was quite small and exhibited almost-as-high BLAST similarity to the paralog avr-15. Consequently, reads within the libraries provide evidence for transcripts for 2 avr-15 variants and no glc-1. The quality/accuracy of the in silico derived sequences can be inferred from the shared identity to target genes of C. elegans (Table 2, column “% ID”). Additionally, a direct indication of in silico sequence quality for 2 of the target genes (unc-38 and -63) is possible because their sequence in O. dentatum had previously been determined (from PCR-amplicons). BLAST nucleotide comparison of the previously determined O. dentatum unc-63 mRNA sequence [GenBank:HQ162136] to the corresponding 1770 nucleotide in silico sequence (Table 2, GACS01000005) identified a single alignment comprised of 1768 bases (including 30 non-identical bases) and no gaps. Of the non-identical bases, 20 were located within the coding sequence but only 1 of the 20 resulted in a change in the deduced protein sequence, a G1411A/ R433Q (numbering relative to HQ162136 DNA and corresponding protein sequences). Within the RNA-seq library reads this base call was invariant, i.e. of the 5 reads that mapped over the G1411A position, all were unique and all contained the “A” base call. A similar comparison of the 1681 base O. dentatum unc-38 mRNA sequence [GenBank:GU256648.1] to the corresponding 1631 nucleotide in silico sequence (Table 2, GACS01000004) identified a single alignment comprised of 1607 bases and no gaps, and which, when limiting the comparison to only the coding sequence, exhibited 3 base changes (i.e., 1521 of the 1524 coding sequence bases were identical). All 3 base changes result in amino acid changes: A179G/ Y35C, T434C/ Y120H, and T1103C/ F343S (numbering relative to GACS01000004 DNA and corresponding protein sequences). The other differences determined by alignment of the full sequences were that the GACS01000004 3’ UTR was shorter by 43 bases, and that the first 25 bases of the GACS01000004 75 base 5’ UTR was not contained within the 82 base GU256648.1 5’ UTR (suggesting the in silico sequence may represent a 5’ splice variant). To validate these differences within the 5’ UTR and the coding sequence of GACS01000004, reverse transcription polymerase chain reaction (RT-PCR) using a 5’ SL1 splice leader forward primer (see Methods) and an unc-38 specific internal reverse primer were used to amplify an approximately 1600 base RNA segment (spliced leaders represent a set of invariant nucleotide sequences that are post-transcriptionally added to the 5’ end of many nematode mRNAs). The sequence of individual PCR clones fully recapitulated the GACS01000004 5’ UTR sequence as well as the 3 base changes within the coding sequence. Further evidence for the validity of the in silico derived sequences was shown by RT-PCR amplification of ben-1 and avr-15 from RNA template using gene specific primers designed from the in silico sequence (Table 2, GACS01000008 and GACS01000007, respectively). The sequences of 9 ben-1 PCR clones compared against that of GACS01000008 showed a single 1349 base no-gap alignment with 1307 identities and 42 variant base calls of which 6 resulted in a change in the deduced protein sequence: G165A/ V49I, G512A/ M164I, A646G/ D209G, T840C/ S274P, G996A/ V326M, C1111T/ A364V (numbering relative to GACS01000008 DNA and corresponding protein sequences). Interestingly, at each of the 42 sites of base variance, the majority of clone sequences called for an identical base to that of the in silico GACS01000008 sequence. In addition, at 21 of the 42 sites of base variance only a single PCR clone contained the variant base call. Thus the 42 base variants likely represent single nucleotide polymorphisms that are present within the population of ben-1 RNAs examined. In the equivalent analysis of avr-15, the sequences of 2 PCR clones compared against that of GACS01000007 showed a single 1363 base no-gap alignment with 1320 identities and 43 variant base calls of which 6 resulted in a change in the deduced protein sequence: A1089G/ K345R, C1103T/ H350Y, C1173T/ V373A, T1185C/ V377A, A1284G/ K410R, C1292G/ L413V (numbering relative to GACS01000007 DNA and corresponding protein sequences). Among the 2 PCR clones, one contained only the V373A deduced change and otherwise was identical in translation to the protein deduced from GACS01000007, and the other clone contained all 6 deduced amino acid changes.

Discussion

The data shown in Results demonstrate the efficient and successful use of an iterative de novo assembly of RNA-Seq data to determine in silico the sequence of 12 O. dentatum anthelminthic drug target genes. Selection of these particular genes was based upon their general importance to a range of studies investigating helminth drug targets (reviewed in [7,8]). The iterative assembly produced full length coding sequences for 9 target genes, whereas the Velvet assembly yielded full length (or nearly full-length) sequences for only a 3-gene subset of those 9 genes. A major utility of this process is that, as a computational process, it is scalable and should fit well to a variety of gene-characterization situations in O. dentatum or in any other nematode lacking a known genome sequence. Related computational processes have been described with utility for producing output other than full length coding sequences. In one method, remapping of reads by identity to contigs within an initial assembly, and then reassembling contigs from those remapped reads, improved transcriptome assembly [14]; the desired output from that work was production of a general gene ontology. In another method, genome assemblies were variably improved via gap closure achieved by mapping paired-end reads and collecting pairs for which only one of the ends aligned to a contig [15]. As shown in Table 1, a number of k-mer lengths were used for initial library assemblies, demonstrating the dramatic effect of k-mer length on contigs. That said, as noted in Results, k-mer length 31 was used to build the contigs used in all target-gene resampling excepting that for lev-8 (for which k-mer 31 returned no contigs; see Table 1) and glc-3 (for which k-mer 31 contigs failed to support generation of resampled contigs representing the full target sequence). This suggests there is little need to use the best or longest contig as input to the resampling process. As a further test of this concept we were able to successfully build avr-15 from a single approximately 300 base initial contig. Thus, there seems to be a reduced need to conduct full library assemblies over a wide range of k-mers when attempting to derive in silico the sequences for a set of target genes; instead, one stops when a limited set of k-mers that have been run produce a quality contig for each target. The reiterative approach presented here may have utility on a larger scale to facilitate transcriptome projects in nematodes and other organisms that lack genomic information but for which more complete gene-transcript sequences are desired. A dynamic programming approach will likely be required in such extended application to accommodate the conditional filters used during reiterative binning and assembly, and to accommodate a broader range of possible contig-to-target sequence similarity scores (i.e., blast E-values). We note that for a gene family represented by many members within a single organism there can arise ambiguities in contig identification, something that was seen in the present study for some initially assembled nAChR subunit contigs (data not shown); that such ambiguities were not observed for any reiteratively assembled contigs logically suggests that the longer the contig sequence-lengths, the less the chance that ambiguities of identity assignment will occur. Of interest to nematologists is the identification of a lev-8-like sequence in O. dentatum, since it has also been found in C. elegans[16] but not in H. contortus[17], a Clade V nematode that is considered more closely related to O. dentatum than is C. elegans[18]. This identification is validated by its 85% amino acid identity (Table 2) and 94% similarity to C. elegans lev-8; by comparison, it exhibited only 67% identity and 82% similarity to the highest BLAST-identified matching gene/protein in H. contortus, acr-8 [GenBank:ABV68891]. These data suggest a loss of lev-8 in H. contortus but not O. dentatum. Because data reported here identified only a partial in silico sequence for lev-8 (<300 bp), it is uncertain whether O. dentatum contains a full length (functional) lev-8 or only a vestigial (partial) lev-8 sequence; if the latter, then O. dentatum may be close behind H. contortus in losing the gene; if the former, then given that the lev-8 nAChR subunit has been shown in C. elegans to confer sensitivity to levamisole [19], one might predict a difference in levamisole sensitivity as a function of the presence, or absence of lev-8. While differences in the nAChR properties and drug sensitivities of closely related nematode species have been observed, the genetic basis for these differences are unknown.

Conclusions

The reiterative approach presented here was effective in determining in silico longer sequence reads for 11 genes of a 12-gene set of drug target genes of O. dentatum, a nematode for which exists very little genomic or gene information; an initial Velvet assembly that yielded 3 full/nearly-full length sequences can be improved by reiteration to yield full coding sequences for 9 (or 10, including ben-1 isotype 2) target genes and improved sequences for the remaining genes. The reiterative approach is expected to have general application for the in silico gene identification/sequencing of any nematode for which detailed genetic/gene information is lacking. The identification of full coding sequences for the target genes enables further examinations including studies like (i) seeking to reconstitute functional proteins/systems for assessment in vitro (similar to [17]), (ii) seeking comparative genomic and transcriptomic assessments between parasites isolates that exhibit varied drug sensitivities; such studies are ongoing in our labs and those of collaborators.

Methods

Parasites

Adult O. dentatum worms were harvested from a pig (as described [20]) previously infected with an isolate that exhibits low sensitivity to levamisole [12]. Worms were counted, and separated by sex, by microscopic examination. To remove foreign contaminants, the separated (male versus female) worms were washed at least 3x in a pH 7.5 maintenance solution consisting of (mM): NaCl (150), KCl (2.7), CaCl2 (2), MgCl2 (0.3), PIPES (10), NaOH (13), glucose (11), NaHCO3 (12), penicillin 0.06 g L-1, streptomycin 0.1 g L-1. Worms were settled by gravity before removing each wash, then blotted dry and transferred to a 1.5 ml microfuge tube, weighed by difference, and processed for extraction of RNA.

RNA isolation

Parasite samples resuspended in 1.0 ml TRI reagent (Molecular Research Center, Cincinnati, Ohio) were ground by mortar and pestle under liquid nitrogen then brought to a total volume of 2 to 3 ml TRI reagent. Total RNA was extracted from the TRI reagent according to the manufacturer’s instructions, including an additional centrifugation step for clearing insoluble material. Extracted RNA was treated with DNase I (New England BioLabs, Ipswich, Massachusetts) (10 min at 37°C, 10 min at 75°C), then re-extracted with TRI reagent and resuspended in diethylpyrocarbonate-treated water. RNA concentration, purity, and quality (RNA Integrity Number) were assessed on a 2100 Bioanalyzer (Agilent Technologies, Santa Clara, California).

mRNA-Seq

The building of indexed, non-normalized, paired-end mRNA-seq libraries, and subsequent 75-cycle pyrosequencing on an Illumina GAIIx platform, were performed as a service by the DNA Facility (Office of Biotechnology, Iowa State University) using 5 μg total RNA (per sample). Male and female libraries were duplexed in a single sequencing lane.

Genomics and bioinformatics

Assembly

Velvet version 1.1.06 [9] was used for contig assembly.

Similarity searching

BLAST algorithms [21] were used to compare contigs with sequences available in public databases including the National Center for Biotechnology Information (NCBI) to identify homologues from other nematodes, i.e., sequences returning BLAST expect values ≤ 1E-10.

Read mapping

64 bit Bowtie [22] version 0.12.7 was used to map reads for contig building.

Pairwise comparison

The Needle algorithm [23] was used for pairwise comparison.

Custom codes

Java 1: Java-script code to read BLAST output and collect contigs that pass identity thresholds from a contig source file: available at Additional file 1. The length and identity of BLAST HSPs collected are adjustable parameters in the source code and should be adjusted for the degree of relatedness between the organisms compared. Java-2: Java-script code that reads Bowtie alignment file and collects mapped and orphan reads from read source file: available at Additional file 2.

RT-PCR

RT-PCR was performed using gene specific primers for ben-1 (ben-1 1 F: aacatgcgtgagatcgtgc, ben-1 1R: ctactcttctggataagcctcct) and avr-15 (avr-15 1 F: atgcatggtctcctaattgt, avr-15 1R: caaatgaagtcagtattcctcatct) using Platinum Taq DNA Polymerase High Fidelity (Invitrogen, Carlsbad, California). RT-PCR was performed for unc-38 using gene specific and SL1 splice leader primers (SL1: ggtttaattacccaagtttgag, unc-38 1R: caaggaggaaaatttatttcgagag) using Taq DNA Polymerase with ThermoPol Buffer (New England BioLabs, Ipswich, Massachusetts). Sanger sequencing was performed as a service by the DNA Facility (Office of Biotechnology, Iowa State University).

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

NMR performed the sequence assembly, wrote the custom scripts, and contributed to drafting of the manuscript. JKB and RJM contributed to the experimental design, analysis, and drafting of the manuscript. All authors read and approved the final manuscript.

Additional file 1

Java-script code to read BLAST output and collect contigs that pass identity thresholds from a contig source file. Description of data: Java-script code to read BLAST output. Click here for file

Additional file 2

Java-script code that reads Bowtie alignment file and collects mapped and orphan reads from read source file. Description of data: Java-script code to read BOWTIE alignment file. Click here for file
  22 in total

Review 1.  Drug resistance in human helminths: current situation and lessons from livestock.

Authors:  S Geerts; B Gryseels
Journal:  Clin Microbiol Rev       Date:  2000-04       Impact factor: 26.132

2.  Velvet: algorithms for de novo short read assembly using de Bruijn graphs.

Authors:  Daniel R Zerbino; Ewan Birney
Journal:  Genome Res       Date:  2008-03-18       Impact factor: 9.043

3.  Drug resistance mechanisms in helminths: is it survival of the fittest?

Authors:  Catherine E James; Amanda L Hudson; Mary W Davey
Journal:  Trends Parasitol       Date:  2009-06-21

Review 4.  Next-generation transcriptome assembly.

Authors:  Jeffrey A Martin; Zhong Wang
Journal:  Nat Rev Genet       Date:  2011-09-07       Impact factor: 53.242

5.  Resistance to levamisole resolved at the single-channel level.

Authors:  A P Robertson; H E Bjorn; R J Martin
Journal:  FASEB J       Date:  1999-04       Impact factor: 5.191

6.  In vitro characterization of lines of Oesophagostomum dentatum selected or not selected for resistance to pyrantel, levamisole and ivermectin.

Authors:  M Várady; H Bjørn; J Craven; P Nansen
Journal:  Int J Parasitol       Date:  1997-01       Impact factor: 3.981

7.  Oesophagostomum infections in humans.

Authors:  A M Polderman; J Blotkamp
Journal:  Parasitol Today       Date:  1995-12

Review 8.  Genome sequence of the nematode C. elegans: a platform for investigating biology.

Authors: 
Journal:  Science       Date:  1998-12-11       Impact factor: 47.728

9.  De novo assembled expressed gene catalog of a fast-growing Eucalyptus tree produced by Illumina mRNA-Seq.

Authors:  Eshchar Mizrachi; Charles A Hefer; Martin Ranik; Fourie Joubert; Alexander A Myburg
Journal:  BMC Genomics       Date:  2010-12-01       Impact factor: 3.969

10.  Field efficacy and safety of an oral formulation of the novel combination anthelmintic, derquantel-abamectin, in sheep in New Zealand.

Authors:  P R Little; A Hodges; T G Watson; J A Seed; S J Maeder
Journal:  N Z Vet J       Date:  2010-06       Impact factor: 1.628

View more
  4 in total

Review 1.  Perusal of parasitic nematode 'omics in the post-genomic era.

Authors:  Jonathan D Stoltzfus; Adeiye A Pilgrim; De'Broski R Herbert
Journal:  Mol Biochem Parasitol       Date:  2016-11-22       Impact factor: 1.759

2.  Transcriptomic evaluation of the nicotinic acetylcholine receptor pathway in levamisole-resistant and -sensitive Oesophagostomum dentatum.

Authors:  Nathan M Romine; Richard J Martin; Jeffrey K Beetham
Journal:  Mol Biochem Parasitol       Date:  2014-02-12       Impact factor: 1.759

Review 3.  Next-generation sequencing in veterinary medicine: how can the massive amount of information arising from high-throughput technologies improve diagnosis, control, and management of infectious diseases?

Authors:  Steven Van Borm; Sándor Belák; Graham Freimanis; Alice Fusaro; Fredrik Granberg; Dirk Höper; Donald P King; Isabella Monne; Richard Orton; Toon Rosseel
Journal:  Methods Mol Biol       Date:  2015

4.  The Conqueror Worm: recent advances with cholinergic anthelmintics and techniques excite research for better therapeutic drugs.

Authors:  R J Martin; S Puttachary; S K Buxton; S Verma; A P Robertson
Journal:  J Helminthol       Date:  2014-05-29       Impact factor: 2.170

  4 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.