Whole-genome sequencing of Theileria parva strains provides insight into parasite migration and diversification in the African continent.

Kyoko Hayashida, Takashi Abe, William Weir, Ryo Nakao, Kimihito Ito, Kiichi Kajino, Yutaka Suzuki, Frans Jongejan, Dirk Geysen, Chihiro Sugimoto.

Abstract

The disease caused by the apicomplexan protozoan parasite Theileria parva, known as East Coast fever or Corridor disease, is one of the most serious cattle diseases in Eastern, Central, and Southern Africa. We performed whole-genome sequencing of nine T. parva strains, including one of the vaccine strains (Kiambu 5), field isolates from Zambia, Uganda, Tanzania, or Rwanda, and two buffalo-derived strains. Comparison with the reference Muguga genome sequence revealed 34 814-121 545 single nucleotide polymorphisms (SNPs) that were more abundant in buffalo-derived strains. High-resolution phylogenetic trees were constructed with selected informative SNPs that allowed the investigation of possible complex recombination events among ancestors of the extant strains. We further analysed the dN/dS ratio (non-synonymous substitutions per non-synonymous site divided by synonymous substitutions per synonymous site) for 4011 coding genes to estimate potential selective pressure. Genes under possible positive selection were identified that may, in turn, assist in the identification of immunogenic proteins or vaccine candidates. This study elucidated the phylogeny of T. parva strains based on genome-wide SNPs analysis with prediction of possible past recombination events, providing insight into the migration, diversification, and evolution of this parasite species in the African continent.

Keywords:  SNPs; Theileria parva; dN/dS; genome sequence; recombination

Year:  2013        PMID: 23404454      PMCID: PMC3686427          DOI: 10.1093/dnares/dst003

Source DB:  PubMed          Journal:  DNA Res        ISSN: 1340-2838            Impact factor:   4.458


Introduction

Theileria parva is a tick-borne protozoan parasite belonging to the phylum Apicomplexa. Infection of cattle with T. parva causes a severe disease known as East Coast fever (ECF) or Corridor disease.[1-3] The disease is endemic in East African countries, where it has caused a serious economic problem for the livestock industry. Although the mortality in cattle may reach 100%, especially in exotic breeds, the Cape buffalo (Syncerus caffer) shows no clinical signs and is considered to be the main natural host. Although clinical differences have been documented,[4] ECF and Corridor disease have similar presentations. However, a major epidemiological difference is that, whereas ECF spreads from cattle to cattle, Corridor disease is believed to be transmitted solely from buffalo to cattle. The parasites causing ECF and Corridor disease were designated as T. p. parva and T. p. lawrencei, respectively.[3] Vaccination against ECF is based on an infection and treatment method that involves inoculation of live sporozoite-stage parasites and simultaneous treatment with long-acting tetracycline.[5] The Muguga cocktail, consisting of the three strains Muguga, Serengeti-transformed, and Kiambu 5, is the most widely used vaccine in East Africa. Importantly, there is an extensive debate concerning the risk of vaccination with live non-attenuated sporozoites such as the Muguga cocktail vaccine, as the vaccination may introduce parasites with an exotic genetic background into the local parasite population.[6-9] This was proven to be a real risk when Oura et al.[7] demonstrated the transmission of a vaccine constituent strain to unvaccinated cattle under field conditions in Uganda. In addition, the presence of the vaccine component strain (Muguga or Serengeti-transformed) was confirmed in clinical cases of ECF in the Southern Province of Zambia,[6] following deployment of the Muguga cocktail over a 7-year period, from 1986 to 1992. 
Therefore, two indigenous Zambian strains (Katete and Chitongo) have been used as a vaccine in the Eastern and Southern Provinces of Zambia,[10] although the consequences of this vaccination have not been analysed. Given that Theileria parasites could recombine between divergent strains during the sexual stage in ticks, vaccine-derived ‘exotic’ and ‘local’ strains could exchange genetic information, resulting in parasites with genetic mosaics and diversity. In addition to the problems with the current vaccine, quality control of the cocktail vaccine in terms of the composition of each component is difficult. This may be related to recombination and selection during the maintenance and passage of the stabilates through ticks.[11] Thus, precise and reliable methods for parasite genotyping or phenotyping during vaccine production and its field application are required. Genetic diversity between different T. parva strains has been assessed using various approaches, including polymerase chain reaction (PCR) or PCR-restriction fragment length polymorphism (RFLP) of polymorphic antigen-encoding genes,[6,12] or the indirect immunofluorescence assay (IFA) using monoclonal antibodies against the surface protein, the polymorphic immunodominant molecule (PIM).[13] A panel of micro- and mini-satellite markers has also been developed[14,15] that is widely used in the genetic analysis of field populations[7,8] and has also been used to characterize vaccine stabilates[11] and genetic recombination analysis.[16-18] However, the resolution of genetic differentiation in these studies is limited because of the relatively low marker density. In this study, we carried out the whole-genome sequencing of nine T. parva strains, comprising seven cattle-derived and two buffalo-derived strains, using next-generation sequencing technology. Genome-wide comparison of strains revealed genetic polymorphisms on a fine scale and was used to infer phylogenetic relationships among the parasites. 
The analysis enabled us to determine potential immune selective pressures against parasite genes, which may prove useful in identifying potential antigens. Moreover, the allelic diversity pattern among strains gave us insight into the evolution, diversification, and migration of this parasite in the African continent.

Materials and methods

Parasite strains

In total, nine strains of T. parva, mainly isolated in the 1980s, were used in this study. The place and year of isolation are shown in Table 1. These strains were originally isolated in ticks from infected cows and cultured as schizont-infected bovine lymphocyte cell lines. ChitongoZ2 and KateteB2 have been used as sporozoite stabilate vaccines in the Eastern and Southern Provinces of Zambia.[10] Kiambu 5[19] is one of the Muguga cocktail vaccine components, and KiambuZ464/C12 is a strain that has been cloned out from Kiambu 5 (Kenya, stabilate 68). Zambian strains KateteB2, ChitongoZ2, and MandaliZ22H10 were isolated before the introduction of the Muguga cocktail into Zambia, thus representing ECF epidemiology in Zambia, excluding human-induced genetic contamination. In addition, the analysis included two buffalo-derived isolates, LAWR and Z5E5. Z5E5 is a buffalo-type isolate obtained from a bovine, whereas LAWR is a buffalo-type isolate obtained from a buffalo. KiambuZ464/C12, MandaliZ22H10, and Z5E5 were cloned by limiting dilution. These Theileria-infected cell lines did not undergo extensive passaging (<30 passages) and were stored in liquid nitrogen until use. Cultures were maintained in Roswell Park Memorial Institute (RPMI)-1640 culture medium containing 10 or 20% heat-inactivated fetal bovine serum, 50 µM 2-mercaptoethanol, 50 units/ml penicillin, and 50 mg/ml streptomycin.
Table 1.

T. parva strains sequenced in this study with the summary of Solexa sequence results

| Strain name | Place isolated | Year isolated | Total reads obtained | Reads mapped to reference genome | Mapped reads (%) | Average coverage | Genome covered (%) | SNPs, overall | SNPs, coding | SNPs, non-coding | SNP density per 1 kb, overall | SNP density per 1 kb, coding | SNP density per 1 kb, non-coding |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ChitongoZ2 | Zambia | 1982 | 14 405 285 | 11 225 629 | 77.9 | 49.1 | 97.4 | 46 366 | 31 753 | 14 613 | 5.63 | 5.48 | 5.99 |
| KateteB2 | Zambia | 1989 | 16 558 765 | 4 954 291 | 29.9 | 21.3 | 97.3 | 43 873 | 31 533 | 12 340 | 5.33 | 5.44 | 5.06 |
| Kiambu Z464/C12 | Kenya | 1972 | 15 848 447 | 6 278 932 | 39.6 | 27.4 | 97.2 | 46 435 | 33 021 | 13 414 | 5.64 | 5.70 | 5.50 |
| MandaliZ22H10 | Zambia | 1985 | 16 362 287 | 3 904 897 | 23.9 | 17.1 | 97 | 38 498 | 28 270 | 10 228 | 4.67 | 4.88 | 4.19 |
| Entebbe | Uganda | 1980 | 10 171 312 | 3 547 208 | 34.9 | 15.5 | 95.2 | 34 814 | 27 195 | 7619 | 4.23 | 4.69 | 3.12 |
| Nyakizu | Rwanda | 1979 | 29 366 782 | 5 710 634 | 19.4 | 25 | 97 | 51 790 | 34 700 | 17 090 | 6.29 | 5.99 | 7.01 |
| Katumba | Tanzania | 1981 | 35 406 725 | 4 089 736 | 11.6 | 17.9 | 97.1 | 46 441 | 32 321 | 14 120 | 5.64 | 5.58 | 5.79 |
| Buffalo LAWR | Kenya | 1990 | 17 072 360 | 6 155 888 | 36.1 | 26.9 | 94.7 | 121 545 | 77 472 | 44 073 | 14.76 | 13.37 | 18.07 |
| Buffalo Z5E5 | Zambia | 1982 | 14 821 054 | 5 119 542 | 34.5 | 22.4 | 95.3 | 103 880 | 68 454 | 35 426 | 12.61 | 11.81 | 14.52 |

Parasite purification and genomic DNA preparation

Schizont-enriched material was prepared from the infected lymphocytes by a density-gradient separation method as previously described,[20-22] with some modifications. The cells were treated with 3 µM nocodazole for 18 h, and then harvested cells were lysed for 30–60 min at room temperature with a Gram-negative bacterium, Aeromonas hydrophila (AH-1)-derived haemolysin, in a suspension of HEPES-CaCl2 (10 mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES), 150 mM NaCl, 20 mM KCl, and 1 mM CaCl2, pH 7.4) to obtain a cell concentration of 4 × 10^7 cells/ml (0.5–2 × 10^8 cells in total). Crude AH-1 haemolysin was prepared from bacterial culture supernatant according to a previously described method[23] and was added to the cell suspension at a final concentration of 100 U/ml. Lysis of infected lymphocytes was observed under a microscope. If complete cell lysis was not observed after 15 min, then the incubation period was prolonged until almost 100% of cells were lysed, whereas schizonts remained intact. Because the sensitivity of schizont-infected cells varied significantly between cell lines, the maximum incubation time was 120 min. After lysis, the suspension was washed with HEPES-CaCl2 and re-suspended in 3 ml of HEPES-ethylenediamine tetraacetic acid (EDTA) (10 mM HEPES, 150 mM NaCl, 20 mM KCl, and 5 mM CaCl2, pH 7.4). Four layers of Percoll solution comprising 10, 10, 5, and 5 ml of 65, 40, 30, and 20% Percoll in HEPES-EDTA, respectively, were prepared in an ultracentrifuge tube. The cell lysate was overlaid on top of the Percoll solution and ultracentrifuged at 87 000 g for 30 min at 4°C, using a SW41 rotor (Beckman, USA). The schizont layer that formed at the interface between 40 and 65% Percoll solutions was carefully collected with a Pasteur pipette and then washed in phosphate-buffered saline (PBS) to remove the Percoll. 
A sample of each schizont preparation was stained with Giemsa, and preparations with negligible amounts of contamination with host-cell components were subjected to DNA isolation.

DNA preparation, whole-genome amplification, and Illumina genome analyzer II (GAII) sequencing

Genomic DNA was prepared from the purified schizonts using the NucleoSpin Tissue XS protocol (Macherey-Nagel, Düren, Germany). Whole-genome amplification was performed on 10 ng of the total template DNA using an Illustra GenomiPhi V2 DNA Amplification Kit (GE Healthcare) following the manufacturer's instructions.[24,25] The obtained DNA was purified by ethanol precipitation and subjected to sequence analysis. A 36-nucleotide, single-end sequence run was performed on the Illumina GAII Analyzer following the manufacturer's protocols (Illumina, San Diego, CA, USA). The obtained reads, as listed in Table 1, were mapped on the 8 235 476 bp sequence of the T. parva Muguga strain (AAGK01000001, AAGK01000002, AAGK01000005, AAGK01000006, and AAGK01000004) using the CLC Genomics Workbench (CLC bio, Aarhus, Denmark, Version 4.0.2). The ungapped alignment algorithm was used for all alignments, keeping the default parameters for mismatch and deletion costs (mismatch cost = 2, deletion cost = 1). Files containing these short sequence reads were submitted to the DDBJ Sequence Read Archive (accession number DRA000613).

Single nucleotide polymorphisms (SNPs) analysis

Three sets of single nucleotide polymorphisms (SNPs) were defined (Supplementary Fig. S1). SNPs were identified by comparing each re-sequenced genome with the reference Muguga strain.[26] SNP detection was performed using the SNP detection tool in the CLC Genomics Workbench with the default parameters (window length = 11, maximum number of gaps and mismatches = 2, minimum average quality of surrounding bases = 15, and minimum quality of central base = 20),[27] except for the minimum coverage, which was set at five reads, and the list was manually curated to include only SNPs where all reads within a single sample agreed (SNP dataset I). The extracted SNP data were exported and analysed in Microsoft Office Excel 2010. SNP dataset I was used for creating a SNP density map and for dN/dS analysis. From SNP dataset I, SNPs identified among the eight bovine T. parva strains were extracted. To avoid calling block substitutions as SNPs, SNPs were selected only if they did not lie within 100 bp of another SNP, and this provided SNP dataset II. Allelic data from each strain were extracted, and this information was used for the allelic combination and recombination analysis. SNP dataset III was created using the eight cattle-derived and two buffalo-derived strains, and again SNP positions were required to be at least 100 bp apart. Thus, the high-stringency dataset encompassing all 10 Theileria strains (including the reference strain), SNP dataset III, was used for phylogenetic analysis. Plots of the allele combination pattern for each chromosome were generated using the open-source R software version 2.11.1 (R Development Core Team, 2010; http://www.R-project.org). Genes under selection pressure were estimated by calculating the dN/dS between strains with SNP dataset I by the method of Yang et al.,[28] implemented in the PAML package.[29] Signal sequences for all the annotated genes of the Muguga strain were predicted using SignalP v4.0.[30]
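
The 100 bp spacing rule used for SNP datasets II and III can be sketched as follows. This is an illustration only, not the authors' procedure (they curated the lists in Excel): the function name is hypothetical, positions are assumed to come from a single chromosome, and treating a gap of exactly 100 bp as acceptable is an assumption.

```python
def filter_isolated_snps(positions, min_gap=100):
    """Keep SNP positions with no other SNP closer than min_gap bp.

    positions: SNP coordinates on ONE chromosome (assumed input format).
    A SNP is dropped if either neighbour lies closer than min_gap.
    """
    pos = sorted(set(positions))
    kept = []
    for i, p in enumerate(pos):
        left_ok = i == 0 or p - pos[i - 1] >= min_gap
        right_ok = i == len(pos) - 1 or pos[i + 1] - p >= min_gap
        if left_ok and right_ok:
            kept.append(p)
    return kept


# 100 and 150 are only 50 bp apart, so both are discarded.
print(filter_isolated_snps([100, 150, 400, 520, 1000]))  # [400, 520, 1000]
```

Both members of a close pair are removed, which matches the stated aim of avoiding block substitutions rather than merely thinning the marker set.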

Phylogenetic tree and recombination detection

To identify the relationship between the sequences of the nine strains and the Muguga reference strain, an unrooted neighbour-net tree[31] was constructed based on the concatenated SNP dataset III using SplitsTree version 4.11.3.[32] The Recombination Detection Program version 3.44 (RDP3) was used to detect possible recombination regions.[33] This software incorporates several recombination prediction methods. As the reliability of each method has not been fully evaluated, it is anticipated that some of the predicted recombination events may be artifactual. We manually curated the results, giving priority to Geneconv[34] and maximum Chi-square,[35] as the accuracy of these tests is relatively well defined.[36] Predicted recombination events were considered valid if at least one additional program supported the finding, i.e. P ≤ 0.001 for that event from RDP,[37] Boot scanning,[38] the 3 Seq method,[39] or the sister-scanning method.[40] Predictions that did not meet these criteria were removed. For phylogenetic analysis of p150 and p104, we used mapping sequence information for each strain, and unmapped or unreliable regions were filled by manual Sanger sequencing. The sequences obtained in this study were submitted to GenBank under accession no. AB739676–AB739693.
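
One possible reading of the curation rule above is sketched below. It is not RDP3 code, and the method labels, function name, and input layout are all hypothetical. It assumes an event must be detected by Geneconv or maximum Chi-square and also reach P ≤ 0.001 in at least one of the four other methods; the source does not state a P-value cut-off for the two priority methods, so none is applied here.

```python
PRIMARY = {"GENECONV", "MaxChi"}                  # priority methods (labels assumed)
SECONDARY = {"RDP", "BootScan", "3Seq", "SiScan"}  # supporting methods (labels assumed)


def is_supported(primary_detected, secondary_pvalues, cutoff=0.001):
    """Return True if a predicted recombination event passes the curation rule.

    primary_detected: set of priority methods that detected the event.
    secondary_pvalues: dict mapping supporting method -> P-value for the event.
    """
    has_primary = bool(PRIMARY & set(primary_detected))
    has_secondary = any(
        p <= cutoff for method, p in secondary_pvalues.items() if method in SECONDARY
    )
    return has_primary and has_secondary


print(is_supported({"GENECONV"}, {"RDP": 0.0005}))  # True
print(is_supported({"GENECONV"}, {"RDP": 0.01}))    # False
```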

Results

Genome sequencing of nine T. parva strains using Illumina technology

Single runs of Illumina produced over 10 million reads for each sample, and this provided coverage of 94.7–97.5% for genomes of individual strains against the 8.3 Mb reference Muguga genome, with an average coverage between ×17 and ×49 (Table 1). Depending on the purity of the preparations, 11.6–77.9% of the total reads for any one strain were successfully mapped, whereas unmapped reads were considered to be derived from host genomic DNA. In general, all four chromosomes of each stock were evenly covered, except in ChitongoZ2 (Fig. 1). As the concentration of DNA extracted from purified schizonts was lowest in the ChitongoZ2 strain, we suspect that the whole-genome amplification procedure for this strain caused biased amplification, resulting in an uneven distribution of the coverage; however, this did not affect SNP detection.
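
As a rough cross-check of the mapping statistics in Table 1, the mapped-read percentage and average coverage can be approximated from the read counts, the 36-nucleotide read length, and the 8 235 476 bp reference length given in the Methods. The formula (mapped reads × read length / reference length) is a standard approximation and is assumed here; the source does not state how the CLC software derived the reported values. Function names are illustrative.

```python
READ_LENGTH = 36            # nt, single-end run (from Methods)
GENOME_LENGTH = 8_235_476   # bp, Muguga reference (from Methods)


def mapped_percent(total_reads, mapped_reads):
    """Percentage of reads that mapped to the reference."""
    return 100.0 * mapped_reads / total_reads


def average_coverage(mapped_reads, read_length=READ_LENGTH,
                     genome_length=GENOME_LENGTH):
    """Approximate mean depth: total mapped bases divided by reference length."""
    return mapped_reads * read_length / genome_length


# ChitongoZ2 row of Table 1
print(round(mapped_percent(14_405_285, 11_225_629), 1))  # 77.9
print(round(average_coverage(11_225_629), 1))            # 49.1
```

Under this approximation the ChitongoZ2 values agree with the table, which suggests the reported coverage is close to a simple per-base average.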
Figure 1.

SNP distribution across the Theileria genome. SNPs in individual strains were detected after mapping to the reference genome of the Muguga strain. The entire datasets of 34 814–121 545 SNPs (SNP dataset I) were plotted as SNP densities (per 10 kb interval) along chromosomes 1–4. The x-axis shows the chromosomal position, and the left y-axis shows the number of SNPs (black bars) per 10 kb interval. Average short read coverage is also shown on the right y-axis (above line). Arrowheads indicate the possible location of the centromere.
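
The per-10 kb SNP density plotted in Fig. 1 amounts to counting SNPs in fixed windows along each chromosome. The sketch below shows that binning step only. It assumes 1-based positions on a single chromosome, and the function name is hypothetical; windows without SNPs are simply absent from the result.

```python
from collections import Counter


def snp_density(positions, window=10_000):
    """Count SNPs per fixed-size window.

    positions: 1-based SNP coordinates on one chromosome (assumed).
    Returns {window_start (0-based offset): SNP count}, sorted by start.
    """
    counts = Counter((p - 1) // window * window for p in positions)
    return dict(sorted(counts.items()))


print(snp_density([1, 10_000, 10_001, 25_000]))
# {0: 2, 10000: 1, 20000: 1}
```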

SNPs detection

Stringent conditions for SNP detection were used, i.e. more than five high-quality reads covering the SNP and 100% concordance at that position. If calling of multiple allele variants was allowed, 5216 loci had complex SNPs in at least one strain (0.0633% of the reference Muguga genome). As the genome of Theileria at the schizont stage is haploid, only a single allele is expected at each locus, and complex SNPs are unexpected if the sample contains a clonal population. The appearance of these multi-allelic SNPs could represent base-calling or mapping errors (due to repetitive sequences or paralogous genes). Because we cannot exclude the other possibilities, namely that these SNPs arose during in vitro passage after cloning by limiting dilution or that minor populations from the original host material persisted in the analysed samples, such questionable SNPs were excluded from further analysis. Although it is likely that some genuine SNPs were overlooked, a high-stringency SNP calling protocol was used to avoid false SNP calls. The number of SNPs identified in bovine-derived strains when compared with the Muguga strain ranged from 34 814 in the Entebbe strain to 51 790 in the Nyakizu strain. Additionally, 121 545 and 103 880 SNPs were identified in buffalo-derived LAWR and Z5E5 strains, respectively (Table 1). The densities of the SNPs tended to be higher in chromosomes 1 and 3 than in chromosomes 2 and 4 in most of the strains (Fig. 2). Out of a total of 533 642 SNPs identified in 9 strains (Table 1), 364 719 were present in coding regions (cSNP) and 168 923 were present in non-coding regions (ncSNP), although the SNP densities (calculated per 1 kb) of cSNPs and ncSNPs were similar (Table 1). 
The numbers of SNPs ranged from 34 814 (Entebbe) to 121 545 for the buffalo-derived LAWR strain, and more than twice as many SNPs were identified in the two buffalo-derived strains as in the cattle-derived strains (Table 1), suggesting a degree of genetic differentiation between these types of Theileria. As shown in Fig. 1, a clustered distribution of SNPs was observed (black bars in each panel). The uneven distribution of SNPs did not correlate with the distribution of sequence coverage (line); thus, low SNP calling efficiency in particular regions can be excluded as an explanation. In addition, lower SNP densities were observed within defined regions on chromosomes 1, 3, and 4, which was most evident in buffalo-derived Theileria strains (Fig. 1, arrowhead). These regions correspond to the putative centromeres with an extremely AT-rich composition.
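
The concordance rule described above (enough reads, all agreeing on one non-reference base) can be illustrated with a minimal sketch. It is not the CLC Genomics Workbench algorithm and ignores the base-quality thresholds listed in the Methods. The function name is hypothetical, and the minimum depth of five reads follows the Methods wording.

```python
def call_snp(ref_base, read_bases, min_coverage=5):
    """Return the alternate base if the site passes the strict rule, else None.

    ref_base: reference (Muguga) base at the site.
    read_bases: bases observed in reads covering the site (string or list).
    A call requires >= min_coverage reads that ALL show the same
    non-reference base; mixed sites are rejected as questionable.
    """
    if len(read_bases) < min_coverage:
        return None
    alleles = set(read_bases)
    if len(alleles) != 1:
        return None  # more than one allele in a haploid sample
    allele = alleles.pop()
    return allele if allele != ref_base else None


print(call_snp("A", "GGGGG"))  # G
print(call_snp("A", "GGGGA"))  # None (reads disagree)
print(call_snp("A", "GGGG"))   # None (coverage too low)
```

Rejecting every mixed site trades sensitivity for specificity, in line with the authors' remark that some genuine SNPs may be overlooked.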
Figure 2.

SNP density in each chromosome (SNP dataset I). Average SNP densities per 1 kb interval were calculated for each chromosome in nine T. parva strains with reference to the Muguga genome strain. In the published full genome sequence of T. parva, there is a large gap in the assembly of chromosome 3, due to the repetitive Tpr locus. The large contig AAGK01000005 and smaller contig AAGK01000006 are shown as Chr3_530 and Chr3_531, respectively.

dN/dS analysis

The ratio of the number of non-synonymous substitutions per non-synonymous site (dN) to synonymous substitutions per synonymous site (dS) both at the inter- and intra-species level has been used to estimate the potential selective pressure acting on the genes.[41] A dN/dS ratio lower than one suggests negative or purifying selection, whereas a ratio higher than one suggests positive selection or diversification. Estimation of dN/dS ratios can potentially identify genes encoding immunogenic proteins and, thus, putative vaccine candidates.[42] Therefore, we calculated dN/dS ratios for individual genes using SNP dataset I for seven bovine Theileria strains with the yn00 program of the PAML package.[29] Overall, the dN/dS ratios calculated between cattle T. parva strains were average values of 0.0894–0.0993 when pair-wise comparisons were performed against the Muguga strain, with similar values to those observed in the comparison between T. parva versus Theileria annulata (average dN/dS = 0.097).[43] Among a total of 4011 genes annotated on the Muguga genome, 263 genes showed elevated levels of dN/dS values (average + 3SD) in at least 1 strain (Supplementary Table S1). We further narrowed the list down to 71 genes by selecting only those genes that have a signal sequence for targeting to the endoplasmic reticulum. Those selected genes may be potential targets of the host's immune system. The final list of these possible antigenic, and therefore vaccine target, genes is shown in Table 2, and the orthologous groups were also assigned according to our previous study.[44] Most of the other genes listed here are currently annotated as hypothetical proteins without any predicted functional domain. However, some of them are known to be recognized by host humoral immunity. For example, p32 (TP01_1056)[45] and 23 kDa piroplasm surface protein (TP02_0551)[46] are erythrocytic piroplasm stage antigens, and strong antibody response in infected cattle has been reported.
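
The gene selection described above (dN/dS above average + 3 SD in at least one strain, then restriction to genes with a predicted signal sequence) can be sketched as below. This is an illustration under assumptions, not the authors' workflow: the per-strain dN/dS values would come from yn00 and the signal predictions from SignalP, the input layout and function names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def elevated_genes(per_strain, n_sd=3.0):
    """Genes whose dN/dS exceeds mean + n_sd * SD in at least one strain.

    per_strain: {strain: {gene_id: dN/dS versus the reference}} (assumed layout).
    """
    flagged = set()
    for dnds in per_strain.values():
        values = list(dnds.values())
        cutoff = mean(values) + n_sd * stdev(values)  # sample SD (assumption)
        flagged |= {gene for gene, v in dnds.items() if v > cutoff}
    return flagged


def candidate_antigens(per_strain, signal_genes, n_sd=3.0):
    """Keep only flagged genes that also carry a predicted signal sequence."""
    return sorted(elevated_genes(per_strain, n_sd) & set(signal_genes))


# Toy data: 19 genes at 0.1 and one outlier at 2.0 (values invented for illustration)
toy = {f"g{i}": 0.1 for i in range(19)}
toy["gX"] = 2.0
print(candidate_antigens({"strainA": toy}, signal_genes={"gX", "g0"}))  # ['gX']
```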
Table 2.

List of genes with high dN/dS ratios and a secretion signal peptide. 71 genes were listed from 263 genes (higher dN/dS ratios), by selecting secretion signal peptide-predicted genes.

| Gene ID | Description | Ortholog group | Signal | Gene ID | Description | Ortholog group | Signal |
|---|---|---|---|---|---|---|---|
| TP01_0144 | Hypothetical protein | PiroF0002444 | Y | TP03_0003 | Hypothetical (SVSP) | PiroF0100037 | Y |
| TP01_0178 | Hypothetical protein | PiroF0002919 | Y | TP03_0039 | Hypothetical protein | Not assigned | Y |
| TP01_0180 | 40S ribosomal protein S11 | PiroF0000589 | Y | TP03_0040 | Hypothetical protein | PiroF0003613 | Y |
| TP01_0291 | Hypothetical protein | PiroF0002390 | Y | TP03_0123 | Hypothetical protein | PiroF0002851 | Y |
| TP01_0367 | Hypothetical protein | PiroF0000012 | Y | TP03_0217 | Hypothetical protein | PiroF0000012 | Y |
| TP01_0378 | Hypothetical protein | PiroF0003402 | Y | TP03_0297 | Hypothetical (FAINT superfamily) | PiroF0100056 | Y |
| TP01_0380 | Hypothetical protein | PiroF0003404 | Y | TP03_0298 | Hypothetical (FAINT superfamily) | PiroF0000056 | Y |
| TP01_0610 | Hypothetical (Tash family) | PiroF0100038 | Y | TP03_0319 | Hypothetical protein | PiroF0000012 | Y |
| TP01_0619 | Hypothetical (Tash family) | PiroF0100038 | Y | TP03_0368 | Hypothetical (FAINT superfamily) | PiroF0100056 | Y |
| TP01_0621 | Hypothetical (Tash family) | PiroF0100038 | Y | TP03_0405 | Hypothetical protein | PiroF0002425 | Y |
| TP01_0914 | Hypothetical protein | PiroF0002316 | Y | TP03_0498 | Hypothetical (SVSP) | PiroF0100037 | Y |
| TP01_0955 | Hypothetical protein | PiroF0003569 | Y | TP03_0520 | Hypothetical protein | PiroF0000012 | Y |
| TP01_0987 | Hypothetical protein | PiroF0002967 | Y | TP03_0530 | Hypothetical protein | | Y |
| TP01_1011 | Hypothetical protein | PiroF0100045 | Y | TP03_0664 | Hypothetical protein | PiroF0000012 | Y |
| TP01_1044 | Hypothetical protein | Not assigned | Y | TP03_0780 | Hypothetical protein | PiroF0002660 | Y |
| TP01_1056 | 32 kDa surface antigen | PiroF0002963 | Y | TP03_0810 | Hypothetical protein | PiroF0002675 | Y |
| TP01_1109 | Hypothetical protein | PiroF0000207 | Y | TP03_0886 | Hypothetical (SVSP) | PiroF0100037 | Y |
| TP01_1227 | Hypothetical (SVSP) | PiroF0100037 | Y | TP03_0893 | Hypothetical (SVSP) | PiroF0100037 | Y |
| TP02_0004 | Hypothetical (SVSP) | PiroF0100037 | Y | TP04_0009 | Hypothetical (SVSP) | PiroF0100037 | Y |
| TP02_0006 | Hypothetical (SVSP) | PiroF0100037 | Y | TP04_0012 | Hypothetical (FAINT superfamily) | PiroF0100056 | Y |
| TP02_0010 | Hypothetical (SVSP) | PiroF0100037 | Y | TP04_0013 | Hypothetical (SVSP) | PiroF0100037 | Y |
| TP02_0018 | Hypothetical protein | PiroF0100055 | Y | TP04_0096 | Hypothetical (FAINT superfamily) | PiroF0100056 | Y |
| TP02_0239 | Hypothetical protein | PiroF0002609 | Y | TP04_0097 | Hypothetical (FAINT superfamily) | PiroF0100056 | Y |
| TP02_0327 | Hypothetical protein | PiroF0000012 | Y | TP04_0101 | Hypothetical (FAINT superfamily) | PiroF0100056 | Y |
| TP02_0331 | Ubiquitin-activating enzyme, putative | PiroF0002575 | Y | TP04_0104 | Hypothetical (FAINT superfamily) | PiroF0100056 | Y |
| TP02_0551 | 23 kDa piroplasm surface protein | PiroF0003021 | Y | TP04_0110 | Hypothetical protein | PiroF0001224 | Y |
| TP02_0575 | Hypothetical protein | PiroF0003017 | Y | TP04_0116 | Hypothetical protein | PiroF0003546 | Y |
| TP02_0819 | Hypothetical (FAINT superfamily) | PiroF0100056 | Y | TP04_0150 | Hypothetical (SVSP) | PiroF0000037 | Y |
| TP02_0856 | Hypothetical (FAINT superfamily) | PiroF0100056 | Y | TP04_0328 | Hypothetical protein | PiroF0002219 | Y |
| TP02_0875 | Hypothetical protein | PiroF0002985 | Y | TP04_0411 | Hypothetical protein | PiroF0003185 | Y |
| TP02_0952 | Hypothetical protein | PiroF0003456 | Y | TP04_0437 | 104 kDa antigen | PiroF0003088 | Y |
| TP02_0954 | Hypothetical (SVSP) | PiroF0100037 | Y | TP04_0558 | Hypothetical protein | PiroF0001517 | Y |
| TP02_0956 | Hypothetical (SVSP) | PiroF0100037 | Y | TP04_0919 | Hypothetical (SVSP) | PiroF0100037 | Y |
| TP03_0001 | Hypothetical (SVSP) | PiroF0100037 | Y | TP04_0920 | Hypothetical (SVSP) | PiroF0100037 | Y |
| TP03_0002 | Hypothetical protein (SVSP) | PiroF0100037 | Y | TP04_0921 | Hypothetical protein (FAINT superfamily) | PiroF0000056 | Y |

Phylogenetic relationship among 10 T. parva strains and evidence of recombination

The allele frequency or combination of the bovine Theileria strain alleles collected in SNP dataset II was determined. By scoring biallelic positions only, 127 allelic combinations were identified among 8 bovine Theileria. Each of the 15 901 SNPs was assigned 1 of the 127 combinations. When the rank order of these combinations was calculated, the allele pattern unique to the Muguga strain came first, followed by Nyakizu-, KiambuZ464/C12-, and Katumba-unique allele combinations (Fig. 3). Because Muguga strains were used as the reference sequence, ranking ‘Muguga strain-unique allele pattern’ as the first event seems reasonable, as it incorporates a minor allele that is present in the Muguga strain. The distribution of frequencies among the 127 events was uneven because 54% of all SNPs were assigned to these top 10 allelic combinations. When the list was extended to cover the top 20 or 25 combinations, this ratio increased to 73 and 80%, respectively, indicating that most of the SNPs alleles were represented by a limited number of combinations. The distribution of these different SNPs patterns is represented on a schematic diagram of the chromosomes, and different combination events are colour coded (Fig. 3). As shown in Fig. 3, allelic combinations among the strains are distributed throughout every chromosome. A major observation was that SNPs with particular allelic combinations tend to cluster into defined loci, giving rise to a rough, large-scale mosaic pattern of allelic combinations. If the evolution of these strains had taken place completely independently, i.e. without interaction between strains, this clustering of allelic combinations would not be expected.
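
The figure of 127 allelic combinations follows from simple counting: with 8 strains and two alleles per site there are 2^8 = 256 assignments, 2 of which are monomorphic, and each remaining pattern is counted twice because the allele labels are interchangeable, giving (256 − 2) / 2 = 127. The sketch below encodes each biallelic SNP as such a pattern relative to the first strain and ranks the patterns by frequency. Function names and the input layout are hypothetical, and the toy data are invented for illustration.

```python
from collections import Counter

N_STRAINS = 8
N_PATTERNS = (2 ** N_STRAINS - 2) // 2  # 127 possible biallelic patterns


def canonical_pattern(alleles):
    """Encode alleles (fixed strain order, reference strain first) as 0/1.

    0 = same base as the first strain, 1 = the other base.
    """
    first = alleles[0]
    return tuple(0 if a == first else 1 for a in alleles)


def rank_patterns(snps):
    """Count allelic combinations over biallelic SNPs, most frequent first."""
    counts = Counter()
    for alleles in snps:
        if len(set(alleles)) == 2:  # score biallelic positions only
            counts[canonical_pattern(alleles)] += 1
    return counts.most_common()


print(N_PATTERNS)  # 127
toy = [("A", "A", "G"), ("C", "C", "T"), ("A", "G", "A"), ("A", "A", "A"), ("A", "C", "G")]
print(rank_patterns(toy))  # [((0, 0, 1), 2), ((0, 1, 0), 1)]
```

In this encoding, a pattern in which only the first strain differs from all others corresponds to the Muguga-unique combination that ranked first in the text.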
Figure 3.

Mosaic pattern of SNPs in T. parva strains. The frequency of each of the 127 possible allelic combinations for the 8 cattle-derived T. parva strains was calculated using the SNP dataset II. The 10 top-ranking combinations were plotted onto schematic chromosomes in the assigned colours. Each line within a chromosome represents a single SNP marker position.

The relationships among the 10 T. parva strains were analysed by creating a phylogenetic tree (Fig. 4). The allelic combinations are well correlated with the phylogenetic relationship among these strains, and the top 10 allelic combination events represented major nodes in the tree. Neighbour-net is a phylogenetic network construction method that combines aspects of neighbour joining and SplitsTree. In this neighbour-net analysis, the appearance of reticulated branches indicates recombination events. Considered together with the mosaic allelic combination patterns described above (Fig. 3), we speculate that recombination events are responsible for the interrelationships between strains. To test this hypothesis, we carried out further recombination event estimations with the RDP programs. The concatenated SNP dataset II was subjected to six recombination detection tests, namely Geneconv, maximum Chi-square, RDP, Boot scanning, 3 Seq, and sister-scanning methods. This resulted in a minimum of 133 recombination events being predicted, as shown in Supplementary Fig. S2. A snapshot of the alignment of a concatenated version of the SNP dataset II is also shown in Supplementary Fig. S3. An RDP analysis was also carried out using the SNP dataset III, but no significant evidence for recombination was detected between cattle- and buffalo-derived strains (Z5E5 and LAWR, data not shown).
Figure 4.

Neighbour-net network analysis of 10 T. parva strains. Neighbour-net network analysis was performed with the concatenated SNP allele sequence data from SNP dataset III. Bootstrap values are based on 100 replicates and were near 100 at most of the nodes.

As polymorphic antigens such as p104 or p150 have been used for the genotyping of T. parva,[6] we compared the results of genotyping based on p104 or p150 with those obtained by SNP analysis. As shown in Supplementary Fig. S4, the tree shapes were not congruent. The most likely explanation for this inconsistency is that recombination events between the ancestral strains involved these loci, as is evident in Supplementary Fig. S2: the RDP3 program predicts recombination events within both loci. At the p104 locus, KateteB2 and Katumba are predicted to be recombinants derived from an unknown parent or the Entebbe strain. The same holds at the p150 locus for Muguga and KiambuZ464/C12, with Nyakizu as the possible donor, as marked in Supplementary Fig. S2.

Discussion

Comparison of whole-genome sequencing data from several Theileria strains, using short-read sequencing and mapping to the reference genome sequence, revealed genome-wide nucleotide polymorphisms in this species. SNP density plots show a clustered distribution of SNPs across the genome and identify SNP-poor and SNP-rich regions. Such clustering of SNPs has also been reported in mammalian genomes, although the forces responsible (e.g. mutation hot spots, recombination, or balancing selection) remain poorly understood.[47,48] For apicomplexan parasites, reports of genome-wide SNP analysis are limited, but a similar SNP distribution pattern was observed in Plasmodium,[49] suggesting that the same underlying mechanisms drive SNP clustering in parasite and mammalian genomes. Our SNP analysis clarified the phylogenetic relationships among 10 Theileria strains on a genome-wide scale. When these Theileria strains were further analysed using neighbour-net analysis, clusters were formed in accordance with both host species and geographical origin. For example, three Zambian strains (ChitongoZ2, MandaliZ22H10, and KateteB2) were clustered together in the same node, implying that they are closely related genotypes, but distant from strains isolated in Eastern Africa. In addition, there is a clear demarcation between the bovine- and buffalo-derived strains (Z5E5 and LAWR). The genetic difference between Z5E5 and LAWR was also confirmed, as high numbers of SNPs were not shared between them, as shown in Supplementary Fig. S5. However, reticulated patterns between strains belonging to different clusters are evident, as shown in Fig. 4, which suggests genetic recombination between ancestors of the strains that are currently geographically separated. The evidence for recombination among the analysed strains was further supported by the presence of a mosaic pattern of allelic combinations, together with the statistical analysis of recombination. 
This result is intuitive given that the parasite has an obligate sexual cycle and that analysis of field populations suggests that recombination in the tick vector is commonplace.[14] There are two ways in which a tick may become infected with parasites of different genotypes: feeding on a single bovine host infected with a genotypically mixed parasite population, or successive feeding on different animals infected with different parasite genotypes, which is possible for two- or three-host tick species. However, the latter is less likely, because synchronization of the sexual stages (micro- and macrogametes) of two parasites is difficult if they are picked up by ticks at different feeding times. We hypothesize that genetic recombination occurred in the ancestral bovine Theileria populations in the distant past and that the parasites then evolved independently after geographical isolation.

The origin of T. p. parva in cattle is unknown, but it is considered likely to have originated in the African buffalo.[50] Indeed, T. parva populations in buffalo are considered to be more diverse than those in cattle[6,13,51] and cause severe disease in cattle. Historically, domestic cattle are believed to have been introduced to the African continent thousands of years ago, possibly into Sub-Saharan Africa from the Middle East.[52-54] After the introduction of cattle, a subset of the buffalo Theileria population may have been transmitted to cattle (at that stage, it would be called T. p. lawrencei, as it could not infect other cattle), adapted, and co-evolved within cattle, resulting in the emergence of T. p. parva, which can spread within cattle.

It should be emphasized that the phylogenetic tree obtained from two polymorphic antigens (p104 and p150) showed a different topology from that based on genome-wide SNPs.
Thus, phylogenetic relationships inferred from a limited number of loci must be interpreted carefully in the case of pathogens that acquire genetic diversity by recombination rather than by accumulation of mutations. This is because each locus can become chimaeric through crossing over between genotypes that have different evolutionary histories. Therefore, a number of independent loci should be included to estimate the true relationship between isolates, as in multi-locus sequence typing, but genome-wide SNP analysis is the ultimate solution in this context.

The two buffalo-derived strains (Z5E5 and LAWR) were genetically distant from the cattle-derived Theileria strains and from each other, as expected from earlier studies.[6,13,51] It has been proposed that genetic exchange between buffalo-derived and cattle-derived Theileria still occurs through sexual recombination, based on evidence that T. p. lawrencei and T. p. parva showed a mosaic sequence pattern in the ITS region.[55] However, in our recombination analysis using the RDP program, no recombination events were detected between bovine and buffalo Theileria strains. It might be hypothesized that cattle-infecting strains originated from a subset of the buffalo-infecting T. parva population that has been circulating in Africa for a long time and have now evolved a genetic barrier to recombination. Further analysis with a greater number of buffalo-derived samples and denser SNP coverage will be needed to clarify the genetic relationship between buffalo and cattle Theileria more precisely.

Estimation of dN/dS values can potentially identify immunogenic genes under possible selective pressure and, thus, possible vaccine candidate molecules. The list of 71 selected candidate antigens (Table 2) covers most of the known genes for antigenic or host-interacting proteins, which confirms the effectiveness of this genome-wide approach.
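As a rough illustration of how a pairwise dN/dS value can be obtained, the following minimal sketch applies a simplified Nei-Gojobori-style counting method with Jukes-Cantor correction to two aligned, in-frame coding sequences. It is not necessarily the procedure used in this study: the function names (`syn_sites`, `count_diffs`, `dn_ds`) and the toy sequences are hypothetical, the standard genetic code is assumed, and codons containing stops, gaps, or ambiguous bases are simply skipped.

```python
from itertools import permutations, product
from math import log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code in TCAG order (assumed to be appropriate here).
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites (0-3) in one sense codon."""
    s = 0.0
    for i in range(3):
        for b in BASES:
            if b != codon[i]:
                mutant = codon[:i] + b + codon[i + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def count_diffs(c1, c2):
    """Average (synonymous, non-synonymous) changes over shortest paths c1 -> c2."""
    pos = [i for i in range(3) if c1[i] != c2[i]]
    sd = nd = 0.0
    paths = 0
    for order in permutations(pos):
        cur, s, n, valid = c1, 0, 0, True
        for i in order:
            nxt = cur[:i] + c2[i] + cur[i + 1:]
            if CODE[nxt] == "*":  # ignore paths that pass through a stop codon
                valid = False
                break
            if CODE[nxt] == CODE[cur]:
                s += 1
            else:
                n += 1
            cur = nxt
        if valid:
            sd, nd, paths = sd + s, nd + n, paths + 1
    # If every path crosses a stop codon, the codon pair contributes nothing.
    return (sd / paths, nd / paths) if paths else (0.0, 0.0)


def jukes_cantor(p):
    """Correct a proportion of differences for multiple substitutions."""
    if p <= 0:
        return 0.0
    return float("inf") if p >= 0.75 else -0.75 * log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    """Return (dN, dS, dN/dS) for two aligned, in-frame coding sequences."""
    S = Sd = Nd = 0.0
    n_codons = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if CODE.get(c1, "*") == "*" or CODE.get(c2, "*") == "*":
            continue  # skip stops, gaps, and ambiguous bases
        S += (syn_sites(c1) + syn_sites(c2)) / 2
        s, n = count_diffs(c1, c2)
        Sd, Nd, n_codons = Sd + s, Nd + n, n_codons + 1
    N = 3 * n_codons - S
    dN = jukes_cantor(Nd / N) if N else 0.0
    dS = jukes_cantor(Sd / S) if S else 0.0
    ratio = dN / dS if 0 < dS < float("inf") else None  # undefined when dS is 0
    return dN, dS, ratio


# Toy sequences (invented): one non-synonymous and one synonymous change.
dN, dS, omega = dn_ds("ATGAAAGGGGCT", "ATGAATGGAGCT")
print(dN, dS, omega)
```

A ratio above 1 is conventionally read as a sign of possible positive selection, whereas values well below 1 suggest purifying selection. Because the ratio is averaged over the whole gene, selection confined to a short region can be masked.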
p23[46] and p32[45] are surface or secreted antigens recognized by humoral immunity in infected animals. The diversification of these genes may be related to immune evasion by the Theileria parasite. On the other hand, although several genes encoding CTL targets have been identified as being under possible immune pressure,[56] only Tp1 (TP03_0849) showed a higher dN/dS value in this study, whereas the genes for other CTL targets (Tp2-9) showed relatively low dN/dS values (Supplementary Table S2). Relative conservation of the sequences of these CTL antigen genes among different parasite strains has already been reported.[56,57] Considering that the CTL response depends on the host MHC type/TCR repertoire and on the antigenic types of the parasites, the positive selective pressure acting on any particular gene may be too weak to be detected. In addition, CTLs recognize short peptides presented by MHC class I molecules; therefore, immune-based selective pressure is likely to be focused on a limited region of the targeted genes, which dN/dS analysis is not sensitive enough to detect.

The list of 71 selected antigens also contained several genes from three large gene families, namely the Tash gene family (ortholog group number: PiroF0100038), the SVSP gene family (PiroF0100037), and the FAINT superfamily (PiroF0100056, also called the SfiI-subtelomeric fragment related protein family). The Tash gene family has been characterized extensively in T. annulata.[58] Some of the Tash and SVSP gene products have been predicted or demonstrated to be translocated to the host nucleus, and most of the Tash and SVSP genes are expressed predominantly in the schizont stage.[58,59] This implies that the potential selective pressure would not be humoral, although the possibility that these proteins are exposed to the humoral immune response when infected cells are lysed cannot be excluded. A previous comparison between the T. annulata and T. parva genomes also revealed high inter-species dN/dS ratios for the Tash and SVSP families,[60] consistent with our analysis. It was argued that gene expansion and divergence of Tash and SVSP family genes were associated with different functionality in each species.

In conclusion, this study highlighted the phylogenetic relationships of 10 T. parva strains based on full genome sequences, with prediction of possible past recombination events. The high-density SNP map developed in this study is now applicable to genotyping or linkage analysis of the parasite. Practically, SNP-based genotyping can discriminate vaccine and field strains of T. parva. Recent methodological advances in high-throughput technologies for SNP genotyping, such as TaqMan real-time PCR and GoldenGate technologies,[61] will likely facilitate future genotyping studies. Further phylogenetic analysis in combination with phenotypic data will assist in the investigation of the virulence and evolution of bovine theilerias after their diversification from buffalo. Importantly, the putative antigen-encoding genes listed in this study should be further investigated to assess their candidacy as Theileria subunit vaccine components.
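To illustrate how SNP-based genotyping could discriminate a vaccine strain from field strains, the sketch below selects "diagnostic" SNP positions at which one strain carries an allele not seen in any other strain, and then scores an unknown sample against those markers. The strain labels, positions, and alleles are invented toy values, not data from this study, and the helper names (`diagnostic_snps`, `match_fraction`) are hypothetical.

```python
def diagnostic_snps(genotypes, target):
    """Return {position: allele} where `target` differs from every other strain.

    `genotypes` maps strain name -> {(chromosome, position): allele}.
    Only positions called in all other strains are considered.
    """
    others = [calls for name, calls in genotypes.items() if name != target]
    markers = {}
    for pos, allele in genotypes[target].items():
        if all(pos in calls and calls[pos] != allele for calls in others):
            markers[pos] = allele
    return markers


def match_fraction(sample, markers):
    """Fraction of marker positions called in `sample` that carry the marker allele."""
    called = [pos for pos in markers if pos in sample]
    if not called:
        return None  # no informative positions were genotyped
    hits = sum(sample[pos] == markers[pos] for pos in called)
    return hits / len(called)


# Toy genotype calls (invented for illustration only).
genotypes = {
    "vaccine_A": {("chr1", 100): "A", ("chr1", 250): "G", ("chr2", 40): "T"},
    "field_B": {("chr1", 100): "A", ("chr1", 250): "C", ("chr2", 40): "C"},
    "field_C": {("chr1", 100): "G", ("chr1", 250): "C", ("chr2", 40): "C"},
}

markers = diagnostic_snps(genotypes, "vaccine_A")
unknown = {("chr1", 250): "G", ("chr2", 40): "T"}
print(markers)
print(match_fraction(unknown, markers))
```

In practice, a marker panel chosen this way would need to be checked against many more isolates, since an allele that looks strain-specific among a handful of genomes may also occur in unsampled field populations.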

Supplementary data

Supplementary Data are available at www.dnaresearch.oxfordjournals.org.

Funding

This work was supported in part by the Grants-in-Aid for Scientific Research and Asia-Africa S & T Strategic Cooperation Promotion Program by the Special Coordination Funds for Promoting Science & Technology, from the Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT) to C.S. K.H. was supported by the Program of Founding Research Centers for Emerging and Reemerging Infectious Diseases, MEXT.
  59 in total

1.  A modified bootscan algorithm for automated identification of recombinant sequences and recombination breakpoints.

Authors:  D P Martin; D Posada; K A Crandall; C Williamson
Journal:  AIDS Res Hum Retroviruses       Date:  2005-01       Impact factor: 2.205

2.  Population genetic analysis and sub-structuring of Theileria parva in Uganda.

Authors:  C A L Oura; B B Asiimwe; W Weir; G W Lubega; A Tait
Journal:  Mol Biochem Parasitol       Date:  2005-04       Impact factor: 1.759

3.  Application of phylogenetic networks in evolutionary studies.

Authors:  Daniel H Huson; David Bryant
Journal:  Mol Biol Evol       Date:  2005-10-12       Impact factor: 16.240

4.  Genome sequence of Theileria parva, a bovine pathogen that transforms lymphocytes.

Authors:  Malcolm J Gardner; Richard Bishop; Trushar Shah; Etienne P de Villiers; Jane M Carlton; Neil Hall; Qinghu Ren; Ian T Paulsen; Arnab Pain; Matthew Berriman; Robert J M Wilson; Shigeharu Sato; Stuart A Ralph; David J Mann; Zikai Xiong; Shamira J Shallom; Janice Weidman; Lingxia Jiang; Jeffery Lynn; Bruce Weaver; Azadeh Shoaibi; Alexander R Domingo; Delia Wasawo; Jonathan Crabtree; Jennifer R Wortman; Brian Haas; Samuel V Angiuoli; Todd H Creasy; Charles Lu; Bernard Suh; Joana C Silva; Teresa R Utterback; Tamara V Feldblyum; Mihaela Pertea; Jonathan Allen; William C Nierman; Evans L N Taracha; Steven L Salzberg; Owen R White; Henry A Fitzhugh; Subhash Morzaria; J Craig Venter; Claire M Fraser; Vishvanath Nene
Journal:  Science       Date:  2005-07-01       Impact factor: 47.728

5.  Genome of the host-cell transforming parasite Theileria annulata compared with T. parva.

Authors:  Arnab Pain; Hubert Renauld; Matthew Berriman; Lee Murphy; Corin A Yeats; William Weir; Arnaud Kerhornou; Martin Aslett; Richard Bishop; Christiane Bouchier; Madeleine Cochet; Richard M R Coulson; Ann Cronin; Etienne P de Villiers; Audrey Fraser; Nigel Fosker; Malcolm Gardner; Arlette Goble; Sam Griffiths-Jones; David E Harris; Frank Katzer; Natasha Larke; Angela Lord; Pascal Maser; Sue McKellar; Paul Mooney; Fraser Morton; Vishvanath Nene; Susan O'Neil; Claire Price; Michael A Quail; Ester Rabbinowitsch; Neil D Rawlings; Simon Rutter; David Saunders; Kathy Seeger; Trushar Shah; Robert Squares; Steven Squares; Adrian Tivey; Alan R Walker; John Woodward; Dirk A E Dobbelaere; Gordon Langsley; Marie-Adele Rajandream; Declan McKeever; Brian Shiels; Andrew Tait; Bart Barrell; Neil Hall
Journal:  Science       Date:  2005-07-01       Impact factor: 47.728

6.  Why do human diversity levels vary at a megabase scale?

Authors:  Ines Hellmann; Kay Prüfer; Hongkai Ji; Michael C Zody; Svante Pääbo; Susan E Ptak
Journal:  Genome Res       Date:  2005-09       Impact factor: 9.043

7.  Molecular cloning and characterisation of 23-kDa piroplasm surface proteins of Theileria sergenti and Theileria buffeli.

Authors:  Y Sako; M Asada; S Kubota; C Sugimoto; M Onuma
Journal:  Int J Parasitol       Date:  1999-04       Impact factor: 3.981

8.  Extensive genotypic diversity in a recombining population of the apicomplexan parasite Theileria parva.

Authors:  Frank Katzer; Daniel Ngugi; Chris Oura; Richard P Bishop; Evans L N Taracha; Alan R Walker; Declan J McKeever
Journal:  Infect Immun       Date:  2006-10       Impact factor: 3.441

9.  Evidence for localisation of a Theileria parasite AT hook DNA-binding protein to the nucleus of immortalised bovine host cells.

Authors:  D G Swan; K Phillips; A Tait; B R Shiels
Journal:  Mol Biochem Parasitol       Date:  1999-06-25       Impact factor: 1.759

10.  Comparative genome analysis of three eukaryotic parasites with differing abilities to transform leukocytes reveals key mediators of Theileria-induced leukocyte transformation.

Authors:  Kyoko Hayashida; Yuichiro Hara; Takashi Abe; Chisato Yamasaki; Atsushi Toyoda; Takehide Kosuge; Yutaka Suzuki; Yoshiharu Sato; Shuichi Kawashima; Toshiaki Katayama; Hiroyuki Wakaguri; Noboru Inoue; Keiichi Homma; Masahito Tada-Umezaki; Yukio Yagi; Yasuyuki Fujii; Takuya Habara; Minoru Kanehisa; Hidemi Watanabe; Kimihito Ito; Takashi Gojobori; Hideaki Sugawara; Tadashi Imanishi; William Weir; Malcolm Gardner; Arnab Pain; Brian Shiels; Masahira Hattori; Vishvanath Nene; Chihiro Sugimoto
Journal:  MBio       Date:  2012-09-04       Impact factor: 7.867

