Kyoko Hayashida, Takashi Abe, William Weir, Ryo Nakao, Kimihito Ito, Kiichi Kajino, Yutaka Suzuki, Frans Jongejan, Dirk Geysen, Chihiro Sugimoto.
Abstract
The disease caused by the apicomplexan protozoan parasite Theileria parva, known as East Coast fever or Corridor disease, is one of the most serious cattle diseases in Eastern, Central, and Southern Africa. We performed whole-genome sequencing of nine T. parva strains, including one of the vaccine strains (Kiambu 5), field isolates from Zambia, Uganda, Tanzania, and Rwanda, and two buffalo-derived strains. Comparison with the reference Muguga genome sequence revealed 34 814–121 545 single nucleotide polymorphisms (SNPs) per strain, with the highest numbers in the buffalo-derived strains. High-resolution phylogenetic trees were constructed from selected informative SNPs, allowing investigation of possible complex recombination events among the ancestors of the extant strains. We further analysed the dN/dS ratio (non-synonymous substitutions per non-synonymous site divided by synonymous substitutions per synonymous site) for 4011 coding genes to estimate potential selective pressure. Genes under possible positive selection were identified; these may, in turn, assist in the identification of immunogenic proteins or vaccine candidates. This study elucidated the phylogeny of T. parva strains based on genome-wide SNP analysis, with prediction of possible past recombination events, providing insight into the migration, diversification, and evolution of this parasite species on the African continent.
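The dN/dS ratio defined in the abstract can be illustrated with a minimal sketch. It assumes the numbers of non-synonymous and synonymous sites and differences for a gene have already been counted (for example by Nei–Gojobori-style codon counting, not shown), and it optionally applies a Jukes–Cantor correction for multiple hits. The function names `dn_ds` and `jukes_cantor` are hypothetical, and the record does not state which software or correction the authors used.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differing sites p for multiple hits (Jukes-Cantor)."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: Jukes-Cantor correction is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float,
          correct: bool = True) -> float:
    """Return dN/dS for one gene from pre-computed site and difference counts.

    pN = non-synonymous differences per non-synonymous site
    pS = synonymous differences per synonymous site
    With correct=True, both proportions are Jukes-Cantor corrected.
    """
    p_n = nonsyn_diffs / nonsyn_sites
    p_s = syn_diffs / syn_sites
    d_n = jukes_cantor(p_n) if correct else p_n
    d_s = jukes_cantor(p_s) if correct else p_s
    if d_s == 0.0:
        # No synonymous change: the ratio is undefined (nan) or unbounded (inf).
        return math.nan if d_n == 0.0 else math.inf
    return d_n / d_s


# Hypothetical gene: 10 non-synonymous differences over 300 non-synonymous
# sites, 5 synonymous differences over 100 synonymous sites.
print(round(dn_ds(10, 300, 5, 100, correct=False), 3))  # 0.667
print(round(dn_ds(10, 300, 5, 100), 3))                 # 0.659
```

By convention, a ratio above 1 is read as a sign of positive selection and a ratio below 1 as purifying selection; the correction lowers the ratio slightly here because the larger synonymous proportion is inflated more.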
Keywords: SNPs; Theileria parva; dN/dS; genome sequence; recombination
Year: 2013 PMID: 23404454 PMCID: PMC3686427 DOI: 10.1093/dnares/dst003
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
T. parva strains sequenced in this study, with a summary of the Solexa sequencing results
| Strain name | Place isolated | Year isolated | Total reads obtained | Reads mapped to reference genome | Mapped reads (%) | Average coverage (fold) | Genome covered (%) | SNP number, overall | SNP number, coding | SNP number, non-coding | SNP density per kb, overall | SNP density per kb, coding | SNP density per kb, non-coding |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ChitongoZ2 | Zambia | 1982 | 14 405 285 | 11 225 629 | 77.9 | 49.1 | 97.4 | 46 366 | 31 753 | 14 613 | 5.63 | 5.48 | 5.99 |
| KateteB2 | Zambia | 1989 | 16 558 765 | 4 954 291 | 29.9 | 21.3 | 97.3 | 43 873 | 31 533 | 12 340 | 5.33 | 5.44 | 5.06 |
| Kiambu Z464/C12 | Kenya | 1972 | 15 848 447 | 6 278 932 | 39.6 | 27.4 | 97.2 | 46 435 | 33 021 | 13 414 | 5.64 | 5.70 | 5.50 |
| MandaliZ22H10 | Zambia | 1985 | 16 362 287 | 3 904 897 | 23.9 | 17.1 | 97 | 38 498 | 28 270 | 10 228 | 4.67 | 4.88 | 4.19 |
| Entebbe | Uganda | 1980 | 10 171 312 | 3 547 208 | 34.9 | 15.5 | 95.2 | 34 814 | 27 195 | 7619 | 4.23 | 4.69 | 3.12 |
| Nyakizu | Rwanda | 1979 | 29 366 782 | 5 710 634 | 19.4 | 25 | 97 | 51 790 | 34 700 | 17 090 | 6.29 | 5.99 | 7.01 |
| Katumba | Tanzania | 1981 | 35 406 725 | 4 089 736 | 11.6 | 17.9 | 97.1 | 46 441 | 32 321 | 14 120 | 5.64 | 5.58 | 5.79 |
| Buffalo LAWR | Kenya | 1990 | 17 072 360 | 6 155 888 | 36.1 | 26.9 | 94.7 | 121 545 | 77 472 | 44 073 | 14.76 | 13.37 | 18.07 |
| Buffalo Z5E5 | Zambia | 1982 | 14 821 054 | 5 119 542 | 34.5 | 22.4 | 95.3 | 103 880 | 68 454 | 35 426 | 12.61 | 11.81 | 14.52 |
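As a quick consistency check, the derived columns of the table can be recomputed from the raw counts: the mapped-read percentage from total and mapped reads, and the overall SNP number from its coding and non-coding parts. Dividing each overall SNP number by its density also gives the implied sequence length used as the denominator. The helper `check_row` is hypothetical; the rows are transcribed from the table.

```python
# Rows transcribed from the table: (strain, total reads, mapped reads,
# mapped %, SNPs overall, SNPs coding, SNPs non-coding, SNP density per kb)
ROWS = [
    ("ChitongoZ2", 14_405_285, 11_225_629, 77.9, 46_366, 31_753, 14_613, 5.63),
    ("KateteB2", 16_558_765, 4_954_291, 29.9, 43_873, 31_533, 12_340, 5.33),
    ("Kiambu Z464/C12", 15_848_447, 6_278_932, 39.6, 46_435, 33_021, 13_414, 5.64),
    ("MandaliZ22H10", 16_362_287, 3_904_897, 23.9, 38_498, 28_270, 10_228, 4.67),
    ("Entebbe", 10_171_312, 3_547_208, 34.9, 34_814, 27_195, 7_619, 4.23),
    ("Nyakizu", 29_366_782, 5_710_634, 19.4, 51_790, 34_700, 17_090, 6.29),
    ("Katumba", 35_406_725, 4_089_736, 11.6, 46_441, 32_321, 14_120, 5.64),
    ("Buffalo LAWR", 17_072_360, 6_155_888, 36.1, 121_545, 77_472, 44_073, 14.76),
    ("Buffalo Z5E5", 14_821_054, 5_119_542, 34.5, 103_880, 68_454, 35_426, 12.61),
]


def check_row(row):
    name, total, mapped, pct, snps, coding, noncoding, density = row
    # Mapped reads (%) should equal mapped / total, rounded to one decimal.
    pct_ok = round(100 * mapped / total, 1) == pct
    # The overall SNP number should be the sum of coding and non-coding SNPs.
    sums_ok = coding + noncoding == snps
    # Density is SNPs per kb, so SNPs / density is the implied length in kb.
    implied_length_kb = snps / density
    return name, pct_ok, sums_ok, implied_length_kb


for row in ROWS:
    name, pct_ok, sums_ok, length_kb = check_row(row)
    print(f"{name:16s} pct_ok={pct_ok} sums_ok={sums_ok} "
          f"implied length ~{length_kb / 1000:.2f} Mb")
```

Every row passes both checks, and the implied length comes out at roughly 8.2 Mb for all strains despite their different genome coverage (94.7–97.4%), which suggests the densities share a common denominator rather than being normalised by each strain's covered length.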
Figure 1. SNP distribution across the T. parva genome. SNPs in individual strains were detected after mapping reads to the reference genome of the Muguga strain. The entire datasets of 34 814–121 545 SNPs (SNP dataset I) were plotted as SNP densities (per 10 kb interval) along chromosomes 1–4. The x-axis shows the chromosomal position, and the left y-axis shows the number of SNPs (black bars) per 10 kb interval. Average short-read coverage is plotted as a line above the bars, against the right y-axis. Arrowheads indicate the possible location of the centromere.
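The per-interval counts plotted in Figure 1 amount to binning SNP positions into fixed windows. A minimal sketch, assuming non-overlapping 10 kb windows that start at position 1 and 1-based SNP coordinates; the function name and example positions are hypothetical, and the coverage line is not computed here.

```python
from collections import Counter

WINDOW_BP = 10_000  # 10 kb intervals, as in the figure


def snp_counts_per_window(positions, chrom_length, window=WINDOW_BP):
    """Count SNPs in consecutive, non-overlapping windows along one chromosome.

    positions: 1-based SNP coordinates on the chromosome.
    Returns one count per window; the last window may be shorter than `window`.
    """
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = Counter((pos - 1) // window for pos in positions)
    return [counts[i] for i in range(n_windows)]  # Counter gives 0 if absent


# Hypothetical SNP positions on a hypothetical 35 kb chromosome.
positions = [120, 9_999, 10_000, 10_001, 25_500, 34_999]
print(snp_counts_per_window(positions, 35_000))  # [3, 1, 1, 1]
```

Position 10 000 falls in the first window (1–10 000) and 10 001 in the second, which is why subtracting 1 before the integer division matters for 1-based coordinates.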
Figure 2. SNP density in each chromosome (SNP dataset I). Average SNP densities per 1 kb were calculated for each chromosome in the nine T. parva strains relative to the Muguga reference genome. In the published full genome sequence of T. parva, there is a large gap in the assembly of chromosome 3 owing to the repetitive Tpr locus. The large contig AAGK01000005 and the smaller contig AAGK01000006 are shown as Chr3_530 and Chr3_531, respectively.
Genes with high dN/dS ratios and a predicted secretion signal peptide. Of the 263 genes with higher dN/dS ratios, the 71 genes predicted to carry a secretion signal peptide are listed.
| GeneID | Description | Ortholog group | Signal | GeneID | Description | Ortholog group | Signal |
|---|---|---|---|---|---|---|---|
| TP01_0144 | Hypothetical protein | PiroF0002444 | Y | TP03_0003 | Hypothetical (SVSP) | PiroF0100037 | Y |
| TP01_0178 | Hypothetical protein | PiroF0002919 | Y | TP03_0039 | Hypothetical protein | Not assigned | Y |
| TP01_0180 | 40S ribosomal protein S11 | PiroF0000589 | Y | TP03_0040 | Hypothetical protein | PiroF0003613 | Y |
| TP01_0291 | Hypothetical protein | PiroF0002390 | Y | TP03_0123 | Hypothetical protein | PiroF0002851 | Y |
| TP01_0367 | Hypothetical protein | PiroF0000012 | Y | TP03_0217 | Hypothetical protein | PiroF0000012 | Y |
| TP01_0378 | Hypothetical protein | PiroF0003402 | Y | TP03_0297 | Hypothetical (FAINT superfamily) | PiroF0100056 | Y |
| TP01_0380 | Hypothetical protein | PiroF0003404 | Y | TP03_0298 | Hypothetical (FAINT superfamily) | PiroF0000056 | Y |
| TP01_0610 | Hypothetical (Tash family) | PiroF0100038 | Y | TP03_0319 | Hypothetical protein | PiroF0000012 | Y |
| TP01_0619 | Hypothetical (Tash family) | PiroF0100038 | Y | TP03_0368 | Hypothetical (FAINT superfamily) | PiroF0100056 | Y |
| TP01_0621 | Hypothetical (Tash family) | PiroF0100038 | Y | TP03_0405 | Hypothetical protein | PiroF0002425 | Y |
| TP01_0914 | Hypothetical protein | PiroF0002316 | Y | TP03_0498 | Hypothetical (SVSP) | PiroF0100037 | Y |
| TP01_0955 | Hypothetical protein | PiroF0003569 | Y | TP03_0520 | Hypothetical protein | PiroF0000012 | Y |
| TP01_0987 | Hypothetical protein | PiroF0002967 | Y | TP03_0530 | Hypothetical protein | | Y |
| TP01_1011 | Hypothetical protein | PiroF0100045 | Y | TP03_0664 | Hypothetical protein | PiroF0000012 | Y |
| TP01_1044 | Hypothetical protein | Not assigned | Y | TP03_0780 | Hypothetical protein | PiroF0002660 | Y |
| TP01_1056 | 32 kDa surface antigen | PiroF0002963 | Y | TP03_0810 | Hypothetical protein | PiroF0002675 | Y |
| TP01_1109 | Hypothetical protein | PiroF0000207 | Y | TP03_0886 | Hypothetical (SVSP) | PiroF0100037 | Y |
| TP01_1227 | Hypothetical (SVSP) | PiroF0100037 | Y | TP03_0893 | Hypothetical (SVSP) | PiroF0100037 | Y |
| TP02_0004 | Hypothetical (SVSP) | PiroF0100037 | Y | TP04_0009 | Hypothetical (SVSP) | PiroF0100037 | Y |
| TP02_0006 | Hypothetical (SVSP) | PiroF0100037 | Y | TP04_0012 | Hypothetical (FAINT superfamily) | PiroF0100056 | Y |
| TP02_0010 | Hypothetical (SVSP) | PiroF0100037 | Y | TP04_0013 | Hypothetical (SVSP) | PiroF0100037 | Y |
| TP02_0018 | Hypothetical protein | PiroF0100055 | Y | TP04_0096 | Hypothetical (FAINT superfamily) | PiroF0100056 | Y |
| TP02_0239 | Hypothetical protein | PiroF0002609 | Y | TP04_0097 | Hypothetical (FAINT superfamily) | PiroF0100056 | Y |
| TP02_0327 | Hypothetical protein | PiroF0000012 | Y | TP04_0101 | Hypothetical (FAINT superfamily) | PiroF0100056 | Y |
| TP02_0331 | Ubiquitin-activating enzyme, putative | PiroF0002575 | Y | TP04_0104 | Hypothetical (FAINT superfamily) | PiroF0100056 | Y |
| TP02_0551 | 23 kDa piroplasm surface protein | PiroF0003021 | Y | TP04_0110 | Hypothetical protein | PiroF0001224 | Y |
| TP02_0575 | Hypothetical protein | PiroF0003017 | Y | TP04_0116 | Hypothetical protein | PiroF0003546 | Y |
| TP02_0819 | Hypothetical (FAINT superfamily) | PiroF0100056 | Y | TP04_0150 | Hypothetical (SVSP) | PiroF0000037 | Y |
| TP02_0856 | Hypothetical (FAINT superfamily) | PiroF0100056 | Y | TP04_0328 | Hypothetical protein | PiroF0002219 | Y |
| TP02_0875 | Hypothetical protein | PiroF0002985 | Y | TP04_0411 | Hypothetical protein | PiroF0003185 | Y |
| TP02_0952 | Hypothetical protein | PiroF0003456 | Y | TP04_0437 | 104 kDa antigen | PiroF0003088 | Y |
| TP02_0954 | Hypothetical (SVSP) | PiroF0100037 | Y | TP04_0558 | Hypothetical protein | PiroF0001517 | Y |
| TP02_0956 | Hypothetical (SVSP) | PiroF0100037 | Y | TP04_0919 | Hypothetical (SVSP) | PiroF0100037 | Y |
| TP03_0001 | Hypothetical (SVSP) | PiroF0100037 | Y | TP04_0920 | Hypothetical (SVSP) | PiroF0100037 | Y |
| TP03_0002 | Hypothetical protein (SVSP) | PiroF0100037 | Y | TP04_0921 | Hypothetical protein (FAINT superfamily) | PiroF0000056 | Y |
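The two-step selection described in the table caption (high dN/dS first, then a predicted secretion signal peptide) can be sketched as a pair of filters. The cutoff used to define the 263 high-dN/dS genes is not given in this record, so the value 1.0 below is an assumption, and the gene records and their values are entirely hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    gene_id: str
    dn_ds: float
    has_signal_peptide: bool  # e.g. the output of a signal-peptide predictor


def select_secreted_candidates(genes, dn_ds_cutoff):
    """Keep genes above the dN/dS cutoff, then those with a predicted signal peptide."""
    high = [g for g in genes if g.dn_ds > dn_ds_cutoff]
    secreted = [g for g in high if g.has_signal_peptide]
    return high, secreted


# Hypothetical genes and values, for illustration only.
genes = [
    GeneRecord("gene_A", 1.8, True),
    GeneRecord("gene_B", 1.2, True),
    GeneRecord("gene_C", 0.4, True),
    GeneRecord("gene_D", 2.1, False),
]
high, secreted = select_secreted_candidates(genes, dn_ds_cutoff=1.0)  # assumed cutoff
print(len(high), len(secreted))  # 3 2
```

Returning both lists keeps the intermediate set available, mirroring the caption's "71 of 263" framing.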
Figure 3. Mosaic pattern of SNPs in T. parva strains. The frequency of each of the 127 possible allelic combinations among the 8 cattle-derived T. parva strains was calculated using SNP dataset II. The 10 top-ranking combinations were plotted onto schematic chromosomes in their assigned colours. Each line within a chromosome represents a single SNP marker position.
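The figure of 127 allelic combinations follows from treating each biallelic SNP as a split of the 8 strains into two groups: there are 2^8 ways to assign two alleles, pairs of assignments that only swap the allele labels give the same split (halving this to 2^7 = 128), and the one monomorphic split is not a SNP, leaving 2^7 − 1 = 127. A minimal sketch of that count and of ranking combinations by frequency, assuming biallelic sites and coding the first strain's allele as 0; function names and the toy SNP columns are hypothetical.

```python
from collections import Counter
from itertools import product


def canonical_pattern(alleles):
    """Encode one SNP column (one allele per strain) as a 0/1 pattern.

    The first strain's allele is coded 0, so a pattern and its complement
    (the same split of strains) map to the same key. Assumes biallelic sites.
    """
    first = alleles[0]
    return tuple(0 if a == first else 1 for a in alleles)


def possible_patterns(n_strains):
    """All distinct biallelic splits of n strains, excluding monomorphic columns."""
    patterns = set()
    for bits in product((0, 1), repeat=n_strains):
        if len(set(bits)) == 2:  # skip all-0 / all-1 (not polymorphic)
            patterns.add(canonical_pattern(bits))
    return patterns


def top_patterns(snp_columns, k=10):
    """Rank allelic combinations by how many biallelic SNP columns show them."""
    counts = Counter(
        canonical_pattern(col) for col in snp_columns if len(set(col)) == 2
    )
    return counts.most_common(k)


print(len(possible_patterns(8)))  # 127, i.e. 2**7 - 1

# Hypothetical SNP columns for 4 strains (one string = one SNP position).
columns = ["AAGG", "CCTT", "AGAA", "TTCC", "GAGG", "AAAA"]
print(top_patterns(columns, k=2))  # [((0, 0, 1, 1), 3), ((0, 1, 0, 0), 2)]
```

Assigning each of the top-ranking patterns a colour and plotting its SNP positions along the chromosomes would then give a mosaic view like Figure 3.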
Figure 4. Neighbour-net network analysis of 10 T. parva strains. The analysis was performed with the concatenated SNP allele sequence data from SNP dataset III. Bootstrap values are based on 100 replicates and were close to 100 at most nodes.
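The input preparation for Figure 4 can be sketched as concatenating each strain's SNP alleles into one pseudo-sequence, computing pairwise distances, and resampling SNP columns with replacement for bootstrap replicates. The Neighbour-net algorithm itself (as implemented in dedicated network software) is not reimplemented here, and the use of uncorrected p-distances is an assumption, not necessarily the authors' choice. All function names and the toy data are hypothetical.

```python
import random


def concatenate_snps(snp_table, strains):
    """Build one pseudo-sequence per strain by concatenating its SNP alleles.

    snp_table: list of dicts mapping strain name -> allele at one SNP position.
    """
    return {s: "".join(site[s] for site in snp_table) for s in strains}


def p_distance(a, b):
    """Proportion of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs, strains, columns=None):
    """Pairwise p-distances, optionally over a (resampled) list of column indices."""
    n = len(seqs[strains[0]])
    cols = list(range(n)) if columns is None else list(columns)
    sub = {s: "".join(seqs[s][i] for i in cols) for s in strains}
    return {(a, b): p_distance(sub[a], sub[b]) for a in strains for b in strains}


def bootstrap_matrices(seqs, strains, replicates=100, seed=1):
    """Resample SNP columns with replacement, yielding one matrix per replicate."""
    rng = random.Random(seed)
    n = len(seqs[strains[0]])
    for _ in range(replicates):
        cols = [rng.randrange(n) for _ in range(n)]
        yield distance_matrix(seqs, strains, cols)


# Toy data: three hypothetical strains and four positions.
strains = ["S1", "S2", "S3"]
snp_table = [
    {"S1": "A", "S2": "A", "S3": "G"},
    {"S1": "C", "S2": "T", "S3": "T"},
    {"S1": "G", "S2": "G", "S3": "G"},
    {"S1": "T", "S2": "T", "S3": "C"},
]
seqs = concatenate_snps(snp_table, strains)
print(seqs)  # {'S1': 'ACGT', 'S2': 'ATGT', 'S3': 'GTGC'}
d = distance_matrix(seqs, strains)
print(d[("S1", "S2")], d[("S1", "S3")])  # 0.25 0.75
reps = list(bootstrap_matrices(seqs, strains, replicates=100))
print(len(reps))  # 100
```

Each replicate matrix would be passed to the network method, and the support for a split is the fraction of replicates in which it reappears; that last step is not shown.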