Jerald Yam, Daniel R Bogema, Melinda L Micallef, Steven P Djordjevic, Cheryl Jenkins.
Abstract
Theileria orientalis causes losses to cattle producers in Eastern Asia, Oceania and, more recently, North America. One pathogenic genotype (Ikeda) has been sequenced to the chromosomal level, while only draft genomes exist for globally distributed Chitose and Buffeli genotypes. To provide an accurate comparative gene-level analysis and help further understand their pathogenicity, we sequenced isolates of the Chitose and Buffeli genotypes of T. orientalis using long-read sequencing technology. A combination of several long-read assembly methods and short reads produced chromosomal-level assemblies for both Fish Creek (Chitose) and Goon Nure (Buffeli) isolates, including the first complete and circular apicoplast genomes generated for T. orientalis. Comparison with the Shintoku (Ikeda) reference sequence showed both large and small translocations in T. orientalis Buffeli, between chromosomes 2 and 3 and chromosomes 1 and 4, respectively. Ortholog clustering showed expansion of ABC transporter genes in Chitose and Buffeli. However, differences in several genes of unknown function, including DUF529/FAINT-domain-containing proteins, were also identified and these genes were more prevalent in Ikeda and Chitose genotypes. Phylogenetics and similarity measures were consistent with previous short-read genomic analysis. The generation of chromosomal sequences for these highly prevalent T. orientalis genotypes will also support future studies of population genetics and mixed genotype infections.
Keywords: Theileria orientalis; comparative genomics; gene presence
Year: 2022 PMID: 35890045 PMCID: PMC9323827 DOI: 10.3390/pathogens11070801
Source DB: PubMed Journal: Pathogens ISSN: 2076-0817
Draft assembly results of the five different assemblers trialled.
| Genotype | Assembler | Total Contigs | Contigs (≥50 kb) | Total Length (bp) | N50 (bp) | Largest Contig (bp) |
|---|---|---|---|---|---|---|
| Chitose | Flye | 14–16 | 7–8 | 9,344,963 | 2,171,492 | 2,745,486 |
| Chitose | Miniasm | 7–10 | 5–6 | 9,559,641 | 2,254,955 | 2,765,560 |
| Chitose | Necat | 4–6 | 4–6 | 9,416,796 | 2,296,410 | 2,765,760 |
| Chitose | Raven | 5–6 | 4 | 9,365,432 | 2,296,609 | 2,770,085 |
| Chitose | Shasta | 9–12 | 4–5 | 9,427,530 | 2,242,537 | 2,765,028 |
| Buffeli | Flye | 20 | 7–12 | 9,316,485 | 1,958,568 | 2,504,925 |
| Buffeli | Miniasm | 9–14 | 6–12 | 9,531,547 | 1,733,126 | 2,109,761 |
| Buffeli | Necat | 10–19 | 10–19 | 10,547,787 | 1,967,178 | 2,912,118 |
| Buffeli | Raven | 9–12 | 4–6 | 9,269,508 | 2,079,080 | 2,896,450 |
| Buffeli | Shasta | 17 | 5–7 | 9,313,556 | 2,177,669 | 2,788,592 |
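
For readers unfamiliar with the N50 column, the following is a minimal sketch of how the statistic is conventionally computed: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the assembly length. The function name `n50` and the toy contig lengths are illustrative assumptions, not values or code from the study, and the assemblers and evaluation tools used by the authors may differ in detail.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths (bp).

    Hypothetical helper for illustration only.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running total to at least
        # half of the assembly length defines N50.
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Toy example (made-up lengths): total = 15, half = 7.5,
# cumulative sums are 5, 9, ... so N50 is the second contig.
print(n50([5, 4, 3, 2, 1]))  # prints 4
```

A higher N50 at a similar total length generally indicates a more contiguous assembly, which is why the column is useful for comparing assemblers on the same genotype.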
Final chromosome lengths (bp) for sequenced T. orientalis isolates.
| Isolate | Chr 1 | Chr 2 | Chr 3 | Chr 4 | Apicoplast | Mitochondria |
|---|---|---|---|---|---|---|
| Shintoku | 2,746,313 | 2,216,979 | 2,000,793 | 2,019,511 | 24,173 * | 2595 * |
| Fish Creek | 2,765,963 | 2,233,854 | 2,297,733 | 2,024,851 | 31,688 | 6231 |
| Goon Nure | 2,785,604 | 2,153,779 | 2,196,581 | 1,884,878 | 37,498 | 5965 |
* incomplete sequence.
Figure 1. Apicoplast genomes of T. orientalis Fish Creek and Goon Nure isolates. Outer ring (black) represents DNA sequence. Middle ring shows annotated genes including ribosomal RNA subunits (red), transfer RNA (purple) and protein coding sequences (green). Inner ring shows %GC difference from average with a 100 bp sliding window.
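
The inner ring of Figure 1 plots GC content relative to the genome-wide average in 100 bp windows. A minimal sketch of that kind of calculation is given below. The plotting software is not named in this record, so the function names, the non-overlapping step, and the decision to ignore wrap-around on the circular genome are all assumptions made for illustration.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_deviation(seq, window=100, step=100):
    """Return (start, GC - mean GC) for each full window along seq.

    Hypothetical helper: treats the sequence as linear, so windows
    spanning the origin of a circular genome are not evaluated.
    """
    mean_gc = gc_fraction(seq)
    deviations = []
    for start in range(0, len(seq) - window + 1, step):
        window_seq = seq[start:start + window]
        deviations.append((start, gc_fraction(window_seq) - mean_gc))
    return deviations


# Toy sequence (not real data): a GC-only block followed by an AT-only block.
toy = "GC" * 50 + "AT" * 50
print(gc_deviation(toy))  # prints [(0, 0.5), (100, -0.5)]
```

Positive values mark windows richer in GC than the genome average and negative values mark AT-rich windows.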
Figure 2. Synteny dot plots of the T. orientalis Shintoku (Ikeda) reference and strains Fish Creek (Chitose) and Goon Nure (Buffeli). Red circles indicate rearrangement in strain Goon Nure between chromosomes 2 and 3 and translocation between chromosomes 1 and 4.
Genome annotation statistics of the T. orientalis isolates sequenced in this study and the T. orientalis Shintoku reference sequence.
| | Shintoku | Fish Creek | Goon Nure |
|---|---|---|---|
| Genome | | | |
| Total predicted genes | 4058 | 3980 | 3924 |
| Total predicted mRNA | 4002 | 3907 | 3848 |
| Total predicted tRNA | 47 | 66 | 69 |
| Total predicted rRNA | 9 | 7 | 7 |
| Total predicted CDS | 4002 | 3907 | 3848 |
| Percentage coding sequence | 68.43 | 68.46 | 68.73 |
| Total annotated sequence length | 9,010,364 | 9,360,320 | 9,064,305 |
| Percentage GC | 41.53 | 38.84 | 37.46 |
| Genes (+tRNA and rRNA) | | | |
| Longest gene | 26,436 | 23,559 | 25,877 |
| Shortest gene | 39 | 24 | 33 |
| Total gene length | 7,386,640 | 7,385,994 | 7,214,907 |
| Average gene length | 1820 | 1856 | 1839 |
| Average gene coding sequence | 1541 | 1640 | 1619 |
| Gene density (genes per Mb) | 450.37 | 425.2 | 432.91 |
| Percentage coding genes with introns | 78.3 | 76 | 76.1 |
| Exons | | | |
| Total exon length | 6,180,198 | 6,424,419 | 6,246,171 |
| Total number of exons | 16,558 | 15,809 | 15,837 |
| Longest exon | 11,241 | 16,092 | 25,364 |
| Shortest exon | 2 | 3 | 3 |
| Average exon length | 373.2 | 406.4 | 394.4 |
| Percentage GC | 46.06 | 43.21 | 41.89 |
| Introns | | | |
| Total intron length | 1,206,442 | 961,575 | 968,736 |
| Total number of introns | 12,500 | 11,829 | 11,913 |
| Longest intron | 5418 | 6291 | 4043 |
| Shortest intron | 4 | 11 | 11 |
| Average intron length | 96.5 | 81.3 | 81.3 |
| Average introns per gene | 3.1 | 3 | 3 |
| Percentage GC | 34.12 | 29.54 | 27.26 |
| Intergenic regions | | | |
| Total intergenic length | 1,636,669 | 1,974,520 | 1,849,793 |
| Total intergenic regions | 4020 | 3962 | 3901 |
| Longest intergenic region | 9728 | 8374 | 18,440 |
| Shortest intergenic region | 1 | 1 | 1 |
| Average intergenic length | 407.1 | 498.4 | 474.2 |
| Percentage GC | 29.9 | 29.12 | 27.86 |
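
Several rows in the table above are derived from the primary counts, so they can be cross-checked. As a minimal sketch, the snippet below recomputes gene density from the "Total predicted genes" and "Total annotated sequence length" rows. The tabulated density values are reproduced when the result is expressed as genes per megabase (1,000,000 bp) of annotated sequence. The dictionary layout and the function name `genes_per_mb` are illustrative assumptions.

```python
# (total predicted genes, total annotated sequence length in bp),
# copied from the annotation statistics table.
stats = {
    "Shintoku": (4058, 9_010_364),
    "Fish Creek": (3980, 9_360_320),
    "Goon Nure": (3924, 9_064_305),
}


def genes_per_mb(gene_count, length_bp):
    """Gene density expressed as genes per 1,000,000 bp (hypothetical helper)."""
    return gene_count / length_bp * 1_000_000


for isolate, (gene_count, length_bp) in stats.items():
    print(f"{isolate}: {genes_per_mb(gene_count, length_bp):.2f} genes per Mb")
```

The same pattern applies to other derived rows, for example dividing total gene length by total predicted genes to check the average gene length.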
Figure 3. Venn diagram showing number of genes found in each isolate combination.
Figure 4. COG analysis of all predicted genes (top); genes identified as unique to each isolate combination (middle). Genes without COG assignment are not shown but constitute 31–38% of the total gene content of each isolate. COG categories (x-axis) are summarised by their letter categories (bottom).
Figure 5. Maximum likelihood tree of Piroplasmida whole-genome protein sequences, inferred with IQ-TREE 2 from 1417 concatenated protein sequences of single-copy genes, with concordance factors. P. vivax str. Salvador I and P. falciparum str. 3D7 were used as outgroups. Each branch label on the tree shows the bootstrap, gene concordance factor (gCF) and site concordance factor (sCF), respectively (bootstrap/gCF/sCF).