Wenhui Li1, Bo Liu2, Yang Yang1, Yuwei Ren2, Shuai Wang1, Conghui Liu2, Nianzhang Zhang1, Zigang Qu1, Wanxu Yang2, Yan Zhang2, Hongbing Yan1, Fan Jiang2, Li Li1, Shuqu Li2, Wanzhong Jia1,3, Hong Yin1,3, Xuepeng Cai1, Tao Liu4, Donald P McManus5, Wei Fan2, Baoquan Fu1,3.
Abstract
Coenurosis, caused by the larval coenurus of the tapeworm Taenia multiceps, is a fatal central nervous system disease in both sheep and humans. Though treatment and prevention options are available, the control of coenurosis still faces great challenges. Here, we present a high-quality genome sequence of T. multiceps in which 240 Mb (96%) of the genome has been successfully assembled using PacBio single-molecule real-time (SMRT) and Hi-C data with an N50 length of 44.8 Mb. In total, 49.5 Mb (20.6%) of repeat sequences and 13,013 gene models were identified. We found that Taenia spp. underwent an expansion of transposable elements and recent small-scale gene duplications following the divergence of Taenia from Echinococcus, whereas Echinococcus genomes did not, and that the genes underlying environmental adaptability and dosage effect tend to be over-retained in the T. multiceps genome. Moreover, we identified several genes encoding proteins involved in proglottid formation and interactions with the host central nervous system, which may contribute to the adaptation of T. multiceps to its parasitic lifestyle. Our study not only provides insights into the biology and evolution of T. multiceps, but also identifies a set of species-specific gene targets for developing novel treatment and control tools for coenurosis.
Year: 2018 PMID: 29947776 PMCID: PMC6191302 DOI: 10.1093/dnares/dsy020
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Summary of assembly and annotation of tapeworm genomes
| Assembly feature | T. multiceps | T. solium | T. saginata | T. asiatica | E. multilocularis | E. granulosus | H. microstoma |
|---|---|---|---|---|---|---|---|
| Estimated genome size (Mb) | 250 | — | 260 | 260 | 154 | 195 | — |
| Assembled sequences (Mb) | 240 | 122 | 169 | 168 | 115 | 114 | 182 |
| Assembly coverage (%) | 96 | — | 65 | 65 | 75 | 60 | — |
| Gap ratio (%) | 0.05 | 0.1 | 1.6 | 2.5 | 0.3 | 2.4 | 10.5 |
| Longest scaffold size (Mb) | 10.5 | 0.7 | 7.3 | 4.2 | 20.1 | 15.9 | 22.3 |
| N50 size of scaffold (Mb) | 44.8 | 0.1 | 0.6 | 0.3 | 13.8 | 5.2 | 7.6 |
| N90 size of scaffold (kb) | 8,527.5 | 5.3 | 29.4 | 14.3 | 2,924.3 | 213.5 | 40.7 |
| GC content in genome (%) | 43.7 | 42.9 | 43.2 | 43.1 | 42.2 | 41.9 | 35.9 |
| Gene annotation | |||||||
| Number of gene models | 13,013 | 12,481 | 13,161 | 13,323 | 10,663 | 10,245 | 12,368 |
| BUSCO complete gene (ratio) | 351 (81.8%) | 382 (89.0%) | 364 (84.8%) | 360 (83.9%) | 393 (91.6%) | 388 (90.4%) | 381 (88.8%) |
| Coding sequence size (Mb) | 18.5 | 15.5 | 13.3 | 13.5 | 15.7 | 15.2 | 16.9 |
| Average CDS size (bp) | 1,424 | 1,242 | 1,011 | 1,013 | 1,472 | 1,484 | 1,366 |
| Average exon number | 6.6 | 5.6 | 4.4 | 4.3 | 6.7 | 6.8 | 6.0 |
| Average exon size (bp) | 215 | 222 | 232 | 234 | 218 | 218 | 228 |
| GC content in coding region (%) | 50.9 | 50.1 | 50.1 | 50.1 | 50.0 | 50.0 | 44.4 |
The genome data of six other tapeworm species (T. solium, T. saginata, T. asiatica, E. multilocularis, E. granulosus and H. microstoma) were downloaded from WormBase and the NCBI database. The N50 and N90 sizes were calculated based on the assembled genome size.
The estimated genome sizes were reported in Wang et al. (2016). For E. multilocularis and E. granulosus, the genome sizes were estimated from Illumina reads using the distribution of k-mer frequency (Supplementary Fig. S2). We did not estimate the genome sizes of T. solium and H. microstoma, because no Illumina reads were found for them.
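The table note says that N50 and N90 were calculated from the assembled genome size. A minimal sketch of that calculation follows, assuming the usual definition: sort sequences from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of the assembled total. The function name `nx` and the toy lengths are hypothetical and are not taken from the study.

```python
def nx(lengths, fraction):
    """Return the Nx length for a list of sequence lengths.

    Nx is the length L such that sequences of length >= L together
    cover at least `fraction` of the total assembled length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # The threshold is based on the assembled total, not on an
    # independently estimated genome size.
    threshold = fraction * sum(lengths)

    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
toy_lengths = [40, 30, 15, 10, 5]
n50 = nx(toy_lengths, 0.5)
n90 = nx(toy_lengths, 0.9)
print(n50, n90)
```

Using the estimated genome size as the denominator instead would give the related NG50 statistic, which can differ noticeably when assembly coverage is low.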
Figure 1. The genome characteristics of T. multiceps. (a) Distribution of 19-mer frequency. Error-corrected Illumina reads were used to calculate the k-mer frequency. (b) Hi-C produces a genome-wide contact matrix with a 500 kb window across the seven linkage groups (LGs). (c) A Venn diagram showing the unique and shared size.
Figure 2. Repetitive sequence content and TE divergence rate in tapeworm genomes. (a) Classification and content of repetitive sequences in T. multiceps compared with T. saginata, T. asiatica, T. solium, E. granulosus and E. multilocularis. (b) Proportions of the 10 most abundant TE families in the T. multiceps genome. (c) Distribution of TE divergence rate in tapeworm genomes.
Figure 3. Evolution of the T. multiceps genome. (a) Dated tree for 14 species. The age of each node is indicated by 95% CI. (b) Distribution of different types of gene duplication in tapeworm genomes. WGD means whole genome duplication. (c) Distribution of synonymous mutation rate (Ks) values for paralogous gene pairs in each tapeworm genome. Ks values between T. multiceps and E. multilocularis are calculated using the syntenic orthologous gene pairs.
Figure 4. Expansion and divergence of the Tbx6 subfamily in T. multiceps. (a) ML phylogenetic tree of Tbx6 genes in T. multiceps. Two Tbx6 genes of Echinococcus were used as the outgroup. (b) Motifs of Tbx6 genes in T. multiceps. Each rectangle colour represents a motif. (c) Expression patterns of Tbx6 genes in different stages (Onc, Pro, Cyst and Adu) of T. multiceps.
Figure 5. Effect of the coenurus on the sheep CNS. (a) Diagram of the T. multiceps life cycle. The enlarged photo shows a growing coenurus cyst with clusters of protoscoleces. (b) Possible mechanisms of the effects of the coenurus on the intermediate host brain. Nutrients and waste materials are transported outside and inside. The label ‘Partial enlargement’ means that the large circle to the right of the arrow is an enlarged model of the protoscoleces. Shapes with the same form and colour refer to the same protein or ion, and the two squares drawn with red dotted lines indicate the TCA cycle and neurotoxic materials, respectively. The two-way arrows with thickened black lines show bidirectional transport of proteins and ions, whereas a single-way arrow indicates one direction; a single-way arrow with a dotted line points to the name of the shape.
Figure 6. Predicted host-origin HTGs in T. multiceps. (a) Phylogenetic tree of Tm1G003541 and its homologues. The clades highlighted in yellow, green and blue represent mammals, lower vertebrates and invertebrates, respectively, while red font marks tapeworm and dog. (b) Characteristics of the sequence alignment for the HTG candidate (Tm1G003541). (c) Local paired-end read mapping of the flanking region of the HTG candidate (Tm1G003541). The red box indicates the HTG. (d) Expression patterns of T. multiceps HTGs based on transcriptome analysis. (e) GO annotation of the HTG candidates.