Pasi K Korhonen1, Liina Kinkar1, Neil D Young1, Huimin Cai2,3, Marshall W Lightowlers1, Charles Gauci1, Abdul Jabbar1, Bill C H Chang1, Tao Wang1, Andreas Hofmann1, Anson V Koehler1, Junhua Li2,3, Jiandong Li2,3, Daxi Wang2,3, Jiefang Yin2,3, Huanming Yang2,3, David J Jenkins4, Urmas Saarma5, Teivi Laurimäe5, Mohammad Rostami-Nejad6, Malik Irshadullah7, Hossein Mirhendi8, Mitra Sharbatkhori9, Francisco Ponce-Gordo10, Sami Simsek11, Adriano Casulli12, Houria Zait13, Hripsime Atoyan14, Mario Luiz de la Rue15, Thomas Romig16, Marion Wassermann16, Sargis A Aghayan17, Hasmik Gevorgyan18, Bicheng Yang19, Robin B Gasser20.
Abstract
Cystic echinococcosis is a socioeconomically important parasitic disease caused by the larval stage of the canid tapeworm Echinococcus granulosus, afflicting millions of humans and animals worldwide. The development of a vaccine (called EG95) has been the most notable translational advance in the fight against this disease in animals. However, almost nothing is known about the genomic organisation/location of the family of genes encoding EG95 and related molecules, the extent of their conservation or their functions. The lack of a complete reference genome for E. granulosus genotype G1 has been a major obstacle to addressing these areas. Here, we assembled a chromosomal-scale genome for this genotype by scaffolding to a high-quality genome for the congener E. multilocularis, localised Eg95 gene family members in this genome, and evaluated the conservation of the EG95 vaccine molecule. These results have marked implications for future explorations of aspects such as developmentally regulated gene transcription/expression (using replicate samples) for all E. granulosus stages; structural and functional roles of non-coding genome regions; molecular 'cross-talk' between oncosphere and the immune system; and defining the precise function(s) of EG95. Applied aspects should include developing improved tools for the diagnosis and chemotherapy of cystic echinococcosis of humans.
Year: 2022 PMID: 35241789 PMCID: PMC8894454 DOI: 10.1038/s42003-022-03125-1
Source DB: PubMed Journal: Commun Biol ISSN: 2399-3642
Genome features.
| Characteristics | Eg-G1s | | | |
|---|---|---|---|---|
| Genome size (bp) | 172,983,221 | 114,538,160 | 110,837,706 | 114,963,242 |
| Number of scaffolds (contigs) | 31 (542) | 1288 | 957 | 1217 |
| N50 (bp); L50 of contig assembly | 1,386,608; 24 | – | – | – |
| N90 (bp); L90 of contig assembly | 114,310; 218 | – | – | – |
| N50 (bp); L50 of scaffolded assembly | 18,675,433; 4 | 5,228,736; 8 | 712,683; 39 | 13,762,452; 4 |
| N90 (bp); L90 of scaffolded assembly | 12,340,804; 8 | 213,489; 41 | 127,284; 181 | 2,924,275; 10 |
| Genome GC content (%) | 42.2 | 41.9 | 41.8 | 42.2 |
| Repetitive sequences (%) | 36.2 | 10.55 | – | 11.95 |
| Exonic proportion; including introns (%) | 9.0; 33.1 | 13.3; 48.8 | 14.3; 55.6 | 13.7; 49.1 |
| Number of putative coding genes | 9985 | 10,245 | 11,319 | 10,663 |
| Mean; median gene size (bp) | 5727; 2912 | 5459; 2692 | 5481; 3281 | 5292; 2654 |
| Mean; median CDS length (bp) | 1551; 1095 | 1486; 1062 | 1401; 939 | 1476; 1041 |
| Mean exon number per gene | 7.0 | 6.8 | 6.7 | 6.8 |
| Mean; median exon length (bp) | 221; 159 | 219; 159 | 211; 153 | 218; 158 |
| Mean; median intron length (bp) | 693; 240 | 685; 247 | 722; 318 | 663; 242 |
| Coding GC content (%) | 50.1 | 50.0 | 49.3 | 49.9 |
| BUSCO completeness: complete; partial genes (%) | 69.9; 6.2 | 71.9; 5.5 | 69.2; 5.7 | 72.6; 5.2 |
Comparison of the characteristics of the genome Eg-G1s of Echinococcus granulosus (genotype G1) with those of previous draft genomes of E. granulosus (G1) and E. multilocularis.
a Short-read assemblies[14,15].
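The N50/L50 and N90/L90 rows in the table follow the usual assembly-contiguity definitions. The sketch below shows how such values can be computed from a list of sequence lengths; the function name `nx_lx` and the toy lengths are illustrative assumptions, not data or code from the study.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of contig or scaffold lengths.

    Nx is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least `fraction` of the
    total assembly length; Lx is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(lengths, reverse=True)  # longest first
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise RuntimeError("threshold was never reached")


# Toy lengths (hypothetical, total = 300), not taken from the table.
toy = [100, 80, 60, 40, 20]
n50, l50 = nx_lx(toy, 0.5)  # cumulative 100, 180 -> crosses 150
n90, l90 = nx_lx(toy, 0.9)  # cumulative 100, 180, 240, 280 -> crosses 270
print(n50, l50, n90, l90)
```

A higher N50 together with a lower L50 indicates a more contiguous assembly, which is how the scaffolded-assembly rows of the table are read.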
Fig. 1 The genome Eg-G1s of Echinococcus granulosus (genotype G1) and the Eg95 gene family.
a Circular representation of the Echinococcus granulosus genome (genotype G1; designated Eg-G1s) with nine chromosomes (Ch1 to Ch9); indicated are gene (blue), repeat (orange) and encoded RNA (green; log2) densities ranging from 0 to 100% (bin-size of 100 kb) across the genome and the locations of the four Eg95 genes (Eg95-1, -4, -5 and -6). b Structure of the four Eg95 gene family members—thick and thin bars denote 3 exons and 2 introns, respectively. Black bars indicate 100% identity to Eg95-1; shades of grey to white correspond to sequence identity (%) to Eg95-1 (scale, below). c Structure of the Eg95-1 gene and mRNA. Predicted Goldberg-Hogness box (TATAA), start site (ATG), termination codon (TGA) and polyadenylation signal (AATACG) are indicated; the first and last exons are flanked by non-coding regions; mRNA includes 5ʹ- and 3ʹ-UTRs (white) and coding regions (grey). d Complete amino acid sequence of EG95-1 compared with those predicted for EG95-4, EG95-5 and EG95-6. Dashes indicate gaps inserted for the purpose of the alignment. Pairwise sequence comparisons among these four sequences range from ~77% to 99% identity.
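Pairwise identity values such as those quoted for panel d can be derived from an alignment in several ways. The sketch below uses one common convention (identical residues divided by the number of columns in which neither sequence has a gap); the convention, the function name `percent_identity` and the toy sequences are assumptions for illustration and are not the EG95 sequences or the authors' method.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns in which either sequence has a gap are skipped (an assumed
    convention; other definitions divide by the full alignment length).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == gap or res_b == gap:
            continue  # ignore gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned toy sequences (not real EG95 data).
aligned = {
    "seq1": "MKV-LLA",
    "seq2": "MKVALIA",
    "seq3": "MRVALIA",
}
for name_a, name_b in combinations(aligned, 2):
    identity = percent_identity(aligned[name_a], aligned[name_b])
    print(f"{name_a} vs {name_b}: {identity:.1f}%")
```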
Fig. 2 Synteny, relationships and orthology.
a Synteny of the nine chromosomes (Ch1 to Ch9) of the genome Eg-G1s of Echinococcus granulosus (genotype G1) with scaffolds or chromosomes in the genome assemblies of E. multilocularis[14], Taenia multiceps[22] and Hymenolepis microstoma[23]. Each line represents a single-copy orthologous (SCO) gene between two species (grey: same orientation; green: reverse orientation). Scale bar (top right) indicates chromosome length (Mb). b Consensus tree showing the genetic relationship of the four cestode species using data for 4040 shared SCOs (nodal support values: 1.0 and 100% for MrBayes and RAxML analyses, respectively; scale bar: substitutions per sequence site). c Venn diagram displaying the numbers of orthogroups between or among the four cestode species obtained using the program OrthoFinder[68] (numbers of E. granulosus genes in parentheses). d Venn diagram comparing the numbers of genes (using OrthoFinder) common or distinct between or among the reference genome Eg-G1s (top left) and previously published assemblies[14,16]. Numbers of paralogous genes (small ovals) and orthologous and/or single-copy genes (large ovals and overlaps) are indicated, as are orphan (unknown) genes (in parentheses). Also indicated are the numbers of genes predicted (n = 1432; 539 + 172 and 539 + 182) from two previous draft genomes of E. granulosus[14,16] for which homologous protein-coding genes were not identified in the final gene set of Eg-G1s. White lettering was used only to improve visibility of numbers on dark background.
Fig. 3 Transcription in Echinococcus granulosus (genotype G1).
a Life cycle of E. granulosus with key developmental stages indicated (adapted from ref. [23]); canid definitive host (DH); intermediate host (IH). b Four distinct clusters (each with sub-clusters + and –; divided according to fold-change (FC) ≥ 4 and FC ≤ −4, respectively) of genes whose transcription correlated among the protoscolex, adult and oncosphere stages, inferred by weighted correlation network analysis (numbers in boxes are gene counts). The four Eg95 genes (within sub-cluster 1+) are highly transcribed in the oncospheral stage. Enriched biological (KEGG) pathways representing individual gene clusters/sub-clusters are indicated. White lettering was used only to improve visibility of numbers on dark red background.
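The split into + and – sub-clusters at FC ≥ 4 and FC ≤ −4 implies a signed fold-change. The sketch below illustrates one widely used convention, in which ratios below 1 are reported as the negative reciprocal; this convention, the function names and the example values are assumptions, since the caption does not state how FC was calculated or which stages were compared.

```python
def signed_fold_change(expr_a, expr_b):
    """Signed fold-change of expr_a relative to expr_b (both > 0).

    Assumed convention: ratios >= 1 are kept as they are, and ratios
    below 1 become the negative reciprocal (for example 0.25 -> -4.0).
    """
    if expr_a <= 0 or expr_b <= 0:
        raise ValueError("expression values must be positive")
    ratio = expr_a / expr_b
    return ratio if ratio >= 1 else -1.0 / ratio


def sub_cluster(fold_change, cutoff=4.0):
    """Return '+' for FC >= cutoff, '-' for FC <= -cutoff, else None."""
    if fold_change >= cutoff:
        return "+"
    if fold_change <= -cutoff:
        return "-"
    return None


# Hypothetical expression values, for illustration only.
print(sub_cluster(signed_fold_change(8.0, 2.0)))  # ratio 4.0
print(sub_cluster(signed_fold_change(2.0, 8.0)))  # ratio 0.25
print(sub_cluster(signed_fold_change(3.0, 2.0)))  # ratio 1.5, below cutoff
```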
Fig. 4 Assessment of genetic variation in the Eg95-1 gene and associated gene product.
a Genomic DNA samples (n = 47) representing single cysts or adult worms of Echinococcus granulosus (genotype G1 or G3) from 8 distinct host species and 10 different countries were sequenced. b Mapping of sequence data from individual samples to the haploid reference genome (Eg-G1s) sequence detected polymorphism (allelic variability) but no unambiguous (i.e. fixed or homozygous) nucleotide difference from the reference sequence in the 3 exons of Eg95-1 in any of the (diploid) sequences from any of the 47 samples. Black horizontal bars represent the three exons (1 to 3) and black lines denote intervening introns. Polymorphic positions are indicated above each exon: a dominant base (black) matches the Eg-G1s reference sequence; a grey base represents the minor allele (cf. Supplementary Data 13); a fixed nucleotide difference from the reference sequence is indicated at one position; and a dash indicates an indel. c Mapping of allelic variation of EG95-1 to the modelled three-dimensional structure of the vaccine molecule EG95 reveals variable regions (see colour-key for percentage conservation) in the N-terminal α-helix, as well as two β-strands, each of which is located in one of the predicted anti-parallel β-sheets. All residue side chains subject to allelic variation are surface exposed, and thus, due to the conservative nature of most mutations (A→T, T→I, G→E, M→R, V→I, R→H, E→D, D→S), overall structural conservation of the vaccine molecule (EG95-1) can be inferred.