Ian W Wilson, Gareth D Weedall, Hernan Lorenzi, Timothy Howcroft, Chung-Chau Hon, Marc Deloger, Nancy Guillén, Steve Paterson, C Graham Clark, Neil Hall.
Abstract
Amoebiasis is the third-most common cause of mortality worldwide from a parasitic disease. Although the primary etiological agent of amoebiasis is the obligate human parasite Entamoeba histolytica, other members of the genus Entamoeba can infect humans and may be pathogenic. Here, we present the first annotated reference genome for Entamoeba moshkovskii, a species that has been associated with human infections, and compare the genomes of E. moshkovskii, E. histolytica, the human commensal Entamoeba dispar, and the nonhuman pathogen Entamoeba invadens. Gene clustering and phylogenetic analyses show differences in expansion and contraction of families of proteins associated with host or bacterial interactions. These differences point to the importance, for parasitic Entamoeba species, of surface-bound proteins involved in adhesion to extracellular membranes, such as the Gal/GalNAc lectin and members of the BspA and Ariel1 families. Furthermore, E. dispar is the only one of the four species to lack a functional copy of the key virulence factor cysteine protease CP-A5, whereas the gene's presence in E. moshkovskii is consistent with the species' potentially pathogenic nature. Entamoeba moshkovskii was found to be more diverse than E. histolytica across all sequence classes. The former is ∼200 times more diverse than the latter, with the four E. moshkovskii strains tested having a most recent common ancestor nearly 500 times more ancient than that of the tested E. histolytica strains. A four-haplotype test indicates that these E. moshkovskii strains are not the same species and should be regarded as a species complex.
Keywords: Entamoeba; gene family; genome diversity; species complex
Year: 2019 PMID: 30668670 PMCID: PMC6414313 DOI: 10.1093/gbe/evz009
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
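The abstract mentions a four-haplotype test without describing it. As a rough illustration only, the sketch below assumes it resembles the classical four-gamete test: for a pair of biallelic sites, observing all four two-site haplotypes among the strains is taken as a sign of recombination (or recurrent mutation). The function name `four_gamete_pairs` and the toy haplotypes are hypothetical, and the sketch assumes phased, haploid sequences, which may not match the authors' data or method.

```python
from itertools import combinations


def four_gamete_pairs(haplotypes):
    """Return the site pairs (i, j) at which all four two-site haplotypes occur.

    `haplotypes` is a list of equal-length strings, one per strain, with one
    character per variable site (assumed phased and haploid).
    """
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    hits = []
    for i, j in combinations(range(n_sites), 2):
        alleles_i = {h[i] for h in haplotypes}
        alleles_j = {h[j] for h in haplotypes}
        # The test is only defined for pairs of biallelic sites.
        if len(alleles_i) != 2 or len(alleles_j) != 2:
            continue
        observed = {(h[i], h[j]) for h in haplotypes}
        if len(observed) == 4:
            hits.append((i, j))
    return hits


# Hypothetical toy data: four strains, three variable sites.
toy = ["AAC", "ATC", "TAG", "TTG"]
print(four_gamete_pairs(toy))
```

Under this reading, site pairs that show all four haplotypes are evidence of genetic exchange between lineages, and a set of strains with no such pairs shows no sign of recombining with one another.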
Statistics Relating to the Genome Assemblies of Entamoeba histolytica HM-1:IMSS, Entamoeba dispar SAW760, Entamoeba invadens IP-1, and Entamoeba moshkovskii Laredo
| Statistic | E. histolytica HM-1:IMSS | E. dispar SAW760 | E. invadens IP-1 | E. moshkovskii Laredo |
|---|---|---|---|---|
| Genome length (bp) | 20,799,072 | 22,955,291 | 40,888,805 | 25,247,493 |
| GC content (%) | 24.20 | 23.53 | 29.91 | 26.54 |
| Non-ACGT (%) | 0.31 | 0.56 | 0.93 | 9.94 |
| Number of scaffolds | 1,496 | 3,312 | 1,149 | 1,147 |
| N50 of scaffolds (bp) | 49,118 | 27,840 | 243,235 | 40,197 |
| Average scaffold size (bp) | 13,903 | 6,931 | 35,586 | 19,190 |
| Number of contigs | — | — | — | 3,460 |
| Average contig size (bp) | — | — | — | 935 |
| Average coverage depth | 12.5×** | 4.32×* | 4×* | 82.65× |
Note.—Statistics are derived from AmoebaDB v2.0 data, except for the asterisked (*) figures, taken from NCBI WGS Projects AANV02 and AANW03, and the double-asterisked (**) figure, taken from Loftus et al. (2005).
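For readers unfamiliar with the "N50 of scaffolds" row in the table above, the following is a minimal sketch of how an N50 is conventionally computed from a list of sequence lengths. It uses the standard definition (the length L such that sequences of length L or longer contain at least half of all bases). The function name `n50` and the toy lengths are hypothetical; this is not the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and summed until the
    running total reaches at least half of the overall total; the length
    of the sequence that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly with five scaffolds (lengths in bp).
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))                       # N50 of the toy assembly
print(sum(toy_scaffolds) / len(toy_scaffolds))  # average scaffold size
```

Unlike the average scaffold size, the N50 is weighted toward long sequences, which is why the two rows of the table can rank the assemblies differently.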
Fig. 1.—The range of GC contents in 100-base sections of reference genome assemblies for Entamoeba histolytica, Entamoeba dispar, Entamoeba moshkovskii, and Entamoeba invadens. In total, 99.19% of the E. histolytica assembly was included, as was 98.49% of the E. dispar assembly, 88.75% of the E. moshkovskii assembly, and 98.47% of the E. invadens assembly.
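The caption above describes GC content measured in 100-base sections, with only part of each assembly included. A minimal sketch of one way to do this follows. It assumes non-overlapping windows and assumes that any window containing a non-ACGT character is excluded, which would explain the "included" percentages but is not stated in the source. The function name `gc_by_window` is hypothetical.

```python
def gc_by_window(seq, window=100):
    """Return (gc_percentages, fraction_included) for non-overlapping windows.

    Assumption: windows containing any character other than A, C, G, or T
    are skipped, and a trailing partial window is ignored.
    """
    seq = seq.upper()
    percents = []
    included = 0
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        if any(base not in "ACGT" for base in chunk):
            continue  # skip windows with N or other ambiguity codes
        gc = chunk.count("G") + chunk.count("C")
        percents.append(100.0 * gc / window)
        included += window
    fraction = included / len(seq) if seq else 0.0
    return percents, fraction


# Hypothetical 400-base toy sequence: AT-only, GC-only, N-only, and mixed windows.
toy = "AT" * 50 + "GC" * 50 + "N" * 100 + "ACGT" * 25
percents, fraction = gc_by_window(toy)
print(percents, fraction)
```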
Genomic Comparison of Entamoeba histolytica HM-1:IMSS, Entamoeba dispar SAW760, Entamoeba invadens IP-1, and Entamoeba moshkovskii Laredo
| Statistic | E. histolytica HM-1:IMSS | E. dispar SAW760 | E. invadens IP-1 | E. moshkovskii Laredo |
|---|---|---|---|---|
| Number of CDSs | 8,306 | 8,748 | 11,549 | 12,449 |
| Average gene size (bp) | 1,280 | 1,259 | 1,401 | 1,230 |
| % Coding DNA | 50.12 | 46.62 | 38.01 | 59.04 |
| Average protein size (aa) | 418 | 408 | 449 | 399 |
| Average intergenic distance (bp) | 1,223 | 1,365 | 2,139 | 798 |
| Proportion of multiexon genes (%) | 24.16 | 30.73 | 34.48 | 26.24 |
| Average intron size (bp) | 74 | 81 | 104 | 89 |
| Average number of introns per spliced gene | 1.27 | 1.34 | 1.48 | 1.31 |
| Number of BUSCO orthologs | 220 | 211 | 211 | 216 |
Note.—Annotation files upon which statistics are based were obtained from AmoebaDB v2.0.
Fig. 2.—Venn diagram showing numbers of unique and orthologous genes and families in the genomes of Entamoeba histolytica, Entamoeba dispar, Entamoeba invadens, and Entamoeba moshkovskii. Numbers are based upon OrthoMCL output. Numbers in bold represent gene families; accompanying numbers in regular font represent the number of genes comprising those gene families.
Functional Annotations in Genes Unique to Entamoeba histolytica, Entamoeba dispar, Entamoeba invadens, or Entamoeba moshkovskii
| Number of Families with Function | Number of Genes within Families | Family Function |
|---|---|---|
| E. histolytica | | |
| 6 | 22 | BspA family |
| 4 | 18 | Surface antigen ariel1 |
| 2 | 12 | AIG1 family |
| 2 | 12 | Mucins |
| 2 | 7 | Cylicin-2 |
| 2 | 7 | Cysteine protease (inc. five pseudogenes) |
| 1 | 6 | Acetyltransferase |
| E. dispar | | |
| 1 | 13 | AIG1 family |
| 2 | 5 | Heat shock protein |
| E. invadens | | |
| 46 | 214 | Serine/threonine/tyrosine kinase |
| 9 | 34 | Ras family GTPase |
| 2 | 32 | Ribonuclease |
| 8 | 27 | Heat shock protein |
| 1 | 21 | Cylicin |
| 2 | 21 | Myosin |
| 2 | 19 | Glutamine/asparagine-rich protein pqn-25 |
| 5 | 16 | Actin |
| 1 | 15 | Thioredoxin |
| 1 | 12 | Profilin |
| 1 | 11 | Capsular polysaccharide phosphotransferase |
| 2 | 11 | DNA double-strand break repair Rad50 ATPase |
| 1 | 9 | Embryonic protein DC-8 |
| 3 | 8 | Serine/threonine protein phosphatase |
| 1 | 8 | Tropomyosin alpha-1 chain |
| 2 | 7 | ADP ribosylation factor |
| 2 | 7 | Cysteine protease |
| 1 | 7 | Elongation factor 1-alpha |
| 1 | 7 | Furin |
| 2 | 6 | Actophorin |
| 1 | 6 | Gal/GalNAc lectin light subunit |
| 1 | 6 | Nitrogen fixation protein nifU |
| 1 | 5 | Calcium-binding protein/Caltractin/Centrin-1 |
| 2 | 5 | Chaperone Clpb |
| 1 | 5 | DNA repair and recombination protein rad52 |
| 1 | 5 | GRIP domain-containing protein RUD3 |
| 2 | 5 | Serpin (serine protease inhibitor) |
| 1 | 5 | Vacuolar protein sorting-associated protein |
| E. moshkovskii | | |
| 40 | 753 | BspA-like family |
| 80 | 538 | Serine/threonine/tyrosine/protein kinase |
| 10 | 58 | Ras family GTPase |
| 5 | 53 | Transposable element/transposase |
| 4 | 46 | Tigger transposable element-derived protein |
| 9 | 36 | Actin |
| 8 | 26 | Heat shock protein |
| 4 | 17 | Leukocyte elastase inhibitor |
| 1 | 14 | Large xylosyl- and glucuronyltransferase 2 isoform X1 |
| 2 | 13 | GNAT family |
| 1 | 12 | Enhancer binding protein-2 |
| 4 | 11 | DNA double-strand break repair Rad50 ATPase |
| 1 | 10 | TonB-dependent siderophore receptor |
| 1 | 9 | Methionine–tRNA ligase |
| 1 | 9 | Tandem lipoprotein |
| 2 | 8 | DEAD/DEAH box helicase |
| 2 | 8 | Reverse transcriptase |
| 1 | 8 | Chaperone |
| 3 | 7 | Methyltransferase (various) |
| 3 | 7 | Cysteine proteinase |
| 1 | 7 | Putative AC transposase |
| 3 | 6 | DNA mismatch repair protein MsH2 |
| 2 | 6 | piggyBac transposable element-derived protein |
| 1 | 6 | Polyphosphate:AMP phosphotransferase |
| 1 | 6 | Primary-amine oxidase |
| 1 | 6 | Surface antigen-like protein |
| 1 | 6 | Translation elongation factor |
| 1 | 6 | Type VI secretion system tip protein VgrG |
| 1 | 6 | Site-specific tyrosine recombinase XerC |
| 2 | 5 | Chaperone protein DNAK |
| 1 | 5 | Diaminobutyrate-2-oxoglutarate transaminase |
| 1 | 5 | Response regulator |
Mapping and Coverage Statistics for Each Strain Studied in This Project
| Strain | Country of Origin | Sequencing Platform | Year of Isolation | Average Coverage Depth (×) | Number of Mapped Reads | Coverage of Ref. (%) |
|---|---|---|---|---|---|---|
| | Mexico | SOLiD 4 | 1967 | 43.53 | 13,743,197 | 61.03 |
| 2592100 (c) | Bangladesh | SOLiD 4 | 2005 | 41.50 | 13,618,188 | 68.83 |
| HK-9 (d) | Korea | SOLiD 4 | 1951 | 57.41 | 21,217,510 | 71.86 |
| | Italy | SOLiD 4 | 2007 | 50.02 | 17,688,152 | 70.88 |
| | Italy | SOLiD 4 | 2007 | 29.61 | 8,506,016 | 71.88 |
| Rahman (e) | UK | SOLiD 4 | 1964 | 49.43 | 19,534,522 | 67.78 |
| | Bangladesh | SOLiD 4 | 2006 | 59.97 | 20,419,790 | 63.27 |
| | Bangladesh | SOLiD 4 | 2006 | 63.01 | 21,499,758 | 69.57 |
| | Bangladesh | Illumina GA II | 2007 | 114.03 | 20,527,917 | 89.00 |
| | Bangladesh | Illumina GA II | 2006 | 72.15 | 13,361,613 | 88.36 |
| Laredo (h) | America | Illumina MiSeq | 1956 | 97.61 | 8,833,683 | 89.91 |
| FIC (i) | Canada | Illumina MiSeq | 1959 | 162.27 | 19,750,749 | 61.58 |
| Snake | France | Illumina MiSeq | 1948 | 209.10 | 25,655,106 | 76.96 |
| 15114 | Bangladesh | Illumina MiSeq | 1999 | 265.55 | 35,292,777 | 85.24 |
Gray rows represent reference strains, reads from which were mapped to their existing respective reference genome. Positions at which high-quality homozygous SNP calls were made in the reads were replaced in the original reference sequence. All other strains were mapped to the updated versions of their respective reference strains. Underlined sections of strain names represent the shortened versions of the names that will be used henceforth. References: a) Biller et al. (2009); b) Biller et al. (2010); c) Weedall et al. (2012); d) Ungar et al. (1985); e) Diamond and Clark (1993); f) Gilchrist et al. (2012); g) Ali et al. (2007); h) Dreyer (1961); i) Meerovitch (1958).
Sent from Institut Pasteur, Paris to Charles University, Prague in 1948. Institut Pasteur has no record of origin (Clark G, London School of Hygiene and Tropical Medicine, personal communication).
Fig. 3.—Divergence of Entamoeba histolytica and Entamoeba moshkovskii strains, relative to their reference strains (HM-1:IMSS and Laredo, respectively), within different sequence classes. Circles represent E. moshkovskii strains, and crosses represent E. histolytica strains. SNPs occurring in regions classified as both flanking regions and coding regions were considered to occur in coding regions only. Rates are relative to sites within their respective sequence classes.
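The caption above gives two counting rules: a SNP annotated as both flanking and coding counts as coding only, and each rate is expressed relative to the number of sites in its own sequence class. A minimal sketch of that bookkeeping follows. Only the "coding before flanking" priority comes from the caption; the class name `intergenic`, the function names, and all numbers are hypothetical illustrations.

```python
from collections import Counter

# Priority order used to resolve SNPs that fall in several classes.
# Only "coding" ahead of "flanking" is taken from the caption;
# "intergenic" is a hypothetical third class for illustration.
PRIORITY = ["coding", "flanking", "intergenic"]


def assign_class(classes):
    """Return the single highest-priority class for one SNP."""
    for name in PRIORITY:
        if name in classes:
            return name
    raise ValueError(f"no known sequence class in {classes!r}")


def rates_per_class(snp_classes, sites_per_class):
    """Return SNPs per site for each class, using class-specific site counts."""
    counts = Counter(assign_class(c) for c in snp_classes)
    return {
        name: counts[name] / n_sites
        for name, n_sites in sites_per_class.items()
        if n_sites > 0
    }


# Hypothetical toy input: four SNPs, one of them in both coding and flanking sequence.
snps = [{"coding", "flanking"}, {"flanking"}, {"coding"}, {"intergenic"}]
sites = {"coding": 1000, "flanking": 500, "intergenic": 200}
print(rates_per_class(snps, sites))
```

Dividing by class-specific site counts keeps large and small sequence classes comparable, which raw SNP counts would not.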
Fig. 4.—Phylogenies of (a) Entamoeba histolytica and (b) Entamoeba moshkovskii strains based upon diversity in 4D synonymous sites. The trees were generated using a Neighbor-Joining method and are unrooted. Asterisks at branching points indicate bootstrapping values of 1,000 out of 1,000. Branching points lacking values were not supported by bootstrapping.