Jenny G Maloney, Aleksey Molokin, Gloria Solano-Aguilar, Jitender P Dubey, Monica Santin.
Abstract
Giardia duodenalis is a pathogenic intestinal protozoan parasite of humans and many other animals. Giardia duodenalis is found throughout the world, and infection is known to have adverse health consequences for human and other mammalian hosts. Yet, many aspects of the biology of this ubiquitous parasite remain unresolved. Whole genome sequencing and comparative genomics can provide insight into the biology of G. duodenalis by helping to reveal traits that are shared by all G. duodenalis assemblages or unique to an individual assemblage or strain. However, these types of analyses are currently hindered by the lack of available G. duodenalis genomes, due, in part, to the difficulty in obtaining the genetic material needed to perform whole genome sequencing. In this study, a novel approach using a multistep cleaning procedure coupled with a hybrid sequencing and assembly strategy was assessed for use in producing high quality G. duodenalis genomes directly from cysts obtained from feces of two naturally infected hosts, a cat and dog infected with assemblage A and D, respectively. Cysts were cleaned and concentrated using cesium chloride gradient centrifugation followed by immunomagnetic separation. Whole genome sequencing was performed using both Illumina MiSeq and Oxford Nanopore MinION platforms. A hybrid assembly strategy was found to produce higher quality genomes than assemblies from either platform alone. The hybrid G. duodenalis genomes obtained from fecal isolates (cysts) in this study compare favorably for quality and completeness against reference genomes of G. duodenalis from cultured isolates. The whole genome assembly for assemblage D is the most contiguous genome available for this assemblage and is an important reference genome for future comparative studies. The data presented here support a hybrid sequencing and assembly strategy as a suitable method to produce whole genome sequences from DNA obtained from G. duodenalis cysts which can be used to produce novel reference genomes necessary to perform comparative genomics studies of this parasite.
Keywords: Assemblage; Assembly; Genome; Giardia; Illumina MiSeq; Long-read sequencing; MinION
Year: 2022 PMID: 35909595 PMCID: PMC9325754 DOI: 10.1016/j.crmicr.2022.100114
Source DB: PubMed Journal: Curr Res Microb Sci ISSN: 2666-5174
Summary of available Giardia spp. genomes including details of type of isolate (feces/cysts or culture/trophozoites), host, and sequencing platform. In bold are genomes obtained from DNA extracted from cysts.
| Year | Assemblages sequenced (isolate ID) | Type of isolates | Host | Sequencing platform | Citations |
|---|---|---|---|---|---|
| 2020 | A1 (WB) | Culture | Human | Oxford Nanopore and Illumina | |
| 2020 | | | Mice | PacBio and Illumina | |
| 2020 | A1 (WB/C6) | Culture | Human | PacBio and Illumina | |
| 2019 | | | Dogs | Illumina | |
| 2019 | A1 (ZX15) | Culture | Human | Illumina | |
| 2018 | A1 (18 isolates) | Culture | Human | Illumina | |
| 2017 | A2 (12 isolates) | Culture | Human | Illumina | |
| 2015 | A1 (2 isolates) | Culture | Human | Illumina | |
| 2015 | B (BAH15c1) | Culture | Human | 454 Life Science | |
| 2015 | | | Human | SOLiD | |
| 2015 | A2 (AS175) | Culture | Human | 454 Life Science | |
| 2013 | A2 (DH) | Culture | Human | 454 Life Science | |
| 2010 | E (P15) | Culture | Pig | 454 Life Science | Jerlstrom-Hultqvist et al. 2010 |
| 2009 | B (GS) | Culture | Human | 454 Life Science | Franzen et al. 2009 |
| 2007 | A1 (WB/C6) | Culture | Human | Sanger | Morrison et al. 2007 |
ATCC 30957.
ATCC 50580.
Obtained from Waterborne.
ATCC 50803.
A single cyst was used.
Pools of 40 cysts were used.
Refer to Tsui et al. (2018) for specific isolate identification.
No isolate identification provided.
Refer to Prystajecky et al. (2015) for specific isolate identification.
ATCC 50581.
Assessment of read taxonomy for Cat Isolate A (CIA) and Dog Isolate D (DID) from reads obtained using Illumina MiSeq and Oxford Nanopore MinION sequencing platforms.
| Isolate | Sequencing platform | Total read pairs | Percentage of reads aligned to NCBI non redundant (nr/nt) database | Percentage of reads aligned to | Percentage of reads aligned to bacteria reference |
|---|---|---|---|---|---|
| CIA | MiSeq | 7,171,780 | 85.3% | 80.0% | 18.9% |
| CIA | MinION | 158,089 | 99.8% | 96.1% | 2.2% |
| DID | MiSeq | 32,319,608 | 61.8% | 84.1% | 15.0% |
| DID | MinION | 176,926 | 89.3% | 90.2% | 9.1% |
Includes only those reads which aligned to the NCBI non-redundant (nr/nt) database.
Comparisons between assemblies from Illumina reads, MinION reads, and hybrids for Cat Isolate A (CIA) and Dog Isolate D (DID).
| Isolate | Features | SPAdes Illumina only | Canu MinION only | SPAdes hybrid | MaSuRCA hybrid |
|---|---|---|---|---|---|
| CIA | Number of scaffolds | 526 | 201 | 273 | 93 |
| CIA | Total length (Mb) | 10.4 | 10.1 | 10.7 | 10.7 |
| CIA | N50 (Kb) | 58.3 | 83.5 | 132.7 | 149.1 |
| CIA | L50 | 63 | 43 | 28 | 18 |
| CIA | Max scaffold length (Kb) | 192.5 | 206.3 | 340.8 | 644.7 |
| CIA | Scaffolds > 50 Kb | 77 | 83 | 79 | 68 |
| CIA | Percentage of genome in scaffolds > 50 Kb | 57.5 | 76.2 | 87.8 | 93.2 |
| DID | Number of scaffolds | 2821 | 824 | 2157 | 260 |
| DID | Total length (Mb) | 11.6 | 7.6 | 13.7 | 13.1 |
| DID | N50 (Kb) | 8.1 | 12.1 | 16.4 | 93.1 |
| DID | L50 | 251 | 189 | 162 | 37 |
| DID | Max scaffold length (Kb) | 131.8 | 56.8 | 258.1 | 443.4 |
| DID | Scaffolds > 50 Kb | 26 | 2 | 46 | 80 |
| DID | Percentage of genome in scaffolds > 50 Kb | 15.8 | 1.4 | 25.7 | 72.7 |
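The contiguity features in this table follow standard definitions, which the following minimal Python sketch illustrates. It assumes the usual conventions (N50 is the length of the scaffold at which the cumulative length of scaffolds, sorted longest first, reaches half of the assembly; L50 is the number of scaffolds needed to get there). The function names and the toy scaffold lengths are hypothetical and are not taken from the study or from any assembler's output.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of scaffold lengths in bases."""
    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Stop at the first scaffold where the running sum covers >= 50% of the assembly.
        if 2 * running >= total:
            return length, count
    raise ValueError("no scaffold lengths given")


def percent_in_long_scaffolds(lengths, min_length=50_000):
    """Percentage of the assembly contained in scaffolds longer than min_length."""
    lengths = list(lengths)
    total = sum(lengths)
    long_total = sum(length for length in lengths if length > min_length)
    return 100.0 * long_total / total


# Toy lengths (hypothetical, not data from the study).
toy = [80_000, 60_000, 40_000, 20_000]
print(n50_l50(toy))                    # (60000, 2)
print(percent_in_long_scaffolds(toy))  # 70.0
```

Read this way, a higher N50 together with a lower L50 (as for the MaSuRCA hybrid columns) indicates that fewer, longer scaffolds account for half of the assembly.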
Mapping of Cat Isolate A (CIA) contigs to reference genomes (Assemblage A1, A2, B, and E).
| Feature | A1/WB (GCA_000002435.2) | A1/WB (GCA_011634545.1) | A2/DH (GCA_000498715.1) | B/GS (GCA_000498735.1) | E/P15 (GCA_000182665.1) |
|---|---|---|---|---|---|
| No. of scaffolds mapped | 93 | 93 | 93 | 84 | 92 |
| Scaffolds mapped (%) | 100 | 100 | 100 | 90.3 | 98.9 |
| No. of bases mapped | 10,379,576 | 10,503,494 | 10,321,125 | 1,416,987 | 9,205,450 |
| Bases mapped (%) | 97.0 | 98.1 | 96.4 | 13.2 | 86.0 |
| Base variation (%) | 1.7 | 1.9 | 2.4 | 17.7 | 13.9 |
| No. of reference scaffolds | 35 | 37 | 239 | 543 | 820 |
| No. reference bases | 12,078,186 | 11,696,115 | 10,703,894 | 12,009,633 | 11,522,052 |
| Reference scaffolds with any coverage (%) | 42.8 | 89.2 | 78.2 | 32.0 | 26.9 |
| Reference bases covered (%) | 85.2 | 88.6 | 92.7 | 11.6 | 78.9 |
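As a quick consistency check, the "Scaffolds mapped (%)" row can be reproduced from the counts in the row above it. The sketch below assumes the denominator is the 93 scaffolds of the CIA MaSuRCA hybrid assembly reported in the assembly comparison table; that link between the two tables is an inference, and the helper name `percent` is hypothetical.

```python
def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Assumption: 93 is the scaffold count of the CIA MaSuRCA hybrid assembly.
CIA_SCAFFOLDS = 93

# Number of CIA scaffolds mapped to each reference, copied from the table.
scaffolds_mapped = {"A1/WB": 93, "A2/DH": 93, "B/GS": 84, "E/P15": 92}

for reference, mapped in scaffolds_mapped.items():
    print(reference, percent(mapped, CIA_SCAFFOLDS))
# A1/WB 100.0
# A2/DH 100.0
# B/GS 90.3
# E/P15 98.9
```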
Mapping of Dog Isolate D (DID) contigs to reference genomes (Assemblage A1, A2, B, C, D, and E).
| Feature | A1/WB (GCA_000002435.2) | A2/DH (GCA_000498715.1) | B/GS (GCA_000498735.1) | C/Cyste1 (GCA_902209425.1) | D/Cyste2 (GCA_902221465.1) | D/Cyste4 (GCA_902221485.1) | D/Pool5 (GCA_902221535.1) | E/P15 (GCA_000182665.1) |
|---|---|---|---|---|---|---|---|---|
| No. of scaffolds mapped | 70 | 67 | 82 | 195 | 231 | 231 | 231 | 69 |
| Scaffolds mapped (%) | 26.9 | 25.8 | 31.5 | 75.0 | 88.8 | 88.8 | 88.8 | 26.5 |
| No. of bases mapped | 251,458 | 243,027 | 331,274 | 3,698,229 | 13,119,130 | 13,063,264 | 13,067,073 | 292,241 |
| Bases mapped (%) | 1.9 | 1.8 | 2.5 | 28.2 | 99.9 | 99.5 | 99.6 | 2.2 |
| Base variation (%) | 16.6 | 16.6 | 17.1 | 17.8 | 2.0 | 2.0 | 2.0 | 17.0 |
| No. of reference scaffolds | 35 | 239 | 543 | 3388 | 3269 | 2885 | 3489 | 820 |
| No. reference bases | 12,078,186 | 10,703,894 | 12,009,633 | 11,557,310 | 11,374,926 | 11,268,649 | 11,499,674 | 11,522,052 |
| Reference scaffolds with any coverage (%) | 20.0 | 22.6 | 17.3 | 9.9 | 43.1 | 44.6 | 41.1 | 9.5 |
| Reference bases covered (%) | 2.2 | 2.2 | 2.7 | 30.2 | 91.2 | 91.8 | 90.5 | 2.4 |
Percentage of BUSCOs from Eukaryota dataset present in ORFs for genomes generated in this study, Cat Isolate A (CIA) and Dog Isolate D (DID), and four selected reference genomes for assemblages A and D.
| BUSCOs (%) | A (CIA) | D (DID) | A1 (WB/C6) | A1 (WB/C6) | D (Dog1/cyst4) | D (Dog5/pool5) |
|---|---|---|---|---|---|---|
| Complete | 23.9 | 23.6 | 20.4 | 23.9 | 24.3 | 23.5 |
| Fragmented | 5.9 | 6.7 | 6.7 | 5.5 | 5.9 | 6.3 |
| Total | 29.8 | 30.3 | 27.1 | 29.4 | 30.2 | 29.8 |
Morrison et al. (2007).
Xu et al. (2020b).
Kooyman et al. (2019).