Rodney D Adam, Eric W Dahlstrom, Craig A Martens, Daniel P Bruno, Kent D Barbian, Stacy M Ricklefs, Matthew M Hernandez, Nirmala P Narla, Rima B Patel, Stephen F Porcella, Theodore E Nash.
Abstract
Giardia lamblia (syn G. intestinalis, G. duodenalis) is the most common pathogenic intestinal parasite of humans worldwide and is a frequent cause of endemic and epidemic diarrhea. G. lamblia is divided into eight genotypes (A-H) which infect a wide range of mammals and humans, but human infections are caused by Genotypes A and B. To unambiguously determine the relationship among genotypes, we sequenced GS and DH (Genotypes B and A2) to high depth coverage and compared the assemblies with the nearly completed WB genome and draft sequencing surveys of Genotypes E (P15; pig isolate) and B (GS; human isolate). Our results identified DH as the smallest Giardia genome sequenced to date, while GS is the largest. Our open reading frame analyses and phylogenetic analyses showed that GS was more distant from the other three genomes than any of the other three were from each other. Whole-genome comparisons of DH_A2 and GS_B with the optically mapped WB_A1 demonstrated substantial synteny across all five chromosomes but also included a number of rearrangements, inversions, and chromosomal translocations that were more common toward the chromosome ends. However, the WB_A1/GS_B alignment demonstrated only about 70% sequence identity across the syntenic regions. Our findings add to information presented in previous reports suggesting that GS is a different species of Giardia as supported by the degree of genomic diversity, coding capacity, heterozygosity, phylogenetic distance, and known biological differences from WB_A1 and other G. lamblia genotypes.
Keywords: diplomonad; genotype; heterozygosity; parasitology; synteny
Year: 2013 PMID: 24307482 PMCID: PMC3879983 DOI: 10.1093/gbe/evt197
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
A Comparison of Genomic Features of Giardia Genomes Sequenced to Date
| Isolate_Genotype | WB_A1 | GS_B | GB_B | Pig_E | DH_A2 |
|---|---|---|---|---|---|
| Bases in contigs | 11,213,615 | 12,017,449 | 11,001,532 | 11,522,052 | 10,703,894 |
| Percent-coding region (%) | 82.76 | 86.35 | 74.44 | 79.62 | 89.54 |
| GC (%) | 49.25 | 48.25 | 47.26 | 47.24 | 49.04 |
| Heterozygosity (%) | <0.01 | 0.425 | 0.0023 | 0.037 | |
| Contigs | 92 | 544 | 2,931 | 820 | 239 |
| Largest contig size (bp) | 1,886,627 | 255,388 | 149,277 | 191,544 | 434,863 |
| Mean contig length (bp) | 121,887 | 22,090 | 3,753 | 14,051 | 44,786 |
| Protein-coding ORFs | 5,901 | 7,477 | 4,470 | 5,008 | 6,724 |
| ORFs with assigned function | 2,905 | 3,946 | 2,842 | 2,752 | 2,900 |
| ORFs without assigned function | 2,996 | 3,531 | 1,628 | 2,253 | 3,824 |
| ORFs in asserted pathways | 710 | 942 | 841 | 680 | 656 |
| ORFs not in asserted pathways | 5,191 | 6,535 | 3,629 | 4,328 | 6,068 |
| ORFs with assigned function, but no pathway | 2,196 | 3,005 | 2,005 | 2,073 | 2,245 |
a. Number of scaffolds. The WB isolate has 306 contigs.
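The mean contig lengths in the table can be cross-checked from the two rows above them. The following is a minimal sketch, assuming the mean is simply total bases divided by the number of contigs (or scaffolds, for WB_A1) and truncated to a whole base; that this is how the authors derived the values is an assumption, although it reproduces the tabulated numbers. The names `assemblies` and `mean_contig_length` are illustrative only.

```python
# Values copied from the table above: (bases in contigs, number of contigs).
assemblies = {
    "WB_A1": (11_213_615, 92),
    "GS_B": (12_017_449, 544),
    "GB_B": (11_001_532, 2_931),
    "Pig_E": (11_522_052, 820),
    "DH_A2": (10_703_894, 239),
}


def mean_contig_length(total_bases: int, n_contigs: int) -> int:
    """Mean contig length in bp, truncated to a whole base (assumed rounding)."""
    return total_bases // n_contigs


mean_lengths = {
    name: mean_contig_length(bases, contigs)
    for name, (bases, contigs) in assemblies.items()
}

for name, length in mean_lengths.items():
    print(f"{name}: {length:,} bp")
```

The same kind of check applies to the ORF rows: ORFs in asserted pathways plus ORFs not in asserted pathways should equal the protein-coding ORF total for each isolate.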
Comparison of vsp Gene Search Methods and Results for Each Giardia Isolate
| Isolate_Genotype | CRGKA Word Search (a) | Meets One of Three Criteria (b) | Meets All Three Criteria Plus psi-Blast (c) | Ratio of vsp Genes to Total ORFs (%) (d) |
|---|---|---|---|---|
| Pig_E | 123 | 189 | 104 | 3.8 |
| DH_A2 | 121 | 190 | 94 | 2.8 |
| WB_A1 | 186 | 244 | 119 | 4.1 |
| GS_B | 275 | 503 | 197 | 6.7 |
| GB_B | 14 | 41 | 10 | 0.92 |
a. Numbers were obtained by word search using the conserved C-terminal sequence CRGKA, +/− a single amino acid variation.
b. Numbers represent the total vsp genes found in at least one of three separate analyses. The first is the CRGKA word search shown in the second column. Second, we performed a Blast search against the ORFs from each genome using a conserved encoded 38 amino acid VSP tail sequence. Third, we used a keyword search of Pfam descriptions.
c. Number of vsp genes found in all three analyses and also in a psi-Blast search using the same 38 amino acid tail sequence, with the requirement of an alignment length greater than 29 residues and a mismatch rate less than 11.
d. The ratio of vsp genes fulfilling one of the first three criteria (third column) to the total ORFs for each genome (expressed as a percent).
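The word search in footnote a and the ratio in footnote d can be illustrated with a short sketch. It assumes that "a single amino acid variation" means at most one substitution within the five-residue tail and that the motif must sit at the very C-terminus of the translated ORF; both are assumptions, not details given in the source. The function names and toy sequences are hypothetical.

```python
MOTIF = "CRGKA"  # conserved VSP C-terminal sequence named in footnote a


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def has_vsp_tail(protein: str, motif: str = MOTIF, max_mismatch: int = 1) -> bool:
    """True if the protein ends in the motif, allowing up to max_mismatch substitutions."""
    seq = protein.rstrip("*").upper()  # drop a trailing stop-codon symbol if present
    tail = seq[-len(motif):]
    return len(tail) == len(motif) and hamming(tail, motif) <= max_mismatch


def vsp_percent(n_vsp: int, total_orfs: int) -> float:
    """vsp genes as a percentage of all ORFs, as described in footnote d."""
    return round(100 * n_vsp / total_orfs, 2)


# Toy sequences (hypothetical, not taken from any Giardia genome).
print(has_vsp_tail("MKTLLCRGKA"))  # exact motif at the C-terminus
print(has_vsp_tail("MKTLLCRGKV"))  # one substitution
print(has_vsp_tail("MKTLLCAGKV"))  # two substitutions
print(vsp_percent(503, 7477))      # GS_B counts from the two tables
```

Using the "Meets One of Three" counts with the protein-coding ORF totals from the first table gives percentages that agree with the last column after rounding (for example, 503 of 7,477 ORFs for GS_B).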
Venn diagram of the common and unique full-length ORFs of Giardia lamblia isolates. The diagram shows both unique and shared gene content of four G. lamblia genomes as derived by ortholog analysis. Numbers in parentheses represent unique numbers of ORFs per genome and within intersections between genomes. Numbers not in parentheses represent conserved ORFs within intersections of comparison that are unique relative to the other genomes.
Phylogenetic analysis of ribosomal subunit S12E genes from Giardia genomes and other representative protozoans. The ribosomal subunit S12E genes from each of the four Giardia genotypes were aligned with the S12E genes from Trypanosoma cruzi, Trypanosoma brucei, Theileria parva, Theileria annulata, Cryptosporidium parvum, Cryptosporidium hominis, Leishmania infantum, Leishmania major, Plasmodium falciparum, Plasmodium knowlesi, Trichomonas vaginalis, and Naegleria gruberi. Trees were constructed using Bayesian inference. The posterior tree is shown. The horizontal scale line represents the number of base substitutions per site analyzed. Numbers at the nodes represent the posterior probability.
Comparative chromosomal sequence alignment between Giardia Genotypes WB_A1 and DH_A2 and between WB_A1 and GS_B. Each horizontal panel shows one chromosome sequence, the name of the sequence, a scale giving the DNA sequence coordinates for that chromosome, and a single black center line with colored blocks sitting above or below it. The colored blocks are regions of conserved DNA that are internally free of genome rearrangements. These blocks are referred to as locally collinear blocks (LCBs) and represent entirely collinear and homologous sequence between the two genomes. LCBs that lie above the center line are oriented in the forward direction relative to the reference genome (WB_A1) (Perry et al. 2011). Blocks below this line are oriented as the reverse complement relative to the reference chromosome. Red vertical lines that start at the top of the LCBs and extend an equal distance below them represent contig boundaries. For WB_A1, each of the five chromosomes is illustrated as a single contig. Therefore, only two red lines are shown for WB_A1, indicating the ends of the chromosomes. White regions between LCBs represent sequence that lacks detectable homology in the other genome. Within each LCB, the height of the color corresponds to the average conservation within that LCB. Segments that are completely white within an LCB align poorly and most likely contain sequence specific to that chromosome that is still collinear with the sequence surrounding it. The height of the color, or similarity profile, within the LCBs is calculated to be inversely proportional to the average alignment column entropy over a region of the alignment. The boundaries of the LCBs represent breakpoints of genome rearrangement, while blank adjacent regions are isolate-specific sequence gained or lost in the breakpoint region. Colored lines connecting LCBs or non-LCBs between the two chromosomes represent homologous regions.
(A) Mauve visual depiction of chromosomal alignments between WB_A1 and DH_A2. Brackets represent specific contigs discussed in the text. The “H” designates a junction verified by PCR (see supplementary file S1, Supplementary Material online, for more detail). (B) Mauve visual depiction of chromosomal alignments between WB_A1 and GS_B. Brackets represent specific contigs discussed in the text. The “H” designates a junction verified by PCR.
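The caption describes the height of the similarity profile as inversely related to the average alignment column entropy over a region. The sketch below illustrates that idea only; it is not Mauve's implementation. The linear mapping from entropy to height, the fixed non-overlapping window, and the treatment of the gap character as a fifth symbol are all assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the symbols in one alignment column."""
    total = len(column)
    counts = Counter(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def similarity_profile(aligned: list[str], window: int = 3) -> list[float]:
    """Height per window: 1.0 for fully conserved columns, lower as entropy rises.

    `aligned` holds equal-length aligned sequences (at least two rows).
    """
    if len(aligned) < 2:
        raise ValueError("need at least two aligned sequences")
    n_cols = len(aligned[0])
    if any(len(s) != n_cols for s in aligned):
        raise ValueError("aligned sequences must have equal length")

    entropies = [
        column_entropy("".join(seq[i] for seq in aligned)) for i in range(n_cols)
    ]
    # Largest possible column entropy: limited by row count and by the
    # five symbols A, C, G, T and gap (assumed alphabet).
    max_entropy = math.log2(min(len(aligned), 5))

    heights = []
    for start in range(0, n_cols, window):
        chunk = entropies[start:start + window]
        average = sum(chunk) / len(chunk)
        heights.append(1.0 - average / max_entropy)  # assumed linear scaling
    return heights


# Toy pairwise alignment (hypothetical sequences): the first window is
# identical, the second differs at two of three columns.
heights = similarity_profile(["ACGTTA", "ACGAAA"], window=3)
print(heights)
```

In a pairwise alignment such as WB_A1 against DH_A2 or GS_B, each column has entropy 0 when the two bases match and 1 bit when they differ, so under these assumptions the height reduces to the fraction of matching columns in the window.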