
Annotated draft genome sequences of three species of Cryptosporidium: Cryptosporidium meleagridis isolate UKMEL1, C. baileyi isolate TAMU-09Q1 and C. hominis isolates TU502_2012 and UKH1.

Olukemi O Ifeonu1, Marcus C Chibucos1, Joshua Orvis1, Qi Su1, Kristin Elwin2, Fengguang Guo3, Haili Zhang3, Lihua Xiao4, Mingfei Sun5, Rachel M Chalmers2, Claire M Fraser1, Guan Zhu3, Jessica C Kissinger6, Giovanni Widmer7, Joana C Silva8.   

Abstract

Human cryptosporidiosis is caused primarily by Cryptosporidium hominis, C. parvum and C. meleagridis. To accelerate research on parasites in the genus Cryptosporidium, we generated annotated, draft genome sequences of human C. hominis isolates TU502_2012 and UKH1, C. meleagridis UKMEL1, also isolated from a human patient, and the avian parasite C. baileyi TAMU-09Q1. The annotation of the genome sequences relied in part on RNAseq data generated from the oocyst stage of both C. hominis and C. baileyi. The genome assembly of C. hominis is significantly more complete and less fragmented than that available previously, which enabled the generation of a much-improved gene set for this species, with an increase in average gene length of 500 bp relative to the protein-encoding genes in the 2004 C. hominis annotation. Our results reveal that the genomes of C. hominis and C. parvum are very similar in both gene density and average gene length. These data should prove a valuable resource for the Cryptosporidium research community. © FEMS 2016.

Keywords:  C. hominis TU502_2012; Cryptosporidium; Cryptosporidium baileyi; Cryptosporidium meleagridis; annotation; genome assembly

Year:  2016        PMID: 27519257      PMCID: PMC5407061          DOI: 10.1093/femspd/ftw080

Source DB:  PubMed          Journal:  Pathog Dis        ISSN: 2049-632X            Impact factor:   3.166


Cryptosporidium parasites (Phylum: Apicomplexa) infect a wide range of vertebrates, from fish to humans, and are the causative agents of cryptosporidiosis in humans (Upton and Current 1985; Tzipori 1988; Widmer and Sullivan 2012). A recent, large, multicenter study of the etiology of moderate-to-severe diarrhea (MSD) in infants in the developing world found Cryptosporidium hominis to be among the four predominant pathogens associated with MSD in children under 5 years of age (Kotloff et al.2013). Some Cryptosporidium species are capable of zoonotic transmission (Ryan, Fayer and Xiao 2014). Comparative analysis of genomes from diverse Cryptosporidium species and related protists is essential to fully understand the biology, pathology, host specificity and evolution of this genus. The reference C. parvum IOWA II genome (Abrahamsen et al.2004) is essentially complete, with its eight chromosomes distributed among 18 contigs, including full-length chromosomes. In contrast, the reference assembly of C. hominis, based on isolate TU502, published in 2004 (Xu et al.2004), is a highly fragmented draft genome consisting of 1422 contigs. To accelerate research on these pathogens of public health and veterinary significance, we sequenced, assembled and annotated four Cryptosporidium genome sequences belonging to three species as part of a community White Paper undertaking. Two sequences were generated from a species infective to humans, C. hominis isolates TU502_2012 and UKH1. In addition, sequences were generated from the generalist species C. meleagridis, isolate UKMEL1, and from the TAMU-09Q1 isolate of C. baileyi, an avian-infecting parasite. All three species are enteric parasites. Cryptosporidium baileyi can complete its entire life cycle in embryonated chicken eggs, making it a useful laboratory model to address some aspects of Cryptosporidium biology. 
Cryptosporidium meleagridis appears to lack host specificity, as it is known to infect both avian and mammalian species (Akiyoshi et al. 2003). Cryptosporidium hominis UKH1 and C. meleagridis UKMEL1 oocysts were isolated from fecal samples of naturally infected humans. Cryptosporidium meleagridis oocysts were propagated in immunosuppressed adult CD-1 mice, and C. hominis UKH1 in neonatal gnotobiotic pigs. Cryptosporidium hominis TU502_2012 originates from the C. hominis TU502 isolate maintained by serial propagation in gnotobiotic pigs (Tzipori et al. 1994; Xu et al. 2004). Cryptosporidium baileyi oocysts were extracted from experimentally infected embryonated chicken eggs. Prior to isolating DNA, extracted oocysts were purified on density gradients (Widmer, Feng and Tanriverdi 2004) and surface-sterilized with bleach to minimize contamination with host and bacterial DNA. RNA samples were obtained from C. hominis TU502_2012 and C. baileyi TAMU-10GZ1 oocysts <4 months old, and sequenced to high coverage using strand-specific RNAseq (Parkhomchuk et al. 2009). De novo assembly of the genomic reads was performed using MaSuRCA version 1.9 (Zimin et al. 2013) (Table 1).
Table 1.

Summary statistics of whole-genome sequence and transcriptome data, assemblies and annotation.

| Species | C. hominis | C. hominis | C. hominis | C. meleagridis | C. baileyi |
|---|---|---|---|---|---|
| Isolate: DNA | TU502 [a] | TU502_2012 | UKH1 | UKMEL1 | TAMU-09Q1 |
| gDNA Illumina library fragment size (bp) | N/A | 460 | 461 | 517 | 654 |
| No. MiSeq reads | N/A | 6,871,858 | 7,596,410 | 22,862,044 | 6,240,960 |
| No. base pairs | N/A | 1,724,836,358 | 1,906,698,910 | 6,881,475,244 | 1,566,480,960 |
| Assembly size (bp) | 8,743,570 | 9,107,739 | 9,156,091 | 8,973,200 | 8,493,640 |
| No. of contigs | 1422 | 119 | 156 | 57 | 145 |
| Contig N50 | 14,504 | 238,509 | 179,408 | 322,908 | 203,018 |
| Largest contig (bp) | 90,444 | 1,270,815 | 542,781 | 732,862 | 702,637 |
| G + C content (%) | 30.9 | 30.1 | 30.1 | 31.0 | 24.3 |
| No. protein-coding genes | 3,885 | 3,745 | 3,765 | 3,758 | 3,692 |
| Average gene length (bp) | 1,360 | 1,892 | 1,830 | 1,844 | 1,778 |
| Percent coding | 60.4% | 77.8% | 75.2% | 77.2% | 77.3% |
| Accession no. | AAEL00000000 | JIBM00000000 | JIBN00000000 | JIBK00000000 | JIBL00000000 |
| SNPs relative to TU502 [a], synonymous : non-syn | | 1303 : 2,567 | 718 : 1336 | N/A | N/A |
| SNPs relative to TU502_2012, synonymous : non-syn | | N/A | 143 : 339 | N/A | N/A |
| Isolate: RNA | | TU502_2012 | UKH1 | UKMEL1 | TAMU-10GZ1 |
| No. HiSeq read pairs | | 16,568,115 | 92,878,236 | N/A | 55,829,305 |
| No. expressed genes [b] | | 1,868 | 2,454 | N/A | 2,235 |
| Accession no. | | SRX481527 | SRX481475 | N/A | SRX481530 |

[a] 2004 assembly (Xu et al. 2004).

[b] Minimum 5X CDS coverage.
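Table 1 reports a contig N50 for each assembly. The text does not say how the statistic was computed, so the following is only a minimal sketch of the usual definition: the length of the shortest contig among the longest contigs that together cover at least half of the assembly. The function name and the toy contig lengths are hypothetical and are not taken from the paper.

```python
def contig_n50(lengths):
    """Return the N50 (bp) of a collection of contig lengths.

    Assumes the common definition: sort contigs from longest to shortest
    and report the length at which the running total first reaches half
    of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly, not data from the paper.
toy_contigs = [50, 40, 30, 20, 10]
print(contig_n50(toy_contigs))
```

A larger N50 with fewer contigs, as seen for TU502_2012 relative to the 2004 TU502 assembly, indicates a less fragmented assembly.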

All the genomes except C. hominis UKH1 were annotated using a semi-automated approach. We trained Augustus (Stanke et al. 2004) using a set of previously manually curated genes. The consensus predictor EVidenceModeler (EVM; Haas et al. 2008) was used to generate annotations based on predictions from Augustus and GeneMark-ES (Borodovsky and Lomsadze 2011), transcripts assembled from RNAseq reads and matches to a set of highly conserved eukaryotic genes, the Core Eukaryotic Genes Mapping Approach (CEGMA) genes (Parra, Bradnam and Korf 2007). In addition, 394 genes (∼10% of all genes) in the C. hominis TU502_2012 genome were manually annotated using Web Apollo (Lee et al. 2013). The manually curated genes are thought to encode antigens (Ifeonu et al., in preparation). The C. hominis TU502_2012 genes were mapped to the C. hominis UKH1 assembly using GMAP (v2015-12-31), and filtered to retain only matches that cover at least 95% of the sequence and have ≥95% alignment identity at the amino acid level. The final assembly attributes are listed in Table 1. This Whole Genome Shotgun project has been deposited in DDBJ/EMBL/GenBank under the accession numbers listed in Table 1 and the sequences are accessible at CryptoDB (http://CryptoDB.org). These are the first versions of genome sequence assemblies and annotations for each isolate. The genome of C. hominis isolate TU502 has been sequenced previously (Xu et al. 2004). We resequenced the genome of this isolate, after multiple passages, in an attempt to improve the reference genome assembly and gene set for this species. The resulting C. hominis TU502_2012 genome assembly consists of only 119 contigs, a 10-fold reduction relative to the 2004 assembly. The genome assembly is now more complete, and roughly the same size as that of C.
parvum, which is also 9.1 Mbp in length (Abrahamsen et al.2004). The genes in the new annotation are on average 500 bp longer than their counterparts in the original 2004 annotation, resulting in an increase of 17% in the fraction of the genome that encodes for proteins. In order to determine if this gene structural annotation is more accurate than the one published in 2004, we compared the length of all C. parvum IOWA II proteins with their orthologs in either C. hominis TU502 or C. hominis TU502_2012. The distribution of length differences based on the comparison to the 2012 reannotation indeed has lower variance, with an additional 500 genes similar in length between the two species (Fig. 1). Also, there are 538 C. parvum genes without orthologs in the C. hominis TU502 2004 annotation compared to only 288 such cases in the 2012 annotation. Interestingly, while the original C. hominis annotation had a preponderance of genes shorter than their C. parvum orthologs, the current gene set is skewed in the opposite direction (Fig. 1). Whether this difference is real, or a result of remaining gene structure errors in one or both species, remains to be determined. The C. hominis TU502_2012 annotation contains 206 predicted protein-coding genes with no orthologs in C. parvum IOWA II. Of the 3745 predicted protein-coding genes in C. hominis TU502_2012, only 63% are also found in all other annotated Cryptosporidium genomes available to date: C. parvum IOWA II, C. meleagridis UKMEL1, C. baileyi TAMU-09Q1 and C. muris RN66 (Fig. 1). Finally, 110 predicted protein-coding genes are present in the three newly sequenced genomes, but homologs are absent in the current C. parvum predicted proteome. These significant differences in gene content among species are, in all likelihood, due mostly to the limitations of the semi-automated annotation approach used, rather than to true instances of gene gain/loss. 
An intense, manual curation effort of the genome annotation of each species is ongoing, and will be essential to validate these results.
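The headline comparisons between the 2004 and 2012 C. hominis TU502 assemblies can be re-derived from the values in Table 1. The short check below is a sketch only; the variable names are illustrative, and reading the reported 17% increase in coding fraction as a difference in percentage points is an assumption on our part.

```python
# Values copied from Table 1: 2004 TU502 assembly vs. TU502_2012.
old = {"contigs": 1422, "avg_gene_len_bp": 1360, "pct_coding": 60.4}
new = {"contigs": 119, "avg_gene_len_bp": 1892, "pct_coding": 77.8}

# Ratio of contig counts (how many times fewer contigs).
fold_fewer_contigs = old["contigs"] / new["contigs"]
# Difference in average gene length.
gene_len_gain_bp = new["avg_gene_len_bp"] - old["avg_gene_len_bp"]
# Difference in percent coding, in percentage points (assumed reading).
coding_gain_points = new["pct_coding"] - old["pct_coding"]

print(f"{fold_fewer_contigs:.1f}-fold fewer contigs")
print(f"{gene_len_gain_bp} bp longer genes on average")
print(f"{coding_gain_points:.1f} percentage points more coding sequence")
```

These give roughly a 12-fold reduction in contig number, a 532 bp gain in average gene length and a 17.4-point gain in coding fraction, in line with the rounded figures quoted in the text.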
Figure 1.

Inter- and intraspecies genome-wide comparisons of genome composition. (A) Comparison of protein length between C. parvum and the 2004 and 2012 versions of C. hominis TU502. (B) Distribution of orthologous gene clusters in five Cryptosporidium species. (C) Distribution of SNPs and short indels among three C. hominis isolates, TU502, TU502_2012 and UKH1. DNA sequence reads from C. hominis TU502_2012 and UKH1 were mapped against the reference genome assembly of C. hominis TU502, as well as against each other, using BWA (Li and Durbin 2009). SNPs and small indels were identified using GATK (McKenna et al. 2010). Identified variants were further filtered for reliability, according to the following parameter values: (DP < 12) || (QUAL < 50) || (SB > -0.10) || (MQ0 >= 2 && (MQ0/(1.0 * DP)) > 0.1). SNPs were categorized as coding and non-coding, given the assembly and the annotation, using VCFtools.
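The filter expression in the Figure 1 legend can be read as a per-variant predicate. The sketch below is not the authors' pipeline (they used GATK); it is a plain Python restatement that assumes a variant matching the expression is the one that gets excluded, and that DP, QUAL, SB and MQ0 are already parsed from the VCF record as numbers. The function name is hypothetical.

```python
def fails_filter(dp, qual, sb, mq0):
    """Return True if a variant matches the exclusion expression.

    Assumed meaning of the fields: dp = read depth, qual = variant
    quality, sb = strand-bias score, mq0 = number of reads with
    mapping quality zero.
    """
    # Low depth, low quality or strand bias each exclude the variant.
    if dp < 12 or qual < 50 or sb > -0.10:
        return True
    # dp >= 12 at this point, so the division is safe.
    return mq0 >= 2 and (mq0 / (1.0 * dp)) > 0.1


# Hypothetical records, not data from the paper.
print(fails_filter(dp=40, qual=200.0, sb=-5.0, mq0=0))  # well supported
print(fails_filter(dp=8, qual=200.0, sb=-5.0, mq0=0))   # depth too low
```

Checking the depth first mirrors the short-circuit order of the original expression and keeps the MQ0/DP ratio from dividing by zero.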

Genetic differences among C. hominis isolates were identified by read mapping, followed by calling and filtering of single nucleotide polymorphisms (SNPs) and small insertions/deletions (indels). A total of 10 526 sequence variants were identified in C. hominis TU502_2012 relative to the reference C. hominis TU502 assembly; in contrast, only 4394 sequence variants were found between C. hominis UKH1 and the reference C. hominis. Interestingly, the vast majority of the differences relative to the reference TU502 genome are shared between the two new isolates (Fig. 1). A plausible explanation, which remains to be verified, is that these SNPs common to both new isolates are in fact sequencing errors in the original C. hominis TU502 assembly, which was based on low-coverage Sanger sequencing. This, however, does not explain the fact that C. hominis TU502_2012 has more differences relative to TU502 than does UKH1. It is possible that during the approximately 20 passages in gnotobiotic pigs that the C.
hominis TU502_2012 isolate has experienced between 2004 and 2012, the make-up of the parasite population has shifted. In the absence of methods for cloning and expanding single Cryptosporidium sporozoites, the isolates sequenced to date are likely to be heterogeneous populations (Grinberg and Widmer 2016). In fact, high-throughput sequencing of a polymorphic locus demonstrated the presence of multiple alleles in laboratory and natural Cryptosporidium isolates (Widmer et al. 2015). We generated RNAseq data for two of the species, C. hominis and C. baileyi. These data are strand specific, a tremendous advantage when attempting to generate accurate gene-specific expression values in highly gene-dense genomes, where neighboring transcriptional units often overlap (Tretina, Pelle and Silva 2016). The quantity of RNAseq data generated for C. hominis UKH1 was six times that for the TU502_2012 isolate (Table 1). Despite this difference, the relative expression values for each gene are remarkably similar for the two isolates (r² ∼ 0.96; Fig. 2), which supports the robustness of the relative expression results. The RNAseq data generated from oocysts indicate that ∼50% and ∼60% of protein-coding genes are expressed in C. hominis TU502_2012 and C. baileyi, respectively, during this stage of the life cycle (Table 1). Gene expression is also positively correlated between species (r² ∼ 0.51; Fig. 2), with lactate/malate dehydrogenase (LDH), a GDP-fucose transporter, agrin and the ubiquitous heat shock protein 90 (HSP90) being among the most highly expressed genes in both species. LDH and HSP90 have been shown to be among the top nine most highly expressed genes in C. parvum oocysts (Zhang et al. 2012). Genes preferentially expressed in one or the other species may provide a good starting point to investigate biological differences between taxa.
Among the genes that differ most in expression level between the two species are pyridine nucleotide-disulphide oxidoreductase, which has a higher level of expression in C. hominis, and AhpC/TSA family protein, WD repeat-containing protein 82 and DNA mismatch repair protein msh-2, all of which have higher expression levels in C. baileyi.
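The r² values quoted for the isolate and species comparisons are squared correlation coefficients between per-gene expression values. The text does not state the expression unit or any transformation, so the sketch below simply assumes log2(value + 1) on paired per-gene values and a Pearson correlation; the function names and the toy numbers are hypothetical.

```python
import math


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


def log_expression(values):
    """Assumed transformation: log2(value + 1) per gene."""
    return [math.log2(v + 1) for v in values]


# Hypothetical expression values for the same five genes in two samples.
sample_a = [0, 12, 150, 900, 40]
sample_b = [1, 10, 170, 1100, 35]
print(round(r_squared(log_expression(sample_a), log_expression(sample_b)), 3))
```

Because r² depends only on relative values, it is insensitive to a uniform difference in sequencing depth between samples, which is consistent with the close agreement reported between the two C. hominis isolates despite their unequal read counts.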
Figure 2.

Gene expression in Cryptosporidium oocysts is correlated within and between species. (A) Oocyst gene expression is highly correlated between two isolates of C. hominis (r² ∼ 96%). (B) Oocyst gene expression is correlated between C. hominis and C. baileyi (r² ∼ 51%), particularly among the most highly expressed genes.

Work on the Cryptosporidium genomes and their respective annotations is continuing, with particular emphasis on the manual curation of the structure and function of all protein-coding genes. Together with the identification of genes unique to each species and genes with species-specific expression profiles, this work will facilitate the identification of genes responsible for host specificity and other phenotypes relevant to the understanding of cryptosporidiosis.
  23 in total

1.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Authors:  Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo
Journal:  Genome Res       Date:  2010-07-19       Impact factor: 9.043

2.  The MaSuRCA genome assembler.

Authors:  Aleksey V Zimin; Guillaume Marçais; Daniela Puiu; Michael Roberts; Steven L Salzberg; James A Yorke
Journal:  Bioinformatics       Date:  2013-08-29       Impact factor: 6.937

Review 3.  Cryptosporidium within-host genetic diversity: systematic bibliographical search and narrative overview.

Authors:  Alex Grinberg; Giovanni Widmer
Journal:  Int J Parasitol       Date:  2016-03-25       Impact factor: 3.981

4.  Eukaryotic gene prediction using GeneMark.hmm-E and GeneMark-ES.

Authors:  Mark Borodovsky; Alex Lomsadze
Journal:  Curr Protoc Bioinformatics       Date:  2011-09

5.  The genome of Cryptosporidium hominis.

Authors:  Ping Xu; Giovanni Widmer; Yingping Wang; Luiz S Ozaki; Joao M Alves; Myrna G Serrano; Daniela Puiu; Patricio Manque; Donna Akiyoshi; Aaron J Mackey; William R Pearson; Paul H Dear; Alan T Bankier; Darrell L Peterson; Mitchell S Abrahamsen; Vivek Kapur; Saul Tzipori; Gregory A Buck
Journal:  Nature       Date:  2004-10-28       Impact factor: 49.962

6.  The species of Cryptosporidium (Apicomplexa: Cryptosporidiidae) infecting mammals.

Authors:  S J Upton; W L Current
Journal:  J Parasitol       Date:  1985-10       Impact factor: 1.276

7.  Burden and aetiology of diarrhoeal disease in infants and young children in developing countries (the Global Enteric Multicenter Study, GEMS): a prospective, case-control study.

Authors:  Karen L Kotloff; James P Nataro; William C Blackwelder; Dilruba Nasrin; Tamer H Farag; Sandra Panchalingam; Yukun Wu; Samba O Sow; Dipika Sur; Robert F Breiman; Abu Sg Faruque; Anita Km Zaidi; Debasish Saha; Pedro L Alonso; Boubou Tamboura; Doh Sanogo; Uma Onwuchekwa; Byomkesh Manna; Thandavarayan Ramamurthy; Suman Kanungo; John B Ochieng; Richard Omore; Joseph O Oundo; Anowar Hossain; Sumon K Das; Shahnawaz Ahmed; Shahida Qureshi; Farheen Quadri; Richard A Adegbola; Martin Antonio; M Jahangir Hossain; Adebayo Akinsola; Inacio Mandomando; Tacilta Nhampossa; Sozinho Acácio; Kousick Biswas; Ciara E O'Reilly; Eric D Mintz; Lynette Y Berkeley; Khitam Muhsen; Halvor Sommerfelt; Roy M Robins-Browne; Myron M Levine
Journal:  Lancet       Date:  2013-05-14       Impact factor: 79.321

8.  Population structure of natural and propagated isolates of Cryptosporidium parvum, C. hominis and C. meleagridis.

Authors:  Giovanni Widmer; Refaat Ras; Rachel M Chalmers; Kristin Elwin; Enas Desoky; Ahmed Badawy
Journal:  Environ Microbiol       Date:  2014-04-02       Impact factor: 5.491

9.  Genotyping of Cryptosporidium parvum with microsatellite markers.

Authors:  Giovanni Widmer; Xiaochuan Feng; Sultan Tanriverdi
Journal:  Methods Mol Biol       Date:  2004

10.  Transcriptome analysis by strand-specific sequencing of complementary DNA.

Authors:  Dmitri Parkhomchuk; Tatiana Borodina; Vyacheslav Amstislavskiy; Maria Banaru; Linda Hallen; Sylvia Krobitsch; Hans Lehrach; Alexey Soldatov
Journal:  Nucleic Acids Res       Date:  2009-07-20       Impact factor: 16.971
