CryptoDB: a Cryptosporidium bioinformatics resource update.

Mark Heiges, Haiming Wang, Edward Robinson, Cristina Aurrecoechea, Xin Gao, Nivedita Kaluskar, Philippa Rhodes, Sammy Wang, Cong-Zhou He, Yanqi Su, John Miller, Eileen Kraemer, Jessica C Kissinger.

Abstract

The database, CryptoDB (http://CryptoDB.org), is a community bioinformatics resource for the AIDS-related apicomplexan parasite Cryptosporidium. CryptoDB integrates whole genome sequence and annotation with expressed sequence tag and genome survey sequence data and provides supplemental bioinformatics analyses and data-mining tools. A simple yet comprehensive web interface is available for mining and visualizing the data. CryptoDB is allied with the databases PlasmoDB and ToxoDB via ApiDB, an NIH/NIAID-funded Bioinformatics Resource Center. Recent updates to CryptoDB include the deposition of annotated genome sequences for Cryptosporidium parvum and Cryptosporidium hominis, migration to a relational database (GUS), a new query and visualization interface and the introduction of Web services.

Substances: Protozoan Proteins

Year: 2006 PMID: 16381902 PMCID: PMC1347441 DOI: 10.1093/nar/gkj078

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The apicomplexan parasite Cryptosporidium is a global causative agent of severe and chronic diarrheal disease in humans and other animals. As no reliable chemotherapy or immunotherapy is currently available, infections can be life threatening for people with a compromised immune system, such as AIDS patients. The pathogen is typically spread via contaminated drinking water and is resistant to water chlorination and filtration (1). Because of the water safety threat to public health, Cryptosporidium is ranked as a Category B Biodefense Pathogen by the National Institutes of Health. Bioinformatics analysis plays an important role in understanding the biology of this medically important parasite and in identifying potential drug targets. To aid the research community in this line of inquiry, the online database CryptoDB continues to update and expand its role of warehousing and interfacing Cryptosporidium genome sequence, annotation, sequence analysis and other Cryptosporidium-related information.

UPDATED DATASET

Version 3.0 of CryptoDB was released in April 2005 and contains the published genome sequence and annotation for Cryptosporidium hominis, strain TU502 (2) and Cryptosporidium parvum, strain IOWA (3). The database houses copies of assembled genome contigs and gene annotations deposited in GenBank (4) by the sequence generators. The C.parvum genome sequence is represented by 18 contigs ranging in length from 17 kb to 1.2 Mb and annotated with a total of 3885 genes. The C.hominis genome sequence is represented by 1422 contigs ranging in length from a few hundred base pairs to 90 kb and annotated with a total of 3956 genes. C.parvum chromosome 6 has been independently sequenced and annotated (5) and is represented in the database. In addition to the data provided by genome sequencing efforts, ∼6 Mb of genome survey sequence (GSS) and expressed sequence tag (EST) (6) data are incorporated. The ESTs are clustered into RNA transcripts and aligned to the genome using the methodology applied at ApiEST-DB (7). Gene annotations provided by the genome sequencing centers are augmented with supplemental analyses. Pre-computed BLASTX analyses of Cryptosporidium contigs versus the GenBank non-redundant protein database and EST alignments to contig sequences offer supporting evidence for gene predictions. Potential syntenic relationships between the C.hominis and C.parvum contig sequences are calculated and graphically displayed. Protein feature predictions of signal peptides and transmembrane domains are provided. Open reading frames >50 and >100 amino acids in length have been calculated for all nucleic acid sequences in all six reading frames. All sequence datasets are available for bulk download in FASTA format. Programmatic access to selected resources is provided via Web service interfaces.
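
The six-frame open reading frame calculation mentioned above can be illustrated with a minimal Python sketch. It is not the CryptoDB pipeline: it assumes the standard genetic code, treats an open reading frame as any stop-to-stop stretch of translated sequence, and uses hypothetical function names (`six_frame_orfs`, `translate`, `reverse_complement`).

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (assumption:
# the source does not state which translation table was used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(seq, min_len=50):
    """Collect stop-to-stop stretches longer than min_len amino acids.

    Returns (strand, frame offset, protein) tuples for all six frames.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for segment in translate(s[frame:]).split("*"):
                if len(segment) > min_len:
                    orfs.append((strand, frame, segment))
    return orfs


# Toy input: a 61-residue reading frame on the forward strand, frame 0.
demo = "ATG" + "GCT" * 60 + "TAA"
hits = six_frame_orfs(demo, min_len=50)
```

Calling the function twice, with `min_len=50` and `min_len=100`, mirrors the two length thresholds described in the text.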

IMPROVED DATABASE AND WEB INTERFACE

CryptoDB 3.0 is backed by a relational database utilizing the Genomics Unified Schema (GUS, GUSdb.org) (8) and Oracle 10g. Migration to a relational database architecture marks a major improvement over previous releases because of the new services and resources that can now be offered. The CryptoDB web interface provides a set of forms through which users can easily query the annotation and pre-computed analysis data (Figure 1A). Queries for contig sequence, gene and protein features are possible and can be restricted to either or both of the hosted species genomes. At the gene level, users can conduct text searches of gene product descriptions, search for genes by RNA type (mRNA, rRNA, snoRNA and tRNA) and find genes having alignments to C.parvum ESTs. For protein features, users can select genes predicted by SignalP (9) to encode a signal peptide or predicted by TMHMM (10) to contain transmembrane domains. Users may also retrieve a specific gene by locus tag or a contig sequence by accession number. Ad hoc data selections not obtainable via the provided queries may be requested by email to help@cryptodb.org.

Figure 1

Database functionality. (A) Searches are initiated via queries provided on the web site's front page. (B) The results are returned as a summary table with links to detailed record pages. (C) Detailed record page with summary of all available data/information for this gene or contig sequence. (D) GBrowse of genomic region of interest provides a graphical view of annotations and similarity analyses. (E) Search results, such as BLAST reports, are linked to detailed records and to a sequence retrieval utility.

Gene pages and contig sequence pages provide a detailed view of annotation and analysis for a given record in the database. Gene pages contain a text overview of the gene, including the coordinate position on its contig and product description when available (Figure 1C). GBrowse (11) has been utilized to provide a graphic display of annotated gene features and of data mapped to the genome, such as BLAST hits and ESTs (Figure 1D). The web interface includes a mechanism to allow users to readily download the sequences and other attributes associated with their query result set. A query history permits users to track their searches and combine them into more complex queries across data types (e.g. ‘list all genes on chromosome 3 that contain transmembrane domains’).
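
Combining earlier searches from a query history amounts to set operations on their result lists. The following Python sketch illustrates the idea only; the gene identifiers are placeholders, and the `combine` helper is hypothetical rather than part of the CryptoDB interface.

```python
def combine(first, second, operation="intersect"):
    """Combine two query results (collections of gene IDs) into a sorted list."""
    first, second = set(first), set(second)
    if operation == "intersect":      # genes found by both queries
        result = first & second
    elif operation == "union":        # genes found by either query
        result = first | second
    elif operation == "subtract":     # genes in the first query only
        result = first - second
    else:
        raise ValueError(f"unknown operation: {operation}")
    return sorted(result)


# Placeholder result sets standing in for two entries of a query history.
genes_on_chromosome_3 = {"gene_0001", "gene_0002", "gene_0003"}
genes_with_tm_domains = {"gene_0002", "gene_0003", "gene_0104"}

# 'list all genes on chromosome 3 that contain transmembrane domains'
both = combine(genes_on_chromosome_3, genes_with_tm_domains, "intersect")
```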

ANALYSIS AND RESEARCH TOOLS

Several tools for data mining augment the published annotations and pre-computed analyses. Users may BLAST their own sequences against the genomic contig, annotated protein, GSS and EST sequence databases (Figure 1E). A motif search tool finds protein sequences with PROSITE (12) or user-defined amino acid patterns. Keywords from the BLASTX results of Cryptosporidium genomic sequences versus the GenBank non-redundant protein database (NRDB) are indexed and searchable. In each case, the results contain links back to detailed gene, protein or contig record pages or to external databases (e.g. GenBank) as appropriate (Figure 1B). To facilitate tracking of the latest literature, PubCrawler (13) is used to poll NCBI's PubMed and GenBank each weekday for new Cryptosporidium-related updates.
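
A motif search of this kind can be approximated by translating a PROSITE-style pattern into a regular expression. The Python sketch below is an illustration under simplifying assumptions, not the CryptoDB implementation: it handles the common pattern elements (`x`, `[...]`, `{...}`, repeat counts and the `<`/`>` anchors outside brackets), and the function names are hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern to a Python regular expression.

    Simplified: anchors inside square brackets are not supported.
    """
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("x", ".")                       # any residue
    regex = regex.replace("{", "[^").replace("}", "]")    # forbidden residues
    regex = regex.replace("(", "{").replace(")", "}")     # repeat counts
    regex = regex.replace("<", "^").replace(">", "$")     # terminal anchors
    return regex


def find_motif(pattern, protein):
    """Return (1-based start, matched text) for all matches, overlaps included."""
    lookahead = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(protein)]


# N-glycosylation site pattern on a made-up peptide.
matches = find_motif("N-{P}-[ST]-{P}.", "MNGSANKTP")
```

The lookahead wrapper is a design choice that reports overlapping occurrences, which a plain `re.finditer` over the pattern would skip.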

AFFILIATIONS

CryptoDB is a member of ApiDB.org, an NIH/NIAID-funded Bioinformatics Resource Center (BRC) for Biodefense and Emerging or Re-Emerging Infectious Diseases. Other ApiDB members include the genome databases for Plasmodium (PlasmoDB) (14,15) and Toxoplasma (ToxoDB) (16) and the Apicomplexan EST database, ApiEST-DB (7). CryptoDB and other member databases are linking to ApiDB in a coordinated effort to promote comparative studies and ease of access across these apicomplexan genomes.

WEB SERVICES AND DATABASE NEWS

To facilitate database integration with ApiDB and other NIAID BRCs, and to allow programmatic access to CryptoDB by others, web services for CryptoDB have been implemented. Web services are pieces of software that can communicate across the Internet to build distributed applications. They can do this regardless of the software used for their implementation as long as they use a common protocol, SOAP (17). CryptoDB uses SOAP and provides published WSDL files and sample client software in Java (using Axis) and Perl (using SOAP::Lite). Currently, one service, which retrieves FASTA sequence files, is active. Additional services and infrastructure (18) are planned. To facilitate the dissemination of news and updates related to CryptoDB, we have established a Really Simple Syndication (RSS) news feed that is displayed on the home page of CryptoDB and ApiDB and can be read by any RSS news aggregator.
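
To make the SOAP exchange more concrete, the Python sketch below builds a SOAP 1.1 request envelope and parses a FASTA-formatted reply. It sends nothing over the network, and every service-specific name in it is an assumption made for illustration: the operation `getFastaSequence`, the parameter `accession`, the namespace `urn:example:sequence-service` and the identifier `EXAMPLE_CONTIG_ID` do not come from the CryptoDB WSDL files, which should be consulted for the real interface.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope
SERVICE_NS = "urn:example:sequence-service"              # hypothetical namespace


def build_request(operation, params):
    """Serialize a SOAP 1.1 envelope calling `operation` with `params`."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_fasta(text):
    """Parse FASTA text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical operation and parameter names, for illustration only.
request_xml = build_request("getFastaSequence", {"accession": "EXAMPLE_CONTIG_ID"})
sequences = parse_fasta(">seq1 demo\nACGT\nTTGA\n")
```

In practice a WSDL-aware client library would generate such envelopes from the published service description, so hand-built XML of this kind is mainly useful for understanding what travels over the wire.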

FUTURE PLANS

CryptoDB is fully funded and staffed with biologists and software developers who have close ties to the software developers for GUS, ToxoDB and PlasmoDB. This fertile ground will support frequent database updates and expansions with new data types, analyses, data-mining tools and visual displays. Gene ontology terms and protein feature signatures from InterProScan (19) analyses will be included with gene records. Improvements to visualization of genome-wide synteny are planned. SRI International's Pathway Tools software (20) is being added to facilitate analyses of metabolic pathways in both annotated Cryptosporidium genome sequences. Future releases of CryptoDB will publish this information as ‘CryptoCyc’ for querying and visualization in a graphical display. To facilitate data sharing, CryptoDB has the capacity to activate a Distributed Annotation Server (DAS) (21) via a DAS-GUS adapter if needed. CryptoDB will continue to work with the ApiDB consortium to further integrate its resources with other apicomplexan genome sites. Database federating technologies, web services and portal designs are currently being implemented toward this end. Data exchange and interoperability with other NIAID Bioinformatics Resource Centers will be a continued effort.

17 in total

1. Creating a bioinformatics nation.

Authors: Lincoln Stein
Journal: Nature Date: 2002-05-09 Impact factor: 49.962

2. The Pathway Tools software.

Authors: Peter D Karp; Suzanne Paley; Pedro Romero
Journal: Bioinformatics Date: 2002 Impact factor: 6.937

3. The generic genome browser: a building block for a model organism system database.

Authors: Lincoln D Stein; Christopher Mungall; ShengQiang Shu; Michael Caudy; Marco Mangone; Allen Day; Elizabeth Nickerson; Jason E Stajich; Todd W Harris; Adrian Arva; Suzanna Lewis
Journal: Genome Res Date: 2002-10 Impact factor: 9.043

4. The Plasmodium genome database.

Authors: Jessica C Kissinger; Brian P Brunk; Jonathan Crabtree; Martin J Fraunholz; Bindu Gajria; Arthur J Milgram; David S Pearson; Jonathan Schug; Amit Bahl; Sharon J Diskin; Hagai Ginsburg; Gregory R Grant; Dinesh Gupta; Philip Labo; Li Li; Matthew D Mailman; Shannon K McWeeney; Patricia Whetzel; Christian J Stoeckert; David S Roos
Journal: Nature Date: 2002-10-03 Impact factor: 49.962

5. PROSITE: a documented database using patterns and profiles as motif descriptors.

Authors: Christian J A Sigrist; Lorenzo Cerutti; Nicolas Hulo; Alexandre Gattiker; Laurent Falquet; Marco Pagni; Amos Bairoch; Philipp Bucher
Journal: Brief Bioinform Date: 2002-09 Impact factor: 11.622

6. Preliminary profile of the Cryptosporidium parvum genome: an expressed sequence tag and genome survey sequence analysis.

Authors: W B Strong; R G Nelson
Journal: Mol Biochem Parasitol Date: 2000-03-15 Impact factor: 1.759

7. Integrated mapping, chromosomal sequencing and sequence analysis of Cryptosporidium parvum.

Authors: Alan T Bankier; Helen F Spriggs; Berthold Fartmann; Bernard A Konfortov; Martin Madera; Christine Vogel; Sarah A Teichmann; Al Ivens; Paul H Dear
Journal: Genome Res Date: 2003-07-17 Impact factor: 9.043

8. ToxoDB: accessing the Toxoplasma gondii genome.

Authors: Jessica C Kissinger; Bindu Gajria; Li Li; Ian T Paulsen; David S Roos
Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971

9. PlasmoDB: the Plasmodium genome resource. A database integrating experimental and computational data.

Authors: Amit Bahl; Brian Brunk; Jonathan Crabtree; Martin J Fraunholz; Bindu Gajria; Gregory R Grant; Hagai Ginsburg; Dinesh Gupta; Jessica C Kissinger; Philip Labo; Li Li; Matthew D Mailman; Arthur J Milgram; David S Pearson; David S Roos; Jonathan Schug; Christian J Stoeckert; Patricia Whetzel
Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971

10. InterProScan: protein domains identifier.

Authors: E Quevillon; V Silventoinen; S Pillai; N Harte; N Mulder; R Apweiler; R Lopez
Journal: Nucleic Acids Res Date: 2005-07-01 Impact factor: 16.971
