ApiDB: integrated resources for the apicomplexan bioinformatics resource center.

Cristina Aurrecoechea, Mark Heiges, Haiming Wang, Zhiming Wang, Steve Fischer, Philippa Rhodes, John Miller, Eileen Kraemer, Christian J Stoeckert, David S Roos, Jessica C Kissinger.

Abstract

ApiDB (http://ApiDB.org) represents a unified entry point for the NIH-funded Apicomplexan Bioinformatics Resource Center (BRC) that integrates numerous database resources and multiple data types. The phylum Apicomplexa comprises numerous veterinary and medically important parasitic protozoa including human pathogenic species of the genera Cryptosporidium, Plasmodium and Toxoplasma. ApiDB serves not only as a database in its own right, but as a single web-based point of entry that unifies access to three major existing individual organism databases (PlasmoDB.org, ToxoDB.org and CryptoDB.org), and integrates these databases with data available from additional sources. Through the ApiDB site, users may pose queries and search all available apicomplexan data and tools, or they may visit individual component organism databases.

Year: 2006 PMID: 17098930 PMCID: PMC1669770 DOI: 10.1093/nar/gkl880

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The phylum Apicomplexa comprises numerous veterinary and medically important parasitic protozoa including human pathogenic species of the genera Cryptosporidium, Plasmodium and Toxoplasma. Multiple species of Plasmodium are capable of causing malaria in humans, a leading cause of morbidity and mortality in developing countries (1). Cryptosporidium causes a severe and chronic diarrheal disease that may be life threatening in immunocompromised patients (2). Toxoplasma gondii infections, although typically asymptomatic in healthy individuals, may lead to congenital birth defects and to encephalitis in HIV/AIDS patients (3). Human infections with T.gondii may be acquired from food or soil contamination, and infections with Cryptosporidium parvum from soil and water contamination. Due to the potential threat to public health from intentional dispersal into the population, T.gondii and C.parvum are listed as Category B Biodefense Pathogens by the National Institutes of Health. The research communities for Cryptosporidium, Plasmodium and Toxoplasma have benefited from the bioinformatics resources provided by the distinct online genome databases CryptoDB (4), PlasmoDB (5) and ToxoDB (6), respectively (see supplementary material). Because these human pathogens are phylogenetically related (all belong to the phylum Apicomplexa, along with the prominent animal pathogens Babesia, Theileria and Eimeria), comparative genomic and proteomic studies across these species are critical for expediting the discovery of therapeutic targets, increasing understanding of parasite biology and enhancing other areas of research on these organisms. However, researchers' ability to perform comparative studies that draw on multiple data sources has been limited by the difficulty of managing and collating data from the existing disparate resource databases.
Here we describe the online apicomplexan Bioinformatics Resource Center (BRC), ApiDB (), which has been established to provide researchers with centralized, integrated access to experimental and computational data, as well as tools to facilitate comparative research. ApiDB integrates the existing CryptoDB, ToxoDB and PlasmoDB component resources. Database integration is accomplished via a combination of federation and link integration technologies (7). Link integration allows researchers to begin their query with one data source and then follow hypertext links to related information in other data sources. Database federation is achieved by decomposing distributed queries into component queries, executing these queries in the source databases and delivering the results in a uniform format. Federation leaves the information in its source databases but builds an environment around them that makes them appear to be part of one large system. In ApiDB 2.0 the federation has been implemented using Oracle DbLink technology. To handle heterogeneous data sources in the future, we are studying other federation approaches, such as Java Database Connectivity (JDBC) () and Web Services (WS) (). ApiDB serves as a web portal for cross-species comparison. Genome data from other apicomplexan parasites are also integrated. In its current release, ApiDB 2.0 offers an initial set of queries that enable gene searches of the three component databases by a variety of criteria such as text keywords, Enzyme Commission (EC) numbers, Gene Ontology (8) assignments and Pfam (9) terms. In addition, ApiDB offers tools to BLAST (10) all public apicomplexan data, access to the multi-species gene orthology database OrthoMCL DB (11) and access to KEGG (12) metabolic pathway maps with ‘painted’ comparative highlights of apicomplexan and human enzymes.
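The federation idea described above (decompose a query, run it in each source database, return results in one uniform format) can be illustrated with a minimal sketch. This is not ApiDB's implementation, which the text says relies on Oracle DbLink. Here three in-memory SQLite databases stand in for the component databases, and the `genes` table, its columns, the gene IDs and the function names are all hypothetical.

```python
import sqlite3

# Hypothetical stand-ins for the component databases. ApiDB itself uses
# Oracle DbLink, not SQLite; the table, columns and rows are invented.
COMPONENTS = {
    "CryptoDB": [("gene_c1", "putative kinase"), ("gene_c2", "hypothetical protein")],
    "PlasmoDB": [("gene_p1", "protein kinase"), ("gene_p2", "transporter")],
    "ToxoDB": [("gene_t1", "kinase domain protein")],
}


def make_source(rows):
    """Create one in-memory source database holding a `genes` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genes (gene_id TEXT, product TEXT)")
    conn.executemany("INSERT INTO genes VALUES (?, ?)", rows)
    return conn


def federated_keyword_search(sources, keyword, selected=None):
    """Run the same component query in each selected source and merge the results.

    The data stay in their source databases; only the query results are
    brought together, as uniform (source, gene_id, product) tuples.
    """
    results = []
    for name, conn in sources.items():
        if selected is not None and name not in selected:
            continue
        cursor = conn.execute(
            "SELECT gene_id, product FROM genes WHERE product LIKE ? ORDER BY gene_id",
            (f"%{keyword}%",),
        )
        results.extend((name, gene_id, product) for gene_id, product in cursor)
    return results


sources = {name: make_source(rows) for name, rows in COMPONENTS.items()}
hits = federated_keyword_search(sources, "kinase")
for source, gene_id, product in hits:
    print(source, gene_id, product, sep="\t")
```

Passing `selected={"ToxoDB"}` restricts the search to one source, loosely mirroring the option to query all or a subset of the component species.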

FUNCTIONALITY OF CURRENT RELEASE

ApiDB 2.0 was released in April 2006. The datasets available in ApiDB include the component databases (CryptoDB, PlasmoDB and ToxoDB); apicomplexan genomic sequences for other species (Theileria annulata and Theileria parva) obtained from NCBI GenBank (13) and GeneDB (14); ApiDoTS [a newer version of ApiEST-DB (15)], a collection of clustered apicomplexan ESTs that includes Eimeria, Gregarina, Neospora, Sarcocystis and Theileria; and unclustered ESTs from dbEST, the EST division of NCBI GenBank. The ApiDB web interface shares its architecture and ‘look and feel’ with the component sites (CryptoDB, PlasmoDB and ToxoDB). The user can interact with four areas on the main page: the sidebar, a tools section, a query section and a menu bar (Figure 1A). The sidebar gives the user access to apicomplexan community resources, from our project's most recent news to PubCrawler (16) and external resources, as well as information on the annual ApiDB training workshop (see supplementary material). The tools section provides access to a BLAST search of Apicomplexa genomic, EST and gene model sequences (Figure 1D) and to OrthoMCL DB and KEGG maps with apicomplexan and human enzymes highlighted. The query section provides queries for gene and protein features that span CryptoDB, PlasmoDB and ToxoDB and may include searches of all or a subset of the component species genomes (Figure 1B). Finally, the menu bar appears on every page and gives access to the user's query history and to information on the datasets used in the database.

Figure 1

Database functionality. (A) Searches are initiated via tools or queries provided on the web site's front page. (B) Query forms allow user-defined refinement of the queries, e.g. which species to search. Descriptions and help are provided for the queries. (C) Query searches return a table with results from the component database sites. The result summary states the number of records found at each site (circled). Each result is linked to a detailed record page at its component database site. (D) The BLAST tool integrates searches of all publicly available apicomplexan genomic data, including species not hosted by a component database. (E) Query results can be downloaded in a customized tab-delimited file.

ApiDB's query architecture provides a set of pages where users can easily execute and manage queries. On the front page, six federated queries that span the component databases are currently available: search genes by gene ID, by annotated keyword in product description (Figure 1B), by Pfam domain, by EC number, by GO term and by BLAST similarity. Upon query selection, the user is presented with a question page where they can refine their search (Figure 1B). When the query is executed, a summary page offers the number of hits for each organism and the list of genes that meet the requirements (Figure 1C). Hyperlinks connect the user to the gene page in the component sites. Gene pages, acquired from the appropriate component database, provide a detailed view of annotation and analysis for the given gene record in the database. For a detailed description of the gene record page, see the component databases (4–6). The query history page, linked in the menu bar, permits users to track their searches and combine them into more complex queries across data types, e.g. find all genes in Cryptosporidium, Plasmodium and Toxoplasma that have a signal peptide and no transmembrane domains.
Summaries of the number of hits for each organism are provided when queries are executed (Figure 1C). As in the component sites, the web interface lets users download the sequences and other attributes associated with their query result set (e.g. gene name, product description, coordinates, length) as a tab-delimited file (Figure 1E) that can be viewed in spreadsheet programs or, if only sequences are desired, in FASTA format. Examples of inquiries that can be performed on ApiDB are provided in the supplementary material.
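The combination of searches through the query history, the per-organism summary and the tab-delimited download can be pictured with a short sketch. It assumes each earlier search yields a set of (organism, gene ID) pairs; the gene IDs, attributes and the `to_tab_delimited` helper are invented for illustration and do not reflect ApiDB's internal data model.

```python
import csv
import io
from collections import Counter

# Hypothetical result sets from two earlier searches, keyed by (organism, gene ID).
signal_peptide = {
    ("Cryptosporidium", "gene_c1"),
    ("Plasmodium", "gene_p1"),
    ("Plasmodium", "gene_p2"),
    ("Toxoplasma", "gene_t1"),
}
transmembrane = {("Plasmodium", "gene_p2"), ("Toxoplasma", "gene_t9")}

# "Has a signal peptide AND NOT a transmembrane domain" is a set difference.
combined = signal_peptide - transmembrane

# Per-organism hit counts, in the spirit of the result summary page.
summary = Counter(organism for organism, _ in combined)

# Hypothetical attributes to include in the download: (product, length).
attributes = {
    "gene_c1": ("secreted protein", 1200),
    "gene_p1": ("surface antigen", 2400),
    "gene_t1": ("hypothetical protein", 900),
}


def to_tab_delimited(records):
    """Serialize a result set as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["organism", "gene_id", "product", "length"])
    for organism, gene_id in sorted(records):
        product, length = attributes[gene_id]
        writer.writerow([organism, gene_id, product, length])
    return buffer.getvalue()


report = to_tab_delimited(combined)
print(dict(summary))
print(report, end="")
```

Other combinations map onto set operations in the same way: intersection for "and", union for "or".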

FUTURE DIRECTIONS

The development of ApiDB will be guided in large part by input from the user communities of ApiDB and the component databases, and by the objectives of the BRCs. An annual workshop on the usage of apicomplexan database resources is not only a valuable opportunity for users to obtain hands-on instruction, but also provides a forum for feedback that further drives development of this site. As the autonomous component databases evolve with new data and features, ApiDB will integrate these elements as appropriate. As data from phylogenetically related species (e.g. ciliates, Perkinsus and perhaps, some day, a dinoflagellate) become increasingly available, orthologous genes will be determined and, where possible, links to these resources will be provided via orthology. The ApiDB website currently excludes queries of datasets not found in all three component databases. For example, PlasmoDB contains microarray-based gene expression data whereas CryptoDB and ToxoDB presently do not. We will investigate permitting the querying of a subset of the component databases through ApiDB. Feature enhancements that will be available by early 2007 include a persistent history to allow users to save their searches and corresponding results, and a ‘sort’ tool to allow users to sort the results they obtain from individual queries (Figure 1C) by species, gene ID or feature description. Combinations of existing tools, pre-formed queries and query history can be leveraged in powerful ways to mine the apicomplexan data but do not lend themselves well to high-throughput explorations. To facilitate large-scale database utilization, we will provide programmatic access to our facilities through standard web service technologies. Modular web service tasks will serve as building blocks for creating a user-defined workflow.
As one example, linking a series of tasks would enable a researcher with a collection of putative gene regulatory motifs to perform a bulk series of BLAST searches of chromosomal sequences, extract hit coordinates from the report, then query for gene models downstream of each hit. As our web services capabilities expand, we will use them to permit workflows that include ClustalW (17) for multiple sequence alignments. Web services may also be used to collaborate with additional databases, including other BRCs (), to offer users integrated access to additional pathogen data.
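The motif workflow mentioned above can be sketched as a chain of small tasks. The sketch below is illustrative only and does not describe ApiDB's planned web services. The BLAST step is not run: a hypothetical three-column, tab-delimited hit report stands in for its output. The class and function names, the 2000 bp window, and the simplification that "downstream" means a gene starting after the hit on the forward strand are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    chromosome: str
    start: int
    end: int


@dataclass(frozen=True)
class GeneModel:
    gene_id: str
    chromosome: str
    start: int
    end: int


def parse_hits(report: str) -> List[Hit]:
    """Extract hit coordinates from a tab-delimited report.

    Assumes a hypothetical layout of chromosome, start, end per line.
    """
    hits = []
    for line in report.strip().splitlines():
        chromosome, start, end = line.split("\t")
        hits.append(Hit(chromosome, int(start), int(end)))
    return hits


def nearest_downstream_gene(
    hit: Hit, genes: List[GeneModel], max_distance: int = 2000
) -> Optional[GeneModel]:
    """Return the closest gene that starts after the hit on the same chromosome.

    Simplification: only the forward strand is considered.
    """
    candidates = [
        g for g in genes
        if g.chromosome == hit.chromosome and 0 < g.start - hit.end <= max_distance
    ]
    return min(candidates, key=lambda g: g.start, default=None)


def run_workflow(report: str, genes: List[GeneModel]) -> List[Tuple[Hit, Optional[GeneModel]]]:
    """Chain the tasks: parse the hits, then look up a downstream gene for each."""
    return [(hit, nearest_downstream_gene(hit, genes)) for hit in parse_hits(report)]


# Invented example data.
report = "chr1\t100\t110\nchr1\t5000\t5010\nchr2\t50\t60\n"
genes = [
    GeneModel("gene_a", "chr1", 600, 1500),
    GeneModel("gene_b", "chr1", 1800, 2600),
    GeneModel("gene_c", "chr2", 9000, 9900),
]
results = run_workflow(report, genes)
for hit, gene in results:
    print(hit.chromosome, hit.start, hit.end, gene.gene_id if gene else "none", sep="\t")
```

Keeping each task as a separate function reflects the building-block idea: any step could be replaced by a call to a remote service without changing the others.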

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

14 in total

1. The Pfam protein families database.

Authors: A Bateman; E Birney; R Durbin; S R Eddy; K L Howe; E L Sonnhammer
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors: M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal: Nat Genet Date: 2000-05 Impact factor: 38.330

3. GeneDB: a resource for prokaryotic and eukaryotic organisms.

Authors: Christiane Hertz-Fowler; Chris S Peacock; Valerie Wood; Martin Aslett; Arnaud Kerhornou; Paul Mooney; Adrian Tivey; Matthew Berriman; Neil Hall; Kim Rutherford; Julian Parkhill; Alasdair C Ivens; Marie-Adele Rajandream; Bart Barrell
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

4. GenBank.

Authors: Dennis A Benson; Ilene Karsch-Mizrachi; David J Lipman; James Ostell; David L Wheeler
Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971

5. PubCrawler: keeping up comfortably with PubMed and GenBank.

Authors: Karsten Hokamp; Kenneth H Wolfe
Journal: Nucleic Acids Res Date: 2004-07-01 Impact factor: 16.971

6. Multiple sequence alignment with Clustal X.

Authors: F Jeanmougin; J D Thompson; M Gouy; D G Higgins; T J Gibson
Journal: Trends Biochem Sci Date: 1998-10 Impact factor: 13.807

7. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors: S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal: Nucleic Acids Res Date: 1997-09-01 Impact factor: 16.971

8. From genomics to chemical genomics: new developments in KEGG.

Authors: Minoru Kanehisa; Susumu Goto; Masahiro Hattori; Kiyoko F Aoki-Kinoshita; Masumi Itoh; Shuichi Kawashima; Toshiaki Katayama; Michihiro Araki; Mika Hirakawa
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

9. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups.

Authors: Feng Chen; Aaron J Mackey; Christian J Stoeckert; David S Roos
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

10. CryptoDB: a Cryptosporidium bioinformatics resource update.

Authors: Mark Heiges; Haiming Wang; Edward Robinson; Cristina Aurrecoechea; Xin Gao; Nivedita Kaluskar; Philippa Rhodes; Sammy Wang; Cong-Zhou He; Yanqi Su; John Miller; Eileen Kraemer; Jessica C Kissinger
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971
