Cristina Aurrecoechea, Ana Barreto, John Brestelli, Brian P Brunk, Shon Cade, Ryan Doherty, Steve Fischer, Bindu Gajria, Xin Gao, Alan Gingle, Greg Grant, Omar S Harb, Mark Heiges, Sufen Hu, John Iodice, Jessica C Kissinger, Eileen T Kraemer, Wei Li, Deborah F Pinney, Brian Pitts, David S Roos, Ganesh Srinivasamoorthy, Christian J Stoeckert, Haiming Wang, Susanne Warrenfeltz.
Abstract
EuPathDB (http://eupathdb.org) resources include 11 databases supporting eukaryotic pathogen genomic and functional genomic data, isolate data and phylogenomics. EuPathDB resources are built using the same infrastructure and provide a sophisticated search strategy system enabling complex interrogations of underlying data. Recent advances in EuPathDB resources include the design and implementation of a new data loading workflow, a new database supporting Piroplasmida (i.e. Babesia and Theileria), the addition of large amounts of new data and data types and the incorporation of new analysis tools. New data include genome sequences and annotation, strand-specific RNA-seq data, splice junction predictions (based on RNA-seq), phosphoproteomic data, high-throughput phenotyping data, single nucleotide polymorphism data based on high-throughput sequencing (HTS) and expression quantitative trait loci data. New analysis tools enable users to search for DNA motifs and define genes based on their genomic colocation, view results from searches graphically (i.e. genes mapped to chromosomes or isolates displayed on a map) and analyze data from columns in result tables (word cloud and histogram summaries of column content). The manuscript herein describes updates to EuPathDB since the previous report published in NAR in 2010.
Year: 2012 PMID: 23175615 PMCID: PMC3531183 DOI: 10.1093/nar/gks1113
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
This table lists EuPathDB resources, their web addresses and the included organisms
| Database | Web address | Supported organisms |
|---|---|---|
| EuPathDB | | All EuPathDB organisms listed below |
| AmoebaDB | | |
| CryptoDB | | |
| GiardiaDB | | |
| MicrosporidiaDB | | |
| PiroplasmaDB | | |
| PlasmoDB | | |
| ToxoDB | | |
| TrichDB | | |
| TriTrypDB | | |
| OrthoMCL | | Includes proteins from over 150 organisms across bacteria, archaea and eukarya. |
Figure 1. Screen shots of a search strategy in PiroplasmaDB and GBrowse representing HTS data (C–E from ToxoDB and F and G from AmoebaDB). (A) A three-step search strategy combining genes with predicted signal peptides, transmembrane domains and microarray expression data. (B) Search strategies may be saved and shared with others using a uniquely generated URL. (C) Peptides from mass spec experiments are mapped to genes and displayed graphically. Mousing over the graphics provides additional information, such as the peptide sequence and any posttranslational modifications. In this image, peptides are from a phosphoproteomic experiment. (D) A track representing strand-specific RNA-seq data. Blue indicates reads mapping to the forward strand, whereas red represents those mapping to the reverse strand. (E) Unified splice junction track representing intron-spanning RNA-seq reads from all experiments in the database. (F) A 2 kb region with alignment of DNA sequencing reads to the genome. (G) Zooming in to 100 bp displays the actual sequence, allowing data inspection. Highlighted nucleotides represent SNPs.
Figure 2. Screen shot from GiardiaDB depicting a genomic segment search. (A) Genomic segment searches (i.e. DNA motif pattern) are available on the home page. (B) DNA motifs may be entered as a standard string of characters or using a regular expression as depicted. (C) DNA segment records are generated dynamically and results are displayed in a search strategy with results represented in a dynamic table below the strategy.
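The idea behind a regular-expression motif search, as described for Figure 2, can be illustrated with a minimal Python sketch. It is not the EuPathDB implementation: the function names, the example pattern `GATA[AG]`, the choice to scan both strands and the 0-based half-open coordinates are all assumptions made for illustration.

```python
import re

# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(sequence, pattern):
    """Find non-overlapping regex matches on both strands.

    Returns sorted (start, end, strand, matched_text) tuples. Coordinates are
    0-based, half-open and always given on the forward strand (an assumption
    of this sketch, not a documented EuPathDB convention).
    """
    regex = re.compile(pattern, re.IGNORECASE)
    n = len(sequence)
    hits = []
    # Forward strand: match positions are already forward coordinates.
    for m in regex.finditer(sequence):
        hits.append((m.start(), m.end(), "+", m.group()))
    # Reverse strand: search the reverse complement, then map positions back.
    for m in regex.finditer(reverse_complement(sequence)):
        hits.append((n - m.end(), n - m.start(), "-", m.group()))
    return sorted(hits)
```

Calling `find_motif("AAGATAAGCCTTATCTT", "GATA[AG]")` reports one hit on each strand. Each hit plays the role of a dynamically generated DNA segment record with its own coordinates.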
Figure 3. Screen shots depicting the genomic colocation query in EuPathDB resources. In this example from GiardiaDB, genes that have a DNA motif located within 500 nt upstream are identified. (A) To identify genes in relation to DNA motifs, a step searching for genes based on the organism of interest is added to the strategy. The genomic colocation option is selected by default when combining different record types, such as DNA motifs and genes. (B) The customizable colocation popup provides a dynamic logic statement that is updated based on the chosen parameters. (C) Results of colocation query. The top of the panel shows the search strategy and the bottom portion includes the results, with columns for gene IDs, number of matched motifs in the defined region and match genomic coordinates.
Figure 4. Screen shots from PlasmoDB showing (A) a typical result list from a search strategy, (B) an alternative graphical representation of genes on chromosomes, (C) a word cloud generated by clicking on the column analysis icon for the product description column and (D) a histogram generated by clicking on the column analysis icon for the ortholog count column.
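The genomic colocation query described for Figure 3 (genes with a DNA motif within 500 nt upstream) can be sketched in Python as follows. This is a hypothetical illustration, not the EuPathDB implementation: the `Feature` class, the function names, the 1-based inclusive coordinates and the rule that a motif only needs to overlap the upstream window are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A gene or motif on a contig (hypothetical record layout)."""
    id: str
    contig: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def upstream_region(gene, distance):
    """Return the (low, high) window upstream of a gene, strand-aware."""
    if gene.strand == "+":
        # Upstream of a forward-strand gene lies before its start.
        return max(1, gene.start - distance), gene.start - 1
    # Upstream of a reverse-strand gene lies after its end.
    return gene.end + 1, gene.end + distance


def genes_with_upstream_motif(genes, motifs, distance=500):
    """Map gene ID to the coordinates of motifs overlapping its upstream window."""
    results = {}
    for gene in genes:
        low, high = upstream_region(gene, distance)
        hits = [
            (m.start, m.end)
            for m in motifs
            if m.contig == gene.contig and m.start <= high and m.end >= low
        ]
        if hits:
            results[gene.id] = hits
    return results
```

The returned dictionary mirrors the result columns mentioned in the caption: the gene ID, the number of matched motifs (the length of each list) and the match coordinates.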