Cristina Aurrecoechea, Ana Barreto, Evelina Y Basenko, John Brestelli, Brian P Brunk, Shon Cade, Kathryn Crouch, Ryan Doherty, Dave Falke, Steve Fischer, Bindu Gajria, Omar S Harb, Mark Heiges, Christiane Hertz-Fowler, Sufen Hu, John Iodice, Jessica C Kissinger, Cris Lawrence, Wei Li, Deborah F Pinney, Jane A Pulman, David S Roos, Achchuthan Shanmugasundram, Fatima Silva-Franco, Sascha Steinbiss, Christian J Stoeckert, Drew Spruill, Haiming Wang, Susanne Warrenfeltz, Jie Zheng.
Abstract
The Eukaryotic Pathogen Genomics Database Resource (EuPathDB, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists and fungi), along with relevant free-living and non-pathogenic species and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including the creation of gene orthology profiles, so that data can be easily compared across data sets, data types and organisms. EuPathDB has been updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses, plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data, while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host-pathogen interactions.
Year: 2016 PMID: 27903906 PMCID: PMC5210576 DOI: 10.1093/nar/gkw1105
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
EuPathDB resources and organisms supported
| Database | Web address | Link to access list of organisms supported |
|---|---|---|
| EuPathDB | | |
| AmoebaDB | | |
| CryptoDB | | |
| FungiDB | | |
| GiardiaDB | | |
| HostDB | | |
| MicrosporidiaDB | | |
| PiroplasmaDB | | |
| PlasmoDB | | |
| ToxoDB | | |
| TrichDB | | |
| TriTrypDB | | |
| OrthoMCL | | Includes proteins from over 150 organisms across bacteria, archaea and eukarya |
Figure 1. PlasmoDB strategy showing the graphical interface for exploring relationships across data sets, data types and organisms. (The strategy can be found here: http://plasmodb.org/plasmo/im.do?s=7b88206dd42007c8) (A) Home page bubble for choosing the first search of a strategy, showing the ‘Predicted Signal Peptide’ search categorized under ‘Protein targeting and localization’. Clicking on the search title opens a form where users are prompted to choose required parameter values (if any) and initiate the search. The results of this search are displayed in Step 1 of panel C. (B) Interface for choosing subsequent searches. To add the ribosomal profiling search, which is based on RNA Seq data, users navigate the interface through ‘Run a new search for’, ‘Genes’, ‘Transcriptomics’, ‘RNA Seq Evidence’. Alternatively, to transform a result into orthologs of another species, as in Step 3 of the strategy, users choose ‘Transform by Orthology’ (green arrow) instead of the navigation indicated above. (C) Three-step strategy that returns P. vivax orthologs (Step 3) of P. falciparum genes that are likely translated in merozoites (Step 2) and that are predicted to encode proteins with signal peptides (Step 1). (D) Table detailing the data sets and data types interrogated in this strategy.
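The set logic behind the strategy in panel C can be sketched as follows, assuming that Step 2 is combined with Step 1 by intersection and that ‘Transform by Orthology’ maps each gene to all of its orthologs in the target species. The gene IDs, the ortholog table and the function names are hypothetical toy values for illustration; they are not real PlasmoDB data or part of any EuPathDB API.

```python
"""Hypothetical sketch of a three-step search strategy expressed as set operations."""
from typing import Dict, Iterable, Set


def intersect_steps(previous: Set[str], new_search: Set[str]) -> Set[str]:
    # Assumed combine operation: keep only genes returned by both searches.
    return previous & new_search


def transform_by_orthology(genes: Iterable[str], orthologs: Dict[str, Set[str]]) -> Set[str]:
    # Replace each gene with its orthologs in the target species;
    # genes with no ortholog in the (hypothetical) table contribute nothing.
    result: Set[str] = set()
    for gene in genes:
        result |= orthologs.get(gene, set())
    return result


# Hypothetical toy data, not real PlasmoDB IDs or search results.
signal_peptide = {"pf_A", "pf_B", "pf_C"}              # Step 1
translated_in_merozoites = {"pf_B", "pf_C", "pf_D"}    # Step 2 search on its own
pf_to_pv = {"pf_B": {"pv_B"}, "pf_C": {"pv_C1", "pv_C2"}, "pf_D": {"pv_D"}}

step2_result = intersect_steps(signal_peptide, translated_in_merozoites)
step3_result = transform_by_orthology(step2_result, pf_to_pv)

print(sorted(step2_result))  # ['pf_B', 'pf_C']
print(sorted(step3_result))  # ['pv_B', 'pv_C1', 'pv_C2']
```

Note that pf_D has an ortholog but does not reach Step 3, because it was removed by the intersection in Step 2; each step operates on the output of the previous one.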
Figure 2. Galaxy Workspace. (A) FungiDB header showing the Analyze My Experiment link (orange box) for navigating to the EuPathDB Galaxy Workspace. (B) The EuPathDB Galaxy Workspace home page, with preconfigured workflows available in the center section. Available tools are listed in the left panel, and the History panel on the right shows result and data files (in green). The ‘Display in FungiDB’ link (black box) navigates to GBrowse with the Galaxy data file open as a data track in the user's current GBrowse session. (C) Partial workflow showing the ‘drag and drop’ function for building workflows. (D) BigWig file displayed in FungiDB GBrowse directly from EuPathDB Galaxy using the ‘Display in FungiDB’ link (black box) in panel B.
Figure 3. Explore transcripts and enrichment analyses. (A) PlasmoDB two-step strategy that returns genes with signal peptides that are likely translated, based on ribosomal transcriptomics data. This strategy can be found at http://plasmodb.org/plasmo/im.do?s=859df329f857438e. (B) The result table contains a column of Transcript IDs. (C) When a search returns transcript subsets, the Gene Result tab contains a statement inviting users to explore the transcript results. Clicking ‘Explore’ opens the Explore Transcripts tool. (D) The Explore Transcripts tool for viewing transcripts that did or did not meet the search criteria for the current or previous searches. Choosing an option and clicking Apply Selection filters the strategy result and displays the chosen transcripts in the Gene Result tab. (E) The Analyze Results tab opens a new tab for the chosen enrichment analysis. (F) Gene Ontology Enrichment Analysis tool. Analysis results appear below the parameters and include enriched terms plus P-values.
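The caption does not state which statistic produces the GO enrichment P-values. One common choice for this kind of analysis is a one-sided Fisher's exact test, which equals the upper tail of the hypergeometric distribution; the sketch below computes that tail with the Python standard library. It is an illustration of the general technique under that assumption, not EuPathDB's implementation, and the function and parameter names are hypothetical.

```python
"""Hypothetical sketch: one-sided hypergeometric (Fisher's exact) enrichment P-value."""
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (e.g. all annotated genes of the organism)
    K: background genes annotated with the GO term
    n: genes in the search result
    k: result genes annotated with the GO term
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("require 0 <= K <= N, 0 <= n <= N and 0 <= k <= n")
    upper = min(n, K)
    # Sum the probabilities of observing k or more annotated genes by chance.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy numbers: 3 of 3 result genes carry a term held by 4 of 10 background genes.
print(enrichment_p_value(k=3, n=3, K=4, N=10))  # 4/120, about 0.0333
```

A small P-value means the term is annotated to more result genes than random draws from the background would typically give; in practice, tools also correct these values for testing many terms at once.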
Figure 4. Redesigned Gene Page. URL for this gene page: http://plasmodb.org/plasmo/app/record/gene/PF3D7_0905700. (A) Gene IDs and product descriptions are displayed in the upper left corner, with other information and links directly below. (B) ‘Shortcuts’ serve two functions. Clicking on a Shortcut's magnifying glass icon offers a larger view of the data, while clicking on the image (or its title) navigates to the data within the gene page. (C) The collapsible, interactive and searchable ‘Contents’ section reflects EDAM-based categories and remains stationary while the data (D) are scrolled. A blue section indicator (circle) points to the currently displayed data category. The check boxes to the right of the category names can be used to hide data. (D) Data are presented in collapsible, interactive, searchable and sortable tables that contain transcript-specific information when data can be unambiguously assigned to a transcript. (E) The ‘Transcriptomics’ table, featuring expandable rows with detailed information and graphs for each data set and coverage plots for RNA sequence data sets (only one of eight tracks is shown to conserve space in this figure). (F) Protein features table with the same expandable structure as the Transcriptomics table, showing protein domains, BLASTP hits, low complexity regions and secondary structure predictions.
Figure 5. Filter Parameter for composing sample groups based on metadata. (A) Samples are chosen from participants aged 0 to 10. The left panel displays categories of sample characteristics, while the right panel shows details of the data for the selected category. A summary of the sample group characteristics appears above the panel: 333 out of 421 samples are below age 10.9 (blue arrow). (B) Adding a characteristic to refine the sample group. A second characteristic (Health Status) is chosen from the left panel, and the Malaria group is selected. The summary now shows the group characteristics: 263 out of 421 samples have age <10.9 and malaria health status (blue arrow).
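The refinement shown in panels A and B can be sketched as filters combined with a logical AND: each added characteristic can only keep or shrink the sample group, and the summary reports the group size against all samples. The `Sample` record, its fields, the filter functions and the five samples are hypothetical toy values; only the 10.9 age cut-off is taken from the caption.

```python
"""Hypothetical sketch: refining a sample group by successive metadata filters."""
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    sample_id: str
    age: float
    health_status: str


Filter = Callable[[Sample], bool]


def apply_filters(samples: List[Sample], filters: List[Filter]) -> List[Sample]:
    # A sample stays in the group only if it passes every selected filter.
    return [s for s in samples if all(f(s) for f in filters)]


def summary(group: List[Sample], samples: List[Sample]) -> str:
    return f"{len(group)} out of {len(samples)} samples"


def below_age_limit(s: Sample) -> bool:
    return s.age < 10.9  # cut-off taken from the caption


def has_malaria(s: Sample) -> bool:
    return s.health_status == "Malaria"


# Hypothetical toy metadata.
samples = [
    Sample("s1", 3.0, "Malaria"),
    Sample("s2", 8.5, "Healthy"),
    Sample("s3", 12.0, "Malaria"),
    Sample("s4", 5.0, "Malaria"),
    Sample("s5", 30.0, "Healthy"),
]

group_a = apply_filters(samples, [below_age_limit])               # panel A
group_b = apply_filters(samples, [below_age_limit, has_malaria])  # panel B
print(summary(group_a, samples))  # 3 out of 5 samples
print(summary(group_b, samples))  # 2 out of 5 samples
```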