Beatrice Amos1, Cristina Aurrecoechea2, Matthieu Barba3, Ana Barreto4,5, Evelina Y Basenko1, Wojciech Bażant6, Robert Belnap2, Ann S Blevins7, Ulrike Böhme1, John Brestelli4,5, Brian P Brunk8, Mark Caddick1, Danielle Callan4,5, Lahcen Campbell3, Mikkel B Christensen3, George K Christophides9, Kathryn Crouch6, Kristina Davis10, Jeremy DeBarry2, Ryan Doherty4,8, Yikun Duan4,8, Michael Dunn10, Dave Falke2, Steve Fisher4,5, Paul Flicek3, Brett Fox10, Bindu Gajria8, Gloria I Giraldo-Calderón11,12, Omar S Harb8, Elizabeth Harper8, Christiane Hertz-Fowler1, Mark J Hickman8, Connor Howington10, Sufen Hu4,8, Jay Humphrey2, John Iodice4,5, Andrew Jones1, John Judkins4,8, Sarah A Kelly9, Jessica C Kissinger2,13,14, Dae Kun Kwon15, Kristopher Lamoureux2, Daniel Lawson9, Wei Li4,8, Kallie Lies10, Disha Lodha3, Jamie Long8, Robert M MacCallum9, Gareth Maslen3, Mary Ann McDowell11, Jaroslaw Nabrzyski10, David S Roos8, Samuel S C Rund11, Stephanie Wever Schulman8, Achchuthan Shanmugasundram1, Vasily Sitnik3, Drew Spruill2, David Starns1, Christian J Stoeckert4,5, Sheena Shah Tomko8, Haiming Wang2, Susanne Warrenfeltz2, Robert Wieck10, Paul A Wilkinson1, Lin Xu4,8, Jie Zheng4,5.
Abstract
The Eukaryotic Pathogen, Vector and Host Informatics Resource (VEuPathDB, https://veupathdb.org) represents the 2019 merger of VectorBase with the EuPathDB projects. As a Bioinformatics Resource Center funded by the National Institutes of Health, with additional support from the Wellcome Trust, VEuPathDB supports >500 organisms comprising invertebrate vectors, eukaryotic pathogens (protists and fungi) and relevant free-living or non-pathogenic species or hosts. Designed to empower researchers with access to Omics data and bioinformatic analyses, VEuPathDB projects integrate >1700 pre-analysed datasets (and associated metadata) with advanced search capabilities, visualizations, and analysis tools in a graphic interface. Diverse data types are analysed with standardized workflows including an in-house OrthoMCL algorithm for predicting orthology. Comparisons are easily made across datasets, data types and organisms in this unique data mining platform. A new site-wide search facilitates access for both experienced and novice users. Upgraded infrastructure and workflows support numerous updates to the web interface, tools, searches and strategies, and the Galaxy workspace where users can privately analyse their own data. Forthcoming upgrades include cloud-ready application architecture, expanded support for the Galaxy workspace, tools for interrogating host-pathogen interactions, improved interactions with affiliated databases (ClinEpiDB, MicrobiomeDB) and other scientific resources, and increased interoperability with the Bacterial & Viral BRC.
Year: 2022 PMID: 34718728 PMCID: PMC8728164 DOI: 10.1093/nar/gkab929
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
VEuPathDB resources and organisms supported
| Project | Web address | URL to access list of organisms supported | Number of datasets* (release 54) |
|---|---|---|---|
| VEuPathDB | | | 1775 |
| AmoebaDB | | | 62 |
| CryptoDB | | | 65 |
| FungiDB | | | 413 |
| GiardiaDB | | | 51 |
| HostDB | | | 46 |
| MicrosporidiaDB | | | 53 |
| PiroplasmaDB | | | 42 |
| PlasmoDB | | | 279 |
| ToxoDB | | | 150 |
| TrichDB | | | 23 |
| TriTrypDB | | | 239 |
| VectorBase | | | 492 |
| OrthoMCL DB | | | 655 genomes represented |
*Datasets can represent genome sequences or other omics-scale data, e.g. an RNA-Seq developmental time series study, a proteomics analysis, or a phenotype screen. The sum of the project datasets is larger than the number of datasets contained in the parent project, VEuPathDB, because some common datasets, e.g. the taxonomy, are present in each project.
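The arithmetic behind this footnote can be checked with a short sketch. The counts are copied from the table above; the variable names are illustrative only, and OrthoMCL DB is left out because the table reports genomes rather than datasets for it.

```python
# Dataset counts per component project, copied from the table above (release 54).
# OrthoMCL DB is omitted: the table lists genomes for it, not datasets.
project_datasets = {
    "AmoebaDB": 62,
    "CryptoDB": 65,
    "FungiDB": 413,
    "GiardiaDB": 51,
    "HostDB": 46,
    "MicrosporidiaDB": 53,
    "PiroplasmaDB": 42,
    "PlasmoDB": 279,
    "ToxoDB": 150,
    "TrichDB": 23,
    "TriTrypDB": 239,
    "VectorBase": 492,
}
VEUPATHDB_TOTAL = 1775  # parent-project count from the same table

# Sum over component projects; shared datasets are counted once per project.
component_sum = sum(project_datasets.values())

# Surplus counts attributable to datasets that appear in more than one project.
surplus = component_sum - VEUPATHDB_TOTAL

print(component_sum, surplus)  # 1915 140
```

The surplus of 140 is the number of repeated counts, not the number of distinct shared datasets: a dataset present in all twelve projects would contribute eleven to this figure.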
Figure 1. VEuPathDB redesigned homepage. (A) The banner section is present on all webpages and contains the site search box and access to all searches, data, and tools. (B) Just below the banner is a section reserved for general community announcements. (C) The left-hand search panel contains categorized searches of data in the resource. (D) The Overview of resources and tools section is in the centre of the page, providing quick vignettes to help our users get started with a specific topic. (E) Links to more detailed step-by-step instructional exercises are in the centre of the page just above the footer. (F) The news and tweets section is an expandable tab (collapsed by default) offering users access to recent news releases and tweets. (G) The footer section includes hyperlinked logos to all VEuPathDB component and affiliated sites in addition to the Gitter chat button.
Figure 2. Updated strategy system interface. (A) Example strategy showing the graphic interface for exploring relationships across datasets and organisms. The final strategy can be found at https://toxodb.org/toxo/app/workspace/strategies/import/037c32a7060e4c90. The upper right corner shows actions that can be performed on the strategy: copy, add/edit description, save, share, delete and close. (B) The home page 'Search for' menu, often used to choose the first search in a strategy. (C) Result of the first step, with the graphic of the growing strategy in the top panel, the redesigned vertical organism filter on the left and the gene result table on the right. (D) Redesigned Add Step popup. The three choices when adding a step are aligned on the left, and details appear in the right panel once a left panel option is chosen.
Figure 3. MapVEu tool for visualizing and downloading spatially resolved data. (A) The MapVEu interface includes a left-side menu for accessing tools (blue arrow), a search and filtering bar at the top of the screen (green arrow), and a lower right-side legend (green box). Circular markers indicate the location of data collections. Clicking on these markers will zoom and disaggregate them into more spatially resolved points. (B) Tool for choosing the data type of interest (enlarged search and filtering bar seen in A, green arrow). The map can display several data types including Samples, Insecticide Resistance, Genotypes, Abundance, Pathogens and Blood Meals. Users can select one of several views to see a specific data type on the map. (C) For many data types, specialized representations are available from the graph tab of the tool menu. Shown here are species abundance counts and insecticide resistance assays. (D) The tool menu's data table tab highlights the depth of metadata recorded from each sample. (E) All data can be downloaded using the export data tab. (F) Visible data can be filtered by adding search terms to the filter bar, which has auto-complete functionality.
Summary of functional annotation updates made by VEuPathDB curators
| Annotation field | PlasmoDB | TriTrypDB | FungiDB | Total |
|---|---|---|---|---|
| Product descriptions | 14 408 | 9932 | 4062 | 28 402 |
| Gene names | 4171 | 2955 | 1958 | 9084 |
| Gene synonyms | 42 | 257 | 245 | 544 |
| Gene Ontology Terms | 23 480 | 70 748 | 7984 | 102 212 |
| Enzyme Commission numbers | 477 | 198 | 99 | 774 |
| PMIDs | 14 288 | 5127 | 9409 | 28 824 |
| Comments (notes from annotators) | 3 | 7622 | 21 461 | 29 086 |
| External database references added | 1 | 47 133 | 94 182 | 141 316 |
| Reviewed user comments per gene | 328 | 7400 | 1080 | 8808 |
| Genes with new functional annotations | 20 276 | 60 708 | 86 872 | 167 856 |
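As a consistency check on the table above, the sketch below recomputes each row's Total from the three per-database columns. The figures are transcribed from the table; the dictionary layout and names are illustrative assumptions, not part of VEuPathDB.

```python
# Each entry: (PlasmoDB, TriTrypDB, FungiDB, Total), transcribed from the table above.
annotation_counts = {
    "Product descriptions": (14408, 9932, 4062, 28402),
    "Gene names": (4171, 2955, 1958, 9084),
    "Gene synonyms": (42, 257, 245, 544),
    "Gene Ontology Terms": (23480, 70748, 7984, 102212),
    "Enzyme Commission numbers": (477, 198, 99, 774),
    "PMIDs": (14288, 5127, 9409, 28824),
    "Comments (notes from annotators)": (3, 7622, 21461, 29086),
    "External database references added": (1, 47133, 94182, 141316),
    "Reviewed user comments per gene": (328, 7400, 1080, 8808),
    "Genes with new functional annotations": (20276, 60708, 86872, 167856),
}

# Collect any field whose per-database counts do not add up to the stated total.
mismatches = [
    field
    for field, (*per_db, total) in annotation_counts.items()
    if sum(per_db) != total
]

print(mismatches)  # [] : every stated total matches its row sum
```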
VEuPathDB Community Resource access points
| Resource | URL |
|---|---|
| Forum | |
| Help desk email | |
| Methods | |
| News Feed | |
| Tutorials | |
| Webinars | |
| Workshops | |
| YouTube Channel | |
Figure 4. VEuPathDB data production workflow and architecture. The complete pathway from data acquisition to web presentation and utilization by users is detailed. Production activities and systems are represented in the bottom purple box, and the services and presentation layers are represented in the pink and grey boxes. Data enter the system in the data staging box, where they are identified and prioritized by Outreach and entered into the Redmine issue tracking system. Once data are cleaned and structured, datasets are available to the processing and integration workflows. Genome sequence, annotation, RNA-Seq and DNA sequencing reads are processed at the EBI (bottom box) and passed back to Penn (top box) for data integration and subsequent processing, including integration of functional data and ortholog assignment. Data are prepared by these workflows for presentation in the form of relational databases and indexed flat files. The web clients provide access to users via a set of services that communicate with the back-end data stores. The system also includes a user data analysis system (right side) enabling users to analyse their own data and, for some data types, import their results into VEuPathDB for analysis and integration with publicly available data.