VectorBase.org updates: bioinformatic resources for invertebrate vectors of human pathogens and related organisms.

Gloria I Giraldo-Calderón, Omar S Harb, Sarah A Kelly, Samuel Sc Rund, David S Roos, Mary Ann McDowell.

Abstract

VectorBase (VectorBase.org) is part of the VEuPathDB Bioinformatics Resource Center, providing free online access to multi-omics and population biology data, focusing on arthropod vectors and invertebrates of importance to human health. VectorBase includes genomics and functional genomics data from bed bugs, biting midges, body lice, kissing bugs, mites, mosquitoes, sand flies, ticks, tsetse flies, stable flies, house flies, fruit flies, and a snail intermediate host. Tools include the Search Strategy system and MapVEu, enabling users to interrogate and visualize diverse 'omics and population-level data using a graphical interface (no programming experience required). Users can also analyze their own private data, such as transcriptomic sequences, exploring their results in the context of other publicly available information in the database. Help Desk: help@vectorbase.org.
Copyright © 2021 The Author(s). Published by Elsevier Inc. All rights reserved.

Year:  2021        PMID: 34864248      PMCID: PMC9133010          DOI: 10.1016/j.cois.2021.11.008

Source DB:  PubMed          Journal:  Curr Opin Insect Sci            Impact factor:   5.254


Introduction

As part of the Eukaryotic Pathogen, Vector and Host Informatics Resource (VEuPathDB.org) Bioinformatics Resource Center (BRC), VectorBase (VectorBase.org) is supported by the US National Institute of Allergy and Infectious Diseases (NIAID) [1]. In addition to VectorBase [2], VEuPathDB [3] also supports eukaryotic pathogens (protists, fungi) and selected mammalian host data, and provides resources for orthology determination and phylogenetic inference (OrthoMCL.org) [4]. Additional resources using the VEuPathDB model and infrastructure accommodate epidemiological (ClinEpiDB.org) [5] and microbiome data (MicrobiomeDB.org) [6]. Release 54 of VectorBase supports 53 vector genomes and integrates a wide range of other data types, including functional genomics and genetic variation data. The MapVEu geo-visualization tool displays different types of population data, including vector abundance, pathogen infection status, genetic variation, host blood meal source, and insecticide resistance phenotypes and genotypes, for ~470 species worldwide. Data are integrated from public repositories or directly from providers and analyzed with standard workflows using an ontology-driven framework to ensure data comparability. Expert knowledge from the community is also incorporated to improve genome annotation, both through an Apollo interface and in the form of User Comments. Here we present a general overview of the new VectorBase resource, including site use, data types, and tools, and conclude with our future plans.

The new VectorBase: a merged BRC infrastructure

The rapid growth of genomic-scale datasets, increasing integration of scientific research, and funder mandates for improved efficiency have driven the development of VEuPathDB, which couples the Ensembl bioinformatic pipelines [7,8] long used by VectorBase with the Genomics Unified Schema [9] and the highly flexible Search Strategies [10] of EuPathDB. The result is improved scalability, flexibility, data flow, and overall user experience.

Web interface improvements

A redesigned common user interface provides convenient, consistent access to data, searches, and help information for all supported species. The home page (Figure 1) features a header (present on all pages), a main panel, an expandable ‘News & Tweets’ section (Figure 1d), and a footer with clickable icons to access other VEuPathDB resources (Figure 1f). The Site Search (Figure 1b) allows free-text searches and returns categorized results, with filters that let users define categories or organisms of interest. Results (genes, SNPs, etc.) can then be exported to the ‘My Strategies’ system for further data mining, visualization, or download (see below). Educational materials, FAQs, virtual events, workshops, and methods are available under the ‘Help’ menu (Figure 1, arrow), and links to tutorials and exercises are at the bottom of the main panel (Figure 1e). Additional help is available from the ‘Contact Us’ link (Figure 1b).
Figure 1

Redesigned VectorBase home page. (a) Left-hand panel provides access to all available searches, categorized by datatype. (b) Header on all site pages, including Site Search box and access to My Strategies, Searches, Tools, My Workspace, Data, About, Help (educational materials, arrow) and Contact Us sections. Release date and version adjacent to the logo at left; social media, login, registration, and user profile links at right. (c) Central section provides an overview of resources and tools, including vignettes to help users get started on specific topics of likely interest. (d) News & Tweets section is an expandable tab (collapsed by default), providing access to recent announcements. (e) Links to more detailed step-by-step exercises. (f) Hyperlinked logos to other VEuPathDB components and affiliated sites, and community chat button enabling users to ask questions and share information.

The left sidebar provides access to all searches (Figure 1a; also accessible from the ‘Searches’ menu). Searches are organized into expandable categories containing configurable queries against the underlying data. Search results are returned as an expandable Search Strategy and are displayed in a dynamic table that can be configured by adding, removing, or moving columns. The central section of the main panel provides an overview of available resources and tools (Figure 1c).

Omics and population data sets

VectorBase release 54 includes 492 datasets relating to vector species. Bimonthly releases incorporate new data and functionality into the site; the latest data can be found on the datasets page under the ‘Data’ menu in the header (https://vectorbase.org/vectorbase/app/search/dataset/AllDatasets/result). Forty-one vector genomes represent ‘reference’ strains for distinct species, while 12 are additional strains or resequencing of already available strains. Gene set predictions are available for all reference species (and most additional strains), including 12 with chromosomal mappings. Other datasets, including transcriptomes, proteomes, genetic variation, and orthology profiles, are aligned or cross-referenced to reference genomes, and genomes are also cross-referenced with ~20 external databases, including Chemical Entities of Biological Interest (ChEBI) [11], the Kyoto Encyclopedia of Genes and Genomes (KEGG) [12,13], and the Gene Ontology (GO) [14,15]. All omics datasets are also available for download or for use with the site tools accessible under the ‘Tools’ menu in the header. Population datasets include records for ~470 taxonomic groups from field-collected samples, divided into different map ‘views’ and/or data types, including >21 000 insecticide resistance phenotype assays and >17 000 insecticide resistance genotype assays, >187 000 pathogen infection status assays, >12 000 blood meal source assays, >15 000 chromosomal inversions, >15 000 microsatellites, >2600 barcodes, and >25 million population abundance records, among others. The MapVEu tool (Figure 2) is used for visualization, search, analysis, and raw data download. Specialized representations are also available; for example, the bar graph in Figure 2b shows species abundance counts for the geographic region displayed.
Figure 2.

Population data in MapVEu, a tool for visualizing, analyzing, and downloading geographic data. (a) Select MapVEu from the Tools menu on the home page (arrow). (b) Select the Abundance ‘view’ from the dropdown menu below the map search bar (violet arrow). Select a sampling location on the map, or enter it into the search bar and select it from the autocomplete menu (Manatee County is used in this example). Date Search is set to July 2018 to July 2019. Open/collapse the blue arrowhead in the legend panel (green box) to set Collection Protocol = CDC light trap and Attractant = carbon dioxide; in the same panel, select the Species and Optimize Colors options. A point was selected (indicated in color, center of the map) to explore it in more detail. The icon at the left (orange arrow) defines the graph type (bars in this example); ‘EpiWeekly’ is set as the temporal resolution. The left panel size can be adjusted (red box). The orange box indicates other ‘view’-specific data visualizations, metadata details, raw data download, and so on. Log in to VectorBase and follow this link to recreate the image shown: https://tinyurl.com/InsectGx2021VectorBaseFig2.
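The filter settings in this example (date range, collection protocol, attractant) and the per-species bar graph can be mimicked offline on downloaded raw abundance data. The sketch below is illustrative only: the record fields (`species`, `collection_date`, `protocol`, `attractant`, `count`) and the example rows are hypothetical and do not reflect MapVEu's actual export schema, and the weekly bins use ISO weeks, which may not match the epidemiological weeks behind MapVEu's ‘EpiWeekly’ setting.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class AbundanceRecord:
    # Hypothetical fields; a real MapVEu download may use other column names.
    species: str
    collection_date: date
    protocol: str
    attractant: str
    count: int


def filter_records(records, start, end, protocol, attractant):
    """Keep records collected in [start, end] with the given protocol and attractant."""
    return [
        r for r in records
        if start <= r.collection_date <= end
        and r.protocol == protocol
        and r.attractant == attractant
    ]


def counts_by_species(records):
    """Total specimens per species (the kind of data behind a species bar graph)."""
    totals = defaultdict(int)
    for r in records:
        totals[r.species] += r.count
    return dict(totals)


def counts_by_iso_week(records):
    """Total specimens per (ISO year, ISO week); only an approximation of epi weeks."""
    totals = defaultdict(int)
    for r in records:
        iso_year, iso_week, _ = r.collection_date.isocalendar()
        totals[(iso_year, iso_week)] += r.count
    return dict(totals)


# Made-up example rows, not real VectorBase data.
EXAMPLE_ROWS = [
    AbundanceRecord("Species A", date(2018, 8, 6), "CDC light trap", "carbon dioxide", 120),
    AbundanceRecord("Species B", date(2018, 8, 7), "CDC light trap", "carbon dioxide", 45),
    AbundanceRecord("Species B", date(2019, 3, 12), "CDC light trap", "carbon dioxide", 30),
    AbundanceRecord("Species B", date(2019, 3, 12), "gravid trap", "hay infusion", 80),
    AbundanceRecord("Species A", date(2017, 5, 1), "CDC light trap", "carbon dioxide", 999),
]

if __name__ == "__main__":
    kept = filter_records(EXAMPLE_ROWS, date(2018, 7, 1), date(2019, 7, 31),
                          "CDC light trap", "carbon dioxide")
    print(counts_by_species(kept))  # {'Species A': 120, 'Species B': 75}
    print(counts_by_iso_week(kept))
```

The fourth row is dropped by the protocol/attractant filter and the fifth by the date range, so only three records contribute to the totals.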

New and improved tools and resources

Genome and protein browsers

Genome browsing is facilitated by the JBrowse genome browser [16], an open-source platform allowing users to select tracks displaying aligned transcriptomic, proteomic, epigenomic, and variation data. Variation data sets (SNP calls) are available via Variant Call Format (VCF) files aligned to reference genomes. Protein Browser tracks include transmembrane domains (TMHMM predictions) [17], protein domains (InterPro predictions) [18], and synteny views across multiple genomes.
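For readers who download the VCF files, the eight fixed columns of a VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, as defined by the VCF specification) can be read with a few lines of Python. This is a minimal sketch that skips header lines and ignores per-sample genotype columns; dedicated libraries such as pysam or cyvcf2 are more robust for real analyses. The example lines and the contig name `chr2` are made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VcfRecord:
    chrom: str
    pos: int                 # 1-based position on the reference sequence
    variant_id: str
    ref: str
    alt: list                # one entry per alternate allele
    qual: Optional[float]    # None when the QUAL column is "."
    filter_status: str
    info: dict = field(default_factory=dict)


def parse_info(text):
    """Parse an INFO string such as 'DP=14;AF=0.5;INDEL'; flags map to True."""
    info = {}
    if text == ".":
        return info
    for item in text.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def parse_vcf_line(line):
    """Parse the eight fixed columns of one tab-separated VCF data line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, vid, ref, alt, qual, flt, info = cols[:8]
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        variant_id=vid,
        ref=ref,
        alt=alt.split(","),
        qual=None if qual == "." else float(qual),
        filter_status=flt,
        info=parse_info(info),
    )


def iter_snvs(lines):
    """Yield single-nucleotide variants, skipping '#' header lines.

    Symbolic alleles such as '*' are not special-cased in this sketch.
    """
    for line in lines:
        if line.startswith("#"):
            continue
        rec = parse_vcf_line(line)
        if len(rec.ref) == 1 and all(len(a) == 1 for a in rec.alt):
            yield rec


# Made-up lines for illustration; the contig name is hypothetical.
EXAMPLE_LINES = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr2\t1000\t.\tA\tG\t50\tPASS\tDP=14;AF=0.5",
    "chr2\t1010\t.\tAT\tA\t.\tPASS\tINDEL",
]

if __name__ == "__main__":
    for rec in iter_snvs(EXAMPLE_LINES):
        print(rec.chrom, rec.pos, rec.ref, rec.alt, rec.info)
```

Only the first data line is yielded: the second is a deletion (REF `AT`), so the SNV filter skips it.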

Gene pages

Gene pages, now redesigned, compile all available data about a particular gene into a single webpage. Orthologs and paralogs are identified using OrthoMCL [4], and Clustal Omega [19] can be launched for multiple sequence alignments. New representations facilitate exploration of transcriptomics data, protein features and properties, use of functional prediction tools, and assessment of metabolic pathways.

My strategies

Searches in VectorBase can be combined into a Search Strategy, allowing users to integrate diverse results into a multistep in silico experiment (Figure 3). Multistep strategies (for example, finding Aedes kinases that are expressed at a particular time or place and conserved in species of interest) are built one step at a time, bringing together several searches by union (Figure 3c, step 2), intersection (Figure 3c, step 3), or subtraction operations. Strategies are extended by clicking ‘Add a step’ in the graphic panel (Figure 3c). Options for extending a strategy include ‘Combine’ with similar records, ‘Transform’ to related records, and ‘Genomic Colocation’. Results can be transformed into orthologs, metabolic pathways, or compounds, and genomic location can be exploited to search for additional features. Search Strategies can be saved, copied, revised, or shared with others using a private link. The Strategy System replaces BioMart functionalities, including the ability to download genome-wide information available from gene pages (e.g. homologs, expression values, GO terms).
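Conceptually, combining two steps behaves like set algebra on the record IDs each step returns. The sketch below models that idea with Python sets; it is not VectorBase's implementation, the operator names (`union`, `intersect`, `minus`, `rminus`) are hypothetical labels, and the gene IDs are placeholders rather than real VectorBase identifiers.

```python
def combine(left, right, operator):
    """Combine the ID sets returned by two strategy steps.

    The operator names are hypothetical labels for this sketch:
      'union'     -> IDs in either step
      'intersect' -> IDs in both steps
      'minus'     -> IDs in the left step but not the right one
      'rminus'    -> IDs in the right step but not the left one
    """
    operations = {
        "union": lambda a, b: a | b,
        "intersect": lambda a, b: a & b,
        "minus": lambda a, b: a - b,
        "rminus": lambda a, b: b - a,
    }
    if operator not in operations:
        raise ValueError(f"unknown operator: {operator!r}")
    return operations[operator](set(left), set(right))


# Hypothetical search results (placeholder gene IDs, not real VectorBase IDs).
kinases_by_text = {"GENE_1", "GENE_2", "GENE_3"}
kinases_by_go = {"GENE_3", "GENE_4"}
expressed_in_tissue = {"GENE_2", "GENE_3", "GENE_4", "GENE_9"}
excluded = {"GENE_4"}

step2 = combine(kinases_by_text, kinases_by_go, "union")   # GENE_1 .. GENE_4
step3 = combine(step2, expressed_in_tissue, "intersect")    # GENE_2, GENE_3, GENE_4
step4 = combine(step3, excluded, "minus")                   # GENE_2, GENE_3

if __name__ == "__main__":
    print(sorted(step4))  # ['GENE_2', 'GENE_3']
```

Each step consumes the previous step's result, which is why the order of steps matters for subtraction but not for union or intersection.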
Figure 3.

Search strategies as in silico experiments. (a) Site Search, a box accessible from the header, returns site-wide hits; filters can be applied on the results page. Search strategies (for Genes, Organisms, Pathways, etc.) can be run from the Searches pull-down menu or the ‘Search for…’ panel at the left. (b) The search filter can help locate searches of interest, for example, identifying ‘text’ searches of Genes or Compounds. (c) Search results constitute one Step in a Search Strategy, which can be combined with other searches (+ Add a step; green arrow) using Boolean operators (e.g. union, intersection, subtraction). Results may be downloaded (black arrow), and searches edited, saved, shared, or published (icons at right; blue arrow). To retrieve this sample search for A. gambiae proteases expressed in the midgut with a specific promoter region DNA motif, see: https://vectorbase.org/vectorbase/app/workspace/strategies/import/bc4d101022805435. Publicly shared strategies are available from the menu at the top (orange arrow; https://vectorbase.org/vectorbase/app/workspace/strategies/public).
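The ‘Genomic Colocation’ option described in the My strategies section finds features that lie near features returned by another step. A minimal interval-based sketch follows, assuming simple (chromosome, start, end) coordinates with inclusive, 1-based ends and a user-chosen flank distance; the feature names, coordinates, and the parameter name `flank` are hypothetical, and VectorBase's own options (for example, strand or upstream/downstream restrictions) may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    chrom: str
    start: int  # inclusive, 1-based
    end: int    # inclusive, 1-based


def within_distance(a, b, flank):
    """True if b overlaps a after extending a by `flank` bases on both sides."""
    if a.chrom != b.chrom:
        return False
    return b.start <= a.end + flank and b.end >= a.start - flank


def colocate(primary, secondary, flank=0):
    """Return primary features that have at least one secondary feature nearby.

    This brute-force scan is O(len(primary) * len(secondary)) per chromosome;
    genome-scale tools typically use sorted or indexed intervals instead.
    """
    by_chrom = {}
    for f in secondary:
        by_chrom.setdefault(f.chrom, []).append(f)
    return [
        p for p in primary
        if any(within_distance(p, s, flank) for s in by_chrom.get(p.chrom, []))
    ]


# Hypothetical genes and motif hits, for illustration only.
GENES = [
    Feature("geneA", "chr2", 1000, 2000),
    Feature("geneB", "chr2", 10000, 12000),
    Feature("geneC", "chr3", 500, 900),
]
MOTIFS = [
    Feature("motif1", "chr2", 2300, 2310),
    Feature("motif2", "chr3", 5000, 5010),
]

if __name__ == "__main__":
    print([g.name for g in colocate(GENES, MOTIFS, flank=500)])  # ['geneA']
```

With `flank=0` no gene qualifies, because `motif1` starts 300 bases after the end of `geneA`.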

Enrichment analysis

Functional enrichment of gene results includes statistically valid Gene Ontology (GO) [14,15] and metabolic pathway enrichment results (also available as a word cloud). GO enrichment results can also be exported to REVIGO [20], facilitating data visualization using a variety of interactive tools.
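The article does not state which statistical test VectorBase applies. A common approach for this kind of over-representation analysis is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) per term, followed by Benjamini-Hochberg correction across terms. The sketch below shows that approach under those assumptions, with made-up counts; it should not be read as VectorBase's exact procedure.

```python
from math import comb


def hypergeom_sf(k, n_total, n_annotated, n_selected):
    """P(X >= k): chance of seeing at least k annotated genes when n_selected
    genes are drawn from n_total, of which n_annotated carry the term."""
    denom = comb(n_total, n_selected)
    upper = min(n_annotated, n_selected)
    return sum(
        comb(n_annotated, i) * comb(n_total - n_annotated, n_selected - i)
        for i in range(k, upper + 1)
    ) / denom


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up counts: 10,000 background genes, 200 annotated with a term,
    # 50 genes in the result, 8 of them annotated (about 1 expected by chance).
    p = hypergeom_sf(8, 10_000, 200, 50)
    print(f"p = {p:.3g}")
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Working with exact integer binomial coefficients keeps the tail sum precise for moderate gene counts, at the cost of speed on very large backgrounds.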

Community annotation

VectorBase continues to support manual gene annotation with Apollo [21], which allows users to create and edit structural annotations, update product names, descriptions, and symbols, and so on. For some species, VEuPathDB staff may integrate these annotations into the official gene set once several annotations have been submitted. Users can request that a specific genome be made available in Apollo for annotation by contacting the help desk. The ‘User Comments’ tool on Gene Pages is new to VectorBase; it allows users to submit comments about specific genes, which are immediately integrated into the database and become searchable.

Homology predictions

VectorBase has historically been used to predict putative gene function, resolve evolutionary questions, and provide comparative genomic analyses using the Ensembl Compara pipeline. This functionality is now provided by OrthoMCL [4], but Compara can still be accessed via Ensembl Metazoa [7,8] (https://metazoa.ensembl.org/index.html) using the genome browser gene pages and BioMart [22,23].
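Transforming a gene list into orthologs (the ‘Transform’ option in a strategy) can be pictured as a lookup through ortholog-group membership of the kind OrthoMCL provides. This sketch assumes a simple gene-to-group table plus a gene-to-organism table; the group IDs and gene IDs are hypothetical, and it does not reproduce how VectorBase performs the transform.

```python
from collections import defaultdict


def transform_to_orthologs(genes, gene_to_group, gene_to_organism, target_organisms):
    """Return genes from target organisms that share an ortholog group with any input gene.

    If a target organism is also the source organism, in-group paralogs are
    returned too, because group membership alone does not separate them.
    """
    members = defaultdict(set)
    for gene, group in gene_to_group.items():
        members[group].add(gene)

    result = set()
    for gene in genes:
        group = gene_to_group.get(gene)
        if group is None:
            continue  # gene has no ortholog-group assignment
        for other in members[group]:
            if other != gene and gene_to_organism.get(other) in target_organisms:
                result.add(other)
    return result


# Hypothetical group and gene identifiers, for illustration only.
GENE_TO_GROUP = {
    "agam_g1": "GROUP_A", "aaeg_g1": "GROUP_A", "aaeg_g2": "GROUP_A",
    "agam_g2": "GROUP_B", "cqui_g1": "GROUP_B",
    "agam_g3": "GROUP_C",
}
GENE_TO_ORGANISM = {
    "agam_g1": "Anopheles gambiae", "agam_g2": "Anopheles gambiae",
    "agam_g3": "Anopheles gambiae", "aaeg_g1": "Aedes aegypti",
    "aaeg_g2": "Aedes aegypti", "cqui_g1": "Culex quinquefasciatus",
}

if __name__ == "__main__":
    hits = transform_to_orthologs({"agam_g1", "agam_g2", "agam_g3"},
                                  GENE_TO_GROUP, GENE_TO_ORGANISM, {"Aedes aegypti"})
    print(sorted(hits))  # ['aaeg_g1', 'aaeg_g2']
```

A gene in a single-member group (here `agam_g3`) simply contributes nothing to the transformed result.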

Galaxy

Computationally intensive analysis of user-provided data (e.g. RNA-seq analysis, SNP calling) continues to be provided via a user-friendly front end to a cloud-based Galaxy pipeline [24], allowing users to analyze their own data privately. Output files can be exported for interrogation in the context of all other data in VectorBase.

Registration and citation

VectorBase does not require registration, but an account provides additional features, including email alerts about new datasets and the ability to save and share BLAST jobs, Search Strategies, Galaxy output, Apollo gene annotations, and more. Much of the data in VectorBase is provided by independent researchers, and citation information is included for each VectorBase record, including publications or other attribution details for unpublished datasets, so that users can cite primary data sources when relevant. A FAQ (https://vectorbase.org/vectorbase/app/static-content/faq.html) and the ‘About’ section provide information on how to cite VectorBase, and, when appropriate, users are encouraged to include the VectorBase logo, tables, figures, and images in their original research presentations and publications.

Recent science enabled by VectorBase data and tools

VectorBase data, tools, and analyses can demonstrably expedite basic discovery and translational research. For example, VectorBase genomes have been used to resolve questions involving individual genes [25-27], characterize gene families [28,29], and perform genome-wide analyses [30,31]. Genome resources have also been used to develop wet lab techniques, for example, primer design [32] and multilocus amplicon sequencing for simultaneous mosquito species identification and detection of parasite infection status [33]. BLAST [34,35], gene enrichment [36], and comparative genomic analyses within and between species have been used for phylogenetic inference and homolog prediction [37•,38•,39•]. VectorBase files and tools have been used to improve genome assemblies by creating physical maps [40] and karyotypes [41], and to identify genome elements [42-44]. Researchers have also used VectorBase genome assembly and gene set files to perform analyses such as transcript differential expression [45,46] and peptide expression [47]. Transcriptomics and proteomics datasets deposited in VectorBase allow research groups to ask or test new hypotheses, as described in this review paper on mosquito ‘omics [7•]; this is now also possible using the Search Strategy system [12•]. Phenotype experiments, for example, on insecticide susceptibility [50], have been analyzed using VectorBase genomes and the Ensembl Variant Effect Predictor (VEP) to interpret the genotypes obtained by variant calling. The MapVEu tool has been used to generate meta-analyses, for example, with the population abundance view [51•], and to facilitate reviews, for example, with the blood meal view [52•].

Summary and future perspectives

VEuPathDB provides consistent representation, interrogation, and visualization of data types and tools for hosts, vectors, parasites, and fungi. Vector data can be accessed directly through VectorBase or through the VEuPathDB homepage. Infrastructure improvements resulting from the merger of VectorBase and EuPathDB allow for increased scalability, efficiency, and interoperability to incorporate the increasing quantities of data and new data types. Future VectorBase releases are expected to provide organism preference parameters enabling customization of the user experience, including the ability to select organisms across taxonomic groups (e.g. exploration of both Plasmodium parasites and Anopheles mosquito vectors). Development plans include tools for analysis and visualization of vector-pathogen interactions and systems biology research, resources for integrated exploration of VectorBase and the bacterial/viral BRC, improved visualizations and analyses for the MapVEu tool, an improved variant calling pipeline and associated searches, improved mechanisms for portability of data to other applications, and additional workflows using the VEuPathDB Galaxy instance.
References (50 in total)

Review 1.  Of Genes and Genomes: Mosquito Evolution and Diversity.

Authors:  Livio Ruzzante; Maarten J M F Reijnders; Robert M Waterhouse
Journal:  Trends Parasitol       Date:  2018-11-01

Review 2.  A Network Perspective on the Vectoring of Human Disease.

Authors:  Ben Bellekom; Talya D Hackett; Owen T Lewis
Journal:  Trends Parasitol       Date:  2021-01-05

3.  Using OrthoMCL to assign proteins to OrthoMCL-DB groups or to cluster proteomes into new ortholog groups.

Authors:  Steve Fischer; Brian P Brunk; Feng Chen; Xin Gao; Omar S Harb; John B Iodice; Dhanasekaran Shanmugam; David S Roos; Christian J Stoeckert
Journal:  Curr Protoc Bioinformatics       Date:  2011-09

4.  EuPathDB: The Eukaryotic Pathogen Genomics Database Resource.

Authors:  Susanne Warrenfeltz; Evelina Y Basenko; Kathryn Crouch; Omar S Harb; Jessica C Kissinger; David S Roos; Achchuthan Shanmugasundram; Fatima Silva-Franco
Journal:  Methods Mol Biol       Date:  2018

5.  Analysis of Zika virus capsid-Aedes aegypti mosquito interactome reveals pro-viral host factors critical for establishing infection.

Authors:  Rommel J Gestuveo; Jamie Royle; Claire L Donald; Douglas J Lamont; Edward C Hutchinson; Andres Merits; Alain Kohl; Margus Varjak
Journal:  Nat Commun       Date:  2021-05-13       Impact factor: 14.919

6.  BioMart: driving a paradigm change in biological data management.

Authors:  Arek Kasprzyk
Journal:  Database (Oxford)       Date:  2011-11-13       Impact factor: 3.451

7.  Identification of new Anopheles gambiae transcriptional enhancers using a cross-species prediction approach.

Authors:  I Schember; M S Halfon
Journal:  Insect Mol Biol       Date:  2021-04-27       Impact factor: 3.424

8.  ChEBI in 2016: Improved services and an expanding collection of metabolites.

Authors:  Janna Hastings; Gareth Owen; Adriano Dekker; Marcus Ennis; Namrata Kale; Venkatesh Muthukrishnan; Steve Turner; Neil Swainston; Pedro Mendes; Christoph Steinbeck
Journal:  Nucleic Acids Res       Date:  2015-10-13       Impact factor: 16.971

9.  Apollo: Democratizing genome annotation.

Authors:  Nathan A Dunn; Deepak R Unni; Colin Diesh; Monica Munoz-Torres; Nomi L Harris; Eric Yao; Helena Rasche; Ian H Holmes; Christine G Elsik; Suzanna E Lewis
Journal:  PLoS Comput Biol       Date:  2019-02-06       Impact factor: 4.475

10.  Human blood microRNA hsa-miR-21-5p induces vitellogenin in the mosquito Aedes aegypti.

Authors:  Hugo D Perdomo; Mazhar Hussain; Rhys Parry; Kayvan Etebari; Lauren M Hedges; Guangmei Zhang; Benjamin L Schulz; Sassan Asgari
Journal:  Commun Biol       Date:  2021-07-09