ReGaTE: Registration of Galaxy Tools in Elixir.

Olivia Doppelt-Azeroual, Fabien Mareuil, Eric Deveaud, Matúš Kalaš, Nicola Soranzo, Marius van den Beek, Björn Grüning, Jon Ison, Hervé Ménager

Abstract

Background: Bioinformaticians routinely use multiple software tools and data sources in their day-to-day work and have been guided in their choices by a number of cataloguing initiatives. The ELIXIR Tools and Data Services Registry (bio.tools) aims to provide a central information point, independent of any specific scientific scope within bioinformatics or technological implementation. Meanwhile, efforts to integrate bioinformatics software in workbench and workflow environments have accelerated to enable the design, automation, and reproducibility of bioinformatics experiments. One such popular environment is the Galaxy framework, with more than 80 publicly available Galaxy servers currently running around the world. In the context of a generic registry for bioinformatics software, such as bio.tools, Galaxy instances constitute a major source of valuable content. Yet there has been, to date, no convenient mechanism to register such services en masse. We present ReGaTE (Registration of Galaxy Tools in Elixir), a software utility that automates the process of registering the services available in a Galaxy instance. This utility uses the BioBlend application program interface to extract service metadata from a Galaxy server, enhance the metadata with the scientific information required by bio.tools, and push it to the registry. ReGaTE provides a fast and convenient way to publish Galaxy services in bio.tools. By doing so, service providers may increase the visibility of their services while enriching the software discovery function that bio.tools provides for its users. The source code of ReGaTE is freely available on GitHub at https://github.com/C3BI-pasteur-fr/ReGaTE.
© The Author 2017. Published by Oxford University Press.

Keywords:  Galaxy; bio.tools; bioinformatics services

Year: 2017; PMID: 28402416; PMCID: PMC5530318; DOI: 10.1093/gigascience/gix022

Source DB: PubMed; Journal: GigaScience; ISSN: 2047-217X; Impact factor: 6.524


Introduction

In recent years, various initiatives have aimed at cataloguing bioinformatics tools and services [1-6]. In particular, the ELIXIR Tools and Data Services Registry (bio.tools) [7] offers a community-curated information portal whose goals are comprehensive coverage and consistent description of bioinformatics tools and services. Registration and maintenance of bio.tools entries follow a "federated curation model", in which resource owners maintain their own entries. Another ongoing trend is the integration of bioinformatics software in workbench and workflow environments, which allows data analysts to design, automate, and reproduce bioinformatics experiments. The Galaxy framework [8-10] is one of the most popular of these environments, with more than 80 publicly available Galaxy servers currently running around the world (see https://wiki.galaxyproject.org/PublicGalaxyServers). In the context of a generic registry for bioinformatics software such as bio.tools, Galaxy instances therefore constitute a major source of valuable content. Current efforts to automatically register tools and services in resource catalogs mostly target programming language–specific catalogs, such as the Python Package Index (see http://pypi.python.org/pypi), and seldom domain-specific catalogs, with a few notable exceptions such as BioJS [11], the BioGems registry [12], and BioMOBY [13]. In contrast, ReGaTE lets owners of Galaxy server instances easily register their tools in a resource catalog that is not specific to any programming language or technical requirement, so that scientists browsing bio.tools can search and compare resources independently of their technological implementation. The ReGaTE utility is a software component that automates the registration of the bioinformatics tools installed on a Galaxy server.
The following sections present the major aspects of its implementation, its architecture, and the mapping of tool metadata from Galaxy to bio.tools.

Implementation

ReGaTE pulls tool descriptions from a Galaxy server, augments the information, and pushes it to the bio.tools registry. Galaxy is a framework that lets users configure and run a range of bioinformatics tools and workflows, and that offers many other features for sharing, visualizing, and reproducing analyses. The user interface and execution of each tool are based on a tool definition written in an eXtensible Markup Language (XML) file (detailed documentation of this format is available at https://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax). Each file describes the bioinformatics tool in detail, including its parameters, inputs, and outputs. This allows Galaxy to display the sometimes complex configuration options of a tool in a graphical user interface, primarily to enable its parameterization and execution. The Galaxy server loads these tool definitions and exposes them through its RESTful interface. The BioBlend library [14] provides convenient access to the Galaxy application program interface (API) from Python; here, we use BioBlend to extract tool definitions from remote Galaxy instances. Bio.tools [7] is a web portal, provided by ELIXIR (the European infrastructure for biological information), for exploring bioinformatics resources including software packages, web services, and database portals. Through a dedicated graphical interface, users can search for and compare resources, so bioinformatics resource providers can use bio.tools to enhance the visibility of their services. A resource can be described and registered manually via a web user interface, or programmatically using the registry API.
Registry entries follow a model formalized in biotoolsSchema (the biotoolsSchema format definition is available at https://github.com/bio-tools/biotoolsSchema/), an XML schema that defines a resource description model for bioinformatics with a mandatory core of 10 attributes. ReGaTE fetches the Galaxy tool definitions, enhances them with additional annotations, and converts them into the biotoolsSchema-based JSON format, using the mapping mechanism described in the Tool Metadata Mapping section, before pushing them to bio.tools. This process can run all at once or step by step: first extracting the tool metadata, then pushing the enhanced metadata to bio.tools. A ReGaTE user needs an account on the target Galaxy server, together with the corresponding API key, to extract the tool definitions, and an account on the bio.tools server to push the registry entries.
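To illustrate what a Galaxy tool definition contains, the following minimal sketch parses a small tool XML file with Python's standard `xml.etree.ElementTree` module and extracts the fields that matter for registration: identifier, name, version, description, and the Galaxy datatypes of data inputs and outputs. The tool `example_tool` and the function `parse_tool_xml` are hypothetical; ReGaTE itself obtains this information through the Galaxy API via BioBlend rather than by reading XML files directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical tool definition following the Galaxy ToolConfigSyntax layout.
EXAMPLE_TOOL_XML = """\
<tool id="example_tool" name="Example tool" version="0.1.0">
  <description>aligns reads to a reference</description>
  <command>example_tool '$reads' '$aligned'</command>
  <inputs>
    <param name="reads" type="data" format="fasta,fastqsanger" label="Input reads"/>
    <param name="threshold" type="float" value="0.5" label="Score threshold"/>
  </inputs>
  <outputs>
    <data name="aligned" format="bam"/>
  </outputs>
</tool>
"""


def _formats(elem):
    """Split a comma-separated Galaxy "format" attribute into datatype names."""
    return [f.strip() for f in (elem.get("format") or "").split(",") if f.strip()]


def parse_tool_xml(xml_text):
    """Extract registration-relevant metadata from a Galaxy tool XML string."""
    root = ET.fromstring(xml_text)
    inputs_el = root.find("inputs")
    outputs_el = root.find("outputs")
    # Only parameters of type "data" are datasets; iter() also reaches params
    # nested inside <conditional> or <repeat> blocks.
    inputs = [] if inputs_el is None else [
        {"name": p.get("name"), "formats": _formats(p)}
        for p in inputs_el.iter("param")
        if p.get("type") == "data"
    ]
    outputs = [] if outputs_el is None else [
        {"name": d.get("name"), "formats": _formats(d)}
        for d in outputs_el.iter("data")
    ]
    return {
        "id": root.get("id"),
        "name": root.get("name"),
        "version": root.get("version"),
        "description": (root.findtext("description") or "").strip(),
        "inputs": inputs,
        "outputs": outputs,
    }
```

The float parameter is skipped because it does not describe a dataset; only data inputs and outputs carry the datatypes that a later step could map to EDAM.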
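The step-by-step mode can be pictured with the following sketch, which assumes the entries have already been converted to bio.tools-style dictionaries. `export_entries` writes one JSON file per entry to a directory, and `push_entries` later reads them back and passes each one to a caller-supplied `post` function that would wrap the authenticated HTTP request to the registry; the endpoint and authentication details are deliberately left out because they are not specified here. `fetch_galaxy_tools` shows how a tool list could be obtained with BioBlend (`GalaxyInstance` and `gi.tools.get_tools()`) and needs network access and the `bioblend` package. All function names are hypothetical, not ReGaTE's actual code.

```python
import json
import re
from pathlib import Path


def fetch_galaxy_tools(galaxy_url, api_key):
    """List the tools installed on a Galaxy server (needs network access and bioblend)."""
    from bioblend.galaxy import GalaxyInstance  # imported lazily: optional dependency

    gi = GalaxyInstance(url=galaxy_url, key=api_key)
    return gi.tools.get_tools()  # list of dicts with keys such as "id", "name", "version"


def export_entries(entries, out_dir):
    """Step 1: write each converted entry to its own JSON file and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, entry in enumerate(entries):
        # Sanitize the tool name so characters such as "/" cannot create subdirectories;
        # the numeric prefix keeps files distinct and preserves the original order.
        safe_name = re.sub(r"[^A-Za-z0-9_.-]", "_", entry["name"])
        path = out / f"{index:04d}_{safe_name}.json"
        path.write_text(json.dumps(entry, indent=2), encoding="utf-8")
        paths.append(path)
    return paths


def push_entries(out_dir, post):
    """Step 2: read the exported files back and pass each entry to `post`."""
    pushed = 0
    for path in sorted(Path(out_dir).glob("*.json")):
        post(json.loads(path.read_text(encoding="utf-8")))
        pushed += 1
    return pushed
```

Separating the two steps this way lets a provider review or hand-edit the exported files before anything reaches the registry.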

ReGaTE Architecture

ReGaTE is a Python script coupled with a configuration file and with mapping files that translate the semantics used by Galaxy into those used by bio.tools. An overview of its architecture is shown in Fig. 1.
Figure 1: ReGaTE software architecture. The configuration file includes the Galaxy server uniform resource locator (URL), an API key, and a directory in which to store the generated tool files that can be uploaded to bio.tools. Prefix and suffix variables may also be specified to tag the names of the tools extracted by ReGaTE. For example, the tool SARTools DESeq2 [15], implemented at Institut Pasteur, can be renamed SARTools DESeq2-IP.
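The configuration described in the caption could look like the following minimal sketch. It assumes an INI-style file read with Python's standard `configparser`; the section and key names (`galaxy_server`, `url`, `api_key`, `tool_dir`, `prefix`, `suffix`) are hypothetical placeholders, not ReGaTE's documented format. The hypothetical helper `tag_name` applies the optional prefix and suffix, reproducing the SARTools DESeq2-IP example.

```python
import configparser

# Hypothetical configuration; section and key names are placeholders.
EXAMPLE_CONFIG = """
[galaxy_server]
url = https://galaxy.example.org
api_key = YOUR_GALAXY_API_KEY

[regate]
tool_dir = ./biotools_entries
prefix =
suffix = -IP
"""


def load_config(text):
    """Read the settings listed in Figure 1 from an INI-style string."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "galaxy_url": parser.get("galaxy_server", "url"),
        "api_key": parser.get("galaxy_server", "api_key"),
        "tool_dir": parser.get("regate", "tool_dir"),
        # Prefix and suffix are optional; an empty value means "no tag".
        "prefix": parser.get("regate", "prefix", fallback=""),
        "suffix": parser.get("regate", "suffix", fallback=""),
    }


def tag_name(name, prefix="", suffix=""):
    """Tag a tool name so entries from a given Galaxy server are recognizable."""
    return f"{prefix}{name}{suffix}"
```

Keeping the API key in a configuration file rather than in the code means the same script can be pointed at different Galaxy servers without modification.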

Tool Metadata Mapping

A biotoolsSchema-based description covers different properties of a given software application: scientific properties, such as the domain catered for and the type of task(s) performed by the tool; technical properties, such as the type of software and its interface(s), e.g., command-line tool, web application, or web service; credit, e.g., the references that need to be cited when referring to this work; and administrative information, such as the software license. Some of these properties are described using the EMBRACE Data and Methods (EDAM) ontology [16]. Development of the EDAM ontology is driven by community requests via GitHub, mailing lists, and community-based hackathons (more information on contributions can be found at https://github.com/edamontology/edamontology/blob/master/HOW_TO_CONTRIBUTE.md). It currently includes 3280 concepts, with regular (at least quarterly) major releases. EDAM covers the following common bioinformatics concepts: topics, i.e., the scientific disciplines or domains covered by a resource; the specific operations performed by a tool or service; the types of input and output data; and the formats in which inputs and outputs are available. The ReGaTE code maps a Galaxy tool definition file to a bio.tools entry, taking advantage of the large number of properties that such workbench wrappers and registry entries have in common [17]. A few properties are not natively available in the Galaxy tool descriptions retrieved by BioBlend; these missing data are provided by the ReGaTE configuration files. The mapping of Galaxy tool properties to EDAM concepts is a key component.
This translation is handled by YAML (Yet Another Markup Language) mapping files included in the ReGaTE distribution, which convert Galaxy datatypes to EDAM data and format concepts and also allow EDAM topics and operations to be specified.
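Putting the mapping together, the following sketch converts a simplified Galaxy tool record into a bio.tools-style dictionary. Several assumptions apply: the record shape (with hypothetical `input_formats`/`output_formats` lists of Galaxy datatype names, one data item per datatype) is invented for illustration; the `DATATYPE_TO_EDAM` dictionary stands in for the parsed content of a YAML mapping file and covers only three datatypes; per-tool EDAM topics and operations come from hypothetical `topics`/`operations` configuration entries; the output keys follow the general shape of bio.tools JSON but may not match the schema version ReGaTE targeted; and the homepage assumes the Galaxy server opens a tool form from a `?tool_id=` link. Unknown datatypes fall back to the generic EDAM concepts Data (data_0006) and Format (format_1915).

```python
EDAM_BASE = "http://edamontology.org/"

# Stand-in for the parsed content of a hypothetical YAML mapping file, e.g.
#   fasta: {data: data_2044, format: format_1929}
DATATYPE_TO_EDAM = {
    "fasta": ("data_2044", "format_1929"),  # Sequence / FASTA
    "sam": ("data_0863", "format_2573"),    # Sequence alignment / SAM
    "bam": ("data_0863", "format_2572"),    # Sequence alignment / BAM
}
FALLBACK = ("data_0006", "format_1915")     # generic EDAM "Data" / "Format"


def edam_io(datatype):
    """Build one bio.tools-style input/output item for a Galaxy datatype."""
    data_id, format_id = DATATYPE_TO_EDAM.get(datatype, FALLBACK)
    return {
        "data": {"uri": EDAM_BASE + data_id},
        "format": [{"uri": EDAM_BASE + format_id}],
    }


def galaxy_to_biotools(tool, config):
    """Map a simplified Galaxy tool record to a bio.tools-like entry (illustrative)."""
    base_url = config["galaxy_url"].rstrip("/")
    operations = config.get("operations", {}).get(tool["id"], [])
    topics = config.get("topics", {}).get(tool["id"], [])
    return {
        "name": config.get("prefix", "") + tool["name"] + config.get("suffix", ""),
        "description": tool.get("description") or tool["name"],
        "version": [tool["version"]] if tool.get("version") else [],
        # Not present in the Galaxy definition: derived from configuration.
        # Assumes the server opens the tool form from a "?tool_id=" link.
        "homepage": f"{base_url}/?tool_id={tool['id']}",
        "toolType": ["Web application"],
        "topic": [{"uri": EDAM_BASE + t} for t in topics],
        "function": [{
            "operation": [{"uri": EDAM_BASE + op} for op in operations],
            "input": [edam_io(dt) for dt in tool.get("input_formats", [])],
            "output": [edam_io(dt) for dt in tool.get("output_formats", [])],
        }],
    }
```

Keeping the datatype table as data rather than code, in the spirit of ReGaTE's YAML mapping files, lets providers extend the mapping without touching the converter.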

Conclusions and Future Work

The bio.tools registry allows Galaxy server maintainers to increase the visibility of their services and to set them in the context of offerings from other providers. The ReGaTE utility is a fast and convenient solution to enhance, publish, and maintain the services provided by a Galaxy server in the registry. Furthermore, ReGaTE can make a valuable contribution toward more comprehensive coverage of community resources in bio.tools. Current work on ReGaTE focuses on migrating its core functionality and tool semantics to the Galaxy Project itself. This integration will rely on the direct annotation of Galaxy datatypes with EDAM format and data concepts (see https://github.com/galaxyproject/galaxy/pull/2387 and https://github.com/galaxyproject/galaxy/pull/2428), as well as on the possibility of specifying EDAM topic (see https://github.com/galaxyproject/galaxy/pull/2397) and operation concepts (see https://github.com/galaxyproject/galaxy/pull/2379) directly in Galaxy tool definitions (see https://github.com/galaxyproject/galaxy/pull/3221). The use of EDAM as a standard for describing bioinformatics resources can provide a backbone to improve interoperability and guide users to connect and compose Galaxy tools [18], potentially extending to external components and environments that share this common vocabulary. A future priority will therefore be to exploit EDAM annotations in these ways for the benefit of Galaxy users and providers.
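Once tool definitions carry EDAM annotations directly, reading them becomes straightforward. The sketch below assumes the `<edam_topics>`/`<edam_topic>` and `<edam_operations>`/`<edam_operation>` elements that the Galaxy tool XML format provides for this purpose; the tool `example_mapper` and the function `read_edam_annotations` are hypothetical, and topic_0102 (Mapping) and operation_3198 (Read mapping) are used only as examples.

```python
import xml.etree.ElementTree as ET

# Hypothetical tool definition annotated with EDAM identifiers.
ANNOTATED_TOOL_XML = """\
<tool id="example_mapper" name="Example mapper" version="0.1.0">
  <edam_topics>
    <edam_topic>topic_0102</edam_topic>
  </edam_topics>
  <edam_operations>
    <edam_operation>operation_3198</edam_operation>
  </edam_operations>
</tool>
"""


def read_edam_annotations(xml_text):
    """Collect EDAM topic and operation identifiers declared in a Galaxy tool XML."""
    root = ET.fromstring(xml_text)

    def ids(path):
        # Skip empty elements and strip surrounding whitespace from identifiers.
        return [el.text.strip() for el in root.findall(path) if el.text and el.text.strip()]

    return {
        "topics": ids("edam_topics/edam_topic"),
        "operations": ids("edam_operations/edam_operation"),
    }
```

With such annotations in place, the identifiers could feed the topic and function fields of a registry entry directly, instead of being supplied through separate mapping files.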

Abbreviations

API: application program interface; EDAM: EMBRACE Data and Methods; URL: Uniform Resource Locator; XML: eXtensible Markup Language; YAML: Yet Another Markup Language.

References

1.  Interoperability with Moby 1.0--it's better than sharing your toothbrush!

Authors:  Mark D Wilkinson; Martin Senger; Edward Kawas; Richard Bruskiewich; Jerome Gouzy; Celine Noirot; Philippe Bardou; Ambrose Ng; Dirk Haase; Enrique de Andres Saiz; Dennis Wang; Frank Gibbons; Paul M K Gordon; Christoph W Sensen; Jose Manuel Rodriguez Carrasco; José M Fernández; Lixin Shen; Matthew Links; Michael Ng; Nina Opushneva; Pieter B T Neerincx; Jack A M Leunissen; Rebecca Ernst; Simon Twigger; Bjorn Usadel; Benjamin Good; Yan Wong; Lincoln Stein; William Crosby; Johan Karlsson; Romina Royo; Iván Párraga; Sergio Ramírez; Josep Lluis Gelpi; Oswaldo Trelles; David G Pisano; Natalia Jimenez; Arnaud Kerhornou; Roman Rosset; Leire Zamacola; Joaquin Tarraga; Jaime Huerta-Cepas; Jose María Carazo; Joaquin Dopazo; Roderic Guigo; Arcadi Navarro; Modesto Orozco; Alfonso Valencia; M Gonzalo Claros; Antonio J Pérez; Jose Aldana; M Mar Rojano; Raul Fernandez-Santa Cruz; Ismael Navas; Gary Schiltz; Andrew Farmer; Damian Gessler; Heiko Schoof; Andreas Groscurth
Journal:  Brief Bioinform       Date:  2008-01-31       Impact factor: 11.622

2.  BioCatalogue: a universal catalogue of web services for the life sciences.

Authors:  Jiten Bhagat; Franck Tanoh; Eric Nzuobontane; Thomas Laurent; Jerzy Orlowski; Marco Roos; Katy Wolstencroft; Sergejs Aleksejevs; Robert Stevens; Steve Pettifer; Rodrigo Lopez; Carole A Goble
Journal:  Nucleic Acids Res       Date:  2010-05-19       Impact factor: 16.971

3.  Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics.

Authors:  Raoul J P Bonnal; Jan Aerts; George Githinji; Naohisa Goto; Dan MacLean; Chase A Miller; Hiroyuki Mishima; Massimiliano Pagani; Ricardo Ramirez-Gonzalez; Geert Smant; Francesco Strozzi; Rob Syme; Rutger Vos; Trevor J Wennblom; Ben J Woodcroft; Toshiaki Katayama; Pjotr Prins
Journal:  Bioinformatics       Date:  2012-02-12       Impact factor: 6.937

4.  ExPASy: SIB bioinformatics resource portal.

Authors:  Panu Artimo; Manohar Jonnalagedda; Konstantin Arnold; Delphine Baratin; Gabor Csardi; Edouard de Castro; Séverine Duvaud; Volker Flegel; Arnaud Fortier; Elisabeth Gasteiger; Aurélien Grosdidier; Céline Hernandez; Vassilios Ioannidis; Dmitry Kuznetsov; Robin Liechti; Sébastien Moretti; Khaled Mostaguir; Nicole Redaschi; Grégoire Rossier; Ioannis Xenarios; Heinz Stockinger
Journal:  Nucleic Acids Res       Date:  2012-05-31       Impact factor: 16.971

5.  Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences.

Authors:  Jeremy Goecks; Anton Nekrutenko; James Taylor
Journal:  Genome Biol       Date:  2010-08-25       Impact factor: 13.583

6.  The Bioinformatics Links Directory: a compilation of molecular biology web servers.

Authors:  Joanne A Fox; Stefanie L Butland; Scott McMillan; Graeme Campbell; B F Francis Ouellette
Journal:  Nucleic Acids Res       Date:  2005-07-01       Impact factor: 16.971

7.  BioSharing: curated and crowd-sourced metadata standards, databases and data policies in the life sciences.

Authors:  Peter McQuilton; Alejandra Gonzalez-Beltran; Philippe Rocca-Serra; Milo Thurston; Allyson Lister; Eamonn Maguire; Susanna-Assunta Sansone
Journal:  Database (Oxford)       Date:  2016-05-17       Impact factor: 3.451

8.  SARTools: A DESeq2- and EdgeR-Based R Pipeline for Comprehensive Differential Analysis of RNA-Seq Data.

Authors:  Hugo Varet; Loraine Brillet-Guéguen; Jean-Yves Coppée; Marie-Agnès Dillies
Journal:  PLoS One       Date:  2016-06-09       Impact factor: 3.240

9.  ReGaTE: Registration of Galaxy Tools in Elixir.

Authors:  Olivia Doppelt-Azeroual; Fabien Mareuil; Eric Deveaud; Matúš Kalaš; Nicola Soranzo; Marius van den Beek; Björn Grüning; Jon Ison; Hervé Ménager
Journal:  Gigascience       Date:  2017-06-01       Impact factor: 6.524

10.  Tools and data services registry: a community effort to document bioinformatics resources.

Authors:  Jon Ison; Kristoffer Rapacki; Hervé Ménager; Matúš Kalaš; Emil Rydza; Piotr Chmura; Christian Anthon; Niall Beard; Karel Berka; Dan Bolser; Tim Booth; Anthony Bretaudeau; Jan Brezovsky; Rita Casadio; Gianni Cesareni; Frederik Coppens; Michael Cornell; Gianmauro Cuccuru; Kristian Davidsen; Gianluca Della Vedova; Tunca Dogan; Olivia Doppelt-Azeroual; Laura Emery; Elisabeth Gasteiger; Thomas Gatter; Tatyana Goldberg; Marie Grosjean; Björn Grüning; Manuela Helmer-Citterich; Hans Ienasescu; Vassilios Ioannidis; Martin Closter Jespersen; Rafael Jimenez; Nick Juty; Peter Juvan; Maximilian Koch; Camille Laibe; Jing-Woei Li; Luana Licata; Fabien Mareuil; Ivan Mičetić; Rune Møllegaard Friborg; Sebastien Moretti; Chris Morris; Steffen Möller; Aleksandra Nenadic; Hedi Peterson; Giuseppe Profiti; Peter Rice; Paolo Romano; Paola Roncaglia; Rabie Saidi; Andrea Schafferhans; Veit Schwämmle; Callum Smith; Maria Maddalena Sperotto; Heinz Stockinger; Radka Svobodová Vařeková; Silvio C E Tosatto; Victor de la Torre; Paolo Uva; Allegra Via; Guy Yachdav; Federico Zambelli; Gert Vriend; Burkhard Rost; Helen Parkinson; Peter Løngreen; Søren Brunak
Journal:  Nucleic Acids Res       Date:  2015-11-03       Impact factor: 16.971
