GLASSgo in Galaxy: high-throughput, reproducible and easy-to-integrate prediction of sRNA homologs.

Richard A Schäfer, Steffen C Lott, Jens Georg, Björn A Grüning, Wolfgang R Hess, Björn Voß.

Abstract

MOTIVATION: The correct prediction of bacterial sRNA homologs is a prerequisite for many downstream analyses based on comparative genomics, but it is frequently challenging due to the short length and distinct heterogeneity of such homologs. GLobal Automatic Small RNA Search go (GLASSgo) is an efficient tool for the prediction of sRNA homologs from a single input query. To make the algorithm available to a broader community, we offer a Docker container along with a free-access web service. For non-computer scientists, the web service provides a user-friendly interface. However, capabilities for batch processing, version control and direct interaction with compatible software applications, as a workflow management system can provide, have so far been lacking.
RESULTS: Here, we present GLASSgo 1.5.2, an updated version that is fully incorporated into the workflow management system Galaxy. The improved version contains a new feature for extracting the upstream regions, allowing the search for conserved promoter elements. Additionally, it supports the use of accession numbers instead of the outdated GI numbers, which widens the applicability of the tool.
AVAILABILITY AND IMPLEMENTATION: GLASSgo is available at https://github.com/lotts/GLASSgo/ under the MIT license and is accompanied by instruction and application data. Furthermore, it can be installed into any Galaxy instance using the Galaxy ToolShed.
© The Author(s) 2020. Published by Oxford University Press.

Year:  2020        PMID: 32492127      PMCID: PMC7520042          DOI: 10.1093/bioinformatics/btaa556

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

One of the most fundamental analyses of newly discovered genes is the search for potential homologs in related species and beyond. Regulatory small RNAs (sRNAs) are often important post-transcriptional regulators of gene expression in bacteria. For their better characterization, information about homologs in other organisms is crucial. However, the identification of homologous sRNA genes can be challenging due to their relatively short length, frequently low sequence conservation and the absence of reading frames and translation signals. Therefore, we developed GLobal Automatic Small RNA Search go (GLASSgo) for the fast and reliable search for sRNA homologs starting from a single sRNA query (Lott et al., 2018). Since its initial release in 2018, this algorithm has filled a gap in the computational analysis of sRNAs in bacteria (Wright and Georg, 2018). Briefly, GLASSgo performs iterative, sensitive searches for sequence similarity using BLASTn, followed by filtering and clustering, with an optional final structure-based assessment. GLASSgo is available on GitHub, Docker Hub (https://hub.docker.com/r/lotts/glassgo) and as a web service (http://rna.informatik.uni-freiburg.de/GLASSgo/) (Raden et al., 2018). The latter two options in particular greatly simplify the use of GLASSgo by non-bioinformaticians. However, results may be difficult to reproduce later because GLASSgo makes heavy use of external resources, such as the NCBI BLAST databases, which are updated regularly. Furthermore, high-throughput analyses of hundreds or thousands of sRNAs and the integration of GLASSgo into automated workflows are not trivial. A framework that provides both workflow automation and reproducibility is Galaxy (Afgan et al., 2018). For tool integration, Galaxy offers a one-click installation solution, which automatically resolves all dependencies and keeps track of all versions (libraries, tools and XML wrapper versions).
Here, we present an updated version of GLASSgo and its integration into the Galaxy workflow management system utilizing Docker technology.

2 Results

2.1 Update of GLASSgo

With GLASSgo 1.5.2, we add the possibility to report the upstream regions of the sRNA homologs found. This is a prerequisite for inferring possible promoter elements of the sRNA genes, and hence facilitates the integration of sRNAs into transcription factor-based gene regulatory networks. To ensure compatibility with the latest NCBI databases, GLASSgo 1.5.2 now supports the use of NCBI accession numbers (ACC numbers) as unique identifiers. This adaptation requires the preparation of new lookup tables for the taxonomic classification, which also serve as a prerequisite for clade-specific searches that are often more powerful than unrestricted analyses. To ensure easy updating and retrieval for existing and new installations, these lookup tables have been stored in an open access repository (Zenodo: https://zenodo.org/record/1320180).
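To make the idea of an upstream region concrete, the following minimal Python sketch extracts the nucleotides located 5' of a hit, taking the strand into account. It is an illustration only and not the GLASSgo implementation: the function names, the 1-based inclusive coordinate convention and the truncation at sequence ends (circular replicons are not handled) are assumptions made for this example.

```python
# Illustrative sketch only; not GLASSgo's actual code.
# Assumptions: 1-based inclusive coordinates on the forward strand, start <= end.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genome: str, start: int, end: int, strand: str, length: int) -> str:
    """Return up to `length` nt located 5' of a hit, read in the hit's orientation.

    The region is truncated at the sequence boundaries.
    """
    if strand == "+":
        # For a forward-strand hit, upstream lies to the left of `start`.
        left = max(0, start - 1 - length)
        return genome[left:start - 1]
    if strand == "-":
        # For a reverse-strand hit, upstream lies to the right of `end`
        # on the forward strand and must be reverse-complemented.
        right = min(len(genome), end + length)
        return reverse_complement(genome[end:right])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    toy_genome = "ACGTTGCAAT"  # made-up sequence for demonstration
    print(upstream_region(toy_genome, 5, 7, "+", 3))
    print(upstream_region(toy_genome, 5, 7, "-", 3))
```

Reading the reverse-strand region as a reverse complement keeps all upstream sequences in the same orientation relative to the sRNA start, which is what a later promoter motif search would typically expect.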

2.2 Galaxy integration

The integration of GLASSgo into Galaxy is based on a Docker container. Therein, we provide GLASSgo together with all its dependencies (NCBI-BLAST+, RNApdist, …) and distribute it via Docker Hub. Upon new releases, the container is built automatically from source. This includes repository tests of the build process itself and functional tests of GLASSgo for different use cases. This Docker container can also be used for command-line-based analyses or for integration into custom analysis pipelines. For the Galaxy integration, we make use of the Galaxy ToolShed (Blankenberg et al., 2014), which is a one-click solution for tool installation on custom-hosted Galaxy instances. GLASSgo follows best practice guidelines and can be seamlessly installed from the Galaxy ToolShed (https://toolshed.g2.bx.psu.edu/view/computationaltranscriptomics/glassgo) using the admin interface. The installation procedure includes functional tests to ensure correct installation and comprehensive documentation of the usage of GLASSgo. Clade-specific searches require some additional configuration that is simplified by the included custom scripts. The automatic interplay of GLASSgo with the local Galaxy environment and external web resources for the Docker container and the lookup tables, which are hosted on Zenodo, is shown in Figure 1. We provide instruction videos that guide users through the installation and setup process (https://youtu.be/SiS2ThYDkdU) as well as the usage of GLASSgo (https://youtu.be/wFE7LFG9clQ) (Schäfer ).
Fig. 1.

Galaxy integration scheme of GLASSgo 1.5.2. The sources on GitHub are used to build a Docker container including all dependencies, which is available on Docker Hub. The Galaxy wrapper defines the user interface and manages the interaction with the Docker container. Special lookup tables required for clade-specific searches can be directly downloaded from Zenodo and are incorporated into Galaxy

2.3 Using GLASSgo in Galaxy

GLASSgo is part of the RNA workbench (Fallmann et al., 2019), which provides a public Galaxy instance with a set of tools for RNA-related tasks and is available at https://rna.usegalaxy.eu. However, the following description also applies to installations on local Galaxy instances. The user interface follows the design of the GLASSgo web server and is shown in Figure 1. The sRNA sequence of interest has to be uploaded to the user’s history in FASTA format. GLASSgo relies on BLAST; thus, a fundamental parameter is the database to search. Most Galaxy instances will already have a set of databases available for standard BLAST searches, and GLASSgo can use the same databases for its tasks. If the user wants to use a specialized database, e.g. a clade-specific or a custom database, GLASSgo within Galaxy offers two options. First, the user can choose a clade to which the BLAST searches are restricted, which is achieved with the aforementioned lookup tables. Second, users can use custom BLAST databases, for example created from sequences in their own Galaxy history. The usage of GLASSgo is shown in detail in the instruction video mentioned above.
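The role of a lookup table in a clade-specific search can be sketched in a few lines of Python. The example below assumes a hypothetical tab-separated table with one clade per line followed by a comma-separated list of accession numbers; the actual format of the GLASSgo lookup tables on Zenodo may differ. The accession numbers and clade names are made-up placeholders. GLASSgo restricts the BLAST search itself, whereas this sketch applies the same membership test to a list of hits after the fact, purely for illustration.

```python
# Illustrative sketch only. The table format (clade<TAB>acc1,acc2,...) and all
# identifiers below are assumptions, not the real GLASSgo lookup table layout.
from typing import Dict, Iterable, List, Set, Tuple


def parse_lookup_table(text: str) -> Dict[str, Set[str]]:
    """Parse 'clade<TAB>acc1,acc2,...' lines into a clade -> accessions mapping."""
    table: Dict[str, Set[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        clade, accessions = line.split("\t", 1)
        table[clade] = {acc.strip() for acc in accessions.split(",") if acc.strip()}
    return table


def filter_hits_by_clade(
    hits: Iterable[Tuple[str, str]], table: Dict[str, Set[str]], clade: str
) -> List[Tuple[str, str]]:
    """Keep (accession, sequence) pairs whose accession is listed for `clade`."""
    allowed = table.get(clade, set())
    return [(acc, seq) for acc, seq in hits if acc in allowed]


if __name__ == "__main__":
    # Placeholder table and hits, invented for this example.
    table_text = "# clade\taccessions\nCladeA\tXX_000001.1, XX_000002.1\nCladeB\tXX_000003.1\n"
    table = parse_lookup_table(table_text)
    hits = [("XX_000001.1", "ACGT"), ("XX_000003.1", "GGCC")]
    print(filter_hits_by_clade(hits, table, "CladeA"))
```

Keying the table on accession numbers rather than GI numbers mirrors the change described in Section 2.1.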

3 Discussion

The integration of GLASSgo into the Galaxy workflow management system offers many advantages over the standalone and web server versions, such as parameter tracking, version control, batch processing and pipeline development. Galaxy also follows the FAIR manifesto, which stands for ‘Findable, Accessible, Interoperable and Reusable’ (Wilkinson et al., 2016), and thereby ensures good scientific practice. GLASSgo can be easily installed into a running Galaxy instance through the ToolShed system. In addition, GLASSgo can now be integrated into larger workflows, for example to build RNA family models. Here, GLASSgo delivers the set of homologous sRNA sequences, and Infernal (Nawrocki and Eddy, 2013) is used to build the covariance model. In this regard, the incorporation of GLASSgo into Galaxy leverages its full potential with respect to workflow integration and increased usability. We decided on a Docker-based integration because this avoids dependency, compatibility and compilation issues, which frequently occur with other ways of distributing software. Finally, the new feature to include upstream sequences in the results enables new analyses, such as the search for conserved motifs in promoter regions. These motifs can then be used for further validation and filtering, but most importantly they allow the integration of sRNAs into gene regulatory networks. Together with recently available advanced tools for sRNA target prediction, this improvement represents another cornerstone for the integration of transcription factor-based gene regulatory networks with the post-transcriptional targets of bacterial sRNAs.
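As a minimal sketch of what a search for conserved promoter motifs in upstream sequences could look like, the Python example below scans each upstream region for a consensus motif while tolerating a few mismatches, and reports the fraction of sequences that contain it. It is not part of GLASSgo. The function names, the mismatch-based matching and the toy sequences are assumptions for illustration, and TATAAT is used only as the familiar bacterial -10 consensus. Dedicated motif discovery tools would be the appropriate choice for real analyses.

```python
# Illustrative sketch only; a naive consensus scan, not a motif discovery method.
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_motif(seq: str, motif: str, max_mismatches: int = 1) -> List[Tuple[int, str]]:
    """Return (0-based offset, window) for every window within `max_mismatches` of `motif`."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if hamming(window, motif) <= max_mismatches:
            hits.append((i, window))
    return hits


def motif_conservation(upstreams: Dict[str, str], motif: str, max_mismatches: int = 1) -> float:
    """Fraction of upstream sequences that contain at least one motif match."""
    if not upstreams:
        return 0.0
    with_motif = sum(1 for s in upstreams.values() if find_motif(s, motif, max_mismatches))
    return with_motif / len(upstreams)


if __name__ == "__main__":
    # Made-up upstream sequences keyed by placeholder identifiers.
    toy = {"hit_1": "GGTATAATGG", "hit_2": "ccTACAATcc", "hit_3": "GGGGGGGGGG"}
    for name, seq in toy.items():
        print(name, find_motif(seq, "TATAAT"))
    print(motif_conservation(toy, "TATAAT"))
```

A motif found in most upstream regions of the predicted homologs could serve as supporting evidence for a hit, in line with the validation and filtering use mentioned above.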

Funding

This work was supported by the Federal Ministry of Education and Research (BMBF) programs RNAProNet [031L0106B to W.R.H. and 031L0164A to B.V.], inteRNAct [031A310 to B.V.], the German-Israeli Foundation (GIF) [G-1311 to W.R.H.] and the Deutsche Forschungsgemeinschaft [GE 3159/1-1 to J.G.].

Conflict of Interest: none declared.

References

1.  Workflow for a Computational Analysis of an sRNA Candidate in Bacteria.

Authors:  Patrick R Wright; Jens Georg
Journal:  Methods Mol Biol       Date:  2018

2.  Infernal 1.1: 100-fold faster RNA homology searches.

Authors:  Eric P Nawrocki; Sean R Eddy
Journal:  Bioinformatics       Date:  2013-09-04       Impact factor: 6.937

3.  Freiburg RNA tools: a central online resource for RNA-focused research and teaching.

Authors:  Martin Raden; Syed M Ali; Omer S Alkhnbashi; Anke Busch; Fabrizio Costa; Jason A Davis; Florian Eggenhofer; Rick Gelhausen; Jens Georg; Steffen Heyne; Michael Hiller; Kousik Kundu; Robert Kleinkauf; Steffen C Lott; Mostafa M Mohamed; Alexander Mattheis; Milad Miladi; Andreas S Richter; Sebastian Will; Joachim Wolff; Patrick R Wright; Rolf Backofen
Journal:  Nucleic Acids Res       Date:  2018-07-02       Impact factor: 16.971

4.  The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update.

Authors:  Enis Afgan; Dannon Baker; Bérénice Batut; Marius van den Beek; Dave Bouvier; Martin Cech; John Chilton; Dave Clements; Nate Coraor; Björn A Grüning; Aysam Guerler; Jennifer Hillman-Jackson; Saskia Hiltemann; Vahid Jalili; Helena Rasche; Nicola Soranzo; Jeremy Goecks; James Taylor; Anton Nekrutenko; Daniel Blankenberg
Journal:  Nucleic Acids Res       Date:  2018-07-02       Impact factor: 16.971

5.  The RNA workbench 2.0: next generation RNA data analysis.

Authors:  Jörg Fallmann; Pavankumar Videm; Andrea Bagnacani; Bérénice Batut; Maria A Doyle; Tomas Klingstrom; Florian Eggenhofer; Peter F Stadler; Rolf Backofen; Björn Grüning
Journal:  Nucleic Acids Res       Date:  2019-07-02       Impact factor: 16.971

6.  Dissemination of scientific software with Galaxy ToolShed.

Authors:  Daniel Blankenberg; Gregory Von Kuster; Emil Bouvier; Dannon Baker; Enis Afgan; Nicholas Stoler; James Taylor; Anton Nekrutenko
Journal:  Genome Biol       Date:  2014-02-20       Impact factor: 13.583

7.  The FAIR Guiding Principles for scientific data management and stewardship.

Authors:  Mark D Wilkinson; Michel Dumontier; IJsbrand Jan Aalbersberg; Gabrielle Appleton; Myles Axton; Arie Baak; Niklas Blomberg; Jan-Willem Boiten; Luiz Bonino da Silva Santos; Philip E Bourne; Jildau Bouwman; Anthony J Brookes; Tim Clark; Mercè Crosas; Ingrid Dillo; Olivier Dumon; Scott Edmunds; Chris T Evelo; Richard Finkers; Alejandra Gonzalez-Beltran; Alasdair J G Gray; Paul Groth; Carole Goble; Jeffrey S Grethe; Jaap Heringa; Peter A C 't Hoen; Rob Hooft; Tobias Kuhn; Ruben Kok; Joost Kok; Scott J Lusher; Maryann E Martone; Albert Mons; Abel L Packer; Bengt Persson; Philippe Rocca-Serra; Marco Roos; Rene van Schaik; Susanna-Assunta Sansone; Erik Schultes; Thierry Sengstag; Ted Slater; George Strawn; Morris A Swertz; Mark Thompson; Johan van der Lei; Erik van Mulligen; Jan Velterop; Andra Waagmeester; Peter Wittenburg; Katherine Wolstencroft; Jun Zhao; Barend Mons
Journal:  Sci Data       Date:  2016-03-15       Impact factor: 6.444

8.  GLASSgo - Automated and Reliable Detection of sRNA Homologs From a Single Input Sequence.

Authors:  Steffen C Lott; Richard A Schäfer; Martin Mann; Rolf Backofen; Wolfgang R Hess; Björn Voß; Jens Georg
Journal:  Front Genet       Date:  2018-04-17       Impact factor: 4.599
