Richard A Schäfer (1), Steffen C Lott (2), Jens Georg (2), Björn A Grüning (3), Wolfgang R Hess (2), Björn Voß (1)
1. Computational Biology, Institute of Biochemical Engineering, University of Stuttgart, Stuttgart 70569, Germany.
2. Genetics and Experimental Bioinformatics, Institute of Biology III, University of Freiburg, Freiburg 79104, Germany.
3. Bioinformatics, Institute of Computer Science, University of Freiburg, Freiburg 79110, Germany.
Abstract
MOTIVATION: The correct prediction of bacterial sRNA homologs is a prerequisite for many downstream analyses based on comparative genomics, but it is frequently challenging due to the short length and distinct heterogeneity of such homologs. GLobal Automatic Small RNA Search go (GLASSgo) is an efficient tool for the prediction of sRNA homologs from a single input query. To make the algorithm available to a broader community, we offer a Docker container along with a free-access web service. For non-computer scientists, the web service provides a user-friendly interface. However, capabilities were lacking so far for batch processing, version control and direct interaction with compatible software applications as a workflow management system can provide.
RESULTS: Here, we present GLASSgo 1.5.2, an updated version that is fully incorporated into the workflow management system Galaxy. The improved version contains a new feature for extracting the upstream regions, allowing the search for conserved promoter elements. Additionally, it supports the use of accession numbers instead of the outdated GI numbers, which widens the applicability of the tool.
AVAILABILITY AND IMPLEMENTATION: GLASSgo is available at https://github.com/lotts/GLASSgo/ under the MIT license and is accompanied by instruction and application data. Furthermore, it can be installed into any Galaxy instance using the Galaxy ToolShed.
1 Introduction
One of the most fundamental analyses of newly discovered genes is the search for potential homologs in related species and beyond. Regulatory small RNAs (sRNAs) are often important post-transcriptional regulators of gene expression in bacteria, and information about their homologs in other organisms is crucial for characterizing them. However, the identification of homologous sRNA genes can be challenging due to their relatively short length, frequently low sequence conservation and the absence of reading frames and translation signals. Therefore, we developed GLobal Automatic Small RNA Search go (GLASSgo) for the fast and reliable search for sRNA homologs starting from a single sRNA query (Lott et al., 2018). Since its initial release in 2018, this algorithm has filled a gap in the computational analysis of sRNAs in bacteria (Wright et al., 2018). Briefly, GLASSgo performs iterative, sensitive searches for sequence similarity using BLASTn, combined with filtering and clustering, with an optional final structure-based assessment.
GLASSgo is available on GitHub, on Docker Hub (https://hub.docker.com/r/lotts/glassgo) and as a web service (http://rna.informatik.uni-freiburg.de/GLASSgo/) (Raden et al., 2018). The latter two options in particular greatly simplify the use of GLASSgo by non-bioinformaticians. However, results may be difficult to reproduce later because GLASSgo makes heavy use of external resources, such as the NCBI BLAST databases, which are updated regularly. Furthermore, high-throughput analyses of hundreds or thousands of sRNAs and the integration of GLASSgo into automated workflows are not trivial. A framework that provides both workflow automation and reproducibility is Galaxy (Afgan et al., 2018). For tool integration, Galaxy offers a one-click installation solution, which automatically resolves all dependencies and keeps track of all versions (libraries, tools and XML wrapper versions).
Here, we present an updated version of GLASSgo and its integration into the Galaxy workflow management system utilizing Docker technology.
2 Results
2.1 Update of GLASSgo
With GLASSgo 1.5.2, we add the option to report the upstream regions of the sRNA homologs found. This is a prerequisite for inferring possible promoter elements of the sRNA genes and hence facilitates the integration of sRNAs into transcription factor-based gene regulatory networks. To ensure compatibility with the latest NCBI databases, GLASSgo 1.5.2 now supports NCBI accession numbers (ACC numbers) as unique identifiers. This adaptation required the preparation of new lookup tables for the taxonomic classification, which are also a prerequisite for clade-specific searches that are often more powerful than unrestricted analyses. To ensure easy updating and retrieval for existing and new installations, these lookup tables are stored in an open access repository (Zenodo: https://zenodo.org/record/1320180).
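To make the notion of an upstream region concrete, the following minimal sketch shows one way such a region could be cut out for a hit on either strand. It is an illustration under stated assumptions and not GLASSgo's actual implementation: the function names, the 1-based inclusive coordinate convention and the treatment of the sequence as linear (no wrapping around circular replicons) are all assumptions made here.

```python
# Illustrative sketch only; names and conventions are assumptions,
# not taken from the GLASSgo source code.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(genome: str, start: int, end: int, strand: str, length: int) -> str:
    """Return up to `length` nt upstream of a hit, read 5'->3' on the hit's strand.

    `start` and `end` are 1-based, inclusive forward-strand coordinates with
    start <= end. The region is truncated at the sequence ends (assumption:
    the sequence is treated as linear).
    """
    if strand == "+":
        # Upstream of a forward-strand hit lies to the left of `start`.
        lo = max(0, start - 1 - length)
        return genome[lo:start - 1]
    if strand == "-":
        # Upstream of a reverse-strand hit lies to the right of `end`
        # on the forward strand and must be reverse-complemented.
        hi = min(len(genome), end + length)
        return reverse_complement(genome[end:hi])
    raise ValueError("strand must be '+' or '-'")


# Toy sequence; the hit occupies positions 7..9 ("GGG").
genome = "AAACCCGGGTTAACGT"
print(upstream_region(genome, 7, 9, "+", 3))  # CCC
print(upstream_region(genome, 7, 9, "-", 3))  # TAA
```

The strand-aware handling matters because promoter elements of a reverse-strand sRNA are located at higher forward-strand coordinates than the gene itself.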
2.2 Galaxy integration
The integration of GLASSgo into Galaxy is based on a Docker container. In this container, we provide GLASSgo together with all its dependencies (NCBI-BLAST+, RNApdist, …) and distribute it via Docker Hub. Upon each new release, the container is built automatically from source. This includes repository tests of the build process itself and functional tests of GLASSgo for different use cases. The Docker container can also be used for command-line-based analyses or for integration into custom analysis pipelines. For the Galaxy integration, we make use of the Galaxy ToolShed (Blankenberg et al., 2014), which is a one-click solution for tool installation on custom hosted Galaxy instances. GLASSgo follows best practice guidelines and can be seamlessly installed from the Galaxy ToolShed (https://toolshed.g2.bx.psu.edu/view/computationaltranscriptomics/glassgo) using the admin interface. The installation procedure includes functional tests to ensure correct installation and comprehensive documentation of the usage of GLASSgo. Clade-specific searches require some additional configuration, which is simplified by the included custom scripts. The automatic interplay of GLASSgo with the local Galaxy environment and with the external web resources for the Docker container and the lookup tables, which are hosted on Zenodo, is shown in Figure 1. We provide instruction videos that guide users through the installation and setup process (https://youtu.be/SiS2ThYDkdU) as well as the usage of the tool (https://youtu.be/wFE7LFG9clQ) (Schäfer ).
Fig. 1.
Galaxy integration scheme of GLASSgo 1.5.2. The sources on GitHub are used to build a Docker container including all dependencies, which is available on Docker Hub. The Galaxy wrapper defines the user interface and manages the interaction with the Docker container. Special lookup tables required for clade-specific searches can be directly downloaded from Zenodo and are incorporated into Galaxy
2.3 Using GLASSgo in Galaxy
GLASSgo is part of the RNA workbench (Fallmann et al., 2019), which provides a public Galaxy instance with a set of tools for RNA-related tasks and is available at https://rna.usegalaxy.eu. However, the following description also applies to installations on local Galaxy instances. The user interface follows the design of the GLASSgo web server and is shown in Figure 1. The sRNA sequence of interest has to be uploaded to the user’s history in FASTA format. GLASSgo relies on BLAST, so a fundamental parameter is the database to search in. Most Galaxy instances will have a set of databases already available for standard BLAST searches, and GLASSgo can use the same databases for its tasks. If the user wants to use a specialized database, e.g. a clade-specific or a custom database, GLASSgo within Galaxy offers two options. First, the user can choose a clade to which the BLAST searches are restricted, which is achieved with the aforementioned lookup tables. Second, users can use custom BLAST databases, for example created from sequences in their own Galaxy history. The usage of GLASSgo is shown in detail in the instruction video mentioned above.
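The idea behind a clade restriction driven by a lookup table can be sketched as follows. This is a hedged illustration only: the table layout (accession mapped to a root-first lineage), the function names and the placeholder accessions are hypothetical and do not reflect the actual format of the lookup tables hosted on Zenodo.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical lookup table: accession -> taxonomic lineage, root first.
# The accessions are placeholders, not real NCBI records.
LINEAGES: Dict[str, Tuple[str, ...]] = {
    "EXAMPLE_0001.1": ("Bacteria", "Cyanobacteria", "Synechocystis"),
    "EXAMPLE_0002.1": ("Bacteria", "Proteobacteria", "Escherichia"),
    "EXAMPLE_0003.1": ("Bacteria", "Cyanobacteria", "Nostoc"),
}


def in_clade(accession: str, clade: str, lineages: Dict[str, Tuple[str, ...]]) -> bool:
    """True if `clade` occurs anywhere in the lineage of `accession`.

    Accessions missing from the table are treated as outside the clade.
    """
    return clade in lineages.get(accession, ())


def restrict_to_clade(
    accessions: Iterable[str], clade: str, lineages: Dict[str, Tuple[str, ...]]
) -> List[str]:
    """Keep only the accessions that belong to `clade`, preserving input order."""
    return [acc for acc in accessions if in_clade(acc, clade, lineages)]


candidates = ["EXAMPLE_0001.1", "EXAMPLE_0002.1", "EXAMPLE_0003.1", "UNKNOWN_0009.1"]
print(restrict_to_clade(candidates, "Cyanobacteria", LINEAGES))
# ['EXAMPLE_0001.1', 'EXAMPLE_0003.1']
```

Keying the table by accession number rather than GI number mirrors the identifier change described in Section 2.1; whether the restriction is applied before or after the BLAST step is not specified here.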
3 Discussion
The integration of GLASSgo into the Galaxy workflow management system offers many advantages over the standalone and web server versions, such as parameter tracking, version control, batch processing and pipeline development. Galaxy also follows the FAIR manifesto, which stands for ‘Findable, Accessible, Interoperable and Reusable’ (Wilkinson et al., 2016), and thereby supports good scientific practice. GLASSgo can be easily installed into a running Galaxy instance through the ToolShed system. In addition, GLASSgo can now be integrated into larger workflows, for example to build RNA family models. Here, it delivers the set of homologous sRNA sequences, and Infernal (Nawrocki ) is then used to build the covariance model. In this regard, the incorporation of GLASSgo into Galaxy leverages its full potential with respect to workflow integration and increased usability. We decided on a Docker-based integration because it avoids the dependency, compatibility and compilation issues that frequently occur with other ways of distributing software. Finally, the new feature to include upstream sequences in the results enables new analyses, such as the search for conserved motifs in promoter regions. These motifs can then be used for further validation and filtering, but most importantly they allow the integration of sRNAs into gene regulatory networks. Together with recently available advanced tools for sRNA target prediction, this improvement represents another cornerstone for the integration of transcription factor-based gene regulatory networks with the post-transcriptional targets of bacterial sRNAs.
Funding
This work was supported by the Federal Ministry of Education and Research (BMBF) programs RNAProNet [031L0106B to W.R.H. and 031L0164A to B.V.], inteRNAct [031A310 to B.V.], the German-Israeli Foundation (GIF) [G-1311 to W.R.H.] and the Deutsche Forschungsgemeinschaft [GE 3159/1-1 to J.G.].
Conflict of Interest: none declared.
References
Authors: Martin Raden; Syed M Ali; Omer S Alkhnbashi; Anke Busch; Fabrizio Costa; Jason A Davis; Florian Eggenhofer; Rick Gelhausen; Jens Georg; Steffen Heyne; Michael Hiller; Kousik Kundu; Robert Kleinkauf; Steffen C Lott; Mostafa M Mohamed; Alexander Mattheis; Milad Miladi; Andreas S Richter; Sebastian Will; Joachim Wolff; Patrick R Wright; Rolf Backofen. Journal: Nucleic Acids Res. Date: 2018-07-02.
Authors: Enis Afgan; Dannon Baker; Bérénice Batut; Marius van den Beek; Dave Bouvier; Martin Cech; John Chilton; Dave Clements; Nate Coraor; Björn A Grüning; Aysam Guerler; Jennifer Hillman-Jackson; Saskia Hiltemann; Vahid Jalili; Helena Rasche; Nicola Soranzo; Jeremy Goecks; James Taylor; Anton Nekrutenko; Daniel Blankenberg. Journal: Nucleic Acids Res. Date: 2018-07-02.
Authors: Jörg Fallmann; Pavankumar Videm; Andrea Bagnacani; Bérénice Batut; Maria A Doyle; Tomas Klingstrom; Florian Eggenhofer; Peter F Stadler; Rolf Backofen; Björn Grüning. Journal: Nucleic Acids Res. Date: 2019-07-02.
Authors: Daniel Blankenberg; Gregory Von Kuster; Emil Bouvier; Dannon Baker; Enis Afgan; Nicholas Stoler; James Taylor; Anton Nekrutenko. Journal: Genome Biol. Date: 2014-02-20.
Authors: Mark D Wilkinson; Michel Dumontier; IJsbrand Jan Aalbersberg; Gabrielle Appleton; Myles Axton; Arie Baak; Niklas Blomberg; Jan-Willem Boiten; Luiz Bonino da Silva Santos; Philip E Bourne; Jildau Bouwman; Anthony J Brookes; Tim Clark; Mercè Crosas; Ingrid Dillo; Olivier Dumon; Scott Edmunds; Chris T Evelo; Richard Finkers; Alejandra Gonzalez-Beltran; Alasdair J G Gray; Paul Groth; Carole Goble; Jeffrey S Grethe; Jaap Heringa; Peter A C 't Hoen; Rob Hooft; Tobias Kuhn; Ruben Kok; Joost Kok; Scott J Lusher; Maryann E Martone; Albert Mons; Abel L Packer; Bengt Persson; Philippe Rocca-Serra; Marco Roos; Rene van Schaik; Susanna-Assunta Sansone; Erik Schultes; Thierry Sengstag; Ted Slater; George Strawn; Morris A Swertz; Mark Thompson; Johan van der Lei; Erik van Mulligen; Jan Velterop; Andra Waagmeester; Peter Wittenburg; Katherine Wolstencroft; Jun Zhao; Barend Mons. Journal: Sci Data. Date: 2016-03-15.
Authors: Steffen C Lott; Richard A Schäfer; Martin Mann; Rolf Backofen; Wolfgang R Hess; Björn Voß; Jens Georg. Journal: Front Genet. Date: 2018-04-17.