Literature DB >> 29790989

The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update.

Enis Afgan¹, Dannon Baker¹, Bérénice Batut², Marius van den Beek³, Dave Bouvier⁴, Martin Cech⁴, John Chilton⁴, Dave Clements¹, Nate Coraor⁴, Björn A Grüning^2,5, Aysam Guerler¹, Jennifer Hillman-Jackson⁴, Saskia Hiltemann⁶, Vahid Jalili⁷, Helena Rasche², Nicola Soranzo⁸, Jeremy Goecks⁷, James Taylor¹, Anton Nekrutenko⁴, Daniel Blankenberg⁹.

Abstract

Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially.

Entities: Chemical Disease Gene Species

Mesh：

Year: 2018 PMID： 29790989 PMCID： PMC6030816 DOI： 10.1093/nar/gky379

Source DB: PubMed Journal: Nucleic Acids Res ISSN： 0305-1048 Impact factor: 16.971

INTRODUCTION

Advances in biomedicine and biology increasingly rely on analysis of large datasets. Started in 2005, the Galaxy Project (https://galaxyproject.org) (1–3) maintains a focus on enabling data-driven biomedical science by pursuing three goals: (a) accessible data analysis serving all scientists regardless of their informatics expertise and tool developers seeking a wider audience and broad integration of their tools; (b) reproducible analyses regardless of the particular platform and (c) transparent communication of analyses, which in turn enables reuse and extension of analyses across communities of practice. The Galaxy Project consists of four complementary components: The main public Galaxy server (https://usegalaxy.org)—this server is the subject of this article and has been online since 2007. It features a rich toolset for large-scale genomics analyses, terabytes of public data for use, and hundreds of shared analysis histories, workflows, and interactive publication supplements. This server has more than 124,000 registered users whom run ∼245,000 analysis jobs each month. The Galaxy framework and software ecosystem (https://github.com/galaxyproject)—an open-source software package that anyone can use to run a Galaxy server on any Unix-based operating system. The Galaxy ecosystem includes a software development kit (SDK) for Galaxy tool development, API language bindings for multiple programming languages, software for scripting Galaxy interactions, and tools for automating setup and deployment of Galaxy and its plugins such as tools and visualizations. The Galaxy ToolShed (https://toolshed.g2.bx.psu.edu/)—a community-driven resource for the dissemination of Galaxy tools, workflows, and visualizations. This server functions as an ‘AppStore’ for Galaxy servers where developers and Galaxy admins can host, share, and install Galaxy tools, workflows and visualizations. The Galaxy Community (https://galaxyproject.org/community/)—distinct and complementary subcommunities make key contributions to all aspects of the Project. These subcommunities address the needs and desires of every category of stakeholder including users, administrators, developers, resource providers and educators. Galaxy has served hundreds of thousands of users, been used in >5700 scientific publications, and provided 500+ developers with a framework provisioning accessible, transparent, and reproducible data analysis (https://galaxyproject.org/galaxy-project/statistics/). Many instances of the framework have been installed, including Galaxy Main (https://usegalaxy.org) and over 99 publicly accessible servers (https://galaxyproject.org/public-galaxy-servers/), serving biomedical and other domain-specific research. Significant growth has occurred across all sectors of the Galaxy Project within the past two years (Figure 1).

Figure 1.

Circular barplot illustrating recent growth of the Galaxy Project across several independent facets. In the past two years, usage of the main public Galaxy server has increased 60%, the number of tools and supported versions has increased 53%, and the amount of data analyzed on the main server has increased 72%. A growing number of public instances (18% increase) and cloud-based Galaxy instances (38% increase) provide researchers with a wider range of options for scalability and application domains. Additionally, more developers (45% increase with 63% more commits to the codebase) contributed to the Galaxy framework and software ecosystem. Question and answer activity on the Galaxy Biostars forum increased 68%.

NEW FEATURES

Scalability

Scalability is amongst the most significant challenges that Galaxy faces as the size and number of biomedical and especially genomics datasets continues to grow. For instance, single-cell RNA-seq experiments routinely generate hundreds or thousands of primary datasets. As a web-based application, Galaxy must scale both in its web-based interface and on its backend server and do so in a multiuser environment. User interface scalability enables scientists to use the Galaxy web interface to analyze many datasets, apply (collective) operations on them, and design pipelines to analyze them. Galaxy implements a variety of features to facilitate analyzing large numbers of datasets, including workflows and collections. Our recent optimizations of the user interface (UI) yielded a significant improvement to frontend scalability. We benchmarked the optimizations by replicating an experiment conducted on single Hematopoietic stem cells and multipotent progenitors (4) to quantify the expression of 64 000 transcripts, which generates 11 872 history items. Galaxy ran this proof of concept experiment seamlessly using existing standard tools, whereas earlier versions of Galaxy would not have been able to support this analysis. Server scalability refers to the Galaxy's ability to execute many data analysis/manipulation tasks for many users. This is achieved by advantageously utilizing a range of available computing resources. The Galaxy framework runs on various platforms, from a standard laptop to institutional clusters and cloud-based platforms. Galaxy is highly versatile in its ability to deploy jobs (atomic units of work), as it can leverage a multitude of workload managers including Slurm (5), HTCondor (6), Apache Mesos (7) and Kubernetes (https://kubernetes.io), among others, in addition to a built-in lightweight job running system. Recent enhancements to Galaxy's job management include dynamic job destination assignment (which facilitate automatic job parameter-specific resource selection), delay in job queuing (e.g. for workflows), automatic job re-submission (e.g., on job failure due to a temporary cluster error), and means of implementing fair-share prioritization schemes. These features are being used on Galaxy Main (Figure 2) to leverage cloud computing resources for better job throughput. Specifically, Galaxy Main is now configured to take advantage of the XSEDE infrastructure (8) that includes Bridges and Stampede resources as well as the Jetstream cloud (9). The benefits of using these resources include the ability to run larger jobs, as shown in Figure 3. Additionally, use of these resources has enabled new types of analysis to be enabled on Main. Notably, this includes Galaxy Interactive Environments through the ability to use containerization technologies and provide sufficient isolation of individual jobs from other processes running on the same underlying compute infrastructure.

Figure 2.

Figure 3.

Enabling automated selection and use of specialized national cyberinfrastructure compute resources from Galaxy Main enhances user-experience. It is now possible to run jobs that are up to an order of magnitude larger than before by using Bridges and Stampede. New types of jobs, such as interactive environments (see Advances in tools section), that require execution isolation due to security concerns are enabled by utilizing virtualization facilitated by the Jetstream cloud. Consequently, it is possible to concurrently run more jobs due to the increase in processing capacity.

Schematic of servers and services in use at Galaxy Main. (A) A global overview of Galaxy Main resources. When users interact with usegalaxy.org, their browser connects to one of two frontends (shown as web-01/02) with file uploads being handled by web-03/04; each of these web servers connects to a database server and mounts a set of shared distributed file systems. Web-03/04 also prepares and schedules jobs using Slurm directly to manage compute tasks on fifteen dedicated compute nodes, which also directly mount the shared distributed file systems. A combination of Slurm and Pulsar (https://github.com/galaxyproject/pulsar) are used to manage tasks and for dataset file staging, respectively, on the Jetstream cloud at Indiana University (IU) and the Texas Advanced Computing Center (TACC). Communication between Galaxy and Pulsar is handled using the RabbitMQ (https://www.rabbitmq.com/) message broker. Additional jobs are sent to the supercomputer systems Bridges at Pittsburgh Supercomputing Center (PSC) and Stampede at TACC using Pulsar. These various compute resources are chosen based upon tool and job characteristics. See, e.g. https://github.com/galaxyproject/usegalaxy-playbook/wiki/Infrastructure for specific and up-to-date information. (B) Multiple frontend servers provide Galaxy content to users by utilizing round-robin load balancing. Nginx (https://nginx.org/) is used to serve HTTP content from the Galaxy uWSGI web application. Individual software processes are monitored and controlled using Supervisor (http://supervisord.org/). Each of these frontend servers connects to a PostgreSQL (https://www.postgresql.org/) database server. (C) Layout of data schemes used by Galaxy Main is optimized for application speed, concurrent access, and versioned content. Each Galaxy frontend server utilizes a combination of shared distributed file systems, CVMFS for versioned semi-static content and TACC’s Corral filesystem via NFS for mutable content, along with server-specific local file systems. (D) CernVM File System (CVMFS) infrastructure hosted by the Galaxy Project that is used at Main and available for access to any other Galaxy instance. Stratum 0 contains the single-source modifiable data repositories. File content is served using the Apache HTTP server (https://httpd.apache.org/). To enable redundancy and scaling to a large number of clients, Stratum 1 replica servers are hosted at multiple locations and utilize Squid (http://www.squid-cache.org/) for data caching. Additional replica servers can also be hosted by community members. Individual clients (Galaxy instances and compute nodes) access data content from Stratum 1 servers using a Filesystem in Userspace (FUSE) mount. Enabling automated selection and use of specialized national cyberinfrastructure compute resources from Galaxy Main enhances user-experience. It is now possible to run jobs that are up to an order of magnitude larger than before by using Bridges and Stampede. New types of jobs, such as interactive environments (see Advances in tools section), that require execution isolation due to security concerns are enabled by utilizing virtualization facilitated by the Jetstream cloud. Consequently, it is possible to concurrently run more jobs due to the increase in processing capacity. A complete Galaxy server with a full repertoire of tools and reference data can be run on major cloud platforms. These servers are launched independently by users, and come pre-configured with hundreds of tools and reasonable default settings typical of a production server. Notably, launched instances do not have usage quotas and can be customized to install any desired tool. We have designed a cloud-agnostic approach for leveraging these resources by developing the abstraction library CloudBridge (10) and a new CloudLaunch application. These two solutions make it possible to launch Galaxy instances across a variety of cloud providers while reducing the requirement to build and maintain cloud-specific resources (e.g. machine images, file systems). There are now 10 different flavors of Galaxy available for launching on major clouds including Amazon Web Services, Jetstream and Microsoft Azure (https://launch.usegalaxy.org).

Advances in tools

The Galaxy ToolShed (11) assumes the role of an AppStore for Galaxy instances by hosting thousands of tools. The ToolShed improves tool availability, deployment, and portability across Galaxy servers and computing environments.

Updated tool suite

Over the last two years, we have expanded both the quantity and quality of the tools available on the Galaxy ToolShed. As of April 2018, the ToolShed hosts 5628 tools, which shows 53% growth since 2016, and ∼2000 repositories had at least one new update. Examples of new tools include: GEMINI for exploring genetic variation (12); mothur for analyzing rRNA gene sequences (13); QIIME for quantitative microbiome analysis from raw DNA sequencing data (14); deepTools for explorative analysis of deeply sequence data (15,16); HiCexplorer (17) for analysis and visualization of Hi-C data; ChemicalToolBox for comprehensive access to cheminformatics libraries and drug discovery tools (18); minimap2 (https://arxiv.org/abs/1708.01492) and poretools for long read sequencing analysis (19); MultiQC (20) to aggregate multiple results into a single report; a new RNA-seq analysis tool suite with modern analysis tools such as Kallisto (21), Salmon (22), Deseq2 (23) and STAR-Fusion (24), and GenomeSpace (25), a cloud-based interoperability tool.

Tool environment and interface

The portability and backward-compatibility of the Galaxy tools environment is improved significantly. Accordingly, a tool configuration now includes a tool profile version, which is used to ensure compatibility between a version of a tool and its targeted Galaxy version. In addition, tool profile versions allow for the evolution of new and better tool defaults and behaviors while maintaining backwards compatibility. We also improved the ToolShed API and its interface to facilitate installing tools missing from an imported workflow. We improved the installation process so that restarting Galaxy is not required to use a newly installed tool.

Interactive analysis and visualization

Galaxy's UI makes it possible for anyone to run complex analyses. However, a complete analysis of genomic data often requires custom scripts or visualizations, especially at the beginning (data preparation) or end (data summarization) of analyses. To meet these customized needs, we recently introduced Galaxy Interactive Environments (26), an integration of Galaxy with Jupyter (RStudio is in development)—a commonly used interactive scripting platform. With Interactive Environments, Galaxy users benefit from existing computational infrastructure via both graphical UI and ad hoc scripting, or any combination of these. Galaxy's visualization framework (27) makes it possible to integrate a wide variety of Web-based and server-side visualizations. Through this framework, many new visualizations have been added to Galaxy, including Cytoscape (28), and the WebGL enabled 3D Protein viewer NGL (29), molecular interaction networks and macromolecular structures visualizations, and the 100+ visualizations available through BioJS (30), a rich set of community-driven JavaScript components for agile and interactive visualization of biological data.

User interface and experience enhancements

There are two common modes of data analysis: exploratory and pipeline execution. Galaxy enables simultaneous access to both of these. Users are able to interactively analyze their data by making use of individual tools in a trial-and-error manner. They are then able to automatically generate reusable and generalizable workflows from an ad hoc analysis. An interactive workflow editor is also available to modify or generate workflows from scratch. At any point in time, a user can seamlessly switch modes between interactively analyzing datasets and executing a workflow on these datasets. There is no analysis lock-in, and users can exercise full control, or make use of pre-existing pipelines. Importantly, these analysis artefacts, such as datasets, analysis histories, workflows, and visualizations can all be shared and copied by collaborators at the discretion of the analyst.

Client-side infrastructure

The client-side of Galaxy, which is the user-interface most people associate with Galaxy, has seen significant changes under the hood. The usage of server-side mako templates, for example to create forms, has been further reduced and replaced by client-side only code that communicates via the RESTful Galaxy API with the backend. This minimizes the number of full-page refreshes and improves response time by enabling partial page updates. The interface has been further enhanced to allow for drag-and-drop of files and datasets, presents a fuzzy search on dataset and tool metadata, and implements a modal scratchbook for visualizations and comparison of multiple datasets. Furthermore, the community has selected the Vue.js framework (https://vuejs.org/) as the base for future improvements allowing all UI elements to converge into a more reactive and future-proof interface. With the integration of Vue.js, the entire client-side build system was updated to utilize the latest web-technologies, to make routing and loading times faster, and to encourage rapid future interface improvements. While mostly transparent to users, these changes are the fundamental groundwork of a much more flexible UI framework that will enable visual enhancements and an improved user experience for years to come.

Webhooks

Inspired by user feedback and the need to quickly modify and adapt Galaxy's interface, we integrated a pluggable system to extend Galaxy's frontend. Webhooks provide an entry-point into the Galaxy UI, in which it is possible to add buttons, menu entries, or entire iframes. At these entry-points a developer can dynamically add client-side code (JavaScript, HTML, CSS) and interact with the rest of the Galaxy user-interface. By integrating Webhooks with the Galaxy API, it is also possible to trigger server-side functions from within a Webhook. Webhooks can be thoroughly customized and are enabled at the discretion of the Galaxy administrator.

Interactive tours

We have developed self-paced, interactive tours that users can step through to learn about Galaxy. These tours guide users step by step through using the interface including tools, workflows, and other features available in Galaxy. To simplify tour creation, a Tour Builder (https://github.com/TailorDev/galaxy-tourbuilder) has been created for recording, replaying, updating and exporting tours.

Improved workflows

Galaxy workflows have been extended in several ways. Switching between tool versions and upgrading workflows with new tool versions is now supported. A workflow can now be embedded in another, making it easier to create and edit workflows that have many common steps repeated. Many of these features have existed in in standalone workflow systems, such as Taverna (31), for sometime, but have been widely requested by Galaxy users. Workflows are now scheduled by a Galaxy server more efficiently and in the background, making it possible to execute larger workflows, generating tens of thousands of jobs, while providing instant feedback and a snappier user-experience. We have also enhanced Galaxy with initial support for running workflows defined in the Common Workflow Language (32) format.

Dataset collections

Galaxy Dataset Collections combine datasets to enable simultaneous analysis. They organize sets of datasets as potentially nested lists of objects allowing easier data handling and batch execution of tools. In addition to the related frontend improvements, and support of nesting collections together, we recently introduced specialized tools to be executed on collections (e.g. Collapse, which combines a list of datasets into a single dataset, Flatten which takes nested collections and produces a flat list of datasets, and Merge which takes two lists and creates a single unified list), and enabled uploading and downloading dataset collections to and from both user's local disk and Galaxy data libraries.

Infrastructure enhancements

In order to make Galaxy more robust in a production environment, we adopted technologies to enhance Galaxy's portability, security, reliability, and scalability. Galaxy now utilizes uWSGI (http://projects.unbit.it/uwsgi) as its default web application server. This adoption has several advantages, namely the ability to negate Python's limitations regarding concurrent tasks execution, built-in load balancing, scalability, improved fault tolerance and the possibility of restarting Galaxy uninterruptedly. Many tools available via Galaxy rely on the availability of reference and index data. To promote ease of use and efficient storage and compute resources, Galaxy is able to share a precomputed set of local reference data for tools to use. Previously, making this data available to the tools was a time intensive process where a Galaxy administrator had to install and properly configure the server, either manually or by using Data Managers (33). However, this resulted in much redundant effort required for each Galaxy server being configured. To streamline this process, we have made all the reference data we prepared for Galaxy Main available via a CernVM File System (CVMFS; (34)), a scalable and content-addressable file system. This repository currently hosts 5TB of pre-build reference data, which are versioned and shared publicly with read-only access. With minimal configuration, any instance of Galaxy, including Galaxy-Docker images, can attach to this file system and gain access to the same reference data available on Galaxy Main. To improve accessibility and fault-tolerance, this data source is replicated on servers located in Europe and Australia. Galaxy is powered by various open-source projects which are installed automatically, and used when needed. Galaxy is using the Conda package manager (https://conda.io) as its default tool dependency resolver, and offers support for virtualization and containerization technologies (e.g. Docker (https://www.docker.com) and Singularity (35)) to ensure a higher level of portability, if needed. By leveraging the Bioconda (https://doi.org/10.1101/207092) and the BioContainer (36) projects, Galaxy is able to provision and use reproducible tool execution environments ((37); https://doi.org/10.1101/200683). Galaxy is a generic data analysis framework, which can be configured for various application scenarios using a wide range of configuration parameters. To facilitate configuring these parameters with optimal values for a number of predefined application scenarios, the Galaxy project leverages Ansible (https://www.ansible.com), software for automated configuration and management of other software packages. We have developed and shared Ansible configurations for Galaxy Main, the main public Galaxy server, (https://github.com/galaxyproject/usegalaxy-playbook) and also a configurable generic playbook for setting up production instances on cloud resources, virtual machines, and bare metal (https://github.com/ARTbio/GalaxyKickStart). This playbook can be used as a reference for configuring a Galaxy instance for a production environment. The Galaxy-Docker project (https://github.com/bgruening/docker-galaxy-stable), delivers a production ready Galaxy instance in minutes and can be used as the basis for personalized, self-contained, portable instances of Galaxy, known as Galaxy flavors. Preconfigured by the Galaxy community, a plenitude of flavors already exist covering application scenarios, from BLAST+ (38,39), metagenomics (https://doi.org/10.1101/183970), ChIP-exo analysis, or RNA research (40). In addition to the facilitated and out-of-box functionality, these images provision isolated environments well-suited for experimenting with tools and Galaxy configurations, and are ideal for training courses, as demonstrated by the Galaxy Training Network. Server monitoring and issue management is crucial in production Galaxy instances. Galaxy has integrated a plugin module to submit user bug-reports to configurable endpoints such as mailing lists or GitHub issues. With this, Galaxy can be configured to send error reports to a local ticket system. The recent integration of Sentry (https://sentry.io/) for automated error tracking and reporting makes it easier for administrators to track both client- and server-side errors without requiring manual user bug reports.

COMMUNITY

Galaxy serves several distinct communities: researchers, tool developers, resource providers, trainers, and trainees. To centralize resources for all communities, we have developed the Galaxy Community Hub (https://galaxyproject.org) for all things Galaxy. The Hub uses a modified wiki approach, with content written in Markdown, a simple formatting language, and then built into a static website. Anyone can update the Markdown documents using GitHub pull requests, a standard approach for collaborating on code and documentation on GitHub projects. Submitted pull requests are reviewed and merged, and the Hub site is automatically regenerated and updated, resulting in high-quality reviewed content that can be updated by any member of the Galaxy community. The Hub includes a full list of public Galaxy servers (https://galaxyproject.org/public-galaxy-servers), a large set of tutorials for learning to use Galaxy and perform genomic analyses, extensive documentation on deploying and administering a Galaxy server in the Cloud or on local hardware, and upcoming events. We also maintain an annotated listing of the >5000 publications referencing Galaxy via the free and open-source Zotero service (https://www.zotero.org/groups/1732893/galaxy). The Main Galaxy server has over 124 000 registered users and ∼2000 new users register each month. On average, 20 000 unique users execute over 245 000 analysis jobs by accessing 750 different tools every month. With such an active user-base, questions on platform and tool usage, as well as general research questions (41), are common. To efficiently assist users in performing research, we provide a Biostars (42) Question and Answers forum (https://biostar.usegalaxy.org/) that leverages the knowledge and strength of community members to provide support. This forum is monitored and moderated by core team members, but the Galaxy user community provides many answers. Help is also available through live chat with the team and community members via Gitter and IRC chat services, which are used most often by developers and administrators. In addition to the online help and documentation, the Galaxy Training Network has developed comprehensive tutorials and workflows for performing common data analysis tasks, providing topic-specific introduction slides, hands-on material, sample data, and even playable Galaxy tours (https://doi.org/10.1101/225680). Many in-person events that highlight and build the Galaxy community occur each year (https://galaxyproject.org/events/). These include free or low-cost hands-on workshops and training sessions that have been hosted by the community on six continents. The Galaxy Community Conference (GCC) is an annual conference that was first held in 2010. GCC alternates between Europe and the United States, includes two full days of training, two days of coding and data analysis hackathons, and two days of oral and poster presentations. Galaxy conferences have had over two hundred attendees each year since 2012, and over eleven hundred different researchers have attended since 2010. Our 2018 conference will be hosted jointly with the Bioinformatics Open Source Conference (BOSC) in an effort to promote and centralize discussion of open-source software for bioinformatics. Another core area of community focus is tool development and availability. The Intergalactic Utilities Commission (IUC; https://galaxyproject.org/iuc/) is a community-based organization that defines best-practices for tool development that help ensure the availability of high-quality tools in the ToolShed. It is a self-organizing and self-regulating group that has grown by six new members in the last two years and is primarily composed of individuals outside of the core Galaxy development team. The IUC is only one of many tool contributors, with the ToolShed allowing any member of the community to share tools that they have added to Galaxy. To assist community members with tool development and distribution, a command-line tool named Planemo (https://github.com/galaxyproject/planemo) has been developed. Planemo provides functionality for verifying best-practice adherence, testing, installation and uploading of tools to the ToolShed. Community contributions have helped the Galaxy framework and its tool suite to grow considerably. One hundred and seventy-four developers, who have collectively produced 13 135 commits within just the past two years (63% increase since January 2016), have improved Galaxy's scalability, functionality, and usability. The project utilizes the Travis and Jenkins continuous integration (CI) services to automatically execute comprehensive test suites on each set of proposed code changes. This strategy helps prevent the introduction of bugs to the codebase and improves review time. By harnessing the open-source community and modern software development practices, we are able to release a new stable version of the Galaxy framework every four months. Current future directions include enabling data and compute federation; tighter coupling of Interactive Environments with provenance and reuse; ToolShed installation and development enhancements; continued work on collections, workflows, analysis interfaces and history views; additional training material; improving statistical usage tracking and instrumentation; and much more. For anyone interested in getting involved with Galaxy development, we invite them to read the project's Contributing and Code of Conduct documents, review open issues, and explore the current roadmap, all which are available from the Galaxy GitHub repository (https://github.com/galaxyproject/galaxy/).

33 in total

1. Cytoscape: a software environment for integrated models of biomolecular interaction networks.

Authors: Paul Shannon; Andrew Markiel; Owen Ozier; Nitin S Baliga; Jonathan T Wang; Daniel Ramage; Nada Amin; Benno Schwikowski; Trey Ideker
Journal: Genome Res Date: 2003-11 Impact factor: 9.043

2. Near-optimal probabilistic RNA-seq quantification.

Authors: Nicolas L Bray; Harold Pimentel; Páll Melsted; Lior Pachter
Journal: Nat Biotechnol Date: 2016-04-04 Impact factor: 54.908

3. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud.

Authors: Katherine Wolstencroft; Robert Haines; Donal Fellows; Alan Williams; David Withers; Stuart Owen; Stian Soiland-Reyes; Ian Dunlop; Aleksandra Nenadic; Paul Fisher; Jiten Bhagat; Khalid Belhajjame; Finn Bacall; Alex Hardisty; Abraham Nieva de la Hidalga; Maria P Balcazar Vargas; Shoaib Sufi; Carole Goble
Journal: Nucleic Acids Res Date: 2013-05-02 Impact factor: 16.971

4. GEMINI: integrative exploration of genetic variation and genome annotations.

Authors: Umadevi Paila; Brad A Chapman; Rory Kirchner; Aaron R Quinlan
Journal: PLoS Comput Biol Date: 2013-07-18 Impact factor: 4.475

5. Wrangling Galaxy's reference data.

Authors: Daniel Blankenberg; James E Johnson; James Taylor; Anton Nekrutenko
Journal: Bioinformatics Date: 2014-02-28 Impact factor: 6.937

6. deepTools: a flexible platform for exploring deep-sequencing data.

Authors: Fidel Ramírez; Friederike Dündar; Sarah Diehl; Björn A Grüning; Thomas Manke
Journal: Nucleic Acids Res Date: 2014-05-05 Impact factor: 16.971

7. Poretools: a toolkit for analyzing nanopore sequence data.

Authors: Nicholas J Loman; Aaron R Quinlan
Journal: Bioinformatics Date: 2014-08-20 Impact factor: 6.937

8. MultiQC: summarize analysis results for multiple tools and samples in a single report.

Authors: Philip Ewels; Måns Magnusson; Sverker Lundin; Max Käller
Journal: Bioinformatics Date: 2016-06-16 Impact factor: 6.937

9. The RNA workbench: best practices for RNA and high-throughput sequencing bioinformatics in Galaxy.

Authors: Björn A Grüning; Jörg Fallmann; Dilmurat Yusuf; Sebastian Will; Anika Erxleben; Florian Eggenhofer; Torsten Houwaart; Bérénice Batut; Pavankumar Videm; Andrea Bagnacani; Markus Wolfien; Steffen C Lott; Youri Hoogstrate; Wolfgang R Hess; Olaf Wolkenhauer; Steve Hoffmann; Altuna Akalin; Uwe Ohler; Peter F Stadler; Rolf Backofen
Journal: Nucleic Acids Res Date: 2017-07-03 Impact factor: 16.971

10. NCBI BLAST+ integrated into Galaxy.

Authors: Peter J A Cock; John M Chilton; Björn Grüning; James E Johnson; Nicola Soranzo
Journal: Gigascience Date: 2015-08-25 Impact factor: 6.524

987 in total

1. Transcriptomic analysis of hookworm Ancylostoma ceylanicum life cycle stages reveals changes in G-protein coupled receptor diversity associated with the onset of parasitism.

Authors: James P Bernot; Gabriella Rudy; Patti T Erickson; Ramesh Ratnappan; Meseret Haile; Bruce A Rosa; Makedonka Mitreva; Damien M O'Halloran; John M Hawdon
Journal: Int J Parasitol Date: 2020-06-25 Impact factor: 3.981

2. Three C. elegans srf-6 mutants carry nsy-1 mutations (srf-6 is nsy-1 I).

Authors: Nicholas Van Sciver; Jennifer Pulkowski; Samuel Politz
Journal: MicroPubl Biol Date: 2019-07-04

3. Species-specific mechanisms of cytotoxicity toward immune cells determine the successful outcome of Vibrio infections.

Authors: Tristan Rubio; Daniel Oyanedel; Yannick Labreuche; Eve Toulza; Xing Luo; Maxime Bruto; Cristian Chaparro; Marta Torres; Julien de Lorgeril; Philippe Haffner; Jeremie Vidal-Dupiol; Arnaud Lagorce; Bruno Petton; Guillaume Mitta; Annick Jacq; Frédérique Le Roux; Guillaume M Charrière; Delphine Destoumieux-Garzón
Journal: Proc Natl Acad Sci U S A Date: 2019-06-20 Impact factor: 11.205

4. Salmonella enterica Serovar Hvittingfoss in Bar-Tailed Godwits (Limosa lapponica) from Roebuck Bay, Northwestern Australia.

Authors: Hannah G Smith; David C Bean; Jane Hawkey; Rohan H Clarke; Richard Loyn; Jo-Ann Larkins; Chris Hassell; Mary Valcanis; William Pitchers; Andrew R Greenhill
Journal: Appl Environ Microbiol Date: 2020-09-17 Impact factor: 4.792

5. Gene-Specific Control of tRNA Expression by RNA Polymerase II.

Authors: Alan Gerber; Keiichi Ito; Chi-Shuen Chu; Robert G Roeder
Journal: Mol Cell Date: 2020-04-15 Impact factor: 17.970

6. Mask, a component of the Hippo pathway, is required for Drosophila eye morphogenesis.

Authors: Miles W DeAngelis; Emily W McGhie; Joseph D Coolon; Ruth I Johnson
Journal: Dev Biol Date: 2020-05-25 Impact factor: 3.582

7. HDAC7 regulates histone 3 lysine 27 acetylation and transcriptional activity at super-enhancer-associated genes in breast cancer stem cells.

Authors: Corrado Caslini; Sunhwa Hong; Yuguang J Ban; Xi S Chen; Tan A Ince
Journal: Oncogene Date: 2019-08-02 Impact factor: 9.867