
Jflow: a workflow management system for web applications.

Jérôme Mariette¹, Frédéric Escudié¹, Philippe Bardou², Ibouniyamine Nabihoudine¹, Céline Noirot¹, Marie-Stéphane Trotard¹, Christine Gaspin¹, Christophe Klopp³

Abstract

SUMMARY: Biologists produce large data sets and need rich yet simple web portals in which they can upload and analyze their files. Providing such tools requires masking the complexity of the underlying High Performance Computing (HPC) environment. The connection between the interface and the computing infrastructure is usually specific to each portal. With Jflow, we introduce a Workflow Management System (WMS) composed of jQuery plug-ins, which can easily be embedded in any web application, and a Python library providing all the features required to set up, run and monitor workflows.
AVAILABILITY AND IMPLEMENTATION: Jflow is available under the GNU General Public License (GPL) at http://bioinfo.genotoul.fr/jflow. The package comes with full documentation, a quick start guide and a running test portal.
CONTACT: Jerome.Mariette@toulouse.inra.fr.
© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

Year: 2015    PMID: 26454273    PMCID: PMC5859998    DOI: 10.1093/bioinformatics/btv589

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937


1 Introduction

Building rich web environments that help scientists analyze their data is a common trend in bioinformatics. Specialized web portals such as MG-RAST (Meyer ), MetaVir (Roux ) or NG6 (Mariette ) provide multiple services and analysis tools in an integrated manner for specific experiments or data types. These applications require WMS features to manage and execute their computational pipelines. Generic WMS such as Galaxy (Goecks ), Ergatis (Orvis ) or Mobyle (Néron ) provide a user-friendly graphical interface that eases workflow creation and execution. Unfortunately, these environments come with their own interface, which complicates their integration into existing web tools. Other WMS such as weaver (Bui ), Snakemake (Köster ), Ruffus (Goodstadt, 2010) or Cosmos (Gafni ) provide developers with a framework or a domain-specific language for building and running workflows. These packages offer the flexibility and power of a high-level programming language for defining components and workflows, but they do not provide a user interface. Jflow combines a user-friendly interface with an intuitive Python API. To our knowledge, it is the only WMS designed to be embedded in any web application, thanks to its organization as jQuery (http://jquery.com/) plug-ins.

2 Methods

The Jflow user interface consists of five jQuery plug-ins providing user-oriented views: availablewf lists all runnable workflows accessible to the user; activewf monitors all started, completed, failed, aborted and reset workflows; wfform presents the editable parameters of a workflow in a form; wfoutputs displays all outputs produced by the workflow, organized per component; and wfstatus shows the workflow execution state as a list or as an execution graph. The graph visualization uses the Cytoscape Web JavaScript plug-in (Lopes ). The plug-ins expose multiple communication methods and events. They interact with the server side through Jflow's REST API, which runs under a CherryPy (http://www.cherrypy.org/) web server. The included server uses the JSONP communication technique, enabling cross-domain requests.

To be available from the jQuery plug-ins, workflows have to be implemented using the Jflow API. A Jflow component is in charge of one execution step. Adding a component to the system requires writing a Python Component subclass, and Jflow offers several ways to ease component creation. To wrap a single command line, the developer can give a position or a flag for each parameter. Jflow also embeds an XML parser that can run unmodified Mobyle (Néron ) components. Finally, to let developers integrate components from other WMS, Jflow provides a skeleton class that only requires implementing the parsing step.

A workflow chains components. It is represented by a directed acyclic graph (DAG) in which nodes represent jobs and edges represent links between outputs and inputs. When paths are disjoint, jobs run in parallel. A Jflow workflow is built as a Workflow subclass: components are added to the workflow as variables and chained by linking outputs to inputs. To define the parameters presented to the end user, Jflow provides several class methods. Each parameter has at least a name, a help text for the user and a data type.
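To make the component and DAG model above more concrete, here is a minimal Python sketch. It is not the Jflow API: `ToyComponent`, `ToyWorkflow`, `add`, `stages` and the command names in the usage lines (`qc_tool`, `merge_tool`) are all hypothetical. It assumes that each component wraps one command line whose arguments are given either by position or by flag, and that chaining a component after others means their outputs feed its inputs. Grouping the DAG into levels with Kahn's topological sort illustrates why jobs on disjoint paths can run in parallel: components in the same level share no path.

```python
# Hypothetical sketch: ToyComponent/ToyWorkflow are illustrative, not Jflow's classes.
from collections import defaultdict


class ToyComponent:
    """One execution step wrapping a single command line."""

    def __init__(self, name, executable, positional=(), flags=None):
        self.name = name
        self.executable = executable
        self.positional = list(positional)  # arguments given by position
        self.flags = dict(flags or {})      # arguments given as "flag value"
        self.upstream = []                  # components whose outputs feed this one

    def command(self):
        """Build the argument list: executable, then flags, then positional arguments."""
        args = [self.executable]
        for flag, value in self.flags.items():
            args += [flag, str(value)]
        args += [str(p) for p in self.positional]
        return args


class ToyWorkflow:
    """A DAG whose nodes are components and whose edges are output -> input links."""

    def __init__(self):
        self.components = []

    def add(self, component, after=()):
        """Register a component, chaining it after the components that feed it."""
        component.upstream.extend(after)
        self.components.append(component)
        return component

    def stages(self):
        """Group components into levels with Kahn's topological sort.

        A component enters a level once all of its upstream components are in
        earlier levels, so components in the same level share no path and can
        run in parallel.
        """
        indegree = {c.name: len(c.upstream) for c in self.components}
        children = defaultdict(list)
        for c in self.components:
            for parent in c.upstream:
                children[parent.name].append(c.name)
        ready = [c.name for c in self.components if indegree[c.name] == 0]
        levels = []
        while ready:
            levels.append(sorted(ready))
            next_ready = []
            for name in ready:
                for child in children[name]:
                    indegree[child] -= 1
                    if indegree[child] == 0:
                        next_ready.append(child)
            ready = next_ready
        if sum(len(level) for level in levels) != len(self.components):
            raise ValueError("workflow graph contains a cycle")
        return levels


if __name__ == "__main__":
    wf = ToyWorkflow()
    qc1 = wf.add(ToyComponent("qc_s1", "qc_tool", positional=["s1.fastq"]))
    qc2 = wf.add(ToyComponent("qc_s2", "qc_tool", positional=["s2.fastq"]))
    report = wf.add(ToyComponent("report", "merge_tool", flags={"-o": "report.html"},
                                 positional=["qc_s1.txt", "qc_s2.txt"]), after=[qc1, qc2])
    print(wf.stages())        # [['qc_s1', 'qc_s2'], ['report']]
    print(report.command())   # ['merge_tool', '-o', 'report.html', 'qc_s1.txt', 'qc_s2.txt']
```

In this toy run the two QC steps fall into the same level and could be submitted concurrently, while the report waits for both. The sketch only computes the levels; it does not submit anything, which in Jflow is the job of the execution engine.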
For file or directory parameters, the required file format, a size limit and the location can be set. Jflow handles server-side files selected with regular expressions, as well as URLs and client-side files, which it uploads automatically. Before running the workflow, Jflow checks that each parameter value complies with its data type. Job submission, status checking and error handling rely on Makeflow (Albrecht ) and weaver (Bui ). As a result, Jflow manages error recovery and supports most distributed resource management systems (Condor, SGE, Work Queue or a single multi-core machine, …). Replacing Makeflow with another job submitter requires implementing a new Engine subclass, the class that creates and executes the workflow DAG.
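The pre-run checks described above can be illustrated with another hypothetical Python sketch; `Param`, `check_value` and `expand_server_files` are illustrative names, not Jflow functions. It assumes that form values arrive as strings and are converted to the declared data type, and that a server-side file parameter is a regular expression matched against the file names of one directory, with a size limit enforced on each match. Checking the file format by extension is a simplifying assumption of this sketch.

```python
# Hypothetical sketch of pre-run parameter checks; these names are not Jflow's API.
import datetime
import os
import re
from dataclasses import dataclass
from typing import List, Optional, Sequence

TRUE_WORDS = {"true", "1", "yes", "on"}
FALSE_WORDS = {"false", "0", "no", "off"}


@dataclass
class Param:
    name: str       # identifier shown in the form
    help_text: str  # help text for the end user
    dtype: type     # expected data type, e.g. int, float, bool, str, datetime.date


def check_value(param: Param, raw: str):
    """Convert a raw form value to the declared type, or raise ValueError."""
    text = raw.strip()
    if param.dtype is bool:
        lowered = text.lower()
        if lowered in TRUE_WORDS:
            return True
        if lowered in FALSE_WORDS:
            return False
        raise ValueError(f"{param.name}: {raw!r} is not a boolean")
    try:
        if param.dtype is datetime.date:
            return datetime.date.fromisoformat(text)  # expects YYYY-MM-DD
        return param.dtype(text)
    except ValueError as exc:
        raise ValueError(f"{param.name}: {raw!r} is not a valid {param.dtype.__name__}") from exc


def expand_server_files(directory: str, pattern: str,
                        extensions: Sequence[str] = (),
                        max_bytes: Optional[int] = None) -> List[str]:
    """Select server-side files whose names fully match `pattern`, then check them.

    The format check by file extension is a simplification for this sketch.
    """
    regex = re.compile(pattern)
    selected = []
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if not os.path.isfile(path) or not regex.fullmatch(entry):
            continue
        if extensions and not entry.endswith(tuple(extensions)):
            raise ValueError(f"{entry}: expected one of {list(extensions)}")
        if max_bytes is not None and os.path.getsize(path) > max_bytes:
            raise ValueError(f"{entry}: larger than {max_bytes} bytes")
        selected.append(path)
    if not selected:
        raise ValueError(f"no file in {directory!r} matches {pattern!r}")
    return selected
```

Failing early with a message that names the offending parameter lets the form report the problem before any job reaches the cluster, which is the point of validating before the workflow starts.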

3 Example

The Jflow user interface was designed for easy integration into mash-up web applications. Here, we present its integration into NG6, which provides a user-friendly interface to process, store and download high-throughput sequencing data. The environment displays sequencing runs as a table. From this view, the user can add new data by running workflows that load the data and check its quality. Different workflows are available depending on the data type and sequencing technology. Workflows are listed by the availablewf plug-in, built within an NG6 modal box. NG6 listens for the select.availablewf event fired by the availablewf plug-in and, when it is caught, generates the parameter form with the wfform plug-in. Jflow adapts the display to each parameter type: for example, a date is shown as a calendar and a boolean as a check box. Biologists use NG6 to check sequencing read quality, which includes measuring the contamination of experimental samples. The first input of this analysis is the FASTA file of the contaminant reference genome, displayed as a file selector. The second input is a parameter set describing the biological samples. It includes the read files and metadata such as sample name, tissue and developmental stage. To help biologists populate it, Jflow uses a structured data input, rendered by the wfform plug-in as a spreadsheet into which multiple lines can be copied and pasted. Jflow then iterates over the table content to process each sample in parallel. To monitor running workflows, NG6 provides a table on a dedicated page, filled by the activewf plug-in. In the same way as described above, the wfstatus plug-in is built in a modal box when the activewf plug-in fires a select.activewf event, as presented in Figure 1. This view shows the workflow's execution graph, in which nodes represent components and edges represent links between outputs and inputs.
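The spreadsheet-style sample input and the per-sample parallel processing could look roughly like the following Python sketch. The column names, `parse_pasted_table`, `process_sample` and `run_all` are hypothetical, and a local thread pool stands in for the HPC job submission that Jflow delegates to its execution engine.

```python
# Hypothetical sketch: spreadsheet rows -> per-sample jobs run in parallel.
import csv
import io
from concurrent.futures import ThreadPoolExecutor

# Hypothetical column layout for the sample sheet.
COLUMNS = ("sample_name", "reads", "tissue", "dev_stage")


def parse_pasted_table(text):
    """Turn tab-separated rows pasted from a spreadsheet into one dict per sample."""
    samples = []
    for row_no, fields in enumerate(csv.reader(io.StringIO(text), delimiter="\t"), start=1):
        if not any(field.strip() for field in fields):
            continue  # ignore blank rows
        if len(fields) != len(COLUMNS):
            raise ValueError(f"row {row_no}: expected {len(COLUMNS)} columns, got {len(fields)}")
        samples.append(dict(zip(COLUMNS, (field.strip() for field in fields))))
    return samples


def process_sample(sample, contaminant_fasta):
    """Placeholder for one sample's contamination check; returns a summary line."""
    return f"{sample['sample_name']}: {sample['reads']} screened against {contaminant_fasta}"


def run_all(text, contaminant_fasta, max_workers=4):
    """Launch one job per sample concurrently; results keep the table order."""
    samples = parse_pasted_table(text)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda s: process_sample(s, contaminant_fasta), samples))


if __name__ == "__main__":
    sheet = "S1\ts1.fastq\tleaf\tadult\nS2\ts2.fastq\troot\tseedling\n"
    for line in run_all(sheet, "contaminant.fa"):
        print(line)
```

Tab-separated parsing matches what most spreadsheet applications place on the clipboard, and `pool.map` returns results in input order even though the samples are processed concurrently, so each result can be matched back to its row.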
Fig. 1.

Jflow integration: (a) an excerpt of the NG6 HTML source containing an empty div used to build the activewf plug-in and a modal box for the wfstatus plug-in. (b) The jQuery code that builds the Jflow plug-ins and handles user actions. When the select.activewf event is fired from activewf-div, a function is called with two parameters: event and workflow. The latter stores all of the workflow's information, such as its name and id, used in this example to update the modal box title and to build the wfstatus plug-in. (c) The status of the illumina_qc workflow with id 26, displayed as a graph in the NG6 application.

NG6 was first implemented using the Ergatis (Orvis ) WMS, which had a separate user interface. With Jflow, all actions are now available from within the same application, which makes NG6 more user-friendly.

4 Conclusion

Jflow is a simple and efficient solution for embedding WMS features within a web application. To our knowledge, it is the only WMS designed for this purpose. It is already embedded in RNAbrowse (Mariette ) and NG6 (Mariette ), where it has been used to process more than 2000 sequencing runs on a 5000-core HPC environment.

Conflict of Interest: none declared.
References

1. Ruffus: a lightweight Python library for computational pipelines.
Authors: Leo Goodstadt
Journal: Bioinformatics; Date: 2010-09-16

2. Metavir: a web server dedicated to virome analysis.
Authors: Simon Roux; Michaël Faubladier; Antoine Mahul; Nils Paulhe; Aurélien Bernard; Didier Debroas; François Enault
Journal: Bioinformatics; Date: 2011-09-11

3. Snakemake--a scalable bioinformatics workflow engine.
Authors: Johannes Köster; Sven Rahmann
Journal: Bioinformatics; Date: 2012-08-20

4. Ergatis: a web interface and scalable software system for bioinformatics workflows.
Authors: Joshua Orvis; Jonathan Crabtree; Kevin Galens; Aaron Gussman; Jason M Inman; Eduardo Lee; Sreenath Nampally; David Riley; Jaideep P Sundaram; Victor Felix; Brett Whitty; Anup Mahurkar; Jennifer Wortman; Owen White; Samuel V Angiuoli
Journal: Bioinformatics; Date: 2010-04-22

5. Cytoscape Web: an interactive web-based network browser.
Authors: Christian T Lopes; Max Franz; Farzana Kazi; Sylva L Donaldson; Quaid Morris; Gary D Bader
Journal: Bioinformatics; Date: 2010-07-23

6. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences.
Authors: Jeremy Goecks; Anton Nekrutenko; James Taylor
Journal: Genome Biol; Date: 2010-08-25

7. The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes.
Authors: F Meyer; D Paarmann; M D'Souza; R Olson; E M Glass; M Kubal; T Paczian; A Rodriguez; R Stevens; A Wilke; J Wilkening; R A Edwards
Journal: BMC Bioinformatics; Date: 2008-09-19

8. COSMOS: Python library for massively parallel workflows.
Authors: Erik Gafni; Lovelace J Luquette; Alex K Lancaster; Jared B Hawkins; Jae-Yoon Jung; Yassine Souilmi; Dennis P Wall; Peter J Tonellato
Journal: Bioinformatics; Date: 2014-06-30

9. RNAbrowse: RNA-Seq de novo assembly results browser.
Authors: Jérôme Mariette; Céline Noirot; Ibounyamine Nabihoudine; Philippe Bardou; Claire Hoede; Anis Djari; Cédric Cabau; Christophe Klopp
Journal: PLoS One; Date: 2014-05-13

10. Mobyle: a new full web bioinformatics framework.
Authors: Bertrand Néron; Hervé Ménager; Corinne Maufrais; Nicolas Joly; Julien Maupetit; Sébastien Letort; Sébastien Carrere; Pierre Tuffery; Catherine Letondal
Journal: Bioinformatics; Date: 2009-08-17