Erik Gafni, Lovelace J Luquette, Alex K Lancaster, Jared B Hawkins, Jae-Yoon Jung, Yassine Souilmi, Dennis P Wall, Peter J Tonellato.
Abstract
SUMMARY: Efficient workflows to shepherd clinically generated genomic data through the multiple stages of a next-generation sequencing pipeline are of critical importance in translational biomedical science. Here we present COSMOS, a Python library for workflow management that allows formal description of pipelines and partitioning of jobs. In addition, it includes a user interface for tracking the progress of jobs, abstraction of the queuing system and fine-grained control over the workflow. Workflows can be created on traditional computing clusters as well as cloud-based services.
Year: 2014 PMID: 24982428 PMCID: PMC4184253 DOI: 10.1093/bioinformatics/btu385
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. (a) Tools are defined in COSMOS by specifying input and output types, not files, and a cmd() function returning a string to be executed in a shell. cpu_req and other parameters may be inspected by a programmer-defined Python function to set DRM parameters or redirect jobs to queues. (b) Workflows are defined using map-reduce primitives: sequence_, map_ (execute the align tool from (a) on each ‘chunk’ in parallel) and reduce_ (group the aligned outputs by sample tag). (c) Directed acyclic graph of jobs generated by the workflow in (b) to be executed via the DRM for four input FASTQ files (with sample tags s1 and s2, and chunk tags of c1 and c2).
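The figure itself is not reproduced here, so the following is a minimal sketch of the ideas in the caption, written in plain Python and not using the COSMOS API. All names in it (`Job`, `map_stage`, `reduce_stage`, `drm_options`, the `aligner` and `merger` commands, the queue names and the file names) are hypothetical and chosen only for illustration. It shows how a one-to-one map over tagged chunks followed by a reduce on the sample tag yields a job graph of the shape described in panel (c), and how a user-defined function might inspect `cpu_req` to pick a queue.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Job:
    """One node of the job graph: tool name, tags, output, shell command, parents."""
    tool: str
    tags: dict
    output: str
    command: str = ""
    cpu_req: int = 1
    parents: list = field(default_factory=list)


def align_cmd(inputs, output):
    # Hypothetical shell command; 'aligner' and its flags are placeholders.
    return f"aligner --in {' '.join(inputs)} --out {output}"


def merge_cmd(inputs, output):
    # Hypothetical shell command; 'merger' and its flags are placeholders.
    return f"merger --in {' '.join(inputs)} --out {output}"


def _out_name(tool, tags):
    # Derive an output name from the tool and tag values (sorted by tag key).
    return "_".join([tool] + [tags[k] for k in sorted(tags)]) + ".out"


def load_inputs(files):
    """Wrap existing files as root nodes; they have no command to run."""
    return [Job("input", dict(tags), path) for path, tags in files]


def map_stage(parents, tool, cmd, cpu_req=1):
    """One-to-one step: one child job per parent, inheriting the parent's tags."""
    jobs = []
    for parent in parents:
        out = _out_name(tool, parent.tags)
        jobs.append(Job(tool, dict(parent.tags), out,
                        cmd([parent.output], out), cpu_req, [parent]))
    return jobs


def reduce_stage(parents, keys, tool, cmd, cpu_req=1):
    """Many-to-one step: group parents by the given tag keys, one job per group."""
    groups = defaultdict(list)
    for parent in parents:
        groups[tuple(parent.tags[k] for k in keys)].append(parent)
    jobs = []
    for values, members in groups.items():
        tags = dict(zip(keys, values))
        out = _out_name(tool, tags)
        jobs.append(Job(tool, tags, out,
                        cmd([m.output for m in members], out), cpu_req, members))
    return jobs


def drm_options(job):
    """Example user-defined policy: route multi-core jobs to a hypothetical 'big' queue."""
    queue = "big" if job.cpu_req > 2 else "short"
    return {"queue": queue, "slots": job.cpu_req}


# Four input FASTQ files: two samples (s1, s2), two chunks each (c1, c2).
fastqs = [(f"{s}_{c}.fastq", {"sample": s, "chunk": c})
          for s in ("s1", "s2") for c in ("c1", "c2")]

inputs = load_inputs(fastqs)
aligned = map_stage(inputs, "align", align_cmd, cpu_req=4)    # 4 parallel jobs
merged = reduce_stage(aligned, ["sample"], "merge", merge_cmd)  # 2 jobs, one per sample

for job in aligned + merged:
    print(job.tags, drm_options(job), job.command)
```

In this sketch the graph is held implicitly in each job's `parents` list: four align jobs each depend on one input file, and two merge jobs each depend on the two align jobs that share a sample tag. A real workflow manager would additionally track job state and submit the commands through the DRM; that part is deliberately left out.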