Johan Dahlberg1, Johan Hermansson1, Steinar Sturlaugsson1, Mariya Lysenkova1, Patrik Smeds2, Claes Ladenvall2, Roman Valls Guimera3, Florian Reisinger3, Oliver Hofmann3, Pontus Larsson1.
Abstract
BACKGROUND: In recent years, nucleotide sequencing has become increasingly instrumental in both research and clinical settings. This has led to an explosive growth in sequencing data produced worldwide. As the amount of data increases, so does the need for automated solutions for data processing and analysis. The concept of workflows has gained favour in the bioinformatics community, but there is little in the scientific literature describing end-to-end automation systems. Arteria is an automation system that aims at providing a solution to the data-related operational challenges that face sequencing core facilities.
Keywords: automation; orchestration; sequencing; workflows
Year: 2019 PMID: 31825479 PMCID: PMC6905352 DOI: 10.1093/gigascience/giz135
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Definitions
| Term | Description |
|---|---|
| Action | A computational unit of work, e.g., processing a file or inserting data into a database. This is sometimes referred to as a task. |
| Process | A set of steps that have to be finished to achieve a particular goal, e.g., delivering data to a user. A process can include automated and manual steps. |
| Workflow | A model of a process as a number of actions that follow each other. A workflow can be described by a directed acyclic graph. |
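
To illustrate the definition of a workflow as a directed acyclic graph of actions, here is a minimal Python sketch. The action names (`demultiplex`, `quality_check`, `deliver_data`) and the helper `execution_order` are hypothetical and are not taken from Arteria. The sketch only shows that a valid execution order falls out of the dependency graph.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each action maps to the set of actions it depends on.
workflow = {
    "demultiplex": set(),
    "quality_check": {"demultiplex"},
    "deliver_data": {"quality_check"},
}


def execution_order(dag):
    """Return the actions in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


print(execution_order(workflow))
```

If the graph contained a cycle, `TopologicalSorter` would raise `graphlib.CycleError`, which is why the acyclic property matters for a workflow.
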
Figure 1: An overview of the conceptual levels of the Arteria project.
Figure 2: Description of the StackStorm event model. Sensors perceive events in the environment, e.g., a file being created or a certain time of day occurring. This information is passed to the rule layer, where the data are evaluated; depending on which criteria, if any, are fulfilled, 1 or more actions are triggered. Actions can be single commands or full workflows.
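
The sensor, rule, and action layers described in this caption can be sketched in a few lines of Python. This is a conceptual illustration only and does not use the StackStorm API. The names `Event`, `Rule`, and `dispatch`, as well as the example criteria, are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    kind: str      # what a sensor observed, e.g. "file_created"
    payload: dict  # data the sensor attaches to the event


@dataclass
class Rule:
    criteria: Callable[[Event], bool]  # decides whether the rule matches
    action: Callable[[dict], str]      # runs when the criteria are fulfilled


def dispatch(event, rules):
    """Evaluate every rule against the event and run the actions that match."""
    return [rule.action(event.payload) for rule in rules if rule.criteria(event)]


# Hypothetical rules: one reacts to new CSV files, one to a timer event.
rules = [
    Rule(
        criteria=lambda e: e.kind == "file_created" and e.payload["path"].endswith(".csv"),
        action=lambda p: f"process {p['path']}",
    ),
    Rule(
        criteria=lambda e: e.kind == "timer",
        action=lambda p: "run nightly cleanup",
    ),
]

results = dispatch(Event("file_created", {"path": "/data/samples.csv"}), rules)
print(results)
```

An event that matches no rule returns an empty list, which corresponds to the "if any" case in the caption.
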
Descriptions of concepts in the arteria-packs sample implementation
| Concept | Definition | arteria-packs implementation |
|---|---|---|
| Actions | Encapsulate system tasks | Micro-services arteria-runfolder, arteria-bcl2fastq, and checkQC |
| Workflows | Tie actions together | Mistral workflow defined in workflow_bcl2fastq_and_checkqc.yaml |
| Sensors | Pick up events from the environment | RunfolderSensor defined in runfolder_sensor.yaml, which detects runfolders ready for processing |
| Rules | Parse events from sensors and determine whether an action or a workflow should be initiated | Defined in when_runfolder_is_ready_start_bcl2fastq.yaml; fires bcl2fastq workflow when a runfolder is ready |
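
As a rough illustration of the sensor row above, the following Python sketch polls a directory for runfolders that are ready for processing. It is not the actual RunfolderSensor code. The readiness criterion (presence of a marker file named `RTAComplete.txt`), the function `find_ready_runfolders`, and the folder names are all assumptions made for the example.

```python
import tempfile
from pathlib import Path

# Assumption: a marker file signals that the sequencer has finished writing.
READY_MARKER = "RTAComplete.txt"


def find_ready_runfolders(root, already_seen):
    """Return runfolders under root that hold the marker and were not reported before."""
    ready = []
    for folder in sorted(Path(root).iterdir()):
        is_ready = folder.is_dir() and (folder / READY_MARKER).exists()
        if is_ready and folder.name not in already_seen:
            already_seen.add(folder.name)  # report each runfolder only once
            ready.append(folder)
    return ready


# Demo with two hypothetical runfolders, only one of which is complete.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "run_A").mkdir()
    (Path(root) / "run_B").mkdir()
    (Path(root) / "run_A" / READY_MARKER).touch()

    seen = set()
    first_poll = [f.name for f in find_ready_runfolders(root, seen)]
    second_poll = [f.name for f in find_ready_runfolders(root, seen)]

print(first_poll, second_poll)
```

Tracking already reported folders keeps a rule from firing the same workflow twice for one runfolder. A real deployment would need to persist that state across restarts.
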
Figure 3: Schematic view of a system deployment scenario, showing how data are written to the local storage and compute nodes from the sequencing machines, and how the system uses information and resources from multiple sources to coordinate the process. The operator can then monitor and control the processes from the single interface provided at the master automation node.
Figure 4: UMCCR Arteria cloud infrastructure. When a commit is pushed to our GitHub repository and validated by TravisCI, it proceeds to our autoscaling group “arteria,” which subsequently deploys cloud instances, incorporating the new Arteria and StackStorm code changes. After changes are deployed, any incoming events, such as a new sequencing run being completed, are handled by this newly deployed code and data are copied from the sequencers to our university HPC center for further downstream processing with bcbio [24].