| Literature DB >> 32306927 |
Onur Yukselen, Osman Turkyilmaz, Ahmet Rasit Ozturk, Manuel Garber, Alper Kucukural.
Abstract
BACKGROUND: The emergence of high-throughput technologies that produce vast amounts of genomic data, such as next-generation sequencing (NGS), is transforming biological research. The dramatic increase in data volume, together with the variety and continuous change of data-processing tools, algorithms, and databases, makes analysis the main bottleneck for scientific discovery. Processing high-throughput datasets typically involves many different computational programs, each of which performs a specific step in a pipeline. Given the wide range of applications and organizational infrastructures, there is a great need for highly parallel, flexible, portable, and reproducible data-processing frameworks. Several platforms currently exist for the design and execution of complex pipelines, but they lack the combination of parallelism, portability, flexibility, and reproducibility required by the current research environment. To address these shortcomings, workflow frameworks that provide a platform to develop and share portable pipelines have recently arisen. We complement these new platforms by providing a graphical user interface to create, maintain, and execute complex pipelines. Such a platform simplifies robust and reproducible workflow creation for non-technical users and gives large organizations a robust platform for maintaining pipelines.
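The pipeline model the abstract describes (one program per step, steps chained into a workflow) is what Nextflow, the engine underlying DolphinNext, expresses directly. The following is a minimal illustrative sketch in Nextflow DSL2, not code from the paper; the process names, commands, and `params.reads` parameter are examples only:

```nextflow
// Each process wraps one tool; processes connect by matching
// input/output channels, as in the module-connection model of Fig. 2.

process FASTQC {
    input:
    path reads

    output:
    path "*_fastqc.html"

    script:
    """
    fastqc ${reads}
    """
}

process TRIM {
    input:
    path reads

    output:
    path "trimmed_${reads}"

    script:
    """
    # Placeholder trimming step; any command producing the declared output works.
    cp ${reads} trimmed_${reads}
    """
}

workflow {
    // e.g. run with: nextflow run main.nf --reads 'data/*.fastq.gz'
    reads_ch = Channel.fromPath(params.reads)
    FASTQC(reads_ch)
    TRIM(reads_ch)
}
```

Re-running with Nextflow's `-resume` flag reuses cached results for unchanged steps, which is the mechanism behind the partial re-execution shown in Fig. 3.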
Keywords: Big data processing; Genome analysis; Pipeline; Sequencing; Workflow
Year: 2020 PMID: 32306927 PMCID: PMC7168977 DOI: 10.1186/s12864-020-6714-x
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 DolphinNext builds on Nextflow and simplifies creating complex workflows
Fig. 2 a A process for building index files. b Input and output parameters attached to a process. c The STAR alignment module connected through input/output ports with matching parameter types. d The RNA-Seq pipeline designed from two nested pipelines: the STAR pipeline and the BAM analysis pipeline
Fig. 3 Resuming the RNA-Seq pipeline after changing RSEM parameters
Fig. 4 a The RSEM module, which uses Count_Table to summarize per-sample counts into a consolidated count table; the results are reported as a table or uploaded to the embedded DEBrowser [36]. b Count table report. c MultiQC [37] report summarizing the results of numerous bioinformatics tools. d Embedded DEBrowser [36] module for interactive differential expression analysis
Fig. 5 R Markdown report for RNA-Seq analysis. Users can adapt this template to their needs
Fig. 6 DolphinNext allows implementation of a complete sequencing-analysis cycle
Comparison of related applications
| | DolphinNext | Galaxy | Sequanix | Taverna | Arvados |
|---|---|---|---|---|---|
| Platform (a) | JS/PHP | Python | Python | Java | Go |
| Workflow management system | Nextflow | Galaxy | Snakemake | Taverna | Arvados |
| Native task support (b) | Yes (any) | No | Yes (bash only) | Yes (bash only) | Yes |
| Common Workflow Language (c) | No | Yes | No | No | Yes |
| Streaming processing (d) | Yes | No | No | No | Yes |
| Code sharing integration (e) | Yes | No | No | No | Yes (GitHub) |
| Workflow modules (f) | Yes | Yes | Yes | Yes | Yes |
| Workflow versioning (g) | Yes | Yes | No | No | No |
| Automatic error failover (h) | Yes | Yes | No | No | Yes |
| Nested workflows | Yes | Yes | No | Yes | No |
| Syntax/semantics used | own/own | XML/own | Python/own | own/own | Python/own |
| Web-based | Yes | Yes | No | No | No |
| Web-based process development (i) | Yes | No | No | No | No |
| Distributed pipeline execution (j) | Yes | No | No | No | No |
| Container support | | | | | |
| Docker support | Yes | Yes | Yes | No | Yes |
| Singularity support | Yes | Yes | Yes | No | No |
| Built-in batch schedulers | | | | | |
| LSF | Yes (Native) | Yes (DRMAA) | Yes (Native) | No | No |
| SGE | Yes (Native) | Yes (DRMAA) | Yes (Native) | Yes (Native) | No |
| SLURM | Yes (Native) | Yes (DRMAA) | Yes (Native) | No | Yes (Native) |
| IGNITE | Yes (Native) | No | No | No | No |
| Built-in cloud support | | | | | |
| AWS (Amazon Web Services) | Yes | Yes | No | Yes | Yes |
| GCP (Google Cloud Platform) | Yes | Yes (Partial) (k) | No | No | Yes |
| Autoscaling | Yes | Yes | No | No | Yes |
(a) The technology and programming language in which each framework is implemented
(b) Ability to execute native commands and scripts without re-implementing the original processes
(c) Support for the CWL specification
(d) Ability to process task inputs/outputs as a stream of data
(e) Support for code management and sharing platforms, such as GitHub
(f) Support for modules, sub-workflows, or workflow compositions
(g) Ability to track pipeline changes and to execute any earlier version at any point in time
(h) Support for automatic error handling and for resuming execution
(i) Ability to add new processes in an embedded web editor, without writing or installing a wrapper
(j) Ability to execute the same pipeline, unchanged, in multiple computing environments (e.g., HPC clusters, a workstation, the cloud) from a single interface
(k) A Galaxy instance can be launched in Google Cloud, but only for one-time use; when the instance is shut down, its data are permanently deleted