M Jafar Taghiyar1,2, Jamie Rosner1, Diljot Grewal1,2, Bruno M Grande3, Radhouane Aniba1,2, Jasleen Grewal3, Paul C Boutros4,5, Ryan D Morin3, Ali Bashashati1,2, Sohrab P Shah1,2.
Abstract
Background: The field of next-generation sequencing informatics has matured to a point where algorithmic advances in sequence alignment and individual feature detection methods have stabilized. Practical and robust implementation of complex analytical workflows (where such tools are structured into "best practices" for automated analysis of next-generation sequencing datasets) still requires significant programming investment and expertise.
Keywords: genomics; pipeline; reproducibility; workflow
Year: 2017 PMID: 28655203 PMCID: PMC5569921 DOI: 10.1093/gigascience/gix042
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Make a workflow. Making a new workflow with Kronos includes 3 steps: (A) make a configuration file template: given a set of existing components, users can generate this file by running the command make_config; (B) configure the workflow: users can specify the desired flow of their workflow using the connections and dependencies, customize output directory names, and supply input arguments and data in the required fields of the configuration file template; (C) initialize the workflow: this is achieved by running the command init on the configuration file, which transforms the YAML file into the Python workflow script.
Figure 3: Replace a component in a workflow. The configuration file has different sections as shown in the figure. These sections are: GENERAL, PIPELINE_INFO, SHARED, SAMPLES, and TASKs. The modular organization of the configuration file allows for easy customization of workflows, which can serve different purposes such as tool comparison. Adding, removing, or replacing nodes in the DAG of the workflows can be easily done by adding, removing, or replacing the corresponding TASK sections in the configuration file. For instance, to go from workflow DAG1 to workflow DAG2, i.e., to replace comp_1 (e.g., variant caller 1) in the first workflow with comp_5 (e.g., variant caller 2) in the second, the user only needs to replace the TASK_1 section with the TASK_5 section in the configuration file and perform Step 3.
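The component swap described in this caption can be illustrated with a minimal Python sketch. It is not Kronos code and does not reproduce the Kronos configuration schema: the dictionary layout, the `component` and `depends_on` keys, the helper names `replace_task` and `run_order`, and the tasks other than TASK_1 and TASK_5 are all assumptions made for illustration. The sketch only shows the idea that replacing one TASK section, and rewiring the tasks that depended on it, yields a new DAG.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def replace_task(tasks, old, new, new_spec):
    """Return a copy of `tasks` with section `old` replaced by `new`.

    Tasks that depended on `old` are rewired to depend on `new`.
    The dict layout is hypothetical, not the Kronos YAML schema.
    """
    updated = {}
    for name, spec in tasks.items():
        if name == old:
            updated[new] = dict(new_spec)
            continue
        deps = [new if dep == old else dep for dep in spec["depends_on"]]
        updated[name] = {**spec, "depends_on": deps}
    return updated


def run_order(tasks):
    """Return one valid execution order of the task DAG."""
    graph = {name: spec["depends_on"] for name, spec in tasks.items()}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical DAG1: comp_1 stands in for "variant caller 1".
dag1 = {
    "TASK_0": {"component": "comp_0", "depends_on": []},
    "TASK_1": {"component": "comp_1", "depends_on": ["TASK_0"]},
    "TASK_2": {"component": "comp_2", "depends_on": ["TASK_1"]},
}

# DAG2: swap in comp_5 ("variant caller 2") at the same position.
dag2 = replace_task(
    dag1,
    old="TASK_1",
    new="TASK_5",
    new_spec={"component": "comp_5", "depends_on": ["TASK_0"]},
)

order1 = run_order(dag1)
order2 = run_order(dag2)
```

Because the original dictionary is left untouched, both DAG1 and DAG2 remain available, which matches the tool-comparison use case mentioned in the caption.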
Figure 4: Run a workflow. Workflows generated by Kronos are ready to run locally, on a cluster of computing nodes, and in the cloud. To run a workflow, users only need to run the Python workflow script. Each run of a workflow generates a specific directory structure tagged with a run-ID. When running a workflow for multiple samples, a separate directory is made for each sample so that the results for each sample are easy to locate. This figure shows the tree structure of the resulting directory. There are 4 subdirectories that are always generated for each sample: (i) logs: to store the log files; (ii) outputs: to store all the output files generated by all the components in the workflow; (iii) scripts: to store the scripts automatically generated by Kronos to run each component in the workflow; (iv) sentinels: to store sentinel files used by Kronos to pick up the workflow from where it left off in a previous run.
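The per-sample layout and the role of sentinel files can be sketched in a few lines of Python. This is an illustrative sketch, not the Kronos implementation: the nesting of run-ID and sample directories, the `.done` sentinel file naming, and the helper names `make_sample_dirs` and `run_task` are assumptions. Only the four subdirectory names come from the caption.

```python
import tempfile
from pathlib import Path

# The four per-sample subdirectories named in the caption.
SUBDIRS = ("logs", "outputs", "scripts", "sentinels")


def make_sample_dirs(root, run_id, sample):
    """Create <root>/<run_id>/<sample>/{logs,outputs,scripts,sentinels}.

    The exact nesting is an assumption made for this sketch.
    """
    base = Path(root) / run_id / sample
    for sub in SUBDIRS:
        (base / sub).mkdir(parents=True, exist_ok=True)
    return base


def run_task(base, task_name, action):
    """Run `action` unless a sentinel shows the task already finished.

    Returns True if the task ran, False if it was skipped.
    """
    sentinel = base / "sentinels" / f"{task_name}.done"  # hypothetical naming
    if sentinel.exists():
        return False  # completed in a previous run, so skip it
    action()
    sentinel.touch()  # written only after the action succeeds
    return True


# Demonstration in a throwaway directory with hypothetical names.
with tempfile.TemporaryDirectory() as root:
    base = make_sample_dirs(root, "run_001", "sample_A")
    calls = []
    first = run_task(base, "TASK_1", lambda: calls.append("TASK_1"))
    second = run_task(base, "TASK_1", lambda: calls.append("TASK_1"))
    created = sorted(p.name for p in base.iterdir())
```

Writing the sentinel only after the action returns means a task that fails partway leaves no sentinel behind and is rerun on the next attempt, which is the resume behaviour the caption describes.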
Figure 5: Strelka workflow. Results from the tumour-normal variant calling workflow on whole genome data of a breast cancer case (SA500 - EGA accession number EGAS00001000952). (A) Schematic of the workflow, which comprises 2 tasks. The plots generated by the workflow are the output of TASK_2: (B) box plot of coverage and variant allelic ratios for the SNVs detected by Strelka, (C) base substitution patterns for the somatic SNVs, (D) total number of SNVs and their histogram based on the quality score (QSS), and (E) distribution of the number of SNVs across different chromosomes.