Managing genomic variant calling workflows with Swift/T.

Azza E Ahmed¹,², Jacob Heldenbrand³, Yan Asmann⁴, Faisal M Fadlelmola¹, Daniel S Katz³, Katherine Kendig³, Matthew C Kendzior⁵, Tiffany Li³, Yingxue Ren⁴, Elliott Rodriguez³, Matthew R Weber⁵, Justin M Wozniak⁶, Jennie Zermeno³, Liudmila S Mainzer³,⁷.

Abstract

Bioinformatics research is frequently performed using complex workflows with multiple steps, fan-outs, merges, and conditionals. This complexity makes the workflow difficult to manage on a computer cluster, especially when running in parallel on large batches of data: hundreds or thousands of samples at a time. Scientific workflow management systems can help with this. Many are now being proposed, but is there yet a "best" workflow management system for bioinformatics? Such a system would need to satisfy numerous, sometimes conflicting requirements, from ease of use to seamless deployment at peta- and exa-scale and portability to the cloud. We evaluated Swift/T as a candidate for such a role by implementing a primary genomic variant calling workflow in the Swift/T language, focusing on the workflow management, performance, and scalability issues that arise in production-grade big-data genomic analyses. In the process we introduced novel features into the language, which are now part of its open repository. Additionally, we formalized a set of design criteria for high-quality, robust, maintainable workflows that must function at scale in a production setting, such as a large genomic sequencing facility or a major hospital system. The use of Swift/T confers two key advantages. (1) It operates transparently in multiple cluster scheduling environments (PBS Torque, SLURM, the Cray aprun environment, etc.), so a single workflow is trivially portable across numerous clusters. (2) The leaf functions of Swift/T permit developers to easily swap executables in and out of the workflow, which makes the workflow easy to maintain and makes it easy to request optimal resources for each stage of the pipeline. While Swift/T's data-level parallelism eliminates the need to explicitly code parallel analysis of multiple samples, it does make debugging more difficult, as is common for implicitly parallel code. Nonetheless, the language gives users a powerful and portable way to scale up analyses on many computing architectures.
The code for our implementation of a variant calling workflow using Swift/T can be found on GitHub at https://github.com/ncsa/Swift-T-Variant-Calling, with full documentation provided at http://swift-t-variant-calling.readthedocs.io/en/latest/.
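The two advantages named in the abstract (swappable leaf stages, and per-sample parallelism that the author of the workflow does not have to code by hand) can be illustrated with a small analogy. The sketch below is written in Python, not Swift/T, and does not reproduce Swift/T syntax or the code in the repository above. It assumes a hypothetical stage registry: every tool name and file-naming pattern (`aligner-a`, `dedup-tool`, `variant-caller`, `ref.fa`, `*_R1.fq`) is a placeholder. It only builds command lines (a dry run) and uses `ThreadPoolExecutor` as a stand-in for the implicit dataflow parallelism that the Swift/T runtime provides across cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A "stage" turns a sample name into the command line it would run.
# Hypothetical: all tool names and file patterns are illustrative placeholders.
Stage = Callable[[str], List[str]]

STAGES: Dict[str, Stage] = {
    "align": lambda s: ["aligner-a", "ref.fa", f"{s}_R1.fq", f"{s}_R2.fq"],
    "dedup": lambda s: ["dedup-tool", f"{s}.bam"],
    "call": lambda s: ["variant-caller", "ref.fa", f"{s}.dedup.bam"],
}


def plan_sample(sample: str, stages: Dict[str, Stage]) -> List[List[str]]:
    """Ordered command lines for one sample (dry run: nothing is executed)."""
    return [build(sample) for build in stages.values()]


def plan_batch(samples: List[str],
               stages: Dict[str, Stage]) -> Dict[str, List[List[str]]]:
    """Plan all samples concurrently; samples are independent of each other."""
    with ThreadPoolExecutor() as pool:
        plans = list(pool.map(lambda s: plan_sample(s, stages), samples))
    return dict(zip(samples, plans))


if __name__ == "__main__":
    batch = plan_batch(["sampleA", "sampleB"], STAGES)
    # Swapping the aligner replaces one entry; the other stages are untouched,
    # and the stage keeps its position because dicts preserve insertion order.
    swapped = {**STAGES, "align": lambda s: ["aligner-b", "ref.fa", f"{s}_R1.fq"]}
    print(batch["sampleA"][0])
    print(plan_batch(["sampleA"], swapped)["sampleA"][0])
```

Keeping each stage behind a single, narrow interface is what makes the swap a one-line change. The trade-off the abstract mentions still applies: once samples run concurrently, a failure in one sample's stage is harder to trace than in a sequential loop.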


Year: 2019 PMID: 31287816 PMCID: PMC6615596 DOI: 10.1371/journal.pone.0211608

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Cited by (4 in total):

1. Data analysis and modeling pipelines for controlled networked social science experiments.

Authors: Vanessa Cedeno-Mieles; Zhihao Hu; Yihui Ren; Xinwei Deng; Noshir Contractor; Saliya Ekanayake; Joshua M Epstein; Brian J Goode; Gizem Korkmaz; Chris J Kuhlman; Dustin Machi; Michael Macy; Madhav V Marathe; Naren Ramakrishnan; Parang Saraf; Nathan Self
Journal: PLoS One Date: 2020-11-24 Impact factor: 3.240

2. Design considerations for workflow management systems use in production genomics research and the clinic.

Authors: Azza E Ahmed; Joshua M Allen; Tajesvi Bhat; Prakruthi Burra; Christina E Fliege; Steven N Hart; Jacob R Heldenbrand; Matthew E Hudson; Dave Deandre Istanto; Michael T Kalmbach; Gregory D Kapraun; Katherine I Kendig; Matthew Charles Kendzior; Eric W Klee; Nate Mattson; Christian A Ross; Sami M Sharif; Ramshankar Venkatakrishnan; Faisal M Fadlelmola; Liudmila S Mainzer
Journal: Sci Rep Date: 2021-11-04 Impact factor: 4.379

3. Bioinformatics in Sudan: Status and challenges case study: The National University-Sudan.

Authors: Sofia B Mohamed; Sumaya Kambal; Sabah A E Ibrahim; Esra Abdalwhab; Abdalla Munir; Arwa Ibrahim; Qurashi Mohamed Ali
Journal: PLoS Comput Biol Date: 2021-10-21 Impact factor: 4.475

4. Orchestrating and sharing large multimodal data for transparent and reproducible research.

Authors: Anthony Mammoliti; Petr Smirnov; Minoru Nakano; Zhaleh Safikhani; Christopher Eeles; Heewon Seo; Sisira Kadambat Nair; Arvind S Mer; Ian Smith; Chantal Ho; Gangesh Beri; Rebecca Kusko; Eva Lin; Yihong Yu; Scott Martin; Marc Hafner; Benjamin Haibe-Kains
Journal: Nat Commun Date: 2021-10-04 Impact factor: 14.919
