Agile parallel bioinformatics workflow management using Pwrake.

Hiroyuki Mishima, Kensaku Sasaki, Masahiro Tanaka, Osamu Tatebe, Koh-Ichiro Yoshiura.

Abstract

BACKGROUND: In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be too heavyweight for actual bioinformatics practice. We find that scientific workflow management often prioritizes quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment. These priorities have a strong affinity with the agile software development method, which proceeds through iterative development phases of trial and error. Here, we show the application of the scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, and its flexibility has been demonstrated in the astronomy domain. We therefore hypothesize that Pwrake also has advantages in actual bioinformatics workflows.
FINDINGS: We implemented Pwrake workflows to process next-generation sequencing data using the Genome Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that, in practice, scientific workflow development iterates over two phases: the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased the efficiency of iterative development. Moreover, we implemented combined workflows to demonstrate the modularity of the GATK and Dindel workflows.
CONCLUSIONS: Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows.
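
The split between the workflow definition phase and the parameter adjustment phase described in the findings can be pictured with a minimal rakefile-style sketch. This is not the authors' code: the `PARAMS` hash, the `caller_command` helper, the sample names, and the tool name `hypothetical_caller` with its flags are all assumptions made for illustration, and commands are recorded rather than executed so the sketch runs without any sequencing software installed.

```ruby
require "rake"

# Parameter adjustment phase: values expected to change between runs.
# Every name and value below is a hypothetical placeholder.
PARAMS = {
  reference: "reference.fasta",
  min_quality: 30
}.freeze

# Helper method that keeps rule bodies short. "hypothetical_caller" and its
# flags are invented for illustration; they are not GATK or Dindel options.
def caller_command(input, output, params = PARAMS)
  "hypothetical_caller --ref #{params[:reference]} " \
    "--min-quality #{params[:min_quality]} --in #{input} --out #{output}"
end

# Commands are recorded rather than executed so the sketch runs anywhere.
COMMANDS = []

# Workflow definition phase: how one file type is derived from another.
rule ".vcf" => ".bam" do |t|
  COMMANDS << caller_command(t.source, t.name)
end

SAMPLES = %w[sample1 sample2].freeze

# Stand-ins for input files that would already exist on disk.
SAMPLES.each { |sample| task "#{sample}.bam" }

# One target that depends on every per-sample output.
task call_variants: SAMPLES.map { |sample| "#{sample}.vcf" }

Rake::Task[:call_variants].invoke
puts COMMANDS
```

Adjusting a parameter means editing only `PARAMS`, while changing the pipeline shape means editing only the `rule` and `task` lines. The sketch relies on plain Rake alone; since Pwrake is described as a parallel extension of Rake, a rakefile written in this style is presumably the kind of input it would run in parallel.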

Year:  2011        PMID: 21899774      PMCID: PMC3180464          DOI: 10.1186/1756-0500-4-331

Source DB:  PubMed          Journal:  BMC Res Notes        ISSN: 1756-0500

