Christoph Kämpf (1,2,3,4), Michael Specht (3), Alexander Scholz (3), Sven-Holger Puppel (3), Gero Doose (2,5), Kristin Reiche (3), Jana Schor (6), Jörg Hackermüller (7,8).
1. Young Investigators Group Bioinformatics and Transcriptomics, Department Molecular Systems Biology, Helmholtz Center for Environmental Research - UFZ, Permoserstraße 15, Leipzig, 04318, Germany.
2. Bioinformatics Department, Universität Leipzig, Härtelstraße 16-18, Leipzig, 04107, Germany.
3. Bioinformatics Unit, Department of Diagnostics, Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, Leipzig, 04103, Germany.
4. Present address: Department of Diagnostics, Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, Leipzig, 04103, Germany.
5. Present address: ecSeq Bioinformatics GmbH, Sternwartenstraße 29, Leipzig, 04103, Germany.
6. Young Investigators Group Bioinformatics and Transcriptomics, Department Molecular Systems Biology, Helmholtz Center for Environmental Research - UFZ, Permoserstraße 15, Leipzig, 04318, Germany. jana.schor@ufz.de.
7. Young Investigators Group Bioinformatics and Transcriptomics, Department Molecular Systems Biology, Helmholtz Center for Environmental Research - UFZ, Permoserstraße 15, Leipzig, 04318, Germany. joerg.hackermueller@ufz.de.
8. Bioinformatics Department, Universität Leipzig, Härtelstraße 16-18, Leipzig, 04107, Germany. joerg.hackermueller@ufz.de.
Abstract
BACKGROUND: A lack of reproducibility has been repeatedly criticized in computational research. High-throughput sequencing (HTS) data analysis is a complex multi-step process. For most steps a range of bioinformatic tools is available, and for most tools numerous parameters need to be set. Due to this complexity, HTS data analysis is particularly prone to reproducibility and consistency issues. We have defined four criteria that, in our opinion, ensure a minimal degree of reproducible research for HTS data analysis. Several workflow management systems are available to assist with complex multi-step data analyses. However, to the best of our knowledge, none of the currently available workflow management systems satisfies all four criteria for reproducible HTS analysis.
RESULTS: Here we present uap, a workflow management system dedicated to robust, consistent, and reproducible HTS data analysis. uap is optimized for application to omics data, but can be easily extended to other complex analyses. It is available under the GNU GPL v3 license at https://github.com/yigbt/uap.
CONCLUSIONS: uap is a freely available tool that enables researchers to easily adhere to reproducible research principles for HTS data analyses.
Keywords:
Reproducible research; Sequencing data analysis; Work-flow management