Software engineering for scientific big data analysis.

Björn A Grüning, Samuel Lampa, Marc Vaudel, Daniel Blankenberg.

Abstract

The increasing complexity of data and analysis methods has created an environment where scientists, who may not have formal training, are finding themselves playing the impromptu role of software engineer. While several resources are available for introducing scientists to the basics of programming, researchers have been left with little guidance on approaches needed to advance to the next level for the development of robust, large-scale data analysis tools that are amenable to integration into workflow management systems, tools, and frameworks. The integration into such workflow systems necessitates additional requirements on computational tools, such as adherence to standard conventions for robustness, data input, output, logging, and flow control. Here we provide a set of 10 guidelines to steer the creation of command-line computational tools that are usable, reliable, extensible, and in line with standards of modern coding practices.
© The Author(s) 2019. Published by Oxford University Press.
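The conventions the abstract names (data input, output, logging, and flow control) can be illustrated with a minimal sketch. This record does not list the 10 guidelines themselves, so the example below is not taken from the paper; it only assumes common command-line practice: data read from a file or stdin, results written to stdout, log messages sent to stderr, and a non-zero exit status on failure. The tool, its options, and the helper names (`count_records`, `read_lines`, `write_text`) are hypothetical.

```python
import argparse
import logging
import sys


def count_records(lines):
    """Count non-empty, non-comment lines (a stand-in for a real analysis)."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def read_lines(path):
    """Read all lines from a file, or from stdin when the path is '-'."""
    if path == "-":
        return sys.stdin.readlines()
    with open(path, encoding="utf-8") as handle:
        return handle.readlines()


def write_text(path, text):
    """Write text to a file, or to stdout when the path is '-'."""
    if path == "-":
        sys.stdout.write(text)
    else:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Count records in a text file.")
    parser.add_argument("-i", "--input", default="-", help="input file, or '-' for stdin")
    parser.add_argument("-o", "--output", default="-", help="output file, or '-' for stdout")
    parser.add_argument("-v", "--verbose", action="store_true", help="log progress to stderr")
    args = parser.parse_args(argv)

    # Logs go to stderr so they never mix with data written to stdout.
    logging.basicConfig(
        stream=sys.stderr,
        level=logging.INFO if args.verbose else logging.WARNING,
        format="%(levelname)s: %(message)s",
    )

    try:
        lines = read_lines(args.input)
    except OSError as err:
        logging.error("cannot read %s: %s", args.input, err)
        return 1  # a non-zero status lets a workflow system detect the failure

    logging.info("read %d lines from %s", len(lines), args.input)
    write_text(args.output, f"{count_records(lines)}\n")
    return 0
```

To use this as a script, `main` would typically be invoked as `sys.exit(main())` under an `if __name__ == "__main__":` guard, so that the return value becomes the process exit status a workflow manager can check.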

Keywords:  big data; coding; computational tools; data analysis; integration systems; scientific software; software development; software engineering; standards; workflow

Year: 2019    PMID: 31121028    PMCID: PMC6532757    DOI: 10.1093/gigascience/giz054

Source DB: PubMed    Journal: Gigascience    ISSN: 2047-217X    Impact factor: 6.524


  21 in total

1.  The Bioperl toolkit: Perl modules for the life sciences.

Authors:  Jason E Stajich; David Block; Kris Boulez; Steven E Brenner; Stephen A Chervitz; Chris Dagdigian; Georg Fuellen; James G R Gilbert; Ian Korf; Hilmar Lapp; Heikki Lehväslaiho; Chad Matsalla; Chris J Mungall; Brian I Osborne; Matthew R Pocock; Peter Schattner; Martin Senger; Lincoln D Stein; Elia Stupka; Mark D Wilkinson; Ewan Birney
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

2.  Bpipe: a tool for running and managing bioinformatics pipelines.

Authors:  Simon P Sadedin; Bernard Pope; Alicia Oshlack
Journal:  Bioinformatics       Date:  2012-04-12       Impact factor: 6.937

3.  GenePattern 2.0.

Authors:  Michael Reich; Ted Liefeld; Joshua Gould; Jim Lerner; Pablo Tamayo; Jill P Mesirov
Journal:  Nat Genet       Date:  2006-05       Impact factor: 38.330

4.  Snakemake--a scalable bioinformatics workflow engine.

Authors:  Johannes Köster; Sven Rahmann
Journal:  Bioinformatics       Date:  2012-08-20       Impact factor: 6.937

5.  The SeqAn C++ template library for efficient sequence analysis: A resource for programmers.

Authors:  Knut Reinert; Temesgen Hailemariam Dadi; Marcel Ehrhardt; Hannes Hauswedell; Svenja Mehringer; René Rahn; Jongkyu Kim; Christopher Pockrandt; Jörg Winkler; Enrico Siragusa; Gianvito Urgese; David Weese
Journal:  J Biotechnol       Date:  2017-09-06       Impact factor: 3.307

6.  Biopython: freely available Python tools for computational molecular biology and bioinformatics.

Authors:  Peter J A Cock; Tiago Antao; Jeffrey T Chang; Brad A Chapman; Cymon J Cox; Andrew Dalke; Iddo Friedberg; Thomas Hamelryck; Frank Kauff; Bartek Wilczynski; Michiel J L de Hoon
Journal:  Bioinformatics       Date:  2009-03-20       Impact factor: 6.937

7.  The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud.

Authors:  Katherine Wolstencroft; Robert Haines; Donal Fellows; Alan Williams; David Withers; Stuart Owen; Stian Soiland-Reyes; Ian Dunlop; Aleksandra Nenadic; Paul Fisher; Jiten Bhagat; Khalid Belhajjame; Finn Bacall; Alex Hardisty; Abraham Nieva de la Hidalga; Maria P Balcazar Vargas; Shoaib Sufi; Carole Goble
Journal:  Nucleic Acids Res       Date:  2013-05-02       Impact factor: 16.971

8.  Engineering bioinformatics: building reliability, performance and productivity into bioinformatics software.

Authors:  Brendan Lawlor; Paul Walsh
Journal:  Bioengineered       Date:  2015-05-21       Impact factor: 3.269

9.  Biology Needs Evolutionary Software Tools: Let's Build Them Right.

Authors:  Anton Nekrutenko; Galaxy Team; Jeremy Goecks; James Taylor; Daniel Blankenberg
Journal:  Mol Biol Evol       Date:  2018-06-01       Impact factor: 16.240

10.  The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update.

Authors:  Enis Afgan; Dannon Baker; Bérénice Batut; Marius van den Beek; Dave Bouvier; Martin Cech; John Chilton; Dave Clements; Nate Coraor; Björn A Grüning; Aysam Guerler; Jennifer Hillman-Jackson; Saskia Hiltemann; Vahid Jalili; Helena Rasche; Nicola Soranzo; Jeremy Goecks; James Taylor; Anton Nekrutenko; Daniel Blankenberg
Journal:  Nucleic Acids Res       Date:  2018-07-02       Impact factor: 16.971

  3 in total

1.  Principles for data analysis workflows.

Authors:  Sara Stoudt; Váleri N Vásquez; Ciera C Martinez
Journal:  PLoS Comput Biol       Date:  2021-03-18       Impact factor: 4.475

2.  The Impact of Pathway Database Choice on Statistical Enrichment Analysis and Predictive Modeling.

Authors:  Sarah Mubeen; Charles Tapley Hoyt; André Gemünd; Martin Hofmann-Apitius; Holger Fröhlich; Daniel Domingo-Fernández
Journal:  Front Genet       Date:  2019-11-22       Impact factor: 4.599

3.  Ten simple rules for making a software tool workflow-ready.

Authors:  Paul Brack; Peter Crowther; Stian Soiland-Reyes; Stuart Owen; Douglas Lowe; Alan R Williams; Quentin Groom; Mathias Dillen; Frederik Coppens; Björn Grüning; Ignacio Eguinoa; Philip Ewels; Carole Goble
Journal:  PLoS Comput Biol       Date:  2022-03-24       Impact factor: 4.475
