Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics.

Raoul J P Bonnal, Jan Aerts, George Githinji, Naohisa Goto, Dan MacLean, Chase A Miller, Hiroyuki Mishima, Massimiliano Pagani, Ricardo Ramirez-Gonzalez, Geert Smant, Francesco Strozzi, Rob Syme, Rutger Vos, Trevor J Wennblom, Ben J Woodcroft, Toshiaki Katayama, Pjotr Prins.

Abstract

SUMMARY: Biogem provides a software development environment for the Ruby programming language, which encourages community-based software development for bioinformatics while lowering the barrier to entry and encouraging best practices. Biogem, with its targeted modular and decentralized approach, software generator, tools and tight web integration, is an improved general model for scaling up collaborative open source software development in bioinformatics.
AVAILABILITY: Biogem and its modules are free and open source software (OSS). Biogem runs on all systems that support recent versions of Ruby, including Linux, Mac OS X and Windows. Further information is available at http://www.biogems.info. A tutorial is available at http://www.biogems.info/howto.html.
CONTACT: bonnal@ingm.org.

Year:  2012        PMID: 22332238      PMCID: PMC3315718          DOI: 10.1093/bioinformatics/bts080

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

In biomedical science, new technologies, data formats and methods emerge continuously. Scientists want to take advantage of these developments as soon as possible, which requires bioinformatics software to keep up with new requirements. We support the notion of the Open Bioinformatics Foundation (OBF) that development of collaborative open source software (OSS) is essential for bioinformatics. The OBF represents a number of important projects, such as BioPerl (Stajich et al., 2002), Biopython (Cock et al., 2009), BioRuby (Goto et al., 2010) and BioJava (Holland et al., 2008). These Bio-star (Bio*) projects effectively function as community centres and share a centralized approach to software development, with large source code repositories. Bio* projects generally aim for consolidated tools, a stable application programming interface (API) and backwards compatibility.
Within the BioRuby project, we found that this drive for stability easily overwhelmed and discouraged developers, not only because of the complexity of the existing code base, but also because coding standards are enforced and extensive tests and documentation are required. Furthermore, newly contributed code may be subject to community scrutiny, and in many cases further demands for improving the code follow. The full process introduces a significant delay between the initial idea and final acceptance of the code in the main project. Months, even years, may pass between stable releases of the main Bio* projects, so it may take a long time before a new feature is publicly released.
To scale up collaborative software development in BioRuby, we recognized that existing and new developers need to be encouraged to contribute more code. To achieve this, we created Biogem, a Ruby application framework for the rapid creation of decentralized, internet-published software modules, designed to lower the barrier to entry.
Biogem was initially inspired by the R/Bioconductor packaging system (Gentleman et al., 2004), which encourages software developers to publish software modules independently using simple rules, and by Ruby on Rails plugins (Thomas ), which provide a software generator and a modular software plugin system.

2 FEATURES

For Biogem, we created specific tools to support the creation of bioinformatics software functionality and to support development ‘best practices’, i.e. infrastructure for software specification, documentation and tests. We also provide tight web integration based on public websites and services. These websites publish and distribute software modules and give web-based access to source code, complete with revision history (see Fig. 1). Biogem exposes Ruby bioinformatics modules, and makes developer productivity and module popularity visible.
Fig. 1.

Biogem eases publication of new bioinformatics Ruby software modules on the Internet, in a few steps. (1) The software generator creates the directory layout and files for a new software module named ‘foo’. (2) The developer writes or modifies source code and (3) quickly and easily publishes the source code and module online, for others to read, install and use. Collaboration (4) is facilitated by publishing source code and changes to navigable websites. The workflow then continues again at (2). The http://biogems.info website tracks published modules. The popularity of each published module is tracked, as well as source code changes, updates, bugs and issues. Unlike the practice of publishing scientific papers, collaboration on software often comes post factum, i.e. after the original publication of a software module. Therefore, it pays to publish software modules early and often. This is reflected in the Biogem workflow.

The primary tool of the Biogem framework is a software generator consisting of templates for bioinformatics scripts, source code, software specification, documentation and tests. For a new software module, the generator automatically creates the required directories and files from these templates. Templates are included for commonly encountered tasks, such as command line parameter handling, error handling and make files. Another Biogem tool publishes the versioned module, with its dependencies, on the internet. The published module is immediately available to bioinformatics users for download and installation in the form of a Ruby gem (i.e. an archive of modular Ruby code with all the supporting files and information needed for installation by ‘package manager’ software). We refer to a Biogem module as a ‘BioRuby plugin’ if the module extends the BioRuby project. Published software modules are easily repackaged by software distributions, e.g. Debian Bio Med (Möller et al., 2010) and BioLinux (Field et al., 2006).
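To make the generator idea concrete, the following Ruby sketch creates a directory layout for a module named ‘foo’ from a few templates. It is an illustration only, not Biogem's actual implementation: the template contents, the file names and the generate_module helper are assumptions introduced here.

```ruby
require "erb"
require "fileutils"
require "tmpdir"

# Hypothetical templates (not Biogem's own): relative path => ERB source.
# "NAME" in a path is replaced by the module name.
TEMPLATES = {
  "lib/bio-NAME.rb" => <<~LIB,
    module Bio
      module <%= const %>
        VERSION = "0.0.1"
      end
    end
  LIB
  "test/test_bio-NAME.rb" => <<~TEST,
    require "minitest/autorun"
    require "bio-<%= name %>"
  TEST
  "README.md" => <<~README
    # bio-<%= name %>

    Describe the module here.
  README
}.freeze

# Render every template for module `name` below `root` and return the
# sorted list of generated paths, relative to the new module directory.
def generate_module(name, root)
  const = name.split(/[-_]/).map(&:capitalize).join
  TEMPLATES.map do |pattern, source|
    relative = pattern.gsub("NAME", name)
    target = File.join(root, "bio-#{name}", relative)
    FileUtils.mkdir_p(File.dirname(target))
    File.write(target, ERB.new(source).result_with_hash(name: name, const: const))
    relative
  end.sort
end

# Usage: generate a throwaway module in a temporary directory.
Dir.mktmpdir do |root|
  puts generate_module("foo", root)
end
```

Keeping the templates as data separate from the rendering step means that a new kind of file, such as a command line script, can be supported by adding a template rather than by changing the generator.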
The Biogem website (see Availability) makes it easy to find and install software modules. The website also allows people to track releases, software dependencies, development activity, outstanding issues, integration test results, documentation and popularity of published modules. A map shows the location of Biogem developers to help foster a sense of international community.
Biogem encourages software development best practices by providing templates for documentation and for multiple test-driven development strategies, such as unit tests, behaviour-driven development and a natural language parser for software specification (e.g. Chelimsky ). A notable difference from the traditional code contribution procedures of the Bio* projects is that best practices are encouraged, rather than enforced. Templates are also included for certain types of functionality, e.g. to generate portable SQL database handlers and to build a dynamic website. With Biogem it is possible to create a functional web application, or service, in just a few steps. Generating the different features is handled through workflows (Fig. 1). We added tutorials for Biogem, which explain the software generators, templates and software publishing. These tutorials are part of the software distribution and are available online.
We created ‘collections’ that bundle important modules together as specific releases. For example, ‘bio-core’ contains stable modules, and ‘bio-core-ext’ contains stable modules with bindings to C libraries. Special purpose collections also exist, such as ‘bio-biolinux’, which is distributed by the Cloud Biolinux project and merged with the Galaxy CloudMan project (Afgan et al., 2010). In the first 8 months after Biogem became available, over 20 new modules were published through it, covering a wide variety of subjects. These modules target, for example, big data handling, next generation sequencing and parsing of bioinformatics data formats (Table 1).
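A collection can be pictured as a gem that ships no code of its own and only declares the modules it bundles as dependencies. The sketch below builds such a specification with the standard RubyGems Gem::Specification class. Whether Biogem collections are implemented exactly this way is an assumption; the collection name, member names and version constraints are hypothetical.

```ruby
require "rubygems"

# Build a gem specification for a hypothetical collection. The collection
# contains no files; it only pins the modules that are released together.
def collection_spec(name, version, members)
  Gem::Specification.new do |spec|
    spec.name    = name
    spec.version = version
    spec.summary = "Collection of modules released together"
    spec.authors = ["Example Author"] # placeholder
    spec.files   = []
    members.each do |gem_name, requirement|
      spec.add_dependency(gem_name, requirement)
    end
  end
end

# Usage: list what installing the collection would pull in.
spec = collection_spec("bio-example-collection", "0.1.0",
                       { "bio-foo" => "~> 0.1", "bio-bar" => ">= 1.0" })
spec.dependencies.each { |dep| puts "#{dep.name} (#{dep.requirement})" }
```

With this design, installing the collection through the package manager resolves and installs every member at a compatible version, while each member can still be released independently.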
Table 1. The introduction of Biogem has led to a broad range of new BioRuby plugins

Name                      Description
bio assembly              Read and write assembly data
bio blastxmlparser        Fast, low memory, big data BLAST parser
bio bwa                   Burrows Wheeler aligner
bio cnls scraper          Nuclear localisation signal prediction
bio six frame             Sequence translation
bio genomic interval      Detect intervals
bio gff3                  Fast, low memory, big data GFF3 parser
bio isoelectric point     Calculate protein isoelectric point
bio kb illumina           Illumina annotations
bio lazyblastxml          Another BLAST XML parser
bio logger                Sane error handling
bio nexml                 NeXML support, for phylogenetic data
bio ngs                   NGS workflows and display, including support for Bwa, Bowtie, TopHat and Cufflinks
bio octopus               Transmembrane domain predictor interface
bio restriction enzyme    DNA cutting operations with REBASE
bio samtools              Samtools API
bio signalp               Signal peptide prediction interface
bio sge                   Split huge files for cluster computing
bio tm hmm                Transmembrane predictor interface
bio ucsc api              UCSC Genome Database binding

An up-to-date list can be found at http://biogems.info.


3 CONCLUSION

Biogem provides an environment for rapid bioinformatics software development with a low barrier to entry. Biogem frees potential contributors from code maturity expectations that can deter them, and encourages Ruby developers to contribute experimental source code early to the BioRuby community. Through Biogem, software is published in a modular way, and best practices are encouraged through infrastructure for software specification and testing. All this results in better use of existing and new software development manpower, thereby scaling up OSS development in bioinformatics. We suggest that Biogem can serve as a generic model, not by replacing existing Bio* projects, but by supplementing them with a decentralized and evolutionary model for collaborative software development.

REFERENCES

1. Jason E Stajich, David Block, Kris Boulez, Steven E Brenner, Stephen A Chervitz, Chris Dagdigian, Georg Fuellen, James G R Gilbert, Ian Korf, Hilmar Lapp, Heikki Lehväslaiho, Chad Matsalla, Chris J Mungall, Brian I Osborne, Matthew R Pocock, Peter Schattner, Martin Senger, Lincoln D Stein, Elia Stupka, Mark D Wilkinson, Ewan Birney (2002) The Bioperl toolkit: Perl modules for the life sciences. Genome Res.
2. Dawn Field, Bela Tiwari, Tim Booth, Stewart Houten, Dan Swan, Nicolas Bertrand, Milo Thurston (2006) Open software for biologists: from famine to feast. Nat Biotechnol.
3. Peter J A Cock, Tiago Antao, Jeffrey T Chang, Brad A Chapman, Cymon J Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczynski, Michiel J L de Hoon (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics.
4. Naohisa Goto, Pjotr Prins, Mitsuteru Nakao, Raoul Bonnal, Jan Aerts, Toshiaki Katayama (2010) BioRuby: bioinformatics software for the Ruby programming language. Bioinformatics.
5. Robert C Gentleman, Vincent J Carey, Douglas M Bates, Ben Bolstad, Marcel Dettling, Sandrine Dudoit, Byron Ellis, Laurent Gautier, Yongchao Ge, Jeff Gentry, Kurt Hornik, Torsten Hothorn, Wolfgang Huber, Stefano Iacus, Rafael Irizarry, Friedrich Leisch, Cheng Li, Martin Maechler, Anthony J Rossini, Gunther Sawitzki, Colin Smith, Gordon Smyth, Luke Tierney, Jean Y H Yang, Jianhua Zhang (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol.
6. Enis Afgan, Dannon Baker, Nate Coraor, Brad Chapman, Anton Nekrutenko, James Taylor (2010) Galaxy CloudMan: delivering cloud compute clusters. BMC Bioinformatics.
7. Steffen Möller, Hajo Nils Krabbenhöft, Andreas Tille, David Paleino, Alan Williams, Katy Wolstencroft, Carole Goble, Richard Holland, Dominique Belhachemi, Charles Plessy (2010) Community-driven computational biology with Debian Linux. BMC Bioinformatics.
8. R C G Holland, T A Down, M Pocock, A Prlić, D Huen, K James, S Foisy, A Dräger, A Yates, M Heuer, M J Schreiber (2008) BioJava: an open-source framework for bioinformatics. Bioinformatics.
