BioRuby: bioinformatics software for the Ruby programming language.

Naohisa Goto, Pjotr Prins, Mitsuteru Nakao, Raoul Bonnal, Jan Aerts, Toshiaki Katayama.

Abstract

SUMMARY: The BioRuby software toolkit contains a comprehensive set of free development tools and libraries for bioinformatics and molecular biology, written in the Ruby programming language. BioRuby has components for sequence analysis, pathway analysis, protein modelling and phylogenetic analysis; it supports many widely used data formats and provides easy access to databases, external programs and public web services, including BLAST, KEGG, GenBank, MEDLINE and GO. BioRuby comes with a tutorial, documentation and an interactive environment that can be used both in the shell and in the web browser. AVAILABILITY: BioRuby is free and open source software, made available under the Ruby license. BioRuby runs on all platforms that support Ruby, including Linux, Mac OS X and Windows, and, with JRuby, on the Java Virtual Machine. The source code is available from http://www.bioruby.org/. CONTACT: katayama@bioruby.org

Year: 2010 PMID: 20739307 PMCID: PMC2951089 DOI: 10.1093/bioinformatics/btq475

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

Research in molecular biology depends critically on access to databases and web services. The BioRuby project was conceived in 2000 to provide easy access to bioinformatics resources through free and open source tools and libraries for Ruby, a dynamic open source programming language with a focus on simplicity and productivity (www.ruby-lang.org). The BioRuby software components cover a wide range of functionality, comparable to that offered by the other Bio* projects, each of which targets a different programming language (Stajich and Lapp, 2006), such as BioPerl (Stajich et al., 2002), Biopython (Cock et al., 2009) and BioJava (Holland et al., 2008). BioRuby software components are written in standard Ruby, so they run on all operating systems that support Ruby itself, including Linux, OS X, FreeBSD, Solaris and Windows. With JRuby, BioRuby can also run inside a Java Virtual Machine (JVM), allowing interaction with Java applications and libraries such as Cytoscape for visualization (Shannon et al., 2003). Both BioRuby and Ruby are used in bioinformatics for scripting (Aerts and Law, 2009), scripting against applications (Katoh et al., 2005), modelling (Lee and Blundell, 2009; Metlagel et al., 2007), analysis (Prince and Marcotte, 2008), visualization and service integration (Philippi, 2004). The web development framework ‘Ruby on Rails’ is used to create web applications and web services (Biegert et al., 2006; Jacobsen et al., 2010). BioRuby provides client access to major web services, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG; see example in Fig. 1) (Kanehisa et al., 2008), and the TogoWS service, which provides a uniform web service front-end for the major bioinformatics databases (Katayama et al., 2010).

Fig. 1.

BioRuby shell example of fetching a KEGG graph using BioRuby's KEGG API (Kanehisa et al., 2008). After installing BioRuby, the ‘bioruby’ command starts the interactive shell. With the bfind command, the KEGG MODULE database is queried for entries involved in the metabolic ‘citrate cycle’, or tricarboxylic acid cycle. The purple and blue colours, in input and output, reflect two modules in the carbon oxidation pathway. The user loads and confirms entries with the flatparse and pathways commands. Next, KEGG ORTHOLOGY database IDs are fetched and colours are assigned to the enzymes in each module. Finally, KEGG generates the coloured image of the ‘citrate cycle’ pathway, and the image is saved locally.

The BioRuby source tree contains over 580 documented classes, 2800 public methods and 20 000 unit test assertions. Source code is kept under Git version control, which allows anyone to clone the source tree and start submitting changes. We have found that Git substantially lowers the barrier for newcomers to start contributing to the project. In the last 2 years, 100 people have started tracking changes to the source tree, and 32 people have cloned the repository. The BioRuby project is part of the Open Bioinformatics Foundation, which hosts the project website and mailing list, and organizes the annual Bioinformatics Open Source Conference together with the other Bio* projects. A number of BioRuby features either support Bio* cross-project standards, such as the BioSQL relational model for interoperable storage of certain data objects, or are implemented in coordination with the other Bio* projects, including support for the FASTQ (Cock et al., 2010) and phyloXML (Han and Zmasek, 2009) data exchange formats.

2 FEATURES

BioRuby covers a wide range of functional areas which have been logically divided into separate modules (Table 1).

Table 1.

BioRuby modules

Category	Module list
Object	Sequence, pathway, tree, bibliography reference
Sequence	Manipulation, translation, alignment, location, mapping, feature table, molecular weight, design siRNA, restriction enzyme
Format	GenBank, EMBL, UniProt, KEGG, PDB, MEDLINE, REBASE, FASTA, FASTQ, GFF, MSF, ABIF, SCF, GCG, Lasergene, GEO SOFT, Gene Ontology
Tool	BLAST, FASTA, EMBOSS, HMMER, InterProScan, GenScan, BLAT, Sim4, Spidey, MEME, ClustalW, MUSCLE, MAFFT, T-Coffee, ProbCons
Phylogeny	PHYLIP, PAML, phyloXML, NEXUS, Newick
Web service	NCBI, EBI, DDBJ, KEGG, TogoWS, PSORT, TargetP, PTS1, SOSUI, TMHMM
OBDA	BioSQL, BioFetch, indexed flat files

Refer to www.bioruby.org for an explanation of all acronyms.
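Table 1 only names the modules. To illustrate what the Sequence category covers, the following is a minimal plain-Ruby sketch of three such operations: reverse complement, GC fraction and translation with the standard genetic code (NCBI table 1). It deliberately uses no BioRuby classes; the helper names `reverse_complement`, `gc_fraction` and `translate` are hypothetical and are not BioRuby's API. It assumes unambiguous DNA input and translates only frame 1.

```ruby
# Plain-Ruby sketch of some "Sequence" operations from Table 1.
# Hypothetical helpers for illustration only; this is NOT BioRuby's API.

# Standard genetic code (NCBI translation table 1); codons are ordered by
# first, second and third base over the alphabet T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Reverse complement of a DNA string (A/C/G/T, any case).
def reverse_complement(dna)
  dna.upcase.tr("ACGT", "TGCA").reverse
end

# Fraction of G and C bases; 0.0 for an empty sequence.
def gc_fraction(dna)
  return 0.0 if dna.empty?
  dna.upcase.count("GC").to_f / dna.length
end

# Translate in frame 1. A trailing partial codon is ignored and codons
# containing characters other than A/C/G/T become "X"; stops become "*".
def translate(dna)
  dna.upcase.scan(/.../).map do |codon|
    idx = codon.chars.map { |b| BASES.index(b) }
    next "X" if idx.include?(nil)
    AMINO_ACIDS[16 * idx[0] + 4 * idx[1] + idx[2]]
  end.join
end

puts reverse_complement("ATGC") # => GCAT
puts gc_fraction("ATGC")        # => 0.5
puts translate("ATGGCCTAA")     # => MA*
```

Indexing a 64-character string by base position keeps the codon table compact; a real toolkit would also need alternative genetic codes and IUPAC ambiguity codes.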

BioRuby provides access to a comprehensive range of public bioinformatics resources. For example, BioRuby supports Open Biological Database Access (OBDA), a generic and standardized way of accessing biological data sources. In addition, BioRuby can directly process local database files in a variety of flat file formats, including FASTA, FASTQ (Fig. 2), GenBank and PDB. BioRuby also supports querying remote online resources through their programmatic interfaces, such as those provided by KEGG, the DNA Data Bank of Japan (DDBJ), the National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI).
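BioRuby's own flat file readers are not reproduced here. As a rough illustration of what processing the simplest of these formats involves, the following standard-library sketch reads FASTA records (a ">" definition line followed by possibly wrapped sequence lines). The helper `each_fasta_record` is hypothetical, not part of BioRuby.

```ruby
# Plain-Ruby sketch (standard library only) of reading a FASTA flat file.
# each_fasta_record is a hypothetical helper, not part of BioRuby's API.
require "stringio"

# Yields [definition_line, sequence] for every record in an IO-like object.
# Blank lines are skipped; sequence lines before the first ">" are discarded.
def each_fasta_record(io)
  header = nil
  chunks = []
  io.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?(">")
      yield header, chunks.join if header # emit the previous record
      header = line[1..-1].strip
      chunks = []
    else
      chunks << line # sequence may be wrapped over several lines
    end
  end
  yield header, chunks.join if header     # emit the final record
end

fasta = StringIO.new(">seq1 example\nACGT\nAC\n>seq2\nGGG\n")
each_fasta_record(fasta) { |defline, seq| puts "#{defline}: #{seq.length}" }
# seq1 example: 6
# seq2: 3
```

Streaming line by line over any IO-like object means the same helper works on a `File` or a `StringIO` without loading the whole file into memory at once.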

Fig. 2.

BioRuby example of masking sequences from next-generation sequencing data in FASTQ format using a defined quality_threshold, and writing the results in FASTA format.
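The code of Fig. 2 is not reproduced in this text. A minimal plain-Ruby sketch of the same idea follows: each base whose Phred quality falls below a threshold is replaced by a mask character, and the reads are written out as FASTA. It does not use BioRuby's FASTQ support. The Phred+33 (Sanger) encoding, the unwrapped four-line record layout, the threshold value of 20 and the "N" mask character are all assumptions, and `mask_read` and `fastq_to_masked_fasta` are hypothetical names.

```ruby
# Plain-Ruby sketch (no BioRuby classes): mask low-quality bases in FASTQ
# reads and write the result as FASTA. Assumptions: unwrapped 4-line FASTQ
# records, Phred+33 ("Sanger") quality encoding, and "N" as the mask symbol.
require "stringio"

QUALITY_THRESHOLD = 20 # hypothetical cut-off: bases with Phred < 20 are masked

# Replace every base whose Phred score is below threshold with mask_char.
def mask_read(seq, qual, threshold, mask_char = "N")
  raise ArgumentError, "sequence/quality length mismatch" unless seq.length == qual.length
  seq.chars.zip(qual.bytes).map { |base, q| (q - 33) < threshold ? mask_char : base }.join
end

# Read FASTQ from in_io and write masked reads as FASTA to out_io.
def fastq_to_masked_fasta(in_io, out_io, threshold)
  lines = in_io.each_line.map(&:chomp).reject(&:empty?) # drop blank lines
  lines.each_slice(4) do |header, seq, plus, qual|
    unless header.start_with?("@") && plus && plus.start_with?("+") && qual
      raise ArgumentError, "malformed FASTQ record: #{header.inspect}"
    end
    out_io.puts ">#{header[1..-1]}" # FASTA definition line
    out_io.puts mask_read(seq, qual, threshold)
  end
end

fastq = StringIO.new("@read1\nACGTAC\n+\nII#II#\n")
out = StringIO.new
fastq_to_masked_fasta(fastq, out, QUALITY_THRESHOLD)
print out.string
# >read1
# ACNTAN
```

Masking rather than trimming keeps every read at its original length, so positions stay comparable with the raw data; whether that is preferable depends on the downstream tool.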

BioRuby has online documentation, tutorials and code examples. It is straightforward to get started with BioRuby and use it to replace, or glue together, legacy shell scripts, or to mix Ruby on Rails into an existing web application. BioRuby comes with an interactive environment, both for the command-line shell and in the browser. Ideas can be quickly prototyped in the interactive environment and saved as ‘scripts’ for later use. Such an interactive environment has proven especially useful for bioinformatics training and teaching (Fig. 1). New features, and refinements of existing ones, are constantly being added to the BioRuby code base. Current development activity focuses on adding support for the semantic web, and on designing a plugin system that allows entirely new components to be added in a loosely coupled manner, so that experimental code can be developed without affecting BioRuby's core stability and portability.
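The plugin system is only described here as being under design. To make "loosely coupled" concrete, the following is a hypothetical sketch of one common pattern, a name-based registry: the core looks components up by name and never references them directly, and an optional component that fails to load is skipped with a warning instead of breaking the core. `PluginRegistry`, `register`, `fetch`, `load_optional` and the `ExperimentalGcSkew` example are invented for illustration; they are neither BioRuby's API nor its eventual design.

```ruby
# Hypothetical sketch of a loosely coupled plugin registry. None of these
# names (PluginRegistry, register, fetch, load_optional) come from BioRuby.
module PluginRegistry
  @plugins = {}

  class << self
    # A plugin announces itself under a name; the core never references it directly.
    def register(name, plugin)
      @plugins[name.to_sym] = plugin
    end

    # Look up a registered plugin; unknown names raise KeyError.
    def fetch(name)
      @plugins.fetch(name.to_sym) { raise KeyError, "no plugin named #{name}" }
    end

    def names
      @plugins.keys.sort
    end

    # Try to require an optional plugin file. A missing or broken plugin
    # (LoadError, SyntaxError or a runtime error) is reported and skipped.
    def load_optional(path)
      require path
      true
    rescue ScriptError, StandardError => e
      warn "plugin #{path} not loaded: #{e.message}"
      false
    end
  end
end

# An "experimental" component living outside the core (e.g. in its own gem):
# GC skew, (G - C) / (G + C), with 0.0 when there are no G or C bases.
module ExperimentalGcSkew
  def self.call(seq)
    g = seq.count("G")
    c = seq.count("C")
    g + c == 0 ? 0.0 : (g - c).to_f / (g + c)
  end
end
PluginRegistry.register(:gc_skew, ExperimentalGcSkew)

puts PluginRegistry.fetch(:gc_skew).call("GGGC") # => 0.5
```

Because the core only depends on the registry, an experimental component can be added, replaced or left out without touching core code, which is the property the paragraph above asks of the plugin system.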

3 CONCLUSION

The BioRuby software toolkit provides a broad range of functionality for molecular biology and easy access to bioinformatics resources. BioRuby is written in Ruby, a dynamic programming language with a focus on simplicity and productivity that targets all popular operating systems and the JVM. The BioRuby project is a vibrant, international, collaborative software initiative that delivers life science programming resources for researchers who want to benefit from the productivity features of the Ruby language, as well as from the larger Ruby ecosystem of reusable open source components.

REFERENCES

1. Light-weight integration of molecular biological databases.

Authors: Stephan Philippi
Journal: Bioinformatics Date: 2004-01-01 Impact factor: 6.937

2. The Bioperl toolkit: Perl modules for the life sciences.

Authors: Jason E Stajich; David Block; Kris Boulez; Steven E Brenner; Stephen A Chervitz; Chris Dagdigian; Georg Fuellen; James G R Gilbert; Ian Korf; Hilmar Lapp; Heikki Lehväslaiho; Chad Matsalla; Chris J Mungall; Brian I Osborne; Matthew R Pocock; Peter Schattner; Martin Senger; Lincoln D Stein; Elia Stupka; Mark D Wilkinson; Ewan Birney
Journal: Genome Res Date: 2002-10 Impact factor: 9.043

3. Open source tools and toolkits for bioinformatics: significance, and where are we?

Authors: Jason E Stajich; Hilmar Lapp
Journal: Brief Bioinform Date: 2006-08-09 Impact factor: 11.622

4. Ruby-Helix: an implementation of helical image processing based on object-oriented scripting language.

Authors: Zoltan Metlagel; Yayoi S Kikkawa; Masahide Kikkawa
Journal: J Struct Biol Date: 2006-08-15 Impact factor: 2.867

5. TogoWS: integrated SOAP and REST APIs for interoperable bioinformatics Web services.

Authors: Toshiaki Katayama; Mitsuteru Nakao; Toshihisa Takagi
Journal: Nucleic Acids Res Date: 2010-05-14 Impact factor: 16.971

6. mspire: mass spectrometry proteomics in Ruby.

Authors: John T Prince; Edward M Marcotte
Journal: Bioinformatics Date: 2008-10-16 Impact factor: 6.937

7. The MPI Bioinformatics Toolkit for protein sequence analysis.

Authors: Andreas Biegert; Christian Mayer; Michael Remmert; Johannes Söding; Andrei N Lupas
Journal: Nucleic Acids Res Date: 2006-07-01 Impact factor: 16.971

8. MAFFT version 5: improvement in accuracy of multiple sequence alignment.

Authors: Kazutaka Katoh; Kei-ichi Kuma; Hiroyuki Toh; Takashi Miyata
Journal: Nucleic Acids Res Date: 2005-01-20 Impact factor: 16.971

9. BioJava: an open-source framework for bioinformatics.

Authors: R C G Holland; T A Down; M Pocock; A Prlić; D Huen; K James; S Foisy; A Dräger; A Yates; M Heuer; M J Schreiber
Journal: Bioinformatics Date: 2008-08-08 Impact factor: 6.937

10. KEGG for linking genomes to life and the environment.

Authors: Minoru Kanehisa; Michihiro Araki; Susumu Goto; Masahiro Hattori; Mika Hirakawa; Masumi Itoh; Toshiaki Katayama; Shuichi Kawashima; Shujiro Okuda; Toshiaki Tokimatsu; Yoshihiro Yamanishi
Journal: Nucleic Acids Res Date: 2007-12-12 Impact factor: 16.971
