SBtab: a flexible table format for data exchange in systems biology.

Timo Lubitz¹, Jens Hahn¹, Frank T Bergmann², Elad Noor³, Edda Klipp¹, Wolfram Liebermeister⁴.

Abstract

SBtab is a table-based data format for Systems Biology, designed to support automated data integration and model building. It uses the structure of spreadsheets and defines conventions for table structure, controlled vocabularies and semantic annotations. The format comes with predefined table types for experimental data and SBML-compliant model structures and can easily be customized to cover new types of data.
AVAILABILITY AND IMPLEMENTATION: SBtab documents can be created and edited with any text editor or spreadsheet tool. The website www.sbtab.net provides online tools for syntax validation and conversion to SBML and HTML, as well as software for using SBtab in MS Excel, MATLAB and R. The stand-alone Python code contains functions for file parsing, validation, conversion to SBML and HTML and an interface to SQLite databases, to be integrated into Systems Biology workflows. A detailed specification of SBtab, including examples and descriptions of table types and available tools, can be found at www.sbtab.net.
CONTACT: wolfram.liebermeister@gmail.com

Entities: Chemical Species

Year: 2016 PMID: 27153616 PMCID: PMC4978929 DOI: 10.1093/bioinformatics/btw179

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Data exchange between experimental and computational biologists plays a central role in Systems Biology. The scientific community has developed a range of standard formats for data and models that facilitate this exchange and increase software interoperability. Prominent examples are the Systems Biology Markup Language (SBML; Hucka et al.) for the exchange of mathematical models, the Systems Biology Graphical Notation (SBGN; Le Novère et al.) for standardized drawings of biological networks, and the Investigation/Study/Assay format (ISA-TAB), a standard spreadsheet format for experimental data and metadata (Sansone et al.). Standards facilitate automated file parsing and guarantee that files contain complete and unambiguous information, which makes research results easier to understand and to reproduce. However, standardized formats are often designed to be written and read by software rather than by humans. This makes it harder for users without particular technical knowledge or software skills to employ them efficiently. An alternative approach to fostering data exchange is to provide guidelines for minimal information requirements, as demonstrated by the MIRIAM rules for published models in Systems Biology (Le Novère et al.): These rules specify the minimal pieces of model information that must be provided to ensure that model simulations can be reliably reproduced by other researchers. Such rules ensure complete and unambiguous information while placing fewer restrictions on modellers than the aforementioned standardized file formats. Despite these efforts, the usage of standard formats has not been fully established at the interface of experimental and computational Systems Biology. At this interface, complex models need to be constructed from heterogeneous data types (descriptions of network topologies as well as omics, thermodynamic and kinetic data), often using innovative methods.
A number of formats are employed by different subcommunities to approach these tasks: While SBML is popular among modellers, experimentalists typically publish their data, including network structures, as spreadsheets or delimiter-separated text files. In contrast to standardized formats (which also exist for many types of experimental data), spreadsheets provide high flexibility, but file structure and naming conventions vary widely. Integrating these diverse and heterogeneous data into computational models can be tedious, since data often need to be reformatted, associated with unique identifiers and annotated. Even without moving from spreadsheets to other formats, simple conventions regarding the usage of names, annotations and syntax, together with automatic tools for file validation and conversion, could facilitate this process and help avoid errors and repeated experiments.

2 Results and implementation

Combining the advantages of standardized formats with the flexibility of spreadsheet files, we developed SBtab as a set of conventions for spreadsheets and delimited text files. SBtab defines table structures and naming conventions that make tables easy to parse and support precise and complete information in data files. A simple example of an SBtab table is shown in Figure 1: The attributes in the first line provide general information about the dataset, followed by a line with defined column headers (marked by the ! character). Additional data can be placed into ‘uncontrolled’ columns, to which no restrictions apply.
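
The table layout described above (an attribute line, then a header line in which controlled columns are marked with `!`) can be illustrated with a small parser. This is a minimal sketch and not the SBtab reference implementation: the `!!SBtab` prefix and the `key='value'` attribute syntax are assumed here, and the function name `parse_sbtab`, the reaction IDs and the `Comment` column are hypothetical.

```python
import csv
import io
import re

# Hypothetical example table; IDs, formulas and attribute values are illustrative.
EXAMPLE = (
    "!!SBtab TableType='Reaction' TableName='Upper glycolysis'\n"
    "!ID\t!ReactionFormula\t!IsReversible\tComment\n"
    "R1\tGlucose + ATP <=> G6P + ADP\tFalse\thexokinase step\n"
    "R2\tG6P <=> F6P\tTrue\t\n"
)


def parse_sbtab(text):
    """Split a tab-separated SBtab-like table into attributes, columns and rows."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("!!"):
        raise ValueError("first line must be the attribute line")
    # Attribute line (assumed syntax): key='value' pairs describing the table.
    attributes = dict(re.findall(r"(\w+)='([^']*)'", lines[0]))
    reader = csv.reader(io.StringIO("\n".join(lines[1:])), delimiter="\t")
    header = next(reader)
    # Headers starting with '!' are controlled columns; all others are
    # 'uncontrolled' columns to which no restrictions apply.
    controlled = [h[1:] for h in header if h.startswith("!")]
    names = [h.lstrip("!") for h in header]
    rows = [dict(zip(names, row)) for row in reader if row]
    return attributes, controlled, rows


attributes, controlled, rows = parse_sbtab(EXAMPLE)
```

Here `attributes` holds the table-level information, `controlled` lists the column names that a validator would check against the table type, and `rows` keeps every column, including the uncontrolled one.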

Fig. 1.

(A) The depicted SBtab table type ‘Reaction’ describes biochemical reactions in upper glycolysis. The table comprises information about reaction modifiers and reversibility, and annotates reactions with identifiers from the KEGG database. (B) Structural information about a biochemical network model can be converted between SBML and SBtab formats. The SBtab tables refer to different types of SBML elements (e.g. Reaction, Compound, Compartment, Quantity, Rules, Events).

In designing the format, we tried to adopt established standards from elsewhere. Identifiers.org, for example, provides a simple and safe mechanism to refer to the entries of databases and web repositories (Juty et al.). Instead of defining our own format for database references, we simply rely on this mechanism. This makes SBtab files easy to process and may help to promote this useful existing standard. SBtab offers predefined table types that represent diverse kinds of data, e.g. experimental time series, biochemical model parameters (e.g. kinetic constants), or descriptions of SBML-compliant network models. Beyond these predefined table types, SBtab can be tailored to particular types of data to be exchanged. A detailed description of SBtab, together with many examples, is given in the SBtab specification document (http://arxiv.org/abs/1502.01463, Liebermeister).
To simplify the work with SBtab documents, we provide free online tools and software for using SBtab in MS Excel, Python, MATLAB and R. The online tools on www.sbtab.net comprise an automatic syntax validator for SBtab files and a tool for converting models from SBtab to SBML and vice versa. The same functions are also provided as Python code, as an R interface and as an add-in for MS Excel. The latter has been implemented in C# and enables the manipulation of SBML files via the SBtab interface from within Excel.
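
To illustrate the Identifiers.org mechanism mentioned above, the following sketch turns an annotation column and a database entry into a resolvable URI. The column naming `!Identifiers:kegg.reaction` and the entry `R00299` are assumptions made for illustration, and `identifiers_uri` is a hypothetical helper, not part of the SBtab code.

```python
def identifiers_uri(column_header, entry_id):
    """Build an Identifiers.org URI from an annotation column header
    (assumed form '!Identifiers:<namespace>') and a database entry ID."""
    prefix = "!Identifiers:"
    if not column_header.startswith(prefix):
        raise ValueError(f"not an annotation column: {column_header!r}")
    namespace = column_header[len(prefix):]
    return f"https://identifiers.org/{namespace}/{entry_id}"


# Illustrative KEGG reaction entry; replace with the IDs from an actual table.
uri = identifiers_uri("!Identifiers:kegg.reaction", "R00299")
```

Because the namespace is carried by the column header, each cell only needs to hold the bare database ID, which keeps the table compact and easy to read.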
Using the Python-based SBtab parser and object classes, support for SBtab can be easily integrated into Systems Biology workflows. The potential applications are numerous. Because the structure of SBtab resembles that of a relational database, we also provide Python code for the import and export of SBtab to and from an SQLite database. Using such a database, model or data elements can be queried via SQL statements, which are supported by virtually all programming languages. This can be used in workflows comprising data storage, manipulation or creation of models (e.g. for the systematic construction of kinetic models; Stanford et al.), conversion of SBML files into human-readable formats, or incorporation of experimental data into models (e.g. SBtab as an input format for parameter balancing; Lubitz et al.).
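
The idea of querying table contents via SQL can be sketched with Python's standard `sqlite3` module. This is not the SBtab import/export code itself: the table schema, the column names and the three example rows are assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical rows, as they might come out of a parsed 'Reaction' table.
rows = [
    ("R1", "Glucose + ATP <=> G6P + ADP", 0),
    ("R2", "G6P <=> F6P", 1),
    ("R3", "F6P + ATP <=> FBP + ADP", 0),
]

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Reaction ("
    "ID TEXT PRIMARY KEY, ReactionFormula TEXT, IsReversible INTEGER)"
)
conn.executemany("INSERT INTO Reaction VALUES (?, ?, ?)", rows)

# Query the model elements with plain SQL: all reversible reactions.
reversible = [
    record[0]
    for record in conn.execute(
        "SELECT ID FROM Reaction WHERE IsReversible = 1 ORDER BY ID"
    )
]
conn.close()
```

Mapping one SBtab table to one database table, with the controlled columns as fields, is the simplest possible correspondence; a real workflow would likely also store the table attributes and link tables through their shared IDs.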

3 Conclusion

SBtab is a flexible, table-based format for data exchange in Systems Biology that comes with tools for diverse groups of users. The use of SBtab can be beneficial both for programmers and for end users with little technical background: Programmers can use the open-source code to easily integrate the functionality into their own software. Excel users can use the MS Excel add-in to conveniently edit SBtab and SBML files in their normal working environment. Unlike specialized formats such as SBML, SBtab is designed to be human-readable: It relies on existing spreadsheets and defines rules that ensure complete and unambiguous information. Adapting existing experimental data files to the SBtab format is simple (since it does not enforce complex syntax restrictions) and rewarding, because the SBtab format is oriented towards established and accepted standards. Hence, SBtab has the potential to bridge the gap between the vast amounts of empirical data, typically stored in spreadsheets, and software that requires structured input formats. Currently, it is used for data storage (e.g. source data of the eQuilibrator web tool; Flamholz et al.) and modelling workflows (e.g. Stanford et al.), and is supported by the Data Repository for Kinetic Models of Biological Systems (KiMoSys; Costa et al.). We envisage that SBtab will foster an easy exchange of data for applications in which no specialized data formats have been established, or where these formats would be too restrictive to use. We also encourage users to customize SBtab for their needs and to tell us about interesting use cases, to support the further development of the format.

9 in total

1. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models.

Authors: M Hucka; A Finney; H M Sauro; H Bolouri; J C Doyle; H Kitano; A P Arkin; B J Bornstein; D Bray; A Cornish-Bowden; A A Cuellar; S Dronov; E D Gilles; M Ginkel; V Gor; I I Goryanin; W J Hedley; T C Hodgman; J-H Hofmeyr; P J Hunter; N S Juty; J L Kasberger; A Kremling; U Kummer; N Le Novère; L M Loew; D Lucio; P Mendes; E Minch; E D Mjolsness; Y Nakayama; M R Nelson; P F Nielsen; T Sakurada; J C Schaff; B E Shapiro; T S Shimizu; H D Spence; J Stelling; K Takahashi; M Tomita; J Wagner; J Wang
Journal: Bioinformatics Date: 2003-03-01 Impact factor: 6.937

2. Minimum information requested in the annotation of biochemical models (MIRIAM).

Authors: Nicolas Le Novère; Andrew Finney; Michael Hucka; Upinder S Bhalla; Fabien Campagne; Julio Collado-Vides; Edmund J Crampin; Matt Halstead; Edda Klipp; Pedro Mendes; Poul Nielsen; Herbert Sauro; Bruce Shapiro; Jacky L Snoep; Hugh D Spence; Barry L Wanner
Journal: Nat Biotechnol Date: 2005-12 Impact factor: 54.908

3. The Systems Biology Graphical Notation.

Authors: Nicolas Le Novère; Michael Hucka; Huaiyu Mi; Stuart Moodie; Falk Schreiber; Anatoly Sorokin; Emek Demir; Katja Wegner; Mirit I Aladjem; Sarala M Wimalaratne; Frank T Bergman; Ralph Gauges; Peter Ghazal; Hideya Kawaji; Lu Li; Yukiko Matsuoka; Alice Villéger; Sarah E Boyd; Laurence Calzone; Melanie Courtot; Ugur Dogrusoz; Tom C Freeman; Akira Funahashi; Samik Ghosh; Akiya Jouraku; Sohyoung Kim; Fedor Kolpakov; Augustin Luna; Sven Sahle; Esther Schmidt; Steven Watterson; Guanming Wu; Igor Goryanin; Douglas B Kell; Chris Sander; Herbert Sauro; Jacky L Snoep; Kurt Kohn; Hiroaki Kitano
Journal: Nat Biotechnol Date: 2009-08-07 Impact factor: 54.908

4. The first RSBI (ISA-TAB) workshop: "can a simple format work for complex studies?".

Authors: Susanna-Assunta Sansone; Philippe Rocca-Serra; Marco Brandizi; Alvis Brazma; Dawn Field; Jennifer Fostel; Andrew G Garrow; Jack Gilbert; Federico Goodsaid; Nigel Hardy; Phil Jones; Allyson Lister; Michael Miller; Norman Morrison; Tim Rayner; Nataliya Sklyar; Chris Taylor; Weida Tong; Guy Warner; Stefan Wiemann
Journal: OMICS Date: 2008-06

5. Parameter balancing in kinetic models of cell metabolism.

Authors: Timo Lubitz; Marvin Schulz; Edda Klipp; Wolfram Liebermeister
Journal: J Phys Chem B Date: 2010-11-01 Impact factor: 2.991

6. Identifiers.org and MIRIAM Registry: community resources to provide persistent identification.

Authors: Nick Juty; Nicolas Le Novère; Camille Laibe
Journal: Nucleic Acids Res Date: 2011-12-02 Impact factor: 16.971

7. eQuilibrator--the biochemical thermodynamics calculator.

Authors: Avi Flamholz; Elad Noor; Arren Bar-Even; Ron Milo
Journal: Nucleic Acids Res Date: 2011-11-07 Impact factor: 16.971

8. KiMoSys: a web-based repository of experimental data for KInetic MOdels of biological SYStems.

Authors: Rafael S Costa; André Veríssimo; Susana Vinga
Journal: BMC Syst Biol Date: 2014-08-13

9. Systematic construction of kinetic models from genome-scale metabolic networks.

Authors: Natalie J Stanford; Timo Lubitz; Kieran Smallbone; Edda Klipp; Pedro Mendes; Wolfram Liebermeister
Journal: PLoS One Date: 2013-11-14 Impact factor: 3.240

