The evolution of standards and data management practices in systems biology.

Natalie J Stanford¹, Katherine Wolstencroft², Martin Golebiewski³, Renate Kania³, Nick Juty⁴, Christopher Tomlinson⁵, Stuart Owen⁶, Sarah Butcher⁵, Henning Hermjakob⁴, Nicolas Le Novère⁷, Wolfgang Mueller³, Jacky Snoep⁸, Carole Goble⁶.

Year: 2015 PMID: 26700851 PMCID: PMC4704484 DOI: 10.15252/msb.20156053

Source DB: PubMed Journal: Mol Syst Biol ISSN: 1744-4292 Impact factor: 11.429

Introduction

Systems biology involves the integration of multiple heterogeneous data sets in order to model and predict biological processes. The domain's interdisciplinary nature requires data, models and other research assets to be formatted and described in standard ways to enable exchange and reuse. Infrastructure for Systems Biology Europe (ISBE) is a project to establish essential, centralized services for systems biology researchers throughout the systems biology lifecycle. A key component of ISBE is support for the management, integration and exchange of data, models, results and protocols. To inform further ISBE development, we surveyed the community to evaluate the uptake of available standards and the current practices of researchers in data and model management. The survey addressed four key areas: standards usage; data and model storage before publication; sharing in public repositories after publication; and reusability of data, models and results. The survey was sent to major mailing lists targeting the systems biology and computational biology communities and advertised at relevant consortia meetings. It elicited 153 responses from 17 countries across 6 continents, representing a cross section of the systems biology community (Appendix Fig S1). Lessons from the survey are being implemented as part of an ISBE supporting project, FAIRDOM (www.fair-dom.org). To understand how uptake of standards has developed, we compared our findings to a previous study by Klipp et al (2007). Fig 1 shows a summary of the survey results (detailed results in Dataset EV1). The acronyms used in the text are explained in Table 1.

Figure 1. Survey summary.

Table 1. Glossary of acronyms

Acronym	Description	Link
Array Ex.	Array Express—archive of functional genomics data	https://www.ebi.ac.uk/arrayexpress/
BioModels	Database for storing curated and non‐curated systems biology computational models	https://www.ebi.ac.uk/biomodels/
CellML	Standard for formatting models, as well as a model repository	https://www.cellml.org/
ChEBI	Chemical Entities of Biological Interest—a dictionary of molecular entities	https://www.ebi.ac.uk/chebi/init.do
COMBINE	Computational Modelling in Biology Network	http://co.mbine.org
ENA	European Nucleotide Archive—a comprehensive record of nucleotide sequences	http://www.ebi.ac.uk/ena
FAIRDOM	Findable Accessible Interoperable Reusable Data standard Operating Procedures and Models	http://fair-dom.org
FASTA	Text‐based format for representing nucleotide sequences	https://en.wikipedia.org/wiki/FASTA_format
GEO	Gene Expression Omnibus—repository for functional genomics data	http://www.ncbi.nlm.nih.gov/geo/
GO	Gene Ontology—a controlled vocabulary of gene and gene product attributes	http://geneontology.org/
ISBE	Infrastructure for Systems Biology Europe	http://project.isbe.eu
ISO	International Organization for Standardization	http://www.iso.org
JWS Online	Tool for online simulation of systems biology models	http://jjj.mib.ac.uk/
KISAO	Kinetic Simulation Algorithm Ontology, for identifying algorithms and associated set‐up of simulations	http://co.mbine.org/standards/kisao
MIAME	Minimum Information about a Microarray Experiment	http://fged.org/projects/miame/
MIASE	Minimum Information about a Simulation Experiment	http://co.mbine.org/standards/miase
MIRIAM	Minimum Information Required in the Annotation of Models	http://co.mbine.org/standards/miriam
SBGN	Systems Biology Graphical Notation	http://www.sbgn.org/
SBML	Systems Biology Mark‐up Language	http://sbml.org/
SEEK	Bespoke systems biology data management platform, which works as an aggregated content commons, and a database	http://fair-dom.org/SEEK

Standards usage

Formatting and describing data and models using community standards enables them to be understood, compared, exchanged and reused by both collaborators and the wider community. As such, uptake of standards is vital for high-quality, reproducible research. This is especially true for systems biology, which naturally requires frequent exchange of data and models. In systems biology, standards are primarily developed by community standardization initiatives such as COMBINE (Hucka et al, 2015) and ISO. In this study, we consider three major types of standards: standard formats for representing data and models; standard metadata checklists for describing particular types of data and models; and controlled vocabularies and ontologies that provide a common notation and annotation vocabulary.

In 2007, Klipp et al identified formats, in particular those for encoding models, as the most widely used standards. This is still the case, with SBML (60%) and SBGN (22%) (Hucka et al, 2015) dominating. These standard formats allow easy exchange between software tools and databases, improving (re)usability. The availability and uptake of formats have grown rapidly since 2007: standards for formatting and visualizing models, and for some common experimental data, are now available.

Metadata standards (standards for the data that describe the data) were highlighted in 2007 as requiring significant development. There are now over 40 minimum information checklists, each of which consistently structures the least amount of information required to interpret a data set. These cover common data and model types in systems biology (see Appendix). MIRIAM (Le Novère et al, 2005), MIAME (Brazma et al, 2001) and MIASE (Waltemath et al, 2011) are the most used by respondents. Ontologies are often used as annotation vocabularies within metadata descriptions. Ontologies for annotating gene functions (GO, 47%; Ashburner et al, 2000), small molecules (ChEBI, 21%; Hastings et al, 2013) and model simulations (KISAO, 16%; Courtot et al, 2011) are the most popular in the community, with growing acceptance since 2007.

Whilst the availability of standards and their growing uptake are encouraging, there is still a dearth of standards for many data types. A priority must be to increase the availability of standards for common data types not yet covered. One of the major bottlenecks for uptake is most likely the lack of tools that implement support for standards. If information management software supported the production of standards-compliant results, compliance would become part of the research process, reducing the time, knowledge and skills required to achieve it and facilitating quicker and more widespread adoption.
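As a minimal sketch of how tool support could make checklist compliance part of routine work, the following Python snippet flags the fields that a metadata record is still missing before it is shared. The field names, the example record and the `missing_fields` helper are hypothetical and chosen only for illustration; they are not the actual requirements of MIRIAM, MIAME or MIASE, which should be taken from the respective specifications.

```python
from typing import Iterable, List, Mapping

# Hypothetical checklist for illustration only. These are NOT the actual
# fields required by MIRIAM, MIAME or MIASE.
EXAMPLE_CHECKLIST = ("title", "creator", "organism", "protocol", "units")


def missing_fields(record: Mapping[str, object],
                   checklist: Iterable[str] = EXAMPLE_CHECKLIST) -> List[str]:
    """Return the checklist fields that are absent or blank in `record`."""
    missing = []
    for field in checklist:
        value = record.get(field)
        # Treat both an absent key and a whitespace-only value as missing.
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


if __name__ == "__main__":
    # Made-up record: "protocol" is blank and "units" is absent.
    record = {
        "title": "Glucose uptake time course",
        "creator": "A. Researcher",
        "organism": "Saccharomyces cerevisiae",
        "protocol": "  ",
    }
    print(missing_fields(record))
```

A check of this kind could run whenever a data set is saved or uploaded, so that gaps are reported while the researcher still has the missing information to hand.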

Storage of research assets

Systems biology researchers need to exchange experimental data, computer code and models between collaborators within their institute and with distributed, external partners. Despite this exchange being a key activity, the majority of researchers still only store their work on their local hard disc (71%) or on shared file systems within their institute (58%). This can make versioning or snapshotting research assets difficult and raises barriers, for example to sharing with collaborators or when key personnel leave a team. Content management systems and bespoke systems biology platforms are more amenable to organizing, versioning and sharing, but are only used by 31% and 7% of researchers, respectively. Bespoke platforms require more investment in uploading and updating, but provide users with more security for data backup and offer versioning and easier sharing options.
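To illustrate what snapshotting a research asset can involve, the sketch below copies a file into a store directory under the SHA-256 digest of its content and appends an entry to a manifest. It is an assumption-laden illustration using only the Python standard library, not a description of how SEEK or any other platform works; the `snapshot` function, the store layout and the `manifest.jsonl` file name are all hypothetical.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def snapshot(asset: Path, store: Path) -> str:
    """Copy `asset` into `store` under its SHA-256 digest and log the event.

    Hypothetical layout: one file per distinct content, plus a
    'manifest.jsonl' file recording every snapshot that was taken.
    """
    digest = hashlib.sha256(asset.read_bytes()).hexdigest()
    store.mkdir(parents=True, exist_ok=True)
    target = store / digest
    if not target.exists():  # identical content is stored only once
        shutil.copyfile(asset, target)
    entry = {
        "name": asset.name,
        "sha256": digest,
        "taken_at": datetime.now(timezone.utc).isoformat(),
    }
    with (store / "manifest.jsonl").open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return digest
```

Because each snapshot is addressed by its content, an unchanged file always yields the same digest, and a collaborator can verify that a received copy matches a recorded version.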

Sharing in public repositories

Public repositories are more commonly used to share models (56%) than data (39%). BioModels (Chelliah et al, 2015) is the most popular model database (33%); it is also one of the most popular for finding models after publication (22%). Data are often published in dedicated repositories, grouped by data type (e.g. metabolomics data in a metabolomics database) rather than by function (e.g. all data on human liver). This can make identifying complementary data sets for integration into models difficult, even if the data are well annotated. A major disadvantage for systems biology is that data sets generated from the same samples to address a specific biological process can be separated and submitted to several independent repositories, which results in a loss of experimental context. Some researchers use content aggregator commons, such as SEEK (7%) (Wolstencroft et al, 2015), which support functional linking for data and model integration, helping to retain experimental context. Sharing data and models solely through supplementary material in journal articles is still common practice. This represents a publication-centric view of the data, so finding related data may be more difficult than when data are submitted to public repositories.

Reusability of models

Being able to reuse data and models in different studies maximizes the return on research investments. The majority of respondents found it difficult to reuse models and associated data. Model parameters and the traceability of their origins were particularly notable as areas needing improvement (67% reported issues). These could be improved with better annotation of the original data and better semantic linking of the models to the experimental data that were used to construct them.
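One way to picture the traceability of parameter origins is a record that keeps each parameter value together with a pointer to the data it came from. The Python sketch below is a hypothetical illustration, not an existing schema or the data model of any tool named here; the class names, fields, identifiers and values are invented for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Provenance:
    dataset_id: str  # identifier of the data set the value was derived from
    method: str      # how the value was obtained, e.g. "measured" or "fitted"


@dataclass(frozen=True)
class Parameter:
    name: str
    value: float
    unit: str
    provenance: Optional[Provenance] = None  # None means the origin is unknown


def untraceable(parameters: Iterable[Parameter]) -> List[str]:
    """Return the names of parameters that carry no provenance record."""
    return [p.name for p in parameters if p.provenance is None]


if __name__ == "__main__":
    # Invented example values and identifiers, for illustration only.
    params = [
        Parameter("Vmax", 1.5, "mM/min", Provenance("dataset-001", "fitted")),
        Parameter("Km", 0.2, "mM"),
    ]
    print(untraceable(params))
```

Listing the parameters without provenance before a model is shared gives a simple measure of how much of it can be traced back to experimental data.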

Conclusions and outlook

It is clear from the survey that we need: software tools that support standards, thereby facilitating their adoption; shared/cloud-based platforms to disseminate assets across the community; annotation and curation of assets to enable their meaningful integration; and intimate, persistent links between structured and annotated data and models. To address these issues, we suggest that centralized, coordinated infrastructures like ISBE, in collaboration with standardization initiatives such as COMBINE, take the lead in improving the availability, adoption and long-term sustainability of standards. This can be achieved through the training of researchers as well as tool development to support their workflows. The community should also look towards encouraging data and model sharing through incentives such as credit mechanisms and appropriate mandates on practices from journals.

References

1. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data.

Authors: A Brazma; P Hingamp; J Quackenbush; G Sherlock; P Spellman; C Stoeckert; J Aach; W Ansorge; C A Ball; H C Causton; T Gaasterland; P Glenisson; F C Holstege; I F Kim; V Markowitz; J C Matese; H Parkinson; A Robinson; U Sarkans; S Schulze-Kremer; J Stewart; R Taylor; J Vilo; M Vingron
Journal: Nat Genet Date: 2001-12 Impact factor: 38.330

2. Minimum information requested in the annotation of biochemical models (MIRIAM).

Authors: Nicolas Le Novère; Andrew Finney; Michael Hucka; Upinder S Bhalla; Fabien Campagne; Julio Collado-Vides; Edmund J Crampin; Matt Halstead; Edda Klipp; Pedro Mendes; Poul Nielsen; Herbert Sauro; Bruce Shapiro; Jacky L Snoep; Hugh D Spence; Barry L Wanner
Journal: Nat Biotechnol Date: 2005-12 Impact factor: 54.908

3. Systems biology standards--the community speaks.

Authors: Edda Klipp; Wolfram Liebermeister; Anselm Helbig; Axel Kowald; Jörg Schaber
Journal: Nat Biotechnol Date: 2007-04 Impact factor: 54.908

4. Minimum Information About a Simulation Experiment (MIASE).

Authors: Dagmar Waltemath; Richard Adams; Daniel A Beard; Frank T Bergmann; Upinder S Bhalla; Randall Britten; Vijayalakshmi Chelliah; Michael T Cooling; Jonathan Cooper; Edmund J Crampin; Alan Garny; Stefan Hoops; Michael Hucka; Peter Hunter; Edda Klipp; Camille Laibe; Andrew K Miller; Ion Moraru; David Nickerson; Poul Nielsen; Macha Nikolski; Sven Sahle; Herbert M Sauro; Henning Schmidt; Jacky L Snoep; Dominic Tolle; Olaf Wolkenhauer; Nicolas Le Novère
Journal: PLoS Comput Biol Date: 2011-04-28 Impact factor: 4.475

5. Controlled vocabularies and semantics in systems biology.

Authors: Mélanie Courtot; Nick Juty; Christian Knüpfer; Dagmar Waltemath; Anna Zhukova; Andreas Dräger; Michel Dumontier; Andrew Finney; Martin Golebiewski; Janna Hastings; Stefan Hoops; Sarah Keating; Douglas B Kell; Samuel Kerrien; James Lawson; Allyson Lister; James Lu; Rainer Machne; Pedro Mendes; Matthew Pocock; Nicolas Rodriguez; Alice Villeger; Darren J Wilkinson; Sarala Wimalaratne; Camille Laibe; Michael Hucka; Nicolas Le Novère
Journal: Mol Syst Biol Date: 2011-10-25 Impact factor: 11.429

6. Promoting Coordinated Development of Community-Based Information Standards for Modeling in Biology: The COMBINE Initiative.

Authors: Michael Hucka; David P Nickerson; Gary D Bader; Frank T Bergmann; Jonathan Cooper; Emek Demir; Alan Garny; Martin Golebiewski; Chris J Myers; Falk Schreiber; Dagmar Waltemath; Nicolas Le Novère
Journal: Front Bioeng Biotechnol Date: 2015-02-24

7. BioModels: ten-year anniversary.

Authors: Vijayalakshmi Chelliah; Nick Juty; Ishan Ajmera; Raza Ali; Marine Dumousseau; Mihai Glont; Michael Hucka; Gaël Jalowicki; Sarah Keating; Vincent Knight-Schrijver; Audald Lloret-Villas; Kedar Nath Natarajan; Jean-Baptiste Pettit; Nicolas Rodriguez; Michael Schubert; Sarala M Wimalaratne; Yangyang Zhao; Henning Hermjakob; Nicolas Le Novère; Camille Laibe
Journal: Nucleic Acids Res Date: 2014-11-20 Impact factor: 16.971

8. The evolution of standards and data management practices in systems biology.

Authors: Natalie J Stanford; Katherine Wolstencroft; Martin Golebiewski; Renate Kania; Nick Juty; Christopher Tomlinson; Stuart Owen; Sarah Butcher; Henning Hermjakob; Nicolas Le Novère; Wolfgang Mueller; Jacky Snoep; Carole Goble
Journal: Mol Syst Biol Date: 2015-12-23 Impact factor: 11.429

9. SEEK: a systems biology data and model management platform.

Authors: Katherine Wolstencroft; Stuart Owen; Olga Krebs; Quyen Nguyen; Natalie J Stanford; Martin Golebiewski; Andreas Weidemann; Meik Bittkowski; Lihua An; David Shockley; Jacky L Snoep; Wolfgang Mueller; Carole Goble
Journal: BMC Syst Biol Date: 2015-07-11

10. The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013.

Authors: Janna Hastings; Paula de Matos; Adriano Dekker; Marcus Ennis; Bhavana Harsha; Namrata Kale; Venkatesh Muthukrishnan; Gareth Owen; Steve Turner; Mark Williams; Christoph Steinbeck
Journal: Nucleic Acids Res Date: 2012-11-24 Impact factor: 16.971
