Wasco Wruck, Martin Peuker, Christian R A Regenbrecht.
Abstract
Good accessibility of publicly funded research data is essential to secure an open scientific system and will eventually become mandatory [Wellcome Trust will Penalise Scientists Who Don't Embrace Open Access. The Guardian 2012]. With the use of high-throughput methods in many research areas, from physics to systems biology, large data collections are increasingly important as raw material for research. Here, we present strategies worked out by international and national institutions targeting open access to publicly funded research data via incentives or obligations to share data. Funding organizations such as the British Wellcome Trust have therefore developed data sharing policies and request a commitment to data management and sharing in grant applications. Increased citation rates are a strong argument for sharing publication data. Pre-publication sharing might be rewarded by a data citation credit system via digital object identifiers (DOIs), which were initially used for data objects. Besides policies and incentives, good practice in data management is indispensable. However, appropriate systems for data management of large-scale projects, for example in systems biology, are hard to find. Here, we give an overview of a selection of open-source data management systems that have been employed successfully in large-scale projects.
Keywords: data citation; data management; data sharing; open access; systems biology
Year: 2012 PMID: 23047157 PMCID: PMC3896927 DOI: 10.1093/bib/bbs064
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Functional requirements for data management systems in large-scale systems biology projects
| No. | Requirement | Notes |
|---|---|---|
| 1 | Support for standard data formats | MIBBI, SBML, PSI-MI, mzML, etc. |
| 2 | Assistance in metadata annotation | E.g. suggesting predefined values from ontologies |
| 3 | Automation of data collection | E.g. via harvesting from distributed repositories |
| 4 | Support for modelling data | Primarily SBML models, CellML models |
| 5 | Support for upload to public repositories | NCBI GEO, EBI ArrayExpress, BioModels, JWS-Online, etc. |
| 6 | Extension system | To new data types and functionality (e.g. plug-ins) |
| 7 | Integration with heterogeneous DM systems | Software design, interfaces, web interfaces, servlets |
| a | Fine-grained access control | Keep data private or share with dedicated users, groups or the world |
| b | Embargo periods | Withhold data from publication until predefined time points |
| c | Support for publications, data publications | Providing data for supplementary material, data publications |
| d | Support for large data | Depends on the technique, e.g. next-generation sequencing |
| e | Analysis and modelling functionality | Optional |
| f | Connectivity to relevant external resources | (Optional) via integration (data warehouses) or links |
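The access-control and embargo requirements in the table can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the mechanism of any system reviewed here: the `Asset` class, its field names, and the rule that an embargo delays only world-wide release (while owners and explicitly named users or groups keep access) are all hypothetical choices.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Iterable, Optional, Set


@dataclass
class Asset:
    """Hypothetical data asset with sharing settings (names are assumptions)."""
    owner: str
    visibility: str = "private"          # "private", "shared" or "world"
    shared_users: Set[str] = field(default_factory=set)
    shared_groups: Set[str] = field(default_factory=set)
    embargo_until: Optional[date] = None  # world release is withheld before this date


def can_read(asset: Asset, user: str, groups: Iterable[str], today: date) -> bool:
    """Return True if `user` may read `asset` on the day `today`."""
    # The owner can always read their own data.
    if user == asset.owner:
        return True
    # Explicit sharing with dedicated users or groups (assumed to be
    # unaffected by the embargo, which only delays public release).
    if asset.visibility in ("shared", "world"):
        if user in asset.shared_users or asset.shared_groups & set(groups):
            return True
    # World-readable data become visible once the embargo has passed.
    if asset.visibility == "world":
        return asset.embargo_until is None or today >= asset.embargo_until
    return False


# Usage: data shared with one group and scheduled for public release.
asset = Asset(owner="alice", visibility="world",
              shared_groups={"consortium"}, embargo_until=date(2013, 1, 1))
print(can_read(asset, "bob", ["consortium"], date(2012, 6, 1)))  # group member
print(can_read(asset, "carol", [], date(2012, 6, 1)))            # outsider, embargoed
print(can_read(asset, "carol", [], date(2013, 1, 1)))            # embargo over
```

Keeping the embargo separate from the sharing lists lets collaborators work with the data before publication while the public release date stays fixed; a system could equally choose to apply the embargo to all non-owners.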
Figure 1: SysMO-DB system chart. SysMO-DB was developed for a multi-national large-scale project consisting of multiple ‘sub’-projects, each with its own data management solution, which were left unchanged. For that purpose, the Just Enough Results Model (JERM) was introduced, which aims at finding the minimal information needed to make data comparable across project borders. JERM templates cater for compliance with MIBBI. Data from multiple projects are brought together via upload to the assets catalogue, which can be performed automatically using so-called JERM ‘harvesters’. The yellow pages component provides details about projects, participating people and institutions to enable exchange of expertise and association of assets with people. Many external resources are connected to SysMO-DB; for example, integration of JWS-Online provides systems biology modelling facilities for project data.
Figure 2: DIPSBC system chart. Data are first converted to the Solr XML format (‘normalized’) and afterwards indexed by the Solr search server. Data sets can then be found efficiently and are passed to document-type-specific objects, which initiate processes corresponding to the respective data type (here MIAME data and PubMed data are shown). New data types can be introduced straightforwardly by adapting new objects derived from existing ones, e.g. the XML object.
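The flow described in the Figure 2 caption (normalize to Solr XML, then hand each document to a type-specific object) might be sketched as follows. This is a minimal illustration rather than DIPSBC's actual code: the class names, the `doctype` field and the `process` method are hypothetical, and no Solr server is contacted. Only the `<add><doc><field name="…">` layout follows Solr's XML update format.

```python
import xml.etree.ElementTree as ET


def to_solr_xml(record: dict) -> str:
    """Normalize a flat record into a Solr XML update message."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in record.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


class XmlDocument:
    """Generic base object; new data types derive from it (hypothetical)."""
    doctype = "xml"

    def __init__(self, fields: dict):
        self.fields = fields

    def process(self) -> str:
        return f"generic view of {self.fields['id']}"


class MiameDocument(XmlDocument):
    """Handler for MIAME-style expression data (behaviour is a placeholder)."""
    doctype = "miame"

    def process(self) -> str:
        return f"expression-data view of {self.fields['id']}"


class PubMedDocument(XmlDocument):
    """Handler for literature records (behaviour is a placeholder)."""
    doctype = "pubmed"

    def process(self) -> str:
        return f"literature view of {self.fields['id']}"


# Registry mapping the assumed 'doctype' field to its handler class.
HANDLERS = {cls.doctype: cls for cls in (XmlDocument, MiameDocument, PubMedDocument)}


def dispatch(solr_xml: str) -> XmlDocument:
    """Parse one normalized document and wrap it in its type-specific object."""
    doc = ET.fromstring(solr_xml).find("doc")
    fields = {f.get("name"): f.text for f in doc.findall("field")}
    handler = HANDLERS.get(fields.get("doctype"), XmlDocument)
    return handler(fields)


# Usage with a made-up record identifier.
xml_message = to_solr_xml({"id": "rec1", "doctype": "miame"})
print(dispatch(xml_message).process())
```

In this sketch, supporting a new data type means adding one subclass and registering it; unknown types fall back to the generic XML object.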
Figure 3: Benchmarking: data management systems for large-scale systems biology projects.