Terminology supported archiving and publication of environmental science data in PANGAEA.

Michael Diepenbroek, Uwe Schindler, Robert Huber, Stéphane Pesant, Markus Stocker, Janine Felden, Melanie Buss, Matthias Weinrebe.

Abstract

Using the information system PANGAEA as an example, we describe the application of terminologies for archiving and publishing environmental science data. A terminology catalogue (TC) was embedded into the system, with interfaces for replicating terminologies and editing them manually. For data ingest and archiving, we show how the TC can improve the structuring and harmonization of lineage and content descriptions of data sets. The key is the conceptualization of measurement and observation types (parameters) and methods, for which we have implemented a basic syntax and rule set. For data access and dissemination, we have improved the findability of data by enriching metadata with TC terms. Semantic annotations, e.g. adding term concepts (including synonyms and hierarchies) or mapped terms from different terminologies, facilitate comprehensive data retrieval. The PANGAEA thesaurus of classifying terms, which is part of the TC, is used as an umbrella vocabulary that links the various domains and allows drill downs and side drills with various facets. Furthermore, we describe how TC terms can be linked to nominal data values. This improves data harmonization and facilitates the structural transformation of heterogeneous data sets to a common schema. Technical developments are complemented by work on the metadata content. Over the last 20 years, more than 100 new parameters have been defined per week on average. Recently, PANGAEA has increasingly been submitting new terms to various terminology services. Matching terms from terminology services with our parameter or method strings is supported programmatically. However, the process ultimately needs manual input from domain experts. The quality of terminology services is an additional limiting factor, and it varies with respect to content, editorial practice, interoperability, and sustainability. Good-quality terminology services are the building blocks for the conceptualization of parameters and methods. In our view, they are essential for data interoperability and arguably the most difficult hurdle for data integration. In summary, the application of terminologies has a mutually positive effect on terminology services and information systems such as PANGAEA. On both sides, the application of terminologies improves content, reliability and interoperability.
Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

Keywords:  Data findability; Data interoperability; Data publishing; Semantics; Terminologies

Year:  2017        PMID: 28743591     DOI: 10.1016/j.jbiotec.2017.07.016

Source DB:  PubMed          Journal:  J Biotechnol        ISSN: 0168-1656            Impact factor:   3.307


  1 in total

1.  The de.NBI / ELIXIR-DE training platform - Bioinformatics training in Germany and across Europe within ELIXIR.

Authors:  Daniel Wibberg; Bérénice Batut; Peter Belmann; Jochen Blom; Frank Oliver Glöckner; Björn Grüning; Nils Hoffmann; Nils Kleinbölting; René Rahn; Maja Rey; Uwe Scholz; Malvika Sharan; Andreas Tauch; Ulrike Trojahn; Björn Usadel; Oliver Kohlbacher
Journal:  F1000Res       Date:  2019-11-07
