C J Lortie, Camila Vargas Poulsen, Julien Brun, Li Kui.
Abstract
Data support knowledge development and theory advances in ecology and evolution. We are increasingly reusing data within our teams and projects and through the global, openly archived datasets of others. Metadata can be challenging to write and interpret, but it is always crucial for reuse. The value of metadata cannot be overstated, even as a relatively independent research object, because it describes the work that has been done in a structured format. We advance a new perspective and classify methods for metadata curation and development with tables. Tables with templates can be effectively used to capture all components of an experiment or project in a single, easy-to-read file familiar to most scientists. If coupled with the R programming language, metadata from tables can then be rapidly and reproducibly converted to publication formats, including extensible markup language files suitable for data repositories. Tables can also be used to summarize existing metadata and store metadata across many datasets. A case study is provided, and the added benefits of tables for metadata, a priori, are developed to ensure a more streamlined publishing process for many data repositories used in ecology, evolution, and the environmental sciences. In ecology and evolution, researchers are often highly tabular thinkers from experimental data collection in the lab and/or field, and representations of metadata as a table will provide novel research and reuse insights.
Keywords: R programming language; data; metadata; open science; tables; template; workflows
Year: 2022 PMID: 36035265 PMCID: PMC9405493 DOI: 10.1002/ece3.9245
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 3.167
A list of representative R packages associated with ecological metadata language (EML) standards and tabular thinking framework.
| Tool | Description | Primary use | Functions | Tabular strategy | Source |
|---|---|---|---|---|---|
| EMLassemblyline | R package to create EML metadata for dataset publication | Generate metadata | Generate EML using content from a single‐use input or shiny interface | Tables to EML: input information provided in Word document tables into an R script to create a formatted EML (XML extension) | |
| Excel‐to‐EML | R package to enter metadata from an MS Excel template | Generate metadata | Three R functions support reading information from Excel into metadata for EDI data repository publication | Tables to EML: Excel tables store and organize metadata for subsequent EML creation | |
| LTER‐core‐metabase | PostgreSQL‐based relational database model designed for the management of ecological metadata | Generate metadata | Metadata database used to store attributes | Tables to organize metadata: database used to store and organize metadata in tabular format | |
| MetaEgress | R package to create Ecological Metadata Language (EML) standard metadata documents from an installed and populated LTER‐core‐metabase | Generate metadata | The main functions query metadata from LTER‐core‐metabase, insert the information into the appropriate EML slots, and output an R list structured according to the EML standard | Tables to EML: read metadata from the LTER‐core‐metabase output tables and generate EML file | |
| metajam | Resources to download specific datasets and their associated metadata from DataONE | Reuse metadata | Download data using R including the associated metadata | EML to tables: serialize EML into R dataframes | |
| pkEML | R package to convert Ecological Metadata Language documents to tables to help data managers migrate their metadata archives. Can also be used for meta‐analysis | Manage and reuse metadata | Functions tabularize existing EML and support project work | EML to tables: extract contents from corresponding XML nodes and populate R dataframes | |
Note: Offerings available on both CRAN and GitHub are listed. The tool is the formal package name, and the description is a short statement of the goal for the resource. The primary use column lists the main purpose of the resource. The functions column describes some of the utilities of each resource in working with metadata and tables at some point in a workflow of publishing or interacting with published datasets in ecology and evolution. The tabular strategy column underscores the three strategies evident to date: tables of metadata to EML, EML to tables, and tables to organize existing metadata. The source column provides the most current location for installation of each specific resource.
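The first two tabular strategies (tables to EML, and EML back to tables) can be illustrated with a minimal sketch. It is written in Python with only the standard library rather than with any of the R packages listed above, and it does not reproduce their APIs. The table contents, the function names `table_to_xml` and `xml_to_table`, and the flat `unit` element are hypothetical choices for illustration; the output is EML-like XML and is not schema-valid EML.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical attribute-level metadata table: one row per data column.
ATTRIBUTE_TABLE = """attributeName,attributeDefinition,unit
site,Name of the sampling site,
plant_height,Height of the focal plant,centimeter
"""


def table_to_xml(table_text):
    """Tables to XML: turn each table row into an <attribute> element."""
    attribute_list = ET.Element("attributeList")
    for row in csv.DictReader(io.StringIO(table_text)):
        attribute = ET.SubElement(attribute_list, "attribute")
        for field, value in row.items():
            if value:  # skip empty cells rather than emit empty elements
                ET.SubElement(attribute, field).text = value
    return attribute_list


def xml_to_table(attribute_list):
    """XML to tables: flatten each <attribute> element into a dict row."""
    return [
        {child.tag: child.text for child in attribute}
        for attribute in attribute_list
    ]


# Round trip: table -> XML text -> table rows.
xml_text = ET.tostring(table_to_xml(ATTRIBUTE_TABLE), encoding="unicode")
rows = xml_to_table(ET.fromstring(xml_text))
```

Because empty cells are skipped on the way out, the row for `site` comes back without a `unit` key. A real workflow would more likely validate required fields in the table before conversion and rely on a dedicated EML library to produce schema-valid output.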
FIGURE 1. A general workflow for using tables to support better metadata in ecology and evolution. Three high‐level steps are proposed as a simple heuristic. Each step includes details to consider for the general process of developing well‐articulated structured metadata for publication in a data repository. The open‐source programming language R is listed in the details alongside ecological metadata as examples. However, other tools and standards can be sourced at each step depending on the data and methods. The final step, share and review, will support iteration of the process for others if the metadata tables are published openly.