Max F Czapanskiy, Roxanne S Beltran.
Abstract
What new questions could ecophysiologists answer if physio-logging research were fully reproducible? We argue that technical debt (computational hurdles resulting from prioritizing short-term goals over long-term sustainability) stemming from insufficient cyberinfrastructure (field-wide tools, standards, and norms for analyzing and sharing data) has trapped physio-logging in a scientific silo. This debt stifles comparative biological analyses and impedes interdisciplinary research. Although physio-loggers (e.g., heart rate monitors and accelerometers) opened new avenues of research, the explosion of complex datasets exceeded ecophysiology's informatics capacity. Like many other scientific fields facing a deluge of complex data, ecophysiologists now struggle to share their data and tools. Adapting to this new era requires a change in mindset, from "data as a noun" (e.g., traits, counts) to "data as a sentence", where measurements (nouns) are associated with transformations (verbs), parameters (adverbs), and metadata (adjectives). Computational reproducibility provides a framework for capturing the entire sentence. Though usually framed in terms of scientific integrity, reproducibility offers immediate benefits by promoting collaboration between individuals, groups, and entire fields. Rather than a tax on our productivity that benefits some nebulous greater good, reproducibility can accelerate the pace of discovery by removing obstacles and inviting a greater diversity of perspectives to advance science and society. In this article, we 1) describe the computational challenges facing physio-logging scientists and connect them to the concepts of technical debt and cyberinfrastructure, 2) demonstrate how other scientific fields overcame similar challenges by embracing computational reproducibility, and 3) present a framework to promote computational reproducibility in physio-logging, and bio-logging more generally.
Keywords: bio-logging; cyberinfrastructure; ecoinformatics; ecophysiology; technical debt
Year: 2022 PMID: 35874548 PMCID: PMC9304648 DOI: 10.3389/fphys.2022.917976
Source DB: PubMed Journal: Front Physiol ISSN: 1664-042X Impact factor: 4.755
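The "data as a sentence" idea from the abstract can be made concrete with a small data structure. The sketch below is illustrative only: it is written in Python rather than R, and the names `Sentence`, `Step`, `apply`, `describe`, and `moving_average`, as well as the example heart-rate values and metadata, are assumptions introduced here, not part of the article or of any existing package. It shows measurements (nouns) travelling together with their metadata (adjectives) and a record of every transformation (verb) and its parameters (adverbs).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    """One transformation (verb) and the parameters (adverbs) it was run with."""
    verb: Callable
    adverbs: dict


@dataclass
class Sentence:
    """Measurements bundled with metadata and processing history (hypothetical)."""
    nouns: list                                 # the measurements themselves
    adjectives: dict                            # metadata describing the measurements
    steps: list = field(default_factory=list)   # ordered record of transformations

    def apply(self, verb, **adverbs):
        """Run a transformation and return a new Sentence that remembers it."""
        result = verb(self.nouns, **adverbs)
        return Sentence(result, self.adjectives, self.steps + [Step(verb, adverbs)])

    def describe(self):
        """Render the processing history as human-readable strings."""
        return [
            f"{s.verb.__name__}("
            + ", ".join(f"{k}={v!r}" for k, v in sorted(s.adverbs.items()))
            + ")"
            for s in self.steps
        ]


def moving_average(values, window):
    """Trailing moving average; the first window - 1 samples are dropped."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Made-up heart-rate samples, for illustration only.
heart_rate = Sentence(
    nouns=[60.0, 62.0, 64.0, 66.0],
    adjectives={"units": "beats/min", "sensor": "hypothetical ECG tag"},
)
smoothed = heart_rate.apply(moving_average, window=2)
print(smoothed.nouns)
print(smoothed.describe())
```

Because `apply` returns a new object instead of modifying the original, the raw measurements stay untouched and each derived dataset carries its own complete history.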
Glossary of terms.
| Term | Definition | References |
|---|---|---|
| Cyberinfrastructure | The collective interface between data collection and data analysis for a scientific field, including software, hardware, personnel, and shared practices | |
| Technical debt | Short-term, sub-optimal choices in data and code that hamper future development without refactoring, such as missing documentation and bug-prone code | |
| Heterogeneous data | Combinations of data collected at different temporal scales and/or with different properties, for example multivariate time series (e.g., acceleration) with intermittent geospatial locations (e.g., GPS) | |
| Literate programming | A programming technique that combines code itself with descriptive text and outputs (figures, tables). R Markdown is an implementation of literate programming | |
| Data provenance | A record of the origin and processing steps that produced the data | |
FIGURE 1. The proposed biologr R package will provide physio-logging cyberinfrastructure. (A) From raw data to standardized, reproducible data. Import and export functions ensure data standard compliance, e.g., file formats and directory structures. A validate function automatically verifies that the computational workflow (see (B)) reproducibly generates the processed data. (B) biologr provides an R Markdown template (see section Introducing biologr) for recording the workflow. The R Markdown knit command creates a report documenting data processing and interpretation, i.e., data provenance.
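One way to picture the validation step described in the caption is to re-run the recorded workflow on the raw data and compare a fingerprint of the result with a fingerprint of the stored processed data. The sketch below is a minimal illustration under stated assumptions: it is written in Python, not R, it is not the biologr API, and the names `fingerprint`, `validate`, and `workflow`, the record layout, and the calibration factor of 1024 counts per g are all hypothetical.

```python
import hashlib
import json


def fingerprint(records):
    """SHA-256 digest of a canonical JSON encoding of the records."""
    payload = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def validate(raw, workflow, processed):
    """Return True if re-running the workflow on raw data reproduces processed."""
    return fingerprint(workflow(raw)) == fingerprint(processed)


def workflow(raw):
    """Hypothetical processing step: convert accelerometer counts to g."""
    counts_per_g = 1024  # assumed calibration factor, for illustration only
    return [
        {"t": r["t"], "accel_g": round(r["counts"] / counts_per_g, 6)}
        for r in raw
    ]


# Made-up raw records and the processed data derived from them.
raw = [{"t": 0, "counts": 1024}, {"t": 1, "counts": 512}]
processed = workflow(raw)

ok = validate(raw, workflow, processed)

# Simulate processed data that no longer matches the recorded workflow.
tampered = [dict(processed[0], accel_g=9.9)] + processed[1:]
bad = validate(raw, workflow, tampered)

print(ok, bad)
```

Comparing digests rather than full datasets keeps the check cheap to store and share. Rounding inside the workflow is a deliberate choice here, since unrounded floating-point results can differ slightly across platforms and would cause spurious mismatches.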