Nico M Franz, Beckett W Sterner.
Abstract
Growing concerns about the quality of aggregated biodiversity data are lowering trust in large-scale data networks. Aggregators frequently respond to quality concerns by recommending that biologists work with original data providers to correct errors 'at the source.' We show that this strategy falls systematically short of a full diagnosis of the underlying causes of distrust. In particular, trust in an aggregator is not just a feature of the data signal quality provided by the sources to the aggregator, but also a consequence of the social design of the aggregation process and the resulting power balance between individual data contributors and aggregators. The latter have created an accountability gap by downplaying the authorship and significance of the taxonomic hierarchies (frequently called 'backbones') they generate, which are in effect novel classification theories that operate at the core of the data-structuring process. The Darwin Core standard for sharing occurrence records plays an under-appreciated role in maintaining the accountability gap, because this standard lacks the syntactic structure needed to preserve the taxonomic coherence of data packages submitted for aggregation, potentially leading to inferences that no individual source would support. Since high-quality data packages can mirror competing and conflicting classifications, i.e. unsettled systematic research, this plurality must be accommodated in the design of biodiversity data integration. Looking forward, a key directive is to develop new technical pathways and social incentives for experts to contribute directly to the validation of taxonomically coherent data packages as part of a greater, trustworthy aggregation process.
Year: 2018 PMID: 29315357 PMCID: PMC7206650 DOI: 10.1093/database/bax100
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Backbone-based aggregation disrupts coherent biodiversity data packages. ‘Most real’ example adapted from Franz et al. (30). The top right table presents an alignment of five different taxonomies for the Cleistes/Cleistesiopsis complex sec. Radford et al. (101), Fernald (102), USDA Plants (103), Kartesz (104) and Weakley (94). Columns indicate the relative congruence between different taxonomic concepts, whereas rows show the period of usage, validly recognized names and sources. (A–E) Five representations of the same set of 20 specimens provided by the SERNEC Data Portal (93), with distribution maps that identify four ecoregions R1–R4 (right) and tables displaying the ecoregion-specific presence (+), absence (–) or inapplicability (o, i.e. name not available) of occurrences identified to taxonomic concept labels. (A–C) Concept occurrence patterns according to three reciprocally incongruent, yet internally coherent taxonomies; (D) raw (unprocessed) aggregate of (A–C), where each source contributes a complementary subset (data package) of the 20 specimens, hence six taxonomic names are shown; and (E) backbone-based transformation of (D). Both (D) and (E) support new biological inferences (red circles) regarding the sympatry of multiple entities of the complex in ecoregions R1 and R4 (= false positives) and the local endemism of an entity labeled bifaria in R2 (= false negative). Such inferences become possible when pro parte synonymy relationships are not coherently transposed in the backbone-based synthesis.
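The mechanism behind panels (D) and (E) can be illustrated with a toy Python sketch. It is not the authors' method or any aggregator's actual pipeline, and every name in it is hypothetical: the taxon names alpha/beta, the sources S1/S2, the records, the ecoregion assignments, the BACKBONE lookup and the ALIGNMENT table are invented for illustration. The "backbone" is deliberately reduced to a lookup on bare name strings that ignores the source (sec.) of each identification, while the concept-aware variant transposes each "name sec. source" label through an explicit alignment and keeps pro parte cases unresolved.

```python
"""Toy sketch with hypothetical data: a name-string backbone versus
concept-aware transposition when one source uses a pro parte synonym."""

from collections import defaultdict

# Hypothetical occurrence records: (specimen_id, ecoregion, name, source).
# The pair (name, source) stands in for a concept label "name sec. source".
RECORDS = [
    ("s1", "R1", "alpha", "S1"),  # S1 recognizes one broad concept, alpha
    ("s2", "R3", "alpha", "S1"),
    ("s3", "R1", "alpha", "S2"),  # S2 splits the broad concept into alpha and beta
    ("s4", "R2", "beta", "S2"),
]

# Hypothetical backbone: resolves bare name strings, ignoring the source.
BACKBONE = {"alpha": "alpha", "beta": "beta"}

# Hypothetical concept alignment: alpha sec. S1 is congruent with the union of
# alpha sec. S2 and beta sec. S2, i.e. a pro parte relationship.
ALIGNMENT = {
    ("alpha", "S1"): {("alpha", "S2"), ("beta", "S2")},
    ("alpha", "S2"): {("alpha", "S2")},
    ("beta", "S2"): {("beta", "S2")},
}


def backbone_presence(records, backbone):
    """Map each record to a single accepted name by name string alone."""
    presence = defaultdict(set)
    for _specimen, region, name, _source in records:
        presence[backbone[name]].add(region)
    return dict(presence)


def concept_aware_presence(records, alignment):
    """Transpose each record's concept via the alignment. A concept that maps
    to several target concepts is reported as an unresolved alternative
    rather than being forced onto one name."""
    definite = defaultdict(set)
    unresolved = defaultdict(set)
    for _specimen, region, name, source in records:
        targets = alignment[(name, source)]
        if len(targets) == 1:
            target_name, _target_source = next(iter(targets))
            definite[target_name].add(region)
        else:
            label = " or ".join(sorted(t_name for t_name, _ in targets))
            unresolved[label].add(region)
    return dict(definite), dict(unresolved)


if __name__ == "__main__":
    print("backbone:", backbone_presence(RECORDS, BACKBONE))
    print("concept-aware:", concept_aware_presence(RECORDS, ALIGNMENT))
```

Under the name-string backbone, alpha is reported from R1 and R3 and beta only from R2, although no source identifies the narrow alpha in R3 and the apparent restriction of beta to R2 may be an artifact. The concept-aware variant asserts alpha only in R1 and beta only in R2, and carries the two S1 records forward as "alpha or beta" in R1 and R3, so it makes no inference that the sources do not jointly support.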