Rachel A Hackett, Michael W Belitz, Edward E Gilbert, Anna K Monfils.
Abstract
PREMISE: Heterogeneity of biodiversity data from the collections, research, and management communities presents challenges for data findability, accessibility, interoperability, and reusability. Workflows designed with data collection, standards, dissemination, and reuse in mind will generate better information across geopolitical, administrative, and institutional boundaries. Here, we present our data workflow as a case study of how we collected, shared, and used data from multiple sources.
Keywords: FAIR data principles; biodiversity data; natural history collections; plant diversity; research infrastructure
Year: 2019 PMID: 31890356 PMCID: PMC6923704 DOI: 10.1002/aps3.11310
Source DB: PubMed Journal: Appl Plant Sci ISSN: 2168-0450 Impact factor: 1.936
Figure 1. Generalized workflow describing the people, places, and processes involved in the transfer of data from the field to users. Dashed lines represent alternative pathways offered by some online data repositories. Descriptions and examples of each group are provided in Table 1.
Term table clarifying the terminology used in this paper (Wieczorek et al., 2012; Darwin Core Maintenance Group, 2014). Definitions are provided for terms with potential for variable interpretation.
| Term | Definition | Example |
|---|---|---|
| Biodiversity data | Data related to knowledge about individual biological organisms and the ecosystems they shape | organismal, geographical, ecological, environmental data |
| Data | Unstructured quantitative and qualitative measurements and facts that are yet to be analyzed | count, length, life stage |
| Data aggregator | A virtual entity where information can be stored, searched, and queried for distributable information. Many data aggregators are also data repositories. | GBIF, iDigBio, DataONE |
| Data collector | Person or machine that collects primary data | citizen scientists, optical character recognition, technicians |
| Data curator | Person or organization that organizes, analyzes, and disseminates data into information | researchers, collections, management agencies |
| Data repository | A virtual entity where data curators can deposit, edit, and manipulate their data | LepNet, Midwest Consortium of Herbaria |
| Data user | Person or organization that retrieves data or information from data curators, data repositories, and/or aggregators to clean, analyze, and use for their own purposes | land owners and managers, management agencies, non‐profit organizations, researchers, students |
| Event | A Darwin Core table/class tied to a location, date, and time. There can be many event records corresponding to the same location. | |
| Field | Categorization of a set of data values as a column in a table. Also referred to as an attribute, column, or term name. | locationID, eventID, eventDate, lifeStage |
| GPS data dictionary | A form created to record attribute data and measurements to accompany a record or shapefile generated by a GPS unit. Often customizable. | ArcCollector, Trimble Pathfinder data dictionary, ColectoR, iNaturalist |
| In‐house database | Location in which organized data are stored, linked, searched, and queried for distributable information | BIOTICS World Heritage Database, SQL database, Specify |
| Information | Processed, organized, structured, or presented data that are given context so they are useful for answering a question | figure, map, mean, statistical analysis |
| Location | A Darwin Core table/class tied to a geographic location | Brandt Road Fen, Site 5 |
| Natural history collection | An archived collection of preserved physical specimens | herbarium, museum |
| Occurrence | A Darwin Core table/class tied to the collection or observation of an organism or related material during an event. There can be many occurrence records corresponding to the same event and the same location. | |
| Record | Related data. Also referred to as a row. | |
| Table | A grouping of related data, with each row corresponding to one record and columns containing fields describing data values. Also referred to as a worksheet, spreadsheet, or class. | Location, Event, Occurrence, MeasurementOrFact |
Figure 2. Relational in‐house database. Each box with a black header represents a table included in the database. Each row in the boxes represents a field/attribute. Three dots (“…”) indicate that additional fields were not included for ease of viewing. Black lines represent common fields between tables that were linked for query capabilities. Darwin Core Standards were used for the table and field names (Wieczorek et al., 2012; Darwin Core Maintenance Group, 2014).
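The linking of tables through common fields can be illustrated with a minimal sketch. It assumes an in-memory SQLite database and keeps only a few of the Darwin Core fields named in this paper (`locationID`, `eventID`, `recordNumber`, `locality`, `eventDate`, `basisOfRecord`). The identifier values and dates are hypothetical placeholders, and the sketch is not the authors' actual in-house database.

```python
import sqlite3


def build_demo_db():
    """Create a tiny Location/Event/Occurrence database (hypothetical rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE location (
            locationID TEXT PRIMARY KEY,
            locality   TEXT
        );
        CREATE TABLE event (
            eventID    TEXT PRIMARY KEY,
            locationID TEXT REFERENCES location(locationID),
            eventDate  TEXT
        );
        CREATE TABLE occurrence (
            recordNumber  TEXT PRIMARY KEY,
            eventID       TEXT REFERENCES event(eventID),
            basisOfRecord TEXT
        );
        """
    )
    # One location, two events at that location, three occurrences in total.
    conn.execute("INSERT INTO location VALUES ('LOC-1', 'Brandt Road Fen')")
    conn.executemany(
        "INSERT INTO event VALUES (?, ?, ?)",
        [("EV-1", "LOC-1", "2016-06-01"), ("EV-2", "LOC-1", "2016-07-15")],
    )
    conn.executemany(
        "INSERT INTO occurrence VALUES (?, ?, ?)",
        [
            ("R-1", "EV-1", "PreservedSpecimen"),
            ("R-2", "EV-1", "HumanObservation"),
            ("R-3", "EV-2", "HumanObservation"),
        ],
    )
    return conn


def occurrences_per_location(conn):
    """Join the three tables on their shared key fields and count records."""
    return conn.execute(
        """
        SELECT l.locality,
               COUNT(DISTINCT e.eventID) AS n_events,
               COUNT(o.recordNumber)     AS n_occurrences
        FROM location AS l
        JOIN event AS e      ON e.locationID = l.locationID
        JOIN occurrence AS o ON o.eventID = e.eventID
        GROUP BY l.locationID
        """
    ).fetchall()


if __name__ == "__main__":
    print(occurrences_per_location(build_demo_db()))
```

Because each occurrence carries an `eventID` and each event carries a `locationID`, many occurrence records can map to one event and many events to one location, which is the relationship the table definitions describe.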
Figure 3. Screenshot of the Waterloo Recreation Area–Mount Hope Road Fen/Glenn Fen checklist of one prairie fen location (http://midwestherbaria.org/portal/checklists/checklist.php?pid=113&clid=4393). Specimens not collected for this project but geolocated to this location were added through the portal (e.g., Nicole Schmidt 1 [CMC]). This dynamic checklist is a “child” checklist contributing all human observations and preserved specimen plant occurrences to the Prairie Fens of Southwest Michigan (http://midwestherbaria.org/portal/checklists/checklist.php?cl=4362&pid=113), which includes species occurrences from all 29 surveyed prairie fens.
| Location | Event | Occurrence |
|---|---|---|
| locality, locationID, county, stateProvince, countryCode, decimalLatitude, decimalLongitude, minimumElevationInMeters, verbatimLocality, locationRemarks, habitat | eventID, locationID, datasetID, datasetName, eventDate, eventTime, decimalLatitude, decimalLongitude, minimumElevationInMeters, samplingProtocol, sampleSizeValue, sampleSizeUnit, samplingEffort, habitat, fieldNotes, PhotoStart*, PhotoEnd* | fieldNumber, recordNumber, eventID, locationID, datasetID, basisOfRecord, sex, fieldIdentification*, occurrenceRemarks, recordedBy, Journal*, PhotoStart*, PhotoEnd* |
| species richness; Shannon's Diversity Index; surrounding land cover proportions (eight classes, four scales); area; perimeter : area ratio; accumulated least cost path; least cost path distance; mean near distance | species richness; porewater pH; porewater temperature; depth to water table; water sample analysis results (three measurements); soil sample analysis results (10 measurements); floristic zone category†; estimate of percent cover (bare ground, water, vegetation) | (Preserved specimens and human observations) Daubenmire cover class (Daubenmire, ) |
| area; perimeter | wind speed; cloud cover; DAFOR scale density ranking of nectar sources (Rich et al., ); temperature; relative humidity | distance from observer; wing wear class; nectar sources utilized†; behavior over 10 min† |
| | | (Human observation) count of each flowering unit |
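Two of the location-level measurements listed above, species richness and Shannon's Diversity Index, can be derived directly from occurrence records. The sketch below assumes the index is computed from counts of individuals with the natural logarithm, H' = −Σ pᵢ ln pᵢ. The source does not state which abundance measure or log base was used, and the species labels are hypothetical.

```python
import math
from collections import Counter


def species_richness(observations):
    """Number of distinct taxa among the occurrence records."""
    return len(set(observations))


def shannon_index(observations):
    """Shannon's Diversity Index, H' = -sum(p_i * ln(p_i)).

    Assumption: p_i is the proportion of records belonging to taxon i,
    and the natural log is used.
    """
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


if __name__ == "__main__":
    # Hypothetical occurrence records for a single location.
    records = ["sp_A", "sp_A", "sp_B", "sp_B"]
    print(species_richness(records), shannon_index(records))
```

With two equally abundant taxa the index equals ln 2, and it falls to zero when every record belongs to a single taxon.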