Jarrett Blair, Rodger Gwiazdowski, Andrew Borrelli, Michelle Hotchkiss, Candace Park, Gleannan Perrett, Robert Hanner.
Abstract
Biodiversity informatics depends on digital access to credible information about species. Many online resources host species data, but the lack of categorisation for these resources inhibits the growth of the entire field. To explore possible solutions, we examined the (now retired) Biodiversity Information Projects of the World (BIPW) dataset created by the Biodiversity Information Standards (TDWG). This project, which ran from 2007 to 2015 and was officially removed from the TDWG website in 2018, was an attempt to organise the Web's biodiversity databases into an indexed list. We applied a simple classification scheme to score databases within BIPW on nine data categories, in order to characterise trends in, and the current composition of, this biodiversity e-infrastructure. Our main finding was that, of the 600 databases investigated from BIPW, only 315 (~53%) were accessible at the time of writing, underscoring the precarious nature of the biodiversity information landscape. Many of these databases are still available but suffer from accessibility issues such as link rot, which puts the information they contain in danger of being lost. We propose that a community-driven database of biodiversity databases, with an accompanying ontology, could facilitate efficient discovery of relevant biodiversity databases and support smaller databases, which have the greatest risk of being lost.
Keywords: Biodiversity; Database; Database of Databases; Databases; Indexing; Information Resource Discovery; Metadata; Ontology
Year: 2020 PMID: 32269475 PMCID: PMC7125240 DOI: 10.3897/BDJ.8.e32765
Source DB: PubMed Journal: Biodivers Data J ISSN: 1314-2828
Figure 1. (a) Data and metadata make up (b) datasets. Multiple datasets in one location form a (c) database. (d) Aggregators compile data from many databases, and (e) repackagers transform data in a way that makes it more accessible to all audiences (i.e. lay and professional). (f) External users (e.g. scientists, industries, government agencies) access raw data by (g) going through any or all of these data-sharing portals. This figure was adapted from Andy Bentley's presentation at the 2017 inaugural iDigBio conference (https://www.idigbio.org/wiki/images/4/4e/Natural_History_data_pipelines_-_Bentley.pdf).
Table. Categories of biodiversity information used to score the biodiversity databases, and what each score value means.
| Score 0 | Score 1 |
| --- | --- |
| Has not provided DNA sequences for species or information | Has listed DNA sequences or identifiers for individual organisms |
| Provided no dichotomous keys | Provided dichotomous keys to assist in identification of species based on physical characteristics |
| Does not have quantitative information on the number of individuals | Provides quantitative data on the number of organisms in a population or area |
| Only listed the organism's genus and species | Lists taxonomic groups higher than genus |
| Provided little or no scientific literature | Provided an extensive list of scientific literature (does not have to host the PDF) |
| No GPS coordinates or plots on a detailed and scaled map of species occurrence | Maps with occurrence plots/points showing where individuals were found at a particular point in time or, |
| Does not have a material sample or entire specimen (living or dead) | Contains material sample(s) or entire (living or dead) specimens |
| Provided little or no physical description of the organism/species | Provided qualitative physical descriptions that are unique to the species or group and aid in identification or, |
| No times or dates listed for where an organism was found to occur | Provided a date for when the observation(s) of where an organism was found were made and recorded |
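The binary scheme in the table above can be sketched as a small data structure. This is a minimal illustration rather than the authors' tooling: the category labels are hypothetical shorthand chosen here (the table's row labels are not reproduced in this text), and the example database is invented.

```python
from dataclasses import dataclass, field

# Hypothetical shorthand labels for the nine data categories; these
# names are assumptions made for illustration, not the paper's labels.
CATEGORIES = (
    "dna", "keys", "abundance", "higher_taxonomy", "literature",
    "occurrence_maps", "specimens", "descriptions", "dates",
)


@dataclass
class DatabaseScore:
    """Records which categories a database was judged to satisfy."""
    name: str
    present: set = field(default_factory=set)

    def __post_init__(self):
        # Reject labels that are not one of the nine categories.
        unknown = set(self.present) - set(CATEGORIES)
        if unknown:
            raise ValueError(f"unknown categories: {sorted(unknown)}")

    def vector(self):
        """Return the 0/1 score for each category, in CATEGORIES order."""
        return [1 if c in self.present else 0 for c in CATEGORIES]

    def total(self):
        """Number of categories scored 1 for this database."""
        return sum(self.vector())


# Invented example: a portal offering specimens, dates and higher taxonomy.
example = DatabaseScore(
    "Example Herbarium Portal",
    {"specimens", "dates", "higher_taxonomy"},
)
print(example.vector())
print(example.total())
```

Storing the satisfied categories as a set keeps each record readable, while `vector()` gives the fixed-order 0/1 form that is convenient for tallying across many databases.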
Figure 2. Relative proportions of databases from the investigated TDWG list (n=600) that were accessible, with or without information meeting our categorical criteria, or inaccessible.
Figure 3. Most recent activity of the 266 databases that were accessible and compliant, based on information provided within each database. Activity levels were categorised in yearly increments unless activity information was unavailable online (N/A).
Figure 4. Frequency with which accessible and compliant databases (n=266) simultaneously hosted information complying with a given number of data categories, according to our definitions.
Figure 5. Accessible biodiversity databases (n=266) characterised by binary scores (0 and 1), based on whether their information complied with each criterion analysed.
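The kinds of tallies summarised in Figures 4 and 5 can be derived directly from 0/1 score vectors. The following is a minimal sketch using invented scores for three imaginary databases; none of these values come from the study.

```python
from collections import Counter

# Invented 0/1 score vectors over nine categories for three imaginary
# databases; these are illustrative values, not results from the study.
scores = {
    "db_a": [1, 0, 0, 1, 1, 0, 0, 1, 0],
    "db_b": [0, 0, 0, 1, 0, 1, 1, 0, 1],
    "db_c": [0, 0, 0, 1, 0, 0, 0, 0, 0],
}

# Figure 4 style: how many databases satisfy exactly k categories.
categories_per_database = Counter(sum(v) for v in scores.values())

# Figure 5 style: how many databases scored 1 in each category position.
databases_per_category = [sum(col) for col in zip(*scores.values())]

print(dict(categories_per_database))
print(databases_per_category)
```

The first tally groups databases by how many categories they cover; the second counts, per category, how many databases scored 1.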