Robert P Guralnick, Nico Cellinese, John Deck, Richard L Pyle, John Kunze, Lyubomir Penev, Ramona Walls, Gregor Hagedorn, Donat Agosti, John Wieczorek, Terry Catapano, Roderic D M Page
Abstract
Biodiversity data is being digitized and made available online at a rapidly increasing rate, but current practices typically do not preserve linkages between these data, which impedes interoperation, provenance tracking, and assembly of larger datasets. For data associated with biocollections, the biodiversity community has long recognized that an essential part of establishing and preserving linkages is to apply globally unique identifiers at the point when data are generated in the field and to persist these identifiers downstream, but this is seldom implemented in practice. There has been neither coalescence towards a single identifier solution (as in some other domains) nor even a set of recommended best practices and standards to support multiple identifier schemes sharing consistent responses. To make further progress towards broader community consensus, a group of biocollections and informatics experts assembled in Stockholm in October 2014 to discuss community next steps to overcome current roadblocks. The workshop participants divided into four groups focusing on identifier practice in current field biocollections; identifier application for legacy biocollections; identifiers as applied to biodiversity data records as they are published and made available in semantically marked-up publications; and cross-cutting identifier solutions that bridge across these domains. The main outcome was consensus on key issues, including recognition of differences between legacy and new biocollections processes, the need for identifier metadata profiles that can report information on identifier persistence missions, and the unambiguous indication of the type of object associated with the identifier. Current identifier characteristics are also summarized, and an overview of available schemes and practices is provided.
Keywords: Biocollections; GUIDs; Globally Unique Identifiers; field collections; identifiers; legacy collections; linked open data; semantic publishing
Year: 2015 PMID: 25901117 PMCID: PMC4400380 DOI: 10.3897/zookeys.494.9352
Source DB: PubMed Journal: Zookeys ISSN: 1313-2970 Impact factor: 1.546
Examples of identifiers in use for biological samples in the GBIF database.
| Identifier type | Identifier | Catalog number | Collection |
|---|---|---|---|
| LSID | urn:lsid:biosci.ohio-state.edu:osuc_occurrences:OSUC__169968 | OSUC 169968 | C.A. Triplehorn Insect Collection |
| URN | urn:occurrence:Arctos:MVZ:Bird:157675:1526959 | MVZ 157675 | MVZ Bird Collection |
| URN | urn:catalog:UMMZ:Mammals:171041 | UMMZ 71041 | UMMZ Mammal Collection |
| HTTP URI | | E00115694 | Royal Botanic Garden Edinburgh Herbarium |
| HTTP URI | | UAM 230092 | UAM Entomology Collection |
| DOI | | UAM 230092 | UAM Entomology Collection |
| UUID | EF0A4D3E-702F-4882-81B8-CA737AEB7B28 | UF 161444 | UF FLMNH Ichthyology |
| Darwin Core Triplet | MCZ:Mamm:8831 | MCZ 8831 | Museum of Comparative Zoology, Harvard University |
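Several of the identifier types above can be told apart by their syntax alone. The Python sketch below is a heuristic classifier, not a normative parser; the function names (`classify_identifier`, `parse_lsid`, `parse_dwc_triplet`) and the simplified patterns are illustrative assumptions. LSIDs are split using the `urn:lsid:<authority>:<namespace>:<object>[:<revision>]` layout, and triplets using the institutionCode:collectionCode:catalogNumber convention, assuming no component contains a colon.

```python
import re

# Canonical 8-4-4-4-12 hexadecimal UUID form (simplified: ignores braces and "urn:uuid:" prefixes).
UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def classify_identifier(value: str) -> str:
    """Return a coarse scheme label for an identifier string (heuristic only)."""
    v = value.strip()
    lower = v.lower()
    if lower.startswith("urn:lsid:"):
        return "LSID"
    if lower.startswith("urn:"):
        return "URN"
    if lower.startswith(("http://", "https://")):
        return "HTTP URI"
    if lower.startswith("doi:") or lower.startswith("10."):
        return "DOI"
    if UUID_RE.match(v):
        return "UUID"
    if v.count(":") == 2:
        # Assumption: none of the three codes contains a colon itself.
        return "Darwin Core Triplet"
    return "unknown"


def parse_lsid(value: str) -> dict:
    """Split urn:lsid:<authority>:<namespace>:<object>[:<revision>] into its parts."""
    parts = value.strip().split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {value!r}")
    return {
        "authority": parts[2],
        "namespace": parts[3],
        "object": parts[4],
        "revision": parts[5] if len(parts) == 6 else None,
    }


def parse_dwc_triplet(value: str) -> dict:
    """Split institutionCode:collectionCode:catalogNumber into Darwin Core terms."""
    parts = value.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"not a Darwin Core triplet: {value!r}")
    institution, collection, catalog = parts
    return {
        "institutionCode": institution,
        "collectionCode": collection,
        "catalogNumber": catalog,
    }
```

A bare catalog number such as E00115694 is classified as "unknown", which illustrates why catalog numbers alone cannot serve as globally unique identifiers.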
Abbreviations and their expanded forms or meanings.
| Abbreviation | Expansion or meaning |
|---|---|
| ABCD | Access to Biological Collections Data |
| ARK | Archival Resource Key |
| BCO | Biological Collections Ontology |
| DMP | Data Management Plan |
| DOI | Digital Object Identifier |
| EZID | Identifier creation and management service (issuing DOIs and ARKs) run by the California Digital Library |
| GBIF | Global Biodiversity Information Facility |
| GRBio | Global Registry of Biodiversity Repositories |
| GUID | Globally Unique Identifier |
| HTTP-URI | HTTP Uniform Resource Identifier |
| IGSN | International Geosample Number |
| LOD | Linked Open Data |
| LSID | Life Sciences Identifier |
| NEON | National Ecological Observatory Network |
| OCR | Optical Character Recognition |
| TDWG | Biodiversity Information Standards (formerly Taxonomic Databases Working Group) |
| URI | Uniform Resource Identifier |
| URL | Uniform Resource Locator |
| URN | Uniform Resource Name |
| UUID | Universally Unique Identifier |
The main characteristics of identifier schemes are listed below. The list is not meant to be exhaustive but is intended to cover the major differences among approaches.
Identifier schemes:
- support …
- provide identifiers that are …
- may require resolvers to support access to the …
- may use …
- may support …
- may come with …
- may come with …
- may come with …
- may come with administrative tools for central identifier …
Figure 1. Example of UUIDs embedded within QR-Codes on microcentrifuge tube labels. The 5 mm × 5 mm QR-Codes (Version 2) are printed with a standard laser printer on sheets of self-adhesive 9 mm dots and scan reliably with a standard barcode reader, while still providing room for a human-readable 5-character prefix + 5-digit number (the human-readable number and UUID are permanently cross-linked in the data management system). Photo: Robert K. Whitton.
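The cross-linking described in Figure 1 can be sketched in Python as a two-way lookup between a random UUID and a short label made of a 5-character prefix and a zero-padded 5-digit number. The class `LabelRegistry`, the prefix "ABCDE", and the in-memory dictionaries are hypothetical stand-ins for a real data management system; rendering the UUID as a QR-Code would need a separate library and is not shown.

```python
import uuid


class LabelRegistry:
    """Hypothetical stand-in for a data management system that permanently
    cross-links each UUID with a short human-readable label."""

    def __init__(self, prefix: str):
        if len(prefix) != 5:
            raise ValueError("prefix must be exactly 5 characters")
        self.prefix = prefix
        self._next_number = 1
        self.uuid_by_label = {}
        self.label_by_uuid = {}

    def mint(self) -> tuple:
        """Create a new random (version 4) UUID paired with the next label."""
        if self._next_number > 99999:
            raise RuntimeError("5-digit number space exhausted for this prefix")
        label = f"{self.prefix}{self._next_number:05d}"
        new_id = str(uuid.uuid4())
        self._next_number += 1
        # Store the link in both directions so either value can be looked up.
        self.uuid_by_label[label] = new_id
        self.label_by_uuid[new_id] = label
        return label, new_id


registry = LabelRegistry("ABCDE")
label, tube_uuid = registry.mint()
print(label)  # the human-readable part printed beside the QR-Code
```

The UUID carries global uniqueness while the label stays short enough for a person to read off a 9 mm dot; neither replaces the other.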
Figure 2. Example of a PURL-URI as a QR-Code, in this example attached to a digitised lichen type specimen in the Natural History Museum, University of Oslo. The QR-Code corresponds to http://purl.org/nhmuio/id/c1a8b878-a4f9-448b-be00-26cbad58b11c.
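The PURL in Figure 2 is a fixed base path followed by a UUID. A minimal Python sketch of building such a URI and recovering the UUID from it follows; the base path is copied from the figure, while the function names and the assumption that every URI under this base ends in a bare UUID are illustrative.

```python
import uuid

# Base path taken from the Figure 2 example; another institution would use its own.
PURL_BASE = "http://purl.org/nhmuio/id/"


def make_purl(specimen_id: uuid.UUID, base: str = PURL_BASE) -> str:
    """Append the canonical (lowercase, hyphenated) UUID to the PURL base."""
    return f"{base}{specimen_id}"


def uuid_from_purl(uri: str, base: str = PURL_BASE) -> uuid.UUID:
    """Recover the UUID from a PURL built by make_purl, validating it on the way."""
    if not uri.startswith(base):
        raise ValueError(f"URI does not start with {base!r}")
    # uuid.UUID raises ValueError if the remainder is not a valid UUID.
    return uuid.UUID(uri[len(base):])


specimen = uuid.UUID("c1a8b878-a4f9-448b-be00-26cbad58b11c")
print(make_purl(specimen))
```

Keeping the UUID as the last path segment means the specimen identity survives even if the redirection target behind purl.org changes.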
Identifier schemes compared by key characteristics, some of which are described in Box 2.
| Identifier characteristic | DataCite DOI | EZID ARK | OCLC PURL | Self-minted HTTP URI | LSID | DwC Triplet | UUID |
|---|---|---|---|---|---|---|---|
| | yes | yes | yes | yes | yes | no | yes |
| | yes | yes | yes | yes | yes | no | no |
| | per id or subscription fee | yearly subscription fee | free | free | free | free | free |
| | registration | registration | registration | local | local | local | local |
| | provider dependent | provider dependent | provider dependent | provider dependent | provider dependent | high | low |
| | partial | partial | partial | provider dependent | provider dependent | low | high |
| | biodiversity publishing | low | low | high | low | collections community | variable |
| | variable | low | variable | high | low | low | high |
| | yes | yes | yes | yes | yes | no | no |
| | central | central | central | distributed | distributed | N/A | N/A |
| | HTML, RDF/XML | HTML | HTML | provider dependent | yes | N/A | N/A |
| | yes | yes | yes | possible | possible | N/A | N/A |
| | yes | yes | no | no | no | N/A | N/A |
| | yes | yes | no | provider dependent | no | | |
Self-minted HTTP URIs may include ARKs or PURLs as well.
ARKs have special mechanisms to extend scalability.
Structured metadata responses may be available after redirection, depending on the provider (e.g. dublincore.org returns RDF/XML for PURLs).
Perhaps, if hosted by a general service (e.g. GRBio for biocollections, GBIF for occurrence records).
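The table separates schemes resolved through a central service (DOI, ARK, PURL) from self-minted HTTP URIs and from schemes with no resolution mechanism at all (DwC triplets, bare UUIDs). A small Python sketch of turning identifiers into actionable URLs makes that split concrete: DOIs go to the public doi.org resolver, ARKs to the n2t.net resolver, HTTP URIs pass through unchanged, and everything else is rejected. The function name, the prefix checks, and the ARK `ark:/99999/fk4test` (built on the test NAAN 99999) are hypothetical; LSIDs are deliberately not handled because they need their own resolution protocol.

```python
DOI_RESOLVER = "https://doi.org/"
ARK_RESOLVER = "https://n2t.net/"


def to_resolver_url(identifier: str) -> str:
    """Turn an identifier into an HTTP(S) URL if its scheme has a known resolver."""
    value = identifier.strip()
    lower = value.lower()
    if lower.startswith("doi:"):
        return DOI_RESOLVER + value[4:]
    if value.startswith("10."):
        # Bare DOI name such as 10.3897/zookeys.494.9352
        return DOI_RESOLVER + value
    if lower.startswith("ark:"):
        # Central resolution through the Name-to-Thing resolver
        return ARK_RESOLVER + value
    if lower.startswith(("http://", "https://")):
        # PURLs and self-minted HTTP URIs are already actionable
        return value
    raise ValueError(f"no known resolver for {identifier!r}")


print(to_resolver_url("doi:10.3897/zookeys.494.9352"))
```

A Darwin Core triplet such as MCZ:Mamm:8831 raises `ValueError` here, mirroring the N/A entries in the table.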
Figure 3. Identifier schemes differ in whether redirections and mappings to ensure stability are centrally managed or not. Top: a DOI dereferencing service like CrossRef or DataCite redirects to the actual content provider; the URIs of content data and RDF metadata are publicly visible and can be used as independent (albeit often unstable) identifiers. Bottom: a linked open data pattern, where each content provider assumes the responsibility for maintaining a stable mapping; the content negotiation is internal. Modified after Hagedorn 2013.
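In the linked open data pattern (bottom of Figure 3), the provider itself decides which representation to serve. The Python sketch below shows one common way to do this: read the HTTP `Accept` header and answer with a 303 See Other redirect to either an HTML page or an RDF/XML document. The URL templates, the example.org base, and the function names are hypothetical, and the `Accept` parsing is simplified (it honours `q` values and wildcards but ignores HTTP's specificity precedence rules).

```python
from typing import List, Optional, Tuple

# Hypothetical URL templates for the two representations a provider might offer.
REPRESENTATIONS = {
    "text/html": "{base}/html/{local_id}",
    "application/rdf+xml": "{base}/rdf/{local_id}",
}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media range, q) pairs (simplified)."""
    items = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        items.append((media, q))
    return items


def matches(media_range: str, offered: str) -> bool:
    """True if a range such as text/html, text/* or */* covers the offered type."""
    if media_range in (offered, "*/*"):
        return True
    return media_range.endswith("/*") and offered.startswith(media_range[:-1])


def negotiate(
    local_id: str, accept: str, base: str = "https://collections.example.org"
) -> Tuple[int, Optional[str]]:
    """Return (303, location) for the best acceptable representation, else (406, None)."""
    accept = accept.strip() or "*/*"  # no Accept header: anything is acceptable
    best, best_q = None, 0.0
    for media, q in parse_accept(accept):
        for offered in REPRESENTATIONS:
            # q=0 means "not acceptable", so it can never beat best_q=0.0
            if matches(media, offered) and q > best_q:
                best, best_q = offered, q
    if best is None:
        return 406, None
    return 303, REPRESENTATIONS[best].format(base=base, local_id=local_id)


print(negotiate("c1a8b878", "text/html;q=0.5, application/rdf+xml;q=0.8"))
```

Because the identifier URI itself never changes, only the provider's internal mapping has to be maintained when the HTML or RDF locations move.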
Participants in the Identifiers Workshop held October 25–26, 2014, at the Stockholm Museum of Natural History.
| Name | Institution/Organization | Email |
|---|---|---|
| Nico Cellinese | University of Florida | ncellinese@flmnh.ufl.edu |
| John Deck | University of California, Berkeley | jdeck@berkeley.edu |
| Rob Guralnick | University of Colorado, Boulder | Robert.Guralnick@colorado.edu |
| Hilmar Lapp | NESCENT, Duke University | hlapp@nescent.org |
| Michael Denslow | NEON | mdenslow@neoninc.org |
| Richard Pyle | Bishop Museum, Honolulu | deepreef@bishopmuseum.org |
| Donat Agosti | Plazi | agosti@plazi.org |
| Joan Starr | California Digital Library | Joan.Starr@ucop.edu |
| Ramona Walls | iPlant Collaborative | rwalls@iplantcollaborative.org |
| Kerstin Lehnert | IGSN | lehnert@ldeo.columbia.edu |
| Roderic Page | University of Glasgow | Roderic.Page@glasgow.ac.uk |
| Karen Cranston | NESCENT | karen.cranston@nescent.org |
| Terence Catapano | Plazi | catapanoth@gmail.com |
| John Kunze | California Digital Library | jak@ucop.edu |
| Markus Döring | GBIF | mdoering@gbif.org |
| Lyubomir Penev | Pensoft | penev@pensoft.net |
| Teodor Georgiev | Pensoft | preprint@pensoft.net |
| John Wieczorek | Museum of Vertebrate Zoology, University of California, Berkeley | tuco@berkeley.edu |
| Dag Endresen | Natural History Museum, Oslo | dag.endresen@nhm.uio.no |
| David Schindel | CBOL, Smithsonian | schindeld@si.edu |
| Greg Riccardi | Florida State University, iDigBio | griccardi@fsu.edu |
| Deb Paul | Florida State University, iDigBio | dpaul@fsu.edu |
| David Fichtmueller | Berlin Botanic Garden | d.fichtmueller@bgbm.org |
| Falko Gloeckler | Natural History Museum, Berlin | falko.gloeckler@mfn-berlin.de |
| Jana Hoffmann | Natural History Museum, Berlin | jana.hoffmann@mfn-berlin.de |
| Elspeth Haston | Royal Botanic Garden, Edinburgh | e.haston@rbge.org.uk |