Leah B Honor¹, Christian Haselgrove², Jean A Frazier², David N Kennedy².
Abstract
Data sharing and reuse, while widely accepted as good ideas, have been slow to catch on in any concrete and consistent way. One major hurdle within the scientific community has been the lack of widely accepted standards for citing data, making it difficult to track usage and measure impact. Within the neuroimaging community, there is a need not only to clearly identify and cite datasets, but also to derive new aggregate sets from multiple sources while clearly maintaining lines of attribution. This work presents a functional prototype of a system to integrate Digital Object Identifiers (DOI) and a standardized metadata schema into an XNAT-based repository workflow, allowing for identification of data at both the project and image level. These item- and source-level identifiers allow any newly defined combination of images, from any number of projects, to be tagged with a new group-level DOI that automatically inherits the individual attributes and provenance information of its constituent parts. This system enables the tracking of data reuse down to the level of individual images. The implementation of this type of data identification system would impact researchers and data creators, data hosting facilities, and data publishers, but the benefit of having widely accepted standards for data identification and attribution would go far toward making data citation practical and advantageous.
Keywords: credit; data attribution; data citation; data repository; data sharing
Year: 2016 PMID: 27570508 PMCID: PMC4981598 DOI: 10.3389/fninf.2016.00034
Source DB: PubMed Journal: Front Neuroinform ISSN: 1662-5196 Impact factor: 4.081
Figure 1. Overview and workflow. Bottom: studies, subjects, and observations (images). Studies A and B are projects that released data; they contain subjects and images. Study C is a functional study, made up of images from existing studies. In this case, Study C is made up of images from Study A and Study B (the component images are represented by RelatedIdentifier fields with relationType = HasPart in the DataCite schema, and the source projects are represented by RelatedIdentifier fields with relationType = IsDerivedFrom). As a new and useful collection of images, Study C can be assigned a DOI. Top: the overall flow for data selection and tagging. Data for a novel study is selected by searching existing studies (Study A and B). This search can be refined at a fine-grained level: individual images can be added to or removed from the search results to create an arbitrary collection of images. The resulting collection is then tagged with a DOI and can be referenced from a publication that uses it. Existing projects (i.e., Study A and B) with constituent images can be queried and grouped into new collections (i.e., Study C) that retain attribution to the collection members.
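The Study C construction in Figure 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' XNAT implementation: the `Image` record, the `refine` and `collection_metadata` helpers, and the dict layout (loosely modeled on DataCite RelatedIdentifier fields) are all hypothetical. It refines a search result by adding or removing individual images, then emits one `HasPart` entry per component image, one `IsDerivedFrom` entry per distinct source project, and a de-duplicated list of inherited creators.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """Hypothetical image record: its own DOI plus its source project."""
    doi: str
    project_doi: str
    creators: tuple[str, ...]  # original creators of the source project


def refine(results, add=(), remove=()):
    """Start from search results, then drop or add individual images."""
    drop = {img.doi for img in remove}
    selected = [img for img in results if img.doi not in drop]
    for img in add:
        if img not in selected:
            selected.append(img)
    return selected


def collection_metadata(images):
    """Derive related identifiers and inherited creators for a collection."""
    has_part = sorted({img.doi for img in images})                # one per image
    derived_from = sorted({img.project_doi for img in images})    # one per project
    related = [
        {"relatedIdentifier": d, "relatedIdentifierType": "DOI", "relationType": "HasPart"}
        for d in has_part
    ] + [
        {"relatedIdentifier": d, "relatedIdentifierType": "DOI", "relationType": "IsDerivedFrom"}
        for d in derived_from
    ]
    # Inherit creators from every source, keeping first-seen order.
    creators = list(dict.fromkeys(c for img in images for c in img.creators))
    return {"relatedIdentifiers": related, "creators": creators}
```

Using sets for the identifiers means two images from the same project still produce a single `IsDerivedFrom` entry for that project.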
Fields and subfields defined by the DataCite Metadata Schema V 3.1, with their field-level requirements.
| M | The identifier is a unique string that identifies a resource | |||||
| M | Original creators | Original creators | Creators of data included in collection | |||
| 2.1 | The main researchers involved in producing the data, or the authors of the publication, in priority order. | |||||
| 2.2 | ORCID identifier | |||||
| 2.3 | URI of identifier scheme used | |||||
| 2.4 | Affiliations as of the time of data creation/paper publication | |||||
| M | A name or title by which a resource is known | |||||
| M | The name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource. This property will be used to formulate the citation, so consider the prominence of the role | |||||
| M | The year when the data was or will be made publicly available | Year functional DOI assigned | ||||
| R | Subject, keyword, classification code, or key phrase describing the resource. Free text | Can also include the search terms/query used to produce the collection | ||||
| 6.1 | e.g., MeSH terms from the publication | |||||
| 6.2 | URI for source of scheme | |||||
| R | The institution or person responsible for collecting, managing, distributing, or otherwise contributing to the development of the resource | Creator of the collection | ||||
| 7.1 | Contact person, funder, etc. | |||||
| 7.2 | ||||||
| 7.3 | ||||||
| R | Different dates relevant to the work. | |||||
| 8.1 | Accepted, Available, Copyrighted, Collected, Created, Issued, Submitted, Updated, Valid, etc. | |||||
| O | ||||||
| R | A description of the resource | |||||
| 10.1 | e.g., Dataset/Imaging Data | Image/Structural MRI Image | Dataset/Imaging Data | Dataset/Imaging Data | ||
| O | An identifier or identifiers other than the primary Identifier applied to the resource being registered | |||||
| 11.1 | The type of the AlternateIdentifier | |||||
| R | Identifiers of related resources. These must be globally unique identifiers | Can be repeated if the same data collection is used by multiple publications; papers should include a DOI/reference as part of the data description | ||||
| 12.1 | DOI, PMID, etc. | |||||
| 12.2 | ||||||
| O | Unstructured size information about the resource | |||||
| O | Technical format of the resource | |||||
| O | The version number of the resource | |||||
| O | Any rights information for the resource | Inherits the rights of all source data; list all of them | ||||
| 16.1 | The URI of the license | |||||
| R | All additional information that does not fit in any of the other categories. May be used for technical information | |||||
| 17.1 | Abstract, Methods, SeriesInformation, TableOfContents, Other | |||||
| R | Spatial region or named place where the data was gathered or about which the data is focused | Need to consider protected health information (PHI) implications | ||||
| 18.1 | ||||||
| 18.2 | ||||||
| 18.3 | ||||||
M, Mandatory; R, Recommended; O, Optional.
Figure 2. Reference implementation of a DOI-enabled image database. The image database in this implementation is an XNAT instance customized to store DOIs for projects and image data. DOIs link (via dx.doi.org) to landing pages for the objects, which in turn link back to the image database. DOIs are generated when data are uploaded into the database. This example shows the image (upper left) in its database representation (including its DOI, arrow); the Image Landing Page (lower left), which the image DOI references and from which one can download the image (solid arrow) and find the project citation for the image (dashed arrow); and the Project Landing Page (right) associated with this image. Landing pages for projects and images involve considerations similar to those for image-collection landing pages. While projects and images do not need bookkeeping as intricate as that for image collections, additional semantics must still be applied. Images, for instance, use relatedIdentifier with relationType “IsPartOf” for both projects and image collections, so the landing page must keep track of which is which.
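Because DataCite uses the same relationType for both kinds of parent, the landing page needs information from outside the record to tell them apart. A minimal Python sketch, assuming the repository keeps a set of known project DOIs; the helper name and the dict keys are hypothetical:

```python
def split_is_part_of(related_identifiers, project_dois):
    """Partition an image's IsPartOf targets into projects and collections.

    `project_dois` is an assumed repository-side set of project DOIs;
    any IsPartOf target not in it is treated as an image collection.
    """
    projects, collections = [], []
    for rel in related_identifiers:
        if rel["relationType"] != "IsPartOf":
            continue  # other relation types are not parents of the image
        target = rel["relatedIdentifier"]
        (projects if target in project_dois else collections).append(target)
    return projects, collections
```

The landing page can then show the project citation from `projects` and list every collection the image belongs to from `collections`.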
Figure 3. Generation of functional-level collections. The sequence of steps is as follows. Search form: the search form mirrors the metadata descriptors of the image database. Examination of search results: the results page allows finer control over which results go into the final collection. Image collection tag form: once a precise dataset has been selected, it can be tagged with a DOI; the proof-of-concept implementation can create test DOIs for demonstration purposes. Functional-level image collection landing page: the resulting image collection, with user-specified metadata about the collection. If the same collection is arrived at by another route (e.g., if someone investigating a different question happens to work with the same collection of data), a new DOI is not assigned; instead, the existing DOI is amended. As described in the text, the structure of the landing page does not mirror the underlying DataCite representation: a second use of this functional collection would result in another row in the top table, whereas the DataCite schema does not accommodate grouping elements in this way.
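One way to detect that a collection has been arrived at by another route is to key collections on the unordered set of their member image DOIs. The registry below is a hypothetical Python sketch, not part of XNAT or DataCite: `10.5072` is used only as a placeholder test prefix, and the per-collection list of citing works stands in for the extra rows of the landing-page table.

```python
import itertools


class CollectionRegistry:
    """Hypothetical registry that mints one DOI per distinct image set."""

    def __init__(self, prefix="10.5072"):
        self.prefix = prefix            # placeholder prefix, not a real allocation
        self._by_members = {}           # frozenset of image DOIs -> collection DOI
        self.uses = {}                  # collection DOI -> list of citing works
        self._counter = itertools.count(1)

    def tag(self, image_dois, citing_work):
        """Return the collection DOI, minting one only for a new image set."""
        key = frozenset(image_dois)     # selection order does not matter
        doi = self._by_members.get(key)
        if doi is None:
            doi = f"{self.prefix}/collection.{next(self._counter)}"
            self._by_members[key] = doi
            self.uses[doi] = []
        # Amend the existing record: one more use of the same collection.
        self.uses[doi].append(citing_work)
        return doi
```

Keying on a `frozenset` makes two selections of the same images equal regardless of search path or order, which is exactly the "same collection" condition in the caption; any difference in membership yields a new DOI.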
Landing page comparison.
| Zuo et al., | 10.1038/sdata.2014.49 | Includes 31 separate data DOIs | |||||
| 10.15387/fcp_indi.corr.jhnu1 (Supplementary Figure | ✓ | ? | |||||
| 10.15387/fcp_indi.corr.ipcas4 (Supplementary Figure | ✓ | ? | ? | ||||
| 10.15387/fcp_indi.corr.uwm1 (Supplementary Figure | ✓ | ? | |||||
| Hanke et al., | 10.1038/sdata.2014.3 | ✓ | ? | ||||
| Watson et al., | 10.1016/j.dib.2016.03.100 | Data is provided as a supplementary table and a download zip file, with no clear landing page (Supplementary Figure | ✓ | ? |
Are “Principal Investigators” synonymous with authors?
Are individuals listed in the “Acknowledgements” synonymous with authors?
Are the data authors the same as the data article authors?