Iain Hrynaszkiewicz¹, Varsha Khodiyar², Andrew L Hufton², Susanna-Assunta Sansone²,³
Abstract
Sharing of experimental clinical research data usually happens between individuals or research groups rather than via public repositories, in part due to the need to protect research participant privacy. This approach to data sharing makes it difficult to connect journal articles with their underlying datasets and is often insufficient for ensuring access to data in the long term. Voluntary data sharing services such as the Yale Open Data Access (YODA) and Clinical Study Data Request (CSDR) projects have increased access to clinical datasets for secondary uses while protecting patient privacy and the legitimacy of secondary analyses, but these resources are generally disconnected from journal articles, which is where researchers typically search for reliable information to inform future research. New scholarly journal and article types dedicated to increasing the accessibility of research data have emerged in recent years and, in general, journals are developing stronger links with data repositories. There is a need for increased collaboration between journals, data repositories, researchers, funders, and voluntary data sharing services to increase the visibility and reliability of clinical research. Using the journal Scientific Data as a case study, we propose and show examples of changes to the format and peer-review process for journal articles to more robustly link them to data that are only available on request. We also propose additional features for data repositories to better accommodate non-public clinical datasets, including Data Use Agreements (DUAs).
Keywords: Data Journal; Data Repository; Journal Article; Open Access Journal; Scientific Data
Year: 2016 PMID: 29451541 PMCID: PMC5793987 DOI: 10.1186/s41073-016-0015-6
Source DB: PubMed Journal: Res Integr Peer Rev ISSN: 2058-8615
Advantages and disadvantages of alternative approaches to data sharing
| Approach | Description | Advantages | Disadvantages | Link/example |
|---|---|---|---|---|
| The ‘Beacon’ model | • A common web service allows researchers to discover data relevant to their research without the data holder storing the data outside the host institution | • Comparatively easy to implement<br>• Can improve discoverability of clinical datasets which cannot be openly shared | All those of share-on-request systems, including:<br>• Lack of data preservation guarantees<br>• No independent governance of data requests<br>• No common system for citing datasets | Being piloted by Global Alliance for Genomics and Health for genomics data |
| The ‘Federation’ model | • Separate, locally controlled data resources share a common index and data transfer protocols | • Improved data preservation over the Beacon model<br>• Easier for institutions and ethical committees to accept because data holder does not give up control of the data to an independent repository<br>• Linking with the literature possible if stable data identifiers are used across the whole network | • Data preservation relies on multiple partner nodes<br>• No independent governance of data requests<br>• Terms for anonymous peer review of data, if permitted, would likely need to be negotiated with each node independently | Global Alzheimer’s Association Interactive Network (GAAIN) [ |
| The ‘Iron-safe’ model | • Data stored in a hardened, centralised resource and analysis conducted within the confines of the system<br>• Data export from the system is limited and tightly controlled | • Appropriate for highly sensitive data collected in the course of clinical care<br>• A centralised resource can, in principle, provide an independent system for vetting and providing access, helping avoid the creep of biasing access requirements like co-authorship | • Access barriers may be prohibitive<br>• Anonymous peer review of data generally impossible<br>• Difficult to link data with literature in a robust manner, if the index of the data resource is also protected | Planned for 100,000 English Genomes system (<br>Also, similar to the Clinical Study Data Request (CSDR) model |
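To make the discovery step of the Beacon model concrete, here is a minimal Python sketch. It is not the Global Alliance for Genomics and Health's Beacon API, which has its own specification. Each hypothetical `BeaconNode` keeps its records in-house and answers only a yes/no question ("do you hold a record matching this variant?"), and a hypothetical `federated_query` asks every node the same question through one shared interface, loosely mirroring the common index of the Federation model. The `Variant` fields and node names are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant key; real Beacon queries carry richer fields."""
    chromosome: str
    position: int
    alternate: str


@dataclass
class BeaconNode:
    """A data holder that keeps records locally and reports only presence/absence."""
    name: str
    _records: set = field(default_factory=set, repr=False)

    def add_record(self, variant: Variant) -> None:
        # Records stay inside the host institution; they are never returned.
        self._records.add(variant)

    def query(self, variant: Variant) -> bool:
        # The only information leaving the node is a yes/no answer.
        return variant in self._records


def federated_query(nodes: list[BeaconNode], variant: Variant) -> list[str]:
    """Send the same question to every node and list those answering yes,
    so a researcher knows which data holder to contact with a request."""
    return [node.name for node in nodes if node.query(variant)]


if __name__ == "__main__":
    site_a = BeaconNode("site_a")
    site_b = BeaconNode("site_b")
    v = Variant("17", 41_245_466, "A")
    site_a.add_record(v)
    print(federated_query([site_a, site_b], v))  # ['site_a']
```

The sketch shows the trade-off listed in the table: discoverability improves without data leaving the host, but nothing here preserves the data, governs access requests, or gives a dataset a citable identifier.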
Data repositories that meet, or could potentially meet, the requirements proposed in this article for hosting non-public clinical trial data
| Repository name | Repository URL | Type of data hosted | Access controls | Example of non-public clinical dataset |
|---|---|---|---|---|
| UK Data Service ReShare | | All | Requires account to request access to specific dataset | |
| ICPSR | | All | Requires account to request access to specific dataset | |
| European Genome-phenome Archive (EGA) | | Genomics data | Specific to each study, some data are open | |
| database of Genotypes and Phenotypes (dbGaP) | | Human genotype and phenotype data | Requires account to request access to specific dataset | |
| Harvard Dataverse | | All | Specific to each study, some data are open | |
| figshare | | All | Specific to each study, some data are open | |
| National Database for Clinical Trials related to Mental Illness (NDCT) | | NIH funded data on any aspect of human mental health [includes National Database for Autism Research (NDAR) and Research Domain Criteria Database (RDoCdb)] | Requires account to request access to specific dataset | |
| National Addiction & HIV Data Archive Program (NAHDAP) | | Drug addiction and HIV research data | Requires account to request access to specific dataset | |
| Cancer Imaging Archive | | Anatomical imaging of human cancer | Requires completion of a DUA, some data are open | |
| Synapse | | Biomedical data | Specific to each study, some data are open | |
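The "Access controls" column in the table above describes three recurring patterns: open data, access on request by an account holder, and access after completing a DUA, often set per study. The following minimal Python sketch models those tiers as a per-dataset policy. The names `AccessPolicy`, `User`, `Dataset`, and `can_download` are hypothetical, and the rules are simplifying assumptions; each listed repository implements its own process.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class AccessPolicy(Enum):
    """Hypothetical access tiers modelled on the 'Access controls' column."""
    OPEN = "open"        # anyone may download
    REQUEST = "request"  # logged-in user whose request for this dataset was approved
    DUA = "dua"          # logged-in user who has completed this dataset's DUA


@dataclass
class User:
    username: Optional[str] = None  # None stands for an anonymous visitor
    approved_requests: set[str] = field(default_factory=set)
    completed_duas: set[str] = field(default_factory=set)


@dataclass(frozen=True)
class Dataset:
    identifier: str
    policy: AccessPolicy  # set per dataset, since controls are often study-specific


def can_download(user: User, dataset: Dataset) -> bool:
    """Return True if this user may download this dataset under its policy."""
    if dataset.policy is AccessPolicy.OPEN:
        return True
    if user.username is None:
        # Every non-open tier assumes the user holds an account.
        return False
    if dataset.policy is AccessPolicy.REQUEST:
        return dataset.identifier in user.approved_requests
    # Remaining case: AccessPolicy.DUA
    return dataset.identifier in user.completed_duas


if __name__ == "__main__":
    alice = User("alice", approved_requests={"trial-001"})
    print(can_download(alice, Dataset("trial-001", AccessPolicy.REQUEST)))  # True
    print(can_download(User(), Dataset("trial-001", AccessPolicy.REQUEST)))  # False
```

Attaching the policy to each `Dataset` rather than to the repository reflects the "Specific to each study, some data are open" entries: one repository can hold open and restricted datasets side by side.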
Fig. 1 Overview of the standard Scientific Data editorial workflow for non-confidential datasets. With submission of a Data Descriptor, authors provide a secure link to the dataset(s) stored in an external repository. Editors and referees are granted access to the data in a manner that does not reveal their identities to the authors. Upon publication, both the article and the datasets are made freely accessible online under appropriate terms or licences.
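One way a repository might grant editors and referees access without revealing who they are, assumed here for illustration, is a private, login-free link built around an unguessable token. The Python sketch below is hypothetical: `PrivateReviewLinks`, its methods, and the placeholder base URL are not taken from Scientific Data or any named repository. It only shows how such links can be issued, resolved, and retired once the data become public.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PrivateReviewLinks:
    """Hypothetical registry of private, login-free links for peer review."""
    base_url: str = "https://repository.example.org/review/"  # placeholder URL
    _links: dict = field(default_factory=dict)  # token -> dataset identifier

    def issue(self, dataset_id: str) -> str:
        # An unguessable token stands in for an account: whoever holds the
        # link can view the data, and no reviewer identity is recorded.
        token = secrets.token_urlsafe(32)
        self._links[token] = dataset_id
        return self.base_url + token

    def resolve(self, url: str) -> Optional[str]:
        # Return the dataset for a valid link, or None if unknown or revoked.
        if not url.startswith(self.base_url):
            return None
        return self._links.get(url[len(self.base_url):])

    def revoke_for(self, dataset_id: str) -> None:
        # On publication the dataset becomes public, so review links are retired.
        self._links = {t: d for t, d in self._links.items() if d != dataset_id}


if __name__ == "__main__":
    links = PrivateReviewLinks()
    url = links.issue("dataset-42")  # sent to the editor, who forwards it to referees
    print(links.resolve(url))        # dataset-42
```

Because the token, not a login, authorizes access, the repository has nothing identifying to report back to the authors; the trade-off is that anyone the link is forwarded to can also view the data until it is revoked.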
Fig. 2 Overview of an example editorial workflow accommodating peer review and publication of clinical Data Descriptors. For Data Descriptors describing a clinical restricted-access dataset, authors provide, with their submission, a description of where the dataset(s) are hosted and a copy of the DUA. The journal and the authors then agree on a process by which referees and editors may request access to the data during peer review. Upon publication, the article is made openly available, and the host data repository releases a landing page for the clinical dataset(s). Users may request access to the data according to the process and terms outlined in the Data Descriptor and associated DUA.
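The Fig. 2 workflow is a fixed sequence of stages with preconditions, so it can be sketched as a small state machine. In this hypothetical Python model, `ClinicalDataDescriptor`, `Stage`, and the method names are illustrative assumptions rather than a description of any journal's real system. A submission must include a hosting description and a DUA copy, referee access requests are accepted only during review, and publication triggers release of the repository landing page.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    SUBMITTED = "submitted"
    ACCESS_AGREED = "access process agreed"
    IN_REVIEW = "in peer review"
    PUBLISHED = "published"


# Allowed forward transitions, in the order described for Fig. 2.
TRANSITIONS = {
    Stage.SUBMITTED: Stage.ACCESS_AGREED,
    Stage.ACCESS_AGREED: Stage.IN_REVIEW,
    Stage.IN_REVIEW: Stage.PUBLISHED,
}


@dataclass
class ClinicalDataDescriptor:
    title: str
    hosting_description: str  # where the restricted-access dataset is held
    dua_text: str             # copy of the Data Use Agreement
    stage: Stage = Stage.SUBMITTED
    referee_requests: list = field(default_factory=list)
    landing_page_released: bool = False

    def __post_init__(self) -> None:
        # A clinical submission is incomplete without both items.
        if not self.hosting_description or not self.dua_text:
            raise ValueError("submission needs a hosting description and a DUA copy")

    def advance(self) -> Stage:
        """Move to the next stage; publication releases the landing page."""
        if self.stage not in TRANSITIONS:
            raise RuntimeError(f"no stage after {self.stage.value}")
        self.stage = TRANSITIONS[self.stage]
        if self.stage is Stage.PUBLISHED:
            self.landing_page_released = True
        return self.stage

    def request_referee_access(self, referee_alias: str) -> None:
        # Referees request data under the agreed process, only during review.
        # An alias is stored so the record need not carry referee identities.
        if self.stage is not Stage.IN_REVIEW:
            raise RuntimeError("data access requests are only handled during review")
        self.referee_requests.append(referee_alias)
```

Encoding the order in `TRANSITIONS` makes it impossible to reach review before an access process has been agreed, which is the step that distinguishes this workflow from the standard one in Fig. 1.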