Dawn Muddyman, Carol Smee, Heather Griffin, Jane Kaye.
Abstract
This paper outlines the history behind open access principles and describes the development of a managed access data-sharing process for the UK10K Project, currently Britain's largest genomic sequencing consortium (2010 to 2013). Funded by the Wellcome Trust, the purpose of UK10K was two-fold: to investigate how low-frequency and rare genetic variants contribute to human disease, and to provide an enduring data resource for future research into human genetics. In this paper, we discuss the challenge of reconciling data-sharing principles with the practicalities of delivering a sequencing project of UK10K's scope and magnitude. We describe the development of a sustainable, easy-to-use managed access system that allowed rapid access to UK10K data, while protecting the interests of participants and data generators alike. Specifically, we focus in depth on the three key issues that emerge in the data pipeline: study recruitment, data release and data access.
Year: 2013 PMID: 24229443 PMCID: PMC3978569 DOI: 10.1186/gm504
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Figure 1. Flow diagram illustrating the process of data flow, access requests and approvals. BAM, binary alignment map; PM, project manager; VCF, variant call format.
Setting up a managed access data resource: project checklist
| ✓ | Have all pre-collected sample collections been identified for inclusion in the project? |
| ✓ | For these studies, do the sample custodians agree in principle to the potential inclusion of their sample sets in the project? |
| ✓ | For those studies willing to participate in the project, have the sample custodians confirmed total sample numbers available within the required timeframe for the project? |
| ✓ | Have they also confirmed the REC approval status for these samples? |
| | Specifically, has the sample custodian provided signed assurance that: |
| | • appropriate consent and/or REC approvals are in place for the use of samples in the project, and are compatible with the terms of data sharing proposed by the project; |
| | • where appropriate consents and/or REC approvals are not in place, there is sufficient time to correct this within the timeframe of the project (if not, these studies should be excluded at the outset); |
| | • a mechanism is in place for sample custodians to withdraw the use of their samples; |
| | • the potential risks of participant identification (however minimal) have been explained; |
| | • all of these points have been documented in, for example, an Ethical Governance Framework, and that this document is available to data users both inside and outside the consortium. |
| ✓ | Has it been made clear to the sample custodian that DNA samples must be submitted in a linked (coded) anonymized form, with the sample custodian retaining the linkage key? |
| ✓ | Do the sample custodians agree with the timeframe for submitting their samples, and the amount and quality of material required for sequencing? |
| ✓ | Has a comprehensive, project management committee-approved data-access document been prepared that includes: |
| | • a description of all available datasets and any constraints on the use of the data; |
| | • the project’s publication policy; |
| | • a clear explanation of the publication moratoria (if applicable) and their expiry dates; |
| | • a named point-of-contact to whom completed applications (and any queries) should be submitted; in the UK10K example this was the project manager (PM). |
| ✓ | Has the data access document been made available for download at the site of released data (that is, the resource; in this case the EGA), and/or on the project website? |
| ✓ | Has a person or group independent from the project been identified and appointed to function as a Data Access Committee (DAC)? Have they been briefed on the terms of data access? |
| ✓ | Is there a mechanism in place to arrange for approved data users to access the datasets in the resource once the DAC has approved the data request? If not, this needs to be put in place. |
| ✓ | Have the data been deposited into the resource in a linked anonymized (pseudonymized) form, so that third-party data users are unable to identify study participants? |
| ✓ | Is the project tracking applications that are made to use project data? (For UK10K, applications were monitored by the PM from the point of receiving an application through to the DAC notifying the PM that the application had been formally approved.) |
| ✓ | Providing information on how datasets are being used may be a condition of some studies’ permitting the inclusion of their samples in the project, and failure to do so may result in the dataset being withdrawn. |
| ✓ | Are approved data users notified as and when additional datasets are added to the resource? |
| ✓ | If the project is of a fixed duration, has a mechanism been put into place for managing data access once the project’s management structure dissolves? |
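
The checklist asks that samples be submitted in a linked (coded) anonymized form, with the sample custodian retaining the linkage key and a mechanism in place for withdrawing samples. The Python sketch below is one possible illustration of that idea, not a description of the procedure UK10K actually used. The function names `pseudonymize` and `codes_to_withdraw`, the `SAMPLE` prefix and the identifier format are all hypothetical.

```python
import secrets


def pseudonymize(participant_ids, prefix="SAMPLE"):
    """Assign each participant ID a random coded ID (hypothetical scheme).

    Returns (coded_ids, linkage_key). Only coded_ids would be submitted to
    the project; linkage_key maps coded ID -> participant ID and is assumed
    to stay with the sample custodian.
    """
    linkage_key = {}
    coded_ids = []
    for pid in participant_ids:
        # Random codes carry no information about the participant.
        code = f"{prefix}-{secrets.token_hex(8)}"
        while code in linkage_key:  # regenerate on the unlikely collision
            code = f"{prefix}-{secrets.token_hex(8)}"
        linkage_key[code] = pid
        coded_ids.append(code)
    return coded_ids, linkage_key


def codes_to_withdraw(linkage_key, participant_id):
    """Custodian-side lookup of the coded samples belonging to one participant."""
    return [code for code, pid in linkage_key.items() if pid == participant_id]


# Usage: the custodian keeps `key` and submits only `coded`.
coded, key = pseudonymize(["participant-001", "participant-002"])
to_remove = codes_to_withdraw(key, "participant-002")
```

Because the codes are random rather than derived from the participant identifiers, a third-party data user holding only the coded IDs cannot reverse them, while the custodian can still locate a participant's samples if consent is withdrawn.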