Fida K Dankar, Andrey Ptitsyn, Samar K Dankar.
Abstract
Contemporary biomedical databases include a wide range of information types from various observational and instrumental sources. Among the most important features that unite biomedical databases across the field are high volume of information and high potential to cause damage through data corruption, loss of performance, and loss of patient privacy. Thus, issues of data governance and privacy protection are essential for the construction of data depositories for biomedical research and healthcare. In this paper, we discuss various challenges of data governance in the context of population genome projects. The various challenges along with best practices and current research efforts are discussed through the steps of data collection, storage, sharing, analysis, and knowledge dissemination.
Keywords: Biomedical database; Data governance; Data privacy; Whole genome sequencing
Year: 2018 PMID: 29636096 PMCID: PMC5894154 DOI: 10.1186/s40246-018-0147-5
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Fig. 1. Secure storage strategy for a large-scale population sequencing project. All data is stored in a secure data center, with partial mirroring for on-site research, partial archival mirroring for backup at geographically distant sites within the country, and an additional mirror copy for protection against unforeseeable, rare catastrophic (“Black Swan”) events.
Fig. 2. De-identification of clinical data
Examples of unique identifiers
| Uniquely identifying fields | Remarks |
|---|---|
| National ID (or SSN) | |
| Name | Names of patients and caregivers |
| Source ID | Hospital/Biobank-assigned IDs |
| Passport number | |
| Exact address | |
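As a rough illustration of how the fields listed above might be handled, the sketch below drops them from a record and adds a keyed pseudonym so that records belonging to the same person can still be linked by whoever holds the key. The field names, the `deidentify` helper, and the use of an HMAC pseudonym are assumptions made for this example; they are not taken from the paper. Removing direct identifiers is only a first step, since combinations of the remaining fields can still single out an individual.

```python
import hashlib
import hmac

# Hypothetical field names mirroring the rows of the table above.
DIRECT_IDENTIFIERS = {
    "national_id",
    "name",
    "source_id",
    "passport_number",
    "exact_address",
}


def deidentify(record, secret_key):
    """Return a copy of `record` without direct identifiers, plus a pseudonym.

    The pseudonym is an HMAC-SHA256 of the source ID (an illustrative choice),
    so the same person maps to the same pseudonym under the same key.
    """
    pseudonym = hmac.new(
        secret_key,
        str(record["source_id"]).encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    # Keep only the fields that are not listed as uniquely identifying.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["pseudonym"] = pseudonym
    return cleaned


if __name__ == "__main__":
    patient = {
        "national_id": "000-00-0000",
        "name": "Jane Doe",
        "source_id": "HOSP-42",
        "passport_number": "X0000000",
        "exact_address": "1 Example Street",
        "diagnosis": "type 2 diabetes",
    }
    print(deidentify(patient, secret_key=b"demo-key-only"))
```

In practice the key would be held by a trusted party and never stored alongside the de-identified data.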
Fig. 3. Framework for secure multiparty computation
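The figure itself is not reproduced here, so the following is only a minimal sketch of one common building block of secure multiparty computation, additive secret sharing, and not necessarily the framework the figure depicts. Each data holder splits a private count into random shares, each computing party sums the shares it receives, and only the total is reconstructed. The modulus, the function names, and the three-party setup are assumptions chosen for illustration.

```python
import secrets

# Illustrative modulus; all share arithmetic is done modulo this prime.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n_parties additive shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine additive shares into the underlying value."""
    return sum(shares) % PRIME


def secure_sum(private_values, n_parties=3):
    """Sum private values without any single party seeing an individual value."""
    # Each data holder produces one share per computing party.
    all_shares = [share(v, n_parties) for v in private_values]
    # Each computing party adds up the shares it received (one column each).
    partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]
    # Only the combined partial sums reveal the total.
    return reconstruct(partial_sums)


if __name__ == "__main__":
    # Hypothetical per-site counts of carriers of some variant.
    site_counts = [12, 30, 7]
    print(secure_sum(site_counts))
```

A single share is a uniformly random number and reveals nothing about the value it came from; the sketch omits the networking and protection against misbehaving parties that a real deployment would need.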
Characteristics of selected genome projects. In the opt-out consent process, consent is presumed (for clinical data and left-over hospital samples), with an opportunity to opt out. Opt-out is usually coupled with paper-based consent for individuals who want to volunteer samples at the biobank. In the local access model, researchers are not allowed to download the data; they can access it only on the data holder’s site. “–” indicates missing information. “Intra-country” indicates that data is not allowed to leave the country (collaborations should be done through a local researcher).
| Projects | Declared target #genomes/exomes | #Genomes sequenced to date | Context | Start date | Data access model | Consent process | Notification of relevant data | De-identification process |
|---|---|---|---|---|---|---|---|---|
| Human Longevity | 1,000,000 WGS | – | Research | 2013 | Controlled | Paper-based | Yes | Yes |
| All of Us | 1,000,000 WGS | 0 | Research | 2017 | Multi-tier (open to controlled), based on risk of request | Dynamic consent | Yes | Yes |
| Korean Genome Project | 1000 WGS for 2016 | 1722 | Research | 2012 | Open | – | – | Yes |
| QGP | 300,000 WGS | 4000 | Research | 2013 | Controlled (multi-ethics/review boards) | Paper-based (11 simple questions) | Yes | Yes |
| Estonian Genome Project | – | 52,000 samples [a] | Research | 2000 | Controlled (multi-ethics/review boards) | Broad paper-based consent | Yes | Yes |
| Saudi Human Genome Program | 100,000 WES | – | Research, diagnostic screening | 2013 | Controlled | Informed paper-based consent | Yes | Yes |
| Decode Genetics | 300,000 WGS (with imputation) | 160,000 | Research | 1996 | Controlled (intra-country) | Opt-out/paper-based consent | Yes | Yes |
| The Faroe Genome Project | 50,000 WGS | – | Research | 2011 | Controlled (multi-ethics/review boards) | Informed consent (one for each research project) | No [b] | Yes |
| Genomics England | 100,000 | 52,065 | Research | 2015 | Controlled (access committee) | Paper-based | Yes | Yes, coupled with local access |
[a] The number of biological specimens collected to date.
[b] Upon participation in a research study, subjects may opt to receive notification about different genetic results that may be revealed.