Wolfgang Kuchinke1, Christian Krauth2, René Bergmann2, Töresin Karakoyun2, Astrid Woollard3, Irene Schluender4, Benjamin Braasch2, Martin Eckert2, Christian Ohmann5.
Abstract
BACKGROUND: Data in the life sciences are being generated at an unprecedented rate and stored in many different databases. An ever-increasing share of these data is human health data and is therefore protected by legal regulations. As part of the BioMedBridges project, which created infrastructures connecting more than 10 ESFRI research infrastructures (RI), the legal and ethical prerequisites of data sharing were examined using a novel and pragmatic approach.
Year: 2016 PMID: 27751180 PMCID: PMC5067915 DOI: 10.1186/s12911-016-0325-0
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Creation of the Requirements Matrix. Extracted rules and policies, the data types used, combinations of access rules, and the results of the analysis went into the creation of the matrix. The matrix was created by tabulation: the requirements are listed as rows, and the columns represent data characteristics and applicable rules (e.g. data subject, data type, data protection level (identifying, pseudonymous, anonymous), purpose of data processing, legal approval/consent). The cells contain the extracted rules for individual data types and for combinations of data types
Five usage scenarios
| Data bridge | Description |
|---|---|
| Image data bridge | A data bridge that facilitates the comparison of cellular phenotypes specific to individual genes with morphological imaging data from diseased tissue specimens, from both human and mouse tissues. |
| Phenotype data bridge | Datasets from mouse or human that relate to the disease states of diabetes and/or obesity are collected. After the annotation of these datasets, a service allows the automatic identification of phenotype matches across mouse and human. |
| Personalised medicine bridge | The data bridge enables access to, and integration of, patient data that are often heterogeneous and dispersed, in order to support better treatment decisions for individual patients by means of a data analysis informatics pipeline. |
| Structural data bridge | A data bridge connecting databases with structural data and protein interaction data. Researchers receive access to a range of available structural techniques, such as crystallography, Nuclear Magnetic Resonance (NMR), MS or EM, and will be presented with a comprehensive structural model. |
| Biosamples data bridge | Researchers receive information about available sample data through the BioSamples database with the aim of setting up clinical/translational research collaborations. |
The five usage scenarios representing different kinds of data bridges used to create requirement clusters and to evaluate the developed legal interfaces
Requirement clusters
| Name of cluster | Requirement tables | Description |
|---|---|---|
| Data protection/privacy | | General data protection requirements applicable to data bridges; conditions that must be fulfilled in order to legally process personal data. |
| | Table 1 Requirements for animal data and other data | Requirement in relation to: |
| | Table 2 Potential identification risk for human research subjects | Requirement in relation to: |
| | Table 3 Pseudonymous human data | Requirement in relation to: |
| | Table 4 Anonymous human data | Requirement in relation to: |
| Data security | | Data security issues with a focus on access control; measures to protect data from possible outsider attacks, as well as from re-identification attempts |
| | Table 5 Requirements for getting access to data/biosamples | Requirement in relation to: |
| | Table 6 Requirements for linking and sharing restricted access data and open access data | Requirement in relation to: |
| Intellectual property and licences | | Requirements that need to be fulfilled to prevent the infringement of intellectual property rights within data bridges |
| | Table 7 Overview of the IP requirements cluster | Requirement in relation to: |
| Security of biosamples | | Security issues concerning biobanking; measures that have to be taken to securely use and share biosamples |
| | Table 8 Requirements concerning the security of biosamples | Requirement in relation to: |
Five requirement clusters were created containing eight requirement tables connecting requirements with constraints and protection measures
Fig. 2 Dimension tree of the requirements. The criteria of the dimension tree organise requirements derived from five requirement clusters
Fig. 3 Result of the semantic testing of the requirements matrix. Part of the MS Excel sheet representing the requirements matrix is shown; the sheet was used to filter the values of the requirement dimensions. Requirements were validated by deriving them from the matrix through filtering
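The filtering step described for the requirements matrix can be illustrated with a minimal Python sketch. It is not the Excel sheet or the project's implementation: the requirement IDs, requirement texts, dimension names and values below are hypothetical placeholders, and the rule that an unconstrained dimension matches any value is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    """One row of a (hypothetical) requirements matrix."""
    req_id: str
    text: str
    # Dimension name -> set of values for which the requirement applies.
    dimensions: dict


# Illustrative rows only; none of these are taken from the actual matrix.
MATRIX = [
    Requirement("R1", "Informed consent must cover the research purpose",
                {"data_subject": {"human"},
                 "data_protection": {"identifying", "pseudonymous"}}),
    Requirement("R2", "Apply to the data access committee",
                {"data_subject": {"human"},
                 "data_protection": {"pseudonymous"}}),
    Requirement("R3", "Respect the licence of the source database",
                {"data_subject": {"human", "animal"}}),
]


def filter_requirements(matrix, **criteria):
    """Return the rows that match every given dimension value.

    Assumption: a row that does not constrain a dimension matches any
    value of that dimension.
    """
    selected = []
    for req in matrix:
        if all(value in req.dimensions.get(dim, {value})
               for dim, value in criteria.items()):
            selected.append(req)
    return selected


if __name__ == "__main__":
    for req in filter_requirements(MATRIX, data_subject="human",
                                   data_protection="pseudonymous"):
        print(req.req_id, req.text)
```

Filtering on more dimensions narrows the list in the same way that stacking column filters does in a spreadsheet.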
Fig. 4 Usage of the LAT. The user answers the questions of the LAT and receives a list of relevant requirements for data access and data sharing
Technical components of the LAT tool
| Tool | Version | Licence |
|---|---|---|
| Ubuntu | 12.04.5 | GNU GPL |
| Apache Tomcat | 7.0.42 | Apache License 2.0 |
| jQuery | 2.1.4 | MIT License |
| Hibernate | 4.3.11 | LGPL |
| Spring MVC | 4.1.6 | Apache License 2.0 |
| AngularJS | 1.4.3 | MIT License |
| Java | 1.7.0.72 | GPL Version 2 |
Components of LAT tool and contribution of Open Source Software and their licences
Fig. 5 Course of a generic query. Shown is the workflow through the questions and decision points that characterises the user interface of the LAT. The workflow begins at the top left with “location of data source” and ends at the far right. Depending on the data source, not all of the specifications need to be determined/answered. As the questions are answered, the relevant results are presented simultaneously (bottom lane)
Databases and their access policies considered for the requirement clusters of data bridges
| Data providers | Summary of main access restrictions and policies |
|---|---|
| Requirement cluster: Imaging bridge | See: Additional file |
| Mouse tissue imaging data (Infrafrontier), human tumour tissue data (BBMRI/FIMM), MitoCheck (cell-based RNAi screens), WebMicroscope (mouse and human image data sets), Ensembl, ArrayExpress, Phenotator | Restricted access; data linking is only possible if the data provider gives permission based on the availability of informed consent; a consent form that permits such research may be required; application to the steering committee or principal investigator; images are owned by the image generator |
| Requirement cluster: Phenotypic bridge | See: Additional file |
| EuroPhenome, IMPC, Gene Expression Atlas (GXA), ArrayExpress, ChEMBL, Metabolights, Reactome, CERM datasets, Biobank/BBMRI (University of Graz, Austria) | Predominantly open access, open restricted for private data (pre-publication/ unpublished), open access, when data is used for research purposes, biobank with restricted access (access rules include project application and approval committee) |
| Requirement cluster: Personalised Medicine bridge | See: Additional file |
| ICGC (International Cancer Genome Consortium), TCGA (The Cancer Genome Atlas), EGA (European Genome-phenome Archive), COSMIC (Catalogue of Somatic Mutations in Cancer), GEO (Gene Expression Omnibus), ArrayExpress, ChEMBL, Reactome, Ensembl, DrugBank, PharmGKB, BioSD, Biobanks (BBMRI), EU-OPENSCREEN, ECRIN (CTIM), FIMM Institute for Molecular Medicine Finland (EATRIS) | Open access and restricted access data, with different policies applying; controlled access datasets are subject to access control; the Data Access Compliance Office (DACO) handles requests from scientists for access to controlled data; user certification via a Data Access Request is required; download of datasets must be approved by the specified Data Access Committee (DAC); users must sign a Data Access Agreement (DAA), which details the terms and conditions of use for each dataset; all downloadable controlled access datasets are encrypted; restricted access usually applies to pre-publication/unpublished data |
| Requirement cluster: Structural Data bridge | See: Additional file |
| UniProt, AmiGO (Gene Ontology database), EMDB (Electron Microscopy Data Bank), IntAct (Molecular Interaction Database), GenBank (NCBI), ELIXIR, BMB database | All are open access |
| Requirement cluster: Biosample data bridge | See: Additional file |
| | Mainly restricted access. Open access for aggregated anonymised information or metadata about samples. For data access, the researcher has to apply to the Data Access Committee (DAC) of the biobank and has to agree to use the data only for the research specified between biobank and researcher, and not to try to identify patients |
Fig. 6 The LAT interface consists of two parts: a survey part showing a set of questions (left side) and the corresponding results part with the assessment (right side). Because no question has yet been answered, the assessment part does not show any results
Fig. 7 Conditional questions. Which conditional questions appear depends on the answers given. The example of “HUMAN” data is shown. The right side shows the new, conditional questions
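The conditional-question behaviour can be sketched in a few lines of Python. This is an illustrative sketch only, not the LAT code (which, according to the components table, is built on Java, Spring MVC and AngularJS): the question keys, wording, answer options and the `show_if` rule format are all hypothetical.

```python
# Hypothetical question catalogue. "show_if" is either None (always shown)
# or a (question_key, required_answer) pair that must be satisfied.
QUESTIONS = {
    "data_subject": {
        "text": "What kind of subject do the data come from?",
        "options": ["HUMAN", "ANIMAL"],
        "show_if": None,
    },
    "data_protection": {
        "text": "Are the data identifying, pseudonymous or anonymous?",
        "options": ["IDENTIFYING", "PSEUDONYMOUS", "ANONYMOUS"],
        "show_if": ("data_subject", "HUMAN"),
    },
    "consent": {
        "text": "Is informed consent available?",
        "options": ["YES", "NO"],
        "show_if": ("data_subject", "HUMAN"),
    },
}


def visible_questions(answers):
    """Return the keys of the questions to display for the answers so far."""
    visible = []
    for key, question in QUESTIONS.items():
        condition = question["show_if"]
        if condition is None or answers.get(condition[0]) == condition[1]:
            visible.append(key)
    return visible


if __name__ == "__main__":
    print(visible_questions({}))
    print(visible_questions({"data_subject": "HUMAN"}))
```

Keeping the display condition next to each question keeps the sketch declarative: adding a follow-up question means adding one entry, with no change to the selection logic.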
Fig. 8 Exemplary assessment results. Only details of the lists of results for IP, Biosamples, Data protection, Data security and additional information are displayed. The upper parts of "Intellectual Property", "Data Security" and "Additional Information" are shown. Triangles at the lower edges indicate that more information is provided for these topics than is shown in the figure. ID = requirement number. By clicking on the “Explanation” button, the user is linked to the respective regulations or to associated document templates, such as Data Transfer Agreements