Kerina H Jones, David V Ford, Chris Jones, Rohan Dsilva, Simon Thompson, Caroline J Brooks, Martin L Heaven, Daniel S Thayer, Cynthia L McNerney, Ronan A Lyons.
Abstract
With the current expansion of data linkage research, the challenge is to balance preserving the privacy of person-level data with making these data accessible for use to their full potential. We describe a privacy-protecting safe haven and secure remote access system, referred to as the Secure Anonymised Information Linkage (SAIL) Gateway. The Gateway provides data users with a familiar Windows interface and their usual toolsets to access approved anonymously-linked datasets for research and evaluation. We outline the principles and operating model of the Gateway, the features provided to users within the secure environment, and how we are approaching the challenges of making data safely accessible to increasing numbers of research users. The Gateway represents a powerful analytical environment and has been designed to be scalable and adaptable to meet the needs of the rapidly growing data linkage community.
Keywords: Data linkage; Privacy-protection; Remote access system; e-Records research
Year: 2014 PMID: 24440148 PMCID: PMC4139270 DOI: 10.1016/j.jbi.2014.01.003
Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317
Fig. 1. SAIL architecture. This diagram shows the SAIL databank system and the controls in place for data acquisition and utilisation, with an indication of the roles carried out by each party. Beginning at the base of the diagram, SAIL has formal agreements with data providers to provide their data to the databank in accordance with Information Governance. The commonly-recognised identifiers are anonymised at NWIS, who provide a trusted third party service to SAIL. Further processes of masking and encryption are carried out at SAIL, and the SAIL databank is constructed. From the top of the diagram, requests to use the data are reviewed by SAIL and an independent Information Governance Review Panel (IGRP) to assess compliance with Information Governance before access can be allowed. Once this is agreed, a data view is created by SAIL staff, and access to this view can be made available via the SAIL Gateway. For this to happen, further data transformations are carried out to control the risk of disclosure, and the data user signs an access agreement for responsible data utilisation, in accordance with the specifications of the IGRP to comply with Information Governance.
Fig. 2. The SAIL Gateway. The SAIL Gateway is a remote data access system, and this simplified illustration shows how data are accessed by end users using their own computer. Once approved with an account, data users are provided with a Gateway desktop within the Gateway Local Area Network (LAN), and this is accessed via remote desktop protocol (RDP) interposed by a Gateway security server. The Gateway desktop communicates with the SAIL database in the SAIL LAN to provide data users with a specified data view. Within the Gateway LAN, data users have access to analysis tools, a secure file store and other resources. File transfers into and out of the Gateway LAN are mediated by a guardian.
Fig. 3. SAIL Info Central screenshot. This screenshot displays the home page for a data user on the SAIL Info Central (External to the SAIL Gateway) site. The top section displays menus leading to more information about the datasets and support options. The top left hand section displays information about the user, which is editable by the user. The bottom left hand section displays the services available to the user and indicates service status. The centre section displays all the projects that the user is authorised to access, and the hyperlink directs them to more information about the project. The bottom right hand section displays a timeline of news from projects and dataset updates.
Fig. 4. SAIL Gateway screenshot. This screenshot displays the main features available within the SAIL Gateway. The left hand side displays the desktop icons for software that is installed by default. Starting from the top and working clockwise, the screenshot displays the internet-based resources: Question and Answer Forum, Frequently Asked Questions, SAIL Info Central (Internal to the SAIL Gateway), and the SAIL Gateway WIKI. The second screen displays the IBM InfoSphere environment used to develop SQL code to manipulate SAIL data to be research ready. The third screen displays the NHS Clinical Terminology Browser, which enables users to search for Read codes and their meanings. Finally, the last two screens display the instant messaging system that enables users to communicate and share information within the secure SAIL Gateway environment.
Fig. 5. The SAIL Gateway data user journey. This flowchart illustrates the SAIL data user journey from initial contact with SAIL to dissemination of outputs. Work conducted within the SAIL Gateway is highlighted.
The SAIL system objectives, methods and recent developments.
| No. | Objective | Methods | Developments |
|---|---|---|---|
| 1 | Ensuring data transportation is secure | Following data provider Information Governance permissions and subject to a data sharing agreement, datasets are split into a demographic component (comprising the commonly-recognised identifiers), and a clinical or event component (such as medication records and procedures). These are transported to NWIS | The use of secure file transfers is more robust than using portable transfer media such as CDs or USB drives, and less subject to data mis-direction or loss |
| 2 | Operating a reliable record matching technique to enable accurate record linkage across datasets | Matching and assignment of a consistent, unique ALF to each individual is carried out by NWIS acting as a trusted third party (TTP) so that SAIL does not handle identifiable data. An ALF can be assigned to NHS and non-NHS datasets (such as local authority housing, or fire service datasets). Similarly, a RALF has been developed for residences in address-level datasets | The extension of the matching and record-linkage processes beyond healthcare data, and to include RALFs, opens up new dimensions for research |
| 3 | Anonymising and encrypting the data to minimise the risk of re-identification | Demographic data are anonymised and encrypted by the TTP and subjected to quality assurance to ensure content anonymity. SAIL receives only the ALF, week of birth, gender code and area of residence (LSOA), which are then recombined with the clinical/event component of the dataset. Further encryption of the ALF is carried out at SAIL to form the ALF-E. A parallel process is in place for the RALF. Linkage across datasets is made via the ALF-E and RALF | The use of the ALF and RALF means that datasets can be linked at the individual record and address level (respectively), enabling a wide range of research whilst preserving privacy |
| 4 | Applying measures to address disclosure risk in data views created for researchers | A variety of measures can be applied at creation of individual data views to maximise utility and minimise disclosure risk, including: masking of practitioner codes; aggregation and suppression; limiting numbers of variables provided/sequential provision; project-specific encryption of the ALF-E to prevent cross-linkage where data users are involved in multiple projects | Having a variety of measures ensures a flexible approach. However, the current method can be labour-intensive for a senior analyst as the number of data users grows, so more automated methods are in development |
| 5 | Ensuring data access is controlled and authorised | Subject to data user verification, a data access agreement, and physical and procedural controls, data users are assigned a time-limited account to access their data view. Whereas previously this was location-specific, data users are now able to access data remotely via the SAIL Gateway. Safeguards include a fire-walled VPN, enhanced user authentication, the use of YubiKeys, logging of all SQL commands, and configuration controls to ensure that data cannot be removed or transferred unless authorised. Datasets and user access are managed via SAIL Info Central | The SAIL Gateway enables greater numbers of researchers to engage safely with SAIL, compared to the previous access model, which was on-site only. The Gateway environment provides users with a range of familiar tools and applications, with secure file transfers in and out of the Gateway. SAIL Info Central provides a centralised management system and user interface |
| 6 | Scrutinising proposals for data utilisation and approving output | All proposals to use SAIL data are subject to review by an independent IGRP. In addition, some data providers (DPs) request the right to review proposals seeking to use their data, and SAIL complies with this requirement. Output is scrutinised for potential disclosure risk before results can be released | The lay representation on the IGRP has been increased to enhance the patient/public viewpoint. Additional automation in privacy-protecting measures is being developed and will further streamline the process of output scrutiny |
| 7 | Gaining external verification of compliance with Information Governance | As well as in-house monitoring, IG compliance is verified by a regular programme of independent audit | SAIL is also working towards ISO 27001 compliance |
The objectives of the SAIL system are shown, along with a brief summary of the methods in place and recent developments.
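The split-file, layered encryption and suppression steps summarised in the table can be illustrated with a minimal Python sketch. It is not the SAIL or NWIS implementation: the record does not say how the ALF is derived or encrypted, so keyed hashing (HMAC-SHA256) is swapped in here as an assumption. The field names, the keys, the function names and the suppression threshold of 5 are all hypothetical, and the real matching process is reduced to a single identifier lookup.

```python
import hashlib
import hmac

# Hypothetical secrets, for illustration only. In a real deployment each
# party would hold its own key and never share it with the other.
TTP_KEY = b"example-ttp-key"
SAIL_KEY = b"example-sail-key"

# Assumed names for the commonly-recognised identifier fields.
IDENTIFIER_FIELDS = {"nhs_number", "name", "address"}


def keyed_hash(key, value):
    """Return the hex HMAC-SHA256 digest of `value` under `key`."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def split_record(record):
    """Split a record into demographic and clinical/event components."""
    demographic = {k: v for k, v in record.items() if k in IDENTIFIER_FIELDS}
    clinical = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
    return demographic, clinical


def assign_alf(demographic):
    """TTP step (simplified): derive a consistent ALF from one identifier."""
    return keyed_hash(TTP_KEY, demographic["nhs_number"])


def encrypt_alf(alf):
    """Databank step: re-encrypt the ALF to form the ALF-E."""
    return keyed_hash(SAIL_KEY, alf)


def project_alf(alf_e, project_id):
    """Data-view step: use a per-project key so views cannot be cross-linked."""
    return keyed_hash(SAIL_KEY + project_id.encode("utf-8"), alf_e)


def suppress_small_counts(counts, threshold=5):
    """Replace counts below `threshold` with None before output release."""
    return {k: (v if v >= threshold else None) for k, v in counts.items()}


# Example: one made-up record flowing through the three stages.
record = {
    "nhs_number": "0000000000",
    "name": "A Person",
    "address": "1 Example Street",
    "event": "prescription",
}
demographic, clinical = split_record(record)
alf_e = encrypt_alf(assign_alf(demographic))
data_view_row = {**clinical, "project_alf": project_alf(alf_e, "project-a")}
```

In this sketch the same person receives the same ALF-E in every dataset, which is what makes linkage possible, while the project-specific value differs between projects, so a data user working on two projects cannot join their two data views on it.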