K H Jones, D V Ford, S Thompson, R A Lyons.
Abstract
BACKGROUND: The Secure Anonymised Information Linkage (SAIL) Databank is a national data safe haven of de-identified datasets, principally about the population of Wales, made available in anonymised form to researchers across the world. It was established so that the vast arrays of data collected about individuals in the course of health and other public service delivery could be used to answer important questions that could not otherwise be addressed without prohibitive effort. The SAIL Databank is the bedrock of other funded centres that rely on the data for research.
APPROACH: SAIL is a data repository surrounded by a suite of physical, technical and procedural control measures embodying a proportionate privacy-by-design governance model, informed by public engagement, to safeguard the data and facilitate data utility. SAIL operates on the UK Secure Research Platform (SeRP), which is a customisable technology and analysis platform. Researchers access anonymised data via this secure research environment, from which results can be released following scrutiny for disclosure risk. SAIL data are being used in multiple research areas to evaluate the impact of health and social exposures and policy interventions.
DISCUSSION: Lessons learned and their applications include: managing evolving legislative and regulatory requirements; employing multiple, tiered security mechanisms; working hard to increase analytical capacity efficiency; and developing a multi-faceted programme of public engagement. Further work includes: incorporating new data types; enabling alternative means of data access; and developing further efficiencies across our operations.
Year: 2019 PMID: 34095541 PMCID: PMC8142954 DOI: 10.23889/ijpds.v4i2.1134
Source DB: PubMed Journal: Int J Popul Data Sci ISSN: 2399-4908
Figure 1: The SAIL Secure Research Platform. SAIL operates on a secure research platform (UK SeRP). Beginning at the left of the diagram, wherever researchers are based, they access data through a provisioned, secure, research-ready desktop using VMware Horizon infrastructure. The connection from the user’s terminal to the desktop is strongly encrypted, and access controls prevent data being transferred outside the desktop environment. The end user is authenticated through both user credentials and two-factor authentication tokens. Provisioned desktops come in a variety of capacities and configurations to suit the type of analysis that the end user and project need. As part of the research environment there are shared project spaces that enable collaboration through database space, file store, wiki and Git (source control), as well as access to wider support and help materials.
UK SeRP has many shared infrastructure components that can help deliver the programme’s objectives or specific project needs. SAIL uses IBM DB2 as its data warehouse because of its massively parallel processing (MPP) architecture and its ability to scale to the needs of such a large repository and the big data demands that this drives. To support specific project needs, other UK SeRP components can be made available, such as the HPC cluster or Kubernetes cluster to support processing pipelines, or the GPU and AI cluster for training computing models. Through the provision of virtual machines or container environments, SAIL can support more complex methodological developments that require bespoke infrastructure for the development or deployment of tailored solutions. Business intelligence tools such as Tableau, R Shiny and PowerBI (not shown) are also available.
Two other UK SeRP instances (Data Science Building projects (DSB) and Dementias Platform UK (DPUK)) are included on the diagram to help illustrate the customisability of the platform, since these operate using different components, or under different governance regimes, from SAIL.
Figure 2: The National Research Data Appliance. The various components of the National Research Data Appliance (NRDA) are shown. The entire UK SeRP environment is controlled by “Security 3 (S3)”, a feature of the NRDA that allows the tenancy to be managed and controlled by non-technical team members. User accounts and projects are defined and managed through a user interface (shown on the left) that allows different levels of access, even allowing a project PI to self-manage project membership. (This self-management feature is not enabled for SAIL.) This system allows the infrastructure configuration, project configuration and governance structure to be documented, and all system components orchestrated, through the user interface. The particular tenancy model defines which parts of the infrastructure are accessible and which projects operate within that environment. The S3 model is periodically checked against the infrastructure, and any nonconformity is corrected and reported.
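The periodic conformity check described for S3 can be pictured as a reconciliation loop: compare the declared model with what is observed on the infrastructure, correct any drift, and report it. The following is only a minimal sketch under stated assumptions. The names `Drift` and `reconcile`, and the representation of the model as a mapping from project name to a set of user names, are hypothetical illustrations and are not taken from the NRDA or S3.

```python
from dataclasses import dataclass


@dataclass
class Drift:
    """One mismatch between the declared model and the observed state (hypothetical type)."""
    project: str
    missing: set[str]      # access the model grants but the infrastructure lacks
    unexpected: set[str]   # access present on the infrastructure but not in the model


def reconcile(model: dict[str, set[str]], actual: dict[str, set[str]]) -> list[Drift]:
    """Bring `actual` into line with `model` in place and return what was corrected.

    Both arguments map a project name to the set of users with access.
    This layout is an assumption made for illustration only.
    """
    report: list[Drift] = []
    # sorted() builds a list first, so mutating `actual` inside the loop is safe.
    for project in sorted(model.keys() | actual.keys()):
        wanted = model.get(project, set())
        found = actual.get(project, set())
        missing, unexpected = wanted - found, found - wanted
        if missing or unexpected:
            report.append(Drift(project, missing, unexpected))
        if wanted:
            actual[project] = set(wanted)   # correct the nonconformity
        else:
            actual.pop(project, None)       # project is not in the model at all
    return report


if __name__ == "__main__":
    model = {"p1": {"alice", "bob"}, "p2": {"carol"}}
    actual = {"p1": {"alice", "mallory"}, "p3": {"dave"}}
    for drift in reconcile(model, actual):
        print(drift)
```

Treating the documented model as the single source of truth means a second run immediately after a correction should report nothing, which is a convenient property to test for.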
| Core datasets | Features (with approximate volumes) |
|---|---|
| Annual District Birth Extract | Office for National Statistics (ONS) register of all births in Wales collected from birth registrations. 35,000 births per year, from 2003 to present. |
| Annual District Death Extract | ONS register of all deaths relating to Welsh residents, including those who died outside Wales, collected from death registrations. 32,000 deaths per year, from 2003 to present. |
| Critical Care Dataset | Nation-wide critical care database to monitor quality of service, to drive improvements and policy development, collected and coded at each hospital. 9,000 admissions per year, from April 2006 to May 2016. |
| Diagnostic and Therapy Services Waiting Times | Waiting times for specified diagnostic and therapy services for the NHS in Wales, submitted by Local Health Boards. 120,000 records per year from October 2005 to May 2016. |
| Emergency Department | Administrative and clinical information for all NHS Wales Accident and Emergency department attendances, collected and coded at each hospital. 750,000 attendances per year from 2009 to present. |
| National Community Child Health Database | Birth registration and monitoring of child health examinations and immunisations, bringing together data from local Child Health System databases, held by Local Health Boards. 35,000 births and 500,000 vaccinations per year, from 1987 to present. |
| Outpatient Dataset | Attendance information for all NHS Wales hospital outpatient appointments, collected from the central PAS (Patient Administrative System). 4,500,000 appointments per year from 2004 to present. |
| Outpatient Referral | Referral pathways to secondary care, submitted by Local Health Boards. 1,000,000 records per year, from 2009 to June 2016. |
| Patient Episode Database for Wales | NHS Wales hospital admissions comprising attendance and clinical information, diagnoses and operations performed, collected from the central PAS (Patient Administrative System). 950,000 hospital admissions per year, from 1997 to present. |
| Postponed Admitted Procedures | Information on reason for cancelled admitted procedures, submitted by Local Health Boards. 80,000 records per year, from 2013 to May 2016. |
| Primary Care GP dataset | Signs, symptoms, test results, diagnoses, prescribed treatment, referrals for specialist treatment and social aspects relating to the patient’s home environment, collected from General Practices. Averaging around 5,000 patients per practice, with coverage starting between January 2000 and October 2014 (depending on practice) and continuing to present. |
| Referral to Treatment Times | Monitoring the 26-week referral to treatment time target, submitted by Local Health Boards. 450,000 records per year, from September 2011 to May 2016. |
| UK Health Dimensions | NHS related lookup tables, including version 2 and version 3 Read codes, ICD10 codes, cross-coding system mappings, organisational information, and geographic information, from various reference data providers. Comprises thousands of reference tables. |
| Welsh Demographic Service Dataset | Administrative information about individuals in Wales that use NHS services, drawn from GP practices. Current and past population of Wales, from 1990 to present (approximately 5 million people). |

| Core-restricted datasets | Features (with approximate volumes) |
|---|---|
| Active Adult Survey | Large scale survey of the adult population in Wales covering sport related activities, provided by Sport Wales. Data on 13,000 individuals between January 2012 and January 2013. |
| Bowel Screening Wales | Administrative and clinical information for bowel screening, collected by Public Health Wales. 140,000 screening test records per year, from 2008 onwards. |
| Breast Test Wales | Administrative and clinical information for breast screening, collected by Public Health Wales. 110,000 screening test records per year, from 1989 onwards. |
| Cervical Screening Wales | Administrative and clinical information for cervical screening, collected by Public Health Wales. 225,000 screening test records per year, from 1990 onwards. |
| Congenital Anomaly Register and Information Service (CARIS) | Information about foetuses or babies who have, or are suspected of having, a congenital anomaly, collected by CARIS. 1,500 babies per year, from 1998 to 2011. |
| Educational Attainment | Annual return submitted by all sectors including nursery, primary, middle, secondary and special schools, provided by Welsh Government. 470,000 records per year, from 2003 to 2015. |
| National Survey for Wales | The Welsh Government’s major survey of the general population in Wales covering a wide range of topics affecting people and their local area. 12,000 people per year, from 2012 to 2015. |
| Welsh Cancer Intelligence and Surveillance Unit (WCISU) | National Cancer Registry for Wales to record, store and report on all incidences of cancer, collected by WCISU. 686,000 records, from 1972 to present. |
| Welsh Health Survey | Information on the health and health-related lifestyles of people living in Wales, collected by the National Centre for Social Research. 4,400 records from April 2013 to December 2014. |