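Chuang Gao<sup>1</sup>, Mark McGilchrist<sup>1</sup>, Shahzad Mumtaz<sup>1</sup>, Christopher Hall<sup>1</sup>, Lesley Ann Anderson<sup>2</sup>, John Zurowski<sup>3</sup>, Sharon Gordon<sup>4</sup>, Joanne Lumsden<sup>4</sup>, Vicky Munro<sup>4</sup>, Artur Wozniak<sup>4</sup>, Michael Sibley<sup>5</sup>, Christopher Banks<sup>5</sup>, Chris Duncan<sup>6</sup>, Pamela Linksted<sup>6</sup>, Alastair Hume<sup>7</sup>, Catherine L Stables<sup>8</sup>, Charlie Mayor<sup>9</sup>, Jacqueline Caldwell<sup>5</sup>, Katie Wilde<sup>4</sup>, Christian Cole<sup>1</sup>, Emily Jefferson<sup>1</sup>.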
Abstract
For over a decade, Scotland has implemented and operationalized a system of Safe Havens, which provides secure analytics platforms for researchers to access linked, deidentified electronic health records (EHRs) while managing the risk of unauthorized reidentification. In this paper, a perspective is provided on the state-of-the-art Scottish Safe Haven network, including its evolution, to define the key activities required to scale the Scottish Safe Haven network's capability to facilitate research and health care improvement initiatives. A set of processes related to EHR data and their delivery in Scotland is discussed. An interview with each Safe Haven was conducted to understand their services in detail, as well as their commonalities. The results show how Safe Havens in Scotland have protected privacy while facilitating the reuse of the EHR data. This study provides a common definition of a Safe Haven and promotes a consistent understanding among the Scottish Safe Haven network and the clinical and academic research community. We conclude by identifying areas where efficiencies across the network can be made to meet the needs of population-level studies at scale.
©Chuang Gao, Mark McGilchrist, Shahzad Mumtaz, Christopher Hall, Lesley Ann Anderson, John Zurowski, Sharon Gordon, Joanne Lumsden, Vicky Munro, Artur Wozniak, Michael Sibley, Christopher Banks, Chris Duncan, Pamela Linksted, Alastair Hume, Catherine L Stables, Charlie Mayor, Jacqueline Caldwell, Katie Wilde, Christian Cole, Emily Jefferson. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 09.03.2022.
Keywords: Safe Haven; data governance; electronic health records
Year: 2022 PMID: 35262495 PMCID: PMC8943560 DOI: 10.2196/31684
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 7.076
Figure 1. Model of Scottish Safe Havens. Researchers have access to the Safe Haven application process after data governance approvals. Safe Haven staff link and deidentify data and make them available in the analytic platform for researchers to analyze. ETL: extract, transform, and load.
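The link-and-deidentify step in the caption above can be illustrated with a minimal Python sketch. It is not any Safe Haven's actual procedure: the keyed-hash pseudonym, the function names `pseudonymize` and `link_and_deidentify`, the field names, and the sample records are all assumptions made for illustration.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, project_key: bytes) -> str:
    """Return a keyed hash of an identifier (hypothetical scheme, assumed for this sketch)."""
    digest = hmac.new(project_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def link_and_deidentify(prescriptions, admissions, project_key, id_field="chi"):
    """Join two record lists on a shared identifier, then replace it with a pseudonym."""
    # Index the second data set by identifier for a simple exact-match link.
    admissions_by_id = {}
    for row in admissions:
        admissions_by_id.setdefault(row[id_field], []).append(row)

    linked = []
    for row in prescriptions:
        for match in admissions_by_id.get(row[id_field], []):
            merged = {**row, **match}
            # Drop the direct identifier; keep only the project-specific pseudonym.
            merged["pid"] = pseudonymize(merged.pop(id_field), project_key)
            linked.append(merged)
    return linked


# Made-up records; the "chi" values are placeholders, not real identifiers.
prescriptions = [
    {"chi": "A001", "drug": "metformin"},
    {"chi": "A002", "drug": "salbutamol"},
]
admissions = [{"chi": "A001", "icd": "E11"}]

project_data = link_and_deidentify(
    prescriptions, admissions, project_key=b"project-123-secret"
)
```

Using a different key per project means the same person receives unrelated pseudonyms in different project extracts, which is one way to limit linkage across extracts; whether a given Safe Haven does this is not stated here.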
A summary table of Safe Haven properties.
| Function | eDRIS<sup>a</sup> (national) | DaSH<sup>b</sup> | Glasgow Safe Haven | HIC<sup>c</sup> | Lothian or DataLoch<sup>d</sup> |
| --- | --- | --- | --- | --- | --- |
| Safe Haven affiliation | PHS<sup>e</sup> | UoA<sup>f</sup> or NHS<sup>g</sup> | NHS | UoD<sup>h</sup> or NHS | NHS |
| Analytical platform affiliation | UoE<sup>i</sup> (EPCC<sup>j</sup>) | UoA | UoG<sup>k</sup> (RCB<sup>l</sup>) | UoD | UoE (EPCC) |
| Network for Safe Haven services (cohort building and linkage) | NHSnet or EPCC | NHSnet | NHSG<sup>m</sup> or NHSnet | NHS | NHSL<sup>n</sup> or NHSnet |
| Network for analytical platform | UoE or Janet | UoA or Janet | UoG or Janet | UoD or Janet and secure public cloud | UoE or Janet |
| Data repository network | NHSnet or EPCC | NHSnet | NHSnet | NHSnet | NHSnet |
| Geographical region<sup>o</sup> | Scotland | NHS Grampian | West of Scotland | NHS Tayside and Fife | Lothian or South East of Scotland |
| Population<sup>p</sup> | 5.7 million | 600,000 | 1.2 million | 850,000 | 900,000 |
| Active projects in 2020 | >600 | >120 | >100 | >100 | >20 |
| Controller or controllers | PHS + NRS<sup>q</sup> + Scottish government | Original data sources | Original data sources | Original data sources | Original data sources |
| Processor or processors | eDRIS | DaSH | Glasgow Safe Haven | HIC | Lothian or DataLoch |
| Governance committee | Health and Social Care PBPP<sup>r</sup> and Statistics PBPP | North Node Privacy Advisory Committee | Privacy advisory committee | HIC governance committee | Data access committee |
| Feasibility | Manual or NDC<sup>s</sup> | Manual or local documents | Manual, local documents, or TriNetX<sup>t</sup> | Manual or using RDMP<sup>u</sup> | Manual or local documents |
| Metadata provided with project extracts | No | Yes or standard (workflow) | Yes or bespoke | Yes or standard (RDMP) | Yes or bespoke |
| Phenotype or cohort development | ICD<sup>v</sup> code from user | By user | Locally stored algorithms or user | By user | By user or CALIBER Library |
| Indexer | External (PHS for health data) | Internal | Internal | Internal (RDMP) | Internal |
| Deidentification method | Workflow | SQL procedure | Database views (usually SQL) | Workflow (RDMP) | SQL procedure |
| CHI<sup>w</sup> seeding | NSS<sup>x</sup> or CHI Linkage Team | CHI Linkage Team or internal | Internal | Internal | CHI Linkage Team |
| Archival | NHSnet and UoE (EPCC) | UoA | NHSnet and UoG (RCB) | NHSnet and UoD and secure public cloud | NHSnet and UoE (EPCC) |
| Project data content standards | As source | As source or ICD | As source | As source | As source |
| Project data format standards | CSV | SPSS, Stata, or CSV | CSV | CSV or database | CSV |
| Data repository number | ≥85 | 1 | 1 | 1 | 1 each |
| Data repository ownership | No | Yes | Yes | Yes | Yes |
| Source data metadata | NDC | Internal shared files | Internal shared files | RDMP | Data dictionaries |
| Metadata publicly available | Yes | No | No | Yes | Yes<sup>y</sup> |
| Number of data sets available | 85 | 40 | ≥200 | 163 | 12 |
| Source data extract, transform, and load | Data management team PHS | Internal (SQL and Python) | Business Intelligence and Informatics in NHSG | Internal (RDMP) | Internal (SQL and Python) |
| Repository uses CDM<sup>z</sup> | No (proprietary) | No (proprietary) | No (proprietary) | No (proprietary) | No (proprietary) |
<sup>a</sup>eDRIS: electronic Data Research and Innovation Service.
<sup>b</sup>DaSH: Grampian Data Safe Haven.
<sup>c</sup>HIC: Health Informatics Centre.
<sup>d</sup>When this work was conducted, the Lothian Research Safe Haven (LRSH) and DataLoch were separate (though closely partnered). Since April 1, 2021, LRSH has been integrated within the DataLoch service.
<sup>e</sup>PHS: Public Health Scotland.
<sup>f</sup>UoA: University of Aberdeen.
<sup>g</sup>NHS: National Health Service.
<sup>h</sup>UoD: University of Dundee.
<sup>i</sup>UoE: University of Edinburgh.
<sup>j</sup>EPCC: Edinburgh Parallel Computing Centre.
<sup>k</sup>UoG: University of Glasgow.
<sup>l</sup>RCB: Robertson Centre For Biostatistics.
<sup>m</sup>NHSG: National Health Service Glasgow.
<sup>n</sup>NHSL: National Health Service Lothian.
<sup>o</sup>Regional Safe Havens have governance to request regional health board data. For example, Glasgow Safe Haven can request West of Scotland Health Board data.
<sup>p</sup>Safe Havens have access to historic records for patients who are deceased, which can increase the accessible data.
<sup>q</sup>NRS: National Records Scotland.
<sup>r</sup>PBPP: Public Benefit And Privacy Panel.
<sup>s</sup>NDC: national data catalog.
<sup>t</sup>TriNetX is a health research network tool that assists drug discovery by helping pharmaceutical companies access clinical data. Glasgow Safe Haven has a TriNetX node. For data mapped into the TriNetX tool, study feasibility can be assessed using TriNetX.
<sup>u</sup>RDMP: Research Data Management Platform.
<sup>v</sup>ICD: International Statistical Classification of Diseases and Related Health Problems.
<sup>w</sup>CHI: community health index.
<sup>x</sup>NSS: National Services Scotland.
<sup>y</sup>The COVID-19 data dictionary is on the DataLoch website.
<sup>z</sup>CDM: common data model.
Figure 2. The Safe Haven project workflow describes the stages a Safe Haven takes to support a typical project: (1) data discovery and research feasibility, in which users initiate the application and address its data governance aspects; (2) optionally, indexing and linking a research, administrative, or clinical data set for hosting at a given analytic platform; (3) cohort building from the selected or agreed Safe Haven data sets; and (4) transfer of the extracted data to an analytic platform after data governance has been checked. The user then analyzes the data set on the analytic platform. The project data set is archived at the end of the project.
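The ordering of these stages can be sketched as a small state machine. This is an illustrative reading of the caption only: the `Stage` names, the `next_stage` helper, and its `governance_approved` and `needs_linkage` parameters are hypothetical and do not come from any Safe Haven's software.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Project stages, in the order given in the Figure 2 caption."""
    FEASIBILITY = 1      # data discovery and research feasibility
    INDEX_AND_LINK = 2   # optional indexing and linking
    COHORT_BUILDING = 3  # build the cohort from Safe Haven data sets
    TRANSFER = 4         # move the extract to the analytic platform
    ANALYSIS = 5         # user analyzes the project data set
    ARCHIVED = 6         # project data set archived at project end


def next_stage(current: Stage, governance_approved: bool, needs_linkage: bool = True) -> Stage:
    """Return the stage that follows `current` under the assumed rules."""
    if current is Stage.ARCHIVED:
        raise ValueError("an archived project has no further stage")
    following = Stage(current + 1)
    # Indexing and linking is optional, so skip it when not needed.
    if following is Stage.INDEX_AND_LINK and not needs_linkage:
        following = Stage(following + 1)
    # Extracted data may only be transferred after the governance check.
    if following is Stage.TRANSFER and not governance_approved:
        raise PermissionError("data governance has not been checked")
    return following


# Walk a project that needs no linkage through to archival.
stage = Stage.FEASIBILITY
path = [stage]
while stage is not Stage.ARCHIVED:
    stage = next_stage(stage, governance_approved=True, needs_linkage=False)
    path.append(stage)
```

Modeling the governance check as a guard on the transfer step mirrors the caption's wording that data move to the analytic platform only after governance has been checked.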
Figure 3. Data indexing and linking services in Scotland. CHI: community health index; COPD: chronic obstructive pulmonary disease; NHS: National Health Service.
Figure 4. Safe Haven data repository networks. Upper row, from left to right: electronic Data Research and Innovation Service, Health Informatics Centre, and Glasgow Safe Haven. Lower row, from left to right: Lothian, DataLoch, and Grampian Data Safe Haven. BI: Business Intelligence and Informatics; DaSH: Grampian Data Safe Haven; eDRIS: electronic Data Research and Innovation Service; HIC: Health Informatics Centre; NHS: National Health Service; PHS: Public Health Scotland; RDMP: Research Data Management Platform; SH: Safe Haven.