Samantha S R Crossfield, Kieran Zucker, Paul Baxter, Penny Wright, Jon Fistein, Alex F Markham, Mark Birkin, Adam W Glaser, Geoff Hall.
Abstract
BACKGROUND: The use of linked healthcare data in research has the potential to make major contributions to knowledge generation and service improvement. However, using healthcare data for secondary purposes raises legal and ethical concerns relating to confidentiality, privacy and data protection rights. Using a linkage and anonymisation approach that processes data lawfully and in line with ethical best practice to create an anonymous (non-personal) dataset can address these concerns, yet there is no set approach for defining all of the steps involved in such data flow end-to-end. We aimed to define such an approach with clear steps for dataset creation, and to describe its utilisation in a case study linking healthcare data.
Year: 2022 PMID: 35061834 PMCID: PMC8782367 DOI: 10.1371/journal.pone.0262609
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Common data types and the data processing steps applied in the Comprehensive Patient Records project.
| Data Type | Data Processing Steps |
|---|---|
| Patient Name | Excluded at source |
| NHS Number | An input variable in pseudonymous digest creation (performed by the data providers) |
| Date of Birth | Transformed into age at first cancer diagnosis / matched index date, in age-bands (<1 year, 1–4 years and 5-year bands thereafter until 80–100) |
| Date of Death | A Boolean indicator of survival status as known to the data source was provided |
| Postcode | Reduced to postcode sector only e.g. LS1 5, or mapped to Index of Multiple Deprivation score [ |
| Diagnostic Codes | Aggregated to a binary yes/no for prevalence of disease or disease groups in time periods pre- and post- cancer diagnosis or matched index date |
| Prescribing Data | Mapped to annual cost per patient; aggregated to a Boolean for presence or absence of diabetes drug classes |
| Sex | No processing applied |
| Ethnicity | No processing applied (coded using national code-lists) |
| Date last seen by primary care team | No processing applied |
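
Two of the transformations in the table above, age-banding and postcode reduction, can be sketched in a few lines. This is a minimal illustration rather than the project's actual code: the function names `age_band` and `postcode_sector` are hypothetical, the example postcode is made up, and placing ages above 100 in the top band is an assumption the table does not address.

```python
from datetime import date


def age_band(date_of_birth: date, index_date: date) -> str:
    """Age band at the index date (first cancer diagnosis or matched index date)."""
    # Whole years completed at the index date.
    had_birthday = (index_date.month, index_date.day) >= (
        date_of_birth.month,
        date_of_birth.day,
    )
    age = index_date.year - date_of_birth.year - (0 if had_birthday else 1)
    if age < 0:
        raise ValueError("index date precedes date of birth")
    if age < 1:
        return "<1"
    if age < 5:
        return "1-4"
    if age >= 80:
        # Assumption: anything above 100 is also placed in the top band.
        return "80-100"
    lower = (age // 5) * 5
    return f"{lower}-{lower + 4}"


def postcode_sector(postcode: str) -> str:
    """Reduce a full UK postcode to its sector, e.g. 'LS1 5ZZ' becomes 'LS1 5'."""
    compact = "".join(postcode.split()).upper()
    # The inward code is the last three characters and starts with a digit.
    if len(compact) < 5 or not compact[-3].isdigit():
        raise ValueError(f"unexpected postcode format: {postcode!r}")
    return f"{compact[:-3]} {compact[-3]}"


print(age_band(date(1950, 1, 1), date(2020, 1, 1)))
print(postcode_sector("ls1 5zz"))
```

Both functions discard the original value, so the banded age and the sector are the only forms that leave the data source.
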
Summary of actions defined in the data flow protocol and the parties involved in each step.
| Step | Action Description | Organisation |
|---|---|---|
| 1 | Create and share a hashed project-specific salt (SALT1) | Data providers |
| 2 | Determine records eligible for linkage | Data providers |
| 3 | Use agreed fields and SALT1 to generate project-specific digests (PSD1s) for these records | Data providers |
| 4 | Transfer PSD1s to the linkage party | Data providers |
| 5 | Compile a list of matching PSD1s, return to data providers | Third party |
| 6 | Delete any locally-held PSD1s | Third party |
| 7 | Create at-source anonymised datasets; transfer them to the linkage party | Data providers |
| 8 | Create a project-specific salt (SALT2) and replace PSD1s with a second digest (PSD2) | Third party |
| 9 | Link the datasets to produce the research dataset (RD) | Third party |
| 10 | Grant authorised research access to the RD in a trusted research environment | Third party |
| 11 | Analyse the RD; generate research outputs | Research team |
| 12 | Screen outputs for risk of re-identification and review them against ethical and governance requirements prior to authorised release | Third party |
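
The digest steps in the table above (1 to 9) can be sketched as follows. The source does not state which hashing algorithm was used, so HMAC-SHA256 here is an assumption, as are the helper name `make_digest`, the field normalisation, and the dummy identifiers and rows, all of which are invented for illustration.

```python
import hashlib
import hmac
import secrets


def make_digest(salt: bytes, *fields: str) -> str:
    """Keyed digest over the agreed fields (HMAC-SHA256 is an assumption)."""
    # Normalise so that trivial formatting differences do not break matching.
    message = "|".join(field.strip().upper() for field in fields).encode("utf-8")
    return hmac.new(salt, message, hashlib.sha256).hexdigest()


# Step 1: the data providers share a project-specific salt (SALT1).
salt1 = secrets.token_bytes(32)

# Hypothetical source records keyed by dummy identifiers (not real NHS numbers).
provider_a = {"9990000001": {"age_band": "60-64"}, "9990000002": {"age_band": "70-74"}}
provider_b = {"9990000002": {"sex": "F"}, "9990000003": {"sex": "M"}}

# Steps 2-4: each provider derives PSD1s; only the digests go to the third party.
psd1_a = {make_digest(salt1, nhs): row for nhs, row in provider_a.items()}
psd1_b = {make_digest(salt1, nhs): row for nhs, row in provider_b.items()}

# Step 5: the third party compiles the PSD1s present in both sources.
matching_psd1 = set(psd1_a) & set(psd1_b)

# Step 7: providers send at-source anonymised rows for the matched PSD1s only.
sent_a = {psd1: psd1_a[psd1] for psd1 in matching_psd1}
sent_b = {psd1: psd1_b[psd1] for psd1 in matching_psd1}

# Step 8: the third party creates SALT2 and replaces each PSD1 with a PSD2.
salt2 = secrets.token_bytes(32)
psd2_for = {psd1: make_digest(salt2, psd1) for psd1 in matching_psd1}

# Step 9: link the datasets on PSD2 to form the research dataset (RD).
research_dataset = {
    psd2_for[psd1]: {**sent_a[psd1], **sent_b[psd1]} for psd1 in matching_psd1
}
```

In this sketch the data providers never hold SALT2, so they cannot recompute the PSD2 keys that appear in the research dataset.
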
Fig 1. Diagram of the parties involved in the case study and the actions performed during the process of data flow, linkage and access, as defined using the protocol for linkage and anonymisation of data.
Fig 2. Flow chart of the patient cohort selection process for the case study, defining the number of patient records in each stage of selection and linkage.
Recommended minimum criteria for incorporation into a checklist for data safe haven cross-accreditation.
| Recommended Minimum Criteria |
|---|
| Protocols and work instructions that incorporate any relevant ethical and legal measures |
| Demonstrate compliance with the NHS DSPT standards when handling health data (UK health data-specific) |
| Robust data access control |
| User training in advanced information security |
| An information security management system with an assigned data protection officer and information governance management group oversight |
| Procedures for data classification, measuring identifiability, anonymisation, risk assessment and privacy impact assessment |
| Demonstrable adherence to procedures for appropriate auditing of controls |
| Where data being linked are confidential or could potentially identify individuals and are linked for purposes of limited disclosure/access: demonstrate reasonable data stewardship (which in the UK is defined in The 2013 Information Governance Review [ |
| Where provision is also provided for analysis using identifiable data: remote access and controlled data ingress and egress; third-party review of outputs using data non-disclosure principles prior to dissemination; availability of ‘safe room’ with restricted access as appropriate |