Jiao Song1, Elizabeth Elliot2, Andrew D Morris2, Joannes J Kerssens3, Ashley Akbari1, Simon Ellwood-Thompson1, Ronan A Lyons1.
Abstract
INTRODUCTION: Regulatory barriers make it increasingly difficult to move pseudonymised routine health data across platforms and between jurisdictions. To tackle this challenge, we summarised five approaches that were considered to support a research project on the risks of the new non-vitamin K Target Specific Oral Anticoagulants (TSOACs), run as a collaboration between the Farr Institute in Wales and in Scotland.
APPROACH: In Wales, routinely collected health records held in the Secure Anonymous Information Linkage (SAIL) Databank were used to identify the study cohort. In Scotland, data were extracted from national dataset resources administered by the eData Research & Innovation Service (eDRIS) and stored in the Scottish National Data Safe Haven. We adopted a federated data and multiple analysts approach, and arranged simultaneous access for Welsh and Scottish analysts so that each could generate a study cohort separately by implementing the same algorithm. Combining the two countries increased the study cohort to 6,829 patients for risk analysis. The source datasets and data types used to generate the cohorts were reviewed and compared by analysts at both sites to ensure consistent, harmonised output.
DISCUSSION: This project used a fusion of two of the five approaches considered. The approach we adopted is simple, yet efficient and cost-effective, and it ensures consistency in analysis and coherence with multiple governance systems. It has limitations, but also potential for extension and scaling. It can also be seen as a first step towards an infrastructure that supports a distributed team science approach to research using Electronic Health Records (EHRs) across the UK and more widely.
Keywords: Team science; cross-jurisdictional data linkage; electronic health records
Year: 2018 PMID: 34095524 PMCID: PMC8142956 DOI: 10.23889/ijpds.v3i3.442
Source DB: PubMed Journal: Int J Popul Data Sci ISSN: 2399-4908
| Approach | Advantages/Disadvantages |
|---|---|
| 1. Centralised data and analysis. Data moved from 3 centres – 1 analyst (centralised data model) | All data are submitted from each site to a central site. Analysis is easier because a single researcher has full control and all the data available. Each site must “trust” the central site, and governance approval must be sought from each site, perhaps with a legal contract put in place. Restrictions on data sovereignty may prevent this approach. |
| 2. Federated data and single analyst. Data at 3 centres – 1 analyst accessing each platform then combining results | The same researcher accesses each system separately and combines the outputs, so the same approach is applied throughout. Requires access to separate systems, each with its own learning curve, and separate access contracts and conditions of use. Not workable if the outputs need to be combined for individual-level analysis. |
| 3. Federated data and multiple analysts. Data at 3 centres – 3 separate analyses, combine results | All analysis is done separately by each host site and the outputs are collated. Can be done quickly because each site knows its own system. Consistency can be hard to achieve, so more validation and process documentation are required. Not workable if the outputs need to be combined for individual-level analysis. Dependent on the resources available at each local site. |
| 4. Linked federated data and analysis. Data at 3 centres – 1 analyst (remote real time access model) | The sites have established interconnections. From one site, the researcher can access all the required data. Analysis is easier because the researcher has full control and all the data available. Each site must “trust” the central site, and governance approval must be sought from each site, perhaps with a legal contract put in place. Restrictions on data sovereignty may prevent this approach. |
| 5. Federated data and distributed analysis. Data at 3 centres – 1 analyst directing federated queries | A distributed query system issues the same query to all sites, so the same analysis is performed at each site. No data are moved, so this could suit cross-country restrictions. A common data model is required and the data need to be harmonised. More complex from an IT and governance perspective. |
Figure 1: Cohort generation algorithm
| | Welsh cohort | Scottish cohort | Total |
|---|---|---|---|
| Male | 1,347 | 1,938 | 3,285 |
| Female | 1,329 | 2,215 | 3,544 |
| Total | 2,676 | 4,153 | 6,829 |
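
Under the federated data and multiple analysts approach, only aggregate outputs leave each site. The sketch below shows one way such outputs could be collated, using the counts from the table above. The function name `collate` and the dictionary layout are illustrative assumptions, not the project's actual tooling.

```python
from collections import Counter

# Aggregate counts released by each site (figures taken from the table above).
welsh = {"Male": 1347, "Female": 1329}
scottish = {"Male": 1938, "Female": 2215}


def collate(*site_counts):
    """Sum per-category counts across sites (hypothetical helper).

    Only aggregate counts are combined; no individual-level rows are needed.
    """
    total = Counter()
    for counts in site_counts:
        total.update(counts)  # Counter.update adds counts rather than replacing them
    return dict(total)


combined = collate(welsh, scottish)
grand_total = sum(combined.values())

print(combined)     # {'Male': 3285, 'Female': 3544}
print(grand_total)  # 6829
```

The combined figures reproduce the Total column and the overall cohort of 6,829 patients.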
| Field | Welsh cohort: variable name | Welsh cohort: source data | Scottish cohort: variable name | Scottish cohort: source data |
|---|---|---|---|---|
| Patient identity & linkage field | ALF_E | PEDW | UPI_NUMBER | SMR01 |
| Admission date | ADMIS_DT | PEDW | ADMISSION_DATE | SMR01 |
| Admission methods | ADMIS_MTHD_CD | PEDW | ADMISSION_TYPE | SMR01 |
| Discharge types | DISCH_MTHD_CD | PEDW | DISCHARGE_TYPE | SMR01 |
| Drugs prescription | EVENT_CD | WLGP | BNFItemcode | PIS |
| Date of prescription | EVENT_DT | WLGP | PRES_DATE | PIS |
| Date of birth | WOB | ADDE | DATE_OF_BIRTH | NRS |
| Gender | GNDR_CD | ADDE | SEX | NRS |
| Deprivation quintile | WIMD2011_5TH | PEDW | SIMD_QUINTILE | SMR01 |
| Primary cause of death | DEATHCAUSE_DIAG_UNDERLYING_CD | ADDE | CAUSE_OF_DEATH_CODE | NRS |
| Date of death | DOD | ADDE | DATE_OF_DEATH | NRS |
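
The variable mapping above can be expressed as a pair of lookup tables, so that both analysts refer to the same field names when comparing outputs. This is a minimal sketch: the site-specific column names come from the table, while the shared names on the right-hand side and the helper `to_common` are hypothetical. Renaming alone does not make the values comparable, because the coding systems still differ between sites.

```python
# Site-specific column names (from the table above) mapped to shared names.
# The shared names are hypothetical and chosen only for illustration.
WELSH_TO_COMMON = {
    "ALF_E": "patient_id",
    "ADMIS_DT": "admission_date",
    "ADMIS_MTHD_CD": "admission_method",
    "DISCH_MTHD_CD": "discharge_type",
    "EVENT_CD": "drug_code",
    "EVENT_DT": "prescription_date",
    "WOB": "date_of_birth",
    "GNDR_CD": "gender",
    "WIMD2011_5TH": "deprivation_quintile",
    "DEATHCAUSE_DIAG_UNDERLYING_CD": "cause_of_death",
    "DOD": "date_of_death",
}

SCOTTISH_TO_COMMON = {
    "UPI_NUMBER": "patient_id",
    "ADMISSION_DATE": "admission_date",
    "ADMISSION_TYPE": "admission_method",
    "DISCHARGE_TYPE": "discharge_type",
    "BNFItemcode": "drug_code",
    "PRES_DATE": "prescription_date",
    "DATE_OF_BIRTH": "date_of_birth",
    "SEX": "gender",
    "SIMD_QUINTILE": "deprivation_quintile",
    "CAUSE_OF_DEATH_CODE": "cause_of_death",
    "DATE_OF_DEATH": "date_of_death",
}


def to_common(record, mapping):
    """Rename the columns of one record; columns absent from the mapping are dropped."""
    return {mapping[name]: value for name, value in record.items() if name in mapping}


# Made-up example values, not real patient data.
welsh_row = to_common({"ALF_E": 101, "GNDR_CD": 2}, WELSH_TO_COMMON)
scottish_row = to_common({"UPI_NUMBER": 202, "SEX": 2}, SCOTTISH_TO_COMMON)
```

After renaming, both rows carry the keys `patient_id` and `gender`, which makes a side-by-side review of the two cohorts easier to script.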
| Category | Code / item | Welsh data | Scottish data |
|---|---|---|---|
| Gender | 0 | N/A | Not known (i.e. indeterminate sex, includes intersex) |
| | 1 | Male | Male |
| | 2 | Female | Female |
| | 8 | Not specified | N/A |
| | 9 | N/A | Not specified (includes not stated by patient, or not recorded) |
| Date | Date format | YYYY-MM-DD | DDMMYY |
| Drug | Drug information | EVENT_CD: READ codes, e.g. bs74. | British National Formulary Drug Codes (BNF). e.g. BNF item code 0601011A0BBADAC |
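
The gender and date differences listed above lend themselves to a small harmonisation step. The following is a sketch under stated assumptions: the site keys `"wales"` and `"scotland"`, the output labels, and both function names are hypothetical. Note that the two-digit year in the Scottish `DDMMYY` format is ambiguous; Python's `%y` maps 69 to 99 onto 1969 to 1999 and 00 to 68 onto 2000 to 2068, which would misread many dates of birth, so a real pipeline would need an explicit century rule.

```python
from datetime import date, datetime

# Gender codes per site, from the table above; output labels are hypothetical.
GENDER_CODES = {
    "wales": {1: "male", 2: "female", 8: "not_specified"},
    "scotland": {0: "not_known", 1: "male", 2: "female", 9: "not_specified"},
}

# Date layouts per site: YYYY-MM-DD in Wales, DDMMYY in Scotland.
DATE_FORMATS = {"wales": "%Y-%m-%d", "scotland": "%d%m%y"}


def harmonise_gender(code, site):
    """Translate a site-specific gender code to a shared label.

    Raises KeyError for a code the site does not use (e.g. 0 in Wales).
    """
    return GENDER_CODES[site][int(code)]


def harmonise_date(text, site):
    """Parse a site-specific date string into a datetime.date."""
    return datetime.strptime(text, DATE_FORMATS[site]).date()


# The same calendar day written in each site's format.
assert harmonise_date("2015-03-09", "wales") == date(2015, 3, 9)
assert harmonise_date("090315", "scotland") == date(2015, 3, 9)

# "Not specified" is code 8 in Wales but code 9 in Scotland.
assert harmonise_gender(8, "wales") == harmonise_gender(9, "scotland")
```

Drug codes are left out on purpose: mapping Read codes to BNF item codes requires a curated lookup that cannot be inferred from the table.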