Rosa Gini1, Martijn Schuemie2, Jeffrey Brown3, Patrick Ryan2, Edoardo Vacchi4, Massimo Coppola5, Walter Cazzola4, Preciosa Coloma6, Roberto Berni7, Gayo Diallo8, José Luis Oliveira9, Paul Avillach3, Gianluca Trifirò6, Peter Rijnbeek6, Mariadonata Bellentani10, Johan van Der Lei6, Niek Klazinga11, Miriam Sturkenboom6.
Abstract
INTRODUCTION: Existing observational data are increasingly used to produce empirical evidence in health care research quickly and transparently. Multiple databases are often combined to increase power, to assess rare exposures or outcomes, or to study diverse populations. For privacy and sociological reasons, original data on individual subjects cannot be shared, which requires a distributed network approach in which data processing is performed locally before data are shared.

CASE DESCRIPTIONS AND VARIATION AMONG SITES: We created a conceptual framework distinguishing three steps in local data processing: (1) data reorganization into a data structure common across the network; (2) derivation of study variables not present in the original data; and (3) application of the study design to transform longitudinal data into aggregated data sets for statistical analysis. We applied this framework to four case studies to identify similarities and differences across networks in the United States and Europe: Exploring and Understanding Adverse Drug Reactions by Integrative Mining of Clinical Records and Biomedical Knowledge (EU-ADR), the Observational Medical Outcomes Partnership (OMOP), the Food and Drug Administration's (FDA's) Mini-Sentinel, and the Italian network, the Integration of Content Management Information on the Territory of Patients with Complex Diseases or with Chronic Conditions (MATRICE).
Keywords: Data management; Data reuse; Electronic Health Records; Health Services Research; Pharmacoepidemiology; Research networks
Year: 2016 PMID: 27014709 PMCID: PMC4780748 DOI: 10.13063/2327-9214.1189
Source DB: PubMed Journal: EGEMS (Wash DC) ISSN: 2327-9214
Figure 1. Flowchart of the Data Transformation Process Occurring Locally in a Study Collecting Data from a Network of Databases
D1, D2, D3, and D4 represent data sets; T1, T2, and T3 represent data transformations.
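The chain D1 → T1 → D2 → T2 → D3 → T3 → D4 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the record layout, the field names, the codes `X1` and `DRUG_A`, and the functions `t1_reorganize`, `t2_derive`, and `t3_aggregate` are hypothetical and are not taken from any of the four networks. It only shows how each transformation consumes one data set and produces the next, ending in an aggregated table that contains no individual-level rows.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

# D1: hypothetical local records in a site-specific layout.
D1 = [
    {"pid": "a", "when": "2010-01-05", "kind": "dx", "code": "X1"},
    {"pid": "a", "when": "2010-03-01", "kind": "rx", "code": "DRUG_A"},
    {"pid": "b", "when": "2010-02-10", "kind": "rx", "code": "DRUG_A"},
    {"pid": "b", "when": "2010-04-20", "kind": "dx", "code": "X1"},
    {"pid": "c", "when": "2010-05-02", "kind": "dx", "code": "X1"},
]


@dataclass(frozen=True)
class Event:
    """One row of the (hypothetical) common data structure D2."""
    person_id: str
    event_date: date
    domain: str  # "dx" or "rx" in this sketch
    code: str


def t1_reorganize(rows):
    """T1: reorganize local records (D1) into the common structure (D2)."""
    return [
        Event(r["pid"], date.fromisoformat(r["when"]), r["kind"], r["code"])
        for r in rows
    ]


def t2_derive(events):
    """T2: derive per-person study variables (D3) that are not in the raw data."""
    people = {}
    for e in sorted(events, key=lambda ev: ev.event_date):
        if e.domain not in ("rx", "dx"):
            continue  # ignore domains this sketch does not model
        person = people.setdefault(e.person_id, {"first_rx": None, "first_dx": None})
        key = "first_rx" if e.domain == "rx" else "first_dx"
        if person[key] is None:
            person[key] = e.event_date
    return people


def t3_aggregate(people):
    """T3: apply a toy study design and return aggregated counts (D4)."""
    counts = Counter()
    for person in people.values():
        exposed = person["first_rx"] is not None
        if exposed:
            # Count the outcome only if it occurs after first exposure.
            outcome = (
                person["first_dx"] is not None
                and person["first_dx"] > person["first_rx"]
            )
        else:
            outcome = person["first_dx"] is not None
        counts[(exposed, outcome)] += 1
    return dict(counts)


D2 = t1_reorganize(D1)
D3 = t2_derive(D2)
D4 = t3_aggregate(D3)  # keys are (exposed, outcome) pairs, values are person counts
```

In this sketch only `D4` would leave the site, which mirrors the idea that processing happens locally and only aggregated data are shared.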
Description of the D1 (Original DBs) Databases in Terms of Provenance and Data Items Collected from Each Data Source
| | | MA1 | EU1 | EU2 | EU3 | EU4 | EU5 | EU6 | EU7 | MS1 | MS2 | MS3 | O1 | O2 | O3 | O4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Primary care | Administrative data | Dx Proc | Dx Proc | Dx Proc | Dx Proc | Dx Proc | Dx Proc | |||||||||
| Clinical data | Dx | Dx Rx Text Refspec Refinpat Vac | Dx Rx Text Refspec Refinpat Res | Dx Proc | Dx Rx | |||||||||||
| Secondary care | Administrative data | Spec Proc | Spec Proc | Spec Proc | Dx Proc | Dx Proc | Dx Proc | Dx Proc | Dx Proc | Dx Proc | Dx Proc | |||||
| Clinical data | Dx Proc | Dx Rx Text | ||||||||||||||
| Inpatient care | Administrative data | Dx Proc | Dx Proc | Dx Proc | Dx Proc | Dx Proc | Dx Proc | Dx Proc | Dx Proc | Dx Proc | Dx Proc | Dx Proc | ||||
| Clinical data | Dx Proc | |||||||||||||||
| Enrollment into the data collection | Geo | Geo | Geo | Geo | Geo | Charge | Charge | Charge | Elig | Elig | Elig | Elig | Elig | Elig | Elig | |
| Pharmacies | Rx | Rx | Rx | Rx | Rx | Rx | Rx | Rx | Rx | Rx | Rx | |||||
| Registry of disease-specific exemptions from copayment of healthcare | Dx | Dx | Dx | |||||||||||||
| Death registry | Dx | Dx | ||||||||||||||
| Vaccination registry | ||||||||||||||||
| Laboratory | Lab | Res | Res | Res | Res | Res | Res | |||||||||
Notes: If more than one database in a network has access to the same combination of data, they are represented by a single column. Data items are abbreviated as follows. Dx: diagnostic codes; Proc: procedure codes; Rx: prescriptions or dispensings of drugs; Spec: specialty of secondary care encounters; Refspec: referrals from secondary care; Refinpat: referrals from inpatient care; Text: notes in free text; Lab: labels of laboratory tests; Res: laboratory test results; Vac: vaccines; Geo: presence in a geographical area; Charge: being assisted by a GP; Elig: satisfying eligibility criteria for an insurance company or health plan.
Comparison with Respect to T1, D2, T2
| EU-ADR | Does not require mapping to external standard: original coding and/or free text is maintained | Delegated to local partners, no formal procedure | No formal documentation |
| Mini-Sentinel | Source data are homogeneous in coding systems | Local report on specific issues + feedback from standard programs checking for completeness and consistency | Data model, data elements and guiding principles approved by partners. ETL formal document, ad hoc per DB |
| OMOP | Source data standardized to common vocabulary by domain: Drug (RxNorm), Condition (SNOMED), Labs (LOINC) | Formal procedures: OSCAR and GROUCH tools | ETL formal document, ad hoc per DB |
| MATRICE | Source data are homogeneous in coding systems | Formal procedures checking data completeness | Local configuration of the TheMatrix software (text file) |
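Standardizing source codes to a common vocabulary, together with a completeness check on what could not be mapped, can be sketched as follows. This is a minimal illustration under stated assumptions: the mapping table `LOCAL_TO_STANDARD`, the function `standardize`, and all codes are hypothetical placeholders, not real RxNorm, SNOMED, or LOINC content and not the procedure of any network in the table.

```python
# Hypothetical local-to-standard map keyed by (domain, local code).
# A real network would rely on curated vocabularies rather than a hand-written dict.
LOCAL_TO_STANDARD = {
    ("drug", "LOCAL_D1"): "STD_DRUG_1",
    ("condition", "LOCAL_C1"): "STD_COND_1",
}


def standardize(records, mapping):
    """Map (domain, local_code) pairs to standard codes.

    Returns two lists: the mapped (domain, standard_code) pairs and the
    source pairs that had no mapping, so gaps can be reported, not hidden.
    """
    mapped, unmapped = [], []
    for domain, code in records:
        standard = mapping.get((domain, code))
        if standard is None:
            unmapped.append((domain, code))
        else:
            mapped.append((domain, standard))
    return mapped, unmapped


source = [("drug", "LOCAL_D1"), ("condition", "LOCAL_C1"), ("drug", "LOCAL_D9")]
mapped, unmapped = standardize(source, LOCAL_TO_STANDARD)
# Share of source records that received a standard code.
coverage = len(mapped) / len(source)
```

Keeping the unmapped pairs separate gives a local partner something concrete to review before data are shared.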
Comparison with T3 and D4
| EU-ADR | Y | Y | N | Jerboa | Java & Jerboa scripting language |
| Mini-Sentinel | Y | Y | Y | Modular programs and macros; PopMedNet | SQL, SAS, Java, R |
| OMOP | Y | Y | Y | - | SQL, SAS, R, C, Java |
| MATRICE | Y | Y | N | TheMatrix | Java & TheMatrix scripting language |