Anil Pacaci, Suat Gonul, A Anil Sinaci, Mustafa Yuksel, Gokce B Laleci Erturkmen.
Abstract
Background: Making use of the available observational healthcare datasets is key to complementing and strengthening postmarketing safety studies. Common data models (CDMs) are the predominant approach for enabling large-scale systematic analyses across disparate data models and vocabularies. Current CDM transformation practices depend on proprietarily developed Extract-Transform-Load (ETL) procedures, which require knowledge of both the semantics and the technical characteristics of the source datasets and the target CDM.

Purpose: In this study, our aim is to develop a modular but coordinated transformation approach that separates the semantic and technical steps of the transformation process, which are not strictly separated in traditional ETL approaches. Such an approach divides the work into discrete operations: extracting data from source electronic health record systems, aligning the source and target models at the semantic level, and populating the target common data repositories.

Approach: To separate the activities required to transform heterogeneous data sources to a target CDM, we introduce a semantic transformation approach composed of three steps: (1) transformation of the source datasets to Resource Description Framework (RDF) format, (2) application of semantic conversion rules to obtain the data as instances of the ontological model of the target CDM, and (3) population of repositories that comply with the specifications of the CDM by processing the RDF instances from step 2. The proposed approach has been implemented in real healthcare settings, where the Observational Medical Outcomes Partnership (OMOP) CDM was chosen as the common data model, and a comprehensive comparative analysis between the native and transformed data has been conducted.
Keywords: common data model; healthcare datasets; pharmacovigilance; postmarketing safety study; semantic transformation
Year: 2018 PMID: 29760661 PMCID: PMC5937227 DOI: 10.3389/fphar.2018.00435
Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.810
Figure 1. Overview of the semantic transformation methodology: population of a CDM repository from disparate EHR datasets through semantic mapping rules.
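The three steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: triples are modeled as plain Python tuples rather than a real RDF store, and the namespaces, property names (`sex`, `birthDate`, `gender`, `yearOfBirth`), and the gender code mapping are all hypothetical placeholders chosen for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespaces, for illustration only.
SRC = "http://example.org/source#"
OMOP = "http://example.org/omop#"

# Hypothetical source-to-target gender codes (not taken from the paper).
GENDER_MAP = {"M": "MALE", "F": "FEMALE"}


def source_to_rdf(rows: List[Dict[str, str]]) -> List[Triple]:
    """Step 1: express each source record as RDF-style triples."""
    triples: List[Triple] = []
    for row in rows:
        subject = f"{SRC}patient/{row['id']}"
        triples.append((subject, "rdf:type", f"{SRC}Patient"))
        triples.append((subject, f"{SRC}sex", row["sex"]))
        triples.append((subject, f"{SRC}birthDate", row["birth_date"]))
    return triples


def apply_conversion_rules(triples: List[Triple]) -> List[Triple]:
    """Step 2: rewrite source triples as instances of the target model."""
    converted: List[Triple] = []
    for s, p, o in triples:
        target = s.replace(f"{SRC}patient/", f"{OMOP}person/")
        if p == "rdf:type":
            converted.append((target, "rdf:type", f"{OMOP}Person"))
        elif p == f"{SRC}sex" and o in GENDER_MAP:
            converted.append((target, f"{OMOP}gender", GENDER_MAP[o]))
        elif p == f"{SRC}birthDate":
            # Keep only the year part of an ISO date string.
            converted.append((target, f"{OMOP}yearOfBirth", o[:4]))
    return converted


def populate_repository(triples: List[Triple]) -> Dict[str, Dict[str, str]]:
    """Step 3: fold target triples into one row per person."""
    table: Dict[str, Dict[str, str]] = {}
    for s, p, o in triples:
        row = table.setdefault(s, {})
        if p != "rdf:type":
            row[p.replace(OMOP, "")] = o
    return table


rows = [{"id": "1", "sex": "F", "birth_date": "1970-05-12"}]
person_table = populate_repository(apply_conversion_rules(source_to_rdf(rows)))
```

Because each step only consumes the output of the previous one, the semantic rules in step 2 can be changed without touching the extraction or loading code, which is the separation the abstract argues for.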
Mappings of ER constructs to OWL resources.
| ER construct | OWL resource |
| --- | --- |
| Entity | owl:Class |
| Attribute | owl:DatatypeProperty |
| Relationship | owl:ObjectProperty |
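As a rough illustration of the mapping in the table above, the following sketch emits Turtle-style statements for one entity. The entity, attribute, and relationship names are hypothetical, the `rdfs:domain` and `rdfs:range` statements are an added assumption that the table does not specify, and prefix declarations are omitted.

```python
from typing import Dict, List

# The three correspondences listed in the table.
ER_TO_OWL = {
    "entity": "owl:Class",
    "attribute": "owl:DatatypeProperty",
    "relationship": "owl:ObjectProperty",
}


def er_to_turtle(entity: str, attributes: List[str],
                 relationships: Dict[str, str]) -> List[str]:
    """Return Turtle-style lines describing one ER entity in OWL terms."""
    lines = [f":{entity} a {ER_TO_OWL['entity']} ."]
    for attr in attributes:
        # Each attribute becomes a datatype property of the entity.
        lines.append(
            f":{attr} a {ER_TO_OWL['attribute']} ; rdfs:domain :{entity} ."
        )
    for name, target in relationships.items():
        # Each relationship becomes an object property linking two classes.
        lines.append(
            f":{name} a {ER_TO_OWL['relationship']} ; "
            f"rdfs:domain :{entity} ; rdfs:range :{target} ."
        )
    return lines


# Hypothetical entity, attribute, and relationship names.
turtle_lines = er_to_turtle(
    "Person",
    ["year_of_birth"],
    {"has_condition": "ConditionOccurrence"},
)
```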
Figure 2. Visualization of OMOP ontology constructs.
Figure 3. (A) A sample semantic conversion rule for person gender and birthdate. (B) A sample filtering rule to check the existence of gender and year of birth.
Figure 4. (A) A small portion of patient data that covers the gender and the birthdate. (B) A sample unit test for semantic transformation rules.
Figure 5. Specifying eligibility criteria in TAS.
Figure 6. Chronograph visualizing the temporal pattern between a prescription of a drug and the occurrence of a medical event.
Figure 7. Implementation of the proposed framework in real-world settings.
Patient counts in original and transformed databases.
| Measure | Original database | Transformed database | Transformed/original (%) |
| --- | --- | --- | --- |
| Number of patients | 893,870 | 855,101 | 95.66 |
| Number of patients in nifedipine cohort | 113 | 113 | 100 |
| Number of patients in AMI cohort | 2,562 | 2,556 | 99.77 |
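The last column of this table can be reproduced with a short calculation, assuming it reports the transformed count as a percentage of the original count, rounded to two decimals, and reading the first row as 893,870 and 855,101 patients. The function and variable names are illustrative.

```python
def retained_percent(original: int, transformed: int) -> float:
    """Transformed count as a percentage of the original, to two decimals."""
    return round(100 * transformed / original, 2)


# (original, transformed) patient counts taken from the table.
counts = {
    "all patients": (893870, 855101),
    "nifedipine cohort": (113, 113),
    "AMI cohort": (2562, 2556),
}

percentages = {
    name: retained_percent(original, transformed)
    for name, (original, transformed) in counts.items()
}
```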
Patients' age statistics in original and transformed databases.
| Measure | Original database | Transformed database |
| --- | --- | --- |
| Average age (overall) | 50.723 | 50.877 |
| Average age (male) | 49.917 | 49.879 |
| Average age (female) | 51.791 | 51.848 |
Figure 8. Demographic summary of AMI and nifedipine cohorts in the original TUD and populated OMOP databases.
Patient counts of selected cohorts in original and transformed databases.
| Measure | Original database | Transformed database |
| --- | --- | --- |
| Total exposures | 494 | 494 |
| Average exposure per patient | 4.371 | 4.371 |
| Total occurrences | 6,708 | 6,705 |
| Average occurrence per person | 2.618 | 2.623 |
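The per-patient averages can be cross-checked against the cohort sizes reported earlier. This sketch assumes, without confirmation from the text here, that the exposures belong to the nifedipine cohort (113 patients in both databases) and the occurrences to the AMI cohort (2,562 and 2,556 patients). Under that assumption the computed values agree with the reported ones to within 0.001.

```python
def average_per_patient(total_events: int, patients: int) -> float:
    """Mean number of events per patient."""
    return total_events / patients


# (total events, assumed cohort size, reported average) per database.
# The pairing of event totals with cohort sizes is an assumption.
checks = {
    "exposures, original": (494, 113, 4.371),
    "exposures, transformed": (494, 113, 4.371),
    "occurrences, original": (6708, 2562, 2.618),
    "occurrences, transformed": (6705, 2556, 2.623),
}

# Absolute difference between the computed and the reported average.
differences = {
    name: abs(average_per_patient(total, patients) - reported)
    for name, (total, patients, reported) in checks.items()
}
```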
Figure 9. Proportion of the top 50 medications among all exposures in the TUD and transformed OMOP databases.
Figure 10. Proportion of the top 50 conditions among all occurrences in the TUD and transformed OMOP databases.