Jeffrey G Klann, Matthew A H Joss, Kevin Embree, Shawn N Murphy.
Abstract
BACKGROUND: The All of Us Research Program (AOU) is building a nationwide cohort of EHR and genomic data from one million patients. Data interoperability is paramount to the program's success. AOU is standardizing its EHR data around the Observational Medical Outcomes Partnership (OMOP) data model. OMOP is one of several standard data models presently used in national-scale initiatives, and each model is unique enough to make interoperability difficult. The i2b2 data warehousing and analytics platform, which uses a flexible ontology-driven approach for data storage, is used at over 200 sites worldwide. We previously demonstrated that this ontology system can drive data reconfiguration, transforming data into new formats without site-specific programming, and implemented this approach on our 12-site Accessible Research Commons for Health (ARCH) network to transform i2b2 data into the Patient Centered Outcomes Research Network (PCORnet) model.
Year: 2019 PMID: 30779778 PMCID: PMC6380544 DOI: 10.1371/journal.pone.0212463
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Ontology-driven data transformation in i2b2.
The ontology, which defines concept metadata, drives the transformation from i2b2 to OMOP. Data are retrieved from the i2b2 fact table, converted to OMOP codes via ontology lookups, and then written to the OMOP tables specified through the ontology concept path.
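The flow in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `ONTOLOGY` dictionary, its concept codes and concept ids, the path layout (first path segment names the target table), and the `transform` helper are all hypothetical. Only the fact fields `patient_num`, `concept_cd`, and `start_date` follow the i2b2 fact table's column names.

```python
from collections import defaultdict

# Hypothetical ontology: maps an i2b2 concept code to an illustrative OMOP
# concept id and to a concept path whose first segment names the target table.
ONTOLOGY = {
    "ICD10:EXAMPLE_DX": {
        "omop_concept_id": 1001,
        "path": "\\condition_occurrence\\example\\",
    },
    "LOINC:EXAMPLE_LAB": {
        "omop_concept_id": 2002,
        "path": "\\measurement\\example\\",
    },
}


def target_table(concept_path):
    """Return the first non-empty segment of a backslash-delimited path."""
    return [segment for segment in concept_path.split("\\") if segment][0]


def transform(facts, ontology):
    """Route each fact to the OMOP table named by its ontology entry."""
    omop_tables = defaultdict(list)
    unmapped = []
    for fact in facts:
        entry = ontology.get(fact["concept_cd"])
        if entry is None:
            # No ontology entry: keep the fact aside instead of dropping it.
            unmapped.append(fact)
            continue
        omop_tables[target_table(entry["path"])].append({
            "person_id": fact["patient_num"],
            "concept_id": entry["omop_concept_id"],
            "start_date": fact["start_date"],
        })
    return dict(omop_tables), unmapped


facts = [
    {"patient_num": 1, "concept_cd": "ICD10:EXAMPLE_DX", "start_date": "2018-01-05"},
    {"patient_num": 2, "concept_cd": "LOINC:EXAMPLE_LAB", "start_date": "2018-02-10"},
    {"patient_num": 3, "concept_cd": "LOCAL:UNKNOWN", "start_date": "2018-03-15"},
]
tables, unmapped = transform(facts, ONTOLOGY)
```

Because the target table is read from the ontology entry rather than hard-coded, changing where a concept lands is a metadata change, which is the property the caption attributes to the ontology-driven approach.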
ARCH ontologies and terminologies vs OMOP.
| i2b2 ARCH ontology tree | ARCH terminology provided | OMOP Table | OMOP Terminology | PCORnet Equivalent Table |
|---|---|---|---|---|
| | | Visit Occurrence | | Encounter |
| | | Person | | Demographics |
| | ICD-9, ICD-10 | Condition Occurrence, Measurement, Procedure Occurrence, Observation | SNOMED | Diagnoses AND Condition |
| | ICD-9, ICD-10, CPT, HCPCS | Procedure Occurrence, Device Exposure, Drug Exposure, Observation | S | Procedure |
| | LOINC | Measurement | LOINC | Lab_Result_CM |
| | | Measurement | LOINC | Vitals |
| | RxNorm, NDC | Drug Exposure | RxNorm | Prescribing AND Dispensing |
ARCH Ontology modifiers vs. those in OMOP.
| Domain | Modifier | ARCH | OMOP |
|---|---|---|---|
| Diagnosis | Condition vs Diagnosis | | |
| | Primary/Secondary | | |
| | Stop Reason | | |
| Procedure | Primary/Secondary | | |
| Laboratory | Lab Priority | | |
| | Lab Location | | |
| | Prescribing vs. Dispensed | | |
| Medication | Refills | | |
| | Quantity | | |
| | Supply | | |
| | Dose, Route, Sig, Stop Reason, Lot #, effective drug dose; route concept id; sig; stop reason; lot # | | |
| | Frequency | | |
| Vitals | Source | | |
| | Position | | |
| | Normal Range | | |
Yellow: terminological differences. Red: not present. Green: equivalence between models.
Codes that could not be found in the OMOP concept dictionary.
| Code Type | Name | Code | ~# pts |
|---|---|---|---|
| | Benzocaine/menthol | 466426 | 50,000 |
| | (retired) Antibody, non-RBC, quantitative, first antigen | 86008 | 2,000 |
| | Citric acid/simethicone/sodium bicarbonate | 689842 | 2,000 |
| | Acetaminophen/diphenhydramine/pseudoephedrine | 689786 | 1,000 |
| | Abdominal pain, unspecified site | 78900 | 1,000 |
| | Individual medical psychotherapy by a physician… | 90841–90844 | 500 |
| | Hepatitis C antibody | 86302 | 200 |
| | Fentanyl citrate, 0.05 mg/ml injectable solution | 856409 | 100 |
| | (retired) ADP Titration Platelet Aggregation Study | 85575 | 100 |
| | (retired) Kidney function study including pharmacologic intervention | 78726 | 100 |
Approximate term frequency in ARCH is shown to the right, as a measure of the data loss caused by the missing code.
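One way such a data-loss estimate could be computed is sketched below in Python. It assumes the "~# pts" column counts distinct patients per code, which the source does not state explicitly. The `patients_per_missing_code` helper, the placeholder codes, and the input shape (pairs of patient number and code) are hypothetical.

```python
from collections import defaultdict


def patients_per_missing_code(facts, known_codes):
    """Count distinct patients for each code absent from the dictionary.

    facts is an iterable of (patient_num, code) pairs; known_codes is the
    set of codes present in the concept dictionary.
    """
    patients = defaultdict(set)
    for patient_num, code in facts:
        if code not in known_codes:
            patients[code].add(patient_num)
    # Largest loss first; ties are broken alphabetically by code.
    return sorted(
        ((code, len(nums)) for code, nums in patients.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Placeholder data: code "A" is in the dictionary, "B" and "C" are not.
known_codes = {"A"}
facts = [(1, "A"), (1, "B"), (2, "B"), (2, "B"), (3, "C")]
missing = patients_per_missing_code(facts, known_codes)
```

Counting distinct patients rather than raw facts keeps a code that is recorded many times for one patient from overstating the loss.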
Transformation source to target table.
| i2b2 | OMOP |
|---|---|
| Care Site | |
| Death | |
| Provider | |
| Condition Occurrence, Observation, Device, Measurement, Procedure | |
| Procedure, Observation, Measurement, Condition, Drug Exposure | |
| Prescribing | |
| Person | |
| Visit Occurrence | |
| Measurement | |
| Observation Period | |
| Drug Era | |
| Condition Era | |
This table is divided into four sections, showing the different ways target values are generated.
Fig 2. Mapping distribution from ARCH terminologies to OMOP.
ICD and CPT codes map to six different tables in OMOP. This is just one (easily visualizable) aspect of the many complexities encountered in mapping. Boxes in the treemap are sized on a logarithmic scale.
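Logarithmic sizing matters because mapping counts per target table can differ by orders of magnitude, so linearly sized boxes for rare targets would be invisible. The Python sketch below shows one plausible sizing rule, `log10(n + 1)`. The exact formula used for the figure is not given in the source, and the counts here are made up.

```python
import math
from collections import Counter


def log_scaled_sizes(target_tables):
    """Map each target table to log10(count + 1) for treemap box sizing."""
    counts = Counter(target_tables)
    return {table: math.log10(n + 1) for table, n in counts.items()}


# Made-up distribution: 999 codes map to one table and 9 to another.
targets = ["procedure_occurrence"] * 999 + ["observation"] * 9
sizes = log_scaled_sizes(targets)
```

With these counts the larger box is three times the size of the smaller one, although it represents 111 times as many codes.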
Fig 3. Achilles results on our “10% of Partners’ data” dataset.
From top left to bottom right: (a) data density (note that all are of the same order of magnitude); (b) age at first observation (note the expected peak in the 20s followed by a decrease, with a spike at age 0 representing babies born in the hospital who did not receive follow-up care); (c) population distribution by race.