Vaclav Papez, Maxim Moinat, Stefan Payralbe, Folkert W Asselbergs, R Thomas Lumbers, Harry Hemingway, Richard Dobson, Spiros Denaxas.
Abstract
OBJECTIVE: The aim of the study was to transform a resource of linked electronic health records (EHR) to the OMOP common data model (CDM) and to evaluate the process in terms of syntactic and semantic consistency and quality when implementing disease and risk factor phenotyping algorithms.
Keywords: EHR; OMOP; algorithms; heart failure; phenotyping
Year: 2021 PMID: 34514354 PMCID: PMC8423424 DOI: 10.1093/jamiaopen/ooab001
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
Figure 1: Overview of the transformation process of raw electronic health records linked from three national sources to the OMOP common data model. The main steps of the process are: (A) raw data from primary care (CPRD), hospitalizations (HES), and mortality (ONS) are migrated and loaded into a Postgres relational database system; (B) White Rabbit summary reports are generated and inform the design of the ETL pipeline; (C) working with experts on the source data, syntactic mappings are generated using the “Rabbit In a Hat” tool and mappings between vocabularies are created using the “Usagi” tool; (D) in an iterative manner, raw data are passed through the ETL pipeline, mapping quality is assessed using the “Achilles” tool and bespoke queries and validation metrics, and the ETL mappings are refined; (E) the final dataset is stored in a Postgres relational database and queried to produce datasets for statistical analyses. CPRD, Clinical Practice Research Datalink; HES, Hospital Episode Statistics; ONS, Office for National Statistics; OMOP, Observational Medical Outcomes Partnership; CDM, common data model; ETL, extract, transform, load.
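Steps (C) and (D) come down to looking up each source code in a vocabulary mapping and writing the result into a CDM table. The following is a minimal Python sketch under stated assumptions: the mapping dictionary stands in for a reviewed Usagi export, the codes and concept IDs are placeholders rather than real vocabulary entries, and the class and function names are hypothetical. It follows the OMOP convention of recording concept_id 0 when no standard concept matches, while keeping the original code as the source value.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical mapping from (source vocabulary, source code) to a standard
# concept ID, standing in for a reviewed Usagi export. Codes and IDs are
# placeholders, not real vocabulary entries.
SOURCE_TO_CONCEPT: Dict[Tuple[str, str], int] = {
    ("READ", "CODE_A"): 1001,
    ("ICD10", "CODE_B"): 1002,
}


@dataclass
class SourceEvent:
    person_id: int
    vocabulary: str
    code: str


@dataclass
class CdmEvent:
    person_id: int
    concept_id: int      # 0 means "no matching standard concept"
    source_value: str    # original code retained for traceability


def transform(events: List[SourceEvent]) -> List[CdmEvent]:
    """Map raw events to CDM-style rows; unmapped codes get concept_id 0."""
    return [
        CdmEvent(e.person_id,
                 SOURCE_TO_CONCEPT.get((e.vocabulary, e.code), 0),
                 e.code)
        for e in events
    ]


def unmapped_codes(rows: List[CdmEvent]) -> Dict[str, int]:
    """Count unmapped source values, i.e. candidates for the next mapping pass."""
    counts: Dict[str, int] = {}
    for r in rows:
        if r.concept_id == 0:
            counts[r.source_value] = counts.get(r.source_value, 0) + 1
    return counts
```

Keeping unmapped rows with concept_id 0, rather than dropping them, is one way to support the iterative refinement in step (D): the most frequent unmapped source values can be listed and fed back into the mapping.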
Mapping coverage for disease and drug clinical terminologies used across the entire cohort in raw CPRD, HES, and ONS data and converted to the OMOP CDM standard dictionary
| Total unique terms in terminology | Total mapped terms (%) | Unique terms used in events | Used mapped terms (%) | Total unique events | Total excluded events (%) | Total mapped events (%) |
|---|---|---|---|---|---|---|
| 111 163 | 82.13 | 67 886 | 97.58 | 320 328 788 | 0.22 | 97.42 |
| 6519 | 99.98 | 495 | 100 | 13 130 | 0.92 | 100 |
| 17 934 | 85.85 | 10 158 | 90.44 | 31 905 144 | 0.01 | 99.09 |
| 11 000 | 99.01 | 8474 | 99.45 | 8 453 813 | 0 | 99.88 |
| 66 970 | 60.09 | 40 647 | 62.53 | 264 589 509 | 1 | 92.67 |
| 287 | 45.29 | 22 | 72.72 | 27 036 | 1.55 | 99.95 |
| 259 | 51.35 | 245 | 54.28 | 125 581 411 | 0.59 | 54.06 |
| 324 | 97.22 | 324 | 97.22 | 151 645 201 | 12.24 | 98.16 |
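The table does not spell out how each percentage is computed. One plausible reading, sketched below in Python as an assumption rather than the authors' code, is: mapped terms over all terms in the terminology, mapped terms over the distinct terms that actually occur in events, and mapped events over all events. The function names and inputs are hypothetical, and the excluded-events column is left out because its exclusion rules are not described here.

```python
from typing import Dict, Iterable, List


def _pct(part: int, whole: int) -> float:
    """Percentage, returning 0.0 for an empty denominator."""
    return 100.0 * part / whole if whole else 0.0


def coverage(terminology: Iterable[str],
             mapping: Dict[str, int],
             event_codes: List[str]) -> Dict[str, float]:
    """Hypothetical coverage metrics for one source terminology."""
    terms = set(terminology)
    used = set(event_codes)
    mapped_terms = sum(1 for t in terms if t in mapping)
    used_mapped = sum(1 for t in used if t in mapping)
    mapped_events = sum(1 for c in event_codes if c in mapping)
    return {
        "total_unique_terms": len(terms),
        "total_mapped_terms_pct": _pct(mapped_terms, len(terms)),
        "unique_terms_used": len(used),
        "used_mapped_terms_pct": _pct(used_mapped, len(used)),
        "total_events": len(event_codes),
        "mapped_events_pct": _pct(mapped_events, len(event_codes)),
    }


if __name__ == "__main__":
    # Toy data: code "a" is frequent and mapped, "c" is rare and unmapped.
    print(coverage(["a", "b", "c", "d"], {"a": 1, "b": 2}, ["a", "a", "c", "b"]))
```

In this toy input only two of the three codes that occur are mapped (66.7%), yet three of the four events are (75%), because the frequent code is mapped. The same effect could explain rows such as the fifth, where 62.53% of used terms but 92.67% of events are mapped.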
Cohort summary and comparison between the entire cohort of raw CPRD, HES, and ONS data and the OMOP CDM cohort
| | CPRD-HES-ONS source data | OMOP CDM data |
|---|---|---|
| Patients (n) | 502 723 | 502 367 |
| Median follow-up (IQR) | 9.56 (10.39) | 9.56 (10.39) |
| Demographics | | |
| Female (%) | 52.39 | 52.4 |
| Caucasian (%) | 90.81 | 90.46 |
| Most deprived fifth (%) | 15.18 | 15.18 |
| Lifestyle | | |
| Smoker, n (%) | 324 755 (64.59) | 331 445 (65.97) |
| Never smoker, n (%) | 155 995 (31.03) | 149 569 (29.77) |
| Clinical measures, mean (SD) or median (IQR) | | |
| BMI (kg/m²) | 28.9 (6.44) | 28.9 (6.44) |
| SBP (mmHg) | 143.07 (22.42) | 143.07 (22.42) |
| DBP (mmHg) | 80.05 (12.19) | 80.05 (12.19) |
| Platelets | 2.39 (3.53) | 2.39 (3.53) |
| Total WBC counts | 7.49 (2.88) | 7.49 (2.88) |
| Albumin | 40.71 (4.5) | 40.71 (4.5) |
| Creatinine (µmol/L) | 102.76 (58.09) | 102.76 (58.09) |
| Hemoglobin | 129.92 (18.22) | 129.92 (18.22) |
| Medication | | |
| Loop diuretics (%) | 42.2 | 42.2 |
| ACE-I (%) | 50.2 | 50.1 |
| Beta-blockers (%) | 48.3 | 48.2 |
BMI, body mass index; SBP, systolic blood pressure; DBP, diastolic blood pressure; WBC, white blood cell count; ACE-I, angiotensin-converting enzyme (ACE) inhibitors; IQR, interquartile range.
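The differences between the source and OMOP CDM columns of the cohort summary can be quantified directly from the table. A minimal Python sketch: the values are transcribed from the table, while the 1-percentage-point flagging threshold and the function names are hypothetical choices for illustration.

```python
from typing import Dict, Tuple

# Values transcribed from the cohort summary table: (source, OMOP CDM).
COHORT_SIZE: Tuple[int, int] = (502_723, 502_367)
PERCENT_ROWS: Dict[str, Tuple[float, float]] = {
    "Female": (52.39, 52.4),
    "Caucasian": (90.81, 90.46),
    "Most deprived fifth": (15.18, 15.18),
    "Smoker": (64.59, 65.97),
    "Never smoker": (31.03, 29.77),
    "Loop diuretics": (42.2, 42.2),
    "ACE-I": (50.2, 50.1),
    "Beta-blockers": (48.3, 48.2),
}
THRESHOLD_PP = 1.0  # hypothetical tolerance, in percentage points


def cohort_loss_pct(source_n: int, cdm_n: int) -> float:
    """Share of source patients absent from the CDM cohort, in percent."""
    return 100.0 * (source_n - cdm_n) / source_n


def flag_differences(rows: Dict[str, Tuple[float, float]],
                     threshold: float = THRESHOLD_PP) -> Dict[str, float]:
    """Return rows whose CDM value differs from the source by more than threshold."""
    return {
        name: round(cdm - src, 2)
        for name, (src, cdm) in rows.items()
        if abs(cdm - src) > threshold
    }


if __name__ == "__main__":
    print(f"Patients lost: {COHORT_SIZE[0] - COHORT_SIZE[1]} "
          f"({cohort_loss_pct(*COHORT_SIZE):.3f}%)")
    print(flag_differences(PERCENT_ROWS))
```

On these figures, 356 patients (about 0.07%) do not appear in the OMOP CDM cohort, and only the two smoking-status rows move by more than one percentage point.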
Overall comorbidity comparison between the entire cohort of raw CPRD, HES, and ONS data and the OMOP CDM cohort
| Comorbidity | Unique patients, original % (n) | Unique patients, OMOP CDM % (n) | Unmapped patients % (n) | Incorrectly mapped patients % (n) |
|---|---|---|---|---|
| | 35.39 (177 954) | 35.40 (177 866) | 0.05 (91) | 0.001 (3) |
| | 49.55 (249 119) | 53.33 (267 925) | 0.005 (13) | 7.02 (18 819) |
| | 23.86 (119 968) | 24.09 (121 059) | 0.23 (280) | 1.13 (1371) |
| | 20.29 (102 020) | 20.31 (102 028) | 0.01 (11) | 0.02 (19) |
| | 65.84 (331 011) | 65.86 (330 884) | 0.041 (138) | 0.003 (11) |
| | 26.86 (135 047) | 27.34 (137 380) | 0.048 (65) | 1.74 (2398) |
AF, atrial fibrillation; COPD, chronic obstructive pulmonary disease; T2DM, type 2 diabetes; AMI, acute myocardial infarction; HT, hypertension.
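The counts in each row of the comorbidity comparison appear to reconcile exactly: the original count, minus the unmapped patients, plus the incorrectly mapped patients, equals the OMOP CDM count. The percentage columns also appear to use the original and OMOP CDM counts, respectively, as denominators. Both readings are inferred from the numbers, not stated in the source. The Python sketch below checks the identity; rows are identified by position because the comorbidity labels did not survive extraction, and the type and function names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class ComorbidityRow(NamedTuple):
    original_n: int
    omop_n: int
    unmapped_n: int
    incorrect_n: int


# Counts transcribed from the comorbidity table, in row order.
ROWS: List[ComorbidityRow] = [
    ComorbidityRow(177_954, 177_866, 91, 3),
    ComorbidityRow(249_119, 267_925, 13, 18_819),
    ComorbidityRow(119_968, 121_059, 280, 1_371),
    ComorbidityRow(102_020, 102_028, 11, 19),
    ComorbidityRow(331_011, 330_884, 138, 11),
    ComorbidityRow(135_047, 137_380, 65, 2_398),
]


def reconciles(row: ComorbidityRow) -> bool:
    """Check original - unmapped + incorrectly mapped == OMOP CDM count."""
    return row.original_n - row.unmapped_n + row.incorrect_n == row.omop_n


def error_rates(row: ComorbidityRow) -> Tuple[float, float]:
    """Unmapped % of the original count, incorrectly mapped % of the OMOP count."""
    return (100.0 * row.unmapped_n / row.original_n,
            100.0 * row.incorrect_n / row.omop_n)


if __name__ == "__main__":
    for i, row in enumerate(ROWS, start=1):
        unmapped_pct, incorrect_pct = error_rates(row)
        print(f"row {i}: reconciles={reconciles(row)} "
              f"unmapped={unmapped_pct:.3f}% incorrect={incorrect_pct:.3f}%")
```

All six rows satisfy the identity. The recomputed percentages are close to the published ones, though a few differ in the last reported digit; the second row's 18 819 incorrectly mapped patients, for example, give 7.02% of 267 925.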