David A Springate, Rosa Parisi, Ivan Olier, David Reeves, Evangelos Kontopantelis.
Abstract
Research with structured Electronic Health Records (EHRs) is expanding as data becomes more accessible, analytic methods advance, and the scientific validity of such studies is increasingly accepted. However, data science methodology to enable the rapid searching/extraction, cleaning and analysis of these large, often complex, datasets is less well developed. In addition, commonly used software is inadequate, resulting in bottlenecks in research workflows and in obstacles to increased transparency and reproducibility of the research. Preparing a research-ready dataset from EHRs is a complex and time-consuming task requiring substantial data science skills, even for simple designs. In addition, certain aspects of the workflow are computationally intensive, for example extraction of longitudinal data and matching controls to a large cohort, which may take days or even weeks to run using standard software. The rEHR package simplifies and accelerates the process of extracting ready-for-analysis datasets from EHR databases. It has a simple import function to a database backend that greatly accelerates data access times. A set of generic query functions allows users to extract data efficiently without needing detailed knowledge of SQL queries. Longitudinal data extractions can also be made in a single command, making use of parallel processing. The package also contains functions for cutting data by time-varying covariates, matching controls to cases, unit conversion and construction of clinical code lists. There are also functions to synthesise dummy EHR data. The package has been tested with one of the largest primary care EHRs, the Clinical Practice Research Datalink (CPRD), but allows for a common interface to other EHRs. This simplified and accelerated workflow for EHR data extraction results in simpler, cleaner scripts that are more easily debugged, shared and reproduced.
Year: 2017 PMID: 28231289 PMCID: PMC5323003 DOI: 10.1371/journal.pone.0171784
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Definitions of incidence and prevalence terms.
| Term | Definition |
|---|---|
| Incident Numerator | event occurs within year AND transfer out date > event date |
| Incident Denominator | No events in previous years AND transfer out date > year start date |
| Prevalent Numerator | event occurs within year AND transfer out date > event date |
| Prevalent Denominator | transfer out date > year start date |
| Follow-up | minimum of (year end date, transfer out date, death date) - year start date |
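The terms in this table can be read as simple boolean and date arithmetic rules. The following is a minimal Python sketch of those rules for one patient in one calendar year. It is not the rEHR implementation (rEHR is an R package), and the function name `year_terms`, its arguments, the use of a single event date per patient, and the clamping of follow-up at zero are all assumptions made for illustration.

```python
from datetime import date
from typing import Optional


def year_terms(year: int,
               event_date: Optional[date],
               transfer_out: date,
               death: Optional[date] = None) -> dict:
    """Evaluate the table's terms for one patient and one calendar year.

    Assumption: each patient has at most one event date (e.g. first diagnosis).
    """
    year_start = date(year, 1, 1)
    year_end = date(year, 12, 31)

    event_in_year = event_date is not None and year_start <= event_date <= year_end
    event_before_year = event_date is not None and event_date < year_start

    # Numerator: event occurs within year AND transfer out date > event date
    numerator = event_in_year and transfer_out > event_date

    # Prevalent denominator: transfer out date > year start date
    prevalent_denominator = transfer_out > year_start
    # Incident denominator: additionally, no events in previous years
    incident_denominator = prevalent_denominator and not event_before_year

    # Follow-up: min(year end, transfer out, death) - year start, in days.
    # Clamping at zero is an assumption for patients who left before the year.
    end_points = [year_end, transfer_out] + ([death] if death else [])
    followup_days = max((min(end_points) - year_start).days, 0)

    return {
        "numerator": numerator,
        "incident_denominator": incident_denominator,
        "prevalent_denominator": prevalent_denominator,
        "followup_days": followup_days,
    }


# Hypothetical patient: event in mid-2010, transferred out in March 2011.
terms_2010 = year_terms(2010, date(2010, 6, 15), date(2011, 3, 1))
terms_2011 = year_terms(2011, date(2010, 6, 15), date(2011, 3, 1))
```

In this sketch the patient counts towards both denominators in 2010, but only towards the prevalent denominator in 2011, because the event already occurred in a previous year.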
Example code list definition in csv format.
| definition | status | items | |
|---|---|---|---|
| terms | include | peripheral vascular disease | peripheral angiopathy terms disease presenile_gengrene terms |
| terms | include | peripheral gangrene | |
| terms | exclude | wrong answer | |
| terms | include | intermittent claudication | |
| terms | include | thromboangiitis obliterans | |
| terms | include | Diabetic peripheral angiopathy | |
| terms | include | diabetes | |
| terms | include | buerger | |
| terms | exclude | excepted | |
| codes | include | G73 | |
| drugs | include | insulin | |
| drugs | include | aspirin | |
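To make the structure of such a definition file concrete, here is a minimal Python sketch that groups the rows by definition type and status, then applies the include/exclude terms to a few descriptions. It does not reproduce rEHR's `import_definitions` or `definition_search`. The helper names `load_definition` and `match_terms`, the sample descriptions, and the case-insensitive substring matching rule are assumptions for illustration only.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# A few rows taken from the table above, in csv form.
SAMPLE_CSV = """definition,status,items
terms,include,peripheral gangrene
terms,include,intermittent claudication
terms,exclude,excepted
codes,include,G73
drugs,include,aspirin
"""


def load_definition(csv_text: str) -> Dict[str, Dict[str, List[str]]]:
    """Group items by definition type (terms/codes/drugs) and status."""
    definition = defaultdict(lambda: {"include": [], "exclude": []})
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    for row in reader:
        if len(row) < 3:
            continue  # ignore blank or malformed rows
        kind, status, *items = (cell.strip() for cell in row)
        if status not in ("include", "exclude"):
            continue
        # A row may carry several items in trailing columns; drop empty cells.
        definition[kind][status].extend(item for item in items if item)
    return dict(definition)


def match_terms(descriptions: List[str], terms: Dict[str, List[str]]) -> List[str]:
    """Keep descriptions containing an include term and no exclude term.

    Assumed rule: case-insensitive substring matching.
    """
    include = [t.lower() for t in terms["include"]]
    exclude = [t.lower() for t in terms["exclude"]]
    hits = []
    for description in descriptions:
        text = description.lower()
        if any(t in text for t in include) and not any(t in text for t in exclude):
            hits.append(description)
    return hits


definition = load_definition(SAMPLE_CSV)
hits = match_terms(
    ["Intermittent claudication", "Peripheral gangrene excepted", "Asthma"],
    definition["terms"],
)
```

Here `hits` keeps only the first description: the second contains an excluded term and the third matches no include term.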
Available functions in rEHR.
| code file | function | description |
|---|---|---|
| codelists | extract_keywords | Function to extract rows from a lookup table based on keywords |
| | MedicalDefinition | Constructor function for MedicalDefinition class |
| | import_definitions | Imports definitions to be searched from a csv file into a MedicalDefinition object |
| | export_definition_search | Exports definition searches to an Excel file |
| | definition_search | This function is used to build new definition lists based on medical definitions |
| | print.MedicalDefinition | Basic print method for medical definition classes |
| cohort | build_cohort | Converts a longitudinal data set from e.g. ∖ |
| | cut_tv | Cuts a survival dataset on a time varying variable |
| cprd_import | read_zip | Reads a zipped data file to a dataframe |
| | database | Wrapper for dbConnect |
| | add_to_database | Adds a series of files to a database |
| | import_CPRD_data | Imports all selected CPRD data into an SQLite database |
| cprd_medcodes | patients_per_medcode | Produce a dataset of CPRD medcodes with frequencies of patients in the clinical table |
| | medcodes_to_read | Translate CPRD medcodes to Read/OXMIS |
| | read_to_medcodes | Translate Read/OXMIS codes to CPRD medcodes |
| cprd_patients | patients_in_window | Select patients alive and registered between certain dates |
| data | clinical_codes | Clinical codes for 17 QOF conditions, smoking and HbA1c |
| | entity | A sample of 6 clinical tests and measures used in UK primary care |
| | product | A sample of 500 medicines used in UK primary care |
| | repsample_example | An example dataset to demonstrate the repsample function. 2474 theoretical UK general practices |
| | ehr_def | An example EHR_definition object for defining parameters for simulating EHR data |
| db_view | head.SQLiteConnection | head for SQLiteConnection object |
| EHR_definition | define_EHR | Construct an EHR_definition object |
| | print.EHR_definition | Tools for describing EMR_description objects |
| ehr_simulation | random_dates | Generates random dates between a start and end day |
| | surv_sims | Function to simulate survival data |
| | simulate_ehr_patients | Generate a dataframe of simulated patients with exit dates based on presented comorbidities |
| | simulate_ehr_practices | Generate a simulated dataframe of primary care practices in the same format as is used in the CPRD |
| | simulate_ehr_consultations | Generates simulated GP consultation tables |
| | simulate_ehr_events | Generate simulated events tables |
| ehr_system | set_CPRD | Sets EHR metadata to CPRD format |
| | get_EHR_attribute | Return the value of an attribute in the .ehr environment |
| | set_EHR_attribute | Sets the value of an attribute in the .ehr environment |
| | list_EHR_attribute | Lists all of the EHR attribute names in .ehr |
| matching | match_case | Selects controls matching a list of variables from a case |
| | get_matches | Find matched controls for a set of cases |
| | match_on_index | Function for performing matching of controls to cases using the consultation files to generate a dummy index date for controls |
| prevalence | prev_terms | Adds columns enabling one to calculate numerators and denominators for prevalence and incidence |
| | prev_totals | Calculates the prevalence totals for the output of a data frame of events/patients etc |
| select_by_year | select_by_year | Runs a series of selects over a year range and collects in a list of dataframes |
| | build_date_fn | Function to build start/enddate helper functions |
| | qof_years_fn | Helper function providing startdate and enddate for QOF years |
| | qof_15_month_fn | Helper function providing startdate and enddate for QOF 15 month periods |
| | standard_years_fn | Helper function providing startdate and enddate for calendar years |
| select_events | select_events | Extracts from the database |
| | first_events | Selects the earliest event grouped by patient |
| | last_events | Selects the latest event grouped by patient |
| temp_tables | temp_table | Creates a temporary table in the database |
| | append_to_temp_table | Appends rows to a temporary table |
| | to_temp_table | Send a dataframe to a temporary table in the database |
| | drop_temp_table | Checks if a temporary table exists and then deletes if it does |
| | drop_all_temp_tables | Checks if any temporary tables exist and then deletes all |
| | temp_location | Sets location of the db temporary store for temporary tables |
| uniform_units | cprd_uniform_hba1c_values | Standardises HbA1c values to mmol/mol |
| utils | compress | Compresses a dataframe to make more efficient use of resources |
| | to_stata | Compresses a dataframe and saves in Stata format. Options to save as Stata 12 or 13 |
| | wrap_sql_query | Combines strings and vectors in a sensible way for select queries |
| | expand_string | Reads strings and expands sections wrapped in dotted parentheses |
| | convert_dates | Converts date fields from ISO character string format to R Date format |
| | export_fn | Exports to a variety of formats based on the file type argument |
| | flat_files | Exports flat files from the database. One file per practice |
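Several of the functions listed above wrap queries against an SQLite backend, for example selecting the earliest event per patient. As a rough illustration of that idea, the Python sketch below uses the standard `sqlite3` module with an in-memory database. It is not the rEHR API: the helper `earliest_event_per_patient`, the toy rows, and the simplified three-column `clinical` table are assumptions, and the table name is interpolated into the SQL on the assumption that it comes from trusted code.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Toy in-memory database standing in for an imported EHR extract.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clinical (patid INTEGER, eventdate TEXT, medcode INTEGER)"
)
conn.executemany(
    "INSERT INTO clinical VALUES (?, ?, ?)",
    [
        (1, "2005-03-01", 100),
        (1, "2003-07-15", 100),
        (2, "2006-01-10", 101),
        (2, "2004-01-01", 999),  # code outside the list of interest
        (3, "2007-01-01", 999),
    ],
)


def earliest_event_per_patient(conn: sqlite3.Connection,
                               table: str,
                               medcodes: Iterable[int]) -> List[Tuple[int, str]]:
    """Return (patid, earliest eventdate) for events matching the given codes.

    ISO-formatted date strings sort chronologically, so MIN() on the text
    column yields the earliest date.
    """
    codes = list(medcodes)
    placeholders = ", ".join("?" for _ in codes)
    query = (
        f"SELECT patid, MIN(eventdate) AS eventdate "
        f"FROM {table} WHERE medcode IN ({placeholders}) "
        f"GROUP BY patid ORDER BY patid"
    )
    return conn.execute(query, codes).fetchall()


first = earliest_event_per_patient(conn, "clinical", [100, 101])
```

Building the query from a code list in this way is what lets a user avoid writing SQL by hand, which is the convenience the query functions in the table are described as providing.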