Abstract
In the United States, electronic health records (EHR) are increasingly being incorporated into healthcare organizations to document patient health and services rendered. EHRs serve as a vast repository of demographic, diagnostic, procedural, therapeutic, and laboratory test data generated during the routine provision of health care. The appeal of using EHR data for epidemiologic research is clear: EHRs generate large datasets on real-world patient populations in an easily retrievable form permitting the cost-efficient execution of epidemiologic studies on a wide array of topics. Constructing epidemiologic cohorts from EHR data involves as a defining feature the development of data machinery, which transforms raw EHR data into an epidemiologic dataset from which appropriate inference can be drawn. Though data machinery includes many features, the current report focuses on three aspects of machinery development of high salience to EHR-based epidemiology: (1) selecting study participants; (2) defining "baseline" and assembly of baseline characteristics; and (3) follow-up for future outcomes. For each, the defining features and unique challenges with respect to EHR-based epidemiology are discussed. An ongoing example illustrates key points. EHR-based epidemiology will become more prominent as EHR data sources continue to proliferate. Epidemiologists must continue to improve the methods of EHR-based epidemiology given the relevance of EHRs in today's healthcare ecosystem.
Keywords: cohort studies; electronic health records; epidemiology; retrospective studies
Year: 2021 PMID: 34948800 PMCID: PMC8701170 DOI: 10.3390/ijerph182413193
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Comparison of study design attributes in prospective vs. retrospective studies using electronic health records.
| Study Design Attribute | Prospective Study | Retrospective Study with EHR |
|---|---|---|
| Selecting Research Participants | Inclusion and exclusion criteria prospectively applied; volunteers willing to undergo study procedures and be actively followed | Inclusion and exclusion criteria applied post hoc; patrons of a healthcare organization; preferably exceed some minimal information threshold |
| Baseline | Often date of some salient health event such as diagnosis of a condition, procedure performed, or date informed consent provided among generally healthy volunteers | Any time on or between first and last EHR-documented encounters; preferably on the date of an actual encounter; preferably not first encounter |
| Assembling Baseline Characteristics | At baseline, measured by questionnaires, blood tests, imaging, and other measurement instruments in a standardized manner among all study participants | Only considers data elements collected during usual care; uses encounters occurring on or before the specified baseline; qualitative characteristics determined by rules for the 99% |
| Follow-up for Future Outcomes | Active, standardized follow-up at regular intervals in all study participants; adjudication of claimed outcomes | Passive follow-up; only study outcomes documented at study institution are identified; rules for the 99% apply |
Figure 1. The information quality spectrum of electronic health record data. Patients within an electronic health record database have highly variable information quality that is dependent on multiple factors such as frequency and types of interactions with healthcare organizations. Moving down the information quality spectrum allows more patients to be included in epidemiologic studies, but at the expense of information quality.
Figure 2. An individual patient’s electronically documented journey through a healthcare organization depicted as a timeline, bookended by the first and last EHR-documented encounters, with multiple, variably spaced, and qualitatively different types of encounters between. Colored triangles represent different encounter types.
Definitions of terms and phrases.
| Term or Phrase | Definition |
|---|---|
| Encounter | Any professional contact between a patient and healthcare organization, including primary care, specialty care, laboratory testing, emergency department visits, hospital admissions, etc. |
| Opportunity for Information | The collection of pre-baseline encounters that could provide usable research information. Can be expressed in units of time (days from first encounter to baseline encounter) or as number of encounters (between first and baseline encounters). |
| Creating Rules for the 99% | When assembling baseline characteristics for an EHR-based retrospective study, rules must be created for determining presence/absence of qualitative characteristics and values for quantitative characteristics. This informal expression implies that imperfect rules must be implemented that work well for the majority but rarely universally. |
| Looking for Yes | An expression applied when determining the presence/absence of a binary characteristic, denoting how rules typically only look for positive affirmations of the characteristic and rarely negative affirmations. |
| Hidden Missingness | A phrase describing the scenario where a qualitative condition (e.g., diagnosis) is labeled “absent” but was never queried nor investigated in clinical practice. Thus, the condition’s true status as present/absent is actually undetermined despite being labeled “absent”. |
| Weak No | A scenario where a qualitative condition (e.g., a diagnosis) is labeled absent based on weak information. |
| Strong No | A scenario where a qualitative condition (e.g., a diagnosis) is labeled absent based on strong information. |
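The definitions of Opportunity for Information and Looking for Yes can be illustrated with a minimal Python sketch. The function names, the example dates, and the diagnosis codes are hypothetical and chosen only for illustration. The sketch assumes that the encounter count includes both the first and the baseline encounter, a detail the definition leaves open.

```python
from datetime import date


def opportunity_for_information(encounter_dates, baseline):
    """Return (days, n_encounters) for encounters on or before baseline.

    Assumption: both the first encounter and the baseline encounter
    are counted, and days run from first encounter to baseline.
    """
    prior = sorted(d for d in encounter_dates if d <= baseline)
    if not prior:
        return 0, 0
    days = (baseline - prior[0]).days
    return days, len(prior)


def looking_for_yes(recorded_codes, target_codes):
    """Label a binary characteristic by searching only for affirmations.

    True means at least one target code was recorded. False means only
    that no affirmation was found, so it may hide missingness.
    """
    return any(code in target_codes for code in recorded_codes)


# Hypothetical patient: four encounters, baseline set at the third.
encounters = [date(2015, 3, 1), date(2016, 7, 15),
              date(2018, 1, 10), date(2019, 5, 20)]
days, n_encounters = opportunity_for_information(encounters, date(2018, 1, 10))
print(days, n_encounters)

# Illustrative codes only; the rule never sees a recorded "no".
print(looking_for_yes(["I10", "E11.9"], {"E11.9"}))
```

A `False` from `looking_for_yes` does not distinguish a Weak No from a Strong No; one possible approach, not taken from the source, is to pair the label with the opportunity-for-information values so that absences backed by few encounters can be flagged.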
Figure 3. Opportunity for Information (diabetes and heart failure hospitalization example). Relative frequency histograms of the number of encounters used to determine baseline characteristics for all encounters (in blue) and office visits only (in red).
Figure 4. Opportunity for Information (diabetes and heart failure hospitalization example): the effect of applying a two-year, pre-baseline time restriction. Relative frequency histograms of the number of total encounters used to determine baseline characteristics with no pre-baseline time restriction (in blue) and restricting to encounters occurring within two years of baseline (in orange).
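A pre-baseline time restriction of this kind could be applied roughly as follows. This is a sketch under stated assumptions rather than the study's actual procedure: the helper names are hypothetical, "within two years" is taken to mean on or after the same calendar date two years before baseline, and the example dates are invented.

```python
from datetime import date


def lookback_cutoff(baseline, years=2):
    """Earliest date still inside the lookback window (assumed inclusive)."""
    try:
        return baseline.replace(year=baseline.year - years)
    except ValueError:
        # Baseline is Feb 29 and the target year is not a leap year.
        return baseline.replace(year=baseline.year - years, day=28)


def encounters_in_window(encounter_dates, baseline, years=None):
    """Encounters on or before baseline, optionally limited to a lookback."""
    prior = [d for d in encounter_dates if d <= baseline]
    if years is None:
        return prior  # no pre-baseline time restriction
    cutoff = lookback_cutoff(baseline, years)
    return [d for d in prior if d >= cutoff]


# Hypothetical patient with baseline on 2018-01-10.
encounters = [date(2015, 3, 1), date(2016, 7, 15),
              date(2018, 1, 10), date(2019, 5, 20)]
baseline = date(2018, 1, 10)
print(len(encounters_in_window(encounters, baseline)))
print(len(encounters_in_window(encounters, baseline, years=2)))
```

Counting the returned encounters per patient, with and without the restriction, would give the two distributions that the histograms compare.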
Loss of information when restricting pre-baseline time intervals for assessment of baseline characteristics.
| Baseline Characteristic | No Restriction | 2-Year Restriction |
|---|---|---|
| Hypertension | 71% | 67% |
| High cholesterol | 69% | 64% |
| Coronary bypass surgery | 7% | 6% |
| Heart failure | 11% | 10% |
| Acute myocardial infarction | 8% | 7% |
| Chest pain | 22% | 15% |
| Shortness of breath | 16% | 12% |
| Depression | 25% | 21% |
Numbers in table cells are the percentage of patients with a history of the characteristic. Denominator size is 79,354.
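The size of the loss can be summarized from the percentages in the table. The sketch below uses a hypothetical helper, `information_loss`, to compute the absolute drop in percentage points and the relative drop for each characteristic. Because the published percentages are rounded, the results are approximate.

```python
# Percentages from the table: (no restriction, 2-year restriction).
prevalence = {
    "Hypertension": (71, 67),
    "High cholesterol": (69, 64),
    "Coronary bypass surgery": (7, 6),
    "Heart failure": (11, 10),
    "Acute myocardial infarction": (8, 7),
    "Chest pain": (22, 15),
    "Shortness of breath": (16, 12),
    "Depression": (25, 21),
}


def information_loss(no_restriction, restricted):
    """Return (absolute loss in percentage points, relative loss as a fraction)."""
    absolute = no_restriction - restricted
    relative = absolute / no_restriction
    return absolute, relative


for name, (unrestricted, restricted) in prevalence.items():
    absolute, relative = information_loss(unrestricted, restricted)
    print(f"{name}: -{absolute} points ({relative:.0%} relative loss)")

# Characteristic with the largest relative loss under the restriction.
largest = max(prevalence, key=lambda k: information_loss(*prevalence[k])[1])
print(largest)
```

On these rounded figures, symptom-type characteristics such as chest pain and shortness of breath lose proportionally more than chronic diagnoses such as hypertension.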