
A General Framework for Considering Selection Bias in EHR-Based Studies: What Data Are Observed and Why?

Abstract

Electronic health records (EHR) data are increasingly seen as a resource for cost-effective comparative effectiveness research (CER). Since EHR data are collected primarily for clinical and/or billing purposes, their use for CER requires consideration of numerous methodologic challenges including the potential for confounding bias, due to a lack of randomization, and for selection bias, due to missing data. In contrast to the recent literature on confounding bias in EHR-based CER, virtually no attention has been paid to selection bias, possibly due to the belief that standard methods for missing data can be readily applied. Such methods, however, hinge on an overly simplistic view of the available/missing EHR data, so that their application in the EHR setting will often fail to completely control selection bias. Motivated by challenges we face in an ongoing EHR-based comparative effectiveness study of choice of antidepressant treatment and long-term weight change, we propose a new general framework for selection bias in EHR-based CER. Crucially, the framework provides structure within which researchers can consider the complex interplay between numerous decisions, made by patients and health care providers, which give rise to health-related information being recorded in the EHR system, as well as the wide variability across EHR systems themselves. This, in turn, provides structure within which: (i) the transparency of assumptions regarding missing data can be enhanced, (ii) factors relevant to each decision can be elicited, and (iii) statistical methods can be better aligned with the complexity of the data.

Keywords: 2014 Group Health Seattle Symposium; Comparative Effectiveness Research (CER); Electronic Health Record (EHR); Methods; Missing Data; Selection Bias

Year: 2016 PMID: 27668265 PMCID: PMC5013936 DOI: 10.13063/2327-9214.1203

Source DB: PubMed Journal: EGEMS (Wash DC) ISSN: 2327-9214

Introduction

Electronic health records (EHR) are playing an increasingly prominent role in comparative effectiveness research (CER), with key benefits including that they often contain rich information on large populations over long time frames, are relatively inexpensive to obtain, and can be updated in near real time.1–4 Recognizing these benefits, the Institute of Medicine recently called for increased use of EHR data for research.5 Notwithstanding their huge potential, however, since EHRs are typically developed for billing and/or clinical purposes, and not with any specific research agenda in mind, researchers must ask whether or not the available EHR data are “research quality.”6–8 This includes consideration of whether all covariates relevant to the research goals are routinely collected in clinical care, whether covariates that are collected are recorded consistently across patients and time, and whether the available data are accurate and error free. Without consideration of these issues, naïve analyses may be subject to numerous biases, the most commonly cited form being confounding bias.9 Another important type of bias is selection bias, which arises when some patients identified as being eligible for inclusion in the study are found to have insufficient data to be included in the analyses.10 Some patients may, for example, have missing baseline treatment, missing clinical information, missing laboratory measurements during follow-up, or have disenrolled from the health plan prior to the end of planned follow-up. Unfortunately, in contrast to confounding bias,11–21 the control of selection bias in EHR-based settings has received virtually no attention in the literature.
This may be due, in part, to the notion that selection bias can be cast as a missing data problem and that statistical methods for missing data are well established22,23 and can be readily applied to EHR-based CER.24 Regardless of the specific method used, a critical step in any analysis involving missing data is the consideration of whether the data are missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). In the context of an EHR-based study, this corresponds to addressing the question of why some patients have complete data and others do not. In practice, this is typically operationalized by first conceiving of a so-called missingness mechanism that drives whether or not a patient has complete data, and secondly by determining which factors influence it. This approach may be reasonable in many settings, especially those in which the data collection scheme and design are under the control of the research team. Adopting this approach in the EHR setting, however, likely corresponds to an unrealistically simple view of the missing data. In particular, restricting one’s focus to a single mechanism for missingness masks the complex interplay of the numerous decisions made by patients, health care providers, and health systems that, collectively, give rise to the observed data. Naïvely moving forward with this standard approach, therefore, will likely result in a failure to completely control selection bias. To resolve this challenge, we propose a new, general framework for consideration of selection bias in the complex setting of EHR-based CER. Central to the proposed framework is a shift away from asking “what data are missing and why?” to asking instead “what data are observed and why?”. 
As we will elaborate upon, this shift provides researchers with a natural and intuitive approach to determining and understanding the sequence of decisions that must be made in order for a measurement to be recorded in the EHR and, ultimately, ensuring as thorough a control for selection bias as possible.
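
For reference, the three missingness assumptions named above can be stated compactly. The notation is illustrative and is not taken from the study itself: $S$ indicates that a patient has complete data, and $Z_{\text{obs}}$ and $Z_{\text{mis}}$ denote the always-observed and the potentially missing parts of the data. This is a simplified statement of the usual definitions, written for the complete-data indicator only.

```latex
\begin{align*}
\text{MCAR:}\quad & P(S = 1 \mid Z_{\text{obs}}, Z_{\text{mis}}) = P(S = 1) \\
\text{MAR:}\quad  & P(S = 1 \mid Z_{\text{obs}}, Z_{\text{mis}}) = P(S = 1 \mid Z_{\text{obs}}) \\
\text{MNAR:}\quad & P(S = 1 \mid Z_{\text{obs}}, Z_{\text{mis}}) \text{ depends on } Z_{\text{mis}} \text{ even after conditioning on } Z_{\text{obs}}
\end{align*}
```

Each statement treats completeness as the outcome of one mechanism, which is the simplification the proposed framework moves away from.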

Antidepressants and Weight Change

The proposed framework is motivated by challenges we currently face in an ongoing comparative effectiveness study of antidepressant treatment and long-term weight change. Here we briefly describe the study, as well as the potential for selection bias.

Study Setting

The study is being conducted at Group Health, an integrated insurance and health care delivery system in Washington State. As part of its clinical systems, Group Health maintains numerous electronic databases including an EHR based on EpicCare (Epic Systems Corporation of Madison, WI), and a pharmacy database that has recorded all prescriptions dispensed at Group Health–owned pharmacies since 1977. Additionally, electronic databases track demographic information; inpatient treatment and outpatient encounter claims; insurance and enrollment status; and primary care visit appointments.

Study Population

To investigate the relationship between antidepressant choice and weight change, we identified all adults ages 18–65 years with a new episode of treatment for depression between January 2006 and November 2007. New treatment episodes were identified on the basis of a dispensing for an antidepressant medication without the occurrence of any treatment (including psychotherapy) in the prior nine months; thus, only subjects with at least nine months of continuous enrollment prior to the index date were included. Applying these criteria, we identified 9,704 patients in the Group Health EHR.

Primary Outcome

While previous studies indicate that certain antidepressants may reduce body weight (e.g., fluoxetine and bupropion) and others may increase body weight (e.g., paroxetine and mirtazapine), the existing literature is limited in that it focuses on short-term outcomes of 12 months or less.25–29 Given the lack of evidence regarding long-term outcomes we took weight change at 24 months post-treatment initiation to be the primary outcome.

Weight Information in the Electronic Health Record (EHR)

For each of the 9,704 patients identified by the inclusion and exclusion criteria, we extracted all records of an outpatient visit for the interval starting 24 months prior to the start of the treatment episode through November 2009. This resulted in 354,945 records, from which information was extracted on weight, potential confounders, and auxiliary variables such as age, gender, smoking history, and comorbid conditions. Focusing on weight, although a patient’s underlying trajectory follows some smooth path over time, the EHR provides only a series of “snapshots” of a patient’s weight post-treatment initiation. To illustrate this, Figure 1 provides a graphical summary of the available weight-related information for 12 select patients during the interval starting 180 days prior to the start of the treatment episode and ending at the 24-month mark. Across the panels, gray lines indicate times when the patient had an encounter with the clinical system, blue dots indicate that a weight measurement was recorded, and black lines indicate that a patient disenrolled from Group Health.

Figure 1.

Summary of Weight-Related Information for 12 Patients in the Group Health EHR–Based Study of Treatment for Depression and Weight Change

Gray lines indicate when an encounter occurred; blue dots indicate a weight measurement; and black lines indicate that the patient disenrolled prior to the 24-month mark.

From the first row of Figure 1, we see that some patients have rich weight-related information in the EHR, with numerous encounters with the health system over the 24-month follow-up interval as well as numerous weight measurements. In contrast, the eight patients in the second and third rows of Figure 1 have relatively sparse weight-related information, with substantial variation in the number of clinical encounters and weight measurements.

The Potential for Selection Bias

With respect to the primary outcome of weight change at 24 months, the “ideal” data scenario would be that complete weight information is available in the EHR at baseline and at 24 months post-treatment initiation for all 9,704 patients identified by the inclusion and exclusion criteria. To simplify the illustration of potential selection bias, we focus on the 8,631 patients (88.9 percent) with complete weight information at treatment initiation based on a ±30 day window. Among these patients, only 2,408 (27.9 percent) have a valid weight measurement recorded in the EHR at 24 months (again based on a ±30 day window). Given this substantial missingness, and notwithstanding missingness in other important covariates such as confounders, there is clear potential for selection bias. Specifically, naïve analyses based solely on the 2,408 patients with complete data will be biased if these patients are not representative of the population defined by the inclusion/exclusion criteria. Cast as a missing data problem, the standard approach to selection bias first conceives of a missingness mechanism that drives whether or not a patient has complete data.10 Figure 2(a) provides a graphical representation, with S=0/1 indicating that a patient has incomplete or complete weight information at 24 months. Intuitively, this mechanism can be thought of as corresponding to a particular decision, made by the study participant, such as the decision to drop out from the study. Toward investigating the potential for selection bias and its impact, one could explore determinants of this (binary) decision via logistic regression analyses. Focusing on the 8,631 patients with complete data at baseline, the first three columns of Table 1 indicate that female patients have significantly higher estimated odds of complete weight information at 24 months, as do older patients.
Furthermore, whether a patient has complete data at 24 months also depends, in part, on the choice of treatment; and patients with a higher baseline weight are estimated to have higher odds of complete data, with an odds ratio (OR) of 1.05 for a 20-lb. increase in weight; 95 percent confidence interval (CI): (1.03, 1.07).

Figure 2.

Alternative Specifications for Observance of Complete Weight Data at 24 Months Post-treatment Initiation

Panel (a) corresponds to the traditional, single mechanism approach to selection bias; panel (b) corresponds to one possible implementation of the proposed framework that acknowledges the complexity of EHR data.

Table 1.

Results from Logistic Regression Analyses Examining the Association Between Select Patient-Level Characteristics and Whether or Not a Patient Has Complete Weight Information at 24 Months

	SINGLE MECHANISM: WEIGHT DATA AT 24 MONTHS (N=8,631)			SUB-MECHANISM NO. 1: ACTIVE ENROLLMENT AT 24 MONTHS (N=8,631)			SUB-MECHANISM NO. 2: ENCOUNTER AT 24 MONTHS ±30 DAYS, GIVEN ENROLLMENT AT 24 MONTHS (N=6,570)			SUB-MECHANISM NO. 3: WEIGHT MEASURED AT 24 MONTHS ±30 DAYS, GIVEN AN ENCOUNTER AT 24 MONTHS ±30 DAYS (N=3,688)
	OR	95% CI	P	OR	95% CI	P	OR	95% CI	P	OR	95% CI	P
Female	1.33	(1.19, 1.48)	<0.001	1.11	(0.99, 1.24)	0.070	1.30	(1.16, 1.45)	<0.001	1.20	(1.03, 1.40)	0.022
Age*	1.16	(1.12, 1.21)	<0.001	1.41	(1.35, 1.46)	<0.001	1.10	(1.05, 1.14)	<0.001	0.97	(0.92, 1.03)	0.277
Antidepressant
Fluoxetine	1.00	REF	<0.001	1.00	REF	<0.001	1.00	REF	<0.001	1.00	REF	0.014
Bupropion	1.05	(0.89, 1.22)		1.01	(0.86, 1.19)		1.14	(0.97, 1.33)		0.92	(0.74, 1.15)
Mirtazapine	1.29	(0.68, 2.45)		0.94	(0.46, 1.93)		1.18	(0.59, 2.33)		1.54	(0.55, 4.31)
Paroxetine	1.09	(0.83, 1.42)		1.27	(0.95, 1.72)		1.19	(0.91, 1.55)		0.83	(0.58, 1.19)
SSRI	0.90	(0.78, 1.03)		0.87	(0.76, 1.00)		1.08	(0.95, 1.24)		0.79	(0.65, 0.95)
SARI	1.51	(1.26, 1.80)		1.36	(1.09, 1.69)		1.59	(1.32, 1.93)		1.09	(0.85, 1.39)
Tricyclics	1.80	(1.53, 2.11)		1.33	(1.09, 1.63)		1.93	(1.61, 2.32)		1.28	(1.01, 1.61)
Weight at baseline*	1.05	(1.03, 1.07)	<0.001	1.02	(1.00, 1.04)	0.068	1.03	(1.01, 1.05)	0.003	1.05	(1.02, 1.08)	<0.001

Note

Age odds ratio is for a 10-year contrast; weight at baseline OR is for a 20 lb. contrast. OR = odds ratio; CI = confidence interval.
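
Because the odds ratios in Table 1 are reported for fixed contrasts (10 years of age, 20 lb. of baseline weight), it can be useful to re-express them for other contrasts. The sketch below assumes that the covariate enters the logistic model linearly on the log-odds scale, which the table note does not state explicitly; the function name `rescale_odds_ratio` is hypothetical.

```python
import math


def rescale_odds_ratio(odds_ratio, reported_contrast, new_contrast):
    """Re-express an odds ratio for a different covariate contrast.

    Assumes a linear effect on the log-odds scale, so the log odds ratio
    scales proportionally with the size of the contrast.
    """
    if odds_ratio <= 0 or reported_contrast == 0:
        raise ValueError("odds_ratio must be positive and reported_contrast nonzero")
    per_unit_log_odds = math.log(odds_ratio) / reported_contrast
    return math.exp(per_unit_log_odds * new_contrast)


# Single-mechanism OR for baseline weight: 1.05 per 20 lb. (Table 1).
or_per_40_lb = rescale_odds_ratio(1.05, reported_contrast=20, new_contrast=40)
print(f"OR per 40 lb.: {or_per_40_lb:.4f}")
```

The same function can be applied to each confidence limit, since the transformation is monotone.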

A General Framework

Overall, the preliminary evidence presented so far strongly suggests that a naïve (unadjusted) analysis of the relationship between treatment choice and 24-month weight change will suffer from selection bias. To resolve this one could use multiple imputation,30,31 inverse-probability weighting,32 or both.33 The validity of each of these approaches, however, hinges on the appropriateness of the single mechanism and decision strategy as an approach to evaluating assumptions and performing necessary adjustments. In the clinical contexts that EHRs represent, however, a single mechanism is unlikely to fully characterize the complex set of decisions— made by the patient, their health care provider, and the health care system—that give rise to complete data in the EHR. As such, use of a single mechanism, as illustrated in Figure 2(a), will be unrealistically simple. To resolve this we propose a new framework for addressing selection bias in EHR-based CER, one that acknowledges and integrates the complex set of decisions that give rise to complete data. In the following we provide a detailed discussion of each aspect of the framework; Figure 3 provides an overview of the framework in the form of a process flow.
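
As a rough illustration of how inverse-probability weighting relies on a missingness mechanism, the sketch below reweights complete cases by the inverse of their estimated probability of having complete data. The function name, the normalized form of the estimator, and the toy numbers are assumptions made for illustration; they are not the analysis used in the antidepressants study.

```python
def ipw_mean(outcomes, completeness_probs):
    """Normalized inverse-probability-weighted mean over complete cases.

    outcomes: observed outcomes for patients with complete data.
    completeness_probs: each patient's estimated probability of having
        complete data (from a missingness model fitted elsewhere).
    """
    if len(outcomes) != len(completeness_probs):
        raise ValueError("outcomes and completeness_probs must have equal length")
    if any(p <= 0 or p > 1 for p in completeness_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    weights = [1.0 / p for p in completeness_probs]
    weighted_sum = sum(w * y for w, y in zip(weights, outcomes))
    return weighted_sum / sum(weights)


# Toy example: two complete cases with hypothetical 24-month weight changes.
weight_changes = [2.0, 4.0]
probs = [0.5, 0.25]  # the second patient stands in for more unobserved patients
estimate = ipw_mean(weight_changes, probs)
print(estimate)
```

Under a single mechanism, all of the probabilities come from one model; if that model omits determinants of completeness, the weights will not fully correct for selection bias.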

Figure 3.

Process Flow Representation of the Proposed Framework for Selection Bias in EHR-Based Studies That Can Be Used in Conjunction with

Consideration of the “Ideal” Study

Central to the proposed framework are two key principles. The first is that researchers initially specify the structure of the data that would have been collected had they had the opportunity to conduct an “ideal” study.34,35 This specification will depend primarily on the scientific goals of the study, which will, in turn, determine which specific covariates are needed as well as their timing. In the antidepressants study, given the primary interest in weight change at 24 months, such a data structure would at a minimum include weight at baseline and at 24 months. If the primary interest lay with understanding a patient’s trajectory over the first 24 months following treatment initiation, the ideal data structure would additionally include intermediary weight measurements, the timing of which would depend on the desired level of granularity. Beyond outcome information, the data structure for the ideal study would also include covariates necessary for the implementation of inclusion/exclusion criteria, covariates relevant to characterizing treatment choice, and potential confounders and effect modifiers. Practically, specification of this structure could be approached much in the same way that researchers approach detailing data collection strategies in grant proposals.

Consideration of Data Provenance

Given an ideal data structure, the second key principle is that researchers frame the task of controlling selection bias with the question “what data are observed and why?” Crucially, in doing so researchers can readily break down the complex process that governs whether or not a patient has complete data, referred to as the “provenance” of the data,36 into a series of more manageable components or sub-mechanisms. Given the wide variation in study questions addressed in CER, as well as the heterogeneity in EHR systems, one cannot, unfortunately, be prescriptive in this task; no single set of sub-mechanisms will be appropriate or sufficient for all studies. Nevertheless, Table 2 provides a list of sub-mechanisms that researchers could consider, each of which is accompanied by one or more contextual questions that could be used to determine its relevance for a particular study. Prior to describing them, we emphasize that each sub-mechanism could be considered for any given data element identified as being relevant for the ideal study (i.e., the outcome, treatment, and confounders, possibly measured at different time points).

Table 2.

Sub-mechanisms Potentially Relevant to Whether or not a Data Element Is Recorded in the EHR

SUB-MECHANISM	CONTEXTUAL QUESTIONS
1. Enrollment status	Was the patient continuously enrolled in the health plan or system during the time frame of interest or, at least, at the time points of interest?
2. Multiple facilities and institutions	Did the patient potentially receive care at multiple facilities and institutions? If so, do they maintain comparable and compatible EHR systems, and are they linked?
3. Encounters with the health system	Did the patient initiate an encounter with the health systems that the EHR corresponds to? If the patient initiated an encounter, was it of a type that would reasonably be expected to generate an entry in the EHR?
4. Measurement	If a relevant encounter type was initiated, was the measurement of interest recorded in the EHR? Are there any clinical reasons or contraindications why a patient would not have had a measurement taken? Could clinical priorities during an encounter have had an impact on whether or not a measurement was taken and recorded?
5. Structural changes	Have standards of care changed over time in a way that impacts which measurements are recorded and how? Has the EHR system evolved over time, either in terms of structure or of coding policies and procedures? If so, could these changes have influenced which measurements could have been recorded and how?

The first sub-mechanism in Table 2 refers to a patient’s enrollment status. In some settings, for a measurement to be recorded in the EHR, the patient must have been actively enrolled in some specific health plan or system. In other settings, enrollment status may be less relevant or not relevant at all. For example, since all individuals 65 years and older in the United States are automatically enrolled in Medicare, studies using Medicare claims data need not consider enrollment status once an individual is 65 years of age. If enrollment status is relevant, one may find that some patients have multiple periods of enrollment during the observation period. In the antidepressants study, for example, 617 of the 8,631 patients with complete data at baseline have at least two periods of enrollment; assuming that gaps of under 92 days do not represent actual discontinuities in coverage, Figure 4(b) summarizes the distribution of the first such gap across these patients. A second, related phenomenon is that when EHR data are extracted they are typically subject to administrative censoring at the date of extraction. In the antidepressants study, this date was November 2009. To simplify the exposition, we restricted the study to patients with at least two years of potential follow-up (i.e., we included only patients with a new treatment episode prior to November 2007). An alternative would have been to include all patients who initiated a treatment episode prior to November 2009 and to use survival analysis methods to explore enrollment status at 24 months while accommodating censoring.
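
The rule that gaps of under 92 days are not treated as real discontinuities in coverage can be sketched as a merge over enrollment periods. The data layout (a list of `(start, end)` date pairs per patient) and the function names are assumptions for illustration; the source does not describe how the rule was implemented.

```python
from datetime import date

MAX_BRIDGED_GAP_DAYS = 92  # gaps shorter than this count as continuous coverage


def merge_enrollment_periods(periods, max_gap_days=MAX_BRIDGED_GAP_DAYS):
    """Merge (start, end) periods separated by fewer than max_gap_days days."""
    merged = []
    for start, end in sorted(periods):
        if merged and (start - merged[-1][1]).days < max_gap_days:
            # Short gap: extend the previous period instead of starting a new one.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def enrolled_on(periods, target):
    """True if target falls inside any enrollment period after bridging short gaps."""
    return any(start <= target <= end
               for start, end in merge_enrollment_periods(periods))


# Hypothetical patient with one short gap (32 days) and one long gap.
periods = [
    (date(2006, 1, 1), date(2006, 6, 30)),
    (date(2006, 8, 1), date(2007, 1, 31)),
    (date(2007, 9, 1), date(2008, 1, 1)),
]
print(merge_enrollment_periods(periods))
```

Applying `enrolled_on` at the 24-month mark gives one possible definition of active enrollment at that time point.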

Figure 4.

Summary Information Regarding Disenrollment and Censoring of Patient Follow-Up During the 24-Month Interval Post-treatment Initiation Among the 8,631 Patients in the Antidepressants and Weight Change Study Who Had an Observed Weight Measurement at Baseline

The right-hand panel shows estimates of the cumulative probability of being disenrolled and being either disenrolled or censored. The left-hand panel shows 617 patients with more than one distinct period of enrollment during follow-up, specifically the distribution of length of the first gap in enrollment.

Regardless of enrollment status, a patient may choose to receive their health care at different facilities and institutions. At Group Health, while 70 percent of patients receive mental health care within an integrated group practice that uses the same EpicCare EHR, 30 percent receive mental health care from an external network of providers. For these patients, although billing information is readily available, clinical information (including weight) may not be routinely collected using the same standards of care. Beyond Group Health–like settings, EHR-based studies conducted in tertiary care hospital settings may have detailed clinical information relevant to the condition that led to the admission and hospital stay but may not have broader information.

The third sub-mechanism in Table 2 considers the timing and nature of the clinical encounter. Clearly, for a measurement to be recorded in the EHR at a particular time point, an encounter must have been initiated. As highlighted in Figure 1, there can be substantial variation in the timing and intensity of encounters across patients. There can also be substantial variation in the type of encounter, with patients interacting with their health care providers via a primary care or specialty care visit, an inpatient or outpatient visit, or an urgent care or routine care visit. Furthermore, patients increasingly have the option to interact with their providers virtually, via telephone encounters or secure messaging systems. Since measurements of interest, such as weight in the antidepressants study, may not be collected during all encounter types (either at all or routinely), it is important to identify (1) the types of care options that patients have, and (2) which types are captured in the EHR.

The fourth sub-mechanism speaks to the actual measurement and recording of information. From Figure 1 it is clear that measurements may not be taken during all encounters. Critical to this sub-mechanism is that whether or not a measurement is taken may be dictated by decisions made by the patient, health care provider, and health system. For example, a patient may decide not to be weighed, or a physician may decide there is insufficient time to weigh the patient or may decide not to record a measurement in the light of extenuating circumstances (e.g., blood glucose may not be measured if the patient is known not to have fasted).

Finally, structural features of the EHR and the broader health system may result in information being less likely to be recorded. Even if the EHR system routinely collects clinical information, changing practice standards or an evolving internal structure of the EHR may result in differential completeness of some data elements over time. One ubiquitous example of this is the International Classification of Disease (ICD) coding system developed by the World Health Organization (WHO). The classification was first published in 1946, and the current revision, ICD-10, came into use in 1994. In 2017 the WHO is planning on releasing the eleventh revision, which will be based on an updated standardized structure for disease definitions.

Application to the Antidepressants Study

To illustrate the proposed framework we return to the antidepressants study. As outlined above, the initial task in applying the framework is to specify the scientific question of interest and the corresponding ideal study and data structure. Since the primary analyses for the antidepressants study are ongoing, we focus on a simple question of whether choice of antidepressant medication at treatment initiation is associated with weight change at 24 months. That is, we focus on a question based on the intent-to-treat principle, which ignores changes in treatment choice post-initiation. This question is, arguably, most relevant to the clinical decision at the time of treatment initiation since one cannot know whether and how a patient will change treatment in the future. With this in mind, key variables that would be collected in an ideal study design would include the following: initial treatment choice; baseline weight; 24-month weight; and potential confounders such as gender, age, and comorbid conditions. Given this list of variables, the next step is to consider the extent of missing data and its nature. In principle, any and all variables with missing values should be considered in this way; here, for simplicity, we focus on missingness in the 24-month weight measurement. Furthermore, we focus on three specific sub-mechanisms: (1) whether the patient was actively enrolled in Group Health, (2) whether they initiated an encounter with the health system at 24 months, and (3) whether their weight was measured during the encounter and recorded in the EHR. Figure 2(b) provides a flow-type diagram to help visualize these sub-mechanisms and their interaction with each other.
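
One way to write down the three sub-mechanisms is as a sequential factorization. The notation is illustrative and does not appear in the source: $E$, $C$, and $M$ indicate active enrollment, an encounter, and a recorded weight at 24 months; $S$ indicates complete 24-month weight data; and $X$ denotes patient covariates.

```latex
\begin{align*}
P(S = 1 \mid X) &= P(E = 1 \mid X) \\
&\quad \times P(C = 1 \mid E = 1, X) \\
&\quad \times P(M = 1 \mid E = 1, C = 1, X)
\end{align*}
```

The factorization holds because a weight can be recorded only if an encounter occurred, and an encounter can be captured only if the patient was enrolled, so that $S = 1$ exactly when $E = C = M = 1$. Each factor may depend on a different subset of $X$.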

Sub-mechanism 1: Active Enrollment Status at 24 Months

Returning to Figure 1, 3 of the 12 patients can be seen to have disenrolled from Group Health prior to the 24-month mark. Analogous to a dropout in a typical research setting, if a patient disenrolls, the EHR cannot be expected to have a weight measurement recorded. In practice, there are many reasons why an individual might disenroll from their health plan, including cost increases, changes in employment status or employer coverage options, reaching eligibility for Medicare, and dissatisfaction with their coverage or provider access. Among the 8,631 patients with complete weight data at baseline, 2,061 (23.9 percent) disenrolled at some point during the first 24 months following treatment initiation (Figure 4). From the second set of columns in Table 1, in contrast to the results for the single mechanism, gender does not appear to be associated with enrollment status at 24 months. Age, however, is positively associated with enrollment status, with older patients again estimated to have higher odds, although the strength of the association is much greater (OR 1.41 for a 10-year increase in age; 95 percent CI: (1.36, 1.47)). With respect to treatment choice, the results are generally consistent with those based on the single missingness mechanism, although the strongest associations, specifically for serotonin antagonist and reuptake inhibitors (SARIs) and tricyclics, are somewhat attenuated. Finally, patients with higher baseline weight are estimated to have somewhat higher odds of active enrollment at 24 months (OR 1.02 for a 20-lb. increase in weight; 95 percent CI: (1.00, 1.04)), although the association is not statistically significant despite the sample size being the same as in the single missingness mechanism model.

Sub-mechanism 2: Initiation of an Encounter at 24 Months

Returning to Figure 1, despite being actively enrolled, none of the last three patients in the third row had initiated a clinical encounter at or around the 24-month mark. For a weight measurement to be recorded in the EHR, however, an encounter must have taken place. In practice, encounters are initiated either because standards of care within the health system dictate a schedule of patient-provider interactions or because the patient is seeking care for a new or ongoing medical problem. Among the 6,570 patients actively enrolled at 24 months, 1,604 (24.4 percent) initiated at least one encounter in the 24-month ±7 days window; 2,485 patients (37.4 percent) initiated at least one encounter in the 24-month ±14 days window; and 3,688 patients (56.1 percent) initiated at least one encounter in the 24-month ±30 days window. Focusing on the latter group, the third set of columns in Table 1 indicates that, in contrast to sub-mechanism 1, gender is strongly associated with initiation of an encounter: female patients are estimated to have 24 percent higher odds than males. Furthermore, while age is again significantly associated with initiation of an encounter, the magnitude of the association is substantially smaller than for sub-mechanism 1 (i.e., OR 1.10 compared to 1.41). As with sub-mechanism 1, treatment choice appears to be significantly associated with initiation of an encounter; the magnitudes of the associations for SARIs and tricyclics are stronger than they were for sub-mechanism 1, and bupropion appears to be marginally associated with increased odds of initiating an encounter compared to fluoxetine (OR 1.14; 95 percent CI: (0.98, 1.34)).
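
The window-based definition of an encounter "at 24 months" can be sketched as follows. The function names, the calendar-based definition of the 24-month mark, and the example dates are assumptions for illustration; the source does not specify how the mark was computed.

```python
from datetime import date


def mark_24_months(index_date):
    """Calendar date two years after treatment initiation (Feb 29 maps to Feb 28)."""
    try:
        return index_date.replace(year=index_date.year + 2)
    except ValueError:
        return index_date.replace(year=index_date.year + 2, day=28)


def has_encounter_in_window(index_date, encounter_dates, half_width_days):
    """True if any encounter falls within +/- half_width_days of the 24-month mark."""
    target = mark_24_months(index_date)
    return any(abs((d - target).days) <= half_width_days for d in encounter_dates)


# Hypothetical patient: treatment initiated 15 March 2006, with a single
# encounter 10 days after the 24-month mark.
index_date = date(2006, 3, 15)
encounters = [date(2008, 3, 25)]
for half_width in (7, 14, 30):
    print(half_width, has_encounter_in_window(index_date, encounters, half_width))
```

Widening the window can only add patients to the "encounter" group, which is consistent with the increasing counts reported for the ±7, ±14, and ±30 day windows.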

Sub-mechanism 3: Measurement of Weight at 24 Months

Finally, even if a patient is enrolled at 24 months and initiates a clinical encounter, a weight measurement may nevertheless not have been recorded in the EHR. The first patients in the second and third rows of Figure 1, for example, were enrolled and had a clinical encounter at 24 months, yet neither had a weight measurement recorded. In practice, although standards of care at Group Health indicate that weight should be measured during all primary care visits, it may be that these patients refused to be weighed or that their health care providers decided not to weigh them because of the specific focus of the visit (e.g., an acute illness) and because of timing considerations. Among all 3,688 patients who initiated at least one encounter in the 24-month ±30 days window, 2,408 (65.3 percent) have at least one weight measurement recorded in the EHR during the same window. From Table 1, we find that neither gender nor age is associated with a patient having at least one weight measurement in the EHR given that they are enrolled and have an encounter. Treatment choice is, overall, significantly associated with having at least one weight measurement, with patients treated with a tricyclic having higher estimated odds compared to those treated with fluoxetine (OR 1.28; 95 percent CI: (1.01, 1.61)) and patients treated with a serotonin-specific reuptake inhibitor (SSRI) estimated to have lower odds (OR 0.80; 95 percent CI: (0.66, 0.96)). Finally, in contrast to the moderate association for sub-mechanism 2, baseline weight is strongly associated with whether or not a weight measurement is recorded given that an encounter was initiated (OR 1.05 for a 20-lb. increase in weight; 95 percent CI: (1.03, 1.08)).
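
The counts reported for the three sub-mechanisms chain together to reproduce the overall proportion of patients with a 24-month weight. The short check below uses only counts stated in the text (8,631 patients with baseline weight; 6,570 enrolled at 24 months; 3,688 with an encounter in the ±30 day window; 2,408 with a recorded weight); the variable and function names are illustrative.

```python
# Nested cohorts, each a subset of the one before it.
counts = [8631, 6570, 3688, 2408]
labels = [
    "enrolled at 24 months",
    "encounter, given enrollment",
    "weight recorded, given encounter",
]


def conditional_proportions(nested_counts):
    """Proportion retained at each stage, relative to the previous stage."""
    return [nested_counts[i + 1] / nested_counts[i]
            for i in range(len(nested_counts) - 1)]


stage_proportions = conditional_proportions(counts)

# The product of the stage-wise proportions is the overall proportion observed.
overall = 1.0
for p in stage_proportions:
    overall *= p

for label, p in zip(labels, stage_proportions):
    print(f"{label}: {100 * p:.1f}%")
print(f"overall: {100 * overall:.1f}%")
```

Although the overall proportion is the same as under the single mechanism, the stage-wise view shows where patients are lost, and Table 1 suggests that the determinants differ by stage.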

Discussion

As researchers make use of EHR data for CER, the unique challenges posed by the complexity and heterogeneity of the observed data are well recognized. Since standard methods (i.e., those developed outside the EHR-based setting) have been found to be inadequate, the recent literature has seen a number of important advances to address these challenges, including methods that facilitate the coding and classification of text-based notes,37,38 methods for record linkage in the absence of unique patient identifiers,39,40 and methods for the control of confounding bias.11–21 Common throughout this recent literature is the general philosophy that one should make use of as much of the available information in the EHR as possible. This clearly has appeal in the sense that information is not thrown away and, presumably, statistical efficiency and power are maximized. As researchers grapple with selection bias, however, application of this philosophy has two important drawbacks, both of which are exemplified by Figure 1. First, because EHR systems are typically designed to support clinical and/or billing activities, not with any specific research agenda in mind, the standard notions of “complete” or “missing” data do not have well-defined meanings; these notions only have meaning with respect to some data structure that is (typically) pre-specified by the study design. Second, given the complexity and heterogeneity of EHR data, making use of all of the available information will likely require the development and fit of large, complex models—the components of which may be poorly identified. Consider, for example, the challenging task of accurately modeling the underlying weight trajectories of all 8,631 patients in the antidepressants study who have complete baseline weight values. To resolve these challenges, we have proposed a new general framework for addressing selection bias in EHR-based settings. 
Central to the framework are two key principles that explicitly address the drawbacks of the standard philosophy: (1) the analysis is grounded in some pre-specified ideal study, and (2) the data provenance, which is the process that gives rise to the available EHR data, is decomposed into a series of manageable components. This, we believe, represents a fundamental shift in how selection bias is addressed in EHR-based studies. Practically, the proposed framework enjoys numerous important benefits. First, it provides focus in the elicitation process during which researchers consult with subject-matter experts on reasons and determinants of completeness. This may be particularly useful if the sub-mechanisms interact in such a way that if a particular event has not occurred then whether or not a subsequent event occurs is deterministic (e.g., a patient cannot have a weight measurement recorded if there was no encounter with the clinical system). Second, it provides flexibility in that the various sub-mechanisms may not be driven by the same set of covariates. In Table 1, for example, there is strong evidence that all four covariates are associated with sub-mechanism 2 but not necessarily with sub-mechanisms 1 and 3. Third, it provides flexibility in that any given covariate may have differential effects across sub-mechanisms. In Table 1, patients treated with an SSRI are estimated to have lower odds of being actively enrolled at 24 months compared to those treated with fluoxetine (OR 0.87) and having a weight measurement at 24 months (OR 0.79) but higher odds of initiating an encounter at 24 months (OR 1.08). Fourth, the decomposition of observance into a series of sub-mechanisms provides a clearer framing for consideration of critical assumptions. Specifically, after consulting with subject-matter experts, one may find that the MAR assumption is reasonable for some sub-mechanisms but not others. 
This, in turn, provides researchers with the ability to target sensitivity analyses specifically to those sub-mechanisms for which MNAR is suspected.22,41 Notwithstanding these benefits, implementing the proposed framework in any given CER study is not without challenges. Specifically, as mentioned, the framework is not prescriptive, in the sense that no single implementation will be adequate for all EHR-based studies. While Figure 2(b) is, arguably, a reasonable step forward from Figure 2(a), it could not be used as a general template. In this sense, the proposed framework requires researchers to make a series of potentially challenging decisions including the specification of the ideal study, the specification of potential sub-mechanisms relevant to the EHR system, and the specification of covariates that may influence the collection of sub-mechanisms. These tasks will typically be nontrivial, although the use of flow diagrams analogous to those in Figure 2 may be useful during the elicitation process as well as during modeling and sensitivity analyses. To further aid these tasks we are developing a suite of data-driven strategies, analogous to those recently developed for confounding bias,13,20 that combine clinical knowledge with model selection methods42–44 to identify relevant sub-mechanisms and their determinants. Several features of the antidepressants application, as presented, are also worth noting. First, we chose to illustrate the framework in the context of a scientific question for which the appropriate analysis is an intent-to-treat analysis. Such questions clearly have clinical value, although they do not address important aspects of the relationship between treatment choice and 24-month weight change, including the potential impact of stopping treatment or treatment switching for which an appropriately adjusted as-treated analysis would be more appropriate. 
Furthermore, the intent-to-treat analysis does not consider the impact of intermediate events such as the resolution of the initial treatment episode. For each of these alternative scientific questions, however, the proposed framework could readily be applied. Second, to simplify the development, we restricted attention to patients with at least two years of potential follow-up (i.e., we only included patients with a new treatment episode prior to November 2007). An alternative would have been to include all patients who initiated a treatment episode prior to November 2009 and to use survival analysis as a means of exploring enrollment status at 24 months while accommodating censoring. Third, we considered only select baseline covariates for inclusion in the models in Table 1. In reality, it is likely that each of the sub-mechanisms will depend on patient characteristics that evolve over time. In principle, as with treatment changes over time, one could readily fold consideration of these covariates and the relevant timing of their measurement into the ideal study and sub-mechanism specification of the proposed framework. Finally, while understanding mechanisms and considering assumptions is a necessary first task in any analysis involving missing data, ultimately the most important question is whether or not the additional effort required by the proposed framework makes a difference in the overall study results and conclusions. To answer this, statistical analysis methods must be aligned with the proposed framework. One approach to doing so could be to adapt existing methods based on inverse-probability weighting. 
In the context of Figure 2, rather than reweighting the main analyses by the inverse of P(S=1) (i.e., the fitted values from a model based on a single mechanism), one could reweight by the inverse of P(S1=1, S2=1, S3=1) = P(S1=1) × P(S2=1 | S1=1) × P(S3=1 | S1=1, S2=1), where each of the latter three components is taken as the fitted value from a sub-mechanism-specific regression model (i.e., those in Table 1). Interestingly, for the antidepressants study these two sets of sampling weights do not differ substantially (see Figure 5). As such, although details are not shown, primary results for the main analyses investigating the association between choice of treatment and body weight change at 24 months do not differ substantively. Clearly this will not be the case for all EHR-based studies, although the fact that it is the case in the antidepressants study raises an important question regarding whether a penalty is paid for using an unnecessarily complex analysis. That is, is there a loss of efficiency when the observance mechanism is overspecified? When coupled with the potential for bias when the observance mechanism is underspecified, a potential bias-variance trade-off arises. Understanding this trade-off and providing practical guidance is an avenue of research that we are actively pursuing.
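The reweighting by the product of sub-mechanism probabilities described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the three fitted conditional probabilities for each patient are already available from sub-mechanism-specific models, and the names `SubMechanismProbs`, `observance_probability`, `ipw_weight`, and `weighted_mean`, as well as the numeric values, are hypothetical and not taken from Table 1.

```python
from dataclasses import dataclass


@dataclass
class SubMechanismProbs:
    """Fitted conditional probabilities for one patient (hypothetical container)."""
    p_enrolled: float   # P(S1=1): actively enrolled at 24 months
    p_encounter: float  # P(S2=1 | S1=1): initiates an encounter, given enrolled
    p_weight: float     # P(S3=1 | S1=1, S2=1): weight recorded, given enrolled and encounter


def observance_probability(p):
    """P(S1=1, S2=1, S3=1) as the product of the three conditional probabilities."""
    return p.p_enrolled * p.p_encounter * p.p_weight


def ipw_weight(p):
    """Inverse-probability weight for a patient with a fully observed outcome."""
    prob = observance_probability(p)
    if not 0.0 < prob <= 1.0:
        raise ValueError("observance probability must lie in (0, 1]")
    return 1.0 / prob


def weighted_mean(values, weights):
    """Normalized weighted mean of outcomes among patients with observed data."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(v * w for v, w in zip(values, weights)) / total


# Illustrative values only: two patients with observed 24-month weight change (lb).
patients = [
    SubMechanismProbs(p_enrolled=0.8, p_encounter=0.5, p_weight=0.65),
    SubMechanismProbs(p_enrolled=0.9, p_encounter=0.6, p_weight=0.70),
]
weights = [ipw_weight(p) for p in patients]
estimate = weighted_mean([4.0, -2.0], weights)
```

In practice each conditional probability would be the fitted value from a regression fit only among patients for whom the preceding event occurred (e.g., the encounter model fit among enrolled patients), which is what allows the covariates and their effects to differ across sub-mechanisms.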

Figure 5.

Fitted Sampling Weights Obtained from the Standard Single Missingness Mechanism Framework Compared to Those Obtained from Implementation of the Proposed Framework with Three Sub-mechanisms: Active Enrollment, Initiation of an Encounter, and Recording of a Body Weight Measurement
