
Power calculator for instrumental variable analysis in pharmacoepidemiology.

Venexia M Walker, Neil M Davies, Frank Windmeijer, Stephen Burgess, Richard M Martin.

Abstract

Background: Instrumental variable analysis, for example with physicians' prescribing preferences as an instrument for medications issued in primary care, is an increasingly popular method in the field of pharmacoepidemiology. Existing power calculators for studies using instrumental variable analysis, such as Mendelian randomization power calculators, do not allow for the structure of research questions in this field. This is because analyses in pharmacoepidemiology typically have stronger instruments and detect larger causal effects than those in other fields. Consequently, there is a need for dedicated power calculators for pharmacoepidemiological research.

Methods and Results: The formula for calculating the power of a study using instrumental variable analysis in the context of pharmacoepidemiology is derived and then validated by a simulation study. The formula is applicable to studies using a single binary instrument to analyse the causal effect of a binary exposure on a continuous outcome. An online calculator and packages for both R and Stata are provided so that others can implement the formula.

Conclusions: The statistical power of instrumental variable analysis in pharmacoepidemiological studies to detect a clinically meaningful treatment effect is an important consideration. Research questions in this field have distinct structures that must be accounted for when calculating power. The formula presented differs from existing instrumental variable power formulae due to its parametrization, which is designed specifically for ease of use by pharmacoepidemiologists.


Keywords: Pharmacoepidemiology; binary exposure; continuous outcome; instrumental variable; power; prescribing preference


Year: 2017  PMID: 28575313  PMCID: PMC5837396  DOI: 10.1093/ije/dyx090

Source DB: PubMed  Journal: Int J Epidemiol  ISSN: 0300-5771  Impact factor: 7.196

Introduction

Pharmacoepidemiological studies risk irrelevance if they are insufficiently powered to detect clinically meaningful treatment effects. Before starting a study, the statistical power to detect a given treatment effect can be calculated. This type of calculation is becoming increasingly important for grant and data request applications, which seek to assess the likely contribution of proposed studies. The number of pharmacoepidemiology studies using instrumental variable analysis, for example with physicians' prescribing preferences as an instrument for exposure, continues to grow. This is partly because instrumental variable analyses have the potential to overcome some of the issues associated with conventional statistical approaches, such as residual confounding and reverse causation. As the demand for power calculations to support applications increases, so does the need to provide them for this method. Power calculators for instrumental variable analysis exist in other settings, such as Mendelian randomization, which uses germline genetic variants as proxies for exposures in disease-related research. However, pharmacoepidemiological research questions have distinct structures that these existing calculators do not sufficiently cater for. Unlike Mendelian randomization studies, which often use a case-control design, pharmacoepidemiology studies typically use a cohort design. Further, pharmacoepidemiology studies usually report a risk difference for a binary exposure using a binary instrument, whereas Mendelian randomization studies report on a continuous exposure using a discrete or continuous genetic instrument (a count of alleles or an allele score, respectively). As a result of these differences, as well as the stronger instruments and larger causal effects seen in pharmacoepidemiology, there is a need for a dedicated power calculator for instrumental variable analysis in this field.
This paper addresses how to conduct power calculations for pharmacoepidemiological studies using a single binary instrument to analyse the causal effect of a binary exposure on a continuous outcome. The formula to calculate power is derived and then validated by a simulation study. The formula is distinct from existing instrumental variable power formulae due to its parametrization, which is designed specifically for ease of use by pharmacoepidemiologists. An online calculator and packages for both R and Stata are provided so that others can implement the formula.

Methods and Results

Let us consider physicians' prescribing preferences for two different treatments (for example, a treatment of interest and a control treatment) as an instrument for exposure to these treatments. Physicians' preferences are generally not directly observable, so each physician's prescriptions to previous patients are used as a proxy for their preferences. This results in a binary instrument that takes a value of one if the physician prescribed the treatment of interest to their previous patient and a value of zero if they prescribed the control treatment. We derive the formula for the power of studies that use this instrument to measure the causal effect of a drug exposure on a continuous outcome, for example systolic blood pressure or low-density lipoprotein cholesterol.
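The previous-patient proxy can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the `Prescription` record layout and the function name `previous_patient_instrument` are hypothetical, each patient is assumed to contribute one prescription, and dates are assumed to be ISO-formatted strings so that string order is chronological. Each physician's first patient has no previous patient and therefore gets no instrument value.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Prescription:
    # Hypothetical record layout, for illustration only
    physician_id: str
    patient_id: str
    date: str               # ISO 8601 date, so string order is chronological
    drug_of_interest: bool  # True = treatment of interest, False = control treatment


def previous_patient_instrument(prescriptions: List[Prescription]) -> Dict[str, int]:
    """Map each patient to Z = 1 if their physician's previous patient received the
    treatment of interest, and Z = 0 if that patient received the control treatment."""
    by_physician: Dict[str, List[Prescription]] = {}
    for rx in prescriptions:
        by_physician.setdefault(rx.physician_id, []).append(rx)

    instrument: Dict[str, int] = {}
    for records in by_physician.values():
        # Stable sort: prescriptions on the same date keep their input order
        records.sort(key=lambda rx: rx.date)
        # Pair each prescription with the one issued immediately before it
        for previous, current in zip(records, records[1:]):
            instrument[current.patient_id] = int(previous.drug_of_interest)
    return instrument
```

Grouping by physician before pairing ensures that the instrument only ever reflects the same physician's earlier choice, never another physician's.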

Formula derivation

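The instrumental variable analysis we consider requires three variables: a binary instrument $Z$, a binary exposure $X$ and a continuous outcome $Y$. The outcome for patient $i$, for $i = 1, \ldots, n$, is modelled as follows:

$$Y_i = \beta_0 + \beta X_i + u_i,$$

where $u_i$ is a zero-mean error term containing unobserved confounders, determining both the outcome $Y_i$ and the treatment $X_i$. The instrument $Z_i$ affects treatment $X_i$, but is not associated with the unobserved confounders and has no direct effect on the outcome.

Let $\tilde{y}_i = y_i - \bar{y}$, $\tilde{x}_i = x_i - \bar{x}$ and $\tilde{z}_i = z_i - \bar{z}$, where $\bar{y}$, $\bar{x}$ and $\bar{z}$ are sample averages. Denote by $\tilde{y}$, $\tilde{x}$ and $\tilde{z}$ the $n$-vectors of observations on $\tilde{y}_i$, $\tilde{x}_i$ and $\tilde{z}_i$, respectively. The two-stage least squares (2SLS) estimator of $\beta$ is then given by:

$$\hat{\beta} = \frac{\tilde{z}'\tilde{y}}{\tilde{z}'\tilde{x}}.$$

The variance of the 2SLS estimator is:

$$\operatorname{Var}(\hat{\beta}) = \sigma^2 \, \frac{\tilde{z}'\tilde{z}}{(\tilde{z}'\tilde{x})^2},$$

where $\sigma^2 = E(u_i^2)$ is the residual variance. Note that conditional homoscedasticity holds, so the variance is constant for all values of the instrument, i.e. $E(u_i^2 \mid Z_i = z) = \sigma^2$ for $z \in \{0, 1\}$.

Consider the term $\tilde{z}'\tilde{x}$. Let $p_Z = P(Z_i = 1)$, $p_{X|Z=1} = P(X_i = 1 \mid Z_i = 1)$ and $p_{X|Z=0} = P(X_i = 1 \mid Z_i = 0)$. In large samples:

$$\frac{\tilde{z}'\tilde{z}}{n} \to p_Z (1 - p_Z), \qquad \frac{\tilde{z}'\tilde{x}}{n} \to p_Z (1 - p_Z) \left( p_{X|Z=1} - p_{X|Z=0} \right).$$

Hence $\operatorname{Var}(\hat{\beta})$ can be presented in the following way:

$$\operatorname{Var}(\hat{\beta}) \approx \frac{\sigma^2}{n \, p_Z (1 - p_Z) \left( p_{X|Z=1} - p_{X|Z=0} \right)^2}.$$

Now consider the instrumental variable estimator of $\beta$. Using the asymptotic distribution $\hat{\beta} \sim N\left(\beta, \operatorname{Var}(\hat{\beta})\right)$, the distribution of the t-test statistic $T = \hat{\beta} / \sqrt{\operatorname{Var}(\hat{\beta})}$ under the null hypothesis $H_0\colon \beta = 0$ is:

$$T \sim N(0, 1).$$

The distribution of the test statistic under the alternative hypothesis $H_1\colon \beta \neq 0$ is:

$$T \sim N\left( \frac{\beta}{\sqrt{\operatorname{Var}(\hat{\beta})}}, \, 1 \right).$$

The null hypothesis is rejected if $|T| > z_{1-\alpha/2}$, where $z_{1-\alpha/2}$ is the critical value at significance level $\alpha$. The power is the probability that the test statistic will exceed the critical value, which is:

$$\text{Power} = 1 - \Phi\left( z_{1-\alpha/2} - \frac{\beta}{\sqrt{\operatorname{Var}(\hat{\beta})}} \right) + \Phi\left( -z_{1-\alpha/2} - \frac{\beta}{\sqrt{\operatorname{Var}(\hat{\beta})}} \right),$$

where $\Phi(\cdot)$ is the cumulative standard normal distribution function. Power therefore increases as $\operatorname{Var}(\hat{\beta})$ decreases and/or $|\beta|$ increases. By substituting the large-sample expression for $\operatorname{Var}(\hat{\beta})$ and simplifying, we obtain the following formula for power:

$$\text{Power} = 1 - \Phi\left( z_{1-\alpha/2} - \lambda \right) + \Phi\left( -z_{1-\alpha/2} - \lambda \right), \qquad \lambda = \frac{\beta \left| p_{X|Z=1} - p_{X|Z=0} \right| \sqrt{n \, p_Z (1 - p_Z)}}{\sigma}.$$

The formula requires a total of seven parameters to be specified. This includes four parameters that must always be specified: the significance level, $\alpha$; the size of the causal effect, $\beta$; the residual variance, $\sigma^2$; and the sample size, $n$.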
It also requires three of the following four parameters: the frequency of the instrument, $p_Z$; the frequency of exposure, $p_X$; the probability of exposure given the instrument $Z = 1$, $p_{X|Z=1}$; and the probability of exposure given the instrument $Z = 0$, $p_{X|Z=0}$. The chosen parameters must be specified so that the following holds:

$$p_X = p_Z \, p_{X|Z=1} + (1 - p_Z) \, p_{X|Z=0}.$$

The formula for power is available via an online calculator [https://venexia.shinyapps.io/PharmIV/], and packages for R and Stata can be downloaded from GitHub [https://github.com/venexia/PharmIV]. Note that the frequency of exposure in an instrumental variable analysis of this type is likely to be higher than in a general population study, because a drug is compared against one or more other drugs in a population of people with the indication for these treatments. General population studies, on the other hand, tend to compare people who received the drug of interest with people who did not, so the frequency of exposure is generally much lower.

The effect of varying the parameters within the formula on a study's power is best presented graphically. Figure 1 illustrates the effect of the frequency of exposure on the power of a study to detect a causal effect $\beta$ using an instrument with frequency $p_Z$, a residual variance of $\sigma^2$ and a sample size of up to 30 000 participants. In this example, increasing either the frequency of exposure (up to 50%) or the sample size increases power.
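The power formula and the constraint linking the four probabilities can be sketched in Python using only the standard library (`statistics.NormalDist` supplies $\Phi$ and its inverse). This is a minimal illustration of the equations in this section, not the PharmIV R or Stata packages; the function names `complete_parameters` and `iv_power` are hypothetical, and any example inputs are arbitrary rather than taken from the paper.

```python
import math
from statistics import NormalDist
from typing import Optional, Tuple


def complete_parameters(p_z: Optional[float] = None, p_x: Optional[float] = None,
                        p_x_z1: Optional[float] = None, p_x_z0: Optional[float] = None
                        ) -> Tuple[float, float, float, float]:
    """Given exactly three of the four probabilities, solve
    p_x = p_z * p_x_z1 + (1 - p_z) * p_x_z0 for the fourth and check validity."""
    if [p_z, p_x, p_x_z1, p_x_z0].count(None) != 1:
        raise ValueError("specify exactly three of p_z, p_x, p_x_z1, p_x_z0")
    try:
        if p_x is None:
            p_x = p_z * p_x_z1 + (1 - p_z) * p_x_z0
        elif p_x_z1 is None:
            p_x_z1 = (p_x - (1 - p_z) * p_x_z0) / p_z
        elif p_x_z0 is None:
            p_x_z0 = (p_x - p_z * p_x_z1) / (1 - p_z)
        else:  # p_z is the missing value
            p_z = (p_x - p_x_z0) / (p_x_z1 - p_x_z0)
    except ZeroDivisionError:
        raise ValueError("the missing parameter is not identified") from None
    # Every probability, including the solved one, must lie in [0, 1]
    for name, value in (("p_z", p_z), ("p_x", p_x), ("p_x_z1", p_x_z1), ("p_x_z0", p_x_z0)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} = {value:.3f} is not a valid probability")
    return p_z, p_x, p_x_z1, p_x_z0


def iv_power(alpha: float, beta: float, sigma2: float, n: int, **probs: float) -> float:
    """Two-sided power of the 2SLS test of beta = 0 with a binary instrument and
    binary exposure. `probs` must hold exactly three of p_z, p_x, p_x_z1, p_x_z0."""
    p_z, _, p_x_z1, p_x_z0 = complete_parameters(**probs)
    if p_x_z1 == p_x_z0 or p_z in (0.0, 1.0):
        raise ValueError("the instrument must be associated with the exposure")
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # lambda = beta / sqrt(Var(beta_hat)), using the large-sample variance
    lam = abs(beta) * abs(p_x_z1 - p_x_z0) * math.sqrt(n * p_z * (1 - p_z)) / math.sqrt(sigma2)
    return 1 - nd.cdf(z_crit - lam) + nd.cdf(-z_crit - lam)
```

For example, with the hypothetical inputs `iv_power(0.05, 0.5, 1.0, 10_000, p_z=0.5, p_x_z1=0.6, p_x_z0=0.4)`, $\lambda = 0.5 \times 0.2 \times \sqrt{2500} = 5$ and the power is about 0.999. Routing every call through `complete_parameters` mirrors the "three of four" rule and rejects combinations that would imply a probability outside $[0, 1]$.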

Figure 1

Power curves for several values of the frequency of exposure, showing the effect on the power of a study to detect a causal effect $\beta$ using an instrument with frequency $p_Z$, a residual variance of $\sigma^2$ and a sample size of up to 30 000 participants.

Formula validation

To validate the power formula, we conducted a simulation. We simulated the three variables necessary to conduct instrumental variable analysis with a single instrument as follows:

$$Z_i \sim \text{Binomial}(1, p_Z),$$

$$X_i = \begin{cases} 1 & \text{if } \Phi^{-1}(p_{X|Z=0})(1 - Z_i) + \Phi^{-1}(p_{X|Z=1}) \, Z_i + \varepsilon_{X,i} > 0, \\ 0 & \text{otherwise,} \end{cases}$$

$$Y_i = \beta X_i + \varepsilon_{Y,i},$$

where $p_Z$ is the frequency of the instrument, $\Phi^{-1}(p_{X|Z=z})$ for $z \in \{0, 1\}$ are the inverse cumulative standard normal distribution (quantile) functions of the conditional probabilities of exposure given the instrument, $\beta$ is the causal effect, and $\varepsilon_X$ and $\varepsilon_Y$ are standard normally distributed error terms with covariance $\rho$.

The formula uses a binary instrument, binary exposure and continuous outcome, so the variables were simulated to recreate data of this form. The instrument $Z$ is modelled by a binomial distribution parameterized by its frequency $p_Z$, which ensures a binary variable with the correct probability of success. The exposure $X$ is also binary but is modelled using a threshold model: the variability in the exposure equation comes from the normally distributed error term $\varepsilon_X$, and including $Z_i$ in that equation allows the exposure $X$ to be associated with the instrument $Z$. The outcome $Y$ is modelled by its model equation $Y_i = \beta X_i + \varepsilon_{Y,i}$. In this model, the instrument is valid because the outcome $Y$ is associated with the exposure $X$ only as dictated by the causal effect $\beta$, and is not associated with the instrument $Z$ other than through the exposure $X$.

Using the generated data, we performed an instrumental variable analysis with the Stata command IVREG2 and recorded the coefficient of the exposure with its 95% confidence interval. We then counted the number of simulations for which the confidence interval excluded the null and divided this by the total number of simulations to estimate the power. Running the simulation and evaluating the formula with the same parameters allows the formula to be validated against the simulation. We present the power calculated from both the simulation and the formula for several parameter combinations in Table 1.
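The simulation design above can be sketched in pure Python. This is an illustrative re-implementation under stated assumptions, not the authors' Stata code: instead of IVREG2, it computes the single-instrument 2SLS (ratio) estimate and its homoscedastic large-sample standard error directly, and it counts a rejection when the Wald statistic exceeds the two-sided critical value, which is equivalent to the normal-based confidence interval excluding zero. The function name `simulate_power` and its default seed are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_power(p_z: float, p_x_z1: float, p_x_z0: float, beta: float,
                   rho: float, n: int, reps: int, alpha: float = 0.05,
                   seed: int = 2017) -> float:
    """Monte Carlo estimate of the power of the 2SLS test of beta = 0."""
    rng = random.Random(seed)
    nd = NormalDist()
    # Threshold-model intercepts: P(gamma_z + e_X > 0) = p_{X|Z=z}
    gamma = {0: nd.inv_cdf(p_x_z0), 1: nd.inv_cdf(p_x_z1)}
    z_crit = nd.inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        zs, xs, ys = [], [], []
        for _ in range(n):
            z = 1 if rng.random() < p_z else 0
            e_x = rng.gauss(0.0, 1.0)
            # e_Y is standard normal with covariance rho with e_X (confounding)
            e_y = rho * e_x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            x = 1 if gamma[z] + e_x > 0 else 0
            zs.append(z)
            xs.append(x)
            ys.append(beta * x + e_y)
        z_bar, x_bar, y_bar = sum(zs) / n, sum(xs) / n, sum(ys) / n
        s_zx = sum((z - z_bar) * (x - x_bar) for z, x in zip(zs, xs))
        s_zy = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
        s_zz = sum((z - z_bar) ** 2 for z in zs)
        b = s_zy / s_zx  # 2SLS estimate with a single instrument
        a = y_bar - b * x_bar
        # Residual variance of the structural equation (large-sample divisor n)
        s2 = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / n
        se = math.sqrt(s2 * s_zz) / abs(s_zx)
        # The (1 - alpha) normal CI excludes 0 exactly when |b / se| > z_crit
        if abs(b / se) > z_crit:
            rejections += 1
    return rejections / reps
```

Pure Python is slow, so a quick check such as `simulate_power(0.5, 0.7, 0.3, beta=0.5, rho=0.5, n=2000, reps=100)` uses far fewer patients and replications than Table 1; the result can then be compared with the analytical power for the same inputs.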
Table 1 contains 27 different simulations, each repeated 10 000 times. The simulations consider each combination of three values of the frequency of exposure, $p_X \in \{0.10, 0.25, 0.50\}$; three values of the probability of exposure given the instrument, $p_{X|Z} \in \{0.15, 0.30, 0.45\}$; and three values of the sample size, $n \in \{10\,000, 20\,000, 30\,000\}$. We fixed the frequency of the instrument, $p_Z$; the causal effect, $\beta$; and the residual variance, $\sigma^2$; and calculated the remaining conditional probability of exposure from the following equation:

$$p_X = p_Z \, p_{X|Z=1} + (1 - p_Z) \, p_{X|Z=0}.$$

The effect of confounding was removed as a parameter because the power was insensitive to its value in the simulation setting. Details of the simulations conducted to test this can be found in Supplementary File 1, available as Supplementary data at IJE online. The Stata code for this paper, including that used to create the simulation, is available from GitHub [https://github.com/venexia/PharmIV].

Table 1

A comparison of the power calculated from the formula and from a validation simulation for an instrumental variable analysis with causal effect $\beta$, frequency of the instrument $p_Z$ and residual variance $\sigma^2$

pX	pX|Z	Formula, n = 10 000	Simulation, n = 10 000	Formula, n = 20 000	Simulation, n = 20 000	Formula, n = 30 000	Simulation, n = 30 000
0.100	0.150	6.6%	6.1%	8.3%	7.9%	10.0%	9.8%
0.100	0.300	32.3%	33.3%	56.4%	55.5%	73.8%	73.9%
0.100	0.450	74.7%	75.6%	96.0%	95.9%	99.5%	99.5%
0.250	0.150	11.7%	11.4%	18.6%	18.3%	25.5%	25.5%
0.250	0.300	6.6%	5.4%	8.3%	7.9%	10.0%	9.8%
0.250	0.450	32.3%	32.8%	56.4%	56.1%	73.8%	73.7%
0.500	0.150	74.7%	74.2%	96.0%	95.9%	99.5%	99.6%
0.500	0.300	32.3%	32.5%	56.4%	57.1%	73.8%	73.7%
0.500	0.450	6.6%	5.0%	8.3%	7.2%	10.0%	10.1%

Simulation results

The formula and the simulation consistently give similar results, with a small mean absolute difference across the parameter combinations presented in Table 1; no individual difference exceeds 1.6 percentage points. There is also no discernible pattern in the differences, suggesting that systematic bias is not present. Furthermore, power behaves as it does in other established power calculations: for example, increasing the sample size improves power for every parameter combination.
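The agreement summary can be checked directly against Table 1. The sketch below transcribes the table's rounded percentages, computes the mean and maximum absolute formula-minus-simulation difference, and checks that simulated power rises with sample size in every row. Because it works from the rounded table entries, its mean may differ slightly from a value computed on unrounded results.

```python
# Table 1: (pX, pX|Z, [(formula %, simulation %) for n = 10 000, 20 000, 30 000])
TABLE_1 = [
    (0.100, 0.150, [(6.6, 6.1), (8.3, 7.9), (10.0, 9.8)]),
    (0.100, 0.300, [(32.3, 33.3), (56.4, 55.5), (73.8, 73.9)]),
    (0.100, 0.450, [(74.7, 75.6), (96.0, 95.9), (99.5, 99.5)]),
    (0.250, 0.150, [(11.7, 11.4), (18.6, 18.3), (25.5, 25.5)]),
    (0.250, 0.300, [(6.6, 5.4), (8.3, 7.9), (10.0, 9.8)]),
    (0.250, 0.450, [(32.3, 32.8), (56.4, 56.1), (73.8, 73.7)]),
    (0.500, 0.150, [(74.7, 74.2), (96.0, 95.9), (99.5, 99.6)]),
    (0.500, 0.300, [(32.3, 32.5), (56.4, 57.1), (73.8, 73.7)]),
    (0.500, 0.450, [(6.6, 5.0), (8.3, 7.2), (10.0, 10.1)]),
]

diffs = [abs(f - s) for _, _, cells in TABLE_1 for f, s in cells]
mean_abs_diff = sum(diffs) / len(diffs)
max_abs_diff = max(diffs)
# Simulated power should increase from n = 10 000 to 20 000 to 30 000 in every row
sim_increasing = all(c[0][1] < c[1][1] < c[2][1] for _, _, c in TABLE_1)

print(f"mean |formula - simulation| = {mean_abs_diff:.2f} percentage points")  # 0.44
print(f"max  |formula - simulation| = {max_abs_diff:.1f} percentage points")   # 1.6
print(f"simulated power increases with n in every row: {sim_increasing}")      # True
```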

Discussion

In this paper, we have derived the formula necessary to calculate power for instrumental variable analysis with a single binary instrument, binary exposure and continuous outcome in the context of pharmacoepidemiology. The formula has been validated against a simulation study, which showed that it gives values close to the simulated power across a range of realistic parameters. We acknowledge that this calculator overlaps with existing calculators, such as that proposed by Burgess for Mendelian randomization. Although both calculators ultimately share an aim, namely to calculate the power of a study using instrumental variable analysis, their applications make them distinct. This is evident from the choice of parameterization of the power formula. The Mendelian randomization calculator opts for the coefficient of determination, $R^2$. This is a natural choice for that application, as it summarizes the proportion of the variance in the exposure expected to be explained by genetic factors. In contrast, we have parameterized our calculator for use in pharmacoepidemiological studies in terms of the frequency of the instrument, the frequency of the exposure and the conditional probabilities that relate them. This is more intuitive to a pharmacoepidemiological audience, who will typically work with the proportion of patients exposed, i.e. the frequency of exposure. In addition, the instruments typically used in this framework, such as physicians' prescribing preference, do not fit as naturally with the notion of variance explained and are summarized much more easily by their frequency and their relationship with exposure. A concern for any instrumental variable analysis, whether in pharmacoepidemiology or not, is the strength of the instrument. Instruments are termed weak when the correlation between the instrument and the exposure is low.
A commonly cited threshold for a weak instrument is a partial F statistic for the association between the instrument and the exposure of less than 10. Weak instruments result in low power to detect a causal effect. They are also known to induce bias, as such instruments explain only a small proportion of the variation in the exposure. Therefore, although pharmacoepidemiological studies are likely to have stronger instruments than other forms of instrumental variable analysis such as Mendelian randomization, researchers should remain mindful of their choice of instrument and whether it is appropriate for the research question they wish to study. As for any power formula, the formula presented here is limited by its parameters, which simplify the dataset being considered. Power calculated from such formulae cannot account for dataset characteristics outside these parameters. For example, the formula makes no allowance for missing data, a known limiting factor in the power of a study. By reducing the anticipated sample size to allow for missing data, conservative estimates of a study's power can be obtained using the formula presented. Further work is needed to establish the formula for power in other scenarios that use instrumental variable analysis in pharmacoepidemiology, including analyses with binary outcomes and analyses that involve multiple instrumental variables. As the use of instrumental variable analysis in pharmacoepidemiology becomes more commonplace, there is an increasing need to provide power calculations for studies using this type of analysis. To provide such information, accessible and accurate power formulae need to be made available.
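The link between the two parameterizations, and the weak-instrument threshold, can be made concrete. For a binary instrument and binary exposure, the first-stage $R^2$ equals the squared correlation between $Z$ and $X$, which in the pharmacoepidemiological parameters is $p_Z(1 - p_Z)(p_{X|Z=1} - p_{X|Z=0})^2 / \{p_X(1 - p_X)\}$; with one instrument, the first-stage F statistic is $(n - 2)R^2 / (1 - R^2)$. The sketch below plugs the population $R^2$ into that expression as a rough guide to expected instrument strength. It ignores sampling variability, is not part of the paper's calculator, and the function names and example inputs are hypothetical.

```python
def first_stage_r2(p_z: float, p_x_z1: float, p_x_z0: float) -> float:
    """Population first-stage R^2 for binary Z and binary X, i.e. Corr(Z, X)^2."""
    p_x = p_z * p_x_z1 + (1 - p_z) * p_x_z0
    if not (0.0 < p_z < 1.0 and 0.0 < p_x < 1.0):
        raise ValueError("p_z and the implied p_x must lie strictly between 0 and 1")
    return p_z * (1 - p_z) * (p_x_z1 - p_x_z0) ** 2 / (p_x * (1 - p_x))


def approx_first_stage_f(n: int, r2: float) -> float:
    """First-stage F statistic for one instrument, (n - 2) R^2 / (1 - R^2),
    evaluated at a given R^2 (a rough guide, not its sampling distribution)."""
    return (n - 2) * r2 / (1 - r2)


# Hypothetical inputs: flag settings whose approximate F falls below 10
for n in (200, 1_000, 10_000):
    r2 = first_stage_r2(p_z=0.5, p_x_z1=0.55, p_x_z0=0.45)
    f = approx_first_stage_f(n, r2)
    print(n, round(r2, 4), round(f, 1), "weak" if f < 10 else "not weak")
```

Even a modest ten-point difference in the probability of exposure between the instrument groups gives a small $R^2$, yet the F statistic can clear the threshold of 10 once the sample is large, which illustrates why the frequency-based parameterization and sample size together determine instrument strength.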
By using the formula presented here, either directly or via the online tool and packages in R and Stata, it is hoped that pharmacoepidemiologists can calculate the power of instrumental variable analysis studies with a single binary instrument, binary exposure and continuous outcome with ease.

Supplementary Data

Supplementary data are available at IJE online.

Funding

This work was supported by the Perros Trust and the Integrative Epidemiology Unit. The Integrative Epidemiology Unit is supported by the Medical Research Council and the University of Bristol [grant number MC_UU_12013/9]. S.B. is supported by a Sir Henry Dale Fellowship jointly funded by the Wellcome Trust and the Royal Society (Grant Number 204623/Z/16/Z). Conflict of interest: None to declare.

Key Messages

Research questions using instrumental variable analysis in pharmacoepidemiology have distinct structures that have previously not been catered for by instrumental variable analysis power calculators. Power can be calculated for studies using a single binary instrument to analyse the causal effect of a binary exposure on a continuous outcome in the context of pharmacoepidemiology using the presented formula, an online power calculator or packages available for use in both R and Stata. The use of this power calculator will allow investigators to determine whether a pharmacoepidemiology study is likely to detect clinically meaningful treatment effects before the study's commencement.

References

1. S Greenland. An introduction to instrumental variables for epidemiologists. Int J Epidemiol. 2000-08.

2. M Alan Brookhart, Philip S Wang, Daniel H Solomon, Sebastian Schneeweiss. Evaluating short-term drug effects using a physician-specific prescribing preference as an instrumental variable. Epidemiology. 2006-05.

3. Miguel A Hernán, James M Robins. Instruments for causal inference: an epidemiologist's dream? Epidemiology. 2006-07.

4. M Alan Brookhart, Jeremy A Rassen, Sebastian Schneeweiss. Instrumental variable methods in comparative safety and effectiveness research. Pharmacoepidemiol Drug Saf. 2010-06.

5. Jeremy A Rassen, M Alan Brookhart, Robert J Glynn, Murray A Mittleman, Sebastian Schneeweiss. Instrumental variables I: instrumental variables exploit natural variation in nonexperimental data to estimate causal relationships. J Clin Epidemiol. 2009-04-08.

6. Jeremy A Rassen, M Alan Brookhart, Robert J Glynn, Murray A Mittleman, Sebastian Schneeweiss. Instrumental variables II: instrumental variable application-in 25 variations, the physician prescribing preference generally was strong and reduced covariate imbalance. J Clin Epidemiol. 2009-04-05.

7. Neil M Davies, George Davey Smith, Frank Windmeijer, Richard M Martin. COX-2 selective nonsteroidal anti-inflammatory drugs and risk of gastrointestinal tract complications and myocardial infarction: an instrumental variable analysis. Epidemiology. 2013-05.

8. Neil M Davies, Stephanie von Hinke Kessler Scholder, Helmut Farbmacher, Stephen Burgess, Frank Windmeijer, George Davey Smith. The many weak instruments problem and Mendelian randomization. Stat Med. 2014-11-10.

9. Neil M Davies, David Gunnell, Kyla H Thomas, Chris Metcalfe, Frank Windmeijer, Richard M Martin. Physicians' prescribing preferences were a potential instrument for patients' actual prescriptions of antidepressants. J Clin Epidemiol. 2013-09-24.

10. Stephen Burgess. Sample size and power calculations in Mendelian randomization with a single instrumental variable and a binary outcome. Int J Epidemiol. 2014-03-06.
