
Simulation-based power and sample size calculation for designing interrupted time series analyses of count outcomes in evaluation of health policy interventions.

Wei Liu1, Shangyuan Ye2, Bruce A Barton1, Melissa A Fischer3,4, Colleen Lawrence5, Elizabeth J Rahn6, Maria I Danila6, Kenneth G Saag6, Paul A Harris7, Stephenie C Lemon1, Jeroan J Allison1, Bo Zhang8.   

Abstract

OBJECTIVE: The purpose of this study was to present the design, model, and data analysis of an interrupted time series (ITS) model applied to evaluate the impact of health policy, systems, or environmental interventions using count outcomes. Simulation methods were used to conduct power and sample size calculations for these studies.
METHODS: We proposed the models and analyses of ITS designs for count outcomes using the Strengthening Translational Research in Diverse Enrollment (STRIDE) study as an example. The models we used were observation-driven models, which incorporate a lagged term of the outcome in the conditional mean of a time series of count outcomes.
RESULTS: A simulation-based approach with ready-to-use computer programs was developed to calculate the sample size and power of two types of ITS models, Poisson and negative binomial, for count outcomes. Simulations were conducted to estimate the power of segmented autoregressive (AR) error models when autocorrelation ranged from -0.9 to 0.9, with various effect sizes. The power to detect the same magnitude of parameters varied considerably, depending on whether the level change, the trend change, or both were tested. The relationships of power with sample size and with the parameter values differed between the two models.
CONCLUSION: This article provides a convenient tool to allow investigators to generate sample sizes that will ensure sufficient statistical power when the ITS study design of count outcomes is implemented.
© 2019 The Authors.


Keywords:  Count outcomes; Interrupted time series; Policy evaluation; Power; Quasi-experimental design; Sample size calculation; Segmented regression

Year:  2019        PMID: 31886433      PMCID: PMC6920506          DOI: 10.1016/j.conctc.2019.100474

Source DB:  PubMed          Journal:  Contemp Clin Trials Commun        ISSN: 2451-8654


Introduction

Interrupted time series (ITS) analysis is a strong quasi-experimental design that can be used to evaluate the effectiveness of a population-level intervention introduced at a clearly defined time point ([[1], [2], [3]]). ITS designs usually involve repeatedly collecting a particular aggregate-level outcome pre- and post-intervention ([4,5]). The segmented time series regression model ([2]) with one discontinuity time point is the general tool used to evaluate such data, in which each segment can have a different level, trend, or both. That is, two line segments are fitted simultaneously and separated at the intervention time point. A change in the “level” of the outcome is indicated by a discontinuity at the time point when the intervention was introduced, and a change in the “trend” is revealed by a change of slope. Statistical hypothesis tests [6] are typically used to detect changes in the outcome after the implementation of the intervention. ITS is typically used when randomized trials are infeasible and has been extensively used in evaluating public health and health service interventions ([3,7]). The assumptions and advantages of using ITS analysis have been thoroughly discussed ([8,9]). Although most studies have focused on the aggregate-level single-arm ITS design, the two-arm ITS design ([4]) and individual-level ITS models ([10]) have also been discussed. Modeling a time series of observed count data is a more challenging task than creating time series models for continuous data. Unlike a normal time series of continuous data, according to Jung et al. [11], a potential model for a time series of count data must be able to characterize both the dependence structure and the overdispersion of the data.
Several models have been proposed and categorized into two types ([12]): observation-driven models, which incorporate a lagged term of the outcome in the conditional mean; and parameter-driven models, driven by a latent dynamic process, which are reviewed by Cameron and Trivedi [13]. That is, observation-driven models directly relate the conditional mean of the current count to historical data, whereas parameter-driven models can be considered generalized linear models (GLMs) with a pre-specified dependence structure. Among the observation-driven models, the two most commonly used are the generalized linear autoregressive moving average (GLARMA) model and the log-linear (LL) model. The GLARMA model was proposed by Shephard [14] and Davis et al. [15], and the LL model, first proposed by Zeger and Qaqish [16], has been further investigated by Fokianos and Fried [17,18], Woodard, Matteson, and Henderson [19], and Douc, Doukhan, and Moulines [20]. Further discussion of theoretical properties, such as the stationarity and ergodicity of the GLARMA and LL models, can be found in Dunsmuir and Scott [21] and Liboschik et al. [22]. The most common parameter-driven model is the Zeger model [23]. Considering a Gaussian linear process for the conditional mean of the outcome, the Zeger model was studied by Zeger [23] and Davis et al. [24]. Its equivalent logarithmic form was studied by Chan and Ledolter [25], Kuk and Cheng [26], Jung and Liesenfeld [27], and Jung and Tremayne [28]. Though count outcomes are common in policy research, the ITS design for count outcomes has made only limited appearances in the literature. For instance, Walter et al. [29] modeled injury count data using the negative binomial log-linear model and fit the model by maximum likelihood.
Wang, Olivier, and Grzebieta [30] considered the same model and compared the estimation performance of the maximum likelihood estimator, the full Bayesian estimator, and the empirical Bayesian estimator via simulation. However, the power of the statistical tests in ITS analyses with count data has never been studied. To address this gap, in this manuscript we conducted simulations to estimate the power and sample sizes in various settings. Here, we only considered the most basic two-phase single-arm ITS design for count outcomes; more complicated three-phase two-arm models are beyond the scope of this paper. A similar study on the two-phase ITS design for continuous outcomes was conducted by Zhang, Wagner, and Ross-Degnan [6]. Herein, we solely focus on observation-driven models for a time series of count data, in particular the LL models. We consider only observation-driven models because they are designed to allow the likelihood to be evaluated easily, whereas parameter-driven models usually involve high-dimensional integration, which is computationally infeasible [15].

Exemplar study: Strengthening Translational Research in Diverse Enrollment (STRIDE) study

The power and sample size calculation for the ITS design of count outcomes was motivated by the required statistical analysis of data generated from the STRIDE study, an ongoing five-year study aimed at developing an intervention to increase the engagement of African Americans and Latinos in translational research ([31]). Because the primary outcome of the study is the number of African Americans and Latinos enrolled in ongoing translational clinical trials, enrollment intended to mitigate their historical underrepresentation in translational research, the STRIDE study is a representative example of the ITS design for count data. The STRIDE project is a partnership of the CTSAs (Clinical and Translational Science Awards program) at the University of Massachusetts Medical School, the University of Alabama at Birmingham, and Vanderbilt University, three geographically diverse sites with large African American and Latino populations. The STRIDE intervention was motivated by previous studies that exposed barriers to research participation ([[32], [33], [34]]). Participant-level and systemic barriers include limited research literacy, lack of trust stemming from historical abuses, lack of research staff training in appropriate cultural competency skills, and confusing informed consent procedures in research. To overcome these barriers, the proposed multi-level intervention contains three components: (1) storytelling for the promotion of research literacy; (2) simulation-based training to improve culturally appropriate recruitment and informed consent; and (3) an electronic consent platform to enhance cultural competency. The STRIDE intervention builds synergistically on emerging work at each institution to create a new intervention that addresses barriers on multiple levels. The primary outcome of the STRIDE project is the number of African Americans and Latinos recruited, as well as the total recruitment.
To test the effectiveness of the STRIDE intervention, we have recruited ongoing translational clinical studies at each of the three partnering CTSA hubs. Both intervention studies and contemporaneous controls (i.e., clinical trials without the STRIDE intervention) are introduced at each of the CTSA hubs: each participating university layers the STRIDE intervention on one study, with another study serving as the un-intervened control. Thus, using the number of African American and Latino participants recruited, or the total number of participants, as the primary response variable (outcome), the STRIDE intervention will be evaluated by the two-arm ITS design and will include six ongoing translational research studies. Three studies will receive the intervention and comprise the study group, and the remaining three un-intervened studies will comprise the comparison group. The study outcomes are collected on a weekly basis. The change in study outcomes will be examined under a two-phase framework (pre-implementation versus post-implementation).

Methods

Design and analysis of a single-arm ITS study with count outcomes

The STRIDE study has motivated our investigation of a time series study design. In a two-phase ITS study, if all study subjects and sites are planned to be exposed to an intervention over time, then such a study is a single-arm ITS study. Let Y_t represent the count outcome variable measured at time point t; let t be the actual or converted study time (in the simulation, we also considered the logarithm of the actual time to avoid model explosion) from the start to the end of the study; let X_t be a binary indicator for the second phase of the study; and let t0 be the time point of the onset of intervention.
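The design variables just described can be constructed directly. Below is a minimal sketch with hypothetical values (24 weekly observations, intervention at week 13): the phase indicator X_t and the re-centered post-intervention time (t − t0)·X_t are exactly the covariates that enter the segmented models in the next subsection.

```python
# Construct ITS design variables (hypothetical example values).
n, t0 = 24, 13  # 24 weekly observations; intervention begins at week 13

time = list(range(1, n + 1))                                # study time t = 1..n
phase = [1 if t >= t0 else 0 for t in time]                 # phase indicator X_t
trend_change = [(t - t0) * x for t, x in zip(time, phase)]  # (t - t0) * X_t

# The trend-change covariate stays 0 before the intervention and restarts
# at 0 at the intervention point, so the level-change coefficient measures
# the jump at t0 and the trend-change coefficient measures the new slope.
assert phase[:12] == [0] * 12 and phase[12:] == [1] * 12
assert trend_change[12:15] == [0, 1, 2]
```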

Observation-driven model

Here, we give a brief introduction to the modeling framework for the observation-driven segmented regression time series model of count outcomes. For a single-arm ITS design of count outcomes, a common class of observation-driven time series models is built on the logarithm of the conditional mean of the response and can be written as

log(μ_t) = β0 + β1·t + β2·X_t + β3·(t − t0)·X_t + g(Y_{t−1}, …, Y_{t−q}; μ_{t−1}, …, μ_{t−p}),   (1)

where μ_t = E(Y_t | F_{t−1}) is the mean of Y_t conditioning on the past responses and means, the function g(·) joins the current outcome with past outcomes that are correlated in the time series, t is the actual time of the study, t0 is the time point of intervention, X_t is the binary indicator for the second phase of the study, and β0, β1, β2, and β3 are unknown parameters; p and q are non-negative integers less than t. In observation-driven models, the effect of covariates on the outcome or its mean is complicated and difficult to interpret because the conditional mean also depends on past outcomes ([15]). For the ITS design, the coefficient β0 is the regression intercept representing the starting level of the logarithm of the conditional mean, β1 is the slope of the logarithm of the conditional mean before the implementation of the intervention, β2 represents the change in the level of the logarithm of the conditional mean caused by the intervention versus non-intervention, and β3 represents the difference in the slopes of the logarithm of the conditional mean caused by the intervention versus non-intervention. The focus of the ITS analysis is to examine the significance of β2, which indicates an immediate intervention effect on the level change of the conditional mean, and the significance of β3, which indicates the intervention effect in terms of the change in the trend of the conditional mean. Note that the purpose of subtracting t0, the time point of the onset of intervention, from the study time is to maintain the interpretation of the corresponding regression coefficients β2 and β3. A variety of choices for g(·) have been proposed.
For example, when g(Y_{t−1}) = γ1·log(Y*_{t−1}), model (1) is the Zeger–Qaqish model [16], where Y*_{t−1} = max(Y_{t−1}, c) shields the influence of a zero value, with c a positive constant; when g is a linear combination of scaled residuals e_{t−i} = (Y_{t−i} − μ_{t−i})/ν(μ_{t−i}), where ν(·) is some scaling function of μ_t, the model is a generalized linear autoregressive moving average (GLARMA) model [14]; when g is a linear combination of the logarithms of past conditional means, log(μ_{t−i}), and past observations, log(Y_{t−j} + 1), it is a log-linear (LL) model. Here, we will focus on LL models with low orders, i.e., small values of p and q. Specifically, we model the time series of counts via the LL model with p = 0 and q = 1, denoted by LL(0,1), which has the form

log(μ_t) = β0 + β1·t + β2·X_t + β3·(t − t0)·X_t + γ1·log(Y_{t−1} + 1),   (2)

where the logarithm of the mean depends linearly on the logarithm of the last observation, positively or negatively according to the sign of γ1. Since logarithm functions are used in this model, it is hard to develop formulas for the mean or the autocovariance function of Y_t or μ_t. The most commonly used distribution for count data is the Poisson distribution, in which the conditional distribution of the response on the past history is Y_t | F_{t−1} ~ Poisson(μ_t), with density

P(Y_t = y | F_{t−1}) = e^{−μ_t} μ_t^y / y!,   y = 0, 1, 2, ….

The Poisson distribution is simple and popular. However, the Poisson distribution is known to have equal mean and variance, which can be unrealistic in some settings. A more appropriate and flexible model for count data with larger overdispersion than Poisson (i.e., with greater variability) is the negative binomial distribution. Denoting the conditional distribution of the response on the past history by Y_t | F_{t−1} ~ NB(μ_t, φ), the density function for the negative binomial can be expressed as

P(Y_t = y | F_{t−1}) = [Γ(y + φ) / (Γ(φ) y!)] · (φ/(φ + μ_t))^φ · (μ_t/(φ + μ_t))^y,   y = 0, 1, 2, …,

where φ > 0 is the overdispersion parameter, with conditional variance μ_t + μ_t²/φ. For many observation-driven models of count time series, the stationarity and ergodicity of the process, which are used to develop consistency and asymptotic normality, have been discussed only partially in some special and simple scenarios, and the majority remain unclear. For Poisson responses with a constant regression component, model (2) has a stationary distribution when |γ1| < 1.
More discussion on the stationarity and ergodicity of GLARMA and LL models can be found in Dunsmuir and Scott [21] and Liboschik et al. [22].
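Model (2) is straightforward to simulate forward in time. The sketch below is a minimal illustration, not the authors' program: it assumes the Poisson conditional distribution and hypothetical parameter values, and (as in the simulations described in this paper) uses the logarithm of study time to avoid model explosion.

```python
import math
import random

def simulate_ll01(n, t0, beta, gamma1, seed=1):
    """Simulate a Poisson LL(0,1) count series:
    log(mu_t) = b0 + b1*log(t) + b2*X_t + b3*(t - t0)*X_t + gamma1*log(y_{t-1} + 1).
    Parameter values passed in here are illustrative, not the paper's settings."""
    rng = random.Random(seed)
    b0, b1, b2, b3 = beta
    y_prev, series = 0, []
    for t in range(1, n + 1):
        x = 1 if t >= t0 else 0
        log_mu = (b0 + b1 * math.log(t) + b2 * x
                  + b3 * (t - t0) * x + gamma1 * math.log(y_prev + 1))
        mu = math.exp(log_mu)
        # Poisson draw by CDF inversion (adequate for moderate mu)
        u, p, k = rng.random(), math.exp(-mu), 0
        cdf = p
        while u > cdf:
            k += 1
            p *= mu / k
            cdf += p
        y_prev = k
        series.append(k)
    return series

# Hypothetical example: 48 points, intervention at t = 25, moderate feedback.
y = simulate_ll01(n=48, t0=25, beta=(1.0, 0.1, 0.5, 0.05), gamma1=0.3)
```

Because the feedback term enters through log(Y_{t−1} + 1), the simulated series stays stable for |γ1| < 1, consistent with the stationarity condition noted above.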

Simulation-based sample size and power calculation

We used a simulation-based method to calculate the power of different statistical tests under different scenarios (different sample sizes and parameter values) for the two-phase single-arm ITS design of count outcomes. Consider an arbitrary two-sided statistical test of the null hypothesis H0: θ = 0 versus the alternative H1: θ ≠ 0, where θ can be either a univariate regression coefficient or a combination of multiple coefficients defined in Section 3.2. We considered three null hypotheses in our simulation study: (i) H0: β2 = β3 = 0, to test whether any changes (level, trend, or both) exist after intervention; (ii) H0: β2 = 0, to test the change in level after intervention; and (iii) H0: β3 = 0, to test any trend change after intervention. In this simulation-based sample size and power calculation, we considered the logarithm of actual time to avoid model explosion. β2 represented the change in the level of the logarithm of the conditional mean caused by intervention versus non-intervention, and β3 represented the difference in the slopes of the logarithm of the conditional mean caused by intervention versus non-intervention. For these three hypothesis tests, chi-square (Wald) tests were employed as test statistics, and the empirical power of these tests was calculated via simulation. For any statistical test, the power under a pre-specified significance level α is defined as the probability of rejecting the null hypothesis given that the alternative hypothesis is true, i.e., P(reject H0 | H1 is true). Since this probability is generally unknown, we used simulation to estimate the power. For the simulation-based method, a large number of datasets were randomly generated from the ITS model introduced in Section 3.2, with pre-specified non-zero coefficients, and statistical hypothesis tests were conducted for each dataset. Then, the empirical power was estimated as the number of times the null hypothesis was rejected divided by the total number of datasets.
Denoting the number of datasets by N, this estimated power will approach the true power if N is large enough. In our simulation study, we used N = 200 and a significance level of 0.05 for all cases. We considered different scenarios for sample sizes, parameters, and correlation coefficients. For the sample size n, i.e., the number of observations over time, we considered n = 18, 24, 32, 48, 56, 64, 80, and 96, with equal numbers of observations uniformly distributed before and after policy intervention. For the negative binomial distributions, the overdispersion parameter φ was fixed at a pre-specified value, as was the start value Y_0. We considered three hypothesis tests. For hypothesis test (i), we considered different values of β2 + β3, which is the expected level change plus the expected trend change after the intervention, conditioning on the same outcome history. In this case, we chose the parameter values to be ±0.25, ±0.5, and ±1 for both the Poisson and negative binomial time series. For hypothesis test (ii), we considered different values of β2, which is the expected level change caused by the intervention, conditioning on the same outcome history. For this test, with β3 specified to be 0, we chose the values of β2 to be ±0.25, ±0.5, and ±1 for the Poisson time series, and ±1, ±2, and ±3 for the negative binomial time series. For hypothesis test (iii), we considered different values of β3, which is the expected trend change caused by the intervention, conditioning on the same outcome history. For this test, with β2 specified to be 0, we chose the values of β3 to be ±0.01, ±0.05, and ±0.10 for the Poisson time series, along with a corresponding set of values for the negative binomial time series. Negative values for the parameters indicate a “decrease” (in level, trend, or both) after intervention, and positive values indicate an “increase” after intervention. We chose different parameter values for the Poisson and negative binomial models because negative binomial models are usually used to model count data with larger overdispersion than Poisson models.
We also considered different values for the coefficient γ1 in model (2), which represents the degree of dependence between the current conditional mean and historical outcomes. Here, we considered values from −0.9 to 0.9 in steps of 0.2, together with the case γ1 = 0, which represents no correlation.
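The procedure above (generate datasets under the alternative, test each one, report the rejection fraction) can be sketched end to end. The following is a simplified, self-contained illustration rather than the paper's ready-to-use program: it takes the no-correlation case γ1 = 0, in which the LL(0,1) Poisson model reduces to an ordinary Poisson segmented regression (fit here by Newton-Raphson), uses hypothetical parameter values, and estimates the power of the two-sided Wald test of H0: β2 = 0.

```python
import math
import random

def gauss_inverse(A):
    """Invert a small matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        d = M[col][col]
        M[col] = [v / d for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]

def fit_poisson(X, y, iters=30):
    """Newton-Raphson MLE for Poisson regression log(mu) = X @ beta.
    Returns (beta_hat, covariance = inverse Fisher information)."""
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / len(y) + 0.5)   # crude intercept start
    cov = None
    for _ in range(iters):
        eta = [min(sum(b * v for b, v in zip(beta, row)), 25.0) for row in X]
        mu = [math.exp(e) for e in eta]
        U = [sum(X[i][j] * (y[i] - mu[i]) for i in range(len(y))) for j in range(p)]
        I = [[sum(X[i][j] * mu[i] * X[i][k] for i in range(len(y))) for k in range(p)]
             for j in range(p)]
        cov = gauss_inverse(I)
        beta = [b + sum(cov[j][k] * U[k] for k in range(p)) for j, b in enumerate(beta)]
    return beta, cov

def rpois(mu, rng):
    """Poisson draw by CDF inversion; adequate for moderate mu."""
    u, p, k = rng.random(), math.exp(-mu), 0
    cdf = p
    while u > cdf:
        k += 1
        p *= mu / k
        cdf += p
    return k

def estimated_power(n=48, t0=25, beta=(2.0, 0.1, 0.8, 0.0), reps=100, seed=7):
    """Monte Carlo power of the two-sided Wald test of H0: beta2 = 0 (level change),
    with gamma1 = 0 so datasets are independent Poisson segmented regressions."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        rows, y = [], []
        for t in range(1, n + 1):
            x = 1 if t >= t0 else 0
            row = [1.0, math.log(t), float(x), float((t - t0) * x)]
            rows.append(row)
            y.append(rpois(math.exp(sum(b * v for b, v in zip(beta, row))), rng))
        bhat, cov = fit_poisson(rows, y)
        z = bhat[2] / math.sqrt(cov[2][2])   # Wald statistic for beta2
        if abs(z) > 1.959964:                # two-sided test, alpha = 0.05
            rejections += 1
    return rejections / reps

power = estimated_power()
```

With the strong level change assumed here (β2 = 0.8), the estimated power should be close to 1; shrinking β2 or n in `estimated_power()` reproduces the qualitative sample size trends reported in the tables below.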

Results

Table 1 and Table 2 show the estimated power for testing hypothesis (i) for the Poisson and negative binomial time series under model (2), with β2 + β3 = ±0.25, ±0.5, and ±1, based on a significance level of 0.05. The estimated power increased as γ1 increased, as the sample size increased, or as the parameter values became stronger (i.e., as the absolute value of β2 + β3 became greater). The trends of the estimated power across γ1 and sample size are illustrated by the surface plots in Fig. 1.
Table 1

Estimated power for testing hypothesis (i), H0: β2 = β3 = 0, for the Poisson time series with conditional mean model LL(0,1), based on 200 simulated data sets and a statistical significance level of 0.05. The symbol “-” indicates that more than one fourth of the data sets could not be successfully generated.

γ1      n=18   n=24   n=32   n=48   n=56   n=64   n=80   n=96

β2+β3 = −1
−0.9    0.08   0.18   0.33   0.78   0.94   1      1      1
−0.7    0.08   0.14   0.36   0.79   0.94   1      1      1
−0.5    0.11   0.21   0.38   0.82   0.96   1      1      1
−0.3    0.10   0.23   0.44   0.86   0.97   1      1      1
−0.1    0.12   0.27   0.45   0.89   0.98   1      1      1
0       0.14   0.31   0.50   0.92   0.99   1      1      1
0.1     0.15   0.33   0.54   0.93   0.99   1      1      1
0.3     0.20   0.43   0.64   0.99   1      1      1      1
0.5     0.29   0.56   0.79   0.99   1      1      1      1
0.7     0.47   0.70   0.92   1      1      1      1      1
0.9     0.88   1      1      1      1      1      1      1

β2+β3 = −0.5
−0.9    0.05   0.10   0.12   0.28   0.47   0.63   0.94   1
−0.7    0.04   0.10   0.13   0.32   0.49   0.66   0.95   1
−0.5    0.05   0.12   0.15   0.33   0.52   0.69   0.98   1
−0.3    0.07   0.13   0.20   0.38   0.60   0.75   0.99   1
−0.1    0.08   0.15   0.18   0.48   0.66   0.85   1      1
0       0.11   0.13   0.22   0.50   0.69   0.90   1      1
0.1     0.11   0.15   0.25   0.54   0.76   0.92   1      1
0.3     0.13   0.20   0.31   0.66   0.90   0.98   1      1
0.5     0.17   0.27   0.38   0.91   0.97   1      1      1
0.7     0.27   0.45   0.75   1      1      1      1      1
0.9     0.74   0.97   1      1      1      1      1      1

β2+β3 = −0.25
−0.9    0.03   0.07   0.06   0.15   0.15   0.22   0.43   0.71
−0.7    0.04   0.06   0.05   0.15   0.19   0.23   0.54   0.79
−0.5    0.04   0.10   0.07   0.16   0.17   0.26   0.47   0.83
−0.3    0.07   0.10   0.09   0.19   0.17   0.27   0.60   0.86
−0.1    0.05   0.10   0.08   0.17   0.26   0.32   0.65   0.97
0       0.06   0.10   0.09   0.19   0.29   0.36   0.77   0.98
0.1     0.07   0.09   0.14   0.21   0.30   0.36   0.80   1
0.3     0.10   0.13   0.16   0.26   0.38   0.63   0.95   1
0.5     0.13   0.14   0.18   0.39   0.67   0.85   1      1
0.7     0.17   0.25   0.40   0.86   0.99   1      1      1
0.9     0.40   0.69   0.99   1      1      1      -      -

β2+β3 = 0.25
−0.9    0.06   0.06   0.06   0.15   0.17   0.21   0.46   0.74
−0.7    0.04   0.08   0.10   0.13   0.17   0.24   0.45   0.76
−0.5    0.05   0.08   0.07   0.15   0.19   0.26   0.51   0.86
−0.3    0.05   0.10   0.06   0.18   0.18   0.32   0.63   0.89
−0.1    0.07   0.12   0.07   0.17   0.23   0.35   0.69   0.93
0       0.07   0.11   0.09   0.17   0.31   0.35   0.69   0.95
0.1     0.07   0.11   0.11   0.22   0.30   0.40   0.79   0.99
0.3     0.05   0.08   0.14   0.27   0.41   0.55   0.87   1
0.5     0.10   0.13   0.15   0.39   0.63   0.80   1      1
0.7     0.15   0.21   0.36   0.89   0.99   1      1      1
0.9     0.26   0.42   0.97   -      -      -      -      -

β2+β3 = 0.5
−0.9    0.05   0.07   0.11   0.33   0.52   0.71   0.96   1
−0.7    0.04   0.10   0.13   0.35   0.57   0.71   0.93   1
−0.5    0.04   0.11   0.12   0.41   0.55   0.75   0.97   1
−0.3    0.07   0.12   0.17   0.47   0.64   0.86   0.99   1
−0.1    0.09   0.14   0.21   0.46   0.70   0.89   0.99   1
0       0.09   0.12   0.24   0.52   0.76   0.94   1      1
0.1     0.08   0.17   0.29   0.58   0.74   0.95   1      1
0.3     0.13   0.15   0.30   0.69   0.89   0.98   1      1
0.5     0.17   0.27   0.50   0.91   0.98   1      1      1
0.7     0.33   0.55   0.93   1      1      1      1      -
0.9     0.52   0.94   1      -      -      -      -      -

β2+β3 = 1
−0.9    0.15   0.22   0.44   0.91   0.98   1      1      1
−0.7    0.16   0.24   0.48   0.93   0.99   1      1      1
−0.5    0.20   0.29   0.52   0.92   0.99   1      1      1
−0.3    0.21   0.36   0.56   0.96   0.99   1      1      1
−0.1    0.25   0.37   0.66   0.97   1      1      1      1
0       0.32   0.48   0.70   0.99   1      1      1      1
0.1     0.34   0.53   0.77   0.98   1      1      1      1
0.3     0.45   0.69   0.90   1      1      1      1      1
0.5     0.72   0.91   0.99   1      1      0.99   1      1
0.7     0.98   1      1      1      1      -      -      -
0.9     0.97   0.99   -      -      -      -      -      -
Table 2

Estimated power for testing hypothesis (i), H0: β2 = β3 = 0, for the negative binomial time series with conditional mean model LL(0,1), based on 200 simulated data sets and a statistical significance level of 0.05. The symbol “-” indicates that more than one fourth of the data sets could not be successfully generated.

γ1      n=18   n=24   n=32   n=48   n=56   n=64   n=80   n=96

β2+β3 = −1
−0.9    0.26   0.32   0.50   0.89   0.94   0.98   1      1
−0.7    0.27   0.32   0.53   0.89   0.97   1      1      1
−0.5    0.29   0.37   0.57   0.89   0.95   1      1      1
−0.3    0.32   0.41   0.60   0.93   0.97   0.99   1      1
−0.1    0.33   0.49   0.63   0.93   1.00   0.99   1      1
0       0.36   0.49   0.66   0.96   0.99   1      1      1
0.1     0.36   0.54   0.70   0.94   0.99   1      1      1
0.3     0.39   0.59   0.79   0.95   1      1      1      1
0.5     0.54   0.69   0.79   0.98   1      1      1      1
0.7     0.67   0.84   0.91   1      1      1      1      1
0.9     0.84   0.93   0.98   1      1      1      1      1

β2+β3 = −0.5
−0.9    0.26   0.29   0.40   0.62   0.77   0.87   0.97   1
−0.7    0.28   0.31   0.43   0.65   0.79   0.87   0.97   1
−0.5    0.31   0.33   0.46   0.68   0.80   0.88   0.98   1
−0.3    0.33   0.36   0.50   0.72   0.86   0.90   0.97   1
−0.1    0.37   0.40   0.54   0.77   0.86   0.94   0.99   1
0       0.39   0.43   0.54   0.81   0.84   0.93   1      1
0.1     0.43   0.47   0.57   0.82   0.93   0.96   1      1
0.3     0.38   0.55   0.65   0.85   0.94   0.98   1      1
0.5     0.58   0.67   0.79   0.94   1.00   1      1      1
0.7     0.66   0.87   0.88   0.97   0.98   1      1      1
0.9     0.87   0.94   0.99   1      1      1      1      1

β2+β3 = −0.25
−0.9    0.28   0.35   0.40   0.47   0.59   0.66   0.84   0.92
−0.7    0.29   0.37   0.40   0.52   0.55   0.60   0.89   0.95
−0.5    0.33   0.35   0.41   0.51   0.60   0.66   0.87   0.95
−0.3    0.36   0.38   0.47   0.57   0.73   0.77   0.88   0.95
−0.1    0.39   0.41   0.48   0.61   0.66   0.76   0.97   0.98
0       0.39   0.46   0.49   0.70   0.79   0.82   0.90   0.99
0.1     0.43   0.49   0.60   0.70   0.84   0.87   0.95   0.99
0.3     0.46   0.56   0.63   0.80   0.86   0.93   0.97   1
0.5     0.60   0.69   0.82   0.94   0.96   0.99   1      1
0.7     0.76   0.85   0.93   0.98   1.00   0.99   1      1
0.9     0.94   0.99   1      1      1      1      1      1

β2+β3 = 0.25
−0.9    0.33   0.42   0.51   0.63   0.70   0.77   0.84   0.93
−0.7    0.33   0.43   0.49   0.66   0.71   0.81   0.89   0.95
−0.5    0.37   0.47   0.53   0.59   0.72   0.78   0.91   0.95
−0.3    0.42   0.47   0.55   0.67   0.79   0.86   0.94   0.96
−0.1    0.47   0.48   0.60   0.81   0.79   0.89   0.95   1
0       0.51   0.56   0.61   0.82   0.80   0.94   0.97   0.99
0.1     0.59   0.61   0.65   0.85   0.88   0.94   0.99   1
0.3     0.58   0.74   0.85   0.94   0.95   0.99   1      1
0.5     0.80   0.86   0.96   0.98   1      1      1      1
0.7     0.90   0.95   0.98   0.99   1      1      1      1
0.9     0.99   1      1      1      1      1      -      1

β2+β3 = 0.5
−0.9    0.43   0.46   0.60   0.75   0.88   0.94   0.99   1
−0.7    0.44   0.52   0.56   0.75   0.88   0.94   0.99   1
−0.5    0.46   0.55   0.61   0.84   0.87   0.95   0.99   1
−0.3    0.49   0.57   0.66   0.83   0.92   0.97   0.98   1
−0.1    0.48   0.65   0.76   0.91   0.97   0.99   1      1
0       0.57   0.69   0.76   0.90   0.97   1      1      1
0.1     0.66   0.71   0.80   0.96   0.99   0.99   1      1
0.3     0.78   0.86   0.90   0.99   1      1      1      1
0.5     0.83   0.92   0.97   0.99   1      1      1      1
0.7     0.96   0.99   1      1      1      1      1      -
0.9     0.99   1      1      -      -      -      -      -

β2+β3 = 1
−0.9    0.58   0.69   0.82   0.96   1      0.99   1      1
−0.7    0.62   0.67   0.85   0.98   1      1      1      1
−0.5    0.64   0.73   0.90   1.00   1      1      1      1
−0.3    0.62   0.80   0.86   0.99   1      1      1      1
−0.1    0.74   0.84   0.94   1      1      1      1      1
0       0.75   0.84   0.95   0.99   1      1      1      1
0.1     0.82   0.91   0.98   1      1      1      1      1
0.3     0.89   0.95   0.99   1      1      1      1      1
0.5     0.94   1      1      1      1      1      1      1
0.7     0.99   1      1      1      1      1      1      -
0.9     1      1      1      1      1      1      -      -
Fig. 1

Surface plots of the estimated power for hypothesis test (i) as a function of γ1 and sample size n. The left panel is for the Poisson time series; the right panel is for the negative binomial time series.

Table 3 and Table 4 show the estimated power for testing hypothesis (ii) for the Poisson and negative binomial time series under model (2), with pre-specified values of the level-change parameter β2, based on a significance level of 0.05. We considered β2 = ±0.25, ±0.5, and ±1 for the Poisson time series in Table 3, and β2 = ±1, ±2, and ±3 for the negative binomial time series in Table 4. For the Poisson models, the estimated power increased as γ1 increased, as the sample size increased, or as the values of the parameter became stronger. For the negative binomial models, the results were similar to those of the Poisson models, but the estimated power decreased for very large values of γ1. The trends of the estimated power across γ1 and sample size are illustrated by the surface plots in Fig. 2.
Table 3

Estimated power for testing hypothesis (ii), H0: β2 = 0, for the Poisson time series with conditional mean model LL(0,1), based on 200 simulated data sets and a statistical significance level of 0.05. The symbol “-” indicates that more than one fourth of the data sets could not be successfully generated.

γ1      n=18   n=24   n=32   n=48   n=56   n=64   n=80   n=96

β2 = −1
−0.9    0.05   0.16   0.21   0.49   0.67   0.74   0.93   0.99
−0.7    0.06   0.17   0.25   0.53   0.65   0.77   1.00   0.99
−0.5    0.07   0.18   0.29   0.56   0.72   0.80   0.97   1
−0.3    0.10   0.24   0.30   0.60   0.77   0.87   0.98   1
−0.1    0.15   0.30   0.37   0.69   0.82   0.93   1      1
0       0.15   0.31   0.38   0.75   0.87   0.98   1      1
0.1     0.16   0.33   0.43   0.82   0.90   0.97   1      1
0.3     0.23   0.39   0.57   0.90   0.99   1      1      1
0.5     0.30   0.49   0.75   0.99   1      1      1      1
0.7     0.46   0.67   0.95   1      1      1      1      1
0.9     0.68   0.98   1      1      1      1      -      -

β2 = −0.5
−0.9    0.04   0.09   0.07   0.21   0.25   0.33   0.45   0.63
−0.7    0.05   0.10   0.09   0.20   0.25   0.35   0.45   0.69
−0.5    0.05   0.12   0.12   0.25   0.26   0.44   0.59   0.77
−0.3    0.07   0.13   0.15   0.29   0.31   0.45   0.65   0.86
−0.1    0.11   0.18   0.12   0.35   0.42   0.52   0.77   0.89
0       0.11   0.17   0.16   0.39   0.46   0.55   0.80   0.99
0.1     0.12   0.19   0.20   0.43   0.46   0.66   0.88   1
0.3     0.15   0.23   0.25   0.60   0.73   0.87   0.98   1
0.5     0.20   0.31   0.39   0.84   0.94   1      1      1
0.7     0.22   0.41   0.75   1      1      1      1      1
0.9     0.56   0.95   1      1      1      -      -      -

β2 = −0.25
−0.9    0.03   0.05   0.06   0.12   0.13   0.11   0.16   0.22
−0.7    0.03   0.07   0.06   0.13   0.11   0.16   0.17   0.20
−0.5    0.04   0.08   0.07   0.14   0.14   0.14   0.19   0.25
−0.3    0.05   0.08   0.06   0.14   0.14   0.17   0.25   0.40
−0.1    0.07   0.11   0.07   0.16   0.20   0.17   0.29   0.47
0       0.07   0.11   0.09   0.18   0.20   0.28   0.34   0.54
0.1     0.09   0.13   0.10   0.14   0.22   0.27   0.43   0.63
0.3     0.12   0.17   0.12   0.22   0.33   0.45   0.60   0.93
0.5     0.13   0.15   0.25   0.43   0.56   0.74   0.98   1
0.7     0.18   0.27   0.38   0.86   0.98   1      1      1
0.9     0.33   0.72   0.99   1      -      -      -      -

β2 = 0.25
−0.9    0.04   0.06   0.06   0.09   0.10   0.13   0.13   0.28
−0.7    0.04   0.09   0.05   0.11   0.11   0.17   0.17   0.23
−0.5    0.04   0.08   0.07   0.11   0.14   0.15   0.26   0.28
−0.3    0.05   0.10   0.08   0.12   0.19   0.16   0.31   0.34
−0.1    0.05   0.08   0.07   0.10   0.13   0.23   0.32   0.48
0       0.07   0.10   0.10   0.15   0.17   0.19   0.35   0.50
0.1     0.06   0.10   0.10   0.13   0.17   0.28   0.49   0.72
0.3     0.09   0.11   0.10   0.25   0.40   0.45   0.76   0.97
0.5     0.12   0.16   0.24   0.42   0.71   0.81   0.99   1
0.7     0.13   0.27   0.49   0.94   0.99   1      1      1
0.9     0.10   0.64   1      -      -      -      -      -

β2 = 0.5
−0.9    0.06   0.12   0.10   0.22   0.28   0.33   0.54   0.66
−0.7    0.07   0.12   0.13   0.25   0.39   0.46   0.58   0.68
−0.5    0.07   0.13   0.14   0.30   0.35   0.48   0.63   0.84
−0.3    0.12   0.12   0.15   0.35   0.36   0.52   0.79   0.94
−0.1    0.11   0.17   0.22   0.44   0.50   0.64   0.91   1
0       0.11   0.165  0.21   0.46   0.53   0.73   0.92   1
0.1     0.13   0.20   0.20   0.50   0.67   0.78   0.97   1
0.3     0.16   0.21   0.43   0.79   0.86   0.96   0.99   1
0.5     0.23   0.39   0.65   0.97   1      1      1      1
0.7     0.43   0.72   0.95   1      1      1      1      1
0.9     0.80   1      0.99   -      -      -      -      -

β2 = 1
−0.9    0.22   0.32   0.47   0.81   0.89   0.94   0.99   1
−0.7    0.24   0.35   0.51   0.84   0.89   0.95   1      1
−0.5    0.27   0.42   0.54   0.84   0.94   0.99   1      1
−0.3    0.33   0.46   0.70   0.90   0.97   1      1      1
−0.1    0.36   0.51   0.73   0.97   0.99   1      1      1
0       0.42   0.59   0.80   0.98   0.99   1      1      1
0.1     0.48   0.66   0.82   1      1      1      1      1
0.3     0.64   0.83   0.96   1      1      1      1      1
0.5     0.79   0.97   1      1      1      1      1      1
0.7     0.99   1      0.97   1      1      1      1      1
0.9     0.99   0.97   0.98   -      -      -      -      -
Table 4

Estimated power for testing hypothesis (ii), H0: β2 = 0, for the negative binomial time series with conditional mean model LL(0,1), based on 200 simulated data sets and a statistical significance level of 0.05. The symbol “-” indicates that more than one fourth of the data sets could not be successfully generated.

γ1      n=18   n=24   n=32   n=48   n=56   n=64   n=80   n=96

β2 = −3
−0.9    0.02   0.05   0.32   0.91   0.96   1      1      1
−0.7    0.02   0.05   0.32   0.91   0.97   1      1      1
−0.5    0.03   0.05   0.34   0.94   0.97   1      1      1
−0.3    0.03   0.07   0.36   0.93   1      1      1      1
−0.1    0.03   0.08   0.40   0.94   0.99   1      1      1
0       0.03   0.08   0.42   0.93   0.99   1      1      1
0.1     0.03   0.11   0.43   0.93   0.99   1      1      1
0.3     0.02   0.21   0.51   0.93   1      1      1      1
0.5     0.07   0.22   0.53   0.97   1      0.99   1      1
0.7     0.11   0.40   0.66   0.95   0.96   0.96   0.94   0.87
0.9     0.22   0.50   0.66   -      -      -      -      -

β2 = −2
−0.9    0.04   0.15   0.34   0.76   0.86   0.89   0.98   1
−0.7    0.05   0.18   0.39   0.78   0.87   0.94   0.99   1
−0.5    0.06   0.20   0.38   0.83   0.85   0.94   0.99   1
−0.3    0.07   0.21   0.42   0.84   0.96   0.95   0.98   1
−0.1    0.07   0.22   0.47   0.86   0.95   0.98   1      1
0       0.07   0.24   0.50   0.84   0.95   0.97   1      1
0.1     0.09   0.28   0.52   0.88   0.94   0.98   0.99   1
0.3     0.12   0.34   0.58   0.87   0.93   0.98   1      1
0.5     0.13   0.43   0.57   0.90   0.95   0.98   0.98   0.96
0.7     0.25   0.50   0.67   0.93   0.92   0.92   0.90   0.54
0.9     0.35   0.52   0.61   -      -      -      -      -

β2 = −1
−0.9    0.06   0.11   0.20   0.34   0.36   0.45   0.57   0.65
−0.7    0.07   0.14   0.18   0.33   0.39   0.47   0.61   0.66
−0.5    0.09   0.17   0.22   0.36   0.37   0.55   0.62   0.69
−0.3    0.10   0.19   0.22   0.34   0.46   0.52   0.66   0.73
−0.1    0.13   0.19   0.22   0.40   0.51   0.53   0.64   0.72
0       0.14   0.20   0.26   0.45   0.49   0.63   0.70   0.82
0.1     0.13   0.23   0.26   0.42   0.52   0.58   0.72   0.73
0.3     0.11   0.24   0.31   0.49   0.60   0.63   0.72   0.67
0.5     0.20   0.29   0.35   0.51   0.59   0.63   0.49   0.38
0.7     0.17   0.40   0.40   0.50   0.52   0.36   0.09   0.02
0.9     0.31   0.31   0.35   -      -      -      -      -

β2 = 1
−0.9    0.15   0.23   0.28   0.42   0.44   0.52   0.56   0.71
−0.7    0.20   0.21   0.29   0.40   0.47   0.47   0.66   0.73
−0.5    0.18   0.24   0.29   0.41   0.46   0.52   0.66   0.77
−0.3    0.21   0.26   0.39   0.49   0.54   0.62   0.69   0.78
−0.1    0.17   0.32   0.39   0.56   0.59   0.62   0.72   0.82
0       0.27   0.35   0.38   0.54   0.63   0.69   0.70   0.81
0.1     0.24   0.27   0.39   0.56   0.61   0.59   0.73   0.80
0.3     0.29   0.39   0.40   0.63   0.62   0.68   0.72   0.71
0.5     0.32   0.42   0.56   0.62   0.62   0.67   0.60   0.58
0.7     0.42   0.49   0.61   0.58   0.52   0.50   -      -
0.9     0.23   0.27   -      -      -      -      -      -

β2 = 2
−0.9    0.53   0.64   0.75   0.88   0.94   0.93   0.98   0.99
−0.7    0.50   0.67   0.80   0.94   0.95   0.98   0.99   1
−0.5    0.58   0.67   0.81   0.94   0.96   0.99   1      1
−0.3    0.58   0.72   0.84   0.98   0.99   0.98   1      1
−0.1    0.67   0.77   0.86   0.98   1.00   0.99   1      1
0       0.67   0.78   0.89   0.98   0.99   0.99   1      1
0.1     0.67   0.79   0.93   0.99   0.97   0.98   0.99   0.97
0.3     0.79   0.82   0.88   0.92   0.97   0.97   0.94   0.93
0.5     0.78   0.88   0.84   0.85   0.85   0.84   0.77   0.73
0.7     0.70   0.78   0.74   0.74   -      -      -      -
0.9     -      -      -      -      -      -      -      -

β2 = 3
−0.9    0.77   0.92   0.95   0.99   0.99   1      1      1
−0.7    0.85   0.87   0.98   0.99   1      1      0.99   1
−0.5    0.82   0.95   0.99   1      1      1      1      1
−0.3    0.89   0.98   1.00   1      1      1      1      1
−0.1    0.89   0.97   1.00   1      1      1      1      1
0       0.93   0.96   0.99   1      1.00   1      1      1
0.1     0.92   0.98   0.99   1.00   1.00   1      1      1
0.3     0.93   0.97   0.95   0.96   0.97   0.97   0.96   0.97
0.5     0.89   0.94   0.93   0.88   0.91   0.86   0.83   -
0.7     0.77   0.79   0.71   -      -      -      -      -
0.9     -      -      -      -      -      -      -      -
Fig. 2

Surface plots of the estimated power for hypothesis test (ii) as a function of γ1 and sample size n. The left panel is for the Poisson time series; the right panel is for the negative binomial time series.

Table 5 and Table 6 show the estimated power for testing hypothesis (iii) for the Poisson and negative binomial time series under model (2), with pre-specified values of the trend-change parameter β3, based on a significance level of 0.05. We considered β3 = ±0.01, ±0.05, and ±0.10 for the Poisson time series in Table 5, along with a corresponding set of values for the negative binomial time series in Table 6. Similar to the previous test, for the Poisson time series the estimated power increased as γ1 increased, as the sample size increased, or as the values of the parameter became stronger. For the negative binomial time series, again, the estimated power first increased and then decreased as γ1 increased; this phenomenon is more clearly observed for large values of the parameter. Further, when the value of the parameter was negative, the estimated power first increased and then decreased as the value of the parameter decreased. The difference in the estimated power between parameter values of opposite signs arises because count data have non-negative support, so the models are built on the logarithm of the conditional mean of the responses. The trends of the estimated power across γ1 and sample size are illustrated by the surface plots in Fig. 3.
Table 5

Estimated power for testing the trend-change parameter β3 for the Poisson time series with conditional mean model LL(0,1), based on 200 simulated data sets and a statistical significance level of 0.05. The symbol "–" indicates that more than one fourth of the data sets could not be successfully generated.

γ1   | n=18 | n=24 | n=32 | n=48 | n=56 | n=64 | n=80 | n=96

β3 = −0.10
−0.9 | 0.07 | 0.10 | 0.23 | 0.68 | 0.93 | 0.98 | 1    | 1
−0.7 | 0.08 | 0.11 | 0.26 | 0.76 | 0.97 | 1    | 1    | 1
−0.5 | 0.09 | 0.13 | 0.27 | 0.80 | 0.97 | 1    | 1    | 1
−0.3 | 0.09 | 0.12 | 0.27 | 0.82 | 0.99 | 1    | 1    | 1
−0.1 | 0.09 | 0.17 | 0.32 | 0.90 | 0.99 | 1    | 1    | 1
0    | 0.12 | 0.17 | 0.34 | 0.94 | 1    | 1    | 1    | 1
0.1  | 0.13 | 0.17 | 0.38 | 0.96 | 1    | 1    | 1    | 1
0.3  | 0.15 | 0.24 | 0.51 | 0.98 | 1    | 1    | 1    | 1
0.5  | 0.26 | 0.37 | 0.69 | 1    | 1    | 1    | 1    | 1
0.7  | 0.38 | 0.53 | 0.93 | 1    | 1    | 1    | 1    | 1
0.9  | 0.72 | 0.93 | 1    | 1    | 1    | 1    | 1    | –

β3 = −0.05
−0.9 | 0.05 | 0.07 | 0.11 | 0.24 | 0.49 | 0.69 | 0.94 | 1
−0.7 | 0.06 | 0.08 | 0.11 | 0.27 | 0.51 | 0.75 | 0.98 | 1
−0.5 | 0.09 | 0.08 | 0.14 | 0.31 | 0.55 | 0.77 | 0.99 | 1
−0.3 | 0.05 | 0.07 | 0.17 | 0.35 | 0.60 | 0.84 | 0.99 | 1
−0.1 | 0.06 | 0.08 | 0.15 | 0.40 | 0.67 | 0.89 | 1    | 1
0    | 0.08 | 0.10 | 0.18 | 0.47 | 0.75 | 0.92 | 1    | 1
0.1  | 0.07 | 0.11 | 0.19 | 0.55 | 0.84 | 0.96 | 1    | 1
0.3  | 0.09 | 0.12 | 0.24 | 0.71 | 0.95 | 1    | 1    | 1
0.5  | 0.16 | 0.18 | 0.35 | 0.87 | 1    | 1    | 1    | 1
0.7  | 0.25 | 0.41 | 0.68 | 1    | 1    | 1    | 1    | 1
0.9  | 0.43 | 0.79 | 1    | 0.99 | 1    | 1    | 1    | –

β3 = −0.01
−0.9 | 0.05 | 0.07 | 0.07 | 0.06 | 0.05 | 0.05 | 0.13 | 0.19
−0.7 | 0.05 | 0.09 | 0.05 | 0.09 | 0.07 | 0.09 | 0.15 | 0.19
−0.5 | 0.06 | 0.07 | 0.06 | 0.06 | 0.08 | 0.09 | 0.20 | 0.21
−0.3 | 0.05 | 0.09 | 0.08 | 0.07 | 0.07 | 0.11 | 0.16 | 0.33
−0.1 | 0.07 | 0.09 | 0.06 | 0.07 | 0.08 | 0.11 | 0.14 | 0.40
0    | 0.08 | 0.09 | 0.08 | 0.08 | 0.09 | 0.12 | 0.19 | 0.38
0.1  | 0.08 | 0.10 | 0.10 | 0.09 | 0.10 | 0.16 | 0.26 | 0.48
0.3  | 0.09 | 0.09 | 0.08 | 0.12 | 0.11 | 0.19 | 0.34 | 0.72
0.5  | 0.12 | 0.09 | 0.09 | 0.17 | 0.21 | 0.28 | 0.66 | 0.95
0.7  | 0.14 | 0.15 | 0.14 | 0.24 | 0.42 | 0.70 | 1    | 0.99
0.9  | 0.12 | 0.14 | 0.13 | 0.95 | –    | –    | –    | –

β3 = 0.01
−0.9 | 0.07 | 0.07 | 0.06 | 0.08 | 0.10 | 0.08 | 0.14 | 0.19
−0.7 | 0.06 | 0.09 | 0.09 | 0.07 | 0.07 | 0.15 | 0.13 | 0.28
−0.5 | 0.06 | 0.10 | 0.07 | 0.10 | 0.06 | 0.11 | 0.19 | 0.25
−0.3 | 0.05 | 0.11 | 0.06 | 0.09 | 0.09 | 0.12 | 0.12 | 0.31
−0.1 | 0.09 | 0.10 | 0.09 | 0.09 | 0.07 | 0.14 | 0.18 | 0.37
0    | 0.10 | 0.11 | 0.09 | 0.11 | 0.07 | 0.12 | 0.24 | 0.47
0.1  | 0.08 | 0.12 | 0.10 | 0.09 | 0.12 | 0.15 | 0.34 | 0.50
0.3  | 0.08 | 0.13 | 0.09 | 0.12 | 0.15 | 0.21 | 0.41 | 0.64
0.5  | 0.11 | 0.10 | 0.12 | 0.16 | 0.25 | 0.36 | 0.64 | 0.90
0.7  | 0.13 | 0.12 | 0.17 | 0.35 | 0.47 | 0.61 | 0.80 | 0.99
0.9  | 0.16 | 0.11 | 0.06 | –    | –    | –    | –    | –

β3 = 0.05
−0.9 | 0.08 | 0.10 | 0.16 | 0.45 | 0.50 | 0.78 | 0.98 | 1
−0.7 | 0.09 | 0.11 | 0.20 | 0.39 | 0.55 | 0.83 | 0.99 | 1
−0.5 | 0.09 | 0.12 | 0.20 | 0.49 | 0.66 | 0.82 | 1    | 1
−0.3 | 0.10 | 0.14 | 0.23 | 0.46 | 0.73 | 0.88 | 0.99 | 1
−0.1 | 0.12 | 0.15 | 0.17 | 0.58 | 0.78 | 0.94 | 1    | 1
0    | 0.13 | 0.18 | 0.24 | 0.61 | 0.79 | 0.97 | 1    | 1
0.1  | 0.13 | 0.18 | 0.25 | 0.71 | 0.88 | 0.98 | 1    | 1
0.3  | 0.12 | 0.16 | 0.33 | 0.82 | 0.94 | 0.98 | 1    | 1
0.5  | 0.13 | 0.22 | 0.49 | 0.87 | 0.97 | 0.99 | 0.99 | 0.97
0.7  | 0.22 | 0.43 | 0.67 | 0.98 | 0.99 | 0.99 | 0.98 | –
0.9  | 0.22 | 0.30 | 0.83 | –    | –    | –    | –    | –

β3 = 0.10
−0.9 | 0.13 | 0.19 | 0.36 | 0.90 | 0.99 | 1    | 1    | 1
−0.7 | 0.17 | 0.18 | 0.47 | 0.92 | 0.99 | 1    | 1    | 1
−0.5 | 0.16 | 0.23 | 0.43 | 0.95 | 1    | 1    | 1    | 1
−0.3 | 0.17 | 0.27 | 0.50 | 0.95 | 1    | 1    | 1    | 1
−0.1 | 0.19 | 0.25 | 0.53 | 0.97 | 1    | 1    | 1    | 1
0    | 0.17 | 0.31 | 0.56 | 0.97 | 1    | 1    | 1    | 1
0.1  | 0.20 | 0.32 | 0.64 | 0.99 | 1    | 1    | 1    | 1
0.3  | 0.19 | 0.40 | 0.75 | 0.98 | 1    | 1    | 1    | 1
0.5  | 0.29 | 0.58 | 0.84 | 1    | 1    | 1    | 0.99 | 1
0.7  | 0.48 | 0.71 | 0.88 | 0.98 | 0.98 | –    | –    | –
0.9  | 0.38 | 0.90 | 0.89 | –    | –    | –    | –    | –
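As a usage illustration (not part of the article), a power table of this form can be scanned for the smallest tabulated sample size that reaches a target power; the row below is transcribed from the γ1 = 0 row of the β3 = −0.10 block of Table 5.

```python
SAMPLE_SIZES = [18, 24, 32, 48, 56, 64, 80, 96]

def min_n_for_power(sizes, powers, target=0.80):
    """Return the smallest tabulated sample size whose estimated power
    reaches the target; None if no tabulated size suffices.
    Cells marked '-' (generation failures) are skipped."""
    for n, p in zip(sizes, powers):
        if p != "-" and p >= target:
            return n
    return None

# gamma1 = 0 row of the beta3 = -0.10 block of Table 5
row = [0.12, 0.17, 0.34, 0.94, 1, 1, 1, 1]
print(min_n_for_power(SAMPLE_SIZES, row))  # prints 48
```

Because the tabulated sizes are coarse, the true minimum lies between the returned size and the previous column; a finer simulation grid would narrow it.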
Table 6

Estimated power for testing the trend-change parameter β3 for the negative binomial time series with conditional mean model LL(0,1), based on 200 simulated data sets and a statistical significance level of 0.05. The symbol "–" indicates that more than one fourth of the data sets could not be successfully generated.

γ1   | n=18 | n=24 | n=32 | n=48 | n=56 | n=64 | n=80 | n=96

β3 = −0.25
−0.9 | 0.07 | 0.15 | 0.41 | 0.91 | 0.96 | 0.96 | 0.97 | 0.98
−0.7 | 0.08 | 0.15 | 0.41 | 0.94 | 0.95 | 0.96 | 0.97 | 0.99
−0.5 | 0.12 | 0.19 | 0.39 | 0.95 | 0.95 | 0.95 | 0.95 | 0.98
−0.3 | 0.13 | 0.20 | 0.44 | 0.94 | 0.95 | 0.96 | 0.95 | 0.95
−0.1 | 0.13 | 0.24 | 0.43 | 0.95 | 0.95 | 0.97 | 0.98 | 0.95
0    | 0.15 | 0.25 | 0.46 | 0.93 | 0.94 | 0.95 | 0.97 | 0.95
0.1  | 0.15 | 0.26 | 0.45 | 0.95 | 0.94 | 0.97 | 0.92 | 0.92
0.3  | 0.18 | 0.31 | 0.51 | 0.94 | 0.95 | 0.93 | 0.94 | 0.89
0.5  | 0.26 | 0.43 | 0.60 | 0.89 | 0.88 | 0.86 | 0.82 | 0.74
0.7  | 0.32 | 0.51 | 0.65 | 0.68 | 0.66 | 0.64 | 0.55 | 0.48
0.9  | 0.39 | 0.50 | 0.57 | 0.39 | 0.44 | 0.35 | 0.16 | 0.18

β3 = −0.10
−0.9 | 0.08 | 0.09 | 0.19 | 0.51 | 0.73 | 0.86 | 1    | 1
−0.7 | 0.08 | 0.10 | 0.20 | 0.59 | 0.78 | 0.89 | 0.99 | 1
−0.5 | 0.09 | 0.12 | 0.19 | 0.55 | 0.80 | 0.92 | 1    | 1
−0.3 | 0.10 | 0.11 | 0.23 | 0.60 | 0.82 | 0.95 | 1    | 1
−0.1 | 0.08 | 0.14 | 0.24 | 0.61 | 0.83 | 0.93 | 1    | 1
0    | 0.09 | 0.15 | 0.23 | 0.70 | 0.85 | 0.95 | 1    | 1
0.1  | 0.11 | 0.18 | 0.27 | 0.72 | 0.84 | 0.99 | 1    | 1
0.3  | 0.10 | 0.15 | 0.28 | 0.71 | 0.89 | 0.99 | 1    | 1
0.5  | 0.17 | 0.26 | 0.37 | 0.73 | 0.87 | 0.95 | 1    | 0.99
0.7  | 0.21 | 0.28 | 0.37 | 0.70 | 0.74 | 0.74 | 0.73 | 0.66
0.9  | 0.30 | 0.36 | 0.34 | 0.39 | –    | –    | –    | –

β3 = −0.05
−0.9 | 0.09 | 0.09 | 0.10 | 0.18 | 0.27 | 0.36 | 0.71 | 0.89
−0.7 | 0.06 | 0.09 | 0.10 | 0.16 | 0.28 | 0.45 | 0.72 | 0.93
−0.5 | 0.09 | 0.08 | 0.10 | 0.21 | 0.28 | 0.44 | 0.72 | 0.93
−0.3 | 0.09 | 0.10 | 0.12 | 0.26 | 0.37 | 0.49 | 0.82 | 0.93
−0.1 | 0.09 | 0.12 | 0.12 | 0.27 | 0.40 | 0.54 | 0.79 | 1.00
0    | 0.10 | 0.11 | 0.14 | 0.23 | 0.44 | 0.58 | 0.84 | 0.98
0.1  | 0.11 | 0.10 | 0.17 | 0.26 | 0.39 | 0.56 | 0.87 | 0.99
0.3  | 0.12 | 0.12 | 0.12 | 0.29 | 0.39 | 0.57 | 0.84 | 0.96
0.5  | 0.17 | 0.16 | 0.18 | 0.32 | 0.43 | 0.60 | 0.81 | 0.93
0.7  | 0.15 | 0.14 | 0.18 | 0.37 | 0.38 | 0.45 | 0.53 | 0.48
0.9  | 0.21 | 0.17 | 0.17 | 0.14 | 0.07 | –    | –    | –

β3 = 0.05
−0.9 | 0.10 | 0.11 | 0.09 | 0.16 | 0.30 | 0.30 | 0.49 | 0.66
−0.7 | 0.11 | 0.09 | 0.10 | 0.18 | 0.25 | 0.33 | 0.56 | 0.80
−0.5 | 0.12 | 0.13 | 0.11 | 0.23 | 0.31 | 0.40 | 0.65 | 0.81
−0.3 | 0.14 | 0.13 | 0.12 | 0.19 | 0.35 | 0.44 | 0.63 | 0.80
−0.1 | 0.10 | 0.10 | 0.12 | 0.24 | 0.29 | 0.36 | 0.61 | 0.74
0    | 0.15 | 0.09 | 0.14 | 0.24 | 0.26 | 0.39 | 0.53 | 0.72
0.1  | 0.10 | 0.15 | 0.13 | 0.20 | 0.27 | 0.33 | 0.49 | 0.66
0.3  | 0.13 | 0.11 | 0.16 | 0.16 | 0.19 | 0.27 | 0.31 | 0.31
0.5  | 0.17 | 0.13 | 0.10 | 0.16 | 0.14 | 0.12 | 0.15 | 0.12
0.7  | 0.14 | 0.08 | 0.08 | 0.04 | 0.04 | 0.04 | –    | –
0.9  | 0.08 | 0.04 | 0.02 | –    | –    | –    | –    | –

β3 = 0.10
−0.9 | 0.10 | 0.11 | 0.23 | 0.49 | 0.67 | 0.77 | 0.93 | 0.99
−0.7 | 0.11 | 0.13 | 0.25 | 0.53 | 0.69 | 0.82 | 0.93 | 0.98
−0.5 | 0.09 | 0.22 | 0.18 | 0.51 | 0.75 | 0.85 | 0.95 | 0.97
−0.3 | 0.14 | 0.18 | 0.26 | 0.56 | 0.72 | 0.84 | 0.93 | 0.98
−0.1 | 0.11 | 0.18 | 0.26 | 0.54 | 0.66 | 0.76 | 0.81 | 0.84
0    | 0.14 | 0.19 | 0.23 | 0.53 | 0.64 | 0.74 | 0.76 | 0.74
0.1  | 0.16 | 0.15 | 0.21 | 0.47 | 0.50 | 0.59 | 0.65 | 0.57
0.3  | 0.12 | 0.17 | 0.23 | 0.28 | 0.30 | 0.32 | 0.36 | 0.36
0.5  | 0.15 | 0.14 | 0.13 | 0.17 | 0.18 | 0.13 | –    | –
0.7  | 0.12 | 0.08 | 0.04 | –    | –    | –    | –    | –
0.9  | 0.07 | –    | –    | –    | –    | –    | –    | –

β3 = 0.25
−0.9 | 0.31 | 0.45 | 0.68 | 0.94 | 0.94 | 0.95 | 0.96 | 0.94
−0.7 | 0.24 | 0.42 | 0.64 | 0.97 | 0.93 | 0.93 | 0.91 | 0.91
−0.5 | 0.19 | 0.41 | 0.72 | 0.89 | 0.90 | 0.91 | 0.87 | 0.82
−0.3 | 0.26 | 0.40 | 0.67 | 0.79 | 0.79 | 0.80 | 0.72 | 0.69
−0.1 | 0.27 | 0.36 | 0.56 | 0.62 | 0.61 | 0.62 | 0.56 | –
0    | 0.23 | 0.33 | 0.43 | 0.56 | 0.53 | 0.48 | 0.49 | –
0.1  | 0.22 | 0.29 | 0.45 | 0.38 | 0.41 | 0.34 | –    | –
0.3  | 0.20 | 0.27 | 0.22 | 0.19 | 0.20 | –    | –    | –
0.5  | 0.15 | 0.13 | 0.10 | –    | –    | –    | –    | –
0.7  | 0.07 | –    | –    | –    | –    | –    | –    | –
0.9  | –    | –    | –    | –    | –    | –    | –    | –
Fig. 3

Surface plots of the estimated power for the hypothesis test of the trend-change parameter β3 against the autocorrelation parameter and the sample size. The left panel is for the Poisson time series; the right panel is for the negative binomial time series.

For large absolute values of γ1, the simulated time series were more likely to explode; that is, the counts in a series can increase (or decrease) so rapidly that the computer program cannot generate values beyond a certain threshold. In such cases it was often impossible to generate a time series of the desired length, and this occurred most frequently for large sample sizes. Because data cannot be successfully generated, estimates do not exist for these exploded settings; the corresponding estimated powers are marked with the symbol "–" in the tables whenever more than one fourth of the simulations (more than 50 of the 200) could not generate a time series of the specified length.
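The generation guard and the one-fourth rule described above can be sketched as follows; this is illustrative, and the overflow cap MAX_COUNT is an assumed value, since the article does not state its actual threshold.

```python
import numpy as np

MAX_COUNT = 1e6  # assumed overflow cap; the article does not state its threshold

def guarded(simulate, max_count=MAX_COUNT):
    """Run one replication; return None when the series 'explodes'."""
    y = np.asarray(simulate(), dtype=float)
    if not np.all(np.isfinite(y)) or np.any(np.abs(y) > max_count):
        return None
    return y

def power_or_dash(simulate, test_rejects, reps=200, max_fail_frac=0.25):
    """Estimate power across replications, reporting '-' when more than
    one fourth of them (e.g. more than 50 of 200) fail to generate."""
    rejects, failures = [], 0
    for _ in range(reps):
        y = guarded(simulate)
        if y is None:
            failures += 1
        else:
            rejects.append(bool(test_rejects(y)))
    if failures > max_fail_frac * reps:
        return "-"
    return sum(rejects) / len(rejects)
```

Computing power only over the successful replications, as here, matches the tables' convention of reporting either a power estimate or a dash, never a mixture.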

Discussion

ITS is a powerful yet simple quasi-experimental design that has been widely applied to population-based public health and health service intervention studies ([2,7]). In this article, we studied models of the ITS design for count outcomes. More specifically, we discussed low-order log-linear models for the ITS design, a special type of observation-driven model, with two distributional specifications (Poisson and negative binomial). Our study was motivated by the STRIDE study, which was designed based on the state-of-the-art power calculation method for the two-arm, two-phase ITS design with continuous outcomes (the rate of African American and Latino participants recruited) proposed by Zhang et al. [6]. Because we were also interested in the number of African American and Latino participants recruited and in the total number of participants, a similar power calculation method for ITS designs with count outcomes needed to be investigated. Herein, a simulation-based method was applied to demonstrate the power of hypothesis tests on the level change, the trend change, and the change of both (the sum of the level change and the trend change) under different parameter values, sample sizes, and autocorrelation coefficients (γ1) under pre-specified conditions. We focused our attention on single-arm ITS studies; tests for two-arm ITS studies require future investigation. As anticipated, for the Poisson models the estimated power increased as γ1 increased, as the sample size increased, or as the parameter values grew in magnitude. For the negative binomial models, the estimated power increased as the sample size increased or as the parameter values grew in magnitude. However, as γ1 increased, the power followed an inverted U-shaped (first increasing, then decreasing) pattern for tests on the level change and the trend change, while it increased with γ1 for tests of the total change.
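The three Wald tests referred to above (level change, trend change, and their sum) can be sketched as linear-contrast tests on the fitted coefficients; this is an illustrative sketch, and the coefficient ordering (β0, β1, β2, β3, γ1) is an assumption.

```python
import numpy as np

def wald_tests(beta_hat, cov, z_crit=1.96):
    """Wald z-tests for the level change (beta2), the trend change (beta3),
    and their sum, given estimates and covariance for the assumed
    coefficient ordering (beta0, beta1, beta2, beta3, gamma1)."""
    contrasts = {
        "level": np.array([0.0, 0.0, 1.0, 0.0, 0.0]),
        "trend": np.array([0.0, 0.0, 0.0, 1.0, 0.0]),
        "both":  np.array([0.0, 0.0, 1.0, 1.0, 0.0]),  # sum of the two changes
    }
    out = {}
    for name, c in contrasts.items():
        est = c @ beta_hat
        se = np.sqrt(c @ cov @ c)       # Var(c'beta) = c' Sigma c
        z = est / se
        out[name] = (z, abs(z) > z_crit)
    return out
```

The "both" contrast shows why the total-change test can behave differently from its components: its variance includes the covariance between the level-change and trend-change estimates, not just their individual variances.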
Further, summarizing the results across the six tables, the power of hypothesis tests with the same parameter values can vary widely depending on the type of test (level, trend, or both) and the model specification. Like most ITS designs, our simulation-based power and sample size calculations were based on models at the aggregated data level. For instance, in the STRIDE study, the aggregate number of participants of African American and Latino descent will be collected weekly. However, this type of analysis not only loses the information that individual-level data would provide, but can also yield an incomplete conclusion if the total number of participants increases simultaneously. Thus, although aggregate-level ITS designs are common practice, power and effect size calculations based on such an approach consider only the number of time points, not the number of observations in each time window. For this reason, individual-level ITS designs for count data, or ITS designs that account for the number of observations in each time window, need to be investigated further. This study has several limitations. Firstly, we considered only observation-driven ITS models. Previous studies suggested that parameter-driven models are usually more complicated and computationally intensive because their full likelihoods involve high-dimensional integration, yet parameter-driven models offer better interpretability of their parameters than observation-driven models. Thus, the performance of parameter-driven models for ITS designs based on count outcomes needs to be studied further and compared with our proposed models. Secondly, it may be too simplistic to assume that an intervention is implemented at a single time point. Using the STRIDE study as an example, it is reasonable to assume that a "ramp-up" period is required for the research assistants to complete their training and for the intervention to achieve full implementation.
Further, the STRIDE study contains a comparison group, whereas our calculations address single-arm designs. Although an ITS study may still be valid in the absence of a control group ([7]), adapting the three-phase design to a two-phase design ([35]), the strength of the inference will be weaker. Therefore, power and effect size calculations of count outcomes for more complicated models, such as a two-arm, three-phase ITS design, should be investigated further. Thirdly, as mentioned above, the aggregate-level ITS design does not consider the number of individuals in each time window. Using the STRIDE study as an example, this limitation may yield incomplete conclusions, since we expect an increase in the number of African American, Latino, and total participants. Individual-level ITS design could be a reasonable approach to overcome this issue, though only a few health policy studies ([36]) have taken such an approach. Fourthly, excessive zeros are an issue in health policy studies, including the STRIDE study. Our ongoing research seeks to extend this work to zero-inflated Poisson and zero-inflated negative binomial models.

Conclusions

Sample size and power calculations were conducted for ITS studies of count outcomes using an observation-driven model through the simulation-based methods presented in this article. Results varied among the different model specifications and the target of the study (i.e., investigating level change, trend change, or both).

Author's contributions

BZ, WL, and SY conceived the research idea. WL performed the numerical simulations. BZ and SY wrote the original manuscript with support from MAF, EJR, MID, and CL. The STRIDE principal investigators, KGS, PAH, SCL, and JJA, motivated the research idea and helped supervise the project.

Declaration of competing interest

None.
References: 17 in total

1.  Segmented regression analysis of interrupted time series studies in medication use research.

Authors:  A K Wagner; S B Soumerai; F Zhang; D Ross-Degnan
Journal:  J Clin Pharm Ther       Date:  2002-08       Impact factor: 2.512

2.  The impact of compulsory helmet legislation on cyclist head injuries in New South Wales, Australia: a response.

Authors:  Scott R Walter; Jake Olivier; Tim Churches; Raphael Grzebieta
Journal:  Accid Anal Prev       Date:  2013-01-20

3.  Markov regression models for time series: a quasi-likelihood approach.

Authors:  S L Zeger; B Qaqish
Journal:  Biometrics       Date:  1988-12       Impact factor: 2.571

4.  Counter-Point: Early Warning Systems Are Imperfect, but Essential.

Authors:  Christine Y Lu; Gregory Simon; Stephen B Soumerai; Martin Kulldorff
Journal:  Med Care       Date:  2018-05       Impact factor: 2.983

5.  A robust interrupted time series model for analyzing complex health care intervention data.

Authors:  Maricela Cruz; Miriam Bender; Hernando Ombao
Journal:  Stat Med       Date:  2017-08-29       Impact factor: 2.373

6.  (Review) Use of interrupted time series analysis in evaluating health care quality improvements.

Authors:  Robert B Penfold; Fang Zhang
Journal:  Acad Pediatr       Date:  2013 Nov-Dec       Impact factor: 3.107

7.  Near Real-time Surveillance for Consequences of Health Policies Using Sequential Analysis.

Authors:  Christine Y Lu; Robert B Penfold; Sengwee Toh; Jessica L Sturtevant; Jeanne M Madden; Gregory Simon; Brian K Ahmedani; Gregory Clarke; Karen J Coleman; Laurel A Copeland; Yihe G Daida; Robert L Davis; Enid M Hunkeler; Ashli Owen-Smith; Marsha A Raebel; Rebecca Rossom; Stephen B Soumerai; Martin Kulldorff
Journal:  Med Care       Date:  2018-05       Impact factor: 2.983

8.  Dense Breast Notification Laws: Impact on Downstream Imaging After Screening Mammography.

Authors:  Michal Horný; Alan B Cohen; Richard Duszak; Cindy L Christiansen; Michael Shwartz; James F Burgess
Journal:  Med Care Res Rev       Date:  2018-01-19       Impact factor: 3.929

9.  Regression based quasi-experimental approach when randomisation is not an option: interrupted time series analysis.

Authors:  Evangelos Kontopantelis; Tim Doran; David A Springate; Iain Buchan; David Reeves
Journal:  BMJ       Date:  2015-06-09

10.  Interrupted time series regression for the evaluation of public health interventions: a tutorial.

Authors:  James Lopez Bernal; Steven Cummins; Antonio Gasparrini
Journal:  Int J Epidemiol       Date:  2017-02-01       Impact factor: 7.196

