Sample size planning in the design and analysis of cluster randomized trials using the symbolic two-step method.

David Zahrieh¹, Jennifer Le-Rademacher¹.

Abstract

INTRODUCTION: Evidence that can be used to improve clinical practice patterns and processes is frequently generated through standard, parallel-arms cluster randomized trial (CRT) designs that test interventions implemented at the center-level. Although the primary endpoint of these trials is often a center-level outcome, patient-level factors may vary between centers and, consequently, may influence the center-level outcome. Furthermore, there may be important factors that predict the variation in the center-level outcome and this knowledge can help contextualize the trial results and inform practice patterns.
METHODS: Our symbolic two-step method, which applies symbolic data analysis to account for patient-level factors when estimating and testing a center-level effect on both the average center-level outcome and its variation, was developed for such settings. Herein, we sought to extend the method to prospectively size a CRT so that the application of our method in data analysis is consistent with the design.
RESULTS: Our formulaic approach to sample size planning incorporated predictive factors of the within-center variation and accounted for patient-level characteristics. The sample size approximation performed well in many different pragmatic settings.
CONCLUSIONS: Our symbolic two-step method provides an alternate approach in the design and analysis of CRTs evaluating novel improvement processes within care delivery research.

Keywords: Care delivery research; Cluster randomized trial; Sample size estimation; Symbolic data analysis

Year: 2020 PMID: 32715149 PMCID: PMC7378578 DOI: 10.1016/j.conctc.2020.100609

Source DB: PubMed Journal: Contemp Clin Trials Commun ISSN: 2451-8654

Introduction

It is well recognized that a hospital's (or medical center, hereinafter referred to as a center) processes and policies have a direct impact on patient care and subsequently affect patient outcomes. Many of these processes and policies are modifiable. For example, implementation of a medical team training program by the Veterans Health Administration to encourage teamwork and effective communication was shown to improve surgical mortality [1]. In care delivery clinical trials evaluating improvement processes, cluster randomization is often used to assign centers to either a new process (the intervention) or an existing process. The effectiveness of the intervention is determined by center-level outcomes. Because a center-level outcome is often an accumulation of patient-level outcomes and symbolic data analysis (SDA) embraces the idea of aggregating individual-level data (the micro-data) into group-level distributional summaries (the symbols) and then building models for inference directly at the group-level based on these summaries [2], data from care delivery clinical trials fit naturally in the symbolic data framework. SDA was first introduced by Diday [3] and continues to be a growing field of statistics. Unlike a classical observation, which takes a single value, a symbolic observation can take a set of values. In the above care delivery clinical trial setting, each center is a symbolic-valued observation comprising a set of patient-level outcomes. Despite randomization, arm imbalance in patient-level factors may occur in cluster randomized trials (CRTs) as centers may have different patient-mixes. The difference in patient-mix may potentially confound the effect of the new process. To achieve a meaningful estimate of the effect of the intervention on a particular center-level outcome, it is crucial in these circumstances to account for the imbalance in patient-level factors between centers. 
Furthermore, because a center-level outcome is an accumulation of patient-level outcomes, there is an inherent variation in center-level outcome, which may be linked to the patient-mix or clinical practice at each center. Oftentimes, characterization of this variation is not the focus in the statistical analysis; however, it is sometimes of interest to identify factors affecting or predicting the within-center variance, which may in turn lead to targeted changes in clinical practice to reduce such variation. A two-step method using the symbolic data framework (hereinafter referred to as the symbolic two-step) was recently proposed [4]. It adjusts for patient-level factors while maintaining the separation between the center and within-center contributions to the total variance, which allows estimation of the effect of the intervention on the mean center-level outcome while simultaneously modeling the within-center outcome variability; consequently, the method conveniently facilitates sample size planning for CRTs. We leveraged this observation and extended the proposed methodology to estimate the sample size when planning a prospective CRT so that both the study design and the data analysis account for patient-level factors and incorporate our knowledge of important predictors of the variation of the center-level outcome.

Materials and methods

Symbolic data and the symbolic two-step method

In SDA, we can consider the realization $y$ of a symbolic-valued random variable $Y$ such that $y$ can be written as $y = \{y_k, p_k;\ k = 1, \ldots, n\}$, where $n$ is the number of values in $y$ and $p_k$ is the relative frequency that $y_k$ occurs within $y$ for $k = 1, \ldots, n$. Furthermore, we can assume that $Y$ takes a distribution of values with a known density function $f(\cdot\,; \theta)$ and a parameter vector $\theta$, where $\theta$ is defined to ensure a one-to-one correspondence between $Y$ and $\theta$; thus, $\theta$ comprises the smallest set of parameters that uniquely identify $f$. For example, if $Y$ takes a Gaussian distribution of values with mean $\mu$ and variance $\sigma^2$, then $\theta = (\mu, \sigma^2)$ uniquely identifies $f$ in the family of Gaussian distributions. In the symbolic data framework, we say that $f$ and $\theta$ are the density function and the parameter vector of values within $Y$. We refer to $f$ as the internal density of $Y$; additionally, we refer to $\theta$ as the vector of internal parameters of $Y$. Because $\theta$ is the vector of parameters within $Y$, and $Y$ is a random variable, $\theta$ is not fixed but varies along with $Y$. In other words, we suppose that $\Theta$ is a random vector corresponding to the symbolic-valued random variable $Y$, with known probability density function $g$. Because there exists a one-to-one correspondence between $Y$ and $\Theta$, the probability that $Y$ takes a set of values $y$ equals the probability that $\Theta$ takes value $\theta$.
The symbolic data framework above formed the basis for a two-step method recently proposed [4] that accounts for patient-level factors in estimating and testing center-level effects on both the average center-level outcome and its variation. The first step of the method estimates the effect of patient factors on the outcome, treating individual patients as the units of observation; importantly, the estimated effect is obtained from a stratified model that stratifies by center to avoid bias toward larger centers and to preserve the separation of the within and between variation among centers.
The second step evaluates the center-level intervention effect on the center-mean outcome, after adjusting for patient-level factors from step one, treating centers as the units of observation. In parallel, the predictive effects of center-level characteristics on the within-center variation of the patient-adjusted outcomes are separately evaluated in the second step. Because this method considers the distribution of patient-adjusted outcomes from each center as the center-level response, Le-Rademacher [4] applied SDA; a symbolic observation can take on a distribution of values defined by parameters internal to the symbolic-valued random variable [5]. Using the theoretical framework provided by Le-Rademacher and Billard [6], the distribution of patient outcomes is modeled in step two via the internal parameters using classical linear regression models.
In the context of CRTs, suppose we measure some continuous outcome variable $y_{ij}$ and let $\mathbf{x}_{ij}$ be the vector of salient patient-level characteristics of patient $j$ from center $i$. Although the number of patients within each center need not be equal, for simplicity we will assume $m$ patients within each center. Suppose we randomize the centers with equal allocation to receive an intervention or control such that the centers are divided into equal groups with $c$ centers per group. Ordinary least squares (OLS) is used to estimate the effect of patient-level characteristics on the outcome using the stratified model
$$y_{ij} = \alpha_i + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + \epsilon^{s}_{ij}, \qquad (1)$$
where $\epsilon^{s}_{ij}$ are assumed to be independent with mean zero. The superscript s is used to emphasize that the error terms are obtained from the stratified model. In the stratified model (1), the vector $\boldsymbol{\beta}$ represents the within-center effects of patient-level characteristics on the patient-level outcome, and these effects are constrained to be the same across centers.
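Because these effects are constrained to be the same across centers, the model permits the separation of the variation caused by patient-mix from the variation caused by center-level characteristics. The intercept $\alpha_i$ in model (1) represents the average outcome of center $i$ after adjusting for the patient-level characteristics. Then,
$$y^{a}_{ij} = y_{ij} - \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} = \alpha_i + \epsilon^{s}_{ij} \qquad (2)$$
represents the outcome of patient $j$ from center $i$ adjusted for his or her patient-level characteristics and can be estimated by $\hat{y}^{a}_{ij} = y_{ij} - \mathbf{x}_{ij}^{\top}\hat{\boldsymbol{\beta}}$, where $\hat{\boldsymbol{\beta}}$ is a vector of unbiased estimators of $\boldsymbol{\beta}$. Now, consider $Y_i$ to be a symbolic-valued random variable corresponding to the set of adjusted outcomes of the $m$ patients from center $i$ with internal mean $\mu_i$ and internal variance $\sigma_i^2$, such that $\mu_i$ and $\sigma_i^2$ are random variables. Here, $Y_i$ represents the center-level outcome of interest from center $i$. Given a center $i$, suppose further that the outcome of patient $j$ follows a Gaussian distribution with random parameter vector $\theta_i = (\mu_i, \sigma_i^2)$ corresponding to the internal parameters of $Y_i$. Note that, based on this internal distributional assumption, $\alpha_i = \mu_i$ and $\epsilon^{s}_{ij} \sim N(0, \sigma_i^2)$ in models (1) and (2). We observe the realization $(\bar{y}_i, s_i^2)$ of $\theta_i$, where $\bar{y}_i$ and $s_i^2$ represent the sample mean and sample variance for the $i$th center. Let $x_i^{\mu}$ denote an indicator variable, which takes the value one if the $i$th center was randomly allocated to receive the intervention and zero otherwise. Furthermore, because we anticipate heterogeneous within-center variances, we allow the variances to vary from center to center as a parametric function of explanatory variables. Suppose that there is a single dichotomous variable $x_i^{\sigma}$ that is driving the within-center variances. Because $Y_i$ has a 1-1 relationship with $\theta_i$, and $\theta_i$ captures the total variance of $Y_i$, the effect of the indicator variables $x_i^{\mu}$ and $x_i^{\sigma}$ on $Y_i$ can be expressed in terms of their effect on $\theta_i$. With $\theta_i$ being a vector of single variable observations, the effect of $x_i^{\mu}$ and $x_i^{\sigma}$ on $\theta_i$ can be estimated using OLS. Specifically, the effect of $x_i^{\mu}$ on $\mu_i$ and the effect of $x_i^{\sigma}$ on $\sigma_i^2$ can be modeled separately using OLS by the following two linear models
$$\mu_i = \gamma_{\mu 0} + \gamma_{\mu 1} x_i^{\mu} + \epsilon_i^{\mu}, \qquad (3)$$
$$\xi_i = \gamma_{\sigma 0} + \gamma_{\sigma 1} x_i^{\sigma} + \epsilon_i^{\xi}, \qquad (4)$$
where $\xi_i = \log \sigma_i^2$, $\epsilon_i^{\mu} \sim N(0, \sigma_\mu^2)$, and $\epsilon_i^{\xi} \sim N(0, \sigma_\xi^2)$.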
Stated differently, the sample center means of the patient-adjusted outcome are regressed on the variable $x_i^{\mu}$, and the sample log within-center variances of the patient-adjusted outcome are separately regressed on the variable $x_i^{\sigma}$. The coefficient $\gamma_{\mu 1}$ represents the intervention effect on the center-mean outcome and is the primary effect of interest in a CRT design. The coefficient $\gamma_{\sigma 1}$ represents the effect of the dichotomous predictor variable on the log within-center outcome variability. The standard errors (SEs) for the estimated coefficients, $\mathrm{SE}(\hat{\gamma}_{\mu 1}) = \hat{\sigma}_\mu / \sqrt{\sum_i (x_i^{\mu} - \bar{x}^{\mu})^2}$ (where $\hat{\sigma}_\mu$ equals the square root of the sum of squared errors divided by $2c - 2$) and $\mathrm{SE}(\hat{\gamma}_{\sigma 1}) = \hat{\sigma}_\xi / \sqrt{\sum_i (x_i^{\sigma} - \bar{x}^{\sigma})^2}$ (where $\hat{\sigma}_\xi$ equals the square root of the sum of squared errors divided by $2c - 2$), and confidence intervals (CIs) are immediately obtained in closed form from classical linear regression theory [7].
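The two steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes a single patient-level covariate (so the stratified OLS slope has a closed form), a fixed number of patients per center, and simulated inputs whose values are hypothetical. The names `simple_ols` and `two_step` are ours.

```python
import math
import random
import statistics


def simple_ols(x, y):
    """Intercept and slope of a one-predictor least-squares fit."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    return ybar - slope * xbar, slope


def two_step(y, x, treat, xsig):
    """y[i][j], x[i][j]: outcome and covariate of patient j in center i.
    treat[i], xsig[i]: 0/1 center-level indicators."""
    # Step 1: stratified OLS. Centering within each center absorbs the
    # center-specific intercepts and leaves one common within-center slope.
    sxy = sxx = 0.0
    for yi, xi in zip(y, x):
        ybar, xbar = statistics.fmean(yi), statistics.fmean(xi)
        sxy += sum((a - xbar) * (b - ybar) for a, b in zip(xi, yi))
        sxx += sum((a - xbar) ** 2 for a in xi)
    beta = sxy / sxx

    # Summarise each center's patient-adjusted outcomes by mean and log variance.
    means, logvars = [], []
    for yi, xi in zip(y, x):
        adjusted = [b - beta * a for a, b in zip(xi, yi)]
        means.append(statistics.fmean(adjusted))
        logvars.append(math.log(statistics.variance(adjusted)))

    # Step 2: two separate center-level regressions.
    return beta, simple_ols(treat, means), simple_ols(xsig, logvars)


# Hypothetical data: 40 centers, 50 patients each, true slope 2,
# intervention effect 0.5, within-center SD 0.5 or 1.0 depending on xsig.
rng = random.Random(1)
treat = [i % 2 for i in range(40)]
xsig = [(i // 2) % 2 for i in range(40)]
y, x = [], []
for t, s in zip(treat, xsig):
    alpha = 1.0 + 0.5 * t + rng.gauss(0, 0.1)
    sd = 1.0 if s else 0.5
    xi = [rng.gauss(0, 1) for _ in range(50)]
    x.append(xi)
    y.append([alpha + 2.0 * a + rng.gauss(0, sd) for a in xi])

beta_hat, gamma_mu, gamma_sigma = two_step(y, x, treat, xsig)
print(beta_hat, gamma_mu, gamma_sigma)
```

With a binary predictor, each step-two slope is simply the difference between the two group averages, which is why the test of the intervention effect reduces to a two-sample comparison of center means.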

Sample size determination

Suppose we randomize centers with equal allocation to receive the intervention or control such that the centers are divided into equal groups with $c$ centers per group (i.e. $i = 1, \ldots, 2c$). Further, we will continue to assume $m$ patients within each center. Consider a normally distributed quantitative endpoint, where the objective is to compare the sample mean of the symbolic outcome variable in the intervention and control arms. Here, $\bar{y}_1 = \frac{1}{c}\sum_i \bar{y}_i$ for the set $\{\bar{y}_i\}$ of sample means for each center assigned to the intervention arm. Similarly, $\bar{y}_0 = \frac{1}{c}\sum_i \bar{y}_i$ for the set $\{\bar{y}_i\}$ of sample means for each center assigned to the control arm. We are interested in determining the number of centers $c$ within each arm, each with $m$ observations. The expected value of a patient-adjusted outcome is
$$E[y^{a}_{ij}] = \gamma_{\mu 0} + \gamma_{\mu 1} x_i^{\mu},$$
such that $\gamma_{\mu 0}$ is the expected value for a randomly selected patient in the absence of the intervention, while $\gamma_{\mu 0} + \gamma_{\mu 1}$ is the expected value in the presence of intervention. The expected value of $\bar{y}_i$ is then $E[\bar{y}_i] = \gamma_{\mu 0} + \gamma_{\mu 1} x_i^{\mu}$. The variance of $y^{a}_{ij}$ will be a sum of two components, namely, the internal and external variation, where the internal variation is the mean of a log normal distribution. That is, for the patient-adjusted outcome $y^{a}_{ij}$, the variance is
$$\mathrm{Var}(y^{a}_{ij}) = \sigma_\mu^2 + \exp\left(\gamma_{\sigma 0} + \gamma_{\sigma 1} x_i^{\sigma} + \sigma_\xi^2 / 2\right).$$
Based on this expression, the variance of the observed mean for the $i$th center with $m$ observations is given by
$$\mathrm{Var}(\bar{y}_i) = \sigma_\mu^2 + \frac{1}{m}\exp\left(\gamma_{\sigma 0} + \gamma_{\sigma 1} x_i^{\sigma} + \sigma_\xi^2 / 2\right).$$
Now, suppose that the dichotomous variable $x_i^{\sigma}$ is a stratification factor at the time of random assignment such that proportion $\pi$ of the centers within each arm will correspond to the level $x_i^{\sigma} = 1$ of the factor and proportion $1 - \pi$ of the centers within each arm will correspond to the level $x_i^{\sigma} = 0$ of the factor.
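We can express the internal variation as a weighted average, where $\pi$ and $1 - \pi$ are the weights,
$$\bar{\sigma}^2 = \pi \exp\left(\gamma_{\sigma 0} + \gamma_{\sigma 1} + \sigma_\xi^2 / 2\right) + (1 - \pi) \exp\left(\gamma_{\sigma 0} + \sigma_\xi^2 / 2\right),$$
such that the variance of the sample mean within the intervention arm, say, with $c$ centers is
$$\mathrm{Var}(\bar{y}_1) = \frac{1}{c}\left(\sigma_\mu^2 + \frac{\bar{\sigma}^2}{m}\right).$$
The variance of the difference in means between these two independent samples will then be
$$\mathrm{Var}(\bar{y}_1 - \bar{y}_0) = \frac{2}{c}\left(\sigma_\mu^2 + \frac{\bar{\sigma}^2}{m}\right).$$
The approximate power to detect a difference of $\gamma_{\mu 1}$ for conducting a two-tailed test of size α is then
$$1 - \beta \approx \Phi\left(\frac{|\gamma_{\mu 1}|}{\sqrt{\mathrm{Var}(\bar{y}_1 - \bar{y}_0)}} - z_{1-\alpha/2}\right),$$
where $\Phi$ is the cumulative standard Normal distribution and $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$ quantile of the standard Normal distribution function. With power $1 - \beta$, this means that the approximate number of centers required per arm can be computed as
$$c = \frac{2\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \left(\sigma_\mu^2 + \bar{\sigma}^2 / m\right)}{\gamma_{\mu 1}^2}. \qquad (5)$$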
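As a rough numerical illustration of the power approximation and of formula (5), the sketch below codes both for a single dichotomous stratification factor. The within-center parameters are those of parameterization 2 in Table 1; the between-center variance (0.25), the targeted effect (0.5), and the center size (50) are hypothetical values chosen only for this example, and the function names are ours.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def variance_of_center_mean(var_mu, m, pi, g_s0, g_s1, sd_xi):
    """Between-center variance plus the weighted mean within-center variance / m."""
    high = math.exp(g_s0 + g_s1 + sd_xi ** 2 / 2)   # log-normal mean when x_sigma = 1
    low = math.exp(g_s0 + sd_xi ** 2 / 2)           # log-normal mean when x_sigma = 0
    return var_mu + (pi * high + (1 - pi) * low) / m


def approximate_power(c, delta, v, alpha=0.05):
    """Normal-approximation power of the two-sided test with c centers per arm."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(delta) / math.sqrt(2 * v / c) - z_alpha)


def centers_per_arm(delta, v, alpha=0.05, power=0.80):
    """Smallest integer number of centers per arm satisfying formula (5)."""
    z_sum = STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * v * z_sum ** 2 / delta ** 2)


# Hypothetical design inputs (var_mu, delta, m); Table 1 parameterization 2 otherwise.
v_demo = variance_of_center_mean(var_mu=0.25, m=50, pi=0.75,
                                 g_s0=-1.5, g_s1=0.680, sd_xi=0.5)
c_demo = centers_per_arm(delta=0.5, v=v_demo)
print(c_demo, approximate_power(c_demo, 0.5, v_demo))
```

Because the formula relies on the Normal approximation, the simulation study described below adds one center when the number of centers is small; that correction is not applied in this sketch.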

Simulation

To assess the performance of the approximate sample size formula (5) under different scenarios, we conducted a simulation study. The goal for carrying out these simulation studies was to understand the settings when the formula would be deemed reliable for estimating the approximate sample size for a standard, parallel-arms CRT and, therefore, make sample size determination by way of information-based simulation methods unnecessary.

Data generating model

Data were generated from a multilevel (or hierarchical) linear model [8]. We considered grouped data (patients within centers) where some information was available on patients and some information was at the center-level. With this nested structure, we had observations $j = 1, \ldots, m$ clustered in centers $i = 1, \ldots, 2c$. Within each center $i$, we assumed that we had three continuous individual-level predictors $x_{1ij}$, $x_{2ij}$, and $x_{3ij}$, and one center-level predictor labeled as $x_i^{\mu}$ such that the center-level predictor was an indicator variable (intervention or control) and was the primary variable of interest. The multilevel linear model can be expressed as a linking of local regressions in each center with a random intercept term indexed by center:
$$y_{ij} = \alpha_i + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \beta_3 x_{3ij} + \epsilon_{ij}, \qquad \epsilon_{ij} \sim N(0, \sigma_i^2). \qquad (6)$$
The local individual-level predictors were generated from a . We supposed that there was a single dichotomous variable $x_i^{\sigma}$ that was driving the within-center variances, such that $x_i^{\sigma} = 1$ when the center was associated with increased within-center variation and zero otherwise. Then, the variances on the log scale were allowed to vary as a parametric function of the indicator variable such that
$$\log \sigma_i^2 = \gamma_{\sigma 0} + \gamma_{\sigma 1} x_i^{\sigma} + \epsilon_i^{\xi}, \qquad \epsilon_i^{\xi} \sim N(0, \sigma_\xi^2). \qquad (7)$$
The within-center variances were generated first from a log normal distribution with location parameter $\gamma_{\sigma 0} + \gamma_{\sigma 1} x_i^{\sigma}$ (i.e. the mean of the log of the distribution expressed as a function of the center-level predictor variable $x_i^{\sigma}$) and shape parameter $\sigma_\xi$ (i.e. the standard deviation of the log of the distribution). The center-level predictor $x_i^{\mu}$ was included as a predictor in the next level of the model:
$$\alpha_i = \gamma_{\mu 0} + \gamma_{\mu 1} x_i^{\mu} + \epsilon_i^{\mu}. \qquad (8)$$
The errors $\epsilon_i^{\mu}$ in this model (with mean zero and standard deviation $\sigma_\mu$) represented variation among centers that was not explained by the single center-level and three local individual-level predictors. Furthermore, we assumed $\epsilon_{ij}$ and $\epsilon_i^{\mu}$ were independent.
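The data generating model can be sketched as follows. This is an illustration under stated assumptions rather than the simulation code used in the study: the text does not preserve the distribution of the individual-level predictors, so standard normal draws are assumed here, and the regression coefficients, between-center standard deviation, and intervention effect are hypothetical. The function name `simulate_trial` is ours.

```python
import math
import random


def simulate_trial(c, m, pi, g_mu0, g_mu1, sd_mu, g_s0, g_s1, sd_xi, betas, rng):
    """Generate one parallel-arms CRT with c centers per arm and m patients per center."""
    centers = []
    n_high = round(pi * c)  # centers per arm with x_sigma = 1 (stratified allocation)
    for arm in (0, 1):
        for k in range(c):
            x_sigma = 1 if k < n_high else 0
            # Within-center variance drawn first, from a log-normal distribution.
            log_var = g_s0 + g_s1 * x_sigma + rng.gauss(0, sd_xi)
            sd_within = math.sqrt(math.exp(log_var))
            # Random intercept depends on the intervention indicator.
            alpha = g_mu0 + g_mu1 * arm + rng.gauss(0, sd_mu)
            patients = []
            for _ in range(m):
                xs = [rng.gauss(0, 1) for _ in betas]  # ASSUMED standard normal predictors
                y = alpha + sum(b * v for b, v in zip(betas, xs)) + rng.gauss(0, sd_within)
                patients.append((y, xs))
            centers.append({"arm": arm, "x_sigma": x_sigma, "patients": patients})
    return centers


# Hypothetical inputs except g_s0, g_s1, sd_xi, pi (Table 1, parameterization 2).
trial = simulate_trial(c=8, m=25, pi=0.75, g_mu0=0.0, g_mu1=0.5, sd_mu=0.3,
                       g_s0=-1.5, g_s1=0.680, sd_xi=0.5,
                       betas=(0.5, -0.25, 1.0), rng=random.Random(2020))
print(len(trial), sum(ctr["x_sigma"] for ctr in trial))
```

Assigning the first `round(pi * c)` centers in each arm to the higher-variance stratum mimics stratified randomization, so both arms carry the same mix of the variance-driving factor.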

Model parameterizations

For each simulated data set, we applied the symbolic two-step method for estimation and, using the t-statistic for the regression coefficient $\hat{\gamma}_{\mu 1}$ (the estimate divided by its standard error), tested at the 0.05 significance level the null hypothesis that $\gamma_{\mu 1} = 0$ vs the two-sided alternative hypothesis $\gamma_{\mu 1} \neq 0$ when the true value of $\gamma_{\mu 1}$ in model (3) was the targeted effect. Given the power of the symbolic two-step method obtained via simulations, we graphically compared the number of centers $c$ with the number of centers obtained from the formula. We considered the setting where centers were randomized to the intervention or control arm in a ratio of 1:1 and such that the proportion $\pi$ of the centers within each arm that corresponded to $x_i^{\sigma} = 1$ in model (4) was either 0.75, 0.5, or 0.25. Further, for each $\pi$, we considered four scenarios corresponding to four model parameterizations of model (4), and varied $\gamma_{\sigma 1}$ to explore different effect sizes associated with the dichotomous predictor $x_i^{\sigma}$. The four model parameterizations applied to address our goals in this simulation study were. , , , , . The first model parameterization corresponded to the setting of homogeneous mean log variance in the presence of varying noise as defined by $\sigma_\xi$, while the second, third, and fourth scenarios, respectively, corresponded to increasing effect sizes of the dichotomous predictor $x_i^{\sigma}$ such that the fourth scenario represented a substantial difference in mean log within-center variance when $x_i^{\sigma} = 1$ vs when $x_i^{\sigma} = 0$ and in the presence of different degrees of variation as defined by $\sigma_\xi$. For instance, when $\gamma_{\sigma 0} = -1.5$, $\gamma_{\sigma 1} = 2.985$, and $\sigma_\xi = 0.5$, the mean within-center variance $E[\sigma_i^2]$ for the $i$th center was 0.25 and 5 for $x_i^{\sigma} = 0$ and $x_i^{\sigma} = 1$, respectively, or a 20-fold increase when $x_i^{\sigma} = 1$. An illustrative sample of the scenarios studied is shown in Table 1.

Table 1

An illustrative sample of the scenarios studied to assess the performance of the sample size approximation.

Parameterization		Mean within-center variance		Weighted average of mean within-center variance
No.	Parameters	E[σi² | xiσ = 1]	E[σi² | xiσ = 0]	π × E[σi² | xiσ = 1] + (1 − π) × E[σi² | xiσ = 0]
2	γσ0=−1.5, γσ1=0.680, σξ=0.5, π=0.75	0.50	0.25	0.44
4	γσ0=−1.5, γσ1=2.985, σξ=0.5, π=0.75	5.00	0.25	3.81
2	γσ0=−1.5, γσ1=0.680, σξ=1.5, π=0.75	1.36	0.69	1.19
4	γσ0=−1.5, γσ1=2.985, σξ=1.5, π=0.75	13.60	0.69	10.37
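The entries of Table 1 follow from the mean of a log-normal distribution. As a worked check for the first row (parameterization 2 with σξ = 0.5 and π = 0.75), assuming the log within-center variance is Normal with mean γσ0 + γσ1·xiσ and standard deviation σξ as in the data generating model:

```latex
\begin{align*}
E[\sigma_i^2 \mid x_i^{\sigma}] &= \exp\!\left(\gamma_{\sigma 0} + \gamma_{\sigma 1} x_i^{\sigma} + \tfrac{1}{2}\sigma_\xi^2\right) \\
E[\sigma_i^2 \mid x_i^{\sigma} = 0] &= \exp(-1.5 + 0.125) = \exp(-1.375) \approx 0.25 \\
E[\sigma_i^2 \mid x_i^{\sigma} = 1] &= \exp(-1.5 + 0.680 + 0.125) = \exp(-0.695) \approx 0.50 \\
0.75 \times 0.50 + 0.25 \times 0.25 &\approx 0.44
\end{align*}
```

The same arithmetic with γσ1 = 2.985 gives exp(1.61) ≈ 5.00, the 20-fold increase quoted in the text.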

Testing that the coefficient $\gamma_{\mu 1} = 0$ in model (8) against the two-sided alternative that $\gamma_{\mu 1} \neq 0$ is equivalent to applying a two-sample t-test to test the null hypothesis that the population means within each arm are equal vs the two-sided alternative that the population means are not equal. Because there are $c$ centers in each of two arms, the dataset comprises $2c$ numbers consisting of the sample means for each center. When the number of centers, and hence the degrees of freedom for the t-test ($2(c - 1)$ degrees of freedom), are small, the percentage points of the t distribution are appreciably larger than those of the standard normal distribution; to correct for this in our simulation study, we added one center to $c$ in the formula because the calculation makes use of the normal approximation [9]. We assumed a targeted effect (intervention vs. control) of , and settings where the number of observations $m$ within each center was 25, 50, or 100.

Results

For the sample of four scenarios illustrated in Table 1, corresponding results are shown in Fig. 1, Fig. 2 with , and . The scenarios were selected for presentation because they encapsulated the performance of the myriad scenarios studied. The figures show the power of the test associated with increasing numbers of centers based on simulation as well as the number of centers determined from the formula given the power of the test obtained from simulation.

Fig. 1

Parameterizations 2 and 4 (Table 1).

Fig. 2

Parameterizations 2 and 4 (Table 1).

Each figure shows the power of the test associated with increasing numbers of centers based on Monte Carlo simulation as well as the number of centers determined from the approximate sample size formula given the power of the test obtained from Monte Carlo simulations. The targeted effect (intervention vs. control) of was assumed.

The formula performed well across a wide variety of scenarios one may encounter in practice (not all shown). However, its robustness with 25 observations within each center was lacking for smaller numbers of centers in the presence of substantial differences in mean log within-center variation (parameterization 4), particularly when the proportion π of centers with increased mean log within-center variation was high and the error variance $\sigma_\xi^2$ was large. In these particular settings, the formula consistently indicated that more centers were needed to achieve the desired power. Practical limits of the formula as it pertains to the model parameterization of the internal variation are discussed in the supplementary materials.

Application

In the context of a recently performed data analysis, we consider an application of sample size planning of a hypothetical standard, parallel arms CRT with a continuous outcome.

Data analysis

Our symbolic two-step method was previously applied to a retrospective evaluation of bone marrow transplant center practices on post-transplant survival [4]. The retrospective data analysis was based on a subset of data obtained from a study conducted by Majhail et al. [10,11]. The subset comprised 3320 patients from 67 transplant centers across the United States (approximately 50 patients per center, on average). The continuous outcome of interest was the probability of survival at 1 year based on pseudo-observations, which represented the estimated value of the survival function at 1 year for each patient j in center i [12] and were calculated using the R package pseudo. Pseudo-observations are based on the idea of the jackknife estimator and involve transforming a set of survival-time observations into a set of pseudo-observations that can be modeled directly using regression analysis. Pseudo-observations are not necessarily restricted to the range 0–1, as occurred in our retrospective data analysis; however, they provided an alternative, albeit informative approach to illuminate the relationship between transplant center characteristics and survival by allowing the analysis of censored survival data by linear regression. After adjusting for the patient-level characteristics in $\mathbf{x}_{ij}$, namely, Karnofsky performance score, disease status, prior autologous transplant, time from diagnosis to transplant, donor-recipient HLA match, unrelated donor age, and Sorror comorbidity score index, the adjusted outcomes of patients from the same center were combined into a distribution of outcome representing center $i$. The distribution of outcome within each center was assumed to be a symbolic random variable ($Y_i$ for the $i$th center) corresponding to the set of adjusted outcomes within that center ($y^{a}_{ij}$ for $j = 1, \ldots, m_i$) such that each distribution of outcome was characterized by an internal mean $\mu_i$ and variance $\sigma_i^2$.
The center-level characteristics considered in the data analysis were allogeneic transplant volume (≤40 vs. >40), whether the transplant center participated in at least one clinical trial in the past 12 months (yes vs. no), and the number of ventilation units (2 units vs. 1 unit). Approximately 40% of the centers had an allogeneic transplant volume ≤40, 92% had participated in a clinical trial within the past 12 months, and 22% had a single ventilation unit in the transplant center. The results from applying our symbolic two-step method, which are shown in Table 2, indicated that centers with lower allogeneic transplant volume were associated with decreased center-mean survival at 1 year. In addition, the data suggested that increasing the number of ventilation units may result in a clinically important improvement in center-mean 1-year survival. Participation in clinical trials in the past 12 months was associated with lower log within-center variability in 1-year survival; furthermore, these data suggested that centers with lower allogeneic transplant volume may be associated with increased log within-center variability in 1-year survival. Increasing the number of ventilation units seemingly was not a predictor of log within-center variability in 1-year survival in the presence of the other two center-level factors (P = 0.345).
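The jackknife idea behind the pseudo-observations mentioned above can be illustrated with a small sketch. The analysis itself used the R package pseudo; the Python below is not that package, only a minimal stand-in built on a Kaplan-Meier estimate, with hypothetical follow-up times and function names of our own choosing.

```python
def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t); events[i] is 1 for a death, 0 for censoring."""
    surv = 1.0
    event_times = sorted({u for u, d in zip(times, events) if d and u <= t})
    for u in event_times:
        at_risk = sum(1 for v in times if v >= u)
        deaths = sum(1 for v, d in zip(times, events) if d and v == u)
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_observations(times, events, t):
    """Jackknife pseudo-observations: n * S(t) - (n - 1) * S(t) without patient i."""
    n = len(times)
    full = km_survival(times, events, t)
    values = []
    for i in range(n):
        leave_one_out = km_survival(times[:i] + times[i + 1:],
                                    events[:i] + events[i + 1:], t)
        values.append(n * full - (n - 1) * leave_one_out)
    return values


# Hypothetical follow-up times in years, evaluated at 1 year.
uncensored = pseudo_observations([0.2, 0.5, 1.4, 2.0, 3.1], [1, 1, 1, 1, 1], 1.0)
censored = pseudo_observations([0.2, 0.5, 0.7, 2.0, 3.1], [1, 0, 1, 1, 1], 1.0)
print(uncensored)
print(censored)
```

Without censoring, each pseudo-observation reduces to the indicator that the patient survived past 1 year. With censoring, values can fall below 0 or above 1, which is the behavior noted in the retrospective analysis.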

Table 2

Application of the symbolic two-step method. Effect of center-level characteristics on 1-year survival (pseudo-observations) after adjusting for patient-level characteristics.

Outcome	Center Characteristic	Parameter	Estimate (SE)	95% CI
Mean (μ)	Intercept	γμ0	0.817 (0.077)	(0.667, 0.968)
	Allogeneic transplant volume (≤40 vs >40)	γμ1	−0.076 (0.030)	(−0.135, −0.018)
	Recent participation in clinical trials (yes vs no)	γμ2	−0.027 (0.053)	(−0.130, 0.076)
	Number of ventilation units (2 vs 1)	γμ3	0.036 (0.031)	(−0.025, 0.098)
		σμ	0.104
Log variance (ξ)	Intercept	γσ0	−1.433 (0.106)	(−1.640, −1.225)
	Allogeneic transplant volume (≤40 vs >40)	γσ1	0.070 (0.041)	(−0.011, 0.150)
	Recent participation in clinical trials (yes vs no)	γσ2	−0.170 (0.072)	(−0.312, −0.028)
		σξ	0.143

Note: Pseudo-observations at 1-year represented the continuous outcome and were obtained from the R package pseudo. Pseudo-observations are not necessarily restricted to the range 0–1, as occurred in the current analysis; however, they provided an alternative, albeit informative approach to illuminate the relationship between transplant center characteristics and both the center-mean and log within center variability in 1-year survival by allowing the analysis of censored survival data by linear regression. SE = (model-based) standard error. The data set included 3320 patients from 67 transplant centers. After obtaining the patient-adjusted outcomes from Step 1 in the symbolic two-step method, the effect of the center-level characteristics on the center-mean survival at 1 year and log within center variability in 1-year survival were modeled separately using ordinary least squares estimation in Step 2.

Study design

Transplant facilities that are not properly ventilated, designed, or controlled can allow airborne pathogens to spread throughout the facility. Consequently, transplant patients with a compromised immune system will be susceptible to infection, leading to premature death. Suppose now that we are interested in designing a standard, parallel-arms CRT to test whether a novel ventilation strategy (intervention) improves center-mean survival at 1 year compared with usual care ventilation strategies (control). Further, suppose at the time of random assignment we will stratify by whether the center had participated in clinical trials within the past 12 months (yes; no) and by allogeneic transplant volume (≤40; >40), factors which we have reasonable evidence to be predictive of the log within-center variability in 1-year survival. As with our retrospective data analysis, we consider the patient-adjusted 1-year survival probabilities (pseudo-observations) as the continuous outcome of interest. We further assume that the log within-center variability in that outcome can be expressed as a parametric function of the dichotomous variables allogeneic transplant volume (≤40; >40) and recent participation in clinical trials (yes; no). In this hypothetical standard, parallel-arms CRT with a continuous outcome, we hypothesize that the novel ventilation strategy will increase the center mean of the patient-adjusted 1-year survival probabilities compared with usual care strategies such that $\gamma_{\mu 1} = 0.05$ in model (3), corresponding to a clinically important improvement in center-mean survival. To determine the number of centers needed to detect, with adequate power, this clinically important effect size for the novel ventilation strategy, we apply formula (5). Based on the results obtained from our retrospective data analysis (Table 2), we further suppose that the between-center variance $\sigma_\mu^2 = 0.01$. To obtain parameter estimates of $\gamma_{\sigma 0}$, $\gamma_{\sigma 1}$, $\gamma_{\sigma 2}$, and $\sigma_\xi^2$, we also turn to our earlier results.
Regressing the log within-center variability of the patient-adjusted 1-year survival probabilities on the binary predictors representing allogeneic transplant volume (≤40 vs. >40) and clinical trial experience within the past year (yes vs. no), we obtained mean estimates (95% CI) of the coefficients $\gamma_{\sigma 0}$, $\gamma_{\sigma 1}$, and $\gamma_{\sigma 2}$ of −1.43 (−1.64, −1.23), 0.07 (−0.01, 0.15), and −0.17 (−0.31, −0.03), respectively. Our estimate of $\sigma_\xi^2$ was 0.02. Based on the estimated proportion of the 67 transplant centers within each level of the crossed two-level stratification factors, we set $(\pi_1, \pi_2, \pi_3, \pi_4) = (0.02, 0.58, 0.06, 0.34)$, where $\pi_1, \ldots, \pi_4$ correspond to $(x_i^{\sigma 1}, x_i^{\sigma 2}) = (0,0), (0,1), (1,0)$, and $(1,1)$, with the constraint that $\sum_k \pi_k = 1$. Additionally, we will assume the possibility that $(\pi_1, \pi_2, \pi_3, \pi_4) = (0.15, 0.30, 0.25, 0.30)$. Further, we consider the following plausible values of $\gamma_{\sigma 0}$, $\gamma_{\sigma 1}$, and $\gamma_{\sigma 2}$ based on inspection of the corresponding 95% CIs: −1.23, 0.15, and −0.31. Table 3 displays the number of transplant centers (per arm) needed to detect a hypothesized difference of 0.05 (intervention vs. usual care ventilation strategies) in center-mean, patient-adjusted 1-year survival probabilities with 80% power and assuming a type I error rate of 5% under our assumptions, as well as the power obtained empirically via 10,000 replicates. Given the value of $c$, the empirical power was consistently close to 80%, i.e. the power applied in the formula.

Table 3

Input parameters								Ê[σi² | xiσ1, xiσ2]
γσ0	γσ1	γσ2	σξ²	m	σμ²	(π1, π2, π3, π4)	γμ1	Ê[σi² | 0,0]	Ê[σi² | 0,1]	Ê[σi² | 1,0]	Ê[σi² | 1,1]	c	Empirical power
−1.43	0.07	−0.17	0.02	50	0.01	(0.02, 0.58, 0.06, 0.34)	0.05	0.24	0.20	0.26	0.22	90	0.795
				100								77	0.798
				50		(0.15, 0.30, 0.25, 0.30)						92	0.797
				100								78	0.807
−1.23	0.15	−0.31	0.02	50	0.01	(0.02, 0.58, 0.06, 0.34)	0.05	0.30	0.22	0.34	0.25	93	0.797
				100								78	0.795
				50		(0.15, 0.30, 0.25, 0.30)						97	0.803
				100								80	0.791

Note: sample size based on the formula. Given $c$, the simulated power obtained empirically via 10,000 replicates is shown at the far right. When $x_i^{\sigma 1} = 0$, the allogeneic transplant volume at center $i$ is >40, while $x_i^{\sigma 1} = 1$ indicates that the allogeneic transplant volume at the center is ≤40. When $x_i^{\sigma 2} = 0$, transplant center $i$ did not recently participate in a clinical trial, while $x_i^{\sigma 2} = 1$ indicates that the center recently participated in a clinical trial.

Design application of the symbolic two-step method in sample size planning. Number of transplant centers (per arm) needed to detect a hypothesized difference of 0.05 in center-mean, patient-adjusted 1-year survival probabilities based on pseudo-observations with 80% power and a two-sided α of 5%.
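The center counts in Table 3 can be reproduced with a short sketch that extends the sample size formula to the four strata formed by crossing the two stratification factors. It assumes the strata are ordered (0,0), (0,1), (1,0), (1,1), as in the Table 3 column headings, and that the result is rounded up to the next whole center; the function name is ours.

```python
import math
from statistics import NormalDist

LEVELS = ((0, 0), (0, 1), (1, 0), (1, 1))  # (volume <= 40, recent trial participation)


def centers_per_arm_strata(delta, var_mu, m, pis, gammas, var_xi,
                           alpha=0.05, power=0.80):
    """Centers per arm when the within-center variance depends on two binary factors."""
    g0, g1, g2 = gammas
    # Weighted average of the log-normal means across the four strata.
    within = sum(p * math.exp(g0 + g1 * x1 + g2 * x2 + var_xi / 2)
                 for p, (x1, x2) in zip(pis, LEVELS))
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(2 * z_sum ** 2 * (var_mu + within / m) / delta ** 2)


observed_mix = (0.02, 0.58, 0.06, 0.34)
alternative_mix = (0.15, 0.30, 0.25, 0.30)

table3 = []
for gammas in ((-1.43, 0.07, -0.17), (-1.23, 0.15, -0.31)):
    for pis in (observed_mix, alternative_mix):
        for m in (50, 100):
            table3.append(centers_per_arm_strata(delta=0.05, var_mu=0.01, m=m,
                                                 pis=pis, gammas=gammas, var_xi=0.02))
print(table3)
```

Doubling the center size from 50 to 100 patients lowers the requirement only modestly because the between-center variance of 0.01 dominates the within-center term once it is divided by m.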

Discussion

The symbolic two-step method, which applies SDA, accounts for patient-level factors when estimating and testing a center-level effect on both the average center-level outcome and its variation. Because statistical inference is conducted at the center level rather than at the individual level, our methodologic approach parallels cluster-sampling designs. Further, because our method preserves the separation of the within- and between-center variation, and permits the within-center variation to depend on explanatory variables in a manner that can be estimated from the data, it conveniently facilitates sample size planning of CRTs. Our proposed formulaic approach to sample size planning within the cluster-sampling framework incorporated this knowledge and accounted for salient patient-level factors, so that the study design is consistent with a data analysis that applies the symbolic two-step method. A categorical predictor of the variation in the center-level outcome that is identified a priori by applying the symbolic two-step method, for example in a relevant retrospective data analysis, should be considered as a stratification factor at the time of randomization to ensure arm balance on the levels of the predictor. In that case, the estimated components obtained from modeling the variation of the center-level outcome in step 2 can be weighted according to the number of centers at each level of the stratification factor, and the usual formula for determining the approximate number of clusters to randomize in a CRT can be applied. When there are two categorical predictive factors, extending the number of stratification factors and applying the approximate sample size formula is straightforward, as we demonstrated with our design application.
Our approach to sample size determination performed well and was found to be robust for most pragmatic settings studied, and the formulaic approach obviates the need to estimate the sample size via extensive simulation studies. Because obtaining good estimates of the components needed for characterizing the within-center variation may be challenging, particularly in settings with small numbers of patients within centers, a range of plausible values of those components should be considered in order to assess the impact on sample size estimation. There are limitations to our research. To determine the number of centers to randomize with our method in a prospective CRT design, we assumed a fixed center size; in other words, we did not explore the implications on the study's power in the event of variable sample size per center. Further, equal weight is given to each center when modeling the average center-level outcome in step 2 of the symbolic two-step method, and weighted alternatives, while conceivable, were not considered. Additionally, the symbolic two-step method is limited to the setting of continuous response outcomes where maintaining the separation between the center and within-center contributions to the total variance is straightforward. While binomial and Poisson responses are also encountered in practice, the method is not currently able to directly handle these outcomes where variance is a function of the mean. These limitations, however, present opportunities for further research and future extensions of the method.

Conclusion

Knowing the predictive factors of the variation in the center-level outcome of interest is valuable in both the design and data analysis of standard, parallel-arms CRTs. Our proposed formulaic approach to sample size planning incorporated this knowledge and simultaneously accounted for patient-level characteristics. Our symbolic two-step method provides an alternate approach to both the design and analysis of standard, parallel-arms CRTs evaluating novel improvement processes within care delivery research.

Funding

This work was supported in part by CTSA Grant Number UL1 TR002377 from the National Center for Advancing Translational Sciences (NCATS). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH. This work was also partially supported by grants U10CA180882 (Alliance for Clinical Trials in Oncology Statistics and Data Management Grant) and P30CA15083 (Mayo Clinic Comprehensive Cancer Center grant).

Authors’ contributions

Both authors contributed equally to the manuscript.
