
Theory of general balance applied to step wedge designs.

Abstract

A standard idealized step-wedge design satisfies the requirements, in terms of the structure of the observation units, to be considered a balanced design and can be labeled as a criss-cross design (time crossed with cluster) with replication. As such, Nelder's theory of general balance can be used to decompose the analysis of variance into independent strata (grand mean, cluster, time, cluster:time, residuals). If time is considered as a fixed effect, then the treatment effect of interest is estimated solely within the cluster and time:cluster strata; the time effects are estimated solely within the time stratum. This separation leads directly to scalar, rather than matrix, algebraic manipulations to provide closed-form expressions for standard errors of the treatment effect estimate. We use the tools provided by the theory of general balance to obtain an expression for the standard error of the estimated treatment effect in a general case where the assumed covariance structure includes random-effects at the time and time:cluster levels. This provides insights that are helpful for experimental design regarding the assumed correlation within clusters over time, sample size in terms of numbers of clusters and replication within cluster, and components of the standard error for estimated treatment effect.


Keywords: cluster randomized clinical trials; linear mixed models; step-wedge design


Year: 2018 PMID: 30209821 PMCID: PMC6585670 DOI: 10.1002/sim.7960

Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.373

BACKGROUND AND MOTIVATION

The theory of general balance is presented in two papers by Nelder1, 2 and is expounded upon, in particular by using Hasse diagrams to perform calculations, in the work of Bailey.3 The theory applies to experimental designs in which the structure of the observation units can be represented by an arbitrary degree of nesting and crossing of blocks, provided that the level of replication within any level is constant. As such, an idealized step-wedge (SW) design is an example of time crossed with cluster. The first of the two papers focused on the structure of observational units to the exclusion of treatment structure. It provides a calculus to decompose the implied covariance structure into orthogonal strata, with three different parametrizations that correspond to the expected sums of squares, the variance parameters of random effects, and interpretable correlation coefficients, along with the corresponding degrees of freedom. The use of Hasse diagrams adds some clarity to this calculus,3 although it stops short of providing all three versions of the parametrization. The second of the papers2 considers how to proceed with estimation of fixed treatment effects. Each fixed effect to be estimated may have information contained within any of the strata. Estimates from different strata have the useful property of mutual independence, conditional on the true values, and the covariance decomposition immediately provides expressions for the variance of each estimate within a stratum. In the canonical SW setting, aside from the universal level, sometimes referred to as the grand mean, there are four possible strata levels, ie, cluster, time, cluster:time, and residual error. The adjustment for time as a fixed, rather than random, effect is the default, as assumed in the work of Hussey and Hughes,4 and has the simplification that the estimation of time effects is contained entirely within the time stratum. 
Hence, the multidimensional information matrix that would need to be algebraically inverted in a modern analysis (see the work of Pinheiro and Bates5) of a linear mixed model with fixed effects for intercept, time, and treatment is conveniently avoided. When the information matrix is decomposed by strata, in the cluster and cluster-time strata, all elements are zero aside from the diagonal element corresponding to the treatment effect. Hence, the generalized inverse of the information matrix, which is needed to express the standard error of the stratum-specific treatment effect estimate, reduces to a scalar inverse. This paper is focused on applying the methods of general balance in the setting of an SW design. The Hussey-and-Hughes4 assumptions will be used predominantly as an exemplar, and a subsequent generalization is provided to show how the methods can be used in an alternative setting. There are numerous variations on these assumptions, ie, fixed or random effects at any stratum, and taking repeated observations within the same participant, rather than new participants within the same cluster, which adds a new cluster-participant stratum. The methods demonstrated within this paper can be easily applied by the reader across different combinations of assumptions. At the study design stage, a statistician typically has assumed numerical values for the standard deviation, the treatment effect size of interest, and the within-cluster correlation. The result of applying methods from this paper is a clear exposition, in terms of simple closed-form expressions, of the standard error of treatment effect estimates at each stratum and overall, and thus of the statistical power. In contrast to the alternative of numerical calculations, we gain an understanding of the implications of the assumptions, uncertainty around parameter values, and the interplay between the number of clusters, participants, and replications.

NOTATION

We assume the same notation as Hussey and Hughes4 and will resolve this with Nelder,1, 2 providing explanation as needed. Let $Y_{ijk}$ be the response for individual $k$ at time $j$ from cluster $i$ ($i = 1,\dots,I$; $j = 1,\dots,J$; $k = 1,\dots,N$). We assume that

$$Y_{ijk} = \mu + a_i + \beta_j + x_{ij}\theta + e_{ijk}, \qquad \operatorname{Var}(a_i) = \tau^2, \qquad \operatorname{Var}(e_{ijk}) = \sigma_e^2.$$

The random elements $a_i$ and $e_{ijk}$ are assumed to be mutually independent. We also define $\sigma^2 = \tau^2 + \sigma_e^2$, ie, the overall variation (Hussey and Hughes4 attach a different definition to $\sigma^2$ with no indices). The correlation within clusters is captured by the random effects, ie, $a_i$; the fixed time effects are $\beta_j$; and the treatment indicator $x_{ij}$ is constant within cluster time. The SW design imposes a condition on this covariate $x_{ij}$ to ensure that only unidirectional switching happens, so $x_{ij'} \ge x_{ij}$ for all $j' > j$; however, the results presented here apply to any choice of treatment mapping. The treatment effect size is $\theta$. As is common practice in study design, we assume that the values of the covariance parameters $\tau^2$, $\sigma_e^2$ are known.
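As a concrete illustration of the treatment indicator, the following minimal Python sketch builds the indicator for a small SW layout and checks the unidirectional-switching condition. The function names and the `switch_period` argument are hypothetical conveniences for this example, not notation from Hussey and Hughes, and indices are 0-based.

```python
def stepped_wedge(n_clusters, n_periods, switch_period):
    """Return x[i][j] = 1 if cluster i is treated in period j, else 0.

    switch_period[i] is the first (0-based) period in which cluster i is
    treated; this argument is a hypothetical convenience for the sketch.
    """
    return [[1 if j >= switch_period[i] else 0 for j in range(n_periods)]
            for i in range(n_clusters)]


def is_unidirectional(x):
    """Check x[i][j2] >= x[i][j1] whenever j2 > j1 (no switching back)."""
    return all(row[j] <= row[j + 1] for row in x for j in range(len(row) - 1))


# Illustrative layout: I = 4 clusters, J = 5 periods, one cluster crossing per step.
x = stepped_wedge(4, 5, switch_period=[1, 2, 3, 4])
for row in x:
    print(row)
print("unidirectional:", is_unidirectional(x))
```

Because the results of the paper hold for any treatment mapping, the same nested-list layout can hold an arbitrary 0/1 pattern; only the monotonicity check is specific to the SW design.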

EXPERIMENTAL UNITS

We can classify the experimental units as criss‐cross (of time and cluster) with split plot (replications).1 Under the assumptions of balance, this is condensed into (I × J)→N. The equivalent Hasse diagram is given in Figure 1.

Figure 1

Hasse diagram: number of levels, degrees of freedom, ρσ², fσ², ζσ²

Expressing the degrees of freedom identity under the calculus is the first step, which, in this case, gives terms for intercept, cluster, time, time-cluster, and residual, respectively,

$$IJN = 1 + (I-1) + (J-1) + (I-1)(J-1) + IJ(N-1). \qquad (1)$$

The individual terms can be obtained by working down the Hasse diagram, where $d_{\text{universal}} = 1$ and

$$d_F = n_F - \sum_{G \succ F} d_G,$$

where $n_F$ is the number of levels within stratum $F$. The notation $G \succ F$ is defined precisely in the work of Bailey3 but simply means strata $G$ that are above stratum $F$.

Averaging matrices are defined to transform the vector $Y$ to a vector of averages repeated over the index that has been dropped, for example $\bar{Y}_{i\cdot\cdot}$, the $ijk$th component of $\mathbf{B}_{\text{cluster}}Y$, where

$$\mathbf{B}_{\text{cluster}} = \mathbf{I}_{\text{cluster}} \otimes \tfrac{1}{J}\mathbf{J}_{\text{time}} \otimes \tfrac{1}{N}\mathbf{J}_{\text{individual}}.$$

Here, bold font is used to indicate a matrix, where $\mathbf{J}$ is a square matrix of 1s and $\mathbf{I}$ is the identity matrix, both with dimensions equal to the number of levels within the stratum given in the suffix. The Kronecker product is indicated by $\otimes$. Generalization of this example to other combinations of indices is trivial.

Next, we map each term in the degrees of freedom Equation (1) to a set of orthogonal matrices that are used to calculate the sums of squares within each stratum, ie, $\mathbf{C}_{\text{universal}}$, $\mathbf{C}_{\text{cluster}}$, $\mathbf{C}_{\text{time}}$, $\mathbf{C}_{\text{cluster:time}}$, $\mathbf{C}_{\text{individual}}$. For example, $\mathbf{C}_{\text{cluster:time}} = \mathbf{B}_{\text{cluster:time}} - \mathbf{B}_{\text{cluster}} - \mathbf{B}_{\text{time}} + \mathbf{B}_{\text{universal}}$ is obtained by the following: starting with the term $(I-1)(J-1)$, the fourth term on the right-hand side of Equation (1); replacing $I$ with cluster, $J$ with time, and 1 with universal; symbolically expanding to obtain the expression cluster:time - cluster - time + universal; mapping each term to the index of an averaging matrix and preserving the coefficients. Alternatively, we work down the Hasse diagram to apply the equation

$$\mathbf{C}_F = \mathbf{B}_F - \sum_{G \succ F} \mathbf{C}_G.$$

It is shown that these form a set of orthogonal matrices1 and that the sums of squares are thus independent and defined as $Y^{T}\mathbf{C}_F Y$.
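The averaging and projection operators can be checked numerically on a tiny layout. The sketch below is an illustration under our own conventions, not the authors' supplementary R code: it avoids explicit Kronecker products by storing a response as a dictionary keyed by (i, j, k), applies each averaging operator by group means, forms each projection as the signed sum of averages described above, and recovers the degrees of freedom as the trace of each projection.

```python
from itertools import product

I, J, N = 3, 4, 2  # illustrative sizes only
units = list(product(range(I), range(J), range(N)))

# A stratum is identified by the index positions it keeps
# (0 = cluster i, 1 = time j, 2 = replicate k).
KEEP = {"universal": (), "cluster": (0,), "time": (1,),
        "cluster:time": (0, 1), "individual": (0, 1, 2)}

# Each projection as signed averaging operators (expansion of the df terms).
TERMS = {
    "universal": [(1, "universal")],
    "cluster": [(1, "cluster"), (-1, "universal")],
    "time": [(1, "time"), (-1, "universal")],
    "cluster:time": [(1, "cluster:time"), (-1, "cluster"),
                     (-1, "time"), (1, "universal")],
    "individual": [(1, "individual"), (-1, "cluster:time")],
}


def average(y, keep):
    """Averaging operator: mean over all units sharing the kept indices."""
    sums, counts = {}, {}
    for u in units:
        key = tuple(u[p] for p in keep)
        sums[key] = sums.get(key, 0.0) + y[u]
        counts[key] = counts.get(key, 0) + 1
    return {u: sums[tuple(u[p] for p in keep)] / counts[tuple(u[p] for p in keep)]
            for u in units}


def project(y, stratum):
    """Stratum projection of y, as a signed sum of averages."""
    means = {name: average(y, KEEP[name]) for _, name in TERMS[stratum]}
    return {u: sum(sign * means[name][u] for sign, name in TERMS[stratum])
            for u in units}


def trace(stratum):
    """Trace of the projection, obtained by projecting each unit vector."""
    total = 0.0
    for u in units:
        e = {v: (1.0 if v == u else 0.0) for v in units}
        total += project(e, stratum)[u]
    return total


df = {s: round(trace(s)) for s in KEEP}
print(df)  # degrees of freedom per stratum; they sum to I*J*N
```

The brute-force trace is only practical for small layouts, but it makes the link between the degrees of freedom identity and the projections explicit.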

COVARIANCE DECOMPOSITION

The covariance matrix of $Y$ can be written in three different parametrizations,

$$\operatorname{Var}(Y) = \sigma^2\sum_F \rho_F \mathbf{W}_F = \sigma^2\sum_F f_F \mathbf{U}_F = \sigma^2\sum_F \zeta_F \mathbf{C}_F, \qquad (2)$$

where $F$ is used to index the strata. The first of these parametrizations is the easiest to conceptualize and to propose suitable values for at the study design stage; $\rho_F$ represents the correlation between two observations that are within the same level of $F$, but not common to any strata below $F$. Following Hussey and Hughes,4 we are assuming the following: $\rho_{\text{universal}} = 0$, the universal or grand mean; $\rho_{\text{cluster}} = \tau^2/\sigma^2$, cluster level; $\rho_{\text{time}} = 0$, as per the assumption of fixed time effects; $\rho_{\text{cluster:time}} = \rho_{\text{cluster}}$, patients within the same cluster-period are no more correlated than those in the same cluster, but different periods; $\rho_{\text{residual}} = 1$, a mathematical formality that a random variable is perfectly correlated with itself. The third assumption may be less plausible in some contexts. The second parametrization in (2) corresponds to expressing the model as a mixed effect model, as in Section 2, where $f_F$ is the assumed variance parameter (or zero in its absence) of an independent random intercept at the corresponding stratum level, scaled by the overall variance. The third parametrization in (2) is the expected mean sum-of-squares under the null hypothesis and provides the covariance matrix for estimates obtained in each stratum. The core of this paper lies in showing how to derive expressions for $\zeta_F$. These values are of importance as they readily lead to expressions for the stratum-specific and overall information, and the standard error of the treatment effects (4), which, in turn, can be used in standard formulae for power and significance level. Nelder1 derived these relationships, gave precise definitions to $\mathbf{W}_F$, $\mathbf{U}_F$, and a calculus to obtain $f_F$ and $\zeta_F$ from $\rho_F$. Note that we have scaled $\zeta_F$ by $\sigma^2$. The detailed steps of the calculations and results are given in Table 1. 
The equivalent obtained by working up and down the Hasse diagram using the rules described in the work of Bailey3 and stated below is in Figure 1.

Table 1

Hussey and Hughes covariance parametrizations

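Stratum	$\rho$	$f$	$\sigma^2\zeta$
Overall mean	$0$	$\rho_{\text{universal}} = 0$	$\sigma^2(IJN f_{\text{universal}} + IN f_{\text{time}} + JN f_{\text{cluster}} + N f_{\text{cluster:time}} + f_{\text{individual}}) = JN\tau^2 + \sigma_e^2$
Cluster	$\tau^2/\sigma^2$	$\rho_{\text{cluster}} - \rho_{\text{universal}} = \tau^2/\sigma^2$	$\sigma^2(JN f_{\text{cluster}} + N f_{\text{cluster:time}} + f_{\text{individual}}) = JN\tau^2 + \sigma_e^2$
Time	$0$	$\rho_{\text{time}} - \rho_{\text{universal}} = 0$	$\sigma^2(IN f_{\text{time}} + N f_{\text{cluster:time}} + f_{\text{individual}}) = \sigma_e^2$
Cluster:Time	$\tau^2/\sigma^2$	$\rho_{\text{cluster:time}} - \rho_{\text{cluster}} - \rho_{\text{time}} + \rho_{\text{universal}} = 0$	$\sigma^2(N f_{\text{cluster:time}} + f_{\text{individual}}) = \sigma_e^2$
Residual	$1$	$\rho_{\text{individual}} - \rho_{\text{cluster:time}} = \sigma_e^2/\sigma^2$	$\sigma^2 f_{\text{individual}} = \sigma_e^2$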
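The steps recorded in Table 1 can be written as two short functions: one taking differences of the correlations to obtain f, and one accumulating f over the strata at or below each stratum, weighted by the number of observations per level, to obtain σ²ζ. This is a sketch with illustrative parameter values chosen by us; exact rational arithmetic is used so the closed forms can be compared without rounding.

```python
from fractions import Fraction

STRATA = ("universal", "cluster", "time", "cluster:time", "individual")


def f_from_rho(rho):
    """The f column of Table 1: differences of correlations."""
    return {
        "universal": rho["universal"],
        "cluster": rho["cluster"] - rho["universal"],
        "time": rho["time"] - rho["universal"],
        "cluster:time": (rho["cluster:time"] - rho["cluster"]
                         - rho["time"] + rho["universal"]),
        "individual": rho["individual"] - rho["cluster:time"],
    }


def sigma2_zeta(f, I, J, N, sigma2):
    """The sigma^2 * zeta column of Table 1."""
    fu, fc, ft, fct, fi = (f[s] for s in STRATA)
    return {
        "universal": sigma2 * (I * J * N * fu + I * N * ft + J * N * fc + N * fct + fi),
        "cluster": sigma2 * (J * N * fc + N * fct + fi),
        "time": sigma2 * (I * N * ft + N * fct + fi),
        "cluster:time": sigma2 * (N * fct + fi),
        "individual": sigma2 * fi,
    }


# Illustrative values only (not taken from the paper).
I, J, N = 4, 5, 10
tau2, sigma2_e = Fraction(1, 10), Fraction(9, 10)
sigma2 = tau2 + sigma2_e

# Hussey and Hughes assumptions expressed as correlations.
rho = {"universal": 0, "cluster": tau2 / sigma2, "time": 0,
       "cluster:time": tau2 / sigma2, "individual": 1}
f = f_from_rho(rho)
zeta = sigma2_zeta(f, I, J, N, sigma2)
for name in STRATA:
    print(name, f[name], zeta[name])
```

Changing the entries of `rho` is enough to explore other assumptions, for example a nonzero time correlation, without touching the two functions.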

If observations at distinct time points are taken on different participants, then the assumption of a constant correlation within a cluster appears plausible, as a consequence of time-invariant characteristics that are unique to each cluster. A variant of the SW design repeatedly observes participants longitudinally over time. Including an additional stratum for each patient, nested below the cluster level and above the residual level, may account for this. The methods described herein allow the reader to easily derive equivalent formulae. However, an extra stratum makes no provision for allowing the correlation to decrease for within-participant observations that are further apart in time. Varying correlation falls outside the scope of this paper.

ESTIMATION

The second of Nelder's papers2 covers how to estimate treatment effects and provide standard errors, moving away from the null hypothesis. It assumes that there is a general design matrix $X$, with $IJN$ rows and columns for all fixed effects including intercept, treatment effect, and time effects. We define the vector of coefficients $\beta$ to satisfy $E\,Y = X\beta$. The response vector is projected within each stratum's subspace, and a stratum-specific estimate is provided:

$$\hat{\beta}_F = (X^{T}\mathbf{C}_F X)^{-} X^{T}\mathbf{C}_F Y, \qquad (3)$$

$$\operatorname{Var}(\hat{\beta}_F) = \sigma^2\zeta_F\,(X^{T}\mathbf{C}_F X)^{-}. \qquad (4)$$

Define the basis $X_{\text{time}} = \{e_1,\dots,e_J\}$, where $e_j$ is a vector of length $IJN$ whose rows corresponding to time $j$ are 1, and 0 otherwise. Thus, $X_{\text{time}}$ can be used as the columns of the design matrix for time. Consider the mapping $\mathbf{B}_F e_j$. Each cluster-level average of $e_j$ is $1/J$, which also equates to the universal average of $e_j$. At the cluster-time and time levels, the average is either 1 or 0, indicating whether the time point at which the average is taken equals the suffix $j$ in $e_j$. The projection matrices $\mathbf{C}_F$ are differences of averages (Section 3), which cancel out for cluster and cluster time, and thus the columns for times in $X$ are orthogonal to $\mathbf{C}_{\text{cluster}}$ and $\mathbf{C}_{\text{cluster:time}}$. Hence, the time fixed effects are entirely contained within the remaining time stratum. Similarly, the intercept is entirely contained in the universal stratum level. The matrix inversion denoted in (3) and (4) will thus involve singular matrices with rows and columns of zeroes, corresponding to the columns of $X$ that are orthogonal to the $\mathbf{C}_F$. Therefore, the stratum-specific Equations (3) and (4) are modified to drop such columns from $X$. The equations now reduce to scalar equations in terms of $\theta$, the treatment effect, for the cluster and cluster-time strata, which greatly simplifies the algebraic inversion. We can replace the general design matrix $X$ with the column vector $x = \{x_{ijk}\}$, as defined in Section 2, where the right-most index varies the fastest. Note the assumption that $x_{ijk} = x_{ij}$ is constant for all $k$, which is true for an SW design. 
We must obtain expressions for the variances of the treatment effect estimates in each stratum using the specific case of $x$, associated with the specific SW design and the specific values of $\mathbf{C}_{\text{cluster}}$, $\mathbf{C}_{\text{cluster:time}}$. The explicit calculation of individual terms is trivial to evaluate by summing $x$ over the indexes that correspond to a $\mathbf{J}$ in the Kronecker matrix products that define $\mathbf{B}_F$ (Section 3). We also note that, as $x_{ij} \in \{0,1\}$, $x_{ij}^2 = x_{ij}$. Using the notation of Hussey and Hughes,4 $U = \sum_{i,j} x_{ij}$, $W = \sum_j \big(\sum_i x_{ij}\big)^2$, and $V = \sum_i \big(\sum_j x_{ij}\big)^2$, together with the expression for $\zeta$ in Table 1, we obtain the information values from the cluster and cluster:time strata, as shown in Table 2.

Table 2

Information levels by stratum

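Stratum	Hussey and Hughes	General Case
Cluster	$\dfrac{NV/J - NU^2/(IJ)}{JN\tau^2 + \sigma_e^2}$	$\dfrac{NV/J - NU^2/(IJ)}{JN\tau_{\text{cluster}}^2 + N\Delta + \sigma_e^2}$
Time	No information	$\dfrac{NW/I - NU^2/(IJ)}{IN\tau_{\text{time}}^2 + N\Delta + \sigma_e^2}$
Cluster:Time	$\dfrac{NU - NV/J - NW/I + NU^2/(IJ)}{\sigma_e^2}$	$\dfrac{NU - NV/J - NW/I + NU^2/(IJ)}{N\Delta + \sigma_e^2}$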
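The entries of Table 2 depend on the design only through U, V, and W, so they are easy to evaluate for any 0/1 layout. The sketch below is illustrative: the function names are ours, the parameter values are arbitrary, `tau2_time=None` is our way of flagging fixed time effects, and the cluster-period increment is taken to enter the denominators as NΔ, in line with Table 3.

```python
from fractions import Fraction


def uvw(x):
    """U, V, W in the notation of Hussey and Hughes for a 0/1 layout x[i][j]."""
    I, J = len(x), len(x[0])
    U = sum(sum(row) for row in x)
    V = sum(sum(row) ** 2 for row in x)                              # squared cluster totals
    W = sum(sum(x[i][j] for i in range(I)) ** 2 for j in range(J))   # squared period totals
    return U, V, W


def stratum_information(x, N, tau2_cluster, sigma2_e, tau2_time=None, delta=0):
    """Information on the treatment effect by stratum (Table 2).

    tau2_time=None is a hypothetical flag meaning fixed time effects,
    in which case the time stratum carries no information.
    """
    I, J = len(x), len(x[0])
    U, V, W = uvw(x)
    base = Fraction(N * U * U, I * J)
    num_cluster = Fraction(N * V, J) - base
    num_time = Fraction(N * W, I) - base
    num_ct = N * U - Fraction(N * V, J) - Fraction(N * W, I) + base
    return {
        "cluster": num_cluster / (J * N * tau2_cluster + N * delta + sigma2_e),
        "time": (Fraction(0) if tau2_time is None
                 else num_time / (I * N * tau2_time + N * delta + sigma2_e)),
        "cluster:time": num_ct / (N * delta + sigma2_e),
    }


# Illustrative SW layout: 4 clusters, 5 periods, one cluster crossing per step.
x = [[0, 1, 1, 1, 1],
     [0, 0, 1, 1, 1],
     [0, 0, 0, 1, 1],
     [0, 0, 0, 0, 1]]
info = stratum_information(x, N=10, tau2_cluster=Fraction(1, 10),
                           sigma2_e=Fraction(9, 10))
print(uvw(x), info)
```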

Having obtained multiple estimates of the treatment effect at different strata, the analysis is not yet concluded; the statistician must use these to report a single treatment effect estimate and associated standard error that is optimal in some way. Given that the estimates from the two strata are known to be independently distributed, the minimum variance linear combination is shown in the work of Scheffé6 to be the overall least squares estimate, assuming that the covariance parameters are known, and thus is also the maximum likelihood estimate under normality assumptions. The optimal linear combination takes weights proportional to the information values (formally, the inverse variances) of the individual estimates, and the resulting estimate has information equal to the sum of the individual information values. Simple algebraic manipulation of the entries of Table 2 verifies the formula given, but not derived, in the work of Hussey and Hughes.4 At the analysis stage, we could continue in the same manner of projecting the response vector $Y$ and the design matrix $X$ by strata, using the $\mathbf{C}_F$ projections, and then fitting a standard linear regression model within each stratum, as per the work of Chambers et al.7 If working by hand, one would calculate the projections by iteratively working down the Hasse diagram, taking the average for each level of the stratum, and subtracting off the projection from the matching level of any parent strata. The residual mean squares at the strata provide estimates of $\sigma^2\zeta$ that, using the formula in Table 3 for the general case, could be inverted to provide estimates of the random effects variance parameters. In the case of the Hussey-and-Hughes assumptions, there are multiple estimates of $\sigma_e^2$, at the residual, cluster time, and time strata, which need to be combined as a sum with weights proportional to the residual degrees of freedom at each stratum. 
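The pooling step can be checked with exact arithmetic. The sketch below sums the cluster and cluster:time information under the Hussey-and-Hughes assumptions and compares the resulting variance with the closed form of Hussey and Hughes as we understand it, in which the variance of a cluster-period mean is σ_e²/N. The function names and numerical values are illustrative assumptions.

```python
from fractions import Fraction


def stratum_info(U, V, W, I, J, N, tau2, sigma2_e):
    """Cluster and cluster:time information under Hussey-and-Hughes assumptions."""
    base = Fraction(N * U * U, I * J)
    info_cluster = (Fraction(N * V, J) - base) / (J * N * tau2 + sigma2_e)
    info_ct = (N * U - Fraction(N * V, J) - Fraction(N * W, I) + base) / sigma2_e
    return info_cluster, info_ct


def pooled_variance(U, V, W, I, J, N, tau2, sigma2_e):
    """Variance of the inverse-variance weighted combination of the two estimates."""
    return 1 / sum(stratum_info(U, V, W, I, J, N, tau2, sigma2_e))


def hussey_hughes_variance(U, V, W, I, J, N, tau2, sigma2_e):
    """Closed form as we recall it from Hussey and Hughes (treat as an assumption)."""
    s2 = sigma2_e / N  # variance of a cluster-period mean
    return (I * s2 * (s2 + J * tau2)
            / ((I * U - W) * s2 + (U * U + I * J * U - J * W - I * V) * tau2))


# Illustrative design summaries: 4 clusters, 5 periods, one cluster crossing per step.
args = dict(U=10, V=30, W=30, I=4, J=5, N=10,
            tau2=Fraction(1, 10), sigma2_e=Fraction(9, 10))
info_cluster, info_ct = stratum_info(**args)
weight_cluster = info_cluster / (info_cluster + info_ct)  # weight on the cluster-stratum estimate
print(weight_cluster, pooled_variance(**args), hussey_hughes_variance(**args))
```

With these values the cluster stratum receives a small weight, which reflects how the denominator containing JNτ² discounts between-cluster comparisons.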
A further complication is that these residual mean squares are calculated conditional on the fixed effect parameters estimated within each stratum. Ultimately, we want to replace the individual‐stratum fixed effect estimates with pooled values, which are obtained using weights derived from covariance parameters. Hence, an iterative calculation to achieve convergence would be required. In practice, at the analysis stage, we would apply modern software5 using restricted maximum likelihood estimation (REML) to estimate both the covariance parameters and fixed effects, using optimal weighting of observations to account for any deviations from the assumption of exact balance for any missing clusters/times or unequal replication. Supplementary material in the form of R code is provided to illustrate the calculations.

Table 3

General covariance parametrizations

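Stratum	$\rho$	$f$	$\sigma^2\zeta$
Overall mean	$0$	$\rho_{\text{universal}} = 0$	$\sigma^2(IJN f_{\text{universal}} + IN f_{\text{time}} + JN f_{\text{cluster}} + N f_{\text{cluster:time}} + f_{\text{individual}}) = JN\tau_{\text{cluster}}^2 + IN\tau_{\text{time}}^2 + N\Delta + \sigma_e^2$
Time	$\tau_{\text{time}}^2/\sigma^2$	$\rho_{\text{time}} - \rho_{\text{universal}} = \tau_{\text{time}}^2/\sigma^2$	$\sigma^2(IN f_{\text{time}} + N f_{\text{cluster:time}} + f_{\text{individual}}) = IN\tau_{\text{time}}^2 + N\Delta + \sigma_e^2$
Cluster	$\tau_{\text{cluster}}^2/\sigma^2$	$\rho_{\text{cluster}} - \rho_{\text{universal}} = \tau_{\text{cluster}}^2/\sigma^2$	$\sigma^2(JN f_{\text{cluster}} + N f_{\text{cluster:time}} + f_{\text{individual}}) = JN\tau_{\text{cluster}}^2 + N\Delta + \sigma_e^2$
Cluster:Time	$\tau_{\text{cluster:time}}^2/\sigma^2$	$\rho_{\text{cluster:time}} - \rho_{\text{cluster}} - \rho_{\text{time}} + \rho_{\text{universal}} = \Delta/\sigma^2$	$\sigma^2(N f_{\text{cluster:time}} + f_{\text{individual}}) = N\Delta + \sigma_e^2$
Residual	$1$	$\rho_{\text{individual}} - \rho_{\text{cluster:time}} = \sigma_e^2/\sigma^2$	$\sigma^2 f_{\text{individual}} = \sigma_e^2$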


GENERALIZATION

We can relax the assumptions of Hussey and Hughes4 by introducing two additional parameters. We can allow random, rather than fixed, time effects, which requires an extra correlation parameter, $\rho_{\text{time}}$, to quantify the degree of correlation between two participants observed in the same time period but from distinct clusters. There may be further, super-additive, correlation between participants in the same cluster-period, where $\rho_{\text{cluster:time}} = \Delta/\sigma^2 + \rho_{\text{cluster}} + \rho_{\text{time}}$, and we use $\Delta$ to represent the increment in correlation. We modify Table 1 to obtain Table 3. The equivalent Hasse diagram is Figure 2. The corresponding information values from all three strata are given in Table 2.
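To see how the two extra parameters act, the following sketch evaluates the general-case information with the denominators of Table 3 and compares it with the limits reached as the replication N grows. The function names and all numerical values are illustrative assumptions; floating-point arithmetic is sufficient here.

```python
def general_information(N, U, V, W, I, J, tau2_cluster, tau2_time, delta, sigma2_e):
    """General-case information by stratum, using the denominators of Table 3."""
    base = N * U * U / (I * J)
    return {
        "cluster": (N * V / J - base) / (J * N * tau2_cluster + N * delta + sigma2_e),
        "time": (N * W / I - base) / (I * N * tau2_time + N * delta + sigma2_e),
        "cluster:time": (N * U - N * V / J - N * W / I + base) / (N * delta + sigma2_e),
    }


def limiting_information(U, V, W, I, J, tau2_cluster, tau2_time, delta):
    """Limit of each term as N grows without bound (requires delta > 0)."""
    base = U * U / (I * J)
    return {
        "cluster": (V / J - base) / (J * tau2_cluster + delta),
        "time": (W / I - base) / (I * tau2_time + delta),
        "cluster:time": (U - V / J - W / I + base) / delta,
    }


# Illustrative values: design summaries of a 4-cluster, 5-period SW layout.
design = dict(U=10, V=30, W=30, I=4, J=5)
params = dict(tau2_cluster=0.10, tau2_time=0.02, delta=0.05)
sigma2_e = 0.83

totals = [sum(general_information(N, **design, **params, sigma2_e=sigma2_e).values())
          for N in (10, 100, 1000, 10000)]
ceiling = sum(limiting_information(**design, **params).values())
print(totals, ceiling)
```

With Δ > 0 the total information climbs towards a finite ceiling as N increases, whereas letting the time variance grow very large drives the time-stratum term to zero, which corresponds to treating time as fixed.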

Figure 2

Hasse diagram generalized structure

EFFICIENCY FACTOR

The evaluation of $U$, $V$, and $W$ provides expressions for the variance of the treatment effect estimate for all values of the design matrix $X$. The most efficient design, with the smallest estimated treatment effect variance, is obtained when individual-level randomization is used, blocked at the cluster-time level. Assuming equal and balanced randomization within each cluster-time block, it is easily shown that $U = IJ/2$, $V = IJ^2/4$, $W = JI^2/4$. Hence, from Table 2, under either set of assumptions, the cluster and time strata provide exactly zero information and the numerator in the cluster-time stratum is $NIJ/4$, thus leading to a well-known expression for the variance under individual-level randomization. For any other design, the efficiency factor is the ratio of the treatment effect variance under individual-level randomization to the variance under the design in question, assuming the same number of observations overall. Sample sizes may be calculated by initially assuming individual-level randomization and then scaling up the resulting sample size by the reciprocal of the efficiency factor for the actual design choice. The efficiency factor thus gives a simple summary to enable a fair comparison between design choices. Beyond making detailed calculations for $U$, $V$, $W$ in specific cases, it is helpful to provide a rule-of-thumb for the efficiency factor of the SW design. We consider an abstraction that approximates finite sums with continuous integrals (for example, $U = \sum_{i,j} x_{ij} \approx \int_0^J\!\int_0^J 1\{i \le j\}\,di\,dj = J^2/2$), and for simplification, we assume $I = J$ and that the summand/integrand is the indicator function that $i \le j$. This leads to $U = J^2/2$, $V = W = J^3/3$, and the numerators in Table 2 are all equal to $NJ^2/12$. Hence, the efficiency factor for the Hussey and Hughes design is approximated by

$$\frac{1}{3}\left(1 + \frac{1 - \rho_{\text{cluster}}}{1 + (JN - 1)\rho_{\text{cluster}}}\right),$$

which goes from 2/3 to 1/3 as $\rho_{\text{cluster}}$ goes from 0 to 1.
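The rule of thumb can be compared with an exact calculation for the discrete version of the same abstraction. The sketch below takes the efficiency factor to be the ratio of the information under the SW layout to the information under individual-level randomization, which is the reading consistent with the stated range of 2/3 to 1/3. The approximating expression is our reconstruction from Table 2 with the total variance scaled to 1; the function names and inputs are illustrative.

```python
from fractions import Fraction


def exact_efficiency(J, N, rho):
    """Exact information ratio for I = J and x_ij = 1 when i <= j (requires rho < 1)."""
    I = J
    x = [[1 if i <= j else 0 for j in range(J)] for i in range(I)]
    U = sum(sum(row) for row in x)
    V = sum(sum(row) ** 2 for row in x)
    W = sum(sum(x[i][j] for i in range(I)) ** 2 for j in range(J))
    tau2, sigma2_e = rho, 1 - rho  # total variance scaled to 1
    base = Fraction(N * U * U, I * J)
    info_cluster = (Fraction(N * V, J) - base) / (J * N * tau2 + sigma2_e)
    info_ct = (N * U - Fraction(N * V, J) - Fraction(N * W, I) + base) / sigma2_e
    info_individual = Fraction(N * I * J, 4) / sigma2_e
    return (info_cluster + info_ct) / info_individual


def approx_efficiency(J, N, rho):
    """Continuous approximation: (1/3) * (1 + (1 - rho) / (1 + (J*N - 1) * rho))."""
    return Fraction(1, 3) * (1 + (1 - rho) / (1 + (J * N - 1) * rho))


rho = Fraction(1, 20)
for J in (5, 10, 50):
    print(J, float(exact_efficiency(J, 5, rho)), float(approx_efficiency(J, 5, rho)))
```

For this layout the exact value is the approximation multiplied by (J² − 1)/J², so the rule of thumb slightly overstates the efficiency when the number of periods is small.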

DISCUSSION

In the set of assumptions used initially in the work of Hussey and Hughes,4 the information within the time-cluster stratum increases linearly with $N$. Increasing the sample size overall will increase the power up to a maximum of 100%. The cluster term has $JN\tau^2$ in the denominator, and so increasing $N$ will only increase the information at the cluster level to an asymptote. Mathematically, it can be seen that, in the SW design, $J = O(I)$, $U = O(I^2)$, $V = O(I^3)$, $W = O(I^3)$, and so the information at the cluster stratum is $O(I)$. Moving to the general parametrization, unless $\Delta = 0$, all the information expressions across the strata are $O(I)$. Therefore, with a fixed number of clusters, the power can only increase to an asymptote strictly less than 100% as the overall sample size is increased. This limitation is common to standard cluster-randomized designs. The assumption $\Delta = 0$ may be plausible if the setting is a treatment implemented by training a ward in a hospital; thus, any correlation is due to a common set of staff within a ward, rather than any temporal preservation of correlation. In an educational setting, where the clusters are defined by a teacher using a new teaching method and the time periods are year cohorts of pupils in the teacher's class, $\Delta = 0$ is a more questionable assumption. Pupils within the same year-class will interact and influence each other more than pupils from different years who share the same teacher. The parametrization considered previously adds on a point mass of extra correlation for observations in the same cluster and period. An alternative would be a parametric correlation structure that allows the increment $\Delta(j_1, j_2)$ to be a function of the proximity of the periods within the same cluster, $|j_1 - j_2|$, an autoregressive structure for example. However, that is beyond the scope of this article. If it is assumed that $\tau_{\text{time}}^2 \to \infty$, then we revert to fixed time effects while still assuming $\Delta \ne 0$. 
In this case, the time stratum provides no information on the treatment effect. The random effects are a stronger assumption, which does rely on the parametric assumption of normality, in contrast to the nonparametric assumption of the fixed effects; the extra strength of the assumption does provide extra information on the treatment effects if it holds. In either case, the smaller the value of $\tau_{\text{cluster}}^2$ or $\tau_{\text{time}}^2$, the more similar the observations between clusters or periods, compared to within, and the stronger the information provided at the respective strata. In recent research by Matthews and Forbes,8 the focus is on the coefficients of the linear combination of $Y$ used in estimation of the treatment effect. They demonstrate that the weighted sum of the individual stratum estimates that form the maximum likelihood estimate can also be represented as a weighted sum from two simpler models. The first component, termed the vertical model, is a fixed effect model that ignores the cluster effect, only adjusting for time; in which case, there is an equivalent Hasse diagram, and the resulting orthogonal matrix is $\mathbf{B}_{\text{cluster:time}} - \mathbf{B}_{\text{time}}$. The second component comes from a two-way fixed effect model adjusting for time and cluster; the resulting orthogonal matrix coincides with $\mathbf{C}_{\text{cluster:time}}$ derived in Section 3. It is straightforward to show that any linear combination of these two matrices can be expressed as a linear combination of the $\mathbf{C}_{\text{cluster:time}}$ and $\mathbf{C}_{\text{cluster}}$ as used in this paper. The two routes to deriving and interpreting the components of information that constitute the maximum likelihood estimate are thus complementary.
