Literature DB >> 26748662

Statistical efficiency and optimal design for stepped cluster studies under linear mixed effects models.

Abstract

In stepped cluster designs the intervention is introduced into some (or all) clusters at different times and persists until the end of the study. Instances include traditional parallel cluster designs and the more recent stepped-wedge designs. We consider the precision offered by such designs under mixed-effects models with fixed time and random subject and cluster effects (including interactions with time), and explore the optimal choice of uptake times. The results apply both to cross-sectional studies where new subjects are observed at each time-point, and longitudinal studies with repeat observations on the same subjects. The efficiency of the design is expressed in terms of a 'cluster-mean correlation' which carries information about the dependency-structure of the data, and two design coefficients which reflect the pattern of uptake-times. In cross-sectional studies the cluster-mean correlation combines information about the cluster-size and the intra-cluster correlation coefficient. A formula is given for the 'design effect' in both cross-sectional and longitudinal studies. An algorithm for optimising the choice of uptake times is described and specific results obtained for the best balanced stepped designs. In large studies we show that the best design is a hybrid mixture of parallel and stepped-wedge components, with the proportion of stepped wedge clusters equal to the cluster-mean correlation. The impact of prior uncertainty in the cluster-mean correlation is considered by simulation. Some specific hybrid designs are proposed for consideration when the cluster-mean correlation cannot be reliably estimated, using a minimax principle to ensure acceptable performance across the whole range of unknown values.

Entities: Chemical

Keywords: cluster studies; intra-cluster correlation; optimal design; stepped-wedge designs

Mesh：

Year: 2016 PMID： 26748662 PMCID： PMC4949721 DOI： 10.1002/sim.6850

Source DB: PubMed Journal: Stat Med ISSN： 0277-6715 Impact factor: 2.373

Introduction

In a ‘stepped’ cluster design, an intervention is introduced into some or all of the clusters at (possibly) different uptake‐times during the study. Once introduced into a cluster, the intervention persists until the end of the study. We assume that outcome‐data is collected in all clusters throughout the duration of study and that interest centres on the contrast between the treated (post‐intervention) and control (pre‐intervention) conditions. This class of study design includes some traditional parallel cluster designs 1 with ongoing patient recruitment (where one group of clusters receives the intervention at the start, and the remainder at the end, or not at all) as well as the recent ‘stepped‐wedge’ designs in which the intervention is introduced into all clusters but at staggered (often regularly spaced) time‐points 2, 3, 4, 5. Some other examples feature in Figure 1. Stepped‐wedge designs offer a potential advantage over parallel designs in that each cluster functions as its own control. This can be useful when a high proportion of the variation occurs at the cluster level, but it must be set against the temporal confounding that may arise because the number of clusters exposed to the intervention increases over the course of the study. The tension between cluster‐level effects and temporal confounding is a critical factor for the performance of stepped designs 6.

Figure 1

Schematic for some designs with eight clusters. The horizontal direction represents time and the total duration of the study (which is the same for each design) is equally divided between the columns. The rows represent clusters. So each cell represents the observation‐times in a single cluster over one fixed time period. Treated cells are denoted by 1, and Controls by 0. We suppose that a study for a cluster‐level intervention is planned to take place in K clusters and that each cluster contributes m outcome measurements (observations) at each of T regularly‐spaced time‐points. The discussion here applies to two main types of cluster study: (i) longitudinal (cohort) studies where Km subjects are recruited at the beginning of the study, with repeat observations at each time‐point; and (ii) cross‐sectional studies where KTm different subjects are recruited during the course of the study, and each contributes a single observation on the main outcome measure. We exclude ‘open’ cohort designs – i.e. longitudinal studies with significant drop‐out and/or ongoing recruitment. Instances of these two types are provided by two paradigms: (i) studies of a health intervention in a closed population (such as a care‐home) where the same subjects are monitored over a period of time 7, 8; and (ii) service‐delivery studies in hospital in‐patient or emergency departments where a single outcome measure is obtained from each member of a continuously changing patient population 9, 10. Further instances of each type are enumerated in a recent review 4. In either case we assume that the frequency and timing of data‐collection are already settled, and focus on the choice of intervention uptake‐times: specifically the impact of this choice on the statistical performance of the intervention effect estimate. The observations are described by a linear mixed effects model with fixed time and random cluster effects due to Hussey and Hughes 11, with the addition of random cluster‐specific time components and subject‐level components for longitudinal studies 12. Even without these additions, the Hussey and Hughes model has been found useful for the design of stepped studies 13, 14. It is directly applicable when the outcome is continuous, and approximately for binary outcomes when m is large 11. It was used also in a recent paper that addresses the optimal design of stepped‐wedge studies 15, although under conditions that exclude most of the optimal designs in the present study. Some designs using eight clusters are illustrated in Figure 1. In each particular design the number of observation‐times in each cell is constant, with treatment status coded as 1 if exposed and 0 if unexposed to the intervention. The total number of observations in each cluster (mT = M) is the same for all designs. This means that the cell‐sizes may differ between designs. Thus a cell in Figure 1b – the parallel design – contains twice as many observations as a cell in Figure 1a. In most of these examples the number of treated clusters grows over the course of the study. An exception is the cross‐over design in Figure 1(a). This design is often not practical because the intervention is withdrawn in some clusters halfway through the study. However it is useful as a reference design against which the relative efficiency of other designs can be calibrated. The ‘Delay‐Control Design’ – a terminology suggested by an anonymous referee – in Figure 1c is an elaboration of the parallel design in which all clusters contribute baseline control observations and all receive the intervention towards the end of the study. A stepped‐wedge design arises if all clusters receive the intervention with uptake times evenly spaced throughout the study. The stepped‐wedge in Figure 1d has eight clusters but only g = 4 uptake times. Alternatives with g = 2 or 8 are also available. The final design (Figure 1e) is motivated by an optimality result described later. It represents a 50:50 hybrid combination of a parallel design in the top and bottom two clusters with a stepped‐wedge‐type layout in the middle four clusters, although with reduced time before the first and after the last uptake point. The paper contains two main sections. In section 2 a generic expression for the precision of the treatment effect estimate is obtained. This leads to a simple graphical method for comparing the efficiency of designs. Section 3 is concerned with the optimal choice of uptake times when the numbers of clusters and observation time‐points are both fixed, and includes some suggestions for practical design choices.

The precision of the effect estimate under a linear mixed effects model

The model

Each cluster i (i = 1,…,K) generates m measurements at every time‐point j (j = 1,…,T) according to the following model: Here has variance and is the outcome for the lth observation (l = 1,…,m) at time j in cluster i; c 1,…,c are mutually independent random cluster effects (variance ); t 1,…,t are fixed time‐effects; s and (st) are within‐cluster random effects with variances and respectively; (ct) represents cluster‐level fluctuations of the time‐effect with variance so that η = 1 − η − η − η ; θ is the intervention effect whose estimation is the purpose of the analysis; and J is a binary variable (as in Figure 1) which indicates whether or not the intervention is present at time j in cluster i. The random components in this model are as in Teerenstra et al. 12. Note that we use M = Tm to denote the total number of observations in each cluster. The number of time parameters is dictated by the number of observation times (T), and this may be greater than the number of columns in the representation used in Figure 1. However, the main precision formula below applies also for models with a single time‐parameter for each column. This formulation covers two important cases: Cross‐sectional study with no within‐cluster time effects: η = 0, η = 0. At each time‐point m (new) subjects are observed within each cluster. The (st) component combines variation between subjects (within clusters) and within‐subject measurement error. Here η is the conventional intra‐cluster correlation coefficient (ICC). Longitudinal/Cohort study: η > 0. Within each cluster, the same group of m subjects is measured at every time‐point. The s‐component describes variation between subjects (within clusters); the (st)‐component describes variation within‐subjects (including measurement error); the (ct)‐component describes cluster‐level time effects. In this work we consider the performance of the best linear unbiased estimate of θ in model (1). This can depend only on the means of the observations at each time‐point in each cluster, i.e. on . The Ys obey a model (2) of the form proposed by Hussey and Hughes 11 and which is the basis of our development. This is: where , with variance , and , with variance , are independent random components. The variance of Y can be partitioned as with . The quantity ρ is the correlation between Y‐values obtained in the same cluster at different times, and corresponds to the ICC for the Ys. Unlike ICCs for the individual observations , ρ can easily be large (i.e. close to 1), particularly if η = 0.

The cluster‐mean correlation

The efficiency of a stepped cluster design turns on the quantity R, defined by , from which . The value of R has a simple interpretation in terms of the Y as the proportion of the variance of a cluster‐mean that comes from random effects that are independent of time (i.e. from the c and s). In many circumstances this definition is aligned with that for the correlation between the means of two replicate sets of observations – say – from the same cluster i, although the nature of the replication must be carefully considered. For a cross‐sectional study with η = 0, η = 0 the replicate set simply entails the recruitment of a new set of subjects in the same cluster. Moreover, in this case, , furnishing a direct connection with η, the ICC for the individual observations. A similar interpretation in longitudinal studies is possible if the ‘replicate set’ is taken to require new observations at the same times on the same subjects rather than a set of measurements on new subjects. In all cases we refer to R as the cluster‐mean correlation (CMC). In general 0 ≤ R ≤ 1 and large values of R (i.e. close to 1) are possible even in cross‐sectional studies with a small ICC (η) for the individual observations. This feature is illustrated in Figure 2.

Figure 2

The cluster‐mean correlation (R) as a function of cluster‐size (M = Tm) in cross‐sectional studies with individual‐observation ICC (η) = 0.001, 0.01, 0.05, 0.10, 0.20.

A precision formula

The performance of the design can be identified with the precision of the best linear unbiased estimate (BLUE) of the treatment effect θ in model (1), with known variances. The BLUE under model (1) is also the BLUE under the Hussey and Hughes model (2), so that the following result can be applied: (Precision of the effect estimate) Under the model in (2) the BLUE of θ has precision given by: where is the CMC and a and b are constants, specific to the particular design D, that depend on the distribution of the treatment indicator J over the units in the study. Specifically we have and making use of the ‘dot‐bar’ notation, so that means and means . In the language of the analysis of variance, a is the ‘within‐columns’ variance of the binary variable J and b is the ‘between‐rows’ variance of J, under a probability model in which equal weight is assigned to each point of the (i,j) lattice. The result (3) is established in Appendix A under the conditions of model (2), and provides a straightforward method to calculate the precision for any configuration of Js. The argument in appendix also shows that the precision expression (3) holds for a model with linear constraints on the t parameters, provided that the resulting model is rich enough to represent an initial level, together with a separate time‐effect for each uptake point. If the design is represented as in Figure 1, the minimum requirement is a separate time parameter for each column. (Extension to fixed cluster effects). If the cluster effects (γ) are treated as fixed, the precision of the estimate of θ is recovered by setting R(T,ρ) = 1 in expression (3). The result (3) is formally equivalent to an expression given by Hussey and Hughes 11. However, these authors do not expose the important relationship with the CMC.

Precision for some particular designs

The design coefficients a and b in (3) are presented for some particular designs in Table 1 below. The calculations are outlined in Appendix B. In some cases these expressions apply strictly only if certain divisibility conditions on T and K are met. For example, the results for the g‐step stepped‐wedge design require that T be a multiple of (g + 1), and K a multiple of g. Where a required condition does not hold, the results may be regarded as approximations whose validity improves in larger studies.

Table 1

Design (D)	Example	4a_D	4b_D	ARE (T → ∞) = 4a_D − 4b_D
Cross‐Over (CXO)	Figure 1a	1	0	1
Parallel Design (PD)	Figure 1b	1	1	0
Delay‐Control Design (DCD_p,q,r)	Figure 1c	q	q²	q(1 − q)
Stepped‐Wedge (SW_g)	Figure 1d	231−1g	131−2g+1	131−2gg+1
Modified Stepped‐Wedge (MSW_g)	Figure 1e (middle 4 clusters)	231−1g2	131−1g2	131−1g2
Hybrid (^βH_g)	Figure 1e	1−β231+2g2	1−β32+1g2	β32+1g2−β1+2g2

Design coefficients for some selected designs. These determine the precision using equation (3) and the design effect using equation (5). The final column represents the asymptotic relative efficiency (ARE) of the design (compared to CXO) when the number of observations is large. Cluster Cross‐Over design (CXO): For this design it is assumed that both K and T are even. Initially there are K/2 intervention clusters and K/2 control clusters. Halfway through the study each cluster crosses over to the alternative condition. Parallel Design (PD): This is a classic design for a cluster‐randomised trial. An even number of clusters is assumed, equally divided between treated and control conditions. Delay‐Control Design (DCD): A parallel layout runs only for a proportion q of the study duration, preceded by a period (proportion p) in which all clusters remain in the control condition, and followed by a period (proportion r) in which the treatment is present in all clusters (p + q + r = 1). The PD (= DCD0,1,0) and the ANCOVA design 12 (= DCD0.5,0.5,0) are covered as special cases. The design coefficients (a depend only on the proportion of time (q) occupied by the parallel layout. For both the PD and the DCD it is possible to vary the proportion of control clusters within the parallel layout from the 50% assumed here. The effect would be to reduce both design coefficients – and hence the precision – by the factor 4s(1 − s) ≤ 1 where s is the (new) proportion of control clusters. Stepped‐Wedge design with g steps (SWg): This is the ‘standard’ stepped‐wedge design. The clusters are divided into g equal groups. Initially all clusters are untreated. The uptake time is the same for all clusters in any one group. In the first group this occurs when a fraction of the total study time equal to 1/(g + 1) has elapsed and T/(g + 1) observation time‐points have passed. The uptake times in the remaining groups occur at time‐fractions 2/(g + 1), 3/(g + 1),…, etc. until all clusters are exposed. During the final part of the study (from time‐fraction g/(g + 1) to the end) all clusters are exposed to the intervention. Modified SW design (MSW): In this modification of the SW design the time period before the first (and after the last) uptake point is equal to one half of the time between consecutive uptake points – as in the middle four clusters in Figure 1e. Because a − b R > a − b R for all values of R, the MSW design generates greater precision than the conventional SW design, although the relative advantage diminishes as g increases. Stepped‐Wedge/Parallel Hybrid designs (βH): These designs achieve the maximum precision over stepped designs with overall balance between treated and control observations, a result established below in section 3. In the βH hybrid, Kβ clusters are assigned to a modified stepped‐wedge layout with g uptake points, and the remaining K(1 − β) clusters to a concurrent parallel layout.

Precision‐ratio plots

Using the CXO as a reference design, the efficiency of any stepped design, D, may be defined as a precision ratio from (3): The performance of the design when T is large is captured by the asymptotic relative efficiency (ARE) as R → 1 (= 4a − 4b) as given in the final column of Table 1. The fact that this is zero for the PD reflects the generally poor performance of parallel cluster trials with large cluster sizes 16. The other designs all have non‐zero AREs so that there is no upper limit to the precision that can be obtained by increasing the number of observations in each cluster. Figure 3 shows the precision‐ratio as a function of the CMC, R, for the designs in Figure 1. When R is small, the PD outperforms the other designs, but is the worst design when R = 1. The SW4 design performs relatively well for large R but is dominated by MSW4. Similarly DCD with q = 2/3 is dominated by 0.5H4. Over the range the 0.5H4 design gives the best precision among these designs. Outside this range PD (for smaller R) or MSW4 (larger R) are best. If the value of R is uncertain the 0.5H4 design might be preferred as a compromise design, at least on statistical grounds, because it performs reasonably well throughout the range.

Figure 3

Precision–ratio plots. The relative precision of some stepped designs compared to the cross‐over design is plotted against the cluster‐mean correlation parameter R. Designs 1a to 1e are those illustrated for eight clusters in Figure 1. The Modified SW design corresponds to two copies of the middle four clusters in Figure 1e. This approach can inform the choice between any two candidate designs: for example, that between a stepped‐wedge and parallel design 14, 16, 17, 18, 19, 20, 21, 22. From (4) the stepped‐wedge is the more efficient only if R > r 0 where the threshold r 0 satisfies This implies that for a standard SWg design, and if an MSW is used. In general, a SW design – of either type – can be better than a PD only if the cluster‐mean correlation R exceeds ½. In a cross‐sectional study, where the ICC of a single observation (i.e. η) is generally small, this translates into a useful rule of thumb: to prefer the SW option only if the ICC is greater than 1/M.

Design effects

A design effect is the ratio of the precision of the treatment effect estimate in an individually randomised trial (RCT) to that in the cluster trial, and is convenient for sample size calculations. For a CXO design under model (1) the components c and s cancel out in the treatment‐effect estimate and the design effect is just mη + η . It follows that the design effect for a general design D is given by For a parallel cross‐sectional (PD) design with M observations per cluster and no cluster‐level time effects (η = η = 0) the formula (5) becomes (1 − η )/(1 − R) and yields the well‐known design effect [1 + (M − 1)η ] 1. Published stepped‐wedge design effects 14 can also be obtained in this way.

Finding the best design

In many traditional designs the intervention is introduced (into some of the clusters) at a single time‐point in the study. This is not true of stepped‐wedge designs for which a number (g) of different uptake times will be required. Normally these are equally spaced in time, yet they can be chosen so as to optimise the performance of the design. To this end, we assume that the study will involve T observation times within each of K clusters with observations Y generated by the model (2). The problem is to choose a configuration of uptake times in the clusters to achieve the best possible precision for the treatment effect estimate. Note that it is possible for an uptake time to occur before the start of the study (in a pure ‘treated’ cluster), or, notionally, after it has finished (in a pure ‘control’ cluster). Mathematically we must find the values of the Js (= 0 or 1) that maximise the expression for Π in (3), subject to the irreversibility constraint that j > j′ ⇒ J ≥ J for all i. First suppose that the clusters have been numbered according to the order in which their uptake times occur (so that i < i′ ⇒ J ≥ J for all j). This ordering has been implicitly assumed in the examples in Figure 1 above, with the cluster index i taken as increasing in the upwards direction and the time index j as increasing from left to right. Also it is convenient to map the design points (i,j) on to a lattice in the x‐y plane by setting The x–y design lattice is contained in a unit square centred on the origin (see Figure 4).

Figure 4

A best balanced design. The boundary line (of slope R) divides the treated (●) from the untreated (○) points, and contains the uptake points in the ‘stepped‐wedge’ clusters. If T/(K − 2P) is an even integer, the design is a Hybrid design as defined above in section 2. With these definitions an alternative expression to (3) for the precision is given by a result derived in Appendix C. Unlike (3) (which holds for any configuration of the Js) the expression (6) is valid in the presence of the irreversibility constraint, and where the clusters follow the suggested ordering.

An algorithm for optimal designs

Suppose that J • • (the total number of treated time‐points over all clusters) is fixed. Then, in view of (6), the precision is maximised by nominating treated points one by one, beginning with (i, j) = (1, T) and in decreasing order of the quantity (Rx), until exactly J • • are included in the treated set. Any ambiguity in the ordering caused by tied values can be resolved arbitrarily because tied points contribute the same amount to the expression in (6). In geometrical terms this is achieved by moving a straight line of slope R upwards until exactly J • • points lie on or beneath it (as, for example, in Figure 4). In this way designs with the best precision are determined for each of the (TK − 1) possible values of J • •. The overall optimum can then be obtained by maximising the best precision over J • •, using (6). This algorithm is easily implemented in a spreadsheet programme and can assist with the design of practical studies.

Best balanced designs

Practical experience with this algorithm shows: (i) that the optimal design is not always unique; and (ii) that when the number of design points is an even number there may still be no optimal design with overall balance between the treatment arms – i.e. for which . Notwithstanding (ii) it turns out that the design with greatest precision among those that satisfy the balance condition (the best balanced design, or BBD) is often approximately optimal, especially when both T and K are large. The BBD can be determined from the algorithm above by setting . (Best Balanced Design) Suppose that the number of design points, TK, is even. A design in which the treated points consist of those for which y < Rx together with half of the points (if there are any) for which y = Rx is a best balanced design. In such a design, the boundary between the treated and untreated lattice‐points is a straight line through the origin with slope R. An example is shown in Figure 4. The first group of P clusters receive the intervention at the start of the study (i.e. they are ‘Treated’ clusters) where P is the label of the last cluster for which the point (x 1 ,y) lies on, or just below, the boundary line y = Rx. This condition translates into the inequalities: The last P clusters are ‘control clusters’ and do not receive the intervention at all. The K − 2P clusters in the middle participate in a genuine stepped study with equal intervals of time between the uptake points. In some cases the design for these middle clusters corresponds exactly to a MSW( design, though this requires that T = 2(K − 2P)l for some integer l, where l is the number of time‐points in each cluster before the first uptake time (and 2l is the number between two successive uptake times). In the general case, the correspondence to the MSW design is only approximate. This result shows that the best balanced design when T is large approximates to a hybrid design H in which a stepped‐wedge layout is followed in a proportion of the clusters equal to the CMC, R, and a parallel layout in the remaining clusters.

Examples of optimal design

Consider the design of a study in 10 clusters over six observation times. The design lattice contains 60 (=10 × 6) points. The observations within each cluster are either unique measurements on 6m separate subjects, or they could be six repeat measurements on the same m subjects. The value of m need not be separately specified because the statistical design issues depend only on the CMC, R. Best balanced designs for this problem are shown in Figure 5 for the full range of R‐values from 0 to 1. For each design the range of validity includes both end‐points, so that, for instance, design (a) – a parallel layout – and design (b) are both BBDs when R = 0.12. The BBDs are unique except at these isolated end‐points. In a majority of cases the BBD is also an optimal design. Exceptions include design (b) for R = 0.2, and design (h) for R = 1. In each case conversion to optimality is achieved by modifying the treated set either to exclude the circled point or to include the boxed point. By repeated application of the optimality algorithm with R = 0(0.001)1 it was found that the BBD is optimal in 77.5% of cases, and achieved a relative efficiency of at least 98.83% for all values of R, with the worst case at R = 0.6. The mean relative efficiency of the BBDs was 99.92% – a minimal loss of efficiency compared to the true optimum.

Figure 5

BBDs for 10 clusters over six time‐points. The boxed and circled points in (b) and (f) relate to overall optimal designs for particular R‐values as discussed in the text.

BBDs for 10 clusters over six time‐points. The boxed and circled points in (b) and (f) relate to overall optimal designs for particular R‐values as discussed in the text. The case with R = 0.6 is complex: designs (d), (e), and (g) are just three out of 20 possible BBDs in this case. The other seventeen BBDs are obtained by choosing different sets of three treated points from the six design points along the diagonal indicated in Figure 5(g), whose slope (=0.6) exactly matches the value of R. Design (g) has been singled out here because it is an example of an exact hybrid design (0.6H3). However, none of these 20 BBDs is fully optimal when R = 0.6. An optimal design in this case omits all six diagonal points from the treated set. All the examples shown include pure ‘control’ clusters and pure ‘treated’ clusters as part of the optimal design. This is a common feature of optimal stepped designs as defined here, but is specifically excluded in recent work by Lawrie et al. 15.

Admissible designs for large studies

When T and K are large, the precision for the best balanced design approximates to that of the overall optimal design, with an error of small order in the quantity min(T,K). This is shown in Appendix D– by means of an integral approximation to (6). It follows that the efficiency (relative to the CXO) of the overall optimal design for a large study is that of a hybrid design in which g is large (i.e. H∞). From Table 1 this is This quadratic is plotted in Figure 6 and represents the boundary of the feasible region for stepped designs. The admissible designs (i.e. those that are not dominated by any other stepped design) are the hybrid designs, and generate tangents to the boundary curve. The parallel (R = 0) and stepped‐wedge (R = 1) designs are both admissible. The region of the unit square in Figure 6 above the quadratic curve is not attainable by any stepped design.

Figure 6

The relative efficiency of cluster designs for large studies as a function of the cluster‐mean correlation R. Three feasible (and admissible) stepped designs are shown: PD, SW (solid lines) and Minimax – i.e. 0.634H∞ – (broken line). The shaded region is prohibited by the irreversibility constraint and bounded by the quadratic curve . Hybrid designs do not appear to have been widely used in practice. We are aware of only one example 23. However they offer an appealing compromise between the parallel and stepped‐wedge designs given that these can offer optimal precision only at the extreme ends of the range for the cluster‐mean correlation (i.e. when R = 0 and 1, respectively). An appropriate hybrid design can improve the precision of the treatment estimate in all other cases.

Minimax designs

Where reliable information about the CMC is not available it makes sense to choose an admissible design that minimises the maximum possible loss of precision relative to the optimal design. For large studies this is a hybrid design (the ‘minimax hybrid’) where , the proportion of SW clusters, is given by At the minimax solution the relative loss of precision is the same at both ends of the range for R (i.e. at R = 0 and R = 1) because of the convexity of the quadratic boundary curve in Figure 6. So satisfies , from which . The precision of the minimax design is at least (≈86.6%) of the available precision at all values of R. In practice a near‐minimax design can be constructed if about 63.4% of the clusters are allocated to a stepped‐wedge layout with the remainder in a parallel layout. The extent to which this can be achieved depends on the number of clusters in the study. Ideally, 63.4% of K should be close to a whole number of clusters (the SW component), with the residue equal to an even number (for the Parallel component). Some possible near‐minimax designs are shown in Table 2.

Table 2

Design label	No. of parallel clusters	No. of SW clusters	Total no. of clusters	Proportion of SW clusters (β)	No. of SW uptake points (g)	Relative precision at R = 0	Relative precision at R = 1	Worst relative precision (WRP)
1	2	3	5	0.600	3	85.3	82.7	82.7
2	2	4	6	0.667	4	83.3	87.5	83.3
3	4	6	10	0.600	6	87.3	83.7	83.7
4	4	7	11	0.636	7	86.0	86.4	86.0
5	4	8	12	0.667	8	84.7	88.5	84.7
6	6	9	15	0.600	9	87.7	83.9	83.9
7	6	10	16	0.625	5	85.9	85.3	85.3
8	6	10	16	0.625	10	86.7	85.8	85.8
9	6	12	18	0.667	6	84.4	88.3	84.4
Minimax				0.634	∞	86.6	86.6	86.6
PD				0		100.0	0.0	0.0
SW_∞				1	∞	66.7	100.0	66.7

Performance of some near‐minimax hybrid designs alongside minimax (0.634H∞), PD and SW∞ for comparison. The relative precisions are computed with reference to the best possible designs in large studies. When R = 0 this is a parallel design (PD), when R = 1 it is a MSW with a notionally infinite value of g. Any of the designs in Table 2 can be ‘scaled up’ to larger numbers of clusters (rows) and observation periods (columns) without the affecting the precision relative to the CXO design, which can depend only on β and g. The best performance in the table is achieved by design 4, illustrated in Figure 7. Note that the 50:50 Hybrid in Figure 1e achieves a worst‐case relative precision (WRP) of 75%. Even the simplest five‐cluster design in the table (design 1) achieves a WRP of nearly 83%. This design was encountered above (Figure 5(g)) as a BBD for the case K = 10, T = 6.

Figure 7

A near‐Minimax design with 11 (groups of) clusters (row 4 from Table 2). It achieves at least 99.25% of the precision of the large‐study minimax design (0.634H∞) for any value of R.

Design performance when R is uncertain – a simulation study

In practice the likely performance of a chosen design depends on the degree of uncertainty in the prior estimate of the CMC. This was investigated using a class of prior distributions for R of the form: where R 0 is the median CMC value, and prior uncertainty is characterised by the variance parameter τ 2. Values of τ2 were chosen to correspond to three different coefficients of variation of the quantity R/(1 − R): 0.25 (low uncertainty); 0.5 (medium uncertainty); and 1.0 (high uncertainty). In view of the relation the prior is consistent with a similar level of uncertainty for ρ (or η). In particular, this analysis applies directly to the case of a cross‐sectional study with uncertain ICC. Table 3 shows the results of a simulation study for the distribution of the relative precision of some particular design‐choices compared to the BBD at the true value of R. Centiles from the distribution are displayed, alongside the WRP. The designs considered are: (i) BB0 – the best balanced design at the median‐estimate, R 0; the near‐minimax hybrid design 6H3; (iii) Mmx – the true‐minimax design for large studies (= 0.634H∞). Two cases are shown: (i) a ‘small’ study with 10 clusters and six observation times (for which the BBDs are shown in Figure 5); and (ii) the limiting case of large studies (T, K → ∞), for which the BBDs are hybrid designs.

Table 3

		(a) K = 10, T = 6						(b) Limiting case: T, K → ∞
		CV = 0.25		CV = 0.5		CV = 1.0		CV = 0.25			CV = 0.5			CV = 1.0
		BB₀	^.6H₃	BB₀	^.6H₃	BB₀	^.6H₃	BB₀	^.6H₃	Mmx	BB₀	^.6H₃	Mmx	BB₀	^.6H₃	Mmx
R ₀ = 0.1	cent50	1.000	0.884	1.000	0.884	1.000	0.884	1.000	0.881	0.895	1.000	0.881	0.895	0.999	0.881	0.895
	cent25	1.000	0.879	0.998	0.876	0.994	0.871	1.000	0.877	0.891	0.999	0.874	0.888	0.998	0.870	0.883
	cent10	0.998	0.876	0.993	0.870	0.978	0.864	1.000	0.874	0.888	0.998	0.869	0.883	0.991	0.864	0.877
	cent5	0.997	0.874	0.990	0.868	0.962	0.861	0.999	0.872	0.886	0.996	0.867	0.880	0.981	0.861	0.874
	cent1	0.994	0.871	0.978	0.864	0.910	0.858	0.998	0.870	0.883	0.991	0.863	0.877	0.944	0.858	0.871
	WRP	0.000	0.853	0.000	0.853	0.000	0.853	0.190	0.827	0.866	0.190	0.827	0.866	0.190	0.827	0.866
R ₀ = 0.3	cent50	1.000	0.936	1.000	0.936	0.997	0.936	0.999	0.931	0.949	0.998	0.931	0.949	0.994	0.931	0.949
	cent25	1.000	0.927	0.996	0.921	0.985	0.912	0.998	0.924	0.941	0.994	0.917	0.933	0.985	0.907	0.922
	cent10	0.998	0.921	0.985	0.910	0.959	0.893	0.997	0.917	0.933	0.988	0.905	0.920	0.959	0.889	0.903
	cent5	0.995	0.917	0.977	0.903	0.928	0.884	0.995	0.913	0.929	0.981	0.899	0.914	0.928	0.881	0.895
	cent1	0.988	0.911	0.959	0.893	0.837	0.871	0.991	0.906	0.922	0.960	0.888	0.902	0.853	0.870	0.883
	WRP	0.472	0.853	0.472	0.853	0.472	0.853	0.510	0.827	0.866	0.510	0.827	0.866	0.510	0.827	0.866
R ₀ = 0.5	cent50	1.000	0.977	1.000	0.977	0.997	0.971	0.999	0.968	0.990	0.997	0.967	0.989	0.990	0.961	0.984
	cent25	1.000	0.969	0.997	0.963	0.975	0.952	0.997	0.962	0.983	0.990	0.956	0.977	0.973	0.941	0.962
	cent10	1.000	0.963	0.979	0.951	0.932	0.925	0.994	0.956	0.977	0.980	0.944	0.962	0.949	0.919	0.938
	cent5	0.998	0.959	0.963	0.941	0.902	0.913	0.992	0.953	0.973	0.973	0.936	0.954	0.928	0.907	0.925
	cent1	0.985	0.953	0.932	0.924	0.846	0.893	0.987	0.945	0.964	0.954	0.920	0.937	0.881	0.888	0.903
	WRP	0.694	0.853	0.694	0.853	0.694	0.853	0.750	0.827	0.866	0.750	0.827	0.866	0.750	0.827	0.866
R ₀ = 0.7	cent50	1.000	0.985	1.000	0.983	1.000	0.975	0.999	0.969	0.997	0.997	0.968	0.995	0.991	0.961	0.988
	cent25	1.000	0.980	1.000	0.973	0.980	0.960	0.998	0.963	0.993	0.991	0.957	0.986	0.976	0.940	0.969
	cent10	1.000	0.974	0.985	0.964	0.951	0.935	0.995	0.957	0.987	0.983	0.942	0.974	0.957	0.914	0.946
	cent5	1.000	0.971	0.972	0.959	0.934	0.921	0.993	0.953	0.984	0.977	0.933	0.965	0.944	0.899	0.932
	cent1	0.989	0.966	0.949	0.941	0.895	0.901	0.989	0.945	0.977	0.963	0.916	0.950	0.915	0.875	0.911
	WRP	0.813	0.853	0.813	0.853	0.813	0.853	0.837	0.827	0.866	0.837	0.827	0.866	0.837	0.827	0.866
R ₀ = 0.9	cent50	1.000	0.926	1.000	0.926	1.000	0.926	1.000	0.901	0.936	0.999	0.901	0.936	0.998	0.901	0.936
	cent25	1.000	0.918	1.000	0.911	1.000	0.901	0.999	0.892	0.928	0.998	0.885	0.921	0.995	0.875	0.912
	cent10	1.000	0.911	0.998	0.900	0.979	0.887	0.999	0.885	0.921	0.996	0.873	0.910	0.985	0.857	0.895
	cent5	1.000	0.907	0.991	0.894	0.966	0.880	0.998	0.881	0.918	0.993	0.867	0.904	0.972	0.850	0.888
	cent1	0.999	0.902	0.978	0.887	0.933	0.873	0.997	0.875	0.912	0.984	0.857	0.895	0.933	0.841	0.880
	WRP	0.720	0.853	0.720	0.853	0.720	0.853	0.730	0.827	0.866	0.730	0.827	0.866	0.730	0.827	0.866

Performance of best balanced and near‐minimax designs under prior uncertainty for the CMC, R, (a) for 10 clusters with six observation times and (b) in the limiting case of large studies. The prior median is denoted by R 0, and CV is the coefficient of variation of the quantity R/(1 − R). the designs considered are: BB0 – the BBD at R 0; .6H3 – a near‐minimax hybrid design (design 1 in Table 2); Mmx (= .634H∞) – the minimax design in large studies. Entries are the centiles (from 9999 simulations) of the relative precision of the stated design compared to the BBD at the true value of R. WRP is the relative precision at the least favourable value of R (=0 or 1 in all cases). The BB0 designs show excellent relative precision for medium or low uncertainty about R: the median (cent50) relative precision is at least 99% and the 5th centile as least 95% in all cases. Under higher levels of uncertainty there is some deterioration in performance, even in large studies. Even so the 5th centile still exceeds 90% in all tabulated cases. In general, performance is somewhat better for values of R 0 towards the ends rather than the middle of the range. Nevertheless the low values of the WRP for many of the BB0 designs (it can be as low as 0 when R 0 is small) highlight the need for caution if the prior information is unreliable. In this case a more robust choice is the near‐minimax design .6H3. In the small study its WRP of 85.3% guarantees acceptable performance even when the CMC is very low. In large studies, it is inferior to the true minimax design (0.634H∞) by less than 4 percentage points (i.e. 0.04) at every centile of the distribution of relative precision. Improved near‐minimax designs are available in Table 2.

Comments

The mixed effects model (1) is a vehicle to express a certain correlation‐structure with additive fixed effects for time and treatment, including subject‐level effects and variation within clusters over time. This structure could be further enriched – for example by using time series models for the development of subject and cluster effects over time, or by using cluster by treatment interaction terms to model variation in treatment effects across clusters. In its current form the model expresses the tension between cluster‐effects and time‐effects for the design of cluster studies in a form susceptible to exact analysis. At the design stage, the correlations are assumed known and, following the strategy of Hussey and Hughes 11, we use the model to compute the precision of unbiased linear estimates of a fixed treatment effect. This approach is moment‐based and makes no explicit distributional assumption, although it is clearly justifiable when the data are normally distributed with the prescribed correlation‐structure. In some other cases, where central limiting arguments can be deployed, it approximates to the best approach – for example, for binary outcomes when m is large. The additivity requirement in the model can be relaxed within a generalised linear mixed model, with fixed and random effects on a transformed scale. Then our results may still apply approximately if these effects are relatively small, permitting a linear approximation on the natural scale. Several aspects of this work offer potential applications in the design of both cross‐sectional cluster studies (where each subject provides a single outcome measurement) and of cohort designs (where repeated measures are taken on the same subjects). First, the expression for precision in terms of the CMC leads to a convenient graphical tool for comparing the performance of competing study designs, and provides a simple approach to the computation of design effects which are useful in sample size calculations. Second, the class of hybrid designs emerge as admissible designs, at least in large studies, in the sense that any design not of this form is (weakly) dominated by at least one design that is. Cross‐sectional studies can often be conceptualised as a series of observations taken over a large number of weeks or months, in which case the hybrid results may be directly applicable. Third, a simple search algorithm is proposed for finding optimal designs when the ‘large‐study’ results are inapplicable. This is often the case in cohort studies, where the number of observation times is limited to the (usually small) number of repeat measures on each subject. The precise nature of the optimal design depends on the value of the CMC which, in common with the traditional ICC, may be difficult to predict with certainty. An alternative strategy is to look for a design that reduces the possible loss of precision because of an unknown CMC. For this purpose the near‐minimax hybrids of section 3.5 are offered as practical alternatives to more traditional designs.

19 in total

Review 1. Design and analysis of stepped wedge cluster randomized trials.

Authors: Michael A Hussey; James P Hughes
Journal: Contemp Clin Trials Date: 2006-07-07 Impact factor: 2.226

2. Stepped wedge cluster randomized trials are efficient and provide a method of evaluation without which some interventions would not be evaluated.

Authors: Karla Hemming; Alan Girling; James Martin; Simon J Bond
Journal: J Clin Epidemiol Date: 2013-07-09 Impact factor: 6.437

3. The efficiency of stepped wedge vs. cluster randomized trials: stepped wedge studies do not always require a smaller sample size.

Authors: Karla Hemming; Alan Girling
Journal: J Clin Epidemiol Date: 2013-09-10 Impact factor: 6.437

Review 4. Systematic review of stepped wedge cluster randomized trials shows that design is particularly used to evaluate interventions during routine implementation.

Authors: Noreen D Mdege; Mei-See Man; Celia A Taylor Nee Brown; David J Torgerson
Journal: J Clin Epidemiol Date: 2011-03-16 Impact factor: 6.437

5. Use of the stepped wedge design cannot be recommended: a critical appraisal and comparison with the classic cluster randomized controlled trial design.

Authors: Daniel Kotz; Mark Spigt; Ilja C W Arts; Rik Crutzen; Wolfgang Viechtbauer
Journal: J Clin Epidemiol Date: 2012-09-08 Impact factor: 6.437

6. Response to Keriel-Gascou et al. Addressing assumptions on the stepped wedge randomized trial design.

Authors: Noreen D Mdege; Mona Kanaan
Journal: J Clin Epidemiol Date: 2014-04-18 Impact factor: 6.437

7. Implementing the chronic care model for frail older adults in the Netherlands: study protocol of ACT (frail older adults: care in transition).

Authors: Maaike E Muntinga; Emiel O Hoogendijk; Karen M van Leeuwen; Hein P J van Hout; Jos W R Twisk; Henriette E van der Horst; Giel Nijpels; Aaltje P D Jansen
Journal: BMC Geriatr Date: 2012-04-30 Impact factor: 3.921

Review 8. The stepped wedge trial design: a systematic review.

Authors: Celia A Brown; Richard J Lilford
Journal: BMC Med Res Methodol Date: 2006-11-08 Impact factor: 4.615

Review 9. Sample size calculation for a stepped wedge trial.

Authors: Gianluca Baio; Andrew Copas; Gareth Ambler; James Hargreaves; Emma Beard; Rumana Z Omar
Journal: Trials Date: 2015-08-17 Impact factor: 2.279

10. Statistical efficiency and optimal design for stepped cluster studies under linear mixed effects models.

Authors: Alan J Girling; Karla Hemming
Journal: Stat Med Date: 2016-01-07 Impact factor: 2.373

39 in total

1. Novel methods for the analysis of stepped wedge cluster randomized trials.

Authors: Lee Kennedy-Shaffer; Victor de Gruttola; Marc Lipsitch
Journal: Stat Med Date: 2019-12-26 Impact factor: 2.373

Review 2. Review of Recent Methodological Developments in Group-Randomized Trials: Part 1-Design.

Authors: Elizabeth L Turner; Fan Li; John A Gallis; Melanie Prague; David M Murray
Journal: Am J Public Health Date: 2017-04-20 Impact factor: 9.308

3. Proposed variations of the stepped-wedge design can be used to accommodate multiple interventions.

Authors: Vivian H Lyons; Lingyu Li; James P Hughes; Ali Rowhani-Rahbar
Journal: J Clin Epidemiol Date: 2017-04-13 Impact factor: 6.437

4. Power calculation for analyses of cross-sectional stepped-wedge cluster randomized trials with binary outcomes via generalized estimating equations.

Authors: Linda J Harrison; Rui Wang
Journal: Stat Med Date: 2021-09-23 Impact factor: 2.373

5. Analysis of stepped wedge cluster randomized trials in the presence of a time-varying treatment effect.

Authors: Avi Kenny; Emily C Voldal; Fan Xia; Patrick J Heagerty; James P Hughes
Journal: Stat Med Date: 2022-06-30 Impact factor: 2.497

Review 6. Stepped Wedge Cluster Randomized Trials: A Methodological Overview.

Authors: Fan Li; Rui Wang
Journal: World Neurosurg Date: 2022-05 Impact factor: 2.210

7. Nationwide implementation of online communication skills training to reduce overprescribing of antibiotics: a stepped-wedge cluster randomized trial in general practice.

Authors: Leon D'Hulster; Steven Abrams; Robin Bruyndonckx; Sibyl Anthierens; Niels Adriaenssens; Chris C Butler; Theo Verheij; Herman Goossens; Paul Little; Samuel Coenen
Journal: JAC Antimicrob Resist Date: 2022-06-29

8. Design and analysis considerations for cohort stepped wedge cluster randomized trials with a decay correlation structure.

Authors: Fan Li
Journal: Stat Med Date: 2019-12-04 Impact factor: 2.373

9. Ratio estimators of intervention effects on event rates in cluster randomized trials.

Authors: Xiangmei Ma; Paul Milligan; Kwok Fai Lam; Yin Bun Cheung
Journal: Stat Med Date: 2021-10-15 Impact factor: 2.497

10. Prognosticating Outcomes and Nudging Decisions with Electronic Records in the Intensive Care Unit Trial Protocol.

Authors: Katherine R Courtright; Erich M Dress; Jaspal Singh; Brian A Bayes; Marzana Chowdhury; Dylan S Small; Timothy Hetherington; Lindsay Plickert; Michael E Detsky; Jason N Doctor; Michael O Harhay; Henry L Burke; Michael B Green; Toan Huynh; D Matthew Sullivan; Scott D Halpern
Journal: Ann Am Thorac Soc Date: 2021-02