Estimating relative risks in multicenter studies with a small number of centers - which methods to use? A simulation study.

Claudia Pedroza¹, Van Thi Thanh Truong².

Abstract

BACKGROUND: Analyses of multicenter studies often need to account for center clustering to ensure valid inference. For binary outcomes, it is particularly challenging to properly adjust for center when the number of centers or total sample size is small, or when there are few events per center. Our objective was to evaluate the performance of generalized estimating equation (GEE) log-binomial and Poisson models, generalized linear mixed models (GLMMs) assuming binomial and Poisson distributions, and a Bayesian binomial GLMM to account for center effect in these scenarios.
METHODS: We conducted a simulation study with few centers (≤30) and 50 or fewer subjects per center, using both a randomized controlled trial and an observational study design to estimate relative risk. We compared the GEE and GLMM models with a log-binomial model without adjustment for clustering in terms of bias, root mean square error (RMSE), and coverage. For the Bayesian GLMM, we used informative neutral priors that are skeptical of large treatment effects that are almost never observed in studies of medical interventions.
RESULTS: All frequentist methods exhibited little bias, and the RMSE was very similar across the models. The binomial GLMM had poor convergence rates, ranging from 27% to 85%, but performed well otherwise. The results show that both GEE models need to use small sample corrections for robust SEs to achieve proper coverage of 95% CIs. The Bayesian GLMM had similar convergence rates but resulted in slightly more biased estimates for the smallest sample sizes. However, it had the smallest RMSE and good coverage across all scenarios. These results were very similar for both study designs.
CONCLUSIONS: For the analyses of multicenter studies with a binary outcome and few centers, we recommend adjustment for center with either a GEE log-binomial or Poisson model with appropriate small sample corrections or a Bayesian binomial GLMM with informative priors.

Keywords: Bayesian log binomial; Correlated binary data; Generalized estimating equations; Generalized linear mixed models; Multicenter studies; Relative risk

Substances：
Anti-Infective Agents

Year: 2017 PMID： 29096682 PMCID： PMC5667460 DOI： 10.1186/s13063-017-2248-1

Source DB: PubMed Journal: Trials ISSN： 1745-6215 Impact factor: 2.279

Background

In multicenter studies, outcomes from the same center cannot be assumed to be independent, and analyses often need to account for center clustering. Neglecting to account for center may lead to erroneous conclusions, particularly when randomization is stratified by center [1-4]. Yet, authors of a recent review of multicenter studies published in four major medical journals (BMJ, New England Journal of Medicine, JAMA, and The Lancet) found that only 22% of randomized controlled trials (RCTs) with a binary outcome reported accounting for a center effect, a rate similar to past reviews [5, 6]. This result may be due to the fact that it is challenging to properly adjust for center when there are few centers, total sample size is small, or there are few events per center. Clear practical guidelines for the statistical analyses and reporting of multicenter studies are needed to assist investigators and data analysts in conducting appropriate multicenter analyses. The best-suited methods to adjust for center include generalized estimating equations (GEEs) and generalized linear mixed models (GLMMs; also referred to as random effects, multilevel, or mixed effects models). However, careful application of these methods is needed for studies with few centers. For example, the robust SEs typically reported for GEEs are downward-biased when the number of centers is < 50 [7]. For GLMMs, the approximate Wald test and CIs may have inflated type I error rates [8]. Our objective in this article is to review and evaluate both frequentist and Bayesian state-of-the-art statistical methods for estimating relative risk (RR) for multicenter studies. We focus on RRs rather than ORs because RRs are considered a more meaningful and interpretable treatment measure [9, 10], and few studies have evaluated methods for estimating adjusted RRs. We provide recommendations and practical guidelines for analyzing both RCTs and observational multicenter studies.

Methods

We review three methods for estimating RRs: log-binomial regression, the widely used GEE method, and GLMMs. Specifically, we evaluate a total of ten regression models:

1. Log binomial
2. GEE binomial
3. GEE binomial with small-sample correction of SEs
4. GEE Poisson
5. GEE Poisson with small-sample correction of SEs
6. GLMM binomial
7. GLMM binomial with bootstrapped SEs
8. GLMM Poisson
9. GLMM Poisson with bootstrapped SEs
10. Bayesian GLMM binomial

We assume a study design with ≤ 30 centers, binary outcome, binary treatment/exposure variable, and a binary baseline covariate that could be a stratifying variable or a potential confounder. We also assume the center size variation is not very large (i.e., coefficient of variation < 0.40). Table 1 provides a summary of all the models evaluated in the simulation study and details of their specification.

Table 1

Summary of details of ten regression models evaluated in the simulation study

Model	SEs	95% CI or posterior interval	Other assumptions
GLM-Bin	Model-based, unadjusted for center correlation	Wald
GEE binomial	Robust sandwich	t-based	Exchangeable working correlation
GEE binomial KC-corrected^a	Robust sandwich with small-sample correction	t-based	Exchangeable working correlation
GEE Poisson	Robust sandwich	t-based	Exchangeable working correlation
GEE Poisson KC-corrected^a	Robust sandwich with small-sample correction	t-based	Exchangeable working correlation
GLMM binomial	Model-based	Wald	Adaptive quadrature with 10 points
GLMM binomial bootstrap^a	Parametric bootstrap	Parametric bootstrap, quantile-based	Laplace for fitting bootstrap samples
GLMM Poisson	Model-based	Wald	Adaptive quadrature with 10 points
GLMM Poisson bootstrap^a	Parametric bootstrap	Parametric bootstrap, quantile-based	Laplace for fitting bootstrap samples
Bayesian binomial GLMM	Posterior SD	Quantile-based posterior interval	Priors β₀ ~ Normal(0,10²); β₁, β₂ ~ Normal(0,1); σ ~ half-Normal(0,1)

Abbreviations: GEE Generalized estimating equation, GLM-Bin Log-binomial regression model, GLMM Generalized linear mixed model, KC Kauermann and Carroll

^a The small-sample KC correction or bootstrap samples correct only the SEs and 95% CIs and do not affect the point estimates of the risk ratio

We do not investigate methods treating center as a fixed effect, because problems with this approach (including exclusion of patients or centers, biased treatment effect estimate, and increased type I error) have been noted before [2-4].

Log-binomial regression

Binary regression models (e.g., logistic regression) without any adjustment for center correlation are the most often used methods for analyzing multicenter studies. To estimate RRs instead of ORs, we can use a log-binomial regression model (GLM-Bin) adjusting for covariates but with no adjustment for center correlation. Letting $y_{ij}$ be the observed binary outcome (yes/no) for subject i in center j, this model is specified as

$$\log(p_{ij}) = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} \qquad (1)$$

where $p_{ij}$ is the probability of the outcome $y_{ij}$, $x_{1ij}$ is the treatment/exposure indicator, and $x_{2ij}$ is a binary baseline covariate. A Bernoulli distribution is assumed for $y_{ij}$. The RR of the treatment/exposure is given by exp(β1). Model-based SEs (obtained from the regression model and unadjusted for center clustering) and Wald-type 95% CIs [$\hat{\beta}_1 \pm 1.96 \times \mathrm{SE}(\hat{\beta}_1)$] are usually reported for this model. In addition to ignoring within-center correlation, this model also has the disadvantage of convergence problems at the parameter boundary and can lead to probability estimates > 1.
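As a minimal sketch of the Wald-type interval described above, the following Python snippet converts a log-scale coefficient and its SE into an RR with a 95% CI. The function name `wald_ci_rr` and the input values are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math

def wald_ci_rr(beta_hat, se, z=1.96):
    """Return (RR, lower, upper) from a log-scale coefficient and its SE."""
    rr = math.exp(beta_hat)              # point estimate on the RR scale
    lower = math.exp(beta_hat - z * se)  # CI limits are built on the log scale,
    upper = math.exp(beta_hat + z * se)  # then exponentiated
    return rr, lower, upper

# Hypothetical inputs for illustration only (not taken from the paper)
rr, lower, upper = wald_ci_rr(beta_hat=math.log(1.5), se=0.20)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Because the interval is symmetric on the log scale, it is asymmetric around the RR itself, which is why reported RR intervals extend further above the point estimate than below it.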

GEE models

The GEE log-binomial and log Poisson models take the same form as model (1), assuming either a binomial or Poisson distribution for y. However, the SEs are corrected by using GEEs with an exchangeable correlation matrix, which assumes that patient outcomes from the same center are correlated but are independent from patient outcomes in different centers. GEE Poisson (also referred to as modified Poisson) regression is widely used to estimate RRs because it provides consistent estimates of the RR and is more stable than the GEE binomial model [9, 11]. Similarly to the GLM-Bin, probability estimates from both of these GEE models can be > 1. Robust sandwich SEs for possible misspecification of the covariance structure (and misspecification of the distribution for Poisson regression) are typically used with GEE methods [12]. When the number of centers is small, a bias-corrected variance sandwich estimator is needed to provide correct inference [13]. We use the Kauermann and Carroll (KC) [14] correction of robust SEs because it has been shown to perform well with small numbers of centers [15]. Similarly, the Wald test and CIs typically reported for GEEs have been noted to have inflated type I errors with few centers [15, 16]. An approximate t statistic that accounts for the large variation in the sandwich estimator often present with small samples has been shown to perform better than the Wald test in this setting [17]. For both the binomial and Poisson GEE models, we assess the performance of (1) robust SEs coupled with approximate t-based 95% CIs [$\hat{\beta}_1 \pm t \times \mathrm{SE}_{\mathrm{robust}}(\hat{\beta}_1)$] and (2) KC-corrected robust SEs with t-based 95% CIs for the RR.

GLMMs

Random effects models account for within-center correlation by including a random center intercept. Estimates derived from GLMMs are interpreted as center-specific (or conditional) as opposed to the marginal interpretation of GEE estimates. However, we note that under a log link, the estimated treatment/exposure effect is the same for GEE and GLMMs [18]. Again letting $y_{ij}$ be the observed binary outcome for subject i in center j, a binomial GLMM is specified as

$$\log(p_{ij}) = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + u_j, \qquad u_j \sim \mathrm{Normal}(0, \sigma^2) \qquad (2)$$

where $u_j$ is the random center effect and $x_{1ij}$ and $x_{2ij}$ are the treatment/exposure indicator and baseline binary covariate, respectively. We fit GLMMs using adaptive Gauss-Hermite quadrature with 10 quadrature points to provide a good compromise between accuracy and computation time in the simulation study [19]. We use two methods to compute SEs and 95% CIs. First, we calculate Wald 95% CIs for the RR. Although t statistics with various methods for calculating the degrees of freedom have been proposed as a better alternative to the Wald test, these are still approximations in GLMMs and may not perform well, particularly with few centers [8]. Hence, in the second method, we use a parametric bootstrap to calculate SEs and 95% CIs, which is a better alternative for computing CIs for GLMMs [20]. We also assess the performance of log Poisson GLMM using model-based SEs coupled with Wald 95% CIs and compare them with those obtained from a parametric bootstrap. We again note that probability estimates from these GLMMs may be > 1.

Bayesian binomial GLMM

A Bayesian approach provides several advantages, including the ability to give direct estimates of probability of benefit or harm from an intervention or exposure [21]. For the binomial GLMM, weakly informative prior distributions help stabilize the parameter estimates and hence address the convergence issues often seen with the frequentist approach [10, 22–24]. Constraints on the parameters are also easily implemented to avoid probability estimates > 1 [22]. In contrast to frequentist methods, Bayesian SEs and credible intervals (CrIs) for the RR account for all uncertainty in the model, including the between-center variation. Another advantage is that Bayesian inference does not rely on asymptotic results, which is an important issue when the number of centers is limited. A Bayesian approach also allows for the inclusion of informative priors derived from external information to exclude unrealistic RR values [25, 26]. We investigate the performance of a Bayesian GLMM with the same form as that in model (2). For prior distributions, we use neutral priors for all parameters to represent equipoise: a Normal(0,10²) for the intercept β0, Normal(0,1) for β1 and β2, and half-Normal(0,1) for σ. The Normal(0,1) priors on β1 and β2 are slightly informative, with a 95% CrI of 0.14–7 on the RR scale, and exclude unrealistic RR values that are almost never observed in studies of medical interventions. Similar priors skeptical of large treatment effects have been studied and shown to have good operating characteristics even with small sample sizes [26]. The half-Normal prior for the SD of the random center effect σ is a weakly informative prior that has been shown to perform well [27]. We constrain all p < 1 in the model (see Additional file 1 for sample code).
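The RR range implied by the Normal(0,1) prior can be checked with a short calculation. The sketch below assumes the second argument of Normal(0,1) is an SD of 1 on the log-RR scale (with a value of 1 the SD and variance coincide) and uses the usual 1.96 normal quantile; the function name is illustrative.

```python
import math

PRIOR_SD = 1.0  # SD of the Normal(0, 1) prior on beta1 and beta2 (log-RR scale)
Z_975 = 1.96    # 97.5th percentile of the standard normal distribution

def prior_interval_rr(sd, z=Z_975):
    """Central 95% prior interval for the RR implied by a Normal(0, sd) log-RR prior."""
    return math.exp(-z * sd), math.exp(z * sd)

low, high = prior_interval_rr(PRIOR_SD)
print(f"{low:.2f} to {high:.2f}")  # prints "0.14 to 7.10"
```

The prior is centered on an RR of 1 (no effect) and places about 95% of its mass between a sevenfold reduction and a sevenfold increase in risk, which matches the 0.14–7 range quoted in the text.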

Simulation study

We conducted a simulation study assuming both a multicenter two-arm RCT and a multicenter observational study design. For each scenario, we simulated 1000 datasets from model (2) with 4, 10, or 30 centers. The number of subjects per center was sampled from a Poisson distribution with mean of 10, 20, or 50 to give average (expected) total sample sizes ranging from 40 to 1500. Under the RCT scenarios, randomization was stratified by center using permuted blocks of size 4. The covariate $x_2$ was generated from Bernoulli(0.3), and the random center effect $u_j$ from Normal(0,0.4) to induce an intracluster correlation coefficient (ICC) of 0.08, where $\sigma^2 = \mathrm{ICC} \times (1 - \bar{p})/\bar{p}$ and $\bar{p}$ is the average probability in the sample [28]. The ICC represents the degree of dependence or correlation among observations from individuals within the same cluster or center [27]. The ICC value used in this simulation is within the range of values previously reported in cluster clinical trials [29]. For all RCT scenarios, we assumed a control outcome rate of 15% [i.e., exp(β0) = 0.15]. The treatment and covariate effects were both set to an RR of 1.5 [i.e., β1 = β2 = log(1.5)]. Whenever the simulated p was > 1, a new value of the random center effect u was sampled until p < 1. For the observational study scenarios, we assigned half of the subjects to exposure and the other half to a nonexposure group. To induce confounding, we generated the binary covariate $x_2$ with prevalence of 0.4 in the exposure group and 0.2 in the nonexposed group using a discretized multivariate Normal method [30]. All other methods and parameters were the same as under the RCT scenarios. Each dataset was analyzed using all the methods listed in Table 1. For the binomial and Poisson GLMMs, we used 3000 bootstrap samples for each dataset to calculate the bootstrap SEs and 95% CIs (from the quantiles). To speed up the calculation, we used Laplace approximation when fitting the models to the bootstrap samples.
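The data-generating steps for one RCT dataset can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code (which was written in R and is given in Additional file 1): the value 0.4 is treated here as the SD of the random center effect, although the text does not say whether it is an SD or a variance; the rejection step resamples the center effect whenever the largest possible probability in that center would reach 1; and all function names are hypothetical.

```python
import math
import random

def poisson(rng, mean):
    """Knuth's multiplication method for one Poisson draw (adequate for small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def permuted_block_arms(rng, n, block_size=4):
    """Treatment indicators (0/1) from permuted blocks within one center."""
    arms = []
    while len(arms) < n:
        block = [0, 1] * (block_size // 2)  # equal allocation within each block
        rng.shuffle(block)
        arms.extend(block)
    return arms[:n]                          # the last block may be incomplete

def simulate_rct(rng, n_centers=10, mean_size=20, sigma=0.4,
                 control_rate=0.15, rr=1.5, covariate_prev=0.3):
    """Return rows (center, x1, x2, y) drawn from the log-link model (2)."""
    b0, b1, b2 = math.log(control_rate), math.log(rr), math.log(rr)
    max_linear = b0 + b1 + b2                # largest fixed-effect linear predictor
    rows = []
    for center in range(n_centers):
        # Assumption: sigma is the SD of the center effect; resample until p < 1
        u = rng.gauss(0.0, sigma)
        while math.exp(max_linear + u) >= 1.0:
            u = rng.gauss(0.0, sigma)
        n = poisson(rng, mean_size)          # center size varies around its mean
        for x1 in permuted_block_arms(rng, n):
            x2 = 1 if rng.random() < covariate_prev else 0
            p = math.exp(b0 + b1 * x1 + b2 * x2 + u)
            y = 1 if rng.random() < p else 0
            rows.append((center, x1, x2, y))
    return rows

data = simulate_rct(random.Random(2017))
print(len(data), "subjects simulated")
```

Stratifying the permuted blocks by center keeps the two arms balanced within each center to within two subjects, which is one reason the analysis needs to account for center.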
The Bayesian GLMM was fitted via Markov chain Monte Carlo (MCMC) methods [31]. We used 3 MCMC chains, each with 10,000 iterations using the first 2000 as burn-in. Starting values were sampled from the estimated coefficients and SEs of the frequentist log Poisson model. We visually inspected the trace plots of all estimated parameters for the first 50 datasets of each scenario to monitor convergence of the chains. We additionally calculated the convergence diagnostic $\hat{R}$ and deemed any dataset with an $\hat{R}$ > 1.1 for any parameter as exhibiting nonconvergence (see below) [32]. We captured the posterior median of all four parameters and the 2.5% and 97.5% percentiles of β1 to calculate coverage of the 95% posterior interval. As a sensitivity analysis, for scenarios with ten subjects per center, we also fitted the Bayesian GLMM using vague Normal(0,10⁴) priors for β0, β1, and β2 and half-Cauchy(0,1) [33] for σ. For all models, we calculated the bias (βestimate − βtrue), root mean square error (RMSE), coverage of the 95% CI or posterior interval for the treatment/exposure effect β1, and convergence rate. We defined convergence as the percentage of simulated datasets where (1) the model converged (i.e., no error messages); (2) the absolute values of the point estimates for β0, β1, and β2 were < 5 (larger values would indicate unstable estimates); and (3) for the Bayesian models, the $\hat{R}$ values for all parameters were < 1.10. For each model and scenario, we assessed bias, RMSE, and coverage only in datasets where convergence was achieved. All simulations and analyses were conducted in R [34]. For the fitting of GEE models, we used the geepack [35] and geesmv [36] packages to calculate the degrees of freedom for the t statistic and the KC-corrected robust SEs. For GLMMs, we used the lme4 package [37] for the frequentist models and Stan [38] through the R interface rstan [39] to fit the Bayesian models. We provide sample code in Additional file 1.
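The three performance measures can be computed from the per-dataset estimates and interval limits as in the following sketch. The function name and the three example fits are hypothetical; in the study each scenario would contribute up to 1000 converged fits.

```python
import math

def performance(estimates, lowers, uppers, true_value):
    """Bias, RMSE, and interval coverage across converged simulated datasets."""
    n = len(estimates)
    errors = [est - true_value for est in estimates]
    bias = sum(errors) / n                                # mean of (estimate - truth)
    rmse = math.sqrt(sum(e * e for e in errors) / n)      # root mean square error
    covered = sum(lo <= true_value <= hi for lo, hi in zip(lowers, uppers))
    return {"bias": bias, "rmse": rmse, "coverage": covered / n}

truth = math.log(1.5)                    # true beta1 used in all scenarios
# Hypothetical results from three converged fits, for illustration only
est = [truth - 0.1, truth, truth + 0.4]
lo = [e - 0.3 for e in est]
hi = [e + 0.3 for e in est]
summary = performance(est, lo, hi, truth)
print(summary)
```

Note that restricting these summaries to converged datasets, as described above, means that models with low convergence rates are evaluated on a selected subset of the simulated data.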

Results

Convergence

For the Bayesian models, trace plots of the parameters showed the three chains mixing well after burn-in, except for a small percentage of the datasets, where one of the MCMC chains of σ failed to converge near 0 for a portion of the chain. (Other parameters also did not converge; an example of an RCT dataset is shown in Additional file 1: Figure S1.) These convergence issues were also detected by the $\hat{R}$ diagnostic (> 1.1), and these datasets were excluded from the results. Convergence rates for Bayesian models ranged from 92% for the RCT scenario with 4 centers and 10 subjects per center to 100% for some scenarios with 10 or 30 centers. Convergence rates for all scenarios are shown in Additional file 1: Table S1. All frequentist models exhibited convergence problems for the smallest sample size for both designs, with convergence rates ranging from 45% for the binomial GLMM to 86% for the GEE Poisson model. For all other scenarios, convergence was not an issue, except for the binomial GLMM, which had poor convergence rates for all scenarios. Its lowest convergence rate was 27% for the scenario with 30 centers with 10 subjects/center (Additional file 1: Table S1).
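A minimal sketch of a convergence diagnostic with a 1.1 cutoff is given below. It assumes the diagnostic is the potential scale reduction factor (commonly written R-hat) in its classic between-chain/within-chain form; Stan reports a split-chain variant, so values from rstan will not match this function exactly. The toy chains are illustrative only.

```python
import math
from statistics import mean, variance

def r_hat(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])                                  # draws per chain
    within = mean([variance(c) for c in chains])        # W: mean within-chain variance
    between = n * variance([mean(c) for c in chains])   # B: scaled variance of chain means
    pooled = (n - 1) / n * within + between / n         # pooled variance estimate
    return math.sqrt(pooled / within)

# Toy chains: the first pair agree, the second pair sit in different regions
agree = r_hat([[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]])
disagree = r_hat([[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]])
print(agree < 1.1, disagree > 1.1)
```

When one chain stays in a different region from the others, as described for σ above, the between-chain term dominates and the statistic rises well above 1.1.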

Bias and RMSE

The bias was generally small for all frequentist models. It was larger for the smallest sample sizes and diminished as the number of centers and total sample size increased. The Bayesian estimates were more biased in the smallest sample sizes. The negative bias indicates that the posterior medians of the treatment effect are shrunk toward 0 because of the influence of the informative priors. The effect of the prior on the posterior estimates and the resulting bias from the Bayesian GLMM diminishes as the sample size increases and is smaller than the bias from frequentist models for some scenarios (Fig. 1). The Bayesian models have the smallest RMSE for scenarios with four or ten centers. All models give very similar RMSEs with 30 centers (Fig. 2). The bias and RMSE were very similar for both study designs.

Fig. 1

Bias of the estimates of β1 (calculated as βestimate − βtrue) for different scenarios under a multicenter randomized controlled trial study design (a–c) and an observational study design (d–f) based on 1000 simulations for each scenario. All scenarios used a β1 of log(1.5), a control outcome rate of 15%, and an intracluster correlation coefficient of 0.08. GEE Generalized estimating equation, GLM-Bin Log-binomial regression model, GLMM Generalized linear mixed model

Fig. 2

Root mean square error of β1 for scenarios under randomized controlled trial study designs (a–c) and observational study designs (d–f) based on 1000 simulations for each scenario. All scenarios used a β1 of log(1.5), a control outcome rate of 15%, and an intracluster correlation coefficient of 0.08. GEE Generalized estimating equation, GLM-Bin Log-binomial regression model, GLMM Generalized linear mixed model

Coverage

Figure 3 shows coverage of the 95% CIs and posterior interval for β1. The unadjusted GLM-Bin exhibits coverage above the nominal range (93.6–96.4% using 1000 datasets) with 4 or 10 centers and 10 or 20 subjects/center, but it has good coverage for all other scenarios. GEE CIs without the KC correction also have poor coverage with 4 centers (85–87%) and 10 centers with 50 subjects (~92%). KC-corrected GEE CIs have good coverage across the majority of scenarios, with the GEE Poisson having coverage closer to nominal than the GEE binomial. The binomial GLMM has coverage above the nominal value for scenarios with the three smallest total sample sizes; the coverage of the bootstrap CIs is more conservative than the Wald CIs for some scenarios. However, the GLMM Poisson model results in coverage well above the nominal value in all scenarios with both Wald-type and bootstrap CIs. The Bayesian model has conservative coverage for scenarios with total sample size ≤ 200; for all other scenarios, coverage is close to the nominal value. The study design had little effect on coverage.
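The nominal range of 93.6–96.4% quoted above is consistent with the Monte Carlo error of a coverage estimate based on 1000 datasets. The text does not state how the range was derived; the sketch below assumes a normal approximation to the binomial proportion, which reproduces the same limits.

```python
import math

def monte_carlo_range(nominal=0.95, n_sims=1000, z=1.96):
    """Range within which estimated coverage is compatible with the nominal level."""
    se = math.sqrt(nominal * (1 - nominal) / n_sims)  # binomial SE of a proportion
    return nominal - z * se, nominal + z * se

low, high = monte_carlo_range()
print(f"{100 * low:.1f}% to {100 * high:.1f}%")  # prints "93.6% to 96.4%"
```

Because bias, RMSE, and coverage were assessed only in converged datasets, the effective number of simulations is smaller for models with poor convergence, and the corresponding range would be somewhat wider.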

Fig. 3

Coverage of 95% CI and posterior interval of β1 for scenarios under randomized controlled trial study designs (a–c) and observational study designs (d–f) based on 1000 simulations for each scenario. All scenarios used a β1 of log(1.5), control outcome rate of 15%, and an intracluster correlation coefficient of 0.08. GEE Generalized estimating equation, GLM-Bin Log-binomial regression model, GLMM Generalized linear mixed model, KC Kauermann and Carroll, BS Bootstrap

Sensitivity results

Under both study designs with four centers, Bayesian GLMMs with vague priors had a lower convergence rate, smaller bias, larger RMSE, and less conservative coverage than informative priors. For scenarios with 10 or 30 centers, vague priors resulted in similar convergence rate, bias, RMSE, and coverage compared with informative priors for both designs. In all scenarios, the estimates of the between-center SD were very similar under both vague and informative priors.

Examples

Infection treatment multicenter trial

We analyzed the data presented by Beitler and Landis [40] arising from an eight-center RCT investigating the efficacy of an active drug compared with control for treatment of an infection. The primary outcome was favorable response to the drug. In the eight centers, the rate of success in the active drug group (n = 130) varied from 9% to 80%, whereas the control group (n = 143) had a rate of success ranging from 0% to 86%. We used the same methods as described for the simulation study (excluding Poisson GLMM, which did not perform well in the simulation study). Table 2 shows the estimated RRs derived from the different methods. The binomial GLMM did not converge. The RR estimates differ across the models, with the binomial GEE resulting in the largest RR of 1.43 and the Bayesian GLMM resulting in the smallest RR of 1.27. The 95% CIs from the GEE models without the KC correction do not include 1.0. In comparison, the 95% CIs from the unadjusted GLM and KC-corrected GEE models include 1.0. The Bayesian 95% CrI is the narrowest, despite properly accounting for center variation (estimated as 0.81), and might lead to a conclusion that the active drug is effective.

Table 2

Estimated relative risk and 95% CI or credible interval for multicenter randomized controlled trial example

Model^a	RR	95% CI or CrI
GLM-Bin	1.29	0.95–1.75
GEE binomial	1.43	1.01–2.02
GEE binomial, KC-corrected		0.96–2.14
GEE Poisson	1.42	1.01–2.01
GEE Poisson, KC-corrected		0.95–2.12
GLMM binomial^b	–	–
Bayesian binomial GLMM	1.27	1.00–1.65

Abbreviations: CrI Credible interval, GEE Generalized estimating equation, GLM-Bin Log-binomial regression model, GLMM Generalized linear mixed model, KC Kauermann and Carroll, RR Relative risk

Except for the GLM-Bin model, all models are adjusted for center correlation. The KC bias correction in the GEE models adjusts the robust SE for the small number of centers (estimate of RR does not change)

^a GLMM Poisson models are excluded because of their poor performance in the simulation study

^b Model did not converge

Pediatric appendicitis observational study

In this study, we compared cohorts of children before and after implementation of a clinical practice guideline for treatment of perforated appendicitis in children [41]. The study was conducted in a children’s hospital with 11 surgeons providing care. The primary outcome was the occurrence of any adverse event, such as readmission or surgical site infection. Totals of 191 and 122 patients were included in the pre- and postimplementation cohorts, respectively. We compared the analytical methods including identification of intra-abdominal abscess as a covariate, with surgeon as the clustering variable. Estimated RRs and 95% CIs are shown in Table 3. Compared with the other models, the RR estimates from both GEE models are closer to 1 and their CIs without the KC bias correction are narrower than all other intervals. However, the main conclusion of the intervention being associated with reduced adverse outcomes (although not statistically significant) would not differ on the basis of the analytical method chosen.

Table 3

Results for pediatric appendicitis study

Model	RR	95% CI or CrI
GLM-Bin	0.76	0.51–1.11
GEE binomial	0.79	0.56–1.13
GEE binomial, KC-corrected		0.53–1.19
GEE Poisson	0.80	0.56–1.14
GEE Poisson, KC-corrected		0.53–1.21
GLMM binomial	0.76	0.51–1.11
GLMM binomial bootstrap		0.51–1.10
Bayesian binomial GLMM	0.73	0.49–1.07

Abbreviations: CrI Credible interval, GEE Generalized estimating equation, GLM-Bin Log-binomial regression model, GLMM Generalized linear mixed model, KC Kauermann and Carroll, RR Relative risk

The KC bias correction in the GEE models adjusts the robust SE for the small number of centers (estimate of RR does not change)

Discussion

For multicenter studies, it is important to adjust for possible center correlation when computing treatment effects, particularly when the proportion of center variability over total variability is large or when randomization is stratified by center to ensure correct SEs and CIs [1, 4]. However, no clear guidelines exist for the appropriate analyses, and it can be challenging for data analysts to perform a properly adjusted analysis when the outcome is binary and there are few centers. In this paper, we have reviewed and evaluated methods available for analyses of multicenter studies.

Summary of simulation results

For all but the smallest sample size, convergence rates were ≥ 96% for all models except the binomial GLMM. This model had convergence problems in all scenarios investigated, and its use may be limited. All frequentist estimates of the treatment effect had small and very similar bias. The Bayesian estimates were more biased in the smallest sample sizes. Although the binomial GLM with unadjusted SEs had very little bias, it had conservative coverage for the smallest sample sizes in both the RCT and observational designs. The GEE models without a small sample correction for the sandwich SEs had very poor coverage with four or ten centers, even when coupled with approximate t-based CIs. This poor performance of sandwich SEs has been noted before in estimating ORs [3]. Using the KC bias correction greatly improved the performance of these models across all scenarios. These results are similar to those obtained by Yelland et al. [9] and Zou and Donner [11], although they used different corrections for the variance estimates. When the binomial GLMM achieved convergence, it had good overall performance except for sample sizes < 100, where it had conservative coverage. Across all scenarios, the Poisson GLMM also had coverage above the nominal value even with bootstrapped CIs, which would lead to diminished power. The Bayesian GLMM had good coverage across all scenarios, and the bias exhibited in the smallest sample sizes was only slightly larger than the other models evaluated. Its higher convergence rate for the smallest sample size is due to the use of informative priors that help stabilize the estimates, particularly in cases of complete separation [24, 26]. Although we do not report the estimate of the SD of the random effects (or ICC), the Bayesian GLMM outperformed the frequentist models with estimates that were very close to the true parameter value. In contrast, the frequentist models consistently underestimated this parameter even in the larger sample sizes. 
This downward bias of estimates of the variance components in GLMMs is well known [19, 42]. Here the Bayesian approach has a clear advantage over frequentist methods because it provides less biased variance estimates and automatically produces CIs for these parameters. More importantly, the point estimates and CIs of the treatment/exposure effect appropriately account for the uncertainty in the variance parameter. The sensitivity analysis for the Bayesian GLMM using vague priors produced results very similar to those under the informative priors. However, the range of effect sizes supported by the vague priors is unrealistic, and we strongly recommend against using these priors.

Recommendations

On the basis of our simulation results and other studies [3, 43], we recommend that both RCT and observational multicenter studies adjust for center in the analysis. Although adjusting for center when the ICC is small may not provide a great advantage, it also does not adversely affect the point estimates, SEs, and type I error rates. Furthermore, methods that properly adjust for center clustering are easy to implement in most statistical software. We do not recommend the use of a Poisson GLMM to estimate adjusted RR. We do recommend use of a binomial GLMM, except when the number of centers is < 5 or the total sample size is < 100, although convergence may be a problem. The most robust frequentist methods appear to be either a GEE log-binomial model or a Poisson model with an exchangeable correlation. When the number of centers or clusters is < 50, a sandwich variance estimator needs to include a small sample correction such as the KC correction used here. Kahan et al. [3] reported that model-based SEs may be another option for GEE models, but we did not evaluate them here. A Bayesian GLMM is a robust alternative. This method performed the best in terms of all measures of convergence, bias, RMSE, and coverage. Another advantage of a Bayesian approach is that external information about the treatment/exposure effect can be formally incorporated into the prior distributions. As we did here, the priors can explicitly exclude large effects, which are unlikely in clinical studies. Probabilities of benefit or harm are easily obtained and can be more informative for investigators than the traditional p value. Although our focus was not on the estimation of the between-center variance, the Bayesian model outperformed all other methods in estimating this parameter. Implementation of the Bayesian model can be done in OpenBUGS or in Stan through R as was done here.

Limitations

Our simulation study was limited to one treatment/exposure effect size and one control rate. However, others have noted similar performance when the effect size or control outcome rate was varied [3, 9, 11, 15]. We also did not investigate fixed effects models, because their limitations have been noted before [2-4]. However, these models could be an alternative for studies with three or fewer centers. We also note the computational limitations faced in most simulation studies. In particular, results from the Bayesian models would have benefited from running longer chains. Increasing the number of bootstrap samples for the GLMMs could also potentially improve their performance. Our simulation study did not include scenarios with an ICC of 0. However, others have found that the methods recommended here perform well even in cases where the ICC is very close to 0 [8, 15, 43]. We investigated the performance of only an exchangeable correlation matrix for the GEE models, which is a plausible assumption for multicenter data. However, other correlation structures can be used, and the performance of GEE models has been shown to be robust to the choice of the correlation structure [11]. We chose to focus on binary outcomes because they are the most common type of outcome reported in medical research, but it would be important to investigate methods for other types of outcomes (e.g., time-to-event data). Last, we did not investigate treatment-by-center interactions, an important issue that needs to be investigated in future studies.

Conclusions

For the analysis of multicenter studies with a binary outcome, we recommend adjustment for center with either a GEE log-binomial or Poisson model or a Bayesian binomial GLMM with informative priors. The GEE models should include a small sample variance correction for sandwich estimators when the number of centers is < 30. The Bayesian model with informative priors provides stable estimates, greater flexibility, and good performance even with very small sample sizes.
