Literature DB >> 32728648

An evaluation of quadratic inference functions for estimating intervention effects in cluster randomized trials.

Hengshi Yu¹, Fan Li^2,3, Elizabeth L Turner^4,5.

Abstract

Cluster randomized trials (CRTs) usually randomize groups of individuals to interventions, and outcomes are typically measured at the individual level. Marginal intervention effects are frequently of interest in CRTs due to their population-averaged interpretations. Such effects are estimated using generalized estimating equations (GEE), or a recent alternative called the quadratic inference function (QIF). However, the performance of QIF relative to GEE have not been extensively evaluated in the CRT context, especially when the marginal mean model includes additional covariates. Motivated by the HALI trial, we conduct simulation studies to compare the finite-sample operating characteristics of QIF and GEE. We demonstrate that QIF and GEE are equivalent under some conditions. When the marginal mean model includes individual-level covariates, QIF shows an efficiency improvement over GEE with overall larger power, but its test size may be more liberal than GEE and GEE achieves better coverage than QIF. The test size inflation may not by fully addressed from using finite-sample bias corrections. The estimates of QIF tend to be closer to GEE in the HALI data, although the former presents a small standard error. Overall, we confirm that the QIF approach generally has potentially better efficiency than GEE in our simulation studies but might be more cautiously used as a viable approach for the analysis of CRTs. More research is needed, however, to address the finite-sample bias in the variance estimation of the QIF to better control its test size.

Entities: Chemical Disease Gene Species

Keywords: Generalized estimating equations; Generalized method of moments; Marginal intervention effect estimation; Population-average intervention effect; Quadratic inference function

Year: 2020 PMID： 32728648 PMCID： PMC7381491 DOI： 10.1016/j.conctc.2020.100605

Source DB: PubMed Journal: Contemp Clin Trials Commun ISSN： 2451-8654

Introduction

A cluster randomized trial (CRT), also referred to as a group randomized trial or a community-randomized trial, is a randomized controlled trial where a cluster (e.g. hospital, village) of individuals is the unit of randomization [1]. The outcomes are usually measured for each individual member within a cluster. As a consequence of the cluster randomization design, the outcomes of individuals within the same cluster are expected to be correlated and this correlation, namely the intraclass correlation coefficient (ICC), must be accounted for in both the design and analysis [2,3]. During the analysis stage, the ICC is commonly accounted for using one of two modeling approaches for individual-level outcome data. One approach is the mixed-effects model, which estimates the cluster-specific (conditional) intervention effect. The other approach is the generalized estimating equations (GEE), which estimate the population-averaged (marginal) intervention effect [4]. The conditional and marginal intervention effects are equal to each other with the identity or log link, while they could differ with the logit link for binary outcomes [5]. The choice of marginal or conditional models depends on the research objectives and each model has its pros and cons. In particular, while the mixed-effects model requires the correct specification of the random-effects distribution to obtain consistent model-based variance [6], the marginal model is more robust with the sandwich variance being consistent even under misspecification of the correlation structure. For this reason, the marginal model coupled with GEE has been commonly used in the analysis of CRTs [7]. Despite the robustness of the marginal model, misspecification of the correlation structure could lead to reduced efficiency in estimating the intervention effect. To improve the efficiency of GEE, an alternative approach – the quadratic inference function (QIF) – has been developed [8,9]. Although it has been shown that QIF may have an efficiency advantage over GEE with correlated data that arise in the longitudinal data settings [8,9], it is not as commonly used in the analysis of CRTs, with a few exceptions [[10], [11], [12]]. The purpose of this article is to provide additional empirical evidence by comparing the performance of QIF and GEE in the CRT scenarios, and discuss several analytical insights of these two methods.

Motivating study: the HALI trial

The HALI (Health and Literacy Intervention) trial is a 22 factorial CRT conducted in Kenya to evaluate the impact of a malaria intervention and a literacy intervention on child health and educational outcomes [13,14]. We focus on the literacy intervention and do not address features of the factorial design in this paper. In other words, we treat the trial as a regular parallel-arm CRT. A multilevel data structure is present because children are nested in schools, which are nested in teacher advisory center; the randomization is carried out at the school level and stratified by each teacher advisory center. We focus on the 9-month outcome and only consider school-level clustering as previous analysis suggested minimal clustering at the teacher advisory center level [15]. There are 51 schools in the literacy intervention arm and 50 schools in the control arm, with approximately 25 children in each school. The descriptive statistics in the HALI trial are provided in Table 1. The primary outcome is the spelling score, which ranges from 0 to 20 and is treated as a continuous outcome. Other baseline variables included in the trial are age, sex, baseline spelling score (individual-level variables) and the availability of handwashing facilities (school-level variable), which might have a direct or indirect relationship with the primary outcome. All baseline variables (with age log-transformed) are reasonably balanced between the intervention and the control arms due to randomization [15].

Table 1

Baseline descriptive statistics in the HALI trial. The total sample size includes children in schools (137 participants have missing baseline spelling scores and excluded in the analysis).

Level	Variable		Overall	Intervention	Control
Level	Variable		(101 schools)	(51 schools)	(50 schools)
Child-level	log-Age	n	2230	1103	1127
		Min	1.6	1.6	1.6
		Mean	2.0	2.0	2.0
		Median	2.1	2.1	2.1
		IQR	1.9~2.2	1.9~2.2	1.9~2.2
		Max	2.7	2.7	2.6
		SD	0.22	0.21	0.22
	Sex	n	2230(100%)	1103(100%)	1127(100%)
		Male	1136(50.9%)	575(52.1%)	561(49.8%)
		Female	1094(49.1%)	528(47.9%)	566(50.2%)
	Baseline spellingscore	n	2193	1089	1104
		Min	0	0	0
		Mean	8.2	8.5	7.9
		Median	7	8	7
		IQR	5~11	5~12	5~11
		Max	20	20	19
		SD	4.50	4.66	4.33
School-level	Handwash facilitiesin school	N	101(100%)	51(100%)	50(100%)
		Yes	26(25.7%)	17(33.3%)	9(18.0%)
		No	75(74.3%)	34(66.7%)	41(82.0%)

Baseline descriptive statistics in the HALI trial. The total sample size includes children in schools (137 participants have missing baseline spelling scores and excluded in the analysis). In the context of the HALI study, we consider CRTs with a continuous outcome measured at a single follow-up time point. We also assume a relatively large number of clusters (e.g., ) with equal cluster size. We first describe the GEE and QIF approaches to estimate the marginal intervention effect and analytically study their connections. The finite-sample operating characteristics of GEE and QIF are evaluated using simulations, and both approaches are applied to the HALI trial for empirical illustrations.

Statistical methods

Generalized estimating equations (GEE)

For correlated data arising from N clusters and m individuals in each cluster, we use , , to denote the p-dimensional design vector, individual-level outcome, and marginal mean of the jth individual in the ith cluster. We define the covariate matrix , outcome vector and mean vector for the ith cluster. We write as the working covariance matrix of the ith cluster, as the p-dimensional regression parameter vector, and define a generalized linear model , where is a monotonic and differentiable link function. The marginal variance is defined as , where is a parametric variance function and φ is the common dispersion. We also define the gradient matrix . The generalized estimating equations (GEE) approach was originally developed for longitudinal data analysis based on quasi-likelihood [4,16] and formulated as To specify the unknown covariance matrix , a working correlation matrix is assumed and frequently parameterized by a common parameter . Define the variance matrix as , and the covariance matrix can be written as . The estimation of and is carried out via the modified Fisher-scoring algorithm [4]. It is known that the GEE estimator is consistent for any choice of working correlation structure and has a robust ‘‘sandwich’’ covariance matrix given by A consistent estimator for the ‘‘sandwich’’ covariance matrix is obtained by replacing in equation (1) with its empirical version . In the context of CRTs, two most frequently used correlation structure is the independence and exchangeable structure. Interestingly, under some conditions, the GEE estimator provides identical point estimates using either correlation structure [8], as summarized in the following result. Under the independence and exchangeable working correlations, GEE produces identical point estimate and the robust sandwich covariance if the following two conditions are satisfied: (1) the marginal mean model only includes cluster-level covariates; (2) equal cluster sizes. The proof of Result 1 is provided in Appendix I. In CRTs with correlated binary outcomes, Pan [17] demonstrated the same result assuming a logistic marginal mean model, and Result 1 can be viewed as a generalization of this earlier result to arbitrary link and variance functions. In practice, this result implies that when only the intervention indicator is included in the marginal analysis of CRTs, the point estimate for is the same regardless of the working correlation specification as long as the cluster sizes are the same. Moreover, Result 1 has importantly implications for sample size and power calculations in the design stage. The equal cluster-size assumption is frequently assumed in designing CRTs [18], in which case the sample size requirements become identical under either working correlation specification.

Quadratic inference function (QIF)

The approach of quadratic inference function (QIF) was introduced to improve the efficiency of GEE in longitudinal data analysis under correlation misspecification [19]. The QIF approach expresses the inverse of the working correlation matrix as a linear combination of K basis matrices: with as the kth basis matrix and the weight. The first basis matrix is usually specified as the identity matrix . A -dimensional score vector is then defined as Let be the list of extended score equations and be the empirical covariance matrix of . The QIF is written as . Based on the generalized method of moments (GMM) [20], the estimator can be more efficient than the GEE estimator in large samples when the working correlation is misspecified. From the first derivative of , the QIF estimator obtained by minimizing the function is asymptotically equivalent to solving [19], and the Newton-Raphson algorithm can be used to iteratively update the estimator for until convergence [9]. A consistent variance estimator of the QIF estimator then has a sandwich form Due to the theoretical efficiency improvement of QIF over GEE, there has been increasing efforts in developing the theory of QIF for analyzing correlated data, including the following examples concerning longitudinal data analysis. A QIF likelihood-ratio test statistic, with asymptotic Chi-squared distribution, was proposed and shown useful to test for goodness-of-fit in longitudinal studies [19]. A penalized version of QIF can further accommodate the variable selection [21]. The Godambe Information (TGI) criterion and the trace of the empirical covariance matrix were developed to select the appropriate correlation structure [22,23]. The QIF approach has also been shown to automatically down-weight outlying observations [8], while GEE has unbounded influence function and can be sensitive to outliers. In addition, the QIF approach can also be utilized for meta-analysis with a flexible joint estimation procedure [24]. In small to moderately-sized samples, it was found that the standard errors of parameters can be severely downward-biased and two biased-corrected covariance estimators have shown to provide adequate finite-sample adjustments [9]. Furthermore, it was shown that imbalance of covariate distributions and of cluster sizes can also lead to larger variability of the QIF estimator [25]. In contrast to the longitudinal data setting, in a CRT with a single follow-up time-point, there is no natural ordering or structure of individuals in the same cluster. That is, decay-type structures are not appropriate but the exchangeable working correlation structure with a common ICC parameter is a natural choice. For the exchangeable working correlation structure, we can write , where is a -dimensional matrix of 1's. Following Li et al. [26], we have . As is not a full-rank matrix, a better use is to specify and as two full-rank basis matrices for exchangeable correlation structure of QIF [9]. We can also use QIF with an independence correlation structure with only one basis matrix of the identity matrix . Parallel to Result 1, we provide an additional insight that the results obtained from GEE and QIF are equal under conditions assumed below. Under the exchangeable working correlation, QIF and GEE produce identical point estimates and robust covariances if two conditions are satisfied: (1) the marginal mean model only includes cluster-level covariates; (2) equal cluster sizes. The proof of Result 2 is provided in Appendix I. Furthermore, it has been pointed out previously that QIF and GEE are identical when the independence working correlation is used [8,25]. Combining Result 1 and Result 2, we further summarize an additional result as follows. The point estimates and robust covariances are identical using either GEE or QIF with either independence or exchangeable working correlation, if the following two conditions are satisfied: (1) the marginal mean model only includes cluster-level covariates; (2) equal cluster sizes. The proof of Result 3 is also provided in Appendix I. As previously indicated, these results have implications for sample size procedures assuming equal cluster sizes, in which cases the sample requirements will be equivalent using either GEE or QIF coupled with either independence and exchangeable correlation structures.

Simulation studies

To study the empirical performance of QIF in CRTs, we carry out a series of simulation studies. Inspired by the HALI trial in our motivating example, we assume 100 clusters with 50 clusters randomized to the treatment and control arms. For simplicity, we specify the cluster sizes to be 25 for all 100 clusters. We consider two mean models and four types of correlation structures to form different data generating process (DGP) in CRTs. We then implement GEE and QIF with the correctly-specified mean model and exchangeable working correlation structure. In each simulation, we use a multivariate normal model to simulate outcomes in each cluster with individual-level variance . The four correlation structures used in the DGP include: fixed-regular exchangeable (CS0), regular exchangeable (CS1), cluster-specific exchangeable (CS2) and cluster-specific exchangeable with sub-clustering (CS3). The two selected mean models are: intervention-only model (MM1) and covariate-adjusted model (MM2). In the intervention-only mean model (MM1), we use to indicate whether the ith cluster is in the intervention arm () or not (). The MM1 is given by . We fix and allow the marginal intervention effect to adopt a range of different values in the DGP. For MM2, we simulate four covariates whose distributions are informed by the HALI trial. Specifically, we simulate three individual-level covariates , and mimicking the age, sex and baseline spelling score; we assume follows a log-normal distribution with mean of 2 and standard deviation 0.2 (both on the log scale), follows a binomial distribution with probability of 0.5 and follows a normal distribution with mean 8 and standard deviation 5. We also simulate a cluster-level covariate from a binomial distribution with probability of 0.26 based on the school-level prevalence of handwash facilities. We write MM2 as . We specify different values for the intervention effect in the DGP, while keeping all other covariate effects constant as . For the correlation structures, we first assume the fixed-regular exchangeable correlation structure (CS0) that all clusters have the same exchangeable correlation structure with ICC . For the regular exchangeable correlation structure (CS1), we assume all the clusters have the same exchangeable correlation matrices in the same simulation iteration but might vary in different iterations. The cluster-specific exchangeable correlation structure (CS2) further allows possibly different exchangeable correlation matrices for each cluster in each simulation iteration. Both CS1 and CS2 assume the ICC ρ in the exchangeable correlation structure is sampled from a uniform distribution from 0.01 to 0.2. For the cluster-specific exchangeable correlation structure with sub-clustering (CS3), we first generate a variable from a discrete uniform distribution in . For the jth and kth individuals in the ith cluster, we specify their pairwise correlation value to be if and are the same, indicating most closely-correlated. If and differ in 1, 2 or 3, we specify the correlation to be or , respectively, in order to model different degrees of pairwise correlations. Particularly, except for CS0, we have assumed the true correlation matrix is sampled from a population-level distribution; the purpose of this additional step is to provide a data generating process with the desired marginal mean and a complex correlation structure, but without the multivariate normality assumption. This type of DGP is less restrictive than the usual multivariate normal DGP, and we provide additional technical details on this type of DGP in Appendix II. For CS0, CS1 and CS2, we use five different values for in the DGP, and use six values for under CS3, leading to a total of 42 scenarios. We generate replicates for each of 8 combinations of the 2 mean models and 4 correlation structures. Two models are fit for each simulated data, namely GEE with exchangeable working correlation and QIF with exchangeable working correlation. As we explain in Appendix II, the working correlation matrix will be incorrectly specified under DGP with CS2 and CS3. In all scenarios, the correct mean model is specified in the analysis, namely an intervention-only mean model is fit for data generated under MM1 and the covariate-adjusted mean model is fit for data generated under MM2. The following metrics are used to compare the performance of GEE and QIF: (1) relative bias (RBS), which equals the bias relative to the true effect, (2) empirical standard error (ESE) of the estimates, and, (3) the mean robust standard error (MRSE) over all 3000 replicates. In addition, we set the nominal type I error rate to be 0.05 to obtain power for the Wald-type Z-test. We also calculate the power ratio (PR) defined by the power of QIF analysis relative to that of GEE analysis (Q/G). We additionally calculate the coverage probability (Coverage) for both GEE and QIF. Finally, the empirical type I error rate is estimated assuming a true null intervention effect () in the DGP. We present results for DGP with MM1 and four different correlation structures in Table 2. The results indicate that GEE and QIF lead to the exact same estimates, and confirms the analytical insights from Result 2. With the exchangeable working correlation structure and balanced cluster sizes, MM1 only includes the cluster-level intervention variable, satisfying the conditions listed in Result 2. In this setting, the relative biases are small for both GEE and QIF, and the coverage probabilities are all close to nominal. The type I error rate, which is the power when , are also close to 0.05. In addition, in each scenario, the ESE and the MRSE are close to each other, indicating that the robust variance estimators are consistent for both GEE and QIF. Notably, the results in scenarios with CS3 and or have smaller power than those with CS0 and CS1, suggesting that both QIF and GEE could be less efficient with a misspecified correlation structure [19].

Table 2

The GEE and QIF results for data generated under mean model MM1 with covariance structures CS0, CS1, CS2, CS3 and different true ’s based on data from 3000 replicates for each scenario (S = 3000).

DGP: MM1		RBS		ESE		MRSE		Power		Coverage
Analysis: MM1a		GEE	QIF	GEE	QIF	GEE	QIF	GEE	QIF	GEE	QIF
CS0	β1=0	–	–	–	–	–	–	5.57%	5.57%	94.43%	94.43%
	β1=0.40	−0.98%	−0.98%	0.118	0.118	0.117	0.117	91.97%	91.97%	94.67%	94.67%
	β1=0.45	−0.87%	−0.87%	0.118	0.118	0.117	0.117	96.70%	96.70%	94.67%	94.67%
	β1=0.50	−0.78%	−0.78%	0.118	0.118	0.117	0.117	98.60%	98.60%	94.67%	94.67%
	β1=0.55	0.20%	0.20%	0.119	0.119	0.117	0.117	99.60%	99.60%	94.27%	94.27%
CS1	β1=0	–	–	–	–	–	–	5.87%	5.87%	94.13%	94.13%
	β1=0.40	−0.35%	−0.35%	0.147	0.147	0.143	0.143	78.17%	78.17%	95.20%	95.20%
	β1=0.45	−0.80%	−0.80%	0.148	0.148	0.145	0.145	84.93%	84.93%	94.77%	94.77%
	β1=0.50	0.22%	0.22%	0.150	0.150	0.145	0.145	89.77%	89.77%	94.60%	94.60%
	β1=0.55	−0.53%	−0.53%	0.148	0.148	0.146	0.146	94.03%	94.03%	94.93%	94.93%
CS2	β1=0	–	–	–	–	–	–	5.30%	5.30%	94.70%	94.70%
	β1=0.40	1.00%	1.00%	0.152	0.152	0.148	0.148	77.43%	77.43%	93.93%	93.93%
	β1=0.45.	−0.40%	−0.40%	0.150	0.150	0.148	0.148	85.60%	85.60%	94.10%	94.10%
	β1=0.50	−1.03%	−1.03%	0.149	0.149	0.148	0.148	91.10%	91.10%	94.60%	94.60%
	β1=0.55	−0.57%	−0.57%	0.146	0.146	0.148	0.148	95.53%	95.53%	94.80%	94.80%
CS3	β1=0	–	–	–	–	–	–	5.30%	5.30%	94.70%	94.70%
	β1=0.40	−0.60%	−0.60%	0.215	0.215	0.211	0.211	47.70%	47.70%	94.20%	94.20%
	β1=0.50	0.30%	0.30%	0.219	0.219	0.212	0.212	65.30%	65.30%	94.00%	94.00%
	β1=0.60	−0.82%	−0.82%	0.216	0.216	0.212	0.212	78.97%	78.97%	94.53%	94.53%
	β1=0.70	−0.26%	−0.26%	0.212	0.212	0.212	0.212	90.63%	90.63%	94.60%	94.60%
	β1=0.80	1.13%	1.13%	0.213	0.213	0.212	0.212	96.53%	96.53%	94.10%	94.10%

The analysis utilizes mean model MM1 and exchangeable working correlation matrix with robust SE.

The GEE and QIF results for data generated under mean model MM1 with covariance structures CS0, CS1, CS2, CS3 and different true ’s based on data from 3000 replicates for each scenario (S = 3000). The analysis utilizes mean model MM1 and exchangeable working correlation matrix with robust SE. Table 3 presents the results for the DGP with MM2 and four correlation structures. Our analytical finding from Result 2 does not apply with MM2 which now includes multiple individual-level covariates. As a result, the GEE and QIF results differ, although both approaches give small relative bias across all scenarios. Interestingly, while GEE has smaller ESE compared to QIF, QIF presents smaller MRSE than GEE. This finding suggests that the robust variance of QIF tends to be biased towards zero under this complex mean model. The downward bias of QIF variance estimator further leads to under-coverage of the interval estimator, and a type I error inflation especially when the correlation structure deviates from CS0. In contrast, the type I error rate and coverage of the GEE estimator are more close to nominal throughout. The results in Table 3 also allow us to compare the efficiency between QIF and GEE, by comparing the ESE and the power. Under CS0 and CS1, GEE and QIF have almost identical power under the alternative, confirming that their results are asymptotically equivalent under correct correlation specification. When the working correlation model is misspecified, the power of both GEE and QIF will decrease, and QIF appears to be slightly more efficient, as evidenced by the results under CS3. Throughout, the power ratio of QIF over GEE is at most slightly larger than 1 (the largest increase in power is for QIF over GEE under CS3), and becomes closer to 1 as the effect size increases. However, one should be cautious in interpreting the efficiency advantage of QIF because (a) QIF carries an inflated type I error rate under the null and (b) QIF interval estimator frequently leads to under-coverage due to the negative bias in its robust variance estimator.

Table 3

The GEE and QIF results for MM2 with CS0, CS1, CS2, CS3 and different true ’s, each of which is from 3000 simulations ().

DGP: MM2		RBS		ESE		MRSE		Power		PR	Coverage
Analysis: MM2a		GEE	QIF	GEE	QIF	GEE	QIF	GEE	QIF	Q/G	GEE	QIF
CS0	β1=0	–	–	–	–	–	–	5.27%	6.93%	–	94.73%	93.07%
	β1=0.40	−0.83%	−1.03%	0.120	0.123	0.117	0.114	91.47%	91.67%	1.0022	94.50%	92.97%
	β1=0.45	−0.09%	0.02%	0.122	0.127	0.117	0.114	96.47%	96.30%	0.9983	93.53%	91.90%
	β1=0.50	0.62%	0.64%	0.120	0.124	0.117	0.114	98.67%	98.57%	0.9990	94.40%	92.67%
	β1=0.55	0.18%	0.22%	0.116	0.120	0.117	0.114	99.70%	99.70%	0.9990	94.37%	93.13%
CS1	β1=0	–	–	–	–	–	–	6.07%	7.83%	–	93.93%	92.17%
	β1=0.40	−0.33%	−0.10%	0.150	0.154	0.146	0.142	77.00%	78.37%	1.0178	94.87%	93.47%
	β1=0.45	0.58%	0.73%	0.154	0.159	0.146	0.142	84.67%	84.73%	1.0001	93.27%	91.80%
	β1=0.50	0.56%	0.48%	0.151	0.156	0.145	0.141	90.10%	90.13%	1.0003	94.17%	92.93%
	β1=0.55	−0.35%	−0.64%	0.157	0.163	0.146	0.142	92.87%	92.90%	1.0003	93.77%	91.73%
CS2	β1=0	–	–	–	–	–	–	5.67%	7.30%	–	94.33%	92.70%
	β1=0.40	0.48%	0.65%	0.151	0.153	0.148	0.143	77.17%	78.73%	1.0202	94.40%	93.20%
	β1=0.45	0.58%	0.69%	0.154	0.158	0.148	0.143	85.33%	86.57%	1.0145	94.37%	92.03%
	β1=0.50	0.01%	0.06%	0.151	0.154	0.148	0.143	91.80%	92.23%	1.0003	94.93%	93.53%
	β1=0.55	−0.35%	−0.64%	0.152	0.155	0.148	0.143	95.30%	95.40%	1.0010	94.10%	92.43%
CS3	β1=0	–	–	–	–	–	–	5.93%	7.17%	–	94.07%	92.83%
	β1=0.40	1.05%	1.13%	0.221	0.228	0.212	0.206	47.93%	50.07%	1.0446	94.07%	92.10%
	β1=0.50	−0.14%	−0.22%	0.220	0.227	0.212	0.206	64.90%	66.60%	1.0262	93.67%	91.87%
	β1=0.60	1.26%	1.15%	0.215	0.221	0.211	0.205	81.97%	82.50%	1.0065	94.10%	92.67%
	β1=0.70	−0.04%	−0.09%	0.216	0.223	0.212	0.206	90.67%	90.73%	1.0007	94.13%	92.93%
	β1=0.80	−0.25%	−0.29%	0.213	0.220	0.212	0.206	96.37%	96.37%	1.0000	94.53%	93.33%

The analysis utilizes mean model MM2 and exchangeable working correlation matrix with robust SE.

The GEE and QIF results for MM2 with CS0, CS1, CS2, CS3 and different true ’s, each of which is from 3000 simulations (). The analysis utilizes mean model MM2 and exchangeable working correlation matrix with robust SE. In an effort to potentially reduce the bias in QIF variance estimator under MM2, we additionally explore two bias-corrections for the variance proposed by Westgate [9]. These two bias-corrections are extensions of the Mancl-DeRouen (MD) and Kauermann-Carroll (KC) methods proposed in the GEE literature [27,28], and we denote them by and . The details of the two bias-correction methods for QIF are given in Appendix III. Table 4 shows that the type I error rates from the two bias-correction methods are closer to the nominal 0.05 level that from QIF without correction. However, the type I error rate of bias-corrected QIF remains liberal and consistently larger than that of GEE, cautioning the use of QIF in CRTs when the marginal mean model is complex with multiple individual-level covariates.

Table 4

The GEE, QIF and two corrected QIF results for MM2 with CS0, CS1, CS2, CS3, each of which is from 3000 simulations ().

DGP: MM2Analysis: MM2a		Method	ESE	MRSE	Power (Type I error)
CS0	β1=0	GEE	0.119	0.117	5.27%
		QIF	0.123	0.114	6.93%
		QIFMD	0.123	0.117	5.93%
		QIFKC	0.123	0.115	6.33%
CS1	β1=0	GEE	0.152	0.145	6.07%
		QIF	0.159	0.141	7.83%
		QIFMD	0.159	0.146	7.27%
		QIFKC	0.159	0.143	7.50%
CS2	β1=0	GEE	0.152	0.148	5.67%
		QIF	0.155	0.143	7.30%
		QIFMD	0.155	0.148	6.43%
		QIFKC	0.155	0.145	6.90%
CS3	β1=0	GEE	0.215	0.212	5.93%
		QIF	0.220	0.206	7.17%
		QIFMD	0.220	0.212	6.33%
		QIFKC	0.220	0.209	6.73%

The analysis utilizes mean model MM2 and exchangeable working correlation matrix with robust SE and bias-corrected SEs.

The GEE, QIF and two corrected QIF results for MM2 with CS0, CS1, CS2, CS3, each of which is from 3000 simulations (). The analysis utilizes mean model MM2 and exchangeable working correlation matrix with robust SE and bias-corrected SEs. As pointed out by a reviewer, a possible explanation for the bias of the variance estimator for QIF and the associated inflated type I error could be the large variability of the empirical weighting or covariance matrix . Previous studies have suggested that including additional covariates in the mean model could lead to increased variability in estimating , which affects the efficiency of the QIF estimator [23,25,29,30]. It remains to be explored whether improved estimation of along the lines of Westgate [29] and Westgate [30] coupled with the bias-corrected variances could reduce the negative bias in the robust variance of QIF and improve the coverage rate of the interval estimator.

Analysis of HALI trial

We apply GEE and QIF to the analysis of the HALI trial and focus on the continuous outcome of spelling score at the 9-month follow-up [15]. From the descriptive statistics, we observe the baseline covariates are approximately the same across arms, but the mean value of the spelling score at 9-month follow-up differs between the arms, indicating a potential intervention effect due to the literacy intervention. In the marginal mean model, we sequentially include the cluster-level intervention , age , sex , presence of handwash facilities and baseline spelling score . We utilize both independence and exchangeable correlation structures based on three models on the marginal mean outcome in Table 5 to estimate the unadjusted and covariate-adjusted literacy intervention effects. The intervention parameter is denoted as . Specifically, the first mean model is an unadjusted model with intervention indicator as the only covariate, which corresponds to MM1 in simulation studies. The second and third mean models are covariate-adjusted models with three and four other covariates, with the third model corresponding to MM2 in simulation studies. We denote and as GEE with the independence and exchangeable correlation structures, respectively. The notations of and are similarly used for QIF. In addition, we also consider the bias-correction techniques of QIF [9] for the two correlation structures and denote them as , , and .

Table 5

Mean models of the 9-month spelling score.

Model	Formulation
1	μij=E(Yij)=β0+β1Xi
2	μij=E(Yij)=β0+β1Xi+βBBij+βCCij+βEEi
3	μij=E(Yij)=β0+β1Xi+βBBij+βCCij+βDDij+βEEi

Mean models of the 9-month spelling score. We summarize the results in Table 6. The intervention effect estimates from GEE and QIF are generally close to each other under mean model 1, but may be slightly different under mean model 2 and 3. Specifically, the intervention effect estimate tends to be larger using QIF and assuming working exchangeable correlation compared to the rest of methods. Although the standard error of the uncorrected variance of QIF appears to be the smallest, it may carry negative bias as suggested in simulation studies. The two bias-corrected variances of QIF could slightly reduce the bias and improve the variance estimator. For example, the MD corrected QIF standard error is close to the GEE standard error with the exchangeable working correlation. Overall, the standard error estimates are similar across methods under each specific mean model, suggesting that the application of QIF may have limited efficiency improvement over GEE in this data example. However, across the mean models, the standard error estimates for all methods sharply decrease when mean model 3 is considered compared to the rest of mean models, suggesting a strong predictive effect of the baseline spelling score.

Table 6

The results of the intervention effects on the 9-month spelling score.

Model	Method	Estimate	S.E.	95% confidence interval	p-value
1	GEEind	1.766	0.4813	(0.823, 2.709)	0.00024
	GEEexc	1.758	0.4819	(0.813, 2.703)	0.00026
	QIFind	1.766	0.4813	(0.822, 2.709)	0.00024
	QIFind−MD	1.766	0.4913	(0.803, 2.728)	0.00033
	QIFind−KC	1.766	0.4863	(0.812, 2.719)	0.00028
	QIFexc	1.797	0.4748	(0.866, 2.727)	0.00015
	QIFexc−MD	1.797	0.4847	(0.847, 2.747)	0.00021
	QIFexe−KC	1.797	0.4797	(0.856, 2.737)	0.00018
2	GEEind	1.811	0.4758	(0.878, 2.744)	0.00014
	GEEexc	1.842	0.4774	(0.906, 2.778)	0.00011
	QIFind	1.811	0.4758	(0.879, 2.744)	0.00014
	QIFind−MD	1.811	0.4943	(0.842, 2.780)	0.00025
	QIFind−KC	1.811	0.4849	(0.861, 2.762)	0.00019
	QIFexc	2.056	0.4583	(1.158, 2.955)	0.00001
	QIFexc−MD	2.056	0.4734	(1.128, 2.984)	0.00001
	QIFexc−KC	2.056	0.4659	(1.143, 2.969)	0.00001
3	GEEind	1.413	0.2868	(0.851, 1.975)	8.4×10−7
	GEEexc	1.446	0.2934	(0.871, 2.021)	8.3×10−7
	QIFind	1.413	0.2868	(0.851, 1.975)	8.4×10−7
	QIFind−MD	1.413	0.2980	(0.829, 1.997)	2.1×10−6
	QIFind−KC	1.413	0.2924	(0.840, 1.986)	1.3×10−6
	QIFexc	1.601	0.2817	(1.049, 2.153)	1.3×10−8
	QIFexc−MD	1.601	0.2922	(1.028, 2.173)	4.3×10−8
	QIFexc−KC	1.601	0.2872	(1.038, 2.164)	2.5×10−8

The results of the intervention effects on the 9-month spelling score.

Discussion

In this paper, we compare the QIF approach with the more commonly-used GEE approach for the estimation of intervention effect in CRTs. In particular, we focus on CRTs with continuous outcomes at one follow-up time point, a large number of clusters and equal cluster sizes, similar to the motivating data example. We present three analytical results on the equivalence between GEE and QIF under specific conditions to explicitly acknowledge their connections. Our simulation studies also confirm these analytical results. Although our simulations show a potential power advantage of QIF over GEE, the inflated type I error rate of QIF cautions its use when the marginal mean model includes multiple baseline covariate. On the other hand, the GEE approach performs quite stable under complex mean and correlation models in our setting with a large number of clusters. Our simulation results suggest that the two specific limitations of QIF in the application to CRTs, which may be addressed by further research. First, we found an inflated type I error rate of QIF when the DGP concerns a complex marginal mean model MM2. Surprisingly, the application of the two bias-correction techniques does not fully address this issue, as the empirical type I error rate is still above 7% across 3000 simulations. The type I error inflation is mostly due to the negative bias of the QIF variance estimator. In CRTs, a better control of type I error rate may be achieved by permutation test. In future studies, one could consider developing the marginal-model-based permutation analysis as in Braun and Feng [31] and Li et al. [32] for QIF to achieve better finite-sample properties. Second, despite a potential power advantage of QIF over GEE suggested by Table 3, we additionally saw a larger ESE for QIF compared to GEE. In fact, if we define the relative efficiency based on the ratio of the ESE, then Table 3 suggests that QIF could be slightly less efficient than GEE, even though the asymptotic theory states otherwise. As we explain in the simulations, the reduced efficiency of QIF may be attributed to the large variability in estimating the empirical weighting or covariance matrix under a complex mean model [25]. Additionally studies are required to systematically evaluate whether improved estimation of [25,29,30] can lead to better efficiency of QIF and eventually address the bias of its variance estimates. To conclude, our empirical evaluation supports the use of GEE over QIF in CRTs with a large number of clusters. We observe that the QIF could exhibit inflated type I error rate and under-coverage when the marginal mean model includes baseline adjustment variables other than the intervention status, while the GEE approach performs consistently well across all scenarios. The bias-corrected variances of QIF also shows limited improvement in terms of type I error rate, and more empirical evaluations of QIF are required to clearly demonstrate its theoretical efficiency advantage over GEE before recommending its routine applications in CRTs.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Table 7

Table of Abbreviations

Acronym	Full name
CRT	cluster randomized trial
ICC	intracluster correlation coefficient
GEE	generalized estimating equations
QIF	quadratic inference function
GMM	generalized method of moments
HALI	health and literacy intervention
IQR	interquartile range
SD	standard deviation
TGI	the godambe infomration criterion
DGP	data generating process
CS	correlation structure in the data generating process
MM	mean model in the data generating process
CS0	fixed-regular exchangeable correlation structure
CS1	regular exchangeable correlation structure
CS2	cluster-specific exchangeable correlation structure
CS3	cluster-specific exchangeable with sub-clustering correlation structure
MM1	intervention-arm only model
MM2	intervention-arm with 4 covariates model
RBS	relative bias
SE	standard error
ESE	empirical standard error
MRSE	mean robust standard error
PR	power ratio
Q/G	quadratic inference function relative to generalized estimating equations
QIFMD	quadratic inference function with the Mancl-DeRouen bias-correction method
QIFKC	quadratic inference function with the Kauermann-Carroll bias-correction method
GEEind	generalized estimating equations using independence working correlation
GEEexc	generalized estimating equations using exchangeable working correlation
QIFind−MD	quadratic inference function with Mancl-DeRouen bias correction and independence
QIFind−KC	quadratic inference function with Kauermann-Carroll bias correction and independence
QIFexc−MD	quadratic inference function with Mancl-DeRouen bias correction and exchangeable
QIFexc−KC	quadratic inference function with Kauermann-Carroll bias correction and exchangeable

19 in total

1. Sample size and power calculations with correlated binary data.

Authors: W Pan
Journal: Control Clin Trials Date: 2001-06

2. An integrated population-averaged approach to the design, analysis and sample size determination of cluster-unit trials.

Authors: John S Preisser; Mary L Young; Daniel J Zaccaro; Mark Wolfson
Journal: Stat Med Date: 2003-04-30 Impact factor: 2.373

3. Improving educational achievement and anaemia of school children: design of a cluster randomised trial of school-based malaria prevention and enhanced literacy instruction in Kenya.

Authors: Simon Brooker; George Okello; Kiambo Njagi; Margaret M Dubeck; Katherine E Halliday; Hellen Inyega; Matthew C H Jukes
Journal: Trials Date: 2010-10-07 Impact factor: 2.279

4. The effect of cluster size imbalance and covariates on the estimation performance of quadratic inference functions.

Authors: Philip M Westgate; Thomas M Braun
Journal: Stat Med Date: 2012-03-13 Impact factor: 2.373

5. To GEE or not to GEE: comparing population average and mixed models for estimating the associations between neighborhood risk factors and health.

Authors: Alan E Hubbard; Jennifer Ahern; Nancy L Fleischer; Mark Van der Laan; Sheri A Lippman; Nicholas Jewell; Tim Bruckner; William A Satariano
Journal: Epidemiology Date: 2010-07 Impact factor: 4.822

6. An improved quadratic inference function for parameter estimation in the analysis of correlated data.

Authors: Philip M Westgate; Thomas M Braun
Journal: Stat Med Date: 2012-12-28 Impact factor: 2.373

7. An evaluation of constrained randomization for the design and analysis of group-randomized trials with binary outcomes.

Authors: Fan Li; Elizabeth L Turner; Patrick J Heagerty; David M Murray; William M Vollmer; Elizabeth R DeLong
Journal: Stat Med Date: 2017-08-07 Impact factor: 2.373

8. Methods for sample size determination in cluster randomized trials.

Authors: Clare Rutterford; Andrew Copas; Sandra Eldridge
Journal: Int J Epidemiol Date: 2015-07-13 Impact factor: 7.196

9. Changes in Obesity Odds Ratio among Iranian Adults, since 2000: Quadratic Inference Functions Method.

Authors: Enayatollah Bakhshi; Koorosh Etemad; Behjat Seifi; Kazem Mohammad; Akbar Biglarian; Jalil Koohpayehzadeh
Journal: Comput Math Methods Med Date: 2016-10-10 Impact factor: 2.238

10. An association of platelet indices with blood pressure in Beijing adults: Applying quadratic inference function for a longitudinal study.

Authors: Kun Yang; Lixin Tao; Gehendra Mahara; Yan Yan; Kai Cao; Xiangtong Liu; Sipeng Chen; Qin Xu; Long Liu; Chao Wang; Fangfang Huang; Jie Zhang; Aoshuang Yan; Zhao Ping; Xiuhua Guo
Journal: Medicine (Baltimore) Date: 2016-09 Impact factor: 1.889

1 in total

1. Power considerations for generalized estimating equations analyses of four-level cluster randomized trials.

Authors: Xueqi Wang; Elizabeth L Turner; John S Preisser; Fan Li
Journal: Biom J Date: 2021-12-13 Impact factor: 1.715

1 in total