
Assessing individual equivalence in parallel group and crossover designs: Exact test and sample size procedures.

Abstract

The consideration of individual equivalence provides an essential alternative to average equivalence in two-group comparative studies. A common procedure for declaring individual equivalence adopts the tolerance intervals of the designated proportions of measurement differences. This statistical practice is a direct generalization of the widely used two one-sided tests (TOST) for average equivalence. Such TOST extensions often do not have adequate control of Type I error and result in excessively conservative tests. To signify and resolve the underlying issues of existing methods, this paper presents exact tests for assessing individual equivalence between two treatments under parallel group and crossover designs. Rigorous evaluations are conducted to clarify the discrepancy of critical values and Type I error probabilities between the equivalence procedures. The findings elucidate the shortcoming of the TOST technique and the advantage of the proposed approach. The associated power and sample size calculations are also justified through simulation studies.

Year: 2022 PMID: 35622872 PMCID: PMC9140302 DOI: 10.1371/journal.pone.0269128

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752

Introduction

The two one-sided tests (TOST) procedure of mean equivalence, first described by Schuirmann [1] and Westlake [2], is the most common method in equivalence methodology. The conceptual simplicity and technical feasibility of TOST provide an important reform to apply appropriate statistical tools for equivalence, rather than relying on failure to reject the conventional hypothesis of no difference between treatment effects. Meyners [3] presented a comprehensive review of different types of equivalence tests. Moreover, Hauschke, Steinijans, and Pigeot [4], Chow and Liu [5], Wellek [6], and Choudhary and Nagaraja [7] discussed the concepts and techniques for the design and analysis of equivalence studies. The TOST for mean equivalence focuses on the mean parameters of the target populations and represents a vital method within the general scope of average equivalence. It is important to note that mean equivalence testing specifies only the population mean difference and does not concern the other characteristics associated with the underlying distribution of measurement differences. Accordingly, the principle of average equivalence only demands similar average bioavailability and does not guarantee equivalence in intra-subject variability and closeness of the response distribution between the test and reference formulations. In view of the practical issue and important problem about the interchangeability of bioequivalence drug products, the notion of individual equivalence has been proposed to ensure switchability when a large proportion of individuals need to be sufficiently similar on the two drug formulations. The basic concept and rationale of individual equivalence are described in Anderson and Hauck [8], Hauck and Anderson [9], Sheiner [10], Schall and Luus [11], and Anderson [12]. 
Various individual equivalence principles and techniques have been proposed to evaluate exchangeability or switchability in terms of the desired proportion of the subject-level differences between two formulations. In particular, the commonly used reference limits of 95% proportion encompass the 2.5th percentile and 97.5th percentile for the distribution of measurement differences. Accordingly, the normal percentile is a linear function of the mean and standard deviation of the designated population. Statistical procedures and theoretical investigations of normal percentiles are essential for assessing individual equivalence. For the mean equivalence appraisals considered in the TOST, the duality between decision rules and confidence intervals is well documented. Specifically, the null hypothesis of no mean equivalence is rejected if and only if the confidence limits of the corresponding equal-tail two-sided 100(1–2α)% confidence interval of mean difference are contained in the designated equivalence bounds. Within the context of individual equivalence, the target parameters are the lower and upper percentiles for describing the desired population proportion. It is appealing to apply the confidence interval procedure to the normal percentiles of the distribution of subject-level differences. The one-sided confidence intervals of normal percentiles have a close link to the one-sided tolerance bounds of a normal distribution. This technical correspondence reveals that tolerance interval estimation has an extended utility in assessing individual bioequivalence. The notion of confidence intervals for mean equivalence or average equivalence has been extended to the appraisals of individual equivalence, such as the TOST methods presented in Esinhart and Chinchilli [13], Liu and Chow [14], and Tsong and Shen [15], among others. 
Accordingly, tolerance intervals are constructed for the desired proportions of measurement differences and individual equivalence is claimed if the resulting interval limits are within the selected equivalence range. General discussions of tolerance interval estimation are available in Krishnamoorthy and Mathew [16] and Meeker, Hahn, and Escobar [17]. Due to the close resemblance between tolerance intervals and confidence intervals, the TOST method for assessing individual equivalence is presumed to share the same desirable properties of the counterpart TOST for establishing mean equivalence. However, Berger and Hsu [18] showed that size-α bioequivalence tests do not generally correspond to 100(1–2α)% confidence sets. It is strongly advocated in Berger and Hsu [18] that statistically sound techniques should be employed to derive a test with the specified Type I error rate. Notably, the prescribed TOST methods for individual equivalence were conducted with respect to tolerance interval estimation. The corresponding numerical results did not directly evaluate their Type I error control in hypothesis testing. Although the assessment of individual equivalence mainly focuses on biopharmaceutical applications, the concept and analysis are pertinent to comparative studies across virtually all scientific disciplines. It is of great interest to clarify the potential deficiency and implications of current methods in equivalence testing. Following the two-sided sampling plan in Owen [19], this article presents a unified approach for evaluating individual equivalence between two treatment formulations. Exact test procedures are described for the parallel group and crossover designs. Extensive numerical investigations are conducted to demonstrate the underlying features of the suggested and TOST procedures. The comparisons and findings reveal essential discrepancies in critical values and Type I error rates that have not been addressed in the literature.
The results update the less-recognized problems of the current TOST methods for examining individual equivalence in Liu and Chow [14], and Tsong and Shen [15]. To enhance the usefulness of the proposed approach, the associated power and sample size calculations are also demonstrated for planning individual equivalence studies. Computer algorithms for computing the critical value, statistical power, and sample size of the suggested test procedures are available as supplemental material. It should be noted that Owen [19] did not address hypothesis testing, power analysis and sample size determination for appraising individual equivalence. Moreover, the technical arguments presented here are more analytically transparent than the formulation based on the bivariate noncentral t distribution in Owen [19].

Methods

Parallel group design

Consider independent random samples from two normal populations with the following formulations:
$$X_{ij} \sim N(\mu_i, \sigma^2), \tag{1}$$
where $\mu_1$, $\mu_2$, and $\sigma^2$ are unknown parameters, $j = 1, \ldots, N_i$, and $i = 1$ and 2. To establish individual equivalence between two treatments, the central portion of the distribution of the difference $X_1 - X_2$ between the individual measurements of the two treatments needs to lie within a reasonable range around zero. The 100·pth percentile of the distribution $N(\mu_D, \sigma_D^2)$ of $X_1 - X_2$ is denoted by
$$\theta_p = \mu_D + z_p \sigma_D, \tag{2}$$
where $\mu_D = \mu_1 - \mu_2$, $\sigma_D^2 = 2\sigma^2$, $z_p$ is the 100·pth percentile of the standard normal distribution N(0, 1), and 0 < p < 1. The null and alternative hypotheses of the individual equivalence test are expressed as
$$H_0: \theta_{1-p} \le \Delta_L \ \text{or} \ \Delta_U \le \theta_p \quad \text{versus} \quad H_1: \Delta_L < \theta_{1-p} \ \text{and} \ \theta_p < \Delta_U, \tag{3}$$
where p > 0.5 and the two designated constants $\Delta_L$ and $\Delta_U$ represent the lower and upper thresholds of the percentile range for declaring individual equivalence between two treatments. The alternative hypothesis indicates that at least a central proportion p* = 2p – 1 of the distribution $N(\mu_D, \sigma_D^2)$ lies in the range $(\Delta_L, \Delta_U)$. Whereas the individual equivalence problem concerns the central proportion of a target distribution through the pair of percentiles $(\theta_{1-p}, \theta_p)$, Shieh [20] compared alternative approaches for difference, noninferiority, and equivalence testing of a single normal percentile. Similar to the widely used TOST for mean equivalence, Shieh [20] showed that the TOST procedure for the comparability of a designated percentile also maintains good control of the Type I error rate at the specified value. These promising results suggest that the TOST principle can be useful for similar problems in more advanced designs and complex scenarios. However, the critical exposition presented next demonstrates that the TOST extensions for individual equivalence do not have adequate control of Type I error and result in overly conservative tests.
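The percentile definition in Eq 2 and the link p* = 2p – 1 can be illustrated with a short sketch. It uses only the Python standard library; the function names `normal_percentile` and `central_bounds` are hypothetical and chosen for this illustration. With the standard normal null distribution it returns the percentile pair that encloses a central proportion p*.

```python
from statistics import NormalDist


def normal_percentile(p, mu_d, sigma_d):
    """100*p-th percentile of N(mu_d, sigma_d^2): theta_p = mu_d + z_p * sigma_d."""
    return mu_d + NormalDist().inv_cdf(p) * sigma_d


def central_bounds(p_star, mu_d=0.0, sigma_d=1.0):
    """Percentile pair (theta_{1-p}, theta_p) enclosing the central proportion p* = 2p - 1."""
    p = (1.0 + p_star) / 2.0
    return (normal_percentile(1.0 - p, mu_d, sigma_d),
            normal_percentile(p, mu_d, sigma_d))


for p_star in (0.80, 0.90, 0.95):
    lower, upper = central_bounds(p_star)
    print(f"p* = {p_star:.2f}: ({lower:.4f}, {upper:.4f})")
```

For p* = 0.80, 0.90, and 0.95 the pairs are approximately ±1.2816, ±1.6449, and ±1.9600, the similarity bounds used in the numerical studies of this paper.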

The TOST procedure for parallel group design

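To demonstrate average equivalence between two treatment means, the TOST procedure rejects the null hypothesis of incomparability if the ordinary 100(1–2α)% equal-tailed confidence interval of mean difference is entirely included in the equivalence range. The same principle was extended to individual equivalence assessment for exchangeability between the test and standard treatments in Tsong and Shen [15]. A concise illustration is presented to simplify the complicated results in Tsong and Shen [15]. The usual two-sample t statistic has the form
$$T = \frac{\bar{X}_1 - \bar{X}_2 - \mu_D}{(S^2/M)^{1/2}},$$
where $\bar{X}_1 = \sum_{j=1}^{N_1} X_{1j}/N_1$, $\bar{X}_2 = \sum_{j=1}^{N_2} X_{2j}/N_2$, $M = 1/(1/N_1 + 1/N_2)$, $S^2 = \{(N_1 - 1)S_1^2 + (N_2 - 1)S_2^2\}/v$, $S_1^2 = \sum_{j=1}^{N_1}(X_{1j} - \bar{X}_1)^2/(N_1 - 1)$, $S_2^2 = \sum_{j=1}^{N_2}(X_{2j} - \bar{X}_2)^2/(N_2 - 1)$, and $v = N_1 + N_2 - 2$. The ordinary interval limits $(\hat{\mu}_L, \hat{\mu}_U)$ of a 100(1–2α)% equal-tailed confidence interval of $\mu_D$ are
$$\hat{\mu}_L = \bar{X}_1 - \bar{X}_2 - t_{1-\alpha}(v)\,(S^2/M)^{1/2} \quad \text{and} \quad \hat{\mu}_U = \bar{X}_1 - \bar{X}_2 + t_{1-\alpha}(v)\,(S^2/M)^{1/2},$$
respectively, where $t_{1-\alpha}(v)$ is the 100(1 – α)th percentile of the t distribution with degrees of freedom v. In addition to the practical usefulness for interval estimation, the range $\{\hat{\mu}_L, \hat{\mu}_U\}$ has an interesting connection to equivalence assessment. A well-known simple approach to conduct the TOST for mean equivalence is by examining whether the 100(1–2α)% confidence interval $(\hat{\mu}_L, \hat{\mu}_U)$ of $\mu_D$ falls within the designated range $(\delta_L, \delta_U)$, where $\delta_L$ and $\delta_U$ are a priori constants and represent the sensible bounds for declaring mean equivalence. It is straightforward to show that the pivotal quantity for $\theta_{1-p}$ has a noncentral t distribution
$$\frac{\bar{X}_1 - \bar{X}_2 - \theta_{1-p}}{(S^2/M)^{1/2}} \sim t(v, z_p(2M)^{1/2}),$$
where $t(v, z_p(2M)^{1/2})$ is a noncentral t distribution with degrees of freedom v and noncentrality $z_p(2M)^{1/2}$. The exact lower confidence limit of an upper 100(1 – α)% one-sided confidence interval $\{\hat{\theta}_{TL}, \infty\}$ of $\theta_{1-p}$ can be obtained as
$$\hat{\theta}_{TL} = \bar{X}_1 - \bar{X}_2 - \tau_T (S^2/M)^{1/2},$$
where $\tau_T = t_{1-\alpha}(v, z_p(2M)^{1/2})$ is the 100(1 – α)th percentile of a noncentral t distribution $t(v, z_p(2M)^{1/2})$.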
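Similarly, the pivotal quantity for $\theta_p$ is distributed as
$$\frac{\bar{X}_1 - \bar{X}_2 - \theta_p}{(S^2/M)^{1/2}} \sim t(v, -z_p(2M)^{1/2}).$$
Using the important property of a noncentral t distribution as in Johnson, Kotz, and Balakrishnan [21, Chapter 31] that $t_{1-\alpha}(v, z_p(2M)^{1/2}) = -t_{\alpha}(v, -z_p(2M)^{1/2})$ for 0 < α < 1, the exact upper confidence limit of a lower 100(1 – α)% one-sided confidence interval $\{-\infty, \hat{\theta}_{TU}\}$ of $\theta_p$ can be expressed as
$$\hat{\theta}_{TU} = \bar{X}_1 - \bar{X}_2 + \tau_T (S^2/M)^{1/2}.$$
Note that the one-sided confidence intervals of normal percentiles are technically identical to the one-sided tolerance bounds of a normal distribution as noted in Hahn [22, 23]. The derived confidence limits $\hat{\theta}_{TL}$ and $\hat{\theta}_{TU}$ assure that $P\{\hat{\theta}_{TL} < \theta_{1-p} < \infty\} = P\{P[\hat{\theta}_{TL} < (X_1 - X_2) \mid (\bar{X}_1 - \bar{X}_2, S^2)] > p\} = 1 - \alpha$ and $P\{-\infty < \theta_p < \hat{\theta}_{TU}\} = P\{P[(X_1 - X_2) < \hat{\theta}_{TU} \mid (\bar{X}_1 - \bar{X}_2, S^2)] > p\} = 1 - \alpha$, respectively. Accordingly, for p > 0.5, a lower 100(1 – α)% confidence limit for the 100(1–p)th percentile $\theta_{1-p}$ is equivalent to a lower tolerance limit to be exceeded by at least a proportion p of the population with probability 1 – α. Likewise, an upper 100(1 – α)% confidence limit for the 100·pth percentile $\theta_p$ for p > 0.5 is equivalent to an upper tolerance limit to exceed at least a proportion p of the population with probability 1 – α. As an extension to the use of tolerance intervals for the assessment of individual bioequivalence, Tsong and Shen [15] suggested that the null hypothesis $H_0$: $\theta_{1-p} \le \Delta_L$ or $\Delta_U \le \theta_p$ is rejected if
$$\Delta_L < \hat{\theta}_{TL} \ \text{and} \ \hat{\theta}_{TU} < \Delta_U \tag{9}$$
or, equivalently,
$$T_L > \tau_T \ \text{and} \ T_U < -\tau_T, \tag{10}$$
where $T_L = (\bar{X}_1 - \bar{X}_2 - \Delta_L)/(S^2/M)^{1/2}$ and $T_U = (\bar{X}_1 - \bar{X}_2 - \Delta_U)/(S^2/M)^{1/2}$. The strong resemblance between $(\hat{\mu}_L, \hat{\mu}_U)$ and $\{\hat{\theta}_{TL}, \hat{\theta}_{TU}\}$ in formulation and testing suggests that the rejection region $\{\hat{\theta}_{TL}, \hat{\theta}_{TU}\}$ for individual equivalence may possess statistical properties similar to those of the confidence interval $(\hat{\mu}_L, \hat{\mu}_U)$ for mean equivalence. Specifically, the TOST of mean equivalence based on $(\hat{\mu}_L, \hat{\mu}_U)$ adequately controls the Type I error rate at the specified value. However, Berger and Hsu [18] exemplified that an equivalence procedure in terms of a 100(1–2α)% confidence interval can lead to a liberal or conservative test.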
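The Type I error rate associated with the TOST of individual equivalence is evaluated by $\alpha_T = P\{\tau_T < T_L \ \text{and} \ T_U < -\tau_T\}$ at the boundary values $(\theta_{1-p}, \theta_p) = (\Delta_L, \Delta_U)$, where $T_L = (\bar{X}_1 - \bar{X}_2 - \Delta_L)/(S^2/M)^{1/2}$ and $T_U = (\bar{X}_1 - \bar{X}_2 - \Delta_U)/(S^2/M)^{1/2}$. It follows from $\Delta_L = \theta_{1-p} = \mu_D - z_p\sigma_D$ and $\Delta_U = \theta_p = \mu_D + z_p\sigma_D$ that $T_L \sim t(v, z_p(2M)^{1/2})$ and $T_U = T_L - 2z_p\sigma_D/(S^2/M)^{1/2}$. Thus, the Type I error rate is rewritten as
$$\alpha_T = P\{\tau_T < T_L \ \text{and} \ T_U < -\tau_T\} = P\{\tau_T < T_L < 2z_p\sigma_D/(S^2/M)^{1/2} - \tau_T\} \le P\{\tau_T < T_L\} = \alpha.$$
Note that the size of the TOST is the supremum of $\alpha_T$, which equals α and is attained only as a limiting case, for instance as $\sigma^2$ goes to zero. However, the Type I error rate of the TOST procedure is generally less than the nominal level. The succeeding empirical investigations reveal that the discrepancy is of considerable concern. An improved procedure is proposed next to facilitate research practice in assessing individual equivalence.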

The proposed procedure for parallel group design

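By extending the two-sided sampling plan in Owen [19], the suggested exact rejection region for declaring individual equivalence is of the form
$$\Delta_L < \hat{\theta}_{EL} \ \text{and} \ \hat{\theta}_{EU} < \Delta_U, \tag{11}$$
where $\hat{\theta}_{EL} = \bar{X}_1 - \bar{X}_2 - \tau_E (S^2/M)^{1/2}$, $\hat{\theta}_{EU} = \bar{X}_1 - \bar{X}_2 + \tau_E (S^2/M)^{1/2}$, and the quantity $\tau_E$ is selected so that the supremum of the Type I error rate equals α. Note that the supremum is attained when the two percentiles coincide with the boundary values $(\theta_{1-p}, \theta_p) = (\Delta_L, \Delta_U)$ or alternatively, $\mu_D = (\Delta_L + \Delta_U)/2$ and $\sigma_D^2 = (\Delta_U - \Delta_L)^2/(4z_p^2)$. Accordingly, the designated critical value $\tau_E$ is obtained by
$$P\{\Delta_L < \hat{\theta}_{EL} \ \text{and} \ \hat{\theta}_{EU} < \Delta_U \mid (\theta_{1-p}, \theta_p) = (\Delta_L, \Delta_U)\} = \alpha. \tag{12}$$
It follows from the normal assumption defined in Eq 1 that $Z = (\bar{X}_1 - \bar{X}_2 - \mu_D)/(\sigma^2/M)^{1/2} \sim N(0, 1)$ and $K = vS^2/\sigma^2 \sim \chi^2(v)$, where $\chi^2(v)$ is a chi-square distribution with degrees of freedom v. Also, Z and K are independent. Then, the probability evaluation in Eq 12 can be expressed as
$$P\{-G < Z < G\} = \alpha, \tag{13}$$
where $G = z_p(2M)^{1/2} - \tau_E(K/v)^{1/2}$. It is computationally transparent to adopt the formulation
$$E_K\{\Phi(G_0) - \Phi(-G_0)\} = \alpha, \tag{14}$$
where $G_0 = G$ if $K < k_0$ and $G_0 = 0$ if $K \ge k_0$ with $k_0 = (2vMz_p^2)/\tau_E^2$, Φ is the cumulative distribution function of the standard normal distribution, and the expectation $E_K[\cdot]$ is taken with respect to the distribution of K. A special-purpose computer program is required to calculate the critical value $\tau_E$ for the chosen model settings. Consequently, the null hypothesis is rejected if
$$T_L > \tau_E \ \text{and} \ T_U < -\tau_E, \tag{15}$$
where $T_L = (\bar{X}_1 - \bar{X}_2 - \Delta_L)/(S^2/M)^{1/2}$ and $T_U = (\bar{X}_1 - \bar{X}_2 - \Delta_U)/(S^2/M)^{1/2}$. Note that the critical values $\tau_E$ of the suggested approach and $\tau_T$ of the TOST procedure generally differ. For example, when (N1, N2) = (20, 20), α = 0.05, and p* = 0.80, the critical values are $\tau_E$ = 6.4527 and $\tau_T$ = 7.9987 for the suggested and TOST procedures, respectively. According to the rejection rules in Eqs 10 and 15, the TOST is less likely to reject the null hypothesis than the exact procedure because $\tau_T > \tau_E$. Therefore, the two critical regions $(\hat{\theta}_{TL}, \hat{\theta}_{TU})$ and $(\hat{\theta}_{EL}, \hat{\theta}_{EU})$ do not necessarily lead to the same conclusion. On the other hand, with the definitions of the two random variables Z and K, it can be shown that the corresponding power function is
$$\Psi_E = P\{G_L < Z < G_U\}, \tag{16}$$
where $G_L = (\Delta_L - \mu_D)/(\sigma^2/M)^{1/2} + \tau_E(K/v)^{1/2}$ and $G_U = (\Delta_U - \mu_D)/(\sigma^2/M)^{1/2} - \tau_E(K/v)^{1/2}$. Note that the power calculation is meaningful only when $G_L < G_U$, or equivalently $K < k_1$, where $k_1 = \{vM(\Delta_U - \Delta_L)^2\}/(4\tau_E^2\sigma^2)$. A transparent and convenient expression of the power function is
$$\Psi_E = E_K\{\Phi(G_{U1}) - \Phi(G_{L1})\}, \tag{17}$$
where $G_{L1} = G_L$ and $G_{U1} = G_U$ if $K < k_1$, and $G_{L1} = 0$ and $G_{U1} = 0$ if $K \ge k_1$. The power formula $\Psi_E$ is useful for computing the achieved power with the given sample sizes, and for determining the required sample sizes to attain the nominal power under the selected configurations $(\Delta_L, \Delta_U, p^*, \alpha, \mu_1, \mu_2, \sigma^2)$.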
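The critical value computation described above can be sketched numerically. The following is a minimal illustration, not the authors' supplementary SAS/IML or R programs: it assumes that the Type I error at the boundary is the expectation over K ~ χ²(v) of Φ(G0) – Φ(–G0), and that the TOST critical value is the 100(1 – α)th percentile of the noncentral t distribution, evaluated here through the representation T = (Z + δ)/(K/v)^{1/2}. The expectation is approximated by Simpson's rule, and all function names (`chi2_expect`, `type1_exact`, `nct_cdf`, `critical_values`) are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf        # standard normal CDF
Z = NormalDist().inv_cdf      # standard normal percentile


def chi2_expect(g, v, upper, n=4000):
    """Simpson approximation of E[g(K); K < upper] for K ~ chi-square(v), v > 2."""
    log_c = -(v / 2.0) * math.log(2.0) - math.lgamma(v / 2.0)

    def f(k):
        if k <= 0.0:
            return 0.0        # the density vanishes at zero when v > 2
        return g(k) * math.exp(log_c + (v / 2.0 - 1.0) * math.log(k) - k / 2.0)

    h = upper / n
    total = f(0.0) + f(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0


def k_cap(v):
    """Upper integration limit beyond which the chi-square(v) density is negligible."""
    return v + 12.0 * math.sqrt(2.0 * v) + 40.0


def type1_exact(tau, v, delta):
    """Boundary Type I error E_K[2*Phi(G0) - 1] with G = delta - tau*sqrt(K/v)."""
    k0 = v * (delta / tau) ** 2            # G > 0 only when K < k0
    return chi2_expect(lambda k: 2.0 * PHI(delta - tau * math.sqrt(k / v)) - 1.0,
                       v, min(k0, k_cap(v)))


def nct_cdf(t, v, delta):
    """P(T <= t) for noncentral t(v, delta), from T = (Z + delta) / sqrt(K / v)."""
    return chi2_expect(lambda k: PHI(t * math.sqrt(k / v) - delta), v, k_cap(v))


def bisect(fun, lo, hi, tol=1e-6):
    """Root of a monotone function on [lo, hi] whose end values differ in sign."""
    f_lo = fun(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = fun(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def critical_values(n1, n2, p_star, alpha=0.05):
    """Return (tau_E, tau_T) for the exact and TOST procedures."""
    p = (1.0 + p_star) / 2.0
    v = n1 + n2 - 2
    m = 1.0 / (1.0 / n1 + 1.0 / n2)
    delta = Z(p) * math.sqrt(2.0 * m)      # noncentrality z_p * (2M)^{1/2}
    tau_e = bisect(lambda t: type1_exact(t, v, delta) - alpha, 0.01, 200.0)
    tau_t = bisect(lambda t: nct_cdf(t, v, delta) - (1.0 - alpha), 0.01, 200.0)
    return tau_e, tau_t


tau_e, tau_t = critical_values(20, 20, 0.80)
print(f"tau_E = {tau_e:.4f}, tau_T = {tau_t:.4f}")
```

If the reading of Eq 14 used here is right, the two printed values should be close to the critical values 6.4527 and 7.9987 quoted in the text for (N1, N2) = (20, 20), α = 0.05, and p* = 0.80. The fixed-grid Simpson rule was chosen for transparency; an adaptive quadrature routine would be a natural replacement.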

Crossover design

In bioequivalence studies, a common scenario for comparing treatments is the two-period crossover design. Consider the standard two-sequence and two-period crossover design in terms of the model
$$Y_{ijk} = \mu + F_{ij} + P_j + S_{ik} + \varepsilon_{ijk},$$
where $Y_{ijk}$ is the outcome for the kth subject in the ith sequence and jth period, μ is the grand mean, $F_{ij}$ is the formulation effect, $P_j$ is the fixed period effect, $S_{ik}$ is the random subject effect, and $\varepsilon_{ijk}$ is the random error for i = 1 and 2, j = 1 and 2, and $k = 1, \ldots, N_i$. The formulation effects are expressed as $F_{11} = F_{22} = \mu_R$ and $F_{12} = F_{21} = \mu_T$ for the reference product and test product, respectively, $\{S_{ik}\}$ are independent $N(0, \sigma_S^2)$ variables, and $\{\varepsilon_{ijk}\}$ are independent $N(0, \sigma_{ij}^2)$ variables with $\sigma_{11}^2 = \sigma_{22}^2 = \sigma_R^2$ and $\sigma_{12}^2 = \sigma_{21}^2 = \sigma_T^2$. Moreover, it is assumed that $P_1 + P_2 = \mu_R + \mu_T = 0$. To establish individual equivalence between two treatments in the crossover design, the central portion of the distribution of the contrast $C_1 - C_2$ for the individual measurements of two treatments needs to be within a reasonable range around zero, where $C_{ik} = (Y_{i1k} - Y_{i2k})/2$ for i = 1 and 2. Accordingly, $(C_1 - C_2) \sim N(\mu_D, \sigma_D^2)$ where $\mu_D = \mu_R - \mu_T$, $\sigma_D^2 = 2\sigma^2$, and $\sigma^2 = (\sigma_R^2 + \sigma_T^2)/4$. The 100·pth percentile for the distribution of $(C_1 - C_2)$ is denoted by $\theta_p = \mu_D + z_p\sigma_D$ for 0 < p < 1 as in Eq 2. An unbiased estimator of the difference between the two treatments $\mu_D$ is the sample mean difference $\bar{C}_1 - \bar{C}_2$, where $\bar{C}_i = \sum_{k=1}^{N_i} C_{ik}/N_i$ for i = 1 and 2. It is clear that $E(\bar{C}_1) = \mu_D/2$, $E(\bar{C}_2) = -\mu_D/2$, $\mathrm{Var}(\bar{C}_1) = \sigma^2/N_1$, and $\mathrm{Var}(\bar{C}_2) = \sigma^2/N_2$. Hence, the mean difference $\bar{C}_1 - \bar{C}_2$ has the distribution
$$\bar{C}_1 - \bar{C}_2 \sim N(\mu_D, \sigma^2/M),$$
where $M = 1/(1/N_1 + 1/N_2)$. Moreover, $S^2 = \sum_{i=1}^{2}\sum_{k=1}^{N_i}(C_{ik} - \bar{C}_i)^2/v$ is an unbiased estimator of $\sigma^2$ and $K = (vS^2)/\sigma^2$ has a chi-square distribution with degrees of freedom $v = N_1 + N_2 - 2$. The formulations and properties for the crossover design closely resemble those of the parallel group design. Accordingly, the conceptual and statistical similarities enable the conversion of the individual equivalence inference of the parallel group design into that of the crossover design.

The TOST procedure for crossover design

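By analogy to the parallel group design, the individual equivalence problem within the context of crossover design can be conducted with respect to the null and alternative hypotheses given in Eq 3. Following the TOST principle for assessing equivalence of mean effects, Liu and Chow [14] proposed an extension for declaring individual equivalence based on the lower confidence limit of an upper 100(1 – α)% one-sided confidence interval of $\theta_{1-p}$ and the upper confidence limit of a lower 100(1 – α)% one-sided confidence interval of $\theta_p$. Specifically, Liu and Chow [14] suggested that the null hypothesis of no individual equivalence is rejected if
$$\Delta_L < \hat{\theta}_{CTL} \ \text{and} \ \hat{\theta}_{CTU} < \Delta_U \tag{20}$$
or, equivalently,
$$T_{CL} > \tau_{CT} \ \text{and} \ T_{CU} < -\tau_{CT}, \tag{21}$$
where $\hat{\theta}_{CTL} = \bar{C}_1 - \bar{C}_2 - \tau_{CT}(S^2/M)^{1/2}$, $\hat{\theta}_{CTU} = \bar{C}_1 - \bar{C}_2 + \tau_{CT}(S^2/M)^{1/2}$, $T_{CL} = (\bar{C}_1 - \bar{C}_2 - \Delta_L)/(S^2/M)^{1/2}$, $T_{CU} = (\bar{C}_1 - \bar{C}_2 - \Delta_U)/(S^2/M)^{1/2}$, and the critical value $\tau_{CT} = \tau_T = t_{1-\alpha}(v, z_p(2M)^{1/2})$.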

The proposed procedure for crossover design

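For the crossover design, the proposed exact rejection region for declaring individual equivalence is of the form
$$\Delta_L < \hat{\theta}_{CEL} \ \text{and} \ \hat{\theta}_{CEU} < \Delta_U, \tag{22}$$
where $\hat{\theta}_{CEL} = \bar{C}_1 - \bar{C}_2 - \tau_{CE}(S^2/M)^{1/2}$, $\hat{\theta}_{CEU} = \bar{C}_1 - \bar{C}_2 + \tau_{CE}(S^2/M)^{1/2}$, and the quantity $\tau_{CE}$ is selected so that the supremum of the Type I error rate equals α. This evaluation of the Type I error rate has the same statistical property as that of the parallel group design. The critical value can be obtained with the identical technique. Consequently, with the similar argument and notation, it can be shown that the critical value $\tau_{CE}$ is identical to that of the parallel group design: $\tau_{CE} = \tau_E$. Alternatively, the null hypothesis is rejected if
$$T_{CL} > \tau_{CE} \ \text{and} \ T_{CU} < -\tau_{CE}, \tag{23}$$
where $T_{CL} = (\bar{C}_1 - \bar{C}_2 - \Delta_L)/(S^2/M)^{1/2}$ and $T_{CU} = (\bar{C}_1 - \bar{C}_2 - \Delta_U)/(S^2/M)^{1/2}$. The corresponding power function is
$$\Psi_{CE} = P\{G_{CL} < Z < G_{CU}\}, \tag{24}$$
where $G_{CL} = (\Delta_L - \mu_D)/(\sigma^2/M)^{1/2} + \tau_{CE}(K/v)^{1/2}$, $G_{CU} = (\Delta_U - \mu_D)/(\sigma^2/M)^{1/2} - \tau_{CE}(K/v)^{1/2}$, $Z \sim N(0, 1)$, and $K \sim \chi^2(v)$. For computational ease, an alternative formulation of $\Psi_{CE}$ is
$$\Psi_{CE} = E_K\{\Phi(G_{CU1}) - \Phi(G_{CL1})\}, \tag{25}$$
where $G_{CL1} = G_{CL}$ and $G_{CU1} = G_{CU}$ if $K < k_{C1}$, $G_{CL1} = 0$ and $G_{CU1} = 0$ if $K \ge k_{C1}$, and $k_{C1} = \{vM(\Delta_U - \Delta_L)^2\}/(4\tau_{CE}^2\sigma^2)$. For ease of illustration, the endpoints of the prescribed test procedures for parallel group and crossover designs are summarized in Table 1.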

Table 1

The endpoints of the proposed and TOST rejection rules.

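Methods	Endpoints	Equation
The TOST procedure by Tsong and Shen [15]: $\{\hat{\theta}_{TL}, \hat{\theta}_{TU}\}$	$\hat{\theta}_{TL} = \bar{X}_1 - \bar{X}_2 - \tau_T S/M^{1/2}$; $\hat{\theta}_{TU} = \bar{X}_1 - \bar{X}_2 + \tau_T S/M^{1/2}$	9
The proposed procedure: $\{\hat{\theta}_{EL}, \hat{\theta}_{EU}\}$	$\hat{\theta}_{EL} = \bar{X}_1 - \bar{X}_2 - \tau_E S/M^{1/2}$; $\hat{\theta}_{EU} = \bar{X}_1 - \bar{X}_2 + \tau_E S/M^{1/2}$	11
The TOST procedure by Liu and Chow [14]: $\{\hat{\theta}_{CTL}, \hat{\theta}_{CTU}\}$	$\hat{\theta}_{CTL} = \bar{C}_1 - \bar{C}_2 - \tau_{CT} S/M^{1/2}$; $\hat{\theta}_{CTU} = \bar{C}_1 - \bar{C}_2 + \tau_{CT} S/M^{1/2}$	20
The proposed procedure: $\{\hat{\theta}_{CEL}, \hat{\theta}_{CEU}\}$	$\hat{\theta}_{CEL} = \bar{C}_1 - \bar{C}_2 - \tau_{CE} S/M^{1/2}$; $\hat{\theta}_{CEU} = \bar{C}_1 - \bar{C}_2 + \tau_{CE} S/M^{1/2}$	22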

Results

Type I errors

The suggested test procedures are derived by controlling the Type I error at the nominal level. Although the critical values do not have an explicit analytic expression, they can be determined with the designated configurations $(N_1, N_2, p^*, \alpha, \Delta_L, \Delta_U)$. On the other hand, the TOST procedures generalize the results for mean equivalence assessment and tolerance interval estimation. The resulting critical values and rejection regions are not directly obtained with respect to the Type I error control in hypothesis testing. It is of theoretical and practical importance to evaluate the potential discrepancy between the proposed approach and the benchmark TOST method. Accordingly, a simulation study was conducted to examine the Type I error rates under the parallel group designs. For the numerical investigations, the selected central proportions of the individual equivalence tests are p* = 0.80, 0.90 and 0.95. The mean and variance of the null distribution $N(\mu_D, \sigma_D^2)$ for the individual measurement difference are chosen as $\mu_D = 0$ and $\sigma_D^2 = 1$. The designated thresholds $(\Delta_L, \Delta_U)$ are determined by $\Delta_L = \mu_D - z_p\sigma_D$ and $\Delta_U = \mu_D + z_p\sigma_D$. The resulting similarity bounds are $(\Delta_L, \Delta_U)$ = (–1.2816, 1.2816), (–1.6449, 1.6449), and (–1.9600, 1.9600) for p = 0.90, 0.95, and 0.975, respectively. Four sets of sample sizes are considered: (N1, N2) = (20, 20), (50, 50), (100, 100), and (200, 200). Throughout the empirical examination, the significance level is fixed as α = 0.05. Under the twelve combinations of central proportions and sample sizes, an important step is to compute the critical values $\tau_E$ and $\tau_T$ of the proposed and TOST procedures for the specified settings. According to the results presented in Table 2, the two critical values have a systematic order: $\tau_E$ is consistently less than $\tau_T$. Hence, the TOST method has a smaller rejection rate than the suggested approach.

Table 2

The critical values of the proposed and TOST procedures for individual equivalence when the significance level α = 0.05.

		Sample sizes (N₁, N₂)
Test procedure	Central proportion p*	(20, 20)	(50, 50)	(100, 100)	(200, 200)
The proposed approach	0.80	6.4527	9.7099	13.4337	18.7232
TOST method		7.9987	11.1886	14.8840	20.1553
The proposed approach	0.90	8.4041	12.5728	17.3474	24.1334
TOST method		9.8812	13.9793	18.7236	25.4901
The proposed approach	0.95	10.1084	15.0664	20.7517	28.8354
TOST method		11.5352	16.4203	22.0744	30.1377

The simulated Type I error rates of the individual equivalence tests were computed via Monte Carlo simulation of 10,000 independent data sets. For the two test procedures, the simulated Type I error rates were the proportion of the 10,000 replicates whose critical intervals $(\hat{\theta}_{EL}, \hat{\theta}_{EU})$ and $(\hat{\theta}_{TL}, \hat{\theta}_{TU})$ were within the range of $(\Delta_L, \Delta_U)$. The simulated Type I error probabilities under the four different sample sizes are summarized in Tables 3–5 for the three central proportions p* = 0.80, 0.90, and 0.95, respectively. The adequacy of the two procedures is determined by the difference between the simulated Type I error rate and the nominal level 0.05 as summarized in the tables. To visualize the differences between the two procedures, the simulated results for p* = 0.90 in Table 4 are also plotted in Fig 1. It is evident that the simulated Type I error rates of the suggested approach are almost identical to the nominal value 0.05. In contrast, the simulated Type I error probabilities of the TOST method are less than 0.01 for the 12 settings considered here. These findings suggest that the proposed procedure has adequate Type I error control, whereas the TOST procedure is extremely conservative.
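The Monte Carlo evaluation described above can be mimicked with a short standard-library sketch. It is an illustration under stated assumptions rather than the authors' program: data are drawn from two normal populations with common variance σ² = σ_D²/2 under the null distribution N(0, 1) of the difference, the critical values are taken from Table 2, and the function name `simulate_type1`, the seed, and the replicate count are choices made here.

```python
import math
import random
from statistics import NormalDist


def simulate_type1(n1, n2, p_star, tau, n_rep=10_000, seed=1):
    """Share of replicates whose interval (theta_L_hat, theta_U_hat) lies inside (Delta_L, Delta_U)."""
    rng = random.Random(seed)
    p = (1.0 + p_star) / 2.0
    z_p = NormalDist().inv_cdf(p)
    mu_d, var_d = 0.0, 1.0                       # null distribution N(0, 1) of X1 - X2
    d_lo = mu_d - z_p * math.sqrt(var_d)         # Delta_L on the boundary of H0
    d_up = mu_d + z_p * math.sqrt(var_d)         # Delta_U on the boundary of H0
    sd = math.sqrt(var_d / 2.0)                  # common sigma, since sigma_D^2 = 2 * sigma^2
    m = 1.0 / (1.0 / n1 + 1.0 / n2)
    v = n1 + n2 - 2
    hits = 0
    for _ in range(n_rep):
        x1 = [rng.gauss(mu_d, sd) for _ in range(n1)]
        x2 = [rng.gauss(0.0, sd) for _ in range(n2)]
        m1, m2 = sum(x1) / n1, sum(x2) / n2
        ss = sum((x - m1) ** 2 for x in x1) + sum((x - m2) ** 2 for x in x2)
        half = tau * math.sqrt(ss / v / m)       # tau * (S^2 / M)^{1/2}
        if d_lo < m1 - m2 - half and m1 - m2 + half < d_up:
            hits += 1
    return hits / n_rep


# Critical values for (N1, N2) = (20, 20) and p* = 0.80 as listed in Table 2
print("proposed:", simulate_type1(20, 20, 0.80, tau=6.4527))
print("TOST:    ", simulate_type1(20, 20, 0.80, tau=7.9987))
```

With 10,000 replicates the Monte Carlo standard error near 0.05 is about 0.002, so individual runs will scatter around the nominal level by a few thousandths, as the tabulated differences also show.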

Table 3

The simulated Type I error rates of individual equivalence tests for central proportion p* = 0.80, equivalence bounds $(\Delta_L, \Delta_U)$ = (–1.2816, 1.2816), and the significance level α = 0.05.

	Sample sizes (N₁, N₂)
	(20, 20)		(50, 50)		(100, 100)		(200, 200)
Test procedure	Simulated alpha	Difference	Simulated alpha	Difference	Simulated alpha	Difference	Simulated alpha	Difference
The proposed approach	0.0541	0.0041	0.0486	–0.0014	0.0506	0.0006	0.0496	–0.0004
TOST procedure	0.0011	–0.0489	0.0008	–0.0492	0.0004	–0.0496	0.0004	–0.0496

Note: $\Delta_L = \mu_D - z_p\sigma_D$ and $\Delta_U = \mu_D + z_p\sigma_D$ where $\mu_D = 0$, $\sigma_D^2 = 1$, p = 0.90, and $z_p$ = 1.2816.

Table 4

The simulated Type I error rates of individual equivalence tests for central proportion p* = 0.90, equivalence bounds $(\Delta_L, \Delta_U)$ = (–1.6449, 1.6449), and the significance level α = 0.05.

	Sample sizes (N₁, N₂)
	(20, 20)		(50, 50)		(100, 100)		(200, 200)
Test procedure	Simulated alpha	Difference	Simulated alpha	Difference	Simulated alpha	Difference	Simulated alpha	Difference
The proposed approach	0.0530	0.0030	0.0492	–0.0008	0.0489	–0.0011	0.0514	0.0014
TOST procedure	0.0029	–0.0471	0.0026	–0.0474	0.0019	–0.0491	0.0014	–0.0486

Note: $\Delta_L = \mu_D - z_p\sigma_D$ and $\Delta_U = \mu_D + z_p\sigma_D$ where $\mu_D = 0$, $\sigma_D^2 = 1$, p = 0.95, and $z_p$ = 1.6449.

Table 5

The simulated Type I error rates of individual equivalence tests for central proportion p* = 0.95, equivalence bounds $(\Delta_L, \Delta_U)$ = (–1.9600, 1.9600), and the significance level α = 0.05.

	Sample sizes (N₁, N₂)
	(20, 20)		(50, 50)		(100, 100)		(200, 200)
Test procedure	Simulated alpha	Difference	Simulated alpha	Difference	Simulated alpha	Difference	Simulated alpha	Difference
The proposed approach	0.0518	0.0018	0.0502	0.0002	0.0493	–0.0007	0.0522	0.0022
TOST procedure	0.0056	–0.0444	0.0041	–0.0459	0.0032	–0.0468	0.0031	–0.0469

Note: $\Delta_L = \mu_D - z_p\sigma_D$ and $\Delta_U = \mu_D + z_p\sigma_D$ where $\mu_D = 0$, $\sigma_D^2 = 1$, p = 0.975, and $z_p$ = 1.9600.

Fig 1

Simulated Type I error rates for central proportion 0.90 and α = 0.05.


Power and sample size calculations

A related and important issue of the individual equivalence test is the power and sample size calculations. The power functions derived in Eqs 17 and 25 facilitate the desired power and sample size planning of the parallel group and crossover designs. The algorithms for computing the critical value, achieved power, and sample size are implemented in the supplementary programs. Accordingly, numerical studies were conducted to explicate the behavior of the derived power function and the usefulness of the accompanying computer algorithm in sample size determinations. Sample size determination requires the test configurations of Type I error rate α, nominal power 1 – β, equivalence bounds $(\Delta_L, \Delta_U)$, and null central proportion p*, while the alternative settings include the mean values $(\mu_1, \mu_2)$, error variance $\sigma^2$, and sample size allocation ratio $r = N_2/N_1$. Note that the resulting percentiles $\theta_{1-p}$ and $\theta_p$ need to be within the designated bounds $(\Delta_L, \Delta_U)$ under the alternative distribution $N(\mu_D, \sigma_D^2)$. For illustration, two central proportions are considered: p* = 0.90 and 0.95 (p = 0.95 and 0.975). By fixing the null distribution $N(\mu_D, \sigma_D^2)$ as N(0, 1), the resulting two sets of threshold bounds are $(\Delta_L, \Delta_U)$ = (–1.6449, 1.6449) and (–1.9600, 1.9600). The alternative distributions are chosen to have the treatment means $(\mu_1, \mu_2)$ = (0, 0), (0.05, 0), and (0.10, 0), and variance $\sigma_D^2$ = 0.6, 0.7 and 0.8. Under the specified configurations, the minimum total sample size $N_T = N_1 + N_2$ is computed for balanced design r = 1 ($N_1 = N_2$), significance level α = 0.05, and nominal power 1 – β = 0.9. The estimated sample sizes and attained power levels are summarized in Table 6 for the combined 18 cases. The minimum sample size for attaining the nominal power increases with increasing mean difference $\mu_D$ or increasing variance $\sigma_D^2$ when all other factors remain fixed. It is essential to see that the magnitudes of the computed sample sizes are substantially different for the settings considered here.
The smallest sample size is 80 for the setting of $(p^*, \mu_D, \sigma_D^2)$ = (0.95, 0, 0.6). On the other hand, the largest sample size 1852 is required for the situation with $(p^*, \mu_D, \sigma_D^2)$ = (0.90, 0.10, 0.8). The results indicate that the prescribed test configurations have unique and distinct influence on the power function. Conceivably, it is unlikely that a simple guideline will give accurate sample size determination.
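A sample size search of the kind described above can be sketched as follows. This is a minimal standard-library illustration, not the supplementary SAS/IML or R code: it assumes the power is the expectation over K ~ χ²(v) of Φ(G_U) – Φ(G_L) restricted to G_L < G_U, recomputes the exact critical value for every candidate sample size, and approximates the expectation by Simpson's rule. The function names and the simple upward search are choices made for this sketch.

```python
import math
from statistics import NormalDist

PHI, Z = NormalDist().cdf, NormalDist().inv_cdf


def chi2_expect(g, v, upper, n=2000):
    """Simpson approximation of E[g(K); K < upper] for K ~ chi-square(v), v > 2."""
    log_c = -(v / 2.0) * math.log(2.0) - math.lgamma(v / 2.0)
    h, total = upper / n, 0.0
    for i in range(1, n + 1):              # the integrand vanishes at k = 0 when v > 2
        k = i * h
        w = 1 if i == n else (4 if i % 2 else 2)
        total += w * g(k) * math.exp(log_c + (v / 2.0 - 1.0) * math.log(k) - k / 2.0)
    return total * h / 3.0


def k_cap(v):
    """Upper integration limit beyond which the chi-square(v) density is negligible."""
    return v + 12.0 * math.sqrt(2.0 * v) + 40.0


def tau_exact(n1, n2, p, alpha):
    """Solve E_K[2*Phi(G0) - 1] = alpha for tau by bisection (the rate falls as tau grows)."""
    v, m = n1 + n2 - 2, 1.0 / (1.0 / n1 + 1.0 / n2)
    delta = Z(p) * math.sqrt(2.0 * m)

    def type1(tau):
        k0 = v * (delta / tau) ** 2
        return chi2_expect(lambda k: 2.0 * PHI(delta - tau * math.sqrt(k / v)) - 1.0,
                           v, min(k0, k_cap(v)))

    lo, hi = 0.01, 200.0
    while hi - lo > 1e-6:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if type1(mid) > alpha else (lo, mid)
    return 0.5 * (lo + hi)


def power(n1, n2, p_star, alpha, d_lo, d_up, mu_d, var_d):
    """Power E_K[Phi(G_U) - Phi(G_L); K < k1] of the exact test under N(mu_d, var_d)."""
    p = (1.0 + p_star) / 2.0
    v, m = n1 + n2 - 2, 1.0 / (1.0 / n1 + 1.0 / n2)
    tau = tau_exact(n1, n2, p, alpha)
    se = math.sqrt(var_d / 2.0 / m)        # (sigma^2 / M)^{1/2} with sigma^2 = sigma_D^2 / 2
    a_lo, a_up = (d_lo - mu_d) / se, (d_up - mu_d) / se
    k1 = v * ((a_up - a_lo) / (2.0 * tau)) ** 2   # G_L < G_U only when K < k1

    def g(k):
        r = tau * math.sqrt(k / v)
        return PHI(a_up - r) - PHI(a_lo + r)

    return chi2_expect(g, v, min(k1, k_cap(v)))


def balanced_sample_size(p_star, alpha, target, d_lo, d_up, mu_d, var_d,
                         n_start=3, n_max=2000):
    """Smallest per-group n >= n_start (with N1 = N2 = n) whose power reaches the target."""
    for n in range(n_start, n_max + 1):
        pw = power(n, n, p_star, alpha, d_lo, d_up, mu_d, var_d)
        if pw >= target:
            return n, pw
    raise ValueError("target power not reached within n_max")


z = Z(0.95)                                # bounds +/- 1.6449 for p* = 0.90
n, pw = balanced_sample_size(0.90, 0.05, 0.90, -z, z, mu_d=0.0, var_d=0.6, n_start=35)
print("total sample size:", 2 * n, "attained power:", round(pw, 4))
```

The configuration in the usage lines corresponds to the first row of Table 6, so the printed total sample size and power can be compared with the tabulated entries. The linear search is slow for the largest cases in the table; bracketing the sample size first and then bisecting would be faster, under the assumption that power increases with sample size.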

Table 6

Estimated sample size, estimated power, and simulated power of the proposed individual equivalence test for balanced design $N_1 = N_2$, $\sigma^2 = \sigma_D^2/2$, the nominal power 0.90, and the significance level α = 0.05.

Null proportion p*	Equivalence bounds $(\Delta_L, \Delta_U)$	Mean $\mu_D$	Variance $\sigma_D^2$	Total sample size $N_T$	Simulated power	Estimated power	Difference
0.90	(–1.6449, 1.6449)	0	0.6	86	0.9020	0.9008	0.0012
			0.7	182	0.8973	0.9004	–0.0031
			0.8	482	0.9034	0.9009	0.0025
		0.05	0.6	92	0.9026	0.9005	0.0021
			0.7	210	0.9019	0.9020	–0.0001
			0.8	678	0.9012	0.9005	0.0007
		0.10	0.6	116	0.9013	0.9027	–0.0014
			0.7	322	0.8961	0.9005	–0.0044
			0.8	1852	0.9032	0.9001	0.0031
0.95	(–1.9600, 1.9600)	0	0.6	80	0.8981	0.9006	–0.0025
			0.7	168	0.9036	0.9007	0.0029
			0.8	440	0.9029	0.9003	0.0026
		0.05	0.6	86	0.9075	0.9057	0.0018
			0.7	186	0.9021	0.9008	0.0013
			0.8	566	0.8988	0.9002	–0.0014
		0.10	0.6	100	0.9039	0.9029	0.0010
			0.7	256	0.9033	0.9012	0.0021
			0.8	1170	0.8999	0.9000	–0.0001

Furthermore, under the prescribed model configurations, simulation study was conducted to justify the accuracy of the proposed power and sample size procedures. Specifically, the simulated power of the proposed test procedure was computed via Monte Carlo simulation of 10,000 independent data sets. The simulated power and the difference between the simulated power and estimated power are also presented in Table 6. For each of the 18 scenarios, the small difference reveals that the simulated power is nearly identical to the estimated power. The accuracy of the described power and sample size procedures is fairly consistent under various sample size and parameter configurations. Consequently, these findings suggest that the developed power and sample size algorithms are reliable for practical applications.

An application

A bioequivalence study was presented in Liu and Chow [14] to demonstrate the assessment of individual equivalence between two drug formulations. Under the standard setting of the two-sequence two-period crossover design, the responses are the area under the plasma concentration-time curve (AUC). The sample sizes, sample mean difference, and residual error variance of the logarithmic transformation of AUC are $N_1 = N_2 = 10$, $\bar{C}_1 - \bar{C}_2 = 0.05331$, and $S^2 = 0.0378$, respectively. To declare individual equivalence between the test and reference formulations, it is assumed that at least p* = 0.75 of the differences between two individual formulation measurements are within the bounds $\Delta_L$ = ln(0.80) = –0.2231 and $\Delta_U$ = ln(1.25) = 0.2231. Accordingly, the test statistics in Eq 21 can be computed as $T_{CL}$ = 3.1801 and $T_{CU}$ = –1.9537. With α = 0.05, the critical values of the TOST and proposed procedures are $\tau_{CT}$ = 6.0173 and $\tau_{CE}$ = 4.3436, respectively. Also, the two associated critical regions are $(\hat{\theta}_{CTL}, \hat{\theta}_{CTU})$ = (–0.4698, 0.5764) and $(\hat{\theta}_{CEL}, \hat{\theta}_{CEU})$ = (–0.3243, 0.4309). Thus, both test procedures conclude that the null hypothesis of no individual equivalence cannot be rejected at the significance level 0.05. Under the normal assumptions, the difference between two individual formulation measurements has the distribution $(C_1 - C_2) \sim N(\mu_D, \sigma_D^2)$. Using the summary statistics as exemplifying parameter values $(\mu_D, \sigma_D^2)$ = (0.05331, 0.0756), the proportion between the two bounds $(\Delta_L, \Delta_U)$ = (–0.2231, 0.2231) for the normal distribution $N(\mu_D, \sigma_D^2)$ is the probability $P(\Delta_L < C_1 - C_2 < \Delta_U)$ = 0.5744. Note that the coverage probability is substantially less than the nominal value 0.75 for declaring individual equivalence. For illustration, the working parameters are chosen as $\mu_D$ = 0.02, 0.03, 0.04, and 0.05 and $\sigma_D^2$ = 0.0756/4. To meet the nominal power 0.80, the estimated sample sizes are (N1, N2) = (25, 25), (37, 37), (69, 69), and (183, 183) with the achieved power levels 0.8017, 0.8035, 0.8024, and 0.8002, respectively.
Evidently, these sample sizes are larger than the sample sizes (N1, N2) = (10, 10) of the previous analysis. This underscores the importance of accurate power and sample size procedures in planning individual equivalence studies. The accompanying computer algorithms are also presented for conducting the suggested power and sample size calculations.
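The arithmetic of this application can be retraced from the reported summary statistics. The sketch below uses only the Python standard library and takes the two critical values as quoted in the text rather than recomputing them; the variable and function names are chosen for this illustration. Because the inputs are rounded summary values, the results agree with the reported figures only to about three decimal places.

```python
import math
from statistics import NormalDist

n1 = n2 = 10
diff, s2 = 0.05331, 0.0378               # reported mean difference and S^2 on the log AUC scale
d_lo, d_up = math.log(0.80), math.log(1.25)
tau_tost, tau_exact = 6.0173, 4.3436     # critical values quoted in the text for alpha = 0.05

m = 1.0 / (1.0 / n1 + 1.0 / n2)
se = math.sqrt(s2 / m)                   # (S^2 / M)^{1/2}

# Test statistics of Eq 21
t_lo = (diff - d_lo) / se
t_up = (diff - d_up) / se


def decide(tau):
    """Interval endpoints and the rejection decision for a given critical value."""
    lower, upper = diff - tau * se, diff + tau * se
    return (lower, upper), (d_lo < lower and upper < d_up)


tost_interval, tost_reject = decide(tau_tost)
exact_interval, exact_reject = decide(tau_exact)

# Plug-in proportion of N(diff, 2 * s2) between the equivalence bounds
dist = NormalDist(diff, math.sqrt(2.0 * s2))
coverage = dist.cdf(d_up) - dist.cdf(d_lo)

print(f"T_CL = {t_lo:.4f}, T_CU = {t_up:.4f}")
print("TOST :", tuple(round(x, 4) for x in tost_interval), "reject:", tost_reject)
print("exact:", tuple(round(x, 4) for x in exact_interval), "reject:", exact_reject)
print(f"coverage = {coverage:.4f}")
```

Neither interval falls inside (ln 0.80, ln 1.25), so both procedures fail to reject, and the plug-in coverage is well below the required 0.75.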

Conclusions

The conventional TOST of mean focuses only on the equivalence of population means between the test and reference formulations. Therefore, the TOST of mean equivalence or average equivalence does not take into account the variability of formulation difference in bioavailability across subjects. In view of the limitation of average equivalence, Chen [24] identified several desirable features of bioequivalence criteria. The criteria include the assurance of switchability between formulations, the control of Type I error rate at 5%, determination of appropriate sample size, and user-friendly software application for the statistical method. Related considerations of individual equivalence can be found in the additional discussion in Chen et al. [25] and Chen and Lesko [26]. To address these issues, this article presents exact tests for assessing individual equivalence under parallel group and crossover designs. The numerical results showed that the TOST procedures based on tolerance intervals are overly conservative. More importantly, the exact approach has excellent Type I error control and can be recommended for routine use. Computer programs are also developed to implement the proposed equivalence test, power calculation, and sample size determination. The research designs and test procedures considered here are valid only if the homogeneous variance assumption is satisfied. The degree of robustness presumably depends on the extent of how badly the homogeneity of variance assumption is violated. Future research can explore possible extensions to accommodate heterogeneity of variance settings.

SAS/IML programs for performing the suggested procedures.


R programs for performing the suggested procedures.


11 in total

1. An individual bioequivalence criterion: regulatory considerations.

Authors: M L Chen; R Patnaik; W W Hauck; D J Schuirmann; T Hyslop; R Williams
Journal: Stat Med Date: 2000-10-30 Impact factor: 2.373

Review 2. Individual bioequivalence revisited.

Authors: M L Chen; L J Lesko
Journal: Clin Pharmacokinet Date: 2001 Impact factor: 6.447

3. Bioequivalence revisited.

Authors: L B Sheiner
Journal: Stat Med Date: 1992-09-30 Impact factor: 2.373

Review 4. Types of bioequivalence and related statistical considerations.

Authors: W W Hauck; S Anderson
Journal: Int J Clin Pharmacol Ther Toxicol Date: 1992-05

5. An alternative approach to assess exchangeability of a test treatment and the standard treatment with normally distributed response.

Authors: Yi Tsong; Meiyu Shen
Journal: J Biopharm Stat Date: 2007 Impact factor: 1.051