Literature DB >> 35199380

Sample size estimation using a latent variable model for mixed outcome co-primary, multiple primary and composite endpoints.

Martina E McMenamin^1,2, Jessica K Barrett¹, Anna Berglind³, James M S Wason^1,4.

Abstract

Mixed outcome endpoints that combine multiple continuous and discrete components are often employed as primary outcome measures in clinical trials. These may be in the form of co-primary endpoints, which conclude effectiveness overall if an effect occurs in all of the components, or multiple primary endpoints, which require an effect in at least one of the components. Alternatively, they may be combined to form composite endpoints, which reduce the outcomes to a one-dimensional endpoint. There are many advantages to joint modeling the individual outcomes, however in order to do this in practice we require techniques for sample size estimation. In this article we show how the latent variable model can be used to estimate the joint endpoints and propose hypotheses, power calculations and sample size estimation methods for each. We illustrate the techniques using a numerical example based on a four-dimensional endpoint and find that the sample size required for the co-primary endpoint is larger than that required for the individual endpoint with the smallest effect size. Conversely, the sample size required in the multiple primary case is similar to that needed for the outcome with the largest effect size. We show that the empirical power is achieved for each endpoint and that the FWER can be sufficiently controlled using a Bonferroni correction if the correlations between endpoints are less than 0.5. Otherwise, less conservative adjustments may be needed. We further illustrate empirically the efficiency gains that may be achieved in the composite endpoint setting.

Entities: Chemical

Keywords: latent variable modeling; mixed outcome endpoints; sample size estimation

Mesh：

Year: 2022 PMID： 35199380 PMCID： PMC7612654 DOI： 10.1002/sim.9356

Source DB: PubMed Journal: Stat Med ISSN： 0277-6715 Impact factor: 2.497

Introduction

Sample size estimation plays an integral role in the design of a study. The objective is to determine the minimum sample size that is large enough to detect, with a specified power, a clinically meaningful treatment effect. Although it is crucial that investigators have enough patients enrolled to detect this effect, overestimating the sample size also has ethical and practical implications. Namely, in a placebo-controlled trial, more patients are subjected to a placebo arm than is necessary, therefore withholding access to potentially beneficial drugs from them and delaying access to future patients.[1-3] Furthermore it results in longer, more expensive trials, using resources that could be allocated elsewhere. One vital aspect of sample size determination is the primary endpoint. Typically this is a single outcome, however in some instances there may be multiple outcomes of interest and so various combinations of these outcomes can be selected as the primary endpoint, depending on the hypothesis of interest. Assuming we have three outcomes of interest ν1, ν2 and ν3, one option is a co-primary endpoint, which takes the form of the multivariate endpoint ν1 ∩ ν2 ∩ ν3. This means that an intervention is deemed to be effective overall if it is shown to be effective in each of ν1, ν2, and ν3. Alternatively multiple primary endpoints may be of interest, which take the multivariate form v1 ∪ ν2 ∪ ν3, where an intervention is deemed effective if it is shown to be effective in at least one of ν1, ν2, or ν3. Another possibility is a composite endpoint, involving some function that maps the multivariate outcome to a univariate outcome for inference, for example ν1 + ν2 + ν3. In this case the outcomes within the composite may be assigned equal or differing degrees of relevance depending on clinical importance.[4] Alternatively, the composite endpoint may combine outcomes by labeling patients as ‘responders’ or ‘non-responders’ based on whether they exceed predefined thresholds in each of the outcomes. For instance, we let a response indicator S = 1 if ν1 ≤ η1, ν2 ≤ η, and ν3 ≤ η3, where η denotes the response cutpoints. Note that the composite case is distinct from the others in that it combines the parameters and hence test statistics for each outcome into one, rather than these remaining separate for each outcome. This will have implications for sample size estimation. For each of these endpoints, the individual outcomes may be a mix of multiple continuous, ordinal, and binary measures. One possible way to jointly model the outcomes is using a latent variable framework, arising in the graphical modeling literature, in which discrete outcomes are assumed to be latent continuous variables subject to estimable thresholds and modeled using a multivariate normal distribution.[5,6] By employing this framework we can take account of the correlation between the outcomes, improve the handling of missing data in individual components and potentially increase efficiency. Furthermore, in the case of multiple primary outcomes, it may reduce the severity of multiple testing corrections required by accounting for correlation between endpoints. A barrier to adopting these techniques is a lack of consensus on sample size determination. A recent and comprehensive overview of the existing literature for sample size calculation in clinical trials with co-primary and multiple primary endpoints is provided by Sozu et al.[7] The review found many proposals for power and sample size calculations for multiple continuous outcomes. In the co-primary case, some of these were based on assuming that the endpoints were bivariate normally distributed,[8,9] and extended for the case of more than two endpoints.[10,11] Other work focused on testing procedures[12,13] and controlling the type I error rate.[14-17] Similar ideas were investigated for multiple primary endpoints.[14,17-19] Approaches to sample size estimation for composite endpoints have focused primarily on the case of multiple binary components.[20-25] In the case of binary co-primary endpoints, five methods of power and sample size calculation based on three association measures have been introduced.[26] Additionally, sample size calculation for trials using multiple risk ratios and odds ratios for treatment effect estimation is discussed by Hamasaki et al,[27] and Song[28] explores co-primary endpoints in non-inferiority clinical trials. Consideration has also been given to the case where two co-primary endpoints are both time-to-event measures where effects are required in both endpoints,[29-31] and at least one of the endpoints.[32,33] Furthermore, composites comprised of time-to-event measures are common, in which the composite reflects time-to-first-event variable.[34] Sample size estimation in this case has been considered by Sugimoto et al.[35] The mixed outcome setting has received substantially less consideration. One paper considers overall power functions and sample size determinations for multiple co-primary endpoints that consist of mixed continuous and binary variables.[36] They assume that response variables follow a multivariate normal distribution, where binary variables are observed in a dichotomized normal distribution, and use Pearson’s correlations for association. A modification was suggested by Wu and de Leon[37] which involved using latent-level tests and pairwise correlations, and provided increased power. Thesemethods focus on the co-primary endpoint case,where effects are required in all outcomes. The case ofmultiple primary or composite endpoints where the components are measured on different scales has not been considered, each of which will require distinct hypotheses. In practice, if a mixed outcome composite is selected as the primary endpoint in a trial then the sample size calculation may be based on an overall binary endpoint or collapsed to form multiple binary endpoints however this will result in a large loss in efficiency.[38] In this article we build on the existing work for co-primary continuous and binary endpoints to include any combination of continuous, ordinal, and binary outcomes for co-primary, multiple primary, and composite endpoints. We propose a framework based on the same latent variable model and show how it may be tailored to each of the three endpoints to facilitate sample size estimation. The article will proceed as follows: in Section 2 we introduce the latent variable model, detailing how it can be used in each context, and specify hypothesis tests for each of the three combinations of mixed outcomes; in Section 3 we propose power calculations and sample size estimation techniques in each case; in Section 4 we illustrate the methods on a four dimensional endpoint consisting of two continuous, one ordinal and one binary outcome using a numerical example based on the MUSE trial;[39] and in Section 5 we simulate the empirical power for each test and the FWER for the union-intersection test. We conclude with a discussion and recommendations for practice in Section 6, and introduce user-friendly software and documentation for implementation in Section 7.

Endpoints and Hypothesis Testing

Latent variable framework

Let n and n represent the number of patients in the treatment group and the control group respectively and let K be the number of outcomes measured for each patient. Let be vector of K responses for patient i on the treatment arm and the vector of K responses for patient i on the control arm. Without loss of generality, the first elements of Y and Y are observed as continuous variables, the next are observed as ordinal and the remaining are observed as binary. For instance, for a three dimensional endpoint with one continuous, one ordinal and one binary measure, k = 1, k = 2, and K = 3. We use the biserial model of association by Tate,[40] which is based on latent continuous measures manifesting as discrete variables. Formally, we say that Y and Y have latent variables and respectively, where and where and x…x denotes the p covariates included in the model for outcome k. Likewise are the corresponding quantities for the control arm. Then for let and where and are the associationmeasures between the endpoints. For and The latent variables can be related to the observed variables by: and We set τ 0 = −∞, τ(w+1) = ∞ and the intercepts μ and μ equal to zero for k < k ≤ ko in order to estimate the cut-points. Additionally, τ0 = − ∞, τ1 = 0, τ = ∞ for k < k ≤ K so that the intercepts can be estimated for the outcomes observed as binary. The mixed outcomes are then combined as follows.

Co-primary endpoint

In this case, a treatment must be shown to be effective as measured by each of the outcomes in order to be deemed effective overall. We generalize previous work for mixed continuous and binary outcomes to include ordinal outcomes, as shown below.[36,37] In many clinical trials the hypothesis of interest is based on superiority, namely that the proposed treatment will perform better than the control treatment. The null hypothesis is that the difference in treatment effects for the treatment arm and control arm is less than or equal to zero. This is straightforward to formalize in the case of one endpoint but less so when there are multiple co-primary endpoints, particularly when they are measured on different scales. The hypothesis of interest is as shown in (1) where π and π is the effect of the intervention in the treatment and control arm respectively. For k < k ≤ K we can specify and for the treatment and control group. We can generalize this assumption to account for the ordinal endpoints based on the fact that for The definition of treatment effect for ordinal outcomes may be modified to include multiple ordinal levels by selecting the appropriate τ thresholds. For instance, As the latent means are estimable by maximum likelihood, in the treatment group and in the control group. We can proceed by specifying that the hypothesis in (1) holds if and only if the hypothesis holds, where and The maximum likelihood estimates and can be used for a test of and the variance of this test statistic can be obtained using the inverse of the Fisher information matrix.

Multiple primary endpoint

Multiple primary endpoints conclude a treatment is effective if it is shown to work in at least one of the outcomes. We would expect the sample size required to be reduced compared with the co-primary endpoint case which would require power to detect treatments in all outcomes. We can allow for sample size estimation for multiple primary endpoints as follows. The hypothesis of interest, accounting for the fact that a significant effect in only one outcome is required, is shown below. As before, π and π can be determined for k < k ≤ k using the relevant τ thresholds. The difference in latent means and their variance are estimated using the maximum likelihood estimates and Fisher information matrix, as before.

Composite endpoint

A review conducted by Wason et al[41] showed that composite responder endpoints are widely used and identified many clinical areas in which they are common, such as oncology, rheumatology, cardiovascular, and circulation. The latent variable framework may be used to model the underlying structure of these mixed outcome composite endpoints to greatly improve efficiency.[38] The joint distribution of the continuous, ordinal, and binary outcomes is modeled using the latent variable structure as before. However, in this case the endpoint of interest is a composite responder endpoint and so the required quantity is some function of the probability of response in the treatment group p and in the control group p. For instance, an overall responder index S can be formed for patient i, where S = 1 if and 0 otherwise, where the quantities (η1, …, η) are predefined responder thresholds. Generalizations where response only requires a certain number of the components to meet the thresholds are possible, but involve more complex sums. Note that this definition of response is distinct from that commonly found in composites formed from survival endpoints or binary composites typical in cardiovascular studies. We can specify p and p, the probability of response for patient i in the treatment arm and control arm respectively, as shown in (5), where θ is the vector of model parameters and we assume that and As in the case of co-primary and multiple primary endpoints, the assumptions allow us to estimate latent means for the observed discrete components using the model parameters. In the mixed outcome composite endpoint setting, note that although we are exploiting the latent multivariate Gaussian structure for efficiency gains we are ultimately still interested in a one dimensional endpoint, such as the difference in response probabilities between the treatment and control arms of the trial. This is distinct from the co-primary and multiple primary endpoints cases, where the overall hypothesis test must be based on some union or intersection of the hypotheses for the individual outcomes. For the composite endpoint we can formulate the hypothesis as shown in (6), where p and p are as in (5). For sample size estimation, we require the distribution of δ = p − p under H1, which we can assume to be The hypothesis can therefore be stated as where and ΦK(.;,Σ) is the K-dimensional multivariate normal distribution function, with mean vector and covariance matrix Σ. Estimates of the quantities can be obtained using the maximum likelihood estimates for the model parameters, as in the co-primary and multiple primary endpoint settings, so that and where is the K-dimensional vector of mean values in the treatment arm and is the corresponding vector for the control arm. Using a Taylor series expansion, we can obtain the quantity using the fact that Then, where ″δ is the vector of partial derivatives of δ* with respect to each of the parameter estimates. We can obtain and covariance matrix by fitting the model to pilot trial data.

Sample Size Estimation

Co-primary endpoints

To construct the power function, we define the required quantities as follows. Let and denote the difference in sample means for the continuous and discrete outcomes respectively. We assume and let z denote the (1 − α)100th standard normal percentile, where α is the prespecified significance level. We define the z score as and for the observed continuous and latent continuous measures respectively. The test statistic can then be defined as shown below. A useful property of is that it is asymptotically multivariate normal under regularity conditions.[11]The power function for the joint co-primary endpoints is as shown in (10) and hence can be approximated by (11). for Assuming n = n = n it is possible to rearrange (11) to obtain a sample size formula in terms of n as shown below.[7] where the sample size depends on the number of outcomes and C is the solution of Alternatively, we can input different values for n in (11) to achieve the required power.

Multiple primary endpoints

Using the and defined for co-primary endpoints and assuming n = n = n, we can define the overall power as in (14). In order to obtain an appropriate power function we rely on the inclusion-exclusion principle as follows. A closed form expression for the overall power is shown in (15) We then input different values for n to achieve the required power. Note when using the union-intersection test for multiple primary endpoints that a correction must be applied to control the family-wise error rate (FWER). Approaches used for multiple primary continuous endpoints, such as Bonferroni and Holm corrections, may also be implemented in this setting.

Composite endpoints

As the endpoint of interest is specified in terms of the overall one dimensional composite endpoint, we can use the formula assumed when employing the standard test of proportions technique. As we can assume that σ = σ = σ and n = n = n, so that The power is deduced in the standard way, as demonstrated below. Note that however to obtain a formula in terms of the required sample size we will need to separate n from the variance estimate. By fitting the model to pilot trial data we can obtain an estimate for σ, as the value of n will be known in this instance and n can be obtained using (17). This is similar to the sample size equation used for the binary method, however σ is not derived in the standard way and δ* is obtained using latent means as opposed to provided directly.

Numerical Application

Muse trial

We illustrate the technique for sample size determination using the MUSE trial.[39] The trial was a phase IIb, randomized, double-blind, placebo-controlled study investigating the efficacy and safety of anifrolumab in adults with moderate to severe systemic lupus erythematosus (SLE). Patients (n=305) were randomized (1:1:1) to receive anifrolumab (300 or 1000 mg) or placebo, in addition to standard therapy every 4 weeks for 48 weeks. The primary endpoint in the study was the percentage of patients achieving an SLE Responder Index (SRI) response at week 24, with sustained reduction of oral corticosteroids (<10 mg/day and less than or equal to the dose at week 1 from week 12 through 24). SRI is comprised of a continuous Physician’s Global Assessment (PGA) measure, a continuous SLE Disease Activity Index (SLEDAI) measure and an ordinal British Isles Lupus Assessment Group (BILAG) measure.[42] The study had a target sample size of 100 patients per group based on providing 88% power at the two-sided 0.10 alpha level, to detect at least 20% absolute improvement in SRI(4) response rate at week 24 for anifrolumab relative to placebo. The investigators assumed a 40% placebo response rate.

Model

In this case (Y1, Y2, Y3, Y4) are SLEDAI, PGA, BILAG and the corticosteroid tapering indicator respectively and () ~ N4 () where, and the ordinal and binary components may be related to their latent variables as shown in (19). The thresholds (τ31, τ32, τ33, τ34) are estimated from the data. We can use the MUSE trial to design future studies where we assume that the endpoints of interest are co-primary, multiple primary and composite endpoints. The overall power functions for each are shown below. where for k = {1, 2} and for k = {3, 4}. In the composite setting where σ is estimated using the delta method. For the Power calculation we apply the Bonferroni correction, such that each outcome is assessed at the level.

Computation

We have conducted the computations in R version 4.0.2. We define functions to evaluate the power for each of the endpoints using a combination of the pnorm and pmvnorm functions. Sample size is obtained by inserting values for n until the desired power is achieved. Details of our source code and a web app for implementation is included in the Software section. Code to obtain the results shown in this article can be obtained at https://github.com/martinamcm/mcmenamin_2021_multsamp. Considerations and instructions for fitting the latent variable model are discussed in detail McMenamin et al. (2021).[38]

Results

The power is largest for the multiple primary endpoint, where 80% is achieved for n=37 patients in each arm. The power for the composite endpoint is similar to that of PGA, the component with the highest effect size. As we would expect the power is considerably lower for co-primary endpoints, which would require n=325 for 80% power (Figure 1).

Figure 1

Power function for individual SLEDAI (continuous), PGA (continuous), BILAG (ordinal), and Taper (binary) outcomes and the power functions with when they are treated as co-primary, multiple primary, and composite endpoints using data from the MUSE trial

Table 1 shows the sample sizes required in each group, for the co-primary and multiple primary endpoints to obtain an overall power of at least 80% to detect a difference of 0.88 in SLEDAI, 0.38 in PGA, 0.24 in BILAG and 0.40 in the taper outcome based on the values observed in the trial. We allow for uncertainty in the variance of the continuous measures by setting and The sample sizes required for each individual endpoint are also shown, based on achieving a power of at least 80%. Allowing for uncertainty in the variance of the SLEDAI outcome varies the required sample size for the co-primary endpoint but not the multiple primary endpoint. The opposite is true when the assumed variance of the PGA outcome is changed, namely affecting the sample size required for the multiple primary endpoint but not the co-primary. This is intuitive given that the treatment effect observed in the SLEDAI outcome is smallest and is largest for the PGA outcome. For the co-primary and composite endpoints the power is largest when the correlation between the endpoints is high whereas for multiple primary endpoints the power is largest for zero correlation between endpoints (Figure 2).

Table 1

Sample sizes n = n = n for the co-primary and multiple primary endpoints for overall power 1 − β ≈ 0.80, α = 0.025, k = 2, K = 4 using the MUSE trial data

SLEDAI		PGA		BILAG		Taper
δ ₁	σ12	δ ₂	σ22	(π_T3,π_C3)	σ3*	(π_T4,π_C4)	σ4*	n_co	n_mult	SS ₁	SS ₂	SS ₃	SS ₄
0.88	18	0.38	0.35	(0.97,0.95)	0.24	(0.54,0.38)	0.40	403	46	365	39	273	99
0.88	19	0.38	0.35	(0.97,0.95)	0.24	(0.54,0.38)	0.40	419	46	386	39	273	99
0.88	20	0.38	0.35	(0.97,0.95)	0.24	(0.54,0.38)	0.40	435	46	406	39	273	99
0.88	18	0.38	0.45	(0.97,0.95)	0.24	(0.54,0.38)	0.40	403	55	365	49	273	99
0.88	18	0.38	0.55	(0.97,0.95)	0.24	(0.54,0.38)	0.40	403	63	365	60	273	99
0.88	18	0.38	0.65	(0.97,0.95)	0.24	(0.54,0.38)	0.40	403	70	365	71	273	99

Note: SS1, SS2, SS3, SS4 are sample sizes required per group for the individual endpoints for a power of at least 1 − β = 0.80.

Figure 2

Overall power 1 − β to detect the treatment effects assumed from the MUSE trial for the systemic lupus erythematosus co-primary, multiple primary, and composite endpoints for different sample sizes per group n = n = n and differing correlations between outcomes, where Low = 0.3, Medium = 0.5, and High = 0.8

We assume that a future trial in SLE is to be conducted using the composite responder endpoint, allowing for uncertainty in σ. The estimated variance for the risk difference from the trial dataset is with correlation parameters For a risk difference of 0.14, the required sample size per group is 50, compared to 135 for 88% power in the standard binary method. If the method were to be employed for increased power, rather than a decrease in required sample size, the estimated power of the latent variable method is over 99.99% for sample sizes giving 88% power at the 0.05 one-sided alpha level in the binary method. The empirical power is shown for the latent variable method in 1000 simulated datasets, which is approximately 88% for each sample size, as required. Note that the sample size for composite endpoints are highly dependent on the responder threshold chosen, which will be predefined by clinicians.

Empirical Performance Of Sample Sizes

The behavior of the sample sizes obtained for each of the endpoints can be shown empirically. Assuming the four dimensional SLE endpoint, we calculate the empirical power by simulating 100 000 datasets from the multivariate normal distribution and applying the corresponding tests for both the observed and latent continuous outcomes. The key concern for the co-primary endpoints is that the method gives the appropriate power whereas for multiple primary endpoints we must ensure the family-wise error rate is controlled. The sample sizes required for each of the three endpoints and the corresponding empirical power is shown for effect sizes observed in the MUSE trial with low, medium, and high correlation assumed between endpoints (Table 2). The empirical power derived is approximately equal to the desired power of 80% for all endpoints. As is well recognized in the multiple testing literature, the type I error rate must be controlled when multiple primary endpoints are tested using the union-intersection test. The degree to which the type I error rate is inflated depends on the number of outcomes and the correlation between outcomes, where lower correlation between outcomes and larger number of outcomes result in larger inflations (Figure 3). The performance of the Bonferroni correction in this setting is shown, where it is conservative in the case of high correlation between endpoints. As the maximum correlation between outcomes in the MUSE trial endpoint used in the numerical example is 0.5, we expect the sample sizes shown for this application to be a good estimate. If very large positive correlations between the endpoints are expected the required sample size from this approach may be overestimated. The code to obtain these empirical results is provided in the ‘Software’ section.

Table 2

Sample sizes and empirical power (%) for n = n = n for the co-primary, multiple primary, and composite endpoints for overall power 1 − β ≈ 0.80, α = 0.025, k = 2, K = 4 with observed and latent effect sizes and correlation ρ equal to 0.3, 0.5, 0.8 where correlations are assumed to be equal between all endpoints

δ ₁	δ ₂	δ3*	δ4*	ρ	Co-primary	Multiple primary	Composite
0.12	0.12	0.12	0.12	0.0	1766 (80.1)	591 (80.0)	1031 (80.1)
				0.3	1692 (80.0)	744 (80.1)	883 (80.0)
				0.5	1617 (80.0)	867 (80.1)	687 (80.0)
				0.8	1439 (80.0)	1117 (80.0)	589 (79.9)
0.35	0.35	0.15	0.15	0.0	917 (79.9)	105 (80.1)	201 (79.9)
				0.3	894 (80.0)	122 (79.9)	156 (80.0)
				0.5	870 (80.0)	134 (80.0)	115 (80.1)
				0.8	815 (80.1)	153 (80.0)	92 (80.0)
0.12	0.35	0.55	0.10	0.0	1772 (80.2)	61 (80.3)	81 (80.0)
				0.3	1736 (80.2)	67 (80.5)	72 (80.1)
				0.5	1700 (80.0)	70 (79.9)	67 (80.2)
				0.8	1625 (80.2)	74 (80.6)	58 (80.1)

Figure 3

Family-wise error rate (FWER) of the multiple primary endpoints shown both unadjusted and adjusted using the Bonferroni correction. FWERs are shown for K = (2, 3, 4) outcomes and correlations are constrained to be equal between all outcomes

Discussion

The work in this article demonstrated the various ways in which a latent variable framework may be employed for mixed continuous, ordinal, and binary outcomes. We illustrated sample size determination in the case of mixed continuous, ordinal, and binary co-primary outcomes. We extended this to allow for sample size determination in the case of mixed multiple primary endpoints and proposed a technique to estimate the sample size when using a latent variable model for the underlying structure of a mixed composite endpoint. For co-primary and multiple primary endpoints the resulting hypothesis is based on an intersection or union of the hypotheses for the individual outcomes and so is multivariate in nature. However, for composite responder endpoints the hypothesis of interest is stated in relation to the overall responder endpoint and so is univariate. Sample size estimation in this case can make use of the standard power and sample size functions but requires the distribution of the test statistic under the alternative hypothesis which we approximate using latent-level means and a Taylor series expansion. We applied the methods to a numerical example based on a phase IIb study. For the correlation structure observed in the MUSE trial, the sample size required for the co-primary endpoint was greater than that required for the individual endpoint with the lowest effect size. Alternatively, the sample size required for the multiple primary endpoint changes based on the variance assumed for the outcome with the largest treatment effect, however is similar to that required by the individual endpoint. The sample size required for the composite endpoint was between that required for the individual outcome with the largest and second largest effect size. Given that in the composite case we are concerned with the overall binary response endpoint, we compared the sample sizes required for the endpoint using the latent variable model with the standard binary method which we showed offered a large gain in efficiency. Results of the simulated scenarios agree with previous findings that the inclusion of the ordinal component with five levels is only responsible for a very small proportion of the precision gains. Given that the inclusion of the ordinal component substantially increases complexity and computational demand, it may be sufficient to combine any ordinal components with the binary outcome if necessary. Detailed simulation results for the composite endpoint are shown in the Supplementary Material. One practical consideration when calculating the sample size for a trial using the latent variable model is the need to specify a large number of parameters, even in the case of only a few outcomes. Estimates for the parameters could be obtained by fitting the model to pilot data however this is potentially challenging and restrictive for a number of reasons. First, it requires that a pilot or earlier phase trial must have already taken place. Furthermore, the pilot data could be fundamentally different to the future trial and observed effects may be imprecise. Therefore, placing too much emphasis on the existing data may lead to problems in the main trial. In theory, it is possible to specify the required covariance parameters without data however this would be difficult in practice. Additionally, in the case of composite endpoints, we cannot define the variance in terms of the model parameters only, as the treatment effect is defined for the one-dimensional composite and so is a function of the parameters. This means that the full covariance matrix of the estimated parameters is required for the Taylor series derivation. An alternative when there is no data available is to apply the method using the sample size required to achieve 80% power for the binary method and avail of the large increase in power. Alternatively, we can directly specify σ based on expert elicitation, as is sometimes the case in practice for standard one-dimensional endpoints. Allowing for uncertainty in the quantities and choosing conservative values should provide an appropriate sample size estimate. It is possible to extend this approach to use adaptive sample size re-estimation, or an internal pilot to allow for reductions in the required sample size in the trial as we collect more information about the treatment effect variability.

Software

The code to obtain the results in this article is available at https://github.com/martinamcm/mcmenamin_2021_multsamp. A Shiny application for implementing the method is available at https://martinamcm.shinyapps.io/multsampsize/. Documentation and example data are available at https://github.com/martinamcm/MultSampSize.

31 in total

1. Estimating significance level and power comparisons for testing multiple endpoints in clinical trials.

Authors: J Gong; J C Pinheiro; D L DeMets
Journal: Control Clin Trials Date: 2000-08

2. Multiplicity adjustment for composite binary endpoints.

Authors: Geraldine Rauch; M Kieser
Journal: Methods Inf Med Date: 2012-04-24 Impact factor: 2.176

3. Method of balanced adjustment in testing co-primary endpoints.

Authors: George Kordzakhia; Ohidul Siddiqui; Mohammad F Huque
Journal: Stat Med Date: 2010-08-30 Impact factor: 2.373

4. Simultaneous inference of a binary composite endpoint and its components.

Authors: M Große Ruse; C Ritz; L A Hothorn
Journal: J Biopharm Stat Date: 2016-02-16 Impact factor: 1.051

5. Some controversial multiple testing problems in regulatory applications.

Authors: H M James Hung; Sue-Jane Wang
Journal: J Biopharm Stat Date: 2009 Impact factor: 1.051

6. The Use of a Binary Composite Endpoint and Sample Size Requirement: Influence of Endpoints Overlap.

Authors: Josep-Ramon Marsal; Ignacio Ferreira-González; Sandra Bertran; Aida Ribera; Gaietà Permanyer-Miralda; David García-Dorado; Guadalupe Gómez
Journal: Am J Epidemiol Date: 2017-05-01 Impact factor: 4.897

7. A new approach for sizing trials with composite binary endpoints using anticipated marginal values and accounting for the correlation between components.

Authors: Marta Bofill Roig; Guadalupe Gómez Melis
Journal: Stat Med Date: 2019-01-13 Impact factor: 2.373

8. Multi-variate probit analysis.

Authors: J R Ashford; R R Sowden
Journal: Biometrics Date: 1970-09 Impact factor: 2.571

9. Statistical power, sample size, and their reporting in randomized controlled trials.

Authors: D Moher; C S Dulberg; G A Wells
Journal: JAMA Date: 1994-07-13 Impact factor: 56.272

10. Power and sample size for clinical trials when efficacy is required in multiple endpoints: application to an Alzheimer's treatment trial.

Authors: Chengjie Xiong; Kai Yu; Feng Gao; Yan Yan; Zhengjun Zhang
Journal: Clin Trials Date: 2005 Impact factor: 2.486

1 in total

1. Sample size estimation using a latent variable model for mixed outcome co-primary, multiple primary and composite endpoints.

Authors: Martina E McMenamin; Jessica K Barrett; Anna Berglind; James M S Wason
Journal: Stat Med Date: 2022-02-23 Impact factor: 2.497

1 in total