Nigel Stallard (1), Frank Miller (2), Simon Day (3), Siew Wan Hee (1), Jason Madan (4), Sarah Zohar (5), Martin Posch (6)
1. Statistics and Epidemiology, Division of Health Sciences, Warwick Medical School, University of Warwick, Coventry, CV4 7AL, UK.
2. Department of Statistics, Stockholm University, Stockholm, Sweden.
3. Clinical Trials Consulting and Training Limited, Buckingham, UK.
4. Clinical Trials Unit, Warwick Medical School, University of Warwick, Coventry, UK.
5. INSERM, U1138, team 22, Centre de Recherche des Cordeliers, Université Paris 5, Université Paris 6, Paris, France.
6. Section of Medical Statistics, CeMSIIS, Medical University of Vienna, Austria.
Abstract
The problem of choosing a sample size for a clinical trial is a very common one. In some settings, such as rare diseases or other small populations, the large sample sizes usually associated with the standard frequentist approach may be infeasible, suggesting that the sample size chosen should reflect the size of the population under consideration. Incorporation of the population size is possible in a decision-theoretic approach either explicitly, by assuming that the population size is fixed and known, or implicitly, through geometric discounting of the gain from future patients reflecting the expected population size. This paper develops such approaches. Building on previous work, an asymptotic expression is derived for the sample size for single and two-arm clinical trials in the general case of a clinical trial with a primary endpoint with a distribution of one-parameter exponential family form that optimizes a utility function that quantifies the cost and gain per patient as a continuous function of this parameter. It is shown that as the size of the population, $N$, or expected size, $N^*$ in the case of geometric discounting, becomes large, the optimal trial size is $O(N^{1/2})$ or $O(N^{*1/2})$. The sample size obtained from the asymptotic expression is also compared with the exact optimal sample size in examples with responses with Bernoulli and Poisson distributions, showing that the asymptotic approximations can also be reasonable in relatively small sample sizes.
The problem of determining the sample size for a clinical trial is a very common one. For large-scale definitive phase III clinical trials, a frequentist approach is usually adopted, with the sample size chosen so as to control the type I error rate at a specified level, $\alpha$, and to give specified power, $1 - \beta$, to detect some appropriately chosen size of treatment effect (see, e.g. Pocock, 1983, for details). Choice of $\alpha = 0.05$ and $\beta = 0.1$ or 0.2 is typical.

The sample sizes obtained using the frequentist approach do not always seem appropriate. In particular they do not reflect the size of the population to which the results of the trial apply. The population size is relevant, however, when considering decisions made on the basis of trial results. This is particularly true for clinical trials conducted in rare diseases or other small populations, when the population size means that a large trial would be infeasible, or even impossible.

One way in which the size of the population can influence the sample size is to use a decision-theoretic approach in which the benefits to future patients in the population, sometimes called the "patient horizon", are explicitly considered so that future benefit depends on the size of this population. Such an approach has been proposed and discussed by numerous authors over the last 50 years (see, e.g. Anscombe, 1963; Colton, 1963; Sylvester, 1988; Berry et al., 1994; Cheng et al., 2003; Kikuchi and Gittins, 2009 and reviews by Pezeshk et al., 2013; Hee et al., 2016). Although this approach has very rarely been implemented in practice, it can nevertheless provide important insight into an appropriate choice of sample size for a clinical trial.

The role of the population size in determination of the optimal sample size for a clinical trial has been considered by Cheng et al. (2003). They considered single-arm and two-arm clinical trials with a primary endpoint following a Bernoulli distribution indicating either success or failure of treatment for each patient. Adopting a decision-theoretic approach, they obtained designs that maximize the total number of successes. Denoting the population size by $N$, they show that asymptotically as $N \to \infty$, the optimal sample size for a clinical trial is $O(N^{1/2})$, and give an expression for the asymptotically optimal sample size that depends on the prior distributions for the unknown probability of success for each trial arm.

In this paper, we extend the work of Cheng et al. (2003), both to more general distributional forms for the primary endpoint and to situations in which the aim is to maximize some general utility expressed as a function of a parameter of these distributions. We show that the result that the optimal sample size is $O(N^{1/2})$ applies for any continuous utility function and for responses with a distribution of any one-parameter exponential family form assuming a conjugate prior distribution. We also consider the case where no finite patient horizon is assumed, but gains from future patients are geometrically discounted, that is the gain from the $j$th patient is multiplied by a factor that decreases geometrically in $j$, governed by a discounting parameter, $\lambda$ (Berry and Fristedt, 1985). As considered in the discussion section, the size of $\lambda$ can also reflect the size of the patient population in this setting, via an effective number of patients, which will be denoted $N^*$. We show that in this case as $\lambda \to 1$, so that $N^*$ is large, the optimal sample size is $O(N^{*1/2})$. We also investigate through exact calculation the small-sample accuracy of the large sample approximations. Although the results obtained depend on asymptotics, we show that, depending on the exact form of the utility function chosen, these may be reasonable even for extremely rare diseases, for example for patient populations of 1000 or fewer when the optimal sample size can be less than 50.
Detailed problem description and notation
Outline of the decision problem
Suppose that a clinical trial is to be conducted to choose between two treatments, with $n_1$ and $n_2$ patients receiving treatment 1 and treatment 2, respectively, where treatment 2 may be the current standard treatment included as a control. Note that taking $n_2 = 0$ corresponds to a single-arm trial, though the decision to be taken at the end of such a trial remains comparative, with a choice being made regarding treatment of future patients.

It is assumed that the gain associated with treatment of patients in the trial or patients outside the trial who receive each treatment can be specified as a function of a parameter of the distribution for the response for patients receiving that treatment. Here "gain" is to be interpreted widely to include any kind of costs, losses, gains, or benefits associated with treatment. Following the trial, the most preferable treatment, that is the treatment for which the posterior expected gain given the observed data is highest, will be selected. The remaining patients will then receive this treatment. We wish to determine the optimal values, $n_1^*$ and $n_2^*$, of the sample sizes $n_1$ and $n_2$, and to determine how $n_1^*$ and $n_2^*$ depend on the population size.
Decision problem formulation and notation
We will assume that responses for patients follow some distribution of natural one-parameter exponential family form. In detail, let $Y_{ij}$ denote the response for patient $j$ receiving treatment $i$ and assume $Y_{i1}, \ldots, Y_{in_i}$ are i.i.d. with density

$$f_i(y \mid \psi_i) = a_i(y) \exp\{y \psi_i - b_i(\psi_i)\}$$

for some functions $a_i$ and $b_i$. Typically responses from patients in the two treatment groups will follow distributions of the same form, that is functions $a_i$ and $b_i$ will not depend on $i$, with $\psi_1$ and $\psi_2$ differing, though this is not assumed. We will assume that $\psi_1$ and $\psi_2$ are taken to have independent prior distributions of conjugate form, that is with $\psi_i$ having density

$$\pi_{0i}(\psi_i) = c_i(n_{0i}, y_{0i}) \exp\{n_{0i} y_{0i} \psi_i - n_{0i} b_i(\psi_i)\}$$

for some $n_{0i}$ and $y_{0i}$ and normalising constant $c_i(n_{0i}, y_{0i})$. The values $y_{0i}$ and $n_{0i}$ can be interpreted respectively as the prior mean of $\xi_i$ and the number of observations to which the prior information is equivalent, so that following a trial with $n_i$ patients receiving treatment $i$ and observation of the sample mean $\bar{y}_i$, the posterior mean for $\xi_i$ given $\bar{y}_i$ is equal to

$$\frac{n_{0i} y_{0i} + n_i \bar{y}_i}{n_{0i} + n_i} \qquad (1)$$

with $\xi_i = E(Y_{ij} \mid \psi_i) = b_i'(\psi_i)$ (see Bernardo and Smith, 2000).

Suppose that the expected gain from a patient receiving treatment $i$ in the trial is $h_i(\xi_i)$, and that the expected gain to a future patient receiving treatment $i$ is $g_i(\xi_i)$, where $g_i$ and $h_i$ are such that the required prior expected values exist, with $E_0$ denoting the expected value taken over the prior distribution of $\xi = (\xi_1, \xi_2)$. The functions $g_1$ and $g_2$ are assumed to be differentiable, with $g_i$ strictly increasing and with finite derivative. Assume further that the gain functions satisfy a condition, referred to below as (2), which ensures that the gain from treating patients in the trial cannot exceed that from treating them outside the trial. This is considered further in the discussion section below.

We will consider two cases. In the first, the population is considered to be finite with known size, $N$. The number of patients treated following the trial is thus $N - n_1 - n_2$. In the second case, no finite population size is assumed, but the gain from future patients is geometrically discounted, so that the gain from patient $j$ if they receive treatment $i$ is $h_i(\xi_i)$ multiplied by a discount factor if they are included in the trial and $g_i(\xi_i)$ multiplied by the same factor if they are treated following the trial, the discount factor being a power of $\lambda$, for some $\lambda \in (0, 1)$, with an exponent that increases with $j$.

The geometric discounting of gains from future patients can be interpreted in a number of ways. One interpretation is that gains further in the future are reduced to reflect either opportunity loss or loss of financial interest on an investment (see, e.g. Fergusson, 2008). With this interpretation it might be appropriate to take $\lambda < 1$ and $N$ finite. An alternative interpretation is to imagine that the gain from each future patient is of constant value, as is assumed in the finite horizon case, but that the size of the population, $N$, rather than being fixed in advance, is random, following a geometric distribution. This could be the case if, for example, the population is limited by some new treatment becoming available, at which point the trial, or use of the recommended treatment following the trial, will be terminated, with the probability of this event constant over time (see Berry and Fristedt, 1985). In this interpretation $N^*$ is the expected population size prior to this new treatment becoming available. The size of $\lambda$ and the resulting $N^*$ can thus reflect the population size, and in some ways a smaller value of $\lambda$, corresponding to fewer patients being available prior to the new treatment becoming available, more reasonably models a small population than assuming the number of patients has some fixed and known value, $N$.
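The weighted-average form of the posterior mean, $(n_{0i} y_{0i} + n_i \bar{y}_i)/(n_{0i} + n_i)$, can be illustrated with a minimal sketch. The function name `posterior_mean` and the numerical values are hypothetical and chosen only for illustration; the check relies on the standard result that a Beta($a$, $b$) prior for a Bernoulli probability has prior mean $a/(a+b)$ and prior weight $a+b$.

```python
def posterior_mean(y0: float, n0: float, ybar: float, n: int) -> float:
    """Posterior mean of xi under a conjugate prior with mean y0 and weight n0."""
    return (n0 * y0 + n * ybar) / (n0 + n)


# Illustrative values: Beta(1, 1) prior, 5 successes observed in 9 patients.
a, b = 1.0, 1.0
successes, n = 5, 9

via_weights = posterior_mean(a / (a + b), a + b, successes / n, n)
via_beta = (a + successes) / (a + b + n)  # mean of Beta(a + s, b + n - s)
print(via_weights, via_beta)
```

Both routes give the same value, which is the sense in which the prior acts like $n_{0i}$ earlier observations with mean $y_{0i}$.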
Determination of the optimal sample size
Finite patient horizon case
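Consider first the setting of a finite patient horizon of size $N$. Following observation of data $\bar{y}_1$ and $\bar{y}_2$, the total expected gain if treatment $i$ is recommended for all further patients is

$$n_1 E(h_1(\xi_1) \mid \bar{y}_1) + n_2 E(h_2(\xi_2) \mid \bar{y}_2) + (N - n_1 - n_2) E(g_i(\xi_i) \mid \bar{y}_i),$$

where $E(\cdot \mid \bar{y}_i)$ denotes the expected value taken over the posterior distribution of $\xi_i$ given $\bar{y}_i$.

The optimal action at the end of the trial is thus to select the treatment with the largest value of $E(g_i(\xi_i) \mid \bar{y}_i)$, and the expected gain assuming this action is taken is equal to

$$n_1 E(h_1(\xi_1) \mid \bar{y}_1) + n_2 E(h_2(\xi_2) \mid \bar{y}_2) + (N - n_1 - n_2) \max_{i} E(g_i(\xi_i) \mid \bar{y}_i).$$

Since, prior to the commencement of the trial, $\bar{y}_i$ is unknown, the expected gain from the trial is equal to $E_0(\mathcal{G}(\xi_1, \xi_2, n_1, n_2))$, where the expectation is taken over the prior distribution for $\xi_1$ and $\xi_2$ and $\mathcal{G}$ is the function of $\xi_1$, $\xi_2$, $n_1$, and $n_2$ given by

$$\mathcal{G}(\xi_1, \xi_2, n_1, n_2) = E_{\bar{Y} \mid \xi}\Big( n_1 E(h_1(\xi_1) \mid \bar{Y}_1) + n_2 E(h_2(\xi_2) \mid \bar{Y}_2) + (N - n_1 - n_2) \max_{i} E(g_i(\xi_i) \mid \bar{Y}_i) \Big), \qquad (3)$$

where $E_{\bar{Y} \mid \xi}$ denotes the expectation taken over $\bar{Y}_1$ and $\bar{Y}_2$ for a given value of $\xi_1$ and $\xi_2$, so that expectations are taken first over the posterior distributions of $\xi_i$ given $\bar{y}_i$ and then over $\bar{y}_i$ given $\xi_1$ and $\xi_2$.

Since $E_0(E_{\bar{Y} \mid \xi}(E(f(\xi_i) \mid \bar{Y}_i)))$ is equal to the prior expectation $E_0(f(\xi_i))$ for any function $f$ for which the expectations exist, we get

$$E_0(\mathcal{G}) = n_1 E_0(h_1(\xi_1)) + n_2 E_0(h_2(\xi_2)) + (N - n_1 - n_2) E_0\Big(E_{\bar{Y} \mid \xi}\big(\max_{i} E(g_i(\xi_i) \mid \bar{Y}_i)\big)\Big). \qquad (4)$$

We wish to find the optimal values of $n_1$ and $n_2$, that is the values for which $E_0(\mathcal{G})$ is maximized. For small $N$, when the optimal sample sizes, $n_1^*$ and $n_2^*$, will also be small, it may be feasible to evaluate the prior expected gain given by (4) directly, taking the expectation over the prior predictive distribution for $\bar{y}_1$ and $\bar{y}_2$, and to find $n_1^*$ and $n_2^*$ by a numerical search. For larger $N$, such an approach may be infeasible. In this case asymptotic expressions for the optimal sample sizes, $n_1^*$ and $n_2^*$, as the population size, $N$, becomes large are more useful.

For finite $n_{01}$ and $n_{02}$, the expectation of the maximum in the final term of (4) is increasing in $n_1$ and $n_2$. For sufficiently large $N$ the prior expected gain is therefore also increasing in $n_1$ and $n_2$ over any fixed range of sample sizes, so that as $N \to \infty$ the optimal trial design has both $n_1^* \to \infty$ and $n_2^* \to \infty$. The case in which $n_{01}$ and $n_{02}$ are both infinite corresponds to both $\xi_1$ and $\xi_2$ being known a priori. We will therefore consider the case in which, without loss of generality, $n_{01}$ is finite, and derive the optimal value $n_1^*$.

Note that, as $N \to \infty$, the optimal sample sizes also approach infinity, but in such a way that $n_i^*/N \to 0$.

By the central limit theorem, $\sqrt{n_i}(\bar{Y}_i - \xi_i)$ converges in distribution to $N(0, \sigma_i^2(\xi_i))$, where $\sigma_i^2(\xi_i)$ denotes the variance of $Y_{ij}$. Thus from (1), applying the delta method, since $g_i$ is assumed to be differentiable and strictly increasing so that the derivative, $g_i'$, is non-zero, we obtain an asymptotic normal form, (5), for the posterior expected gain $E(g_i(\xi_i) \mid \bar{Y}_i)$. Using an expression for the expected value of the maximum of two normally distributed random variables given by Clark (1961), we then have an approximation, (6), to $E_{\bar{Y} \mid \xi}(\max_{i} E(g_i(\xi_i) \mid \bar{Y}_i))$, written in terms of the means and variances of the two approximating normal distributions and of $\phi$ and $\Phi$, the standard normal density and distribution functions. Substituting (6) into (4) gives an approximation, (7), to the prior expected gain.

In order to find $n_1^*$, we obtain the derivative of the prior expected gain with respect to $n_1$. From (3) and (4),

$$\frac{\partial E_0(\mathcal{G})}{\partial n_1} = E_0(h_1(\xi_1)) - E_0\Big(E_{\bar{Y} \mid \xi}\big(\max_{i} E(g_i(\xi_i) \mid \bar{Y}_i)\big)\Big) + (N - n_1 - n_2) \frac{\partial}{\partial n_1} E_0\Big(E_{\bar{Y} \mid \xi}\big(\max_{i} E(g_i(\xi_i) \mid \bar{Y}_i)\big)\Big). \qquad (8)$$

We will thus find a large-sample approximation for this derivative.

Equation (6) gives an approximation for $E_{\bar{Y} \mid \xi}(\max_{i} E(g_i(\xi_i) \mid \bar{Y}_i))$. Differentiating the right hand side of (6) with respect to $n_1$ (see Web Appendix A for details) gives an approximation to the derivative in (8). The limit of this approximation as $N$, $n_1$ and $n_2$ all approach infinity with $n_1/N$ and $n_2/N$ approaching 0 (again, see Web Appendix A for details) has an expected value, (9), that is expressed in terms of $\pi_0$, the joint prior density of $\xi_1$ and $\xi_2$.

Setting this to zero and solving for $n_1$, the maximum expected gain is found to be at the value $n_1^*$ given by (10), which is proportional to $N^{1/2}$.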
Note that the fact that $g_1$ is increasing and the requirement (2) ensure that both the numerator and the denominator in (10) are positive, so that the square root exists. It is also interesting to note that the asymptotically optimal sample size for arm 1, $n_1^*$, does not depend on $n_2$, since we have assumed either that $n_{02}$ is infinite or that $n_2$ also approaches infinity.

When $n_{02}$ is finite, by symmetry, the optimal value of $n_2$ is given by the analogous expression, (11), with the roles of the two arms interchanged.

When $n_{02} = \infty$, the prior distribution has mass at $\xi_2 = y_{02}$ only, and so may be written as a univariate density $\pi_0(\xi_1)$, so that the optimal value of $n_1$ reduces to a simpler expression, (12). In this case the posterior distribution of $\xi_2$ does not depend on the data observed. Thus from an expression similar to (8) giving the derivative with respect to $n_2$, together with (2), the derivative is negative and $n_2^* = 0$.

When $g_1$ and $g_2$ are identical, (10) simplifies further, and under an additional simplifying condition it reduces to a form showing that this is a generalization of the expression obtained by Cheng et al. (2003) for the case in which $Y_{ij}$ has a Bernoulli distribution with parameter $\xi_i$.
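The expression of Clark (1961) used in the derivation above gives the expected value of the maximum of two normal random variables in closed form. A minimal sketch for the independent case is given below: for $X_k \sim N(\mu_k, s_k^2)$, $E\max(X_1, X_2) = \mu_1 \Phi(\alpha) + \mu_2 \Phi(-\alpha) + a\,\phi(\alpha)$ with $a^2 = s_1^2 + s_2^2$ and $\alpha = (\mu_1 - \mu_2)/a$. The function names are hypothetical, and the check uses the known value $1/\sqrt{\pi}$ for two independent standard normal variables.

```python
import math


def std_normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_max_normal(mu1: float, sd1: float, mu2: float, sd2: float) -> float:
    """E[max(X1, X2)] for independent X1 ~ N(mu1, sd1^2), X2 ~ N(mu2, sd2^2)."""
    a = math.hypot(sd1, sd2)  # standard deviation of X1 - X2
    if a == 0.0:
        return max(mu1, mu2)  # both variables are constants
    alpha = (mu1 - mu2) / a
    return (mu1 * std_normal_cdf(alpha)
            + mu2 * std_normal_cdf(-alpha)
            + a * std_normal_pdf(alpha))


print(expected_max_normal(0.0, 1.0, 0.0, 1.0), 1.0 / math.sqrt(math.pi))
```

When one variable has zero variance, the same function gives the expected maximum of a normal variable and a constant, which is the situation arising when $n_{02}$ is infinite.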
Geometric discounting case
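Consider next the second setting introduced above, that of an infinite population with geometric discounting.

In a two-arm trial it is assumed that $n_1$ and $n_2$ are sufficiently large and randomisation to treatments 1 and 2 sufficiently balanced that the gain to patients receiving treatment $i$ in the trial can be taken to be a fraction $n_i/(n_1 + n_2)$ of the total discount weight for the $n_1 + n_2$ patients in the trial, multiplied by $h_i(\xi_i)$. The total expected gain if treatment $i$ is recommended for all further patients is then the sum of the posterior expected discounted gains for the patients in the two arms of the trial and the posterior expected discounted gain from all subsequent patients receiving treatment $i$.

The optimal action at the end of the trial is thus again to treat all future patients with the treatment with the largest value of $E(g_i(\xi_i) \mid \bar{y}_i)$, and the expected gain assuming this action is taken is obtained by replacing the final term with its maximum over $i$.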
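The prior predicted expected utility for a trial with $n_i$ patients receiving treatment $i$ is thus given by an expression, (13), analogous to (4). Optimal sample sizes, $n_1^*$ and $n_2^*$, can be found directly using this expression and a numerical search in cases when this is computationally feasible.

As above, it is of interest to seek asymptotic approximations to $n_1^*$ and $n_2^*$, in this case as the geometric discounting parameter, $\lambda$, approaches 1 from below. We will again assume that $n_{01}$ is finite and obtain first an approximation for $n_1^*$.

The derivatives with respect to $n_1$ of the total discount weight for patients in the trial and of the total discount weight for patients treated following the trial tend, by L'Hôpital's rule, to 1 and $-1$, respectively, as $\lambda \to 1$. Since the limit as $\lambda \to 1$ of the total discount weight for patients in the trial is $n_1 + n_2$, the derivative with respect to $n_1$ of the weight applied to $h_i(\xi_i)$ tends to 1 if $i = 1$ and 0 if $i = 2$.

As $\lambda \to 1$, the derivative of (13) with respect to $n_1$ thus approaches a limiting form analogous to (8). The argument above gives an approximation to this derivative in which the total discount weight for patients treated following the trial can be split into two terms, the former of which, equal to $1/(1 - \lambda)$, dominates. This gives the approximation (14). Writing $N^* = 1/(1 - \lambda)$, (14) can be written in a form directly analogous to (10) with $N^*$ replacing $N$.

For $n_{02}$ finite, $n_2^*$ is again given by symmetry by an expression analogous to (11). For $n_{02} = \infty$, $n_2^* = 0$ and $n_1^*$ is given by an expression analogous to (12) with $N$ replaced by $N^*$.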
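The limits used in the geometric discounting argument can be checked numerically. The sketch below assumes, for illustration only, that patient $j$ carries discount weight $\lambda^j$ for $j = 1, 2, \ldots$ (the indexing in the original derivation may differ) and takes $1/(1-\lambda)$ as the effective population size; all function names are hypothetical. It evaluates the derivatives, with respect to the trial size $n$, of the total weight given to patients in the trial and to patients treated afterwards.

```python
import math


def in_trial_weight(n: float, lam: float) -> float:
    """Sum of lam**j for j = 1, ..., n (assumed indexing)."""
    return lam * (1.0 - lam ** n) / (1.0 - lam)


def post_trial_weight(n: float, lam: float) -> float:
    """Sum of lam**j for j = n + 1, n + 2, ..."""
    return lam ** (n + 1) / (1.0 - lam)


def d_in_trial_weight(n: float, lam: float) -> float:
    """Derivative of in_trial_weight with respect to n (n treated as continuous)."""
    return -(lam ** (n + 1)) * math.log(lam) / (1.0 - lam)


def d_post_trial_weight(n: float, lam: float) -> float:
    """Derivative of post_trial_weight with respect to n."""
    return lam ** (n + 1) * math.log(lam) / (1.0 - lam)


n = 10
for lam in (0.9, 0.99, 0.999, 0.9999):
    effective_size = 1.0 / (1.0 - lam)
    print(lam, effective_size, d_in_trial_weight(n, lam), d_post_trial_weight(n, lam))
```

As $\lambda$ approaches 1 the two derivatives approach 1 and $-1$, while the effective population size grows without bound, mirroring the roles played by $n_1$ and $N$ in the finite horizon case.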
Examples
Single arm trials with Bernoulli data
We consider first the case of Bernoulli data. In this case the distribution of $Y_{ij}$, the responses in treatment group $i$, can be parameterised with $\xi_i$ equal to the probability of treatment success. We will take $\xi_1$ to have a conjugate beta prior with parameters $a_1$ and $b_1$, so that $y_{01} = a_1/(a_1 + b_1)$ and $n_{01} = a_1 + b_1$, and assume $\xi_2$ is known with value $y_{02}$, that is with $n_{02} = \infty$, so that $n_2^* = 0$ and a single-arm trial is optimal.

Given observation of data $\bar{y}_1$, $\xi_1$ has a Beta($a_1 + n_1 \bar{y}_1$, $b_1 + n_1 - n_1 \bar{y}_1$) posterior distribution, and the prior predictive distribution of $n_1 \bar{y}_1$ is Betabinomial($n_1$, $a_1$, $b_1$).

We will assume that the gain from patients receiving treatment $i$ will be determined by whether or not the treatment is successful, so that $g_i(\xi_i) = h_i(\xi_i) = \xi_i$. From (4) with $n_2 = 0$, the prior expected gain is thus

$$n_1 y_{01} + (N - n_1) \sum_{s=0}^{n_1} p(s) \max\left\{\frac{a_1 + s}{a_1 + b_1 + n_1},\, y_{02}\right\},$$

where $p(s)$ denotes the Betabinomial($n_1$, $a_1$, $b_1$) probability of observing $s$ successes.

Following one of the examples considered by Cheng et al. (2003), we fix the values of $a_1$, $b_1$ and $N$ and take the known value of $\xi_2$ to be 0.5. Figure 1 shows the prior expected gain for a range of values of $n_1$, here plotted on a logarithmic scale. As given by Cheng et al. (2003), the optimal value of $n_1$ is equal to 9, which is marked with a plus sign. The approximation to the prior expected gain given by (7) is also shown on the figure as a dashed line, showing that in this case the approximate and true values are close even for small $n_1$. The approximately optimal value of $n_1$ given by (12) is 10. This is shown by the circle on the figure and is close to the true optimum. In this case, it is also close to the value of $n_1$ maximizing the approximation given by (7), though in general the additional approximation leading to (12) means that this need not be the case.
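The comparison between the exact optimum and the asymptotic approximation can be sketched as follows. The inputs (a Beta(1, 1) prior and $N = 100$, with $\xi_2 = 0.5$) are illustrative assumptions rather than values stated in the text, the function names are hypothetical, and the approximation is assumed to take the form $\sqrt{N \sigma^2(\xi_2)\, \pi_0(\xi_2) / (2 E_0[\max(\xi_1, \xi_2) - \xi_1])}$ with $\sigma^2(\xi) = \xi(1 - \xi)$, which is consistent with the square root relationship described above but is not quoted from the original expression.

```python
import math


def betabinom_pmf(s: int, n: int, a: float, b: float) -> float:
    """P(S = s) for S ~ Betabinomial(n, a, b), computed on the log scale."""
    log_p = (
        math.lgamma(n + 1) - math.lgamma(s + 1) - math.lgamma(n - s + 1)
        + math.lgamma(a + s) + math.lgamma(b + n - s) - math.lgamma(a + b + n)
        + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    )
    return math.exp(log_p)


def exact_prior_expected_gain(n1: int, N: int, a: float, b: float, xi2: float) -> float:
    """Prior expected gain with g(xi) = h(xi) = xi and a known xi2."""
    in_trial = n1 * a / (a + b)
    per_future_patient = sum(
        betabinom_pmf(s, n1, a, b) * max((a + s) / (a + b + n1), xi2)
        for s in range(n1 + 1)
    )
    return in_trial + (N - n1) * per_future_patient


def beta_pdf(x: float, a: float, b: float) -> float:
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def approx_optimal_n1(N: int, a: float, b: float, xi2: float, steps: int = 10_000) -> float:
    """Assumed asymptotic form: sqrt(N * variance * density / (2 * shortfall))."""
    width = xi2 / steps
    # E0[max(xi1, xi2) - xi1] = integral over (0, xi2) of (xi2 - x) * pi0(x) dx,
    # evaluated with the midpoint rule.
    shortfall = width * sum(
        (xi2 - (k + 0.5) * width) * beta_pdf((k + 0.5) * width, a, b)
        for k in range(steps)
    )
    return math.sqrt(N * xi2 * (1 - xi2) * beta_pdf(xi2, a, b) / (2 * shortfall))


N, a, b, xi2 = 100, 1.0, 1.0, 0.5  # illustrative values only
exact_n1 = max(range(N + 1), key=lambda n: exact_prior_expected_gain(n, N, a, b, xi2))
approx_n1 = approx_optimal_n1(N, a, b, xi2)
print(exact_n1, round(approx_n1, 2))
```

With these illustrative inputs the exhaustive search gives 9 and the approximation gives 10, in line with the values quoted above; other priors or population sizes can be explored by changing the last four lines.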
Figure 1
Prior expected gain (solid line) and approximate prior expected gain (dashed line) for a range of $n_1$ for the first single-arm Bernoulli example. The optimal and approximately optimal values of $n_1$ are marked by + and ○, respectively.

Figure 2 gives values of $n_1^*$ from (12) along with exact optima for a range of values of $N$, with both $n_1^*$ and $N$ plotted on logarithmic scales so that the square root relationship between $n_1^*$ and $N$ given by (12) corresponds to a straight line with slope 1/2. The approximation given by (12) is close to the true value, and approaches it as $N$ increases, as would be expected.
Figure 2
Optimal (solid lines with points) and approximately optimal (dashed lines) values for $n_1$ for a range of $N$ values for the first single-arm Bernoulli example.

The effect of varying the prior distribution for $\xi_1$ was also investigated. Web Figure 1 in Web Appendix B shows the optimal sample size for a range of $y_{01}$ and $n_{01}$ values. When the prior mean value of $\xi_1$, that is $y_{01}$, is equal to or greater than the fixed value of $\xi_2$, the optimal sample size increases with prior weight $n_{01}$. This is reasonable since as $n_{01}$ increases we are increasingly confident that patients are not harmed by being in the trial, and a larger sample size gives more information for the final decision. When $y_{01}$ is less than $\xi_2$, the optimal sample size increases with $n_{01}$ for small $n_{01}$, so that more information can be collected for the final decision, and decreases for larger $n_{01}$, when there is strong prior belief that treatment 2 is superior to treatment 1. It is interesting to note that when the prior weight, $n_{01}$, is kept fixed, the optimal sample size increases as the prior mean $y_{01}$ of treatment 1 increases. This is in contrast to the frequentist sample size, which would decrease as the assumed success rate for treatment 1 is increased (and is larger than the success rate of treatment 2).

We next consider an example based on Stallard (1998), who also considered a single-arm phase II trial with a Bernoulli outcome. In this case the gain function was chosen to reflect the financial costs and rewards associated with the conduct of the trial assuming that, if successful, it would be followed by a further trial with a frequentist design and analysis. Assuming the probability of success for treatment 2, here taken to be the current standard treatment, to be known, costs and rewards were taken relative to continuing to give all patients the current treatment. Thus if this treatment is recommended, the gain to patients outside the trial is taken to be zero, that is $g_2(\xi_2) = 0$. The gain per patient outside the trial if the experimental treatment is recommended, $g_1(\xi_1)$, was taken to be of a form specified in terms of constants $l$, $m$, $\alpha$, $\beta$, and $\delta_0$. This form reflects a fixed cost of $m$ per patient with a gain of $l$ per patient if the treatment is shown to be effective in the subsequent trial, where that trial has frequentist (one-sided) type I error rate of $\alpha$ and power $1 - \beta$ to detect a log-odds ratio of $\theta_1$. Stallard assumed linear discounting for $j$ less than some constant, $n_0$, with geometric discounting for larger $j$, whereas we will assume geometric discounting for all $j$. Patients in the trial were taken to have constant (discounted) cost, that is $h_1(\xi_1) = -k$, for some $k$. Since we know $n_2^* = 0$, it is not necessary to specify $h_2$. The parameter $\xi_1$ was again taken to have a Beta($a_1$, $b_1$) prior distribution. As the gain function does not depend on $\xi_2$, it is not necessary to specify a value for the point-prior for this parameter in this case.

Since $g_2(\xi_2) = 0$ for all $\xi_2$, the expression (12) takes a simplified form, (15), in which $\pi_0$ is the (univariate) prior density for $\xi_1$. The prior expected value of the gain from future patients appearing in (15) can be evaluated using numerical integration and, as $h_1(\xi_1) = -k$, we have $E_0(h_1(\xi_1)) = -k$.

Although the form of utility function proposed by Stallard (1998) was not exactly that proposed here, the parameter values of the gain function and prior distribution were chosen based on the values given there. Figure 3 shows the prior distribution for $\xi_1$ along with the form of $g_1$ in this case. Figure 4 shows the prior expected gain calculated exactly using the betabinomial prior predictive distribution for $n_1 \bar{y}_1$ for a range of values of $n_1$, plotted on a logarithmic scale, along with the approximation given by (7). It can be seen that in this case the approximation (7) to the expected gain is rather poor, particularly for smaller $n_1$. The value of $n_1^*$ obtained using (15) in this case is 102. This value and the associated prior expected gain are marked on the figure by a circle. Note again that $n_1^*$ does not maximise the approximate gain given by (7) that is shown by the dashed line on the plot. In this case the value of $n_1^*$ is quite far from the value of $n_1$ maximizing the approximate gain, though it is closer to the true optimal value of 95, which is marked on the figure by a plus sign.
Figure 3
Prior distribution for $\xi_1$ (upper panel) and gain function giving gain from treating each future patient with treatment 1 (lower panel). The gain function is shown as a dashed line on the right hand panel for comparison (see text for details).

Figure 4
Prior expected gain (solid line) and approximate prior expected gain (dashed line) for a range of $n_1$ for the second single-arm Bernoulli example (see text for details of gain function and prior distribution parameter values). The optimal and approximately optimal values of $n_1$ are marked by + and ○, respectively.

Figure 5 gives values of $n_1^*$ from (15) along with exact optima for a range of values of $N$, again with both plotted on a logarithmic scale. The approximation given by (15) is again close to the true value, and approaches it as $N$ increases, as would be expected.
Figure 5
Optimal (solid lines with points) and approximately optimal (dashed lines) values for $n_1$ for a range of $N$ values for the second single-arm Bernoulli example (see text for details of gain function and prior distribution parameter values).

The effect of varying the prior distribution for $\xi_1$ was again investigated, and is again illustrated in Web Appendix B. Web Figure 2 shows the optimal sample size for a range of $y_{01}$ and $n_{01}$ values. In this case as $y_{01}$ increases from 0.04 to 0.0845 the optimal sample size is increased in a similar way to that noted for the example above. As $y_{01}$ increases further, however, the optimal sample size is reduced. Here, since patients in the trial incur the cost $k$, there is a cost associated with experimentation, so that when there is strong prior belief that treatment 1 is superior to treatment 2, rather than giving many patients treatment 1 in a trial, a smaller trial is optimal.

At the end of the trial, treatment 1 will be recommended if the posterior expected gain $E(g_1(\xi_1) \mid \bar{y}_1)$ is positive. For large $n_1$, that is approximately if $g_1$ evaluated at the posterior mean of $\xi_1$ is positive, which, using expression (1), is true when $\bar{y}_1$ exceeds a threshold that depends on $y_{01}$, $n_{01}$ and $n_1$. Considering recommendation of treatment 1 to correspond to rejection of a null hypothesis on the value of $\xi_1$, frequentist error rates attained for the optimal designs obtained can be derived.

For one of the prior distributions considered, the optimal design gives a type I error rate of 0.39. The form of $g_1$ shown in Fig. 3 suggests that we might require a test with high power at a value of $\xi_1$ that is associated with a high gain value. The power of the optimal test in this case is 0.999.

Type I and type II error rate values for the optimal designs and prior distributions considered in Web Fig. 2 are shown in Web Fig. 3. As the prior weight becomes small, the optimal decision at the end of the trial is to select treatment 1 whenever it has observed mean exceeding the null value of $\xi_1$, so the type I error rate approaches 0.5. As the prior weight increases, since in this case the optimal value of $n_1$ is relatively small, prior information comes to dominate the final decision and the type I error rate approaches zero or one, with the type II error rate approaching one or zero, depending on whether the prior mean is less than or greater than this null value.
A two‐arm trial with Poisson data
The third example is based on an example given by Berry et al. (1994), who describe a trial of an HIB vaccine in Navajo children aged 2–18 months. The number of HIB cases is assumed to follow a Poisson distribution. Rather than expressing $n_1$, $n_2$, and $N$ in terms of child-months, we will assume that all children are followed up for the entire 16-month period, and refer to the number of children in the trial and population. The observed number of cases for child $j$ in group $i$ will be denoted by $Y_{ij}$. The distribution of $Y_{ij}$ can be parameterised such that $\xi_1$ and $\xi_2$ are the expected numbers of cases per child for treatments 1 (the new vaccine) and 2 (the placebo), respectively. Thus $Y_{ij}$ has a Poisson distribution with mean $\xi_i$, with $\xi_i$ following independent prior gamma($a_i$, $b_i$) distributions, that is with density $b_i^{a_i} \xi_i^{a_i - 1} e^{-b_i \xi_i} / \Gamma(a_i)$ for some $a_i, b_i > 0$. Note that $\xi_i$ has prior mean $a_i/b_i$ and prior variance $a_i/b_i^2$. The posterior distribution of $\xi_i$ given $\bar{y}_i$ is a gamma($a_i + n_i \bar{y}_i$, $b_i + n_i$) distribution and the prior predictive distribution of $n_i \bar{y}_i$ is negative binomial.

Berry et al. (1994) include in their gain function a term that depends on the observed data and reflects the probability of obtaining regulatory approval for the vaccine. Here, we assume the gain from a child receiving treatment $i$ depends on $\xi_i$ alone and, since $\xi_i$ gives the rate of HIB cases, which we would like to minimize, we take gain functions $g_i(\xi_i) = h_i(\xi_i) = -\xi_i$, $i = 1, 2$. The case of gain functions that depend on the observed data is considered briefly in the discussion section below.

The optimal values may be approximated using expressions (10) and (11). In this case the variance of $Y_{ij}$ is $\sigma_i^2(\xi_i) = \xi_i$ and the joint prior density is the product of the two gamma densities, so that the integral in the numerator of the expression for, for example, $n_1^*$ can be obtained in closed form.

Following Berry et al. (1994), we take gamma(1, 200) and gamma(5, 667) prior distributions for $\xi_1$ and $\xi_2$, the latter corresponding to the placebo (note that the values given by Berry et al. are 16 times those used here as they take $\xi_i$ to give the rate of cases per child-month). Berry et al. report that approximately 5400 Navajo are born each year, so that minimization of HIB cases over a 20-year period would correspond to $N = 108{,}000$. Figure 6 shows a contour plot giving the prior expected gain for this $N$ for a range of $n_1$ and $n_2$ values (plotted on logarithmic scales) together with the approximation given by (7) (dashed lines). It can be seen that even for small sample sizes, (7) gives a close approximation to the true prior expected gain. The optimal design is marked by the plus sign, and the approximately optimal design given by (10) and (11) is marked by a circle. The prior expected gain, in this case corresponding to minus one times the prior expected number of HIB cases in the population over the 20-year period, is −416.9 using the optimal design and −417.4 using the approximately optimal design.
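As a rough illustration of how approximations of this kind might be evaluated, the sketch below assumes that the asymptotic optimum for arm $i$ takes the form $n_i^* \approx \sqrt{N I / (2 D_i)}$, where $I = \int \xi\, \pi_{01}(\xi)\, \pi_{02}(\xi)\, d\xi$ and $D_i$ is the prior expected amount by which the case rate in arm $i$ exceeds the smaller of the two rates. This form is an assumption made for the sketch, obtained by taking gain functions equal to minus the case rate, and is not quoted from expressions (10) and (11). The gamma(1, 200) and gamma(5, 667) priors are those named in the figure captions, $N = 5400 \times 20$ follows the text, and all function names are hypothetical. The distribution function used requires integer shape parameters.

```python
import math


def erlang_cdf(t: float, shape: int, rate: float) -> float:
    """Gamma distribution function for an integer shape parameter."""
    term, total = 1.0, 1.0  # k = 0 term of sum_{k < shape} (rate * t)**k / k!
    for k in range(1, shape):
        term *= rate * t / k
        total += term
    return 1.0 - math.exp(-rate * t) * total


def expected_excess(shape_x, rate_x, shape_y, rate_y, upper=0.2, steps=20_000):
    """E[(X - Y)^+] = integral of P(Y < t) * P(X > t) dt, by the midpoint rule.

    The upper limit is chosen so that both gamma tails beyond it are negligible.
    """
    width = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * width
        total += erlang_cdf(t, shape_y, rate_y) * (1.0 - erlang_cdf(t, shape_x, rate_x))
    return total * width


def diagonal_integral(a1, b1, a2, b2):
    """Integral of xi * pi01(xi) * pi02(xi) over xi > 0 for two gamma densities."""
    log_value = (
        a1 * math.log(b1) + a2 * math.log(b2)
        + math.lgamma(a1 + a2) - math.lgamma(a1) - math.lgamma(a2)
        - (a1 + a2) * math.log(b1 + b2)
    )
    return math.exp(log_value)


N = 5400 * 20            # births per year times a 20-year horizon
a1, b1 = 1, 200.0        # prior for the vaccine arm (from the figure captions)
a2, b2 = 5, 667.0        # prior for the placebo arm (from the figure captions)

numerator = diagonal_integral(a1, b1, a2, b2)
d1 = expected_excess(a1, b1, a2, b2)  # E0[(xi1 - xi2)^+]
d2 = expected_excess(a2, b2, a1, b1)  # E0[(xi2 - xi1)^+]
approx_n1 = math.sqrt(N * numerator / (2.0 * d1))
approx_n2 = math.sqrt(N * numerator / (2.0 * d2))
print(round(approx_n1), round(approx_n2))
```

Under these assumptions the arm with the lower prior mean case rate receives the larger approximate sample size, which agrees with the ordering of the two arms described for this example.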
Figure 6
Contour plot of prior expected gain (solid lines) and approximate prior expected gain (dashed lines) for a range of $n_1$ and $n_2$ values assuming gamma(1, 200) and gamma(5, 667) prior distributions. The optimal and approximately optimal values of $n_1$ and $n_2$ are marked by + and ○, respectively.

Figure 7 shows the values of $n_1^*$ and $n_2^*$ along with the approximations from (10) and (11) (dashed lines) for a range of $N$ values, again with both plotted on logarithmic scales. It can again be seen how the approximations become increasingly accurate as $N$ increases.
Figure 7
Optimal (solid lines with points) and approximately optimal (dashed lines) values for n1 (upper lines) and n2 (lower lines) for a range of N values assuming gamma(1, 200) and gamma(5, 667) prior distributions.
The effect of varying the prior distributions for ξ1 and ξ2 was again investigated. Web Fig. 4 in Web Appendix B shows the optimal sample sizes for the two arms for a range of prior means and prior weights, here equal for the two priors, when N = 10800. When the prior means are equal, the optimal sample size increases with prior weight. For unequal prior means, more patients are assigned to the arm considered a priori to be superior, in this case the arm with the lower prior mean since the gain decreases with the rate of HIB cases. The number assigned to the inferior arm increases with prior weight when this is small and decreases for larger prior weight values. Web Fig. 5 shows the effect of changing the prior weight for ξ1 alone when the prior mean is equal to, greater than or less than that for ξ2. In this case increasing the prior weight leads to an increase in the optimal sample size for both arms, with the arm with the lower prior mean having a smaller optimal sample size.
The examples above compare the large sample approximation for the optimal sample size for arm i, given by expression (10), with that obtained by exact numerical optimization in two examples, enabling assessment of the approximation for smaller values of N. The derivation of (10) relies on large sample approximations in two ways: first, the distribution of the posterior expected utility is approximated by its asymptotic normal form in (5) using the central limit theorem and the delta method, and second, the derivative of the expected gain, given by (8), is approximated by (9). The first approximation is exact when the data are normally distributed and the gain functions are linear. The first and second examples suggest that both approximations are sufficiently accurate for Bernoulli data even for quite small N when the gain functions are linear, but less so for nonlinear gain functions, when the first approximation may be poor, as noted by Bernardo and Smith (2000).
For nonnormal data the accuracy of the first approximation also improves as the prior weight increases, though, as illustrated in the third example above, this also leads to smaller , so that the overall accuracy of the asymptotic approximation to the optimal sample size might be poorer.
Discussion
The work reported above leads to expressions for the optimal sample size in a clinical trial to compare two treatments, or to compare a single experimental treatment with a historical control the properties of which are assumed known. The observed data for patients receiving treatment i are assumed to follow a distribution of one parameter exponential family form, with mean ξi assumed to have a conjugate prior distribution. Optimization is based on consideration of the costs and benefits both from patients in the trial, given by some utility function for patients receiving treatment i, and from subsequent patients who will receive treatment i based on the results of the trial, given by some gain function if they receive treatment i. Although the expressions obtained could be directly used to design a trial, it is also of more general interest to see how the optimal sample size depends on the size of the population under investigation. We have shown that if the population is assumed to be of some known size, N, then for any such functions satisfying sufficient regularity conditions for expectations to exist, together with the differentiability and monotonicity conditions and the condition given by (2), the optimal sample size is O(N^{1/2}) as N → ∞. If it is assumed that there is an infinite population with geometric discounting with discounting factor λ, under the same conditions the optimal sample size is O(N∗^{1/2}) as N∗ → ∞, where N∗ is the expected population size. This extends previous work by Cheng et al. (2003).
Although we have considered general functions giving the gain to patients inside and outside the trial who receive treatment i, we have assumed that these are functions of ξi only. This is a common assumption and it seems reasonable that the benefit to a patient from taking a given treatment will depend only on the properties of that treatment (see, e.g. Lindley, 1997, who cites the seminal work by Raiffa and Schlaifer, 1961).
Noting, however, that the gain functions correspond to gain from future patients if the trial indicates that treatment i is superior, some authors have proposed gain functions for patients treated following the trial that depend, in addition to ξi, on the observed trial data. In particular, gain functions have been proposed that reflect the fact that use of a novel treatment following a trial may depend on regulatory decisions that in turn depend on whether trial results are sufficiently compelling (see, e.g. Posch and Bauer, 2013). In both of the more realistic examples described above, the gain functions given by Stallard (1998) and Berry et al. (1994) depended on the trial data, and we have simplified the gain functions when discussing these examples above.
The forms of the utility functions given above were motivated by consideration of the gain to each patient from participation in the trial or from being treated with treatment i following the trial, suggesting that the trial sample size is optimised from the patient's perspective. The general form of the expected gain given by (3), however, can express any gain so long as this can be specified on a per-patient basis. The results obtained could thus also apply to financial gains from a commercial perspective or to societal gains from development of a novel therapy. In the latter cases it may be more appropriate for the utility functions to have a more complex form or to depend on trial data as discussed above.
The condition (2) ensures that the gain per patient in the trial does not exceed that per patient outside the trial if patients were to receive optimal treatment. If this does not hold, the optimal design will be to continue with the trial forever, giving all patients the treatment for which the prior expected gain is the largest.
This restriction seems reasonable if the utility for patients in the trial reflects not only the benefit to patients in the trial receiving treatment i, but also the cost of the trial, either in financial terms for the trial sponsor or funder or in terms of commitment by the patient, both of which may be considerable.
It is interesting to compare the optimal sample sizes obtained above with sample sizes typical for clinical trials. In particular, it might be of interest to consider the size of population for which conventional sample sizes would correspond to that of the optimal design. A method for frequentist sample size calculations for a single arm trial with a Bernoulli response is given by Fleming (1982), who gives an expression for the sample size required for a trial with (one-sided) type I error rate α and a specified power to detect an improvement to a success probability of p1 from a control success probability of p0. As discussed above, the form of the gain function given in the second single arm Bernoulli data example above and shown in Fig. 3 suggests that an appropriate value for p1 might be about 0.35, since this value of ξ1 is associated with a high gain value. For the chosen values of α and power, this would give a sample size of 111. Using the gain function described above, this would be optimal for a population of size, or expected population size in the case of geometric discounting, of about 3000. The prior distribution used in the example above, and also shown in Figure 3, is such that a value of ξ1 as large as 0.35 is highly unlikely, suggesting that a smaller value could be used for p1 in the frequentist sample size calculation. The 95th percentile of the prior distribution is 0.256. To give power of 0.9 to detect a treatment effect corresponding to this value of p1 would require a sample size of 704. This would be optimal for an expected population size of a little over 100,000. It is important to note that even when the sample sizes are similar, the optimal designs obtained above may be very different from those obtained using the usual frequentist approach as, following the trial, a treatment is recommended depending on the posterior expected gains rather than on the basis of type I error rate control. As seen above, depending on the prior distribution, this can lead to type I error rates considerably higher than those conventionally used in large-scale confirmatory studies. In this regard, the designs obtained are more similar to those sometimes used for early-phase clinical trials or pilot studies (Schoenfeld, 1980; Stallard, 2012). Further comparison of frequentist and decision-theoretic approaches would be a worthwhile area for research.
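The two calculations in this comparison can be sketched in a few lines. The first function uses the standard normal-approximation sample size for a one-sided single-arm test, which is the form usually attributed to Fleming (1982); the inputs used in the example call (p0 = 0.2, α = 0.05) are hypothetical and are not the values behind the sample sizes of 111 and 704 quoted above, which cannot be recovered from the text. The second function assumes that the optimal sample size follows n = c·N^{1/2} exactly, which is only the asymptotic behaviour, and asks what population size would then correspond to n = 704 given that n = 111 corresponds to a population of about 3000. Both function names are illustrative.

```python
import math
from statistics import NormalDist


def single_arm_sample_size(p0: float, p1: float, alpha: float, power: float) -> int:
    """Normal-approximation sample size for a one-sided single-arm test of
    p = p0 against p = p1 > p0 with type I error rate alpha and given power."""
    if not 0 < p0 < p1 < 1:
        raise ValueError("require 0 < p0 < p1 < 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(p0 * (1 - p0))
                 + z_beta * math.sqrt(p1 * (1 - p1)))
    return math.ceil((numerator / (p1 - p0)) ** 2)


def population_for_sample_size(n_ref: float, N_ref: float, n_target: float) -> float:
    """Population size at which n_target would be optimal if n = c * sqrt(N)
    held exactly, calibrated so that n_ref is optimal for N_ref (assumption)."""
    return N_ref * (n_target / n_ref) ** 2


# Hypothetical inputs: a smaller target effect requires a larger trial.
n_large_effect = single_arm_sample_size(p0=0.2, p1=0.35, alpha=0.05, power=0.9)
n_small_effect = single_arm_sample_size(p0=0.2, p1=0.256, alpha=0.05, power=0.9)

# Values quoted in the text: n = 111 for N of about 3000, then n = 704.
implied_N = population_for_sample_size(n_ref=111, N_ref=3000, n_target=704)

print(n_large_effect, n_small_effect)
print(f"population size implied by square-root scaling: {implied_N:,.0f}")
```

Under exact square-root scaling the implied population size is of the same order as the "a little over 100,000" reported above; exact agreement is not expected because the scaling is asymptotic and the population size of 3000 is itself approximate.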
Conflict of interest
The authors have declared no conflict of interest.
Stallard et al Program 1
Stallard et al Program 2
Web Figure 1: Optimal sample size for first single arm Bernoulli example with N = 100 for varying weights, n01, for the prior distribution for ξ1 when the prior mean is 0.3 (left hand panel), 0.5 (centre panel) or 0.7 (right hand panel).
Web Figure 2: Optimal sample size for second single arm Bernoulli example with gain functions as described in the main text and N = 5000 for varying weights, n01, for the prior distribution for ξ1 when the prior mean is 0.04 (left hand panel), 0.0845 (centre panel) or 0.4 (right hand panel).
Web Figure 3: Type I (dashed line) and type II (dotted line) error rates for the optimal designs shown in Web Figure 2.
Web Figure 4: Optimal sample sizes for treatment group 1 (solid line) and treatment group 2 (dashed line) for two arm Poisson example with N = 10800 for varying weights, when the prior mean for treatment group 1 is 0.05 and the prior mean for treatment group 2 is 0.05 (left panel), 0.075 (centre panel) or 0.1 (right panel).
Web Figure 5: Optimal sample sizes for treatment group 1 (solid line) and treatment group 2 (dashed line) for two arm Poisson example with N = 10800 for varying weights n01, with n02 = 10, when the prior means for treatment groups 1 and 2 are respectively 0.1 and 0.05 (left panel), 0.05 and 0.05 (centre panel) or 0.05 and 0.1 (right panel).