Randomized reverse marker strategy design for prospective biomarker validation.

Abstract

We describe a novel study design for validating marker-based treatment strategies meant to select among possible therapeutic options using a biologic marker. Comparing it with existing designs in realistic scenarios, we demonstrate that this design is more than four times more efficient for testing the interaction between a marker and its intended treatment. Our analysis employs a simple parametric framework that uncovers systematic biases in currently proposed designs and suggests how they may be accommodated or enumerated. In the context of markers for choosing a treatment for recurrent ovarian cancer, our proposal requires sample sizes on the order of recently completed phase II and III studies, making validation studies for this clinical decision scenario viable.

Keywords: biomarker validation; interaction; ovarian cancer; randomized trial; trial design

Substances:
Biomarkers, Tumor

Year: 2014 PMID: 24639051 PMCID: PMC4107176 DOI: 10.1002/sim.6146

Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.373

1. Introduction

Acknowledging that tumor heterogeneity contributes to the variety in response to treatment 1 and noting the rise in the discovery of therapeutics whose response is limited to a subgroup (e.g., gefitinib and epidermal growth factor receptor mutant patients) 2 or in compounds whose beneficial effects are tied to a marker (e.g., tamoxifen and estrogen receptor, Herceptin and Her2/neu), there is significant interest in finding biomarkers that can be used to target treatments. Unfortunately, the clinical benefits of promising markers are rarely realized 3. One reason may be the surprisingly large sample sizes required to test the superiority of a marker-based (MB) treatment versus nonmolecular treatment plans. One example from a prospective, randomized design 4 requires about 1000 patients to detect a hazard ratio of 0.70; in another optimal design, a 13% difference in response rate requires about 500 patients 5. These large numbers add to the burden of discovery and may make trials prohibitively expensive for rarer cancers. Even so, validation by multiple independent prospective trials is a necessary step in the development of newly characterized markers 6,7. An economical way to test a marker is to test for treatment effects in a specific marker subset following an enriched or targeted design 8. However, it is also of interest to know whether the marker is relevant beyond a single stratum to quantify the practical, clinical impact of an MB strategy. A desirable design is both randomized and targeted 9; hereafter, we use targeted to mean that the marker's predictions have been included in the assignment of treatments in the study. The ultimate goal of validation, then, is to interrogate a biomarker strategy: a predictive marker linked to treatments to form a predictive strategy. For testing strategies, there exists a handful of study designs 4, which either directly test the clinical context of MB treatment or indirectly evaluate the interaction effect.
Which design is most efficient is unclear: investigations have found that the situations in which one design is more efficient than another are ambiguous 10. In this article, we propose a new study design that we call a reverse marker (RM) strategy design. As in existing marker-strategy designs (viz. 4), it employs a two-arm randomization scheme, provides a direct estimate of the marker-strategy response rate, and evaluates the interaction between the marker and possible treatments. The design appears to use one quarter of the observations required to test the same interaction in likely scenarios. Evaluated in the context of recurrent ovarian cancer, this efficiency makes it possible to conduct studies at effect sizes reported in the literature where previous designs could not. In deriving this new design, we employ a parametric framework to characterize the exact hypotheses under study. This analysis reveals, in all of the designs, an implicit bias that relates to the marginal difference in effect between candidate treatments. Fortunately, it is straightforward to adjust planned studies to accommodate these biases using this framework. We first introduce three common designs for testing interactions as well as our proposed reverse-marker strategy design (Section 2). A nontechnical discussion of design considerations focuses on the contexts for and against the new design (Section 2.1). Section 3 reviews the parametric framework and makes more technical comparisons. Our ovarian cancer case study in Section 4 illustrates the relative efficiency of the designs and tests their sensitivity to marker prevalence.

2. Randomized designs for prospective biomarker validation

Throughout the article, we consider four designs whose schemas are given in Figure 1; the first three are described in 4, and sample sizes for a class including designs 2 and 3 are discussed in 5. Design 1 is the stratified, marker-interaction (MI) study and is distinct from the others because it randomizes treatments stratified on marker status. Design 2, the MB strategy, and design 3, the modified MB (MMB) strategy design, randomize patients to receive treatments following an MB assignment or a standard clinical decision workflow and thus directly test the marker-strategy hypothesis. For brevity, the reader is referred to the articles cited earlier for more details.

Figure 1

Four designs for marker validation studies. Shaded boxes indicate the arms used in the planned analysis. Parameters below each box are the expected response rate in each arm using the notation in Section 3. Designs 1–3 are described in 4. Design 4 is a novel proposal.

Design 4, the RM strategy design, is a novel proposal. As in the MB and MMB designs, this is a direct marker strategy design with two arms where one arm follows the MB strategy. The complementary arm of the RM design, instead of a default treatment (MB) or a randomly drawn treatment (MMB), tests the reverse treatment hypothesis that M+ patients should be assigned to treatment B and M− patients should be assigned to treatment A. Each of these designs is intended to test the utility of a marker strategy by conducting a simple test of proportions (MB, MMB, RM) or a test of the additive interaction (MI, defined in Appendix 1) between the major arms shaded in Figure 1. Therefore, any statistical guarantees given by the associated test reflect on the qualities of the design, and we will refer to test and design interchangeably.

2.1. Considerations for testing marker strategies

Before proceeding into the technical discussion, we summarize the key considerations for the trial designs, pointing to the relevant statistical discussion in Section 3 and case studies in Section 4.

Is our goal to test for a treatment effect in a marker positive subset only? The targeted or enriched designs assay patients first and then select only marker positive patients to study. Typically, the studied subset is based on a biological hypothesis or a drug's mechanism of effect and might be used in cases where a drug has shown little marginal effect in the general population. However, the trial design says nothing about the marker negative patients and may be less relevant in the clinical setting when the marker is rare. One notes that the power and sample sizes for enriched designs 8 must be similar to the situation where a marker is always present because, in that case, the MI, RM, and MB designs become equivalent to the enriched design. These designs are not fully identical because of their own specific considerations.

Is our goal to test the interaction or the clinical strategy in all the patients? The MI design is intended to test whether the active treatment is unusually effective in marker positive patients. The MB, MMB, and RM designs are powered to test the joint deployment of the marker and active treatment as a strategy (Section 3.1). The difference is that the clinical strategy takes into account the marker prevalence as well as the interaction to characterize the potential impact of the marker strategy on clinical care. A key difference between the MI (measuring the interaction) and the RM (testing the strategy) designs is whether the measurement of the marker implicitly represents an intent to treat. In the MI design, patients can be dropped from study between measurement and randomization. While this control of patients into arms is appealing, because patients may be excluded once an arm closes, its benefit seems mixed because accrual rates may be low if the marker prevalence is extreme. In contrast, all patients are randomized, measured, treated, and evaluated in the RM design. Thus, a quantity of interest to some study designers (especially for rare diseases) may be the expected number of patients who must be assayed in order to reach the required accrual goal. The MB and MMB designs are in a gray area because the marker needs to be measured in only one arm of the trial.

The active treatment should be appropriate for all patients. The MI, MMB, and RM designs assume that the active treatment can be given to marker negative patients. In the MI and RM designs, the marker values for all patients are assessed, so the design might be modified to exclude contraindications: marker negative patients may be reassigned to the control treatment regardless of randomization. In the MB and MMB designs, patients randomized to the non-MB treatment arm have unknown status that may allow assignment to the contraindicated regimen (Section 3.2).

Is there a marginal drug effect? For candidate drugs in search of marker-defined subgroups, it is likely that there is existing evidence of no beneficial effect at a previously tested population level. In general, if there is a significant difference, there seems to be little need to conduct a marker trial. In the situation that a marker trial is warranted, the framework in the next section provides a method for adjusting for expected differences (Section 3.3). In our computational study, the RM sample requirements were robust to variation in the marginal difference (Section 4.3).

How prevalent is the marker? Marker prevalence affects the clinical relevance of a marker strategy. In our computational example, we noted that the RM design was relatively robust to variation in the prevalence (Section 4.2). With respect to the logistics of accruing patients, if the marker is extremely rare, then one may consider enriched or targeted designs that study only the marker positive setting. During the design phase, investigators may wish to compare the expected accrual rates among different designs.

Statistical efficiency. The power of the RM design dominates that of the MB and MMB designs in most reasonable cases (Section 3.4). In the simulation study, we consider tests based on the RM and MI designs that are similar but powered for different comparisons (Sections 4.1 and 4.4).

3. Probability framework for marker strategies

Suppose that binary variable Y represents a patient's response to their randomly assigned treatment, T ∈ {A, B}. Let the binary marker under study, M ∈ {M+, M−}, have level M+ associated with higher response rates and marker prevalence P(M = M+) = π where 0 < π < 1. We assume that a patient's response depends only on their marker value and the treatment to which they are assigned. That is, P(Y | T = i, M = j) = θ_ij where

$$ \theta_{ij} = \beta_0 + \beta_A I(i = A) + \beta_+ I(j = M^+) + \beta_I I(i = A, j = M^+). $$

Here, I(⋅) is the indicator function, β0 represents a baseline effect, β_A represents the added effect of treatment A, β_+ the effect of a positive marker, and β_I is a nonadditive effect. The notation is summarized in Table 1.

Table 1

Summary of mean parametrization and effect parametrization.

Treatment	Marker Status	Mean Notation	Effect Notation
A	M⁺	θ_A+	β₀ + β_A + β₊ + β_I
A	M⁻	θ_A−	β₀ + β_A
B	M⁺	θ_B+	β₀ + β₊
B	M⁻	θ_B−	β₀

For completeness, we denote the marginal effect of treatment A as θ_A = P(Y | T = A), likewise θ_B for treatment B. It is generally assumed that B is a standard treatment and that A is under study for its increased effect (β_A > 0) or MB effect (β_I > 0). Under this framework, the MB strategy under consideration assigns M+ patients to treatment A and M− patients to treatment B. Defining the marginal effect of treatment A over B to be

$$ \gamma = \theta_A - \theta_B = \beta_A + \pi\beta_I, $$

where γ = 0 corresponds to no difference, Section 3.1 will show that existing designs implicitly make assumptions on γ that can lead to anti-conservative analyses. We state that the marker-strategy validation designs (MB, MMB, RM) intend to test H0: π(1 − π)β_I = 0. We derive this quantity in Section 3.1 by considering exactly what hypothesis is tested by each design. Intuitively, this hypothesis contains both the marker prevalence π and an effect β_I. Notably, β_I focuses on the specificity of the marker effect: a marker that is only prognostic (β_+ > 0) will not aid treatment; a treatment that is independently superior (γ > 0) does not necessarily require a marker. The interaction (β_I) captures the information about whether the strategy to assign M+ patients to treatment A has more merit than its individual components. Targeted or enriched designs 8 that randomize within M+ and M− strata are similar to the MI design 4, except that the targeted design intends to examine only the M+ arm, testing the null hypothesis that H0: θ_A+ = θ_B+. The MI design is really a test of both arms, H0: {θ_A+ = θ_B+, θ_A− = θ_B−}. We emphasize the adjective marker-strategy to denote designs meant to test H0: π(1 − π)β_I = 0. Finally, while we consider binomial responses in this article, a similar argument can be made by considering the same additive model parametrization of the (negative) log hazard of a survival time. The corresponding log-rank test between arms and its sample size computation are based on an equivalent quantity 11, so many of the following results translate directly.
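As a minimal sketch of the parametrization in Table 1, the following Python snippet converts four cell response rates into the effect notation and computes the marginal difference γ. The class name `Rates`, the function names, and the numeric rates are illustrative assumptions and do not come from the article; the identity γ = β_A + πβ_I checked at the end follows from averaging the rows of Table 1 over the marker prevalence.

```python
from dataclasses import dataclass
from math import isclose


@dataclass(frozen=True)
class Rates:
    """Response rates for the four treatment-by-marker cells of Table 1."""
    a_pos: float  # theta_{A+}: treatment A, marker positive
    a_neg: float  # theta_{A-}: treatment A, marker negative
    b_pos: float  # theta_{B+}: treatment B, marker positive
    b_neg: float  # theta_{B-}: treatment B, marker negative


def effects(r: Rates) -> dict:
    """Solve the additive model of Table 1 for its four effects."""
    return {
        "beta0": r.b_neg,
        "beta_A": r.a_neg - r.b_neg,
        "beta_pos": r.b_pos - r.b_neg,
        "beta_I": (r.a_pos - r.b_pos) - (r.a_neg - r.b_neg),
    }


def marginal_difference(r: Rates, prev: float) -> float:
    """gamma = P(Y | T = A) - P(Y | T = B), averaged over marker prevalence."""
    theta_a = prev * r.a_pos + (1 - prev) * r.a_neg
    theta_b = prev * r.b_pos + (1 - prev) * r.b_neg
    return theta_a - theta_b


# Hypothetical rates chosen only for illustration (not from the article).
example = Rates(a_pos=0.45, a_neg=0.25, b_pos=0.30, b_neg=0.20)
eff = effects(example)
gamma = marginal_difference(example, prev=0.4)

# gamma should agree with beta_A + prev * beta_I.
print(eff)
print(round(gamma, 3), isclose(gamma, eff["beta_A"] + 0.4 * eff["beta_I"]))
```

Working from cell rates rather than effects keeps the sketch close to how scenarios are tabulated in Section 4.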

3.1. Expected response rates and hypotheses under consideration

In the MI design, the test of interaction is based on the comparison of treatment effect in each marker arm. It expects to test H0: Δ1 = 0 where

$$ \Delta_1 = (\theta_{A+} - \theta_{B+}) - (\theta_{A-} - \theta_{B-}) = \beta_I. $$

We give a statistic testing the linear interaction in the Appendix. One might alternatively test the differences by constructing the 2 × 2 table of responders given treatment arm and marker status and then using Fisher's exact test, the test of proportions or a χ2-test; the power of such a test depends on the odds ratio and is inconvenient in this notation. The planned analyses of the MB, MMB, and RM designs each test the difference in response rates in the two arms. Let φ1 = πE(Y | M = M+, T = A) + (1 − π)E(Y | M = M−, T = B) be the expected response to an MB strategy. The non-marker strategy arm has expected response rate φ2 = E(Y | T = B) in the MB design; φ3 = E(Y | T = A) / 2 + E(Y | T = B) / 2 in the MMB design; and φ4 = πE(Y | M = M+, T = B) + (1 − π)E(Y | M = M−, T = A) in the RM design. By comparing the response rate across these two arms, the designs test H0: Δ_k = 0 where Δ_k = φ1 − φ_k for k = 2, 3, 4 (MB, MMB, and RM). It can be shown that the expected differences are as follows:

$$ \Delta_2 = \pi(1-\pi)\beta_I + \pi\gamma, $$
$$ \Delta_3 = \pi(1-\pi)\beta_I + (\pi - 1/2)\gamma, $$
$$ \Delta_4 = 2\pi(1-\pi)\beta_I + 2(\pi - 1/2)\gamma. $$

In the case that γ = 0, it is sensible that the three direct designs test the MB strategy effect, π(1 − π)β_I, as this depends on the prevalence of the marker as well as the expected interaction effect. The distinction between the indirect and direct designs is evident here: the direct designs account for the clinical utility of the marker and treatment. In the case that the marker is very common or very rare, the likelihood of a situation that may be adjudicated by a marker is low and π(1 − π)β_I reflects this. We expect that given the same number of patients, the RM design will have more power to detect deviations from H0: π(1 − π)β_I = 0 than the MMB design, because it has twice the signal (Δ4 = 2Δ3); simulation studies easily demonstrate this effect in supplemental material. Relative sample size calculations follow in Section 3.4.
This amplification of effect comes from observing that φ3 = φ1 / 2 + φ4 / 2. This means that between the two arms in the modified MB strategy design, ceteris paribus, half the patients would have received the same treatment regardless of randomization. The RM design arises by recognizing that we can adjust the randomization point to minimize redundancy.
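The expected arm rates can be checked numerically. The sketch below computes φ1 through φ4 from the definitions in this section and compares the resulting differences with closed forms obtained by substituting the Table 1 effect notation, namely Δ2 = π(1 − π)β_I + πγ, Δ3 = π(1 − π)β_I + (π − 1/2)γ, and Δ4 = 2Δ3. The function names and the numeric rates are hypothetical and serve only as an illustration.

```python
from math import isclose


def arm_rates(a_pos, a_neg, b_pos, b_neg, prev):
    """Expected response rates phi1..phi4 for the arms defined in Section 3.1."""
    theta_a = prev * a_pos + (1 - prev) * a_neg  # everyone receives A
    theta_b = prev * b_pos + (1 - prev) * b_neg  # everyone receives B
    phi1 = prev * a_pos + (1 - prev) * b_neg     # marker-based strategy arm
    phi2 = theta_b                               # MB comparator: default B
    phi3 = 0.5 * theta_a + 0.5 * theta_b         # MMB comparator: coin flip
    phi4 = prev * b_pos + (1 - prev) * a_neg     # RM comparator: reversed
    return phi1, phi2, phi3, phi4


def closed_form_differences(beta_i, gamma, prev):
    """Delta_2, Delta_3, Delta_4 written in terms of beta_I, gamma and pi."""
    strategy = prev * (1 - prev) * beta_i
    delta2 = strategy + prev * gamma
    delta3 = strategy + (prev - 0.5) * gamma
    return delta2, delta3, 2 * delta3


# Hypothetical cell rates (not from the article) with prevalence 0.4.
a_pos, a_neg, b_pos, b_neg, prev = 0.45, 0.25, 0.30, 0.20, 0.4
phi1, phi2, phi3, phi4 = arm_rates(a_pos, a_neg, b_pos, b_neg, prev)
direct = (phi1 - phi2, phi1 - phi3, phi1 - phi4)

beta_i = (a_pos - b_pos) - (a_neg - b_neg)
gamma = (prev * a_pos + (1 - prev) * a_neg) - (prev * b_pos + (1 - prev) * b_neg)
formula = closed_form_differences(beta_i, gamma, prev)

for name, d, f in zip(("MB", "MMB", "RM"), direct, formula):
    print(name, round(d, 4), round(f, 4))
```

The RM difference is twice the MMB difference for any choice of rates because φ3 is the average of φ1 and φ4.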

3.2. Treatment assignment frequency and balance

An informative comparison is to consider how each design assigns patients to the four possible groups: AM + , AM − , BM + , and BM − . The expected fractions are summarized in Table 2. We observe that all combinations of treatment and marker are possible in the MI, MMB, and RM designs. In these cases, the investigator must be prepared to use all of the possible levels.

Table 2

Expected fraction of patients assigned to treatment groups A and B by design.

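Design	Marginal		Marker and treatment				Probability of same treatment
	A	B	A,M⁺	A,M⁻	B,M⁺	B,M⁻	
1 (MI)	1/2	1/2	π/2	(1 − π)/2	π/2	(1 − π)/2	n/a
2 (MB)	π/2	1 − π/2	π/2	0	π/2	1 − π	1 − π
3 (MMB)	(1 + 2π)/4	(3 − 2π)/4	3π/4	(1 − π)/4	π/4	3(1 − π)/4	1/2
4 (RM)	1/2	1/2	π/2	(1 − π)/2	π/2	(1 − π)/2	0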

π is the prevalence of M+ markers. The last column refers to the probability that the same treatment is assigned regardless of randomization to marker-strategy arm or not. MI, marker interaction; MB, marker based; MMB, modified marker based; RM, reverse marker.

The MI and RM designs have identical assignment rates and can be applied in similar situations. We will see later that the designs are not fully equivalent because the MI design is powered to test treatment effects in each marker arm separately, while the RM design ought to be powered for the interaction hypothesis directly. Note that the marginal frequency of treatment assignments in the MB and MMB designs depends on marker prevalence. When π is very small, this dependence may lead to inefficiency: if M+ is rare, the MB strategy is largely concordant with the non-MB strategy. In contrast, the MI and RM designs show a marginal sense of balance by assigning treatments in equal weight, invariant to π. For the MI design, this is a result of the stratified approach that treats marker groups as cohorts and does not use the marker to supervise treatment assignment. The RM design's balance comes because marker values should be evenly randomized across arms.
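The assignment fractions discussed here can be enumerated directly from the design descriptions. The sketch below is one way to do so, assuming 1:1 randomization between arms and, for the MI design, 1:1 randomization to A or B within each marker stratum; the dictionaries `STRATEGY`, `REVERSE`, `ALL_B`, `COIN` and the function names are hypothetical constructs for illustration, not part of the article.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
ONE = Fraction(1)

# Each rule maps marker status to a distribution over treatments.
STRATEGY = {"+": {"A": ONE}, "-": {"B": ONE}}   # marker-based strategy arm
REVERSE = {"+": {"B": ONE}, "-": {"A": ONE}}    # reverse-strategy arm
ALL_B = {"+": {"B": ONE}, "-": {"B": ONE}}      # default treatment B
COIN = {"+": {"A": HALF, "B": HALF}, "-": {"A": HALF, "B": HALF}}

# Each design is a list of (probability of arm, assignment rule).
DESIGNS = {
    "MI": [(ONE, COIN)],
    "MB": [(HALF, STRATEGY), (HALF, ALL_B)],
    "MMB": [(HALF, STRATEGY), (HALF, COIN)],
    "RM": [(HALF, STRATEGY), (HALF, REVERSE)],
}


def assignment_fractions(design, prev):
    """Expected fraction of patients in each (treatment, marker) cell."""
    marker_prob = {"+": prev, "-": 1 - prev}
    cells = {(t, m): Fraction(0) for t in "AB" for m in "+-"}
    for weight, rule in DESIGNS[design]:
        for m, p_m in marker_prob.items():
            for t, p_t in rule[m].items():
                cells[(t, m)] += weight * p_m * p_t
    return cells


def marginal(cells, treatment):
    """Overall fraction of patients receiving the given treatment."""
    return sum(v for (t, _), v in cells.items() if t == treatment)


def same_treatment_probability(design, prev):
    """Chance a patient gets the same treatment in either arm (two-arm designs)."""
    arms = DESIGNS[design]
    if len(arms) != 2:
        return None  # not applicable to the stratified MI design
    first, second = arms[0][1], arms[1][1]
    marker_prob = {"+": prev, "-": 1 - prev}
    return sum(
        p_m * sum(p * second[m].get(t, 0) for t, p in first[m].items())
        for m, p_m in marker_prob.items()
    )


prev = Fraction(1, 4)  # illustrative prevalence
for name in DESIGNS:
    cells = assignment_fractions(name, prev)
    print(name, marginal(cells, "A"), same_treatment_probability(name, prev))
```

Exact fractions are used so that the balance of the MI and RM designs can be compared without rounding error.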

3.3. Unequal treatment effects

When γ ≠ 0, the guarantee on type I control in the test of H0: π(1 − π)β_I = 0 may be affected. Simply, the test may be invalid if β_I = 0 does not always imply that Δ_k = 0. Without loss of generality, we may consider the γ > 0 case. In the MB design, {β_I = 0, γ > 0} implies that Δ2 = πγ > 0, so the design is always anti-conservative and will falsely identify an interaction more often than it should. This occurs because only the marker-strategy arm receives the superior (γ > 0) treatment, which is aliased with marker effect 4. In the MMB and RM designs, {β_I = 0, γ > 0} implies that Δ_k = 0 only when π = 1 / 2, corresponding to even chance of assignment to the superior treatment. Because they decrease Δ3 (and hence Δ4 = 2Δ3), rarer markers (π < 1 / 2) will make the test more conservative, while more prevalent markers will make the test anti-conservative. While concerning, the latter case is less relevant: highly prevalent markers in the presence of unequal treatments represent a case where biomarker-mediated treatment is likely to be redundant. Thus, we have identified the bias, which can be accounted for during study design by estimates from prior studies. If we adjust for the bias, the test becomes the two-sample test that the response proportions differ by the bias, namely, H0: Δ3 = (π − 1 / 2)γ for the MMB design and H0: Δ4 = 2(π − 1 / 2)γ for the RM design. Note that the MB design cannot be adjusted (if γ ≠ 0 and Δ2 = 0, then π must be zero).
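To illustrate the direction of these biases, the sketch below approximates how often an unadjusted test would reject in favor of the marker strategy when there is no interaction (β_I = 0) but treatment A is marginally better (γ > 0). It rests on several assumptions not stated in the article: a normal approximation, rejection only in the upper tail at the two-sided critical value (so the nominal rate is α/2), no prognostic effect, and hypothetical values for the baseline rate, γ, and the per-arm sample size.

```python
from math import sqrt
from statistics import NormalDist


def null_arm_rates(prev, gamma, base=0.20):
    """Arm rates when beta_I = 0 and beta_+ = 0 (A responds at base + gamma)."""
    rate_a, rate_b = base + gamma, base
    return {
        "strategy": prev * rate_a + (1 - prev) * rate_b,  # phi1
        "MB": rate_b,                                     # phi2
        "MMB": 0.5 * rate_a + 0.5 * rate_b,               # phi3
        "RM": prev * rate_b + (1 - prev) * rate_a,        # phi4
    }


def upper_rejection_rate(phi1, phik, n_per_arm, alpha=0.05):
    """Approximate chance that the z statistic exceeds the upper critical value."""
    z = NormalDist()
    se = sqrt((phi1 * (1 - phi1) + phik * (1 - phik)) / n_per_arm)
    return z.cdf((phi1 - phik) / se - z.inv_cdf(1 - alpha / 2))


def false_positive_rate(design, prev, gamma=0.10, n_per_arm=200):
    """Upper-tail rejection rate of the unadjusted test with no interaction."""
    rates = null_arm_rates(prev, gamma)
    return upper_rejection_rate(rates["strategy"], rates[design], n_per_arm)


for prev in (0.2, 0.5, 0.8):
    row = {d: round(false_positive_rate(d, prev), 3) for d in ("MB", "MMB", "RM")}
    print(prev, row)
```

Under these assumptions the MB rate exceeds the nominal level at every prevalence, while the MMB and RM rates fall below it for π < 1/2 and above it for π > 1/2.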

3.4. Relative efficiency of designs for testing a marker-strategy hypothesis

We consider the relative sample sizes required by each design. Because the MI design is intended to test treatment effects in each stratified arm before testing for the interaction, it is recommended to power each arm separately 8,12. The formula is listed in the Appendix. Given a particular set of response rates, θ_A+, θ_A−, θ_B+, θ_B−, a target level (α) and power (1 − β), the sample sizes for the MB, MMB, and RM designs are computed as follows. Under one-to-one randomization, let n_k be the number of patients required in each arm in design k > 1. As in the previous section, suppose that φ1 is the expected response rate in the marker arm and φ_k, k > 1, is the rate in the other arm under design k, where Δ_k = φ1 − φ_k. The required sample size for each arm in a test of proportions between the two randomization arms is

$$ n_k = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \left[\phi_1(1-\phi_1) + \phi_k(1-\phi_k)\right]}{\Delta_k^2}, $$

where z_α is the αth quantile of a standard normal distribution and β here denotes the type II error rate rather than a model effect. Because Δ4 = 2Δ3, the relative sample size required by designs 3 and 4 is

$$ \frac{n_3}{n_4} = 4 \left( \frac{\phi_1(1-\phi_1) + \phi_3(1-\phi_3)}{\phi_1(1-\phi_1) + \phi_4(1-\phi_4)} \right). \tag{12} $$

So, if φ4 = φ3 or φ4 = 1 − φ3, then n3 / n4 = 4; the MMB design uses four times more subjects than the RM design. It can be shown that in general, φ4 < min{φ3, 1 − φ3} or φ4 > max{φ3, 1 − φ3} assures n3 / n4 > 4. When φ1 is close to zero or one, the relative efficiency is most sensitive to φ3 and φ4. Inversely, φ1 = 1 / 2 minimizes their effect. So, the parenthetical term in Equation (12) may be bounded by two ratios:

$$ r_0 = \frac{\phi_3(1-\phi_3)}{\phi_4(1-\phi_4)}, \qquad r_1 = \frac{1/4 + \phi_3(1-\phi_3)}{1/4 + \phi_4(1-\phi_4)}. $$

As a function of φ4, the ratio r0 is lowest at φ4 = 1 / 2 by concavity, so the zeroes of the parabola mark a boundary: for (2 − √3) / 4 < φ3 < (2 + √3) / 4 (approximately, 0.07 < φ3 < 0.93) and 0 < φ4 < 1, r0 ≥ 4φ3(1 − φ3) > 1 / 4 and n3 / n4 > 1. In the other direction, considering φ1 = 1 / 2 and r1, a similar argument shows that r1 > 1 / 4 for all values of φ3 and φ4. These represent a conservative boundary as there are more extreme cases where the ratio is larger than 1/4. In summary, the RM design is more efficient than the MMB design for 0.07 < φ3 < 0.93 and is more than four times more efficient for φ4 < min{φ3, 1 − φ3} or φ4 > max{φ3, 1 − φ3}.
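A short numerical sketch of the sample size comparison follows. It assumes the usual two-sided, normal-approximation formula for a test of two proportions with unpooled variance, n_k = (z_{1−α/2} + z_{1−β})² [φ1(1 − φ1) + φ_k(1 − φ_k)] / Δ_k², which reproduces the 158-patient RM figure quoted in Section 4; the function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(phi1, phik, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided test of two proportions
    (normal approximation, unpooled variance, 1:1 randomization)."""
    z = NormalDist()
    scale = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    variance = phi1 * (1 - phi1) + phik * (1 - phik)
    return scale * variance / (phi1 - phik) ** 2


def mmb_to_rm_ratio(phi1, phi4):
    """n3 / n4, using phi3 = (phi1 + phi4) / 2 from Section 3.1."""
    phi3 = 0.5 * phi1 + 0.5 * phi4
    return n_per_arm(phi1, phi3) / n_per_arm(phi1, phi4)


# Arms for the Table 3 scenario with prevalence 0.5 and rates
# A: 0.30 / 0.30, B: 0.10 / 0.50 (M+ / M-):
# phi1 = 0.5 * 0.30 + 0.5 * 0.50 and phi4 = 0.5 * 0.10 + 0.5 * 0.30.
phi1, phi4 = 0.40, 0.20
rm_total = 2 * ceil(n_per_arm(phi1, phi4))
ratio = mmb_to_rm_ratio(phi1, phi4)
print(rm_total, round(ratio, 2))
```

Here φ4 = 0.20 lies below min{φ3, 1 − φ3} = 0.30, so the ratio exceeds four, in line with the condition stated above.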

4. Recurrent ovarian cancer study planning

Our specific motivation comes from the validation of biomarkers meant to guide maintenance treatment of recurrent, advanced ovarian cancer. These cancers respond to initial platinum treatment but commonly relapse and, through a cycle of serial treatments, become increasingly platinum resistant 13. While several approved chemotherapies are available 14, the best recurrent alternative to platinum treatment is unclear 15. This treatment decision is presently guided by previous response to therapy, which is coarse, is variable, and requires an intervention to evaluate 1. The use of genomic biomarkers offers an individually relevant guide 16, but these quantities will need to be evaluated through prospective study. As such, our general intent is to consider the characteristics of study designs that will be employed in the next phases of research. A review of platinum-resistant cancers in phase II and III studies without markers 17 reports sample sizes ranging from 27 to 254 total patients in single and double arm trials with response rates ranging from 0.06 to 0.18 for single agent and 0.22 to 0.40 for double agent therapies (each representing a different clinical context). These numbers are consistent with reviews citing a 0.10–0.20 response, regardless of treatment, in previously treated platinum-resistant cancers 1. Subsequently, we study the required sample sizes (Equations (10) and (16)) as a function of β_I and π, and we discuss tests of interaction versus stratification. The intention is to illustrate the use of the designs in study planning and is not meant to be comprehensive of all (β_I, π) scenarios. Throughout the section, we consider sample sizes for tests at level α = 0.05 and power 1 − β = 0.80.

4.1. Sample sizes for interaction tests

Denoting typical frontline treatment, a platinum and taxane, as treatment B, we imagine that we have a marker with prevalence π = 0.5, which is predictive in platinum/taxol treated patients: high-marker values have a response rate of 0.10, and low-marker values have a response rate of 0.50 (the marginal rate is 0.30). Obviously, a marker that simply tells us that some patients will not respond to treatment has an important, but limited, value. Thus, we imagine a search for a treatment combination that improves response in the high-marker patients. Table 3A parameterizes our scenario. The platinum/taxol combination is listed with fixed response rates. The response rate of an alternative active treatment (treatment A) is parameterized by β_I. The marginal rates are fixed at 0.30, so γ = 0 for all β_I. Note that while the response rate for AM+ patients may be lower than for AM− patients, as long as β_I > 0, marker positive patients respond to the active treatment better than to treatment B. Figure 2(A) shows the required sample sizes as a function of 0 < β_I < 0.25. We observe that the MB and MMB designs have the same sample size requirement when π = 0.5 and are less efficient than the MI and RM designs (as expected per Section 3.4). The RM design is about twice as efficient as the MI design, which may be attributed to the fact that the MI design divides patients four ways (AM+, AM−, BM+, BM−), while the RM design requires only two (marker strategy and the reverse).

Table 3

Recurrent ovarian cancer treatment example scenarios used in Section 4.

Treatment	Response rate
Treatment	M⁺	M⁻	Population
A. Parametrization under 0 < β_I < 0.5, π = 0.5, γ = 0
A	0.10 + β_I	0.50 − β_I	0.30
B	0.10	0.50	0.30
B. β_I = 0.20 used for π ≠ 0.5 study
A	0.30	0.30	0.30π + 0.30(1 − π)
B	0.10	0.50	0.10π + 0.50(1 − π)
C. β_I = 0.20, π = 0.6 used for − 0.3 < γ < 0.7
A	0.30 + γ	0.30 + γ	0.30 + γ
B	0.10	0.50	0.30

Fixed values are taken from literature.

Figure 2

(A)–(C) Required sample sizes for ovarian cancer scenarios outlined in Table 3. The vertical lines highlight the β_I = 0.2, π = 0.5, and γ = 0 scenarios where the marker is uninformative in one and predictive in the other treatment. (D) Power of interaction and stratification tests given the computed sample size for the reverse marker design.

We consider what effect sizes can be detected using the roughly 200 patients accrued in the previously described second-line ovarian studies. For the MI design, 200 patients require β_I = 0.24 (a 0.34 response rate in M+ and 0.26 in M− patients); similarly, the RM design with 200 patients could detect a β_I = 0.18 scenario (0.28 vs 0.32 response rates). These effects are not inconsistent with the meta-reviews 17. Subsequently, we consider the β_I = 0.20 case where the MI design calls for 298 and the RM design calls for 158 patients, which is close to the largest reported study sizes 18.
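The RM figures in this subsection can be approximated with a few lines of Python. The sketch assumes the same two-sided normal-approximation sample size formula for a test of two proportions (α = 0.05, power 0.80), the Table 3A parametrization at π = 0.5, and a search grid in steps of 0.01; the function names and the grid are illustrative choices. The MI calculation relies on the Appendix formula, which is not reproduced here, so it is omitted.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(phi1, phik, alpha=0.05, power=0.80):
    """Per-arm sample size, two-sided test of two proportions (unpooled)."""
    z = NormalDist()
    scale = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    return scale * (phi1 * (1 - phi1) + phik * (1 - phik)) / (phi1 - phik) ** 2


def rm_total_patients(b, prev=0.5):
    """Total RM sample size when Table 3A is parameterized by the value b."""
    a_pos, a_neg = 0.10 + b, 0.50 - b   # treatment A rows of Table 3A
    b_pos, b_neg = 0.10, 0.50           # treatment B rows of Table 3A
    phi1 = prev * a_pos + (1 - prev) * b_neg   # marker-strategy arm
    phi4 = prev * b_pos + (1 - prev) * a_neg   # reverse-strategy arm
    return 2 * ceil(n_per_arm(phi1, phi4))


# Smallest grid value whose RM requirement fits within about 200 patients.
grid = [i / 100 for i in range(5, 26)]
detectable = min(b for b in grid if rm_total_patients(b) <= 200)

print(rm_total_patients(0.20), rm_total_patients(0.18), detectable)
```

Rounding each arm up to a whole patient before doubling is one convention among several; other rounding rules may shift the totals by a patient or two.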

4.2. Sample sizes as a function of prevalence

We select the β_I = 0.2 scenario for further study because it calls for a realistic number of patients (n = 158) and has a reasonable magnitude of effect. Table 3B reparameterizes the scenario in terms of the marker prevalence π. Note that in this baseline scenario, the marker has no effect in treatment A, but it does indicate that for M+ patients, A is the better treatment. In this case, the RM design dominates all of the others for all prevalence values, so the efficiency gain seen in Figure 2(A) is invariant to π. Noting that the MB and RM designs have the same sample size requirement at π = 1, by design, these must be equivalent to the enriched or targeted design where only M+ patients are randomized to treatment A or B. The sample size for the MI design does not vary as we may simply close the arm when the required number of patients is accrued. The MB design requires fewer patients as π increases, concordant with the idea that more patients are being assigned to the active treatment. Further, there does exist a range of π where the MB design is more economical than the MI design (as reported in 10), although this is likely to be an unrealistic clinical scenario.
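The dependence on prevalence can be explored with the following sketch, which evaluates the MB, MMB, and RM requirements for the Table 3B rates across several values of π. It again assumes the two-sided normal-approximation formula for two proportions at α = 0.05 and power 0.80; the function names and the chosen prevalence values are illustrative, and the MI design is left out because its formula is given only in the Appendix.

```python
from math import ceil
from statistics import NormalDist

# Table 3B response rates (M+ / M-) for treatments A and B.
A_POS, A_NEG, B_POS, B_NEG = 0.30, 0.30, 0.10, 0.50


def n_per_arm(phi1, phik, alpha=0.05, power=0.80):
    """Per-arm sample size, two-sided test of two proportions (unpooled)."""
    z = NormalDist()
    scale = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    return scale * (phi1 * (1 - phi1) + phik * (1 - phik)) / (phi1 - phik) ** 2


def total_sample_sizes(prev):
    """Total patients needed by the MB, MMB and RM designs at prevalence prev.

    prev must be positive: at prev = 0 the MB arms coincide and the
    required sample size is undefined.
    """
    theta_a = prev * A_POS + (1 - prev) * A_NEG
    theta_b = prev * B_POS + (1 - prev) * B_NEG
    phi1 = prev * A_POS + (1 - prev) * B_NEG          # marker-strategy arm
    comparators = {
        "MB": theta_b,                                # everyone receives B
        "MMB": 0.5 * theta_a + 0.5 * theta_b,         # random treatment
        "RM": prev * B_POS + (1 - prev) * A_NEG,      # reverse strategy
    }
    return {name: 2 * ceil(n_per_arm(phi1, phik)) for name, phik in comparators.items()}


for prev in (0.25, 0.5, 0.75, 1.0):
    print(prev, total_sample_sizes(prev))
```

At π = 1 the MB and RM comparator arms coincide, so the two designs return the same requirement, consistent with the equivalence to the enriched design noted above.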

4.3. Sample sizes as a function of marginal treatment effect

We reparameterize the β_I = 0.2 case again in Table 3C to depend on − 0.3 < γ < 0.7, the marginal treatment effect. Because Δ4 depends on γ(π − 1 / 2), we select the π = 0.6 case to avoid the insensitivity of the RM and MMB designs to γ at π = 1 / 2. In Figure 2(C), the MMB and RM sample sizes are still surprisingly insensitive to γ, while the MI design's sample size requirement increases mildly as the marginal effect increases.

4.4. Power to test stratified differences

We consider the β_I = 0.2 scenario as a function of π again (Table 3B) and fix the sample size at the targeted value required for the RM design (n4). Under this scenario, we compute the power for various RM-based and MI-based tests relative to the 80% RM interaction (marker strategy) test and plot the results in Figure 2(D). To compute power by simulation, we generated 10,000 datasets and reported the fraction of tests significant at level α = 0.05. Power for the MI interaction test can be evaluated by formula: fixing n = n4, we invert the sample size formula to obtain the power. Notably, the test has consistently lower power than the RM interaction test at n4 for all values of π. We have implemented both a stratified t-test and a χ2-test (goodness of fit). The former tests that one treatment is superior in both arms (separately), so the π = 0.5 case is a null scenario (and power approaches the appropriate target 0.05). The goodness of fit test is powered against the range of π, but both are consistently underpowered versus the interaction test.
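The simulation approach to power can be sketched for the RM marker-strategy test alone. The code below is an illustration under stated assumptions rather than the authors' implementation: it uses a two-sided Wald z-test of two proportions, draws each arm's responders as independent Bernoulli outcomes at the arm-level rates φ1 and φ4, and defaults to 2,000 replicates (the article used 10,000) to keep the run short. The function name, seed, and replicate count are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_rm_power(phi1, phi4, n_per_arm, n_sim=2000, alpha=0.05, seed=1):
    """Fraction of simulated RM trials in which a two-sided Wald test of
    proportions between the strategy and reverse arms is significant."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sim):
        # Responders in each arm; marker status is averaged into phi1 and phi4.
        x1 = sum(rng.random() < phi1 for _ in range(n_per_arm))
        x4 = sum(rng.random() < phi4 for _ in range(n_per_arm))
        p1, p4 = x1 / n_per_arm, x4 / n_per_arm
        se = sqrt(p1 * (1 - p1) / n_per_arm + p4 * (1 - p4) / n_per_arm)
        if se > 0 and abs(p1 - p4) / se > crit:
            rejections += 1
    return rejections / n_sim


# Table 3B rates at prevalence 0.5 give phi1 = 0.40 and phi4 = 0.20,
# with 79 patients per arm (158 in total).
power = simulate_rm_power(0.40, 0.20, n_per_arm=79)
print(round(power, 3))
```

With these inputs the estimated power should land near the 0.80 design target, up to Monte Carlo error and the slight liberality of the Wald test at this sample size.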

5. Conclusion

We have proposed a new design for a randomized prospective marker validation study that is significantly more efficient for testing marker strategies than existing designs in scenarios motivated by our ovarian cancer work. Using literature-based estimates of the available sample sizes, we determine that this design is a step toward making these studies possible where previous designs have made them logistically infeasible. Pragmatically, for situations where a randomly selected treatment has a better than 7% response rate, the RM design is more efficient than the MMB design, and we have given bounds for when it is more than four times more efficient. This design is balanced: randomization frequencies for each treatment are equal independent of marker prevalence. While the MI design balances treatments without using the marker in treatment assignment, the RM design maintains balance and implements an MB strategy mimicking the clinical workflow. Both of these properties have a place in phased biomarker development. While it is difficult to match specific designs to specific situations abstractly, we have provided some guidance on the considerations for marker strategy versus interaction designs as well as a set of parameters to consider during the design phase. It is of primary importance that the study designer is clear on what quantity best represents their research question: the interaction, the marker strategy, or the subgroup treatment effect.

18 in total

Review 1. Standard treatment in advanced ovarian cancer in 2005: the state of the art.

Authors: M A Bookman
Journal: Int J Gynecol Cancer Date: 2005 Nov-Dec Impact factor: 3.437

Review 2. Biomarkers in cancer staging, prognosis and treatment selection.

Authors: Joseph A Ludwig; John N Weinstein
Journal: Nat Rev Cancer Date: 2005-11 Impact factor: 60.716

Review 3. Diagnosis and management of epithelial ovarian cancer.

Authors: Snehal Bhoola; William J Hoskins
Journal: Obstet Gynecol Date: 2006-06 Impact factor: 7.661

4. A general framework of marker design with optimal allocation to assess clinical utility.

Authors: Liansheng Tang; Xiao-Hua Zhou
Journal: Stat Med Date: 2012-07-26 Impact factor: 2.373

5. Weekly topotecan in heavily pretreated patients with recurrent epithelial ovarian carcinoma.

Authors: David M O'Malley; Masoud Azodi; Anita Makkenchery; Jacob Tangir; Jessica McAlpine; Michael Kelly; Peter Schwartz; Thomas Rutherford
Journal: Gynecol Oncol Date: 2005-08 Impact factor: 5.482

Review 6. Clinical trial designs for predictive marker validation in cancer treatment trials.

Authors: Daniel J Sargent; Barbara A Conley; Carmen Allegra; Laurence Collette
Journal: J Clin Oncol Date: 2005-03-20 Impact factor: 44.544

Review 7. Ovarian cancer: markers of response.

Authors: Young-Jeong Na; John Farley; Audrey Zeh; Marcela del Carmen; Richard Penson; Michael J Birrer
Journal: Int J Gynecol Cancer Date: 2009-12 Impact factor: 3.437