
Comparative Study of Bayesian Information Borrowing Methods in Oncology Clinical Trials.

Liwen Su¹, Xin Chen¹, Jingyi Zhang¹, Fangrong Yan¹.

Abstract

PURPOSE: With deeper insight into precision medicine, more innovative oncology trial designs have been proposed to accommodate the characteristics of novel antitumor drugs. Bayesian information borrowing is an indispensable part of these designs and offers clear advantages in improving the efficiency of clinical trials. Bayesian methods provide an effective framework for incorporating information. However, the key question is how to choose an appropriate method for complex oncology clinical trials.
METHODS: We divided information borrowing into concurrent and nonconcurrent scenarios according to whether the data to be borrowed are observed at the same time as the current trial. We then provided an overview of the methods in each scenario and compared their performance with regard to type I error and power.
RESULTS: As demonstrated by the simulation results in each borrowing scenario, the Bayesian hierarchical model and its extensions are more appropriate for concurrent borrowing. The Bayesian hierarchical model shows great advantages when the arms are homogeneous. However, it should be adopted with caution when heterogeneity exists; in that case, we recommend the other methods, which account for heterogeneity. Borrowing information through informative priors is recommended for nonconcurrent borrowing scenarios. Multisource exchangeability models are more suitable for multiple historical trials, whereas the meta-analytic-predictive prior should be applied carefully.
CONCLUSION: Bayesian information borrowing is useful and can improve the efficiency of clinical trial designs. However, we should carefully choose an appropriate information borrowing method when facing a practical innovative oncology trial, as an appropriate method is essential to provide ideal design performance.

Year: 2022 PMID: 35263169 PMCID: PMC8926037 DOI: 10.1200/PO.21.00394

Source DB: PubMed Journal: JCO Precis Oncol ISSN: 2473-4284

INTRODUCTION

Recently, with the development of cancer molecular biology, antitumor therapy has entered the era of precision medicine. A number of innovative therapies have emerged, including immunotherapies, targeted therapies, and cancer vaccines.[1-5] The emergence of these innovative drugs has introduced new challenges to the design of oncology clinical trials.[6] More innovative methods have been proposed, and some of them are also encouraged by the US Food and Drug Administration (FDA), such as master protocol trials[7] and Complex Innovative Trial Designs,[8] which raised several discussions.

CONTEXT

Key Objective: How do we choose an appropriate information borrowing method among so many methodologies when conducting a complex innovative oncology trial?
Knowledge Generated: Borrowing information can increase power, reduce type I error, and improve efficiency when trials are homogeneous. However, borrowing methods should be applied with caution when heterogeneity exists.
Relevance: When designing new clinical trials that can borrow external information to accelerate drug development, clinical investigators can refer to this study to select more efficient and appropriate methods.

In contrast to traditional trial designs, these innovative methods encourage the use of Bayesian methods.[9] One of the most common scenarios is borrowing information when multiple parallel arms exist. For example, consider a master protocol clinical trial that encourages borrowing information across multiple substudies. The I-SPY2 trial is a success story for a platform trial that uses a Bayesian hierarchical model (BHM) to adaptively borrow information between running arms.[10] Another innovative paradigm is that historical information from external data informs the design of the current trial. Using large-scale real-world clinical data sets and high-quality completed medical data has become a new trend in clinical trials. By incorporating historical information, these methods offer the possibility of a substantially reduced sample size because of the efficient usage of external data. Information borrowing is an innovative technique of great importance in current clinical trials. It can accelerate the trial process and reduce the cost, thus ultimately improving the efficiency of clinical trials.[11]

According to the needs of current cancer clinical trials, information borrowing strategies can be divided into two types according to when the information to be borrowed is observed, similar to the division of the control group in ICH-E10.[12] The first strategy is concurrent borrowing, which is applied in master protocols. Concurrent borrowing occurs when multiple parallel arms exist, such as in basket and platform trials. Arms under these circumstances are of equal importance and are analyzed simultaneously without a chronologic order. The focus is on estimating the parameters of all arms by using the information from all arms. The second strategy is nonconcurrent borrowing, which incorporates external data. For nonconcurrent borrowing, there is only one primary trial, and the others are recognized as supplementary trials, which are not of main interest. Analyses are conducted in sequence, and the focus is mainly on estimating the parameters of the primary trial. In this case, the supplementary trials can be regarded as external or historical data. In some seamless clinical trial designs, borrowing information between different stages also belongs to this situation.

Several Bayesian information borrowing methods have been developed for both concurrent and nonconcurrent borrowing. However, a proper method must be adopted to improve the efficiency of a trial. On the basis of the existing research, we conduct a more in-depth comparison of information borrowing methods and provide recommendations for their application scenarios.

METHODS

As discussed above, there are two scenarios for information borrowing: concurrent borrowing and nonconcurrent borrowing. The Bayesian method has irreplaceable advantages because it naturally incorporates historical information through the specification of the prior distribution. Some FDA guidelines also encourage the use of the Bayesian method in this situation.[9] Ibrahim first proposed the power prior (PP) method, which is a pioneering work in this field.[13] The PP raises the likelihood of the historical data to a prespecified power α. Since then, many similar methods have been developed under this framework, such as the modified power prior (MPP),[14] calibrated power prior (CPP),[15] P value–based power prior (PvPP),[16] commensurate prior (CP),[17] meta-analytic-predictive prior (MAP),[18] robust meta-analytic-predictive prior (RMAP),[19] and multisource exchangeability models (MEMs).[20] These methods mainly borrow information through informative priors and are used largely for nonconcurrent borrowing. Another important method of borrowing information is the BHM,[21] proposed by Berry, which uses the variance in a hierarchical model to control the extent of borrowing. The BHM has been developed into different variants, such as the calibrated Bayesian hierarchical model (CBHM),[22] Bayesian hierarchical classification and information sharing (BaCIS),[23] and Bayesian cluster hierarchical model (BCHM).[24] Although MEMs are not variants of the BHM, MEMs have been extended to basket trials.[25-26] Application scenarios and the features of each method are listed in Table 1. In addition, the details of these methods are described in the Data Supplement.
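
As a minimal sketch of how the PP discounts historical data, consider a binary end point with a conjugate Beta prior. Raising the historical binomial likelihood to the power α then simply scales the historical counts by α. The function names, the Beta(1, 1) initial prior, and the response counts below are illustrative assumptions and are not taken from the study.

```python
def power_prior_posterior(y, n, y0, n0, alpha, a=1.0, b=1.0):
    """Beta posterior parameters for a response rate under a fixed-alpha power prior.

    y, n   : responders and sample size in the current arm
    y0, n0 : responders and sample size in the historical trial
    alpha  : discounting power in [0, 1] (0 = no borrowing, 1 = pooling)
    a, b   : parameters of the initial Beta prior (assumed Beta(1, 1) here)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    post_a = a + alpha * y0 + y
    post_b = b + alpha * (n0 - y0) + (n - y)
    return post_a, post_b


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: 30/100 historical responders, 12/30 current responders.
for alpha in (0.0, 0.5, 1.0):
    pa, pb = power_prior_posterior(y=12, n=30, y0=30, n0=100, alpha=alpha)
    print(f"alpha={alpha}: Beta({pa}, {pb}), posterior mean={beta_mean(pa, pb):.3f}")
```

With α = 0 the posterior depends on the current data only, and with α = 1 the historical and current data are pooled, which matches the two benchmark analyses used later in the comparison.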

TABLE 1.

Comparison of Typical Borrowing Information Methods

Consider a novel clinical trial to evaluate the efficacy of an anticancer drug candidate, with the primary end point assumed to be the objective response rate (ORR) as determined by RECIST version 1.1. First, we investigated the scenarios in which concurrent borrowing occurs, with the candidate combined with other drug candidates to compose a parallel multiarm trial. A basket trial was hypothesized in this situation. By contrast, assuming that external data or supplementary trials exist, the efficacy was studied through nonconcurrent borrowing, including single historical trials and multiple historical trials. Detailed configurations of the competing scenarios are provided in the Data Supplement.

We examined eight scenarios of concurrent borrowing, as shown in the Data Supplement. Scenarios 1, 2, and 7 denote homogeneous efficacy between the four arms, and scenarios 3, 4, 5, 6, and 8 denote heterogeneous efficacy to different extents. For a single historical trial in nonconcurrent borrowing, the continuous change in type I error and power was investigated with regard to different response rates. Another six scenarios with various heterogeneity patterns were considered for multiple historical trials, as shown in the Data Supplement. The number of simulated trials is 10,000.

RESULTS

Concurrent Scenarios

The methods to be compared include independent analysis, BHM, CBHM, BaCIS, BCHM, and MEMs. The probabilities of rejecting the null hypothesis for each indication are shown in Table 2 and the Data Supplement. Furthermore, additional sensitivity analyses can be found in the Data Supplement.

TABLE 2.

Rejection Rate of the Null Hypothesis for Concurrent Scenarios

Compared with independent analysis, all five hierarchical models can control type I error and increase power when the arms are homogeneous (scenarios 1, 2, and 7). BHM gives the best results, followed by MEMs. However, in heterogeneous scenarios (scenarios 3-6 and 8), BHM is inferior to the other models. For example, in scenarios 4 and 6, where most arms are effective and there is only one ineffective arm, BHM shows unacceptable type I error inflation, up to 0.2825 and 0.2497, respectively. When the number of effective arms is dominant (scenarios 4, 6, and 8), MEMs show great advantages: their power is similar to that of BHM, whereas their type I error is much smaller. However, in scenario 5, where most arms are ineffective, information borrowing from ineffective arms does not bring power advantages to MEMs.

BaCIS and CBHM show higher efficiency, exhibiting high power while maintaining type I error in a relatively acceptable range, followed by BCHM. CBHM tends to maintain the lowest type I errors. This results from the calibration of the parameters a and b in exp{a + b × log(χ²)}. The control can also be relaxed by a different parameter setting according to risk preference. In scenario 3, where the number of effective and ineffective arms is balanced, BaCIS is the most effective, reflecting the advantage of dichotomous clustering. It is worth mentioning that, for indications 2 and 4 in scenario 6, information is borrowed simultaneously from arms with response rates of 0.15 and 0.4, and the biases from both arms offset each other, leading to an apparently unbiased result. This is unreasonable and requires special attention in practice.

The results for BCHM lie between those of BaCIS and CBHM. Theoretically, BCHM allows adaptive determination of the number of clusters. However, dividing the arms into effective and ineffective arms by a hypothesis test is actually a dichotomous result. Under such a hypothesis test, more clusters show no advantage. Detailed results for the root mean square error of the ORR estimate for each indication can be found in the Data Supplement.
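
The role of a and b in the expression exp{a + b × log(χ²)} mentioned above can be illustrated with a short sketch. It assumes that the expression defines the between-arm variance of the hierarchical model and that χ² is the Pearson statistic for homogeneity of response rates across arms; both points, the function names, and the values of a and b are assumptions made for illustration and are not the calibrated values used in the simulations.

```python
import math


def chi_square_homogeneity(responses, sizes):
    """Pearson chi-square statistic for equal response rates across arms."""
    pooled = sum(responses) / sum(sizes)
    if pooled in (0.0, 1.0):
        return 0.0  # all arms have identical outcomes, no evidence of heterogeneity
    stat = 0.0
    for y, n in zip(responses, sizes):
        expected_resp = n * pooled
        expected_non = n * (1.0 - pooled)
        stat += (y - expected_resp) ** 2 / expected_resp
        stat += ((n - y) - expected_non) ** 2 / expected_non
    return stat


def shrinkage_variance(stat, a, b):
    """Map the homogeneity statistic to a variance via exp{a + b * log(stat)}."""
    stat = max(stat, 1e-8)  # guard against log(0) when arms are identical
    return math.exp(a + b * math.log(stat))


# Hypothetical calibration values and data (four arms of 20 patients each).
A, B = -2.0, 3.0
homogeneous = chi_square_homogeneity([6, 5, 6, 5], [20, 20, 20, 20])
heterogeneous = chi_square_homogeneity([3, 3, 3, 10], [20, 20, 20, 20])
print("variance, homogeneous arms:  ", shrinkage_variance(homogeneous, A, B))
print("variance, heterogeneous arms:", shrinkage_variance(heterogeneous, A, B))
```

With b > 0, a larger statistic gives a larger variance and therefore weaker borrowing, so the choice of a and b sets how strictly type I error is controlled.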

Single Historical Trial in Nonconcurrent Scenarios

The operating characteristics of each method are shown in Figure 1. Additional sensitivity analyses can be found in the Data Supplement. When the ORR in the current control arm is the same as the ORR in the historical trial (30%), borrowing information can result in a reduced type I error (Fig 1A), increased power (Fig 1C), and an unbiased estimation of the effect size (Figs 1B and 1D). This means that in case of homogeneity between the current control arm and the historical trial, all information borrowing methods are favorable.

FIG 1.

Simulation results for scenarios of a single historical trial. (A) Type I error under the null hypothesis against different ORR in current control arm. (B) Bias under the null hypothesis against different ORR in current control arm. (C) Power under the alternative hypothesis against different ORR in current control arm. (D) Bias under the alternative hypothesis against different ORR in current control arm. CP, commensurate prior; CPP, calibrated power prior; MAP, meta-analytic-predictive prior; MEMs, multisource exchangeability models; MPP, modified power prior; ORR, objective response rate; PP, power prior; PvPP, P value–based power prior.

However, when the trials were heterogeneous, the operating characteristics differed between the methods. Taking PP (α = 0), that is, independent analysis, as the benchmark, PP with α ≥ 0.5 can lead to a sharp change in type I error, power, and bias. The range of change was positively correlated with α. In the most extreme case, PP (α = 1), that is, pooled analysis, can produce a type I error higher than 0.3 (Fig 1A) and a power lower than 0.6 (Fig 1C). The PP method is too sensitive to heterogeneity, which is unacceptable in practice. For the other five methods, which assess heterogeneity, the curves are typically S-shaped and type I/II errors are controlled within an acceptable range. This is because these methods can reduce the amount of borrowed information when the current trial data show great heterogeneity from the historical data. Concerning bias, the changes in the curves are similar to those of the type I/II error. The bias curve for PP, except for PP with α = 0, is linear in the ORR of the current control arm (pc), whereas the bias curves for the five methods that consider heterogeneity are S-shaped and the estimation bias remains bounded, reflecting a tradeoff between precision gains and bias loss.

Differences were also detected among the five models that consider heterogeneity. First, in MPP, α is estimated from only one historical trial and one current control arm and will not be accurate in most cases. As a result, MPP does not have strong control of type I/II error when heterogeneity exists, so type I error continuously increases, as shown in Figure 1. Moreover, MPP is a fully data-driven method, which leaves little room for researchers' opinions on information borrowing. CP has a smaller type I error than MPP and finally reaches a plateau because it introduces commensurability parameters to measure heterogeneity. In CPP, PvPP, and MEMs, ideal statistical performance can be achieved as long as the design parameters are carefully set. Theoretically, these three methods can achieve very similar tradeoff profiles after sufficient calibration of the parameters. When the ORR in the current control arm is substantially larger than that in the historical arm, they can identify the difference and reduce the borrowing, resulting in decreased type I error. Among these methods, CPP exerts more control on type I error, whereas PvPP and MEMs achieve higher power. It should be pointed out that, when using CPP, it is sometimes difficult to specify the boundaries between fully homogeneous and fully heterogeneous situations. Therefore, CPP is more suitable for bioequivalence trials, in which an equivalence boundary is usually defined by regulators as [0.8, 1.25]. In a single historical trial, the performance of MEMs is similar to that of PvPP. However, the difference between them is more obvious in multiple historical trials, owing to the different borrowing mechanisms.
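
The sensitivity of the fixed-α PP to heterogeneity described above can be explored with a small Monte Carlo sketch. It is not the simulation code of the study: the sample sizes, the Beta(1, 1) priors, the posterior probability threshold of 0.975, and the number of replicates are all assumptions chosen so that the example runs quickly. Only the control arm borrows from the historical trial.

```python
import random


def prob_treatment_better(post_t, post_c, rng, draws=300):
    """Monte Carlo estimate of P(p_t > p_c | data) from two Beta posteriors."""
    wins = sum(rng.betavariate(*post_t) > rng.betavariate(*post_c)
               for _ in range(draws))
    return wins / draws


def rejection_rate(p_control, p_treat, alpha, n=50, y0=30, n0=100,
                   threshold=0.975, n_sims=200, seed=1):
    """Proportion of simulated trials that declare the treatment superior.

    The control posterior borrows from a historical trial (y0/n0) through a
    power prior with fixed alpha; the treatment arm uses current data only.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        y_c = sum(rng.random() < p_control for _ in range(n))
        y_t = sum(rng.random() < p_treat for _ in range(n))
        post_c = (1 + alpha * y0 + y_c, 1 + alpha * (n0 - y0) + n - y_c)
        post_t = (1 + y_t, 1 + n - y_t)
        if prob_treatment_better(post_t, post_c, rng) > threshold:
            rejections += 1
    return rejections / n_sims


# Null hypothesis with heterogeneity: the current control ORR (0.45) is higher
# than the historical ORR (0.30), and the treatment has no effect.
for alpha in (0.0, 1.0):
    print(f"alpha={alpha}: type I error ~ {rejection_rate(0.45, 0.45, alpha):.3f}")
```

In this setting, pooling (α = 1) pulls the control estimate toward the lower historical ORR, so a treatment with no effect is declared superior more often than under independent analysis (α = 0).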

Multiple Historical Trials in Nonconcurrent Scenarios

For multiple historical trials, the performance metrics were the same as those in the simulation of a single historical trial. The simulation results are presented in Table 3. Additional sensitivity analyses can also be found in the Data Supplement.

TABLE 3.

Results for Scenarios of Multiple Historical Trials

PP, MPP, PvPP, and CP produced compromise results between independent and pooled analysis, similar to those in single historical trials. In most scenarios (Data Supplement), the type I error and power for MEMs, CPP, MAP, and RMAP are bounded by those of independent analysis and pooled analysis, reflecting the nature of borrowing information as a discounting of historical data. The exceptions are scenario 6 (Data Supplement) and scenarios 2 and 5 (Data Supplement).

In scenario 6, MAP and RMAP have a larger type I error than the independent analysis. One possible interpretation is that the prior mean for scenario 4 is closer to 0.3 than that in scenario 5, which is an asymmetric mapping (Table 4). Therefore, for scenario 6, MAP and RMAP borrow more from historical trials with response rates lower than 0.3, which results in type I error inflation.

For scenario 2, CPP outperforms independent analysis because the parameter settings in CPP allow it to regard a difference of 0.1 as almost totally heterogeneous (Data Supplement). Therefore, in scenario 2, CPP can exclude historical trials 1 and 2 and borrow information only from historical trial 3, yielding higher power than the independent method. MEMs fail to identify a difference of 0.1, so they tend to borrow information and have relatively low power in scenario 2. In scenario 5, where the difference can reach up to 0.2, MEMs successfully identify the homogeneous trials and exclude the heterogeneous one (Data Supplement). This scenario is more meaningful for practical applications, as it is difficult to statistically justify whether trials with a smaller difference (such as 0.1) should be borrowed. As a result, MEMs, along with CPP, perform excellently in scenario 5 and significantly better than the independent analysis. In the other scenarios in the Data Supplement, the methods mainly lie between independent and pooled analysis.

In conclusion, pairwise comparison strategies (such as MEMs) are more effective in multiple historical trials than combined comparisons (such as PP and PvPP).
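
The pairwise mechanism of MEMs can be sketched for a binary end point as follows. Each historical trial is either exchangeable with the current arm (its data are pooled with the current data) or not (it receives its own response rate), and every such pattern is weighted by its marginal likelihood. The Beta(1, 1) priors, the equal prior weights on patterns, the function names, and the response counts are assumptions made for illustration, not the settings used in the study.

```python
import math
from itertools import product


def log_beta(a, b):
    """Logarithm of the Beta function."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def mem_inclusion_probabilities(y, n, hist, a=1.0, b=1.0):
    """Posterior probability that each historical trial is exchangeable.

    y, n : responders and sample size in the current arm
    hist : list of (responders, sample size) for the historical trials
    Binomial coefficients are omitted because they are equal in all patterns.
    """
    log_ml = {}
    for pattern in product((0, 1), repeat=len(hist)):
        successes, failures = y, n - y
        value = 0.0
        for include, (yh, nh) in zip(pattern, hist):
            if include:  # pooled with the current arm
                successes += yh
                failures += nh - yh
            else:        # modeled with its own response rate
                value += log_beta(a + yh, b + nh - yh) - log_beta(a, b)
        value += log_beta(a + successes, b + failures) - log_beta(a, b)
        log_ml[pattern] = value
    top = max(log_ml.values())
    weights = {p: math.exp(v - top) for p, v in log_ml.items()}
    total = sum(weights.values())
    return [sum(w for p, w in weights.items() if p[h]) / total
            for h in range(len(hist))]


# Hypothetical data: current arm 15/50; two similar and one discordant trial.
probs = mem_inclusion_probabilities(15, 50, [(30, 100), (31, 100), (60, 100)])
print([round(p, 3) for p in probs])
```

Because each source is judged separately, a discordant trial can be excluded without discarding the homogeneous ones, which is the behavior described for scenario 5.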

TABLE 4.

MAP Prior Generated in Each Scenario

No significant advantages were observed for MAP or RMAP compared with other methods in these scenarios. One possible reason is the bias of the prior mean. The other reason is that the MAP standard deviation is not sensitive to the different heterogeneity scenarios, owing to the small number of historical trials and the transformation between binary and continuous variables (Table 4). Taking scenarios 1 and 3 as an example, we expected that MAP would have a smaller standard deviation than RMAP, but the opposite was observed. We could not obtain a precise variance because there were only three historical trials. In addition, when the binary variable ORR is converted into a continuous variable and hierarchical models for meta-analysis are constructed, the logit function has the natural property that logit(p) varies in a relatively small range even though p has changed significantly. The possibility of detecting heterogeneity was further compressed after the transformation. Compared with MAP, RMAP can solve this problem to some extent by introducing a noninformative part of the prior. As a result, we can see a smaller bias in RMAP than in MAP, as shown in Table 3, when the historical trials are not completely homogeneous.
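
The effect of adding a noninformative part to the prior can be illustrated with a two-component Beta mixture. As a simplifying assumption, the MAP prior is approximated here by a single Beta(15, 35) distribution (mean 0.3), and the vague component is Beta(1, 1); the mixture weight of 0.8, the function names, and the data are hypothetical as well.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def robust_mixture_update(y, n, informative, w=0.8, vague=(1.0, 1.0)):
    """Update a prior w * Beta(informative) + (1 - w) * Beta(vague).

    Returns the posterior weight of the informative component and the
    posterior mean of the response rate.
    """
    if not 0.0 < w < 1.0:
        raise ValueError("w must lie strictly between 0 and 1")
    components = [(w, informative), (1.0 - w, vague)]
    # Log of weight times beta-binomial marginal likelihood; the binomial
    # coefficient is shared by both components and therefore omitted.
    logs = [math.log(wt) + log_beta(a + y, b + n - y) - log_beta(a, b)
            for wt, (a, b) in components]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]
    post_w = [u / sum(unnorm) for u in unnorm]
    post_mean = sum(pw * (a + y) / (a + b + n)
                    for pw, (_, (a, b)) in zip(post_w, components))
    return post_w[0], post_mean


# Hypothetical current control data: consistent (15/50) and conflicting (30/50).
for y in (15, 30):
    weight, mean = robust_mixture_update(y, 50, informative=(15.0, 35.0))
    print(f"y={y}: informative weight={weight:.3f}, posterior mean={mean:.3f}")
```

When the current data conflict with the informative component, its posterior weight drops and the estimate moves toward the current data, which is one way to read the smaller bias of RMAP relative to MAP.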

DISCUSSION

In conclusion, Bayesian information borrowing is very useful in improving the performance of oncology clinical trial designs. Given the above results, we arrive at the decision-making conclusion summarized in Figure 2. For multiple parallel arm trials, such as master protocol trials, because subtrials are conducted simultaneously and the efficacy in all running arms is considered, it is recommended to choose a BHM and its extensions. When all arms are homogeneous (either effective or ineffective), the BHM method gains greater power than other methods. However, in practice, it is more common that we are not sure whether all arms are homogeneous, and methods with dynamic clustering are recommended, such as BaCIS and MEMs, to avoid type I error inflation. Furthermore, when the majority of the arms are expected to be effective, MEMs are recommended because of their excellent power performance. When the end point is ordinal or the division of the effective population is more refined, BCHM is recommended. For trials in which external data or supplementary trials exist, nonconcurrent borrowing methods are more appropriate, because historical data are available and we are mainly concerned with the current trial. PP and MPP failed to control type I errors when heterogeneity existed; thus, they were not appropriate for most scenarios. Methods other than PP take heterogeneity into consideration and gain an advantage in power. Specifically, the choice of method depends on the tradeoff between bias and variance. CPP provides a stricter criterion for controlling type I error, whereas MEMs place more weight on power gains. The performance of other methods lies between that of CPP and MEMs. In particular, CPP is naturally suitable for bioequivalence trials, whereas MEMs show an advantage in multiple historical trials. MAP and RMAP should be used with caution when the number of multiple historical trials is small.

FIG 2.

Decision-making diagram for how to choose an appropriate borrowing information method. BaCIS, Bayesian hierarchical classification and information sharing; BCHM, Bayesian cluster hierarchical model; BHM, Bayesian hierarchical model; CBHM, calibrated Bayesian hierarchical model; CP, commensurate prior; CPP, calibrated power prior; MAP, meta-analytic-predictive prior; MEM, multisource exchangeability model; MPP, modified power prior; PP, power prior; PvPP, P value–based power prior; RMAP, robust meta-analytic-predictive prior.

In addition, patient covariates sometimes differ, leading to heterogeneity in efficacy. This is more common in nonconcurrent scenarios, which usually span a long period. Some methods, which have characteristics similar to those of nonconcurrent borrowing, have been proposed on the basis of covariate adjustment.[27-31]

Regarding the borrowing strength parameters of each method, we used the values recommended in the literature. In fact, there are two common ways to determine these parameters: specifying them empirically and estimating them objectively. For empirical specification, we can communicate with clinical investigators to learn how much information they intend to borrow. In addition, we can refer to sensitivity analyses in published articles to obtain the recommended values or select particular settings according to the required performance of the new trial. For objective estimation, we can either estimate these parameters[25] or use model averaging/model selection to fit the most likely value.[32] The above are general suggestions. For more details on the setting of key parameters in each method, we refer readers to Viele's work.[33]

It is worth noting that the difference between concurrent and nonconcurrent borrowing in our study is the information source. Methods commonly applied in the two scenarios are listed in Table 1. However, in terms of statistical models, their use is not so strictly differentiated. Some methods can be applied to both concurrent and nonconcurrent scenarios, such as the MEMs in Table 1. In fact, other methods such as MAP and BHM are not only applicable to the recommended scenario but can also be applied to the other scenario in terms of model construction when the source of information is properly explained. The specification of statistical models is flexible as long as it satisfies the practical clinical requirements.

In conclusion, whatever method one finally chooses when conducting an oncology trial, sufficient simulations should be performed to explore the statistical performance under various scenarios, and adequate communication should be undertaken to obtain the regulator's approval.

26 in total

1. First-in-Human Trial of a Novel Anti-Trop-2 Antibody-SN-38 Conjugate, Sacituzumab Govitecan, for the Treatment of Diverse Metastatic Solid Tumors.

Authors: Alexander N Starodub; Allyson J Ocean; Manish A Shah; Michael J Guarino; Vincent J Picozzi; Linda T Vahdat; Sajeve S Thomas; Serengulam V Govindan; Pius P Maliakal; William A Wegener; Steven A Hamburger; Robert M Sharkey; David M Goldenberg
Journal: Clin Cancer Res Date: 2015-05-05 Impact factor: 12.531

2. Trends in the global immuno-oncology landscape.

Authors: Jun Tang; Laura Pearce; Jill O'Donnell-Tormey; Vanessa M Hubbard-Lucey
Journal: Nat Rev Drug Discov Date: 2018-10-19 Impact factor: 84.694

3. ComPAS: A Bayesian drug combination platform trial design with adaptive shrinkage.

Authors: Rui Tang; Jing Shen; Ying Yuan
Journal: Stat Med Date: 2018-11-12 Impact factor: 2.373

Review 4. A dynamic power prior for borrowing historical data in noninferiority trials with binary endpoint.

Authors: G Frank Liu
Journal: Pharm Stat Date: 2017-11-10 Impact factor: 1.894

Review 5. Combining precision radiotherapy with molecular targeting and immunomodulatory agents: a guideline by the American Society for Radiation Oncology.

Authors: Robert G Bristow; Brian Alexander; Michael Baumann; Scott V Bratman; J Martin Brown; Kevin Camphausen; Peter Choyke; Deborah Citrin; Joseph N Contessa; Adam Dicker; David G Kirsch; Mechthild Krause; Quynh-Thu Le; Michael Milosevic; Zachary S Morris; Jann N Sarkaria; Paul M Sondel; Phuoc T Tran; George D Wilson; Henning Willers; Rebecca K S Wong; Paul M Harari
Journal: Lancet Oncol Date: 2018-05 Impact factor: 41.316

6. Propensity-score-based meta-analytic predictive prior for incorporating real-world and historical data.

Authors: Meizi Liu; Veronica Bunn; Bradley Hupf; Junjing Lin; Jianchang Lin
Journal: Stat Med Date: 2021-06-14 Impact factor: 2.373

Review 7. Use of historical control data for assessing treatment effects in clinical trials.

Authors: Kert Viele; Scott Berry; Beat Neuenschwander; Billy Amzal; Fang Chen; Nathan Enas; Brian Hobbs; Joseph G Ibrahim; Nelson Kinnersley; Stacy Lindborg; Sandrine Micallef; Satrajit Roychoudhury; Laura Thompson
Journal: Pharm Stat Date: 2013-08-05 Impact factor: 1.894

8. A Calibrated Power Prior Approach to Borrow Information from Historical Data with Application to Biosimilar Clinical Trials.

Authors: Haitao Pan; Ying Yuan; Jielai Xia
Journal: J R Stat Soc Ser C Appl Stat Date: 2016-12-23 Impact factor: 1.864

9. An innovative plasmacytoid dendritic cell line-based cancer vaccine primes and expands antitumor T-cells in melanoma patients in a first-in-human trial.

Authors: Julie Charles; Laurence Chaperot; Dalil Hannani; Juliana Bruder Costa; Isabelle Templier; Sabiha Trabelsi; Hugo Gil; Anaick Moisan; Virginie Persoons; Harald Hegelhofer; Edith Schir; Jean-Louis Quesada; Christophe Mendoza; Caroline Aspord; Olivier Manches; Pierre G Coulie; Amir Khammari; Brigitte Dreno; Marie-Thérèse Leccia; Joel Plumas
Journal: Oncoimmunology Date: 2020-04-12 Impact factor: 8.110