
Random-effects meta-analysis for systematic reviews of phase I clinical trials: Rare events and missing data.

Mi-Ok Kim¹, Xia Wang², Chunyan Liu¹, Kathleen Dorris³, Maryam Fouladi⁴, Seongho Song².

Abstract

Phase I trials aim to establish appropriate clinical and statistical parameters to guide future clinical trials. With individual trials typically underpowered, systematic reviews and meta-analysis are desired to assess the totality of evidence. A high percentage of zero or missing outcomes often complicate such efforts. We use a systematic review of pediatric phase I oncology trials as an example and illustrate the utility of advanced Bayesian analysis. Standard random-effects methods rely on the exchangeability of individual trial effects, typically assuming that a common normal distribution sufficiently describes random variation among the trial level effects. Summary statistics of individual trial data may become undefined with zero counts, and this assumption may not be readily examined. We conduct Bayesian semi-parametric analysis with a Dirichlet process prior and examine the assumption. The Bayesian semi-parametric analysis is also useful for visually summarizing individual trial data. It provides alternative statistics that are computed free of distributional assumptions about the shape of the population of trial level effects. Outcomes are rarely entirely missing in clinical trials. We utilize available information and conduct Bayesian incomplete data analysis. The advanced Bayesian analyses, although illustrated with the specific example, are generally applicable.


Keywords: meta-analysis; missing data; semi-parametric Bayesian analysis; sparse outcomes; systematic review


Year: 2016 PMID: 27285532 PMCID: PMC5149121 DOI: 10.1002/jrsm.1209

Source DB: PubMed Journal: Res Synth Methods ISSN: 1759-2879 Impact factor: 5.273

Introduction

Systematic reviews and meta‐analyses have grown in popularity as the need to base decisions on the totality of relevant and sound evidence in medicine has been increasingly recognized (Sutton and Higgins, 2008). They are particularly useful for phase I trials as individual trials are generally underpowered to appropriately establish clinical and statistical parameters in order to guide future drug development. A high percentage of zero or missing outcomes often complicate such efforts as illustrated below. In this paper, we use a case study and describe the utility of advanced Bayesian analysis, specifically semi‐parametric analysis with a Dirichlet process prior and incomplete data analysis. We are concerned with random‐effects analysis in this paper. In one treatment group studies like phase I without within trial comparators, discrepancies in population, outcomes, exposures/interventions, design and/or conduct across individual trials may affect how the effects of the single treatment arms are realized across individual trials directly. The assumptions of random individual trial effects are well justified. The case study example comes from a systematic review of pediatric phase I oncology trials in patients with relapsed or refractory solid tumors (Dorris et al., 2015). This study reviewed publications that evaluated the safety and efficacy of molecularly targeted and cytotoxic agents. Molecularly targeted agents, the focus of recent drug development efforts, primarily inhibit tumor cell growth without necessarily killing tumor cells. In contrast, cytotoxic agents kill tumor cells. This systematic review aimed to establish appropriate clinical and statistical parameters to guide future drug development for novel molecularly targeted drugs by examining whether the previously identified efficacy and toxicity rates of phase I cytotoxic trials can be generalized to studies of molecularly targeted agents. 
The efficacy and safety outcomes were defined by rare events, with zero counts reported in 66% and 48% of the molecularly targeted and the cytotoxic agent trials, respectively. Incomplete toxicity data further complicated the meta‐analysis of this systematic review. The difficulty of handling zero counts is well known in the meta‐analysis literature when two treatment group studies are concerned (e.g. Friedrich et al., 2007 and references therein). Random‐effects meta‐analysis for one treatment group studies is similarly affected. Standard response rate measures (e.g. the odds on the log scale for binary outcomes) may become undefined along with their variances, and methods relying on such statistics may not be directly applicable. The generalized linear mixed model (GLMM) method and full Bayesian (FB) methods have been introduced specifically to address this issue (Kooiman et al., 2012, Hamer et al., 2012, Appelman‐Dijkstra et al., 2011, Cai et al., 2010, Singh et al., 2009, Sweeting et al., 2004, Platt et al., 1999, Smith et al., 1995). However, these methods rely on the exchangeability of individual trial effects, typically assuming that a common normal distribution sufficiently describes random variation among them. The assumption may not hold when a high percentage of zero events exists. We conduct Bayesian semi‐parametric analysis with a Dirichlet process prior and examine the assumption. The Dirichlet process prior is assumed for the population of individual trial level effects and allows a general shape for the population, such as a heavy tailed or multi‐modal distribution. Under this prior, we obtain the posterior predictive distributions of the individual trial effects and estimate the population of individual trial effects. The semi‐parametric analysis is also useful for visually summarizing individual trial data. Summary statistics of individual trial data may become undefined with zero counts and are then unavailable, for example, for constructing forest plots. 
The semi‐parametric analysis provides alternative statistics based on the posterior distributions of individual trial effects. The statistics are obtained free of parametric distributional assumptions about the population of individual trial level effects and are only minimally affected by the estimation of the population means. In contrast, individual trial estimates from GLMM or FB analysis are pulled toward the population means under the assumed exchangeability of individual trial effects and are not appropriate for inspecting potentially outlying trial level effects. Similar Bayesian semi‐parametric models have been considered in the literature (Burr and Doss, 2005, Branscum and Hanson, 2008), but their utility for improving model diagnostics was not examined, particularly as a tool enabling visual inspection of data with many zero counts as described. Not all trials report the same type of information, and systematic reviews not uncommonly include a substantial portion of missing data. In the given case study, toxicity data were incomplete with dose limiting toxicity (DLT) outcomes missing in 23 trials (21%) and grade 3 or 4 toxicity outcomes missing in 67 trials (60%). However, only 2 of 89 total studies had toxicity information completely missing. We use a parametric Bayesian approach and model the relationship between the overall grade 3 or 4 toxicities and the DLT events and other auxiliary information. This Bayesian incomplete data analysis excluded only the two studies with completely missing toxicity information, salvaging 21 trials with missing DLT outcomes. The rest of this paper is structured as follows: Section 2 presents the example phase I systematic review in detail. This serves as a real‐life application on which the results of different random‐effects meta‐analysis methods are compared. Section 3 describes the application of standard random‐effects meta‐analysis methods to the case example. 
Section 4 compares the case study results. Section 5 presents the semi‐parametric Bayesian analysis and the Bayesian incomplete DLT data analysis. Section 6 concludes the paper.

A case study: systematic review of pediatric phase I oncology trials

This systematic review included publications from 1990 to 2010 that studied the safety and efficacy of molecularly targeted and cytotoxic agents in pediatric phase I oncology trials. Molecularly targeted drugs target key molecular pathways that are disrupted or unregulated in specific cancers and may inhibit tumor cell growth without necessarily killing tumor cells. These drugs have been the focus of drug development increasingly over the past decade with the hope that rationally targeted therapies may be the key to finding cures for cancer in contrast to traditional cytotoxic agents that use nonspecific mechanisms to kill tumor cells. The systematic review aimed to establish appropriate clinical and statistical parameters to guide future clinical trial designs for novel molecularly targeted drugs by examining whether the previously identified efficacy and toxicity rates of phase I cytotoxic trials are generalizable to studies of molecularly targeted agents. We refer to Dorris et al. (2015) for the details. The study identified 89 phase I studies with 30 studies investigating 26 molecularly targeted drugs and 59 studies evaluating 37 cytotoxic agents. Accounting for multiple strata, a total of 111 trials were included. A substantial number of the trials had small sample size. Fewer than 20 patients were available for assessment of the efficacy endpoints in 47.5% of the trials. In 10.1% of the trials, fewer than 10 were available. Citation information about individual trials is included in Supplemental Table 1. The primary efficacy outcome was overall objective response that indicates complete resolution of all radiographic evidence of tumors or regression of the primary tumor greater than 25%. The primary safety or toxicity outcomes were dose‐limiting toxicities (DLT). Both were defined by rare events; for example, zero events were reported for the primary efficacy outcome in 66% and 48% of the molecularly targeted and cytotoxic agent trials, respectively. 
Toxicity data contained a substantial portion of missing outcomes. DLT outcomes were missing in 23 trials (21%). Secondary toxicity outcomes, grade 3 or 4 toxicity events, were missing in 67 trials (60%). However, only two studies out of the 89 total had toxicity information completely missing.

Standard random‐effects methods and applications to the case study

We model the binary primary efficacy outcome of patient j in trial i ($Y_{ij}$) using a Bernoulli distribution:

$$Y_{ij} \mid p_i \sim \mathrm{Bernoulli}(p_i), \quad j = 1, \ldots, n_i, \qquad (1)$$

where $p_i$ is the true rate of the efficacy outcome in trial i and $n_i$ is the sample size. The individual trial data are summarized as follows: for each of i = 1, …, N trials, $y_i = \sum_{j=1}^{n_i} Y_{ij}$ is the number of responders and $\hat{p}_i = y_i / n_i$ is the observed response rate. True rates of binary outcomes are often considered on the logit scale and we denote the logit transformation by η(⋅), so that $\eta(p_i) = \log(p_i / (1 - p_i))$. Given $p_i$, standard random‐effects methods assume

$$\eta(\hat{p}_i) \mid p_i \sim N(\eta(p_i), \sigma_i^2), \qquad (2)$$

where $\sigma_i^2$ denote the variances of the trial level statistics on the logit scale. We let $X_i = x$ (x = 0, 1) denote the drug group of the single treatment arm tested in each trial, with x = 1 for trials in the molecularly targeted drug group and x = 0 for trials in the cytotoxic agent drug group. Standard methods further assume that true response rates vary from trial to trial, which typically is described by normal distributions with the means $\mu_x$ and the variances $\tau_x^2$:

$$\eta(p_i) \mid X_i = x \sim N(\mu_x, \tau_x^2). \qquad (3)$$

The between‐trial variances are also typically assumed to be the same, $\tau_0^2 = \tau_1^2 = \tau^2$.

DerSimonian and Laird's random effects model method

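DerSimonian and Laird's method (DerSimonian and Laird, 1986) assumes a simple random effects model, namely model (3) without specifying the distributions of the true responses to be normal. With the normal distribution specification, the method can be explained as an empirical Bayes (EB) method in the sense that the computation and estimators are equivalent to those of an EB method that replaces the true parameter values in models (2) and (3) with corresponding sample estimates. With $\hat{p}_i = y_i / n_i$, where $y_i$ is the number of responders among the $n_i$ patients of trial i, we have

$$\sigma_i^2 \approx \frac{1}{n_i p_i (1 - p_i)}$$

by the delta method. The EB method replaces $\sigma_i^2$ with

$$\hat{\sigma}_i^2 = \frac{1}{n_i \hat{p}_i (1 - \hat{p}_i)} = \frac{1}{y_i} + \frac{1}{n_i - y_i}. \qquad (4)$$

It then computes the between‐trial variance estimate ($\hat{\tau}^2$) by the method of moments based on $\eta(\hat{p}_i)$ and $\hat{\sigma}_i^2$, i = 1, …, N. Given $\hat{\tau}^2$, $\eta(\hat{p}_i)$ and $\hat{\sigma}_i^2$, i = 1, …, N, the EB method provides estimates of the population mean ($\mu_x$) given by

$$\hat{\mu}_x = \frac{\sum_{i: X_i = x} w_i \, \eta(\hat{p}_i)}{\sum_{i: X_i = x} w_i}, \quad \text{where } w_i = \frac{1}{\hat{\sigma}_i^2 + \hat{\tau}^2}.$$

For the individual trial response rates on the transformed scale, the EB method provides the precision‐weighted averages of $\eta(\hat{p}_i)$ and $\hat{\mu}_x$, with weights $1/\hat{\sigma}_i^2$ and $1/\hat{\tau}^2$, as estimates. With zero events, $\hat{p}_i = 0$, and the quantities $\eta(\hat{p}_i)$ and $\hat{\sigma}_i^2$ become undefined. This problem can be avoided by adding a small constant. We used the R package metafor (Viechtbauer, 2010) with its default value of 0.5 in the analysis of the case study. Alternatively, one may consider different transformations and so avoid correcting zeros by adding an arbitrary small constant. Variance stabilizing transformations are an option. We considered Freeman–Tukey's transformation (Freeman and Tukey, 1950). With the transformation, in its standard double arcsine form, we have

$$\eta_{FT}(\hat{p}_i) = \arcsin\sqrt{\frac{y_i}{n_i + 1}} + \arcsin\sqrt{\frac{y_i + 1}{n_i + 1}}.$$

We estimated the within‐trial variance with the asymptotic variance, so that

$$\hat{\sigma}_i^2 = \frac{1}{n_i + 1/2}. \qquad (5)$$

The EB random effects method was applied similarly. We note that the EB method is inherently subject to bias: it does not account for the variability introduced when substituting the parameters with their estimates. In this particular case it does not make allowance for the imprecision in $\hat{\sigma}_i^2$ or $\hat{\tau}^2$ and therefore underestimates the variance.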
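The moment-based computation described above can be sketched in a few lines of Python. This is a minimal illustration and not the metafor implementation used in the case study: the function names and the trial counts are hypothetical, and the sketch assumes the 0.5 correction is applied only to trials with a zero cell.

```python
import numpy as np

def corrected_logit(events, sizes, add=0.5):
    """Log odds and delta-method variances for each trial.

    `add` is applied only to trials with zero responders or zero
    non-responders (an assumption of this sketch).
    """
    events = np.asarray(events, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    needs_fix = (events == 0) | (events == sizes)
    shift = np.where(needs_fix, add, 0.0)
    responders = events + shift
    non_responders = sizes - events + shift
    y = np.log(responders / non_responders)
    v = 1.0 / responders + 1.0 / non_responders
    return y, v

def dersimonian_laird(y, v):
    """Method-of-moments between-trial variance and pooled mean."""
    w = 1.0 / v
    fixed_mean = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - fixed_mean) ** 2)        # Cochran's Q statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)      # truncated at zero
    w_star = 1.0 / (v + tau2)                    # random-effects weights
    pooled = np.sum(w_star * y) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return pooled, se, tau2

# Hypothetical (responders, evaluable patients) pairs; not data from the review.
events = [0, 1, 0, 2, 3]
sizes = [10, 12, 8, 20, 15]
y, v = corrected_logit(events, sizes)
pooled, se, tau2 = dersimonian_laird(y, v)
```

Without the correction, the two zero-event trials in this toy data set would have an infinite log odds and an undefined variance, which is the difficulty the text describes.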

GLMM method

The GLMM method also often considers the true response rate on the logit transformed scale and assumes the normal model (3). Unlike the EB method, it assumes model (1) instead of (2) and readily admits zero counts. The specific GLMM under consideration in this case study is a generalized linear logistic regression model with random study effects. We used the pseudo‐likelihood method as in Wolfinger and O'Connell (1993) and Breslow and Clayton (1993) with a doubly iterative algorithm to fit the model. This method finds modes of the log‐posterior distribution as random effects estimates. We used the SAS GLIMMIX procedure for computation with the sample sizes $n_i$ treated as offset terms.
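To illustrate why a model built on the binomial likelihood admits zero counts without any correction, the following Python sketch fits a random-intercept logistic model for a single drug group by maximizing the marginal likelihood with Gauss–Hermite quadrature. This is a different fitting technique from the pseudo-likelihood algorithm of the SAS GLIMMIX procedure used in the case study; the function names, starting values and trial counts are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

# Gauss-Hermite nodes and weights for integrals of the form
# int exp(-x^2) f(x) dx.
NODES, WEIGHTS = np.polynomial.hermite.hermgauss(30)

def negative_log_likelihood(params, events, sizes):
    """Marginal binomial likelihood with eta_i ~ N(mu, tau^2)."""
    mu, log_tau = params
    tau = np.exp(log_tau)
    # Trial-level logits evaluated at the quadrature nodes.
    eta = mu + np.sqrt(2.0) * tau * NODES
    p = np.clip(expit(eta), 1e-12, 1.0 - 1e-12)
    log_choose = (gammaln(sizes + 1) - gammaln(events + 1)
                  - gammaln(sizes - events + 1))
    log_binom = (log_choose[:, None]
                 + events[:, None] * np.log(p)[None, :]
                 + (sizes - events)[:, None] * np.log1p(-p)[None, :])
    # Integrate the random effect out, one trial per row.
    marginal = np.exp(log_binom) @ WEIGHTS / np.sqrt(np.pi)
    return -np.sum(np.log(marginal))

def fit_random_intercept_logistic(events, sizes):
    events = np.asarray(events, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    start = np.array([-2.0, np.log(0.5)])  # arbitrary starting values
    result = minimize(negative_log_likelihood, start,
                      args=(events, sizes), method="Nelder-Mead")
    mu_hat, log_tau_hat = result.x
    return {"mu": mu_hat, "tau": np.exp(log_tau_hat),
            "median_rate": expit(mu_hat)}

# Hypothetical counts with several zero-event trials; not data from the review.
events = [0, 0, 1, 0, 2, 0, 3, 0]
sizes = [10, 12, 15, 8, 20, 9, 25, 11]
fit = fit_random_intercept_logistic(events, sizes)
```

The zero-event trials enter through their binomial likelihood contributions, so no transformed summary statistic ever has to be computed for them.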

Full Bayesian (FB) method

A full Bayesian (FB) model also typically includes models (1) and (3) and, like the GLMM, readily admits zero counts. The FB method differs from the GLMM in that the model parameters are also considered random quantities. Without loss of generality, a prior distribution π(ϕ) is assumed for the joint distribution of the population mean and variance parameters as follows:

$$(\mu_0, \mu_1, \tau_0, \tau_1) \sim \pi(\phi), \qquad (6)$$

where ϕ denotes a vector of appropriately specified hyper‐parameters. A prior distribution is typically based on evidence external to the study in question or on subjective a priori beliefs. In the case of systematic reviews, evidence unrelated to the individual trials that the systematic review examines qualifies as external evidence. The difficulty of, and the potentially subjective decisions involved in, specifying the prior distribution are often cited as a major disadvantage of Bayesian methods (Sutton and Abrams, 2001). The empirical Bayes approach described in the earlier section avoids this difficulty by substituting parameters with sample estimates. We refer to Sutton and Abrams (2001) for the discussion of the importance and influence of the prior specification and the choices available specifically for a meta‐analysis. In the given case study, we used vague priors: a normal prior $N(0, 10^6)$ for $\mu_0$ and $\mu_1$, and a uniform prior U(0, 10) for $\tau_0$ and $\tau_1$. Vague or diffuse priors mitigate the impact of subjective prior specification (Gelman et al., 2013). Gibbs sampling can be conveniently carried out for computation and posterior inference. Some details are as follows: three independent chains of 100 000 Markov chain Monte Carlo (MCMC) simulation samples were generated after 50 000 burn‐in iterations and by retaining every 20th iteration. WinBUGS, called from R through the R2WinBUGS package version 2.1‐18 (Sturtz et al., 2005), was used. The posterior mean values were used as summary estimates with 95% central credibility intervals. 
The posterior probability that the mean response rate of the cytotoxic agents is higher than that of the molecularly targeted agents was calculated using the MCMC samples and was provided as evidence for the significance of the drug group difference. Sampling traces and distributions and Gelman–Rubin diagnostics were obtained by using the coda package for R, version 0.15‐2 (Plummer et al., 2006). No evidence against convergence was identified.
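The posterior summaries mentioned above are simple functions of the retained MCMC draws. The Python sketch below shows one way to compute them; it is not the WinBUGS or coda code used in the case study, the function names are hypothetical, and the draws are synthetic placeholders standing in for real sampler output.

```python
import numpy as np

def posterior_summary(draws):
    """Posterior mean and 95% central credible interval."""
    draws = np.asarray(draws, dtype=float)
    lower, upper = np.percentile(draws, [2.5, 97.5])
    return {"mean": float(draws.mean()), "ci95": (float(lower), float(upper))}

def prob_greater(draws_a, draws_b):
    """Share of paired draws in which a exceeds b."""
    return float(np.mean(np.asarray(draws_a) > np.asarray(draws_b)))

def gelman_rubin(chains):
    """Basic potential scale reduction factor for an (n_chains, n_draws) array."""
    chains = np.asarray(chains, dtype=float)
    n_draws = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()
    between = n_draws * chains.mean(axis=1).var(ddof=1)
    pooled_var = (n_draws - 1) / n_draws * within + between / n_draws
    return float(np.sqrt(pooled_var / within))

# Synthetic placeholder draws (3 chains of 5000 retained samples each) for the
# population means on the logit scale; these are NOT results from the paper.
rng = np.random.default_rng(0)
mu_cytotoxic = rng.normal(-2.0, 0.2, size=(3, 5000))
mu_targeted = rng.normal(-2.5, 0.3, size=(3, 5000))

summary = posterior_summary(mu_cytotoxic.ravel())
prob = prob_greater(mu_cytotoxic.ravel(), mu_targeted.ravel())
rhat = gelman_rubin(mu_cytotoxic)
```

Values of the diagnostic close to 1 are conventionally read as no evidence against convergence; the coda package reports a refined version of the same quantity.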

Case study analysis results

We first discuss results by the EB method on the two different transformation scales. Figure 1 presents forest plots of the response rate data of the cytotoxic agent group on the logit transformed scale (Figure 1a) and the Freeman–Tukey's (F–T) arcsine square‐root transformed scale (Figure 1b). With the logit transformation, the reported values are the log of the odds of having an objective response. Confidence intervals were calculated using the normal approximation with the within‐trial variance estimates defined respectively by (4) and (5). The plots are not directly comparable as the scales are different. Grey vertical reference lines are drawn for comparison at points that correspond to the efficacy response rate of 0.015 on the respective transformed scales. We observe that several trials are found on the left of the reference line on the F–T transformed scale (Figure 1b), whereas only two trials are found on the left of the reference line on the logit transformed scale with the 0.5 constant correction applied to the logit transformation (Figure 1a). The EB method also weighted studies with low odds less, which resulted in overestimating the overall means, as compared with the GLMM and the FB results that are on the same logit transformed scale (see Figure 2). Although both groups were affected, the small constant correction affected the molecularly targeted drug group more as it has the higher percentage of zero counts (66% vs. 48%): compared to the GLMM and FB results, the downward bias in the EB estimated between‐trial variance is much greater in the molecularly targeted drug group than in the cytotoxic group.
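A short calculation makes the effect of the constant correction concrete. The Python sketch below assumes that 0.5 is added to both the responder and the non-responder count of a zero-event trial, so that the corrected rate is 0.5/(n + 1); the function names are hypothetical.

```python
import math

def corrected_zero_rate(n, add=0.5):
    """Response rate implied by the corrected log odds of a zero-event trial."""
    return add / (n + 2.0 * add)

def smallest_sample_left_of(reference_rate, add=0.5):
    """Smallest n for which a zero-event trial falls below the reference rate."""
    n = 1
    while corrected_zero_rate(n, add) >= reference_rate:
        n += 1
    return n

# Corrected log odds for a zero-event trial with 10 evaluable patients.
rate_n10 = corrected_zero_rate(10)
logit_n10 = math.log(rate_n10 / (1.0 - rate_n10))

# Sample size needed to land left of the 0.015 reference line.
threshold_n = smallest_sample_left_of(0.015)
```

Under these assumptions a zero-event trial needs at least 33 evaluable patients to fall to the left of the 0.015 reference line, which is consistent with very few trials appearing there on the corrected logit scale.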

Figure 1

Response rates of the individual trials that tested cytotoxic agents: (a) on the logit transformed scale and (b) on the Freeman–Tukey's (F–T) arcsine square‐root transformed scale

Figure 2

Estimates of the population mean response rates of each drug group and their comparisons: EB‐Logit = Empirical Bayes (DerSimonian and Laird's method) analysis with the logit transformation, EB‐F–T = Empirical Bayes (DerSimonian and Laird's method) analysis with the Freeman–Tukey's arcsine square‐root transformation, and SPB = semi‐parametric Bayes analysis with a Dirichlet process prior. $\tau^2$ indicates the between‐trial variance estimates

The EB method applied with the F–T transformation has its own shortcomings. It uses large sample approximation to define the within trial variances shown in (5), which did not work well for the given data. The approximation is known not to perform well for small or large p if n is not large (Mosteller and Youtz, 1961). Figure 2 in Mosteller and Youtz (1961) specifically shows that with n = 10, the large sample approximation overestimates the variance (or the within trial variances in this meta‐analysis case) by ≥125% if the response rate is <0.07. A number of molecularly targeted and cytotoxic trials reported response rates <0.07 with 66% and 48% respectively reporting zero counts. Also about 10% of the individual trials have sample size <10. The poorly estimated within trial variances affect the overall mean estimation as the overall means are weighted averages with the weights depending on the within trial variances. The inappropriateness of the large sample approximation is also indicated in the forest plot (see Figure 1b): the dotted vertical line indicates the lower limit of a valid range corresponding to the support of efficacy rate. The lower limits of many confidence intervals stretch beyond the valid range. The GLMM and FB methods are not subject to the aforementioned limitations, and reported comparable results.
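The quality of the large sample approximation can be checked directly for a small trial by computing the exact variance of the transformed count under a binomial model. The Python sketch below assumes the standard double arcsine form of the Freeman–Tukey transformation with asymptotic variance 1/(n + 1/2); the function names and the example values n = 10 and p = 0.05 are hypothetical.

```python
import numpy as np
from scipy.stats import binom

def freeman_tukey(y, n):
    """Double arcsine transformation of y responders out of n patients."""
    return (np.arcsin(np.sqrt(y / (n + 1.0)))
            + np.arcsin(np.sqrt((y + 1.0) / (n + 1.0))))

def exact_ft_variance(n, p):
    """Exact variance of the transformed count when y ~ Binomial(n, p)."""
    y = np.arange(n + 1)
    pmf = binom.pmf(y, n, p)
    z = freeman_tukey(y, n)
    mean = np.sum(pmf * z)
    return float(np.sum(pmf * (z - mean) ** 2))

def asymptotic_ft_variance(n):
    """Large sample variance approximation on the same scale."""
    return 1.0 / (n + 0.5)

# A small trial with a low true response rate.
exact = exact_ft_variance(10, 0.05)
approx = asymptotic_ft_variance(10)
ratio = approx / exact
```

For a small trial with a low response rate, the ratio should come out above 1, meaning that the approximation assigns too large a within-trial variance and therefore too little weight to such trials.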

Advanced Bayesian analysis

Semi‐parametric Bayesian analysis

For a meta‐analysis to be valid, individual trial results should be sufficiently similar to be compared and combined for a common pooled estimate. In standard random‐effects analysis, this “similarity” assumption typically requires that a common normal distribution sufficiently describes random variation among trial level response rates. The earlier sections showed that the GLMM and the FB methods assumed the simple homoscedastic normal model (3). In the presence of a high percentage of zeros, however, the “similarity” assumption may not be readily examined. We use a Bayesian semi‐parametric GLMM to examine the assumption and the sensitivity of the GLMM and FB results. Specifically, we employed the Dirichlet process GLMM proposed in Mukhopadhyay and Gelfand (1997). The model assumes the populations of trial level response rates of each drug group are mixture distributions and replaces model (3) and the prior (6) with the following:

$$\eta(p_i) \mid G \sim G, \qquad (7.a)$$

$$G \mid \alpha, G_0 \sim DP(\alpha G_0), \qquad (7.b)$$

together with the priors (7.c)–(7.e) on α and on the parameters μ and Σ of $G_0$, where G denotes the population distribution of individual trial response rates, and $DP(\alpha G_0)$ denotes the mixtures of Dirichlet processes with a precision parameter α and a normal base distribution $G_0$ (Antoniak, 1974). With model (7.b), this semi‐parametric model allows a general family of distributions denoted by $DP(\alpha G_0)$ and assumes the population distribution G is drawn from this family of distributions. In Bayesian terminology, this means placing a prior on the population distribution. This contrasts with the FB method, which assumes a distribution completely specified up to a fixed number of unknown parameters as in model (3) and places priors on the parameters as in (6). The mixture of Dirichlet processes (DP) specifically means that G follows an infinite mixture of normal distributions instead of one normal distribution. It does not fix the number of normal distributions a priori and hence is distribution free, allowing more general distributional shapes for G such as heavy tailed or multi‐modal distributions. 
In the literature, simulation studies have reported that Bayesian approaches with the DP prior approximate an unknown distribution well, whether the unknown distribution is a simple, homoscedastic normal, a mixture of normal distributions, or a skewed or heavy‐tailed distribution (Gelfand and Mukhopadhyay, 1995, Pati and Dunson, 2014, Xu et al., 2015). We used a gamma prior Gamma(a0 = 1, b0 = 1) on α for the given case study. The gamma prior allowed the data to inform more strongly about the number of normal distributions needed to model G as a mixture distribution and about how tight a neighborhood of $G_0$ the true population distribution G lies in. For the base distribution $G_0$, a vague normal prior, $N(0, 10^6)$, was placed on μ, and a vague inverse Wishart prior distribution, IW(3, 1), was placed on Σ. Similar to the FB analysis, we ran independent chains and confirmed the convergence of the MCMC chains. Inference was also performed similarly to the FB analysis, using the posterior means, the 95% central credible intervals, and the posterior probability computed from the MCMC samples (see Figure 3a). This kind of model has been used successfully to model random effects in many situations (Dey et al., 1998, Kleinman and Ibrahim, 1998).
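The role of the precision parameter α can be illustrated with the stick-breaking representation of a Dirichlet process, which is a standard construction and not the sampler used in the case study. The Python sketch below draws a truncated random distribution from DP(αG0) with a normal base distribution; the function names, the truncation level and the base distribution parameters are hypothetical.

```python
import numpy as np

def stick_breaking_weights(alpha, truncation, rng):
    """Mixing weights of a truncated stick-breaking construction."""
    betas = rng.beta(1.0, alpha, size=truncation)
    betas[-1] = 1.0  # close the stick so the truncated weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

def draw_random_distribution(alpha, base_mean, base_sd, truncation, rng):
    """Atoms from the normal base distribution with stick-breaking weights."""
    weights = stick_breaking_weights(alpha, truncation, rng)
    atoms = rng.normal(base_mean, base_sd, size=truncation)
    return atoms, weights

def count_sizeable_components(weights, threshold=0.01):
    """Number of components carrying more than `threshold` of the mass."""
    return int(np.sum(weights > threshold))

rng = np.random.default_rng(0)
atoms, weights = draw_random_distribution(
    alpha=1.0, base_mean=0.0, base_sd=1.0, truncation=200, rng=rng)
n_sizeable = count_sizeable_components(weights)
```

Small values of α concentrate the mass on a few components, while large values spread it over many, so placing a prior on α lets the data inform how many components are needed.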

Figure 3

(a) Posterior distribution of population mean response rate difference on the logit transformed scale; (b) posterior predictive distributions of trial level response rates on the logit transformed scale

We checked the normality assumption of the GLMM and the FB methods, using the posterior predictive distributions of the individual trial effects under the semi‐parametric model. The predictive distributions were estimated based on 5000 MCMC samples from 200 000 runs with 100 000 burn‐ins and by retaining every 20th iteration. In 63.6% of the 5000 MCMC samples, the DP model selected a mixture of two to five normal distributions to describe the population of trial level response rates for the molecularly targeted drug group. The three most frequent choices were mixing two to four normal distributions. The model selected a mixture of five to ten for the cytotoxic agent group in 64.1% with six to eight as the three most frequent choices. The predictive distributions suggest significant deviations from the assumed normality, particularly in the right tails (see Figure 3b). The small bumps in the right tails suggest the existence of sub populations that may be qualitatively different. We further investigate this possibility below. Despite the deviation from the normality assumption, other inference results of the semi‐parametric analysis agree well with those of the GLMM and the FB analysis in Figure 2. This implies that the population means and drug group comparison results of the GLMM and the FB analysis are robust to the deviations from their respective assumptions. No single parameter corresponds to the between trial variance in this semi‐parametric Bayes analysis and no explicit estimate is available. The semi‐parametric analysis is also useful in checking the presence of outlying trial effects. In a meta‐analysis, individual trials with extreme or outlying observations are not uncommon (Ohlssen et al., 2007). 
As the main objective of a meta‐analysis is to provide a reasonable summary, the presence of such outliers raises the question of whether the outlying trials are inherently different from the rest and inappropriate to combine with them. Forest plots are an essential tool for graphically summarizing individual trial data and visually inspecting for potential outliers. With zero observed counts, however, standard summary measures may become undefined, in which case forest plots are not applicable. This is a lesser known problem but is not trivial. The trial level estimates from the semi‐parametric analysis can be used instead. The trial level estimates are pooled estimates (toward the population means) but are obtained free of distributional assumptions. Because of the DP mixture prior, they are only minimally affected by the population mean estimation, in contrast to the GLMM or FB method. In the GLMM or FB analysis, the individual trial estimates are obtained under the normality assumption, under which outlying individual trial effects, if they exist, are much more strongly pooled toward the population means. The forest plot in Figure 4 presents the individual trial estimates from the semi‐parametric analysis in chronological order within each drug group, molecularly targeted drug studies first and cytotoxic studies later. We suspect that two individual trial estimates in the molecularly targeted drug group and five in the cytotoxic group may qualify as extreme observations. These potential outlying trials correspond to the small bumps in the right tails of the semi‐parametric Bayesian estimated individual trial effect population distributions (Figure 3b). Whether the bumps suggest subpopulations that may be qualitatively different can be investigated by examining whether these studies are qualitatively different from the rest.

Figure 4

Forest plot using the semi‐parametric Bayes estimates of individual trial response rates on the logit transformed scale. The dotted vertical line is a reference line drawn at 0 on the logit transformed scale

We investigated this question by examining the tested agents and the characteristics of the patient samples of the ostensibly outlying two molecularly targeted drug trials and five cytotoxic drug trials. We found that each of the tested agents was considered a “winner” and was continually tested in later studies as a single agent or in combination. Such later studies also reported relatively high response rates, and in comparison to these later studies, the seemingly outlying trial effects did not look extreme. Also, three of the seven trials were designed for a single tumor type based on knowledge of the biology of the relevant tumor while the other trials allowed for multiple relapsed tumor types. This single versus multiple tumor type mix was similarly observed in the rest of the studies. The seven trials were also similar to the rest in patient characteristics such as the median patient age and prior therapy exposure. This led us to conclude that the ostensibly outlying seven trials are merely a result of random sampling. In contrast, forest plots using sample estimates of individual trials, rather than pooled estimates, are not appropriate for this kind of investigation. The forest plot on the logit transformed scale (Figure 1a) is subject to upward bias because of the constant correction applied to zero counts. In the presence of the upward bias, the five cytotoxic trials indicated by the semi‐parametric analysis as ostensibly outlying do not appear clearly distinct from the rest. The distinction was relatively clear in the forest plot on the double arcsine transformation scale (Figure 1b). We note that the semi‐parametric analysis is robust to perturbation in the prior specification (7.c)–(7.e). 
Nieto‐Barajas and Prunster (2009) performed a sensitivity analysis for a wide class of Bayesian nonparametric density estimators, including the mixture of DP, by perturbing the prior. Comparison of the resulting posterior density estimates found that the density estimation is robust. We also note that slight changes in the posterior distribution do not have much impact on the results. We performed simulation studies by arbitrarily removing the ostensibly outlying trials corresponding to the small bumps in the posterior predictive distributions. We removed either two ostensibly outlying trials from each drug group or all of them. When not all outlying trials were removed, the resulting posterior predictive distributions showed small bumps corresponding to the remaining outlying trials, whereas they did not when all were removed (see Figure 5). The change in the number of outlying trials, however, minimally affected the posterior probability based group comparison. The posterior probability was 0.053 with the original data and ranged from 0.03 to 0.09 when some or all outlying trials were removed. We note that the semi‐parametric analysis results, although robust, should be interpreted with caution. Qualitative investigation should follow for subpopulations indicated by small bumps in the resulting posterior density estimates.

Figure 5

Examples of posterior predictive distributions of trial level response rates when some or all outlying trials were removed. Numbers inside parentheses denote the numbers of outlying individual trials remaining after arbitrarily removing some or all

Full Bayesian analysis of incomplete toxicity data

The primary toxicity outcome of the example case study was dose‐limiting toxicities (DLT). DLT were defined as grade 3 or 4 toxicities that occurred during the first cycle of drug. This outcome was missing in 23 trials. However, only 2 of 89 studies had toxicity information completely missing. Secondary toxicity outcomes and other information were available in the rest. We used a full Bayesian method and modeled the auxiliary information, salvaging the 21 trials with missing DLT. The auxiliary information included the number of overall grade 3 or 4 toxicities, the numbers of patients evaluable for efficacy and toxicity assessment ($n_i$), and the total number of courses of therapy ($m_i$). The overall grade 3/4 toxicity events are sums of hematologic and non‐hematologic toxicities that occurred during all courses of therapy and included DLT events. The count generally increases with the total number of treatment courses given per trial, while the total number of treatment courses given per trial increases with the number of evaluable patients. The count also increases with the efficacy response rate observed in the trial, as patients were allowed to continue receiving additional treatment courses if they responded. We modeled each toxicity event count using a Poisson distribution but constrained the counts to add up to their respective totals. For example, the hematologic DLT event counts and the hematologic non‐DLT event counts need to add up to the overall grade 3 or 4 hematologic toxicity event counts. This relationship suggested a model with two event rates (λ), one for the hematologic DLT event count and one for the non‐DLT, grade 3 or 4 hematologic toxicity event count. In the model the DLT event counts increase with the number of patients ($n_i$), whereas the overall grade 3/4 toxicity count increases with the total number of treatment courses given per trial ($m_i$). Similar relationships hold for non‐hematologic event variables. 
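One standard consequence of constraining two independent Poisson counts to add up to a known total is that, given the total, the first count follows a binomial distribution whose success probability is its share of the combined mean. The Python sketch below checks this numerically; it illustrates the general property only and is not the model fitted in the case study, and the function name, total and Poisson means are hypothetical.

```python
import numpy as np
from scipy.stats import binom, poisson

def conditional_dlt_pmf(total, mean_dlt, mean_non_dlt):
    """P(DLT count = d | DLT + non-DLT = total) for d = 0, ..., total."""
    d = np.arange(total + 1)
    joint = poisson.pmf(d, mean_dlt) * poisson.pmf(total - d, mean_non_dlt)
    return joint / joint.sum()

# Hypothetical values: 6 overall grade 3/4 hematologic events in a trial,
# with assumed Poisson means for the DLT and non-DLT components.
total = 6
mean_dlt = 1.5
mean_non_dlt = 4.0

pmf = conditional_dlt_pmf(total, mean_dlt, mean_non_dlt)
share = mean_dlt / (mean_dlt + mean_non_dlt)
binomial_pmf = binom.pmf(np.arange(total + 1), total, share)
expected_dlt = float(np.sum(np.arange(total + 1) * pmf))
```

In a setting like the one described, this is what allows an observed overall toxicity count to carry information about a missing DLT count.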
Missing values of $m_i$ were imputed under the following assumptions. We first consider $m_{ij}$, the number of treatment courses given to subject j in trial i, and assume that it depends on the patient's response to the treatment through an indicator of whether or not the patient responded to the treatment of the trial. With $T_i$ and $n_i$ denoting the total numbers of responders and patients for trial i, this yields a model for the trial total $m_i$.

A joint prior distribution was specified for all the parameters. This full Bayesian model assumed that data were missing at random (MAR) within each drug group. We considered other possibilities in which the probability of missingness depends on the efficacy rate, the sample size, or the total number of treatment courses, but did not find any notable association.

We first examined the goodness of fit of the full Bayesian model. Figure 6a plots estimated versus observed DLT event rates of the non-missing cases for the cytotoxic group. The points follow a line tilted away from the 45-degree line: the Bayesian estimates are larger than the observed rates in the lower range and smaller in the upper range. This is expected, as the Bayesian estimates are pooled toward the population mean estimate. The tight point cloud along the tilted line suggests good agreement. Figure 6b plots estimated versus observed total numbers of treatment courses given per trial. It suggests reasonable agreement considering that 60% of the data is missing.
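The pull of trial-level estimates toward the population mean can be illustrated with a conjugate Gamma-Poisson calculation. This is a simplified sketch, not the full Bayesian model of the paper: the prior is fixed rather than estimated, and the prior parameters and trial data are hypothetical.

```python
def posterior_mean_rate(events, n, shape, rate):
    """Posterior mean of a Poisson rate under a Gamma(shape, rate) prior.

    With events ~ Poisson(n * lam) and lam ~ Gamma(shape, rate), the
    posterior is Gamma(shape + events, rate + n).
    """
    return (shape + events) / (rate + n)

# Hypothetical prior with mean 0.2, and hypothetical (events, patients) pairs.
prior_shape, prior_rate = 2.0, 10.0
prior_mean = prior_shape / prior_rate
trials = [(0, 12), (1, 20), (4, 15), (8, 16)]

rows = []
for events, n in trials:
    observed = events / n
    estimated = posterior_mean_rate(events, n, prior_shape, prior_rate)
    rows.append((observed, estimated))
    print(f"observed={observed:.3f} estimated={estimated:.3f}")
```

In this sketch, trials with observed rates below the prior mean receive estimates above their observed rates, and trials above the prior mean receive estimates below them, which is the pattern a tilted line in an estimated-versus-observed plot reflects.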

Figure 6

(a) Scatter plot of observed versus estimated trial level dose limiting toxicity (DLT) rates of the cytotoxic drug group; (b) scatter plot of observed versus estimated total number of treatment courses given per trial. The size of the bubbles indicates the sample size of individual trials.

We conducted complete-case (CC) analyses for comparison. Inference results of the different methods are presented in Figure 7. All methods reported significantly higher toxicity rates for the cytotoxic agent group. The empirical Bayes (EB) results refer to the log transformation results that used a constant correction of 1/2 for zero counts. We similarly note an upward bias in these estimates because of the constant correction. The estimates of the Bayesian incomplete data analysis are similar to the GLMM and FB estimates. This is expected, as the standard methods provide valid results under MAR. The incomplete data analysis differs in that, instead of excluding trials with missing outcomes, it included them and utilized the data more efficiently. Although other missingness mechanisms were considered, the incomplete data analysis may still have introduced bias, and hence the efficiency gained comes with the potential for bias.
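The direction of the bias from a constant zero-count correction can be seen in a small sketch. It is illustrative only and is not the empirical Bayes estimator used in the analysis: it takes an unweighted average of log rates, adds 1/2 only to zero counts, and uses hypothetical trial data.

```python
import math

def log_rate(events, n, correction=0.5):
    """Log event rate; the constant is added only when the count is zero."""
    adjusted = events + correction if events == 0 else events
    return math.log(adjusted / n)

# Hypothetical (events, patients) pairs with a high share of zero counts.
trials = [(0, 10), (0, 12), (1, 15), (0, 8), (2, 20)]

# Crude pooled rate that needs no correction: total events over total patients.
pooled_rate = sum(e for e, _ in trials) / sum(n for _, n in trials)

# Unweighted mean of corrected log rates, transformed back to the rate scale.
mean_log = sum(log_rate(e, n) for e, n in trials) / len(trials)
back_transformed = math.exp(mean_log)

print(f"pooled rate:           {pooled_rate:.4f}")
print(f"corrected, log scale:  {back_transformed:.4f}")
```

Each zero-count trial contributes a rate of 0.5 divided by its sample size instead of zero, so with many zeros the back-transformed summary sits above the crude pooled rate in this example.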

Figure 7

Population mean dose limiting toxicity (DLT) rate estimates and comparison of drug groups. Incomplete data analysis refers to the Bayes analysis that utilized auxiliary information

The length of confidence intervals, or of credible intervals in the case of Bayesian analyses, can be used to compare the efficiency of the methods. As longer intervals mean lower efficiency, the ratio of two interval lengths indicates the relative efficiency of the associated analysis methods. We used 95% confidence intervals or 95% central credible intervals. Numbers in parentheses in Figure 7 indicate the relative efficiency of each method, in percent, compared with the GLMM complete-case analysis. The incomplete data analysis utilized the auxiliary information and was the most efficient. Its 95% credible intervals were 34.3% and 18.7% shorter than the respective confidence intervals of the GLMM complete-case analysis.
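The interval-length comparison described above amounts to a simple ratio. The sketch below shows the arithmetic under stated assumptions: the interval endpoints are hypothetical, and expressing the ratio as the method's length over the reference length is one possible convention, not necessarily the one used in the figure.

```python
def interval_length(interval):
    """Length of a (lower, upper) interval."""
    lower, upper = interval
    return upper - lower

def length_ratio_percent(interval, reference):
    """Length of an interval as a percentage of the reference interval length."""
    return 100.0 * interval_length(interval) / interval_length(reference)

def percent_shorter(interval, reference):
    """How much shorter an interval is than the reference, in percent."""
    return 100.0 - length_ratio_percent(interval, reference)

# Hypothetical 95% intervals for a population mean DLT rate.
reference_ci = (0.10, 0.30)     # stands in for a complete-case confidence interval
incomplete_cri = (0.13, 0.26)   # stands in for an incomplete-data credible interval

ratio = length_ratio_percent(incomplete_cri, reference_ci)
shorter = percent_shorter(incomplete_cri, reference_ci)
print(f"length ratio: {ratio:.1f}%  shorter by: {shorter:.1f}%")
```

A value of `percent_shorter` above zero means the method produced a tighter interval than the reference, which is the sense in which the 34.3% and 18.7% figures are read.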

Conclusion

The importance of decision making based on the totality of relevant and sound evidence is increasingly emphasized in medicine. More systematic reviews of phase I trials will likely be conducted to establish appropriate clinical and statistical parameters to guide future clinical trials. We focused on the handling of sparse and incomplete data. We used a systematic review of pediatric phase I oncology trials and showed that standard random-effects methods may not be sufficient or adequate. A high percentage of zeros may call into question the exchangeability of individual trial level effects under a Gaussian distribution, which is typically assumed in standard random-effects analyses. A semi-parametric analysis with a DP prior estimates the population of trial level effects free of parametric distributional assumptions on the population and enables examination of that assumption. Such an analysis, however, may not be feasible or desirable unless the number of individual studies included is large. We also showed that incomplete toxicity data can be addressed by an advanced Bayesian model that utilizes auxiliary information. The missing outcome case illustrated by the case study is rather ideal in that a clear and plausible imputation mechanism exists from the observed data. Such knowledge is required for the advanced incomplete data analysis.

Supplemental Table 1. Citation information on individual trials included in the systematic review of pediatric phase I oncology trials that compared the safety and efficacy of molecularly targeted and cytotoxic agents.

