
No solution yet for combining two independent studies in the presence of heterogeneity.

Andrea Gonnermann1, Theodor Framke1, Anika Großhennig1, Armin Koch1.   

Year:  2015        PMID: 26040434      PMCID: PMC4471592          DOI: 10.1002/sim.6473

Source DB:  PubMed          Journal:  Stat Med        ISSN: 0277-6715            Impact factor:   2.373


Meta-analysis plays an important role in the analysis and interpretation of clinical trials in medicine and of trials in the social sciences, and it is of importance in other fields as well (e.g., particle physics [1]). In 2001, Hartung and Knapp [2],[3] introduced a new approach to test for a nonzero treatment effect in a meta-analysis of k studies. They suggest using the random effects estimate according to DerSimonian and Laird [4] and propose a variance estimator q such that the test statistic for the treatment effect is t distributed with k − 1 degrees of freedom. In their paper on dichotomous endpoints, the results of a simulation study with 6 and 12 studies illustrate, for risk differences, log relative risks, and log odds ratios, the excellent properties regarding control of the type I error and the achieved power [2]. They investigate different sample sizes in each study and different amounts of heterogeneity between studies, and they compare their new approach (the Hartung and Knapp approach, HK) with the fixed effects approach (FE) and the classical random effects approach by DerSimonian and Laird (DL). It can be clearly seen that, with increasing heterogeneity, neither the FE nor the DL controls the type I error rate, while the HK keeps the type I error rate in nearly every situation and on every scale. Advantages and disadvantages of the two standard approaches and their respective test statistics have been extensively discussed (e.g., [5-7]). While it is well known that the FE is too liberal in the presence of heterogeneity, the DL is often thought to be rather conservative because heterogeneity is incorporated into the standard error of the estimate for the treatment effect, and this should lead to larger confidence intervals and smaller test statistics for the treatment effect ([8] chapter 9.4.4.3).
This was disproved by, among others, Ziegler and Victor [7], who observed severe inflation of the type I error for the DerSimonian and Laird test statistic in situations with increasing heterogeneity. Notably, the asymptotic properties of this approach will be valid if both the number of studies and the number of patients per study are large enough ([8] chapter 9.54, [9,10]). Although power issues of meta-analysis tests have received some interest, comparisons between the approaches and the situation with two studies were not the main focus [11,12]. Borenstein et al. ([10], pp. 363/364) recommend the random effects approach in general for meta-analysis and do not recommend meta-analyses of small numbers of studies. However, meta-analyses of few trials, and even of only two, are of importance. In drug licensing, two successful phase III clinical trials often have to be submitted as pivotal evidence [13], and summarizing the findings of these studies is required according to the International Conference on Harmonisation guidelines E9 and M4E ([14,15]). It is stated that ‘An overall summary and synthesis of the evidence on safety and efficacy from all the reported clinical trials is required for a marketing application [...]. This may be accompanied, when appropriate, by a statistical combination of results’ ([14], p. 31). For the summary, ‘The use of meta-analytic techniques to combine these estimates is often a useful addition, because it allows a more precise overall estimate of the size of the treatment effects to be generated, and provides a complete and concise summary of the results of the trials’ ([14], p. 32). While in standard drug development this summary will usually include more than two studies, in rare diseases hardly ever more than two studies of the same intervention are available because of the limited number of patients.
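The three tests compared in this letter can be sketched in a few lines. The following is a minimal illustration, not the authors' R code: it assumes study-level effect estimates `y` (e.g., log odds ratios) with within-study variances `v`, the usual inverse-variance FE estimate, the DerSimonian and Laird moment estimator for the between-study variance, and the Hartung and Knapp variance estimator q as commonly written. All function names are hypothetical.

```python
import math


def fixed_effect(y, v):
    """Inverse-variance fixed effects estimate and its variance."""
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    est = sum(wi * yi for wi, yi in zip(w, y)) / sw
    return est, 1.0 / sw


def dl_tau2(y, v):
    """DerSimonian-Laird moment estimate of the between-study variance."""
    k = len(y)
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    est_fe, _ = fixed_effect(y, v)
    q_stat = sum(wi * (yi - est_fe) ** 2 for wi, yi in zip(w, y))
    denom = sw - sum(wi ** 2 for wi in w) / sw
    return max(0.0, (q_stat - (k - 1)) / denom)


def random_effects(y, v):
    """DL estimate, its classical variance, and the HK variance estimator q."""
    k = len(y)
    tau2 = dl_tau2(y, v)
    w = [1.0 / (vi + tau2) for vi in v]
    sw = sum(w)
    est = sum(wi * yi for wi, yi in zip(w, y)) / sw
    var_dl = 1.0 / sw
    q_hk = sum(wi * (yi - est) ** 2 for wi, yi in zip(w, y)) / ((k - 1) * sw)
    return est, var_dl, q_hk


def overall_tests(y, v):
    """Test statistics for H0: theta = 0 under the three approaches."""
    est_fe, var_fe = fixed_effect(y, v)
    est_re, var_dl, q_hk = random_effects(y, v)
    # q_hk is zero when all estimates coincide; the statistic is then unbounded.
    t_hk = est_re / math.sqrt(q_hk) if q_hk > 0 else math.copysign(math.inf, est_re)
    return {
        "FE": est_fe / math.sqrt(var_fe),  # compared with N(0, 1)
        "DL": est_re / math.sqrt(var_dl),  # compared with N(0, 1)
        "HK": t_hk,                        # compared with t, k - 1 df
    }
```

For example, `overall_tests([0.5, 0.1], [0.04, 0.04])` describes two studies of equal precision; the FE and DL statistics are referred to the standard normal distribution, whereas the HK statistic is referred to a t distribution with a single degree of freedom.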
Likewise, decision making in the context of health technology assessment is based on systematic reviews and meta-analyses. Often in practice, only two studies are considered homogeneous enough on clinical grounds to be included in a meta-analysis, and these then form the basis for decision making about reimbursement [16]. Despite the fact that meta-analysis is non-experimental, observational (secondary) research [17] and p-values should be interpreted with caution, meta-analyses of randomized clinical trials are termed highest-level information in evidence-based medicine and are the recommended basis for decision making [18]. As statistical significance plays an important role in the assessment of a meta-analysis, it is mandatory to understand the statistical properties of the relevant methodology also in a situation where only two clinical trials are included in a meta-analysis. We found Cochrane reviews including meta-analyses with only two studies, which are considered for evidence-based decision making even in the presence of a large amount of heterogeneity (I² ≈ 75%) [19-21]. We repeated the simulation study for dichotomous endpoints of Hartung and Knapp [2] with programs written in R 3.1.0 [22] to compare the statistical properties of the FE, the DL, and the HK for testing the overall treatment effect θ (H0: θ = 0) in a situation with two to six clinical trials. We considered scenarios under the null and the alternative hypothesis for the treatment effect, with and without underlying heterogeneity. We present the findings for the odds ratio with pC = 0.2 and varied the probability of success in the treatment group, pT, to investigate the type I error and the power characteristics. The total sample size per meta-analysis was kept constant across scenarios (n = 480), with n/k patients per study, to demonstrate clearly the effect of the number of included studies on the power and type I error of the various approaches.
Likewise, we attempted to avoid problems with zero cell counts or extremely low event rates, which may affect type I error and power as well. I² was used to describe heterogeneity because thresholds have been published for quantifying the degree of heterogeneity with this measure (low: I² = 25%, moderate: I² = 50%, and high: I² = 75%) [23]. We termed I² ≤ 15% negligible; this refers to simulations assuming no heterogeneity (i.e., the fixed effects model). Table I summarizes the results of our simulation study. The well-known anticonservative behavior of the FE and the DL in the presence of even low heterogeneity is visible for small numbers of studies in the meta-analysis. Particularly for the FE, the increase in the type I error is pronounced. With more than four studies, the HK perfectly controls the type I error even in situations with substantial heterogeneity. There is almost no impact on the power of the test in situations with no or low heterogeneity, and overall, it seems as if the only price to be paid for increased heterogeneity is a reduced power of the test.
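For readers who want to reproduce the heterogeneity measure, I² can be computed from Cochran's Q as (Q − (k − 1))/Q, truncated at zero. The sketch below assumes effect estimates `y` with within-study variances `v`; the function name is hypothetical and the code is an illustration, not the authors' implementation.

```python
def i_squared(y, v):
    """I^2 as a proportion: share of total variation attributed to heterogeneity."""
    k = len(y)
    w = [1.0 / vi for vi in v]  # inverse-variance weights
    sw = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q_stat = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, y))
    if q_stat <= 0.0:
        return 0.0
    return max(0.0, (q_stat - (k - 1)) / q_stat)
```

With two studies, Q has a single degree of freedom, so I² from any one meta-analysis is itself very imprecise; this is one reason to report the mean over many simulation runs.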
Table I

Overview of the empirical type I error and power.

                     Empirical type I error for        Empirical power for
                     pC = pT = 0.2                     pC = 0.2 and pT = 0.3
k   Het (mean I²)    FE        DL        HK            FE        DL        HK
2   0.15             0.0466    0.0382    0.0481        0.7171    0.6074    0.1487
3   0.15             0.0459    0.0352    0.0477        0.7142    0.6169    0.2976
4   0.14             0.0417    0.0311    0.0473        0.6965    0.6146    0.3999
5   0.13             0.0391    0.0308    0.0473        0.7008    0.6267    0.4720
6   0.12             0.0373    0.0306    0.0447        0.6808    0.6147    0.5015
2   0.25             0.1313    0.0895    0.0469        0.6501    0.4980    0.1137
3   0.25             0.1016    0.0684    0.0525        0.6537    0.5030    0.2115
4   0.25             0.0861    0.0613    0.0467        0.6552    0.5113    0.2762
5   0.25             0.0900    0.0614    0.0467        0.6463    0.5030    0.3148
6   0.25             0.0835    0.0574    0.0423        0.6302    0.4948    0.3395
2   0.50             0.4142    0.2184    0.0489        0.5998    0.3447    0.0706
3   0.50             0.2884    0.1367    0.0493        0.5892    0.3362    0.1131
4   0.50             0.2391    0.1104    0.0467        0.5814    0.3377    0.1476
5   0.50             0.2231    0.0956    0.0443        0.5535    0.3089    0.1611
6   0.50             0.2077    0.0864    0.0421        0.5541    0.3171    0.1767
2   0.75             0.7306    0.2866    0.0639        0.7455    0.3050    0.0567
3   0.75             0.5384    0.1786    0.0509        0.6097    0.2307    0.0695
4   0.75             0.4664    0.1385    0.0501        0.5673    0.2082    0.0747
5   0.75             0.4303    0.1223    0.0466        0.5473    0.1982    0.0853
6   0.75             0.4023    0.1114    0.0468        0.5263    0.1936    0.0900

k, number of studies; Het, heterogeneity; FE, fixed effects approach; DL, DerSimonian and Laird approach; HK, Hartung and Knapp approach; pC, event rate in control group; pT, event rate in treatment group.

Note: In a random effects model for log odds ratios, normally distributed logit(pT) and logit(pC) were simulated. These values have been back transformed to pT and pC to generate binomially distributed numbers of successes for a given sample size per treatment arm. Median response rates are reported because of the skewed distribution after back transformation of the logits generated in the first step of the simulation. The total sample size of the meta-analyses is 480 patients, with balanced treatment arms and 480/k patients per study to investigate the impact of the number of studies in the meta-analysis. Mean I² from the simulations is reported to describe the degree of heterogeneity.
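The data-generating steps in the note can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' R program: the control logit is held fixed at logit(pC) and only the study-specific log odds ratio is drawn from a normal distribution with mean `theta` and standard deviation `tau`, and 0.5 is added to every cell before computing the log odds ratio. Both choices, and all function names, are assumptions made here for illustration.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def simulate_study(n_arm, p_c, theta, tau, rng):
    """Simulate one two-arm study; return (log odds ratio, variance)."""
    theta_i = rng.gauss(theta, tau)        # study-specific true log odds ratio
    p_t = expit(logit(p_c) + theta_i)      # back transformation to an event rate
    events_t = sum(rng.random() < p_t for _ in range(n_arm))
    events_c = sum(rng.random() < p_c for _ in range(n_arm))
    # Assumption: 0.5 is added to every cell to guard against zero counts.
    a = events_t + 0.5
    b = n_arm - events_t + 0.5
    c = events_c + 0.5
    d = n_arm - events_c + 0.5
    log_or = math.log((a * d) / (b * c))
    variance = 1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d
    return log_or, variance


def simulate_meta_analysis(k, p_c=0.2, theta=0.0, tau=0.0, n_total=480, seed=None):
    """Simulate k balanced studies sharing a fixed total sample size."""
    rng = random.Random(seed)
    n_arm = n_total // (2 * k)  # 480 / k patients per study, two equal arms
    return [simulate_study(n_arm, p_c, theta, tau, rng) for _ in range(k)]
```

Calling `simulate_meta_analysis(2, seed=1)` yields two (estimate, variance) pairs with 120 patients per arm. Repeating such draws many times and counting rejections of H0 would give empirical type I error (theta = 0) or power (theta ≠ 0).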

This is in strong contrast to the situation with only two studies. Again, the HK perfectly controls the prespecified type I error. However, even in a homogeneous situation, the power of the meta-analysis test was lower than 15% in situations where the power of the FE and the DL approximates 70% and 60%, respectively. In the presence of even low heterogeneity, the HK leaves little chance of arriving at a positive conclusion, even with substantial treatment effects. Figure 1 summarizes the main finding of our simulation study for k = 2 and k = 6 studies.
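One way to see why the HK loses so much power with two studies is to look at the critical value it must exceed. With k = 2 the reference distribution is t with one degree of freedom, which is the standard Cauchy distribution, so its quantile has a closed form. The sketch below compares it with the normal critical value used by the FE and the DL; the equal-weights shortcut for the HK statistic is a simplification that holds only when both studies receive the same weight, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def normal_critical(alpha=0.05):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def t1_critical(alpha=0.05):
    """Two-sided critical value of the t distribution with 1 df.

    t with 1 df is standard Cauchy, whose quantile function is
    tan(pi * (p - 0.5)); here p = 1 - alpha / 2.
    """
    return math.tan(math.pi * (0.5 - alpha / 2.0))


def hk_t_two_equal_weights(y1, y2):
    """HK statistic for two equally weighted studies: (y1 + y2) / |y1 - y2|."""
    if y1 == y2:
        return math.copysign(math.inf, y1 + y2)
    return (y1 + y2) / abs(y1 - y2)
```

At the two-sided 5% level the normal critical value is about 1.96, whereas the t critical value with one degree of freedom is about 12.71. Under equal weights, the HK test with two studies is therefore significant only if the sum of the two estimates exceeds roughly 12.7 times their absolute difference.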
Figure 1

(a–d): Influence of heterogeneity in meta-analysis with two and six studies on empirical power. FE, fixed effects approach; DL, DerSimonian and Laird approach; HK, Hartung and Knapp approach. In the left column, simulation results with two studies are presented, whereas in the right column, situations with six studies are investigated. No heterogeneity is assumed in the top row, and in the bottom row, the impact of moderate heterogeneity is shown.

In the homogeneous situation with two studies, the DL and, even better, the FE can be used to efficiently base conclusions on a meta-analysis. In contrast, already with mild to moderate heterogeneity, both standard tests severely violate the prespecified type I error, and there is a high risk of false positive conclusions with the classical approaches. This has major implications for decision making in drug licensing as well. We have noted previously that a meta-analysis can be confirmatory if a drug development program was designed to include a preplanned meta-analysis of the two pivotal trials [24]. As an example, thrombosis prophylaxis was discussed in the paper by Koch and Röhmel [24], where venous thromboembolism is accepted as the primary endpoint in the pivotal trials. If both pivotal trials are successful, they can be combined to demonstrate a positive impact on, for example, mortality. This can be preplanned as a hierarchical testing procedure: first, both pivotal trials are assessed individually, and only then are confirmatory conclusions based on the meta-analysis. As explained, neither the FE, nor the DL, nor the HK can be recommended for a priori planning in this sensitive area unless any indication of heterogeneity is taken as a trigger not to combine studies in a meta-analysis at all.
It is our belief that not enough emphasis has been given to this finding in the original paper and the important role of heterogeneity is not acknowledged enough in the discussion of findings from meta-analyses, in general.
  15 in total

1.  A comparison of statistical methods for meta-analysis.

Authors:  S E Brockwell; I R Gordon
Journal:  Stat Med       Date:  2001-03-30       Impact factor: 2.373

2.  The power of statistical tests in meta-analysis.

Authors:  L V Hedges; T D Pigott
Journal:  Psychol Methods       Date:  2001-09

3.  On tests of the overall treatment effect in meta-analysis with normally distributed responses.

Authors:  J Hartung; G Knapp
Journal:  Stat Med       Date:  2001-06-30       Impact factor: 2.373

Review 4.  Measuring inconsistency in meta-analyses.

Authors:  Julian P T Higgins; Simon G Thompson; Jonathan J Deeks; Douglas G Altman
Journal:  BMJ       Date:  2003-09-06

5.  Can meta-analyses be trusted?

Authors:  S G Thompson; S J Pocock
Journal:  Lancet       Date:  1991-11-02       Impact factor: 79.321

6.  Meta-analysis inside and outside particle physics: convergence using the path of least resistance?

Authors:  Dan Jackson; Rose Baker
Journal:  Res Synth Methods       Date:  2013-04-17       Impact factor: 5.273

Review 7.  TPO receptor agonist for chronic idiopathic thrombocytopenic purpura.

Authors:  Yan Zeng; Xin Duan; Jiajun Xu; Xun Ni
Journal:  Cochrane Database Syst Rev       Date:  2011-07-06

8.  Meta-analysis in clinical trials.

Authors:  R DerSimonian; N Laird
Journal:  Control Clin Trials       Date:  1986-09

9.  "The challenge of meta-analysis": discussion. Indications and contra-indications for meta-analysis.

Authors:  N Victor
Journal:  J Clin Epidemiol       Date:  1995-01       Impact factor: 6.437

Review 10.  Methadone at tapered doses for the management of opioid withdrawal.

Authors:  L Amato; M Davoli; S Minozzi; R Ali; M Ferri
Journal:  Cochrane Database Syst Rev       Date:  2005-07-20