Lifeng Lin. Department of Statistics, Florida State University, Tallahassee, Florida.
Abstract
RATIONALE, AIMS, AND OBJECTIVES: Heterogeneity is a critical issue in meta-analysis because it bears on whether combining the collected studies is appropriate and affects the reliability of the synthesized results. The Q test is the traditional method for assessing heterogeneity; however, because it lacks an intuitive interpretation for clinicians and often has low statistical power, many meta-analysts instead use measures such as the I² statistic to quantify the extent of heterogeneity. This article summarizes the available tools for assessing heterogeneity and compares their performance.
METHODS: We reviewed four heterogeneity measures (I², R̂_I, R̂_M, and R̂_b) and illustrated how they can be treated as test statistics like the Q statistic. These measures were compared with respect to statistical power using simulations driven by three real-data examples. The pairwise agreement among the four measures was also evaluated using Cohen's κ coefficient.
RESULTS: Generally, R̂_I was slightly more powerful than the Q test, although its type I error rate might be slightly inflated. The power of I² was fairly close to that of Q. The R̂_M and R̂_b statistics might have low power in some cases. Because the differences in power among I², R̂_I, and Q were often tiny, meta-analysts are unlikely to find significant heterogeneity with I² or R̂_I if the Q test fails to do so. In addition, I² and R̂_I had fairly good agreement in the simulated meta-analyses, but all other pairs of heterogeneity measures generally had poor agreement.
CONCLUSION: The I² and R̂_I statistics are recommended for measuring heterogeneity. Meta-analysts should use the heterogeneity measures as descriptive statistics that have intuitive interpretations from the clinical perspective, rather than determining the significance of heterogeneity simply from their magnitudes.
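As a rough illustration of the two most familiar quantities named above, the sketch below computes Cochran's Q and the I² statistic from study-level effect estimates and their within-study variances, using the standard inverse-variance definitions. It is not taken from the article: the function name `q_and_i2` and the two-study input are hypothetical, and the other three measures (R̂_I, R̂_M, R̂_b) are not implemented because the abstract does not define them.

```python
from typing import Sequence, Tuple


def q_and_i2(effects: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Return (Q, I^2) for a meta-analysis under the standard definitions.

    Q   = sum_i w_i * (y_i - pooled)^2, with w_i = 1 / variance_i
    I^2 = max(0, (Q - (k - 1)) / Q), where k is the number of studies
    """
    if len(effects) != len(variances):
        raise ValueError("effects and variances must have the same length")
    if len(effects) < 2:
        raise ValueError("at least two studies are required")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Inverse-variance weights and the common-effect pooled estimate.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))

    # I^2: share of total variation attributed to heterogeneity, truncated at 0.
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


# Hypothetical two-study input, chosen only to make the arithmetic easy to follow.
q, i2 = q_and_i2([0.0, 2.0], [0.5, 0.5])
print(q, i2)
```

For this made-up input the weights are both 2, the pooled estimate is 1, Q equals 4 on 1 degree of freedom, and I² equals 0.75. Treating I² as a test statistic, as the article discusses, would additionally require a reference distribution, which this sketch does not attempt.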