Robbie C M van Aert, Jelte M Wicherts, Marcel A L M van Assen.
Abstract
Publication bias is a substantial problem for the credibility of research in general and of meta-analyses in particular, as it yields overestimated effects and may suggest the existence of non-existing effects. Although there is consensus that publication bias exists, how strongly it affects different scientific literatures is less well known. We examined evidence of publication bias in a large-scale data set of primary studies that were included in 83 meta-analyses published in Psychological Bulletin (representing meta-analyses from psychology) and 499 systematic reviews from the Cochrane Database of Systematic Reviews (CDSR; representing meta-analyses from medicine). Publication bias was assessed on all homogeneous subsets (3.8% of all subsets of meta-analyses published in Psychological Bulletin) of primary studies included in meta-analyses, because publication bias methods do not have good statistical properties if the true effect size is heterogeneous. Publication bias tests did not reveal evidence for bias in the homogeneous subsets. Overestimation was minimal but statistically significant, providing evidence of publication bias that appeared to be similar in both fields. However, a Monte-Carlo simulation study revealed that the creation of homogeneous subsets resulted in challenging conditions for publication bias methods, since the number of effect sizes in a subset was rather small (the median number of effect sizes was 6). Our findings are consistent with publication bias ranging from no bias to, in the most extreme case, only 5% of statistically nonsignificant effect sizes being published.
These and other findings, in combination with the small percentages of statistically significant primary effect sizes (28.9% and 18.9% for subsets published in Psychological Bulletin and CDSR), led to the conclusion that evidence for publication bias in the studied homogeneous subsets is weak, but suggestive of mild publication bias in both psychology and medicine.
Year: 2019 PMID: 30978228 PMCID: PMC6461282 DOI: 10.1371/journal.pone.0215052
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of methods to assess publication bias and to estimate effect sizes corrected for publication bias.
The penultimate column lists principal references of the different methods and the final column indicates whether a method is included in the analyses of this paper.
| Method | Description | Characteristics/Recommendations | Included in analyses |
|---|---|---|---|
| Fail-safe N | Estimates number of effect sizes in the file-drawer | Use of the method is discouraged because it, for instance, assumes that all nonsignificant effect sizes are equal to zero and focuses on statistical instead of practical significance | No |
| Funnel plot | Graphical representation of small-study effects where funnel plot asymmetry is an indicator of small-study effects | Publication bias is not the only cause of funnel plot asymmetry | No |
| Egger’s and rank-correlation test | Statistical tests for testing funnel plot symmetry | Publication bias is not the only cause of funnel plot asymmetry | Yes |
| Test of Excess Significance | Computes whether observed and expected number of statistically significant results are in agreement | Do not apply the method in case of heterogeneity in true effect size | Yes |
| p-uniform’s publication bias test | Examines whether statistically significant | Method does not use information of nonsignificant effect sizes and assumes homogeneous true effect size | Yes |
| Trim and fill method | Method corrects for funnel plot asymmetry by trimming most extreme effect sizes and filling these effect sizes to obtain funnel plot symmetry | Use of the method is discouraged because it falsely imputes effect sizes when none are missing and other methods have been shown to outperform trim and fill | No |
| PET-PEESE | Extension of Egger’s test where the corrected estimate is the intercept of a regression line fitted through the effect sizes in a funnel plot | Method becomes biased if it is based on fewer than 10 effect sizes, the between-study variance in true effect size is large, and the sample size of primary studies included in a meta-analysis is rather similar | No |
| p-uniform | Estimate is the effect size for which the distribution of conditional | Method does not use information of nonsignificant effect sizes and assumes homogeneous true effect size | Yes |
| Selection model approach | Method makes assumptions on the distribution of effect sizes (effect size model) and mechanism of observing effect sizes (selection model). Estimation is performed by combining these two models. | User has to make sophisticated assumptions and choices | No |
| 10% most precise effect sizes | Only the 10% most precise effect sizes are used for estimation with a random-effects model | 90% of the available effect sizes are discarded and bias in estimates increases as a function of heterogeneity in true effect size | Yes |
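To make the idea of testing funnel plot symmetry concrete, the following is a minimal sketch of the classic form of Egger's regression test, in which the standardized effect size is regressed on precision and an intercept far from zero signals asymmetry. The function name `egger_test` and the choice of unweighted least squares on the standardized scale are assumptions made for illustration; the record above does not state which variant or software the authors used.

```python
import math

def egger_test(effects, std_errors):
    """Classic Egger regression (illustrative sketch, not the authors' code).

    Regresses z_i = y_i / se_i on x_i = 1 / se_i by ordinary least squares.
    Returns (intercept, t_statistic, degrees_of_freedom).
    """
    k = len(effects)
    if k < 3 or k != len(std_errors):
        raise ValueError("need at least 3 effect sizes with matching standard errors")
    x = [1.0 / se for se in std_errors]                 # precision
    z = [y / se for y, se in zip(effects, std_errors)]  # standardized effect
    x_bar = sum(x) / k
    z_bar = sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("standard errors must vary across studies")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    # Residual variance and the standard error of the intercept
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    df = k - 2
    se_intercept = math.sqrt((rss / df) * (1.0 / k + x_bar ** 2 / sxx))
    # Guard against a perfect fit, where the t-statistic is undefined
    t_stat = intercept / se_intercept if se_intercept > 0.0 else float("nan")
    return intercept, t_stat, df
```

The t-statistic would be compared with a t-distribution on `df` degrees of freedom. With the small subsets described in the abstract (median of 6 effect sizes), such a test has few degrees of freedom to work with.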
Fig 1. Funnel plot showing the relationship between the observed effect size (Hedges’ g; solid circles) and its standard error in a meta-analysis by Jürgens and Graudal [42] on the effect of sodium intake on noradrenaline (left panel).
The funnel plot in the right panel also includes the Hedges’ g effect sizes that are imputed by the trim and fill method (open circles).
Fig 2. Flowchart illustrating the extraction procedure of data from meta-analyses published in Psychological Bulletin between 2004 and 2014.
Hypotheses between predictors and effect size estimate based on random-effects model, p-uniform, and overestimation in effect size when comparing estimate of the random-effects model with p-uniform (Y).
| Predictor | Random-effects model | p-uniform | Overestimation (Y) |
|---|---|---|---|
| Discipline | Larger estimates in subsets from Psychological Bulletin | No specific expectation | Overestimation more severe in Psychological Bulletin |
| I²-statistic | No relationship | Positive relationship | Negative relationship |
| Primary studies’ precision | Negative relationship | No relationship | Negative relationship |
| Proportion of significant effect sizes | Predictor not included | No specific expectation | No specific expectation |
Percentage of statistically significant effect size estimates, median number of effect sizes and median of average sample size per homogeneous subset, and mean and median of effect size estimates when the subsets were analyzed with random-effects meta-analysis, p-uniform, and random-effects meta-analysis based on the 10% most precise observed effect sizes.
| | RE meta-analysis | p-uniform | 10% most precise |
|---|---|---|---|
| Median (IQR) number of effect sizes | 6 (5;9) | 1 (0;4) | 1 (1;1) |
| Median (IQR) sample size | 97.8 (52.4;173.2) | 109 (56.5;206.2) | 207.3 (100;466) |
| Mean, median, [min.;max.], (SD) of estimates | 0.332, 0.279, [0;1.456] (0.264) | -0.168, 0.372, | 0.283, 0.22, |
| Mean, median, [min.;max.], (SD) of estimates | -0.216, -0.123, | -0.041, -0.214, | -0.228, -0.204, |
| Median (IQR) number of effect sizes | 6 (5;8) | 1 (0;2) | 1 (1;1) |
| Median (IQR) sample size | 126.6 (68.3;223.3) | 123.3 (71.9;283.5) | 207 (101.2;443) |
| Mean, median, [min.;max.], (SD) of estimates | 0.304, 0.215, [0.001;1.833] (0.311) | -1.049, 0.323, | 0.284, 0.201, |
| Mean, median, [min.;max.], (SD) of estimates | -0.267, -0.19, | 1.51, -0.239, | -0.214, -0.182, |
RE meta-analysis is random-effects meta-analysis, IQR is the interquartile range, min. is the minimum value, max. is the maximum value, SD is the standard deviation, and CDSR is Cochrane Database of Systematic Reviews. The percentages of homogeneous subsets with positive and negative RE meta-analysis estimates do not sum to 100%, because the estimates of three homogeneous subsets obtained from the meta-analysis by Else-Quest and colleagues [103] were equal to zero. These authors set an effect size to zero if it could not be extracted from a primary study but was reported as not statistically significant.
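As an illustration of how a random-effects estimate for a single subset can be computed, here is a minimal sketch using the DerSimonian-Laird estimator of the between-study variance, together with the I²-statistic. The choice of estimator and the function name `random_effects_dl` are assumptions for this example; the text above does not say which between-study variance estimator the authors used.

```python
import math

def random_effects_dl(effects, variances):
    """Random-effects meta-analysis with the DerSimonian-Laird estimator.

    Illustrative sketch only. Returns a dict with the pooled estimate, its
    standard error, the between-study variance tau2, Cochran's Q, and I2 (%).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least 2 effect sizes with matching variances")
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)                    # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    estimate = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = 100.0 * max(0.0, (q - (k - 1)) / q) if q > 0.0 else 0.0
    return {"estimate": estimate, "se": se, "tau2": tau2, "q": q, "i2": i2}
```

A subset with `tau2` estimated at zero reduces to the fixed-effect result, which is the situation the homogeneous subsets are meant to approximate.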
Results of applying Egger’s regression test, rank-correlation test, p-uniform’s publication bias test, and test of excess significance (TES) to examine the prevalence of publication bias in meta-analyses from Psychological Bulletin and Cochrane Database of Systematic Reviews.
| Row test | Column test | Row result | Column not sig. | Column sig. | Row total |
|---|---|---|---|---|---|
| Egger | Rank-correlation | Not sig. | 600 | 35 | 635; 87.1% |
| Egger | Rank-correlation | Sig. | 51 | 43 | 94; 12.9% |
| Egger | Rank-correlation | Total | 651; 89.3% | 78; 10.7% | |
| Egger | p-uniform | Not sig. | 354 | 34 | 388; 83.3% |
| Egger | p-uniform | Sig. | 70 | 8 | 78; 16.7% |
| Egger | p-uniform | Total | 424; 91% | 42; 9% | |
| Egger | TES | Not sig. | 609 | 29 | 638; 87.2% |
| Egger | TES | Sig. | 83 | 11 | 94; 12.8% |
| Egger | TES | Total | 692; 94.5% | 40; 5.5% | |
| Rank-correlation | p-uniform | Not sig. | 377 | 34 | 411; 88.2% |
| Rank-correlation | p-uniform | Sig. | 47 | 8 | 55; 11.8% |
| Rank-correlation | p-uniform | Total | 424; 91% | 42; 9% | |
| Rank-correlation | TES | Not sig. | 620 | 31 | 651; 89.3% |
| Rank-correlation | TES | Sig. | 69 | 9 | 78; 10.7% |
| Rank-correlation | TES | Total | 689; 94.5% | 40; 5.5% | |
| p-uniform | TES | Not sig. | 393 | 31 | 424; 91% |
| p-uniform | TES | Sig. | 33 | 9 | 42; 9% |
| p-uniform | TES | Total | 426; 91.4% | 40; 8.6% | |
H denotes Loevinger’s H to describe the association between two methods. The rank-correlation test could not be applied to all 732 subsets, because there was no variation in the observed effect sizes in three subsets. All these subsets were part of the meta-analysis by Else-Quest and colleagues [103], who set an effect size to zero if it could not be extracted from a primary study but was reported as not statistically significant.
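For readers unfamiliar with Loevinger's H, the sketch below computes it for a 2×2 cross-classification of two tests as the observed covariance of the two "significant" indicators divided by the largest covariance attainable given the marginal totals. This is the standard pairwise definition; the function name `loevinger_h` and its argument order are assumptions for illustration, and the H values reported in the original table are not reproduced in this record.

```python
def loevinger_h(both_not_sig, row_not_col_sig, row_sig_col_not, both_sig):
    """Loevinger's H for two binary test outcomes (illustrative sketch).

    Arguments are the four cell counts of a 2x2 table in which rows are the
    first test and columns are the second test.
    """
    n = both_not_sig + row_not_col_sig + row_sig_col_not + both_sig
    if n == 0:
        raise ValueError("table is empty")
    p_row = (row_sig_col_not + both_sig) / n   # first test significant
    p_col = (row_not_col_sig + both_sig) / n   # second test significant
    p_both = both_sig / n
    cov = p_both - p_row * p_col
    # Largest covariance that the margins allow
    cov_max = min(p_row, p_col) - p_row * p_col
    if cov_max <= 0.0:
        raise ValueError("H is undefined when a margin is 0 or 1")
    return cov / cov_max

# Cell counts for Egger's test (rows) against the rank-correlation test (columns)
h_egger_rank = loevinger_h(600, 35, 51, 43)
```

H equals 1 when every subset flagged by the less frequently significant test is also flagged by the other, and 0 when the two tests are statistically unrelated.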
Results of meta-meta-regression on the absolute value of the random-effects meta-analysis effect size estimate with predictors discipline, I²-statistic, harmonic mean of the standard error (standard error), and number of effect sizes in a subset.
| Predictor | B (SE) | Test statistic (p-value) | 95% CI |
|---|---|---|---|
| Intercept | 0.035 (0.018) | 1.924 (.055) | -0.001;0.07 |
| Discipline | 0.056 (0.014) | 3.888 (< .001) | 0.028;0.084 |
| I²-statistic | 0.002 (0.0004) | 3.927 (< .001) | 0.001;0.002 |
| Standard error | 0.776 (0.073) | 10.685 (< .001) | 0.633;0.918 |
| Number of effect sizes | -0.002 (0.0005) | -4.910 (< .001) | -0.003;-0.001 |
CDSR is the reference category for discipline. p-values for discipline and harmonic mean of the standard error are one-tailed whereas the other p-values are two-tailed. CI = Wald-based confidence interval.
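The relationship between a coefficient, its standard error, the Wald-based interval, and one- versus two-tailed p-values can be sketched as follows. The helper names are hypothetical, and the standard normal reference distribution is an assumption: the record does not say whether the reported statistics were referred to a normal or a t-distribution.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def wald_ci(b, se, crit=1.959964):
    """Wald-based 95% interval: estimate plus/minus critical value times SE."""
    return b - crit * se, b + crit * se

def p_value(statistic, tail="two"):
    """p-value under a standard normal reference distribution (assumption).

    tail="upper" tests for a positive effect, tail="lower" for a negative
    effect, and tail="two" is the two-tailed test.
    """
    upper = 1.0 - normal_cdf(statistic)
    if tail == "upper":
        return upper
    if tail == "lower":
        return 1.0 - upper
    if tail == "two":
        return 2.0 * (1.0 - normal_cdf(abs(statistic)))
    raise ValueError("tail must be 'two', 'upper', or 'lower'")

# Discipline row of the table above: B = 0.056, SE = 0.014, statistic = 3.888
low, high = wald_ci(0.056, 0.014)
p_discipline = p_value(3.888, tail="upper")   # one-tailed, as in the table note
p_intercept = p_value(1.924, tail="two")      # two-tailed
```

Under these assumptions the interval for discipline lands close to the reported 0.028;0.084; small differences are to be expected because B and SE are rounded in the table and the reference distribution is assumed.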
Results of meta-meta-regression on the absolute value of p-uniform’s effect size estimate with predictors discipline, I²-statistic, harmonic mean of the standard error (standard error), proportion of statistically significant effect sizes in a subset (Prop. sig. effect sizes), and number of effect sizes in a subset.
| Predictor | B (SE) | Test statistic (p-value) | 95% CI |
|---|---|---|---|
| Intercept | 0.77 (0.689) | 1.118 (0.264) | -0.584;2.124 |
| Discipline | 0.001 (0.497) | 0.001 (0.999) | -0.975;0.976 |
| I²-statistic | 0.013 (0.014) | 0.939 (0.174) | -0.014;0.039 |
| Standard error | 3.767 (2.587) | 1.456 (0.146) | -1.316;8.851 |
| Prop. sig. effect sizes | -1.287 (0.797) | -1.615 (0.107) | -2.853;0.279 |
| Number of effect sizes | -0.02 (0.015) | -1.363 (0.173) | -0.049;0.009 |
CDSR is the reference category for discipline. The p-value for the I²-statistic is one-tailed whereas the other p-values are two-tailed. CI = Wald-based confidence interval.
Mean, standard deviation (SD), 95% confidence interval (CI), and median of the Y variable computed with p-uniform and the 10% most precise observed effect sizes.
| | p-uniform: Psy. Bull. | p-uniform: CDSR | 10% most precise: Psy. Bull. | 10% most precise: CDSR |
|---|---|---|---|---|
| Mean (SD) | -0.007 (0.412) | 0.042 (0.305) | 0.030 (0.181) | 0.038 (0.220) |
| 95% CI | (-0.056;0.043) | (0.002;0.083) | (0.011;0.048) | (0.016;0.061) |
| Median | 0.019 | 0.051 | 0.024 | 0.023 |
Results are reported for homogeneous subsets of meta-analyses published in Psychological Bulletin (Psy. Bull.) and Cochrane Database of Systematic Reviews (CDSR).
Results of meta-meta-regression on the effect size overestimation in random-effects meta-analysis when compared to p-uniform (Y) and predictors discipline, I²-statistic, harmonic mean of the standard error (standard error), proportion of statistically significant effect sizes in a subset (Prop. sig. effect sizes), and number of effect sizes in a subset.
| Predictor | B (SE) | Test statistic (p-value) | 95% CI |
|---|---|---|---|
| Intercept | -0.017 (0.033) | -0.517 (.605) | -0.083;0.048 |
| Discipline | -0.04 (0.024) | -1.651 (.951) | -0.087;0.008 |
| I²-statistic | -0.004 (0.001) | -5.338 (< .001) | -0.005;-0.002 |
| Standard error | 0.172 (0.126) | 1.371 (.086) | -0.074;0.419 |
| Prop. sig. effect sizes | 0.182 (0.039) | 4.713 (< .001) | 0.106;0.258 |
| Number of effect sizes | -0.001 (0.001) | -2.064 (.04) | -0.003;-0.0001 |
CDSR is the reference category for discipline. p-values for discipline, the I²-statistic, and the harmonic mean of the standard error are one-tailed whereas the other p-values are two-tailed. CI = Wald-based confidence interval.
Fig 3. Type-I error rate and statistical power of the rank-correlation test (open bullets), Egger’s test (triangles), test of excess significance (TES; diamonds), and p-uniform’s publication bias test (solid bullets) in the Monte-Carlo simulation study.
pub and μ are the extent of publication bias and the average true effect size, respectively.
Fig 4. Overestimation (Y) of the random-effects model when compared with the 10% most precise observed effect sizes for simulated data based on characteristics of subsets from meta-analyses published in Psychological Bulletin (left panel) and Cochrane Database of Systematic Reviews (CDSR; right panel).
pub refers to the extent of publication bias and open bullets, triangles, and diamonds indicate no, small, and medium average true effect size. The solid line indicates the mean of the Y-variable observed in the homogeneous subsets and the dashed lines are the upper and lower bound of a 95% confidence interval (CI) around the mean of the Y-variable.
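How the extent of publication bias translates into overestimation can be illustrated with a small Monte-Carlo sketch. It is not the authors' simulation design. The selection rule (statistically significant results are always published, nonsignificant ones are withheld with probability `pub`), the one-tailed significance threshold, the equal standard errors, and the comparison with the true effect instead of the 10% most precise estimates are all assumptions made to keep the example short.

```python
import random

def simulate_overestimation(mu, pub, k=6, se=0.2, n_reps=2000, seed=1):
    """Mean overestimation of the pooled estimate under selective publication.

    Illustrative sketch with hypothetical parameters:
    mu   -- true effect size shared by all studies (homogeneous subset)
    pub  -- probability that a nonsignificant result stays unpublished
    k    -- number of published effect sizes per subset
    se   -- common standard error of the primary studies
    """
    rng = random.Random(seed)
    z_crit = 1.645  # one-tailed test at alpha = .05 (assumption)
    total = 0.0
    for _ in range(n_reps):
        published = []
        while len(published) < k:
            y = rng.gauss(mu, se)
            significant = y / se > z_crit
            # Significant results are always published; others survive with
            # probability 1 - pub
            if significant or rng.random() >= pub:
                published.append(y)
        # With equal standard errors the inverse-variance pooled estimate is
        # the plain mean of the published effect sizes
        total += sum(published) / k - mu
    return total / n_reps

no_bias = simulate_overestimation(mu=0.0, pub=0.0)
moderate_bias = simulate_overestimation(mu=0.0, pub=0.5)
extreme_bias = simulate_overestimation(mu=0.0, pub=1.0)
```

In this toy setup overestimation stays small until a large share of nonsignificant results is withheld, which gives some intuition for why mild bias is hard to detect in subsets of about six effect sizes.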