Sanghyun Hong, W. Robert Reed.
Abstract
The purpose of this study is to show how Monte Carlo analysis of meta-analytic estimators can be used to select estimators for specific research situations. Our analysis conducts 1620 individual experiments, where each experiment is defined by a unique combination of sample size, effect size, effect size heterogeneity, publication selection mechanism, and other research characteristics. We compare 11 estimators commonly used in medicine, psychology, and the social sciences. These are evaluated on the basis of bias, mean squared error (MSE), and coverage rates. For our experimental design, we reproduce simulation environments from four recent studies. We demonstrate that relative estimator performance differs across performance measures. Estimator performance is a complex interaction of performance indicator and aspects of the application. An estimator that may be especially good with respect to MSE may perform relatively poorly with respect to coverage rates. We also show that the size of the meta-analyst's sample and effect heterogeneity are important determinants of relative estimator performance. We use these results to demonstrate how these observable characteristics can guide the meta-analyst to choose the most appropriate estimator for their research circumstances.
Keywords: Monte Carlo; estimator performance; experiments; meta-analysis; publication bias; simulation design
Year: 2020 PMID: 33150663 PMCID: PMC8074967 DOI: 10.1002/jrsm.1467
Source DB: PubMed Journal: Res Synth Methods ISSN: 1759-2879 Impact factor: 5.273
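The evaluation loop the abstract describes (simulate many meta-analyses under known parameters, apply an estimator, score it on bias, MSE, and coverage) can be sketched in a few lines. This is a minimal illustration using a fixed-effect estimator and invented simulation settings, not the paper's actual design:

```python
import numpy as np

def simulate_meta(n_studies, true_effect, tau, rng):
    """Draw one simulated meta-analysis: per-study SEs and observed effects."""
    se = rng.uniform(0.1, 0.5, n_studies)                  # within-study SEs (invented range)
    theta = true_effect + rng.normal(0.0, tau, n_studies)  # heterogeneous true effects
    return theta + rng.normal(0.0, se), se                 # observed effects, SEs

def fe_estimate(y, se):
    """Fixed-effect (inverse-variance weighted) mean and its standard error."""
    w = 1.0 / se**2
    return np.sum(w * y) / np.sum(w), np.sqrt(1.0 / np.sum(w))

def evaluate(true_effect=0.5, tau=0.0, n_studies=40, reps=3000, seed=0):
    """Bias, MSE, and 95% CI coverage of the FE estimator over `reps` meta-analyses."""
    rng = np.random.default_rng(seed)
    ests, covered = [], 0
    for _ in range(reps):
        y, se = simulate_meta(n_studies, true_effect, tau, rng)
        est, est_se = fe_estimate(y, se)
        ests.append(est)
        covered += abs(est - true_effect) <= 1.96 * est_se
    ests = np.asarray(ests)
    return {"bias": ests.mean() - true_effect,
            "mse": np.mean((ests - true_effect) ** 2),
            "coverage": covered / reps}
```

Each of the paper's 1620 experiments follows this pattern, with 3000 simulated meta-analyses per experiment and 11 estimators in place of the single fixed-effect estimator here.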
Summary of selected Monte Carlo studies of estimator performance: number of experiments and estimators studied
| Study | Experiments | Estimators |
|---|---|---|
| Stanley et al | 180 | RE, WLS, WAAP, PP |
| Alinaghi and Reed | 74 | WLS‐FE, WLS‐RE, PP |
| Bom and Rachinger | 215 | FE, RE, WAAP, PP, EK |
| Carter et al | 432 | TF, pC, pU, RE, 3PSM, WAAP, PP |
| Hedges and Vevea | 176 | 5PSM |
| McShane et al | 125 | pC, pU, 3PSM |
| Moreno et al | 240 | TF(FE‐FE), TF(FE‐RE), TF(RE‐RE), FE, RE, FE‐se, RE‐se, D‐se, FE‐var, RE‐var, D‐var, Harbord, Peters, and Harbord‐C |
| Reed | 36 | OLS, PET, PEESE, FE, WLS, RE |
| Rucker et al | 36 | TF, CSM, RE, LMA |
| Simonsohn et al | 30 | TF, pC, FE |
| Stanley | 120 | WLS, FE, PP |
| Stanley and Doucouliagos | 60 | FE, RE, Top10, PEESE, PP, WLS‐se, WLS‐Quadratic, WLS‐Cubic |
| van Aert et al | 25 | pC, pU, FE, RE |
| van Assen et al | 36 | FE, TF, pU, TES |
Note: Estimators:
3PSM/4PSM/5PSM = Three‐Parameter, Four‐Parameter, and Five‐Parameter Selection Models
AK1 = Andrews and Kasy's “symmetric selection” model
AK2 = Andrews and Kasy's “asymmetric selection” model
CSM = Copas selection model (Copas )
EK = Bom and Rachinger's Endogenous Kink estimator
FE = Fixed Effects
FE‐se, RE‐se, and WLS‐se/D‐se/PET = Estimates the model effect_i = β0 + β1·SE_i + ε_i using FE, RE, and WLS, respectively
FE‐var, RE‐var, and PEESE/D‐var = Estimates the model effect_i = β0 + β1·SE_i² + ε_i using FE, RE, and WLS, respectively
Harbord/Harbord‐C = Harbord, Egger, and Sterne's “Regression test for small‐study effects” and variant
LMA = Limit meta‐analysis (Rucker et al ).
OLS = OLS regression of estimated effects on a constant.
pC = p‐curve
pU = p‐uniform
Peters = Peters et al's “Regression test for funnel asymmetry”
PP = PET‐PEESE (Stanley and Doucouliagos )
RE = Random Effects
TES = Test for excess significance (Ioannidis and Trikalinos )
TF/TF(RE‐RE) = Trim and Fill with RE used for both the “trim” and “fill” components
TF(FE‐FE)/TF(FE‐RE) = Trim and Fill with variants depending on whether FE or RE is used for the “trim” and “fill” components, respectively
Top10 = Estimator which uses only the most precise 10% of estimates (Stanley et al. )
WLS/WLS‐FE = Weighted Least Squares with inverse‐variance weights 1/SE_i²
WLS‐RE = Weighted Least Squares with random‐effects weights 1/(SE_i² + τ²)
WLS‐Quadratic = Estimates the model effect_i = β0 + β1·SE_i + β2·SE_i² + ε_i using WLS
WLS‐Cubic = Estimates the model effect_i = β0 + β1·SE_i + β2·SE_i² + β3·SE_i³ + ε_i using WLS
WAAP = Stanley et al's Weighted Average of the Adequately Powered‐WLS‐FE hybrid estimator.
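As a concrete example of one glossary entry, the PET-PEESE (PP) conditional estimator can be sketched as follows. This is a simplified version: the PET-stage test here is a one-sided z-test at the 5% level, whereas published implementations differ in the exact test and degrees of freedom:

```python
import numpy as np

def wls_intercept(y, x, se):
    """WLS of y on [1, x] with inverse-variance weights; returns (intercept, its SE)."""
    w = np.sqrt(1.0 / se**2)
    X = np.column_stack([np.ones_like(x), x]) * w[:, None]
    yw = y * w
    beta, *_ = np.linalg.lstsq(X, yw, rcond=None)
    resid = yw - X @ beta
    s2 = resid @ resid / (len(y) - 2)          # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[0], np.sqrt(cov[0, 0])

def pet_peese(y, se):
    """PET: regress effects on SEs. If the PET intercept is significantly
    positive (one-sided z-test, alpha = 0.05 -- a simplification), switch to
    PEESE: regress effects on squared SEs and report that intercept."""
    pet_b0, pet_se0 = wls_intercept(y, se, se)
    if pet_b0 / pet_se0 > 1.645:
        return wls_intercept(y, se, se**2)[0]
    return pet_b0
```

The conditional logic is the point: PET corrects for small-study effects but is biased when a genuine effect exists, so a significant PET intercept triggers the gentler PEESE correction.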
FIGURE 1 Illustration of 3PSM and 4PSM.
FIGURE 2 Illustration of AK1 and AK2.
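The selection models illustrated in Figure 1 are built around a step function that assigns each study a relative publication probability according to the p-value interval it falls in. A minimal sketch of that weight function follows; the cutpoints and probabilities are illustrative values, not the paper's:

```python
import numpy as np

def step_weights(p_values, cutpoints, rel_probs):
    """Relative publication probability under a step selection function:
    weight 1.0 below the first cutpoint, then rel_probs[k] within each
    successive p-value interval. The 3PSM uses one cutpoint; the 4PSM two."""
    idx = np.searchsorted(cutpoints, p_values, side="right")
    probs = np.concatenate([[1.0], np.asarray(rel_probs, dtype=float)])
    return probs[idx]

# 3PSM-style: significant results always published, the rest with prob 0.3
w3 = step_weights(np.array([0.01, 0.20]), cutpoints=[0.05], rel_probs=[0.3])
```

The selection models estimate these interval probabilities jointly with the mean effect and heterogeneity by maximum likelihood, which is also why they sometimes fail to converge (see the convergence notes below the later tables).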
Number of experiments by sample size and extent of effect heterogeneity
| A. Stanley, Doucouliagos, and Ioannidis | | | | |
|---|---|---|---|---|
| Sample size | Low I² | Moderate I² | High I² | Total |
| {5, 10} | 30 | 27 | 15 | 72 |
| 20 | 15 | 10 | 11 | 36 |
| 40 | 15 | 10 | 11 | 36 |
| 80 | 13 | 12 | 11 | 36 |
| {100, 200, 400, 800} | 51 | 49 | 44 | 144 |
| Total | 124 | 108 | 92 | 324 |
Note: The table lists the number of experiments for each {sample size, effect heterogeneity} category, by simulation environment. An experiment is defined as a unique set of parameters determining (a) effect size, (b) effect heterogeneity, (c) publication selection, (d) sample size, and (e) (for Carter et al., 2019) questionable research practices (see Appendix 2 in Data S1). Each experiment consists of 3000 simulated meta‐analyses. I² measures the share of effect size variance that is due to heterogeneity in true effects. It is based on τ², which we, following Carter et al, estimate using restricted maximum likelihood (REML) [see Equation (6) in the text and the associated discussion].
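The heterogeneity quantities in the note can be computed from a vector of estimated effects and standard errors. The paper estimates τ² by REML; the sketch below substitutes the closed-form DerSimonian-Laird moment estimator purely to keep the example dependency-free:

```python
import numpy as np

def dersimonian_laird_tau2(y, se):
    """Method-of-moments tau^2 (DerSimonian-Laird). The paper uses REML;
    DL is shown here only as a simple closed-form stand-in."""
    w = 1.0 / se**2
    ybar = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - ybar) ** 2)                  # Cochran's Q
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    return max(0.0, (q - (len(y) - 1)) / c)

def i_squared(se, tau2):
    """I^2: share of effect-size variance due to heterogeneity in true
    effects, relative to a 'typical' within-study variance s^2."""
    k, w = len(se), 1.0 / se**2
    s2 = (k - 1) * np.sum(w) / (np.sum(w) ** 2 - np.sum(w**2))
    return tau2 / (tau2 + s2)
```

With equal within-study variances of 0.01 and a true τ² of 0.09, this returns I² near 0.9, matching the "high heterogeneity" regime in the tables.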
Comparison of estimator performance: all experiments
| Estimator | |Bias| | Estimator | MSE | Estimator | |Coverage‐0.95| | Estimator | Type I Error |
|---|---|---|---|---|---|---|---|
| | 0.076 | | 0.075 | | 0.172 | | 0.113 |
| | 0.081 | | 0.106 | 3PSM | 0.212 | | 0.236 |
| | 0.083 | | 0.107 | 4PSM | 0.258 | 3PSM | 0.243 |
| 4PSM | 0.090 | TF | 0.110 | | 0.290 | | 0.267 |
| 3PSM | 0.101 | | 0.120 | | 0.297 | 4PSM | 0.274 |
| | 0.109 | | 0.136 | | 0.310 | | 0.516 |
| | 0.132 | 3PSM | 0.140 | | 0.346 | | 0.566 |
| TF | 0.140 | pU | 0.160 | TF | 0.396 | | 0.586 |
| | 0.216 | 4PSM | 0.163 | | 0.512 | pU | 0.589 |
| pU | 0.229 | | 0.195 | pU | 0.578 | | 0.640 |
| pC | 0.333 | pC | 0.608 | pC | NA | pC | NA |
Note: The values in the table represent the average values of the respective performance measures across all 1620 experiments for the first three columns. The last column only reports results for those experiments where the true mean effect = 0. The three “best” performing estimators on the dimensions of Bias, MSE, and Coverage rates/Type I Error (EK, WAAP, and AK2) are color‐coded to facilitate comparison across performance measures.
Estimators:
3PSM/4PSM = Three‐Parameter/Four‐Parameter Selection Models
AK1 = Andrews and Kasy's “symmetric selection” model
AK2 = Andrews and Kasy's “asymmetric selection” model
EK = Bom and Rachinger's Endogenous Kink estimator
pC = p‐curve
pU = p‐uniform
PP = PET‐PEESE (Stanley and Doucouliagos )
RE = Random Effects
TF = Trim and Fill
WAAP = Stanley et al's Weighted Average of the Adequately Powered‐WLS hybrid estimator.
This column reports the average absolute value of the difference between (a) the percent of times the 95% confidence interval contains the true mean value and (b) 95%.
This column reports the percentage of false positives when the true mean effect = 0; that is, the percent of times an estimate is statistically significant when there is no true effect.
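The two footnoted criteria can be computed directly from arrays of simulated estimates and their standard errors; a minimal sketch, with the normal critical value 1.96 assumed throughout:

```python
import numpy as np

def coverage_distortion(est, est_se, true_effect):
    """|Coverage - 0.95|: gap between empirical 95% CI coverage and nominal."""
    covered = np.abs(est - true_effect) <= 1.96 * est_se
    return abs(covered.mean() - 0.95)

def type_i_error_rate(est, est_se):
    """Share of estimates significant at the 5% level; meaningful when the
    true mean effect is 0, so every rejection is a false positive."""
    return np.mean(np.abs(est / est_se) > 1.96)
```

For a well-calibrated estimator both quantities are small; the large Type I Error values in the table (0.5 and above for several estimators) indicate severe over-rejection under publication selection.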
Comparison of Estimator Performance across Simulation Environments
| A. |Bias| | | | | | | | |
|---|---|---|---|---|---|---|---|
| Estimator | |Bias| | Estimator | |Bias| | Estimator | |Bias| | Estimator | |Bias| |
| | 0.031 | | 0.200 | | 0.071 | | 0.058 |
| | 0.036 | | 0.213 | | 0.089 | | 0.062 |
| | 0.040 | | 0.256 | | 0.099 | | 0.064 |
| | 0.050 | | 0.263 | | 0.124 | | 0.071 |
| | 0.053 | | 0.284 | | 0.147 | | 0.080 |
| | 0.060 | | 0.298 | | 0.187 | TF | 0.091 |
| | 0.083 | | 0.390 | | 0.238 | | 0.095 |
| | 0.088 | | 0.468 | | 0.262 | | 0.105 |
| | 0.107 | | 0.550 | | 0.361 | | 0.107 |
| pU | 0.146 | | 1.530 | pU | 0.373 | | 0.114 |
| pC | 0.420 | | 1.556 | pC | 0.521 | | 0.150 |
Note: The four panels rank the performance of the 11 estimators on the basis of their average Bias, MSE, |Coverage‐0.95|, and Type I Error performance, disaggregated by simulation environment. Estimators are ranked from “best” (least Bias, smallest MSE, etc.) to worst. Values in the tables are the average values for the respective performance measures and simulation environments. In each panel the best and second best performing estimators in the SD&I environments are color‐coded brown and gray, respectively. This allows one to track their relative performance across the remaining three simulation environments.
It is important to note that the maximization procedures that underlie some of the estimators do not always converge. Averages across estimators will not be comparable if they average across different experiments due to lack of convergence. To flag this, the table distinguishes three types of convergence behaviour. Boldfaced estimators indicate a convergence rate of 99% or higher (eg, AK1). Conventional, non‐boldfaced type indicates that the estimator converged between 90% and 99% of the time (eg, AK1). Italicized estimators indicate that convergence rates were lower than 90% (eg, AK1).
This panel reports the average absolute value of the difference between (i) the percent of times the 95% confidence interval contains the true mean value and (ii) 95%.
This panel reports the percentage of false positives when the true mean effect = 0; that is, the percent of times an estimate is statistically significant when there is no true effect.
Sample size and effect heterogeneity as determinants of absolute estimator performance: CSG&H simulation environment
| Estimator | Bias: Sample size/1000 | Bias: I² | MSE: Sample size/1000 | MSE: I² |
|---|---|---|---|---|
| | −0.0143* (0.0074) | 0.1147*** (0.0068) | −0.0101*** (0.0018) | 0.0240*** (0.0018) |
| | 0.0112 (0.0116) | 0.2214*** (0.0098) | −0.0160*** (0.0045) | 0.0812*** (0.0040) |
| | 0.0029 (0.0101) | 0.1624*** (0.0088) | −0.0156*** (0.0034) | 0.0536*** (0.0030) |
| | −0.0366*** (0.0091) | 0.1163*** (0.0078) | −0.0235*** (0.0027) | 0.0344*** (0.0024) |
| | −0.0150 (0.0121) | 0.1555*** (0.0093) | −0.0121*** (0.0042) | 0.0413*** (0.0033) |
| | −0.0355** (0.0168) | 0.1883*** (0.0122) | −0.0458*** (0.0099) | 0.0676*** (0.0075) |
| | −0.0206*** (0.0069) | 0.0868*** (0.0060) | −0.0443*** (0.0040) | 0.0468*** (0.0037) |
| | −0.0222 (0.0178) | 0.2180*** (0.0151) | −0.0139** (0.0083) | 0.0927*** (0.0071) |
| | −0.0286*** (0.0058) | 0.0125*** (0.0058) | −0.055*** (0.0041) | 0.0399*** (0.0039) |
| | −0.0180 (0.0124) | 0.1352*** (0.0121) | −0.0182*** (0.0050) | 0.0575*** (0.0053) |
| | −0.0403*** (0.0151) | 0.1360*** (0.0150) | −0.1140*** (0.0302) | −0.0025 (0.0318) |
Note: The table reports the results of estimating Equations (8a) and (8b) in the text. Regressions were estimated using OLS with bootstrapped t‐statistics to obtain p‐values. Each regression used the Bias/MSE results for a given estimator j. The respective samples were constructed from the individual results of the 756 experiments in the Carter, Schönbrodt, Gervais, and Hilgard simulations. Bootstrap SEs are reported in parentheses. When estimating the model we use Sample size/1000. This transformation increases the size of the estimated coefficient by a factor of 1000, but leaves economic and statistical significance unchanged.
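The regression-with-bootstrap setup described in the note can be sketched as follows. The data, the number of bootstrap draws, and the design-matrix layout (intercept, Sample size/1000, I²) are all illustrative assumptions here, not the paper's exact specification:

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def bootstrap_se(X, y, n_boot=500, seed=0):
    """Nonparametric bootstrap SEs: resample rows (experiments) with
    replacement, re-estimate OLS each draw, take the SD of the draws."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = ols(X[idx], y[idx])
    return draws.std(axis=0)

# Illustrative use: regress an estimator's per-experiment Bias on an
# intercept, Sample size/1000, and I^2 (all data invented here).
rng = np.random.default_rng(1)
n_exp = 756
X = np.column_stack([np.ones(n_exp),
                     rng.uniform(0.01, 0.8, n_exp),   # sample size / 1000
                     rng.uniform(0.0, 1.0, n_exp)])   # I^2
bias = X @ np.array([0.02, -0.03, 0.12]) + rng.normal(0.0, 0.05, n_exp)
coefs, ses = ols(X, bias), bootstrap_se(X, bias, n_boot=200)
```

Resampling whole experiments preserves any dependence within an experiment's results, which is the usual rationale for bootstrapping at that level.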
The relationship between relative estimator performance, sample size, and I²: CSG&H simulation environment
| A. Sample size = 10 | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Bias: Low I² | | Bias: Moderate I² | | Bias: High I² | | MSE: Low I² | | MSE: Moderate I² | | MSE: High I² | |
| AK1 | 0.028 | 4PSM | 0.071 | | 0.097 | AK1 | 0.006 | | 0.027 | | 0.027 |
| 4PSM | 0.033 | 3PSM | 0.074 | | 0.104 | 3PSM | 0.007 | pU | 0.042 | | 0.045 |
| 3PSM | 0.035 | | 0.087 | | 0.110 | 4PSM | 0.008 | TF | 0.043 | | 0.057 |
| | 0.040 | | 0.088 | | 0.119 | TF | 0.010 | 3PSM | 0.043 | | 0.069 |
| TF | 0.042 | | 0.098 | | 0.146 | | 0.010 | | 0.046 | | 0.075 |
| | 0.047 | pU | 0.107 | | 0.153 | | 0.010 | 4PSM | 0.049 | | 0.078 |
| | 0.063 | | 0.112 | | 0.160 | | 0.018 | | 0.068 | | 0.093 |
| | 0.082 | TF | 0.127 | | 0.177 | | 0.021 | | 0.081 | pU | 0.114 |
| pU | 0.090 | pC | 0.147 | | 0.179 | pU | 0.023 | | 0.092 | pC | 0.164 |
| | 0.101 | | 0.160 | pU | 0.253 | | 0.030 | | 0.102 | | 0.209 |
| pC | 0.150 | | 0.188 | pC | 0.270 | pC | 0.278 | pC | 0.203 | | 0.220 |
Note: The panels above rank the performance of the 11 estimators on the basis of their average Bias and MSE performance, disaggregated by {sample size, effect heterogeneity} categories. Estimators are ranked from “best” (least Bias, smallest MSE) to worst. Values in the tables are the average values for the respective performance measures and {sample size, effect heterogeneity} categories. For both Bias and MSE, the top two estimators in the cell for the smallest sample size (10) and lowest effect heterogeneity (low I²) are identified by color‐coding. For Bias, these are the AK1 and 4PSM estimators. For MSE, they are AK1 and 3PSM. The relative positions of these estimators are then tracked as sample size and effect heterogeneity increase.
It is important to note that the maximization procedures that underlie some of the estimators do not always converge. Averages across estimators will not be comparable if they average across different experiments due to lack of convergence. To indicate this in the table, we indicate three types of convergence behaviour. Boldfaced estimators indicate a convergence rate of 99% or higher (eg, AK1). Conventional, non‐boldfaced type indicates that the estimator converged between 90%–99% of the time (eg, AK1). Italicized estimators indicate that convergence rates were lower than 90% (eg, AK1).
Comparison of MSE performance: Sample size = 100, high I², CSG&H simulation environment
| {Effect size, I², publication selection, QRP} | | | | | | | AK1 | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| {0, 0.822, None, No} | 0.005 | 0.207 | 0.203 | 0.002 | 0.004 | 0.008 | 0.002 | 0.012 | 0.007 | 0.019 | 0.025 |
| {0.2, 0.821, None, No} | 0.006 | 0.110 | 0.106 | 0.002 | 0.004 | 0.008 | 0.002 | 0.012 | 0.017 | 0.019 | 0.025 |
| {0.5, 0.818, None, No} | 0.007 | 0.032 | 0.029 | 0.002 | 0.004 | 0.008 | 0.003 | 0.011 | 0.014 | 0.012 | 0.027 |
| {0.8, 0.810, None, No} | 0.010 | 0.004 | 0.003 | 0.002 | 0.005 | 0.008 | 0.004 | 0.009 | 0.010 | 0.013 | 0.029 |
| {0, 0.864, Med, No} | 0.004 | 0.101 | 0.096 | 0.003 | 0.020 | 0.028 | 0.001 | 0.025 | 0.005 | 0.030 | 0.034 |
| {0.2, 0.856, Med, No} | 0.006 | 0.046 | 0.040 | 0.003 | 0.032 | 0.037 | 0.003 | 0.034 | 0.016 | 0.035 | 0.039 |
| {0.5, 0.835, Med, No} | 0.012 | 0.008 | 0.006 | 0.003 | 0.048 | 0.066 | 0.012 | 0.079 | 0.015 | 0.027 | 0.043 |
| {0.8, 0.805, Med, No} | 0.019 | 0.003 | 0.004 | 0.003 | 0.051 | 0.079 | 0.021 | 0.083 | 0.010 | 0.016 | 0.039 |
| {0, 0.879, High, No} | 0.003 | 0.070 | 0.065 | 0.004 | 0.040 | 0.050 | 0.001 | 0.043 | 0.005 | 0.042 | 0.045 |
| {0.2, 0.869, High, No} | 0.006 | 0.029 | 0.024 | 0.005 | 0.063 | 0.065 | 0.004 | 0.055 | 0.015 | 0.050 | 0.052 |
| {0.5, 0.837, High, No} | 0.010 | 0.005 | 0.003 | 0.006 | 0.096 | 0.109 | 0.018 | 0.120 | 0.013 | 0.034 | 0.052 |
| {0.8, 0.803, High, No} | 0.020 | 0.005 | 0.007 | 0.004 | 0.109 | 0.142 | 0.031 | 0.142 | 0.011 | 0.019 | 0.047 |
| {0, 0.769, None, Med} | 0.022 | 0.056 | 0.055 | 0.053 | 0.009 | 0.020 | 0.021 | 0.009 | 0.020 | 0.014 | 0.010 |
| {0, 0.763, Med, Med} | 0.047 | 0.009 | 0.009 | 0.100 | 0.006 | 0.018 | 0.014 | 0.009 | 0.020 | 0.011 | 0.008 |
| {0, 0.757, High, Med} | 0.061 | 0.003 | 0.003 | 0.125 | 0.010 | 0.012 | 0.013 | 0.005 | 0.021 | 0.013 | 0.010 |
| {0, 0.920, None, Med} | 0.031 | 0.201 | 0.197 | 0.103 | 0.006 | 0.086 | 0.041 | 0.023 | 0.040 | 0.030 | 0.027 |
| {0.2, 0.859, None, Med} | 0.034 | 0.107 | 0.103 | 0.106 | 0.009 | 0.063 | 0.027 | 0.022 | 0.033 | 0.034 | 0.022 |
| {0.5, 0.785, None, Med} | 0.012 | 0.030 | 0.028 | 0.058 | 0.007 | 0.021 | 0.011 | 0.018 | 0.016 | 0.012 | 0.017 |
| {0.8, 0.774, None, Med} | 0.003 | 0.003 | 0.003 | 0.023 | 0.004 | 0.007 | 0.003 | 0.012 | 0.007 | 0.008 | 0.026 |
| {0, 0.908, Med, Med} | 0.050 | 0.100 | 0.090 | 0.136 | 0.072 | 0.151 | 0.029 | 0.048 | 0.040 | 0.027 | 0.024 |
| {0.2, 0.829, Med, Med} | 0.036 | 0.045 | 0.038 | 0.119 | 0.060 | 0.146 | 0.009 | 0.060 | 0.030 | 0.030 | 0.021 |
| {0, 0.901, High, Med} | 0.060 | 0.069 | 0.060 | 0.153 | 0.108 | 0.136 | 0.026 | 0.042 | 0.042 | 0.030 | 0.027 |
| {0.2, 0.816, High, Med} | 0.038 | 0.029 | 0.022 | 0.124 | 0.111 | 0.145 | 0.006 | 0.067 | 0.030 | 0.032 | 0.022 |
| {0, 0.755, None, Strong} | 0.071 | 0.056 | 0.056 | 0.133 | 0.010 | 0.009 | 0.036 | ‐ | 0.033 | 0.030 | 0.011 |
| {0, 0.895, None, Strong} | 0.116 | 0.202 | 0.196 | 0.255 | 0.019 | 0.101 | 0.081 | ‐ | 0.087 | 0.074 | 0.043 |
| {0.2, 0.807, None, Strong} | 0.080 | 0.108 | 0.104 | 0.185 | 0.018 | 0.043 | 0.050 | ‐ | 0.064 | 0.053 | 0.023 |
| {0.5, 0.759, None, Strong} | 0.019 | 0.030 | 0.028 | 0.083 | 0.009 | 0.013 | 0.014 | ‐ | 0.023 | 0.015 | 0.014 |
| {0.8, 0.768, None, Strong} | 0.002 | 0.003 | 0.003 | 0.031 | 0.005 | 0.006 | 0.003 | ‐ | 0.008 | 0.008 | 0.027 |
| {0, 0.843, Med, Strong} | 0.130 | 0.101 | 0.090 | 0.271 | 0.051 | 0.053 | 0.056 | ‐ | 0.081 | 0.060 | 0.034 |
| {0, 0.823, High, Strong} | 0.135 | 0.071 | 0.060 | 0.277 | 0.041 | 0.041 | 0.051 | ‐ | 0.080 | 0.057 | 0.031 |
| Average | 0.035 | 0.062 | 0.058 | 0.079 | 0.034 | 0.056 | 0.020 | 0.041 | 0.027 | 0.028 | 0.028 |
| (Min, Max) | (0.002,0.135) | (0.003,0.207) | (0.003,0.203) | (0.002,0.277) | (0.004,0.111) | (0.006,0.151) | (0.001,0.081) | (0.005,0.142) | (0.005,0.087) | (0.008,0.074) | (0.008,0.052) |
Note: This table reports estimator MSE performance results for the 30 experiments included within the {sample size = 100, high I²} category of the CSG&H simulations. The estimators are described in Section 2 of the text. The first column gives details about the individual experiment (cf. the bottom panel in Appendix 2 in Data S1). Each cell represents results for a single experiment consisting of 3000 simulated meta‐analyses. Each simulated meta‐analysis produces a single estimate of the mean population effect. The numbers in the table are the averaged mean squared error (MSE) value for the 3000 simulated meta‐analyses for that estimator and experiment. The last two rows of each panel report the overall average MSE, followed by the smallest and largest (average) MSE values over the 30 experiments. Yellow‐highlighted cells in the upper panel of the table identify the smallest (average) MSE for each experiment. The yellow‐highlighted cell in the bottom panel of the table identifies the estimator (AK1) with the lowest overall, averaged MSE value. The blue‐highlighted cells identify estimators that are close to AK1 in terms of overall performance.
A dash (‐) indicates that all estimates failed to converge for that experiment.