Marcellin Martinie, Tom Wilkening, Piers D. L. Howe.
Abstract
A common approach to improving probabilistic forecasts is to identify and leverage the forecasts of experts in the crowd, based on forecasters' performance on prior questions with known outcomes. However, such information is unavailable for many forecasting problems, which makes expertise difficult to identify and leverage. In the current paper, we propose a novel algorithm for aggregating probabilistic forecasts using forecasters' meta-predictions about what other forecasters will predict. We test the performance of an extremised version of our algorithm against current forecasting approaches in the literature and show that our algorithm significantly outperforms all other approaches on a large collection of 500 binary decision problems spanning five levels of difficulty. The success of our algorithm demonstrates the potential of using meta-predictions to leverage latent expertise in environments where forecasters' expertise cannot otherwise be easily identified.
Year: 2020 PMID: 32330175 PMCID: PMC7182234 DOI: 10.1371/journal.pone.0232058
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Table 1. Details of each aggregation approach used.
The name, formula, and description of each probabilistic aggregation approach used in this paper. The notation for each aggregation approach is explained in the main text above, except for the revealed aggregator, for which, due to its complexity, we refer readers to the original paper by Satopää et al. [8].
| Aggregation approach | Formula | Description |
|---|---|---|
| Simple average | – | Simple unweighted average of all individual forecasts in the crowd. |
| Revealed aggregator | – | Revealed aggregator for the Gaussian model under compound symmetry; see Satopää et al. [8]. |
| Minimal pivoting | – | Simple average corrected by the minimal pivoting procedure […]. |
| Meta-probability Weighting (MPW) | – | Weighted average of forecasters' probability forecasts, where weights are calculated from the normalized absolute difference between their probability forecasts and their meta-predictions about the average probability forecasted by others. |

(Formula cells were not preserved in this record.)
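The MPW description above can be turned into a short sketch. This is an illustrative reading of the table's description, not the authors' reference implementation; the function name and the zero-weight fallback are assumptions.

```python
def mpw_aggregate(forecasts, meta_predictions):
    """Meta-probability Weighting (MPW) sketch.

    forecasts[i]: forecaster i's probability that the event occurs.
    meta_predictions[i]: forecaster i's prediction of the average
    probability forecasted by the other forecasters.

    Weights follow the table's description: the normalized absolute
    difference between each forecast and its meta-prediction.
    """
    diffs = [abs(p - m) for p, m in zip(forecasts, meta_predictions)]
    total = sum(diffs)
    if total == 0:
        # No forecaster deviates from their predicted consensus:
        # fall back to the simple unweighted average.
        return sum(forecasts) / len(forecasts)
    weights = [d / total for d in diffs]
    return sum(w * p for w, p in zip(weights, forecasts))
```

A forecaster whose own forecast departs sharply from what they expect others to say receives more weight, which is how the algorithm surfaces latent expertise without outcome histories.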
Fig 1. Overall performance of the standard vs. extremised versions of each aggregation approach.
The mean transformed Brier score over a total of 500 US grade school problems spanning five levels of difficulty. Error bars indicate the standard error. The standard version of each approach generates probabilistic forecasts according to the formulae shown in Table 1. The extremised version of each approach transforms these predictions using a simple extremisation function […]. The extremised MPW algorithm significantly outperforms both the standard and extremised versions of every other aggregation approach.
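The excerpt does not reproduce the extremisation function itself. A common one-parameter form in the forecasting literature, consistent with the tuning parameter a = 2.5 shown in Fig 3, is the following sketch; the exact function used in the paper may differ.

```python
def extremise(p, a=2.5):
    """Push an aggregate probability p away from 0.5 and toward 0 or 1.

    a > 1 extremises the forecast; a = 1 returns p unchanged.
    The default a = 2.5 is the fixed value referenced in Fig 3.
    """
    return p**a / (p**a + (1 - p)**a)
```

Extremisation counteracts the tendency of averaged crowd forecasts to be under-confident: the average of many noisy but directionally correct forecasts sits closer to 0.5 than the evidence warrants.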
Fig 2. Performance of each aggregation approach at each level of difficulty.
The mean transformed Brier score for each level of difficulty of the US grade school problems. Error bars indicate the standard error. The extremised MPW algorithm (blue bar) outperforms the best-performing version of every other aggregation approach on problems of difficulties 2 to 5. The 95% CIs for the mean difference in score between the extremised MPW algorithm and each other aggregation approach are shown in Table 2.
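For reference, the untransformed Brier score for a binary event is the squared error between the probability forecast and the realised 0/1 outcome. The excerpt does not specify the transformation the paper applies, so this sketch shows only the standard score.

```python
def brier_score(forecast, outcome):
    # Squared error between a probability forecast and a 0/1 outcome;
    # lower is better (0 is a perfect forecast, 1 the worst possible).
    return (forecast - outcome) ** 2

def mean_brier(forecasts, outcomes):
    # Average Brier score over a set of resolved binary questions.
    pairs = list(zip(forecasts, outcomes))
    return sum(brier_score(f, o) for f, o in pairs) / len(pairs)
```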
Table 2. 95% confidence intervals for the mean difference in the transformed Brier score between the extremised MPW algorithm and the standard and extremised versions of each other aggregation approach.
Asterisks indicate where the difference in score was statistically significant at the α = .05 level according to the paired mean difference in transformed Brier score using the BCa bootstrap [19].
| Aggregation approach | Version | Difficulty 1 | Difficulty 2 | Difficulty 3 | Difficulty 4 | Difficulty 5 |
|---|---|---|---|---|---|---|
| Mean individual | Standard | [11.22, 15.58] | [14.44, 19.65] | [12.79, 18.19] | [7.93, 13.89] | [12.04, 16.79] |
| | Extremised | [12.97, 18.48] | [17.32, 23.78] | [16.90, 23.08] | [13.16, 19.23] | [16.49, 22.04] |
| Simple average | Standard | [3.62, 6.95] | [6.25, 10.44] | [5.63, 10.23] | [1.06, 6.78] | [5.24, 9.59] |
| | Extremised | [-0.85, 3.03] | [3.54, 9.75] | [5.86, 11.57] | [3.23, 9.38] | [5.79, 12.72] |
| Revealed aggregator | Standard | [0.47, 6.11] | [3.02, 8.98] | [3.66, 8.49] | [0.53, 6.75] | [5.42, 12.03] |
| | Extremised | [-1.01, 6.63] | [2.32, 11.16] | [5.07, 13.25] | [3.73, 12.41] | [7.69, 18.29] |
| Minimal pivoting | Standard | [1.44, 3.94] | [3.61, 6.80] | [3.20, 6.89] | [-0.17, 4.53] | [3.68, 7.22] |
| | Extremised | [-2.23, 0.47] | [0.80, 5.11] | [2.88, 6.88] | [1.87, 6.50] | [3.75, 9.47] |
* indicates where p < .05
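The BCa bootstrap comparison described above can be reproduced with SciPy's `bootstrap` routine using `method="BCa"`. This is a sketch of a paired comparison on hypothetical per-question scores, not the authors' analysis code.

```python
import numpy as np
from scipy.stats import bootstrap

def paired_diff_ci(scores_a, scores_b, confidence=0.95):
    """95% BCa bootstrap CI for the paired mean difference in scores.

    scores_a, scores_b: per-question scores (e.g. transformed Brier
    scores) for two aggregation approaches on the same questions.
    """
    # Bootstrap the per-question differences, keeping the pairing.
    diffs = np.asarray(scores_a) - np.asarray(scores_b)
    res = bootstrap((diffs,), np.mean,
                    confidence_level=confidence, method="BCa")
    return res.confidence_interval.low, res.confidence_interval.high
```

A difference is significant at α = .05 exactly when the 95% interval excludes zero, which is what the asterisks in the table mark.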
Fig 3. Performance of each aggregation approach using cross-validated recalibration parameters.
This figure shows the mean performance of each approach using the fixed parameter value a = 2.5 (orange bars) vs. optimal recalibration parameters estimated via cross-validation (blue bars). Error bars show the standard error.
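The cross-validation in Fig 3 can be sketched as follows, assuming a one-parameter power-form extremisation and Brier-score loss (both assumptions; the paper's recalibration family may differ). For each fold, the parameter is chosen on the remaining questions and then scored on the held-out fold.

```python
def extremise(p, a):
    # Assumed one-parameter recalibration: a > 1 pushes p toward 0 or 1.
    return p**a / (p**a + (1 - p)**a)

def brier(p, o):
    return (p - o) ** 2

def cv_recalibrated_score(aggregates, outcomes, candidates, k=5):
    """Mean held-out Brier score with a cross-validated parameter.

    aggregates[i]: the crowd aggregate forecast for question i.
    outcomes[i]: the realised 0/1 outcome.
    candidates: grid of parameter values to search over.
    """
    n = len(aggregates)
    folds = [list(range(i, n, k)) for i in range(k)]
    scores = []
    for fold in folds:
        train = [i for i in range(n) if i not in set(fold)]
        # Pick the parameter minimising Brier score on training questions.
        best_a = min(candidates, key=lambda a: sum(
            brier(extremise(aggregates[i], a), outcomes[i]) for i in train))
        # Score the held-out questions with that parameter.
        scores.extend(brier(extremise(aggregates[i], best_a), outcomes[i])
                      for i in fold)
    return sum(scores) / n
```

Scoring each question with a parameter fitted on the other folds avoids the optimism of tuning and evaluating on the same data, which is the point of the blue bars in Fig 3.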