Why optional stopping can be a problem for Bayesians
Rianne de Heide, Peter D. Grünwald
Abstract
Recently, optional stopping has been a subject of debate in the Bayesian psychology community. Rouder (Psychonomic Bulletin & Review 21(2), 301-308, 2014) argues that optional stopping is no problem for Bayesians, and even recommends the use of optional stopping in practice, as do Wagenmakers, Wetzels, Borsboom, van der Maas, and Kievit (Perspectives on Psychological Science 7, 627-633, 2012). This article addresses the question of whether optional stopping is problematic for Bayesian methods, and specifies under which circumstances, and in which sense, it is and is not. By slightly varying and extending Rouder's (Psychonomic Bulletin & Review 21(2), 301-308, 2014) experiments, we illustrate that, as soon as the parameters of interest are equipped with default or pragmatic priors (which is the case in most practical applications of Bayes factor hypothesis testing), resilience to optional stopping can break down. We distinguish between three types of default priors, each with its own specific issues under optional stopping, ranging from no problem at all (type 0 priors) to quite severe (type II priors).
Keywords: Bayesian statistics; Hypothesis testing; Model selection; Statistical inference
Year: 2021 PMID: 33210222 PMCID: PMC8219595 DOI: 10.3758/s13423-020-01803-x
Source DB: PubMed Journal: Psychon Bull Rev ISSN: 1069-9384
Fig. 1 The interpretation of the posterior odds in Rouder's experiment, from 20,000 replicate experiments. a The empirical sampling distribution of the posterior odds as a histogram under H0 and H1. b Calibration plot: the observed posterior odds as a function of the nominal posterior odds
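The calibration notion behind Fig. 1 can be illustrated with a small simulation. The sketch below is not the paper's code: it assumes, purely for illustration, two simple point hypotheses (observations from N(0, 1) under H0 and N(1, 1) under H1) with prior odds 1, so that the posterior odds equal the likelihood ratio. If the odds are calibrated, then among replicates whose nominal posterior odds land near some value k, the observed ratio of H1-generated to H0-generated replicates should also be near k.

```python
import math
import random

random.seed(1)

def replicate(n=10):
    """One illustrative replicate: pick the true hypothesis with prior
    odds 1:1, draw n observations, and return the posterior odds.
    For mu0 = 0 vs. mu1 = 1 with unit variance, each observation x
    contributes x - 0.5 to the log likelihood ratio."""
    h1 = random.random() < 0.5
    mu = 1.0 if h1 else 0.0
    log_odds = sum(random.gauss(mu, 1.0) - 0.5 for _ in range(n))
    return h1, math.exp(log_odds)  # prior odds 1, so posterior odds = BF

results = [replicate() for _ in range(20000)]

# Calibration check: among replicates whose nominal posterior odds fall
# in [2, 4], the observed ratio of H1-true to H0-true replicates should
# itself lie near that range.
in_bin = [(h1, odds) for h1, odds in results if 2.0 <= odds <= 4.0]
n1 = sum(h1 for h1, _ in in_bin)
n0 = len(in_bin) - n1
print(f"nominal odds in [2,4]: observed odds = {n1/n0:.2f} "
      f"over {len(in_bin)} replicates")
```

This is the "prior calibration" sense in which posterior odds with genuinely subjective priors cannot mislead: the hypothesis indicator is actually drawn from the prior in every replicate.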
Fig. 2 Calibration of the experiment of Section "Example 1: Rouder's example with a nuisance parameter", from 20,000 replicate experiments. a The observed posterior odds as a function of the nominal posterior odds. b The observed posterior odds as a function of the nominal posterior odds with optional stopping
Fig. 3 Calibration in the t test experiment, Section "Example 2: Bayesian t test", from 20,000 replicate experiments. a The distribution of the posterior odds as a histogram under H0 and H1 in one figure. b The observed posterior odds as a function of the nominal posterior odds. c The distribution of the posterior odds with optional stopping. d The observed posterior odds as a function of the nominal posterior odds with optional stopping
Fig. 4 Calibration in the t test experiment with fixed values for the means under H0 and H1 (Section "Example 2: Bayesian t test", from 40,000 replicate experiments). a The observed posterior odds as a function of the nominal posterior odds. b The observed posterior odds as a function of the nominal posterior odds with optional stopping
Fig. 5 Default priors that depend on aspects of the experimental setup. a g-priors for the regression example of Section "Example 3: Bayesian linear regression" (type II priors) with different sample sizes: n = 20 (black), n = 23 (red), and n = 34 (blue). b Jeffreys' prior for the Bernoulli model for the specific case that n is fixed in advance (no optional stopping): a Beta(1/2, 1/2) distribution
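Figures 2-4 contrast calibration with and without optional stopping. For simple point hypotheses (type 0 priors in the paper's taxonomy), calibration survives data-dependent stopping, because the likelihood ratio is a martingale under H0. The hypothetical sketch below illustrates this with a stop-when-convinced rule; all numbers (threshold 3, maximum sample size 50, the same Gaussian point-hypothesis setup as above) are illustrative assumptions, not the paper's experiment.

```python
import math
import random

random.seed(2)

def replicate_os(threshold=3.0, n_max=50):
    """One replicate with optional stopping: keep sampling until the
    posterior odds in favour of mu = 1 reach `threshold` (or until
    n_max observations), then report the final odds."""
    h1 = random.random() < 0.5          # prior odds 1:1
    mu = 1.0 if h1 else 0.0
    log_odds = 0.0
    for _ in range(n_max):
        log_odds += random.gauss(mu, 1.0) - 0.5  # log-LR increment
        if log_odds >= math.log(threshold):
            break                        # stop as soon as we look convincing
    return h1, math.exp(log_odds)

results = [replicate_os() for _ in range(20000)]

# Despite the aggressive stopping rule, calibration holds for these
# simple hypotheses: among replicates ending with odds near k, the
# observed H1:H0 ratio is still near k.
in_bin = [(h1, odds) for h1, odds in results if 3.0 <= odds <= 6.0]
n1 = sum(h1 for h1, _ in in_bin)
n0 = len(in_bin) - n1
print(f"final odds in [3,6]: observed odds = {n1/n0:.2f} "
      f"over {len(in_bin)} replicates")
```

The paper's point is that this robustness is a property of the simple, fully subjective setting; once default or pragmatic priors over free parameters enter (types I and II), the analogous guarantees can fail.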
Overview of several common default Bayes factors (from the R package BayesFactor; Morey and Rouder, 2015), and their robustness against different kinds of optional stopping (proofs can be found in Hendriksen et al., 2018)
| | Prior Cal. | Strong Calibration | Freq. OS |
|---|---|---|---|
| Default Bayes factors | | | |
| T test (Rouder et al.) | | | |
| ANOVA (Rouder et al.) | | | |
| Regression (Rouder and Morey) | | | |
| Contingency tables (Jamil et al.) | | | |
| Bayes factors with proper, fully subjective priors (Rouder, 2014) | N/A | N/A | |
'Prior Cal.' means 'prior calibration' and 'Freq. OS' means 'frequentist optional stopping'. Between parentheses is the type of prior used (0, I, or II), in the taxonomy introduced in this paper. For the default Bayes factors, prior calibration formally works for the priors; yet, because we are in the default setting, the Bayes factor is not fully subjective, so prior calibration is not too meaningful, which is precisely the main point of this paper