The Bayesian Sampler: Generic Bayesian Inference Causes Incoherence in Human Probability Judgments
Jian-Qiao Zhu, Adam N. Sanborn, Nick Chater
Abstract
Human probability judgments are systematically biased, in apparent tension with Bayesian models of cognition. But perhaps the brain does not represent probabilities explicitly, instead approximating probabilistic calculations through a process of sampling, as used in computational probabilistic models in statistics. Naïve probability estimates can be obtained by calculating the relative frequency of an event within a sample, but these estimates tend to be extreme when the sample size is small. We propose instead that people use a generic prior to improve the accuracy of their probability estimates based on samples, and we call this model the Bayesian sampler. The Bayesian sampler trades off the coherence of probabilistic judgments for improved accuracy, and provides a single framework for explaining phenomena associated with diverse biases and heuristics such as conservatism and the conjunction fallacy. The approach turns out to provide a rational reinterpretation of "noise" in an important recent model of probability judgment, the probability theory plus noise model (Costello & Watts, 2014, 2016a, 2017, 2019; Costello, Watts, & Fisher, 2018), making equivalent average predictions for simple events, conjunctions, and disjunctions. The Bayesian sampler does, however, make distinct predictions for conditional probabilities and for distributions of probability estimates. We show in two new experiments that the Bayesian sampler better captures mean judgments both qualitatively and quantitatively; which model best fits individual distributions of responses depends on the assumed size of the cognitive sample. (PsycInfo Database Record (c) 2020 APA, all rights reserved)
Year: 2020 PMID: 32191073 PMCID: PMC7571263 DOI: 10.1037/rev0000190
Source DB: PubMed Journal: Psychol Rev ISSN: 0033-295X Impact factor: 8.934
Figure 1. Illustrations of Bayes' prior, Jeffreys' prior, Haldane's prior, and the symmetric Beta prior that best fits empirical data on the real-world occurrence of probabilities. Empirical data are plotted as a histogram.
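The regularizing effect of these symmetric Beta priors can be sketched with a few lines of code. This is an illustrative sketch, not code from the paper: it uses the posterior-mean estimate (S + β)/(N + 2β), which is the Bayesian sampler's adjustment of a raw relative frequency S/N under a Beta(β, β) prior; the parameter values are arbitrary.

```python
import random

def bayesian_sampler_estimate(successes, n, beta):
    """Posterior-mean estimate of P(A) under a symmetric Beta(beta, beta) prior:
    (S + beta) / (N + 2*beta), where S of the N mental samples showed the event."""
    return (successes + beta) / (n + 2 * beta)

# With a small sample, the raw relative frequency can be extreme (0 or 1),
# while any beta > 0 pulls the estimate away from the boundaries.
random.seed(0)
p_true, n = 0.2, 5
successes = sum(random.random() < p_true for _ in range(n))

print(f"relative frequency: {successes / n:.3f}")
for name, beta in [("Haldane Beta(0,0)", 0.0),
                   ("Jeffreys Beta(.5,.5)", 0.5),
                   ("Bayes Beta(1,1)", 1.0)]:
    print(f"{name}: {bayesian_sampler_estimate(successes, n, beta):.3f}")
```

Note that Haldane's prior (β = 0) reproduces the naïve relative frequency exactly, which is why only strictly positive β values moderate small-sample extremity.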
Probabilistic Identities and Their Predicted Values From Probability Theory

| Identity name | Identity calculation | Predicted value |
|---|---|---|
| Z1–Z18 | [identity calculations not recovered] | 0 |

Each of the 18 identities combines estimates of simple, conjunctive, disjunctive, and conditional probabilities into an expression that probability theory predicts to equal 0 exactly.
Figure 2. An illustration of model behavior for PT+N (left) and the Bayesian sampler (right), showing the underlying subjective probability of a simple event A (x-axis) and the expected probability estimates (y-axis) predicted by each model. The models' average predictions coincide here when the Bayesian sampler uses a generic prior of Beta(1, 1).
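The average-prediction equivalence illustrated here can be checked numerically. PT+N predicts a mean estimate of (1 − 2d)p + d, and the Bayesian sampler predicts (Np + β)/(N + 2β); setting d = β/(N + 2β) makes the two identical for every p. A quick check (a sketch; the values of N and β are arbitrary):

```python
def pt_n_mean(p, d):
    """PT+N expected estimate: each sample is read with noise, flipping with prob. d."""
    return (1 - 2 * d) * p + d

def bs_mean(p, n, beta):
    """Bayesian sampler expected estimate under a symmetric Beta(beta, beta) prior."""
    return (n * p + beta) / (n + 2 * beta)

n, beta = 10, 1.0
d = beta / (n + 2 * beta)  # the noise level implied by the prior
for p in [0.0, 0.1, 0.25, 0.5, 0.9, 1.0]:
    assert abs(pt_n_mean(p, d) - bs_mean(p, n, beta)) < 1e-12
print(f"PT+N with d = {d:.4f} matches the Bayesian sampler's mean for all p")
```

The algebra behind the loop: (1 − 2β/(N + 2β))p + β/(N + 2β) = Np/(N + 2β) + β/(N + 2β), so the match is exact, not approximate.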
Model Agreement (on Average) With Probability Theory for Probabilistic Identities

[Table: for each group of identities, whether the relative frequency model, PT+N (with one or with two noise parameters), and the Bayesian sampler (with one or with two prior parameters) agree on average with probability theory's prediction of 0. The relative frequency model agrees for most identity groups; the sampling models agree only under conditions on their parameters; for two identity groups, no model agrees with probability theory. The cell contents did not survive extraction.]
Figure 3. Human probability estimates and model predictions. (A) Mean probability estimates and 95% confidence intervals across participants. The overlaid dots are best-fitting model predictions generated by the most general form of each model (red dot: the Bayesian sampler; green square: the relative frequency model; blue triangle: the probability theory plus noise model). (B) The mean values of the probabilistic identities Z1 to Z18, with 95% confidence intervals across participants. The overlaid dots are best-fitting model predictions for models fit to the mean estimates in (A).
Summary of t-tests and Bayes Factors for Key Probabilistic Identities: Z10 to Z13 of Experiment 1
| Null hypothesis | t | p | Bayes factor | t | p | Bayes factor |
|---|---|---|---|---|---|---|
| Z10 = 0 | 2.85 | | | −4.79 | | |
| Z11 = 0 | 4.67 | | | −.533 | .596 | .163 |
| Z12 = 0 | 4.02 | | | −3.91 | | |
| Z13 = 0 | 5.50 | | | −3.24 | | |

(The left and right column groups correspond to the two query sets of Experiment 1; cells lost in extraction are left blank.)
Figure 4. Posterior probabilities of models for individual participants in Experiment 1. Each stacked bar represents the split across models of the approximate posterior probabilities for one participant. 67.80%, 1.69%, and 30.51% of participants are best described by the Bayesian sampler (combined over its two variants), the relative frequency model, and PT+N (combined over its two variants), respectively.
Figure 5. (A) Minimized Wasserstein distances between model-predicted distributions and individual judgments of Experiment 1, as a function of the maximum number of samples allowed for each individual, for the Bayesian sampler (red) and PT+N (blue). Error bars are 95% confidence intervals across participants. The smaller the Wasserstein distance, the better the model explains the distributions of raw judgments. (B) Best-fitting model parameters for the Bayesian sampler, with median values across participants displayed in red. (C) Best-fitting model parameters for the probability theory plus noise model, with median values across participants displayed in red.
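For equal-size samples in one dimension, the Wasserstein-1 distance used in this analysis reduces to the mean absolute difference between the sorted samples, since the optimal coupling pairs order statistics. A minimal sketch (the `observed` judgments and the query parameters below are invented for illustration, not data from the experiments):

```python
import random

def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size empirical distributions: sort both
    samples and average the absolute differences of the matched pairs."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def simulate_bs_judgments(p, n, beta, trials, rng):
    """Draw Bayesian-sampler estimates for one query: S ~ Binomial(n, p),
    estimate = (S + beta) / (n + 2*beta)."""
    return [(sum(rng.random() < p for _ in range(n)) + beta) / (n + 2 * beta)
            for _ in range(trials)]

rng = random.Random(1)
observed = [0.1, 0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 0.3, 0.4, 0.5]  # toy judgments
predicted = simulate_bs_judgments(0.25, 6, 1.0, len(observed), rng)
print(f"W1(observed, predicted) = {wasserstein_1d(observed, predicted):.3f}")
```

Minimizing this distance over the model parameters (subjective probability, sample size, prior) yields the fits plotted in the figure.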
Figure 6. Human probability estimates and model predictions. (A) Mean probability estimates and 95% confidence intervals across participants. The overlaid dots are best-fitting model predictions generated by the most general form of each model (red dot: the Bayesian sampler; green square: the relative frequency model; blue triangle: the probability theory plus noise model). (B) The mean values of the probabilistic identities Z1 to Z18, with 95% confidence intervals across participants. The overlaid dots are best-fitting model predictions for models fit to the mean estimates in (A).
Summary of t-tests and Bayes Factors for Key Probabilistic Identities: Z10 to Z13 of Experiment 2
| Null hypothesis | t | p | Bayes factor | t | p | Bayes factor |
|---|---|---|---|---|---|---|
| Z10 = 0 | .240 | .811 | .124 | −.298 | .767 | .126 |
| Z11 = 0 | −.946 | .347 | .185 | −1.42 | .161 | .314 |
| Z12 = 0 | .073 | .942 | .121 | −2.18 | | .880 |
| Z13 = 0 | −1.06 | .291 | .208 | −.069 | .945 | .121 |

(The left and right column groups correspond to the two query sets of Experiment 2; cells lost in extraction are left blank.)
Figure 7. Posterior probabilities of models for individual participants in Experiment 2. Each stacked bar represents the split across models of the approximate posterior probabilities for one participant. 61.90%, 0%, and 38.10% of participants are best described by the Bayesian sampler (combined over its two variants), the relative frequency model, and PT+N (combined over its two variants), respectively.
Figure 8. (A) Minimized Wasserstein distances between model-predicted distributions and individual judgments of Experiment 2, as a function of the maximum number of samples allowed for each individual, for the Bayesian sampler (red) and PT+N (blue). Error bars are 95% confidence intervals across participants. The smaller the Wasserstein distance, the better the model explains the distributions of raw judgments. (B) Best-fitting model parameters for the Bayesian sampler, with median values across participants displayed in red lines. (C) Best-fitting model parameters for the probability theory plus noise model, with median values across participants displayed in red lines.
| Throw | Hit coconut | Knocked coconut off stick |
|---|---|---|
| 1 | Yes | Yes |
Predicted Average Values of Probabilistic Identities for the Bayesian Sampler (BS) and Probability Theory Plus Noise (PT+N)

| Mean identity | Model | Prediction |
|---|---|---|
| Z1–Z18 | BS and PT+N, each in two parameterizations | [expressions not recovered] |

[Table: for each mean identity, the average value predicted by the Bayesian sampler and by PT+N, each in two parameterizations (a single prior/noise parameter, or separate parameters for conjunctions and disjunctions). Several identities are predicted to equal 0 or 2 exactly; the remaining prediction expressions did not survive extraction.]