Elise Payzan-Lenestour, Peter Bossaerts.
Abstract
Little is known about how humans solve the exploitation/exploration trade-off. In particular, the evidence for uncertainty-driven exploration is mixed. The current study proposes a novel hypothesis of exploration that helps reconcile prior findings that may seem contradictory at first. According to this hypothesis, uncertainty-driven exploration involves a dilemma between two motives: (i) to speed up learning about the unknown, which may beget novel reward opportunities; (ii) to avoid the unknown because it is potentially dangerous. We provide evidence for our hypothesis using both behavioral and simulated data, and briefly point to recent evidence that the brain differentiates between these two motives.
Keywords: Bayesian learning; estimation uncertainty; exploration bonuses; restless bandit problem; unexpected uncertainty
Year: 2012 PMID: 23087606 PMCID: PMC3472893 DOI: 10.3389/fnins.2012.00150
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 4.677
Figure 1. Comparative fits of the ambiguity averse and hybrid models. The comparison of the fits is based on the negative log-likelihood (-LL) criterion. Each data point corresponds to one subject (500 samples on average per subject). The hybrid model fits better when the data point is below the 45° line.
Figure 2. Economic performances of models featuring different kinds of uncertainty-driven exploration, as a function of the inverse temperature. Each point reports the economic performance averaged across 500 simulations of 500 trials each. Performance is measured by the amount of money accumulated up to the 500th trial (“final gain”). X-axis: β parameter (inverse temperature in the softmax rule). Y-axis: average final gain across 500 simulations. Star (*): performance of the ambiguity seeker model. Circle (o): performance of the ambiguity averse model. Dot (.): performance of the novelty seeker model. Cross (×): performance of the hybrid model. The hybrid model combines ambiguity aversion and novelty seeking as described in the main text. Dashed line: performance of the base model in which there is no uncertainty-driven exploration (for reference). Vertical bars represent standard errors.
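For readers unfamiliar with the softmax rule named in the Figure 2 caption, the following is a minimal Python sketch of how an inverse temperature β maps option values to choice probabilities. It is not the authors' implementation. The function names, and the additive uncertainty bonus with weight `w`, are assumptions made for illustration only: a positive `w` stands in for an uncertainty-seeking agent and a negative `w` for an uncertainty-averse one. The record above does not specify how the models are parameterized.

```python
import math
import random


def softmax_probs(values, beta):
    """Softmax choice probabilities with inverse temperature beta.

    beta = 0 gives uniform random choice; a large beta approaches
    always picking the highest-valued option.
    """
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (v - m)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def adjusted_values(q, uncertainty, w):
    """Hypothetical valuation: expected value plus a weighted uncertainty term.

    w > 0 adds a bonus for uncertain options (uncertainty seeking);
    w < 0 penalizes them (uncertainty aversion). This additive form is
    an assumption made for this sketch, not taken from the paper.
    """
    return [qi + w * ui for qi, ui in zip(q, uncertainty)]


def choose(values, beta, rng=random):
    """Sample one option index according to the softmax probabilities."""
    probs = softmax_probs(values, beta)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]
```

Calling `choose(adjusted_values(q, u, w), beta)` on each trial would give one simulated decision. Sweeping `beta` across repeated simulations is the kind of comparison the caption describes, under the assumptions stated above.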