A Normative Account of Confirmation Bias During Reinforcement Learning.

Germain Lefebvre, Christopher Summerfield, Rafal Bogacz.

Abstract

Reinforcement learning involves updating estimates of the value of states and actions on the basis of experience. Previous work has shown that in humans, reinforcement learning exhibits a confirmatory bias: when the value of a chosen option is being updated, estimates are revised more radically following positive than negative reward prediction errors, but the converse is observed when updating the unchosen option value estimate. Here, we simulate performance on a multi-arm bandit task to examine the consequences of a confirmatory bias for reward harvesting. We report a paradoxical finding: that confirmatory biases allow the agent to maximize reward relative to an unbiased updating rule. This principle holds over a wide range of experimental settings and is most influential when decisions are corrupted by noise. We show that this occurs because on average, confirmatory biases lead to overestimating the value of more valuable bandits and underestimating the value of less valuable bandits, rendering decisions overall more robust in the face of noise. Our results show how apparently suboptimal learning rules can in fact be reward maximizing if decisions are made with finite computational precision.
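The confirmatory update rule described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code. It assumes full feedback (the outcome of every arm is shown on every trial), binary rewards, and a softmax choice rule whose inverse temperature stands in for decision noise. The names softmax_choice, confirmatory_update, run_bandit, alpha_conf, alpha_disc and beta are hypothetical and chosen only for this example.

```python
import math
import random


def softmax_choice(q, beta, rng):
    """Sample an arm with probability proportional to exp(beta * q[arm]).

    A smaller beta means noisier decisions (assumed noise model).
    """
    q_max = max(q)  # subtract the max for numerical stability
    weights = [math.exp(beta * (v - q_max)) for v in q]
    return rng.choices(range(len(q)), weights=weights)[0]


def confirmatory_update(q, chosen, rewards, alpha_conf, alpha_disc):
    """Update all value estimates in place under full feedback.

    Chosen arm: positive prediction errors use alpha_conf,
    negative ones use alpha_disc. Unchosen arms: the reverse.
    With alpha_conf > alpha_disc the rule is confirmatory;
    with alpha_conf == alpha_disc it is unbiased.
    """
    for arm, reward in enumerate(rewards):
        delta = reward - q[arm]  # reward prediction error
        confirms_choice = (delta > 0) == (arm == chosen)
        alpha = alpha_conf if confirms_choice else alpha_disc
        q[arm] += alpha * delta
    return q


def run_bandit(p_reward, alpha_conf, alpha_disc, beta, n_trials, seed=0):
    """Simulate one agent; return (mean reward per trial, final estimates)."""
    rng = random.Random(seed)
    q = [0.5] * len(p_reward)  # assumed neutral initial estimates
    total = 0.0
    for _ in range(n_trials):
        chosen = softmax_choice(q, beta, rng)
        rewards = [1.0 if rng.random() < p else 0.0 for p in p_reward]
        total += rewards[chosen]
        confirmatory_update(q, chosen, rewards, alpha_conf, alpha_disc)
    return total / n_trials, q


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    biased = run_bandit([0.65, 0.35], 0.3, 0.1, beta=3.0, n_trials=1000)
    unbiased = run_bandit([0.65, 0.35], 0.2, 0.2, beta=3.0, n_trials=1000)
    print("confirmatory:", biased)
    print("unbiased:    ", unbiased)
```

Setting alpha_conf equal to alpha_disc recovers an unbiased learner, so the two rules can be compared by averaging the mean reward over many seeds and over several values of beta. A single run is too noisy to support any conclusion about which rule harvests more reward.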

Year: 2022; PMID: 34758486; PMCID: PMC7612695; DOI: 10.1162/neco_a_01455

Source DB: PubMed; Journal: Neural Comput; ISSN: 0899-7667; Impact factor: 2.026
