
A Normative Account of Confirmation Bias During Reinforcement Learning.

Germain Lefebvre, Christopher Summerfield, Rafal Bogacz.

Abstract

Reinforcement learning involves updating estimates of the value of states and actions on the basis of experience. Previous work has shown that in humans, reinforcement learning exhibits a confirmatory bias: when the value of a chosen option is being updated, estimates are revised more radically following positive than negative reward prediction errors, but the converse is observed when updating the unchosen option value estimate. Here, we simulate performance on a multi-arm bandit task to examine the consequences of a confirmatory bias for reward harvesting. We report a paradoxical finding: that confirmatory biases allow the agent to maximize reward relative to an unbiased updating rule. This principle holds over a wide range of experimental settings and is most influential when decisions are corrupted by noise. We show that this occurs because on average, confirmatory biases lead to overestimating the value of more valuable bandits and underestimating the value of less valuable bandits, rendering decisions overall more robust in the face of noise. Our results show how apparently suboptimal learning rules can in fact be reward maximizing if decisions are made with finite computational precision.
© 2021 Massachusetts Institute of Technology. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
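The updating scheme described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' simulation code or parameters. It assumes a Bernoulli bandit with complete feedback (the outcome of every arm, chosen or not, is revealed on each trial), a softmax choice rule whose inverse temperature `beta` stands in for decision noise, initial value estimates of 0.5, and arbitrary reward probabilities and learning rates. The names `confirmatory_update`, `simulate`, `alpha_conf` and `alpha_disc` are hypothetical. Confirmatory prediction errors (positive for the chosen arm, negative for an unchosen arm) are scaled by `alpha_conf` and disconfirmatory ones by `alpha_disc`; setting the two rates equal gives the unbiased delta rule.

```python
import numpy as np


def confirmatory_update(q, choice, rewards, alpha_conf, alpha_disc):
    """Return new value estimates after one trial with complete feedback.

    Prediction errors that confirm the choice (positive for the chosen arm,
    negative for any unchosen arm) are scaled by alpha_conf; the others by
    alpha_disc. alpha_conf == alpha_disc gives the unbiased delta rule.
    """
    q = np.array(q, dtype=float)  # work on a copy of the estimates
    for arm in range(len(q)):
        delta = rewards[arm] - q[arm]  # reward prediction error
        confirms = delta > 0 if arm == choice else delta < 0
        q[arm] += (alpha_conf if confirms else alpha_disc) * delta
    return q


def simulate(alpha_conf, alpha_disc, beta, p_reward=(0.4, 0.6),
             n_trials=1000, seed=0):
    """Return (mean reward per trial, final estimates) for a softmax agent.

    Assumptions (not from the paper): Bernoulli rewards with the given
    probabilities, complete feedback, initial estimates of 0.5.
    """
    rng = np.random.default_rng(seed)
    p_reward = np.asarray(p_reward, dtype=float)
    n_arms = len(p_reward)
    q = np.full(n_arms, 0.5)
    total = 0.0
    for _ in range(n_trials):
        # Softmax choice; a smaller beta means noisier decisions.
        logits = beta * q
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        choice = rng.choice(n_arms, p=probs)
        # Outcomes are drawn for every arm; only the chosen one is earned.
        rewards = (rng.random(n_arms) < p_reward).astype(float)
        total += rewards[choice]
        q = confirmatory_update(q, choice, rewards, alpha_conf, alpha_disc)
    return total / n_trials, q


if __name__ == "__main__":
    # Arbitrary example settings: equal rates vs. a confirmatory asymmetry.
    for label, (a_conf, a_disc) in {"unbiased": (0.2, 0.2),
                                    "confirmatory": (0.3, 0.1)}.items():
        means = [simulate(a_conf, a_disc, beta=3.0, seed=s)[0]
                 for s in range(30)]
        print(f"{label}: mean reward per trial = {np.mean(means):.3f}")
```

Lowering `beta` makes choices noisier, which is the regime where the abstract reports the benefit of a confirmatory bias to be largest. Whether these particular, arbitrarily chosen settings reproduce that advantage is something to check by running the comparison, not something this sketch establishes.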

Year:  2022        PMID: 34758486      PMCID: PMC7612695          DOI: 10.1162/neco_a_01455

Source DB:  PubMed          Journal:  Neural Comput        ISSN: 0899-7667            Impact factor:   2.026


  Cited references (26 in total; first 10 shown):

1.  The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks.

Authors:  Rafal Bogacz; Eric Brown; Jeff Moehlis; Philip Holmes; Jonathan D Cohen
Journal:  Psychol Rev       Date:  2006-10       Impact factor: 8.934

2.  Cortical substrates for exploratory decisions in humans.

Authors:  Nathaniel D Daw; John P O'Doherty; Peter Dayan; Ben Seymour; Raymond J Dolan
Journal:  Nature       Date:  2006-06-15       Impact factor: 49.962

3.  The drift diffusion model as the choice rule in reinforcement learning.

Authors:  Mads Lund Pedersen; Michael J Frank; Guido Biele
Journal:  Psychon Bull Rev       Date:  2017-08

4.  Optimal data selection: revision, review, and reevaluation.

Authors:  Mike Oaksford; Nick Chater
Journal:  Psychon Bull Rev       Date:  2003-06

5.  Confirmation Bias through Selective Overweighting of Choice-Consistent Evidence.

Authors:  Bharath Chandra Talluri; Anne E Urai; Konstantinos Tsetsos; Marius Usher; Tobias H Donner
Journal:  Curr Biol       Date:  2018-09-13       Impact factor: 10.834

6.  Representation of confidence associated with a decision by neurons in the parietal cortex.

Authors:  Roozbeh Kiani; Michael N Shadlen
Journal:  Science       Date:  2009-05-08       Impact factor: 47.728

7.  Robust averaging protects decisions from noise in neural computations.

Authors:  Vickie Li; Santiago Herce Castañón; Joshua A Solomon; Hildward Vandormael; Christopher Summerfield
Journal:  PLoS Comput Biol       Date:  2017-08-25       Impact factor: 4.475

8.  Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing.

Authors:  Stefano Palminteri; Germain Lefebvre; Emma J Kilford; Sarah-Jayne Blakemore
Journal:  PLoS Comput Biol       Date:  2017-08-11       Impact factor: 4.475

9.  Learning the payoffs and costs of actions.

Authors:  Moritz Möller; Rafal Bogacz
Journal:  PLoS Comput Biol       Date:  2019-02-28       Impact factor: 4.475

10.  Selective Effects of the Loss of NMDA or mGluR5 Receptors in the Reward System on Adaptive Decision-Making.

Authors:  Przemysław Eligiusz Cieślak; Woo-Young Ahn; Rafał Bogacz; Jan Rodriguez Parkitna
Journal:  eNeuro       Date:  2018-10-05
  Cited by (3 in total):

1.  Implicit Counterfactual Effect in Partial Feedback Reinforcement Learning: Behavioral and Modeling Approach.

Authors:  Zahra Barakchian; Abdol-Hossein Vahabie; Majid Nili Ahmadabadi
Journal:  Front Neurosci       Date:  2022-05-10       Impact factor: 5.152

2.  Humans actively sample evidence to support prior beliefs.

Authors:  Paula Kaanders; Pradyumna Sepulveda; Tomas Folke; Pietro Ortoleva; Benedetto De Martino
Journal:  Elife       Date:  2022-04-11       Impact factor: 8.713

3.  Model Sharing in the Human Medial Temporal Lobe.

Authors:  Leonie Glitz; Keno Juechems; Christopher Summerfield; Neil Garrett
Journal:  J Neurosci       Date:  2022-05-23       Impact factor: 6.709
