Selective maintenance of value information helps resolve the exploration/exploitation dilemma.

Michael N Hallquist, Alexandre Y Dombrovski.

Abstract

In natural environments with many options of uncertain value, one faces a difficult tradeoff between exploiting familiar, valuable options or searching for better alternatives. Reinforcement learning models of this exploration/exploitation dilemma typically modulate the rate of exploratory choices or preferentially sample uncertain options. The extent to which such models capture human behavior remains unclear, in part because they do not consider the constraints on remembering what is learned. Using reinforcement-based timing as a motivating example, we show that selectively maintaining high-value actions compresses the amount of information to be tracked in learning, as quantified by Shannon's entropy. In turn, the information content of the value representation controls the balance between exploration (high entropy) and exploitation (low entropy). Selectively maintaining preferred action values while allowing others to decay renders the choices increasingly exploitative across learning episodes. To adjudicate among alternative maintenance and sampling strategies, we developed a new reinforcement learning model, StrategiC ExPloration/ExPloitation of Temporal Instrumental Contingencies (SCEPTIC). In computational studies, a resource-rational selective maintenance approach was as successful as more resource-intensive strategies. Furthermore, human behavior was consistent with selective maintenance; information compression was most pronounced in subjects with superior performance and non-verbal intelligence, and in learnable vs. unlearnable contingencies. Cognitively demanding uncertainty-directed exploration recovered a more accurate representation in simulations with no foraging advantage and was strongly unsupported in our human study.
Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
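
The link the abstract draws between selective maintenance and Shannon's entropy can be illustrated with a toy sketch. This is not the SCEPTIC model: the delta-rule update, the uniform starting values, the fixed preferred action, and every name and parameter below (`shannon_entropy`, `update_values`, `simulate`, `alpha`, `decay`) are assumptions made for illustration only. The sketch normalizes a vector of action values into a distribution, lets unchosen values decay, and tracks how the entropy of the representation changes across trials.

```python
import math


def shannon_entropy(values):
    """Shannon entropy (in bits) of non-negative values normalized to sum to 1."""
    total = sum(values)
    if total <= 0:
        return 0.0  # convention chosen here for an empty representation
    probs = [v / total for v in values if v > 0]
    return -sum(p * math.log2(p) for p in probs)


def update_values(values, chosen, reward, alpha=0.3, decay=0.2):
    """One learning step: update the chosen action, decay all others toward zero."""
    new_values = []
    for i, v in enumerate(values):
        if i == chosen:
            new_values.append(v + alpha * (reward - v))  # delta-rule update
        else:
            new_values.append((1.0 - decay) * v)  # unmaintained values fade
    return new_values


def simulate(decay, n_actions=8, n_trials=20, preferred=3):
    """Repeatedly choose one preferred action; return entropy after each trial."""
    values = [1.0] * n_actions  # assumed uniform starting representation
    entropies = [shannon_entropy(values)]
    for _ in range(n_trials):
        values = update_values(values, preferred, reward=1.0, decay=decay)
        entropies.append(shannon_entropy(values))
    return entropies


full = simulate(decay=0.0)       # every value is maintained
selective = simulate(decay=0.2)  # only the chosen action is maintained
print(f"entropy, full maintenance:      {full[-1]:.2f} bits")
print(f"entropy, selective maintenance: {selective[-1]:.2f} bits")
```

Under these assumptions, full maintenance leaves the representation uniform, so its entropy stays at its maximum for eight actions, while selective maintenance concentrates the representation on the preferred action and its entropy falls from trial to trial. That falling entropy is the sense in which the abstract describes choices becoming increasingly exploitative.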

Keywords:  Conditioning, operant; Decision making; Entropy; Exploratory behavior; Memory, short-term; Reinforcement

Year: 2018
PMID: 30502584
PMCID: PMC6328060
DOI: 10.1016/j.cognition.2018.11.004

Source DB: PubMed
Journal: Cognition
ISSN: 0010-0277


  4 in total

Review 1.  From exploration to exploitation: a shifting mental mode in late life development.

Authors:  R Nathan Spreng; Gary R Turner
Journal:  Trends Cogn Sci       Date:  2021-09-27       Impact factor: 20.229

2.  Humans adaptively resolve the explore-exploit dilemma under cognitive constraints: Evidence from a multi-armed bandit task.

Authors:  Vanessa M Brown; Michael N Hallquist; Michael J Frank; Alexandre Y Dombrovski
Journal:  Cognition       Date:  2022-07-30

Review 3.  Search for solutions, learning, simulation, and choice processes in suicidal behavior.

Authors:  Alexandre Y Dombrovski; Michael N Hallquist
Journal:  Wiley Interdiscip Rev Cogn Sci       Date:  2021-05-18

4.  Differential reinforcement encoding along the hippocampal long axis helps resolve the explore-exploit dilemma.

Authors:  Alexandre Y Dombrovski; Beatriz Luna; Michael N Hallquist
Journal:  Nat Commun       Date:  2020-10-26       Impact factor: 14.919

