Selective maintenance of value information helps resolve the exploration/exploitation dilemma.

Michael N Hallquist, Alexandre Y Dombrovski

Abstract

In natural environments with many options of uncertain value, one faces a difficult tradeoff between exploiting familiar, valuable options and searching for better alternatives. Reinforcement learning models of this exploration/exploitation dilemma typically modulate the rate of exploratory choices or preferentially sample uncertain options. The extent to which such models capture human behavior remains unclear, in part because they do not consider the constraints on remembering what is learned. Using reinforcement-based timing as a motivating example, we show that selectively maintaining high-value actions compresses the amount of information to be tracked in learning, as quantified by Shannon's entropy. In turn, the information content of the value representation controls the balance between exploration (high entropy) and exploitation (low entropy). Selectively maintaining preferred action values while allowing others to decay renders choices increasingly exploitative across learning episodes. To adjudicate among alternative maintenance and sampling strategies, we developed a new reinforcement learning model, StrategiC ExPloration/ExPloitation of Temporal Instrumental Contingencies (SCEPTIC). In computational studies, a resource-rational selective maintenance approach was as successful as more resource-intensive strategies. Furthermore, human behavior was consistent with selective maintenance; information compression was most pronounced in subjects with superior performance and non-verbal intelligence, and in learnable vs. unlearnable contingencies. Cognitively demanding uncertainty-directed exploration recovered a more accurate representation in simulations but conferred no foraging advantage, and it was strongly unsupported in our human study.
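
The link between selective maintenance and entropy described above can be illustrated with a toy calculation. The sketch below is not the SCEPTIC model itself: it assumes a small vector of action values, treats the normalized values as a probability distribution, and applies a hypothetical rule in which the `n_keep` largest values are maintained while all others shrink by a fixed `decay` fraction per episode. The function names, the starting values, and both parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def shannon_entropy(values):
    """Shannon entropy (in bits) of a non-negative value vector,
    after normalizing it to sum to 1."""
    v = np.asarray(values, dtype=float)
    total = v.sum()
    if total <= 0:
        return 0.0
    p = v / total
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(p * np.log2(p)).sum())


def selective_decay(values, n_keep, decay):
    """Hypothetical maintenance rule: leave the n_keep largest values
    untouched and multiply every other value by (1 - decay)."""
    v = np.asarray(values, dtype=float).copy()
    keep = np.argsort(v)[-n_keep:]          # indices of the preferred actions
    forget = np.ones(v.shape, dtype=bool)
    forget[keep] = False
    v[forget] *= 1.0 - decay
    return v


# Illustrative starting values for six candidate actions (assumed).
values = np.array([0.2, 0.5, 1.0, 0.9, 0.4, 0.1])
entropies = [shannon_entropy(values)]
for _ in range(10):  # ten learning episodes with no new reward input
    values = selective_decay(values, n_keep=2, decay=0.3)
    entropies.append(shannon_entropy(values))

print([round(h, 3) for h in entropies])
```

Under these assumptions the entropy falls from episode to episode as the non-preferred values fade, which mirrors the abstract's claim that a lower-entropy value representation corresponds to more exploitative choice. A full model would also update values from reward feedback, which this sketch omits.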

Keywords: Conditioning, operant; Decision making; Entropy; Exploratory behavior; Memory, short-term; Reinforcement

Year: 2018 PMID: 30502584 PMCID: PMC6328060 DOI: 10.1016/j.cognition.2018.11.004

Source DB: PubMed Journal: Cognition ISSN: 0010-0277

