| Literature DB >> 30502584 |
Michael N Hallquist1, Alexandre Y Dombrovski2.
Abstract
In natural environments with many options of uncertain value, one faces a difficult tradeoff between exploiting familiar, valuable options or searching for better alternatives. Reinforcement learning models of this exploration/exploitation dilemma typically modulate the rate of exploratory choices or preferentially sample uncertain options. The extent to which such models capture human behavior remains unclear, in part because they do not consider the constraints on remembering what is learned. Using reinforcement-based timing as a motivating example, we show that selectively maintaining high-value actions compresses the amount of information to be tracked in learning, as quantified by Shannon's entropy. In turn, the information content of the value representation controls the balance between exploration (high entropy) and exploitation (low entropy). Selectively maintaining preferred action values while allowing others to decay renders the choices increasingly exploitative across learning episodes. To adjudicate among alternative maintenance and sampling strategies, we developed a new reinforcement learning model, StrategiC ExPloration/ExPloitation of Temporal Instrumental Contingencies (SCEPTIC). In computational studies, a resource-rational selective maintenance approach was as successful as more resource-intensive strategies. Furthermore, human behavior was consistent with selective maintenance; information compression was most pronounced in subjects with superior performance and non-verbal intelligence, and in learnable vs. unlearnable contingencies. Cognitively demanding uncertainty-directed exploration recovered a more accurate representation in simulations with no foraging advantage and was strongly unsupported in our human study.Entities:
Keywords: Conditioning, operant; Decision making; Entropy; Exploratory behavior; Memory, short-term; Reinforcement
Mesh:
Year: 2018 PMID: 30502584 PMCID: PMC6328060 DOI: 10.1016/j.cognition.2018.11.004
Source DB: PubMed Journal: Cognition ISSN: 0010-0277