Learning reward frequency over reward probability: A tale of two learning rules.

Hilary J Don¹, A Ross Otto², Astin C Cornwall³, Tyler Davis⁴, Darrell A Worthy³.

Abstract

Learning about the expected value of choice alternatives associated with reward is critical for adaptive behavior. Although human choice preferences are affected by the presentation frequency of reward-related alternatives, this may not be captured by some dominant models of value learning, such as the delta rule. In this study, we examined whether reward learning is driven more by learning the probability of reward provided by each option or by how frequently each option has been rewarded, and assessed how well models based on average reward (e.g., the Delta model) and models based on cumulative reward (e.g., the Decay model) can account for choice preferences. In a binary-outcome choice task, participants selected between pairs of options that had reward probabilities of 0.65 (A) versus 0.35 (B) or 0.75 (C) versus 0.25 (D). Crucially, during training there were twice as many AB trials as CD trials, such that option A was associated with higher cumulative reward, while option C gave higher average reward. Participants then decided between novel combinations of options (e.g., AC). Most participants preferred option A over C, a result predicted by the Decay model but not by the Delta model. We also compared the Delta and Decay models with both simpler and more complex models that assumed additional mechanisms, such as representation of uncertainty. Overall, models that assume learning about cumulative reward provided the best account of the data.
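Why the two rules make opposite predictions for the A-versus-C test can be illustrated with a small simulation. This is a minimal sketch, not the authors' model code: the update equations follow common textbook formulations of the Delta and Decay rules, and the trial order (AB, AB, CD repeated), the softmax choice rule, and all parameter values (`alpha`, `decay`, `beta`, number of blocks and simulated learners) are assumptions chosen for illustration. Only the four reward probabilities and the 2:1 ratio of AB to CD trials come from the abstract.

```python
import math
import random

# Reward probabilities taken from the task described in the abstract.
P_REWARD = {"A": 0.65, "B": 0.35, "C": 0.75, "D": 0.25}

# Assumed training order: AB pairs occur twice as often as CD pairs.
BLOCK = [("A", "B"), ("A", "B"), ("C", "D")]


def p_choose_first(values, first, second, beta):
    """Two-option softmax (logistic) probability of choosing `first`."""
    return 1.0 / (1.0 + math.exp(-beta * (values[first] - values[second])))


def delta_update(values, chosen, reward, alpha):
    """Delta rule: move the chosen value toward the reward (tracks average reward)."""
    values[chosen] += alpha * (reward - values[chosen])


def decay_update(values, chosen, reward, decay):
    """Decay rule: every value decays, then the chosen option adds the reward
    (tracks recency-weighted cumulative reward)."""
    for option in values:
        values[option] *= 1.0 - decay
    values[chosen] += reward


def train(update, param, beta, n_blocks, rng):
    """Run one simulated learner through the training phase."""
    values = {option: 0.0 for option in P_REWARD}
    for _ in range(n_blocks):
        for first, second in BLOCK:
            p_first = p_choose_first(values, first, second, beta)
            chosen = first if rng.random() < p_first else second
            reward = 1.0 if rng.random() < P_REWARD[chosen] else 0.0
            update(values, chosen, reward, param)
    return values


def p_choose_a_over_c(update, param, beta, n_agents=500, n_blocks=50, seed=0):
    """Mean probability of choosing A on a novel A-vs-C test trial."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_agents):
        values = train(update, param, beta, n_blocks, rng)
        total += p_choose_first(values, "A", "C", beta)
    return total / n_agents


if __name__ == "__main__":
    # Hypothetical parameters; beta differs because Decay values are unbounded.
    print("Delta P(A over C):", p_choose_a_over_c(delta_update, 0.1, 5.0))
    print("Decay P(A over C):", p_choose_a_over_c(decay_update, 0.1, 1.0))
```

Under these assumed settings the Delta learner should favor C at test, because its values approximate each option's reward probability (about 0.65 for A and 0.75 for C), whereas the Decay learner should favor A, because A was chosen and rewarded on more training trials. The separate `beta` values are a design choice: Decay values can grow well above 1, so a smaller inverse temperature keeps the two choice rules on a roughly comparable scale.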


Keywords: Decay rule; Delta rule; Prediction error; Probability learning; Reinforcement learning; Reward frequency


Year: 2019 PMID: 31430606 PMCID: PMC6814570 DOI: 10.1016/j.cognition.2019.104042

Source DB: PubMed Journal: Cognition ISSN: 0010-0277

