Florian Bolenz, Andrea M F Reiter, Ben Eppinger.
Abstract
Our ability to learn from the outcomes of our actions and to adapt our decisions accordingly changes over the course of the human lifespan. In recent years, there has been an increasing interest in using computational models to understand developmental changes in learning and decision-making. Moreover, extensions of these models are currently applied to study socio-emotional influences on learning in different age groups, a topic that is of great relevance for applications in education and health psychology. In this article, we aim to provide an introduction to basic ideas underlying computational models of reinforcement learning and focus on parameters and model variants that might be of interest to developmental scientists. We then highlight recent attempts to use reinforcement learning models to study the influence of social information on learning across development. The aim of this review is to illustrate how computational models can be applied in developmental science, what they can add to our understanding of developmental mechanisms and how they can be used to bridge the gap between psychological and neurobiological theories of development.
Keywords: cognitive modeling; decision-making; developmental neuroscience; lifespan; reinforcement learning; social cognition
Year: 2017; PMID: 29250006; PMCID: PMC5715389; DOI: 10.3389/fpsyg.2017.02048
Source DB: PubMed; Journal: Front Psychol; ISSN: 1664-1078
Figure 1. Structure of a real-world decision as a Markov decision process. The state “ice cream parlor” has two available actions, “chocolate ice cream” and “strawberry ice cream”. With a certain probability (represented by numbers next to the arrows), each choice leads to either a reward state or a non-reward state.
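The structure described in Figure 1 can be written down as a small data structure. The sketch below is illustrative only: the transition probabilities are placeholders, because the numbers shown next to the arrows in the figure are not reproduced in the caption, and the names `TRANSITIONS`, `step`, and `reward_probability` are hypothetical.

```python
import random

# One state with two actions; each action leads to a reward or a non-reward
# state. The probabilities are ASSUMED placeholder values, not the figure's.
TRANSITIONS = {
    "ice cream parlor": {
        "chocolate ice cream": {"reward": 0.7, "no reward": 0.3},
        "strawberry ice cream": {"reward": 0.4, "no reward": 0.6},
    }
}


def step(state, action, rng):
    """Sample the successor state for taking `action` in `state`."""
    outcomes = TRANSITIONS[state][action]
    next_states = list(outcomes.keys())
    weights = list(outcomes.values())
    return rng.choices(next_states, weights=weights, k=1)[0]


def reward_probability(state, action):
    """Probability that `action` in `state` leads to the reward state."""
    return TRANSITIONS[state][action]["reward"]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same draws
    for action in TRANSITIONS["ice cream parlor"]:
        outcome = step("ice cream parlor", action, rng)
        print(action, "->", outcome)
```

Under these placeholder values, an agent that knew the probabilities would prefer chocolate; a learning agent has to estimate them from experienced outcomes.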
Figure 2. (A) Development of a state-action value for two different learning rates. For the purpose of illustration, we assume that the agent makes identical choices across all trials. Filled and empty circles indicate trials in which the action was rewarded (r = 1) or not rewarded (r = 0), respectively. With a high learning rate (light line), the state-action value estimate fluctuates strongly, representing the rewards of the most recent trials. In contrast, with a low learning rate (dark line), the state-action value is more stable because it pools over more of the previous trials. (B) The higher the inverse softmax temperature, the more likely the agent is to prefer an action with a state-action value of 1 over another action with a state-action value of 0.
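Both panels of Figure 2 can be reproduced qualitatively with a few lines of code. The sketch below assumes a standard delta-rule value update and a two-option softmax choice rule, which are common in reinforcement learning models but are not spelled out in the caption itself. The function names, the starting value `q0 = 0.5`, and the example reward sequence are illustrative assumptions.

```python
import math


def update_value(q, reward, learning_rate):
    """Delta rule: move the value estimate toward the observed reward."""
    return q + learning_rate * (reward - q)


def value_trajectory(rewards, learning_rate, q0=0.5):
    """Value estimates after each trial, assuming the same action every trial.

    q0 is an assumed starting value, not taken from the figure.
    """
    q = q0
    trajectory = []
    for r in rewards:
        q = update_value(q, r, learning_rate)
        trajectory.append(q)
    return trajectory


def softmax_choice_prob(q_a, q_b, inverse_temperature):
    """Probability of choosing action a over action b under a softmax rule.

    Equivalent to exp(beta * q_a) / (exp(beta * q_a) + exp(beta * q_b)).
    """
    return 1.0 / (1.0 + math.exp(-inverse_temperature * (q_a - q_b)))


if __name__ == "__main__":
    rewards = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # made-up outcome sequence
    high = value_trajectory(rewards, learning_rate=0.9)
    low = value_trajectory(rewards, learning_rate=0.1)
    print("high learning rate:", [round(q, 2) for q in high])
    print("low learning rate: ", [round(q, 2) for q in low])
    for beta in (0.0, 1.0, 5.0):
        print("beta =", beta, "P(a) =", round(softmax_choice_prob(1.0, 0.0, beta), 3))
```

With an inverse temperature of 0 the two actions are chosen equally often regardless of their values; as it grows, choice becomes increasingly deterministic in favor of the higher-valued action.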