Kate Nussenbaum, Catherine A Hartley.
Abstract
The past decade has seen the emergence of the use of reinforcement learning models to study developmental change in value-based learning. It is unclear, however, whether these computational modeling studies, which have employed a wide variety of tasks and model variants, have reached convergent conclusions. In this review, we examine whether the tuning of model parameters that govern different aspects of learning and decision-making processes varies consistently as a function of age, and what neurocognitive developmental changes may account for differences in these parameter estimates across development. We explore whether patterns of developmental change in these estimates are better described by differences in the extent to which individuals adapt their learning processes to the statistics of different environments, or by more static learning biases that emerge across varied contexts. We focus specifically on learning rates and inverse temperature parameter estimates, and find evidence that from childhood to adulthood, individuals become better at optimally weighting recent outcomes during learning across diverse contexts and less exploratory in their value-based decision-making. We provide recommendations for how these two possibilities - and potential alternative accounts - can be tested more directly to build a cohesive body of research that yields greater insight into the development of core learning processes.
Keywords: Computational modeling; Decision making; Reinforcement learning
Year: 2019 PMID: 31770715 PMCID: PMC6974916 DOI: 10.1016/j.dcn.2019.100733
Source DB: PubMed Journal: Dev Cogn Neurosci ISSN: 1878-9293 Impact factor: 6.464
Fig. 1. The softmax function transforms estimates of the value of different options into choice probabilities. The inverse temperature determines the extent to which differences in the value of different options are scaled. When the inverse temperature is high, differences are exaggerated and choices are more deterministic.
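The relationship described in the Fig. 1 caption can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `softmax_choice_probs` and the example values are assumptions chosen for illustration. It computes choice probabilities as exp(beta * value) normalized across options, where beta is the inverse temperature.

```python
import math


def softmax_choice_probs(values, inverse_temperature):
    """Turn option values into choice probabilities with a softmax.

    `values` is a list of value estimates and `inverse_temperature`
    (often written beta) scales the differences between them.
    Both names are illustrative, not taken from the source.
    """
    scaled = [inverse_temperature * v for v in values]
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    highest = max(scaled)
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    option_values = [0.6, 0.4]  # hypothetical value estimates
    for beta in (0.0, 1.0, 5.0, 20.0):
        print(beta, softmax_choice_probs(option_values, beta))
```

With an inverse temperature of zero the options are chosen with equal probability regardless of their values. As the inverse temperature grows, probability mass concentrates on the higher-valued option, which is the "more deterministic" regime the caption refers to.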
Fig. 2. Simulated data from 40,000 agents in two different learning environments indicate that the optimal asymmetry between positive and negative learning rates differs across contexts. In a two-armed bandit task with static, asymmetric reward probabilities and binary rewards, agents earned the most reward by implementing a slightly higher positive relative to negative learning rate (A). In a two-armed bandit task with static but equivalent reward probabilities, but rewards that differed in their magnitude, agents generally earned the most reward by implementing a very low negative learning rate (B).
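To make the kind of simulation summarized in the Fig. 2 caption concrete, the sketch below simulates a single agent with separate positive and negative learning rates in a two-armed bandit with binary rewards, roughly corresponding to environment A. It is not the authors' simulation: the function name `simulate_agent`, the reward probabilities (0.8 and 0.2), the initial values of 0.5, the trial count, and the inverse temperature are all assumptions made for illustration, and the magnitude-based environment B is not covered.

```python
import math
import random


def simulate_agent(alpha_pos, alpha_neg, beta,
                   reward_probs=(0.8, 0.2), n_trials=100, seed=0):
    """Return the total reward earned by one simulated agent.

    alpha_pos / alpha_neg: learning rates applied after positive /
    negative prediction errors. beta: softmax inverse temperature.
    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    q = [0.5, 0.5]  # assumed initial value estimates for the two arms
    total_reward = 0.0
    for _ in range(n_trials):
        # With two options the softmax reduces to a logistic function
        # of the scaled value difference.
        p_first = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        choice = 0 if rng.random() < p_first else 1
        reward = 1.0 if rng.random() < reward_probs[choice] else 0.0
        # Prediction error, weighted by a valence-specific learning rate.
        delta = reward - q[choice]
        alpha = alpha_pos if delta > 0 else alpha_neg
        q[choice] += alpha * delta
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    # Average over several seeds for two hypothetical parameter settings.
    for a_pos, a_neg in ((0.3, 0.3), (0.4, 0.2)):
        rewards = [simulate_agent(a_pos, a_neg, beta=5.0, seed=s)
                   for s in range(200)]
        print(a_pos, a_neg, sum(rewards) / len(rewards))
```

Sweeping `alpha_pos` and `alpha_neg` over a grid and averaging reward across many seeds would give a reward surface of the general kind the caption describes, although the exact shape depends on the assumed task settings.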