| Literature DB >> 28443814 |
Abstract
Two theoretical studies reveal how networks of neurons may behave during reward-based learning.
Entities:
Keywords: cognition; computational biology; decision making; learning; modeling; neuroscience; recurrent neural networks; systems biology; working memory
Mesh:
Year: 2017 PMID: 28443814 PMCID: PMC5406203 DOI: 10.7554/eLife.26157
Source DB: PubMed Journal: eLife ISSN: 2050-084X Impact factor: 8.140
Figure 1. Models for reward-based learning in neural networks.
(A) Many behavioral tasks can be formulated as reward-based (or reinforcement) learning problems: the animal learns to use sensory inputs to perform actions that maximize the expected reward. Miconi and, independently, Song et al. addressed two complementary aspects of how brain circuits might solve such problems.
(B) Miconi studied a biologically plausible synaptic plasticity rule (a rule governing how synapses strengthen or weaken), which is modulated by reward and is capable of learning complex tasks by adjusting the connectivity of a "decision network" (Miconi, 2017). The strengths of synapses are modified according to a function of the electrical activities on either side of the synapse and a delayed reward signal (R) delivered at the end of each trial. Critically, successful learning requires that an appropriate reward baseline (Rb) be subtracted from the actual reward, but exactly how this baseline could be estimated by another circuit is not addressed.
(C) Song et al. show that the total future reward can be estimated dynamically by a separate "value network" that integrates the rewards received from the environment as well as the activity (and outputs) of the decision network (Song et al., 2017). The output of the value network then serves as a reward baseline used to modulate a mathematically optimal, but biologically infeasible, rule that governs the synaptic plasticity in the decision network. Neurons are shown as gray circles, and synapses as black lines with a circle at one end.
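The baseline-subtraction idea in panels B and C can be illustrated with a minimal REINFORCE-style sketch: a "decision network" whose weight updates are modulated by the reward minus a baseline, and a linear "value network" that learns to supply that baseline. The toy task, network sizes, and learning rates below are assumptions for illustration only; this is not the actual plasticity rule of Miconi (2017) or the architecture of Song et al. (2017).

```python
import numpy as np

# Toy illustration of reward-modulated learning with a learned baseline.
# All task details and parameters are assumptions, not the published models.
rng = np.random.default_rng(0)

n_in, n_act = 4, 2                              # input size, number of actions
W = rng.normal(0.0, 0.1, size=(n_act, n_in))    # "decision network" weights
V = np.zeros(n_in)                              # linear "value network": Rb = V @ x

lr_w, lr_v = 0.1, 0.05
rewards = []

for trial in range(2000):
    x = rng.normal(size=n_in)                   # sensory input
    logits = W @ x
    p = np.exp(logits - logits.max())
    p /= p.sum()                                # softmax action probabilities
    a = rng.choice(n_act, p=p)                  # stochastic action choice

    # Toy task: action 1 is rewarded when x[0] > 0, action 0 otherwise.
    R = 1.0 if a == int(x[0] > 0) else 0.0
    rewards.append(R)

    Rb = V @ x                                  # baseline from the value network
    advantage = R - Rb                          # reward minus baseline (panels B/C)

    # Decision network: policy-gradient update modulated by the advantage.
    grad_logits = -p
    grad_logits[a] += 1.0
    W += lr_w * advantage * np.outer(grad_logits, x)

    # Value network: delta rule pulling its prediction toward the reward.
    V += lr_v * advantage * x
```

Without the baseline (i.e., using R alone), the same update still follows the reward gradient on average but with much higher variance, which is why both papers emphasize where Rb comes from.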