Value and reward based learning in neurorobots.

Abstract

Keywords: action selection; basal ganglia; embodied cognition; neuromodulation; neurorobotics; reinforcement learning; reward-based learning; value system

Year: 2013; PMID: 24062683; PMCID: PMC3772325; DOI: 10.3389/fnbot.2013.00013

Source DB: PubMed; Journal: Front Neurorobot; ISSN: 1662-5218; Impact factor: 2.650

Organisms are equipped with value systems that signal the salience of environmental cues to their nervous system, causing a change in the nervous system that results in modification of their behavior. These systems are necessary for an organism to adapt its behavior when an important environmental event occurs. A value system constitutes a basic assumption of what is good and bad for an agent. These value systems have been effectively used in robotic systems to shape behavior. For example, many robots have used models of the dopaminergic system to reinforce behavior that leads to rewards. Other modulatory systems that shape behavior include acetylcholine's effect on attention, norepinephrine's effect on vigilance, and serotonin's effect on impulsiveness, mood, and risk. Moreover, hormonal systems such as oxytocin, with its effect on trust, constitute a value system.

A recent Research Topic in Frontiers in Neurorobotics explored value and reward based learning. The topic comprised nine papers on research involving neurobiologically inspired robots whose behavior was shaped by value and reward learning, adapted through interaction with the environment, or shaped by extracting value from the environment.

Value systems are often linked to reward systems in neurobiology and in modeling. For example, Jayet Bray and her colleagues developed a neurorobotic system that learned to categorize the valence of speech through positive verbal encouragement, much like a baby would (Jayet Bray et al., 2013). Their virtual robot, which interacted with a human partner, was controlled by a large-scale spiking neuron model of the visual cortex, premotor cortex, and reward system.

An important issue in both biological and artificial reward systems is the credit assignment problem, that is, how a distal cue can be linked to a reward. In other words, how can you extract the stimulus that predicts a future reward from all the noisy stimuli that you are faced with?
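The credit assignment question raised above can be made concrete with a small sketch. It uses textbook tabular TD(lambda) with eligibility traces, which is an assumption chosen for illustration and not the method of any paper in the Research Topic. The function name `td_lambda_chain`, the five-state chain, and the values of `alpha`, `gamma`, and `lam` are all hypothetical. State 0 plays the role of the distal cue and the only reward arrives when the last state is left.

```python
def td_lambda_chain(n_states=5, episodes=200, alpha=0.1, gamma=0.95, lam=0.8):
    """Tabular TD(lambda) on a linear chain (illustrative assumption).

    State 0 is the distal cue; a reward of 1.0 is delivered only on
    leaving the last state. Returns the learned value of each state.
    """
    values = [0.0] * n_states
    for _ in range(episodes):
        traces = [0.0] * n_states  # eligibility traces, reset every episode
        for s in range(n_states):
            is_last = s == n_states - 1
            reward = 1.0 if is_last else 0.0
            next_value = 0.0 if is_last else values[s + 1]
            # Reward prediction error, often compared to phasic dopamine.
            delta = reward + gamma * next_value - values[s]
            traces[s] += 1.0  # mark the current state as eligible
            for i in range(n_states):
                # Recently visited states share in the current error.
                values[i] += alpha * delta * traces[i]
                traces[i] *= gamma * lam  # eligibility fades with time
    return values


values = td_lambda_chain()
for state, value in enumerate(values):
    print(f"state {state}: value {value:.3f}")
```

In this sketch the eligibility trace is what lets the delayed reward reach back to the cue: the cue state ends up with a positive value even though it is never directly rewarded. Setting `lam=0.0` removes the trace, and the value then has to propagate backward one state per episode.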
Soltoggio and colleagues introduce the principle of rare correlations to resolve the credit assignment problem (Soltoggio et al., 2013). By using Rarely Correlating Hebbian Plasticity, they demonstrated classical and operant conditioning in a set of human-robot experiments with the iCub robot.

The notion of value and reward has often been formalized in reinforcement learning (RL) systems. For example, Li and colleagues show that reinforcement learning, in the form of a dynamic actor-critic model, can be used to tune central pattern generators in a humanoid robot (Li et al., 2013). Through interaction with the environment, this dynamical system developed biped locomotion on a NAO robot that could adapt its gaits to different conditions. Elfwing and colleagues introduced a scaled version of free-energy reinforcement learning (FERL) and applied it to visual recognition and navigation tasks (Elfwing et al., 2013). This novel algorithm was shown to be significantly better than standard FERL and feedforward neural network RL. Another related method, the Linearly solvable Markov Decision Process (LMDP), has been shown to have advantages over RL in optimal control policy (Kinjo et al., 2013). Kinjo and colleagues demonstrated the power of LMDP for robot control by applying the method to a pole-balancing task and a visually guided navigation problem using their Spring Dog robot, which has six degrees of freedom.

Value need not be reward-based; curiosity, harm, novelty, and uncertainty can all carry a value signal. For example, in a biomimetic model of the cortex, basal ganglia, and phasic dopamine, Bolado-Gomez and Gurney (Bolado-Gomez and Gurney, 2013) showed that intrinsically motivated operant learning (i.e., action discovery) could replicate rodent experiments in a virtual robot. In this case, phasic dopaminergic neuromodulation carried a novelty salience signal, rather than the more conventional reward signal.
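The idea that novelty alone can carry a value signal can be illustrated with a toy example. This is a deliberately simple count-based sketch, not the biomimetic basal ganglia model described above; the functions `novelty` and `explore`, the habituation rule `1 / (1 + count)`, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def novelty(count):
    """Novelty salience that habituates with repeated exposure (assumed form)."""
    return 1.0 / (1.0 + count)


def explore(n_actions=4, steps=12, reward=None):
    """Greedy selection on extrinsic reward plus a novelty bonus.

    With no extrinsic reward, behavior is driven by novelty alone.
    Ties go to the lowest action index.
    """
    reward = reward or [0.0] * n_actions
    counts = Counter()  # how often each action has been tried
    history = []
    for _ in range(steps):
        scores = [reward[a] + novelty(counts[a]) for a in range(n_actions)]
        action = max(range(n_actions), key=lambda a: scores[a])
        counts[action] += 1  # repeated use makes the action less novel
        history.append(action)
    return history, counts


history, counts = explore()
print(history)
```

With all extrinsic rewards at zero, the agent still samples every action, because each choice lowers the novelty of the action just taken. Passing a non-zero `reward` list shows how the two signals trade off once novelty has habituated.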
In a model called CURIOUSity-DRiven, Modular, Incremental Slow Feature Analysis (Curious Dr. MISFA), Luciw and colleagues showed that curiosity could shape the behavior of an iCub robot in a multi-context environment (Luciw et al., 2013). Their model was inspired by cortical regions of the brain involved in unsupervised learning, as well as neuromodulatory systems responsible for providing intrinsic rewards through dopamine and regulating levels of attention through norepinephrine.

Different neuromodulatory systems in the brain may be related to different aspects of value (Krichmar, 2013). In a model of multiple neuromodulatory systems, Krichmar showed that interactions between the dopaminergic (reward), serotonergic (harm aversion), and cholinergic/noradrenergic (novelty) systems could lead to interesting behavioral control in an autonomous robot.

Finally, in an interesting position paper, Friston, Adams, and Montague suggest that value is evidence, specifically log Bayesian evidence (Friston et al., 2012). They propose that reward or cost functions that underlie value in conventional models of optimal control can be cast as prior beliefs about future states, which is simply the accumulation of evidence through Bayesian updating of posterior beliefs.

As can be gleaned from reading the papers in the Research Topic, as well as the empirical evidence and studies they are built on, value and reward based learning is an active and broad area of research. The application to neurorobotics is important for several reasons: (1) it provides an embodied platform for testing hypotheses regarding the neural correlates of value and reward, (2) it provides a means to test more theoretical hypotheses on the acquisition of value and its function for biological and artificial systems, and (3) it may lead to the development of improved learning systems in robots and other autonomous agents.

9 in total

1. A neurorobotic platform to test the influence of neuromodulatory signaling on anxious and curious behavior.

Authors: Jeffrey L Krichmar
Journal: Front Neurorobot Date: 2013-02-05 Impact factor: 2.650

2. An intrinsic value system for developing multiple invariant representations with incremental slowness learning.

Authors: Matthew Luciw; Varun Kompella; Sohrob Kazerounian; Juergen Schmidhuber
Journal: Front Neurorobot Date: 2013-05-30 Impact factor: 2.650

3. Evaluation of linearly solvable Markov decision process with dynamic model learning in a mobile robot navigation task.

Authors: Ken Kinjo; Eiji Uchibe; Kenji Doya
Journal: Front Neurorobot Date: 2013-04-05 Impact factor: 2.650

4. Reward-based learning for virtual neurorobotics through emotional speech processing.

Authors: Laurence C Jayet Bray; Gareth B Ferneyhough; Emily R Barker; Corey M Thibeault; Frederick C Harris
Journal: Front Neurorobot Date: 2013-04-29 Impact factor: 2.650

5. Humanoids Learning to Walk: A Natural CPG-Actor-Critic Architecture.

Authors: Cai Li; Robert Lowe; Tom Ziemke
Journal: Front Neurorobot Date: 2013-04-08 Impact factor: 2.650

6. A biologically plausible embodied model of action discovery.

Authors: Rufino Bolado-Gomez; Kevin Gurney
Journal: Front Neurorobot Date: 2013-03-12 Impact factor: 2.650

7. Scaled free-energy based reinforcement learning for robust and efficient learning in high-dimensional state spaces.

Authors: Stefan Elfwing; Eiji Uchibe; Kenji Doya
Journal: Front Neurorobot Date: 2013-02-28 Impact factor: 2.650

8. Rare neural correlations implement robotic conditioning with delayed rewards and disturbances.

Authors: Andrea Soltoggio; Andre Lemme; Felix Reinhart; Jochen J Steil
Journal: Front Neurorobot Date: 2013-04-02 Impact factor: 2.650

9. What is value-accumulated reward or evidence?

Authors: Karl Friston; Rick Adams; Read Montague
Journal: Front Neurorobot Date: 2012-11-02 Impact factor: 2.650

4 in total

1. Operant conditioning: a minimal components requirement in artificial spiking neurons designed for bio-inspired robot's controller.

Authors: André Cyr; Mounir Boukadoum; Frédéric Thériault
Journal: Front Neurorobot Date: 2014-07-25 Impact factor: 2.650

2. Hedonic quality or reward? A study of basic pleasure in homeostasis and decision making of a motivated autonomous robot.

Authors: Matthew Lewis; Lola Cañamero
Journal: Adapt Behav Date: 2016-10-12 Impact factor: 1.942

3. Learning touch preferences with a tactile robot using dopamine modulated STDP in a model of insular cortex.

Authors: Ting-Shuo Chou; Liam D Bucci; Jeffrey L Krichmar
Journal: Front Neurorobot Date: 2015-07-22 Impact factor: 2.650

4. Editorial: Neural plasticity for rich and uncertain robotic information streams.

Authors: Andrea Soltoggio; Frank van der Velde
Journal: Front Neurorobot Date: 2015-10-27 Impact factor: 2.650
