Control of exploitation-exploration meta-parameter in reinforcement learning.

Shin Ishii, Wako Yoshida, Junichiro Yoshimoto.

Abstract

In reinforcement learning (RL), the duality between exploitation and exploration has long been an important issue. This paper presents a new method that controls the balance between exploitation and exploration. Our learning scheme is based on model-based RL, in which Bayesian inference with a forgetting effect estimates the state-transition probability of the environment. The balance parameter, which corresponds to the randomness in action selection, is controlled based on the variation in action results and the perception of environmental change. When applied to maze tasks, our method successfully obtains good control by adapting to environmental changes. Recently, Usher et al. [Science 283 (1999) 549] have suggested that noradrenergic neurons in the locus coeruleus may control the exploitation-exploration balance in a real brain and that the balance may correspond to the level of an animal's selective attention. In light of this scenario, we also discuss a possible implementation in the brain.
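The abstract does not give the update equations, so the following is only a minimal sketch of the two ingredients it names, not the authors' algorithm. It assumes that the "forgetting effect" can be approximated by exponentially decaying transition counts with a symmetric prior, and that the balance parameter acts as an inverse temperature in softmax action selection. The names `ForgettingTransitionModel`, `softmax_policy`, `forgetting`, `prior`, and `beta` are hypothetical, and the paper's rule for adjusting the balance parameter is not reproduced here.

```python
import math
from collections import defaultdict


class ForgettingTransitionModel:
    """Transition estimate from exponentially decayed counts.

    Assumption: older observations are discounted by `forgetting` on each
    update, so the estimate can track a changing environment.
    """

    def __init__(self, n_states, forgetting=0.9, prior=1.0):
        self.n_states = n_states
        self.forgetting = forgetting  # in (0, 1]; 1.0 means no forgetting
        self.prior = prior            # symmetric pseudo-count per next state
        self.counts = defaultdict(lambda: [0.0] * n_states)

    def update(self, state, action, next_state):
        row = self.counts[(state, action)]
        for s in range(self.n_states):
            row[s] *= self.forgetting  # decay old evidence
        row[next_state] += 1.0         # add the new observation

    def probs(self, state, action):
        """Posterior-mean estimate of P(next_state | state, action)."""
        row = self.counts[(state, action)]
        total = sum(row) + self.prior * self.n_states
        return [(c + self.prior) / total for c in row]


def softmax_policy(values, beta):
    """Action probabilities; larger `beta` means less random (more exploitative)."""
    m = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (v - m)) for v in values]
    z = sum(weights)
    return [w / z for w in weights]


# Usage: the environment "changes" after three steps, and the decayed
# counts shift toward the new dynamics.
model = ForgettingTransitionModel(n_states=2, forgetting=0.5)
for nxt in (1, 1, 1, 0, 0, 0):
    model.update(state=0, action=0, next_state=nxt)
estimate = model.probs(0, 0)

uniform = softmax_policy([1.0, 0.0], beta=0.0)  # fully random selection
greedy = softmax_policy([1.0, 0.0], beta=5.0)   # nearly deterministic selection
```

With `beta = 0` every action is equally likely, and increasing `beta` concentrates probability on the highest-valued action, which is one common way to express the randomness in action selection as a single parameter.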

Year:  2002        PMID: 12371519     DOI: 10.1016/s0893-6080(02)00056-4

Source DB:  PubMed          Journal:  Neural Netw        ISSN: 0893-6080


  27 in total

1.  Nicotinic receptors in the ventral tegmental area promote uncertainty-seeking.

Authors:  Jérémie Naudé; Stefania Tolu; Malou Dongelmans; Nicolas Torquet; Sébastien Valverde; Guillaume Rodriguez; Stéphanie Pons; Uwe Maskos; Alexandre Mourot; Fabio Marti; Philippe Faure
Journal:  Nat Neurosci       Date:  2016-01-18       Impact factor: 24.884

2.  Striatal dopamine modulates song spectral but not temporal features through D1 receptors.

Authors:  Arthur Leblois; David J Perkel
Journal:  Eur J Neurosci       Date:  2012-05-17       Impact factor: 3.386

3.  Regulatory fit effects in a choice task.

Authors:  Darrell A Worthy; W Todd Maddox; Arthur B Markman
Journal:  Psychon Bull Rev       Date:  2007-12

4.  Common neural code for reward and information value.

Authors:  Kenji Kobayashi; Ming Hsu
Journal:  Proc Natl Acad Sci U S A       Date:  2019-06-11       Impact factor: 11.205

5.  Individual differences in implicit motor learning: task specificity in sensorimotor adaptation and sequence learning.

Authors:  Alit Stark-Inbar; Meher Raza; Jordan A Taylor; Richard B Ivry
Journal:  J Neurophysiol       Date:  2016-11-02       Impact factor: 2.714

6.  The hierarchically mechanistic mind: an evolutionary systems theory of the human brain, cognition, and behavior. (Review)

Authors:  Paul B Badcock; Karl J Friston; Maxwell J D Ramstead; Annemie Ploeger; Jakob Hohwy
Journal:  Cogn Affect Behav Neurosci       Date:  2019-12       Impact factor: 3.282

7.  Dynamic Flexibility in Striatal-Cortical Circuits Supports Reinforcement Learning.

Authors:  Raphael T Gerraty; Juliet Y Davidow; Karin Foerde; Adriana Galvan; Danielle S Bassett; Daphna Shohamy
Journal:  J Neurosci       Date:  2018-02-05       Impact factor: 6.167

8.  Adaptive Regulation of Motor Variability.

Authors:  Ashesh K Dhawale; Yohsuke R Miyamoto; Maurice A Smith; Bence P Ölveczky
Journal:  Curr Biol       Date:  2019-10-17       Impact factor: 10.834

9.  The Effects of 24-hour Sleep Deprivation on the Exploration-Exploitation Trade-off.

Authors:  Brian D Glass; W Todd Maddox; Christopher Bowen; Zachary R Savarie; Michael D Matthews; Arthur B Markman; David M Schnyer
Journal:  Biol Rhythm Res       Date:  2011-04       Impact factor: 1.219

10.  Decision making under uncertainty: a neural model based on partially observable markov decision processes.

Authors:  Rajesh P N Rao
Journal:  Front Comput Neurosci       Date:  2010-11-24       Impact factor: 2.380
